We are used to turning to the internet to answer many of our questions and to meet our needs, such as booking a hotel or buying a flight ticket. While we absorb information from the web, however, we also leave our fingerprints there. “Until 10 years ago, it was taken for granted that everybody saw the same content online,” points out Ancsa Hannak, a research associate at the Hungarian Academy of Sciences and a Visiting Professor at the Central European University. “However, this is not the case anymore, as most of the content to which we are exposed is now filtered, based on our previous searches, our location and the people we are connected to.” Personalization is indeed shaping most aspects of our lives, from search and online shopping to media consumption and networking… and users are still unaware!
What kind of data is used for personalization? And how much content is affected?
In her talk on “Measuring personalization in online content serving services”, Dr. Hannak moves from anecdotal evidence to a systematic analysis of the phenomenon. The first challenge is that differences in online content do not always imply personalization, as most websites are dynamic and change over time. Dr. Hannak developed a novel methodology that repeatedly compares the results of queries issued by real users with those of control queries issued from fresh, clean browsers. The comparison filters out differences in online content that stem not from personalization but from side effects such as distributed infrastructure, load balancing and A/B testing.
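The idea of separating personalization from background noise can be illustrated with a small sketch. This is not Dr. Hannak's actual implementation: the function names, the use of Jaccard similarity as the comparison metric, and the toy result lists are all assumptions made for illustration. Two control browsers issue the same query, and whatever difference appears between them is treated as noise; only the difference a real user sees beyond that noise is counted as personalization.

```python
def jaccard(a, b):
    """Jaccard similarity of two result lists, ignoring order."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def personalization_estimate(user_results, control_a, control_b):
    """Difference seen by the user beyond the noise between two controls.

    All three arguments are lists of result URLs for the same query.
    The two controls come from fresh, clean browsers (hypothetical setup).
    """
    noise = 1.0 - jaccard(control_a, control_b)        # dynamic-site noise
    observed = 1.0 - jaccard(user_results, control_a)  # user vs. control
    return max(0.0, observed - noise)


# Toy data: the letters stand in for result URLs.
control_a = ["a", "b", "c", "d"]
control_b = ["a", "b", "c", "e"]
user = ["a", "b", "x", "y"]
estimate = personalization_estimate(user, control_a, control_b)
```

A real measurement would repeat this over many queries and many points in time, and would probably also account for the ordering of results, which this sketch ignores.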
The analysis of web searches and people’s browsers revealed that as much as 12% of the content was affected by personalization for searches performed through Google, and 15% for those performed through Bing. By contrast, and as expected, searches on DuckDuckGo, a search engine which claims to protect the privacy of its users, revealed no such effect.
Which user features drive personalization? From the analysis of a pool of independent factors, including gender, age, and others, location clearly emerged as the key ingredient behind personalization of online content. In particular, local queries, that is, searches tied to the user's geographical area such as those for coffee places, showed a high level of variability, whereas searches for famous people or historical facts typically produced much more universal results. “Sensitivity to geographical position at a small scale opens the chapter of data protection and information privacy, as location is often correlated with ethnicity, wealth and other sensitive information,” warns Dr. Hannak.
When used correctly, personalization can be a useful tool that speeds up our searches by leveraging our tastes and preferences in a world where the amount of information is becoming overwhelming. You might expect Netflix, for instance, to suggest new TV shows for you to watch, a convenient recommendation service. Unfortunately, personalization can also be used for illegal discriminatory actions. A case in point is the recent EuroDisney fraud, where the company behind the famous amusement park was found guilty by the EU Commission of price discrimination with respect to citizens of different European countries.
Blog post by Federico Battiston