Journalist Svea Eckert and data scientist Andreas Dewes joined forces on a research project showing that supposedly anonymous browsing data can easily be de-anonymized.

At the Def Con hacking conference in Las Vegas, the duo presented their findings, revealing how they obtained a database containing 3 billion URLs from 3 million German users, spread across 9 million different websites. The records covered a 30-day period. Some users browsed very little, opening just over a dozen websites during that time, while others had tens of thousands of data points. In effect, the pair had full access to these users' online lives.

Eckert and Dewes said that obtaining the information was much easier than buying it. They created a fake marketing company, complete with its own website and a LinkedIn profile for its chief executive, which fooled other companies.

On that website, the pair claimed to have developed a machine-learning algorithm that would help companies market to people more efficiently, but only if it was supplied with a large amount of user data.

Eckert said the pair called almost a hundred companies and told them they needed raw data, the so-called clickstream of people’s lives. She added that the companies did not hesitate to hand over the browsing data.


Andreas Dewes and Svea Eckert during a presentation

The data Eckert and Dewes gathered came for free: a data broker cooperated with the pair and was willing to let them test their hypothetical AI advertising platform. With little effort, the pair were then able to expose the identities behind the supposedly anonymous data of many users.

Dewes, the data scientist, explained several ways to find an individual among the long list of URLs and timestamps. For example, anyone who visits their own analytics page on Twitter generates a URL that contains their username and is visible only to them. Anyone who spots that URL in the data can link the supposedly anonymous record to a real person.
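As a rough illustration of the analytics-URL trick, the sketch below scans a clickstream for such pages and pulls out the username. The URL pattern, the record layout of (anonymous ID, URL) pairs, and the function name are assumptions made for this example; the article does not describe the exact format or the tooling the researchers used.

```python
import re
from collections import defaultdict

# Assumed URL shape for a Twitter analytics page; the exact format
# is not given in the article.
ANALYTICS_URL = re.compile(
    r"^https?://analytics\.twitter\.com/user/([A-Za-z0-9_]{1,15})(?:/|$)"
)


def link_ids_to_usernames(clickstream):
    """Map each anonymous ID to the usernames found in its analytics URLs.

    `clickstream` is an iterable of (anonymous_id, url) pairs.
    """
    found = defaultdict(set)
    for anon_id, url in clickstream:
        match = ANALYTICS_URL.match(url)
        if match:
            found[anon_id].add(match.group(1))
    return dict(found)


# Made-up sample records, for illustration only.
sample = [
    ("id_1", "https://news.example/article/42"),
    ("id_1", "https://analytics.twitter.com/user/example_user/tweets"),
    ("id_2", "https://twitter.com/example_user"),
]
linked = link_ids_to_usernames(sample)
```

Only the first ID is linked here, because a public profile URL can be visited by anyone, while the analytics page is normally opened only by the account owner.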

Another approach relies on the fact that a person's browsing habits are often unique. For instance, a set of just 10 URLs that a person visits regularly, reflecting their different hobbies, can be enough to single them out. By building a fingerprint from such data, it is possible to compare it against the URLs in each anonymous record and identify the individual behind it.
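One simple way to picture this kind of fingerprint matching is as a set-overlap comparison. The sketch below reduces URLs to domains and scores each anonymous record with Jaccard similarity; that choice of measure, the record layout, and all names are assumptions for illustration, not the method Dewes described.

```python
from urllib.parse import urlparse


def domains(urls):
    """Reduce a list of URLs to the set of domains they point to."""
    return {urlparse(u).netloc.lower() for u in urls}


def jaccard(a, b):
    """Share of domains the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(known_urls, anonymous_records):
    """Return (anonymous_id, score) for the record closest to the fingerprint.

    `known_urls` are sites a known person is believed to visit regularly;
    `anonymous_records` maps an anonymous ID to its list of visited URLs.
    """
    if not anonymous_records:
        return None, 0.0
    fingerprint = domains(known_urls)
    scores = {
        anon_id: jaccard(fingerprint, domains(urls))
        for anon_id, urls in anonymous_records.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up data, for illustration only.
known = [
    "https://chess.example/club",
    "https://knitting.example/patterns",
    "https://localnews.example/",
]
records = {
    "id_1": ["https://sports.example/scores", "https://localnews.example/today"],
    "id_2": [
        "https://chess.example/play",
        "https://knitting.example/forum",
        "https://localnews.example/weather",
    ],
}
match_id, match_score = best_match(known, records)
```

With real data the fingerprint would come from a public source, such as links a person has shared, and a threshold on the score would be needed to avoid false matches.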

According to Dewes, a similar methodology was used in 2008 by researchers who unmasked several Netflix users by matching a set of film ratings published by Netflix against public profiles on IMDb. One woman later filed a lawsuit against Netflix for violating her privacy.

Google Translate is another source of leaks, because the platform stores the text of every query in the URL. The researchers used this to uncover details of a German cybercrime investigation, since the detective in charge was using Google Translate to communicate with foreign police officers.
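To show why a translation URL leaks its content, the sketch below recovers the query text from one. The two URL shapes it handles (text in the fragment after the language codes, or in a `text` query parameter) are assumptions for this example, since the article does not spell out the format, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, unquote, urlparse


def translated_text(url):
    """Return the text embedded in a Google Translate URL, or None."""
    parts = urlparse(url)
    if parts.netloc != "translate.google.com":
        return None
    # Assumed fragment shape: "#<source>/<target>/<text>"
    pieces = parts.fragment.split("/", 2)
    if len(pieces) == 3 and pieces[2]:
        return unquote(pieces[2])
    # Assumed query shape: "?sl=<source>&tl=<target>&text=<text>"
    text = parse_qs(parts.query).get("text")
    return text[0] if text else None


# Made-up example URL, for illustration only.
leaked = translated_text("https://translate.google.com/#de/en/Guten%20Tag")
```

Anyone holding the clickstream can run a filter like this over every record and read the translated messages in plain text.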

According to Dewes, the data was collected by a number of browser plugins, with a “safe surfing” tool being the main offender. After Dewes and Eckert revealed their findings, the plugin improved its privacy policy.