A study exploring new ways of discovering censored websites around the world has found that the list of banned websites in China, Indonesia, Iran, and Turkey is actually 10 times longer than previously thought.
In total, the researchers found nearly 6 million censored websites across the four countries, considerably more than had previously been documented. Interestingly, the kinds of websites banned, and the type of content restricted, vary markedly from country to country. Some are obvious, and some are rather more surprising.
Researchers from the Department of Computer Science at the University of Oxford set out to find new ways of tracking down censored websites that nobody yet knew were blocked, and have published their results in a study currently available on arXiv.org.
“The problem that we and a few others have been banging our heads against for quite a while is what about the websites that you don’t know are blocked?” one of the study's authors, Joss Wright, told New Scientist.
After all, it’s much easier to monitor well-known sites such as Facebook and Twitter, which are famously blocked in China, than it is to find ones you’ve never heard of.
To do this, they created a fully automated system that uses web crawling (a widely used method of browsing the web and scraping data, best known from search engines) to investigate sites already known to be filtered or blocked. The system followed the links hosted on these filtered pages to see whether they led to new, previously undetected sites, and then tested whether each newly found site was censored too.
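For readers curious about the mechanics, the link-following idea can be sketched as a simple breadth-first crawl. This is a minimal illustration under stated assumptions, not the Oxford team's actual system: `fetch_links` and `is_blocked` are hypothetical functions the caller supplies (a real version would download pages and run network censorship tests), and the choice to keep crawling only from pages that turn out to be blocked is our assumption about how the search expands.

```python
from collections import deque
from typing import Callable, Iterable


def discover_blocked(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: URLs linked from a page
    is_blocked: Callable[[str], bool],            # hypothetical: censorship test for one URL
    max_checks: int = 1000,
) -> set[str]:
    """Breadth-first discovery of blocked sites, starting from known blocked seeds."""
    seeds = list(seeds)
    blocked = set(seeds)   # seeds are assumed to be already-known blocked sites
    seen = set(seeds)      # every URL queued or tested, so nothing is tested twice
    queue = deque(seeds)
    checks = 0
    while queue and checks < max_checks:
        page = queue.popleft()
        for link in fetch_links(page):
            if link in seen:
                continue
            seen.add(link)
            checks += 1
            if is_blocked(link):
                blocked.add(link)
                queue.append(link)  # assumption: only expand from newly found blocked pages
            if checks >= max_checks:
                break
    return blocked


# Toy, made-up link graph and "blocked" set, for illustration only.
toy_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["e.example"],
    "d.example": [],
    "e.example": [],
}
toy_blocked = {"a.example", "b.example", "d.example"}

found = discover_blocked(
    ["a.example"],
    fetch_links=lambda url: toy_links.get(url, []),
    is_blocked=lambda url: url in toy_blocked,
)
print(sorted(found))  # ['a.example', 'b.example', 'd.example']
```

Starting from one known blocked site, the crawl uncovers two more. `e.example` is never tested because it is only linked from `c.example`, which is not blocked. The `max_checks` cap simply limits how many censorship tests a single run performs.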
According to the authors, their tool outperforms current state-of-the-art filter-detection tools. To demonstrate this, they ran experiments in the four countries mentioned and built up a list of roughly 6 million banned websites, larger than any other list currently available.
Studying what these countries ban also provided some intriguing insight into the kinds of content China, Indonesia, Iran, and Turkey are most likely to restrict.
In China, perhaps unsurprisingly, news and media outlets, as well as search engines and translation services, were the most restricted. In Iran, personal blogs and personal pages were the most likely to be filtered, and the results also pointed to websites that explain how to get around the filters, which suggests a spirit of rebellion.
In Indonesia, shopping sites and personal ads were the most blocked content, while Turkey keeps a tight rein on gambling sites (gambling is heavily regulated there) and, more strangely, on dating sites. Perhaps the authorities still believe in the power of meeting someone IRL?
For the authors, the study was about finding ways to track down undetected filtered sites, but once that information is available, it is intriguing to see which content censorship authorities consider most important to block.
[H/T: New Scientist]