Internet usage differs markedly from one country to another in terms of the websites accessed, especially in countries that exercise censorship. People all over the world use Tor to communicate anonymously over the internet. The Institute of Networks and Security (INS) hosts a high-bandwidth (200 Mbit/s) exit node on the Tor network for research purposes. A recently published study examined the DNS traffic of this exit node for anomalies and used the results to identify which websites people visit via the Tor network.
The exit traffic of this high-bandwidth Tor exit node provided statistics on the websites visited, the bandwidth used, and the destination countries derived from the IP addresses. The study was then extended to the DNS traffic associated with the exit node. Because the server is used solely for Tor, all DNS requests, along with their replies, originate from the Tor exit traffic. To capture them, a dedicated DNS relay server was installed and linked to a university DNS server, and the Tor exit node was configured to use this relay for name resolution. The relay's query logs therefore contain all the needed information: the requested domain name and the response, i.e., the IP address. At the same time, any direct link to individual connections (Tor circuits) is lost, so a logged lookup cannot be tied to a particular circuit. To maximize the level of detail, the DNS timeout (time-to-live) values returned by the relay to the exit node are set to a low value of 1 minute. The exit node's cache then expires quickly, so repeated lookups reach the relay and appear in its log.
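The relay's two roles described above can be modeled in a few lines: it logs each lookup, and it caps the TTL it passes back to the exit node. The study does not say which DNS software the relay ran, so the sketch below is not its implementation. DnsAnswer, LogRecord, relay_answer and MAX_TTL_SECONDS are hypothetical names, and the networking part of a real resolver is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical constant mirroring the study's 1-minute setting.
MAX_TTL_SECONDS = 60


@dataclass(frozen=True)
class DnsAnswer:
    domain: str
    ip: str
    ttl: int  # seconds, as returned by the upstream (university) resolver


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    domain: str
    ip: str


def relay_answer(answer: DnsAnswer, log: list[LogRecord], now: datetime) -> DnsAnswer:
    """Record a resolved query, then hand it back to the exit node with a capped TTL."""
    # The record holds only time, name and address: nothing ties it to a Tor circuit.
    log.append(LogRecord(now, answer.domain, answer.ip))
    # A short TTL makes the exit node's cache expire quickly, so repeat visits
    # trigger fresh lookups that show up in the relay's log.
    return DnsAnswer(answer.domain, answer.ip, min(answer.ttl, MAX_TTL_SECONDS))


if __name__ == "__main__":
    query_log: list[LogRecord] = []
    upstream = DnsAnswer("example.org", "192.0.2.1", ttl=86400)
    reply = relay_answer(upstream, query_log, datetime.now(timezone.utc))
    print(reply.ttl)       # 60
    print(len(query_log))  # 1
```

An answer whose TTL is already below the cap, say 30 seconds, is passed through unchanged. Only longer-lived answers are shortened.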
The study presents several interesting findings, which this article summarizes.
Top-ranked surface internet websites (NOT visited):
One of the study's investigations concerned surface internet websites from Alexa's top 1 million list that were not accessed via Tor. Three groups emerged: websites that are occasionally absent during a 1-hour period, websites that are often absent, and websites that are never or only very rarely accessed via the study's Tor exit node. Only the top 100 websites of the list were checked.
The first group (websites that are almost always visited) included twitch.tv, sina.com.cn, wikia.com, 360.cn, ebay.com, aliexpress.com, weibo.com, livejasmin.com, yahoo.co.jp, linkedin.com, google.co.uk, bongcams.com, alipay.com, netflix.com, baidu.com, pornhub.com, t.co, wikipedia.org, xhamster.com, tumblr.com, reddit.com, and google.fr. These websites were absent in between 6.7% (twitch.tv) and almost 0% of all 1-hour periods. The last six websites in this list were missing in virtually no single hour over the five-month period.
The second group included websites that were absent in 12.8% to 29.8% of the 1-hour periods observed at the studied Tor exit node: amazon.co.jp, google.com.br, imdb.com, jd.com, sohu.com, naver.com, and google.co.in.
The third group included websites that were absent in between 53.7% and 100% (never visited) of the 1-hour periods: tmall.com, login.tmall.com, and google.co.jp.
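The absence percentages above are, for each domain, the fraction of 1-hour periods without a single query. The sketch below shows that computation under two assumptions. First, the relay log can be reduced to (timestamp, queried name) pairs. Second, a listed domain counts as visited when it or any of its subdomains is queried; the study does not state its matching rule. matches and absence_rates are hypothetical helpers.

```python
from __future__ import annotations

from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def matches(qname: str, domain: str) -> bool:
    """Assumed rule: the listed domain itself or any of its subdomains counts."""
    qname = qname.rstrip(".").lower()
    return qname == domain or qname.endswith("." + domain)


def absence_rates(queries, domains: list[str], start: datetime, end: datetime) -> dict[str, float]:
    """Fraction of whole 1-hour periods in [start, end) with no query for each domain.

    queries is an iterable of (timestamp, queried_name) pairs; domains are
    lowercase names such as "reddit.com".
    """
    n_hours = int((end - start) / HOUR)
    if n_hours == 0:
        raise ValueError("need at least one full hour of log data")
    end = start + n_hours * HOUR  # ignore a trailing partial hour
    seen = {domain: set() for domain in domains}  # domain -> hour indices with a query
    for ts, qname in queries:
        if not (start <= ts < end):
            continue
        hour = int((ts - start) / HOUR)
        for domain in domains:
            if matches(qname, domain):
                seen[domain].add(hour)
    return {domain: 1 - len(hours) / n_hours for domain, hours in seen.items()}
```

Consider four hours of log in which reddit.com or www.reddit.com is looked up every hour and www.imdb.com only once. For that log, absence_rates returns 0.0 for reddit.com and 0.75 for imdb.com. A domain that never appears gets 1.0. Because the study lists tmall.com and login.tmall.com separately, an exact-match rule may be closer to what it did. The suffix rule used here would count login.tmall.com lookups toward tmall.com as well.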
Country code top-level domains (TLDs) queried via Tor:
Evaluating the country codes of the queried domains reveals several anomalies. As the chart in Figure (1) shows, once all generic TLDs are excluded, the distribution of queries across country code TLDs is extremely skewed.
Figure (1): Domain name queries in comparison with registered domain names
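The sketch below excludes generic TLDs and computes each country code's share of queries. Its rule for recognizing country codes is an assumption: it treats every two-letter ASCII TLD as one. That rule misses internationalized ccTLDs such as .рф (xn--p1ai), and the study may have used an explicit list instead. cctld_shares and above_threshold are hypothetical names.

```python
from collections import Counter


def cctld_shares(qnames) -> dict:
    """Share of queries per country code TLD, with generic TLDs excluded."""
    counts = Counter()
    for name in qnames:
        tld = name.rstrip(".").rsplit(".", 1)[-1].lower()
        # Heuristic: two-letter ASCII TLDs are country codes; .com, .org, .info
        # and internationalized ccTLDs (xn--...) are all skipped by this rule.
        if len(tld) == 2 and tld.isascii() and tld.isalpha():
            counts[tld] += 1
    total = sum(counts.values())
    # most_common() yields the largest shares first.
    return {tld: n / total for tld, n in counts.most_common()} if total else {}


def above_threshold(shares: dict, threshold: float = 0.01) -> list:
    """TLDs whose share exceeds the threshold (1% by default), largest first."""
    return [tld for tld, share in shares.items() if share > threshold]
```

Applied to the names yandex.ru, mail.ru, vk.com, spiegel.de, lenta.ru, example.org and forum.su, the two generic names are dropped. The result is a share of 0.6 for .ru and 0.2 each for .de and .su.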
29% of all country code TLD queries are for domain names under .ru, while 12.5% are for domain names under .de. The .fr country code accounts for 4.27% of the queries and .nl for 4.04%. Each of the top 25 country codes accounts for more than 1% of the queries, and together they account for 86.9% of all country code TLD queries. There are 4,976,168 registered domain names under .ru and 14,572,679 under .de (around three times as many). So .ru receives about 2.3 times as many queries as .de despite having only about a third as many registered domains. Relative to .de, it is therefore overrepresented by a factor of 6.8. An even more striking result concerns .su, the TLD of the former Soviet Union, which still exists and is mainly used in Russia. Relative to its number of registered domains, .su received 4.7 times as many queries as .ru over the five-month period. Another spike appears for .ua, the country code of Ukraine, where the state exercises online censorship. The peaks for .tv and .io have a simple explanation. Unlike most country codes, these are registered for the meaning of the name itself rather than for the associated country.
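The factor of 6.8 follows from comparing each TLD's query share with its number of registered domains: (0.29 / 0.125) × (14,572,679 / 4,976,168) ≈ 2.32 × 2.93 ≈ 6.8. The study does not spell out its formula. Assuming this is how it defined the factor, a one-line helper can check the arithmetic (overrepresentation is a hypothetical name):

```python
def overrepresentation(share_a: float, registered_a: int,
                       share_b: float, registered_b: int) -> float:
    """Queries per registered domain under TLD a, relative to TLD b."""
    return (share_a / registered_a) / (share_b / registered_b)


# Figures quoted in the study: .ru versus .de
factor = overrepresentation(0.29, 4_976_168, 0.125, 14_572_679)
print(round(factor, 1))  # 6.8
```

A factor of 1 would mean both TLDs attract queries in proportion to their size. Values above 1 mean TLD a is queried more often than its registrations alone would predict.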
From this, we can infer for which country code TLDs anonymous access matters most to users. Russian websites are visited anonymously far more often than their share of registered domains would suggest. Users, by contrast, seem largely unconcerned about whether German websites can identify their visitors. China, however, is a different case. China has very strict rules on internet access, so Tor is a pivotal tool for users there. China has an enormous number of registered domains but comparatively few queries. This can be explained by the fact that domains registered under the Chinese country code are hosted within China. Their content is therefore mostly under political control, and the identity of the website owner is strictly verified. Accessing these websites anonymously is of little use, since critical or otherwise undesired content is hardly to be expected there. For Chinese users, the value of Tor lies in accessing foreign websites. Russians and Ukrainians, on the other hand, have fewer difficulties accessing foreign online content, yet they may want to remain anonymous toward their own states.
Results are very similar when traffic volume is considered. Figure (2) shows the traffic volume in bytes transferred to different countries, based on the geolocation of the destination IP addresses.
Figure (2): Comparing DNS queries to traffic associated with different countries
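Assigning bytes to countries requires mapping each destination IP address to a location. The study does not say which geolocation source it used, so the sketch below stands in for one with a small, made-up prefix table. It does longest-prefix matching with Python's standard ipaddress module. bytes_per_country is a hypothetical helper. The prefixes in the example are documentation ranges (RFC 5737), not real country allocations, and XX, YY and ZZ are placeholder labels.

```python
import ipaddress
from collections import Counter


def bytes_per_country(flows, prefix_table: dict) -> dict:
    """Sum transferred bytes per country using longest-prefix match on the destination IP.

    flows: iterable of (destination_ip, byte_count) pairs.
    prefix_table: {"CIDR prefix": "country label"}, a stand-in for a GeoIP database.
    """
    # Check more specific (longer) prefixes first so they win over broader ones.
    networks = sorted(
        ((ipaddress.ip_network(prefix), country) for prefix, country in prefix_table.items()),
        key=lambda item: item[0].prefixlen,
        reverse=True,
    )
    totals = Counter()
    for dst_ip, n_bytes in flows:
        addr = ipaddress.ip_address(dst_ip)
        for net, country in networks:
            if addr.version == net.version and addr in net:
                totals[country] += n_bytes
                break
        else:
            totals["unknown"] += n_bytes
    return dict(totals)
```

With a table mapping 192.0.2.0/24 to XX and 192.0.2.128/25 to YY, the more specific /25 decides the match. A flow to 192.0.2.200 is therefore counted for YY, while one to 192.0.2.10 goes to XX. A production setup would replace the dictionary with a real geolocation database.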
The exceptions are .fr, .cn, .nl, .ro, and .at. The large traffic volume to Austria can be explained by the geographic location of the exit node, because high-volume traffic such as video is routed to nearby servers. France, Romania, and the Netherlands form a similar case, as they host many data centers serving domains from numerous parts of the world. This leaves China as the real exception. One possible explanation is that some popular websites aimed at other countries are physically hosted in China. Another is that Chinese users are often forced to use Tor even where it performs poorly (large files, a remote exit node, etc.).
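Spotting these exceptions amounts to comparing, per country, the share of traffic with the share of DNS queries. The sketch below assumes both shares are available as dictionaries keyed by country. It uses an arbitrary factor of 2 as the cutoff, since the study gives no numeric criterion. divergent_countries is a hypothetical helper.

```python
import math


def divergent_countries(query_share: dict, traffic_share: dict,
                        ratio_threshold: float = 2.0) -> dict:
    """Countries whose traffic share differs from their DNS query share by at
    least ratio_threshold in either direction, most divergent first.

    The returned value is traffic_share / query_share: above 1 means more bytes
    than the number of lookups would suggest, below 1 means fewer.
    """
    flagged = {}
    for country in query_share.keys() & traffic_share.keys():
        q, t = query_share[country], traffic_share[country]
        if q > 0 and t > 0:
            ratio = t / q
            if ratio >= ratio_threshold or ratio <= 1 / ratio_threshold:
                flagged[country] = ratio
    # Rank by distance from 1 on a log scale, so 0.2 and 5.0 count as equally far.
    return dict(sorted(flagged.items(), key=lambda kv: abs(math.log(kv[1])), reverse=True))
```

Ranking on the absolute logarithm treats a country with five times the expected traffic and one with a fifth of it as equally unusual. A plain difference of shares would not do that.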