The Tor network was established to provide anonymity to internet users who want to keep their browsing of specific online content private. It consists of circuits of routers that apply multiple layers of encryption to transmitted data packets, thus concealing the user's IP address, the connection established between the user and the remote web server, and the online content being accessed.
Even though Tor can offer a high level of anonymity, it is vulnerable to a specific form of attack known as Website Fingerprinting (WF). Through a WF attack, an adversary can, under certain circumstances, guess which web page a Tor user is accessing.
A recently published paper analyzed the robustness of the Tor network against WF attacks. For this purpose, the researchers emulated an adversary able to eavesdrop on the data packets sent and received by the target Tor user. The attack was based on the assumption that the adversary knew in advance the set of remote web pages the Tor user might browse. In other words, the attack was performed under a closed-world assumption.
Launching a Website Fingerprinting (WF) attack:
To launch this attack successfully, the adversary must first build a dataset of traces, each recording the data packets sent and received when a user downloads one of the candidate web pages. In the experiment, this dataset had to be formatted properly, and it covered around 100 different websites with different attributes. A machine learning algorithm is then applied to classify each trace according to the web page it belongs to. Three algorithms were used to run an exhaustive group of tests: k-nearest neighbors (KNN), Random Forest, and support vector machines (SVM). The parameters of each algorithm were tuned to find the setting that yields the best precision and success rate.
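The pipeline described above (collect labelled traces, turn each into a fixed set of attributes, classify, tune parameters) can be illustrated with a minimal closed-world sketch. It is not the paper's implementation: the traces are synthetic, the four attributes, the `synthetic_trace` generator, and the candidate values of `k` are all assumptions made for illustration, and only a small hand-written KNN stands in for the three classifiers.

```python
import math
import random
from collections import Counter


def extract_features(trace):
    """Map a trace (list of +1 outgoing / -1 incoming cells) to a fixed-length
    attribute vector: [total cells, outgoing, incoming, direction changes]."""
    n_out = sum(1 for d in trace if d > 0)
    n_in = len(trace) - n_out
    changes = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return [len(trace), n_out, n_in, changes]


def synthetic_trace(site_id, rng):
    """Hypothetical generator: each site gets its own typical traffic volume,
    plus a little per-visit noise. Real traces would come from packet captures."""
    n_out = 20 + 10 * site_id + rng.randint(-2, 2)
    n_in = 100 + 60 * site_id + rng.randint(-5, 5)
    trace = [1] * n_out + [-1] * n_in
    rng.shuffle(trace)
    return trace


def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test traces assigned to the correct site."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def build_dataset(n_sites=5, traces_per_site=30, seed=0):
    """Closed world: every trace belongs to one of n_sites known sites."""
    rng = random.Random(seed)
    data = [(extract_features(synthetic_trace(site, rng)), site)
            for site in range(n_sites) for _ in range(traces_per_site)]
    rng.shuffle(data)
    return data


data = build_dataset()
split = len(data) * 7 // 10              # 70% for training, 30% held out
train, held_out = data[:split], data[split:]

# Parameter tuning: keep the k that scores best on the held-out traces.
best_k = max((1, 3, 5, 7), key=lambda k: accuracy(train, held_out, k))
print("best k:", best_k, "held-out accuracy:", accuracy(train, held_out, best_k))
```

Because the synthetic sites are deliberately well separated, this sketch says nothing about real-world success rates; it only shows the shape of the workflow. In practice a separate test set, untouched during tuning, would be needed to report an unbiased accuracy.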
Success rate of WF attacks:
Website Fingerprinting attacks can achieve high success rates as long as enough samples are collected to build a highly representative dataset and the parameters of the three algorithms are properly optimized. The time needed for training and classification can occasionally be prohibitively long, especially if the dataset includes a large number of samples. The experiments carried out in this study show that the dataset format and the number of attributes per sample are the main factors affecting the precision of the classifiers and the success of the attack. It should also be emphasized that the attack can only succeed if the adversary is able to capture the traffic sent and received by the Tor user, which is not always possible.
In a more realistic environment, the accuracy of the classifiers would decline considerably, for three fundamental reasons. First, the researchers assumed that the target Tor user does not connect to two or more web servers simultaneously, which makes it relatively easy to identify each established connection. Second, all the samples used in the experiments were associated with a small number of Tor circuits. In a live Tor network, the samples obtained from the target user will very likely correspond to circuits other than those used to collect the training samples. Finally, the experiments were carried out in a closed-world setting, i.e. the web pages the target user is supposed to browse are limited in number and known to the adversary. A Tor user who browses a large number of websites is less likely to be vulnerable to this form of attack.
This study shows that Website Fingerprinting can succeed in undermining the anonymity of Tor users in a closed-world setting. WF is also used to catch hackers via Micro-Honeypots. However, more research is needed to test this attack in an open-world setting, where the number of possible websites is practically unlimited and unknown to the attacker. In such a setting, the attack would need to rely on unsupervised learning. Emulating a flow correlation attack would also be very helpful in such a setting.
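To make the idea of unsupervised learning in an open world more concrete, the following is a minimal sketch of grouping unlabelled traces without knowing in advance how many sites exist. It uses a simple threshold-based "leader" clustering, chosen here only for illustration and not taken from the study. The two attributes (outgoing and incoming cell counts), the synthetic data, and the `radius` value are all assumptions.

```python
import math
import random


def leader_cluster(points, radius):
    """Assign each point to the nearest existing cluster centroid if it lies
    within `radius`; otherwise start a new cluster. No labels are used and
    the number of clusters is not fixed in advance."""
    centroids, members = [], []
    for p in points:
        best, best_d = None, radius
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            centroids.append(list(p))
            members.append([p])
        else:
            members[best].append(p)
            n = len(members[best])
            # Update the running mean of the cluster.
            centroids[best] = [(c * (n - 1) + x) / n
                               for c, x in zip(centroids[best], p)]
    return centroids, members


# Hypothetical unlabelled observations: (outgoing cells, incoming cells) for
# visits to 4 sites the attacker knows nothing about.
rng = random.Random(1)
points = [(20 + 10 * site + rng.randint(-2, 2),
           100 + 60 * site + rng.randint(-5, 5))
          for site in range(4) for _ in range(20)]
rng.shuffle(points)

centroids, members = leader_cluster(points, radius=25)
print("clusters found:", len(centroids))
for c, m in zip(centroids, members):
    print([round(v, 1) for v in c], len(m))
```

With these deliberately well-separated synthetic sites, the sketch recovers one cluster per site. Real Tor traffic is far noisier, and choosing `radius` (or any equivalent parameter) would itself be a hard problem, which is part of why the open-world case needs further research.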