Over the years of Tor’s existence, many users have lost their anonymity. I’m going to explain a technique called the “correlation attack” that government agencies have used in the past for that purpose. Deanonymization techniques in general range from exploiting human error to highly sophisticated mathematical methods and attacks on software flaws.
This attack has been around since Tor’s widespread usage began, and it doesn’t look like it is going away in the near future. An attacker controlling the first and last router in a Tor circuit can use timing and data properties to correlate the streams observed at those routers and therefore break Tor’s anonymity.
No simple patch can prevent this method because it doesn’t exploit any bug; it uses math (probability and statistics) and attacks the design of the Tor network itself. That said, there are ways to make this task much more difficult, but they are usually rejected in order to preserve low latency.
Some attacks are not even against software, but against users. For example, if a dark market admin has shared some information about himself, such as his state, age and/or past criminal activities, it becomes feasible for government agencies to monitor the internet activity of all possible suspects and see which one connects to the Tor network at the same time the admin comes online.
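The narrowing-down step described here is essentially a set intersection repeated over many sightings. Below is a minimal sketch of that idea, not a description of any real investigation: the function name, the suspect labels and the observation format are all hypothetical, and it assumes the observer already has a list of who was connected to Tor each time the admin appeared online.

```python
def narrow_suspects(suspects, online_at_each_sighting):
    """Keep only suspects who were connected to Tor at every sighting.

    suspects: iterable of suspect labels (hypothetical).
    online_at_each_sighting: list of collections, one per time the admin
        was seen online, each holding the suspects connected to Tor then.
    """
    remaining = set(suspects)
    for online in online_at_each_sighting:
        # Anyone who was offline while the admin was active is ruled out.
        remaining &= set(online)
    return remaining


if __name__ == "__main__":
    sightings = [
        {"suspect_a", "suspect_b", "suspect_c"},
        {"suspect_b", "suspect_c"},
        {"suspect_b", "suspect_d"},
    ]
    print(narrow_suspects(["suspect_a", "suspect_b", "suspect_c", "suspect_d"], sightings))
```

Each extra sighting can only shrink the set, which is why a small pool of suspects and a talkative admin make this kind of attack cheap.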
The previous example was easy, so let’s analyze a case where the targets are smarter and disclose zero information about themselves. The idea is to control a sizeable portion of Tor relays, including as many guards (the first relay, which knows your IP address) and exit relays (those that connect to the destination server) as possible.
It’s already clear that this attack needs serious funding and is mostly carried out by government agencies. The reason is scale: Tor has over 7000 relays and over 2 million daily users.
Since Tor relies on a volunteer resource model, anyone is encouraged to run any number of relays to help the Tor network. Whoever controls a sizeable portion of the relays has a chance of serving as both the guard and the exit relay for the same user. It’s only a matter of time before you start using a compromised circuit.
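To get a feel for “only a matter of time”, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are mine, not Tor’s actual path selection: the guard and the exit are picked independently for every circuit, and the attacker holds a fixed share of each position. Real Tor weights relays by bandwidth and keeps the same guard for a long time, so treat the numbers as illustrative only. The function names and the shares are hypothetical.

```python
def compromised_circuit_probability(guard_share, exit_share):
    """Chance that one circuit has both an attacker guard and an attacker exit.

    Assumes (simplification) that both positions are chosen independently.
    """
    return guard_share * exit_share


def probability_within(n_circuits, guard_share, exit_share):
    """Chance that at least one of n_circuits is compromised at both ends."""
    p = compromised_circuit_probability(guard_share, exit_share)
    # Complement of "every single circuit was clean".
    return 1.0 - (1.0 - p) ** n_circuits


if __name__ == "__main__":
    for n in (1, 100, 1000):
        print(n, probability_within(n, 0.05, 0.05))
```

Under these assumptions an attacker holding 5% of guards and 5% of exits catches a single circuit only 0.25% of the time, but the chance of at least one hit grows to roughly 92% over 1000 circuits.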
The attacker runs automated packet analysis on both relays to calculate a correlation coefficient. The most useful variables are timing, packet size and frequency. Although this information gives the attacker a pretty good idea of which website you are visiting, the huge size of the Tor network means there are many false positives.
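One simple way to picture the correlation coefficient is to count packets per time window at each end and compare the two series. The sketch below uses a plain Pearson correlation on made-up timestamps. This is my own illustrative choice, not the specific algorithm any agency or paper is known to use, and the window size, duration and sample flows are all assumptions.

```python
from math import sqrt


def bin_counts(timestamps, window, duration):
    """Count packets in consecutive windows of `window` seconds."""
    n_bins = int(duration / window)
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t / window)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical packet timestamps (seconds) seen at the guard.
entry_flow = [0.1, 0.2, 0.3, 2.1, 2.2, 5.5, 5.6, 5.7, 5.8, 8.0]
# The same flow at the exit, delayed by an assumed 50 ms of network latency.
exit_same_flow = [t + 0.05 for t in entry_flow]
# A different user's flow seen at the exit.
exit_other_flow = [1.1, 1.5, 3.2, 3.9, 4.4, 6.3, 6.9, 7.5, 9.1, 9.6]

if __name__ == "__main__":
    entry = bin_counts(entry_flow, window=1.0, duration=10.0)
    print(pearson(entry, bin_counts(exit_same_flow, 1.0, 10.0)))
    print(pearson(entry, bin_counts(exit_other_flow, 1.0, 10.0)))
```

The matching flow scores close to 1 while the unrelated one does not. With real traffic the gap is much less clean, which is where the false positives come from.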
The exact false positive rate varies greatly depending on what kind of traffic you generate. For example, the easiest target is someone downloading files, because there are many sizeable packets to compare. Someone who is simply browsing a website looks the same as thousands of other users, so the chance of a false positive increases.
According to this paper, 80% of users can be deanonymized within a period of six months by realistic adversaries. This is not proof in court because of possible false positives (5-10%, depending on the correlation algorithm), but it provides enough suspicion to start further monitoring.
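A quick calculation shows why a 5-10% false positive rate rules this out as courtroom proof. The sketch assumes the attacker compares one flow seen at the exit against a pool of candidate flows seen at guards, exactly one of which is the real match. The pool size and the true positive rate are hypothetical numbers of mine, not figures from the paper.

```python
def match_precision(n_candidates, false_positive_rate, true_positive_rate=1.0):
    """Expected share of flagged flows that really belong to the target.

    n_candidates: candidate flows compared, exactly one of which is the
        real match (assumption).
    """
    expected_true = true_positive_rate
    expected_false = false_positive_rate * (n_candidates - 1)
    return expected_true / (expected_true + expected_false)


if __name__ == "__main__":
    for n in (10, 100, 1001):
        print(n, match_precision(n, 0.05))
```

With 1001 candidates and a 5% false positive rate, about 50 innocent flows get flagged next to the real one, so a single flagged flow is the right one only about 2% of the time. That is enough to justify a closer look, not a conviction.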
It’s very likely that the Carnegie Mellon University attack on the Tor network was indeed a correlation attack. The information about Tor users was then reportedly sold to the FBI for $1 million. At the time (early 2014), malicious relays could easily confirm their suspicion by adding an arbitrary value to a packet and checking for it on the other end, which turned a statistical guess into certainty. This was quickly patched, but the correlation attack itself is still not prevented.
This attack was the downfall of many websites and their users, including Silk Road 2.0 and two child porn sites.
The good thing is that Tor contributors are well aware of this attack. The Tor Project is already working on techniques that make website fingerprinting attacks less effective.
You shouldn’t be concerned about these attacks if you’re using a trusted VPN to connect to the Tor network, because the attack won’t yield your IP address, only the one belonging to the VPN server. Be aware that all VPNs must obey the laws of the country they reside in, and most countries require all ISPs (including VPNs) to keep logs of all user activity for a period of time (usually around 2 years) and to hand that information over if a court issues a warrant. Even if a VPN resides in a country that has no such laws, it might be selling your information. Thankfully, deepdotweb offers great advice on choosing the right VPN.
Before you comment “VPN + Tor sucks”, read what Tor developers have to say on this topic. Using a VPN has both benefits and downsides. I recommend using one because it saves you from this particular attack.
My opinion is that the quality of the VPN is all that matters. If they log your data, all they do is make government agencies wait for a warrant, and they’ll also sell it to anyone who offers some money. On the other hand, a no-log VPN can be invaluable.
P.S. I believe all VPNs keep logs – why wouldn’t they? You can’t know it anyway. And I can’t persuade myself that they would refuse money for my identity either. At least some VPNs don’t have to give up our identity to law enforcement agencies, which is nice.