Tor’s Biggest Threat – Correlation Attack

Over the years of Tor’s existence, many users have lost their anonymity. I’m going to explain a technique called a “correlation attack” that government agencies have used for that purpose. Such attacks range from exploiting human error to highly sophisticated mathematical methods, occasionally aided by software flaws.

This attack has been around since Tor’s widespread use began, and it doesn’t seem to be going away in the near future. An attacker controlling the first and last relays in a Tor circuit can use timing and data properties to correlate the streams observed at those relays and thereby break Tor’s anonymity.

No simple patch can prevent this method, because it doesn’t exploit any bug; instead, it uses math (probability and statistics) to attack the design of the Tor network. That said, there are ways to make this task much more difficult, but they are usually rejected to preserve low latency.
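The article doesn’t name specific defenses, so take this as one illustrative example rather than what Tor does: constant-rate padding, where a relay sends a cell on a fixed schedule and fills idle slots with dummy cells. In the minimal Python sketch below (the function name and parameters are hypothetical), an observer sees the same send schedule whether the user is downloading in a burst or browsing lightly, but real cells have to wait in a queue for their slot, which is exactly the latency cost that gets such defenses rejected.

```python
def constant_rate_schedule(arrivals, interval, duration):
    """Hypothetical constant-rate padding: emit one cell every `interval` seconds.

    A queued real cell is sent if one is waiting; otherwise a dummy cell is sent.
    Returns (send_times, delays), where delays are how long each real cell waited.
    """
    queue = sorted(arrivals)
    send_times, delays = [], []
    i, k = 0, 0
    # Keep the fixed schedule running until `duration` and until the queue drains.
    while k * interval <= duration or i < len(queue):
        t = k * interval  # derived from k to avoid accumulating float error
        if i < len(queue) and queue[i] <= t:
            delays.append(t - queue[i])  # real cell leaves at this slot
            i += 1
        # otherwise a dummy cell fills this slot
        send_times.append(t)
        k += 1
    return send_times, delays


burst = [1.0] * 20                     # 20 cells arriving at once (e.g. a download)
steady = [0.5 * j for j in range(20)]  # 20 cells spread out (e.g. light browsing)

times_a, delays_a = constant_rate_schedule(burst, 0.1, 10.0)
times_b, delays_b = constant_rate_schedule(steady, 0.1, 10.0)
print(times_a == times_b)       # True: identical observable timing for both users
print(round(max(delays_a), 1))  # 1.9: the bursty user pays in added latency
```

Real padding schemes are far more nuanced (adaptive rates, bandwidth budgets); this only shows the trade-off between hiding timing and adding delay.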

Some attacks are not even against software, but against users. For example, if a darknet market admin shared some information about himself, such as his state, age and/or past criminal activities, it becomes feasible for government agencies to monitor the internet activity of all plausible suspects and see which of them connects to the Tor network every time the admin comes online.
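The reasoning in this example boils down to a set intersection: each time the admin comes online, only the suspects who were connected to Tor at that moment remain candidates. A minimal Python sketch, where the names, the observation data and the helper `narrow_suspects` are all hypothetical:

```python
from functools import reduce


def narrow_suspects(observations: list[set[str]]) -> set[str]:
    """Keep only the suspects seen connecting to Tor in EVERY observation window."""
    if not observations:
        return set()
    return reduce(set.intersection, observations)


# Each set: suspects seen connecting to Tor when the admin came online (made-up data).
windows = [
    {"alice", "bob", "carol", "dave"},
    {"bob", "carol", "erin"},
    {"carol", "bob", "frank"},
    {"carol", "grace"},
]
print(narrow_suspects(windows))  # {'carol'}
```

Real observations are noisy (a suspect may drop out of one window because of a monitoring gap), so a practical version would count how often each suspect appears rather than demand a perfect intersection.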

The previous example was easy, so let’s analyze a case where the targets are smarter and disclose zero information about themselves. The idea is to control a sizeable portion of Tor relays, ideally including as many guards (the first relay, which knows your IP address) and exit relays (those that connect to the destination server) as possible.

It’s already clear that this attack needs substantial funding and is mostly carried out by government agencies. The reason is that Tor has over 7,000 relays and over 2 million daily users.

Since Tor relies on a volunteer resource model, anyone is encouraged to run any number of relays to help the Tor network. Someone who controls a sizeable portion of relays has a chance of serving as both the guard and the exit relay for the same user. It’s only a matter of time before you start using a compromised circuit.
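To make “only a matter of time” concrete, here is a back-of-the-envelope Python sketch. It assumes, as a simplification, that every circuit picks its guard and exit independently and that the attacker’s relays receive the fractions `guard_frac` and `exit_frac` of the selection probability; the 5% figures below are hypothetical. Real Tor clients keep the same guard for months, so in practice the risk depends heavily on whether your current guard happens to be malicious.

```python
def p_compromised(guard_frac: float, exit_frac: float, circuits: int) -> float:
    """Chance that at least one of `circuits` circuits has BOTH a malicious guard
    and a malicious exit, assuming independent relay selection for every circuit."""
    per_circuit = guard_frac * exit_frac
    return 1 - (1 - per_circuit) ** circuits


# Hypothetical attacker holding 5% of guard and 5% of exit selection weight.
for n in (1, 100, 1_000, 10_000):
    print(f"{n:>6} circuits: {p_compromised(0.05, 0.05, n):.3f}")
```

With these made-up numbers a single circuit is compromised with probability 0.0025, yet over 1,000 circuits the chance that at least one is compromised is about 0.92.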

The attacker runs automated packet analysis on both relays to calculate a correlation coefficient. The most useful variables are timing, packet size and frequency. Although this information gives the attacker a pretty good idea of which website you are visiting, the huge size of the Tor network produces many false positives.
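A toy version of that computation in Python, purely illustrative: it bins packet timestamps seen at the entry and at the exit into fixed windows and computes the Pearson correlation coefficient between the two count series, trying a few candidate latency offsets to line them up. The traffic is simulated, and every name and parameter here (window size, latency, burst model) is an assumption; real attacks use many more features, such as packet sizes and directions, and stronger statistics.

```python
import random


def bin_counts(timestamps, window, n_bins):
    """Count packets falling into each time bin of width `window` seconds."""
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // window)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no timing information
    return cov / (vx * vy) ** 0.5


def best_offset_correlation(entry_times, exit_times, window, n_bins, offsets):
    """Highest correlation after undoing each candidate latency offset (seconds)."""
    e = bin_counts(entry_times, window, n_bins)
    return max(
        pearson(e, bin_counts([t - d for t in exit_times], window, n_bins))
        for d in offsets
    )


def simulate_flow(rng, duration=60.0, bursts=20):
    """Hypothetical bursty flow: packets clustered around random burst start times."""
    times = []
    for _ in range(bursts):
        start = rng.uniform(0, duration)
        times.extend(start + rng.expovariate(50) for _ in range(rng.randint(5, 40)))
    return sorted(times)


rng = random.Random(1)
entry = simulate_flow(rng)
exit_same = [t + 0.2 + rng.uniform(0, 0.05) for t in entry]  # same flow + latency
exit_other = simulate_flow(rng)                              # an unrelated user

WINDOW, BINS = 1.0, 70
OFFSETS = [0.05 * k for k in range(11)]  # candidate latencies from 0 to 0.5 s
score_same = best_offset_correlation(entry, exit_same, WINDOW, BINS, OFFSETS)
score_other = best_offset_correlation(entry, exit_other, WINDOW, BINS, OFFSETS)
print(f"same flow: {score_same:.2f}, unrelated flow: {score_other:.2f}")
```

The bursty simulated flow is the easy case; flows that look like everyone else’s produce scores much closer together, which is where the false positives come from.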

The exact false-positive rate varies greatly with the kind of traffic you generate. For example, the easiest target is someone downloading files, because there are many sizeable packets to compare. Someone who is simply browsing a website looks like thousands of other users, so the chance of false positives increases.

According to this paper, 80% of users can be deanonymized within six months by realistic adversaries. This is not proof in court because of possible false positives (ranging from 5-10% depending on the correlation algorithm), but it provides enough suspicion to start further monitoring.
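Why isn’t a match proof? A small worked example helps. Suppose (an assumption; the 5-10% figure may be defined differently in the paper) that each comparison between the target’s exit traffic and one innocent user’s entry traffic wrongly matches with probability `fpr`, independently across observation rounds. The helper name and the numbers are hypothetical.

```python
def expected_false_matches(candidates: int, fpr: float, rounds: int = 1) -> float:
    """Expected number of innocent users who match in every one of `rounds`
    independent comparisons, given a per-comparison false-positive rate `fpr`.
    One of the `candidates` is assumed to be the real target."""
    return (candidates - 1) * fpr ** rounds


for rounds in (1, 2, 3):
    print(rounds, round(expected_false_matches(1_000, 0.05, rounds), 3))
```

With 1,000 candidates and a 5% rate, one round flags roughly 50 innocent users alongside the real one, which is suspicion rather than proof. Requiring a match in two or three independent rounds (the “further monitoring”) drops that to roughly 2.5 and 0.12.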

It’s very likely that the Carnegie Mellon University attack on the Tor network was indeed a correlation attack. The information about Tor users was then reportedly sold to the FBI for $1 million. At the time (early 2014), the attacker’s relays could confirm their suspicion with near certainty by adding an arbitrary marker to the traffic at one end and checking for it at the other. This was quickly patched, but correlation attacks are still not prevented.
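Adding a marker and checking for it at the other end is an active confirmation attack, which is why it removes the guesswork of statistical correlation. The Tor Project’s July 2014 advisory described the real attack as encoding the signal in “relay early” cells. The Python sketch below is a deliberately simplified toy: it assumes each cell carries a one-bit flag that a malicious relay can set without breaking delivery, and `tag_cells`, `read_tag` and `MARKER` are hypothetical names, not Tor’s actual cell format.

```python
MARKER = "10110010"  # hypothetical tag chosen by the attacker


def tag_cells(cells, marker):
    """Malicious relay at one end: encode `marker` into the flag bit of the first cells.

    Returns modified copies; payloads are untouched, so delivery still works.
    """
    tagged = [dict(cell) for cell in cells]
    for cell, bit in zip(tagged, marker):
        cell["flag"] = int(bit)
    return tagged


def read_tag(cells, length):
    """Colluding relay at the other end: read the flag bits back as a string."""
    return "".join(str(cell["flag"]) for cell in cells[:length])


honest_stream = [{"flag": 0, "payload": i} for i in range(16)]
marked_stream = tag_cells(honest_stream, MARKER)

print(read_tag(marked_stream, len(MARKER)) == MARKER)  # True: same circuit confirmed
print(read_tag(honest_stream, len(MARKER)) == MARKER)  # False: untouched stream
```

Closing the channel that carried the marker stops this active variant, but not the passive timing correlation, which needs no marker at all.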

This attack was the downfall of many websites and their users, including Silk Road 2.0 and two child pornography sites.

The good news is that Tor contributors are well aware of this attack. The Tor Project is already working on techniques that make website fingerprinting attacks less effective.

You shouldn’t be concerned about these attacks if you’re using a trusted VPN to connect to the Tor network, because the attack will then yield not your IP address but the VPN server’s. Be aware that all VPN providers must obey the laws of the country they reside in, and most countries require ISPs (including VPNs) to keep logs of all user activity for a period of time (usually around 2 years) and to hand that information over if a court issues a warrant. Even if a VPN provider resides in a country with no such laws, it might be selling your information. Thankfully, deepdotweb offers great advice on choosing the right VPN.

Before you comment “VPN + Tor sucks”, read what the Tor developers have to say on this topic. Using a VPN has both benefits and downsides; I recommend using one because it protects you from this particular attack.

In my opinion, the quality of the VPN is all that matters. If it logs your data, it will only make government agencies wait for a warrant, and it will sell that data to anyone who offers money, too. On the other hand, a no-log VPN can be invaluable.

P.S. I believe all VPNs keep logs – why wouldn’t they? You can’t verify it anyway. And I can’t persuade myself that they would refuse money for my identity either. At least some VPNs don’t have to give up our identity to law enforcement agencies, which is nice.



  1. This is why Tor bridges with anonymous, public Wi-Fi access points are so very critical, along with good backend encryption, such as a USB hardware encrypted stick with Tails persistent storage, and within that, a TrueCrypt volume.

    • Wrong on so many levels, it’s hard to tell where to start. TrueCrypt: hopefully you’re at least using the second-most-recent version rather than the very last one, which was posted just as the developer went AWOL. A binary analysis of the newer one showed *significant* differences. Something really fishy there. Now, let’s gloss over that and assume it never happened. Reality: it is an abandoned project and has been for years. That means no patches for bugs, no hardening against newer attacks on its implementation, etc. Its obsolete nature now makes it a completely untrustworthy program. I ‘think’ it’s pretty common knowledge by now that the math of modern encryption isn’t getting brute-forced anytime in the near future. BUT, *how* that math is implemented in software is critically important and is the primary (not counting researchers, probably the only) angle of attack.

      Now, a ‘USB encrypted stick with blah, blah, blah’ won’t do you any good when it comes to LE correlating your location. Or do you drive half-way across the country every time you use the Internet? *Most* of the busts we hear about send everyone into an ignorant frenzy of “omg, I’m going to air-gap my USB, increase to 65,536-bit keys,” etc. This is a symptom of people not knowing how a particular technology truly works. I mean truly. They always overcompensate. If you can grab a spec or RFC and have a solid grasp of a programming language (no, “web development” does not count), say, enough to do a reasonably complete implementation of the spec/RFC, then you can say you know the technology. Otherwise, you sound like a parrot squawking back what you’ve heard other overreacting individuals spew forth. You truly gave yourself away when you topped it all off with “hardware encrypted stick…and within that a truecrypt volume.” First, you aren’t really getting anything out of that. The best insider info we’ve ever had into the big N’s capabilities is Snowden, who has already said encryption works. Period. Not ‘double-triple-extra-secret-squirrel encryption.’ Like I said before, no one’s really wasting their time on breaking the math any more. They’re going after flaws in the implementation. Or stupid user errors. $20 says you use the same password for the TC volume as you do for your ‘hardware encrypted stick.’ Guess what? You may as well not be using a TC volume (pointless anyway), since if your ‘hardware encrypted stick’ is cryptographically broken, so is your TC.

      The reality is that only 1 or 2 busts that we’re aware of had anything to do, primarily, with the technology alone. Shit, it wasn’t technology that first clued in the LE types. Almost all busts started out as regular investigations and continued as regular investigations…just adding in some tech to help seal the lid on the person’s coffin. While it is important, depending on your use case, to take certain actions, you’re far, far, far more likely to get popped driving around in your car wardriving WiFi hotspots so you can do your dirt. Encrypt all you want, you’ll look suspicious as hell and they’ll find a reason to detain you and see what else you’re up to. Plus, there’s always rubber-hose cryptanalysis (where you’re beaten/tortured into giving up your keys). If you’re in the US, they won’t quite go that far (hopefully; yet), though they’ll happily let you rot in a jail cell for contempt of court for refusing to do so. The courts here have, so far, come down on the side of the accused in treating compelled key disclosure as self-incrimination. But who’s to say how long that will last?

      Rather than all this teenage-prone ‘more and faster encryption…then more and faster again…and jump through these hoops…buy what the know-it-alls are telling me to buy to stay safe’ crap, you and others should get off your asses and do something truly helpful: vote. Get others to vote. Spread the word about the death of personal liberty if we don’t curtail this police state shit.

      Tor bridges help with censorship, NOT anonymity. They do so only by virtue of the fact that they’re not well known and thus not on everyone’s black/watch-list. There’s nothing special about them. The ability to fingerprint Tor traffic (not to crack/read it, just to know that it’s the protocol a person is using) is already in common use among ISPs, the gov’t, Cloudflare, etc.

      • Anonymous

        Your analysis simply isn’t borne out by the facts. The fact is that the FBI, with its NIT technology, has only been able to unmask a tiny fraction of users visiting “illicit” websites. Try this: scroll up and down while carefully looking at the list to your right. Why are those sites still online? Some have been up for a few years now. Don’t you think the FBI/LE/TLAs would love to take those sites down??? Or are they all government stooges?

        As for multiple layers of encryption, you’re mixing apples & oranges. A hardware encrypted USB stick uses a PIN, which you must enter before the hardware will unlock itself. A Tails persistent storage volume, using LUKS, will use an alphanumeric password. Ditto for TrueCrypt, but with that you also have the option of using keyfiles. Why TrueCrypt and not VeraCrypt? Because TrueCrypt volumes can be mounted in the Debian-based Tails environment. Just check their documentation for more details.

        The principle here is Defense-in-Depth. A single layer of encryption is, as you pointed out, almost certainly sufficient; however, there have been examples of LE (in Britain) cracking a PGP disk only to find another encrypted file container within that, and so, it can and does happen.

      • Anonymous

        Shush, can we be friends?

  2. Something similar, reddit manipulation:


  3. as if any of that data is definitive enough to hold up as legit.. smh..

  4. So do we know how the FBI identified the Playpen server? CMU data? TorMail? Plain bad OpSec?

    • Filip Jelic

      Law enforcement agencies never reveal that information, but since Playpen was taken down under Operation Pacifier, which was 1 month after CMU sold data to the FBI, it was probably a correlation attack by CMU.

      • Filip Jelic

        I used timing correlation to conclude that haha :)

      • Yes, that’s what I’m guessing, based on timing. I suspect that the FBI did disclose that in the root Playpen case, but undoubtedly under seal. It’s also possible that the whole thing is parallel construction, based on NSA intercepts. They saw CMU’s attack on Tor, and knew what it found, and then tipped off the FBI. So the FBI subpoenas CMU, and the rest is history. Too paranoid?

  5. If you rent a VPS with cryptocurrency and run your own OpenVPN server with an entry node, how would they unmask you unless they co-opted your VPS’s OS?

  6. Thanks for this information, I am buying a VPN.

  7. Tails makes you more secure.

  8. So would it be a good idea to set up your own Tor node and always use it as the first hop (use its IP as if it were a bridge)?

  9. What about fingerprinting attacks on encrypted traffic with classifiers like naive Bayes?

    Which is mentioned here

    Here is an experiment which gives a really bad result for VPN single-hop systems.

