Tor is by far the most widely used anonymity network, and it is increasingly being exploited by users who host and publish content via hidden services. In most cases, Tor’s hidden services are used to distribute content that is illegal or morally deplorable, e.g. child pornography. Law enforcement agencies (LEAs) are constantly trying to identify the users hosting and distributing such content. Techniques for deanonymizing Tor’s hidden services are mainly based on active and passive network traffic analysis, and depend on the ability of the deanonymizing entity to take control of Tor’s edge communication. Apart from this, locating Tor’s hidden services and linking illegal content, such as videos and photos, to those hosting it is a controversial and difficult task.
Deanonymization of Tor hidden services and identification of the real destination of a network traffic flow have been extensively studied during the past few years. A myriad of network flow watermarking algorithms have been proposed as a means of tracing hidden services back to their origin. The majority of these algorithms alter packet timestamps to impress a specific timing pattern on the network traffic flow. In this paper, we take a look at some of the most recent network flow watermarking algorithms.
Inflow – the most recent watermarking algorithm:
A recently published paper proposed “Inflow”, a novel technique to locate hidden services by means of inverse flow watermarking. Inflow exploits the effect of congestion mechanisms on traffic flowing through the Tor network. Packet dropping influences Tor flow control and leads to time gaps in the network flows detected on the hidden server side. By taking control of the communication edges and identifying these watermarking gaps, Inflow is capable of detecting the origin of the Tor hidden server. Testing Inflow over the live Tor network yielded success rates between 90% and 98%.
RAINBOW – watermarking via packet timing:
RAINBOW is an example of a watermarking algorithm based on packet timing, which delays each packet by a computed amount. The delay equals the output of a cumulative function that evolves randomly, stepping up or down by a specific watermark value for each packet. RAINBOW’s detection algorithm compares the flow’s interpacket delays (IPDs) recorded before watermarking with the IPDs of the flow intercepted by the detector.
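The cumulative delay and the IPD comparison can be illustrated with a small offline simulation on synthetic timestamps. This is only a sketch under stated assumptions, not the published RAINBOW implementation: the function names, the amplitude of 10 ms, the Gaussian jitter model and the normalized correlation score are all chosen here for illustration.

```python
import random

def ipds(timestamps):
    """Interpacket delays of a flow, given its packet timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def embed(timestamps, amplitude, key):
    """Delay each packet by a running level that steps by +/- amplitude."""
    rng = random.Random(key)
    steps = [rng.choice((-amplitude, amplitude)) for _ in timestamps]
    levels, level = [], 0.0
    for step in steps:
        level += step
        levels.append(level)
    base = -min(levels)  # shift so every delay is >= 0 (packets can only be held back)
    delays = [lv + base for lv in levels]
    marked = [t + d for t, d in zip(timestamps, delays)]
    return marked, steps

def detection_score(recorded_ipds, observed, steps, amplitude):
    """Correlate the IPD change with the watermark steps.

    Roughly 1 for a marked flow and roughly 0 for an unmarked one.
    """
    change = [o - r for o, r in zip(ipds(observed), recorded_ipds)]
    corr = sum(c * s for c, s in zip(change, steps[1:]))
    return corr / (amplitude ** 2 * len(change))

# Synthetic flow (assumption): 500 packets with IPDs between 50 and 150 ms.
rng = random.Random(1)
flow, t = [], 0.0
for _ in range(500):
    t += rng.uniform(0.05, 0.15)
    flow.append(t)

def jitter(ts, sigma=0.005):
    """Hypothetical network jitter added between watermarker and detector."""
    return [x + rng.gauss(0.0, sigma) for x in ts]

marked, steps = embed(flow, amplitude=0.01, key=42)
score_marked = detection_score(ipds(flow), jitter(marked), steps, 0.01)
score_unmarked = detection_score(ipds(flow), jitter(flow), steps, 0.01)
print(score_marked, score_unmarked)
```

Because the detector subtracts the recorded IPDs, the flow's own timing cancels out and only the watermark steps plus jitter remain, which is why a small amplitude can still be detected in this toy setting.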
Another proposed watermarking algorithm is also based on IPDs. The algorithm selects two groups of randomly chosen pairs of consecutive packets and computes the IPD for every pair within each group. In an unmarked flow, the average IPDs of the two groups are expected to be statistically equal. The watermarking technique slightly modifies the IPDs so that the difference between the two averages is no longer zero. The size of the two groups provides a form of redundancy and determines the reliability of detection.
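A minimal sketch of this two-group idea, run on synthetic timestamps, might look as follows. The keyed selection of disjoint packet pairs, the group size `r`, the 30 ms delay and the rule that the sign of the difference encodes one bit are assumptions made for illustration; the original scheme may differ in these details.

```python
import random
from statistics import mean

def choose_groups(n_packets, r, key):
    """Pick 2*r disjoint pairs (i, i+1) with a keyed RNG and split them in two groups."""
    rng = random.Random(key)
    pair_starts = rng.sample(range(0, n_packets - 1, 2), 2 * r)
    return pair_starts[:r], pair_starts[r:]

def embed_bit(timestamps, bit, r, key, delay):
    """Lengthen the IPDs of one group by delaying the second packet of each pair."""
    group_a, group_b = choose_groups(len(timestamps), r, key)
    target = group_a if bit == 1 else group_b
    marked = list(timestamps)
    for i in target:
        marked[i + 1] += delay  # must stay below the following IPD to keep packet order
    return marked

def detect_bit(timestamps, r, key):
    """Decode the bit from the sign of the difference between the group averages."""
    group_a, group_b = choose_groups(len(timestamps), r, key)
    mean_a = mean(timestamps[i + 1] - timestamps[i] for i in group_a)
    mean_b = mean(timestamps[i + 1] - timestamps[i] for i in group_b)
    return 1 if mean_a - mean_b > 0 else 0

# Synthetic flow (assumption): 1000 packets with IPDs between 50 and 150 ms.
rng = random.Random(3)
flow, t = [], 0.0
for _ in range(1000):
    t += rng.uniform(0.05, 0.15)
    flow.append(t)

for bit in (0, 1):
    marked = embed_bit(flow, bit, r=200, key=7, delay=0.03)
    print(bit, detect_bit(marked, r=200, key=7))
```

A larger `r` averages out more of the natural IPD variation, which is the redundancy effect described above: the same per-packet delay becomes easier to detect as the groups grow.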
Interval centroid based watermarking:
Another technique is known as “interval centroid based watermarking”. With this method, the time axis is divided into intervals of fixed duration “T”. For each interval, a centroid is computed as the average of the remainders obtained when the timestamps of the packets observed in that interval are divided by T. The embedding algorithm delays a percentage of the flow’s packets so as to alter the statistical balance that exists among groups of intervals. Watermark detection therefore relies on statistical analysis of the interval centroids.
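The centroid computation and its use as a watermark carrier can be sketched on synthetic data as below. The squeeze function that pushes packet offsets from [0, T) into [a, T), the random split of intervals into two groups and all parameter values are illustrative assumptions, not details taken from a specific published scheme.

```python
import random
from statistics import mean

def centroids(timestamps, T):
    """Centroid of each interval: mean of (t mod T) over the packets it contains."""
    buckets = {}
    for t in timestamps:
        buckets.setdefault(int(t // T), []).append(t % T)
    return {k: mean(v) for k, v in buckets.items()}

def split_intervals(n_intervals, key):
    """Keyed random split of the interval indices into two equally sized groups."""
    rng = random.Random(key)
    idx = list(range(n_intervals))
    rng.shuffle(idx)
    half = n_intervals // 2
    return set(idx[:half]), set(idx[half:2 * half])

def embed_bit(timestamps, bit, T, a, n_intervals, key):
    """Delay packets in one group so that their offsets fall into [a, T)."""
    group_a, group_b = split_intervals(n_intervals, key)
    target = group_a if bit == 1 else group_b
    marked = []
    for t in timestamps:
        k = int(t // T)
        if k in target:
            offset = t % T
            t = k * T + a + (T - a) * offset / T  # never earlier than the original offset
        marked.append(t)
    return marked

def detect_bit(timestamps, T, n_intervals, key):
    """Compare the average centroid of the two interval groups."""
    group_a, group_b = split_intervals(n_intervals, key)
    c = centroids(timestamps, T)
    mean_a = mean(c[k] for k in group_a if k in c)
    mean_b = mean(c[k] for k in group_b if k in c)
    return 1 if mean_a > mean_b else 0

# Synthetic flow (assumption): 4000 packets spread uniformly over 200 seconds.
rng = random.Random(5)
flow = sorted(rng.uniform(0.0, 200.0) for _ in range(4000))

for bit in (0, 1):
    marked = embed_bit(flow, bit, T=1.0, a=0.3, n_intervals=200, key=11)
    print(bit, detect_bit(marked, T=1.0, n_intervals=200, key=11))
```

For roughly uniform arrivals an untouched interval has a centroid near T/2, while a squeezed interval has one near (T + a)/2, so the two group averages separate by about a/2.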
In interval packet counting based watermarking techniques, the time axis is likewise divided into intervals. Here the number of packets within each interval serves as the watermark carrier, and some of the flow’s packets are delayed to alter the statistical balance of the packet counts per interval.
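Using packet counts as the carrier can be sketched in the same simulated setting. The choice to empty selected even-numbered intervals by delaying their packets by exactly one interval length, and to encode a single bit by which group is emptied, is an assumption made to keep the example short.

```python
import random
from collections import Counter

def split_even_intervals(n_intervals, key):
    """Keyed split of the even interval indices into two equally sized groups."""
    rng = random.Random(key)
    evens = list(range(0, n_intervals - 1, 2))
    rng.shuffle(evens)
    half = len(evens) // 2
    return set(evens[:half]), set(evens[half:2 * half])

def embed_bit(timestamps, bit, T, n_intervals, key):
    """Empty one group of intervals by delaying their packets into the next interval."""
    group_a, group_b = split_even_intervals(n_intervals, key)
    to_clear = group_a if bit == 1 else group_b
    marked = [t + T if int(t // T) in to_clear else t for t in timestamps]
    return sorted(marked)  # delayed packets interleave with those of the next interval

def detect_bit(timestamps, T, n_intervals, key):
    """Compare the total packet counts observed in the two interval groups."""
    group_a, group_b = split_even_intervals(n_intervals, key)
    counts = Counter(int(t // T) for t in timestamps)
    total_a = sum(counts[k] for k in group_a)
    total_b = sum(counts[k] for k in group_b)
    return 1 if total_a < total_b else 0

# Synthetic flow (assumption): 4000 packets spread uniformly over 200 seconds.
rng = random.Random(9)
flow = sorted(rng.uniform(0.0, 200.0) for _ in range(4000))

for bit in (0, 1):
    marked = embed_bit(flow, bit, T=1.0, n_intervals=200, key=13)
    print(bit, detect_bit(marked, T=1.0, n_intervals=200, key=13))
```

Only even intervals are ever emptied, so the odd interval that follows each one is free to absorb the delayed packets without disturbing another carrier interval.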
Timing is not the only feature that can be used as a carrier for watermarking. Bit rate and packet size are two other traffic features that have attracted attention for network flow watermarking algorithms. However, size based watermarks have to be embedded directly at the traffic flow source, while rate based watermarks are easily detected by third parties.
DROPWAT – invisible watermarking algorithm:
DROPWAT is another watermarking algorithm, with two main features that set it apart from existing network traceback solutions. First, it is based on a novel paradigm that impresses a watermark on the network traffic flow by taking advantage of the network’s reaction to packet loss. Second, the watermark embedded by DROPWAT is entirely invisible to the adversary. DROPWAT remains highly efficient even when the traffic is routed via proxy servers.