The art of hiding secret messages in innocuous-looking objects, known as steganography, was introduced in a DDW article last month, which featured a tutorial on a useful tool that embeds data (usually text messages) into JPEG images without making it obvious to the naked eye.
Today I would like to discuss steganography’s counterpart, steganalysis: the art of uncovering hidden communication and exposing the use of steganography. Steganalysis is not to be confused with the better-known cryptanalysis, which focuses on decrypting a message that has already been intercepted. When performing steganalysis you do not know where to look, because you cannot identify the channel of communication. It could be a stego image (an image carrying a hidden message) uploaded to Facebook, or a torrent of a Star Wars film (which could carry a massive amount of hidden data, even an entire short video)… the possibilities are endless.
In 2001, well-known security researchers Niels Provos and Peter Honeyman released a study in which a distributed computing framework was used to analyse two million images uploaded to eBay and one million from USENET archives. Sadly, no hidden messages were found. However, that study was carried out 15 years ago; since then, steganography tools have become far more widely available and steganalysis methods have become a lot more sophisticated.
It is important to note that steganalysis only decides whether or not a particular object contains a payload, i.e. hidden information. A particular steganography tool or method is considered broken if it can be steganalysed with a success rate higher than 50%, i.e. better than random guessing on a test set with equal numbers of clean and stego images. Of course, researchers are trying to push detection performance far beyond 50%! Rémi Cogranne from UTT published a research paper in 2014 claiming a detection rate for JPEG image steganography of between 80 and 93 percent, depending on the amount of data hidden in the image. Just like at airport security screening, it is easier to detect someone trying to smuggle a machete than a penknife.
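To make the 50% threshold concrete: on a balanced test set, a detector that flips a coin is right half the time, so a result only counts as "better than guessing" if the accuracy is clearly above 0.5. The minimal Python sketch below (the function names are hypothetical, not from any steganalysis tool) computes a detector's accuracy and a one-sided binomial p-value, i.e. the probability that pure coin-flipping would score at least as well. A small p-value suggests the detector is not just lucky.

```python
import math
from typing import Sequence


def detection_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of images the detector labelled correctly (True = stego image)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def chance_p_value(correct: int, total: int) -> float:
    """One-sided probability that random guessing (p = 0.5 per image)
    gets at least `correct` out of `total` balanced test images right."""
    hits = sum(math.comb(total, k) for k in range(correct, total + 1))
    return hits / 2 ** total
```

For example, 9 correct answers out of 10 is 90% accuracy, yet `chance_p_value(9, 10)` is about 0.011, so results on a handful of images are weak evidence; the same accuracy on thousands of images would be overwhelming.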
Rémi and Niels are far from the only researchers focusing on steganalysis, and new white papers are published constantly on image, video and audio steganalysis. Unfortunately, easy-to-use steganalysis tools are hard to come by or out of date: a search for ‘steganalysis’ on GitHub.com currently returns a dire list of just 38 projects. That said, there is no need to lose hope, because in the steganography world there are lots of high-quality, easy-to-use tools, such as Steghide, JSteg and JPHS, to name just a few that handle JPEG embedding.
Now that we have a better understanding of steganalysis, let’s look at how we might actually go about finding hidden data in images.
- Choosing where to look
In order to steganalyse images, we first need to get hold of them. People using steganography will try to make it hard for outsiders to find out exactly who they are communicating with. To achieve this, they might upload a stego image to a public social network or image-sharing site. Let’s choose twitter.com as our target.
- Crawling the target
Because browsing Twitter and downloading random images by hand would take forever, we employ a web crawler (also known as a scraper): a computer program that browses, or ‘crawls’, through web pages systematically. Crawling is usually performed by search engines for web indexing, but anyone can write their own crawler to perform a custom task. We could create a crawler using the Ruby programming language in combination with an HTML parsing library such as Nokogiri to crawl Twitter and download any JPEG image files it encounters. We could also point our crawler at, say, all followers of @torproject, on the assumption that those people are more likely to use steganography. Our crawler should have no problem downloading all of these people’s image uploads, which could easily run into the millions.
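The crawler described above suggests Ruby with Nokogiri; to keep all examples in one language, this sketch swaps in Python's standard-library `html.parser` instead. It covers only the link-extraction step: given the HTML of one page and that page's URL, it returns the JPEG image URLs found in `<img>` tags. The class and function names are hypothetical, and judging "JPEG" by the `.jpg`/`.jpeg` extension is a simplifying assumption (some sites serve JPEGs from URLs without one).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class JpegLinkCollector(HTMLParser):
    """Collect absolute URLs of <img> tags whose path ends in .jpg or .jpeg."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.jpeg_urls: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        url = urljoin(self.base_url, src)  # resolve relative links against the page URL
        # Check only the path, so query strings such as ?size=large do not matter.
        path = urlparse(url).path.lower()
        if path.endswith((".jpg", ".jpeg")) and url not in self._seen:
            self._seen.add(url)
            self.jpeg_urls.append(url)


def extract_jpeg_urls(html: str, base_url: str) -> list[str]:
    """Return the JPEG image URLs found in one HTML page, in document order."""
    collector = JpegLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.jpeg_urls
```

A full crawler would wrap this in a loop that fetches pages (for example with `urllib.request.urlopen`), queues further pages to visit, saves each JPEG and rate-limits its requests. Pages that build their content with JavaScript will not expose their images to a plain HTML parser like this one.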
- Running a steganalysis tool
Once the crawler has finished its job, it is time to find a suitable steganalysis tool. As most images on the web are in JPEG format, we are going to choose a tool that specializes in JPEGs. It is always a good idea to use multiple tools based on different steganalysis techniques and to combine their output for a more reliable result. Binghamton University’s Digital Data Embedding Laboratory has published state-of-the-art JPEG steganalysis tools; however, they need to be combined with a machine-learning framework, and the ones we have tested unfortunately only work on greyscale images.
An up-to-date, ready-to-use JPEG steganalysis tool would surely be welcomed by many enthusiasts.
- Extract hidden messages from suspicious images
Once we have identified the images that are likely to carry hidden information (known as stego images), we will try to extract the hidden message. Because there are different mechanisms for embedding a message into an image, we need to try all of them (or at least the most popular embedding techniques). Additionally, the message we recover is likely to be encrypted and may need to be cracked, for example with a dictionary attack (a script that tries out millions of likely passwords).
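Trying several extraction tools and cracking the passphrase both reduce to a nested loop, sketched below in Python. The sketch assumes each tool is wrapped as a function that takes an image path and a candidate password and returns the payload bytes, or `None` on failure. `steghide_extractor` is one such hypothetical wrapper around the `steghide` command line, which is assumed to be installed; it passes the `-sf`, `-p`, `-xf`, `-f` and `-q` options and assumes that a non-zero exit status means the password was wrong.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Optional

# An extractor takes (image path, candidate password) and returns the payload or None.
Extractor = Callable[[Path, str], Optional[bytes]]


def steghide_extractor(image: Path, password: str) -> Optional[bytes]:
    """Hypothetical wrapper: try one password with the steghide CLI (assumed on PATH)."""
    out_file = image.with_suffix(".extracted")
    result = subprocess.run(
        ["steghide", "extract", "-sf", str(image), "-p", password,
         "-xf", str(out_file), "-f", "-q"],
        capture_output=True,
    )
    # Assumption: a non-zero exit status means the password (or image) was wrong.
    if result.returncode != 0 or not out_file.exists():
        return None
    return out_file.read_bytes()


def dictionary_attack(
    image: Path,
    extractors: dict[str, Extractor],
    wordlist: Iterable[str],
) -> Optional[tuple[str, str, bytes]]:
    """Try every password with every extraction tool; return the first hit."""
    passwords = list(wordlist)  # materialise once so each extractor sees every word
    for tool_name, extract in extractors.items():
        for password in passwords:
            payload = extract(image, password)
            if payload is not None:
                return tool_name, password, payload
    return None
```

Wrappers for other embedding tools would be added to the same `extractors` dictionary. Spawning one process per password is slow for million-entry wordlists, so a real attack would parallelise the inner loop; the overall structure stays the same.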
The art of steganography is far from mature and most people haven’t even heard of it. However, monthly downloads of Steghide (arguably the most popular tool for image steganography) more than doubled between October and November 2016, from 3,237 to 7,479, possibly due to increased fear of surveillance following the US election results (37% of downloads originate from the US, according to sourceforge.net). Naturally, an increase in steganography activity will spark further interest in steganalysis, so we can look forward to more developments and stego tools in the near future.
babysnoop – @babysn00p