The TORminator

by OSIN

In September, 2007, a Swedish security researcher revealed that he was able to capture unencrypted data by operating exit nodes on the Tor network.

With his setup, he was able to sniff usernames and passwords of accounts used by embassy officials and corporate personnel.

For those of us who have used Tor for years, this is not Earth-shattering news.  In fact, the Tor web site goes out of its way to remind users that Tor should not be relied upon for strong anonymity.  Even though communication among Tor nodes is encrypted, the connection is no longer automatically encrypted once you leave the exit node to visit an external web site.  This means that if you log into an account that is not using SSH or HTTPS, your traffic can be sniffed.

As Tor becomes more known to the general public, you can bet that there will be others who will exploit users' lack of understanding.  But this article is not about the finer details of Tor.

A couple of years ago, I surmised that if the NSA wanted to wiretap Internet traffic en masse, they might do so at the Internet Exchange Points (IXPs).  You can read more on that project at uk.geocities.com/osin1776/.

Briefly, an IXP is a place where many different ISPs exchange Internet traffic.

Of course, any NSA role in IXP traffic sniffing is speculation on my part, but I doubt that entities such as the FBI, DEA, NSA, and DoD would totally ignore such an easy access point.  Also, I doubt that those same entities could fail to notice Tor.  Tor attracts a lot of hacker-types, but now that the general population is starting to notice the system, it will inevitably attract elements of criminal activity.

The events of September 2007 spurred a question in my mind: who is running the Tor exit nodes, and where are they located?  More specifically, could any of the Tor router IPs be associated with or located at the same address as an IXP?

Isn't it a huge leap to assume that any Tor router could be associated with the U.S. Government?

I may not have definitive evidence that any particular Tor router is associated with the NSA, but if we use some common sense, we might come up with some possibilities.

For instance, a search on the Internet shows the Verizon/MCI behemoth currently holds the largest telecommunications contract with the U.S. Federal Government.  When I did a quick, sampled search of the Tor file which holds a listing of the routers and their IPs, one name stood out as I checked the addresses against ARIN's records: Verizon Internet Services.

Before I go on, let's talk about what is, for our purposes, the most important of Tor's files.

When you first start Tor, it builds a listing of all participating Tor nodes.  This file is called cached-routers, or, in newer versions, cached-descriptors.

Suppose that you are logged in as user and you start Tor.  Then, the cached-routers file will be placed in your home directory; in this example, it will be called: /home/user/.tor/cached-routers

If you take a look in that file after stopping Tor, you'll see a lot of information on those nodes.  I don't know what everything means in that file, but this command will give you a list of all the IP addresses participating in the Tor network and the associated names:

$ cat /home/user/.tor/cached-routers | grep "router " > tmp.txt

The space after the word router is important.

If you pipe the output of that grep command to wc -l instead of redirecting it to tmp.txt, you get the number of Tor nodes that were participating at the time you started Tor.  The file is a treasure trove of information such as what OS each node is using and how long each node has been up, but for our purposes, we're only interested in the router line.
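If you would rather have just the node name and IP from each router line, awk can pick out the fields.  What follows is only a rough sketch: the sample cache file is invented for illustration (real descriptors carry many more lines per node), and the function names list_routers and count_routers are mine, not part of any Tor tool.

```shell
#!/bin/sh
# Sketch: pull nickname and IP out of the "router" lines of a Tor cache file.
# The sample data below is made up; point the functions at your own
# cached-routers or cached-descriptors file instead.
cache=$(mktemp)
cat > "$cache" <<'EOF'
router ExampleNodeA 192.0.2.10 9001 0 9030
platform Tor 0.1.2.17 on Linux i686
router ExampleNodeB 198.51.100.7 443 0 80
platform Tor 0.1.2.17 on FreeBSD i386
EOF

# list_routers FILE: print "nickname ip" for lines whose first field is "router"
list_routers() {
    awk '$1 == "router" { print $2, $3 }' "$1"
}

# count_routers FILE: print how many router lines the file holds
count_routers() {
    awk '$1 == "router" { n++ } END { print n + 0 }' "$1"
}

list_routers "$cache"
count_routers "$cache"
```

Matching on the first field does the same job as the trailing space in the grep pattern: it keeps lines that merely contain the word router somewhere else from being counted.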

Getting back to Verizon, we can do a search on ARIN for "Verizon Internet Services" and get a listing of their supposed IP address space.

I say supposed, because ARIN sometimes truncates the results it returns to the browser.

The first entry in ARIN's records for Verizon Internet Services is: 64.222.0.0-64.223.255.255

We could then run these commands to see if any of the Tor nodes fall within this range:

$ cat /home/user/.tor/cached-routers | grep "router" | grep " 64\.222\."
$ cat /home/user/.tor/cached-routers | grep "router" | grep " 64\.223\."

Note that the space before the 64 is needed.
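The two grep commands above work because this range happens to line up on the second number of the address.  For a range that doesn't, comparing the addresses as plain numbers is more general.  Here is a minimal sketch, assuming ordinary dotted-quad input with no leading zeros; the function names ip_to_int and in_range are my own and are not part of parse.pl.

```shell
#!/bin/sh
# Sketch: test whether an IPv4 address falls inside a start-end range.

# ip_to_int A.B.C.D: print the address as a single integer
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# in_range IP START END: succeed if START <= IP <= END
in_range() {
    ip=$(ip_to_int "$1")
    lo=$(ip_to_int "$2")
    hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

# Example with the first Verizon entry from ARIN
in_range 64.222.10.5 64.222.0.0 64.223.255.255 && echo "inside the range"
```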

All these commands I'm running just seem to be screaming, "Script me!"  So, I've created a script which will do some of the leg work for you.  It is a Perl script and can be downloaded from the 2600 code repository.

Let's turn our attention to this script, which is called parse.pl.

The first thing the script does is to set up a .wgetrc file in the home directory of the user you're running as.  This is one of the places you'll have to edit the script for yourself.

Then, you can run the script at the command line like this:

$ ./parse.pl [tor_cache_file] [IP_segments_test_file] [registry]

You must create several directories in the script's working directory before proceeding.  Since Verizon is our example, these directories would be:

verizon/
verizon/ARIN
verizon/RIPE
verizon/APNIC
verizon/LACNIC
verizon/AFRINIC
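A short loop saves some typing here.  This is just a sketch that creates the directories listed above under the current directory; the variable name entity is my choice, so change its value for whatever entity you're testing.

```shell
#!/bin/sh
# Sketch: create one subdirectory per registry for the entity under test.
entity=verizon
for reg in ARIN RIPE APNIC LACNIC AFRINIC; do
    mkdir -p "$entity/$reg"    # -p also creates the parent entity directory
done
```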

There are three variables passed to the script.

tor_cache_file is the location of the cached-routers file.  It is usually in the /home/user/.tor directory of whatever user you're logged in as at the time.

The IP_segments_test_file file lists the major IP segments of an entity, in this case Verizon, that we want to test against the list of Tor routers in the cached-routers file.  On the command line you pass just the entity name (verizon), and the script looks for the file at verizon/verizon.txt.  As I mentioned, ARIN doesn't always return every listing for Verizon, so it might be better to search the entire range matching the first number in the IP addresses of each of Verizon's entries.  Here is the verizon/verizon.txt segments file I created for Verizon:

64\.
138\.
199\.
129\.
130\.
162\.
151\.
141\.
209\.
207\.
68\.
4\.
70\.
71\.
72\.
96\.

It's obvious that Verizon doesn't have all that space, but it behooves you to search all of it, just in case.

The registry variable is either ARIN, APNIC, RIPE, LACNIC, or AFRINIC, in all-capital letters.  As stated earlier, ARIN doesn't return all records all the time.  And it's obvious that Verizon isn't assigned all the address space listed in verizon/verizon.txt.  So we can check that same file against the other registries to see which Tor IPs are located in those registries.

When running this script against Verizon, you would use this command:

$ ./parse.pl /home/user/.tor/cached-routers verizon ARIN

The script uses GNU Wget to make a call to the registry, in this case ARIN, and creates an HTML file for each IP address it tests.

After the script has run, it is trivial to run a command to find out how many Tor routers might be listed as falling under Verizon Internet Services:

$ cat verizon/ARIN/*.html | grep "OrgName:" | grep "Verizon" | wc -l

You can then look at each HTML file to get more information as to what ARIN returned.

What were the results of my test?  I can't say anything conclusive, but Verizon Internet Services is consistently listed as a host of many nodes in the Tor network, usually having 15-25 nodes active at a time.

For all of the IPs I examined which are registered to Verizon Internet Services, ARIN says the address that was entered during registration is: 1880 Campus Commons Dr., Reston, VA 20191.

The interesting thing is that the address above maps just down the road from the location listed for MCI's MAE-East IXP facility at Reston.  In fact, they're both within the same area code.

During my searches, I came across another entity called ThePlanet.com.  This entity had between 15 and 30 nodes active at a time, and all the IPs are listed by ARIN as being near the same address as the Dallas Infomart IXP run by Switch & Data, 1950 Stemmons Freeway, Dallas, TX.

Keep in mind that I have just looked at a very tiny portion of the Tor nodes that participate in the system at one time.

But now I want to look at something else: what countries might be contributing to the Tor system?  Well, I've been engrossed for the past several years in mapping out the IPv4 address space for various countries.  Just a couple of months ago, I finished the Middle East.  You can see the project at uk.geocities.com/osin331.

Using the same concepts as with Verizon, we can scan the cached-routers file to see if any Tor nodes map back to countries we're interested in.  Since most of the Middle East falls under RIPE, that is the registry we'll be hitting.  During one scan, I found that Iran had two nodes in the Tor network; Israel, seven nodes; and the United Arab Emirates, two nodes.

Thus far, I've looked at just a small portion of the total number of Tor routers that are out there.  Wouldn't it be nice if we could get a snapshot of every Tor node out there?  Not being one who can leave well enough alone, I decided to see if it was possible to analyze the entire Tor system at a given point in time.  So, I set out to create a set of scripts that do this very thing for me.  I utilize a MySQL database to store the data, and I update this database roughly every 20 minutes.

First, my system's starter script creates the routers.txt file of all the Tor nodes listed in the cached-routers or cached-descriptors file when Tor first starts up.

Then, for each IP listed, the script first checks to see if that IP is in its database.  If it is, then the DATE_UPDATED field is updated to reflect the current time, allowing that entry to remain on file.

If the IP is not in the database, then the script checks the OrgID returned by ARIN.  The ARIN OrgID lets us know if the IP address in question is assigned by ARIN, or, alternately, which registry we should look in.  If the IP is not in ARIN's jurisdiction, the script will contact the appropriate registry to get the information we need.

The script runs until all IPs are checked.  At the end of the run, old entries in the database are removed, but the historical IP record data gleaned from the registries is kept for future reference to speed up the process.
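The refresh-and-expire logic described above can be sketched with flat files standing in for the two MySQL tables.  This is an illustration only, not the analyzer's actual code: lookup_registry is a hypothetical stub where the real scripts would query ARIN or another registry, and the file names, the run_pass function, and the integer timestamps are all assumptions of mine.

```shell
#!/bin/sh
# Sketch of one analyzer pass, with flat files in place of the MySQL tables.
work=$(mktemp -d)
tor_ips="$work/tor_ips"      # lines: "ip last_seen"  (stands in for tor_ips)
registry="$work/registry"    # lines: "ip org"        (stands in for registry)
: > "$tor_ips"
: > "$registry"
: > "$work/lookups.log"

# Hypothetical stub for the slow registry query; logs each call it receives
lookup_registry() {
    echo "$1" >> "$work/lookups.log"
    echo "EXAMPLE-ORG"
}

# has_ip FILE IP: succeed if IP appears as the first field of FILE
has_ip() {
    awk -v ip="$2" '$1 == ip { f = 1 } END { exit !f }' "$1"
}

# run_pass NOW IP...: refresh known nodes, look up new ones, expire the rest
run_pass() {
    now=$1
    shift
    for ip in "$@"; do
        if has_ip "$tor_ips" "$ip"; then
            # Known node: bump its timestamp so it stays on file
            awk -v ip="$ip" -v now="$now" '$1 == ip { $2 = now } { print }' \
                "$tor_ips" > "$tor_ips.tmp" && mv "$tor_ips.tmp" "$tor_ips"
        else
            # New node: hit the registry only if we never saw this IP before
            if ! has_ip "$registry" "$ip"; then
                echo "$ip $(lookup_registry "$ip")" >> "$registry"
            fi
            echo "$ip $now" >> "$tor_ips"
        fi
    done
    # Expire nodes not seen in this pass; registry data is kept for next time
    awk -v now="$now" '$2 == now' "$tor_ips" > "$tor_ips.tmp" \
        && mv "$tor_ips.tmp" "$tor_ips"
}

run_pass 100 192.0.2.1 192.0.2.2
run_pass 200 192.0.2.2 192.0.2.3
cat "$tor_ips"
```

After the second pass, 192.0.2.1 has dropped out of the node list but its registry record is still on file, so it would cost no lookup if it came back.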

I have been running the above setup for a week now, and something interesting about the Tor network has come to light.  Guess which country is hosting the most Tor nodes?  If you said the United States, you are wrong.  In fact, the country that hosts the most Tor nodes usually hosts more than the U.S., China, Russia, and Great Britain combined.  Which country is it?  Germany.

That result seems to remain consistent no matter how long I run my scripts.

Why Germany hosts so many Tor nodes is beyond me, and the number is surprisingly large.  (Editor's Note: Location of lots of U.S. military/intelligence resources running exit nodes?)

German nodes usually comprise nearly a third of all Tor nodes at any given time.  I'm at a loss to explain why Germans are flocking to Tor.  It might even be that Germans themselves are unaware of this information and that a foreign power is running Tor exit nodes in Germany to circumvent that foreign power's own laws.

I'll leave the conspiracy theories up to the reader, but if someone out there knows why Germany is hosting so many Tor nodes, I'd like to hear it.

parse.pl:

#!/usr/bin/perl 
# usage: ./parse.pl [tor_cache_file] [IP_segments_test_file] [registry] 
#
# Registry is either ARIN, APNIC, RIPE, LACNIC, or AFRINIC
# SET .wgetrc to local proxy to use TOR. 
# Edit this script to support whatever 'user' you are using. 
# You don't have to use TOR but if you do, make sure Privoxy 
# and TOR are both running before executing this script

# If you don't want to use TOR, just comment out the line below 
$cmd=`/bin/echo "http_proxy=127.0.0.1:8118" > /home/user/.wgetrc`; 
 
# Location of the main tor cache file 
$tor_cache_file=$ARGV[0]; 
 
# File with list of IP segments to test 
$entity=$ARGV[1]; 
 
# Which registry we want to search against 
$registry=$ARGV[2]; 
$registry_url=""; 
 
# Note that the entity IP segment file lives in the entity directory!!! 
$entityfile=$entity."/"."$entity.txt"; 

open (ENTFILE,"$entityfile") or die "Cannot open $entityfile: $!\n";
 
while(<ENTFILE>) { 
  # Find out which registry we're running against and set variable
  # Note that the below registry URLs may change without notice
  if ($registry eq "APNIC") {
    $registry_url="http://www.apnic.net/apnic-bin/whois.pl?searchtext=";
  }
  elsif ($registry eq "RIPE") {
    $registry_url="http://www.ripe.net/fcgi-bin/whois?form_type=simple\\&full_query_string=\\&do_search=Search\\&searchtext=";
  }
  elsif ($registry eq "LACNIC") {
    $registry_url="http://lacnic.net/cgi-bin/lacnic/whois?lg=EN\\&query=";
  }
  elsif ($registry eq "AFRINIC") {
    $registry_url="http://www.afrinic.net/cgi-bin/whois?searchtext=";
  }
  else {
    # Anything else falls through to ARIN
    $registry_url="http://ws.arin.net/whois/?queryinput=";
  }
 
chomp(); 
 
# Read in the IP segment we are testing against 
$oct=$_; 
# Cat out the regular tor cache file for the 'router' lines > tmp.txt
# Note the space before $oct - it is needed due to the file format
$cmd=`/bin/cat $tor_cache_file | grep 'router' | grep ' $oct' > tmp.txt`;
open (OCTFILE,"tmp.txt");
while(<OCTFILE>) {
  chomp();
  $line=$_;
  # Split the line out by spaces: "router", node name, IP address
  ($hold, $tor_name, $ip) = split(" ", $line);
  # Now for each IP go out to see if it is in whatever registry passed in
  $cmd2=`/usr/bin/wget -O $entity/$registry/$ip.html $registry_url$ip`;
  # We need to delay since some registries see multiple calls coming
  # from an IP as an attempt to data mine - esp. RIPE.
  # Sleep 5 seconds
  sleep 5;
}
}

Tor Analyzer:

This set of scripts makes up the Tor Analyzer that I've written to give some basic statistics on a Tor network.  It is not and should not be expected to be a real-time analysis.  There is no way for me to provide that analysis given the limited resources to the Internet registries, limited bandwidth, etc.  Only large organizations and government entities can do that.  But my Tor Analyzer will give you some idea as to who is running a Tor node, what country they're operating in, etc.  I am assuming the reader has knowledge in setting up a MySQL database, UNIX/Linux shell scripting, and general systems administration.  These scripts are not intended for people who are first starting out (newbies).  Also, these scripts are meant to run in a UNIX/Linux-type environment.

After you have downloaded the scripts, you should first set up the database.  I use MySQL because it is readily available on most Linux distributions.  Once you have the database started you should log into the MySQL admin console, usually by issuing a command such as mysql -uroot.  Here are the commands that I used to create the Tor database:

create database tor;

use tor;

create table tor_ips (IP VARCHAR(20) NOT NULL,PRIMARY KEY(IP),TOR_NAME VARCHAR(100),DATE_UPDATED DATETIME);

create table registry (IP VARCHAR(20) NOT NULL,PRIMARY KEY(IP),REG_NAME VARCHAR(100),DESCR TEXT,COUNTRY VARCHAR(2),REGISTRY VARCHAR(10),DATE_ENTERED DATETIME,DATE_UPDATED DATETIME);

grant all privileges on tor.* to 'tor'@'localhost' identified by 'torminator' with grant option;

In the above example I use the user tor and the password torminator to set access on that database.  You can use the stats.pl script to test if you're connecting properly.  It should just return 0 stats.

Once you have the database up and running there are two files you need to edit.  The first one is the start_tor_analyzer.sh script.  In that file you'll need to set the tor executable path and other settings.  Assuming you have systems administration experience you should have no problems setting the values in that file.  They're mostly self-explanatory.

The second file is called variables.pl.  Although it has a ".pl" extension it is nothing more than a file that sets the various variables needed by the scripts, such as the database, MySQL machine IP, the user and password to access that database.  Also there are three very important variables that must be set correctly for the analyzer to work.

The first one is called TORCACHEFILE.  This is the name of the cache file that Tor uses to store the nodes that will be analyzed.  This file has a different name depending on the version of Tor you're using.  In my experience it is usually called cached-routers or cached-descriptors.

The second variable is called USERHOMEDIR and should be self-explanatory.  This is the home directory of whatever user you'll be running Tor as.  It is needed because when Tor first starts up it creates a .tor directory under that home directory.  Note that there is no ending / for that variable.

The third variable is called SCRIPTWORKINGDIR and this is the full path directory where the analyzer scripts are located.  This is also the directory where temp files are created.  Note that there is no ending / for this variable as well.

By the way, if you're going to run the start_tor_analyzer.sh script in cron, you will need to copy the variables.pl file to the home directory of whatever user you'll be running it under, or you'll have to edit the top of each of the Perl scripts to reflect the full, absolute path name to the variables.pl file.  Otherwise, the run will fail in cron.

Once all of this is set up you should be able to run the analyzer by running this command: ./start_tor_analyzer.sh

Also, if you don't want to use that script you can just start the analyzer by running this command: ./tor_analyzer.pl, but make sure Tor is running before running the command.  My suggestion is to run the script manually for the first time since it usually takes anywhere from 1-4 hours to complete a fresh run.

A fresh run is slow because every IP has to be looked up at its registry.  After each lookup, the IP is logged to the registry table for future reference, should that IP show up again in the system.  This makes things run much faster after an initial run.  My experience shows that subsequent runs take about 5-20 minutes, but this is highly dependent on the number of Tor nodes at run time.

Most of the scripts are used by tor_analyzer.pl and were not meant to be run by themselves.  However, two scripts are provided for your convenience to query the database.  The first one is called query.pl.  It takes a properly formatted SQL command as input.

Below is an example of running the script from the command line:

$ ./query.pl "Select tor_ips.IP,tor_ips.TOR_NAME,registry.REG_NAME,registry.DESCR,registry.REGISTRY,registry.COUNTRY from tor_ips,registry where registry.COUNTRY='US' and registry.IP=tor_ips.IP"

Just hit Enter and it will return all the records in which IPs are listed for the U.S.

Please note this document assumes the reader has familiarity with SQL.

The second script provided is called stats.pl.  You run it by itself and it outputs some rudimentary statistics on several countries and the registries.  Edit the script as you like.  Don't feel slighted if your country is not listed in the stats.  I just picked the ones that I see more often than others.  For some reason Germany consistently has the most Tor nodes and by a large margin.  I'm not sure why.  Anyway, have fun!
