Stupid WebStat Tricks

by StankDawg

Anyone who has ever maintained a web site has probably used some web stats (short for Web Statistics) program to monitor their site's visitors.

These packages all have various features, layouts, and designs but they all do basically the same thing and that is to gather almost everything out of the log and save you the trouble of scanning through it yourself.  They parse through your web server logs and collect and organize all of that dry, raw text data and put it into a nice, clean, human readable format.  Web statistics packages are plentiful and they serve a great purpose for the webmaster.

What is in a server log anyway?

A web server log keeps track of all of the dates and times of every hit to every item on the site.  Everything that is served up by the web server is logged including pages, style sheets, images, and anything else that is reachable over the web.

The record of each hit contains several fields of information.  This includes the agent (usually the web browser), the OS fingerprint, and the IP address of the requester.
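To make those fields concrete, here is a minimal sketch of pulling them out of a single log line. It assumes the server writes Apache's "combined" log format, which the article does not specify. The sample line, the regular expression, and the parse_hit name are made up for illustration.

```python
import re

# Apache "combined" format (an assumption; your server may log differently):
# host ident user [time] "request" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_hit(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # The request field looks like "GET /path HTTP/1.1"; pull out the path.
    parts = hit["request"].split()
    hit["url"] = parts[1] if len(parts) >= 2 else ""
    return hit

# Made-up sample line, for illustration only.
SAMPLE = ('192.0.2.10 - alice [10/Oct/2004:13:55:36 -0700] '
          '"GET /private/report.html HTTP/1.1" 200 2326 '
          '"http://example.com/links.html" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')

hit = parse_hit(SAMPLE)
print(hit["host"], hit["user"], hit["url"], hit["referrer"])
```

Note that one line gives up the requester's address, the authenticated username (if any), the page requested, the page that linked to it, and the browser and OS string. That is everything a stats package needs, and everything a snoop wants.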

Some stats packages go above and beyond the basics to not only analyze the web logs (which contain IP addresses) but also to see where those addresses resolve.  They can also show you what sites are linking to you.  They may also break down your hits by User-Agent (usually a browser), country, OS version, and lots of other stuff that a webmaster can use to optimize their site.

For example, if your users all use a certain browser, you might put special code in your pages to give extra functionality to that particular browser.

But why would a hacker care about this?

The answer is as simple as thinking of all of the things that are logged by the web server.  Just having the raw logs alone could yield some great footprint information.  You get the same benefits that the webmaster does!  The thing to keep in mind here is that all hits are logged by the web server.  The stats programs will gather them all up, and far, far too many people make these stats publicly available.

Some webmasters actually want their stats exposed for some reason.  They may think that it is some sort of service to their visitors or maybe a way to "show-off" their hits.  What they don't realize is that while showing off their hits, they are also giving a listing of almost every file on their server (or at least the ones that have been visited).  The scary thing is that these visits include not only external visits, but internal visits as well!

You may be wondering what sort of things could possibly be found in someone's boring old stats pages.

Since all internal visits are logged as well as external visits, some things appear that may not have been intended for public consumption.  While webmasters are working on or developing their pages, they are generating hits on those pages.

I have gone to many "under construction" sites only to find that their web stats are working and I can see the complete list of URLs that they are working on!  They certainly didn't mean for them to be public, but they are.  I have entered contests early, joined sites that weren't open for business yet, and tagged guestbooks even when they weren't expecting any guests.

Even if the site is not under construction, they are always working on some pages, somewhere, that are not publicly available yet and these links are picked up by the stats programs.  Some companies use test servers for development and do not move anything to the live server until it is ready.  This is definitely the best practice to avoid having anything "accidentally" go public.

There are many statistics packages out there.

I have tried many of them, from the Analog Stats package to AWStats and everything in between.  We also have a few custom Perl scripts written in-house to "watch the watchers" and see who is looking at what.

For the rest of this discussion, let's focus on Webalizer, which is the most common stats package that I see, as the base for the examples.  It is no more or less vulnerable than any of the others, but it gives us a specific example for these scenarios.

By default, Webalizer logs the top 20 pages visited.  Webalizer can also be configured to provide a link to the entire list of URLs.  The same holds true with the list of referrers.  You may see pages that are listed that you didn't even know, or that you weren't meant to know, existed.  Since you can see the exact pages that are being hit the most, you may find out that some quick redirection is happening and you may find a page that isn't meant to be traveled directly to.  It may have source code in it that was supposed to be hidden, or some configuration data in it that can explain how the site works.  All of this would have been invisible to a user who didn't have access to public web stats.
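To see why unlinked pages surface in a "top pages" table, here is a rough sketch of the kind of tally any stats package has to do. This is not Webalizer's actual code. The top_urls function and the log lines are made up, and the limit of 20 is only there to mirror the figure quoted above.

```python
from collections import Counter

def top_urls(log_lines, limit=20):
    """Count hits per requested path and return the `limit` most frequent."""
    counts = Counter()
    for line in log_lines:
        # In common/combined log format the request is the first quoted
        # field, e.g. "GET /index.html HTTP/1.1".
        pieces = line.split('"')
        if len(pieces) < 2:
            continue
        request = pieces[1].split()
        if len(request) >= 2:
            counts[request[1]] += 1
    return counts.most_common(limit)

# Made-up log lines; note the unlinked /beta/signup.php that only the
# webmaster (on an internal address) has been hitting.
LINES = [
    '192.0.2.1 - - [01/Jan/2005:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.2 - - [01/Jan/2005:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.5 - - [01/Jan/2005:10:01:00 +0000] "GET /beta/signup.php HTTP/1.1" 200 901',
]

for url, hits in top_urls(LINES):
    print(hits, url)
```

The tally has no idea which pages are "public" and which are not. A hit is a hit, so the webmaster's own test page lands in the table right next to the home page.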

One other thing to keep in mind is that when we say "ALL" pages, we really mean "ALL" pages.  This means password protected pages and directories are also logged and therefore reflected on the stats page.  You may not have the password to get into that directory, but you may be able to at least get the username.  Another one of Webalizer's defaults is to log the top 10 users that log into a system account.  If you want into that directory badly enough, it simply becomes a matter of brute force password cracking at this point.

Another interesting thing to keep in mind is the basic general espionage that can be done by looking at a competitor's stats.  It doesn't even have to be a competitor; it can be a friend, an enemy, or a random blogger on the Internet.  You can see which of their pages are the most popular and use that information to your advantage.  Perhaps you decide that all of their hits are going to a certain web application or tool that they make available.  You could write a similar application and try to steal their traffic away and over to your site, if you were so motivated.

You could also see where most of their hits are coming from.  By default (and again, I am only using Webalizer to have a consistent example and these techniques are just as effective with any stats package) Webalizer logs the top 30 referrers in its stats generation.  You can see where all of their hits are coming from and visit those pages to see why.  Maybe they are advertising on a site that you hadn't heard of before and could also be advertising on.  Combined with the duplication of their page or application as mentioned earlier, you could not only copy them, but also steal their own customers away from right under their nose.

On a similar dirty thought, think of the damage that can be done by posting seemingly anonymous comments to that site defaming (which would be libel) your competitor's web site or products.  The potential damage to their reputation could be devastating.  Even thinking less nefariously, you could simply learn where their links are coming from and how, and that simple knowledge could be invaluable.  I am not suggesting that anyone do any of this, but any business owner who is reading this should be concerned at the possibility.

Most people install Webalizer into a directory named /usage, which makes it easy to find on most servers.  Other common places to find installations are /webalizer, /webstats, or just /stats.  You may also find it in a directory with the version number such as /webalizer-2.01-10.
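If you run a site yourself, it is worth checking whether any of those default locations answer on your own server.  Here is a small sketch that just builds the list of URLs to check by hand.  The directory names come from the paragraph above; the function name and the example.com host are hypothetical.

```python
from urllib.parse import urljoin

# Directory names taken from the paragraph above.
COMMON_STATS_DIRS = ["usage", "webalizer", "webstats", "stats"]

def candidate_stats_urls(base_url):
    """Return the default stats locations under base_url, one per directory."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [urljoin(base_url, name + "/") for name in COMMON_STATS_DIRS]

# www.example.com is a placeholder; substitute a site you are responsible for.
for url in candidate_stats_urls("http://www.example.com"):
    print(url)
```

Anything on that list that loads without asking for a password is a candidate for the fixes described later in this article.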

If you don't have a particular target site, or cannot find it on a particular site, then you can find many publicly accessible stats programs on Google by using some Google hacking techniques.  If it wasn't Googled, then maybe it is excluded by the robots.txt file (as mentioned in my "Robots and Spiders" article in the Winter 2003-2004 issue of 2600).

Here is an example of Google hacking for open stats packages.  To find a site using Webalizer, try these exact strings:

"Monthly Statistics for" and 'inurl:"usage"'

https://www.google.com/search?q="Monthly+Statistics+for"+and+'inurl:"usage"'

https://yandex.com/search/?text="Monthly+Statistics+for"+and+'inurl:"usage"'

This combines a literal string from the page and a static part of the string used in the URL.  This URL string is a literal in the code and will not change unless someone has modified the code.  Modifying your code is a practice that I highly encourage and changing a literal value is very easily done.  It will protect you from the default hunters of the world by taking away publicly known literal strings from their search attempts.  Use the same technique and apply it to your stats package of choice.
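If you want to script this kind of search, the quotes and colons need to be URL-encoded first.  The sketch below builds such a URL with Python's standard library.  It uses a simplified form of the query above (a quoted phrase plus a plain inurl: term), and the build_search_url name and its parameters are my own invention, not part of any search engine's API.

```python
from urllib.parse import urlencode

def build_search_url(query, endpoint="https://www.google.com/search", param="q"):
    """Return endpoint?param=<query>, with the query safely URL-encoded."""
    return endpoint + "?" + urlencode({param: query})

# Simplified version of the search string from the article.
QUERY = '"Monthly Statistics for" inurl:usage'

print(build_search_url(QUERY))
# Other engines use a different endpoint and parameter name.
print(build_search_url(QUERY, "https://yandex.com/search/", "text"))
```

Swap in the literal page title and directory name of whatever stats package you run, and you can see what the rest of the world can find about your own site.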

All of these vulnerabilities are easily fixed.

One way to limit the potential for abuse is to read up about the package that you are using and how to configure it in such a way as to not show certain hits or certain pages that you do not want known.  You can configure it to not show hits from the localhost, or have it ignore hits to certain directories, for example.  This method, however, is probably not the best approach.  You may be working remotely and not from the localhost.  There are always new pages or changes in your naming conventions that may allow information to slip through and you will be constantly plugging holes in your stats software.  If you must make your stats public, at least make it a part of your security policy to regularly check these stats for sensitive data and update the configuration accordingly.
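For Webalizer, that kind of filtering lives in webalizer.conf.  The fragment below is only a sketch: the directive names are the ones I know from Webalizer 2.x's sample configuration, so check them against the sample.conf that shipped with your version, and the addresses and paths are made-up placeholders.

```
# Fragment of a webalizer.conf (addresses and paths are hypothetical)

# Drop hits from these clients entirely, before any totals are computed.
IgnoreSite   localhost
IgnoreSite   192.168.1.*

# Drop hits to these paths entirely.
IgnoreURL    /admin/*
IgnoreURL    /beta/*

# Hide* only keeps matching entries out of the "top" tables;
# the hits still count toward the totals.
HideURL      *.gif
HideURL      *.css

# Do not generate the separate "view all" listing pages.
AllURLs      no
AllReferrers no
AllUsers     no
```

Notice how every line is something you have to remember to keep current.  The day you add a /staging directory and forget to list it here, it shows up in the stats.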

There is one big and easy fix.  If you are running a machine with some sort of control panel software, then your stats are usually only viewable by logging into the control panel (but not necessarily).  If you are running your own server, or are installing your own stat packages outside of the control panel, then you really need to password protect the directory in which the stats are generated.  It is very simple to add a password and now you have a reason to do exactly that.  I do this, and so should you.  Protect your stats packages with a password!
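On Apache, one common way to do this is an .htaccess file in the stats directory.  This is a minimal sketch that assumes Apache with basic authentication available and AllowOverride AuthConfig enabled for that directory; the password file path and the realm name are placeholders.

```
# .htaccess placed in the stats directory (for example /usage)
# The AuthUserFile path is hypothetical; keep the file outside the web root.
AuthType Basic
AuthName "Site statistics"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

The password file itself can be created with Apache's htpasswd utility, for example "htpasswd -c /home/example/.htpasswd statsuser", where statsuser is whatever login name you choose.  Basic authentication sends credentials nearly in the clear, so serve the directory over SSL if you can.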

"The Revolution Will Be Digitized"

Linkz: Freshmeat  (http://freshmeat.net/browse/245/) which has Webalizer, AWStats, and many more.

Shoutz:  The DDP, Doug, tehbizz, the listeners of BinRev Radio and DDP HackRadio.
