---------[ Whisker: next-generation CGI scanner doc v1.3.0

--[ by rain forest puppy / ADM / wiretrip (rfp@wiretrip.net)

----[ Table of Contents

+General
+- - Background
|   - What whisker does/has
|   - array
|   - scan
|   - Command line reference (feature overview)
|   - Unix multi-threaded front-end use
+- - Usage notes
+Technical
+- - Global variable list
|   - Language reference
|   - Advanced coding tekniq
|   - Logic evaluation
|   - Scan optimization
+- - Notes for eval/internal coding
+Misc
+- - Wish list/future enhancements
|   - What's to become of web scanners
|   - Whisker in the news!
+- - Signoff

----[ Background

A CGI scanner is just a CGI scanner, right? And they're pretty lame apps to boot, right? Hmmm...well, perhaps. That's because no one has given any thought to them. Yeah, until I did. Perhaps I have too much time on my hands. ;) After reading this, I will be surprised if you don't think I've put way too much thought into this.

I've waded through the pile of CGI scanners found on Packetstorm (before JP got his way; j3rk), Rootshell, etc. Team Void's VoidEye and the cgichk.* are the most comprehensive....but that seems to be the 'goal' they shoot for--try to have 'the most checks in any scanner'. Great. Never mind the fact that some of the checks are completely wrong (I think it's funny to notice how the Cold Fusion '/expeval/' has propagated to so many scanners as '/expelval/'--one kiddie made a mistake, and they all copied.)

Wait...CGI scanning isn't that complex, is it? Well, to do it right, yes. Why? Hmmm...I can think of a few reasons:

1. /cgi-bin is pretty damn common, I'll give you that. But I've also been on many a hosting provider that used /cgi-local. And I've seen people use /cgi, /cgibin, etc. Fact of the matter is that it could also be /~user/cgi-bin, or /~user/cgis, etc. Then there's some scripts that are all over the place, like wwwboard, which may or may not have its own directory.
Point of the point: wouldn't it be nice to define multiple directories?
2. You know what really irks me? Seeing a CGI scanner thrash around through /cgi-bin or whatnot, when /cgi-bin doesn't even exist. Talk about noisy in the logs. Now, if we waste a brain cell, we can see that if we query the /cgi-bin directory (by itself), we'll get a 200 (ok), 403 (forbidden), or 302 (for custom error pages) if it exists, or a 404 if it doesn't. Wow. So if we just do a quick check on /cgi-bin, and get a 404, we can save however many /cgi-bin CGI checks we were going to make. That could save you 65 entries in the httpd logs.
Point of the point: save noise/time by querying parent dirs

3. If you have more to spare, let's waste another brain cell on another obvious issue. Why should I query for, say, test-cgi on an IIS server? Or /scripts/samples/details.idc on Apache? Why should I even bother checking various httpds at all (like a firewall proxy, etc)? When we do a request, the server gives us its name and version. How nice of them. How about we take advantage of their generosity?
Point of the point: tailor your scan to the server you're scanning

4. Virtual hosts. Most webservers nowadays (especially Apache with its VirtualHost directive, and IIS with its virtual host setup wizards) allow you to assign many actual domain names/websites to the same IP. Well, hell...how does the server know which site you want when you connect? Well, browsers give a second piece of information, the 'Host' directive. So, a request may look like:

        GET /~rfp/index.html HTTP/1.1
        Host: www.el8.org

So say we have SlikWilly Virtual Hosting; they run off RedHat Linux using Apache. They set up their only IP (as that's all they could afford for their $39.95/month shared DS0) to host the site www.slikwilly.com. Now, on the actual box, the location for their files is /home/httpd/html/ for html files, and /home/httpd/cgi-bin/ for, what else, their CGI apps. So a request to www.slikwilly.com/index.html is going to be pulled from /home/httpd/html/index.html.
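(A quick aside on point 2 above: the parent-directory pre-check really is only a couple lines of code. Here's a rough sketch in Python--an illustration of the idea, not whisker's actual code, and the function names are mine:)

```python
import http.client

def should_scan(status):
    """Decide, from the HEAD status on the directory itself, whether the
    per-CGI checks underneath it are worth doing. 200 (ok), 403
    (forbidden), and 302 (custom error page) all mean the directory is
    there; only a clean 404 lets us prune the whole batch of checks."""
    return status != 404

def dir_exists(host, directory, port=80):
    """HEAD the parent directory by itself and apply the rule above."""
    conn = http.client.HTTPConnection(host, port, timeout=15)
    try:
        conn.request("HEAD", "/" + directory.strip("/") + "/")
        return should_scan(conn.getresponse().status)
    finally:
        conn.close()
```

One HEAD request up front can potentially save dozens of log entries per missing directory. Now, back to Slik Willy.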
So far, so cool. Well the powers that be at Defcon decide that they've had it with catalog.com, since ADM hacked their webpage there. They want to move over to SlikWilly.com in hopes that it will keep those ADM people from changing the site. So Slik Willy himself hops into his httpd.conf and adds a VirtualHost directive for www.defcon.org. He sets up the html directory to be /home/defcon/html/, so that those Defcon people can ftp in via his nifty wu-ftpd-2.4.2(beta 18). So that means that www.defcon.org/index.html should be pulled from /home/defcon/html/index.html. Slik Willy also gives them their own cgi-bin, located in /home/defcon/html/cgi-bin/ (which means it's no silly aliased directory, since Slik doesn't understand all that stuff).

So, now, in this situation, www.defcon.org is a *virtual* site off of www.slikwilly.com (the root site). What exactly does that mean in practice? Well, let's see:

If I give the request:
        GET /index.html HTTP/1.0
I will get back the file at (assuming it exists):
        /home/httpd/html/index.html
which is Slik Willy's file (www.slikwilly.com)

If I check for:
        GET /cgi-bin/test-cgi HTTP/1.0
I will be checking for:
        /home/httpd/cgi-bin/test-cgi
which is again Slik Willy's file (www.slikwilly.com)

Now, if I check for:
        GET /index.html HTTP/1.0
        Host: www.defcon.org
I will get back:
        /home/defcon/html/index.html
which is the www.defcon.org homepage

Similarly:
        GET /cgi-bin/test-cgi HTTP/1.0
        Host: www.defcon.org
I will be checking:
        /home/defcon/html/cgi-bin/test-cgi
which is in www.defcon.org's cgi-bin.

Now, why does any of this fscking matter whatsoever? Well, imagine you wanted to be like ADM, and try to hack www.defcon.org again. So you whip out your trusty cgichk.c CGI scanner (oooh, you hacker you) and rev it up against www.defcon.org. Well, guess what--the scanner connects to Slik Willy's box, does generic requests (no Host), and winds up scanning Slik Willy's cgi-bin for cgis, not the actual www.defcon.org's cgi-bin.
And there exists the possibility that www.defcon.org had way cooler stuff than Slik Willy. But lemme just make it known, this usually works in your favor. For instance, on IIS, the virtual hosts will *NOT* (unless specifically added) have /scripts mapped to them--but the root site will. So, trying to GET /scripts will work off the main (generic) site, but if you try a virtual host with the Host directive, most likely /scripts won't be mapped over. Same for Slik Willy. test-cgi comes by default in /home/httpd/cgi-bin/, not /home/defcon/html/cgi-bin. So scanning the root site is better to find the 'default' install CGIs.
Point of the point: there's a whole 'nother world out there hiding behind virtual hosts--and you may not be scanning who you think you really are

5. Some places use custom error pages. Unfortunately, the implementation is such that instead of generating a 404 'not found', you always get a 200 'success', with HTML to indicate the missing page.
Point of the point: being able to minimize this anomaly would lessen false positives

6. More wishes: new CGI and webserver problems are found at a decent rate. Plus, I might like to customize which scans I want to do against a particular host. Having to edit C code and recompile every time could quite severely suck, especially if I'm a lousy C coder to boot.
Point of the point: if this was all scriptable, that'd be nifty

7. Input sources. I dunno about you, but I'm quite tired of doing bizarre awk/host -l combos, dumping them to a file, and then feeding them back into the various scanners. Sometimes I want to just feed in output from nmap (after all, it has a list of the found open port 80s, right?), sometimes just a laundry list of IPs/domains, and sometimes, I'd just like to do a single host on the command line.
Point of the point: flexibility of input would be nice as well.

8. IDS/log avoidance. Do you know how many IDS alarms you'll set off by requesting /cgi-bin/phf?
Let alone it's easy to spot in the logs. So instead of just handing over the plaintext, why not URL encode all/part of it to break up the literal plaintext string, such as /cgi-%62in/ph%66. It keeps the string-matching/packet-grep IDS systems from getting a positive id, and the more encoded you make it, the harder it is to figure out what it is (on the flip side, it also stands out more in the logs, even if it's unknown what /%63%67%69%2d%62%69%6e/%66%69%6e%67%65%72 is really scanning for).
Point of the point: being able to spoof IDSs would be a nice feature

Well, that's enough wishes, don't you think? Now, do they come true....

----[ Whisker has all that, plus a bonus feature or two :)

Yeah, no kidding. Come on, I wouldn't wish for something that I didn't actually implement. I'd look dumb. :) My future wishes are down below at the end. Anyways, so whisker does all that. Let's look at the two basic functions of whisker, array and scan. This is a reprint of the command reference below, but a little more verbose.

-[ array {name} = {comma delimited list}

This is one of the two core commands of whisker (the other being scan). Basically, you make an array named {name} with elements from your comma delimited list. This array is then referenced as @{name}, and given to the scan function to scan the permutations of the names in the @array. You can include another array in the list of elements...it will be added inline.

Example:
# let's make an array of common unix cgi locations
array roots = cgi-bin, cgi-local, cgibin, cgis, cgi
array first = a,b,c
array second = d, @first, e
# second = d,a,b,c,e
array bigroots = cgi-bin, cgi-bin/secret, cgi-bin/rfp
# this is a big NO!
array moreroots = cgi-bin/@first, rfp/@bigroots
# only the scan() function will parse roots like this

-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker. This command is what actually performs the scanning. There are a few aspects to the command.
First is the {optional server regex}. You can do a server specific scan one of two ways:

server (iis)
scan () scripts/tools >> getdrvrs.exe
endserver

or shorten it as:

scan (iis) scripts/tools >> getdrvrs.exe

Scan will only do the check if the server regex is () or matches (similar to the server command). Now, {dirs} and {script} are required. {dirs} is a comma delimited list of directories to check to see if {script} exists. {dirs} may also contain arrays made with the array command. Let's see some examples:

scan () cgi-bin, cgi-local >> my.cgi
        will check for /cgi-bin/my.cgi and /cgi-local/my.cgi

scan () a/b, a/c, a/d >> my.cgi
        will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

array subdirs = b,c,d
scan () a/@subdirs >> my.cgi
        will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

scan () @subdirs >> my.cgi
        will check for /b/my.cgi, /c/my.cgi, /d/my.cgi

scan () a, a/@subdirs, f/@subdirs/g >> my.cgi
        will scan for all those permutations, expanding out @subdirs into every combo involving the elements in @subdirs.

So you see how powerful directory arrays can be. If we have an array of places we want to look for CGIs

array roots = cgi-bin, cgi-local, scripts
array people = ~rfp, ~adm, ~wiretrip

we then can scan for wanted combos

scan () @roots, @people/@roots >> my.cgi

this is nice because we only have to adjust our arrays to compensate for different locations, and we can use the arrays for all our scans in the program. How centralized. :)

Also, by request, you can specify multiple files to scan with the following syntax:

scan () @roots >> file1, file2, file3, file4

Note that this breaks evaluation logic (like 'info' and 'ifexist'). You can specify the root directory by using a single /, as such:

scan () / >> index.html

whisker automatically checks each directory as it goes in {dirs}, and caches the response. See 'Advanced coding tekniq: Optimized Scans' for more information on how to (ab)use this command properly.
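If it helps to see what scan is doing with those @arrays, here is a rough sketch of the expansion in Python--my own illustration, not whisker's actual (perl) parsing code:

```python
from itertools import product

def expand_dirs(dirs, arrays):
    """Expand a comma-delimited {dirs} list, fanning out each @name
    component into the elements of that named array."""
    out = []
    for entry in (d.strip() for d in dirs.split(",")):
        parts = entry.split("/")
        # an @name part becomes a multi-way choice; a literal part
        # is a single-element choice
        choices = [arrays[p[1:]] if p.startswith("@") else [p]
                   for p in parts]
        out.extend("/".join(combo) for combo in product(*choices))
    return out

arrays = {"subdirs": ["b", "c", "d"]}
print(expand_dirs("a, a/@subdirs, f/@subdirs/g", arrays))
# -> ['a', 'a/b', 'a/c', 'a/d', 'f/b/g', 'f/c/g', 'f/d/g']
```

Each expanded directory then gets {script} appended and checked, with the per-directory responses cached as described above.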
So basically, you define arrays of directories (although that's optional), and use scan to scan for the scripts. Easy enough. Plus, there's a suite of other simple logic to help out in our scanning endeavours. You can take a peek at the included sample scan.db to see usage, if you're a learn by doing/example type person. Anyways, onto using whisker....

----[ Commandline reference

Here is the commandline reference ripped from whisker itself:

Usage: whisker (options)
        -n+     *nmap output (machine format, v2.06+)
        -h+     *scan single host (IP or domain)
        -H+     *host list to scan (file)
        -F+     *(for unix multi-threaded front end use only)
        -s+     specifies the script database file (defaults to scan.db)
        -V      use virtual hosts when possible
        -N      query Netcraft for server OS guess
        -S+     force server version (e.g. -S "Apache/1.3.6")
        -u+     user input; pass XXUser to script
        -i      more info (exploit information and such)
        -v      verbose. Print more information
        -d      debug. Print extra crud++ (to STDERR)
        -l+     log to file instead of stdout
        -I 1    IDS-evasive mode 1 (URL encoding)
        -I 2    IDS-evasive mode 2 (/./ directory insertion)
        -I 3    IDS-evasive mode 3 (both 1 and 2)
        -I 4    IDS-evasive mode 4 (premature URL ending--NO APACHE)
        -I 5    IDS-evasive mode 5 (long URL)
        -I 6    IDS-evasive mode 6 (fake parameter)
        -A 1    alternate db format: Voideye exp.dat
        -A 2    alternate db format: cgichk*.r (in rebol)
        -A 3    alternate db format: cgichk.c/messala.c (not cgiexp.c)
        -p+     proxy off x.x.x.x port y (HTTP proxy--see docs)
        -P      request and format proxy list from fsu.virtualave.net
        -B 1    bounce off of altavista.com (and netcraft.com)
        -B 2    bounce off of samspade.org
        -B 3    bounce off of anonymizer.com
        -B 4    bounce off fsu.virtualave.net proxy list (random)

+ requires parameter; * one must exist

You have three input options, -n, -h, or -H. You must have at least one, but you can use multiple.
They are:

-n nmap.file
        supply a nmap (v2.06+) *MACHINE FORMAT* output file; you can get this by using nmap -m nmap.out. Whisker will read it in and check every host with port 80 found to be 'open'.

-h {ip or domain}
        single host. Just supply the host on the commandline, such as "-h www.microsoft.com"

-H host.file
        this is essentially a laundry list of ips and/or domains, one per line, like so:
                www.microsoft.com
                www.sun.com
                123.123.145.167

The other important option is -s, which lets you specify the scan database to use. Starting with version 1.3, whisker will now default to scan.db, so you do not need to specify -s unless you want to use a different database.

Also starting with version 1.3, you can now specify alternate style lists of CGIs to scan. Whisker can read in VoidEye's exp.dat file, cgi-chk.r (written in Rebol), cgichk.c (the non-hex encoded one), and messala.c. Whisker will actually convert those files to whisker-compatible scan databases, and then will even apply some logic to them. And as a bonus, whisker will even fix the 'expelval' bug (it should be 'expeval') that so many of the scanners have copied from each other (even though it's wrong). To use an alternate format, you have to use the -s option to specify the location of the alternate file, and the -A option to specify the type, like so:

For exp.dat (VoidEye) files:
        whisker.pl -s exp.dat -A 1
For cgi-chk.r (Rebol) files:
        whisker.pl -s cgi-chk.r -A 2
For messala.c, cgichk.c, and other generic .c files (NOT cgiexp.c):
        whisker.pl -s messala.c -A 3

-V tells whisker to attempt to use virtual host domains wherever possible. If you're scanning an IP address, -V won't do anything. But if you're scanning a domain name, whisker will include the domain name in the Host: directive.

-v will print more verbose information (to console or logfile).

-d will print debugging information to STDERR (usually console).

-i will include information specified by the 'info' command.
-l log.file will redirect all information to log.file

-p x.x.x.x y is the proxy option. No, this is not SOCKSified, or anything else. Basically, -p is for firewalls and such where you connect to the PROXY, and then issue:

        GET http://my.target.webserver:80/page/i/want.htm HTTP/1.0

x.x.x.x is the IP address, and y is the port OF THE PROXY.

-u is just a simple way to give information on the commandline that is placed directly into the XXUser variable for use within the script. That way, you can have a configuration switch externally; like "-u 1" will be normal scans, and "-u 2" will be extra-stealthy scans, etc. Use "if XXUser == ???" inside the script to query the value.

-I tells whisker to invoke anti-IDS techniques. Whisker originally came with two types (the old -I is now -I 1, and the old -E is now -I 2). Version 1.3 added three more types (-I 4-6). Types 4, 5 and 6 are new and experimental, and have not been fully tested. Descriptions of each type:

-I 1 -- will "URLify" the request line, as spoken about earlier. It will encode all letters, numbers, dashes and dots as their hex escaped sequence equivalent.

-I 2 -- will replace all / with /./, which breaks up the string (i.e. strings become /./cgi-bin/./some.cgi)

-I 3 -- both options 1 & 2 (tested and found to be very effective against most IDSes)

-I 4 -- Experimental new evasion tactic. Whisker tries to take advantage of improperly coded regexes in IDSes by 'faking' the end of the URL request by sending:

        METHOD /%20HTTP%2F1.1%0D%0A%0D%0A/../../some.cgi HTTP/1.1

which turns out looking like:

        METHOD / HTTP/1.1\r\n\r\n/../../some.cgi HTTP/1.1

If an IDS improperly stops at the fake HTTP portion, it will miss the actual cgi request. NOTE: this does not work with Apache; IIS, on the other hand, takes it no problem.

-I 5 -- Some IDSes only look within the first xxx bytes of the URL, assuming that the request is the first thing to come in the packet.
Whisker tries to avoid this by sending a very long (approx 1-2K) directory, followed by a /../ and then the name of the CGI to scan. Note that this method is a tad slower (it has to generate lots of random strings) and is MAJORLY obvious in the web server logs. The XXIDSMode5Limit variable controls how much data is sent (in approx. 15 byte units).

-I 6 -- Like the premature URL end tactic (mode 4), this mode tries to outsmart the 'smart' IDSes, which think it's ok to scrap the URL after the parameters (?param=..). Whisker will fake a parameter like so:

        METHOD /index.htm%3Fparam=/../some.cgi HTTP/1.1

which turns out looking like:

        METHOD /index.htm?param=/../some.cgi HTTP/1.1

Note that modes 4 and 6 play on the fact that IDSes will typically reconvert encoded characters (%20, %3F, etc) back before testing...this isn't always the preferred approach. 'Packet grep' style IDSes will be able to avoid these problems, since they just scan for the particular CGI key-words...regardless of how they appear in the string. 'Smart' IDSes, on the other hand, which try to interpret around the HTTP protocol (and take shortcuts), may fall prey to these tactics.

-N causes whisker to query Netcraft (www.netcraft.com) and see what they think the OS is. Not 100% reliable, but it's a start, and fairly accurate.

-S lets you override what server whisker parses the script as. You submit a server string, such as -S "Apache/1.3.1 PHP/3.0.2a". Useful for situations in which whisker can't determine the server type, and you want to force it to assume a particular one.

-B are the new 'bounce' scan methods. This causes whisker to bounce scans via other types of servers, so that you do not directly contact the server--it adds a layer of obfuscation to your scan. Currently there are 4 types of bounces:

-B 1 -- bounce off of AltaVista.com (by Philip Stoev).
A very good scan, since AltaVista is a high capacity site, and not likely to keep track of what/where their crawler indexes, let alone who submitted the request. However, AltaVista does not return page content, and an initial request to Netcraft needs to be made to figure out the server type (since AltaVista does not report such information).

-B 2 -- bounce off of Samspade.org (by Styx). Whisker will reroute scans to use the anonymous proxy found at www.samspade.org.

-B 3 -- bounce off of Anonymizer.com. Reroute scans to use the anonymous proxy found at invis.free.anonymizer.com.

-B 4 -- distributed proxy scanning. This is a special scan type, where whisker will actually send each CGI scan request (could number up to ~160) through a different, RANDOM public proxy on the internet. You need to initially download the proxy list from fsu.virtualave.net (which is typically 500-600 proxies) using the -P command. Once done, you can use the -B 4 bounce to reroute each scan through a different proxy on the list. Note that this method is slow, as a new server has to be contacted for each scan. Whisker will handle rescanning if the proxy times out or is down.

That's it, go play! Use the included scan.db for reference. The rest of this is technical information and whatnot.

----[ Unix multi-threaded front-end use

With version 1.3, I've included a script named 'multi.pl'. This is the multi-threaded (well, multi-forking) frontend for whisker. By default, multi.pl will run 5 whiskers in parallel to speed up your scans. To use multi.pl, you pass the exact same options as you would normally pass to whisker.pl--there are no option changes. multi.pl will internally figure out how to divvy up the work, and then call whisker.pl with the appropriate options.

Note: multi.pl is only useful when you are scanning multiple hosts; it is impossible for whisker to run parallel scans for the same host, since it breaks all the logic and dependencies.
Also, the -l option is not available for multi.pl; instead, either redirect to a log (>whisker.log), or pipe to the 'tee' command (|tee whisker.log).

----[ Usage notes

Particular options and modes in whisker require the current working directory to be writable by yourself. If you abort (CTRL-C) whisker during a scan, there may be leftover temp files.

Whisker will now automatically rescan with dumb.db; there is also a feature in whisker that will attempt to still identify the server type (guess) before it rescans with dumb.db--this feature is highly enhanced when used in combination with an nmap input file that has OS identification.

Windows users should read the install.txt that comes with whisker.

----[ Global variable list

These are the variables accessible from within the script. Why all the prefixed XX's? So you're less likely to clobber them. :) I suggest you don't poke values into these unless you know what you're doing.

** Note: this list is not complete, due to time constraints. Check my website for updated documentation and a full list.

Name            Default value   Description
XXPort          80              port to scan...80 for normal webservers
XXRoot          ""              default prefix for URLs...
XXMeth          HEAD            how to retrieve the file...HEAD preferable
XXVer           HTTP/1.0        http version for whisker to use
XXDebug         ?               do we want debug output
XXVerbose       ?               do we want verbose output
XXProxy         0               are we using a proxy
IP              ?               ip address of target to connect to
XXTarget        ?               actual target ip (host/ip modified for proxy scans)
XXBadMeth       1               bad method compensation if 400 or 500
XXSStr          ?               return server software string
XXRet           ?               http return code of page
XXRetStr        ?               http return string
XXSVer          ?               http version return from server
XXIDS           0               whether or not to use IDS spoofing
XXForce         0               force scan(), regardless of server (used for dumb.db)
XXForceS        0               force server() comparisons as well
XXCLLeak        ?               content-location leak
XXIsIndex       0               is it a directory index?
XXStopOnDir     0               stop on a directory index
XXCLen          0               content length
XXAVHide        0               use AltaVista scan bounce? (-B 1)
XXAnonymizer    0               use anonymizer scan bounce? (-B 3)
XXInited        0               has runinitial been called?
XXNetcraft      0               check with netcraft?
XXNetcraftOS    ?               netcraft return results
XXNetcraftSStr  ?               netcraft return results
XXReferer       1               send referer with each request?
XXGiveCookie    1               give back any cookies?
XXRescanDumb    0               do we need to rescan using dumb.db?
XXNoContent     0               stop after the headers, even on GET
XXTimeoutVal    20              timeout value per check, in seconds
XXIDSMode5Limit 100             approx. limit * 15 = IDS mode 5 length
XXUseSSL        0               use SSL on unix?
XXUserAgent     Mozilla/4.7 [en] (Win95; U)
XXSSLPath       /usr/local/ssl/bin/openssl

Proxy info (don't recommend you play with it): XXProxy_addy, XXProxy2port, XXP_target
Cached inet_aton() result: XXinet_aton

----[ Language reference

Ok, here's the commands that whisker supports in its scripts.

****NOTE: all {} are visual delimiters for viewing only--they are not to be included. If you see something like ({variable}), that means that the () are required, but the {} are not. Also, all mentions of regexes are case insensitive; however, variable names *ARE* case sensitive; commands are not.

-[ # {comment}

Just your usual, everyday comment. COMMENTS MUST BE ON THEIR OWN LINE!

Example:
# this is a comment, and won't be executed.

Bad bad bad:
server (iis) # if the server has IIS...

-[ print {something to print}

Print out {something to print}. Duh. Starting with version 1.3, you can now use embedded $variables, or \n, \r, or \t. You can NOT escape them (i.e. \$, \\n, etc. will not give you the literal characters). Ending with a \ keeps whisker from printing a new line.

Example:
print This will be printed to screen or logfile, depending on switches
print Return code was $XXRet\n tab->\t<-tab
print This is a \
print line continuation (all on one line)

-[ printvarb {variable name}

This will print out the contents of the single variable {variable name}.
(variable name is case sensitive) NOTE: deprecated by $variable support in print statements.

Example:
printvarb XXRet

-[ exit

This will 'exit' the scan for the current host, and move along to the next host to scan.

Example:
exit

-[ exitall

This will immediately exit the program altogether, right then and there.

Example:
exitall

-[ if {variable} {== or !=} {value} (w/ endif)

Your standard logic test. If {variable} is equal (==) or not equal (!=) to the constant {value}, execute up to the first endif. NOTE: whisker uses a quasi-equality/test system that's more convenient in this type of situation. If {value} is a numeric value (all numbers), then whisker will use a pure "if variable is equal to value" test. However, if {value} is a string (does not contain all numbers), then it uses a regex instead, which is more along the lines of "if value is contained within variable". This is nicer for matching string partials, etc; granted, you don't want whisker returning 'true' just because "20" is found within "200", when "20" is in fact not equal to "200"--hence the pure equality test for numbers.

Example:
if XXRet == 200
print The page was found
endif

if XXRet != 200
print The page was NOT found
endif

-[ ifexist (w/ endexist)

This command is equivalent to 'if XXRet == 200', and evaluates as true if the resulting check came back 200 (meaning the page exists). *Note: right now it's hardcoded to the return value of 200...this will be changed to be user-definable in the future.

Example:
scan () cgi-bin >> test-cgi
ifexist
print They have the test-cgi CGI
# other stuff to do
endexist

-[ server ({server regex}) (w/ endserver)

This is basically a 'if the server string contains the string {server regex}, evaluate it as true'. {server regex} is case insensitive, and required. Everything up to the first 'endserver' is evaluated. Starting with version 1.3 you can now use a variable in place of the regex.
Example:
server (iis)
# stuff to do if server string has 'iis' in it
endserver

set name=Apache
server ($name)
# stuff to do if server string has 'apache' in it
endserver

-[ set {variable} (.)= {value}

This will set the variable {variable} to {value}. {value} can either be a constant you supply, or another variable name that starts with '$'. You don't need to worry about pre-allocating a variable...it will automatically be created on its first use. {variable} and {value} are required. The '$' on {variable} is assumed, and can NOT be used. Variable names and the values you assign are case sensitive. Starting with version 1.3, you can use .= to append a value, rather than replace the value of a variable.

Example:
set XXMeth = GET
set MyReturnValue = $XXRet
set XXMeth .= MORE
# XXMeth is now GETMORE

Bad bad bad:
set $MyReturnValue = Some_value_to_assign

-[ startgroup

Reset the group counters, and start tracking group scans. Essentially this lets you see if a full group of files exists. Note that a 'group' is 'true' if all scans done since a startgroup have returned successfully. If any one scan in the group returns false, the 'group' is evaluated as false (used with ifgroup, below).

Example:
startgroup
scan () cgi-bin >> phf
scan () cgi-bin >> webdist
ifgroup
print Wow, they have phf AND webdist!
endifgroup

-[ ifgroup (w/ endifgroup)

Evaluate the last scans since startgroup, and process if all scans were successful. See startgroup for more information and an example.

-[ info {stuff to print}

Print information if the -i switch has been used and the last scan was successful. This should be used to provide more information (exploit info, informational links, notes, etc) about a successful scan. Version 1.3 lets you use $variables, \n, \r and \t, similar to 'print'.

Example:
scan () cgi-bin >> phf
# print this stuff if they have used -i, and phf exists
info Oh my god! They have phf! How lame...
info But then again, it could be one of those phf logger traps

-[ ifinfo (w/ endinfo)

Evaluate and process if the -i switch was supplied. Note that ifinfo allows you to do more than just print information (you can put any whisker code in the block), and it does not consider the return status of the last scan.

Example:
server (Apache)
ifinfo
# print this stuff only if it's an Apache server and the -i switch was used
print They're running Apache, in case you didn't notice
# run any other commands here too
endinfo
endserver

-[ usehead

Sets the default method to 'HEAD', while also saving what the current method was (which can be restored with restoremeth).

Example:
usehead
# this will now use HEAD
scan () cgi-bin >> phf
restoremeth

-[ useget

Sets the default method to 'GET', while also saving what the current method was (which can be restored with restoremeth).

Example:
useget
# this will now use GET
scan () cgi-bin >> phf
restoremeth

-[ usepost

Sets the default method to 'POST', while also saving what the current method was (which can be restored with restoremeth). *Note: whisker automatically adds the required headers for using POST requests. You can set what information is actually posted in the XXPostData variable--whisker will automatically compute Content-Length.

Example:
usepost
# this will now use POST
# use this if you want to submit extra post info
set XXPostData = somevarb=crap&whatever=morecrap
scan () cgi-bin >> phf
restoremeth

-[ restoremeth

Restore to whatever (default) method was chosen before you ran a usehead, useget, or usepost command. You should note that this is not implemented in stack fashion...if you useget, then usepost, then usehead, restoremeth will then revert to the *PRIOR* method, or in this case, POST. Therefore you should always restoremeth before you use a different use* command, or you will lose the default method.
Example:
# default scan type is HEAD
useget
# now we're GET
scan () cgi-bin >> phf
restoremeth
# we're back to HEAD
usepost
# now we're POST
scan () cgi-bin >> webdist
restoremeth
# we're back to HEAD

Wrong:
# default scan type is HEAD
useget
# now we're GET
scan () cgi-bin >> phf
usepost
# now we're POST
scan () cgi-bin >> webdist
restoremeth
# we're back to GET, we've lost our HEAD default.

-[ savemeth

Essentially does the save operation that useget, usepost, or usehead do (which can be 'undone' with restoremeth). This is here in case you want to do more funky stuff with the XXMeth variable (for instance, use TRACE, OPTIONS, or set it to * for the various test-cgi vulnerabilities).

Example:
savemeth
set XXMeth = TRACE
restoremeth

-[ insert {file}

Insert the code found in {file} (if it exists) into the script at that point. Note that this is a pre-processing command, done before whisker even thinks of scanning a host. Be aware that inserting a file into a condition is *REALLY* tricky.

Example:
insert servers.db

-[ fingerprint .{extension} {action}

This is the initial implementation of return code/page fingerprinting (discussed in detail below). Basically it causes whisker to verify that a request with the specified {extension} does not return a 200 (for example, Cold Fusion returns a 200 OK for any .cfm request by default on IIS--which makes it appear as if every .cfm request does indeed exist). Valid actions at this point are skip and exit. Note that fingerprint is a pre-processing directive for each command--this means no matter where the fingerprint command is located in the file, it is run *first* before anything else is run for that host. If action is skip, and whisker determines that the scanned host returns 200 OK results for that extension, it will just skip any scan with that extension (and fake a 404 Not Found reply). If action is exit, it will print a notice that it exited on fingerprint catch, and move onto the next host.
A good example of usage would be scanning www.harley-davidson.com--a
request for practically anything (.cgi, .pl, etc.) will result in a custom
error page, which comes back as 200 OK. All other scanners will flag this
as 'file exists'. With whisker you have the option of detecting this
anomaly and being alerted to it.

How whisker fingerprints: right now, the implementation is simple. Whisker
generates a random 20-character string, slaps on your extension, and
requests it--assuming that it won't exist. If it comes back 200 OK, then
whisker figures all future requests for that extension are tainted and
implements the fingerprint action handler for that particular extension. In
the future this will evolve and become more robust, but for now, it's more
than adequate.

Example:

# skip Cold Fusion files, if they all come back 200 OK
fingerprint .cfm skip
# skip this host if every .cgi comes back as 200 OK
fingerprint .cgi exit

-[ eval (w/ endeval)

Eval lets you embed raw perl code into your script to do whatever you want.
This gives your scripts unlimited functionality. See the end of this doc
for eval/raw perl notes on whisker internals. Note that everything between
eval and endeval is put into a variable, and then just run through perl's
eval() function. NOTE: EVAL IS SLOW. The perl interpreter has to do its
thing, and it is time consuming. Just a warning.

Example:

eval
print STDOUT "This is a raw perl command\n";
print "wow, you have a passwd file\n" if(-e "/etc/passwd");
endeval

-[ array {name} = {comma delimited list}

Basically, you make an array named {name} with elements from your
comma-delimited list. This array is then referenced as @{name}, and given
to the scan function to scan the permutations of the names in the @array.
You can include another array in the list of elements...it will be expanded
inline. Array names and values are case sensitive. Starting in version 1.3
you can also use a $variable in the array values.
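For illustration only (this is not whisker's actual parser), the inline expansion and variable substitution rules could be modeled in Python like so:

```python
def expand_array(items, arrays, variables):
    """Expand a comma-delimited element list: '@name' pulls in another
    array inline, '$name' substitutes a variable (v1.3+), anything else
    is taken literally."""
    out = []
    for item in (i.strip() for i in items.split(',')):
        if item.startswith('@'):
            out.extend(arrays[item[1:]])      # inline array expansion
        elif item.startswith('$'):
            out.append(variables[item[1:]])   # variable substitution
        else:
            out.append(item)
    return out

arrays = {'first': ['a', 'b', 'c']}
variables = {'name': 'f'}
print(expand_array('d, @first, e, $name', arrays, variables))
# ['d', 'a', 'b', 'c', 'e', 'f']
```

This mirrors the `array second = d, @first, e, $name` example in the doc.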
Example:

# let's make an array of common unix cgi locations
array roots = cgi-bin, cgi-local, cgibin, cgis, cgi

set name=f
array first = a,b,c
array second = d, @first, e, $name
# second = d,a,b,c,e,f

-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker. This command is what actually performs the
scanning. There are a few aspects to the command. First is the {optional
server regex}. Scan will only do the check if the server regex is () or
matches (similar to the server command). Now, {dirs} and {script} are
required. {dirs} is a comma-delimited list of directories to check to see
if {script} exists. {dirs} may also contain arrays made with the array
command. {dirs} and {script} are case sensitive. In version 1.3, the server
regex can be a variable name (similar to 'server').

Examples:

scan (iis) scripts/tools >> getdrvrs.exe

array roots = cgi-bin, cgi-local, scripts
array people = ~rfp, ~adm, ~wiretrip
scan () @roots, @people/@roots >> my.cgi

-[ ifnmapinfo (with endnmapinfo)

Allows you to process commands if nmap information for that host is
available.

Example:

ifnmapinfo
# this host has nmap information available
print NMAP OS guess: $XXNmapOS
print NMAP TCP ports: $XXNmapTCP
endnmapinfo

-[ pingport {port number}

Lets you check to see if a particular port is open on the host. If nmap
information is available, the nmap port list will be consulted; otherwise
whisker will make an attempt to connect to that port on the host to see if
it's open (note: no stealth). Useful to check and see if other web ports
are open (port 8080, 3128, etc.). Use the normal 'ifexist', 'info', etc. to
determine if the port is open (XXRet is set to '200' if open, '404' if
closed).

Example:

pingport 8080
ifexist
print - port 8080 is also open
endexist

-[ runinitial

If present, whisker will delay the initial contact and fingerprinting until
runinitial is encountered; this lets you change variables before the scan
is run.
Note that before runinitial you can't use 'scan', 'server', or anything
else that contacts the server, other than 'pingport'. If runinitial is
found in the database, whisker delays processing until it is encountered.
Therefore, if you wish to scan multiple ports, make sure you specify
runinitial at the beginning of each port set. If 'runinitial' is not found
in the script, whisker automatically inserts it as the first command run.

Example:

# modify whisker to run on port 8080
pingport 8080
ifexist
set XXPort=8080
endexist
# set other variables and whatnot
runinitial

Wrong:

scan () / >> some.cgi
set XXPort=8080
# whisker won't start scanning until here
runinitial
scan () / >> some.cgi

Correct:

# for port 80
runinitial
scan () / >> some.cgi
set XXPort=8080
# now for port 8080
runinitial
scan () / >> some.cgi

-[ clearpagecache

Tells whisker to reset the cache of tracked pages, directories, etc. Needed
if you want to tell whisker to start scanning on a different port, or wish
to start over. If you don't, the directory caching will carry over to the
other port (i.e. if cgi-bin was found on port 80, it will automatically be
assumed found on port 8080, unless you clear the cache to cause whisker to
rescan).

Example:

# tell it to start off with port 80
runinitial
# look for some.cgi on port 80 (default)
scan () / >> some.cgi
# clear everything and set to look on port 8080
clearpagecache
set XXPort=8080
runinitial
scan () / >> some.cgi

----[ Advanced coding tekniq

-[ Logic evaluation

It's best to take a moment and explain how 'if', 'ifexist', 'server', etc.
work when evaluating logic. Basically, whisker is not block oriented, but
line/linear oriented. This leads to some nesting problems, but you can bend
the rules here and there. Now, let's say we have a simple 'if':

if XXRet == 200
print The page existed
endif

Now, what whisker will do is evaluate the 'if XXRet == 200'. If this is
true, it will just keep processing line by line.
If this is false, it will 'fast forward' to the first 'endif' it comes
across. Same for 'ifexist' (fast forward to first endexist) and 'server'
(fast forward to first endserver). So you can see how the following nesting
breaks:

# if number one
if XXRet == 200
# if number two
if XXRetStr == OK
print Page exists
# endif number one
endif
# if number three
if XXRetStr == Not OK
print Something is borked
# endif number two
endif
# endif number three
endif

Now, if 'if number one' is true, it will keep going line by line. Same for
'if number two & three'. But if 'if number one' fails, it will just fast
forward to the first 'endif' it finds, in this case 'endif number one'.
This means if XXRet does not equal 200, it *will still process* 'if
XXRetStr == Not OK'... Whisker will *NOT* fast forward to 'endif number
three'. So you can see how this can affect things.

Now, based on this, you can do some tricks. For instance, a logical AND can
be done like so:

if XX == True
if YY == True
print Both XX and YY are true
endif

Let's take a quick peek. If XX is true, it continues. If YY is true, it
still continues, and prints our message. If either fails, it just fast
forwards to the first endif. Simple enough.

Logical OR, on the other hand, kinda sucks:

if XX == True
set MyOR = 1
endif
if YY == True
set MyOR = 1
endif
if MyOR == 1
print XX or YY was True
endif

Yeah, way more code. I think you get the point, so I won't trace it.

Next is the simple IF/ELSE type structure. Whisker has an 'if', but not an
'else'. You can emulate it like:

if XX == 1
print It's 1!
endif
if XX != 1
print It's something other than 1!
endif

Again, simple stuff. I'm sure the question will come up: "why the hell
don't you just implement AND/OR and ELSE into whisker?". My answer is 1.
you can still do it with a bit more code, 2. I want to keep it simple
(stupid?), 3. the logic of doing so would start getting out of control, and
I don't want to get a formal language thing going. It's just a web scanner,
man.
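To make the fast-forward semantics concrete, here is a toy Python model of the line-oriented evaluation (a deliberately simplified sketch, not whisker's real code). The key detail is that a false 'if' skips to the FIRST 'endif', ignoring nesting:

```python
def run(lines, env):
    """Toy model of whisker's linear 'if' evaluation."""
    out, i = [], 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith('if '):
            _, var, op, want = line.split(None, 3)
            ok = (env.get(var) == want) if op == '==' else (env.get(var) != want)
            if not ok:
                # fast forward to the FIRST endif, not the matching one
                while i < len(lines) and lines[i].strip() != 'endif':
                    i += 1
        elif line.startswith('print '):
            out.append(line[6:])
        i += 1
    return out

# the logical-AND trick from above: both conditions must hold
script = ['if XX == 1', 'if YY == 1', 'print Both are true', 'endif']
print(run(script, {'XX': '1', 'YY': '1'}))   # ['Both are true']
print(run(script, {'XX': '0', 'YY': '1'}))   # []
```

Tracing the model also shows the nesting break: with a false outer 'if', execution resumes right after the first 'endif', so later inner 'if's still run.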
:)

server() will run code if a particular server type is found. But how do you
run code if it's *not* a particular server? Say I wanted to run something
if the server WASN'T Apache...

server (apache)
set apache=1
endserver
if apache != 1
# code to run if it's not apache
endif

That's all there is to it. Remember, the check/set variable/check again
procedure tends to work for most logic evaluation situations.

-[ Optimized scans

When I say 'optimized', I mean scans that are coded such that they produce
the minimal number of requests. We have the obvious example:

scan () cgi-bin >> 1.cgi
scan () cgi-bin >> 2.cgi
scan () cgi-bin >> 3.cgi
scan () cgi-bin >> 4.cgi

Here scanning for cgi-bin first is valuable; if it's not found, it will
save us 4 scans (3 if you count the scan for cgi-bin). If found, it will
cost us one additional scan (5 total). That gives us a worst case of 5
scans (if cgi-bin exists), best case of 1 scan (if cgi-bin does not exist).

Now, let's say we have:

scan () cgi-bin/a/b/c >> 1.cgi
scan () cgi-bin/d/e/f >> 2.cgi
scan () cgi-local/1/2/3 >> 3.cgi
scan () cgi-bin/g >> 4.cgi
scan () cgi-bin >> 5.cgi

Now, the trick is, whisker will scan for all dirs *individually*. This
means, for 1.cgi, it will scan:

cgi-bin/
cgi-bin/a/
cgi-bin/a/b/
cgi-bin/a/b/c/
cgi-bin/a/b/c/1.cgi

Wow, that's a lot of scanning. Same goes for 2, 3, and 4.cgi as well. All
together, with the above set of scans, we will be making 17 checks
(assuming everything exists). That's worst case 17, best case 2 (2 being
the checks for cgi-bin and cgi-local, when neither exists).

Now, optimization. The point of checking for the existence of parent dirs
is to speed up scanning of *many* scans that use that parent dir. So,
looking at our set, scanning for the existence of cgi-bin is a good thing,
because if it's not there, it will save us the rest of the checks for 1, 2,
4, and 5.cgi. But notice how the /a/b/c dirs of 1.cgi aren't shared.
There's no point in checking them individually, because they're not shared
with any of the other scans. What would be nice is if we could check to see
if cgi-bin existed (since knowing ahead of time will help with the others),
and if it does, just go straight to scanning /a/b/c/1.cgi. Well, we can.
Note the optimized scans below:

scan () cgi-bin >> a/b/c/1.cgi
scan () cgi-bin >> d/e/f/2.cgi
scan () / >> cgi-local/1/2/3/3.cgi
scan () cgi-bin >> g/4.cgi
scan () cgi-bin >> 5.cgi

Basically, whisker will do the following:

1. scan for /cgi-bin/ (in 1.cgi)
2. if /cgi-bin/ exists, scan for /cgi-bin/a/b/c/1.cgi right away
3. if /cgi-bin/ exists, scan for /cgi-bin/d/e/f/2.cgi right away
4. scan for /cgi-local/1/2/3/3.cgi right away (since no other scans use
   /cgi-local/ or its other dirs, there's no point in checking them
   individually)
5. if /cgi-bin/ exists, scan for /cgi-bin/g/4.cgi right away
6. if /cgi-bin/ exists, scan for /cgi-bin/5.cgi right away

Wow, we just went from 17 checks to 6 (assuming everything exists).
Granted, 5 checks originally went to checking for 3.cgi. Since we don't
need those parent dirs for other scans, we reduced it to one. That's a
worst case of 6, best case of 2. Much better than 17/2. And think of it
this way: would you rather have 6 log entries, or 17?

So you can think of the scan function as such:

scan (server) {individual dirs to scan} >> {one thing to scan as a whole}

And just remember, every dir in the 'individual dirs to scan' will cost you
a check (unless cached). So when scans share dirs in common (i.e. the scan
results will be cached), list the dirs there. Otherwise, you want to push
them to the 'scan as a whole' column. Here's a worst case scenario of
over-optimization:

scan () / >> scripts/tools/getdrvrs.exe
scan () / >> scripts/samples/details.idc
scan () / >> scripts/samples/ctguestb.idc

Now, this will force 3 scans. Even if /scripts/ doesn't exist, it will
still make 3 scans. Not as intelligent.
Now, one optimization would be:

scan () scripts >> tools/getdrvrs.exe
scan () scripts >> samples/details.idc
scan () scripts >> samples/ctguestb.idc

Now, if /scripts exists, it will cost us 4 scans. If /scripts does not, it
only costs us one. (That's a worst/best of 4/1.)

scan () scripts >> tools/getdrvrs.exe
scan () scripts/samples >> details.idc
scan () scripts/samples >> ctguestb.idc

Now, assuming /scripts exists, and /scripts/samples does too, this will
cost us 5 scans, which would be:

/scripts/
/scripts/tools/getdrvrs.exe
/scripts/samples/ (/scripts is cached)
/scripts/samples/details.idc
/scripts/samples/ctguestb.idc (/scripts/samples is cached)

If /scripts/samples didn't exist, but /scripts did, we'd have 3 scans. If
/scripts didn't exist at all, we'd have 1 scan. So really, the previous
optimization would be best (worst/best 4/1). This optimization (worst/best
5/3 (or 1)) would be good if there are other CGIs to check for in
/scripts/samples (making the cached dir checks of /scripts and
/scripts/samples of more use).

What it really comes down to is a numbers game, and somewhat psychology as
well. Directory caching works well when the cache is obviously hit many
times...and it's actually a penalty at other times. Look at the pros and
cons: you can have 10 /cgi-bin/xxx.cgi checks--just like your normal CGI
scanner--and cause 10 log entries, even if the scripts don't exist, which
stick out like a sore thumb. With whisker, first you have a log entry for
/cgi-bin/, which is much less obvious than, say, /cgi-bin/test-cgi. I mean,
/cgi-bin/, while suspicious, isn't as obvious. Now, if that check fails,
you don't have the other 10 log entries. You just saved yourself those 10
red flags. If the check passes, well, then it's worth the red flags to see,
right? After all, that's what the scanner is for. ;)

Granted, this is a very obvious result. But the numbers can be tweaked for
any of the optimization cases above.
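The worst-case numbers above are easy to reproduce mechanically. This Python sketch (my own illustration, not whisker's code) counts requests for a list of (dirs, script) scan entries, assuming every path exists and parent-directory checks are cached:

```python
def count_requests(scans):
    """Count checks for [(dirs, script), ...] entries, assuming all
    paths exist; each uncached parent dir costs one request."""
    seen, total = set(), 0
    for dirs, script in scans:
        path = ''
        for part in (p for p in dirs.split('/') if p):
            path += part + '/'
            if path not in seen:     # directory cache: pay only once
                seen.add(path)
                total += 1
        total += 1                   # the request for the script itself
    return total

unoptimized = [('cgi-bin/a/b/c', '1.cgi'), ('cgi-bin/d/e/f', '2.cgi'),
               ('cgi-local/1/2/3', '3.cgi'), ('cgi-bin/g', '4.cgi'),
               ('cgi-bin', '5.cgi')]
optimized = [('cgi-bin', 'a/b/c/1.cgi'), ('cgi-bin', 'd/e/f/2.cgi'),
             ('/', 'cgi-local/1/2/3/3.cgi'), ('cgi-bin', 'g/4.cgi'),
             ('cgi-bin', '5.cgi')]
print(count_requests(unoptimized))   # 17
print(count_requests(optimized))     # 6
```

The same function reproduces the 4-scan worst case for the /scripts example, so it can serve as a quick sanity check when hand-optimizing a scan database.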
How obvious are checks for /scripts/ and /scripts/samples/ compared to
checks for /scripts/samples/details.idc, etc.?

BTW, realtime IDS systems only pick the full URL requests off the wire. So
a raw check for /cgi-bin/phf will set off the IDS, regardless of its
existence. A check for /cgi-bin, with a negative result, will save you the
step of even sending the URL, and therefore keep the IDS quiet.

----[ whisker perl internals for eval and coding

Ok, just a few quick notes on some of the inside perl code. This will help
if you want to effectively use 'eval', or poke into the code.

All user and global variables are in %D. So, to reference XXRet, for
instance, it would really be $D{'XXRet'}.

All user arrays are prefixed with 'D'. So, 'array roots = a,b,c' would
become @Droots in perl's 'process space'. Again, this is to avoid
clobbering. Also note that whisker will define $D{'Droots'}="--array--" for
the 'roots' array. Making arrays named XXRet and whatnot will start getting
you into trouble, as the XXRet value will be clobbered with the "--array--"
string for a bit...

I suggest using wprint() to print stuff. wprint() will correctly direct
output to the console or log file, depending on command-line options. Use
verbose() to print stuff only when the verbose switch is used, and
debugprint() to print stuff only when the debug switch is used.

You can make http requests by using sendhttp() and sendraw(). *ALL* the
networking functionality (sockets, connects, etc.) is contained *ONLY* in
sendraw(). Use rdecode() to decode the server's return value to a
human-readable string.

For code hackers, I have indicated within the code where to add commands
and where to add scan database pre-process code, anti-IDS functions, scan
bounce stuff, etc.

----[ Wish-list and future updates

Well, the most obvious one I can think of is more rigorous language
parsing. Obviously a flex/yacc combo would be kickass, but I don't want to
port it off perl.
whisker, as it is, is a useful demonstration of theory...but hey, if you
want to port it to C, go for it. Why didn't I just port it to C? Well,
mostly because perl's auto-allocation is such a blessing, especially to my
nasty array permutation code in the scan function. Plus, eval was a nice
feature, and I just like perl all around. :)

----[ What's to become of web scanners

My hope is that rather than make *another* cgichk.c, port it to rebol, add
a few checks, etc., people will use whisker as the engine, and just code
cool suave kickass scan databases that are intelligent and take advantage
of the features. I'd like to see a program that can pre-process a scan
database and optimize the scans--this actually wouldn't be all that hard.
But I dunno, maybe people will just think whisker is stupid and I'll be
laughed at. So be it.

----[ Whisker in the news!

Infoworld did a writeup of whisker:
http://www.infoworld.com/articles/op/xml/99/11/29/991129opsecwatch.xml

----[ Signoff

Well, if you haven't thought I'm crazy by now for putting this much thought
into a CGI scanner, then maybe there's hope for me. :) Whisker really came
about for two reasons: 1. I needed something I could easily script web
audits with (I was tired of rewriting C all the time), and 2. I wanted to
make a proof of concept of the 'next-generation' web scanner. So here it
is.

Now granted, my perl coding isn't the best, and I'd love for someone to
recode the scan function...that directory permutation stuff is scary code.
But it works, and that's good enough for me. :)

Drop me a line if you like/use whisker, and definitely send me snippets of
interesting scan scripts you make...I would like to compile a nice big one
with lots of intelligence. Also, if you have ideas for, or bugs in, the
code, let me know.

Till next time!

rain forest puppy (rfp@wiretrip.net)

----------------[ whisker is GPL. Do not steal. Do not pass go.
----------------[ and definitely do not collect $200 for my work.
-[ Yes, this document uses a Phrack-esque layout.

----[ EOF