---------[ Whisker: next-generation CGI scanner doc v1.3.0

--[ by rain forest puppy / ADM / wiretrip (rfp@wiretrip.net)

----[ Table of Contents

+General
+- - Background
|   - What whisker does/has
|   - array
|   - scan
|   - Command line reference (feature overview)
|   - Unix multi-threaded front-end use
+- - Usage notes
+Technical
+- - Global variable list
|   - Language reference
|   - Advanced coding tekniq
|   - Logic evaluation
|   - Scan optimization
+- - Notes for eval/internal coding
+Misc
+- - Wish list/future enhancements
|   - What's to become of web scanners
|   - Whisker in the news!
+- - Signoff

----[ Background

A CGI scanner is just a CGI scanner, right? And they're pretty lame apps to boot, right? Hmmm...well, perhaps. That's because no one has given any thought to them. Yeah, until I did. Perhaps I have too much time on my hands. ;) After reading this, I will be surprised if you don't think I've put way too much thought into this.

I've waded through the pile of CGI scanners found on Packetstorm (before JP got his way; j3rk), Rootshell, etc. Team Void's VoidEye and the cgichk.* are the most comprehensive....but that seems to be the 'goal' they shoot for--try to have 'the most checks in any scanner'. Great. Never mind the fact that some of the checks are completely wrong (I think it's funny to notice how the Cold Fusion '/expeval/' has propagated to so many scanners as '/expelval/'--one kiddie made a mistake, and they all copied.)

Wait...CGI scanning isn't that complex, is it? Well, to do it right, yes. Why? Hmmm...I can think of a few reasons:

1. /cgi-bin is pretty damn common, I'll give you that. But I've also been on many a hosting provider that used /cgi-local. And I've seen people use /cgi, /cgibin, etc. Fact of the matter is that it could also be /~user/cgi-bin, or /~user/cgis, etc. Then there's some scripts that are all over the place, like wwwboard, which may or may not have its own directory.
Point of the point: wouldn't it be nice to define multiple directories?
2. You know what really irks me? Seeing a CGI scanner thrash around through /cgi-bin or whatnot, when /cgi-bin doesn't even exist. Talk about noisy in the logs. Now, if we waste a brain cell, we can see that if we query the /cgi-bin directory (by itself), we'll get a 200 (ok), 403 (forbidden), or 302 (for custom error pages) if it exists, or a 404 if it doesn't. Wow. So if we just do a quick check on /cgi-bin, and get a 404, we can save however many /cgi-bin CGI checks we were going to make. That could save you 65 entries in the httpd logs.
Point of the point: save noise/time by querying parent dirs

3. If you have more to spare, let's waste another brain cell on another obvious issue. Why should I query for, say, test-cgi on an IIS server? Or /scripts/samples/details.idc on Apache? Why should I even bother checking various httpds at all (like a firewall proxy, etc)? When we do a request, the server gives us its name and version. How nice of them. How about we take advantage of their generosity?
Point of the point: tailor your scan to the server you're scanning

4. Virtual hosts. Most webservers nowadays (especially Apache with its VirtualHost directive, and IIS with its virtual host setup wizards) allow you to assign many actual domain names/websites to the same IP. Well, hell...how does the server know which site you want when you connect? Well, browsers give a second piece of information, the 'Host' directive. So, a request may look like:

        GET /~rfp/index.html HTTP/1.1
        Host: www.el8.org

So say we have SlikWilly Virtual Hosting; they run off RedHat Linux using Apache. They set up their only IP (as that's all they could afford for their $39.95/month shared DS0) to host the site www.slikwilly.com. Now, on the actual box, the location for their files is /home/httpd/html/ for html files, and /home/httpd/cgi-bin/ for, what else, their CGI apps. So a request to www.slikwilly.com/index.html is going to be pulled from /home/httpd/html/index.html.
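(A quick aside on point 2 above: the parent-directory pre-check really is only a couple lines of code. Here's a rough sketch in Python--an illustration of the idea, not whisker's actual code, and the function names are mine:)

```python
import http.client

def should_scan(status):
    """Decide, from the HEAD status on the directory itself, whether the
    per-CGI checks underneath it are worth doing. 200 (ok), 403
    (forbidden), and 302 (custom error page) all mean the directory is
    there; only a clean 404 lets us prune the whole batch of checks."""
    return status != 404

def dir_exists(host, directory, port=80):
    """HEAD the parent directory by itself and apply the rule above."""
    conn = http.client.HTTPConnection(host, port, timeout=15)
    try:
        conn.request("HEAD", "/" + directory.strip("/") + "/")
        return should_scan(conn.getresponse().status)
    finally:
        conn.close()
```

One HEAD request up front can potentially save dozens of log entries per missing directory. Now, back to Slik Willy.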
So far, so cool. Well the powers that be at Defcon decide that they've had it with catalog.com, since ADM hacked their webpage there. They want to move over to SlikWilly.com in hopes that it will keep those ADM people from changing the site. So Slik Willy himself hops into his httpd.conf and adds a VirtualHost directive for www.defcon.org. He sets up the html directory to be /home/defcon/html/, so that those Defcon people can ftp in via his nifty wu-ftpd-2.4.2(beta 18). So that means that www.defcon.org/index.html should be pulled from /home/defcon/html/index.html. Slik Willy also gives them their own cgi-bin, located in /home/defcon/html/cgi-bin/ (which means it's no silly aliased directory, since Slik doesn't understand all that stuff).

So, now, in this situation, www.defcon.org is a *virtual* site off of www.slikwilly.com (the root site). What exactly does that mean in practice? Well, let's see:

If I give the request:
        GET /index.html HTTP/1.0
I will get back the file at (assuming it exists):
        /home/httpd/html/index.html
which is Slik Willy's file (www.slikwilly.com)

If I check for:
        GET /cgi-bin/test-cgi HTTP/1.0
I will be checking for:
        /home/httpd/cgi-bin/test-cgi
which is again Slik Willy's file (www.slikwilly.com)

Now, if I check for:
        GET /index.html HTTP/1.0
        Host: www.defcon.org
I will get back:
        /home/defcon/html/index.html
which is the www.defcon.org homepage

Similarly:
        GET /cgi-bin/test-cgi HTTP/1.0
        Host: www.defcon.org
I will be checking:
        /home/defcon/html/cgi-bin/test-cgi
which is in www.defcon.org's cgi-bin.

Now, why does any of this fscking matter whatsoever? Well, imagine you wanted to be like ADM, and try to hack www.defcon.org again. So you whip out your trusty cgichk.c CGI scanner (oooh, you hacker you) and rev it up against www.defcon.org. Well, guess what--the scanner connects to Slik Willy's box, does generic requests (no Host), and winds up scanning Slik Willy's cgi-bin for cgis, not the actual www.defcon.org's cgi-bin.
And there exists the possibility that www.defcon.org had way cooler stuff than Slik Willy. But lemme just make it known, this usually works in your favor. For instance, on IIS, the virtual hosts will *NOT* (unless specifically added) have /scripts mapped to them--but the root site will. So, trying to GET /scripts will work off the main (generic) site, but if you try a virtual host with the Host directive, most likely /scripts won't be mapped over. Same for Slik Willy. test-cgi comes by default in /home/httpd/cgi-bin/, not /home/defcon/html/cgi-bin. So scanning the root site is better to find the 'default' install CGIs.
Point of the point: there's a whole 'nother world out there hiding behind virtual hosts--and you may not be scanning who you think you really are

5. Some places use custom error pages. Unfortunately, the implementation is such that instead of generating a 404 'not found', you always get a 200 'success', with HTML to indicate the missing page.
Point of the point: being able to minimize this anomaly would lessen false positives

6. More wishes: new CGI and webserver problems are found at a decent rate. Plus, I might like to customize which scans I want to do against a particular host. Having to edit C code and recompile every time could quite severely suck, especially if I'm a lousy C coder to boot.
Point of the point: if this was all scriptable, that'd be nifty

7. Input sources. I dunno about you, but I'm quite tired of doing bizarre awk/host -l combos, dumping them to a file, and then feeding them back into the various scanners. Sometimes I want to just feed in output from nmap (after all, it has a list of the found open port 80s, right?), sometimes just a laundry list of IPs/domains, and sometimes, I'd just like to do a single host on the command line.
Point of the point: flexibility of input would be nice as well.

8. IDS/log avoidance. Do you know how many IDS alarms you'll set off by requesting /cgi-bin/phf?
Let alone it's easy to spot in the logs. So instead of just handing over the plaintext, why not URL encode all/part of it to break up the literal plaintext string, such as /cgi-%62in/ph%66. It keeps the string-matching/packet-grep IDS systems from getting a positive id, and the more encoded you make it, the harder it is to figure out what it is (on the flip side, it also stands out more in the logs, even if it's unknown what /%63%67%69%2d%62%69%6e/%66%69%6e%67%65%72 is really scanning for).
Point of the point: being able to spoof IDSs would be a nice feature

Well, that's enough wishes, don't you think? Now, do they come true....

----[ Whisker has all that, plus a bonus feature or two :)

Yeah, no kidding. Come on, I wouldn't wish for something that I didn't actually implement. I'd look dumb. :) My future wishes are down below at the end. Anyways, so whisker does all that. Let's look at the two basic functions of whisker, array and scan. This is a reprint of the command reference below, but a little more verbose.

-[ array {name} = {comma delimited list}

This is one of the two core commands of whisker (the other being scan). Basically, you make an array named {name} with elements from your comma delimited list. This array is then referenced as @{name}, and given to the scan function to scan the permutations of the names in the @array. You can include another array in the list of elements...it will be added inline.

Example:
# let's make an array of common unix cgi locations
array roots = cgi-bin, cgi-local, cgibin, cgis, cgi
array first = a,b,c
array second = d, @first, e
# second = d,a,b,c,e
array bigroots = cgi-bin, cgi-bin/secret, cgi-bin/rfp
# this is a big NO!
array moreroots = cgi-bin/@first, rfp/@bigroots
# only the scan() function will parse roots like this

-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker. This command is what actually performs the scanning. There are a few aspects to the command.
First is the {optional server regex}. You can do a server specific scan one of two ways:

server (iis)
scan () scripts/tools >> getdrvrs.exe
endserver

or shorten it as:

scan (iis) scripts/tools >> getdrvrs.exe

Scan will only do the check if the server regex is () or matches (similar to the server command). Now, {dirs} and {script} are required. {dirs} is a comma delimited list of directories to check to see if {script} exists. {dirs} may also contain arrays made with the array command. Let's see some examples:

scan () cgi-bin, cgi-local >> my.cgi
        will check for /cgi-bin/my.cgi and /cgi-local/my.cgi

scan () a/b, a/c, a/d >> my.cgi
        will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

array subdirs = b,c,d
scan () a/@subdirs >> my.cgi
        will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

scan () @subdirs >> my.cgi
        will check for /b/my.cgi, /c/my.cgi, /d/my.cgi

scan () a, a/@subdirs, f/@subdirs/g >> my.cgi
        will scan for all those permutations, expanding out @subdirs into every combo involving the elements in @subdirs.

So you see how powerful directory arrays can be. If we have an array of places we want to look for CGIs

array roots = cgi-bin, cgi-local, scripts
array people = ~rfp, ~adm, ~wiretrip

we then can scan for wanted combos

scan () @roots, @people/@roots >> my.cgi

this is nice because we only have to adjust our arrays to compensate for different locations, and we can use the arrays for all our scans in the program. How centralized. :)

Also, by request, you can specify multiple files to scan with the following syntax:

scan () @roots >> file1, file2, file3, file4

Note that this breaks evaluation logic (like 'info' and 'ifexist'). You can specify the root directory by using a single /, as such:

scan () / >> index.html

whisker automatically checks each directory as it goes in {dirs}, and caches the response. See 'Advanced coding tekniq: Optimized Scans' for more information on how to (ab)use this command properly.
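If it helps to see what scan is doing with those @arrays, here is a rough sketch of the expansion in Python--my own illustration, not whisker's actual (perl) parsing code:

```python
from itertools import product

def expand_dirs(dirs, arrays):
    """Expand a comma-delimited {dirs} list, fanning out each @name
    component into the elements of that named array."""
    out = []
    for entry in (d.strip() for d in dirs.split(",")):
        parts = entry.split("/")
        # an @name part becomes a multi-way choice; a literal part
        # is a single-element choice
        choices = [arrays[p[1:]] if p.startswith("@") else [p]
                   for p in parts]
        out.extend("/".join(combo) for combo in product(*choices))
    return out

arrays = {"subdirs": ["b", "c", "d"]}
print(expand_dirs("a, a/@subdirs, f/@subdirs/g", arrays))
# -> ['a', 'a/b', 'a/c', 'a/d', 'f/b/g', 'f/c/g', 'f/d/g']
```

Each expanded directory then gets {script} appended and checked, with the per-directory responses cached as described above.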
So basically, you define arrays of directories (although that's optional), and use scan to scan for the scripts. Easy enough. Plus, there's a suite of other simple logic to help out in our scanning endeavours. You can take a peek at the included sample scan.db to see usage, if you're a learn by doing/example type person. Anyways, onto using whisker....

----[ Commandline reference

Here is the commandline reference ripped from whisker itself:

Usage: whisker (options)
        -n+     *nmap output (machine format, v2.06+)
        -h+     *scan single host (IP or domain)
        -H+     *host list to scan (file)
        -F+     *(for unix multi-threaded front end use only)
        -s+     specifies the script database file (defaults to scan.db)
        -V      use virtual hosts when possible
        -N      query Netcraft for server OS guess
        -S+     force server version (e.g. -S "Apache/1.3.6")
        -u+     user input; pass XXUser to script
        -i      more info (exploit information and such)
        -v      verbose. Print more information
        -d      debug. Print extra crud++ (to STDERR)
        -l+     log to file instead of stdout
        -I 1    IDS-evasive mode 1 (URL encoding)
        -I 2    IDS-evasive mode 2 (/./ directory insertion)
        -I 3    IDS-evasive mode 3 (both 1 and 2)
        -I 4    IDS-evasive mode 4 (premature URL ending--NO APACHE)
        -I 5    IDS-evasive mode 5 (long URL)
        -I 6    IDS-evasive mode 6 (fake parameter)
        -A 1    alternate db format: Voideye exp.dat
        -A 2    alternate db format: cgichk*.r (in rebol)
        -A 3    alternate db format: cgichk.c/messala.c (not cgiexp.c)
        -p+     proxy off x.x.x.x port y (HTTP proxy--see docs)
        -P      request and format proxy list from fsu.virtualave.net
        -B 1    bounce off of altavista.com (and netcraft.com)
        -B 2    bounce off of samspade.org
        -B 3    bounce off of anonymizer.com
        -B 4    bounce off fsu.virtualave.net proxy list (random)

+ requires parameter; * one must exist

You have three input options, -n, -h, or -H. You must have at least one, but you can use multiple.
They are:

-n nmap.file
        supply a nmap (v2.06+) *MACHINE FORMAT* output file; you can get this by using nmap -m nmap.out. Whisker will read it in and check every host with port 80 found to be 'open'.

-h {ip or domain}
        single host. Just supply the host on the commandline, such as "-h www.microsoft.com"

-H host.file
        this is essentially a laundry list of ips and/or domains, one per line, like so:
                www.microsoft.com
                www.sun.com
                123.123.145.167

The other important option is -s, which lets you specify the scan database to use. Starting with version 1.3, whisker will now default to scan.db, so you do not need to specify -s unless you want to use a different database.

Also starting with version 1.3, you can now specify alternate style lists of CGIs to scan. Whisker can read in VoidEye's exp.dat file, cgi-chk.r (written in Rebol), cgichk.c (the non-hex encoded one), and messala.c. Whisker will actually convert those files to whisker-compatible scan databases, and then will even apply some logic to them. And as a bonus, whisker will even fix the 'expelval' bug (it should be 'expeval') that so many of the scanners have copied from each other (even though it's wrong). To use an alternate format, you have to use the -s option to specify the location of the alternate file, and the -A option to specify the type, like so:

For exp.dat (VoidEye) files:
        whisker.pl -s exp.dat -A 1
For cgi-chk.r (Rebol) files:
        whisker.pl -s cgi-chk.r -A 2
For messala.c, cgichk.c, and other generic .c files (NOT cgiexp.c):
        whisker.pl -s messala.c -A 3

-V tells whisker to attempt to use virtual host domains wherever possible. If you're scanning an IP address, -V won't do anything. But if you're scanning a domain name, whisker will include the domain name in the Host: directive.

-v will print more verbose information (to console or logfile).

-d will print debugging information to STDERR (usually console).

-i will include information specified by the 'info' command.
-l log.file will redirect all information to log.file

-p x.x.x.x y is the proxy option. No, this is not SOCKSified, or anything else. Basically, -p is for firewalls and such where you connect to the PROXY, and then issue:

        GET http://my.target.webserver:80/page/i/want.htm HTTP/1.0

x.x.x.x is the IP address, and y is the port OF THE PROXY.

-u is just a simple way to give information on the commandline that is placed directly into the XXUser variable for use within the script. That way, you can have a configuration switch externally; like "-u 1" will be normal scans, and "-u 2" will be extra-stealthy scans, etc. Use "if XXUser == ???" inside the script to query the value.

-I tells whisker to invoke anti-IDS techniques. Whisker originally came with two types (the old -I is now -I 1, and the old -E is now -I 2). Version 1.3 added three more types (-I 4-6). Types 4, 5 and 6 are new and experimental, and have not been fully tested. Descriptions of each type:

-I 1 -- will "URLify" the request line, as spoken about earlier. It will encode all letters, numbers, dashes and dots as their hex escaped sequence equivalent.

-I 2 -- will replace all / with /./, which breaks up the string (i.e. strings become /./cgi-bin/./some.cgi)

-I 3 -- both options 1 & 2 (tested and found to be very effective against most IDSes)

-I 4 -- Experimental new evasion tactic. Whisker tries to take advantage of improperly coded regexes in IDSes by 'faking' the end of the URL request by sending:

        METHOD /%20HTTP%2F1.1%0D%0A%0D%0A/../../some.cgi HTTP/1.1

which turns out looking like:

        METHOD / HTTP/1.1\r\n\r\n/../../some.cgi HTTP/1.1

If an IDS improperly stops at the fake HTTP portion, it will miss the actual cgi request. NOTE: this does not work with Apache; IIS, on the other hand, takes it no problem.

-I 5 -- Some IDSes only look within the first xxx bytes of the URL, assuming that the request is the first thing to come in the packet.
Whisker tries to avoid this by sending a very long (approx 1-2K) directory, followed by a /../ and then the name of the CGI to scan. Note that this method is a tad slower (it has to generate lots of random strings) and is MAJORLY obvious in the web server logs. The XXIDSMode5Limit variable controls how much data is sent (in approx. 15 byte units).

-I 6 -- Like the premature URL end tactic (mode 4), this mode tries to outsmart the 'smart' IDSes, which think it's ok to scrap the URL after the parameters (?param=..). Whisker will fake a parameter like so:

        METHOD /index.htm%3Fparam=/../some.cgi HTTP/1.1

which turns out looking like:

        METHOD /index.htm?param=/../some.cgi HTTP/1.1

Note that modes 4 and 6 play on the fact that IDSes will typically reconvert encoded characters (%20, %3F, etc) back before testing...this isn't always the preferred approach. 'Packet grep' style IDSes will be able to avoid these problems, since they just scan for the particular CGI key-words...regardless of how they appear in the string. 'Smart' IDSes, on the other hand, which try to interpret around the HTTP protocol (and take shortcuts), may fall prey to these tactics.

-N causes whisker to query Netcraft (www.netcraft.com) and see what they think the OS is. Not 100% reliable, but it's a start, and fairly accurate.

-S lets you override what server whisker parses the script as. You submit a server string, such as -S "Apache/1.3.1 PHP/3.0.2a". Useful for situations in which whisker can't determine the server type, and you want to force it to assume a particular one.

-B are the new 'bounce' scan methods. This causes whisker to bounce scans via other types of servers, so that you do not directly contact the server--it adds a layer of obfuscation to your scan. Currently there are 4 types of bounces:

-B 1 -- bounce off of AltaVista.com (by Philip Stoev).
A very good scan, since AltaVista is a high capacity site, and not likely to keep track of what/where their crawler indexes, let alone who submitted the request. However, AltaVista does not return page content, and an initial request to Netcraft needs to be made to figure out the server type (since AltaVista does not report such information).

-B 2 -- bounce off of Samspade.org (by Styx). Whisker will reroute scans to use the anonymous proxy found at www.samspade.org.

-B 3 -- bounce off of Anonymizer.com. Reroute scans to use the anonymous proxy found at invis.free.anonymizer.com.

-B 4 -- distributed proxy scanning. This is a special scan type, where whisker will actually send each CGI scan request (could number up to ~160) through a different, RANDOM public proxy on the internet. You need to initially download the proxy list from fsu.virtualave.net (which is typically 500-600 proxies) using the -P command. Once done, you can use the -B 4 bounce to reroute each scan through a different proxy on the list. Note that this method is slow, as a new server has to be contacted for each scan. Whisker will handle rescanning if the proxy times out or is down.

That's it, go play! Use the included scan.db for reference. The rest of this is technical information and whatnot.

----[ Unix multi-threaded front-end use

With version 1.3, I've included a script named 'multi.pl'. This is the multi-threaded (well, multi-forking) frontend for whisker. By default, multi.pl will run 5 whiskers in parallel to speed up your scans. To use multi.pl, you pass the exact same options as you would normally pass to whisker.pl--there are no option changes. multi.pl will internally figure out how to divvy up the work, and then call whisker.pl with the appropriate options.

Note: multi.pl is only useful when you are scanning multiple hosts; it is impossible for whisker to run parallel scans for the same host, since it breaks all the logic and dependencies.
Also, the -l option is not available for multi.pl; instead, either redirect to a log (>whisker.log), or pipe to the 'tee' command (|tee whisker.log).

----[ Usage notes

Particular options and modes in whisker require the current working directory to be writable by yourself. If you abort (CTRL-C) whisker during a scan, there may be leftover temp files.

Whisker will now automatically rescan with dumb.db; there is also a feature in whisker that will attempt to still identify the server type (guess) before it rescans with dumb.db--this feature is highly enhanced when used in combination with an nmap input file that has OS identification.

Windows users should read the install.txt that comes with whisker.

----[ Global variable list

These are the variables accessible from within the script. Why all the prefixed XX's? So you're less likely to clobber them. :) I suggest you don't poke values into these unless you know what you're doing.

** Note: this list is not complete, due to time constraints. Check my website for updated documentation and a full list.

Name            Default value   Description
XXPort          80              port to scan...80 for normal webservers
XXRoot          ""              default prefix for URLs...
XXMeth          HEAD            how to retrieve the file...HEAD preferable
XXVer           HTTP/1.0        http version for whisker to use
XXDebug         ?               do we want debug output
XXVerbose       ?               do we want verbose output
XXProxy         0               are we using a proxy
IP              ?               ip address of target to connect to
XXTarget        ?               actual target ip (host/ip modified for proxy scans)
XXBadMeth       1               bad method compensation if 400 or 500
XXSStr          ?               return server software string
XXRet           ?               http return code of page
XXRetStr        ?               http return string
XXSVer          ?               http version return from server
XXIDS           0               whether or not to use IDS spoofing
XXForce         0               force scan(), regardless of server (used for dumb.db)
XXForceS        0               force server() comparisons as well
XXCLLeak        ?               content-location leak
XXIsIndex       0               is it a directory index?
XXStopOnDir     0               stop on a directory index
XXCLen          0               content length
XXAVHide        0               use AltaVista scan bounce? (-B 1)
XXAnonymizer    0               use anonymizer scan bounce? (-B 3)
XXInited        0               has runinitial been called?
XXNetcraft      0               check with netcraft?
XXNetcraftOS    ?               netcraft return results
XXNetcraftSStr  ?               netcraft return results
XXReferer       1               send referer with each request?
XXGiveCookie    1               give back any cookies?
XXRescanDumb    0               do we need to rescan using dumb.db?
XXNoContent     0               stop after the headers, even on GET
XXTimeoutVal    20              timeout value per check, in seconds
XXIDSMode5Limit 100             approx. limit * 15 = IDS mode 5 length
XXUseSSL        0               use SSL on unix?
XXUserAgent     Mozilla/4.7 [en] (Win95; U)
XXSSLPath       /usr/local/ssl/bin/openssl

Proxy info (don't recommend you play with it): XXProxy_addy, XXProxy2port, XXP_target
Cached inet_aton() result: XXinet_aton

----[ Language reference

Ok, here's the commands that whisker supports in its scripts.

****NOTE: all {} are visual delimiters for viewing only--they are not to be included. If you see something like ({variable}), that means that the () are required, but the {} are not. Also, all mentions of regexes are case insensitive; however, variable names *ARE* case sensitive; commands are not.

-[ # {comment}

Just your usual, everyday comment. COMMENTS MUST BE ON THEIR OWN LINE!

Example:
# this is a comment, and won't be executed.

Bad bad bad:
server (iis) # if the server has IIS...

-[ print {something to print}

Print out {something to print}. Duh. Starting with version 1.3, you can now use embedded $variables, or \n, \r, or \t. You can NOT escape them (i.e. \$, \\n, etc. will not give you the literal characters). Ending with a \ keeps whisker from printing a new line.

Example:
print This will be printed to screen or logfile, depending on switches
print Return code was $XXRet\n tab->\t<-tab
print This is a \
print line continuation (all on one line)

-[ printvarb {variable name}

This will print out the contents of the single variable {variable name}.
(variable name is case sensitive) NOTE: deprecated by $variable support in print statements.

Example:
printvarb XXRet

-[ exit

This will 'exit' the scan for the current host, and move along to the next host to scan.

Example:
exit

-[ exitall

This will immediately exit the program altogether, right then and there.

Example:
exitall

-[ if {variable} {== or !=} {value} (w/ endif)

Your standard logic test. If {variable} is equal (==) or not equal (!=) to the constant {value}, execute up to the first endif. NOTE: whisker uses a quasi-equality/test system that's more convenient in this type of situation. If {value} is a numeric value (all numbers), then whisker will use a pure "if variable is equal to value" test. However, if {value} is a string (does not contain all numbers), then it uses a regex instead, which is more along the lines of "if value is contained within variable". This is nicer for matching string partials, etc; granted, you don't want whisker returning 'true' just because "20" is found within "200", when "20" is in fact not equal to "200"--hence the pure equality test for numbers.

Example:
if XXRet == 200
print The page was found
endif

if XXRet != 200
print The page was NOT found
endif

-[ ifexist (w/ endexist)

This command is equivalent to 'if XXRet == 200', and evaluates as true if the resulting check came back 200 (meaning the page exists). *Note: right now it's hardcoded to the return value of 200...this will be changed to be user-definable in the future.

Example:
scan () cgi-bin >> test-cgi
ifexist
print They have the test-cgi CGI
# other stuff to do
endexist

-[ server ({server regex}) (w/ endserver)

This is basically a 'if the server string contains the string {server regex}, evaluate it as true'. {server regex} is case insensitive, and required. Everything up to the first 'endserver' is evaluated. Starting with version 1.3 you can now use a variable in place of the regex.
Example:
server (iis)
# stuff to do if server string has 'iis' in it
endserver

set name=Apache
server ($name)
# stuff to do if server string has 'apache' in it
endserver

-[ set {variable} (.)= {value}

This will set the variable {variable} to {value}. {value} can either be a constant you supply, or another variable name that starts with '$'. You don't need to worry about pre-allocating a variable...it will automatically be created on its first use. {variable} and {value} are required. The '$' on {variable} is assumed, and can NOT be used. Variable names and the values you assign are case sensitive. Starting with version 1.3, you can use .= to append a value, rather than replace the value of a variable.

Example:
set XXMeth = GET
set MyReturnValue = $XXRet
set XXMeth .= MORE
# XXMeth is now GETMORE

Bad bad bad:
set $MyReturnValue = Some_value_to_assign

-[ startgroup

Reset the group counters, and start tracking group scans. Essentially this lets you see if a full group of files exists. Note that a 'group' is 'true' if all scans done since a startgroup have returned successfully. If any one scan in the group returns false, the 'group' is evaluated as false (used with ifgroup, below).

Example:
startgroup
scan () cgi-bin >> phf
scan () cgi-bin >> webdist
ifgroup
print Wow, they have phf AND webdist!
endifgroup

-[ ifgroup (w/ endifgroup)

Evaluate the last scans since startgroup, and process if all scans were successful. See startgroup for more information and an example.

-[ info {stuff to print}

Print information if the -i switch has been used and the last scan was successful. This should be used to provide more information (exploit info, informational links, notes, etc) about a successful scan. Version 1.3 lets you use $variables, \n, \r and \t, similar to 'print'.

Example:
scan () cgi-bin >> phf
# print this stuff if they have used -i, and phf exists
info Oh my god! They have phf! How lame...
info But then again, it could be one of those phf logger traps

-[ ifinfo (w/ endinfo)

Evaluate and process if the -i switch was supplied. Note that ifinfo allows you to do more than just print information (you can put any whisker code in the block), and it does not consider the return status of the last scan.

Example:
server (Apache)
ifinfo
# print this stuff only if it's an Apache server and the -i switch was used
print They're running Apache, in case you didn't notice
# run any other commands here too
endinfo
endserver

-[ usehead

Sets the default method to 'HEAD', while also saving what the current method was (which can be restored with restoremeth).

Example:
usehead
# this will now use HEAD
scan () cgi-bin >> phf
restoremeth

-[ useget

Sets the default method to 'GET', while also saving what the current method was (which can be restored with restoremeth).

Example:
useget
# this will now use GET
scan () cgi-bin >> phf
restoremeth

-[ usepost

Sets the default method to 'POST', while also saving what the current method was (which can be restored with restoremeth). *Note: whisker automatically adds the required headers for using POST requests. You can set what information is actually posted in the XXPostData variable--whisker will automatically compute Content-Length.

Example:
usepost
# this will now use POST
# use this if you want to submit extra post info
set XXPostData = somevarb=crap&whatever=morecrap
scan () cgi-bin >> phf
restoremeth

-[ restoremeth

Restore to whatever (default) method was chosen before you ran a usehead, useget, or usepost command. You should note that this is not implemented in stack fashion...if you useget, then usepost, then usehead, restoremeth will then revert to the *PRIOR* method, or in this case, POST. Therefore you should always restoremeth before you use a different use* command, or you will lose the default method.
Example:
# default scan type is HEAD
useget
# now we're GET
scan () cgi-bin >> phf
restoremeth
# we're back to HEAD
usepost
# now we're POST
scan () cgi-bin >> webdist
restoremeth
# we're back to HEAD

Wrong:
# default scan type is HEAD
useget
# now we're GET
scan () cgi-bin >> phf
usepost
# now we're POST
scan () cgi-bin >> webdist
restoremeth
# we're back to GET, we've lost our HEAD default.

-[ savemeth

Essentially does the save operation that useget, usepost, or usehead do (which can be 'undone' with restoremeth). This is here in case you want to do more funky stuff with the XXMeth variable (for instance, use TRACE, OPTIONS, or set it to * for the various test-cgi vulnerabilities).

Example:
savemeth
set XXMeth = TRACE
restoremeth

-[ insert {file}

Insert the code found in {file} (if it exists) into the script at that point. Note that this is a pre-processing command, done before whisker even thinks of scanning a host. Be aware that inserting a file into a condition is *REALLY* tricky.

Example:
insert servers.db

-[ fingerprint .{extension} {action}

This is the initial implementation of return code/page fingerprinting (discussed in detail below). Basically it causes whisker to verify that a request with the specified {extension} does not return a 200 (for example, Cold Fusion returns a 200 OK for any .cfm request by default on IIS--which makes it appear as if every .cfm request does indeed exist). Valid actions at this point are skip and exit. Note that fingerprint is a pre-processing directive for each command--this means no matter where the fingerprint command is located in the file, it is run *first* before anything else is run for that host. If action is skip, and whisker determines that the scanned host returns 200 OK results for that extension, it will just skip any scan with that extension (and fake a 404 Not Found reply). If action is exit, it will print a notice that it exited on fingerprint catch, and move onto the next host.
A good example of usage would be scanning www.harley-davidson.com--a
request for practically anything (.cgi, .pl, etc.) will result in a custom
error page, which comes back as 200 OK. All other scanners will flag this
as 'file exists'. With whisker you have the option of detecting this
anomaly and being alerted to it.

How whisker fingerprints: right now, the implementation is simple. Whisker
generates a random 20-character string, slaps on your extension, and
requests it--assuming that it won't exist. If it comes back 200 OK, then
whisker figures all future requests for that extension are tainted and
implements the fingerprint action handler for that particular extension. In
the future this will evolve and become more robust, but for now, it's more
than adequate.

Example:

# skip Cold Fusion files, if they all come back 200 OK
fingerprint .cfm skip
# skip this host if every .cgi comes back as 200 OK
fingerprint .cgi exit

-[ eval (w/ endeval)

Eval lets you embed raw perl code into your script to do whatever you want.
This gives your scripts unlimited functionality. See the end of this doc
for eval/raw perl notes on whisker internals. Note that everything between
eval and endeval is put into a variable, and then just run through perl's
eval() function. NOTE: EVAL IS SLOW. The perl interpreter has to do its
thing, and it is time consuming. Just a warning.

Example:

eval
print STDOUT "This is a raw perl command\n";
print "wow, you have a passwd file\n" if(-e "/etc/passwd");
endeval

-[ array {name} = {comma delimited list}

Basically, you make an array named {name} with elements from your
comma-delimited list. This array is then referenced as @{name}, and given
to the scan function to scan the permutations of the names in the @array.
You can include another array in the list of elements...it will be expanded
inline. Array names and values are case sensitive. Starting in version 1.3
you can also use a $variable in the array values.
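For illustration only (this is not whisker's actual parser), the inline expansion and variable substitution rules could be modeled in Python like so:

```python
def expand_array(items, arrays, variables):
    """Expand a comma-delimited element list: '@name' pulls in another
    array inline, '$name' substitutes a variable (v1.3+), anything else
    is taken literally."""
    out = []
    for item in (i.strip() for i in items.split(',')):
        if item.startswith('@'):
            out.extend(arrays[item[1:]])      # inline array expansion
        elif item.startswith('$'):
            out.append(variables[item[1:]])   # variable substitution
        else:
            out.append(item)
    return out

arrays = {'first': ['a', 'b', 'c']}
variables = {'name': 'f'}
print(expand_array('d, @first, e, $name', arrays, variables))
# ['d', 'a', 'b', 'c', 'e', 'f']
```

This mirrors the `array second = d, @first, e, $name` example in the doc.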
Example:

# let's make an array of common unix cgi locations
array roots = cgi-bin, cgi-local, cgibin, cgis, cgi

set name=f
array first = a,b,c
array second = d, @first, e, $name
# second = d,a,b,c,e,f

-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker. This command is what actually performs the
scanning. There are a few aspects to the command. First is the {optional
server regex}. Scan will only do the check if the server regex is () or
matches (similar to the server command). Now, {dirs} and {script} are
required. {dirs} is a comma-delimited list of directories to check to see
if {script} exists. {dirs} may also contain arrays made with the array
command. {dirs} and {script} are case sensitive. In version 1.3, the server
regex can be a variable name (similar to 'server').

Examples:

scan (iis) scripts/tools >> getdrvrs.exe

array roots = cgi-bin, cgi-local, scripts
array people = ~rfp, ~adm, ~wiretrip
scan () @roots, @people/@roots >> my.cgi

-[ ifnmapinfo (with endnmapinfo)

Allows you to process commands if nmap information for that host is
available.

Example:

ifnmapinfo
# this host has nmap information available
print NMAP OS guess: $XXNmapOS
print NMAP TCP ports: $XXNmapTCP
endnmapinfo

-[ pingport {port number}

Lets you check to see if a particular port is open on the host. If nmap
information is available, the nmap port list will be consulted; otherwise
whisker will make an attempt to connect to that port on the host to see if
it's open (note: no stealth). Useful to check and see if other web ports
are open (port 8080, 3128, etc.). Use the normal 'ifexist', 'info', etc. to
determine if the port is open (XXRet is set to '200' if open, '404' if
closed).

Example:

pingport 8080
ifexist
print - port 8080 is also open
endexist

-[ runinitial

If present, whisker will delay the initial contact and fingerprinting until
runinitial is encountered; this lets you change variables before the scan
is run.
Note that before runinitial you can't use 'scan', 'server', or anything
else that contacts the server, other than 'pingport'. If runinitial is
found in the database, whisker delays processing until it is encountered.
Therefore, if you wish to scan multiple ports, make sure you specify
runinitial at the beginning of each port set. If 'runinitial' is not found
in the script, whisker automatically inserts it as the first command run.

Example:

# modify whisker to run on port 8080
pingport 8080
ifexist
set XXPort=8080
endexist
# set other variables and whatnot
runinitial

Wrong:

scan () / >> some.cgi
set XXPort=8080
# whisker won't start scanning until here
runinitial
scan () / >> some.cgi

Correct:

# for port 80
runinitial
scan () / >> some.cgi
set XXPort=8080
# now for port 8080
runinitial
scan () / >> some.cgi

-[ clearpagecache

Tells whisker to reset the cache of tracked pages, directories, etc. Needed
if you want to tell whisker to start scanning on a different port, or wish
to start over. If you don't, the directory caching will carry over to the
other port (i.e. if cgi-bin was found on port 80, it will automatically be
assumed found on port 8080, unless you clear the cache to cause whisker to
rescan).

Example:

# tell it to start off with port 80
runinitial
# look for some.cgi on port 80 (default)
scan () / >> some.cgi
# clear everything and set to look on port 8080
clearpagecache
set XXPort=8080
runinitial
scan () / >> some.cgi

----[ Advanced coding tekniq

-[ Logic evaluation

It's best to take a moment and explain how 'if', 'ifexist', 'server', etc.
work when evaluating logic. Basically, whisker is not block oriented, but
line/linear oriented. This leads to some nesting problems, but you can bend
the rules here and there. Now, let's say we have a simple 'if':

if XXRet == 200
print The page existed
endif

Now, what whisker will do is evaluate the 'if XXRet == 200'. If this is
true, it will just keep processing line by line.
If this is false, it will 'fast forward' to the first 'endif' it comes
across. Same for 'ifexist' (fast forward to first endexist) and 'server'
(fast forward to first endserver). So you can see how the following nesting
breaks:

# if number one
if XXRet == 200
# if number two
if XXRetStr == OK
print Page exists
# endif number one
endif
# if number three
if XXRetStr == Not OK
print Something is borked
# endif number two
endif
# endif number three
endif

Now, if 'if number one' is true, it will keep going line by line. Same for
'if number two & three'. But if 'if number one' fails, it will just fast
forward to the first 'endif' it finds, in this case 'endif number one'.
This means if XXRet does not equal 200, it *will still process* 'if
XXRetStr == Not OK'... Whisker will *NOT* fast forward to 'endif number
three'. So you can see how this can affect things.

Now, based on this, you can do some tricks. For instance, a logical AND can
be done like so:

if XX == True
if YY == True
print Both XX and YY are true
endif

Let's take a quick peek. If XX is true, it continues. If YY is true, it
still continues, and prints our message. If either fails, it just fast
forwards to the first endif. Simple enough.

Logical OR, on the other hand, kinda sucks:

if XX == True
set MyOR = 1
endif
if YY == True
set MyOR = 1
endif
if MyOR == 1
print XX or YY was True
endif

Yeah, way more code. I think you get the point, so I won't trace it.

Next is the simple IF/ELSE type structure. Whisker has an 'if', but not an
'else'. You can emulate it like:

if XX == 1
print It's 1!
endif
if XX != 1
print It's something other than 1!
endif

Again, simple stuff. I'm sure the question will come up: "why the hell
don't you just implement AND/OR and ELSE into whisker?". My answer is 1.
you can still do it with a bit more code, 2. I want to keep it simple
(stupid?), 3. the logic of doing so would start getting out of control, and
I don't want to get a formal language thing going. It's just a web scanner,
man.
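To make the fast-forward semantics concrete, here is a toy Python model of the line-oriented evaluation (a deliberately simplified sketch, not whisker's real code). The key detail is that a false 'if' skips to the FIRST 'endif', ignoring nesting:

```python
def run(lines, env):
    """Toy model of whisker's linear 'if' evaluation."""
    out, i = [], 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith('if '):
            _, var, op, want = line.split(None, 3)
            ok = (env.get(var) == want) if op == '==' else (env.get(var) != want)
            if not ok:
                # fast forward to the FIRST endif, not the matching one
                while i < len(lines) and lines[i].strip() != 'endif':
                    i += 1
        elif line.startswith('print '):
            out.append(line[6:])
        i += 1
    return out

# the logical-AND trick from above: both conditions must hold
script = ['if XX == 1', 'if YY == 1', 'print Both are true', 'endif']
print(run(script, {'XX': '1', 'YY': '1'}))   # ['Both are true']
print(run(script, {'XX': '0', 'YY': '1'}))   # []
```

Tracing the model also shows the nesting break: with a false outer 'if', execution resumes right after the first 'endif', so later inner 'if's still run.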
:)

server() will run code if a particular server type is found. But how do you
run code if it's *not* a particular server? Say I wanted to run something
if the server WASN'T Apache...

server (apache)
set apache=1
endserver
if apache != 1
# code to run if it's not apache
endif

That's all there is to it. Remember, the check/set variable/check again
procedure tends to work for most logic evaluation situations.

-[ Optimized scans

When I say 'optimized', I mean scans that are coded such that they produce
the minimal number of requests. We have the obvious example:

scan () cgi-bin >> 1.cgi
scan () cgi-bin >> 2.cgi
scan () cgi-bin >> 3.cgi
scan () cgi-bin >> 4.cgi

Here scanning for cgi-bin first is valuable; if it's not found, it will
save us 4 scans (3 if you count the scan for cgi-bin). If found, it will
cost us one additional scan (5 total). That gives us a worst case of 5
scans (if cgi-bin exists), best case of 1 scan (if cgi-bin does not exist).

Now, let's say we have:

scan () cgi-bin/a/b/c >> 1.cgi
scan () cgi-bin/d/e/f >> 2.cgi
scan () cgi-local/1/2/3 >> 3.cgi
scan () cgi-bin/g >> 4.cgi
scan () cgi-bin >> 5.cgi

Now, the trick is, whisker will scan for all dirs *individually*. This
means, for 1.cgi, it will scan:

cgi-bin/
cgi-bin/a/
cgi-bin/a/b/
cgi-bin/a/b/c/
cgi-bin/a/b/c/1.cgi

Wow, that's a lot of scanning. Same goes for 2, 3, and 4.cgi as well. All
together, with the above set of scans, we will be making 17 checks
(assuming everything exists). That's worst case 17, best case 2 (2 being
the checks for cgi-bin and cgi-local, when neither exists).

Now, optimization. The point of checking for the existence of parent dirs
is to speed up scanning of *many* scans that use that parent dir. So,
looking at our set, scanning for the existence of cgi-bin is a good thing,
because if it's not there, it will save us the rest of the checks for 1, 2,
4, and 5.cgi. But notice how the /a/b/c dirs of 1.cgi aren't shared.
There's no point in checking them individually, because they're not shared
with any of the other scans. What would be nice is if we could check to see
if cgi-bin existed (since knowing ahead of time will help with the others),
and if it does, just go straight to scanning /a/b/c/1.cgi. Well, we can.
Note the optimized scans below:

scan () cgi-bin >> a/b/c/1.cgi
scan () cgi-bin >> d/e/f/2.cgi
scan () / >> cgi-local/1/2/3/3.cgi
scan () cgi-bin >> g/4.cgi
scan () cgi-bin >> 5.cgi

Basically, whisker will do the following:

1. scan for /cgi-bin/ (in 1.cgi)
2. if /cgi-bin/ exists, scan for /cgi-bin/a/b/c/1.cgi right away
3. if /cgi-bin/ exists, scan for /cgi-bin/d/e/f/2.cgi right away
4. scan for /cgi-local/1/2/3/3.cgi right away (since no other scans use
   /cgi-local/ or its other dirs, there's no point in checking them
   individually)
5. if /cgi-bin/ exists, scan for /cgi-bin/g/4.cgi right away
6. if /cgi-bin/ exists, scan for /cgi-bin/5.cgi right away

Wow, we just went from 17 checks to 6 (assuming everything exists).
Granted, 5 checks originally went to checking for 3.cgi. Since we don't
need those parent dirs for other scans, we reduced it to one. That's a
worst case of 6, best case of 2. Much better than 17/2. And think of it
this way: would you rather have 6 log entries, or 17?

So you can think of the scan function as such:

scan (server) {individual dirs to scan} >> {one thing to scan as a whole}

And just remember, every dir in the 'individual dirs to scan' will cost you
a check (unless cached). So when scans share dirs in common (i.e. the scan
results will be cached), list the dirs there. Otherwise, you want to push
them to the 'scan as a whole' column. Here's a worst case scenario of
over-optimization:

scan () / >> scripts/tools/getdrvrs.exe
scan () / >> scripts/samples/details.idc
scan () / >> scripts/samples/ctguestb.idc

Now, this will force 3 scans. Even if /scripts/ doesn't exist, it will
still make 3 scans. Not as intelligent.
Now, one optimization would be:

scan () scripts >> tools/getdrvrs.exe
scan () scripts >> samples/details.idc
scan () scripts >> samples/ctguestb.idc

Now, if /scripts exists, it will cost us 4 scans. If /scripts does not, it
only costs us one. (That's a worst/best of 4/1.)

scan () scripts >> tools/getdrvrs.exe
scan () scripts/samples >> details.idc
scan () scripts/samples >> ctguestb.idc

Now, assuming /scripts exists, and /scripts/samples does too, this will
cost us 5 scans, which would be:

/scripts/
/scripts/tools/getdrvrs.exe
/scripts/samples/ (/scripts is cached)
/scripts/samples/details.idc
/scripts/samples/ctguestb.idc (/scripts/samples is cached)

If /scripts/samples didn't exist, but /scripts did, we'd have 3 scans. If
/scripts didn't exist at all, we'd have 1 scan. So really, the previous
optimization would be best (worst/best 4/1). This optimization (worst/best
5/3 (or 1)) would be good if there are other CGIs to check for in
/scripts/samples (making the cached dir checks of /scripts and
/scripts/samples of more use).

What it really comes down to is a numbers game, and somewhat psychology as
well. Directory caching works well when the cache is obviously hit many
times...and it's actually a penalty at other times. Look at the pros and
cons: you can have 10 /cgi-bin/xxx.cgi checks--just like your normal CGI
scanner--and cause 10 log entries, even if the scripts don't exist, which
stick out like a sore thumb. With whisker, first you have a log entry for
/cgi-bin/, which is much less obvious than, say, /cgi-bin/test-cgi. I mean,
/cgi-bin/, while suspicious, isn't as obvious. Now, if that check fails,
you don't have the other 10 log entries. You just saved yourself those 10
red flags. If the check passes, well, then it's worth the red flags to see,
right? After all, that's what the scanner is for. ;)

Granted, this is a very obvious result. But the numbers can be tweaked for
any of the optimization cases above.
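The worst-case numbers above are easy to reproduce mechanically. This Python sketch (my own illustration, not whisker's code) counts requests for a list of (dirs, script) scan entries, assuming every path exists and parent-directory checks are cached:

```python
def count_requests(scans):
    """Count checks for [(dirs, script), ...] entries, assuming all
    paths exist; each uncached parent dir costs one request."""
    seen, total = set(), 0
    for dirs, script in scans:
        path = ''
        for part in (p for p in dirs.split('/') if p):
            path += part + '/'
            if path not in seen:     # directory cache: pay only once
                seen.add(path)
                total += 1
        total += 1                   # the request for the script itself
    return total

unoptimized = [('cgi-bin/a/b/c', '1.cgi'), ('cgi-bin/d/e/f', '2.cgi'),
               ('cgi-local/1/2/3', '3.cgi'), ('cgi-bin/g', '4.cgi'),
               ('cgi-bin', '5.cgi')]
optimized = [('cgi-bin', 'a/b/c/1.cgi'), ('cgi-bin', 'd/e/f/2.cgi'),
             ('/', 'cgi-local/1/2/3/3.cgi'), ('cgi-bin', 'g/4.cgi'),
             ('cgi-bin', '5.cgi')]
print(count_requests(unoptimized))   # 17
print(count_requests(optimized))     # 6
```

The same function reproduces the 4-scan worst case for the /scripts example, so it can serve as a quick sanity check when hand-optimizing a scan database.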
How obvious are checks for /scripts/ and /scripts/samples/ compared to
checks for /scripts/samples/details.idc, etc.?

BTW, realtime IDS systems only pick the full URL requests off the wire. So
a raw check for /cgi-bin/phf will set off the IDS, regardless of its
existence. A check for /cgi-bin, with a negative result, will save you the
step of even sending the URL, and therefore keep the IDS quiet.

----[ whisker perl internals for eval and coding

Ok, just a few quick notes on some of the inside perl code. This will help
if you want to effectively use 'eval', or poke into the code.

All user and global variables are in %D. So, to reference XXRet, for
instance, it would really be $D{'XXRet'}.

All user arrays are prefixed with 'D'. So, 'array roots = a,b,c' would
become @Droots in perl's 'process space'. Again, this is to avoid
clobbering. Also note that whisker will define $D{'Droots'}="--array--" for
the 'roots' array. Making arrays named XXRet and whatnot will start getting
you into trouble, as the XXRet value will be clobbered with the "--array--"
string for a bit...

I suggest using wprint() to print stuff. wprint() will correctly direct
output to the console or log file, depending on command-line options. Use
verbose() to print stuff only when the verbose switch is used, and
debugprint() to print stuff only when the debug switch is used.

You can make http requests by using sendhttp() and sendraw(). *ALL* the
networking functionality (sockets, connects, etc.) is contained *ONLY* in
sendraw(). Use rdecode() to decode the server's return value to a
human-readable string.

For code hackers, I have indicated within the code where to add commands
and where to add scan database pre-process code, anti-IDS functions, scan
bounce stuff, etc.

----[ Wish-list and future updates

Well, the most obvious one I can think of is more rigorous language
parsing. Obviously a flex/yacc combo would be kickass, but I don't want to
port it off perl.
whisker, as it is, is a useful demonstration of theory...but hey, if you
want to port it to C, go for it. Why didn't I just port it to C? Well,
mostly because perl's auto-allocation is such a blessing, especially to my
nasty array permutation code in the scan function. Plus, eval was a nice
feature, and I just like perl all around. :)

----[ What's to become of web scanners

My hope is that rather than make *another* cgichk.c, port it to rebol, add
a few checks, etc., people will use whisker as the engine, and just code
cool suave kickass scan databases that are intelligent and take advantage
of the features. I'd like to see a program that can pre-process a scan
database and optimize the scans--this actually wouldn't be all that hard.
But I dunno, maybe people will just think whisker is stupid and I'll be
laughed at. So be it.

----[ Whisker in the news!

Infoworld did a writeup of whisker:
http://www.infoworld.com/articles/op/xml/99/11/29/991129opsecwatch.xml

----[ Signoff

Well, if you haven't thought I'm crazy by now for putting this much thought
into a CGI scanner, then maybe there's hope for me. :) Whisker really came
about for two reasons: 1. I needed something I could easily script web
audits with (I was tired of rewriting C all the time), and 2. I wanted to
make a proof of concept of the 'next-generation' web scanner. So here it
is.

Now granted, my perl coding isn't the best, and I'd love for someone to
recode the scan function...that directory permutation stuff is scary code.
But it works, and that's good enough for me. :)

Drop me a line if you like/use whisker, and definitely send me snippets of
interesting scan scripts you make...I would like to compile a nice big one
with lots of intelligence. Also, if you have ideas for, or bugs in, the
code, let me know.

Till next time!

rain forest puppy (rfp@wiretrip.net)

----------------[ whisker is GPL. Do not steal. Do not pass go.
----------------[ and definitely do not collect $200 for my work.
-[ Yes, this document uses a Phrack-esque layout.

----[ EOF