by
CASIMIR
Part A
With
A Supplementary Cryptanlysis by Mike Stay from
Bruce Schneier's Crypto-Gram
Read Mike Stay's cryptanalytic summary. Mike is a cryptographer at AccessData.
The Crypto-Gram, as Bruce writes, is "a free monthly newsletter providing summaries, analyses, insights, and commentaries on cryptography and computer security. Bruce's site, of course, is Counterpane Systems. To subscribe, visit subscribe or send a blank message to Subscribe by mail.
Software by PepSoft
(You might need to use FTP Search to find the software.)
Try searching for WXF.
This essay presents in-extenso the reverse-engineering and cracking of an encryption utility distributed by PEPSOFTWARE Ltd, WinXFiles(TM). Map of the essay:
Introduction Strategy Getting prepared PART A - REVERSING OF WINXFILES 1. First contact 2. Where is the fork? 3. The Check routine 4. The making of Check_string 5. The Remainder 6. Decryption process PART B - ANALYSIS AND BREAKING OF WINXFILES Method 1: let's suppose Tail = 000... Step 1: Solving Step 2: Checking password found Notes on Method 1 Method 2: Tail = garbage Step 1: How to find 2 characters from pwd Step 2: System(s) solving Step 3: Checking password found Bad luck! Notes on Method 2 PART C - 'C' SOURCE OF CRACKER
WinXFiles is a perfect example of what should NOT be done in order to efficiently encrypt data. The algorithm used may look 100% secure "on the paper", but doesn't resist patient analysis. It's a proprietary algo: no one knows (well...used to know) its inner working but its creators. You can be pretty sure that a PROPRIETARY algo is always a big CRAP. Why do you think those brainy guys don't get their nice algo published? To prevent jealous competitors from stealing the idea?
Wrong: competitors would just laugh at them and crack it.
But once coded, burried deep in a big .exe or .dll, with an attractive interface on top of it, who will ever know?
On the opposite end, serious schemes (such as BlowFish or DES) can be found fully documented in many cryptography books, everyone knows how they operate, but no quick break has been found yet. Of course, some strong encryption schemes are fully patented, and their usage is submited to a strict regulation (for instance >32-bit BlowFish can not be used outside of the USA and Canada). Moreover, having an algo patented costs money.
With a proprietary algo, you seldom face this sort of problem {:-).
But, of course:
"No one may modify or patch the WinXFiles executable files in any way, including but not limited to decompiling, disassembling, or otherwise reverse engineering the programs." (extract from WinXFiles'doc)I can not care less about such non-sense! Goto FraVia'page for interesting thoughts on subject "Is reverse-engineering illegal?" (WHAT? Don't tell me you don't know by heart his address: Fravia's Pages of Reverse Engineering)
Indeed, the name of the provider of this program should be PEEPSHOW instead of PEPSOFT, 'cause this program is "specially" designed to protect your hot images collection from indiscrete viewing, particulary at work! BTW, you can also encrypt pictures of your mother-in-law (NOT naked, please) or just non sexually-oriented plaintext if you prefer, WinXFiles doesn't care...
This is my third paper on encryption cracking, so I guess my methods are not that bad {:-)
When trying to find out which pwd has been used to encrypt a file, you always face the same problem: the HUGE number of pwd possible. For instance, WinXFiles accepts passwords with lengths between 5 and 30 characters. Upper/lower case, number, symbol are allowed, so a character can take approximatly 2^8 (256) values. This represents:
2^(8*5) + 2^(8*6) + ... + 2^(8*29) + 2^(8*30) = 1.7E+72 (!) pwds to try out.
Of course, no computer is able of handling such a big number in a reasonnable laps of time. To obtain a cracker that works on a standard PC, we must REDUCE strongly this number. This can be achieved by gaining extra information on the inner working of WinXFiles, i.e. by reversing it.
We will reverse WinXFiles 32-bit v3.3 (WXFILES.EXE : 904192 bytes). Versions 2.7, 3.3, 3.4 (and probably others) work the same way, so cracker will work on all of them. Get it on PEPSOFTWARE'site: PepSoftware
Tools needed:In this chapter, we're going to collect some precious info on WXF, which will help us to break it in PART B.
Choose a few text and/or picture files with different names'length (ex: text.txt, picture.txt...), and encrypt them with various pwds. WXF remembers good pwd and original file's name. You will also discern something interesting:
OK, enough assumptions, let's Winice a bit.
We will watch carefully the actions performed by WXF, from introduction of input (a pwd, good or bad) to the end of pwd'checking procedure. There's no need to study the decryption process in order to crack WXF (but it may be interesting anyway).
I ------------------ G ------------ Input ----->|checking routine|-------------------------->|decryption| ^ ------------------ | Good, go on ------------ | | W -----------<-------------------- Wrong, try againCreate a text file called EXAMPLE.TXT, with string ABCDEF (6 bytes) inside. Encrypt it with pwd CASIMIR (UPPERCASE!). You will obtain encrypted file EXAMPLE.XFD.
Now fire Winice, the hunt begins. Select EXAMPLE.XFD from list for decryption, enter bogus pwd: 13579 in little window, but don't press OK yet.
We look for branch I (see scheme above) in code. To pin-point it, we must find out which Windows'function collects input. Check functions most often used for this purpose (GetDlgItemTextA, GetWindowTextA) by setting BPX in Winice, then press OK. It doesn't work: Winice doesn't pop up, and you get the Bad Pwd message {:-(.
Nevermind, there's always a work-around: instead of I, we'll try to localize G and W. Branch G can be localized easily: when pwd is good, WXF must write decrypted file to disk. So we set a BPX on function: WriteFile, and Winice pops up at beginning of WriteFile code. Return from function (F11), ok, WriteFile was called from CS:404580. Now we work backward to find out the fork where branches G and W separate. We return from portion 4045xx: call to this portion took place at code location CS:47332F
:0047332F call 0040455CObviously, fork is located *before* offset 47332F in code, at a conditional jump (je, jne). There are (at least) four suspect (see listing below). Let's set BPXs on all of them, and trace execution as we enter respectively a Good (G) pwd and a Wrong (W) pwd:
:004732BC mov eax, dword ptr [ebp+F940] G W :004732C2 pop edx G W :004732C3 call 00406104 G W :004732C8 test eax, eax G W fork? :004732CA je 004732D9 G W :004732CC mov byte ptr [0047CB0C], 01 W :004732D3 inc dword ptr [0047CB08] W :004732D9 cmp byte ptr [0047CB0C], 00 W fork? :004732E0 jne 00473370 G W :004732E6 lea eax, dword ptr [ebp-14] G :004732E9 push eax G :004732EA mov edx, 0047BB04 G :004732EF mov ecx, 00001000 G :004732F4 lea eax, dword ptr [ebp+FE98] G :004732FA call 004044F8 G :004732FF cmp dword ptr [ebp-14], 00000000 G fork? :00473303 je 00473366 G :00473305 cmp dword ptr [ebp-08], 00000001 G fork? :00473309 jne 00473315 G :0047330B mov eax, dword ptr [ebp-14] :0047330E call 004739C4 :00473313 jmp 0047331D :00473315 mov eax, dword ptr [ebp-14] G :00473318 call 00473A3C G :0047331D lea eax, dword ptr [ebp-18] G :00473320 push eax G :00473321 mov edx, 0047BB04 G :00473326 mov ecx, dword ptr [ebp-14] G :00473329 lea eax, dword ptr [ebp+FD4C] G :0047332F call 0040455C //call to WriteFile GThe fork we're looking for is the one at CS:4732CA. Branching depends on value returned by call to CS:406104 (return value is stored in register eax):
-> eax EQUAL 0 : GOOD pwd -> eax NOT EQUAL 0 : WRONG pwd
Let's investigate what's going on down here, at CS:406104. Before following call, have a look at arguments (eax and edx) passed with call. Dump ds:eax and ds:edx :
-> eax points to garbage (in fact, a check string which contains: 57.66.5F.5F.64.64.64.3C.5E.72.6C.96.57...) -> edx points to our Input string ("13579" : 31.33.35.37.39 in hexa) :00406104 push esi //entry point :00406105 push edi :00406106 push ebx :00406107 mov esi, eax //ds:esi contains Check_string :00406109 mov edi, edx //ds:edi contains Input_string :0040610B or eax, eax :0040610D je 00406112 :0040610F mov eax, dword ptr [eax-04] //Check_string'length (0x0D) :00406112 or edx, edx :00406114 je 00406119 :00406116 mov edx, dword ptr [edx-04] //Input_string'length (0x05) :00406119 mov ecx, eax :0040611B cmp ecx, edx //same lengths? :0040611D jbe 00406121 :0040611F mov ecx, edx //if different lengths, take smallest :00406121 cmp ecx, ecx :00406123 repz :00406124 cmpsb //COMPARE Input_string TO Check_string :00406125 je 00406151 //if strings are the same : set eax to 0 and return . //if strings are different : set eax to a non-zero . //value and return . . :00406156 retIn this case, Check_string differs from Input_string, so WXF refuses to decrypt.
Now, try to decrypt file with Input_string: CASIMIR, you'll obtain Check_string: CASIMIR, and WXF will decrypt the file.
So WXF will let you in if and only if:
Check_string = Input_stringObviously, this equality should occur only when input_string = good pwd.
BTW, this is NOT always the case with WXF: check routine is not restrictive enough, it always lets you in when you give the good pwd, BUT in some special cases, it lets you in even if you gave a wrong pwd (I told you it was a crap)!
Instead of building a Check_string (we still don't know HOW), WXF could have extracted good pwd from encrypted file and checked it against Input_string. Right, but it would have been a huge flaw! Once decrypted, poor little pwd is just like a white lamb in dark code jungle, it can't hide from hungry Winice {;-) BTW, i discovered some encryption applications that do that mistake! (have a look at Turbo Encrypto for instance). No need to add that now they rest in peace...OK, in order to crack WXF, we must understand HOW the hell Check_string is built. That's the main purpose of next chapter.
Well, we can reasonably assume that WXF extracts Check_string from EXAMPLE.XFD, so we'll set a breakpoint on function ReadFile. Winice pops up many times, but the location which raises our interest is the one at CS:4731E8.
:004731D4 lea eax, dword ptr [ebp-14] :004731D7 push eax :004731D8 mov edx, 0047BB04 //start storing chars read at DS:47BB04 :004731DD mov ecx, 000000FF //reads FF (255) characters from file :004731E2 lea eax, dword ptr [ebp+FE98] :004731E8 call 004044F8 //go and read EXAMPLE.XFDAfter call, dump memory at DS:47BB04. This is the beginning of file EXAMPLE.XFD, the first 255 characters. Huuummm, seems that header assumption was right! And now, we just have to monitor access to that data zone, setting a BPR DS:47BB04 DS:47BB04+FF RW. We quickly discover that a routine is fiddling with our header:
:473A3A mov eax,eax //eax=FF ... :473A40 xor esi,esi //esi=0 :473A42 xor ecx,ecx //ecx=0 :473A44 mov cl,byte ptr [0047BAF8] //cl=mysterious_value :473A4A mov edi,eax //edi=FF (255 chars from Header to read) :473A4C test edi,edi :473A4E jle 00473AB9 :473A50 mov eax,0047BB04 //ds:eax -> Header ****>:473A55 movzx ebp,byte ptr [eax] //reads one char from Header * :473A58 dec ecx //cl-- * :473A59 inc esi //esi++ (counts chars read from Input) * :473A5A test ecx, ecx * :473A5C jne 00473A66 //jump if ecx>0 * :473A5E xor ecx,ecx //ecx=0 * :473A60 mov cl,byte ptr [0047BAF8] //cl=mysterious_value * :473A66 xor edx,edx //edx=0 * :473A68 mov dl,byte ptr [0047B7E8] //dl=Input's lengthDump DS:47B7E8, you'll see: 05.31.33.35.37.39. Yes, this is our Input ('13579'), along with input's length (5 characters). So register dl contains Input's length. OK, let's see what's next...
* :473A6E inc edx * :473A6F cmp esi,edx * :473A71 jne 00473A78 //jump if char(s) to read from Input yet * L :473A73 mov esi,1 //restart reading Input from the beginning * O :473A78 xor edx,edx //edx=0 * O :473A7A mov dl,byte ptr [esi+0047B7E8] //dl=char 'esi' from Input * P :473A80 test cl,1 * :473A83 je 00473A89 //jump if cl is even * 2 :473A85 mov edx,ecx //edx=ecx * 5 :473A87 jmp 00473A8D * 5 :473A89 mov edx,ecx //edx=ecx * :473A8B neg edx //edx=(100-edx) * T :473A8D xor ebx,ebx //ebx=0 * I :473A8F mov bl,byte ptr [esi+0047B7E8] //bl=char i from Input * M :473A95 sub edx,ebx //edx-=char 'esi' from Input * E :473A97 add edx,ebp //edx+=char 'eax' from Header * S :473A99 cmp edx,FF * :473A9F jle 00473AA9 //jump if edx<=FF * :473AA1 sub edx,100 //edx-=100, now we have: edx in [0;FF] * :473AA7 jmp 00473AB3 * :473AA9 test edx,edx * :473AAB jge 00473AB3 //jump if edx>=0 * :473AAD add edx,100 //edx+=100, now we have: edx in [0;FF] * :473AB3 mov byte ptr [eax],dl //crushes Header whith Check_string * :473AB5 inc eax //eax++, ready to read char i+1 from Header * :473AB6 dec edi //counter-- *****:473AB7 jne 00473A55 //jump if char(s) to read from Header yet ... :473ABD retWe discern that Header is remplaced by Check_string. To sum up:
Check_string[i] = (Header[i]-Input[i modulo 5]+cl[i]) +/- 100 and * if mysterious_value is EVEN: cl[0] = mysterious_value - 1 cl[2*i] = cl[0] - (2*i) cl[2*i+1] = 100 - cl[0] + (2*i+1) * if mysterious_value is ODD: cl[0] = 100 - mysterious_value - 1 cl[2*i] = 100 - cl[0] + (2*i) cl[2*i+1] = cl[0] - (2*i+1)
Once mysterious_value is known, we can calculate cl at any rank (cl[0], cl[1],...,cl[100],...). It's like pseudo-random numbers: if you know the seed, you know everything.
For instance, here is what we obtain with EXAMPLE.XFD and Input="13579":
mysterious_value = 0A (even) So we have cl[0] = 0A-1 = 09 i=0: cl[1] = cl[2*0+1] = 100-09+(2*0+1) = F8 i=1: cl[2] = cl[2*1] = 09-(2*1) = 07 i=1: cl[3] = cl[2*1+1] = 100-09+(2*1+1) = FA i=2: cl[4] = cl[2*2] = 09-(2*2) = 05 i=2: cl[5] = cl[2*2+1] = 100-09+(2*2+1) = FC i=3: ect... <--------------- 255 characters ------------------> Header(bef.): 7F A1 8D 9C 98 99 94 73 94 B5 94 D1 85... + cl: 09 F8 07 FA 05 FC 03 FE 01 F6 09 F8 07... - <----------------> <----------------> <--------- Input: 31 33 35 37 39 31 33 35 37 39 31 33 35... = ----------------------------------------------------- Header(aft.): 57 166 5F 15F 64 164 64 13C 5E 172 6C 196 57... So we obtain: Check_string= 57 66 5F 5F 64 64 64 3C 5E 72 6C 96 57... = Wf__ddd<^rlûW...But what happens if we enter the good pwd ('CASIMIR')? Well, if pwd is correct we obtain following structure:
<--------------- 255 characters ------------------> Header(aft.): E X A M P L E . T X T | C A S I M I R \ 0 0 0 0 0...0 | Check_string \ (0 means 0x00)
| and \ are used by WXF as delimiters for stored pwd. If they are not present, WXF uses the whole 255 characters as a Check_string.
ONE byte from original data will correspond to ONE byte of encrypted data. Original data is stored just after Header:
<------------- 255 bytes ------------><---- original file size ----> original_file_name|password\000000...0??????????????????????????...? <--------------- HEADER -------------><------ ORIGINAL DATA -------> <- TAIL ->
WXF acts in a very weird way. Open a session with WXF, encrypt a first file: you will have the "0000..." tail just after "\". Now encrypt a second file: you will NOT have the "0000..." tail! Beginning of Header remains the same (original_file_name|password\) but after that you just have garbage. No zeros... Anyway, the Tail doesn't play any role in decryption process, it just acts as a filler, so Header's size is always 255 bytes.
One question remains: where does mysterious_value come from?
The routine that builds Check_string (see above) reads mysterious_value at memory location DS:47BAF8. As usual, we set a BPR on this location, and we discover that mysterious_valus is produced by a small routine:
:00472A07 nop :00472A08 push ebx :00472A09 xor ecx,ecx //ecx=0 (sum of Input's characters) :00472A0B xor edx,edx //edx=0 (counter for characters read from Input) :00472A0D mov dl,byte ptr [0047B7E8] //dl=Input's length :00472A13 test edx,edx :00472A15 jle 00472A26 //jump if Input's length=0 :00472A17 mov eax,0047B7E9 //ds:47B7E9 -> Input (31.33.35.37.39) ***>:00472A1C xor ebx,ebx //ebx=0 * :00472A1E mov bl,byte ptr [eax] //bl=character i from Input * :00472A20 add ecx,ebx //ecx+=character i from Input * :00472A22 inc eax //ready to read character i+1 * :00472A23 dec edx ****:00472A24 jne 00472A1C //jump if char(s) to read from Input yet :00472A26 mov eax,ecx :00472A28 mov ecx,FF :00472A2D cdq :00472A2E idiv ecx //edx=remainder of (chars'sum divided by FF) :00472A30 mov byte ptr [0047BAF8], dl //writes mysterious_value :00472A36 pop ebx :00472A37 retSo we have:
Sum of Input's characters mysterious_value = remainder of: ------------------------- FFWith Input= 13579, we get: remainder of [(31+33+35+37+39)/FF] = 0A
We'll always have: remainder included in [00 ; FE].
In fact, we don't even need to know how WXF proceeds to decrypt a file, because we only look for the *password* stored in Header. Anyway, it may be of some interest, for instance to add a viewer to the cracker we're going to write (so you don't need WXF anymore {;-) ).
Enter correct pwd, then survey calls to ReadFile function occuring AFTER fork. There is a call at CS:40451D, which copys encrypted data from file to DS:47BB04:
C A S I M I R remainder of [(43+ 41+ 53+ 49+ 4D+ 49+ 52)/FF] = 0A (even) encrypted DATA: 7B 8B 8F 93 8D 93 + Remain: 09 F8 07 FA 05 FC - Input: 43 41 53 49 4D 49 = ---------------------- decrypted DATA: 41 142 43 144 45 146 +/- 100: 41 42 43 44 45 46 ASCII: A B C D E F
-encrypted Pictures get the extension : XFP, -encrypted Data (text) " " : XFD,but encryption process is exactly the same for picture and text. Extension is only a way to determine correct viewer to be used.
OK, we have stolen enough info from WXF to be able to crack it right now {:-)
Click here for Part B.Copyright December 1998 by Casimir.
Mail Casimir
Converted to hypertext by Joe Peschel December 14, 1998.
Revised by Joe Peschel Feb. 18, 1999.