                                                           original: 02.01.2004
                                                            updated: 04.02.2004
-------------------------------------------------------------------------------
Tutorial to Haldir's Applied Math CrackMe
-------------------------------------------------------------------------------

I submitted this solution to crackmes.de about a month ago, but this version
contains some fixes and minor improvements.
Anyway, don't expect too much from this tutorial; I don't think I am good at
teaching (yet?). The keygen itself is probably the most interesting part of
this publication.

Some general information about the crackme first:
Haldir tried to confuse the reverser by compiling with loop unrolling enabled,
so there is a lot of code to paste. Loop unrolling means that a for loop is
converted into a (possibly quite long) piece of sequentially executed code.
You may not understand every detail of the listed disassembly unless you have
your own disassembly of the crackme in front of you to see where all the
values come from. I'll try to explain the necessary things, though.
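To picture what the compiler did, here is a small C sketch that computes one
matrix-row dot product both ways. The function names and the fixed size of 11
are my own illustration, not code recovered from the crackme. In the listings
below it is the inner loop over the columns that got unrolled; the loop over
the rows survives (see e.g. the cmp ebp, 0Bh at 00401DD4).

```c
#include <assert.h>

/* Rolled form: one row of a byte matrix times an 11-entry vector, written
   as the for loop the source presumably contained (an assumption). */
static int dot11_rolled(const unsigned char row[11], const int x[11])
{
    int sum = 0;
    for (int i = 0; i < 11; i++)
        sum += row[i] * x[i];
    return sum;
}

/* Unrolled form: same result, no loop counter and no branch. This is the
   shape of the movzx/imul/add chains in the listings below. */
static int dot11_unrolled(const unsigned char row[11], const int x[11])
{
    int sum = row[0] * x[0];
    sum += row[1] * x[1];
    sum += row[2] * x[2];
    sum += row[3] * x[3];
    sum += row[4] * x[4];
    sum += row[5] * x[5];
    sum += row[6] * x[6];
    sum += row[7] * x[7];
    sum += row[8] * x[8];
    sum += row[9] * x[9];
    sum += row[10] * x[10];
    return sum;
}
```

Both functions return the same value for every input; the unrolled one simply
trades code size for the missing loop overhead, which is why the listings are
so long.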

After I had completely analyzed the serial routine, I came to the
conclusion that I had to start the "attack" from the bottom.
The working serial Haldir provided in a publication on [RET]
helped me a lot with this.
The approach is to write a keygen that takes the first 11 digits of
a serial as input and produces the matching right part for that input.
(If you try it the other way round, i.e. look for a left part that fits
a given right part, you will run into trouble.)


Here is the check immediately before the "serial is correct" message:

00401D37 loc_401D37:                             ; CODE XREF: sub_401000+DD7j
00401D37        mov     edi, dword_4092C0[ebp*4]   ;set pointer to a row of matrix M3
00401D3E        movzx   eax, byte ptr [edi]        ;get 1st byte of current matrix row
00401D41        imul    eax, [esp+0D0h+var_CC]     ;multiply with x[0]
00401D46        movzx   edx, byte ptr [edi+1]      ;get 2nd byte of current matrix row
00401D4A        imul    edx, esi                   ;multiply with x[1]
00401D4D        add     eax, edx                   ;sum them up
00401D4F        movzx   edx, byte ptr [edi+2]      ;...
00401D53        imul    edx, [esp+0D0h+var_D0]
00401D57        add     eax, edx
00401D59        movzx   edx, byte ptr [edi+3]
00401D5D        imul    edx, [esp+0D0h+var_C8]
00401D62        add     eax, edx
00401D64        movzx   edx, byte ptr [edi+4]
00401D68        imul    edx, [esp+0D0h+var_C4]
00401D6D        add     eax, edx
00401D6F        movzx   edx, byte ptr [edi+5]
00401D73        imul    edx, [esp+0D0h+var_A8]
00401D78        add     eax, edx
00401D7A        movzx   edx, byte ptr [edi+6]
00401D7E        imul    edx, [esp+0D0h+var_AC]
00401D83        add     eax, edx
00401D85        movzx   edx, byte ptr [edi+7]
00401D89        imul    edx, [esp+0D0h+var_B0]
00401D8E        add     eax, edx
00401D90        movzx   edx, byte ptr [edi+8]
00401D94        imul    edx, [esp+0D0h+var_B4]
00401D99        add     eax, edx
00401D9B        movzx   edx, byte ptr [edi+9]
00401D9F        imul    edx, [esp+0D0h+var_BC]
00401DA4        add     eax, edx
00401DA6        movzx   edx, byte ptr [edi+0Ah]
00401DAA        imul    edx, [esp+0D0h+var_C0]     ;11 imuls in the loop
                                                   ;=> 11 variables
00401DAF        add     eax, edx                ;one entry of the product M3*x is complete

The sum is then reduced modulo 17 by the following piece of code:
00401DB1        movzx   ecx, ax
00401DB4        mov     eax, 78787879h
00401DB9        imul    ecx
00401DBB        mov     eax, ecx
00401DBD        sar     eax, 1Fh
00401DC0        sar     edx, 3
00401DC3        sub     edx, eax
00401DC5        imul    eax, edx, 11h
00401DC8        sub     ecx, eax
00401DCA        mov     [esp+ebp+0D0h+var_24], cl  ;write value to result vector
00401DD1        add     ebp, 1
00401DD4        cmp     ebp, 0Bh        ;loop 11 times
00401DD7        jb      loc_401D37      ;we loop through 11 matrix rows

As you can see, this loop runs 11 times and performs a plain matrix*vector
multiplication: each pass multiplies one row of M3 by the vector x and stores
the result, reduced modulo 17, in the result vector r3.
r3 is then checked against "constant" values: the left part of the
input serial s[] (its first 11 bytes).
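The multiply-and-shift sequence at 00401DB1 is the usual compiler trick for
computing a remainder without a div instruction: 0x78787879 is 2^35/17 rounded
up, so (n * 0x78787879) >> 35 equals n / 17 for the 16-bit values that
movzx ecx, ax leaves in ecx. Below is a minimal C model of it, assuming a
non-negative sum (for such inputs the sar eax, 1Fh / sub edx, eax sign
correction subtracts 0); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models 00401DB1..00401DC8: "sum % 17" done via the magic constant
   0x78787879 (2^35 / 17, rounded up) instead of a division. */
static unsigned mod17_like_crackme(uint32_t sum)
{
    uint32_t n = (uint16_t)sum;   /* movzx ecx, ax: only the low 16 bits survive */
    /* imul ecx keeps the high dword in edx, sar edx, 3 shifts 3 more bits:
       35 bits in total. For 0 <= n < 2^16 the signed imul and this unsigned
       64-bit product give the same quotient. */
    uint32_t q = (uint32_t)(((uint64_t)n * 0x78787879u) >> 35);
    return n - q * 17u;           /* imul eax, edx, 11h ; sub ecx, eax */
}
```

Comparing it with n % 17 for all 65536 possible inputs is a quick way to
convince yourself the trick is exact, so a keygen can just use the % operator
(or let NTL do the reduction).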


Here is that check:

00401DDD        mov     al, [esp+0D0h+var_70]   ;s[0], 1st byte of input serial
00401DE1        cmp     al, [esp+0D0h+var_24]
00401DE8        jnz     loc_401F24              ;s[0] = r3[0] ?
00401DEE        mov     al, [esp+0D0h+var_6F]
00401DF2        cmp     al, [esp+0D0h+var_23]
00401DF9        jnz     loc_401F24              ;s[1] = r3[1] ?
00401DFF        mov     al, [esp+0D0h+var_6E]
00401E03        cmp     al, [esp+0D0h+var_22]
00401E0A        jnz     loc_401F24              ;s[2] = r3[2] ?
00401E10        mov     al, [esp+0D0h+var_6D]
00401E14        cmp     al, [esp+0D0h+var_21]
00401E1B        jnz     loc_401F24              ;s[3] = r3[3] ?
00401E21        mov     al, [esp+0D0h+var_6C]
00401E25        cmp     al, [esp+0D0h+var_20]
00401E2C        jnz     loc_401F24              ;s[4] = r3[4] ?
00401E32        mov     al, [esp+0D0h+var_6B]
00401E36        cmp     al, [esp+0D0h+var_1F]
00401E3D        jnz     loc_401F24              ;s[5] = r3[5] ?
00401E43        mov     al, [esp+0D0h+var_6A]
00401E47        cmp     al, [esp+0D0h+var_1E]
00401E4E        jnz     loc_401F24              ;s[6] = r3[6] ?
00401E54        mov     al, [esp+0D0h+var_69]
00401E58        cmp     al, [esp+0D0h+var_1D]
00401E5F        jnz     loc_401F24              ;s[7] = r3[7] ?
00401E65        mov     al, [esp+0D0h+var_68]
00401E69        cmp     al, [esp+0D0h+var_1C]
00401E70        jnz     loc_401F24              ;s[8] = r3[8] ?
00401E76        mov     al, [esp+0D0h+var_67]
00401E7A        cmp     al, [esp+0D0h+var_1B]
00401E81        jnz     loc_401F24              ;s[9] = r3[9] ?
00401E87        mov     al, [esp+0D0h+var_66]
00401E8B        cmp     al, [esp+0D0h+var_1A]
00401E92        jnz     loc_401F24              ;s[10] = r3[10] ?
                                             ;if all those checks succeed the serial is ok
00401E98        push    offset aCongratulation ; "Congratulations, your SN is correct"
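Put together, the loop at 00401D37 and the compares above boil down to
something like the following. This is a sketch, not code from the crackme or
from my keygen: the matrix is passed as a flat row-major byte array, x[] is
numbered in the order the imul chain consumes the stack variables, and s[]
holds the bytes exactly as the crackme compares them (how they are derived
from the typed serial is not covered here). matvec_check() and MAXN are
hypothetical names.

```c
#include <assert.h>

#define MAXN 16   /* largest square system in the crackme is 16x16 (M1) */

/* Returns 1 if (M*x mod p)[i] == s[i] for every i, mimicking the loop at
   00401D37 (n = 11, p = 17) and the byte compares at 00401DDD..00401E92.
   x[] is assumed to hold non-negative values. */
static int matvec_check(const unsigned char *m, int n, const int *x,
                        const unsigned char *s, unsigned p)
{
    unsigned char r[MAXN];
    assert(n > 0 && n <= MAXN);
    for (int row = 0; row < n; row++) {
        unsigned sum = 0;
        for (int col = 0; col < n; col++)        /* the unrolled imul/add chain */
            sum += m[row * n + col] * (unsigned)x[col];
        r[row] = (unsigned char)((sum & 0xFFFFu) % p); /* movzx ecx, ax; mod p */
    }
    for (int i = 0; i < n; i++)                  /* one cmp/jnz pair per byte */
        if (s[i] != r[i])
            return 0;
    return 1;
}
```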


So what we have is an 11x11 system of linear equations (modulo 17).
In the keygen you see "zz_p::init(17);", which sets the modulus that NTL
computes with. The matrix is hardcoded into the exe, and the right-hand side
of the system is given by the input serial. Since the matrix is square and
invertible modulo 17 (its rows are linearly independent), the system has a
unique solution vector. I.e. we solve M3*x = r3, where M3 and r3 are completely
known. I leave the task of solving this system of equations to the NTL lib:

solve(d, x, transpose(M3), r3);

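fills the vector x[] with the solution (and d with the determinant of the
matrix, which has to be non-zero for a unique solution).
Note: M3 has to be transposed, because NTL's solve() computes the x with
x*A = b, and M3*x = r3 is equivalent to x*transpose(M3) = r3.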
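If you don't want to depend on NTL, the solve() call can be replaced by
ordinary Gauss-Jordan elimination modulo a prime (17 and 23 are both prime, so
every non-zero pivot has an inverse). The sketch below is such a substitute,
not NTL's implementation; solve_mod() and inv_mod() are hypothetical names.
It works on M*x = r directly, so unlike with NTL no transpose is needed.

```c
#include <assert.h>

#define MAXN 16

/* Inverse of a (0 < a < p, p prime) via Fermat's little theorem: a^(p-2). */
static int inv_mod(int a, int p)
{
    int result = 1, base = a % p, e = p - 2;
    while (e > 0) {
        if (e & 1)
            result = result * base % p;
        base = base * base % p;
        e >>= 1;
    }
    return result;
}

/* Solves M*x = r (mod p) for an n x n row-major byte matrix M by
   Gauss-Jordan elimination. Returns 1 on success, 0 if M is singular mod p.
   The solution entries are left in the range 0..p-1. */
static int solve_mod(const unsigned char *m, const int *r, int *x, int n, int p)
{
    int a[MAXN][MAXN + 1];                  /* augmented matrix [M | r] */
    assert(n > 0 && n <= MAXN);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            a[i][j] = m[i * n + j] % p;
        a[i][n] = (r[i] % p + p) % p;
    }
    for (int col = 0; col < n; col++) {
        int piv = col;
        while (piv < n && a[piv][col] == 0) /* find a non-zero pivot */
            piv++;
        if (piv == n)
            return 0;                       /* singular mod p */
        for (int j = 0; j <= n; j++) {      /* swap pivot row into place */
            int t = a[col][j]; a[col][j] = a[piv][j]; a[piv][j] = t;
        }
        int inv = inv_mod(a[col][col], p);
        for (int j = 0; j <= n; j++)        /* scale pivot row to a leading 1 */
            a[col][j] = a[col][j] * inv % p;
        for (int i = 0; i < n; i++) {       /* clear this column in all other rows */
            if (i == col || a[i][col] == 0)
                continue;
            int f = a[i][col];
            for (int j = 0; j <= n; j++)
                a[i][j] = ((a[i][j] - f * a[col][j]) % p + p) % p;
        }
    }
    for (int i = 0; i < n; i++)
        x[i] = a[i][n];
    return 1;
}
```

For the first step this would be called as solve_mod(M3, r3, x, 11, 17); the
return value plays the role of NTL's d != 0 test.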

With this x we can go one step back in the algorithm and attack
the 2nd check (another loop that looks much like the one above).
The x we got from this solve() call becomes the result vector of
the 2nd system: the result of the 2nd matrix multiplication is the vector
that the 3rd matrix multiplication (discussed above) multiplies with.
You can verify this by running the crackme with a valid serial.

First, the matrix multiplication code:

0040169A loc_40169A:                             ; CODE XREF: sub_401000+837j
0040169A        mov     ebp, dword_40932C[edi*4] ;specify a matrix row of M2
004016A1        movzx   eax, byte ptr [ebp+0]    ;get 1st byte of current matrix row
004016A5        imul    eax, [esp+0D0h+var_C4]   ;multiply with 1st vector entry
004016AA        movzx   edx, byte ptr [ebp+1]    ;get 2nd byte of current matrix row
004016AE        imul    edx, [esp+0D0h+var_88]   ;multiply with 2nd vector entry
004016B3        add     eax, edx                 ;etc..
004016B5        movzx   edx, byte ptr [ebp+2]
004016B9        imul    edx, [esp+0D0h+var_8C]
004016BE        add     eax, edx
004016C0        movzx   edx, byte ptr [ebp+3]
004016C4        imul    edx, [esp+0D0h+var_90]
004016C9        add     eax, edx
004016CB        movzx   edx, byte ptr [ebp+4]
004016CF        imul    edx, [esp+0D0h+var_B4]
004016D4        add     eax, edx
004016D6        movzx   edx, byte ptr [ebp+5]
004016DA        imul    edx, [esp+0D0h+var_C0]
004016DF        add     eax, edx
004016E1        movzx   edx, byte ptr [ebp+6]
004016E5        imul    edx, [esp+0D0h+var_84]
004016EA        add     eax, edx
004016EC        movzx   edx, byte ptr [ebp+7]
004016F0        imul    edx, [esp+0D0h+var_C8]
004016F5        add     eax, edx
004016F7        movzx   edx, byte ptr [ebp+8]
004016FB        imul    edx, [esp+0D0h+var_A0]
00401700        add     eax, edx
00401702        movzx   edx, byte ptr [ebp+9]
00401706        imul    edx, [esp+0D0h+var_9C]
0040170B        add     eax, edx
0040170D        movzx   edx, byte ptr [ebp+0Ah]
00401711        imul    edx, [esp+0D0h+var_98]
00401716        add     eax, edx
00401718        movzx   edx, byte ptr [ebp+0Bh]
0040171C        imul    edx, [esp+0D0h+var_94]
00401721        add     eax, edx
00401723        movzx   edx, byte ptr [ebp+0Ch]
00401727        imul    edx, [esp+0D0h+var_A4]
0040172C        add     eax, edx
0040172E        movzx   edx, byte ptr [ebp+0Dh]
00401732        imul    edx, [esp+0D0h+var_A8]
00401737        add     eax, edx
00401739        movzx   edx, byte ptr [ebp+0Eh]
0040173D        imul    edx, [esp+0D0h+var_AC]
00401742        add     eax, edx
00401744        movzx   ecx, ax
00401747        mov     eax, 0B21642C9h
0040174C        imul    ecx
0040174E        mov     eax, ecx
00401750        sar     eax, 1Fh
00401753        add     edx, ecx
00401755        sar     edx, 4
00401758        sub     edx, eax
0040175A        imul    eax, edx, 17h 
0040175D        sub     ecx, eax        ; this time take the result modulo 23
0040175F        movzx   eax, cl
00401762        mov     [esp+edi+0D0h+var_34], al ; store it in result vector r2
[...]
00401834        cmp     edi, 0Bh             ;11 matrix rows multiplied
00401837        jb      loc_40169A
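The reduction at 00401744 is the same trick with 0xB21642C9, which is 2^36/23
rounded up. Because that constant does not fit into a signed 32-bit imul
operand, the compiler multiplies with it as a negative number and then adds n
back (add edx, ecx), which yields the high dword of the unsigned product
n * 0xB21642C9; sar edx, 4 completes the shift by 36. A minimal C model, again
assuming the non-negative 16-bit input produced by movzx ecx, ax (the function
name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Models 00401744..0040175D: "sum % 23" via the magic constant 0xB21642C9.
   imul + "add edx, ecx" == high dword of the unsigned product n * 0xB21642C9,
   and "sar edx, 4" turns the 32-bit shift into a 36-bit one. */
static unsigned mod23_like_crackme(uint32_t sum)
{
    uint32_t n = (uint16_t)sum;   /* movzx ecx, ax */
    uint32_t q = (uint32_t)(((uint64_t)n * 0xB21642C9u) >> 36);
    return n - q * 23u;           /* imul eax, edx, 17h ; sub ecx, eax */
}
```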

The result of the matrix multiplication is taken modulo 23 this time,
so we need to change the modulus in the keygen: "zz_p::init(23);".
Also note that at this point we only see an 11x15 matrix (11 rows of 15 bytes)
being multiplied with a vector of size 15, which yields only 11 results.
This system has no unique solution. But if we move on a bit in the
disassembly, we see that it is in fact a 15x15 matrix; the remaining four
rows of the multiplication are just split off.
That part is completely unrolled and would be too much code to paste,
so here is only the first of the four missing matrix row
multiplications.

0040183D        mov     eax, [esp+0D0h+var_C8] ; x[2] , this is the new(!) x
00401841        mov     edx, [esp+0D0h+var_C0] ; x[5] , (not the one from the 1st matrix mult.)
00401845        mov     ebp, [esp+0D0h+var_C4] ; x[0]
[...]
0040187E        movzx   ecx, byte_40931C       ; matrix bytes
00401885        imul    ecx, ebp               ; 
0040188C        movzx   esi, byte_40931D
00401893        imul    esi, [esp+0D0h+var_88]
00401898        add     ecx, esi
0040189A        movzx   esi, byte_40931E
004018A1        imul    esi, [esp+0D0h+var_8C]
004018A6        add     ecx, esi
004018A8        movzx   esi, byte_40931F
004018AF        imul    esi, [esp+0D0h+var_90]
004018B4        add     ecx, esi
004018B6        mov     esi, [esp+0D0h+var_B4]
004018BA        imul    edi, esi
004018BD        add     ecx, edi
004018BF        movzx   edi, byte_409321
004018C6        imul    edi, edx
004018C9        add     ecx, edi
004018CB        mov     edi, [esp+0D0h+var_84]
004018CF        movzx   edx, byte_409322
004018D6        imul    edx, edi
004018D9        add     ecx, edx
004018DB        movzx   edx, byte_409323
004018E2        imul    edx, eax
004018E5        movzx   eax, byte_409324
004018EC        imul    eax, [esp+0D0h+var_A0]
004018F1        add     ecx, edx
004018F3        add     ecx, eax
004018F5        movzx   eax, byte_409325
004018FC        imul    eax, [esp+0D0h+var_9C]
00401901        add     ecx, eax
00401903        movzx   eax, byte_409326
0040190A        imul    eax, [esp+0D0h+var_98]
0040190F        add     ecx, eax
00401911        movzx   eax, byte_409327
00401918        imul    eax, [esp+0D0h+var_94]
0040191D        add     ecx, eax
0040191F        movzx   eax, byte_409328
00401926        imul    eax, [esp+0D0h+var_A4]
0040192B        add     ecx, eax
0040192D        movzx   eax, byte_409329
00401934        imul    eax, [esp+0D0h+var_A8]
00401939        add     ecx, eax
0040193B        movzx   eax, byte_40932A
00401942        imul    eax, [esp+0D0h+var_AC]
00401947        add     ecx, eax
00401949        movzx   ecx, cx
0040194C        mov     eax, 0B21642C9h
00401951        imul    ecx
00401953        mov     eax, ecx
00401955        sar     eax, 1Fh
00401958        add     edx, ecx
0040195A        sar     edx, 4
0040195D        sub     edx, eax
0040195F        imul    eax, edx, 17h       ;modulo 23
0040196E        sub     ecx, eax
00401970        movzx   eax, cl
00401973        mov     [esp+0D0h+var_7C], eax  ; store result

After that, 3 more matrix rows are multiplied by the unknown vector.
These four results are then checked against values
we also know (H[], explained below):

00401C4A        mov     edx, [esp+0D0h+var_80]  ; H[0]
00401C53        mov     ecx, [esp+0D0h+var_7C]  ; result of 1st additional mult.
00401C57        cmp     edx, ecx
00401C59        jnz     loc_401F03
00401C5F        mov     edx, [esp+0D0h+var_BC]  ; H[1]
00401C63        mov     ecx, [esp+0D0h+var_B8]  ; result of 2nd additional mult.
00401C67        cmp     edx, ecx
00401C69        jnz     loc_401EE2
00401C6F        mov     edx, [esp+0D0h+var_D0]  ; H[2]
00401C72        mov     ecx, [esp+0D0h+var_B0]  ; result of 3rd additional mult.
00401C76        cmp     edx, ecx
00401C78        jnz     loc_401EC1
00401C7E        cmp     esi, eax                ; cmp H[3], 4th result
00401C80        jz      short loc_401CA3        ; jump performed if checks passed
[...]
00401CA3 loc_401CA3:                             ; CODE XREF: sub_401000+C80j
00401CA3        push    offset aNextModularAri ; "Next modular arithmetics checks succeed"...


Now what is this so-called H[]? It is a hash computed solely from the first
11 bytes of the input serial AND the solution of the 3rd system of equations
(which we already have).
I won't explain the calculation of H[] in detail here; see keygen.c for the algorithm.

Anyway, we know H[], so we have the full result vector r2 (of size 15) of
M2*x = r2 (r2[] = previous x[] extended by the H[] values).
We can solve this system of linear equations just like the other one:

  solve(d, x, transpose(M2), r2);


Checks #3 and #2 are solved now; #1 is still left, but it is rather easy.
A 16x16 matrix M1 is multiplied with a vector (again modulo 23). The result is
once more the x[] of the previous solve(), which has only 15 entries; the
missing r1[15] is given explicitly (see below).

004014D2 loc_4014D2:                             ; CODE XREF: sub_401000+5A7j
004014D2        mov     esi, dword_409280[ecx*4]  ; accessing rows of matrix M1
004014D9        movzx   eax, byte ptr [esi]
004014DC        imul    eax, [esp+0D0h+var_C0]
004014E1        movzx   edx, byte ptr [esi+1]
004014E5        imul    edx, edi
004014E8        add     eax, edx
004014EA        movzx   edx, byte ptr [esi+2]
004014EE        imul    edx, ebp
004014F1        add     eax, edx
004014F3        movzx   edx, byte ptr [esi+3]
004014F7        imul    edx, [esp+0D0h+var_C4]
004014FC        add     eax, edx
004014FE        movzx   edx, byte ptr [esi+4]
00401502        imul    edx, [esp+0D0h+var_94]
00401507        add     eax, edx
00401509        movzx   edx, byte ptr [esi+5]
0040150D        imul    edx, [esp+0D0h+var_98]
00401512        add     eax, edx
00401514        movzx   edx, byte ptr [esi+6]
00401518        imul    edx, [esp+0D0h+var_9C]
0040151D        add     eax, edx
0040151F        movzx   edx, byte ptr [esi+7]
00401523        imul    edx, [esp+0D0h+var_A0]
00401528        add     eax, edx
0040152A        movzx   edx, byte ptr [esi+8]
0040152E        imul    edx, [esp+0D0h+var_A4]
00401533        add     eax, edx
00401535        movzx   edx, byte ptr [esi+9]
00401539        imul    edx, [esp+0D0h+var_D0]
0040153D        add     eax, edx
0040153F        movzx   edx, byte ptr [esi+0Ah]
00401543        imul    edx, [esp+0D0h+var_BC]
00401548        add     eax, edx
0040154A        movzx   edx, byte ptr [esi+0Bh]
0040154E        imul    edx, [esp+0D0h+var_B8]
00401553        add     eax, edx
00401555        movzx   edx, byte ptr [esi+0Ch]
00401559        imul    edx, [esp+0D0h+var_B4]
0040155E        add     eax, edx
00401560        movzx   edx, byte ptr [esi+0Dh]
00401564        imul    edx, [esp+0D0h+var_B0]
00401569        add     eax, edx
0040156B        movzx   edx, byte ptr [esi+0Eh]
0040156F        imul    edx, [esp+0D0h+var_AC]
00401574        add     eax, edx
00401576        movzx   edx, byte ptr [esi+0Fh]
0040157A        imul    edx, [esp+0D0h+var_C8]
0040157F        add     eax, edx
00401581        movzx   esi, ax
00401584        mov     eax, 0B21642C9h
00401589        imul    esi
0040158B        mov     eax, esi
0040158D        sar     eax, 1Fh
00401590        add     edx, esi
00401592        sar     edx, 4
00401595        sub     edx, eax
00401597        imul    eax, edx, 17h     ; modulo 23
0040159A        sub     esi, eax
0040159C        mov     [esp+ecx*2+0D0h+var_54], si ; R1
004015A1        add     ecx, 1
004015A4        cmp     ecx, 10h
004015A7        jb      loc_4014D2
004015AD        mov     esi, [esp+0D0h+var_CC]
004015B1        movzx   eax, [esp+0D0h+var_36]  ; r1[15] = M1.LastRow * x
004015B9        cmp     eax, 14h                ; has to be 0x14
004015BC        jz      short loc_4015DF    ;jump if check passed
[..]
004015DF loc_4015DF:                             ; CODE XREF: sub_401000+5BCj
004015DF                 push    offset aFirstModularAr ; "First modular arithmetics check succeed"...


[esp+0D0h+var_36] has to be 0x14. You can't see it clearly here because of
IDA's local variable naming, but var_36 is exactly r1[15], the result of the
multiplication with the last matrix row: r1 starts at var_54 and holds 16-bit
words, so entry 15 lies 30 = 1Eh bytes further, at var_36.
So now we have the complete result vector r1[] and we can solve this system too
(r1[] = previous x[] extended by r1[15] = 0x14):

   solve(d, x, transpose(M1), r1);
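Both right-hand sides used above are plain concatenations, so in C they amount
to a couple of memcpy() calls. A sketch with hypothetical helper names,
assuming the values are kept in int arrays in the index order used above
(r2[11+k] = H[k], r1[15] = 0x14):

```c
#include <assert.h>
#include <string.h>

/* r2 (15 entries): the 11 values from the M3 solve, then H[0..3]. */
static void build_r2(const int x3[11], const int H[4], int r2[15])
{
    memcpy(r2, x3, 11 * sizeof *r2);
    memcpy(r2 + 11, H, 4 * sizeof *r2);
}

/* r1 (16 entries): the 15 values from the M2 solve, then the constant 0x14
   that the last row of M1 has to produce (checked at 004015B9). */
static void build_r1(const int x2[15], int r1[16])
{
    memcpy(r1, x2, 15 * sizeof *r1);
    r1[15] = 0x14;
}
```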


The resulting x[] is the solution: the 16-letter code on the right side of the
serial, without dashes.
We only have to add 0x41 ('A') to each value to get the characters we need,
because the crackme subtracts this value from the letters at the beginning to
get the 16-entry vector for the first matrix multiplication.
Finally, we split the string into chunks of 4 characters separated by dashes.
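The final conversion step might look like this in C. format_right_part() is a
hypothetical name; it assumes every x[i] is already reduced to 0..22, as a
mod-23 solve returns it, so the output uses only the letters A to W.

```c
#include <assert.h>

/* Turns x[0..15] into "XXXX-XXXX-XXXX-XXXX": add 0x41 ('A') to each value
   and insert a dash after every 4 characters. out needs 16 + 3 + 1 = 20 bytes. */
static void format_right_part(const int x[16], char out[20])
{
    int k = 0;
    for (int i = 0; i < 16; i++) {
        assert(x[i] >= 0 && x[i] < 23);
        if (i > 0 && i % 4 == 0)
            out[k++] = '-';
        out[k++] = (char)(x[i] + 0x41);
    }
    out[k] = '\0';
}
```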

The letter check at the beginning is no problem: my keygen checks the input for
invalid characters anyway and produces output that passes this check.

That's about it. Feel free to contact me on EFnet,
  nucleon
-------------------------------------------------------------------------------