New Instructions Tutorial
by Qozah
Objectives of this article
Looking at most virus writers' code, I've noticed a lack of imagination in the use of ASM instructions: there's a cool bunch of new ones since the 486 and the Pentium Pro, and that's without even counting the new MMX. You should at least use the new 486 instructions. Who is going to run win32 on a 386 ? CE ? <g>. 486+ instructions work on all AMD processors, and even the Pentium instructions do too ( not the Pentium Pro and above ones... but they will ).
So, I'll try to describe some of them, as they can be very useful to you for code optimization, for polymorphism ( I'm telling you :) ) and, why not, to fool AVs.
Thanks to Intel for not going the Microsoft way and hiding what this stuff does ( though I still had to do research and testing to get it all working ), which gave me the info to write this article, and thanks to all the people I tested CPUID on.
Format used
- rX  : "register, X bits", where X is 8, 16 or 32.
- rmX : "register or memory, X bits", where X is 8, 16 or 32.
- immX: "immediate, X bits", where X is 8, 16 or 32.

INSTRUCTIONS
~~~~~~~~~~~~

Bxx
~~~

Format:      Bxx rm16,r16
             Bxx rm32,r32
             Bxx rm16,imm8
             Bxx rm32,imm8
Processors:  386+ ( BSWAP needs a 486+ )
Description: These are single bit operations, which can be really useful when writing a polymorphic engine: for example, if you have a table which stores anything in 10 bit chunks, this is the way to access it.

There are these ones:

* BT:  Bit Test
* BTC: Bit Test and Complement
* BTR: Bit Test and Reset
* BTS: Bit Test and Set

BSWAP ( Byte Swap ) has a similar name but does something else: it takes a single 32 bit register ( BSWAP r32 ) and reverses the order of its four bytes.

These are some examples:

- BTS dword ptr [esi],15h

The processor takes the dword at address ESI, counts 15h bits from its bit 0 and tests that specific bit. If it's a 1, it sets the Carry Flag, otherwise it clears it. Then, as it's "Bit Test and SET", it sets that bit to 1 ( Bit Test just tests it and keeps the result in CF, and BTR does the same but makes the specified bit 0 instead of 1 ).

- BTC [edi],ebx

The processor goes to address EDI, counts EBX bits from there and complements that bit: if it was 0 it's now 1, otherwise it's now 0. The old value goes to CF, as always.

So, you can see the first operand ( which can of course be a register plus an offset ) is where we begin, and then we count X bits from there. When the bit offset is in a register and the first operand is memory, forget all the 8/16/32 limitations :P An immediate bit offset, on the other hand, is taken modulo the operand size, so it can't reach outside the addressed word or dword.

Don't you begin to realize how useful this is ?
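To make the bit counting concrete, here is a minimal C sketch that models what the register-offset forms do on a memory operand. It's only a model of the semantics, not the instructions themselves: all the function names are made up for this example, the return value stands for what the CPU would leave in CF, and only non-negative bit offsets are covered.

```c
#include <stdint.h>

/* Portable model of BT/BTS/BTR/BTC with a register bit offset on a memory
   operand: the bit index counts from bit 0 of the byte at `base`.
   Each function returns the old bit value, which the CPU would leave in CF.
   Only non-negative offsets are modeled here. */
int bit_test(const uint8_t *base, uint32_t bit)
{
    return (base[bit >> 3] >> (bit & 7)) & 1;
}

int bit_test_and_set(uint8_t *base, uint32_t bit)
{
    int old = bit_test(base, bit);
    base[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    return old;
}

int bit_test_and_reset(uint8_t *base, uint32_t bit)
{
    int old = bit_test(base, bit);
    base[bit >> 3] &= (uint8_t)~(1u << (bit & 7));
    return old;
}

int bit_test_and_complement(uint8_t *base, uint32_t bit)
{
    int old = bit_test(base, bit);
    base[bit >> 3] ^= (uint8_t)(1u << (bit & 7));
    return old;
}

/* Hypothetical helper: read entry `index` from a table packed in 10 bit
   chunks, one bit test per bit. */
unsigned read_10bit_entry(const uint8_t *table, uint32_t index)
{
    unsigned value = 0;
    for (unsigned i = 0; i < 10; i++)
        value |= (unsigned)bit_test(table, index * 10 + i) << i;
    return value;
}
```

With a zeroed buffer, bit_test_and_set(buf, 0x15) returns 0 and leaves 20h in the third byte, since bit 15h is bit 5 of byte 2.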
CMOVcc
Format:      CMOVcc r16,rm16
             CMOVcc r32,rm32
             ( cc is a condition, the same ones used by conditional jumps )
Processors:  Pentium Pro+

Description: Fuck, conditional movs ! Have you ever thought about the possibilities this stuff gives us ? Unlike a plain mov, the destination has to be a 16 or 32 bit register and the source can't be an immediate. One thing to keep in mind: check the CPUID instruction first to know whether the processor supports them. Right now I wouldn't recommend their use, as they only work on Pentium Pro and Pentium II processors, but as soon as those become the standard and AMD adds them to its processors, you should change your mind.
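As a rough picture of what a conditional move buys you, this small C sketch models the semantics: the destination changes only when the condition holds, and no jump is involved. The function names are invented for the example, and the assembly in the comment is just one way the same thing could be written.

```c
#include <stdint.h>

/* Model of CMOVcc: the destination is overwritten only if the condition
   holds; otherwise it keeps its old value. */
uint32_t cmov(int condition, uint32_t dest, uint32_t src)
{
    return condition ? src : dest;
}

/* Unsigned minimum, in the spirit of: cmp eax,ebx / cmova eax,ebx */
uint32_t umin(uint32_t a, uint32_t b)
{
    return cmov(a > b, a, b);
}
```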
CMPXCHG
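Format:      CMPXCHG rm8,r8
             CMPXCHG rm16,r16
             CMPXCHG rm32,r32
Processors:  486+

Description: The name says most of it: compare and exchange. The hidden detail is that the comparison is made against the accumulator: the destination ( rmX ) is compared with AL, AX or EAX. If they're equal, ZF is set and the source register is stored into the destination; otherwise ZF is cleared and the destination is loaded into the accumulator.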
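Since the accumulator takes part implicitly, a small model helps. This is a sketch of the 32 bit form in C, with a made-up function name; `acc` stands for EAX, `dest` for the rm32 operand and `src` for the r32 operand, and the return value is the resulting ZF.

```c
#include <stdint.h>

/* Model of CMPXCHG rm32,r32. Returns the resulting ZF. */
int cmpxchg32(uint32_t *acc, uint32_t *dest, uint32_t src)
{
    if (*acc == *dest) {
        *dest = src;      /* equal: ZF=1, source is stored in destination */
        return 1;
    }
    *acc = *dest;         /* different: ZF=0, destination is loaded into EAX */
    return 0;
}
```

So with EAX = 5 and 5 in memory, the memory operand gets the source value; with any other EAX, memory stays as it was and EAX receives its contents.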
CPUID
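Format:      CPUID
Processors:  Late 486 models and above

Description: Forget all that flag checking and stuff to find out which processor you have. CPUID ( CPU IDentification ) returns info in the registers EAX, EBX, ECX and EDX, depending on the value you put in EAX. The one flag check left is to see whether CPUID itself exists: if you can toggle the ID flag ( bit 21 of EFLAGS ), it does. If Intel tells us the truth, this is what you get: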
EAX = 0
-------

EAX: Maximum CPUID value accepted in EAX ( it's 2 on the PPro, while the usual value is 1 )

The other three registers hold the vendor string, to be read in the order EBX-EDX-ECX. Intel gives us this:

  EBX = 'Genu'
  EDX = 'ineI'
  ECX = 'ntel'

AMD comes with this, in the same order:

  EBX = 'Auth'
  EDX = 'enti'
  ECX = 'cAMD'
Genuine wouldn't fit :)
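If you want to see how the twelve characters come out of the registers, here's a minimal C sketch. It assumes you already have the three register values from CPUID with EAX = 0 ( getting them is left out ), and the function names are made up for the example. The string is read in the order EBX, EDX, ECX, lowest byte first.

```c
#include <stdint.h>
#include <string.h>

/* Builds the 12 character vendor string from the register values that
   CPUID returns for EAX = 0. The registers are read in the order
   EBX, EDX, ECX, lowest byte first. `out` must hold at least 13 bytes. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Returns 1 if the register values spell "GenuineIntel". */
int cpuid_is_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    cpuid_vendor_string(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```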
EAX = 1
-------

This is the important stuff. EAX holds the version, while EDX holds the feature information.

EAX = Version Information:

  31-14: Useless
  13-12: Processor Type
  11-8:  Processor Family
  7-4:   Processor Model
  3-0:   Stepping ID

Let's see how this works. Of course, thanks to all the people who helped me test this :). These are some values of that EAX information ( bits 13-0 ):

         Type   Family   Model   Stepping ID
  PPro:   00     0110    0001       ?
  K6-2:   00     0101    1000      0000
  P133:   00     0101    0010      1100
  P75:    00     0101    0010      0100
This is some info I gathered with the help of various people, complemented by some little Intel extracts I could get. It's not complete, but the DX2 486, for example, doesn't have the CPUID instruction, and the other models are just OverDrive enhanced versions of these ones.

.---------------------------------------------------------------.
|   Family    | Model  |   Processor                            |
|-------------+--------+----------------------------------------|
|             |        |                                        |
|   INTEL     |        |                                        |
|   -----     |        |                                        |
|             |        |                                        |
|   0100      |  1000  |   Intel DX4                            |
|   0101      |  0001  |   Pentium processors ( 60 or 66 MHZ )  |
|   0101      |  0010  |   Pentium processors ( 75 to 200 MHZ ) |
|   0101      |  0100  |   Pentium MMX ( 166, 200 )             |
|   0110      |  0001  |   Pentium Pro processor                |
|   0110      |  0011  |   Pentium II processor, model 3        |
|   0110      |  0101  |   Pentium II model 5, and Celeron      |
|             |        |                                        |
|   AMD       |        |                                        |
|   ---       |        |                                        |
|             |        |                                        |
|   0101      |  0110  |   K6 processors                        |
|   0101      |  1000  |   K6-2 processors                      |
|             |        |                                        |
'---------------------------------------------------------------'

Type: There are four kinds: 00 means a normal processor, 01 is an OverDrive processor, 10 is a dual processor, and 11 is still reserved.

Family: Tells whether it's a 486, 586 or 686: a 486 gets 0100, a Pentium 0101, and Pentium Pro and II get 0110. K6 models have the same as the Pentium ( 0101 ).

Model: Models within the same family. Check out the results above: P75 and P133, for example, are the same model.

Stepping ID: Revisions within the same model.
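The field layout above translates directly into shifts and masks. Here is a minimal C sketch of the decoding; the struct and function names are invented for the example, and the EAX value is assumed to come from a CPUID call with EAX = 1.

```c
#include <stdint.h>

struct cpu_version {
    unsigned type;      /* bits 13-12 */
    unsigned family;    /* bits 11-8  */
    unsigned model;     /* bits 7-4   */
    unsigned stepping;  /* bits 3-0   */
};

/* Splits the EAX value returned by CPUID ( EAX = 1 ) into its fields. */
struct cpu_version decode_cpuid_version(uint32_t eax)
{
    struct cpu_version v;
    v.type     = (eax >> 12) & 0x3;
    v.family   = (eax >> 8)  & 0xF;
    v.model    = (eax >> 4)  & 0xF;
    v.stepping =  eax        & 0xF;
    return v;
}
```

Feeding it the P133 value from the tests ( 052Ch ) gives family 5, model 2, stepping 12; the K6-2 value ( 0580h ) gives family 5, model 8, stepping 0.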
EDX = Feature Information:

I'm showing just the most interesting bits.

  bit   feature
  00    Set if there is an FPU ( co-processor )
  04    Set if the RDTSC instruction works
  15    Set if the CMOVcc instructions work

Anyway, there are lots of fields ( most of 'em useless for us ), especially ones waiting to be filled as new instructions arise.
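Checking a feature is then a single mask on EDX. A minimal C sketch, covering only the three bits listed here; the macro and function names are made up for the example, and the EDX value is assumed to come from CPUID with EAX = 1.

```c
#include <stdint.h>

#define CPUID_FEAT_FPU   (1u << 0)   /* bit 0:  FPU present     */
#define CPUID_FEAT_TSC   (1u << 4)   /* bit 4:  RDTSC available */
#define CPUID_FEAT_CMOV  (1u << 15)  /* bit 15: CMOVcc available */

/* Returns 1 if every bit in `mask` is set in the EDX feature word. */
int cpu_has_feature(uint32_t edx, uint32_t mask)
{
    return (edx & mask) == mask;
}
```

This is the test to run before touching CMOVcc or RDTSC on an unknown processor.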
ICEBP
Format: db 0f1h
Description: This undocumented opcode works on every Intel processor, but not on AMD or Cyrix ones. It just raises an int 1, as if an int 1h instruction had been executed. Cool for polymorphic engines, as Vecna said, since it even fooled SoftICE, which didn't recognize it as a valid opcode.
Jcc
Format:      As usual conditional jumps
Processors:  386+

Description: Just forget all that shit about only being able to use short relative jumps. A conditional jump can take a rel16 or rel32 offset besides the old rel8, so the big bad stuff about relative jumping is completely gone. That means code optimization and very big decryptors in your polymorphic code.
MOVZX,MOVSX
Format:      MOV?X r16,rm8
             MOV?X r32,rm8
             MOV?X r32,rm16
Processors:  386+

Description: MOVe with Zero eXtend or MOVe with Sign eXtend. These instructions let you move a register or memory operand into a bigger register, zero extending or sign extending the high part. This means you won't need a xor cx,cx / mov cl,bl pair if, for example, you got a random number in bl to use as a loop counter: movzx cx,bl does it, and you'll surely think of other applications.
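The difference between the two extensions is easiest to see on a byte with its top bit set. This C sketch models the 8 to 32 bit forms; the function names are made up for the example.

```c
#include <stdint.h>

/* MOVZX r32,rm8: the upper 24 bits of the result are cleared. */
uint32_t movzx32_8(uint8_t src)
{
    return (uint32_t)src;
}

/* MOVSX r32,rm8: the upper 24 bits are copies of the source's sign bit. */
uint32_t movsx32_8(uint8_t src)
{
    return (src & 0x80) ? (0xFFFFFF00u | src) : (uint32_t)src;
}
```

For 80h the zero extended result is 00000080h and the sign extended one is FFFFFF80h; for 7Fh both give 0000007Fh.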
RDTSC
Format:      RDTSC
Processors:  Pentium+. The AMD K6 also supports it.

Description: Forget all the other ways of getting a random value: this instruction loads the processor's time-stamp counter into EDX:EAX. It's a 64 bit counter that increments on every clock cycle, so it isn't truly random, but its low bits change fast enough to work as a seed. Its problem is that if the TSD flag ( Time Stamp Disable, bit 2 of CR4 ) is set, you can only execute it from ring 0.

Anyway, I recommend just taking the value in eax if you want values that completely fill the register, as edx takes a long time to change ( well, maybe for slow poly it could be great ). Of course, you can modify CR4 like the other Control Registers, but you must be in ring 0 again to do that :)

Who would want to hide the Stamp Counter from us ???
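To put a number on how slowly EDX moves: it only increments when EAX wraps around, that is, every 2^32 clock cycles. The C sketch below does that arithmetic and shows how the 64 bit value splits into the two registers. The function names are made up, and the clock speed is whatever you pass in.

```c
#include <stdint.h>

/* Splits a 64 bit time-stamp value the way RDTSC returns it. */
uint32_t tsc_low(uint64_t tsc)  { return (uint32_t)(tsc & 0xFFFFFFFFu); } /* EAX */
uint32_t tsc_high(uint64_t tsc) { return (uint32_t)(tsc >> 32); }         /* EDX */

/* Seconds between two increments of EDX: EAX must wrap, i.e. 2^32 cycles. */
double seconds_per_edx_tick(double clock_hz)
{
    return 4294967296.0 / clock_hz;
}
```

On a 133 MHz Pentium that is a bit more than 32 seconds per EDX increment, while EAX changes on every cycle.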
XADD
Format:      XADD rm8,r8
             XADD rm16,r16
             XADD rm32,r32
Processors:  486+

Description: It exchanges the source and destination operands and then loads the sum of both into the rmX ( the destination ), so the source register ends up holding the old destination value. Cool for optimizing code or just doing it "another way".
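A tiny model makes the operand shuffle clear. This is a C sketch of the 32 bit form with a made-up function name; `dest` stands for the rm32 operand and `src` for the r32 operand.

```c
#include <stdint.h>

/* Model of XADD rm32,r32: dest receives dest + src, src receives the
   old value of dest. */
void xadd32(uint32_t *dest, uint32_t *src)
{
    uint32_t old_dest = *dest;
    *dest = old_dest + *src;   /* wraps modulo 2^32, like the CPU */
    *src  = old_dest;
}
```

Starting with 10 in the destination and 3 in the source, you end up with 13 in the destination and 10 in the source.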
VERR/VERW
Format:      VERx rm16
Processors:  286+ ( protected mode only )

Description: Checks whether a segment is readable or writable ( with VERR or VERW ) from the current privilege level. The segment can be code or data, and the operand is a 16 bit register or memory location that contains the selector of the segment you want to check. If you can read or write it ( depending on whether it's VERR or VERW ), ZF will be 1, otherwise it will be 0. The cool thing is that this won't generate any exception, of course :). Keep in mind it checks a whole segment, not an address: under win32's flat segments it can't tell you whether a given page is mapped, so it doesn't replace SEH when searching for the kernel32.dll base address.

Good news is that these two aren't really new: they've been there since the 286, so Pentium and AMD processors support them as well.