
		Notes on Fast System Calls in Windows 2000
		-------------------------------------------

						           by Iceman [UKC]

	Ok, this will be quick and dirty. It's mostly a correction of, and an
explanation to, Elicz's previous article about sysenter / sysexit in NT.
To illustrate the behaviour of the fast call mechanism, I coded two things: a
user mode program which tries to call the Native API through fast OS calls, and
a kernel mode driver which actually enables fast calls in Win2k by loading
the fast call MSRs with appropriate values, as well as setting a bit in the
_KeFeatureBits ntoskrnl control variable to indicate that fast syscalls are
allowed. Please read the limitations in the associated source code, since I
bypassed several sanity checks for lack of time.
	The kernel mode driver supports dynamic load / unload, and this is
the preferred way to use it. A very nice utility for loading KMDs can be
found at www.osr.com. It's one of my preferred utilities and I used it
extensively during the development of this example.
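
	To make the driver's job concrete, here is a minimal user mode model
in C of the two things it has to do. It is only a sketch, not the driver
source. The MSR indices 0x174 to 0x176 are the ones Intel documents for
SYSENTER_CS / ESP / EIP, and 0x1000 is the _KeFeatureBits bit tested by the
kernel exit path. The names KF_FAST_SYSCALL, model_wrmsr, model_rdmsr,
fast_calls_enabled and enable_fast_calls are hypothetical, and the array only
stands in for the real wrmsr instruction, which needs ring0.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR indices used by sysenter / sysexit. */
enum {
    MSR_SYSENTER_CS  = 0x174,
    MSR_SYSENTER_ESP = 0x175,
    MSR_SYSENTER_EIP = 0x176
};

/* Bit tested in _KeFeatureBits; the macro name is hypothetical. */
#define KF_FAST_SYSCALL 0x1000u

/* Hypothetical model state: the three MSRs and the kernel feature word. */
static uint32_t fake_msr[3];
static uint32_t fake_feature_bits;

/* Stand-in for wrmsr: only the three sysenter MSRs are modelled. */
void model_wrmsr(uint32_t index, uint32_t value)
{
    assert(index >= MSR_SYSENTER_CS && index <= MSR_SYSENTER_EIP);
    fake_msr[index - MSR_SYSENTER_CS] = value;
}

/* Stand-in for rdmsr. */
uint32_t model_rdmsr(uint32_t index)
{
    assert(index >= MSR_SYSENTER_CS && index <= MSR_SYSENTER_EIP);
    return fake_msr[index - MSR_SYSENTER_CS];
}

/* Nonzero once the modelled kernel considers fast calls allowed. */
int fast_calls_enabled(void)
{
    return (fake_feature_bits & KF_FAST_SYSCALL) != 0;
}

/* Program the MSRs first, then flip the feature bit. */
void enable_fast_calls(uint32_t ring0_cs, uint32_t ring0_esp,
                       uint32_t handler_eip)
{
    model_wrmsr(MSR_SYSENTER_CS,  ring0_cs);
    model_wrmsr(MSR_SYSENTER_ESP, ring0_esp);
    model_wrmsr(MSR_SYSENTER_EIP, handler_eip);
    fake_feature_bits |= KF_FAST_SYSCALL;
}
```

	The order matters in this model for the obvious reason: the feature
bit should only be set once all three MSRs hold sane values.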

	A classic Native API stub wrapped around Int 0x2E is extremely simple.
For example, let's look at the following:

	 ZwQuerySection                   proc near

 arg_0                            = byte ptr  4

                                  mov    eax, 92h
                                  lea    edx, [esp+arg_0]
                                  int    2Eh

                                  retn   14h
 ZwQuerySection                   endp

	
	EAX contains a System Service Id which is used by _KiSystemService
to dispatch control to the desired API, and EDX contains a pointer to the
parameters, which are passed on the stack. When Int 0x2E is fired, control is
transferred to ring0 through an interrupt gate. After the requested API is
identified and executed, control is passed back to ring3 through an IRETD
instruction. Since the call is defined as __stdcall, the next step is to remove
the parameters from the stack (the "retn 14h"). (This description of NTCALL is
like looking at the earth from a plane, but it's good enough for this article.)
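
	The register setup and the stack cleanup of such a stub can be modelled
in a few lines of C. This is just a sketch of the arithmetic, assuming 4-byte
stdcall arguments; Int2eRegs, prepare_int2e, stdcall_arg_count and
esp_after_retn are hypothetical names of mine, not anything from ntdll.

```c
#include <assert.h>
#include <stdint.h>

/* Registers the stub loads before "int 2Eh" (hypothetical model type). */
typedef struct {
    uint32_t eax;  /* System Service Id                            */
    uint32_t edx;  /* pointer to the first argument on user stack  */
} Int2eRegs;

/* On entry to the stub [esp] holds the return address,
 * so the arguments start at [esp+4]. */
Int2eRegs prepare_int2e(uint32_t service_id, uint32_t esp_at_entry)
{
    Int2eRegs r;
    r.eax = service_id;
    r.edx = esp_at_entry + 4;
    return r;
}

/* "retn N" removes N bytes of arguments; each stdcall argument is 4 bytes. */
uint32_t stdcall_arg_count(uint32_t retn_bytes)
{
    assert(retn_bytes % 4 == 0);
    return retn_bytes / 4;
}

/* ESP after "retn N": the return address plus N bytes of arguments are gone. */
uint32_t esp_after_retn(uint32_t esp_at_entry, uint32_t retn_bytes)
{
    return esp_at_entry + 4 + retn_bytes;
}
```

	For the ZwQuerySection stub, stdcall_arg_count(0x14) gives 5, which
matches the five parameters the "retn 14h" removes.
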
	Now let's see how the same API would be called through a sysenter
instruction:

	 ZwQuerySection                   proc near

 arg_0                            = byte ptr  4

                                  mov    eax, 92h
                                  lea    edx, [esp+arg_0]
                                  mov    ecx , offset sysexit_EIP
                                  sysenter
 sysexit_EIP:
                                  retn   14h
 ZwQuerySection                   endp

	At first sight we see that MS's engineers tried to keep things as
simple as they could in user mode, with minimal changes between the two
interface types. Also, notice that EDX is still loaded with the address of
[esp+4] and not with esp. The reason is that only minimal changes were made in
kernel code to support fast syscalls, and the actual implementation relies
heavily on the code of _KiSystemService (the Int 0x2E handler).
	Since the Native API can be called in both ways, both methods must
build a special structure on the stack called KE_TRAP_FRAME. (Many APIs of the
NT micro-kernel strictly require such a frame.) The layout of this
structure can be found in the ntddk.inc file redistributed with the source
code accompanying this article.
	The first part of a KE_TRAP_FRAME is a hardware interrupt stack frame.
In the case of a transition from ring3 PM to ring0 PM, the CPU will
push the following on the kernel stack:

			SS 	
			ESP
			EFLAGS
			CS
			EIP
 Note that if a privilege level switch does not occur at exception time
(i.e. in this case Int 0x2E is fired from ring0), the CPU will not push
SS and ESP on the stack.
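
	The hardware part of the frame can be written down as a C struct.
This is a sketch for illustration only; HwIretFrame and hw_frame_bytes are
hypothetical names, and the real layout to rely on is the one in ntddk.inc.
Fields are listed lowest address first, so EIP (pushed last) comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the CPU pushes for an interrupt without an error code,
 * as it lies in memory (lowest address first). */
typedef struct {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;   /* pushed only on a privilege level change */
    uint32_t ss;    /* pushed only on a privilege level change */
} HwIretFrame;

/* Number of bytes the CPU pushes, depending on whether the
 * interrupt crossed a privilege level (1) or not (0). */
size_t hw_frame_bytes(int privilege_change)
{
    assert(privilege_change == 0 || privilege_change == 1);
    return privilege_change ? sizeof(HwIretFrame)
                            : offsetof(HwIretFrame, esp);
}
```

	So a ring3 to ring0 transition costs 20 bytes of kernel stack and a
ring0 to ring0 one only 12.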

From this point on, it's _KiSystemService's responsibility to create the full
KE_TRAP_FRAME structure. But remember that the fast syscall mechanism is quite
different. Intel's statement that sysenter does not save EIP / ESP, nor any
other registers, should be taken quite literally. Yes, sysenter does not
save registers, but it does not destroy the general purpose registers at handler
entry either. Nor does sysexit. Using this property, _KiFastCallEntry must build
a KE_TRAP_FRAME structure on the kernel stack to preserve compatibility with
the old _KiSystemService. So what does the ring3 application pass to us? Three
things: a pointer to the parameters on the user stack in EDX (the address of
[esp+4], i.e. the last parameter pushed), the EIP where execution should resume
after sysexit in ECX, and a Native API Id in EAX. Since this time the CPU
switches to a special stack and does not build an exception frame itself, we
must build it ourselves. Let's see the code and comment on it:

(In NT IA32, 0xffdff000 is always the base of the ring0 FS segment, the
so-called KPCR (not to be confused with the shared user data page,
KI_USER_SHARED_DATA, which is mapped at 0xffdf0000). A complete declaration of
the structure can be found in ntddk.inc. Also, note that it is the caller's
responsibility to load EDX with the address of [esp+4], and ECX with the
sysexit EIP. They are NOT loaded by sysenter.)

_KiFastCallEntry:

		mov	esp, ss:0FFDFF040h      ; Get a flat pointer to TSS
		mov	esp, [esp+4]		; load in ESP default ring0 ESP

;From this point on we build the KE_TRAP_FRAME structure.
					
		push	23h			      ;	push Ring3 SS
	        push	edx			      ;	push Ring3 ESP (from EDX)
		sub	dword ptr [esp], 4	      ;	Adjust User ESP

;Why do we need to adjust the value pushed from EDX? Remember that EDX holds
;a pointer to the parameters on the user stack (esp+4). But KE_TRAP_FRAME
;requires a valid ring3 ESP, not an arbitrary value. So we need to subtract
;4 from it, lowering the stack pointer to encompass the return address
;of the ring3 function. So much for this "sub" mystery =)

		pushf				      ;	Save Flags
		or	dword ptr [esp], 200h	      ;	Enable client interrupts:
						      ; sysenter clears IF in
						      ; EFLAGS, so re-enable it
						      ; in case we use IRETD
						      ; for client resume.

		push	1Bh			      ;	Ring3 CS
		push	ecx			      ;	Fake EIP
		push	0			      ;	Fake ErrorCode
		push	ebp                           ; 
		push	ebx
		push	esi
		push	edi
		push	fs			      ;Save user mode FS
		mov	ebx, 30h		      ;0x30 == Kernel mode FS 
		mov	fs, bx                        ;load it in FS
		assume	fs:nothing
					
		push	dword ptr ds:0FFDFF000h	      ;save exception list head
		 	
		mov	dword ptr ds:0FFDFF000h, 0FFFFFFFFh ;make this top level handler
		mov	esi, ds:0FFDFF124h	       ;ESI == current KTHREAD	
		push	dword ptr [esi+134h]	       ;Save previous THREAD mode

;complete the KE_TRAP_FRAME with a single instruction. In this particular case
;we do not need to save all registers.

		sub	esp, 48h		       
		mov	ebx, [esp+6Ch]  		;Get KTRAP_FRAME.CS
		and	ebx, 1
		mov	[esi+134h], bl			;poke thread mode in KTHREAD
		mov	ebp, esp			;complete a EBP based frame
		mov	ebx, [esi+128h]			;Get previous trap frame pointer
		mov	[ebp+3Ch], ebx			;save it 
	        mov	[esi+128h], ebp			;poke new trap frame pointer in  KTHREAD
		cld	                                
		test	byte ptr [esi+2Ch], 0FFh        ;Is system debugger present?
		jnz	loc_0_46144C			;Blah, blah, do auxiliary
							;processing

loc_0_461542:							     
								      
		sti	                                ;STI
		jmp	_KiSystemService.ServiceLookup
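
	The interesting part of the listing above is how the hardware-style
frame is faked from nothing but EDX and ECX. Here is a small C model of just
that step, as a sketch: FastCallFrame and build_fast_call_frame are
hypothetical names, and only the five "CPU" fields plus the previous mode
byte are modelled, not the whole KE_TRAP_FRAME.

```c
#include <assert.h>
#include <stdint.h>

#define USER_CS   0x1Bu   /* ring3 code selector pushed by the handler  */
#define USER_SS   0x23u   /* ring3 stack selector pushed by the handler */
#define EFLAGS_IF 0x200u  /* interrupt enable flag                      */

/* The fields an interrupt gate would have pushed, plus the
 * previous mode byte stored in KTHREAD (hypothetical model type). */
typedef struct {
    uint32_t eip, cs, eflags, esp, ss;
    uint8_t  previous_mode;   /* 0 = kernel, 1 = user */
} FastCallFrame;

/* edx = address of [esp+4] in the ring3 stub, ecx = resume EIP,
 * eflags_at_entry = EFLAGS as seen by the handler (IF cleared). */
FastCallFrame build_fast_call_frame(uint32_t edx, uint32_t ecx,
                                    uint32_t eflags_at_entry)
{
    FastCallFrame f;
    assert(edx >= 4);
    f.ss     = USER_SS;
    f.esp    = edx - 4;                     /* the "sub dword ptr [esp], 4" */
    f.eflags = eflags_at_entry | EFLAGS_IF; /* the "or ..., 200h"           */
    f.cs     = USER_CS;
    f.eip    = ecx;
    f.previous_mode = (uint8_t)(f.cs & 1);  /* the "and ebx, 1"             */
    return f;
}
```

	Note that the saved ESP ends up pointing at the stub's return address,
exactly where ring3 ESP was when sysenter executed.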
        
	From this point on, the handler chains into _KiSystemService, which
calls the requested system service. The body of code is now common to both
transfer types. Regardless of mode, after the system call is executed,
control branches to _KiServiceExit. The execution flow separates again right
before _KiServiceExit makes the final steps to resume client execution. At this
point the handler has already cleaned the stack almost totally, leaving only the
registers which are pushed by the CPU in the case of an exception. Let's
examine it:
	
_kss_split_client_resume:					      ;	CODE XREF: .text:0046181Cj
		test	_KeFeatureBits,	1000h		;fast syscall enabled ?
		jz	short _kssIretd                 ;nope , use IRETD
		test	dword ptr [esp+4], 1		;Interrupt originated in ring 0?
		jz	short _kssIretd			;yes, use IRETD
		test	dword ptr [esp+8], 20000h	;Interrupt originated in V86 mode ?
		jnz	short _kssIretd			;yes , use iretd
		pop	edx				; EDX = SYSEXIT_EIP
		add	esp, 8				;balance stack
		pop	ecx				; ECX = SYSEXIT_ESP
		sti					;enable ints
							;remember that sysenter 
							;disabled them 
							 
		sysexit 

_kssIretd:							      
		iretd
	

  The code is self-explanatory. One question still arises: what happens if
fast system calls are enabled and _KiSystemService was called through an
Int 0x2E originating in ring3 32-bit Protected Mode? Well, the answer is
damn simple: nothing bad. The way sysexit is employed here can successfully
simulate an IRETD with a privilege level change. This explains why ntdll.dll
is still able to use _KiSystemService after my test code enables fast calls.
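
  The decision made by the exit code can be restated as a small C model.
It is only a sketch of the three tests in the listing; ExitFrame,
SysexitRegs, choose_resume and load_sysexit_regs are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KF_FAST_SYSCALL 0x1000u   /* bit tested in _KeFeatureBits */
#define EFLAGS_VM       0x20000u  /* virtual 8086 mode flag       */

typedef enum { RESUME_IRETD, RESUME_SYSEXIT } ResumeMethod;

/* What is left on the kernel stack at the split point. */
typedef struct { uint32_t eip, cs, eflags, esp, ss; } ExitFrame;

/* Registers sysexit consumes. */
typedef struct { uint32_t edx, ecx; } SysexitRegs;

/* Mirrors the three tests: feature bit, ring of origin, V86 mode. */
ResumeMethod choose_resume(uint32_t feature_bits, const ExitFrame *f)
{
    assert(f != NULL);
    if ((feature_bits & KF_FAST_SYSCALL) == 0) return RESUME_IRETD;
    if ((f->cs & 1) == 0)                      return RESUME_IRETD; /* ring0 */
    if ((f->eflags & EFLAGS_VM) != 0)          return RESUME_IRETD; /* V86   */
    return RESUME_SYSEXIT;
}

/* pop edx / add esp, 8 / pop ecx: EDX = resume EIP, ECX = resume ESP. */
SysexitRegs load_sysexit_regs(const ExitFrame *f)
{
    SysexitRegs r;
    assert(f != NULL);
    r.edx = f->eip;
    r.ecx = f->esp;
    return r;
}
```

  Nothing in these tests looks at how the kernel was entered, only at the
frame, which is why a frame pushed by Int 0x2E from ring3 leaves through
sysexit just as well.
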
  A word about sysexit. Intel documented this opcode very well in the
Pentium II Processor Instruction Set manual (pdf number 24319102.pdf). It's
not the case that Intel made a mistake, or that Microsoft has access to better
documentation from Intel (at least not in this case =) ). Where sysexit
gets the resume EIP and ESP from is very clear. I quote from their manual:

//////

CS register set to the sum of (16 plus the value in SYSENTER_CS_MSR)
EIP register set to the value contained in the EDX register
SS register set to the sum of (24 plus the value in SYSENTER_CS_MSR)
ESP register set to the value contained in the ECX register
The processor does not save kernel stack or return address information, and does 
not save any registers. 

////
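
   The selector arithmetic from the quote is easy to check in C. The sketch
below assumes SYSENTER_CS_MSR holds 8, the usual NT ring0 code selector; that
value is my assumption and is not stated in the quote. SelectorPair,
sysenter_selectors and sysexit_selectors are hypothetical names. The sysenter
side (CS from the MSR, SS = CS + 8) and the RPL of 3 on the sysexit side
come from Intel's description of the instructions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t cs, ss; } SelectorPair;

/* sysenter: CS comes from the MSR (RPL bits cleared), SS is CS + 8. */
SelectorPair sysenter_selectors(uint16_t sysenter_cs_msr)
{
    SelectorPair p;
    assert((sysenter_cs_msr & 0xFFFCu) != 0);  /* a null selector faults */
    p.cs = (uint16_t)(sysenter_cs_msr & 0xFFFCu);
    p.ss = (uint16_t)(p.cs + 8);
    return p;
}

/* sysexit: CS = MSR + 16, SS = MSR + 24, both with RPL forced to 3. */
SelectorPair sysexit_selectors(uint16_t sysenter_cs_msr)
{
    SelectorPair p;
    assert((sysenter_cs_msr & 0xFFFCu) != 0);
    p.cs = (uint16_t)((sysenter_cs_msr + 16) | 3);
    p.ss = (uint16_t)((sysenter_cs_msr + 24) | 3);
    return p;
}
```

   With the assumed MSR value of 8, sysexit_selectors gives CS = 1Bh and
SS = 23h, the same ring3 selectors _KiFastCallEntry pushes into its frame.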
  
   For those interested in an in-depth look at how those two handlers
actually call the requested system service, I recommend studying an excellent
paper written by Joey__ which analyses the SystemService mechanism. The paper
can be found at http://www.cmkrnl.com/arc-newint2e.html.

  Further, look at the attached source code, which illustrates how one can
implement the fast transition to the operating system in build 1295 of
Windows 2k, and try the binaries. The suggested sequence is:
	
1.	Execute kesystem.exe and notice the result. Don't worry: the
exception will be caught by SEH handlers and finally DrWatson will process
it.

2.	Load syscall.sys using osrloader.exe. (The driver supports dynamic
unload. You can discard it at any time using the same utility.) Before loading
it be sure you have a PII+ Intel CPU, or else a kernel panic is very probable.

3.	Execute kesystem.exe again. This time no user mode exceptions will
	be generated, and NtYieldThread will be called from ntoskrnl. The
	program will successfully complete execution.
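
	The "PII+" requirement in step 2 can be checked from CPUID before
touching any MSR. The C sketch below follows the check Intel recommends: the
SEP bit (bit 11 of EDX from CPUID function 1) must be set, and family 6
processors with model < 3 and stepping < 3 must be rejected because they may
report SEP without supporting the instructions. The function works on values
you pass in, so obtaining EAX / EDX from CPUID is left to the caller;
fast_syscall_supported is a hypothetical name, and I make no claim that the
attached driver performs this check.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_EDX_SEP (1u << 11)  /* sysenter / sysexit present */

/* cpuid1_eax / cpuid1_edx: EAX and EDX returned by CPUID function 1. */
int fast_syscall_supported(uint32_t cpuid1_eax, uint32_t cpuid1_edx)
{
    uint32_t stepping = cpuid1_eax & 0xFu;
    uint32_t model    = (cpuid1_eax >> 4) & 0xFu;
    uint32_t family   = (cpuid1_eax >> 8) & 0xFu;

    if ((cpuid1_edx & CPUID_EDX_SEP) == 0)
        return 0;                 /* feature bit not reported */
    if (family == 6 && model < 3 && stepping < 3)
        return 0;                 /* SEP reported but not usable */
    return 1;
}
```

	A signature of 633h with SEP set passes; 611h with SEP set does not.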

	From the test results we can tell that the MSRs dealing with fast
system calls do not contain the required values, even if Windows 2K is
installed on a PII+ system. Support for fast system calls at kernel level is
complete anyway, ready to support a new ntdll.dll in the near future. Some
words on the code: the _KiFastCallEntry code duplicates a little of
_KiSystemService. I assume this is because MS's engineers used macros to build
the KE_TRAP_FRAME on the stack. The code is neither inefficient nor dead, but
in an ideal world it should chain into _KiSystemService earlier than it does.

	Both the user mode program and the kernel mode device driver
are written in assembly language, for the NASM assembler. MS's link.exe
is required to build the samples. I am sorry that I did not provide more
substantial samples, but my time was very limited.
Btw, don't get smart and try to BPX with NTICE or another system debugger on
_KiFastCallEntry. Fair warning. Look at where ESP points on entry to
_KiFastCallEntry: outside the top of the ring0 stack.
	As for hooking fast system calls? Well, simple. But think again.
Why should we hook the handlers when we can hook on a per-API basis, given the
implementation of system calls in NT? There might be reasons, but very
few survive a second look.
	
	As a closing note, remember that my analysis can be wrong =). I offer
no guarantees for the content of this paper, nor for the attached source code
and binaries. If you use them, you do it at your own risk.

		
	Iceman , [UKC] 
			
 
 