crypt() Source

by Dust  (Bern Switzerland)

We received quite a few replies to the letter from SJ in our last issue concerning UNIX encryption.  Several readers also submitted the source code for the crypt() routine which we are printing below.  The following introduction is the most detailed explanation we got.  We apologize to you non-math people but sometimes printing this kind of thing is unavoidable.

I followed the discussion about UNIX password encryption with great interest.  As I've been studying this subject for quite a long time already, there are some technical remarks I'd like to make about it, because there is still some confusion.  About the letter on page 29 of the Autumn 1991 issue, I'd like to say that crypt() is not a kernel routine as stated there, but a library function and as such its source is freely available and can be obtained from several anonymous FTP servers (one is apple.com in the sub-directory: pub/Archive-Vol2/4.3bsd-reno/lib/libc/gen/Makefile/crypt.c.Z

(The source file appears at the end of this article.)

This routine is the same on all UNIX versions.

It is true, however, that some security experts recommend modifying this call on your site for security reasons, for example, by modifying one of the permutation tables.  But this can only be done by recompiling the libraries and it's an action that normally shouldn't be done on UNIX systems, as it makes the system incompatible under certain circumstances (think of NIS, for example).  As stated, a possible attacker is better off using such a program offline, for two reasons: First, it won't be discovered as easily, and second, you can implement a much more powerful version of the algorithm.

One example of a more efficient implementation is the encryption used in the "Cracker" program, a password-hacking program written for system administrators to check the quality of user-chosen passwords.  I also implemented such a program and reach even a slightly better throughput; the C-version reaches about 900 encryptions on a SPARCstation 2, and the 68000-assembler version reaches 72 per second on an Atari ST (and probably also on an Amiga).  I won't publish the source codes here, but I think there's no problem in explaining the main mathematical ideas of improving the algorithm.  Those ideas are taken out of the paper "An Application of a Fast Data Encryption Standard Implementation" by Matt Bishop, Dartmouth College and NASA Ames Research Center.

I'm aware this paper isn't officially available and won't copy it in full extent, but as far as I know there's no law against explaining the ideas on a mathematical basis.

First, I'll explain the DES algorithm itself (which is part of crypt()), but I won't include the actual tables, which you find in the source code.

About notation, ^ means bitwise XOR.

DES itself consists of permutations written as:
P_...()
Expansions written as:
E_...()
Substitutions written as:
S_...()
Permutations exchange bit positions of a given bit-string in a reversible way, expansions do the same but use several bit-positions several times (so the output is wider) or not at all (so the output is smaller, actually a contraction), and substitutions substitute chunks of bit-substrings according to a fix table.

DES takes a cleartext (64-bits) and a key K (64-bits) as input.

The key is used to calculate 16 intermediate keys in the following way:
Using an expansion E_PC1(K), the first intermediate key K[0] is calculated (E_PC1 is a contraction, it only uses 56- of the 64-bits; the remaining ones are considered as parity-bits).

Then the following ones are calculated as K[i] = P_LSH_i(K[i - 1]), so just a special permutation (actually a left-shift) is applied to the previous intermediate key.

Finally, the subkeys k[i] are calculated as k[i] = P_PC2(K[i]), by applying a further permutation to the intermediate keys.

Note that P_PC2 contracts the 56-bit input to 48-bits, so each k[i] is 48-bits wide.

Then the cleartext M is encrypted:

Using an initial permutation, we get a 64-bit wide output: T[0] = P_IP(M)

Those are divided into two halves I[0] and r[0], each 32-bits wide.

The next 16 steps are the same, the output of each being used as the input for its successor.

For rounds i = 0,...,15:
1.  I[i + 1] = r[i]
2.  r[i + 1] = I[i] ^ P_P(S_S(E_E(r[i]) ^ k[i]))
In this equation, E_E expands the 32-bit wide r[i] to 48-bits, S_S substitute the eight 6-bit chunks by eight 4-bit-chunks using eight different but given tables, producing 32-bits of output, which are permuted by P (also giving 32-bits output).

Finally, the two halves I[16] and r[16] are concatenated and the reverse initial permutation applied to it, which gives the result: P_FP(r[16]I[16])
Now, the main mathematical improvement consists in applying E_E to both sides of equations (1) and (2):
1a.  E_E(I[i + 1]) = E_E(r[i])
2a.  E_E(r[i + 1]) = E_E(I[i] ^ P_P(S_S(E_E(r[i]) ^ k[i])))
As you see, you apply a bit-permutation on a bitwise XOR; you can as well apply the E_E permutation to both sides of the XOR first, and afterwards XOR them, because XOR doesn't change the bit positions, giving:
E_E(r[i + 1]) = E_E(I[i]) ^ E_E(P_P(S_S(E_E(r[i]) ^ k[i])))
Now we'll write L[i] instead of E_E(I[i]) and R[i] instead of E_E(r[i]), and we use the operator: F(..) = E_E(P_P(S_S(..)))

Giving:
L[i + 1] = R[i]
R[i + 1] = L[i] ^ F(R[i] ^ k[i])
And thus, always using the above two equations:
L[i + 2] = R[i + 1] = L[i] ^ F(R[i] ^ k[i])
R[i + 2] = L[i + 1] ^ F(R[i + 1] ^ k[i + 1])
         = R[i] ^ F(L[i] ^ F(R[i] ^ k[i]) ^ k[i + 1]) 
         = R[i] ^ F(L[i + 2] ^ k[i + 1])
So, as a result of the improvement, we get:
L[i + 2] = L[i] ^ F(R[i] ^ k[i])
R[i + 2] = R[i] ^ F(L[i + 2] ^ k[i + 1])
Using above result, we only have to use them eight times, as only even indices appear; however, we still have to apply an operation of the type: x ^ F(y ^ z) sixteen times.

But the operation F, which actually combines S-substitutions, P-permutation, and E-permutation, can be performed by constructing a table, input and output being 48-bits wide.

Note that you can't implement such a big table in one piece; you can, however, use four tables, each covering 12-bits of input (note that the substitutions take 6-bit chunks, so you must partition to parts divisible by six.

Each of those tables is indexed by a 12-bit input (4096 entries), giving a 48-bit result.  You use those tables by separating the four 12-bit parts, for each calculating the result by using the tables, and finally xor the four results.

For efficiency, I recommend stuffing the 48-bits to 64-bits in the way that each 12-bit sub-part is aligned to 16-bits (this allows faster access to the subparts, as rotating by 16-bits can often be performed by a special command, for example swap on a 68000).  This way, each of the 4096 entries uses 8 bytes, giving a size of 32K; all four tables then need 128K of memory.

Note, of course, that you must also modify P_IP and P_FP to add the E-permutation and take it back in the end, as in the main loop, you always calculate with L[i] and R[i] instead of I[i] and r[i]; but there's nothing new about it, and it is easy to realize it yourself.  Also note that you can make things a bit faster by combining P_PC2, P_LSH, and P_PC1; but the main time of the algorithm is eaten away by the main loop, this one is performed 400 times during a crypt() encryption.

That was the mathematical part.  Now considering UNIX's crypt(); it works the following way: using a 12-bit salt code, the E-permutation is modified by swapping some entries.

Then the password is taken as key, which encrypts a block of 64 0-bits according to DES 25 times (thus the above operation is executed 25 * 16 = 400 times).  Note that you can leave away the intermediate P_IP and P_FP permutations, as they are inverse operations.  Also note that you need to calculate the sub-keys only one time (they are re-used).

I'm using the following procedure to check a password: out of the encrypted password: I extract the salt-code (the first two characters), which isn't encrypted.  Based on it, I build the modified E_E table and then the F table because F depends on E_E.

This takes a lot of time, because it fills 128K of memory (it eats nearly a second in my implementation), but this doesn't count much, because you only need to do it once; afterwards you probably use the encryption thousands of times using the same salt, depending on the size of your dictionary.  Also think of the fact that, to be efficient, you should check all encrypted passwords with the same salt-code within a password file at the same time, which can be done by sorting the password file according to their salt.

That was all about it.

Note that it's best to implement it in assembler.  The C version is much slower, mainly because of the lack of a command to rotate a bit string (C supports only shifting), and because you're unable to express an action like swap (which exchanges low and high 16-bits of a 32-bit word) in an efficient way.  However, a C version is easier to implement on a machine with an unknown hardware.  Unfortunately, I don't know SPARC assembler...

crypt-bsd-4.3-reno.c
#if defined(LIBC_SCCS) && !defined(lint)
static char sccsid[] = "@(#)crypt.c	5.3 (Berkeley) 5/11/90";
#endif LIBC_SCCS and not lint

/*
 * This program implements the
 * Proposed Federal Information Processing
 *  Data Encryption Standard.
 * See Federal Register, March 17, 1975 (40FR12134)
 */

/*
 * Initial permutation,
 */
static	char	IP[] = {
	58,50,42,34,26,18,10, 2,
	60,52,44,36,28,20,12, 4,
	62,54,46,38,30,22,14, 6,
	64,56,48,40,32,24,16, 8,
	57,49,41,33,25,17, 9, 1,
	59,51,43,35,27,19,11, 3,
	61,53,45,37,29,21,13, 5,
	63,55,47,39,31,23,15, 7,
};

/*
 * Final permutation, FP = IP^(-1)
 */
static	char	FP[] = {
	40, 8,48,16,56,24,64,32,
	39, 7,47,15,55,23,63,31,
	38, 6,46,14,54,22,62,30,
	37, 5,45,13,53,21,61,29,
	36, 4,44,12,52,20,60,28,
	35, 3,43,11,51,19,59,27,
	34, 2,42,10,50,18,58,26,
	33, 1,41, 9,49,17,57,25,
};

/*
 * Permuted-choice 1 from the key bits
 * to yield C and D.
 * Note that bits 8,16... are left out:
 * They are intended for a parity check.
 */
static	char	PC1_C[] = {
	57,49,41,33,25,17, 9,
	 1,58,50,42,34,26,18,
	10, 2,59,51,43,35,27,
	19,11, 3,60,52,44,36,
};

static	char	PC1_D[] = {
	63,55,47,39,31,23,15,
	 7,62,54,46,38,30,22,
	14, 6,61,53,45,37,29,
	21,13, 5,28,20,12, 4,
};

/*
 * Sequence of shifts used for the key schedule.
*/
static	char	shifts[] = {
	1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1,
};

/*
 * Permuted-choice 2, to pick out the bits from
 * the CD array that generate the key schedule.
 */
static	char	PC2_C[] = {
	14,17,11,24, 1, 5,
	 3,28,15, 6,21,10,
	23,19,12, 4,26, 8,
	16, 7,27,20,13, 2,
};

static	char	PC2_D[] = {
	41,52,31,37,47,55,
	30,40,51,45,33,48,
	44,49,39,56,34,53,
	46,42,50,36,29,32,
};

/*
 * The C and D arrays used to calculate the key schedule.
 */

static	char	C[28];
static	char	D[28];
/*
 * The key schedule.
 * Generated from the key.
 */
static	char	KS[16][48];

/*
 * The E bit-selection table.
 */
static	char	E[48];
static	char	e[] = {
	32, 1, 2, 3, 4, 5,
	 4, 5, 6, 7, 8, 9,
	 8, 9,10,11,12,13,
	12,13,14,15,16,17,
	16,17,18,19,20,21,
	20,21,22,23,24,25,
	24,25,26,27,28,29,
	28,29,30,31,32, 1,
};

/*
 * Set up the key schedule from the key.
 */

setkey(key)
char *key;
{
	register i, j, k;
	int t;

	/*
	 * First, generate C and D by permuting
	 * the key.  The low order bit of each
	 * 8-bit char is not used, so C and D are only 28
	 * bits apiece.
	 */
	for (i=0; i<28; i++) {
		C[i] = key[PC1_C[i]-1];
		D[i] = key[PC1_D[i]-1];
	}
	/*
	 * To generate Ki, rotate C and D according
	 * to schedule and pick up a permutation
	 * using PC2.
	 */
	for (i=0; i<16; i++) {
		/*
		 * rotate.
		 */
		for (k=0; k<shifts[i]; k++) {
			t = C[0];
			for (j=0; j<28-1; j++)
				C[j] = C[j+1];
			C[27] = t;
			t = D[0];
			for (j=0; j<28-1; j++)
				D[j] = D[j+1];
			D[27] = t;
		}
		/*
		 * get Ki. Note C and D are concatenated.
		 */
		for (j=0; j<24; j++) {
			KS[i][j] = C[PC2_C[j]-1];
			KS[i][j+24] = D[PC2_D[j]-28-1];
		}
	}

	for(i=0;i<48;i++)
		E[i] = e[i];
}

/*
 * The 8 selection functions.
 * For some reason, they give a 0-origin
 * index, unlike everything else.
 */
static	char	S[8][64] = {
	14, 4,13, 1, 2,15,11, 8, 3,10, 6,12, 5, 9, 0, 7,
	 0,15, 7, 4,14, 2,13, 1,10, 6,12,11, 9, 5, 3, 8,
	 4, 1,14, 8,13, 6, 2,11,15,12, 9, 7, 3,10, 5, 0,
	15,12, 8, 2, 4, 9, 1, 7, 5,11, 3,14,10, 0, 6,13,

	15, 1, 8,14, 6,11, 3, 4, 9, 7, 2,13,12, 0, 5,10,
	 3,13, 4, 7,15, 2, 8,14,12, 0, 1,10, 6, 9,11, 5,
	 0,14, 7,11,10, 4,13, 1, 5, 8,12, 6, 9, 3, 2,15,
	13, 8,10, 1, 3,15, 4, 2,11, 6, 7,12, 0, 5,14, 9,

	10, 0, 9,14, 6, 3,15, 5, 1,13,12, 7,11, 4, 2, 8,
	13, 7, 0, 9, 3, 4, 6,10, 2, 8, 5,14,12,11,15, 1,
	13, 6, 4, 9, 8,15, 3, 0,11, 1, 2,12, 5,10,14, 7,
	 1,10,13, 0, 6, 9, 8, 7, 4,15,14, 3,11, 5, 2,12,

	 7,13,14, 3, 0, 6, 9,10, 1, 2, 8, 5,11,12, 4,15,
	13, 8,11, 5, 6,15, 0, 3, 4, 7, 2,12, 1,10,14, 9,
	10, 6, 9, 0,12,11, 7,13,15, 1, 3,14, 5, 2, 8, 4,
	 3,15, 0, 6,10, 1,13, 8, 9, 4, 5,11,12, 7, 2,14,

	 2,12, 4, 1, 7,10,11, 6, 8, 5, 3,15,13, 0,14, 9,
	14,11, 2,12, 4, 7,13, 1, 5, 0,15,10, 3, 9, 8, 6,
	 4, 2, 1,11,10,13, 7, 8,15, 9,12, 5, 6, 3, 0,14,
	11, 8,12, 7, 1,14, 2,13, 6,15, 0, 9,10, 4, 5, 3,

	12, 1,10,15, 9, 2, 6, 8, 0,13, 3, 4,14, 7, 5,11,
	10,15, 4, 2, 7,12, 9, 5, 6, 1,13,14, 0,11, 3, 8,
	 9,14,15, 5, 2, 8,12, 3, 7, 0, 4,10, 1,13,11, 6,
	 4, 3, 2,12, 9, 5,15,10,11,14, 1, 7, 6, 0, 8,13,

	 4,11, 2,14,15, 0, 8,13, 3,12, 9, 7, 5,10, 6, 1,
	13, 0,11, 7, 4, 9, 1,10,14, 3, 5,12, 2,15, 8, 6,
	 1, 4,11,13,12, 3, 7,14,10,15, 6, 8, 0, 5, 9, 2,
	 6,11,13, 8, 1, 4,10, 7, 9, 5, 0,15,14, 2, 3,12,

	13, 2, 8, 4, 6,15,11, 1,10, 9, 3,14, 5, 0,12, 7,
	 1,15,13, 8,10, 3, 7, 4,12, 5, 6,11, 0,14, 9, 2,
	 7,11, 4, 1, 9,12,14, 2, 0, 6,10,13,15, 3, 5, 8,
	 2, 1,14, 7, 4,10, 8,13,15,12, 9, 0, 3, 5, 6,11,
};

/*
 * P is a permutation on the selected combination
 * of the current L and key.
 */
static	char	P[] = {
	16, 7,20,21,
	29,12,28,17,
	 1,15,23,26,
	 5,18,31,10,
	 2, 8,24,14,
	32,27, 3, 9,
	19,13,30, 6,
	22,11, 4,25,
};

/*
 * The current block, divided into 2 halves.
 */
static	char	L[64], *R = L+32;
static	char	tempL[32];
static	char	f[32];

/*
 * The combination of the key and the input, before selection.
 */
static	char	preS[48];

/*
 * The payoff: encrypt a block.
 */

encrypt(block, edflag)
char *block;
{
	int i, ii;
	register t, j, k;

	/*
	 * First, permute the bits in the input
	 */
	for (j=0; j<64; j++)
		L[j] = block[IP[j]-1];
	/*
	 * Perform an encryption operation 16 times.
	 */
	for (ii=0; ii<16; ii++) {
		/*
		 * Set direction
		 */
		if (edflag)
			i = 15-ii;
		else
			i = ii;
		/*
		 * Save the R array,
		 * which will be the new L.
		 */
		for (j=0; j<32; j++)
			tempL[j] = R[j];
		/*
		 * Expand R to 48 bits using the E selector;
		 * exclusive-or with the current key bits.
		 */
		for (j=0; j<48; j++)
			preS[j] = R[E[j]-1] ^ KS[i][j];
		/*
		 * The pre-select bits are now considered
		 * in 8 groups of 6 bits each.
		 * The 8 selection functions map these
		 * 6-bit quantities into 4-bit quantities
		 * and the results permuted
		 * to make an f(R, K).
		 * The indexing into the selection functions
		 * is peculiar; it could be simplified by
		 * rewriting the tables.
		 */
		for (j=0; j<8; j++) {
			t = 6*j;
			k = S[j][(preS[t+0]<<5)+
				(preS[t+1]<<3)+
				(preS[t+2]<<2)+
				(preS[t+3]<<1)+
				(preS[t+4]<<0)+
				(preS[t+5]<<4)];
			t = 4*j;
			f[t+0] = (k>>3)&01;
			f[t+1] = (k>>2)&01;
			f[t+2] = (k>>1)&01;
			f[t+3] = (k>>0)&01;
		}
		/*
		 * The new R is L ^ f(R, K).
		 * The f here has to be permuted first, though.
		 */
		for (j=0; j<32; j++)
			R[j] = L[j] ^ f[P[j]-1];
		/*
		 * Finally, the new L (the original R)
		 * is copied back.
		 */
		for (j=0; j<32; j++)
			L[j] = tempL[j];
	}
	/*
	 * The output L and R are reversed.
	 */
	for (j=0; j<32; j++) {
		t = L[j];
		L[j] = R[j];
		R[j] = t;
	}
	/*
	 * The final output
	 * gets the inverse permutation of the very original.
	 */
	for (j=0; j<64; j++)
		block[j] = L[FP[j]-1];
}

char *
crypt(pw,salt)
char *pw;
char *salt;
{
	register i, j, c;
	int temp;
	static char block[66], iobuf[16];

	for(i=0; i<66; i++)
		block[i] = 0;
	for(i=0; (c= *pw) && i<64; pw++){
		for(j=0; j<7; j++, i++)
			block[i] = (c>>(6-j)) & 01;
		i++;
	}
	
	setkey(block);
	
	for(i=0; i<66; i++)
		block[i] = 0;

	for(i=0;i<2;i++){
		c = *salt++;
		iobuf[i] = c;
		if(c>'Z') c -= 6;
		if(c>'9') c -= 7;
		c -= '.';
		for(j=0;j<6;j++){
			if((c>>j) & 01){
				temp = E[6*i+j];
				E[6*i+j] = E[6*i+j+24];
				E[6*i+j+24] = temp;
				}
			}
		}
	
	for(i=0; i<25; i++)
		encrypt(block,0);
	
	for(i=0; i<11; i++){
		c = 0;
		for(j=0; j<6; j++){
			c <<= 1;
			c |= block[6*i+j];
			}
		c += '.';
		if(c>'9') c += 7;
		if(c>'Z') c += 6;
		iobuf[i+2] = c;
	}
	iobuf[i+2] = 0;
	if(iobuf[1]==0)
		iobuf[1] = iobuf[0];
	return(iobuf);
}

Return to $2600 Index