C: The beginning ---------------- Well, C is a relatively simple language, in that it does not have too many language constructs. That's because the language is relatively low-level, it's quite close to the machine, so to say. And the machine is a thing of anarchy and chaos. C was developed to get some structure in this, without sacrificing too much performance, and size. We have: - Data declarations - Functions - Operators - Flow control statements - Type definitions That is what you have to work with. In short, code consists of functions, containing (flow control) statements, with operations on data of specific types. That may seem a bit too much to grasp at this point, but we'll take it one step at a time. Data ---- Ofcourse we need data... like text or numbers. On data, we can perform operations, like adding, subtracting, multiplying and such. So first, let's see how we can give our programs some data. We have 2 kinds of data: initialized data, and uninitialized data. They are declared in much the same way, except that initialized data gets a value assigned at declaration, and uninitialized data does not. So the initial value of an uninitialized variable is undefined. First you give the type of your data variable, then you give the name: char myChar; This is an unintialized data variable. Initialized data works by assigning an initial value to the variable: char myChar = 'a'; Well, char is just 1 primitive data type of C. I will give you the complete list and their sizes in memory here: - char 1 byte integer, also used for characters. - int 2 or 4 bytes integer, depending on the system architecture. - float 4 bytes floating point number. - double 8 bytes floating point number. - (pointers) depending on the system architecture. As you see, some data types are dependant on the system. So to make things easier, I will choose the popular x86 system in 32 bit mode from now on. Note that other systems may vary. There are also 2 'size modifier' directives: - short 2 bytes - long 4 bytes These directives can be prefixed to int, and will determine the size. In most compilers, you can omit the int, and just use long and short as if they were primitive types themselves. So for 2 byte integers, both these declarations are correct: short int a; short a; Similarly for 4 byte integers: long int a; long a; And if there's no size modifying directive for an int, the compiler will use the default size, which is platform-dependant. On the x86 system, ints are long, 4 bytes, and so are pointers. Pointers are a special group of data types, which I will cover later. (in 16 bit realmode OSes, ints used to default to short, and pointers could be either near or far. This had to do with the segmented memory model on old x86 processors (8086, 8088, 80186 and 80286). This legacy system is beyond the scope of this text, since modern x86 systems use 32 bit addressing. But when using a realmode OS such as DOS, you have to pay attention to this.) These data types are seen as signed numbers by default. Signed means that the number can be both positive and negative. Unsigned variables can not have negative values. This is interesting, because a char is only 1 byte, or 8 bits big. It can take on 2^8, or 256 values. With signed, this would be -128 to and with 127. With unsigned, it would be 0 to and with 255. You can control this behavior with the signed and unsigned directives: signed int; unsigned int; It is also legal to define multiple variables of the same type on one line, even intermixing initialized and uninitialized data. It works by simply separating all variables by comma's, like this: unsigned int myVar1, myVar2, myVar3 = 50, myVar4; Functions --------- Functions are the core of any C program. They contain the actual code, and therefore provide the functionality of the program. A function can receive parameters, the data it will process. And a function can return a primitive data type variable. You declare a function in the sequence of return type, function name, and parameter list (in parentheses): int MyFunction(int param1, unsigned char param2, signed short param3) Functions also have the possibility to not return anything. In that case we have the special void data type. We will see this type again later with pointers. This data type can also indicate that we want no parameters. So if you don't need any return value, and no parameters, then you can do: void MyFunction(void) (note: there's old-style and new-style for functions with no parameters. Official ANSI C wants MyFunction(void), but before the ANSI C standard was introduced, MyFunction() was used. For most compilers, both styles should work, but some (eg Borland) may enforce the ANSI C (void) style.) Blocks of code are always between curly braces: {}. So the code that goes into our function is no different. The code block immediately follows our first line which declared the function prototype. This might also be a good time to explain how to add comments to your program. A C comment is prefixed by /* and postfixed by */. Anything between those symbols is considered as comment, and will not be looked at by the compiler. A small example: int main(void) { /* Print some text to the screen, using a library function */ puts("Hello world!"); /* Exit function with return value */ return 0; } Here we have a function calling the puts() function with a text string as a parameter (puts() will 'put' the 's'tring on screen. We will look at these strings later, aswell as the puts function), and then returning a signed integer value of 0 (this is an immediate operand). Note also that each line of code in C is delimited by a semicolon (;). Now, to look at the calling of functions more closely... You can use functions from your own source, but you can also import functions from earlier compiled modules of code, or libraries. Libraries are made up of a number of modules of code. ANSI C comes with quite a few libraries of code, which you can use in your programs. With these libraries, you also get header files, which include these function prototypes, among other things. We will look at these header files more closely lateron, when we are actually going to write a program. Before you can call a function, the compiler needs to know how many parameters are to be passed to the function, and what types they are. This is done via a prototype of the function. If a function is defined in your own source code, above the line where you want to call it, then the compiler already knows the prototype, since it has seen the actual function before, and you won't have to do anything. If a function is below your call, or imported from a code library or module, then the compiler won't know the function, so we have to provide a prototype before using the function. A prototype looks much like the first line of a function, except that the parameter names are optional, and are usually omitted. An example of a prototype: int MyFunction(int x, char y, short z); Or, omitting the names: int MyFunction(int, char, short); Then you're all set to use the function lateron in your source. Calling a function is as simple as filling in the blanks, basically. All you have to do is fill in the variables, in the prototype, and the function will be called, and its return value will be yielded. You can either import functions from another library of code, or use functions from your own source. For example, if we have a function like this: float sqrt(float); which will return the square root of a float we give it, we can make a small piece of code like this: float x = 25, sqrtOfX; sqrtOfX = sqrt(x); As you can see, you can treat a function call like a number. In this example, the parameter x will be passed to the function, the function will do its work, and return the square root of x. And this result will be assigned to the sqrtOfX variable. Operators --------- So now that we know how and where to put our code, the next question ofcourse will be: "How do I write code?". That is not a trivial question, so we will break code down to some subsets. Our first subset is "operations on data". As with most language constructs, C does not have too much operations on data. Here's the list: Mathematical operators: - + : addition. - - : subtraction. - * : multiplication. - / : division. - % : division remainder/modulus. Bitwise operators: - & : AND - | : OR - ^ : XOR - ~ : NOT - >> : shift right - << : shift left They all work the same, in that you specify a target variable, then the first operand, the operator, and then the second operand. I will give a small example, with some data: int destination; int operand1 = 10; int operand2 = 20; destination = operand1 * operand2; This will assign the value of the expression 10 * 20 to the destination variable. Well, to be more precise, the right hand side is an expression, which yields a result. You could just write this in C: operand1 * operand2; This would yield the result, but it never does anything with it. In these first examples, we will assign the result to a variable, but we will see that there are other things we can do with expressions, such as combining them to larger expressions. You could say that the above expression is a 'primitive expression' You can also use immediate operands instead of variables: int destination; int operand1 = 15; destination = operand1 / 3; This will assign the result of 15 divided by 3 to destination. There is also shorthand notation for the case where one of the operands is also the destination variable. The shorthand notation works with putting the operator directly in front of the equals-sign, and specifying only the other operand. So: destination = destination ^ 10; can be written as: destination ^= 10; There's another shorthand case, namely when you want to increase or decrease the value of a variable by 1 unit. Why do I call it a unit? We'll see that later, when discussing pointers. For numbers, the unit is simply the number 1. These are the operators for it: - ++ : increase - -- : decrease They work slightly different from the normal operators. You just prefix or postfix them to a variable, there is no equals-sign involved. When postfixing the operator, the value is used in an expression, and afterwards its value is increased. When prefixing it, the value is increased first, then used in the expression. Some examples: int destination; int operand1 = 30; int operand2 = 19; destination = operand1 - ++operand2; This will result in the following values: destination = 30 - 20 = 10 operand1 = 30 operand2 = 20 Postfixing the operator: destination = operand1 - operand2++; Gives the following results: destination = 30 - 19 = 21 operand1 = 30 operand2 = 20 You can also make more complex expression. The compiler should follow standard operator precedence, and resolve expressions in brackets first. You can recursively compose expressions, yielding the results of each 'primitive equation' in the sequence the brackets and operator precedence have defined. For example, this is possible: int destination; int var1 = 10; int var2 = 20; int var3 = 30; destination = 27 * ((var2 + --var3)/(var1 * 2)) This will give destination the value of: 27 * ((20 + 29)/(10 * 2)) = 27 * 49/20 = 27 * 2 = 54 And finally, you can also use the results of functions in expressions: destination = 25 + sqrt(var0); Now, onto another kind of expressions... Flow-control statements ----------------------- Here comes the interesting part of code, namely controlling the flow of our program, on specific conditions. We do this with boolean expressions, a condition is either TRUE or FALSE. Since a computer can only represent numbers, we express these boolean values in integers (either char, short, int or long). A 0 stands for false, everything else stands for true. But as a rule, when assigning the true value to a variable, 1 is used. For boolean expressions we have some operators: - == : equals - != : not-equals - && : logical and - || : logical (inclusive) or - ! : logical not - > : greater than - < : less than - >= : greater than or equal - <= : less than or equal Any boolean expression should yield the value 0 for false and 1 for true. They work much the same as the previous expressions, namely with first operand, operator, second operand. We will usually not store them in new variables however, but use the result directly in a statement. A simple example of a boolean expression would be: short var; var == 15; The result of this expression is true, or 1, if var equals 15, and false, or 0 if var is any other value than 15. Another example: var1 && var2; This expression results in true if, and only if both var1 and var2 are true (non-zero). var1 || !var2; This expression is true when var1 is true and/or var2 is false. Boolean expressions can also be combined: var1 && !(var2 >= 10 || var3); We have only a handful of flow-control statements: - if-else - do-while - for - break - continue - switch-case - goto The if-statement is simple, but effective... 'If (this expression is true) then run this block of code'. And you can optionally run an alternative block of code if the expression was not true, using the else-statement after the first block of code. Going something like this: int var1, var2; if (var1 == 3) { var2 += 10; } else { var1 /= 5; } do and while can be used to run a block of code in a loop, while ('as long as') the boolean expression is true. do-while() is a special form of while(), where the expression is checked after the code block is executed, instead of before, with the normal while(). As a result, the code block is always run at least once. As an example, I shall show the code for a power function: int mantissa = 5, exponent = 3, result = 1; while (exponent--) result *= mantissa; At the end of the loop, result will contain 5^3. (And as you see, when there's only 1 line of code in the loop, there's no need to put it in between the {} brackets). Now for an example with do-while: int result, var1 = 3, var2 = 7; do { result = var1 * var2; var1 += var2; } while (result < 5000); Here you can't result before entering the loop, because result is not initialized yet. So basically it has just the value that the last program left there when using that piece of memory. Checking its value would be irrelevant. We could make it an initialized value, but we know that the result is less than 5000 the first time anyway, so this way, we save 1 check, and we save the trouble of initializing the result value. The for-loop is similar to a while-loop, but it has a special construct, where you can not only specify a boolean expression which must be true to loop, but you can also initialize some variables before entering the loop, and you can specify some expressions which will be carried out after each loop. These expressions are usually used to update the variables that are used for the loop. It goes like this: for (; ; ) Or less abstract (an example which calculates faculty of x): int i, x, y; for (i = 1, x = 5, y = 1; i <= x; i++) { y *= i; } In fact, it can be shortened to this: int i, x, y; for (i = 2, x = 5, y = 1; i <= x; y *= i++); (Here we see that when there's no code block following a statement, we can just delimit the line of code, and with that the loop, with a ;) The expressions in a for-loop are optional. For an endless loop, you can simply omit all expressions: for (;;) { /* Put code in endless loop here */ } Which brings us to the next question... What if we want to exit a loop such as this endless loop, when a certain condition occurs? That's where the break statement comes in. The break statement will exit the current loop, and continue the rest of the program. You will usually use this with an if (conditon) break; construction. A small example: Let's say you've got a program that reacts to the user's input (a menu or something). It should stay in the loop until the user chooses 'quit': for (;;) { /* Put code to display the options here */ if (getUserInput() == QUIT) break; else doSomething(); } Question: Do we really need to put doSomething() in an else-statement? Answer: In this case we don't. Notice that when the if is true, it will only do the break (and therefore breaks out of the loop). When the if is false, it will do doSomething(). The continue statement is similar to the break statement, but it only exits the current cycle of the loop and enters the next. Going something like this: int i, n, m; /* user inputs n */ for (i = 0; i < 100; i++) { m = i % n; if (m == 0) continue; n /= m; } We see here how we can exit the current loop on a certain condition. In this case we exit if m is 0, to avoid a division-by-zero exception. But, now back to our menu-example of earlier... OK... so now you're probably wondering why you'd want a menu with only a quit-option. Well... You don't. Now, there are two ways to add new options to the menu. You could of course do it by adding if's: int userinput; for (;;) { /* Put code to display the options here */ userinput = getUserInput(); if (userinput == FILE) doFile(); else if (userinput == EDIT) doEdit(); else if (userinput == VIEW) doView(); else if (userinput == QUIT) break; else doBadInput(); } OK, this'll work, but it's not the ideal solution. The compiler only sees a couple of if's, but it isn't able to tell whether they have anything in common or not (compilers aren't that smart yet.), therefore the code it produces won't be as efficient as it could be. The best way to do it is to use the switch-case statement for this. The switch works by testing one int variable against several (const) solutions. The above example menu-example using switch (): int userquit; for (userquit = 0; userquit == 0;) { /* Put code to display the options here */ userinput = getUserInput(); switch (userinput) { case EDIT: doEdit(); break; case VIEW: doView(); break; case QUIT: userquit = 1; break; default: doBadInput(); } } Some comments on this structure: Firstly, the code looks much better :) Secondly, for the compiler, it's very easy to see that the cases are related to each other, because they all depend on the same variable, by definition. Hence the compiler can make optimizations that a normal block of if's won't allow. And thirdly: see the break at the end of each case? That is there to stop the program entering the other cases. You can use this mechanism if you want to make the program do the same thing for some of the cases. For example suppose you're on a diet that prescribes that you can only eat meat on tuesday, thursday and sunday, and you want a program to print (when given a day-number between 1 and 7) if that day is a meat-day? /* day 1 is monday */ int day = getUserInt(); switch (day) { case 2: case 4: case 5: puts("Meatday!!"); break; case 1: case 3: case 6: case 7: puts("No meat"); break; default: puts("This day doesn't exist, no meat for you, pal!"); } Let's say we enter day = 5. It'll start executing from case 5: until it hits a break. See how good this looks? You can instantly see which cases do what... Much better than: if (day == 1 || day = 3 || day == 6 || day == 7) puts("No meat"); else if (day >1 && day <8) puts("Meatday!!"); else puts("The day doesn't exist"); The compiler can't really optimize this in terms of speed, but here it just looks better. So there! And now onto our last control flow statement: goto. This statement can be used to make jumps to and fro in the code. It can be quite useful in some cases. The destination will be marked by a label. Labels are defined by a name, followed by a semicolon: myLabel: They are useful for exiting multiple nested loops at once, where break cannot, among other things. That would look something like this: int i, j; for (i = 25; i >= 0; i--) for (j = 0; j <= 25; j++) if (i == j * j) goto endOfLoop; endOfLoop: /* program continues here */ And that about wraps up our control-flow. Now on to the next challenge... Arrays and pointers ------------------- Okay, here comes an interesting part of C. We will get into direct contact with a part of the machine here, namely the memory. The variables we used earlier were stored in memory aswell, but we didn't get to see where and how exactly. The compiler took care of that for us, we could just use the variables by the name and type we had given them. Now we are going to use linear sets of data, called arrays. And to understand how they work exactly, we have to look at how the machine uses its memory. So how does the machine use its memory? You could picture it as a giant cupboard of drawers, and each drawer has a its own unique number. In every drawer we can store one byte. Picture it like this: 0: [ ] 1: [ ] 2: [ ] 3: [ ] 4: [ ] 5: [ ] ... Or perhaps you prefer a horizontal look at it: 0: 1: 2: 3: 4: 5: ... [ ][ ][ ][ ][ ][ ]... So to get a byte in memory, all we have to know is the number of its container. We call this number the address. It's also interesting to look at how we store larger variables in memory. This is another platform-dependant matter. An int for example is 4 bytes. Now, there is 2 ways to store those 4 bytes in memory. The way we store multi- byte numbers into memory, is called Endianness. There is Big Endian, and Little Endian. Let's say that the 4 bytes of our int are AA, BB, CC and DD, from most significant part to least significant part. So our int looks like AABBCCDD. In Big Endian, we store it like humans write numbers: from left to right, most significant to least significant, so it will look like this: [AA][BB][CC][DD] Little Endian stores it from least significant to most significant instead. So we get the reverse: [DD][CC][BB][AA] Nearly all systems use the Big Endian byte order these days, it became the preferred method for new systems in the 80s. It is also the 'network byte order', used on all standard networks, including ofcourse the internet. But, bear in mind that the original 8086 dates from 1978, when Little Endian was still used predominantly, so the x86 is one of very few systems around today, that still uses the now largely obsolete Little Endian method. So, in short, the address of a multibyte variable will always point to the most significant byte in Big Endian, and it will always point to the least significant byte in Little Endian. Well, what is a pointer? That is simply the address or 'reference' of a variable in memory. A pointer has a type, and contains an address, which is a 32 bit number. So like a normal variable, it actually contains a number, and it is stored in memory (storage is also affected by the Endianness of the architecture, as with the normal multibyte variables). Declaring a pointer of some type is as simple as giving the type, and then the name, with an asterisk (*) prefixed to it: int *myPointer; You can use pointers to any of the primitive types, and also to user-defined types, which we will see later. But now, how do we assign an address value to it? One option is to use the 'address-of' operator, which gets the address of a variable. It is as simple as prepending '&' to the variable name: char myChar = 25; '&myChar' will then resolve to the address of the char in memory. Assigning a value to a pointer works the same as with the other variables: char *myPointer, myChar = 25; myPointer = &myChar; Now myPointer contains a reference to myChar. There is also a dereferencing operator in C, this will get the value of the variable at the address that is being pointed to. It works by prepending a '*' to the variable name. You could say that it is the opposite of the &-operator. The &-operator turns a variable into a pointer, and the *-operator turns a pointer into a variable. So we could store the char that myPointer is pointing to, back into a char like this: char *myPointer, myChar = 25, newChar; myPointer = &myChar; newChar = *myPointer; Now newChar has a value of 25; We could also assign a value to an address by dereferencing a pointer. Take a careful look at this: char *myPointer, myChar; myPointer = &myChar; *myPointer = 25; We store 25 at the address that myPointer is pointing to. But, pay attention here! Look what happened... myPointer contained the address of myChar. So by storing 25 to myPointer, we stored it into the address of myChar, and therefore myChar now has the value of 25! Now from pointers on to arrays... An array in the field of programming is basically what the word means: an array, a row. A row of variables of the same type, more specifically, stored in one linear piece of memory. You can picture it like this: An array of 6 chars (1 byte), starting at address 521: Address: 521: 522: 523: 524: 525: 526: Index: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] Note that we start indexing by 0, not 1. So, the array of 6 chars has indices 0 to 5 for all the elements there. More generally speaking: N elements of an array are indexed from 0 to (N-1) Also note that we can get the address of each individual element by: starting address + index Let's look at arrays of bigger types. For example, an array of long ints. An array of 4 long ints (4 bytes), starting at address 64: Address: 64: 68: 72: 76: Index: [ 0][ 1][ 2][ 3] We see here that the correct addressing formula for all types is: starting address + (index*sizeof(type)) Incidently, sizeof(type) is recognized by C. If we put: sizeof(int) it will evaluate to a value of 4, which is the number of bytes in an int. More generally speaking, sizeof(x) will evaluate to the total number of bytes of memory used by x. This is applicable to all primitive and user-defined types, and as we will see later, also to arrays and structures. How do we define arrays and use arrays in C? There's basically 2 variations... We have statically allocated arrays and dynamically allocated arrays. First we will look at the statically allocated ones. They are defined much like a single variable: type, name. But after the variable name we put the number of elements we want, in square brackets []. Looking like this: char myArray[256]; Now, we have set up an array of 256 (uninitialized) elements. myArray is the pointer to the first element in the array. To access an element in the array, you can simply use the subscript operator ([]), as it is called: name[index]. For example, we want to test whether element 5 equals 25. You can manipulate myArray[5] like a normal char, so we can just create a normal boolean expression with it: if (myArray[5] == 25) { /* Do something */ } Assigning values and doing operations on array elements are also done like we've seen before. Here's a small example that multiplies every element of an array by 23: short numbers[36], i; for (i = 0; i < (sizeof(numbers)/sizeof(numbers[0])); i++) { numbers[i] *= 23; } Lets also take a closer look at this expression: (sizeof(numbers)/sizeof(numbers[0])); This yields the size of the array. So i will go from 0 to 36. Namely, what happens here, is this: sizeof(numbers) will give use the total amount of memory used for the array numbers. This will be in bytes. But a short int is 2 bytes, so we need to correct for that. sizeof(numbers[0]) will give us the size of element 0 of the array. Element 0 is 1 short int, so this will give us 2 bytes (ofcourse all elements in the array are equal size). So if we divide the two, we get this: (sizeof(numbers)/sizeof(numbers[0])) = 72/2 = 36 Which is our arraysize. This little trick is very useful. If for example we would change the type of the array to long int later, this line can remain untouched, as it would still yield the correct arraysize. And if we decide to change the size of the array to say 50, then all we have to do is to change the definition of the array: short numbers[50]; and the for-loop will still loop through all elements. It's also possible to put initialized data into an array. In that case, the size does not need to be specified, because the compiler can derive that from the number of elements it needs to add. We give a list of elements, in {} brackets, and separate each element with a comma: int myArray[] = { 10, -642342, 12321, 213122, 1231 }; For text, there is a special array definition. Text is stored as an array of chars, terminated by a 0. For example: char myText[] = {'H','e','l','l','o',0}; But, C provides a special construct for such text strings. The following is equivalent: char myText[] = "Hello"; The 0-terminator will be appended automatically. This construct will also allow you to append strings together. If you put 2 or more strings after one another, they will be appended to form 1 string. This makes it possible to split long strings up in multiple lines, or even add comments inbetween. A few examples: char myText[] = "Hi" " there!" " How are you doing?"; char myText[] = "Hello, " "how are you?"; char myText[] = "This is version " /* edit this */ "0.1 beta" " of the software"; It's also possible to initialize some elements, and specify a size. This can be useful when for example only the first element needs to be 0 (for a zero- terminated list, for example): short myArray[32] = { 0 }; myArray is a pointer, in that it evaluates to a memory address, but it's a bit different from the ones we've seen earlier. Namely, this pointer is not stored in memory, but the address is a constant rather, which you can use, but not modify. The pointers we've seen earlier, are stored in memory, and act like variables. You can also use operations on them. For example, we want to fill an array with all powers of 7: unsigned int myArray[256], *myPointer, i, power = 1; myPointer = myArray; for (i = 0; i < (sizeof(myArray)/sizeof(myArray[0])); i++) { *myPointer++ = (power *= 7); } Now, a few notes here... First, this part: *myPointer++ The ++ postfixed to myPointer basically does what you would expect it to do. The operation is performed, and after that, the variable is increased. In this case, the operation is a dereference. So we assign a value to the address that myPointer points to, and then we increase its value by 1. "1 what?" you may ask. The answer to that, is "1 element". If you want to increase the pointer by say 6 elements, then you can just add 6 to it, like this: myPointer += 6; Basically, you can do any operation on it, even stuff like adding 2 pointers. Note however, that it adds 6 elements. The actual address that myPointer contains, will be increased by 6 * sizeof(type), so in this case, that is 6 * 4 = 24. We will look into that some more, when we get to typecasts. Now to the second part: (power *= 7); This is a nice shorthand notation. The statement in brackets is executed, and its value is evaluated, and can be assigned to our array-element. So, power gets multiplied by 7, and the new value of power is then assigned to *myPointer. On to dynamically allocated arrays. Dynamically allocated arrays can be useful when you don't know the size of the array beforehand, because it is dependant on user input, for example. Or, when you don't need the array for the duration of the entire program, and you would like your memory deallocated after you're done with it. For the allocation of memory, we have the following function: void *malloc( size_t size ); Well, we see here that it returns a void *. A type we haven't seen before. Basically, it is a typeless pointer, and it cannot be dereferenced, since we don't know the type, and therefore we don't know how large an element would be. But, we assign its value to a pointer of the type of the array that we need, so there is no trouble. Assigning the value of a variable of one type to the value of another type, is called typecasting. Let's look deeper into that before we continue. Typecasting ----------- This is a very simple and short subject... Typecasting a variable is basically forcing the compiler to interpret the data of the variable as if it were of another type. This is done by putting the desired type before the variable (or expression), in brackets, looking like this: (int)myVariable; Here's an example on how a cast can affect variables: short i; char j = -1; i = (unsigned char)j; Now, instead of what you would expect to happen, i is not -1 now, but it is 255. What happened is this: j is cast to an unsigned char. The bitpattern for -1 is 11111111 in 2s complement notation. Now, we interpret that same bit pattern as if it were not signed. In that case, 11111111 is equal to 255. And that is the value assigned to i. You could also use a cast to add a certain amount of bytes to a pointer, instead of a certain amount of elements. A char is 1 byte, so we could cast our pointer to a char pointer temporarily, and add the number of bytes we want. For example: int *myPointer, myBytes; (char *)myPointer += myBytes; We temporarily change the type of myPointer to char *, then we add the number of bytes we want, which is stored in myBytes in this case. Afterwards, myPointer will be an int * again, since a cast is only temporary. In most cases, a cast is implicit. For example, void * can always be cast to other pointer types. In some cases tho, the compiler might give a warning, or you want to change the behavior to that of another type temporarily. That's when you use a cast. Now, back to the malloc() function... There was this other new thing... the size parameter has type size_t... That's odd... we haven't seen size_t in the variable types. Well, this is because size_t is not a primitive type, but a user-defined one. Basically, it's just a primitive type, but given another name, so that it makes more sense when reading the source. When we search through the header files (we will look at header files more closely lateron), we find that size_t is defined in a file called Stddef.h, by the following line: typedef unsigned int size_t; So, size_t is basically nothing but an unsigned int. The typedef directive is followed by the primitive type, and then a list of new names to be used as variables of that type, separated by comma's. For example: typedef unsigned int colour, serial, uint; Now you can declare variables like this: colour red, green, blue; serial nr1, nr2, nr3; uint a, b, c; An interesting application for these typedefs is portability. You could write a program that uses only user-defined types, and to port it from one system to another, all you would have to do is adapting the type-definitions to the new architecture. For example, you need a 32 bit signed integer type, and on architecture X, you would need a long int, and on architecture Y, you would need an __int32. You would choose a usertype to represent the 32 bit signed integer, let's take sint32 in this example. Then all you have to do to make it work correctly on architecture X, is this: typedef long int sint32; And to make it work on architecture Y, you would use: typedef __int32 sint32; Then there's also the compound datatype in C, the data structure. Structures contain a set of variables, and can be very useful for adding logic and structure to your program. Namely, when you deal with real-world entities for example, and they have certain attributes, you can group them together into 1 compound datatype. Let's say you want to store data about cars. For example, you want to store model, year and colour of a car. First you use the 'struct' keyword. Then you give your type a name. And then you group some types together, and give them names, just like you would do with separate variables. Put them in between {} brackets. Looking like this: struct Car { unsigned int model; unsigned short year; unsigned int colour; }; Then when you want to define a variable of this type, you have to mention that it is a structure, by using the struct keyword. Looking like this: struct Car myCar; You can also initialize the struct, which would look like this: struct Car myCar = { 911, 1989, 500 }; You could make a type of 'struct car', by using a typedef. This would save you from typing 'struct' before each new variable definition. So you would define a type like this: typedef struct Car sCar; And then define your variable like: sCar myCar = { 911, 1989, 500 }; The most convenient way is to combine the struct definition with the typedef. You don't have to give the struct a name then, since you won't be needing the name of the actual struct, but only the name of the newly defined type. Only if you would define a variable of the same type inside the struct, since then the new name is not known until the typedef, which takes place after the struct definition. The entire line would look like: typedef struct { unsigned int model; unsigned short year; unsigned int colour; } Car; If you would reference a struct inside a struct, you would do this: typedef struct tagSelf ( struct tagSelf next; } Self; You use a temporary name, or 'tag', to be able to have the struct reference to itself. It is also common to write each member of the struct on a new line, to increase readability: typedef stuct { char *processor; unsigned int memory; unsigned int diskspace; } Computer, *pComputer; Note also that we defined 2 names here, Computer and *pComputer. *pComputer is a dereferenced pointer, hence the *. This automatically leads pComputer to be a pointer to a Computer struct (the 'p' stands for pointer. For improved readability, sometimes variable names are prefixed with abbreviations of their type, like this 'p' for pointer. This is called Hungarian notation. The inventor was a Microsoft programmer from Hungary, by the name of Charles Simonyi. The code looked like some weird foreign language at first sight, and since Simonyi was Hungarian by birth, they decided to call this convention Hungarian. To this day, all Microsoft code uses this notation. It can be very convenient.) So basically we combined this line: typedef (Computer *) pComputer; together with the definition of the Computer structure itself. To access members in a struct, we use the dot (.) operator. The syntax is: struct.member This will resolve to a 'normal' variable, which can be used in expressions just as usual. An example: Computer myComputer; myComputer.processor = "MC68000"; myComputer.memory = 1048576; myComputer.diskspace = 30234234; When you would have a pointer to a struct, you could do this: pComputer myComputer; (*myComputer).processor = "MC7400"; But, there is a special arrow (->) operator for pointers to structs, which is the preferred method: pComputer myComputer; myComputer->memory = 655360; It is also possible to create arrays of structs, and even initialize them. For example: Computer myComputers[2] = { { "MC68000", 1048576, 30234234 }, { "Pentium", (48*1048576), (3072*1048576) } }; The sizeof operator also works on user-defined types and structures, as I said earlier. So sizeof(Computer) will return the combined size of all the members of the Computer struct: sizeof(char *) sizeof(unsigned int) sizeof(unsigned int) + ---------------------- sizeof(Computer) In other words: 4 4 4 + ---- 12 sizeof(myComputers) would return the total size of the array, which will be 2 * 12 = 24 bytes, in our example. There is another type, very similar to the struct. That's the union. A union is used exactly like a struct, but with one difference: instead of having all members, you can pick one member, and use that. An example should clarify it: union Number { int i; double d; }; union Number myNumber; Now if we want to use this variable, we can use either the int: myNumber.i = 245; or the double: myNumber.d = 26.32345; You can also use the typedef-combinations like we saw earlier with the struct: typedef union { int i; double f; } Number; Number myNumber; The union will always take up as much memory as necessary for the largest member. sizeof() will also return that size. sizeof() will return the size of the type of that member. That about covers the user-defined types, now back to our malloc() call. We now know that we can simply specify the number of bytes as an argument of the malloc() function, and we get a void * back. So, we can have it cast implicitly to a pointer, and we have a pointer of the type we want, to a piece of memory of the size we want. This can be useful to dynamically allocate variables, and keep the memory usage under control. We allocate an array when we need it, and deallocate it again, when we're done with it, and save the memory for other uses and applications. For example, to allocate 1 car structure dynamically, we can do this: Car *myCar; myCar = malloc(sizeof(Car)); We can also allocate an array dynamically... simply by multiplying the size of 1 element by the number of elements we want. For example, an array of 25 ints: int *myArray; myArray = malloc(25*sizeof(int)); Now, to use this array, we can simply apply the subscript operator to the pointer, just like with the statically allocated arrays: myArray[20] = -15; To deallocate the memory again, we can use the free(void *) function. As you can see, it takes a pointer as its argument. Since it's a void *, any pointer will be implicitly cast, so we can just feed it any type of pointer directly. To deallocate our int-array, we simply type: free(myArray); and our memory is regained. Our first program ----------------- Well, we now covered just about everything that C can do. Now, let's look at how we create programs with the language constructs we've seen. How to group it together to a sourcefile, and how to create a binary executable from it. A sourcefile is a simple ASCII file, and can be written with any texteditor you like. There are 2 types of sourcefiles in C: - normal sourcecode, text files usually with the .c extension. - headers, text files usually with the .h extension. There is no physical difference between .c and .h files. Both can contain any form of C statements, directives and code. It's more of a habit of the C programmers to make the distinction, etiquette rather than syntax. Headers are a special kind of sourcefiles, which usually come with code libraries. They contain the necessary function prototypes, type definitions, constants and macros for use with the library. They don't normally contain code. When you use code from a library, you include its header file in the source: #include "header.h" The compiler will search the current directory for header.h first, and if it's not found there, it will continue to search the path specified by the INCLUDE environment variable. There is also this form: #include This will not search the current directory, but will start with the INCLUDE path immediately. To speed up compilation, use the <> form whenever possible, to prevent the compiler from searching too much. Right, now for our first real program, the infamous "Hello world" example... We use the puts() function to print a null-terminated string to the console output stream. Your API reference will tell you which header and which library to use. On *nix systems, you can type "man puts", and on most other systems, there will be some help function implemented into the IDE, where you can search for "puts". You will find that we need the stdio.h ('standard I/O')header file, so we write: #include This will basically paste the code from stdio.h into your current source file at the place of the #include statement. A C program starts execution from the main() function, which is defined to return an int, which will be the exit-code to the OS. There are 2 versions: - no arguments: int main(void) - Commandline passed from the OS: int main(int argc, char *argv[]) argc is the argument-count, the number of commandline parameters passed to the program. argv is an array of null-terminated strings, of size argc. Note that argv[0] is the program name itself, so if for example you had a commandline like this: myprogram one two three Then you would get argc = 4, and: argv[0] = "myprogram" argv[1] = "one" argv[2] = "two" argv[3] = "three" For our "Hello world" example, we could use the main(void) version, since we do not require any commandline parameters to be passed to this program. (Note: The main() function is called by a piece of code known as the 'stub'. The stub contains the raw entrypoint of the executable, and sets up the environment for running a C program (such as parsing the commandline, setting up standard input and output streams, and the OS environment variables). It then calls your main() function. When main() returns, the stub will clean up the environment again, and pass the return value of main() back to the OS and exit.) Actually, I already gave the code for the program as an example for the functions. The entire program would look like this: #include int main(void) { /* Print some text to the screen, using a library function */ puts("Hello world!"); /* Exit function with return value */ return 0; } Save it to a text file called "hello.c", and after discussing the compiling process, we will make an executable out of it. Now that we have our first source, it is time to compile it to a running program. This is a bit problematic to explain, since each compiler has its own commandline options and special behavior. I will just tell a bit about the process of compiling and linking, so you will know what to look for in the documentation of your C compiler. There are several levels in the transition from C source code to a binary executable. Technically, we have these levels: 4. C source code 3. Instruction listing (assembly source) 2. Linkable code object 1. Executable image A compiler will take us from level 4 to 3. Then an assembler will take us from 3 to 2, and finally, a linker will take us from 2 to the final level of 1. But in practice, we find that compilers will go from 4 to 2 immediately these days, and use a bytecode format internally, rather than a true assembly instruction listing. They often still have the option to output an assembly listing tho, should the programmer so require. And compilers automatically invoke the linker these days, so that we can go from C source code to an executable in one go. Compiling and linking will always be separate steps tho, since you may have some precompiled code which you want to link to the newly compiled code, such as imported library functions, or you might want to create some precompiled libraries, so you will not link them to a binary executable at all. So, I can't really help you with the commandline options for your compiler. But now that you know what the compiling process is about, you should be able to figure it out yourself. I can give you 2 examples tho, for both Microsoft's compiler and the GNU C Compiler (gcc). Microsoft Compiler: CL hello.c This will compile hello.c into hello.obj, and then link in the necessary standard C libraries. CL.EXE invokes LINK.EXE itself. It has libc.lib as a standard library, so you do not need to specify any libraries for standard C functions. You only need to specify libraries for non-C functions, like Windows API or third party libraries. CL will give the executable the same name as the source file which contains the main() function, so in our case, we will get hello.exe as a filename. In this case, this line is all you need to have hello.exe generated. if you need other libraries or objects, you can simply add them to the commandline: CL hello.c blah.obj foo.lib So, try to execute your first program now! GNU C compiler: gcc -o hello hello.c This compiler also has a C library as standard, so no need to specify it. The -o switch specifies the name of the executable. If no name is given, then it defaults to a.out instead. Some example programs --------------------- Here is a small program that I think will be useful. It will show you how to do some simple interaction with the user. It will also allow you to play with some bitwise operations, and make you get used to playing with hexadecimal values. This might come in handy lateron, as hexadecimal numbers and bitwise operations can be quite useful in speeding your code up. We will see puts() here again. The puts() function puts a string onto the standard output stream (stdout), which is normally the screen. And I say 'normally' here, because streams may be redirected to other devices, such as printers for example. We will also see gets(), which can read a line from the standard input stream (stdin), which is normally the keyboard input. It will put this line into a char array. The user must pass a pointer to this array to the gets() function. The getchar() function is similar, but it returns the first character from the stdin stream immediately, and does not require a buffer to be passed. We also get to meet fflush(), which 'flushes' a file stream. The stdin and stdout streams are considered as files aswell in C, so we can use it on these streams aswell. In our case we flush the stdin stream, to remove any superfluous input that we would rather ignore. The last function we get to meet, is printf(). This is a function to print formatted text (hence the 'f') onto stdout. It is quite an interesting function, because you can pass as many arguments as you wish. It has ellipsis (...) in its prototype, to make this possible. I will give an example of how to use them later. The printf takes a format string as its first argument, which will basically define what to do with the next arguments. How to format these, to be more specific. It will use a %, followed by some formatting rules, at the place where you want the respective argument to be printed. In our case we will see %s, which will print the argument as a string. So for example, you could do this: char name[] = "Jopie"; printf("Hello %s!", name); This will result in: Hello Jopie! We will also see %d, which will print a signed decimal number. For unsigned numbers, there is also %u. And lastly we will see %08X... This formatting is a bit more advanced. The 0 indicates that we want 0s prefixed to our number. The 8 indicates that we want to have 8 digits in total. And the X tells printf() to print the argument as a hexadecimal number, with uppercase alphas. There is also %x for lowercase alphas, but I prefer uppercase. There are a lot more options with printf(), but it would be a bit much to cover them all here. I suggest you use your reference for that instead. I will leave you with one last example, then we go to the actual program: unsigned int age = 16; char name[] = "Jopie"; printf("Congratulations %s, today is your %uth birthday!"); Result of this would be: Congratulations Jopie, today is your 16th birthday! Note also that puts() always does a newline after printing the string, while printf() does not. So in some cases, you might want to use printf() instead of puts() to avoid that newline. Be careful though, if your string contains %, printf() will see this as formatting. A workaround for this is using %c, which prints 1 character: printf("I can use %c in my strings with printf()", '%'); Anyway, here is the program: #include char name[32]; int x = 1, y = 1; /* Print menu, allow the user to choose an option, and process that option. Returns 0 if user chose to quit, else 1 */ int doMenu() { char buffer[32]; printf( "\nHello %s, here is the menu for today:\n\n" "1) Enter X\n" "2) Enter Y\n" "3) X + Y\n" "4) X - Y\n" "5) X * Y\n" "6) X / Y\n" "7) X & Y\n" "8) X | Y\n" "9) X ^ Y\n" "A) ~X\n" "B) ~Y\n" "C) Quit\n\n" "Current X = %d\n" "Current Y = %d\n\n" "What is your choice? ", name, x, y); /* remove additional input from stdin stream */ fflush(stdin); /* Get a character from stdin stream and process the command */ switch(getchar()) { case '1': /* remove additional input from stdin stream */ fflush(stdin); puts("Please enter new value for X."); x = atoi(gets(buffer)); break; case '2': /* remove additional input from stdin stream */ fflush(stdin); puts("Please enter new value for Y."); y = atoi(gets(buffer)); break; case '3': printf("%d + %d = %d\n", x, y, x + y); break; case '4': printf("%d - %d = %d\n", x, y, x - y); break; case '5': printf("%d * %d = %d\n", x, y, x * y); break; case '6': if(y == 0) puts("Y = 0. Division is not defined."); else printf("%d / %d = %d\n", x, y, x / y); break; /* The following are bitwise operations, let's also print out the hexadecimal representations for clarity */ case '7': printf("%d & %d = %d || " "%08X & %08X = %08X\n", x, y, x & y, x, y, x & y); break; case '8': printf("%d | %d = %d || " "%08X | %08X = %08X\n", x, y, x | y, x, y, x | y); break; case '9': printf("%d ^ %d = %d || " "%08X ^ %08X = %08X\n", x, y, x ^ y, x, y, x ^ y); break; case 'a': case 'A': printf("~%d = %d || " "~%08X = %08X\n", x, ~x, x, ~x); break; case 'b': case 'B': printf("~%d = %d || " "~%08X = %08X\n", y, ~y, y, ~y); break; case 'c': case 'C': /* The user wants to leave, we return 0, to break out of the while() loop in main() */ puts("Thanks, and have a nice day."); return 0; default: /* Since we have arrived here, apparently none of the valid choices were reached */ puts("Invalid choice."); break; } printf("Press enter to continue."); getchar(); return 1; } int main(void) { /* Input the username */ puts("What is your name?"); gets(name); /* Keep returning to menu until user chooses to quit */ while(doMenu()); return 0; } This might be a good time to greet some people. First of all, ofcourse my Diamond Crew people, ewald and Maybird. And ofcourse also all my Phrozen Crew mates, you know who you are :) And the #Win32ASM guys, hutch, llama, Iczelion, nuu, dowap (or whatever :), and the rest. Ofcourse Kalms, and last but not least, the ladies (in no particular order, you know how women are :). Sara`, Tracy, blorght, MoonDawn, Nitallica, CandyII, jessca, embla, Baudie, flipgrrrl, and yes, even taylor^ :) X-Calibre