

Well, I've postponed it long enough. It's time to look at the
production for a number.
Normally, this would be the grammar that the lexer would follow, not
the parser. However, I'm just going to drop in these rules and include them as part of the
overall parser we are working on. In fact, if you've been using the earlier programs, you
are already using the grammar we'll talk about here.
Numbers
I think most of us are now used to numbers, including integers,
decimal values, and scientific values. It would be nice to have a set of productions that
properly relate to them.
Here's my version:
digits := digit digits | <null>
digit := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
mantissa := . digit digits | digit digits . digits | digit digits
scaleid := e | E
scale := scaleid sign digit digits | <null>
sign := + | - | <null>
number := sign mantissa scale
If you try this out on various legal and illegal
numbers, I think you'll find it works well. Try to trip it up!
Scanning Text
Let's focus now on the detailed process of analyzing some text. I've
said, at the outset, that this parser will only process single strings. This simplifies
the management of scanning the text, somewhat, and allows us to focus on the important
details without getting overly hung up on scanning details. I'll leave more sophisticated
text scanning for you.
One thing that should be clear now is that we need to keep track of
where we are within the string. This means some kind of "position" needs to be
tracked, along with the string itself, to tell us where the next character is. This
position needs to be updated as the scanning proceeds. We may also need to keep track of
the various values as parsing proceeds, but I'm leaving that as a later concern. For now,
let's focus on the scanning process and simply determining whether or not the mathematical
expression is valid.
I'm using a variable called eqpos which is the position within the
string called eq that we are currently processing. It always starts out with the value of
1 so that it refers to the first character in the string. As the string is processed by
the various productions, eqpos is advanced. If the entire equation matches our production
rules, then eqpos will point past the end of the string when everything is done and the
final status will be TRUE (-1) from Expression. If some failure occurs, then eqpos will
point to the first unprocessed character and that fact can be used to illustrate where the
error in the equation is located.
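To make the error-location idea concrete, here is a minimal sketch of how the final value of eqpos could be used to point at the trouble spot. The string and the position are hard-coded assumptions for illustration only; in a real run, eqpos would be whatever the parser left behind.

```basic
' Sketch only: eq and eqpos are hard-coded here for illustration.
DIM eq AS STRING, eqpos AS INTEGER
LET eq = "12 + * 3"
LET eqpos = 6          ' assumed position of the first unprocessed character
PRINT eq
' Indent by eqpos - 1 columns so the caret sits under character eqpos.
PRINT SPACE$(eqpos - 1); "^"
```

The caret lands under the sixth character, the "*", which is where this hypothetical scan stopped.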
I've neglected much talk about spaces, tabs, and other forms of
so-called 'white space.' But it's necessary, in practice, to make some decisions about it.
I've already been including, in prior code examples, the use of SkipSpaces. But I haven't
actually shown the code for it, just yet. For my purposes, the only valid white space is a
space character. Anything else, such as a tab, will be rejected as invalid input. However,
it would be easy to add that support in the routine. I'll leave that for you, if you are
interested. Also, while it's fine for spaces to be present elsewhere, I think it should be
forbidden to include spaces in the middle of a number -- numbers should not be broken up
with spaces, in other words.
Okay, that said, let's look at SkipSpaces:
SUB SkipSpaces (eqpos AS INTEGER, eq AS STRING)
    DO WHILE eqpos <= LEN(eq)
        IF MID$(eq, eqpos, 1) <> " " THEN
            EXIT DO
        END IF
        LET eqpos = eqpos + 1
    LOOP
END SUB
This routine advances eqpos until it points at a
character in eq that is NOT a space. That's about all.
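If you do want tabs treated as white space, one possible variant is sketched below. The name SkipWhite is my own invention for this example and is not used anywhere else in these programs; the only real change is that the test is made against a small list of characters instead of a single space.

```basic
SUB SkipWhite (eqpos AS INTEGER, eq AS STRING)
    ' Hypothetical variant of SkipSpaces that also skips tabs.
    DIM white AS STRING
    LET white = " " + CHR$(9)      ' space and tab; add more if you like
    DO WHILE eqpos <= LEN(eq)
        ' Stop at the first character that is not in the white list.
        IF INSTR(white, MID$(eq, eqpos, 1)) = 0 THEN
            EXIT DO
        END IF
        LET eqpos = eqpos + 1
    LOOP
END SUB
```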
I've also been using a routine called Match. Let's look
at it:
FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)
    ' status starts out as 0 (FALSE), so a non-matching character
    ' falls through with a failure status and eqpos unchanged.
    DIM status AS INTEGER
    IF eqpos <= LEN(eq) THEN
        IF INSTR(charlist, MID$(eq, eqpos, 1)) <> 0 THEN
            LET eqpos = eqpos + 1
            LET status = -1
        END IF
    ELSE
        LET status = 0
    END IF
    LET Match = status
END FUNCTION
That routine does exactly what I said and updates
'eqpos' if the character is matched. The return status is (-1), if successful, and (0), if
not. You can provide more than one character in the string called charlist that is passed
to Match. In this way, you can check for several different characters matching the next
character in eq.
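As a small usage sketch, assuming the Match routine above is in the same program, the lines below try to match a sign character twice against a hard-coded test string. The string "-7" is just an example I picked.

```basic
DECLARE FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)
DIM eq AS STRING, eqpos AS INTEGER
LET eq = "-7"
LET eqpos = 1
' First call: the "-" is in the list, so it is consumed and eqpos advances.
IF Match("+-", eqpos, eq) THEN
    PRINT "sign matched, eqpos is now"; eqpos
END IF
' Second call: "7" is not in the list, so eqpos is left where it was.
IF NOT Match("+-", eqpos, eq) THEN
    PRINT "no sign at position"; eqpos
END IF
```

Note that a failed match leaves eqpos untouched, which is what lets a caller try one alternative and then move on to the next.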
Well, those are the two helper routines mentioned earlier.
Number Coding?
Okay, it's time to get back to writing the code for scanning
numbers. Here's everything it takes:
SUB Digits (eqpos AS INTEGER, eq AS STRING)
    ' Consume zero or more digits.
    DO WHILE Digit(eqpos, eq)
    LOOP
END SUB
FUNCTION Digit% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    LET status = Match("0123456789", eqpos, eq)
    LET Digit = status
END FUNCTION
FUNCTION Mantissa% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    IF Match(".", eqpos, eq) THEN
        ' Leading decimal point: at least one digit must follow.
        LET status = Digit(eqpos, eq)
        IF status THEN
            Digits eqpos, eq
        END IF
    ELSEIF Digit(eqpos, eq) THEN
        Digits eqpos, eq
        ' The decimal point is optional here. Match is called only to
        ' consume it if present; -1 OR anything is always -1 (TRUE).
        LET status = -1 OR Match(".", eqpos, eq)
        Digits eqpos, eq
    ELSE
        LET status = 0
    END IF
    LET Mantissa = status
END FUNCTION
FUNCTION ScaleID% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    LET status = Match("eE", eqpos, eq)
    LET ScaleID = status
END FUNCTION
SUB Scale (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER, savepos AS INTEGER
    LET savepos = eqpos
    IF ScaleID(eqpos, eq) THEN
        Sign eqpos, eq
        LET status = Digit(eqpos, eq)
        IF status THEN
            Digits eqpos, eq
        ELSE
            ' No digit after the e or E: back up so the scale is empty.
            LET eqpos = savepos
        END IF
    END IF
END SUB
SUB Sign (eqpos AS INTEGER, eq AS STRING)
    ' The sign is optional, so the match result is deliberately ignored.
    DIM status AS INTEGER
    LET status = Match("+-", eqpos, eq)
END SUB
FUNCTION Number% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    Sign eqpos, eq
    LET status = Mantissa(eqpos, eq)
    IF status THEN
        Scale eqpos, eq
    END IF
    LET Number = status
END FUNCTION
The above example code also illustrates a coding style
I've used earlier, but want to call out here. If I'm writing code for a FUNCTION, I place
the value assignment for that function at the very bottom of the routine. In cases like
this, where there is a status value, I create a temporary variable in the routine, assign
the status to it, then at the bottom transfer that value to the function's value. It's not
obvious here, why I do that, because the above routine could be trivially reduced to a
single statement. But I do it for consistency with other routines I'll be writing here
where things aren't so trivial -- if I need to change the name of the FUNCTION, I only
have one place I need to edit and this reduces editing mistakes. Since BASIC will often
just create a variable without telling you that you made a typing error, this helps me. In
any case, this is my style for these examples. (In fact, this detail has helped me in
writing all the earlier examples, where I sometimes needed to just change the name of a
routine.)
As I said, you've already been using these routines in
the code, so far.
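If you'd like to exercise the number routines by themselves, a small test driver along these lines should do. It is only a sketch and assumes Number, with all of the routines it depends on, is present in the same program. The prompt text and the variable name ok are my own choices.

```basic
DECLARE FUNCTION Number% (eqpos AS INTEGER, eq AS STRING)
DIM eq AS STRING, eqpos AS INTEGER, ok AS INTEGER
DO
    LINE INPUT "Number (blank to quit): "; eq
    IF eq = "" THEN EXIT DO
    LET eqpos = 1                  ' always start at the first character
    LET ok = Number(eqpos, eq)
    ' Valid only if Number succeeded AND the whole string was consumed.
    IF ok AND eqpos > LEN(eq) THEN
        PRINT "accepted"
    ELSE
        PRINT "rejected at position"; eqpos
    END IF
LOOP
```

Checking eqpos against LEN(eq) matters here. For an entry such as 1e, Number still returns TRUE because the scale is allowed to be empty, and it is the leftover e that shows the text is not a complete number.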
Last updated: Saturday, January 14, 2006 00:02