

To start this process, we could take this from two directions:
start out trying to understand the detailed anatomy
of a number and then work out from there to expressions involving combinations of them;
or,
start out describing the general form of an
expression and leaving the details of worrying about number anatomy until later.
In some sense, this is a kind of bottom-up versus top-down choice.
Both will work fine in this case. The definition of number formats is so well worked-out
by now so that it integrates smoothly with algebraic expressions that we can be pretty
sure we won't need to modify what a number is after we start working on the expressions.
So we are safe in choosing to work on the numbers, first. But it would also work about as
well to proceed from defining the general anatomy of expressions, first -- similar
reasoning suggests that will succeed, too.
I'm going to start with the anatomy of expressions (direction 2,
above) and eventually wind up talking about the details of what a number looks like, later
on. I'm choosing this approach because most of us already have a fairly concrete idea in
our heads about what constitutes a number and we can safely leave the precise details
until later and cover it as a "necessary evil," then.
Let's first list some possible expressions. Maybe that will help in
getting some initial ideas started.
45
23 + 10
3 * 4
( 6 + 5 ) * 7
2 * -8
6 + ( 4 - (2) ) * -3
(-6)
We are probably certain we'd like to accept addition and
subtraction, and multiplication and division, too. What about raising a value to a power?
I think so, but for now let's defer that detail. In fact, this suggests an idea -- why
don't we defer ALL of the mathematical operations for a moment and see where that takes
us?
Handling Parentheses
If we defer considering the mathematical operations until later, we
have the following general possibilities for an expression:
number
( number )
( ( number ) )
( ... ( number ) ... )
If you aren't familiar with my use of ... above, it's an ellipis and
just means "more of the same." In this case, more pairs of parentheses. It's
short-hand. And rather than using specific numbers here, I've decided to just say number
and leave the details of exactly what that is until later on. We can pack the number
inside as many parenthesis pairs as we want, or none at all. Without the math operations
to worry about, there's not much else -- it's just numbers and parentheses.
How might we write a specification for this, though? Perhaps
something like:
expression := number | ( expression )
Before going on, let's see how what I mean by the way I wrote that.
First, when I use lower-case name above, it's a production and is meant to be taken
symbolically. A number does not mean the literal text, "number," for
example. It means any general number can be inserted at that point. It's conceptual. Also,
the single character, |, means --or--, as in
this OR that. It's helpful short-hand to learn. Finally, the := character pair means IS -- the thing on the left side IS
the thing on the right side. In this case, an expression is declared to be the
same thing as either a number or else an open paren followed by an expression
followed by a close paren.
Here's how you should read it: "An expression is
either a number or else it is an open paren, followed by an expression,
followed by a close paren." This kind of special notation is often called a
"production" in the specialized literature (at least, those I've read.)
Note that this production is recursive. A concept on the
right side uses a concept on the left side that uses that same concept, yet again. An
expression is either a number or else an enclosed expression. But an enclosed expression
can be either a number or an enclosed expression. Etc. In other words, it only ends when
you decide to say that an expression is a number. At that point, the process ends. So it
kind of goes like this:
number first option
( number ) second option, first option
( ( number ) ) second option, second option, first option
( ... ( number ) ... ) etc.
The nice thing about specifying an expression that way is that it
compactly describes what an expression really is at this point. We don't need to list each
possibility -- instead, we can simply refer back, recursively, to an expression and
capture all the possibilities that way.
It should be clear now that arranging it so that what goes inside
the two parens is another expression is what allows nested parens -- otherwise, if what
went inside was just a number, then we couldn't nest the paren pairs. Using that
production, I think you can see how you can construct all of the simplest examples
mentioned, those composed only of a number or a number surrounded by one or more
parenthesis pairs.
Coding?
Let's connect what we've discussed above to the idea of writing code
for it. How might we write a program to analyze the above syntax?
Well, let's say we had two subroutines, called Expression and
Number. When called, the Expression subroutine would first check for an open paren, right
away. If one is present, Expression would then simply call itself to get the value of the
expression inside and then, upon getting that value, it would then insist on seeing a
close paren in order to match it up with the open paren it saw earlier. And if so, it
could return successfully with the value it returned to itself. However, if Expression
didn't see an open paren to begin with, it would instead call the subroutine Number to get
the value of the number and return with the result.
Note that when Expression sees an open paren to begin with, it calls
itself. Recursive calls like this are common in "recursive descent" parsers. In
fact, it's where the name comes from.
We haven't yet defined exactly what a number should look like so we
really can't code up Number, yet. But we can get a look at what Expression might look like
and this concrete example might help a little:
FUNCTION Expression% (eqpos AS INTEGER, eq AS STRING)
DIM status AS INTEGER
IF Match("(", eqpos, eq) THEN
LET status = Expression(eqpos, eq)
IF status THEN
LET status = Match(")", eqpos, eq)
END IF
ELSE
LET status = Number(eqpos, eq)
END IF
LET Expression = status
END FUNCTION
There are some important details to examine, here.
First, the routine returns a value. This value is 0
(FALSE) if the expression wasn't matched properly and returns -1 (TRUE) if the expression
was matched. The routine first checks for an open paren. If this is matched, then it calls
itself recursively to see if an expression lies within a parenthesis pair. It then checks,
if an expression was found to be valid, for the closing paren. Third, if an open paren
isn't found, then it must be a number and that function is called to verify this fact.
Either way, the resulting status is returned.
Second, the routine is passed two variables, eqpos
and eq. eq is the equation string, in its entirety, and eqpos is the current
position in the equation string. Functions like Match and Number may make adjustments to
eqpos, depending on their success. They won't change eq, though.
Third, I've introduced a "helper" function
called Match, which simply checks to see if the next character in the string happens to be
in the provided list. The provided list here is always just one character, so only that
character can be matched. If Match does detect a match, it returns TRUE (-1) and adjusts
the eqpos variable to reflect the fact that a match occured.
So, if an open paren is matched, eqpos is updated by
Match so that the next routine seeing eqpos will not "see" the open paren. The
routine then asks itself to examine the string from that point (after the open paren) on
to see if another expression can be validly matched. If so, eqpos will be updated
automatically so that the matched expression is now skipped and then the routine can check
for the final close paren, which is needed to match up with the earlier open paren.
However, if an open paren isn't seen in the first place, then the routine insists on
finding a number.
One thing to notice, if you haven't already, is that
this method won't handle spaces. The expression, ((5)), is
valid, while, ((5 )), is not. The inserted space is enough to
cause problems. To fix that, we need another "helper" function:
FUNCTION Expression% (eqpos AS INTEGER, eq AS STRING)
DIM status AS INTEGER
SkipSpaces eqpos, eq
IF Match("(", eqpos, eq) THEN
LET status = Expression(eqpos, eq)
IF status THEN
SkipSpaces eqpos, eq
LET status = Match(")", eqpos, eq)
END IF
ELSE
LET status = Number(eqpos, eq)
END IF
LET Expression = status
END FUNCTION
It we skip spaces before a hard character. like the
open and close parens, then we've got it covered. Now spaces can take place anywhere
except for inside a number, itself.
To test all this, you can try the following:
- EQ_01.ZIP or EQ_01.BAS
- An example program that will only accept the an expression of the
type mentioned so far. In this example, the code also includes routines to handle
examining numbers, but you can ignore that necessity for now. Just note the code for
Expression and the support routines called Match and SkipSpaces, if you are interested.
Last updated: Friday, January 13, 2006 23:50
|