Domain Specific Embedded Languages Chapter Nine
Chapter Overview
HLA's compile-time language was designed with one purpose in mind: to give the HLA user the ability
to change the syntax of the language in a user-defined manner. The compile-time language is actually so
powerful that it lets you implement the syntax of other languages (not just an assembly language) within an
HLA source file. This chapter discusses how to take this feature to an extreme and implement your own
"mini-languages" within the HLA language.
9.2
Volume Five
ments in this manner, their example should provide a template for implementing other types of control structures in HLA.
The following sections show how to implement the FOREVER..ENDFOR, WHILE..ENDWHILE, and
IF..ELSEIF..ELSE..ENDIF statements. This text leaves the REPEAT..UNTIL and BEGIN..EXIT..EXITIF..END statements as exercises. The remaining high level language control structures (e.g.,
TRY..ENDTRY) are a little too complex to present at this point.
Because words like "if" and "while" are reserved by HLA, the following examples will use macro identifiers like "_if" and "_while". This will let us create recognizable statements using standard HLA identifiers
(i.e., no conflicts with reserved words).
9.2.1.1
The FOREVER loop is probably the easiest control structure to implement. After all, the basic FOREVER loop simply consists of a label and a JMP instruction. So the first pass at implementing
_FOREVER.._ENDFOR might look like the following:
#macro _forever: topOfLoop;
topOfLoop:
#terminator _endfor;
jmp topOfLoop;
#endmacro;
Unfortunately, there is a big problem with this simple implementation: you'll probably want the ability
to exit the loop via break and breakif statements and you might want the equivalent of a continue and continueif statement as well. If you attempt to use the standard BREAK, BREAKIF, CONTINUE, and CONTINUEIF statements inside this _forever loop implementation, you'll quickly discover that they do not work.
Those statements are valid only inside an HLA loop and the _forever macro above is not an HLA loop. Of
course, we could easily solve this problem by defining _FOREVER thusly:
#macro _forever;
forever
#terminator _endfor;
endfor;
#endmacro;
Now you can use BREAK, BREAKIF, CONTINUE, and CONTINUEIF inside the _forever.._endfor statement. However, this solution is ridiculous. The purpose of this section is to show you how you could create
this statement were it not present in the HLA language. Simply renaming FOREVER to _forever is not an
interesting solution.
Probably the best way to implement these additional statements is via #KEYWORD macros within the
_forever macro. Not only is this easy to do, but it has the added benefit of not allowing the use of these statements outside a _forever loop.
Implementing a _continue statement is very easy. Continue must transfer control to the first statement
at the top of the loop. Therefore, the _continue #KEYWORD macro will simply expand to a single JMP
instruction that transfers control to the topOfLoop label. The complete implementation is the following:
#keyword _continue;
    jmp topOfLoop;
Implementing _continueif is a little bit more difficult because this statement must evaluate a boolean
expression and decide whether it must jump to the topOfLoop label. Fortunately, the HLA JT (jump if true)
pseudo-instruction makes this a relatively trivial task. The JT pseudo-instruction expects a boolean expres-
You will implement the _break and _breakif #KEYWORD macros in a similar fashion. The only difference is that you must add a new label just beyond the JMP in the _endfor macro and the break statements
should jump to this local label. The following program provides a complete implementation of the
_forever.._endfor loop as well as a sample test program for the _forever loop.
/************************************************/
/*                                              */
/* foreverMac.hla                               */
/*                                              */
/* This program demonstrates how to use HLA's   */
/* "context-free" macros, along with the JT     */
/* "medium-level" instruction to create         */
/* the FOREVER..ENDFOR, BREAK, BREAKIF,         */
/* CONTINUE, and CONTINUEIF control statements. */
/*                                              */
/************************************************/

program foreverDemo;
#include( "stdlib.hhf" )

macro _forever:foreverLbl, foreverbrk;

    // Top of the loop. This label is also the target
    // of the _continue and _continueif statements.

    foreverLbl:

  keyword _continue;
    jmp foreverLbl;

  keyword _continueif( cifExpr );
    jt( cifExpr ) foreverLbl;

  // _break and _breakif transfer control to the
  // foreverbrk label just beyond the bottom of the loop.

  keyword _break;
    jmp foreverbrk;

  keyword _breakif( bifExpr );
    jt( bifExpr ) foreverbrk;

  // Bottom of the loop: jump back to the top, then emit
  // the label that the break statements target.

  terminator _endfor;
    jmp foreverLbl;
    foreverbrk:

endmacro;

begin foreverDemo;

    // A simple main program that demonstrates the use of the
    // statements above.

    mov( 0, ebx );
    _forever

        stdout.put( "Top of loop, ebx = ", (type uns32 ebx), nl );
        inc( ebx );

        // On first iteration, skip all further statements.

        _continueif( ebx = 1 );

        // On fourth iteration, stop.

        _breakif( ebx = 4 );

        _continue;
        _break;

    _endfor;

end foreverDemo;
Program 9.1
9.2.1.2
Once the FOREVER..ENDFOR loop is behind us, implementing other control structures like the
WHILE..ENDWHILE loop is fairly easy. Indeed, the only notable thing about implementing the
_while.._endwhile macros is that the code should implement this control structure as a REPEAT..UNTIL
statement for efficiency reasons. The implementation appearing in this section takes a rather lazy approach
to implementing the DO reserved word. The following code uses a #KEYWORD macro to implement a
"_do" clause, but it does not enforce the (proper) use of this keyword. Instead, the code simply ignores the
_do clause wherever it appears between the _while and _endwhile. Perhaps it would have been better to
check for the presence of this statement (not too difficult to do) and verify that it immediately follows the
_while clause and associated expression (somewhat difficult to do), but this just seems like a lot of work to
check for the presence of an irrelevant keyword. So this implementation simply ignores the _do. The complete implementation appears in Program 9.2:
/************************************************/
/*                                              */
/* whileMacs.hla                                */
/*                                              */
/* This program demonstrates how to use HLA's   */
/* "context-free" macros, along with the JT and */
/* JF "medium-level" instructions to create     */
/* the basic WHILE statement.                   */
/*                                              */
/************************************************/

program whileDemo;
#include( "stdlib.hhf" )

macro _while( whlexpr ):repeatwhl, whltest, brkwhl;

    // Jump to the test at the bottom of the loop so the
    // body executes only while the expression is true.

    jmp whltest;
    repeatwhl:

  // The _do keyword is accepted and ignored.

  keyword _do;

  // Bottom of the loop: test the expression and repeat
  // the loop body while it is true.

  terminator _endwhile;

    whltest:
    jt( whlexpr ) repeatwhl;
    brkwhl:

endmacro;

begin whileDemo;

    mov( 0, eax );
    _while( eax < 10 ) _do

        stdout.put( "eax in loop = ", eax, " ebx=" );
        inc( eax );
        mov( 0, ebx );
Program 9.2
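As a rough illustration of why the REPEAT..UNTIL form is cheaper, the sketch below writes out by hand the kind of instruction sequence one would expect a statement such as _while( eax < 10 ) _do ... _endwhile to produce. The program name whileShape and the label spellings are assumptions made for this example; only the JT instruction and the bottom-of-loop test come from the discussion above.

```hla
program whileShape;
#include( "stdlib.hhf" )

begin whileShape;

    mov( 0, eax );

    // Enter the loop at the test, so the body never runs
    // when the expression is false on entry.

    jmp whlTest;

    repeatWhl:                      // top of the loop body

        stdout.put( "eax in loop = ", eax, nl );
        inc( eax );

    whlTest:                        // emitted by the terminator

        jt( eax < 10 ) repeatWhl;   // repeat while the test is true

    brkWhl:                         // target for any break statements

        stdout.put( "loop finished" nl );

end whileShape;
```

Each pass through this loop executes a single conditional jump. A loop that tests at the top needs a JF at the top plus an unconditional JMP at the bottom on every pass, which is the extra cost the bottom-tested form avoids.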
9.2.1.3
The IF Statement
Simulating the HLA IF..THEN..ELSEIF..ELSE..ENDIF statement using macros is a little bit more
involved than the simulation of FOREVER or WHILE. The semantics of the ELSEIF and ELSE clauses
complicate the code generation and require careful thought. While it is easy to write #KEYWORD macros
for _elseif and _else, ensuring that these statements generate correct (and efficient) code is another matter
altogether.
The basic _if.._endif statement, without the _elseif and _else clauses, is very easy to implement (even
easier than the _while.._endwhile loop of the previous section). The complete implementation is
#macro _if( ifExpr ): onFalse;
jf( ifExpr ) onFalse;
#keyword _then;
#terminator _endif;
onFalse:
#endmacro;
This macro generates code that tests the boolean expression you supply as a macro parameter. If the
expression evaluates false, the code this macro emits immediately jumps to the point just beyond the _endif
terminating macro. So this is a simple and elegant implementation of the IF..ENDIF statement, assuming
you don't need an ELSE or ELSEIF clause.
Adding an ELSE clause to this statement introduces some difficulties. First of all, we need some way to
emit the target label of the JF pseudo-instruction in the _else section if it is present and we need to emit this
label in the terminator section if the _else section is not present.
A related problem is that the code after the _if clause must end with a JMP instruction that skips the
_else section if it is present. This JMP must transfer control to the same location as the current onFalse
label.
Another problem that occurs when we use #KEYWORD macros to implement the _else clause, is that
we need some mechanism in place to ensure that at most one invocation of the _else macro appears in a
given _if.._endif sequence.
We can easily solve these problems by introducing a compile-time variable (i.e., VAL object) into the
macro. We will use this variable to indicate whether we've seen an _else section. This variable will tell us if
we have more than one _else clause (which is an error) and it will tell us if we need to emit the onFalse label
in the _endif macro. A reasonable implementation might be the following:
#macro _if( ifExpr ): onFalse, ifDone, hasElse;

    ?hasElse := false;

    // Skip the _if section when the expression is false:

    jf( ifExpr ) onFalse;

  #keyword _then;

  #keyword _else;

    // Check to see if this _if statement already has an _else clause:

    #if( hasElse )
        #error( "Only one _else clause is legal in an _if statement" )
    #endif
    ?hasElse := true;

    // Since we've just encountered the _else clause, we've just finished
    // processing the statements in the _if section. The first thing we
    // need to do is emit a JMP instruction that will skip around the
    // _else statements (so the _if section doesn't fall in to the
    // _else code).

    jmp ifDone;

    // Okay, emit the onFalse label here so a false expression will transfer
    // control to the _else statements:

    onFalse:

  #terminator _endif;

    // If there was an _else clause, the onFalse label has already
    // been emitted, so emit only the ifDone label. Otherwise, emit
    // the onFalse label here.

    #if( hasElse )
        ifDone:
    #else
        onFalse:
    #endif

#endmacro;
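To make the role of the two labels concrete, the following sketch writes out by hand the instruction sequence one would expect an _if.._else.._endif statement to produce. The program name ifElseShape and the test on EAX are assumptions chosen for illustration; the labels onFalse and ifDone play the roles described in the text.

```hla
program ifElseShape;
#include( "stdlib.hhf" )

begin ifElseShape;

    mov( 0, eax );

    jf( eax = 0 ) onFalse;      // emitted for _if( eax = 0 )

        stdout.put( "eax is zero" nl );
        jmp ifDone;             // emitted by _else: skip the _else section

    onFalse:                    // emitted by _else: target of the JF

        stdout.put( "eax is not zero" nl );

    ifDone:                     // emitted by _endif when an _else was seen

        stdout.put( "after the if statement" nl );

end ifElseShape;
```

Without an _else clause neither the JMP nor the ifDone label is needed, and the terminator emits onFalse instead.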
If the base parameter is a string value holding a valid HLA identifier and the number parameter is an
integer numeric operand, then this macro will emit a valid HLA identifier that consists of the base string followed by a string representing the numeric constant. For example, genLabel( "Hello", 52) emits the label
Hello52. Since we can easily create an uns32 VAL object inside our _if macro and increment this each time
we need a unique label, the only problem is to generate a unique base string on each invocation of the _if
macro. Fortunately, HLA already does this for us.
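A macro with the behavior just described might be written along the following lines. This is only a sketch: it assumes that the string compile-time function converts the integer to its decimal text and that @text turns the concatenated string back into source text, as both are used elsewhere in this chapter. The program name genLabelDemo is hypothetical.

```hla
program genLabelDemo;
#include( "stdlib.hhf" )

// genLabel: emit an identifier made of the string "base"
// followed by the digits of the integer "number".

#macro genLabel( base, number );
    @text( base + string( number ))
#endmacro;

begin genLabelDemo;

    // The macro can appear wherever a label name is legal,
    // both as a jump target and as a label definition.

    jmp genLabel( "Hello", 52 );

    stdout.put( "this statement is skipped" nl );

    genLabel( "Hello", 52 ):

    stdout.put( "reached label Hello52" nl );

end genLabelDemo;
```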
Remember, HLA converts all local macro symbols to a unique identifier of the form "_xxxx_" where
xxxx represents some four-digit hexadecimal value. Since local symbols are really nothing more than text
constants initialized with these unique identifier strings, it's very easy to obtain a unique string in a macro
invocation: just declare a local symbol (or use an existing local symbol) and apply the @STRING: operator
to it to extract the unique name as a string. The following example demonstrates how to do this:
#macro uniqueIDs: counter, base;
?counter := 0;
?base := @string:base;
.
.
.
#endmacro;
Once we have the capability to generate a sequence of unique labels throughout a macro, implementing
the _elseif clause simply becomes the task of emitting the last referenced label at the beginning of each
_elseif (or _else) clause and jumping if false to the next unique label in the series. Program 9.3 implements
the _if.._then.._elseif.._else.._endif statement using exactly this technique.
/*************************************************/
/*                                               */
/* IFmacs.hla                                    */
/*                                               */
/* This program demonstrates how to use HLA's    */
/* "context-free" macros, along with the JT and  */
/* JF "medium-level" instructions to create      */
/* an IF statement.                              */
/*                                               */
/*************************************************/

program IFDemo;
#include( "stdlib.hhf" )

// genLabel emits an identifier consisting of the string
// "base" followed by the digits of the integer "number".

macro genLabel( base, number );
    @text( base + string( number ))
endmacro;

/*
** Emulate the if..elseif..else..endif statement here.
*/

macro _if( ifexpr ):elseLbl, ifDone, hasElse, base;

    // Obtain a unique base string for the labels this
    // macro generates.

    ?base := @string:base;

    // elseLbl numbers the next label in the series.

    ?elseLbl := 0;
    ?hasElse := false;

    // Jump to the first generated label if the expression is false.

    jf( ifexpr ) genLabel( base, elseLbl );

  keyword _then;

  keyword _elseif( elsex );

    // We have just finished the "_if" clause or a previous
    // "_elseif" clause. So the first thing to do is jump
    // to the end of this "_if" statement.

    jmp ifDone;

    // Emit the label the previous test jumps to when false,
    // then test the expression for this clause.

    genLabel( base, elseLbl ):
    ?elseLbl := elseLbl+1;
    jf(elsex) genLabel( base, elseLbl );

  keyword _else;

    // Only allow a single "_else" clause in this
    // "_if" statement:

    #if( hasElse )
        #error( "Unexpected '_else' clause" )
    #endif
    ?hasElse := true;
    jmp ifDone;
    genLabel( base, elseLbl ):

  terminator _endif;

    ifDone:
    #if( !hasElse )
        genLabel( base, elseLbl ):
    #endif

endmacro;
begin IFDemo;
// Quick demo of the use of the above statements.
for( mov( 0, eax ); eax < 5; inc( eax )) do
_if( eax = 0 ) _then
stdout.put( "in _if statement" nl );
end IFDemo;
Program 9.3
// switch( i )

    mov( i, eax );          // Check to see if "i" is outside the range
    cmp( eax, 5 );          // 5..7 and transfer control directly to the
    jb EndCase;             // DEFAULT case if it is.
    cmp( eax, 7 );
    ja EndCase;
    jmp( JmpTbl[ eax*4 - 5*@size(dword)] );

// case( 5 )

    Stmt5:
        stdout.put( "I=5" );
        jmp EndCase;

// Case( 6 )

    Stmt6:
        stdout.put( "I=6" );
        jmp EndCase;

// Case( 7 )

    Stmt7:
        stdout.put( "I=7" );

    EndCase:
If you study this code carefully, with an eye to writing a macro to implement this statement, you'll discover a couple of major problems. First of all, it is exceedingly difficult to determine how many cases and
the range of values those cases cover before actually processing each CASE in the SWITCH statement.
Therefore, it is really difficult to emit the range check (for values outside the range 5..7) and the indirect
jump before processing all the cases in the SWITCH statement. You can easily solve this problem, however,
by moving the checks and the indirect jump to the bottom of the code and inserting a couple of extra JMP
instructions. This produces the following implementation:
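readonly
    JmpTbl:dword[3] := [ &Stmt5, &Stmt6, &Stmt7 ];
    .
    .
    .
// switch( i )

    jmp DoSwitch;

// case( 5 )

    Stmt5:
        stdout.put( "I=5" );
        jmp EndCase;

// Case( 6 )

    Stmt6:
        stdout.put( "I=6" );
        jmp EndCase;

// Case( 7 )

    Stmt7:
        stdout.put( "I=7" );
        jmp EndCase;        // Second jump inserted into this code.

    DoSwitch:               // Insert this label and move the range
        mov( i, eax );      // checks and indirect jump down here.
        cmp( eax, 5 );
        jb EndCase;
        cmp( eax, 7 );
        ja EndCase;
        jmp( JmpTbl[ eax*4 - 5*@size(dword)] );

// All the cases (including the default case) jump down here:

    EndCase: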
Since the range check code appears after all the cases, the macro can now process those cases and easily
determine the bounds on the cases by the time it must emit the CMP instructions above that check the
bounds of the SWITCH value. However, this implementation still has a problem. The entries in the JmpTbl
table refer to labels that can only be determined by first processing all the cases in the SWITCH statement.
Therefore, a macro cannot emit this table in a READONLY section that appears earlier in the source file than
the SWITCH statement. Fortunately, HLA lets you embed data in the middle of the code section using the
READONLY..ENDREADONLY and STATIC..ENDSTATIC directives¹. Taking advantage of this feature
allows us to rewrite the SWITCH implementation as follows:
// switch( i )

    jmp DoSwitch;

// case( 5 )

    Stmt5:
        stdout.put( "I=5" );
        jmp EndCase;

// Case( 6 )

    Stmt6:
        stdout.put( "I=6" );
        jmp EndCase;

// Case( 7 )

    Stmt7:
        stdout.put( "I=7" );
        jmp EndCase;        // Second jump inserted into this code.

    DoSwitch:               // Insert this label and move the range
        mov( i, eax );      // checks and indirect jump down here.
        cmp( eax, 5 );
        jb EndCase;
        cmp( eax, 7 );
        ja EndCase;
        jmp( JmpTbl[ eax*4 - 5*@size(dword)] );

// All the cases (including the default case) jump down here:

    EndCase:

    readonly
        JmpTbl:dword[3] := [ &Stmt5, &Stmt6, &Stmt7 ];
    endreadonly;
HLA's macros can produce code like this when processing a SWITCH macro. So this is the type of code we
will generate with a _switch.._case.._default.._endswitch macro.
Since we're going to need to know the minimum and maximum case values (in order to generate the
appropriate operands for the CMP instructions above), the _case #KEYWORD macro needs to compare the
current case value(s) against the global minimum and maximum case values for all cases. If the current case
value is less than the global minimum or greater than the global maximum, then the _case macro must
update these global values accordingly. The _endswitch macro will use these global minimum and maximum values in the two CMP instructions it generates for the range checking sequence.
1. HLA actually moves the data to the appropriate segment in memory; the data is not stored directly in the CODE section.
For each case value appearing in a _switch statement, the _case macros must save the case value and an
identifying label for that case value. This is necessary so that the _endswitch macro can generate the jump
table. What is really needed is an arbitrary list of records, each record containing a value field and a label
field. Unfortunately, the HLA compile-time language does not support arbitrary lists of objects, so we will
have to implement the list using a (fixed size) array of record constants. The record declaration will take the
following form:
caseRecord:
    record
        value:uns32;
        lbl:uns32;
    endrecord;
The value field will hold the current case value. The lbl field will hold a unique integer value for the
corresponding _case that the macros can use to generate statement labels. The implementation of the
_switch macro in this section will use a variant of the trick found in the section on the _if macro; it will convert a local macro symbol to a string and append an integer value to the end of that string to create a unique
label. The integer value appended will be the value of the lbl field in the caseRecord list.
Processing the _case macro becomes fairly easy at this point. All the _case macro has to do is create an
entry in the caseRecord list, bump a few counters, and emit an appropriate case label prior to the code emission. The implementation in this section uses Pascal semantics, so all but the first case in the
_switch.._endswitch statement must first emit a jump to the statement following the _endswitch so the previous case's code doesn't fall into the current case.
The real work in implementing the _switch.._endswitch statement lies in the generation of the jump
table. First of all, there is no requirement that the cases appear in ascending order in the _switch.._endswitch
statement. However, the entries in the jump table must appear in ascending order. Second, there is no
requirement that the cases in the _switch.._endswitch statement be consecutive. Yet the entries in the jump
table must be consecutive case values². The code that emits the jump table must handle these inconsistencies.
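The effect of a gap in the case values can be seen in a small hand-written sketch. It assumes cases 5 and 7 only, so the slot for the missing value 6 holds the address of the default case. The names gapSwitch and DefaultCase are hypothetical, and the sketch relies on the READONLY..ENDREADONLY feature described above to place the table after the case labels it refers to.

```hla
program gapSwitch;
#include( "stdlib.hhf" )

static
    i:uns32 := 6;

begin gapSwitch;

    // switch( i )

    jmp DoSwitch;

    Stmt5:                          // case( 5 )
        stdout.put( "I=5" nl );
        jmp EndCase;

    Stmt7:                          // case( 7 )
        stdout.put( "I=7" nl );
        jmp EndCase;

    DefaultCase:                    // default
        stdout.put( "default case" nl );
        jmp EndCase;

    // One entry per value in the range 5..7, in ascending
    // order. The entry for the missing value 6 refers to
    // the default case.

    readonly
        JmpTbl:dword[3] := [ &Stmt5, &DefaultCase, &Stmt7 ];
    endreadonly;

    DoSwitch:
        mov( i, eax );
        cmp( eax, 5 );
        jb DefaultCase;
        cmp( eax, 7 );
        ja DefaultCase;
        jmp( JmpTbl[ eax*4 - 5*@size(dword) ] );

    EndCase:
        stdout.put( "end of switch" nl );

end gapSwitch;
```

With i set to 6 the range checks succeed, and it is the table entry itself that routes control to DefaultCase.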
The first task is to sort the entries in the caseRecord list in ascending order. This is easily accomplished
by writing a little SortCases macro to sort all the caseRecord entries once the _switch.._endswitch macro has
processed all the cases. SortCases doesn't have to be fancy. In fact, a bubblesort algorithm is perfect for this
because the list of cases is usually quite small and the data is usually already sorted (or mostly sorted).
2. Of course, if there are gaps in the case values, the jump table entries for the missing items should contain the address of the
default case.
/**************************************************/
/*                                                */
/* switch.hla                                     */
/*                                                */
/**************************************************/
program demoSwitch;
#include( "stdlib.hhf" )
const

    maxCases := 256;

type

    // The following data type holds the case value
    // and statement label information for each
    // case appearing in a _switch statement.

    caseRecord:
        record
            value:uns32;
            lbl:uns32;
        endrecord;

// SortCases
//
// This routine does a bubble sort on an array
// of caseRecord objects. It sorts in ascending
// order using the "value" field as the key.
//
// This is a good old fashioned bubble sort which
// turns out to be very efficient because:
//
// (1) The list of cases is usually quite small, and
// (2) The data is usually already sorted (or mostly sorted).
#if
(
sort_array[sort_i].value >
sort_array[sort_i+1].value
)
?sort_temp := sort_array[sort_i];
?sort_array[sort_i] := sort_array[sort_i+1];
?sort_array[sort_i+1] := sort_temp;
?sort_didswap := true;
#elseif
(
sort_array[sort_i].value =
sort_array[sort_i+1].value
)
#error
(
"Two cases have the same value: (" +
string( sort_array[sort_i].value ) +
")"
)
#endif
?sort_i := sort_i + 1;
#endwhile
?sort_bnd := sort_bnd - 1;
#endwhile;
endmacro;
?switch_cases:caseRecord[ maxCases ];
// General initialization for processing cases.
?switch_caseIndex := 0;
?switch_minval := $FFFF_FFFF;
?switch_maxval := 0;
?switch_hasotherwise := false;
// supplied to the _case macro.
?switch_parmIndex:uns32;
?switch_parmIndex := 0;
#while( switch_parmIndex < switch_parmCount )
?switch_constant: uns32;
?switch_constant: uns32 :=
uns32( @text( switch_parms[ switch_parmIndex ]));
// Update minimum and maximum values based on the
// current case value.
#if( switch_constant < switch_minval )
?switch_minval := switch_constant;
#endif
#if( switch_constant > switch_maxval )
?switch_maxval := switch_constant;
#endif
// Emit a unique label to the source code for this case:
@text
(
    "_case"
    + @string:switch_caseIndex
    + string( switch_caseIndex )
):
// Save away the case label and the case value so we
// can build the jump table later on.
?switch_cases[ switch_caseIndex ].value := switch_constant;
?switch_cases[ switch_caseIndex ].lbl := switch_caseIndex;
// Bump switch_caseIndex value because we've just processed
// another case.
?switch_caseIndex := switch_caseIndex + 1;
#if( switch_caseIndex >= maxCases )
#error( "Too many cases in statement" );
#endif
?switch_parmIndex := switch_parmIndex + 1;
#endwhile
// Emit the label for this default case and set the
// switch_hasotherwise flag to true.
switch_otherwise:
?switch_hasotherwise := true;
// Feel free to
SortCases( switch_cases, switch_caseIndex );
@text
(
    "&"
    + "_case"
    + @string:switch_caseIndex
    + string( switch_cases[ switch_i_ ].lbl )
    + ","
)
// Emit "&switch_otherwise" table entries for any gaps present
// in the table:
?switch_j_ := switch_cases[ switch_i_ + 1 ].value;
?switch_curCase_ := switch_curCase_ + 1;
#while( switch_curCase_ < switch_j_ )
&switch_otherwise,
?switch_curCase_ := switch_curCase_ + 1;
#endwhile
?switch_i_ := switch_i_ + 1;
#endwhile
// Emit a dummy entry to terminate the table:
&switch_otherwise];
endreadonly;
#if( switch_caseIndex < 1 )
#error( "Must have at least one case" );
#endif
// After the default case, or after the last
// case entry, jump over the code that does
// the conditional jump.
?switch_cases := 0;
endmacro;
begin demoSwitch;
Program 9.4
The problem with this approach is that when the statement immediately following the ENDWHILE executes, that code doesn't know whether the loop terminated because it found the desired value or because it
exhausted the list. The typical solution is to test to see if the loop exhausted the list and deal with that
accordingly:
while( <<There are more items in the list>> ) do
    breakif( <<This was the item we're looking for>> );
    << select the next item in the list>>
The problem with this "solution" should be obvious if you think about it a moment. We've already
tested to see if the loop is empty; immediately after leaving the loop we repeat this same test. This is somewhat inefficient. A better solution would be to have something like an "else" clause in the WHILE loop that
executes if you break out of the loop and doesn't execute if the loop terminates because the boolean expression evaluated false. Rather than use the keyword ELSE, let's invent a new (more readable) term: onbreak.
The ONBREAK section of a WHILE loop executes (only once) if a BREAK or BREAKIF statement was the
reason for the loop termination. With this ONBREAK clause, you could recode the previous WHILE loop a
little bit more elegantly as follows:
while( <<There are more items in the list>> ) do
breakif( <<This was the item we're looking for>> );
<< select the next item in the list>>
onbreak
<< do something with the item we found >>
endwhile;
Note that if the ONBREAK clause is present, the WHILE's loop body ends at the ONBREAK keyword. The
ONBREAK clause executes at most once per execution of this WHILE statement.
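One way to picture these semantics is to write out by hand the control flow such a loop implies. The sketch below is an assumption about the shape of the generated code, not the output of any particular macro; the program name onbreakShape is hypothetical, and the label names mirror the roles described above.

```hla
program onbreakShape;
#include( "stdlib.hhf" )

static
    i:int32;

begin onbreakShape;

    mov( 0, i );

    topOfLoop:

        jf( i < 10 ) falseLbl;      // while test: leave if false
        jt( i = 4 ) breakLbl;       // breakif: leave through the break path
        inc( i );
        jmp topOfLoop;              // onbreak marks the end of the loop body

    breakLbl:                       // reached only by way of a break

        stdout.put( "Breakif was true at i=", i, nl );

    falseLbl:                       // both exits meet here

        stdout.put( "All Done" nl );

end onbreakShape;
```

Because a false loop test jumps straight to falseLbl, the statements at breakLbl run only when a break caused the exit, and no second test is needed after the loop.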
Implementing a _while.._onbreak.._endwhile statement is very easy using HLA's multi-part macros.
Program 9.5 provides the complete implementation of this statement:
/***************************************************/
/*                                                 */
/* while.hla                                       */
/*                                                 */
/* This program demonstrates a variant of the      */
/* WHILE loop that provides a special "onbreak"    */
/* clause. The _onbreak clause executes if the     */
/* program executes a _break clause or it executes */
/* a _breakif clause and the corresponding         */
/* boolean expression evaluates true. The _onbreak */
/* section does not execute if the loop terminates */
/* due to the _while boolean expression evaluating */
/* false.                                          */
/*                                                 */
/***************************************************/
program Demo_while;
#include( "stdlib.hhf" )
// _while semantics:
//
// _while( expr )
//
//      << stmts including optional _break, _breakif
//         _continue, and _continueif statements >>
//
//   _onbreak // This section is optional.
//
//      << stmts that only execute if program executes
//         a _break or _breakif (with true expression)
//         statement. >>
//
// _endwhile;
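macro _while( expr ):falseLbl, breakLbl, topOfLoop, hasOnBreak;

    // hasOnBreak keeps track of whether we've seen an _onbreak
    // section.

    ?hasOnBreak:boolean:=false;

    // Here's the top of the WHILE loop.
    // Implement this as a straight-forward WHILE (test for
    // loop termination at the top of the loop).

    topOfLoop:
        jf( expr ) falseLbl;

  // Ignore the _do keyword.

  keyword _do;

  // _break and _breakif transfer control to breakLbl.

  keyword _break;
    jmp breakLbl;

  keyword _breakif( expr2 );
    jt( expr2 ) breakLbl;

  // The _onbreak clause ends the loop body, so jump back
  // to the top of the loop, then emit the break label in
  // front of the statements that follow _onbreak.

  keyword _onbreak;

    ?hasOnBreak := true;
    jmp topOfLoop;
    breakLbl:

  // If there was no _onbreak clause, close the loop and
  // emit the break label here. Either way, falseLbl marks
  // the first statement after the loop.

  terminator _endwhile;

    #if( !hasOnBreak )
        jmp topOfLoop;
        breakLbl:
    #endif
    falseLbl:

endmacro;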
static
i:int32;
begin Demo_while;
// Demonstration of standard while loop
mov( 0, i );
_while( i < 10 ) _do
stdout.put( "1: i=", i, nl );
inc( i );
_endwhile;
// Demonstration with BREAKIF:
mov( 5, i );
_while( i < 10 ) _do
stdout.put( "2: i=", i, nl );
_breakif( i = 7 );
inc( i );
_endwhile
// Demonstration with _BREAKIF and _ONBREAK:
mov( 0, i );
_while( i < 10 ) _do
stdout.put( "3: i=", i, nl );
_breakif( i = 4 );
inc( i );
_onbreak
stdout.put( "Breakif was true at i=", i, nl );
_endwhile
stdout.put( "All Done" nl );
end Demo_while;
Program 9.5
In both cases ("C" and HLA) the << statements>> block executes only if both expr1 and expr2 evaluate
true. So other than the extra typing involved, it is often very easy to simulate logical conjunction by using
two IF statements in HLA.
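For instance, a conjunction of two tests can be written as two nested IF statements, as in the sketch below. The program name nestedAnd and the particular register comparisons are assumptions standing in for expr1 and expr2.

```hla
program nestedAnd;
#include( "stdlib.hhf" )

begin nestedAnd;

    mov( 1, eax );
    mov( 2, ebx );

    // Same effect as the "C" test: if( eax == 1 && ebx == 2 )

    if( eax = 1 ) then

        if( ebx = 2 ) then

            stdout.put( "both expressions are true" nl );

        endif;

    endif;

end nestedAnd;
```

The inner statement block is reached only when both comparisons succeed, and the second comparison is skipped entirely when the first one fails.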
There is one very big problem with this scheme. Consider what happens if you modify the "C" code to
be the following:
// "C" code employing logical-AND operator:
if( expr1 && expr2 )
{
Before describing how to create this new type of IF statement, we must digress for a moment and
explore an interesting feature of HLA's multi-part macro expansion: #KEYWORD macros do not have to
use unique names. Whenever you declare an HLA #KEYWORD macro, HLA accepts whatever name you
choose. If that name happens to be already defined, then the #KEYWORD macro name takes precedence as
long as the macro is active (that is, from the point you invoke the macro name until HLA encounters the
#TERMINATOR macro). Therefore, the #KEYWORD macro name hides the previous definition of that
name until the termination of the macro. This feature applies even to the original macro name; that is, it is
possible to define a #KEYWORD macro with the same name as the original macro to which the #KEYWORD macro belongs. This is a very useful feature because it allows you to change the definition of the
macro within the scope of the opening and terminating invocations of the macro.
Although not pertinent to the IF statement we are constructing, you should note that parameter and local
symbols in a macro also override any previously defined symbols of the same name. So if you use that symbol between the opening macro and the terminating macro, you will get the value of the local symbol, not the
global symbol. E.g.,
var
i:int32;
j:int32;
.
.
.
#macro abc:i;
?i:text := "j";
.
.
.
#terminator xyz;
.
.
.
#endmacro;
.
.
.
mov( 25, i );
mov( 10, j );
abc
mov( i, eax );
xyz;
The code above loads 10 into EAX because the "mov(i, eax);" instruction appears between the opening and
terminating macros abc..xyz. Between those two macros the local definition of i takes precedence over the
global definition. Since i is a text constant that expands to j, the aforementioned MOV statement is really
equivalent to "mov(j, eax);" That statement, of course, loads 10 into EAX. Since this problem is difficult to
see while reading your code, you should choose local symbols in multi-part macros very carefully. A good
convention to adopt is to combine your local symbol name with the macro name, e.g.,
#macro abc : i_abc;
You may wonder why HLA allows something so crazy to happen in your source code; in a moment you'll
see why this behavior is useful (and now, with this brief message out of the way, back to our regularly scheduled discussion).
Before we digressed to discuss this interesting feature in HLA multi-part macros, we were trying to figure out how to efficiently simulate the conjunction and disjunction operators in an IF statement without actually using these operators in our code. The problem in the example appearing earlier in this section is that you
would have to duplicate some code in order to convert the IF..ELSE statement properly. The following code
shows this problem:
// "C" code employing logical-AND operator:
if( expr1 && expr2 )
{
<< true statements >>
}
else
{
<< false statements >>
}
Note that this code must duplicate the "<< false statements >>" section if the logic is to exactly match the
original "C" code. This means that the program will be larger and harder to read than is absolutely necessary.
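The duplication described above can be seen in a sketch that uses only standard nested IF statements. The program name dupFalse and the register comparisons are assumptions standing in for expr1 and expr2, and the two stdout.put calls marked as copies stand in for the << false statements >> block.

```hla
program dupFalse;
#include( "stdlib.hhf" )

begin dupFalse;

    mov( 1, eax );
    mov( 0, ebx );

    if( eax = 1 ) then

        if( ebx = 2 ) then

            stdout.put( "true statements" nl );

        else

            stdout.put( "false statements" nl );    // first copy

        endif;

    else

        stdout.put( "false statements" nl );        // second copy

    endif;

end dupFalse;
```

The false block has to appear twice because either comparison failing must reach it, and each standard IF statement has its own ELSE clause.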
One solution to this problem is to create a new kind of IF statement that doesn't nest the same way standard IF statements nest. In particular, we can define the statement such that all IF clauses nested within an outer
IF..ENDIF block share the same ELSE and ENDIF clauses. If this were the case, then you could implement
the code above as follows:
if( expr1 ) then
if( expr2 ) then
<< true statements >>
else
<< false statements >>
endif;
/***********************************************/
/*                                             */
/* if.hla                                      */
/*                                             */
/* This program demonstrates a modification of */
/* the IF..ELSE..ENDIF statement using HLA's   */
/* multi-part macros.                          */
/*                                             */
/***********************************************/

program newIF;
#include( "stdlib.hhf" )

// Nested _if clause (yes, HLA lets you replace the main
// macro name with a keyword macro). Identical to the
// above _if implementation except this one does not
// require a matching _endif clause. The single _endif
// (matching the first _if clause) terminates all nested
// _if clauses as well as the main _if clause.
static
tr:boolean := true;
f:boolean := false;
begin newIF;
// Real quick demo of the _if statement:
_if( tr ) _then
_if( tr ) _then
_if( f ) _then
stdout.put( "error" nl );
_else
stdout.put( "Success" );
_endif
end newIF;
Program 9.6
Just in case you're wondering, this program prints "Success" and then quits. This is because the nested
"_if" statements are equivalent to the expression "true && true && false" which, of course, is false. Therefore, the "_else" portion of this code should execute.
The only surprise in this macro is the fact that it redefines the _if macro as a keyword macro upon invocation of the main _if macro. The reason this code does this is so that any nested _if clauses do not require a
corresponding _endif and don't support an _else clause.
Implementing an ELSEIF clause introduces some difficulties, hence its absence in this example. The
design and implementation of an ELSEIF clause is left to the more serious reader³.
9.3
This macro emits the code that computes the following (HLL) statement:
reg32 := uns32_expression;
For example, the macro invocation "u32expr( eax, ebx+ecx*5 - edi );" computes the value of the expression
"ebx+ecx*5 - edi" and leaves the result of this expression sitting in the EAX register.
The u32expr macro places several restrictions on the expression. First of all, as the name implies, it
only computes the result of an uns32 expression. No other data types may appear within the expression.
During computation, the macro uses the EAX and EDX registers, so expressions should not contain these
registers as their values may be destroyed by the code that computes the expression (EAX or EDX may
safely appear as the first operand of the expression, however). Finally, expressions may only contain the following operators:
<, <=, >, >=, <>, !=, =, ==
+, -
*, /
(, )
The "<>" and "!=" operators are equivalent (not equals) and the "=" and "==" operators are also equivalent
(equals). The operators above are listed in order of increasing precedence; i.e., "*" has a higher precedence
than "+" (as you would expect). You can override the precedence of an operator by using parentheses in the
standard manner.
It is important to remember that u32expr is a macro, not a function. That is, the invocation of this macro
results in a sequence of 80x86 assembly language instructions that computes the desired expression. The
u32expr invocation is not a function call to some routine that computes the result.
To understand how this macro works, it would be a good idea to review the section on "Converting
Arithmetic Expressions to Postfix Notation" on page 635. That section discusses how to convert floating
point expressions to reverse polish notation; although the u32expr macro works with uns32 objects rather
than floating point objects, the approach it uses to translate expressions into assembly language uses this
same algorithm. So if you don't remember how to translate expressions into reverse polish notation, it might
be worthwhile to review that section of this text.
Converting floating point expressions to reverse polish notation is especially easy because the 80x86's
FPU uses a stack architecture. Alas, the integer instructions on the 80x86 use a register architecture, and efficiently translating integer expressions to assembly language is a bit more difficult (see "Arithmetic Expressions" on page 597). We'll solve this problem by translating the expressions to assembly code in a
somewhat less than efficient manner; we'll simulate an integer stack architecture by using the 80x86's hardware stack to hold temporary results during an integer calculation.
To push an integer constant or variable onto the 80x86 hardware stack, we need only use a PUSH or
PUSHD instruction. This operation is trivial.
pop( eax );          // Get Y's value.
add( eax, [esp] );   // Add with X's value and leave sum on TOS.
Subtraction is identical to addition. Although subtraction is not commutative, the operands just happen
to be on the stack in the proper order to efficiently compute their difference. To compute "X-Y" where X is
on NOS and Y is on TOS, we can use code like the following:
// Compute X-Y where X is on NOS and Y is on TOS:
pop( eax );
sub( eax, [esp] );
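To see why the operand order works out for subtraction, it can help to trace the two-instruction sequence on a model of the stack. The Python sketch below is only an illustration: the list-based stack, the MASK32 constant, and the binary_op helper are assumptions made for this example, not part of the u32expr macro.

```python
MASK32 = 0xFFFFFFFF  # uns32 results wrap around at 32 bits


def binary_op(stack, op):
    """Apply op to NOS (X) and TOS (Y), leaving the uns32 result on TOS."""
    eax = stack.pop()                      # pop( eax ): EAX now holds Y
    x = stack[-1]                          # [esp] now refers to X
    if op == "+":
        stack[-1] = (x + eax) & MASK32     # add( eax, [esp] )
    elif op == "-":
        stack[-1] = (x - eax) & MASK32     # sub( eax, [esp] ) computes X - Y
    else:
        raise ValueError(f"unsupported operator: {op}")
    return stack


stack = [10, 3]          # X = 10 pushed first (NOS), Y = 3 pushed last (TOS)
binary_op(stack, "-")
print(stack)             # [7]
```

Because the memory operand is the destination of the SUB instruction, the value that was pushed first ends up as the minuend without any extra shuffling.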
Multiplication of the two items on the top of stack is a little more complicated since we must use the
MUL instruction (the only unsigned multiplication instruction available) and the destination operand must
be the EDX:EAX register pair. Fortunately, multiplication is a commutative operation, so we can compute
the product of NOS (next on stack) and TOS (top of stack) using code like the following:
// Compute X*Y where X is on NOS and Y is on TOS:
pop( eax );
mul( [esp], eax );
mov( eax, [esp] );
Division is problematic because it is not a commutative operation and its operands on the stack are not
in a convenient order. That is, to compute X/Y it would be really convenient if X was on TOS and Y was in
the NOS position. Alas, as you'll soon see, it turns out that X is at NOS and Y is on the TOS. To resolve this
issue requires slightly less efficient code than the sequences we've used above. Since the DIV instruction is
so slow anyway, this will hardly matter.
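// Compute X/Y where X is on NOS and Y is on TOS:
mov( [esp+4], eax );     // Get X (the dividend) from NOS.
xor( edx, edx );         // Zero extend EAX into EDX:EAX.
div( [esp], edx:eax );   // Divide EDX:EAX by Y; the quotient goes to EAX.
pop( edx );              // Remove Y from the stack.
mov( eax, [esp] );       // Store the quotient on the new TOS.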
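The division sequence can be traced step by step on a small model. The following Python sketch assumes a list-based stack (the last element is TOS) and a hypothetical div_model helper; it mirrors each instruction in order rather than reproducing anything from the macro itself.

```python
def div_model(stack):
    """Trace the five-instruction division sequence on a list-based stack.

    stack[-1] is TOS (Y, the divisor); stack[-2] is NOS (X, the dividend).
    """
    eax = stack[-2]                           # mov( [esp+4], eax )
    edx = 0                                   # xor( edx, edx )
    dividend = (edx << 32) | eax              # EDX:EAX viewed as 64 bits
    eax, edx = divmod(dividend, stack[-1])    # div( [esp], edx:eax )
    edx = stack.pop()                         # pop( edx ): discard Y
    stack[-1] = eax                           # mov( eax, [esp] )
    return stack


print(div_model([12, 2]))   # [6]
```

Note that the model, like the DIV instruction, leaves the quotient in EAX and the remainder in EDX before EDX is overwritten by the POP.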
The remaining operators are the comparison operators. These operators compare the value on NOS
with the value on TOS and leave true (1) or false (0) sitting on the stack based on the result of the comparison. While it is easy to work around the non-commutative aspect of many of the comparison operators, the
big challenge is converting the result to true or false. The SETcc instructions are convenient for this purpose, but they only work on byte operands. Therefore, we will have to zero extend the result of the SETcc
instructions to obtain an uns32 result we can push onto the stack. Ultimately, the code we must emit for a
comparison is similar to the following:
// Compute X <= Y where X is on NOS and Y is on TOS.
pop( eax );
cmp( [esp], eax );
setbe( al );
movzx( al, eax );
mov( eax, [esp] );
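The same pattern applies to every comparison operator; only the SETcc instruction changes. As a rough model, the Python sketch below (the COMPARE table and compare_model helper are assumptions made for illustration) replaces NOS and TOS with 1 or 0:

```python
import operator

# Maps each comparison operator to the test it performs.
# "<>" and "!=" are treated as synonyms, as are "=" and "==".
COMPARE = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "==": operator.eq,
    "<>": operator.ne, "!=": operator.ne,
}


def compare_model(stack, op):
    """Replace NOS (X) and TOS (Y) with 1 if 'X op Y' holds, else 0."""
    eax = stack.pop()                    # pop( eax ): EAX holds Y
    flag = COMPARE[op](stack[-1], eax)   # cmp( [esp], eax ) plus SETcc
    stack[-1] = int(flag)                # movzx( al, eax ); mov( eax, [esp] )
    return stack


print(compare_model([3, 5], "<="))   # [1]
print(compare_model([5, 3], "<="))   # [0]
```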
As it turns out, the appearance of parentheses in an expression only affects the order of the instructions
appearing in the sequence; it does not affect the number or type of instructions that correspond to the calculation of an expression. As you'll soon see, handling parentheses is an especially trivial operation.
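One way to convince yourself of this claim is to convert an expression to postfix form with and without parentheses and compare the results. The Python sketch below uses a standard operator-stack conversion; that choice, along with the PREC table and to_postfix name, is an assumption made for illustration and is not necessarily how the macro itself is organized.

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Convert a list of infix tokens to postfix using an operator stack."""
    out, ops = [], []
    for tok in tokens:
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()                      # discard the "("
        elif tok in PREC:
            # Pop operators of equal or higher precedence first
            # (this gives left-to-right evaluation).
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[tok]:
                out.append(ops.pop())
            ops.append(tok)
        else:
            out.append(tok)                # operand
    while ops:
        out.append(ops.pop())
    return out


plain = to_postfix("x + y * z".split())
grouped = to_postfix("( x + y ) * z".split())
print(plain)                               # ['x', 'y', 'z', '*', '+']
print(grouped)                             # ['x', 'y', '+', 'z', '*']
print(sorted(plain) == sorted(grouped))    # True
```

Both forms contain the same operands and operators; the parentheses only change the order in which they appear.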
With this short description of how to emit code for each type of arithmetic operator, it's time to discuss
exactly how we will write a macro to automate this translation. Once again, a complete discussion of this
topic is well beyond the scope of this text; however, a simple introduction to compiler theory will certainly
ease understanding of the u32expr macro.
For efficiency and reasons of convenience, most compilers are broken down into several components
called phases. A compiler phase is a collection of logically related activities that take place during compilation. There are three general compiler phases we are going to consider here: (1) lexical analysis (or scanning), (2) parsing, and (3) code generation. It is important to realize that these three activities occur
concurrently during compilation; that is, they take place at the same time rather than as three separate,
serial, activities. A compiler will typically run the lexical analysis phase for a short period, transfer control
to the parsing phase, do a little code generation, and then, perhaps, do some more scanning and parsing and
code generation (not necessarily in that order). Real compilers have additional phases; the u32expr macro
will only use these three phases (and if you look at the macro, you'll discover that it's difficult to separate the
parsing and code generation phases).
Lexical analysis is the process of breaking down a string of characters, representing the expression to
compile, into a sequence of tokens for use by the parser. For example, an expression of the form "MaxVal - x <= $1c" contains five distinct tokens:
MaxVal
-
x
<=
$1c
Breaking any one of these tokens into smaller objects would destroy the intent of the expression (e.g., converting MaxVal to "Max" and "Val" or converting "<=" into "<" and "="). The job of the lexical analyzer is
to break the string down into a sequence of constituent tokens and return this sequence of tokens to the
parser (generally one token at a time, as the parser requests new tokens). Another task for the lexical analyzer is to remove any extra white space from the string of symbols (since expressions may generally contain
an arbitrary amount of white space).
Fortunately, it is easy to extract the next available token in the input string by skipping all white space
characters and then looking at the current character. Identifiers always begin with an alphabetic character or an
underscore; numeric values always begin with a decimal digit, a dollar sign ("$"), or a percent sign ("%").
Operators always begin with the corresponding punctuation character that represents the operator. There are
only two major issues here: how do we classify these tokens and how do we differentiate two or more distinct tokens that start with the same character (e.g., "<", "<=", and "<>")? Fortunately, HLA's compile-time
functions provide the tools we need to do this.
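The first-character rule described above is simple enough to model in a few lines. The Python sketch below is an illustration only; the classify function and the set names are assumptions for this example and do not correspond to identifiers in the HLA source.

```python
import string

ID_START = set(string.ascii_letters + "_")   # identifiers
NUM_START = set(string.digits + "$%")        # decimal, hex ($), binary (%)


def classify(text):
    """Classify the next token by its first non-blank character."""
    stripped = text.lstrip()                 # skip leading white space
    if not stripped:
        return "end"
    first = stripped[0]
    if first in ID_START:
        return "identifier"
    if first in NUM_START:
        return "number"
    return "operator"


for sample in ["MaxVal - x", "  $1c", "<= $1c", "%1011"]:
    print(repr(sample), "->", classify(sample))
```

This settles the classification question; telling "<", "<=", and "<>" apart requires looking beyond the first character.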
Consider the declaration of the u32expr macro:
#macro u32expr( reg, expr ):sexpr;
The expr parameter is a text object representing the expression to compile. The sexpr local symbol will
contain the string equivalent of this text expression. The macro translates the text expr object to a string with
the following statement:
?sexpr := @string:expr;
From this point forward, the macro works with the string in sexpr.
The lexer macro (compile-time function) handles the lexical analysis operation. This macro expects a
single string parameter from which it extracts a single token and removes the string associated with that
token from the front of the string.
The lexer function actually returns a little more than the string it extracts from its parameter. The actual
return value is a record constant that has the definition:
tokType:
record
lexeme:string;
tokClass:tokEnum;
endrecord;
The lexeme field holds the actual string (e.g., "2" in this example) that the lexer macro returns. The tokClass field holds a small numeric value (see the tokEnum enumerated data type) that specifies the type of
the token. In this example, the call to lexer stores the value intconst into the tokClass field. Having a single
value (like intconst) when the lexeme could take on a large number of different forms (e.g., "2", "3", "4", ...)
will help make the parser easier to write. The call to lexer in the previous example produces the following
results:
str2lex : "+3"
TokenResult.lexeme: "2"
TokenResult.tokClass: intconst
A subsequent call to lexer, immediately after the call above, will process the next available character in
the string and return the following values:
str2lex : "3"
TokenResult.lexeme: "+"
TokenResult.tokClass: plusOp
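The two calls traced above can be imitated with a small Python model. This is a sketch under stated assumptions: the Token tuple and PATTERNS table are hypothetical, only the token classes needed for this example are modeled, and "-" is assumed to share the plusOp class with "+". Because Python strings are immutable, the model returns the remaining text instead of modifying its argument the way the HLA macro does.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    lexeme: str
    tok_class: str


# (class name, pattern) pairs; only a few token classes are modeled here.
PATTERNS = [
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("intconst", re.compile(r"[0-9]+")),
    ("plusOp", re.compile(r"[+-]")),
    ("mulOp", re.compile(r"[*/]")),
]


def lexer(text):
    """Return (token, remaining_text) for the first token in text."""
    text = text.lstrip()
    for tok_class, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Token(match.group(), tok_class), text[match.end():]
    raise ValueError(f"unexpected input: {text!r}")


first, rest = lexer("2+3")
second, rest2 = lexer(rest)
print(first.lexeme, first.tok_class, repr(rest))      # 2 intconst '+3'
print(second.lexeme, second.tok_class, repr(rest2))   # + plusOp '3'
```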
To see how lexer works, consider the first few lines of the lexer macro:
#macro lexer( input ):theLexeme,boolResult;
?theLexeme:string;
?boolResult:boolean;
The real work begins with the #IF statement where the code uses the @peekCset function to see if the
first character of the input parameter is a member of the tok1stIDChar set (which is the alphabetic characters
plus an underscore, i.e., the set of characters that may appear as the first character of an identifier). If so, the
code executes the @oneOrMoreCset function to extract all legal identifier characters (alphanumerics plus
underscore), storing the result in the theLexeme string variable. Note that this function call to @oneOrMoreCset also removes the string it matches from the front of the input string (see the description of @oneOrMoreCset for more details). This macro returns a tokType result by simply specifying a tokType constant
containing theLexeme and the enum constant identifier.
If the first character of the input string is not in the tok1stIDChar set, then the lexer macro checks to see
if the first character is a legal decimal digit. If so, then this macro processes that string of digits in a manner
very similar to identifiers. The code handles hexadecimal and binary constants in a similar fashion. About
the only thing exciting in the whole macro is the way it differentiates tokens that begin with the same symbol. Once it determines that a token begins with a character common to several lexemes, it calls @matchStr
to attempt to match the longer tokens before settling on the shorter lexeme (i.e., lexer attempts to match "<="
or "<>" before it decides the lexeme is just "<"). Other than this complication, the operation of the lexer is
really quite simple.
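The "longest token first" rule can be sketched as follows. This Python model is illustrative only: the CMP_OPS ordering, the CANONICAL table, and the match_cmp helper are assumptions for this example, although mapping "!=" to "<>" and "==" to "=" follows the lexer code shown in the listing.

```python
# Two-character operators come first so that "<=" is never read as "<".
CMP_OPS = ["<=", ">=", "<>", "!=", "==", "<", ">", "="]

# Synonyms are reduced to a single spelling to simplify later phases.
CANONICAL = {"!=": "<>", "==": "="}


def match_cmp(text):
    """Return (operator, remaining_text), or (None, text) if none matches."""
    for op in CMP_OPS:
        if text.startswith(op):
            return CANONICAL.get(op, op), text[len(op):]
    return None, text


print(match_cmp("<= 5"))   # ('<=', ' 5')
print(match_cmp("<5"))     # ('<', '5')
print(match_cmp("!=y"))    # ('<>', 'y')
```

If the single-character operators were tried first, "<= 5" would yield "<" followed by a stray "=".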
The operation of the parser/code generation phases is a bit more complex, especially since these macros
are indirectly recursive; to simplify matters we will explore the parser/code generator in a bottom-up fashion.
The parser/code generator phases consist of four separate macros: doTerms, doMulOps, doAddOps, and
doCmpOps. The reason for these four separate macros is to handle the different precedences of the arithmetic operators and the parentheses. An explanation of how these four macros handle the different arithmetic precedences is beyond the scope of this text; we'll just look at how these four macros do their job.
The doTerms macro is responsible for handling identifiers, numeric constants, and subexpressions surrounded by parentheses. The single parameter is the current input string whose first (non-blank) character
sequence is an identifier, constant, or parenthetical expression. Here is the full text for this macro:
#macro doTerms( expr ):termToken;
// Begin by removing any leading white space from the string:
?expr := @trim( expr, 0 );
// Okay, call the lexer to extract the next token from the input:
?termToken:tokType := lexer( expr );
// See if the current token is an identifier. If so, assume that
// it's an uns32 identifier and emit the code to push its value onto
// the stack.
#if( termToken.tokClass = identifier )
// If we've got an identifier, emit the code to
// push that identifier onto the stack.
push( @text( termToken.lexeme ));
// If it wasn't an identifier, see if it's a numeric constant.
// If so, emit the code that will push this value onto the stack.
The doTerms macro is responsible for leaving a single item sitting on the top of the 80x86 hardware
stack. That stack item is either the value of an uns32 identifier, the value of an uns32 constant, or the value
left on the stack via a parenthesized subexpression. The important thing to remember is that you can think of
doTerms as a function that emits code that leaves a single item on the top of the 80x86 stack.
The doMulOps macro handles expressions consisting of a single term (items handled by the doTerms
macro) optionally followed by zero or more pairs consisting of a multiplicative operator ("*" or "/") and a
second term. It is especially important to remember that the doMulOps macro does not require the presence
of a multiplicative operator; it will legally process a single term (identifier, numeric constant, or parenthetical expression). If one or more multiplicative operator and term pairs are present, the doMulOps macro will
emit the code that will multiply or divide the values of the two terms and leave the result on the stack. E.g.,
consider the following:
X * 5
Since there is a multiplicative operator present ("*"), the doMulOps macro will call doTerms to process
the two terms (pushing X and then 5 onto the stack) and then the doMulOps macro will emit the code to multiply the two values on the stack leaving their product on the stack. The complete code for the doMulOps
macro is the following:
#macro doMulOps( sexpr ):opToken;
// Process the leading term (not optional). Note that
// this expansion leaves an item sitting on the stack.
doTerms( sexpr );
// Process all the MULOPs at the current precedence level.
// (these are optional, there may be zero or more of them.)
// Begin by removing any leading white space.
?sexpr := @trim( sexpr, 0 );
#while( @peekCset( sexpr, MulOps ))
// Save the operator so we know what code we should
// generate later.
?opToken := lexer( sexpr );
// Get the term following the operator.
doTerms( sexpr );
mov( [esp+4], eax );
xor( edx, edx );
div( [esp], edx:eax );
pop( edx );
mov( eax, [esp] );
#endif
?sexpr := @trim( sexpr, 0 );
#endwhile
#endmacro;
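The control flow of doMulOps (one mandatory term, then zero or more operator and term pairs) can be modeled with a small Python code generator that emits the instruction text as strings. This is a sketch, not a translation of the macro: the do_terms and do_mul_ops functions and the TOKEN pattern are assumptions for this example, and parenthesized subexpressions are deliberately left out.

```python
import re

# Identifiers, decimal constants, and the two multiplicative operators.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[*/])")

MUL_CODE = ["pop( eax );", "mul( (type dword [esp]) );", "mov( eax, [esp] );"]
DIV_CODE = ["mov( [esp+4], eax );", "xor( edx, edx );",
            "div( [esp], edx:eax );", "pop( edx );", "mov( eax, [esp] );"]


def do_terms(tokens, code):
    """Emit a push for one identifier or constant."""
    tok = tokens.pop(0)
    if tok.isdigit():
        code.append(f"pushd( {tok} );")
    else:
        code.append(f"push( {tok} );")


def do_mul_ops(tokens, code):
    """Emit code for term { ('*' | '/') term }, working left to right."""
    do_terms(tokens, code)                    # leading term is mandatory
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)                    # remember the operator
        do_terms(tokens, code)                # term following the operator
        code.extend(MUL_CODE if op == "*" else DIV_CODE)
    return code


print("\n".join(do_mul_ops(TOKEN.findall("X * 5"), [])))
```

For "X * 5" the model emits the two pushes followed by the three-instruction multiply sequence; a lone term such as "X" produces only a single push.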
This statement simply compiles the sequence of statements appearing between the braces in the first
operand and then it uses the second string_expression operand as the "returns" value for this statement. As
you may recall from the discussion of instruction composition (see "Instruction Composition in HLA" on
page 558), HLA substitutes the "returns" value of a statement in place of that statement if it appears as an
operand to another expression. The RETURNS statement appearing in the u32expr macro returns the register you specify as the first parameter as the "returns" value for the macro invocation. This lets you invoke the
u32expr macro as an operand to many different instructions (that accept a 32-bit register as an operand). For
example, the following u32expr macro invocations are all legal:
mov( u32expr( eax, i*j+k/15 - 2), m );
if( u32expr( edx, eax < (ebx-2)*ecx) ) then ... endif;
funcCall( u32expr( eax, (x*x + y*y)/z*z ), 16, 2 );
Well, without further ado, here's the complete code for the u32expr compiler and some test code that
checks out the operation of this macro:
// u32expr.hla
//
// This program demonstrates how to write an "expression compiler"
// using the HLA compile-time language. This code defines a macro
// (u32expr) that accepts an arithmetic expression as a parameter.
// This macro compiles that expression into a sequence of HLA
// machine language instructions that will compute the result of
// that expression at run-time.
//
// The u32expr macro does have some severe limitations.
// First of all, it only supports uns32 operands.
// Second, it only supports the following arithmetic
// operations:
//
//      +, -, *, /, <, <=, >, >=, =, <>.
//
// The comparison operators produce zero (false) or
// one (true) depending upon the result of the (unsigned)
// comparison.
//
// The syntax for a call to u32expr is
//
//      u32expr( register, expression )
program TestExpr;
#include( "stdlib.hhf" )
// Some special character classifications the lexical analyzer uses.
const
digits := { '0'..'9' };
hexDigits := { '0'..'9', 'a'..'f', 'A'..'F' };
binDigits := { '0'..'1' };
type
tokEnum:
enum
{
identifier,
intconst,
lparen,
rparen,
plusOp,
mulOp,
cmpOp
};
tokType:
record
lexeme:string;
tokClass:tokEnum;
endrecord;
// Handle the "=" ("=="), "<>" ("!="), "<", "<=", ">", and ">="
// operators here.
#elseif( @peekCset( input, CmpOps ))
// Note that we must check for two-character operators
// first so we don't confuse them with the single
// character operators:
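#if
(
@matchStr( input, ">=", input, theLexeme )
| @matchStr( input, "<=", input, theLexeme )
| @matchStr( input, "<>", input, theLexeme )
)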
tokType:[ theLexeme, cmpOp ]
#elseif( @matchStr( input, "!=", input, theLexeme ))
tokType:[ "<>", cmpOp ]
#elseif( @matchStr( input, "==", input, theLexeme ))
tokType:[ "=", cmpOp ]
#elseif( @oneCset( input, {'>', '<', '='}, input, theLexeme ))
tokType:[ theLexeme, cmpOp ]
#else
#error( "Illegal comparison operator: " + input )
#endif
// Handle the parentheses down here.
#elseif( @oneChar( input, '(', input, theLexeme ))
tokType:[ "(", lparen ]
#elseif( @oneChar( input, ')', input, theLexeme ))
tokType:[ ")", rparen ]
#else
#error( "Unexpected term: '" + termToken.lexeme + "'" )
#endif
endmacro;
// those two values off the stack and multiply or divide them and
// push the result back onto the stack (sort of like the way the
// FPU multiplies or divides values on the FPU stack).
//
// If there are three or more operands in a row, separated by
// mulops ("*" or "/") then this macro will process them in
// a left-to-right fashion, popping each pair of values off the
// stack, operating on them, pushing the result, and then processing
// the next pair. E.g.,
//
//      i * j * k
//
// yields:
//
//      push( i );
//      push( j );
//
//      pop( eax );
//      mul( (type dword [esp]));
//      mov( eax, [esp]);
//
//      push( k );
//
//      pop( eax );                 // Pop K
//      mul( (type dword [esp]));   // Compute K* (i*j) [i*j is value on TOS].
//      mov( eax, [esp]);           // Save product on TOS.
mov( [esp+4], eax );
xor( edx, edx );
div( [esp], edx:eax );
pop( edx );
mov( eax, [esp] );
#endif
?sexpr := @trim( sexpr, 0 );
#endwhile
endmacro;
#endif
#endwhile
endmacro;
returns
(
{
?sexpr:string := @string:expr;
#if( !@IsReg32( reg ) )
#error( "Expected a 32-bit register" )
#else
// Process the expression and leave the
// result sitting in the specified register.
doCmpOps( sexpr );
pop( reg );
#endif
},
// Return the specified register as the "returns"
// value for this compilation:
@string:reg
)
endmacro;
begin TestExpr;
// Now compute:
//
//      eax := x + ecx/2
//          := 10 + 12/2
//          := 10 + 6
//          := 16
//
// This macro emits the following code:
//
//      push( x );
//      push( ecx );
//      pushd( 2 );
//      mov( [esp+4], eax );
//      xor( edx, edx );
//      div( [esp], edx:eax );
//      pop( edx );
//      mov( eax, [esp] );
//      pop( eax );
//      add( eax, [esp] );
//      pop( eax );
Program 9.7
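As a cross-check on the arithmetic in the listing's comments, the overall structure of the expression compiler (four levels named after doCmpOps, doAddOps, doMulOps, and doTerms) can be modeled in Python as an evaluator that performs each operation on a stack instead of emitting instructions. This is a sketch under several assumptions: the function names, the env dictionary standing in for registers and variables, and the treatment of chained comparisons are all illustrative, and hexadecimal and binary constants are not handled.

```python
import re

MASK32 = 0xFFFFFFFF
TOKEN = re.compile(r"\s*(<=|>=|<>|!=|==|[A-Za-z_]\w*|\d+|[-+*/()<>=])")


def tokenize(text):
    """Split text into tokens, rejecting anything the pattern cannot match."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"bad input at {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def do_terms(toks, env, stack):
    """Push the value of an identifier, constant, or ( subexpression )."""
    tok = toks.pop(0)
    if tok == "(":
        do_cmp_ops(toks, env, stack)          # indirect recursion
        if not toks or toks.pop(0) != ")":
            raise ValueError("expected ')'")
    elif tok.isdigit():
        stack.append(int(tok) & MASK32)
    else:
        stack.append(env[tok] & MASK32)


def binary_level(operators, next_level):
    """Build one precedence level: operand { operator operand }."""
    def level(toks, env, stack):
        next_level(toks, env, stack)
        while toks and toks[0] in operators:
            op = toks.pop(0)
            next_level(toks, env, stack)
            y = stack.pop()                   # TOS
            x = stack.pop()                   # NOS
            stack.append(operators[op](x, y) & MASK32)
    return level


do_mul_ops = binary_level({"*": lambda x, y: x * y,
                           "/": lambda x, y: x // y}, do_terms)
do_add_ops = binary_level({"+": lambda x, y: x + y,
                           "-": lambda x, y: x - y}, do_mul_ops)
do_cmp_ops = binary_level({"<": lambda x, y: int(x < y),
                           "<=": lambda x, y: int(x <= y),
                           ">": lambda x, y: int(x > y),
                           ">=": lambda x, y: int(x >= y),
                           "=": lambda x, y: int(x == y),
                           "==": lambda x, y: int(x == y),
                           "<>": lambda x, y: int(x != y),
                           "!=": lambda x, y: int(x != y)}, do_add_ops)


def u32expr(text, env):
    """Evaluate an uns32 expression; env maps names to their values."""
    toks, stack = tokenize(text), []
    do_cmp_ops(toks, env, stack)
    if toks:
        raise ValueError(f"unexpected token: {toks[0]!r}")
    return stack.pop()


print(u32expr("x + ecx/2", {"x": 10, "ecx": 12}))   # 16
```

Each level calls the next higher precedence level for its operands, which is what makes "*" and "/" bind more tightly than "+" and "-", and those more tightly than the comparisons.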
9.4