Retro Compiler Sample
T3X Corp.
[Figure 1: Compilation. Labels: source language S, implementation
language I, target language T; compiler, linker, library, executable.]
The Language
The source language and implementation language used in this
book is a subset of an obscure, little, procedural language called
T3X. It is a tiny language that once had a tiny community, and it
was even used to write some real-life software, like its own
compiler and linker, an integrated development environment, a
database system, and a few simple games. It was also the subject of
a few college courses, most probably because (1) it was
reasonably well defined and documented and (2) due to the size
of its community, nobody could be bothered to do your homework
assignments for you.
T3X looks like a mixture of Pascal, C, and BCPL. It has untyped
data and typed operators, which simplifies the compiler a lot, but
also leaves all the type checking to the programmer, which
requires some discipline on the side of the user — a perspective
that is not in vogue these days, where computing seems to be only
about productivity, safety, and security.
But this text is not about creating a product and making a shiny
web page about it. This is about diving right into the depths of the
matter and having some fun. And a fun language T3X is. It is
interesting to see how little you need to be able to write quite
comprehensible and expressive programs.
Syntax
Syntax is what a language looks like. T3X is a block-structured,
procedural language, which means that its programs describe
procedures for manipulating data, i.e. ‘‘what to do with data’’. It is
called block-structured, because it is a structured language that
divides programs into blocks. A block is a chunk of source code
that describes a part of a procedure.
A structured language uses certain constructs to describe the flow
of control while a program executes, typically selection and loops
(repetition).
Source code of procedural languages is mostly organized in the
following hierarchy, from more abstract to less abstract:
    Program
    Declaration
    Statement
    Expression
In procedural languages:
• programs contain declarations, statements, and expressions
• declarations contain statements and expressions
• statements contain expressions
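The containment rules above can be pictured as nested data types. The following Python sketch is only an illustration: the class names, the field names, and the depth helper are hypothetical and do not come from the compiler discussed in this text.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Expression:
    text: str                                   # e.g. "v[i] > v[i+1]"

@dataclass
class Statement:
    kind: str                                   # e.g. "IF", "WHILE", ":="
    expressions: List[Expression] = field(default_factory=list)
    body: List["Statement"] = field(default_factory=list)

@dataclass
class Declaration:
    name: str                                   # e.g. "bubblesort"
    statements: List[Statement] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)

@dataclass
class Program:
    declarations: List[Declaration] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)

def depth(stmt: Statement) -> int:
    """Nesting depth of a statement (1 for a statement without a body)."""
    return 1 + max((depth(s) for s in stmt.body), default=0)
```

Each level only holds items from the levels below it, which mirrors the "more abstract to less abstract" ordering.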
If you are familiar with C or Pascal or BCPL, the T3X syntax will
look quite familiar. Exhibit 4 displays the infamous bubblesort
algorithm in T3X.
The keywords are highlighted by using upper case in this example,
but this is not necessary and not usually done in actual code.
bubblesort(n,v) starts the declaration of the procedure
bubblesort with the formal arguments n and v. The body of the
procedure is a block statement (or compound statement) enclosed
in the keywords DO and END. The compound statement declares
the local variables i, swapped, and tmp.
! This is a comment
bubblesort(n, v) DO VAR i, swapped, tmp;
    swapped := %1;
    WHILE (swapped) DO
        swapped := 0;
        FOR (i=0, n-1) DO
            IF (v[i] > v[i+1]) DO
                tmp := v[i];
                v[i] := v[i+1];
                v[i+1] := tmp;
                swapped := %1;
            END
        END
    END
END
The lexeme %1 denotes the number −1. You could also write -1,
but there is a subtle difference: the former is a value and the latter
is an operator applied to a value, which will not work in contexts
where a constant is expected.
Furthermore, /\ and \/ denote logical (short-circuit) AND and
OR, and x->y:z means ‘‘if x then y else z’’, just like x?y:z in C.
IF with an ELSE is called IE (If/Else).
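For readers who find another notation easier to follow, here is a rough Python rendering of the bubblesort listing. It is a sketch, not a translation produced by any tool, and it assumes that the upper limit of the T3X FOR loop is exclusive, so that v[i+1] never reaches past element n-1.

```python
def bubblesort(n, v):
    """Sort the first n elements of list v in place, mirroring the T3X listing."""
    swapped = -1                     # plays the role of %1; -1 is truthy in Python
    while swapped:
        swapped = 0
        for i in range(0, n - 1):    # assumption: the FOR limit n-1 is exclusive
            if v[i] > v[i + 1]:
                tmp = v[i]
                v[i] = v[i + 1]
                v[i + 1] = tmp
                swapped = -1
    return v
```

The outer loop ends after the first pass that performs no swap, exactly as the WHILE (swapped) loop does.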
You will pick up the rest of the T3X syntax as we walk through the
compiler source code. If you are interested, there is a brief
introduction to T3X in the appendix.
Semantics
Semantics is how the syntax is interpreted. Note that ‘‘interpreted’’
does not imply the use of interpreting software here. Interpretation
can be done at various levels, and in the case of the T3X compiler
presented here, the code will eventually be interpreted by a Z80
CPU.
Interpretation in this case is a question of meaning. What does a
statement like
WHILE (swapped) DO ... END
mean? To you, it is probably obvious that it means: ‘‘while the
value of swapped is a ‘true’ value, repeat everything between DO
and END’’.
But now we need to know what a ‘‘true’’ value is and what
‘‘repetition’’ means. This is what the semantics of a language
describes.
For example, the expressions v[i] and s::i both denote the i-th
element of a vector. However, the first variant describes the i-th
machine word in a vector of machine words, and the second one
describes the i-th byte in a byte vector.
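The difference between the two subscript operators can be modeled on a flat block of memory. The sketch below assumes 16-bit machine words stored low byte first, as on the Z80; the function names word_at and byte_at are hypothetical.

```python
def byte_at(mem: bytes, s: int, i: int) -> int:
    """Model of s::i: the i-th byte of the byte vector at address s."""
    return mem[s + i]

def word_at(mem: bytes, v: int, i: int) -> int:
    """Model of v[i]: the i-th 16-bit word of the vector at address v.

    Assumes two bytes per machine word, low byte first.
    """
    a = v + 2 * i
    return mem[a] | (mem[a + 1] << 8)

mem = bytes([0x34, 0x12, 0x78, 0x56])
print(hex(word_at(mem, 0, 1)))   # second word
print(hex(byte_at(mem, 0, 1)))   # second byte
```

The same index selects different memory locations depending on the operator, which is why the choice is left to the programmer.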
In this book, semantics will be specified in three different ways:
• by diagrams describing program flow;
• by short machine code sequences that resemble the meaning of
a language construct;
• by simple mathematical formulae.
For instance, the meaning of the [] operator in the expression
v[i] would be specified as follows, assuming that the value of i is
stored in the hl register and the address of v is on top of the stack.
    add  hl,hl      ; hl = 2 ⋅ i
    pop  de         ; de = v
    add  hl,de      ; hl = address of v[i]
    ld   a,(hl)     ; hl = (hl)
    inc  hl
    ld   h,(hl)
    ld   l,a
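The effect of the instruction sequence can be traced one step at a time. The following Python sketch models memory as a byte array and the stack as a list; it is an illustration of the sequence above under those modeling assumptions, not part of the compiler, and the name fetch_word is hypothetical.

```python
def fetch_word(mem, stack, hl):
    """Trace the code for v[i]: hl holds i, the top of the stack holds v."""
    hl = (hl + hl) & 0xFFFF      # add hl,hl    hl = 2 * i
    de = stack.pop()             # pop de       de = v
    hl = (hl + de) & 0xFFFF      # add hl,de    hl = address of v[i]
    a = mem[hl]                  # ld  a,(hl)   a = low byte of v[i]
    hl = (hl + 1) & 0xFFFF       # inc hl
    high = mem[hl]               # ld  h,(hl)   h = high byte of v[i]
    low = a                      # ld  l,a      l = low byte of v[i]
    return (high << 8) | low     # hl now holds the value of v[i]
```

The low byte has to be parked in the accumulator because loading l first would destroy the address that is still needed to fetch the high byte.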
Bootstrapping
[Figure: bootstrapping. Labels: bootstrapping compiler source code
(language B), full compiler source code (language S), pre-existing
B-compiler (stage 0), S-compiler (stage 1); arrows: read, generate.]
[Figure 10: Triple Test (light gray boxes indicate binaries). Labels:
bootstrapping compiler source code, full compiler source code,
pre-existing compiler, stage-0, stage-1, stage-2, and stage-3
compilers; arrows: read, generate; annotation: equal stages.]
essential step in the testing of the compiler, though. When this test
fails, the compiler cannot be assumed to generate correct code.
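Figure 10 marks some stages as equal. A minimal sketch of such an equality check follows, assuming that the binaries being compared are those of the stage-2 and stage-3 compilers and that "equal" means byte-for-byte identical. Both function names are hypothetical.

```python
def first_difference(a: bytes, b: bytes) -> int:
    """Return the offset of the first differing byte, or -1 if a equals b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Common prefix is identical; a length mismatch still counts as a difference.
    return -1 if len(a) == len(b) else min(len(a), len(b))

def triple_test_passes(stage2: bytes, stage3: bytes) -> bool:
    """Assumed criterion: the two binaries must be identical."""
    return first_difference(stage2, stage3) == -1
```

Reporting the offset of the first difference, rather than a bare yes or no, gives a starting point for locating the miscompiled code.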
The FCB structure should be an old acquaintance at this point. Note that
it is a byte-field structure, so the proper way to define an FCB is
VAR Fcb::CPM.FCB;
and its fields must be accessed with the byte-subscript operator x::y.
FCB_SEQREC = 32,
FCB_RANRECL = 33,
FCB_RANRECH = 34,
FCB_RANOVFL = 35;
! jp 0x014d
public inline expandfn(2) = [ 0xc3, 0x4d, 0x01 ];
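The three bytes of the inline function encode the jump named in the comment: 0xc3 is the Z80 opcode for "jp nn", and the 16-bit target follows with its low byte first. A small Python sketch that decodes such a sequence (decode_jp is a hypothetical helper, not part of T3X):

```python
def decode_jp(code):
    """Decode a 3-byte Z80 'jp nn' instruction and return the target address."""
    opcode, lo, hi = code
    if opcode != 0xC3:               # 0xC3 is the opcode of 'jp nn'
        raise ValueError("not a jp nn instruction")
    return lo | (hi << 8)            # the operand is stored low byte first

print(hex(decode_jp([0xc3, 0x4d, 0x01])))
```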
The following functions provide mnemonic names for the CP/M 2.2
BDOS functions. For example,
var Fcb::CPM.FCB;
var Buf::128;
A buffer for file names and a vector of pointers to the file names in
the buffer. Each file name in an FCB has a length of 11 characters.
var Files::MAXFILES*11;
var Ptrs[MAXFILES];
var ntoa_buf::10;
ntoa(x) do var i;
    if (x = 0) return "0";
    ntoa_buf::9 := 0;
    i := 9;
    while (x) do
        i := i-1;
        ntoa_buf::i := x mod 10 + '0';
        x := x / 10;
    end
    return @ntoa_buf::i;
end
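The ntoa function builds the digits from right to left in a fixed buffer and returns the address of the first digit. The same idea in Python, as a sketch that assumes non-negative inputs of at most nine digits so that the index never runs off the front of the buffer:

```python
def ntoa(x: int) -> str:
    """Convert a non-negative integer to decimal, filling a buffer from the end."""
    if x == 0:
        return "0"
    buf = bytearray(10)              # like ntoa_buf::10; slot 9 is the terminator
    i = 9
    while x:                         # assumes 0 < x < 10**9
        i -= 1
        buf[i] = x % 10 + ord("0")   # least significant digit first
        x //= 10
    return buf[i:9].decode("ascii")  # digits from position i up to the terminator
```

Filling the buffer backwards avoids a separate step for reversing the digits.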
Sort the pointers to the file names in Ptrs so that they point to
names in lexicographic order. The sortdir function uses a
slightly optimized Bubblesort algorithm, which should be good
enough for sorting rather small directories.
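The source of sortdir is not reproduced here, so the following Python sketch only illustrates the idea. It assumes that the names are 11-byte entries in one flat buffer, that the pointers are offsets into that buffer, and that "slightly optimized" means stopping early when a pass makes no swap and shrinking the unsorted range after each pass. All of these are assumptions.

```python
NAME_LEN = 11  # length of a file name in an FCB, as stated above

def sortdir(files: bytes, ptrs: list, k: int) -> None:
    """Reorder the first k offsets in ptrs so the names they select ascend."""
    def name(p):
        return files[p:p + NAME_LEN]

    limit = k - 1
    swapped = True
    while swapped and limit > 0:
        swapped = False
        for i in range(limit):
            if name(ptrs[i]) > name(ptrs[i + 1]):
                ptrs[i], ptrs[i + 1] = ptrs[i + 1], ptrs[i]
                swapped = True
        limit -= 1                   # the largest remaining name is now in place
```

Only the pointers move; the names themselves stay where they were stored, which keeps each swap cheap.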
Read all file names on the given disk and store them in the Files
buffer and pointers to the names in the Ptrs vector. When a drive
letter is given, insert it into the match string; otherwise use a
match string without a drive letter. In a file match, every question
mark matches any character in a file name, so a file match of the
form ????????.??? will match any file name. Note that the dot has a
special meaning in cpm.expandfn and must not be replaced with
a question mark.
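The two ideas in this paragraph, building the match string and matching with question marks, can be sketched in Python. The "D:" prefix for the drive letter follows the usual CP/M notation and is an assumption here, as are both function names.

```python
def match_string(drive: str = "") -> str:
    """Build a match-all pattern, optionally prefixed with a drive letter."""
    pattern = "????????.???"         # the dot is kept, only name characters vary
    return f"{drive.upper()}:{pattern}" if drive else pattern

def matches(pattern11: str, name11: str) -> bool:
    """Compare two 11-character names; '?' in the pattern matches anything."""
    return len(pattern11) == len(name11) == 11 and all(
        p == "?" or p == n for p, n in zip(pattern11, name11))
```

The matches function works on the expanded 11-character form (8 characters of name, 3 of extension, no dot), which is the form a file name takes inside an FCB.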
The cpm.search and cpm.searchnext functions both return a
directory code, which is an index into the DMA buffer. The search
functions read the directory one record at a time, and four
directory entries fit in a 128-byte record, so the functions return a
value in the range 0..3 upon success. To locate the directory
entry in the buffer, the return value has to be multiplied by 32. The
layout of the name in the directory entry is the same as in an FCB.
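Locating a name from a directory code is a matter of simple arithmetic. The Python sketch below assumes that the name occupies 11 bytes starting one byte into the 32-byte entry, which is where the name field sits in a CP/M FCB; entry_name is a hypothetical helper.

```python
ENTRY_SIZE = 32    # four directory entries per 128-byte record
NAME_OFFSET = 1    # assumption: one leading byte precedes the name, as in an FCB
NAME_LEN = 11

def entry_name(dma: bytes, code: int) -> bytes:
    """Return the 11-byte name of the directory entry selected by code (0..3)."""
    if not 0 <= code <= 3:
        raise ValueError("directory code out of range")
    start = code * ENTRY_SIZE + NAME_OFFSET
    return dma[start:start + NAME_LEN]
```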
The readdir function returns the number of file names stored in
the Files buffer.
filesize(i) do
    t.memcopy(@Fcb::CPM.FCB_NAME, Ptrs[i], 11);
    cpm.getfsiz(Fcb);
    ! ignoring FCB_RANOVFL
    return (Fcb::CPM.FCB_RANRECL +
            Fcb::CPM.FCB_RANRECH * 256 + 7) / 8;
end
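The arithmetic in filesize converts a count of 128-byte records into kilobytes, rounding up. A Python sketch of just that computation, using the byte offsets 33 and 34 from the FCB constants and a plain byte array in place of a real FCB:

```python
FCB_RANRECL = 33   # low byte of the random record number
FCB_RANRECH = 34   # high byte of the random record number

def size_in_kb(fcb: bytes) -> int:
    """Round the 128-byte record count stored in the FCB up to whole kilobytes."""
    records = fcb[FCB_RANRECL] + fcb[FCB_RANRECH] * 256
    return (records + 7) // 8      # 8 records of 128 bytes make 1K
```

Adding 7 before the division makes a file of one to eight records count as 1K instead of being truncated to 0K.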
Print the size in kilobytes and the name of a file on the console.
printfile(i) do var n, j, s;
    n := filesize(i);
    s := ntoa(n);
    for (j = length(s), 4) writes("\s");
    writes(s);
    writes("K\s\s");
    t.write(T3X.SYSOUT, Ptrs[i], 8);
    writes("\s");
    t.write(T3X.SYSOUT, Ptrs[i]+8, 3);
end
const COLS = 3;
printdir(k) do var i, j, m, n;
    ie (k mod COLS)
        n := k / COLS + 1;
    else
        n := k / COLS;
    i := 0;
    for (i=0, n) do
        m := i;
        for (j=0, COLS) do
            if (m < k) do
                printfile(m);
                if (m+n < k) writes(" : ");
            end
            m := m + n;
        end
        nl();
    end
end
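The index arithmetic in printdir lists the sorted entries column by column: row i shows the entries i, i+n, i+2n, and so on, where n is the number of rows. A Python sketch that computes only the layout (the function name layout is hypothetical, and nothing is printed):

```python
COLS = 3

def layout(k: int, cols: int = COLS):
    """Return rows of entry indices for k entries listed column by column."""
    n = (k + cols - 1) // cols       # number of rows, rounded up
    rows = []
    for i in range(n):
        # Step through the row in strides of n, skipping indices past the end.
        rows.append([m for m in range(i, i + cols * n, n) if m < k])
    return rows

for row in layout(7):
    print(row)
```

With seven entries this yields three rows, the first holding entries 0, 3, and 6, so that reading down each column follows the sorted order.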
do var d, k, b::2;
    d := 0;
    if (t.getarg(1, b, 2) >= 0)
        d := upcase(b::0);
    k := readdir(d);
    sortdir(k);
    printdir(k);
end
The runtime library dump alone has a size of about 10K bytes.
The Z80 assembly language code from which the runtime library
is compiled has a size of 1200 lines and compiles to a binary of
1670 bytes.
The stage-3 executable of the compiler has a size of 34834 bytes
and self-compiles in less than 10 minutes on a 4 MHz Z80 system
with its file system on an SRAM card. On a floppy-based system,
the compile time could be expected to be much longer, of course.
System           Speed (MHz)  File System  Time       LPM  LPM/MHz
CP/M Z80¹               4.00  SRAM         9m20s      251       63
DOS V20²                4.77  CF³          8m31s      274       58
CP/M Z80                7.40  CF           5m37s      434       59
FreeBSD TCVM⁴        1000.00  SSD⁵         0.80s  164,000      164
FreeBSD x86-64       1000.00  SSD          0.15s  933,000      933
Figure 83: Self-Compilation Speed, LPM = lines/minute