A Retargetable C Compiler Design and Implementation
Christopher W. Fraser
AT&T BELL LABORATORIES
ISBN 0-8053-1670-1
1 2 3 4 5 6 7 8 9 10 DOC 98 97 96 95
The Benjamin/Cummings Publishing Company, Inc.
390 Bridge Parkway
Redwood City, CA 94065
To Linda
To Maylee
Contents
Preface xiii
1 Introduction 1
1.1 Literate Programs 1
1.2 How to Read This Book 2
1.3 Overview 4
1.4 Design 11
1.5 Common Declarations 16
1.6 Syntax Specifications 19
1.7 Errors 20
Further Reading 21
2 Storage Management 23
2.1 Memory Management Interface 23
2.2 Arena Representation 24
2.3 Allocating Space 26
2.4 Deallocating Space 28
2.5 Strings 28
Further Reading 31
Exercises 32
3 Symbol Management 35
3.1 Representing Symbols 37
3.2 Representing Symbol Tables 39
3.3 Changing Scope 42
3.4 Finding and Installing Identifiers 44
3.5 Labels 45
3.6 Constants 47
3.7 Generated Variables 49
Further Reading 51
Exercises 51
4 Types 53
4.1 Representing Types 53
4.2 Type Management 56
4.3 Type Predicates 60
4.4 Type Constructors 61
4.5 Function Types 63
4.6 Structure and Enumeration Types 65
4.7 Type-Checking Functions 69
4.8 Type Mapping 73
Further Reading 74
Exercises 74
7 Parsing 127
7.1 Languages and Grammars 127
7.2 Ambiguity and Parse Trees 128
7.3 Top-Down Parsing 132
7.4 FIRST and FOLLOW Sets 134
7.5 Writing Parsing Functions 137
7.6 Handling Syntax Errors 140
Further Reading 145
Exercises 145
8 Expressions 147
8.1 Representing Expressions 147
8.2 Parsing Expressions 151
8.3 Parsing C Expressions 154
8.4 Assignment Expressions 156
8.5 Conditional Expressions 159
8.6 Binary Expressions 160
8.7 Unary and Postfix Expressions 163
8.8 Primary Expressions 166
Further Reading 170
Exercises 171
10 Statements 216
10.1 Representing Code 216
10.2 Execution Points 220
10.3 Recognizing Statements 221
10.4 If Statements 224
10.5 Labels and Gotos 226
11 Declarations 252
11.1 Translation Units 253
11.2 Declarations 254
11.3 Declarators 265
11.4 Function Declarators 270
11.5 Structure Specifiers 276
11.6 Function Definitions 285
11.7 Compound Statements 293
11.8 Finalization 303
11.9 The Main Program 305
Further Reading 308
Exercises 308
19 Retrospective 526
19.1 Data Structures 526
19.2 Interface 527
19.3 Syntactic and Semantic Analyses 529
19.4 Code Generation and Optimization 531
19.5 Testing and Validation 531
Further Reading 533
Bibliography 535
Index 541
Acknowledgments
This book owes much to the many lcc users at AT&T Bell Laboratories,
Princeton University, and elsewhere who suffered through bugs and pro-
vided valuable feedback. Those who deserve explicit thanks include Hans
Boehm, Mary Fernandez, Michael Golan, Paul Haahr, Brian Kernighan,
Doug McIlroy, Rob Pike, Dennis Ritchie, and Ravi Sethi. Ronald
Guilmette, David Kristal, David Prosser, and Dennis Ritchie provided
valuable information concerning the fine points of the ANSI Standard and
its interpretation. David Gay helped us adapt the PFORT library of
numerical software to be an invaluable stress test for lcc's code
generators.
Careful reviews of both our code and our prose by Jack Davidson,
Todd Proebsting, Norman Ramsey, William Waite, and David Wall con-
tributed significantly to the quality of both. Our thanks to Steve Beck,
who installed and massaged the fonts used for this book, and to Maylee
Noah, who did the artwork with Adobe Illustrator.
Christopher W. Fraser
David R. Hanson
1
Introduction
2 CHAPTER 1 • INTRODUCTION
1.3 Overview
lcc transforms a source program to an assembler language program.
Following a sample program through the intermediate steps in this
transformation illustrates lcc's major components and data structures.
Each step transforms the program into a different representation:
preprocessed source, tokens, trees, directed acyclic graphs, and lists
of these graphs are examples. The initial source code is:
int round(f) float f; {
    return f + 0.5;    /* truncates */
}
INT      inttype
ID       "round"
'('
ID       "f"
')'
FLOAT    floattype
ID       "f"
';'
'{'
RETURN
ID       "f"
'+'
FCON     0.5
';'
'}'
EOI
file with a pair of # directives, and every other one names a line other
than 1.
The compiler proper picks up where the preprocessor leaves off. It
starts with the lexical analyzer or scanner, which breaks the input into
the tokens shown in Figure 1.1. The left column is the token code, which
is a small integer, and the right column is the associated value, if there
is one. For example, the value associated with the keyword int is the
value of inttype, which represents the type integer. The token codes for
single-character tokens are the ASCII codes for the characters themselves,
and EOI marks the end of the input. The lexical analyzer posts the source
coordinate for each token, and it processes the # directive; the rest of the
compiler never sees such directives. lcc's lexical analyzer is described
in Chapter 6.
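A scanner with this shape can be sketched in a few lines. The token
codes, the name next_token, and the treatment of every number as a
floating constant are assumptions made for illustration; lcc's own
scanner differs, and it also records values and source coordinates,
which this sketch omits. Comments are not handled.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes. Multi-character tokens get codes outside
   the character range so that a single-character token can use its
   own character code. */
enum { EOI = 0, ID = 256, INT, FLOAT, RETURN, FCON };

/* Scans one token starting at *pp, advances *pp past it, and returns
   the token code. */
int next_token(const char **pp) {
    const char *p = *pp, *start;
    int code;

    while (isspace((unsigned char)*p))
        p++;
    start = p;
    if (*p == '\0')
        code = EOI;
    else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        if (p - start == 3 && strncmp(start, "int", 3) == 0)
            code = INT;
        else if (p - start == 5 && strncmp(start, "float", 5) == 0)
            code = FLOAT;
        else if (p - start == 6 && strncmp(start, "return", 6) == 0)
            code = RETURN;
        else
            code = ID;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) || *p == '.')
            p++;
        code = FCON;    /* simplification: every number is an FCON */
    } else
        code = (unsigned char)*p++;    /* the character is the code */
    *pp = p;
    return code;
}
```

Calling next_token repeatedly on the sample's text yields the same
sequence of token codes as Figure 1.1, ending with EOI.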
The next compiler phase parses the token stream according to the
syntax rules of the C language. It also analyzes the program for seman-
tic correctness. For example, it checks that the types of the operands
in operations, such as addition, are legal, and it checks for implicit
conversions. For example, in the sample's addition, f is a float and 0.5 is
a double, which is a legal combination, and the sum is converted from
double to int implicitly because round's return type is int.
The outcome of this phase for the sample is the two decorated abstract
syntax trees shown in Figure 1.2. Each node represents one basic
operation. The first tree reduces the incoming double to a float. It as-
signs a float (ASGN+F) to the cell with the address &f (the left ADDRF+P).
It computes the value to assign by converting to float (CVD+F) the double
fetched (INDIR+D) from address &f (the right ADDRF+P).
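The conversions that the trees make explicit can also be written out by
hand. The following is a sketch, not lcc output: the name round_explicit
is hypothetical, and the function takes its argument as a double to
mimic what a caller passes for an old-style float parameter.

```c
/* Hand-written equivalent of the sample with every implicit
   conversion spelled out. round_explicit is a hypothetical name. */
int round_explicit(double incoming) {
    float f = (float)incoming;      /* CVD+F, then ASGN+F */
    double sum = (double)f + 0.5;   /* CVF+D widens f; ADD+D adds 0.5 */
    return (int)sum;                /* CVD+I truncates; RET+I returns */
}
```

Because the final conversion truncates toward zero, round_explicit(2.4)
yields 2 and round_explicit(2.5) yields 3.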
[Figure 1.2: the two abstract syntax trees for the sample. The first is
ASGN+F over ADDRF+P and CVD+F, INDIR+D, ADDRF+P; the second is RET+I
over CVD+I and ADD+D, whose operands are CVF+D, INDIR+F, ADDRF+P and
CNST+D 0.5. The callee's "f" is a float and the caller's "f" is a double.]
[Figure 1.3: the dags for the sample, built from ADDRFP, CVDF, CVDI,
INDIRD, ADDD, CVFD, INDIRF, and ADDRGP nodes.]
FIGURE 1.4 Code list for the sample.
[Entries: Start, Gen, Blockbeg, Defpoint 8,2,"sample.c", Gen, Blockend,
Label, Defpoint 0,3,"sample.c".]
At this point, the structures that represent the program pass from
lcc's machine-independent front end into its back end, which translates
these structures into assembler code for the target machine. One can
hand-code a back end to emit code for a specific machine; such code
generators are often largely machine-specific and must be replaced en-
tirely for a new target.
The code generators in this book are driven by tables and a tree gram-
mar that maps dags to instructions as described in Chapters 13-18. This
organization makes the back ends partly independent of the target ma-
chine; that is, only part of the back end must be replaced for a new
target. The other part could be moved into the front end - which serves
all code generators for all target machines - but this step would
complicate using lcc with a different kind of code generator, so it has not
been taken.
The code generator operates by annotating the dags. It first identifies
an assembler-code template - an instruction or operand - that imple-
ments each node. Figure 1.5 shows the sample's dags annotated with
assembler code for the 386 or compatibles, henceforth termed X86. %n
denotes the assembler code for child n where the leftmost child is num-
bered 0, and %letter denotes one of the symbol-table entries at which
the node points. In this figure, the solid lines link instructions, and the
dashed lines link parts of instructions, such as addressing modes, to the
instructions in which they are used. For example, in the first dag, the
ASGNF and INDIRD nodes hold instructions, and the two ADDRGP nodes
hold their operands. Also, the CVDF node that was in the right operand
of the ASGNF in Figure 1.3 is gone - it's been swallowed by the instruction
selection because the instruction associated with the ASGNF does both the
conversion and the assignment. Chapter 14 describes the mechanics of
instruction selection and lburg, a program that generates selection code
from compact specifications.
For those who don't know X86 assembler code, fld loads a
floating-point value onto a stack; fstp pops one off and stores it; fistp does
likewise but truncates the value and stores the resulting integer instead;
fadd pops two values off and pushes their sum; and pop pops an integral
value off the stack into a register. Chapter 18 elaborates.
The assembler code is easier to read after the compiler takes its next
step, which chains together the nodes that correspond to instructions
in the order in which they're to be emitted, and allocates a register for
each node that needs one. Figure 1.6 shows the linearized instructions
and registers allocated for our sample program. The figure is a bit of
a fiction - the operands aren't actually substituted into the instruction
templates until later - but the white lie helps here.
[FIGURE 1.5 After selecting instructions. The surviving nodes are CVFD,
INDIRD, INDIRF, ADDRGP, and ADDRFP; the surviving annotations are
"# nop\n", " qword ptr %0", "fld dword ptr %0\n", "%a", and "%a[ebp]".]
Like many compilers that originated on UNIX systems, lcc emits
assembler code and is used with a separate assembler and linker. This
book's back ends work with the vendors' assemblers on MIPS and SPARC
systems, and with Microsoft's MASM 6.11 and Borland's Turbo Assembler
4.0 under DOS. lcc generates the assembler language shown in Figure 1.7
for our sample program. The lines in this code delimit its major parts.
The first part is the boilerplate of assembler directives emitted for every
program. The second part is the entry sequence for round. The four push
instructions save the values of some registers, and the mov instruction
establishes the frame pointer for this invocation of round.
; boilerplate
    .486
    .model small
    extrn __turboFloat:near
    extrn __setargv:near
; entry sequence
    public _round
_TEXT segment
_round:
    push ebx
    push esi
    push edi
    push ebp
    mov ebp,esp
; body of round
    fld qword ptr 20[ebp]
    fstp dword ptr 20[ebp]
    fld dword ptr 20[ebp]
    fadd qword ptr L2
    sub esp,4
    fistp dword ptr 0[esp]
    pop eax
; exit sequence
L1:
    mov esp,ebp
    pop ebp
    pop edi
    pop esi
    pop ebx
    ret
; initialized data & boilerplate
_TEXT ends
_DATA segment
    align 4
L2 label byte
    dd 00H,03fe00000H
_DATA ends
    end
The third part is the code emitted from the annotated dags shown in
Figure 1.5 with the symbol-table data filled in. The fourth part is round's
exit sequence, which restores the registers saved in the entry sequence
and returns to the caller. L1 labels the exit sequence. The last part holds
initialized data and concluding boilerplate. For round, these data consist
only of the constant 0.5; L2 is the address of a variable initialized to
00000000 3fe00000 (hexadecimal, low-order word first), which is the IEEE
floating-point representation for the 64-bit, double-precision constant 0.5.
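The bit pattern can be checked on any host that stores doubles in IEEE
754 format, which is an assumption of the sketch below. The function
names are hypothetical. The low-order word comes first in the dd
directive because the X86 is little-endian.

```c
#include <stdint.h>
#include <string.h>

/* Returns the 64-bit pattern of a double; assumes the host uses the
   IEEE 754 64-bit format and that double occupies 8 bytes. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret without aliasing */
    return bits;
}

/* The two 32-bit words in the order a little-endian target stores them. */
uint32_t low_word(double d)  { return (uint32_t)(double_bits(d) & 0xFFFFFFFFu); }
uint32_t high_word(double d) { return (uint32_t)(double_bits(d) >> 32); }
```

For 0.5, low_word gives 0 and high_word gives 3fe00000 in hexadecimal,
which are the two values in the dd directive at L2.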
1.4 Design
There was no separate design phase for lcc. It began as a compiler for
a subset of C, so its initial design goals were modest and focussed on
its use in teaching about compiler implementation in general and about
code generation in particular. Even as lcc evolved into a compiler for
ANSI C that suits production use, the design goals changed little.
Computing costs less and less, but programmers cost more and more.
When obliged to choose between two designs, we usually chose the one
that appeared to save our time and yours, as long as the quality of the
generated code remained satisfactory. This priority made lcc simple,
fast, and less ambitious at optimizing than some competing compilers.
lcc was to have multiple targets, and it was overall simplicity that
counted. That is, we wrote extra code in lcc's one machine-independent
part to save code in its multiple target-specific parts. Most of the design
and implementation effort devoted to lcc has been directed at making
it easy to port lcc to new targets.
lcc had to be simple because it was being written by only two
programmers with many other demands on their time. Simplicity saved
implementation time and saves more when it comes time to change the
compiler. Also, we wanted to write this book, and you'll see that it was
hard to make even a simple compiler fit.
lcc is smaller and faster than most other ANSI C compilers.
Compilation speed is sometimes neglected in compiler design, but it is
widely appreciated; users often cite compilation speed as one of the
reasons they use lcc. Fast compilation was not a design goal per se; it's
a consequence of striving for simplicity and of paying attention to those
relatively few compiler components where speed really matters. lcc's
lexical analysis
(Chapter 6) and instruction selection (Chapter 14) are particularly fast,
and contribute most to its speed.
lcc generates reasonably efficient object code. It's designed
specifically to generate good local code; global optimizations, like those
done by optimizing compilers, were not part of lcc's design. Most modern
compilers, particularly those written by a CPU vendor to support its
machines, must implement ambitious optimizers so that benchmarks put
their machines in the best light. Such compilers are complex and
typically supported by groups of tens of programmers. Highly optimizing
C compilers generate more efficient code than lcc does when their
optimization options are enabled, but the hundreds of programmers who
use lcc daily as their primary C compiler find that its generated code
is fast enough for most applications, and they save another scarce
resource - their own time - because lcc runs faster. And lcc is easier
to understand when systems programmers find they must change it.
Compilers don't live in a vacuum. They must cooperate with
preprocessors, linkers, loaders, debuggers, assemblers, and operating
systems, all of which may depend on the target. Handling all of the
target-dependent variants of each of these components is impractical.
lcc's design minimizes the adverse impact of these components as much as
possible. For example, its target-dependent code generators emit assem-
bler language and rely on the target's assembler to produce object code.
It also relies on the availability of a separate preprocessor. These design
decisions are not without some risk; for example, in vendor-supplied as-
semblers, we have tripped across several bugs over which we have no
control and thus must live with.
A more important example is generating code with calling sequences
that are compatible with the target's conventions. It must be possible
for lcc to do this so it can use existing libraries. A standard ANSI C
library is a significant undertaking on its own, but even if lcc came with
its own library, it would still need to be able to call routines in target-
specific libraries, such as those that supply system calls. The same con-
straint applies to proprietary third-party libraries, which are increasingly
important and are usually available only in object-code form.
Generating compatible code has significant design consequences on
both lcc's target-independent front end and its target-dependent back
ends. A good part of the apparent complexity in the interface between
the front and back ends, detailed in Chapter 5, is due directly to the
tension between this design constraint and those that strive for simplic-
ity and retargetability. The mechanisms in the interface that deal with
passing and returning structures are an example.
lcc's front end is roughly 9,000 lines of code. Its target-dependent
code generators are each about 700 lines, and there are about 1,000
lines of target-independent back-end code that are shared between the
code generators.
With a few exceptions, lcc's front end uses well established compiler
techniques. As surveyed in the previous section, the front end per-
forms lexical, syntactic, and semantic analysis. It also eliminates local
common subexpressions (Chapter 12), folds constant expressions, and
makes many simple, machine-independent transformations that improve
the quality of local code (Chapter 9); many of these improvements are
simple tree transformations that lead to better addressing code. It also
lays down efficient code for loops and switch statements (Chapter 10).
lcc's lexical analyzer and its recursive-descent parser are both written
by hand. Using compiler-construction tools, such as parser generators, is
perhaps the more modern approach for implementing these components,
but using them would make lcc dependent on specific tools. Such
dependencies are less a problem now than when lcc was first available, but
there's little incentive to change working code. Theoretically, using these
kinds of tools simplifies both future changes and fixing errors, but
accommodating change is less important for a standardized language like
ANSI C, and there have been few lexical or syntactic errors. Indeed,
probably less than 15 percent of lcc's code concerns parsing, and the error
rate in that code is negligible. Despite its theoretical prominence,
parsing is a relatively minor component in lcc and other compilers; semantic
analysis and code generation are the major components and account for
most of the code - and have most of the bugs.
One reason that lcc's back ends are its most interesting components
is that they show the results of the design choices we made
to enhance retargetability. For retargeting, future changes - each new
target - are important, and the retargeting process must make it
reasonably easy to cope with code-generation errors, which are certain to
occur. There are many small design decisions made throughout lcc that
affect retargetability, but two dominate.
First, the back ends use a code-generator generator, lburg, that
produces code generators from compact specifications. These specifications
describe how dags are mapped into instructions or parts thereof
(Chapter 14). This approach simplifies writing a code generator, generates
optimal local code, and helps avoid errors because lburg does most of
the tedious work. One of the lburg specifications in this book can often
be used as a starting point for a new target, so retargeters don't have to
start from scratch. To avoid depending on foreign tools, the companion
diskette includes lburg, which is written in ANSI C.
Second, whenever practical, the front end implements as much of an
apparently target-dependent function as possible. For example, the front
end implements switch statements completely, and it implements access
to bit fields by synthesizing appropriate combinations of shifting and
masking. Doing so precludes the use of instructions designed specifi-
cally for bit-field access and switch statements on those increasingly few
targets that have them; simplifying retargeting was deemed more impor-
tant. The front end can also completely implement passing or returning
structures, and it does so using techniques that are often used in target-
dependent calling conventions. These capabilities are under the control
of interface options, so, on some targets, the back end can ignore these
aspects of code generation by setting the appropriate option.
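Access to a bit field by shifting and masking can be sketched as
follows. The function names and the (shift, width) description of a
field are hypothetical, and the sketch assumes that unsigned has at
least 32 bits; it shows the general technique rather than the code that
lcc's front end generates.

```c
/* A field is `width` bits wide and starts `shift` bits above the
   low-order end of the word. Requires 0 < width < 32 and
   shift + width <= 32. */
unsigned field_get(unsigned word, unsigned shift, unsigned width) {
    unsigned mask = (1u << width) - 1;     /* width low-order ones */
    return (word >> shift) & mask;
}

unsigned field_set(unsigned word, unsigned shift, unsigned width,
                   unsigned value) {
    unsigned mask = ((1u << width) - 1) << shift;
    /* clear the old field, then insert the new value */
    return (word & ~mask) | ((value << shift) & mask);
}
```

For example, field_get(0xABCD, 4, 8) extracts 0xBC, and
field_set(0xABCD, 4, 8, 0x12) produces 0xA12D.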
While lcc's overall design goals changed little as the compiler evolved,
the ways in which these goals were realized changed often. Most of these
changes swept more functionality into the front end. The switch
statement is an example. In earlier versions of lcc, the code-generation
interface included functions that the back end provided specifically to emit
the selection code for a switch statement. As new targets were added,
it became apparent that the new versions of these functions were nearly
identical to the corresponding functions in existing targets. This experi-
ence revealed the relatively simple design changes that permitted all of
this code to be moved into the front end. Doing so required changing
all of the existing back ends, but these changes removed code, and the
design changes simplify the back ends on future targets.
The most significant and most recent design change involves the way
lcc is packaged. Previously, lcc was configured with one back end; that
is, the back end for target X was combined with the front end to form
an instance of lcc that ran on X and generated code for X. Most of
lcc's back ends generate code for more than one operating system. Its
MIPS back end, for example, generates code for MIPS computers that
run DEC's Ultrix or SGI's IRIX, so two instances of lcc were configured.
N targets and M operating systems required N x M instances of lcc
in order to test them completely, and each one was configured from a
slightly different set of source modules depending on the target and the
operating system. For even small values of N and M, building N x M
compilers quickly becomes tedious and prone to error.
In developing the current version of lcc for this book, we changed the
code-generation interface, described in Chapter 5, so that it's possible to
combine all of the back ends into a single program. Any instance of lcc
is a cross-compiler. That is, it can generate code for any of its targets
regardless of the operating system on which it runs. A command-line
option selects the desired target. This design packages all target-specific
data in a structure, and the option selects the appropriate structure,
which the front end then uses to communicate with the back end. This
change again required modifying all of the existing back ends, but the
changes added little new code. The benefits were worth the effort: Only
M instances of lcc are now needed, and they're all built from one set of
source modules. Bugs tend to be easier to decrypt because they can
usually be reproduced in all instances of lcc by specifying the appropriate
target, and it's possible to include targets whose sole purpose is to help
diagnose bugs. It's still possible to build a one-target instance of lcc,
when it's important to save space.
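The packaging idea can be sketched with a table of per-target
structures that is searched by name. Everything in this sketch is
hypothetical: struct target, its two fields, the target names, and
select_target are stand-ins chosen for illustration, and the actual
interface record has many more members.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-target record: a name to match against the
   command-line option and one back-end function. */
struct target {
    const char *name;
    const char *(*describe)(void);
};

static const char *mips_describe(void)  { return "MIPS back end"; }
static const char *sparc_describe(void) { return "SPARC back end"; }
static const char *x86_describe(void)   { return "X86 back end"; }

/* All back ends are linked into one program. */
static const struct target targets[] = {
    { "mips",  mips_describe  },
    { "sparc", sparc_describe },
    { "x86",   x86_describe   },
};

/* Returns the record for the named target, or NULL if there is none.
   The front end would make all back-end calls through the result. */
const struct target *select_target(const char *name) {
    size_t i;

    for (i = 0; i < sizeof targets / sizeof targets[0]; i++)
        if (strcmp(targets[i].name, name) == 0)
            return &targets[i];
    return NULL;
}
```

Adding a target in this arrangement means adding one table entry, and a
one-target build is simply a table with a single entry.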
lcc's source code documents the results of the hundreds of subordinate
design choices that must be made when implementing software of
any significance. The source code for lcc and for this book is in noweb
files that alternate text and code just as this book does. The code is
extracted to form lcc's modules, which appear on the companion diskette.
Table 1.1 shows the correspondence between chapters and modules, and
groups the modules according to their primary functions. Some corre-
spondences are one-to-one, some chapters generate several small mod-
ules, and one large module is split across three chapters.
The modules without chapter numbers are omitted from this book,
but they appear on the companion diskette. list.c implements the
list-manipulation functions described in Exercise 2.15, output.c holds
the output functions, and init.c parses and processes C initializers.
event.c implements the event hooks described in Section 8.5, trace.c
emits code to trace calls and returns, and prof.c and profio.c emit
profiling code.
where M is the module name, like alloc.c. ⟨M macros⟩, ⟨M types⟩, and
⟨M prototypes⟩ define macros and types and declare function prototypes
that are used only within the module. ⟨M data⟩ and ⟨M functions⟩
include definitions (not declarations) for both external and static data and
The include file config.h defines back-end-specific types that are
referenced in ⟨interface⟩, as detailed in Chapter 5. c.h defines lcc's global
structures and some of its global manifest constants.
lcc can be compiled with pre-ANSI compilers. There are just enough
of these left that it seems prudent to maintain compatibility with them.
ANSI added prototypes, which are so helpful in detecting errors that
we want to use them whenever we can. The following fragments from
output.c show how lcc does so.
⟨output.c exported functions⟩≡
extern void outs ARGS((char *));
⟨output.c functions⟩≡
void outs(s) char *s; {
    char *p;
    bp = p;
    if (bp > io[fd]->limit)
        outflush();
}
Since the declaration for outs appears before its definition, ANSI com-
pilers must treat the definition as if it included the prototype, too, and
thus will check the legality of the parameters in all calls to outs.
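One way such a macro can work is sketched below. The definition of ARGS
shown here is an assumption offered for illustration, and clamp is a
hypothetical function. Unlike outs, the definition of clamp uses
prototype syntax so that the sketch also compiles with current
compilers, not all of which still accept old-style definitions.

```c
/* Assumed definition: keep the parameter list when the compiler
   understands prototypes, and drop it otherwise. */
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

/* Expands to  extern int clamp(int, int, int);  under ANSI C and to
   extern int clamp();  under a pre-ANSI compiler. The doubled
   parentheses make the whole list a single macro argument. */
extern int clamp ARGS((int, int, int));

int clamp(int x, int lo, int hi) {
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}
```

With the prototype in scope, an ANSI compiler rejects a call such as
clamp(1, 2), which a pre-ANSI compiler would accept silently.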
ANSI also changed variadic functions. The macro va_start now
expects the last declared parameter as an argument, and varargs.h became
stdarg.h:
⟨c.h exported macros⟩+≡
#ifdef __STDC__
#include <stdarg.h>
#define va_init(a,b) va_start(a,b)
#else
#include <varargs.h>
#define va_init(a,b) va_start(a)
#endif
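A variadic function written against va_init might look like the sketch
below. It covers only the ANSI branch, because varargs.h is rarely
available today, and sum_ints is a hypothetical function that exists
only to show where the last declared parameter is named.

```c
#include <stdarg.h>

/* ANSI branch of the macro above. */
#define va_init(a,b) va_start(a,b)

/* Returns the sum of `count` int arguments. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    va_init(ap, count);    /* count is the last declared parameter */
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6.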
Definitions of variadic functions also differ. The ANSI C definition
void print(char *fmt, ...) { ... }
replaces the pre-ANSI C definition
⟨c.h exported macros⟩+≡
#define NELEMS(a) ((int)(sizeof (a)/sizeof ((a)[0])))
#define roundup(x,n) (((x)+((n)-1))&(~((n)-1)))
NELEMS(a) gives the number of elements in array a, and roundup(x,n)
returns x rounded up to the next multiple of n, which must be a power
of two.
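The two macros can be exercised as follows, with the mask in roundup
written using the complement operator ~. The helper functions are
hypothetical; roundup_checked adds an assertion for the power-of-two
precondition, which the macro itself does not check.

```c
#include <assert.h>

#define NELEMS(a) ((int)(sizeof (a)/sizeof ((a)[0])))
#define roundup(x,n) (((x)+((n)-1))&(~((n)-1)))

/* Applies roundup after checking that n is a power of two. */
unsigned long roundup_checked(unsigned long x, unsigned long n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return roundup(x, n);
}

/* Returns the number of entries in a fixed table. */
int table_size(void) {
    static const int table[] = { 2, 3, 5, 7, 11 };
    return NELEMS(table);
}
```

roundup works by adding n-1 and then clearing the low-order bits, so
roundup_checked(13, 8) is 16 and a value that is already a multiple,
such as 16, is unchanged.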
for the multiplicative operators. The last two productions specify that
a factor is an ID or a parenthesized expr. These last two productions
could also be written more compactly as
factor: ID | '(' expr ')'
Giving some alternatives on separate lines often makes grammars easier
to read.
Simple function calls could be added to this grammar by adding the
production
factor: ID '(' expr { , expr } ')'
which says that a factor can also be an ID followed by a parenthesized
list of one or more exprs separated by commas. All three productions
for factor could be written as
factor: ID [ '(' expr { , expr } ')' ] | '(' expr ')'
which says that a factor is an ID optionally followed by a parenthesized
list of comma-separated exprs, or just a parenthesized expr.
This notation for syntax specifications is known as extended Backus-
Naur form, or EBNF. Section 7.1 gives the formalities of using EBNF gram-
mars to derive the sentences in a language.
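An EBNF production maps directly onto a parsing function: brackets
become an if, braces become a loop, and alternatives become branches.
The recognizer below is a sketch under several assumptions: an ID is a
single letter, the input has no white space, and the productions for
expr and term, which are not shown above, take the usual form with
additive and multiplicative operators. All names are hypothetical.

```c
#include <ctype.h>

static const char *cp;              /* next unread input character */
static int expr(void);

/* factor: ID [ '(' expr { , expr } ')' ] | '(' expr ')' */
static int factor(void) {
    if (isalpha((unsigned char)*cp)) {
        cp++;                       /* ID */
        if (*cp != '(')
            return 1;               /* the optional part is absent */
        cp++;
        if (!expr())
            return 0;
        while (*cp == ',') {        /* { , expr } */
            cp++;
            if (!expr())
                return 0;
        }
    } else if (*cp == '(') {
        cp++;
        if (!expr())
            return 0;
    } else
        return 0;
    if (*cp != ')')                 /* both forms end with ')' */
        return 0;
    cp++;
    return 1;
}

/* term: factor { ('*' | '/') factor }    (assumed) */
static int term(void) {
    if (!factor())
        return 0;
    while (*cp == '*' || *cp == '/') {
        cp++;
        if (!factor())
            return 0;
    }
    return 1;
}

/* expr: term { ('+' | '-') term }    (assumed) */
static int expr(void) {
    if (!term())
        return 0;
    while (*cp == '+' || *cp == '-') {
        cp++;
        if (!term())
            return 0;
    }
    return 1;
}

/* Returns 1 if s is a complete expr and 0 otherwise. */
int recognize(const char *s) {
    cp = s;
    return expr() && *cp == '\0';
}
```

recognize accepts "f(a,b+c)*(d-e)" and rejects "f()", because the
production requires at least one expr inside the parentheses.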
1.7 Errors
lcc is a large, complex program. We find and repair errors routinely. It's
likely that errors were present when we started writing this book and
that the act of writing added more. If you think that you've found an
error, here's what to do.
1. If you found the error by inspecting code in this book, you might
not have a source file that displays the error, so start by creat-
ing one. Most errors, however, are exposed when programmers try
to compile a program they think is valid, so you probably have a
demonstration program already.
2. Preprocess the source file and capture the preprocessor output.
Discard the original code.
3. Prune your source code until it can be pruned no more without
sending the error into hiding. We prune most error demonstrations
to fewer than five lines. We need you to do this pruning because
there are a lot of you and only two of us.
4. Confirm that the source file displays the error with the distributed
version of lcc. If you've changed lcc and the error appears only in
your version, then you'll have to chase the error yourself, even if it
turns out to be our fault, because we can't work on your code.
5. Annotate your code with comments that explain why you think that
lcc is wrong. If lcc dies with an assertion failure, please tell us
where it died. If lcc crashes, please report the last part of the call
chain if you can. If lcc is rejecting a program you think is valid,
please tell us why you think it's valid, and include supporting page
numbers in the ANSI Standard, Appendix A in The C Programming
Language (Kernighan and Ritchie 1988), or the appropriate section
in C: A Reference Manual (Harbison and Steele 1991). If lcc silently
generates incorrect code for some construct, please include the
corrupt assembler code in the comments and flag the bad instructions
if you can.
6. Confirm that your error hasn't been fixed already. The latest
version of lcc is always available for anonymous ftp in pub/lcc from
ftp.cs.princeton.edu. A LOG file there reports what errors were
fixed and when they were fixed. If you report an error that's been
fixed, you might get a canned reply.
7. Send your program in an electronic mail message addressed to
lcc-bugs@cs.princeton.edu. Please send only valid C programs;
put all remarks in C comments so that we can process reports semi-
automatically.
Further Reading
Most compiler texts survey the breadth of compiling algorithms and do
not describe a production compiler, i.e., one that's used daily to compile
production programs. This book makes the other trade-off, sacrificing
the broad survey and showing a production compiler in-depth. These
"breadth" and "depth" books complement one another. For example,
when you read about lcc's lexical analyzer, consider scanning the
material in Aho, Sethi, and Ullman (1986); Fischer and LeBlanc (1991); or
Waite and Goos (1984) to learn more about alternatives or the underly-
ing theory. Other depth books include Holub (1990) and Waite and Carter
(1993).
Fraser and Hanson (1991b) describe a previous version of lcc, and
include measurements of its compilation speed and the speed of its
generated code. This paper also describes some of lcc's design alternatives
and its tracing and profiling facilities.
This chapter tells you everything you need to know about noweb to use
this book, but if you want to know more about the design rationale or
implementation see Ramsey (1994). noweb is a descendant of WEB (Knuth
1984). Knuth (1992) collects several of his papers about literate program-
ming.
24 CHAPTER 2 • STORAGE MANAGEMENT
⟨alloc.c⟩≡
#include "c.h"
⟨alloc.c types⟩
#ifdef PURIFY
⟨debugging implementation⟩
#else
⟨alloc.c data⟩
⟨alloc.c functions⟩
#endif
If PURIFY is defined, the implementation is replaced in its entirety by
one that uses malloc and free, and is suitable for finding errors. See
Exercise 2.1 for details.
As mentioned above, an arena is a linked list of large blocks of mem-
ory. Each block begins with a header defined by:
⟨alloc.c types⟩≡
struct block {
    struct block *next;
    char *limit;
    char *avail;
};
The space immediately following the arena structure up to the location
given by the limit field is the allocable portion of the block. avail points
to the first free location within the block; space below avail has been
allocated and space beginning at avail and up to limit is available. The
next field points to the next block in the list. The implementation keeps
an arena pointer, which points to the first block in the list with available
space. Blocks are added to the list dynamically during allocation, as
detailed below. Figure 2.1 shows an arena after three blocks have been
allocated. Shading indicates allocated space. The unused space at the
end of the first full-sized arena in Figure 2.1 is explained below.
There are three arenas known by the integers 0-2; clients usually
equate symbolic names to these arena identifiers for use in calls to
allocate, deallocate, and newarray; see Section 5.12. The arena identi-
fiers index an array of pointers to one-element lists, each of which holds
a zero-length block. The first allocation in each arena causes a new block
to be appended to the end of the appropriate list.
⟨alloc.c data⟩≡
static struct block
    first[] = { { NULL }, { NULL }, { NULL } },
    *arena[] = { &first[0], &first[1], &first[2] };
The initializer for fi rst serves only to provide its size; the omitted ini-
tializers cause the remaining fields of each of the three structures to be
[Figure 2.1: an arena after three blocks have been allocated, showing arena[1], first[1], and each block's next, limit, and avail fields.]
ap = arena[a];
n = roundup(n, sizeof (union align));
while (ap->avail + n > ap->limit) {
(get a new block 27)
}
ap->avail += n;
return ap->avail - n;
}
(alloc.c types)+=
union align {
    long l;
    char *p;
    double d;
    int (*f) ARGS((void));
};
(alloc.c types)+=
union header {
    struct block b;
    union align a;
};
When a request cannot be filled in the current block, the free space at
the end of the current block is wasted. This waste is illustrated in the
first full-size arena in Figure 2.1.
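The rounding that allocate applies, and the waste that results when a request does not fit, can be checked with a few lines of arithmetic. The following is a minimal sketch rather than lcc's code: round_up is an assumed definition (lcc's own roundup macro is not shown in this section), align_size and wasted are hypothetical helpers, and the function-pointer member is written with an ANSI prototype in place of the ARGS macro.

```c
#include <assert.h>
#include <stddef.h>

/* Same member list as the union align shown above. */
union align {
    long l;
    char *p;
    double d;
    int (*f)(void);
};

/* Assumed definition: round x up to the next multiple of n (n > 0). */
static size_t round_up(size_t x, size_t n) {
    return (x + n - 1) / n * n;
}

/* Hypothetical helper: bytes actually consumed by a request for n bytes. */
size_t align_size(size_t n) {
    return round_up(n, sizeof (union align));
}

/* Hypothetical helper: bytes left unused at the end of a block that has
   'room' bytes free when a request for n bytes does not fit in it. */
size_t wasted(size_t room, size_t n) {
    return align_size(n) > room ? room : 0;
}
```

Because every request is rounded to a multiple of sizeof (union align), each address handed out stays suitably aligned for any of the union's member types, provided the first address in the block is.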
newarray's implementation simply calls allocate:
(alloc.c functions)+=
void *newarray(m, n, a) unsigned long m, n; unsigned a; {
    return allocate(m*n, a);
}
2.4 Deallocating Space
An arena is deallocated by adding its blocks to the free-blocks list and
reinitializing it to point to the appropriate one-element list that holds a
zero-length block. The blocks are already linked together via their next
fields, so the entire list of blocks can be added to freeblocks with simple
pointer manipulations:
(alloc.c functions)+=
void deallocate(a) unsigned a; {
    arena[a]->next = freeblocks;
    freeblocks = first[a].next;
    first[a].next = NULL;
    arena[a] = &first[a];
}
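The pointer manipulations in deallocate are easier to follow in a self-contained model. The sketch below is an illustration under stated assumptions, not lcc's implementation: it manages a single arena instead of three, uses ANSI prototypes, and fills in the elided (get a new block) step with a hypothetical policy that reuses a block from freeblocks when one exists and otherwise calls malloc. The names arena_alloc, arena_free_all, and CHUNK are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { CHUNK = 10 * 1024 };          /* assumed slack added to each new block */

union align { long l; char *p; double d; int (*f)(void); };

struct block {
    struct block *next;
    char *limit;
    char *avail;
};

union header { struct block b; union align a; };

static struct block first = { NULL, NULL, NULL };  /* zero-length block */
static struct block *arena = &first;               /* block being filled */
static struct block *freeblocks = NULL;            /* recycled blocks */

void *arena_alloc(size_t n) {
    struct block *ap = arena;
    size_t unit = sizeof (union align);

    n = (n + unit - 1) / unit * unit;              /* keep avail aligned */
    while (ap->avail == NULL || (size_t)(ap->limit - ap->avail) < n) {
        struct block *bp = freeblocks;
        if (bp != NULL) {
            freeblocks = bp->next;                 /* reuse; limit is still valid */
        } else {
            size_t m = sizeof (union header) + n + CHUNK;
            bp = malloc(m);
            if (bp == NULL)
                return NULL;
            bp->limit = (char *)bp + m;
        }
        bp->avail = (char *)((union header *)bp + 1);
        bp->next = NULL;
        ap->next = bp;                             /* append to the arena's list */
        ap = arena = bp;
    }
    ap->avail += n;
    return ap->avail - n;
}

void arena_free_all(void) {
    arena->next = freeblocks;    /* last block now leads to the old free list */
    freeblocks = first.next;     /* the whole chain becomes the free list */
    first.next = NULL;
    arena = &first;
}
```

After arena_free_all, the next call to arena_alloc finds the recycled block on freeblocks and hands out the same address that the first allocation returned; no call to free is ever made.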
2.5 Strings
Strings are created for identifiers, constants, registers, and so on. Strings
are compared often; for example, when a symbol table is searched for
an identifier.
The most common uses of strings are provided by the functions
exported by string.c:
    if (n == INT_MIN)
        m = (unsigned)INT_MAX + 1;
    else if (n < 0)
        m = -n;
    else
        m = n;
    do
        *--s = m%10 + '0';
    while ((m /= 10) != 0);
    if (n < 0)
        *--s = '-';
    return stringn(s, str + sizeof (str) - s);
}
The code uses unsigned arithmetic because ANSI C permits different ma-
chines to treat signed modulus on negative values differently. The code
The static variable next points to the next free byte in the current chunk,
and strlimit points one past the end of the chunk. The code allocates
a new chunk, if necessary, and a new table entry. It copies str, which
incidentally allocates space for it as it is copied by incrementing next,
and links the new entry into the appropriate hash chain.
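The effect of keeping one copy of each string can be seen in a small model. This is a hedged sketch, not stringn itself: it calls malloc for every new string instead of carving space from a chunk, uses a simple multiplicative hash chosen for the example, and the names intern, strent, and NBUCKETS are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { NBUCKETS = 1024 };            /* assumed table size; a power of two */

struct strent {
    char *str;
    size_t len;
    struct strent *link;
};

static struct strent *buckets[NBUCKETS];

/* Return the unique stored copy of the len bytes at str,
   or NULL if memory runs out. */
const char *intern(const char *str, size_t len) {
    unsigned long h = 5381;
    size_t i;
    struct strent *p;

    for (i = 0; i < len; i++)
        h = h * 33 + (unsigned char)str[i];
    h &= NBUCKETS - 1;
    for (p = buckets[h]; p != NULL; p = p->link)
        if (p->len == len && memcmp(p->str, str, len) == 0)
            return p->str;                   /* already in the table */
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->str = malloc(len + 1);
    if (p->str == NULL) {
        free(p);
        return NULL;
    }
    memcpy(p->str, str, len);
    p->str[len] = '\0';
    p->len = len;
    p->link = buckets[h];                    /* link into the hash chain */
    buckets[h] = p;
    return p->str;
}
```

Two calls with equal contents return the same pointer, so clients can compare strings with == instead of strcmp.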
Further Reading
Storage management is a busy area of research; Section 2.5 in Knuth
(1973a) is the definitive reference. There is a long list of techniques that
are designed both for general-purpose use and for specific application
areas, including the design described in this chapter (Hanson 1990). A
competitive alternative is "quick fit" (Weinstock and Wulf 1988). Quick-fit
allocators maintain N free lists for the N block sizes requested most fre-
quently. Usually, these sizes are small and contiguous; e.g., 8-128 bytes
in multiples of eight bytes. Allocation is easy and fast: Take the first
block from the appropriate free list. A block is deallocated by adding it
to the head of its list. Requests for sizes other than one of the N favored
sizes are handled with other algorithms, such as first fit (Knuth 1973a).
One of the advantages of lcc's arena-based algorithm is that allocations
don't have to be paired with individual deallocations; a single deallocation
frees the memory acquired by many allocations, which simplifies
programming. Garbage collection takes this advantage one step further.
A garbage collector periodically finds all of the storage that is in use and
frees the rest. It does so by following all of the accessible pointers in
the program. Appel (1991) and Wilson (1994) survey garbage-collection
algorithms. Garbage collectors usually need help from the programming
language, its compiler, and its run-time system in order to locate the ac-
cessible memory, but there are algorithms that can cope without such
help. Boehm and Weiser (1988) describe one such algorithm for C. It
takes a conservative approach: Anything that looks like a pointer is taken
to be one. As a result, the collector identifies some inaccessible memory
as accessible and thus busy, but that's better than making the opposite
decision.
Storing all strings in a string table and using hashing to keep only one
copy of any string is a scheme that's been used for years in compilers
and related programming-language implementations, but it's rarely doc-
umented. It's used in SNOBOL4 (Griswold 1972), for example, to make
comparison fast and to make it easy to use strings as keys in associa-
tive tables. Related techniques store strings in a separate string space,
but don't bother to avoid storing multiple copies of the same string
to simplify some string operations, such as substring and concatena-
tion (Hansen 1992; Hanson 1974; McKeeman, Horning, and Wortman
1970).
Knuth (1973b) is the definitive exposé on hashing. Section 7.6 of Aho,
Sethi, and Ullman (1986) describes hash functions and their use in
compilers.
Exercises
2.1 Revise allocate and deallocate to use the C library functions
malloc and free.
2.2 The only objective way to make decisions between competitive
algorithms and designs in lcc is to implement them and measure their
2.12 stringn allocates memory in big chunks to hold the characters in
a string instead of calling allocate for each string. Revise stringn
so that it calls allocate for each string and measure the differences
in both time and space. Explain any differences you find.
2.13 The size of stringn's hash table is a power of two, which is often
deprecated. Try a prime and measure the results. Try to design a
better hash function and measure the results.
2.14 stringn compares strings with inline code instead of, for example,
calling memcmp. Replace the inline code with a call to memcmp and
measure the result. Why was our decision to inline justified?
2.15 lcc makes heavy use of circularly linked lists of pointers, and the
implementation of the module list.c exemplifies the use of the
allocation macros. list.c exports a list element type and three
list-manipulation functions:
(list.c typedefs)=
typedef struct list *List;
The symbol tables are the central repository for all information within
the compiler. All parts of the compiler communicate through these ta-
bles and access the data - symbols - in them. For example, the lexical
analyzer adds identifiers to the identifier table, and the parser adds type
information to these identifiers. The code generators add target-specific
data to symbol-table entries; for example, register assignments for locals
and parameters. Symbol tables are also used to hold labels, constants,
and types.
Symbol tables map names into sets of symbols. Constants, identifiers,
and label numbers are examples of names. Different names have differ-
ent attributes. For example, the attributes for the identifier that names
a local variable might include the variable's type, its location in a stack
frame for the procedure in which it is declared, and its storage class.
Identifiers that name members of a structure have a very different set
of attributes, including the members' types, the structures in which they
appear, and their locations within those structures.
Symbols are collected into symbol tables. The symbol-table module
manages symbols and symbol tables.
Symbol management must deal not only with the symbols themselves,
but must also handle the scope or visibility rules imposed by the ANSI C
standard. The scope of an identifier is that portion of the program text in
which the identifier is visible; that is, where it may be used in expressions,
and so forth. In C, scopes nest. An identifier is visible at the point of its
declaration until the end of the compound statement or parameter list
in which it is declared. An identifier declared outside of any compound
statement or parameter list has file scope; it is visible from the point of
its declaration to the end of the source file in which it appears.
A declaration for an identifier X hides a visible identifier X declared
at an outer level. The following program illustrates this effect; the line
numbers are for explanatory purposes and are not part of the program.
36 CHAPTER 3 • SYMBOL MANAGEMENT
1 int x, y;
2 f( int x , int a) {
3 int b;
4 y = x + a*b;
5 if (y < 5) {
6 int a;
7 y = x + a*b;
8 }
9 y = x + a*b;
10 }
Line 1 declares the globals x and y, whose scopes begin at line 1 and
extend through line 10. But the declaration of the parameter x in line 2
interrupts the scope of the global x. The scopes of the parameters x and a
begin at line 2 and extend through line 10. The scope of a is interrupted by
the declaration of the local a in line 6. Each identifier in the expression on
line 4 is bound to a specific declaration, and these bindings are specified
by C's scope rules. Using x:n to denote the identifier x declared at line n,
y is bound to y:1, x to x:2, a to a:2, and b to b:3. The bindings for the
expression in line 7 are the same, except that a is bound to a:6.
Declarations like those for x in line 2 and a in line 6 create a hole in the
scopes of similarly named identifiers declared in outer scopes. For ex-
ample, the scope of a:6 is lines 6-8, which is the hole in the scope of a:2,
whose scope is lines 2-5 and 9-10. The symbol-management functions
must accommodate this and similar situations.
In most languages, like Pascal, there is one name space for identifiers.
That is, there is a single set of identifiers for all purposes and, at any
point in the program, there can be only one visible identifier of a given
name.
The name spaces in ANSI C categorize identifiers according to use:
Statement labels, tags, members, and ordinary identifiers. Tags identify
structures, unions, and enumerations. There are three separate name
spaces for labels, tags, and identifiers, and, for each structure or union,
there is a separate name space for its members.
For each name space, there can be only one visible identifier of a given
name at any point in the program. There can, however, be more than one
visible identifier at any point in the program if each such identifier is in
a different name space. The following artificial and confusing program
illustrates this effect.
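As a hedged illustration of the rule (a hypothetical fragment written for this purpose, with the invented function name namespaces_demo), the spelling x below is simultaneously a tag, a member, an ordinary identifier, and a statement label, and all four are visible at the labeled statement.

```c
#include <assert.h>

struct x { int x; };          /* tag x, and member x of struct x */

int namespaces_demo(void) {
    struct x x = { 41 };      /* ordinary identifier x */
    goto x;                   /* refers to the label x, not the variable */
x:
    x.x++;                    /* variable x, then member x */
    return x.x;
}
```

A compiler accepts this because each use of x is resolved in the name space that its syntactic position selects: after struct, after goto or before a colon, after a dot, or elsewhere.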
3.1 Representing Symbols
char *name;
int scope;
Coordinate src;
Symbol up;
List uses;
int sclass;
(symbol flags 50)
Type type;
float ref;
union {
(labels 46)
(struct types 65)
(enum constants 69)
(enum types 68)
(constants 47)
(function symbols 290)
(globals 265)
(temporaries 346)
} u;
Xsymbol x;
(debugger extension)
};
The fields above the union u apply to all kinds of symbols in all tables.
Most of the symbol-table functions read and write only the name, scope,
src, up, and uses fields. Those specific to constants and labels also rely
on some of the fields in the union u and some of the (symbol flags) as
detailed below. The remaining fields implement attributes that are associated
with specific kinds of symbols, and are initialized and modified
by clients of the symbol-table module.
The name field is usually the symbol-table key. For identifiers and
keywords that name types, it holds the name used in the source code.
For generated identifiers, such as structures without tags, name is a digit
string.
The scope field classifies each symbol as a constant, label, global, pa-
rameter, or local:
(sym.c exported types)+=
enum { CONSTANTS=1, LABELS, GLOBAL, PARAM, LOCAL };
A local declared at nesting level k has a scope equal to LOCAL+k.
The src field is the point in the source code that defines the symbol,
as in a variable declaration. Its Coordinate value pinpoints the symbol's
definition:
(sym.c typedefs)+=
typedef struct coord {
    char *file;
    unsigned x, y;
} Coordinate;
The file field is the name of the file that contains the definition, and
y and x give the line number and character position within that line at
which the definition occurs.
3.2 Representing Symbol Tables
    NEW0(new, FUNC);
    new->previous = tp;
    new->level = level;
    if (tp)
        new->all = tp->all;
    return new;
}
All dynamically allocated tables are discarded after compiling each function,
so they are allocated in the FUNC arena.
Figure 3.1 shows the four tables that emanate from identifiers when
lcc is compiling line 7 of the example at the top of page 36. The figure's
entry structures show only the name and up fields of their symbols and
their link fields. The solid lines show the previous fields, which connect
tables; the elements of buckets and the link fields, which connect entries;
and the name fields. The dashed lines emanate from the all fields
in tables and from the up fields of symbols.
The all field is initialized to the enclosing table's list so that it is
possible to visit all symbols in all scopes by following the symbols
beginning at a table's all. This capability is used by foreach to scan a
table and apply a given function to all symbols at a given scope.
(sym.c functions)+=
void foreach(tp, lev, apply, cl) Table tp; int lev;
void (*apply) ARGS((Symbol, void *)); void *cl; {
    while (tp && tp->level > lev)
        tp = tp->previous;
    if (tp && tp->level == lev) {
        Symbol p;
        Coordinate sav;
        sav = src;
        for (p = tp->all; p && p->scope == lev; p = p->up) {
            src = p->src;
            (*apply)(p, cl);
        }
        src = sav;
    }
}
The while loop finds the table with the proper scope. If one is found,
foreach sets the global variable src to each symbol's definition coordi-
nate and calls apply with the symbol. cl is a pointer to call-specific data
- a closure - supplied by callers of foreach, and this closure is passed
along to apply so that it can access those data, if necessary. src is set so
that diagnostics that might be issued by apply will refer to a meaningful
source coordinate.
The for loop traverses the table's all and stops when it encounters
the end of the list or a symbol at a lower scope. all is not strictly necessary
because foreach could traverse the hash chains, but presenting
the symbols to apply in an order independent of hash addresses makes
the order of the emitted code machine-independent.
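The closure convention used by foreach can be tried in isolation. The sketch below is a simplified model under stated assumptions: struct sym keeps only the scope and up fields, visit stands in for the for loop inside foreach, and count_one and count_demo are hypothetical names. The scope values 3 and 5 are chosen to match GLOBAL and LOCAL in the enumeration shown earlier.

```c
#include <assert.h>
#include <stddef.h>

struct sym {
    const char *name;
    int scope;
    struct sym *up;      /* symbol installed just before this one */
};

/* Apply a function to each symbol at exactly scope lev, newest first. */
void visit(struct sym *head, int lev,
           void (*apply)(struct sym *, void *), void *cl) {
    struct sym *p;

    for (p = head; p != NULL && p->scope == lev; p = p->up)
        (*apply)(p, cl);
}

/* Example callback: the closure is a counter owned by the caller. */
void count_one(struct sym *p, void *cl) {
    (void)p;
    ++*(int *)cl;
}

/* Two locals installed after one global; only the locals are visited. */
int count_demo(void) {
    struct sym g = { "g", 3, NULL };
    struct sym a = { "a", 5, &g };
    struct sym b = { "b", 5, &a };
    int n = 0;

    visit(&b, 5, count_one, &n);
    return n;
}
```

Passing the counter through cl keeps count_one free of global state, which is the purpose of the closure argument.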
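[Figure 3.1: diagram of the tables reachable from identifiers, showing each table's level, previous, buckets, and all fields and the entries for a, b, x, and y.]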
FIGURE 3.1 Symbol tables when compiling line 7 of the example on page 36.
    if (types->level == level)
        types = types->previous;
    if (identifiers->level == level) {
        (warn if more than 127 identifiers)
        identifiers = identifiers->previous;
    }
    --level;
}
Tables at the current scope are created only if necessary. Few scopes in C
declare new symbols, so lazy table allocation saves time, but exitscope
must check levels to see if there is a table to remove. rmtypes removes
from the type cache types with tags defined in the vanishing scope; see
Section 4.2.
the entry to the hash chain. level must be zero or at least as large as the
table's scope level; a zero value for level indicates that name should be
installed in *tpp. install accepts an argument that specifies the appropriate
arena because function prototypes and thus the symbols in them
are retained forever, even if they're declared in a nested scope.
lookup searches a table for a name; it handles lookups where the
search key is the name field of a symbol. It returns a symbol pointer
if it succeeds and the null pointer otherwise.
(sym.c functions)+=
Symbol lookup(name, tp) char *name; Table tp; {
    struct entry *p;
    unsigned h = (unsigned)name&(HASHSIZE-1);

    do
        for (p = tp->buckets[h]; p; p = p->link)
            if (name == p->sym.name)
                return &p->sym;
    while ((tp = tp->previous) != NULL);
    return NULL;
}
The inner loop scans a hash chain, and the outer loop scans enclosing
scopes. Comparing two strings is trivial because the string module guarantees
that two strings are identical if and only if they are the same
string.
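The two loops can be exercised with a reduced model of the tables. This is a sketch under assumptions, not sym.c: struct entry holds a name and an int in place of a full symbol, install_entry is a hypothetical stand-in for install, HASHSIZE is given an assumed value of 256, and the name arrays play the role of strings that the string module has already made unique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HASHSIZE = 256 };             /* assumed; must be a power of two */

struct entry {
    const char *name;
    int value;
    struct entry *link;
};

struct table {
    int level;
    struct table *previous;          /* enclosing scope's table */
    struct entry *buckets[HASHSIZE];
};

/* Hash the address of the name; valid only for unique strings. */
static unsigned hash(const char *name) {
    return (unsigned)(uintptr_t)name & (HASHSIZE - 1);
}

void install_entry(struct table *tp, struct entry *e) {
    unsigned h = hash(e->name);

    e->link = tp->buckets[h];
    tp->buckets[h] = e;
}

struct entry *lookup(const char *name, struct table *tp) {
    unsigned h = hash(name);

    do {
        struct entry *p;
        for (p = tp->buckets[h]; p != NULL; p = p->link)
            if (p->name == name)     /* pointer comparison suffices */
                return p;
    } while ((tp = tp->previous) != NULL);
    return NULL;
}

/* A local x hides the global x; the global y remains visible. */
int shadow_demo(void) {
    static const char x[] = "x", y[] = "y";
    struct table globals = { 3, NULL, { NULL } };
    struct table locals = { 5, &globals, { NULL } };
    struct entry gx = { x, 1, NULL }, gy = { y, 2, NULL }, lx = { x, 10, NULL };

    install_entry(&globals, &gx);
    install_entry(&globals, &gy);
    install_entry(&locals, &lx);
    return lookup(x, &locals)->value + lookup(y, &locals)->value;
}
```

The search for x stops in the inner table and never reaches the global entry, which is how the hole in the outer declaration's scope arises.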
3.5 Labels
The symbol-table module also exports functions to manage labels and
constants. These are similar to lookup and install, but there is no
scope management for these tables, and looking up a label or constant
installs it if necessary and thus always succeeds. Also, the search key is
a field in the union u that is specific to labels or constants.
Compiler-generated labels and the internal counterparts of source-language
labels are named by integers. genlabel generates a run of these
integers by incrementing a counter:
(sym.c functions)+=
int genlabel(n) int n; {
    static int label = 1;

    label += n;
    return label - n;
}
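A short trace shows what "a run of these integers" means. The function below repeats genlabel's logic with an ANSI prototype so that it compiles on its own; the caller described afterward is hypothetical.

```c
#include <assert.h>

/* Reserve n consecutive label numbers and return the first of them. */
int genlabel(int n) {
    static int label = 1;

    label += n;
    return label - n;
}
```

A caller that needs three related labels can make one call, genlabel(3), and use the returned value and the two integers that follow it; the next call, whatever its argument, returns a number past that run.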
When two or more internal labels are found to label the same location,
the equatedto fields of such label symbols point to one of them.
There is an internal label for each source-language label. These and
other compiler-generated labels are kept in labels. This table is created
once for each function (see Section 11.6) and is managed by findlabel,
which takes a label number and returns the corresponding label sym-
bol, installing and initializing it, and announcing it to the back end, if
necessary.
(sym.c functions)+=
Symbol findlabel(lab) int lab; {
    struct entry *p;
    unsigned h = lab&(HASHSIZE-1);

    for (p = labels->buckets[h]; p; p = p->link)
        if (lab == p->sym.u.l.label)
            return &p->sym;
    NEW0(p, FUNC);
    p->sym.name = stringd(lab);
    p->sym.scope = LABELS;
    p->sym.up = labels->all;
    labels->all = &p->sym;
    p->link = labels->buckets[h];
    labels->buckets[h] = p;
    p->sym.generated = 1;
    p->sym.u.l.label = lab;
    (*IR->defsymbol)(&p->sym);
    return &p->sym;
}
3.6 Constants
A reference to a compile-time constant as an operand in an expression is
made by pointing to a symbol for the constant. These symbols reside in
the constants table. Like labels, this table is not scoped; all constants
have a scope field equal to CONSTANTS.
The actual value of a constant is represented by instances of the union
(sym.c typedefs)+=
typedef union value {
/* signed */ char sc;
short ss;
int i;
unsigned char uc;
unsigned short us;
unsigned int u;
float f;
double d;
void *p;
} Value;
The value is stored in the appropriate field according to its type, e.g.,
integers are stored in the i field, unsigned characters are stored in the
uc field, etc.
When a constant is installed in constants, its Type is stored in the
symbol's type field; Types encode C's data types and are described in
Chapter 4. The value is stored in u.c.v:
(constants 47)=
struct {
Value v;
Symbol loc;
} c;
On some targets, some constants - floating-point numbers - cannot
be stored in instructions, so the compiler generates a static variable and
initializes it to the value of the constant. For these, u.c.loc points to
the symbol for the generated variable. Taken together, the type and u.c
fields represent all that is known about a constant.
Only one instance of any given constant appears in the constants
table, e.g., if the constant "hello world" appears three times in a program,
all three references point to the same symbol. constant searches
the constant table for a given value of a given type, installing it if neces-
sary, and returns the symbol pointer. Constants are never removed from
the table.
(sym.c functions)+=
Symbol constant(ty, v) Type ty; Value v; {
    ty = unqual(ty);
    for (p = constants->buckets[h]; p; p = p->link)
        if (eqtype(ty, p->sym.type, 1))
            (return the symbol if p's value == v 48)
    NEW0(p, PERM);
    p->sym.name = vtoa(ty, v);
    p->sym.scope = CONSTANTS;
    p->sym.type = ty;
    p->sym.sclass = STATIC;
    p->sym.u.c.v = v;
    p->link = constants->buckets[h];
    p->sym.up = constants->all;
    constants->all = &p->sym;
    constants->buckets[h] = p;
    (announce the constant, if necessary 49)
    p->sym.defined = 1;
    return &p->sym;
}
unqual returns the unqualified version of a Type, namely without const
or volatile, and eqtype tests for type equality (see Section 4.7). If v appears
in the table, its symbol pointer is returned. Otherwise, a symbol is
allocated and initialized. The name field is set to the string representation
returned by vtoa.
This value is useful only for the integral types and constant pointers;
for the other types, the string returned by vtoa may not reliably depict the
value. Constants are found in the table by comparing their actual values,
not their string representations, because some floating-point constants
have no natural string representations. For example, the constant expression
(double)(float)0.3 truncates 0.3 to a machine-dependent value.
The effect of the cast cannot be captured by a valid string constant.
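The 0.3 example can be checked directly. The sketch below assumes IEEE 754 single and double precision, which the C standard does not require, and the helper names cast_changes_value and same_text are hypothetical. It also uses printf's %g conversion purely as an example of a short textual form; it makes no claim about how vtoa formats values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if narrowing 0.3 to float and widening it again
   yields a double that differs from the double constant 0.3. */
int cast_changes_value(void) {
    double d = (double)(float)0.3;

    return d != 0.3;
}

/* Nonzero if the two distinct values print identically under %g. */
int same_text(void) {
    char a[32], b[32];

    snprintf(a, sizeof a, "%g", 0.3);
    snprintf(b, sizeof b, "%g", (double)(float)0.3);
    return strcmp(a, b) == 0;
}
```

Under those assumptions both functions return nonzero: the values differ, yet a six-digit rendering of each is the same text, so a table keyed on such strings would wrongly merge them.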
The type operator determines which union fields to compare.
(sym.c macros)=
#define equalp(x) v.x == p->sym.u.c.v.x
(return the symbol if p's value == v 48)=
switch (ty->op) {
case CHAR: if (equalp(uc)) return &p->sym; break;
case SHORT: if (equalp(ss)) return &p->sym; break;
case INT: if (equalp(i)) return &p->sym; break;
case UNSIGNED: if (equalp(u)) return &p->sym; break;
case FLOAT: if (equalp(f)) return &p->sym; break;
    v.i = n;
    return constant(inttype, v);
}
3.7 Generated Variables
The front end generates local variables for many purposes. For example,
it generates static variables to hold out-of-line constants like strings and
jump tables for switch statements. It generates locals to pass and return
structures to functions and to hold the results of conditional expressions
and switch values. genident allocates and initializes a generated
identifier of a specific type, storage class, and scope:
(sym.c functions)+=
Symbol genident(scls, ty, lev) int scls, lev; Type ty; {
    Symbol p;

    NEW0(p, lev >= LOCAL ? FUNC : PERM);
    p->name = stringd(genlabel(1));
    p->scope = lev;
    p->sclass = scls;
    p->type = ty;
    p->generated = 1;
    if (lev == GLOBAL)
        (*IR->defsymbol)(p);
    return p;
}
(symbol flags 50)=
    unsigned generated:1;
The names are digit strings, and the generated flag is set. Parameters
and locals are announced to the back end elsewhere; generated globals
are announced here by calling the back end's defsymbol interface function.
IR points to a data structure that connects a specific back end with
the front; Section 5.11 explains how this binding is initialized.
Temporaries are another kind of generated variable, and are distin-
guished by a lit temporary flag:
(sym.c functions)+=
Symbol temporary(scls, ty, lev) Type ty; int scls, lev; {
    Symbol p = genident(scls, ty, lev);

    p->temporary = 1;
    return p;
}
Back ends must also generate temporary locals to spill registers, for example.
They cannot call temporary directly because they do not know
about the type system. newtemp accepts a type suffix, calls btot to map
this suffix into a representative type, and calls temporary with that type.
(sym.c functions)+=
Symbol newtemp(sclass, tc) int sclass, tc; {
    Symbol p = temporary(sclass, btot(tc), LOCAL);

    (*IR->local)(p);
    p->defined = 1;
    return p;
}
(symbol flags 50)+=
    unsigned defined:1;
Calls to newtemp occur during code generation, which is too late for new
temporaries to be announced like front-end temporaries. So newtemp
calls local to announce them. The flag defined is lit after the symbol
has been announced to the back end.
Further Reading
lcc's symbol-table module implements only what is necessary for C.
Other languages need more; for example, in block-structured languages
- those with nested procedures - more than one set of parameters and
locals are visible at the same time. Newer object-oriented languages and
languages with explicit scope directives have more scopes; some need
many separate symbol tables to exist at the same time.
Fraser and Hanson (1991b) describe the evolution of lcc's symbol-table
module.
Knuth (1973b), Section 6.4, gives a detailed analysis of hashing and
describes the characteristics of good hashing functions. Suggestions for
good hash functions abound; the one in Aho, Sethi, and Ullman (1986),
Section 7.6 is an example.
Exercises
3.1 Try a better hash function for hashing entries in symbol tables; for
example, try the one in Aho, Sethi, and Ullman (1986), Section 7.6.
Does it make lcc run faster?
3.2 lcc never removes entries from the constants table. When might
this approach be a problem? Propose and implement a fix, and
measure the benefit. Is the benefit worth the effort?
3.3 Originally, lcc used a single hash table for its symbol tables (Fraser
and Hanson 1991b). In this approach, hash chains held all of the
symbols that hashed to that bucket, and the chains were ordered in
decreasing order of scope values. lookup simply searched a single
hash table. install and enterscope were easy using this approach,
but exitscope was more complicated because it had to scan the
chains and remove the symbols at the current scope level. The
present design ran faster on some computers, but it might not be
faster than the original design on other computers. Implement the
original design; make sure you handle accesses to global correctly.
Which design is easier to understand? Which is faster?
3.4 sym.c exports data and functions that help generate cross-reference
lists for identifiers and symbol-table information for debuggers.
The -x option causes lcc to set the uses field of a symbol to a
List of pointers to Coordinates that identify each use of the symbol.
sym.c exports
(sym.c exported functions)=
extern void use ARGS((Symbol p, Coordinate src));
4
Types
(types.c typedefs)=
typedef struct type *Type;
that is both constant and volatile. The VOID operator identifies the void
type; it has no operand.
The align and size fields give the type's alignment and the size of
objects of that type in bytes. As specified by the code-generation inter-
face in Chapter 5, the size must be a multiple of the alignment. The back
end must allocate space for a variable so that its address is a multiple
of its type's alignment.
The x field plays the same role for types as it does in symbols; back
ends may define Xtype to add target-specific fields to the type structure.
This facility is most often used to support debuggers.
The innards of Types are revealed by exporting the declaration so that
back ends may read the size and align fields and read and write the x
fields; by convention, these are the only fields the back ends are allowed
to inspect. The front end, however, may access all of the fields.
The op, type, size, and align fields give most of the information
needed for dealing with a type. For unqualified types with names or tags
- the built-in types, structure and union types, and enumeration types -
the u.sym fields point to symbol-table entries that give more information
about the types:
(types with names or tags 55)=
    Symbol sym;
The symbol-table entry gives the name of the type, and the value of
u.sym->addressed is zero if constants of the type can be included as
parts of instructions. u.sym->type points back to the type itself; this
pointer is used to map tags to types, for example. There is one symbol-table
entry for each structure, union, and enumeration type, one for each
basic type, and one for all pointer types. These entries appear in the
types table, as detailed below in Section 4.2. This representation is used
so that the functions in sym.c can be used to manage types.
Types can be depicted in a parenthesized prefix form that follows
closely the English prefix form introduced above. For example, the type
int on the MIPS is:
(INT 4 4 ["int"])
The first 4 is the alignment, the second 4 is the size, and the ["int"]
denotes a pointer to a symbol-table entry for the type name int. Other
types are depicted similarly, for example
(POINTER 4 4 (INT 4 4 ["int"]) ["T*"])
is the type pointer to an int. The type name T* represents the single
symbol-table entry that is used for all pointer types.
The alignments, sizes, and symbol-table pointers are omitted from
explanations (but not from the code) when they're not needed to under-
stand the topic at hand. For instance, the types given at the beginning
of this section are:
(INT)
(POINTER (INT))
(POINTER (ARRAY 10 (POINTER (CHAR))))
The last line, which depicts the type pointer to an array of 10 pointers to
char, illustrates the convention for array types in which the number of
elements is given instead of the size of the array. This convention is only
a notational convenience; the size field of the array type always holds
the actual size of the array. The number of elements can be computed by
dividing that size by the size of the element type. Thus, the type array
of 10 ints is more accurately depicted as
(ARRAY 40 4 (INT 4 4 ["int"]))
but, by convention, is usually depicted as (ARRAY 10 (INT)). An incom-
plete type is one whose size is unknown and that thus has a size field
equal to zero. These arise from declarations that omit sizes, such as
int a[];
extern struct table *identifiers;
Opaque pointers, such as pointers to lcc's table structures, are incomplete
types. Sizes for incomplete types are sometimes shown when it's
important to indicate that they are incomplete.
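The relationship between an array type's size field and its element count, and the zero size that marks an incomplete type, can be stated in a few lines. This is a sketch with a cut-down struct type; the operator codes and the helper names nelems and isincomplete are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { INT = 1, ARRAY = 2 };       /* assumed operator codes */

struct type {
    int op;
    int align, size;               /* size is in bytes; 0 if unknown */
    struct type *type;             /* operand: the element type for ARRAY */
};

/* Number of elements in a complete array type. */
int nelems(const struct type *ty) {
    return ty->size / ty->type->size;
}

/* An incomplete type is one whose size is not yet known. */
int isincomplete(const struct type *ty) {
    return ty->size == 0;
}
```

With a 4-byte int, an array type whose size is 40 has 10 elements, while the type of int a[]; has size 0 and is therefore incomplete.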
4.2 Type Management
One of the basic operations in type checking is determining whether two
types are equivalent. This test can be simplified if there is only one copy
of any type, much the same way that string comparison is simplified by
keeping only one copy of any string.
type does for types what stringn does for strings. type manages
typetable:
(types.c data)=
static struct entry {
...
type always builds new types for function types and for incomplete array
types. When type builds a new type, it initializes the fields specified by
the arguments, clears the x field, adds the type to the appropriate hash
chain, and returns the new Type.
type searches typetable by using the exclusive OR of the type oper-
ator and the address of the operand as the hash value, and searching
the appropriate chain for a type with the same operator, operand, size,
alignment, and symbol-table entry:
(hash op and ty 57)=
    (op^((unsigned)ty>>3))
(search for an existing type 57)=
    for (tn = typetable[h]; tn; tn = tn->link)
        if (tn->type.op == op && tn->type.type == ty
        &&  tn->type.size == size && tn->type.align == align
        &&  tn->type.u.sym == sym)
            return &tn->type;
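The search above is what makes type equality a pointer comparison. The following sketch models it under simplifying assumptions: mktype is a hypothetical stand-in for type that omits the symbol-table argument and the special cases for function and incomplete array types, nodes come from malloc instead of an arena, and the operator codes and table size are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { INT = 1, POINTER = 2, NTYPES = 128 };   /* assumed values */

struct type {
    int op;
    struct type *type;       /* operand */
    int size, align;
};

struct tentry {
    struct type type;
    struct tentry *link;
};

static struct tentry *typetable[NTYPES];

/* Return the unique type with these attributes, creating it if needed. */
struct type *mktype(int op, struct type *ty, int size, int align) {
    unsigned h = (op ^ ((unsigned)(uintptr_t)ty >> 3)) & (NTYPES - 1);
    struct tentry *tn;

    for (tn = typetable[h]; tn != NULL; tn = tn->link)
        if (tn->type.op == op && tn->type.type == ty
        &&  tn->type.size == size && tn->type.align == align)
            return &tn->type;                  /* existing copy */
    tn = malloc(sizeof *tn);
    if (tn == NULL)
        return NULL;
    tn->type.op = op;
    tn->type.type = ty;
    tn->type.size = size;
    tn->type.align = align;
    tn->link = typetable[h];
    typetable[h] = tn;
    return &tn->type;
}
```

Because the operand is itself unique, hashing and comparing its address is enough; two requests for (POINTER (INT)) yield the same struct type.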
typetable is initialized with the built-in types and the type for void*.
These types are also the values of 14 global variables:
(types.c exported data)=
extern Type chartype;
extern Type doubletype;
extern Type floattype;
extern Type inttype;
extern Type longdouble;
extern Type longtype;
extern Type shorttype;
extern Type signedchar;
extern Type unsignedchar;
extern Type unsignedlong;
58 CHAPTER 4 • TYPES
addressed 179
(typeinit 58)=
#define xx(v,name,op,metrics) { \
...
(typeinit 58)+=
{
Symbol p;
p = install(string("void"), &types, GLOBAL, PERM);
voidtype = type(VOID, NULL, 0, 0, p);
p->type = voidtype;
}
typeinit installs the symbol-table entries into the types table defined
in Section 3.2. This table holds entries for all types that are named by
identifiers or tags. The basic types are installed by typeinit and are
never removed. But the types associated with structure, union, and
enumeration tags must be removed from typetable when their associated
symbol-table entries are removed from types by exitscope. exitscope
calls rmtypes(lev) to remove from typetable any types whose
u.sym->scope is greater than or equal to lev:
(types.c data)+=
static int maxlevel;
(types.c functions)+=
void rmtypes(lev) int lev; {
    if (maxlevel >= lev) {
        int i;
        maxlevel = 0;
        for (i = 0; i < NELEMS(typetable); i++) {
            (remove types with u.sym->scope >= lev 59)
        }
    }
}
The value of maxlevel is the largest value of u.sym->scope for any type
in typetable that has an associated symbol-table entry. rmtypes uses
maxlevel to avoid scanning typetable in the frequently occurring case
when none of the symbol-table entries have scopes greater than or equal
to lev. Removing the types also recomputes maxlevel:
(remove types with u.sym->scope >= lev 59)= 59
struct entry *tn, **tq = &typetable[i];
while ((tn = *tq) != NULL)
    if (tn->type.op == FUNCTION)
        tq = &tn->link;
    else if (tn->type.u.sym && tn->type.u.sym->scope >= lev)
        *tq = tn->link;
    else {
        (recompute maxlevel 60)
        tq = &tn->link;
    }
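The removal loop keeps a pointer to the link field that refers to the current entry, which lets it unlink an element without tracking a separate predecessor. The sketch below applies the same idiom to a plain list of scope levels. struct cell and remove_at_or_above are hypothetical names, and the returned maximum only mirrors the way maxlevel is refreshed from the surviving entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list cell standing in for a hash-chain entry. */
struct cell {
    int scope;
    struct cell *link;
};

/* Unlink every cell whose scope is >= lev; return the largest scope
   that remains (0 if nothing remains). */
int remove_at_or_above(struct cell **head, int lev) {
    struct cell *c, **q = head;
    int max = 0;

    assert(head != NULL);
    while ((c = *q) != NULL)
        if (c->scope >= lev)
            *q = c->link;            /* bypass c; q stays on the same link */
        else {
            if (c->scope > max)
                max = c->scope;      /* survivor: fold into the new maximum */
            q = &c->link;            /* advance to the next link field */
        }
    return max;
}
```

Because q always addresses the link that points at the current cell, removing the first cell and removing an interior cell are the same assignment.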
which, given a type ty, returns (POINTER ty). The symbol-table entry
associated with pointer types is assigned to pointersym during
initialization, and the type for void* is initialized by calling ptr:
(types.c data)+=
static Symbol pointersym;
(typeinit 58)+=
pointersym = install(string("T*"), &types, GLOBAL, PERM);
pointersym->addressed = IR->ptrmetric.outofline;
voidptype = ptr(voidtype);
While ptr builds a pointer type, deref dereferences it; that is, it returns
the reference type. Given a type (POINTER ty), deref returns ty:
(types.c functions)+=
Type deref(ty) Type ty; {
    if (isptr(ty))
        ty = ty->type;
    else
        error("type error: %s\n", "pointer expected");
    return isenum(ty) ? unqual(ty)->type : ty;
}
deref, like some of the other constructors below, issues errors for invalid
operands. Technically, these kinds of tests are part of type-checking, not
type construction, but putting these tests in the constructors simplifies
the type-checking code and avoids oversights. The last line of deref
handles pointers to enumerations: dereferencing a pointer to an enu-
meration must return its associated unqualified integral type. unqual is
described above.
array(ty, n, a) builds the type (ARRAY n ty). It also arranges for
the resulting type to have alignment a or, if a is 0, the alignment of ty.
array also checks for illegal operands.
(types.c functions)+=
Type array(ty, n, a) Type ty; int n, a; {
    if (isfunc(ty)) {
        error("illegal type 'array of %t'\n", ty);
        return array(inttype, n, 0);
    }
    if (level > GLOBAL && isarray(ty) && ty->size == 0)
        error("missing array size\n");
    if (ty->size == 0) {
        if (unqual(ty) == voidtype)
            error("illegal type 'array of %t'\n", ty);
        else if (Aflag >= 2)
            warning("declaring type 'array of %t' is undefined\n", ty);
    } else if (n > INT_MAX/ty->size) {
        error("size of 'array of %t' exceeds %d bytes\n",
            ty, INT_MAX);
        n = 1;
    }
    return type(ARRAY, ty, n*ty->size,
        a ? a : ty->align, NULL);
}
    if (isarray(ty))
        ty = type(ARRAY, qual(op, ty->type), ty->size,
            ty->align, NULL);
    else if (isfunc(ty))
        warning("qualified function type ignored\n");
    else if (isconst(ty)    && op == CONST
    ||       isvolatile(ty) && op == VOLATILE)
        error("illegal type '%k %t'\n", op, ty);
    else {
        if (isqual(ty)) {
            op += ty->op;
            ty = ty->type;
        }
        ty = type(op, ty, ty->size, ty->align, NULL);
    }
    return ty;
}
freturn is to function types what ptr is to pointer types. It takes a type
(FUNCTION ty) and dereferences it to yield ty, the type of the return
value.
(types.c functions)+=
Type freturn(ty) Type ty; {
    if (isfunc(ty))
        return ty->type;
    error("type error: %s\n", "function expected");
    return inttype;
}
The predicate variadic tests whether a function type has a variable-length
argument list by looking for the type void at the end of its
prototype:
(types.c functions)+=
int variadic(ty) Type ty; {
    if (isfunc(ty) && ty->u.f.proto) {
        int i;
        for (i = 0; ty->u.f.proto[i]; i++)
cfields and vfields are both one if the structure or union type has any
const-qualified or volatile-qualified fields. flist points to a list of field
structures threaded through their link fields:
(types.c typedefs)+=
typedef struct field *Field;
name holds the field name, type is the field's type, and offset is the byte
offset to the field in an instance of the structure.
When a field describes a bit field, the type field is either inttype
or unsignedtype, because those are the only two types allowed for bit
fields. The lsb field is nonzero and the following macros apply. lsb is
the number of the least significant bit in the bit field plus one, where bit
numbers start at zero with the least significant bit.
(types.c exported macros)+=
#define fieldsize(p)  (p)->bitsize
#define fieldright(p) ((p)->lsb - 1)
#define fieldleft(p)  (8*(p)->type->size - \
                       fieldsize(p) - fieldright(p))
#define fieldmask(p)  (~(~(unsigned)0<<fieldsize(p)))
fieldsize returns the bitsize field, which holds the size of the bit field
in bits. fieldright is the number of bits to the right of a bit field, and is
used to shift the field over to the least significant bits of a signed or
unsigned integer. Likewise, fieldleft is the number of bits to the left of a
field; it is used when a signed bit field must be sign-extended. fieldmask
is a mask of bitsize ones and is used to clear the extraneous bits when
a bit field is extracted. Notice that this representation for bit fields does
not depend on the target's endianness; the same representation is used
for both big and little endians.
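As a worked illustration of how these quantities combine, the sketch below extracts unsigned and signed bit fields from a 32-bit word with the same shift-and-mask arithmetic. struct bitfield and the helper names are assumptions made for the example, and the word width is fixed at 32 bits for simplicity. The signed case uses a portable xor-and-subtract sign extension in place of the left-shift and right-shift pair that fieldleft supports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a field record: lsb is the number of the
   field's least significant bit plus one, bitsize its width in bits. */
struct bitfield {
    int lsb;
    int bitsize;
};

#define WORD_BITS 32

int bf_right(const struct bitfield *p) { return p->lsb - 1; }

int bf_left(const struct bitfield *p) {
    return WORD_BITS - p->bitsize - bf_right(p);
}

uint32_t bf_mask(const struct bitfield *p) {
    /* bitsize ones; the full-width case avoids an overlong shift */
    return p->bitsize >= WORD_BITS ? ~(uint32_t)0
                                   : ~(~(uint32_t)0 << p->bitsize);
}

/* Unsigned extraction: shift the field down, then clear the extra bits. */
uint32_t extract_unsigned(uint32_t word, const struct bitfield *p) {
    assert(p->lsb >= 1 && p->bitsize >= 1 && bf_left(p) >= 0);
    return (word >> bf_right(p)) & bf_mask(p);
}

/* Signed extraction: isolate the field, then sign-extend it. */
int32_t extract_signed(uint32_t word, const struct bitfield *p) {
    uint32_t v = extract_unsigned(word, p);
    uint32_t sign = (uint32_t)1 << (p->bitsize - 1);
    return (int32_t)((int64_t)(v ^ sign) - (int64_t)sign);
}
```

For a 4-bit field occupying bits 4 through 7 (lsb is 5), the word 0xA0 holds the bit pattern 1010, which reads as 10 unsigned and as -6 signed.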
newstruct creates a new type, (STRUCT ["tag"]) or (UNION ["tag"]),
where tag is the tag. It's called by structdcl whenever a new structure
or union type is declared or defined, with or without a field list. When
a new structure or union type is created, its tag is installed in the types
table. Tags are generated for anonymous structures and unions; that is,
those without tags:
(types.c functions)+=
Type newstruct(op, tag) int op; char *tag; {
    Symbol p;
    if (*tag == 0)
        tag = stringd(genlabel(1));
    else
        (check for redefinition of tag 67)
    p = install(tag, &types, level, PERM);
    p->type = type(op, NULL, 0, 0, p);
    if (p->scope > maxlevel)
        maxlevel = p->scope;
    p->src = src;
    return p->type;
}
Installing a new tag in types might create an entry with a scope that
exceeds maxlevel, so maxlevel is adjusted if necessary. Structure types
point to their symbol-table entries, which point back to the type, so that
tags can be mapped to types and vice versa. Tags are mapped to types
when they are used in declarators, for example; see structdcl. Types
are mapped to tags when rmtypes removes them from the typetable.
It's illegal to define the same tag more than once in the same scope,
but it is legal to declare the same tag more than once. Giving a structure
declaration with fields declares and defines a structure tag; using a
structure tag without giving its fields declares the tag. For example,
struct employee {
    char *name;
    struct date *hired;
    char ssn[9];
}
declares and defines employee but only declares date. When a tag is
defined, its defined flag is lit, and defined is examined to determine if
the tag is being redefined:
(check for redefinition of tag 67)= 67
if ((p = lookup(tag, types)) != NULL && (p->scope == level
|| p->scope == PARAM && level == PARAM+1)) {
    if (p->type->op == op && !p->defined)
        return p->type;
    error("redefinition of '%s' previously defined at %w\n",
        p->name, &p->src);
}
Arguments and argument types have scope PARAM, and locals have scopes
beginning at PARAM+1. ANSI C specifies that arguments and top-level
locals are in the same scope, so the scope test must test for a local
tag that redefines a tag defined by an argument. This division is not
mandated by the ANSI C Standard; it's used internally by lcc to separate
parameters and locals so that foreach can visit them separately.
newfield adds a field with type fty to a structure type ty by allocating
a field structure and appending it to the field list in ty's symbol-table
entry:
(types.c functions)+=
Field newfield(name, ty, fty) char *name; Type ty, fty; {
    Field p, *q = &ty->u.sym->u.s.flist;
    if (name == NULL)
        name = stringd(genlabel(1));
    for (p = *q; p; q = &p->link, p = *q)
        if (p->name == name)
            error("duplicate field name '%s' in '%t'\n",
                name, ty);
    NEW0(p, PERM);
    *q = p;
    p->name = name;
    p->type = fty;
    return p;
}
fields 280 If name is null, newfield generates a name; this capability is used by
foreach 41 fields for unnamed bit fields. Field lists are searched by fieldref; see
genlabel 45 Exercise 4.6.
identifiers 41
NEWO 24
Enumeration types are like structure and union types, except that they
newstruct 67 don't have fields, and their type fields give their associated integral type,
PERM 97 which for 1 cc is always i nttype. The standard permits compilers to use
stringd 29 any integral type that can hold all of the enumeration values, but many
symbol 37 compilers always use ints; 1 cc does likewise to maintain compability.
Enumeration types have a type field so that 1 cc could use different in-
tegral types for different enumerations. Enumeration types are created
by calling newstruct with the operator ENUM, and newstruct returns the
type (ENUM ["tag"]).
Like a structure or union type, the u.sym field of an enumeration type
points to a symbol-table entry for its tag, but it uses a different
component of the symbol structure:
(enum types 68)= 38
Symbol *idlist;
idlist points to a null-terminated array of Symbols for the enumeration
constants associated with the enumeration type. These are installed in
the identifiers table, and each one carries its value:
4.7 Type-Checking Functions
Determining when two types are compatible is the crux of type check-
ing, and the functions described here help to implement ANSI C's type-
checking rules.
eqtype returns one if two types are compatible and zero otherwise.
(types.c functions)+=
int eqtype(ty1, ty2, ret) Type ty1, ty2; int ret; {
    if (ty1 == ty2)
        return 1;
    if (ty1->op != ty2->op)
        return 0;
    switch (ty1->op) {
    case CHAR: case SHORT: case UNSIGNED: case INT:
    case ENUM: case UNION: case STRUCT:   case DOUBLE:
        return 0;
    case POINTER:  (check for compatible pointer types 70)
    case VOLATILE: case CONST+VOLATILE:
    case CONST:    (check for compatible qualified types 70)
    case ARRAY:    (check for compatible array types 70)
    case FUNCTION: (check for compatible function types 70)
    }
}
The third argument, ret, is the value returned when either ty1 or ty2 is
an incomplete type.
A type is always compatible with itself. type ensures that there is only
one instance of most types, so many tests of compatible types pass the
first test in eqtype. Likewise, many tests of incompatible types test types
with different operators, which are never compatible and cause eqtype
to return zero.
If two different types have the same operator CHAR, SHORT, UNSIGNED,
or INT, the two types represent different types, such as unsigned short
and signed short, and are incompatible. Similarly, two enumeration,
structure, or union types are compatible only if they are the same type.
The remaining cases traverse the type structures to determine compat-
ibility. For example, two pointer types are compatible if their referenced
types are compatible:
The easy case is when both functions have a prototype. The prototypes
must both have the same number of argument types, and the unqualified
versions of the types in each prototype must be compatible.
(check for compatible prototypes 71)= 70
for ( ; *p1 && *p2; p1++, p2++)
    if (eqtype(unqual(*p1), unqual(*p2), 1) == 0)
        return 0;
if (*p1 == NULL && *p2 == NULL)
    return 1;
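The prototype check walks two null-terminated arrays in step and succeeds only if both end at the same position. A small self-contained sketch of that pattern follows. lists_compatible and same_pointer are hypothetical names, and the element test is a caller-supplied stand-in for the real compatibility check.

```c
#include <assert.h>
#include <stddef.h>

/* Trivial element test used for the illustration: identity. */
int same_pointer(const void *a, const void *b) {
    return a == b;
}

/* Compare two NULL-terminated lists element by element using `same`. */
int lists_compatible(const void *const *p1, const void *const *p2,
                     int (*same)(const void *, const void *)) {
    assert(p1 != NULL && p2 != NULL && same != NULL);
    for ( ; *p1 && *p2; p1++, p2++)
        if (!same(*p1, *p2))
            return 0;
    /* compatible only if both lists ran out together */
    return *p1 == NULL && *p2 == NULL;
}
```

A list that is a proper prefix of the other fails the final test, which is how a difference in argument counts is detected.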
The other case is more complicated. Each argument type in the one
function type that has a prototype must be compatible with the type that
results from applying the default argument promotions to the unqualified
version of the type itself. Also, if the function type with a prototype has a
variable number of arguments, the two function types are incompatible.
(check if prototype is upward compatible 71)= 70
if (variadic(p1 ? ty1 : ty2))
    return 0;
if (p1 == NULL)
    p1 = p2;
for ( ; *p1; p1++) {
    Type ty = unqual(*p1);
    if (promote(ty) != ty || ty == floattype)
        return 0;
}
return 1;
The default argument promotions stipulate that floats are promoted to
doubles and that small integers and enumerations are promoted to ints
or unsigneds. The code above checks the float promotion explicitly, and
calls promote for the others. promote implements the integral
promotions:
(types.c functions)+=
Type promote(ty) Type ty; {
    ty = unqual(ty);
    if (isunsigned(ty) || ty == longtype)
        return ty;
    else if (isint(ty) || isenum(ty))
        return inttype;
    return ty;
}
int x[];
int x[10];
The first declaration associates the type (ARRAY 0 (INT)) with x. The
second declaration forms the new type (ARRAY 10 (INT)). These two
types are combined to form the type (ARRAY 10 (INT)), which becomes
the type of x. Combining these two types uses the size of the second
type in the composite type. Another example is combining a function
type with a prototype with a function type without one.
compose accepts two compatible types and returns the composite type.
compose is similar in structure to eqtype and the easy cases are similar.
(types.c functions)+=
Type compose(ty1, ty2) Type ty1, ty2; {
    if (ty1 == ty2)
        return ty1;
    switch (ty1->op) {
    case POINTER:
        return ptr(compose(ty1->type, ty2->type));
    case CONST+VOLATILE:
        return qual(CONST, qual(VOLATILE,
            compose(ty1->type, ty2->type)));
    case CONST: case VOLATILE:
        return qual(ty1->op, compose(ty1->type, ty2->type));
    case ARRAY:    { (compose two array types 72) }
    case FUNCTION: { (compose two function types 72) }
    }
}
Two compatible array types form a new array whose size is the size
of the complete array, if there is one.
(compose two array types 72)= 72
Type ty = compose(ty1->type, ty2->type);
if (ty1->size && ty1->type->size && ty2->size == 0)
    return array(ty, ty1->size/ty1->type->size, ty1->align);
if (ty2->size && ty2->type->size && ty1->size == 0)
    return array(ty, ty2->size/ty2->type->size, ty2->align);
return array(ty, 0, 0);
The composite type of two compatible function types has a return type
that is the composite type of the two return types, and argument types
that are the composite types of the corresponding argument types. If
one function type does not have a prototype, the composite type has the
prototype from the other function type.
(compose two function types 72)= 72
Type *p1 = ty1->u.f.proto, *p2 = ty2->u.f.proto;
4. 8 • TYPE MAPPING 73
Exercises
4.1 Give the parenthesized prefix form for the types in the following
declarations.
long double d;
char ***p;
const int *const volatile *q;
int (*r)[10][4];
struct tree *(*s[])(int, struct tree*, struct tree *);
4.2 Give an example of a C structure definition that draws the tag re-
definition diagnostic described in Section 4.6.
4.3 Implement the predicate
(types.c exported functions)= 75
extern int hasproto ARGS((Type));
which returns one if ty includes no function types or if all of the
function types it includes have prototypes, and zero otherwise.
hasproto is used to warn about missing prototypes. It doesn't warn
about missing prototypes in structure fields that are function point-
ers, because it's called explicitly with the types of the fields as the
structure is parsed.
4.4 lcc prints an English rendition of types in diagnostics. For example,
the types of sprintf, shown in Section 4.5, and of
char *(*strings)[10]
are printed as
The output functions interpret the printf-style code %t to print the
next Type argument, and call
(types.c exported functions)+=
extern void outtype ARGS((Type));
to do so. Implement outtype.
4.5 types.c exports three other functions that format and print types.
(types.c exported functions)+=
extern void printdecl ARGS((Symbol p, Type ty));
extern void printproto ARGS((Symbol p, Symbol args[]));
extern char *typestring ARGS((Type ty, char *id));
typestring returns a C declaration that specifies ty to be the type
of the identifier id. For example, if ty is
4.9 Explain why lcc insists that the definition of isdigit in the C
program below conflicts with the external declaration of isdigit.
5
Code Generation Interface
5.2 • INTERFACE RECORDS 79
Metrics floatmetric;
Metrics doublemetric;
Metrics ptrmetric;
Metrics structmetric;
ptrmetric describes pointers of all types. The alignment of a structure
is the maximum of the alignments of its fields and structmetric.align,
which thus gives the minimum alignment for structures; structmetric's
size field is unused. Back ends usually set outofline to zero only for
those types whose values can appear as immediate operands of
instructions.
The size and alignment for characters must be one. The front end
correctly treats signed and unsigned integers and longs as distinct types,
but it assumes that they all share intmetric. Likewise for doubles and
long doubles. Each pointer must fit in an unsigned integer.
5.3 Symbols
A symbol represents a variable, label, or constant; the scope field tells
which. For variables and constants, the back end may query the type
field to learn the data type suffix of the item. For variables and labels, the
floating-point value of the ref field approximates the number of times
that variable or label is referenced; a nonzero value thus indicates that
the variable or label is referenced at least once. For labels, constants,
and some variables, a field of the union u supplies additional data.
Variables have a scope equal to GLOBAL, PARAM, or LOCAL+k for nesting
level k. sclass is STATIC, AUTO, EXTERN, or REGISTER. The name of most
variables is the name used in the source code. For temporaries and other
generated variables, name is a digit sequence. For global and static
variables, u.seg gives the logical segment in which the variable is defined.
If the interface flag wants_dag is zero, the front end generates explicit
temporary variables to hold common subexpressions - those used more
than once. It sets the u.t.cse fields of these symbols to the dag nodes
that compute the values stored in them.
The flags temporary and generated are set for temporaries, and the
flag generated is set for labels and other generated variables, like those
that hold string literals. structarg identifies structure parameters when
the interface flag wants_argb is zero; the material below on wants_argb
elaborates.
Labels have a scope equal to LABELS. The u.l.label field is a unique
numeric value that identifies the label, and name is the string
representation of that value. Labels have no type or sclass.
Constants have a scope equal to CONSTANTS, and an sclass equal to
STATIC. For an integral or pointer constant, name is its string
representation as a C constant. For other types, name is undefined. The actual
value of the constant is stored in the u.c.v field, which is defined on
5.4 Types
Symbols have a type field. If the symbol represents a constant or
variable, the type field points to a structure that describes the type of the
item. Back ends may read the size and align fields of this structure to
learn the size and alignment constraints of the type in bytes. Back ends
may also pass the type pointer itself to predicates like isarray and ttob
to learn about the type without examining other fields.
short op;
short count;
Symbol syms[3];
Node kids[2];
Node link;
Xnode x;
} ;
The elements of kids point to the operand nodes. Some dag operators
also take one or two symbol-table pointers as operands; these appear in
syms. The back end may use the third syms for its own purposes; the
front end uses it, too, but its uses are temporary and occur before dags
are passed to the back end, as detailed in Section 12.8. link points to
the root of the next dag in the forest.
count records the number of times the value of this node is used or
referred to by others. Only references from kids count; link references
don't count because they don't represent a use of the value of the node.
Indeed, link is meaningful only for root nodes, which are executed for
side effect, not value. If the interface flag wants_dag is zero, roots always
have a zero count. The generated code for shared nodes - those whose
count exceeds one - must evaluate the node only once; the value is used
count times.
The x field is the back end's extension to nodes. The back end defines
the type Xnode in config.h to hold the per-node data that it needs to
generate code. Chapter 13 describes the fields.
The op field holds a dag operator. The last character of each is a type
suffix from the list in the type definition:
(c.h exported types)+=
enum {
    F=FLOAT,
    D=DOUBLE,
    C=CHAR,
    S=SHORT,
    I=INT,
    U=UNSIGNED,
    P=POINTER,
    V=VOID,
    B=STRUCT
};
For example, the generic operator ADD has the variants ADDI, ADDU, ADDP,
ADDF, and ADDD. These suffixes are defined so that they have the values
1-9.
The operators are defined by
(c.h exported types)+=
enum { (operators 82) };
(operators 82)= 82
CNST=1<<4,
CNSTC=CNST+C,
CNSTD=CNST+D,
CNSTF=CNST+F,
CNSTI=CNST+I,
CNSTP=CNST+P,
CNSTS=CNST+S,
CNSTU=CNST+U,
ARG=2<<4,
ARGB=ARG+B,
ARGD=ARG+D,
ARGF=ARG+F,
ARGI=ARG+I,
ARGP=ARG+P,
The rest of (operators) defines the remaining operators. Table 5.1 lists
each generic operator, its valid type suffixes, and the number of kids and
syms that it uses; multiple values for kids indicate type-specific variants,
which are described below. The notations in the syms column give the
number of syms values and a one-letter code that suggests their uses: 1V
indicates that syms[0] points to a symbol for a variable, 1C indicates that
syms[0] is a constant, and 1L indicates that syms[0] is a label. For 1S,
syms[0] is a constant whose value is a size in bytes; 2S adds syms[1],
which is a constant whose value is an alignment. For most operators,
the type suffix denotes the type of operation to perform and the type of
the result. Exceptions are ADDP, in which an integer operand in kids[0]
is added to a pointer operand in kids[1], and SUBP, which subtracts an
integer in kids[1] from a pointer in kids[0]. The operators for assignment,
comparison, arguments, and some calls return no result; their type
suffixes denote the type of operation to perform.
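Because each generic operator is a multiple of 16 and each suffix is a small integer, a type-specific operator can be split back into its two parts with masks. The sketch below illustrates that encoding. The concrete suffix values are an assumption (chosen only to lie in the range 1-9), and generic_of, suffix_of, and make_op are hypothetical helper names.

```c
#include <assert.h>

/* Type suffixes; these particular values are assumed for illustration. */
enum { F = 1, D, C, S, I, U, P, V, B };

/* Two generic operators, each a multiple of 16. */
enum { CNST = 1 << 4, ARG = 2 << 4 };

/* Generic part of an operator: everything above the low four bits. */
int generic_of(int op) { return op & ~15; }

/* Type suffix of an operator: the low four bits. */
int suffix_of(int op) { return op & 15; }

/* Build a type-specific operator from a generic operator and a suffix. */
int make_op(int generic, int suffix) {
    assert((generic & 15) == 0 && suffix >= 1 && suffix <= 9);
    return generic + suffix;
}
```

With this layout a back end can switch on generic_of(op) to pick an instruction family and on suffix_of(op) to pick the operand type.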
The leaf operators yield the address of a variable or the value of a
constant. syms[0] identifies the variable or constant. The unary operators
accept and yield a number, except for INDIR, which accepts an address
and yields the value at that address. There is no BCOMI; signed integers
are complemented using BCOMU. The binary operators accept two numbers
and yield one.
The type suffix for a conversion operator denotes the type of the re-
sult. For example, CVUI converts an unsigned (U) to a signed integer
(I). Conversions between unsigned and short and between unsigned and
character are unsigned conversions; those between integer and short and
between integer and character are signed conversions. For example, CVSU
converts an unsigned short to an unsigned, and thus clears the high-
order bits. CVSI converts a signed short to a signed integer, and thus
propagates the short's sign to fill the high-order bits.
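The difference between the unsigned and signed widening conversions can be seen by widening the same 16-bit pattern both ways. The sketch below is an illustration in plain C, not generated code. widen_unsigned and widen_signed are hypothetical names, and shorts are assumed to be 16 bits and integers 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned widening: the high-order bits of the result are cleared. */
uint32_t widen_unsigned(uint16_t bits) {
    return (uint32_t)bits;
}

/* Signed widening: the sign bit is copied into the high-order bits. */
int32_t widen_signed(uint16_t bits) {
    int32_t v = bits;                    /* 0..65535 */
    return v >= 0x8000 ? v - 0x10000 : v;
}
```

The pattern 0xFFFE widens to 0x0000FFFE as an unsigned short and to -2, that is 0xFFFFFFFE, as a signed short.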
The front end builds dags or otherwise composes conversions to form
those not in the table. For example, it converts a short to a float by first
converting it to an integer and then to a double. The 16 conversion
operators are represented by arrows in Figure 5.1. Composed conversions
follow the path from the source type to the destination type.
ASGN stores the value of kids[1] into the cell addressed by kids[0].
syms[0] and syms[1] point to symbol-table entries for integer constants
that give the size of the value and its alignment. These are most useful
for ASGNB, which assigns structures and initializes automatic arrays.
JUMPV is an unconditional jump to the address computed by kids[0].
For most jumps, kids[0] is a constant ADDRGP node, but switch statements
compute a variable target, so kids[0] can be an arbitrary computation.
LABEL defines the label given by syms[0], and is otherwise a
no-op. For the comparisons, syms[0] points to a symbol-table entry for
f(char c) { f(c); }
FIGURE 5.1 Conversions.
FIGURE 5.2 Forests for f(char c) { f(c); }
becomes the two forests shown in Figure 5.2. The solid lines are kids
pointers and the dashed line is the link pointer. The left forest holds
one dag, which narrows the widened actual argument to the type of the
formal parameter. In the left dag, the left ADDRFP c refers to the formal
parameter, and the one under the INDIRC refers to the actual argument.
The right forest holds two dags. The first widens the formal parameter
c to pass it as an integer, and the second calls f.
Unsigned variants of ASGN, INDIR, ARG, CALL, and RET were omitted
as unnecessary. Signed and unsigned integers have the same size, so
the corresponding signed operator is used instead. Likewise, there is no
CALLP or RETP. A pointer is returned by using CVPU and RETI. A
pointer-valued function is called by using CALLI and CVUP.
In Table 5.1, the operators listed at and following ASGN are used for
their side effects. They appear as roots in the forest, and their reference
counts are zero. CALLD, CALLF, and CALLI may also yield a value, in
which case they appear as the right-hand side of an ASGN node and have
a reference count of one. With this lone exception, all operators with side
effects always appear as roots in the forest of dags, and they appear in
the order in which they must be executed. The front end communicates
all constraints on evaluation order by ordering the dags in the forest.
If ANSI specifies that x must be evaluated before y, then x's dag will
appear in the forest before y's, or they will appear in the same dag with
x in the subtree rooted by y. An example is
int i, *p; f() { i = *p++; }
The code for the body of f generates the forest shown in Figure 5.3. The
INDIRP fetches the value of p, and the ASGNP changes p's value to the
sum computed by this INDIRP and 4. The ASGNI sets i to the integer
pointed to by the original value of p. Since the INDIRP appears in the
forest before p is changed, the INDIRI is guaranteed to use the original
value of p.
5.6 • INTERFACE FLAGS 87
where the addresses of the bytes increase from the right to the left.
A computer is a big endian if the least significant byte in each word has
the largest address of the bytes in the word. For example, big endians
lay out the word with the unsigned value 0xAABBCCDD thus:
In other words, lcc's front end lays out a list of bit fields in the address-
ing order of the bytes in an unsigned integer: from the least significant
byte to the most significant byte on little endians and vice versa on big
endians. ANSI permits either order, but following increasing addresses
is the prevailing convention.
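The byte order of the host can be checked by storing a known word and inspecting its bytes in address order. The sketch below does this for the value 0xAABBCCDD used above. The function names are hypothetical, a 32-bit word is assumed, and the code reports on the machine running it, not on a compilation target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes of the test word into `bytes` in address order. */
static void word_bytes(unsigned char bytes[4]) {
    uint32_t word = 0xAABBCCDDu;
    assert(sizeof word == 4);
    memcpy(bytes, &word, sizeof word);
}

/* 1 if the least significant byte has the smallest address. */
int is_little_endian(void) {
    unsigned char bytes[4];
    word_bytes(bytes);
    return bytes[0] == 0xDD;
}

/* 1 if the least significant byte has the largest address. */
int is_big_endian(void) {
    unsigned char bytes[4];
    word_bytes(bytes);
    return bytes[3] == 0xDD;
}
```

On the common byte orders exactly one of the two functions returns 1.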
(interface flags 87)+=
unsigned mulops_calls:1;
should be zero if the hardware implements multiply, divide, and remainder.
It should be one if the hardware leaves these operations to library
routines. The front end unnests nested calls, so it needs to know which
operators are emulated by calls. It might become necessary to generalize
this feature to handle other emulated instructions, but no target so far
has needed more.
tells the front end to emit CALLB nodes to invoke functions that return
structures. If wants_callb is zero, the front end generates no CALLB
nodes but implements them itself, using simpler operations: It passes
an extra, leading, hidden argument that points to a temporary; it ends
each structure function with an ASGNB dag that copies the return value
to this temporary; and it has the caller use this temporary when it needs
the structure returned. When wants_callb is one, the front end generates
CALLB nodes. The kids[1] field of a CALLB computes the address
of the location at which to store the return value, and the first local of
any function that returns a structure is assumed to hold this address.
Back ends that set wants_callb to one must implement this convention
by, for example, initializing the address of the first local accordingly. If
wants_callb is zero, the back end cannot control the code for functions
that return structure arguments, so it cannot, in general, mimic an
existing calling convention. In this book, the MIPS and X86 code generators
initialize wants_callb to zero; the front end's implementation of CALLB
happens to be compatible with the calling conventions for the MIPS.
tells the front end to emit ARGB nodes to pass structure arguments. If
wants_argb is zero, the front end generates no ARGB nodes but imple-
ments structure arguments itself using simpler operations: It builds an
ASGNB dag that copies the structure argument to a temporary; it passes a
pointer to the temporary; it adds an extra indirection to references to the
parameter in the callee; and it changes the types of the callee's formals
to reflect this convention. It also sets structarg for these parameters
to distinguish them from bona fide structure pointers. If wants_argb is
zero, the back end cannot control the code for structure arguments, so
it cannot, in general, mimic an existing calling convention. In this book,
the SPARC code generator initializes wants_argb to zero; the others ini-
tialize it to one. The front end's implementation of ARGB is compatible
with the SPARC calling convention.
tells the front end to evaluate and to present the arguments to the back
end left to right. That is, the ARG nodes that precede the CALL appear in
the same order as the arguments in the source code. If left_to_right
is zero, arguments are evaluated and presented right to left. ANSI permits
either order.
(interface flags 87)+=
unsigned wants_dag:1;
tells the front end to pass dags to the back end. If it's zero, the front
end undags all nodes with reference counts exceeding one. It creates a
temporary, assigns the node to the temporary, and uses the temporary
wherever the node had been used. When wants_dag is zero, all refer-
ence counts are thus zero or one, and only trees, which are degenerate
dags, remain; there are no general dags. The code generators in this
book generate code using a method that requires trees, so they initialize
wants_dag to zero, but other code generators for lcc have generated
code from dags.
5.7 Initialization
During initialization, the front end calls
(interface functions 80)+=
void (*progbeg) ARGS((int argc, char *argv[]));
argv[0..argc-1] point to those program arguments that are not recognized
by the front end, and are thus deemed target-specific. progbeg
processes such options and initializes the back end.
At the end of compilation, the front end calls
(interface functions 80)+=
void (*progend) ARGS((void));
to give the back end an opportunity to finalize its output. On some
targets, progend has nothing to do and is empty.
5.8 Definitions
Whenever the front end defines a new symbol with scope CONSTANTS,
LABELS, or GLOBAL, or a static variable, it calls
(interface functions 80)+=
void (*defsymbol) ARGS((Symbol));
to give the back end an opportunity to initialize its Xsymbol field. For
example, the back end might want to use a different name for the sym-
bol. The conventions on some targets in this book prefix an underscore
to global names. The Xsymbol fields of symbols with scope PARAM are
initialized by function, those with scope LOCAL+k by local, and those
that represent address computations by address.
A symbol is exported if it's defined in the module at hand and used
in other modules. It's imported if it's used in the module at hand and
defined in some other module. The front end calls
(interface functions 80)+=
void (*export) ARGS((Symbol));
emits code to define a global variable. The front end will already have
called segment, described below, to direct the definition to the appropriate
logical segment, and it will have set the symbol's u.seg to that
segment. It will follow the call to global with any appropriate calls to
the data initialization functions. global must emit the necessary alignment
directives and define the label.
The front end announces local variables by calling
(interface functions 80)+=
void (*local) ARGS((Symbol));
(c.h exported types)+=
enum { CODE=1, BSS, DATA, LIT };
The front end emits executable code into the CODE segment, defines unini-
tialized variables in the BSS segment, and it defines and initializes ini-
tialized variables in the DATA segment and constants in the LIT segment.
The front end calls
(interface functions 80)+=
void (*segment) ARGS((int));
5.9 Constants
The interface functions
(interface functions 80)+=
void (*defaddress) ARGS((Symbol));
The codes C, S, I, ... are identical to the type suffixes used for the operators.
The signed fields v.sc and v.ss can be used instead of v.uc and
v.us, but defconst must initialize only the specified number of bits. If
5.10 Functions
The front end compiles functions into private data structures. It com-
pletely consumes each function before passing any part of the function
to the back end. This organization permits certain optimizations. For
example, only by processing complete functions can the front end iden-
tify the locals and parameters whose address is not taken; only these
variables may be assigned to registers.
Three interface functions and two front-end functions collaborate to
compile a function.
(interface functions 80)+=
void (*function) ARGS((Symbol, Symbol[], Symbol[], int));
nodes in their x fields to identify the code selected, and returns a pointer
that is ultimately passed to the back end's emit to output the code. Once
the front end calls gen, it does not inspect the contents of the nodes
again, so gen may modify them freely.
emit emits a forest. Typically, it traverses the forest and emits code
by switching on the opcode or some related value stored in the node by
gen.
The MIPS, SPARC, and X86 interfaces are described in Chapters 16, 17,
and 18. The interfaces null and symbolic are described in Exercises 5.1
and 5.2.
5.12 Upcalls
The front and back ends are clients of each other. The front end calls on
the back end to generate and emit code. The back end calls on the front
end to perform output, allocate storage, interrogate types, and manage
nodes, symbols, and strings. The front-end functions that back ends
may call are summarized below. Some of these functions are explained in
previous chapters, but are included here to make this summary complete.
void *allocate(int n, int a) permanently allocates n bytes in the
arena a, which can be one of
(c.h exported types)+=
enum { PERM=0, FUNC, STMT };
and returns a pointer to the first byte. The space is guaranteed to be
aligned to suit the machine's most demanding type. Data allocated in
PERM are deallocated at the end of compilation; data allocated in FUNC
and STMT are deallocated after compiling functions and statements.
points to the next character in the output buffer. The idiom *bp++ = c
thus appends c to the output as shown in outs on page 16. One of the
other output functions, described below, must be called at least once
every 80 characters.
prints its third and following arguments on the file descriptor fd. See
print for formatting details. If fd is not 1 (standard output), fprint
calls outflush to flush the output buffer for fd.
Type freturn(Type ty) is the type of the return value for function
type ty.
....
(c.h exported macros)+=
#define generic(op) ((op)&~15)
is the generic version of the type-specific dag operator op. That is, the
expression generic(op) returns op without its type suffix.
int genlabel(int n) increments the generated-identifier counter by
n and returns its old value.
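A counter with this behavior takes only a few lines. The sketch below is an assumption about how such a function might look, not the front end's code; genlabel_sketch and next_label are hypothetical names, and the starting value of 1 is arbitrary.

```c
#include <assert.h>

/* Next unused label number; the starting value is an assumption. */
static int next_label = 1;

/* Reserves n consecutive label numbers and returns the first of them,
   that is, the counter's value before it was incremented. */
int genlabel_sketch(int n) {
    int first = next_label;
    assert(n > 0);
    next_label += n;
    return first;
}
```

A caller that needs, say, three related labels makes one call with n equal to 3 and uses the returned value, that value plus one, and that value plus two.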
int istype(Type ty) are type predicates that return nonzero if type
ty is a type shown in the table below.
Predicate    Type
isarith      arithmetic
isarray      array
ischar       character
isdouble     double
isenum       enumeration
isfloat      floating
isfunc       function
isint        integral
isptr        pointer
isscalar     scalar
isstruct     structure or union
isunion      union
isunsigned   unsigned
Node newnode(int op, Node l, Node r, Symbol sym) allocates a dag
node; initializes the op field to op, kids[0] to l, kids[1] to r, and
syms[0] to sym; and returns a pointer to the new node.
Symbol newconst(Value v, int t) installs a constant with value v
and type suffix t into the symbol table, if necessary, and returns a pointer
to the symbol-table entry.
Symbol newtemp(int sclass, int t) creates a temporary with storage
class sclass and a type with type suffix t, and returns a pointer
to the symbol-table entry. The new temporary is announced by calling
local.
opindex(op) is the operator number, for operator op:
(c.h exported macros)+=
#define opindex(op) ((op)>>4)
opindex is used to map the generic operators into a contiguous range of
integers.
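The two macros generic and opindex split an operator into its parts. The sketch below assumes an encoding in which the operator number occupies the bits above the low four and the type suffix occupies the low four bits; the numeric values chosen for OP_ADD, SUFFIX_I, and SUFFIX_U are assumptions made for illustration, and optype_sketch is a hypothetical helper.

```c
#include <assert.h>

#define generic(op) ((op)&~15)   /* clear the low four bits (the suffix) */
#define opindex(op) ((op)>>4)    /* keep only the operator number */

/* Hypothetical suffix and operator values, for illustration only. */
enum { SUFFIX_I = 5, SUFFIX_U = 6 };
enum {
    OP_ADD  = 19 << 4,            /* generic operator, no suffix */
    OP_ADDI = OP_ADD + SUFFIX_I,  /* integer addition */
    OP_ADDU = OP_ADD + SUFFIX_U   /* unsigned addition */
};

/* Returns the type suffix, the part that generic() strips off. */
int optype_sketch(int op) {
    assert(op >= 0);
    return op & 15;
}
```

Under these assumptions generic(OP_ADDI) and generic(OP_ADDU) both yield OP_ADD, and opindex of either yields 19, which is suitable as an array index.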
writes the current output buffer to the standard output, if it's not empty.
void outs(char *s) appends string s to the output buffer for standard
output, and calls outflush if the resulting buffer pointer is within
80 characters of the end of the buffer.
void print(char *fmt, ...) prints its second and following arguments
on standard output. It is like printf but supports only the formats
%c, %d, %o, %x, and %s, and it omits precision and field-width
specifications. print supports four lcc-specific format codes. %S prints a
string of a specified length; the next two arguments give the string and
its length. %k prints an English rendition of the integer token code given
by the corresponding argument, and %t prints an English rendition of a
type. %w prints the source coordinates given by its corresponding argument,
which must be a pointer to a Coordinate. print calls outflush
if it prints a newline character from fmt within 80 characters of the end
of the output buffer. Each format except %c does the actual output with
outs, which may also flush the buffer.
int roundup(int n, int m) is n rounded up to the next multiple of
m, which must be a power of two.
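The power-of-two restriction is what allows rounding with a mask instead of a division. A minimal sketch, written as a function named roundup_sketch (a hypothetical name; the text does not show how roundup itself is defined):

```c
#include <assert.h>

/* Rounds n up to the next multiple of m. Because m is a power of two,
   m-1 is a mask of the low-order bits; adding it and then clearing
   those bits yields the next multiple. */
int roundup_sketch(int n, int m) {
    assert(m > 0 && (m & (m - 1)) == 0);   /* m must be a power of two */
    return (n + (m - 1)) & ~(m - 1);
}
```

For example, rounding 13 up to a multiple of 8 computes (13 + 7) & ~7, which is 16, while 16 itself is left unchanged.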
char *string(char *s) installs s in the string table, if necessary, and
returns a pointer to the installed copy.
char *stringd(int n) returns the string representation of n; stringd
installs the returned string in the string table.
Further Reading
Fraser and Hanson (1991a and 1992) describe the earlier versions of
lcc's code generation interface. This chapter is more detailed, and
corresponds to version 3.1 and above of lcc.
Some compiler interfaces emit abstract machine code, which resembles
an assembler code for a fictitious machine (Tanenbaum, van Staveren,
and Stevenson 1982). The front end emits code for the abstract machine,
which the back end reads and translates to target code. Abstract
machines decouple the front and back ends, and make it easy to insert
extra optimization passes, but the extra I/O and structure allocation and
initialization take time. lcc's tightly coupled interface yields efficient,
compact compilers, but it can complicate maintenance because changes
to the front end may affect the back ends. This complication is less
important for standardized languages like ANSI C because there will be few
changes to the language.
Exercises
5.1 lcc can be turned into a syntax and semantics checker by writing a
null code generator whose interface record points to functions that
do nothing. Implement this interface.
5.2 Implement a symbolic back end that generates a trace of the inter-
face functions as they are called and a readable representation of
their arguments. As an example, the output of the symbolic back
end that comes with lcc for
is
export f
segment text
function f type=int function(void) class=auto ...
maxoffset=0
node#2 ADDRGP count=2 p
node`1 INDIRP count=2 #2
node#5 CNSTI count=1 4
node#4 ADDP count=1 #1 #5
node`3 ASGNP count=0 #2 #4 4 4
node#7 ADDRGP count=1 i
node#8 INDIRI count=1 #1
node`6 ASGNI count=0 #7 #8 4 4
1:
end f
segment bss
export p
global p type=pointer to int class=auto ...
space 4
export i
global i type=int class=auto scope=GLOBAL ref=1000
space 4
All of the interface routines in this back end echo their arguments
and some provide additional information. For example, function
computes a frame size, which it prints as the value of maxoffset
as shown above. gen and emit collaborate to print dags as shown
above. gen numbers the nodes in each forest (by annotating their x
fields), and emit prints these numbers for node operands. emit also
identifies roots by prefixing their numbers with accents graves, as
shown for nodes 1, 3, and 6 in the first forest above. For a LABELV
node, emit prints a line with just the label number and a colon.
Compare this output with the linearized representation shown on
page 95.
5.3 Write a code generator that simply emits the names of all identifiers
visible to other modules, and reports those imported names that
are not used.
5.4 When lcc's interface was designed, 32-bit integers were the norm,
so nothing was lost by having integers and longs share one metric.
Now, many machines support 32-bit and 64-bit integers, and our
shortcut complicates using both data types in the same code generator.
How would adding two new type suffixes - L for long and
O for unsigned long - change lcc's interface? Consider the effect
on the type metrics, the node operators in general, and the conversion
operators in particular. Redraw Figure 5.1. Which interface
functions would have to change? How?
5.5 Design an abstract machine consistent with lcc's interface, and use
it to separate lcc's front end from its back end. Write a code generator
that emits code for your abstract machine. Adapt lcc's back
end to read your abstract machine code, rebuild the data structures
that the back end uses now, and call the existing back end to generate
code. This exercise might take a month or so, but the flexibility
to read abstract-machine code, optimize it, and write it back out
would simplify experimenting with optimizers.
6
Lexical Analysis
The lexical analyzer reads source text and produces tokens, which are
the basic lexical units of the language. For example, the expression
*ptr = 56; contains 10 characters or five tokens: *, ptr, =, 56, and
;. For each token, the lexical analyzer returns its token code and zero
or more associated values. The token codes for single-character tokens,
such as operators and separators, are the characters themselves. Defined
constants (with values that do not collide with the numeric values of sig-
nificant characters) are used for the codes of the tokens that can consist
of one or more characters, such as identifiers and constants.
For example, the statement *ptr = 56; yields the token stream shown
on the left below; the associated values, if there are any, are shown on
the right.
'*'
ID      "ptr"   symbol-table entry for "ptr"
'='
ICON    "56"    symbol-table entry for 56
The token codes for the operators * and = are the operators themselves,
i.e., the numeric values of * and =, respectively, and they do not have
associated values. The token code for the identifier ptr is the value of
the defined constant ID, and the associated values are the saved copy
of the identifier string itself, i.e., the string returned by stringn, and a
symbol-table entry for the identifier, if there is one. Likewise, the integer
constant 56 returns ICON, and the associated values are the string "56"
and a symbol-table entry for the integer constant 56.
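The token-code convention can be seen in a toy scanner. The sketch below is not lcc's gettok: next_token_sketch is a hypothetical function, the values 256 and 257 for TOK_ID and TOK_ICON are assumptions chosen only so that they cannot collide with character codes, and the associated values (strings and symbol-table entries) are omitted.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical codes for multicharacter tokens. */
enum { TOK_EOF = 0, TOK_ID = 256, TOK_ICON = 257 };

/* Scans one token starting at *pp, advances *pp past it, and returns
   its code. Single-character tokens return the character itself. */
int next_token_sketch(const char **pp) {
    const unsigned char *p;
    int code;

    assert(pp != 0 && *pp != 0);
    p = (const unsigned char *)*pp;
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                                /* skip white space */
    if (*p == '\0') {
        code = TOK_EOF;
    } else if (isalpha(*p) || *p == '_') {
        while (isalnum(*p) || *p == '_')
            p++;
        code = TOK_ID;
    } else if (isdigit(*p)) {
        while (isdigit(*p))
            p++;
        code = TOK_ICON;
    } else {
        code = *p++;                        /* operator or separator */
    }
    *pp = (const char *)p;
    return code;
}
```

Applied to the statement *ptr = 56; this scanner returns '*', TOK_ID, '=', TOK_ICON, and ';' in turn, which mirrors the five tokens counted in the text.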
Keywords, such as "for," are assigned their own token codes, which
distinguish them from identifiers.
The lexical analyzer also tracks the source coordinates for each token.
These coordinates, defined in Section 3.1, give the file name, line number,
and character index within the line of the first character of the token.
Coordinates are used to pinpoint the location of errors and to remember
where symbols are defined.
The lexical analyzer is the only part of the compiler that looks at each
character of the source text. It is not unusual for lexical analysis to ac-
count for half the execution time of a compiler. Hence, speed is impor-
tant. The lexical analyzer's main activity is moving characters, so mini-
mizing the amount of character movement helps increase speed. This is
done by dividing the lexical analyzer into two tightly coupled modules.
The input module, input.c, reads the input in large chunks into a buffer,
and the recognition module, lex.c, examines the characters to recognize
tokens.
6.1 Input
In most programming languages, input is organized in lines. Although
in principle, there is rarely a limit on line length, in practice, line length
is limited. In addition, tokens cannot span line boundaries in most lan-
guages, so making sure complete lines are in memory when they are
being examined simplifies lexical analysis at little expense in capability.
String literals are the one exception in C, but they can be handled as a
special case.
The input module reads the source in large chunks, usually much
larger than individual lines, and it helps arrange for complete tokens
to be present in the input buffer when they are being examined, except
identifiers and string literals. To minimize the overhead of accessing the
input, the input module exports pointers that permit direct access to the
input buffer:
(input.c exported data)+=
extern unsigned char *cp;
extern unsigned char *limit;
[Diagram: the input buffer, with cp and limit marking positions in it.]
where shading depicts the characters that have yet to be consumed and
\n represents the newline. If fillbuf is called, it slides the unconsumed
tail of the input buffer down and refills the buffer. The resulting state is
[Diagram: the input buffer after fillbuf has moved the tail and refilled it, with cp and limit.]
where the darker shading differentiates the newly read characters from
those moved by fillbuf. When a call to fillbuf reaches the end of the
input, the buffer's state becomes
[Diagram: the input buffer at the end of the input, with cp and limit.]
Finally, when nextline is called for the last sentinel at *limit, fillbuf
sets cp equal to limit, which indicates end of file (after the first call to
nextline). This final state is
[Diagram: the final state, with cp and limit at the same position.]
Input is read from the file descriptor given by infd; the default is zero,
which is the standard input. file is the name of the current input file;
line gives the location of the beginning of the current line, if it were
to fit in the buffer; and lineno is the line number of the current line.
The coordinates f, x, y of the token that begins at cp, where f is the file
name, are thus given by file, cp-line, and lineno, where characters in
the line are numbered beginning with zero. line is used only to compute
the x coordinate, which counts tabs as single characters. firstfile
gives the name of the first source file encountered in the input; it's used
in error messages.
The input buffer itself is hidden inside the input module:
(input.c exported macros)=
#define MAXLINE 512
#define BUFSIZE 4096
(input.c data)=
static int bsize;
static unsigned char buffer[MAXLINE+1 + BUFSIZE+1];
BUFSIZE is the size of the input buffer into which characters are read,
and MAXLINE is the maximum number of characters allowed in an unconsumed
tail of the input buffer. fillbuf must not be called if limit-cp
is greater than MAXLINE. The standard specifies that compilers need not
handle lines that exceed 509 characters; lcc handles lines of arbitrary
length, but, except for identifiers and string literals, insists that tokens
not exceed 512 characters.
The value of bsize encodes three different input states: If bsize is less
than zero, no input has been read or a read error has occurred; if bsize
is zero, the end of input has been reached; and bsize is greater than
zero when bsize characters have just been read. This rather complicated
encoding ensures that lcc is initialized properly and that it never tries
to read past the end of the input.
inputInit initializes the input variables and fills the buffer:
(input.c functions)+=
void nextline() {
    do {
        if (cp >= limit) {
            (refill buffer 106)
            if (cp == limit)
                return;
        } else
            lineno++;
        for (line = (char *)cp; *cp==' ' || *cp=='\t'; cp++)
If cp is still equal to limit after filling the buffer, the end of the file has
been reached. The do-while loop advances cp to the first nonwhite-space
character in the line, treating sentinel newlines as white space. The last
four lines of nextline check for resynchronization directives emitted by
the preprocessor; see Exercise 6.2. inputInit and nextline call fillbuf
to refill the input buffer:
(refill buffer 106)=
fillbuf();
if (cp >= limit)
    cp = limit;
If the input is exhausted, cp will still be greater than or equal to limit
when fillbuf returns, which leaves these variables set as shown in the
last diagram on page 104. fillbuf does all of the buffer management
and the actual input:
(input.c functions)+=
void fillbuf() {
    if (bsize == 0)
        return;
    if (cp >= limit)
        cp = &buffer[MAXLINE+1];
    else
        (move the tail portion 107)
    bsize = read(infd, &buffer[MAXLINE+1], BUFSIZE);
    if (bsize < 0) {
        error("read error\n");
        exit(1);
    }
    limit = &buffer[MAXLINE+1+bsize];
    *limit = '\n';
}
fillbuf reads the BUFSIZE (or fewer) characters into the buffer beginning
at position MAXLINE+1, resets limit, and stores the sentinel newline.
If the input buffer is empty when fillbuf is called, cp is reset to point
to the first new character. Otherwise, the tail limit-cp characters are
moved so that the last character is in buffer[MAXLINE], and is thus adjacent
to the newly read characters.
(move the tail portion 107)=
{
    int n = limit - cp;
    unsigned char *s = &buffer[MAXLINE+1] - n;
    line = (char *)s - ((char *)cp - line);
    while (cp < limit)
        *s++ = *cp++;
    cp = &buffer[MAXLINE+1] - n;
}
Notice the computation of line: It accounts for the portion of the current
line that has already been consumed, so that cp-line gives the correct
index of the character *cp.
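The buffer discipline can be exercised in isolation. The sketch below follows the layout described above but makes several assumptions: read_sketch and input_init_sketch are hypothetical stand-ins that take input from a string in memory rather than from a file descriptor, MAXLINE and BUFSIZE are shrunk so the sliding is easy to trace, and the line and lineno bookkeeping is left out.

```c
#include <assert.h>
#include <string.h>

#define MAXLINE 8       /* small values chosen for this sketch */
#define BUFSIZE 16

static const char *src_text;    /* hypothetical in-memory "file" */
static size_t src_pos, src_len;
static int bsize = -1;          /* <0: nothing read yet; 0: end of input */
static unsigned char buffer[MAXLINE+1 + BUFSIZE+1];
static unsigned char *cp, *limit;

/* Stand-in for read(): copies up to n bytes of src_text into dst. */
static int read_sketch(unsigned char *dst, int n) {
    size_t left = src_len - src_pos;
    if ((size_t)n > left)
        n = (int)left;
    memcpy(dst, src_text + src_pos, (size_t)n);
    src_pos += (size_t)n;
    return n;
}

void input_init_sketch(const char *text) {
    src_text = text;
    src_pos = 0;
    src_len = strlen(text);
    bsize = -1;
    cp = limit = &buffer[MAXLINE+1];
}

/* Slides the unconsumed tail down so that it ends at buffer[MAXLINE],
   reads the next chunk right after it, and plants the sentinel. */
void fillbuf_sketch(void) {
    if (bsize == 0)
        return;
    if (cp >= limit) {
        cp = &buffer[MAXLINE+1];
    } else {
        int n = (int)(limit - cp);
        unsigned char *s = &buffer[MAXLINE+1] - n;
        assert(n <= MAXLINE);       /* the tail must fit below the chunk */
        memmove(s, cp, (size_t)n);
        cp = s;
    }
    bsize = read_sketch(&buffer[MAXLINE+1], BUFSIZE);
    limit = &buffer[MAXLINE+1+bsize];
    *limit = '\n';
}
```

After a refill, the moved tail and the new chunk are contiguous, so a scanner can run from cp to limit without noticing where one read ended and the next began.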
6.2 Recognizing Tokens
There are two principal techniques for recognizing tokens: building a
finite automaton or writing an ad hoc recognizer by hand. The lexical
structure of most programming languages can be described by regular
expressions, and such expressions can be used to construct a determin-
istic finite automaton that recognizes and returns tokens. The advantage
of this approach is that it can be automated. For example, LEX is a pro-
gram that takes a lexical specification, given as regular expressions, and
generates an automaton and an appropriate interpreting program.
The lexical structure of most languages is simple enough that lexical
analyzers can be constructed easily by hand. In addition, automatically
generated analyzers, such as those produced by LEX, tend to be large
and slower than analyzers built by hand. Tools like LEX are very use-
ful, however, for one-shot programs and for applications with complex
lexical structures.
For C, tokens fall into the six classes defined by the following EBNF
grammar:
token:
keyword
identifier
constant
string-literal
operator
punctuator
punctuator:
one of [ ] ( ) { } * , : = ; ...
White space - blanks, tabs, newlines, and comments - separates some
tokens, such as adjacent identifiers, but is otherwise ignored except in
string literals.
The lexical analyzer exports two functions and four variables:
(lex.c exported functions)=
extern int getchr ARGS((void));
extern int gettok ARGS((void));
(token.h 109)+=
yy(0,      42, 13, MUL, multree,ID,     "*")
yy(0,      43, 12, ADD, addtree,ID,     "+")
yy(0,      44,  1, 0,   0,      ',',    ",")
yy(0,      45, 12, SUB, subtree,ID,     "-")
yy(0,      46,  0, 0,   0,      '.',    ".")
yy(0,      47, 13, DIV, multree,'/',    "/")
xx(DECR,   48,  0, SUB, subtree,ID,     "--")
xx(DEREF,  49,  0, 0,   0,      DEREF,  "->")
xx(ANDAND, 50,  5, AND, andtree,ANDAND, "&&")
xx(OROR,   51,  4, OR,  andtree,OROR,   "||")
xx(LEQ,    52, 10, LE,  cmptree,LEQ,    "<=")
given by the values in the second column. token.h is read to define symbols,
build arrays indexed by token, and so forth, and using it guarantees
that such definitions are synchronized with one another. This technique
is common in assembler language programming.
Single-character tokens have yy lines and multicharacter tokens and
other definitions have xx lines. The first column in xx is the enumeration
identifier. The other columns give the identifier or character value, the
precedence if the token is an operator (Section 8.3), the generic opera-
tor (Section 5.5), the tree-building function (Section 9.4), the token's set
(Section 7.6), and the string representation.
These columns are extracted for different purposes by defining the xx
and yy macros and including token.h again. The enumeration definition
above illustrates this technique; it defines xx so that each expansion de-
fines one member of the enumeration. For example, the xx line for DECR
expands to
DECR=48,
and thus defines DECR to an enumeration constant with the value 48. yy
is defined to have no replacement, which effectively ignores the yy lines.
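The technique of expanding one list of lines under different macro definitions can be tried in a single file. The sketch below is an assumption-laden miniature: a list macro named TOKEN_LINES stands in for the repeated inclusion of token.h, the columns are cut down to name, value, and string, and token_strings, token_name_sketch, and LAST_TOKEN_SKETCH are hypothetical names. The table uses C99 designated initializers.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for token.h: one line per token. */
#define TOKEN_LINES \
    yy(0,      42, "*")  \
    yy(0,      43, "+")  \
    xx(DECR,   48, "--") \
    xx(DEREF,  49, "->") \
    xx(ANDAND, 50, "&&")

/* First expansion: xx defines enumeration members; yy has no
   replacement, so the yy lines are ignored. */
enum {
#define xx(name, val, str) name = val,
#define yy(name, val, str)
    TOKEN_LINES
#undef xx
#undef yy
    LAST_TOKEN_SKETCH
};

/* Second expansion: both macros contribute entries to a table of
   strings indexed by token value. */
static const char *token_strings[] = {
#define xx(name, val, str) [val] = str,
#define yy(name, val, str) [val] = str,
    TOKEN_LINES
#undef xx
#undef yy
};

/* Returns the string for a token code, or "?" if none is recorded. */
const char *token_name_sketch(int tok) {
    int n = (int)(sizeof token_strings / sizeof token_strings[0]);
    assert(tok >= 0);
    return (tok < n && token_strings[tok]) ? token_strings[tok] : "?";
}
```

Because the enumeration and the table come from the same lines, adding a token in one place updates both, which is the synchronization the text describes.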
The global variable t is often used to hold the current token, so most
calls to gettok use the idiom
t = gettok();
token, tsym, and src hold the values associated with the current token,
if there are any. token is the source text for the token itself, and tsym is
a Symbol for some tokens, such as identifiers and constants. src is the
source coordinate for the current token.
gettok could return a structure containing the token code and the
associated values, or a pointer to such a structure. Since most calls to
gettok examine only the token code, this kind of encapsulation does
not add significant capability. Also, gettok is the most frequently called
function in the compiler; a simple interface makes the code easier to
read.
gettok recognizes a token by switching on its first character, which
classifies the token, and consuming subsequent characters that make up
the token. For some tokens, these characters are given by one or more
of the sets defined by map. map[c] is a mask that classifies character c
as a member of one or more of six sets:
(lex.c types)=
enum { BLANK=01, NEWLINE=02, LETTER=04,
       DIGIT=010, HEX=020, OTHER=040 };
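A table of such masks lets a scanner test set membership with one index and one AND. The sketch below fills a table at run time and uses it to measure an identifier; the assignment of characters to sets is an assumption (ASCII, with the underscore counted as a letter), and init_map_sketch and identifier_length_sketch are hypothetical names rather than lcc functions.

```c
#include <assert.h>

enum { BLANK=01, NEWLINE=02, LETTER=04,
       DIGIT=010, HEX=020, OTHER=040 };

static unsigned char map[256];

/* Fills the table once; assumes the ASCII character set. */
void init_map_sketch(void) {
    const char *p;
    int c;
    for (c = 0; c < 256; c++)
        map[c] = 0;
    map[' '] = map['\t'] = map['\f'] = map['\v'] = BLANK;
    map['\n'] = NEWLINE;
    for (c = 'a'; c <= 'z'; c++) map[c] = LETTER;
    for (c = 'A'; c <= 'Z'; c++) map[c] = LETTER;
    map['_'] = LETTER;
    for (c = '0'; c <= '9'; c++) map[c] = DIGIT;
    for (c = 'a'; c <= 'f'; c++) map[c] |= HEX;   /* member of two sets */
    for (c = 'A'; c <= 'F'; c++) map[c] |= HEX;
    for (p = "!\"#%&'()*+,-./:;<=>?[\\]^{|}~"; *p; p++)
        map[(unsigned char)*p] = OTHER;
}

/* Returns the length of the identifier that starts at s, or 0. */
int identifier_length_sketch(const unsigned char *s) {
    const unsigned char *p = s;
    assert(map['a'] & LETTER);      /* init_map_sketch must have run */
    if (!(map[*p] & LETTER))
        return 0;
    while (map[*p] & (DIGIT|LETTER))
        p++;
    return (int)(p - s);
}
```

Testing against DIGIT|LETTER in a single AND is what makes the inner loop of identifier recognition so short.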
(lex.c functions)=
int gettok() {
    for (;;) {
        register unsigned char *rcp = cp;
        (skip white space 112)
        if (limit - rcp < MAXTOKEN) {
            cp = rcp;
            fillbuf();
            rcp = cp;
        }
        src.file = file;
        src.x = (char *)rcp - line;
        src.y = lineno;
        cp = rcp + 1;
        switch (*rcp++) {
        (gettok cases 112)
        default:
            if ((map[cp[-1]]&BLANK) == 0)
                (illegal character)
        }
    }
}
gettok begins by skipping over white space and then checking that there
is at least one token in the input buffer. If there isn't, calling fillbuf
ensures that there is. MAXTOKEN applies to all tokens except identifiers,
string literals, and numeric constants; occurrences of these tokens that
are longer than MAXTOKEN characters are handled explicitly in the code
for those tokens. The standard permits compilers to limit string literals
to 509 characters and identifiers to 31 characters. lcc increases these
The sections below describe the remaining cases. Recognizing the to-
kens themselves is relatively straightforward; computing the associated
values for some token is what complicates each case.
The code generated for these fragments is short and fast. For example,
on most machines, int is recognized by less than a dozen instructions,
many fewer than are executed when a table is searched for keywords,
even if perfect hashing is used.
case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
case 'Y': case 'Z': case '_':
id:
    (ensure there are at least MAXLINE characters 115)
    token = (char *)rcp - 1;
    while (map[*rcp]&(DIGIT|LETTER))
        rcp++;
    token = stringn(token, (char *)rcp - token);
    (tsym ← type named by token 115)
    cp = rcp;
    return ID;
All identifiers are saved in the string table. At the entry to this and all
cases, both cp and rcp have been incremented past the first character
of the token. If the input buffer holds less than MAXLINE characters,
case '5': case '6': case '7': case '8': case '9': {
    unsigned int n = 0;
    (ensure there are at least MAXLINE characters 115)
    token = (char *)rcp - 1;
    if (*token == '0' && (*rcp == 'x' || *rcp == 'X')) {
        (hexadecimal constant)
    } else if (*token == '0') {
        (octal constant)
    } else {
        (decimal constant 117)
    }
    return ICON;
}
As for identifiers, this case begins by ensuring that the input buffer holds
at least MAXLINE characters, which permits the code to look ahead, as the
test for hexadecimal constants illustrates.
The fragments for the three kinds of integer constant set n to the value
of the constant. They must not only recognize the constant, but also
ensure that the constant is within the range of representable integers.
Recognizing decimal constants illustrates this processing. The syntax
for decimal constants is:
decimal-constant:
nonzero-digit { digit }
nonzero-digit:
one of 1 2 3 4 5 6 7 8 9
The code accumulates the decimal value in n by repeated multiplications:
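A minimal sketch of such an accumulation, with the overflow test done before each multiplication, is given here. It is written as a standalone function under assumptions of its own: decimal_value_sketch is a hypothetical name, the digits come from a null-terminated string rather than the input buffer, and overflow is reported through a flag instead of a warning.

```c
#include <assert.h>
#include <limits.h>

/* Accumulates the value of the decimal digits at s in *n. Returns the
   number of digits consumed and sets *overflow to 1 if the value does
   not fit in an unsigned int. */
int decimal_value_sketch(const char *s, unsigned int *n, int *overflow) {
    const char *p = s;
    assert(s != 0 && n != 0 && overflow != 0);
    *n = 0;
    *overflow = 0;
    while (*p >= '0' && *p <= '9') {
        unsigned int d = (unsigned int)(*p++ - '0');
        if (*n > (UINT_MAX - d) / 10)
            *overflow = 1;      /* 10*n + d would exceed UINT_MAX */
        else
            *n = 10 * *n + d;
    }
    return (int)(p - s);
}
```

The comparison is arranged so that nothing is ever computed that could itself overflow: the test is made against (UINT_MAX - d)/10 before 10*n + d is formed.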
        cp += 2;
    } else if (*cp == 'u' || *cp == 'U') {
        tval.type = unsignedtype;
        cp += 1;
    } else if (*cp == 'l' || *cp == 'L') {
        if (n > (unsigned)LONG_MAX)
            tval.type = unsignedlong;
        else
            tval.type = longtype;
        cp += 1;
    } else if (base == 10 && n > (unsigned)LONG_MAX)
        tval.type = unsignedlong;
    else if (n > (unsigned)INT_MAX)
        tval.type = unsignedtype;
    else
        tval.type = inttype;
    if (overflow) {
        warning("overflow in constant '%S'\n", token,
            (char*)cp - token);
        n = LONG_MAX;
    }
    (set tval's value 118)
    ppnumber("integer");
    return &tval;
}
If both U and L appear, n is an unsigned long, and if only U appears,
n is an unsigned. If only L appears, n is a long unless it's too big, in
which case it's an unsigned long. n is also an unsigned long if it's an
unsuffixed decimal constant and it's too big to be a long. Unsuffixed
octal and hexadecimal constants are ints unless they're too big, in which
case they're unsigneds. The format code %S prints a string like printf's
%s, but consumes an additional argument that specifies the length of the
string. It can thus print strings that aren't terminated by a null character.
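The typing rules just stated form a small decision procedure. The sketch below restates them as a standalone function; constant_type_sketch and the enumeration ctype_sketch are hypothetical names, and the limits are passed as parameters (an assumption made so the rules can be tried with any sizes) instead of being taken from INT_MAX and LONG_MAX.

```c
#include <assert.h>

enum ctype_sketch { T_INT, T_UNSIGNED, T_LONG, T_UNSIGNED_LONG };

/* Chooses a type for an integer constant with value n. has_u and has_l
   say which suffixes appeared; base is 8, 10, or 16. */
enum ctype_sketch constant_type_sketch(unsigned long n, int base,
                                       int has_u, int has_l,
                                       unsigned long int_max,
                                       unsigned long long_max) {
    assert(base == 8 || base == 10 || base == 16);
    if (has_u && has_l)
        return T_UNSIGNED_LONG;
    if (has_u)
        return T_UNSIGNED;
    if (has_l)
        return n > long_max ? T_UNSIGNED_LONG : T_LONG;
    if (base == 10 && n > long_max)
        return T_UNSIGNED_LONG;     /* unsuffixed decimal, too big */
    if (n > int_max)
        return T_UNSIGNED;          /* unsuffixed octal or hexadecimal */
    return T_INT;
}
```

With both limits set to 2147483647, the hexadecimal constant 0x80000000 comes out unsigned, while the decimal constant 2147483648 comes out unsigned long.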
The types int, long, and unsigned are different types, but lcc insists
that they all have the same size. This constraint simplifies the tests
shown above and the code that sets tval's value:
(set tval's value 118)=
if (isunsigned(tval.type))
    tval.u.c.v.u = n;
else
    tval.u.c.v.i = n;
Relaxing this constraint would complicate this code and the tests above.
For example, the standard specifies that the type of an unsuffixed decimal
constant is int, long, or unsigned long, depending on its value. In
lcc, ints and longs can accommodate the same range of integers, so an
unsuffixed decimal constant is either int or unsigned.
A numeric constant is formed from a preprocessing number, which is
the numeric constant recognized by the C preprocessor. Unfortunately,
the standard specifies preprocessing numbers that are a superset of the
integer and floating constants; that is, a valid preprocessing number may
not be a valid numeric constant. 123.4.5 is an example. The preprocessor
deals with such numbers too, but it may pass them on to the
compiler, which must treat them as single tokens and thus must catch
preprocessing numbers that aren't valid constants.
The syntax of a preprocessing number is
pp-number:
[ . ] digit { digit | . | nondigit | E sign | e sign }
sign: - | +
Valid numeric constants are prefixes of preprocessing numbers, so the
processing in icon and fcon might conclude successfully without consuming
the complete preprocessing number, which is an error. ppnumber
is called from icon and fcon and checks for this case.
(lex.c functions)+=
static void ppnumber(which) char *which; {
    unsigned char *rcp = cp--;

    for ( ; (map[*cp]&(DIGIT|LETTER)) || *cp == '.'; cp++)
        if ((cp[0] == 'E' || cp[0] == 'e')
        && (cp[1] == '-' || cp[1] == '+'))
            cp++;
    if (cp > rcp)
        error("'%S' is a preprocessing number but an _
            invalid %s constant\n", token,
            (char*)cp-token, which);
}
ppnumber backs up one character and skips over the characters that may
comprise a preprocessing number; if it scans past the end of the numeric
token, there's an error.
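The pp-number syntax can also be exercised outside the compiler. The following is a minimal sketch, not lcc's code: it assumes a null-terminated string instead of lcc's input buffer and map table, and the name ppnumber_len is hypothetical. It returns the length of the preprocessing number at the start of a string, so a caller can compare that length with the length of the valid constant it recognized.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the preprocessing number that begins at s, or 0 if none.
 * Follows  pp-number: [ . ] digit { digit | . | nondigit | E sign | e sign }
 * with "nondigit" taken to mean a letter or underscore. */
size_t ppnumber_len(const char *s) {
    const char *p = s;

    if (*p == '.')
        p++;
    if (!isdigit((unsigned char)*p))
        return 0;                       /* a pp-number needs a leading digit */
    p++;
    for (;;) {
        if ((*p == 'e' || *p == 'E') && (p[1] == '+' || p[1] == '-'))
            p += 2;                     /* exponent letter plus its sign */
        else if (isalnum((unsigned char)*p) || *p == '_' || *p == '.')
            p++;
        else
            break;
    }
    return (size_t)(p - s);
}
```

For 123.4.5, the whole string is one preprocessing number, although only the prefix 123.4 is a valid floating constant; that mismatch is the case ppnumber reports.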
fcon recognizes the suffix of floating constants and is called in two
places. One of the calls is shown above in (check for floating constant).
The other call is from the gettok case for '.':
(gettok cases 112)+=
case '.':
    ...
    }
    if ((map[*rcp]&DIGIT) == 0)
        return '.';
    (ensure there are at least MAXLINE characters 115)
    cp = rcp - 1;
    token = (char *)cp;
    tsym = fcon();
    return FCON;
The syntax for floating constants is
floating-constant:
    fractional-constant [ exponent-part ] [ floating-suffix ]
    digit-sequence exponent-part [ floating-suffix ]
fractional-constant:
    [ digit-sequence ] . digit-sequence
    digit-sequence .
exponent-part:
    e [ sign ] digit-sequence
    E [ sign ] digit-sequence
digit-sequence:
    digit { digit }
floating-suffix:
    one of f l F L
fcon recognizes a floating-constant, converts the token to a double value,
and determines tval's type and value:
(lex.c functions)+=
static Symbol fcon() {
    (scan past a floating constant 121)
    errno = 0;
    tval.u.c.v.d = strtod(token, NULL);
    if (errno == ERANGE)
        (warn about overflow 120)
    (set tval's type and value 121)
    ppnumber("floating");
    return &tval;
}
constant is out of range, strtod sets the global variable errno to ERANGE
as stipulated by the ANSI C specification for the C library.
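The errno protocol that fcon relies on can be shown in isolation. This is a hedged sketch rather than lcc's code; the wrapper name parse_double and its return codes are assumptions made for illustration. The key point is that errno must be cleared before the call, because strtod sets it only on a range error and never resets it.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert the text at s to a double.
 * Returns 0 on success, 1 if strtod reported a range error (ERANGE),
 * and -1 if no conversion could be performed. */
int parse_double(const char *s, double *out) {
    char *end;

    errno = 0;                  /* strtod never clears errno itself */
    *out = strtod(s, &end);
    if (end == s)
        return -1;              /* no characters were consumed */
    return errno == ERANGE ? 1 : 0;
}
```

A caller can treat the return value 1 the way fcon treats ERANGE: issue a warning and continue with whatever value strtod produced.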
A floating constant follows the syntax shown above, and is recognized
by:
(scan past a floating constant 121)= 120
if (*cp == '.')
    (scan past a run of digits 121)
if (*cp == 'e' || *cp == 'E') {
    if (*++cp == '-' || *cp == '+')
        cp++;
    if (map[*cp]&DIGIT)
        (scan past a run of digits 121)
    else
        error("invalid floating constant '%S'\n", token,
            (char*)cp - token);
}
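The floating-constant syntax given earlier can be checked independently of gettok. The sketch below is an illustration under stated assumptions, not lcc's scanner: it works on a null-terminated string, the names skip_digits and floating_len are hypothetical, and, unlike the fragment above, it silently stops before an e that is not followed by digits instead of reporting an error.

```c
#include <ctype.h>
#include <stddef.h>

/* Advance past a run of digits, adding their number to *count. */
static const char *skip_digits(const char *p, int *count) {
    while (isdigit((unsigned char)*p)) {
        p++;
        (*count)++;
    }
    return p;
}

/* Length of the floating constant at the start of s, or 0 if the text
 * there is not a floating constant (for example, a plain integer). */
size_t floating_len(const char *s) {
    const char *p = s;
    int digits = 0, has_dot = 0, has_exp = 0;

    p = skip_digits(p, &digits);
    if (*p == '.') {
        has_dot = 1;
        p = skip_digits(p + 1, &digits);
    }
    if (digits == 0)
        return 0;                       /* "." alone is not a constant */
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        int exp_digits = 0;
        if (*q == '+' || *q == '-')
            q++;
        q = skip_digits(q, &exp_digits);
        if (exp_digits > 0) {           /* exponent-part needs a digit */
            p = q;
            has_exp = 1;
        }
    }
    if (!has_dot && !has_exp)
        return 0;                       /* digits only: an integer */
    if (*p == 'f' || *p == 'F' || *p == 'l' || *p == 'L')
        p++;                            /* floating-suffix */
    return (size_t)(p - s);
}
```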
acters, and thus uses unsigned char for the type wchar_t. The syntax
is
character-constant:
[ L ] ' c-char { c-char } '
c-char:
any character except ' , \, or newline
escape-sequence
escape-sequence:
one of \' \" \? \\ \a \b \f \n \r \t \v
\ octal-digit [ octal-digit [ octal-digit ] ]
\x hexadecimal-digit { hexadecimal-digit }
string-literal:
[ L ] " { s-char }"
s-char:
any character except ", \, or newline
escape-sequence
String literals can span more than one line if a backslash immediately
precedes the newline. Adjacent string literals are automatically concatenated
to form a single literal. In a proper ANSI C implementation,
this line splicing and string literal concatenation are done by the
preprocessor, and the compiler sees only single, uninterrupted string literals.
lcc implements line splicing and concatenation for string literals
anyway, so that it can be used with pre-ANSI preprocessors.
Implementing these features means that string literals can be longer
than MAXLINE characters, so (ensure there are at least MAXLINE characters)
cannot be used to ensure that a sequence of adjacent entire string literals
appears in the input buffer. Instead, the code must detect the newline
at limit and call nextline explicitly, and it must copy the literal into a
private buffer.
(gettok cases 112)+=
scan:
case '\'': case '"': {
    static char cbuf[BUFSIZE+1];
    char *s = cbuf;
    int nbad = 0;
    *s++ = *--cp;
    do {
        cp++;
        (scan one string literal 123)
        if (*cp == cbuf[0])
            cp++;
        else
The outer do-while loop gathers up adjacent string literals, which are
identified by their leading double quote character, into cbuf, and reports
those that are too long. The leading character also determines the type
of the associated value and gettok's return value:
(set tval and return ICON or SCON 123)= 123
token = cbuf;
tsym = &tval;
if (cbuf[0] == '"') {
    tval.type = array(chartype, s - cbuf - 1, 0);
    tval.u.c.v.p = cbuf + 1;
    return SCON;
} else {
    if (s - cbuf > 3)
        warning("excess characters in multibyte character_
            literal '%S' ignored\n", token, (char*)cp-token);
    else if (s - cbuf <= 2)
        error("missing '\n");
    tval.type = inttype;
    tval.u.c.v.i = cbuf[1];
    return ICON;
}
String literals can contain null characters as the result of the escape sequence
\0, so the length of the literal is given by its type: An n-character
literal has the type (ARRAY n (CHAR)) (n does not include the double
quotes). gettok's callers, such as primary, call stringn when they want
to save the string literal referenced by tval.
The code below, which scans a string literal or character constant,
copes with four situations: newlines at limit, escape sequences, non-ANSI
characters, and literals that exceed the size of cbuf.
(scan one string literal 123)= 122
while (*cp != cbuf[0]) {
    int c;
    if (map[*cp]&NEWLINE) {
        if (cp < limit)
            break;
        cp++;
        nextline();
        if ((end of input 112))
            break;
        continue;
    }
    c = *cp++;
    if (c == '\\') {
        if (map[*cp]&NEWLINE) {
            if (cp < limit)
                break;
            cp++;
            nextline();
        }
        if (limit - cp < MAXTOKEN)
            fillbuf();
        c = backslash(cbuf[0]);
    } else if (map[c] == 0)
        nbad++;
    if (s < &cbuf[sizeof cbuf] - 2)
        *s++ = c;
}
If *limit is a newline, it serves only to terminate the buffer, and is thus
ignored unless there's no more input. Other newlines (those for which
cp is less than limit) and the one at the end of file terminate the while
loop without advancing cp. backslash interprets the escape sequences
described above; see Exercise 6.10. nbad counts the number of non-ANSI
characters that appear in the literal; lcc's -A -A option causes warnings
about literals that contain such characters or that are longer than
ANSI's 509-character guarantee.
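To make the escape-sequence syntax concrete, here is a minimal sketch of a decoder. It is not lcc's backslash function, whose body is left to Exercise 6.10 and which reads from the input buffer; decode_escape, its string-pointer interface, the -1 error code, and the truncation of values to 8 bits are all assumptions made for this illustration. It also assumes an ASCII-like character set in which a through f are contiguous.

```c
#include <ctype.h>

/* Decode the escape sequence whose first character after the backslash
 * is at **pp, advance *pp past it, and return its value, or -1 if the
 * sequence is not one of the forms in the grammar. */
int decode_escape(const char **pp) {
    const char *p = *pp;
    int c = (unsigned char)*p;
    int v = -1;

    if (c == '\0')
        return -1;                      /* nothing follows the backslash */
    p++;
    switch (c) {
    case 'a': v = '\a'; break;
    case 'b': v = '\b'; break;
    case 'f': v = '\f'; break;
    case 'n': v = '\n'; break;
    case 'r': v = '\r'; break;
    case 't': v = '\t'; break;
    case 'v': v = '\v'; break;
    case '\'': case '"': case '?': case '\\':
        v = c;
        break;
    case 'x':                           /* \x hex-digit { hex-digit } */
        if (isxdigit((unsigned char)*p)) {
            v = 0;
            while (isxdigit((unsigned char)*p)) {
                int ch = (unsigned char)*p++;
                int d = isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
                v = ((v << 4) | d) & 0xFF;
            }
        }
        break;
    default:                            /* one to three octal digits */
        if (c >= '0' && c <= '7') {
            int n = 1;
            v = c - '0';
            while (n < 3 && *p >= '0' && *p <= '7') {
                v = v * 8 + (*p++ - '0');
                n++;
            }
            v &= 0xFF;
        }
        break;
    }
    *pp = p;
    return v;
}
```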
Further Reading
The input module is based on the design described by Waite (1986). The
difference is that Waite's algorithm moves one partial line instead of
potentially several partial lines or tokens, and does so after scanning
the first newline in the buffer. But this operation overwrites storage
before the buffer when a partial line is longer than a fixed maximum.
The algorithm above avoids this problem, but at the per-token cost of
comparing limit-cp with MAXTOKEN.
Lexical analyzers can be generated from a regular-expression specification
of the lexical structure of the language. LEX (Lesk 1975), which
is available on UNIX, is perhaps the best known example. Schreiner and
Friedman (1985) use LEX in their sample compilers, and Holub (1990) details
an implementation of a similar tool. More recent generators, such
as flex, re2c (Bumbulis and Cowan 1993), and Eli's scanner generator
(Gray et al. 1992; Heuring 1986), produce lexical analyzers that are
much faster and smaller than those produced by LEX. On some computers,
Eli and re2c produce lexical analyzers that are faster than lcc's. Eli
originated some of the techniques used in lcc's gettok.
A "perfect" hash function is one that maps each word from a known
set into a different hash number (Cichelli 1980; Jaeschke and Osterburg
1980; Sager 1985). Some compilers use perfect hashing for keywords,
but the hashing itself usually takes more instructions than lcc uses to
recognize keywords.
lcc relies on the library function strtod to convert the string representation
of a floating constant to its corresponding double value. Doing
this conversion as accurately as possible is complicated; Clinger (1990)
shows that it may require arithmetic of arbitrary precision in some cases.
Many implementations of strtod are based on Clinger's algorithm. The
opposite problem - converting a double to its string representation -
is just as laborious. Steele and White (1990) give the gory details.
Exercises
6.1 What happens if a line longer than BUFSIZE characters appears in
the input? Are zero-length lines handled properly?
6.2 The C preprocessor emits lines of the form
    # n "file"
    #line n "file"
    #line n
These lines are used to reset the current line number and file name
to n and file, respectively, so that error messages refer to the correct
file. In the third form, the current file name remains unchanged.
resynch, called by nextline, recognizes these lines and resets file
and lineno accordingly. Implement resynch.
6.3 In many implementations of C, the preprocessor runs as a separate
program with its output passed along as the input to the compiler.
Implement the preprocessor as an integral part of input.c, and
measure the resulting improvement. Be warned: Writing a prepro-
cessor is a big job with many pitfalls. The only definitive specifica-
tion for the preprocessor is the ANSI standard.
6.4 Implement the fragments omitted from gettok.
6.5 What happens when lcc reads an identifier longer than MAXLINE
characters?
6.6 Implement int getchr(void).
6.7 Try perfect hashing for the keywords. Does it beat the current
implementation?
6.8 The syntax for octal constants is
octal-constant:
    0 { octal-digit }
octal-digit:
    one of 0 1 2 3 4 5 6 7
Write (octal constant). Be careful; an octal constant is a valid prefix
of a floating constant, and octal constants can overflow.
6.9 The syntax for hexadecimal constants is
hexadecimal-constant:
    ( 0x | 0X ) hexadecimal-digit { hexadecimal-digit }
hexadecimal-digit:
    one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
Write (hexadecimal constant). Don't forget to handle overflow.
6.10 Implement
(lex.c prototypes)=
static int backslash ARGS((int q));
6.15 lcc assumes that int and long (signed and unsigned) have the same
size. Revise icon to remove this regrettable assumption.
7
Parsing
the rules for the selected nonterminal. In this example, there are only
two rules, so one possible replacement is
expr ==> expr + expr
This operation is a derivation step, and a sequence of such steps that ends
in a sentence is a derivation. At each step, one nonterminal is replaced
by one of its right-hand sides. For example, the sentence ID+ID+ID can
be obtained by the following derivation.
expr ==> expr + expr
     ==> expr + ID
     ==> expr + expr + ID
     ==> ID + expr + ID
     ==> ID + ID + ID
In the first step, the production
expr: expr + expr
is applied to replace expr by the right-hand side of this rule. In the sec-
ond step, the rule expr: ID is applied to the rightmost occurrence of expr.
The next three steps apply these rules to arrive at the sentence ID+ID+ID.
Each of the steps in a derivation yields a sentential form, which is a string
of terminals and nonterminals. Sentential forms differ from sentences
in that they can include both terminals and nonterminals; sentences con-
tain just terminals.
At each step in a derivation, any of the nonterminals in the sentential
form can be replaced by the right-hand side of one of its rules. If, at each
step, the leftmost nonterminal is replaced, the derivation is a leftmost
derivation. For example,
expr ==> expr + expr
     ==> ID + expr
     ==> ID + expr + expr
     ==> ID + ID + expr
     ==> ID + ID + ID
is a leftmost derivation for the sentence ID+ID+ID. Parsers reconstruct a
derivation for a given sentence, i.e., the input C program. lcc's parser is
a top-down parser that reconstructs the leftmost derivation of its input.
FIGURE 7.1 Two parse trees for a+b+c. [Tree diagrams not reproduced.]
expr:
    expr + expr
    expr * expr
    ID
Assuming a, b, and c are identifiers, a+b, a+b+c, and a+b*c are sentences
in this language.
A derivation can be written as described above or shown pictorially by
a parse tree. For example, a leftmost derivation for a+b+c is
expr ==> expr + expr
     ==> expr + expr + expr
     ==> a + expr + expr
     ==> a + b + expr
     ==> a + b + c
and the corresponding parse tree is the one on the left in Figure 7.1.
A parse tree is a tree with its nodes labelled with nonterminals and
its leaves labelled with terminals; the root of the tree is labelled with
the start symbol. If a node is labelled with nonterminal A and its immediate
offspring are labelled, left to right, with X1, X2, ..., Xn, then
A: X1 X2 ... Xn is a production.
If a sentence has more than one parse tree, which is equivalent to
having more than one leftmost derivation, the language is ambiguous.
For example, a+b+c has another leftmost derivation in addition to the
one shown above, and the resulting parse tree is the one shown on the
right in Figure 7.1.
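The degree of ambiguity can be quantified with a short computation. The sketch below is an illustration added here, not part of lcc; count_parse_trees is a hypothetical name. It counts the parse trees that a grammar of the form expr: expr op expr | ID admits for a sentence with n identifiers, by choosing which operator is applied at the root and multiplying the counts for the two sides.

```c
/* Number of distinct parse trees for a sentence of n identifiers
 * separated by binary operators, under  expr: expr op expr | ID.
 * The root splits the sentence into k identifiers on the left and
 * n - k on the right, for each k from 1 to n - 1. */
unsigned long count_parse_trees(int n) {
    unsigned long total = 0;
    int k;

    if (n <= 0)
        return 0;
    if (n == 1)
        return 1;                       /* expr: ID */
    for (k = 1; k < n; k++)
        total += count_parse_trees(k) * count_parse_trees(n - k);
    return total;
}
```

For a+b+c the count is 2, matching the two trees in Figure 7.1, and it grows quickly for longer sentences.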
The problem in this example is that the normal left-associativity of
+ is not captured by the grammar. The correct interpretation, which
corresponds to (a+b)+c, is given by the derivation above, and is shown
in Figure 7.1's left tree.
This problem can be solved by rewriting the grammar to use EBNF's
repetition construct so that a+b+c has only one derivation, which can be
interpreted as (a+b)+c:
expr:
expr { + expr }
expr { * expr }
ID
With this change, there is only one leftmost derivation for a+b+c, but
understanding that derivation requires understanding how to apply EBNF
productions involving repetitions. A production of the form
    A: β { α }
says that A derives β followed by zero or more occurrences of α. This
language is also specified by the grammar
    A: β X
    X: ε | α X
X derives the empty string, denoted by ε, or α followed by X. One
application of A's production followed by repeated applications of X's
productions thus derives β followed by zero or more occurrences of α.
EBNF's repetition construct is an abbreviation for a hidden nonterminal
like X, but these nonterminals must be included in parse trees. It's easi-
est to do so by rewriting the grammar to include them. Adding them to
the expression grammar yields
expr:
    expr X
    expr Y
    ID
X: ε | + expr X
Y: ε | * expr Y
With this change, there's only one leftmost derivation for a+b+c:
expr ==> expr X
     ==> a X
     ==> a + expr X
     ==> a + b X
     ==> a + b + expr X
     ==> a + b + c X
     ==> a + b + c ε
The parser can interpret this derivation as is appropriate for the oper-
ators involved; here, it would choose the left-associative interpretation,
but it could also choose the other interpretation for right-associative op-
erators.
The operator * has the same problem, which can be fixed in a way
similar to that suggested above. In addition, * typically has a higher
precedence than +, so the grammar should help arrive at the correct interpretation
for sentences like a+b*c. For example, the revised grammar
given above does not work; the derivation for a+b*c is
expr ==> expr X
     ==> a X
     ==> a + expr Y
     ==> a + b Y
     ==> a + b * expr Y
     ==> a + b * c Y
     ==> a + b * c ε
The fourth derivation step can cause the expression to be interpreted as
(a+b)*c instead of a+(b*c).
The higher precedence of * can be accommodated by introducing a
separate nonterminal that derives sentences involving *, and arranging
for occurrences of this nonterminal to appear as the operands to +:
expr: term X
term: ID Y
X: ε | + term X
Y: ε | * ID Y
With this grammar, the only leftmost derivation for a+b*c is
expr ==> term X
     ==> a Y X
     ==> a ε X
     ==> a ε + term X
     ==> a ε + b Y X
     ==> a ε + b * c Y X
     ==> a ε + b * c ε X
     ==> a ε + b * c ε ε
term derives a sentential form that includes b*c, which can be inter-
preted as the right-hand operand of the sum. As detailed in Chapter 8,
this approach can be generalized to handle an arbitrary number of prece-
dence levels and both right- and left-associative operators.
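The effect of the separate term nonterminal can be seen in a small evaluator. This is a sketch under stated assumptions, not lcc's parser: single decimal digits stand in for ID, the input is assumed to be a well-formed sentence, and the names eval_sum_product, sum, and product are hypothetical. Each nonterminal becomes one function, and the hidden nonterminals X and Y become while loops.

```c
static const char *in;          /* next unread character of the sentence */

/* term: ID { * ID }   (a digit stands in for ID) */
static int product(void) {
    int v = *in++ - '0';

    while (*in == '*') {
        in++;
        v *= *in++ - '0';
    }
    return v;
}

/* expr: term { + term } */
static int sum(void) {
    int v = product();

    while (*in == '+') {
        in++;
        v += product();
    }
    return v;
}

/* Evaluate a well-formed sentence such as "1+2*3". */
int eval_sum_product(const char *s) {
    in = s;
    return sum();
}
```

Because product consumes 2*3 before sum sees the result, 1+2*3 is evaluated as 1+(2*3), which is the interpretation the grammar is meant to impose.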
The grammar manipulations described above are usually omitted, and
the appropriate EBNF grammar is written directly. For example, the ex-
pression grammar shown in Section 1.6 completes the expression gram-
mar shown here.
Other ambiguities can be handled by rewriting the grammar, but it's
often easier to resolve them in an ad hoc fashion by simply choosing
one of the possible interpretations and writing the code to treat other
interpretations as errors. An example is the dangling-else ambiguity in
the if statement:
stmt:
    if ( expr ) stmt
    if ( expr ) stmt else stmt
Nested if statements have two derivations: one in which the else part is
associated with the outermost if, and one in which the else is associated
with the innermost if, which is the usual interpretation. As shown in
Chapter 10, this ambiguity is handled by parsing the else part as soon
as it's seen, which has the effect of choosing the latter interpretation.
and, since the token c matches the first symbol in the selected produc-
tion, the input is advanced by one token. For the next step, the parser
must choose and apply a production for A. The next input token is a,
so the first production for A is a plausible choice, and the derivation
becomes
S ==> cAd
  ==> cabd
Again, the input is advanced since the input a matches the a in the pro-
duction for A. At this point, the parser is stuck because the next input
token, d, does not match the next symbol in the current derivation step,
b. The problem is that the wrong production for A was chosen. The
parser backs up to the previous step, backing up the input that was con-
sumed in the erroneous step, and applies the other production for A:
S ==> cAd
  ==> cad
The input, which was backed up to the a, matches the remainder of the
symbols in the derivation step, and the parser announces success.
As illustrated by this simple example, a top-down parser uses the next
input token to select an applicable production, and consumes input to-
kens as long as they match terminals in the derivation step. When a non-
terminal is encountered in the right-hand side of a derivation, the next
derivation step is made. This example also illustrates a pitfall of top-
down parsing: applying the wrong production and having to backtrack
to a previous step. For even a moderately complicated language, such
backtracking could cause many steps to be reversed. More important,
most of the side effects that can occur in derivation steps are difficult
and costly to undo. Backing up the input an arbitrary distance and un-
doing symbol-table insertions are examples. Also, such backtracking can
make recognition very slow; in the worst case, the running time can be
exponential in the number of tokens in the input.
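The backtracking just described can be sketched in a few lines. The grammar assumed here, S: c A d with A: a b | a, is inferred from the derivation steps in the example; the function names and the convention of using one character per token are likewise assumptions made for illustration.

```c
/* Try production number alt for A at position pos of s.
 * Returns the position after the matched text, or -1 on failure.
 * alt 0 is  A: a b  and alt 1 is  A: a. */
static int parse_A(const char *s, int pos, int alt) {
    if (alt == 0)
        return (s[pos] == 'a' && s[pos + 1] == 'b') ? pos + 2 : -1;
    return s[pos] == 'a' ? pos + 1 : -1;
}

/* Recognize  S: c A d  by trying each production for A in turn.
 * Returns 1 if all of s is a sentence, 0 otherwise. */
int parse_S(const char *s) {
    int alt;

    if (s[0] != 'c')
        return 0;
    for (alt = 0; alt < 2; alt++) {
        int pos = parse_A(s, 1, alt);
        if (pos >= 0 && s[pos] == 'd' && s[pos + 1] == '\0')
            return 1;
        /* Wrong choice: the loop backs the input up to position 1
         * and tries the other production for A. */
    }
    return 0;
}
```

Here backing up costs nothing because the parser has no side effects; the difficulty described above arises when derivation steps also install symbols or emit code.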
Top-down parsing techniques are practical only in cases where back-
tracking can be avoided completely. This constraint restricts top-down
parsers to languages in which the appropriate production for the next
derivation step can be chosen correctly by looking at just the next to-
ken in the input. Fortunately, many programming languages, including
C, satisfy this constraint.
A common technique for implementing top-down parsers is to write
a parsing function for each nonterminal in the grammar, and to call that
function when a production for the nonterminal is to be applied. Natu-
rally, parsing functions must be recursive, since they might be applied
recursively. That is, there might be a derivation of the form
A ==> ... ==> α A β ==> ...
where α and β are strings of grammar symbols. Top-down parsers written
using this strategy are called recursive-descent parsers because they
emulate a descent of the parse tree by calling recursive functions at each
node.
The derivation is not constructed explicitly. The call stack that han-
dles the calls to recursive functions records the state of the derivation
implicitly. For each nonterminal, the corresponding function encodes
the right-hand side of each production as a sequence of comparisons
EOI is the token code for the end of input; the input is valid only if all
of it is a sentence in the language.
then
    FIRST(α1) ∪ FIRST(α2) ∪ ... ∪ FIRST(αk)
is added to FIRST(X). If there is a production of the form X: Y1 Y2 ... Yk,
where Yi are grammar symbols, then FIRST(Y1 Y2 ... Yk) is added to
FIRST(X).
FIRST(Y1 Y2 ... Yk) depends on the FIRST sets for Y1 through Yk. All
of the elements of FIRST(Y1) except ε are added to FIRST(Y1 Y2 ... Yk),
which is initially empty. If FIRST(Y1) contains ε, all of the elements of
FIRST(Y2) except ε are also added. This process is repeated, adding all
of the elements of FIRST(Yi) except ε as long as FIRST(Yi-1), like the
sets before it, contains ε. The resulting effect is that FIRST(Y1 Y2 ... Yk)
contains the elements other than ε of the FIRST sets for the leading
transparent Yi's and for the first Yi that is not transparent, where a
FIRST set is transparent if it contains ε. If all of the FIRST sets for
Y1 through Yk contain ε, ε is added to FIRST(Y1 Y2 ... Yk).
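The rule for FIRST(Y1 Y2 ... Yk) translates directly into a loop. The following is a minimal sketch that assumes each FIRST set is encoded as a bit mask, with bit 0 reserved for ε and the other bits standing for terminals; this encoding and the name first_of_sequence are choices made for the illustration only.

```c
#define EPS 1u          /* bit 0 of a set marks the empty string ε */

/* Compute FIRST(Y1 Y2 ... Yk) from first[0..k-1], the FIRST sets of
 * the individual symbols. */
unsigned first_of_sequence(const unsigned first[], int k) {
    unsigned result = 0;
    int i;

    for (i = 0; i < k; i++) {
        result |= first[i] & ~EPS;      /* add everything except ε */
        if (!(first[i] & EPS))
            return result;              /* Yi is not transparent: stop */
    }
    return result | EPS;                /* every Yi can derive ε */
}
```

An empty sequence yields just ε, which agrees with the convention that the empty string derives itself.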
Consider the grammar for simple expressions given in Section 1.6:
expr:
    term { + term }
    term { - term }
term:
    factor { * factor }
    factor { / factor }
factor:
    ID
which cannot be computed until the value of FIRST(term) is known. Likewise,
FIRST(term) is
    FIRST(factor { * factor }) ∪ FIRST(factor { / factor })
FIRST(')') contributes ) to FOLLOW(expr), but FIRST({ , expr }) contains
ε, so FIRST({ , expr } ')') contributes ) as well.
term appears in two places in the two productions for expr, so
    FOLLOW(term) = FOLLOW(expr)
                   ∪ FIRST({ + term }) ∪ FIRST({ - term })
                 = { , ) ⊣ + - }
α                     T(α)
terminal A            if (t == A) t = gettok();
                      else error
nonterminal X         X();
α1 | α2 | ... | αk    if (t ∈ D(α1)) T(α1)
                      else if (t ∈ D(α2)) T(α2)
There are, of course, other code sequences that are equivalent to those
given in Table 7.1. For example, a switch statement is often used for
T(α1 | α2 | ... | αk). Also, rote application of the sequences given
in Table 7.1 sometimes leads to redundant code, which can be improved
by simple transformations. For example, the body of the parsing func-
tion for
parameter-list: [ ID { , ID } ]
is derived by applying the rules in Table 7.1 in the following seven steps.
1. T(parameter-list)
2. T([ ID { , ID } ])
3. if (t == ID) { T(ID { , ID }) }
4. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       T({ , ID })
   }
5. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       while (t == ',') { T(, ID) }
   }
6. if (t == ID) {
       if (t == ID) t = gettok();
7. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       while (t == ',') {
           if (t == ',') t = gettok();
           else error("missing ,\n");
           if (t == ID) t = gettok();
           else error("missing identifier\n");
       }
   }
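Step 7 contains tests that can never fail, because each is guarded by an enclosing test of the same token. A simplified version, with those redundant tests removed, might look like the sketch below. It is an illustration under assumptions, not lcc's code: tokens come from an array ending in EOI instead of a lexical analyzer, errors are counted instead of reported, and the token codes and the driver parse_parameter_list are hypothetical.

```c
enum { EOI = 0, ID = 256 };     /* hypothetical token codes; ',' is itself */

static const int *next;         /* next token in the input array */
static int t;                   /* current token */
static int errors;              /* number of syntax errors seen */

/* Return the next token; EOI is returned repeatedly at the end. */
static int gettok(void) {
    int tok = *next;

    if (tok != EOI)
        next++;
    return tok;
}

/* parameter-list: [ ID { , ID } ] */
static void parameter_list(void) {
    if (t == ID) {
        t = gettok();           /* the guard already checked for ID */
        while (t == ',') {
            t = gettok();       /* the guard already checked for ',' */
            if (t == ID)
                t = gettok();
            else
                errors++;       /* missing identifier */
        }
    }
}

/* Parse tokens[], which must end with EOI.  Returns 1 if the whole
 * input is a parameter-list, 0 otherwise. */
int parse_parameter_list(const int *tokens) {
    next = tokens;
    errors = 0;
    t = gettok();
    parameter_list();
    return errors == 0 && t == EOI;
}
```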
Left factoring is often taken into account when the parsing function is
written instead of rewriting the grammar and adding new nonterminals
as described above. For example, A: αβ | αγ is equivalent to A: α(β | γ),
so the code for T(αβ | αγ) can be written directly as
    T(α) T(β | γ)
In a few cases, α appears as a common prefix in several productions,
and involves significant semantic processing. In such cases, introducing
a new nonterminal and left factoring the relevant productions encapsulates
that processing in a single parsing function.
(error.c functions)+=
void expect(tok) int tok; {
    if (t == tok)
        t = gettok();
    else {
        error("syntax error; found");
        printtoken();
        fprint(2, "expecting '%k'\n", tok);
    }
}
The first test is, of course, never true when expect is called from test;
that call is made to issue the diagnostic. expect is also called from other
parsing functions whenever a specific token is expected, and it consumes
that token. If the expected token is missing, expect issues a diagnostic
and returns without advancing the input, as if the expected token had
been present.
expect calls error to begin the message, and it calls the static function
printtoken to print the current token (i.e., the token given by t and
token), and fprint to conclude the message. As an example of expect's
effect, the input "int x[5;" draws the diagnostic
    syntax error; found ';' expecting ']'
Error messages are initiated by calling error, which is called with a
printf-style format string and arguments. In addition to the message,
error prints the coordinates of the current token set by gettok and
keeps a count of the number of error messages issued in errcnt.
(error.c functions)+=
void error VARARGS((char *fmt, ...),
(fmt, va_alist),char *fmt; va_dcl) {
    va_list ap;
The nonterminals listed above are defined in Chapters 8, 10, and 11.
Since skipto's second argument is an array, it can represent supersets
of these sets when the additional tokens have kind values equal to themselves,
as exemplified above by }. These supersets are related to FOLLOW
sets in some cases. For example, a statement must be followed by a } or
a token in FIRST(statement). The parsing function for statement thus
passes skipto an array that holds IF, ID, and }.
As skipto discards tokens, it announces the first eight and the last
one it discards:
(error.c functions)+=
void skipto(tok, set) int tok; char set[]; {
    int n;
    char *s;
skipto discards nothing and issues no diagnostic if t is equal to tok or
is in kind[t]. Suppose bug.c holds only the one line
    fprint(2, "expecting '%k'\n", tok);
The syntax error in this example is that this line must be inside a function.
The call to fprint looks like the beginning of a function definition,
but lcc soon discovers the error. test calls expect and skipto to issue
the diagnostic
    bug.c:1: syntax error; found '2' expecting ')'
    bug.c:1: skipping '2' ',' " expecting '%k'\12" ',' 'tok'
Notice that the right parenthesis was not discarded.
Further Reading
There are many books that describe the theory and practice of com-
piler construction, including Aho, Sethi, and Ullman (1986), Fischer and
LeBlanc (1991), and Waite and Goos (1984). Davie and Morrison (1981)
and Wirth (1976) describe the design and implementation of recursive-
descent compilers.
A bottom-up parser reconstructs a rightmost derivation of its input,
and builds parse trees from the leaves to the roots. Bottom-up parsers
are often used in compilers because they accept a larger class of lan-
guages and because the grammars are sometimes easier to write. Most
bottom-up parsers use a variant of LR parsing, which is surveyed by Aho
and Johnson (1974) and covered in detail by Aho, Sethi, and Ullman
(1986). In addition, many parser generators have been constructed.
These programs accept a syntactic specification of the language, usually
in a form like that shown in Exercise 7.2, and produce a parsing
program. YACC (Johnson 1975) is the parser generator used on UNIX.
YACC and LEX work together, often simplifying compiler implementa-
tion considerably. Aho, Sethi, and Ullman (1986), Kernighan and Pike
(1984), and Schreiner and Friedman (1985) contain several examples of
the use of YACC and LEX. Holub (1990) describes the implementation of
another parser generator.
Other parser generators are based on attribute grammars; Waite and
Goos (1984) describe attribute grammars and related parser generators.
The error-handling techniques used in lcc are like those advocated
by Stirling (1985) and used by Wirth (1976). Burke and Fisher (1987)
describe perhaps the best approach to handling errors for LR and LL
parser tables.
Exercises
7.1 Using the lexical-analyzer and the symbol-table modules from the
previous chapters, cobble together a parser that recognizes expres-
sions defined by the grammar below and prints their parse trees.
expr:
    term { + term }
    term { - term }
term:
    factor { * factor }
    factor { / factor }
factor:
    ID
    ID '(' expr { , expr } ')'
    '(' expr ')'
7.2 Write a program that computes the FIRST and FOLLOW sets for an
EBNF grammar and reports conflicts that interfere with recursive-descent
parsing of the language. Design an input representation for
the grammar that is close in form to EBNF. For example, suppose
grammars are given in free format where nonterminals are in
lowercase with embedded - signs, terminals are in uppercase or enclosed
in single or double quotes, and productions are terminated
by semicolons. For example, the grammar in the previous exercise
could appear as
Give an EBNF specification for the syntax of the input, and write
a recursive-descent parser to recognize it using the techniques de-
scribed in this chapter.
8
Expressions
[Figure: expression-tree diagrams built from ADD+I, MUL+I, INDIR+I, and ADDRG+P nodes over the variables a, b, and i; layout not recoverable.]
Tree kids[2];
Node node;
union {
(u fields for Tree variants 168)
} u;
};
The op field holds a code for the operator, the type field points to a Type
for the type of the result computed by the node at runtime, and kids
point to the operands. The node field is used to build dags from trees
as detailed in Section 12.2. Trees for some operators have additional
information tucked away in the fields of their u unions.
The operators form a superset of the node operators described in
Chapter 5 and listed in Table 5.1 (page 84), but they are written differ-
ently to emphasize their use in trees. An operator is formed by adding
a type suffix to a generic operator; for example, ADD+I denotes integer
addition. The + is omitted when referring to the corresponding node
operator ADDI; this convention helps distinguish between trees and dags
in figures and in prose. The type suffixes are listed in Section 5.5, and
Table 5.1 gives the allowable suffixes for each operator.
Table 8.1 lists the six operators that appear in trees in addition to
those shown in Table 5.1. AND, OR, and NOT represent expressions involving
the &&, ||, and ! operators. Comma expressions yield RIGHT trees;
by definition, RIGHT evaluates its arguments left to right, and its value
is the value of its rightmost operand. RIGHT is also used to build trees
that logically have more than two operands, such as the COND operator,
which represents conditional expressions of the form c ? e1 : e2. The
first operand of a COND tree is c and the second is a RIGHT tree that holds
e1 and e2. RIGHT trees are also used for expressions such as e++. These
operators are used only by the front end and thus do not need - and
must not have - type suffixes. The FIELD operator identifies a reference
to a bit field.
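The shape of a COND tree can be sketched with a stripped-down node type. This is an illustration only: the structure below omits the type, node, and u fields of lcc's trees, allocates with calloc instead of an arena, and uses hypothetical operator codes and function names (OP_LEAF, OP_RIGHT, OP_COND, mktree, mkcond).

```c
#include <stdlib.h>

enum { OP_LEAF, OP_RIGHT, OP_COND };    /* hypothetical operator codes */

struct tree {
    int op;
    struct tree *kids[2];
};

/* Allocate and initialize a two-operand tree node. */
struct tree *mktree(int op, struct tree *left, struct tree *right) {
    struct tree *p = calloc(1, sizeof *p);

    if (p == NULL)
        abort();
    p->op = op;
    p->kids[0] = left;
    p->kids[1] = right;
    return p;
}

/* Build the tree for c ? e1 : e2.  COND logically has three operands,
 * so the second kid is a RIGHT tree that holds e1 and e2. */
struct tree *mkcond(struct tree *c, struct tree *e1, struct tree *e2) {
    return mktree(OP_COND, c, mktree(OP_RIGHT, e1, e2));
}
```

Packing the two arms into one RIGHT node keeps every node at two kids, which is why the structure needs no third operand field.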
While trees and dags share many of the same operators, the rules con-
cerning the number of operands and symbols, summarized in Table 5.1,
apply only to dags. The front end is not constrained by these rules when
it builds trees, and it often uses additional operands and symbols in trees
that do not appear in dags. For example, when it builds the tree for the
arguments in a function call, it uses the kids[1] fields in ARG nodes to
build what amounts to a list of arguments, the trees for which are stored
in the kids[0] fields.
A tree is allocated, initialized, and returned by
(tree.c functions)= 150
Tree tree(op, type, left, right)
int op; Type type; Tree left, right; {
    Tree p;

    NEW0(p, where);
    p->op = op;
    p->type = type;
    p->kids[0] = left;
    p->kids[1] = right;
    return p;
}
where = a;
p = (*f)(tok);
where = save;
return p;
}
texpr saves where, sets it to a, calls the parsing function (*f) (tok),
restores the saved value of where, and returns the tree returned by *f.
The remaining functions in tree.c construct, test, or otherwise manipulate
trees and operators. These are all applicative - they build new
trees instead of modifying existing ones, which is necessary because the
front end builds dags for a few operators instead of trees. rightkid(p)
returns the rightmost non-RIGHT operand of a nested series of RIGHT
trees. retype(p, ty) returns p if p->type == ty or a copy of p with type
ty. hascall(p) is one if p contains a CALL tree and zero otherwise.
generic(op) returns the generic flavor of op, optype(op) returns op's
type suffix, and opindex(op) returns op's operator index, which is the
generic operator mapped into a contiguous range of integers suitable for
use as an index.
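A minimal sketch of these helpers follows. It assumes that the type suffix occupies the low four bits of an operator and the generic operator the remaining bits; the numeric values of I, P, ADD, and RIGHT are hypothetical, and the struct is a cut-down stand-in for lcc's Tree.

```c
#include <stddef.h>

/* Assumed encoding: suffix in the low four bits, generic operator above. */
#define generic(op)  ((op) & ~15)
#define optype(op)   ((op) & 15)
#define opindex(op)  ((op) >> 4)

enum { I = 5, P = 9 };                     /* hypothetical suffix codes */
enum { ADD = 19 << 4, RIGHT = 42 << 4 };   /* hypothetical generic operators */

struct tree { int op; struct tree *kids[2]; };

/* Descend through nested RIGHT trees to the operand that supplies the
   value: the right operand if present, otherwise the left one. */
struct tree *rightkid(struct tree *p) {
    while (p != NULL && p->op == RIGHT)
        if (p->kids[1] != NULL)
            p = p->kids[1];
        else if (p->kids[0] != NULL)
            p = p->kids[0];
        else
            break;              /* empty RIGHT: nothing further to follow */
    return p;
}
```

With this encoding, ADD + I denotes an integer addition, and generic, optype, and opindex recover ADD, I, and 19 from it.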
t = gettok();
else if (t == '(') {
t = gettok();
expr();
expect(')');
} else
error("unrecognized expression\n");
}
The 13 comes from Table 8.2; the binary operators + and - have precedence
12 and *, /, and % each have precedence 13. When k exceeds 13,
expr calls factor to parse the productions for factor. Expression parsing
for this restricted grammar begins by calling expr(12), and the call
to expr in factor must be changed to expr(12).
[Table 8.2, final rows as recovered: ... sizeof type-cast; 15: ++ -- (unary suffix, postfix).]
expr and factor can be used for any expression grammar of the form
expr: expr ⊗ expr | factor
where ⊗ denotes binary, left-associative operators. Adding operators is
accomplished by appropriately initializing prec.
The while loop in expr handles left-associative operators, which are
specified in EBNF by productions like those for expr and term. Right-associative
operators, like assignment, are specified in EBNF by productions
like
asgn: expr = asgn
They can also be handled using this approach by simply calling expr(k)
instead of expr(k + 1) in the while loop in expr. Assuming all opera-
tors at each precedence level have the same associativity, the decision
of whether to call expr with k or k + 1 can be encoded in a table, han-
dled by writing separate parsing functions for left- and right-associative
operators, or making explicit tests for each kind of operator.
Unary operators can also be handled using this technique. Fortunately,
the unary operators in C have the highest precedence, so they appear in
function n + 1, as does factor in the example above. Otherwise, upon
entry, expr would have to check for the occurrence of unary operators
at the kth level.
Using this technique also simplifies the grammar for expressions, be-
cause most of the nonterminals for the intermediate precedence levels
can be omitted.
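The technique can be sketched as a small evaluator. Everything below is illustrative rather than lcc's code: the three precedence levels, the rassoc table, the ^ power operator, and the names gettok, factor, apply, and eval are assumptions made for the example. The point is the single function expr(k) and the choice between expr(k) and expr(k + 1) for the right operand.

```c
#include <ctype.h>

static const char *cp;      /* next unread input character */
static int t;               /* current token: 'n' for a number, 0 at end */
static long tval;           /* value of the current number token */
static int prec[256];       /* precedence of binary operators; 0 otherwise */
static int rassoc[256];     /* nonzero for right-associative operators */

static void gettok(void) {
    while (*cp == ' ')
        cp++;
    if (isdigit((unsigned char)*cp)) {
        for (tval = 0; isdigit((unsigned char)*cp); cp++)
            tval = 10*tval + (*cp - '0');
        t = 'n';
    } else
        t = *cp ? (unsigned char)*cp++ : 0;
}

static long expr(int k);

static long factor(void) {
    long v = 0;

    if (t == 'n') {
        v = tval;
        gettok();
    } else if (t == '(') {
        gettok();
        v = expr(1);
        if (t == ')')
            gettok();
    }
    return v;
}

static long apply(int op, long a, long b) {
    long r = 1;

    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b ? a / b : 0;
    default:                /* '^': integer power, b >= 0 assumed */
        while (b-- > 0)
            r *= a;
        return r;
    }
}

/* Parse and evaluate operators at precedence k and above. */
static long expr(int k) {
    long v;

    if (k > 3)
        return factor();
    v = expr(k + 1);
    while (prec[t] == k) {
        int op = t;
        gettok();
        /* k + 1 groups to the left; k groups to the right */
        v = apply(op, v, expr(rassoc[op] ? k : k + 1));
    }
    return v;
}

long eval(const char *s) {
    prec['+'] = prec['-'] = 1;
    prec['*'] = prec['/'] = 2;
    prec['^'] = 3;
    rassoc['^'] = 1;
    cp = s;
    gettok();
    return expr(1);
}
```

Here the associativity decision is encoded in a table, which is one of the three options mentioned above; adding an operator means adding one entry to prec and, if needed, to rassoc.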
154 CHAPTER 8 • EXPRESSIONS
primary-expression:
identifier
constant
string-literal
' ( ' expression ' ) '
There are seven parsing functions for expressions corresponding to the
expression nonterminals in this grammar. The parsing function for
8.3 • PARSING C EXPRESSIONS 155
expr0 calls expr to parse the expression, and passes the resulting tree
to root, which returns only the tree that has a side effect. For example,
the statement a + f() includes a useless addition, which lcc is free
to eliminate (even if the addition would overflow). Given the tree for
this expression, root returns the tree for f(). root is described in
Exercise 12.9.
    if (t == '='
    || (prec[t] >= 6 && prec[t] <= 8)
    || (prec[t] >= 11 && prec[t] <= 13)) {
        int op = t;
        t = gettok();
        if (oper[op] == ASGN)
            p = asgntree(ASGN, p, value(expr1(0)));
        else
            ⟨augmented assignment 158⟩
    }
    ⟨test for correct termination 156⟩
    return p;
}
oper[op] will be the corresponding generic tree operator for these characters,
for example, oper['+'] is ADD. expr1 handles augmented assignments,
such as +=, by recognizing the two tokens that make up the augmented
assignment operator:
⟨augmented assignment 158⟩≡ 157
{
    expect('=');
    p = incr(op, p, expr1(0));
}
Each augmented assignment operator is one token, but this code appears
to treat them as two tokens; expr3, described below, avoids this erroneous
interpretation by recognizing tokens like + as binary operators
only when they aren't immediately followed by an equals sign. Thus,
expr1 correctly interprets a += b as an augmented assignment and lets
expr3 discover the error in a + = b.
incr builds trees for expressions of form v ⊗= e for any binary operator
⊗, lvalue v, and rvalue e.
⟨tree.c functions⟩+≡ 157 159
Tree incr(op, v, e) int op; Tree v, e; {
    return asgntree(ASGN, v, (*optree[op])(oper[op], v, e));
}
incr is one place where the front end builds a dag instead of a tree. For
example, Figure 8.2 shows the tree returned by incr for *f() += b. *f()
must be evaluated only once, but the lvalue it computes is used twice -
once for the rvalue and once as the target of the assignment. Building
only one tree for *f() reflects these semantics. Ultimately, these kinds
of trees require temporaries, which are generated when the trees are
converted into nodes; Chapter 12 explains.
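The sharing that turns this tree into a dag can be seen in a small sketch. The operator codes, the struct, and the helper mk are hypothetical simplifications; lcc's incr goes through asgntree and the optree table, which also perform semantic checks that are omitted here.

```c
#include <stdlib.h>

enum { ASGN = 1, ADD, INDIR, CALL };    /* hypothetical operator codes */

struct tree { int op; struct tree *kids[2]; };

static struct tree *mk(int op, struct tree *l, struct tree *r) {
    struct tree *p = calloc(1, sizeof *p);

    if (p == NULL)
        abort();
    p->op = op;
    p->kids[0] = l;
    p->kids[1] = r;
    return p;
}

/* Build v op= e as ASGN(v, op(v, e)). The operand v is shared, not
   copied, so two parents point at the same node and the result is a dag. */
struct tree *incr(int op, struct tree *v, struct tree *e) {
    return mk(ASGN, v, mk(op, v, e));
}
```

Because the ASGN node and the ADD node refer to the same v, a later pass can evaluate v once and reuse the result, which is the behavior required for an operand such as *f().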
These dags could have been avoided by using additional tree operators
for augmented assignments. Doing this would increase the number
of tree operators, and it might complicate the semantic analyses for the
binary operators involved. For example, addtree, the function that performs
the semantic analysis for +, might have to cope with both + and
+=. There are several other situations in which it's useful to permit dags;
an example occurs in dealing with nested functions, which is described
in Section 9.3.
[Figure residue; node labels recovered: CALL+P, INDIR+I, ADDRG+P f, ADDRG+P b.]
FIGURE 8.2 Tree for *f() += b.
binary-expression:
unary-expression { binary-operator unary-expression }
binary-operator:
one of || && '|' ^ & == != < > <= >= << >> + - * / %
and are parsed by one function, as described in Section 8.2. Using that
approach, the parsing function - without its tree-building code - for
binary-expression is
void expr3(k) int k; {
if (k > 13)
unary();
else {
expr3(k + 1);
while (prec[t] == k) {
t = gettok();
expr3(k + 1);
}
}
}
expr3(14)
unary()
Of the calls leading up to the first call to unary (which parses the a), only
expr3(6) does useful work after unary returns. And none of the recursive
calls from within the while loop leading to the second call to unary
(which parses b) do useful work.
This sequence reveals the overall effect of the calls to expr3: parse
a unary-expression, then parse binary-expressions at precedence levels
13, 12, ..., 4. The recursion can be replaced by counting from 14 down
to k. Since nothing interesting happens until the precedence is equal to
prec[t], counting can begin there:
void expr3(k) int k; {
    int k1;

    unary();
    for (k1 = prec[t]; k1 >= k; k1--)
        while (prec[t] == k1) {
            t = gettok();
            expr3(k1 + 1);
        }
}
This transformation also benefits the one remaining recursive call to
expr3 by eliminating most of the recursion in that call. Now, the sequence
of calls for a|b is
expr3(4)
    unary()
    expr3(7)
        unary()
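The counting version can be exercised with a small evaluator. This is a sketch under simplifying assumptions: each input character is one token, operands are single digits, only +, -, and * are defined, and the names cp and eval3 are hypothetical. The precedences 12 and 13 follow the values quoted from Table 8.2.

```c
#include <ctype.h>

static const char *cp;      /* *cp is the current one-character token */
static int prec[256];       /* 0 for anything that is not a binary operator */

/* In this sketch an operand is a single digit. */
static long unary(void) {
    return isdigit((unsigned char)*cp) ? *cp++ - '0' : 0;
}

/* Parse binary operators of precedence k and above by counting down
   from the precedence of the current token instead of recursing
   through every level. */
static long expr3(int k) {
    long v = unary();
    int k1;

    for (k1 = prec[(unsigned char)*cp]; k1 >= k; k1--)
        while (prec[(unsigned char)*cp] == k1) {
            int op = *cp++;
            long r = expr3(k1 + 1);
            v = op == '+' ? v + r : op == '-' ? v - r : v * r;
        }
    return v;
}

long eval3(const char *s) {
    prec['+'] = prec['-'] = 12;
    prec['*'] = 13;
    cp = s;
    return expr3(4);
}
```

For an operand that is not followed by an operator, prec is zero, the for loop body never runs, and expr3 returns after a single call to unary; that is where the savings over the fully recursive version come from.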
Adding the code to validate and build the trees and to solve two remaining
minor problems (augmented assignments and the && and || operators)
yields the final version of expr3:
⟨tree.c functions⟩+≡ 160 164
static Tree expr3(k) int k; {
    int k1;
    Tree p = unary();
    ...
            p = pointer(p);
            if (op == ANDAND || op == OROR) {
                r = pointer(expr3(k1));
                if (events.points)
                    ⟨plant event hooks for && ||⟩
            } else
                r = pointer(expr3(k1 + 1));
            p = (*optree[op])(oper[op], p, r);
        }
    return p;
}
Like conditional expressions, the && and || operators alter flow of control
and thus must provide for event hooks.
Technically, the && and || operators are left-associative, and their right
operands are evaluated only if necessary. It simplifies node generation if
they are treated as right-associative during parsing. Each operator is the
sole occupant of its precedence level, so, for example, making && right-associative
simply yields a right-heavy ANDAND tree instead of a left-heavy
one. As detailed in Section 12.3, this apparent error is not only repaired
during node generation, but leads to better code for the short-circuit
evaluation of && and || than left-heavy trees. Making || right-associative
requires calling expr3(4) instead of expr3(5) in the while loop. For &&,
expr3(5) must be called instead of expr3(6). Calling expr3(k1) instead
of expr3(k1+1) for these two operators makes the appropriate calls.
The last problem is augmented assignment. expr1 recognizes the
augmented-assignment operators by recognizing two-token sequences.
But these operators are single tokens, not two-token sequences; for example,
+= is the token for additive assignment, and + = is a syntax error.
expr1's approach is correct only if + = is never recognized as +=. expr3
guarantees this condition by doing just the opposite: a binary operator
is recognized only when it is not followed immediately by an equals sign.
Thus, the + in a + = b is not recognized as a binary operator, and lcc
detects the syntax error.
unary-operator:
one of ++ -- & * + - ~ !
postfix-expression:
primary-expression { postfix-operator }
postfix-operator:
'[' expression ']'
'(' [ assignment-expression { , assignment-expression } ] ')'
. identifier
-> identifier
++
--
and the productions for primary-expression, which are given in the next
section. The parsing components of these functions are simple because
these productions are simple. The parsing function for unary-expression
is an example: most of the unary operators are parsed by consuming the
operator, parsing the operand, and building the tree.
⟨tree.c functions⟩+≡ 162 166
static Tree unary() {
    Tree p;

    switch (t) {
    case '*': ⟨p ← unary 165⟩ ⟨indirection 179⟩ break;
    case '&': ⟨p ← unary 165⟩ ⟨address of 179⟩ break;
    case '+': ⟨p ← unary 165⟩ ⟨affirmation⟩ break;
    case '-': ⟨p ← unary 165⟩ ⟨negation 178⟩ break;
    case '~': ⟨p ← unary 165⟩ ⟨complement⟩ break;
    case '!': ⟨p ← unary 165⟩ ⟨logical not⟩ break;
⟨p ← unary 165⟩≡ 164
    t = gettok(); p = unary();
Most of the fragments perform semantic checks, which are described
in the next chapter. Three are simple enough to dispose of here. The
expression ++e is semantically equivalent to the augmented assignment
e += 1, so incr can build the tree for unary ++:
⟨preincrement 165⟩≡ 164
    p = incr(INCR, pointer(p), consttree(1, inttype));
Predecrement is similar.
sizeof '(' type-name ')' is a constant of type size_t that gives the
number of bytes occupied by an instance of type-name. In lcc, size_t
is unsigned. Similarly, the unary-expression in sizeof unary-expression
serves only to provide a type whose size is desired; the unary-expression
is not evaluated at runtime. Most of the effort in parsing sizeof goes
into distinguishing between these two forms of sizeof and finding the
appropriate type. Notice that the parentheses are required if the operand
is a type-name.
⟨sizeof 165⟩≡ 164
    Type ty;
    p = NULL;
    if (t == '(') {
        t = gettok();
        if (istypename(t, tsym)) {
            ty = typename();
            expect(')');
        } else {
            p = postfix(expr(')'));
            ty = p->type;
        }
    } else {
        p = unary();
        ty = p->type;
    }
    if (isfunc(ty) || ty->size == 0)
        error("invalid type argument '%t' to 'sizeof'\n", ty);
    else if (p && rightkid(p)->op == FIELD)
        error("'sizeof' applied to a bit field\n");
    p = consttree(ty->size, unsignedtype);
As the code suggests, sizeof cannot be applied to functions, incomplete
types, or those derived from bit fields.
In unary and in ⟨sizeof⟩, a left parenthesis is a primary-expression or,
if the next token is a type name, the beginning of a type cast.
If a left parenthesis does not introduce a type cast, it's too late to let
primary parse the parenthesized expression, so unary must handle it.
This is why postfix expects its caller to call primary and pass it the
resulting tree instead of calling primary itself:
⟨tree.c functions⟩+≡ 164 167
static Tree postfix(p) Tree p; {
    for (;;)
        switch (t) {
        case INCR:  ⟨postincrement 166⟩ break;
        case DECR:  ⟨postdecrement⟩ break;
        case '[':   ⟨subscript 181⟩ break;
        case '(':   ⟨calls 186⟩ break;
        case '.':   ⟨struct.field⟩ break;
        case DEREF: ⟨pointer->field 182⟩ break;
        default:
            return p;
        }
}
constant
string-literal
' ( ' expression ' ) '
    Tree p;

    switch (t) {
    case ICON:
    case FCON: ⟨numeric constants 167⟩ break;
    case SCON: ⟨string constants 168⟩ break;
    case ID:   ⟨an identifier 170⟩ break;
    default:
        error("illegal expression\n");
        p = consttree(0, inttype);
    }
    t = gettok();
    return p;
}
CNST trees hold the values of integer and floating constants in their u.v
fields:
⟨numeric constants 167⟩≡ 167
    p = tree(CNST + ttob(tsym->type), tsym->type, NULL, NULL);
    p->u.v = tsym->u.c.v;
[Figure residue; node labels recovered: RIGHT, RIGHT, ASGN+I, INDIR+I, ADD+I, ADDRG+P, CNST+I 1.]
⟨tree.c data⟩+≡ 155
float refinc = 1.0;
p->ref is an estimate of the number of references to the identifier described
by p; other functions can adjust the weight of one reference to
p by changing refinc. All external, static, and global identifiers are addressed
with ADDRG operators; parameters are addressed with ADDRF; and
locals are addressed with ADDRL.
Arrays and functions cannot be used as lvalues or rvalues, so references
to them have only the appropriate addressing operators. Trees for
other types refer to the identifiers' rvalues; an example is the tree for i's
rvalue in Figure 8.3. rvalue adds the INDIR:
⟨tree.c functions⟩+≡ 168 169
Tree rvalue(p) Tree p; {
    Type ty = deref(p->type);

    ty = unqual(ty);
    return tree(INDIR + (isunsigned(ty) ? I : ttob(ty)),
        ty, p, NULL);
}
rvalue can be called with any tree that represents a pointer value.
lvalue, however, must be called with only trees that represent an rvalue
- the contents of an addressable location. The INDIR tree added by
rvalue also signals that a tree is a valid lvalue, and the address is exposed
by tearing off the INDIR. lvalue implements this check and transformation:
⟨tree.c functions⟩+≡ 169 173
Tree lvalue(p) Tree p; {
    if (generic(p->op) != INDIR) {
        error("lvalue required\n");
        return value(p);
    } else if (unqual(p->type) == voidtype)
        warning("'%t' used as an lvalue\n", p->type);
    return p->kids[0];
}
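The relationship between the two functions, where rvalue wraps an address in an INDIR and lvalue tears it off again, can be sketched in isolation. The operator codes, the struct, and mk are hypothetical, and unlike the function above this lvalue returns NULL on error instead of calling value(p).

```c
#include <stdio.h>
#include <stdlib.h>

enum { ADDRG = 1, INDIR, CNST };    /* hypothetical operator codes */

struct tree { int op; struct tree *kids[2]; };

static struct tree *mk(int op, struct tree *l, struct tree *r) {
    struct tree *p = calloc(1, sizeof *p);

    if (p == NULL)
        abort();
    p->op = op;
    p->kids[0] = l;
    p->kids[1] = r;
    return p;
}

/* Fetch the contents of the location whose address is addr. */
struct tree *rvalue(struct tree *addr) {
    return mk(INDIR, addr, NULL);
}

/* Expose the address under an INDIR; anything else is not an lvalue.
   Returning NULL on error is a simplification made for this sketch. */
struct tree *lvalue(struct tree *p) {
    if (p->op != INDIR) {
        fprintf(stderr, "lvalue required\n");
        return NULL;
    }
    return p->kids[0];
}
```

For any address tree a, lvalue(rvalue(a)) yields a again, while a constant, which has no INDIR at its root, is rejected.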
The tree for a structure parameter also depends on the value of the
interface field wants_argb. If wants_argb is 1, the code shown above
builds the appropriate tree, which has the form (INDIR+B (ADDRF+P x))
for parameter x. If wants_argb is zero, the front end implements structure
arguments by copying them at a call and passing pointers to the
copies. Thus, a reference to a structure parameter needs another indirection
to access the structure itself:
⟨return a tree for a struct parameter 170⟩≡ 168
{
    e = tree(op, ptr(ptr(p->type)), NULL, NULL);
    e->u.sym = p;
    return rvalue(rvalue(e));
}
Further Reading
Handling n levels of precedence with one parsing function instead of
n parsing functions is well known folklore in compiler circles, but there
are few explanations of the technique. Hanson (1985) describes the technique
as it is used in lcc. Holzmann (1988) used a similar technique
in his image manipulation language, pico. The technique is technically
equivalent to the one used in BCPL (Richards and Whitby-Strevens 1979),
but the operators and their precedences and associativities are spread
throughout the BCPL code instead of being encapsulated in tables.
Exercises
8.1 Implement
⟨tree.c exported functions⟩≡
extern Tree retype ARGS((Tree p, Type ty));
9.1 Conversions
Conversion functions accept one or more types and return a resulting
type, or accept a tree and perhaps a type and return a tree with the
appropriate conversion. promote(Type ty) is an example of the former
kind of conversion: It implements the integral promotions. It widens
an integral type ty to int, unsigned, or long, if necessary. As stipulated
lcc assumes that doubles and long doubles are the same size and that
longs and ints (both unsigned and signed) are also the same size. These
assumptions simplify the standard's specification of the usual arithmetic
conversions and thus simplify binary. The list below summarizes the
standard's specification in the more general case, when a long double is
bigger than a double, and a long is bigger than an unsigned int:
long double
double
float
unsigned long int
long int
unsigned int
int
The type of the operand that appears highest in this list is the type to
which the other operand is converted. If none of these types apply, the
operands are converted to ints. lcc's assumptions collapse the first two
types to the first if statement in binary, and the second if statement
handles floats. The third if statement handles the four integer types
because lcc's signed long cannot represent all unsigned values.
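The general rule in the list can be written down directly. The sketch below models only the list itself, under the stated assumption that a long is bigger than an unsigned int; it is not lcc's binary, and the enumeration ctype and the function binary_type are hypothetical names.

```c
/* Types in the order of the list above, highest first. SHORT and CHAR
   are added below INT to model operands that match none of the entries. */
enum ctype {
    LONG_DOUBLE, DOUBLE, FLOAT,
    UNSIGNED_LONG, LONG, UNSIGNED_INT, INT,
    SHORT, CHAR
};

/* Result type of a binary arithmetic operator: whichever operand type
   appears higher in the list, or int if neither appears in it. */
enum ctype binary_type(enum ctype a, enum ctype b) {
    enum ctype ty = a < b ? a : b;

    return ty > INT ? INT : ty;
}
```

Under lcc's own assumptions the list is shorter, because long double and double coincide and long and int have the same size; this sketch does not model that collapse.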
pointer is an example of the second kind of conversion function that
takes a tree and returns a tree, possibly converted. Array and function
types decay into pointers when used in expressions: (ARRAY T) and
(FUNCTION T) decay into (POINTER T) and (POINTER (FUNCTION T)).
⟨tree.c functions⟩+≡ 173 174
Tree pointer(p) Tree p; {
    if (isarray(p->type))
        p = retype(p, atop(p->type));
    else if (isfunc(p->type))
        p = retype(p, ptr(p->type));
    return p;
}
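The decay rule can be illustrated at the level of types alone. This sketch uses a hypothetical two-field type representation and a hypothetical function decay; lcc's atop and ptr do the corresponding work on its real Type structures, and pointer applies the result to a tree with retype.

```c
#include <stdlib.h>

enum kind { INT, POINTER, ARRAY, FUNCTION };    /* hypothetical type operators */

struct type {
    enum kind op;
    struct type *type;      /* element, referent, or return type */
};

/* Build (POINTER ty). */
static struct type *ptr(struct type *ty) {
    struct type *p = calloc(1, sizeof *p);

    if (p == NULL)
        abort();
    p->op = POINTER;
    p->type = ty;
    return p;
}

/* (ARRAY T) becomes (POINTER T); (FUNCTION T) becomes
   (POINTER (FUNCTION T)); every other type is returned unchanged. */
struct type *decay(struct type *ty) {
    if (ty->op == ARRAY)
        return ptr(ty->type);
    if (ty->op == FUNCTION)
        return ptr(ty);
    return ty;
}
```

Note the asymmetry: an array loses its ARRAY wrapper and points at the element type, while a function type is kept whole and simply gains a pointer in front of it.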
rvalue, lvalue, and value can also be viewed as conversions. cond is
the inverse of value; it takes a tree that might represent a value and
turns it into a tree for a conditional by adding a comparison with zero:
⟨tree.c functions⟩+≡ 174 175
Tree cond(p) Tree p; {
    int op = generic(rightkid(p)->op);
[Figure residue; node labels recovered: C, S, I, U, P, F, D.]
FIGURE 9.1 Conversions.
⟨tree.c functions⟩+≡ 174 182
Tree cast(p, type) Tree p; Type type; {
    Type pty, ty = unqual(type);

    p = value(p);
    if (p->type == type)
        return p;
    pty = unqual(p->type);
    if (pty == ty)
        return retype(p, type);
    ⟨convert p to super(pty) 175⟩
    ⟨convert p to super(ty) 176⟩
    ⟨convert p to ty 177⟩
    return p;
}
As shown, these conversions are done with the unqualified versions of
the types involved. super returns its argument's supertype.
The first step makes all signed integers ints, floats doubles, and pointers
unsigneds:
⟨convert p to super(pty) 175⟩≡ 175
switch (pty->op) {
case CHAR:     p = simplify(CVC, super(pty), p, NULL); break;
case SHORT:    p = simplify(CVS, super(pty), p, NULL); break;
case FLOAT:    p = simplify(CVF, doubletype, p, NULL); break;
case INT:      p = retype(p, inttype); break;
case DOUBLE:   p = retype(p, doubletype); break;
case ENUM:     p = retype(p, inttype); break;
case UNSIGNED: p = retype(p, unsignedtype); break;
case POINTER:
    if (isptr(ty)) {
        ⟨pointer-to-pointer conversion 176⟩
    } else
        p = simplify(CVP, unsignedtype, p, NULL);
    break;
}
simplify builds trees just like tree, but folds constants, if possible, and,
if a generic operator is given as its first argument, simplify forms the
type-specific operator from its first and second arguments. lcc insists
that pointers fit in unsigned integers, so that they can be carried by unsigned
operators, which reduces the operator vocabulary. There's one
special case: the CVP+U is eliminated for pointer-to-pointer conversions
because it's always useless there.
⟨pointer-to-pointer conversion 176⟩≡ 175
if (isfunc(pty->type) && !isfunc(ty->type)
|| !isfunc(pty->type) && isfunc(ty->type))
    warning("conversion from '%t' to '%t' is compiler dependent\n",
        p->type, ty);
return retype(p, type);
lcc warns about conversions between object pointers and function pointers
because the standard permits these different kinds of pointers to
have different sizes. lcc, however, insists that they have the same sizes.
The second step converts p, which is now a double, int, or unsigned,
to whichever one of these three types is ty's supertype, if necessary:
⟨convert p to super(ty) 176⟩≡ 175
{
    Type sty = super(ty);
    pty = p->type;
    if (pty != sty)
        if (pty == inttype)
            p = simplify(CVI, sty, p, NULL);
        else if (pty == doubletype)
            if (sty == unsignedtype) {
                ⟨double-to-unsigned conversion⟩
            } else
                p = simplify(CVD, sty, p, NULL);
        else if (pty == unsignedtype)
            if (sty == doubletype) {
                ⟨unsigned-to-double conversion 177⟩
            } else
                p = simplify(CVU, sty, p, NULL);
}
Notice that there are no arrows directly between D and U in Figure 9.1.
Most machines have instructions that convert between signed integers
and doubles, but few have instructions that convert between unsigneds
and doubles, so there is no CVU+D or CVD+U. Instead, the front end builds
trees that implement these conversions, assuming that integers and unsigneds
are the same size.
An unsigned u can be converted to a double by constructing an expression
equivalent to
2.*(int)(u>>1) + (int)(u&1)
u>>1 vacates the sign bit so that the shifted result, which is equal to
u/2, can be converted to a double with an integer-to-double conversion.
The floating-point multiplication and addition compute the value desired.
The code builds the tree for this expression:
⟨unsigned-to-double conversion 177⟩≡ 176
Tree two = tree(CNST+D, doubletype, NULL, NULL);
two->u.v.d = 2.;
p = (*optree['+'])(ADD,
    (*optree['*'])(MUL,
        two,
        simplify(CVU, inttype,
            simplify(RSH, unsignedtype,
                p, consttree(1, inttype)), NULL)),
    simplify(CVU, inttype,
        simplify(BAND, unsignedtype,
            p, consttree(1, unsignedtype)), NULL));
Notice that this tree is a dag: It contains two references to p. The optree
functions are used for the multiplication and addition so that the
integer-to-double conversions will be included.
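The expression that this tree encodes can be checked directly in C. The function name unsigned_to_double is hypothetical; the comparison against a plain cast in the usage below additionally assumes that a double can represent every unsigned value exactly, as it can with 32-bit unsigneds and IEEE doubles.

```c
#include <limits.h>

/* Convert u to double using only signed int-to-double conversions.
   u >> 1 clears the top bit, so the cast to int cannot overflow;
   the low bit that the shift discards is added back separately. */
double unsigned_to_double(unsigned u) {
    return 2. * (int)(u >> 1) + (int)(u & 1);
}
```

For instance, unsigned_to_double(UINT_MAX) doubles UINT_MAX/2 and adds 1, which reproduces UINT_MAX without ever converting a value that has the sign bit set.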
The front end implements double-to-unsigned conversions by constructing
a tree for the appropriate expression. Exercise 9.2 explores
how.
The tree now represents a value whose type is the supertype of ty,
and the third step in cast converts the tree to the destination type. This
step is essentially the inverse of super:
⟨convert p to ty 177⟩≡ 175
if (ty == signedchar || ty == chartype || ty == shorttype)
    p = simplify(CVI, type, p, NULL);
else if (isptr(ty)
|| ty == unsignedchar || ty == unsignedshort)
    p = simplify(CVU, type, p, NULL);
else if (ty == floattype)
    p = simplify(CVD, type, p, NULL);
else
    p = retype(p, type);
Exercise 9.5 explains YYnull and nullcheck, which help catch null-pointer
errors.
Type casts specify explicit conversions. Some casts, such as pointer-to-pointer
casts, generate no code, but simply specify the type of an expression.
Other casts, such as int-to-float, generate code that effects the
conversion at runtime. The code below and the code in cast implement
the rules specified by the standard.
The standard stipulates that the target type specified in a cast must be
a qualified or unqualified scalar type or void, and the type of the operand
- the source type - must be a scalar type. The semantic analysis of
casts divides into computing and checking the target type, parsing the
operand, and computing and checking the source type. typename parses
a type declarator and returns the resulting Type, and thus does most of
the work of computing the target type, except for qualified enumerations:
⟨type cast 180⟩≡ 180 164
Type ty, ty1 = typename(), pty;
expect(')');
ty = unqual(ty1);
if (isenum(ty)) {
    Type ty2 = ty->type;
    if (isconst(ty1))
        ty2 = qual(CONST, ty2);
    if (isvolatile(ty1))
        ty2 = qual(VOLATILE, ty2);
    ty1 = ty2;
    ty = ty->type;
}
This code computes the target type ty1 and its unqualified variant ty.
The target type for a cast that specifies an enumeration type is the
enumeration's underlying integral type (which for lcc is always int), not the
enumeration. Thus, ty1 and ty must be recomputed before parsing the
operand.
⟨type cast 180⟩+≡ 180 180 164
p = pointer(unary());
pty = p->type;
if (isenum(pty))
    pty = pty->type;
This tree is cast to the unqualified type, ty, if the target and source types
are legal: arithmetic and enumeration types can be cast to each other;
pointers can be cast to other pointers; pointers can be cast to integral
types and vice versa, but the result is undefined if the sizes of the types
differ; and any type can be cast to void.
⟨type cast 180⟩+≡ 180 181 164
if (isarith(pty) && isarith(ty)
} else {
    ty = q->type;
    ⟨qualify ty, when necessary 183⟩
    ty = ptr(ty);
}
p = simplify(ADD+P, ty, p, consttree(q->offset, inttype));
[Figure residue; node labels recovered: CALL+B, RIGHT, ADDRL+P t3, ADDRG+P f, ARG+I, ARG+P, ARG+B, CALL+I, ADDRG+P atoi, CNST+I 10, INDIR+P, INDIR+B, ADDRG+P str, ADDRG+P a.]
FIGURE 9.2 Tree for f(a, '\n', atoi(str)).
code that mimics the established calling sequences on one or more of its
targets. These complexities are the price of compatibility with existing
calling conventions.
The meaningless program
char *str;
struct node { ... } a;
struct node f(struct node x, char c, int i) { ... }
main () { f(a, '\n', atoi(str)); }
illustrates almost all these complexities. The tree for the call to f is
shown in Figure 9.2, which assumes that wants_argb is one. The CALL+B's
right operand is described below. The RIGHT trees in this figure collaborate
to achieve the desired evaluation order. A CALL's left operand is
a RIGHT tree that evaluates the arguments (the ARG trees) and the function
itself. The leftmost RIGHT tree in Figure 9.2 is an example. The
tree whose root is the shaded RIGHT in Figure 9.2 occurs because of the
nested call to atoi. When this tree is traversed, code is generated so
that the call to atoi occurs before the arguments to f are evaluated. In
general, there's one RIGHT tree for each argument that includes a call,
and one if the function name is itself an expression with a call.
The actual arguments are represented by ARG trees, rightmost argument
first; their right operands are the trees for the evaluation of the rest
of the actual arguments. Recall that ARG trees can have two operands.
[Figure residue; node labels recovered: ARG+P, RIGHT, ASGN+B, ADDRL+P t2 (twice), INDIR+B, ADDRG+P a.]
FIGURE 9.3 Passing a structure by value when wants_argb is zero.
The topmost ARG+I is for the argument atoi(str), and its left operand
points to the CALL+I described above. The presence of the RIGHT tree
will cause the back end to store the value returned by atoi in a temporary,
and the reference from the ARG+I to the CALL+I for atoi will pass
that value to f.
The second ARG+I is for the newline passed as the second argument.
f has a prototype and is thus a new-style function, so it might be expected
that the integer constant '\n' would be converted to and passed
as a character. Most machines have constraints, such as stack alignment,
that force subword types to be passed as words. Even without such constraints,
passing subword types as words is usually more efficient. So
lcc generates code to widen short arguments and character arguments
to integers when they are passed, and code to narrow them upon entry
for new-style functions. If the global char ch was passed as f's second
argument, the tree would be
(ARG+I (CVC+I (INDIR+C (ADDRG+P ch))))
this address to the caller. When wants_callb is zero, the front end arranges
to pass this address as a hidden first argument, and it changes
the CALL+B to a CALL+V; in this case, the back end never sees CALLB nodes.
This change is made by listnodes when the tree for a call is converted
to a forest of nodes for the back end.
listnodes also inspects the interface flag left_to_right as it traverses
a call tree. If left_to_right is one, the argument subtree is
traversed by visiting the right operands of ARG trees first, which generates
code that evaluates the arguments from the left to the right. If
left_to_right is zero, the left operands are visited first, which evaluates
the arguments from the right to the left.
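The effect of the flag on evaluation order can be sketched with a linked list standing in for the chain of ARG trees. The struct arg and the function visit_args are hypothetical: value plays the role of an ARG tree's left operand and rest the role of its right operand, with the rightmost argument at the head of the chain as described above.

```c
#include <stddef.h>

/* One ARG tree: value stands for kids[0] (the argument itself) and
   rest for kids[1] (the ARG tree for the arguments to its left). */
struct arg {
    int value;
    const struct arg *rest;
};

/* Record the order in which arguments would be evaluated. out must have
   room for every argument; the return value is the new element count. */
int visit_args(const struct arg *a, int left_to_right, int *out, int n) {
    if (a == NULL)
        return n;
    if (left_to_right) {
        n = visit_args(a->rest, 1, out, n);   /* right operand first */
        out[n++] = a->value;
    } else {
        out[n++] = a->value;                  /* left operand first */
        n = visit_args(a->rest, 0, out, n);
    }
    return n;
}
```

Visiting the right operand first reaches the leftmost argument before any other, so the recorded order is left to right; visiting the left operand first starts with the head of the chain, which is the rightmost argument.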
The case in postfix checks the type of the function expression and
lets call do most of the work:
⟨calls 186⟩≡ 166
{
    Type ty;
    Coordinate pt;
    p = pointer(p);
    if (isptr(p->type) && isfunc(p->type->type))
        ty = p->type->type;
    else {
        error("found '%t' expected a function\n", p->type);
        ty = func(voidtype, NULL, 1);
    }
    pt = src;
    t = gettok();
    p = call(p, ty, pt);
}
call dedicates locals to deal with each of the semantic issues described
above. n counts the number of actual arguments. args is the
root of the argument tree, and r is the root of the RIGHT tree that holds
arguments or function expressions that include calls. For the example
shown in Figure 9.2, r points to the CALL+I tree. After parsing the arguments,
if r is nonnull, it and args are pasted together in a RIGHT tree,
which is the subtree rooted at the shaded RIGHT in Figure 9.2. hascall returns
a nonzero value if its argument tree includes a CALL, and funcname
returns the name buried in f or the string "a function" if f computes a
function address.
⟨enode.c functions⟩≡ 189
Tree call(f, fty, src) Tree f; Type fty; Coordinate src; {
    int n = 0;
    Tree args = NULL, r = NULL;
    Type *proto, rty = unqual(freturn(fty));
    Symbol t3 = NULL;

    if (fty->u.f.oldstyle)
        proto = NULL;
    else
        proto = fty->u.f.proto;
    if (hascall(f))
        r = f;
    if (isstruct(rty))
        ⟨initialize for a struct function 187⟩
    if (t != ')')
        for (;;) {
            ⟨parse one argument 188⟩
            if (t != ',')
                break;
            t = gettok();
        }
    expect(')');
    if (⟨still in a new-style prototype? 187⟩)
        error("insufficient number of arguments to %s\n",
            funcname(f));
    if (r)
        args = tree(RIGHT, voidtype, r, args);
    if (events.calls)
        ⟨plant an event hook for a call⟩
    return calltree(f, rty, args, t3);
}
f is the expression for the function, rty is the return type, and proto is
either null for an old-style function (even if it has a prototype; see
Section 4.5) or walks along the function prototype for a new-style function.
A nonnull proto is incremented for each actual argument that corresponds
to a formal parameter in a new-style prototype, and
⟨still in a new-style prototype? 187⟩≡ 187 188
proto && *proto && *proto != voidtype
tests if proto points to a formal parameter type, when there is a prototype.
Reaching the end of a prototype is different from reaching the end
of the actual arguments; for example, excess arguments are permitted in
new-style functions with a variable number of arguments.
If the function returns a structure, t3 is the temporary that's generated
to hold the return value:
⟨initialize for a struct function 187⟩≡ 187
{
    t3 = temporary(AUTO, unqual(rty), level);
    if (rty->size == 0)
        error("illegal use of incomplete type '%t'\n", rty);
}
        q->type, *proto);
    if ((isint(q->type) || isenum(q->type))
    && q->type->size != inttype->size)
        q = cast(q, promote(q->type));
    ++proto;
}
⟨enode.c functions⟩+≡ 186 191
Tree calltree(f, ty, args, t3)
Tree f, args; Type ty; Symbol t3; {
    Tree p;

    if (args)
        f = tree(RIGHT, f->type, args, f);
    if (isstruct(ty))
        p = tree(RIGHT, ty,
            tree(CALL+B, ty, f, addrof(idtree(t3))),
            idtree(t3));
    else {
        Type rty = ty;
        if (isenum(ty))
            rty = unqual(ty)->type;
        else if (isptr(ty))
            rty = unsignedtype;
        p = tree(CALL + widen(rty), promote(rty), f, NULL);
        if (isptr(ty) || p->type->size > ty->size)
            p = cast(p, ty);
    }
    return p;
}
[Figure: the tree that calltree builds for a call to a function f that
returns a structure: (RIGHT (CALL+B (ADDRG+P f) (ADDRL+P t3))
(INDIR+B (ADDRL+P t3))).]
CALL+B itself returns no value; it exists only to permit back ends to
generate target-specific calling sequences for these functions.
addrof is an internal version of lvalue that doesn't insist on an INDIR
tree (although there is an INDIR tree in calltree's use of addrof). addrof
follows the operands of RIGHT, COND, and ASGN, and the INDIR trees to find
the tree that computes the address specified by its argument. It returns
a RIGHT tree representing the original tree and that address, if necessary.
For example, if α is the operand tree buried in p that computes the
address, addrof(p) returns (RIGHT root(p) α); if p itself computes the
address, addrof(p) returns p.
Structures are always passed by value, but if wants_argb is zero and
the argument is a structure, it must be copied to a temporary as ex-
plained above. There's one optimization that improves the code for pass-
ing structures that are returned by functions. For example, in
f(f(a, '\n', atoi(str)), '0', 1);
the node returned by the inner call to f is passed to the outer call. In this
and similar cases, copying the actual argument can be avoided because
it already resides in a temporary. The pattern that must be detected is
9.4 • BINARY OPERA TORS 191
(RIGHT
    (CALL+B ...)
    (INDIR+B (ADDRL+P temp))
)
Function    Operators
incr        +=  -=  *=  /=  %=
            <<=  >>=  &=  ^=  |=
asgntree    =
condtree    ? :
andtree     ||  &&
bittree     |  ^  &  %
eqtree      ==  !=
cmptree     <  >  <=  >=
shtree      <<  >>
addtree     +
subtree     -
multree     *  /
TABLE 9.1 Operator semantics functions.
ty = unsignedtype;
⟨cast l and r to type ty 192⟩
} else {
ty = unsignedtype;
typeerror(op, l, r);
}
return simplify(op + ttob(ty), inttype, l, r);
}
The third argument of zero to eqtype causes eqtype to insist that its
two type arguments are object types or incomplete types.
The equality comparison operators are similar to the relationals but
are fussier about pointer operands. These and other operators distinguish
between void pointers, which are pointers to qualified or unqualified
versions of void, and null pointers, which are integral constant
expressions with the value zero or one of these expressions cast to void *.
These definitions are encapsulated in
⟨enode.c macros⟩=
#define isvoidptr(ty) \
    (isptr(ty) && unqual(ty->type) == voidtype)
(enode.c functions)+=
static int isnullptr(e) Tree e; {
...

⟨enode.c functions⟩+=
Tree eqtree(op, l, r) int op; Tree l, r; {
    Type xty = l->type, yty = r->type;

⟨xty and yty point to compatible types 195⟩=
(isptr(xty) && isptr(yty)
&& eqtype(unqual(xty->type), unqual(yty->type), 1))
The third argument of 1 to eqtype causes eqtype to permit its two type
arguments to be any combinations of compatible object or incomplete
types. Given the declaration
int (*p)[10], (*q)[];
eqtype's third argument is what permits p == q but disallows p < q.
9.5 Assignments
The legality of an assignment expression, a function argument, a return
statement, or an initialization depends on the legality of an assignment
of an rvalue to the location denoted by an lvalue. assign(xty, e) performs
the necessary type-checking for any assignment. It checks the
legality of assigning the tree e to an lvalue that holds a value of type
xty, and returns xty if the assignment is legal or null if it's illegal. The
return value is also the type to which e must be converted before the
assignment is made.
⟨enode.c functions⟩+=
Type assign(xty, e) Type xty; Tree e; {
    Type yty = unqual(e->type);

    xty = unqual(xty);
    if (isenum(xty))
        xty = xty->type;
    if (xty->size == 0 || yty->size == 0)
        return NULL;
    ⟨assign 196⟩
}
⟨enode.c functions⟩+=
Tree asgntree(op, l, r) int op; Tree l, r; {
    Type aty, ty;

    r = pointer(r);
    ty = assign(l->type, r);
    if (ty)
        r = cast(r, ty);
    else {
        typeerror(ASGN, l, r);
        if (r->type == voidtype)
            r = retype(r, inttype);
        ty = r->type;
    }
    if (l->op != FIELD)
        l = lvalue(l);
    ⟨asgntree 197⟩
    return tree(op + (isunsigned(ty) ? I : ttob(ty)),
        ty, l, r);
}
When the assignment is illegal, assign returns null and asgntree must
choose a type for the result of the assignment. It uses the type of the
right operand, unless that type is void, in which case asgntree uses int.
This code exemplifies what's needed to recover from semantic errors so
that compilation can continue.
The body of asgntree, revealed by ⟨asgntree⟩, below, detects attempts
to change the value of a const location, changes the integral rvalue of
assignments to bit fields to meet the specifications of the standard, and
transforms some structure assignments to yield better code.
An lvalue denotes a const location if the type of its referent is qualified
by const or is a structure type that is const-qualified. A structure type
so qualified has its u.sym->u.s.cfields flag set.
⟨asgntree 197⟩=
aty = l->type;
if (isptr(aty))
    aty = unqual(aty)->type;
if ( isconst(aty)
|| isstruct(aty) && unqual(aty)->u.sym->u.s.cfields)
    if (isaddrop(l->op)
    && !l->u.sym->computed && !l->u.sym->generated)
        error("assignment to const identifier '%s'\n",
            l->u.sym->name);
    else
        error("assignment to const location\n");
aty is set to the type of the value addressed by the lvalue. The
assignment is illegal if aty has a const qualifier or if it's a structure
type with one or more const-qualified fields. The gymnastics for issuing
the diagnostic are used to cope with lvalues that don't have source-program
names.
The result of an assignment is the value of the left operand, and the
type is the qualified version of the left operand. The cast at the begin-
ning of asgntree sets r to the correct tree and ty to the correct type
for r and ty to represent the result, so the result of ASGN is its right
operand. Unfortunately, this scheme doesn't work for bit fields. The re-
sult of an assignment to a bit field is the value that would be extracted
from the field after the assignment, which might differ from the value
represented by r. So, for assignments to bit fields that occupy less than
a full unsigned, asgntree must change r to a tree that computes just
this value.
⟨asgntree 197⟩+=
if (l->op == FIELD) {
    int n = 8*l->u.field->type->size - fieldsize(l->u.field);
    if (n > 0 && isunsigned(l->u.field->type))
        r = bittree(BAND, r,
            consttree(fieldmask(l->u.field), unsignedtype));
    else if (n > 0) {
        if (r->op == CNST+I)
            r = consttree(r->u.v.i<<n, inttype);
        else
            r = shtree(LSH, r, consttree(n, inttype));
        r = shtree(RSH, r, consttree(n, inttype));
    }
}
If the bit field is unsigned, the result is r with its excess most significant
bits discarded. If the bit field is signed and has m bits, bit m - 1 is
the sign bit and it must be used to sign-extend the value, which can be
done by shifting r left to bring bit m - 1 into the sign bit, and
then arithmetically shifting right by the same amount, dragging the sign bit along in
the process. For example, Figure 9.4 shows the trees assigned to r for
the two assignments in
struct { int a:3; unsigned b:3; } x;
x.a = e;
x.b = e;
In the assignment x.a = e, r is assigned a tree that uses shifts to
sign-extend the rightmost 3 bits of e; for x.b = e, r is assigned a tree
that ANDs e with 7. If r is constant, the left shift is done explicitly to
keep the constant folder from shouting about overflow.
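The two bit-field results can be checked with a small stand-alone sketch. The function names below are hypothetical, not lcc's. The signed version assumes that converting an out-of-range unsigned value to int wraps and that >> on a negative int is an arithmetic shift; both are implementation-defined in C, though common compilers behave this way.

```c
#include <assert.h>
#include <limits.h>

/* Value read back from a signed bit field of `bits` bits after
   e is assigned to it: shift left, then arithmetic shift right. */
int signed_field_value(int e, int bits) {
    int n = (int)(sizeof(int) * CHAR_BIT) - bits;  /* excess high bits */
    assert(bits > 0 && n >= 0);
    /* The left shift is done on an unsigned copy to avoid signed
       overflow; the right shift is done on the signed result so the
       sign bit is dragged along. */
    return (int)((unsigned)e << n) >> n;
}

/* Value read back from an unsigned bit field of `bits` bits:
   the excess most significant bits are masked off. */
unsigned unsigned_field_value(unsigned e, int bits) {
    int width = (int)(sizeof(unsigned) * CHAR_BIT);
    unsigned mask;
    assert(bits > 0 && bits <= width);
    mask = bits == width ? ~0u : (1u << bits) - 1;
    return e & mask;
}
```

For the 3-bit fields in the example, and assuming a 32-bit int, n is 29 and the mask is 7: assigning 5 to x.a yields -3, while assigning 13 to x.b yields 5.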
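[Figure 9.4: the trees assigned to r: for x.a = e,
(RSH+I (LSH+I e (CNST+I 29)) (CNST+I 29)); for x.b = e,
(BAND+U e (CNST+U 7)).]
[Figure: trees built from ASGN+B, RIGHT, CALL+B, INDIR+B, ADDRG+P, and
ADDRL+P for a structure assignment to x from a call to f that uses the
temporary t1; the layout is not recoverable from the extraction.]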
9.6 Conditionals
The complex semantics of the conditional expression combines parts of
the semantics of comparisons, of the binary operators, of assignment,
and of casts. The COND operator is the only one that takes three operands:
The expression e ? l : r yields the tree shown in Figure 9.6, which is
built by condtree:
⟨enode.c functions⟩+=
Tree condtree(e, l, r) Tree e, l, r; {
    Symbol t1;
    Type ty, xty = l->type, yty = r->type;
    Tree p;

    ⟨condtree 200⟩
    p = tree(COND, ty, cond(e),
        tree(RIGHT, ty, root(l), root(r)));
    p->u.sym = t1;
    return p;
}
t1, carried in the u.sym field of a COND tree, is a temporary that holds
the result of the conditional expression at runtime. t1 is omitted if the
result is void.
The call cond(e) in the code above type-checks the first operand,
which must have a scalar type. There are six legal combinations for the
types of the second and third operands. The three easy cases are when both
have arithmetic types, both have compatible structure types, or both have
void type. All three of these cases are covered by the two if statements:
⟨condtree 200⟩=
if (isarith(xty) && isarith(yty))
    ty = binary(xty, yty);
else if (eqtype(xty, yty, 1))
    ty = unqual(xty);
[Figure 9.6: the tree for e ? l : r:
(COND e (RIGHT (ASGN (ADDRL+P t1) l) (ASGN (ADDRL+P t1) r))),
with t1 carried in the COND node.]
The first if statement handles the arithmetic types, and the second han-
dles structure types and void.
The remaining three cases involve pointers. If one of the operands is
a null pointer and the other is a pointer, the resulting type is the nonnull
pointer type:
⟨condtree 200⟩+=
else if (isptr(xty) && isnullptr(r))
ty = xty;
else if (isnullptr(l) && isptr(yty))
ty = yty;
If one of the operands is a void pointer and the other is a pointer to an
object or incomplete type, the result type is the void pointer:
⟨condtree 200⟩+=
else if (isptr(xty) && !isfunc(xty->type) && isvoidptr(yty)
||       isptr(yty) && !isfunc(yty->type) && isvoidptr(xty))
    ty = voidptype;
If both operands are pointers to qualified or unqualified versions of
compatible types, either can serve as the result type:
⟨condtree 200⟩+=
else if (⟨xty and yty point to compatible types 195⟩)
    ty = xty;
else {
    typeerror(COND, l, r);
    return consttree(0, inttype);
}
The type-checking code above ignores qualifiers on pointers to qualified
types. The resulting pointer type, however, must include all of the
qualifiers of the referents of both operand types; so if ty is a pointer,
it's rebuilt with the appropriate qualifiers:
⟨condtree 200⟩+=
if (isptr(ty)) {
    ty = unqual(unqual(ty)->type);
    if (isptr(xty) && isconst(unqual(xty)->type)
    ||  isptr(yty) && isconst(unqual(yty)->type))
        ty = qual(CONST, ty);
    if (isptr(xty) && isvolatile(unqual(xty)->type)
    ||  isptr(yty) && isvolatile(unqual(yty)->type))
        ty = qual(VOLATILE, ty);
    ty = ptr(ty);
}
⟨condtree 200⟩+=
if (e->op == CNST+D || e->op == CNST+F) {
    e = cast(e, doubletype);
    return retype(e->u.v.d != 0.0 ? l : r, ty);
}
if (generic(e->op) == CNST) {
    e = cast(e, unsignedtype);
    return retype(e->u.v.u ? l : r, ty);
}
Tree p;
needconst++;
p = expr1(tok);
needconst--;
return p;
}
(simp.c data)=
int needconst;
9. 7 • CONSTANT FOLDING 203
needconst++;
if (generic(p->op) == CNST && isint(p->type))
n = cast(p, inttype)->u.v.i;
else
error("integer expression must be constant\n");
needconst--;
return n;
}
if (optype(op) == 0)
    op += ttob(ty);
switch (op) {
⟨simplify cases 204⟩
}
return tree(op, ty, l, r);
}
simplify does three things that tree does not: it forms a type-specific
operator if it's passed a generic one, it evaluates operators when both
operands are constants, and it transforms some trees into simpler ones
that yield better code as it constructs the tree requested.
Each of the cases in the body of simplify's switch statement han-
dles one type-specific operator. If the operands are both constants, the
code builds and returns a CNST tree for the resulting value; otherwise, it
breaks to the end of the switch statement, which builds and returns the
appropriate tree. The code that checks for constant operands and builds
the resulting CNST tree is almost the same for every case; only the type
suffix, Value field name, operator, and return type vary in each case, so
the code is buried in a set of macros. The case for unsigned addition is
typical:
⟨simplify cases 204⟩=
case ADD+U:
    foldcnst(U,u,+,unsignedtype);
    commute(r,l);
    break;
This case implements the transformation
(ADD+U (CNST+U c1) (CNST+U c2)) ⇒ (CNST+U c1 + c2)
This use of foldcnst checks whether both operands are CNST+U trees,
and, if so, returns a new CNST+U tree whose u.v.u field is the sum of
l->u.v.u and r->u.v.u:
⟨simp.c macros⟩=
#define foldcnst(TYPE,VAR,OP,RTYPE) \
    if (l->op == CNST+TYPE && r->op == CNST+TYPE) {\
        p = tree(CNST+ttob(RTYPE), RTYPE, NULL, NULL);\
        p->u.v.VAR = l->u.v.VAR OP r->u.v.VAR;\
        return p; }
For commutative operators, commute ensures that if one of the operands
is a constant, it's the one given as commute's first argument. This trans-
formation reduces the case analyses that back ends must perform, allow-
ing back ends to count on constant operands of commutative operators
being in specific sites.
⟨simp.c macros⟩+=
#define commute(L,R) \
    if (generic(R->op) == CNST && generic(L->op) != CNST) {\
        Tree t = L; L = R; R = t; }
commute swaps its arguments, if necessary, to make L refer to the
constant operand. For example, the commute(r,l) in the case for ADD+U
above ensures that if one of the operands is a constant, r refers to that
operand. This transformation also makes some of simplify's transfor-
mations easier, as shown below.
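The combined effect of foldcnst and commute on an unsigned addition can be sketched on a toy tree type. This is an illustration under stated assumptions, not lcc's representation: the operator codes, struct tree, mk, and add_unsigned are all hypothetical, and nodes are allocated with malloc rather than from lcc's arenas.

```c
#include <assert.h>
#include <stdlib.h>

enum { CNST, ADD, VAR };        /* hypothetical operator codes */

typedef struct tree *Tree;
struct tree {
    int op;
    unsigned value;             /* meaningful only when op == CNST */
    Tree kids[2];
};

/* Allocate and initialize one tree node. */
static Tree mk(int op, unsigned value, Tree l, Tree r) {
    Tree p = malloc(sizeof *p);
    assert(p != NULL);
    p->op = op;
    p->value = value;
    p->kids[0] = l;
    p->kids[1] = r;
    return p;
}

/* Build an unsigned addition: fold two constants into one, and
   otherwise move a lone constant operand to the right, which is
   where commute(r,l) leaves it in the ADD+U case. */
Tree add_unsigned(Tree l, Tree r) {
    if (l->op == CNST && r->op == CNST)
        return mk(CNST, l->value + r->value, NULL, NULL);
    if (l->op == CNST && r->op != CNST) {
        Tree t = l; l = r; r = t;
    }
    return mk(ADD, 0, l, r);
}
```

Adding the constants 2 and 3 yields a single constant node holding 5; adding the constant 2 to a variable yields an ADD node whose right operand is the constant, whichever order the operands were supplied in.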
    int cond = x == 0 || y == 0
    || x < 0 && y < 0 && x >= min - y
    || x < 0 && y > 0
    || x > 0 && y < 0
    || x > 0 && y > 0 && x <= max - y;
    if (!cond && needconst) {
        warning("overflow in constant expression\n");
        cond = 1;
    }
    return cond;
}
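The range test in the condition above can be tried on its own. The sketch below applies the same case analysis to int operands with the limits from <limits.h>; add_fits and checked_add are hypothetical names, and the needconst handling is left out.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if x + y is representable as an int. Only operands
   with the same sign can overflow, and each comparison is arranged
   so that it cannot overflow itself. */
int add_fits(int x, int y) {
    if (x > 0 && y > 0)
        return x <= INT_MAX - y;
    if (x < 0 && y < 0)
        return x >= INT_MIN - y;
    return 1;       /* a zero operand or operands of opposite sign */
}

/* Adds only after the range test has succeeded. */
int checked_add(int x, int y) {
    assert(add_fits(x, y));
    return x + y;
}
```

The test is applied before the addition is performed, which is why the sum never has to be computed in a wider type.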
⟨simp.c macros⟩+=
#define cvtcnst(FTYPE,TTYPE,EXPR) \
    if (l->op == CNST+FTYPE) {\
        p = tree(CNST+ttob(TTYPE), TTYPE, NULL, NULL);\
        EXPR;\
        return p; }
The assignment in the CVC+I case must sign-extend the sign bit of the
character operand manually, because the compiler cannot count on chars
    xcvtcnst(I, chartype,l->u.v.i,SCHAR_MIN,SCHAR_MAX,
        p->u.v.sc = l->u.v.i); break;
case CVD+F:
    xcvtcnst(D,floattype,l->u.v.d,-FLT_MAX,FLT_MAX,
        p->u.v.f = l->u.v.d); break;
case CVD+I:
    xcvtcnst(D, inttype,l->u.v.d, INT_MIN,INT_MAX,
        p->u.v.i = l->u.v.d); break;
case CVI+S:
    xcvtcnst(I,shorttype,l->u.v.i, SHRT_MIN,SHRT_MAX,
        p->u.v.ss = l->u.v.i); break;
⟨simp.c macros⟩+=
#define xcvtcnst(FTYPE,TTYPE,VAR,MIN,MAX,EXPR) \
    if (l->op == CNST+FTYPE) {\
        if (needconst && (VAR < MIN || VAR > MAX))\
            warning("overflow in constant expression\n");\
        if (needconst || VAR >= MIN && VAR <= MAX) {\
            p = tree(CNST+ttob(TTYPE), TTYPE, NULL, NULL);\
            EXPR;\
            return p; } }
In addition to evaluating constant expressions, simplify transforms
the trees for some operators to help generate better code. Some of these
transformations remove identities and other simple cases. For example:
⟨simplify cases 204⟩+=
case BAND+U:
    foldcnst(U,u,&,unsignedtype);
    commute(r,l);
    identity(r,l,U,u,(~(unsigned)0));
    if (r->op == CNST+U && r->u.v.u == 0)
cfoldcnst(I,i,!=,inttype);
commute(r,l);
zerofield(NE,I,i);
break;
⟨simp.c macros⟩+=
#define zerofield(OP,TYPE,VAR) \
    if (l->op == FIELD\
    && r->op == CNST+TYPE && r->u.v.VAR == 0)\
        return eqtree(OP, bittree(BAND, l->kids[0],\
            consttree(\
                fieldmask(l->u.field)<<fieldright(l->u.field),\
                unsignedtype)), r);
This case implements the transformation
(NE+I (FIELD e) (CNST+I 0)) ⇒
    (NE+I (BAND+U e (CNST+U M)) (CNST+I 0))
where M is a mask of s bits shifted m bits left, and s is the size of the
bit field that lies m bits from the least significant end of the unsigned
or integer in which it appears. cfoldcnst is a version of foldcnst that's
specialized for the relational operators:
⟨simp.c macros⟩+=
#define cfoldcnst(TYPE,VAR,OP,RTYPE) \
    if (l->op == CNST+TYPE && r->op == CNST+TYPE) {\
        p = tree(CNST+ttob(RTYPE), RTYPE, NULL, NULL);\
        p->u.v.i = l->u.v.VAR OP r->u.v.VAR;\
        return p; }
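The mask M used by the zerofield transformation above can be computed directly. In this minimal sketch, field_mask and field_is_nonzero are hypothetical helpers, not lcc's fieldmask and fieldright macros; size plays the role of s and right the role of m.

```c
#include <assert.h>
#include <limits.h>

/* A mask of `size` one bits shifted `right` bits left. */
unsigned field_mask(int size, int right) {
    int width = (int)(sizeof(unsigned) * CHAR_BIT);
    assert(size > 0 && right >= 0 && size + right <= width);
    if (size == width)
        return ~0u;         /* avoid shifting by the full width */
    return ~(~0u << size) << right;
}

/* Tests a field against zero in place, without extracting it:
   mask the containing word and compare the result with zero. */
int field_is_nonzero(unsigned word, int size, int right) {
    return (word & field_mask(size, right)) != 0;
}
```

Comparing the masked word with zero gives the same answer as extracting the field and comparing that with zero, which is what lets the transformation skip the shifts that an extraction would need.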
Pointer addition is the most interesting and complex case in simplify
because it implements many transformations that yield better code.
Generating efficient addressing is the linchpin of generating efficient code,
so effort in this case pays off on all targets. The easy cases handle
constants and identities:
⟨simplify cases 204⟩+=
case ADD+P:
    foldaddp(l,r,I,i);
    foldaddp(l,r,U,u);
    foldaddp(r,l,I,i);
    foldaddp(r,l,U,u);
    commute(r,l);
    identity(r,retype(l,ty),I,i,0);
    identity(r,retype(l,ty),U,u,0);
    ⟨ADD+P transformations 210⟩
    break;
⟨simp.c macros⟩+=
#define foldaddp(L,R,RTYPE,VAR) \
    if (L->op == CNST+P && R->op == CNST+RTYPE) {\
        p = tree(CNST+P, ty, NULL, NULL);\
    NEW0(q, FUNC);
    q->name = stringd(genlabel(1));
    q->sclass = p->sclass;
    q->scope = p->scope;
    q->type = ty;
    q->temporary = p->temporary;
    q->generated = p->generated;
    q->addressed = p->addressed;
    q->computed = 1;
    q->defined = 1;
    q->ref = 1;
    ⟨announce q 211⟩
    e = tree(e->op, ty, NULL, NULL);
    e->u.sym = q;
    return e;
}
⟨symbol flags 50⟩+=
unsigned computed:1;
As for other identifiers, the front end must announce this new identi-
fier to the back end. Since its address is based on the address of another
identifier, represented by p, it's announced by calling the interface func-
tion address, and its computed flag identifies it as a symbol based on
another symbol. But there's a phase problem: p must be announced be-
fore q, but if p is a local or a parameter, it has not yet been passed to
the back end via local or function. addrtree thus calls address only
for globals and statics, and delays the call for locals and parameters:
⟨announce q 211⟩=
if (p->scope == GLOBAL
|| p->sclass == STATIC || p->sclass == EXTERN) {
    if (p->sclass == AUTO)
        q->sclass = STATIC;
    (*IR->address)(q, p, n);
} else {
    Code cp;
    addlocal(p);
    cp = code(Address);
    cp->u.addr.sym = q;
    cp->u.addr.base = p;
    cp->u.addr.offset = n;
}
The code-list entry Address is described in Section 10.1. lcc can't delay
the call to address for globals and statics because expressions like &a[5]
are constants and can appear in, for example, initializers.
The next transformation improves expressions like b[i].name, which
yields a tree of the form (ADD+P (ADD+P i n) (CNST+x c)), where i is
a tree for an integer expression and n and c are defined above. This tree
can be transformed into (ADD+P i (ADD+P n (CNST+x c))) and the inner
ADD+P tree will be collapsed to a simple address by the transformation
above to yield (ADD+P i n').
⟨ADD+P transformations 210⟩+=
if (l->op == ADD+P && isaddrop(l->kids[1]->op)
&& (r->op == CNST+I || r->op == CNST+U))
Further Reading
lcc's approach to type checking is similar to the one outlined in
Chapter 6 of Aho, Sethi, and Ullman (1986). simplify's transformations are
similar to those described by Hanson (1983). Similar transformations
can be done, often more thoroughly, by other kinds of optimizations
or during code generation, but usually at additional cost. simplify
implements only those that are likely to benefit almost all programs. A
more systematic approach is necessary to do a more thorough job; see
Exercise 9.8.
Exercises
9.1 Implement Type super(Type ty), which is shown in Figure 9.1.
Don't forget about enumerations and the types long, unsigned long,
and long double.
9.2 How can a double be converted to an unsigned using only double-to-
signed integer conversion? Use your solution to implement cast's
fragment (double-to-unsigned conversion).
9.3 In lcc, all enumeration types are represented by integers because
that's what most other C compilers do, but the standard permits
each enumeration type to be represented by any of the integral
types, as long as the type chosen can hold all the values. For
example, unsigned characters could be used for enumeration types with
enumeration values in the range 0-255. Explain how cast must
be changed to accommodate this scheme. Earlier versions of lcc
implemented this scheme.
9.4 Implement the omitted fragments for unary and postfix.
9.5 Dereferencing null pointers is a common programming error in C
programs. lcc's -n option catches these errors. With -n, lcc
generates code for
static char *_YYfile = "file";
static void _YYnull(int line) {
    char buf[200];
    sprintf(buf,"null pointer dereferenced @%s:%d\\n",
        _YYfile, line);
    write(2, buf, strlen(buf));
    abort();
}
at the end of each source file; file is the name of the source file.
It also arranges for its global YYnull to point to the symbol-table
int i;
struct list {
    char *name;
    struct entry table {
        int age;
        int count;
    } table[10];
} x[100];
and emitcode to generate and emit code; these functions traverse the
code list.
The code list is a doubly linked list of typed code structures:
⟨stmt.c typedefs⟩=
typedef struct code *Code;

⟨stmt.c functions⟩=
Code code(kind) int kind; {
    Code cp;
The bottom diagram in Figure 10.1 shows the code list after two entries
have been appended.
The values of the enumeration constants that identify code-list en-
tries are important. Those greater than Start generate executable code;
those less than Label do not generate code, but serve only to declare
information of interest to the back end. Thus, code can detect entries
that will generate unreachable code if it appends one with kind greater
than Start after an unconditional jump:
⟨check for unreachable code 218⟩=
if (kind > Start) {
    for (cp = codelist; cp->kind < Label; )
        cp = cp->prev;
    if (cp->kind == Jump || cp->kind == Switch)
        warning("unreachable code\n");
}
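The check depends only on the ordering of the kind values and on the prev links, so it can be exercised on a toy list. The sketch below is not lcc's stmt.c: the enumeration order is an assumption chosen to agree with the text (declaration-only kinds below Label, executable kinds above Start), and is_unreachable and append are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed ordering: kinds below Label only declare information,
   kinds above Start generate executable code. */
enum { Blockbeg, Blockend, Local, Address, Defpoint,
       Label, Start, Gen, Jump, Switch };

typedef struct code *Code;
struct code {
    int kind;
    Code prev, next;
};

static struct code codehead = { Start, NULL, NULL };
static Code codelist = &codehead;   /* last entry in the list */

/* Returns 1 if an entry of this kind appended now would follow an
   unconditional jump, skipping declaration-only entries. The walk
   always stops, at the latest at the initial Start entry. */
int is_unreachable(int kind) {
    Code cp;
    if (kind <= Start)
        return 0;
    for (cp = codelist; cp->kind < Label; )
        cp = cp->prev;
    return cp->kind == Jump || cp->kind == Switch;
}

/* Appends a new entry to the end of the doubly linked list. */
Code append(int kind) {
    Code cp = malloc(sizeof *cp);
    assert(cp != NULL);
    cp->kind = kind;
    cp->prev = codelist;
    cp->next = NULL;
    codelist->next = cp;
    codelist = cp;
    return cp;
}
```

After a Jump entry, a Gen entry would be flagged even if a Local entry intervenes, because Local is skipped on the backward walk; once a Label is appended, code is reachable again.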
[Figure: two diagrams of the code list. In the first, codelist points to
codehead, a Start entry whose prev and next fields are null; in the
second, codelist points to the last of the appended entries.]
FIGURE 10.1 The initial code list and after appending two entries.
10. 1 • REPRESENTING CODE 219
⟨stmt.c functions⟩+=
void addlocal(p) Symbol p; {
    if (!p->defined) {
        code(Local)->u.var = p;
        p->defined = 1;
        p->scope = level;
    }
}
addrtree illustrates the use of addlocal and the use of code to append
an Address entry. Address entries carry the data necessary for gencode
to make a call to the interface function address.
⟨Address 219⟩=
struct {
    Symbol sym;
    Symbol base;
    int offset;
} addr;
When gencode processes this entry, it uses the values of the sym, base,
and offset fields as the three arguments to address.
Blockbeg entries store the data necessary to compile a compound
statement:
⟨Blockbeg 219⟩=
struct {
    int level;
    Symbol *locals;
    Table identifiers, types;
    Env x;
} block;
level is the value of level associated with the block, and locals is
a null-terminated array of symbol-table pointers for the locals declared
in the block. x is the back end's Env value for this block. identifiers and
types record the identifiers and types tables when the block was
compiled; they're used in code omitted from this book to generate debugger
symbol-table information when the option -g is specified. Blockend
entries just point to their matching Blockbeg:
220 CHAPTER 10 •STATEMENTS
cp->u.point.src = p ? *p : src;
cp->u.point.point = npoints;
10.3 •RECOGNIZING STATEMENTS 221
Usually, definept is called with a null pointer, but the loop and switch
statements generate tests and assignments at the ends of their
statements, so the execution points are in a different order in the generated
code than they appear in the source code. For these, the relevant
coordinate is saved when the expression is parsed, and is passed to definept
when the code for the expression is generated; the calls to definept in
forstmt are examples.
statement takes three arguments: loop is the label number for the
innermost for, while, or do-while loop, swp is a pointer to the swtch structure
that carries all of the data pertaining to the innermost switch statement
(see Section 10.7), and lev tells how deeply statements are currently
nested. If the current statement is not nested in any loop, loop is zero;
if it's not nested in any switch statement, swp is null. loop is needed to
generate code for break and continue statements, swp is needed to
generate code for switch statements, and lev is needed only for the warning
shown above at the beginning of statement. The code for each kind of
statement passes these values along to nested calls to statement,
modifying them as appropriate.
Labels, like those used for loop, are local labels, and they're
generated by genlabel(n), which returns the first of n labels. findlabel(n)
returns the symbol-table entry for label n.
For every reference to an identifier, idtree increments that identifier's
ref field by refinc. The resulting ref value is approximately proportional
to the number of times the identifier is referenced. statement and its
descendants change refinc to weight each reference to an identifier in a
way that is appropriate for the statement in which it appears. For example,
refinc is divided by 2 for the arms of an if statement, and it's multiplied
by 10 for the body of a loop. The value of the ref field helps identify
those locals and parameters that might profitably be assigned to registers,
and locals are announced to the back end in decreasing order of their ref
values.
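The weighting scheme can be illustrated with a few lines of C. This is a sketch under assumptions, not lcc's code: the Symbol structure here has only the fields the example needs, and use, enter_loop_body, enter_if_arm, and leave are hypothetical helpers that make the save and restore of refinc explicit.

```c
#include <assert.h>

/* Current weight of one reference; 1.0 at the outermost level. */
float refinc = 1.0f;

typedef struct symbol {
    const char *name;
    float ref;              /* weighted reference count */
} Symbol;

/* Records one reference to p at the current weight. */
void use(Symbol *p) { p->ref += refinc; }

/* Each enter function scales refinc and returns the old value so
   that the caller can restore it when the construct ends. */
float enter_loop_body(void) {
    float saved = refinc;
    refinc *= 10.0f;
    return saved;
}

float enter_if_arm(void) {
    float saved = refinc;
    refinc /= 2.0f;
    return saved;
}

void leave(float saved) {
    assert(saved > 0.0f);
    refinc = saved;
}
```

A reference at the outermost level adds 1, a reference in a loop body adds 10, and a reference in an if arm nested in that loop adds 5, so an identifier used in all three places ends with a ref value of 16.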
The default case handles expressions that are statements:
⟨expression statement 222⟩=
definept(NULL);
if (kind[t] != ID) {
    error("unrecognized statement\n");
    t = gettok();
} else {
    Tree e = expr0(0);
    listnodes(e, 0, 0);
    if (nodecount == 0 || nodecount > 200)
        walk(NULL, 0, 0);
    deallocate(STMT);
}
listnodes and walk are the two functions that generate dags from trees.
Chapter 12 explains their implementations, but their usage must be
explained now in order to understand how the front end implements the
semantics of statements.
listnodes takes a tree as its first argument, generates the dag for that
tree as described in Chapter 5, and appends the dag to a growing forest
of dags that it maintains. Thus, the call to listnodes above generates the
dag for the tree returned by expr0, and appends that dag to the forest.
For the input
c = a + b;
a= a/2;
d = a + b;
the fragment ⟨expression statement⟩ is executed three times, once for
each statement, and thus listnodes is called three times. The first call
appends the dag for c = a + b to the initially empty forest, and the
next two calls grow that forest by appending the dags for the second
and third assignments. As detailed in Section 12.1, listnodes reuses
common subexpressions when possible; for example, in the assignment
d = a + b, it reuses the dags for the lvalue of a and the rvalue of b
formed for the first assignment. It can't reuse the rvalue of a because
the second assignment changes a.
The second and third arguments are label numbers, and their purpose
is explained in the next section; the zeros shown in the call to listnodes
above specify no labels. listnodes also accepts the null tree for which
it simply returns.
listnodes keeps the forest to itself until walk is called, which accepts
the same arguments as listnodes. walk takes two steps: First, it passes
its arguments to listnodes, so a call to walk has the same effect as a call
to listnodes. Second, and most important, walk allocates a Gen code-list
entry, stores the forest in that entry, appends the entry to the code list,
and clears the forest. Once a forest is added to the code list, its dags are
no longer available for reuse by listnodes.
The call walk(NULL, 0, 0) effectively executes just the second step,
and it has the effect of adding the current forest to the code list, if there
is a nonempty forest. This call is made whenever the current forest must
be appended to the code list either because some other executable
code-list entry must be appended or because two or more separate flows of
control merge. In the code above, this call is made when nodecount is
zero or when it exceeds 200. nodecount is the number of nodes in the
forest that are available for reuse. walk is called when the forest has
no nodes that can be reused or when the forest is getting large. The
10.4 If Statements
The generated code for an if statement has the form
if expression == 0 goto L
statement1
goto L + 1
L: statement2
L + 1:
⟨stmt.c functions⟩+=
static void ifstmt(lab, loop, swp, lev)
int lab, loop, lev; Swtch swp; {
    t = gettok();
    expect('(');
    definept(NULL);
    walk(conditional(')'), 0, lab);
    refinc /= 2.0;
    statement(loop, swp, lev);
    if (t == ELSE) {
        branch(lab + 1);
        t = gettok();
        definelab(lab);
        statement(loop, swp, lev);
        if (findlabel(lab + 1)->ref)
            definelab(lab + 1);
    } else
        definelab(lab);
}
The first argument to ifstmt is L; genlabel(2) generates two labels for
use in the if statement. ifstmt's other three arguments echo statement's
arguments. conditional parses an expression by calling expr, and
ensures that the resulting tree is a conditional, which is an expression
whose value is used only to alter flow of control. The root of the tree
for a conditional has one of the comparison operators, AND, OR, NOT, or
a constant. conditional's argument is the token that should follow the
expression in the context in which conditional is called.
(stmt.c functions)+=
static Tree conditional(tok) int tok; {
Tree p = expr(tok);
The second and third arguments to listnodes and walk are labels
that specify true and false targets. walk(e, tlab, flab) passes its arguments
to listnodes, which generates a dag from e and adds it to the
forest, and appends a Gen entry carrying the forest to the code list, as
explained in the previous section. When e is a tree for a conditional expression,
either tlab or flab is nonzero. If tlab is nonzero, listnodes
generates a dag that transfers control to tlab if the result of e is nonzero;
likewise, listnodes generates a dag that jumps to flab if e evaluates to
zero. listnodes and walk can be called with a nonzero value for only
one of tlab or flab; control always "falls through" for the other case.
For the if statement, walk is called with a nonzero flab corresponding
to L in the generated code shown above. definelab and branch generate
code-list items for label definitions and jumps. L + 1 is defined only if
it's needed; a label's ref field is incremented each time it's used as the
target of a branch. For example, L + 1 isn't needed if the branch to it is
eliminated, which occurs in code like
    if (...)
        return;
    else
        ...
The return statement acts like an unconditional jump, so the call to
branch(lab + 1) doesn't emit the branch.
Recall that refinc is the amount added to each reference to an identifier
in idtree. Estimating that each arm of an if statement is executed
approximately the same number of times, refinc is halved before they
are parsed. The result is that a reference to an identifier in one of the
arms counts half as much as a reference before or after the if statement.
ifstmt doesn't have to restore refinc because statement does.
getchr advances the input to just before the initial character of the next
token and returns that character. It is used to 'peek' at the next character
to check for a colon. Since an identifier can be both a label and a variable,
a separate table, stmtlabs, holds source-language labels:
(stmt.c exported data)=
extern Table stmtlabs;
Like other tables, stmtlabs is managed by lookup and install. It maps
source-language labels to internal label numbers, which are stored in the
symbols' u.l.label fields.
(stmt.c functions)+=
static void stmtlabel() {
    Symbol p = lookup(token, stmtlabs);
    (install token in stmtlabs, if necessary 226)
    if (p->defined)
        error("redefinition of label '%s' previously defined at %w\n", p->name, &p->src);
    p->defined = 1;
    definelab(p->u.l.label);
    t = gettok();
    expect(':');
}
p->scope = LABELS;
p->u.l.label = genlabel(1);
p->src = src;
}
A label's ref field counts the number of references to the label and is
initialized to zero by install. Each reference to the label increments the
ref field:
(goto statement 227)=
    walk(NULL, 0, 0);
    definept(NULL);
    t = gettok();
    if (t == ID) {
        Symbol p = lookup(token, stmtlabs);
        (install token in stmtlabs, if necessary 226)
        use(p, src);
        branch(p->u.l.label);
        t = gettok();
    } else
        error("missing label in goto\n");
branch(n) builds a JUMPV dag for a branch to the label n, allocates a Jump
code-list entry to hold that dag, and appends the Jump entry to the code
list. It also increments n's ref field.
Undefined labels - those referenced in goto statements but never defined -
are found and announced when funcdefn calls checklab at the
end of a function definition.
10.6 Loops
The code for all three kinds of loops has a similar structure involving
three labels: L is the top of the loop, L + 1 labels the test portion of the
loop, and L + 2 labels the loop exit. For example, the generated code for
a while loop is
        goto L + 1
L:      statement
L + 1:  if expression != 0 goto L
L + 2:
This layout is better than
L:
L + 1:  if expression == 0 goto L + 2
        statement
        goto L
L + 2:
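One way to compare the two layouts is to lower a small loop by hand and count the jump instructions that execute. The sketch below does this in C with goto; the function names and the branches counter are illustrative and are not part of lcc. For n iterations, the bottom-test layout executes n + 2 jumps (one entry jump plus n + 1 tests), while the top-test layout executes 2n + 1 (n + 1 tests plus n jumps back to the top).

```c
#include <assert.h>

/* Bottom-test layout:
 *           goto L + 1
 *   L:      statement
 *   L + 1:  if expression != 0 goto L
 *   L + 2:
 * *branches counts every jump instruction executed, taken or not. */
int sum_bottom_test(const int *x, int n, int *branches) {
    int i = 0, sum = 0;
    assert(n >= 0);
    *branches = 1;                /* the initial goto L + 1 */
    goto test;
body:
    sum += x[i++];                /* statement */
test:
    ++*branches;                  /* one conditional branch per test */
    if (i < n)
        goto body;
    return sum;
}

/* Top-test layout:
 *   L:      if expression == 0 goto L + 2
 *           statement
 *           goto L
 *   L + 2: */
int sum_top_test(const int *x, int n, int *branches) {
    int i = 0, sum = 0;
    assert(n >= 0);
    *branches = 0;
top:
    ++*branches;                  /* conditional branch */
    if (!(i < n))
        goto done;
    sum += x[i++];                /* statement */
    ++*branches;                  /* unconditional goto L */
    goto top;
done:
    return sum;
}
```

Both functions compute the same sum; only the number of executed jumps differs, and the gap grows with the iteration count.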
t = gettok();
expect('(');
definept(NULL);
(forstmt 229)
}
pt3 holds the source coordinate for the increment expression for a later
call to definept.
Multiplying refinc by 10 estimates that loop bodies are executed 10
times more often than statements outside of loops, and weights references
to identifiers used in loops accordingly.
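The effect of these adjustments on a single reference can be sketched as a small calculation. The function below is hypothetical and is not part of lcc; it assumes only the two rules stated in the text, namely that refinc is multiplied by 10 on entering a loop body and halved on entering an arm of an if statement, starting from 1.0 outside any statement.

```c
#include <assert.h>

/* Hypothetical model: the weight added by one identifier reference
   nested inside loop_depth loops and if_depth if-statement arms. */
double ref_weight(int loop_depth, int if_depth) {
    double refinc = 1.0;
    assert(loop_depth >= 0 && if_depth >= 0);
    while (loop_depth-- > 0)
        refinc *= 10.0;           /* loop bodies: assumed 10 times more frequent */
    while (if_depth-- > 0)
        refinc /= 2.0;            /* each arm of an if: assumed equally likely */
    return refinc;
}
```

Under this model a reference inside an if arm inside a loop counts 5 times as much as a reference at the top level of the function.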
Many for loops look like the one in the following code:
    sum = 0;
    for (i = 0; i < 10; i++)
        sum += x[i];
The loop bodies in these kinds of loops are always executed at least once
and the leading goto L + 3 could be omitted, which is accomplished by
(forstmt 229)+=
    if (e2) {
        once = foldcond(e1, e2);
        if (!once)
            branch(lab + 3);
    }
foldcond inspects the trees for the initialization and for the test to determine
if the loop body will be executed at least once; see Exercise 10.3.
e1 is passed to foldcond, which is why it was parsed with texpr above.
The rest of forstmt compiles the loop body and lays down the labels
and expressions as described above.
(forstmt 229)+=
    definelab(lab);
    statement(lab, swp, lev);
    definelab(lab + 1);
    definept(&pt3);
    if (e3)
        walk(e3, 0, 0);
    if (e2) {
        if (!once)
            definelab(lab + 3);
        definept(&pt2);
        walk(e2, lab, 0);
    } else {
        definept(&pt2);
        branch(lab);
    }
    if (findlabel(lab + 2)->ref)
        definelab(lab + 2);
Symbol-table entries for generated labels are installed in the labels table
by findlabel. Like other labels, the ref field of a generated label is
nonzero only if the label is the target of a jump.
(stmt.c types)=
struct swtch {
    Symbol sym;
    int lab;
    Symbol deflab;
    int ncases;
    int size;
    int *values;
    Symbol *labels;
};
sym holds the temporary, t1, lab holds the value of L, and deflab points
to the symbol-table entry for the default label, if there is one. values and
labels point to arrays that store the value-label pairs. These arrays have
size elements, ncases of which are occupied, and these ncases are kept
in ascending order of values. A pointer to the swtch structure for the
current switch statement - the switch handle - is passed to statement
and its descendants.
Case and default labels are handled much like break and continue
statements: They refer to the innermost, or current, switch statement,
and case and default labels that appear outside of switch statements,
which is when the switch handle is null, are erroneous. The code for
the break statement determines whether it is associated with a loop or a
switch by examining both the loop handle and the switch handle:
(break statement 232)=
    walk(NULL, 0, 0);
    definept(NULL);
    if (swp && swp->lab > loop)
        branch(swp->lab + 1);
    else if (loop)
        branch(loop + 2);
    else
        error("illegal break statement\n");
    t = gettok();
Since the values of labels increase as they are generated, a break refers
to a switch statement if there's a switch handle and its L is greater than
the loop handle.
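The rule can be restated as a small function over label numbers. This is a sketch, not lcc's code: break_target is a hypothetical name, and it models a missing switch handle or loop handle as 0, assuming that real label numbers are positive and increase as they are generated.

```c
#include <assert.h>

/* Hypothetical helper: returns the label a break should jump to,
   or 0 if the break is illegal. loop is the loop handle (0 if none);
   sw_lab is the L of the enclosing switch (0 if none). */
int break_target(int loop, int sw_lab) {
    assert(loop >= 0 && sw_lab >= 0);
    if (sw_lab != 0 && sw_lab > loop)
        return sw_lab + 1;        /* switch is innermost: its exit label is L + 1 */
    else if (loop != 0)
        return loop + 2;          /* loop is innermost: its exit label is L + 2 */
    return 0;                     /* neither a loop nor a switch encloses the break */
}
```

For a switch nested in a loop, the switch's labels were generated later and are therefore larger, so the break leaves the switch; for a loop nested in a switch, the comparison goes the other way and the break leaves the loop.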
Parsing switch statements involves parsing and type-checking the ex-
pression, generating a temporary, appending a Switch placeholder on the
code list, initializing a new switch handle and passing it to statement,
and generating the closing labels and the selection code.
(switch statement 232)=
    swstmt(loop, genlabel(2), lev + 1);
(stmt.c macros)=
#define SWSIZE 512
10.7 •SWITCH STATEMENTS 233
(stmt.c functions)+=
static void swstmt(loop, lab, lev) int loop, lab, lev; {
    Tree e;
    struct swtch sw;
    Code head, tail;
    t = gettok();
    expect('(');
    definept(NULL);
    e = expr(')');
    (type-check e 233)
    (generate a temporary to hold e, if necessary 233)
    head = code(Switch);
    sw.lab = lab;
    sw.deflab = NULL;
    sw.ncases = 0;
    sw.size = SWSIZE;
    sw.values = newarray(SWSIZE, sizeof *sw.values, FUNC);
    sw.labels = newarray(SWSIZE, sizeof *sw.labels, FUNC);
    refinc /= 10.0;
    statement(loop, &sw, lev);
    (define L, if necessary, and L + 1 236)
    (generate the selection code 236)
}
The placeholder Switch entry in the code list will be replaced by one or
more Switch entries when the selection code is generated. The switch
expression must have integral type, and it's promoted:
(type-check e 233)=
    if (!isint(e->type)) {
        error("illegal type '%t' in switch expression\n",
            e->type);
        e = retype(e, inttype);
    }
    e = cast(e, promote(e->type));
The temporary also has type e->type, but the temporary can be avoided
in some cases. If the switch expression is simply an identifier, and it's
the right type and is not volatile, then it can be used instead. Otherwise,
the expression is assigned to a temporary:
(generate a temporary to hold e, if necessary 233)=
    if (generic(e->op) == INDIR && isaddrop(e->kids[0]->op)
    && e->kids[0]->u.sym->type == e->type
    && !isvolatile(e->kids[0]->u.sym->type)) {
        sw.sym = e->kids[0]->u.sym;
        walk(NULL, 0, 0);
    } else {
        sw.sym = genident(REGISTER, e->type, level);
        addlocal(sw.sym);
        walk(asgn(sw.sym, e), 0, 0);
    }
Once the switch handle is initialized, case and default labels simply
add data to the handle. For example, a default label fills in the deflab
field, unless it's already filled in:
(default label 234)=
    if (swp == NULL)
        error("illegal default label\n");
    else if (swp->deflab)
        error("extra default label\n");
    else {
        swp->deflab = findlabel(swp->lab);
        definelab(swp->deflab->u.l.label);
    }
    t = gettok();
    expect(':');
    statement(loop, swp, lev);
Case labels are similar: The label value is converted to the promoted
type of the switch expression, and a label associated with that value is
generated and defined:
(case label 234)=
    {
        int lab = genlabel(1);
        if (swp == NULL)
            error("illegal case label\n");
        definelab(lab);
        while (t == CASE) {
            static char stop[] = { IF, ID, 0 };
            Tree p;
            t = gettok();
            p = constexpr(0);
            if (generic(p->op) == CNST && isint(p->type)) {
                if (swp) {
                    needconst++;
                    p = cast(p, swp->sym->type);
                    needconst--;
                    caselabel(swp, p->u.v.i, lab);
                }
            } else
The for loop inserts the new label and value into the right place in the
values and labels arrays so that these arrays are sorted in ascending
order of values, which helps both to detect duplicate case values and to
generate good selection code. If necessary, these arrays are doubled in
size to accommodate the new value-label pair.
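The insertion step can be illustrated independently of lcc. The sketch below is not the caselabel function; struct cases and case_insert are hypothetical names, labels are plain integers instead of Symbol pointers, and the arrays grow with realloc where lcc allocates from an arena. It also rejects a duplicate before shifting anything, which is a choice made for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct cases {                    /* hypothetical stand-in for part of struct swtch */
    int ncases, size;
    int *values;                  /* case values in ascending order */
    int *labels;                  /* label numbers paired with values */
};

/* Inserts (val, lab), keeping values in ascending order.
   Returns 1 on success, 0 if val is a duplicate, -1 if allocation fails. */
int case_insert(struct cases *c, int val, int lab) {
    int i, k;
    assert(c != NULL);
    if (c->ncases >= c->size) {   /* full: double both arrays */
        int newsize = c->size > 0 ? 2 * c->size : 4;
        int *nv = realloc(c->values, newsize * sizeof *nv);
        int *nl;
        if (nv == NULL)
            return -1;
        c->values = nv;
        nl = realloc(c->labels, newsize * sizeof *nl);
        if (nl == NULL)
            return -1;
        c->labels = nl;
        c->size = newsize;
    }
    k = c->ncases;                /* find the insertion point from the right */
    while (k > 0 && c->values[k-1] > val)
        k--;
    if (k > 0 && c->values[k-1] == val)
        return 0;                 /* duplicate case value */
    for (i = c->ncases; i > k; i--) {   /* shift larger entries up by one */
        c->values[i] = c->values[i-1];
        c->labels[i] = c->labels[i-1];
    }
    c->values[k] = val;
    c->labels[k] = lab;
    c->ncases++;
    return 1;
}
```

Because the array is sorted, a duplicate can only sit immediately to the left of the insertion point, so one comparison detects it.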
After the return from statement to swstmt, a default label is defined,
if there was no explicit default, and the exit label, L + 1, is defined, if it
was referenced:
(define L, if necessary, and L + 1 236)=
if (sw.deflab == NULL) {
sw.deflab = findlabel(lab);
definelab(lab);
if (sw.ncases == 0)
warning("switch statement with no cases\n");
}
if (findlabel(lab + 1)->ref)
definelab(lab + 1);
The default label is defined even if it isn't referenced, because it will
probably be referenced by the selection code.
The selection code can't be generated until all the cases have been
examined. Compiling statement appends entries to the code list, but
the entries for the selection code need to appear just after those for
expression and before those for statement. The selection code could
appear after statement if branches were inserted so the selection code
was executed before statement. But there's a solution to this problem
that's easier and generates better code: rearrange the code list.
The top diagram in Figure 10.2 shows the code list after the exit label
has been defined. The solid circle represents the entry for expression,
the open circle is the Switch placeholder, and the open squares are the
entries for statement, including the definitions for the case and default
labels and the jumps generated by break statements. head points to the
placeholder and codelist to the last statement entry.
The first step in generating the selection code is to make the solid
circle the end of the code list:
(generate the selection code 236)=
tail = codelist;
codelist = head->prev;
codelist->next = head->prev = NULL;
The second diagram in Figure 10.2 shows the outcome of these statements.
head and tail point to the entries for the placeholder and for
statement, and codelist points to the entry for expression. As the selection
code is generated, its entries are appended in the right place:
(generate the selection code 236)+=
    if (sw.ncases > 0)
        swgen(&sw);
    branch(lab);
[Figure 10.2: four diagrams of the code list with the pointers codelist, head, and tail. Legend: solid circle, switch expression; open circle, switch placeholder; open squares, statement entries; open triangles, selection code entries; arrows, prev and next pointers.]
Figure 10.2's third diagram shows the code list after entries for the selection
code, which are shown in open triangles, have been added. The
last step is to append the entire list held by head and tail to the code
list and set codelist back to tail:
(generate the selection code 236)+=
    head->next->prev = codelist;
    codelist->next = head->next;
    codelist = tail;
The last diagram in Figure 10.2 shows the result, which omits the place-
holder.
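The three pointer manipulations can be exercised on a toy doubly linked list. The sketch below is a simplification and not lcc's code: struct entry, append, and splice_selection are hypothetical names, each entry carries only a one-character kind, and the selection code is supplied as a ready-made array instead of being generated by swgen.

```c
#include <assert.h>
#include <stddef.h>

struct entry {                    /* hypothetical, simplified code-list entry */
    char kind;                    /* 'e' expression, 'p' placeholder,
                                     's' statement, 'c' selection code */
    struct entry *prev, *next;
};

struct entry *codelist;           /* last entry on the list, as in the text */

void append(struct entry *e) {
    e->prev = codelist;
    e->next = NULL;
    if (codelist)
        codelist->next = e;
    codelist = e;
}

/* Places sel[0..n-1] between the entry before head and the entries
   after head; head itself (the placeholder) drops out of the list. */
void splice_selection(struct entry *head, struct entry sel[], int n) {
    struct entry *tail = codelist;
    int i;
    assert(head != NULL && head->prev != NULL && head->next != NULL);
    codelist = head->prev;                /* step 1: cut before the placeholder */
    codelist->next = head->prev = NULL;
    for (i = 0; i < n; i++)               /* step 2: append the selection code */
        append(&sel[i]);
    head->next->prev = codelist;          /* step 3: reattach the statements */
    codelist->next = head->next;
    codelist = tail;
}
```

Starting from the list expression, placeholder, statement, statement, a call with two selection entries leaves expression, selection, selection, statement, statement, with codelist again pointing at the last statement entry.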
The fastest selection code when there are more than three cases is a
branch table: The value of expression is used as an index to this table,
and the ith entry holds Li, or L if i is not a case label. For this organization,
selection takes constant time. This table takes space proportional
to u - l + 1 where l and u are the minimum and maximum case values.
For n case values, the density of the table - the fraction occupied by
nondefault destination labels - is n/(u - l + 1). If the density is too low,
this organization wastes space. Worse, there are legal switch statements
for which it is impractical:
    switch (i) {
    case INT_MIN: ...; break;
    case INT_MAX: ...; break;
    }
For example,
    d(0,9) = (9 - 0 + 1)/(39 - 21 + 1) = 10/19 = 0.53
    d(0,5) = (5 - 0 + 1)/(29 - 21 + 1) =  6/9  = 0.67
    d(6,9) = (9 - 6 + 1)/(39 - 36 + 1) =  4/4  = 1.0
and v[6..9]), and three tables if density is 0.75 (v[0..2], v[3..5], and
v[6..9]). If density exceeds 1.0, there are n one-element tables, which
corresponds to a binary search.
A simple greedy algorithm implements partitioning: If the current table
is v[i..j] and d(i, j + 1) >= density, extend the table to v[i..j + 1].
Whenever a table is extended, it's merged with its predecessor if the density
of the combined table is greater than density. swgen does both of
these steps at once by treating the single element v[j + 1] as the table
v[j + 1..j + 1] and merging it with its predecessor, if possible. In the
code below, buckets[k] is the index in v of the first value in the kth table,
i.e., table k is v[buckets[k]..buckets[k + 1] - 1]. For n case values,
there can be up to n tables, so buckets can have n + 1 elements.
(stmt.c macros)+=
#define den(i,j) ((j-buckets[i]+1.0)/(v[j]-v[buckets[i]]+1))
(stmt.c functions)+=
static void swgen(swp) Swtch swp; {
    int *buckets, k, n, *v = swp->values;
    buckets = newarray(swp->ncases + 1,
        sizeof *buckets, FUNC);
    for (n = k = 0; k < swp->ncases; k++, n++) {
        buckets[n] = k;
        while (n > 0 && den(n-1, k) >= density)
            n--;
    }
    buckets[n] = swp->ncases;
    swcode(swp, buckets, 0, n - 1);
}
When swgen calls swcode, there are n tables, buckets[0..n-1] holds the
indices into v for the first value in each table, and buckets[n] is equal
to swp->ncases, the number of case values, which is the index in v at
which a fictitious n+1st table would start.
The display below illustrates how swgen partitions the example from
above when density is 0.66. The first iteration of the for loop ends with:
    v[i]  |21  22  23  27  28  29  36  37  38  39
           --
The vertical bars appear to the left of the first element of a table and
thus represent the values of buckets. The rightmost bar is the value of
buckets[n]. The value associated with k is underlined. So, at the end
of the first iteration, k is zero and refers to the value 21, and the one
table is v[0..0]. The next two iterations set buckets[1] to 1 and 2, and
in each case combine the single-element tables v[1..1] and v[2..2] with
their predecessors v[0..0] and v[0..1]. At the end of the third iteration,
the state is
    v[i]  |21  22  23  27  28  29  36  37  38  39
                   --
and the only table is v[0..2]. The fourth iteration cannot merge v[3..3],
which holds just 27, with v[0..2] because the density d(0, 3) = 4/7 =
0.57 is too low, so the state becomes
    v[i]  |21  22  23 |27  28  29  36  37  38  39
                       --
Next, v[4..4] (28) can be merged with v[3..3], but v[3..4] cannot be
merged with v[0..2] because d(0, 4) = 5/8 = 0.63.
The iteration that examines 29 is the interesting one. Just before the
while loop, n is 2 and the state is
    v[i]  |21  22  23 |27  28 |29  36  37  38  39
                               --
The while loop merges v[3..4] with v[5..5] and decrements n to 1; since
d(0, 5) = 6/9 = 0.67, it also merges v[0..2] with the just-formed v[3..5]
and decrements n to 0. The state after the while loop is
    v[i]  |21  22  23  27  28  29  36  37  38  39
                               --
This process ends with two tables; the state just before calling swcode is
    i       0   1   2   3   4   5   6   7   8   9
    v[i]  |21  22  23  27  28  29 |36  37  38  39 |
and n is 2 and buckets holds the indices 0, 6, and 10.
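The walkthrough can be checked with a standalone version of the partitioning loop. The sketch below follows the loop in swgen but is not lcc's code: partition is a hypothetical name, density is passed as a parameter, the caller supplies the buckets array, and the denominator is computed in double so that widely separated values such as INT_MIN and INT_MAX do not overflow int arithmetic.

```c
#include <assert.h>

/* Density of the table that starts at v[buckets[i]] and ends at v[j]:
   number of case values divided by the span of values covered. */
static double den(const int *v, const int *buckets, int i, int j) {
    return (j - buckets[i] + 1.0) / ((double)v[j] - v[buckets[i]] + 1);
}

/* Partitions the ascending values v[0..ncases-1] into tables.
   buckets must have room for ncases + 1 ints. Returns the number of
   tables n; buckets[0..n-1] index the first value of each table and
   buckets[n] is set to ncases. */
int partition(const int *v, int ncases, double density, int *buckets) {
    int k, n;
    assert(ncases >= 0);
    for (n = k = 0; k < ncases; k++, n++) {
        buckets[n] = k;           /* v[k] starts as a one-element table */
        while (n > 0 && den(v, buckets, n - 1, k) >= density)
            n--;                  /* merge with predecessor while dense enough */
    }
    buckets[n] = ncases;          /* sentinel */
    return n;
}
```

With the ten values used above and a density of 0.66, this returns 2 and leaves 0, 6, and 10 in buckets; with a density of 0.75 it returns 3 with tables starting at indices 0, 3, and 6.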
The last two steps arrange the tables described by buckets into a tree
and traverse this tree generating the selection code for each table. swcode
uses a divide-and-conquer algorithm to do both steps at the same time.
swgen calls swcode with the switch handle, buckets, the lower and upper
bounds of buckets, and the number of tables. buckets also has a sentinel
after its last element, which simplifies accessing the last case value in the
last table.
swcode generates code for the ub-lb+1 tables given by b[lb..ub]. It
picks the middle table as the root of the search tree, generates code for
it, and calls itself recursively for the tables on either side of the root
table.
(stmt.c functions)+=
static void swcode(swp, b, lb, ub)
Swtch swp; int b[]; int lb, ub; {
    int hilab, lolab, l, u, k = (lb + ub)/2;
    int *v = swp->values;
    (swcode 241)
}
When there's only one table, switch expressions whose value is not within
the range covered by the table cause control to be transferred to the
default label. For a binary search of tables, control needs to flow to the
appropriate subtable when the switch expression is out of range.
(swcode 241)=
    if (k > lb && k < ub) {
        lolab = genlabel(1);
        hilab = genlabel(1);
    } else if (k > lb) {
        lolab = genlabel(1);
        hilab = swp->deflab->u.l.label;
    } else if (k < ub) {
        lolab = swp->deflab->u.l.label;
        hilab = genlabel(1);
    } else
        lolab = hilab = swp->deflab->u.l.label;
lolab and hilab are where control should be transferred to if the switch
expression is less than the root's smallest value or greater than the root's
largest value. If the search tree has both left and right subtables, lolab
and hilab will label their code sequences. The default label is used for
hilab when there's no right subtable and for lolab when there's no left
subtable. If the root is the only table, the default label is used for both
lolab and hilab.
Finally, the code for the root table is generated:
(swcode 241)+=
    l = b[k];
    u = b[k+1] - 1;
    if (u - l + 1 <= 3)
        (generate a linear search)
    else {
        (generate an indirect jump and a branch table 242)
    }
and swcode is called recursively to generate the left and right subtables.
(swcode 241)+=
    if (k > lb) {
        definelab(lolab);
        swcode(swp, b, lb, k - 1);
    }
    if (k < ub) {
        definelab(hilab);
        swcode(swp, b, k + 1, ub);
    }
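The shape of the search that swcode lays down can be mimicked at run time. The sketch below is a model of the generated control flow and not lcc's code: select_case is a hypothetical function that returns the index of the matching case value, or -1 where the generated code would jump to the default label. Within the root table it always scans linearly, whereas the generated code uses a branch table when the table has more than three entries.

```c
#include <assert.h>

/* v holds the case values in ascending order. b[lb..ub] hold the index
   of the first value in each table, and b[ub+1] is the sentinel.
   Returns the index in v of x, or -1 for "default". */
int select_case(const int *v, const int *b, int lb, int ub, int x) {
    int k, l, u, i;
    assert(lb <= ub);
    k = (lb + ub) / 2;            /* the middle table is the root */
    l = b[k];
    u = b[k + 1] - 1;             /* the root table is v[l..u] */
    if (x < v[l])                 /* "goto lolab": left subtables or default */
        return k > lb ? select_case(v, b, lb, k - 1, x) : -1;
    if (x > v[u])                 /* "goto hilab": right subtables or default */
        return k < ub ? select_case(v, b, k + 1, ub, x) : -1;
    for (i = l; i <= u; i++)      /* x is in the root's range: search it */
        if (v[i] == x)
            return i;
    return -1;                    /* a hole inside the table's range */
}
```

The two conditional returns correspond to the four cases that choose lolab and hilab: a side with no subtables falls back to the default.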
The code generated for an indirect jump through a branch table has
the form:
    if t1 < v[l] goto lolab
    if t1 > v[u] goto hilab
    goto *table[t1-v[l]]
where v[l], v[u], lolab, and hilab are replaced by the corresponding
values computed by swcode. The branch table is a static array of pointers,
and the tree for the target of an indirect jump is the same one that's built
for indexing an array:
(generate an indirect jump and a branch table 242)=
    Symbol table = genident(STATIC,
        array(voidptype, u - l + 1, 0), LABELS);
    (*IR->defsymbol)(table);
    cmp(LT, swp->sym, v[l], lolab);
    cmp(GT, swp->sym, v[u], hilab);
    walk(tree(JUMP, voidtype,
        rvalue((*optree['+'])(ADD, pointer(idtree(table)),
            (*optree['-'])(SUB,
                cast(idtree(swp->sym), inttype),
                consttree(v[l], inttype)))), NULL), 0, 0);
cmp builds the tree for the comparison
    if p op n goto L
and converts it to a dag. p is an identifier, op is a relational operator, and
n is an integer constant:
(stmt.c functions)+=
static void cmp(op, p, n, lab) int op, n, lab; Symbol p; {
    listnodes(eqtree(op,
            cast(idtree(p), inttype),
            consttree(n, inttype)),
        lab, 0);
}
cmp is also used to generate a linear search; see Exercise 10.8.
The branch table is generated by defining the static variable denoted
by table and calling the interface function defaddress for each of the
labels in the table. But this process cannot be done until the generated
code is emitted, so the relevant data are saved on the code list in a Switch
entry:
(Switch 242)=
    struct {
        Symbol sym;
        Symbol table;
        Symbol deflab;
        int size;
        int *values;
        Symbol *labels;
    } swtch;
(generate an indirect jump and a branch table 242)+=
code(Switch);
codelist->u.swtch.table = table;
codelist->u.swtch.sym = swp->sym;
codelist->u.swtch.deflab = swp->deflab;
codelist->u.swtch.size = u - l + 1;
codelist->u.swtch.values = &v[l];
codelist->u.swtch.labels = &swp->labels[l];
if (v[u] - v[l] + 1 >= 10000)
warning("switch generates a huge table\n");
The table is emitted by emitcode.
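What the emitted table and the three-instruction selection sequence do can be modeled with integer label numbers. The sketch below is illustrative and not lcc's emitcode: build_table, dispatch, and MAXTABLE are hypothetical names, labels are ints instead of Symbol pointers, and the indirect jump is modeled by returning the label number that would be jumped to.

```c
#include <assert.h>

#define MAXTABLE 64               /* illustrative bound on the table size */

/* Fills table[0 .. v[u]-v[l]] from the case values v[l..u] and their
   labels; holes get deflab. Returns the number of table entries,
   or -1 if the span exceeds MAXTABLE. */
int build_table(const int *v, const int *labels, int l, int u,
                int deflab, int *table) {
    int span, i, k;
    assert(l <= u);
    span = v[u] - v[l] + 1;
    if (span > MAXTABLE)
        return -1;
    for (i = 0; i < span; i++)
        table[i] = deflab;        /* values with no case label go to the default */
    for (k = l; k <= u; k++)
        table[v[k] - v[l]] = labels[k];
    return span;
}

/* Models:  if t1 < v[l] goto lolab
            if t1 > v[u] goto hilab
            goto *table[t1-v[l]]   */
int dispatch(int t1, int lo, int hi, int lolab, int hilab, const int *table) {
    if (t1 < lo)
        return lolab;
    if (t1 > hi)
        return hilab;
    return table[t1 - lo];
}
```

For the case values 21, 22, 23, 27, 28, and 29 the table has nine entries, three of which hold the default label.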
retcode type-checks its argument tree and calls walk to build the corresponding
RET dag, as detailed below. This dag is followed by a jump to
cfunc->u.f.label, which labels the end of the current function; cfunc
points to the symbol-table entry for the current function. (This jump
may be discarded by branch.) The back end must finish a function with
the epilogue - the code that restores saved values, if necessary, and
transfers from the function to its caller.
The code above doesn't warn about missing return values for functions
that return ints unless lcc's -A option is specified, because it's common
to use int functions for void functions; i.e., to use
f(double x) { ... return; }
instead of the more appropriate
void f(double x) { ... return; }
For many programs, warnings about missing int return values would
drown out the more important warnings about the other types.
For void functions, retcode has nothing to do except perhaps plant
an event hook:
(stmt.c functions)+=
void retcode(p) Tree p; {
    Type ty;
    if (p == NULL) {
        if (events.returns)
            (plant event hook for return)
        return;
    }
    (retcode 244)
}
For types other than void, retcode builds and walks a RET tree. The RET
operator simply identifies the return value so that the back end can put
it in the appropriate place specified by the target's calling conventions,
such as a specific register.
When there's an expression, retcode type-checks it, converts it to the
return type of the function as if it were assigned to a variable of that
type, and wraps it in the appropriate RET tree:
(retcode 244)=
    p = pointer(p);
    ty = assign(freturn(cfunc->type), p);
    if (ty == NULL) {
        error("illegal return type; found '%t' expected '%t'\n",
            p->type, freturn(cfunc->type));
        return;
    }
    p = cast(p, ty);
Integers, unsigneds, floats, and doubles are returned as is. Characters
and shorts are converted to the promoted type of the return type just as
they are in argument lists. Since there's no RET+P, pointers are converted
to unsigneds and returned by RET+I. Calls to such functions are made
with CALL+I, and their values are converted back to pointers with CVU+P.
(retcode 244)+=
    if (retv)
        (return a structure 245)
    if (events.returns)
        (plant an event hook for return p)
    p = cast(p, promote(p->type));
    if (isptr(p->type)) {
        (warn if p denotes the address of a local)
        p = cast(p, unsignedtype);
    }
    walk(tree(RET + widen(p->type), p->type, p, NULL), 0, 0);
Returning the address of a local variable is a common programming error,
so lcc detects and warns about the easy cases; see Exercise 10.9.
There is no RET+B. Structures are returned by assigning them to a variable.
As described in Section 9.3, if wants_callb is one, this variable is
the second operand to CALL+B in the caller and the first local in the callee,
and the back end must arrange to pass its address according to target-specific
conventions. If wants_callb is zero, the front end passes the
address of this variable as a hidden first argument, and never presents
the back end with a CALL+B. In both cases, compound, which implements
compound-statement, arranges for retv to point to the symbol-table entry
for a pointer to this variable. Returning a structure is an assignment
to *retv:
(return a structure 245)=
    {
        if (iscallb(p))
            p = tree(RIGHT, p->type,
                tree(CALL+B, p->type,
                    p->kids[0]->kids[0], idtree(retv)),
                rvalue(idtree(retv)));
        else
            p = asgntree(ASGN, rvalue(idtree(retv)), p);
        walk(p, 0, 0);
        if (events.returns)
            (plant an event hook for a struct return)
        return;
    }
As for ASGN+B (see Section 9.5) and ARG+B (see Section 9.3), there's an
opportunity to reduce copying for
return f();
f returns the same structure returned by the current function, so the
current function's retv can be used as the temporary for the call to f.
If the call to iscallb in the code above identifies this idiom, the CALL+B
tree is rebuilt using retv in place of the temporary.
newnode builds a dag for LABELV with a syms[0] equal to p. The for loop
walks cp backward in the code list to the first entry that represents executable
code, and the while loops remove one or more jumps to lab.
cp is a jump to lab if *cp is a Jump entry, and its node computes the
address of lab:
10.9 • MANAGING LABELS AND JUMPS 247
    p->ref++;
    return newnode(JUMPV, newnode(ADDRGP, NULL, NULL, p),
        NULL, NULL);
}
jump is called by branch, which stores the JUMPV dag in a Jump entry and
appends it to the code list.
branch also eliminates jumps to jumps and dead jumps. It begins by
appending the jump to the code list using a Label placeholder. The jump
is not a label, but Label is used so that (check for unreachable code) in
code won't bark, which it would do if the last executable entry on the
code list were an unconditional jump.
(stmt.c functions)+=
static void branch(lab) int lab; {
    Code cp;
    Symbol p = findlabel(lab);
    walk(NULL, 0, 0);
    code(Label)->u.forest = jump(lab);
    for (cp = codelist->prev; cp->kind < Label; )
        cp = cp->prev;
    while ((cp points to a Label ≠ lab 248)) {
        equatelab(cp->u.forest->syms[0], p);
        (remove the entry at cp 247)
        while (cp->kind < Label)
            cp = cp->prev;
    }
    (eliminate or plant the jump 249)
}
branch's for loop backs up to the first executable or Label entry before
the placeholder. The while loop looks for definitions of labels L' that
form the pattern
L':
goto L
where goto L is the jump in the placeholder.
(cp points to a Label ≠ lab 248)=
cp->kind == Label
&& cp->u.forest->op == LABELV
&& !equal(cp->u.forest->syms[O], p)
If L' ≠ L, L' is equivalent to L; jumps to L' can go to L instead, and the
Label entry for L' can be removed.
(stmt.c functions)+=
void equatelab(old, new) Symbol old, new; {
    old->u.l.equatedto = new;
    new->ref++;
}
makes new a synonym for old. During code generation, references to old
are replaced by the label at the end of the list formed by the equatedto
fields. These fields form a list because it's possible that new will be
equated to another symbol after old is equated to new. The ref field
counts the number of references to a label from jumps and from the
u.l.equatedto fields of other labels, so equatelab increments new->ref.
These synonyms complicate testing when two labels are equal. The
fragment (cp points to a Label ≠ lab) must fail when L' is equal to the
destination of the jump so code such as
    top:
        goto top;
is not erroneously eliminated, no matter how nonsensical it seems. Just
testing whether L' is equal to the destination, p, isn't enough; the two
labels are equivalent if L' is equal to p or to any label for which p is a
synonym. equal implements this more complicated test:
(stmt.c functions)+=
static int equal(lprime, dst) Symbol lprime, dst; {
    for ( ; dst; dst = dst->u.l.equatedto)
        if (lprime == dst)
            return 1;
    return 0;
}
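The interplay of equating labels and testing for equality can be tried out on a stripped-down label record. The sketch below is not lcc's code: struct label stands in for a label's symbol-table entry, equate and same_label mirror equatelab and equal as shown above, and resolve is a hypothetical helper that follows the equatedto list to its end, which is the replacement the text says is made during code generation.

```c
#include <assert.h>
#include <stddef.h>

struct label {                    /* hypothetical stand-in for a label symbol */
    int ref;                      /* references from jumps and equatedto fields */
    struct label *equatedto;      /* next label in the synonym list, or NULL */
};

/* Makes target a synonym for old, as equatelab does. */
void equate(struct label *old, struct label *target) {
    old->equatedto = target;
    target->ref++;
}

/* Nonzero if lprime is dst or appears on dst's equatedto list. */
int same_label(const struct label *lprime, const struct label *dst) {
    for ( ; dst; dst = dst->equatedto)
        if (lprime == dst)
            return 1;
    return 0;
}

/* The label that references to l should finally use. */
struct label *resolve(struct label *l) {
    assert(l != NULL);
    while (l->equatedto)
        l = l->equatedto;
    return l;
}
```

Equating a to b and then b to c builds the list a, b, c, so references to a resolve to c, and the walk in same_label that starts at a reaches both b and c.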
The warning exposes infinite loops like the one shown above.
Exercises
10.1 Implement the do statement.
10.2 Implement the while statement.
10.3 Implement
(stmt.c prototypes)=
static int foldcond ARGS((Tree e1, Tree e2));
which is called by forstmt. Hint: Build a tree that conditionally
substitutes e1 for the left operand of the test e2, when appropriate.
If the operands of this tree are constants, simplify will return a
CNST tree that determines whether the loop body will be executed
at least once.
10.4 There's a while loop in (case label), but there's no repetitive con-
struct in the grammar for case labels. Explain.
10.5 Prove that the execution time of the partitioning algorithm in swgen
is linear in n, the number of case values.
10.6 Here's another implementation of swgen's partitioning algorithm
(suggested by Arthur Watson).
    while (n > 0) {
        float d = den(n-1, k);
        if (d < density
        || k < swp->ncases - 1 && d < den(n, k+1))
            break;
        n--;
    }
The difference is that a table and its predecessor are not combined
if the table and v[k+1] would form a denser table. For example,
with density equal to 0.5, the greedy algorithm partitions the val-
ues 1, 6, 7, 8, 11, and 15 into the three tables (1, 6-8), (11), and
(15), and this lookahead variant gives the two tables (1) and (6-8,
11, 15). Analyze and explain this variant. Can you prove under
what conditions it will give fewer tables than the greedy algorithm?
10.7 Change swgen to use the optimal partitioning algorithm described
by Kannan and Proebsting (1994). With density equal to 0.5, the
optimal algorithm partitions the values 1, 6, 7, 8, 9, 10, 15, and 19
into the two tables (1) and (6-10, 15, 19); the greedy algorithm and
its lookahead variant described in the previous exercise generate
the three tables (1, 6-10), (15), and (19). Can you find real programs
on which the optimal algorithm gives fewer tables than the greedy
algorithm? Can you detect the differences in execution times?
    if t1 = v[u] goto Lu
    if t1 < v[l] goto lolab
    if t1 > v[u] goto hilab
Use cmp to do the comparisons, and avoid generating unnecessary
jumps to lolab and hilab.
10.9 Implementing (warn if p denotes the address of a local) involves ex-
amining p to see if it's the address of a local or a parameter. This
test catches some, but not all, of these kinds of programming er-
rors. Give an example of an error that this approach cannot detect.
Is there a way to catch all such errors at compile-time? At run-time?
10.10 swcode is passed ub-lb+1 tables in b[lb..ub], and picks the middle
table at b[(lb+ub)/2] as the root of the tree from which it
generates a binary search. Other choices are possible; it may, for
instance, choose the largest table, or profiling data could supply the
frequency of occurrence for each case value, which could pinpoint
the table that's most likely to cover the switch value. Alternatively,
we could assume a specific probability distribution for the case values.
Suppose all values in the range v[b[lb]..b[ub+1]-1] - even
those for which there are no case labels - are equally likely to occur.
For this distribution, the root table should be the one with
a case value closest to the middle value in this range. Implement
this strategy by computing swcode's k appropriately. Be careful; it's
possible that no table will cover the middle value, so pick the one
that's closest.
10.11 Some systems support dynamic linking and loading. When new
code is loaded, the dynamic linker must identify and update all re-
locatable addresses in it. This process takes time, so dynamically
linked code benefits from position-independent addresses, which
are relative to the value that the program counter will have during
the execution of the instruction that uses the address. For example,
if the instruction at location 200 jumps to location 300, conven-
tional relocatable code stores the address 300 in the instruction,
but position-independent code stores 300 - 200 or 100 instead.
Extend lcc's interface so that it can emit position-independent code
for switch statements. The interface defined in Chapter 5 can't
do so because it uses the same defaddress for switch statements
that it uses to initialize pointer data, which mustn't be position-
independent.
11
Declarations
11.1 Translation Units
    level = GLOBAL;
    for (n = 0; t != EOI; n++)
        if (kind[t] == CHAR || kind[t] == STATIC
        || t == ID || t == '*' || t == '(') {
            decl(dclglobal);
            (deallocate arenas 254)
        } else if (t == ';') {
            warning("empty declaration\n");
            t = gettok();
        } else {
            error("unrecognized declaration\n");
            t = gettok();
        }
    if (n == 0)
        warning("empty input file\n");
}
11.2 Declarations
The syntax for declarations is
declaration:
    declaration-specifiers init-declarator { , init-declarator } ;
    declaration-specifiers ;
init-declarator:
    declarator
    declarator = initializer
initializer:
    assignment-expression
    '{' initializer { , initializer } [ , ] '}'
declaration-specifiers:
    storage-class-specifier [ declaration-specifiers ]
    type-specifier [ declaration-specifiers ]
    type-qualifier [ declaration-specifiers ]
storage-class-specifier:
    typedef | extern | static | auto | register
type-specifier:
    void
    char | float | short | signed
    int | double | long | unsigned
    struct-or-union-specifier
    enum-specifier
    identifier
type-qualifier: const | volatile
t = gettok();
} else
p = NULL;
break;
All that remains after parsing declaration-specifiers is to determine
the appropriate Type, which is encoded in the values of sign, size, and
type. This Type is specifier's return value. The default
(compute ty 257)=
    if (type == 0) {
        type = INT;
        ty = inttype;
    }
(compute ty 257)+=
    ...
    if (cons == CONST)
        ty = qual(CONST, ty);
    if (vol == VOLATILE)
        ty = qual(VOLATILE, ty);
decl, the parsing function for declaration, starts by calling specifier:
    ty = specifier(&sclass);
    if (t == ID || t == '*' || t == '(' || t == '[') {
        char *id;
        Coordinate pos;
        (id, ty1 <- the first declarator 258)
        for (;;) {
            (declare id with type ty1 260)
            if (t != ',')
                break;
            t = gettok();
            (id, ty1 <- the next declarator 258)
        }
    } else if (ty == NULL
    || !(ty is an enumeration or has a tag))
        error("empty declaration\n");
    test(';', stop);
}
dclr, described in the next section, parses a declarator. The easy case is
the one for the second and subsequent declarators:
(id, ty1 <- the next declarator 258)=
    id = NULL;
    pos = src;
    ty1 = dclr(ty, &id, NULL, 0);
dclr accepts a base type - the result of specifier - and returns a Type,
an identifier, and possibly a parameter list. The base type, ty in the code
above, is dclr's first argument, and its next two arguments are the
addresses of the variables to assign the identifier and parameter list, if
they appear. It returns the complete Type. Passing a null pointer as dclr's
third argument specifies that parameter lists may not appear in this
context. As detailed in Section 11.3, a nonzero fourth argument causes dclr
to parse an abstract-declarator. pos saves the source coordinate of the
beginning of a declarator for use when the identifier is declared.
The first declarator is treated differently than the rest because decl
also recognizes function-definitions, which can be confused with only
the first declarator at file scope:
(id, ty1 <- the first declarator 258)=
    id = NULL;
    pos = src;
    if (level == GLOBAL) {
        Symbol *params = NULL;
        ty1 = dclr(ty, &id, &params, 0);
        if ((function definition? 259)) {
            (define function id 259)
            return;
        } else if (params)
            exitparams(params);
    } else
        ty1 = dclr(ty, &id, NULL, 0);
Since the first declarator might be a function definition, a nonnull
location for the parameter list is passed as dclr's third argument. If the
declarator includes a function and its parameter list, params is set to an
array of symbol-table entries. When there is a parameter list, but it's
not part of a function definition, exitparams is called to close the scope
opened by that list. This scope isn't closed when the end of the list is
reached because the parsing function for parameter lists can't
differentiate between a function declaration and a function definition.
Section 11.4 elaborates.
A declaration is really a function-definition if the first declarator
specifies a function type and includes an identifier, and the next token
begins either a compound statement or a list of parameter declarations:
(function definition? 259)=
    params && id && isfunc(ty1)
    && (t == '{' || istypename(t, tsym)
    || (kind[t] == STATIC && t != TYPEDEF))
decl calls funcdefn to handle function definitions:
(define function id 259)=
    if (sclass == TYPEDEF) {
        error("invalid use of 'typedef'\n");
        sclass = EXTERN;
    }
    if (ty1->u.f.oldstyle)
        exitscope();
    funcdefn(sclass, id, ty1, params, pos);
The call to exitscope closes the scope opened in parameters because
that scope will be reopened in funcdefn when the declarations for the
parameters are parsed.
The semantics part of decl amounts to declaring the identifier given in
the declarator. As described above, decl's argument is a dclX function
that does this semantic processing, except for typedefs.
(dclglobal 261)
return p;
}
decl accepts any set of specifiers and declarators that are syntactically
legal, so the dclX functions must check for the specifiers that are illegal
in their specific semantic contexts, and must also check for redeclara-
tions. dclglobal, for example, insists that the storage class be extern,
static, or omitted:
(dclglobal 261)=
    if (sclass == 0)
        sclass = AUTO;
    else if (sclass != EXTERN && sclass != STATIC) {
        error("invalid storage class '%k' for '%t %s'\n",
            sclass, ty, id);
        sclass = AUTO;
    }
Globals that have no storage class or an illegal one are given storage class
AUTO so that all identifiers have nonzero storage classes, which simplifies
error checking elsewhere.
dclglobal next checks for redeclaration errors.
(dclglobal 261)+=
    p = lookup(id, identifiers);
    if (p && p->scope == GLOBAL) {
        if (p->sclass != TYPEDEF && eqtype(ty, p->type, 1))
            ty = compose(ty, p->type);
        else
            error("redeclaration of '%s' previously declared at %w\n", p->name, &p->src);
        if (!isfunc(ty) && p->defined && t == '=')
            error("redefinition of '%s' previously defined at %w\n", p->name, &p->src);
        (check for inconsistent linkage 262)
    }
A redeclaration is legal if the types on both declarations are compatible,
which is determined by eqtype, and the resulting type is the composite
of the two types. Forming this composite is how the type of x,
illustrated above, changed from (ARRAY (INT)) to (ARRAY 40 4 (INT)).
Some redeclarations are legal, but redefinitions - indicated by a nonzero
defined flag and an approaching initializer - are never legal.
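The effect of forming a composite type is visible from ordinary C. The sketch below assumes a pair of file-scope declarations of the kind the text refers to: the first leaves the array size unspecified and the second supplies it. The helper x_length is hypothetical and exists only to make the result observable.

```c
#include <assert.h>
#include <stddef.h>

extern int x[];     /* first declaration: array of unknown size */
int x[10];          /* second declaration completes the type */

/* Element count of x; sizeof is legal here only because the composite
   type of the two declarations is complete. */
static size_t x_length(void) {
    size_t n = sizeof x / sizeof x[0];
    assert(n * sizeof x[0] == sizeof x);
    return n;
}
```

Reversing the order of the two declarations is also legal; the composite type is the same in both cases.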
An identifier has one of three kinds of linkage. Identifiers with ex-
ternal linkage can be referenced from other separately compiled trans-
lation units. Those with internal linkage can be referenced only within
the translation unit in which they appear. Parameters and locals have no
linkage.
A global with no storage class or declared extern in its first declaration
has external linkage, and those declared static have internal linkage. On
subsequent declarations, an omitted storage class or extern has a slightly
different interpretation. If the storage class is omitted, it has external
linkage, but if the storage class is extern, the identifier has the same
linkage as a previous file-scope declaration for the identifier. Thus,
static int y;
extern int y;
is legal and y has internal linkage, but
extern int y;
static int y;
is illegal because the second declaration demands that y have internal
linkage when it already has external linkage. Multiple declarations that
all have external or internal linkage are permitted.
The table below summarizes these rules in terms of p->sclass, the
storage class of an existing declaration, and sclass, the storage class
for the declaration in hand. AUTO denotes no storage class.
                           sclass
                    EXTERN  STATIC  AUTO
           EXTERN     ✓       ×      ✓
p->sclass  STATIC     ✓       ✓      ×
           AUTO       ✓       ×      ✓
✓ marks the legal combinations, and × marks the combinations that are
linkage errors. The code used in dclglobal above is derived from this
table:
(check for inconsistent linkage 262)=
    if (p->sclass == EXTERN && sclass == STATIC
    ||  p->sclass == STATIC && sclass == AUTO
    ||  p->sclass == AUTO   && sclass == STATIC)
        warning("inconsistent linkage for '%s' previously declared at %w\n", p->name, &p->src);
This if statement prints its warning for the second of the two examples
shown above.
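The linkage rules can be checked mechanically. The following sketch restates the three error combinations as a standalone predicate; the enumeration values and the name inconsistent_linkage are illustrative assumptions, not lcc's definitions.

```c
#include <assert.h>

enum { AUTO = 1, EXTERN, STATIC };   /* illustrative values, not lcc's */

/* Return 1 if redeclaring a global whose recorded storage class is
   old_sclass with storage class new_sclass is a linkage error. */
static int inconsistent_linkage(int old_sclass, int new_sclass) {
    assert(old_sclass >= AUTO && old_sclass <= STATIC);
    assert(new_sclass >= AUTO && new_sclass <= STATIC);
    return (old_sclass == EXTERN && new_sclass == STATIC)
        || (old_sclass == STATIC && new_sclass == AUTO)
        || (old_sclass == AUTO   && new_sclass == STATIC);
}
```

For the two examples in the text, inconsistent_linkage(STATIC, EXTERN) is 0 because static followed by extern is legal, and inconsistent_linkage(EXTERN, STATIC) is 1.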
Next, the global is installed in the globals table, if necessary, and its
attributes are initialized or overwritten.
(dclglobal 261)+=
    if (p == NULL || p->scope != GLOBAL) {
        p = install(id, &globals, GLOBAL, PERM);
        p->sclass = sclass;
        if (p->sclass != STATIC) {
            static int nglobals;
            nglobals++;
            if (Aflag >= 2 && nglobals == 512)
                warning("more than 511 external identifiers\n");
        }
        (*IR->defsymbol)(p);
    } else if (p->sclass == EXTERN)
        p->sclass = sclass;
    p->type = ty;
    p->src = *pos;
New globals are passed to the back end's defsymbol interface function
to initialize their x fields. If an existing global has storage class extern,
and this declaration has no storage class or specifies static, the global's
sclass is changed to either STATIC or AUTO to ensure that it's defined in
finalize. If this declaration specifies extern, the assignment to sclass is
made but has no effect. lcc's -A option enables warnings about non-ANSI
usage. For example, the standard doesn't require an implementation to
support more than 511 external identifiers in one compilation unit, so
lcc warns about too many externals when -A -A is specified.
The standard permits compilers to accept
f() {extern float g(); ... }
int g() { ... }
h() {extern double g(); ... }
without diagnosing that the first declaration for g conflicts with its
definition (which is also a declaration), or that the last declaration
conflicts with the first two. Technically, each declaration for g
introduces a different identifier with a scope limited to the compound
statement in which the declaration appears. But all three g's have external
linkage and must refer to the same function at execution time. lcc uses the
externals table to warn about these kinds of errors. dcllocal adds
identifiers with external linkage to externals, and both dcllocal and
dclglobal check for inconsistencies:
80 EXTERN
(dclglobal 261)+=
    {
        Symbol q = lookup(p->name, externals);
        if (q && (p->sclass == STATIC
        || !eqtype(p->type, q->type, 1)))
            warning("declaration of '%s' does not match previous declaration at %w\n", p->name, &q->src);
    }
        initializer(p->type, 0);
    } else if (t == '=')
        initglobal(p, 0);
    else if (p->sclass == STATIC && !isfunc(p->type)
    && p->type->size == 0)
        error("undefined size for '%t %s'\n", p->type, p->name);
The last else if clause above tests for declarations of identifiers with in-
ternal linkage and incomplete types, which are illegal; an example would
be:
static int x[];
initglobal parses an initializer if one is approaching or if its second
argument is nonzero, and defines the global given by its first argument.
initglobal announces the global in the proper segment, parses its
initializer, adjusts its type, if appropriate, and marks the global as
defined.
(decl.c functions)+=
static void initglobal(p, flag) Symbol p; int flag; {
    Type ty;
    if (t == '=' || flag) {
        if (p->sclass == STATIC) {
            for (ty = p->type; isarray(ty); ty = ty->type)
                ;
            defglobal(p, isconst(ty) ? LIT : DATA);
        } else
            defglobal(p, DATA);
        if (t == '=')
            t = gettok();
        ty = initializer(p->type, 0);
        if (isarray(p->type) && p->type->size == 0)
            p->type = ty;
        if (p->sclass == EXTERN)
            p->sclass = AUTO;
        p->defined = 1;
    }
}
initializer is the parsing function for initializer, and is omitted from
this book. If p's type is an array of unknown size, the initialization
specifies the size and thus completes the type. An initialization is always
a definition, in which case an extern storage class is equivalent to no
storage class, so sclass is changed, if necessary. This change prevents
doextern from calling the back end's import for p at the end of
compilation.
defglobal announces the definition of its argument by calling the
appropriate interface functions.
(decl.c functions)+=
void defglobal(p, seg) Symbol p; int seg; {
    p->u.seg = seg;
    swtoseg(p->u.seg);
    if (p->sclass != STATIC)
        (*IR->export)(p);
    (*IR->global)(p);
}
(globals 265)=
    int seg;
Identifiers with external linkage are announced by calling the export in-
terface function, and global proclaims the actual definition. swtoseg(n)
switches to segment n (one of BSS, LIT, CODE, or DATA) by calling the
segment interface function, but it avoids the calls when the current seg-
ment is n. defglobal records the segment in the global's u.seg field.
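The call-avoiding behavior of swtoseg amounts to caching the current segment. The sketch below is one plausible shape for it, not lcc's code: the variable cseg, the counter ncalls, the segment numbers, and the stand-in segment function are all assumptions made for illustration.

```c
#include <assert.h>

enum { CODE = 1, BSS, DATA, LIT };   /* illustrative segment numbers */

static int cseg;          /* current segment; 0 means none selected yet */
static int ncalls;        /* counts calls that reach the back end */

/* Hypothetical stand-in for the back end's segment interface function. */
static void segment(int n) {
    (void)n;
    ncalls++;
}

/* Switch to segment n, calling the back end only on an actual change. */
static void swtoseg(int n) {
    assert(n >= CODE && n <= LIT);
    if (cseg != n)
        segment(n);
    cseg = n;
}
```

Calling swtoseg(DATA) twice and then swtoseg(LIT) reaches the back end twice, because the second request for DATA is absorbed by the cache.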
11.3 Declarators
Treating (parse the first declarator) as a special case in decl is one of the
messy spots in recognizing declarations. Parsing a declarator, which is
defined below, is worse. The difficulty is that the base type occurs before
its modifiers. For example, int *x specifies the type (POINTER (INT)),
but building the type left-to-right as the declarator is parsed leads to the
meaningless type (INT (POINTER)). The precedence of the operators []
and () causes similar difficulties, as illustrated by
    int *x[10], (*f)();
The types of x and f are
    (ARRAY 10 (POINTER (INT)))
    (POINTER (FUNCTION (INT)))
The * appears in the same place in the token stream but in different
places in the type representation.
As these examples suggest, it's easier to build a temporary inverted
type during parsing, which is what dclr does, and then traverse the
inverted type building the appropriate Type structure afterward. dclr's
first argument is the base type, which is the type returned by specifier.
(decl.c functions)+=
static Type dclr(basety, id, params, abstract)
...
dclr1 parses a declarator and returns its inverted type, from which dclr
builds and returns a normal Type. The id and params arguments are set to
the identifier and parameter list in a declarator. Exercise 11.3 describes
the abstract argument. dclr1 uses Type structures for the elements of
an inverted type, and calls tnode to allocate an element and initialize it:
(decl.c functions)+=
static Type tnode(op, type) int op; Type type; {
    Type ty;
    NEW0(ty, STMT);
    ty->op = op;
    ty->type = type;
    return ty;
}
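The inverted-type idea can be tried outside the compiler. The sketch below is an illustration under stated assumptions, not dclr: it represents an inverted type as a linked list of hypothetical struct inv elements and, instead of building Type structures, writes the resulting type in the prefix notation used in this chapter. The names inv and build are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { POINTER, ARRAY, FUNCTION };

/* One element of an inverted type; names are illustrative, not lcc's. */
struct inv {
    int op;                  /* POINTER, ARRAY, or FUNCTION */
    int size;                /* element count, used only for ARRAY */
    const struct inv *next;  /* next element to apply to the base type */
};

/* Apply the inverted elements to base in list order and write the
   resulting type, in prefix notation, into buf. */
static void build(const char *base, const struct inv *p, char *buf, size_t n) {
    char tmp[256];

    assert(buf != NULL && n > 0);
    snprintf(buf, n, "(%s)", base);
    for ( ; p != NULL; p = p->next) {
        switch (p->op) {
        case POINTER:
            snprintf(tmp, sizeof tmp, "(POINTER %s)", buf);
            break;
        case ARRAY:
            snprintf(tmp, sizeof tmp, "(ARRAY %d %s)", p->size, buf);
            break;
        default:
            snprintf(tmp, sizeof tmp, "(FUNCTION %s)", buf);
            break;
        }
        snprintf(buf, n, "%s", tmp);
    }
}
```

For the declarator *x[10], the list is POINTER followed by ARRAY, and applying it to INT gives (ARRAY 10 (POINTER (INT))). For (*fp)(), the list is FUNCTION followed by POINTER, which gives (POINTER (FUNCTION (INT))). In both cases the element created last during parsing is the one applied first to the base type.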
pointer: { * { type-qualifier } }
Parsing declarators is similar to parsing expressions. The tokens *, (, and
[ are operators, and the identifiers and parameter lists are the operands.
Operators yield inverted type elements and operands set id or params.
(decl.c functions)+=
static Type dclr1(id, params, abstract)
char **id; Symbol **params; int abstract; {
    Type ty = NULL;
    switch (t) {
    case ID:                (ident 267) break;
    case '*': t = gettok(); (pointer 268) break;
    case '(': t = gettok(); (abstract function 270) break;
    case '[': break;
    default:  return ty;
    }
    while (t == '(' || t == '[')
        switch (t) {
        case '(': t = gettok(); { (concrete function 268) }
                  break;
        case '[': t = gettok(); { (array 268) } break;
        }
    return ty;
}
where brackets denote inverted type elements. The type ultimately
returned by dclr is
(POINTER (CONST+VOLATILE (POINTER (CONST POINTER (INT)))))
The code for parsing pointer is
(pointer 268)=
    if (t == CONST || t == VOLATILE) {
        Type ty1;
        ty1 = ty = tnode(t, NULL);
        while ((t = gettok()) == CONST || t == VOLATILE)
            ty1 = tnode(t, ty1);
        ty->type = dclr1(id, params, abstract);
        ty = ty1;
    } else
        ty = dclr1(id, params, abstract);
    ty = tnode(POINTER, ty);
The recursive calls to dclr1 make it unnecessary for the other fragments
in dclr1 to append their inverted types to a pointer type, if there is one.
Exercise 11.2 elaborates.
Control emerges from dclr1's switch statement with ty equal to the
inverted type for a pointer or a function or null. The suffix type operators
[ and ( wrap ty in the appropriate inverted type element. The case for
arrays is
(array 268)=
    int n = 0;
    if (kind[t] == ID) {
        n = intexpr(']', 1);
        if (n <= 0) {
            error("'%d' is an illegal array size\n", n);
            n = 1;
        }
    } else
        expect(']');
    ty = tnode(ARRAY, ty);
    ty->size = n;
Parentheses either group declarators or specify a function type. Their
appearance in suffix-declarator always specifies a function type:
(concrete function 268)=
    Symbol *args;
    ty = tnode(FUNCTION, ty);
    (open a scope in a parameter list 269)
    args = parameters(ty);
the parameter list for f opens a new scope and introduces the structure
tag T. The structure's lone field, fp, is a pointer to a function, and the
parameter list for that function opens another new scope and defines a
different tag T. This declaration is legal. The declaration on the second
line is an error because it redefines the tag T - f's parameter x, its tag
T, f's local y, and y's tag T are all in the same scope.
lcc uses scope PARAM for identifiers declared at the top-level parameter
scope and LOCAL for identifiers like y; LOCAL is equal to PARAM+1. This
division is only a convenience; foreach can visit just the parameters, for
example. Redeclaration tests, however, must check for LOCAL identifiers
that erroneously redeclare PARAM identifiers.
The example above is the one case where redeclaration tests must not
make this check. The code above arranges for a nested parameter list
to have a scope of at least PARAM+2. Leaving this "hole" in the scope
numbers avoids erroneous redeclaration diagnostics. For example, the
tag T in fp's parameter has scope PARAM+2, and thus does not elicit a
redeclaration error because x's tag T has scope PARAM.
At some point, the scope opened by the call or calls to enterscope
must be closed by a matching call to exitscope. The parameter list may
be part of a function definition or just part of a function declaration. If
the list might be in a function definition, params is nonnull and not
previously set, and dclr's caller must call exitscope when it's appropriate.
The call to exitparams in decl's (id, ty1 <- the first declarator) is an
example. exitparams checks for old-style parameter lists that are used
erroneously, and calls exitscope. If params is null or already holds a
parameter list, then exitscope can be called immediately because the
parameter list can't be part of a function definition.
break;
t = gettok();
}
params = ltov(&list, FUNC);
the function type, which must be retained for checking calls, other dec-
larations of the same function, and the definition, if one appears. As
described in Section 4.5, a new-style function with no arguments has a
zero-length prototype; a function with a variable number of arguments
has a prototype with at least two elements, the last of which is the type
for void. The use of void to identify a variable number of arguments
is an encoding trick (of perhaps dubious value); it doesn't appear in the
source code and can't be confused with voids that do, because they never
appear in prototypes.
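The prototype encoding can be made concrete with a small model. The sketch below assumes that a prototype is a null-terminated array of type pointers, as the surrounding code suggests; the struct type, the three sample types, and the helpers proto_length and is_variadic are hypothetical and stand in for lcc's own definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct type *Type;
struct type { const char *name; };   /* toy model of a type */

static struct type voidt = { "void" }, intt = { "int" }, chart = { "char" };
static Type voidtype = &voidt, inttype = &intt, chartype = &chart;

/* Number of entries in a null-terminated prototype. */
static int proto_length(const Type *proto) {
    int n = 0;
    assert(proto != NULL);
    while (proto[n] != NULL)
        n++;
    return n;
}

/* A prototype denotes a variable number of arguments if it has at least
   two entries and the last one is the type for void. */
static int is_variadic(const Type *proto) {
    int n = proto_length(proto);
    return n >= 2 && proto[n-1] == voidtype;
}
```

Under this encoding, a function declared with (void) has a prototype of length zero, and one declared with a single fixed parameter followed by an ellipsis has a prototype of length two whose last entry is voidtype.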
(parse new-style parameter list 273)=
    int n = 0;
    Type ty1 = NULL;
    for (;;) {
        Type ty;
        int sclass = 0;
        char *id = NULL;
        if (ty1 && t == ELLIPSIS) {
            (terminate list for a varargs function 274)
            t = gettok();
            break;
        }
        if (!istypename(t, tsym) && t != REGISTER)
            error("missing parameter type\n");
        n++;
        ty = dclr(specifier(&sclass), &id, NULL, 1);
        (declare a parameter and append it to list 273)
        if (ty1 == NULL)
            ty1 = ty;
        if (t != ',')
            break;
        t = gettok();
    }
    (build the prototype 274)
    fty->u.f.oldstyle = 0;
ty1 is the Type of the first parameter, and it's used to detect invalid use
of void and, as shown above, of ellipses. Each parameter is a declarator,
so parsing one uses the machinery embodied in specifier and dclr,
but, as shown above, permits only the storage class register. If the type
void appears, it must appear alone and first:
(declare a parameter and append it to list 273)=
    if ( ty == voidtype && (ty1 || id)
    ||  ty1 == voidtype)
        error("illegal formal parameter types\n");
    if (id == NULL)
        id = stringd(n);
    if (ty != voidtype)
        list = append(dclparam(sclass, id, ty, &src), list);
Omitted identifiers are given integer names; dclparam will complain
about these missing identifiers if the declaration is part of a function
definition.
Variable length parameter lists cause the evolving list of parameters
to be terminated by a statically allocated symbol with a null name and
the type void.
(terminate list for a varargs function 274)=
    static struct symbol sentinel;
    if (sentinel.type == NULL) {
        sentinel.type = voidtype;
        sentinel.defined = 1;
    }
    if (ty1 == voidtype)
        error("illegal formal parameter types\n");
    list = append(&sentinel, list);
After the new-style parameter list has been parsed, list holds the
symbols in the order they appeared. These symbols form the params array
returned by parameters, and their types form the prototype for the
function type:
(build the prototype 274)=
    fty->u.f.proto = newarray(length(list) + 1,
        sizeof (Type *), PERM);
    params = ltov(&list, FUNC);
    for (n = 0; params[n]; n++)
        fty->u.f.proto[n] = params[n]->type;
    fty->u.f.proto[n] = NULL;
dclparam declares both old-style and new-style parameters. dclparam
is called twice for each parameter: The first call is from parameters
and the second is from funcdefn. If the parameter list is not part of
a definition, the call to exitscope (in exitparams) discards the entries
made by dclparam.
(decl.c functions)+=
static Symbol dclparam(sclass, id, ty, pos)
int sclass; char *id; Type ty; Coordinate *pos; {
    Symbol p;
        ty = ptr(ty);
    else if (isarray(ty))
        ty = atop(ty);
The only explicit storage class permitted is register, but lcc uses auto
internally to identify nonregister parameters.
(dclparam 275)+=
    if (sclass == 0)
        sclass = AUTO;
    else if (sclass != REGISTER) {
        error("invalid storage class '%k' for '%t%s\n",
            sclass, ty, (id 275));
        sclass = AUTO;
    } else if (isvolatile(ty) || isstruct(ty)) {
        warning("register declaration ignored for '%t%s\n",
            ty, (id 275));
        sclass = AUTO;
    }
(id 275)=
    stringf(id ? " %s'" : "' parameter", id)
44 inst:all
Parameters may be declared only once, which makes checking for
redeclaration easy:
(dclparam 275)+=
    p = lookup(id, identifiers);
    if (p && p->scope == level)
        error("duplicate declaration for '%s' previously declared at %w\n", id, &p->src);
    else
        p = install(id, &identifiers, level, FUNC);
dclparam concludes by initializing p's remaining fields and checking for
and consuming illegal initializations.
(dclparam 275)+=
    p->sclass = sclass;
    p->src = *pos;
    p->type = ty;
    p->defined = 1;
    if (t == '=') {
        error("illegal initialization for parameter '%s'\n", id);
        t = gettok();
        (void)expr1(0);
    }
Parameters are considered defined when they are declared because they
are announced to the back end by the interface procedure function, as
described in Section 11.6.
Exchanging the two lines fixes the problem for list, but exposes head to
the same problem. The solution is to define the new type before defining
head:
struct node;
struct head { struct node *list; ... };
struct node { struct head *hd; struct node *link; ... };
The lone struct node defines a new incomplete structure type with the
tag node in the scope in which it appears, and hides other tags named
node defined in enclosing scopes, if there are any. If there is a structure
tag node in the same scope as the struct node, the latter declaration has
no effect.
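The hiding rule can be observed in a complete program fragment. The sketch below is an illustration only: the file-scope struct node with a single int member and the function inner_tags are invented for the example, while head, hd, list, and link follow the declarations in the text.

```c
#include <assert.h>

struct node { int outer; };        /* file-scope tag node */

/* Returns 1 if the inner head and node structures point at each other. */
static int inner_tags(void) {
    struct node;                   /* new incomplete type; hides the outer tag */
    struct head { struct node *list; };
    struct node { struct head *hd; struct node *link; };
    struct node n = { 0, 0 };
    struct head h = { &n };

    n.hd = &h;
    assert(h.list->link == 0);     /* list refers to the inner struct node */
    return h.list->hd == &h;
}
```

Without the lone struct node declaration, the list field of head would refer to the file-scope structure, and initializing it with the address of the inner n would mix two incompatible pointer types.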
The parsing function for struct-or-union-specifier, structdcl, deals
with tags and their definition, and calls fields to parse fields and to
assign field offsets. Unions and structures are handled identically, except
for assigning field offsets.
(decl.c functions)+=
static Type structdcl(op) int op; {
    char *tag;
    Type ty;
    Symbol p;
    Coordinate pos;
    t = gettok();
    pos = src;
    (structdcl 277)
    return ty;
}
structdcl begins by consuming the tag or using the empty string for
omitted tags:
(structdcl 277)=
    if (t == ID) {
        tag = token;
        t = gettok();
    } else
        tag = "";
If the tag is followed by a field list, this specifier defines a new tag:
(structdcl 277)+=
    if (t == '{') {
        ty->u.sym->defined = 1;
        t = gettok();
        if (istypename(t, tsym))
            fields(ty);
        else
            error("invalid %k field declarations\n", op);
        test('}', stop);
    }
newstruct checks for redeclaration of the tag and defines the new type.
If the tag is empty, newstruct calls genlabel to generate one. newstruct
is also used for enumeration specifiers; see Exercise 11.9.
If the struct-or-union-specifier doesn't have fields and the tag is al-
ready in use for the type indicated by op, the specifier refers to that
type.
(structdcl 277)+=
    else if (*tag && (p = lookup(tag, types)) != NULL
    && p->type->op == op) {
        ty = p->type;
        if (t == ';' && p->scope < level)
            ty = newstruct(op, tag);
    }
This case also handles the exception described above: If the tag is defined
in an enclosing scope and the specifier appears alone in a declaration,
the specifier defines a new type. As described in Chapter 3, tags have
their own name space, which is managed in the types table.
If the cases above don't apply, there must be a tag, and the specifier
defines a new type:
(structdcl 277)+=
    else {
        if (*tag == 0)
            error("missing %k tag\n", op);
        ty = newstruct(op, tag);
    }
    if (*tag && xref)
        use(ty->u.sym, pos);
The last else clause handles the case when a specifier appears alone in
a declaration and the tag is already defined in an enclosing scope for a
different purpose. An example is:
enum node { ... };
f(void) {
struct node;
struct head { struct node *list; ... };
The else clause above handles the struct node on the third line.
Most of the complexity of processing structure and union specifiers is
in analyzing the fields and computing their offsets, particularly specifiers
involving bit fields. Fields must be laid out in the order they appear in
fields; their offsets depend on their types and the alignment constraints
of those types. Bit fields are allocated in addressable storage units and
when N bit fields fit in a storage unit, they must be laid out in the or-
der in which they are declared, but that order can be from least to most
significant bit or vice versa. It's conventional to use the order that fol-
lows increasing addresses: least to most significant bit (right to left) on
little-endian targets, and most to least significant bit (left to right) on
big endians. A compiler is not obligated to split bit fields across storage
units, and it may choose any storage unit for bit fields. lcc uses unsigned
integers so that bit fields can be fetched and stored using integer
loads, stores, and masking operations.
Figure 11.1 shows a structure definition and its layout on a little-endian
MIPS. Unsigneds are 32 bits, and integers and unsigneds must
be aligned on 4-byte boundaries. Addresses increase from right to left
as suggested by the numbering of a's elements, and from top to bottom
as suggested by the offsets on the right side of the figure. The shading
depicts holes that result from alignment constraints, and the darker
shading is the hole specified by the 26-bit unnamed bit field. This
example helps explain the intricacies of fields, the parsing function for
fields.
    struct {
        char a[5];
        short s1, s2;
        unsigned code:3, used:1;
        unsigned :26;
        int amt:7, last;
        short id;
    } x;
FIGURE 11.1 Little-endian structure layout example.
fields parses the field list and builds a list of field structures
emanating from ty->u.sym->u.s.flist. The field structure is described
in Section 4.6. Its name, type, and offset fields give the field's name, its
Type, and its offset in bytes from the beginning of the structure,
respectively. For bit fields, bitsize gives the number of bits in the bit field,
and lsb gives the number of the bit field's least significant bit plus one,
where bits are numbered starting at zero with the least significant bit on
all targets. A bit field is identified by a nonzero lsb. The list of field
structures is threaded through the link fields. For the example shown
in Figure 11.1, this list holds the fields shown in the following table.
n counts the number of fields, and is used only for the warning about
declaring more fields than the maximum specified by the standard, which
lcc's -A option enables.
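The lsb and bitsize conventions translate directly into shifts and masks. The sketch below is not lcc code: struct bitfield, field_mask, fetch_field, and store_field are hypothetical names, and a 32-bit storage unit is assumed. With the little-endian ordering described above, code in Figure 11.1 would have lsb 1 and bitsize 3, and used would have lsb 4 and bitsize 1.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two bit-field members of a field structure. */
struct bitfield {
    int bitsize;    /* number of bits in the field */
    int lsb;        /* number of the least significant bit, plus one */
};

/* Mask with bitsize low-order one bits; valid for 1 <= bitsize <= 32. */
static uint32_t field_mask(int bitsize) {
    return bitsize >= 32 ? UINT32_MAX : ((uint32_t)1 << bitsize) - 1;
}

/* Extracts the field from the storage unit that contains it. */
uint32_t fetch_field(uint32_t word, struct bitfield f) {
    return (word >> (f.lsb - 1)) & field_mask(f.bitsize);
}

/* Returns word with the field replaced by the low-order bits of value. */
uint32_t store_field(uint32_t word, struct bitfield f, uint32_t value) {
    uint32_t m = field_mask(f.bitsize) << (f.lsb - 1);
    return (word & ~m) | ((value << (f.lsb - 1)) & m);
}
```

Storing 5 into code and 1 into used in a zeroed unit sets bits 0, 2, and 3, and fetching either field recovers the stored value.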
Parsing a field is similar to parsing the declarator in a declaration, and
dclr does most of the work:
(parse one field 281)=
    p = newfield(id, ty, dclr(ty1, &id, NULL, 0));
newfield allocates a field structure, initializes its name and type fields
to the value of id and the Type returned by dclr, clears the other fields,
and appends it to ty->u.sym->u.s.flist. As it walks down the list to
its end, newfield also checks for duplicate field names.
An oncoming colon signifies a bit field, and fields must check the
field's type, parse its field width, and check that the width is legal:
(parse one field 281)+=
    if (t == ':') {
        if (unqual(p->type) != inttype
        &&  unqual(p->type) != unsignedtype) {
            error("'%t' is an illegal bit-field type\n",
                p->type);
            p->type = inttype;
        }
        t = gettok();
        p->bitsize = intexpr(0, 0);
        if (p->bitsize > 8*inttype->size || p->bitsize < 0) {
            error("'%d' is an illegal bit-field size\n",
                p->bitsize);
            p->bitsize = 8*inttype->size;
        } else if (p->bitsize == 0 && id) {
            warning("extraneous 0-width bit field '%t %s' ignored\n",
                p->type, id);
            p->name = stringd(genlabel(1));
        }
        p->lsb = 1;
    }
If a field or bit field is declared const, assignments to that field are
forbidden. Structure assignments must also be forbidden. For example,
given the definition
    struct { int code; const int value; } x, y;
x.code and y.code can be changed, but x.value and y.value cannot.
Assignments like x = y are also illegal, and they're caught in asgntree
by inspecting the structure type's cfields flag, which is set here, along
with the vfields flag, which records volatile fields:
(parse one field 281)+=
    if (isconst(p->type))
        ty->u.sym->u.s.cfields = 1;
    if (isvolatile(p->type))
        ty->u.sym->u.s.vfields = 1;
At this point, the field list for Figure 11.1's example has nine elements:
the eight shown in the table on page 280 plus one between used and amt
that has a bitsize equal to 26. The lsb fields of the elements for code,
used, and amt are all equal to one, and all offset fields are zero.
Next, fields makes a pass over the field list computing offsets. It
also computes the alignment of the structure, and rebuilds the field list
omitting those field structures that represent padding, which are those
with integer names.
(assign field offsets 282)=
    int bits = 0, off = 0, overflow = 0;
    Field p, *q = &ty->u.sym->u.s.flist;
    ty->align = IR->structmetric.align;
    for (p = *q; p; p = p->link) {
        (compute p->offset 283)
        if (p->name == NULL
        || !('1' <= *p->name && *p->name <= '9')) {
            *q = p;
            q = &p->link;
        }
    }
    *q = NULL;
off is the running total of the number of bytes taken by the fields up
to but not including the one pointed to by p. bits is the number of
bits plus one taken by bit fields beyond off by the sequence of bit fields
immediately preceding p. Thus, bits is nonzero if the previous field is
a bit field, and it never exceeds unsignedtype->size. fields must also
cope with offset computations that overflow. It uses the macro add to
increment off:
(decl.c macros)=
    #define add(x,n) (x > INT_MAX-(n) ? (overflow=1,x) : x+(n))
    #define chkoverflow(x,n) ((void)add(x,n))
chkoverflow uses add to set overflow if x + n overflows. If overflow is
one at the end of fields, the structure is too big.
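The test in add avoids computing x + n before it's known to be safe: for nonnegative x and n, the sum overflows exactly when x exceeds INT_MAX - n. A function form of the same check, offered only as a sketch (checked_add is a hypothetical name, and the overflow flag is passed explicitly instead of being a local of the enclosing function), is:

```c
#include <limits.h>

/* Adds n to x, assuming x >= 0 and n >= 0. On overflow, sets *overflow
   to one and returns x unchanged, as the add macro does. */
int checked_add(int x, int n, int *overflow) {
    if (x > INT_MAX - n) {
        *overflow = 1;
        return x;
    }
    return x + n;
}
```

Because the flag is only ever set, a caller can perform a long series of additions and test it once at the end, which is how fields uses overflow.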
If the fields appear in a union, all the offsets are zero by definition:
(compute p->offset 283)=
    int a = p->type->align ? p->type->align : 1;
    if (p->lsb)
        a = unsignedtype->align;
    if (ty->op == UNION)
        off = bits = 0;
The value of a is the field's alignment; it's used below to increase the
structure's alignment, ty->align, if necessary. It's also used to round
up off to the appropriate alignment boundary:
(compute p->offset 283)+=
    else if (p->bitsize == 0 || bits == 0
    || bits - 1 + p->bitsize > 8*unsignedtype->size) {
        off = add(off, bits2bytes(bits-1));
        bits = 0;
        chkoverflow(off, a - 1);
        off = roundup(off, a);
    }
    if (a > ty->align)
        ty->align = a;
    p->offset = off;
(decl.c macros)+=
    #define bits2bytes(n) (((n) + 7)/8)
off must be rounded up if p isn't a bit field, isn't preceded by fields that
ended in the middle of an unsigned, or is a bit field that's too big to fit
in the unsigned partially consumed by previous bit fields. Before off is
For the example in Figure 11.1, the loop in (assign field offsets) ends
with ty->size equal to 26, the last value of off, which is not a multiple
of 4, the value of ty->align, so this concluding code bumps ty->size
to 28.
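The offsets and the final size quoted above can be reproduced with a small stand-alone model of the offset pass. This is a simplified sketch and not lcc's fields: struct member, layout, and fig11_1 are hypothetical, bits counts used bits without the plus-one convention, unions and overflow checks are omitted, and a little-endian target with 32-bit, 4-byte-aligned storage units is assumed.

```c
#include <stddef.h>

/* Hypothetical, simplified field description; not lcc's field structure. */
struct member {
    const char *name;   /* NULL for an unnamed bit field */
    int size;           /* size in bytes (unused for bit fields) */
    int align;          /* alignment in bytes */
    int bitsize;        /* width in bits, or -1 if not a bit field */
    int offset;         /* computed: byte offset of the field or its unit */
    int lsb;            /* computed: least significant bit plus one, or 0 */
};

enum { UNIT_BYTES = 4, UNIT_BITS = 32 };

static int round_up(int x, int n) { return (x + n - 1) / n * n; }

/* Lays out n members of a structure; returns the structure's size
   rounded up to its alignment, which is stored in *align_out. */
int layout(struct member *m, size_t n, int *align_out) {
    int off = 0;        /* bytes used before the current storage unit */
    int bits = 0;       /* bits used in the current storage unit */
    int align = 1;
    size_t i;

    for (i = 0; i < n; i++) {
        int isbit = m[i].bitsize >= 0;
        int a = isbit ? UNIT_BYTES : m[i].align;
        /* Start at a fresh boundary unless this bit field fits in the
           storage unit partially consumed by the preceding bit fields. */
        if (!isbit || m[i].bitsize == 0 || bits == 0
        ||  bits + m[i].bitsize > UNIT_BITS) {
            off = round_up(off + (bits + 7) / 8, a);
            bits = 0;
        }
        if (a > align)
            align = a;
        m[i].offset = off;
        if (isbit) {
            m[i].lsb = bits + 1;
            bits += m[i].bitsize;
        } else {
            m[i].lsb = 0;
            off += m[i].size;
        }
    }
    *align_out = align;
    return round_up(off + (bits + 7) / 8, align);
}

/* The fields of Figure 11.1, under the assumptions stated above. */
struct member fig11_1[] = {
    { "a",    5, 1, -1, 0, 0 }, { "s1",   2, 2, -1, 0, 0 },
    { "s2",   2, 2, -1, 0, 0 }, { "code", 0, 4,  3, 0, 0 },
    { "used", 0, 4,  1, 0, 0 }, { NULL,   0, 4, 26, 0, 0 },
    { "amt",  0, 4,  7, 0, 0 }, { "last", 4, 4, -1, 0, 0 },
    { "id",   2, 2, -1, 0, 0 },
};
```

Applied to fig11_1, the model places code, used, and the unnamed field in the unsigned at offset 12, amt in the unsigned at 16, last at 20, and id at 24, for 26 bytes that round up to 28.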
(funcdefn 286)
}
funcdefn has much to do. It must parse the optional declarations for old-
style functions, reconcile new-style declarations with old-style definitions
and vice versa, and initialize the front end in preparation for parsing
compound-statement, which contributes to the code list for the function.
Once the compound-statement is consumed, funcdefn must finalize the
code list for traversal when the back end calls gencode and emitcode,
arrange the correct arguments to the interface procedure function, and
re-initialize the front end once code for the function has been generated.
funcdefn's sclass, id, and ty parameters give the storage class, func-
tion name, and function type gleaned from the declarator parsed by decl.
pt is the source coordinate of the beginning of that declarator. params
is the array of symbols built by parameters - one for each parameter,
and an extra unnamed one if the parameter list ended with an ellipsis.
funcdefn starts by removing this extra symbol because it's used only in
prototypes, and it checks for illegal return types:
(funcdefn 286)=
    if (isstruct(rty) && rty->size == 0)
        error("illegal use of incomplete type '%t'\n", rty);
    for (n = 0; params[n]; n++)
        ;
    if (n > 0 && params[n-1]->name == NULL)
        params[--n] = NULL;
oldstyle 63 params helps funcdefn build two parallel arrays of pointers to symbol-
parameters 271
table entries. ca11 ee is an array of entries for the parameters as seen by
the function itself, and ca 11 er is an array of entries for the parameters
as seen by callers of the function. Usually, the corresponding entries in
these arrays are the same, but they can differ when argument promotions
force the type of a caller parameter to be different than the type of the
corresponding callee parameter, as shown in Section 1.3. The storage
classes of the caller and callee parameters can also be different when,
for example, a parameter is declared register by the callee but is passed
on the stack by the caller. The details of building callee and caller
depend on whether the definition is old-style or new-style:
{funcdefn 286)+=
...
286 290 286
.....
if (ty->u.f.oldstyle) {
11.6 • FUNCTION DEFINITIONS 287
New-style definitions are the easier of the two because parameters has
already done most of the work, so params can be used as callee. The
caller parameters are copies of the corresponding callee parameters,
except that their types are promoted and they have storage class AUTO to
indicate that they're passed in memory.
(initialize new-style parameters 287)=
    callee = params;
    caller = newarray(n + 1, sizeof *caller, FUNC);
    for (i = 0; (p = callee[i]) != NULL && p->name; i++) {
        NEW(caller[i], FUNC);
        *caller[i] = *p;
        caller[i]->type = promote(p->type);
        caller[i]->sclass = AUTO;
        if ('1' <= *p->name && *p->name <= '9')
            error("missing name for parameter %d to function '%s'\n",
                i + 1, id);
    }
    caller[i] = NULL;
Recall that parameters uses the parameter number for a missing parameter
identifier, so funcdefn must check for such identifiers. Identifiers
can be omitted in declarations but not in function definitions.
For old-style definitions, parameters has simply collected the identifiers
in the parameter list and checked for duplicates. funcdefn must
parse their declarations and match the resulting identifiers with the ones
in params. It uses params for the caller, makes a copy for use as callee,
and calls decl to parse the declarations.
(initialize old-style parameters 287)=
    caller = params;
(decl.c functions)+=
    static void oldparam(p, cl) Symbol p; void *cl; {
        int i;
        Symbol *callee = cl;
(initialize old-style parameters 287)+=
    for (i = 0; (p = callee[i]) != NULL; i++) {
        if (!p->defined)
            callee[i] = dclparam(0, p->name, inttype, &p->src);
        *caller[i] = *p;
        caller[i]->sclass = AUTO;
        if (unqual(p->type) == floattype)
            caller[i]->type = doubletype;
        else
            caller[i]->type = promote(p->type);
    }
callee's two symbols have types (CHAR) and (FLOAT), but caller's sym-
bols have types (INT) and (DOUBLE). As shown in Section 1.3, these dif-
ferences cause assignments of the caller values to the callee values at
the entry to f.
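The entry assignments can be pictured in plain C. The sketch below only illustrates the idea and is not what lcc emits: the structure and function names are hypothetical, and the two parameters are called c and x here for the sake of the example.

```c
/* What a caller passes to an old-style f with a char and a float
   parameter after the default argument promotions: an int and a double. */
struct caller_args { int c; double x; };

/* What f itself sees. */
struct callee_params { char c; float x; };

/* The assignments made at entry: each caller value is narrowed to the
   type of the corresponding callee parameter. */
struct callee_params enter_f(struct caller_args in) {
    struct callee_params p;
    p.c = (char)in.c;
    p.x = (float)in.x;
    return p;
}
```

When the caller and callee types agree, as they do for most parameters, no such assignment is needed.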
The standard permits mixing old-style definitions and new-style decla-
rations, as the code in this book illustrates, but the definitions must agree
with the declarations and vice versa. If a new-style declaration precedes
an old-style definition, the function is deemed to be a new-style function,
and the old-style definition must provide a parameter list whose types
are compatible with the declaration.
(initialize old-style parameters 287)+=
    p = lookup(id, identifiers);
an execution point for the function's entry point, and calls compound,
which is described in the next section.
(funcdefn 286)+=
    labels = table(NULL, LABELS);
(decl.c data)=
    static int regcount;
The call to definelab adds the exit-point label, and definept plants the
accompanying execution point. lcc warns about the possibility of an
implicit return for functions that return values other than integers, or
for all nonvoid functions if its -A option is specified. The final steps
in parsing the function are to close the scope opened by compound and
check for unreferenced parameters:
(funcdefn 286)+=
    exitscope();

    walk(NULL, 0, 0);
    cp = code(Blockbeg);
    enterscope();
    (compound 294)
    cp->u.block.level = level;
    cp->u.block.identifiers = identifiers;
    cp->u.block.types = types;
    code(Blockend)->u.begin = cp;
    if (level > LOCAL) {
        exitscope();
        expect('}');
    }
}
compound is called from statement and from funcdefn. The only
difference between these two calls is that the scope is closed only on the
call from statement. As shown above, funcdefn closes the scope that
compound opens on its behalf so that it can call the interface procedure
function before doing so.
Most of compound's semantic processing concerns the locals declared
in the block. dcllocal processes each local and appends it to one of the
lists
(decl.c data)+=
    static List autos, registers;
depending on its explicit storage class. Locals with no storage class are
appended to autos, and static locals are handled like globals.
If compound is called from funcdefn, it must cope with the interface
flag wants_callb. When this flag is one, the back end handles the
transmission of the return value for functions that return structures. The
front end generates space for this value in the caller, but it doesn't know
how to transmit the address of this space to the callee. It assumes that
the back end will arrange to pass this address in a target-dependent way
and to store it in the first local. So, compound generates the first local
and saves its symbol-table entry in retv:
(compound 294)=
    autos = registers = NULL;
    if (level == LOCAL && IR->wants_callb
    && isstruct(freturn(cfunc->type))) {
        retv = genident(AUTO, ptr(freturn(cfunc->type)), level);
        retv->defined = 1;
        retv->ref = 1;
        registers = append(retv, registers);
    }
    walk(NULL, 0, 0);
    foreach(identifiers, level, checkref, NULL);
As the statements are compiled, idtree increments the ref fields of the
identifiers they use. Thus, at the end of statements, the ref fields
identify the most frequently accessed variables. checkref, described below,
changes the storage class of any scalar variable referenced at least three
times to REGISTER, unless its address is taken. compound sorts the locals
beginning at cp->u.block.locals[nregs] in decreasing order of ref values.
(compound 294)+=
    {
        int i = nregs, j;
        Symbol p;
        for ( ; (p = cp->u.block.locals[i]) != NULL; i++) {
            for (j = i; j > nregs
                && cp->u.block.locals[j-1]->ref < p->ref; j--)
                cp->u.block.locals[j] = cp->u.block.locals[j-1];
            cp->u.block.locals[j] = p;
        }
    }
Some of these locals now have REGISTER storage class, and sorting them
on their estimated frequency of use permits the back end to assign
registers to those that are used most often without having it do its own
analysis. The locals in cp->u.block.locals[0..nregs-1] may be less
frequently referenced than the others, but they're presented to the back
end first because the programmer explicitly declared them as registers.
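The sort is an insertion sort that never moves an element below index nregs. A stand-alone sketch of the same loop follows; struct local and sort_locals are hypothetical stand-ins for lcc's symbols and for the fragment above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a symbol: only the fields the sort needs. */
struct local {
    const char *name;
    float ref;          /* estimated number of references */
};

/* Sorts locals[nregs..] (NULL-terminated) in decreasing order of ref,
   leaving locals[0..nregs-1] where they are. */
void sort_locals(struct local **locals, int nregs) {
    int i, j;
    struct local *p;

    for (i = nregs; (p = locals[i]) != NULL; i++) {
        /* Shift less frequently referenced locals up to make room for p. */
        for (j = i; j > nregs && locals[j-1]->ref < p->ref; j--)
            locals[j] = locals[j-1];
        locals[j] = p;
    }
}
```

With one explicit register local r followed by locals whose ref values are 2, 9, and 5, the result is r, then the locals with ref 9, 5, and 2; r stays first even though it is referenced least.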
checkref is called at the ends of compound statements for every symbol
in the identifiers table, and it does more than change storage
classes.
(decl.c functions)+=
    static void checkref(p, cl) Symbol p; void *cl; {
        (checkref 296)
    }
(checkref 296)+=
    if (Aflag >= 2 && p->defined && p->ref == 0) {
        if (p->sclass == STATIC)
            warning("static '%t %s' is not referenced\n",
                p->type, p->name);
        else if (p->scope == PARAM)
            warning("parameter '%t %s' is not referenced\n",
                p->type, p->name);
        else if (p->scope >= LOCAL && p->sclass != EXTERN)
            warning("local '%t %s' is not referenced\n",
                p->type, p->name);
    }
Like dclglobal and dclparam, dcllocal starts by checking for an invalid
storage class:
(dcllocal 298)=
    if (sclass == 0)
        sclass = isfunc(ty) ? EXTERN : AUTO;
    else if (isfunc(ty) && sclass != EXTERN) {
        error("invalid storage class '%k' for '%t %s'\n",
            sclass, ty, id);
        sclass = EXTERN;
    } else if (sclass == REGISTER
    && (isvolatile(ty) || isstruct(ty) || isarray(ty))) {
        warning("register declaration ignored for '%t %s'\n",
            ty, id);
        sclass = AUTO;
    }
Local variables may have any storage class, but functions must have no
storage class or extern. Volatile locals and those with aggregate types
may be declared register, but lcc treats them as automatics.
Next, dcllocal checks for redeclarations:
(dcllocal 298)+=
    q = lookup(id, identifiers);
    p->type = ty;
    p->sclass = sclass;
    p->src = *pos;
    switch (sclass) {
    case EXTERN:   (extern local 300) break;
    case STATIC:   (static local 300) break;
    case REGISTER: (register local 299) break;
    case AUTO:     (auto local 299) break;
    }
Automatic and register locals are the easy ones; they're simply appended
to the appropriate list:
(register local 299)=
    registers = append(p, registers);
    regcount++;
    p->defined = 1;
(auto local 299)=
    autos = append(p, autos);
    p->defined = 1;
regcount is the number of locals explicitly declared register anywhere
in a function, and is used in checkref, above. Unlike globals, a local's
defined flag is lit when it's declared, before it's passed to the back end,
which occurs in gencode. Locals are treated this way because they can be
declared only once (in a given scope), and their declarations are always
definitions.
Most of the work for static locals is in dealing with the optional
initialization, which is the same as what initglobal does for globals:
        r->src = p->src;
        r->type = p->type;
        r->sclass = p->sclass;
        q = lookup(id, globals);
        if (q && q->sclass != TYPEDEF && q->sclass != ENUM)
            r = q;
    }
    if (r && !eqtype(r->type, p->type, 1))
        warning("declaration of '%s' does not match previous declaration at %w\n",
            r->name, &r->src);
}

    t = gettok();
    definept(NULL);
    if (isscalar(p->type)
    || isstruct(p->type) && t != '{') {
        if (t == '{') {
            t = gettok();
            e = expr1(0);
            expect('}');
        } else
            e = expr1(0);
    } else {
        (generate an initialized static t1 302)
        e = idtree(t1);
    }
    walk(root(asgn(p, e)), 0, 0);
    p->ref = 1;
}
if (!isfunc(p->type) && p->defined && p->type->size <= 0)
    error("undefined size for '%t %s'\n", p->type, id);
For a local that has a scalar type, a structure type, or a union type, and
whose initializer is a single expression, the initialization is an assignment
of the initializer to the local. For a local that has an aggregate type
and a brace-enclosed initializer, lcc generates an anonymous static variable,
and initializes it as specified by the initializer. A single structure
assignment initializes the local, even for arrays.
(generate an initialized static t1 302)=
    Symbol t1;
    Type ty = p->type, ty1 = ty;
    while (isarray(ty1))
        ty1 = ty1->type;
    if (!isconst(ty) && (!isarray(ty) || !isconst(ty1)))
        ty = qual(CONST, ty);
    t1 = genident(STATIC, ty, GLOBAL);
    initglobal(t1, 1);
    if (isarray(p->type) && p->type->size == 0
    && t1->type->size > 0)
        p->type = array(p->type->type,
            t1->type->size/t1->type->type->size, 0);
This static will never be modified, so a const qualifier is added to its
type, which causes initglobal to define it in the LIT segment.
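In source terms, the effect resembles copying a hidden constant into the local. The following sketch approximates that for a local declared as int a[] = { 1, 2, 3 }; it is an illustration rather than lcc's output, the names init_values and sum_initialized_local are hypothetical, and memcpy stands in for the single block assignment.

```c
#include <string.h>

/* Plays the role of the anonymous static that holds the initializer. */
static const int init_values[] = { 1, 2, 3 };

/* Roughly what a function containing  int a[] = { 1, 2, 3 };  does once
   the initialization has become an aggregate copy from the static. */
int sum_initialized_local(void) {
    int a[sizeof init_values / sizeof init_values[0]];  /* size comes from the static */
    size_t i;
    int sum = 0;

    memcpy(a, init_values, sizeof a);   /* the single block assignment */
    for (i = 0; i < sizeof a / sizeof a[0]; i++)
        sum += a[i];
    return sum;
}
```

The array's size is taken from the static, which mirrors the way the fragment above completes an array type whose size was omitted.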
11.8 Finalization
As suggested in the previous section, checkref is also called at the end
of compilation for each file-scope identifier, i.e., those with scope GLOBAL.
This call comes from finalize, which also processes externals and globals.
(decl.c functions)+=
    void finalize() {
        foreach(externals, GLOBAL, doextern, NULL);
        foreach(identifiers, GLOBAL, doglobal, NULL);
        foreach(identifiers, GLOBAL, checkref, NULL);
        foreach(constants, CONSTANTS, doconst, NULL);
    }
int x;
int x;
int x;
is valid and each declaration is a tentative definition for x. A file-scope
declaration with an initializer is an external definition, and there may be
only one such definition.
At the end of a translation unit, those file-scope identifiers that have
only tentative definitions must be finalized; this is accomplished by
assuming that the translation unit includes a file-scope external definition
for the identifier with an initializer equal to zero. For example, x is
finalized by assuming
    int x = 0;
Uninitialized file-scope objects are thus initialized to zero by definition.
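The rule is easy to confirm with a C compiler. In this short sketch, read_x is a hypothetical helper added only so the value can be inspected:

```c
int x;          /* tentative definition */
int x;          /* another one; still valid C */
int x;          /* and another */

/* No declaration of x in this file has an initializer, so x is defined
   as if by "int x = 0;" at the end of the translation unit. */
int read_x(void) {
    return x;
}
```

Adding an initializer to any one of the three declarations would make it the external definition; adding initializers to two of them would be an error.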
doglobal processes each identifier in identifiers.
(decl.c functions)+=
    static void doglobal(p, cl) Symbol p; void *cl; {
(main 306)=
    {
        int i, j;
        for (i = argc - 1; i > 0; i--)
            if (strncmp(argv[i], "-target=", 8) == 0)
                break;
        if (i > 0) {
            for (j = 0; bindings[j].name; j++)
                if (strcmp(&argv[i][8], bindings[j].name) == 0)
                    break;
            if (bindings[j].ir)
                IR = bindings[j].ir;
            else {
                fprint(2, "%s: unknown target '%s'\n", argv[0],
                    &argv[i][8]);
                exit(1);
            }
        }
    }
    if (!IR) {
        int i;
        fprint(2, "%s: must specify one of\n", argv[0]);
        for (i = 0; bindings[i].name; i++)
            fprint(2, "\t-target=%s\n", bindings[i].name);
        exit(1);
    }
If no -target option is given, lcc lists the available targets and exits.
Once IR points to an interface record, the front end is bound to a target
and this binding cannot be changed for the duration of the translation unit.
Next, main initializes the front end's type system and parses its other
options:
(main 306)+=
    typeinit();
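The search for the -target option can be isolated in a function for experimentation. This is a sketch under stated assumptions: struct interface, struct binding, and find_target are hypothetical stand-ins for lcc's Interface, bindings, and the fragment of main shown above, and, unlike main, the function reports a missing option and an unknown target the same way, by returning NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for lcc's Interface and bindings types. */
struct interface { int id; };
struct binding { const char *name; struct interface *ir; };

/* Scans argv from the end for "-target=name" and returns the matching
   interface record, or NULL if the option is absent or the name is
   unknown. bindings must end with an entry whose name and ir are NULL. */
struct interface *find_target(int argc, char *argv[],
                              const struct binding *bindings) {
    int i, j;

    for (i = argc - 1; i > 0; i--)
        if (strncmp(argv[i], "-target=", 8) == 0)
            break;
    if (i <= 0)
        return NULL;
    for (j = 0; bindings[j].name; j++)
        if (strcmp(&argv[i][8], bindings[j].name) == 0)
            break;
    return bindings[j].ir;      /* NULL if the loop reached the terminator */
}
```

Scanning from the end means that the last -target option on the command line wins, which lets a driver's default be overridden by a later explicit option.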
Further Reading
Ritchie (1993) gives a detailed history of C's development and describes
the origins and peculiarities of its declaration syntax, which is one of C's
distinguishing characteristics and the one that is most often criticized.
Sethi (1981) summarizes the ramifications of those design decisions, and
proposes an alternative syntax for declarators in which pointers are
denoted by the suffix ^ as in Pascal instead of C's prefix *. If his alternative
had been adopted, dclr and dclr1 would be much simpler.
Like most high-level languages, C demands that identifiers be declared
before they are used (functions are the lone exception). This rule forces
language designers to permit multiple declarations and induces rules
such as those for C's tentative definitions. Much of the code in dclX,
doglobal, and doextern is devoted to dealing with these design deci-
sions. Modula-3 (Nelson 1991) is one of the few languages that permits
declarations and uses to appear in any order and avoids ordering rules
altogether, which is simpler to understand. This design decision does
have its own impact on the compiler, but that impact is no greater than
the impact of C's rules governing multiple declarations.
Exercises
11.1 dclr1 accepts the erroneous declaration int *const const *p, yet
lcc issues the expected diagnostic
    illegal type 'const const pointer to int'
Where and how is this error detected?
11.2 dclr1's implementation looks peculiar. The syntax specification
on page 266 suggests that dclr1 begin with a loop that consumes
pointer followed by parsing the rest of declarator. Rewrite dclr1
using this approach. You'll find that you'll need to append the
pointer portion of the inverted type to the inverted type constructed
by parsing the rest of declarator. Change your implementation into
one similar to dclr1's by applying program transformations.
11.3 Type names are used in casts and as operands to sizeof (see (type
cast) and (sizeof)). The syntax for type definitions is
    type-name:
        { type-specifier | type-qualifier } [ abstract-declarator ]
    abstract-declarator:
        * { type-qualifier }
        pointer '(' abstract-declarator ')'
        { suffix-abstract-declarator }
        pointer { suffix-abstract-declarator }
    suffix-abstract-declarator:
        '[' [ constant-expression ] ']'
        '(' parameter-list ')'
An abstract-declarator is a declarator without an embedded identifier.
Implement
(main.c exported functions)=
    extern Type typename ARGS((void));
Explain how.
11.6 In fields, the field with the largest alignment determines the
alignment of the entire structure, which is correct only because the sizes
and alignments of the basic types must be powers of two. Revise
fields so that it is correct for any positive values for the sizes and
alignments of the basic types.
11.7 A bit field declaration like unsigned:0 causes subsequent bit fields
to be placed in the next addressable storage unit, even if there's
room in the current one. For example, if the declaration in
Figure 11.1 is rewritten as
    struct {
        char a[5];
        short s1, s2;
        unsigned code:3, :0, used:1;
the code field stays in the unsigned at offset 12, and used lands to
the right of amt in the unsigned at 16. Explain how fields handles
this case.
11.8 Reading fields is excruciating. Write a new (presumably better)
version and compare the two versions side-by-side. Is your
version easier to understand? Do you have more confidence in its
correctness?
11.9 The syntax for enumeration specifiers is
    enum-specifier:
        enum [ identifier ] '{' enumerator { , enumerator } '}'
        enum identifier
    enumerator:
        identifier
        identifier = constant-expression
The remaining missing pieces of lcc's front end are those that convert
trees to dags and append them to the code list, and the functions gencode
and emitcode, which back ends call from their function interface
procedures to traverse code lists. These pieces appear in dag.c, which exports
gencode and emitcode (see Section 5.10) and
(dag.c exported functions)+=
    extern void walk ARGS((Tree e, int tlab, int flab));
    extern Node listnodes ARGS((Tree e, int tlab, int flab));
    extern Node newnode ARGS((int op, Node left, Node right,
        Symbol p));
walk and listnodes manipulate the forest of dags defined in Section 5.5.
A sequence of forests represents the code for a function. The sequence is
formed by the Gen, Jump, and Label entries in a code list. As outlined in
Section 10.3, listnodes constructs a sequence incrementally; it converts
the tree e to a dag and appends that dag to the forest. Figures 5.2 and 5.3
(pages 86 and 87) show examples of forests.
walk converts the tree e to a dag by calling listnodes. It appends the
forest to the code list in a Gen entry, and reinitializes the front end for a
new forest. listnodes bears the complexity of converting trees to dags,
so walk is easy:
(dag.c functions)=
    void walk(tp, tlab, flab) Tree tp; int tlab, flab; {
        listnodes(tp, tlab, flab);
        if (forest) {
            code(Gen)->u.forest = forest->link;
            forest->link = NULL;
            forest = NULL;
        }
        reset();
        deallocate(STMT);
    }
FIGURE 12.1 Dag for (a+b)+b*(a+b).
Trees contain operators that are not part of the interface repertoire
listed in Table 5.1. listnodes eliminates all occurrences of these
operators, which are listed in Table 8.1, by generating code that implements
them. For example, it implements AND by annotating nodes for the
comparison operators with labels and by inserting jumps and defining labels
where necessary. Similarly, it implements FIELD, which specifies bit-field
extraction or assignment, by appropriate combinations of shifting and
masking.
A basic block is a sequence of instructions that have a single entry
and a single exit with straight-line code in between; if one instruction
in the block is executed, all the instructions are executed. Instructions
that are targets of jumps and that follow a conditional or unconditional
jump begin basic blocks. Compilers often use a flow graph to represent a
function. The nodes in a flow graph are basic blocks and the directed arcs
indicate the possible flow of control between basic blocks. lcc's code list
is not a flow graph, and the forests in Gen entries do not represent basic
blocks because they can include jumps and labels. They represent what
might be called expanded basic blocks: They do have single entry points,
but they can have multiple exits and multiple internal execution paths.
As the implementation of listnodes below reveals, this design makes
common expressions available across entire expanded basic blocks in
some cases. It thus extends the lifetimes of these subexpressions beyond
basic blocks with little extra effort.
12.1 Eliminating Common Subexpressions
listnodes takes a Tree t and builds the corresponding Node n. Trees
are defined in Section 8.1, and nodes are defined in Section 5.5. n->op
comes from t->op, and the elements of n->syms come from t->u.sym,
or from installing t->u.v in the constants table, or are fabricated from
other constants by listnodes. The elements of n->kids come from the
nodes for the corresponding elements of t->kids. n also has a count
field, which is the number of nodes that reference n as an operand.
Nodes are built from the bottom up: n is built by traversing t in
postorder with the equivalent of
    l = listnodes(t->kids[0], 0, 0);
    r = listnodes(t->kids[1], 0, 0);
    n = node(t->op, l, r, t->u.sym);
node allocates a new node and uses its arguments to initialize the fields.
To eliminate common subexpressions, node must determine if the
requested node has already been built; that is, if there's a node with the
same fields that can be used instead of building a new one.
node keeps a table of available nodes and consults this table before
allocating a new node. When it does allocate a new node, it adds that
node to the table. Building the dag shown in Figure 12.1 from the tree
shown in Figure 8.1 (page 148) illustrates how this table is constructed
and used. Table 12.1 shows the sequence of calls to node, the node
each builds, if any, and the value returned. The middle column shows
the evolution of the table consulted by node. The nodes are denoted by
numbers. The first call is for the ADDRG+P tree in the lower left corner of
Figure 8.1; node's table is empty, so it builds a node for the ADDRG+P and
returns it. The next four calls, which traverse the remainder of the tree
for a+b, are similar; each builds the corresponding node and returns it.
As nodes are returned, they're used as operands in other nodes. When
node is called for the ADDRG+P node at the leaf of the left operand of the
MUL+I, it finds that node in the table (node 3) and returns it. Similarly, it
also finds that node 4 corresponds to (INDIRI 3). node continues to find
nodes in the table, including the node for the common subexpression
a+b.
The nodes depicted in the second column of the table above are stored
in a hash table:
(dag.c data)+=
static struct dag {
    struct node node;
    struct dag *hlink;
} *buckets[16];
int nodecount;
dag structures hold a node and a pointer to another dag in the same
hash bucket. nodecount is the total number of nodes in buckets. The
hash table rarely has more than a few tens of nodes, which is why it
has only 16 buckets. node searches buckets for a node with the same
operator, operands, and symbol and returns it if it's found; otherwise, it
builds a new node, adds it to buckets, and returns it.
(dag.c functions)+=
static Node node(op, l, r, sym)
int op; Node l, r; Symbol sym; {
    int i;
    struct dag *p;

    /* hash on the operator and the symbol's address */
    i = (opindex(op)^((unsigned)sym>>2))&(NELEMS(buckets)-1);
    for (p = buckets[i]; p; p = p->hlink)
        if (p->node.op == op && p->node.syms[0] == sym
        && p->node.kids[0] == l && p->node.kids[1] == r)
            return &p->node;
    /* not found: build a node and enter it in the table */
    p = dagnode(op, l, r, sym);
    p->hlink = buckets[i];
    buckets[i] = p;
    ++nodecount;
    return &p->node;
}
dagnode allocates and initializes a new dag and its embedded node. It
also increments the count fields of the operand nodes, if there are any.
(dag.c functions)+=
static struct dag *dagnode(op, l, r, sym)
int op; Node l, r; Symbol sym; {
    struct dag *p;

    NEW0(p, FUNC);
    p->node.op = op;
    if ((p->node.kids[0] = l) != NULL)
        ++l->count;
    if ((p->node.kids[1] = r) != NULL)
        ++r->count;
    p->node.syms[0] = sym;
    return p;
}
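The lookup-then-build discipline of node and dagnode can be tried in isolation. The following is a minimal sketch, not lcc's code: the MNode type, the lookup_node function, and the use of a string pointer as the symbol are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the node table; names are illustrative, not lcc's. */
enum { NBUCKETS = 16 };            /* a power of two, so masking selects a bucket */

typedef struct mnode {
    int op;                        /* operator code */
    struct mnode *kids[2];         /* operands, or NULL */
    const char *sym;               /* symbol, compared by pointer identity */
    int count;                     /* number of nodes using this one as an operand */
    struct mnode *hlink;           /* next node in the same bucket */
} MNode;

MNode *mbuckets[NBUCKETS];
int mcount;                        /* total number of nodes in mbuckets */

/* Return an existing node with identical fields, or build and enter a new one. */
MNode *lookup_node(int op, MNode *l, MNode *r, const char *sym) {
    unsigned i = ((unsigned)op ^ (unsigned)((size_t)sym >> 2)) & (NBUCKETS - 1);
    MNode *p;

    assert((NBUCKETS & (NBUCKETS - 1)) == 0);
    for (p = mbuckets[i]; p != NULL; p = p->hlink)
        if (p->op == op && p->sym == sym && p->kids[0] == l && p->kids[1] == r)
            return p;              /* common subexpression: reuse it */
    p = calloc(1, sizeof *p);
    if (p == NULL)
        abort();
    p->op = op;
    p->sym = sym;
    if ((p->kids[0] = l) != NULL)
        ++l->count;                /* one more node refers to the left operand */
    if ((p->kids[1] = r) != NULL)
        ++r->count;
    p->hlink = mbuckets[i];        /* enter the new node at the head of its bucket */
    mbuckets[i] = p;
    ++mcount;
    return p;
}
```

Requesting the same operator, operands, and symbol twice returns the same pointer, so a second request for a+b adds nothing to the table and leaves the operand counts unchanged.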
Only newnode can be used by back ends to build nodes for their own
uses, such as generating code to spill registers.
Nodes appear in buckets as long as the values they represent are
valid. Assignments and function calls can invalidate some or all nodes
in buckets. For example, in
c = a + b;
a = a/2;
d = a + b;
the value of a+b computed in the third line isn't the same as the value
computed in the first line. The second line's assignment to a invalidates
the node (INDIRI a) where a is the node for the lvalue of a. Operators
with side effects, such as ASGN and CALL, must remove the nodes that
they invalidate. While these nodes are different for each such operator,
lcc handles only two cases: Assignments to an identifier remove nodes
for its rvalue, and all other operators with side effects remove all nodes.
kill handles assignments:
(dag.c functions)+=
static void kill(p) Symbol p; {
    int i;
    struct dag **q;

    for (i = 0; i < NELEMS(buckets); i++)
        for (q = &buckets[i]; *q; )
            if ((*q represents p's rvalue 316)) {
                *q = (*q)->hlink;
                --nodecount;
            } else
                q = &(*q)->hlink;
}
The obvious rvalue of p is a dag of the form (INDIR (ADDRxP p)), where
the ADDRxP is any of the address operators. The less obvious case is a
dag of the form (INDIR α) where α is an arbitrary address computation,
which might compute the address of p. Both cases are detected by
(*q represents p's rvalue 316)=
generic((*q)->node.op) == INDIR &&
(!isaddrop((*q)->node.kids[0]->op)
|| (*q)->node.kids[0]->syms[0] == p)
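kill removes entries from each bucket by walking a pointer to the link field rather than a pointer to the entry, which avoids a special case for the head of the chain. A minimal sketch of that idiom follows; the Entry type and its dead flag are hypothetical, with the flag standing in for the "represents p's rvalue" test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain entry; 'dead' stands in for the "represents p's rvalue" test. */
typedef struct entry {
    int dead;
    struct entry *hlink;
} Entry;

/* Unlink every entry marked dead from the chain at *head; return how many were removed. */
int unlink_dead(Entry **head) {
    Entry **q = head;
    int removed = 0;

    assert(head != NULL);
    while (*q != NULL)
        if ((*q)->dead) {
            *q = (*q)->hlink;      /* bypass the entry; q still names the same link */
            ++removed;
        } else
            q = &(*q)->hlink;      /* advance to the next link field */
    return removed;
}
```

Because q always addresses the link that points at the current entry, removing the first entry and removing an interior entry are the same assignment.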
Only the INDIR nodes must be removed, because that's enough to make
subsequent searches fail. For example, after the assignment a = a/2,
the node for a+b remains in buckets. But the a+b in the assignment to d
won't find it because the reference to the rvalue of a builds a new node,
which causes a new node to be built for a+b. The sequence of calls to
(dag.c functions)+=
Node listnodes(tp, tlab, flab) Tree tp; int tlab, flab; {
    Node p = NULL, l, r;

    if (tp == NULL)
        return NULL;
    if (tp->node)
        return tp->node;
    switch (generic(tp->op)) {
    (listnodes cases 318)
    }
    tp->node = p;
    return p;
}
tp->node points to the node for the tree tp. This field marks the tree
as visited by listnodes, and ensures that listnodes returns the correct
node for trees that are really dags, such as those shown in Figures 8.2
and 8.3 (pages 167 and 158). The multiply referenced subtrees in these
idioms are visited more than once; the first visit builds the node and the
subsequent visits simply return it.
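The effect of the tp->node annotation can be seen in a few lines. This is a sketch under simplifying assumptions: the MTree type is hypothetical, and an integer node number replaces the pointer to a real node.

```c
#include <stddef.h>

/* Hypothetical tree with a memo slot, loosely modeled on the tp->node field. */
typedef struct mtree {
    struct mtree *kids[2];
    int node;                      /* 0 until visited, then the assigned node number */
} MTree;

int built;                         /* number of nodes built so far */

/* Return the node number for t, building it only on the first visit. */
int visit_once(MTree *t) {
    if (t == NULL)
        return 0;
    if (t->node != 0)
        return t->node;            /* shared subtree: reuse the earlier result */
    visit_once(t->kids[0]);
    visit_once(t->kids[1]);
    t->node = ++built;             /* operands are numbered before their parent */
    return t->node;
}
```

A root whose two operands are the same subtree builds two nodes, not three: the second visit to the shared operand returns the number recorded by the first.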
The switch statement in listnodes collects the operators into groups
that have the same traversal and node-building code:
(listnodes cases 318)=
case AND: { (AND 323) } break;
case OR: { (OR) } break;
case NOT: { (NOT 322) }
case COND: { (COND 325) } break;
case CNST: { (CNST 327) } break;
case RIGHT: { (RIGHT 335) } break;
case JUMP: { (JUMP 321) } break;
case CALL: { (CALL 332) } break;
case ARG: { (ARG 334) } break;
case EQ: case NE: case GT: case GE: case LE:
case LT: { (EQ..LT 321) } break;
case ASGN: { (ASGN 328) } break;
case BOR: case BAND: case BXOR:
case ADD: case SUB: case RSH:
case LSH: { (ADD..RSH 319) } break;
case DIV: case MUL:
case MOD: { (DIV..MOD) } break;
case RET: { (RET) } break;
case CVC: case CVD: case CVF: case CVI:
case CVP: case CVS: case CVU: case BCOM:
case NEG: { (CVx,NEG,BCOM 319) } break;
forest is a circularly linked list, so it points to the last node on the list,
unless it's null, which causes list to initialize forest. The link field
also marks the node as a root, and list won't list roots more than once.
The comparison operators illustrate the use of the arguments tlab
and flab to listnodes. Only one of tlab or flab can be nonzero. The
operator jumps to tlab if the outcome of the comparison is true and to
flab if the outcome is false. Nodes for comparison operators carry the
destination as a label symbol in their syms[0] fields. This symbol is the
destination when the comparison is true; there is no way to specify a
destination for a false outcome. The case for the comparison operators
thus uses the inverse operator when flab is nonzero:
(EQ..LT 321)=
Node p;
l = listnodes(tp->kids[0], 0, 0);
r = listnodes(tp->kids[1], 0, 0);
if (tlab)
    list(newnode(tp->op, l, r, findlabel(tlab)));
else if (flab) {
    int op = generic(tp->op);
    switch (generic(op)) {
    case EQ: op = NE + optype(tp->op); break;
    case NE: op = EQ + optype(tp->op); break;
    case GT: op = LE + optype(tp->op); break;
    case LT: op = GE + optype(tp->op); break;
    case GE: op = LT + optype(tp->op); break;
    case LE: op = GT + optype(tp->op); break;
    }
    list(newnode(op, l, r, findlabel(flab)));
}
if (forest->syms[0])
    forest->syms[0]->ref++;
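Jumping to flab when a comparison is false is the same as jumping to flab when the inverse comparison is true. The sketch below checks that equivalence on integers; the enum cmp codes and both functions are hypothetical and unrelated to lcc's operator encoding.

```c
/* Hypothetical operator codes; lcc's real values differ. */
enum cmp { CMP_EQ, CMP_NE, CMP_GT, CMP_GE, CMP_LT, CMP_LE };

/* Return the comparison that is true exactly when op is false (for integers). */
enum cmp invert(enum cmp op) {
    switch (op) {
    case CMP_EQ: return CMP_NE;
    case CMP_NE: return CMP_EQ;
    case CMP_GT: return CMP_LE;
    case CMP_LT: return CMP_GE;
    case CMP_GE: return CMP_LT;
    case CMP_LE: return CMP_GT;
    }
    return op;
}

/* Evaluate comparison op on two ints. */
int holds(enum cmp op, int a, int b) {
    switch (op) {
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_GT: return a >  b;
    case CMP_GE: return a >= b;
    case CMP_LT: return a <  b;
    case CMP_LE: return a <= b;
    }
    return 0;
}
```

For any integers a and b, holds(invert(op), a, b) equals !holds(op, a, b), which is the property the switch in the fragment above relies on.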
listnodes also handles the control-flow operators that appear only in
trees: AND, OR, NOT, and COND. NOT is handled by simply reversing the true
and false labels in a recursive call to listnodes:
(NOT 322)=
return listnodes(tp->kids[0], flab, tlab);
AND and OR use short-circuit evaluation: They must stop evaluating
their arguments as soon as the outcome is known. For example, in
if (i >= 0 && i < 10 && a[i] > max) max = a[i];
a[i] must not be evaluated if i is less than zero or greater than or equal
to 10. The operands of AND and OR are always conditional expressions or
constants (andtree calls cond for each operand), so the cases for these
operators need only define the appropriate true and false labels and pass
them to the calls on listnodes for the operands.
Suppose tlab is zero and flab is L; the short-circuit code generated
for e1 && e2 has the form
if e1 == 0 goto L
if e2 == 0 goto L
In other words, if e1 is zero, execution continues at L and e2 is not
evaluated. Otherwise, e2 is evaluated and execution continues at L if e1 is
nonzero but e2 is zero. Control falls through only when both e1 and e2
are nonzero. When tlab is L and flab is zero, control falls through when
e1 or e2 is zero, and execution continues at L only when e1 and e2 are
both nonzero. The generated code has the form
if e1 == 0 goto L'
if e2 != 0 goto L
L':
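The two code shapes above follow mechanically from passing labels down the tree. The sketch below is a toy generator, not lcc's AND and OR cases: the Cond type, gen_cond, the out buffer, and the L-numbered label syntax are hypothetical, and it emits text where lcc builds nodes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical condition trees; only enough structure to show the label passing. */
enum ckind { C_VAR, C_AND, C_OR, C_NOT };
typedef struct cond {
    enum ckind kind;
    const char *name;              /* variable name, for C_VAR only */
    struct cond *kids[2];
} Cond;

int nextlab = 1;                   /* source of fresh label numbers */
char out[512];                     /* generated pseudo-code accumulates here */

static void emit_branch(const char *name, const char *rel, int lab) {
    size_t n = strlen(out);
    snprintf(out + n, sizeof out - n, "if %s %s 0 goto L%d\n", name, rel, lab);
}

static void emit_label(int lab) {
    size_t n = strlen(out);
    snprintf(out + n, sizeof out - n, "L%d:\n", lab);
}

/* Exactly one of tlab and flab is nonzero: jump to tlab if c is true,
   or to flab if c is false; otherwise fall through. */
void gen_cond(const Cond *c, int tlab, int flab) {
    int skip;

    switch (c->kind) {
    case C_VAR:
        if (tlab) emit_branch(c->name, "!=", tlab);
        else      emit_branch(c->name, "==", flab);
        break;
    case C_NOT:
        gen_cond(c->kids[0], flab, tlab);    /* swap the roles of the labels */
        break;
    case C_AND:
        if (flab) {                          /* any false operand jumps to flab */
            gen_cond(c->kids[0], 0, flab);
            gen_cond(c->kids[1], 0, flab);
        } else {                             /* a false first operand skips the second */
            skip = nextlab++;
            gen_cond(c->kids[0], 0, skip);
            gen_cond(c->kids[1], tlab, 0);
            emit_label(skip);
        }
        break;
    case C_OR:
        if (tlab) {                          /* any true operand jumps to tlab */
            gen_cond(c->kids[0], tlab, 0);
            gen_cond(c->kids[1], tlab, 0);
        } else {                             /* a true first operand skips the second */
            skip = nextlab++;
            gen_cond(c->kids[0], skip, 0);
            gen_cond(c->kids[1], 0, flab);
            emit_label(skip);
        }
        break;
    }
}
```

Calling gen_cond on an AND of e1 and e2 with flab set emits the first two-line form; with tlab set it emits the three-line form, with a generated label playing the role of L'.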
flab equal to, say, 2, and the recursive calls descend the tree passing
2 as flab. There are no intervening calls to reset, so the second
and third subexpressions can reuse values computed in the first and second
subexpressions. The forest for this statement is shown in Figure 12.3.
The 2s under the comparison operators and under the LABELV denote
the symbol-table pointers to the label 2 in their syms[0] fields.
The expression e ? l : r yields the COND tree shown in Figure 12.4.
The RIGHT tree serves only to carry the two assignment trees. The
generated code is similar to the code for an if statement:
if e == 0 goto L
t1 = l
goto L + 1
L: t1 = r
L + 1:
FIGURE 12.3 Forest for a[i] && a[i]+b[i] > 0 && a[i]+b[i] < 10.
FIGURE 12.4 Tree for e ? l : r.
if (tp->u.sym)
    addlocal(tp->u.sym);
flab = genlabel(2);
listnodes(tp->kids[0], 0, flab);
reset();
Next, the code for this case generates nodes for the first assignment, the
jump, L, the second assignment, and L + 1:
(COND 325)+=
listnodes(q->kids[0], 0, 0);
(equate LABEL to L + 1 325)
list(jump(flab + 1));
labelnode(flab);
listnodes(q->kids[1], 0, 0);
(equate LABEL to L + 1 325)
labelnode(flab + 1);
(CNST 327)=
Type ty = unqual(tp->type);
if (tlab || flab) {
    if (tlab && tp->u.v.i != 0)
        list(jump(tlab));
    else if (flab && tp->u.v.i == 0)
        list(jump(flab));
}
For the example above, nothing is generated for the CNST tree, which is
exactly what the programmer intended. A jump is generated for code
like
while (1) ...
Constants that don't appear in conditional contexts yield CNST nodes,
unless their types dictate that they should be placed out of line:
(CNST 327)+=
else if (ty->u.sym->addressed)
    p = listnodes(cvtconst(tp), 0, 0);
else
    p = node(tp->op, NULL, NULL, constant(ty, tp->u.v));
typeinit sets the addressed flag in a basic type's symbol-table entry to
one if constants of that type cannot appear in instructions. Thus, a
constant whose type's symbol has addressed set is placed in a variable and
references to it are replaced by references to the rvalue of the variable.
The constant 0.5 in Figure 1.2 (page 6) is an example; it appears in the
tree, but ends up in a variable as shown in Figure 1.3. cvtconst
generates the anonymous static variable, if necessary, and returns a tree for
the rvalue of that variable:
(dag.c functions)+=
Tree cvtconst(p) Tree p; {
    Symbol q = constant(p->type, p->u.v);
    Tree e;

    if (q->u.c.loc == NULL)
        q->u.c.loc = genident(STATIC, p->type, GLOBAL);
    if (isarray(p->type)) {
        e = tree(ADDRG+P, atop(p->type), NULL, NULL);
        e->u.sym = q->u.c.loc;
    } else
        e = idtree(q->u.c.loc);
    return e;
}
These variables are initialized when finalize calls doconst at the end
of compilation.
12.4 Assignments
Nodes for assignments are always listed and return no value. Trees for
assignments, however, mirror the semantics of assignment in C, which
returns the value of its left operand. The listnodes case for assignment
traverses the operands, and builds and lists the assignment node. It
begins by processing the operands:
(ASGN 328)=
if (tp->kids[0]->op == FIELD) {
    (l, r ← for a bit-field assignment 329)
} else {
    l = listnodes(tp->kids[0], 0, 0);
    r = listnodes(tp->kids[1], 0, 0);
}
list(newnode(tp->op, l, r, NULL));
forest->syms[0] = intconst(tp->kids[1]->type->size);
forest->syms[1] = intconst(tp->kids[1]->type->align);
An ASGN's syms fields point to symbol-table entries for constants that
give the size and alignment of the value (see page 83). Assignments to
bit fields are described below.
An assignment invalidates nodes in the node table that depend on the
previous value of the left operand. lcc handles just two cases:
(ASGN 328)+=
if (isaddrop(tp->kids[0]->op)
&& !tp->kids[0]->u.sym->computed)
    kill(tp->kids[0]->u.sym);
else
    reset();
If the left operand is the address of a source-language variable or a
temporary, the assignment kills only nodes for its rvalue. If the left operand
is the address of a computed variable or a computed address, the
assignment clears the node table. A computed variable represents the address
of a variable plus a constant, such as a field reference, and is generated by
addrtree. Assignments to computed variables are like assignments to
array elements: an assignment to a single element kills everything. Less
drastic measures require more sophisticated analyses; those that offer
the most benefit, like global common subexpression elimination, require
data-flow analysis of the entire function, which lcc is not designed to
accommodate.
The value of an assignment is the new value of the left operand, which
is the possibly converted value of the right operand, so listnodes
arranges for that node to annotate the ASGN tree:
(ASGN 328)+=
p = listnodes(tp->kids[1], 0, 0);
tp->kids[1] has already been visited and annotated with the node that's
assigned to r above. So p usually equals r, except for assignments to bit
fields, which compute r differently and may not visit tp->kids[1] at all,
as detailed below.
A FIELD tree as the left operand of an ASGN tree identifies an
assignment to a bit field. These assignments are compiled into the
appropriate sequence of shifts and bitwise logical operations. Consider, for
example, the multiple assignment w = x.amt = y where x is defined in
Figure 11.1 on page 279 and w and y are global integers. The first
assignment, x.amt = y, is compiled into the equivalent of
*β = ((*β)&0xFFFFFF80) | (y&0x7F);
where β denotes the address x+16. The word at x+16 is fetched, the bits
corresponding to the field amt are cleared, the rightmost 7 bits of y are
ORed into the cleared bits, and the resulting value is stored back into
x+16. This expression isn't quite correct: The value of x.amt = y, which
is assigned to w, is not y, it's the new value of x.amt. This value is equal
to y unless its most significant bit is one, in which case that bit must be
sign-extended if the result of the assignment is used. So, if y is 255, w
becomes -1.
listnodes handles this case by building an ASGN tree whose right
operand computes the correct value. For w = x.amt = y, it builds a right
operand that's equivalent to (y<<25)>>25, which is what should appear
in place of y in the assignment to *β above. Figure 12.5 shows the
complete tree for this multiple assignment.
The code for a bit-field assignment builds a tree for the expression
shown above, and calls listnodes to traverse it.
(l, r ← for a bit-field assignment 329)=
Tree x = tp->kids[0]->kids[0];
Field f = tp->kids[0]->u.field;
reset();
l = listnodes(lvalue(x), 0, 0);
if (fieldsize(f) < 8*f->type->size) {
    unsigned int fmask = fieldmask(f);
    unsigned int mask = fmask<<fieldright(f);
    Tree q = tp->kids[1];
    (q ← the r.h.s. tree 330)
    r = listnodes(q, 0, 0);
    } else
    r = listnodes(tp->kids[1], 0, 0);
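The arithmetic behind the x.amt = y example can be checked directly. The sketch below assumes, as in that example, a 7-bit signed field in the low-order bits of a 32-bit word; store_field and field_value are hypothetical helpers, and the sign extension is written with unsigned arithmetic so that it does not depend on how the compiler shifts negative values.

```c
#include <stdint.h>

/* Assumed layout for illustration: a 7-bit signed field in the low bits of a 32-bit word. */
enum { FIELD_BITS = 7 };
#define FIELD_MASK ((uint32_t)0x7F)

/* Store the low 7 bits of y into word, leaving the other 25 bits unchanged. */
uint32_t store_field(uint32_t word, int32_t y) {
    return (word & ~FIELD_MASK) | ((uint32_t)y & FIELD_MASK);
}

/* Value of the assignment expression: the stored field, sign-extended from bit 6.
   This is the result (y<<25)>>25 has when >> is an arithmetic shift. */
int32_t field_value(int32_t y) {
    uint32_t bits = (uint32_t)y & FIELD_MASK;
    uint32_t sign = (uint32_t)1 << (FIELD_BITS - 1);
    return (int32_t)(bits ^ sign) - (int32_t)sign;
}
```

With y equal to 255, the stored field is all ones and field_value yields -1, matching the value assigned to w in the text; a small value such as 5 comes back unchanged.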
The u.sym field of the FIELD tree tucked under the ASGN tree points to a
field structure that describes the bit field. For the amt field, fmask and
FIGURE 12.5 Tree for w = x.amt = y.
assignment sets all of the bits in the field, which can be done by ORing
the word with mask:
(q ← the r.h.s. tree 330)+=
else if (q->op == CNST+I && (q->u.v.i&fmask) == fmask
FIGURE 12.6 Forest for w = x.amt = y.
x.b++ changes the word that holds x.c. The tree returned by bittree's
second argument, rvalue(lvalue(x)) in the fragment above, causes that
word to be fetched again. If x were used instead, the assignment to x.c
would use the value of the word before x.b is changed, and the new value
of x.b would be lost.
FIGURE 12.7 Tree for f(e1, e2, ..., en) where f returns a structure.
FIGURE 12.8 Forest for f(e1, e2, ..., en) where f returns a structure.
firstarg = NULL;
if (tp->op == CALL+B && !IR->wants_callb) {
    (list CALL+B arguments 333)
    p = newnode(CALLV, l, NULL, NULL);
} else {
    l = listnodes(tp->kids[0], 0, 0);
    r = listnodes(tp->kids[1], 0, 0);
    p = newnode(tp->op, l, r, NULL);
}
list(p);
reset();
cfunc->u.f.ncalls++;
firstarg = save;
(dag.c data)+=
static Tree firstarg;
When necessary, firstarg carries the tree for the hidden first argument,
as described below. It's saved, reinitialized to null, and restored so that
arguments that include other calls don't overwrite it. A call is always
listed, and it kills all nodes in the node table. The ncalls field in a
symbol-table entry for a function records the number of CALLs that
function makes. This value supplies the fourth argument to the interface
procedure function, which is called from funcdefn.
As Figure 12.7 shows, tp->kids[0] is a RIGHT tree that holds both the
arguments and the tree for the function address. Traversing this tree
thus lists the arguments and returns the node for the function address,
which becomes the left operand of the CALL node.
For CALL+B trees, the tree for the hidden first argument is assigned to
firstarg:
(list CALL+B arguments 333)=
Tree arg0 = tree(ARG+P, tp->kids[1]->type,
    tp->kids[1], NULL);
if (IR->left_to_right)
    firstarg = arg0;
l = listnodes(tp->kids[0], 0, 0);
if (!IR->left_to_right || firstarg) {
    firstarg = NULL;
    listnodes(arg0, 0, 0);
}
If left_to_right is one, firstarg gets the tree for the hidden argument
just before the arguments are visited, which occurs when listnodes
traverses tp->kids[0], and the hidden argument will be listed before the
other arguments. When left_to_right is zero, firstarg is unnecessary
because the hidden argument is listed last anyway.
The last if statement in the fragment above also traverses the tree
for the hidden argument when left_to_right is one and firstarg is
nonnull. This case occurs for a call to a function that returns a structure
but that has no arguments. For this case, the firstarg will not have
been traversed by the ARG code below because tp->kids[0] contains no
ARG trees for this call.
An ARG subtree is built as the arguments are parsed from left to right,
and thus it always has the rightmost argument as the root, as shown
in Figure 12.7. The ARG nodes can be listed left to right by visiting
tp->kids[1] before tp->kids[0]; visiting the operands in the other
order lists the ARG nodes right-to-left.
(ARG 334)=
if (IR->left_to_right)
    listnodes(tp->kids[1], 0, 0);
if (firstarg) {
    Tree arg = firstarg;
    firstarg = NULL;
    listnodes(arg, 0, 0);
}
l = listnodes(tp->kids[0], 0, 0);
list(newnode(tp->op, l, NULL, NULL));
forest->syms[0] = intconst(tp->type->size);
forest->syms[1] = intconst(tp->type->align);
if (!IR->left_to_right)
    listnodes(tp->kids[1], 0, 0);
Like an ASGN node, the syms fields of an ARG node point to symbol-table
entries for constants that give the size and alignment of the argument.
The first time execution reaches the test of firstarg when the flag
left_to_right is one is when the ARG for the first argument (the one
for e1 in Figure 12.7) is traversed. If firstarg is nonnull, it's listed
before the tree for the first argument and reset to null so that it's traversed
only once.
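Because the ARG chain is rooted at the rightmost argument, the listing order depends only on whether the recursive visit happens before or after the current argument is listed. A minimal sketch of that choice follows; the ArgT type, list_args, and the order buffer are hypothetical, and each argument is reduced to a one-character name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ARG chain: name is this argument, rest is the chain for the
   arguments to its left, so the root holds the rightmost argument. */
typedef struct argt {
    const char *name;
    struct argt *rest;
} ArgT;

char order[64];                    /* names are appended here as they are "listed";
                                      large enough for the short demo chains only */

void list_args(const ArgT *a, int left_to_right) {
    if (a == NULL)
        return;
    if (left_to_right)
        list_args(a->rest, left_to_right);   /* earlier arguments first */
    strcat(order, a->name);                  /* "list" this argument */
    if (!left_to_right)
        list_args(a->rest, left_to_right);   /* earlier arguments afterwards */
}
```

For a three-argument chain rooted at the third argument, a nonzero left_to_right produces the order 1, 2, 3 and zero produces 3, 2, 1.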
FIGURE 12.10 Forest for f(e1, g(e2), e3).
Compare this figure to Figure 12.7. The arguments in Figure 12.9
appear as the right operand of an extra RIGHT node, which is shaded, and
the left operand of that node is the tree for the nested call g(e2). f's
second ARG node refers to the value of the call to g.
listnodes traverses this tree in the code for CALL described in
Section 12.5. When the left and only operand of the topmost CALL is
traversed, the RIGHT trees cause the nested call to g to be traversed and
listed before any of the arguments to f. Figure 12.10 shows the
resulting forest.
RIGHT trees are also used to enforce the correct semantics for the
expressions e++ and e--. Figure 12.11 shows the tree built by postfix for
i++. The RIGHT nodes collaborate to return the value of i before it's
incremented, but there's an additional complication. To enforce an
evaluation order that evaluates the INDIR+I first, that tree must be traversed
and its node listed in the forest before the assignment to i, and the node
must annotate the RIGHT tree. Listing this INDIR node is what requires
special treatment for the RIGHT idiom depicted by the lower RIGHT node
in Figure 12.11.
(tp is a tree for e++ 336)=
tp->kids[0] && tp->kids[1]
&& generic(tp->kids[1]->op) == ASGN
&& (generic(tp->kids[0]->op) == INDIR
&& tp->kids[0]->kids[0] == tp->kids[1]->kids[0]
|| (tp->kids[0]->op == FIELD
&& tp->kids[0] == tp->kids[1]->kids[0]))
As this test indicates, for postincrement or postdecrement of a bit field,
a FIELD node appears instead of an INDIR node, and this FIELD node is
the target of the assignment.
When e is not a bit field, the INDIR tree is traversed, and its node is
listed before traversing the RIGHT tree's second operand:
(generate nodes for e++ 336)=
if (generic(tp->kids[0]->op) == INDIR) {
    p = listnodes(tp->kids[0], 0, 0);
    list(p);
    listnodes(tp->kids[1], 0, 0);
}
FIGURE 12.11 Tree for i++.
Figure 5.3 (page 87) shows the forest for the assignment i = *p++. The
INDIR node for p's rvalue appears before the assignment to p.
Bit fields are problematic. listnodes can't list a FIELD node
because there isn't one: FIELD operators appear only in trees. Instead,
listnodes must look below the FIELD tree to the INDIR tree that fetches
the word in which the bit field appears, traverse that tree, and list its
node:
(generate nodes for e++ 336)+=
else {
    list(listnodes(tp->kids[0]->kids[0], 0, 0));
    p = listnodes(tp->kids[0], 0, 0);
    listnodes(tp->kids[1], 0, 0);
}
12.7 Driving Code Generation
Once the code list for a function is complete, funcdefn calls the
interface procedure function to generate and emit the code. As described in
Section 5.10, this interface function makes two calls back into the front
end: It calls gencode to generate code, and it calls emitcode to emit the
code it generates. Each of these functions makes a pass over the code
list, calling the appropriate interface function for each code-list entry.
funcdefn builds two arrays of pointers to symbol-table entries: The
callee array holds the parameters of the function as seen from within
the function, and caller holds the parameters as seen by any callers of
the function. These arrays are passed to function, which passes them
to gencode:
(dag.c functions)+=
void gencode(caller, callee) Symbol caller[], callee[]; {
    Code cp;
    Coordinate save;

    save = src;
    (generate caller to callee assignments 338)
    cp = codehead.next;
    for ( ; errcnt <= 0 && cp; cp = cp->next)
        switch (cp->kind) {
        case Address: (gencode Address 339) break;
        case Blockbeg: (gencode Blockbeg 339) break;
        case Blockend: (gencode Blockend 339) break;
        case Defpoint: src = cp->u.point.src; break;
        case Gen: case Jump:
        case Label: (gencode Gen,Jump,Label 340) break;
        case Local: (*IR->local)(cp->u.var); break;
        case Switch: break;
        }
    src = save;
}
latter symbols have been announced to the back end, the interface func-
tion address can be called to define the symbols in Address entries.
These entries can also define symbols that depend on parameters, which
have already been announced.
Gen, Jump, and Label entries carry forests that represent the code for
expressions, jumps, and label definitions. These forests are passed to
the interface function gen:
(gencode Gen,Jump,Label 340)=
if (!IR->wants_dag)
    cp->u.forest = undag(cp->u.forest);
fixup(cp->u.forest);
cp->u.forest = (*IR->gen)(cp->u.forest);
gen returns a pointer to a node. Usually, it annotates the nodes in forest,
and perhaps reorganizes and returns the forest, but this interface
permits gen to return something else that can be represented by a pointer to a
node. All of the back ends in this book return a pointer to a list of nodes
for the instructions in the forest. If gen returns null, the corresponding
call to the interface function emit, described below, is not made.
As detailed in previous sections, the forests in Gen entries can have
nodes that are referenced more than once because they represent
common subexpressions. If the interface flag wants_dag is one, gen is
passed forests with these kinds of nodes. If wants_dag is zero,
however, undag generates assignments that store common subexpressions
in temporaries, and replaces references to the nodes that compute them
by references to the temporaries. Section 12.8 reveals the details.
The syms[0] fields of nodes for the comparison operators and jumps
point to symbol-table entries for labels. These labels might be synonyms
for the real label, described in Section 10.9. fixup finds these nodes and
changes their syms[0] fields to point to the real labels.
forest 311 ....
gen
Gen
92
217
(dag.c functions}+=
static void fixup(p) Node p; {
337 341 ...
gen 402 for ( ; p; p = p->link)
IR 306
Jump 217 switch (generic(p->op)) {
Label 217 case JUMP:
undag 343 if (p->kids[OJ->op == ADDRG+P)
wants_dag 89 p->kids[OJ->syms[OJ =
equated(p->kids[OJ->syms[OJ);
break;
case EQ: case GE: case GT: case LE: case LT: case NE:
p->syms[OJ = equated(p->syms[OJ);
}
}
When equatelab makes L1 a synonym for L2, it sets the u.l.equatedto
field of the symbol-table entry for L1 to the symbol-table entry for L2.
fixup need inspect only the root nodes in the forest, because JUMP and
the comparison operators always appear as roots.
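Resolving a synonym amounts to following equatedto links until a label with no link is reached. The sketch below illustrates that walk; the Label type and real_label are hypothetical, the body of lcc's equated is not shown in this section, and the sketch assumes the links never form a cycle.

```c
#include <stddef.h>

/* Hypothetical label record; equatedto mirrors the u.l.equatedto field named in the text. */
typedef struct label {
    int number;
    struct label *equatedto;       /* NULL unless this label is a synonym */
} Label;

/* Follow synonym links until reaching a label that is not equated to another.
   Assumes the chain of links is acyclic. */
Label *real_label(Label *p) {
    while (p != NULL && p->equatedto != NULL)
        p = p->equatedto;
    return p;
}
```

If L1 is equated to L2 and L2 to L3, both L1 and L2 resolve to L3, and L3 resolves to itself.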
Once gencode returns, the interface procedure function has all the
information it needs, such as the size of the frame and the number of
registers used, to generate the function prologue. When it's ready to emit
the generated code, it calls emitcode:
(dag.c functions)+=
void emitcode() {
    Code cp;
    Coordinate save;

    save = src;
    cp = codehead.next;
    for ( ; errcnt <= 0 && cp; cp = cp->next)
        switch (cp->kind) {
        case Address: break;
        case Blockbeg: (emitcode Blockbeg) break;
        case Blockend: (emitcode Blockend) break;
        case Defpoint: (emitcode Defpoint 341) break;
        case Gen: case Jump:
        case Label: (emitcode Gen,Jump,Label 342) break;
        case Local: (emitcode Local) break;
        case Switch: (emitcode Switch 342) break;
        }
    src = save;
}
(emitcode Defpoint 341)=
src = cp->u.point.src;
217 Jump
The cases for the code-list entries for Defpoint, Blockbeg, Blockend, and
Local don't emit code. If lcc's -g option is specified, however, these
cases call the stab interface functions to emit symbol-table information
for debuggers.
Gen, Jump, and Label entries carry the forests returned by the interface
function gen, and emitcode passes the nonnull forests to the interface
function emit:
The value-label pairs in u.swtch.values and u.swtch.labels are sorted
in ascending order by value, but those values may not be contiguous.
The default label in u.swtch.deflab is used for the missing values.
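One way to picture the use of the default label is to expand the sorted pairs into a dense table with one slot per value in the range. The sketch below does that with plain integers standing in for labels; fill_table and its parameters are hypothetical and are not the code lcc uses to emit the table, and the sketch assumes the value range is small enough that the subtraction cannot overflow.

```c
#include <stddef.h>

/* Fill table[0..span-1] from n value-label pairs sorted by ascending value, where
   span covers values[0] through values[n-1]; slots for values with no pair get
   deflab. Returns span, or 0 if there are no pairs or the table is too small. */
size_t fill_table(const int *values, const int *labels, size_t n,
                  int deflab, int *table, size_t cap) {
    size_t i = 0, k, span;

    if (n == 0)
        return 0;
    span = (size_t)(values[n-1] - values[0]) + 1;
    if (span > cap)
        return 0;
    for (k = 0; k < span; k++) {
        int v = values[0] + (int)k;        /* the value this slot stands for */
        if (i < n && values[i] == v)
            table[k] = labels[i++];        /* a case exists for v */
        else
            table[k] = deflab;             /* missing value: use the default */
    }
    return span;
}
```

For the values 1, 2, and 5, the table has five slots, and the slots for 3 and 4 hold the default label.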
12.8 Eliminating Multiply Referenced Nodes
The front end builds trees, but some of those trees are dags. listnodes
takes these trees and builds dags so that it can eliminate common
subexpressions. This section describes undag, which takes dags and turns
them back into proper trees, though they're still called dags. lcc's
unfortunate abuse of proper terminology is perhaps best dealt with by
remembering that "trees" refers to the intermediate representation built
and manipulated by the front end, and "dags" refers to the intermediate
representation passed to and manipulated by the back ends.
listnodes could be eliminated, but this would also sacrifice
common-subexpression elimination, which contributes significantly to the
quality of the generated code. The earliest versions of lcc did the
opposite: the front end built dags directly. This approach was abandoned for
the present scheme because dags made code transformations, like those
done by simplify, much more complicated. Maintaining the reference
counts, for example, was prone to error.
A node that represents a common subexpression is pointed to by the
elements of the kids arrays in at least two other nodes in the same forest,
and its count field records the number of those pointers. Back ends
can generate code directly from the dags in each forest passed to the
interface function gen, but these multiply referenced nodes complicate
code generation in general and register allocation in particular. Some
compilers thus eliminate these nodes, either in their front end or in their
code generator. They generate code to assign their values to temporaries,
and they replace the references to these nodes with references to their
temporaries. As mentioned in Section 12.7, setting the interface flag
wants_dag to zero causes lcc's front end to generate these assignments
and thus eliminate multiply referenced nodes. If wants_dag is zero, the
front end also generates assignments for CALLs that return values, even
if they're referenced only once, because listing a CALL node will give it a
hidden reference from the code list. All the code generators in this book
set wants_dag to zero.
gencode calls undag with each forest in the code list before passing
the forest to the interface function gen. undag builds and returns a new
forest, adding the necessary assignments to the new forest as it visits
each node in the old one.
(dag.c data)+=
static Node *tail;
(dag.c functions)+=
static Node undag(forest) Node forest; {
    Node p;

    tail = &forest;
    for (p = forest; p; p = p->link)
        if (generic(p->op) == INDIR
        || iscall(p) && p->count >= 1)
            visit(p, 1);
        else {
            visit(p, 1);
            *tail = p;
            tail = &p->link;
        }
    *tail = NULL;
    return forest;
}
The two arms of the if statement handle nodes that do not appear as
roots in the new forest and those that do. Listed INDIR nodes and calls
referenced by other nodes are replaced in the new forest by assignments
of their values to temporaries. All other listed nodes, such as nodes for
the comparisons, JUMP, LABEL, ASGN, and CALLs executed for side effect
only, are appended to the new forest. Here, calls includes the
multiplicative operators if the interface flag mulops_calls is one:
(dag.c macros)=
#define iscall(p) (generic((p)->op) == CALL \
    || IR->mulops_calls \
    && ((p)->op==DIV+I || (p)->op==MOD+I || (p)->op==MUL+I \
    ||  (p)->op==DIV+U || (p)->op==MOD+U || (p)->op==MUL+U))
visit traverses a dag looking for nodes that are referenced more than
once - those whose count fields exceed one. On the first encounter with
each such node, visit generates a temporary, builds an assignment of
the node to the temporary, and appends that assignment to the new
forest. When that node is encountered again, either in the same dag or
in a subsequent dag, visit replaces the reference to the node with a
new node that references the appropriate temporary. The effect is that
an assignment to a temporary appears in the new forest just before the
root of the dag that first references it.
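The traversal just described can be sketched on a miniature dag. This is not lcc's code: the node layout, the one-letter opcodes, and the helper mknode are assumptions made for illustration, variables are simply replicated, and every root is assumed to be unreferenced by other nodes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a dag node; lcc's struct node carries more. */
typedef struct node *Node;
struct node {
    char op;        /* '+', '*' operators; 'v' variable; 't' read of a
                       temporary; '=' assignment to a temporary */
    int id;         /* variable or temporary number */
    int count;      /* references from other nodes' kids */
    int tmp;        /* nonzero once the value lives in temporary tmp */
    Node kids[2];
    Node link;      /* next root in the forest */
};

static Node *tail;  /* where the next root of the new forest goes */
static int ntemps;  /* number of temporaries generated so far */

Node mknode(char op, int id, Node l, Node r) {
    Node p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->op = op; p->id = id; p->kids[0] = l; p->kids[1] = r;
    if (l) l->count++;
    if (r) r->count++;
    return p;
}

static Node visit(Node p, int listed) {
    Node a;
    if (p == NULL)
        return p;
    if (p->tmp)                        /* later encounter: read the temporary */
        return mknode('t', p->tmp, NULL, NULL);
    if (p->op == 'v' && p->count > 1)  /* cheap to recompute: replicate it */
        return mknode('v', p->id, NULL, NULL);
    p->kids[0] = visit(p->kids[0], 0);
    p->kids[1] = visit(p->kids[1], 0);
    if (p->count > 1) {                /* first encounter with a shared node */
        p->tmp = ++ntemps;
        a = mknode('=', p->tmp, p, NULL);
        *tail = a;                     /* assignment precedes the current root */
        tail = &a->link;
        if (!listed)
            p = mknode('t', p->tmp, NULL, NULL);
    }
    return p;
}

/* Rebuild the forest in place; assumes no root is referenced by another node. */
Node undag(Node forest) {
    Node p;
    tail = &forest;
    for (p = forest; p; p = p->link) {
        visit(p, 1);
        *tail = p;
        tail = &p->link;
    }
    *tail = NULL;
    return forest;
}
```

For a single root that multiplies a shared sum by itself, undag returns a two-root forest: an assignment of the sum to temporary 1, followed by the multiplication, whose operands are now two separate reads of that temporary.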
An example helps illustrate visit's details. The forest for the state-
ment in
register int n, *q;
n = *q++ = f(n, n);
is shown in Figure 12.12. There are five common subexpressions and
thus five multiply referenced nodes: the lvalues of q and n, the rvalues
of q and n, and the call to f. Figure 12.13 shows the forest returned by
undag. Only two of these common subexpressions have been replaced by
temporaries: t2 is assigned the rvalue of q and t3 is assigned the value
returned by the call to f. There are no temporaries for the lvalues of q
and n because it's just as easy to recompute them, which is why there
are two (ADDRLP q) nodes and three (ADDRLP n) nodes in Figure 12.13.
As detailed below, there's no temporary for the rvalue of n because n is
a register, so it's cheaper to replicate the INDIR node that references n,
which is why there are two (INDIRI (ADDRLP n)) dags in Figure 12.13.

FIGURE 12.13 Forest for n = *q++ = f(n, n) when wants_dag is zero.
(Diagram not reproduced.)

The forest shown in Figure 12.13 is what might be generated if the state-
ment above were written as
register int n, *q, *t2, t3;
t2 = q;
q = t2 + 1;
t3 = f(n, n);
*t2 = t3;
n = t3;
visit traverses the dag rooted at p and returns either p or a node for
the temporary that holds the value represented by p:
(dag.c functions)+=
static Node visit(p, listed) Node p; int listed; {
    if (p)
        (visit 346)
    return p;
}
listed is one when undag calls visit, and it's zero when visit calls
itself recursively.
(temporaries 346)=
struct {
    Node cse;
} t;
Symbol-table entries for temporaries that hold common subexpressions
are identified as such by having nonnull u.t.cse fields. These fields
point to the nodes that represent their values. Back ends may use this
information to identify common subexpressions that are cheaper to re-
compute than to burn a register for.
A nonnull p->syms[2] also marks p as a common subexpression, so
references to p must be replaced by references to the temporary, which
is visit's first step:

(visit 346)=
if (p->syms[2])
    p = tmpnode(p);

tmpnode builds and returns the dag (INDIR (ADDRLP p->syms[2])), which
references the temporary's rvalue:

(dag.c functions)+=
static Node tmpnode(p) Node p; {
    Symbol tmp = p->syms[2];

    if (--p->count == 0)
        p->syms[2] = NULL;
    p = newnode(INDIR + (type suffix for tmp->type 346),
        newnode(ADDRLP, NULL, NULL, tmp), NULL, NULL);
    p->count = 1;
    return p;
}
This case also reveals why undag can't be called earlier - for example,
from walk. The storage class of locals and parameters isn't certain
until the front end has consumed the entire function. Once consumed,
funcdefn calls checkref, which changes the storage class of frequently
accessed locals and parameters to REGISTER. If undag were called from
walk, it would generate temporaries for automatic locals and parameters
that might later become registers.
The last two cases cover INDIRB nodes and nodes for common
subexpressions. Registers can't hold structures, so there's no point in
copying them to temporaries; visit just replicates the INDIRB node:
(visit 346)+=
else if (p->op == INDIRB) {
    --p->count;
    p = newnode(p->op, p->kids[0], NULL, NULL);
    p->count = 1;
    (visit the operands 347)
} else {
    (visit the operands 347)
    (p->syms[2] <- a generated temporary 346)
    *tail = asgnnode(p->syms[2], p);
    tail = &(*tail)->link;
    if (!listed)
        p = tmpnode(p);
}

The else clause handles the first encounter with a common
subexpression. After traversing the operands, visit generates a
temporary, as described above, and calls asgnnode, which builds the
assignment of p to that temporary:

(dag.c functions)+=
static Node asgnnode(tmp, p) Symbol tmp; Node p; {
    p = newnode(ASGN + (type suffix for tmp->type 346),
        newnode(ADDRLP, NULL, NULL, tmp), p, NULL);
    p->syms[0] = intconst(tmp->type->size);
    p->syms[1] = intconst(tmp->type->align);
    return p;
}
Further Reading
Using the code list to represent a function's code is idiosyncratic to lcc.
A flow graph is the more traditional representation. As detailed in
traditional compiler texts, such as Aho, Sethi, and Ullman (1986), the
nodes in a flow graph are basic blocks and the edges represent branches
from blocks to their successors. A flow graph is the representation
usually used for optimizations that lcc doesn't do. Many intra-procedural
optimization algorithms that discover and improve the code in loops use
flow graphs, for example.
The bottom-up hashing algorithm used by node to discover common
subexpressions is also known as value numbering, and it's been used in
compilers since the late 1950s. The node numbers shown in Tables 12.1
and 12.2 are the value numbers of the nodes with which they are associ-
ated. Value numbering is also used in data-flow algorithms that compute
information about available expressions in a flow graph. This informa-
tion can be used to eliminate common subexpressions that are used in
more than one basic block.
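A minimal sketch of bottom-up value numbering follows. It is not lcc's node function: the table sizes, the hash function, and the name value_number are assumptions chosen for illustration. Each distinct (operator, left, right) triple receives one number, so a repeated subexpression maps to the number it was given the first time.

```c
#include <assert.h>

#define NBUCKETS 64
#define MAXNODES 256

struct vnode {
    int op;      /* operator code; each leaf uses its own code */
    int l, r;    /* value numbers of the operands, or -1 for none */
    int next;    /* next entry in the same hash bucket, or -1 */
};

static struct vnode nodes[MAXNODES];
static int nnodes;
static int buckets[NBUCKETS];
static int initialized;

/* Return the value number for (op, l, r), reusing a matching entry. */
int value_number(int op, int l, int r) {
    unsigned h;
    int i;

    assert(l >= -1 && r >= -1);
    if (!initialized) {
        for (i = 0; i < NBUCKETS; i++)
            buckets[i] = -1;
        initialized = 1;
    }
    h = ((unsigned)op * 31u + (unsigned)(l + 1) * 17u + (unsigned)(r + 1))
        % NBUCKETS;
    for (i = buckets[h]; i >= 0; i = nodes[i].next)
        if (nodes[i].op == op && nodes[i].l == l && nodes[i].r == r)
            return i;              /* common subexpression found */
    if (nnodes >= MAXNODES)
        return -1;                 /* table full; a real table would grow */
    nodes[nnodes].op = op;
    nodes[nnodes].l = l;
    nodes[nnodes].r = r;
    nodes[nnodes].next = buckets[h];
    buckets[h] = nnodes;
    return nnodes++;
}
```

Numbering the leaves a and b and then asking twice for a+b returns the same number both times, while b+a gets a new number because this sketch does not normalize commutative operators.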
The scheme used in Section 12.3 to generate short-circuit code for the
&& and || operators is similar to the approach described by Logothetis
and Mishra (1981). That approach and lcc's propagate true and false
labels. Another approach, called backpatching, propagates lists of holes
- the empty targets of jumps. Once the targets are known, these lists
are traversed to fill the jumps. This approach works particularly well
with syntax-directed translations in bottom-up parsers (Aho, Sethi, and
Ullman 1986).
Most compilers generate code from trees, but some use dags; Aho,
Sethi, and Ullman (1986) describe the relevant code-generation
algorithms for trees and dags and weigh their pros and cons. Earlier
versions of lcc included code generators that accepted dags. Instruction
selection in these code generators was described with compact
"programs" in a language designed specifically for generating code from
lcc's dags (Fraser 1989). This language was used to write code
generators for the VAX, Motorola 68030, SPARC, and MIPS. All the code
generators in this book use trees.
Exercises
12.1 kill continues searching buckets for rvalues of p even after it's
found and removed the first one. Give a C fragment that illustrates
why there can be more than one kind of INDIR node for p in buckets
at the same time. Hint: casts.
12.2 Implement (OR), the case in listnodes for the OR operator.
while (a[i] && a[i]+b[i] > 0 && a[i]+b[i] < 10) ...
lcc generates
        S1
        goto L + 1
    L:  S2
    L + 1:
Revise lcc to omit the goto and the dead code S2.
12.9 Figure 12.5 is the tree for the assignment w = x.amt = y; the lower
ASGN+I tree is the tree for the single assignment x.amt = y. If
the value of a bit-field assignment isn't used, asgntree's efforts
in building a tree for the right operand that computes the correct
result of the assignment are wasted and generate unnecessary code.
Whenever the front end realizes that the value of a tree isn't used,
it passes the tree to root and uses the tree returned in place of the
original; see expr0 and expr for examples. Study root and extend
it to simplify the right-hand sides of bit-field assignments when
possible.
12.10 asgntree and the listnodes code in Section 12.4 collaborate to
compute the result of a bit-field assignment by sign-extending or
masking when necessary. Similar cases occur for other
assignments. For example,
    int i;
    short s;
    i = s = 0xFFFF;
The bit fields b and c are in the same word, so that word is fetched
twice and stored twice.
12.13 Managing labels and their synonyms is an instance of the union-
find problem, which is described in Chapter 30 of Sedgewick (1990).
Replace equatelab, fixup, and equated with versions that use the
path-compression algorithm commonly used for solving union-find
problems. Measure the improvement in lcc's execution time. If
there's no significant improvement, explain why.
12.14 Why doesn't visit treat ADDRGP nodes like ADDRLP and ADDRFP
nodes?
13
Structuring the Code Generator
The code generator supplies the front end with interface functions that
find target-dependent instructions to implement machine-independent
intermediate code. Interface functions also assign variables and tempo-
raries to registers, to fixed cells in memory, and to stacks, which are also
in memory.
A recurring priority throughout the design of lcc's back end has been
overall simplicity. Few compiling texts include any production code gen-
erators, and we present three. Typical modest handwritten code gener-
ators require 1,000-1,500 lines of C. Careful segregation of the target-
specific material has cut this figure roughly in half. The cost is about
1,000 lines of machine-independent code, but we break even at two tar-
gets and profit from there on out; more important, it's easier to get a
new code generator right if we use as much preexisting (i.e., machine-
independent) code as possible.
lcc segregates some target-specific material by simply reorganizing
mostly machine-independent functions into a large machine-independent
routine that calls a smaller target-specific routine. It segregates other
material by isolating it in tables; for example, lcc's register allocator is
largely machine-independent, and processes target-specific data held in
structures that have a target-independent form. Finally, lcc segregates
some target-specific material by capturing it in languages specialized for
concise expression of the material; for example, lcc uses a language
tailored for expressing instruction selectors, and this language includes
a sublanguage for driving a code emitter.
To the machine-independent part of the code generator, target-specific
operations are like hot coals; they must be handled indirectly, with
"tongs." If a machine-independent routine must emit a store instruc-
tion, for example, it can't just call print. It must create an ASGN dag and
generate code for it, or escape to a target-specific function that emits the
instruction, or emit a predefined target-specific template. All these solu-
tions need more code than a print call, but they can still pay off because
they simplify retargeting. For example, a less machine-independent reg-
ister spiller with target-dependent parts for each of three targets might
take less code overall than lcc's machine-independent spiller. But
debugging spillers is hard, so it can save time to debug one machine-
independent spiller instead of three simpler target-specific ones.
The next chapters cover instruction selection, register allocation, and
the machine-specific material. This chapter describes the overall
organization of the code generator and its data structures. It also
treats a few loose ends that are machine-independent but don't fit
cleanly under instruction selection or register allocation.
The rest of this book uses the term tree to denote a tree structure
stored in node records, where the previous chapters use the term dag
for structures built from nodes. To make matters worse, the previous
chapters use the term tree for structures that multiply reference at least
some nodes, so they aren't really trees. Changing terms in midstream
is confusing, but the alternative is even worse. lcc originally used code
generators that worked on dags, but the code generators in this book
require trees; if subsequent text used "dag," it would be wrong, because
some of the algorithms fail if the inputs are not pure trees. lcc still
constructs dags in order to eliminate common subexpressions, but the
code in this book clears wants_dag.

13.1 Organization of the Code Generator
The front end calls the interface procedure function to generate code
for a routine. function decides how to receive and store the formals,
then calls gencode in the front end. gencode calls gen in the back end
for each forest in the code list. When gencode returns, the back end has
seen the entire routine and has computed the stack size and registers
used, so function emits the procedure prologue, then calls emitcode in
the front end, which calls emit in the back end for each forest in the
code list. When emitcode returns, function emits the epilogue and returns.
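The order of these calls can be sketched with stubs that only record a trace. Everything here is a stand-in: the real routines take arguments that are omitted, the code list is reduced to a hypothetical count of two forests, and the trace buffer exists only for this sketch.

```c
#include <assert.h>
#include <string.h>

#define NFORESTS 2           /* hypothetical code list with two forests */

static char trace[256];

static void note(const char *s) {
    assert(strlen(trace) + strlen(s) + 2 <= sizeof trace);
    strcat(trace, s);
    strcat(trace, " ");
}

/* Back end: called once per forest. */
static void gen(int forest)  { (void)forest; note("gen"); }
static void emit(int forest) { (void)forest; note("emit"); }

/* Front end: walk the code list and call back into the back end. */
static void gencode(void)  { int i; for (i = 0; i < NFORESTS; i++) gen(i); }
static void emitcode(void) { int i; for (i = 0; i < NFORESTS; i++) emit(i); }

/* Back end: the interface procedure that the front end calls per routine. */
void function(void) {
    note("formals");         /* decide how to receive and store formals */
    gencode();               /* select instructions, allocate registers */
    note("prologue");        /* frame size and registers are known now */
    emitcode();              /* emit the selected instructions */
    note("epilogue");
}

const char *get_trace(void) { return trace; }
```

The point of the two passes is visible in the trace: every forest passes through gen before the prologue is emitted, because the prologue depends on the stack size and register usage that gen computes.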
gen coordinates the routines that select instructions and allocate
register temporaries for those instructions: rewrite, prune, linearize,
and ralloc. rewrite selects instructions for a single tree. prune projects
subinstructions - operations such as those computed by addressing
modes - out of the tree because they don't need registers, and
eliminating them now simplifies the register allocator. linearize orders
for output the instructions that remain. ralloc accepts one node,
allocates a target register for it, and frees any source registers that
are no longer needed.
rewrite coordinates the routines that select instructions: prelabel,
_label, and reduce. prelabel identifies the set of registers that suits
each node, and edits a few trees to identify more explicitly nodes that
read and write register variables. _label is automatically generated from
a grammar that describes the target machine's instructions. It labels a
tree with all plausible implementations that use the target instructions.
reduce selects the implementation that's cheapest.
emit coordinates the routines that emit instructions and that identify
some instructions that need not be emitted: emitasm, requate,
and moveself. requate identifies some unnecessary register-to-register
copies, and moveself identifies instructions that copy a register to itself.
emitasm interprets assembler templates that are a bit like printf format
strings. emitasm escapes to a target-specific emit2 for a few instructions
too complex for templates.

13.2 Interface Extensions
The material in the back end falls into two categories: target-specific
versus machine-independent, and private to the back end versus visible
to the front end. The two categories combine to divide the back end four
ways. Here's a sample routine of each kind from Table 13.1:

    Routine Name    Private?    Target-specific?
    gen             no          no
    function        no          yes
    rewrite         yes         no
    _label          yes         yes
The back end makes several passes over the forest of trees. The first
pass calls x.doarg as it encounters each ARG node. lcc needs doarg in
order to emit code compatible with tricky calling conventions.
x.target marks tree nodes that must be evaluated into a specific
register:

(Xinterface 355)+=
    void (*target) ARGS((Node));

For example, return values must be developed into the return register,
and some machines develop quotients and remainders into fixed
registers. The mark takes the form of an assignment to the node's
syms[RX], which records the result register for the node. Section 13.5
elaborates.
x.clobber spills to memory and later reloads all registers destroyed
by a given instruction:

(Xinterface 355)+=
    void (*clobber) ARGS((Node));

It usually takes the form of a switch on the node's opcode; each of the
few cases calls spill, which is a machine-independent procedure that
saves and restores a given set of registers.

13.3 Upcalls
Just as the back end uses some code and data in the front end, so
the target-specific code in the back end uses some code and data in
the machine-independent part of the back end. Most front-end routines
reached by upcalls are simple and at or near leaves in the call graph,
so it is easy for Chapter 5 to explain them. The back end's internal
analogues are less simple and cannot, in general, be described out of
context. They're summarized here so that retargeters can find them all
in one spot; consult the page cited in the mini-index for the definition
and - perhaps better yet - consult Chapters 16-18 for sample uses.
Indeed, perhaps the best way to retarget lcc is to adapt one of the
existing code generators; having a complete set of sample upcalls is one
of the attractions.
(config.h 355)+=
extern int askregvar ARGS((Symbol, Symbol));
extern void blkcopy ARGS((int, int, int,
    int, int, int[]));
extern int getregnum ARGS((Node));
extern int mayrecalc ARGS((Node));
extern int mkactual ARGS((int, int));
extern void mkauto ARGS((Symbol));

13.4 Node Extensions
(Xnode fields 358)+=
    short inst;

The tree parallels the one in the front end's kids, but the nodes
computed by subinstructions like addressing modes are projected out, as
shown in Figure 1.5. That is, x.kids stores the solid lines in Figure 1.5
on page 9; kids stores all lines there.
x.kids has three elements because lcc emits SPARC and X86
instructions that read up to three source registers, namely those that
store one register to an address formed by adding two others. lcc once
generated VAX code and used instructions with up to three operands that
used up to two registers each - a base register and an index register -
so that version of the compiler had six elements in its x.kids.
At some point, the code generator must order the instructions for
output. The back end traverses the projected instruction tree in postorder
and forms in x.prev and x.next a doubly linked list of the instructions
in this execution order:

(Xnode fields 358)+=
    Node prev, next;

For example, Figure 13.1 shows this list for Figure 1.5. It omits the trees
threaded through kids and x.kids.
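A postorder walk that threads such a list might look like the sketch below. It is an illustration under assumptions, not lcc's linearize: the struct inst type and its field names are hypothetical, and the list is a plain NULL-terminated one tracked by explicit first and last pointers.

```c
#include <assert.h>
#include <stddef.h>

struct inst {
    const char *name;
    struct inst *kids[3];      /* projected instruction tree, like x.kids */
    struct inst *prev, *next;  /* execution order, like x.prev and x.next */
};

/* Append the instructions under p to the list, children before parents. */
void linearize(struct inst *p, struct inst **first, struct inst **last) {
    int i;

    assert(first != NULL && last != NULL);
    if (p == NULL)
        return;
    for (i = 0; i < 3; i++)    /* operands must execute before p */
        linearize(p->kids[i], first, last);
    p->prev = *last;
    p->next = NULL;
    if (*last)
        (*last)->next = p;
    else
        *first = p;            /* p is the first instruction in the list */
    *last = p;
}
```

For an add whose operands are two loads, the resulting order is the first load, the second load, then the add, which is the order in which the instructions must execute.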
The register allocator uses x.prevuse to link all nodes that read and
write the same temporary:

(Xnode fields 358)+=
    Node prevuse;

Some calling conventions pass the first few arguments in registers, so the
back end helps out by recording the argument number in the x.argno
field of ARG nodes:

(Xnode fields 358)+=
    short argno;

Each node extension holds several flags that identify properties of the
node. Roots in the forest need some special treatment from, for example,
the register allocator, so the back end flags them using x.listed:

(Xnode flags 359)=
    unsigned listed:1;

    INDIRD    "fld qword ptr %0\n"
    ASGNF     "fstp dword ptr %0\n"
    INDIRF    "fld dword ptr %0\n"
    CVFD      "# nop\n"
    ADDD      "fadd%1\n"
    CVDI      "sub esp,4\nfistp dword ptr 0[esp]\npop %c\n"
    RETI      "# ret\n"
    LABELV    "%a:\n"

The register allocator and the emitter can traverse some nodes more
than once, but they must allocate a register and emit the node only at
the first traversal, so they set x.registered and x.emitted to prevent
reprocessing:

(Xnode flags 359)+=
    unsigned registered:1;
    unsigned emitted:1;

lcc rearranges some expression temporaries to eliminate instructions;
to facilitate these optimizations, the back end uses x.copy to mark all
instructions that copy one register to another, and it uses x.equatable to
mark those that copy a register to a common-subexpression temporary:

(Xnode flags 359)+=
    unsigned copy:1;
    unsigned equatable:1;

Some common subexpressions are too cheap to deserve a register. To
save such registers, the back end uses x.mayrecalc to mark nodes
for computing common subexpressions that can be reevaluated safely.
The back end adds two generic opcodes for node structures. LOAD
represents a register-to-register copy. The back end inserts a LOAD node
when a parent needs an input in one register and the child yields a differ-
ent register. For example, if a function is called and its value is assigned
to a register variable, then the child CALL yields the return register, and
the parent needs a LOAD to copy it to the register variable.
If the back end assigns a local or formal to a register, it substitutes
VREG for all ADDRFP or ADDRLP opcodes for the variable. Register and
memory references need different code, and a different opcode tells us
which to emit. There is sure to be an ASGN or INDIR node above the
VREG; otherwise, the program computes the address of a register variable,
which is forbidden. The INDIR is not torn out of the tree even though
programs fetch register variables with no true indirection.
The target-independent Regnode structure describes a target-specific
register:

(config.h 355)+=
typedef struct {
    Symbol vbl;
    short set;
    short number;
    unsigned mask;
} *Regnode;

(gen.c functions)=
Symbol mkreg(fmt, n, mask, set)
...
    NEW0(p, PERM);
    p->x.name = stringf(fmt, n);
    NEW0(p->x.regnode, PERM);
    p->x.regnode->number = n;
    p->x.regnode->mask = mask<<n;
    p->x.regnode->set = set;
    return p;
}
stringf is used to create a register name that includes the register
number. For example, if i is 7, then mkreg("r%d", i, 1, IREG) creates a
register named r7. A call like mkreg("sp", 29, 1, IREG) is used if
register 29 is generally called sp instead of r29.
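The name and mask computations can be tried in isolation. The sketch below is a stand-alone illustration, not lcc's mkreg: struct reg, init_reg, and the fixed-size name buffer are assumptions, and the reading of a mask of 3 as a pair of adjacent registers is likewise only an illustration.

```c
#include <assert.h>
#include <stdio.h>

enum { IREG = 0, FREG = 1 };   /* register-set numbers, as in the text */

struct reg {                   /* hypothetical stand-in for a register symbol */
    char name[16];
    int set;
    int number;
    unsigned mask;
};

/* Format the name from fmt and n, and shift mask into position n. */
void init_reg(struct reg *r, const char *fmt, int n, unsigned mask, int set) {
    assert(r != NULL && n >= 0 && n < 32);
    snprintf(r->name, sizeof r->name, fmt, n);  /* n is ignored if fmt has no %d */
    r->number = n;
    r->mask = mask << n;       /* a 1 bit for each register unit occupied */
    r->set = set;
}
```

With fmt "r%d", n equal to 7, and mask 1, the name is r7 and the mask is 0x80. With fmt "sp" the number is not printed, so the name is just sp. A mask of 3 at n equal to 2 yields 0xC, covering units 2 and 3.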
The back end also represents sets of registers; for example, if a node
must be evaluated into a specific register, the back end marks the node
with the register, but if the node can be evaluated into any one of a set
of registers, then the mark is given a value that represents the set. The
back end represents a set of registers by storing a vector of pointers to
register symbols in the x.wildcard field of a special wildcard symbol:

(fields for registers 362)+=
    Symbol *wildcard;

For example, the back end for a machine with 32 integer registers would
allocate 32 register symbols and store them in a 32-element vector. Then
it would allocate one wildcard symbol and store in its x.wildcard the
address of the vector. mkwildcard creates a register-set symbol:
(gen.c functions)+=
Symbol mkwildcard(syms) Symbol *syms; {
    Symbol p;

    NEW0(p, PERM);
    p->x.name = "wildcard";
    p->x.wildcard = syms;
    return p;
}
The x.name "wildcard" should never appear in lcc's output, but x.name
is initialized nonetheless, so that the emitter doesn't crash - and even
emits a telling register name - when the impossible happens.

13.6 Frame Layout

FIGURE 13.2 Three stack frames.
(Diagram not reproduced. It runs from high addresses at the top to low
addresses at the bottom and includes the frame for main and the frame
for f.)

    high addresses
        return address
        saved registers
                              <---- frame pointer
        locals
        outgoing arguments
    low addresses             <---- stack pointer
FIGURE 13.3 Typical frame.

(gen.c functions)+=
void mkauto(p) Symbol p; {
    offset = roundup(offset + p->type->size, p->type->align);
    p->x.offset = -offset;
    p->x.name = stringd(-offset);
}
Using the absolute value avoids questions about rounding when
dividing negative integers, and we don't assume that all offsets are
negative because for some formals, for example, they aren't.
At the beginning of each block, the front end calls blockbeg to save
the current stack offset and allocation status of each register:

(config.h 355)+=
typedef struct {
    int offset;
    unsigned freemask[2];
} Env;

(gen.c functions)+=
void blockbeg(e) Env *e; {
    e->offset = offset;
    e->freemask[IREG] = freemask[IREG];
    e->freemask[FREG] = freemask[FREG];
}
blockend also computes the maximum value of offset for the current
routine. The interface procedure function sets framesize

(gen.c data)+=
int framesize;

to maxoffset - or more to save space to store data like registers that
must be saved by the callee - and it emits a procedure prologue and
epilogue that adjust the stack pointer by framesize to allocate and
deallocate stack space for all blocks in the routine at once.
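The interplay of the running offset, block entry and exit, and the high-water mark can be sketched as follows. This is a simplified model rather than lcc's code: the names alloc_local, block_begin, block_end, and frame_high_water are hypothetical, register masks are left out, and roundup is written under the assumption that alignments are powers of two.

```c
#include <assert.h>

/* Round x up to a multiple of n; n is assumed to be a power of two. */
#define roundup(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

typedef struct { int offset; } Env;   /* saved state for one block */

static int offset;      /* bytes of locals allocated so far */
static int maxoffset;   /* largest offset seen in the routine */

/* Allocate a local and return its frame offset, which is negative. */
int alloc_local(int size, int align) {
    assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
    offset = roundup(offset + size, align);
    return -offset;
}

void block_begin(Env *e) { e->offset = offset; }

/* Leaving a block records the high-water mark and releases its locals. */
void block_end(const Env *e) {
    if (offset > maxoffset)
        maxoffset = offset;
    offset = e->offset;
}

int frame_high_water(void) { return maxoffset; }
```

Because block_end restores the saved offset, sibling blocks reuse the same stack space. A routine with 8 bytes of locals in its outer block and inner blocks needing 8 and 4 more bytes has a high-water mark of 16, not 20.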
Each routine's stack frame includes an argument-build area, which is
a block of memory for outgoing arguments, as shown in Figure 13.3. lcc
can pass arguments by pushing them onto the stack; the push
instructions allocate the block of memory implicitly. Current RISC
machines, however, have no push instructions, and simulating them with
multiple instructions is slow. On these machines, lcc allocates a block
of memory and moves each argument into its cell in the block. It creates
one block for each routine, making the block big enough for the largest
set of outgoing arguments.
The code and data that compute the offsets and block size in the
argument-build area resemble the ones above that manage automatics.
argoffset is the next available block offset. mkactual rounds it up to a
specified alignment, returns the result, and updates argoffset:

(gen.c data)+=
int argoffset;

(gen.c functions)+=
int mkactual(align, size) int align, size; {
    int n = roundup(argoffset, align);
    argoffset = n + size;
    return n;
}
docall is invoked on the CALL node that ends each list of arguments.
It clears argoffset for the next set of arguments, and computes in
maxargoffset the size of the largest block of outgoing arguments:

(gen.c data)+=
int maxargoffset;

(gen.c functions)+=
static void docall(p) Node p; {
    p->syms[0] = intconst(argoffset);
    if (argoffset > maxargoffset)
        maxargoffset = argoffset;
    argoffset = 0;
}
docall records in p->syms[0] the size of this call's argument block, so
that the caller can pop it off the stack if necessary. The X86 code
generator illustrates this mechanism.
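The bookkeeping for the argument-build area can be exercised on its own. The sketch below mirrors the logic described above under assumptions: arg_cell and end_call are hypothetical names, end_call returns the block size instead of storing it in a node, and roundup assumes power-of-two alignments.

```c
#include <assert.h>

/* Round x up to a multiple of n; n is assumed to be a power of two. */
#define roundup(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

static int argoffset;      /* next free offset in the current argument block */
static int maxargoffset;   /* size of the largest block seen so far */

/* Reserve a cell for one outgoing argument and return its offset. */
int arg_cell(int align, int size) {
    int n;

    assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
    n = roundup(argoffset, align);
    argoffset = n + size;
    return n;
}

/* Finish one call: report its block size and reset for the next call. */
int end_call(void) {
    int size = argoffset;

    if (argoffset > maxargoffset)
        maxargoffset = argoffset;
    argoffset = 0;
    return size;
}

int largest_arg_block(void) { return maxargoffset; }
```

Three arguments with (align, size) of (4, 4), (8, 8), and (4, 4) land at offsets 0, 8, and 16, so that call needs a 20-byte block. A later call with one 4-byte argument starts again at offset 0 and leaves the maximum at 20.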

13.7 Generating Code to Copy Blocks
    return;
If fewer than four bytes remain, blkcopy calls blkunroll to emit code to
copy them:

(blkcopy 367)+=
else if (size <= 2)
    blkunroll(size, dreg, doff, sreg, soff, size, tmp);
else if (size == 3) {
    blkunroll(2, dreg, doff, sreg, soff, 2, tmp);
    blkunroll(1, dreg, doff+2, sreg, soff+2, 1, tmp);
}
Figure 13.4 shows lcc generating MIPS code to copy a 20-byte structure
with four-byte alignment of the source and destination. The first
column traces the calls to the procedures above. The second shows the
corresponding emitted code. tmps is initialized to {3, 9, 10}. Chapter 16
describes the MIPS instructions and the MIPS blkloop, blkfetch, and
blkunroll. Its blkloop copies eight bytes at a time. It calls blkcopy
recursively to copy the four bytes left over just before the loop.
13.8 Initialization
parseflags recognizes the command-line options that affect code
generation. -d enables debugging output, which helps when retargeting lcc.
This book omits the calls that emit debugging output, but they're on the
companion diskette.

(gen.c data)+=
int dflag = 0;

(gen.c functions)+=
void parseflags(argc, argv) int argc; char *argv[]; {
    int i;
lcc can run on one machine - the host - and emit code for another
- the target. One machine can be a big endian and the other a little
endian, which subtly complicates emitting double constants, and is
another matter that benefits from attention during initialization.
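One way to picture the complication is to split a double into two 32-bit words and decide whether they must be exchanged. The sketch below is an assumption-laden illustration, not lcc's code: host_is_little_endian and double_words are hypothetical names, the host is assumed to use 8-byte IEEE doubles, and its floating-point word order is assumed to match its integer byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the host stores the least significant byte first. */
int host_is_little_endian(void) {
    unsigned x = 1;
    unsigned char first;

    memcpy(&first, &x, 1);
    return first == 1;
}

/*
 * Store in out[0] and out[1] the two words of d in the order in which the
 * target expects them in memory, lowest address first.
 */
void double_words(double d, int target_little_endian, uint32_t out[2]) {
    uint32_t w[2];

    assert(sizeof d == sizeof w);     /* requires an 8-byte double */
    memcpy(w, &d, sizeof w);          /* w[0] is at the lower host address */
    if (host_is_little_endian() != (target_little_endian != 0)) {
        out[0] = w[1];                /* byte orders differ: swap the words */
        out[1] = w[0];
    } else {
        out[0] = w[0];
        out[1] = w[1];
    }
}
```

The IEEE encoding of 1.0 is 0x3FF0000000000000, so a big-endian target should see the word 0x3FF00000 first and a little-endian target should see it second, whichever kind of host runs the sketch.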
lcc assumes that it is running on and compiling for machines with
IEEE floating-point arithmetic. The host and target machines need not
be the same, but both must use IEEE floating-point arithmetic. This
assumption was once constraining, but it sacrifices little now.
Further Reading
From this chapter on, it helps to be up to date on computer architecture.
For example, blkunroll's load-load-store-store pattern makes little sense
without an understanding of how loads and stores typically interact on
current machines. Patterson and Hennessy (1990) surveys computer ar-
chitecture.
Exercises
13.1 Parts of lcc assume that the target machine has at most two
register sets. Identify these parts and generalize them to handle more
register sets.
13.2 Parts of lcc assume that the target machine has at most N registers
in each register set, where N is the number of bits in an unsigned.
Identify these parts and generalize them to handle larger register
sets.
13.3 The first column in Figure 13.4 gives a call trace for
    blkcopy(25, 0, 8, 0, 20, {3, 9, 10})
when the source and destination addresses are divisible by four.
Give the analogous trace when the source and destination addresses
are divisible by two but not four.
13.4 lcc unrolls loops that copy structures of 16 or fewer bytes. This
limit was chosen somewhat arbitrarily. Run experiments to
determine if another limit suits your machine better.
14
Selecting and Emitting Instructions
stmt: ASGNI(addr,reg)
addr: ADDRLP
states that the nonterminal stmt matches each ASGNI node whose chil-
dren recursively match the nonterminals addr and reg.
The generated code generator - that is, the output of the code-
generator generator lburg - produces a tree cover, which completely
covers each input tree with patterns from the grammar rules that meet
each pattern's constraints on terminals and nonterminals. For example,
Figure 14.1 gives a cover for the tree
ASGNI(ADDP(INDIRP(ADDRLP(p)),CNSTI(4)),CNSTI(5))
using the two rules above plus a few more shown in the figure. The rules
to the side of each node identify the cover, and the shaded regions each
correspond to one instruction on most machines.
Tree grammars that describe instruction sets are usually ambiguous.
For example, one can typically increment a register by adding one to it
directly, or by loading one into another register then adding the second
register to the first. We prefer the cheapest implementation, so we aug-
ment each rule with a cost, and prefer the tree parse with the smallest
total cost. Section 14.2 shows tree labels with costs.
A partial cover that looks cheap low in the tree can look more expen-
sive when it's completed, because the cover from the root down to the
partial cover can be costly. When matching a subtree, we can't know
which matches will look good when it is completed higher in the tree,
so the generated code generator records the best match for every non-
terminal at each node. Then the higher levels can choose any available
nonterminal, even those that don't look cheap at the lower levels. This
technique - recording a set of solutions and picking one of them later
- is called dynamic programming.
The generated code generator makes two passes over each subject
tree. The first pass is a bottom-up labeller, which finds a set of patterns
that cover each subtree with minimum cost. The second pass is a top-
down reducer, which picks the cheapest cover from the set recorded by
14.1 Specifications
The following grammar describes lburg specifications. term and nonterm
denote identifiers that are terminals and nonterminals:
grammar:
    '%{' configuration '%}' { dcl } %% { rule } [ %% C code ]
dcl:
    %start nonterm
    %term { term = integer }
rule:
    nonterm : tree template [ C expression ]
tree:
    term [ '(' tree [ , tree ] ')' ]
    nonterm
template:
    " { any character except double quote } "
lburg specifications are line oriented. The tokens %{, %}, and %% must
appear alone in a line, and all of a dcl or rule must appear on a line.
The configuration is C code. It is copied verbatim into the beginning of
BURM. If there's a second %%, the text after it is also copied verbatim
into BURM, at the end.
The configuration interfaces BURM and the trees being parsed. It
defines NODEPTR_TYPE to be a visible type name for a pointer to a node
in the subject tree. BURM uses the functions or macros OP_LABEL(p),
LEFT_CHILD(p), and RIGHT_CHILD(p) to read the operator and children
from the node pointed to by p.
BURM computes and stores a void pointer state in each node of the
subject tree. The configuration section defines a macro STATE_LABEL(p)
to access the state field of the node pointed to by p. A macro is required
because lburg uses it as an lvalue. The other configuration operations
may be implemented as macros or functions.
All lburg specifications in this book share one configuration:
(lburg prefix 375)= 431 463 496
#include "c.h"
The %start directive names the nonterminal that the root of each tree
must match. If there is no %start directive, the default start symbol is
the nonterminal defined by the first rule.
The %term declarations declare terminals - the operators in subject
trees - and associate a unique, positive integral opcode with each one.
OP_LABEL(p) must return a valid opcode for node p. Each terminal has
fixed arity, which lburg infers from the rules using that terminal. lburg
restricts terminals to at most two children. lcc's terminal declarations,
for example, include:
(terminal declarations 376)= 431 463 496
%start stmt
%term ADDD=306 ADDF=305 ADDI=309 ADDP=311 ADDU=310
%term ADDRFP=279
%term ADDRGP=263
%term ADDRLP=295
%term ARGB=41 ARGD=34 ARGF=33 ARGI=37 ARGP=39
Figure 14.2 holds a partial lburg specification for lcc and a subset of
the instruction set of most machines. The second and third lines declare
terminals.
Rules define tree patterns in a fully parenthesized prefix form. Every
nonterminal denotes a tree. A chain rule is a rule whose pattern is
another nonterminal. In Figure 14.2, rules 4, 5, and 8 are chain rules.
The rules describe the instruction set and addressing modes offered
by the target machine. Each rule has an assembler code template, which
is a quoted string that specifies what to emit when this rule is used.
Section 14.6 describes the format of these templates. In Figure 14.2, the
templates are merely rule numbers.
Rules end with an optional cost. Chain rules must use constant costs,
but other rules may use arbitrary C expressions in which a denotes the
%start stmt
%term ADDI=309 ADDRLP=295 ASGNI=53
%term CNSTI=21 CVCI=85 INDIRC=67
%%
con:  CNSTI                "1"
addr: ADDRLP               "2"
addr: ADDI(reg,con)        "3"
rc:   con                  "4"
rc:   reg                  "5"
reg:  ADDI(reg,rc)         "6" 1
reg:  CVCI(INDIRC(addr))   "7" 1
reg:  addr                 "8" 1
stmt: ASGNI(addr,reg)      "9" 1
notes that unsigned constants cost nothing if they fit in a byte, and have
an infinite cost otherwise. All costs must evaluate to integers between
zero and LBURG_MAX inclusive. LBURG_MAX is defined as the largest short
integer:
(config.h 355)+=
#define LBURG_MAX SHRT_MAX
Omitted costs default to zero. The cost of a derivation is the sum of the
costs for all rules applied in the derivation. The tree parser finds the
cheapest parse of the subject tree. It breaks ties arbitrarily.
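The cost conventions above can be illustrated with a small sketch. The names byte_cost and derivation_cost are hypothetical and do not appear in lcc or lburg; the first mimics a dynamic cost that is free when an unsigned constant fits in a byte and "infinite" otherwise, and the second sums rule costs. Saturating the sum at LBURG_MAX is an assumption of this sketch, made only to keep the total within the stated range.

```c
#include <assert.h>
#include <limits.h>

#define LBURG_MAX SHRT_MAX

/* Hypothetical dynamic cost: zero if v fits in a byte, else "infinite". */
static int byte_cost(unsigned v) {
    return v <= UCHAR_MAX ? 0 : LBURG_MAX;
}

/* Sum the costs of the rules applied in a derivation.
   The sum saturates at LBURG_MAX (an assumption of this sketch). */
static int derivation_cost(const int costs[], int n) {
    long sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        assert(costs[i] >= 0 && costs[i] <= LBURG_MAX);
        sum += costs[i];
        if (sum > LBURG_MAX)
            return LBURG_MAX;
    }
    return (int)sum;
}
```

A rule whose cost expression evaluates to LBURG_MAX can still appear in the grammar; it simply never wins against any finite alternative.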
In Figure 14.2, con matches constants. addr matches trees that can be
computed by address calculations, like an ADDRLP or the sum of a register
and a constant. rc matches a constant or a reg, and reg matches any tree
that can be computed into a register. Rule 6 describes an add instruction;
its first operand must be in a register, its second operand must be a
register or a constant, and its result is left in a register. Rule 7 describes
an instruction that loads a byte, extends the sign bit, and leaves the result
in a register. Rule 8 describes an instruction that loads an address into
a register. stmt matches trees executed for side effect, which include
assignments. Rule 9 describes an instruction that stores a register into
the cell addressed by some addressing mode.
rules aren't needed here. They are needed for, say, the CNSTI node, which
matches only con directly, but its parent needs rc, and only a chain rule
records that every con is also an rc. A bottom-up tree matcher can't
know which matches are needed at a higher level, so it records all of
them and lets the top-down reduction pass select the ones required by
the winner.
Patterns can specify subtrees beyond the immediate children. For
example, rule 7 of Figure 14.2 refers to the grandchild of the CVCI node.
No separate pattern matches the INDIRC node, but rule 7's pattern
covers that node. The cost is the cost of matching the ADDRLP c as an addr
(using rule 2) plus one.
Nodes are annotated with (N, C, M) only if C is less than all previ-
ous matches of the nonterminal in rule M. For example, the ADDI node
matches a reg using rule 6; the total cost is 2. It also matches an addr
using rule 3, so chain rule 8 gives a second match for reg, also at a
total cost of 2. Only one of these matches for reg will be recorded. lburg
breaks ties arbitrarily, so there's no easy way to predict which match will
win, but it doesn't matter because they have the same cost.
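The labelling pass can be sketched by hand for the grammar of Figure 14.2. This is an illustration under stated assumptions, not the code that lburg generates: Tnode, mk, record, and label are hypothetical names, the state is kept as two plain arrays instead of lburg's encoded record, and the nonterminal numbering is arbitrary. A match is recorded only when it is strictly cheaper than the previous one, and each recorded match triggers the chain rules.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

enum { ADDI, ADDRLP, ASGNI, CNSTI, CVCI, INDIRC };   /* terminals */
enum { CON, ADDR, RC, REG, STMT, NNT };              /* nonterminals */
#define INF SHRT_MAX                                 /* "no match yet" */

typedef struct tnode {
    int op;
    struct tnode *kids[2];
    int cost[NNT];      /* best cost found for each nonterminal */
    int rule[NNT];      /* rule that achieved it; 0 means none */
} Tnode;

static Tnode *mk(int op, Tnode *l, Tnode *r) {
    Tnode *p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->op = op; p->kids[0] = l; p->kids[1] = r;
    return p;
}

/* Record a match only if it is strictly cheaper, then apply chain rules. */
static void record(Tnode *p, int nt, int cost, int rule) {
    if (cost >= p->cost[nt])
        return;
    p->cost[nt] = cost;
    p->rule[nt] = rule;
    if (nt == CON)  record(p, RC, cost, 4);          /* rc: con      */
    if (nt == REG)  record(p, RC, cost, 5);          /* rc: reg      */
    if (nt == ADDR) record(p, REG, cost + 1, 8);     /* reg: addr  1 */
}

/* Label children first, then try every rule whose root operator matches.
   Assumes a well-formed tree: each operator has the children it needs. */
static void label(Tnode *p) {
    Tnode *l, *r;
    int i;

    if (p == NULL)
        return;
    l = p->kids[0]; r = p->kids[1];
    label(l); label(r);
    for (i = 0; i < NNT; i++) { p->cost[i] = INF; p->rule[i] = 0; }
    switch (p->op) {
    case CNSTI:  record(p, CON, 0, 1); break;
    case ADDRLP: record(p, ADDR, 0, 2); break;
    case ADDI:
        record(p, ADDR, l->cost[REG] + r->cost[CON], 3);
        record(p, REG, l->cost[REG] + r->cost[RC] + 1, 6);
        break;
    case CVCI:          /* rule 7 looks through the INDIRC child */
        if (l->op == INDIRC)
            record(p, REG, l->kids[0]->cost[ADDR] + 1, 7);
        break;
    case ASGNI:
        record(p, STMT, l->cost[ADDR] + r->cost[REG] + 1, 9);
        break;
    }
}
```

Because this sketch tries rule 3 before rule 6 at an ADDI node, the tie for reg happens to go to chain rule 8; trying the rules in the other order would record rule 6 at the same cost.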
lburg generates the function
which labels the entire subject tree pointed to by a. State zero labels un-
matched trees; such trees may be corrupt or merely inconsistent with the
grammar. lburg starts all generated names with an underscore to avoid
colliding with names in BURM's C prologue and epilogue. The identifiers
are declared static and their addresses are stored in an interface record
so that 1cc can include multiple code generators. One fragment collects
the identifiers' declarations for a structure declarator:
14.3 • REDUCING THE TREE 379
The cost vector stores the cost of the best match for each nonterminal,
and the rule vector stores the rule number that achieved that cost. (Part
of the declaration above is a white lie: lburg compresses the rule field
using bit fields, but lburg supplies functions to extract the fields, so we
needn't waste time studying the encoding.)
lburg writes a function _rule, which accepts a tree's state label and
an integer representing a nonterminal:
(BURM signature 378)+=
static int _rule ARGS((void *state, int nt));
It extracts from the label's encoded vector of rule numbers the number
of the rule with the given nonterminal on the left. It returns zero if no
rule matched the nonterminal.
BURM's second pass, or reducer, traverses the subject tree top-down,
so it has the context that the labeller was missing. The root must match
the start nonterminal, so the reducer extracts the best rule for the start
nonterminal from the vector of rule numbers encoded by the root's la-
bel. If this rule's pattern includes nonterminals, then they identify a new
frontier to reduce and the nonterminals that the frontier must match.
The process begun with the root is thus repeated recursively to expose
the best cover for the entire tree. The display below traces the process
for Figure 14.3:
_rule(root, stmt) = 9
_rule(root->kids[0], addr) = 2
_rule(root->kids[1], reg) = 6
_rule(root->kids[1]->kids[0], reg) = 7
_rule(root->kids[1]->kids[0]->kids[0]->kids[0], addr) = 2
_rule(root->kids[1]->kids[1], rc) = 4
_rule(root->kids[1]->kids[1], con) = 1
Each rule's pattern identifies the subject subtrees and nonterminals
for all recursive visits. Here, a subtree is not necessarily an immediate
child of the current node. Patterns with interior operators cause the
reducer to skip the corresponding subject nodes, so the reducer may
proceed directly to grandchildren, great-grandchildren, and so on. On the
other hand, chain rules cause the reducer to revisit the current subject
node, with a new nonterminal, so x is also regarded as a subtree of x.
lburg represents the start nonterminal with 1, so nt for the initial,
root-level call on _rule must be 1. BURM defines and initializes an array
that identifies the values for nested calls:
(BURM signature 378)+=
static short *_nts[];
_nts is an array indexed by rule numbers. Each element points to a zero-
terminated vector of short integers, which encode the nonterminals for
that rule's pattern, left-to-right. For example, the following code imple-
ments _nts for Figure 14.2:
static short _r1_nts[] = { 0 };
static short _r3_nts[] = { 4, 1, 0 };
static short _r4_nts[] = { 1, 0 };
static short _r5_nts[] = { 4, 0 };
static short _r6_nts[] = { 4, 3, 0 };
static short _r7_nts[] = { 2, 0 };
static short _r9_nts[] = { 2, 4, 0 };
short *_nts[] = {
    0,          /* (no rule zero) */
    _r1_nts,    /* con: CNSTI */
    _r1_nts,    /* addr: ADDRLP */
    _r3_nts,    /* addr: ADDI(reg,con) */
    _r4_nts,    /* rc: con */
    _r5_nts,    /* rc: reg */
The user needs only _rule and _nts to write a complete reducer, but
the redundant _kids simplifies many applications:
(BURM signature 378)+=
static void _kids
    ARGS((NODEPTR_TYPE p, int rulenum, NODEPTR_TYPE kids[]));
It accepts the address of a tree p, a rule number, and an empty vector of
pointers to trees. The procedure assumes that p matched the given rule,
and it fills in the vector with the subtrees (in the sense described above)
of p that must be reduced recursively. kids is not null-terminated.
The code below shows the minimal reducer. It traverses the best cover
bottom-up and left-to-right, but it doesn't do anything during the traver-
sal. parse labels the tree and then starts the reduction. reduce gets
the number of the matching rule from _rule, the matching frontier from
_kids, and the nonterminals to use for the recursive calls from _nts.
parse(NODEPTR_TYPE p) {
    _label(p);
    reduce(p, 1);
}
This particular reducer does nothing with any node. If the node were
processed - for example, emitted or allocated a register - in preorder,
the processing code would go at the beginning of the reducer. Postorder
processing code would go at the end, and inorder code would go between
reduce's recursive calls on itself. A reducer may recursively traverse sub-
trees in any order, and it may interleave arbitrary actions with recursive
traversals.
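The reduction described in this section can be sketched in a self-contained form. The code below is an illustration, not lburg's output: Rnode, nts_tab, kids_of, and trace are hypothetical names, the per-node rule array stands in for the state label that _rule would decode, and the nonterminal numbers are chosen for this sketch only (the start nonterminal stmt is 1). The rule tables follow Figure 14.2, and the postorder action merely records each rule number.

```c
#include <assert.h>
#include <stddef.h>

enum { STMT = 1, CON, ADDR, RC, REG, NNT };  /* numbering assumed for this sketch */

typedef struct rnode {
    struct rnode *kids[2];
    short rule[NNT];   /* stand-in for the label: best rule per nonterminal */
} Rnode;

/* Nonterminals in each rule's pattern, left to right, zero-terminated. */
static short n0[] = { 0 };
static short n3[] = { REG, CON, 0 };
static short n4[] = { CON, 0 };
static short n5[] = { REG, 0 };
static short n6[] = { REG, RC, 0 };
static short n7[] = { ADDR, 0 };             /* also serves rule 8, reg: addr */
static short n9[] = { ADDR, REG, 0 };
static short *nts_tab[] = { 0, n0, n0, n3, n4, n5, n6, n7, n7, n9 };

/* Fill kids with the subtrees that rule rulenum exposes at p. */
static void kids_of(Rnode *p, int rulenum, Rnode *kids[]) {
    switch (rulenum) {
    case 3: case 6: case 9:      /* binary patterns: both children */
        kids[0] = p->kids[0];
        kids[1] = p->kids[1];
        break;
    case 4: case 5: case 8:      /* chain rules revisit p itself */
        kids[0] = p;
        break;
    case 7:                      /* CVCI(INDIRC(addr)): skip the INDIRC node */
        kids[0] = p->kids[0]->kids[0];
        break;
    }
}

static int trace[32], ntrace;    /* rule numbers, in postorder */

static void reduce(Rnode *p, int nt) {
    int i, rulenum = p->rule[nt];
    short *nts;
    Rnode *kids[2];

    assert(rulenum != 0);        /* zero means no rule matched nt */
    nts = nts_tab[rulenum];
    kids_of(p, rulenum, kids);
    for (i = 0; nts[i]; i++)
        reduce(kids[i], nts[i]);
    assert(ntrace < 32);
    trace[ntrace++] = rulenum;   /* a postorder action would go here */
}
```

Moving the line that records the rule number above the loop would turn the same traversal into a preorder one.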
Multiple reducers may be written, to implement multipass algorithms
or independent single-pass algorithms. lcc has three reducers. One
identifies the nodes that need registers, another emits code, and a third
prints a tree cover to help during debugging. They all use getrule, which
wraps _rule in some (elided) assertions and encapsulates the indirection
through IR:
(gen.c functions)+=
static int getrule(p, nt) Node p; int nt; {
    int rulenum;
The first reducer prepares for register allocation. It augments the min-
imal reducer to mark nodes that are computed by instructions and thus
may need registers:
(gen.c functions)+=
static void reduce(p, nt) Node p; int nt; {
    int rulenum, i;
    short *nts;
    Node kids[10];
[Figure: labelled trees contrasting the conventional match
(reg: INDIRI(VREGP)) with the bonus match (con: CNSTI).]
(gen.c functions)+=
static Node reuse(p, nt) Node p; int nt; {
    struct _state {
        short cost[1];
    };
    Symbol r = p->syms[RX];

    if (generic(p->op) == INDIR && p->kids[0]->op == VREG+P
    && r->u.t.cse && p->x.mayrecalc
    && ((struct _state*)r->u.t.cse->x.state)->cost[nt] == 0)
        return r->u.t.cse;
    else
        return p;
}
The first return effectively ignores the tree p and reuses the definition
of the common subexpression. If p uses a common subexpression, then the
definition of that subexpression is guaranteed to have been labelled
already, so the reducer that called reuse can't wander off into the wind.
The cast and artificial _state above are necessary evils to access the
labeller's cost of matching the tree to nonterminal nt. This book doesn't
expose the form of the state record - except for here, it's needed only in
code generated automatically from the lburg specification - though it's
easy to understand if you examine the companion diskette's source code
once you understand labelling. The length of the actual, target-specific
cost vector can't be known here, but it isn't needed, so the declaration
can pretend that the length is one.
reduce also counts the number of remaining uses for each temporary:
(count uses of temporaries 384)=
if (p->syms[RX] && p->syms[RX]->temporary) {
    p->syms[RX]->x.usecount++;
}
(mayrecalc 385)
}
return 0;
It also fails if any tree earlier in the forest clobbers an input to the com-
mon subexpression:
(mayrecalc 385)+=
for (q = head; q && q->x.listed; q = q->link)
    if (generic(q->op) == ASGN
    && trashes(q->kids[0], p->syms[RX]->u.t.cse))
        return 0;
If neither condition holds, then the common subexpression can safely be
reevaluated:
(mayrecalc 385)+=
p->x.mayrecalc = 1;
return 1;
trashes(p, q) traverses the common subexpression q and reports if the
assignment target p is read anywhere in q:
(gen.c functions)+=
static int trashes(p, q) Node p, q; {
    if (!q)
        return 0;
    else if (p->op == q->op && p->syms[0] == q->syms[0])
        return 1;
    else
        return trashes(p, q->kids[0])
            || trashes(p, q->kids[1]);
}
When reduce and its helpers are done, gen calls prune. It uses the
x.inst mark to construct a tree of just instructions in the x.kids fields.
The register allocator runs next, and only instructions need registers.
The rest of the nodes - for example, ADDP nodes evaluated automatically
by addressing hardware - need no registers, so lcc projects them out
of the tree that the register allocator sees. The original tree remains in
the kids fields. The call to prune follows a reducer, but prune itself isn't
a reducer.
(gen.c functions)+=
static Node *prune(p, pp) Node p, pp[]; {
    (prune 386)
}
return pp;
Otherwise, prune clears any trash in the node's x.kids fields:
(prune 386)+=
p->x.kids[0] = p->x.kids[1] = NULL;
If p is not an instruction, prune looks for instructions in the subtrees,
starting with the first child:
(prune 386)+=
if (p->x.inst == 0)
prune bumps pp and can later store another p into the addressed cell.
This process can't overshoot, because x.kids has been made long enough
to handle the maximum number of registers read by any target instruc-
tion, which is the same as the number of children that any instruction -
and thus any node - can have. Ideally, prune would confirm this asser-
tion, but checking would require at least one more argument that would
be read only by assertions.
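The pointer-bumping scheme that prune uses can be shown in a reduced form. The sketch below is an assumption-laden illustration, not lcc's prune: Pnode, xkids, and inst are hypothetical stand-ins for Node, x.kids, and x.inst, the special handling of temporaries is left out, and xkids is assumed to be long enough for any instruction in the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct pnode {
    struct pnode *kids[2];    /* the original tree */
    struct pnode *xkids[2];   /* instruction-only tree (x.kids in lcc) */
    int inst;                 /* nonzero if the node is an instruction */
} Pnode;

/* Store the instructions found at or below p through pp, and return
   the address of the next free cell. */
static Pnode **prune(Pnode *p, Pnode **pp) {
    if (p == NULL)
        return pp;
    p->xkids[0] = p->xkids[1] = NULL;          /* clear any trash */
    if (!p->inst)                              /* subinstruction: look below it */
        return prune(p->kids[1], prune(p->kids[0], pp));
    /* instruction: gather its own operands, then hand p to the parent */
    prune(p->kids[1], prune(p->kids[0], &p->xkids[0]));
    *pp = p;
    return pp + 1;
}
```

For the tree of Figure 14.3, with ASGNI, ADDI, and CVCI marked as instructions, this leaves ADDI as the only xkids entry of ASGNI and CVCI as the only xkids entry of ADDI.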
The dashed lines in Figure 14.5 show the x.kids that prune adds to
the tree in Figure 14.3 if ASGNI, ADDI, and CVCI are instructions and the
remaining nodes are subinstructions, which would be the case on many
current machines: CVCI loads a byte and extends its sign, ADDI adds 4,
and ASGNI stores the result. The solid lines are kids.
The display below tracks the calls on prune that are made as the
dashed links are created, but it cuts clutter by omitting calls for which p
is zero, and by naming the nodes with their opcodes:
[Figure: the tree of Figure 14.3, with solid lines for kids and dashed
lines for x.kids.]
FIGURE 14.5 Figure 14.3 pruned.
switch (p->op) {
case ADDRFP: ck(s->x.offset >= lo && s->x.offset <= hi);
case ADDRLP: ck(s->x.offset >= lo && s->x.offset <= hi);
case CNSTC:  ck(s->u.c.v.sc >= lo && s->u.c.v.sc <= hi);
case CNSTI:  ck(s->u.c.v.i >= lo && s->u.c.v.i <= hi);
14.5 Debugging
lburg augments the tree parser with an encoding of much of its input
specification. This material is not strictly necessary, but it can help
produce displays for debugging. For example, the vectors _opname and
_arity hold the name and number of children, respectively, for each
terminal:
(BURM signature 378)+=
static char *_opname[];
static char _arity[];
They are indexed by the terminal's integral opcode. lcc uses them in
dumptree, which prints the operator and any subtrees in parentheses
and separated by commas:
(gen.c functions)+=
static void dumptree(p) Node p; {
    fprint(2, "%s(", IR->x._opname[p->op]);
    if (IR->x._arity[p->op] == 0 && p->syms[0])
        fprint(2, "%s", p->syms[0]->name);
    else if (IR->x._arity[p->op] == 1)
        dumptree(p->kids[0]);
    else if (IR->x._arity[p->op] == 2) {
        dumptree(p->kids[0]);
        fprint(2, ", ");
        dumptree(p->kids[1]);
    }
    fprint(2, ")");
}
For leaves, dumptree adds p->syms[0] if it's present. It prints the tree
in Figure 14.3 as:
    int rulenum, i;
    short *nts;
    Node kids[10];

    p = reuse(p, nt);
    rulenum = getrule(p, nt);
    nts = IR->x._nts[rulenum];
    fprint(2, "dumpcover(%x)", p);
    dumprule(rulenum);
    (*IR->x._kids)(p, rulenum, kids);
    for (i = 0; nts[i]; i++)
        dumpcover(kids[i], nts[i], in+1);
}
        IR->x._templates[rulenum]);
    if (!IR->x._isinstruction[rulenum])
        fprint(2, "\n");
}
When compiling MIPS code for Figure 14.3, dumpcover prints:
dumpcover(1001e9b8) stmt: ASGNI(addr, reg) / sw $%2,%1
dumpcover(1001e790) addr: ADDRLP / %a($sp)
dumpcover(1001e95c) reg: addr / la $%c,%1
dumpcover(1001e95c) addr: ADDI(reg, con) / %2($%1)
dumpcover(1001e8a4) reg: CVCI(INDIRC(addr)) / lb $%c,%1
dumpcover(1001e7ec) addr: ADDRLP / %a($sp)
dumpcover(1001e900) con: CNSTI / %a
The next section explains x._templates and the assembler templates
after each rule.
14. 6 • THE EMITTER 391
    p = reuse(p, nt);
    rulenum = getrule(p, nt);
    nts = IR->x._nts[rulenum];
    fmt = IR->x._templates[rulenum];
    (emitasm 392)
    return 0;
}
emit sets x.emitted to flag nodes as it emits them. When emitasm
encounters an instruction that it has already emitted, it emits only the
name of the register in which that instruction left its result. For all nodes
that develop a value, the register allocator has recorded the target
register in p->syms[RX]:
(emitasm 392)=
if (IR->x._isinstruction[rulenum] && p->x.emitted)
    outs(p->syms[RX]->x.name);
If the template begins with #, the emitter calls emit2, a machine-specific
procedure:
(emitasm 392)+=
else if (*fmt == '#')
    (*IR->x.emit2)(p);
lcc needs this escape hatch to generate arbitrary code for tricky features
like structure arguments. Otherwise, emitasm emits the template with a
little interpretation:
(emitasm 392)+=
else {
    (omit leading register copy? 393)
    for ((*IR->x._kids)(p, rulenum, kids); *fmt; fmt++)
        if (*fmt != '%')
            *bp++ = *fmt;
        else if (*++fmt == 'F')
            print("%d", framesize);
        else if (*fmt >= '0' && *fmt <= '9')
            emitasm(kids[*fmt - '0'], nts[*fmt - '0']);
        else if (*fmt >= 'a' && *fmt < 'a' + NELEMS(p->syms))
            outs(p->syms[*fmt - 'a']->x.name);
        else
            *bp++ = *fmt;
}
bp is the pointer into the output buffer in the module output.c. %F
tells emitasm to emit framesize, which helps emit local offsets that
are relative to the size of the frame. Substrings of the form %digit
tell it to emit recursively the subtree corresponding to the digit-th
nonterminal from the pattern, counting from zero, left to right, and
ignoring nesting. Substrings like %x tell emitasm to emit the node's
p->syms['x' - 'a']->x.name; for example, %c emits p->syms[2]->x.name.
Table 14.1 summarizes these conventions.
So the emitter interprets the string "lw r%c,%1\n" by emitting "lw r",
then the name (usually a digit string) of the target register, then a comma.
Then it recursively emits p->kids[1] as an addr, if nts[1] holds the
Template              Emitted
%%                    One percent sign
%F                    framesize
%digit                The subtree corresponding to the rule's digit-th
                      nonterminal
%letter               p->syms[letter - 'a']->x.name
any other character   The character itself
# (in position 1)     Call emit2 to emit code
? (in position 1)     Skip the first instruction if the source and
                      destination registers are the same
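The template conventions can be tried out with a simplified interpreter. This is a sketch under stated assumptions rather than lcc's emitasm: expand and append are hypothetical names, the text for each %digit is passed in as an already-expanded string instead of being emitted recursively, symbol names are plain strings, and the # and ? prefixes are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to the string in out, which has room for n characters. */
static void append(char *out, size_t n, const char *s) {
    size_t len = strlen(out);
    assert(len + strlen(s) < n);
    strcpy(out + len, s);
}

/* Expand fmt into out. kids[i] is the text for the i-th nonterminal and
   syms[i] is the name that %a, %b, ... select; nsyms bounds the letters. */
static void expand(const char *fmt, const char *kids[], const char *syms[],
                   int nsyms, int framesize, char *out, size_t n) {
    char one[2] = { 0, 0 }, num[16];

    out[0] = '\0';
    while (*fmt) {
        char c = *fmt++;
        if (c == '%' && *fmt) {
            c = *fmt++;
            if (c == 'F') {
                sprintf(num, "%d", framesize);
                append(out, n, num);
                continue;
            }
            if (c >= '0' && c <= '9') {
                append(out, n, kids[c - '0']);
                continue;
            }
            if (c >= 'a' && c < 'a' + nsyms) {
                append(out, n, syms[c - 'a']);
                continue;
            }
        }
        one[0] = c;               /* ordinary character, or the one after % */
        append(out, n, one);
    }
}
```

With syms[2] set to "8" and kids[0] set to "4($sp)", the template "lw $%c,%0\n" expands to "lw $8,4($sp)\n".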
        else
            (*emitter)(p, p->x.inst);
        p->x.emitted = 1;
    }
}
Most interface routines have one implementation per target, but there's
only one implementation of emit because the target-specific parts have
been factored out into the assembler code templates.
The indirect call above permits lcc to call another emitter. For
example, this feature has been used to replace this book's emitter with one
that emits binary object code directly. emitter is initialized to emitasm:
(gen.c data)+=
unsigned (*emitter) ARGS((Node, int)) = emitasm;
emit implements two last-minute optimizations. moveself declines to
emit instructions that copy a register on top of itself:
(gen.c functions)+=
static int moveself(p) Node p; {
    return p->x.copy
    && p->syms[RX]->x.name == p->x.kids[0]->syms[RX]->x.name;
}
The equality test exploits the fact that the string module stores only one
copy of each distinct string. x.copy is set by the cost function move,
which is called by rules that select register-to-register moves:
(gen.c functions)+=
int move(p) Node p; {
    p->x.copy = 1;
    return 1;
}
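The pointer comparison in moveself is sound only because equal strings share one stored copy. The sketch below illustrates that idea with a hypothetical intern function and a small hash table; it is an assumption-based stand-in, not the string module that lcc actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

struct entry {
    char *str;
    struct entry *link;
};
static struct entry *buckets[NBUCKETS];

/* Return the unique stored copy of s, installing it on first use. */
static const char *intern(const char *s) {
    unsigned h = 0;
    const char *t;
    struct entry *e;

    for (t = s; *t; t++)
        h = h * 31 + (unsigned char)*t;
    h %= NBUCKETS;
    for (e = buckets[h]; e; e = e->link)
        if (strcmp(e->str, s) == 0)
            return e->str;           /* already stored: share it */
    e = malloc(sizeof *e);
    assert(e != NULL);
    e->str = malloc(strlen(s) + 1);
    assert(e->str != NULL);
    strcpy(e->str, s);
    e->link = buckets[h];
    buckets[h] = e;
    return e->str;
}
```

Once every register name has passed through such a function, two names denote the same register exactly when their pointers are equal, so no strcmp is needed at emission time.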
The first for loop holds several statements that return zero; they cause
the emitter to go ahead and emit the instruction, unless moveself
intervenes. The emitter omits the register-to-register copy only if requate
exits the first loop, falls into the second, and returns one. The second
loop replaces all reads of tmp with reads from src; the first loop counts
these reads in n.
If an instruction copies tmp back to src, it is changed so that moveself
will delete it, and the loop continues to see if more changes are possible:
(requate 395)=
if (p->x.copy && p->syms[RX] == src
&& p->x.kids[0]->syms[RX] == tmp)
    p->syms[RX] = tmp;
Without this test, return f() would copy the value of f from the
return register to a temporary and then back to the return register for the
current function.
If the scan hits an instruction that targets src, if the instruction
doesn't assign src to itself, and if the instruction doesn't merely read
src, then requate fails because tmp and src do not, in general, hold the
same value henceforth:
(gen.c macros)=
#define readsreg(p) \
    (generic((p)->op)==INDIR && (p)->kids[0]->op==VREG+P)
#define setsrc(d) ((d) && (d)->x.regnode && \
    (d)->x.regnode->set == src->x.regnode->set && \
    (d)->x.regnode->mask&src->x.regnode->mask)
(requate 395)+=
else if (setsrc(p->syms[RX]) && !moveself(p) && !readsreg(p))
    return 0;
For example, c=*p++ generates the pseudo-instructions below when p is
in register r1. Destinations are the rightmost operands.
Now requate's second loop replaces all reads of tmp with reads of src;
then requate returns one, and the emitter omits the initial assignment
to tmp.
At this point, the most common source of gratuitous register-to-
register copies is postincrement in a context that uses the original value,
such as c=*p++. lcc's code for these patterns starts with a copy, when
some contexts could avoid it by reordering instructions. For example,
a more ambitious optimizer could reduce the four pseudo-instructions
above to
loadb  (r1),r3     fetch character
add    r1,1,r1     increment p
storeb r3,c        store character
Register-to-register moves now account for roughly 5 percent of the
MIPS and SPARC instructions in the standard lcc testbed. In the MIPS
code, about half copy a register variable or zero - which is a register-to-
register copy using a source register hard-wired to zero - to a register
variable or an argument or return register. Such moves are not easily
deleted. Some but not all of the rest might be removed, but we're nearing
the limit of what simple register-copy optimizations can do.
(gen.c functions)+=
static void prelabel(p) Node p; {
    (prelabel 398)
}
It marks each fussy node with the register on which it insists, and it
marks the remaining nodes - at least those that yield a result instead
of a side effect - with the wildcard symbol that represents the set
of valid registers. It also inserts LOAD nodes where register-to-register
copies might be needed.
prelabel starts by traversing the subtrees left to right:
(prelabel 398)=
if (p == NULL)
    return;
prelabel(p->kids[0]);
prelabel(p->kids[1]);
Then it identifies the register class for nodes that leave a result in a
register:
(prelabel 398)+=
if (NeedsReg[opindex(p->op)])
    setreg(p, rmap[optype(p->op)]);
The NeedsReg test distinguishes nodes executed for side effect from
those that need a register to hold their result. NeedsReg is indexed by a
generic opcode and flags the opcodes that yield a value:
(gen.c data)+=
static char NeedsReg[] = {
    0,                  /* unused */
    1,                  /* CNST */
    0, 0,               /* ARG ASGN */
    1,                  /* INDIR */
    1, 1, 1, 1,         /* CVC CVD CVF CVI */
    1, 1, 1, 1,         /* CVP CVS CVU NEG */
    1,                  /* CALL */
    1,                  /* LOAD */
    0,                  /* RET */
    1, 1, 1,            /* ADDRG ADDRF ADDRL */
    1, 1, 1, 1, 1,      /* ADD SUB LSH MOD RSH */
    1, 1, 1, 1,         /* BAND BCOM BOR BXOR */
    1, 1,               /* DIV MUL */
    0, 0, 0, 0, 0, 0,   /* EQ GE GT LE LT NE */
    0, 0,               /* JUMP LABEL */
};
Symbol rmap[16];
14. 7 • REGISTER TARGETING 399
rmap is indexed by a type suffix, and holds the wildcard that
represents the set of registers that hold values of each such type. For
example, rmap[I] typically holds a wildcard that represents the general
registers, and rmap[D] holds the wildcard that represents the
double-precision floating-point registers. Each register set is
target-specific, so the target's progbeg initializes rmap. setreg records
the value from rmap in the node to support targeting and register allocation:
(gen.c functions)+=
void setreg(p, r) Node p; Symbol r; {
    p->syms[RX] = r;
}
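The two table lookups behind prelabel's register-class step can be sketched on their own. The sketch assumes that an opcode keeps its type suffix in the low four bits and its generic operator in the bits above, which agrees with the %term values shown earlier (for example ADDI=309, ADDD=306, ASGNI=53); the macro bodies, the suffix codes, and the helper wildcard_for are assumptions of this illustration, and strings stand in for wildcard symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding: type suffix in the low four bits, operator above them. */
#define optype(op)  ((op) & 15)
#define opindex(op) ((op) >> 4)

enum { F = 1, D = 2, C = 3, I = 5, U = 6, P = 7 };  /* suffixes seen in the %term lines */

static const char NeedsReg[] = {
    0,                  /* unused */
    1,                  /* CNST */
    0, 0,               /* ARG ASGN */
    1,                  /* INDIR */
    1, 1, 1, 1,         /* CVC CVD CVF CVI */
    1, 1, 1, 1,         /* CVP CVS CVU NEG */
    1,                  /* CALL */
    1,                  /* LOAD */
    0,                  /* RET */
    1, 1, 1,            /* ADDRG ADDRF ADDRL */
    1, 1, 1, 1, 1,      /* ADD SUB LSH MOD RSH */
    1, 1, 1, 1,         /* BAND BCOM BOR BXOR */
    1, 1,               /* DIV MUL */
    0, 0, 0, 0, 0, 0,   /* EQ GE GT LE LT NE */
    0, 0                /* JUMP LABEL */
};

/* Stand-ins for wildcard symbols; a real target fills rmap in progbeg. */
static const char *rmap[16];

/* Return the wildcard for op's result, or NULL if op yields no value. */
static const char *wildcard_for(int op) {
    assert(opindex(op) >= 0 && opindex(op) < (int)sizeof NeedsReg);
    return NeedsReg[opindex(op)] ? rmap[optype(op)] : NULL;
}
```

Under this encoding ADDI (309) indexes the ADD entry and selects rmap[I], while ASGNI (53) indexes the ASGN entry and gets no register at all.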
prelabel targets the right child of each assignment to a register variable
to develop its value directly into the register variable whenever possible:
(prelabel case for ASGN 399)=
if (p->kids[0]->op == VREG+P) {
    rtarget(p, 1, p->kids[0]->syms[0]);
}
Finally, prelabel calls a target-specific procedure that adjusts the
register class for fussy opcodes:
(prelabel 398)+=
(IR->x.target)(p);
rtarget(p, n, r) guarantees that p->kids[n] computes its result
directly into register r:
(gen.c functions)+=
void rtarget(p, n, r) Node p; int n; Symbol r; {
    Node q = p->kids[n];

    if (!q->syms[RX]->x.wildcard) {
        q = newnode(LOAD + optype(q->op),
            q, NULL, q->syms[0]);
        if (r->u.t.cse == p->kids[n])
            r->u.t.cse = q;
        p->kids[n] = p->x.kids[n] = q;
        q->x.kids[0] = q->kids[0];
    }
    setreg(q, r);
}
If the child has already been targeted - to another register variable
or to something special like the return register - then rtarget splices a
LOAD into the tree between parent and child, and targets the LOAD instead
of the child. The code generator emits a register-to-register copy for
LOADs. If the child has not been targeted already, then q->syms[RX] holds
a wildcard; the final setreg is copacetic because r must be a member of
the wildcard's set. If it weren't, then we'd be asking lcc to emit code to
copy a register in one register set to a member of another register set,
which doesn't happen without an explicit conversion node.
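The splice that rtarget performs can be reduced to a few lines. The following is a simplified sketch, not lcc's rtarget: Sym, Tn, and the field names are hypothetical, nodes have a single child, and the bookkeeping for common subexpressions and x.kids is omitted.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sym {
    const char *name;
    int wildcard;              /* nonzero if this symbol stands for a register set */
} Sym;

typedef struct tn {
    const char *op;
    struct tn *kid;
    Sym *reg;                  /* plays the role of syms[RX] */
} Tn;

/* Make p's child compute into r, splicing in a LOAD if the child has
   already been bound to a specific register. */
static void rtarget(Tn *p, Sym *r) {
    Tn *q = p->kid;

    if (!q->reg->wildcard) {
        Tn *load = calloc(1, sizeof *load);
        assert(load != NULL);
        load->op = "LOAD";
        load->kid = q;
        p->kid = q = load;     /* target the LOAD instead of the child */
    }
    q->reg = r;
}
```

An untargeted child is simply retargeted in place, whereas a child already bound to r2 keeps r2 and gains a LOAD parent bound to the requested register.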
Figure 14.6 shows three sample trees before and after rtarget. They
assume that r0 is the return register and r2 is a register variable. The
first tree has an unconstrained child, so rtarget inserts no LOADI. The
second tree has an INDIRI that yields r2 below a RETI that expects r0, so
rtarget inserts a LOADI. The third tree has a CALLI that yields r0 below
an ASGNI that expects r2, so again rtarget inserts a LOADI.
prelabel and rtarget use register targeting to fetch and assign
register variables, so lcc's templates for these operations emit no code for
either operation on any machine. All machines share the rules:
(shared rules 400)=
reg: INDIRC(VREGP) "# read register\n"
RETI                          RETI
  ADDI   syms[RX]=?      =>     ADDI   syms[RX]=r0

RETI                          RETI
  INDIRI syms[RX]=r2     =>     LOADI  syms[RX]=r0
    VREGP r2                      INDIRI syms[RX]=r2
                                    VREGP r2

ASGNI                         ASGNI
  VREGP r2               =>     VREGP r2
  CALLI  syms[RX]=r0            LOADI  syms[RX]=r2
                                  CALLI  syms[RX]=r0
FIGURE 14.6 rtarget samples.
prelabel(p);
(*IR->x._label)(p);
reduce(p, 1);
}
The interface function gen receives a forest from the front end and makes
several passes over the trees.
(gen.c data)+=
Node head;
(gen.c functions)+=
Node gen(forest) Node forest; {
    int i;
    struct node sentinel;
    Node dummy, p;

    head = forest;
    for (p = forest; p; p = p->link) {
        (select instructions for p 402)
    }
    for (p = forest; p; p = p->link)
        prune(p, &dummy);
    (linearize forest 414)
    (allocate registers 415)
    return forest;
}
The first pass calls rewrite to select instructions, and the second prunes
the subinstructions out of the tree. The first pass performs any target-
specific processing for arguments and procedure calls; for example, it
arranges to pass arguments in registers when that's what the calling con-
vention specifies:
(select instructions for p 402)=
if (generic(p->op) == CALL)
    docall(p);
else if (generic(p->op) == ASGN
&& generic(p->kids[1]->op) == CALL)
    docall(p->kids[1]);
else if (generic(p->op) == ARG)
    (*IR->x.doarg)(p);
rewrite(p);
p->x.listed = 1;
Only doarg is target-specific. Within any one tree, the code generator
is free to evaluate the nodes in whatever order seems best, so long as
it evaluates children before parents. Calls can have side effects, so the
front end puts all calls on the forest to fix the order in which the side
effects happen. If the call returns no value, or if the returned value is
ignored, then the call itself appears on the forest; the first if statement
recognizes this pattern. Otherwise, the call appears below an assignment
to a temporary, which is later used where the returned value is needed;
the second if statement recognizes this pattern.
The first pass also marks listed nodes. Chapter 15 elaborates on this
and on the rest of gen's passes.
Use one nonterminal to derive all trees that yield a value. Use this non-
terminal wherever the instruction corresponding to a rule pattern reads a
register. This book uses the nonterminal reg this way. A variant that can
catch a few more errors uses one nonterminal for general-purpose reg-
isters and another for floating-point registers (e.g. freg). For example,
rules that use only one register nonterminal can silently accept corrupt
trees like NEGF(INDIRI(. .. )). This particular error is rare.
Similarly, use one nonterminal to derive all trees executed only for
side effect. Examples include ASGN and ARG. This book uses stmt for
side-effect trees. It is possible to write lburg specifications that combine
reg and stmt into one large class, but the register allocator assumes that
the trees with side effects are roots, and trees with values are interior
nodes or leaves; it can silently emit incorrect code - the worst nightmare
for compiler writers - if its assumptions are violated. Separating reg
from stmt makes the code generator object if these assumptions are
ever violated.
Ensure that there's at least one way to generate code for each opera-
tion in the intermediate language. One easy way to do so is to write one
register-to-register rule for each operator:
reg: LEAF
reg: UNARY(reg)
reg: OPERATOR(reg,reg)
Such rules ensure that lcc can match each node at least one way and
emit assembler code with one instruction per node.
Scan your target's architecture manual for instructions or addressing
modes that perform multiple intermediate-code operations, and write
rules with patterns that match what the instructions compute. Rules 3
and 7 in Figure 14.2 are examples. If you have a full set of register-to-
register rules, these bigger rules won't be necessary, but they typically
emit code that is shorter and faster. Skip instructions and addressing
modes so exotic that you can't imagine a C program - or a C compiler
- that could use them.
Use nonterminals to factor the specification. If you find you're repeat-
ing a subpattern often, give it a rule and a nonterminal name of its own.
Further Reading
lcc's instruction selector is based on an algorithm originally described
by Aho and Johnson (1976). The interface was adapted from burg (Fraser,
Henry, and Proebsting 1992) and the implementation from the compat-
ible program iburg (Fraser, Hanson, and Proebsting 1992). iburg per-
forms dynamic programming at compile time. burg uses BURS theory
(Pelegri-Llopart and Graham 1988; Proebsting 1992) to do its dynamic
Exercises
14.1 What would break if we changed the type of costs from short to
int?
14.2 _kids is not strictly necessary. Describe how you'd implement a
reducer without it.
14.3 lburg represents each nonterminal with an integer in a compact
range starting at one, which represents the start nonterminal. The
zero-terminated vector _ntname is indexed by these numbers and
holds the name of the corresponding nonterminal:
(BURM signature 378)+=
static char *_ntname[];
Use it to help write a procedure void dumpmatches(Node p) to display
a node p and all rules that have matched it. Typical output
might be
dumpmatches(0x1001e790)=ADDRLP(i):
addr: ADDRLP / %a($sp)
rc: reg / $%1
reg: addr / la $%c,%1
15
Register Allocation
we'd have to write and debug a new spiller for each target. Even with a
simple spiller like lcc's, spills are rare, which means that good test cases
for spillers are complex and hard to find, and spillers are thus hard to
debug. One target-specific spiller would be simpler than lcc's, but the
savings would've been lost over the long run.
15.1 Organization
Table 15.1 illustrates the overall organization of the register allocator
by showing highlights from the call graph. Indentation shows who calls
whom. This material is at a high level and is meant to orient us before
we descend to the low levels.
After the back end has selected instructions and projected the subinstructions
out of the tree - in the tree linked through the x.kids array
- linearize traverses the projected tree in postorder and links the
instructions in the order in which they will ultimately execute. gen walks
down this list and passes each instruction to ralloc, which normally
calls just putreg to free the registers no longer used by its children, and
getreg to allocate a register for itself. For temporaries, ralloc allocates
a register at the first assignment, and frees the register at the last use.
If getreg finds no free register that suits the instruction, it calls
spillee to identify the most distantly used register. Then getreg calls
spill to generate code to spill this register to memory and reload the
value again later. genspill generates the spill, and genreload replaces
all not-yet-processed uses of the register with nodes that load the value
from memory. genreload calls reprune to reestablish the relationship
between kids and x.kids that prune established before spilling changed
the forest.
Name of Routine        Purpose
linearize              orders for output one instruction tree
ralloc                 frees and allocates registers for one instruction
  putreg               frees a busy register
  getreg               finds and allocates a register
    askreg             finds and allocates a free register
      askfixedreg      tries to allocate a given register
    spillee            identifies a register to spill
    spill              spills one or more registers
      spillr           spills one register
        genspill       generates code to spill a register
        genreload      generates code to reload a spilled value
          reprune      updates kids after genreload updates x.kids
ralloc is not the only entry point for these routines. clobber calls
spill directly to spill and reload such registers as those saved across
calls by the caller. Also, each target's interface procedure local can
reach askreg via askregvar, which tries to allocate a register for a register
variable.
    if (r->mask&~freemask[n])
        return NULL;
    else {
        freemask[n] &= ~r->mask;
        usedmask[n] |= r->mask;
        return s;
    }
}
The use of register masks places an upper bound on the number of registers
in a register set; the upper bound is the number of bits in an
unsigned integer mask on the machine that hosts the compiler. This
number has been 32 for every target to date, so fixing askreg's loop
to 32 iterations seemed tolerable at first. But lcc's latest code generator
- for the X86 - would compile faster if we could define smaller register
sets, and machines exist that have bigger ones. There are now machines
with 64-bit unsigned integers, which undermines the motive
behind the shortcut. If we were doing it over, we'd represent register
sets with a structure that could accommodate sets of variable sizes.
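The mask bookkeeping described above can be sketched in isolation. The following is a minimal illustration, not lcc's code: it assumes a single register set, and the names claim and release are hypothetical stand-ins for the roles that askfixedreg and putreg play.

```c
#include <assert.h>

/* One register set. A set bit in freemask means that register is free.
   usedmask accumulates every register that was ever claimed. */
unsigned freemask = ~0u;
unsigned usedmask = 0;

/* Hypothetical helper: claim every register named in mask.
   Returns 1 on success, 0 if any of them is already busy. */
int claim(unsigned mask) {
    assert(mask != 0);
    if (mask & ~freemask)       /* some requested register is busy */
        return 0;
    freemask &= ~mask;          /* mark the registers busy */
    usedmask |= mask;           /* remember that they were used */
    return 1;
}

/* Hypothetical helper: return the registers in mask to the free pool. */
void release(unsigned mask) {
    freemask |= mask;
}
```

A register pair is simply a mask with two adjacent bits set, so one test covers both halves of the pair. The cost of this economy is the limit noted above: a set can hold no more registers than an unsigned has bits.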
getreg demands a register. If askreg can't find one, then spillee
selects one to spill, and spill edits the forest to include instructions that
store it to memory and reload it when it's needed. The second askreg is
thus guaranteed to find a register.
(gen.c functions)+=
static Symbol getreg(s, mask, p)
Symbol s; unsigned mask[]; Node p; {
    Symbol r = askreg(s, mask);
    if (r == NULL) {
        r = spillee(s, p);
        spill(r->x.regnode->mask, r->x.regnode->set, p);
        r = askreg(s, mask);
    }
    r->x.regnode->vbl = NULL;
    return r;
}
(askregvar 412)+=
else if (p->temporary && p->u.t.cse) {
    p->x.name = "?";
    return 1;
}
Waiting helps lcc use one register for more than one temporary. To
help distinguish such variables when debugging the compiler, askregvar
temporarily sets the x.name field of such temporaries to a question mark.
If none of the conditions above is met, askregvar asks askreg for a
register. If one is found, the symbol is updated to point at the register:
(askregvar 412)+=
else if ((r = askreg(regs, vmask)) != NULL) {
    p->x.regnode = r->x.regnode;
    p->x.regnode->vbl = p;
    p->x.name = r->x.name;
    return 1;
}
FIGURE 15.1 Ordering uses.
relink(p, next);
}
linearize(p, &sentinel);
At the end of the loop, gen sets forest to the head of the list, which is
the node after the sentinel in the circular list:
(linearize forest 414)+=
forest = sentinel.x.next;
Finally, it clears the first x.prev and the last x.next to break the circle:
(linearize forest 414)+=
sentinel.x.next->x.prev = NULL;
sentinel.x.prev->x.next = NULL;
The register allocator makes three passes over the forest. The first
builds a list of all the nodes that use each temporary. This list identifies
the last use and thus when the temporary should be freed, and identifies
the nodes that must be changed when a temporary must be spilled
to memory. If p->syms[RX] points to a temporary, then the value of
p->syms[RX]->x.lastuse points to the last node that uses that temporary;
that node's x.prevuse points to the previous user, and so on. The list
includes nodes that read and write the temporary:
(allocate registers 415)=
for (p = forest; p; p = p->x.next)
    for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
        if (p->x.kids[i]->syms[RX]->temporary) {
            p->x.kids[i]->x.prevuse =
                p->x.kids[i]->syms[RX]->x.lastuse;
            p->x.kids[i]->syms[RX]->x.lastuse = p->x.kids[i];
        }
    }
The fragment uses nested loops - first the instructions, then the children
of each instruction - to visit the uses in the order in which they'll be
executed. A single unnested loop over the forest is tempting but wrong:
for (p = forest; p; p = p->x.next)
    if (p->syms[RX]->temporary) {
        p->x.prevuse = p->syms[RX]->x.lastuse;
        p->syms[RX]->x.lastuse = p;
    }
It would visit the same uses, but the order would be wrong for some
inputs. For example, a[i]=a[i]-1 uses the address of a[i] twice and
thus assigns it to a temporary. This incorrect code would visit the INDIR
that fetches the temporary for the left-hand side first, so the INDIR that
fetches the temporary for the right-hand side would appear to be the last
use. The temporary would be freed after the load and reused to hold
the difference, and the subsequent store would use a corrupt address.
Figure 15.2 shows the effect of the loop nesting on the order of the
x.prevuse chain for this example.
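The difference between the two loops can be reproduced with a small model. This is a sketch under simplifying assumptions, not lcc's data structures: struct node and struct temp are hypothetical, the tmp field plays the role of a syms[RX] that points at a temporary, and the constant 1 is assumed to be folded into the subtract so that it does not appear as an operand.

```c
#include <assert.h>
#include <stddef.h>

#define NKIDS 2
enum { LHS, RHS, LOAD, SUB, STORE };

struct temp;
struct node {
    int id;
    struct node *next;          /* execution order, like x.next */
    struct node *kids[NKIDS];   /* instruction operands, like x.kids */
    struct temp *tmp;           /* temporary this node fetches, or NULL */
    struct node *prevuse;       /* previous use of the same temporary */
};
struct temp { struct node *lastuse; };

/* Correct: visit the operands of each instruction in execution order. */
void record_nested(struct node *forest) {
    for (struct node *p = forest; p; p = p->next)
        for (int i = 0; i < NKIDS && p->kids[i]; i++) {
            struct node *k = p->kids[i];
            if (k->tmp) {
                k->prevuse = k->tmp->lastuse;
                k->tmp->lastuse = k;
            }
        }
}

/* Tempting but wrong: visit the nodes themselves. */
void record_flat(struct node *forest) {
    for (struct node *p = forest; p; p = p->next)
        if (p->tmp) {
            p->prevuse = p->tmp->lastuse;
            p->tmp->lastuse = p;
        }
}

/* Model a[i] = a[i] - 1 and report which fetch is recorded as the last use. */
int last_use(int nested) {
    struct temp t = { NULL };
    struct node lhs   = { LHS,   NULL, { NULL, NULL },  &t,   NULL };
    struct node rhs   = { RHS,   NULL, { NULL, NULL },  &t,   NULL };
    struct node load  = { LOAD,  NULL, { &rhs, NULL },  NULL, NULL };
    struct node sub   = { SUB,   NULL, { &load, NULL }, NULL, NULL };
    struct node store = { STORE, NULL, { &lhs, &sub },  NULL, NULL };
    lhs.next = &rhs; rhs.next = &load; load.next = &sub; sub.next = &store;
    if (nested)
        record_nested(&lhs);
    else
        record_flat(&lhs);
    assert(t.lastuse != NULL);
    return t.lastuse->id;
}
```

In this model last_use(1) reports the left-hand fetch, which the final store consumes, while last_use(0) reports the right-hand fetch, which is the mistake the text warns about.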
The second pass over the forest eliminates some instructions that copy
one register to another, by targeting the expression that computed the
source register to use the destination register instead. If the source is
a common subexpression, we use the destination to hold the common
subexpression if the code between the two instructions is straight-line
code and doesn't change the destination:
(allocate registers 415)+=
for (p = forest; p; p = p->x.next)
FIGURE 15.2 Ordering uses. Singly nested and incorrect is shown
on the left; doubly nested and correct on the right.
The first inner loop scans the rest of the forest and exits early if the
destination is set anywhere later in the block or if some node changes
the flow of control. It could quit looking when the temporary dies, but
the extra logic cut only five instructions out of 25,000 in one test, so
we discarded it. If no other node sets the destination, then it's safe to
use that register for the common subexpression. The second inner loop
changes all instances of the common subexpression to use the destina-
tion instead. Once the common subexpression is computed into dst, the
original register-to-register copy copies dst to itself. The emitter and
moveself collaborate to cut such instructions.
Registers are freed when the parent reaches ralloc, but a few nodes, like
CALLI, can allocate a register and have no parent, if the value goes unused.
The if statement above frees the register allocated to such nodes.
Existing targets use this code only for CALLs and LOADs.
ralloc(p) frees the registers no longer needed by p's children, then
allocates a register for p, if p needs one and wasn't processed earlier.
Finally, it calls the target's clobber to spill any registers that this node
clobbers:
(gen.c functions)+=
static void ralloc(p) Node p; {
    int i;
    unsigned mask[2];

    mask[0] = tmask[0];
    mask[1] = tmask[1];
    (free input registers 418)
    if (!p->x.registered && NeedsReg[opindex(p->op)]
    && rmap[optype(p->op)]) {
        (assign output register 418)
    }
    p->x.registered = 1;
    (*IR->x.clobber)(p);
}
r->x.lastuse points to r's last use. For most expression temporaries,
there is only one use, but temporaries allocated to common subexpressions
have multiple uses.
Now ralloc allocates a register to this node. prelabel has stored
in p->syms[RX] a register or wildcard that identifies the registers that
p will accept. Again, common subexpressions complicate matters because
askregvar has pointed their p->syms[RX] at a register variable
that hasn't yet been allocated. So we need to use two values: sym is
p->syms[RX], and set is the set of registers that suit p:
(assign output register 418)=
Symbol sym = p->syms[RX], set = sym;
ralloc frees the input registers before allocating the output register,
which allows it to reuse an input register as the output register. This
economy is always safe when the node is implemented by a single in-
struction, but it can be unsafe if a node is implemented by a sequence
of instructions: If the output register is also one of the input registers,
and if the sequence changes the output register before reading the cor-
responding input register, then the read fetches a corrupt value. We take
care that all rules that emit instruction sequences set their output regis-
ter only after they finish reading all input registers. Most templates emit
} else {
p->syms[RX] = r;
r->x.lastuse = p;
}
If the node is not a common subexpression, the else clause stores r into
p->syms[RX] and notes the single use in r->x.lastuse. If sym is a common
subexpression, x.lastuse already identifies the users, so the fragment
runs down the list, storing r and marking the node as processed
by the register allocator. It also notes in x.equatable if the common
subexpression is already available in some other register.
15.4 Spilling
When the register allocator runs out of registers, it generates code to
spill a busy register to memory, and it replaces all not-yet-processed
uses of that register with nodes that reload the value from memory.
More ambitious alternatives are available - see Exercises 15.6 and 15.7
- but lcc omits them. Spills are rare, so lcc's spiller has been made as
simple as possible without sacrificing target independence. It would be
wasteful to tune code that is seldom used, and test cases are hard to find
and hard to isolate, so it would be hard to test a complex implementation
thoroughly.
When the register allocator runs out of registers, it spills to memory
the most distantly used register, which is the optimal choice. The spiller
replaces all not-yet-processed uses of that register with nodes that load
the value from memory, and it frees the register to satisfy the current
request.
Several routines collaborate to handle spills: spillee identifies the
best register to spill, and spillr calls genspill to insert the spill code
and genreload to insert the reloads. Figure 15.3 illustrates their operation
on the program
int i;
main() { i = f() + f(); }
which is the simplest program that spills on most targets. It spills the
value of the first call from the return register so that it won't be destroyed
by the second call.
The figure's first column shows the forest before code generation; that
is, the forest from the front end after prelabel substitutes VREGs for
ADDRLs that reference (temporary) register variables and injects LOADs to
write such registers. The second column shows the forest after linearization;
it assumes that the nodes linked by arcs with open arrowheads are
instructions - although INDIR and ASGN nodes that read and write registers
are typically just comment instructions - and the rest are subinstructions
like address calculations. The last column shows the injected
spill and reload, which use ADDRLP(4). The dark arrows in the last two
columns show kids and x.kids, which are the links that remain when
subinstructions are projected out of the tree.
FIGURE 15.3 (diagram not recoverable; its arcs distinguish kids, kids & x.kids, link, and x.next & x.prev.)
When getreg runs out of registers, it calls spillee(set, here) to
identify the register in set that is used at the greatest distance from
here:
(gen.c functions)+=
static Symbol spillee(set, here) Node here; Symbol set; {
    Symbol bestreg = NULL;
    int bestdist = -1, i;

    if (!set->x.wildcard)
        return set;
    for (i = 31; i >= 0; i--) {
        Symbol ri = set->x.wildcard[i];
        if (ri != NULL
        && ri->x.regnode->mask&tmask[ri->x.regnode->set]) {
            Regnode rn = ri->x.regnode;
            Node q = here;
            int dist = 0;
            for (; q && !uses(q, rn->mask); q = q->x.next)
                dist++;
            if (q && dist > bestdist) {
                bestdist = dist;
                bestreg = ri;
            }
        }
    }
    return bestreg;
}
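The selection policy in spillee can be exercised apart from the compiler. The sketch below is a simplified model rather than lcc's code: it assumes the instruction stream is an array in which reads[k] is a bit mask of the registers read by instruction k (a stand-in for uses), and farthest_use is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Among the registers named in candidates, return the number of the one
   whose next read lies farthest from position here. Registers that are
   never read again are skipped, as in the q != NULL test of spillee.
   Returns -1 if no candidate is read again. */
int farthest_use(unsigned candidates, const unsigned reads[],
                 size_t n, size_t here) {
    int best = -1;
    long bestdist = -1;
    assert(here <= n);
    for (int r = 31; r >= 0; r--) {
        unsigned bit = 1u << r;
        if (!(candidates & bit))
            continue;
        size_t q = here;
        long dist = 0;
        while (q < n && !(reads[q] & bit)) {   /* scan to the next read */
            q++;
            dist++;
        }
        if (q < n && dist > bestdist) {
            bestdist = dist;
            best = r;
        }
    }
    return best;
}
```

Spilling the register whose next use is farthest away keeps the values that are needed soonest in registers, which is why the text calls this choice optimal.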
If set is not a wildcard, then it denotes a single register; only that register
will do, so spillee simply returns it. Otherwise, set denotes a proper
set of registers, and spillee searches for an element of that set with the
most distant use. spillee calls uses to see if node p reads one given
register:
(gen.c functions)+=
static int uses(p, mask) Node p; unsigned mask; {
    int i;
    Node q;
spillr(r, here) spills register r and changes each use of r after here
to use a reload instead:
(gen.c functions)+=
static void spillr(r, here) Symbol r; Node here; {
    int i;
    Node p = r->x.lastuse;
    Symbol tmp = newtemp(AUTO, optype(p->op));
    (spillr 423)
}
(spillr 423)+=
...
spilled, because INDIRx(VREGP) emits nothing. The value has been computed
already, and we want no additional instructions. Next, genspill
creates nodes to spill the register to memory:
(genspill 424)+=
q = newnode(ADDRLP, NULL, NULL, s);
...
prune(p, &q);
q = last->x.next;
linearize(p, q);
Finally, it passes the new nodes through the register allocator:
(genspill 424)+=
for (p = last->x.next; p != q; p = p->x.next) {
    ralloc(p);
}
If the call on genspill originated because ralloc ran out of registers,
these calls risk infinite recursion if they actually try to allocate a register.
We must take care that the code generator can spill a register without
allocating another register. Spills are stores, which usually take just one
instruction and thus need no additional register, but some machines have
limits on the size of the constant part of address calculations and thus
require two instructions and a temporary register to complete a store to
an arbitrary address. Therefore we must ensure that these stores use a
register that is not otherwise allocated by ralloc. The MIPS R3000 architecture
has such restrictions, but the assembler handles the problem
using a temporary register reserved for the assembler. The SPARC target
is the only one so far that requires attention from the code generator;
Section 17.2 elaborates.
genspill's ralloc calls above must allocate no register, but it calls
ralloc anyway, since ralloc is responsible for more than just allocating
a register. It also calls, for example, the target's clobber. It is unlikely
that a simple store would cause clobber to do anything, but some future
target could do so, so genspill would hide a latent bug if it didn't call
ralloc. The back end sends all other nodes through rewrite, prune,
linearize, and ralloc, so it seems unwise to omit any of these steps
for spill nodes.
genreload(p, tmp, i) changes p->x.kids[i] to load tmp instead of
reading a register that has now been spilled:
(gen.c functions)+=
static void genreload(p, tmp, i)
Node p; Symbol tmp; int i; {
    Node q;
    int ty;
    (genreload 426)
}
It changes the target node to a tree that loads tmp, selects instructions
for it, and projects out the subinstructions:
(genreload 426)=
ty = optype(p->x.kids[i]->op);
if (ty == U)
    ty = I;
q = newnode(ADDRLP, NULL, NULL, tmp);
p->x.kids[i] = newnode(INDIR + ty, q, NULL, NULL);
rewrite(p->x.kids[i]);
prune(p->x.kids[i], &q);
Next, genreload linearizes the reloading instructions, as is usual after
pruning, but we need two extra steps first:
(genreload 426)+=
reprune(&p->kids[1], reprune(&p->kids[0], 0, i, p), i, p);
prune(p, &q);
linearize(p->x.kids[i], p);
In most cases, each entry in x.kids was copied from some entry in some
kids by prune, but genreload has changed x.kids[i] without updating
the corresponding entry in any kids. The emitter uses kids, so
genreload must find and update the corresponding entry. The call on
reprune above does this, and the second call on prune makes any similar
changes to the node at which p points.
reprune(pp, k, n, p) is called to reestablish the connection between
kids and x.kids when p->x.kids[n] has changed. That is, reprune must
do whatever is necessary to make it look like the reloads were in the forest
from the beginning. reprune is thus an incremental version of prune:
prune establishes a correspondence between kids and x.kids for a complete
tree, and reprune reestablishes this correspondence after a change
to just one of them, namely the one corresponding to the reload. Figure 15.4
shows how reprune repairs the final tree shown in Figure 15.3.
The initial, root-level call on reprune has a pointer, pp, that points to
the first kids entry that might need change.
(gen.c functions)+=
static int reprune(pp, k, n, p) Node p, *pp; int k, n; {
    Node q = *pp;
FIGURE 15.4 Figure 15.3's reload before and after reprune.
    if (q == NULL || k > n)
        return k;
    else if (q->x.inst == 0)
        return reprune(&q->kids[1],
            reprune(&q->kids[0], k, n, p), n, p);
    else if (k == n) {
        *pp = p->x.kids[n];
        return k + 1;
    } else
        return k + 1;
}
kids link the original tree, and x.kids link the instruction tree. The
second is a projection of the first, but an arbitrary number of nodes
have been projected out, so finding the kids entry that corresponds to
p->x.kids[i] requires a recursive tree search. reprune's recursive calls
track prune's recursive calls. They bump k, which starts out at zero, and
advance p in exactly those cases where prune finds an instruction and
sets the next entry in x.kids. So when k reaches n, reprune has found
the kids entry to update.
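The search can be followed on a small tree. The sketch below restates reprune in ANSI C over a simplified node type; struct node, its inst flag, and the wrapper repair are assumptions made for illustration and are not lcc's declarations.

```c
#include <assert.h>
#include <stddef.h>

#define MAXKIDS 3

/* Simplified node: kids links the original tree, xkids the projection. */
struct node {
    struct node *kids[2];
    struct node *xkids[MAXKIDS];
    int inst;                    /* nonzero if the node is an instruction */
};

/* Count instructions in the same order as prune would; when the count
   reaches n, overwrite the kids entry with the new p->xkids[n]. */
int reprune(struct node **pp, int k, int n, struct node *p) {
    struct node *q = *pp;
    if (q == NULL || k > n)
        return k;
    else if (!q->inst)           /* subinstruction: look through it */
        return reprune(&q->kids[1],
                       reprune(&q->kids[0], k, n, p), n, p);
    else if (k == n) {
        *pp = p->xkids[n];
        return k + 1;
    } else
        return k + 1;
}

/* Hypothetical wrapper for the root-level call pattern used by genreload. */
void repair(struct node *p, int n) {
    assert(p != NULL && 0 <= n && n < MAXKIDS);
    reprune(&p->kids[1], reprune(&p->kids[0], 0, n, p), n, p);
}
```

For example, suppose a root instruction has operand a and a subinstruction whose operands are the instructions b and c, so that the root's xkids are a, b, and c. After xkids[1] is replaced by a reload, repair(root, 1) skips a, descends through the subinstruction, and overwrites that subinstruction's first kids entry with the reload.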
getreg and each target's clobber call spill(mask, n, here) to spill
all busy registers in register set n that overlap the registers indicated by
mask. A typical use is for CALL nodes, because calls generally corrupt
some registers, which must be spilled before the call and reloaded afterward.
spill marks the registers as used and runs down the rest of the
forest looking for live registers that need spilling. It economizes by first
confirming that there are registers that need spilling:
(gen.c functions)+=
void spill(mask, n, here) unsigned mask; int n; Node here; {
    int i;
    Node p;

    usedmask[n] |= mask;
    if (mask&~freemask[n])
        for (p = here; p; p = p->x.next)
            (spill 428)
}
The inner loop below identifies the live registers that need spilling and
calls spillr to spill them:
(spill 428)=
for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
    Symbol r = p->x.kids[i]->syms[RX];
    if (p->x.kids[i]->x.registered && r->x.regnode->set == n
    && r->x.regnode->mask&mask)
        spillr(r, here);
}
used page, but spillers can determine the most distantly used regis-
ter (Freiburghouse 1974).
Exercises
15.1 Section 15.3 describes an optimization abandoned because it saved
only 5 instructions out of 25,000 in one test. Implement the optimization
and see if you can find useful programs that the optimization
improves more.
15.2 Adapt lcc to use Sethi-Ullman numbering. How much faster does
it make lcc's code for your programs?
Assembler               Meaning
move $10,$11            Set register 10 to the value in register 11.
subu $10,$11,$12        Set register 10 to register 11 minus register 12.
subu $10,$11,12         Set register 10 to register 11 minus the constant 12.
lb $10,11($12)          Set register 10 to the byte at the address 11
                        bytes past the address in register 12.
sub.d $f12,$f14,$f16    Set register 12 to register 14 minus register 16.
                        Use double-precision floating-point registers
                        and arithmetic.
sub.s $f12,$f14,$f16    Set register 12 to register 14 minus register 16.
                        Use single-precision floating-point registers
                        and arithmetic.
b L1                    Jump to the instruction labelled L1.
j $31                   Jump to the address in register 31.
blt $10,$11,L1          Branch to L1 if register 10 is less than
                        register 11.
.byte 0x20              Initialize the next byte in memory to
                        hexadecimal 20.
Systems from Digital Equipment run the Ultrix operating system, are little
endians, and use mipselIR. Systems from Silicon Graphics run the IRIX
operating systems, are big endians, and use mipsebIR. The systems share
the same type metric:
(MIPS type metrics 431)=
1, 1, 0, /* char */
2, 2, 0, /* short */
4, 4, 0, /* int */
4, 4, 1, /* float */
8, 8, 1, /* double */
4, 4, 0, /* T * */
0, 1, 0, /* struct */
Some of the symbol-table handlers are missing. lcc, like many compilers,
assumes that all data for the debugger can be encoded using assembler
directives. MIPS compilers encode file names and line numbers this way,
but information about the type and location of identifiers is encoded in
another file, which lcc does not emit. MIPS debuggers can thus report
the location of an error in an executable file from lcc, but they can't
report or change the values of identifiers; see Exercise 16.5.
16.1 Registers
The MIPS R3000 processor has thirty-two 32-bit registers, which are
known to the assembler as $i. The MIPS R3010 floating-point coprocessor
adds thirty-two more 32-bit registers, which are usually treated as
sixteen even-numbered 64-bit registers and are known to the assembler
as $fi.
The hardware imposes only a few constraints - register $0 is always
zero, and the jump-and-link instruction puts the return address in $31 -
but lcc observes many more conventions used by other compilers, in order
to interoperate with the standard libraries and debuggers. Table 16.2
enumerates the conventions.
The assembler reserves $1 to implement pseudo-instructions. For ex-
ample, the hardware permits only 16-bit offsets in address calculations,
but the assembler permits 32-bit offsets by injecting extra instructions
that form a large offset using $1. lcc uses some pseudo-instructions, but
it forgoes others to simplify adaptations of lcc that emit binary object
code directly.
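The arithmetic behind such an expansion can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the split shown (an upper half that compensates for a sign-extended lower half) is one common way to form a 32-bit offset from two 16-bit fields, assumed here rather than taken from any particular assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Does off fit in a signed 16-bit instruction field? */
int fits_simm16(int32_t off) {
    return off >= -32768 && off <= 32767;
}

/* Split off into hi and lo so that (hi << 16) + lo == off, where lo is a
   signed 16-bit value. Because the low half is sign-extended when it is
   added, hi is rounded up whenever lo comes out negative. */
void split_offset(int32_t off, uint16_t *hi, int16_t *lo) {
    uint32_t u = (uint32_t)off;
    int32_t low = (int32_t)(u & 0xffffu);
    if (low >= 0x8000)
        low -= 0x10000;                       /* sign-extend the low half */
    *lo = (int16_t)low;
    *hi = (uint16_t)((u + 0x8000u) >> 16);    /* compensate in the high half */
    assert((uint32_t)(((uint32_t)*hi << 16) + (uint32_t)(int32_t)*lo) == u);
}
```

An offset that passes fits_simm16 needs no help; any other offset needs an extra instruction and a scratch register to hold the upper part, which is the role the text assigns to $1.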
The convention reserves $2-$3 and $f0-$f2 for return values, but lcc
uses only the first half of each. The second half is for Fortran's complex
arithmetic type. C doesn't have this type, but C compilers respect the
convention to interoperate with Fortran code.
Registers    Use
$0           zero; unchangeable
$1           reserved for the assembler
$2-$3        function return value
$4-$7        first few procedure arguments
$8-$15       scratch registers
$16-$23      register variables
$24-$25      scratch registers
$26-$27      reserved for the operating system
$28          global pointer; also called $gp
$29          stack pointer; also called $sp
$30          register variable
$31          procedure return address
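The scratch and register-variable rows of this table translate directly into bit masks of the kind the register allocator uses. The following sketch derives them from the register ranges; reg_range, scratch_mask, and variable_mask are hypothetical names, and the code assumes that unsigned has at least 32 bits.

```c
#include <assert.h>

/* Mask with bits lo through hi set, one bit per register number. */
unsigned reg_range(int lo, int hi) {
    assert(0 <= lo && lo <= hi && hi <= 31);
    unsigned upto_hi = (hi == 31) ? ~0u : (1u << (hi + 1)) - 1u;
    unsigned below_lo = (1u << lo) - 1u;
    return upto_hi & ~below_lo;
}

/* Scratch registers: $8-$15 and $24-$25. */
unsigned scratch_mask(void) {
    return reg_range(8, 15) | reg_range(24, 25);
}

/* Register variables: $16-$23 and $30. */
unsigned variable_mask(void) {
    return reg_range(16, 23) | reg_range(30, 30);
}
```

The two masks come out as 0x0300ff00 and 0x40ff0000. They are disjoint, and neither includes a register that the convention reserves for the assembler, the operating system, or the stack and global pointers.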
progend does nothing for this target. progbeg encodes Table 16.2 in
the register allocator's data structures.
(MIPS functions 433)=
static void progbeg(argc, argv) int argc; char *argv[]; {
    int i;
Each element of ireg represents one integer register, and freg2 represents
pairs of adjacent floating-point registers. d6 represents the pair
$6-$7.
Actually, the machine has only 31 register pairs of each type, but the
declaration supplies 32 to keep askreg's inelegant loop bounds valid.
434 CHAPTER 16 • GENERATING MIPS R3000 CODE
The third argument to mkreg is a mask of three ones, which identifies $8,
$9, and $10. The emitted code takes care to use $8 as the source register
and the other two as temporaries.
target calls setreg to mark nodes that need a special register, and it
calls rtarget to mark nodes that need a child in a special register:
(MIPS functions 433) +=
static void target(p) Node p; {
    switch (p->op) {
    (MIPS target 437)
    }
}
If an instruction clobbers some registers, clobber calls spill to save
them first and restore them later.
(MIPS functions 433) +=
static void clobber(p) Node p; {
    switch (p->op) {
    (MIPS clobber 443)
    }
}
The cases missing from target and clobber above appear with the germane
instructions in the next section.
8-, 16-, and 32-bit integral instructions, respectively. The optional suffix
u flags some instructions as unsigned. If it's omitted, the operation is
signed.
Constants and identifiers represent themselves in assembler:
(MIPS rules 436) =
acon: con "%0"
acon: ADDRGP "%a"
The instructions that access memory use address-calculation hardware
that adds an instruction field and the contents of an integer register. The
assembler syntax is the constant followed by the register in parentheses:
(MIPS rules 436) +=
addr: ADDI(reg,acon) "%1($%0)"
(MIPS rules 436) +=
reg: addr "la $%c,%0\n" 1
%c emits p->syms[RX]->x.name. A con is an addr, so lcc uses la whenever
it needs to load a constant into a register. Zero is always available
in $0, so we need no instruction to compute zero:
(MIPS rules 436) +=
reg: CNSTC "# reg\n" range(a, 0, 0)
reg: CNSTS "# reg\n" range(a, 0, 0)
reg: CNSTI "# reg\n" range(a, 0, 0)
reg: CNSTU "# reg\n" range(a, 0, 0)
reg: CNSTP "# reg\n" range(a, 0, 0)
Recall that cost expressions are evaluated in a context in which a denotes
the node being labelled, which here is the constant value being tested for
zero. target arranges for these nodes to return $0:
(MIPS target 437) =
case CNSTC: case CNSTI: case CNSTS: case CNSTU: case CNSTP:
    if (range(p, 0, 0) == 0) {
        setreg(p, ireg[0]);
        p->x.registered = 1;
    }
    break;
Allocating $0 makes no sense, so target marks the node to preclude
register allocation.
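The range test that guards the zero-constant rules and the case in target above can be pictured as a cost function: it yields zero when the constant lies in the interval and a prohibitive cost otherwise, so the labeller never picks the rule for other constants. The sketch below is a simplification that takes the constant's value directly instead of a node. The names range_cost and NO_MATCH are hypothetical stand-ins, not lcc's declarations.

```c
#include <limits.h>

/* Hypothetical "this rule cannot match" cost. */
enum { NO_MATCH = SHRT_MAX };

/* Cost 0 if lo <= value <= hi, and a prohibitive cost otherwise. */
int range_cost(long value, long lo, long hi) {
    return (lo <= value && value <= hi) ? 0 : NO_MATCH;
}
```

With range_cost(v, 0, 0), only the constant zero matches for free, which is what lets the rules above emit a comment instead of an instruction.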
The instructions l and s load from and store into memory. They take
a type suffix, an integer register, and an addr. For example, sw $4,x
stores the 32-bit integer in $4 into the memory cell labelled x. sb and sh
do likewise for the low-order 8 and 16 bits of the register. lb, lh, and
lw reverse the process and load an 8-, 16-, or 32-bit value:
(MIPS rules 436) +=
stmt: ASGNC(addr,reg) "sb $%1,%0\n" 1
stmt: ASGNS(addr,reg) "sh $%1,%0\n" 1
stmt: ASGNI(addr,reg) "sw $%1,%0\n" 1
stmt: ASGNP(addr,reg) "sw $%1,%0\n" 1
reg: INDIRC(addr) "lb $%c,%0\n" 1
reg: INDIRS(addr) "lh $%c,%0\n" 1
reg: INDIRI(addr) "lw $%c,%0\n" 1
reg: INDIRP(addr) "lw $%c,%0\n" 1
lb and lh propagate the sign bit to fill the top part of the register, so they
implement a free CVCI and CVSI. lbu and lhu fill with zeroes instead, so
they implement a free CVCU and CVSU:
(MIPS rules 436) +=
reg: CVCI(INDIRC(addr)) "lb $%c,%0\n" 1
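The difference between the sign-extending and zero-extending byte loads can be mimicked in portable C. The sketch below models only the register contents after the load. The function names are hypothetical and nothing here is part of lcc.

```c
#include <stdint.h>

/* What lb leaves in a 32-bit register: the byte with its sign bit copied up. */
int32_t sign_extend_byte(uint8_t byte) {
    return (byte & 0x80) ? (int32_t)byte - 256 : (int32_t)byte;
}

/* What lbu leaves in a 32-bit register: the byte with zeroes above it. */
uint32_t zero_extend_byte(uint8_t byte) {
    return (uint32_t)byte;
}
```

The same byte 0xff thus reads back as -1 through the signed load and as 255 through the unsigned one, which is why a conversion that follows the load costs nothing extra.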
(MIPS rules 436) +=
rc5: CNSTI "%a" range(a,0,31)
It sets the target register twice: first to the unconverted integer and then
to the equivalent double. See Exercise 16.6.
The trunc.w.d instruction truncates a double and leaves the integral
result in a floating-point register, so lcc follows up with a mfc1, which
copies a value from the floating-point unit to the integer unit, where the
client of the CVDI expects it:
16.2 • SELECTING INSTRUCTIONS 441
(MIPS rules 436) +=
reg: CVDI(reg) "trunc.w.d $f2,$f%0,$%c; mfc1 $%c,$f2\n" 2
(MIPS rules 436) +=
ar: CNSTP "%a" range(a, 0, 0x0fffffff)
If the constant won't fit in 28 bits, then lcc falls back on more costly
rules that load an arbitrary 32-bit constant into a register and that jump
indirectly using that register. The MIPS assembler makes most of the
decisions that require checking ranges, but at least some versions of the
assembler leave this particular check to the compiler.
The front end and the routines function and target collaborate to
get return values into the return register, and return addresses into the
program counter, so RET nodes produce no code:
(MIPS rules 436) +=
stmt: RETD(reg) "# ret\n" 1
(mips.c macros) +=
#define INTRET 0x00000004
#define FLTRET 0x00000003
(MIPS clobber 443) =
case CALLD: case CALLF:
    spill(INTTMP | INTRET, IREG, p);
    spill(FLTTMP, FREG, p);
    break;
case CALLI:
    spill(INTTMP, IREG, p);
    spill(FLTTMP | FLTRET, FREG, p);
    break;
case CALLV:
    if (argoffset == 0)
        argno = 0;
    p->x.argno = argno++;
    size = p->syms[1]->u.c.v.i < 4 ? 4 : p->syms[1]->u.c.v.i;
    p->syms[2] = intconst(mkactual(size,
        p->syms[0]->u.c.v.i));
}
docall clears argoffset at each CALL, so a zero there alerts doarg to
reset its static argument counter. mkactual uses the argument size and
alignment - rounded up to 4 if necessary, because smaller arguments
are widened - and returns the argument offset.
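The offset bookkeeping described above can be sketched as a running counter that is aligned before each argument and advanced past it afterward. This is a minimal model under stated assumptions: arg_offset stands in for argoffset, and start_call, round_up, and next_arg_offset are hypothetical names, not the book's routines.

```c
#include <assert.h>

/* Hypothetical stand-in for the running offset of outgoing arguments. */
int arg_offset = 0;

/* Round n up to a multiple of align, which must be a power of two. */
int round_up(int n, int align) {
    assert(align > 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Reset the counter, as happens at each call. */
void start_call(void) {
    arg_offset = 0;
}

/* Return the offset for an argument and advance past it. */
int next_arg_offset(int align, int size) {
    int n;

    if (align < 4)          /* smaller arguments are widened */
        align = 4;
    n = round_up(arg_offset, align);
    arg_offset = n + size;
    return n;
}
```

For an int, a double, and another int, the offsets come out as 0, 8, and 16: the double skips the word at offset 4 to satisfy its alignment.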
target uses argreg and rtarget to compute the children of ARG nodes
into the argument register, if there is one:
(MIPS target 437) +=
case ARGD: case ARGF: case ARGI: case ARGP: {
    static int ty0;
    int ty = optype(p->op);
    Symbol q;

    q = argreg(p->x.argno, p->syms[2]->u.c.v.i, ty, ty0);
    if (p->x.argno == 0)
        ty0 = ty;
    if (q &&
    !((ty == F || ty == D) && q->x.regnode->set == IREG))
        rtarget(p, 0, q);
    break;
    }
The fragment also remembers the type of the first argument to help de-
termine the register for later arguments. The long conditional omits tar-
geting if the argument is floating point but passed in an integer register.
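The decision that argreg makes can be sketched from the calling convention alone: the first four argument words travel in $4-$7, except that a floating-point first argument uses $f12, and a floating-point second argument uses $f14 when the first one used $f12. The code below is a hypothetical model with invented names (arg_location, struct arg_loc). It is not the book's argreg, which returns a Symbol.

```c
#include <assert.h>

enum arg_class { ARG_MEMORY, ARG_INT_REG, ARG_FLT_REG };

struct arg_loc {
    enum arg_class cls;
    int reg;                /* register number, or -1 for memory */
};

/* Where does argument argno at stack offset `offset` travel? */
struct arg_loc arg_location(int argno, int offset, int is_flt, int first_is_flt) {
    struct arg_loc loc;

    assert(offset >= 0 && (offset & 3) == 0);
    if (offset > 12) {                  /* beyond the first four words */
        loc.cls = ARG_MEMORY;
        loc.reg = -1;
    } else if (argno == 0 && is_flt) {
        loc.cls = ARG_FLT_REG;
        loc.reg = 12;
    } else if (argno == 1 && is_flt && first_is_flt) {
        loc.cls = ARG_FLT_REG;
        loc.reg = 14;
    } else {                            /* includes floats sent in integer registers */
        loc.cls = ARG_INT_REG;
        loc.reg = offset/4 + 4;
    }
    return loc;
}
```

The last branch covers the awkward case that the conditional in target guards against: a floating-point argument whose location is an integer register.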
in memory:
(MIPS emit2 446) =
case ARGD: case ARGF: case ARGI: case ARGP:
    ty = optype(p->op);
    if (p->x.argno == 0)
        ty0 = ty;
    q = argreg(p->x.argno, p->syms[2]->u.c.v.i, ty, ty0);
    src = getregnum(p->x.kids[0]);
    if (q == NULL && ty == F)
        print("s.s $f%d,%d($sp)\n", src, p->syms[2]->u.c.v.i);
    else if (q == NULL && ty == D)
        print("s.d $f%d,%d($sp)\n", src, p->syms[2]->u.c.v.i);
    else if (q == NULL)
        print("sw $%d,%d($sp)\n", src, p->syms[2]->u.c.v.i);
    else if (ty == F && q->x.regnode->set == IREG)
        print("mfc1 $%d,$f%d\n", q->x.regnode->number, src);
    else if (ty == D && q->x.regnode->set == IREG)
        print("mfc1.d $%d,$f%d\n", q->x.regnode->number, src);
    break;
If argreg returns null, then the caller passes the argument in memory,
so emit2 stores it, using the offset that doarg computed. The last two
conditionals above emit code for floating-point arguments transmitted in
integer registers. mfc1 x,y copies a single-precision value from
floating-point register y to integer register x. mfc1.d does likewise
for doubles; the target is a register pair.
emit2 and target also collaborate to emit block copies:
(MIPS rules 436) +=
stmt: ARGB(INDIRB(reg)) "# argb %0\n" 1
stmt: ASGNB(reg,INDIRB(reg)) "# asgnb %0 %1\n" 1
reg 403 emi t2's case for ASGNB sets the globals that record the alignment of the
salign 368
source and destination blocks, then lets bl kcopy do the rest:
set 361 ....
stmt
target
403
357
(MIPS emit2 446)+=
case ASGNB:
446 447 ... 444
(MIPS) " 435
(SPARC) " 468 dalign = salign = p->syms[l]->u.c.v.i;
(X86) " 502 blkcopy(getregnum(p->x.kids[O]), 0,
tmpregs 434 getregnum(p->x.kids[l]), 0,
x.argno 359 p->syms[O]->u.c.v.i, tmpregs);
x.kids 359 break;
x.regnode 362
The call trace shown in Figure 13.4 starts in this case. tmpregs holds the
numbers of the three temporary registers, which form the triple register
that progbeg assigned to blkreg. ARGB and ASGNB target their
source-address register to reserve blkreg:
(MIPS target 437) +=
case ASGNB: rtarget(p->kids[1], 0, blkreg); break;
case ARGB: rtarget(p->kids[0], 0, blkreg); break;
This source comes from a grandchild because the intervening child is a
pro forma INDIRB. emit2's case for ARGB is similar to the case for ASGNB:
(MIPS emit2 446) +=
case ARGB:
    dalign = 4;
    salign = p->syms[1]->u.c.v.i;
    blkcopy(29, p->syms[2]->u.c.v.i,
        getregnum(p->x.kids[0]), 0,
        p->syms[0]->u.c.v.i, tmpregs);
    n = p->syms[2]->u.c.v.i + p->syms[0]->u.c.v.i;
    dst = p->syms[2]->u.c.v.i;
    for ( ; dst <= 12 && dst < n; dst += 4)
        print("lw $%d,%d($sp)\n", (dst/4)+4, dst);
    break;
dalign differs because the stack space for the outgoing argument is
always aligned to at least a multiple of four, which is the most that blkcopy
and its helpers can use. The first argument is 29 because the destination
base register is $sp, and the second argument is the stack offset
for the destination block, which doarg computed. If the ARGB overlaps
the first four words of arguments, then the for loop copies the overlap
into the corresponding argument registers to conform with the calling
convention.
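The bounds of the for loop in the ARGB case are easy to misread, so a small model may help. The sketch below repeats the loop's arithmetic with hypothetical names (argb_reload_count, arg_word_reg) and counts the loads instead of printing them.

```c
/* Integer register that carries the argument word at stack offset dst. */
int arg_word_reg(int dst) {
    return dst/4 + 4;
}

/* Number of argument registers reloaded for a block of `size` bytes
   copied to stack offset `offset`; mirrors the loop in the ARGB case. */
int argb_reload_count(int offset, int size) {
    int n = offset + size;
    int dst, count = 0;

    for (dst = offset; dst <= 12 && dst < n; dst += 4)
        count++;
    return count;
}
```

A 12-byte structure at offset 8 overlaps the argument words at offsets 8 and 12, so two loads fill $6 and $7; the word at offset 16 stays in memory only.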
16.3 Implementing Functions
The front end calls local to announce each new local variable:
(MIPS functions 433) +=
static void local(p) Symbol p; {
    if (askregvar(p, rmap[ttob(p->type)]) == 0)
        mkauto(p);
}
Machine-independent routines do most of the work. askregvar allocates
a register if it's appropriate and one is available. Otherwise, mkauto
assigns a stack offset; Figure 16.1 shows the layout of the MIPS stack frame.
The front end calls function to announce each new routine. function
drives most of the back end. It calls gencode, which calls gen, which
[Figure 16.1: layout of the MIPS stack frame. Recoverable labels, from high addresses to low addresses: locals and temporaries; outgoing arguments. The frame spans framesize bytes.]
varargs = variadic(f->type)
    || i > 0 && strcmp(callee[i-1]->name, "va_alist") == 0;
By convention on this machine, there must be a prototype, or the last
argument must be named va_alist. function needs it to determine the
location of some incoming arguments:
(MIPS function 448) +=
for (i = 0; callee[i]; i++) {
    (assign location for argument i 449)
}
Recall that the first four words of arguments (including gaps to satisfy
alignments) are passed in registers $4-$7, except the first argument is
passed in $f12 if it is a float or a double, and the second argument is
passed in $f14 if it is a float or a double and the first argument is passed
in $f12. This calling convention complicates function, particularly the
body of the loop above. It starts by assigning a stack offset to the
argument:
(assign location for argument i 449) =
Symbol p = callee[i];
Symbol q = caller[i];
offset = roundup(offset, q->type->align);
p->x.offset = q->x.offset = offset;
p->x.name = q->x.name = stringd(offset);
r = argreg(i, offset, ttob(q->type), ttob(caller[0]->type));
if (i < 4)
    argregs[i] = r;
offset = roundup(offset + q->type->size, 4);
Even arguments that arrive in a register and remain in one have a
reserved stack slot. Indeed, the offset helps argreg determine which
register holds the argument. argregs[i] records for use below argreg's
result for argument i. All arguments to variadic routines are stored in
the stack because the code addresses them indirectly:
(assign location for argument i 449) +=
if (varargs)
    p->sclass = AUTO;
If the argument arrived in a register and the routine makes no calls that
could overwrite it, then the argument can remain in place if it is neither a
structure, nor accessed indirectly, nor a floating-point argument passed
in an integer register.
(leave argument in place? 449) =
r && ncalls == 0 &&
!isstruct(q->type) && !p->addressed &&
!(isfloat(q->type) && r->x.regnode->set == IREG)
(assign location for argument i 449) +=
else if ((leave argument in place? 449)) {
The conditional succeeds if and only if the argument arrives in one
register and must be moved to another one. For example, if an argument
arrives in $4 but the routine makes calls, then $4 is needed for outgoing
arguments. If the incoming argument is used enough to belong in a
register, the code above arranges the copy. A floating-point argument could
have arrived in an integer register, which is an operation that the front
end can't express, so the code above conforms the type and sclass to
tell the front end to generate nothing, and the fragment (save argument
in a register) on page 453 generates the copy.
The conditional in the last else-if statement above tests up to three
clauses. First, askregvar must allocate a register to the argument:
(copy argument to another register? 450) =
askregvar(p, rmap[ttob(p->type)])
If askregvar fails, then the argument will have to go in memory. If it's
not already there, the fragment (save argument in stack) on page 454 will
put it there. In this case, the two sclass fields are already conformed,
but we don't want to conform the two type fields because a conversion
might be needed. For example, a new-style character argument needs a
conversion on big endians; it is passed as an integer, so its value is in the
least significant bits of the argument word, but it's going to be accessed
as a character, so its value must be moved to the most significant end of
the word on big endians.
The second condition confirms that the argument is already in a reg-
ister:
(copy argument to another register? 450) +=
&& r != NULL
If this condition fails, then the argument arrived in memory and needs to
be loaded into the register that askregvar found. For example, such an
argument might be the last of five integer arguments, which means that
it's passed in memory and thus should be loaded into a register now, if
it's used heavily. askregvar sets p->sclass to REGISTER; q->sclass is
never REGISTER, so falling through with the differing values causes the
front end to generate the load.
The third and last condition confirms that no conversion is needed:
(copy argument to another register? 450) +=
&& (isint(p->type) || p->type == q->type)
For example, if q (the caller) is a double and p is a float, then a CVDF is
needed. In this case, the sclass and type fields differ, so falling through
causes the front end to generate a conversion.
After assigning locations to all arguments, function calls gencode to
select code and allocate registers for the body of the routine:
(MIPS function 448) +=
offset = 0;
For variadic routines, lcc saves the rest of the integer argument registers
too, because the number used varies from call to call:
(MIPS function 448) +=
if (varargs && callee[i-1]) {
    i = callee[i-1]->x.offset + callee[i-1]->type->size;
    for (i = roundup(i, 4)/4; i <= 3; i++)
        print("sw $%d,%d($sp)\n", i + 4, framesize + 4*i);
}
This loop picks up where its predecessor left off and continues until it
has stored the last integer argument register, $7.
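The arithmetic in this loop can be isolated into three small helpers, which makes it easier to see which registers land in which slots. The sketch below is a model with hypothetical names (first_vararg_word, vararg_save_reg, vararg_save_offset, print_vararg_saves). It reproduces only the stores for the unnamed arguments, not the rest of the prologue.

```c
#include <stdio.h>

/* Index of the first argument word after the last named argument. */
int first_vararg_word(int last_offset, int last_size) {
    return ((last_offset + last_size + 3) & ~3) / 4;
}

/* Integer register that carries argument word `word`. */
int vararg_save_reg(int word) {
    return word + 4;
}

/* Stack offset at which that register is saved. */
int vararg_save_offset(int framesize, int word) {
    return framesize + 4*word;
}

/* Print the stores for the remaining integer argument registers. */
void print_vararg_saves(FILE *out, int last_offset, int last_size, int framesize) {
    int i;

    for (i = first_vararg_word(last_offset, last_size); i <= 3; i++)
        fprintf(out, "sw $%d,%d($sp)\n",
                vararg_save_reg(i), vararg_save_offset(framesize, i));
}
```

With one named int argument and a 32-byte frame, print_vararg_saves emits stores of $5, $6, and $7 at offsets 36, 40, and 44.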
For nonvariadic routines, the prologue saves only those argument
registers that are used and that can't stay where they are:
(save argument i 453) =
Symbol out = callee[i];
Symbol in = caller[i];
int rn = r->x.regnode->number;
int rs = r->x.regnode->set;
int tyin = ttob(in->type);
if (out->sclass == REGISTER
&& (isint(out->type) || out->type == in->type)) {
    (save argument in a register 454)
} else {
    (save argument in stack 454)
}
    saved += 8;
}
16.4 Defining Data
defconst emits assembler directives to allocate a scalar and initialize it
to a constant:
(MIPS functions 433) +=
static void defconst(ty, v) int ty; Value v; {
    switch (ty) {
    (MIPS defconst 455)
    }
}
The cases for the integer types emit a size-specific directive and the ap-
propriate constant field:
(MIPS defconst 455) =
case C: print(".byte %d\n", v.uc); return;
case S: print(".half %d\n", v.ss); return;
case I: print(".word 0x%x\n", v.i); return;
case U: print(".word 0x%x\n", v.u); return;
The case for numeric address constants treats them like unsigned inte-
gers:
(MIPS defconst 455) +=
case P: print(".word 0x%x\n", v.p); return;
(MIPS functions 433) +=
static void import(p) Symbol p; {
    if (!isfunc(p->type))
        print(".extern %s %d\n", p->name, p->type->size);
}
defsymbol generates a unique name for local statics to keep from colliding
with other local statics with the same name:
(MIPS defsymbol 457)=
if (p->scope >= LOCAL && p->sclass == STATIC)
...
457 457
else
q->x.name = stringd(q->x.offset);
}
For variables on the stack, address simply computes the adjusted offset.
For variables accessed using a label, it sets x. name to a string of the form
name ± n. If the offset is positive, the literal "+" emits the operator; if
the offset is negative, the %d emits it.
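The name ± n form relies on a small formatting trick: a literal "+" is supplied only when the number will not print its own sign. The sketch below shows the idea with hypothetical names (offset_sign, format_label). Printing a zero offset as +0 is an assumption of this sketch, since the text above describes only the positive and negative cases.

```c
#include <stdio.h>

/* The operator to print before the offset: %d supplies "-" by itself. */
const char *offset_sign(int n) {
    return n >= 0 ? "+" : "";
}

/* Format "name+n" or "name-n" into buf; returns snprintf's result. */
int format_label(char *buf, size_t len, const char *name, int n) {
    return snprintf(buf, len, "%s%s%d", name, offset_sign(n), n);
}
```

For the label x, an offset of 4 yields x+4 and an offset of -8 yields x-8.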
MIPS conventions divide the globals to access small ones faster. MIPS
machines form addresses by adding a register to a signed 16-bit instruc-
tion field, so developing and accessing an arbitrary 32-bit address takes
multiple instructions. To reduce the need for such sequences, translators
put small globals into a special 64K bytes sdata segment. The dedicated
register $gp holds the base address of sdata, so up to 64K bytes of glob-
als can be accessed in one instruction. The -Gn option sets the threshold
gnum:
(MIPS data 433) +=
static int gnum = 8;
(parse -G flag 458) =
parseflags(argc, argv);
for (i = 0; i < argc; i++)
    if (strncmp(argv[i], "-G", 2) == 0)
        gnum = atoi(argv[i] + 2);
The front end calls the interface procedure global to announce a new
global symbol:
(MIPS functions 433) +=
static void global(p) Symbol p; {
    if (p->u.seg == BSS) {
        (define an uninitialized global 459)
    } else {
        (define an initialized global 458)
    }
}
global puts small initialized globals into sdata and the rest into data:
(define an initialized global 458) =
if (p->u.seg == DATA
&& (p->type->size == 0 || p->type->size > gnum))
    print(".data\n");
else if (p->u.seg == DATA)
    print(".sdata\n");
print(".align %c\n", ".01.2...3"[p->type->align]);
print("%s:\n", p->x.name);
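Two details of this fragment are compact enough to deserve a closer look: the size test that picks the segment, and the string that is indexed by an alignment to yield the argument for .align. The sketch below restates both as functions. The names data_segment and align_digit are hypothetical, and the sketch assumes alignments of 1, 2, 4, or 8 bytes.

```c
#include <assert.h>

/* Segment directive for an initialized global of `size` bytes,
   given the small-data threshold gnum. */
const char *data_segment(int size, int gnum) {
    return (size == 0 || size > gnum) ? ".data" : ".sdata";
}

/* Map an alignment in bytes to its base-2 logarithm as a digit.
   Positions 1, 2, 4, and 8 of the string hold '0', '1', '2', and '3'. */
char align_digit(int align) {
    assert(align == 1 || align == 2 || align == 4 || align == 8);
    return ".01.2...3"[align];
}
```

With the default threshold of 8, a 4-byte int goes to .sdata and a 16-byte structure goes to .data; a 4-byte alignment yields the digit 2.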
space emits a directive that reserves a block of memory unless the sym-
bol is in the BSS segment, because global allocates space for BSS sym-
bols:
(MIPS functions 433) +=
static void space(n) int n; {
    if (cseg != BSS)
        print(".space %d\n", n);
}
.space clears the block, which the standard requires of declarations that
use it.
Further Reading
Kane and Heinrich (1992) is a reference manual for the MIPS R3000 series.
lcc's MIPS code generator works on the newer MIPS R4000 series, but it
doesn't exploit the R4000 64-bit instructions.
Exercises
16.1 Why can't small global arrays go into sdata?
16.2 Why must all nonempty argument-build areas be at least 16 bytes
long?
16.3 Explain why the MIPS calling convention can't handle variadic rou-
tines for which the first argument is a float or double.
16.4 Explain why the MIPS calling convention makes it hard to pass struc-
tures reliably in the undeclared suffix of variable length argument
lists. How could this problem be fixed?
16.5 Extend lcc to emit the information about the type and location
of identifiers that your debugger needs to report and change the
values of identifiers. The symbolic back end that appears on the
companion diskette shows how the stab functions are used.
16.6 Page 418 describes ralloc's assumption that all templates clobber
no target register before finishing with all source registers. lcc's
MIPS template for CVID on page 440 satisfies this requirement in
two ways. Describe them.
16.7 Using the MIPS code generator as a model, write a code generator
for another RISC machine, like the DEC Alpha or Motorola PowerPC.
Read Section 19.2 first.
17
Generating SPARC Code
Assembler            Meaning
mov %i0,%o0          Set register o0 to the value in register i0.
sub %i0,%i1,%o0      Set register o0 to register i0 minus register i1.
sub %i0,1,%o0        Set register o0 to register i0 minus one.
ldsb [%i0+4],%o0     Set register o0 to the byte at the address four bytes past the address in register i0.
ldsb [%i0+%i4],%o0   Set register o0 to the byte at the address equal to the sum of registers i0 and i4.
fsubd %f0,%f2,%f4    Set register f4 to register f0 minus register f2. Use double-precision floating-point arithmetic.
fsubs %f0,%f2,%f4    Set register f4 to register f0 minus register f2. Use single-precision floating-point arithmetic.
ba L1                Jump to the instruction labelled L1.
jmp [%i0]            Jump to the address in register i0.
cmp %i0,%i1          Compare registers i0 and i1 and record results in the condition flags.
bl L1                Branch to L1 if the last comparison recorded less-than.
.byte 0x20           Initialize the next byte in memory to hexadecimal 20.
TABLE 17.1 Sample SPARC assembler input lines.
(SPARC functions 466)
(SPARC interface definition 464)
The last fragment configures the front end and points to the SPARC rou-
tines and data in the back end:
(SPARC interface definition 464) =
Interface sparcIR = {
    (SPARC type metrics 465)
    0, /* little_endian */
    1, /* mulops_calls */
    1, /* wants_callb */
    0, /* wants_argb */
    1, /* left_to_right */
    0, /* wants_dag */
    (interface routine names)
    stabblock, 0, 0, stabinit, stabline, stabsym, stabtype,
    {
        1, /* max_unaligned_load */
        (Xinterface initializer 355)
    }
};
17.1 Registers
The SPARC assembler language programmer sees 32 32-bit general reg-
isters. Most are organized as a stack of overlapping register windows.
Most routines allocate a new window to store locals, temporaries, and
outgoing arguments - the calling convention passes some arguments in
registers - and free the window when they return.
The general registers have at least two names each, as shown in
Table 17.2. One is r0-r31, and the other encodes a bit more about how the
register is used and where it goes in a register window. g0 is hard-wired
to zero. Instructions can write it, but the change won't take. When they
read it, they read zero.
[Figure 17.1: register windows for main and f. Recoverable labels: the global registers through g7; main's i, l, and o registers; f's register window with its i, l, and o registers.]
The machine arranges the register windows so that the physical
registers called o0-o7 in each caller are the same registers referred to as
i0-i7 in the callee. Figure 17.1 shows the register windows for
main() { f(); }
f() { return; }
just before f returns. There are 32 general registers, but each call
consumes only 16, because g0-g7 aren't stacked, and the shading shows that
the caller's o0-o7 are the same physical registers as the callee's i0-i7.
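The overlap between windows can be modelled by mapping a call depth and a register number r0-r31 to a slot in one long register file. The sketch below is a simplification under stated assumptions: it ignores the finite number of hardware windows and the wraparound that forces spills, and the name physical_reg and its numbering are hypothetical.

```c
#include <assert.h>

enum { NGLOBALS = 8, WINDOW_STEP = 16 };

/* Slot occupied by register r (r0-r31) at call depth `depth`.
   r0-r7 are g0-g7, r8-r15 are o0-o7, r16-r23 are l0-l7, r24-r31 are i0-i7. */
int physical_reg(int depth, int r) {
    assert(depth >= 0 && 0 <= r && r <= 31);
    if (r < 8)                          /* globals are not stacked */
        return r;
    if (r < 16)                         /* outs: the next window's ins */
        return NGLOBALS + WINDOW_STEP*(depth + 1) + (r - 8);
    if (r < 24)                         /* locals */
        return NGLOBALS + WINDOW_STEP*depth + 8 + (r - 16);
    return NGLOBALS + WINDOW_STEP*depth + (r - 24);     /* ins */
}
```

In this model main runs at depth 0 and f at depth 1, so main's o0 and f's i0 occupy the same slot, while each routine's locals remain private.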
The interface procedure progend does nothing for this target. progbeg
parses the target-specific flags -p and -pg, which have lcc emit code for
the SPARC profilers, but which this book omits. progbeg also initializes
the structures that describe the register set:
(SPARC functions 466) =
static void progbeg(argc, argv) int argc; char *argv[]; {
    int i;
For example, if 13 bits are enough for the signed offset of an ADDRFP
or ADDRLP node, then one instruction can develop the address into a
register:
(SPARC rules 469) +=
stk13: ADDRFP "%a" imm(a)
stk13: ADDRLP "%a" imm(a)
reg: stk13 "add %0,%%fp,%%%c\n" 1
Otherwise, it takes more instructions:
(SPARC rules 469) +=
stk: ADDRFP "set %a,%%%c\n" 2
stk: ADDRLP "set %a,%%%c\n" 2
reg: ADDRFP "set %a,%%%c\nadd %%%c,%%fp,%%%c\n" 3
reg: ADDRLP "set %a,%%%c\nadd %%%c,%%fp,%%%c\n" 3
set is a pseudo-instruction that generates two instructions if the con-
stant can't be loaded in just one, and if one instruction would do, then
stk13 will take care of it. We might have done something similar in
the MIPS code generator, but the MIPS assembler can hide constant size-
checking completely, so we might as well use this feature. The SPARC
assembler leaves at least part of the problem to the programmer or com-
piler, so we had no choice this time.
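The check that the compiler must make here is whether a value fits in a 13-bit signed field. A minimal sketch of that test follows; the name fits_simm13 is hypothetical, and the sketch takes the value directly where a cost function like imm would inspect a node.

```c
/* Does v fit in a 13-bit two's-complement field? */
int fits_simm13(long v) {
    return -4096 <= v && v <= 4095;
}
```

Offsets from -4096 through 4095 can therefore use the one-instruction form, and anything outside that interval needs the longer sequences.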
The four rules above appear equivalent to
stk: ADDRFP "set %a,%%%c\n" 2
stk: ADDRLP "set %a,%%%c\n" 2
reg: stk "add %0,%%fp,%%%c\n" 1
but the shorter rules fail because they ask reduce to store two different
values into one x.inst. Recall that a node's x.inst records as an
instruction the nonterminal that identifies the rule that matches the node,
if there is one. The problem with the short rules above is that the x.inst
field for the ADDRLP or ADDRFP can't identify both stk and reg.
The nonterminal con13 matches small integral constants:
(SPARC rules 469) +=
con13: CNSTC "%a" imm(a)
con13: CNSTI "%a" imm(a)
con13: CNSTP "%a" imm(a)
con13: CNSTS "%a" imm(a)
con13: CNSTU "%a" imm(a)
The instructions that read and write memory cells use address
calculation that can add a register to a 13-bit signed constant:
(SPARC rules 469) +=
base: ADDI(reg,con13) "%%%0+%1"
base: ADDP(reg,con13) "%%%0+%1"
base: ADDU(reg,con13) "%%%0+%1"
If the constant is zero or the register g0, the sum degenerates to a simple
indirect or direct address:
17.2 • SELECTING INSTRUCTIONS 471
(SPARC rules 469) +=
base: reg "%%%0"
(SPARC rules 469) +=
reg: INDIRD(base) "ld2 [%0],%%f%c\n" 2
(SPARC rules 469) +=
reg: con "set %0,%%%c\n" 1
set generates one instruction if the constant fits in 13 bits and two
otherwise. The assembler insulates us from the details.
Most binary instructions that operate on integers can accept a register
or a 13-bit constant as the second source operand:
(SPARC rules 469) +=
rc: con13 "%0"
(SPARC rules 469) +=
reg: NEGI(reg) "neg %%%0,%%%c\n" 1
CVSU needs a 16-bit mask, which won't fit in the instruction as CVCU's
does.
All SPARC unconditional jumps and conditional branches have a
one-instruction delay slot. The instruction after the jump or branch - which
is said to be "in the delay slot" - is always executed, just as if it had been
executed before the jump or branch. For the time being, we'll fill each
delay slot with a harmless nop. The ba instruction targets constant
addresses, and the jmp instruction targets the rest, namely the ones needed
for switch statements.
(SPARC rules 469) +=
addrg: ADDRGP "%a"
(SPARC rules 469) +=
call: ADDRGP "%a"
    p->syms[RX] = intconst(mkactual(4,
        p->syms[0]->u.c.v.i)/4);
}
ARG nodes are executed for side effect, so they don't normally use
syms[RX], but the SPARC calling convention implements ARG nodes with
register targeting or assignment, so using RX is natural.
Targeting arranges to compute the first 24 bytes of arguments into
the registers for outgoing arguments. target calls rtarget to develop
the child into the desired o-register, and then it changes the ARG into a
LOAD into the same register, which emit and moveself optimize away:
(SPARC target 473) +=
case ARGI: case ARGP:
    if (p->syms[RX]->u.c.v.i < 6) {
        rtarget(p, 0, oreg[p->syms[RX]->u.c.v.i]);
        p->op = LOAD+optype(p->op);
        setreg(p, oreg[p->syms[RX]->u.c.v.i]);
    }
    break;
Calls with too many arguments for these registers pass the rest in
memory. To pass an argument in memory, the assembler template undoes
the division and adds 68:
(SPARC rules 469) +=
stmt: ARGI(reg) "st %%%0,[%%sp+4*%c+68]\n" 1
stmt: ARGP(reg) "st %%%0,[%%sp+4*%c+68]\n" 1
sp points at 16 words - 64 bytes - in which the operating system can
store the routine's i- and l-registers when the register windows are
exhausted and some must be spilled. The next word is reserved for the
[Figure: SPARC stack frame. Recoverable labels, toward low addresses: outgoing arguments not in o0-o5; space to save i0-i7 and l0-l7 if necessary.]
    switch (p->op) {
    (SPARC emit2 479)
    }
}
onto the stack. A stack slot is reserved for each outgoing argument, and
the only path from a floating-point register to a general register is via
memory, so emit2 copies the floating-point register into the stack, and
then loads the stack slot into the o-register, unless we're past o5:
(SPARC emit2 479) =
case ARGF: {
    int n = p->syms[RX]->u.c.v.i;
    print("st %%f%d,[%%sp+4*%d+68]\n",
        getregnum(p->x.kids[0]), n);
    if (n <= 5)
        print("ld [%%sp+4*%d+68],%%o%d\n", n, n);
    break;
}
The MIPS code generator avoided this step because it never allocated the
argument registers for any other purpose, but the SPARC convention uses
o1-o5 for temporaries when they aren't holding outgoing arguments.
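The address arithmetic in the argument templates and in the ARGF case above reduces to two facts: argument word n has a reserved slot at 4*n+68 bytes above sp, and only words 0 through 5 also travel in an o-register. The sketch below states both with hypothetical names (arg_slot_offset, arg_in_register).

```c
/* Offset from sp of the stack slot reserved for argument word n. */
int arg_slot_offset(int n) {
    return 4*n + 68;
}

/* Nonzero if argument word n is also passed in an o-register (o0-o5). */
int arg_in_register(int n) {
    return 0 <= n && n < 6;
}
```

The seventh argument word, n equal to 6, is thus the first one that exists only in memory, at offset 92 from sp.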
The first SPARC systems offered no instructions to multiply, divide,
or find remainders, so the standard library supplied equivalent
functions. It is perhaps premature to abandon these systems, so lcc sets
mulops_calls and sticks with the functions even on newer machines
that offer multiplicative instructions (see Exercise 17.1):
(SPARC rules 469) +=
reg: DIVI(reg,reg) "call .div,2; nop\n" 2
reg: DIVU(reg,reg) "call .udiv,2; nop\n" 2
reg: MODI(reg,reg) "call .rem,2; nop\n" 2
reg: MODU(reg,reg) "call .urem,2; nop\n" 2
reg: MULI(reg,reg) "call .mul,2; nop\n" 2
reg: MULU(reg,reg) "call .umul,2; nop\n" 2
target arranges to pass the operands in o0 and o1, and to receive the
result in o0:
(SPARC target 473) +=
case DIVI: case MODI: case MULI:
case DIVU: case MODU: case MULU:
    setreg(p, oreg[0]);
    rtarget(p, 0, oreg[0]);
    rtarget(p, 1, oreg[1]);
    break;
rtarget 400
setreg 399 The library functions allocate no new register window, and instead de-
spill 427
stroy ol-o5:
target 357
(MIPS) " 435 (SPARC clobber 477) +=
...
479 468
(SPARC) " 468
(X86) " 502 case DIV!: case MODI: case MULI:
case DIVU: case MODU: case MULU:
spill(Ox00003e00, !REG, p); break;
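The mask passed to spill can be checked by hand. On SPARC, o0-o7 are r8-r15, so o1-o5 occupy bits 9 through 13 of a mask that has one bit per general register. The sketch below assumes that bit layout; oreg_mask is a hypothetical helper, not an lcc function.

```c
#include <assert.h>

/* On SPARC, %o0..%o7 are %r8..%r15. */
enum { O0 = 8 };

/* Build a mask with one bit per o-register in the range first..last. */
unsigned oreg_mask(int first, int last) {
    unsigned mask = 0;
    int i;
    assert(0 <= first && first <= last && last <= 7);
    for (i = first; i <= last; i++)
        mask |= 1u << (O0 + i);
    return mask;
}
```

Under these assumptions oreg_mask(1, 5) reproduces the constant 0x00003e00 used in the clobber fragment.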
The binary floating-point instructions accept only registers:
(SPARC rules 469)+=
reg: ADDD(reg,reg) "faddd %%f%0,%%f%1,%%f%c\n" 1
reg: ADDF(reg,reg) "fadds %%f%0,%%f%1,%%f%c\n" 1
reg: DIVD(reg,reg) "fdivd %%f%0,%%f%1,%%f%c\n" 1
reg: DIVF(reg,reg) "fdivs %%f%0,%%f%1,%%f%c\n" 1
reg: MULD(reg,reg) "fmuld %%f%0,%%f%1,%%f%c\n" 1
reg: MULF(reg,reg) "fmuls %%f%0,%%f%1,%%f%c\n" 1
reg: SUBD(reg,reg) "fsubd %%f%0,%%f%1,%%f%c\n" 1
reg: SUBF(reg,reg) "fsubs %%f%0,%%f%1,%%f%c\n" 1
Most floating-point unary operators are similar:
ld [%%sp+64],%%%c\n" 3
CVID reverses the process:
(SPARC rules 469)+=
reg: CVID(reg) "st %%%0,[%%sp+64]; ld [%%sp+64],%%f%c; fitod %%f%c,%%f%c\n" 3
CVDI and CVID use the spot reserved for the address of the structure re-
turn block for any callees. The spot is unused except between the branch
delay slot of a call instruction and the callee's prologue instruction that
allocates a new stack frame. No CVDI or CVID can appear in any such
interval.
The floating-point comparisons have one delay slot after the branch,
and another after the comparison:
(SPARC rules 469)+=
rel: EQD(reg,reg) "fcmped %%f%0,%%f%1; nop; fbue"
rel: EQF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbue"
rel: GED(reg,reg) "fcmped %%f%0,%%f%1; nop; fbuge"
rel: GEF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbuge"
rel: GTD(reg,reg) "fcmped %%f%0,%%f%1; nop; fbug"
rel: GTF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbug"
rel: LED(reg,reg) "fcmped %%f%0,%%f%1; nop; fbule"
rel: LEF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbule"
rel: LTD(reg,reg) "fcmped %%f%0,%%f%1; nop; fbul"
rel: LTF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbul"
rel: NED(reg,reg) "fcmped %%f%0,%%f%1; nop; fbne"
rel: NEF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbne"
stmt: rel "%0 %a; nop\n" 4
NEGD is similar. One instruction copies the first word and changes the
sign bit in transit. The other instruction copies the second word:
(SPARC rules 469)+=
reg: NEGD(reg) "# NEGD\n" 2
   break;
}
Finally, emit2 calls blkcopy to generate code to copy a block of memory:
(SPARC rules 469)+=
stmt: ASGNB(reg,INDIRB(reg)) "# ASGNB\n"
Figure 13.4 traces the block-copy generator in action for the MIPS target,
but the SPARC code differs only cosmetically. The SPARC instruction
set has no unaligned loads or stores, but this is moot here because the
example in the figure doesn't use the MIPS unaligned loads and stores
anyway. Recall that salign, dalign, and x.max_unaligned_load collabo-
rate to copy even unaligned blocks, so the target-specific code can ignore
this complication. The g-registers aren't being used, so the emitted code
can use g1-g3 as temporaries; the MIPS code was trickier because the
conventions there made it harder to acquire so many registers at once.
emit2 omits the usual case for ARGB because wants_argb is zero on
this target.
3. It precedes each return with an ASGNB that copies the block ad-
dressed by the child of the return into the block addressed by the
first local.
The front end announces this local like any other, and the back end ar-
ranges for it to address the stack slot reserved for the location of struc-
ture return blocks:
(structure return block? 484)=
if (retstruct) {
   p->x.name = stringd(4*16);
   p->x.offset = 4*16;
   retstruct = 0;
   return;
}
varargs = variadic(f->type)
   || i > 0 && strcmp(callee[i-1]->name,
      "__builtin_va_alist") == 0;
The SPARC convention either declares the routine variadic or uses a
macro that names the last argument __builtin_va_alist.
function clears the back end's record of busy registers:
17.3 • IMPLEMENTING FUNCTIONS 485
(SPARC function 484)+=
(clear register state 410)
maxargoffset holds the size of the stack block for outgoing arguments.
function reserves space for at least o0-o5:
(SPARC function 484)+=
maxargoffset = 24;
In the first case, function must generate code itself to store the param-
eter if it arrived in a register; the front end can't help because lcc's
intermediate code gives it no way to store the floating-point value from
an integer register.
If the parameter is integral and arrived in an i-register, it still belongs
in memory if its address is taken or if the routine is variadic:
(classify SPARC parameter 486)+=
else if (p->addressed || varargs)
Now to call gencode in the front end, which calls gen in the back end.
First, function clears offset to record that no locals have been assigned
to the stack yet, it clears maxoffset to track the largest value of offset,
and it flags each function that returns an aggregate because local must
treat its first local specially:
(SPARC data 467)+=
static int retstruct;
(SPARC function 484)+=
offset = maxoffset = 0;
retstruct = isstruct(freturn(f->type));
gencode(caller, callee);
When gencode completes the first code-generation pass and returns,
function can compute the size of the frame and of the argument-build
block, in which the outgoing arguments are marshaled. The size of the
argument-build area must be a multiple of four, or some stack fragments
will be unaligned. The frame size must be a multiple of eight, and in-
cludes space for the locals, the argument-build area, 16 words in which
to save i0-i7 and l0-l7, and one word to store the address of any ag-
gregate return block:
(SPARC function 484)+=
maxargoffset = roundup(maxargoffset, 4);
framesize = roundup(maxoffset + maxargoffset + 4*(16+1), 8);
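A small worked example may make the arithmetic concrete. The sketch below assumes that roundup rounds its first argument up to a multiple of a power-of-two second argument, which is how the macro is used here; sparc_framesize is a hypothetical wrapper around the two statements above, not an lcc function.

```c
#include <assert.h>

/* Round x up to a multiple of n, where n is a power of two (assumed). */
#define roundup(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

/* Hypothetical helper: frame size in bytes, given the space used by
   locals and by the argument-build area. */
int sparc_framesize(int maxoffset, int maxargoffset) {
    assert(maxoffset >= 0 && maxargoffset >= 0);
    maxargoffset = roundup(maxargoffset, 4);
    /* 16 words for i0-i7 and l0-l7, plus one word for the address of
       an aggregate return block. */
    return roundup(maxoffset + maxargoffset + 4*(16 + 1), 8);
}
```

With no locals and the minimum 24-byte argument-build area, the sum is 92 bytes, which rounds up to a 96-byte frame.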
function emits code that saves time by allocating no new frame or
register window for routines that don't need them:
(SPARC function 484)+=
leaf = ((is this a simple leaf function? 487));
The constraints are many. The routine must make no calls:
(is this a simple leaf function? 487)=
!ncalls
It must have no locals or formals in memory:
(is this a simple leaf function? 487)+=
&& !maxoffset && !autos
It must not return a structure, because such functions use a frame
pointer in order to access the cell that holds the location of the return
block:
(is this a simple leaf function? 487)+=
&& !isstruct(freturn(f->type))
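Taken together, the three tests quoted so far form a single conjunction. The sketch below restates only those three; the struct and its field meanings are hypothetical (in particular, treating autos as a count of formals kept in memory is an assumption), and the real predicate may include further conditions not shown here.

```c
#include <assert.h>

/* Hypothetical summary of the facts that the three tests consult. */
struct routine {
    int ncalls;         /* number of calls in the body */
    int maxoffset;      /* bytes of locals assigned to the stack */
    int autos;          /* formals kept in memory (assumed meaning) */
    int returns_struct; /* nonzero if the return type is a structure */
};

/* Nonzero if the routine passes all three tests. */
int is_simple_leaf(const struct routine *r) {
    assert(r != 0);
    return !r->ncalls
        && !r->maxoffset && !r->autos
        && !r->returns_struct;
}
```

Any single nonzero field disqualifies the routine, which is why a routine that returns a structure loses the optimization even when it makes no calls.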
Most continue with a save instruction, which allocates a new register win-
dow and adds a register or constant to a register. Most uses of save add
a negative constant to sp, which allocates a new frame on the downward-
growing stack:
(SPARC function 484)+=
if (leaf) {
(SPARC functions 466)+=
#define exch(x, y, t) (((t) = x), ((x) = (y)), ((y) = (t)))
      ireg[reg++]->x.regnode->number, offset);
   print("st %%r%d,[%%fp+%d]\n",
      ireg[reg++]->x.regnode->number, offset + 4);
} else if (isfloat(p->type) && reg <= 5)
   print("st %%r%d,[%%fp+%d]\n",
      ireg[reg++]->x.regnode->number, offset);
else
   reg++;
offset += roundup(p->type->size, 4);
}
isfloat succeeds for floats and doubles, so the first else arm above saves
not just floats but also the first half of any double that arrives in i5; the
second half will be in memory already, courtesy of the caller.
Finally, function emits some profiling code (not shown), the body of
the current routine, and the epilogue. The general epilogue is a ret
instruction, which jumps back to the caller, and a restore instruction
in the ret's delay slot, which undoes the prologue's save instruction. If
the routine does without a register window and stack frame, there's no
save to undo, but another rename is needed to restore normality to the
names and numbers of the i-registers:
(SPARC function 484)+=
(emit profiling code)
emitcode();
if (!leaf)
   print("ret; restore\n");
else {
   rename();
   print("retl; nop\n");
}
ret and retl are both pseudo-instructions that emit an indirect branch
using the register that holds the return address. They need different
names because ret uses i7, and retl uses o7 to name the same register
because no register stack frame was pushed.
The front end calls import to make visible in the current module a sym-
bol defined in another module. The SPARC assembler assumes that un-
defined symbols are external, so the SPARC import has nothing to do:
(SPARC functions 466)+=
static void import(p) Symbol p; {}
The front end calls defsymbol to announce a new symbol and cue the
back end to initialize the x.name field. The SPARC conventions generate a
name for local statics and use the source name for the rest. The SPARC
link editor leaves symbols starting with L out of the symbol table, so
defsymbol prefixes L to generated symbols. It prefixes an underscore to
the rest, following another SPARC convention:
(SPARC functions 466)+=
static void defsymbol(p) Symbol p; {
segment tracks the current segment in cseg for the interface procedure
space, which emits the SPARC .skip assembler directive to reserve n
bytes of memory for an initialized global or static:
(SPARC data 467)+=
static int cseg;
(SPARC functions 466)+=
static void space(n) int n; {
   if (cseg != BSS)
      print(".skip %d\n", n);
}
.skip arranges to clear the space that it allocates, which the standard
requires.
If we're in the BSS segment, then the interface procedure global can
define the label and reserve space in one fell swoop, using .common for
external symbols and .reserve for the rest:
(SPARC functions 466)+=
static void global(p) Symbol p; {
   print(".align %d\n", p->type->align);
   if (p->u.seg == BSS
   && (p->sclass == STATIC || Aflag >= 2))
      print(".reserve %s,%d\n", p->x.name, p->type->size);
   else if (p->u.seg == BSS)
      print(".common %s,%d\n", p->x.name, p->type->size);
   else
      print("%s:\n", p->x.name);
}
It also emits an alignment directive and, for initialized globals, the la-
bel. .common also exports the symbol and marks it so that the loader
generates only one common global even if other modules emit .common
directives for the same identifier. .reserve takes neither step. Statics
use it to avoid the export, and the scrupulous double -A option uses it to
have the loader complain when multiple modules define the same global.
Pre-ANSI C permitted multiple definitions, but ANSI C technically expects
exactly one definition; other modules should use extern declarations in-
stead.
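The three-way choice made by global can be isolated as a pure function. The sketch below is illustrative only: global_directive is a hypothetical name, and the enumeration values for segments and storage classes are placeholders rather than lcc's own definitions.

```c
#include <assert.h>

enum { SEG_CODE = 1, SEG_BSS, SEG_DATA, SEG_LIT };  /* placeholder values */
enum { SC_AUTO, SC_STATIC, SC_EXTERN };             /* placeholder values */

/* Name the directive that would introduce the symbol; "label" stands
   for the plain "name:" form used outside the BSS segment. */
const char *global_directive(int seg, int sclass, int Aflag) {
    assert(seg >= SEG_CODE && seg <= SEG_LIT);
    if (seg == SEG_BSS && (sclass == SC_STATIC || Aflag >= 2))
        return ".reserve";   /* private, or strict double -A checking */
    if (seg == SEG_BSS)
        return ".common";    /* exported, merged by the loader */
    return "label";
}
```

An uninitialized external without the double -A option gets .common; the same symbol with Aflag at 2 or more gets .reserve, so that duplicate definitions in other modules surface as loader errors.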
If the block's size is not divisible by eight, then an initial blkcopy copies
the stragglers:
(SPARC blkloop 494)+=
blkcopy(tmps[2], doff, sreg, soff, size&7, tmps);
The loop decrements registers sreg and tmps[2] by eight for each itera-
tion. It does tmps[2] immediately, but pushes sreg's decrement forward
to fill the branch delay slot at the end of the loop:
(SPARC blkloop 494)+=
print("1: dec 8,%%r%d\n", tmps[2]);
The loop next calls blkcopy to copy eight bytes from the source to the
destination. The source offset is adjusted to account for the fact that
sreg should've been decremented by now:
(SPARC blkloop 494)+=
blkcopy(tmps[2], doff, sreg, soff - 8, 8, tmps);
Finally, the loop continues if more bytes remain:
(SPARC blkloop 494)+=
print("cmp %%r%d,%%r%d; ", tmps[2], dreg);
print("bgt 1b; ");
print("dec 8,%%r%d\n", sreg);
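The control flow of the emitted loop is easier to follow when simulated in C. The sketch below is a model, not lcc code: blkloop_sim is a hypothetical name, the pointers d, s, and t play the roles of dreg, sreg, and tmps[2], and it assumes that both pointers were first advanced by size&~7, which the fragments above imply but do not show. It also assumes a block of at least eight bytes.

```c
#include <assert.h>
#include <string.h>

/* Simulate the emitted copy loop on byte arrays. */
void blkloop_sim(unsigned char *d, const unsigned char *s, size_t size) {
    size_t whole = size & ~(size_t)7;   /* bytes moved eight at a time */
    unsigned char *t = d + whole;       /* tmps[2] = dreg + (size&~7) */
    assert(size >= 8);
    s += whole;                         /* sreg += size&~7 */
    memcpy(t, s, size & 7);             /* initial blkcopy: the stragglers */
    do {
        t -= 8;                         /* "1: dec 8" on tmps[2] */
        memcpy(t, s - 8, 8);            /* blkcopy with source offset soff-8 */
        s -= 8;                         /* sreg's decrement, in the delay slot */
    } while (t > d);                    /* "cmp; bgt 1b" */
}
```

The copy runs from high addresses to low. Because the delay-slot instruction executes whether or not the branch is taken, decrementing s before the loop test models it faithfully.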
Further Reading
The SPARC reference manual elaborates on the architecture of this ma-
chine (SPARC International 1992). Patterson and Hennessy (1990) explain
the reasons behind delay slots. Krishnamurthy (1990) surveys the liter-
ature in instruction scheduling, which fills delay slots.
Exercises
17.1 Add a flag that directs the back end to emit instructions instead of
calls to multiply and divide signed and unsigned integers.
17.2 Adapt lcc's SPARC code generator to make better use of g1-g7 and
to keep some floating-point variables in floating-point registers. Re-
call that the calling convention and thus all previously compiled
library routines preserve none of these registers.
17.3 Find some use for at least some of the delay slots after uncondi-
tional jumps. For example, the slot after an unconditional jump
can be filled with a copy of the instruction at the jump target, and
the jump can be rewritten to target the next instruction. Some opti-
mizations require buffering code and making an extra pass over it.
The MIPS R3000 architecture has such delay slots too, but the stan-
dard assembler reorders instructions to fill them with something
more useful, so we could ignore the problem there.
17.4 Find some use for at least some of the delay slots after conditional
branches. It may help to exploit the annul bit, which specifies that
the instruction in the delay slot is to have no effect unless the
branch is conditional and taken. Set the annul bit by appending
,a to the opcode (e.g., be,a L4).
17.5 Some SPARC chips stall for at least one clock cycle when a load
instruction immediately precedes an instruction that uses the value
loaded. The object code would run just as fast with a single nop
after the load, though it would be one word longer. Reorder the
emitted assembler code to eliminate at least some of these stalls.
Proebsting and Fischer (1991) describe one solution.
17.6 Some leaf routines need no register window, but still lose the leaf
optimization because they need a frame pointer. For example, some
functions that return structures need no window, but do use a
frame pointer. Change lcc to generate a frame but no register win-
dow for such routines.
17.7 The SPARC code generator includes idiosyncratic code to ensure
that the spiller can emit code to store a register when no allocable
registers are free. Devise a short test program that exercises this
code.
18
Generating X86 Code
This book uses the name X86 for machines compatible for the purposes
of code generation with the Intel 386 architecture, which include the
Intel 486 and Pentium architectures, plus clones from manufacturers like
AMD and Cyrix. The lburg specification uses approximate Intel 486 cycle
counts for costs, which often but not always gives the best result for
compatibles. Some costs are omitted because they aren't needed. For
example, if only one rule matches some operator, there is no need for
costs to break ties between derivations.
The X86 architecture is a CISC, or complex instruction set computer.
It has a large set of variable-length instructions and addressing modes.
It has eight 32-bit largely general registers and eight 80-bit floating-point
registers organized as a stack.
There are many C compilers for the X86, and their conventions (e.g.,
for calling functions and returning values) differ. The code generator in
this chapter uses the conventions of Borland C++ 4.0. That is, it interop-
erates with Borland's standard include files, libraries, and linker. Using
lcc with other X86 environments may require a few changes; documen-
tation on the companion diskette elaborates.
There are many X86 assemblers, and they don't all use the same syn-
tax. lcc works with Microsoft's MASM 6.11 and Borland's Turbo Assem-
bler 4.0. That is, it emits code in the intersection of the languages ac-
cepted by these two assemblers. Both have instructions that list the des-
tination operand before the source operand. The registers have names
instead of numbers. Table 18.1 describes enough sample instructions to
get us started.
The file x86.c collects all X86-specific code and data. It's an lburg
specification with the interface routines after the grammar:
(x86.md 496)=
%{
(X86 macros 498)
(lburg prefix 375)
(interface prototypes)
(X86 prototypes)
(X86 data 499)
%}
(terminal declarations 376)
%%
(shared rules 400)
Assembler                   Meaning
mov al,byte ptr 8           Set register al to the byte at address 8.
mov dword ptr 8[edi*4],1    Set to one the 32-bit word at the address
                            formed by adding eight to the product of
                            register edi and four.
sub eax,7                   Subtract seven from register eax.
fsub qword ptr x            Subtract the double-precision
                            floating-point value in the memory cell
                            labelled x from the top of the
                            floating-point stack.
jmp L1                      Jump to the instruction labelled L1.
cmp dword ptr x,7           Compare the 32-bit word at address x with
                            seven and record the results in the
                            condition flags.
jl L1                       Branch to L1 if the last comparison
                            recorded less-than.
dword 020H                  Initialize the next 32-bit word in memory
                            to hexadecimal 20.
The last fragment configures the front end and points to the X86-specific
routines and data in the back end:
(X86 interface definition 497)=
Interface x86IR = {
1, 1, 0, /* char */
2, 2, 0, /* short */
4, 4, 0, /* int */
4, 4, 1, /* float */
8, 4, 1, /* double */
4, 4, 0, /* T * */
0, 4, 0, /* struct; so that ARGB keeps stack aligned */
1, /* little_endian */
0, /* mulops_calls */
0, /* wants_callb */
1, /* wants_argb */
0, /* left_to_right */
0, /* wants_dag */
(interface routine names)
(symbol-table emitters 498)
The MIPS and SPARC conventions evaluate arguments left to right, but
the X86 conventions evaluate them right to left, which is why the inter-
face flag left_to_right is zero.
X86 conventions offer no standard way for compilers to encode sym-
bol tables in assembler code for debuggers, so lcc's X86 back end in-
cludes no symbol-table emitters:
(symbol-table emitters 498)=
0, 0, 0, 0, 0, 0, 0,
18.1 Registers
The X86 architecture includes eight general registers. Assemblers typi-
cally refer to them by a name (eax, ecx, edx, ebx, esp, ebp, esi, and edi)
rather than by a number. lcc's register allocator needs a number to
compute shift distances for register masks, so lcc borrows the encoding
from the binary representation of some instructions:
(X86 macros 498)=
enum { EAX=0, ECX=1, EDX=2, EBX=3, ESI=6, EDI=7 };
Conventions reserve ebp for the frame pointer and esp for the stack
pointer, so lcc doesn't allocate them.
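The phrase "shift distances for register masks" can be illustrated directly. The sketch below assumes the usual convention that register number r corresponds to bit r of a mask; allocable_mask and in_mask are hypothetical helpers, not lcc functions.

```c
#include <assert.h>

/* Register numbers taken from the instruction encoding. */
enum { EAX=0, ECX=1, EDX=2, EBX=3, ESI=6, EDI=7 };

/* Mask with one bit per allocable general register. */
unsigned allocable_mask(void) {
    return 1u<<EAX | 1u<<ECX | 1u<<EDX | 1u<<EBX | 1u<<ESI | 1u<<EDI;
}

/* Nonzero if register number r is present in mask. */
int in_mask(unsigned mask, int r) {
    assert(r >= 0 && r < 32);
    return (int)((mask >> r) & 1u);
}
```

Numbers 4 and 5, which the hardware encoding assigns to esp and ebp, have no enumerator, so their bits stay clear and the resulting mask is 0xcf.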
progbeg builds the structures that describe the registers:
(X86 functions 498)=
static void progbeg(argc, argv) int argc; char *argv[]; {
   int i;
Assembler code uses different names for the full 32-bit register and
its low order 8- and 16-bit subregisters. For example, assembler code
uses eax for the first 32-bit register, ax for its bottom half, and al for
its bottom byte. This rule requires initializing separate register vectors
for shorts and characters:
shortreg[ECX] = mkreg("cx", ECX, 1, IREG);
shortreg[EDX] = mkreg("dx", EDX, 1, IREG);
shortreg[EBX] = mkreg("bx", EBX, 1, IREG);
shortreg[ESI] = mkreg("si", ESI, 1, IREG);
shortreg[EDI] = mkreg("di", EDI, 1, IREG);
(X86 progbeg 499)+=
charreg[EAX] = mkreg("al", EAX, 1, IREG);
charreg[ECX] = mkreg("cl", ECX, 1, IREG);
charreg[EDX] = mkreg("dl", EDX, 1, IREG);
charreg[EBX] = mkreg("bl", EBX, 1, IREG);
No instructions address the bottom byte of esi or edi, so there is no byte
version of those registers. Byte instructions can address the top half of
each 16-bit register, but lcc does without these byte registers because
using them would complicate code generation. For example, CVCI would
need to generate one sequence of instructions when the operand is in
the low-order byte and another sequence when the operand is next door.
Table 18.2 summarizes the allocable registers.
The floating-point registers are organized as a stack. Some operands
of some instructions can address an arbitrary floating-point register,
from the top down, but some crucial instructions effectively assume a
stack. For example, all variants of floating-point addition require at least
one operand to be atop the stack. The assembler operand st denotes the
top of the stack, and st(1) denotes the value underneath it. Pushing a
value on the stack causes st to denote a new cell and st(1) to denote
the cell previously denoted by st.
lcc was tailored for registers with fixed names, not names that change
as a stack grows and shrinks. The X86 floating-point registers violate
these assumptions, so lcc disables its register allocator for the X86
floating-point registers and lets the instructions manage the registers.
For example, a load instruction pushes a value onto the stack and thus ef-
fectively allocates a register; an addition pops two operands and pushes
their sum, so it effectively releases two registers and allocates one.
The register allocator can't be disabled by simply clearing the entries
in rmap for floats and doubles. If a node yields a value, then the reg-
ister allocator assumes that it needs a register, and expects the node's
syms [RX] to give a register class. So we need a representation of the
floating-point registers, but the representation needs to render the reg-
ister allocator harmless. One easy way to do this is to create registers
with zero masks, which causes getreg to succeed always and to change
no significant compiler state:
(X86 progbeg 499)+=
for (i = 0; i < 8; i++)
(X86 progbeg 499)+=
tmask[FREG] = 0xff;
vmask[FREG] = 0;
progbeg also emits some boilerplate required to assemble and link the
emitted code:
(X86 progbeg 499)+=
print(".486\n");
(X86 functions 498)+=
static void segment(n) int n; {
   if (n == cseg)
      return;
   if (cseg == CODE)
      print("_TEXT ends\n");
   else if (cseg == DATA || cseg == BSS || cseg == LIT)
      print("_DATA ends\n");
   cseg = n;
   if (cseg == CODE)
      print("_TEXT segment\n");
   else if (cseg == DATA || cseg == BSS || cseg == LIT)
      print("_DATA segment\n");
}
export needs a directive that must appear between segments. CODE, DATA,
LIT, and BSS are all positive, so export can use segment(0) to close the
active segment without opening a new one.
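The effect of segment(0) can be observed with a self-contained model of the routine. In the sketch below the segment codes are assumed values (only their being positive matters), and a string buffer with the hypothetical helpers emit and output stands in for lcc's print.

```c
#include <assert.h>
#include <string.h>

enum { CODE = 1, BSS, DATA, LIT };  /* positive; exact values assumed */

static int cseg;                    /* 0 means "between segments" */
static char out[256];               /* stands in for the output stream */

static void emit(const char *s) {
    assert(strlen(out) + strlen(s) < sizeof out);
    strcat(out, s);
}

/* Close the current segment, if any, and open segment n, if nonzero. */
void segment(int n) {
    if (n == cseg)
        return;
    if (cseg == CODE)
        emit("_TEXT ends\n");
    else if (cseg == DATA || cseg == BSS || cseg == LIT)
        emit("_DATA ends\n");
    cseg = n;
    if (cseg == CODE)
        emit("_TEXT segment\n");
    else if (cseg == DATA || cseg == BSS || cseg == LIT)
        emit("_DATA segment\n");
}

const char *output(void) { return out; }
```

Because zero matches none of the segment codes, segment(0) emits only the closing directive. Note also that moving from DATA to LIT closes and reopens _DATA, since the routine compares segment codes rather than assembler segment names.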
progbeg clears cseg, which records that the back end is between seg-
ments:
(X86 progbeg 499)+=
cseg = 0;
progend emits boilerplate that closes the current segment and the en-
tire assembler program:
The for loop pops the source registers, and the subsequent if statement
pushes any result. Floating-point instructions done for side effect,
such as assignments and conditional branches, push nothing. ckstack
directs the programmer to simplify the expression to avoid the spill. lcc
merely reports the error because such spills are rare, so reports are un-
likely to irritate users. If lcc ignored the problem completely, however,
it would silently emit incorrect code for some programs, which is unac-
ceptable. Exercises 18.8 and 18.9 explore related matters.
Name    What It Matches
acon    address constants
addr    address calculations for instructions that read and write memory
addrj   address calculations for instructions that jump
base    unindexed address calculations
cmpf    floating-point comparands
con     constants
con1    the integer constant 1
con2    the integer constant 2
con3    the integer constant 3
flt     floating-point operands
index   indexed address calculations
mem     memory cells used by general-purpose operators
memf    memory cells used by floating-point operators
mr      memory cells and registers
mrc0    memory cells, registers, and constants whose memory cost is 0
mrc1    memory cells, registers, and constants whose memory cost is 1
mrc3    memory cells, registers, and constants whose memory cost is 3
rc      registers and constants
rc5     register cl and constants between 0 and 31 inclusive
reg     computations that yield a result in a register
stmt    computations done for side effect
A base address may be an ADDRGP or the sum of an acon and one of the
general registers. The assembler syntax puts the register name in square
brackets:
(X86 rules 503)+=
base: ADDRGP "%a"
base: reg "[%0]"
If the register is the frame pointer, the same operation computes the
address of a formal or local:
(X86 rules 503)+=
base: ADDRFP "%a[ebp]"
base: ADDRLP "%a[ebp]"
Some addresses use an index, which is a register scaled by one, two, four,
or eight:
(X86 rules 503)+=
index: reg "%0"
index: LSHI(reg,con1) "%0*2"
index: LSHI(reg,con2) "%0*4"
index: LSHI(reg,con3) "%0*8"
con1: CNSTI "1" range(a, 1, 1)
con1: CNSTU "1" range(a, 1, 1)
con2: CNSTI "2" range(a, 2, 2)
con2: CNSTU "2" range(a, 2, 2)
con3: CNSTI "3" range(a, 3, 3)
con3: CNSTU "3" range(a, 3, 3)
Recall that cost expressions are evaluated in a context in which a denotes
the node being labelled, which here is the constant value being compared
with small integers. The unsigned shifts to the left are equivalent to the
integer shifts:
(X86 rules 503)+=
index: LSHU(reg,con1) "%0*2"
index: LSHU(reg,con2) "%0*4"
index: LSHU(reg,con3) "%0*8"
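These rules work because a left shift by one, two, or three is a multiplication by two, four, or eight, which the addressing hardware performs for free. A minimal sketch of the address arithmetic follows; indexed_address is a hypothetical name, and the function models only the displacement-plus-scaled-index part of an operand such as 8[edi*4].

```c
#include <assert.h>
#include <stdint.h>

/* Displacement plus an index register shifted left by 0..3, that is,
   scaled by 1, 2, 4, or 8. Arithmetic wraps at 32 bits. */
uint32_t indexed_address(uint32_t disp, uint32_t index, int shift) {
    assert(shift >= 0 && shift <= 3);
    return disp + (index << shift);
}
```

For the operand 8[edi*4] with edi holding 5, the shift is 2 and the address is 28.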
A general address may be a base address or the sum of a base address
and an index. The front end puts index operations on the left; see Sec-
tion 9.7.
(X86 rules 503)+=
addr: base "%0"
   if (generic(p->kids[1]->kids[0]->op) == INDIR
   && sametree(p->kids[0], p->kids[1]->kids[0]->kids[0]))
      return 3;
   else
      return LBURG_MAX;
}
memop confirms the overall shape of the tree, and sametree confirms that
the destination is the same as the first source operand:
(X86 functions 498)+=
static int sametree(p, q) Node p, q; {
(X86 rules 503)+=
reg: BCOMU(reg) "?mov %c,%0\nnot %c\n" 2
   if (generic(p->kids[1]->op) != CNST
   && !((is p->kids[1] a constant common subexpression? 508))) {
      rtarget(p, 1, intreg[ECX]);
      setreg(p, intreg[EAX]);
   }
   break;
The call on setreg above ensures that this node doesn't target ecx. If it
did, the mov instruction that starts the template would clobber ecx and
thus cl before its value has been used. eax is not the only acceptable
register, but non-constant shift amounts were rare in our tests, so it
wasn't worth tailoring a wildcard without ecx for these shifts.
The imul instruction multiplies signed integers. One variant multiplies
a register by a register, constant, or memory cell:
(X86 rules 503)+=
reg: MULI(reg,mrc3) "?mov %c,%0\nimul %c,%1\n" 14
Another variant takes three operands and leaves in a register the product
of a constant and a register or memory cell:
(X86 rules 503)+=
reg: MULI(con,mr) "imul %c,%1,%0\n" 13
It expects its first operand in eax and leaves its result in the double
register edx-eax; eax holds the low-order bits, which is the result of the
operation, unless the operation overflows, in which case ANSI calls the
result undefined, so eax is as good a result as any:
(X86 target 508)+=
case MULU:
   setreg(p, quo);
   rtarget(p, 0, intreg[EAX]);
   break;
quo and rem denote the eax-edx register pair, which hold a product after
an unsigned multiplication and a dividend before a division. After a
division, eax holds the quotient and edx the remainder.
(X86 data 499)+=
static Symbol quo, rem;
(X86 target 508)+=
case DIVI: case DIVU:
   setreg(p, quo);
   rtarget(p, 0, intreg[EAX]);
   rtarget(p, 1, intreg[ECX]);
   break;
case MODI: case MODU:
   setreg(p, rem);
   rtarget(p, 0, intreg[EAX]);
   rtarget(p, 1, intreg[ECX]);
   break;
An xor instruction clears edx to prepare for an unsigned division:
(X86 rules 503)+=
reg: DIVU(reg,reg) "xor edx,edx\ndiv %1\n"
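The need to clear edx follows from what the div instruction divides: the 64-bit value whose high half is edx and whose low half is eax. The sketch below models that behavior in portable C; x86_div is a hypothetical name for the model, not an lcc routine.

```c
#include <assert.h>
#include <stdint.h>

/* Model of unsigned div: divide the 64-bit value edx:eax by a 32-bit
   divisor, leaving the quotient in eax and the remainder in edx. */
void x86_div(uint32_t *eax, uint32_t *edx, uint32_t divisor) {
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    assert(divisor != 0);
    /* The hardware faults if the quotient does not fit in 32 bits. */
    assert(dividend / divisor <= UINT32_MAX);
    *eax = (uint32_t)(dividend / divisor);
    *edx = (uint32_t)(dividend % divisor);
}
```

If edx held leftover bits, the dividend would not be the 32-bit operand in eax. Zeroing edx first makes the 64-bit dividend equal to eax, so the quotient and remainder are those of the 32-bit division.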
(X86 rules 503)+=
reg: CVCI(reg) "# extend\n" 3
if (p->op == CVCI)
   print("movsx %s,%s\n", (result 511), preg(charreg));
else if (p->op == CVCU)
   print("movzx %s,%s\n", (result 511), preg(charreg));
else if (p->op == CVSI)
   print("movsx %s,%s\n", (result 511), preg(shortreg));
else if (p->op == CVSU)
   print("movzx %s,%s\n", (result 511), preg(shortreg));
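The difference between the two instructions is only in how the upper bits are filled. The sketch below models the byte-to-word case in portable C; widen_signed and widen_unsigned are hypothetical names standing for movsx and movzx applied to the low byte of a register.

```c
#include <stdint.h>

/* movsx: widen the low byte by copying its sign bit upward. */
int32_t widen_signed(uint32_t reg) {
    int32_t v = (int32_t)(reg & 0xff);
    return v >= 0x80 ? v - 0x100 : v;
}

/* movzx: widen the low byte by clearing the upper 24 bits. */
uint32_t widen_unsigned(uint32_t reg) {
    return reg & 0xff;
}
```

For a register whose low byte is 0xf0, the signed widening yields -16 and the unsigned widening yields 240; the upper bits of the source register never matter.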
The integral narrowing conversions also require special treatment:
(X86 rules 503)+=
reg: CVIC(reg) "# truncate\n" 1
(X86 rules 503)+=
stmt: ASGNB(reg,INDIRB(reg)) "mov ecx,%a\nrep movsb\n"
   rtarget(p->kids[0], 0, intreg[ESI]);
   break;
The destination is fixed to be the top of the stack, so the template starts
by allocating a block atop the stack and pointing edi at it:
(X86 rules 503)+=
stmt: ARGB(INDIRB(reg)) "sub esp,%a\nmov edi,esp\n_
If the code needs the integral result in a (general) register, then we'll
create, use, and free a temporary on the stack in memory:
(X86 rules 503)+=
reg: CVDI(reg) "sub esp,4\n_
fistp dword ptr 0[esp]\npop %c\n" 31
The fild instruction loads an integer, converts it to an 80-bit floating-
point value, and pushes it onto the floating-point stack:
(X86 rules 503)+=
reg: CVID(INDIRI(addr)) "fild dword ptr %0\n" 10
If the operand comes from a (general) register, then we create, use, and
free another temporary on the stack in memory:
(X86 rules 503)+=
reg: CVID(reg) "push %0\n_
The similar nonterminal flt, which is defined on page 514, won't do,
because the assembler requires a st(1),st on binary operators but cu-
riously forbids it on fcomp. fcomp stores the result of the comparison in
some machine flags. The instruction fstsw ax stores the flags in the
bottom of eax, and the instruction sahf loads them into the flags tested
by the conditional branch instructions:
(X86 rules 503)+=
stmt: EQD(cmpf,reg) "fcomp%0\nfstsw ax\nsahf\nje %a\n"
stmt: GED(cmpf,reg) "fcomp%0\nfstsw ax\nsahf\njbe %a\n"
stmt: GTD(cmpf,reg) "fcomp%0\nfstsw ax\nsahf\njb %a\n"
stmt: LED(cmpf,reg) "fcomp%0\nfstsw ax\nsahf\njae %a\n"
stmt: LTD(cmpf,reg) "fcomp%0\nfstsw ax\nsahf\nja %a\n"
stmt: NED(cmpf,reg) "fcomp%0\nfstsw ax\nsahf\njne %a\n"
offset gives the offset from ebp to the next argument. It determines
the x.offset and x.name fields of the callee and caller views of the ar-
guments:
(assign offset to argument i 520)=
Symbol p = callee[i];
Symbol q = caller[i];
p->x.offset = q->x.offset = offset;
p->x.name = q->x.name = stringf("%d", p->x.offset);
p->sclass = q->sclass = AUTO;
offset += roundup(q->type->size, 4);
The sclass fields are set to record that no arguments are assigned to
registers, and offset is adjusted for the next argument and to keep the
stack aligned.
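The running offset is easy to trace on a small argument list. The sketch below is a standalone model: assign_offsets is a hypothetical name, the roundup macro is assumed to round up to a multiple of a power of two, and the starting offset is a parameter because the fragment above does not show its initial value.

```c
#include <assert.h>

/* Round x up to a multiple of n, where n is a power of two (assumed). */
#define roundup(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

/* Assign ebp-relative offsets to n arguments whose sizes are given in
   bytes. first is the offset of the first argument. */
void assign_offsets(const int size[], int offset_out[], int n, int first) {
    int i, offset = first;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        offset_out[i] = offset;
        offset += roundup(size[i], 4);   /* keep the stack word-aligned */
    }
}
```

With sizes of 1, 8, and 4 bytes and an illustrative starting offset of 20, the arguments land at 20, 24, and 32: the one-byte argument still consumes a full four-byte slot.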
function then calls gen code to process the body of the routine. It
first resets offset and maxoffset to record that no locals have yet been
allocated:
(X86 function 519)+=
offset = maxoffset = 0;
gencode(caller, callee);
framesize = roundup(maxoffset, 4);
if (framesize > 0)
    print("sub esp,%d\n", framesize);
When gencode returns, maxoffset is the largest value that offset took
on during the lifetime of gencode, so code to allocate the rest of the
frame can now be emitted into the prologue. Then function calls
emitcode to emit the body of the routine, and it calls print directly
to emit the epilogue, which merely undoes the prologue:
(X86 function 519)+=
emitcode();
print("mov esp,ebp\n");
print("pop ebp\n");
print("pop edi\n");
print("pop esi\n");
print("pop ebx\n");
print("ret\n");
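The relationship between offset, maxoffset, and the frame size can be illustrated with a small simulation. This is a sketch under stated assumptions rather than lcc's code: alloc_local, leave_block, and frame_size are hypothetical helpers, and the rule that a block's locals are released when the block ends is assumed for the illustration.

```c
/* Round x up to a multiple of n; n must be a power of two. */
#define ROUNDUP(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

int offset, maxoffset;   /* mirror the two counters named in the text */

/* Reserve size bytes for a local and return its distance below ebp. */
int alloc_local(int size, int align) {
    offset = ROUNDUP(offset + size, align);
    if (offset > maxoffset)
        maxoffset = offset;      /* remember the high-water mark */
    return offset;
}

/* Leaving a block releases its locals so sibling blocks can reuse the space. */
void leave_block(int saved_offset) { offset = saved_offset; }

/* The frame must cover the deepest point that offset ever reached. */
int frame_size(void) { return ROUNDUP(maxoffset, 4); }
```

If a routine allocates a 4-byte local, enters a block that allocates 8 more bytes, leaves the block, and then allocates a 2-byte local, offset ends at 6 but maxoffset is 12, so the prologue must subtract 12 from esp.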
Static locals get a generated name to avoid other static locals of the same
name:
(X86 defsymbol 521)=
if (p->scope >= LOCAL && p->sclass == STATIC)
    p->x.name = stringf("L%d", genlabel(1));
Generated symbols already have a unique numeric name. defsymbol
simply prefixes a letter to make a valid assembler identifier:
(X86 defsymbol 521)+=
else if (p->generated)
q->x.name = stringd(q->x.offset);
}
}
For variables on the stack, address simply computes the adjusted offset.
For variables accessed using a label, it sets x.name to a string of the
form name±n. If the offset is positive, the literal "+" emits the
operator; if the offset is negative, the %d emits it.
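The sign handling described above can be sketched in a few lines of standard C. The function format_address and its buffer interface are hypothetical; lcc builds the string with its own stringf, and treating a zero offset like a positive one is an assumption of this sketch.

```c
#include <stdio.h>

/* Format "name+n" or "name-n" into buf and return the length written.
   The "+" is supplied only for nonnegative n; for negative n the %d
   conversion prints the minus sign itself. */
int format_address(char *buf, size_t size, const char *name, int n) {
    return snprintf(buf, size, "%s%s%d", name, n >= 0 ? "+" : "", n);
}
```

Supplying the "+" as a separate string keeps a single format for both signs: an offset of 8 from the label tab yields tab+8, and an offset of -4 yields tab-4, with no doubled operator in either case.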
The front end calls defconst to emit assembler directives to allocate
and initialize a scalar to a constant. The argument ty identifies the
proper member of the union v:
(X86 functions 498)+=
static void defconst(ty, v) int ty; Value v; {
    switch (ty) {
    (X86 defconst 522)
    }
}
Most cases simply emit the member into an assembler directive that al-
locates and initializes a cell of the type ty:
(X86 defconst 522)=
case C: print("db %d\n", v.uc); return;
...
print("dd 0%xH\n", *(unsigned *)&v.f);
return;
The two halves of each double must be exchanged if lcc is running on
a little endian and compiling for a big endian, or vice versa:
(X86 defconst 522)+=
case D: {
    unsigned *p = (unsigned *)&v.d;
    print("dd 0%xH,0%xH\n", p[swap], p[1 - swap]);
    return;
}
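The exchange of halves can be tried in isolation. The sketch below is an illustration, not the lcc fragment: double_halves is a hypothetical helper, it assumes an 8-byte double, and it uses memcpy in place of the pointer cast so that it stays within the aliasing rules of modern C.

```c
#include <stdint.h>
#include <string.h>

/* Copy the two 32-bit halves of d into out[0] and out[1] in emission order.
   With swap == 0 the halves keep the host's memory order; with swap == 1
   they are exchanged, as needed when host and target byte orders differ.
   Assumes an 8-byte double. */
void double_halves(double d, int swap, uint32_t out[2]) {
    uint32_t w[2];
    memcpy(w, &d, sizeof w);   /* avoids the aliasing cast used in the fragment */
    out[0] = w[swap];
    out[1] = w[1 - swap];
}
```

Only the order of the two words changes. Each word is printed as a number, so the assembler takes care of the byte order within a word.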
The interface procedure defaddress allocates space for a pointer and
initializes it to a symbolic address:
It finds the end of the string by counting, because ANSI C escape codes
permit strings with embedded null bytes.
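Counting rather than scanning for a terminator might look like the following. This is a minimal sketch, not lcc's routine: emit_string and its buffer interface are hypothetical, and the one-directive-per-byte output format is an assumption modeled on the db directives shown earlier.

```c
#include <stdio.h>

/* Append one "db" directive per byte of the n-byte string str to buf and
   return the number of characters written.  The length is passed
   explicitly, so embedded null bytes are emitted like any other byte. */
size_t emit_string(char *buf, size_t size, const char *str, size_t n) {
    size_t used = 0;
    for (size_t i = 0; i < n && used < size; i++) {
        int k = snprintf(buf + used, size - used, "db %d\n",
                         (unsigned char)str[i]);
        if (k < 0 || (size_t)k >= size - used)
            break;                     /* out of room: stop at a whole line */
        used += (size_t)k;
    }
    if (used < size)
        buf[used] = '\0';              /* discard any partial directive */
    return used;
}
```

Called with the three bytes 'a', 0, 'b', the sketch emits directives for 97, 0, and 98. A loop that stopped at the first null byte would emit only the first.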
The front end calls export to expose a symbol to other modules. The
public assembler directive does just that:
(X86 functions 498)+=
static void export(p) Symbol p; {
    print("public %s\n", p->x.name);
}
The extrn directive makes visible in the current module a symbol
exported by another module, but it may not appear inside a segment, so
the interface procedure import temporarily switches out of the current
segment:
    if (p->ref > 0) {
        segment(0);
        print("extrn %s:near\n", p->x.name);
        segment(oldseg);
    }
}
The near directive declares that the external can be addressed directly.
The flat memory model and its 32-bit addresses permit direct addresses
for everything, so it's unnecessary to understand near and the related
directives unless one is generating segmented code, which is harder.
lcc's implementation of segment for the X86 takes care that the call
segment(0) switches out of the current segment but not into any new
segment. import checks the symbol's ref field to emit the directives
only if the symbol is used, because some X86 linkers object to gratuitous
extrns.
The front end calls the interface procedure global to define a new
global. If the global is initialized, the front end next calls defconst, so
global allocates space only for uninitialized globals, which are in the
BSS segment:
(X86 functions 498)+=
static void global(p) Symbol p; {
    print("align %d\n",
        p->type->align > 4 ? 4 : p->type->align);
    print("%s label byte\n", p->x.name);
    if (p->u.seg == BSS)
        print("db %d dup (0)\n", p->type->size);
}
The front end calls the interface procedure space to define a block of
global data initialized to zero:
(X86 functions 498)+=
static void space(n) int n; {
    if (cseg != BSS)
        print("db %d dup (0)\n", n);
}
Further Reading
Various reference manuals elaborate on the architecture of this machine
(Intel Corp. 1993). The assembler manuals that come with Microsoft's MASM
and Borland's Turbo Assembler elaborate on the assembler language in
general and the directives that control the various memory models in
particular.
Exercises
18.1 Scan the X86 reference manual for instructions that lcc could use
but doesn't. Add rules to emit these instructions. Benchmark the
compiler before and after each change to determine which changes
pay off.
18.2 Some of lcc's opcodes commute, which means that for every rule
like
because x > y if and only if y < x. Try to find some X86 dual
rules that pay off.
18.4 rep movsb copies ecx bytes one at a time. rep movsw copies ecx
16-bit units about twice as fast, and rep movsd copies ecx 32-bit
units another rough factor of two faster. Change the block-copy
code to exploit these instructions when it can.
18.5 lcc's function prologues and epilogues save and restore ebx, esi,
and edi even if the routine doesn't touch them. Correct this
blemish and determine if it was worth the effort.
18.6 Reserve one general register and assign it to the most promising
local. Measure the improvement. Repeat the experiment for more
registers. Which number of register variables gives the best result?
18.7 lcc emits lea edi,1[edi] for the addition in f(i+1). We'd prefer
inc edi, but it's hard to adapt the X86 code generator to emit that
code for this particular case. Explain why.
18.8 Construct a small C program that draws ckstack's diagnostic.
18.9 Revise the X86 code generator to spill and reload floating-point
registers without help from the programmer. See the discussion of
ckstack.
19
Retrospective
19.2 Interface
lcc's code-generation interface is compact because it omits the
inessential and makes simplifying assumptions. These omissions and
assumptions do, however, limit the interface's applicability to other
languages and machines.
The interface assumes that signed and unsigned integers and long
integers all have the same size. This assumption lets lcc make do with
exposed the code list - or a flow graph - together with standard
implementations of gencode and emitcode would permit clients to choose
between simplicity and flexibility.
On the other hand, the interface could be simpler yet. For example,
ASGN and CALL have type-specific variants that take different numbers
of operands. This variability complicates decisions that otherwise could
be made by inspecting only the generic operation. Operators that al-
ways generate trivial target code are another example. A few operators
generate nothing on some targets, but some, like CVUI and CVIU, gener-
ate nothing on all current or conceivable targets. Production back ends,
like those described in this book, take pains to avoid generating vacuous
register-to-register moves for these operators. Similarly, the narrowing
conversions CV{UI} x {CS} are vacuous on all targets and might well be
omitted.
Several interface conventions, if not obeyed, can cause subtle errors.
For example, the interface functions local and function, and the code
for the operator CALLB collaborate to generate code for functions that
return structures. Three sites in the back end must cooperate perfectly,
or the compiler will silently generate incorrect code. The front end could
deal with such functions completely and thus eliminate the interface flag
wants_callb, but this would exclude some established calling sequences.
Similar comments apply to ARGB and the flag wants_argb. The trade-off
for generating compatible calling-sequence code is a more complex
code-generation interface.
lcc's interface was designed for use in a monolithic compiler in which
the front end and back ends are linked together into a single program.
This design complicates separating the front and back ends into separate
programs. Some of the interaction is two-way; the upcalls from the
interface function function to gencode and emitcode are examples. These
upcalls permit the front end to generate conversion code required at
function entry. The back end examines few fields in the source-language
type representation; it uses front-end functions like isstruct to query
types. To make the back end a separate program, type data must be
transmitted to answer such queries, and the back end might have to
implement the function entry conversions.
Using ASTs would also make it easier to use lcc for other purposes.
Parts of lcc have been used to build browsers, front ends for other back
ends, back ends for other front ends, and link-time and run-time code
generators, and it has been used to generate code from within an
interpreter and a debugger. lcc's design did not anticipate some of these
uses, and at least some of these projects would have been easier if lcc
had built ASTs and let clients traverse and annotate them.
assembler code and the output of the assembled program with saved
baseline assembler code and output. Sometimes we expect the assembler
code to change, so the first comparison can tell us nothing, but it's worth
doing because sometimes it fails unexpectedly and thus tells us that a
change to the compiler went overboard.
We also test, though somewhat less often, using the language confor-
mance section of the Plum-Hall Validation Suite for ANSI C compilers
and with a large set of numeric programs translated from Fortran. The
numeric programs have more variables, longer expressions, and more
common subexpressions than the other tests, which strains the register
allocator and thus tests the spiller better. Spills are rare, so spillers are
often hard to test.
lcc's test suite includes material that came to us as bug reports, but
we wish we'd saved more. lcc has been in use at AT&T Bell Laboratories
and Princeton University since 1988 and at many other sites since then.
Many errors have been reported, diagnosed, and corrected. Electronic
news summarized each repair for users at Bell Laboratories and Prince-
ton, so that users might know if they needed to discard old binaries. We
recorded all the news messages, but next time we'd record more.
First, we'd record the shortest possible input that exposes each bug.
Just finding this input can be half the battle. Some bug reports were
nothing more than a note that lcc's code for the program gave a wrong
answer and a pointer to a directory full of source code. It's hard to find
a compiler error when all you have is a large, unknown source program
and thousands of lines of object code. We usually start by trimming
the program until another cut causes the bug to vanish. Almost all bugs
have, in the end, been demonstrated by sample code of five lines or fewer.
Next time, we'd save these programs with sample input and output, and
create a test harness that would automatically recheck them. One must
resist the temptation to omit bugs deemed too arcane to reoccur. We've
sometimes reintroduced an old bug when fixing a new one, and thus had
to track and fix the old one a second time. A test harness would probably
pay for itself after one or two reintroduced bugs.
We'd also link at least some bugs with the code that corrects them.
lcc was not originally written as a literate program; the English here
was retrofitted to the code. In this, we encountered several compiler
fragments that we could no longer explain immediately. Most of them
turned out to repair bugs, but we'd have saved time if we'd kept more
sample bugs - that is, the source code and sample input and output -
nearby in comments or, now, in possibly elided fragments of the literate
program.
Another kind of test suite would help retargeters. When writing a
back end for a new target, we don't implement the entire code generator
before we start testing. Instead, we implement enough to compile, say,
the trivial program
main() {
printf("Hello world\n");
}
Further Reading
Schreiner and Friedman (1985) describe how to use LEX (Lesk 1975) and
YACC (Johnson 1975) by building a toy compiler for a small language.
Holub (1990) and Gray et al. (1992) describe more modern variants of
these compiler tools and how to implement them.
Budd (1991) is a gentle introduction to object-oriented programming
and object-oriented programming languages; he describes SmallTalk,
C++, Object Pascal, and Objective-C. The reference manuals for C++
(Ellis and Stroustrup 1990), Oberon-2 (Mössenböck and Wirth 1991), and
Modula-3 (Nelson 1991) are the definitive sources for those languages.
Ramsey (1993) adapted lcc to be an expression server for the
retargetable debugger ldb. The server accepts a C expression entered during
debugging and a symbol table, compiles the expression as if it appeared
in a context described by the supplied symbol table, and evaluates it.
Ramsey wrote a back end that emits PostScript instead of assembler
language, and ldb's embedded PostScript interpreter evaluates the
generated code and thus evaluates the expression. He also modified lcc to
emit ldb symbol tables.
Appel (1992) describes a research compiler for ML that builds ASTs
and makes more than 30 passes over them during compilation.
Our paper describing an earlier version of lcc (Fraser and Hanson
1991b) compares lcc's size and speed and the speed of its generated
code with the vendor's compilers and with gcc on the VAX, Motorola
68020, SPARC, and MIPS R3000. lcc generated code that was usually
better than the code generated by the commercial compiler without
optimization enabled. A companion paper gives measurements that support
our intuition that register spills are rare (Fraser and Hanson 1992).
Lamb (1981) describes a typical peephole optimizer. The peephole
optimizer copt is about the simplest possible; it is available by
anonymous ftp from research.att.com. Davidson and Fraser (1984) describe
a peephole optimizer driven by a formal description of the target ma-
chine.
Chaitin et al. (1981) describe register allocation by graph coloring,
and Krishnamurthy (1990) surveys some of the literature in instruction
scheduling. Proebsting and Fischer (1991) describe one of the simplest
integrations of register allocation and instruction scheduling.
Bibliography
Fraser, C. W., and D. R. Hanson. 1991a. A code generation interface for
ANSI C. Software-Practice and Experience 21(9), 963-988.
Fraser, C. W., and D. R. Hanson. 1991b. A retargetable compiler for ANSI C.
SIGPLAN Notices 26(10), 29-43.
Kane, G., and J. Heinrich. 1992. MIPS RISC Architecture. Englewood Cliffs,
NJ: Prentice Hall.
Kannan, S., and T. A. Proebsting. 1994. Correction to 'producing good
code for the case statement'. Software-Practice and Experience 24(2),
233.
The complete source code for lcc is available free of charge to the
purchaser of this book. All distributions include the source code for the
front end, the code generators for the SPARC, MIPS R3000 and Intel 386,
the source code for the code-generator generator, and documentation
that gives instructions for installing and running lcc on a variety of
platforms. lcc runs on UNIX systems and on PCs with a 386 processor or its
successor running DOS 6.0 or Windows 3.1.
There is an electronic lcc mailing list. To subscribe, send an e-mail
message with the one-line body
subscribe lcc
to majordomo@cs.princeton.edu. This line must appear in the message
body; "Subject:" lines are ignored. Additional information about lcc is
also available on the World Wide Web via Mosaic and other Web browsers.
The universal resource locator is
http://www.cs.princeton.edu/software/lcc
lcc may be obtained from the sources listed below.
Internet
The distribution is available for downloading via anonymous ftp from
ftp.cs.princeton.edu (128.112.152.13) in the directory pub/lcc. To
retrieve information about the distribution, ftp to ftp.cs.princeton.edu;
for example, on UNIX systems, use the command
ftp ftp.cs.princeton.edu
Log in as anonymous, and use your e-mail address as your password.
Once connected, change to the lcc directory with the command
cd pub/lcc
The file named README gives instructions for retrieving the distribution
with ftp and information about lcc since this book went to press. The
command
get README
will retrieve this file. Follow the instructions therein for retrieving the
distribution in the form that is appropriate for your system.
HOW TO OBTAIN LCC
Diskette
The distribution is available free of charge on a 3.5", high-density diskette
to the original purchaser of this book. To obtain your copy, fill in the
coupon on the next page and return it to Benjamin/Cummings.
lcc is an active research compiler and will continue to change over
time. Thus, the diskette version cannot be as up to date as the online
versions.
To obtain a free 3.5" diskette containing the lcc distribution, fill in
the coupon below, carefully remove this entire page from the book, fold
the page so that the Benjamin/Cummings Publishing Company address,
printed on the reverse side, is visible, attach appropriate postage, and
mail. Allow two weeks from receipt of this coupon for delivery.