A Retargetable C Compiler:
Design and Implementation

Christopher W. Fraser
AT&T BELL LABORATORIES

David R. Hanson
PRINCETON UNIVERSITY

The Benjamin/Cummings Publishing Company, Inc.
Redwood City, CA • Menlo Park, CA • Reading, MA • New York
Don Mills, Ontario • Wokingham, UK • Amsterdam • Bonn
Sydney • Singapore • Tokyo • Madrid • San Juan
Acquisitions Editor: John Carter Shanklin
Executive Editor: Dan Joraanstad
Editorial Assistant: Melissa Standen
Production Supervisor: Ray Kanarr
Cover Design and Illustration: Cloyce Wall
Text Designer: Peter Vacek
Copyeditor: Elizabeth Gehrman
Proofreader: Christine Sabooni

Many of the designations used by manufacturers and sellers to distinguish their
products are claimed as trademarks. Where those designations appear in this
book, and Benjamin/Cummings was aware of a trademark claim, the designations
have been printed in initial caps or in all caps.
Camera-ready copy for this book was prepared using LaTeX, TeX, and Adobe Illustrator.
Instructional Material Disclaimer
The program presented in this book has been included for its instructional value.
It has been tested with care but is not guaranteed for any particular purpose. Nei-
ther the publisher, AT&T, or the authors offer any warranties or representations,
nor do they accept any liabilities with respect to the program.
Copyright © 1995 by AT&T and David R. Hanson.
All rights reserved. No part of this publication may be reproduced, or stored
in a database or retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise, without the prior
written permission of the publisher. Printed in the United States of America.
Printed simultaneously in Canada.

Library of Congress Cataloging-in-Publication Data
Fraser, Christopher W.
A retargetable C compiler: design and implementation /
Christopher W. Fraser, David R. Hanson
p. cm.
Includes index.
ISBN 0-8053-1670-1
1. C (Computer program language) 2. Compilers (Computer programs)
I. Hanson, David R. II. Title.
QA76.73.C15F75 1995
005.4'53--dc20 94-24583
CIP

ISBN 0-8053-1670-1
1 2 3 4 5 6 7 8 9 10 DOC 98 97 96 95
The Benjamin/Cummings Publishing Company, Inc.
390 Bridge Parkway
Redwood City, CA 94065
To Linda
To Maylee
Contents

Preface xiii

1 Introduction 1
1.1 Literate Programs 1
1.2 How to Read This Book 2
1.3 Overview 4
1.4 Design 11
1.5 Common Declarations 16
1.6 Syntax Specifications 19
1.7 Errors 20
Further Reading 21

2 Storage Management 23
2.1 Memory Management Interface 23
2.2 Arena Representation 24
2.3 Allocating Space 26
2.4 Deallocating Space 28
2.5 Strings 28
Further Reading 31
Exercises 32

3 Symbol Management 35
3.1 Representing Symbols 37
3.2 Representing Symbol Tables 39
3.3 Changing Scope 42
3.4 Finding and Installing Identifiers 44
3.5 Labels 45
3.6 Constants 47
3.7 Generated Variables 49
Further Reading 51
Exercises 51

4 Types 53
4.1 Representing Types 53
4.2 Type Management 56
4.3 Type Predicates 60
4.4 Type Constructors 61
4.5 Function Types 63
4.6 Structure and Enumeration Types 65
4.7 Type-Checking Functions 69
4.8 Type Mapping 73
Further Reading 74
Exercises 74

5 Code Generation Interface 78
5.1 Type Metrics 78
5.2 Interface Records 79
5.3 Symbols 80
5.4 Types 81
5.5 Dag Operators 81
5.6 Interface Flags 87
5.7 Initialization 89
5.8 Definitions 89
5.9 Constants 91
5.10 Functions 92
5.11 Interface Binding 96
5.12 Upcalls 97
Further Reading 99
Exercises 100

6 Lexical Analysis 102
6.1 Input 103
6.2 Recognizing Tokens 107
6.3 Recognizing Keywords 113
6.4 Recognizing Identifiers 114
6.5 Recognizing Numbers 115
6.6 Recognizing Character Constants and Strings 121
Further Reading 124
Exercises 125

7 Parsing 127
7.1 Languages and Grammars 127
7.2 Ambiguity and Parse Trees 128
7.3 Top-Down Parsing 132
7.4 FIRST and FOLLOW Sets 134
7.5 Writing Parsing Functions 137
7.6 Handling Syntax Errors 140
Further Reading 145
Exercises 145

8 Expressions 147
8.1 Representing Expressions 147
8.2 Parsing Expressions 151
8.3 Parsing C Expressions 154
8.4 Assignment Expressions 156
8.5 Conditional Expressions 159
8.6 Binary Expressions 160
8.7 Unary and Postfix Expressions 163
8.8 Primary Expressions 166
Further Reading 170
Exercises 171

9 Expression Semantics 172
9.1 Conversions 172
9.2 Unary and Postfix Operators 178
9.3 Function Calls 183
9.4 Binary Operators 191
9.5 Assignments 195
9.6 Conditionals 200
9.7 Constant Folding 202
Further Reading 214
Exercises 214

10 Statements 216
10.1 Representing Code 216
10.2 Execution Points 220
10.3 Recognizing Statements 221
10.4 If Statements 224
10.5 Labels and Gotos 226
10.6 Loops 227
10.7 Switch Statements 230
10.8 Return Statements 243
10.9 Managing Labels and Jumps 246
Further Reading 249
Exercises 250

11 Declarations 252
11.1 Translation Units 253
11.2 Declarations 254
11.3 Declarators 265
11.4 Function Declarators 270
11.5 Structure Specifiers 276
11.6 Function Definitions 285
11.7 Compound Statements 293
11.8 Finalization 303
11.9 The Main Program 305
Further Reading 308
Exercises 308

12 Generating Intermediate Code 311
12.1 Eliminating Common Subexpressions 313
12.2 Building Nodes 317
12.3 Flow of Control 321
12.4 Assignments 328
12.5 Function Calls 332
12.6 Enforcing Evaluation Order 335
12.7 Driving Code Generation 337
12.8 Eliminating Multiply Referenced Nodes 342
Further Reading 349
Exercises 349

13 Structuring the Code Generator 352
13.1 Organization of the Code Generator 353
13.2 Interface Extensions 354
13.3 Upcalls 357
13.4 Node Extensions 358
13.5 Symbol Extensions 362
13.6 Frame Layout 364
13.7 Generating Code to Copy Blocks 367
13.8 Initialization 370
Further Reading 371
Exercises 372

14 Selecting and Emitting Instructions 373
14.1 Specifications 375
14.2 Labelling the Tree 377
14.3 Reducing the Tree 379
14.4 Cost Functions 388
14.5 Debugging 389
14.6 The Emitter 391
14.7 Register Targeting 397
14.8 Coordinating Instruction Selection 402
14.9 Shared Rules 403
14.10 Writing Specifications 404
Further Reading 405
Exercises 406

15 Register Allocation 408
15.1 Organization 409
15.2 Tracking the Register State 410
15.3 Allocating Registers 413
15.4 Spilling 420
Further Reading 428
Exercises 429

16 Generating MIPS R3000 Code 430
16.1 Registers 432
16.2 Selecting Instructions 435
16.3 Implementing Functions 447
16.4 Defining Data 455
16.5 Copying Blocks 460
Further Reading 461
Exercises 461

17 Generating SPARC Code 463
17.1 Registers 465
17.2 Selecting Instructions 469
17.3 Implementing Functions 483
17.4 Defining Data 490
17.5 Copying Blocks 492
Further Reading 494
Exercises 495

18 Generating X86 Code 496
18.1 Registers 498
18.2 Selecting Instructions 503
18.3 Implementing Functions 518
18.4 Defining Data 520
Further Reading 524
Exercises 524

19 Retrospective 526
19.1 Data Structures 526
19.2 Interface 527
19.3 Syntactic and Semantic Analyses 529
19.4 Code Generation and Optimization 531
19.5 Testing and Validation 531
Further Reading 533

Bibliography 535

Index 541

How to Obtain lcc 563


Preface

The compiler is the linchpin of the programmer's toolbox. Working programmers
use compilers every day and count heavily on their correctness
and reliability. A compiler must accept the standard definition of
the programming language so that source code will be portable across
platforms. A compiler must generate efficient object code. Perhaps more
important, a compiler must generate correct object code; an application
is only as reliable as the compiler that compiled it.
A compiler is itself a large and complex application that is worthy of
study in its own right. This book tours most of the implementation of
lcc, a compiler for the ANSI C programming language. It is to compiling
what Software Tools by B. W. Kernighan and P. J. Plauger (Addison-
Wesley, 1976) is to text processing like text editors and macro proces-
sors. Software design and implementation are best learned through ex-
perience with real tools. This book explains in detail and shows most
of the code for a real compiler. The accompanying diskette holds the
source code for the complete compiler.
lcc is a production compiler. It's been used to compile production
programs since 1988 and is now used by hundreds of C programmers
daily. Detailing most of a production compiler in a book leaves little
room for supporting material, so we present only the theory needed for
the implementation at hand and leave the broad survey of compiling
techniques to existing texts. The book omits a few language features -
those with mundane or repetitive implementations and those deliberately
treated only in the exercises - but the full compiler is available on the
diskette, and the book makes it understandable.
The obvious use for this book is to learn more about compiler construction.
But only a few programmers need to know how to design and
implement compilers. Most work on applications and other aspects of
systems programming. There are four reasons why this majority of C
programmers may benefit from this book.
First, programmers who understand how a C compiler works are often
better programmers in general and better C programmers in particular.
The compiler writer must understand even the darkest corners of the
C language; touring the implementation of those corners reveals much
about the language itself and its efficient realization on modern computers.
Second, most texts on programming must necessarily use small ex-
amples, which often demonstrate techniques simply and elegantly. Most
programmers, however, work on large programs that have evolved - or
degenerated - over time. There are few well documented examples of
this kind of "programming in the large" that can serve as reference examples.
lcc isn't perfect, but this book documents both its good and
bad points in detail and thus provides one such reference point.
Third, a compiler is one of the best demonstrations in computer science
of the interaction between theory and practice. lcc displays both
the places where this interaction is smooth and the results are elegant,
as well as where practical demands strain the theory, which shows in
the resulting code. Exploring these interactions in a real program helps
programmers understand when, where, and how to apply different techniques.
lcc also illustrates numerous C programming techniques.
Fourth, this book is an example of a "literate program." Like TeX:
The Program by D. E. Knuth (Addison-Wesley, 1986), this book is lcc's
source code and the prose that describes it. The code is presented in the
order that best suits understanding, not in the order dictated by the C
programming language. The source code that appears on the diskette is
extracted automatically from the book's text files.
This book is well suited for self-study by both academics and profes-
sionals. The book and its diskette offer complete documented source
code for lcc, so they may interest practitioners who wish to experiment
with compilation or those working in application areas that use or im-
plement language-based tools and techniques, such as user interfaces.
The book shows a large software system, warts and all. It could thus
be the subject of a postmortem in a software engineering course, for
example.
For compiler courses, this book complements traditional compiler
texts. It shows one way of implementing a C compiler, while traditional
texts survey algorithms for solving the broad range of problems encoun-
tered in compiling. Limited space prevents such texts from including
more than a toy compiler. Code generation is often treated at a particu-
larly high level to avoid tying the book to a specific computer.
As a result, many instructors prepare a substantial programming
project to give their students some practical experience. These instruc-
tors usually must write these compilers from scratch; students duplicate
large portions and have to use the rest with only limited documentation.
The situation is trying for both students and instructors, and unsatisfy-
ing to boot, because the compilers are still toys. By documenting most
of a real compiler and providing the source code, this book offers an
alternative.
This book presents full code generators for the MIPS R3000, SPARC,
and Intel 386 and successor architectures. It exploits recent research that
produces code generators from compact specifications. These methods
allow us to present complete code generators for several machines, which
no other book does. Presenting several code generators avoids tying
the book to a single machine, and helps students appreciate engineering
retargetable software.
Assignments can add language features, optimizations, and targets.
When used with a traditional survey text, assignments could also replace
existing modules with those using alternate algorithms. Such assign-
ments come closer to the actual practice of compiler engineering than
assignments that implement most of a toy compiler, where too much
time goes to low-level infrastructure and accommodating repetitive lan-
guage features. Many of the exercises pose just these kinds of engineer-
ing problems.
lcc has also been adapted for purposes other than conventional compilation.
For example, it's been used for building a C browser and for
generating remote-procedure-call stubs from declarations. It could also
be used to experiment with language extensions, proposed computer ar-
chitectures, and code-generator technologies.
We assume readers are fluent in C and assembly language for some
computer, know what a compiler is and have a general understanding
of what one does, and have a working understanding of data structures
and algorithms at the level covered in typical undergraduate courses; the
material covered by Algorithms in C by R. Sedgewick (Addison-Wesley,
1990), for example, is more than sufficient for understanding lcc.

Acknowledgments
This book owes much to the many lcc users at AT&T Bell Laboratories,
Princeton University, and elsewhere who suffered through bugs and pro-
vided valuable feedback. Those who deserve explicit thanks include Hans
Boehm, Mary Fernandez, Michael Golan, Paul Haahr, Brian Kernighan,
Doug McIlroy, Rob Pike, Dennis Ritchie, and Ravi Sethi. Ronald Guilmette,
David Kristol, David Prosser, and Dennis Ritchie provided valuable
information concerning the fine points of the ANSI Standard and its
interpretation. David Gay helped us adapt the PFORT library of numerical
software to be an invaluable stress test for lcc's code generators.
Careful reviews of both our code and our prose by Jack Davidson,
Todd Proebsting, Norman Ramsey, William Waite, and David Wall con-
tributed significantly to the quality of both. Our thanks to Steve Beck,
who installed and massaged the fonts used for this book, and to Maylee
Noah, who did the artwork with Adobe Illustrator.

Christopher W. Fraser
David R. Hanson
1
Introduction

A compiler translates source code to assembler or object code for a target
machine. A retargetable compiler has multiple targets. Machine-specific
compiler parts are isolated in modules that are easily replaced to target
different machines.
This book describes lcc, a retargetable compiler for ANSI C; it focuses
on the implementation. Most compiler texts survey compiling algorithms,
which leaves room for only a toy compiler. This book leaves
the survey to others. It tours most of a practical compiler for full ANSI C,
including code generators for three target machines. It gives only enough
compiling theory to explain the methods that it uses.

1.1 Literate Programs

This book not only describes the implementation of lcc, it is the implementation.
The noweb system for "literate programming" generates both
the book and the code for lcc from a single source. This source consists
of interleaved prose and labelled code fragments. The fragments
are written in the order that best suits describing the program, namely
the order you see in this book, not the order dictated by the C programming
language. The program noweave accepts the source and produces
the book's typescript, which includes most of the code and all of the text.
The program notangle extracts all of the code, in the proper order for
compilation.
Fragments contain source code and references to other fragments.
Fragment definitions are preceded by their labels in angle brackets. For
example, the code
    <a fragment label 1>=                                  2↓
        sum = 0;
        for (i = 0; i < 10; i++) <increment sum 1>

    <increment sum 1>=                                      1
        sum += x[i];

sums the elements of x. Fragment uses are typeset as illustrated by the
use of <increment sum> in the example above. Several fragments may
have the same name; notangle concatenates their definitions to produce
a single fragment. noweave identifies this concatenation by using +=
instead of = in continued definitions:
    <a fragment label 1>+=                                 ↑1
        printf("%d\n", sum);
Fragment definitions are like macro definitions; notangle extracts a program
by expanding one fragment. If its definition refers to other fragments,
they are themselves expanded, and so on.
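The expansion rule just described can be sketched in a few lines of C. This is only an illustration of the idea, not notangle's implementation: the table layout, the function names expand_name and expand_body, and the fixed output buffer are all assumptions made for the sketch. Uses are written as <<name>>, and all definitions that share a name are concatenated in order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fragment table; several entries may share one name. */
struct fragment {
    const char *name;
    const char *body;   /* may contain uses written as <<name>> */
};

static const struct fragment table[] = {
    { "a fragment label", "sum = 0;\nfor (i = 0; i < 10; i++) <<increment sum>>\n" },
    { "increment sum",    "sum += x[i];" },
    { "a fragment label", "printf(\"%d\\n\", sum);\n" },   /* continued definition */
};

static int expand_body(const char *body, char *out, size_t cap);

/* Appends every definition of the named fragment to out, in table order.
   Returns 0 on success, -1 if the name is undefined or out is too small. */
static int expand_name(const char *name, size_t len, char *out, size_t cap) {
    int found = 0;
    size_t i;

    assert(name != NULL && out != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen(table[i].name) == len && strncmp(table[i].name, name, len) == 0) {
            found = 1;
            if (expand_body(table[i].body, out, cap) != 0)
                return -1;
        }
    return found ? 0 : -1;
}

/* Copies body to the end of out, replacing each <<name>> by its expansion. */
static int expand_body(const char *body, char *out, size_t cap) {
    while (*body) {
        const char *open = strstr(body, "<<");
        const char *close = open ? strstr(open + 2, ">>") : NULL;
        size_t n = close ? (size_t)(open - body) : strlen(body);
        size_t used = strlen(out);

        if (used + n + 1 > cap)
            return -1;
        memcpy(out + used, body, n);
        out[used + n] = '\0';
        if (close == NULL)
            break;
        if (expand_name(open + 2, (size_t)(close - open - 2), out, cap) != 0)
            return -1;
        body = close + 2;
    }
    return 0;
}
```

Starting from an empty buffer, expanding "a fragment label" produces the three-statement program from the examples above, with the continued definition appended after the first one.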
Fragment definitions include aids to help readers navigate among
them. Each fragment name ends with the number of the page on which
the fragment's definition begins. If there's no number, the fragment
isn't defined in this book, but its code does appear on the companion
diskette. Each continued definition also shows the previous definition,
and the next continued definition, if there is one. ↑14 is an example of a
previous definition that appears on page 14, and 31↓ says the definition
is continued on page 31. These annotations form a doubly linked list of
definitions; the up arrow points to the previous definition in the list and
the down arrow points to the next one. The previous link on the first definition
in a list is omitted, and the next link on the last definition is omitted.
These lists are complete: If some of a fragment's definitions appear on
the same page with each other, the links refer to the page on which they
appear.
Most fragments also show a list of pages on which the fragment is
used, as illustrated by the number 1 to the right of the definition for
<increment sum>, above. These unadorned use lists are omitted for root
fragments, which define modules, and for some fragments that occur too
frequently, as detailed below.
notangle also implements one extension to C. A long string literal
can be split across several lines by ending the lines to be continued with
underscores. notangle removes leading white space from continuation
lines and concatenates them to form a single string. The first argument
to error on page 119 is an example of this extension.
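A minimal sketch of this joining step follows. It rests on two assumptions: the trailing underscore itself is dropped along with the newline, and only blanks and tabs count as leading white space. The function name join_continuations is hypothetical, and unlike notangle the sketch does not check that the underscore lies inside a string literal.

```c
#include <assert.h>
#include <string.h>

/* Joins each line that ends in '_' with the line that follows it, dropping
   the underscore, the newline, and the next line's leading blanks and tabs.
   The caller must supply an out buffer of at least strlen(src) + 1 bytes. */
static void join_continuations(const char *src, char *out) {
    assert(src != NULL && out != NULL);
    while (*src) {
        if (src[0] == '_' && src[1] == '\n') {
            src += 2;                          /* skip "_\n" */
            while (*src == ' ' || *src == '\t')
                src++;                         /* skip leading white space */
        } else
            *out++ = *src++;
    }
    *out = '\0';
}
```

Given a two-line call whose string literal is split after "a long _", the result is a single line holding the whole literal.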

1.2 How to Read This Book


Read this book front-to-back. A few variants are possible.
• Chapter 5 describes the interface between the front end and back
ends of the compiler. This chapter has been made as self-contained
as possible.
• Chapters 13-18 describe the back ends of the compiler. Once you
know the interface, you can read these chapters with few excur-
sions back into their predecessors. Indeed, people have replaced
the front end and the back ends without reading, much less under-
standing, the other half.
• Chapters 16-18 describe the modules that capture all information
about the three targets - the MIPS, SPARC, and Intel 386 and successor
architectures. Each of these chapters is independent, so you
may read any subset of them. If you read more than one, you may
notice some repetition, but it shouldn't be too irritating because
most code common to all three targets has been factored out into
Chapters 13-15.
Some parts of the book describe lcc from the bottom up. For example,
the chapters on managing storage, strings, and symbol tables describe
functions that are at or near the ends of call chains. Little context is
needed to understand them.
Other parts of the book give a top-down presentation. For example,
the chapters on parsing expressions, statements, and declarations begin
with the top-level constructs. Top-down material presents some func-
tions or fragments well after the code that uses them, but material near
the first use tells enough about the function or fragment to understand
what's going on in the interim.
Some parts of the book alternate between top-down and bottom-up
presentations. A less variable explanation order would be nice, but it's
unattainable. Like most compilers, lcc includes mutually recursive functions,
so it's impossible to describe all callees before all callers or all
callers before all callees.
Some fragments are easier to explain before you see the code. Others
are easier to explain afterward. If you need help with a fragment, don't
struggle before scanning the text just before and after the fragment.
Most of the code for lcc appears in the text, but a few fragments are
used but not shown. Some of these fragments hold code that is omitted
to save space. Others implement language extensions, optional debug-
ging aids, or repetitious constructs. For example, once you've seen the
code that handles C's for statement, the code that handles the do-while
statement adds little. The only wholesale omission is the explanation of
how lcc processes C's initializers, which we skipped because it is long,
not very interesting, and not needed to understand anything else. Frag-
ments that are used but not defined are easy to identify: no page number
follows the fragment name.
Also omitted are assertions. lcc includes hundreds of assertions.
Most assert something that the code assumes about the value of a parameter
or data structure. One is assert(0), which guarantees a diagnostic
and thus identifies states that are not supposed to occur. For example, if
a switch is supposed to have a bona fide case for all values of the switch
expression, then the default case might include assert(0).
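The switch idiom can be illustrated with a small function. The enumeration and the name color_name are hypothetical and do not come from lcc; they only show where assert(0) sits.

```c
#include <assert.h>

enum color { RED, GREEN, BLUE };

/* Every enumerator has its own case, so reaching the default case means
   the argument holds a value that is not supposed to occur. */
static const char *color_name(enum color c) {
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    default:    assert(0);
    }
    return "?";   /* reached only when assertions are disabled with NDEBUG */
}
```

With assertions enabled, a call with an out-of-range value stops the program with a diagnostic that names the file and line, which is usually enough to locate the faulty caller.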
The companion diskette is complete. Even the assertions and frag-
ments that are omitted from the text appear on the diskette. Many of
them are easily understood once the documented code nearby is under-
stood.
A "mini-index" appears in the middle of the outside margin of many
pages. It lists each program identifier that appears on the page and the
page number on which the identifier is defined in code or explained in
text. These indices not only help locate definitions, but highlight circu-
larities: Identifiers that are used before they are defined appear in the
mini-indices with page numbers that follow the page on which they are
used. Such circularities can be confusing, but they are inevitable in any
description of a large program. A few identifiers are listed with more
than one definition; these name important identifiers that are used for
more than one purpose or that are defined by both code and prose.

1.3 Overview
lcc transforms a source program to an assembler language program.
Following a sample program through the intermediate steps in this transformation
illustrates lcc's major components and data structures. Each
step transforms the program into a different representation: prepro-
cessed source, tokens, trees, directed acyclic graphs, and lists of these
graphs are examples. The initial source code is:
int round(f) float f; {
    return f + 0.5; /* truncates */
}

round has no prototype, so the argument is passed as a double and round
reduces it to a float upon entry. Then round adds 0.5, truncates the
result to an integer, and returns it.
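The same steps can be written out explicitly in prototype form. This is a sketch of what the old-style definition means, not code that lcc produces; round_equiv and incoming are hypothetical names, chosen so the function does not clash with the round declared in <math.h>.

```c
#include <assert.h>

/* The caller passes a double; the callee narrows it to float on entry,
   then adds the double constant 0.5 and truncates the sum to int. */
static int round_equiv(double incoming) {
    float f = (float)incoming;   /* reduction to float upon entry */
    return (int)(f + 0.5);       /* f widens to double for the addition */
}
```

Each cast here corresponds to a conversion that is implicit in the original source.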
The first phase is the C preprocessor, which expands macros, includes
header files, and selects conditionally compiled code. lcc now runs under
DOS and UNIX systems, but it originated on UNIX systems. Like many
UNIX compilers, lcc uses a separate preprocessor, which runs as a separate
process and is not part of this book. We often use the preprocessor
that comes with the GNU C compiler.
A typical preprocessor reads the sample code and emits:
# 1 "sample.c"
int round(f) float f; {
return f + 0.5;
}

The sample uses no preprocessor features, so the preprocessor has nothing
to do but strip the comment and insert a # directive to tell the compiler
the file name and line number of the source code for use when
issuing diagnostics. These sample coordinates are straightforward, but
a program with numerous #include directives brackets each included
file with a pair of # directives, and every other one names a line other
than 1.

    INT      inttype
    ID       "round"
    '('
    ID       "f"
    ')'
    FLOAT    floattype
    ID       "f"
    ';'
    '{'
    RETURN
    ID       "f"
    '+'
    FCON     0.5
    ';'
    '}'
    EOI

FIGURE 1.1 Token stream for the sample.

The compiler proper picks up where the preprocessor leaves off. It
starts with the lexical analyzer or scanner, which breaks the input into
the tokens shown in Figure 1.1. The left column is the token code, which
is a small integer, and the right column is the associated value, if there
is one. For example, the value associated with the keyword int is the
value of inttype, which represents the type integer. The token codes for
single-character tokens are the ASCII codes for the characters themselves,
and EOI marks the end of the input. The lexical analyzer posts the source
coordinate for each token, and it processes the # directive; the rest of the
compiler never sees such directives. lcc's lexical analyzer is described
in Chapter 6.
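A toy scanner shows how such a token stream arises. It is a sketch under stated assumptions, far simpler than the lexical analyzer of Chapter 6: the numeric token codes are hypothetical (lcc's real values differ), the function gettok here takes its input as an argument, only three keywords are recognized, every number is treated as a floating constant, and source coordinates and # directives are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; they stay below 32 so that they cannot
   collide with the ASCII codes used for single-character tokens. */
enum { EOI = 1, ID, INT, FLOAT, RETURN, FCON };

/* Scans one token starting at *pp, advances *pp past it, copies the
   token's text to text (capacity cap), and returns the token code. */
static int gettok(const char **pp, char *text, size_t cap) {
    const char *p = *pp;
    size_t n = 0;
    int code;

    assert(cap > 0);
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        code = EOI;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n + 1 < cap)
                text[n++] = *p;
            p++;
        }
        text[n] = '\0';
        if (strcmp(text, "int") == 0)         code = INT;
        else if (strcmp(text, "float") == 0)  code = FLOAT;
        else if (strcmp(text, "return") == 0) code = RETURN;
        else                                  code = ID;
    } else if (isdigit((unsigned char)*p)) {
        /* Simplification: any digit string, with or without '.', is FCON. */
        while (isdigit((unsigned char)*p) || *p == '.') {
            if (n + 1 < cap)
                text[n++] = *p;
            p++;
        }
        code = FCON;
    } else {
        code = (unsigned char)*p;   /* single-character token: its ASCII code */
        if (n + 1 < cap)
            text[n++] = *p;
        p++;
    }
    text[n] = '\0';
    *pp = p;
    return code;
}
```

Calling gettok repeatedly on the text of round yields the sixteen codes of Figure 1.1 in order, ending with EOI.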
The next compiler phase parses the token stream according to the
syntax rules of the C language. It also analyzes the program for seman-
tic correctness. For example, it checks that the types of the operands
in operations, such as addition, are legal, and it checks for implicit conversions.
For example, in the sample's addition, f is a float and 0.5 is
a double, which is a legal combination, and the sum is converted from
double to int implicitly because round's return type is int.
The outcome of this phase for the sample is the two decorated abstract
syntax trees shown in Figure 1.2. Each node represents one basic
operation. The first tree reduces the incoming double to a float. It as-
signs a float (ASGN+F) to the cell with the address &f (the left ADDRF+P).
It computes the value to assign by converting to float (CVD+F) the double
fetched (INDIR+D) from address &f (the right ADDRF+P).

    ASGN+F
        ADDRF+P  (callee "f", float)
        CVD+F
            INDIR+D
                ADDRF+P  (caller "f", double)

    RET+I
        CVD+I
            ADD+D
                CVF+D
                    INDIR+F
                        ADDRF+P  (callee "f", float)
                CNST+D  0.5

FIGURE 1.2 Abstract syntax trees for the sample (children are indented beneath their parents).

The second tree implements the sample's lone explicit statement,
and returns an int (RET+I). The value is computed by fetching the float
(INDIR+F) from the cell with the address &f (ADDRF+P), converting it to
double, adding (ADD+D) the double constant 0.5 (CNST+D), and truncating
the result to int (CVD+I).
These trees make explicit many facts that are implicit in the source
code. For example, the conversions above are all implicit in the source
code, but explicit in the ANSI standard and thus in the trees. Also, the
trees type all operators explicitly; for example, the addition in the source
code has no explicit type, but its counterpart in the tree does. This
semantic analysis is done as lcc's parser recognizes the input, and is
covered in Chapters 7-11.
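To make the shape of such a tree concrete, the second tree can be built by hand and printed in prefix form. The struct below is a hypothetical simplification for illustration only; lcc's own tree representation is introduced in Chapter 8 and carries more information, such as types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node layout: an operator name, optional leaf text, two kids. */
struct tree {
    const char *op;          /* operator with its type suffix, e.g. "ADD+D" */
    const char *leaf;        /* symbol or constant text for leaves, else NULL */
    struct tree *kids[2];
};

static void append(char *out, size_t cap, const char *s) {
    assert(strlen(out) + strlen(s) < cap);
    strcat(out, s);
}

/* Appends the tree to out as OP[leaf](kid0,kid1). */
static void show(const struct tree *t, char *out, size_t cap) {
    append(out, cap, t->op);
    if (t->leaf) {
        append(out, cap, "[");
        append(out, cap, t->leaf);
        append(out, cap, "]");
    }
    if (t->kids[0]) {
        append(out, cap, "(");
        show(t->kids[0], out, cap);
        if (t->kids[1]) {
            append(out, cap, ",");
            show(t->kids[1], out, cap);
        }
        append(out, cap, ")");
    }
}

/* The second tree of Figure 1.2: return (int)((double)f + 0.5). */
static struct tree addr  = { "ADDRF+P", "f",   { NULL, NULL } };
static struct tree fetch = { "INDIR+F", NULL,  { &addr, NULL } };
static struct tree widen = { "CVF+D",   NULL,  { &fetch, NULL } };
static struct tree half  = { "CNST+D",  "0.5", { NULL, NULL } };
static struct tree sum   = { "ADD+D",   NULL,  { &widen, &half } };
static struct tree conv  = { "CVD+I",   NULL,  { &sum, NULL } };
static struct tree ret   = { "RET+I",   NULL,  { &conv, NULL } };
```

Calling show on ret with an empty buffer yields RET+I(CVD+I(ADD+D(CVF+D(INDIR+F(ADDRF+P[f])),CNST+D[0.5]))), which reads inside out in the same order as the description above.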
From the trees shown in Figure 1.2, lcc produces the directed acyclic
graphs - dags - shown in Figure 1.3. The dags labelled 1 and 2 come
from the trees shown in Figure 1.2. The operators are written without
the plus signs to identify the structures as dags instead of trees. The
transition from trees to dags makes explicit additional implicit facts. For
example, the constant 0.5, which appeared in a CNST+D node in the tree,
appears as the value of a static variable named 2 in the dag, and the
CNST+D operator has been replaced by operators that develop the address
of the variable (ADDRGP) and fetch its value (INDIRD).
The third dag, shown in Figure 1.3, defines the label named 1 that
appears at the end of round. Return statements are compiled into jumps
to this label, and trivial ones are elided.
As detailed in Chapter 12, the transition from trees to dags also elim-
inates repeated instances of the same expression, which are called com-
mon subexpressions. Optionally, each multiply referenced dag node can
be eliminated by assigning its value to a temporary and using the temporary
in several places. The code generators in this book use this option.

    1: ASGNF
           ADDRFP  (callee "f", float)
           CVDF
               INDIRD
                   ADDRFP  (caller "f", double)

    2: RETI
           CVDI
               ADDD
                   CVFD
                       INDIRF
                           ADDRFP  (callee "f", float)
                   INDIRD
                       ADDRGP  ("2", a static variable holding 0.5)

    3: LABELV  ("1")

FIGURE 1.3 Dags for the sample (children are indented beneath their parents).

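The sharing that turns trees into dags can be sketched as a constructor that returns an existing node whenever one with the same operator and operands is already available. This shows the general technique only and is not the code of Chapter 12: the struct, the fixed pool, and the linear search are assumptions, and the sketch ignores the invalidation that assignments and calls require in a real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    const char *op;          /* operator name, e.g. "ADDI" */
    const char *sym;         /* symbol name for leaves, else NULL */
    struct node *kids[2];
    int count;               /* how many times this node was requested */
};

#define MAXNODES 64
static struct node pool[MAXNODES];
static int nnodes;

static int same(const char *a, const char *b) {
    return a == b || (a != NULL && b != NULL && strcmp(a, b) == 0);
}

/* Returns the existing node with this operator, symbol, and operands, or
   creates one. A production implementation would hash instead of scanning. */
static struct node *node(const char *op, const char *sym,
                         struct node *l, struct node *r) {
    int i;

    for (i = 0; i < nnodes; i++)
        if (same(pool[i].op, op) && same(pool[i].sym, sym)
        && pool[i].kids[0] == l && pool[i].kids[1] == r) {
            pool[i].count++;
            return &pool[i];
        }
    assert(nnodes < MAXNODES);
    pool[nnodes].op = op;
    pool[nnodes].sym = sym;
    pool[nnodes].kids[0] = l;
    pool[nnodes].kids[1] = r;
    pool[nnodes].count = 1;
    return &pool[nnodes++];
}
```

Building the expression a+b twice through node returns the same ADDI node both times, so the second occurrence costs no new nodes and the node's count records that it is multiply referenced.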
These dags appear in the order that they must execute on the code
list shown in Figure 1.4. Each entry in this list following the Start entry
represents one component of the code for round. The Defpoint entries
identify source locations, and the Blockbeg and Blockend entries identify
the boundaries of round's one compound statement. The Gen entries
carry the dags labelled 1 and 2 in Figure 1.3, and the Label entry carries
the dag labelled 3. The code list is described in Chapters 10 and 12.

    Start
    Defpoint 22,1,"sample.c"
    Gen (dag 1)
    Blockbeg 5
    Defpoint 8,2,"sample.c"
    Gen (dag 2)
    Blockend
    Label (dag 3)
    Defpoint 0,3,"sample.c"

FIGURE 1.4 Code list for the sample.

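A list like the one in Figure 1.4 can be modelled as a doubly linked list of tagged entries. The entry kinds below are the names used in the text; the struct layout and the append function are hypothetical simplifications, and the real entries, described in Chapter 10, also carry per-kind data such as source coordinates and dags.

```c
#include <assert.h>
#include <stddef.h>

enum kind { Start, Defpoint, Gen, Blockbeg, Blockend, Label };

/* One code-list entry, linked to its neighbors in both directions. */
struct code {
    enum kind kind;
    struct code *prev, *next;
};

/* Links cp in after the current tail and returns cp as the new tail. */
static struct code *append(struct code *tail, struct code *cp, enum kind kind) {
    assert(tail != NULL && cp != NULL);
    cp->kind = kind;
    cp->prev = tail;
    cp->next = NULL;
    tail->next = cp;
    return cp;
}
```

Starting from a Start entry and appending Defpoint, Gen, Blockbeg, Defpoint, Gen, Blockend, Label, and Defpoint reproduces the order shown in the figure; walking the next links then visits the entries in execution order.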

At this point, the structures that represent the program pass from
lcc's machine-independent front end into its back end, which translates
these structures into assembler code for the target machine. One can
hand-code a back end to emit code for a specific machine; such code
generators are often largely machine-specific and must be replaced en-
tirely for a new target.
The code generators in this book are driven by tables and a tree gram-
mar that maps dags to instructions as described in Chapters 13-18. This
organization makes the back ends partly independent of the target ma-
chine; that is, only part of the back end must be replaced for a new
target. The other part could be moved into the front end - which serves
all code generators for all target machines - but this step would complicate
using lcc with a different kind of code generator, so it has not
been taken.
The code generator operates by annotating the dags. It first identifies
an assembler-code template - an instruction or operand - that imple-
ments each node. Figure 1.5 shows the sample's dags annotated with
assembler code for the 386 or compatibles, henceforth termed X86. %n
denotes the assembler code for child n where the leftmost child is num-
bered 0, and %letter denotes one of the symbol-table entries at which
the node points. In this figure, the solid lines link instructions, and the
dashed lines link parts of instructions, such as addressing modes, to the
instructions in which they are used. For example, in the first dag, the
ASGNF and INDIRD nodes hold instructions, and the two ADDRFP nodes
hold their operands. Also, the CVDF node that was in the right operand
of the ASGNF in Figure 1.3 is gone - it's been swallowed by the instruction
selection because the instruction associated with the ASGNF does both the
conversion and the assignment. Chapter 14 describes the mechanics of
instruction selection and lburg, a program that generates selection code
from compact specifications.
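The %n and %letter notation can be pictured as a small substitution routine. The sketch below is only an illustration under assumed names: expand, kids, and syms are hypothetical and are not lcc's emitter. It replaces %0 through %9 with the text already produced for a child and %a through %z with a symbol-table name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand an assembler template into out (capacity cap, including the NUL).
 * %0..%9 is replaced by kids[n], the text already produced for child n;
 * %a..%z is replaced by syms[letter - 'a'], a symbol-table name.
 * Returns 0 on success, -1 if the result does not fit or an entry is missing.
 * All names here are hypothetical; this is not lcc's emitter.
 */
int expand(const char *tmpl, const char *const kids[], size_t nkids,
           const char *const syms[], size_t nsyms, char *out, size_t cap)
{
    size_t len = 0;

    assert(tmpl != NULL && out != NULL && cap > 0);
    for (; *tmpl != '\0'; tmpl++) {
        const char *piece;
        char one[2] = { 0, 0 };
        size_t n;

        if (tmpl[0] == '%' && tmpl[1] >= '0' && tmpl[1] <= '9') {
            size_t k = (size_t)(tmpl[1] - '0');
            if (k >= nkids || kids[k] == NULL)
                return -1;
            piece = kids[k];
            tmpl++;                 /* skip the digit */
        } else if (tmpl[0] == '%' && tmpl[1] >= 'a' && tmpl[1] <= 'z') {
            size_t k = (size_t)(tmpl[1] - 'a');
            if (k >= nsyms || syms[k] == NULL)
                return -1;
            piece = syms[k];
            tmpl++;                 /* skip the letter */
        } else {
            one[0] = *tmpl;         /* ordinary character: copy it */
            piece = one;
        }
        n = strlen(piece);
        if (len + n >= cap)
            return -1;
        memcpy(out + len, piece, n);
        len += n;
    }
    out[len] = '\0';
    return 0;
}
```

Expanding "%a[ebp]" with a symbol named 20 and then using the result as child 0 of "fld qword ptr %0\n" yields the kind of instruction text that appears in the generated assembler code.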
For those who don't know X86 assembler code, fld loads a floating-point
value onto a stack; fstp pops one off and stores it; fistp does
likewise but truncates the value and stores the resulting integer instead;
fadd pops two values off and pushes their sum; and pop pops an integral
value off the stack into a register. Chapter 18 elaborates.
The assembler code is easier to read after the compiler takes its next
step, which chains together the nodes that correspond to instructions
in the order in which they're to be emitted, and allocates a register for
each node that needs one. Figure 1.6 shows the linearized instructions
and registers allocated for our sample program. The figure is a bit of
a fiction - the operands aren't actually substituted into the instruction
templates until later - but the white lie helps here.
[Figure 1.5: the sample's three dags annotated with assembler templates.
Dag 1: ASGNF "fstp dword ptr %0\n", with operands ADDRFP "%a[ebp]" and
INDIRD "fld qword ptr %0\n" over ADDRFP "%a[ebp]".
Dag 2: RETI "# ret\n" over CVDI "sub esp,4\nfistp dword ptr 0[esp]\npop %c\n"
over ADDD "fadd%1\n", whose operands are CVFD "# nop\n" over
INDIRF "fld dword ptr %0\n" over ADDRFP "%a[ebp]", and
INDIRD " qword ptr %0" over ADDRGP "%a".
Dag 3: LABELV "%a:\n".]

FIGURE 1.5 After selecting instructions.

Like many compilers that originated on UNIX systems, lcc emits assembler
code and is used with a separate assembler and linker. This
book's back ends work with the vendors' assemblers on MIPS and SPARC
systems, and with Microsoft's MASM 6.11 and Borland's Turbo Assembler
4.0 under DOS. lcc generates the assembler language shown in Figure 1.7
for our sample program. The lines in this code delimit its major parts.
The first part is the boilerplate of assembler directives emitted for every
program. The second part is the entry sequence for round. The four push
instructions save the values of some registers, and the mov instruction
establishes the frame pointer for this invocation of round.
Register   Assembler Template
           fld qword ptr %a[ebp]\n
           fstp dword ptr %a[ebp]\n
           fld dword ptr %a[ebp]\n
           # nop\n
           fadd qword ptr %a\n
eax        sub esp,4\nfistp dword ptr 0[esp]\npop %c\n
           # ret\n
           %a:\n

FIGURE 1.6 After allocating registers.

.486                        ; boilerplate
.model small
extrn __turboFloat:near
extrn __setargv:near
public _round
_TEXT segment
_round:

push ebx                    ; entry sequence
push esi
push edi
push ebp
mov ebp,esp

fld qword ptr 20[ebp]       ; body of round
fstp dword ptr 20[ebp]
fld dword ptr 20[ebp]
fadd qword ptr L2
sub esp,4
fistp dword ptr 0[esp]
pop eax

L1:                         ; exit sequence
mov esp,ebp
pop ebp
pop edi
pop esi
pop ebx
ret

_TEXT ends                  ; initialized data & boilerplate
_DATA segment
align 4
L2 label byte
dd 00H,03fe00000H
_DATA ends
end

FIGURE 1.7 Generated assembler language for the sample.

The third part is the code emitted from the annotated dags shown in
Figure 1.5 with the symbol-table data filled in. The fourth part is round's
exit sequence, which restores the registers saved in the entry sequence
and returns to the caller. L1 labels the exit sequence. The last part holds
initialized data and concluding boilerplate. For round, these data consist
only of the constant 0.5; L2 is the address of a variable initialized to
000000003fe00000 (hexadecimal), which is the IEEE floating-point
representation for the 64-bit, double-precision constant 0.5.
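The bit pattern quoted for 0.5 can be checked with a few lines of C. This is a minimal sketch, assuming the host's double is an IEEE 754 64-bit value; double_words is a hypothetical helper, not part of lcc. Since 0.5 is 1.0 x 2^-1, the sign bit is 0, the biased exponent is 1023 - 1 = 0x3fe, and the fraction is zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Store the high and low 32-bit halves of d's 64-bit representation.
 * Assumes double is an IEEE 754 binary64 value, as it is on the X86.
 */
void double_words(double d, uint32_t *high, uint32_t *low)
{
    uint64_t bits;

    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);     /* copy the bytes; no pointer casts */
    *high = (uint32_t)(bits >> 32);
    *low = (uint32_t)bits;
}
```

For 0.5 the high word is 3fe00000 and the low word is 0. The dd directive lists the low word first because the X86 stores the least significant bytes at the lowest addresses.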

1.4 Design
There was no separate design phase for lcc. It began as a compiler for
a subset of C, so its initial design goals were modest and focussed on
its use in teaching about compiler implementation in general and about
code generation in particular. Even as lcc evolved into a compiler for
ANSI C that suits production use, the design goals changed little.
Computing costs less and less, but programmers cost more and more.
When obliged to choose between two designs, we usually chose the one
that appeared to save our time and yours, as long as the quality of the
generated code remained satisfactory. This priority made lcc simple,
fast, and less ambitious at optimizing than some competing compilers.
lcc was to have multiple targets, and it was overall simplicity that
counted. That is, we wrote extra code in lcc's one machine-independent
part to save code in its multiple target-specific parts. Most of the design
and implementation effort devoted to lcc has been directed at making
it easy to port lcc to new targets.
lcc had to be simple because it was being written by only two
programmers with many other demands on their time. Simplicity saved
implementation time and saves more when it comes time to change the
compiler. Also, we wanted to write this book, and you'll see that it was
hard to make even a simple compiler fit.
lcc is smaller and faster than most other ANSI C compilers. Compilation
speed is sometimes neglected in compiler design, but it is widely
appreciated; users often cite compilation speed as one of the reasons they
use lcc. Fast compilation was not a design goal per se; it's a consequence
of striving for simplicity and of paying attention to those relatively few
compiler components where speed really matters. lcc's lexical analysis
(Chapter 6) and instruction selection (Chapter 14) are particularly fast,
and contribute most to its speed.
lcc generates reasonably efficient object code. It's designed specifically
to generate good local code; global optimizations, like those done
by optimizing compilers, were not part of lcc's design. Most modern
compilers, particularly those written by a CPU vendor to support its
machines, must implement ambitious optimizers so that benchmarks put
their machines in the best light. Such compilers are complex and
typically supported by groups of tens of programmers. Highly optimizing
C compilers generate more efficient code than lcc does when their
optimization options are enabled, but the hundreds of programmers who
use lcc daily as their primary C compiler find that its generated code
is fast enough for most applications, and they save another scarce
resource - their own time - because lcc runs faster. And lcc is easier
to understand when systems programmers find they must change it.
Compilers don't live in a vacuum. They must cooperate with
preprocessors, linkers, loaders, debuggers, assemblers, and operating
systems, all of which may depend on the target. Handling all of the
target-dependent variants of each of these components is impractical. lcc's
design minimizes the adverse impact of these components as much as
possible. For example, its target-dependent code generators emit assem-
bler language and rely on the target's assembler to produce object code.
It also relies on the availability of a separate preprocessor. These design
decisions are not without some risk; for example, in vendor-supplied as-
semblers, we have tripped across several bugs over which we have no
control and thus must live with.
A more important example is generating code with calling sequences
that are compatible with the target's conventions. It must be possible
for lcc to do this so it can use existing libraries. A standard ANSI C
library is a significant undertaking on its own, but even if lcc came with
its own library, it would still need to be able to call routines in target-
specific libraries, such as those that supply system calls. The same con-
straint applies to proprietary third-party libraries, which are increasingly
important and are usually available only in object-code form.
Generating compatible code has significant design consequences on
both lcc's target-independent front end and its target-dependent back
ends. A good part of the apparent complexity in the interface between
the front and back ends, detailed in Chapter 5, is due directly to the
tension between this design constraint and those that strive for simplic-
ity and retargetability. The mechanisms in the interface that deal with
passing and returning structures are an example.
lcc's front end is roughly 9,000 lines of code. Its target-dependent
code generators are each about 700 lines, and there are about 1,000
lines of target-independent back-end code that are shared between the
code generators.
With a few exceptions, lcc's front end uses well established compiler
techniques. As surveyed in the previous section, the front end per-
forms lexical, syntactic, and semantic analysis. It also eliminates local
common subexpressions (Chapter 12), folds constant expressions, and
makes many simple, machine-independent transformations that improve
the quality of local code (Chapter 9); many of these improvements are
simple tree transformations that lead to better addressing code. It also
lays down efficient code for loops and switch statements (Chapter 10).
lcc's lexical analyzer and its recursive-descent parser are both written
by hand. Using compiler-construction tools, such as parser generators, is
perhaps the more modern approach for implementing these components,
but using them would make lcc dependent on specific tools. Such
dependencies are less a problem now than when lcc was first available, but
there's little incentive to change working code. Theoretically, using these
kinds of tools simplifies both future changes and fixing errors, but
accommodating change is less important for a standardized language like
ANSI C, and there have been few lexical or syntactic errors. Indeed,
probably less than 15 percent of lcc's code concerns parsing, and the error
rate in that code is negligible. Despite its theoretical prominence,
parsing is a relatively minor component in lcc and other compilers; semantic
analysis and code generation are the major components and account for
most of the code - and have most of the bugs.
One of the reasons that lcc's back ends are its most interesting
components is that they show the results of the design choices we made
to enhance retargetability. For retargeting, future changes - each new
target - are important, and the retargeting process must make it
reasonably easy to cope with code-generation errors, which are certain to
occur. There are many small design decisions made throughout lcc that
affect retargetability, but two dominate.
First, the back ends use a code-generator generator, lburg, that
produces code generators from compact specifications. These specifications
describe how dags are mapped into instructions or parts thereof
(Chapter 14). This approach simplifies writing a code generator, generates
optimal local code, and helps avoid errors because lburg does most of
the tedious work. One of the lburg specifications in this book can often
be used as a starting point for a new target, so retargeters don't have to
start from scratch. To avoid depending on foreign tools, the companion
diskette includes lburg, which is written in ANSI C.
Second, whenever practical, the front end implements as much of an
apparently target-dependent function as possible. For example, the front
end implements switch statements completely, and it implements access
to bit fields by synthesizing appropriate combinations of shifting and
masking. Doing so precludes the use of instructions designed specifi-
cally for bit-field access and switch statements on those increasingly few
targets that have them; simplifying retargeting was deemed more impor-
tant. The front end can also completely implement passing or returning
structures, and it does so using techniques that are often used in target-
dependent calling conventions. These capabilities are under the control
of interface options, so, on some targets, the back end can ignore these
aspects of code generation by setting the appropriate option.
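Bit-field access by shifting and masking can be sketched in a few lines. This is an illustration of the general technique only, not the trees that lcc's front end builds; field_get and field_set are hypothetical names, and the sketch limits itself to fields within the low 16 bits so that it is portable to any conforming unsigned type.

```c
#include <assert.h>

/* Extract the width-bit field whose least significant bit is bit lsb. */
unsigned field_get(unsigned word, unsigned lsb, unsigned width)
{
    unsigned mask;

    assert(width > 0 && lsb + width <= 16);
    mask = (1u << width) - 1;           /* width one-bits, right-justified */
    return (word >> lsb) & mask;        /* shift the field down, mask the rest */
}

/* Return word with the same field replaced by the low bits of value. */
unsigned field_set(unsigned word, unsigned lsb, unsigned width, unsigned value)
{
    unsigned mask;

    assert(width > 0 && lsb + width <= 16);
    mask = (1u << width) - 1;
    return (word & ~(mask << lsb))      /* clear the old field */
         | ((value & mask) << lsb);     /* insert the new one */
}
```

A target with dedicated bit-field instructions could do each operation in one instruction; the shift-and-mask form trades that for code every target can execute.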
While lcc's overall design goals changed little as the compiler evolved,
the ways in which these goals were realized changed often. Most of these
changes swept more functionality into the front end. The switch
statement is an example. In earlier versions of lcc, the code-generation
interface included functions that the back end provided specifically to emit
the selection code for a switch statement. As new targets were added,
it became apparent that the new versions of these functions were nearly
identical to the corresponding functions in existing targets. This experi-
ence revealed the relatively simple design changes that permitted all of
this code to be moved into the front end. Doing so required changing
all of the existing back ends, but these changes removed code, and the
design changes simplify the back ends on future targets.
The most significant and most recent design change involves the way
lcc is packaged. Previously, lcc was configured with one back end; that
is, the back end for target X was combined with the front end to form
an instance of lcc that ran on X and generated code for X. Most of
lcc's back ends generate code for more than one operating system. Its
MIPS back end, for example, generates code for MIPS computers that
run DEC's Ultrix or SGI's IRIX, so two instances of lcc were configured.
N targets and M operating systems required N x M instances of lcc
in order to test them completely, and each one was configured from a
slightly different set of source modules depending on the target and the
operating system. For even small values of N and M, building N x M
compilers quickly becomes tedious and prone to error.
In developing the current version of lcc for this book, we changed the
code-generation interface, described in Chapter 5, so that it's possible to
combine all of the back ends into a single program. Any instance of lcc
is a cross-compiler. That is, it can generate code for any of its targets
regardless of the operating system on which it runs. A command-line
option selects the desired target. This design packages all target-specific
data in a structure, and the option selects the appropriate structure,
which the front end then uses to communicate with the back end. This
change again required modifying all of the existing back ends, but the
changes added little new code. The benefits were worth the effort: Only
M instances of lcc are now needed, and they're all built from one set of
source modules. Bugs tend to be easier to decrypt because they can
usually be reproduced in all instances of lcc by specifying the appropriate
target, and it's possible to include targets whose sole purpose is to help
diagnose bugs. It's still possible to build a one-target instance of lcc
when it's important to save space.
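The packaging described above, one structure per target selected by a command-line option, can be sketched as a table lookup. Everything in this sketch is an assumption made for illustration: the Target type, its fields, the target names, and select_target are hypothetical and far smaller than the interface that Chapter 5 describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for a per-target interface record. */
typedef struct target {
    const char *name;               /* value given to the command-line option */
    int ptr_size;                   /* an example of target-specific data */
    const char *(*comment)(void);   /* an example of a target-specific function */
} Target;

static const char *hash_comment(void) { return "#"; }
static const char *bang_comment(void) { return "!"; }
static const char *semi_comment(void) { return ";"; }

/* All back ends are linked into one program; each contributes one record. */
static const Target targets[] = {
    { "mips-irix",     4, hash_comment },
    { "sparc-solaris", 4, bang_comment },
    { "x86-dos",       4, semi_comment },
};

/* Return the record for the named target, or NULL if it is unknown. */
const Target *select_target(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof targets / sizeof targets[0]; i++)
        if (strcmp(targets[i].name, name) == 0)
            return &targets[i];
    return NULL;
}
```

After the lookup, the front end would reach the back end only through the selected record, which is what lets one executable generate code for any of its targets.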
lcc's source code documents the results of the hundreds of subordinate
design choices that must be made when implementing software of
any significance. The source code for lcc and for this book is in noweb
files that alternate text and code just as this book does. The code is
extracted to form lcc's modules, which appear on the companion diskette.
Table 1.1 shows the correspondence between chapters and modules, and
groups the modules according to their primary functions. Some corre-
spondences are one-to-one, some chapters generate several small mod-
ules, and one large module is split across three chapters.
The modules without chapter numbers are omitted from this book,
but they appear on the companion diskette. list.c implements the
list-manipulation functions described in Exercise 2.15, output.c holds
the output functions, and init.c parses and processes C initializers.
event.c implements the event hooks described in Section 8.5, trace.c
emits code to trace calls and returns, and prof.c and profio.c emit
profiling code.
Function                  Chapter      Header     Modules
------------------------  -----------  ---------  --------------------------
common definitions        1            c.h
infrastructure and        2                       alloc.c string.c
data structures           3                       sym.c
                          4                       types.c
                                                  list.c
code-generation           5            ops.h      bind.c null.c symbolic.c
interface
I/O and                   6            token.h    input.c lex.c
lexical analysis                                  output.c
parsing and               7                       error.c
semantic analysis         8                       expr.c tree.c
                          9                       enode.c expr.c simp.c
                          10                      stmt.c
                          11                      decl.c main.c
                                                  init.c
intermediate-code         12                      dag.c
generation
debugging and                                     event.c trace.c
profiling                                         prof.c profio.c
target-independent        13           config.h
instruction selection     13, 14, 15              gen.c
and register management
code generators           16                      mips.md
                          17                      sparc.md
                          18                      x86.md

TABLE 1.1 Chapters and modules.

By convention, each chapter specifies the implementation of its module
by a fragment of the form
(M 15)=
#include "c.h"
(M macros)
(M types)
(M prototypes)
(M data)
(M functions)

where M is the module name, like alloc.c. (M macros), (M types), and
(M prototypes) define macros and types and declare function prototypes
that are used only within the module. (M data) and (M functions) include
definitions (not declarations) for both external and static data and
functions. Empty fragments are elided. A module is extracted by giving
notangle a module name, such as alloc.c, and it extracts the fragment
shown above and all the fragments it uses, which yields the code for
the module.
Page numbers are not included in the fragments above, and they do
not appear in the index; they're used in too many places, and the long
lists of page numbers would be useless. Pointers to previous and subse-
quent definitions are given, however.

1.5 Common Declarations


Each module also specifies what identifiers it exports for use in other
modules. Declarations for exported identifiers are given in fragments
named (M typedefs), (M exported macros), (M exported types),
(M exported data), and (M exported functions), where M names a module.
The header file c.h collects these fragments from all modules by
defining fragments without the Ms whose definitions list the similarly named
fragments from each module. All modules include c.h. These fragments
are neither page-numbered nor indexed, just like those in the last section,
and for the same reason.
(c.h 16)=
(exported macros)
(typedefs)
#include "config.h"
(interface 78)
(exported types)
(exported data)
(exported functions)

The include file config.h defines back-end-specific types that are
referenced in (interface), as detailed in Chapter 5. c.h defines lcc's global
structures and some of its global manifest constants.
lcc can be compiled with pre-ANSI compilers. There are just enough
of these left that it seems prudent to maintain compatibility with them.
ANSI added prototypes, which are so helpful in detecting errors that
we want to use them whenever we can. The following fragments from
output.c show how lcc does so.
(output.c exported functions)=
extern void outs ARGS((char *));

(output.c functions)=
void outs(s) char *s; {
    char *p;

    for (p = bp; (*p = *s++) != 0; p++)
        ;
    bp = p;
    if (bp > io[fd]->limit)
        outflush();
}

Function definitions omit prototypes, so old compilers compile them
directly. Function declarations precede the definitions and give the entire
list of ANSI parameter types as one argument to the macro ARGS. ANSI
compilers must predefine __STDC__, so ARGS yields the types if __STDC__
is defined and discards them otherwise.
(c.h exported macros)=
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
A pre-ANSI compiler sees the declaration for outs as
extern void outs ();
but lcc and other ANSI C compilers see
extern void outs (char *);

Since the declaration for outs appears before its definition, ANSI com-
pilers must treat the definition as if it included the prototype, too, and
thus will check the legality of the parameters in all calls to outs.
ANSI also changed variadic functions. The macro va_start now
expects the last declared parameter as an argument, and varargs.h became
stdarg.h:
(c.h exported macros)+=
#ifdef __STDC__
#include <stdarg.h>
#define va_init(a,b) va_start(a,b)
#else
#include <varargs.h>
#define va_init(a,b) va_start(a)
#endif
Definitions of variadic functions also differ. The ANSI C definition
void print(char *fmt, ...) { ... }
replaces the pre-ANSI C definition
void print(fmt, va_alist) char *fmt; va_dcl { ... }
so lcc's macro VARARGS uses the ANSI parameter list or the pre-ANSI
parameter list and separate declarations depending on the setting of
__STDC__:
(c.h exported macros)+=
#ifdef __STDC__
#define VARARGS(newlist,oldlist,olddcls) newlist
#else
#define VARARGS(newlist,oldlist,olddcls) oldlist olddcls
#endif
The definition of print from output.c shows the use of ARGS, va_init,
and VARARGS.
(output.c exported functions)+=
extern void print ARGS((char *, ...));

(output.c functions)+=
void print VARARGS((char *fmt, ...),
(fmt, va_alist),char *fmt; va_dcl) {
    va_list ap;

    va_init(ap, fmt);
    vprint(fmt, ap);
    va_end(ap);
}

This definition is verbose because it gives the same information in two
slightly different formats, but lcc uses VARARGS so seldom that it's not
worth fixing.
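The three macros can be tried together outside lcc. The sketch below repeats the macro definitions so that it stands alone and applies them to a hypothetical function, sum_ints, that is not part of lcc. With any ANSI compiler only the __STDC__ branches are compiled, so the pre-ANSI branches are shown but not exercised here.

```c
#include <assert.h>

#ifdef __STDC__
#include <stdarg.h>
#define ARGS(list) list
#define va_init(a,b) va_start(a,b)
#define VARARGS(newlist,oldlist,olddcls) newlist
#else
#include <varargs.h>
#define ARGS(list) ()
#define va_init(a,b) va_start(a)
#define VARARGS(newlist,oldlist,olddcls) oldlist olddcls
#endif

/* The doubled parentheses make the whole parameter list one macro argument. */
extern int sum_ints ARGS((int n, ...));

/* Return the sum of the n int arguments that follow n (hypothetical example). */
int sum_ints VARARGS((int n, ...),
(n, va_alist),int n; va_dcl) {
    va_list ap;
    int i, sum = 0;

    assert(n >= 0);
    va_init(ap, n);
    for (i = 0; i < n; i++)
        sum += va_arg(ap, int);
    va_end(ap);
    return sum;
}
```

The parameter information appears twice in the definition, once per dialect, which is the verbosity the text mentions.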
c.h also includes a few general-purpose macros that fit nowhere else.
(c.h exported macros)+=
#define NULL ((void*)0)
NULL is a machine-independent expression for a null pointer; in
environments where integers and pointers aren't the same size, f(NULL) passes
a correct pointer where f(0) can pass more bytes or fewer in the absence
of a prototype for f. lcc's generated code assumes that pointers fit in
unsigned integers. lcc can, however, be compiled by other compilers for
which this assumption is false, that is, for which pointers are larger than
integers. Using NULL in calls avoids these kinds of errors in environments
where pointers are wider than unsigned integers, and thus permits lcc
to be compiled and used as a cross-compiler in such environments.
(c.h exported macros)+=
#define NELEMS(a) ((int)(sizeof (a)/sizeof ((a)[0])))
#define roundup(x,n) (((x)+((n)-1))&(~((n)-1)))
NELEMS(a) gives the number of elements in array a, and roundup(x,n)
returns x rounded up to the next multiple of n, which must be a power
of two.
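A short usage sketch may help with the power-of-two requirement. The two macros are repeated so the example stands alone; aligned_size and nprimes are hypothetical helpers. Adding n-1 carries x past the next multiple of n unless x is already one, and masking with ~(n-1) then clears the low bits, which is only correct when n has a single bit set.

```c
#include <assert.h>

#define NELEMS(a) ((int)(sizeof (a)/sizeof ((a)[0])))
#define roundup(x,n) (((x)+((n)-1))&(~((n)-1)))

/*
 * Round size up to a multiple of align, checking the power-of-two
 * precondition that roundup itself does not verify.
 */
unsigned long aligned_size(unsigned long size, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return roundup(size, align);
}

static const int primes[] = { 2, 3, 5, 7, 11 };

/* NELEMS works only on true arrays, not on pointers to their elements. */
int nprimes(void)
{
    return NELEMS(primes);
}
```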

1.6 Syntax Specifications


Grammars are used throughout this book to specify syntax. Examples
include C's lexical structure and its syntax and the specifications read by
lburg, lcc's code-generator generator.
A grammar defines a language, which is a set of sentences composed
of symbols from an alphabet. These symbols are called terminal sym-
bols or tokens. Grammar rules, or productions, define the structure, or
syntax, of the sentences in the language. Productions specify the ways in
which sentences can be produced from nonterminal symbols by repeat-
edly replacing a nonterminal by one of its rules.
A production specifies a sequence of grammar symbols that can
replace a nonterminal, and a production is defined by listing the
nonterminal, a colon, and the nonterminal's replacement. A list of replacements for a
nonterminal is given by displaying the alternatives on separate lines or
by separating them by vertical bars (|). Optional phrases are enclosed in
brackets ([...]), braces ({...}) enclose phrases that can be repeated zero
or more times, and parentheses are used for grouping. Nonterminals ap-
pear in slanted type and terminals appear in a fixed-width typewriter
type. The notation "one of ... " is also used to specify a list of alternatives,
all of which are terminals. When vertical bars, parentheses, brackets, or
braces appear as terminals, they're enclosed in single quotes to avoid
confusing their use as terminals with their use in defining productions.
For example, the productions
expr:
    term { ( + | - ) term }
term:
    factor { ( * | / ) factor }
factor:
    ID
    '(' expr ')'

define a language of simple expressions. The nonterminals are expr,
term, and factor, and the terminals are ID + - * / ( ). The first
production says that an expr is a term followed by zero or more occurrences
of + term or - term, and the second production is a similar specification
for the multiplicative operators. The last two productions specify that
a factor is an ID or a parenthesized expr. These last two productions
could also be written more compactly as
factor: ID | '(' expr ')'
Giving some alternatives on separate lines often makes grammars easier
to read.
Simple function calls could be added to this grammar by adding the
production
factor: ID '(' expr { , expr } ')'
which says that a factor can also be an ID followed by a parenthesized
list of one or more exprs separated by commas. All three productions
for factor could be written as
factor: ID [ '(' expr { , expr } ')' ] | '(' expr ')'
which says that a factor is an ID optionally followed by a parenthesized
list of comma-separated exprs, or just a parenthesized expr.
This notation for syntax specifications is known as extended Backus-
Naur form, or EBNF. Section 7.1 gives the formalities of using EBNF gram-
mars to derive the sentences in a language.
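EBNF productions translate almost mechanically into a recursive-descent recognizer: one function per nonterminal, a loop for each pair of braces, and an if for each pair of brackets. The sketch below does this for the expression grammar with function calls. It is an illustration, not lcc's parser; the function names are hypothetical, and it assumes that an ID is a letter followed by letters or digits and that blanks may separate tokens.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cp;          /* next unread input character */

static int expr(void);

static void skipblanks(void) { while (*cp == ' ') cp++; }

/* If the next nonblank character is c, consume it and return 1. */
static int match(int c) {
    skipblanks();
    if (*cp != c)
        return 0;
    cp++;
    return 1;
}

/* factor: ID [ '(' expr { , expr } ')' ] | '(' expr ')' */
static int factor(void) {
    skipblanks();
    if (isalpha((unsigned char)*cp)) {
        while (isalnum((unsigned char)*cp))
            cp++;
        if (match('(')) {               /* the optional [ ... ] phrase */
            do {
                if (!expr())
                    return 0;
            } while (match(','));       /* the repeated { , expr } phrase */
            return match(')');
        }
        return 1;
    }
    if (match('('))
        return expr() && match(')');
    return 0;
}

/* term: factor { ( * | / ) factor } */
static int term(void) {
    if (!factor())
        return 0;
    while (match('*') || match('/'))
        if (!factor())
            return 0;
    return 1;
}

/* expr: term { ( + | - ) term } */
static int expr(void) {
    if (!term())
        return 0;
    while (match('+') || match('-'))
        if (!term())
            return 0;
    return 1;
}

/* Return 1 if s is a sentence of the grammar, 0 otherwise. */
int is_expr(const char *s) {
    assert(s != NULL);
    cp = s;
    if (!expr())
        return 0;
    skipblanks();
    return *cp == '\0';
}
```

Because the grammar requires at least one expr inside a call's parentheses, the recognizer rejects an empty argument list such as f().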

1.7 Errors
lcc is a large, complex program. We find and repair errors routinely. It's
likely that errors were present when we started writing this book and
that the act of writing added more. If you think that you've found an
error, here's what to do.
1. If you found the error by inspecting code in this book, you might
not have a source file that displays the error, so start by creat-
ing one. Most errors, however, are exposed when programmers try
to compile a program they think is valid, so you probably have a
demonstration program already.
2. Preprocess the source file and capture the preprocessor output.
Discard the original code.
3. Prune your source code until it can be pruned no more without
sending the error into hiding. We prune most error demonstrations
to fewer than five lines. We need you to do this pruning because
there are a lot of you and only two of us.
4. Confirm that the source file displays the error with the distributed
version of lcc. If you've changed lcc and the error appears only in
your version, then you'll have to chase the error yourself, even if it
turns out to be our fault, because we can't work on your code.
5. Annotate your code with comments that explain why you think that
lcc is wrong. If lcc dies with an assertion failure, please tell us
where it died. If lcc crashes, please report the last part of the call
chain if you can. If lcc is rejecting a program you think is valid,
please tell us why you think it's valid, and include supporting page
numbers in the ANSI Standard, Appendix A in The C Programming
Language (Kernighan and Ritchie 1988), or the appropriate section
in C: A Reference Manual (Harbison and Steele 1991). If lcc silently
generates incorrect code for some construct, please include the
corrupt assembler code in the comments and flag the bad instructions
if you can.
6. Confirm that your error hasn't been fixed already. The latest
version of lcc is always available for anonymous ftp in pub/lcc from
ftp.cs.princeton.edu. A LOG file there reports what errors were
fixed and when they were fixed. If you report an error that's been
fixed, you might get a canned reply.
7. Send your program in an electronic mail message addressed to
lcc-bugs@cs.princeton.edu. Please send only valid C programs;
put all remarks in C comments so that we can process reports
semi-automatically.

Further Reading
Most compiler texts survey the breadth of compiling algorithms and do
not describe a production compiler, i.e., one that's used daily to compile
production programs. This book makes the other trade-off, sacrificing
the broad survey and showing a production compiler in-depth. These
"breadth" and "depth" books complement one another. For example,
when you read about lcc's lexical analyzer, consider scanning the
material in Aho, Sethi, and Ullman (1986); Fischer and LeBlanc (1991); or
Waite and Goos (1984) to learn more about alternatives or the underly-
ing theory. Other depth books include Holub (1990) and Waite and Carter
(1993).
Fraser and Hanson (1991b) describe a previous version of lcc, and
include measurements of its compilation speed and the speed of its
generated code. This paper also describes some of lcc's design alternatives
and its tracing and profiling facilities.
This chapter tells you everything you need to know about noweb to use
this book, but if you want to know more about the design rationale or
implementation see Ramsey (1994). noweb is a descendant of WEB (Knuth
1984). Knuth (1992) collects several of his papers about literate program-
ming.
The ANSI Standard (American National Standards Institute, Inc. 1990)
is the definitive specification for the syntax and semantics of the C
programming language. Unlike some other C compilers, lcc compiles only
ANSI C; it does not support older features that were dropped by the
ANSI committee. After the standard, Kernighan and Ritchie (1988) is the
quintessential reference for C. It appeared just before the standard was
finalized, and thus is slightly out of date. Harbison and Steele (1991)
was published after the standard and gives the syntax for C exactly as it
appears in the standard. Wirth (1977) describes EBNF.
2
Storage Management

Complex programs allocate memory dynamically, and lcc is no exception.
In C, malloc allocates memory and free releases it. lcc could use
malloc and free, but there is a superior alternative that is more efficient,
easier to program, and better suited for use in compilers, and it is easily
understood in isolation.
Calling malloc incurs the obligation of a subsequent call to free. The
cost of this explicit deallocation can be significant. More important, it's
easy to forget it or, worse, deallocate something that's still referenced.
In some applications, most deallocations occur at the same time. Window
systems are an example. Space for scroll bars, buttons, etc., is
allocated when the window is created and deallocated when the window
is destroyed. A compiler, like lcc, is another example. lcc allocates
memory in response to declarations, statements, and expressions as they
occur within functions, but it deallocates memory only at the ends of
statements and functions.
Most implementations of malloc use memory-management algorithms
that are necessarily based on the sizes of objects. Algorithms based on
object lifetimes are more efficient - if all of the deallocations can be
done at once. Indeed, stacklike allocation would be most efficient, but
it can be used only if object lifetimes are nested, which is generally not
the case in compilers and many other applications.
This chapter describes lcc's storage management scheme, which is
based on object lifetimes. In this scheme, allocation is more efficient than
malloc, and the cost of deallocation is negligible. But the real benefit
is that this scheme simplifies the code. Allocation is so cheap that it
encourages simple applicative algorithms in place of more space-efficient
but complex ones. And allocation incurs no deallocation obligation, so
deallocation can't be forgotten.
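The idea of lifetime-based allocation can be sketched independently of lcc. The code below is a simplified illustration and not the implementation this chapter develops: struct arena, arena_alloc, arena_free, and BLOCK_BYTES are hypothetical names, the arena is passed as a pointer rather than as an integer identifier, and blocks are returned to free instead of being kept for reuse. It uses a C99 flexible array member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A type whose alignment is assumed strict enough for the objects allocated. */
union align { long l; double d; void *p; void (*f)(void); };

struct block {
    struct block *next;         /* older block in the same arena */
    size_t used, size;          /* bytes handed out, bytes available in data */
    union align data[];         /* the space itself (C99 flexible array) */
};

struct arena { struct block *head; };

#define BLOCK_BYTES 4096        /* hypothetical default block size */

/* Allocate n bytes from arena a; returns NULL if memory is exhausted. */
void *arena_alloc(struct arena *a, size_t n)
{
    struct block *b;
    void *p;

    assert(a != NULL);
    /* Round n up so that the next object is aligned too. */
    n = (n + sizeof (union align) - 1) / sizeof (union align)
        * sizeof (union align);
    b = a->head;
    if (b == NULL || b->size - b->used < n) {
        size_t size = n > BLOCK_BYTES ? n : BLOCK_BYTES;
        b = malloc(offsetof(struct block, data) + size);
        if (b == NULL)
            return NULL;
        b->next = a->head;
        b->used = 0;
        b->size = size;
        a->head = b;
    }
    p = (char *)b->data + b->used;      /* allocation is a pointer bump */
    b->used += n;
    return p;
}

/* Release every object in the arena at once. */
void arena_free(struct arena *a)
{
    assert(a != NULL);
    while (a->head != NULL) {
        struct block *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```

Most allocations cost a comparison and an addition, and no object carries a deallocation obligation of its own, which is the property the text emphasizes.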

2.1 Memory Management Interface


Memory is allocated from arenas, and entire arenas are deallocated at
once. Objects with the same lifetimes are allocated from the same arena.
The arena is identified by an arena identifier - a nonnegative integer -
when space from it is allocated or when all of it is deallocated:

(alloc.c exported functions)=
extern void *allocate ARGS((unsigned long n, unsigned a));
extern void deallocate ARGS((unsigned a));


Many allocations have the form
struct T *p;
p = allocate(sizeof *p, a);
for some C structure T and arena a. The use of sizeof *p where p is
a pointer works for any pointer type. Alternatives that depend on the
pointer's referent type are prone to error when the code is changed. For
example,
p = allocate(sizeof (struct T), a);
is correct only if p really is a pointer to a struct T. If p is changed to
a pointer to another structure and the call isn't updated, allocate may
allocate too much or too little space. The former is merely inefficient,
but the latter is disastrous.
This allocation idiom is so common that it deserves a macro:
(alloc.c exported macros)=
#define NEW(p,a) ((p) = allocate(sizeof *(p), (a)))
#define NEW0(p,a) memset(NEW((p),(a)), 0, sizeof *(p))
allocate and thus NEW return a pointer to uninitialized space on the
grounds that most clients will initialize it immediately. NEW0 is used for
those allocations that need the new space cleared, which is accomplished
by the C library function memset. memset returns its first argument.
Notice that both NEW and NEW0 evaluate p exactly once, so it's safe to use an
expression that has side effects as an actual argument to either macro;
e.g., NEW(a[i++]).
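The single-evaluation property can be checked with a small sketch. It assumes a stand-in allocate that ignores the arena and calls malloc; struct node and make_nodes are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for lcc's allocate: ignores the arena and calls malloc.
   This is an assumption for illustration only. */
static void *allocate(unsigned long n, unsigned a) {
	void *p = malloc(n);
	(void)a;
	if (p == NULL)
		exit(1);
	return p;
}

#define NEW(p,a)  ((p) = allocate(sizeof *(p), (a)))
#define NEW0(p,a) memset(NEW((p),(a)), 0, sizeof *(p))

struct node { int value; struct node *link; };

/* Fills v[0..n-1] with cleared nodes and returns the final index.
   v[i++] is evaluated once per expansion because sizeof does not
   evaluate its operand. */
static int make_nodes(struct node **v, int n) {
	int i = 0;
	while (i < n)
		NEW0(v[i++], 0);
	return i;
}
```

If the macro evaluated its first argument more than once, the index would advance by two or three per iteration and the loop would skip elements.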
Incidentally, the result of sizeof has type size_t, which must be an
unsigned integral type capable of representing the size of the largest
object that can be declared. In practice, size_t is either unsigned int or
unsigned long. The declaration for allocate uses unsigned long so that
it can always represent the result of sizeof.
Arrays are another common allocation, and newarray allocates enough
uninitialized space in a given arena for m elements each of size n bytes:
(alloc.c exported functions)+=
extern void *newarray
	ARGS((unsigned long m, unsigned long n, unsigned a));

2.2 Arena Representation


The implementation of the memory management module is:
(alloc.c 25)=
#include "c.h"
(alloc.c types)
#ifdef PURIFY
(debugging implementation)
#else
(alloc.c data)
(alloc.c functions)
#endif
If PURIFY is defined, the implementation is replaced in its entirety by
one that uses malloc and free, and is suitable for finding errors. See
Exercise 2.1 for details.
As mentioned above, an arena is a linked list of large blocks of mem-
ory. Each block begins with a header defined by:
(alloc.c types)=
struct block {
struct block *next;
char *limit;
char *avail;
};

The space immediately following the arena structure up to the location
given by the limit field is the allocable portion of the block. avail points
to the first free location within the block; space below avail has been
allocated and space beginning at avail and up to limit is available. The
next field points to the next block in the list. The implementation keeps
an arena pointer, which points to the first block in the list with available
space. Blocks are added to the list dynamically during allocation, as
detailed below. Figure 2.1 shows an arena after three blocks have been
allocated. Shading indicates allocated space. The unused space at the
end of the first full-sized arena in Figure 2.1 is explained below.
There are three arenas known by the integers 0-2; clients usually
equate symbolic names to these arena identifiers for use in calls to
allocate, deallocate, and newarray; see Section 5.12. The arena identi-
fiers index an array of pointers to one-element lists, each of which holds
a zero-length block. The first allocation in each arena causes a new block
to be appended to the end of the appropriate list.
(alloc.c data)=
static struct block
	first[] = { { NULL }, { NULL }, { NULL } },
	*arena[] = { &first[0], &first[1], &first[2] };
The initializer for first serves only to provide its size; the omitted
initializers cause the remaining fields of each of the three structures to be
initialized to null pointers. While this implementation has only three
arenas, it is easily generalized to any number of arenas by changing only
the number of initializations for first and arena. Section 5.12 describes
how lcc uses the three arenas.

FIGURE 2.1 Arena representation.

2.3 Allocating Space

Most allocations are trivial: Round the request amount up to the proper
alignment boundary, increment the avail pointer by the amount of the
rounded request, and return the previous value.
(alloc.c functions)=
void *allocate(n, a) unsigned long n; unsigned a; {
	struct block *ap;

	ap = arena[a];
	n = roundup(n, sizeof (union align));
	while (ap->avail + n > ap->limit) {
		(get a new block 27)
	}
	ap->avail += n;
	return ap->avail - n;
}
(alloc.c types)+=
union align {
	long l;
	char *p;
	double d;
	int (*f) ARGS((void));
};

Like malloc, allocate must return a pointer suitably aligned to hold
values of any type. The size of the union align gives the minimum such
alignment on the host machine. Its fields are those that are most likely
to have the strictest alignment requirements.
The while loop in the code above terminates when the block pointed to
by ap has at least n bytes of available space. For most calls to allocate,
this block is the one pointed to by the arena pointer selected by
allocate's second argument.
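The rounding step can be sketched in isolation. The helper names round_up and aligned_size are hypothetical, and the division-based formula is an assumption chosen because it does not require the alignment to be a power of two; the roundup macro that lcc itself uses is defined elsewhere in the book.

```c
/* Same fields as the union align in the text, written with an
   ANSI prototype instead of the ARGS macro. */
union align {
	long l;
	char *p;
	double d;
	int (*f)(void);
};

/* Rounds x up to the next multiple of n (n > 0). A hypothetical
   helper; it does not assume that n is a power of two. */
static unsigned long round_up(unsigned long x, unsigned long n) {
	return (x + n - 1) / n * n;
}

/* Size that an n-byte request occupies after alignment rounding. */
static unsigned long aligned_size(unsigned long n) {
	return round_up(n, sizeof (union align));
}
```

Because every request is rounded to a multiple of sizeof (union align) and each block's first free address is itself aligned, every pointer handed out stays aligned.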
If the request cannot be satisfied from the current block, a new block
must be allocated. As shown below, deallocate never frees a block;
instead, it keeps the free blocks on a list that emanates from freeblocks.
allocate checks this list before getting a new block:
(alloc.c data)+=
static struct block *freeblocks;

(get a new block 27)=
if ((ap->next = freeblocks) != NULL) {
	freeblocks = freeblocks->next;
	ap = ap->next;
} else
	(allocate a new block 28)
ap->avail = (char *)((union header *)ap + 1);
ap->next = NULL;
arena[a] = ap;

(alloc.c types)+=
union header {
	struct block b;
	union align a;
};

The union header ensures that ap->avail is set to a properly aligned
address. Once ap points to a new block, the arena pointer passed to
allocate is set to point to this new block for subsequent allocations. If
the new block came from freeblocks, it might be too small to hold n
bytes, which is why there's a while loop in allocate.
If a new block must be allocated, one is requested that is large enough
to hold the block header and n bytes, and have 10K of available space
left over:
(allocate a new block 28)=
{
	unsigned m = sizeof (union header) + n + 10*1024;
	ap->next = malloc(m);
	ap = ap->next;
	if (ap == NULL) {
		error("insufficient memory\n");
		exit(1);
	}
	ap->limit = (char *)ap + m;
}

When a request cannot be filled in the current block, the free space at
the end of the current block is wasted. This waste is illustrated in the
first full-size arena in Figure 2.1.
newarray's implementation simply calls allocate:
(alloc.c functions)+=
void *newarray(m, n, a) unsigned long m, n; unsigned a; {
	return allocate(m*n, a);
}
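The product m*n above is computed without an overflow check. A client that cannot bound m and n might guard the call along the following lines; checked_product is a hypothetical helper, not part of lcc, and is offered only as a sketch.

```c
#include <limits.h>

/* Stores m*n in *result and returns 1 if the product fits in an
   unsigned long; returns 0 and leaves *result untouched otherwise. */
static int checked_product(unsigned long m, unsigned long n,
                           unsigned long *result) {
	if (n != 0 && m > ULONG_MAX / n)
		return 0;
	*result = m * n;
	return 1;
}
```

The test divides instead of multiplying, so the check itself cannot wrap around.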

2.4 Deallocating Space

An arena is deallocated by adding its blocks to the free-blocks list and
reinitializing it to point to the appropriate one-element list that holds a
zero-length block. The blocks are already linked together via their next
fields, so the entire list of blocks can be added to freeblocks with simple
pointer manipulations:
(alloc.c functions)+=
void deallocate(a) unsigned a; {
	arena[a]->next = freeblocks;
	freeblocks = first[a].next;
	first[a].next = NULL;
	arena[a] = &first[a];
}
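The pieces of this chapter fit together as in the following sketch, which restates the allocator with ANSI prototypes so that it compiles on its own. Several details are assumptions made for the sketch: the counter nmalloc is added to make block reuse observable, the rounding is done with division instead of lcc's roundup macro, the loop test is rewritten to avoid arithmetic on the null avail pointer of a zero-length block, and failure simply exits.

```c
#include <stdlib.h>

struct block {
	struct block *next;
	char *limit;
	char *avail;
};
union align { long l; char *p; double d; int (*f)(void); };
union header { struct block b; union align a; };

static struct block first[3];   /* zero-length list heads */
static struct block *arena[3] = { &first[0], &first[1], &first[2] };
static struct block *freeblocks;
static int nmalloc;             /* counts calls to malloc (sketch only) */

static void *allocate(unsigned long n, unsigned a) {
	struct block *ap = arena[a];
	unsigned long sz = sizeof (union align);

	n = (n + sz - 1) / sz * sz; /* round up for alignment */
	while (ap->avail == NULL
	|| n > (unsigned long)(ap->limit - ap->avail)) {
		if ((ap->next = freeblocks) != NULL) {
			/* reuse a block released by deallocate */
			freeblocks = freeblocks->next;
			ap = ap->next;
		} else {
			/* otherwise get a fresh block with 10K to spare */
			unsigned long m = sizeof (union header) + n + 10*1024;
			ap->next = malloc(m);
			ap = ap->next;
			if (ap == NULL)
				exit(1);
			ap->limit = (char *)ap + m;
			nmalloc++;
		}
		ap->avail = (char *)((union header *)ap + 1);
		ap->next = NULL;
		arena[a] = ap;
	}
	ap->avail += n;
	return ap->avail - n;
}

static void deallocate(unsigned a) {
	arena[a]->next = freeblocks;
	freeblocks = first[a].next;
	first[a].next = NULL;
	arena[a] = &first[a];
}
```

After deallocate(1), the next allocation in any arena takes its block from freeblocks, so malloc is not called again until the free list is empty or holds only blocks that are too small.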

2.5 Strings
Strings are created for identifiers, constants, registers, and so on. Strings
are compared often; for example, when a symbol table is searched for
an identifier.
The most common uses of strings are provided by the functions exported
by string.c:
(string.c exported functions)=
extern char *string ARGS((char *str));
extern char *stringn ARGS((char *str, int len));
extern char *stringd ARGS((int n));
Each of these functions returns a pointer to a permanently allocated
string. string makes a copy of the null-terminated string str, stringn
makes a copy of the len bytes in str, and stringd converts n to its
decimal representation and returns that string.
These functions save exactly one copy of each distinct string, so two
strings returned by these functions can be compared for equality by com-
paring their addresses. These semantics simplify comparisons and save
space, and stringn can handle strings with embedded null characters.
The function string calls stringn and provides an example of its use:
(string.c functions)=
char *string(str) char *str; {
	char *s;

	for (s = str; *s; s++)
		;
	return stringn(str, s - str);
}
stringd converts its argument n into a string in a private buffer and
calls stringn to return the appropriate distinct string.
(string.c functions)+=
char *stringd(n) int n; {
	char str[25], *s = str + sizeof (str);
	unsigned m;

	if (n == INT_MIN)
		m = (unsigned)INT_MAX + 1;
	else if (n < 0)
		m = -n;
	else
		m = n;
	do
		*--s = m%10 + '0';
	while ((m /= 10) != 0);
	if (n < 0)
		*--s = '-';
	return stringn(s, str + sizeof (str) - s);
}

The code uses unsigned arithmetic because ANSI C permits different ma-
chines to treat signed modulus on negative values differently. The code
starts by assigning the absolute value of n to m; no two's complement
signed integer can represent the absolute value of the most negative
number, so this value is special-cased. The string is built backward,
last digit first. Exercise 2.10 explores why the local array str has 25
elements. INT_MIN is defined in the standard header limits.h.
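The conversion can be tried without the string table. In this sketch, to_decimal is a hypothetical stand-in for stringd that copies the digits into a caller-supplied buffer (assumed to hold at least 25 bytes) instead of calling stringn.

```c
#include <limits.h>
#include <string.h>

/* Writes the decimal form of n into buf (at least 25 bytes) and
   returns buf. A hypothetical stand-in for stringd that skips the
   string table. */
static char *to_decimal(int n, char *buf) {
	char tmp[25], *s = tmp + sizeof tmp;
	unsigned m;

	if (n == INT_MIN)
		m = (unsigned)INT_MAX + 1; /* |INT_MIN| does not fit in an int */
	else if (n < 0)
		m = -n;
	else
		m = n;
	*--s = '\0';
	do
		*--s = m%10 + '0';         /* digits are produced last to first */
	while ((m /= 10) != 0);
	if (n < 0)
		*--s = '-';
	return strcpy(buf, s);
}
```

The do-while form guarantees at least one digit, so zero is rendered as "0" and not as an empty string.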
stringn maintains the set of distinct strings by saving them in a string
table. It saves exactly one copy of each distinct string, and it never re-
moves any string from the table. The string table is an array of 1,024
hash buckets:
(string.c data)=
static struct string {
char *str;
int len;
struct string *link;
} *buckets[1024];
Each bucket heads a list of strings that share a hash value. Each entry
includes the length of the string (in the len field) because strings can
include null bytes.
stringn adds a string to the table unless it's already there, and returns
the address of the string.
(string.c functions)+=
char *stringn(str, len) char *str; int len; {
	int i;
	unsigned int h;
	char *end;
	struct string *p;

	(h - hash code for str, end - 1 past end of str 31)
	for (p = buckets[h]; p; p = p->link)
		if (len == p->len) {
			char *s1 = str, *s2 = p->str;
			do {
				if (s1 == end)
					return p->str;
			} while (*s1++ == *s2++);
		}
	(install new string str 31)
}
h identifies the hash chain for str. stringn loops down this chain and
compares str with strings of equal length. end points to the character
one past the end of str.
An ideal hash function would distribute strings uniformly from 0 to
NELEMS(buckets)-1, which would give hash chains of equal length. The
code
(h - hash code for str, end - 1 past end of str 31)=
for (h = 0, i = len, end = str; i > 0; i--)
	h = (h<<1) + scatter[*(unsigned char *)end++];
h &= NELEMS(buckets)-1;
is a good approximation. scatter is a static array of 256 random num-
bers, which helps distribute the hash values. Using the character pointed
to by end as an index runs the risk that the character will be sign-
extended and become a negative integer; casting end to a pointer to an
unsigned character avoids this possibility. This fragment also leaves end
pointing just past str's last character, which is used when comparing
and copying the string as shown above.
Conventional wisdom recommends that hash table sizes should be
primes. Using a power of two makes lcc faster, because masking is
faster than modulus.
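The equivalence that makes masking legal can be stated as a sketch: for an unsigned value and a table size that is a power of two, the mask and the modulus select the same bucket. TABLE_SIZE and the two function names are hypothetical.

```c
/* Number of buckets; must be a power of two for the mask to work. */
#define TABLE_SIZE 1024

/* Reduces h to a bucket index with a mask, as the hash code in the
   text does. */
static unsigned bucket_by_mask(unsigned h) {
	return h & (TABLE_SIZE - 1);
}

/* The same reduction with the modulus operator, for comparison. */
static unsigned bucket_by_mod(unsigned h) {
	return h % TABLE_SIZE;
}
```

Many current compilers reduce an unsigned modulus by a constant power of two to the same mask, so the speed difference matters mainly when the table size is prime or not known at compile time.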
stringn stores new strings in chunks of permanently allocated memory
of at least 4K bytes. PERM identifies the permanent storage arena.
(install new string str 31)=
{
	static char *next, *strlimit;
	if (next + len + 1 >= strlimit) {
		int n = len + 4*1024;
		next = allocate(n, PERM);
		strlimit = next + n;
	}
	NEW(p, PERM);
	p->len = len;
	for (p->str = next; str < end; )
		*next++ = *str++;
	*next++ = 0;
	p->link = buckets[h];
	buckets[h] = p;
	return p->str;
}

The static variable next points to the next free byte in the current chunk,
and strlimit points one past the end of the chunk. The code allocates
a new chunk, if necessary, and a new table entry. It copies str, which
incidentally allocates space for it as it is copied by incrementing next,
and links the new entry into the appropriate hash chain.
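The string-table idea can be sketched independently of lcc's arenas. The version below is an illustration under stated assumptions: intern is a hypothetical name, the hash is a simple multiplicative one instead of the scatter table, each string gets its own malloc call instead of space from a shared chunk, and comparison uses memcmp instead of inline code.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024 /* a power of two, as in the text */

struct entry {
	char *str;
	int len;
	struct entry *link;
};
static struct entry *buckets[NBUCKETS];

/* Returns the unique saved copy of the len bytes at str. */
static char *intern(const char *str, int len) {
	unsigned h = 0;
	int i;
	struct entry *p;

	for (i = 0; i < len; i++)
		h = h*31 + (unsigned char)str[i];
	h &= NBUCKETS - 1;
	for (p = buckets[h]; p; p = p->link)
		if (p->len == len && memcmp(p->str, str, len) == 0)
			return p->str;     /* already in the table */
	p = malloc(sizeof *p);
	if (p == NULL || (p->str = malloc(len + 1)) == NULL)
		exit(1);
	memcpy(p->str, str, len);
	p->str[len] = '\0';
	p->len = len;
	p->link = buckets[h];      /* link at the head of the chain */
	buckets[h] = p;
	return p->str;
}
```

Two calls with equal contents return the same address, so clients can compare interned strings with == instead of strcmp. Storing the length lets strings with embedded null bytes remain distinct.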

Further Reading
Storage management is a busy area of research; Section 2.5 in Knuth
(1973a) is the definitive reference. There is a long list of techniques that
are designed both for general-purpose use and for specific application
areas, including the design described in this chapter (Hanson 1990). A
competitive alternative is "quick fit" (Weinstock and Wulf 1988). Quick-fit
allocators maintain N free lists for the N block sizes requested most fre-
quently. Usually, these sizes are small and contiguous; e.g., 8-128 bytes
in multiples of eight bytes. Allocation is easy and fast: Take the first
block from the appropriate free list. A block is deallocated by adding it
to the head of its list. Requests for sizes other than one of the N favored
sizes are handled with other algorithms, such as first fit (Knuth 1973a).
One of the advantages of lcc's arena-based algorithm is that allocations
don't have to be paired with individual deallocations; a single deallocation
frees the memory acquired by many allocations, which simplifies
programming. Garbage collection takes this advantage one step further.
A garbage collector periodically finds all of the storage that is in use and
frees the rest. It does so by following all of the accessible pointers in
the program. Appel (1991) and Wilson (1994) survey garbage-collection
algorithms. Garbage collectors usually need help from the programming
language, its compiler, and its run-time system in order to locate the ac-
cessible memory, but there are algorithms that can cope without such
help. Boehm and Weiser (1988) describe one such algorithm for C. It
takes a conservative approach: Anything that looks like a pointer is taken
to be one. As a result, the collector identifies some inaccessible memory
as accessible and thus busy, but that's better than making the opposite
decision.
Storing all strings in a string table and using hashing to keep only one
copy of any string is a scheme that's been used for years in compilers
and related programming-language implementations, but it's rarely doc-
umented. It's used in SNOBOL4 (Griswold 1972), for example, to make
comparison fast and to make it easy to use strings as keys in associa-
tive tables. Related techniques store strings in a separate string space,
but don't bother to avoid storing multiple copies of the same string
to simplify some string operations, such as substring and concatena-
tion (Hansen 1992; Hanson 1974; McKeeman, Horning, and Wortman
1970).
Knuth (1973b) is the definitive exposé on hashing. Section 7.6 of Aho,
Sethi, and Ullman (1986) describes hash functions and their use in
compilers.

Exercises
2.1 Revise allocate and deallocate to use the C library functions
malloc and free.
2.2 The only objective way to make decisions between competitive
algorithms and designs in lcc is to implement them and measure their
performance. lcc compiling itself is a reasonable benchmark.
Measure the performance of the arena-based algorithm against malloc
and free as implemented in the previous exercise.
2.3 Redefine NEW so that it does most allocations inline, i.e., so that
it calls allocate only when there isn't enough space in the arena.
Measure the benefit. You'll need to export the arena data structures
to implement inline allocation.
2.4 When allocate creates a new block, there's a good chance that this
block is adjacent to the previous one for the arena and that they
can be merged into one larger block. Implement this and measure
the improvement.
2.5 When allocate takes a block from freeblocks, it's possible that
the block is too small. Instrument the allocator and find out how
often this situation occurs. Is it worth fixing?
2.6 Show that deallocate works correctly when the arena list holds
only the zero-length block.
2.7 deallocate never frees blocks, for example, by calling free. For
some inputs, lcc's arenas will balloon temporarily, but the blocks
allocated will never be reused. Revise deallocate to free blocks
instead of adding them onto freeblocks. Does this change make
lcc run faster?
2.8 Implement a conservative garbage collector for lcc or modify lcc to
use an existing one. The collector described by Boehm and Weiser
(1988) is publicly available. Most such allocators initiate a collection
or a partial collection at every allocation, so you can simply gut
deallocate, or make it a null macro and revise allocate to call
the appropriate allocation function.
2.9 Strings installed in the string table by stringn are never discarded.
Is this feature a problem? Instrument stringn to measure the
distribution of the size of the string table. Suppose it gets too big;
how would you revise the string interface to permit strings to be
deleted?
2.10 stringd formats its argument into str, which is an array of 25
characters. Explain why 25 is large enough for all modern computers
on which lcc runs or for which it generates code.
2.11 Many of the integers passed to stringd are small; say, in the range
-100 to 100. Strings for these integers could be preallocated at
compile time, and stringd and stringn could return pointers to
them and thereby avoid allocations. Implement this optimization.
Does it make lcc run faster?
2.12 stringn allocates memory in big chunks to hold the characters in
a string instead of calling allocate for each string. Revise stringn
so that it calls allocate for each string and measure the differences
in both time and space. Explain any differences you find.
2.13 The size of stringn's hash table is a power of two, which is often
deprecated. Try a prime and measure the results. Try to design a
better hash function and measure the results.
2.14 stringn compares strings with inline code instead of, for example,
calling memcmp. Replace the inline code with a call to memcmp and
measure the result. Why was our decision to inline justified?
2.15 lcc makes heavy use of circularly linked lists of pointers, and the
implementation of the module list.c exemplifies the use of the
allocation macros. list.c exports a list element type and three
list-manipulation functions:
(list.c typedefs)=
typedef struct list *List;

(list.c exported types)=
struct list {
	void *x;
	List link;
};

(list.c exported functions)=
extern List append ARGS((void *x, List list));
extern int length ARGS((List list));
extern void *ltov ARGS((List *list, unsigned a));
A List holds zero or more elements stored in the x fields of the
list structures. A List points to the last struct list in a list,
and a null List is the empty list by definition. append adds a node
containing x onto the end of list and returns list. length returns
the number of elements in list. ltov copies the n elements in list
into a null-terminated array of pointers in the arena indicated by a,
deallocates the list structures, and returns the array. The array has
n + 1 elements including the terminating null element. Implement
the list module.
3
Symbol Management

The symbol tables are the central repository for all information within
the compiler. All parts of the compiler communicate through these ta-
bles and access the data - symbols - in them. For example, the lexical
analyzer adds identifiers to the identifier table, and the parser adds type
information to these identifiers. The code generators add target-specific
data to symbol-table entries; for example, register assignments for locals
and parameters. Symbol tables are also used to hold labels, constants,
and types.
Symbol tables map names into sets of symbols. Constants, identifiers,
and label numbers are examples of names. Different names have differ-
ent attributes. For example, the attributes for the identifier that names
a local variable might include the variable's type, its location in a stack
frame for the procedure in which it is declared, and its storage class.
Identifiers that name members of a structure have a very different set
of attributes, including the members' types, the structures in which they
appear, and their locations within those structures.
Symbols are collected into symbol tables. The symbol-table module
manages symbols and symbol tables.
Symbol management must deal not only with the symbols themselves,
but must also handle the scope or visibility rules imposed by the ANSI C
standard. The scope of an identifier is that portion of the program text in
which the identifier is visible; that is, where it may be used in expressions,
and so forth. In C, scopes nest. An identifier is visible at the point of its
declaration until the end of the compound statement or parameter list
in which it is declared. An identifier declared outside of any compound
statement or parameter list has file scope; it is visible from the point of
its declaration to the end of the source file in which it appears.
A declaration for an identifier X hides a visible identifier X declared
at an outer level. The following program illustrates this effect; the line
numbers are for explanatory purposes and are not part of the program.

36 CHAPTER 3 • SYMBOL MANAGEMENT

 1  int x, y;
 2  f(int x, int a) {
 3      int b;
 4      y = x + a*b;
 5      if (y < 5) {
 6          int a;
 7          y = x + a*b;
 8      }
 9      y = x + a*b;
10  }
Line 1 declares the globals x and y, whose scopes begin at line 1 and
extend through line 10. But the declaration of the parameter x in line 2
interrupts the scope of the global x. The scopes of the parameters x and a
begin at line 2 and extend through line 9. The scope of a is interrupted by
the declaration of the local a in line 6. Each identifier in the expression on
line 4 is bound to a specific declaration, and these bindings are specified
by C's scope rules. Using x:n to denote the identifier x declared at line n,
y is bound to y:1, x to x:2, a to a:2, and b to b:3. The bindings for the
expression in line 7 are the same, except that a is bound to a:6.
Declarations like those for x in line 2 and a in line 6 create a hole in the
scopes of similarly named identifiers declared in outer scopes. For ex-
ample, the scope of a:6 is lines 6-8, which is the hole in the scope of a:2,
whose scope is lines 2-5 and 9-10. The symbol-management functions
must accommodate this and similar situations.
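The hole in a scope can be observed at run time with a small sketch. The function shadow_demo and its encoding of the result are hypothetical; the program mirrors the shape of the example above but initializes every variable before use.

```c
static int x = 1; /* file scope */

/* Returns 100*outer + 10*inner + x, which shows the declaration
   that each use of a is bound to. */
static int shadow_demo(int a) {
	int outer, inner;

	outer = a;         /* the parameter a */
	{
		int a = 7;     /* hides the parameter within this block */
		inner = a;
	}
	return 100*outer + 10*inner + x; /* x is the file-scope x */
}
```

Inside the inner block, a names the local; once the block ends, a names the parameter again, which is the hole described above.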
In most languages, like Pascal, there is one name space for identifiers.
That is, there is a single set of identifiers for all purposes and, at any
point in the program, there can be only one visible identifier of a given
name.
The name spaces in ANSI C categorize identifiers according to use:
Statement labels, tags, members, and ordinary identifiers. Tags identify
structures, unions, and enumerations. There are three separate name
spaces for labels, tags, and identifiers, and, for each structure or union,
there is a separate name space for its members.
For each name space, there can be only one visible identifier of a given
name at any point in the program. There can, however, be more than one
visible identifier at any point in the program if each such identifier is in
a different name space. The following artificial and confusing program
illustrates this effect.
1 struct list { int x; struct list *list; } *list;
2 walk(struct list *list) {
3 list:
4     printf("%d\n", list->x);
5     if ((list = list->list) != NULL)
6         goto list;
7 }
8 main() { walk(list); }
Line 1 declares three identifiers named list, all of which are visible after
the declaration. list is a structure tag, a field name, and a variable.
The tag and the variable have file scope; technically, so does the field
name, but it can be used only with the field reference operators . and
->. Line 2 declares a parameter list whose scope is lines 2-7. Line 3
declares the label list, which has function scope; it is visible anywhere
in the function walk. Each use of list in lines 4-8 determines which
name space is used; line 4 uses ordinary identifiers, line 5 uses ordinary
identifiers for the first two occurrences of list and members of struct
list for the rightmost occurrence of list, line 6 consults the label name
space, and line 8 again uses the ordinary identifiers.
Roughly speaking, there is a separate symbol table for each name
space, and symbol tables themselves handle scope. lcc also uses separate
symbol tables for unscoped collections, like constants.

3.1 Representing Symbols

The memory-allocation and string modules could be used outside of lcc,
but the symbol-table module is specific to lcc. It manages lcc-specific
symbols and symbol tables, and it implements the scope rules and name
spaces specified by ANSI C.
There is little about symbols themselves that is relevant to the symbol-
table module, which needs only those attributes, like names, that relate
to scope. It's simplest, however, to collect the name and all of the other
attributes into a single symbol structure:
(sym.c typedefs)=
typedef struct symbol *Symbol;

(sym.c exported types)=
struct symbol {
	char *name;
	int scope;
	Coordinate src;
	Symbol up;
	List uses;
	int sclass;
	(symbol flags 50)
	Type type;
	float ref;
	union {
		(labels 46)
		(struct types 65)
		(enum constants 69)
		(enum types 68)
		(constants 47)
		(function symbols 290)
		(globals 265)
		(temporaries 346)
	} u;
	Xsymbol x;
	(debugger extension)
};
The fields above the union u apply to all kinds of symbols in all tables.
Most of the symbol-table functions read and write only the name, scope,
src, up, and uses fields. Those specific to constants and labels also rely
on some of the fields in the union u and some of the (symbol flags) as
detailed below. The remaining fields implement attributes that are
associated with specific kinds of symbols, and are initialized and modified
by clients of the symbol-table module.
The name field is usually the symbol-table key. For identifiers and
keywords that name types, it holds the name used in the source code.
For generated identifiers, such as structures without tags, name is a digit
string.
The scope field classifies each symbol as a constant, label, global, pa-
rameter, or local:
(sym.c exported types)+=
enum { CONSTANTS=1, LABELS, GLOBAL, PARAM, LOCAL };
A local declared at nesting level k has a scope equal to LOCAL+k.
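The LOCAL+k encoding can be sketched with two small helpers. Both local_scope and nesting_level are hypothetical names introduced for illustration; they are not part of the symbol-table module.

```c
enum { CONSTANTS=1, LABELS, GLOBAL, PARAM, LOCAL };

/* Returns the scope value for a local declared at nesting level k,
   following the LOCAL+k rule in the text. */
static int local_scope(int k) {
	return LOCAL + k;
}

/* Returns the nesting level encoded in a scope value, or -1 if the
   scope is not a local one. */
static int nesting_level(int scope) {
	return scope >= LOCAL ? scope - LOCAL : -1;
}
```

Because the enumeration values increase from outer to inner, comparing two scope values with < or > tells which declaration is more deeply nested.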
The src field is the point in the source code that defines the symbol,
as in a variable declaration. Its Coordinate value pinpoints the symbol's
definition:
(sym.c typedefs)+=
typedef struct coord {
	char *file;
	unsigned x, y;
} Coordinate;
The file field is the name of the file that contains the definition, and
y and x give the line number and character position within that line at
which the definition occurs.
The up field chains together all symbols in a symbol table, starting
with the last one installed. Traversing up this chain reveals all of the
symbols that are in scope at the time of the traversal, as well as those
hidden by declarations of the same identifiers in nested scopes. This
facility may help back ends emit debugger symbol-table information, for
example.
lcc has an option that causes it to keep track of every use of every
symbol. When this option is specified, the uses field holds a list of
Coordinates that identify these uses. If the option is not specified, uses
is null. See Exercise 3.4.
The sclass field is the symbol's extended storage class, which may
be AUTO, REGISTER, STATIC, or EXTERN. sclass is TYPEDEF for typedefs
and ENUM for enumeration constants, and it's unused and thus zero for
constants and labels.
The type field holds the Type for variables, functions, constants, and
structure, union, and enumeration types.
For variables and labels, the ref field is an approximation of the num-
ber of times that variable is referenced. Section 10.3 explains how this
approximation is computed.
The u field is a union that supplies additional data for labels,
structure and union types, enumeration identifiers, enumeration types,
constants, functions, global and static variables, and temporary variables.
The (symbol flags) are one-bit attribute flags for each symbol. The x field
and the (debugger extension) collect fields that are manipulated only by
back ends, such as the register assigned to a variable or the information
necessary to generate data for a debugger.
The typedefs for Symbol and Coordinate illustrate a convention used
throughout lcc: Capitalized type names refer to structures with the
simpler lowercase tag, or to pointers to such structures. Thus, the typedef
Coordinate names the type struct coord and Symbol names the type
struct symbol *.

3.2 Representing Symbol Tables

Symbol tables are manipulated only by the symbol-table module. It
exports an opaque type for tables and the tables themselves:
(sym.c typedefs)+=
typedef struct table *Table;

(sym.c exported data)=
extern Table constants;
extern Table externals;
extern Table globals;
extern Table identifiers;
extern Table labels;
extern Table types;
Subsets of these tables implement three of the name spaces in ANSI C.
identifiers holds the ordinary identifiers. externals holds the subset
of identifiers that have been declared extern; it is used to warn about
conflicting declarations of external identifiers. globals is the part of the
identifiers table that holds identifiers with file scope.
Compiler-defined internal labels are stored in labels, and type tags
are stored in types.
The tables themselves are lists of hash tables, one for each scope:
(sym.c types)=
struct table {
int level;
Table previous;
struct entry {
struct symbol sym;
struct entry *link;
} *buckets[256];
Symbol all;
};
#define HASHSIZE NELEMS(((Table)0)->buckets)
A Table value, like identifiers, points to a table structure that holds
a hash table for the symbols at one scope, specifically the scope given by
the value of the level field. The buckets field is an array of pointers to
the hash chains. The previous field points to the table for the enclosing
scope.
Entries in the hash chains hold a symbol structure and a pointer to the
next entry on the chain. Looking up a symbol involves hashing the key
to pick a chain and walking down the chain to the appropriate symbol.
If the symbol isn't found, following the previous field exposes entries in
the next enclosing scope.
In each table structure, all heads a list of all symbols in this and
enclosing scopes. This list is threaded through the up fields of the symbols.
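The lookup rule just described can be sketched with a pared-down table. Everything here is an assumption made for illustration: the structures carry only a name and a scope, enter_scope, install, and lookup are hypothetical names, storage comes from calloc instead of an arena, and names are compared with strcmp where a compiler that interns its strings could compare addresses.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 256

struct sym { const char *name; int scope; };
struct entry { struct sym sym; struct entry *link; };
struct table {
	int level;
	struct table *previous;
	struct entry *buckets[HASHSIZE];
};

static unsigned hash(const char *s) {
	unsigned h = 0;
	while (*s)
		h = h*31 + (unsigned char)*s++;
	return h & (HASHSIZE - 1);
}

/* Opens a table for scope level inside the table tp. */
static struct table *enter_scope(struct table *tp, int level) {
	struct table *t = calloc(1, sizeof *t);
	if (t == NULL)
		exit(1);
	t->previous = tp;
	t->level = level;
	return t;
}

/* Adds name to table tp at tp's own scope level. */
static struct sym *install(struct table *tp, const char *name) {
	unsigned h = hash(name);
	struct entry *e = calloc(1, sizeof *e);
	if (e == NULL)
		exit(1);
	e->sym.name = name;
	e->sym.scope = tp->level;
	e->link = tp->buckets[h];
	tp->buckets[h] = e;
	return &e->sym;
}

/* Searches tp and then each enclosing table; returns NULL on failure. */
static struct sym *lookup(struct table *tp, const char *name) {
	unsigned h = hash(name);
	for ( ; tp; tp = tp->previous) {
		struct entry *e;
		for (e = tp->buckets[h]; e; e = e->link)
			if (strcmp(e->sym.name, name) == 0)
				return &e->sym;
	}
	return NULL;
}
```

Because the innermost table is searched first, a declaration in a nested scope hides one of the same name in an enclosing scope without any extra bookkeeping.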
The symbol-table module initializes all but one of the Tables it ex-
ports:
(sym.c data)=
static struct table
	cns = { CONSTANTS },
	ext = { GLOBAL },
	ids = { GLOBAL },
	tys = { GLOBAL };
Table constants   = &cns;
Table externals   = &ext;
Table identifiers = &ids;
Table globals     = &ids;
Table types       = &tys;
Table labels;
globals always points to the identifier table at scope GLOBAL, while
identifiers points to the table at the current scope. types is described
in Chapter 4. funcdefn creates a new labels table for each function.
Tables for nested scopes are created dynamically and linked into the
appropriate enclosing table:
(sym.c functions)=
Table table(tp, level) Table tp; int level; {
    Table new;

    NEW0(new, FUNC);
    new->previous = tp;
    new->level = level;
    if (tp)
        new->all = tp->all;
    return new;
}

All dynamically allocated tables are discarded after compiling each func-
tion, so they are allocated in the FUNC arena.
Figure 3.1 shows the four tables that emanate from identifiers when
lcc is compiling line 7 of the example at the top of page 36. The figure's
entry structures show only the name and up fields of their symbols and
their link fields. The solid lines show the previous fields, which connect
tables; the elements of buckets and the link fields, which connect en-
tries; and the name fields. The dashed lines emanate from the all fields
in tables and from the up fields of symbols.
The all field is initialized to the enclosing table's list so that it is
possible to visit all symbols in all scopes by following the symbols be-
ginning at a table's all. This capability is used by foreach to scan a
table and apply a given function to all symbols at a given scope.
(sym.c functions)+=
void foreach(tp, lev, apply, cl) Table tp; int lev;
void (*apply) ARGS((Symbol, void *)); void *cl; {
    while (tp && tp->level > lev)
        tp = tp->previous;
    if (tp && tp->level == lev) {
        Symbol p;
        Coordinate sav;
        sav = src;
        for (p = tp->all; p && p->scope == lev; p = p->up) {
            src = p->src;
            (*apply)(p, cl);
        }
        src = sav;
    }
}

The while loop finds the table with the proper scope. If one is found,
foreach sets the global variable src to each symbol's definition coordi-
nate and calls apply with the symbol. cl is a pointer to call-specific data
- a closure - supplied by callers of foreach, and this closure is passed
along to apply so that it can access those data, if necessary. src is set so
that diagnostics that might be issued by apply will refer to a meaningful
source coordinate.
The for loop traverses the table's all and stops when it encounters
the end of the list or a symbol at a lower scope. all is not strictly nec-
essary because foreach could traverse the hash chains, but presenting
the symbols to apply in an order independent of hash addresses makes
the order of the emitted code machine-independent.
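
The traversal just described can be tried in isolation. The following is
a minimal sketch, not lcc's code: struct sym, struct tab, push_symbol,
visit_scope, and count_symbol are hypothetical, pared-down stand-ins that
keep only the fields the loop needs, and the hash chains and source
coordinates are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for lcc's symbol and table structures. */
struct sym {
    const char *name;
    int scope;          /* scope level at which the symbol was declared */
    struct sym *up;     /* previously installed symbol, possibly in an outer scope */
};

struct tab {
    int level;
    struct tab *previous;
    struct sym *all;    /* most recently installed symbol */
};

/* Add s to the front of tp's all list, as install does. */
void push_symbol(struct tab *tp, struct sym *s) {
    assert(tp != NULL && s != NULL);
    s->scope = tp->level;
    s->up = tp->all;
    tp->all = s;
}

/* Call apply on every symbol declared at scope lev; return the number visited. */
int visit_scope(struct tab *tp, int lev,
                void (*apply)(struct sym *, void *), void *cl) {
    int n = 0;
    struct sym *p;

    while (tp && tp->level > lev)       /* find the table for scope lev */
        tp = tp->previous;
    if (tp && tp->level == lev)
        for (p = tp->all; p && p->scope == lev; p = p->up) {
            (*apply)(p, cl);
            n++;
        }
    return n;
}

/* Example callback: the closure is a pointer to an int counter. */
void count_symbol(struct sym *p, void *cl) {
    (void)p;
    ++*(int *)cl;
}
```

Symbols are visited most recent first, in declaration order reversed, so
the order does not depend on which buckets the names happen to hash to.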

3.3 Changing Scope

The value of the global variable level and the corresponding tables rep-
resent a scope.
(sym.c exported data)+=
extern int level;

(sym.c data)+=
int level = GLOBAL;
There are more scopes than source-code compound statements because
scopes are used to partition symbols for other purposes. For example,
there are separate scopes for constants and for parameters.
level is incremented upon entering a new scope.
(sym.c functions)+=
void enterscope() {
    ++level;
}

At scope exit, level is decremented, and the corresponding identifiers
and types tables are removed.
(sym.c functions)+=
void exitscope() {
    rmtypes(level);
    if (types->level == level)
        types = types->previous;
    if (identifiers->level == level) {
        (warn if more than 127 identifiers)
        identifiers = identifiers->previous;
    }
    --level;
}

FIGURE 3.1 Symbol tables when compiling line 7 of the example on page 36.

Tables at the current scope are created only if necessary. Few scopes in C
declare new symbols, so lazy table allocation saves time, but exitscope
must check levels to see if there is a table to remove. rmtypes removes
from the type cache types with tags defined in the vanishing scope; see
Section 4.2.
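
Lazy table allocation can be illustrated with a small model. This is a
sketch under stated assumptions rather than lcc's implementation: the
names scope_table, demo_enterscope, demo_declare, and demo_exitscope are
hypothetical, DEMO_GLOBAL is an assumed value for the outermost level,
and malloc stands in for arena allocation.

```c
#include <assert.h>
#include <stdlib.h>

enum { DEMO_GLOBAL = 3 };   /* assumed numeric value of the outermost scope */

struct scope_table {
    int level;
    struct scope_table *previous;
    int nsymbols;
};

static struct scope_table outermost = { DEMO_GLOBAL, NULL, 0 };
static struct scope_table *current = &outermost;
static int demo_level = DEMO_GLOBAL;
static int tables_created = 0;

/* Entering a scope only bumps the counter; no table is built yet. */
void demo_enterscope(void) { ++demo_level; }

/* Declare one symbol at the current level, creating its table on first use. */
void demo_declare(void) {
    if (current->level < demo_level) {
        struct scope_table *t = malloc(sizeof *t);
        assert(t != NULL);
        t->level = demo_level;
        t->previous = current;
        t->nsymbols = 0;
        current = t;
        tables_created++;
    }
    current->nsymbols++;
}

/* Leaving a scope removes a table only if one was created for that level. */
void demo_exitscope(void) {
    assert(demo_level > DEMO_GLOBAL);
    if (current->level == demo_level) {
        struct scope_table *t = current;
        current = t->previous;
        free(t);
    }
    --demo_level;
}

int demo_tables_created(void) { return tables_created; }
int demo_current_level(void) { return current->level; }
```

A scope that declares nothing costs one increment and one decrement; only
the first declaration in a scope pays for a table.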

3.4 Finding and Installing Identifiers

install allocates a symbol for name and adds it to a given table at a spe-
cific scope level, allocating a new table if necessary. It returns a symbol
pointer.
(sym.c functions)+=
Symbol install(name, tpp, level, arena)
char *name; Table *tpp; int level, arena; {
    Table tp = *tpp;
    struct entry *p;
    unsigned h = (unsigned)name&(HASHSIZE-1);

    if (level > 0 && tp->level < level)
        tp = *tpp = table(tp, level);
    NEW0(p, arena);
    p->sym.name = name;
    p->sym.scope = level;
    p->sym.up = tp->all;
    tp->all = &p->sym;
    p->link = tp->buckets[h];
    tp->buckets[h] = p;
    return &p->sym;
}

name is a saved string, so its address can be its hash value.
tpp points to a table pointer. If *tpp is a table with scopes, like
identifiers, and there is not yet a table corresponding to the scope
indicated by the argument level, install allocates a table for the scope
indicated by level and updates *tpp. It then allocates and zeroes a
symbol-table entry, initializes some fields of the symbol itself, and adds
the entry to the hash chain. level must be zero or at least as large as the
table's scope level; a zero value for level indicates that name should be
installed in *tpp. install accepts an argument that specifies the appro-
priate arena because function prototypes and thus the symbols in them
are retained forever, even if they're declared in a nested scope.
lookup searches a table for a name; it handles lookups where the
search key is the name field of a symbol. It returns a symbol pointer
if it succeeds and the null pointer otherwise.
(sym.c functions)+=
Symbol lookup(name, tp) char *name; Table tp; {
    struct entry *p;
    unsigned h = (unsigned)name&(HASHSIZE-1);

    do
        for (p = tp->buckets[h]; p; p = p->link)
            if (name == p->sym.name)
                return &p->sym;
    while ((tp = tp->previous) != NULL);
    return NULL;
}

The inner loop scans a hash chain, and the outer loop scans enclosing
scopes. Comparing two strings is trivial because the string module guar-
antees that two strings are identical if and only if they are the same
string.
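
The property that lookup relies on can be sketched as follows. This is an
illustration only: intern and bucket_of are hypothetical names, the pool
is a simple linked list rather than the string module's own data
structure, and the cast goes through uintptr_t, an assumption made so that
the sketch compiles cleanly where pointers are wider than unsigned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256    /* must be a power of two for the mask below */

struct saved { char *str; struct saved *link; };
static struct saved *pool;   /* every saved string, most recent first */

/* Return the unique saved copy of s, creating it on first sight. */
const char *intern(const char *s) {
    struct saved *p;

    for (p = pool; p; p = p->link)
        if (strcmp(p->str, s) == 0)
            return p->str;
    p = malloc(sizeof *p);
    assert(p != NULL);
    p->str = malloc(strlen(s) + 1);
    assert(p->str != NULL);
    strcpy(p->str, s);
    p->link = pool;
    pool = p;
    return p->str;
}

/* Hash a saved string by its address, masking to a bucket index. */
unsigned bucket_of(const char *name) {
    return (unsigned)((uintptr_t)name & (NBUCKETS - 1));
}
```

Once every name has passed through intern, equal names have equal
addresses, so a chain search can compare pointers with == and never call
strcmp.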

3.5 Labels
The symbol-table module also exports functions to manage labels and
constants. These are similar to lookup and install, but there is no
scope management for these tables, and looking up a label or constant
installs it if necessary and thus always succeeds. Also, the search key is
a field in the union u that is specific to labels or constants.
Compiler-generated labels and the internal counterparts of source-
language labels are named by integers. genlabel generates a run of these
integers by incrementing a counter:
(sym.c functions)+=
int genlabel(n) int n; {
    static int label = 1;

    label += n;
    return label - n;
}
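
A short usage sketch shows what "a run of these integers" means for a
caller. genlabel_demo is a hypothetical copy of the function above,
rewritten with an ANSI prototype and an added precondition check so that
it can be compiled on its own.

```c
#include <assert.h>

/* Same logic as genlabel: reserve n consecutive labels, return the first. */
int genlabel_demo(int n) {
    static int label = 1;

    assert(n > 0);
    label += n;
    return label - n;
}
```

Starting from a fresh counter, genlabel_demo(1) yields 1, a following
genlabel_demo(3) yields 2 and reserves 2, 3, and 4, and the next
genlabel_demo(1) yields 5. A caller that needs several related labels can
thus ask for them in one call and derive the rest by adding offsets.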
genlabel is also used whenever a unique, anonymous name is needed,
such as for generated identifiers like temporaries.
A symbol is allocated for each label, and u.l.label holds its label
number:
(labels 46)=
struct {
    int label;
    Symbol equatedto;
} l;

When two or more internal labels are found to label the same location,
the equatedto fields of such label symbols point to one of them.
There is an internal label for each source-language label. These and
other compiler-generated labels are kept in labels. This table is created
once for each function (see Section 11.6) and is managed by findlabel,
which takes a label number and returns the corresponding label sym-
bol, installing and initializing it, and announcing it to the back end, if
necessary.
(sym.c functions)+=
Symbol findlabel(lab) int lab; {
    struct entry *p;
    unsigned h = lab&(HASHSIZE-1);

    for (p = labels->buckets[h]; p; p = p->link)
        if (lab == p->sym.u.l.label)
            return &p->sym;
    NEW0(p, FUNC);
    p->sym.name = stringd(lab);
    p->sym.scope = LABELS;
    p->sym.up = labels->all;
    labels->all = &p->sym;
    p->link = labels->buckets[h];
    labels->buckets[h] = p;
    p->sym.generated = 1;
    p->sym.u.l.label = lab;
    (*IR->defsymbol)(&p->sym);
    return &p->sym;
}

generated is one of the one-bit (symbol flags), and it identifies a gen-
erated symbol. Some back ends use specific formats for the names of
generated symbols to avoid, for example, cluttering linker tables.

3.6 Constants
A reference to a compile-time constant as an operand in an expression is
made by pointing to a symbol for the constant. These symbols reside in
the constants table. Like labels, this table is not scoped; all constants
have a scope field equal to CONSTANTS.
The actual value of a constant is represented by instances of the union
(sym.c typedefs)+=
typedef union value {
    /* signed */ char sc;
    short ss;
    int i;
    unsigned char uc;
    unsigned short us;
    unsigned int u;
    float f;
    double d;
    void *p;
} Value;
The value is stored in the appropriate field according to its type, e.g.,
integers are stored in the i field, unsigned characters are stored in the
uc field, etc.
When a constant is installed in constants, its Type is stored in the
symbol's type field; Types encode C's data types and are described in
Chapter 4. The value is stored in u.c.v:
(constants 47)=
struct {
    Value v;
    Symbol loc;
} c;
On some targets, some constants - floating-point numbers - cannot
be stored in instructions, so the compiler generates a static variable and
initializes it to the value of the constant. For these, u.c.loc points to
the symbol for the generated variable. Taken together, the type and u.c
fields represent all that is known about a constant.
Only one instance of any given constant appears in the constants
table, e.g., if the constant "hello world" appears three times in a pro-
gram, all three references point to the same symbol. constant searches
the constant table for a given value of a given type, installing it if neces-
sary, and returns the symbol pointer. Constants are never removed from
the table.
(sym.c functions)+=
Symbol constant(ty, v) Type ty; Value v; {
    struct entry *p;
    unsigned h = v.u&(HASHSIZE-1);

    ty = unqual(ty);
    for (p = constants->buckets[h]; p; p = p->link)
        if (eqtype(ty, p->sym.type, 1))
            (return the symbol if p's value == v 48)
    NEW0(p, PERM);
    p->sym.name = vtoa(ty, v);
    p->sym.scope = CONSTANTS;
    p->sym.type = ty;
    p->sym.sclass = STATIC;
    p->sym.u.c.v = v;
    p->link = constants->buckets[h];
    p->sym.up = constants->all;
    constants->all = &p->sym;
    constants->buckets[h] = p;
    (announce the constant, if necessary 49)
    p->sym.defined = 1;
    return &p->sym;
}

unqual returns the unqualified version of a Type, namely without const
or volatile, and eqtype tests for type equality (see Section 4.7). If v ap-
pears in the table, its symbol pointer is returned. Otherwise, a symbol is
allocated and initialized. The name field is set to the string representation
returned by vtoa.
This value is useful only for the integral types and constant pointers;
for the other types, the string returned by vtoa may not reliably depict the
value. Constants are found in the table by comparing their actual values,
not their string representations, because some floating-point constants
have no natural string representations. For example, the constant expres-
sion (double)(float)0.3 truncates 0.3 to a machine-dependent value.
The effect of the cast cannot be captured by a valid string constant.
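
The hazard can be made concrete with a small experiment. This sketch
assumes IEEE 754 float and double and uses sprintf with %g as a
stand-in for a short textual rendering; it does not claim to reproduce
what vtoa prints. same_short_text, same_value, and truncated_point_three
are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a and b print identically with %g (six significant digits). */
int same_short_text(double a, double b) {
    char sa[32], sb[32];

    sprintf(sa, "%g", a);
    sprintf(sb, "%g", b);
    return strcmp(sa, sb) == 0;
}

/* Returns 1 if a and b hold the same value. */
int same_value(double a, double b) {
    return a == b;
}

/* The constant expression from the text. */
double truncated_point_three(void) {
    return (double)(float)0.3;
}
```

Under the stated assumptions, the truncated value and 0.3 both print as
0.3 in this short form, yet they compare unequal as doubles. A table keyed
on such text would merge two distinct constants, which is the kind of
error that comparing actual values avoids.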
The type operator determines which union fields to compare.
(sym.c macros)=
#define equalp(x) v.x == p->sym.u.c.v.x
(return the symbol if p's value == v 48)=
switch (ty->op) {
case CHAR:     if (equalp(uc)) return &p->sym; break;
case SHORT:    if (equalp(ss)) return &p->sym; break;
case INT:      if (equalp(i))  return &p->sym; break;
case UNSIGNED: if (equalp(u))  return &p->sym; break;
case FLOAT:    if (equalp(f))  return &p->sym; break;
case DOUBLE:   if (equalp(d))  return &p->sym; break;
case ARRAY: case FUNCTION:
case POINTER:  if (equalp(p))  return &p->sym; break;
}

constant calls defsymbol to announce to the back end those constants
that might appear in dags:
(announce the constant, if necessary 49)=
if (ty->u.sym && !ty->u.sym->addressed)
    (*IR->defsymbol)(&p->sym);
The primitive types, like the integers and floating-point types, appear in
dags only if lcc is so configured, which is what the addressed flag tests.
See Sections 4.2 and 5.1.
Integer constants abound in both the front and back ends. intconst
encapsulates the idiom for installing and announcing an integer constant:
(sym.c functions)+=
Symbol intconst(n) int n; {
    Value v;

    v.i = n;
    return constant(inttype, v);
}

3.7 Generated Variables
The front end generates local variables for many purposes. For example,
it generates static variables to hold out-of-line constants like strings and
jump tables for switch statements. It generates locals to pass and return
structures to functions and to hold the results of conditional expres-
sions and switch values. genident allocates and initializes a generated
identifier of a specific type, storage class, and scope:
(sym.c functions)+=
Symbol genident(scls, ty, lev) int scls, lev; Type ty; {
    Symbol p;

    NEW0(p, lev >= LOCAL ? FUNC : PERM);
    p->name = stringd(genlabel(1));
    p->scope = lev;
    p->sclass = scls;
    p->type = ty;
    p->generated = 1;
    if (lev == GLOBAL)
        (*IR->defsymbol)(p);
    return p;
}

(symbol flags 50)=
unsigned temporary:1;
unsigned generated:1;
The names are digit strings, and the generated flag is set. Parameters
and locals are announced to the back end elsewhere; generated globals
are announced here by calling the back end's defsymbol interface func-
tion. IR points to a data structure that connects a specific back end with
the front; Section 5.11 explains how this binding is initialized.
Temporaries are another kind of generated variable, and are distin-
guished by a lit temporary flag:
(sym.c functions)+=
Symbol temporary(scls, ty, lev) Type ty; int scls, lev; {
    Symbol p = genident(scls, ty, lev);

    p->temporary = 1;
    return p;
}
Back ends must also generate temporary locals to spill registers, for ex-
ample. They cannot call temporary directly because they do not know
about the type system. newtemp accepts a type suffix, calls btot to map
this suffix into a representative type, and calls temporary with that type.
(sym.c functions)+=
Symbol newtemp(sclass, tc) int sclass, tc; {
    Symbol p = temporary(sclass, btot(tc), LOCAL);

    (*IR->local)(p);
    p->defined = 1;
    return p;
}

(symbol flags 50)+=
unsigned defined:1;

Calls to newtemp occur during code generation, which is too late for new
temporaries to be announced like front-end temporaries. So newtemp
calls local to announce them. The flag defined is lit after the symbol
has been announced to the back end.

Further Reading
lcc's symbol-table module implements only what is necessary for C.
Other languages need more; for example, in block-structured languages
- those with nested procedures - more than one set of parameters and
locals are visible at the same time. Newer object-oriented languages and
languages with explicit scope directives have more scopes; some need
many separate symbol tables to exist at the same time.
Fraser and Hanson (1991b) describe the evolution of lcc's symbol-
table module.
Knuth (1973b), Section 6.4, gives a detailed analysis of hashing and
describes the characteristics of good hashing functions. Suggestions for
good hash functions abound; the one in Aho, Sethi, and Ullman (1986),
Section 7.6 is an example.

Exercises
3.1 Try a better hash function for hashing entries in symbol tables; for
example, try the one in Aho, Sethi, and Ullman (1986), Section 7.6.
Does it make lcc run faster?
3.2 lcc never removes entries from the constants table. When might
this approach be a problem? Propose and implement a fix, and
measure the benefit. Is the benefit worth the effort?
3.3 Originally, lcc used a single hash table for its symbol tables (Fraser
and Hanson 1991b). In this approach, hash chains held all of the
symbols that hashed to that bucket, and the chains were ordered in
decreasing order of scope values. lookup simply searched a single
hash table. install and enterscope were easy using this approach,
but exitscope was more complicated because it had to scan the
chains and remove the symbols at the current scope level. The
present design ran faster on some computers, but it might not be
faster than the original design on other computers. Implement the
original design; make sure you handle accesses to global correctly.
Which design is easier to understand? Which is faster?
3.4 sym.c exports data and functions that help generate cross-reference
lists for identifiers and symbol-table information for debuggers.
The -x option causes lcc to set the uses field of a symbol to a
List of pointers to Coordinates that identify each use of the sym-
bol. sym.c exports
(sym.c exported functions)=
extern void use ARGS((Symbol p, Coordinate src));
which appends src to p->uses. It also exports
(sym.c exported data)+=
extern List loci, symbols;

(sym.c exported functions)+=
extern void locus ARGS((Table tp, Coordinate *cp));
loci and symbols hold pointers to Coordinates and Symbols. An
entry in symbols is the tail end of a list of symbols that are visible
from the corresponding source coordinate in loci. Following the up
field in this symbol visits all of the symbols visible from this point
in the source program. locus appends tp->all and cp to symbols
and loci. tp->all points to the symbol most recently added to the
table *tp, and is thus the current tail of the list of visible symbols.
Implement use and locus; both take fewer than five lines.

4
Types

Types abound in C programs. These include the types given explicitly in
declarations and those derived as intermediate types in expressions. For
example, the assignment in
int *p, x;
*p = x;
involves three different types. x is the address of a cell that holds an
int, so the type of the address of x - its lvalue - is "pointer to an int."
The type of the value of x - its rvalue - is int, as expected from the
declaration. Similarly, the type of p's lvalue is pointer to pointer to an
int, the type of p's rvalue is pointer to an int, and the type of *p is int.
lcc must deal with all of these types when it compiles the assignment.
lcc implements a representation for types and a set of functions on
that representation, which are described in this chapter. The functions
include type constructors, which build types, and type predicates, which
test facts about types. lcc must also implement type checking, which
ensures that declarations and expressions adhere to the rules dictated
by the language. Type checking uses the predicates described here and
is detailed in Chapters 9 and 11.

4.1 Representing Types

As suggested above, C types are usually rendered in English in a prefix
form in which a type operator is followed by its operand. For example,
the C declarator int *p declares p to be a pointer to an int, which is a
prefix rendition of the C type int * where pointer to is the operator and
an int is the operand. Similarly, char *(*strings)[10] declares strings
to be a
    pointer to
        an array of 10
            pointers to
                char,
where operands are indented under their operators.
There are many ways to represent this kind of prefix type specification.
For example, some older C compilers used bit strings in which the type
operators and the basic type were each encoded with a few bits. Bit-string
representations are compact and easy to manipulate, but they usually
limit the number of basic types and the number of operators that can be
applied, and they may not be able to carry size data, which are needed for
arrays, for example.
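
To make the trade-off concrete, here is a sketch of one possible
bit-string encoding. The layout is invented for illustration and is not
taken from any particular compiler: the low 4 bits name the basic type,
and each 2-bit field above them holds one operator, outermost first. All
names (tword, apply_op, outer_op, strip_op, and the enumeration constants)
are hypothetical.

```c
#include <assert.h>

/* Hypothetical layout: 4 bits of basic type, then 2-bit operator fields. */
enum { B_CHAR = 1, B_INT = 2, B_DOUBLE = 3 };
enum { OP_NONE = 0, OP_PTR = 1, OP_FUNC = 2, OP_ARRAY = 3 };
enum { BASE_BITS = 4, OP_BITS = 2 };

typedef unsigned tword;

/* Apply operator op to type t, yielding, e.g., "pointer to t". */
tword apply_op(tword t, unsigned op) {
    tword base = t & ((1u << BASE_BITS) - 1);
    tword ops = t >> BASE_BITS;

    assert(op >= OP_PTR && op <= OP_ARRAY);
    return (((ops << OP_BITS) | op) << BASE_BITS) | base;
}

/* Outermost operator of t, or OP_NONE for a basic type. */
unsigned outer_op(tword t) {
    return (t >> BASE_BITS) & ((1u << OP_BITS) - 1);
}

/* Strip the outermost operator, e.g., to get the target of a pointer type. */
tword strip_op(tword t) {
    tword base = t & ((1u << BASE_BITS) - 1);

    return ((t >> (BASE_BITS + OP_BITS)) << BASE_BITS) | base;
}
```

Building and taking apart types costs a few shifts and masks, and equal
types are equal integers. The limits named in the text are also visible:
at most 15 basic types, a fixed number of operator fields per word, and no
place to record that an array has 10 elements.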
lcc represents types by linked structures that mirror their prefix spec-
ifications. Type nodes are the building blocks:

(types.c typedefs)=
typedef struct type *Type;

(types.c exported types)=
struct type {
    int op;
    Type type;
    int align;
    int size;
    union {
        (types with names or tags 55)
        (function types 63)
    } u;
    Xtype x;
};
The op field holds an integer operator code, and the type field holds
the operand. The operators are the values of the global enumeration
constants:
    CHAR        LONG      ARRAY      FUNCTION
    INT         ENUM      STRUCT     CONST
    UNSIGNED    FLOAT     UNION      VOLATILE
    SHORT       DOUBLE    POINTER    VOID
The CHAR, INT, UNSIGNED, SHORT, LONG, and ENUM operators define the inte-
gral types, and the FLOAT and DOUBLE operators define the floating types.
Together, these types are known as the arithmetic types. Except for ENUM
types, these types have no operands. The operand of an ENUM type is its
compatible integral type, i.e., the type of the enumeration identifiers. For
lcc, this type is always int, as explained in Section 4.6.
The ARRAY, STRUCT, and UNION operators identify the aggregate types.
STRUCT and UNION do not have operands; their fields are stored in an aux-
iliary symbol-table entry for the structure or union tag. ARRAY's operand
is the element type. The POINTER and FUNCTION operators define pointer
types and function types. They take operands that give the referenced
type and the return type. The CONST and VOLATILE operators specify
qualified types; their operands are the unqualified versions of the types.
The sum CONST+VOLATILE is also a type operator, and it specifies a type
that is both constant and volatile. The VOID operator identifies the void
type; it has no operand.
The align and size fields give the type's alignment and the size of
objects of that type in bytes. As specified by the code-generation inter-
face in Chapter 5, the size must be a multiple of the alignment. The back
end must allocate space for a variable so that its address is a multiple
of its type's alignment.
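
The two requirements can be sketched with a little arithmetic. The
helpers below are hypothetical and are not part of the interface; they
assume alignments are powers of two, which holds for the targets discussed
in this book but is an assumption of the sketch.

```c
#include <assert.h>

/* Round n up to the next multiple of align, which must be a power of two. */
int round_up(int n, int align) {
    assert(align > 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Reserve size bytes at the next suitably aligned offset; return that
   offset and advance *offset past the reserved space. */
int place(int *offset, int size, int align) {
    int at = round_up(*offset, align);

    assert(size % align == 0);   /* size must be a multiple of the alignment */
    *offset = at + size;
    return at;
}
```

Placing a 1-byte char, a 4-byte int, and an 8-byte double in that order,
each aligned to its own size, puts them at offsets 0, 4, and 8 and leaves
three bytes of padding after the char.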
The x field plays the same role for types as it does in symbols; back
ends may define Xtype to add target-specific fields to the type structure.
This facility is most often used to support debuggers.
The innards of Types are revealed by exporting the declaration so that
back ends may read the size and align fields and read and write the x
fields; by convention, these are the only fields the back ends are allowed
to inspect. The front end, however, may access all of the fields.
The op, type, size, and align fields give most of the information
needed for dealing with a type. For unqualified types with names or tags
- the built-in types, structure and union types, and enumeration types -
the u.sym fields point to symbol-table entries that give more information
about the types:
(types with names or tags 55)=
Symbol sym;
The symbol-table entry gives the name of the type, and the value of
u.sym->addressed is zero if constants of the type can be included as
parts of instructions. u.sym->type points back to the type itself; this
pointer is used to map tags to types, for example. There is one symbol-
table entry for each structure, union, and enumeration type, one for each
basic type, and one for all pointer types. These entries appear in the
types table, as detailed below in Section 4.2. This representation is used
so that the functions in sym.c can be used to manage types.
Types can be depicted in a parenthesized prefix form that follows
closely the English prefix form introduced above. For example, the type
int on the MIPS is:
(INT 4 4 ["int"])
The first 4 is the alignment, the second 4 is the size, and the ["int"]
denotes a pointer to a symbol-table entry for the type name int. Other
types are depicted similarly, for example
(POINTER 4 4 (INT 4 4 ["int"]) ["T*"])
is the type pointer to an int. The type name T* represents the single
symbol-table entry that is used for all pointer types.
The alignments, sizes, and symbol-table pointers are omitted from
explanations (but not from the code) when they're not needed to under-
stand the topic at hand. For instance, the types given at the beginning
of this section are:
(INT)
(POINTER (INT))
(POINTER (ARRAY 10 (POINTER (CHAR))))
The last line, which depicts the type pointer to an array of 10 pointers to
char, illustrates the convention for array types in which the number of
elements is given instead of the size of the array. This convention is only
a notational convenience; the size field of the array type always holds
the actual size of the array. The number of elements can be computed by
dividing that size by the size of the element type. Thus, the type array
of 10 ints is more accurately depicted as
(ARRAY 40 4 (INT 4 4 ["int"]))
but, by convention, is usually depicted as (ARRAY 10 (INT)). An incom-
plete type is one whose size is unknown and that thus has a size field
equal to zero. These arise from declarations that omit sizes, such as
int a[];
extern struct table *identifiers;
Opaque pointers, such as pointers to lcc's table structures, are incom-
plete types. Sizes for incomplete types are sometimes shown when it's
important to indicate that they are incomplete.
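
The linked representation and its abbreviated prefix form can be tried
out with a cut-down node. This is a sketch, not lcc's struct type: tnode,
nelems, and prefix are hypothetical names, the operator codes are
illustrative, and the sizes in the usage note assume 4-byte pointers as in
the MIPS examples above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { T_INT, T_CHAR, T_POINTER, T_ARRAY };   /* illustrative operator codes */

struct tnode {
    int op;
    struct tnode *type;   /* operand, or NULL for a basic type */
    int align, size;
};

/* Number of elements in a complete array type: its size over the element size. */
int nelems(const struct tnode *ty) {
    assert(ty->op == T_ARRAY && ty->type->size > 0);
    return ty->size / ty->type->size;
}

/* Append the abbreviated prefix form of ty to buf, which the caller
   must make large enough. */
void prefix(const struct tnode *ty, char *buf) {
    switch (ty->op) {
    case T_INT:  strcat(buf, "(INT)");  return;
    case T_CHAR: strcat(buf, "(CHAR)"); return;
    case T_POINTER:
        strcat(buf, "(POINTER ");
        break;
    case T_ARRAY:
        sprintf(buf + strlen(buf), "(ARRAY %d ", nelems(ty));
        break;
    default:
        assert(0);
    }
    prefix(ty->type, buf);   /* the operand follows its operator */
    strcat(buf, ")");
}
```

For a pointer node whose operand is an array node of size 40 over 4-byte
pointers to char, prefix produces
(POINTER (ARRAY 10 (POINTER (CHAR)))), with the element count recovered
from the two size fields.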

4.2 Type Management
One of the basic operations in type checking is determining whether two
types are equivalent. This test can be simplified if there is only one copy
of any type, much the same way that string comparison is simplified by
keeping only one copy of any string.
type does for types what stringn does for strings. type manages
typetable:
(types.c data)=
static struct entry {
    struct type type;
    struct entry *link;
} *typetable[128];
Each entry structure in typetable holds a type. The function type
searches typetable for the desired type, or constructs a new type:
(types.c functions)=
static Type type(op, ty, size, align, sym)
int op, size, align; Type ty; void *sym; {
    unsigned h = (hash op and ty 57)&(NELEMS(typetable)-1);
    struct entry *tn;

    if (op != FUNCTION && (op != ARRAY || size > 0))
        (search for an existing type 57)
    NEW(tn, PERM);
    tn->type.op = op;
    tn->type.type = ty;
    tn->type.size = size;
    tn->type.align = align;
    tn->type.u.sym = sym;
    memset(&tn->type.x, 0, sizeof tn->type.x);
    tn->link = typetable[h];
    typetable[h] = tn;
    return &tn->type;
}

type always builds new types for function types and for incomplete array
types. When type builds a new type, it initializes the fields specified by
the arguments, clears the x field, adds the type to the appropriate hash
chain, and returns the new Type.
type searches typetable by using the exclusive OR of the type oper-
ator and the address of the operand as the hash value, and searching
the appropriate chain for a type with the same operator, operand, size,
alignment, and symbol-table entry:
(hash op and ty 57)=
(op^((unsigned)ty>>3))

(search for an existing type 57)=
for (tn = typetable[h]; tn; tn = tn->link)
    if (tn->type.op == op && tn->type.type == ty
    &&  tn->type.size == size && tn->type.align == align
    &&  tn->type.u.sym == sym)
        return &tn->type;
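
The effect of keeping one copy of each type can be seen in a stripped-down
version of this scheme. The sketch below is not lcc's type function: ty,
cell, mk_type, and same_type are hypothetical names, the symbol-table
field and the special cases for functions and incomplete arrays are
omitted, calloc replaces arena allocation, and the pointer is hashed
through uintptr_t as a portability assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { K_INT = 1, K_POINTER = 2 };   /* illustrative operator codes */

struct ty {
    int op;
    struct ty *type;     /* operand */
    int size, align;
};

struct cell { struct ty type; struct cell *link; };
static struct cell *cache[128];      /* table size must be a power of two */

/* Return the unique node for (op, operand, size, align), building it if needed. */
struct ty *mk_type(int op, struct ty *operand, int size, int align) {
    unsigned h = ((unsigned)op ^ (unsigned)((uintptr_t)operand >> 3))
                 & (unsigned)(sizeof cache / sizeof cache[0] - 1);
    struct cell *tn;

    for (tn = cache[h]; tn; tn = tn->link)
        if (tn->type.op == op && tn->type.type == operand
        &&  tn->type.size == size && tn->type.align == align)
            return &tn->type;
    tn = calloc(1, sizeof *tn);
    assert(tn != NULL);
    tn->type.op = op;
    tn->type.type = operand;
    tn->type.size = size;
    tn->type.align = align;
    tn->link = cache[h];
    cache[h] = tn;
    return &tn->type;
}

/* With unique nodes, type identity is pointer identity. */
int same_type(const struct ty *a, const struct ty *b) {
    return a == b;
}
```

Because the operand of a pointer type is itself a unique node, comparing
operand addresses is enough; two requests for "pointer to int" return the
same node no matter how the int operand was obtained.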
typetable is initialized with the built-in types and the type for void*.
These types are also the values of 14 global variables:
( types.c exported data)=
extern Type chartype;
extern Type doubletype;
extern Type floattype;
extern Type inttype;
extern Type longdouble;
extern Type longtype;
extern Type shorttype;
extern Type signedchar;
extern Type unsignedchar;
extern Type unsignedlong;
extern Type unsignedshort;
extern Type unsignedtype;
extern Type voidptype;
extern Type voidtype;
The front end uses these variables to refer to specific types, and can
thus avoid searching typetable for types that are known to exist. These
variables and typetable are initialized by typeinit:
(types.c functions)+=
void typeinit() {
    (typeinit 58)
}

As detailed in Section 5.1, each basic type is characterized by its type
metric, which is a triple that gives the type's size and minimum align-
ment, and tells whether constants of the type can appear in dags. These
triples are structures with size, align, and outofline fields.

(typeinit 58)=
#define xx(v,name,op,metrics) { \
    Symbol p = install(string(name), &types, GLOBAL, PERM);\
    v = type(op, 0, IR->metrics.size, IR->metrics.align, p);\
    p->type = v; p->addressed = IR->metrics.outofline; }
xx(chartype,      "char",           CHAR,     charmetric);
xx(doubletype,    "double",         DOUBLE,   doublemetric);
xx(floattype,     "float",          FLOAT,    floatmetric);
xx(inttype,       "int",            INT,      intmetric);
xx(longdouble,    "long double",    DOUBLE,   doublemetric);
xx(longtype,      "long int",       INT,      intmetric);
xx(shorttype,     "short",          SHORT,    shortmetric);
xx(signedchar,    "signed char",    CHAR,     charmetric);
xx(unsignedchar,  "unsigned char",  CHAR,     charmetric);
xx(unsignedlong,  "unsigned long",  UNSIGNED, intmetric);
xx(unsignedshort, "unsigned short", SHORT,    shortmetric);
xx(unsignedtype,  "unsigned int",   UNSIGNED, intmetric);
#undef xx
SHORT 109
shortmetric 78 The unsigned integral types have the same operators, sizes, and align-
shorttype 57
ments as their signed counterparts, but they have different symbol-table
signedchar 57
string 29 entries, so distinct types are constructed for them. Similarly, 1cc as-
types 41 sumes that long and int, and long double and double, have the same
typetable 56 structure, but each has a distinct type. Comparing a type to 1ongtype
unsignedchar 57 suffices to test if it represents the type long. IR points to the interface
UNSIGNED 109
unsignedlong 57
record supplied by the back end; see Section 5.11. The type void has no
metrics:
(typeinit 58)+=
{
    Symbol p;
    p = install(string("void"), &types, GLOBAL, PERM);
    voidtype = type(VOID, NULL, 0, 0, p);
    p->type = voidtype;
}
typeinit installs the symbol-table entries into the types table de-
fined in Section 3.2. This table holds entries for all types that are
named by identifiers or tags. The basic types are installed by typeinit
and are never removed. But the types associated with structure, union,
and enumeration tags must be removed from typetable when their as-
sociated symbol-table entries are removed from types by exitscope.
exitscope calls rmtypes(lev) to remove from typetable any types
whose u.sym->scope is greater than or equal to lev:
(types.c data)+=
static int maxlevel;

(types.c functions)+=
void rmtypes(lev) int lev; {
    if (maxlevel >= lev) {
        int i;
        maxlevel = 0;
        for (i = 0; i < NELEMS(typetable); i++) {
            (remove types with u.sym->scope >= lev 59)
        }
    }
}
The value of maxlevel is the largest value of u.sym->scope for any type in typetable that has an associated symbol-table entry. rmtypes uses maxlevel to avoid scanning typetable in the frequently occurring case when none of the symbol-table entries have scopes greater than or equal to lev. Removing the types also recomputes maxlevel:

(remove types with u.sym->scope >= lev 59)=
    struct entry *tn, **tq = &typetable[i];
    while ((tn = *tq) != NULL)
        if (tn->type.op == FUNCTION)
            tq = &tn->link;
        else if (tn->type.u.sym && tn->type.u.sym->scope >= lev)
            *tq = tn->link;
        else {
            (recompute maxlevel 60)
            tq = &tn->link;
        }

(recompute maxlevel 60)=
    if (tn->type.u.sym && tn->type.u.sym->scope > maxlevel)
        maxlevel = tn->type.u.sym->scope;

Function types are treated specially because they have fields that overlap u.sym but themselves have no u.sym. Arrays and qualified types have no u.sym fields, so the last clause handles them.
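
The unlinking idiom used by rmtypes can be tried in isolation. The following is a minimal sketch, not lcc's code: struct entry here is a hypothetical, stripped-down stand-in that records only a scope, maxscope plays the role of maxlevel, and a single list stands in for one bucket of typetable. It shows how a pointer to the "current slot" lets entries be removed without tracking a predecessor, and how the cached maximum lets the scan be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a type-table entry:
   it records only the scope of the entry's tag. */
struct entry {
    int scope;
    struct entry *link;
};

static int maxscope;   /* plays the role of maxlevel */

/* Unlink every entry whose scope is >= lev and recompute maxscope
   from the survivors. Returns the number of entries removed. */
int remove_scopes(struct entry **head, int lev) {
    struct entry *tn, **tq = head;
    int removed = 0;

    assert(head != NULL);
    if (maxscope < lev)
        return 0;                /* no entry can match: skip the scan */
    maxscope = 0;
    while ((tn = *tq) != NULL)
        if (tn->scope >= lev) {
            *tq = tn->link;      /* unlink; tq still addresses the same slot */
            removed++;
        } else {
            if (tn->scope > maxscope)
                maxscope = tn->scope;
            tq = &tn->link;      /* advance to the next slot */
        }
    return removed;
}
```

Because tq always addresses the link that points at the current entry, removing the first entry of the list needs no special case.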

4.3 Type Predicates


The global variables initialized by typeInit can be used to specify a particular type and to test for a particular type. For example, if the type ty is equal to inttype, ty is type int. The is... predicates listed below, implemented as macros, test for sets of types by checking for specific operators. Most operate on unqualified types, which are obtained by calling unqual:

(types.c exported macros)=
    #define isqual(t)     ((t)->op >= CONST)
    #define unqual(t)     (isqual(t) ? (t)->type : (t))

(types.c exported macros)+=
    #define isvolatile(t) ((t)->op == VOLATILE \
                        || (t)->op == CONST+VOLATILE)
    #define isconst(t)    ((t)->op == CONST \
                        || (t)->op == CONST+VOLATILE)
    #define isarray(t)    (unqual(t)->op == ARRAY)
    #define isstruct(t)   (unqual(t)->op == STRUCT \
                        || unqual(t)->op == UNION)
    #define isunion(t)    (unqual(t)->op == UNION)
    #define isfunc(t)     (unqual(t)->op == FUNCTION)
    #define isptr(t)      (unqual(t)->op == POINTER)
    #define ischar(t)     (unqual(t)->op == CHAR)
    #define isint(t)      (unqual(t)->op >= CHAR \
                        && unqual(t)->op <= UNSIGNED)
    #define isfloat(t)    (unqual(t)->op <= DOUBLE)
    #define isarith(t)    (unqual(t)->op <= UNSIGNED)
    #define isunsigned(t) (unqual(t)->op == UNSIGNED)
    #define isdouble(t)   (unqual(t)->op == DOUBLE)
    #define isscalar(t)   (unqual(t)->op <= POINTER \
                        || unqual(t)->op == ENUM)
    #define isenum(t)     (unqual(t)->op == ENUM)

The values of the type operators are defined in token.h so that the comparisons made in the macros above yield the desired result.
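
To see why the numeric order of the operators matters, consider the sketch below. The enumeration values are an assumption chosen only so that the range tests work (floating types first, then the integral types, then POINTER, with the qualifiers last); the authoritative values are the ones in token.h. struct type is likewise a hypothetical two-field stand-in for lcc's type nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed numbering, for illustration only. */
enum {
    FLOAT = 1, DOUBLE, CHAR, SHORT, INT, UNSIGNED, POINTER,
    VOID, STRUCT, UNION, FUNCTION, ARRAY, ENUM, LONG,
    CONST, VOLATILE
};

/* Hypothetical stand-in for a type node: an operator and an operand. */
struct type { int op; struct type *type; };

#define isqual(t)   ((t)->op >= CONST)
#define unqual(t)   (isqual(t) ? (t)->type : (t))
#define isint(t)    (unqual(t)->op >= CHAR && unqual(t)->op <= UNSIGNED)
#define isarith(t)  (unqual(t)->op <= UNSIGNED)
#define isscalar(t) (unqual(t)->op <= POINTER || unqual(t)->op == ENUM)

/* The relations that the range comparisons above depend on. */
void check_numbering(void) {
    assert(FLOAT < DOUBLE && DOUBLE < CHAR);   /* floating types come first */
    assert(CHAR < UNSIGNED);                   /* integral types are contiguous */
    assert(UNSIGNED < POINTER);                /* arithmetic precedes POINTER */
    assert(CONST > ENUM && CONST + VOLATILE > CONST);  /* qualifiers are largest */
}
```

With such a numbering, a single comparison classifies a whole family of types, which is cheaper than testing each operator in turn.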

4.4 Type Constructors


type constructs an arbitrary type. Other functions encapsulate calls to type to construct specific types. For example, ptr builds a pointer type:

(types.c functions)+=
    Type ptr(ty) Type ty; {
        return type(POINTER, ty, IR->ptrmetric.size,
            IR->ptrmetric.align, pointersym);
    }

which, given a type ty, returns (POINTER ty). The symbol-table entry associated with pointer types is assigned to pointersym during initialization, and the type for void* is initialized by calling ptr:

(types.c data)+=
    static Symbol pointersym;

(typeInit 58)+=
    pointersym = install(string("T*"), &types, GLOBAL, PERM);
    pointersym->addressed = IR->ptrmetric.outofline;
    voidptype = ptr(voidtype);

While ptr builds a pointer type, deref dereferences it; that is, it returns the referenced type. Given a type (POINTER ty), deref returns ty:

(types.c functions)+=
    Type deref(ty) Type ty; {
        if (isptr(ty))
            ty = ty->type;
        else
            error("type error: %s\n", "pointer expected");
        return isenum(ty) ? unqual(ty)->type : ty;
    }

deref, like some of the other constructors below, issues errors for invalid operands. Technically, these kinds of tests are part of type-checking, not type construction, but putting these tests in the constructors simplifies the type-checking code and avoids oversights. The last line of deref handles pointers to enumerations: dereferencing a pointer to an enumeration must return its associated unqualified integral type. unqual is described above.

array(ty, n, a) builds the type (ARRAY n ty). It also arranges for the resulting type to have alignment a or, if a is 0, the alignment of ty. array also checks for illegal operands.

(types.c functions)+=
    Type array(ty, n, a) Type ty; int n, a; {
        if (isfunc(ty)) {
            error("illegal type 'array of %t'\n", ty);
            return array(inttype, n, 0);
        }
        if (level > GLOBAL && isarray(ty) && ty->size == 0)
            error("missing array size\n");
        if (ty->size == 0) {
            if (unqual(ty) == voidtype)
                error("illegal type 'array of %t'\n", ty);
            else if (Aflag >= 2)
                warning("declaring type 'array of %t' is undefined\n", ty);
        } else if (n > INT_MAX/ty->size) {
            error("size of 'array of %t' exceeds %d bytes\n",
                ty, INT_MAX);
            n = 1;
        }
        return type(ARRAY, ty, n*ty->size,
            a ? a : ty->align, NULL);
    }

C does not permit arrays of functions, arrays of void, or incomplete arrays (those with zero length) at any scope level except GLOBAL. array also forbids arrays whose size is greater than INT_MAX bytes, because it cannot represent their sizes, and warns about declaring arrays of incomplete types when lcc's (fussy) double -A option is given; that option sets Aflag to 2, indicating that lcc should warn about non-ANSI usage. The format code %t prints an English description of the corresponding type argument; see Exercise 4.4.
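
The size test in array divides before it multiplies, so the check itself cannot overflow. A minimal sketch of the same idea follows; array_bytes is a hypothetical helper, not an lcc function, and it assumes a nonnegative element count and a positive element size.

```c
#include <assert.h>
#include <limits.h>

/* Returns the size in bytes of n elements of elemsize bytes each,
   or -1 if that size cannot be represented in an int.
   Comparing n against INT_MAX/elemsize avoids computing n*elemsize
   until the product is known to fit. */
int array_bytes(int n, int elemsize) {
    assert(n >= 0 && elemsize > 0);
    if (n > INT_MAX / elemsize)
        return -1;
    return n * elemsize;
}
```

Testing n*elemsize > INT_MAX directly would be too late: by the time the comparison runs, the signed multiplication may already have overflowed.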
Array types "decay" into pointers to their element types in many contexts, such as when an array is the type of a formal parameter. atop implements this decay:

(types.c functions)+=
    Type atop(ty) Type ty; {
        if (isarray(ty))
            return ptr(ty->type);
        error("type error: %s\n", "array expected");
        return ptr(ty);
    }

qual and unqual (the latter shown above) respectively construct and deconstruct qualified types. Given a type ty, qual checks for illegal operands and builds (CONST ty), (VOLATILE ty), or (CONST+VOLATILE ty).

(types.c functions)+=
    Type qual(op, ty) int op; Type ty; {
        if (isarray(ty))
            ty = type(ARRAY, qual(op, ty->type), ty->size,
                ty->align, NULL);
        else if (isfunc(ty))
            warning("qualified function type ignored\n");
        else if (isconst(ty)    && op == CONST
        ||       isvolatile(ty) && op == VOLATILE)
            error("illegal type '%k %t'\n", op, ty);
        else {
            if (isqual(ty)) {
                op += ty->op;
                ty = ty->type;
            }
            ty = type(op, ty, ty->size, ty->align, NULL);
        }
        return ty;
    }

If ty is the type (ARRAY ety), the qualification applies to the element type, so qual(op, ty) builds (ARRAY (op ety)). If ty is already qualified, it's either (CONST ty->type) or (VOLATILE ty->type), and op is the other qualifier. In this case, qual builds (CONST+VOLATILE ty->type). This convention complicates the code for qual, but makes it possible to describe qualified types with only one type node instead of one or two type nodes, thus simplifying isqual.

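
The single-node convention for qualified types can be sketched as follows. This is an illustration under stated assumptions, not lcc's implementation: the operator values are assumed, struct type is a hypothetical two-field node, mknode allocates a fresh node each time (lcc's type shares nodes instead), and qualify reports a repeated qualifier by returning NULL rather than by issuing a diagnostic.

```c
#include <assert.h>
#include <stdlib.h>

enum { CONST = 15, VOLATILE = 16 };   /* assumed operator values */

/* Hypothetical stand-in for a type node: an operator and an operand. */
struct type { int op; struct type *type; };

#define isqual(t)     ((t)->op >= CONST)
#define isconst(t)    ((t)->op == CONST || (t)->op == CONST+VOLATILE)
#define isvolatile(t) ((t)->op == VOLATILE || (t)->op == CONST+VOLATILE)

/* Hypothetical allocator for the sketch. */
static struct type *mknode(int op, struct type *operand) {
    struct type *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    t->op = op;
    t->type = operand;
    return t;
}

/* Adds qualifier op to ty. If ty is already qualified with the other
   qualifier, the two are merged into one CONST+VOLATILE node over the
   unqualified type. Returns NULL if op repeats a qualifier ty has. */
struct type *qualify(int op, struct type *ty) {
    assert(op == CONST || op == VOLATILE);
    if ((isconst(ty) && op == CONST) || (isvolatile(ty) && op == VOLATILE))
        return NULL;
    if (isqual(ty)) {
        op += ty->op;        /* CONST + VOLATILE */
        ty = ty->type;       /* strip the existing qualifier node */
    }
    return mknode(op, ty);
}
```

Since at most one qualifier node ever sits above a type, unqual needs to strip only one level.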

4.5 Function Types

The type field of a function type gives the type of the value returned by the function, and the u union holds a structure that gives the types of the arguments:

(function types 63)=
    struct {
        unsigned oldstyle:1;
        Type *proto;
    } f;

The f.oldstyle flag distinguishes between the two kinds of function types: A one indicates an old-style type, which may omit the argument types, and a zero indicates a new-style function, which always includes the argument types. f.proto points to a null-terminated array of Types; f.proto[i] is the type of argument i+1. The f.oldstyle flag is needed because old-style function types may carry prototypes, but, as dictated by the ANSI Standard, those prototypes are not used to type-check actual arguments that appear in calls to such functions. This anomaly appears when an old-style definition is followed by a new-style declaration; for example,

    int f(x,y) int x; double y; { ... }
    extern int f(int, double);

defines f as an old-style function and subsequently declares a prototype for f. The prototype must be compatible with the definition, but it's not used to type-check calls to f.

func builds the type (FUNCTION ty {proto}), where ty is the type of the return value and the braces enclose the prototype, and it initializes the prototype and old-style flag:

(types.c functions)+=
    Type func(ty, proto, style) Type ty, *proto; int style; {
        if (ty && (isarray(ty) || isfunc(ty)))
            error("illegal return type '%t'\n", ty);
        ty = type(FUNCTION, ty, 0, 0, NULL);
        ty->u.f.proto = proto;
        ty->u.f.oldstyle = style;
        return ty;
    }

freturn is to function types what deref is to pointer types. It takes a type (FUNCTION ty) and dereferences it to yield ty, the type of the return value.

(types.c functions)+=
    Type freturn(ty) Type ty; {
        if (isfunc(ty))
            return ty->type;
        error("type error: %s\n", "function expected");
        return inttype;
    }

ANSI C supports functions with no arguments; such a function is declared with void as the argument list. For example,

    void f(void);

declares f to be a function that takes no arguments and returns no value. Internally, the prototype for functions with no arguments is not empty; it consists of a void type and the terminating null. Thus, the type of f is depicted as

    (FUNCTION (VOID) {(VOID)})

ANSI C also supports functions with a variable number of arguments. sprintf is an example; it's declared as

    int sprintf(char *, char *, ...);

where the ellipsis denotes the variable portion of the argument list. The prototype for a variable number of arguments consists of the types of the declared arguments, a void type, and the terminating null. sprintf's type is thus

    (FUNCTION (INT)
        {(POINTER (CHAR))
         (POINTER (CHAR))
         (VOID)})

The predicate variadic tests whether a function type has a variable-length argument list by looking for the type void at the end of its prototype:

(types.c functions)+=
    int variadic(ty) Type ty; {
        if (isfunc(ty) && ty->u.f.proto) {
            int i;
            for (i = 0; ty->u.f.proto[i]; i++)
                ;
            return i > 1 && ty->u.f.proto[i-1] == voidtype;
        }
        return 0;
    }

A function with a variable number of arguments always has at least one declared argument, followed by one or more optional arguments, so the void at the end of the prototype can't be confused with the prototype for a function with no arguments, which has the one-element prototype {(VOID)}.
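
The three prototype shapes can be written out as null-terminated arrays. The sketch below uses hypothetical one-field type nodes and assumed operator values; is_variadic mirrors the test in variadic but takes the prototype array directly rather than a function type.

```c
#include <assert.h>
#include <stddef.h>

enum { INT = 5, POINTER = 7, VOID = 8 };   /* assumed operator values */

/* Hypothetical stand-in for a type node. */
struct type { int op; };
typedef struct type *Type;

static struct type voidnode = { VOID }, intnode = { INT }, ptrnode = { POINTER };

/* Prototypes as null-terminated arrays of Types. */
Type proto_noargs[]  = { &voidnode, NULL };                      /* f(void)              */
Type proto_fixed[]   = { &ptrnode, &intnode, NULL };             /* f(char *, int)       */
Type proto_sprintf[] = { &ptrnode, &ptrnode, &voidnode, NULL };  /* f(char *, char *, ...) */

/* Nonzero if proto ends in a void that follows at least one declared argument. */
int is_variadic(Type *proto) {
    int i;

    if (proto == NULL)
        return 0;                 /* no prototype at all */
    assert(proto[0] != NULL);     /* a prototype is never empty */
    for (i = 0; proto[i] != NULL; i++)
        ;                         /* count the entries */
    return i > 1 && proto[i-1] == &voidnode;
}
```

The i > 1 test is what separates the trailing void of a variable-length list from the lone void of a function with no arguments.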

4.6 Structure and Enumeration Types


Structure and union types are identified by tags, and the u.sym fields of these types point to the symbol-table entries for these tags. The fields are stored in these symbol-table entries, not in the types themselves. The relevant field of the symbol structure is u.s:

(struct types 65)=
    struct {
        unsigned cfields:1;
        unsigned vfields:1;
        Field flist;
    } s;

cfields and vfields are one if the structure or union type has any const-qualified or volatile-qualified fields, respectively. flist points to a list of field structures threaded through their link fields:

(types.c typedefs)+=
    typedef struct field *Field;

(types.c exported types)+=
    struct field {
        char *name;
        Type type;
        int offset;
        short bitsize;
        short lsb;
        Field link;
    };

name holds the field name, type is the field's type, and offset is the byte offset to the field in an instance of the structure.

When a field describes a bit field, the type field is either inttype or unsignedtype, because those are the only two types allowed for bit fields. The lsb field is nonzero and the following macros apply. lsb is the number of the least significant bit in the bit field plus one, where bit numbers start at zero with the least significant bit.

(types.c exported macros)+=
    #define fieldsize(p)  (p)->bitsize
    #define fieldright(p) ((p)->lsb - 1)
    #define fieldleft(p)  (8*(p)->type->size - \
                           fieldsize(p) - fieldright(p))
    #define fieldmask(p)  (~(~(unsigned)0<<fieldsize(p)))

fieldsize returns the bitsize field, which holds the size of the bit field in bits. fieldright is the number of bits to the right of a bit field, and is used to shift the field over to the least significant bits of a signed or unsigned integer. Likewise, fieldleft is the number of bits to the left of a field; it is used when a signed bit field must be sign-extended. fieldmask is a mask of bitsize ones and is used to clear the extraneous bits when a bit field is extracted. Notice that this representation for bit fields does not depend on the target's endianness; the same representation is used for both big and little endians.
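
As an illustration of how fieldright and fieldmask combine, the sketch below extracts and stores an unsigned bit field held in an unsigned word. struct bitfield is a hypothetical subset of struct field, and get_field and set_field are illustrative helpers rather than lcc functions; the sketch assumes the field is narrower than an unsigned, because shifting by the full width would be undefined.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical subset of struct field: only the bit-field members. */
struct bitfield {
    short bitsize;   /* width of the field in bits */
    short lsb;       /* number of the least significant bit, plus one */
};

#define fieldsize(p)  ((p)->bitsize)
#define fieldright(p) ((p)->lsb - 1)
#define fieldmask(p)  (~(~(unsigned)0 << fieldsize(p)))

/* Extract an unsigned bit field: shift it down, then clear the bits above it. */
unsigned get_field(unsigned word, const struct bitfield *p) {
    assert(p->bitsize > 0 && p->bitsize < (int)(sizeof(unsigned) * CHAR_BIT));
    return (word >> fieldright(p)) & fieldmask(p);
}

/* Store value into the field, leaving the other bits of word unchanged. */
unsigned set_field(unsigned word, const struct bitfield *p, unsigned value) {
    unsigned mask = fieldmask(p) << fieldright(p);
    return (word & ~mask) | ((value << fieldright(p)) & mask);
}
```

For a 4-bit field whose least significant bit is bit 8, lsb is 9, fieldright is 8, and fieldmask is 0xF.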

newstruct creates a new type, (STRUCT ["tag"]) or (UNION ["tag"]), where tag is the tag. It's called by structdcl whenever a new structure or union type is declared or defined, with or without a field list. When a new structure or union type is created, its tag is installed in the types table. Tags are generated for anonymous structures and unions; that is, those without tags:

(types.c functions)+=
    Type newstruct(op, tag) int op; char *tag; {
        Symbol p;

        if (*tag == 0)
            tag = stringd(genlabel(1));
        else
            (check for redefinition of tag 67)
        p = install(tag, &types, level, PERM);
        p->type = type(op, NULL, 0, 0, p);
        if (p->scope > maxlevel)
            maxlevel = p->scope;
        p->src = src;
        return p->type;
    }

Installing a new tag in types might create an entry with a scope that exceeds maxlevel, so maxlevel is adjusted if necessary. Structure types point to their symbol-table entries, which point back to the type, so that tags can be mapped to types and vice versa. Tags are mapped to types when they are used in declarators, for example; see structdcl. Types are mapped to tags when rmtypes removes them from the typetable.

It's illegal to define the same tag more than once in the same scope, but it is legal to declare the same tag more than once. Giving a structure declaration with fields declares and defines a structure tag; using a structure tag without giving its fields declares the tag. For example,

    struct employee {
        char *name;
        struct date *hired;
        char ssn[9];
    }

declares and defines employee but only declares date. When a tag is defined, its defined flag is set, and defined is examined to determine if the tag is being redefined:

(check for redefinition of tag 67)=
    if ((p = lookup(tag, types)) != NULL && (p->scope == level
    || p->scope == PARAM && level == PARAM+1)) {
        if (p->type->op == op && !p->defined)
            return p->type;
        error("redefinition of '%s' previously defined at %w\n",
            p->name, &p->src);
    }

Arguments and argument types have scope PARAM, and locals have scopes beginning at PARAM+1. ANSI C specifies that arguments and top-level locals are in the same scope, so the scope test must test for a local tag that redefines a tag defined by an argument. This division is not mandated by the ANSI C Standard; it's used internally by lcc to separate parameters and locals so that foreach can visit them separately.

newfield adds a field with type fty to a structure type ty by allocating a field structure and appending it to the field list in ty's symbol-table entry:

(types.c functions)+=
    Field newfield(name, ty, fty) char *name; Type ty, fty; {
        Field p, *q = &ty->u.sym->u.s.flist;

        if (name == NULL)
            name = stringd(genlabel(1));
        for (p = *q; p; q = &p->link, p = *q)
            if (p->name == name)
                error("duplicate field name '%s' in '%t'\n",
                    name, ty);
        NEW0(p, PERM);
        *q = p;
        p->name = name;
        p->type = fty;
        return p;
    }

If name is null, newfield generates a name; this capability is used by fields for unnamed bit fields. Field lists are searched by fieldref; see Exercise 4.6.

Enumeration types are like structure and union types, except that they don't have fields, and their type fields give their associated integral type, which for lcc is always inttype. The standard permits compilers to use any integral type that can hold all of the enumeration values, but many compilers always use ints; lcc does likewise to maintain compatibility. Enumeration types have a type field so that lcc could use different integral types for different enumerations. Enumeration types are created by calling newstruct with the operator ENUM, and newstruct returns the type (ENUM ["tag"]).

Like a structure or union type, the u.sym field of an enumeration type points to a symbol-table entry for its tag, but it uses a different component of the symbol structure:

(enum types 68)=
    Symbol *idlist;

idlist points to a null-terminated array of Symbols for the enumeration constants associated with the enumeration type. These are installed in the identifiers table, and each one carries its value:

(enum constants 69)=
    int value;

The enumeration constants are not a part of the enumeration type. They are created, initialized, and packaged in an array as they are parsed; see Exercise 11.9.

4.7 Type-Checking Functions

Determining when two types are compatible is the crux of type checking, and the functions described here help to implement ANSI C's type-checking rules.

eqtype returns one if two types are compatible and zero otherwise.

(types.c functions)+=
    int eqtype(ty1, ty2, ret) Type ty1, ty2; int ret; {
        if (ty1 == ty2)
            return 1;
        if (ty1->op != ty2->op)
            return 0;
        switch (ty1->op) {
        case CHAR: case SHORT: case UNSIGNED: case INT:
        case ENUM: case UNION: case STRUCT:   case DOUBLE:
            return 0;
        case POINTER:  (check for compatible pointer types 70)
        case VOLATILE: case CONST+VOLATILE:
        case CONST:    (check for compatible qualified types 70)
        case ARRAY:    (check for compatible array types 70)
        case FUNCTION: (check for compatible function types 70)
        }
    }

The third argument, ret, is the value returned when either ty1 or ty2 is an incomplete type.

A type is always compatible with itself. type ensures that there is only one instance of most types, so many tests of compatible types pass the first test in eqtype. Likewise, many tests of incompatible types test types with different operators, which are never compatible and cause eqtype to return zero.

If two different types have the same operator CHAR, SHORT, UNSIGNED, or INT, the two types represent different types, such as unsigned short and signed short, and are incompatible. Similarly, two enumeration, structure, or union types are compatible only if they are the same type.

The remaining cases traverse the type structures to determine compatibility. For example, two pointer types are compatible if their referenced types are compatible:

(check for compatible pointer types 70)=
    return eqtype(ty1->type, ty2->type, 1);

Two similarly qualified types are compatible if their unqualified types are compatible.

(check for compatible qualified types 70)=
    return eqtype(ty1->type, ty2->type, 1);

An incomplete type is one that does not include the size of the object it describes. For example, the declaration

    int a[];

declares an array in which the size is unknown. The type is given by (ARRAY 0 (INT)); a size of zero identifies an incomplete type. Two array types are compatible if their element types are compatible and if their sizes, if given, are equal:

(check for compatible array types 70)=
    if (eqtype(ty1->type, ty2->type, 1)) {
        if (ty1->size == ty2->size)
            return 1;
        if (ty1->size == 0 || ty2->size == 0)
            return ret;
    }
    return 0;

eqtype returns ret if one of the array types is incomplete but they are
otherwise compatible. ret is always one when eqtype calls itself, and is
usually one when called from elsewhere. Some operators, such as pointer
comparison, insist on operands that are both incomplete types or both
complete types; ret is 0 for those uses of eqtype. The first test handles
the case when both arrays have unknown sizes.
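
The role of ret can be seen in a cut-down version of the array case. The sketch below is an illustration, not eqtype itself: struct type is a hypothetical three-field node, the operator values are assumed, and compatible handles only identical types, distinct basic types, and arrays.

```c
#include <assert.h>
#include <stddef.h>

enum { INT = 5, ARRAY = 12 };   /* assumed operator values */

/* Hypothetical stand-in for a type node; size 0 marks an incomplete type. */
struct type { int op; struct type *type; int size; };

/* Returns 1 if t1 and t2 are compatible, 0 if not, and ret if the only
   obstacle is that exactly one of two array types is incomplete. */
int compatible(const struct type *t1, const struct type *t2, int ret) {
    assert(t1 != NULL && t2 != NULL);
    if (t1 == t2)
        return 1;
    if (t1->op != t2->op)
        return 0;
    if (t1->op != ARRAY)
        return 0;                    /* distinct basic types */
    if (compatible(t1->type, t2->type, 1)) {
        if (t1->size == t2->size)
            return 1;                /* same size, or both incomplete */
        if (t1->size == 0 || t2->size == 0)
            return ret;              /* one incomplete: caller decides */
    }
    return 0;
}
```

A caller that is willing to treat int a[] and int a[10] as compatible passes 1; a caller that insists on both-complete or both-incomplete operands passes 0.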
Two function types are compatible if their return types are compatible and if their prototypes are compatible:

(check for compatible function types 70)=
    if (eqtype(ty1->type, ty2->type, 1)) {
        Type *p1 = ty1->u.f.proto, *p2 = ty2->u.f.proto;
        if (p1 == p2)
            return 1;
        if (p1 && p2) {
            (check for compatible prototypes 71)
        } else {
            (check if prototype is upward compatible 71)
        }
    }
    return 0;

The easy case is when both functions have a prototype. The prototypes must both have the same number of argument types, and the unqualified versions of the types in each prototype must be compatible.

(check for compatible prototypes 71)=
    for ( ; *p1 && *p2; p1++, p2++)
        if (eqtype(unqual(*p1), unqual(*p2), 1) == 0)
            return 0;
    if (*p1 == NULL && *p2 == NULL)
        return 1;

The other case is more complicated. Each argument type in the one function type that has a prototype must be compatible with the type that results from applying the default argument promotions to the unqualified version of the type itself. Also, if the function type with a prototype has a variable number of arguments, the two function types are incompatible.

(check if prototype is upward compatible 71)=
    if (variadic(p1 ? ty1 : ty2))
        return 0;
    if (p1 == NULL)
        p1 = p2;
    for ( ; *p1; p1++) {
        Type ty = unqual(*p1);
        if (promote(ty) != ty || ty == floattype)
            return 0;
    }
    return 1;

The default argument promotions stipulate that floats are promoted to doubles and that small integers and enumerations are promoted to ints or unsigneds. The code above checks the float promotion explicitly, and calls promote for the others. promote implements the integral promotions:

(types.c functions)+=
    Type promote(ty) Type ty; {
        ty = unqual(ty);
        if (isunsigned(ty) || ty == longtype)
            return ty;
        else if (isint(ty) || isenum(ty))
            return inttype;
        return ty;
    }

Two compatible types can be combined to form a new, composite type. This operation occurs, for example, in the C fragment

    int x[];
    int x[10];

The first declaration associates the type (ARRAY 0 (INT)) with x. The second declaration forms the new type (ARRAY 10 (INT)). These two types are combined to form the type (ARRAY 10 (INT)), which becomes the type of x. Combining these two types uses the size of the second type in the composite type. Another example is combining a function type with a prototype with a function type without one.

compose accepts two compatible types and returns the composite type. compose is similar in structure to eqtype and the easy cases are similar.

(types.c functions)+=
    Type compose(ty1, ty2) Type ty1, ty2; {
        if (ty1 == ty2)
            return ty1;
        switch (ty1->op) {
        case POINTER:
            return ptr(compose(ty1->type, ty2->type));
        case CONST+VOLATILE:
            return qual(CONST, qual(VOLATILE,
                compose(ty1->type, ty2->type)));
        case CONST: case VOLATILE:
            return qual(ty1->op, compose(ty1->type, ty2->type));
        case ARRAY:    { (compose two array types 72) }
        case FUNCTION: { (compose two function types 72) }
        }
    }

Two compatible array types form a new array whose size is the size of the complete array, if there is one.

(compose two array types 72)=
    Type ty = compose(ty1->type, ty2->type);
    if (ty1->size && ty1->type->size && ty2->size == 0)
        return array(ty, ty1->size/ty1->type->size, ty1->align);
    if (ty2->size && ty2->type->size && ty1->size == 0)
        return array(ty, ty2->size/ty2->type->size, ty2->align);
    return array(ty, 0, 0);
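
For the int x[]; int x[10]; example, the element count of the composite comes from whichever array is complete. The sketch below mirrors only the three cases of the fragment above; struct arr and composite_count are hypothetical names introduced for illustration, and sizes are in bytes with zero marking an incomplete array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of an array type: element size and total size
   in bytes. A total size of 0 marks an incomplete array. */
struct arr { int elemsize; int size; };

/* Element count of the composite of two compatible array types,
   following the three cases of the fragment: take the count from the
   complete array when the other is incomplete, otherwise 0. */
int composite_count(const struct arr *a1, const struct arr *a2) {
    assert(a1 != NULL && a2 != NULL);
    if (a1->size && a1->elemsize && a2->size == 0)
        return a1->size / a1->elemsize;
    if (a2->size && a2->elemsize && a1->size == 0)
        return a2->size / a2->elemsize;
    return 0;
}
```

The count is recovered by dividing the total size by the element size because array types record their size in bytes, not their length.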

The composite type of two compatible function types has a return type that is the composite type of the two return types, and argument types that are the composite types of the corresponding argument types. If one function type does not have a prototype, the composite type has the prototype from the other function type.

(compose two function types 72)=
    Type *p1 = ty1->u.f.proto, *p2 = ty2->u.f.proto;
    Type ty = compose(ty1->type, ty2->type);
    List tlist = NULL;
    if (p1 == NULL && p2 == NULL)
        return func(ty, NULL, 1);
    if (p1 && p2 == NULL)
        return func(ty, p1, ty1->u.f.oldstyle);
    if (p2 && p1 == NULL)
        return func(ty, p2, ty2->u.f.oldstyle);
    for ( ; *p1 && *p2; p1++, p2++) {
        Type ty = compose(unqual(*p1), unqual(*p2));
        if (isconst(*p1) || isconst(*p2))
            ty = qual(CONST, ty);
        if (isvolatile(*p1) || isvolatile(*p2))
            ty = qual(VOLATILE, ty);
        tlist = append(ty, tlist);
    }
    return func(ty, ltov(&tlist, PERM), 0);

This code uses the list functions append and ltov to manipulate Lists, which are lists of pointers.

4.8 Type Mapping

The type representation and type functions described in this chapter are used primarily by the front end. Back ends may inspect the size and align fields, but must not rely on the other fields.

They may, however, have to map Types to type suffixes, which are used to form type-specific operators as described in Chapter 5. The type suffixes are a subset of the type operators. ttob maps a type to its corresponding type suffix:

(types.c functions)+=
    int ttob(ty) Type ty; {
        switch (ty->op) {
        case CONST: case VOLATILE: case CONST+VOLATILE:
            return ttob(ty->type);
        case CHAR: case INT: case SHORT: case UNSIGNED:
        case VOID: case FLOAT: case DOUBLE:   return ty->op;
        case POINTER: case FUNCTION:          return POINTER;
        case ARRAY: case STRUCT: case UNION:  return STRUCT;
        case ENUM:                            return INT;
        }
    }

widen is similar, but widens all integral types to int:

(types.c exported macros)+=
    #define widen(t) (isint(t) || isenum(t) ? INT : ttob(t))

btot is the opposite of ttob; it maps an operator or type suffix op to some Type such that optype(op) == ttob(btot(op)).

(types.c functions)+=
    Type btot(op) int op; {
        switch (optype(op)) {
        case F: return floattype;
        case D: return doubletype;
        case C: return chartype;
        case S: return shorttype;
        case I: return inttype;
        case U: return unsignedtype;
        case P: return voidptype;
        }
    }

The enumeration identifiers F, D, C, ..., defined on page 82, are abbreviations for the corresponding type operators.

Further Reading

lcc's type representation is typical for languages in which types can be specified by grammars and hence represented by linked structures that amount to abstract syntax trees for expressions derived from those grammars. Aho, Sethi, and Ullman (1986) describe this approach in more detail and illustrate how type checking is done not only for languages like C but also for functional languages, such as ML (Ullman 1994). Section 6.3, particularly Exercise 6.13, in Aho, Sethi, and Ullman (1986) describes how PCC, the Portable C Compiler (Johnson 1978), represented types with bit strings.

Exercises
4.1 Give the parenthesized prefix form for the types in the following declarations.

    long double d;
    char ***p;
    const int *const volatile *q;
    int (*r)[10][4];
    struct tree *(*s[])(int, struct tree *, struct tree *);

4.2 Give an example of a C structure definition that draws the tag redefinition diagnostic described in Section 4.6.

4.3 Implement the predicate

    (types.c exported functions)=
        extern int hasproto ARGS((Type));

which returns one if ty includes no function types or if all of the function types it includes have prototypes, and zero otherwise. hasproto is used to warn about missing prototypes. It doesn't warn about missing prototypes in structure fields that are function pointers, because it's called explicitly with the types of the fields as the structure is parsed.

4.4 lcc prints an English rendition of types in diagnostics. For example, the types of sprintf, shown in Section 4.5, and of

    char *(*strings)[10]

are printed as

    int function(char *, char *, ...)
    pointer to array 10 of pointer to char

The output functions interpret the printf-style code %t to print the next Type argument, and call

    (types.c exported functions)+=
        extern void outtype ARGS((Type));

to do so. Implement outtype.

4.5 types.c exports three other functions that format and print types.

    (types.c exported functions)+=
        extern void printdecl ARGS((Symbol p, Type ty));
        extern void printproto ARGS((Symbol p, Symbol args[]));
        extern char *typestring ARGS((Type ty, char *id));

typestring returns a C declaration that specifies ty to be the type of the identifier id. For example, if ty is

    (POINTER (ARRAY 10 (POINTER (CHAR))))

and id is "strings", typestring returns "char *(*strings)[10]".

lcc's -P option helps convert pre-ANSI code to ANSI C by printing new-style prototypes for functions and globals on the standard error output. printdecl prints a declaration for p assuming it has type ty (ty is usually p->type), and printproto prints a declaration for a function p that has parameters given by the symbols in args. printproto uses args to build a function type and then calls printdecl, which calls typestring. Implement these functions.

4.6 The function

    (types.c exported functions)+=
        extern Field fieldref ARGS((char *name, Type ty));

searches ty's field list for the field given by name, and returns a pointer to the field structure. It returns NULL if ty doesn't have a field name. Implement fieldref.

4.7 Explain why lcc diagnoses that the operands of the assignment in the C program below have illegal types.

    struct { int x, y; } *p;
    struct { int x, y; } *q;
    main() { p = q; }

4.8 Explain why lcc complains that the argument to f is an illegal type in the C program below.

    void f(struct point { int x, y; } *p) {}
    struct { int x, y; } *origin;
    main() { f(origin); }

4.9 Explain why lcc insists that the definition of isdigit in the C program below conflicts with the external declaration of isdigit.

    extern int isdigit(char c);
    int isdigit(c) char c; { return c >= '0' && c <= '9'; }

4.10 Measurements show that the if statement in type's (search for an
existing type) is one of lcc's hot spots. Instrument lcc to determine
the order of the tests in this conditional that gives the best
execution time. Once you've found the best order, measure the
improvement of lcc's execution time when compiling itself. Was the
change worthwhile?
4.11 Structure types point to symbols that hold their field lists, and
those symbols point back to the types. Redesign this apparently
awkward data structure so that types are completely independent
of symbol tables. For example, structure types could carry their
field lists in one of the u fields, and types with tags could use an-
other field to store the scope level at which they're defined. You'll
need to revise functions like newstruct to initialize these fields,
and to provide functions for mapping tags to types and perhaps
vice versa. The basic types need the data that's in their symbol-
table entries, such as the addressed flag. Compare your revised
design; is it obviously superior to the present one? Does your im-
plementation duplicate functionality provided elsewhere, such as
in the symbol-table module?

5
Code Generation Interface

This chapter defines the interface between the target-independent front
end and the target-dependent back ends. Good code-generation interfaces
are hard to design. An inadequate interface may force each back
end to do work that could otherwise be done once in the front end. If the
interface is too small, it may encode too little information to exploit new
machines thoroughly. If the interface is too large, the back ends may
be needlessly complicated. These competing demands require careful
engineering and re-engineering as new targets expose flaws.

lcc's interface consists of a few shared data structures, 19 functions,
most of which are simple, and a 36-operator language, which encodes
the executable code from a source program in directed acyclic graphs,
or dags. The front and back ends share some fields of the shared data
structures, but other fields are private to the front or back end.

Two of the shared data structures are described in previous chapters:
symbol in Chapter 3 and type in Chapter 4. Back ends are able to examine
any field in either structure, but by convention they don't. This chapter
lists the fields that back ends may examine and, to describe the entire
interface in one place, it reviews what they represent. It omits fields that
are logically private to the front (or back) end.

5.1 Type Metrics


A type metric specifies the size and alignment for a primitive type:
(interface 78)=
typedef struct metrics {
    unsigned char size, align, outofline;
} Metrics;
The outofline flag controls the placement of constants of the associated
type. If outofline is one, constants cannot appear in dags; such
constants are placed in anonymous static variables and their values are
accessed by fetching the variables. Each primitive type has a metric:
(metrics 78)=
Metrics charmetric;
Metrics shortmetric;
Metrics intmetric;
Metrics floatmetric;
Metrics doublemetric;
Metrics ptrmetric;
Metrics structmetric;
ptrmetric describes pointers of all types. The alignment of a structure
is the maximum of the alignments of its fields and structmetric.align,
which thus gives the minimum alignment for structures; structmetric's
size field is unused. Back ends usually set outofline to zero only for
those types whose values can appear as immediate operands of
instructions.

The size and alignment for characters must be one. The front end
correctly treats signed and unsigned integers and longs as distinct types,
but it assumes that they all share intmetric. Likewise for doubles and
long doubles. Each pointer must fit in an unsigned integer.
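
The alignment rule above can be sketched in a few lines of C. This is
only an illustration, not lcc's code: the helper names roundup,
struct_align, and struct_size are hypothetical, alignments are assumed
to be powers of two, and the layout rule in struct_size (fields placed
in order, total size padded to the structure's alignment) is an
assumption that the text does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Metrics record described in the text. */
typedef struct metrics {
    unsigned char size, align, outofline;
} Metrics;

/* Round n up to a multiple of align (assumed to be a power of two). */
static unsigned roundup(unsigned n, unsigned align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Structure alignment: the maximum of the field alignments and
   structmetric.align, as stated in the text. */
static unsigned struct_align(const Metrics *fields, size_t n,
                             Metrics structmetric) {
    unsigned a = structmetric.align;
    size_t i;
    for (i = 0; i < n; i++)
        if (fields[i].align > a)
            a = fields[i].align;
    return a;
}

/* Hypothetical layout rule: place fields in order at aligned offsets,
   then pad the total size to the structure's alignment. */
static unsigned struct_size(const Metrics *fields, size_t n,
                            Metrics structmetric) {
    unsigned offset = 0;
    size_t i;
    for (i = 0; i < n; i++)
        offset = roundup(offset, fields[i].align) + fields[i].size;
    return roundup(offset, struct_align(fields, n, structmetric));
}
```

With a one-byte character followed by a four-byte integer aligned on
four, and a structmetric.align of one, the sketch gives an alignment of
four and a size of eight.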

5.2 Interface Records


A cross-compiler produces code for one machine while running on another.
lcc can be linked with code generators for several targets, so it
can be used as either a native compiler or a cross-compiler. lcc's
interface record captures everything that its front end needs to know
about a target machine, including pointers to the interface routines,
type metrics, and interface flags. The interface record is defined by:
(interface 78)+=
typedef struct interface {
    (metrics 78)
    (interface flags 87)
    (interface functions 80)
    Xinterface x;
} Interface;
lcc has a distinct instance of the interface record for each target. The x
field is an extension in which the back end stores target-specific interface
data and functions. The x field is private to the back end and is defined
in config.h.

The interface records hold pointers to the 19 interface functions
described in the following sections. The functions defined in this chapter
by (interface functions) are often denoted by just their name. For
example, gen is used instead of the more accurate but verbose "the function
pointed to by the gen field of the interface record."

The interface record also holds pointers to some functions that the
front end calls to emit symbol tables for debuggers:
(interface functions 80)=
void (*stabblock) ARGS((int, int, Symbol*));
void (*stabend) ARGS((Coordinate *, Symbol, Coordinate **,
    Symbol *, Symbol *));
void (*stabfend) ARGS((Symbol, int));
void (*stabinit) ARGS((char *, int, char *[]));
void (*stabline) ARGS((Coordinate *));
void (*stabsym) ARGS((Symbol));
void (*stabtype) ARGS((Symbol));
To save space, this book does not describe these stab functions. The
companion diskette shows them, though some are just stubs for some
targets.

5.3 Symbols
A symbol represents a variable, label, or constant; the scope field tells
which. For variables and constants, the back end may query the type
field to learn the data type suffix of the item. For variables and labels, the
floating-point value of the ref field approximates the number of times
that variable or label is referenced; a nonzero value thus indicates that
the variable or label is referenced at least once. For labels, constants,
and some variables, a field of the union u supplies additional data.

Variables have a scope equal to GLOBAL, PARAM, or LOCAL+k for nesting
level k. sclass is STATIC, AUTO, EXTERN, or REGISTER. The name of most
variables is the name used in the source code. For temporaries and other
generated variables, name is a digit sequence. For global and static
variables, u.seg gives the logical segment in which the variable is defined.

If the interface flag wants_dag is zero, the front end generates explicit
temporary variables to hold common subexpressions (those used more
than once). It sets the u.t.cse fields of these symbols to the dag nodes
that compute the values stored in them.

The flags temporary and generated are set for temporaries, and the
flag generated is set for labels and other generated variables, like those
that hold string literals. structarg identifies structure parameters when
the interface flag wants_argb is zero; the material below on wants_argb
elaborates.
Labels have a scope equal to LABELS. The u.l.label field is a unique
numeric value that identifies the label, and name is the string
representation of that value. Labels have no type or sclass.

Constants have a scope equal to CONSTANTS, and an sclass equal to
STATIC. For an integral or pointer constant, name is its string
representation as a C constant. For other types, name is undefined. The
actual value of the constant is stored in the u.c.v field, which is defined
on page 47. If a variable is generated to hold the constant, u.c.loc points
to the symbol-table entry for that variable.

Symbols have an x field with type Xsymbol, defined in config.h. It's
an extension in which the back end stores target-specific data for the
symbol, like the stack offset for locals. The x field is private to the back
end, and thus its contents are not part of the interface. Chapter 13
elaborates.

5.4 Types
Symbols have a type field. If the symbol represents a constant or
variable, the type field points to a structure that describes the type of the
item. Back ends may read the size and align fields of this structure to
learn the size and alignment constraints of the type in bytes. Back ends
may also pass the type pointer itself to predicates like isarray and ttob
to learn about the type without examining other fields.

5.5 Dag Operators


Executable code is specified by dags. A function body is a sequence, or
forest, of dags, each of which is passed to the back end via gen. Dag
nodes, sometimes called nodes, are defined by:
(c.h typedefs)=
typedef struct node *Node;

(c.h exported types)=
struct node {
    short op;
    short count;
    Symbol syms[3];
    Node kids[2];
    Node link;
    Xnode x;
};

The elements of kids point to the operand nodes. Some dag operators
also take one or two symbol-table pointers as operands; these appear in
syms. The back end may use the third syms for its own purposes; the
front end uses it, too, but its uses are temporary and occur before dags
are passed to the back end, as detailed in Section 12.8. link points to
the root of the next dag in the forest.

count records the number of times the value of this node is used or
referred to by others. Only references from kids count; link references
don't count because they don't represent a use of the value of the node.
Indeed, link is meaningful only for root nodes, which are executed for
side effect, not value. If the interface flag wants_dag is zero, roots always
have a zero count. The generated code for shared nodes (those whose
count exceeds one) must evaluate the node only once; the value is used
count times.
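
The counting rule can be made concrete with a small sketch. It is not
lcc's node constructor: the struct below keeps only the fields needed
here, and the names mknode and is_shared are hypothetical. Each time a
node is installed as a kid, its count goes up; link is never counted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node *Node;
struct node {
    short op;
    short count;      /* number of kids pointers that refer to this node */
    Node kids[2];
    Node link;        /* next root in the forest; not counted */
};

/* Initialize n and bump the count of each operand it uses. */
static Node mknode(Node n, short op, Node left, Node right) {
    assert(n != NULL);
    n->op = op;
    n->count = 0;
    n->kids[0] = left;
    n->kids[1] = right;
    n->link = NULL;
    if (left)
        left->count++;
    if (right)
        right->count++;
    return n;
}

/* A node is shared when its value is used more than once. */
static int is_shared(Node n) {
    return n->count > 1;
}
```

A node that serves as both operands of another node ends up with a
count of two, so a back end that accepts dags must compute it once and
reuse the value. A root that nothing refers to keeps a count of zero.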
The x field is the back end's extension to nodes. The back end defines
the type Xnode in config.h to hold the per-node data that it needs to
generate code. Chapter 13 describes the fields.

The op field holds a dag operator. The last character of each is a type
suffix from the list in the type definition:
(c.h exported types)+=
enum {
    F=FLOAT,
    D=DOUBLE,
    C=CHAR,
    S=SHORT,
    I=INT,
    U=UNSIGNED,
    P=POINTER,
    V=VOID,
    B=STRUCT
};
For example, the generic operator ADD has the variants ADDI, ADDU, ADDP,
ADDF, and ADDD. These suffixes are defined so that they have the values
1-9.

The operators are defined by
(c.h exported types)+=
enum { (operators 82) };
(operators 82)=
CNST=1<<4,
CNSTC=CNST+C,
CNSTD=CNST+D,
CNSTF=CNST+F,
CNSTI=CNST+I,
CNSTP=CNST+P,
CNSTS=CNST+S,
CNSTU=CNST+U,
ARG=2<<4,
ARGB=ARG+B,
ARGD=ARG+D,
ARGF=ARG+F,
ARGI=ARG+I,
ARGP=ARG+P,

The rest of (operators) defines the remaining operators. Table 5.1 lists
each generic operator, its valid type suffixes, and the number of kids and
syms that it uses; multiple values for kids indicate type-specific variants,
which are described below. The notations in the syms column give the
number of syms values and a one-letter code that suggests their uses: 1V
indicates that syms[0] points to a symbol for a variable, 1C indicates that
syms[0] is a constant, and 1L indicates that syms[0] is a label. For 1S,
syms[0] is a constant whose value is a size in bytes; 2S adds syms[1],
which is a constant whose value is an alignment. For most operators,
the type suffix denotes the type of operation to perform and the type of
the result. Exceptions are ADDP, in which an integer operand in kids[0]
is added to a pointer operand in kids[1], and SUBP, which subtracts an
integer in kids[1] from a pointer in kids[0]. The operators for
assignment, comparison, arguments, and some calls return no result; their
type suffixes denote the type of operation to perform.
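
Because each generic operator is a multiple of 16 and each suffix lies
in 1-9, an operator splits cleanly into its two parts. The sketch below
assumes the suffixes are numbered 1 through 9 in the order the enum
lists them, and the helper names generic and optype are introduced here
for illustration only.

```c
#include <assert.h>

/* Type suffixes; the text says they take the values 1-9, and the
   order follows the enum in the text (the numbering is an assumption). */
enum { F = 1, D, C, S, I, U, P, V, B };

/* A few operators, built as in the (operators) fragment. */
enum {
    CNST = 1 << 4, CNSTI = CNST + I, CNSTP = CNST + P,
    ARG  = 2 << 4, ARGB  = ARG + B
};

/* Hypothetical helpers: the generic operator is everything above the
   low four bits; the type suffix is the low four bits. */
static int generic(int op) {
    assert(op >= 0);
    return op & ~15;
}

static int optype(int op) {
    assert(op >= 0);
    return op & 15;
}
```

Under these assumptions generic(CNSTI) is CNST and optype(ARGB) is B,
so a back end can switch on the generic operator first and on the
suffix second.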
The leaf operators yield the address of a variable or the value of a
constant. syms[0] identifies the variable or constant. The unary operators
accept and yield a number, except for INDIR, which accepts an address
and yields the value at that address. There is no BCOMI; signed integers
are complemented using BCOMU. The binary operators accept two numbers
and yield one.

The type suffix for a conversion operator denotes the type of the
result. For example, CVUI converts an unsigned (U) to a signed integer
(I). Conversions between unsigned and short and between unsigned and
character are unsigned conversions; those between integer and short and
between integer and character are signed conversions. For example, CVSU
converts an unsigned short to an unsigned, and thus clears the
high-order bits. CVSI converts a signed short to a signed integer, and thus
propagates the short's sign to fill the high-order bits.
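
The difference between the signed and unsigned conversions shows up
directly in C casts. The sketch below models CVSI and CVSU with
fixed-width types, assuming a 16-bit short and a 32-bit int; the
function names are illustrative, and short_to_float composes CVSI,
CVID, and CVDF, three of the conversions that Table 5.1 lists.

```c
#include <assert.h>
#include <stdint.h>

/* CVSI: signed short to signed int; the sign bit is propagated. */
static int32_t cvsi(int16_t s) {
    return (int32_t)s;
}

/* CVSU: unsigned short to unsigned; the high-order bits are cleared. */
static uint32_t cvsu(uint16_t s) {
    return (uint32_t)s;
}

/* short to float, composed from conversions that exist in Table 5.1:
   CVSI, then CVID, then CVDF. */
static float short_to_float(int16_t s) {
    int32_t i = cvsi(s);
    double d = (double)i;
    return (float)d;
}
```

The same 16 bits, all ones, become -1 under cvsi and 65535 under cvsu.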
The front end builds dags or otherwise composes conversions to form
those not in the table. For example, it converts a short to a float by first
converting it to an integer, then to a double, and finally to a float. The
16 conversion operators are represented by arrows in Figure 5.1. Composed
conversions follow the path from the source type to the destination type.
ASGN stores the value of kids[1] into the cell addressed by kids[0].
syms[0] and syms[1] point to symbol-table entries for integer constants
that give the size of the value and its alignment. These are most useful
for ASGNB, which assigns structures and initializes automatic arrays.

JUMPV is an unconditional jump to the address computed by kids[0].
For most jumps, kids[0] is a constant ADDRGP node, but switch statements
compute a variable target, so kids[0] can be an arbitrary computation.
LABEL defines the label given by syms[0], and is otherwise a
no-op. For the comparisons, syms[0] points to a symbol-table entry for
the label to jump to if the comparison is true. Signed comparisons are
used for unsigned equals and not equals, since equality tests needn't
special-case the sign bit.

syms  kids    Operator  Type Suffixes  Operation

1V    0       ADDRF     P              address of a parameter
1V    0       ADDRG     P              address of a global
1V    0       ADDRL     P              address of a local
1C    0       CNST      CSIUPFD        constant

      1       BCOM      U              bitwise complement
      1       CVC       IU             convert from char
      1       CVD       IF             convert from double
      1       CVF       D              convert from float
      1       CVI       CSUD           convert from int
      1       CVP       U              convert from pointer
      1       CVS       IU             convert from short
      1       CVU       CSIP           convert from unsigned
      1       INDIR     CSIPFDB        fetch
      1       NEG       IFD            negation

      2       ADD       IUPFD          addition
      2       BAND      U              bitwise AND
      2       BOR       U              bitwise inclusive OR
      2       BXOR      U              bitwise exclusive OR
      2       DIV       IUFD           division
      2       LSH       IU             left shift
      2       MOD       IU             modulus
      2       MUL       IUFD           multiplication
      2       RSH       IU             right shift
      2       SUB       IUPFD          subtraction

2S    2       ASGN      CSIPFDB        assignment

1L    2       EQ        IFD            jump if equal
1L    2       GE        IUFD           jump if greater than or equal
1L    2       GT        IUFD           jump if greater than
1L    2       LE        IUFD           jump if less than or equal
1L    2       LT        IUFD           jump if less than
1L    2       NE        IFD            jump if not equal

2S    1       ARG       IPFDB          argument

1     1 or 2  CALL      IFDBV          function call
      1       RET       IFD            return from function

      1       JUMP      V              unconditional jump

1L    0       LABEL     V              label definition

TABLE 5.1 Node operators.
Function calls have a CALL node preceded by zero or more ARG nodes.
The front end unnests function calls (it performs the inner call first,
assigns its value to a temporary, and uses the temporary henceforth),
so ARG nodes are always associated with the next CALL node in the forest.
If wants_dag is one, CALL nodes always appear as roots in the forest. If
wants_dag is zero, only CALLV nodes appear as roots; other CALL nodes
appear as right operands to ASGN nodes, which are roots.

A CALL node's syms[0] points to a symbol whose only nonnull field is
type, which is the function type of the callee.

ARG nodes establish the value computed by kids[0] as the next
argument. syms[0] and syms[1] point to symbol-table entries for integer
constants that give the size and alignment of the argument.

In CALL nodes, kids[0] computes the address of the callee. CALLB
nodes are used for calls to functions that return structures; kids[1]
computes the address of a temporary local variable to hold the returned
value. The CALLB code and the function prologue must collaborate to
store the CALLB's kids[1] into the callee's first local. The SPARC
interface procedures function and local, and the CALLB emitter illustrate
such collaboration. CALLB nodes have a count of zero because the
front end references the temporary wherever the returned value is
referenced. There is no RETB; the front end uses an ASGNB to the structure
addressed by the first local. CALLB nodes appear only if the interface flag
wants_callb is one; see Section 5.6. In RET nodes, kids[0] computes the
value returned.

Character and short-integer actual arguments are always promoted to
the corresponding integer type even in the presence of a prototype,
because most machines must pass at least integers as arguments. Upon
entry to the function, the promoted values are converted back to the
type declared for the formal parameter. For example, the body of

f(char c) { f(c); }

becomes the two forests shown in Figure 5.2. The solid lines are kids
pointers and the dashed line is the link pointer. The left forest holds
one dag, which narrows the widened actual argument to the type of the
formal parameter. In the left dag, the left ADDRFP c refers to the formal
parameter, and the one under the INDIRC refers to the actual argument.
The right forest holds two dags. The first widens the formal parameter
c to pass it as an integer, and the second calls f.

FIGURE 5.1 Conversions.

FIGURE 5.2 Forests for f(char c) { f(c); }
Unsigned variants of ASGN, INDIR, ARG, CALL, and RET were omitted
as unnecessary. Signed and unsigned integers have the same size, so
the corresponding signed operator is used instead. Likewise, there is no
CALLP or RETP. A pointer is returned by using CVPU and RETI. A
pointer-valued function is called by using CALLI and CVUP.

In Table 5.1, the operators listed at and following ASGN are used for
their side effects. They appear as roots in the forest, and their reference
counts are zero. CALLD, CALLF, and CALLI may also yield a value, in
which case they appear as the right-hand side of an ASGN node and have
a reference count of one. With this lone exception, all operators with side
effects always appear as roots in the forest of dags, and they appear in
the order in which they must be executed. The front end communicates
all constraints on evaluation order by ordering the dags in the forest.
If ANSI specifies that x must be evaluated before y, then x's dag will
appear in the forest before y's, or they will appear in the same dag with
x in the subtree rooted by y. An example is

int i, *p; f() { i = *p++; }

The code for the body of f generates the forest shown in Figure 5.3. The
INDIRP fetches the value of p, and the ASGNP changes p's value to the
sum computed by this INDIRP and 4. The ASGNI sets i to the integer
pointed to by the original value of p. Since the INDIRP appears in the
forest before p is changed, the INDIRI is guaranteed to use the original
value of p.
FIGURE 5.3 Forest for int i, *p; f() { i = *p++; }
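
The order that the forest imposes on i = *p++ can be written out as
three C statements, one per dag. This is a hand translation for
illustration only; the temporary t stands for the shared INDIRP node,
and the code assumes p points at an int when f is called.

```c
#include <assert.h>

static int i, *p;

/* i = *p++ written in the order that the forest prescribes. */
static void f(void) {
    int *t;
    assert(p != 0);
    t = p;        /* INDIRP: fetch p's original value once */
    p = t + 1;    /* ASGNP: store t plus sizeof(int), the 4 in the text */
    i = *t;       /* ASGNI: the INDIRI uses the original value */
}
```

Because t is read before p is overwritten, the final assignment sees
the element that p addressed on entry, which is what the ordering of
the dags guarantees.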

5.6 Interface Flags


The interface flags help configure the front end for a target.
(interface flags 87)=
unsigned little_endian:1;
should be one if the target is a little endian and zero if it's a big endian.
A computer is a little endian if the least significant byte in each word has
the smallest address of the bytes in the word. For example, little endians
lay out the word with the 32-bit unsigned value 0xAABBCCDD thus:

    AA BB CC DD

where the addresses of the bytes increase from the right to the left.
A computer is a big endian if the least significant byte in each word has
the largest address of the bytes in the word. For example, big endians
lay out the word with the unsigned value 0xAABBCCDD thus:

    AA BB CC DD

where the addresses of the bytes increase from the left to the right.
In other words, lcc's front end lays out a list of bit fields in the
addressing order of the bytes in an unsigned integer: from the least
significant byte to the most significant byte on little endians and vice
versa on big endians. ANSI permits either order, but following increasing
addresses is the prevailing convention.
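
The two layouts can be observed on the machine that runs the compiler
with a short test. Note that this examines the host, whereas
little_endian describes the target, and the two differ in a
cross-compiler. The function names are illustrative, and the sketch
assumes a host with 8-bit bytes and one of the two orders described
above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a 32-bit word in increasing address order. */
static void word_bytes(uint32_t word, unsigned char bytes[4]) {
    memcpy(bytes, &word, 4);
}

/* Nonzero if the least significant byte has the smallest address. */
static int is_little_endian(void) {
    unsigned char bytes[4];
    word_bytes(0xAABBCCDDu, bytes);
    /* Only the two orders from the text are handled. */
    assert(bytes[0] == 0xDD || bytes[0] == 0xAA);
    return bytes[0] == 0xDD;
}
```

On a little endian, bytes[0] holds DD and bytes[3] holds AA; on a big
endian the order is reversed.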
(interface flags 87)+=
unsigned mulops_calls:1;
should be zero if the hardware implements multiply, divide, and
remainder. It should be one if the hardware leaves these operations to
library routines. The front end unnests nested calls, so it needs to know
which operators are emulated by calls. It might become necessary to
generalize this feature to handle other emulated instructions, but no
target so far has needed more.
(interface flags 87)+=
unsigned wants_callb:1;
tells the front end to emit CALLB nodes to invoke functions that return
structures. If wants_callb is zero, the front end generates no CALLB
nodes but implements them itself, using simpler operations: It passes
an extra, leading, hidden argument that points to a temporary; it ends
each structure function with an ASGNB dag that copies the return value
to this temporary; and it has the caller use this temporary when it needs
the structure returned. When wants_callb is one, the front end
generates CALLB nodes. The kids[1] field of a CALLB computes the address
of the location at which to store the return value, and the first local of
any function that returns a structure is assumed to hold this address.
Back ends that set wants_callb to one must implement this convention
by, for example, initializing the address of the first local accordingly. If
wants_callb is zero, the back end cannot control the code for functions
that return structures, so it cannot, in general, mimic an existing
calling convention. In this book, the MIPS and X86 code generators
initialize wants_callb to zero; the front end's implementation of CALLB
happens to be compatible with the calling conventions for the MIPS.
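
The rewriting that the front end performs when wants_callb is zero can
be pictured at the source level. The sketch below is a hand-written
analogy, not output from lcc: mkpoint is an ordinary structure
function, and mkpoint_lowered and sum_lowered are hypothetical names
for what the callee and a caller look like after the hidden argument
is introduced.

```c
#include <assert.h>

struct point { int x, y; };

/* What the programmer writes: a function that returns a structure. */
static struct point mkpoint(int x, int y) {
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* Roughly the rewritten form: a hidden leading argument points to a
   temporary, and the return becomes a structure assignment to it. */
static void mkpoint_lowered(struct point *hidden, int x, int y) {
    struct point p;
    assert(hidden != 0);
    p.x = x;
    p.y = y;
    *hidden = p;          /* plays the role of the ASGNB dag */
}

/* The caller uses the temporary wherever it needs the returned value. */
static int sum_lowered(int x, int y) {
    struct point tmp;     /* temporary supplied by the caller */
    mkpoint_lowered(&tmp, x, y);
    return tmp.x + tmp.y;
}
```

Both forms compute the same result; the lowered one simply makes the
temporary and the copy explicit, which is why the back end has no say
in the convention when the flag is zero.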

(interface flags 87)+=
unsigned wants_argb:1;
tells the front end to emit ARGB nodes to pass structure arguments. If
wants_argb is zero, the front end generates no ARGB nodes but
implements structure arguments itself using simpler operations: It builds an
ASGNB dag that copies the structure argument to a temporary; it passes a
pointer to the temporary; it adds an extra indirection to references to the
parameter in the callee; and it changes the types of the callee's formals
to reflect this convention. It also sets structarg for these parameters
to distinguish them from bona fide structure pointers. If wants_argb is
zero, the back end cannot control the code for structure arguments, so
it cannot, in general, mimic an existing calling convention. In this book,
the SPARC code generator initializes wants_argb to zero; the others
initialize it to one. The front end's implementation of ARGB is compatible
with the SPARC calling convention.

(interface flags 87)+=
unsigned left_to_right:1;
tells the front end to evaluate and to present the arguments to the back
end left to right. That is, the ARG nodes that precede the CALL appear in
the same order as the arguments in the source code. If left_to_right
is zero, arguments are evaluated and presented right to left. ANSI permits
either order.
(interface flags 87)+=
unsigned wants_dag:1;
tells the front end to pass dags to the back end. If it's zero, the front
end undags all nodes with reference counts exceeding one. It creates a
temporary, assigns the node to the temporary, and uses the temporary
wherever the node had been used. When wants_dag is zero, all
reference counts are thus zero or one, and only trees, which are degenerate
dags, remain; there are no general dags. The code generators in this
book generate code using a method that requires trees, so they
initialize wants_dag to zero, but other code generators for lcc have
generated code from dags.

5.7 Initialization
During initialization, the front end calls
(interface functions 80)+=
void (*progbeg) ARGS((int argc, char *argv[]));
argv[0..argc-1] point to those program arguments that are not
recognized by the front end, and are thus deemed target-specific. progbeg
processes such options and initializes the back end.
At the end of compilation, the front end calls
(interface functions 80)+=
void (*progend) ARGS((void));
to give the back end an opportunity to finalize its output. On some
targets, progend has nothing to do and is empty.

5.8 Definitions
Whenever the front end defines a new symbol with scope CONSTANTS,
LABELS, or GLOBAL, or a static variable, it calls
(interface functions 80)+=
void (*defsymbol) ARGS((Symbol));
to give the back end an opportunity to initialize its Xsymbol field. For
example, the back end might want to use a different name for the
symbol. The conventions on some targets in this book prefix an underscore
to global names. The Xsymbol fields of symbols with scope PARAM are
initialized by function, those with scope LOCAL+k by local, and those
that represent address computations by address.

A symbol is exported if it's defined in the module at hand and used
in other modules. It's imported if it's used in the module at hand and
defined in some other module. The front end calls
(interface functions 80)+=
void (*export) ARGS((Symbol));
void (*import) ARGS((Symbol));
to announce an exported or imported symbol. Only nonstatic variables
and functions can be exported. The front end always calls export before
defining the symbol, but it may call import at any time, before or after
the symbol is used. Most targets require export to emit an assembler
directive. Some require nothing from import; the MIPS back end, for
example, has an empty import.
(interface functions 80)+=
void (*global) ARGS((Symbol));
emits code to define a global variable. The front end will already have
called segment, described below, to direct the definition to the
appropriate logical segment, and it will have set the symbol's u.seg to that
segment. It will follow the call to global with any appropriate calls to
the data initialization functions. global must emit the necessary
alignment directives and define the label.
The front end announces local variables by calling
(interface functions 80)+=
void (*local) ARGS((Symbol));
It announces temporaries likewise; these have the symbol's temporary
flag set. local must initialize the Xsymbol field, which holds data like
the local's stack offset or register number.

The front end calls
(interface functions 80)+=
void (*address) ARGS((Symbol p, Symbol q, int n));
to initialize q to a symbol that represents an address of the form x+n,
where x is the address represented by p and n is positive or negative.
Like defsymbol, address initializes q's Xsymbol, but it does so based
on the values of p's Xsymbol and n. A typical address adds p's stack
offset to n for locals and parameters, and sets q's x.name to p's x.name
concatenated with +n or -n for other variables. For example, if n is 40
and p points to a symbol with the source name array, and if the back
end forms names by prefixing an underscore, then address will create
the name _array+40, so that the addition can be done by the assembler
instead of at run time. address accepts globals, parameters, and locals,
and is called only after these symbols have been initialized by defsymbol,
function, or local.
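
The name that address builds for a global can be sketched with
snprintf. The helper below is hypothetical (lcc's back ends allocate
and store names differently); it shows only the concatenation of p's
back-end name with a signed offset.

```c
#include <assert.h>
#include <stdio.h>

/* Write "<pname>+n" or "<pname>-n" into buf; returns nonzero if the
   whole name fit. pname is the back end's name for p, e.g. "_array". */
static int offset_name(char *buf, size_t size, const char *pname, int n) {
    int len;
    assert(buf != NULL && pname != NULL);
    len = snprintf(buf, size, "%s%+d", pname, n);  /* %+d always prints a sign */
    return len >= 0 && (size_t)len < size;
}
```

For the example in the text, offset_name(buf, sizeof buf, "_array", 40)
leaves _array+40 in buf, which the assembler can then resolve without
any run-time addition.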
When the front end announces a symbol by calling one of the interface
procedures above, it sets the symbol's defined flag after the call. This
flag prevents the front end from announcing a symbol more than once.

lcc's front end manages four logical segments that separate code,
data, and literals:
(c.h exported types)+=
enum { CODE=1, BSS, DATA, LIT };
The front end emits executable code into the CODE segment, defines
uninitialized variables in the BSS segment, and defines and initializes
initialized variables in the DATA segment and constants in the LIT segment.
The front end calls
(interface functions 80)+=
void (*segment) ARGS((int));
to announce a segment change. The argument is one of the segment
codes above. segment maps the logical segments onto the segments
provided by the target machine.

CODE and LIT can be mapped to read-only segments; BSS and DATA
must be mapped to segments that can be read and written. The CODE and
LIT segments can be mapped to the same segment and thus combined.
Any combination of BSS, DATA, and LIT can be combined likewise. CODE
would be combined with them only on single-segment targets.
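
A back end's segment function is often little more than a table lookup.
The sketch below is an assumption-laden illustration: the directive
names .text, .bss, and .data are typical of Unix assemblers but are not
taken from this book, the choice to fold LIT into the code segment is
just one of the legal combinations described above, and the function
returns the directive instead of printing it so that it is easy to
examine.

```c
#include <assert.h>
#include <stddef.h>

enum { CODE = 1, BSS, DATA, LIT };

/* Map a logical segment to an assembler directive (assumed names). */
static const char *segment_directive(int seg) {
    switch (seg) {
    case CODE: return ".text";
    case BSS:  return ".bss";
    case DATA: return ".data";
    case LIT:  return ".text";   /* read-only literals share the code segment */
    default:   return NULL;
    }
}

/* Current logical segment, so repeated requests emit nothing. */
static int cseg = 0;

/* Returns the directive to emit, or NULL if already in that segment. */
static const char *segment(int seg) {
    assert(seg >= CODE && seg <= LIT);
    if (seg == cseg)
        return NULL;
    cseg = seg;
    return segment_directive(seg);
}
```

Remembering the current segment keeps the output free of redundant
directives when the front end announces the same segment twice in a
row.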

5.9 Constants
The interface functions
(interface functions 80)+=
void (*defaddress) ARGS((Symbol));
void (*defconst) ARGS((int ty, Value v));
initialize constants. defconst emits directives to define a cell and
initialize it to a constant value. v is the value, and ty encodes its type and
thus which element of the Value v to access, as shown in the following
table.

ty    v Field   Type
C     v.uc      character
S     v.us      short
I     v.i       int
U     v.u       unsigned
P     v.p       any pointer type
F     v.f       float
D     v.d       double

The codes C, S, I, ... are identical to the type suffixes used for the
operators. The signed fields v.sc and v.ss can be used instead of v.uc and
v.us, but defconst must initialize only the specified number of bits. If
ty is P, v.p holds a numeric constant of some pointer type. These
originate in declarations like char *p=(char*)0xF0. defaddress initializes
pointer constants that involve symbols instead of numbers.

The defconst functions in Chapters 16-18 permit cross-compilation,
so they compensate for different representations and byte orders. For
example, they swap the two halves of a double if compiling for a big
endian on a little endian or vice versa.

In general, ANSI C compilers can't leave the encoding of floating-point
constants to the assembler, because few assemblers implement C's casts.
For example, the correct initialization for
double x = (float)0.3;
has zeros in the least significant bits. Typical assembler directives like
.double 0.3
can't implement casts and thus erroneously initialize x without zeros in
the least significant bits, so most defconsts must initialize doubles by
emitting two unsigneds.
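
The effect of the cast is easy to see by splitting the double into two
32-bit halves, which is the form in which a defconst would emit it. The
sketch assumes 64-bit IEEE 754 doubles and 32-bit IEEE 754 floats;
split_double is a hypothetical helper, not a function from lcc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a double into its high and low 32-bit halves
   (assumes a 64-bit IEEE 754 double). */
static void split_double(double d, uint32_t *hi, uint32_t *lo) {
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)(bits & 0xFFFFFFFFu);
}
```

A float carries 23 fraction bits and a double carries 52, so after
double x = (float)0.3 the low 29 bits of the low half are zero, whereas
the low half of plain 0.3 is 0x33333333. Emitting the two halves as
unsigneds preserves that difference; handing 0.3 to the assembler would
lose it.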
(interface functions 80)+=
void (*defstring) ARGS((int n, char *s));
emits code to initialize a string of length n to the characters in s. The
front end converts escape sequences like \000 into the corresponding
ASCII characters. Null bytes can be embedded in s, so they can't flag its
end, which is why defstring accepts not just s but also its length.
(X86) " 523 ....
defconst 91 (interface functions 80) += 92 92
.... 79
(MIPS) " 455 void (*space) ARGS((int));
(SPARC) " 490
(X86) " 522 emits code to allocate n zero bytes.
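Why defstring needs an explicit length can be seen in a small sketch. The helper below is hypothetical and is not one of lcc's back ends; the ".byte" directive name is an assumption, since assemblers differ.

```c
#include <stdio.h>

/* Hypothetical helper: format the n bytes of s as one ".byte" directive
 * into out, which has room for cap characters. Returns the number of
 * characters written, or -1 if out is too small. */
int format_bytes(char *out, size_t cap, int n, const char *s) {
    size_t used = 0;
    int i;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* The loop is driven by n, so a 0 byte is data, not a terminator. */
        int w = snprintf(out + used, cap - used, "%s%d",
                         i == 0 ? ".byte " : ",", (unsigned char)s[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Called with n equal to 4 and the bytes 'a', 0, 'b', 0, the helper formats all four values; a loop that stopped at the first null byte would emit only the first one.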

5.10 Functions
The front end compiles functions into private data structures. It com-
pletely consumes each function before passing any part of the function
to the back end. This organization permits certain optimizations. For
example, only by processing complete functions can the front end iden-
tify the locals and parameters whose address is not taken; only these
variables may be assigned to registers.
Three interface functions and two front-end functions collaborate to
compile a function.
(interface functions 80)+=
void (*function) ARGS((Symbol, Symbol[], Symbol[], int));
void (*emit) ARGS((Node));
Node (*gen) ARGS((Node));

(dag.c exported functions)=
extern void emitcode ARGS((void));
extern void gencode ARGS((Symbol[], Symbol[]));
At the end of each function, the front end calls function to generate and
emit code. The typical form of function is
(typical function 93)=
void function(Symbol f, Symbol caller[], Symbol callee[],
    int ncalls) {
    (initialize)
    gencode(caller, callee);
    (emit prologue)
    emitcode();
    (emit epilogue)
}

gencode is a front-end procedure that traverses the front end's private
structures and passes each forest of dags to the back end's gen, which
selects code, annotates the dag to record its selection, and returns a dag
pointer. gencode also calls local to announce new locals, blockbeg and
blockend to announce the beginning and end of each block, and so on.
emitcode is a front-end procedure that traverses the private structures
again and passes each of the pointers from gen to emit to emit the code.
This organization offers the back end flexibility in generating function
prologue and epilogue code. Before calling gencode, function initializes
the Xsymbol fields of the function's parameters and does any other
necessary per-function initializations. After calling gencode, the size of
the procedure activation record, or frame, and the registers that need
saving are known; this information is usually needed to emit the prologue.
After calling emitcode to emit the code for the body of the function,
function emits the epilogue.
The argument f to function points to the symbol for the current
function, and ncalls is the number of calls to other functions made by the
current function. ncalls helps on targets where leaf functions - those
that make no calls - get special treatment.
caller and callee are arrays of pointers to symbols; a null pointer
terminates each. The symbols in caller are the function parameters as
passed by a caller; those in callee are the parameters as seen within the
function. For many functions, the symbols in each array are the same,
but they can differ in both sclass and type. For example, in

single(x) float x; { ... }

a call to single passes the actual argument as a double, but x is a
float within single. Thus, caller[0]->type is doubletype, the front-end
global that represents doubles, and callee[0]->type is floattype.
And in

int strlen(register char *s) { ... }

caller[0]->sclass is AUTO and callee[0]->sclass is REGISTER. Even
without register declarations, the front end assigns frequently referenced
parameters to the REGISTER class, and sets callee's sclass accordingly.
To avoid thwarting the programmer's intentions, this assignment is made
only when there are no explicit register locals.
caller and callee are passed to gencode. If caller[i]->type differs
from callee[i]->type, or the value of caller[i]->sclass differs from
callee[i]->sclass, gencode generates an assignment of caller[i] to
callee[i]. If the types are not equal, this assignment may include a
conversion; for example, the assignment to x in single includes a
truncation of a double to a float. For parameters that include register
declarations, function must assign a register and initialize the x field
accordingly, or change the callee's sclass to AUTO to prevent an
unnecessary assignment of caller[i] to callee[i].
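The rule for when a parameter needs an assignment from its caller view to its callee view can be sketched with toy types. The structs and constants below are illustrative stand-ins, not lcc's Symbol or its actual AUTO and REGISTER values; only the field names sclass and type follow the text.

```c
#include <stddef.h>

/* Toy stand-ins for the front end's symbols and type codes. */
enum { AUTO = 1, REGISTER };
enum { FLOAT_T = 1, DOUBLE_T, POINTER_T };
struct sym { int sclass; int type; };

/* Count the parameters for which an assignment of caller[i] to callee[i]
 * would be generated: those whose type or storage class differs.
 * Both arrays are terminated by a null pointer. */
int count_param_copies(struct sym *caller[], struct sym *callee[]) {
    int i, n = 0;
    for (i = 0; caller[i] != NULL && callee[i] != NULL; i++)
        if (caller[i]->type != callee[i]->type
        ||  caller[i]->sclass != callee[i]->sclass)
            n++;
    return n;
}
```

A float parameter in an old-style definition and a register parameter would each be counted, while a plain pointer parameter whose two views agree would not.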
function could also change the value of callee[i]->sclass from
AUTO to REGISTER if it wished to assign a register to that parameter. The
MIPS calling convention, for example, passes some arguments in registers,
so function assigns those registers to the corresponding callees
in leaf functions. If, however, callee[i]->addressed is set, the address
of the parameter is taken in the function body, and it must be stored in
memory on most machines.
Most back ends define for each function activation an argument-build
area to store the arguments to outgoing calls. The front end unnests
calls, so the argument-build area can be used for all calls. The back end
makes the area big enough to hold the largest argument list. When a
function is called, the caller's argument-build area becomes the callee's
actual arguments.
Calls are unnested because some targets pass some arguments in registers.
If we try to generate code for a nested call like f(a,g(b)), and
if arguments are evaluated and established left to right, it is hard not
to generate code that loads a into the first argument register and then
destroys it by loading b into the same register, because both a and b
belong in the first argument register, but a belongs there later.
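At the source level, unnesting amounts to evaluating the inner call into a temporary first. The sketch below is only a picture of that idea in C; the front end works on dags rather than source text, and the functions f and g here are arbitrary examples.

```c
/* Two ordinary functions used only for illustration. */
int g(int b) { return b * 2; }
int f(int a, int x) { return a + x; }

/* What the programmer writes: a call nested in an argument list. */
int nested(int a, int b) { return f(a, g(b)); }

/* A source-level picture of the unnested form: the inner call runs first
 * and its result is parked in a temporary, so the arguments of f can be
 * established without another call intervening. */
int unnested(int a, int b) {
    int t = g(b);
    return f(a, t);
}
```

Both forms compute the same value; in the second, nothing needs the first argument register between loading a and calling f.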
Some calling conventions push arguments on a stack. They can handle
nested calls, so an argument-build area is not always necessary. Unnest-
ing has the advantage that stack overflow can occur only at function
entry, which is useful on targets that require explicit prologue code to
detect stack overflow.
For each block, the front end first announces locals with explicit
register declarations, in order of declaration, to permit programmer control
of register assignment. Then it announces the rest, starting with those
that it estimates to be most frequently used. It assigns REGISTER class
to even these locals if their addresses are not taken and if they are
estimated to be used more than twice. This announcement order and sclass
override collaborate to put the most promising locals in registers even if
no registers were declared.
If p's sclass is REGISTER, local may decline to allocate a register and
may change sclass to AUTO. The back end has no alternative if it has
already assigned all available registers to more promising locals. As with
parameters, local could assign a register to a local with sclass equal to
AUTO and change sclass to REGISTER, but it can do so only if the symbol's
addressed is zero.
Source-language blocks bracket the lifetime of locals. gencode
announces the beginning and end of a block by calling:

(interface functions 80)+=
void (*blockbeg) ARGS((Env *));
void (*blockend) ARGS((Env *));

Env, defined in config.h, is target-specific. It typically includes the
data necessary to reuse that portion of the local frame space associated
with the block and to release any registers assigned to locals within the
block. For example, blockbeg typically records in an Env the size of the
frame and the registers that are busy at the beginning of the block, and
blockend restores the register state and updates the stack if the new
block has pushed deeper than the maximum depth seen so far. Chapter 13
elaborates.
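The save-and-restore role described for Env can be sketched as follows. This is a toy model under stated assumptions: the names offset, maxoffset, and freemask and the layout of Env are chosen for illustration and are not presented as any particular back end's definitions.

```c
/* A toy Env: what blockbeg saves and blockend restores. */
typedef struct {
    int offset;          /* frame size at block entry */
    unsigned freemask;   /* registers free at block entry */
} Env;

int offset;                 /* current frame size in bytes */
int maxoffset;              /* deepest frame seen so far */
unsigned freemask = 0xFF;   /* one bit per free register */

void blockbeg(Env *e) {
    e->offset = offset;
    e->freemask = freemask;
}

void blockend(Env *e) {
    if (offset > maxoffset)      /* remember the deepest block */
        maxoffset = offset;
    offset = e->offset;          /* reuse this block's frame space */
    freemask = e->freemask;      /* release this block's registers */
}
```

Between the two calls a back end would grow offset and clear bits in freemask as locals are announced; after blockend, sibling blocks can reuse the same space and registers.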
The front end calls gen to select code. It passes gen a forest of dags.
For example, Figure 5.3 on page 87 shows the forest for

int i, *p; f() { i = *p++; }

A postorder traversal of this forest yields the linearized representation
shown in the table below.

Node #  op      count  kids  syms
1       ADDRGP  2            p
2       INDIRP  2      1
3       CNSTI   1            4
4       ADDP    1      2, 3
5       ASGNP   0      1, 4
6       ADDRGP  1            i
7       INDIRI  1      2
8       ASGNI   0      6, 7
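In this table, each count value matches the number of times the node appears in another node's kids column. The sketch below recomputes the column on that reading; the toynode layout is an illustration, not lcc's Node structure, and the interpretation of count is an observation about this table rather than a definition taken from this section.

```c
/* The eight nodes of the table, indexed by node number.
 * kids holds node numbers, with 0 meaning "no kid". */
struct toynode { const char *op; int kids[2]; };

const struct toynode forest[9] = {
    { 0,        {0, 0} },   /* slot 0 unused so indices match the table */
    { "ADDRGP", {0, 0} },   /* 1: address of p */
    { "INDIRP", {1, 0} },   /* 2: fetch p */
    { "CNSTI",  {0, 0} },   /* 3: constant 4 */
    { "ADDP",   {2, 3} },   /* 4: p + 4 */
    { "ASGNP",  {1, 4} },   /* 5: store the sum into p */
    { "ADDRGP", {0, 0} },   /* 6: address of i */
    { "INDIRI", {2, 0} },   /* 7: fetch the int at the old p */
    { "ASGNI",  {6, 7} },   /* 8: store it into i */
};

/* Count how many times node n appears as a kid of another node. */
int references(int n) {
    int i, k, count = 0;
    for (i = 1; i <= 8; i++)
        for (k = 0; k < 2; k++)
            if (forest[i].kids[k] == n)
                count++;
    return count;
}
```

Nodes 5 and 8 come out with zero references, consistent with their being roots whose values no other node uses.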

This forest consists of three dags, rooted at nodes 2, 5, and 8. The
INDIRP node, which fetches the value of p, comes before node 5, which
changes p, so the original value of p is available for subsequent use by
node 7, which fetches the integer pointed to by that value.
gen traverses the forest and selects code, but it emits nothing because
it may be necessary to determine, for example, the registers needed
before the function prologue can be emitted. So gen merely annotates the
nodes in their x fields to identify the code selected, and returns a pointer
that is ultimately passed to the back end's emit to output the code. Once
the front end calls gen, it does not inspect the contents of the nodes
again, so gen may modify them freely.
emit emits a forest. Typically, it traverses the forest and emits code
by switching on the opcode or some related value stored in the node by
gen.

5.11 Interface Binding


The compiler option -target=name identifies the desired target. The
name-interface pairs for the available targets are stored in
(interface 78)+=
typedef struct binding {
    char *name;
    Interface *ir;
} Binding;

extern Binding bindings[];


The front end identifies the one named in the -target option and stores
a pointer to its interface record in
(interface 78)+=
extern Interface *IR;

Whenever the front end needs to call an interface function, or read a type
metric or an interface flag, it uses IR.
Back ends must define and initialize bindings, which associates names
and interface records. For example, the back ends in this book define
bindings in bind.c:
(bind.c 96)=
#include "c.h"
extern Interface nullIR, symbolicIR;
extern Interface mipsebIR, mipselIR;
extern Interface sparcIR, solarisIR;
extern Interface x86IR;
Binding bindings[] = {
    "symbolic",      &symbolicIR,
    "mips-irix",     &mipsebIR,
    "mips-ultrix",   &mipselIR,
    "sparc-sun",     &sparcIR,
    "sparc-solaris", &solarisIR,
    "x86-dos",       &x86IR,
    "null",          &nullIR,
    NULL,            NULL
};

The MIPS, SPARC, and X86 interfaces are described in Chapters 16, 17,
and 18. The interfaces null and symbolic are described in Exercises 5.1
and 5.2.
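Selecting an interface from bindings is a linear search up to the NULL sentinel. The sketch below shows one way that search might look; find_interface is a hypothetical helper, the Interface struct is a stub so the example stands alone, and only three of the bindings are reproduced.

```c
#include <string.h>

/* Stand-ins so the sketch compiles on its own; the real Interface
 * record holds type metrics, flags, and function pointers. */
typedef struct interface { int dummy; } Interface;
typedef struct binding {
    char *name;
    Interface *ir;
} Binding;

Interface symbolicIR, x86IR, nullIR;
Binding bindings[] = {
    { "symbolic", &symbolicIR },
    { "x86-dos",  &x86IR },
    { "null",     &nullIR },
    { NULL,       NULL }
};

/* Hypothetical helper: scan bindings up to the NULL sentinel and return
 * the interface whose name matches, or NULL if there is none. */
Interface *find_interface(const char *name) {
    int i;
    for (i = 0; bindings[i].name != NULL; i++)
        if (strcmp(bindings[i].name, name) == 0)
            return bindings[i].ir;
    return NULL;
}
```

A caller would pass the text after -target= and report an error when the result is NULL.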

5.12 Upcalls
The front and back ends are clients of each other. The front end calls on
the back end to generate and emit code. The back end calls on the front
end to perform output, allocate storage, interrogate types, and manage
nodes, symbols, and strings. The front-end functions that back ends
may call are summarized below. Some of these functions are explained in
previous chapters, but are included here to make this summary complete.
void *allocate(int n, int a) permanently allocates n bytes in the
arena a, which can be one of
(c.h exported types)+=
enum { PERM=0, FUNC, STMT };
and returns a pointer to the first byte. The space is guaranteed to be
aligned to suit the machine's most demanding type. Data allocated in
PERM are deallocated at the end of compilation; data allocated in FUNC
and STMT are deallocated after compiling functions and statements.

(input.c exported data)=
extern char *bp;

points to the next character in the output buffer. The idiom *bp++ = c
thus appends c to the output as shown in outs on page 16. One of the
other output functions, described below, must be called at least once
every 80 characters.

(output.c exported functions)+=
extern void fprint ARGS((int fd, char *fmt, ...));

prints its third and following arguments on the file descriptor fd. See
print for formatting details. If fd is not 1 (standard output), fprint
calls outflush to flush the output buffer for fd.
Type freturn(Type ty) is the type of the return value for function
type ty.
(c.h exported macros)+=
#define generic(op) ((op)&~15)

is the generic version of the type-specific dag operator op. That is, the
expression generic(op) returns op without its type suffix.
int genlabel(int n) increments the generated-identifier counter by
n and returns its old value.
int istype(Type ty) are type predicates that return nonzero if type
ty is a type shown in the table below.

Predicate    Type
isarith      arithmetic
isarray      array
ischar       character
isdouble     double
isenum       enumeration
isfloat      floating
isfunc       function
isint        integral
isptr        pointer
isscalar     scalar
isstruct     structure or union
isunion      union
isunsigned   unsigned

Node newnode(int op, Node l, Node r, Symbol sym) allocates a dag
node; initializes the op field to op, kids[0] to l, kids[1] to r, and
syms[0] to sym; and returns a pointer to the new node.
Symbol newconst(Value v, int t) installs a constant with value v
and type suffix t into the symbol table, if necessary, and returns a pointer
to the symbol-table entry.
Symbol newtemp(int sclass, int t) creates a temporary with storage
class sclass and a type with type suffix t, and returns a pointer
to the symbol-table entry. The new temporary is announced by calling
local.
opindex(op) is the operator number, for operator op:
(c.h exported macros)+=
#define opindex(op) ((op)>>4)

opindex is used to map the generic operators into a contiguous range of
integers.

(c.h exported macros)+=
#define optype(op) ((op)&15)

is the type suffix for the dag operator op.


(output.c exported functions)+=
extern void outflush ARGS((void));

writes the current output buffer to the standard output, if it's not empty.
void outs(char *s) appends string s to the output buffer for standard
output, and calls outflush if the resulting buffer pointer is within
80 characters of the end of the buffer.
void print(char *fmt, ...) prints its second and following arguments
on standard output. It is like printf but supports only the formats
%c, %d, %o, %x, and %s, and it omits precision and field-width
specifications. print supports four lcc-specific format codes. %S prints a
string of a specified length; the next two arguments give the string and
its length. %k prints an English rendition of the integer token code given
by the corresponding argument, and %t prints an English rendition of a
type. %w prints the source coordinates given by its corresponding
argument, which must be a pointer to a Coordinate. print calls outflush
if it prints a newline character from fmt within 80 characters of the end
of the output buffer. Each format except %c does the actual output with
outs, which may also flush the buffer.
int roundup(int n, int m) is n rounded up to the next multiple of
m, which must be a power of two.
char *string(char *s) installs s in the string table, if necessary, and
returns a pointer to the installed copy.
char *stringd(int n) returns the string representation of n; stringd
installs the returned string in the string table.

(output.c exported functions)+=
extern char *stringf ARGS((char *, ...));

formats its arguments into a string, installs that string in the string table,
and returns a pointer to the installed string. See print for formatting
details.
int ttob(Type ty) is the type suffix for type ty.
int variadic(Type ty) is true if type ty denotes a variadic function.
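The operator macros in this summary split an operator code into a generic part and a type suffix held in the low four bits. The sketch below exercises them, with generic written so that it masks off those four bits. The numeric values of the operators and suffixes are made up for illustration, and roundup_pow2 shows one common way to round up to a power-of-two multiple; whether lcc's roundup is written this way is not stated here.

```c
/* The three operator macros: generic masks off the low four bits,
 * opindex shifts them away, and optype keeps only them. */
#define generic(op) ((op)&~15)
#define opindex(op) ((op)>>4)
#define optype(op)  ((op)&15)

/* Assumed encoding for illustration only: a generic operator occupies
 * the bits above the low four, a type suffix the low four. */
enum { SUFFIX_I = 5, SUFFIX_P = 7 };
enum { GENERIC_CNST = 1 << 4, GENERIC_ADD = 19 << 4 };

/* Round n up to a multiple of m, where m is a power of two:
 * add m-1, then clear the bits below m. */
int roundup_pow2(int n, int m) {
    return (n + (m - 1)) & ~(m - 1);
}
```

With these made-up values, GENERIC_ADD + SUFFIX_P plays the role of an operator like ADDP: generic recovers GENERIC_ADD, opindex gives 19, and optype gives SUFFIX_P.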

Further Reading
Fraser and Hanson (1991a and 1992) describe the earlier versions of
lcc's code generation interface. This chapter is more detailed, and
corresponds to version 3.1 and above of lcc.
Some compiler interfaces emit abstract machine code, which resembles
an assembler code for a fictitious machine (Tanenbaum, van Staveren,
and Stevenson 1982). The front end emits code for the abstract machine,
which the back end reads and translates to target code. Abstract
machines decouple the front and back ends, and make it easy to insert
extra optimization passes, but the extra I/O and structure allocation and
initialization take time. lcc's tightly coupled interface yields efficient,
compact compilers, but it can complicate maintenance because changes
to the front end may affect the back ends. This complication is less
important for standardized languages like ANSI C because there will be few
changes to the language.

Exercises
5.1 lcc can be turned into a syntax and semantics checker by writing a
null code generator whose interface record points to functions that
do nothing. Implement this interface.
5.2 Implement a symbolic back end that generates a trace of the interface
functions as they are called and a readable representation of
their arguments. As an example, the output of the symbolic back
end that comes with lcc for

int i, *p; f() { i = *p++; }

is

export f
segment text
function f type=int function(void) class=auto ...
maxoffset=0
node#2 ADDRGP count=2 p
node`1 INDIRP count=2 #2
node#5 CNSTI count=1 4
node#4 ADDP count=1 #1 #5
node`3 ASGNP count=0 #2 #4 4 4
node#7 ADDRGP count=1 i
node#8 INDIRI count=1 #1
node`6 ASGNI count=0 #7 #8 4 4
1:
end f
segment bss
export p
global p type=pointer to int class=auto ...
space 4
export i
global i type=int class=auto scope=GLOBAL ref=1000
space 4

All of the interface routines in this back end echo their arguments
and some provide additional information. For example, function
computes a frame size, which it prints as the value of maxoffset
as shown above. gen and emit collaborate to print dags as shown
above. gen numbers the nodes in each forest (by annotating their x
fields), and emit prints these numbers for node operands. emit also
identifies roots by prefixing their numbers with accents graves, as
shown for nodes 1, 3, and 6 in the first forest above. For a LABELV
node, emit prints a line with just the label number and a colon.
Compare this output with the linearized representation shown on
page 95.
5.3 Write a code generator that simply emits the names of all identifiers
visible to other modules, and reports those imported names that
are not used.
5.4 When lcc's interface was designed, 32-bit integers were the norm,
so nothing was lost by having integers and longs share one metric.
Now, many machines support 32-bit and 64-bit integers, and our
shortcut complicates using both data types in the same code
generator. How would adding two new type suffixes - L for long and
O for unsigned long - change lcc's interface? Consider the effect
on the type metrics, the node operators in general, and the
conversion operators in particular. Redraw Figure 5.1. Which interface
functions would have to change? How?
5.5 Design an abstract machine consistent with lcc's interface, and use
it to separate lcc's front end from its back end. Write a code
generator that emits code for your abstract machine. Adapt lcc's back
end to read your abstract machine code, rebuild the data structures
that the back end uses now, and call the existing back end to
generate code. This exercise might take a month or so, but the flexibility
to read abstract-machine code, optimize it, and write it back out
would simplify experimenting with optimizers.
6
Lexical Analysis

The lexical analyzer reads source text and produces tokens, which are
the basic lexical units of the language. For example, the expression
*ptr = 56; contains 10 characters or five tokens: *, ptr, =, 56, and
;. For each token, the lexical analyzer returns its token code and zero
or more associated values. The token codes for single-character tokens,
such as operators and separators, are the characters themselves. Defined
constants (with values that do not collide with the numeric values of sig-
nificant characters) are used for the codes of the tokens that can consist
of one or more characters, such as identifiers and constants.
For example, the statement *ptr = 56; yields the token stream shown
on the left below; the associated values, if there are any, are shown on
the right.

'*'
ID      "ptr"   symbol-table entry for "ptr"
'='
ICON    "56"    symbol-table entry for 56

The token codes for the operators * and = are the operators themselves,
i.e., the numeric values of * and =, respectively, and they do not have
associated values. The token code for the identifier ptr is the value of
the defined constant ID, and the associated values are the saved copy
of the identifier string itself, i.e., the string returned by stringn, and a
symbol-table entry for the identifier, if there is one. Likewise, the integer
constant 56 returns ICON, and the associated values are the string "56"
and a symbol-table entry for the integer constant 56.
Keywords, such as "for," are assigned their own token codes, which
distinguish them from identifiers.
The lexical analyzer also tracks the source coordinates for each token.
These coordinates, defined in Section 3.1, give the file name, line number,
and character index within the line of the first character of the token.
Coordinates are used to pinpoint the location of errors and to remember
where symbols are defined.
The lexical analyzer is the only part of the compiler that looks at each
character of the source text. It is not unusual for lexical analysis to ac-
count for half the execution time of a compiler. Hence, speed is impor-
tant. The lexical analyzer's main activity is moving characters, so mini-
mizing the amount of character movement helps increase speed. This is
done by dividing the lexical analyzer into two tightly coupled modules.
The input module, input.c, reads the input in large chunks into a buffer,
and the recognition module, lex.c, examines the characters to recognize
tokens.

6.1 Input
In most programming languages, input is organized in lines. Although
in principle, there is rarely a limit on line length, in practice, line length
is limited. In addition, tokens cannot span line boundaries in most lan-
guages, so making sure complete lines are in memory when they are
being examined simplifies lexical analysis at little expense in capability.
String literals are the one exception in C, but they can be handled as a
special case.
The input module reads the source in large chunks, usually much
larger than individual lines, and it helps arrange for complete tokens
to be present in the input buffer when they are being examined, except
identifiers and string literals. To minimize the overhead of accessing the
input, the input module exports pointers that permit direct access to the
input buffer:
(input.c exported data)+=
extern unsigned char *cp;
extern unsigned char *limit;

cp points to the current input character, so *cp is that character. limit
points one character past the end of the characters in the input buffer,
and *limit is always a newline character and acts as a sentinel. These
pointers reference unsigned characters so that *cp, for example, won't
sign-extend a character whose value is greater than 127.
The important consequence of this design is that most of the input
characters are accessed by *cp, and many characters are never moved.
Only identifiers (excluding keywords) and string literals that appear in
executable code are copied out of the buffer into permanent storage.
Function calls are required only at line boundaries, which occur infrequently
when compared to the number of characters in the input. Specifically,
the lexical analyzer can use *cp++ to read a character and increment cp.
If *cp++ is a newline character, however, it must call nextline, which
might reset cp and limit. After calling nextline, if cp is equal to limit,
the end of file has been reached.
Since *limit is always a newline, and nextline must be called after
reading a newline, it is rarely necessary for the lexical analyzer to
check if cp is less than limit. nextline calls fillbuf when the newline
is the character pointed to by limit. The lexical analyzer can also call
fillbuf explicitly if, for example, it wishes to ensure that an entire
token is present in the input buffer. Most tokens are short, less than 32
characters, so the lexical analyzer might call fillbuf whenever limit-cp
is less than 32.
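The value of the sentinel is that the inner scanning loop needs no bounds check; the position is examined only after a newline is seen. The sketch below applies that idea to a fixed in-memory buffer. It is an illustration of the technique rather than lcc's lexer, and count_lines is a hypothetical function.

```c
#include <stddef.h>

/* Count the newline-terminated lines in buf[0..len-1] using a sentinel.
 * The caller must leave room for one extra byte at buf[len]; this
 * function stores the sentinel newline there, in the way that *limit
 * always holds a newline. An unterminated last line is not counted. */
int count_lines(unsigned char *buf, size_t len) {
    unsigned char *cp = buf, *limit = buf + len;
    int lines = 0;
    *limit = '\n';                   /* the sentinel */
    for (;;) {
        while (*cp++ != '\n')        /* inner loop: no bounds check */
            ;
        if (cp > limit)              /* the newline was the sentinel */
            break;
        lines++;                     /* a real newline ended a line */
    }
    return lines;
}
```

The comparison against limit runs once per line instead of once per character, which is the saving the protocol above is after.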
This protocol is necessary in order for fillbuf to properly handle
lines that span input buffers. In general, each input buffer ends with a
partial line. To maintain the illusion of contiguous lines, and to reduce
unnecessary searching, fillbuf moves the limit-cp characters of the
partial line to the memory locations preceding the characters in the input
buffer so that they will be concatenated with the characters in the trailing
portion of the line when the input buffer is refilled. An example clarifies
this process: Suppose the state of the input buffer is

[Figure: the input buffer, with arrows marking cp and limit]

where shading depicts the characters that have yet to be consumed and
\n represents the newline. If fillbuf is called, it slides the unconsumed
tail of the input buffer down and refills the buffer. The resulting state is

[Figure: the input buffer after the refill, with arrows marking cp and limit]

where the darker shading differentiates the newly read characters from
those moved by fillbuf. When a call to fillbuf reaches the end of the
input, the buffer's state becomes

[Figure: the input buffer at the end of the input, with arrows marking cp and limit]

Finally, when nextline is called for the last sentinel at *limit, fillbuf
sets cp equal to limit, which indicates end of file (after the first call to
nextline). This final state is

[Figure: the input buffer with cp and limit both at the sentinel \n]
The remaining global variables exported by input.c are:

(input.c exported data)+=
extern int infd;
extern char *firstfile;
extern char *file;
extern char *line;
extern int lineno;

Input is read from the file descriptor given by infd; the default is zero,
which is the standard input. file is the name of the current input file;
line gives the location of the beginning of the current line, if it were
to fit in the buffer; and lineno is the line number of the current line.
The coordinates f, x, y of the token that begins at cp, where f is the file
name, are thus given by file, cp-line, and lineno, where characters in
the line are numbered beginning with zero. line is used only to compute
the x coordinate, which counts tabs as single characters. firstfile
gives the name of the first source file encountered in the input; it's used
in error messages.
The input buffer itself is hidden inside the input module:
(input.c exported macros)=
#define MAXLINE 512
#define BUFSIZE 4096

(input.c data)=
static int bsize;
static unsigned char buffer[MAXLINE+1 + BUFSIZE+1];

BUFSIZE is the size of the input buffer into which characters are read,
and MAXLINE is the maximum number of characters allowed in an
unconsumed tail of the input buffer. fillbuf must not be called if limit-cp
is greater than MAXLINE. The standard specifies that compilers need not
handle lines that exceed 509 characters; lcc handles lines of arbitrary
length, but, except for identifiers and string literals, insists that tokens
not exceed 512 characters.
The value of bsize encodes three different input states: If bsize is less
than zero, no input has been read or a read error has occurred; if bsize
is zero, the end of input has been reached; and bsize is greater than
zero when bsize characters have just been read. This rather complicated
encoding ensures that lcc is initialized properly and that it never tries
to read past the end of the input.
inputinit initializes the input variables and fills the buffer:

(input.c functions)=
void inputinit() {
    limit = cp = &buffer[MAXLINE+1];
    bsize = -1;
    lineno = 0;
    file = NULL;
    (refill buffer 106)
    nextline();
}

nextline is called whenever *cp++ reads a newline. If cp is greater than
or equal to limit, the input buffer is empty.
(input.c functions)+=
void nextline() {
    do {
        if (cp >= limit) {
            (refill buffer 106)
            if (cp == limit)
                return;
        } else
            lineno++;
        for (line = (char *)cp; *cp==' ' || *cp=='\t'; cp++)
            ;
    } while (*cp == '\n' && cp == limit);
    if (*cp == '#') {
        resynch();
        nextline();
    }
}

If cp is still equal to limit after filling the buffer, the end of the file has
been reached. The do-while loop advances cp to the first nonwhite-space
character in the line, treating sentinel newlines as white space. The last
four lines of nextline check for resynchronization directives emitted by
the preprocessor; see Exercise 6.2. inputinit and nextline call fillbuf
to refill the input buffer:
(refill buffer 106)=
fillbuf();
if (cp >= limit)
    cp = limit;
If the input is exhausted, cp will still be greater than or equal to limit
when fillbuf returns, which leaves these variables set as shown in the
last diagram on page 104. fillbuf does all of the buffer management
and the actual input:
(input.c functions)+=
void fillbuf() {
    if (bsize == 0)
        return;
    if (cp >= limit)
        cp = &buffer[MAXLINE+1];
    else
        (move the tail portion 107)
    bsize = read(infd, &buffer[MAXLINE+1], BUFSIZE);
    if (bsize < 0) {
        error("read error\n");
        exit(1);
    }
    limit = &buffer[MAXLINE+1+bsize];
    *limit = '\n';
}

fillbuf reads BUFSIZE (or fewer) characters into the buffer beginning
at position MAXLINE+1, resets limit, and stores the sentinel newline.
If the input buffer is empty when fillbuf is called, cp is reset to point
to the first new character. Otherwise, the tail limit-cp characters are
moved so that the last character is in buffer[MAXLINE], and is thus
adjacent to the newly read characters.
(move the tail portion 107)=
{
    int n = limit - cp;
    unsigned char *s = &buffer[MAXLINE+1] - n;
    line = (char *)s - ((char *)cp - line);
    while (cp < limit)
        *s++ = *cp++;
    cp = &buffer[MAXLINE+1] - n;
}

Notice the computation of line: It accounts for the portion of the current
line that has already been consumed, so that cp-line gives the correct
index of the character *cp.
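The tail-moving behavior can be tried without a file by substituting an in-memory source for read. The sketch below follows the shape of fillbuf but is a simplified model: the toy_ names are invented, the buffer sizes are shrunk so the move is easy to see, and line tracking and error handling are left out.

```c
#include <string.h>

#define MAXLINE 8    /* tiny sizes for illustration; */
#define BUFSIZE 16   /* the text uses 512 and 4096 */

unsigned char buffer[MAXLINE+1 + BUFSIZE+1];
unsigned char *cp, *limit;
int bsize;

const char *src;     /* stand-in for the input file */
size_t srclen, srcpos;

/* Stand-in for read(): copy up to n bytes of src into dst. */
int toy_read(unsigned char *dst, int n) {
    size_t left = srclen - srcpos;
    if ((size_t)n > left)
        n = (int)left;
    memcpy(dst, src + srcpos, (size_t)n);
    srcpos += (size_t)n;
    return n;
}

void toy_init(const char *text) {
    src = text;
    srclen = strlen(text);
    srcpos = 0;
    limit = cp = &buffer[MAXLINE+1];
    bsize = -1;
}

/* Same shape as fillbuf, minus line tracking and error handling. */
void toy_fillbuf(void) {
    if (bsize == 0)
        return;
    if (cp >= limit)
        cp = &buffer[MAXLINE+1];
    else {
        /* Slide the unconsumed tail so it ends at buffer[MAXLINE]. */
        int n = (int)(limit - cp);
        unsigned char *s = &buffer[MAXLINE+1] - n;
        while (cp < limit)
            *s++ = *cp++;
        cp = &buffer[MAXLINE+1] - n;
    }
    bsize = toy_read(&buffer[MAXLINE+1], BUFSIZE);
    limit = &buffer[MAXLINE+1+bsize];
    *limit = '\n';
}
```

With a 20-character source, the first call reads 16 characters; if 13 are then consumed, the next call moves the 3 leftover characters down and appends the last 4, so the 7 unconsumed characters sit contiguously between cp and limit.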

6.2 Recognizing Tokens
There are two principal techniques for recognizing tokens: building a
finite automaton or writing an ad hoc recognizer by hand. The lexical
structure of most programming languages can be described by regular
expressions, and such expressions can be used to construct a determin-
istic finite automaton that recognizes and returns tokens. The advantage
of this approach is that it can be automated. For example, LEX is a pro-
gram that takes a lexical specification, given as regular expressions, and
generates an automaton and an appropriate interpreting program.
The lexical structure of most languages is simple enough that lexical
analyzers can be constructed easily by hand. In addition, automatically
generated analyzers, such as those produced by LEX, tend to be larger
and slower than analyzers built by hand. Tools like LEX are very useful,
however, for one-shot programs and for applications with complex
lexical structures.
For C, tokens fall into the six classes defined by the following EBNF
grammar:
token:
    keyword
    identifier
    constant
    string-literal
    operator
    punctuator
punctuator:
    one of [ ] ( ) { } * , : = ; ...
White space (blanks, tabs, newlines, and comments) separates some
tokens, such as adjacent identifiers, but is otherwise ignored except in
string literals.
The lexical analyzer exports two functions and four variables:
(lex.c exported functions)=
extern int getchr ARGS((void));
extern int gettok ARGS((void));

(lex.c exported data)=
extern int t;
extern char *token;
extern Symbol tsym;
extern Coordinate src;

gettok returns the next token. getchr returns, but does not consume,
the next nonwhite-space character. The values returned by gettok are
the characters themselves (for single-character tokens), enumeration con-
stants (such as IF) for the keywords, and the following defined constants
for the others:
    ID        identifiers
    FCON      floating constants
    ICON      integer constants
    SCON      string constants
    INCR      ++
    DECR      --
    DEREF     ->
    ANDAND    &&
    OROR      ||
    LEQ       <=
    EQL       ==
    NEQ       !=
    GEQ       >=
    RSHIFT    >>
    LSHIFT    <<
    ELLIPSIS  ...
    EOI       end of input

These constants are defined by


(lex.c exported types)=
enum {
#define xx(a,b,c,d,e,f,g) a=b,
#define yy(a,b,c,d,e,f,g)
#include "token.h"
    LAST
};

where token.h is a file with 256 lines like


(token.h 109)=
yy(0,         0, 0, 0, 0, 0,    0)
xx(FLOAT,     1, 0, 0, 0, CHAR, "float")
xx(DOUBLE,    2, 0, 0, 0, CHAR, "double")
xx(CHAR,      3, 0, 0, 0, CHAR, "char")
xx(SHORT,     4, 0, 0, 0, CHAR, "short")
xx(INT,       5, 0, 0, 0, CHAR, "int")
xx(UNSIGNED,  6, 0, 0, 0, CHAR, "unsigned")
xx(POINTER,   7, 0, 0, 0, 0,    0)
xx(VOID,      8, 0, 0, 0, CHAR, "void")
xx(STRUCT,    9, 0, 0, 0, CHAR, "struct")
xx(UNION,    10, 0, 0, 0, CHAR, "union")
xx(FUNCTION, 11, 0, 0, 0, 0,    0)
xx(ARRAY,    12, 0, 0, 0, 0,    0)
xx(ENUM,     13, 0, 0, 0, CHAR, "enum")
xx(LONG,     14, 0, 0, 0, CHAR, "long")
xx(CONST,    15, 0, 0, 0, CHAR, "const")
xx(VOLATILE, 16, 0, 0, 0, CHAR, "volatile")

(token.h 109)+=
yy(0,      42, 13, MUL, multree, ID,     "*")
yy(0,      43, 12, ADD, addtree, ID,     "+")
yy(0,      44,  1, 0,   0,       ',',    ",")
yy(0,      45, 12, SUB, subtree, ID,     "-")
yy(0,      46,  0, 0,   0,       '.',    ".")
yy(0,      47, 13, DIV, multree, '/',    "/")
xx(DECR,   48,  0, SUB, subtree, ID,     "--")
xx(DEREF,  49,  0, 0,   0,       DEREF,  "->")
xx(ANDAND, 50,  5, AND, andtree, ANDAND, "&&")
xx(OROR,   51,  4, OR,  andtree, OROR,   "||")
xx(LEQ,    52, 10, LE,  cmptree, LEQ,    "<=")

token.h uses macros to collect everything about each token or symbolic
constant into one place. Each line in token.h gives seven values of
interest for the token as arguments to either xx or yy. The token codes are
given by the values in the second column. token.h is read to define
symbols, build arrays indexed by token, and so forth, and using it guarantees
that such definitions are synchronized with one another. This technique
is common in assembler language programming.
Single-character tokens have yy lines and multicharacter tokens and
other definitions have xx lines. The first column in xx is the enumeration
identifier. The other columns give the identifier or character value, the
precedence if the token is an operator (Section 8.3), the generic opera-
tor (Section 5.5), the tree-building function (Section 9.4), the token's set
(Section 7.6), and the string representation.
These columns are extracted for different purposes by defining the xx
and yy macros and including token.h again. The enumeration definition
above illustrates this technique; it defines xx so that each expansion de-
fines one member of the enumeration. For example, the xx line for DECR
expands to
DECR=48,

and thus defines DECR to be an enumeration constant with the value 48. yy
is defined to have no replacement, which effectively ignores the yy lines.
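
The same table can be expanded more than once with different definitions of xx and yy. The sketch below is a single-file illustration of that idea, not lcc's code: the TOKENS macro is a hypothetical stand-in for including token.h, it carries only three of the seven columns, and tokname and spelling are names invented for the example. It relies on C99 designated initializers, which the pre-ANSI compilers lcc targeted did not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for token.h: code, value, spelling. */
#define TOKENS \
    yy(0,     42, "*")  \
    xx(DECR,  48, "--") \
    xx(DEREF, 49, "->") \
    xx(LEQ,   52, "<=")

/* First expansion: one enumeration constant per xx line; yy lines vanish. */
enum {
#define xx(a,b,c) a=b,
#define yy(a,b,c)
    TOKENS
#undef xx
#undef yy
    LAST
};

/* Second expansion: spellings indexed by token code, for xx and yy alike. */
static const char *tokname[LAST] = {
#define xx(a,b,c) [b] = c,
#define yy(a,b,c) [b] = c,
    TOKENS
#undef xx
#undef yy
};

/* Return the spelling of a token code, or "?" if the table has no entry. */
static const char *spelling(int tok) {
    assert(tok >= 0 && tok < LAST);
    return tokname[tok] != NULL ? tokname[tok] : "?";
}
```

Because both the enumeration and the array are generated from one list, adding a line to the list updates them together, which is the synchronization the text describes.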
The global variable t is often used to hold the current token, so most
calls to gettok use the idiom
    t = gettok();

token, tsym, and src hold the values associated with the current token,
if there are any. token is the source text for the token itself, and tsym is
a Symbol for some tokens, such as identifiers and constants. src is the
source coordinate for the current token.
gettok could return a structure containing the token code and the
associated values, or a pointer to such a structure. Since most calls to
gettok examine only the token code, this kind of encapsulation does
not add significant capability. Also, gettok is the most frequently called
function in the compiler; a simple interface makes the code easier to
read.
gettok recognizes a token by switching on its first character, which
classifies the token, and consuming subsequent characters that make up
the token. For some tokens, these characters are given by one or more
of the sets defined by map. map[c] is a mask that classifies character c
as a member of one or more of six sets:
(lex.c types)=
enum { BLANK=01, NEWLINE=02, LETTER=04,
       DIGIT=010, HEX=020, OTHER=040 };

(lex.c data)=
static unsigned char map[256] = { (map initializer) };

map[c]&BLANK is nonzero if c is a white-space character other than a
newline. Newlines are excluded because hitting one requires gettok to
call nextline. The other values identify other subsets of characters:
NEWLINE is the set consisting of just the newline character, LETTER is the
set of upper- and lowercase letters, DIGIT is the set of digits 0-9, HEX is
the set of digits 0-9, a-f, and A-F, and OTHER is the set that holds the
rest of the ASCII characters that are in the source and execution character
sets specified by the standard. If map[c] is zero, c is not guaranteed to
be acceptable to all ANSI C compilers, which, somewhat surprisingly, is
the case for $, @, and `.
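
A classification table of this kind can be sketched as follows. This is an illustration under stated assumptions, not lcc's (map initializer): the table is filled at run time by a hypothetical init_map rather than by a static initializer, BLANK is assumed to hold only space and tab, the underscore is assumed to belong to LETTER so that identifier scans accept it, and OTHER is left unpopulated.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { BLANK=01, NEWLINE=02, LETTER=04,
       DIGIT=010, HEX=020, OTHER=040 };

static unsigned char map[UCHAR_MAX + 1];

/* Add every character of set to the classes named by mask. */
static void mark(const char *set, int mask) {
    for (; *set; set++)
        map[(unsigned char)*set] |= (unsigned char)mask;
}

/* Hypothetical run-time initializer; membership choices are assumptions. */
static void init_map(void) {
    mark(" \t", BLANK);
    mark("\n", NEWLINE);
    mark("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_", LETTER);
    mark("0123456789", DIGIT);
    mark("0123456789abcdefABCDEF", HEX);
}

/* Length of the run of letters and digits at the start of s. */
static int ident_len(const char *s) {
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (map[*p] & (DIGIT|LETTER))    /* map['\0'] is 0, so the scan stops */
        p++;
    return (int)(p - (const unsigned char *)s);
}
```

A character may belong to several sets at once: after init_map, map['a'] is LETTER|HEX, so a single table lookup and mask answers questions such as "letter or digit?" without a chain of comparisons.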
gettok is a large function, but the switch statement that dispatches
on the token's first character divides it into manageable pieces:
(lex.c macros)=
#define MAXTOKEN 32

(lex.c functions)=
int gettok() {
    for (;;) {
        register unsigned char *rcp = cp;
        (skip white space 112)
        if (limit - rcp < MAXTOKEN) {
            cp = rcp;
            fillbuf();
            rcp = cp;
        }
        src.file = file;
        src.x = (char *)rcp - line;
        src.y = lineno;
        cp = rcp + 1;
        switch (*rcp++) {
        (gettok cases 112)
        default:
            if ((map[cp[-1]]&BLANK) == 0)
                (illegal character)
        }
    }
}

gettok begins by skipping over white space and then checking that there
is at least one token in the input buffer. If there isn't, calling fillbuf
ensures that there is. MAXTOKEN applies to all tokens except identifiers,
string literals, and numeric constants; occurrences of these tokens that
are longer than MAXTOKEN characters are handled explicitly in the code
for those tokens. The standard permits compilers to limit string literals
to 509 characters and identifiers to 31 characters. lcc increases these
limits to 4,096 (BUFSIZE) and 512 (MAXLINE) to accommodate programs
that emit C programs, because these emitted programs may contain long
identifiers.
Instead of using cp as suggested in Section 6.1, gettok copies cp to
the register variable rcp upon entry, and uses rcp in token recognition.
gettok copies rcp back to cp before it returns, and before calls
to nextline and fillbuf. Using rcp improves performance and makes
scanning loops compact and fast. For example, white space is elided by
(skip white space 112)=
while (map[*rcp]&BLANK)
    rcp++;
Using a register variable to index map generates efficient code where it
counts. These kinds of scans examine every character in the input, and
they examine characters by accessing the input buffer directly. Some
optimizing compilers can make similar improvements locally, but not
across potentially aliased assignments and calls to other, irrelevant func-
tions.
Each of the sections below describes one of the cases in (gettok
cases). The cases omitted from this book are
(gettok cases 112)=
case '/': (comment or /)
case 'L': (wide-character constants)
(cases for two-character operators)
(cases for one-character operators and punctuation)
gettok calls nextline when it trips over a newline or one of its
synonyms:
(gettok cases 112)+=
case '\n': case '\v': case '\r': case '\f':
    nextline();
    if ((end of input 112)) {
        tsym = NULL;
        return EOI;
    }
    continue;

(end of input 112)=
cp == limit

When control reaches this case, cp points to the character that follows
the newline; when nextline returns, cp still points to that character, and
cp is less than limit. End of file is the exception: here, cp equals limit.
Testing for this condition is rarely needed, because *cp will always be a
newline, which terminates the scans for most tokens.

The sections below describe the remaining cases. Recognizing the tokens
themselves is relatively straightforward; computing the associated
values for some tokens is what complicates each case.

6.3 Recognizing Keywords


There are 28 keywords:
keyword: one of
    auto      double    int       struct
    break     else      long      switch
    char      extern    return    union
    const     float     short     unsigned
    continue  for       signed    void
    default   goto      sizeof    volatile
    do        if        static    while
Keywords could be recognized through a look-up in a table in which each
keyword entry carries its token code and each built-in type entry carries
its type. Instead, keywords are recognized by a hard-coded decision tree,
which is faster than searching a table and nearly as simple. The cases
for the lowercase letters that begin keywords make explicit tests for the
keywords, which are possible because the entire token must appear in
the input buffer. For example, the case for i is
(gettok cases 112)+=
case 'i':
    if (rcp[0] == 'f'
    && !(map[rcp[1]]&(DIGIT|LETTER))) {
        cp = rcp + 1;
        return IF;
    }
    if (rcp[0] == 'n'
    &&  rcp[1] == 't'
    && !(map[rcp[2]]&(DIGIT|LETTER))) {
        cp = rcp + 2;
        tsym = inttype->u.sym;
        return INT;
    }
    goto id;
id labels the code in the next section that scans identifiers. If the token
is if or int, cp is updated and the appropriate token code is returned;
otherwise, the token is an identifier. For int, tsym holds the
symbol-table entry for the type int. The cases for the characters
abcdefglrsuvw are similar, and were generated automatically by a short
program.

The code generated for these fragments is short and fast. For example,
on most machines, int is recognized by fewer than a dozen instructions,
many fewer than are executed when a table is searched for keywords,
even if perfect hashing is used.
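
The shape of such a decision tree can be tried outside the compiler. The sketch below is a hedged, stand-alone rendering of the case for i: the token codes are hypothetical, the helper more (built on the standard isalnum) stands in for the map[c]&(DIGIT|LETTER) test, and classify_i reports the token length through a parameter instead of updating cp.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { ID = 1, IF, INT };               /* hypothetical token codes */

/* Stand-in for map[c]&(DIGIT|LETTER): can c continue an identifier? */
static int more(int c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Classify a token whose first character is 'i'. rest points just past
   that 'i'; *len receives the length of the whole token. */
static int classify_i(const char *rest, int *len) {
    int n;

    assert(rest != NULL && len != NULL);
    if (rest[0] == 'f' && !more(rest[1])) {
        *len = 2;
        return IF;
    }
    if (rest[0] == 'n' && rest[1] == 't' && !more(rest[2])) {
        *len = 3;
        return INT;
    }
    for (n = 0; more(rest[n]); n++)     /* otherwise scan an identifier */
        ;
    *len = n + 1;
    return ID;
}
```

The left-to-right evaluation of && matters here: rest[1] is examined only after rest[0] has matched, so the tests never read past the terminating character of a short token.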

6.4 Recognizing Identifiers


The syntax for identifiers is
identifier:
    nondigit { nondigit | digit }
digit:
    one of 0 1 2 3 4 5 6 7 8 9
nondigit:
    one of _
    a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z
The code echoes this syntax, but must also cope with the possibility of
identifiers that are longer than MAXTOKEN characters and thus might be
split across input buffers.
(gettok cases 112)+=
case 'h': case 'j': case 'k': case 'm': case 'n': case 'o':
case 'p': case 'q': case 'x': case 'y': case 'z':
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K':
case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
case 'Y': case 'Z': case '_':
id:
    (ensure there are at least MAXLINE characters 115)
    token = (char *)rcp - 1;
    while (map[*rcp]&(DIGIT|LETTER))
        rcp++;
    token = stringn(token, (char *)rcp - token);
    (tsym ← type named by token 115)
    cp = rcp;
    return ID;
All identifiers are saved in the string table. At the entry to this and all
cases, both cp and rcp have been incremented past the first character
of the token. If the input buffer holds fewer than MAXLINE characters,
cp is backed up one character to point to the identifier's first character,
fillbuf is called to replenish the input buffer, and cp and rcp are
adjusted to point to the identifier's second character as before:
(ensure there are at least MAXLINE characters 115)=
if (limit - rcp < MAXLINE) {
    cp = rcp - 1;
    fillbuf();
    rcp = ++cp;
}

A typedef makes an identifier a synonym for a type, and these names
are installed in the identifiers table. gettok thus sets tsym to the
symbol-table entry for token, if there is one:
(tsym ← type named by token 115)=
tsym = lookup(token, identifiers);
If token names a type, tsym is set to the symbol-table entry for that type,
and tsym->sclass will be equal to TYPEDEF. Otherwise, tsym is null or
the identifier isn't a type name. The macro
(lex.c exported macros)=
#define istypename(t,tsym) (kind[t] == CHAR \
    || t == ID && tsym && tsym->sclass == TYPEDEF)

encapsulates testing if the current token is a type name: A type name is
either one of the keywords that names a type, such as int, or an identifier
that is a typedef for a type. The global variables t and tsym are the only
valid arguments to istypename.

6.5 Recognizing Numbers

There are four kinds of numeric constants in ANSI C:


constant:
    floating-constant
    integer-constant
    enumeration-constant
    character-constant
enumeration-constant:
    identifier
The code for identifiers shown in the previous section handles
enumeration constants, and the code in Section 6.6 handles character constants.
The lexical analyzer returns the token code ID and sets tsym to the
symbol-table entry for the enumeration constant. The caller checks for
an enumeration constant and uses the appropriate integer in its place;
the code in Section 8.8 is an instance of this convention.
There are three kinds of integer constants:
integer-constant:
    decimal-constant [ integer-suffix ]
    octal-constant [ integer-suffix ]
    hexadecimal-constant [ integer-suffix ]
integer-suffix:
    unsigned-suffix [ long-suffix ]
    long-suffix [ unsigned-suffix ]
unsigned-suffix: u | U
long-suffix: l | L
The first few characters of the integer constant help identify its kind.
(gettok cases 112)+=
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9': {
    unsigned int n = 0;
    (ensure there are at least MAXLINE characters 115)
    token = (char *)rcp - 1;
    if (*token == '0' && (*rcp == 'x' || *rcp == 'X')) {
        (hexadecimal constant)
    } else if (*token == '0') {
        (octal constant)
    } else {
        (decimal constant 117)
    }
    return ICON;
}

As for identifiers, this case begins by ensuring that the input buffer holds
at least MAXLINE characters, which permits the code to look ahead, as the
test for hexadecimal constants illustrates.
The fragments for the three kinds of integer constant set n to the value
of the constant. They must not only recognize the constant, but also
ensure that the constant is within the range of representable integers.
Recognizing decimal constants illustrates this processing. The syntax
for decimal constants is:
decimal-constant:
    nonzero-digit { digit }
nonzero-digit:
    one of 1 2 3 4 5 6 7 8 9
The code accumulates the decimal value in n by repeated multiplications:

(decimal constant 117)=
int overflow = 0;
for (n = *token - '0'; map[*rcp]&DIGIT; ) {
    int d = *rcp++ - '0';
    if (n > ((unsigned)UINT_MAX - d)/10)
        overflow = 1;
    else
        n = 10*n + d;
}
(check for floating constant 117)
cp = rcp;
tsym = icon(n, overflow, 10);

At each step, overflow will occur if 10*n + d > UINT_MAX, where UINT_MAX
is the value of the largest representable unsigned number. Rearranging
this inequality gives the test shown above, which looks before it leaps into
computing the new value of n. overflow is set to one if the constant
overflows. icon handles the optional suffixes.
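
The look-before-you-leap test can be exercised on its own. The sketch below is an illustration rather than lcc's fragment: scan_decimal is a hypothetical function that reads digits from an ordinary string instead of the input buffer, and it compares characters against '0' and '9' directly instead of consulting map. Because n is an integer, n > (UINT_MAX - d)/10 with truncating division holds exactly when 10*n + d would exceed UINT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Accumulate the decimal value of the digits at the start of s.
   *overflow is set to 1 if the value does not fit in an unsigned;
   as in the text, n is left unchanged once overflow is detected. */
static unsigned scan_decimal(const char *s, int *overflow) {
    unsigned n = 0;

    assert(s != NULL && overflow != NULL);
    *overflow = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        unsigned d = (unsigned)(*s - '0');
        if (n > (UINT_MAX - d) / 10)    /* would 10*n + d exceed UINT_MAX? */
            *overflow = 1;
        else
            n = 10 * n + d;
    }
    return n;
}
```

The subtraction UINT_MAX - d cannot wrap because d is at most 9, so the test itself never overflows, which is the point of rearranging the inequality.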
A decimal constant is the prefix of a floating constant if the next char-
acter is a period or an exponent indicator:
(check for floating constant 117)=
if (*rcp == '.' || *rcp == 'e' || *rcp == 'E') {
    cp = rcp;
    tsym = fcon();
    return FCON;
}

fcon is similar to icon; it recognizes the suffix of a floating constant.
overflow will be one when a floating constant has a whole part that
exceeds UINT_MAX, but neither n nor overflow is passed to fcon, which
reexamines token to check for floating overflow.
icon recognizes the optional U and L suffixes (in either case), warns
about values that overflow, initializes a symbol to the appropriate type
and value, and returns a pointer to the symbol.
(lex.c data)+=
static struct symbol tval;

tval serves only to provide the type and value of a constant to gettok's
caller. The caller must lift the relevant data before the next call to gettok.

(lex.c functions)+=
static Symbol icon(n, overflow, base)
unsigned n; int overflow, base; {
    if ((*cp=='u'||*cp=='U') && (cp[1]=='l'||cp[1]=='L')
    ||  (*cp=='l'||*cp=='L') && (cp[1]=='u'||cp[1]=='U')) {
        tval.type = unsignedlong;
        cp += 2;
    } else if (*cp == 'u' || *cp == 'U') {
        tval.type = unsignedtype;
        cp += 1;
    } else if (*cp == 'l' || *cp == 'L') {
        if (n > (unsigned)LONG_MAX)
            tval.type = unsignedlong;
        else
            tval.type = longtype;
        cp += 1;
    } else if (base == 10 && n > (unsigned)LONG_MAX)
        tval.type = unsignedlong;
    else if (n > (unsigned)INT_MAX)
        tval.type = unsignedtype;
    else
        tval.type = inttype;
    if (overflow) {
        warning("overflow in constant '%S'\n", token,
            (char*)cp - token);
        n = LONG_MAX;
    }
    (set tval's value 118)
    ppnumber("integer");
    return &tval;
}

If both U and L appear, n is an unsigned long, and if only U appears,
n is an unsigned. If only L appears, n is a long unless it's too big, in
which case it's an unsigned long. n is also an unsigned long if it's an
unsuffixed decimal constant and it's too big to be a long. Unsuffixed
octal and hexadecimal constants are ints unless they're too big, in which
case they're unsigneds. The format code %S prints a string like printf's
%s, but consumes an additional argument that specifies the length of the
string. It can thus print strings that aren't terminated by a null character.
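
Standard C offers a comparable facility: a precision of * in %.*s takes the maximum number of characters to print from an int argument. The sketch below uses it to format a message about a token that is delimited by two pointers rather than by a null character. format_token is a hypothetical helper, and snprintf is C99, so this is an analogy to lcc's %S, not its implementation. Note the argument order: the length precedes the string, the reverse of %S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "overflow in constant '<token>'" into out, where the token is the
   characters in [start, end) and need not be null-terminated. Returns the
   value of snprintf. */
static int format_token(char *out, size_t outsize,
                        const char *start, const char *end) {
    assert(out != NULL && start != NULL && end >= start);
    return snprintf(out, outsize, "overflow in constant '%.*s'",
                    (int)(end - start), start);
}
```

With the source text "12345u+1" and end pointing six characters past start, the message names only 12345u, even though more input follows the token in the buffer.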
The types int, long, and unsigned are different types, but lcc insists
that they all have the same size. This constraint simplifies the tests
shown above and the code that sets tval's value:
(set tval's value 118)=
if (isunsigned(tval.type))
    tval.u.c.v.u = n;
else
    tval.u.c.v.i = n;
Relaxing this constraint would complicate this code and the tests above.
For example, the standard specifies that the type of an unsuffixed
decimal constant is int, long, or unsigned long, depending on its value. In
lcc, ints and longs can accommodate the same range of integers, so an
unsuffixed decimal constant is either int or unsigned.
A numeric constant is formed from a preprocessing number, which is
the numeric constant recognized by the C preprocessor. Unfortunately,
the standard specifies preprocessing numbers that are a superset of the
integer and floating constants; that is, a valid preprocessing number may
not be a valid numeric constant. 123.4.5 is an example. The
preprocessor deals with such numbers too, but it may pass them on to the
compiler, which must treat them as single tokens and thus must catch
preprocessing numbers that aren't valid constants.
The syntax of a preprocessing number is
pp-number:
    [ . ] digit { digit | . | nondigit | E sign | e sign }
sign: - | +
Valid numeric constants are prefixes of preprocessing numbers, so the
processing in icon and fcon might conclude successfully without
consuming the complete preprocessing number, which is an error. ppnumber
is called from icon and fcon, and checks for this case.
(lex.c functions)+=
static void ppnumber(which) char *which; {
    unsigned char *rcp = cp--;

    for ( ; (map[*cp]&(DIGIT|LETTER)) || *cp == '.'; cp++)
        if ((cp[0] == 'E' || cp[0] == 'e')
        &&  (cp[1] == '-' || cp[1] == '+'))
            cp++;
    if (cp > rcp)
        error("'%S' is a preprocessing number but an invalid %s constant\n",
            token, (char*)cp-token, which);
}


ppnumber backs up one character and skips over the characters that may
comprise a preprocessing number; if it scans past the end of the numeric
token, there's an error.
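
The check can be sketched without the input buffer. In the illustration below, ppnumber_len and leftover are hypothetical functions: the first measures the preprocessing number at the start of a string, using the standard isalnum plus an explicit underscore test in place of map, and the second reports whether a constant scanner that consumed a given number of characters stopped short of the end of that preprocessing number.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the preprocessing number at the start of s, which must begin
   with a digit or with a period followed by a digit. */
static int ppnumber_len(const char *s) {
    const char *p = s;

    assert(s != NULL);
    assert(isdigit((unsigned char)p[0])
        || (p[0] == '.' && isdigit((unsigned char)p[1])));
    for ( ; isalnum((unsigned char)*p) || *p == '_' || *p == '.'; p++)
        if ((p[0] == 'E' || p[0] == 'e')
        &&  (p[1] == '-' || p[1] == '+'))
            p++;                        /* a sign is allowed only after E or e */
    return (int)(p - s);
}

/* Nonzero if a scanner that consumed `consumed` characters of s stopped
   before the end of the preprocessing number: the error case. */
static int leftover(const char *s, int consumed) {
    return ppnumber_len(s) > consumed;
}
```

For 123.4.5, a floating-constant scanner stops after the five characters of 123.4, but the preprocessing number is seven characters long, so leftover reports the error that ppnumber diagnoses.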
fcon recognizes the suffix of floating constants and is called in two
places. One of the calls is shown above in (check for floating constant).
The other call is from the gettok case for '.':
(gettok cases 112)+=
case '.':
    if (rcp[0] == '.' && rcp[1] == '.') {
        cp += 2;
        return ELLIPSIS;
    }
    if ((map[*rcp]&DIGIT) == 0)
        return '.';
    (ensure there are at least MAXLINE characters 115)
    cp = rcp - 1;
    token = (char *)cp;
    tsym = fcon();
    return FCON;
The syntax for floating constants is
floating-constant:
    fractional-constant [ exponent-part ] [ floating-suffix ]
    digit-sequence exponent-part [ floating-suffix ]
fractional-constant:
    [ digit-sequence ] . digit-sequence
    digit-sequence .
exponent-part:
    e [ sign ] digit-sequence
    E [ sign ] digit-sequence
digit-sequence:
    digit { digit }
floating-suffix:
    one of f l F L
fcon recognizes a floating-constant, converts the token to a double value,
and determines tval's type and value:
(lex.c functions)+=
static Symbol fcon() {
    (scan past a floating constant 121)
    errno = 0;
    tval.u.c.v.d = strtod(token, NULL);
    if (errno == ERANGE)
        (warn about overflow 120)
    (set tval's type and value 121)
    ppnumber("floating");
    return &tval;
}

(warn about overflow 120)=
warning("overflow in floating constant '%S'\n", token,
    (char*)cp - token);

strtod is a C library function that interprets its first string argument as
a floating constant and returns the corresponding double value. If the
constant is out of range, strtod sets the global variable errno to ERANGE
as stipulated by the ANSI C specification for the C library.
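
The errno protocol is easy to get wrong, so a minimal sketch may help. convert is a hypothetical wrapper, not part of lcc: it clears errno before the call, because library functions never reset it to zero, and reports a range error only if strtod set it to ERANGE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Convert the floating constant at the start of text. Stores the result
   in *value and returns 1 if strtod reported a range error, 0 otherwise. */
static int convert(const char *text, double *value) {
    assert(text != NULL && value != NULL);
    errno = 0;                          /* strtod never clears errno itself */
    *value = strtod(text, NULL);
    return errno == ERANGE;
}
```

strtod stops at the first character that cannot be part of the number, so the text does not need to end where the constant ends; a trailing suffix or operator is simply left unread.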
A floating constant follows the syntax shown above, and is recognized
by:
(scan past a floating constant 121)=
if (*cp == '.')
    (scan past a run of digits 121)
if (*cp == 'e' || *cp == 'E') {
    if (*++cp == '-' || *cp == '+')
        cp++;
    if (map[*cp]&DIGIT)
        (scan past a run of digits 121)
    else
        error("invalid floating constant '%S'\n", token,
            (char*)cp - token);
}

(scan past a run of digits 121)=
do
    cp++;
while (map[*cp]&DIGIT);

As dictated by the syntax, an exponent indicator must be followed by at
least one digit.
A floating constant may have an F or L suffix (but not both); these
specify the types float and long double, respectively.
108 token
(set tval's type and value 121)=
if (*cp == 'f' || *cp == 'F') {
    ++cp;
    if (tval.u.c.v.d > FLT_MAX)
        (warn about overflow 120)
    tval.type = floattype;
    tval.u.c.v.f = tval.u.c.v.d;
} else if (*cp == 'l' || *cp == 'L') {
    cp++;
    tval.type = longdouble;
} else
    tval.type = doubletype;

6.6 Recognizing Character Constants and Strings


Recognizing character constants and string literals is complicated by
escape sequences like \n, \034, \xFF, and \", and by wide-character
constants. lcc implements so-called wide characters as normal ASCII
characters, and thus uses unsigned char for the type wchar_t. The syntax
is
character-constant:
    [ L ] ' c-char { c-char } '
c-char:
    any character except ', \, or newline
    escape-sequence
escape-sequence:
    one of \' \" \? \\ \a \b \f \n \r \t \v
    \ octal-digit [ octal-digit [ octal-digit ] ]
    \x hexadecimal-digit { hexadecimal-digit }
string-literal:
    [ L ] " { s-char } "
s-char:
    any character except ", \, or newline
    escape-sequence
String literals can span more than one line if a backslash immediately
precedes the newline. Adjacent string literals are automatically
concatenated together to form a single literal. In a proper ANSI C
implementation, this line splicing and string literal concatenation is done by the
preprocessor, and the compiler sees only single, uninterrupted string
literals. lcc implements line splicing and concatenation for string literals
anyway, so that it can be used with pre-ANSI preprocessors.
Implementing these features means that string literals can be longer
than MAXLINE characters, so (ensure there are at least MAXLINE characters)
cannot be used to ensure that a sequence of adjacent entire string literals
appears in the input buffer. Instead, the code must detect the newline
at limit and call nextline explicitly, and it must copy the literal into a
private buffer.
(gettok cases 112)+=
scan:
case '\'': case '"': {
    static char cbuf[BUFSIZE+1];
    char *s = cbuf;
    int nbad = 0;
    *s++ = *--cp;
    do {
        cp++;
        (scan one string literal 123)
        if (*cp == cbuf[0])
            cp++;
        else
            error("missing %c\n", cbuf[0]);
    } while (cbuf[0] == '"' && getchr() == '"');
    *s++ = 0;
    if (s >= &cbuf[sizeof cbuf])
        error("%s literal too long\n",
            cbuf[0] == '"' ? "string" : "character");
    (warn about non-ANSI literals)
    (set tval and return ICON or SCON 123)
}


The outer do-while loop gathers up adjacent string literals, which are
identified by their leading double quote character, into cbuf, and reports
those that are too long. The leading character also determines the type
of the associated value and gettok's return value:
(set tval and return ICON or SCON 123)=
token = cbuf;
tsym = &tval;
if (cbuf[0] == '"') {
    tval.type = array(chartype, s - cbuf - 1, 0);
    tval.u.c.v.p = cbuf + 1;
    return SCON;
} else {
    if (s - cbuf > 3)
        warning("excess characters in multibyte character literal '%S' ignored\n",
            token, (char*)cp-token);
    else if (s - cbuf <= 2)
        error("missing '\n");
    tval.type = inttype;
    tval.u.c.v.i = cbuf[1];
    return ICON;
}

String literals can contain null characters as the result of the escape
sequence \0, so the length of the literal is given by its type: An n-character
literal has the type (ARRAY n (CHAR)) (n does not include the double
quotes). gettok's callers, such as primary, call stringn when they want
to save the string literal referenced by tval.
The code below, which scans a string literal or character constant,
copes with four situations: newlines at limit, escape sequences,
non-ANSI characters, and literals that exceed the size of cbuf.
(scan one string literal 123)=
while (*cp != cbuf[0]) {
    int c;
    if (map[*cp]&NEWLINE) {
        if (cp < limit)
            break;
        cp++;
        nextline();
        if ((end of input 112))
            break;
        continue;
    }
    c = *cp++;
    if (c == '\\') {
        if (map[*cp]&NEWLINE) {
            if (cp < limit)
                break;
            cp++;
            nextline();
        }
        if (limit - cp < MAXTOKEN)
            fillbuf();
        c = backslash(cbuf[0]);
    } else if (map[c] == 0)
        nbad++;
    if (s < &cbuf[sizeof cbuf] - 2)
        *s++ = c;
}

If *limit is a newline, it serves only to terminate the buffer, and is thus
ignored unless there's no more input. Other newlines (those for which
cp is less than limit) and the one at the end of file terminate the while
loop without advancing cp. backslash interprets the escape sequences
described above; see Exercise 6.10. nbad counts the number of non-ANSI
characters that appear in the literal; lcc's -A -A option causes
warnings about literals that contain such characters or that are longer than
ANSI's 509-character guarantee.

Further Reading
The input module is based on the design described by Waite (1986). The
difference is that Waite's algorithm moves one partial line instead of
potentially several partial lines or tokens, and does so after scanning
the first newline in the buffer. But this operation overwrites storage
before the buffer when a partial line is longer than a fixed maximum.
The algorithm above avoids this problem, but at the per-token cost of
comparing limit-cp with MAXTOKEN.
Lexical analyzers can be generated from a regular-expression
specification of the lexical structure of the language. LEX (Lesk 1975), which
is available on UNIX, is perhaps the best known example. Schreiner and
Friedman (1985) use LEX in their sample compilers, and Holub (1990)
details an implementation of a similar tool. More recent generators, such
as flex, re2c (Bumbulis and Cowan 1993), and Eli's scanner generator
(Gray et al. 1992; Heuring 1986), produce lexical analyzers that are
much faster and smaller than those produced by LEX. On some computers,
Eli and re2c produce lexical analyzers that are faster than lcc's. Eli
originated some of the techniques used in lcc's gettok.
A "perfect" hash function is one that maps each word from a known
set into a different hash number (Cichelli 1980; Jaeschke and Osterburg
1980; Sager 1985). Some compilers use perfect hashing for keywords,
but the hashing itself usually takes more instructions than lcc uses to
recognize keywords.
lcc relies on the library function strtod to convert the string
representation of a floating constant to its corresponding double value. Doing
this conversion as accurately as possible is complicated; Clinger (1990)
shows that it may require arithmetic of arbitrary precision in some cases.
Many implementations of strtod are based on Clinger's algorithm. The
opposite problem, converting a double to its string representation,
is just as laborious. Steele and White (1990) give the gory details.

Exercises

6.1 What happens if a line longer than BUFSIZE characters appears in
the input? Are zero-length lines handled properly?

6.2 The C preprocessor emits lines of the form

    # n "file"
    #line n "file"
    #line n

These lines are used to reset the current line number and file name
to n and file, respectively, so that error messages refer to the correct
file. In the third form, the current file name remains unchanged.
resynch, called by nextline, recognizes these lines and resets file
and lineno accordingly. Implement resynch.
6.3 In many implementations of C, the preprocessor runs as a separate
program with its output passed along as the input to the compiler.
Implement the preprocessor as an integral part of input.c, and
measure the resulting improvement. Be warned: Writing a
preprocessor is a big job with many pitfalls. The only definitive
specification for the preprocessor is the ANSI standard.
6.4 Implement the fragments omitted from gettok.
6.5 What happens when lcc reads an identifier longer than MAXLINE
characters?
6.6 Implement int getchr(void).

6.7 Try perfect hashing for the keywords. Does it beat the current
implementation?
6.8 The syntax for octal constants is
octal-constant:
O { octal-digit }
octal-digit:
one of O 1 2 3 4 5 6 7
Write (octal constant). Be careful; an octal constant is a valid prefix
of a floating constant, and octal constants can overflow.
6.9 The syntax for hexadecimal constants is
hexadecimal-constant:
    ( 0x | 0X ) hexadecimal-digit { hexadecimal-digit }
hexadecimal-digit:
    one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
Write (hexadecimal constant). Don't forget to handle overflow.
6.10 Implement
(lex.c prototypes)=
static int backslash ARGS((int q));

which interprets a single escape sequence beginning at cp. q is
either a single or double quote, and thus distinguishes between
character constants and string literals.
6.11 Implement the code for (wide-character constants). Remember that
wchar_t is unsigned char, so the value of the constant L'\377'
is 255, not -1.
6.12 Reimplement the lexical analyzer using LEX or an equivalent pro-
gram generator, and compare the two implementations. Which is
faster? Smaller? Which is easier to understand? Modify?
6.13 How many instructions is (skip whitespace) on your machine? How
many would it be if it used cp instead of rcp?
6.14 Write a program to generate the (gettok cases) for the C keywords.

6.15 lcc assumes that int and long (signed and unsigned) have the same
size. Revise icon to remove this regrettable assumption.
7
Parsing

The lexical analyzer described in Chapter 6 provides a stream of tokens
to the parser. The parser confirms that the input conforms to the syntax
of the language, and builds an internal representation of the input
source program. Subsequent phases of lcc traverse this representation
to generate code for a specific target machine.
lcc uses a recursive-descent parser. It's a straightforward application
of classical parsing techniques for constructing parsers by hand. This
approach produces a small and efficient compiler, and is suitable for
languages as simple as C or Pascal. Indeed, many commercial compilers
are constructed using these techniques.
For more complex languages, however, techniques that use parser gen-
erators might be preferable. For example, C is in the class of languages
that can be recognized by recursive-descent parsers, but other languages,
like ADA, are not. For those languages, more powerful parsers, such as
bottom-up parsers, must be used. Construction of these kinds of parsers
by hand is too difficult; automatic methods must be used.
The remainder of this chapter lays the groundwork in formal language
theory, syntax-directed translation, and error handling that the code in
subsequent chapters implements.

7.1 Languages and Grammars


EBNF grammars, like those shown in the previous chapters, are used to
define languages. Most languages of any interest, such as programming
languages, are infinite. Grammars are a way to define infinite sets with
finite specifications.
Productions give the rules for producing the sentences in a language
by repeatedly replacing a nonterminal with the right-hand side of one of
its productions. For example, the EBNF grammar
expr:
    expr + expr
    ID

defines a language of simple expressions. The nonterminal expr is the
start nonterminal. Sentences in this language are derived by starting
with expr and replacing a nonterminal by the right-hand side of one of
the rules for the selected nonterminal. In this example, there are only
two rules, so one possible replacement is
expr ==> expr + expr
This operation is a derivation step, and a sequence of such steps that ends
in a sentence is a derivation. At each step, one nonterminal is replaced
by one of its right-hand sides. For example, the sentence ID+ID+ID can
be obtained by the following derivation.
expr ==> expr + expr
     ==> expr + ID
     ==> expr + expr + ID
     ==> ID + expr + ID
     ==> ID + ID + ID
In the first step, the production
expr: expr + expr
is applied to replace expr by the right-hand side of this rule. In the sec-
ond step, the rule expr: ID is applied to the rightmost occurrence of expr.
The next three steps apply these rules to arrive at the sentence ID+ID+ID.
Each of the steps in a derivation yields a sentential form, which is a string
of terminals and nonterminals. Sentential forms differ from sentences
in that they can include both terminals and nonterminals; sentences con-
tain just terminals.
At each step in a derivation, any of the nonterminals in the sentential
form can be replaced by the right-hand side of one of its rules. If, at each
step, the leftmost nonterminal is replaced, the derivation is a leftmost
derivation. For example,
expr ==> expr + expr
     ==> ID + expr
     ==> ID + expr + expr
     ==> ID + ID + expr
     ==> ID + ID + ID
is a leftmost derivation for the sentence ID+ID+ID. Parsers reconstruct a
derivation for a given sentence, i.e., the input C program. lcc's parser is
a top-down parser that reconstructs the leftmost derivation of its input.

7.2 Ambiguity and Parse Trees


[FIGURE 7.1 Two parse trees for a+b+c. The left tree groups the expression as (a+b)+c; the right tree groups it as a+(b+c).]

Consider the language defined by the following grammar.
expr:
    expr + expr
    expr * expr
    ID

Assuming a, b, and c are identifiers, a+b, a+b+c, and a+b*c are sentences
in this language.
A derivation can be written as described above or shown pictorially by
a parse tree. For example, a leftmost derivation for a+b+c is
expr ==> expr + expr
     ==> expr + expr + expr
     ==> a + expr + expr
     ==> a + b + expr
     ==> a + b + c

and the corresponding parse tree is the one on the left in Figure 7.1.
A parse tree is a tree with its nodes labelled with nonterminals and
its leaves labelled with terminals; the root of the tree is labelled with
the start symbol. If a node is labelled with nonterminal A and its
immediate offspring are labelled, left to right, with X1, X2, ..., Xn, then
A: X1 X2 ... Xn is a production.
If a sentence has more than one parse tree, which is equivalent to
having more than one leftmost derivation, the language is ambiguous.
For example, a+b+c has another leftmost derivation in addition to the
one shown above, and the resulting parse tree is the one shown on the
right in Figure 7.1.
The problem in this example is that the normal left-associativity of
+ is not captured by the grammar. The correct interpretation, which
corresponds to (a+b)+c, is given by the derivation above, and is shown
in Figure 7.1's left tree.
This problem can be solved by rewriting the grammar to use EBNF's
repetition construct so that a+b+c has only one derivation, which can be
interpreted as (a+b)+c:
expr:
    expr { + expr }
    expr { * expr }
    ID
With this change, there is only one leftmost derivation for a+b+c, but
understanding that derivation requires understanding how to apply EBNF
productions involving repetitions. A production of the form
A: β { α }
says that A derives β followed by zero or more occurrences of α. This
language is also specified by the grammar
A: β X
X: ε | α X
X derives the empty string, denoted by ε, or α followed by X. One
application of A's production followed by repeated applications of X's
productions thus derives β followed by zero or more occurrences of α.
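The equivalence can be checked on a small instance. The sketch below is not part of lcc: it assumes that β is the single terminal b, that α is the single terminal a, and that each input character is one token. match_loop follows the repetition form with a loop, and match_rec follows the form with the hidden nonterminal X; both function names are hypothetical.

```c
/* Recognize b followed by zero or more a's, written as a loop.
 * This mirrors the EBNF production  A: b { a }. */
int match_loop(const char *s) {
    if (*s != 'b')
        return 0;
    s++;
    while (*s == 'a')
        s++;
    return *s == '\0';
}

/* X: epsilon | a X.  Returns a pointer just past the text derived from X. */
static const char *X(const char *s) {
    if (*s == 'a')
        return X(s + 1);    /* X: a X */
    return s;               /* X: epsilon */
}

/* The same language via the hidden nonterminal:  A: b X. */
int match_rec(const char *s) {
    if (*s != 'b')
        return 0;
    return *X(s + 1) == '\0';
}
```

Both functions accept exactly the same strings, which is the sense in which the repetition construct abbreviates the hidden nonterminal.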
EBNF's repetition construct is an abbreviation for a hidden nonterminal
like X, but these nonterminals must be included in parse trees. It's easi-
est to do so by rewriting the grammar to include them. Adding them to
the expression grammar yields
expr:
    expr X
    expr Y
    ID
X: ε | + expr X
Y: ε | * expr Y
With this change, there's only one leftmost derivation for a+b+c:
expr ==> expr X
     ==> a X
     ==> a + expr X
     ==> a + b X
     ==> a + b + expr X
     ==> a + b + c X
     ==> a + b + c ε
The parser can interpret this derivation as is appropriate for the oper-
ators involved; here, it would choose the left-associative interpretation,
but it could also choose the other interpretation for right-associative op-
erators.
The operator * has the same problem, which can be fixed in a way
similar to that suggested above. In addition, * typically has a higher
precedence than +, so the grammar should help arrive at the correct
interpretation for sentences like a+b*c. For example, the revised grammar
given above does not work; the derivation for a+b*c is
expr ==> expr X
     ==> a X
     ==> a + expr Y
     ==> a + b Y
     ==> a + b * expr Y
     ==> a + b * c Y
     ==> a + b * c ε
The fourth derivation step can cause the expression to be interpreted as
(a+b)*c instead of a+(b*c).
The higher precedence of * can be accommodated by introducing a
separate nonterminal that derives sentences involving *, and arranging
for occurrences of this nonterminal to appear as the operands to +:
expr: term X
term: ID Y
X: ε | + term X
Y: ε | * ID Y
With this grammar, the only leftmost derivation for a+b*c is
expr ==> term X
     ==> a Y X
     ==> a ε X
     ==> a ε + term X
     ==> a ε + b Y X
     ==> a ε + b * c Y X
     ==> a ε + b * c ε X
     ==> a ε + b * c ε ε
term derives a sentential form that includes b*c, which can be inter-
preted as the right-hand operand of the sum. As detailed in Chapter 8,
this approach can be generalized to handle an arbitrary number of prece-
dence levels and both right- and left-associative operators.
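A minimal sketch of how the layered grammar fixes the grouping, assuming single-letter identifiers, one character per token, well-formed input, and the repetition form expr: term { + term }, term: ID { * ID }. The names group, expr, and term are hypothetical, and the code builds a parenthesized string only to make the grouping visible; it is not how lcc represents expressions.

```c
#include <stdio.h>
#include <string.h>

enum { BUFLEN = 256 };

static const char *src;     /* input cursor; each character is one token */

/* term: ID { * ID }
 * Leaves a fully parenthesized copy of the term in out. */
static void term(char *out) {
    char tmp[BUFLEN];

    snprintf(out, BUFLEN, "%c", *src++);            /* the first ID */
    while (*src == '*') {
        src++;                                      /* consume '*' */
        snprintf(tmp, BUFLEN, "(%s*%c)", out, *src++);
        strcpy(out, tmp);
    }
}

/* expr: term { + term }
 * Each operand of + is a complete term, so * binds tighter than +. */
static void expr(char *out) {
    char rhs[BUFLEN], tmp[BUFLEN];

    term(out);
    while (*src == '+') {
        src++;                                      /* consume '+' */
        term(rhs);
        snprintf(tmp, BUFLEN, "(%s+%s)", out, rhs); /* left-associative */
        strcpy(out, tmp);
    }
}

/* Returns the grouping of s in a static buffer. */
const char *group(const char *s) {
    static char result[BUFLEN];

    src = s;
    expr(result);
    return result;
}
```

Under these assumptions, group("a+b*c") yields (a+(b*c)) and group("a+b+c") yields ((a+b)+c), which are the interpretations the text calls correct.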
The grammar manipulations described above are usually omitted, and
the appropriate EBNF grammar is written directly. For example, the ex-
pression grammar shown in Section 1.6 completes the expression gram-
mar shown here.
Other ambiguities can be handled by rewriting the grammar, but it's
often easier to resolve them in an ad hoc fashion by simply choosing
one of the possible interpretations and writing the code to treat other
interpretations as errors. An example is the dangling-else ambiguity in
the if statement:
stmt:
    if ( expr ) stmt
    if ( expr ) stmt else stmt
Nested if statements have two derivations: one in which the else part is
associated with the outermost if, and one in which the else is associated
with the innermost if, which is the usual interpretation. As shown in
Chapter 10, this ambiguity is handled by parsing the else part as soon
as it's seen, which has the effect of choosing the latter interpretation.

7.3 Top-Down Parsing


Grammars define the rules for generating the sentences in a language.
These rules can also be used to recognize sentences. As suggested above,
a parser is a program that recognizes a sentence in a given language by
reconstructing the derivation for the sentence. During the recognition
process, the parser reconstructs the parse tree for the sentence, which is
equivalent to recognizing the derivation. In practice, most parsers do not
construct an explicit tree. Instead, they construct an equivalent internal
representation or simply perform some semantic processing at the points
at which a node would have otherwise been created.
All practical parsers read their input from left to right, but different
kinds of parsers construct parse trees differently. Top-down parsers
reconstruct a leftmost derivation for a sentence by beginning with the
start nonterminal and guessing at the next derivation step. The next
token in the input is used to help select the production to apply as the
next derivation step. For example, the grammar
S: c A d
A: a b
   a
defines the language {cabd, cad}. Suppose a parser for this language is
presented with the input cad. The c suggests the application of the (one
and only) production for S, and the initial step in the derivation is
S ==> c A d

and, since the token c matches the first symbol in the selected produc-
tion, the input is advanced by one token. For the next step, the parser
must choose and apply a production for A. The next input token is a,
so the first production for A is a plausible choice, and the derivation
becomes
S ==> c A d
  ==> c a b d
Again, the input is advanced since the input a matches the a in the pro-
duction for A. At this point, the parser is stuck because the next input
token, d, does not match the next symbol in the current derivation step,
b. The problem is that the wrong production for A was chosen. The
parser backs up to the previous step, backing up the input that was con-
sumed in the erroneous step, and applies the other production for A:
S ==> c A d
  ==> c a d

The input, which was backed up to the a, matches the remainder of the
symbols in the derivation step, and the parser announces success.
As illustrated by this simple example, a top-down parser uses the next
input token to select an applicable production, and consumes input to-
kens as long as they match terminals in the derivation step. When a non-
terminal is encountered in the right-hand side of a derivation, the next
derivation step is made. This example also illustrates a pitfall of top-
down parsing: applying the wrong production and having to backtrack
to a previous step. For even a moderately complicated language, such
backtracking could cause many steps to be reversed. More important,
most of the side effects that can occur in derivation steps are difficult
and costly to undo. Backing up the input an arbitrary distance and un-
doing symbol-table insertions are examples. Also, such backtracking can
make recognition very slow; in the worst case, the running time can be
exponential in the number of tokens in the input.
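The backing up described above can be sketched for the grammar S: c A d, A: a b | a. This is an illustration only, assuming one character per token; the names term, parse_A, and recognize are hypothetical. parse_A restores the input position when its production fails, and recognize retries S with the other production for A.

```c
#include <stddef.h>

/* Match the terminal c at in[*pos], advancing the input on success. */
static int term(const char *in, size_t *pos, char c) {
    if (in[*pos] == c) {
        (*pos)++;
        return 1;
    }
    return 0;
}

/* A: a b | a.  alt selects which production to try. */
static int parse_A(const char *in, size_t *pos, int alt) {
    size_t save = *pos;

    if (alt == 0 && term(in, pos, 'a') && term(in, pos, 'b'))
        return 1;
    if (alt == 1 && term(in, pos, 'a'))
        return 1;
    *pos = save;            /* back up the input consumed by the failed try */
    return 0;
}

/* S: c A d.  Returns 1 if the whole string is a sentence. */
int recognize(const char *in) {
    int alt;

    for (alt = 0; alt < 2; alt++) {
        size_t pos = 0;     /* each retry restarts S from the beginning */

        if (term(in, &pos, 'c') && parse_A(in, &pos, alt)
            && term(in, &pos, 'd') && in[pos] == '\0')
            return 1;
    }
    return 0;
}
```

For the input cad, the first attempt consumes c and a, fails to match b, and is abandoned; the second attempt succeeds. With more nonterminals, the number of such retries multiplies, which is the source of the exponential worst case.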
Top-down parsing techniques are practical only in cases where back-
tracking can be avoided completely. This constraint restricts top-down
parsers to languages in which the appropriate production for the next
derivation step can be chosen correctly by looking at just the next to-
ken in the input. Fortunately, many programming languages, including
C, satisfy this constraint.
A common technique for implementing top-down parsers is to write
a parsing function for each nonterminal in the grammar, and to call that
function when a production for the nonterminal is to be applied. Natu-
rally, parsing functions must be recursive, since they might be applied
recursively. That is, there might be a derivation of the form
A ==> ... ==> α A β ==> ...
where α and β are strings of grammar symbols. Top-down parsers written
using this strategy are called recursive-descent parsers because they
emulate a descent of the parse tree by calling recursive functions at each
node.
The derivation is not constructed explicitly. The call stack that han-
dles the calls to recursive functions records the state of the derivation
implicitly. For each nonterminal, the corresponding function encodes
the right-hand side of each production as a sequence of comparisons
and calls. Terminals appearing in a production become comparisons of
the token expected with the current input token, and nonterminals in the
production become calls to the corresponding functions. For example,
assuming that gettok returns the appropriate tokens for the language
above, the function for the production S: c A d is
int S(void) {
    if (t == 'c') {
        t = gettok();
        if (A() == 0)
            return 0;       /* A failed, so S fails too */
        if (t == 'd') {
            t = gettok();
            return 1;
        } else
            return 0;
    } else
        return 0;
}

A and S return one if they recognize sentences derivable from A and
S, and they return zero otherwise. Parsing is initiated by main calling
gettok to get the first token, and then calling S:
int t;
void main(void) {
    t = gettok();
    if (S() == 0)
        error("syntax error\n");
    if (t != EOI)
        error("syntax error\n");
}

EOI is the token code for the end of input; the input is valid only if all
of it is a sentence in the language.
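The text gives S and main but not A or gettok. The following sketch fills them in under stated assumptions: each input character is one token, EOI is zero, and A is written from the left-factored form A: a [ b ] so that one token of lookahead suffices. The function parse and the variable input are hypothetical names standing in for main and the real lexical analyzer.

```c
enum { EOI = 0 };               /* assumed token code for end of input */

static const char *input;       /* hypothetical input cursor */
static int t;                   /* current token, as in the text */

/* Stand-in lexer: each character is a token; the terminating NUL is EOI. */
static int gettok(void) {
    return *input ? *input++ : EOI;
}

/* A: a [ b ], the left-factored form of A: a b | a. */
static int A(void) {
    if (t != 'a')
        return 0;
    t = gettok();
    if (t == 'b')               /* optional b */
        t = gettok();
    return 1;
}

/* S: c A d */
static int S(void) {
    if (t != 'c')
        return 0;
    t = gettok();
    if (A() == 0)
        return 0;
    if (t != 'd')
        return 0;
    t = gettok();
    return 1;
}

/* Returns 1 if all of s is a sentence in the language, 0 otherwise. */
int parse(const char *s) {
    input = s;
    t = gettok();
    return S() && t == EOI;
}
```

Because A decides on the b only after consuming the a, no input is ever backed up.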

7.4 FIRST and FOLLOW Sets


In order to write the parsing functions for each nonterminal in a gram-
mar, it must be possible to select the appropriate production by looking
at just the next token in the input. Given a string of grammar symbols
α, FIRST(α) is the set of terminals that begin all sentences derived from
α. The FIRST sets help select the appropriate production in a derivation
step.
Suppose the grammar contains the productions A: α and A: β, and
the next derivation step is the replacement of A by the right-hand side
of one of its productions. The parsing function for A is called, and it
must select the appropriate production. If the next token is in FIRST(α),
the production A: α is selected, and if the next token is in FIRST(β), A: β
is selected. If the next token is not in FIRST(α) ∪ FIRST(β), there is a
syntax error. Clearly, FIRST(α) and FIRST(β) cannot intersect.
When α is simply a nonterminal, FIRST(α) is the set of terminals
that begin sentences derivable from that nonterminal. Given a grammar,
FIRST sets for each grammar symbol X can be computed by inspecting
the productions. This inspection is an iterative process; it is repeated
until nothing new is added to any of the FIRST sets.
If X is a terminal a, FIRST(X) is {a}. If X is a nonterminal and there
is a production X: aα, where a is a terminal, a is added to FIRST(X). If
there are productions of the form X: [α] or X: {α}, ε and FIRST(α) are
added to FIRST(X); ε is added because these ε-productions can derive
the empty string. If there are productions of the form
X: α1
   α2
   ...
   αk
then
FIRST(α1) ∪ FIRST(α2) ∪ ... ∪ FIRST(αk)
is added to FIRST(X). If there is a production of the form X: Y1 Y2 ... Yk,
where Yi are grammar symbols, then FIRST(Y1 Y2 ... Yk) is added to
FIRST(X).
FIRST(Y1 Y2 ... Yk) depends on the FIRST sets for Y1 through Yk. All
of the elements of FIRST(Y1) except ε are added to FIRST(Y1 Y2 ... Yk),
which is initially empty. If FIRST(Y1) contains ε, all of the elements of
FIRST(Y2) except ε are also added. This process is repeated, adding all
of the elements of FIRST(Yi) except ε if FIRST(Yi-1) contains ε. The
resulting effect is that FIRST(Y1 Y2 ... Yk) contains the elements of the
FIRST sets for the transparent Yi's, where a FIRST set is transparent if
it contains ε. If all of the FIRST sets for Y1 through Yk contain ε, ε is
added to FIRST(Y1 Y2 ... Yk).
Consider the grammar for simple expressions given in Section 1.6:
expr:
    term { + term }
    term { - term }
term:
    factor { * factor }
    factor { / factor }
factor:
    ID
    ID '(' expr { , expr } ')'
    '(' expr ')'

This grammar has been rewritten to express alternatives as separate
productions given on separate lines. FIRST(expr) is equal to
FIRST(term { + term }) ∪ FIRST(term { - term })

which cannot be computed until the value of FIRST(term) is known.
Likewise, FIRST(term) is
FIRST(factor { * factor }) ∪ FIRST(factor { / factor })

FIRST(factor), however, is easy to compute because all of the
productions for factor start with terminals:
FIRST(factor) = FIRST(ID) ∪ FIRST(ID '(' expr { , expr } ')')
                ∪ FIRST('(' expr ')')
              = { ID ( }

Now FIRST(term) can be computed and is { ID ( }; FIRST(expr) is also
{ ID ( }.
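The iteration can be sketched in C. This is an illustration rather than anything lcc does at run time, and it rests on several assumptions: the grammar is first rewritten without repetitions, using hypothetical helper nonterminals X, Y, and L for the repeated parts; nonterminals are uppercase letters (E, T, F for expr, term, factor); the letter i stands for ID; and an empty right-hand side denotes the empty string.

```c
#include <string.h>

/* The expression grammar, one production per string:
 *   E: T X        X: + T X | - T X | empty
 *   T: F Y        Y: * F Y | / F Y | empty
 *   F: i | i ( E L ) | ( E )
 *   L: , E L | empty                                  */
static const char *prods[] = {
    "E:TX", "X:+TX", "X:-TX", "X:",
    "T:FY", "Y:*FY", "Y:/FY", "Y:",
    "F:i",  "F:i(EL)", "F:(E)",
    "L:,EL", "L:"
};
enum { NPRODS = sizeof prods / sizeof prods[0] };

static char first[26][128];  /* first[A - 'A'][c] is 1 if terminal c is in FIRST(A) */
static char empty[26];       /* empty[A - 'A'] is 1 if A derives the empty string */

static int is_nonterminal(int c) { return c >= 'A' && c <= 'Z'; }

/* Add terminal c to FIRST(a); returns 1 if the set changed. */
static int add(int a, int c) {
    if (first[a][c])
        return 0;
    first[a][c] = 1;
    return 1;
}

/* Repeat the inspection of all productions until nothing new is added. */
void compute_first(void) {
    int changed, i, c;

    memset(first, 0, sizeof first);
    memset(empty, 0, sizeof empty);
    do {
        changed = 0;
        for (i = 0; i < NPRODS; i++) {
            int a = prods[i][0] - 'A';
            const char *p = prods[i] + 2;

            for (; *p; p++) {
                if (!is_nonterminal(*p)) {      /* a terminal ends the scan */
                    changed |= add(a, *p);
                    break;
                }
                for (c = 0; c < 128; c++)       /* copy FIRST of this symbol */
                    if (first[*p - 'A'][c])
                        changed |= add(a, c);
                if (!empty[*p - 'A'])           /* not transparent: stop */
                    break;
            }
            if (*p == '\0' && !empty[a]) {      /* every symbol was transparent */
                empty[a] = 1;
                changed = 1;
            }
        }
    } while (changed);
}

/* Recomputes the sets on each call, which keeps the sketch simple. */
int in_first(char nonterminal, char terminal) {
    compute_first();
    return first[nonterminal - 'A'][(int)terminal];
}
```

For E, T, and F the computed sets contain exactly i and (, which agrees with the sets derived by hand in the text.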
There is one case in which the FIRST sets are not enough to determine
which production to apply. Suppose a grammar contains the productions
X: A B
   C
Normally, the appropriate production would be selected depending on
whether the next token is in FIRST(A B) or FIRST(C). Suppose, however,
that FIRST(A B) contains ε, meaning that A B can derive the empty
sentence. Then selecting the appropriate production depends not only
on FIRST(A B) and FIRST(C), but also on the tokens that can follow X.
This set of tokens is the FOLLOW set for X; that is, FOLLOW(X) is the
set of terminals that can immediately follow nonterminal X in any
sentential form. The FOLLOW sets give the "right context" for the
nonterminal symbols, and are used in error detection as well as in
structuring the grammar so that it is suitable for recursive-descent parsing.
In this example, the first production is selected if the next token is in
FIRST(A B) ∪ FOLLOW(X), and the second production is selected if the next
token is in FIRST(C). Of course, FIRST(A B) ∪ FOLLOW(X) must be
disjoint from FIRST(C).
FOLLOW sets are harder to compute than FIRST sets, primarily because
it is necessary to inspect all productions in which a nonterminal is
used instead of just the productions that define the nonterminal. For all
productions of the form X: α Y β, FIRST(β) - {ε} is added to FOLLOW(Y).
If FIRST(β) is transparent, that is, if it contains ε, FOLLOW(X) is added
to FOLLOW(Y). For all productions of the form X: α Y, FOLLOW(X) is
added to FOLLOW(Y). As for computing FIRST sets, computing FOLLOW
sets is an iterative process; it is repeated until nothing new is added to
any FOLLOW set. The end-of-file marker, ⊣, is included in the FOLLOW
set of the start symbol.
Here's how the FOLLOW sets are computed for the expression grammar.
Since expr is the start symbol, FOLLOW(expr) contains ⊣. expr
appears in only the productions for factor, so
FOLLOW(expr) = {⊣} ∪ FIRST({ , expr } ')') ∪ FIRST(')')
             = { , ) ⊣ }

FIRST(')') contributes ) to FOLLOW(expr), but FIRST({ , expr }) contains
ε, so FIRST({ , expr } ')') contributes ) as well.
term appears in two places in the two productions for expr, so
FOLLOW(term) = FOLLOW(expr)
               ∪ FIRST({ + term }) ∪ FIRST({ - term })
             = { , ) ⊣ + - }

Similarly, factor appears twice in each of the productions for term:
FOLLOW(factor) = FOLLOW(term)
                 ∪ FIRST({ * factor }) ∪ FIRST({ / factor })
               = { , ) ⊣ + - * / }
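The FOLLOW computation can be sketched the same way. The assumptions are the sketch's own: the expression grammar is rewritten without repetitions using hypothetical helper nonterminals X, Y, and L; E, T, and F stand for expr, term, and factor; i stands for ID; sets are bit vectors over the string terminals; and the character $ plays the role of the end-of-file marker.

```c
#include <string.h>

static const char *prods[] = {
    "E:TX", "X:+TX", "X:-TX", "X:",
    "T:FY", "Y:*FY", "Y:/FY", "Y:",
    "F:i",  "F:i(EL)", "F:(E)",
    "L:,EL", "L:"
};
enum { NPRODS = sizeof prods / sizeof prods[0] };

static const char terminals[] = "+-*/i(),$";    /* '$' is the end-of-file marker */
static unsigned first[26], follow[26];          /* bit k set: terminals[k] is a member */
static int empty[26];                           /* nonzero if the nonterminal is transparent */

static int is_nonterminal(int c) { return c >= 'A' && c <= 'Z'; }
static unsigned bit(int c) { return 1u << (int)(strchr(terminals, c) - terminals); }

/* FIRST of the symbol string s; *eps is set to 1 if s can derive the empty string. */
static unsigned first_of(const char *s, int *eps) {
    unsigned set = 0;

    *eps = 0;
    for (; *s; s++) {
        if (!is_nonterminal(*s))
            return set | bit(*s);
        set |= first[*s - 'A'];
        if (!empty[*s - 'A'])
            return set;
    }
    *eps = 1;
    return set;
}

void compute_follow(void) {
    int changed, i, eps;

    memset(first, 0, sizeof first);
    memset(follow, 0, sizeof follow);
    memset(empty, 0, sizeof empty);
    do {                                /* FIRST sets, needed below */
        changed = 0;
        for (i = 0; i < NPRODS; i++) {
            int a = prods[i][0] - 'A';
            unsigned f = first[a] | first_of(prods[i] + 2, &eps);

            if (f != first[a] || (eps && !empty[a]))
                changed = 1;
            first[a] = f;
            empty[a] |= eps;
        }
    } while (changed);

    follow['E' - 'A'] = bit('$');       /* the start symbol is followed by end of file */
    do {
        changed = 0;
        for (i = 0; i < NPRODS; i++) {
            int x = prods[i][0] - 'A';
            const char *p;

            for (p = prods[i] + 2; *p; p++) {   /* every use of a nonterminal */
                unsigned f;

                if (!is_nonterminal(*p))
                    continue;
                f = follow[*p - 'A'] | first_of(p + 1, &eps);
                if (eps)                        /* the rest is transparent */
                    f |= follow[x];
                if (f != follow[*p - 'A']) {
                    follow[*p - 'A'] = f;
                    changed = 1;
                }
            }
        }
    } while (changed);
}

/* Recomputes the sets on each call, which keeps the sketch simple. */
int in_follow(char nonterminal, char terminal) {
    compute_follow();
    return (follow[nonterminal - 'A'] & bit(terminal)) != 0;
}
```

The sets computed for E, T, and F match the ones derived by hand above, with $ in place of the end-of-file marker.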

7.5 Writing Parsing Functions


Equipped with an EBNF grammar for a language and the FIRST and FOL-
LOW sets for each nonterminal, writing parsing functions amounts to
translating the productions for each nonterminal into executable code.
The idea is to write a function X for each nonterminal X, using the pro-
ductions for X as a guide to writing the code for X.
The rules for this translation are derived from the possible forms for
the productions in the grammar. For each form of production, α, T(α)
denotes the translation, that is, the code, for α. At any point during parsing,
the global variable t contains the current token as read by the lexical
analyzer. Input is advanced by calling gettok.
Given the production X: α, the parsing function X is
X() { T(α) }
The right column of Table 7.1 gives T(α) for each form of production
component α listed in the left column, where
D(α) = (FIRST(α) - {ε}) ∪ FOLLOW(X)   if ε ∈ FIRST(α)
       FIRST(α)                        otherwise

α                      T(α)
terminal A             if (t == A) t = gettok();
                       else error
nonterminal X          X();
α1 | α2 | ... | αk     if (t ∈ D(α1)) T(α1)
                       else if (t ∈ D(α2)) T(α2)
                       ...
                       else if (t ∈ D(αk)) T(αk)
                       else error
α1 α2 ... αk           T(α1) T(α2) ... T(αk)
[ α ]                  if (t ∈ D(α)) T(α)
{ α }                  while (t ∈ D(α)) T(α)

TABLE 7.1 Parsing function translations.

There are, of course, other code sequences that are equivalent to those
given in Table 7.1. For example, a switch statement is often used for
T(α1 | α2 | ... | αk). Also, rote application of the sequences given
in Table 7.1 sometimes leads to redundant code, which can be improved
by simple transformations. For example, the body of the parsing func-
tion for
parameter-list: [ ID { , ID } ]
is derived by applying the rules in Table 7.1 in the following seven steps.
1. T(parameter-list)
2. T([ ID { , ID } ])

3. if (t == ID) { T(ID { , ID }) }
4. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       T({ , ID })
   }

5. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       while (t == ',') { T(, ID) }
   }

6. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       while (t == ',') {
           if (t == ',') t = gettok();
           else error("missing ,\n");
           T(ID)
       }
   }

7. if (t == ID) {
       if (t == ID) t = gettok();
       else error("missing identifier\n");
       while (t == ',') {
           if (t == ',') t = gettok();
           else error("missing ,\n");
           if (t == ID) t = gettok();
           else error("missing identifier\n");
       }
   }

The test in the second if statement in step 4, t == ID, is redundant; it
must be true if control reaches that if statement. Similarly, the test for a
comma in the first if statement in the while loop in step 6 is unnecessary.
This function can be simplified to
void parameter_list(void) {
    if (t == ID) {
        t = gettok();
        while (t == ',') {
            t = gettok();
            if (t == ID) t = gettok();
            else error("missing identifier\n");
        }
    }
}
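To try the simplified function, it needs a lexical analyzer and an error routine. The sketch below supplies stand-ins for both; they are assumptions, not lcc's code. Identifiers are runs of letters, a comma stands for itself, report plays the role of error and counts diagnostics, and count_errors is a hypothetical driver that also flags tokens left over after the parameter list.

```c
#include <ctype.h>
#include <stdio.h>

enum { EOI = 0, ID = 1 };       /* assumed token codes; a comma stands for itself */

static const char *cp;          /* hypothetical input cursor */
static int t;                   /* current token */
static int nerrors;             /* number of diagnostics issued */

/* Stand-in lexer: skips blanks and folds a run of letters into ID. */
static int gettok(void) {
    while (*cp == ' ')
        cp++;
    if (*cp == '\0')
        return EOI;
    if (isalpha((unsigned char)*cp)) {
        while (isalpha((unsigned char)*cp))
            cp++;
        return ID;
    }
    return *cp++;
}

/* Stand-in for error: print the message and count it. */
static void report(const char *msg) {
    fputs(msg, stderr);
    nerrors++;
}

/* parameter-list: [ ID { , ID } ] */
static void parameter_list(void) {
    if (t == ID) {
        t = gettok();
        while (t == ',') {
            t = gettok();
            if (t == ID) t = gettok();
            else report("missing identifier\n");
        }
    }
}

/* Parses input as a parameter list and returns the number of diagnostics. */
int count_errors(const char *input) {
    cp = input;
    nerrors = 0;
    t = gettok();
    parameter_list();
    if (t != EOI)
        report("unexpected token after parameter list\n");
    return nerrors;
}
```

An empty string is accepted without complaint because the whole production is optional, while "a,,b" draws one diagnostic for the identifier missing between the commas.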

Left factoring is often taken into account when the parsing function is
written instead of rewriting the grammar and adding new nonterminals
as described above. For example, A: αβ | αγ is equivalent to A: α(β | γ),
so the code for T(αβ | αγ) can be written directly as
T(α) T(β | γ)
In a few cases, α appears as a common prefix in several productions,
and involves significant semantic processing. In such cases, introducing
a new nonterminal and left factoring the relevant productions
encapsulates that processing in a single parsing function.

7.6 Handling Syntax Errors


The FIRST and FOLLOW sets and subsets thereof are used not only to
guide parsing decisions but also for detecting errors. There are two ma-
jor types of errors: syntax errors and semantic errors. The former occur
when the input is not a sentence in the language. The latter occur when
the input is a sentence, but is meaningless. For example, the expression
x = 6 is syntactically correct, but if x is not declared, the expression is
semantically incorrect.
Semantic errors are detected and handled by each parsing function in
accordance with the semantics of the specific construct. Such errors are
described along with the implementation of the functions.
Syntax errors can be handled in a systematic fashion regardless of
the context in which they occur. Detecting syntax errors is relatively
easy; such errors occur at the error indications in the translations shown
in Table 7.1. Recovering from syntax errors is more difficult, however.
Since it is unreasonable to stop parsing after the first syntax error, most
of the effort in error handling is devoted to recovering from errors so
that parsing can continue.
A syntax error indicates the presence of a sentence that is not in the
language. Recovering from a syntax error is possible only if the erro-
neous input can be converted to a sentence by making appropriate as-
sumptions about missing tokens or by ignoring some of the input. Un-
fortunately, choosing the appropriate course of action is nontrivial. Poor
choices may cause the parser to get completely out of step and cause syn-
tax errors to cascade even if the subsequent input is syntactically cor-
rect. Even worse, naive error recovery may fail to make forward progress
through the input.
The structure of recursive-descent parsers assists in choosing the ap-
propriate error-recovery strategy. The parser is composed of many pars-
ing functions, each of which contributes a small part to the overall goal
of parsing the input. Thus the major goal is split into many subgoals,
each handled by calling on other parsing functions. In order to continue
parsing, each function is written to guarantee that the next token in the
input can legally follow its nonterminal in a sentential form. If an error
is detected, the parsing function reports the error and discards tokens
until it encounters one that can legally follow its nonterminal.
One approach to implementing this technique is to have X, the pars-
ing function for the nonterminal X, ignore input until it encounters a
token in FOLLOW(X). The goal is to resynchronize the parser at a point
in the input from which it can continue. After advancing to a token in
FOLLOW(X), it will appear that all is well to X's caller. One problem
with this naive approach is that it doesn't account for the particular sen-
tential form in which this occurrence of X appears. When X appears
in the sentential form αXβ, X should ignore tokens until it encounters
one in D(β), which is often smaller than FOLLOW(X). If X stops
discarding tokens when it
finds one in FOLLOW(X) but not in D(β), it is stopping too early and its
caller will announce another syntax error unnecessarily. Thus, parsing
functions use sets like D(β) whenever they are readily known, and use
FOLLOW(X) otherwise. For example, when expr0, one of the parsing
functions for expressions, is called to parse the third expression in the
for statement, the set {; ) } is used when it recovers from a syntax error
in the expression.
This strategy is encapsulated in the functions exported by error.c.
(error.c exported functions)=
extern void test ARGS((int tok, char set[]));
checks if the next token is equal to tok; if it isn't, a message is issued and
tokens are skipped until one in {tok} u set is encountered. set is the
set of tokens that should not be skipped, and ensures that the amount
of input skipped is limited. A set is simply a null-terminated array of
token codes.
(error.c functions)=
void test(tok, set) int tok; char set[]; {
    if (t == tok)
        t = gettok();
    else {
        expect(tok);
        skipto(tok, set);
        if (t == tok)
            t = gettok();
    }
}

test issues messages by calling expect and skips tokens by calling
skipto, both of which are described below.
The strategy embodied in test works well when the compiler is faced
with errors for which skipping some of the input is an appropriate action.
It does not work well, however, when an expected token is missing from
the input. In those cases, a more effective strategy is to issue a message,
pretend the expected token was present, and continue parsing. This
scheme effectively inserts missing tokens, and it works well because such
errors are almost always caused by the omission of tokens that have
only simple syntactic functions, such as semicolons and commas. This
strategy is implemented by
(error.c exported functions)+=
extern void expect ARGS((int tok));
which checks if the next token, which is the current value of t, is equal
to tok and, if so, advances the input.
(error.c functions)+=
void expect(tok) int tok; {
    if (t == tok)
        t = gettok();
    else {
        error("syntax error; found");
        printtoken();
        fprint(2, "expecting '%k'\n", tok);
    }
}

The first test is, of course, never true when expect is called from test;
that call is made to issue the diagnostic. expect is also called from other
parsing functions whenever a specific token is expected, and it consumes
that token. If the expected token is missing, expect issues a diagnostic
and returns without advancing the input, as if the expected token had
been present.
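The two recovery strategies can be exercised in isolation with the following sketch. It is a simplification under stated assumptions, not lcc's error.c: each input character is one token, the set passed to the skipping function lists tokens directly, diagnostics go to stderr, and test_token stands in for test. The driver recover is hypothetical; it returns the token at which parsing would resume.

```c
#include <stdio.h>

enum { EOI = 0 };               /* assumed token code for end of input */

static const char *cp;          /* hypothetical input; one character per token */
static int t;                   /* current token */
static int errcnt;              /* number of diagnostics issued */

static int gettok(void) {
    return *cp ? *cp++ : EOI;
}

/* Simplified skipto: discard tokens until tok, a member of set, or EOI. */
static void skipto(int tok, const char *set) {
    const char *s;

    for (; t != EOI && t != tok; t = gettok()) {
        for (s = set; *s && *s != t; s++)
            ;
        if (*s == t)            /* t must not be skipped */
            break;
    }
}

/* Consume tok if present; otherwise complain and pretend it was there. */
static void expect(int tok) {
    if (t == tok)
        t = gettok();
    else {
        errcnt++;
        fprintf(stderr, "syntax error; expecting '%c'\n", tok);
    }
}

/* Stand-in for test: consume tok, or complain and resynchronize. */
static void test_token(int tok, const char *set) {
    if (t == tok)
        t = gettok();
    else {
        expect(tok);            /* issues the diagnostic */
        skipto(tok, set);
        if (t == tok)
            t = gettok();
    }
}

/* Runs test_token on input and returns the token where parsing resumes. */
int recover(const char *input, int tok, const char *set) {
    cp = input;
    errcnt = 0;
    t = gettok();
    test_token(tok, set);
    return t;
}
```

With the input "xy;z", an expected ';', and the set "}", the tokens x and y are discarded and parsing resumes at z. With "xy}z", skipping stops at the '}' because it is in the set, which limits how much input is thrown away.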
expect calls error to begin the message, and it calls the static function
printtoken to print the current token (i.e., the token given by t and
token), and fprint to conclude the message. As an example of expect's
effect, the input "int x[5;" draws the diagnostic
syntax error; found ';' expecting ']'
Error messages are initiated by calling error, which is called with a
printf-style format string and arguments. In addition to the message,
error prints the coordinates of the current token set by gettok and
keeps a count of the number of error messages issued in errcnt.
(error.c functions)+=
void error VARARGS((char *fmt, ...),
(fmt, va_alist),char *fmt; va_dcl) {
    va_list ap;

    if (errcnt++ >= errlimit) {
        errcnt = -1;
        error("too many errors\n");
        exit(1);
    }
    va_init(ap, fmt);
    if (firstfile != file && firstfile && *firstfile)
        fprint(2, "%s: ", firstfile);
    fprint(2, "%w: ", &src);
    vfprint(2, fmt, ap);
    va_end(ap);
}
(error.c data)=
int errcnt = 0;
int errlimit = 20;
If errcnt gets too big, error terminates execution. warning, which issues
warning diagnostics, is similar, but it doesn't increment errcnt. fatal
is similar to error, but terminates compilation after issuing the error
message. fatal is called only for bona fide compiler bugs.
The last error-handling function is
(error.c exported functions)+=
extern void skipto ARGS((int tok, char set[]));
which discards tokens until a token t is found that is equal to tok or for
which kind[t] is in the null-terminated array set. The array
(error.c exported data)=
extern char kind[];
is indexed by token codes, and partitions them into sets. It's defined
by including token.h and extracting its sixth column, as described on
page 109:
(error.c data)+=
char kind[] = {
#define xx(a,b,c,d,e,f,g) f,
#define yy(a,b,c,d,e,f,g) f,
#include "token.h"
};

kind[t] is a token code that denotes a set of which t is a member.
For example, the code ID is used to denote the set FIRST(expression)
for the expression defined in Section 8.3. Thus, kind[t] is equal to ID
for every t ∈ FIRST(expression). The test kind[t]==ID determines if
the token t is in FIRST(expression), so passing the array {ID,0} as the
second argument to skipto causes it to skip tokens until it finds one in
FIRST(expression).
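The membership test can be sketched on its own. This is a minimal sketch that assumes a handful of made-up token codes; lcc's real codes and kind values come from token.h. The function in_set is hypothetical and simply packages the test that skipto performs on kind[t].

```c
#include <assert.h>

/* Hypothetical token codes; lcc's real codes come from token.h. */
enum { EOI = 0, ID = 1, ICON, IF, WHILE, RBRACE, SEMI, NTOKENS };

/* kind[t] names the set that token t belongs to. */
static const char kind[NTOKENS] = {
    [EOI]    = EOI,
    [ID]     = ID,      /* ID denotes FIRST(expression) in this sketch */
    [ICON]   = ID,
    [IF]     = IF,      /* IF denotes the statement keywords */
    [WHILE]  = IF,
    [RBRACE] = RBRACE,  /* tokens outside any set map to themselves */
    [SEMI]   = SEMI,
};

/* Return 1 if kind[t] appears in the null-terminated array set, else 0. */
static int in_set(int t, const char set[]) {
    const char *s;

    assert(t >= 0 && t < NTOKENS);
    for (s = set; *s; s++)
        if (kind[t] == *s)
            return 1;
    return 0;
}
```

Because the array is null-terminated, a token whose code is zero (EOI here) can never be a member of a set.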
The following table summarizes the values in kind. The token code on
the left denotes the set composed of itself and the tokens on the right.
ID      FCON ICON SCON SIZEOF & ++ -- * + - ~ ( !
CHAR    FLOAT DOUBLE SHORT INT UNSIGNED SIGNED
        VOID STRUCT UNION ENUM LONG CONST VOLATILE
STATIC  EXTERN AUTO REGISTER TYPEDEF
IF      BREAK CASE CONTINUE DEFAULT DO ELSE
        FOR GOTO RETURN SWITCH WHILE {
For tokens not mentioned in this table, kind[t] is equal to t; for example,
kind['}'] is equal to '}'. The sets defined by kind are related to FIRST
sets described in Section 7.4 as follows.

kind[ID]                     =  FIRST(expression)
kind[ID] ∪ kind[IF]          =  FIRST(statement)
kind[CHAR] ∪ kind[STATIC]    ⊂  FIRST(declaration)
kind[STATIC]                 ⊂  FIRST(parameter)

The nonterminals listed above are defined in Chapters 8, 10, and 11.
Since skipto's second argument is an array, it can represent supersets
of these sets when the additional tokens have kind values equal to
themselves, as exemplified above by }. These supersets are related to
FOLLOW sets in some cases. For example, a statement must be followed
by a } or a token in FIRST(statement). The parsing function for
statement thus passes skipto an array that holds IF, ID, and }.
As skipto discards tokens, it announces the first eight and the last
one it discards:
(error.c functions)+=
void skipto(tok, set) int tok; char set[]; {
    int n;
    char *s;

    for (n = 0; t != EOI && t != tok; t = gettok()) {
        for (s = set; *s && kind[t] != *s; s++)
            ;
        if (kind[t] == *s)
            break;
        if (n++ == 0)
            error("skipping");
        if (n <= 8)
            printtoken();
        else if (n == 9)
            fprint(2, " ... ");
    }
    if (n > 8) {
        fprint(2, "up to");
        printtoken();
    }
    if (n > 0)
        fprint(2, "\n");
}

skipto discards nothing and issues no diagnostic if t is equal to tok or
if kind[t] is in set. Suppose bug.c holds only the one line
fprint(2, "expecting '%k'\n", tok);
The syntax error in this example is that this line must be inside a
function. The call to fprint looks like the beginning of a function
definition, but lcc soon discovers the error. test calls expect and
skipto to issue the diagnostic
bug.c:1: syntax error; found '2' expecting ')'
bug.c:1: skipping '2' ',' "expecting '%k'\12" ',' 'tok'
Notice that the right parenthesis was not discarded.
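The skipping loop can be exercised without the rest of the compiler. The sketch below is an illustration under stated assumptions: tokens come from an array rather than from lcc's lexer, the token codes and kind values are made up, the helper start is hypothetical, and the function returns the number of tokens discarded instead of printing them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; kind maps ICON into the set denoted by ID. */
enum { EOI = 0, ID = 1, ICON, COMMA, RPAREN, SEMI, NTOKENS };

static const char kind[NTOKENS] = { EOI, ID, ID, COMMA, RPAREN, SEMI };

static const int *input;   /* points at the current token; ends with EOI */
static int t;              /* current token, as in the text */

/* Begin reading from an EOI-terminated token array. */
static void start(const int *tokens) {
    input = tokens;
    t = *input;
}

/* Advance to the next token and return it; never moves past EOI. */
static int gettok(void) {
    if (*input != EOI)
        input++;
    return *input;
}

/* Discard tokens until t equals tok or kind[t] is in set.
   Returns the number of tokens discarded. */
static int skipto(int tok, const char set[]) {
    int n = 0;
    const char *s;

    assert(set != NULL);
    for (; t != EOI && t != tok; t = gettok()) {
        for (s = set; *s && kind[t] != *s; s++)
            ;
        if (*s)         /* kind[t] matched a member of set */
            break;
        n++;
    }
    return n;
}
```

As in the bug.c example, the token that stops the scan is left in t and is not discarded.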

Further Reading
There are many books that describe the theory and practice of com-
piler construction, including Aho, Sethi, and Ullman (1986), Fischer and
LeBlanc (1991), and Waite and Goos (1984). Davie and Morrison (1981)
and Wirth (1976) describe the design and implementation of recursive-
descent compilers.
A bottom-up parser reconstructs a rightmost derivation of its input,
and builds parse trees from the leaves to the roots. Bottom-up parsers
are often used in compilers because they accept a larger class of lan-
guages and because the grammars are sometimes easier to write. Most
bottom-up parsers use a variant of LR parsing, which is surveyed by Aho
and Johnson (1974) and covered in detail by Aho, Sethi, and Ullman
(1986). In addition, many parser generators have been constructed.
These programs accept a syntactic specification of the language, usually
in a form like that shown in Exercise 7.2, and produce a parsing
program. YACC (Johnson 1975) is the parser generator used on UNIX.
YACC and LEX work together, often simplifying compiler implementa-
tion considerably. Aho, Sethi, and Ullman (1986), Kernighan and Pike
(1984), and Schreiner and Friedman (1985) contain several examples of
the use of YACC and LEX. Holub (1990) describes the implementation of
another parser generator.
Other parser generators are based on attribute grammars; Waite and
Goos (1984) describe attribute grammars and related parser generators.
The error-handling techniques used in lcc are like those advocated
by Stirling (1985) and used by Wirth (1976). Burke and Fisher (1987)
describe perhaps the best approach to handling errors for LR and LL
parser tables.

Exercises
7.1 Using the lexical-analyzer and the symbol-table modules from the
previous chapters, cobble together a parser that recognizes
expressions defined by the grammar below and prints their parse trees.

expr:
    term { + term }
    term { - term }
term:
    factor { * factor }
    factor { / factor }
factor:
    ID
    ID '(' expr { , expr } ')'
    '(' expr ')'

7.2 Write a program that computes the FIRST and FOLLOW sets for an
EBNF grammar and reports conflicts that interfere with recursive-
descent parsing of the language. Design an input representation for
the grammar that is close in form to EBNF. For example, suppose
grammars are given in free format where nonterminals are in
lowercase with embedded - signs, terminals are in uppercase or
enclosed in single or double quotes, and productions are terminated
by semicolons. For example, the grammar in the previous exercise
could appear as

expr : term { ( '+' | '-' ) term } ;
term : factor { ( '*' | '/' ) term } ;
factor : ID [ '(' expr { ',' expr } ')' ]
       | '(' expr ')' ;

Give an EBNF specification for the syntax of the input, and write
a recursive-descent parser to recognize it using the techniques
described in this chapter.
8
Expressions

C expressions form a sublanguage for which the parsing functions are
relatively straightforward to write. This makes them a good starting
point for describing lcc's eight modules that collaborate to parse and
analyze the input program. These functions build an internal
representation for the input program that consists of the abstract
syntax trees and the code lists described in Section 1.3.
Four of these modules cooperate in parsing, analyzing, and representing
expressions. expr.c implements the parsing functions that recognize
and translate expressions. tree.c implements low-level functions that
manage trees, which are the internal, intermediate representation for
expressions. enode.c implements type-checking functions that ensure the
semantic validity of expressions, and it exports functions that build and
manipulate trees. simp.c implements functions that perform tree
transformations, such as constant folding.
Broadly speaking, this chapter focuses on tree.c and expr.c, and it
describes the shape of the abstract syntax trees used to represent
expressions. Much of this explanation is a top-down tour of the parsing
functions that build trees. Chapter 9 describes the meaning of these
trees as they relate to the semantics of C; most of that explanation is a
bottom-up tour of the semantics functions that type-check trees as they
are built. This chapter's last section is the exception to this general
structure; the functions it describes handle both the shape and meaning
of the leaf nodes in abstract syntax trees, which are the nodes for
constants and identifiers.

8.1 Representing Expressions


In addition to recognizing and analyzing expressions, the compiler must
build an intermediate representation of them from which it can check
their validity and generate code. Abstract syntax trees, or simply trees,
are often used to represent expressions. Abstract syntax trees are parse
trees without nodes for the nonterminals and nodes for useless
terminals. In such trees, nodes represent operators and their offspring
represent the operands. For example, the tree for (a+b)+b*(a+b) is shown
in Figure 8.1. There are no nodes for the nonterminals involved in parsing
this expression, and there are no nodes for the tokens ( and ). There
are no nodes for the tokens + and * because the nodes contain operators
(ADD+I and MUL+I) that represent operator-specific information. Nodes
labelled with the operator ADDRG+P represent computing the address of
the identifier given by their operands.

FIGURE 8.1 Abstract syntax tree for (a+b)+b*(a+b).
Trees often contain operators that do not appear in the source
language. The INDIR+I nodes, for example, fetch the integers at the
addresses specified by their operands, but there's no explicit "fetch"
operator in C. Other examples include conversion operators, which arise
because of implicit conversions, and operators that are introduced as
the result of semantic rules. Some of these operators do not have any
corresponding operation at runtime; they are introduced only to
facilitate compilation.
Like types, trees can be written in a parenthesized prefix notation; for
example, the tree for the expression (a+b)+b*(a+b) shown in Figure 8.1
can be written as
(ADD+I
(ADD+I (INDIR+I (ADDRG+P a)) (INDIR+I (ADDRG+P b)))
(MUL+I
(INDIR+I (ADDRG+P b))
(ADD+I (INDIR+I (ADDRG+P a)) (INDIR+I (ADDRG+P b)))
)
)
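The prefix notation is easy to generate by a recursive walk. The following minimal sketch assumes a simplified node, struct tnode, that stores the operator as a string and an identifier name on leaves; lcc's struct tree, defined next, stores an integer operator code instead. The names tnode and prefix are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A simplified tree node: an operator name, up to two operands, and an
   identifier name that is non-null only on ADDRG+P leaves. */
struct tnode {
    const char *op;
    const struct tnode *kids[2];
    const char *name;
};

/* Append the parenthesized prefix form of p to the string in buf,
   which has room for size bytes including the terminator. */
static void prefix(const struct tnode *p, char *buf, size_t size) {
    size_t n;
    int i;

    assert(p != NULL && size > 0);
    n = strlen(buf);
    snprintf(buf + n, size - n, "(%s", p->op);
    if (p->name) {
        n = strlen(buf);
        snprintf(buf + n, size - n, " %s", p->name);
    }
    for (i = 0; i < 2 && p->kids[i]; i++) {
        strncat(buf, " ", size - strlen(buf) - 1);
        prefix(p->kids[i], buf, size);
    }
    strncat(buf, ")", size - strlen(buf) - 1);
}
```

Applied to a tree for a+b, the walk yields the same text as the first operand of the outer ADD+I shown above.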

Parsing an expression yields a tree whose nodes are defined by

(tree.c typedefs)=
typedef struct tree *Tree;

(tree.c exported types)=
struct tree {
    int op;
    Type type;
    Tree kids[2];
    Node node;
    union {
        (u fields for Tree variants 168)
    } u;
};

The op field holds a code for the operator, the type field points to a Type
for the type of the result computed by the node at runtime, and kids
point to the operands. The node field is used to build dags from trees
as detailed in Section 12.2. Trees for some operators have additional
information tucked away in the fields of their u unions.
The operators form a superset of the node operators described in
Chapter 5 and listed in Table 5.1 (page 84), but they are written
differently to emphasize their use in trees. An operator is formed by
adding a type suffix to a generic operator; for example, ADD+I denotes
integer addition. The + is omitted when referring to the corresponding
node operator ADDI; this convention helps distinguish between trees and
dags in figures and in prose. The type suffixes are listed in Section 5.5,
and Table 5.1 gives the allowable suffixes for each operator.
Table 8.1 lists the six operators that appear in trees in addition to
those shown in Table 5.1. AND, OR, and NOT represent expressions
involving the &&, ||, and ! operators. Comma expressions yield RIGHT
trees; by definition, RIGHT evaluates its arguments left to right, and its
value is the value of its rightmost operand. RIGHT is also used to build
trees that logically have more than two operands, such as the COND
operator, which represents conditional expressions of the form
c ? e1 : e2. The first operand of a COND tree is c and the second is a
RIGHT tree that holds e1 and e2. RIGHT trees are also used for
expressions such as e++. These operators are used only by the front end
and thus do not need, and must not have, type suffixes. The FIELD
operator identifies a reference to a bit field.
While trees and dags share many of the same operators, the rules
concerning the number of operands and symbols, summarized in Table 5.1,
apply only to dags. The front end is not constrained by these rules when
it builds trees, and it often uses additional operands and symbols in trees
that do not appear in dags. For example, when it builds the tree for the
arguments in a function call, it uses the kids[1] fields in ARG nodes to
build what amounts to a list of arguments, the trees for which are stored
in the kids[0] fields.

syms  kids    Operator  Operation
      2       AND       logical And
      2       OR        logical Or
      1       NOT       logical Not
1V    2       COND      conditional expression
      1 or 2  RIGHT     composition
      1       FIELD     bit-field access

TABLE 8.1 Tree operators.
A tree is allocated, initialized, and returned by
(tree.c functions)=
Tree tree(op, type, left, right)
int op; Type type; Tree left, right; {
    Tree p;

    NEW0(p, where);
    p->op = op;
    p->type = type;
    p->kids[0] = left;
    p->kids[1] = right;
    return p;
}

(tree.c data)=
static int where = STMT;
Trees are allocated in the allocation arena indicated by where, which is
almost always the STMT arena. Data allocated in STMT is deallocated most
frequently, possibly after every statement is parsed. In some cases,
however, an expression's tree must be saved beyond the compilation of the
current statement. The increment expression in a for loop is an example.
These expressions are parsed by calling texpr with an argument
that specifies the allocation arena:
(tree.c functions)+=
Tree texpr(f, tok, a) Tree (*f) ARGS((int)); int tok, a; {
    int save = where;
    Tree p;

    where = a;
    p = (*f)(tok);
    where = save;
    return p;
}

texpr saves where, sets it to a, calls the parsing function (*f)(tok),
restores the saved value of where, and returns the tree returned by *f.
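The save, set, call, and restore sequence is a general pattern for temporarily overriding a file-scope setting. The sketch below shows it in ANSI C with a stub in place of a real parsing function. It is a sketch only: PERM and FUNC are assumed names for other arenas, and parse_stub and with_arena are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { PERM, FUNC, STMT };      /* arena codes; PERM and FUNC are assumed */

static int where = STMT;        /* arena used by subsequent allocations */

/* Hypothetical parsing function: reports the arena it would allocate in. */
static int parse_stub(int tok) {
    (void)tok;
    return where;
}

/* Call f(tok) with where temporarily set to a, then restore where. */
static int with_arena(int (*f)(int), int tok, int a) {
    int save = where;
    int result;

    assert(f != NULL);
    where = a;
    result = f(tok);
    where = save;
    return result;
}
```

The callee sees the overridden arena, and the caller's setting is back in place once the call returns.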
The remaining functions in tree.c construct, test, or otherwise
manipulate trees and operators. These are all applicative: they build new
trees instead of modifying existing ones, which is necessary because the
front end builds dags for a few operators instead of trees. rightkid(p)
returns the rightmost non-RIGHT operand of a nested series of RIGHT
trees. retype(p, ty) returns p if p->type == ty or a copy of p with type
ty. hascall(p) is one if p contains a CALL tree and zero otherwise.
generic(op) returns the generic flavor of op, optype(op) returns op's
type suffix, and opindex(op) returns op's operator index, which is the
generic operator mapped into a contiguous range of integers suitable for
use as an index.

8.2 Parsing Expressions


C has 41 operators distributed in 15 levels of precedence. Beginning
with an EBNF grammar that contains a nonterminal for each precedence
level, as suggested in Section 7.2, and deriving the parsing functions
is a correct approach, but cumbersome at best. There is an important
simplification to this process that reduces the size of both the grammar
and the resulting code.
Consider the following simplification of the grammar from Section 7.4,
which is for a small subset of C expressions.
expr: term { + term }
term: factor { * factor }
factor: ID | '(' expr ')'
Parsing functions can be written directly from this grammar using the
translations given in Table 7.1. For example, the steps in deriving and
simplifying the body of the parsing function expr (without semantics)
for expr are
T(expr)
T(term { + term })
T(term) T({ + term })
term(); T({ + term })
term(); while (t == '+') { T(+ term) }
term(); while (t == '+') { T(+) T(term) }
term(); while (t == '+') { t = gettok(); T(term) }
term(); while (t == '+') { t = gettok(); term(); }
Likewise, the body of the parsing function term for term is
factor(); while (t == '*') { t = gettok(); factor(); }
factor is the basis case, and it handles the elementary expressions:
void factor(void) {
    if (t == ID)
        t = gettok();
    else if (t == '(') {
        t = gettok();
        expr();
        expect(')');
    } else
        error("unrecognized expression\n");
}

There are two precedence levels in this example. In general, for n
precedence levels there are n + 1 nonterminals; one for each level and one
for the basis case in which further division is impossible. Consequently,
there are n + 1 functions, one for each nonterminal. If the binary
operators are all left-associative, these functions are very similar. As
illustrated by the bodies for expr and term above, the only essential
differences are the operators expected and the function to be called;
function k calls function k + 1.
This similarity can be exploited to replace functions 1 through n by a
single function and a table of operators ordered according to increasing
precedence. lcc stores the precedences in an array indexed by token
code; Table 8.2 lists the precedence and associativity for all of the C
operators. prec[t] is the precedence of the operator with token code
t; for example, prec['+'] is 12 and prec[LEQ] is 10. Using prec and
assuming that the only operators are +, -, *, /, and %, then expr and
term given above can be replaced by the single function
void expr(int k) {
    if (k > 13)
        factor();
    else {
        expr(k + 1);
        while (prec[t] == k) {
            t = gettok();
            expr(k + 1);
        }
    }
}

The 13 comes from Table 8.2; the binary operators + and - have
precedence 12 and *, /, and % each have precedence 13. When k exceeds 13,
expr calls factor to parse the productions for factor. Expression
parsing for this restricted grammar begins by calling expr(12), and the
call to expr in factor must be changed to expr(12).
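To see the single function at work, it helps to attach a small amount of semantics to it. The following is a minimal sketch, not lcc's code: tokens are single characters read from a string, operands are single digits, prec is a function rather than an array, and expr returns the value of what it parses. The names src, eval, and the evaluation itself are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *src;    /* remaining input; one character per token */
static int t;              /* current token */

/* Return the next non-blank character, or 0 at the end of the input. */
static int gettok(void) {
    while (*src == ' ')
        src++;
    return *src ? (unsigned char)*src++ : 0;
}

/* Precedence of token c from Table 8.2; 0 if c is not one of + - * / %. */
static int prec(int c) {
    switch (c) {
    case '+': case '-':           return 12;
    case '*': case '/': case '%': return 13;
    default:                      return 0;
    }
}

static int expr(int k);

/* factor: digit | '(' expr ')'; returns the value parsed. */
static int factor(void) {
    int v = 0;

    if (isdigit(t)) {
        v = t - '0';
        t = gettok();
    } else if (t == '(') {
        t = gettok();
        v = expr(12);
        if (t == ')')
            t = gettok();
    }
    return v;
}

/* Parse and evaluate the operators at precedence k and above. */
static int expr(int k) {
    int v, op, r;

    if (k > 13)
        return factor();
    v = expr(k + 1);
    while (prec(t) == k) {
        op = t;
        t = gettok();
        r = expr(k + 1);
        switch (op) {
        case '+': v += r; break;
        case '-': v -= r; break;
        case '*': v *= r; break;
        case '/': v = r ? v / r : 0; break;   /* division by zero yields 0 here */
        case '%': v = r ? v % r : 0; break;
        }
    }
    return v;
}

/* Evaluate a whole string, starting at the lowest precedence level. */
static int eval(const char *s) {
    assert(s != NULL);
    src = s;
    t = gettok();
    return expr(12);
}
```

Because the while loop folds each right operand into v as soon as it is parsed, operators at the same level group to the left.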
expr and factor can be used for any expression grammar of the form
expr: expr ⊗ expr | factor
where ⊗ denotes binary, left-associative operators. Adding operators is
accomplished by appropriately initializing prec.

Precedence  Associativity  Operators               Purpose         Parsed By
 1          left           ,                       composition     expr
 2          right          = += -= *= /= %= &=     assignment      expr1
                           ^= |= <<= >>=
 3          right          ?:                      conditional     expr2
 4          left           ||                      logical or      expr3
 5          left           &&                      logical and     expr3
 6          left           |                       bitwise OR      expr3
 7          left           ^                       bitwise XOR     expr3
 8          left           &                       bitwise AND     expr3
 9          left           == !=                   equality        expr3
10          left           < > <= >=               relational      expr3
11          left           << >>                   shifting        expr3
12          left           + -                     additive        expr3
13          left           * / %                   multiplicative  expr3
14                         * & - + ! ~ ++ --       unary prefix    unary
                           sizeof type-cast
15                         ++ --                   unary suffix    postfix

TABLE 8.2 Operator precedences, associativities, and parsing functions.
The while loop in expr handles left-associative operators, which are
specified in EBNF by productions like those for expr and term.
Right-associative operators, like assignment, are specified in EBNF by
productions like
asgn: expr = asgn
They can also be handled using this approach by simply calling expr(k)
instead of expr(k + 1) in the while loop in expr. Assuming all
operators at each precedence level have the same associativity, the
decision of whether to call expr with k or k + 1 can be encoded in a
table, handled by writing separate parsing functions for left- and
right-associative operators, or making explicit tests for each kind of
operator.
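The table-driven choice between k and k + 1 can be illustrated with a two-level grammar. This is a sketch under stated assumptions, not lcc's approach: level 0 holds a left-associative subtraction, and level 1 holds a hypothetical right-associative exponentiation operator written ^ (which is not C's ^). Operands are single digits, and the names levels, ipow, and eval are invented for the example.

```c
#include <assert.h>

enum { LEFT, RIGHT };

/* One row per precedence level of a hypothetical two-level grammar. */
static const struct level { char op; int assoc; } levels[] = {
    { '-', LEFT  },    /* level 0: subtraction */
    { '^', RIGHT },    /* level 1: exponentiation (not C's ^) */
};
enum { NLEVELS = sizeof levels / sizeof levels[0] };

static const char *src;   /* remaining input: single digits and operators */

/* Integer power by repeated multiplication. */
static long ipow(long base, long exp) {
    long r = 1;

    while (exp-- > 0)
        r *= base;
    return r;
}

/* Parse operators at level k and above. The table decides whether the
   right operand is parsed at level k (right-associative) or at level
   k + 1 (left-associative). */
static long expr(int k) {
    long v, r;

    if (k >= NLEVELS) {
        assert(*src >= '0' && *src <= '9');
        return *src++ - '0';
    }
    v = expr(k + 1);
    while (*src == levels[k].op) {
        src++;
        r = expr(levels[k].assoc == RIGHT ? k : k + 1);
        v = levels[k].op == '-' ? v - r : ipow(v, r);
    }
    return v;
}

/* Evaluate a whole string, starting at the lowest level. */
static long eval(const char *s) {
    src = s;
    return expr(0);
}
```

With this table, 9-3-2 groups as (9-3)-2 while 2^3^2 groups as 2^(3^2), which is the same distinction that separates the additive operators from assignment.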
Unary operators can also be handled using this technique. Fortunately,
the unary operators in C have the highest precedence, so they appear in
function n + 1, as does factor in the example above. Otherwise, upon
entry, expr would have to check for the occurrence of unary operators
at the kth level.
Using this technique also simplifies the grammar for expressions, be-
cause most of the nonterminals for the intermediate precedence levels
can be omitted.

8.3 Parsing C Expressions


The complete syntax for C expressions is
expression:
    assignment-expression { , assignment-expression }
assignment-expression:
    conditional-expression
    unary-expression assign-operator assignment-expression
assign-operator:
    one of = += -= *= /= %= <<= >>= &= ^= |=
conditional-expression:
    binary-expression [ ? expression : conditional-expression ]
binary-expression:
    unary-expression { binary-operator unary-expression }
binary-operator:
    one of || && '|' ^ & == != < > <= >= << >> + - * / %
unary-expression:
    postfix-expression
    unary-operator unary-expression
    '(' type-name ')' unary-expression
    sizeof unary-expression
    sizeof '(' type-name ')'
unary-operator:
    one of ++ -- & * + - ~ !
postfix-expression:
    primary-expression { postfix-operator }
postfix-operator:
    '[' expression ']'
    '(' [ assignment-expression { , assignment-expression } ] ')'
    . identifier
    -> identifier
    ++
    --
primary-expression:
    identifier
    constant
    string-literal
    '(' expression ')'
There are seven parsing functions for expressions corresponding to the
expression nonterminals in this grammar. The parsing function for
binary-expression uses the techniques described in Section 8.2 to
handle all the binary operators, which have precedences between 4 and 13
inclusive (see Table 8.2).
Each of these functions parses the applicable expression, builds a tree
to represent the expression, type-checks the tree, and returns it. Three
arrays, each indexed by token code, guide the operation of these
functions. prec[t], mentioned in Section 8.2, gives the precedence of the
operator denoted by token code t. oper[t] is the generic tree operator
that corresponds to token t, and optree[t] points to a function that
builds a tree for the operator denoted by t. For example, prec['+'] is
12, oper['+'] is ADD, and optree['+'] is addtree, which, like most of
the functions referred to by optree and like optree itself, is in enode.c.
prec and oper are defined by including token.h and extracting its third
and fourth columns:
(tree.c data)+=
static char prec[] = {
#define xx(a,b,c,d,e,f,g) c,
#define yy(a,b,c,d,e,f,g) c,
#include "token.h"
};
static int oper[] = {
#define xx(a,b,c,d,e,f,g) d,
#define yy(a,b,c,d,e,f,g) d,
#include "token.h"
};
token.h is described in Section 6.2.
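The column-extraction idiom can be tried without a separate header file. The sketch below is a variation on the technique, not lcc's token.h: it keeps a hypothetical three-column list in a macro, TOKENS, and uses C99 designated initializers so that the list need not mention every token code in order. The names TOKENS, opname, and precedence are assumptions.

```c
#include <assert.h>

/* A hypothetical three-column token list standing in for token.h:
   token code, precedence, operator name. */
#define TOKENS(xx) \
    xx('+', 12, "ADD") \
    xx('-', 12, "SUB") \
    xx('*', 13, "MUL") \
    xx('<', 10, "LT")

/* Extract the second column into an array indexed by token code. */
static const char prec[128] = {
#define xx(code, p, name) [code] = p,
    TOKENS(xx)
#undef xx
};

/* Extract the third column the same way. */
static const char *const opname[128] = {
#define xx(code, p, name) [code] = name,
    TOKENS(xx)
#undef xx
};

/* Bounds-checked lookup; tokens not in the list have precedence 0. */
static int precedence(int t) {
    assert(t >= 0 && t < 128);
    return prec[t];
}
```

Each table redefines xx to select one column, so adding a token means editing only the list.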
Each function is derived using the rules described in Section 7.5. Code
to build and check the trees is interleaved with the parsing code. The
code for expression is typical and is also the simplest:
(tree.c functions)+=
Tree expr(tok) int tok; {
    static char stop[] = { IF, ID, '}', 0 };
    Tree p = expr1(0);

    while (t == ',') {
        Tree q;
        t = gettok();
        q = pointer(expr1(0));
        p = tree(RIGHT, q->type, root(value(p)), q);
    }
    (test for correct termination 156)
    return p;
}
expr begins by calling expr1, which parses an assignment-expression and
returns the tree; it's described in Section 8.4. The while loop corresponds
to the
{ , assignment-expression }
portion of the production for expression, and it builds a
RIGHT tree for each comma operator. The functions pointer and value
check for semantic correctness or return transformations of their
argument trees, and are described below. Exercise 12.9 describes root,
which is called with trees that will be executed only for their side
effect.
expr's argument, if nonzero, is the code for the token that should
follow this occurrence of an expression.
(test for correct termination 156)=
if (tok)
    test(tok, stop);
If tok is nonzero, but the expression is followed by something else, test
skips input up to the next occurrence of tok or a token in stop, which
is the set tok ∪ { IF ID '}' } (see Section 7.6). This convention, which
is used by several parsing functions, helps detect and handle errors. An
expression must be followed by one of the tokens in its FOLLOW set.
But for most uses, there's only one token that can follow expression. For
example, the increment expression in a for loop must be followed by a
right parenthesis. So, instead of checking for any token in the FOLLOW
set, expr checks for one of the tokens in the FOLLOW set, which is more
precise. In contexts where more than one token can follow expression,
expr(0) is used and the caller checks the legality of the next token.
Statement-level expressions, such as assignments and function calls,
are executed for their side effects:
(tree.c functions)+=
Tree expr0(tok) int tok; {
    return root(expr(tok));
}

expr0 calls expr to parse the expression, and passes the resulting tree
to root, which returns only the tree that has a side effect. For
example, the statement a + f() includes a useless addition, which lcc is
free to eliminate (even if the addition would overflow). Given the tree
for this expression, root returns the tree for f(). root is described in
Exercise 12.9.

8.4 Assignment Expressions


The right recursion in the second production for assignment-expression
makes assignment right-associative; multiple assignments like a = b = c
are interpreted as a = (b = c). Using the production
assignment-expression:
    unary-expression { assign-operator conditional-expression }
instead would be incorrect because it leads to a left-associative
interpretation of multiple assignments like (a = b) = c. This
interpretation is incorrect because the result of an assignment is not an
lvalue.
expr1 parses assignments.
(tree.c functions)+=
Tree expr1(tok) int tok; {
    static char stop[] = { IF, ID, 0 };
    Tree p = expr2();

    if (t == '='
    || (prec[t] >= 6 && prec[t] <= 8)
    || (prec[t] >= 11 && prec[t] <= 13)) {
        int op = t;
        t = gettok();
        if (oper[op] == ASGN)
            p = asgntree(ASGN, p, value(expr1(0)));
        else
            (augmented assignment 158)
    }
    (test for correct termination 156)
    return p;
}

expr2 parses conditional-expressions:
conditional-expression:
    binary-expression [ ? expression : conditional-expression ]
The code for expr1 doesn't follow the grammar precisely; expr2 is called
for both productions, even though unary should be called for the second
production. expr2 ultimately calls unary, so the code above recognizes
all correct expressions, but it also recognizes incorrect ones. Incorrect
expressions are caught by the semantic analysis in asgntree. The
advantage of this approach is that it handles errors more gracefully. For
example, in a + b = c, a + b is not a unary-expression, so a more strict
parser would signal an error at the + and might signal other errors
because it didn't parse the expression completely. lcc will accept the
expression with no syntax errors, but will complain that the left-hand
side of the assignment isn't an lvalue.
The first if statement in expr1 tests for an assignment (=) or the
initial character of the augmented-assignment operators (see Table 8.2).
oper[op] will be the corresponding generic tree operator for these
characters; for example, oper['+'] is ADD. expr1 handles augmented
assignments, such as +=, by recognizing the two tokens that make up the
augmented assignment operator:
(augmented assignment 158)=
{
    expect('=');
    p = incr(op, p, expr1(0));
}

Each augmented assignment operator is one token, but this code appears
to treat them as two tokens; expr3, described below, avoids this
erroneous interpretation by recognizing tokens like + as binary operators
only when they aren't immediately followed by an equals sign. Thus,
expr1 correctly interprets a += b as an augmented assignment and lets
expr3 discover the error in a + = b.
incr builds trees for expressions of form v ⊗= e for any binary
operator ⊗, lvalue v, and rvalue e.
(tree.c functions)+=
Tree incr(op, v, e) int op; Tree v, e; {
    return asgntree(ASGN, v, (*optree[op])(oper[op], v, e));
}
incr is one place where the front end builds a dag instead of a tree. For
example, Figure 8.2 shows the tree returned by incr for *f() += b. *f()
must be evaluated only once, but the lvalue it computes is used twice:
once for the rvalue and once as the target of the assignment. Building
only one tree for *f() reflects these semantics. Ultimately, these kinds
of trees require temporaries, which are generated when the trees are
converted into nodes; Chapter 12 explains.
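The sharing is visible in a few lines. The following minimal sketch assumes a bare node type and made-up operator codes, and it omits the type checking and conversions that asgntree and the optree functions perform. The names tnode, mk, and incr_sketch are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum { ASGN, ADD, INDIR, LEAF };   /* hypothetical operator codes */

struct tnode {
    int op;
    struct tnode *kids[2];
};

/* Allocate a node with the given operator and operands. */
static struct tnode *mk(int op, struct tnode *l, struct tnode *r) {
    struct tnode *p = malloc(sizeof *p);

    assert(p != NULL);
    p->op = op;
    p->kids[0] = l;
    p->kids[1] = r;
    return p;
}

/* Build v op= e as ASGN(v, op(v, e)). The single tree for v is
   referenced twice, so the result is a dag rather than a tree. */
static struct tnode *incr_sketch(int op, struct tnode *v, struct tnode *e) {
    return mk(ASGN, v, mk(op, v, e));
}
```

Both references to v are the same pointer, which is how a later phase can tell that the lvalue must be computed once and held in a temporary.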
FIGURE 8.2 Tree for *f() += b.

These dags could have been avoided by using additional tree operators
for augmented assignments. Doing this would increase the number
of tree operators, and it might complicate the semantic analyses for the
binary operators involved. For example, addtree, the function that
performs the semantic analysis for +, might have to cope with both + and
+=. There are several other situations in which it's useful to permit dags;
an example occurs in dealing with nested functions, which is described
in Section 9.3.

8.5 Conditional Expressions


The syntax of conditional expressions is
conditional-expression:
binary-expression [ ? expression : conditional-expression ]
The value of a conditional expression is the value of expression if binary-
expression is nonzero; otherwise it's the value of the third operand,
which itself can be a conditional-expression. expr2 does the parsing:
(tree.c functions)+=
static Tree expr2() {
    Tree p = expr3(4);

    if (t == '?') {
        Tree l, r;
        Coordinate pts[2];

        if (Aflag > 1 && isfunc(p->type))
            warning("%s used in a conditional expression\n",
                funcname(p));
        p = pointer(p);
        t = gettok();
        pts[0] = src;
        l = pointer(expr(':'));
        pts[1] = src;
        r = pointer(expr2());
        p = condtree(p, l, r);
        if (events.points)
            (plant event hooks for ?:)
    }
    return p;
}

expr2 begins by calling expr3 to parse a binary-expression beginning at
precedence level 4, and concludes by calling condtree to build the COND
tree (shown in Figure 9.6).
A common error in both if statements and conditional expressions is
to use a function name instead of a function call; for example, using the
expression test ? a : b instead of test(a,b) ? a : b. Both of these
expressions are legal, but the first one is rarely what the programmer
intended. lcc's -A option causes lcc to warn about this and similarly
suspicious usage. Aflag records the number of -A options specified;
multiple occurrences elicit more warnings.
lcc includes facilities for executing event hooks at various points in
the source program that correspond to branches in the flow of control.
This facility is used, for example, to inject trees that implement
expression-level profiling, and to inject data for source-level debuggers.
The operator ?: is one of the three that alter flow of control. The
parameters to the functions that plant hooks include the source
coordinates of the then and else parts of the expression, which is why
these coordinates are saved in pts[0] and pts[1] in the code above.
Conditional expressions are also used to convert a relational, which is
represented only by flow of control, to a value. For example, a = b < c
sets a to one if b < c and to zero otherwise. This conversion is
implemented by value, which builds a COND tree corresponding to the
expression c ? 1 : 0.
(tree.c functions)+=
Tree value(p) Tree p; {
    int op = generic(rightkid(p)->op);

    if (op==AND || op==OR || op==NOT || op==EQ || op==NE
    ||  op== LE || op==LT || op== GE || op==GT)
        p = condtree(p, consttree(1, inttype),
            consttree(0, inttype));
    return p;
}
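The effect of value can be sketched with a bare node type. This is an illustration under stated assumptions: the operator codes are invented, only a few of the comparison and logical operators are listed, the test looks at the node's own operator rather than at rightkid(p), and the COND node is assembled directly instead of through condtree. The names tnode, mk, is_conditional, and value_sketch are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical operator codes. */
enum { CNST, VAR, ADD, LT, EQ, AND, OR, NOT, COND, RIGHT };

struct tnode {
    int op;
    int val;                 /* constant value; meaningful for CNST only */
    struct tnode *kids[2];
};

/* Allocate a node with the given operator, value, and operands. */
static struct tnode *mk(int op, int val, struct tnode *l, struct tnode *r) {
    struct tnode *p = malloc(sizeof *p);

    assert(p != NULL);
    p->op = op;
    p->val = val;
    p->kids[0] = l;
    p->kids[1] = r;
    return p;
}

/* Return 1 if op is represented only by flow of control in this sketch. */
static int is_conditional(int op) {
    return op == LT || op == EQ || op == AND || op == OR || op == NOT;
}

/* Wrap a comparison or logical operator as c ? 1 : 0. The two arms
   hang from a RIGHT node, as described for COND trees in Section 8.1. */
static struct tnode *value_sketch(struct tnode *p) {
    assert(p != NULL);
    if (is_conditional(p->op))
        p = mk(COND, 0, p, mk(RIGHT, 0, mk(CNST, 1, 0, 0), mk(CNST, 0, 0, 0)));
    return p;
}
```

Trees that already compute a value, such as an addition, pass through unchanged.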

lcc's interface could have specified two flavors of comparison
operators: one that's used in conditional contexts, in which there is
always a jump, and one that's used in value contexts like a = b < c, and
yields a zero or one. An advantage of this design is that lcc could then
use instructions that capture the outcome of a comparison and avoid the
jumps implied in c ? 1 : 0. But only those targets that have these
instructions and that penalize jumps severely would benefit from this
alternative, and the rest would pay for the increased operator vocabulary.
Specifying only the conditional form of the comparison operators is an
example of favoring retargetability over flexibility.

8.6 Binary Expressions


Expressions involving all the binary operators with precedences 4-13 (see
Table 8.2) are defined by the productions
8.6 • BINARY EXPRESSIONS 161

binary-expression:
unary-expression { binary-operator unary-expression }
binary-operator:
one of I I && ' I ' A & == ! = < > <= >= << >> + - * I %

and are parsed by one function, as described in Section 8.2. Using that
approach, the parsing function - without its tree-building code - for
binary-expression is
void expr3(k) int k; {
    if (k > 13)
        unary();
    else {
        expr3(k + 1);
        while (prec[t] == k) {
            t = gettok();
            expr3(k + 1);
        }
    }
}

where unary is the parsing function for unary-expression. This function
parses binary-expression correctly, but does more work than is
necessary. The call expr3(4) in expr2 is the only call to expr3 from
outside expr3 itself. Thus, there are 10 recursive calls to expr3(5)
through expr3(14) before the first call to unary. These 10 calls unwind
from highest to lowest precedence as the source expression is parsed.
The while loop is not entered until the call with a k equal to prec[t],
where t is the token that follows the expression parsed by unary. Many
of the recursive calls to expr3 serve only to test if their k is equal
to prec[t]; only one succeeds.
For example, here's the sequence of calls for the expression a | b:
expr3(4)
expr3(5)
expr3(6)
expr3(7)
expr3(8)
expr3(9)
expr3(10)
expr3(11)
expr3(12)
expr3(13)
expr3(14)
unary()
expr3(7)

expr3(14)
unary()
Of the calls leading up to the first call to unary (which parses the a),
only expr3(6) does useful work after unary returns. And none of the
recursive calls from within the while loop leading to the second call to
unary (which parses b) do useful work.
This sequence reveals the overall effect of the calls to expr3: parse
a unary-expression, then parse binary-expressions at precedence levels
13, 12, ... , 4. The recursion can be replaced by counting from 14 down
to k. Since nothing interesting happens until the precedence is equal to
prec[t], counting can begin there:
void expr3(k) int k; {
    int k1;
    unary();
    for (k1 = prec[t]; k1 >= k; k1--)
        while (prec[t] == k1) {
            t = gettok();
            expr3(k1 + 1);
        }
}

This transformation also benefits the one remaining recursive call to
expr3 by eliminating most of the recursion in that call. Now, the
sequence of calls for a | b is
expr3(4)
unary()
expr3(7)
unary()
Adding the code to validate and build the trees and to solve two
remaining minor problems (augmented assignments and the && and ||
operators) yields the final version of expr3:
(tree.c functions)+=
static Tree expr3(k) int k; {
    int k1;
    Tree p = unary();

    for (k1 = prec[t]; k1 >= k; k1--)
        while (prec[t] == k1 && *cp != '=') {
            Tree r, l;
            Coordinate pt;
            int op = t;
            t = gettok();
            pt = src;
            p = pointer(p);
            if (op == ANDAND || op == OROR) {
                r = pointer(expr3(k1));
                if (events.points)
                    (plant event hooks for && ||)
            } else
                r = pointer(expr3(k1 + 1));
            p = (*optree[op])(oper[op], p, r);
        }
    return p;
}

Like conditional expressions, the && and || operators alter flow of control
and thus must provide for event hooks.
Technically, the && and || operators are left-associative, and their right
operands are evaluated only if necessary. It simplifies node generation if
they are treated as right-associative during parsing. Each operator is the
sole occupant of its precedence level, so, for example, making &&
right-associative simply yields a right-heavy ANDAND tree instead of a
left-heavy one. As detailed in Section 12.3, this apparent error is not
only repaired during node generation, but leads to better code for the
short-circuit evaluation of && and || than left-heavy trees. Making ||
right-associative requires calling expr3(4) instead of expr3(5) in the
while loop. For &&, expr3(5) must be called instead of expr3(6). Calling
expr3(k1) instead of expr3(k1+1) for these two operators makes the
appropriate calls.
The last problem is augmented assignment. expr1 recognizes the
augmented-assignment operators by recognizing two-token sequences.
But these operators are single tokens, not two-token sequences; for
example, += is the token for additive assignment, and + = is a syntax
error. expr1's approach is correct only if + = is never recognized as +=.
expr3 guarantees this condition by doing just the opposite: a binary
operator is recognized only when it is not followed immediately by an
equals sign. Thus, the + in a + = b is not recognized as a binary
operator, and lcc detects the syntax error.
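
The counting form of expr3 can be tried outside the compiler. The sketch below is not lcc code: it is a toy evaluator for integer expressions that borrows only the loop structure shown above. The names src_p, tok, tokval, prec_of, next_token, unary_val, expr3_val, and eval are all hypothetical, and the precedence table covers just a few of the operators in Table 8.2.

```c
#include <assert.h>
#include <ctype.h>

static const char *src_p;   /* input cursor (hypothetical scanner state) */
static int tok;             /* current token: operator char, 'n', or 0 at end */
static long tokval;         /* value of the constant when tok == 'n' */

/* Precedence of a binary operator, using the levels from the text;
   0 means "not a binary operator". */
static int prec_of(int t)
{
    switch (t) {
    case '|': return 6;
    case '^': return 7;
    case '&': return 8;
    case '+': case '-': return 12;
    case '*': case '/': case '%': return 13;
    default:  return 0;
    }
}

static void next_token(void)
{
    while (*src_p == ' ')
        src_p++;
    if (isdigit((unsigned char)*src_p)) {
        tokval = 0;
        while (isdigit((unsigned char)*src_p))
            tokval = 10*tokval + (*src_p++ - '0');
        tok = 'n';
    } else if (*src_p)
        tok = *src_p++;
    else
        tok = 0;
}

static long expr3_val(int k);

/* Stand-in for unary: a constant or a parenthesized expression. */
static long unary_val(void)
{
    long v;

    if (tok == '(') {
        next_token();
        v = expr3_val(4);
        assert(tok == ')');
    } else {
        assert(tok == 'n');
        v = tokval;
    }
    next_token();
    return v;
}

/* Same shape as the second version of expr3: parse one operand, then
   count k1 down from prec[t] to k instead of recursing level by level. */
static long expr3_val(int k)
{
    int k1;
    long v = unary_val();

    for (k1 = prec_of(tok); k1 >= k; k1--)
        while (prec_of(tok) == k1) {
            int op = tok;
            long r;

            next_token();
            r = expr3_val(k1 + 1);
            switch (op) {
            case '|': v |= r; break;
            case '^': v ^= r; break;
            case '&': v &= r; break;
            case '+': v += r; break;
            case '-': v -= r; break;
            case '*': v *= r; break;
            case '/': assert(r != 0); v /= r; break;
            case '%': assert(r != 0); v %= r; break;
            }
        }
    return v;
}

long eval(const char *s)
{
    long v;

    src_p = s;
    next_token();
    v = expr3_val(4);
    assert(tok == 0);       /* the whole input must be consumed */
    return v;
}
```

Calling expr3_val(k1 + 1) for the right operand keeps every operator left-associative, which is why eval("8 - 3 - 2") yields 3 rather than 7.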

8.7 Unary and Postfix Expressions


The remaining functions handle the productions
unary-expression:
    postfix-expression
    unary-operator unary-expression
    '(' type-name ')' unary-expression
    sizeof unary-expression
    sizeof '(' type-name ')'
unary-operator:
    one of ++ -- & * + - ~ !
postfix-expression:
    primary-expression { postfix-operator }
postfix-operator:
    '[' expression ']'
    '(' [ assignment-expression { , assignment-expression } ] ')'
    . identifier
    -> identifier
    ++
    --

and the productions for primary-expression, which are given in the next
section. The parsing components of these functions are simple because
these productions are simple. The parsing function for unary-expression
is an example: most of the unary operators are parsed by consuming the
operator, parsing the operand, and building the tree.
(tree.c functions)+=
static Tree unary() {
    Tree p;

    switch (t) {
    case '*':    (p <- unary 165) (indirection 179) break;
    case '&':    (p <- unary 165) (address of 179) break;
    case '+':    (p <- unary 165) (affirmation) break;
    case '-':    (p <- unary 165) (negation 178) break;
    case '~':    (p <- unary 165) (complement) break;
    case '!':    (p <- unary 165) (logical not) break;
    case INCR:   (p <- unary 165) (preincrement 165) break;
    case DECR:   (p <- unary 165) (predecrement) break;
    case SIZEOF: t = gettok(); { (sizeof 165) } break;
    case '(':
        t = gettok();
        if (istypename(t, tsym)) {
            (type cast 180)
        } else
            p = postfix(expr(')'));
        break;
    default:
        p = postfix(primary());
    }
    return p;
}
(p <- unary 165)= 164
t = gettok(); p = unary();
Most of the fragments perform semantic checks, which are described
in the next chapter. Three are simple enough to dispose of here. The
expression ++e is semantically equivalent to the augmented assignment
e += 1, so incr can build the tree for unary ++:
(preincrement 165)= 164
p = incr(INCR, pointer(p), consttree(1, inttype));
Predecrement is similar.
sizeof '(' type-name ')' is a constant of type size_t that gives the
number of bytes occupied by an instance of type-name. In lcc, size_t
is unsigned. Similarly, the unary-expression in sizeof unary-expression
serves only to provide a type whose size is desired; the unary-expression
is not evaluated at runtime. Most of the effort in parsing sizeof goes
into distinguishing between these two forms of sizeof and finding the
appropriate type. Notice that the parentheses are required if the operand
is a type-name.
(sizeof 165)= 164
Type ty;
p = NULL;
if (t == '(') {
    t = gettok();
    if (istypename(t, tsym)) {
        ty = typename();
        expect(')');
    } else {
        p = postfix(expr(')'));
        ty = p->type;
    }
} else {
    p = unary();
    ty = p->type;
}
if (isfunc(ty) || ty->size == 0)
    error("invalid type argument '%t' to 'sizeof'\n", ty);
else if (p && rightkid(p)->op == FIELD)
    error("'sizeof' applied to a bit field\n");
p = consttree(ty->size, unsignedtype);
As the code suggests, sizeof cannot be applied to functions, incomplete
types, or bit fields.
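
The two forms of sizeof, and the fact that the operand is not evaluated, can be observed from ordinary C. The following is a small sketch rather than anything taken from lcc; bump and sizeof_leaves_operand_alone are hypothetical names, and the example assumes the operand is not a variable-length array, for which C99 and later do evaluate the operand.

```c
#include <assert.h>
#include <stddef.h>

/* A function with a visible side effect. */
static int bump(int *p)
{
    assert(p != NULL);
    return ++*p;
}

/* Applies both forms of sizeof and returns the final value of i.
   Because sizeof only needs the type of bump(&i), the call is never
   made and i keeps its initial value. */
int sizeof_leaves_operand_alone(void)
{
    int i = 5;
    size_t n = sizeof bump(&i);   /* sizeof unary-expression */
    size_t m = sizeof (int);      /* sizeof '(' type-name ')' */

    assert(n == m);               /* bump returns int */
    return i;
}
```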
In unary and in (sizeof), a left parenthesis begins a primary-expression
or, if the next token is a type name, a type cast.
If a left parenthesis does not introduce a type cast, it's too late to let
primary parse the parenthesized expression, so unary must handle it.
This is why postfix expects its caller to call primary and pass it the
resulting tree instead of calling primary itself:
(tree.c functions)+=
static Tree postfix(p) Tree p; {
    for (;;)
        switch (t) {
        case INCR:  (postincrement 166) break;
        case DECR:  (postdecrement) break;
        case '[':   (subscript 181) break;
        case '(':   (calls 186) break;
        case '.':   (struct.field) break;
        case DEREF: (pointer->field 182) break;
        default:
            return p;
        }
}

Again, most of the fragments in postfix check the semantics of the
operand and build the appropriate tree as detailed in the next chapter,
but the tree for postincrement (and postdecrement) can be built by incr:
(postincrement 166)= 166
p = tree(RIGHT, p->type,
    tree(RIGHT, p->type,
        p,
        incr(t, p, consttree(1, inttype))),
    p);
t = gettok();
The tree for postfix ++ is a dag because it must increment the operand
but return the previous value. For example, the expression i++ builds
the tree shown in Figure 8.3. The two RIGHT operators in this tree ensure
the proper order of evaluation. The value of the entire expression is the
rvalue of i, and the lower RIGHT tree ensures that this value is computed
and saved before i is incremented by the ASGN+I tree. The construction
is identical for p++ where p is a pointer - the addition takes care of
incrementing p by the size of its referent.
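
The evaluation order that the two RIGHT operators impose can be written out as straight-line C. This is only a model of the order of events, not the code lcc emits; post_increment and post_increment_ptr are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Models i++ for an int: save the rvalue, then assign, then yield
   the saved rvalue. */
int post_increment(int *addr)
{
    int saved;

    assert(addr != NULL);
    saved = *addr;          /* lower RIGHT, left operand: the rvalue of i */
    *addr = saved + 1;      /* lower RIGHT, right operand: the assignment */
    return saved;           /* upper RIGHT: the value saved earlier */
}

/* The same construction for a pointer: adding 1 advances the pointer
   by the size of its referent. */
int *post_increment_ptr(int **addr)
{
    int *saved;

    assert(addr != NULL);
    saved = *addr;
    *addr = saved + 1;
    return saved;
}
```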

8.8 Primary Expressions


The last parsing function for expressions is primary. It parses
primary-expression:
    identifier
    constant
    string-literal
    '(' expression ')'

which is analogous to factor in the simple expression grammars
described in Section 8.2. All that's left to handle are constants and
identifiers:
(tree.c functions)+=
static Tree primary() {
    Tree p;

    switch (t) {
    case ICON:
    case FCON: (numeric constants 167) break;
    case SCON: (string constants 168) break;
    case ID:   (an identifier 170) break;
    default:
        error("illegal expression\n");
        p = consttree(0, inttype);
    }
    t = gettok();
    return p;
}

CNST trees hold the values of integer and floating constants in their u.v
fields:
(numeric constants 167)= 167
p = tree(CNST + ttob(tsym->type), tsym->type, NULL, NULL);
p->u.v = tsym->u.c.v;

[Figure: a dag rooted at RIGHT whose nodes are RIGHT, INDIR+I, ASGN+I,
ADD+I, ADDRG+P (for i), and CNST+I (for 1).]

FIGURE 8.3 Tree for i++.

(u fields for Tree variants 168)= 149
Value v;
String constants are abbreviations for read-only variables initialized to
the value of the string constant:
(string constants 168) = 167
tsym->u.c.v.p = stringn(tsym->u.c.v.p, tsym->type->size);
tsym = constant(tsym->type, tsym->u.c.v);
if (tsym->u.c.loc ==NULL)
tsym->u.c.loc = genident(STATIC, tsym->type, GLOBAL);
p = idtree(tsym->u.c.loc);
The generated variable and its initialization are emitted at the end of
the compilation by finalize. The tree for strings is the tree for the
generated identifier.
idtree(p) builds a tree for accessing the identifier indicated by the
symbol-table entry p. Identifiers are categorized by their scopes and
lifetimes (parameters, automatic locals, and statics, including globals)
and their types (arrays, functions, and nonarray objects). idtree uses an
identifier's scope and storage class to determine its addressing operator,
then uses its type to determine the shape of the tree that accesses it, and
stores a pointer to the symbol-table entry in the tree's u.sym field:
(u fields for Tree variants 168)+= 149
Symbol sym;

(tree.c functions)+=
Tree idtree(p) Symbol p; {
    int op;
    Tree e;
    Type ty = p->type ? unqual(p->type) : voidtype;

    p->ref += refinc;
    if (p->scope == GLOBAL
    || p->sclass == STATIC || p->sclass == EXTERN)
        op = ADDRG+P;
    else if (p->scope == PARAM) {
        op = ADDRF+P;
        if (isstruct(p->type) && !IR->wants_argb)
            (return a tree for a struct parameter 170)
    } else
        op = ADDRL+P;
    if (isarray(ty) || isfunc(ty)) {
        e = tree(op, p->type, NULL, NULL);
        e->u.sym = p;
    } else {
        e = tree(op, ptr(p->type), NULL, NULL);
        e->u.sym = p;
        e = rvalue(e);
    }
    return e;
}

(tree.c data)+=
float refinc = 1.0;
p->ref is an estimate of the number of references to the identifier
described by p; other functions can adjust the weight of one reference to
p by changing refinc. All external, static, and global identifiers are
addressed with ADDRG operators; parameters are addressed with ADDRF; and
locals are addressed with ADDRL.
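
The choice of addressing operator depends on only two properties of the symbol, so it can be isolated in a few lines. The sketch below mirrors the tests in idtree but uses stand-in enumerations; enum scope, enum sclass, enum addrop, and addressing_op are hypothetical names, not lcc's declarations.

```c
#include <assert.h>

enum scope  { SC_GLOBAL, SC_PARAM, SC_LOCAL };     /* stand-in scopes */
enum sclass { CL_AUTO, CL_STATIC, CL_EXTERN };     /* stand-in storage classes */
enum addrop { OP_ADDRG, OP_ADDRF, OP_ADDRL };      /* stand-ins for ADDRG, ADDRF, ADDRL */

/* Picks the addressing operator in the same order as idtree: anything
   global, static, or extern first, then parameters, then locals. */
enum addrop addressing_op(enum scope scope, enum sclass sclass)
{
    assert(scope == SC_GLOBAL || scope == SC_PARAM || scope == SC_LOCAL);
    if (scope == SC_GLOBAL || sclass == CL_STATIC || sclass == CL_EXTERN)
        return OP_ADDRG;
    if (scope == SC_PARAM)
        return OP_ADDRF;
    return OP_ADDRL;
}
```

The order of the tests matters: a static declared inside a function has local scope but is still addressed like a global.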
Arrays and functions cannot be used as lvalues or rvalues, so refer-
ences to them have only the appropriate addressing operators. Trees for
other types refer to the identifiers' rvalues; an example is the tree for i's
rvalue in Figure 8.3. rvalue adds the INDIR:
(tree.c functions)+=
Tree rvalue(p) Tree p; {
    Type ty = deref(p->type);

    ty = unqual(ty);
    return tree(INDIR + (isunsigned(ty) ? I : ttob(ty)),
        ty, p, NULL);
}
160 value
rvalue can be called with any tree that represents a pointer value.
lvalue, however, must be called with only trees that represent an rvalue
- the contents of an addressable location. The INDIR tree added by
rvalue also signals that a tree is a valid lvalue, and the address is
exposed by tearing off the INDIR. lvalue implements this check and
transformation:
(tree.c functions)+=
Tree lvalue(p) Tree p; {
    if (generic(p->op) != INDIR) {
        error("lvalue required\n");
        return value(p);
    } else if (unqual(p->type) == voidtype)
        warning("'%t' used as an lvalue\n", p->type);
    return p->kids[0];
}
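
The idea of "tearing off the INDIR" can be shown with a stripped-down tree type. This is an illustrative sketch only: struct node, enum nodeop, and lvalue_of are hypothetical, and where lvalue reports an error and returns value(p), the sketch simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

enum nodeop { N_ADDR, N_INDIR, N_CNST };    /* stand-in operators */

struct node {
    enum nodeop op;
    struct node *kid;       /* single operand, or NULL for leaves */
};

/* If n is an INDIR tree, it denotes the contents of an addressable
   location, so its operand is the address; otherwise n is not an lvalue. */
struct node *lvalue_of(struct node *n)
{
    assert(n != NULL);
    if (n->op != N_INDIR)
        return NULL;
    return n->kid;
}
```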

The tree for a structure parameter also depends on the value of the
interface field wants_argb. If wants_argb is 1, the code shown above
builds the appropriate tree, which has the form (INDIR+B (ADDRF+P x))
for parameter x. If wants_argb is zero, the front end implements
structure arguments by copying them at a call and passing pointers to the
copies. Thus, a reference to a structure parameter needs another
indirection to access the structure itself:
(return a tree for a struct parameter 170) = 168
{
e = tree(op, ptr(ptr(p->type)), NULL, NULL);
e->u.sym = p;
return rvalue(rvalue(e));
}

For a parameter x, this code builds the tree
(INDIR+B (INDIR+P (ADDRF+P x)))
idtree is used wherever a tree for an identifier is needed, such as for
string constants (above) and for identifiers:
(an identifier 170)= 167
if (tsym == NULL)
    (undeclared identifier)
if (xref)
    use(tsym, src);
if (tsym->sclass == ENUM)
    p = consttree(tsym->u.value, inttype);
else {
    if (tsym->sclass == TYPEDEF)
        error("illegal use of type name '%s'\n", tsym->name);
    p = idtree(tsym);
}
If tsym is null, the identifier is undeclared, which draws a diagnostic
unless it's a function call (see Exercise 8.5). Enumeration identifiers are
synonyms for constants and yield trees for the constants, not for the
identifiers.

Further Reading
Handling n levels of precedence with one parsing function instead of
n parsing functions is well known folklore in compiler circles, but there
are few explanations of the technique. Hanson (1985) describes the
technique as it is used in lcc. Holzmann (1988) used a similar technique
in his image manipulation language, pico. The technique is technically
equivalent to the one used in BCPL (Richards and Whitby-Strevens 1979),
but the operators and their precedences and associativities are spread
throughout the BCPL code instead of being encapsulated in tables.

Exercises
8.1 Implement
(tree.c exported functions)=
extern Tree retype ARGS((Tree p, Type ty));
which returns p if p->type == ty or a copy of p with type ty. Recall
that all tree-manipulation functions are applicative.
8.2 Implement
(tree.c exported functions)+=
extern Tree rightkid ARGS((Tree p));
which returns the rightmost non-RIGHT operand of a nested series
of RIGHT trees rooted at p. Don't forget that RIGHT nodes can have
one or two operands (but not zero).
8.3 Implement
(tree.c exported functions)+=
extern int hascall ARGS((Tree p));
which returns one if p contains a CALL tree and zero otherwise.
Don't forget about the interface flag mulops_calls.
8.4 Reimplement expr3 the straightforward way shown at the beginning
of Section 8.6, and measure its performance. Are the savings
gained by removing the recursive calls worth the effort?
8.5 Complete the code for (undeclared identifier) used on page 170. If
the identifier is used as a function, which is legal, supply an
implicit declaration for the identifier at the current scope and in the
externals table. Otherwise, the undeclared identifier is an error,
but it's useful to supply an implicit declaration for it anyway so
that compilation can proceed.
8.6 As explained in Section 8.4, the trees returned by incr are dags.
Add new tree operators for the augmented assignment operators
and rewrite incr to use them and thus avoid the dags. You'll need
to change listnodes, and you might have to change the semantics
functions in enode.c.
9
Expression Semantics

Expressions must be both syntactically and semantically correct. The
parsing functions described in the previous chapter handle the syntactic
issues and some of the simpler semantic issues, such as building the
trees for constants and identifiers. This chapter describes the semantic
analyses that must be done to build trees for expressions. These
analyses must deal with three separate subproblems of approximately
equal difficulty: implicit conversions, type checking, and order of
evaluation.
Implicit conversions are conversions that do not appear in the source
program and that must be added by the compiler in order to adhere to
the semantic rules of the standard. For example, in a + b, if a is an int
and b is a float, a + b is semantically correct, but an implicit conversion
must be added to convert a's value to a float.
Type checking confirms that the types of an operator's operands are
legal, determines the type of the result, and computes the type-specific
operator that must be used. For example, type checking a + b verifies
that the types of a and b are legal combinations of the arithmetic types,
and uses the types of a and b to determine the type of the result, which
is one of the numeric types. It also determines which type-specific
addition is required. In the a + b example, type checking is handed the
equivalent of (float)a + b, and determines that floating addition is
required.
The compiler must generate trees that obey the standard's rules for
the order of evaluation. For many operators, the order of evaluation is
unspecified. For example, in a[i++] = i, it is unspecified whether i is
incremented before or after the assignment. The order of evaluation is
specified for a few operators; for example, in f() && g(), f must be
called before g; if f returns zero, g must not be called. Similarly, f must
be called before g in (f(), g()). As suggested in expr, RIGHT trees have
a well defined order of evaluation and can be used to force a specific
order of evaluation.
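
The two specified orders mentioned above, for && and for the comma operator, are easy to observe from a C program. The sketch below is a plain illustration and has nothing to do with lcc's trees; calls, f, g, and_order, and comma_order are hypothetical names.

```c
#include <assert.h>
#include <string.h>

static char calls[8];       /* log of the calls made, such as "fg" */

static int f(int result)
{
    assert(strlen(calls) + 1 < sizeof calls);
    strcat(calls, "f");
    return result;
}

static int g(int result)
{
    assert(strlen(calls) + 1 < sizeof calls);
    strcat(calls, "g");
    return result;
}

/* Evaluates f(fval) && g(1) and reports which functions ran, in order. */
const char *and_order(int fval)
{
    calls[0] = '\0';
    (void)(f(fval) && g(1));
    return calls;
}

/* Evaluates (f(0), g(0)) and reports which functions ran, in order. */
const char *comma_order(void)
{
    calls[0] = '\0';
    (void)(f(0), g(0));
    return calls;
}
```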

9.1 Conversions
Conversion functions accept one or more types and return a resulting
type, or accept a tree and perhaps a type and return a tree with the
appropriate conversion. promote(Type ty) is an example of the former
kind of conversion: It implements the integral promotions. It widens
an integral type ty to int, unsigned, or long, if necessary. As stipulated
by the standard, the integral promotions preserve value, including sign.
They are not unsigned preserving. For example, an unsigned char is
promoted to an int, not an unsigned int. A small integral type (or a bit
field) is promoted to int if int can represent all the values of the smaller
type. Otherwise, the small integral type is promoted to unsigned int. In
lcc, int must always represent the values of the smaller integral types,
which is why the final if statement in promote returns inttype.
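
Value-preserving promotion can be checked directly in C11 with _Generic, which selects a result from the type of its operand. This is a sketch under two assumptions: a C11 compiler, and a platform on which int can represent every unsigned char value (the usual case, and the one lcc requires). The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if an unsigned char operand of unary + has type int after
   the integral promotions, and 0 for any other type. */
int uchar_promotes_to_int(void)
{
    unsigned char c = UCHAR_MAX;

    assert(UCHAR_MAX <= INT_MAX);       /* the assumption stated above */
    return _Generic(+c, int: 1, default: 0);
}

/* Because the promoted type is int, negating an unsigned char gives a
   negative value instead of wrapping around. */
int negated_uchar_is_negative(void)
{
    unsigned char c = 1;

    return -c < 0;
}
```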
binary implements the usual arithmetic conversions; it takes two
arithmetic types and returns the type of the result for any binary
arithmetic operator:
(tree.c functions)+=
Type binary(xty, yty) Type xty, yty; {
    if (isdouble(xty) || isdouble(yty))
        return doubletype;
    if (xty == floattype || yty == floattype)
        return floattype;
    if (isunsigned(xty) || isunsigned(yty))
        return unsignedtype;
    return inttype;
}


lcc assumes that doubles and long doubles are the same size and that
longs and ints (both unsigned and signed) are also the same size. These
assumptions simplify the standard's specification of the usual arithmetic
conversions and thus simplify binary. The list below summarizes the
standard's specification in the more general case, when a long double is
bigger than a double, and a long is bigger than an unsigned int:
    long double
    double
    float
    unsigned long int
    long int
    unsigned int
    int
The type of the operand that appears highest in this list is the type to
which the other operand is converted. If none of these types apply, the
operands are converted to ints. lcc's assumptions collapse the first two
types to the first if statement in binary, and the second if statement
handles floats. The third if statement handles the four integer types
because lcc's signed long cannot represent all unsigned values.
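
The collapsed rule is small enough to restate over a toy set of type codes, together with one observable consequence in C itself. The sketch is illustrative: enum ty, binary_type, and minus_one_less_than_one_u are hypothetical names, and the four codes stand for the four result types that binary can return.

```c
#include <assert.h>

enum ty { T_INT, T_UNSIGNED, T_FLOAT, T_DOUBLE };   /* stand-in type codes */

/* Result type of a binary arithmetic operator, tested in the same order
   as binary: double wins, then float, then unsigned, else int. */
enum ty binary_type(enum ty x, enum ty y)
{
    assert(x >= T_INT && x <= T_DOUBLE);
    assert(y >= T_INT && y <= T_DOUBLE);
    if (x == T_DOUBLE || y == T_DOUBLE)
        return T_DOUBLE;
    if (x == T_FLOAT || y == T_FLOAT)
        return T_FLOAT;
    if (x == T_UNSIGNED || y == T_UNSIGNED)
        return T_UNSIGNED;
    return T_INT;
}

/* With an int and an unsigned operand, the int is converted to unsigned,
   so -1 becomes the largest unsigned value and the comparison is false. */
int minus_one_less_than_one_u(void)
{
    return -1 < 1u;
}
```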
pointer is an example of the second kind of conversion function, which
takes a tree and returns a tree, possibly converted. Array and function
types decay into pointers when used in expressions: (ARRAY T) and
(FUNCTION T) decay into (POINTER T) and (POINTER (FUNCTION T)).
(tree.c functions)+=
Tree pointer(p) Tree p; {
    if (isarray(p->type))
        p = retype(p, atop(p->type));
    else if (isfunc(p->type))
        p = retype(p, ptr(p->type));
    return p;
}
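
Both decays are visible in ordinary C, as the following sketch shows. The names table, twice, array_decays, call_through_decay, and elements are hypothetical. The last function relies on a fact about the language rather than about this code: the operand of sizeof is one of the contexts in which an array does not decay.

```c
#include <assert.h>
#include <stddef.h>

static int table[4];

static int twice(int x)
{
    return 2*x;
}

/* An array name used in an expression becomes a pointer to element 0. */
int *array_decays(void)
{
    int *p = table;

    assert(p == &table[0]);
    return p;
}

/* A function name used in an expression becomes a pointer to the function. */
int call_through_decay(int x)
{
    int (*fp)(int) = twice;

    return fp(x);
}

/* No decay under sizeof, so the whole array's size is available. */
size_t elements(void)
{
    return sizeof table / sizeof table[0];
}
```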

rvalue, lvalue, and value can also be viewed as conversions. cond is
the inverse of value; it takes a tree that might represent a value and
turns it into a tree for a conditional by adding a comparison with zero:
(tree.c functions)+=
Tree cond(p) Tree p; {
    int op = generic(rightkid(p)->op);

    if (op == AND || op == OR || op == NOT
    ||  op == EQ  || op == NE
    ||  op == LE  || op == LT || op == GE || op == GT)
        return p;
    p = pointer(p);
    p = cast(p, promote(p->type));
    return (*optree[NEQ])(NE, p, consttree(0, inttype));
}

consttree 193
A conditional has no value; it's used only in a context in which its
outcome affects the flow of control, such as in an if statement. cond
returns a tree whose outcome is true when the value is nonzero.
cond calls cast to convert its argument to the basic type given by its
promoted type. cast implements the conversions depicted in Figure 9.1.
Each arrow in Figure 9.1 represents one of the conversion operators. For
example, the arrow from I to D represents conversion from integer to
double, CVI+D, and the opposite arrow represents conversion from double
to integer, CVD+I. The C above the I denotes signed characters and
the C above the U denotes unsigned characters; similar comments apply
to the two occurrences of S.
Conversions that don't have arrows are implemented by combining the
existing operators. For example, a signed short integer s is converted to
a float by converting it to an integer, then to a double, and finally to
float. The tree is (CVD+F (CVI+D (CVS+I s))). Conversions between
unsigned and double are handled differently, as described below.
cast has three parts that correspond to the steps just outlined. First,
p is converted to its supertype, which is D, I, or U. Then, it's converted
to the supertype of the destination type, if necessary. Finally, it's
converted to the destination type.
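
The three steps can be traced without building trees by listing the operators they would apply. The sketch below is a simplification under stated assumptions: it covers only signed char, short, int, unsigned, float, and double; it leaves out pointers and the small unsigned types; and it refuses the unsigned/double pair, which has no operators of its own. enum ty, super_of, emit, and conversion_path are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum ty { T_CHAR, T_SHORT, T_INT, T_UNSIGNED, T_FLOAT, T_DOUBLE };

static const char suffix[] = "CSIUFD";      /* one letter per enum ty value */

/* Supertype: I for the signed integers, D for float and double, else U. */
static enum ty super_of(enum ty t)
{
    switch (t) {
    case T_CHAR: case T_SHORT: case T_INT:
        return T_INT;
    case T_FLOAT: case T_DOUBLE:
        return T_DOUBLE;
    default:
        return T_UNSIGNED;
    }
}

/* Appends "CVx+y" to out, separated from earlier operators by a space. */
static void emit(char *out, enum ty from, enum ty to)
{
    char op[] = "CVx+y";

    op[2] = suffix[from];
    op[4] = suffix[to];
    if (out[0] != '\0')
        strcat(out, " ");
    strcat(out, op);
}

/* Writes the operators, in order of application, that convert from one
   type to another. out must have room for at least 18 characters. */
void conversion_path(enum ty from, enum ty to, char *out)
{
    enum ty cur = from;

    out[0] = '\0';
    if (from == to)
        return;
    if (super_of(cur) != cur) {             /* step 1: to super(from) */
        emit(out, cur, super_of(cur));
        cur = super_of(cur);
    }
    if (cur != super_of(to)) {              /* step 2: to super(to) */
        /* There are no direct U/D operators, so this sketch stops here. */
        assert(!(cur == T_UNSIGNED && super_of(to) == T_DOUBLE));
        assert(!(cur == T_DOUBLE && super_of(to) == T_UNSIGNED));
        emit(out, cur, super_of(to));
        cur = super_of(to);
    }
    if (cur != to)                          /* step 3: to the destination */
        emit(out, cur, to);
}
```

Reading the result for short to float from left to right gives the operators from the innermost to the outermost node of the tree quoted above.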
[Figure: conversion arrows link D, I, U, and P in a row; C and S attach
to I, C and S attach to U, and F attaches to D.]

FIGURE 9.1 Conversions.

(tree.c functions)+=
Tree cast(p, type) Tree p; Type type; {
    Type pty, ty = unqual(type);

    p = value(p);
    if (p->type == type)
        return p;
    pty = unqual(p->type);
    if (pty == ty)
        return retype(p, type);
    (convert p to super(pty) 175)
    (convert p to super(ty) 176)
    (convert p to ty 177)
    return p;
}
109 FLOAT
As shown, these conversions are done with the unqualified versions of
the types involved. super returns its argument's supertype.
The first step makes all signed integers ints, floats doubles, and
pointers unsigneds:
(convert p to super(pty) 175)= 175
switch (pty->op) {
case CHAR:    p = simplify(CVC, super(pty), p, NULL); break;
case SHORT:   p = simplify(CVS, super(pty), p, NULL); break;
case FLOAT:   p = simplify(CVF, doubletype, p, NULL); break;
case INT:     p = retype(p, inttype);                 break;
case DOUBLE:  p = retype(p, doubletype);              break;
case ENUM:    p = retype(p, inttype);                 break;
case UNSIGNED:p = retype(p, unsignedtype);            break;
case POINTER:
    if (isptr(ty)) {
        (pointer-to-pointer conversion 176)
    } else
        p = simplify(CVP, unsignedtype, p, NULL);
    break;
}

simplify builds trees just like tree, but folds constants, if possible, and,
if a generic operator is given as its first argument, simplify forms the
type-specific operator from its first and second arguments. lcc insists
that pointers fit in unsigned integers, so that they can be carried by
unsigned operators, which reduces the operator vocabulary. There's one
special case: the CVP+U is eliminated for pointer-to-pointer conversions
because it's always useless there.
(pointer-to-pointer conversion 176) = 175
if (isfunc(pty->type) && !isfunc(ty->type)
|| !isfunc(pty->type) &&  isfunc(ty->type))
    warning("conversion from '%t' to '%t' is compiler dependent\n",
        p->type, ty);
return retype(p, type);
lcc warns about conversions between object pointers and function
pointers because the standard permits these different kinds of pointers
to have different sizes. lcc, however, insists that they have the same
size.
The second step converts p, which is now a double, int, or unsigned,
to whichever one of these three types is ty's supertype, if necessary:
(convert p to super(ty) 176)= 175
{
    Type sty = super(ty);
    pty = p->type;
    if (pty != sty)
        if (pty == inttype)
            p = simplify(CVI, sty, p, NULL);
        else if (pty == doubletype)
            if (sty == unsignedtype) {
                (double-to-unsigned conversion)
            } else
                p = simplify(CVD, sty, p, NULL);
        else if (pty == unsignedtype)
            if (sty == doubletype) {
                (unsigned-to-double conversion 177)
            } else
                p = simplify(CVU, sty, p, NULL);
}

Notice that there are no arrows directly between D and U in Figure 9.1.
Most machines have instructions that convert between signed integers
and doubles, but few have instructions that convert between unsigneds
and doubles, so there is no CVU+D or CVD+U. Instead, the front end builds
trees that implement these conversions, assuming that integers and
unsigneds are the same size.
An unsigned u can be converted to a double by constructing an
expression equivalent to
2.*(int)(u>>1) + (int)(u&1)
u>>1 vacates the sign bit so that the shifted result, which is equal to
u/2, can be converted to a double with an integer-to-double conversion.
The floating-point multiplication and addition compute the value desired.
The code builds the tree for this expression:
(unsigned-to-double conversion 177)= 176
Tree two = tree(CNST+D, doubletype, NULL, NULL);
two->u.v.d = 2.;
p = (*optree['+'])(ADD,
    (*optree['*'])(MUL,
        two,
        simplify(CVU, inttype,
            simplify(RSH, unsignedtype,
                p, consttree(1, inttype)), NULL)),
    simplify(CVU, inttype,
        simplify(BAND, unsignedtype,
            p, consttree(1, unsignedtype)), NULL));
57 chartype
Notice that this tree is a dag: It contains two references to p. The
optree functions are used for the multiplication and addition so that the
integer-to-double conversions will be included.
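
The expression that the tree implements can be checked as ordinary C. The function below is a direct transcription of 2.*(int)(u>>1) + (int)(u&1), not lcc code; unsigned_to_double is a hypothetical name. Like the front end, it assumes that int and unsigned are the same size, so that u>>1 always fits in an int.

```c
#include <assert.h>
#include <limits.h>

/* Converts u to double using only signed-integer-to-double conversions. */
double unsigned_to_double(unsigned u)
{
    int half = (int)(u >> 1);   /* sign bit vacated, so this is u/2 >= 0 */
    int low  = (int)(u & 1);    /* the bit that the shift discarded */

    assert(half >= 0);
    return 2.*half + low;       /* both ints are converted to double here */
}
```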
The front end implements double-to-unsigned conversions by
constructing a tree for the appropriate expression. Exercise 9.2 explores
how.
The tree now represents a value whose type is the supertype of ty,
and the third step in cast converts the tree to the destination type. This
step is essentially the inverse of super:
58 unsignedshort
(convert p to ty 177)= 175
if (ty == signedchar || ty == chartype || ty == shorttype)
    p = simplify(CVI, type, p, NULL);
else if (isptr(ty)
|| ty == unsignedchar || ty == unsignedshort)
    p = simplify(CVU, type, p, NULL);
else if (ty == floattype)
    p = simplify(CVD, type, p, NULL);
else
    p = retype(p, type);

9.2 Unary and Postfix Operators


The conversion functions described above provide the machinery needed
to implement the semantic checking for each of the operators. The con-
straints on the operands, such as their types, and the semantics of the
operator, such as its result type, are defined in the standard. The prose
for unary - is typical:
The operand of the unary - operator shall have arithmetic type.
The result of the unary - operator is the negative of its operand. The
integral promotion is performed on the operand, and the result has the
promoted type.
The code for each operator implements these kinds of specifications;
it checks that the operand trees meet the constraints and it builds the
appropriate tree for the result. For example, the code for unary - is
(negation 178)= 164
p = pointer(p);
if (isarith(p->type)) {
    p = cast(p, promote(p->type));
    if (isunsigned(p->type)) {
        warning("unsigned operand of unary -\n");
        p = simplify(NEG, inttype, cast(p, inttype), NULL);
        p = cast(p, unsignedtype);
    } else
        p = simplify(NEG, p->type, p, NULL);
} else
    typeerror(SUB, p, NULL);
typeerror issues a diagnostic for illegal operands to a unary or binary
operator. For example, if pi is an int *, -pi is illegal because pi does
not have an arithmetic type, and typeerror issues
operand of unary - has illegal type 'pointer to int'
Warning about using unsigned operands to unary - is not required by
the standard, but helps pinpoint probable errors. This warning would be
appropriate even if lcc supported a signed long type that could hold all
negated unsigneds, because the integral promotions do not yield any of
the long types.
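
Why the warning points at probable errors is clear from what unary - does to an unsigned operand in C: the result stays unsigned, so it is never negative. A minimal sketch, with negate_unsigned as a hypothetical name:

```c
#include <assert.h>
#include <limits.h>

/* Negation of an unsigned value is computed modulo UINT_MAX + 1. */
unsigned negate_unsigned(unsigned u)
{
    unsigned r = -u;

    assert(r + u == 0);     /* the sum wraps around to zero */
    return r;
}
```

A programmer who writes -u expecting a negative number gets a large positive one instead, which is the situation the warning is meant to flag.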
For unary &, the standard says
The operand of the unary & operator shall be either a function designator
or an lvalue that designates an object that is not a bit-field and not
declared with the register storage-class specifier.
Unary & takes an operand of type T and returns its address, which has
type (POINTER T). In most cases, the semantics above are provided by
lvalue, which exposes the addressing tree under an INDIR. The
exceptions are arrays and functions, which have no INDIRs:
(address of 179)= 164
if (isarray(p->type) || isfunc(p->type))
    p = retype(p, ptr(p->type));
else
    p = lvalue(p);
if (isaddrop(p->op) && p->u.sym->sclass == REGISTER)
    error("invalid operand of unary &; '%s' is declared register\n",
        p->u.sym->name);
else if (isaddrop(p->op))
    p->u.sym->addressed = 1;

(tree.c exported macros)=
#define isaddrop(op) \
    ((op)==ADDRG+P || (op)==ADDRL+P || (op)==ADDRF+P)

(symbol flags 50)+= 38
unsigned addressed:1;
As specified above, unary & cannot be applied to register variables or to
bit fields. Trees for bit fields don't have INDIRs, so lvalue catches them.
The front end changes the storage class of frequently referenced locals
and parameters to REGISTER before it passes them to the back end. But
it must not change the storage class of variables whose addresses are
taken, which are those symbols with addressed lit.
Unary * is the inverse of unary &; it takes an operand with the type
(POINTER T) and wraps it in an INDIR tree to represent an rvalue of type
T. Again, most of the work is done by rvalue, and pointers to arrays
and functions need special treatment.
(indirection 179)=
    p = pointer(p);
    if (isptr(p->type)
    && (isfunc(p->type->type) || isarray(p->type->type)))
        p = retype(p, p->type->type);
    else {
        if (YYnull)
            p = nullcheck(p);
        p = rvalue(p);
    }

Exercise 9.5 explains YYnull and nullcheck, which help catch
null-pointer errors.
Type casts specify explicit conversions. Some casts, such as pointer-
to-pointer casts, generate no code, but simply specify the type of an ex-
pression. Other casts, such as int-to-float, generate code that effects the
conversion at runtime. The code below and the code in cast implement
the rules specified by the standard.

The standard stipulates that the target type specified in a cast must be
a qualified or unqualified scalar type or void, and the type of the operand
- the source type - must be a scalar type. The semantic analysis of
casts divides into computing and checking the target type, parsing the
operand, and computing and checking the source type. typename parses
a type declarator and returns the resulting Type, and thus does most of
the work of computing the target type, except for qualified enumerations:
(type cast 180)=
    Type ty, ty1 = typename(), pty;

    expect(')');
    ty = unqual(ty1);
    if (isenum(ty)) {
        Type ty2 = ty->type;
        if (isconst(ty1))
            ty2 = qual(CONST, ty2);
        if (isvolatile(ty1))
            ty2 = qual(VOLATILE, ty2);
        ty1 = ty2;
        ty = ty->type;
    }

This code computes the target type ty1 and its unqualified variant ty.
The target type for a cast that specifies an enumeration type is the
enumeration's underlying integral type (which for lcc is always int), not
the enumeration. Thus, ty1 and ty must be recomputed before parsing the
operand.
(type cast 180)+=
    p = pointer(unary());
    pty = p->type;
    if (isenum(pty))
        pty = pty->type;
This tree is cast to the unqualified type, ty, if the target and source types
are legal: arithmetic and enumeration types can be cast to each other;
pointers can be cast to other pointers; pointers can be cast to integral
types and vice versa, but the result is undefined if the sizes of the types
differ; and any type can be cast to void.
(type cast 180)+=
    if (isarith(pty) && isarith(ty)
    ||  isptr(pty) && isptr(ty))
        p = cast(p, ty);
    else if (isptr(pty) && isint(ty)
    ||       isint(pty) && isptr(ty)) {
        if (Aflag >= 1 && ty->size < pty->size)
            warning("conversion from '%t' to '%t' is compiler dependent\n",
                p->type, ty);
        p = cast(p, ty);
    } else if (ty != voidtype) {
        error("cast from '%t' to '%t' is illegal\n",
            p->type, ty1);
        ty1 = inttype;
    }
Recall that cast warns about casts between object and function pointers.
The final step is to annotate p with the possibly qualified type:
(type cast 180)+=
    p = retype(p, ty1);
    if (generic(p->op) == INDIR)
        p = tree(RIGHT, ty, NULL, p);
A cast is not an lvalue, so if p is an INDIR tree, it's hidden under a RIGHT
tree, which keeps lvalue from accepting it as an lvalue.
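The legality rules for casts can be summarized by a small classifier. This is only a sketch under simplifying assumptions: the enum category and the functions is_arith, cast_is_legal, and cast_demo are hypothetical stand-ins for lcc's Type predicates, and enumerations are assumed to have already been replaced by their underlying integral type, as the fragments above do.

```c
#include <assert.h>

/* Hypothetical type categories; lcc tests Type structures instead. */
enum category { C_INTEGRAL, C_FLOATING, C_POINTER, C_STRUCT, C_VOID };

static int is_arith(enum category c) {
    return c == C_INTEGRAL || c == C_FLOATING;
}

/* Returns 1 if a cast from src to dst is accepted by the rules in the text. */
int cast_is_legal(enum category src, enum category dst) {
    if (is_arith(src) && is_arith(dst))
        return 1;                       /* arithmetic to arithmetic */
    if (src == C_POINTER && dst == C_POINTER)
        return 1;                       /* pointer to pointer */
    if ((src == C_POINTER && dst == C_INTEGRAL)
    ||  (src == C_INTEGRAL && dst == C_POINTER))
        return 1;                       /* pointer to or from integral */
    return dst == C_VOID;               /* anything to void */
}

/* Usage sketch with the C expressions each case models. */
void cast_demo(void) {
    assert(cast_is_legal(C_FLOATING, C_INTEGRAL));   /* (int)3.5 */
    assert(cast_is_legal(C_POINTER, C_INTEGRAL));    /* size dependent */
    assert(!cast_is_legal(C_FLOATING, C_POINTER));   /* (char *)3.5 */
    assert(cast_is_legal(C_STRUCT, C_VOID));         /* (void)s */
}
```

The sketch leaves out the size warning and the qualifier handling, which depend on information carried by the real Type structures.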
The standard stipulates that an expression of the form e[i] be treated
as equivalent to *(e+i). One of the operands must be a pointer and the
other must be an integral type. The semantics function for addition does
most of the work once e and i are recognized:
(subscript 181)=
    {
        Tree q;
        t = gettok();
        q = expr(']');
        if (YYnull)
            if (isptr(p->type))
                p = nullcheck(p);
            else if (isptr(q->type))
                q = nullcheck(q);
        p = (*optree['+'])(ADD, pointer(p), pointer(q));
        if (isptr(p->type) && isarray(p->type->type))
            p = retype(p, p->type->type);
        else
            p = rvalue(p);
    }
The last if statement handles n-dimensional arrays; for example, if x is
declared int x[10][20], x[i] refers to the ith row, which has type
(ARRAY 20 (INT)), but x[i] is not an lvalue. Similar comments apply to
i[x], which is a bit peculiar but equivalent nonetheless.
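The equivalence can be checked directly in C. The following sketch uses the array from the text; the function names spellings_agree and row_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int x[10][20];   /* the array from the text */

/* Stores v through x[i][j] and reads it back through the two
   equivalent spellings; returns 1 if all three agree. */
int spellings_agree(int i, int j, int v) {
    assert(i >= 0 && i < 10 && j >= 0 && j < 20);
    x[i][j] = v;
    return *(*(x + i) + j) == v && j[i[x]] == v;
}

/* x[i] has type "array of 20 ints", so its size is that of a whole row. */
size_t row_size(void) {
    return sizeof x[0];
}
```

The expression j[i[x]] works because i[x] means *(i + x), which is the ith row, and that row decays to a pointer that can in turn be added to j.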
References to fields are similar to subscripting; they yield trees that
refer to the rvalue of the indicated field and are thus lvalues, or, for
array fields, trees that refer to the address of the field. The parsing is
straightforward:
(pointer->field 182)=
    t = gettok();
    p = pointer(p);
    if (t == ID) {
        if (isptr(p->type) && isstruct(p->type->type)) {
            if (YYnull)
                p = nullcheck(p);
            p = field(p, token);
        } else
            error("left operand of -> has incompatible type '%t'\n",
                p->type);
        t = gettok();
    } else
        error("field name expected\n");
field calls fieldref, which returns the Field that gives the type and
location of the field.
(tree.c functions)+=
    Tree field(p, name) Tree p; char *name; {
        Field q;
        Type ty1, ty = p->type;

        if (isptr(ty))
            ty = deref(ty);
        ty1 = ty;
        ty = unqual(ty);
        if ((q = fieldref(name, ty)) != NULL) {
            (access the field described by q 182)
        } else {
            error("unknown field '%s' of '%t'\n", name, ty);
            p = rvalue(retype(p, ptr(inttype)));
        }
        return p;
    }

field must cope with qualified structure types. If a structure type is
declared const or volatile, references to its fields must be similarly
qualified even though the qualifiers are not permitted in field declarators.
q->type is the type of the field and q->offset is the byte offset to the
field.
(access the field described by q 182)=
    if (isarray(q->type)) {
        ty = q->type->type;
        (qualify ty, when necessary 183)
        ty = array(ty, q->type->size/ty->size, q->type->align);
    } else {
        ty = q->type;
        (qualify ty, when necessary 183)
        ty = ptr(ty);
    }
    p = simplify(ADD+P, ty, p, consttree(q->offset, inttype));

(qualify ty, when necessary 183)=
    if (isconst(ty1) && !isconst(ty))
        ty = qual(CONST, ty);
    if (isvolatile(ty1) && !isvolatile(ty))
        ty = qual(VOLATILE, ty);
simplify returns a tree for the address of the field, or the address of the
unsigned that holds a bit field. A nonzero q->lsb gives the position plus
one of a bit field's least significant bit, and serves to identify a field as a
bit field. Bit fields are referenced via FIELD trees, and are not lvalues.
(access the field described by q 182)+=
    if (q->lsb) {
        p = tree(FIELD, ty->type, rvalue(p), NULL);
        p->u.field = q;
    } else if (!isarray(q->type))
        p = rvalue(p);

(u fields for Tree variants 168)+=
    Field field;
The u.field field in a FIELD tree points to the Field structure, defined
in Section 4.6, that describes the bit field.
The expression e.name is equivalent to (&e)->name, so field is also
called by the fragment (struct.field). That code builds a tree for the
address of .'s left operand, and passes it to field.
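For ordinary fields, the address computation that field builds as an ADD+P tree has a direct counterpart in C. The sketch below is not lcc code; struct pair and the functions second_address and same_field are hypothetical, and the standard offsetof macro plays the role of q->offset.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int first; int second; };   /* hypothetical structure */

/* Mirrors the ADD+P tree built by field: the address of the field is
   the address of the structure plus the field's byte offset. */
int *second_address(struct pair *p) {
    assert(p != NULL);
    return (int *)((char *)p + offsetof(struct pair, second));
}

/* e.name and (&e)->name denote the same object. */
int same_field(struct pair e) {
    return &e.second == &(&e)->second && e.second == (&e)->second;
}
```

Fetching the field's value then corresponds to wrapping this address in an rvalue, which is what field does for fields that are neither arrays nor bit fields.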

9.3 Function Calls

Function calls are easy to parse but difficult to analyze. The analysis
must cope with calls to both new-style and old-style functions in
which the semantics imposed by the standard affect the conversions
and argument checking. Semantic analysis must also handle the order
of evaluation of the arguments (which depends on the interface flag
left_to_right), passing and returning structures by value (which depends
on the interface flags wants_argb and wants_callb), and actual
arguments that include other calls. All these variants are caused by lcc's
interface, not by rules in the standard, and all of them could be eliminated.
Doing so, however, would make it impossible for lcc to generate
code that mimics the established calling sequences on one or more of its
targets. These complexities are the price of compatibility with existing
calling conventions.

FIGURE 9.2 Tree for f(a, '\n', atoi(str)).

The meaningless program
    char *str;
    struct node { ... } a;
    struct node f(struct node x, char c, int i) { ... }
    main () { f(a, '\n', atoi(str)); }
illustrates almost all these complexities. The tree for the call to f is
shown in Figure 9.2, which assumes that wants_argb is one. The CALL+B's
right operand is described below. The RIGHT trees in this figure collab-
orate to achieve the desired evaluation order. A CALL's left operand is
a RIGHT tree that evaluates the arguments (the ARG trees) and the func-
tion itself. The leftmost RIGHT tree in Figure 9.2 is an example. The
tree whose root is the shaded RIGHT in Figure 9.2 occurs because of the
nested call to atoi. When this tree is traversed, code is generated so
that the call to atoi occurs before the arguments to f are evaluated. In
general, there's one RIGHT tree for each argument that includes a call,
and one if the function name is itself an expression with a call.
The actual arguments are represented by ARG trees, rightmost argu-
ment first; their right operands are the trees for the evaluation of the rest
of the actual arguments. Recall that ARG trees can have two operands.

FIGURE 9.3 Passing a structure by value when wants_argb is zero.

The topmost ARG+I is for the argument atoi(str), and its left operand
points to the CALL+I described above. The presence of the RIGHT tree
will cause the back end to store the value returned by atoi in a temporary,
and the reference from the ARG+I to the CALL+I for atoi will pass
that value to f.
The second ARG+I is for the newline passed as the second argument.
f has a prototype and is thus a new-style function, so it might be expected
that the integer constant '\n' would be converted to and passed
as a character. Most machines have constraints, such as stack alignment,
that force subword types to be passed as words. Even without such
constraints, passing subword types as words is usually more efficient. So
lcc generates code to widen short arguments and character arguments
to integers when they are passed, and code to narrow them upon entry
for new-style functions. If the global char ch was passed as f's second
argument, the tree would be
    (ARG+I (CVC+I (INDIR+C (ADDRG+P ch))))

The bottom ARG+B tree passes the structure a to f by value. ARG+B
is provided so that back ends can use target-specific calling sequences;
Chapter 16 shows how it's used on the MIPS. If wants_argb is zero, the
front end completely implements value transmission for structures. It
copies the actual argument to a local temporary in the caller, and passes
the address of that temporary. As detailed in idtree, references to the
actual argument in the callee use an extra indirection to fetch the struc-
ture. Figure 9.3 shows the ARG tree for passing a to f when wants_argb
is zero. The RIGHT tree generates code to assign a to the temporary, t2,
followed by passing the address of t2.
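The effect of the copy can be imitated in source-level C. This is a sketch of the idea only, not of the trees lcc builds; struct node's contents and the functions callee and call_with_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; };   /* hypothetical contents */

/* The callee receives the address of a copy and may modify it freely. */
static void callee(struct node *x) {
    assert(x != NULL);
    x->value = 99;
}

/* What the front end arranges when wants_argb is zero: copy the actual
   argument to a temporary, then pass the temporary's address. */
int call_with_copy(const struct node *a) {
    struct node t2 = *a;   /* plays the role of the temporary t2 */
    callee(&t2);
    return a->value;       /* the caller's structure is unchanged */
}
```

Because the callee only ever sees the temporary, value semantics are preserved even though an address is what actually crosses the call.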
The right operand of CALL+B is the address of a temporary in the caller
to which the return value is assigned; t3 in Figure 9.2, for example. When
the interface flag wants_callb is one, the back end must arrange to pass
this address to the callee. When wants_callb is zero, the front end
arranges to pass this address as a hidden first argument, and it changes
the CALL+B to a CALL+V; in this case, the back end never sees CALLB nodes.
This change is made by listnodes when the tree for a call is converted
to a forest of nodes for the back end.
listnodes also inspects the interface flag left_to_right as it traverses
a call tree. If left_to_right is one, the argument subtree is
traversed by visiting the right operands of ARG trees first, which
generates code that evaluates the arguments from the left to the right. If
left_to_right is zero, the left operands are visited first, which evaluates
the arguments from the right to the left.
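The two traversal orders can be illustrated with a toy model of the ARG chain. In this sketch, struct arg is a hypothetical stand-in for an ARG tree: position records where the argument appeared in the source (1 is leftmost), and rest plays the role of the right operand, which leads to the arguments that precede this one. The function visit is likewise hypothetical and is not lcc's listnodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ARG tree. */
struct arg {
    int position;       /* 1 = leftmost argument in the source */
    struct arg *rest;   /* right operand: the earlier arguments */
};

/* Appends the evaluation order to out[] starting at index n and
   returns the new count. */
size_t visit(const struct arg *a, int left_to_right, int out[], size_t n) {
    assert(out != NULL);
    if (a == NULL)
        return n;
    if (left_to_right) {
        n = visit(a->rest, left_to_right, out, n);   /* right operand first */
        out[n++] = a->position;
    } else {
        out[n++] = a->position;                      /* left operand first */
        n = visit(a->rest, left_to_right, out, n);
    }
    return n;
}
```

Since the root of the chain is the rightmost argument, visiting right operands first reaches the leftmost argument before any other, which yields left-to-right evaluation.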
The case in postfix checks the type of the function expression and
lets call do most of the work:
(calls 186)=
    {
        Type ty;
        Coordinate pt;
        p = pointer(p);
        if (isptr(p->type) && isfunc(p->type->type))
            ty = p->type->type;
        else {
            error("found '%t' expected a function\n", p->type);
            ty = func(voidtype, NULL, 1);
        }
        pt = src;
        t = gettok();
        p = call(p, ty, pt);
    }
pointer 174
postfix 166 ca11 dedicates locals to deal with each of the semantic issues de-
RIGHT 149 scribed above. n counts the number of actual arguments. args is the
unqual 60 root of the argument tree, and r is the root of the RIGHT tree that holds
voidtype 58 arguments or function expressions that include calls. For the example
wants_callb 88 shown in Figure 9.2, r points to the CALL+! tree. After parsing the argu-
ments, if r is nonnull, it and args are pasted together in a RIGHT tree,
which is the subtree rooted at the shaded RIGHT in Figure 9.2. hasca 11 re-
turns a nonzero value if its argument tree includes a CALL, and funcname
returns the name buried in for the string "a function" if f computes a
function address.
(enode.c functions)=
    Tree call(f, fty, src) Tree f; Type fty; Coordinate src; {
        int n = 0;
        Tree args = NULL, r = NULL;
        Type *proto, rty = unqual(freturn(fty));
        Symbol t3 = NULL;

        if (fty->u.f.oldstyle)
            proto = NULL;
        else
            proto = fty->u.f.proto;
        if (hascall(f))
            r = f;
        if (isstruct(rty))
            (initialize for a struct function 187)
        if (t != ')')
            for (;;) {
                (parse one argument 188)
                if (t != ',')
                    break;
                t = gettok();
            }
        expect(')');
        if ((still in a new-style prototype? 187))
            error("insufficient number of arguments to %s\n",
                funcname(f));
        if (r)
            args = tree(RIGHT, voidtype, r, args);
        if (events.calls)
            (plant an event hook for a call)
        return calltree(f, rty, args, t3);
    }

f is the expression for the function, rty is the return type, and proto is
either null for an old-style function (even if it has a prototype; see
Section 4.5) or walks along the function prototype for a new-style function.
A nonnull proto is incremented for each actual argument that corresponds
to a formal parameter in a new-style prototype, and
(still in a new-style prototype? 187)=
    proto && *proto && *proto != voidtype
tests if proto points to a formal parameter type, when there is a
prototype. Reaching the end of a prototype is different from reaching the end
of the actual arguments; for example, excess arguments are permitted in
new-style functions with a variable number of arguments.
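The way proto advances can be modeled with a null-terminated array. This is a simplified sketch: Type is reduced to a string pointer, voidtype is a hypothetical sentinel object standing in for lcc's void type, and the functions in_prototype and checked_args are not part of lcc.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *Type;              /* hypothetical stand-in for Type */
static const char voidtype[] = "void"; /* marks a trailing ", ..." */

/* Same test as the fragment: proto is nonnull, not at the terminating
   null, and not at the void that marks a variable argument list. */
int in_prototype(Type *proto) {
    return proto && *proto && *proto != voidtype;
}

/* Counts how many of nargs actual arguments correspond to formals. */
int checked_args(Type *proto, int nargs) {
    int n = 0;
    assert(nargs >= 0);
    while (n < nargs && in_prototype(proto)) {
        ++proto;
        ++n;
    }
    return n;
}
```

For a printf-like prototype, only the first argument is checked against a formal; the remaining arguments fall outside the prototype, and if the test still succeeds after the last actual argument, too few arguments were supplied.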
If the function returns a structure, t3 is the temporary that's generated
to hold the return value:
(initialize for a struct function 187)=
    {
        t3 = temporary(AUTO, unqual(rty), level);
        if (rty->size == 0)
            error("illegal use of incomplete type '%t'\n", rty);
    }
t3 is the temporary shown in Figure 9.2. This initialization could be
done after parsing the arguments, but it's done before so that the source
coordinate in the diagnostic shown above pinpoints the beginning of the
argument list.
An actual argument is an assignment-expression:
(parse one argument 188)=
    Tree q = pointer(expr1(0));
    if ((still in a new-style prototype? 187))
        (new-style argument 188)
    else
        (old-style argument 189)
    if (!IR->wants_argb && isstruct(q->type))
        (pass a structure directly 191)
    if (q->type->size == 0)
        q->type = inttype;
    if (hascall(q))
        r = r ? tree(RIGHT, voidtype, r, q) : q;
    args = tree(ARG + widen(q->type), q->type, q, args);
    n++;
    if (Aflag >= 2 && n == 32)
        warning("more than 31 arguments in a call to %s\n",
            funcname(f));
The if statement at the beginning of this fragment distinguishes between
new-style and old-style function types, and handles calls to new-style
functions that have a varying number of arguments, such as printf, or
that have excess arguments. If a prototype specifies a variable length
argument list (by ending in , ...), there are at least two types in the
prototype array and the last one is voidtype. Actual arguments beyond
the last explicit argument are passed in the same way as arguments to
old-style functions are passed.
New-style arguments are passed as if the actual argument were assigned
to the formal parameter. No assignment is actually made because
the argument is carried by an ARG tree, but the argument can be
type-checked with assign, which type-checks assignments:
(new-style argument 188)=
    {
        Type aty;
        q = value(q);
        aty = assign(*proto, q);
        if (aty)
            q = cast(q, aty);
        else
            error("type error in argument %d to %s; found '%t' expected '%t'\n",
                n + 1, funcname(f), q->type, *proto);
        if ((isint(q->type) || isenum(q->type))
        && q->type->size != inttype->size)
            q = cast(q, promote(q->type));
        ++proto;
    }

The second call to cast widens subinteger arguments as described above.
Old-style arguments suffer the default argument promotions. The
integral promotions are performed and floats are promoted to doubles.
(old-style argument 189)=
    {
        if (!fty->u.f.oldstyle && *proto == NULL)
            error("too many arguments to %s\n", funcname(f));
        q = value(q);
        if (q->type == floattype)
            q = cast(q, doubletype);
        else if (isarray(q->type) || q->type->size == 0)
            error("type error in argument %d to %s; '%t' is illegal\n",
                n + 1, funcname(f), q->type);
        else
            q = cast(q, promote(q->type));
    }

The first test in this fragment checks f.oldstyle because it's not enough
to just check for nonnull f.proto: as mentioned in Section 4.5, old-style
functions can carry prototypes, but these prototypes cannot be used to
type-check actual arguments.
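The default argument promotions applied to old-style arguments amount to a small mapping on types. The sketch below assumes that int can represent every char and short value, which is an assumption about the target and not something the fragment states; the enum kind and the function default_promotion are hypothetical.

```c
#include <assert.h>

/* Hypothetical type tags; lcc uses Type structures instead. */
enum kind { K_CHAR, K_SHORT, K_INT, K_UNSIGNED, K_FLOAT, K_DOUBLE };

/* Default argument promotions, assuming int can represent every
   char and short value. */
enum kind default_promotion(enum kind k) {
    assert((int)k >= K_CHAR && (int)k <= K_DOUBLE);
    switch (k) {
    case K_CHAR:
    case K_SHORT:
        return K_INT;       /* integral promotion */
    case K_FLOAT:
        return K_DOUBLE;    /* floats are promoted to doubles */
    default:
        return k;           /* int, unsigned, and double are unchanged */
    }
}
```

In the fragment, the float case is handled by the explicit cast to doubletype and the integral cases by promote.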
The actual CALL tree is built by calltree, which is presented with the
tree for the function expression, the return type, the argument tree, and
the temporary if the function returns a structure. It combines the trees
to form the CALL+B tree shown in Figure 9.2:
(enode.c functions)+=
    Tree calltree(f, ty, args, t3)
    Tree f, args; Type ty; Symbol t3; {
        Tree p;

        if (args)
            f = tree(RIGHT, f->type, args, f);
        if (isstruct(ty))
            p = tree(RIGHT, ty,
                tree(CALL+B, ty, f, addrof(idtree(t3))),
                idtree(t3));
        else {
            Type rty = ty;
            if (isenum(ty))
                rty = unqual(ty)->type;
            else if (isptr(ty))
                rty = unsignedtype;
            p = tree(CALL + widen(rty), promote(rty), f, NULL);
            if (isptr(ty) || p->type->size > ty->size)
                p = cast(p, ty);
        }
        return p;
    }

The operator CALL+I is used for integers, unsigneds, and pointers, so
much of calltree is devoted to getting the types correct. A CALL+B
tree is always tucked under a RIGHT tree that returns the address of the
temporary that holds the return value. Figure 9.2 omits this RIGHT tree;
the tree actually built by calltree and thus returned by call begins
    (RIGHT
        (CALL+B (RIGHT ... (ADDRG+P f)) (ADDRL+P t3))
        (INDIR+B (ADDRL+P t3)))
CALL+B itself returns no value; it exists only to permit back ends to
generate target-specific calling sequences for these functions.
addrof is an internal version of lvalue that doesn't insist on an INDIR
tree (although there is an INDIR tree in calltree's use of addrof). addrof
follows the operands of RIGHT, COND, and ASGN, and the INDIR trees to find
the tree that computes the address specified by its argument. It returns
a RIGHT tree representing the original tree and that address, if necessary.
For example, if α is the operand tree buried in p that computes the
address, addrof(p) returns (RIGHT root(p) α); if p itself computes the
address, addrof(p) returns p.
Structures are always passed by value, but if wants_argb is zero and
the argument is a structure, it must be copied to a temporary as ex-
plained above. There's one optimization that improves the code for pass-
ing structures that are returned by functions. For example, in
f(f(a, '\n', atoi(str)), 'O', 1);
the node returned by the inner call to f is passed to the outer call. In this
and similar cases, copying the actual argument can be avoided because
it already resides in a temporary. The pattern that must be detected is
    (RIGHT
        (CALL+B ...)
        (INDIR+B (ADDRL+P temp))
    )
where temp is a temporary. iscallb looks for this pattern:
(enode.c functions)+=
    int iscallb(e) Tree e; {
        return e->op == RIGHT && e->kids[0] && e->kids[1]
            && e->kids[0]->op == CALL+B
            && e->kids[1]->op == INDIR+B
            && isaddrop(e->kids[1]->kids[0]->op)
            && e->kids[1]->kids[0]->u.sym->temporary;
    }

(pass a structure directly 191)=
    if (iscallb(q))
        q = addrof(q);
    else {
        Symbol t1 = temporary(AUTO, unqual(q->type), level);
        q = asgn(t1, q);
        q = tree(RIGHT, ptr(t1->type),
            root(q), lvalue(idtree(t1)));
    }
asgn(Symbol t, Tree e) is an internal form of assignment that builds
and returns a tree for assigning e to the symbol t.

9.4 Binary Operators

As the indirect call through optree in expr3 suggests, a semantics
function for a binary operator takes a generic operator and the trees for the
two operands, and returns the tree for the binary expression. Table 9.1
lists the functions and the operators they handle. The operators are
grouped as shown in this table because the operators in each group have
similar semantics.
optree is defined by including token.h (see Section 6.2) and extracting
its fifth column, which holds the names of the tree-building functions:
(enode.c data)=
    Tree (*optree[]) ARGS((int, Tree, Tree)) = {
    #define xx(a,b,c,d,e,f,g) e,
    #define yy(a,b,c,d,e,f,g) e,
    #include "token.h"
    };
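The column-extraction technique can be shown in miniature. The sketch below is not token.h: TOKENS is a hypothetical three-column list kept in a macro so the example is self-contained, whereas lcc keeps its seven-column table in an included file. The names handler and handler_name are also hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical three-column table standing in for token.h. */
#define TOKENS(entry) \
    entry(PLUS,  '+', "addtree") \
    entry(MINUS, '-', "subtree") \
    entry(STAR,  '*', "multree")

/* Extract the first column as enumeration constants. */
#define xx(name, ch, fn) name,
enum token { TOKENS(xx) NTOKENS };
#undef xx

/* Extract the third column as an array indexed by token. */
#define xx(name, ch, fn) fn,
static const char *const handler[] = { TOKENS(xx) };
#undef xx

/* Returns the name of the function that would handle token t. */
const char *handler_name(enum token t) {
    assert((int)t >= 0 && (int)t < NTOKENS);
    return handler[t];
}
```

Each redefinition of xx selects a different column from the same list, so the enumeration and the array cannot drift out of step; optree relies on the same property.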

    Function    Operators
    incr        +=  -=  *=  /=  %=
                <<=  >>=  &=  ^=  |=
    asgntree    =
    condtree    ? :
    andtree     ||  &&
    bittree     |  ^  &  %
    eqtree      ==  !=
    cmptree     <  >  <=  >=
    shtree      <<  >>
    addtree     +
    subtree     -
    multree     *  /
TABLE 9.1 Operator semantics functions.

The function for addition typifies these semantics functions. It must
type-check the operands and form the appropriate tree depending on
their types. The easy case is when both operands are arithmetic types:
(enode.c functions)+=
    static Tree addtree(op, l, r) int op; Tree l, r; {
        Type ty = inttype;

        if (isarith(l->type) && isarith(r->type)) {
            ty = binary(l->type, r->type);
            (cast l and r to type ty 192)
        } else if (isptr(l->type) && isint(r->type))
            return addtree(ADD, r, l);
        else if (  isptr(r->type) && isint(l->type)
        && !isfunc(r->type->type))
            (build an ADD+P tree 193)
        else
            typeerror(op, l, r);
        return simplify(op, ty, l, r);
    }

(cast l and r to type ty 192)=
    l = cast(l, ty);
    r = cast(r, ty);
Addition can also take a pointer and an integer in either order. The
recursive call to addtree above switches the arguments so the next clause
can handle both orders. The front end always puts the integer operand
on the left in ADD dags because that order helps back ends implement
some additions with addressing modes.
The standard distinguishes between pointers to objects and pointers
to functions; most operators, such as addition, that take pointers accept
only object pointers. Integers may be added to object pointers, but the
addition implies a multiplication by the size of the object:
(build an ADD+P tree 193)=
    {
        int n;
        ty = unqual(r->type);
        (n ← *ty's size 193)
        l = cast(l, promote(l->type));
        if (n > 1)
            l = multree(MUL, consttree(n, inttype), l);
        return simplify(ADD+P, ty, l, r);
    }

(n ← *ty's size 193)=
    n = ty->type->size;
    if (n == 0)
        error("unknown size for type '%t'\n", ty->type);
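The implied multiplication is visible from C itself. In this sketch, the hypothetical function scaled_offset measures how many bytes p + i lies beyond p; the caller is assumed to keep p + i within, or one past the end of, the array that p points into.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance covered by p + i for an int pointer: the integer
   operand is multiplied by the size of the referenced type.
   Precondition: p + i stays within (or one past) p's array. */
ptrdiff_t scaled_offset(int *p, int i) {
    assert(p != NULL);
    return (char *)(p + i) - (char *)p;
}
```

The result is i times sizeof(int), which corresponds to the MUL tree that the fragment builds when n exceeds one.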
consttree builds a tree for a constant of any type that has an associated
integer or an unsigned value:
(enode.c functions)+=
    Tree consttree(n, ty) unsigned n; Type ty; {
        Tree p;

        if (isarray(ty))
            ty = atop(ty);
        p = tree(CNST + ttob(ty), ty, NULL, NULL);
        p->u.v.u = n;
        return p;
    }

The relational comparison operators also accept only object pointers
and return integers, but they accept more relaxed constraints on their
pointer operands.
(enode.c functions)+=
    static Tree cmptree(op, l, r) int op; Tree l, r; {
        Type ty;

        if (isarith(l->type) && isarith(r->type)) {
            ty = binary(l->type, r->type);
            (cast l and r to type ty 192)
        } else if (compatible(l->type, r->type)) {
            ty = unsignedtype;
            (cast l and r to type ty 192)
        } else {
            ty = unsignedtype;
            typeerror(op, l, r);
        }
        return simplify(op + ttob(ty), inttype, l, r);
    }

The two pointers must point to qualified or unqualified versions of
compatible object types or compatible incomplete types. In other words, any
const and volatile qualifiers must be ignored when type-checking the
objects, which is exactly what compatible does:
(enode.c functions)+=
    static int compatible(ty1, ty2) Type ty1, ty2; {
        return isptr(ty1) && !isfunc(ty1->type)
            && isptr(ty2) && !isfunc(ty2->type)
            && eqtype(unqual(ty1->type), unqual(ty2->type), 0);
    }

The third argument of zero to eqtype causes eqtype to insist that its
two type arguments are object types or incomplete types.
The equality comparison operators are similar to the relationals but
are fussier about pointer operands. These and other operators
distinguish between void pointers, which are pointers to qualified or
unqualified versions of void, and null pointers, which are integral constant
expressions with the value zero or one of these expressions cast to void *.
These definitions are encapsulated in
(enode.c macros)=
    #define isvoidptr(ty) \
        (isptr(ty) && unqual(ty->type) == voidtype)

(enode.c functions)+=
    static int isnullptr(e) Tree e; {
        return (isint(e->type) && generic(e->op) == CNST
            && cast(e, unsignedtype)->u.v.u == 0)
            || (isvoidptr(e->type) && e->op == CNST+P
            && e->u.v.p == NULL);
    }

In addition to the arithmetic types, which are handled by calling cmptree,
eqtree accepts a pointer and a null pointer, an object pointer and a void
pointer, or two pointers to qualified or unqualified versions of
compatible types. The leading if statement in eqtree tests for just these three
combinations for the left and right operands, and the recursive call
repeats the test for the right and left operands, when appropriate.
(enode.c functions)+=
    Tree eqtree(op, l, r) int op; Tree l, r; {
        Type xty = l->type, yty = r->type;

        if (isptr(xty) && isnullptr(r)
        ||  isptr(xty) && !isfunc(xty->type) && isvoidptr(yty)
        ||  (xty and yty point to compatible types 195)) {
            Type ty = unsignedtype;
            (cast l and r to type ty 192)
            return simplify(op + U, inttype, l, r);
        }
        if (isptr(yty) && isnullptr(l)
        ||  isptr(yty) && !isfunc(yty->type) && isvoidptr(xty))
            return eqtree(op, r, l);
        return cmptree(op, l, r);
    }

(xty and yty point to compatible types 195)=
    (isptr(xty) && isptr(yty)
    && eqtype(unqual(xty->type), unqual(yty->type), 1))
The third argument of 1 to eqtype causes eqtype to permit its two type
arguments to be any combinations of compatible object or incomplete
types. Given the declaration
    int (*p)[10], (*q)[];
eqtype's third argument is what permits p == q but disallows p < q.

9.5 Assignments

The legality of an assignment expression, a function argument, a return
statement, or an initialization depends on the legality of an assignment
of an rvalue to the location denoted by an lvalue. assign(xty, e)
performs the necessary type-checking for any assignment. It checks the
legality of assigning the tree e to an lvalue that holds a value of type
xty, and returns xty if the assignment is legal or null if it's illegal. The
return value is also the type to which e must be converted before the
assignment is made.
(enode.c functions)+=
    Type assign(xty, e) Type xty; Tree e; {
        Type yty = unqual(e->type);

        xty = unqual(xty);
        if (isenum(xty))
            xty = xty->type;
        if (xty->size == 0 || yty->size == 0)
            return NULL;
        (assign 196)
    }

The body of assign tests the five constraints imposed on assignments by
the standard. The first two permit assignment of arithmetic and
structure types:
(assign 196)=
    if (isarith(xty) && isarith(yty)
    ||  isstruct(xty) && xty == yty)
        return xty;
The other three cases involve pointers. The null pointer may be assigned
to any pointer:
(assign 196)+=
    if (isptr(xty) && isnullptr(e))
        return xty;
Any pointer may be assigned to a void pointer or vice versa, provided the
type pointed to by the left pointer operand has at least all the qualifiers
carried by the type pointed to by the right operand:
(assign 196)+=
    if ((isvoidptr(xty) && isptr(yty)
    ||   isptr(xty) && isvoidptr(yty))
    && (*xty has all of *yty's qualifiers 196))
        return xty;

(*xty has all of *yty's qualifiers 196)=
    (  (isconst(xty->type) || !isconst(yty->type))
    && (isvolatile(xty->type) || !isvolatile(yty->type)))
A pointer can be assigned to another pointer if they both point to
compatible types and the lvalue has all the qualifiers of the rvalue, as above.
(assign 196)+=
    if ((xty and yty point to compatible types 195)
    && (*xty has all of *yty's qualifiers 196))
        return xty;
Finally, if none of the cases above apply, the assignment is an error, and
assign returns the null pointer:
(assign 196)+=
    return NULL;
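The qualifier test used by the pointer cases is a subset check. The sketch below encodes qualifiers as bits, which is an assumption made for illustration; lcc instead queries the referenced types with isconst and isvolatile. The constants Q_CONST and Q_VOLATILE and the function has_all_qualifiers are hypothetical.

```c
#include <assert.h>

enum { Q_CONST = 1, Q_VOLATILE = 2 };   /* hypothetical qualifier bits */

/* Nonzero if the type referenced by the left pointer carries every
   qualifier carried by the type referenced by the right pointer. */
int has_all_qualifiers(unsigned left, unsigned right) {
    assert((left | right) <= (unsigned)(Q_CONST | Q_VOLATILE));
    return (right & ~left) == 0;
}
```

With this encoding, assigning a char * to a const char * passes the test because the left side only adds a qualifier, while assigning a const char * to a char * fails because const would be dropped.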
assign is used in asgntree to build a tree for an assignment:
(enode.c functions)+=
    Tree asgntree(op, l, r) int op; Tree l, r; {
        Type aty, ty;

        r = pointer(r);
        ty = assign(l->type, r);
        if (ty)
            r = cast(r, ty);
        else {
            typeerror(ASGN, l, r);
            if (r->type == voidtype)
                r = retype(r, inttype);
            ty = r->type;
        }
        if (l->op != FIELD)
            l = lvalue(l);
        (asgntree 197)
        return tree(op + (isunsigned(ty) ? I : ttob(ty)),
            ty, l, r);
    }

When the assignment is illegal, assign returns null and asgntree must
choose a type for the result of the assignment. It uses the type of the
right operand, unless that type is void, in which case asgntree uses int.
This code exemplifies what's needed to recover from semantic errors so
that compilation can continue.
The body of asgntree, revealed by (asgntree), below, detects attempts
to change the value of a const location, changes the integral rvalue of
assignments to bit fields to meet the specifications of the standard, and
transforms some structure assignments to yield better code.
An lvalue denotes a const location if the type of its referent is qualified
by const or is a structure type that is const-qualified. A structure type
so qualified has its u.sym->u.s.cfields flag set.
(asgntree 197)=
    aty = l->type;
    if (isptr(aty))
        aty = unqual(aty)->type;
    if (isconst(aty)
    || isstruct(aty) && unqual(aty)->u.sym->u.s.cfields)
        if (isaddrop(l->op)
        && !l->u.sym->computed && !l->u.sym->generated)
            error("assignment to const identifier '%s'\n",
                l->u.sym->name);
        else
            error("assignment to const location\n");
aty is set to the type of the value addressed by the lvalue. The assign-
ment is illegal if aty has a const qualifier or if it's a structure type with
one or more const-qualified fields. The gymnastics for issuing the di-
agnostic are used to cope with lvalues that don't have source-program
names.
The result of an assignment is the value of the left operand, and the
type is the qualified version of the left operand. The cast at the begin-
ning of asgntree sets r to the correct tree and ty to the correct type
for r and ty to represent the result, so the result of ASGN is its right
operand. Unfortunately, this scheme doesn't work for bit fields. The re-
sult of an assignment to a bit field is the value that would be extracted
from the field after the assignment, which might differ from the value
represented by r. So, for assignments to bit fields that occupy less than
a full unsigned, asgntree must change r to a tree that computes just
this value.
(asgntree 197)+=
    if (l->op == FIELD) {
        int n = 8*l->u.field->type->size - fieldsize(l->u.field);
        if (n > 0 && isunsigned(l->u.field->type))
            r = bittree(BAND, r,
                consttree(fieldmask(l->u.field), unsignedtype));
        else if (n > 0) {
            if (r->op == CNST+I)
                r = consttree(r->u.v.i<<n, inttype);
            else
                r = shtree(LSH, r, consttree(n, inttype));
            r = shtree(RSH, r, consttree(n, inttype));
        }
    }

If the bit field is unsigned, the result is r with its excess most significant
bits discarded. If the bit field is signed and has m bits, bit m - 1 is
the sign bit and it must be used to sign-extend the value, which can be
done by arithmetically shifting r left to bring bit m - 1 into the sign bit,
and then shifting right by the same amount, dragging the sign bit along
in the process. For example, Figure 9.4 shows the trees assigned to r for
the two assignments in

    struct { int a:3; unsigned b:3; } x;
    x.a = e;
    x.b = e;

In the assignment x.a = e, r is assigned a tree that uses shifts to sign-
extend the rightmost 3 bits of e; for x.b = e, r is assigned a tree that
ANDs e with 7. If r is constant, the left shift is done explicitly to keep
the constant folder from shouting about overflow.
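The values these trees compute can be checked with a small sketch. It is an illustration under stated assumptions rather than lcc's code: the function names are hypothetical, width is assumed to satisfy 0 < width < the number of bits in an unsigned, and two's-complement representation is assumed. The signed version uses an XOR-and-subtract step in place of the shift pair, because shifting negative values is not portable in C.

```c
/* Value read back from an unsigned field of `width` bits after
   storing v: the excess most significant bits are discarded. */
unsigned store_unsigned_field(unsigned v, int width) {
    return v & ((1u << width) - 1u);
}

/* Value read back from a signed field of `width` bits after
   storing v: bit width-1 is treated as the sign bit. */
int store_signed_field(int v, int width) {
    unsigned mask = (1u << width) - 1u;
    unsigned sign = 1u << (width - 1);
    unsigned low  = (unsigned)v & mask;   /* keep the field's bits */
    return (int)(low ^ sign) - (int)sign; /* sign-extend them */
}
```

For the 3-bit fields above, storing 5 into the signed field yields -3 when read back, and storing 13 into the unsigned field yields 5.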
    x.a = e:   (RSH+I (LSH+I e (CNST+I 29)) (CNST+I 29))
    x.b = e:   (BAND+U e (CNST+U 7))

FIGURE 9.4 Trees for the results of x.a = e and x.b = e.

Back ends typically generate block moves for structure assignments.
The job of generating good code for these assignments falls mostly on
back ends, but there is an opportunity to reduce the number of useless
block moves, and it's similar to the optimization done by call for struc-
ture arguments described on page 190. In x = f(), where f returns a
structure, a temporary is generated in the caller to hold f's return value,
and the temporary is copied to x after the call returns. The left side of
Figure 9.5 shows the resulting tree; x stands for the tree for x. This copy
can be avoided by using x in place of the temporary.
This improvement can be made when x addresses a location directly
and there's a temporary that holds the value returned by f:
(asgntree 197)+=
    if (isstruct(ty) && isaddrop(l->op) && iscallb(r))
        return tree(RIGHT, ty,
            tree(CALL+B, ty, r->kids[0]->kids[0], l),
            idtree(l->u.sym));

The right side of Figure 9.5 shows the tree returned by this transforma-
tion.

    Left:    (ASGN+B x
                 (RIGHT (CALL+B (ADDRG+P f) (ADDRL+P t1))
                        (INDIR+B (ADDRL+P t1))))
    Right:   (RIGHT (CALL+B (ADDRG+P f) x)
                    (INDIR+B x))

FIGURE 9.5 Trees for x = f().


9.6 Conditionals
The complex semantics of the conditional expression combines parts of
the semantics of comparisons, of the binary operators, of assignment,
and of casts. The COND operator is the only one that takes three operands:
The expression e ? l : r yields the tree shown in Figure 9.6, which is
built by condtree:
(enode.c functions)+=
    Tree condtree(e, l, r) Tree e, l, r; {
        Symbol t1;
        Type ty, xty = l->type, yty = r->type;
        Tree p;

        (condtree 200)
        p = tree(COND, ty, cond(e),
            tree(RIGHT, ty, root(l), root(r)));
        p->u.sym = t1;
        return p;
    }

t1, carried in the u.sym field of a COND tree, is a temporary that holds
the result of the conditional expression at runtime. t1 is omitted if the
result is void.
The call cond(e) in the code above type-checks the first operand,
which must have a scalar type. There are six legal combinations for the
types of second and third operands. The three easy cases are when both
have arithmetic types, both have compatible structure types, or both have
void type. All three of these cases are covered by the two if statements:
(condtree 200)=
    if (isarith(xty) && isarith(yty))
        ty = binary(xty, yty);
    else if (eqtype(xty, yty, 1))
        ty = unqual(xty);

    (COND e
        (RIGHT (ASGN (ADDRL+P t1) l)
               (ASGN (ADDRL+P t1) r)))      with t1 in the COND's u.sym

FIGURE 9.6 Tree for e ? l : r.


The first if statement handles the arithmetic types, and the second han-
dles structure types and void.
The remaining three cases involve pointers. If one of the operands is
a null pointer and the other is a pointer, the resulting type is the nonnull
pointer type:
(condtree 200)+=
    else if (isptr(xty) && isnullptr(r))
        ty = xty;
    else if (isnullptr(l) && isptr(yty))
        ty = yty;
If one of the operands is a void pointer and the other is a pointer to an
object or incomplete type, the result type is the void pointer:
(condtree 200)+=
    else if (isptr(xty) && !isfunc(xty->type) && isvoidptr(yty)
    ||       isptr(yty) && !isfunc(yty->type) && isvoidptr(xty))
        ty = voidptype;
If both operands are pointers to qualified or unqualified versions of com-
patible types, either can serve as the result type:
(condtree 200)+=
    else if ((xty and yty point to compatible types 195))
        ty = xty;
    else {
        typeerror(COND, l, r);
        return consttree(0, inttype);
    }
60 isptr
The type-checking code above ignores qualifiers on pointers to quali-
fied types. The resulting pointer type, however, must include all of the
qualifiers of the referents of both operand types; so if ty is a pointer,
it's rebuilt with the appropriate qualifiers:
(condtree 200)+=
    if (isptr(ty)) {
        ty = unqual(unqual(ty)->type);
        if (isptr(xty) && isconst(unqual(xty)->type)
        ||  isptr(yty) && isconst(unqual(yty)->type))
            ty = qual(CONST, ty);
        if (isptr(xty) && isvolatile(unqual(xty)->type)
        ||  isptr(yty) && isvolatile(unqual(yty)->type))
            ty = qual(VOLATILE, ty);
        ty = ptr(ty);
    }
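The effect of merging qualifiers can be seen from the source-language side. The following sketch is ordinary C, not part of lcc; the names ci, vi, and pick are hypothetical. The second and third operands point to differently qualified versions of int, so the conditional expression has a pointer type whose referent carries both qualifiers.

```c
const int ci = 1;       /* referent qualified by const */
volatile int vi = 2;    /* referent qualified by volatile */

/* The operands have types "const int *" and "volatile int *";
   the result type is "const volatile int *", which is why p must
   be declared with both qualifiers. */
int pick(int c) {
    const volatile int *p = c ? &ci : &vi;
    return *p;
}
```

Declaring p as plain int * here would discard qualifiers and draw a diagnostic from a conforming compiler.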

If the conditional, e, is a constant, the result of the conditional expres-
sion is one of the other operands:
(condtree 200)+=
    if (e->op == CNST+D || e->op == CNST+F) {
        e = cast(e, doubletype);
        return retype(e->u.v.d != 0.0 ? l : r, ty);
    }
    if (generic(e->op) == CNST) {
        e = cast(e, unsignedtype);
        return retype(e->u.v.u ? l : r, ty);
    }

This constant folding is not just an optimization; it's mandatory because
conditional expressions can be used in contexts that require constant
expressions.
Finally, if the result type isn't void, the temporary is generated and l
and r are changed to assignments to that temporary:
(condtree 200)+=
    if (ty != voidtype && ty->size > 0) {
        t1 = temporary(REGISTER, unqual(ty), level);
        l = asgn(t1, l);
        r = asgn(t1, r);
    } else
        t1 = NULL;

9.7 Constant Folding
Constant expressions are permitted wherever a constant is required. Ar-
ray sizes, case labels, bit-field widths, and initializations are examples.

    constant-expression: conditional-expression

Constant expressions are parsed by
(simp.c functions)=
    Tree constexpr(tok) int tok; {
        Tree p;

        needconst++;
        p = expr1(tok);
        needconst--;
        return p;
    }

(simp.c data)=
    int needconst;
expr1 parses assignment-expressions. Technically, constexpr should
call expr2, which parses conditional-expressions, but legal assignments
are never constants, and will always cause semantic errors. Calling expr1
handles syntax errors more gracefully because expr1 consumes an en-
tire assignment and thus avoids multiple diagnostics from the cascading
syntax errors. Callers to constexpr report an error if the tree returned
is not a CNST tree and if it's used in a context that requires a constant.
An example is intexpr, which parses integer constant expressions:
(simp.c functions)+=
    int intexpr(tok, n) int tok, n; {
        Tree p = constexpr(tok);

        needconst++;
        if (generic(p->op) == CNST && isint(p->type))
            n = cast(p, inttype)->u.v.i;
        else
            error("integer expression must be constant\n");
        needconst--;
        return n;
    }

needconst is a global variable that controls the constant folding done
by simplify, as detailed below. If it's nonzero, simplify warns about
constant expressions that overflow and folds them anyway. Otherwise,
it doesn't fold them.
Constant folding is not simply an optimization. The standard makes it
required by defining constructs in which the value of a constant expres-
sion must be computed during compilation. Array sizes and bit-field
widths are examples. simplify returns the tree specified by its argu-
ments, which are the same as tree's:
(simp.c functions)+=
    Tree simplify(op, ty, l, r) int op; Type ty; Tree l, r; {
        int n;
        Tree p;

        if (optype(op) == 0)
            op += ttob(ty);
        switch (op) {
        (simplify cases 204)
        }
        return tree(op, ty, l, r);
    }

simplify does three things that tree does not: it forms a type-specific
operator if it's passed a generic one, it evaluates operators when both
operands are constants, and it transforms some trees into simpler ones
that yield better code as it constructs the tree requested.
Each of the cases in the body of simplify's switch statement han-
dles one type-specific operator. If the operands are both constants, the
code builds and returns a CNST tree for the resulting value; otherwise, it
breaks to the end of the switch statement, which builds and returns the
appropriate tree. The code that checks for constant operands and builds
the resulting CNST tree is almost the same for every case; only the type
suffix, Value field name, operator, and return type vary in each case, so
the code is buried in a set of macros. The case for unsigned addition is
typical:
(simplify cases 204)=
    case ADD+U:
        foldcnst(U,u,+,unsignedtype);
        commute(r,l);
        break;

This case implements the transformation

    (ADD+U (CNST+U c1) (CNST+U c2)) => (CNST+U c1 + c2)

This use of foldcnst checks whether both operands are CNST+U trees,
and, if so, returns a new CNST+U tree whose u.v.u field is the sum of
l->u.v.u and r->u.v.u:
(simp.c macros)=
    #define foldcnst(TYPE,VAR,OP,RTYPE) \
        if (l->op == CNST+TYPE && r->op == CNST+TYPE) {\
            p = tree(CNST+ttob(RTYPE), RTYPE, NULL, NULL);\
            p->u.v.VAR = l->u.v.VAR OP r->u.v.VAR;\
            return p; }
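The folding pattern that foldcnst expands to can be sketched without the macro machinery. This is a minimal illustration under stated assumptions: Node, mknode, add_u, and the two operator codes are hypothetical stand-ins for lcc's Tree, tree, and operator encoding, and only unsigned addition is handled.

```c
#include <stdlib.h>

/* Hypothetical miniature tree; lcc's Tree carries more fields. */
enum { CNST_U, ADD_U };
typedef struct node {
    int op;
    unsigned value;          /* meaningful only when op == CNST_U */
    struct node *kids[2];
} Node;

/* Allocate and fill in one node. */
Node *mknode(int op, unsigned value, Node *l, Node *r) {
    Node *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->op = op; p->value = value;
    p->kids[0] = l; p->kids[1] = r;
    return p;
}

/* Build an ADD_U node, folding it into a constant when both
   operands are constants; unsigned addition cannot overflow. */
Node *add_u(Node *l, Node *r) {
    if (l->op == CNST_U && r->op == CNST_U)
        return mknode(CNST_U, l->value + r->value, NULL, NULL);
    return mknode(ADD_U, 0, l, r);
}
```

As in simplify, the folding happens while the tree is being constructed, so a tree of constant additions never exists in unfolded form.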
For commutative operators, commute ensures that if one of the operands
is a constant, it's the one given as commute's first argument. This trans-
formation reduces the case analyses that back ends must perform, allow-
ing back ends to count on constant operands of commutative operators
being in specific sites.
(simp.c macros)+=
    #define commute(L,R) \
        if (generic(R->op) == CNST && generic(L->op) != CNST) {\
            Tree t = L; L = R; R = t; }

commute swaps its arguments, if necessary, to make L refer to the con-
stant operand. For example, the commute(r,l) in the case for ADD+U
above ensures that if one of the operands is a constant, r refers to that
operand. This transformation also makes some of simplify's transfor-
mations easier, as shown below.
Unsigned addition is easy because the standard dictates that unsigned
operators do not overflow. Signed operations, however, must cope with
overflow. For example, if the operands to ADD+I are constants, but
their sum overflows, the expression is not a constant expression un-
less it's used in a context that demands one. The signed operators use
xfoldcnst, which is like foldcnst, but also checks for overflow.
(simplify cases 204)+=
    case ADD+I:
        xfoldcnst(I,i,+,inttype,add,INT_MIN,INT_MAX,needconst);
        commute(r,l);
        break;

This case implements the transformation

    (ADD+I (CNST+I c1) (CNST+I c2)) => (CNST+I c1 + c2)

but only if c1 + c2 doesn't overflow or if needconst is 1. xfoldcnst
has four additional arguments: a function, the minimum and maximum
allowable values for the result, and a flag that is nonzero if a constant is
required.
(simp.c macros)+=
    #define xfoldcnst(TYPE,VAR,OP,RTYPE,FUNC,MIN,MAX,needconst)\
        if (l->op == CNST+TYPE && r->op == CNST+TYPE\
        && FUNC((double)l->u.v.VAR,(double)r->u.v.VAR,\
            (double)MIN,(double)MAX, needconst)) {\
            p = tree(CNST+ttob(RTYPE), RTYPE, NULL, NULL);\
            p->u.v.VAR = l->u.v.VAR OP r->u.v.VAR;\
            return p; }
The function takes doubles because lcc assumes that a double has
enough bits in its significand to represent all of the integral types. Test-
ing for constant operands and building the resulting CNST tree are iden-
tical to the code in foldcnst, but the function is called to check the
validity of the operation; it returns zero if the operation will overflow,
and one otherwise. The function is passed the values, the minimum and
maximum, and the flag. All but the flag are converted to doubles. For
integer addition, the test for overflow is simple; x + y overflows if it is
less than INT_MIN or greater than INT_MAX, where INT_MIN and INT_MAX
are the ANSI values for the smallest and largest signed integers. The
function, add, handles all the types, so it must not compute x + y be-
cause the addition might overflow. Instead, it tests the conditions under
which overflow will occur:
(simp.c functions)+=
    static int add(x, y, min, max, needconst)
    double x, y, min, max; int needconst; {
        int cond = x == 0 || y == 0
        || x < 0 && y < 0 && x >= min - y
        || x < 0 && y > 0
        || x > 0 && y < 0
        || x > 0 && y > 0 && x <= max - y;
        if (!cond && needconst) {
            warning("overflow in constant expression\n");
            cond = 1;
        }
        return cond;
    }

As shown, needconst forces add to return 1 after issuing a warning. sub,
mul, and div are similar.
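The range test at the heart of add can be exercised in isolation. The following is a sketch with ANSI prototypes, not lcc's function: add_fits and int_add_fits are hypothetical names, the needconst flag and warning are left out, and, as in lcc, a double is assumed to represent every int value exactly.

```c
#include <limits.h>

/* Returns 1 if x + y lies in [min, max] and 0 otherwise, testing
   each sign combination separately as add does. */
int add_fits(double x, double y, double min, double max) {
    if (x == 0 || y == 0)
        return 1;
    if (x < 0 && y < 0)
        return x >= min - y;   /* sum could fall below min */
    if (x > 0 && y > 0)
        return x <= max - y;   /* sum could rise above max */
    return 1;                  /* opposite signs cannot overflow */
}

/* Convenience wrapper for int operands. */
int int_add_fits(int x, int y) {
    return add_fits(x, y, INT_MIN, INT_MAX);
}
```

For example, int_add_fits(INT_MAX, 1) is 0 and int_add_fits(INT_MAX, -1) is 1, so a folder built on this test would leave INT_MAX + 1 unfolded unless a constant is required.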
The conversions also divide into those that must check for overflow
and those that can ignore it. Conversions from a smaller to a larger
type, between unsigned types, between unsigned and pointer types, and
from integer to unsigned can ignore overflow. The conversions below
exemplify these four cases. They implement transformations like

    (CVC+I (CNST+C c)) => (CNST+I c')

where c' is the possibly sign-extended value of c. For the unsigned con-
versions, c' = c.
(simplify cases 204)+=
    case CVC+I:
        cvtcnst(C,inttype, p->u.v.i =
            (l->u.v.sc&0200 ? (~0<<8) : 0)|(l->u.v.sc&0377));
        break;
    case CVU+S:
        cvtcnst(U,unsignedshort,p->u.v.us = l->u.v.u); break;
    case CVP+U:
        cvtcnst(P,unsignedtype, p->u.v.u = (unsigned)l->u.v.p);
        break;
    case CVI+U:
        cvtcnst(I,unsignedtype, p->u.v.u = l->u.v.i); break;

(simp.c macros)+=
    #define cvtcnst(FTYPE,TTYPE,EXPR) \
        if (l->op == CNST+FTYPE) {\
            p = tree(CNST+ttob(TTYPE), TTYPE, NULL, NULL);\
            EXPR;\
            return p; }
The assignment in the CVC+I case must sign-extend the sign bit of the
character operand manually, because the compiler cannot count on chars
being signed when it's compiled by another C compiler. It's tempting to
replace the assignment passed to cvtcnst by something like

    ((int)l->u.v.sc<<(8*sizeof(int) - 8))>>(8*sizeof(int) - 8)

but whether or not >> replicates the sign bit depends on the compiler
that compiles lcc.
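Manual sign extension that depends on neither the signedness of char nor the behavior of >> can be sketched as follows. This is an arithmetic variant of the idea, not the expression lcc uses; sign_extend_8 is a hypothetical name, and 8-bit characters in two's complement are assumed.

```c
/* Interpret the low 8 bits of v as a two's-complement signed
   value: bit 7 (octal 0200) is the sign bit, so values with it
   set lie 256 below their unsigned reading. */
int sign_extend_8(unsigned v) {
    int low = (int)(v & 0377u);
    return (v & 0200u) ? low - 256 : low;
}
```

The function uses only masking and subtraction, so its result is the same regardless of which compiler builds it.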
The four conversions from larger to smaller types must check for over-
flow. They implement transformations like

    (CVI+C (CNST+I c)) => (CNST+C c)

if c fits in the smaller type or if needconst is one.
(simplify cases 204)+=
    case CVI+C:
        xcvtcnst(I, chartype,l->u.v.i,SCHAR_MIN,SCHAR_MAX,
            p->u.v.sc = l->u.v.i); break;
    case CVD+F:
        xcvtcnst(D, floattype,l->u.v.d, -FLT_MAX,FLT_MAX,
            p->u.v.f = l->u.v.d); break;
    case CVD+I:
        xcvtcnst(D, inttype,l->u.v.d, INT_MIN,INT_MAX,
            p->u.v.i = l->u.v.d); break;
    case CVI+S:
        xcvtcnst(I,shorttype,l->u.v.i, SHRT_MIN,SHRT_MAX,
            p->u.v.ss = l->u.v.i); break;

(simp.c macros)+=
    #define xcvtcnst(FTYPE,TTYPE,VAR,MIN,MAX,EXPR) \
        if (l->op == CNST+FTYPE) {\
            if (needconst && (VAR < MIN || VAR > MAX))\
                warning("overflow in constant expression\n");\
            if (needconst || VAR >= MIN && VAR <= MAX) {\
                p = tree(CNST+ttob(TTYPE), TTYPE, NULL, NULL);\
                EXPR;\
                return p; } }
In addition to evaluating constant expressions, simplify transforms
the trees for some operators to help generate better code. Some of these
transformations remove identities and other simple cases. For example:
(simplify cases 204)+=
    case BAND+U:
        foldcnst(U,u,&,unsignedtype);
        commute(r,l);
        identity(r,l,U,u,(~(unsigned)0));
        if (r->op == CNST+U && r->u.v.u == 0)
            return tree(RIGHT, unsignedtype, root(l),
                consttree(0, unsignedtype));
        break;

(simp.c macros)+=
    #define identity(X,Y,TYPE,VAR,VAL) \
        if (X->op == CNST+TYPE && X->u.v.VAR == VAL)\
            return Y

The use of identity and the if statement that follows implement the
transformations

    (BAND+U e (CNST+U ~0)) => e
    (BAND+U e (CNST+U 0))  => (e, (CNST+U 0))

In the second case, e cannot be discarded because it might have side
effects. commute(r,l) makes it necessary to check only if r is a constant.
simplify also implements strength reduction for some operators. This
transformation replaces an operator by a less expensive one that com-
putes the same value. For example, an unsigned multiplication by a
power of two can be replaced by a left shift:

    (MUL+U (CNST+U 2^k) e) => (LSH+U e (CNST+I k))

The code also uses foldcnst to check for constant operands.
(simplify cases 204)+=
    case MUL+U:
        commute(l,r);
        if (l->op == CNST+U && (n = ispow2(l->u.v.u)) != 0)
            return simplify(LSH+U, unsignedtype, r,
                consttree(n, inttype));
        foldcnst(U,u,*,unsignedtype);
        break;

ispow2(u) returns k if u is equal to 2^k for k > 0.
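The body of ispow2 is not shown in this section, so the following is only one possible sketch that matches the stated behavior; it is named ispow2_sketch to mark it as hypothetical, and it assumes that 0 is returned when u is not such a power of two, which is what the != 0 test in the MUL+U case suggests.

```c
/* Returns k if u == 2^k for some k > 0, and 0 otherwise. */
int ispow2_sketch(unsigned u) {
    int k = 0;

    /* u & (u - 1) clears the lowest set bit, so it is zero
       exactly when u has a single bit set. */
    if (u < 2 || (u & (u - 1)) != 0)
        return 0;
    while (u > 1) {     /* count the position of that bit */
        u >>= 1;
        k++;
    }
    return k;
}
```

With this definition, ispow2_sketch(8) is 3, so 5u * 8u and 5u << 3 denote the same value, which is the equivalence the MUL+U case relies on.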
Bit fields are often tested by expressions such as p->x != 0, which
leads to a NE tree with FIELD and CNST trees as operands. Extracting the
bit field, which involves shifting and masking in general, and testing it
can be easily replaced by simpler code that fetches the word that con-
tains the field, ANDs it with a properly positioned bit mask, and tests the
outcome:
(simplify cases 204)+=
    case NE+I:
        cfoldcnst(I,i,!=,inttype);
        commute(r,l);
        zerofield(NE,I,i);
        break;

(simp.c macros)+=
    #define zerofield(OP,TYPE,VAR) \
        if (l->op == FIELD\
        && r->op == CNST+TYPE && r->u.v.VAR == 0)\
            return eqtree(OP, bittree(BAND, l->kids[0],\
                consttree(\
                    fieldmask(l->u.field)<<fieldright(l->u.field),\
                    unsignedtype)), r);

This case implements the transformation

    (NE+I (FIELD e) (CNST+I 0)) =>
        (NE+I (BAND+U e (CNST+U M)) (CNST+I 0))

where M is a mask of s bits shifted m bits left, and s is the size of the
bit field that lies m bits from the least significant end of the unsigned
or integer in which it appears. cfoldcnst is a version of foldcnst that's
specialized for the relational operators:
(simp.c macros)+=
    #define cfoldcnst(TYPE,VAR,OP,RTYPE) \
        if (l->op == CNST+TYPE && r->op == CNST+TYPE) {\
            p = tree(CNST+ttob(RTYPE), RTYPE, NULL, NULL);\
            p->u.v.i = l->u.v.VAR OP r->u.v.VAR;\
            return p; }
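The mask M used by the zerofield transformation can be illustrated directly. This sketch is not lcc's code: positioned_mask and field_is_nonzero are hypothetical names standing in for the fieldmask and fieldright computation, and size is assumed to satisfy 0 < size < the number of bits in an unsigned.

```c
/* Mask of `size` one bits whose low end lies `right` bits from
   the least significant end of the word. */
unsigned positioned_mask(int size, int right) {
    return ((1u << size) - 1u) << right;
}

/* Tests a field for a nonzero value without extracting it:
   AND the containing word with the positioned mask and compare
   the outcome with zero. */
int field_is_nonzero(unsigned word, int size, int right) {
    return (word & positioned_mask(size, right)) != 0;
}
```

For a 3-bit field that starts 4 bits from the low end, the mask is 0x70, and the test needs no shift at runtime because the mask is a compile-time constant.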
195 eqtree
Pointer addition is the most interesting and complex case in simplify
because it implements many transformations that yield better code. Gen-
erating efficient addressing is the linchpin of generating efficient code,
so effort in this case pays off on all targets. The easy cases handle con-
stants and identities:
(simplify cases 204)+=
    case ADD+P:
        foldaddp(l,r,I,i);
        foldaddp(l,r,U,u);
        foldaddp(r,l,I,i);
        foldaddp(r,l,U,u);
        commute(r,l);
        identity(r,retype(l,ty),I,i,0);
        identity(r,retype(l,ty),U,u,0);
        (ADD+P transformations 210)
        break;

(simp.c macros)+=
    #define foldaddp(L,R,RTYPE,VAR) \
        if (L->op == CNST+P && R->op == CNST+RTYPE) {\
            p = tree(CNST+P, ty, NULL, NULL);\
            p->u.v.p = (char *)L->u.v.p + R->u.v.VAR;\
            return p; }

Four uses of foldaddp are required because of the asymmetry of ADD+P's
operands: one is a pointer and the other is an integer or an unsigned.
These uses of foldaddp implement transformations like

    (ADD+P (CNST+P c1) (CNST+I c2)) => (CNST+P c1 + c2)

The uses of identity implement

    (ADD+P e (CNST+I 0)) => e

and the similar transformation for unsigned constants.
The remaining transformations of ADD+P trees either produce simpler
and thus better trees or feed another transformation. The transfor-
mation
(ADD+P transformations 210)=
    if (isaddrop(l->op)
    && (r->op == CNST+I || r->op == CNST+U))
        return addrtree(l, cast(r, inttype)->u.v.i, ty);

eliminates indexed addressing of a known location by a constant, which
occurs in array references such as a[5] and in field references such as
x.name. These expressions yield trees of the form

    (ADD+P n (CNST+x c))

where n denotes a tree for the address of an identifier, x is U or I, and c
is a constant. This tree can be transformed to n', a tree for an identifier
that is bound to the addressed location. addrtree creates a new identifier
whose address is the location addressed by l plus the constant offset,
and builds a tree for the address of this identifier.
scope 37
(simp.c functions)+=
    static Tree addrtree(e, n, ty) Tree e; int n; Type ty; {
        Symbol p = e->u.sym, q;

        NEW0(q, FUNC);
        q->name = stringd(genlabel(1));
        q->sclass = p->sclass;
        q->scope = p->scope;
        q->type = ty;
        q->temporary = p->temporary;
        q->generated = p->generated;
        q->addressed = p->addressed;
        q->computed = 1;
        q->defined = 1;
        q->ref = 1;
        (announce q 211)
        e = tree(e->op, ty, NULL, NULL);
        e->u.sym = q;
        return e;
    }

(symbol flags 50)+=
    unsigned computed:1;
As for other identifiers, the front end must announce this new identi-
fier to the back end. Since its address is based on the address of another
identifier, represented by p, it's announced by calling the interface func-
tion address, and its computed flag identifies it as a symbol based on
another symbol. But there's a phase problem: p must be announced be-
fore q, but if p is a local or a parameter, it has not yet been passed to
the back end via local or function. addrtree thus calls address only
for globals and statics, and delays the call for locals and parameters:
(announce q 211)=
    if (p->scope == GLOBAL
    || p->sclass == STATIC || p->sclass == EXTERN) {
        if (p->sclass == AUTO)
            q->sclass = STATIC;
        (*IR->address)(q, p, n);
    } else {
        Code cp;
        addlocal(p);
        cp = code(Address);
        cp->u.addr.sym = q;
        cp->u.addr.base = p;
        cp->u.addr.offset = n;
    }

The code-list entry Address is described in Section 10.1. lcc can't delay
the call to address for globals and statics because expressions like &a[5]
are constants and can appear in, for example, initializers.
179 isaddrop
The next transformation improves expressions like b[i].name, which
yields a tree of the form (ADD+P (ADD+P i n) (CNST+x c)), where i is
a tree for an integer expression and n and c are defined above. This tree
can be transformed into (ADD+P i (ADD+P n (CNST+x c))) and the in-
ner ADD+P tree will be collapsed to a simple address by the transforma-
tion above to yield (ADD+P i n').
(ADD+P transformations 210)+=
    if (l->op == ADD+P && isaddrop(l->kids[1]->op)
    && (r->op == CNST+I || r->op == CNST+U))
        return simplify(ADD+P, ty, l->kids[0],
            addrtree(l->kids[1], cast(r, inttype)->u.v.i, ty));

Technically, this transformation is safe only when (i + n) + c is equal to
i + (n + c), which is known only at runtime, but the standard permits
these kinds of rearrangements to be made at compile time.
Similarly, the tree (ADD+P (ADD+I i (CNST+x c)) n) can be trans-
formed to (ADD+P i n'); this transformation also applies if SUB+I ap-
pears in place of the ADD+I:
(ADD+P transformations 210)+=
    if ((l->op == ADD+I || l->op == SUB+I)
    && l->kids[1]->op == CNST+I && isaddrop(r->op))
        return simplify(ADD+P, ty, l->kids[0],
            simplify(generic(l->op)+P, ty, r, l->kids[1]));

The following cases combine constants and implement the transfor-
mations

    (ADD+P (ADD+P x (CNST c1)) (CNST c2)) =>
        (ADD+P x (CNST c1 + c2))
    (ADD+P (ADD+I x (CNST c1)) (ADD+P y (CNST c2))) =>
        (ADD+P x (ADD+P y (CNST c1 + c2)))

These transformations trigger others when x or y are identifier trees.
(ADD+P transformations 210)+=
    if (l->op == ADD+P && generic(l->kids[1]->op) == CNST
    && generic(r->op) == CNST)
        return simplify(ADD+P, ty, l->kids[0],
            (*optree['+'])(ADD, l->kids[1], r));
    if (l->op == ADD+I && generic(l->kids[1]->op) == CNST
    && r->op == ADD+P && generic(r->kids[1]->op) == CNST)
        return simplify(ADD+P, ty, l->kids[0],
            simplify(ADD+P, ty, r->kids[0],
                (*optree['+'])(ADD, l->kids[1], r->kids[1])));

The last transformation reaches into RIGHT trees to apply ADD+P trans-
formations to their operands.
(ADD+P transformations 210)+=
    if (l->op == RIGHT && l->kids[1])
        return tree(RIGHT, ty, l->kids[0],
            simplify(ADD+P, ty, l->kids[1], r));
    else if (l->op == RIGHT && l->kids[0])
        return tree(RIGHT, ty,
            simplify(ADD+P, ty, l->kids[0], r), NULL);

These tests implement

    (ADD+P (RIGHT x y) e) => (RIGHT x (ADD+P y e))
    (ADD+P (RIGHT x) e)   => (RIGHT (ADD+P x e))

The first test applies to trees formed by expressions such as f().x; the
call returns a temporary, so referencing a field of that temporary will
benefit from the first ADD+P transformation described above. The second
test applies to expressions that are wrapped in a RIGHT as the result of
a conversion.
Table 9.2 lists the remaining transformations in (simplify cases).

    (AND+I (CNST+I 0) e)              => (CNST+I 0)
    (AND+I (CNST+I 1) e)              => e
    (OR+I (CNST+I 0) e)               => e
    (OR+I (CNST+I c) e), c ≠ 0        => (CNST+I 1)
    (BCOM+U (BCOM+U e))               => e
    (BOR+U (CNST+U 0) e)              => e
    (BXOR+U (CNST+U 0) e)             => e
    (DIV+I e (CNST+I 1))              => e
    (DIV+U e (CNST+U c)), c = 2^k     => (RSH+U e (CNST+I k))
    (GE+U e (CNST+U 0))               => (e, (CNST+I 1))
    (GE+U (CNST+U 0) e)               => (EQ+I e (CNST+I 0))
    (GT+U (CNST+U 0) e)               => (e, (CNST+I 0))
    (GT+U e (CNST+U 0))               => (NE+I e (CNST+I 0))
    (LE+U (CNST+U 0) e)               => (e, (CNST+I 1))
    (LE+U e (CNST+U 0))               => (EQ+I e (CNST+I 0))
    (LT+U e (CNST+U 0))               => (e, (CNST+I 0))
    (LT+U (CNST+U 0) e)               => (NE+I e (CNST+I 0))
    (LSH+I e (CNST+I 0))              => e
    (LSH+U e (CNST+I 0))              => e
    (MOD+I e (CNST+I 1))              => (e, (CNST+I 0))
    (MOD+U e (CNST+I c)), c = 2^k     => (BAND+U e (CNST+U c - 1))
    (MUL+I (CNST+I c1) (ADD+I e (CNST+I c2))) =>
        (ADD+I (MUL+I (CNST+I c1) e) (CNST+I c1 × c2))
    (MUL+I (CNST+I c1) (SUB+I e (CNST+I c2))) =>
        (SUB+I (MUL+I (CNST+I c1) e) (CNST+I c1 × c2))
    (MUL+I (CNST+I c) e), c = 2^k     => (LSH+I e (CNST+I k))
    (NEG+D (NEG+D e))                 => e
    (NEG+F (NEG+F e))                 => e
    (NEG+I (NEG+I e)), e ≠ (CNST+I INT_MIN) => e
    (RSH+I e (CNST+I 0))              => e
    (RSH+U e (CNST+I 0))              => e
    (SUB+P e (CNST+I c))              => (ADD+P e (CNST+I -c))
    (SUB+P e (CNST+U c))              => (ADD+P e (CNST+U -c))
    (SUB+P e1 (ADD+I e2 (CNST+I c)))  =>
        (SUB+P (SUB+P e1 (CNST+I c)) e2)

TABLE 9.2 Remaining simplify transformations.
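Several of the unsigned strength reductions in Table 9.2 rest on simple arithmetic identities, which can be checked with a short sketch. The function names below are hypothetical and the code is not part of lcc; k is assumed to satisfy 0 <= k < the number of bits in an unsigned.

```c
/* Each function returns 1 when the rewritten form computes the
   same value as the original unsigned operation with c = 2^k. */

/* (DIV+U e c) versus (RSH+U e k) */
int div_matches_shift(unsigned e, int k) {
    return e / (1u << k) == e >> k;
}

/* (MOD+U e c) versus (BAND+U e c-1) */
int mod_matches_mask(unsigned e, int k) {
    return e % (1u << k) == (e & ((1u << k) - 1u));
}

/* (MUL+U c e) versus (LSH+U e k); both wrap modulo 2^N */
int mul_matches_shift(unsigned e, int k) {
    return e * (1u << k) == e << k;
}
```

The identities hold for unsigned operands because unsigned arithmetic in C is defined modulo a power of two; the signed rows of the table need the extra conditions the table lists.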



Further Reading
lcc's approach to type checking is similar to the one outlined in Chap-
ter 6 of Aho, Sethi, and Ullman (1986). simplify's transformations are
similar to those described by Hanson (1983). Similar transformations
can be done, often more thoroughly, by other kinds of optimizations
or during code generation, but usually at additional cost. simplify im-
plements only those that are likely to benefit almost all programs. A
more systematic approach is necessary to do a more thorough job; see
Exercise 9.8.

Exercises
9.1 Implement Type super(Type ty), which is shown in Figure 9.1.
Don't forget about enumerations and the types long, unsigned long,
and long double.
9.2 How can a double be converted to an unsigned using only double-to-
signed integer conversion? Use your solution to implement cast's
fragment (double-to-unsigned conversion).
9.3 In lcc, all enumeration types are represented by integers because
that's what most other C compilers do, but the standard permits
each enumeration type to be represented by any of the integral
types, as long as the type chosen can hold all the values. For exam-
ple, unsigned characters could be used for enumeration types with
enumeration values in the range 0-255. Explain how cast must
be changed to accommodate this scheme. Earlier versions of lcc
implemented this scheme.
9.4 Implement the omitted fragments for unary and postfix.
9.5 Dereferencing null pointers is a common programming error in C
programs. lcc's -n option catches these errors. With -n, lcc gen-
erates code for
static char *_YYfile = "file";
static void _YYnull(int line) {
char buf[200];
sprintf(buf,"null pointer dereferenced @%s:%d\\n",
_YYfile, line);
write(2, buf, strlen(buf));
abort();
}

at the end of each source file; file is the name of the source file.
It also arranges for its global YYnull to point to the symbol-table
entry for the function _YYnull. Whenever it builds a tree to derefer-
ence a pointer p, if YYnull is nonnull, it calls nullcheck to build
a tree that is equivalent to ((t1 = p) || _YYnull(lineno), t1),
where t1 is a temporary, and lineno is a constant that gives the
source-code line at which the dereference appears and is the value
of the global lineno. Thus, attempts to dereference a null pointer
at runtime result in calls to _YYnull. Implement nullcheck.
9.6 bittree builds the trees for & | ^ %, multree for * /, shtree for
<< >>, and subtree for binary -. Implement these functions. The
pointer subtraction code in subtree and the code in bittree for the
modulus operator % are the most subtle. subtree takes about 25
lines, and the others each take less than 20.
9.7 Given the following file-scope declarations, what ADD+P tree is built
for the expression x[10].table[i].count? Don't forget to apply
simplify's transformations.

int i;
struct list {
    char *name;
    struct entry {
        int age;
        int count;
    } table[10];
} x[100];

9.8 simplify uses ad hoc techniques to implement constant folding,
and it implements only some of the transformations possible. Ex-
plore the possibilities of using lburg, which is described in Chap-
ter 14, to implement constant folding and a complete set of trans-
formations.
10
Statements

The syntax of C statements is


statement:
    ID : statement
    case constant-expression : statement
    default : statement
    [ expression ] ;
    if '(' expression ')' statement
    if '(' expression ')' statement else statement
    switch '(' expression ')' statement
    while '(' expression ')' statement
    do statement while '(' expression ')' ;
    for '(' [ expression ] ; [ expression ] ; [ expression ] ')'
        statement
    break ;
    continue ;
    goto ID ;
    return [ expression ] ;
    compound-statement
compound-statement:
    '{' { declaration } { statement } '}'
compound-statement is implemented in Section 11.7. Some languages,
such as Pascal, use semicolons to separate statements. In C, semicolons
terminate statements, which is why they appear in the productions for
the expression, do-while, break, continue, goto, and return statements,
and they do not appear in the production for compound statements.

10.1 Representing Code


The semantics of statements consist of the evaluation of expressions,
perhaps intermixed with jumps and labels, which implement transfer
of control. Expressions are compiled into trees and then converted to
dags, as suggested in Section 1.3 and detailed in Chapter 12. Jumps and
labels are also represented by dags. For each function, these dags are
strung together in a code list, which represents the code for the function.
The front end builds the code list for a function and calls the interface
function function. As described in Section 11.6, back ends call gencode
and emitcode to generate and emit code; these functions traverse the
code list.
The code list is a doubly linked list of typed code structures:
(stmt.c typedefs)=
typedef struct code *Code;

(stmt.c exported types)=
struct code {
    enum { Blockbeg, Blockend, Local, Address, Defpoint,
           Label, Start, Gen, Jump, Switch
    } kind;
    Code prev, next;
    union {
        (Blockbeg 219)
        (Blockend 220)
        (Local 219)
        (Address 219)
        (Defpoint 220)
        (Label, Gen, Jump 220)
        (Switch 242)
    } u;
};
Each of the fields of u corresponds to one of the values of kind enu-
merated above except for Start, which needs no u field. Blockbeg
and Blockend entries identify the boundaries of compound statements.
Local and Address identify local variables that must be announced to
the back end by the local and address interface functions. Defpoint
entries define the locations of execution points, which are the places
in the program at which debuggers might plant breakpoints, for exam-
ple. Label, Gen, and Jump entries carry dags for expressions, labels, and
jumps. Switch entries carry the data needed to generate code for a
switch statement.
The code list begins with a Start entry. codelist always points to
the last entry on the list:
(stmt.c data)=
struct code codehead = { Start };
Code codelist = &codehead;

The top diagram in Figure 10.1 shows the initial state of the code list.
As statements in a function are compiled, the code list grows as entries
are appended to it. code allocates an entry, links it to the entry pointed
to by codelist, and advances codelist to point to the new entry, which
is now the last one on the code list. code returns a pointer to the new
entry:
(stmt.c functions)=
Code code(kind) int kind; {
    Code cp;

    (check for unreachable code 218)
    NEW(cp, FUNC);
    cp->kind = kind;
    cp->prev = codelist;
    cp->next = NULL;
    codelist->next = cp;
    codelist = cp;
    return cp;
}

The bottom diagram in Figure 10.1 shows the code list after two entries
have been appended.
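
The appending step can be tried outside the compiler. The following is a minimal sketch, assuming malloc in place of lcc's NEW(cp, FUNC) arena allocation and omitting the u field and the unreachable-code check. The names append and length are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { Blockbeg, Blockend, Local, Address, Defpoint,
            Label, Start, Gen, Jump, Switch };

struct code {
    enum kind kind;
    struct code *prev, *next;
};

struct code codehead = { Start, NULL, NULL };
struct code *codelist = &codehead;     /* always the last entry */

/* Append an entry of the given kind and return it.
   malloc stands in for lcc's arena allocator. */
struct code *append(enum kind kind) {
    struct code *cp = malloc(sizeof *cp);
    assert(cp != NULL);
    cp->kind = kind;
    cp->prev = codelist;
    cp->next = NULL;
    codelist->next = cp;
    codelist = cp;
    return cp;
}

/* Count the entries, including the Start entry, by following next. */
int length(void) {
    int n = 0;
    struct code *cp;
    for (cp = &codehead; cp != NULL; cp = cp->next)
        n++;
    return n;
}
```

After two calls to append, the list holds three entries and codelist points to the second appended entry, which matches the bottom diagram described above.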
The values of the enumeration constants that identify code-list en-
tries are important. Those greater than Start generate executable code;
those less than Label do not generate code, but serve only to declare
information of interest to the back end. Thus, code can detect entries
that will generate unreachable code if it appends one with kind greater
than Start after an unconditional jump:
(check for unreachable code 218)=
if (kind > Start) {
    for (cp = codelist; cp->kind < Label; )
        cp = cp->prev;
    if (cp->kind == Jump || cp->kind == Switch)
        warning("unreachable code\n");
}

[Figure 10.1: top, codelist points to codehead, whose kind is Start and whose prev and next fields are NULL; bottom, the list after two entries have been appended.]

FIGURE 10.1 The initial code list and after appending two entries.

As detailed in Section 10.7, control doesn't "fall through" switch state-
ments; they're like unconditional jumps.
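
The test depends only on the order of the enumeration constants, so it can be sketched on a plain array of kinds. The following is an illustration under that assumption, not lcc's code; the function name unreachable and the array representation are hypothetical.

```c
#include <assert.h>

enum kind { Blockbeg, Blockend, Local, Address, Defpoint,
            Label, Start, Gen, Jump, Switch };

/* list[0..n-1] holds the kinds appended so far; list[0] must be Start.
   Returns 1 if appending an entry of kind k would draw the warning. */
int unreachable(const enum kind list[], int n, enum kind k) {
    int i = n - 1;
    assert(n > 0 && list[0] == Start);
    if (k <= Start)
        return 0;               /* declarations and labels never warn */
    while (list[i] < Label)     /* skip entries that generate no code */
        i--;
    return list[i] == Jump || list[i] == Switch;
}
```

The backward scan always stops, because the Start entry at the head is not less than Label. A Label entry also stops the scan, which is why code that follows a label is never reported.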
addlocal appends a Local entry for a local variable, unless it's already
been defined:
(Local 219)=
Symbol var;

(stmt.c functions)+=
void addlocal(p) Symbol p; {
    if (!p->defined) {
        code(Local)->u.var = p;
        p->defined = 1;
        p->scope = level;
    }
}

addrtree illustrates the use of addlocal and the use of code to append
an Address entry. Address entries carry the data necessary for gencode
to make a call to the interface function address.
(Address 219)=
struct {
    Symbol sym;
    Symbol base;
    int offset;
} addr;
When gencode processes this entry, it uses the values of the sym, base,
and offset fields as the three arguments to address.
Blockbeg entries store the data necessary to compile a compound
statement:
(Blockbeg 219)=
struct {
    int level;
    Symbol *locals;
    Table identifiers, types;
    Env x;
} block;

level is the value of level associated with the block, and locals is
a null-terminated array of symbol-table pointers for the locals declared
in the block. x is the back end's Env value for this block. identifiers and
types record the identifiers and types tables when the block was com-
piled; they're used in code omitted from this book to generate debugger
symbol-table information when the option -g is specified. Blockend en-
tries just point to their matching Blockbeg:
(Blockend 220)=
Code begin;
Label, Gen, and Jump entries all carry a pointer to a forest:
(Label, Gen, Jump 220)=
Node forest;
Each of these entries is identified by its own enumeration constant so
that its purpose can be determined without inspecting its dag. This ca-
pability is used above in code to identify jumps, and it's used in Sec-
tion 10.9 to eliminate jumps to jumps and unreachable jumps.

10.2 Execution Points


Execution points occur before every expression in the grammar at the
beginning of this chapter, before the operands of && and ||, before the
second and third operands of ?:, at the beginning and end of every com-
pound statement, and at the entry and exit to every function. They give
back ends that implement the stab interface functions mentioned in Sec-
tion 5.2 the opportunity to generate code and symbol-table information
for debuggers. For example, debuggers permit breakpoints to be set at
execution points.
Execution points and events are also used to implement lcc's profiling
facility. The option -b causes lcc to generate code to count the number
of times each execution point is executed and to write those counts to
a file. The -a option causes that file to be read during compilation and
used to compute values of refinc that give exact execution frequencies
instead of estimates.
An execution-point entry records the source coordinates and a unique
number that identifies the execution point:
(Defpoint 220)=
struct {
    Coordinate src;
    int point;
} point;
definept appends a Defpoint entry to the code list and fills in either an
explicit coordinate or the current value of src:
(stmt.c functions)+=
void definept(p) Coordinate *p; {
    Code cp = code(Defpoint);

    cp->u.point.src = p ? *p : src;
    cp->u.point.point = npoints;
    (reset refinc if -a was specified)
    if (events.points)
        (plant event hook)
}

Usually, definept is called with a null pointer, but the loop and switch
statements generate tests and assignments at the ends of their state-
ments, so the execution points are in a different order in the generated
code than they appear in the source code. For these, the relevant coordi-
nate is saved when the expression is parsed, and is passed to definept
when the code for the expression is generated; the calls to definept in
forstmt are examples.

10.3 Recognizing Statements


The parsing function for statement uses the current token to identify the
kind of statement, and switches to statement-specific code:
(stmt.c functions)+=
void statement(loop, swp, lev) int loop, lev; Swtch swp; {
    float ref = refinc;

    if (Aflag >= 2 && lev == 15)
        warning("more than 15 levels of nested statements\n");
    switch (t) {
    case IF:       (if statement 224) break;
    case WHILE:    (while statement) break;
    case DO:       (do statement) (semicolon 222)
    case FOR:      (for statement 228) break;
    case BREAK:    (break statement 232) (semicolon 222)
    case CONTINUE: (continue statement 228) (semicolon 222)
    case SWITCH:   (switch statement 232) break;
    case CASE:     (case label 234) break;
    case DEFAULT:  (default label 234) break;
    case RETURN:   (return statement 243) (semicolon 222)
    case '{':      compound(loop, swp, lev + 1); break;
    case ';':      definept(NULL); t = gettok(); break;
    case GOTO:     (goto statement 227) (semicolon 222)
    case ID:       (statement label or fall thru to default 226)
    default:       (expression statement 222) (semicolon 222)
    }
    (check for legal statement termination 222)
    refinc = ref;
}
(semicolon 222)=
expect(';');
break;

(check for legal statement termination 222)=
if (kind[t] != IF && kind[t] != ID
&& t != '}' && t != EOI) {
    static char stop[] = { IF, ID, '}', 0 };
    error("illegal statement termination\n");
    skipto(0, stop);
}

statement takes three arguments: loop is the label number for the inner-
most for, while, or do-while loop, swp is a pointer to the swtch structure
that carries all of the data pertaining to the innermost switch statement
(see Section 10.7), and lev tells how deeply statements are currently
nested. If the current statement is not nested in any loop, loop is zero;
if it's not nested in any switch statement, swp is null. loop is needed to
generate code for break and continue statements, swp is needed to gen-
erate code for switch statements, and lev is needed only for the warning
shown above at the beginning of statement. The code for each kind of
statement passes these values along to nested calls to statement, modi-
fying them as appropriate.
Labels, like those used for loop, are local labels, and they're gener-
ated by genlabel(n), which returns the first of n labels. findlabel(n)
returns the symbol-table entry for label n.
For every reference to an identifier, idtree increments that identifier's
ref field by refinc. This value is approximately proportional to the num-
ber of times the identifier is referenced. statement and its descendants
change refinc to weight each reference to an identifier in a way that is
appropriate for the statement in which it appears. For example, refinc is
divided by 2 for the arms of an if statement, and it's multiplied by 10 for
the body of a loop. The value of the ref field helps identify those locals and
parameters that might profitably be assigned to registers, and locals are
announced to the back end in decreasing order of their ref values.
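
The weighting can be followed on a small example. The sketch below models only the arithmetic: a hypothetical reference function plays the part of idtree, and the function example walks through the references in a fragment such as i = 0; if (...) j = i; while (...) { i++; i++; }, ignoring references in the conditions. The names sym, reference, and example are assumptions made for the illustration.

```c
#include <assert.h>

float refinc = 1.0f;    /* weight added per identifier reference */

struct sym { const char *name; float ref; };

/* What idtree does for each reference, in miniature. */
void reference(struct sym *p) { p->ref += refinc; }

/* References in: i = 0; if (...) j = i; while (...) { i++; i++; } */
void example(struct sym *i, struct sym *j) {
    float saved;

    reference(i);           /* top level: weight 1 */

    saved = refinc;         /* arm of an if statement: weight 1/2 */
    refinc /= 2.0f;
    reference(j);
    reference(i);
    refinc = saved;

    saved = refinc;         /* loop body: weight 10 */
    refinc *= 10.0f;
    reference(i);
    reference(i);
    refinc = saved;
}
```

Under these assumptions i accumulates 1 + 0.5 + 20 = 21.5 and j accumulates 0.5, so i would be announced to the back end before j.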
The default case handles expressions that are statements:
(expression statement 222)=
definept(NULL);
if (kind[t] != ID) {
    error("unrecognized statement\n");
    t = gettok();
} else {
    Tree e = expr0(0);
    listnodes(e, 0, 0);
    if (nodecount == 0 || nodecount > 200)
        walk(NULL, 0, 0);
    deallocate(STMT);
}

listnodes and walk are the two functions that generate dags from trees.
Chapter 12 explains their implementations, but their usage must be ex-
plained now in order to understand how the front end implements the
semantics of statements.
listnodes takes a tree as its first argument, generates the dag for that
tree as described in Chapter 5, and appends the dag to a growing forest
of dags that it maintains. Thus, the call to listnodes above generates the
dag for the tree returned by expr0, and appends that dag to the forest.
For the input
c = a + b;
a = a/2;
d = a + b;
the fragment (expression statement) is executed three times, once for
each statement, and thus listnodes is called three times. The first call
appends the dag for c = a + b to the initially empty forest, and the
next two calls grow that forest by appending the dags for the second
and third assignments. As detailed in Section 12.1, listnodes reuses
common subexpressions when possible; for example, in the assignment
d = a + b, it reuses the dags for the lvalue of a and the rvalue of b
formed for the first assignment. It can't reuse the rvalue of a because
the second assignment changes a.
The second and third arguments are label numbers, and their purpose
is explained in the next section; the zeros shown in the call to listnodes
above specify no labels. listnodes also accepts the null tree, for which
it simply returns.
listnodes keeps the forest to itself until walk is called, which accepts
the same arguments as listnodes. walk takes two steps: First, it passes
its arguments to listnodes, so a call to walk has the same effect as a call
to listnodes. Second, and most important, walk allocates a Gen code-list
entry, stores the forest in that entry, appends the entry to the code list,
and clears the forest. Once a forest is added to the code list, its dags are
no longer available for reuse by listnodes.
The call walk(NULL, 0, 0) effectively executes just the second step,
and it has the effect of adding the current forest to the code list, if there
is a nonempty forest. This call is made whenever the current forest must
be appended to the code list either because some other executable code-
list entry must be appended or because two or more separate flows of
control merge. In the code above, this call is made when nodecount is
zero or when it exceeds 200. nodecount is the number of nodes in the
forest that are available for reuse. walk is called when the forest has
no nodes that can be reused or when the forest is getting large. The
former condition puts dags that do not share common subexpressions
into separate forests, and the latter one limits the sizes of forests; both
consequences may help back ends.
The call to deallocate frees all the space in the STMT arena, which is
where trees are allocated. walk also deallocates the STMT arena.

10.4 If Statements
The generated code for an if statement has the form
if expression == 0 goto L
statement1
goto L + 1
L: statement2
L + 1:

If the else part is omitted, the goto L + 1 is omitted. The code is


(if statement 224)=
ifstmt(genlabel(2), loop, swp, lev + 1);

(stmt.c functions)+=
static void ifstmt(lab, loop, swp, lev)
int lab, loop, lev; Swtch swp; {
    t = gettok();
    expect('(');
    definept(NULL);
    walk(conditional(')'), 0, lab);
    refinc /= 2.0;
    statement(loop, swp, lev);
    if (t == ELSE) {
        branch(lab + 1);
        t = gettok();
        definelab(lab);
        statement(loop, swp, lev);
        if (findlabel(lab + 1)->ref)
            definelab(lab + 1);
    } else
        definelab(lab);
}

The first argument to ifstmt is L; genlabel(2) generates two labels for
use in the if statement. ifstmt's other three arguments echo statement's
arguments. conditional parses an expression by calling expr, and en-
sures that the resulting tree is a conditional, which is an expression
whose value is used only to alter flow of control. The root of the tree
for a conditional has one of the comparison operators, AND, OR, NOT, or
a constant. conditional's argument is the token that should follow the
expression in the context in which conditional is called.
(stmt.c functions)+=
static Tree conditional(tok) int tok; {
    Tree p = expr(tok);

    if (Aflag > 1 && isfunc(p->type))
        warning("%s used in a conditional expression\n",
            funcname(p));
    return cond(p);
}

The second and third arguments to listnodes and walk are labels
that specify true and false targets. walk(e, tlab, flab) passes its ar-
guments to listnodes, which generates a dag from e and adds it to the
forest, and appends a Gen entry carrying the forest to the code list, as
explained in the previous section. When e is a tree for a conditional ex-
pression, either tlab or flab is nonzero. If tlab is nonzero, listnodes
generates a dag that transfers control to tlab if the result of e is nonzero;
likewise, listnodes generates a dag that jumps to flab if e evaluates to
zero. listnodes and walk can be called with a nonzero value for only
one of tlab or flab; control always "falls through" for the other case.
For the if statement, walk is called with a nonzero flab corresponding
to L in the generated code shown above. definelab and branch generate
code-list items for label definitions and jumps. L + 1 is defined only if
it's needed; a label's ref field is incremented each time it's used as the
target of a branch. For example, L + 1 isn't needed if the branch to it is
eliminated, which occurs in code like
if (...)
    return;
else
    ...
The return statement acts like an unconditional jump, so the call to
branch(lab + 1) doesn't emit the branch.
Recall that refinc is the amount added to each reference to an iden-
tifier in idtree. Estimating that each arm of an if statement is executed
approximately the same number of times, refinc is halved before they
are parsed. The result is that a reference to an identifier in one of the
arms counts half as much as a reference before or after the if statement.
ifstmt doesn't have to restore refinc because statement does.
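
The layout for if statements can be written out by hand in C with gotos, which makes it easy to check against the structured form. The following sketch is an illustration only; the function names are hypothetical, and the labels L and L1 play the parts of L and L + 1.

```c
#include <assert.h>

/* Structured form, for comparison. */
int max_structured(int a, int b) {
    int m;
    if (a > b)
        m = a;
    else
        m = b;
    return m;
}

/* The same statement in the generated layout. */
int max_lowered(int a, int b) {
    int m;
    if ((a > b) == 0) goto L;   /* if expression == 0 goto L */
    m = a;                      /* statement1 */
    goto L1;                    /* goto L + 1 */
L:  m = b;                      /* L: statement2 */
L1: return m;                   /* L + 1: */
}

/* Without an else part, the goto L + 1 and the label L + 1 disappear. */
int abs_lowered(int x) {
    if ((x < 0) == 0) goto L;   /* if expression == 0 goto L */
    x = -x;                     /* statement1 */
L:  return x;                   /* L: */
}
```

The two max functions agree for all inputs, since the lowered form only rearranges the control flow of the structured one.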

10.5 Labels and Gotos


For statements that begin with an identifier, the identifier is a label if it
is followed by a colon; otherwise, it begins an expression.
(statement label or fall thru to default 226)=
if (getchr() == ':') {
    stmtlabel();
    statement(loop, swp, lev);
    break;
}

getchr advances the input to just before the initial character of the next
token and returns that character. It is used to "peek" at the next character
to check for a colon. Since an identifier can be both a label and a variable,
a separate table, stmtlabs, holds source-language labels:
(stmt.c exported data)=
extern Table stmtlabs;
Like other tables, stmtlabs is managed by lookup and install. It maps
source-language labels to internal label numbers, which are stored in the
symbols' u.l.label fields.
(stmt.c functions)+=
static void stmtlabel() {
    Symbol p = lookup(token, stmtlabs);

    (install token in stmtlabs, if necessary 226)
    if (p->defined)
        error("redefinition of label '%s' previously defined at %w\n",
            p->name, &p->src);
    p->defined = 1;
    definelab(p->u.l.label);
    t = gettok();
    expect(':');
}

definelab(n) builds a LABELV dag that defines the label n, allocates a
Label code-list entry to hold that dag, and appends the Label entry to
the code list.
Labels can be defined before they are referenced and vice versa, so
they can be installed either when they label a statement or when they
appear in a goto statement.
(install token in stmtlabs, if necessary 226)=
if (p == NULL) {
    p = install(token, &stmtlabs, 0, FUNC);
    p->scope = LABELS;
    p->u.l.label = genlabel(1);
    p->src = src;
}

A label's ref field counts the number of references to the label and is
initialized to zero by install. Each reference to the label increments the
ref field:
(goto statement 227)=
walk(NULL, 0, 0);
definept(NULL);
t = gettok();
if (t == ID) {
    Symbol p = lookup(token, stmtlabs);
    (install token in stmtlabs, if necessary 226)
    use(p, src);
    branch(p->u.l.label);
    t = gettok();
} else
    error("missing label in goto\n");
branch(n) builds a JUMPV dag for a branch to the label n, allocates a Jump
code-list entry to hold that dag, and appends the Jump entry to the code
list. It also increments n's ref field.
Undefined labels - those referenced in goto statements but never de-
fined - are found and announced when funcdefn calls checklab at the
end of a function definition.
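
The bookkeeping for source-language labels can be sketched with a small fixed-size table. This is an illustration under simplifying assumptions, not lcc's symbol-table code: a linear array stands in for stmtlabs, and the names lookup_or_install, define_label, goto_label, and count_undefined are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXLABELS 32

struct label { const char *name; int defined; int ref; };

static struct label labels[MAXLABELS];
static int nlabels;

/* Find name, installing it on first sight, whether that first
   sight is a definition or a goto. */
struct label *lookup_or_install(const char *name) {
    int i;
    for (i = 0; i < nlabels; i++)
        if (strcmp(labels[i].name, name) == 0)
            return &labels[i];
    assert(nlabels < MAXLABELS);
    labels[nlabels].name = name;
    labels[nlabels].defined = 0;
    labels[nlabels].ref = 0;
    return &labels[nlabels++];
}

/* "name:" was seen; returns 0 if the label was already defined. */
int define_label(const char *name) {
    struct label *p = lookup_or_install(name);
    if (p->defined)
        return 0;
    p->defined = 1;
    return 1;
}

/* "goto name;" was seen. */
void goto_label(const char *name) {
    lookup_or_install(name)->ref++;
}

/* End of the function: count labels referenced but never defined. */
int count_undefined(void) {
    int i, n = 0;
    for (i = 0; i < nlabels; i++)
        if (!labels[i].defined)
            n++;
    return n;
}
```

Because a label enters the table only through a definition or a goto, any entry that is still undefined at the end of the function must have been referenced, which is the condition the text describes.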

10.6 Loops

The code for all three kinds of loops has a similar structure involving
three labels: L is the top of the loop, L + 1 labels the test portion of the
loop, and L + 2 labels the loop exit. For example, the generated code for
a while loop is
        goto L + 1
L:      statement
L + 1:  if expression != 0 goto L
L + 2:
This layout is better than
L:
L + 1:  if expression == 0 goto L + 2
        statement
        goto L
L + 2:
because the former executes n + 2 branch instructions when the loop
body is executed n times; the more obvious layout executes 2n + 1
branches.
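
The two counts can be checked by writing both layouts with gotos and counting every branch instruction that is executed, taken or not. The sketch below is an illustration; the counter and the function names are hypothetical, and the loop condition i < n stands in for expression.

```c
#include <assert.h>

/* The layout with the test below the body. */
int branches_rotated(int n) {
    int i = 0, count = 0;
    count++; goto L1;               /* goto L + 1 */
L:  i++;                            /* statement */
L1: count++; if (i < n) goto L;     /* if expression != 0 goto L */
    return count;                   /* L + 2: */
}

/* The more obvious layout with the test above the body. */
int branches_obvious(int n) {
    int i = 0, count = 0;
L1: count++; if (!(i < n)) goto L2; /* if expression == 0 goto L + 2 */
    i++;                            /* statement */
    count++; goto L1;               /* goto L */
L2: return count;                   /* L + 2: */
}
```

For a body that runs 10 times, branches_rotated returns 12 and branches_obvious returns 21, in agreement with n + 2 and 2n + 1. The first layout costs one extra branch only when the body never runs.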
The code for continue statements jumps to L + 1, and the code for
break statements jumps to L + 2. L is the loop handle, and is passed to
statement and the functions it calls, as illustrated by ifstmt. A continue
statement, for example, is legal only if there's a loop handle:
(continue statement 228)=
walk(NULL, 0, 0);
definept(NULL);
if (loop)
    branch(loop + 1);
else
    error("illegal continue statement\n");
t = gettok();
The first three of the four labels in a for loop have the same meanings
as in the while loop; the layout of the generated code when all three
expressions are present is
        expression1
        goto L + 3
L:      statement
L + 1:  expression3
L + 3:  if expression2 != 0 goto L
L + 2:
expression1, expression2, and expression3 are called the initialization,
test, and increment, respectively.
Most of the complexity in the parsing function is in coping with the op-
tional expressions, announcing the execution points in the right places,
and implementing an optimization for loops that always execute their
bodies at least once.
(for statement 228)=
forstmt(genlabel(4), swp, lev + 1);

(stmt.c functions)+=
static void forstmt(lab, swp, lev)
int lab, lev; Swtch swp; {
    int once = 0;
    Tree e1 = NULL, e2 = NULL, e3 = NULL;
    Coordinate pt2, pt3;

    t = gettok();
    expect('(');
    definept(NULL);
    (forstmt 229)
}

First, the initialization is parsed and appended to the code list:
(forstmt 229)=
if (kind[t] == ID)
    e1 = texpr(expr0, ';', FUNC);
else
    expect(';');
walk(e1, 0, 0);
Next, the test is parsed, but it cannot be passed to walk until after the
body of the loop has been compiled. The assignment to pt2 saves the
source coordinate of the test for a call to definept just before the tree
for the test is passed to walk.
(forstmt 229)+=
pt2 = src;
refinc *= 10.0;
if (kind[t] == ID)
    e2 = texpr(conditional, ';', FUNC);
else
    expect(';');
walk has an important side effect: it deallocates the STMT arena from
which trees are allocated by tree. texpr causes the trees for the test
to be allocated in the FUNC arena, so they survive the calls to walk that
are made when the loop body is compiled. texpr is also used for the
increment:
(forstmt 229)+=
pt3 = src;
if (kind[t] == ID)
    e3 = texpr(expr0, ')', FUNC);
else {
    static char stop[] = { IF, ID, '}', 0 };
    test(')', stop);
}

pt3 holds the source coordinate for the increment expression for a later
call to definept.
Multiplying refinc by 10 estimates that loop bodies are executed 10
times more often than statements outside of loops, and weights refer-
ences to identifiers used in loops accordingly.
Many for loops look like the one in the following code:
sum = 0;
for (i = 0; i < 10; i++)
    sum += x[i];
The loop bodies in these kinds of loops are always executed at least once
and the leading goto L + 3 could be omitted, which is accomplished by
(forstmt 229)+=
if (e2) {
    once = foldcond(e1, e2);
    if (!once)
        branch(lab + 3);
}
foldcond inspects the trees for the initialization and for the test to de-
termine if the loop body will be executed at least once; see Exercise 10.3.
e1 is passed to foldcond, which is why it was parsed with texpr above.
The rest of forstmt compiles the loop body and lays down the labels
and expressions as described above.
(forstmt 229)+=
definelab(lab);
statement(lab, swp, lev);
definelab(lab + 1);
definept(&pt3);
if (e3)
    walk(e3, 0, 0);
if (e2) {
    if (!once)
        definelab(lab + 3);
    definept(&pt2);
    walk(e2, lab, 0);
} else {
    definept(&pt2);
    branch(lab);
}
if (findlabel(lab + 2)->ref)
    definelab(lab + 2);
Symbol-table entries for generated labels are installed in the labels table
by findlabel. Like other labels, the ref field of a generated label is
nonzero only if the label is the target of a jump.

10.7 Switch Statements


The C switch statement differs significantly from, for example, the Pascal
case statement. Any statement can follow the switch clause; the place-
ment of the case and default labels is not specified by the syntax of the
switch statement. In addition, after executing the statement associated
with a case label, control falls through to the next statement, which might
be labelled by another case label. Case and default labels are simply la-
bels, and have no additional semantics. For example, the intent of the
code
switch (n%4)
while (n > 0) {
case 0: *x++ = *y++; n--;
case 3: *x++ = *y++; n--;
case 2: *x++ = *y++; n--;
case 1: *x++ = *y++; n--;
}

is to copy n values from y to x where n ≥ 4. The loop is unrolled so
that each iteration copies four values. The switch statement copies the
first n%4 values and the n/4 iterations copy the rest. This example is
somewhat contrived but legal nonetheless.
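
The example compiles as written once it is wrapped in a function. The sketch below does that; the function name copy4 and the choice of int elements are assumptions made for the illustration. The text states the precondition n ≥ 4; the assertion in the sketch rejects only n < 1, because n = 0 is the case in which jumping past the loop test copies values that should not be copied.

```c
#include <assert.h>

/* Copies n ints from y to x. The switch jumps into the loop body
   without first testing n > 0, so n == 0 would wrongly copy four. */
void copy4(int *x, const int *y, int n) {
    assert(n >= 1);
    switch (n % 4)
    while (n > 0) {
    case 0: *x++ = *y++; n--;   /* each case falls through to the next */
    case 3: *x++ = *y++; n--;
    case 2: *x++ = *y++; n--;
    case 1: *x++ = *y++; n--;
    }
}
```

With n = 7, for instance, the switch enters at case 3 and copies three values, and one full pass through the loop body copies the remaining four.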
The generated code for a switch statement with n cases and a default
looks like:
        t1 ← expression
        select and jump to L1, ..., Ln, L
        code for statement
L + 1:

where t1 is a temporary associated with the switch statement, and L + 1
is the exit label. Each case label generates a definition for its generated
label, Li, a default label generates a definition for L, and each break inside
a switch generates a jump to the exit label:
        goto L + 1
If there's no default, L labels the same location as L + 1.
Parsing the switch statement, the case and default labels, and the
break statement is easy; the hard part is generating good code for the
select and jump fragment. Each case label is associated with an integer
value. These value-label pairs are used to generate the code that selects
and jumps to the appropriate case depending on the value of expression.
These and other data are stored in a swtch structure associated with the
switch statement during parsing:
(stmt.c typedefs)+=
typedef struct swtch *Swtch;

(stmt.c types)=
struct swtch {
    Symbol sym;
    int lab;
    Symbol deflab;
    int ncases;
    int size;
    int *values;
    Symbol *labels;
};

sym holds the temporary, t1, lab holds the value of L, and deflab points
to the symbol-table entry for the default label, if there is one. values and
labels point to arrays that store the value-label pairs. These arrays have
size elements, ncases of which are occupied, and these ncases are kept
in ascending order of values. A pointer to the swtch structure for the
current switch statement - the switch handle - is passed to statement
and its descendants.
Case and default labels are handled much like break and continue
statements: They refer to the innermost, or current, switch statement,
and case and default labels that appear outside of switch statements,
which is when the switch handle is null, are erroneous. The code for
the break statement determines whether it is associated with a loop or a
switch by examining both the loop handle and the switch handle:
(break statement 232)=
walk(NULL, 0, 0);
definept(NULL);
if (swp && swp->lab > loop)
    branch(swp->lab + 1);
else if (loop)
    branch(loop + 2);
else
    error("illegal break statement\n");
t = gettok();
Since the values of labels increase as they are generated, a break refers
to a switch statement if there's a switch handle and its L is greater than
the loop handle.
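
The decision reduces to a comparison of two label numbers, which can be sketched in isolation. In the following illustration the function break_target is hypothetical, and an integer swlab, zero when there is no enclosing switch, stands in for the switch handle swp and its lab field.

```c
#include <assert.h>

/* loop:  label L of the innermost loop, or 0 if there is none.
   swlab: label L of the innermost switch, or 0 if there is none.
   Returns the label a break jumps to, or 0 if the break is illegal. */
int break_target(int loop, int swlab) {
    if (swlab && swlab > loop)
        return swlab + 1;       /* exit label of the switch */
    else if (loop)
        return loop + 2;        /* exit label of the loop */
    return 0;
}
```

A switch nested in a loop has the larger label number, so break_target(5, 9) yields the switch exit 10; a loop nested in a switch gives break_target(9, 5), which yields the loop exit 11.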
Parsing switch statements involves parsing and type-checking the ex-
pression, generating a temporary, appending a Switch placeholder on the
code list, initializing a new switch handle and passing it to statement,
and generating the closing labels and the selection code.
(switch statement 232)= 221
    swstmt(loop, genlabel(2), lev + 1);

(stmt.c macros)=
#define SWSIZE 512

(stmt.c functions)+=
static void swstmt(loop, lab, lev) int loop, lab, lev; {
    Tree e;
    struct swtch sw;
    Code head, tail;

    t = gettok();
    expect('(');
    definept(NULL);
    e = expr(')');
    (type-check e 233)
    (generate a temporary to hold e, if necessary 233)
    head = code(Switch);
    sw.lab = lab;
    sw.deflab = NULL;
    sw.ncases = 0;
    sw.size = SWSIZE;
    sw.values = newarray(SWSIZE, sizeof *sw.values, FUNC);
    sw.labels = newarray(SWSIZE, sizeof *sw.labels, FUNC);
    refinc /= 10.0;
    statement(loop, &sw, lev);
    (define L, if necessary, and L + 1 236)
    (generate the selection code 236)
}

The placeholder Switch entry in the code list will be replaced by one or
more Switch entries when the selection code is generated. The switch
expression must have integral type, and it's promoted:

(type-check e 233)= 233
    if (!isint(e->type)) {
        error("illegal type '%t' in switch expression\n",
            e->type);
        e = retype(e, inttype);
    }
    e = cast(e, promote(e->type));

The temporary also has type e->type, but the temporary can be avoided
in some cases. If the switch expression is simply an identifier, and it's
the right type and is not volatile, then it can be used instead. Otherwise,
the expression is assigned to a temporary:
(generate a temporary to hold e, if necessary 233)= 233
    if (generic(e->op) == INDIR && isaddrop(e->kids[0]->op)
    && e->kids[0]->u.sym->type == e->type
    && !isvolatile(e->kids[0]->u.sym->type)) {
        sw.sym = e->kids[0]->u.sym;
        walk(NULL, 0, 0);
    } else {
        sw.sym = genident(REGISTER, e->type, level);
        addlocal(sw.sym);
        walk(asgn(sw.sym, e), 0, 0);
    }

Once the switch handle is initialized, case and default labels simply
add data to the handle. For example, a default label fills in the deflab
field, unless it's already filled in:

(default label 234)= 221
    if (swp == NULL)
        error("illegal default label\n");
    else if (swp->deflab)
        error("extra default label\n");
    else {
        swp->deflab = findlabel(swp->lab);
        definelab(swp->deflab->u.l.label);
    }
    t = gettok();
    expect(':');
    statement(loop, swp, lev);

Case labels are similar: The label value is converted to the promoted
type of the switch expression, and a label associated with that value is
generated and defined:

(case label 234)= 221
    {
        int lab = genlabel(1);
        if (swp == NULL)
            error("illegal case label\n");
        definelab(lab);
        while (t == CASE) {
            static char stop[] = { IF, ID, 0 };
            Tree p;
            t = gettok();
            p = constexpr(0);
            if (generic(p->op) == CNST && isint(p->type)) {
                if (swp) {
                    needconst++;
                    p = cast(p, swp->sym->type);
                    needconst--;
                    caselabel(swp, p->u.v.i, lab);
                }
            } else
                error("case label must be a constant integer expression\n");
            test(':', stop);
        }
        statement(loop, swp, lev);
    }

needconst is incremented before that call to cast so that simplify will
fold the conversion even if it overflows. For example, the input

    int i;
    switch (i)
    case 0xffffffff:

elicits the diagnostic

    warning: overflow in constant expression

because the case value is an unsigned that can't be represented by an
integer. Notice that a case label is processed even if it appears outside
a switch statement; this prevents the case label from causing additional
syntax errors.
caselabel appends the value and the label to the values and labels
arrays in the switch handle. It also detects duplicate labels.

(stmt.c functions)+=
static void caselabel(swp, val, lab)
Swtch swp; int val, lab; {
    int k;

    if (swp->ncases >= swp->size)
        (double the size of values and labels)
    k = swp->ncases;
    for ( ; k > 0 && swp->values[k-1] >= val; k--) {
        swp->values[k] = swp->values[k-1];
        swp->labels[k] = swp->labels[k-1];
    }
    if (k < swp->ncases && swp->values[k] == val)
        error("duplicate case label '%d'\n", val);
    swp->values[k] = val;
    swp->labels[k] = findlabel(lab);
    ++swp->ncases;
    if (Aflag >= 2 && swp->ncases == 258)
        warning("more than 257 cases in a switch\n");
}

The for loop inserts the new label and value into the right place in the
values and labels arrays so that these arrays are sorted in ascending
order of values, which helps both to detect duplicate case values and to
generate good selection code. If necessary, these arrays are doubled in
size to accommodate the new value-label pair.
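
The insertion step can be exercised on its own. The following is a minimal sketch under stated assumptions, not lcc's code: struct case_table and add_case are hypothetical names, labels are plain integers rather than symbols, the arrays grow with realloc instead of lcc's arena allocator, and a duplicate is rejected rather than reported and stored.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the switch handle: parallel arrays of
   case values and label numbers, kept sorted by value. */
struct case_table {
    int ncases;   /* entries in use */
    int size;     /* entries allocated */
    int *values;
    int *labels;
};

/* Inserts (val, lab) in ascending order of val.
   Returns 1 on success, 0 if val is already present (nothing is
   inserted), and -1 if memory runs out. */
static int add_case(struct case_table *t, int val, int lab) {
    int i, k;

    if (t->ncases >= t->size) {
        /* Double the capacity of both arrays (start with 4 entries). */
        int newsize = t->size > 0 ? 2 * t->size : 4;
        int *nv, *nl;

        nv = realloc(t->values, (size_t)newsize * sizeof *nv);
        if (nv == NULL)
            return -1;
        t->values = nv;
        nl = realloc(t->labels, (size_t)newsize * sizeof *nl);
        if (nl == NULL)
            return -1;
        t->labels = nl;
        t->size = newsize;
    }
    /* Find the first position whose value is >= val. */
    for (k = t->ncases; k > 0 && t->values[k-1] >= val; k--)
        ;
    if (k < t->ncases && t->values[k] == val)
        return 0;                       /* duplicate case value */
    /* Shift the larger entries up by one and store the new pair. */
    for (i = t->ncases; i > k; i--) {
        t->values[i] = t->values[i-1];
        t->labels[i] = t->labels[i-1];
    }
    t->values[k] = val;
    t->labels[k] = lab;
    t->ncases++;
    return 1;
}
```

A table initialized as `struct case_table t = {0, 0, NULL, NULL};` can be filled by repeated calls to add_case; because the arrays stay sorted, a duplicate always sits exactly at the insertion point, so one comparison detects it.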
After the return from statement to swstmt, a default label is defined,
if there was no explicit default, and the exit label, L + 1, is defined, if it
was referenced:
(define L, if necessary, and L + 1 236) = 233
if (sw.deflab == NULL) {
sw.deflab = findlabel(lab);
definelab(lab);
if (sw.ncases == 0)
warning("switch statement with no cases\n");
}
if (findlabel(lab + 1)->ref)
definelab(lab + 1);
The default label is defined even if it isn't referenced, because it will
probably be referenced by the selection code.
The selection code can't be generated until all the cases have been
examined. Compiling statement appends entries to the code list, but
the entries for the selection code need to appear just after those for
expression and before those for statement. The selection code could
appear after statement if branches were inserted so the selection code
was executed before statement. But there's a solution to this problem
that's easier and generates better code: rearrange the code list.
The top diagram in Figure 10.2 shows the code list after the exit label
has been defined. The solid circle represents the entry for expression,
the open circle is the Switch placeholder, and the open squares are the
entries for statement, including the definitions for the case and default
labels and the jumps generated by break statements. head points to the
placeholder and codelist to the last statement entry.
The first step in generating the selection code is to make the solid
circle the end of the code list:

(generate the selection code 236)=
    tail = codelist;
    codelist = head->prev;
    codelist->next = head->prev = NULL;

The second diagram in Figure 10.2 shows the outcome of these statements.
head and tail point to the entries for the placeholder and for
statement, and codelist points to the entry for expression. As the
selection code is generated, its entries are appended in the right place:
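(generate the selection code 236)+=
    if (sw.ncases > 0)
        swgen(&sw);
    branch(lab);

[Figure 10.2: four diagrams of the code list. They show, in order, the list after the exit label has been defined (head at the placeholder, codelist at the last statement entry); the list after the statement entries are detached (codelist at the expression entry, head and tail delimiting the detached entries); the list after the selection code entries have been appended; and the final list with the statement entries reattached. Legend: solid circle, switch expression; open circle, switch placeholder; open squares, statement entries; open triangles, selection code entries; arrows, prev and next pointers.]

FIGURE 10.2 Code-list manipulations for generating switch selection code.
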
Figure 10.2's third diagram shows the code list after entries for the
selection code, which are shown in open triangles, have been added. The
last step is to append the entire list held by head and tail to the code
list and set codelist back to tail:

(generate the selection code 236)+=
    head->next->prev = codelist;
    codelist->next = head->next;
    codelist = tail;
The last diagram in Figure 10.2 shows the result, which omits the place-
holder.
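
The three list manipulations can be tried on a toy list. The sketch below is only a model: struct entry, detach_from, append, and reattach_after are hypothetical names, and the list is a bare doubly linked list rather than lcc's Code entries. It assumes, as in the compiler, that the placeholder has a predecessor and at least one successor.

```c
#include <stddef.h>

/* Hypothetical doubly linked list node standing in for a code-list entry. */
struct entry {
    int id;
    struct entry *prev, *next;
};

/* Appends e at the end of the list whose last entry is *listend. */
static void append(struct entry **listend, struct entry *e) {
    e->prev = *listend;
    e->next = NULL;
    (*listend)->next = e;
    *listend = e;
}

/* Step 1: cut the list just before head. The entry before head becomes
   the new end of the list; the old end is returned as the tail of the
   detached part. */
static struct entry *detach_from(struct entry **listend, struct entry *head) {
    struct entry *tail = *listend;

    *listend = head->prev;
    (*listend)->next = NULL;
    head->prev = NULL;
    return tail;
}

/* Step 3: reattach the detached entries that follow head, dropping head
   itself (the placeholder), and restore the end of the list to tail. */
static void reattach_after(struct entry **listend, struct entry *head,
                           struct entry *tail) {
    head->next->prev = *listend;
    (*listend)->next = head->next;
    *listend = tail;
}
```

Step 2, generating the selection code, corresponds to ordinary calls to append between the two other steps; the new entries land after the expression entry because the list temporarily ends there.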
The fastest selection code when there are more than three cases is a
branch table: The value of expression is used as an index to this table,
and the ith entry holds L_i, or L if i is not a case label. For this
organization, selection takes constant time. This table takes space proportional
to u - l + 1 where l and u are the minimum and maximum case values.
For n case values, the density of the table, which is the fraction occupied by
nondefault destination labels, is n/(u - l + 1). If the density is too low,
this organization wastes space. Worse, there are legal switch statements
for which it is impractical:

    switch (i) {
    case INT_MIN: ...; break;
    case INT_MAX: ...; break;
    }

At the other extreme, a linear search (a sequence of n comparisons)
is compact but slow. It takes only O(n) space for any set of case labels,
but selection takes O(n) time. Using a binary search would reduce the
time to O(log n) and increase the space by O(log n).
lcc combines branch tables and binary search: It generates a binary
search of dense branch tables. If there are m tables, selection takes
O(log m) time and space that is proportional to n + log m. Generating
the selection code for this approach involves three steps: partitioning
the value-label pairs into dense tables, arranging these tables into a tree
that mirrors the binary search, and traversing this tree to generate code.
An example helps describe the code for these steps. Suppose the case
values are

    i      0   1   2   3   4   5   6   7   8   9
    v[i]  21  22  23  27  28  29  36  37  38  39

v is the values array, and the numbers in the top row are the indices into
v. The density, d(i,j), for the subset of values v[i..j] is the number of
values divided by the range of those values:

    d(i,j) = (j - i + 1)/(v[j] - v[i] + 1).

For example,

    d(0,9) = (9 - 0 + 1)/(39 - 21 + 1) = 10/19 = 0.53
    d(0,5) = (5 - 0 + 1)/(29 - 21 + 1) = 6/9   = 0.67
    d(6,9) = (9 - 6 + 1)/(39 - 36 + 1) = 4/4   = 1.0

The value of density is the minimum density for branch tables:

(stmt.c data)+=
float density = 0.5;

As shown, the default density is 0.5, which results in a single table for the
example above because d(0,9) > 0.5. lcc's -dx option changes density
to x. If density is 0.66, the example generates two tables (v[0..5]
and v[6..9]), and three tables if density is 0.75 (v[0..2], v[3..5], and
v[6..9]). If density exceeds 1.0, there are n one-element tables, which
corresponds to a binary search.
A simple greedy algorithm implements partitioning: If the current table
is v[i..j] and d(i,j+1) ≥ density, extend the table to v[i..j+1].
Whenever a table is extended, it's merged with its predecessor if the
density of the combined table is at least density. swgen does both of
these steps at once by treating the single element v[j+1] as the table
v[j+1..j+1] and merging it with its predecessor, if possible. In the
code below, buckets[k] is the index in v of the first value in the kth
table, i.e., table k is v[buckets[k]..buckets[k+1]-1]. For n case values,
there can be up to n tables, so buckets can have n + 1 elements.
(stmt.c macros)+= 232
#define den(i,j) ((j-buckets[i]+1.0)/(v[j]-v[buckets[i]]+1))

(stmt.c functions)+=
static void swgen(swp) Swtch swp; {
    int *buckets, k, n, *v = swp->values;

    buckets = newarray(swp->ncases + 1,
        sizeof *buckets, FUNC);
    for (n = k = 0; k < swp->ncases; k++, n++) {
        buckets[n] = k;
        while (n > 0 && den(n-1, k) >= density)
            n--;
    }
    buckets[n] = swp->ncases;
    swcode(swp, buckets, 0, n - 1);
}

When swgen calls swcode, there are n tables, buckets[0..n-1] holds the
indices into v for the first value in each table, and buckets[n] is equal
to swp->ncases, which is the index in v at which a fictitious n+1st table
would begin.
The display below illustrates how swgen partitions the example from
above when density is 0.66. The first iteration of the for loop ends with:

    v[i]  |[21]| 22  23  27  28  29  36  37  38  39

The vertical bars appear to the left of the first element of a table and
thus represent the values of buckets. The rightmost bar is the value of
buckets[n]. The value associated with k is shown in brackets. So, at the end
of the first iteration, k is zero and refers to the value 21, and the one
table is v[0..0]. The next two iterations set buckets[1] to 1 and 2, and
in each case combine the single-element tables v[1..1] and v[2..2] with
their predecessors v[0..0] and v[0..1]. At the end of the third iteration,
the state is

    v[i]  | 21  22 [23]| 27  28  29  36  37  38  39

and the only table is v[0..2]. The fourth iteration cannot merge v[3..3],
which holds just 27, with v[0..2] because the density d(0,3) = 4/7 =
0.57 is too low, so the state becomes

    v[i]  | 21  22  23 |[27]| 28  29  36  37  38  39

Next, v[4..4] (28) can be merged with v[3..3], but v[3..4] cannot be
merged with v[0..2] because d(0,4) = 5/8 = 0.63.
The iteration that examines 29 is the interesting one. Just before the
while loop, n is 2 and the state is

    v[i]  | 21  22  23 | 27  28 |[29]| 36  37  38  39

The while loop merges v[3..4] with v[5..5] and decrements n to 1; since
d(0,5) = 6/9 = 0.67, it also merges v[0..2] with the just-formed v[3..5]
and decrements n to 0. The state after the while loop is

    v[i]  | 21  22  23  27  28 [29]| 36  37  38  39

This process ends with two tables; the state just before calling swcode is

    i        0   1   2   3   4   5    6   7   8   9
    v[i]  | 21  22  23  27  28  29 | 36  37  38  39 |

and n is 2 and buckets holds the indices 0, 6, and 10.
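
The greedy pass can also be run outside the compiler. The sketch below is a stand-alone restatement under stated assumptions: partition_cases, its parameters, and the use of size_t and double are illustrative choices rather than lcc's interface, and the input must be sorted in ascending order with no duplicates.

```c
#include <stddef.h>

/* Hypothetical stand-alone version of the greedy partitioning step.
   v[0..ncases-1] must be sorted in ascending order with no duplicates;
   buckets must have room for ncases + 1 entries.
   Returns the number of tables; buckets[t] is the index in v of the
   first value of table t, and buckets[ntables] == ncases. */
static size_t partition_cases(const int *v, size_t ncases,
                              double min_density, size_t *buckets) {
    size_t n = 0, k;

    for (k = 0; k < ncases; k++, n++) {
        buckets[n] = k;          /* start a one-element table at v[k] */
        /* Merge the newest table into its predecessor for as long as
           the combined table v[buckets[n-1]..k] is dense enough. */
        while (n > 0) {
            size_t first = buckets[n - 1];
            double d = (double)(k - first + 1)
                     / ((double)v[k] - (double)v[first] + 1.0);
            if (d < min_density)
                break;
            n--;
        }
    }
    buckets[n] = ncases;         /* sentinel: one past the last value */
    return n;
}
```

For the ten example values, a minimum density of 0.66 yields two tables with buckets holding 0, 6, and 10, which matches the walk-through above; 0.5 yields one table, and 0.75 yields three.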
The last two steps arrange the tables described by buckets into a tree
and traverse this tree generating the selection code for each table. swcode
uses a divide-and-conquer algorithm to do both steps at the same time.
swgen calls swcode with the switch handle, buckets, and the lower and upper
bounds of buckets, 0 and n - 1, which together give the number of tables.
buckets also has a sentinel after its last element, which simplifies
accessing the last case value in the last table.
swcode generates code for the ub-lb+1 tables given by b[lb..ub]. It
picks the middle table as the root of the search tree, generates code for
it, and calls itself recursively for the tables on either side of the root
table.

(stmt.c functions)+=
static void swcode(swp, b, lb, ub)
Swtch swp; int b[]; int lb, ub; {
    int hilab, lolab, l, u, k = (lb + ub)/2;
    int *v = swp->values;

    (swcode 241)
}

When there's only one table, switch expressions whose value is not within
the range covered by the table cause control to be transferred to the
default label. For a binary search of tables, control needs to flow to the
appropriate subtable when the switch expression is out of range.

(swcode 241)=
    if (k > lb && k < ub) {
        lolab = genlabel(1);
        hilab = genlabel(1);
    } else if (k > lb) {
        lolab = genlabel(1);
        hilab = swp->deflab->u.l.label;
    } else if (k < ub) {
        lolab = swp->deflab->u.l.label;
        hilab = genlabel(1);
    } else
        lolab = hilab = swp->deflab->u.l.label;

lolab and hilab are where control should be transferred to if the switch
expression is less than the root's smallest value or greater than the root's
largest value. If the search tree has both left and right subtables, lolab
and hilab will label their code sequences. The default label is used for
hilab when there's no right subtable and for lolab when there's no left
subtable. If the root is the only table, the default label is used for both
lolab and hilab.
Finally, the code for the root table is generated:

(swcode 241)+=
    l = b[k];
    u = b[k+1] - 1;
    if (u - l + 1 <= 3)
        (generate a linear search)
    else {
        (generate an indirect jump and a branch table 242)
    }

and swcode is called recursively to generate the left and right subtables.

(swcode 241)+=
    if (k > lb) {
        definelab(lolab);
        swcode(swp, b, lb, k - 1);
    }
    if (k < ub) {
        definelab(hilab);
        swcode(swp, b, k + 1, ub);
    }
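
The search tree that swcode lays out can be modeled at run time. The sketch below is not generated code and not part of lcc: find_table is a hypothetical function that follows the same choices (middle table as root, lolab to the left subtree, hilab to the right, the default label when a subtree is missing) and reports which table, if any, covers a given switch value.

```c
/* Hypothetical run-time model of the binary search over tables.
   v holds the sorted case values; b[t] is the index in v of the first
   value of table t, and b[ub + 1] is the sentinel.
   Returns the index of the table whose range covers x, or -1 where the
   generated code would jump to the default label. */
static int find_table(const int *v, const int *b, int lb, int ub, int x) {
    while (lb <= ub) {
        int k = (lb + ub) / 2;   /* root of this subtree */
        int l = b[k];            /* index of the root's smallest value */
        int u = b[k + 1] - 1;    /* index of the root's largest value */

        if (x < v[l])
            ub = k - 1;          /* models "goto lolab" */
        else if (x > v[u])
            lb = k + 1;          /* models "goto hilab" */
        else
            return k;            /* the root table handles x */
    }
    return -1;                   /* no subtable left: default label */
}
```

A value that falls inside a table's range but is not a case value is still routed to that table; in the compiler, the table's own entries (or its linear search) then send it to the default label.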

A branch table takes two comparisons and an indirect jump, which is at
least three instructions. For most targets, this overhead makes a branch table
suitable only if there are more than three values in the table. Otherwise,
a short linear search is better; see Exercise 10.8.
The code generated for an indirect jump through a branch table has
the form:

    if t1 < v[l] goto lolab
    if t1 > v[u] goto hilab
    goto *table[t1-v[l]]

where v[l], v[u], lolab, and hilab are replaced by the corresponding
values computed by swcode. The branch table is a static array of pointers,
and the tree for the target of an indirect jump is the same one that's built
for indexing an array:
(generate an indirect jump and a branch table 242)= 243 241
    Symbol table = genident(STATIC,
        array(voidptype, u - l + 1, 0), LABELS);
    (*IR->defsymbol)(table);
    cmp(LT, swp->sym, v[l], lolab);
    cmp(GT, swp->sym, v[u], hilab);
    walk(tree(JUMP, voidtype,
        rvalue((*optree['+'])(ADD, pointer(idtree(table)),
            (*optree['-'])(SUB,
                cast(idtree(swp->sym), inttype),
                consttree(v[l], inttype)))), NULL), 0, 0);

cmp builds the tree for the comparison

    if p ⊕ n goto L

and converts it to a dag. p is an identifier, ⊕ is a relational operator, and
n is an integer constant:

(stmt.c functions)+=
static void cmp(op, p, n, lab) int op, n, lab; Symbol p; {
    listnodes(eqtree(op,
            cast(idtree(p), inttype),
            consttree(n, inttype)),
        lab, 0);
}

cmp is also used to generate a linear search; see Exercise 10.8.
The branch table is generated by defining the static variable denoted
by table and calling the interface function defaddress for each of the
labels in the table. But this process cannot be done until the generated
code is emitted, so the relevant data are saved on the code list in a Switch
entry:
(Switch 242)= 217
    struct {
        Symbol sym;
        Symbol table;
        Symbol deflab;
        int size;
        int *values;
        Symbol *labels;
    } swtch;

(generate an indirect jump and a branch table 242)+= 242 241
    code(Switch);
    codelist->u.swtch.table = table;
    codelist->u.swtch.sym = swp->sym;
    codelist->u.swtch.deflab = swp->deflab;
    codelist->u.swtch.size = u - l + 1;
    codelist->u.swtch.values = &v[l];
    codelist->u.swtch.labels = &swp->labels[l];
    if (v[u] - v[l] + 1 >= 10000)
        warning("switch generates a huge table\n");

The table is emitted by emitcode.
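
How the saved values and labels expand into a full table can be sketched independently of the back end. The code below is an assumption-laden illustration, not emitcode: fill_branch_table and select_label are hypothetical names, labels are plain integers, and the table is an ordinary array instead of a sequence of defaddress calls. It also assumes a single table, so out-of-range values go straight to the default label.

```c
/* Hypothetical sketch: expands n sorted case values and their labels
   into a dense table covering values[0]..values[n-1]. Slots for values
   with no case label receive deflab. table must have room for
   values[n-1] - values[0] + 1 entries. */
static void fill_branch_table(const int *values, const int *labels, int n,
                              int deflab, int *table) {
    int i, slot = 0;

    for (i = 0; i < n; i++) {
        /* Pad the holes between the previous case value and this one. */
        while (values[0] + slot < values[i])
            table[slot++] = deflab;
        table[slot++] = labels[i];
    }
}

/* Models the generated selection code for a single table: two range
   checks followed by an indexed lookup. */
static int select_label(const int *table, int lo, int hi, int deflab, int x) {
    if (x < lo || x > hi)
        return deflab;
    return table[x - lo];
}
```

For the values 21, 22, 23, and 27, the table has seven slots, three of which hold the default label; this is the space cost that the density threshold keeps in check.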

10.8 Return Statements


Return statements without an expression appear in void functions, and
returns with expressions appear in all other functions. An extraneous
expression is an error, and a missing expression draws only a warning:

(return statement 243)= 221
    {
        Type rty = freturn(cfunc->type);
        t = gettok();
        definept(NULL);
        if (t != ';')
            if (rty == voidtype) {
                error("extraneous return value\n");
                expr(0);
                retcode(NULL);
            } else
                retcode(expr(0));
        else {
            if (rty != voidtype
            && (rty != inttype || Aflag >= 1))
                warning("missing return value\n");
            retcode(NULL);
        }
        branch(cfunc->u.f.label);
    }

retcode type-checks its argument tree and calls walk to build the
corresponding RET dag, as detailed below. This dag is followed by a jump to
cfunc->u.f.label, which labels the end of the current function; cfunc
points to the symbol-table entry for the current function. (This jump
may be discarded by branch.) The back end must finish a function with
the epilogue, that is, the code that restores saved values, if necessary, and
transfers from the function to its caller.
The code above doesn't warn about missing return values for functions
that return ints unless lcc's -A option is specified, because it's common
to use int functions for void functions; i.e., to use
f(double x) { ... return; }
instead of the more appropriate
void f(double x) { ... return; }
For many programs, warnings about missing int return values would
drown out the more important warnings about the other types.
For void functions, retcode has nothing to do except perhaps plant
an event hook:
(stmt.c functions)+=
void retcode(p) Tree p; {
    Type ty;

    if (p == NULL) {
        if (events.returns)
            (plant event hook for return)
        return;
    }
    (retcode 244)
}

For types other than void, retcode builds and walks a RET tree. The RET
operator simply identifies the return value so that the back end can put
it in the appropriate place specified by the target's calling conventions,
such as a specific register.
When there's an expression, retcode type-checks it, converts it to the
return type of the function as if it were assigned to a variable of that
type, and wraps it in the appropriate RET tree:
(retcode 244)= 245 244
    p = pointer(p);
    ty = assign(freturn(cfunc->type), p);
    if (ty == NULL) {
        error("illegal return type; found '%t' expected '%t'\n",
            p->type, freturn(cfunc->type));
        return;
    }
    p = cast(p, ty);

Integers, unsigneds, floats, and doubles are returned as is. Characters
and shorts are converted to the promoted type of the return type just as
they are in argument lists. Since there's no RET+P, pointers are converted
to unsigneds and returned by RET+I. Calls to such functions are made
with CALL+I, and their values are converted back to pointers with CVU+P.

(retcode 244)+= 244 244
    if (retv)
        (return a structure 245)
    if (events.returns)
        (plant an event hook for return p)
    p = cast(p, promote(p->type));
    if (isptr(p->type)) {
        (warn if p denotes the address of a local)
        p = cast(p, unsignedtype);
    }
    walk(tree(RET + widen(p->type), p->type, p, NULL), 0, 0);

Returning the address of a local variable is a common programming error,
so lcc detects and warns about the easy cases; see Exercise 10.9.
There is no RET+B. Structures are returned by assigning them to a
variable. As described in Section 9.3, if wants_callb is one, this variable is
the second operand to CALL+B in the caller and the first local in the callee,
and the back end must arrange to pass its address according to
target-specific conventions. If wants_callb is zero, the front end passes the
address of this variable as a hidden first argument, and never presents
the back end with a CALL+B. In both cases, compound, which implements
compound-statement, arranges for retv to point to the symbol-table entry
for a pointer to this variable. Returning a structure is an assignment
to *retv:

(return a structure 245)= 245
    {
        if (iscallb(p))
            p = tree(RIGHT, p->type,
                tree(CALL+B, p->type,
                    p->kids[0]->kids[0], idtree(retv)),
                rvalue(idtree(retv)));
        else
            p = asgntree(ASGN, rvalue(idtree(retv)), p);
        walk(p, 0, 0);
        if (events.returns)
            (plant an event hook for a struct return)
        return;
    }

As for ASGN+B (see Section 9.5) and ARG+B (see Section 9.3), there's an
opportunity to reduce copying for

    return f();

f returns the same structure returned by the current function, so the
current function's retv can be used as the temporary for the call to f.
If the call to iscallb in the code above identifies this idiom, the CALL+B
tree is rebuilt using retv in place of the temporary.

10.9 Managing Labels and Jumps


Labels are defined by definelab, and jumps to labels are made by
branch. These functions also collaborate to remove dead jumps, i.e.,
those that follow an unconditional jump or a switch, to avoid jumps to
jumps, and to avoid jumps to immediately following labels. They do so
using a scheme similar to the one used in code to detect unreachable
code.
definelab appends a label definition to the code list and then checks
if the preceding executable entry is a jump to the new label:

(stmt.c functions)+=
void definelab(lab) int lab; {
    Code cp;
    Symbol p = findlabel(lab);

    walk(NULL, 0, 0);
    code(Label)->u.forest = newnode(LABELV, NULL, NULL, p);
    for (cp = codelist->prev; cp->kind <= Label; )
        cp = cp->prev;
    while ((cp points to a Jump to lab 247)) {
        p->ref--;
        (remove the entry at cp 247)
        while (cp->kind <= Label)
            cp = cp->prev;
    }
}

newnode builds a dag for LABELV with a syms[0] equal to p. The for loop
walks cp backward in the code list to the first entry that represents
executable code, and the while loops remove one or more jumps to lab.
cp is a jump to lab if *cp is a Jump entry, and its node computes the
address of lab:

(cp points to a Jump to lab 247)= 246
    cp->kind == Jump
    && cp->u.forest->kids[0]
    && cp->u.forest->kids[0]->op == ADDRGP
    && cp->u.forest->kids[0]->syms[0] == p

Dropping the Jump out of the code list removes the useless jump:
(remove the entry at cp 247) = 246 247
cp->prev->next = cp->next;
cp->next->prev = cp->prev;
cp = cp->prev;
When definelab removes a jump, it decrements the label's ref field.
It does so because building a jump dag increments the target label's ref
field:

(stmt.c functions)+=
Node jump(lab) int lab; {
    Symbol p = findlabel(lab);

    p->ref++;
    return newnode(JUMPV, newnode(ADDRGP, NULL, NULL, p),
        NULL, NULL);
}

217 codelist
jump is called by branch, which stores the JUMPV dag in a Jump entry and
appends it to the code list.
branch also eliminates jumps to jumps and dead jumps. It begins by
appending the jump to the code list using a Label placeholder. The jump
is not a label, but Label is used so that (check for unreachable code) in
code won't bark, which it would do if the last executable entry on the
code list were an unconditional jump.

(stmt.c functions)+=
static void branch(lab) int lab; {
    Code cp;
    Symbol p = findlabel(lab);

    walk(NULL, 0, 0);
    code(Label)->u.forest = jump(lab);
    for (cp = codelist->prev; cp->kind < Label; )
        cp = cp->prev;
    while ((cp points to a Label ≠ lab 248)) {
        equatelab(cp->u.forest->syms[0], p);
        (remove the entry at cp 247)
        while (cp->kind < Label)
            cp = cp->prev;
    }
    (eliminate or plant the jump 249)
}

branch's for loop backs up to the first executable or Label entry before
the placeholder. The while loop looks for definitions of labels L' that
form the pattern

    L':
        goto L

where goto L is the jump in the placeholder.

(cp points to a Label ≠ lab 248)= 247
    cp->kind == Label
    && cp->u.forest->op == LABELV
    && !equal(cp->u.forest->syms[0], p)

If L' ≠ L, L' is equivalent to L; jumps to L' can go to L instead, and the
Label entry for L' can be removed.

(stmt.c functions)+=
void equatelab(old, new) Symbol old, new; {
    old->u.l.equatedto = new;
    new->ref++;
}

makes new a synonym for old. During code generation, references to old
are replaced by the label at the end of the list formed by the equatedto
fields. These fields form a list because it's possible that new will be
equated to another symbol after old is equated to new. The ref field
counts the number of references to a label from jumps and from the
u.l.equatedto fields of other labels, so equatelab increments new->ref.
These synonyms complicate testing when two labels are equal. The
fragment (cp points to a Label ≠ lab) must fail when L' is equal to the
destination of the jump so code such as

    top:
        goto top;

is not erroneously eliminated, no matter how nonsensical it seems. Just
testing whether L' is equal to the destination, p, isn't enough; the two
labels are equivalent if L' is equal to p or to any label for which p is a
synonym. equal implements this more complicated test:

(stmt.c functions)+=
static int equal(lprime, dst) Symbol lprime, dst; {
    for ( ; dst; dst = dst->u.l.equatedto)
        if (lprime == dst)
            return 1;
    return 0;
}
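
The synonym chains are easy to model in isolation. The sketch below uses a hypothetical struct label in place of lcc's symbol-table entries; equate, resolve, and same_destination are illustrative names that mirror equatelab, the chain traversal described for code generation, and equal, respectively.

```c
#include <stddef.h>

/* Hypothetical miniature label symbol. */
struct label {
    int number;
    struct label *equatedto;  /* NULL, or the label that replaces this one */
    int ref;                  /* references from jumps and from synonyms */
};

/* Mirrors equatelab: references to old will be redirected to new_label. */
static void equate(struct label *old, struct label *new_label) {
    old->equatedto = new_label;
    new_label->ref++;
}

/* Follows the equatedto chain to the label that is finally emitted. */
static struct label *resolve(struct label *p) {
    while (p->equatedto)
        p = p->equatedto;
    return p;
}

/* Mirrors equal: true if lprime is dst or any label on dst's chain. */
static int same_destination(struct label *lprime, struct label *dst) {
    for ( ; dst; dst = dst->equatedto)
        if (lprime == dst)
            return 1;
    return 0;
}
```

The test is deliberately one-directional: it walks only the chain that starts at the jump's destination, which is all that is needed to keep a label from being equated to itself through its own synonyms.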

If cp ends on a Jump or Switch, the branch is unreachable, and the
placeholder can be deleted. Otherwise, the placeholder becomes a Jump:
(eliminate or plant the jump 249)= 248
    if (cp->kind == Jump || cp->kind == Switch) {
        p->ref--;
        codelist->prev->next = NULL;
        codelist = codelist->prev;
    } else {
        codelist->kind = Jump;
        if (cp->kind == Label
        && cp->u.forest->op == LABELV
        && equal(cp->u.forest->syms[0], p))
            warning("source code specifies an infinite loop");
    }

The warning exposes infinite loops like the one shown above.

Further Reading

Baskett (1978) describes the motivations for the layout of the generated
code for the loops.
lcc's execution points have been used for generating debugger symbol
tables and for profiling. Ramsey and Hanson (1992) describe how the
retargetable debugger ldb uses execution points to locate breakpoints
and to provide starting points for searching the debugger's symbol table.
Ramsey (1993) details the use of the stab interface functions to
generate symbol-table data, and describes how lcc itself can be used
to evaluate C expressions entered during debugging. Fraser and Hanson
(1991b) describe the implementation of lcc's machine-independent
profiling enabled by its -b option.
Many papers and compiler texts describe how to generate selection
code for switch statements. Hennessy and Mendelsohn (1982) and Bernstein
(1985) describe techniques similar to the one used in lcc. The
greedy algorithm groups the case values into dense tables in linear time,
but not into the minimum number of such tables. The one-page paper
by Kannan and Proebsting (1994) gives a simple quadratic algorithm for
doing so.
Exercises
10.1 Implement the do statement.
10.2 Implement the while statement.
10.3 Implement
(stmt.c prototypes)=
    static int foldcond ARGS((Tree e1, Tree e2));
which is called by forstmt. Hint: Build a tree that conditionally
substitutes e1 for the left operand of the test e2, when appropriate.
If the operands of this tree are constants, simplify will return a
CNST tree that determines whether the loop body will be executed
at least once.
10.4 There's a while loop in (case label), but there's no repetitive con-
struct in the grammar for case labels. Explain.
10.5 Prove that the execution time of the partitioning algorithm in swgen
is linear in n, the number of case values.
10.6 Here's another implementation of swgen's partitioning algorithm
(suggested by Arthur Watson).

    while (n > 0) {
        float d = den(n-1, k);
        if (d < density
        || k < swp->ncases - 1 && d < den(n, k+1))
            break;
        n--;
    }

The difference is that a table and its predecessor are not combined
if the table and v[k+l] would form a denser table. For example,
with density equal to 0.5, the greedy algorithm partitions the val-
ues 1, 6, 7, 8, 11, and 15 into the three tables (1, 6-8), (11), and
(15), and this lookahead variant gives the two tables (1) and (6-8,
11, 15). Analyze and explain this variant. Can you prove under
what conditions it will give fewer tables than the greedy algorithm?
10.7 Change swgen to use the optimal partitioning algorithm described
by Kannan and Proebsting (1994). With density equal to 0.5, the
optimal algorithm partitions the values 1, 6, 7, 8, 9, 10, 15, and 19
into the two tables (1) and (6-10, 15, 19); the greedy algorithm and
its lookahead variant described in the previous exercise generate
the three tables (1, 6-10), (15), and (19). Can you find real programs
on which the optimal algorithm gives fewer tables than the greedy
algorithm? Can you detect the differences in execution times?
10.8 Implement swcode's (generate a linear search). The generated code
has the form
if t1 = v[l] goto L_l
...
if t1 = v[u] goto L_u
if t1 < v[l] goto lolab
if t1 > v[u] goto hilab
Use cmp to do the comparisons, and avoid generating unnecessary
jumps to lolab and hilab.
10.9 Implementing (warn if p denotes the address of a local) involves ex-
amining p to see if it's the address of a local or a parameter. This
test catches some, but not all, of these kinds of programming er-
rors. Give an example of an error that this approach cannot detect.
Is there a way to catch all such errors at compile-time? At run-time?
10.10 swcode is passed ub-lb+1 tables in b[lb..ub], and picks the middle
table at b[(lb+ub)/2] as the root of the tree from which it
generates a binary search. Other choices are possible; it may, for
instance, choose the largest table, or profiling data could supply the
frequency of occurrence for each case value, which could pinpoint
the table that's most likely to cover the switch value. Alternatively,
we could assume a specific probability distribution for the case values.
Suppose all values in the range v[b[lb]..b[ub+1]-1] (even
those for which there are no case labels) are equally likely to occur.
For this distribution, the root table should be the one with
a case value closest to the middle value in this range. Implement
this strategy by computing swcode's k appropriately. Be careful; it's
possible that no table will cover the middle value, so pick the one
that's closest.
10.11 Some systems support dynamic linking and loading. When new
code is loaded, the dynamic linker must identify and update all re-
locatable addresses in it. This process takes time, so dynamically
linked code benefits from position-independent addresses, which
are relative to the value that the program counter will have during
the execution of the instruction that uses the address. For example,
if the instruction at location 200 jumps to location 300, conven-
tional relocatable code stores the address 300 in the instruction,
but position-independent code stores 300 - 200 or 100 instead. Extend
lcc's interface so that it can emit position-independent code
for switch statements. The interface defined in Chapter 5 can't
do so because it uses the same defaddress for switch statements
that it uses to initialize pointer data, which mustn't be position-
independent.
11
Declarations

Declarations specify the types of identifiers, define structure and union
types, and give the code for functions. Parsing declarations can be
viewed as converting the textual representation of types to the corresponding
internal representations described in Chapter 4 and generating
the code list described in Section 1.3.
Declarations are the most difficult part of C to parse. There are two
main sources of this difficulty. First, the syntax of declarations is de-
signed to illustrate the use of an identifier. For example, the declaration
int *x[10] declares x to be an array of 10 pointers to ints. The idea
is that the declaration illustrates the use of x; for example, the type of
*x[i] is int. Unfortunately, distributing the type information throughout
the declaration complicates parsing it.
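As a small illustration of how a declaration mirrors use, the following sketch in standard C declares x exactly as in the text and then uses it with the same shape. The function name sum_via_pointers and the stored values are made up for this example; they are not part of lcc.

```c
#include <assert.h>

/* x is declared as in the text: an array of 10 pointers to int. */
int sum_via_pointers(void) {
    int values[10];
    int *x[10];
    int i, sum = 0;

    for (i = 0; i < 10; i++) {
        values[i] = i;
        x[i] = &values[i];      /* x[i] has type int * */
    }
    for (i = 0; i < 10; i++)
        sum += *x[i];           /* *x[i] has type int, echoing the declaration */
    assert(sum == 45);          /* 0 + 1 + ... + 9 */
    return sum;
}
```

The expression *x[i] in the second loop has the same form as the declarator *x[10], which is the sense in which the declaration illustrates the use.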
The other difficulty comes from the restrictions on the declarations for
globals, locals, and parameters. For example, locals and globals can be
declared static, but parameters cannot. Likewise, both function declarations
and function definitions may appear at file scope, but only function
declarations may appear at a local scope. It is possible to write a syntax
specification that embodies these kinds of restrictions, but the result is a
set of repetitious productions that vary slightly in detail. An alternative,
illustrated by the declaration syntax given throughout this chapter, is to
specify the syntax of the most general case and use semantic checks during
parsing to enforce the appropriate restrictions depending on context.
Since the rules concerning redeclaration vary among the three kinds of
identifiers, such checks are necessary in any case.
The text and the code in this chapter reflect these difficulties: This
chapter is the longest one in this book, and some of its code is intricate
and complex because it must cope with many, sometimes subtle, details.
Some of the functions are mutually recursive or are used for several
purposes, so circularities in their explanations are unavoidable.
The first five sections describe how declarations are parsed and are
internalized in the front end's data structures described in previous chapters.
The last four sections cover function definitions, compound statements,
finalization, and lcc's main program. These sections are perhaps
the more important because they contribute most to understanding the
interaction between the front end and the back ends. Section 11.6, for
example, is where the front end calls the back ends' function interface
routine, and Section 11.9 reveals how the interface record for a specific
target is bound to the front end.


11.1 Translation Units


A C translation unit consists of one or more declarations or function
definitions:
translation-unit:
external-declaration { external-declaration}
external-declaration:
function-definition
declaration
program is the parsing function for translation-unit and one of the five
functions exported by decl.c, which processes all declarations. It actually
parses translation-unit as if it permitted empty input, and only
warns about that case:
(decl.c functions)=
void program() {
    int n;

    level = GLOBAL;
    for (n = 0; t != EOI; n++)
        if (kind[t] == CHAR || kind[t] == STATIC
        || t == ID || t == '*' || t == '(') {
            decl(dclglobal);
            (deallocate arenas 254)
        } else if (t == ';') {
            warning("empty declaration\n");
            t = gettok();
        } else {
            error("unrecognized declaration\n");
            t = gettok();
        }
    if (n == 0)
        warning("empty input file\n");
}

decl is the parsing function for both function-definition and declaration,
because a function-definition looks like a declaration followed by
a compound-statement. decl's argument is dclglobal, dcllocal, or
dclparam. After decl and its collaborators have digested a complete
declaration for an identifier, they call a dclX function to validate the
identifier and install it in the appropriate symbol table. These dclX functions
enforce the semantic differences between globals, locals, and parameters
mentioned above.
The two else arms in the loop body above handle two error conditions.
The standard insists that a declaration declare at least an identifier, a
structure or enumeration tag, or one or more enumeration members. The
first else warns about declarations that don't, and the second diagnoses
declarations with syntax errors.
Declarations can allocate space in any arena. Function definitions al-
locate space in the PERM and FUNC arenas, and variable declarations use
space in the STMT arena for the trees that represent initializers. Thus,
both the FUNC and STMT arenas are deallocated at the ends of declara-
tions and definitions:
(deallocate arenas 254)=
deallocate(STMT);
deallocate(FUNC);

11.2 Declarations
The syntax for declarations is
declaration:
    declaration-specifiers init-declarator { , init-declarator } ;
    declaration-specifiers ;
init-declarator:
    declarator
    declarator = initializer
initializer:
    assignment-expression
    '{' initializer { , initializer } [ , ] '}'
declaration-specifiers:
    storage-class-specifier [ declaration-specifiers ]
    type-specifier [ declaration-specifiers ]
    type-qualifier [ declaration-specifiers ]
storage-class-specifier:
    typedef | extern | static | auto | register
type-specifier:
    void
    char | float | short | signed
    int | double | long | unsigned
    struct-or-union-specifier
    enum-specifier
    identifier
type-qualifier: const | volatile

A declaration specifies the type of an identifier and its other attributes,
such as its storage class. A definition declares an identifier and causes
storage for it to be reserved. Declarations with initializers are definitions;
those without initializers are tentative definitions, which are covered in
Section 11.8.
A declaration begins with one or more specifiers in any order. For
example, all the declarations
short const x;
const short x;
const short int x;
int const short x;
declare x to be a short integer that cannot be modified. storage-class-specifiers,
type-specifiers, and type-qualifiers can appear in any order,
but only one of each kind of specifier can appear. This flexibility
complicates specifier, the parsing function for declaration-specifiers.
(decl.c functions)+=
static Type specifier(sclass) int *sclass; {
    int cls, cons, sign, size, type, vol;
    Type ty = NULL;

    cls = vol = cons = sign = size = type = 0;
    if (sclass == NULL)
        cls = AUTO;
    for (;;) {
        int *p, tt = t;
        switch (t) {
        (set p and ty 256)
        default: p = NULL;
        }
        if (p == NULL)
            break;
        (check for invalid use of the specifier 256)
        *p = tt;
    }
    if (sclass)
        *sclass = cls;
    (compute ty 257)
    return ty;
}

If specifier's argument, sclass, is nonnull, it points to the variable
to which the token code for the storage class should be assigned. The
locals cls, vol, cons, sign, size, and type record the appearance of the
similarly named specifiers by being assigned the token code for their
specifier:

(set p and ty 256)=
case AUTO:
case REGISTER: if (level <= GLOBAL && cls == 0)
                   error("invalid use of '%k'\n", t);
               p = &cls;  t = gettok();      break;
case STATIC: case EXTERN:
case TYPEDEF:  p = &cls;  t = gettok();      break;
case CONST:    p = &cons; t = gettok();      break;
case VOLATILE: p = &vol;  t = gettok();      break;
case SIGNED:
case UNSIGNED: p = &sign; t = gettok();      break;
case LONG:
case SHORT:    p = &size; t = gettok();      break;
case VOID: case CHAR: case INT: case FLOAT:
case DOUBLE:   p = &type; ty = tsym->type;
               t = gettok();                 break;
case ENUM:     p = &type; ty = enumdcl();    break;
case STRUCT:
case UNION:    p = &type; ty = structdcl(t); break;
These variables are initialized to zero and change only when their
corresponding specifier is encountered. Thus, a nonzero value for any of
these variables indicates that its specifier has already appeared, which
helps detect errors:
(check for invalid use of the specifier 256)=
if (*p)
    error("invalid use of '%k'\n", tt);
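The "nonzero means already seen" technique can be tried in isolation. The sketch below is not lcc's code: the token codes (T_STATIC and so on) and the function count_invalid_specifiers are invented for this example, and it counts errors instead of reporting them. It keeps one variable per specifier kind and flags any specifier whose variable is already nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch; lcc's real codes differ.
   All are nonzero so that zero can mean "not seen yet". */
enum { T_STATIC = 1, T_EXTERN, T_CONST, T_VOLATILE,
       T_SIGNED, T_UNSIGNED, T_SHORT, T_LONG, T_INT, T_CHAR };

/* Returns the number of specifiers that repeat a kind already seen. */
int count_invalid_specifiers(const int *toks, int n) {
    int cls = 0, cons = 0, vol = 0, sign = 0, size = 0, type = 0;
    int i, errors = 0;

    assert(toks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int *p = NULL;
        switch (toks[i]) {
        case T_STATIC: case T_EXTERN:   p = &cls;  break;
        case T_CONST:                   p = &cons; break;
        case T_VOLATILE:                p = &vol;  break;
        case T_SIGNED: case T_UNSIGNED: p = &sign; break;
        case T_SHORT: case T_LONG:      p = &size; break;
        case T_INT: case T_CHAR:        p = &type; break;
        }
        if (p == NULL)
            break;          /* not a specifier: stop scanning */
        if (*p)
            errors++;       /* this kind of specifier has already appeared */
        *p = toks[i];       /* record the token code for this kind */
    }
    return errors;
}
```

With this sketch, the sequence const short int yields no errors, while short long int char yields two, one for the second size specifier and one for the second type specifier.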
Once all the declaration-specifiers have been consumed, the values of
sign, size, and type encode the specified type. enumdcl and structdcl
parse enum-specifier and struct-or-union-specifier.
If sclass is null, then a storage class must not appear, so cls is initialized
as if one did occur, which catches errors. This flexibility is needed
because specifier is called when parsing abstract-declarators, which do
not have storage classes; see Section 11.3 and Exercise 11.3.
The body of the switch statement shown above points p to the appropriate
local variable, and sets ty to a Type if the token is a type-specifier.
A typedef name, which arrives as an ID token, can appear with only a
storage class or a qualifier:
(set p and ty 256)+=
case ID:
    if (istypename(t, tsym) && type == 0) {
        use(tsym, src);
        ty = tsym->type;
        p = &type;
        t = gettok();
    } else
        p = NULL;
    break;
All that remains after parsing declaration-specifiers is to determine
the appropriate Type, which is encoded in the values of sign, size, and
type. This Type is specifier's return value. The default
(compute ty 257)=
if (type == 0) {
    type = INT;
    ty = inttype;
}
is what makes short const x declare x a short integer. The remaining
cases inspect sign, size, and type to determine the appropriate type:
(compute ty 257)+=
if (size == SHORT && type != INT
|| size == LONG && type != INT && type != DOUBLE
|| sign && type != INT && type != CHAR)
    error("invalid type specification\n");
if (type == CHAR && sign)
    ty = sign == UNSIGNED ? unsignedchar : signedchar;
else if (size == SHORT)
    ty = sign == UNSIGNED ? unsignedshort : shorttype;
else if (size == LONG && type == DOUBLE)
    ty = longdouble;
else if (size == LONG)
    ty = sign == UNSIGNED ? unsignedlong : longtype;
else if (sign == UNSIGNED && type == INT)
    ty = unsignedtype;
57 signedchar
The explicit inclusion of sign in the test for CHAR is needed to distinguish
signed and unsigned chars from plain chars, which are a distinct type.
The resulting Type, computed by the code above, enumdcl or structdcl,
can be qualified by const or volatile qualifiers, or both:
(compute ty 257)+=
if (cons == CONST)
    ty = qual(CONST, ty);
if (vol == VOLATILE)
    ty = qual(VOLATILE, ty);
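The mapping from sign, size, and type to a basic type can be exercised on its own. The following is a minimal sketch under stated assumptions: the K_ constants and the function basic_type are invented for this example, type names are returned as strings instead of lcc's Type values, and an invalid combination returns NULL where lcc reports an error and continues. Qualifiers are left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical codes for this sketch; 0 means "specifier absent". */
enum { K_CHAR = 1, K_INT, K_DOUBLE, K_SHORT, K_LONG, K_SIGNED, K_UNSIGNED };

/* Returns the name of the basic type, or NULL for an invalid combination. */
const char *basic_type(int sign, int size, int type) {
    if (type == 0)
        type = K_INT;       /* the default: "short x" means "short int x" */
    if ((size == K_SHORT && type != K_INT)
    ||  (size == K_LONG  && type != K_INT && type != K_DOUBLE)
    ||  (sign && type != K_INT && type != K_CHAR))
        return NULL;
    if (type == K_CHAR && sign)     /* plain char is a distinct type */
        return sign == K_UNSIGNED ? "unsigned char" : "signed char";
    if (size == K_SHORT)
        return sign == K_UNSIGNED ? "unsigned short" : "short";
    if (size == K_LONG && type == K_DOUBLE)
        return "long double";
    if (size == K_LONG)
        return sign == K_UNSIGNED ? "unsigned long" : "long";
    if (sign == K_UNSIGNED && type == K_INT)
        return "unsigned int";
    return type == K_CHAR ? "char" : type == K_DOUBLE ? "double" : "int";
}
```

For instance, basic_type(0, K_SHORT, 0) gives "short" through the default, and basic_type(0, K_SHORT, K_DOUBLE) is rejected because short applies only to int.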
decl, the parsing function for declaration, starts by calling specifier:
(decl.c functions)+=
static void decl(dcl)
Symbol (*dcl) ARGS((int, char *, Type, Coordinate *)); {
    int sclass;
    Type ty, ty1;
    static char stop[] = { CHAR, STATIC, ID, 0 };

    ty = specifier(&sclass);
    if (t == ID || t == '*' || t == '(' || t == '[') {
        char *id;
        Coordinate pos;
        (id, ty1 ← the first declarator 258)
        for (;;) {
            (declare id with type ty1 260)
            if (t != ',')
                break;
            t = gettok();
            (id, ty1 ← the next declarator 258)
        }
    } else if (ty == NULL
    || !(ty is an enumeration or has a tag))
        error("empty declaration\n");
    test(';', stop);
}
dclr, described in the next section, parses a declarator. The easy case is
the one for the second and subsequent declarators:
(id, ty1 ← the next declarator 258)=
id = NULL;
pos = src;
ty1 = dclr(ty, &id, NULL, 0);
dclr accepts a base type (the result of specifier) and returns a Type,
an identifier, and possibly a parameter list. The base type, ty in the code
above, is dclr's first argument, and its next two arguments are the addresses
of the variables to assign the identifier and parameter list, if they
appear. It returns the complete Type. Passing a null pointer as dclr's
third argument specifies that parameter lists may not appear in this context.
As detailed in Section 11.3, a nonzero fourth argument causes dclr
to parse an abstract-declarator. pos saves the source coordinate of the
beginning of a declarator for use when the identifier is declared.
The first declarator is treated differently than the rest because decl
also recognizes function-definitions, which can be confused with only
the first declarator at file scope:
(id, ty1 ← the first declarator 258)=
id = NULL;
pos = src;
if (level == GLOBAL) {
    Symbol *params = NULL;
    ty1 = dclr(ty, &id, &params, 0);
    if ((function definition? 259)) {
        (define function id 259)
        return;
    } else if (params)
        exitparams(params);
} else
    ty1 = dclr(ty, &id, NULL, 0);
Since the first declarator might be a function definition, a nonnull location
for the parameter list is passed as dclr's third argument. If the
declarator includes a function and its parameter list, params is set to an
array of symbol-table entries. When there is a parameter list, but it's
not part of a function definition, exitparams is called to close the scope
opened by that list. This scope isn't closed when the end of the list is
reached because the parsing function for parameter lists can't differentiate
between a function declaration and a function definition. Section 11.4
elaborates.
A declaration is really a function-definition if the first declarator specifies
a function type and includes an identifier, and the next token begins
either a compound statement or a list of parameter declarations:
(function definition? 259)=
params && id && isfunc(ty1)
&& (t == '{' || istypename(t, tsym)
|| (kind[t] == STATIC && t != TYPEDEF))

decl calls funcdefn to handle function definitions:
(define function id 259)=
if (sclass == TYPEDEF) {
    error("invalid use of 'typedef'\n");
    sclass = EXTERN;
}
if (ty1->u.f.oldstyle)
    exitscope();
funcdefn(sclass, id, ty1, params, pos);
The call to exitscope closes the scope opened in parameters because
that scope will be reopened in funcdefn when the declarations for the
parameters are parsed.
The semantics part of decl amounts to declaring the identifier given in
the declarator. As described above, decl's argument is a dclX function
that does this semantic processing, except for typedefs.
(declare id with type ty1 260)=
if (Aflag >= 1 && !hasproto(ty1))
    warning("missing prototype\n");
if (id == NULL)
    error("missing identifier\n");
else if (sclass == TYPEDEF)
    (declare id a typedef for ty1 260)
else
    (void)(*dcl)(sclass, id, ty1, &pos);
Typedefs are the easy case. The semantic processing simply checks for
redeclaration errors, installs the identifier id into the identifiers table,
and fills in its type and storage class attributes.
(declare id a typedef for ty1 260)=
{
    Symbol p = lookup(id, identifiers);
    if (p && p->scope == level)
        error("redeclaration of '%s'\n", id);
    p = install(id, &identifiers, level,
        level < LOCAL ? PERM : FUNC);
    p->type = ty1;
    p->sclass = TYPEDEF;
    p->src = pos;
}
The three dclX functions are more complicated. Each copes with a
slightly different declaration semantics, and dclglobal and dcllocal
also parse initializers. dclglobal is the most complicated of the three
functions because it must cope with valid redeclarations. For example,
extern int x[];
int x[10];
validly declares x twice. The second declaration also changes x's type
from (ARRAY (INT)) to (ARRAY 40 4 (INT)).
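The effect of the second declaration can be observed from ordinary C. In this small sketch, the helper x_length is a name invented for the example; it relies only on the fact that sizeof may be applied to x once its type has been completed.

```c
#include <assert.h>
#include <stddef.h>

extern int x[];     /* first declaration: array of unknown size */
int x[10];          /* valid redeclaration that completes the type */

/* Number of elements in x; sizeof x is legal here only because the
   second declaration supplied the array size. */
size_t x_length(void) {
    return sizeof x / sizeof x[0];
}
```

The element count is 10 on any implementation; the byte size of 40 shown in the type above assumes 4-byte ints.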
(decl.c functions)+=
static Symbol dclglobal(sclass, id, ty, pos)
int sclass; char *id; Type ty; Coordinate *pos; {
    Symbol p, q;

    (dclglobal 261)
    return p;
}

decl accepts any set of specifiers and declarators that are syntactically
legal, so the dclX functions must check for the specifiers that are illegal
in their specific semantic contexts, and must also check for redeclarations.
dclglobal, for example, insists that the storage class be extern,
static, or omitted:
(dclglobal 261)=
if (sclass == 0)
    sclass = AUTO;
else if (sclass != EXTERN && sclass != STATIC) {
    error("invalid storage class '%k' for '%t %s'\n",
        sclass, ty, id);
    sclass = AUTO;
}

Globals that have no storage class or an illegal one are given storage class
AUTO so that all identifiers have nonzero storage classes, which simplifies
error checking elsewhere.
dclglobal next checks for redeclaration errors.
(dclglobal 261)+=
p = lookup(id, identifiers);
if (p && p->scope == GLOBAL) {
    if (p->sclass != TYPEDEF && eqtype(ty, p->type, 1))
        ty = compose(ty, p->type);
    else
        error("redeclaration of '%s' previously declared at %w\n",
            p->name, &p->src);
    if (!isfunc(ty) && p->defined && t == '=')
        error("redefinition of '%s' previously defined at %w\n",
            p->name, &p->src);
    (check for inconsistent linkage 262)
}
A redeclaration is legal if the types on both declarations are compatible,
which is determined by eqtype, and the resulting type is the composite
of the two types. Forming this composite is how the type of x,
illustrated above, changed from (ARRAY (INT)) to (ARRAY 40 4 (INT)).
Some redeclarations are legal, but redefinitions (indicated by a nonzero
defined flag and an approaching initializer) are never legal.
An identifier has one of three kinds of linkage. Identifiers with ex-
ternal linkage can be referenced from other separately compiled trans-
lation units. Those with internal linkage can be referenced only within
the translation unit in which they appear. Parameters and locals have no
linkage.
A global with no storage class or declared extern in its first declaration
has external linkage, and those declared static have internal linkage. On
subsequent declarations, an omitted storage class or extern has a slightly
different interpretation. If the storage class is omitted, the identifier has
external linkage, but if the storage class is extern, the identifier has the same
linkage as a previous file-scope declaration for the identifier. Thus,
static int y;
extern int y;
is legal and y has internal linkage, but
extern int y;
static int y;
is illegal because the second declaration demands that y have internal
linkage when it already has external linkage. Multiple declarations that
all have external or internal linkage are permitted.
The table below summarizes these rules in terms of p->sclass, the
storage class of an existing declaration, and sclass, the storage class
for the declaration in hand. AUTO denotes no storage class.
                        sclass
                EXTERN  STATIC  AUTO
p->sclass
    EXTERN        ✓       ×      ✓
    STATIC        ✓       ✓      ×
    AUTO          ✓       ×      ✓
✓ marks the legal combinations, and × marks the combinations that are
linkage errors. The code used in dclglobal above is derived from this
table:
(check for inconsistent linkage 262)=
if (p->sclass == EXTERN && sclass == STATIC
||  p->sclass == STATIC && sclass == AUTO
||  p->sclass == AUTO   && sclass == STATIC)
    warning("inconsistent linkage for '%s' previously declared at %w\n",
        p->name, &p->src);
This if statement prints its warning for the second of the two examples
shown above.
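The linkage rules can be checked mechanically by writing the table as a predicate. The sketch below is an illustration only: the enumeration values and the name linkage_conflict are assumptions made for this example and are unrelated to lcc's token codes.

```c
#include <assert.h>

/* Illustrative storage-class codes; AUTO stands for "no storage class". */
enum { AUTO = 1, EXTERN, STATIC };

/* Nonzero if a redeclaration with storage class cur is inconsistent with
   an existing file-scope declaration whose storage class is prev. */
int linkage_conflict(int prev, int cur) {
    return (prev == EXTERN && cur == STATIC)
        || (prev == STATIC && cur == AUTO)
        || (prev == AUTO   && cur == STATIC);
}
```

Here linkage_conflict(STATIC, EXTERN) is zero, matching the first example (static int y; extern int y;), and linkage_conflict(EXTERN, STATIC) is nonzero, matching the second.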
Next, the global is installed in the globals table, if necessary, and its
attributes are initialized or overwritten.
(dclglobal 261)+=
if (p == NULL || p->scope != GLOBAL) {
    p = install(id, &globals, GLOBAL, PERM);
    p->sclass = sclass;
    if (p->sclass != STATIC) {
        static int nglobals;
        nglobals++;
        if (Aflag >= 2 && nglobals == 512)
            warning("more than 511 external identifiers\n");
    }
    (*IR->defsymbol)(p);
} else if (p->sclass == EXTERN)
    p->sclass = sclass;
p->type = ty;
p->src = *pos;
New globals are passed to the back end's defsymbol interface function
to initialize their x fields. If an existing global has storage class extern,
and this declaration has no storage class or specifies static, the global's
sclass is changed to either STATIC or AUTO to ensure that it's defined in
finalize. If this declaration specifies extern, the assignment to sclass is
made but has no effect. lcc's -A option enables warnings about non-ANSI
usage. For example, the standard doesn't require an implementation to
support more than 511 external identifiers in one compilation unit, so
lcc warns about too many externals when -A -A is specified.
The standard permits compilers to accept
f() { extern float g(); ... }
int g() { ... }
h() { extern double g(); ... }
without diagnosing that the first declaration for g conflicts with its definition
(which is also a declaration), or that the last declaration conflicts
with the first two. Technically, each declaration for g introduces a different
identifier with a scope limited to the compound statement in which
the declaration appears. But all three g's have external linkage and must
refer to the same function at execution time. lcc uses the externals table
to warn about these kinds of errors. dcllocal adds identifiers with
external linkage to externals, and both dcllocal and dclglobal check
for inconsistencies:
80 EXTERN
(dclglobal 261)+=
{
    Symbol q = lookup(p->name, externals);
    if (q && (p->sclass == STATIC
    || !eqtype(p->type, q->type, 1)))
        warning("declaration of '%s' does not match previous declaration at %w\n",
            p->name, &q->src);
}

dclglobal concludes by parsing an initializer, if there's one coming and
it's appropriate.
(dclglobal 261)+=
if (t == '=' && isfunc(p->type)) {
    error("illegal initialization for '%s'\n", p->name);
    t = gettok();
    initializer(p->type, 0);
} else if (t == '=')
    initglobal(p, 0);
else if (p->sclass == STATIC && !isfunc(p->type)
&& p->type->size == 0)
    error("undefined size for '%t %s'\n", p->type, p->name);
The last else if clause above tests for declarations of identifiers with
internal linkage and incomplete types, which are illegal; an example would
be:
static int x[];
initglobal parses an initializer if one is approaching or if its second
argument is nonzero, and defines the global given by its first argument.
initglobal announces the global in the proper segment, parses its
initializer, adjusts its type, if appropriate, and marks the global as defined.
(decl.c functions)+=
static void initglobal(p, flag) Symbol p; int flag; {
    Type ty;

    if (t == '=' || flag) {
        if (p->sclass == STATIC) {
            for (ty = p->type; isarray(ty); ty = ty->type)
                ;
            defglobal(p, isconst(ty) ? LIT : DATA);
        } else
            defglobal(p, DATA);
        if (t == '=')
            t = gettok();
        ty = initializer(p->type, 0);
        if (isarray(p->type) && p->type->size == 0)
            p->type = ty;
        if (p->sclass == EXTERN)
            p->sclass = AUTO;
        p->defined = 1;
    }
}
initializer is the parsing function for initializer, and is omitted from
this book. If p's type is an array of unknown size, the initialization
specifies the size and thus completes the type. An initialization is always
a definition, in which case an extern storage class is equivalent to no
storage class, so sclass is changed, if necessary. This change prevents
doextern from calling the back end's import for p at the end of compilation.
defglobal announces the definition of its argument by calling the
appropriate interface functions.
(decl.c functions)+=
void defglobal(p, seg) Symbol p; int seg; {
    p->u.seg = seg;
    swtoseg(p->u.seg);
    if (p->sclass != STATIC)
        (*IR->export)(p);
    (*IR->global)(p);
}

(globals 265)=
int seg;
Identifiers with external linkage are announced by calling the export in-
terface function, and global proclaims the actual definition. swtoseg(n)
switches to segment n (one of BSS, LIT, CODE, or DATA) by calling the
segment interface function, but it avoids the calls when the current seg-
ment is n. defglobal records the segment in the global's u.seg field.
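The idea of skipping redundant segment switches can be sketched with a cached current segment. This is an assumed implementation for illustration, not lcc's source: the segment codes, the stand-in segment routine, the counter, and segment_call_count are all invented here so that the behavior can be observed.

```c
#include <assert.h>

/* Illustrative segment codes; zero is reserved for "none selected yet". */
enum { CODE = 1, BSS, DATA, LIT };

static int curseg;          /* segment most recently switched to */
static int segment_calls;   /* how often the stand-in routine ran */

/* Stand-in for the back end's segment interface function. */
static void segment(int seg) {
    (void)seg;
    segment_calls++;
}

/* Switch to segment seg, avoiding the call when seg is already current. */
void swtoseg(int seg) {
    assert(seg >= CODE && seg <= LIT);
    if (curseg != seg)
        segment(seg);
    curseg = seg;
}

/* Exposes the counter so the saving can be checked. */
int segment_call_count(void) {
    return segment_calls;
}
```

Calling swtoseg(DATA) twice and then swtoseg(LIT) reaches the stand-in routine only twice, which is the saving the text describes.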

11.3 Declarators
Treating (parse the first declarator) as a special case in decl is one of the
messy spots in recognizing declarations. Parsing a declarator, which is
defined below, is worse. The difficulty is that the base type occurs before
its modifiers. For example, int *x specifies the type (POINTER (INT)),
but building the type left-to-right as the declarator is parsed leads to the
meaningless type (INT (POINTER)). The precedence of the operators []
and () causes similar difficulties, as illustrated by
int *x[10], *f();
The types of x and f are
(ARRAY 10 (POINTER (INT)))
(FUNCTION (POINTER (INT)))
The * appears in the same place in the token stream but in different
places in the type representation.
As these examples suggest, it's easier to build a temporary inverted
type during parsing, which is what dclr does, and then traverse the inverted
type building the appropriate Type structure afterward. dclr's
first argument is the base type, which is the type returned by specifier.
(decl.c functions)+=
static Type dclr(basety, id, params, abstract)
Type basety; char **id; Symbol **params; int abstract; {
    Type ty = dclr1(id, params, abstract);

    for ( ; ty; ty = ty->type)
        switch (ty->op) {
        case POINTER:
            basety = ptr(basety);
            break;
        case FUNCTION:
            basety = func(basety, ty->u.f.proto,
                ty->u.f.oldstyle);
            break;
        case ARRAY:
            basety = array(basety, ty->size, 0);
            break;
        case CONST: case VOLATILE:
            basety = qual(ty->op, basety);
            break;
        }
    if (Aflag >= 2 && basety->size > 32767)
        warning("more than 32767 bytes in '%t'\n", basety);
    return basety;
}
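The traversal of an inverted type can be tried outside the compiler. The sketch below is a simplified stand-in and not lcc's code: struct inv and build_type are names invented for this example, the base type is fixed at (INT), qualifiers are omitted, and the result is a printed type expression in place of a Type structure. Each element of the inverted list wraps the type built so far, just as the loop in dclr wraps basety.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { POINTER = 1, ARRAY, FUNCTION };

struct inv {                /* one element of an inverted type */
    int op;                 /* POINTER, ARRAY, or FUNCTION */
    int size;               /* element count, used only for ARRAY */
    struct inv *type;       /* next element, or NULL at the end */
};

/* Starting from the base type "(INT)", wraps it once per inverted
   element and leaves the resulting type expression in buf. */
void build_type(const struct inv *p, char *buf, size_t size) {
    char tmp[256];

    snprintf(buf, size, "(INT)");
    for (; p != NULL; p = p->type) {
        switch (p->op) {
        case POINTER:
            snprintf(tmp, sizeof tmp, "(POINTER %s)", buf);
            break;
        case FUNCTION:
            snprintf(tmp, sizeof tmp, "(FUNCTION %s)", buf);
            break;
        case ARRAY:
            snprintf(tmp, sizeof tmp, "(ARRAY %d %s)", p->size, buf);
            break;
        default:
            continue;       /* unknown element: leave buf unchanged */
        }
        snprintf(buf, size, "%s", tmp);
    }
}
```

For int *x[10], the inverted list is [POINTER [ARRAY 10]], and the traversal yields (ARRAY 10 (POINTER (INT))); for int *f(), the list [POINTER [FUNCTION]] yields (FUNCTION (POINTER (INT))).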

dclr1 parses a declarator and returns its inverted type, from which dclr
builds and returns a normal Type. The id and params arguments are set to
the identifier and parameter list in a declarator. Exercise 11.3 describes
the abstract argument. dclr1 uses Type structures for the elements of
an inverted type, and calls tnode to allocate an element and initialize it:
(decl.c functions)+=
static Type tnode(op, type) int op; Type type; {
    Type ty;

    NEW0(ty, STMT);
    ty->op = op;
    ty->type = type;
    return ty;
}

dclr1 is the parsing function for declarator; the syntax is
declarator:
    pointer direct-declarator { suffix-declarator }
direct-declarator:
    identifier
    '(' declarator ')'
suffix-declarator:
    '[' [ constant-expression ] ']'
    '(' [ parameter-list ] ')'
pointer: { * { type-qualifier } }
Parsing declarators is similar to parsing expressions. The tokens *, (, and
[ are operators, and the identifiers and parameter lists are the operands.
Operators yield inverted type elements and operands set id or params.
(decl.c functions)+=
static Type dclr1(id, params, abstract)
char **id; Symbol **params; int abstract; {
    Type ty = NULL;

    switch (t) {
    case ID:                (ident 267) break;
    case '*': t = gettok(); (pointer 268) break;
    case '(': t = gettok(); (abstract function 270) break;
    case '[': break;
    default:  return ty;
    }
    while (t == '(' || t == '[')
        switch (t) {
        case '(': t = gettok(); { (concrete function 268) }
                  break;
        case '[': t = gettok(); { (array 268) } break;
        }
    return ty;
}

If id is nonnull it points to the location at which to store the identifier.
If it is null, it also indicates that the declarator must not include an
identifier.
(ident 267)=
if (id)
    *id = token;
else
    error("extraneous identifier '%s'\n", token);
t = gettok();
Pointers may be intermixed with any number of const and volatile qualifiers.
For example,
int *const *const volatile *p;
declares p to be a "pointer to a constant volatile pointer to a constant
pointer to an integer." p and ***p can be changed, but *p and **p cannot,
and *p may be changed by some external means because it's volatile.
dclr1 returns the inverted type
[POINTER [CONST [POINTER [VOLATILE [CONST [POINTER]]]]]]
where brackets denote inverted type elements. The type ultimately returned
by dclr is
(POINTER (CONST+VOLATILE (POINTER (CONST (POINTER (INT))))))
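The claims about which levels of p can be changed can be confirmed with a standard C compiler. In the sketch below, the function name qualified_pointer_demo and the helper variables value, inner, and middle are made up for this example; the declaration of p is the one from the text.

```c
#include <assert.h>

/* Exercises the declaration from the text. Assignments through p and
   ***p compile; the commented-out ones would be rejected because *p
   and **p are const-qualified pointers. */
int qualified_pointer_demo(void) {
    int value = 1;
    int *const inner = &value;
    int *const *const volatile middle = &inner;
    int *const *const volatile *p = &middle;

    ***p = 42;          /* legal: the int itself is not const */
    p = &middle;        /* legal: p itself is not const */
    /* *p = &inner;        illegal: *p is a const volatile pointer */
    /* **p = &value;       illegal: **p is a const pointer */
    assert(value == 42);
    return value;
}
```

Removing the comment markers from either of the two marked assignments should produce a diagnostic about assigning to a const-qualified object.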
The code for parsing pointer is
(pointer 268)=
if (t == CONST || t == VOLATILE) {
    Type ty1;
    ty1 = ty = tnode(t, NULL);
    while ((t = gettok()) == CONST || t == VOLATILE)
        ty1 = tnode(t, ty1);
    ty->type = dclr1(id, params, abstract);
    ty = ty1;
} else
    ty = dclr1(id, params, abstract);
ty = tnode(POINTER, ty);
The recursive calls to dcl rl make it unnecessary for the other fragments
in dcl rl to append their inverted types to a pointer type, if there is one.
Exercise 11.2 elaborates.
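The effect of the qualifiers in the example declaration can be checked with an ordinary C compiler. The following is a minimal sketch, not part of lcc; the function and variable names are invented for illustration. It builds the three levels of indirection and performs only the two assignments that the text says are permitted.

```c
/* Hypothetical demonstration of
 *     int *const *const volatile *p;
 * p and ***p are modifiable; *p and **p are not. */
int demo_qualified_pointers(void) {
    int value = 1;
    int *const inner = &value;                    /* const pointer to int */
    int *const *const volatile middle = &inner;   /* const volatile pointer */
    int *const *const volatile *p = &middle;      /* the declaration from the text */

    ***p = 42;       /* legal: the int at the end of the chain is unqualified */
    p = &middle;     /* legal: p itself is unqualified */
    /* *p = 0;          a compiler must reject this: *p is const */
    /* **p = 0;         a compiler must reject this: **p is const */
    return value;
}
```

Uncommenting either of the last two assignments should draw a diagnostic from a conforming compiler.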
Control emerges from dclr1's switch statement with ty equal to the
inverted type for a pointer or a function or null. The suffix type operators
[ and ( wrap ty in the appropriate inverted type element. The case for
arrays is
⟨array 268⟩=
    int n = 0;
    if (kind[t] == ID) {
        n = intexpr(']', 1);
        if (n <= 0) {
            error("'%d' is an illegal array size\n", n);
            n = 1;
        }
    } else
        expect(']');
    ty = tnode(ARRAY, ty);
    ty->size = n;
Parentheses either group declarators or specify a function type. Their
appearance in suffix-declarator always specifies a function type:
⟨concrete function 268⟩=
    Symbol *args;
    ty = tnode(FUNCTION, ty);
    ⟨open a scope in a parameter list 269⟩
    args = parameters(ty);
    if (params && *params == NULL)
        *params = args;
    else
        exitparams(args);

⟨open a scope in a parameter list 269⟩=
    enterscope();
    if (level > PARAM)
        enterscope();
A parameter list in a function type opens a new scope; hence the call
to enterscope in this case. The second call to enterscope handles an
implementation anomaly that occurs when a parameter list itself includes
another scope. For example, in the declaration
void f(struct T {int (*fp)(struct T {int m; }); } x) {
struct T { float a; } y;
}

the parameter list for f opens a new scope and introduces the structure
tag T. The structure's lone field, fp, is a pointer to a function, and the
parameter list for that function opens another new scope and defines a
different tag T. This declaration is legal. The declaration on the second
line is an error because it redefines the tag T: f's parameter x, its tag
T, f's local y, and y's tag T are all in the same scope.
lcc uses scope PARAM for identifiers declared at the top-level parameter
scope and LOCAL for identifiers like y; LOCAL is equal to PARAM+1. This
division is only a convenience; foreach can visit just the parameters, for
example. Redeclaration tests, however, must check for LOCAL identifiers
that erroneously redeclare PARAM identifiers.
The example above is the one case where redeclaration tests must not
make this check. The code above arranges for a nested parameter list
to have a scope of at least PARAM+2. Leaving this "hole" in the scope
numbers avoids erroneous redeclaration diagnostics. For example, the
tag T in fp's parameter has scope PARAM+2, and thus does not elicit a
redeclaration error because x's tag T has scope PARAM.
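The scope arithmetic can be traced with a small stand-alone sketch. It is not lcc's code: open_param_scope and close_param_scope are hypothetical names for the two scope fragments, and the numeric values of the scope constants are assumptions; only LOCAL == PARAM+1 matters here.

```c
/* Assumed scope numbering; only the relative order is significant. */
enum { GLOBAL = 3, PARAM, LOCAL };

int level = GLOBAL;                 /* current scope level */

void enterscope(void) { ++level; }
void exitscope(void)  { --level; }

/* Mirrors "open a scope in a parameter list": a nested list skips
 * PARAM+1 (LOCAL) and lands on PARAM+2 or deeper. */
int open_param_scope(void) {
    enterscope();
    if (level > PARAM)
        enterscope();
    return level;
}

/* Mirrors "close a scope in a parameter list". */
int close_param_scope(void) {
    if (level > PARAM)
        exitscope();
    exitscope();
    return level;
}
```

Starting at GLOBAL, the first call yields PARAM (the list for f) and a nested call yields PARAM+2 (the list for fp), which leaves the hole at LOCAL.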
At some point, the scope opened by the call or calls to enterscope
must be closed by a matching call to exitscope. The parameter list may
be part of a function definition or just part of a function declaration. If
the list might be in a function definition, params is nonnull and not
previously set, and dclr's caller must call exitscope when it's appropriate.
The call to exitparams in decl's ⟨id, ty1 ← the first declarator⟩ is an
example. exitparams checks for old-style parameter lists that are used
erroneously, and calls exitscope. If params is null or already holds a
parameter list, then exitscope can be called immediately because the
parameter list can't be part of a function definition.
abstract-declarators, described in Exercise 11.3, complicate the use of
parentheses for grouping.
⟨abstract function 270⟩=
    if (abstract
    && (t == REGISTER || istypename(t, tsym) || t == ')')) {
        Symbol *args;
        ty = tnode(FUNCTION, ty);
        ⟨open a scope in a parameter list 269⟩
        args = parameters(ty);
        exitparams(args);
    } else {
        ty = dclr1(id, params, abstract);
        expect(')');
        if (abstract && ty == NULL
        && (id == NULL || *id == NULL))
            return tnode(FUNCTION, NULL);
    }

If dclr is called to parse an abstract-declarator, which is indicated by
a nonzero fourth argument, a ( signals a parameter list if it's followed
by a new-style parameter list or by a nonempty declarator and a
matching ). Since abstract-declarators do not appear in function definitions,
exitparams can be called immediately after parsing the parameter list.
11.4 Function Declarators

The standard permits function declarations and definitions to include
old-style and new-style parameter lists. The syntax is

parameter-list:
    parameter { , parameter } [ , ... ]
    identifier { , identifier }
parameter:
    declaration-specifiers declarator
    declaration-specifiers [ abstract-declarator ]

An old-style list is just a list of identifiers. A new-style list is a list of
declarators, one for each parameter, or at least one parameter followed
by a comma and ellipsis (, ...), which specifies a function with a variable
number of parameters, or the single type specifier void, which specifies
a function with no parameters. These two styles and their interaction in
declarations and definitions are what contribute most to the complexity
of recognizing and analyzing them.
parameters parses both styles. It installs each of the parameters in
the identifiers table at the current scope level, which is established by
parameters' caller by calling enterscope, as illustrated in the previous
section. It returns a pointer to a null-terminated array of symbols, one
for each parameter. The first token of a parameter list identifies the
style:
⟨decl.c functions⟩+=
static Symbol *parameters(fty) Type fty; {
    List list = NULL;
    Symbol *params;

    if (kind[t] == STATIC || istypename(t, tsym)) {
        ⟨parse new-style parameter list 273⟩
    } else {
        ⟨parse old-style parameter list 271⟩
    }
    if (t != ')') {
        static char stop[] = { CHAR, STATIC, IF, ')', 0 };
        expect(')');
        skipto('{', stop);
    }
    if (t == ')')
        t = gettok();
    return params;
}

parameters also annotates the function type, fty, with parameter
information, as described below.
Old-style parameters are simply gathered up into a List, which is
converted to a null-terminated array after the parameters are recognized.
⟨parse old-style parameter list 271⟩=
    if (t == ID)
        for (;;) {
            Symbol p;
            if (t != ID) {
                error("expecting an identifier\n");
                break;
            }
            p = dclparam(0, token, inttype, &src);
            p->defined = 0;
            list = append(p, list);
            t = gettok();
            if (t != ',')
                break;
            t = gettok();
        }
    params = ltov(&list, FUNC);
    fty->u.f.proto = NULL;
    fty->u.f.oldstyle = 1;
The parameters are installed in identifiers by calling dclparam. Their
types are unknown, so they're installed with the type integer. If the
parameter list is part of a function definition (which it must be), these
symbols will be discarded and reinstalled when the declarations are
processed by funcdefn. They're installed here only to detect duplicate
parameters. Setting the defined bit to zero identifies old-style parameters.
The function type, fty, is edited to record that it's old-style.
At the end of a parameter list that is not part of a function definition,
new-style parameters can simply go out of scope after using them to
build a prototype, as shown below. But it's an error to use an old-style
parameter list in such a context. For example, in
    int (*f)(int a, float b);
    int (*g)(a, b);
the first line is a legal new-style declaration for the type
    (POINTER (FUNCTION (INT) {(INT) (FLOAT)}))
but the second line is an illegal old-style declaration of the type
    (POINTER (FUNCTION (INT)))
because it includes a parameter list in a context other than a function
definition. exitparams squawks about this error:
⟨decl.c functions⟩+=
static void exitparams(params) Symbol params[]; {
    if (params[0] && !params[0]->defined)
        error("extraneous old-style parameter list\n");
    ⟨close a scope in a parameter list 272⟩
}

⟨close a scope in a parameter list 272⟩=
    if (level > PARAM)
        exitscope();
    exitscope();
As mentioned in Exercise 2.15, the array returned by ltov always has at
least the null terminating element, so if params comes from parameters,
it will always be nonnull.
New-style parameter lists are more complicated because they have
several variants. A list may or may not contain identifiers depending on
whether or not it is part of a function definition. Either variant can end
in , ... , and a list consisting of just void is legal in both definitions
and declarations. Also, a new-style declaration provides a prototype for
the function type, which must be retained for checking calls, other
declarations of the same function, and the definition, if one appears. As
described in Section 4.5, a new-style function with no arguments has a
zero-length prototype; a function with a variable number of arguments
has a prototype with at least two elements, the last of which is the type
for void. The use of void to identify a variable number of arguments
is an encoding trick (of perhaps dubious value); it doesn't appear in the
source code and can't be confused with voids that do, because they never
appear in prototypes.
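The prototype encoding just described can be illustrated with a short sketch. It does not use lcc's data structures: struct type here is a hypothetical stand-in for lcc's Type, and proto_length and proto_is_variadic are invented helper names. The only properties taken from the text are that a prototype is a null-terminated array of types, that an empty array means no arguments, and that a trailing void marks a variable number of arguments.

```c
#include <stddef.h>

/* Hypothetical stand-in for lcc's Type; only identity matters here. */
typedef struct type { const char *name; } *Type;

struct type int_type     = { "int" };
struct type charptr_type = { "char *" };
struct type void_type    = { "void" };

/* Null-terminated prototypes for three sample functions. */
Type proto_none[]     = { NULL };                            /* f(void)        */
Type proto_fixed[]    = { &int_type, &charptr_type, NULL };  /* f(int, char *) */
Type proto_variadic[] = { &charptr_type, &void_type, NULL }; /* f(char *, ...) */

/* Number of entries before the terminating null. */
size_t proto_length(Type *proto) {
    size_t n = 0;
    while (proto[n] != NULL)
        n++;
    return n;
}

/* A variadic prototype has at least two entries and ends with void. */
int proto_is_variadic(Type *proto) {
    size_t n = proto_length(proto);
    return n >= 2 && proto[n - 1] == &void_type;
}
```

Because a source-level void parameter list produces the empty prototype, a void entry inside a prototype can only be the varargs marker.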
⟨parse new-style parameter list 273⟩=
    int n = 0;
    Type ty1 = NULL;
    for (;;) {
        Type ty;
        int sclass = 0;
        char *id = NULL;
        if (ty1 && t == ELLIPSIS) {
            ⟨terminate list for a varargs function 274⟩
            t = gettok();
            break;
        }
        if (!istypename(t, tsym) && t != REGISTER)
            error("missing parameter type\n");
        n++;
        ty = dclr(specifier(&sclass), &id, NULL, 1);
        ⟨declare a parameter and append it to list 273⟩
        if (ty1 == NULL)
            ty1 = ty;
        if (t != ',')
            break;
        t = gettok();
    }
    ⟨build the prototype 274⟩
    fty->u.f.oldstyle = 0;
ty1 is the Type of the first parameter, and it's used to detect invalid use
of void and, as shown above, of ellipses. Each parameter is a declarator,
so parsing one uses the machinery embodied in specifier and dclr,
but, as shown above, permits only the storage class register. If the type
void appears, it must appear alone and first:
⟨declare a parameter and append it to list 273⟩=
    if ( ty == voidtype && (ty1 || id)
    ||  ty1 == voidtype)
        error("illegal formal parameter types\n");
    if (id == NULL)
        id = stringd(n);
    if (ty != voidtype)
        list = append(dclparam(sclass, id, ty, &src), list);
Omitted identifiers are given integer names; dclparam will complain
about these missing identifiers if the declaration is part of a function
definition.
Variable length parameter lists cause the evolving list of parameters
to be terminated by a statically allocated symbol with a null name and
the type void.
⟨terminate list for a varargs function 274⟩=
    static struct symbol sentinel;
    if (sentinel.type == NULL) {
        sentinel.type = voidtype;
        sentinel.defined = 1;
    }
    if (ty1 == voidtype)
        error("illegal formal parameter types\n");
    list = append(&sentinel, list);
After the new-style parameter list has been parsed, list holds the
symbols in the order they appeared. These symbols form the params array
returned by parameters, and their types form the prototype for the
function type:
⟨build the prototype 274⟩=
    fty->u.f.proto = newarray(length(list) + 1,
        sizeof (Type *), PERM);
    params = ltov(&list, FUNC);
    for (n = 0; params[n]; n++)
        fty->u.f.proto[n] = params[n]->type;
    fty->u.f.proto[n] = NULL;
dclparam declares both old-style and new-style parameters. dclparam
is called twice for each parameter: The first call is from parameters
and the second is from funcdefn. If the parameter list is not part of
a definition, the call to exitscope (in exitparams) discards the entries
made by dclparam.
⟨decl.c functions⟩+=
static Symbol dclparam(sclass, id, ty, pos)
int sclass; char *id; Type ty; Coordinate *pos; {
    Symbol p;

    ⟨dclparam 275⟩
    return p;
}

Declaring parameters is simpler than and different from declaring
globals. First, the types (ARRAY T) and (FUNCTION T) decay to (POINTER T)
and (POINTER (FUNCTION T)):
⟨dclparam 275⟩=
    if (isfunc(ty))
        ty = ptr(ty);
    else if (isarray(ty))
        ty = atop(ty);
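The decay rule is visible from ordinary C code. The sketch below is an illustration, not part of lcc; the function names are invented, and it assumes a C11 compiler because it uses _Generic to ask for the type of each parameter.

```c
/* Returns 1 if the parameter declared as an array has pointer type. */
int array_param_is_pointer(int a[10]) {
    return _Generic(a, int *: 1, default: 0);
}

/* Returns 1 if the parameter declared as a function has
 * pointer-to-function type. */
int func_param_is_pointer(int f(void)) {
    return _Generic(f, int (*)(void): 1, default: 0);
}

/* Sample arguments for the two functions above. */
int sample_array[10];
int sample_func(void) { return 0; }
```

Both functions return 1 when called with sample_array and sample_func: inside the callee, the declared array and function types have already been adjusted to pointers.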
The only explicit storage class permitted is register, but lcc uses auto
internally to identify nonregister parameters.
⟨dclparam 275⟩+=
    if (sclass == 0)
        sclass = AUTO;
    else if (sclass != REGISTER) {
        error("invalid storage class '%k' for '%t%s\n",
            sclass, ty, ⟨id 275⟩);
        sclass = AUTO;
    } else if (isvolatile(ty) || isstruct(ty)) {
        warning("register declaration ignored for '%t%s\n",
            ty, ⟨id 275⟩);
        sclass = AUTO;
    }

⟨id 275⟩=
    stringf(id ? " %s'" : "' parameter", id)

Parameters may be declared only once, which makes checking for
redeclaration easy:
⟨dclparam 275⟩+=
    p = lookup(id, identifiers);
    if (p && p->scope == level)
        error("duplicate declaration for '%s' previously _
            declared at %w\n", id, &p->src);
    else
        p = install(id, &identifiers, level, FUNC);
dclparam concludes by initializing p's remaining fields and checking for
and consuming illegal initializations.
⟨dclparam 275⟩+=
    p->sclass = sclass;
    p->src = *pos;
    p->type = ty;
    p->defined = 1;
    if (t == '=') {
        error("illegal initialization for parameter '%s'\n", id);
        t = gettok();
        (void)expr1(0);
    }

Parameters are considered defined when they are declared because they
are announced to the back end by the interface procedure function, as
described in Section 11.6.

11.5 Structure Specifiers

Syntactically, structure, union, and enumeration specifiers are the same
as the types specified by the keywords int, float, etc. Semantically,
however, they define new types. A structure or union specifier defines an
aggregate type with named fields, and an enumeration specifier defines
a type and an associated set of named integral constants. Exercise 11.9
describes enumeration specifiers. The syntax for structure and union
specifiers is:

struct-or-union-specifier:
    struct-or-union [ identifier ] '{' fields { fields } '}'
    struct-or-union identifier
struct-or-union: struct | union
fields:
    { type-specifier | type-qualifier } field { , field } ;
field:
    declarator
    [ declarator ] : constant-expression

The identifier, which is the tag of the structure or union, is optional only
if the specifier includes a list of fields. A struct-or-union-specifier defines
a new type if it includes fields or if it appears alone in a declaration and
there is no definition of a structure, union, or enumeration type with
the same tag in the same scope. This last kind of definition caters to
mutually recursive structure declarations. For example, the intent of
    struct head { struct node *list; ... };
    struct node { struct head *hd; struct node *link; ... };
is for the list field in an instance of head to point to the nodes in a
linked list, and for each node to point to the head of the list. The list
is threaded through the link fields. But if node has already been
declared as a structure or union tag in an enclosing scope, the list field
is a pointer to that type, not to the node declared here. Subsequent
assignments of pointers to nodes to list fields will be diagnosed as errors.
Exchanging the two lines fixes the problem for list, but exposes head to
the same problem. The solution is to define the new type before defining
head:
    struct node;
    struct head { struct node *list; ... };
    struct node { struct head *hd; struct node *link; ... };
The lone struct node defines a new incomplete structure type with the
tag node in the scope in which it appears, and hides other tags named
node defined in enclosing scopes, if there are any. If there is a structure
tag node in the same scope as the struct node, the latter declaration has
no effect.
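The three-line idiom compiles as is once the elided fields are dropped. The following sketch, with invented helper names list_length and demo_two_nodes, declares the incomplete type first and then builds a two-node list to show that the list and link fields refer to the same node type.

```c
#include <stddef.h>

struct node;                              /* new incomplete type in this scope */
struct head { struct node *list; };
struct node { struct head *hd; struct node *link; };

/* Counts the nodes threaded through the link fields. */
size_t list_length(const struct head *h) {
    size_t n = 0;
    for (const struct node *p = h->list; p != NULL; p = p->link)
        n++;
    return n;
}

/* Builds head -> first -> second on the stack and measures it. */
size_t demo_two_nodes(void) {
    struct head h;
    struct node second = { &h, NULL };
    struct node first  = { &h, &second };
    h.list = &first;
    return list_length(&h);
}
```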
The parsing function for struct-or-union-specifier, structdcl, deals
with tags and their definition, and calls fields to parse fields and to
assign field offsets. Unions and structures are handled identically, except
for assigning field offsets.
⟨decl.c functions⟩+=
static Type structdcl(op) int op; {
    char *tag;
    Type ty;
    Symbol p;
    Coordinate pos;

    t = gettok();
    pos = src;
    ⟨structdcl 277⟩
    return ty;
}

structdcl begins by consuming the tag or using the empty string for
omitted tags:
⟨structdcl 277⟩=
    if (t == ID) {
        tag = token;
        t = gettok();
    } else
        tag = "";
If the tag is followed by a field list, this specifier defines a new tag:
⟨structdcl 277⟩+=
    if (t == '{') {
        static char stop[] = { IF, ',', 0 };
        ty = newstruct(op, tag);
        ty->u.sym->src = pos;
        ty->u.sym->defined = 1;
        t = gettok();
        if (istypename(t, tsym))
            fields(ty);
        else
            error("invalid %k field declarations\n", op);
        test('}', stop);
    }

newstruct checks for redeclaration of the tag and defines the new type.
If the tag is empty, newstruct calls genlabel to generate one. newstruct
is also used for enumeration specifiers; see Exercise 11.9.
If the struct-or-union-specifier doesn't have fields and the tag is
already in use for the type indicated by op, the specifier refers to that
type.
⟨structdcl 277⟩+=
    else if (*tag && (p = lookup(tag, types)) != NULL
    && p->type->op == op) {
        ty = p->type;
        if (t == ';' && p->scope < level)
            ty = newstruct(op, tag);
    }
This case also handles the exception described above: If the tag is defined
in an enclosing scope and the specifier appears alone in a declaration,
the specifier defines a new type. As described in Chapter 3, tags have
their own name space, which is managed in the types table.
If the cases above don't apply, there must be a tag, and the specifier
defines a new type:
⟨structdcl 277⟩+=
    else {
        if (*tag == 0)
            error("missing %k tag\n", op);
        ty = newstruct(op, tag);
    }
    if (*tag && xref)
        use(ty->u.sym, pos);
The last else clause handles the case when a specifier appears alone in
a declaration and the tag is already defined in an enclosing scope for a
different purpose. An example is:
    enum node { ... };
    f(void) {
        struct node;
        struct head { struct node *list; ... };
        struct node { struct head *hd; struct node *link; ... };

The else clause above handles the struct node on the third line.
Most of the complexity of processing structure and union specifiers is
in analyzing the fields and computing their offsets, particularly specifiers
involving bit fields. Fields must be laid out in the order they appear in
fields; their offsets depend on their types and the alignment constraints
of those types. Bit fields are allocated in addressable storage units and
when N bit fields fit in a storage unit, they must be laid out in the
order in which they are declared, but that order can be from least to most
significant bit or vice versa. It's conventional to use the order that
follows increasing addresses: least to most significant bit (right to left) on
little-endian targets, and most to least significant bit (left to right) on
big endians. A compiler is not obligated to split bit fields across storage
units, and it may choose any storage unit for bit fields. lcc uses
unsigned integers so that bit fields can be fetched and stored using integer
loads, stores, and masking operations.
Figure 11.1 shows a structure definition and its layout on a little-
endian MIPS. Unsigneds are 32 bits, and integers and unsigneds must
be aligned on 4-byte boundaries. Addresses increase from right to left
as suggested by the numbering of a's elements, and from top to bottom
as suggested by the offsets on the right side of the figure. The shading
depicts holes that result from alignment constraints, and the darker
shading is the hole specified by the 26-bit unnamed bit field. This
example helps explain the intricacies of fields, the parsing function for
fields.

    struct {
        char a[5];
        short s1, s2;
        unsigned code:3, used:1;
        unsigned :26;
        int amt:7, last;
        short id;
    } x;
FIGURE 11.1 Little-endian structure layout example.

fields parses the field list and builds a list of field structures
emanating from ty->u.sym->u.s.flist. The field structure is described
in Section 4.6. Its name, type, and offset fields give the field's name, its
Type, and its offset in bytes from the beginning of the structure,
respectively. For bit fields, bitsize gives the number of bits in the bit field,
and lsb gives the number of the bit field's least significant bit plus one,
where bits are numbered starting at zero with the least significant bit on
all targets. A bit field is identified by a nonzero lsb. The list of field
structures is threaded through the link fields. For the example shown
in Figure 11.1, this list holds the fields shown in the following table.

name   type           offset   bitsize   lsb
a      chartype            0
s1     shorttype           6
s2     shorttype           8
code   unsignedtype       12         3     1
used   unsignedtype       12         1     4
amt    inttype            16         7     1
last   inttype            20
id     shorttype          24
fields first builds the list of fields, then traverses this list computing
offsets and bit-field positions:
⟨decl.c functions⟩+=
static void fields(ty) Type ty; {
    { ⟨parse fields 280⟩ }
    { ⟨assign field offsets 282⟩ }
}

A list of fields is parsed by calling specifier to consume the field's
specifiers, then parsing each field:
⟨parse fields 280⟩=
    int n = 0;
    while (istypename(t, tsym)) {
        static char stop[] = { IF, CHAR, '}', 0 };
        Type ty1 = specifier(NULL);
        for (;;) {
            Field p;
            char *id = NULL;
            ⟨parse one field 281⟩
            n++;
            if (Aflag >= 2 && n == 128)
                warning("more than 127 fields in '%t'\n", ty);
            if (t != ',')
                break;
            t = gettok();
        }
        test(';', stop);
    }
n counts the number of fields, and is used only for the warning about
declaring more fields than the maximum specified by the standard, which
lcc's -A option enables.
Parsing a field is similar to parsing the declarator in a declaration, and
dclr does most of the work:
⟨parse one field 281⟩=
    p = newfield(id, ty, dclr(ty1, &id, NULL, 0));
newfield allocates a field structure, initializes its name and type fields
to the value of id and the Type returned by dclr, clears the other fields,
and appends it to ty->u.sym->u.s.flist. As it walks down the list to
its end, newfield also checks for duplicate field names.
An oncoming colon signifies a bit field, and fields must check the
field's type, parse its field width, and check that the width is legal:
⟨parse one field 281⟩+=
    if (t == ':') {
        if (unqual(p->type) != inttype
        &&  unqual(p->type) != unsignedtype) {
            error("'%t' is an illegal bit-field type\n",
                p->type);
            p->type = inttype;
        }
        t = gettok();
        p->bitsize = intexpr(0, 0);
        if (p->bitsize > 8*inttype->size || p->bitsize < 0) {
            error("'%d' is an illegal bit-field size\n",
                p->bitsize);
            p->bitsize = 8*inttype->size;
        } else if (p->bitsize == 0 && id) {
            warning("extraneous 0-width bit field '%t %s' _
                ignored\n", p->type, id);
            p->name = stringd(genlabel(1));
        }
        p->lsb = 1;
    }

As shown, a bit field must be a qualified or unqualified version of int or
unsigned. Compilers are permitted to treat plain int bit fields as either
signed or unsigned; lcc treats them as signed. An unnamed bit field
specifies padding; for now, it's appended to the list like other fields with
a unique integer name, but it's removed when offsets are assigned.
Similarly, the lsb field is set to one for now to identify the field as a bit field;
it's changed to the correct value when its offset is assigned.
newfield has done all of the work for normal fields except to check
for missing field names and illegal types:
⟨parse one field 281⟩+=
    else {
        if (id == NULL)
            error("field name missing\n");
        else if (isfunc(p->type))
            error("'%t' is an illegal field type\n", p->type);
        else if (p->type->size == 0)
            error("undefined size for field '%t %s'\n",
                p->type, id);
    }

If a field or bit field is declared const, assignments to that field are
forbidden. Structure assignments must also be forbidden. For example,
given the definition
    struct { int code; const int value; } x, y;
x.code and y.code can be changed, but x.value and y.value cannot.
Assignments like x = y are also illegal, and they're caught in asgntree
by inspecting the structure type's cfields flag, which is set here, along
with the vfields flag, which records volatile fields:
⟨parse one field 281⟩+=
    if (isconst(p->type))
        ty->u.sym->u.s.cfields = 1;
    if (isvolatile(p->type))
        ty->u.sym->u.s.vfields = 1;
At this point, the field list for Figure 11.1's example has nine elements:
the eight shown in the table on page 280 plus one between used and amt
that has a bitsize equal to 26. The lsb fields of the elements for code,
used, and amt are all equal to one, and all offset fields are zero.
Next, fields makes a pass over the field list computing offsets. It
also computes the alignment of the structure, and rebuilds the field list
omitting those field structures that represent padding, which are those
with integer names.
⟨assign field offsets 282⟩=
    int bits = 0, off = 0, overflow = 0;
    Field p, *q = &ty->u.sym->u.s.flist;
    ty->align = IR->structmetric.align;
    for (p = *q; p; p = p->link) {
        ⟨compute p->offset 283⟩
        if (p->name == NULL
        || !('1' <= *p->name && *p->name <= '9')) {
            *q = p;
            q = &p->link;
        }
    }
    *q = NULL;
off is the running total of the number of bytes taken by the fields up
to but not including the one pointed to by p. bits is the number of
bits plus one taken by bit fields beyond off by the sequence of bit fields
immediately preceding p. Thus, bits is nonzero if the previous field is
a bit field, and it never exceeds unsignedtype->size. fields must also
cope with offset computations that overflow. It uses the macro add to
increment off:
⟨decl.c macros⟩=
#define add(x,n) (x > INT_MAX-(n) ? (overflow=1,x) : x+(n))
#define chkoverflow(x,n) ((void)add(x,n))
chkoverflow uses add to set overflow if x + n overflows. If overflow is
one at the end of fields, the structure is too big.
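The idea behind add can be restated as a function. This is a sketch under the assumption that n is nonnegative, which holds for the sizes and alignments passed in this chapter; checked_add is an invented name, and the global overflow plays the role of the local flag in ⟨assign field offsets⟩.

```c
#include <limits.h>

int overflow = 0;   /* set to 1 when an addition would exceed INT_MAX */

/* Returns x + n when that fits in an int; otherwise records the
 * overflow and returns x unchanged. Assumes n >= 0. */
int checked_add(int x, int n) {
    if (x > INT_MAX - n) {
        overflow = 1;
        return x;
    }
    return x + n;
}
```

Testing x against INT_MAX - n before adding avoids evaluating x + n when it would overflow, which is undefined behavior for signed integers in C.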
If the fields appear in a union, all the offsets are zero by definition:
⟨compute p->offset 283⟩=
    int a = p->type->align ? p->type->align : 1;
    if (p->lsb)
        a = unsignedtype->align;
    if (ty->op == UNION)
        off = bits = 0;
The value of a is the field's alignment; it's used below to increase the
structure's alignment, ty->align, if necessary. It's also used to round
up off to the appropriate alignment boundary:
⟨compute p->offset 283⟩+=
    else if (p->bitsize == 0 || bits == 0
    || bits - 1 + p->bitsize > 8*unsignedtype->size) {
        off = add(off, bits2bytes(bits-1));
        bits = 0;
        chkoverflow(off, a - 1);
        off = roundup(off, a);
    }
    if (a > ty->align)
        ty->align = a;
    p->offset = off;
⟨decl.c macros⟩+=
#define bits2bytes(n) (((n) + 7)/8)
off must be rounded up if p isn't a bit field, isn't preceded by fields that
ended in the middle of an unsigned, or is a bit field that's too big to fit
in the unsigned partially consumed by previous bit fields. Before off is
rounded up, it must be incremented past the bits occupied by previous
bit fields in the unsigned at the current value of off. This space isn't
accounted for until a normal field is encountered or the end of the list
is reached. bits is this space in bits plus one, so it's converted to bytes
by computing the ceiling of bits-1 divided by eight. This computation
is correct even when bits is zero.
When the field s1 from Figure 11.1 is processed, off is 5, and bits
is 0. s1's alignment is 2, so the code above sets off to 6, which becomes
the offset for s1. When amt is processed, off is 12, which is the unsigned
that holds code and used, and bits is 3 + 1 + 26 + 1 = 31. amt needs 7
bits, which won't fit, so off is set to 12 + ((31 - 1) + 7)/8 = 16, which
is on a four-byte boundary as dictated by the alignment a, and bits is
reset to zero. Next, last is processed; off is 16 and bits is 7 + 1 = 8.
Since last isn't a bit field, off is set to 16 + ((8 - 1) + 7)/8 = 17, then
rounded up to 20.
Once the offset to p is computed and stored, off is incremented by
the size of p's type, except for bit fields. If p is a bit field, p->lsb is
computed and bits is incremented by the bit-field width:
⟨compute p->offset 283⟩+=
    if (p->lsb) {
        if (bits == 0)
            bits = 1;
        if (IR->little_endian)
            p->lsb = bits;
        else
            p->lsb = 8*unsignedtype->size - bits + 1
                - p->bitsize + 1;
        bits += p->bitsize;
    } else
        off = add(off, p->type->size);
    if (off + bits2bytes(bits-1) > ty->size)
        ty->size = off + bits2bytes(bits-1);
bits is the bit offset plus one in addressing order, but lsb is the number
of bits plus one to the right of the bit field regardless of addressing order.
On a little endian, bits and lsb are the same. But on a big endian with
32-bit unsigneds, for example, the number of bits to the right of an m-bit
field is 32 - (bits - 1) - m, where bits - 1 is the number of bits used
for previous bit fields. The last statement in the code above updates
ty->size. This code works for both structures and unions because off
is reset to zero for union fields. For unions, including the additional
space given by bits is crucial; if it's omitted, the size of
    union { int x; int a:31, b:4; };
would end up being 4 instead of 8 because the 4 bits for b, which are
recorded only in bits, wouldn't get counted.
When code is processed, off is 10 and bits is zero. The round-up
code shown above bumps off to 12, which becomes code's offset, because
bit fields must start on a boundary suitable for an unsigned. bits and
code's lsb are set to 1. If Figure 11.1's example is compiled on a 32-bit
big endian, code's lsb is 32 - 1 + 1 - 3 + 1 = 30; i.e., there are 29 bits
to the right of code on a big endian. used is reached with off still equal
to 12 and bits equal to 4. used fits in the unsigned at offset 12, so its
lsb becomes 4 and bits becomes 5. The padding between used and amt
causes bits to be incremented to 5 + 26 = 31. There isn't room in the
unsigned at offset 12 for amt, so off and bits get changed to 16 and
zero, as described above, and amt's lsb becomes one.
Structures can appear in arrays: A structure must end on an address
boundary that is a multiple of the alignment of the field with the strictest
alignment, so that incrementing a pointer to element n advances the
pointer to element n + 1. For example, if a structure contains a double
and doubles have an alignment of 8, then the structure must have an
alignment of 8. As shown above, fields keeps a structure's alignment,
ty->align, greater than or equal to the alignments of its fields, but it
must pad the structure to a multiple of this alignment, if necessary:
(assign field offsets 282)+=
chkoverflow(ty->size, ty->align - 1);
ty->size = roundup(ty->size, ty->align);
if (overflow) {
    error("size of '%t' exceeds %d bytes\n", ty, INT_MAX);
    ty->size = INT_MAX&(~(ty->align - 1));
}

For the example in Figure 11.1, the loop in (assign field offsets) ends
with ty->size equal to 26, the last value of off, which is not a multiple
of 4, the value of ty->align, so this concluding code bumps ty->size
to 28.
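The padding step can be sketched on its own. This is an illustration under assumptions, not lcc's roundup macro: round_up and clamp_size are hypothetical names, and both assume that alignments are powers of two, which is what makes the mask ~(align - 1) valid.

```c
#include <assert.h>
#include <limits.h>

/* Round size up to the next multiple of align (a power of two). */
static int round_up(int size, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    assert(size >= 0 && size <= INT_MAX - (align - 1));  /* no overflow */
    return (size + align - 1) & ~(align - 1);
}

/* Largest multiple of align that fits in an int: the value used to
 * clamp the size after an overflow has been reported. */
static int clamp_size(int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return INT_MAX & ~(align - 1);
}
```

With the values from the example, round_up(26, 4) yields 28.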

11.6 Function Definitions


A function definition is a declaration without its terminating semicolon
followed by a compound-statement. In a definition of an old-style func-
tion, an optional list of declarations intervenes.
function-definition:
declaration-specifiers declarator { declaration }
compound-statement

The parsing function is funcdefn, which is called from decl when it
realizes that a function definition is approaching.
(decl.c functions)+=
static void funcdefn(sclass, id, ty, params, pt) int sclass;
char *id; Type ty; Symbol params[]; Coordinate pt; {
    int i, n;
    Symbol *callee, *caller, p;
    Type rty = freturn(ty);

    (funcdefn 286)
}

funcdefn has much to do. It must parse the optional declarations for old-
style functions, reconcile new-style declarations with old-style definitions
and vice versa, and initialize the front end in preparation for parsing
compound-statement, which contributes to the code list for the function.
Once the compound-statement is consumed, funcdefn must finalize the
code list for traversal when the back end calls gencode and emitcode,
arrange the correct arguments to the interface procedure function, and
re-initialize the front end once code for the function has been generated.
funcdefn's sclass, id, and ty parameters give the storage class, func-
tion name, and function type gleaned from the declarator parsed by decl.
pt is the source coordinate of the beginning of that declarator. params
is the array of symbols built by parameters - one for each parameter,
and an extra unnamed one if the parameter list ended with an ellipsis.
funcdefn starts by removing this extra symbol because it's used only in
prototypes, and it checks for illegal return types:
(funcdefn 286)=
if (isstruct(rty) && rty->size == 0)
    error("illegal use of incomplete type '%t'\n", rty);
for (n = 0; params[n]; n++)
    ;
if (n > 0 && params[n-1]->name == NULL)
    params[--n] = NULL;
oldstyle 63 params helps funcdefn build two parallel arrays of pointers to symbol-
parameters 271
table entries. ca11 ee is an array of entries for the parameters as seen by
the function itself, and ca 11 er is an array of entries for the parameters
as seen by callers of the function. Usually, the corresponding entries in
these arrays are the same, but they can differ when argument promotions
force the type of a caller parameter to be different than the type of the
corresponding callee parameter, as shown in Section 1.3. The storage
classes of the caller and callee parameters can also be different when,
for example, a parameter is declared register by the callee but is passed
on the stack by the caller. The details of building callee and caller
depend on whether the definition is old-style or new-style:
(funcdefn 286)+=
if (ty->u.f.oldstyle) {
    (initialize old-style parameters 287)
} else {
    (initialize new-style parameters 287)
}
for (i = 0; (p = callee[i]) != NULL; i++)
    if (p->type->size == 0) {
        error("undefined size for parameter '%t %s'\n",
            p->type, p->name);
        caller[i]->type = p->type = inttype;
    }

New-style definitions are the easier of the two because parameters has
already done most of the work, so params can be used as callee. The
caller parameters are copies of the corresponding callee parameters,
except that their types are promoted and they have storage class AUTO to
indicate that they're passed in memory.
(initialize new-style parameters 287)=
callee = params;
caller = newarray(n + 1, sizeof *caller, FUNC);
for (i = 0; (p = callee[i]) != NULL && p->name; i++) {
    NEW(caller[i], FUNC);
    *caller[i] = *p;
    caller[i]->type = promote(p->type);
    caller[i]->sclass = AUTO;
    if ('1' <= *p->name && *p->name <= '9')
        error("missing name for parameter %d to function '%s'\n", i + 1, id);
}
caller[i] = NULL;
448 " (MIPS)
Recall that parameters uses the parameter number for a missing parameter
identifier, so funcdefn must check for such identifiers. Identifiers
can be omitted in declarations but not in function definitions.
For old-style definitions, parameters has simply collected the identifiers
in the parameter list and checked for duplicates. funcdefn must
parse their declarations and match the resulting identifiers with the ones
in params. It uses params for the caller, makes a copy for use as callee,
and calls decl to parse the declarations.
(initialize old-style parameters 287)=
caller = params;
callee = newarray(n + 1, sizeof *callee, FUNC);
memcpy(callee, caller, (n+1)*sizeof *callee);
enterscope();
while (kind[t] == STATIC || istypename(t, tsym))
    decl(dclparam);
Parsing the parameter declarations adds a symbol-table entry for each
identifier to the identifiers table. These declarations may omit integer
parameters and they might declare identifiers that are not in callee.
funcdefn checks for the second condition by visiting every symbol with
scope PARAM and changing callee to point to that symbol:
(initialize old-style parameters 287)+=
foreach(identifiers, PARAM, oldparam, callee);

(decl.c functions)+=
static void oldparam(p, cl) Symbol p; void *cl; {
    int i;
    Symbol *callee = cl;

    for (i = 0; callee[i]; i++)
        if (p->name == callee[i]->name) {
            callee[i] = p;
            return;
        }
    error("declared parameter '%s' is missing\n", p->name);
}

(initialize old-style parameters 287)+=
for (i = 0; (p = callee[i]) != NULL; i++) {
    if (!p->defined)
        callee[i] = dclparam(0, p->name, inttype, &p->src);
    *caller[i] = *p;
    caller[i]->sclass = AUTO;
    if (unqual(p->type) == floattype)
        caller[i]->type = doubletype;
    else
        caller[i]->type = promote(p->type);
}

Arguments in calls to old-style functions suffer the default argument
promotions, so the types of the caller symbols are modified accordingly.
For example, in
    f(c,x) char c; float x; { ... }
callee's two symbols have types (CHAR) and (FLOAT), but caller's sym-
bols have types (INT) and (DOUBLE). As shown in Section 1.3, these dif-
ferences cause assignments of the caller values to the callee values at
the entry to f.
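The mapping from a callee type to the type its callers pass can be sketched as a small function. This is a simplified illustration for the old-style case only; the type codes and the name caller_type are hypothetical, and lcc's real Type is a pointer to a richer structure.

```c
#include <assert.h>

/* Hypothetical type codes used only for this illustration. */
enum type { T_CHAR, T_SHORT, T_INT, T_FLOAT, T_DOUBLE, T_NTYPES };

/* Default argument promotion as seen by a caller of an old-style function:
 * integer types narrower than int widen to int, and float widens to double. */
static enum type caller_type(enum type callee)
{
    assert(callee >= T_CHAR && callee < T_NTYPES);
    switch (callee) {
    case T_CHAR:
    case T_SHORT: return T_INT;
    case T_FLOAT: return T_DOUBLE;
    default:      return callee;
    }
}
```

Applied to the parameters of f, the sketch maps the char parameter c to int and the float parameter x to double, matching the caller types given in the text.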
The standard permits mixing old-style definitions and new-style decla-
rations, as the code in this book illustrates, but the definitions must agree
with the declarations and vice versa. If a new-style declaration precedes
an old-style definition, the function is deemed to be a new-style function,
and the old-style definition must provide a parameter list whose types
are compatible with the declaration.
(initialize old-style parameters 287)+=
p = lookup(id, identifiers);
if (p && p->scope == GLOBAL && isfunc(p->type)
&& p->type->u.f.proto) {
    Type *proto = p->type->u.f.proto;
    for (i = 0; caller[i] && proto[i]; i++)
        if (eqtype(unqual(proto[i]),
            unqual(caller[i]->type), 1) == 0)
            break;
    if (proto[i] || caller[i])
        error("conflicting argument declarations for function '%s'\n", id);
}
The new-style declaration cannot end in , ... because there's no compatible
old-style definition. The code above checks that caller's types are
compatible with the corresponding types in the new-style declaration.
Thus, the only compatible declaration for f above is
    extern int f(int, double);
The declaration
    extern int f(char, float);
looks compatible because its types are the same as those for c and x
in the definition above, but for compatibility purposes, it's the promoted
types that matter.
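The shape of the comparison loop, which walks two terminated lists in step and then checks that both ended together, can be sketched as follows. The integer type codes and the name lists_agree are assumptions made for the illustration; lcc compares Type pointers with eqtype instead.

```c
#include <assert.h>

/* Hypothetical type codes; END terminates a list, as NULL does in lcc's arrays. */
enum { END = 0, T_CHAR, T_INT, T_FLOAT, T_DOUBLE };

/* Return 1 if the prototype and the promoted caller types agree:
 * same length and pairwise equal. */
static int lists_agree(const int *proto, const int *caller)
{
    int i;

    assert(proto != 0 && caller != 0);
    for (i = 0; caller[i] != END && proto[i] != END; i++)
        if (proto[i] != caller[i])
            break;
    /* A leftover entry on either side means a mismatch or unequal lengths. */
    return proto[i] == END && caller[i] == END;
}
```

The single test after the loop covers both failure modes, which is why the fragment above needs only one error call.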
If a new-style declaration follows an old-style definition, the function
remains an old-style function, but the declaration must be compatible,
as above. lcc implements this check by building a prototype for the
old-style function and changing the function's type to include this prototype.
The function type's oldstyle flag is 1, so this prototype is used only by
eqtype for these kinds of checks.
(initialize old-style parameters 287)+=
else {
    Type *proto = newarray(n + 1, sizeof *proto, PERM);
    if (Aflag >= 1)
        warning("missing prototype for '%s'\n", id);
    for (i = 0; i < n; i++)
        proto[i] = caller[i]->type;
    proto[i] = NULL;
    ty = func(rty, proto, 1);
}

When a subsequent new-style declaration appears, redeclaration code will
call eqtype and will use this prototype to check for compatibility.
Once the caller and callee are built, funcdefn can define the symbol
for the function itself, because other functions, such as statement and
retcode, need access to the current function. This symbol is posted in
a global variable:
(decl.c exported data)=
extern Symbol cfunc;
Additional information for a function is carried in the symbol's u.f field:
(function symbols 290)=
struct {
    Coordinate pt;
    int label;
    int ncalls;
    Symbol *callee;
} f;
pt is the source coordinate for the function's entry point, label is the
label for the exit point, ncalls is the number of calls made by the function,
and the field callee is a copy of funcdefn's local variable callee.
(funcdefn 286)+=
p = lookup(id, identifiers);
if (p && isfunc(p->type) && p->defined)
    error("redefinition of '%s' previously defined at %w\n",
        p->name, &p->src);
cfunc = dclglobal(sclass, id, ty, &pt);
cfunc->u.f.label = genlabel(1);
cfunc->u.f.callee = callee;
cfunc->u.f.pt = src;
cfunc->defined = 1;
if (xref)
    use(cfunc, cfunc->src);
At this point, funcdefn is finally ready to parse the function's body. It
initializes the symbol tables for internal labels and statement labels,
initializes refinc to one, sets the code list to the single Start entry, appends
an execution point for the function's entry point, and calls compound,
which is described in the next section.
(funcdefn 286)+=
labels = table(NULL, LABELS);
stmtlabs = table(NULL, LABELS);
refinc = 1.0;
regcount = 0;
codelist = &codehead;
codelist->next = NULL;
definept(NULL);
if (!IR->wants_callb && isstruct(rty))
    retv = genident(AUTO, ptr(rty), PARAM);
compound(0, NULL, 0);

(decl.c data)=
static int regcount;

(decl.c exported data)+=
extern Symbol retv;
regcount is the number of locals explicitly declared register. As detailed
in Section 10.8, if the interface flag wants_callb is zero, the front end
completely implements functions that return structures. To do so, it
creates a hidden parameter that points to the location at which to store
the return value and posts the symbol for this parameter in retv. It also
arranges to pass the values for this parameter in calls; see Section 9.3.
The code list grows as the compound-statement is parsed and analyzed.
When compound returns, funcdefn adds a tree for a return statement
to the code list, if necessary. The code is similar to adding a jump:
The return is needed only if control can flow into the end of the function.
(funcdefn 286)+=
{
    Code cp;
    for (cp = codelist; cp->kind < Label; cp = cp->prev)
        ;
    if (cp->kind != Jump) {
        if (rty != voidtype
        && (rty != inttype || Aflag >= 1))
            warning("missing return value\n");
        retcode(NULL);
    }
}
definelab(cfunc->u.f.label);
definept(NULL);
The call to definelab adds the exit-point label, and definept plants the
accompanying execution point. lcc warns about the possibility of an
implicit return for functions that return values other than integers, or
for all nonvoid functions if its -A option is specified. The final steps
in parsing the function are to close the scope opened by compound and
check for unreferenced parameters:
(funcdefn 286)+=
exitscope();
foreach(identifiers, level, checkref, NULL);
checkref is described in the next section.
The code list for the function is now complete (except for the changes
made in gencode), and funcdefn is almost ready to call the interface
procedure function. Before doing so, however, it may have to make
two transformations to the caller and callee, depending on the values
of the interface flags wants_callb and wants_argb. If wants_callb is
zero, the hidden argument, pointed to by retv, must be inserted at the
beginning of callee and a copy of it must be inserted at the beginning
of caller:
(funcdefn 286)+=
if (!IR->wants_callb && isstruct(rty)) {
    Symbol *a;
    a = newarray(n + 2, sizeof *a, FUNC);
    a[0] = retv;
    memcpy(&a[1], callee, (n+1)*sizeof *callee);
    callee = a;
    a = newarray(n + 2, sizeof *a, FUNC);
    NEW(a[0], FUNC);
    *a[0] = *retv;
    memcpy(&a[1], caller, (n+1)*sizeof *callee);
    caller = a;
}
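Inserting the hidden argument amounts to prepending one element to a NULL-terminated array. The sketch below shows that step alone; prepend is a hypothetical name, strings stand in for symbols, and malloc replaces lcc's arena allocator (newarray), so the caller is assumed to free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a new NULL-terminated array whose first element is first and whose
 * remaining elements are the n entries of old plus its terminating NULL.
 * Returns NULL if allocation fails; the caller frees the result. */
static const char **prepend(const char *first, const char **old, int n)
{
    const char **a;

    assert(first != NULL && old != NULL && n >= 0 && old[n] == NULL);
    a = malloc((n + 2) * sizeof *a);
    if (a == NULL)
        return NULL;
    a[0] = first;
    memcpy(&a[1], old, (n + 1) * sizeof *old);  /* n entries plus the NULL */
    return a;
}
```

Copying n + 1 elements rather than n carries the terminator along, which is why the new array needs n + 2 slots.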
If wants_argb is zero, the front end completely implements structure
parameters, as described in Sections 8.8 and 9.3. idtree, for example,
generates an extra indirection for structure parameters when wants_argb
is zero because the parameters are really the addresses of the structures.
This lie must be corrected for the back end, however, which is done by
changing the types of the caller and callee parameters and by lighting
structarg to identify the identifiers so changed.
(symbol flags 50)+=
unsigned structarg:1;

(funcdefn 286)+=
if (!IR->wants_argb)
    for (i = 0; caller[i]; i++)
        if (isstruct(caller[i]->type)) {
            caller[i]->type = ptr(caller[i]->type);
            callee[i]->type = ptr(callee[i]->type);
            caller[i]->structarg = callee[i]->structarg = 1;
        }

Finally, funcdefn exports the function, if necessary, and passes control
to the back end:
(funcdefn 286)+=
if (cfunc->sclass != STATIC)
    (*IR->export)(cfunc);
swtoseg(CODE);
(*IR->function)(cfunc, caller, callee, cfunc->u.f.ncalls);
funcdefn concludes by flushing the output, checking for undefined
statement labels, optionally planting an end-of-function event hook, clos-
ing the PARAM scope, and consuming the closing brace on the function's
compound-statement.
(funcdefn 286)+=
outflush();
foreach(stmtlabs, LABELS, checklab, NULL);
exitscope();
expect('}');
checklab is similar to checkref; see Exercise 11.4.
11.7 Compound Statements

The syntax of compound statements is
compound-statement:
    '{' { declaration } { statement } '}'
and compound is the parsing function. It appends a Blockbeg entry to
the code list, opens a new scope, parses the optional declarations and
statements, and appends a Blockend entry to the code list. compound's
arguments are the loop handle, the switch handle, and the structured
statement nesting level.
(decl.c functions)+=
void compound(loop, swp, lev)
int loop, lev; struct swtch *swp; {
    Code cp;
    int nregs;

    walk(NULL, 0, 0);
    cp = code(Blockbeg);
    enterscope();
    (compound 294)
    cp->u.block.level = level;
    cp->u.block.identifiers = identifiers;
    cp->u.block.types = types;
    code(Blockend)->u.begin = cp;
    if (level > LOCAL) {
        exitscope();
        expect('}');
    }
}

compound is called from statement and from funcdefn. The only difference
between these two calls is that the scope is closed only on the
call from statement. As shown above, funcdefn closes the scope that
compound opens on its behalf so that it can call the interface procedure
function before doing so.
Most of compound's semantic processing concerns the locals declared
in the block. dcllocal processes each local and appends it to one of the
lists
(decl.c data)+=
static List autos, registers;
depending on its explicit storage class. Locals with no storage class are
appended to autos, and static locals are handled like globals.
freturn 64 If compound is called from funcdefn, it must cope with the interface
funcdefn 286 flag wants_ca 11 b. When this flag is one, the back end handles the trans-
function 92 mission of the return value for functions that return structures. The
(MIPS)II 448
(SPARC) II 484 front end generates space for this value in the caller, but it doesn't know
(X86)II 518 how to transmit the address of this space to the callee. It assumes that
genident 49 the back end will arrange to pass this address in a target-dependent way
identifiers 41 and to store it in the first local. So, compound generates the first local
IR 306 and saves its symbol-table entry in retv:
isstruct 60
level 42
List 34
(compound 294)=
autos = registers = NULL;
if (level == LOCAL && IR->wants_callb
&& isstruct(freturn(cfunc->type))) {
    retv = genident(AUTO, ptr(freturn(cfunc->type)), level);
    retv->defined = 1;
    retv->ref = 1;
    registers = append(retv, registers);
}
retv is appended to registers even though it's an AUTO to ensure that
it's passed to the back end as the first local; this order is arranged below.
The front end uses retv in one of two ways depending on the value of
wants_callb. When wants_callb is one, retv is the symbol-table entry
for the local that holds the address at which to store the return value, as
just described. When wants_callb is zero, there is no such local because
the front end arranges to pass this address as the value of the hidden
first parameter; in this case, retv is the symbol-table entry for that
parameter. As far as retcode is concerned, retv is the symbol-table entry
for the variable that carries the address, regardless of how it got there.
Next, compound parses the optional block-level declarations:
(compound 294)+=
expect('{');
while (kind[t] == CHAR || kind[t] == STATIC
|| istypename(t, tsym) && getchr() != ':')
    decl(dcllocal);
The call to getchr checks for the rare but legal code exemplified by
    typedef int T;
    f() { T: ... ; goto T; }
istypename(t) says T is a typedef, but inside f, T is a label. Peeking at
the next input character avoids the misinterpretation.
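A complete function makes the ambiguity concrete. The sketch below is an illustration written for this passage, not code from lcc; countdown is a hypothetical name. It uses T both as a typedef name and as a statement label, which is legal because labels occupy their own name space.

```c
#include <assert.h>

typedef int T;

/* Counts down from n to zero with a goto loop and returns the number of
 * steps taken. "T i" uses the typedef; "T:" introduces a label. */
static int countdown(int n)
{
    T i = n;            /* T is the typedef here */
    int steps = 0;

    assert(n >= 0);
T:                      /* T is a label here */
    if (i > 0) {
        i--;
        steps++;
        goto T;
    }
    return steps;
}
```

A parser that sees the identifier T at the start of a block item cannot tell a declaration from a labeled statement until it looks at what follows, which is the one-character lookahead the text describes.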
Once the locals are consumed, those on the autos list are appended
to the registers list, which is then converted to a null-terminated array
and assigned to the u.block.locals field of the Blockbeg code-list entry.
(compound 294)+=
{
    int i;
    Symbol *a = ltov(&autos, STMT);
    nregs = length(registers);
    for (i = 0; a[i]; i++)
        registers = append(a[i], registers);
    cp->u.block.locals = ltov(&registers, FUNC);
}
291 retv
cp->u.block.locals[0..nregs-1] are the register locals, and the automatic
locals begin at cp->u.block.locals[nregs]. This ordering ensures
that the register locals are announced to the back end before the
automatic locals.
Next, the statements are processed:
(compound 294)+=
while (kind[t] == IF || kind[t] == ID)
    statement(loop, swp, lev);
walk(NULL, 0, 0);
foreach(identifiers, level, checkref, NULL);
As the statements are compiled, idtree increments the ref fields of the
identifiers they use. Thus, at the end of statements, the ref fields identify
the most frequently accessed variables. checkref, described below,
changes the storage class of any scalar variable referenced at least three
times to REGISTER, unless its address is taken. compound sorts the locals
beginning at cp->u.block.locals[nregs] in decreasing order of ref values.
(compound 294)+=
{
    int i = nregs, j;
    Symbol p;
    for ( ; (p = cp->u.block.locals[i]) != NULL; i++) {
        for (j = i; j > nregs
        && cp->u.block.locals[j-1]->ref < p->ref; j--)
            cp->u.block.locals[j] = cp->u.block.locals[j-1];
        cp->u.block.locals[j] = p;
    }
}

Some of these locals now have REGISTER storage class, and sorting them
on their estimated frequency of use permits the back end to assign registers
to those that are used most often without having it do its own
analysis. The locals in cp->u.block.locals[0..nregs-1] may be less
frequently referenced than the others, but they're presented to the back
end first because the programmer explicitly declared them as registers.
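The sort is an insertion sort over the tail of a NULL-terminated array. The following sketch reproduces it outside the compiler; struct local and sort_by_ref are hypothetical stand-ins for lcc's symbol and the fragment's inline loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lcc's symbol; only the fields the sort needs. */
struct local {
    const char *name;
    float ref;              /* estimated number of references */
};

/* Sort locals[nregs..] into decreasing order of ref with an insertion
 * sort, leaving locals[0..nregs-1] untouched. locals is NULL-terminated
 * and must hold at least nregs entries. */
static void sort_by_ref(struct local **locals, int nregs)
{
    int i, j;
    struct local *p;

    assert(locals != NULL && nregs >= 0);
    for (i = nregs; (p = locals[i]) != NULL; i++) {
        for (j = i; j > nregs && locals[j-1]->ref < p->ref; j--)
            locals[j] = locals[j-1];
        locals[j] = p;
    }
}
```

Because the inner loop uses a strict comparison, locals with equal ref values keep their declaration order.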
level 42 check ref is called at the ends of compound statements for every sym-
PARAM 38
ref 38
bol in the i denti fie rs table, and it does more than change storage
REGISTER 80 classes.
scope 37 ...
walk 311 {decl.c functions)+=
static void checkref(p, cl) Symbol p; void *cl; {
293 298 ...
{checkref 296)
}

It also prevents volatile locals and parameters from landing in registers
by lighting their addressed flags:
(checkref 296)=
if (p->scope >= PARAM
&& (isvolatile(p->type) || isfunc(p->type)))
    p->addressed = 1;
checkref warns about unreferenced statics, parameters, and locals when
lcc's -A option appears twice:
(checkref 296)+=
if (Aflag >= 2 && p->defined && p->ref == 0) {
    if (p->sclass == STATIC)
        warning("static '%t %s' is not referenced\n",
            p->type, p->name);
    else if (p->scope == PARAM)
        warning("parameter '%t %s' is not referenced\n",
            p->type, p->name);
    else if (p->scope >= LOCAL && p->sclass != EXTERN)
        warning("local '%t %s' is not referenced\n",
            p->type, p->name);
}

There's more to changing a parameter's or local's storage class from AUTO
to REGISTER than is suggested above. A parameter's storage class is
changed only if there are no explicitly declared register locals. To do
otherwise risks using the registers for parameters instead of for locals
as was intended.
(checkref 296)+=
if (p->sclass == AUTO
&& (p->scope == PARAM && regcount == 0
 || p->scope >= LOCAL)
&& !p->addressed && isscalar(p->type) && p->ref >= 3.0)
    p->sclass = REGISTER;
dcllocal increments regcount for each local explicitly declared register
in any block.
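The promotion rule can be restated as a predicate. This is a simplified sketch: struct sym, its fields, the scope encoding, and should_promote are assumptions made for the illustration and are not lcc's declarations.

```c
#include <assert.h>

/* Hypothetical, simplified view of a symbol. The field names follow the
 * text, but the types and the scope encoding are assumptions. */
enum sclass { AUTO, REGISTER };
enum scope  { GLOBAL = 1, PARAM, LOCAL };   /* deeper blocks: LOCAL+1, ... */

struct sym {
    enum sclass sclass;
    int scope;
    int addressed;      /* nonzero if the address is taken or the type is volatile */
    int scalar;         /* nonzero for arithmetic and pointer types */
    float ref;          /* estimated reference count */
};

/* Return 1 if the symbol should be promoted from AUTO to REGISTER. */
static int should_promote(const struct sym *p, int regcount)
{
    assert(p != 0 && regcount >= 0);
    return p->sclass == AUTO
        && ((p->scope == PARAM && regcount == 0) || p->scope >= LOCAL)
        && !p->addressed && p->scalar && p->ref >= 3.0;
}
```

The regcount test applies only to parameters: a frequently used local is promoted regardless of how many explicit register locals the function declares.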
checkref also helps manage the externals table. As shown below,
dcllocal installs locals that are declared extern in externals as well as
in identifiers. When the local goes out of scope, checkref adds the
value of the ref field in its identifiers symbol to the ref field of its
externals symbol:
(checkref 296)+=
if (p->scope >= LOCAL && p->sclass == EXTERN) {
    Symbol q = lookup(p->name, externals);
    q->ref += p->ref;
}
80 REGISTER
A ref value for an identifier in the externals table thus accumulates the
references from all functions that reference that identifier.
Finally, checkref is also called at the end of compilation to check
for undefined static variables and functions. It tests for this call, which
comes from finalize, by inspecting the current scope level:
(checkref 296)+=
if (level == GLOBAL && p->sclass == STATIC && !p->defined
&& isfunc(p->type) && p->ref)
    error("undefined static '%t %s'\n", p->type, p->name);
lcc doesn't complain about unreferenced static functions that are declared
but never defined because the standard doesn't say that such declarations
are errors.
decl calls the last of the dclX functions, dcllocal, when it's called
from compound for each local.
(decl.c functions)+=
static Symbol dcllocal(sclass, id, ty, pos)
int sclass; char *id; Type ty; Coordinate *pos; {
    Symbol p, q;

    (dcllocal 298)
    return p;
}

Like dclglobal and dclparam, dcllocal starts by checking for an invalid
storage class:
(dcllocal 298)=
if (sclass == 0)
    sclass = isfunc(ty) ? EXTERN : AUTO;
else if (isfunc(ty) && sclass != EXTERN) {
    error("invalid storage class '%k' for '%t %s'\n",
        sclass, ty, id);
    sclass = EXTERN;
} else if (sclass == REGISTER
&& (isvolatile(ty) || isstruct(ty) || isarray(ty))) {
    warning("register declaration ignored for '%t %s'\n",
        ty, id);
    sclass = AUTO;
}
Local variables may have any storage class, but functions must have no
storage class or extern. Volatile locals and those with aggregate types
may be declared register, but lcc treats them as automatics.
Next, dcllocal checks for redeclarations:
ref 38 ....
REGISTER
scope
80
37
(dell oca l 298} +=
q = lookup(id, identifiers);
298 299 ... 298

if (q && q->scope >= level


I I q && q->scope == PARAM && level == LOCAL)
if (sclass == EXTERN && q->sclass == EXTERN
&& eqtype(q->type, ty, 1))
ty = compose(ty, q->type);
else
11.7 • COMPOUND STATEMENTS 299

error("redeclaration of '%s' previously _


declared at %w\n", q->name, &q->src);
lcc uses different scopes for parameters and for the locals in a function's
compound-statement, but the standard treats these scopes as one. Thus,
a local declaration is a redeclaration if there's already an identifier at the
same scope or if a parameter has the same name and the local has scope
LOCAL. The code
    f() { extern int x[]; extern int x[10]; ... }
illustrates the one case when more than one declaration for a local is
permitted: when they're extern declarations. Here, the second extern
declaration contributes more information about x's type - namely, its
size.
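The effect of the second declaration can be observed directly in a conforming compiler. The sketch below is an illustration, not part of lcc; elements_of_x is a hypothetical name, and the file-scope definition of x is added only so that the example links.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of elements in x as seen inside the function. */
static size_t elements_of_x(void)
{
    extern int x[];      /* incomplete type: size unknown */
    extern int x[10];    /* compatible redeclaration that supplies the size */
    size_t n = sizeof x / sizeof x[0];

    assert(n > 0);
    return n;
}

int x[10];               /* the definition the extern declarations refer to */
```

After the second declaration, x has the composite type int [10], so sizeof x is legal even though the first declaration alone would have left the type incomplete.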
dcllocal next installs the identifier, initializes its fields, and switches
on its storage class, which dictates subsequent processing.
(dcllocal 298)+=
p = install(id, &identifiers, level, FUNC);
p->type = ty;
p->sclass = sclass;
p->src = *pos;
switch (sclass) {
case EXTERN:   (extern local 300) break;
case STATIC:   (static local 300) break;
case REGISTER: (register local 299) break;
case AUTO:     (auto local 299) break;
}
50 defined
Automatic and register locals are the easy ones; they're simply appended
to the appropriate list:
(register local 299)=
registers = append(p, registers);
regcount++;
p->defined = 1;

(auto local 299)=
autos = append(p, autos);
p->defined = 1;
regcount is the number of locals explicitly declared register anywhere
in a function, and is used in checkref, above. Unlike globals, a local's
defined flag is lit when it's declared, before it's passed to the back end,
which occurs in gencode. Locals are treated this way because they can be
declared only once (in a given scope), and their declarations are always
definitions.
Most of the work for static locals is in dealing with the optional
initialization, which is the same as what initglobal does for globals:
(static local 300)=
(*IR->defsymbol)(p);
initglobal(p, 0);
if (!p->defined)
    if (p->type->size > 0) {
        defglobal(p, BSS);
        (*IR->space)(p->type->size);
    } else
        error("undefined size for '%t %s'\n",
            p->type, p->name);
p->defined = 1;
If there's no initialization, p->defined is zero when initglobal returns,
and dcllocal must allocate space for the static local. Like uninitialized
globals, uninitialized statics are defined in the BSS segment.
Locals declared extern suffer the rules summarized by the column
labelled EXTERN in the table on page 262: If there's a visible file-scope
declaration for the identifier, the local refers to that declaration. In any
case, the local is announced via the interface function defsymbol since
it's like a global except for scope.
(extern local 300)=
if (q && q->scope == GLOBAL && q->sclass == STATIC) {
    p->sclass = STATIC;
    p->scope = GLOBAL;
    (*IR->defsymbol)(p);
    p->sclass = EXTERN;
    p->scope = level;
} else
    (*IR->defsymbol)(p);
As this code suggests, the presence of a visible file-scope declaration for
a static identifier by the same name needs special treatment. A back
end's defsymbol might treat statics and externs differently, for example,
by using different conventions for their target-dependent names.
So, dcllocal changes the storage class and scope for the duration of the
defsymbol call. This code does not check that the two identifiers
have compatible types, because that check is made below.
Extern locals are also installed in the externals table that, as described
in Section 11.2, is used to detect inconsistencies in block-level
extern declarations.
(extern local 300)+=
{
    Symbol r = lookup(id, externals);
    if (r == NULL) {
        r = install(p->name, &externals, GLOBAL, PERM);
        r->src = p->src;
        r->type = p->type;
        r->sclass = p->sclass;
        q = lookup(id, globals);
        if (q && q->sclass != TYPEDEF && q->sclass != ENUM)
            r = q;
    }
    if (r && !eqtype(r->type, p->type, 1))
        warning("declaration of '%s' does not match previous declaration at %w\n", r->name, &r->src);
}

If there's already a symbol for the identifier in externals, it must have
a compatible type. Otherwise, the identifier is installed in externals.
There's a tricky case that's not covered by dcllocal's redeclaration code
shown on page 298. In
    int x;
    f(int x) { ... { extern float x; ... } }
the extern declaration in f for x conflicts with the file-scope declaration
for x, because they specify different types for the same x. The lookup
call in the redeclaration code returns a pointer to the symbol for the
parameter x and assigns that pointer to q. It's this value that's used at
the beginning of (extern local) to check for file-scope identifiers; the
parameter x hides the file-scope x, but the latter is the one that's needed
to check for these kinds of conflicts. Thus, dcllocal looks up the
identifier in globals and, if one is found, uses it to check for compatible
types. When there's no intervening declaration that hides the file-scope
identifier, this second call to lookup sets q to its existing value, which
is the common case. The example above is rare, but occurs nonetheless,
particularly in large programs.
dcllocal concludes by parsing the optional initialization. Unlike in
initglobal, the initial value may be an arbitrary expression in some
cases. If the local has a scalar type, its initializer may be an expression
or an expression enclosed in braces. If the local is a structure or
union, its initializer can be a single expression or a brace-enclosed list
of constant expressions. If the local is an array, its initializer can only
be a brace-enclosed list of constant expressions. An array must either
have an explicit size or an initializer that determines its size. dcllocal
handles all of these cases by generating an assignment to the local:
(dcllocal 298)+=
    if (t == '=') {
        Tree e;
        if (sclass == EXTERN)
            error("illegal initialization of 'extern %s'\n", id);
        t = gettok();
        definept(NULL);
        if (isscalar(p->type)
        || isstruct(p->type) && t != '{') {
            if (t == '{') {
                t = gettok();
                e = expr1(0);
                expect('}');
            } else
                e = expr1(0);
        } else {
            (generate an initialized static t1 302)
            e = idtree(t1);
        }
        walk(root(asgn(p, e)), 0, 0);
        p->ref = 1;
    }
    if (!isfunc(p->type) && p->defined && p->type->size <= 0)
        error("undefined size for '%t %s'\n", p->type, id);
For a local that has a scalar type, a structure type, or a union type, and
whose initializer is a single expression, the initialization is an
assignment of the initializer to the local. For a local that has an aggregate
type and a brace-enclosed initializer, lcc generates an anonymous static
variable, and initializes it as specified by the initializer. A single
structure assignment initializes the local, even for arrays.
(generate an initialized static t1 302)=
    Symbol t1;
    Type ty = p->type, ty1 = ty;
    while (isarray(ty1))
        ty1 = ty1->type;
    if (!isconst(ty) && (!isarray(ty) || !isconst(ty1)))
        ty = qual(CONST, ty);
    t1 = genident(STATIC, ty, GLOBAL);
    initglobal(t1, 1);
    if (isarray(p->type) && p->type->size == 0
    && t1->type->size > 0)
        p->type = array(p->type->type,
            t1->type->size/t1->type->type->size, 0);
This static will never be modified, so a const qualifier is added to its
type, which causes initglobal to define it in the LIT segment.
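The effect of this scheme can be imitated by hand in portable C. The following is only a sketch under stated assumptions, not lcc's output: the names init_a and sum_local are hypothetical, and memcpy stands in for the single structure assignment that lcc generates.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the anonymous static that lcc generates;
   const mirrors the qualifier that sends it to the LIT segment. */
static const int init_a[] = { 1, 2, 3 };

/* Behaves like a function whose body begins: int a[] = { 1, 2, 3 }; */
int sum_local(void) {
    int a[sizeof init_a / sizeof init_a[0]]; /* size comes from the initializer */
    size_t i;
    int sum = 0;

    assert(sizeof a == sizeof init_a);
    memcpy(a, init_a, sizeof a);   /* one block copy initializes the local */
    a[0] += 10;                    /* the local may change; init_a never does */
    for (i = 0; i < sizeof a / sizeof a[0]; i++)
        sum += a[i];
    return sum;
}
```

Each call starts from a fresh copy of the constant data, which is why a second call sees the same initial values even though the first call modified its local.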
11.8 Finalization
As suggested in the previous section, checkref is also called at the end
of compilation for each file-scope identifier, i.e., those with scope GLOBAL.
This call comes from finalize, which also processes externals and globals.
(decl.c functions)+=
    void finalize() {
        foreach(externals, GLOBAL, doextern, NULL);
        foreach(identifiers, GLOBAL, doglobal, NULL);
        foreach(identifiers, GLOBAL, checkref, NULL);
        foreach(constants, CONSTANTS, doconst, NULL);
    }

Each of finalize's four lines processes a set of symbols in the tables
shown in the calls to foreach. The first line processes the identifiers in
externals. Recall that dcllocal installs locals that are declared extern
in this table. Some of these declarations refer to identifiers that are
also declared at file scope and thus have entries in identifiers. Some,
however, refer to identifiers declared in other translation units, and these
must be imported by the translation unit in which the extern declarations
occur. doextern imports just these identifiers by calling the interface
function import:
(decl.c functions)+=
    static void doextern(p, cl) Symbol p; void *cl; {
        Symbol q = lookup(p->name, identifiers);

        if (q)
            q->ref += p->ref;
        else {
            (*IR->defsymbol)(p);
            (*IR->import)(p);
        }
    }
import cannot be called when dcllocal encounters an extern declaration
because the local declaration can appear before the file-scope definition,
and import must not be called for those identifiers.
The second call to foreach finalizes tentative definitions and file-scope
extern declarations. A file-scope declaration of an object without an
initializer that has no storage class or has the storage class static is a
tentative definition. There may be more than one such declaration for an
identifier, as long as the declarations specify compatible types. For
example, the input
    int x;
    int x;
    int x;
is valid and each declaration is a tentative definition for x. A file-scope
declaration with an initializer is an external definition, and there may be
only one such definition.
At the end of a translation unit, those file-scope identifiers that have
only tentative definitions must be finalized; this is accomplished by
assuming that the translation unit includes a file-scope external definition
for the identifier with an initializer equal to zero. For example, x is
finalized by assuming
    int x = 0;
Uninitialized file-scope objects are thus initialized to zero by definition.
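The rule can be observed in any conforming C compiler. The following is a small illustration, assuming it is compiled as C rather than C++ (which rejects repeated definitions); the function names are hypothetical.

```c
#include <assert.h>

int x;                  /* tentative definition */
int x;                  /* another one with a compatible type */
int x;                  /* still valid: no initializer appears anywhere */

static int counter;     /* tentative definitions may also be static */
static int counter;

/* Both objects behave as if they had been defined with "= 0". */
void check_initial_values(void) {
    assert(x == 0);
    assert(counter == 0);
}

int read_x(void) { return x; }
int bump(void) { return ++counter; }
```

Adding a second initialized declaration, such as two copies of int x = 0;, would turn this into an error, because only one external definition is permitted.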
doglobal processes each identifier in identifiers.
(decl.c functions)+=
    static void doglobal(p, cl) Symbol p; void *cl; {
        if (!p->defined && (p->sclass == EXTERN
        || isfunc(p->type) && p->sclass == AUTO))
            (*IR->import)(p);
        else if (!p->defined && !isfunc(p->type)
        && (p->sclass == AUTO || p->sclass == STATIC)) {
            if (isarray(p->type)
            && p->type->size == 0 && p->type->type->size > 0)
                p->type = array(p->type->type, 1, 0);
            if (p->type->size > 0) {
                defglobal(p, BSS);
                (*IR->space)(p->type->size);
            } else
                error("undefined size for '%t %s'\n",
                    p->type, p->name);
            p->defined = 1;
        }
        (print an ANSI declaration for p 305)
    }
If an extern identifier or nonstatic function is undefined, it's imported,
because it refers to a definition given in some other translation unit.
Undefined objects - those with only tentative definitions - are defined
in the BSS segment. Back ends must ensure that this segment is cleared
before execution. Arrays receive special treatment: If the array's size is
unspecified, it's defined as if it were declared with one element.
lcc's -P option causes doglobal to print an ANSI-style declaration on
the standard error output.
(print an ANSI declaration for p 305)=
    if (Pflag
    && !isfunc(p->type)
    && !p->generated && p->sclass != EXTERN)
        printdecl(p, p->type);
For functions, this output includes prototypes even if the functions are
specified with old-style definitions. Editing this output helps convert old
programs to ANSI C. See Exercise 4.5.
During compilation, most constants end up in dags and thus embed-
ded in machine instructions. As specified by the configuration metrics
shown in Section 5.1, some constants cannot appear in instructions, and
string literals never appear in instructions. For each such constant, an
anonymous static variable is generated, and doconst arranges to initial-
ize that variable to the value of the constant.
(decl.c functions)+=
    void doconst(p, cl) Symbol p; void *cl; {
        if (p->u.c.loc) {
            defglobal(p->u.c.loc, LIT);
            if (isarray(p->type))
                (*IR->defstring)(p->type->size, p->u.c.v.p);
            else
                (*IR->defconst)(ttob(p->type), p->u.c.v);
            p->u.c.loc->defined = 1;
            p->u.c.loc = NULL;
        }
    }
The u.c.loc fields of symbols in the constants table point to the symbol
for the anonymous static.

11.9 The Main Program
The function main, in main.c, calls program and finalize to initiate and
conclude compilation, and it calls the interface functions progbeg and
progend to let a back end do its initialization and finalization.
(main.c functions)=
    int main(argc, argv) int argc; char *argv[]; {
        (main 306)
        return errcnt > 0;
    }
errcnt is the number of errors detected during compilation, so lcc returns
one when there are errors. On most systems, this exit code stops
the compilation system from running subsequent processors, such as
the assembler and linker.
Before main calls the initialization functions, it must point IR to the
appropriate interface record, as specified in Section 5.11. The back end
initializes the array bindings to pairs of names and pointers to their
associated interface records. main uses its rightmost -target=name option
to select the desired interface record:
(main.c data)=
    Interface *IR = NULL;

(main 306)=
    {
        int i, j;
        for (i = argc - 1; i > 0; i--)
            if (strncmp(argv[i], "-target=", 8) == 0)
                break;
        if (i > 0) {
            for (j = 0; bindings[j].name; j++)
                if (strcmp(&argv[i][8], bindings[j].name) == 0)
                    break;
            if (bindings[j].ir)
                IR = bindings[j].ir;
            else {
                fprint(2, "%s: unknown target '%s'\n", argv[0],
                    &argv[i][8]);
                exit(1);
            }
        }
    }
    if (!IR) {
        int i;
        fprint(2, "%s: must specify one of\n", argv[0]);
        for (i = 0; bindings[i].name; i++)
            fprint(2, "\t-target=%s\n", bindings[i].name);
        exit(1);
    }

If no -target option is given, lcc lists the available targets and exits.
Once IR points to an interface record, the front end is bound to a target
and this binding cannot be changed for the duration of the translation unit.
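The selection logic can be tried in isolation. The following is a minimal sketch, assuming a cut-down Interface type and a two-entry bindings array; the type layout, the entries, and the function name select_target are hypothetical and serve only to exercise the right-to-left scan and the sentinel-terminated search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of an interface record and the bindings array. */
typedef struct { const char *description; } Interface;
typedef struct { const char *name; Interface *ir; } Binding;

static Interface mipsIR = { "mips" }, sparcIR = { "sparc" };
static Binding bindings[] = {
    { "mips-irix", &mipsIR },
    { "sparc-sun", &sparcIR },
    { NULL, NULL }              /* sentinel: ends the search loop */
};

/* Returns the record named by the rightmost -target=name option,
   or NULL if there is no such option or the name is unknown. */
Interface *select_target(int argc, char *argv[]) {
    int i, j;

    assert(argv != NULL);
    for (i = argc - 1; i > 0; i--)           /* scan right to left */
        if (strncmp(argv[i], "-target=", 8) == 0)
            break;
    if (i <= 0)
        return NULL;                         /* no -target option */
    for (j = 0; bindings[j].name; j++)
        if (strcmp(&argv[i][8], bindings[j].name) == 0)
            break;
    return bindings[j].ir;                   /* NULL at the sentinel */
}
```

Scanning from the right lets a later option on the command line override an earlier one without any extra bookkeeping.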
Next, main initializes the front end's type system and parses its other
options:
(main 306)+=
    typeinit();
    argc = doargs(argc, argv);
In addition to processing the arguments the front end understands,
doargs sets infile and outfile to the first and second nonoption
arguments. These values name the source input file and the
assembler-language output file. If one or both of these files is specified,
main opens the file and sets the appropriate file descriptor.
(main.c data)+=
    static char *infile, *outfile;

(main 306)+=
    if (infile && strcmp(infile, "-") != 0)
        if ((infd = open(infile, 0)) < 0) {
            fprint(2, "%s: can't read '%s'\n",
                argv[0], infile);
            exit(1);
        }
    if (outfile && strcmp(outfile, "-") != 0)
        if ((outfd = creat(outfile, 0666)) < 0) {
            fprint(2, "%s: can't write '%s'\n",
                argv[0], outfile);
            exit(1);
        }
    inputInit();
    outputInit();
Once the descriptors are initialized, the input and output modules are
initialized by the Init functions shown above, and the back end is
initialized:
(main 306)+=
    t = gettok();
    (*IR->progbeg)(argc, argv);
doargs changes argv to hold just those options that it doesn't understand,
which are assumed to be back-end options. doargs returns the
number of these options, which is assigned to argc above. program
compiles the source code
(main 306)+=
    program();
and main concludes by calling finalize and the interface procedure
progend, and by flushing the output:
(main 306)+=
    finalize();
    (*IR->progend)();
    outflush();

Further Reading
Ritchie (1993) gives a detailed history of C's development and describes
the origins and peculiarities of its declaration syntax, which is one of C's
distinguishing characteristics and the one that is most often criticized.
Sethi (1981) summarizes the ramifications of those design decisions, and
proposes an alternative syntax for declarators in which pointers are
denoted by the suffix ^ as in Pascal instead of C's prefix *. If his alternative
had been adopted, dclr and dclr1 would be much simpler.
Like most high-level languages, C demands that identifiers be declared
before they are used (functions are the lone exception). This rule forces
language designers to permit multiple declarations and induces rules
such as those for C's tentative definitions. Much of the code in dclX,
doglobal, and doextern is devoted to dealing with these design deci-
sions. Modula-3 (Nelson 1991) is one of the few languages that permits
declarations and uses to appear in any order and avoids ordering rules
altogether, which is simpler to understand. This design decision does
have its own impact on the compiler, but that impact is no greater than
the impact of C's rules governing multiple declarations.

Exercises
11.1 dclr1 accepts the erroneous declaration int *const const *p, yet
lcc issues the expected diagnostic
    illegal type 'const const pointer to int'
Where and how is this error detected?
11.2 dclr1's implementation looks peculiar. The syntax specification
on page 266 suggests that dclr1 begin with a loop that consumes
pointer followed by parsing the rest of declarator. Rewrite dclr1
using this approach. You'll find that you'll need to append the
pointer portion of the inverted type to the inverted type constructed
by parsing the rest of declarator. Change your implementation into
one similar to dclr1's by applying program transformations.
11.3 Type names are used in casts and as operands to sizeof (see (type
cast) and (sizeof)). The syntax for type definitions is
    type-name:
        { type-specifier | type-qualifier } [ abstract-declarator ]
    abstract-declarator:
        * { type-qualifier }
        pointer '(' abstract-declarator ')'
        { suffix-abstract-declarator }
        pointer { suffix-abstract-declarator }
    suffix-abstract-declarator:
        '[' [ constant-expression ] ']'
        '(' parameter-list ')'
An abstract-declarator is a declarator without an embedded identifier.
Implement
(main.c exported functions)=
    extern Type typename ARGS((void));
which parses type-name. dclr parses an abstract-declarator when
its abstract argument is one, so typename can get dclr to do most
of the work and takes less than 10 lines.
11.4 Implement
(main.c exported functions)+=
    extern void checklab ARGS((Symbol p, void *cl));
which is called for each symbol in stmtlabs. checklab issues an
error if p is an undefined label.
11.5 dcllocal calls initglobal to parse the initialization for a static
local, but it also parses an optional initialization. Nevertheless, lcc
correctly rejects input such as
    f() { static int x = 2 3; }
    g() { static int y = 2; 3; }
Explain how.
11.6 In fields, the field with the largest alignment determines the
alignment of the entire structure, which is correct only because the sizes
and alignments of the basic types must be powers of two. Revise
fields so that it is correct for any positive values for the sizes and
alignments of the basic types.
11.7 A bit field declaration like unsigned:0 causes subsequent bit fields
to be placed in the next addressable storage unit, even if there's
room in the current one. For example, if the declaration in
Figure 11.1 is rewritten as
    struct {
        char a[5];
        short s1, s2;
        unsigned code:3, :0, used:1;
        int amt:7, last;
        short id;
    }
the code field stays in the unsigned at offset 12, and used lands to
the right of amt in the unsigned at 16. Explain how fields handles
this case.
11.8 Reading fields is excruciating. Write a new - presumably better -
version and compare the two versions side-by-side. Is your
version easier to understand? Do you have more confidence in its
correctness?
11.9 The syntax for enumeration specifiers is
    enum-specifier:
        enum [ identifier ] '{' enumerator { , enumerator } '}'
        enum identifier
    enumerator:
        identifier
        identifier = constant-expression
Implement the parsing function for enum-specifier
(main.c exported functions)+=
    extern Type enumdcl ARGS((void));
enumdcl is similar to structdcl, but much simpler, and there's no
special rule about enum-specifiers appearing alone in declarations
because there are no mutually recursive enumeration definitions.
So an enum-specifier with enumerators must not refer to an
existing enumeration type. enumdcl can use newstruct to define a
new enumeration type, and it installs the enumeration constants in
identifiers with storage class ENUM. The integer value of an
enumeration constant is stored in the symbol's u.value field.
12
Generating Intermediate Code

The remaining missing pieces of lcc's front end are those that convert
trees to dags and append them to the code list, and the functions gencode
and emitcode, which back ends call from their function interface
procedures to traverse code lists. These pieces appear in dag.c, which exports
gencode and emitcode (see Section 5.10) and
(dag.c exported functions)+=
    extern void walk ARGS((Tree e, int tlab, int flab));
    extern Node listnodes ARGS((Tree e, int tlab, int flab));
    extern Node newnode ARGS((int op, Node left, Node right,
        Symbol p));
walk and listnodes manipulate the forest of dags defined in Section 5.5.
A sequence of forests represents the code for a function. The sequence is
formed by the Gen, Jump, and Label entries in a code list. As outlined in
Section 10.3, listnodes constructs a sequence incrementally; it converts
the tree e to a dag and appends that dag to the forest. Figures 5.2 and 5.3
(pages 86 and 87) show examples of forests.
walk converts the tree e to a dag by calling listnodes. It appends the
forest to the code list in a Gen entry, and reinitializes the front end for a
new forest. listnodes bears the complexity of converting trees to dags,
so walk is easy:
(dag.c functions)=
    void walk(tp, tlab, flab) Tree tp; int tlab, flab; {
        listnodes(tp, tlab, flab);
        if (forest) {
            code(Gen)->u.forest = forest->link;
            forest->link = NULL;
            forest = NULL;
        }
        reset();
        deallocate(STMT);
    }

(dag.c data)=
    static Node forest;
forest points to the last node in the current forest, which, while it's
under construction, is a circularly linked list threaded through the link
fields of nodes, so forest->link is the first node in the forest. walk
turns this list into a noncircular linked list as it appends the forest to
the code list.
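The circular-list idiom is worth seeing on its own: a single pointer to the last element gives constant-time access to both ends. The following is a minimal sketch, not lcc's code; the Item type and the functions append and detach are hypothetical, and detach performs the same three assignments that walk applies to forest.

```c
#include <assert.h>
#include <stddef.h>

typedef struct item { int value; struct item *link; } Item;

/* tail points to the LAST item; tail->link is the FIRST item. */
static Item *tail;

/* Append in constant time without keeping a separate head pointer. */
void append(Item *p) {
    assert(p != NULL);
    if (tail) {
        p->link = tail->link;   /* new last item points back to the first */
        tail->link = p;
    } else
        p->link = p;            /* a single item points to itself */
    tail = p;
}

/* Break the circle: return the first item of a NULL-terminated list
   and reset tail so that a new list can be started. */
Item *detach(void) {
    Item *first = NULL;

    if (tail) {
        first = tail->link;
        tail->link = NULL;
        tail = NULL;
    }
    return first;
}
```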
The values of tlab and flab passed to listnodes and walk are label
numbers and are used when e is a conditional expression, like a
comparison. If tlab is nonzero, listnodes generates code that jumps to tlab
when e is nonzero. If flab is nonzero, listnodes generates code that
jumps to flab when e is zero. Section 10.4 shows how these labels are
used in generating code for if statements. Only one of tlab or flab can
be nonzero.
newnode allocates a node and initializes its fields with the values of its
arguments. newnode is called by definelab and jump in the front end,
and by back ends that must build dags to spill registers, for example.
listnodes also eliminates common subexpressions - expressions that
compute redundant values. For example, Figure 8.1 (page 148) shows the
tree for the expression (a+b)+b*(a+b). The value of a+b is computed
twice. The rvalue of b is also computed twice: once in a+b and again in
the multiplication. The rvalue of b is a trivial computation, but a
redundant one nonetheless. Eliminating these common subexpressions yields
the dag shown in Figure 12.1. Lvalues can also be common expressions;
p's lvalue in the forest shown in Figure 5.3 (page 87) is an example. In
these and other figures that depict dags, the operators are shown as they
appear in Table 5.1. Omitting the + before suffixes distinguishes trees
from dags.
Some trees built by the front end are really dags because they mirror
the dags implicit in the source language by the augmented assignment
and postfix operators. The trees for a += b, shown in Figure 8.2,
and for i++, shown in Figure 8.3, are examples. listnodes must detect
such idioms in order to generate intermediate code that evaluates the
operands of these operators as dictated by the standard, which says that
the operands of the prefix, suffix, and augmented assignment operators
must be evaluated only once.

[Figure: dag whose nodes include ADDI, INDIRI, and ADDRGP for a and b]
FIGURE 12.1 Dag for (a+b)+b*(a+b).
Trees contain operators that are not part of the interface repertoire
listed in Table 5.1. listnodes eliminates all occurrences of these
operators, which are listed in Table 8.1, by generating code that implements
them. For example, it implements AND by annotating nodes for the
comparison operators with labels and by inserting jumps and defining labels
where necessary. Similarly, it implements FIELD, which specifies bit-field
extraction or assignment, by appropriate combinations of shifting and
masking.
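Shifting and masking of this kind can be written directly in C. The following is a minimal sketch of the general technique, not the code that listnodes emits; the function names are hypothetical, and the code assumes that unsigned has at least 32 bits and that 0 < width < 32.

```c
#include <assert.h>

/* Extract an unsigned field of `width` bits whose least significant
   bit is bit `lsb` of word. */
unsigned field_get(unsigned word, int lsb, int width) {
    unsigned mask;

    assert(width > 0 && width < 32 && lsb >= 0 && lsb + width <= 32);
    mask = (1u << width) - 1;
    return (word >> lsb) & mask;      /* shift the field down, then mask */
}

/* Store value into the same field, leaving the other bits intact. */
unsigned field_set(unsigned word, int lsb, int width, unsigned value) {
    unsigned mask;

    assert(width > 0 && width < 32 && lsb >= 0 && lsb + width <= 32);
    mask = ((1u << width) - 1) << lsb;
    return (word & ~mask) | ((value << lsb) & mask);
}
```

A signed field would additionally need sign extension after extraction, which this sketch leaves out.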
A basic block is a sequence of instructions that have a single entry
and a single exit with straight-line code in between; if one instruction
in the block is executed, all the instructions are executed. Instructions
that are targets of jumps and that follow a conditional or unconditional
jump begin basic blocks. Compilers often use a flow graph to represent a
function. The nodes in a flow graph are basic blocks and the directed arcs
indicate the possible flow of control between basic blocks. lcc's code list
is not a flow graph, and the forests in Gen entries do not represent basic
blocks because they can include jumps and labels. They represent what
might be called expanded basic blocks: They do have single entry points,
but they can have multiple exits and multiple internal execution paths.
As the implementation of listnodes below reveals, this design makes
common expressions available across entire expanded basic blocks in
some cases. It thus extends the lifetimes of these subexpressions beyond
basic blocks with little extra effort.

12.1 Eliminating Common Subexpressions
listnodes takes a Tree t and builds the corresponding Node n. Trees
are defined in Section 8.1, and nodes are defined in Section 5.5. n->op
comes from t->op, and the elements of n->syms come from t->u.sym,
or from installing t->u.v in the constants table, or are fabricated from
other constants by listnodes. The elements of n->kids come from the
nodes for the corresponding elements of t->kids. n also has a count
field, which is the number of nodes that reference n as an operand.
Nodes are built from the bottom up: n is built by traversing t in
postorder with the equivalent of
    l = listnodes(t->kids[0], 0, 0);
    r = listnodes(t->kids[1], 0, 0);
    n = node(t->op, l, r, t->u.sym);
node allocates a new node and uses its arguments to initialize the fields.
To eliminate common subexpressions, node must determine if the
requested node has already been built; that is, if there's a node with the
same fields that can be used instead of building a new one.
node keeps a table of available nodes and consults this table before
allocating a new node. When it does allocate a new node, it adds that
node to the table. Building the dag shown in Figure 12.1 from the tree
shown in Figure 8.1 (page 148) illustrates how this table is constructed
and used. Table 12.1 shows the sequence of calls to node, the node
each builds, if any, and the value returned. The middle column shows
the evolution of the table consulted by node. The nodes are denoted by
numbers. The first call is for the ADDRG+P tree in the lower left corner of
Figure 8.1; node's table is empty, so it builds a node for the ADDRG+P and
returns it. The next four calls, which traverse the remainder of the tree
for a+b, are similar; each builds the corresponding node and returns it.
As nodes are returned, they're used as operands in other nodes. When
node is called for the ADDRG+P node at the leaf of the left operand of the
MUL+I, it finds that node in the table (node 3) and returns it. Similarly, it
also finds that node 4 corresponds to (INDIRI 3). node continues to find
nodes in the table, including the node for the common subexpression
a+b.

    Called With     Builds            Returns
    ADDRG+P a       1=(ADDRGP a)      1
    INDIR+I 1       2=(INDIRI 1)      2
    ADDRG+P b       3=(ADDRGP b)      3
    INDIR+I 3       4=(INDIRI 3)      4
    ADD+I 2 4       5=(ADDI 2 4)      5
    ADDRG+P b                         3
    INDIR+I 3                         4
    ADDRG+P a                         1
    INDIR+I 1                         2
    ADDRG+P b                         3
    INDIR+I 3                         4
    ADD+I 2 4                         5
    MUL+I 4 5       6=(MULI 4 5)      6
    ADD+I 5 6       7=(ADDI 5 6)      7

TABLE 12.1 Calls to node for (a+b)+b*(a+b).

The nodes depicted in the second column of the table above are stored
in a hash table:
(dag.c data)+=
    static struct dag {
        struct node node;
        struct dag *hlink;
    } *buckets[16];
    int nodecount;
dag structures hold a node and a pointer to another dag in the same
hash bucket. nodecount is the total number of nodes in buckets. The
hash table rarely has more than a few tens of nodes, which is why it
has only 16 buckets. node searches buckets for a node with the same
operator, operands, and symbol and returns it if it's found; otherwise, it
builds a new node, adds it to buckets, and returns it.
(dag.c functions)+=
    static Node node(op, l, r, sym)
    int op; Node l, r; Symbol sym; {
        int i;
        struct dag *p;

        i = (opindex(op)^((unsigned)sym>>2))&(NELEMS(buckets)-1);
        for (p = buckets[i]; p; p = p->hlink)
            if (p->node.op == op && p->node.syms[0] == sym
            && p->node.kids[0] == l && p->node.kids[1] == r)
                return &p->node;
        p = dagnode(op, l, r, sym);
        p->hlink = buckets[i];
        buckets[i] = p;
        ++nodecount;
        return &p->node;
    }

dagnode allocates and initializes a new dag and its embedded node. It
also increments the count fields of the operand nodes, if there are any.
(dag.c functions)+=
    static struct dag *dagnode(op, l, r, sym)
    int op; Node l, r; Symbol sym; {
        struct dag *p;

        NEW0(p, FUNC);
        p->node.op = op;
        if ((p->node.kids[0] = l) != NULL)
            ++l->count;
        if ((p->node.kids[1] = r) != NULL)
            ++r->count;
        p->node.syms[0] = sym;
        return p;
    }

listnodes calls node to use a node for a common subexpression or to
allocate a new node. It calls newnode to bypass the search and build a
new node that is not added to buckets:
(dag.c functions)+=
    Node newnode(op, l, r, sym) int op; Node l, r; Symbol sym; {
        return &dagnode(op, l, r, sym)->node;
    }
Only newnode can be used by back ends to build nodes for their own
uses, such as generating code to spill registers.
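The search-before-allocate discipline can be exercised outside the compiler. The following is a minimal sketch, assuming a simplified node type: the operator codes, the trivial hash, the use of calloc, and the comparison of identifier names with strcmp (lcc compares Symbol pointers) are all assumptions made for illustration. Building (a+b)+b*(a+b) bottom-up yields seven distinct nodes, matching the numbering in Table 12.1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { ADDRG, INDIR, ADD, MUL };          /* hypothetical operator codes */

typedef struct node {
    int op;
    struct node *kids[2];
    const char *sym;                      /* identifier name, or NULL */
    struct node *hlink;                   /* next node in the same bucket */
} Node;

static Node *buckets[16];
static int nodecount;

/* Return the existing node with these fields, or build and record one. */
Node *node(int op, Node *l, Node *r, const char *sym) {
    unsigned i = (unsigned)op & 15;       /* trivial hash on the operator */
    Node *p;

    for (p = buckets[i]; p; p = p->hlink)
        if (p->op == op && p->kids[0] == l && p->kids[1] == r
        && (p->sym == sym || (p->sym && sym && strcmp(p->sym, sym) == 0)))
            return p;
    p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->op = op; p->kids[0] = l; p->kids[1] = r; p->sym = sym;
    p->hlink = buckets[i];
    buckets[i] = p;
    ++nodecount;
    return p;
}

/* Rvalue of a global: (INDIR (ADDRG name)). */
static Node *rvalue(const char *name) {
    return node(INDIR, node(ADDRG, NULL, NULL, name), NULL, NULL);
}

/* Build (a+b)+b*(a+b); returns the number of distinct nodes built. */
int build_example(void) {
    Node *ab1 = node(ADD, rvalue("a"), rvalue("b"), NULL);
    Node *b = rvalue("b");
    Node *ab2 = node(ADD, rvalue("a"), rvalue("b"), NULL);
    Node *root = node(ADD, ab1, node(MUL, b, ab2, NULL), NULL);

    assert(ab1 == ab2);                   /* the common subexpression is shared */
    return root ? nodecount : 0;
}
```

Because operands are compared by pointer, sharing propagates upward automatically: once the two INDIR nodes are shared, the two requests for a+b have identical fields and the second one finds the first.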
Nodes appear in buckets as long as the values they represent are
valid. Assignments and function calls can invalidate some or all nodes
in buckets. For example, in
    c = a + b;
    a = a/2;
    d = a + b;
the value of a+b computed in the third line isn't the same as the value
computed in the first line. The second line's assignment to a invalidates
the node (INDIRI α) where α is the node for the lvalue of a. Operators
with side effects, such as ASGN and CALL, must remove the nodes that
they invalidate. While these nodes are different for each such operator,
lcc handles only two cases: Assignments to an identifier remove nodes
for its rvalue, and all other operators with side effects remove all nodes.
kill handles assignments:
(dag.c functions)+=
    static void kill(p) Symbol p; {
        int i;
        struct dag **q;

        for (i = 0; i < NELEMS(buckets); i++)
            for (q = &buckets[i]; *q; )
                if ((*q represents p's rvalue 316)) {
                    *q = (*q)->hlink;
                    --nodecount;
                } else
                    q = &(*q)->hlink;
    }

The obvious rvalue of p is a dag of the form (INDIR (ADDRxP p)), where
the ADDRxP is any of the address operators. The less obvious case is a
dag of the form (INDIR α) where α is an arbitrary address computation,
which might compute the address of p. Both cases are detected by
(*q represents p's rvalue 316)=
    generic((*q)->node.op) == INDIR &&
    (!isaddrop((*q)->node.kids[0]->op)
    || (*q)->node.kids[0]->syms[0] == p)
Only the INDIR nodes must be removed, because that's enough to make
subsequent searches fail. For example, after the assignment a = a/2,
the node for a+b remains in buckets. But the a+b in the assignment to d
won't find it because the reference to the rvalue of a builds a new node,
which causes a new node to be built for a+b. The sequence of calls to
node for this example appears in Table 12.2. The rvalue of a in d = a + b
reuses the lvalue but builds a new INDIRI node because the assignment
a = a/2 killed node 3. Assignments build nodes by calling newnode, so
they don't appear in buckets.

    Called With     Builds             Kills   Returns
    ADDRG+P c       1=(ADDRGP c)               1
    ADDRG+P a       2=(ADDRGP a)               2
    INDIR+I 2       3=(INDIRI 2)               3
    ADDRG+P b       4=(ADDRGP b)               4
    INDIR+I 4       5=(INDIRI 4)               5
    ADD+I 3 5       6=(ADDI 3 5)               6
    ASGN+I 1 6                                 7
    ADDRG+P a                                  2
    INDIR+I 2                                  3
    CNST+I 2        8=(CNSTI 2)                8
    DIV+I 3 8       9=(DIVI 3 8)               9
    ASGN+I 2 9                         3       10
    ADDRG+P d       11=(ADDRGP d)              11
    ADDRG+P a                                  2
    INDIR+I 2       12=(INDIRI 2)              12
    ADDRG+P b                                  4
    INDIR+I 4                                  5
    ADD+I 12 5      13=(ADDI 12 5)             13
    ASGN+I 11 13                               14

TABLE 12.2 Calls to node for c = a + b; a = a/2; d = a + b.

reset removes all nodes in buckets by clearing both buckets and
nodecount:
(dag.c functions)+=
    static void reset() {
        if (nodecount > 0)
            memset(buckets, 0, sizeof buckets);
        nodecount = 0;
    }
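kill's loop relies on a pointer to a pointer so that the first entry in a bucket and interior entries are unlinked by the same assignment. The following is a minimal sketch of that idiom on a single chain; the Entry type and the function remove_matching are hypothetical, and an integer key stands in for the test that identifies an rvalue of p.

```c
#include <assert.h>
#include <stddef.h>

typedef struct entry { int key; struct entry *hlink; } Entry;

/* Unlink every entry whose key equals `key`; returns the number removed.
   q always addresses the pointer that leads to the current entry. */
int remove_matching(Entry **head, int key) {
    Entry **q;
    int removed = 0;

    assert(head != NULL);
    for (q = head; *q; )
        if ((*q)->key == key) {
            *q = (*q)->hlink;     /* bypass the entry; q itself stays put */
            ++removed;
        } else
            q = &(*q)->hlink;     /* advance to the next link field */
    return removed;
}
```

Note that the loop advances q only when nothing is removed; after a removal, *q already names the next candidate.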

12.2 Building Nodes


1 i stnodes builds a node for its argument tree by calling itself recur-
sively on the tree's operands, calling node or newnode depending on the
operator, and calling ki 11 or reset when necessary.
318 CHAPTER 12 • GENERA TING INTERMEDIATE CODE

...
(dag.c functions)+= 317 321
Node listnodes(tp, tlab, flab) Tree tp; int tlab, flab; {
...
Node p =NULL, l, r;

if (tp == NULL)
return NULL;
if (tp->node)
return tp->node;
switch (generic(tp->op)) {
(1 i stnodes cases 318)
}
tp->node = p;
return p;
}

tp->node points to the node for the tree tp. This field marks the tree
as visited by listnodes, and ensures that listnodes returns the correct
node for trees that are really dags, such as those shown in Figures 8.2
and 8.3 (pages 167 and 158). The multiply referenced subtrees in these
idioms are visited more than once; the first visit builds the node and the
subsequent visits simply return it.
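The effect of the tp->node check can be seen in isolation. The following is a minimal sketch under simplifying assumptions: struct tree, visit, and nodes_built are hypothetical names, nodes are represented by integers, and the operator-specific work is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a tree whose subtrees may be shared. */
struct tree {
    struct tree *kids[2];  /* operands, or NULL */
    int node;              /* 0 until visited, then the node number */
};

static int nodes_built;    /* how many nodes the traversal has built */

/* Build a node for tp at most once; later visits return the same number. */
int visit(struct tree *tp) {
    if (tp == NULL)
        return 0;
    if (tp->node != 0)          /* already visited: reuse the node */
        return tp->node;
    visit(tp->kids[0]);
    visit(tp->kids[1]);
    tp->node = ++nodes_built;   /* annotate the tree with its node */
    assert(tp->node > 0);
    return tp->node;
}
```

A root whose two operands are the same subtree thus builds two nodes, not three: the second visit to the shared subtree returns the number recorded by the first.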
The switch statement in listnodes collects the operators into groups
that have the same traversal and node-building code:
(listnodes cases 318)=
    case AND:   { (AND 323) } break;
    case OR:    { (OR) } break;
    case NOT:   { (NOT 322) }
    case COND:  { (COND 325) } break;
    case CNST:  { (CNST 327) } break;
    case RIGHT: { (RIGHT 335) } break;
    case JUMP:  { (JUMP 321) } break;
    case CALL:  { (CALL 332) } break;
    case ARG:   { (ARG 334) } break;
    case EQ: case NE: case GT: case GE: case LE:
    case LT:    { (EQ..LT 321) } break;
    case ASGN:  { (ASGN 328) } break;
    case BOR: case BAND: case BXOR:
    case ADD: case SUB: case RSH:
    case LSH:   { (ADD..RSH 319) } break;
    case DIV: case MUL:
    case MOD:   { (DIV..MOD) } break;
    case RET:   { (RET) } break;
    case CVC: case CVD: case CVF: case CVI:
    case CVP: case CVS: case CVU: case BCOM:
    case NEG:   { (CVx,NEG,BCOM 319) } break;
    case INDIR: { (INDIR 319) } break;
    case FIELD: { (FIELD 320) } break;
    case ADDRG:
    case ADDRF: { (ADDRG,ADDRF 319) } break;
    case ADDRL: { (ADDRL 319) } break;
The largest operator group is the one for the unary operators. The
traversal code visits the lone operand and builds the node:
(CVx,NEG,BCOM 319)=
    l = listnodes(tp->kids[0], 0, 0);
    p = node(tp->op, l, NULL, NULL);
The traversal code for the binary operators is similar:
(ADD..RSH 319)=
    l = listnodes(tp->kids[0], 0, 0);
    r = listnodes(tp->kids[1], 0, 0);
    p = node(tp->op, l, r, NULL);
DIV, MUL, and MOD aren't included in this case because they must be
treated as calls if the interface flag mulops_calls is set; see Exercise 12.5.
The three address operators build nodes for the lvalues of the symbols
they reference:
(ADDRG,ADDRF 319)=
    p = node(tp->op, NULL, NULL, tp->u.sym);

(ADDRL 319)=
    if (tp->u.sym->temporary)
        addlocal(tp->u.sym);
    p = node(tp->op, NULL, NULL, tp->u.sym);
If a local is a temporary, it may not yet appear on the code list. addlocal
adds a Local code list entry for the temporary, if necessary. These
entries are not made earlier because some temporaries are never used and
thus need never be announced. Waiting until the last possible moment to
generate Local code-list entries effectively discards unused temporaries.
INDIR trees build nodes for rvalues, but locations declared volatile
demand special treatment:
(INDIR 319)=
    Type ty = tp->kids[0]->type;
    l = listnodes(tp->kids[0], 0, 0);
    if (isptr(ty))
        ty = unqual(ty)->type;
    if (isvolatile(ty)
    || (isstruct(ty) && unqual(ty)->u.sym->u.s.vfields))
        p = newnode(tp->op, l, NULL, NULL);
    else
        p = node(tp->op, l, NULL, NULL);
If the lvalue has a type (POINTER T), INDIR is treated like the other unary
operators. But if the lvalue has a type (POINTER (VOLATILE T)), every
read of the rvalue that appears in the source code must actually read the
rvalue at execution. This constraint means that the rvalue must not be
treated as a common subexpression, so the node is built by newnode. This
constraint also applies to lvalues with types (POINTER (STRUCT ...))
and (POINTER (UNION ...)) where one or more fields of the structure
or union are declared volatile.
Bit fields are referenced by FIELD trees. An assignment to a bit field
appears as an ASGN tree with a FIELD tree as its left operand; the case for
ASGN, described in Section 12.4, detects this idiom. Appearances of FIELD
in other trees denote the rvalue of the bit field. FIELD operators can
appear only in trees, so listnodes must synthesize bit-field extraction
from other operations, such as shifting and masking.
There are two cases for extracting a bit field of s bits that lies m bits
from the right of the unsigned or integer in which it appears. If the field
is unsigned, it could be extracted by shifting it to the right m bits then
ANDing it with a mask of s ones. If the field is signed, however, its most
significant bit is treated as a sign bit and must be extended when the
field is fetched. Thus, a signed field can be extracted by code like
    ((int)((*p)<<(32 - m)))>>(32 - s)
assuming a 32-bit word and that p points to the word that holds the field.
This expression shifts the word left so the field's most significant bit is
in the sign bit, then shifts it right arithmetically, which drags the sign bit
into the vacated bits. This expression also works for the unsigned case
by replacing the cast to an int with a cast to an unsigned, which is what
listnodes uses for both cases.
(FIELD 320)=
    Tree q = shtree(RSH,
        shtree(LSH, tp->kids[0],
            consttree(fieldleft(tp->u.field), inttype)),
        consttree(8*tp->type->size - fieldsize(tp->u.field),
            inttype));
    p = listnodes(q, 0, 0);
fieldleft is 32 - m and the first argument to consttree is 32 - s. The
type of the tree built by the inner call to shtree depends on the type of
tp->kids[0] and will be int or unsigned, which causes the outer shtree
to generate an RSH+I or an RSH+U.
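The shift-left-then-shift-right idiom can be tried directly in C. This is a minimal sketch under stated assumptions: the function names are hypothetical, the word is 32 bits, right counts the bits to the right of the field, and the left-shift amount is taken to be the number of bits to the left of the field. Because right-shifting a negative int is implementation-defined in C, the signed case emulates the arithmetic shift portably instead of relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Extract an unsigned field of `size` bits that has `right` bits to its
   right in a 32-bit word: shift it to the top, then shift it back down. */
uint32_t extract_unsigned(uint32_t word, int size, int right) {
    int left;

    assert(size > 0 && right >= 0 && size + right <= 32);
    left = 32 - size - right;              /* bits to the left of the field */
    return (word << left) >> (32 - size);  /* logical shift: zero fill */
}

/* Extract the same field as a signed value, sign-extending its top bit. */
int32_t extract_signed(uint32_t word, int size, int right) {
    uint32_t v = extract_unsigned(word, size, right);
    uint32_t sign = (uint32_t)1 << (size - 1);  /* the field's sign bit */
    uint32_t ones = sign | (sign - 1);          /* `size` one bits */

    if (v & sign)                       /* negative: v - 2^size, computed */
        return -(int32_t)(ones ^ v) - 1;   /* without overflowing */
    return (int32_t)v;
}
```

For the word 0xA5, the 4-bit field with four bits to its right holds the bit pattern 1010, which reads as 10 when unsigned and as -6 when signed.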

12.3 Flow of Control


The unary and binary operators described in the previous section con-
tribute nodes to the node table, and they're referenced by other nodes,
but, except for INDIR nodes, they never appear as roots in the forest. A
node appears as a root if it has a side effect or if it must be evaluated
before the nodes in the dags further down the forest. The appearance of
the INDIRP node as a root in Figure 5.3 on page 87 is an example of this
second case. Assignments, calls, returns, labels, jumps, and conditional
jumps are examples of the first case. Some of these operators also af-
fect the node table because they can alter flow of control. Jumps are the
simplest:
(JUMP 321)=
    l = listnodes(tp->kids[0], 0, 0);
    list(newnode(JUMPV, l, NULL, NULL));
    reset();
The node table must be reset at jumps since none of the expressions in
the table can be used in the code that follows the jump. The JUMPV node
is listed (appended to the forest as a root) by list:
(dag.c functions)+=
static void list(p) Node p; {
    if (p && p->link == NULL) {
        if (forest) {
            p->link = forest->link;
            forest->link = p;
        } else
            p->link = p;
        forest = p;
    }
}

forest is a circularly linked list, so it points to the last node on the list,
unless it's null, which causes list to initialize forest. The link field
also marks the node as a root, and list won't list roots more than once.
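Keeping a pointer to the last element of a circular list gives constant-time appends while still allowing the list to be walked from the first element. A minimal sketch, assuming a stripped-down root structure (struct root, list_root, and roots_in_order are hypothetical names; the append logic mirrors list above):

```c
#include <assert.h>
#include <stddef.h>

struct root {
    int id;
    struct root *link;    /* next root; NULL if not yet listed */
};

static struct root *forest;   /* last root listed, or NULL if none */

/* Append p to the circular list unless it is already on it. */
void list_root(struct root *p) {
    if (p != NULL && p->link == NULL) {
        if (forest != NULL) {
            p->link = forest->link;  /* new last node points to the first */
            forest->link = p;        /* old last node points to the new one */
        } else
            p->link = p;             /* a single node points to itself */
        forest = p;
    }
}

/* Store up to max root ids, first to last; return the number of roots. */
int roots_in_order(int ids[], int max) {
    int n = 0;
    struct root *p;

    assert(max >= 0);
    if (forest == NULL)
        return 0;
    p = forest->link;                /* the first root follows the last */
    do {
        if (n < max)
            ids[n] = p->id;
        n++;
        p = p->link;
    } while (p != forest->link);
    return n;
}
```

Listing three roots and then relisting the second leaves three roots in their original order, because a nonnull link field already marks the second as listed.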
The comparison operators illustrate the use of the arguments tlab
and flab to listnodes. Only one of tlab or flab can be nonzero. The
operator jumps to tlab if the outcome of the comparison is true and to
flab if the outcome is false. Nodes for comparison operators carry the
destination as a label symbol in their syms[0] fields. This symbol is the
destination when the comparison is true; there is no way to specify a
destination for a false outcome. The case for the comparison operators
thus uses the inverse operator when flab is nonzero:
(EQ..LT 321)=
    Node p;

    l = listnodes(tp->kids[0], 0, 0);
    r = listnodes(tp->kids[1], 0, 0);
    if (tlab)
        list(newnode(tp->op, l, r, findlabel(tlab)));
    else if (flab) {
        int op = generic(tp->op);
        switch (generic(op)) {
        case EQ: op = NE + optype(tp->op); break;
        case NE: op = EQ + optype(tp->op); break;
        case GT: op = LE + optype(tp->op); break;
        case LT: op = GE + optype(tp->op); break;
        case GE: op = LT + optype(tp->op); break;
        case LE: op = GT + optype(tp->op); break;
        }
        list(newnode(op, l, r, findlabel(flab)));
    }
    if (forest->syms[0])
        forest->syms[0]->ref++;
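Jumping to flab when a comparison is false is the same as jumping when its inverse is true. The following sketch checks that property for integer operands; the enumeration and the functions inverse and holds are hypothetical and stand apart from lcc's operator encoding.

```c
#include <assert.h>

enum relop { EQ, NE, GT, GE, LT, LE };

/* Return the operator that is true exactly when op is false. */
enum relop inverse(enum relop op) {
    switch (op) {
    case EQ: return NE;
    case NE: return EQ;
    case GT: return LE;
    case LT: return GE;
    case GE: return LT;
    case LE: return GT;
    }
    assert(!"unknown comparison operator");
    return op;
}

/* Evaluate `a op b` for integer operands. */
int holds(enum relop op, int a, int b) {
    switch (op) {
    case EQ: return a == b;
    case NE: return a != b;
    case GT: return a > b;
    case GE: return a >= b;
    case LT: return a < b;
    case LE: return a <= b;
    }
    assert(!"unknown comparison operator");
    return 0;
}
```

So a false-label jump for a > b can be emitted as a true-label jump for a <= b, which is the substitution the switch statement above performs.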
listnodes also handles the control-flow operators that appear only in
trees: AND, OR, NOT, and COND. NOT is handled by simply reversing the true
and false labels in a recursive call to listnodes:
(NOT 322)=
    return listnodes(tp->kids[0], flab, tlab);
AND and OR use short-circuit evaluation: They must stop evaluating
their arguments as soon as the outcome is known. For example, in
    if (i >= 0 && i < 10 && a[i] > max) max = a[i];
a[i] must not be evaluated if i is less than zero or greater than or equal
to 10. The operands of AND and OR are always conditional expressions or
constants (andtree calls cond for each operand), so the cases for these
operators need only define the appropriate true and false labels and pass
them to the calls on listnodes for the operands.
Suppose tlab is zero and flab is L; the short-circuit code generated
for e1 && e2 has the form
    if e1 == 0 goto L
    if e2 == 0 goto L
In other words, if e1 is zero, execution continues at L and e2 is not
evaluated. Otherwise, e2 is evaluated and execution continues at L if e1 is
nonzero but e2 is zero. Control falls through only when both e1 and e2
are nonzero. When tlab is L and flab is zero, control falls through when
e1 or e2 is zero, and execution continues at L only when e1 and e2 are
both nonzero. The generated code has the form
    if e1 == 0 goto L'
    if e2 != 0 goto L
L':
In this case, if e1 is zero, control falls through without evaluating e2. The
listnodes code for AND is thus
(AND 323)=
    if (flab) {
        listnodes(tp->kids[0], 0, flab);
        listnodes(tp->kids[1], 0, flab);
    } else {
        flab = genlabel(1);
        listnodes(tp->kids[0], 0, flab);
        listnodes(tp->kids[1], tlab, 0);
        labelnode(flab);
    }

The code for OR is similar; see Exercise 12.2.
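The two code shapes for AND can be reproduced with a small emitter that writes text instead of building nodes. This is a minimal sketch, not lcc's implementation: struct cond, emit_cond, the output buffer out, and the label counter are hypothetical, and leaves stand for arbitrary conditional expressions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kind { LEAF, AND_ };
struct cond {
    enum kind kind;
    const char *name;            /* leaf name, e.g. "e1" */
    const struct cond *kids[2];  /* operands of AND_ */
};

static char out[512];            /* generated code, one line per jump */
static int nextlabel = 1;

static int genlabel(int n) { int lab = nextlabel; nextlabel += n; return lab; }

static void append(const char *s) {
    assert(strlen(out) + strlen(s) < sizeof out);
    strcat(out, s);
}

/* Emit jumps for tp; exactly one of tlab and flab must be nonzero. */
void emit_cond(const struct cond *tp, int tlab, int flab) {
    char line[64];

    assert((tlab != 0) != (flab != 0));
    if (tp->kind == LEAF) {
        if (tlab)
            snprintf(line, sizeof line, "if %s != 0 goto L%d\n", tp->name, tlab);
        else
            snprintf(line, sizeof line, "if %s == 0 goto L%d\n", tp->name, flab);
        append(line);
    } else if (flab) {           /* both operands jump to the false label */
        emit_cond(tp->kids[0], 0, flab);
        emit_cond(tp->kids[1], 0, flab);
    } else {                     /* need a local false label L' */
        flab = genlabel(1);
        emit_cond(tp->kids[0], 0, flab);
        emit_cond(tp->kids[1], tlab, 0);
        snprintf(line, sizeof line, "L%d:\n", flab);
        append(line);
    }
}
```

Calling emit_cond on a tree for e1 && e2 with flab equal to 7 produces the first form shown above; calling it with tlab equal to 7 produces the second, with the generated label playing the role of L'.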


labelnode appends a LABELV node to the forest:
(dag.c functions)+=
static void labelnode(lab) int lab; {
    if (forest && forest->op == LABELV)
        equatelab(findlabel(lab), forest->syms[0]);
    else
        list(newnode(LABELV, NULL, NULL, findlabel(lab)));
    reset();
}

If the last node in the forest is a label, there's no reason to append
another one; the new label, lab, is made a synonym for the existing label
as described in Section 10.9. Common subexpressions in the node table
must be discarded at a label because there can be more than one path
to the subsequent code, so labelnode calls reset.
As detailed in Section 8.6, OR and AND are treated as right-associative
operators, so expressions such as e1 && e2 && ... && en build right-heavy
trees, as depicted in Figure 12.2.
This arrangement guarantees that the recursive calls to listnodes in
the code above visit the expressions ei in the correct order to yield
short-circuit evaluation. It also helps eliminate common subexpressions that
appear in the ei. For example, in
    if (a[i] && a[i]+b[i] > 0 && a[i]+b[i] < 10) ...
where a and b are integer arrays, the address computation 4*i, the
rvalues of a[i] and b[i], and the sum a[i]+b[i] are each computed once
and reused as necessary. The AND tree is passed to listnodes with
flab equal to, say, 2, and the recursive calls descend the tree
passing 2 as flab. There are no intervening calls to reset, so the second
and third subexpressions can reuse values computed in the first and second
subexpressions. The forest for this statement is shown in Figure 12.3.
The 2s under the comparison operators and under the LABELV denote
the symbol-table pointers to the label 2 in their syms[0] fields.

FIGURE 12.2 Tree for e1 && e2 && ... && en.

The expression e ? l : r yields the COND tree shown in Figure 12.4.
The RIGHT tree serves only to carry the two assignment trees. The
generated code is similar to the code for an if statement:
        if e == 0 goto L
        t1 = l
        goto L + 1
L:      t1 = r
L + 1:

FIGURE 12.3 Forest for a[i] && a[i]+b[i] > 0 && a[i]+b[i] < 10.
The roots, in order, are EQI, LEI, GEI, ..., LABELV.

FIGURE 12.4 Tree for e ? l : r:
    (COND e (RIGHT (ASGN (ADDRL+P t1) l) (ASGN (ADDRL+P t1) r)))

The rvalue of t1 is the result of the COND expression. The assignments to
t1 are omitted if the value of the conditional expression is not used or
if both l and r are void expressions. The code for COND begins by adding
a LOCAL code list entry for t1, if it's present; generating L and L + 1; and
traversing the conditional expression e with L as the false label.
(COND 325)=
    Tree q = tp->kids[1];

    if (tp->u.sym)
        addlocal(tp->u.sym);
    flab = genlabel(2);
    listnodes(tp->kids[0], 0, flab);
    reset();
Next, the code for this case generates nodes for the first assignment, the
jump, L, the second assignment, and L + 1:
(COND 325)+=
    listnodes(q->kids[0], 0, 0);
    (equate LABEL to L + 1 325)
    list(jump(flab + 1));
    labelnode(flab);
    listnodes(q->kids[1], 0, 0);
    (equate LABEL to L + 1 325)
    labelnode(flab + 1);

(equate LABEL to L + 1 325)=
    if (forest->op == LABELV) {
        equatelab(forest->syms[0], findlabel(flab + 1));
        unlist();
    }

If the last node in either arm is a label, it can be equated to L + 1 and
removed from the forest, which is what unlist does. Removing this label
in the first arm can eliminate a branch to a branch; see Exercise 12.7.
The value of the conditional expression, if there is one, is in the
temporary t1. The COND tree has contributed the nodes that generate the
assignments to t1, but the COND tree itself has no value. The node that's
returned, and that annotates the COND tree, is the result of generating
a node for the rvalue of t1:
(COND 325)+=
    if (tp->u.sym)
        p = listnodes(idtree(tp->u.sym), 0, 0);
The call to reset after traversing tp->kids[0], the tree for e, discards
common subexpressions before traversing l or r, because conditional
operands may be evaluated before other operands in subexpressions.
For example, in the assignment
    n = a[i] + (i>0?a[i]:0);
the conditional is evaluated before the addition's left operand even
though the left operand's tree is traversed first. Without the call to reset,
the node for the common subexpression a[i] would appear in the node
table when the tree for (i>0?a[i]:0) is traversed, and the generated
code would be equivalent to
        if i <= 0 goto L1
        t2 = a[i]
        t1 = t2
        goto L2
L1:     t1 = 0
L2:     n = t2 + t1
where t1 holds the value of the conditional, and t2 holds the value of
a[i]. t2 is computed only in the then arm of the conditional, but is used
to compute the sum. reset must be called whenever the evaluation order
might be different from the traversal order, and when the generated code
might have multiple execution paths.
Most constants appear as operands to the unary and binary operators,
but constant folding makes it possible for an integer constant to appear
as the first operand to the comparisons and hence to COND. For example,
the statement
    if (2.5) ...
causes conditional to build the expression 2.5 != 0.0, which simplify
folds to a tree for the integer constant 1. ifstmt passes this tree to
listnodes with a nonzero flab. For an integer CNST tree, listnodes
generates a jump if tlab is nonzero and the constant is nonzero, or if
flab is nonzero and the constant is zero:
(CNST 327)=
    Type ty = unqual(tp->type);
    if (tlab || flab) {
        if (tlab && tp->u.v.i != 0)
            list(jump(tlab));
        else if (flab && tp->u.v.i == 0)
            list(jump(flab));
    }
For the example above, nothing is generated for the CNST tree, which is
exactly what the programmer intended. A jump is generated for code
like
    while (1) ...
Constants that don't appear in conditional contexts yield CNST nodes,
unless their types dictate that they should be placed out of line:
(CNST 327)+=
    else if (ty->u.sym->addressed)
        p = listnodes(cvtconst(tp), 0, 0);
    else
        p = node(tp->op, NULL, NULL, constant(ty, tp->u.v));
typeinit sets the addressed flag in a basic type's symbol-table entry to
one if constants of that type cannot appear in instructions. Thus, a
constant whose type's symbol has addressed set is placed in a variable and
references to it are replaced by references to the rvalue of the variable.
The constant 0.5 in Figure 1.2 (page 6) is an example; it appears in the
tree, but ends up in a variable as shown in Figure 1.3. cvtconst
generates the anonymous static variable, if necessary, and returns a tree for
the rvalue of that variable:
(dag.c functions)+=
Tree cvtconst(p) Tree p; {
    Symbol q = constant(p->type, p->u.v);
    Tree e;

    if (q->u.c.loc == NULL)
        q->u.c.loc = genident(STATIC, p->type, GLOBAL);
    if (isarray(p->type)) {
        e = tree(ADDRG+P, atop(p->type), NULL, NULL);
        e->u.sym = q->u.c.loc;
    } else
        e = idtree(q->u.c.loc);
    return e;
}
These variables are initialized when finalize calls doconst at the end
of compilation.

12.4 Assignments
Nodes for assignments are always listed and return no value. Trees for
assignments, however, mirror the semantics of assignment in C, which
returns the value of its left operand. The listnodes case for assignment
traverses the operands, and builds and lists the assignment node. It
begins by processing the operands:
(ASGN 328)=
    if (tp->kids[0]->op == FIELD) {
        (l, r ← for a bit-field assignment 329)
    } else {
        l = listnodes(tp->kids[0], 0, 0);
        r = listnodes(tp->kids[1], 0, 0);
    }
    list(newnode(tp->op, l, r, NULL));
    forest->syms[0] = intconst(tp->kids[1]->type->size);
    forest->syms[1] = intconst(tp->kids[1]->type->align);
An ASGN's syms fields point to symbol-table entries for constants that
give the size and alignment of the value (see page 83). Assignments to
bit fields are described below.
An assignment invalidates nodes in the node table that depend on the
previous value of the left operand. lcc handles just two cases:
(ASGN 328)+=
    if (isaddrop(tp->kids[0]->op)
    && !tp->kids[0]->u.sym->computed)
        kill(tp->kids[0]->u.sym);
    else
        reset();
If the left operand is the address of a source-language variable or a
temporary, the assignment kills only nodes for its rvalue. If the left operand
is the address of a computed variable or a computed address, the
assignment clears the node table. A computed variable represents the address
of a variable plus a constant, such as a field reference, and is generated by
addrtree. Assignments to computed variables are like assignments to
array elements: an assignment to a single element kills everything. Less
drastic measures require more sophisticated analyses; those that offer
the most benefit, like global common subexpression elimination, require
data-flow analysis of the entire function, which lcc is not designed to
accommodate.
The value of an assignment is the new value of the left operand, which
is the possibly converted value of the right operand, so listnodes
arranges for that node to annotate the ASGN tree:
(ASGN 328)+=
    p = listnodes(tp->kids[1], 0, 0);
tp->kids[1] has already been visited and annotated with the node that's
assigned to r above. So p usually equals r, except for assignments to bit
fields, which compute r differently and may not visit tp->kids[1] at all,
as detailed below.
A FIELD tree as the left operand of an ASGN tree identifies an
assignment to a bit field. These assignments are compiled into the
appropriate sequence of shifts and bitwise logical operations. Consider, for
example, the multiple assignment w = x.amt = y where x is defined in
Figure 11.1 on page 279 and w and y are global integers. The first
assignment, x.amt = y, is compiled into the equivalent of
    *β = ((*β)&0xFFFFFF80) | (y&0x7F);
where β denotes the address x+16. The word at x+16 is fetched, the bits
corresponding to the field amt are cleared, the rightmost 7 bits of y are
ORed into the cleared bits, and the resulting value is stored back into
x+16. This expression isn't quite correct: The value of x.amt = y, which
is assigned to w, is not y, it's the new value of x.amt. This value is equal
to y unless its most significant bit is one, in which case that bit must be
sign-extended if the result of the assignment is used. So, if y is 255, w
becomes -1.
listnodes handles this case by building an ASGN tree whose right
operand computes the correct value. For w = x.amt = y, it builds a right
operand that's equivalent to (y<<25)>>25, which is what should appear
in place of y in the assignment to *β above. Figure 12.5 shows the
complete tree for this multiple assignment.
The code for a bit-field assignment builds a tree for the expression
shown above, and calls listnodes to traverse it.
(l, r ← for a bit-field assignment 329)=
    Tree x = tp->kids[0]->kids[0];
    Field f = tp->kids[0]->u.field;
    reset();
    l = listnodes(lvalue(x), 0, 0);
    if (fieldsize(f) < 8*f->type->size) {
        unsigned int fmask = fieldmask(f);
        unsigned int mask = fmask<<fieldright(f);
        Tree q = tp->kids[1];
        (q ← the r.h.s. tree 330)
        r = listnodes(q, 0, 0);
    } else
        r = listnodes(tp->kids[1], 0, 0);
The u.field field of the FIELD tree tucked under the ASGN tree points to a
field structure that describes the bit field. For the amt field, fmask and
mask are both 0x7F; the complement of mask is 0xFFFFFF80. listnodes
treats an assignment to a bit field like an assignment to an array element,
and thus clears the node table.

FIGURE 12.5 Tree for w = x.amt = y:
    (ASGN+I (ADDRG+P w)
        (ASGN+I (FIELD (INDIR+I (ADDRG+P x+16)))
            (RSH+I (LSH+I (INDIR+I (ADDRG+P y)) (CNST+I 25)) (CNST+I 25))))
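The store into the word and the value of the assignment expression can both be computed in ordinary C. The following is a minimal sketch for a signed field in a 32-bit word; assign_signed_field and its parameters are hypothetical names, and fmask and mask play the same roles as in the fragment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `size` bits of y into the field that has `right` bits to
   its right in *word. Return the value of the assignment expression,
   which is the new, sign-extended value of the field, not y itself. */
int32_t assign_signed_field(uint32_t *word, int size, int right, int32_t y) {
    uint32_t fmask, mask, bits, sign;

    assert(word != NULL && size > 0 && right >= 0 && size + right <= 32);
    fmask = size == 32 ? 0xFFFFFFFFu : (((uint32_t)1 << size) - 1);
    mask = fmask << right;              /* the field's position in the word */
    bits = (uint32_t)y & fmask;         /* rightmost `size` bits of y */
    *word = (*word & ~mask) | (bits << right);  /* clear, then OR in */
    sign = (uint32_t)1 << (size - 1);
    if (bits & sign)                    /* sign bit set: extend it */
        return -(int32_t)(fmask ^ bits) - 1;
    return (int32_t)bits;
}
```

With a 7-bit field at the right end of the word, assigning 255 stores the bit pattern 0x7F and yields -1, which matches the value the text gives for w.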
There are two cases of assigning constants to bit fields that merit
special treatment. If the constant is zero, the assignment clears the field,
which can be done more simply by ANDing the word with the complement
of mask:
(q ← the r.h.s. tree 330)=
    if (q->op == CNST+I && q->u.v.i == 0
    ||  q->op == CNST+U && q->u.v.u == 0)
        q = bittree(BAND, x, consttree(~mask, unsignedtype));
If the constant is equal to 2^s - 1 where s is the size of the bit field, the
assignment sets all of the bits in the field, which can be done by ORing
the word with mask:
(q ← the r.h.s. tree 330)+=
    else if (q->op == CNST+I && (q->u.v.i&fmask) == fmask
    ||       q->op == CNST+U && (q->u.v.u&fmask) == fmask)
        q = bittree(BOR, x, consttree(mask, unsignedtype));
These improvements make assignments of constants to 1-bit fields as
efficient as the more verbose logical operations. For example, x.used = 1
is compiled into the equivalent of
    *α = *α | 0x8;
where α denotes the address x+12.
The general case requires the two ANDs and the OR shown in the
assignment to *β above.
(q ← the r.h.s. tree 330)+=
    else {
        listnodes(q, 0, 0);
        q = bittree(BOR,
            bittree(BAND, rvalue(lvalue(x)),
                consttree(~mask, unsignedtype)),
            bittree(BAND, shtree(LSH, cast(q, unsignedtype),
                consttree(fieldright(f), inttype)),
                consttree(mask, unsignedtype)));
    }

Figure 12.6 shows the forest for the multiple assignment w = x.amt = y,
which falls into this general case. In all three cases, q is the tree for the
right-hand side of the assignment to the bit field, but tp->kids[1] is
the tree that represents the value of the assignment. For example, it's
the RSHI node in Figure 12.6 that annotates the bit-field ASGN+I tree in
Figure 12.5, and is thus used as the right operand for the assignment to w.
The final else clause in (q ← the r.h.s. tree) rebuilds the rvalue of the
word that holds the field because that word may have been changed by
tp->kids[1]. For example, in
    struct { int b:4, c:4; } x;
    x.c = x.b++;
x.b++ changes the word that holds x.c. The tree returned by bittree's
second argument, rvalue(lvalue(x)) in the fragment above, causes that
word to be fetched again. If x were used instead, the assignment to x.c
would use the value of the word before x.b is changed, and the new value
of x.b would be lost.

FIGURE 12.6 Forest for w = x.amt = y.
The roots, in order, are the ASGNI that stores into x+16 and the ASGNI
that stores into w.

12.5 Function Calls


Figure 12.7 shows the form of a CALL+B tree, which is the most general
form of CALL. As explained in Section 9.3, the right operand is the address
of a temporary to which the value returned is assigned. The other CALLs
have only one operand.
CALL+B complicates the listnodes case for CALL, because if the
interface flag wants_callb is zero, the right operand is passed as a hidden
first argument and a CALLV node is used instead of a CALLB node. Another
complication is that the ARG trees appear down in the CALL tree's left
operand, but the corresponding nodes are listed, and the CALL node's
left operand is the address of the function (see page 85). The
interface flag left_to_right supplies the last complication: The arguments
are evaluated left to right if that flag is one and right to left if it's zero.
This evaluation order also applies to the hidden first argument when
wants_callb is zero. Figure 12.8 shows the form of forest generated for
the tree in Figure 12.7 when wants_callb is zero and left_to_right is
one. The leading ARGP node is the hidden first argument.
The listnodes case for CALL is
(CALL 332)=
    Tree save = firstarg;
    firstarg = NULL;
    if (tp->op == CALL+B && !IR->wants_callb) {
        (list CALL+B arguments 333)
        p = newnode(CALLV, l, NULL, NULL);
    } else {
        l = listnodes(tp->kids[0], 0, 0);
        r = listnodes(tp->kids[1], 0, 0);
        p = newnode(tp->op, l, r, NULL);
    }
    list(p);
    reset();
    cfunc->u.f.ncalls++;
    firstarg = save;

(dag.c data)+=
static Tree firstarg;

FIGURE 12.7 Tree for f(e1, e2, ..., en) where f returns a structure:
    (CALL+B (RIGHT (ARG en (... (ARG e2 (ARG e1)))) f) (ADDRL+P temp))

FIGURE 12.8 Forest for f(e1, e2, ..., en) where f returns a structure.
The roots, in order, are ARGP (whose operand is ADDRLP temp), ARG, ARG,
..., ARG, CALLV.
When necessary, firstarg carries the tree for the hidden first argument,
as described below. It's saved, reinitialized to null, and restored so that
arguments that include other calls don't overwrite it. A call is always
listed, and it kills all nodes in the node table. The ncalls field in a
symbol-table entry for a function records the number of CALLs that
function makes. This value supplies the fourth argument to the interface
procedure function, which is called from funcdefn.
As Figure 12.7 shows, tp->kids[0] is a RIGHT tree that holds both the
arguments and the tree for the function address. Traversing this tree
thus lists the arguments and returns the node for the function address,
which becomes the left operand of the CALL node.
For CALL+B trees, the tree for the hidden first argument is assigned to
firstarg:
(list CALL+B arguments 333)=
    Tree arg0 = tree(ARG+P, tp->kids[1]->type,
        tp->kids[1], NULL);
    if (IR->left_to_right)
        firstarg = arg0;
    l = listnodes(tp->kids[0], 0, 0);
    if (!IR->left_to_right || firstarg) {
        firstarg = NULL;
        listnodes(arg0, 0, 0);
    }

If left_to_right is one, firstarg gets the tree for the hidden argument
just before the arguments are visited, which occurs when listnodes
traverses tp->kids[0], and the hidden argument will be listed before the
other arguments. When left_to_right is zero, firstarg is unnecessary
because the hidden argument is listed last anyway.
The last if statement in the fragment above also traverses the tree
for the hidden argument when left_to_right is one and firstarg is
nonnull. This case occurs for a call to a function that returns a structure
but that has no arguments. For this case, the firstarg will not have
been traversed by the ARG code below because tp->kids[0] contains no
ARG trees for this call.
An ARG subtree is built as the arguments are parsed from left to right,
and thus it always has the rightmost argument as the root, as shown
in Figure 12.7. The ARG nodes can be listed left to right by visiting
tp->kids[1] before tp->kids[0]; visiting the operands in the other
order lists the ARG nodes right to left.
(ARG 334)=
    if (IR->left_to_right)
        listnodes(tp->kids[1], 0, 0);
    if (firstarg) {
        Tree arg = firstarg;
        firstarg = NULL;
        listnodes(arg, 0, 0);
    }
    l = listnodes(tp->kids[0], 0, 0);
    list(newnode(tp->op, l, NULL, NULL));
    forest->syms[0] = intconst(tp->type->size);
    forest->syms[1] = intconst(tp->type->align);
    if (!IR->left_to_right)
        listnodes(tp->kids[1], 0, 0);
Like an ASGN node, the syms field of an ARG node points to symbol-table
entries for constants that give the size and alignment of the argument.
The first time execution reaches the test of firstarg when the flag
left_to_right is one is when the ARG for the first argument (the one
for e1 in Figure 12.7) is traversed. If firstarg is nonnull, it's listed
before the tree for the first argument and reset to null so that it's traversed
only once.
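The ordering logic, including the hidden first argument, can be exercised on a simplified ARG chain. This is a minimal sketch under stated assumptions: arguments are plain integers, struct arg, record, list_args, and list_call_args are hypothetical names, and only the control flow of the two fragments above is mirrored.

```c
#include <assert.h>
#include <stddef.h>

struct arg {
    int value;               /* stands for the argument expression */
    const struct arg *rest;  /* ARG tree for the arguments to the left */
};

static const int *firstarg;  /* pending hidden first argument, or NULL */
static int listed[8];        /* arguments in the order they were listed */
static int nlisted;

static void record(int v) {
    assert(nlisted < 8);
    listed[nlisted++] = v;
}

/* List the arguments rooted at tp; the root is the rightmost argument. */
void list_args(const struct arg *tp, int left_to_right) {
    if (tp == NULL)
        return;
    if (left_to_right)
        list_args(tp->rest, left_to_right);
    if (firstarg != NULL) {      /* first ARG reached: list the hidden one */
        const int *hidden = firstarg;
        firstarg = NULL;
        record(*hidden);
    }
    record(tp->value);
    if (!left_to_right)
        list_args(tp->rest, left_to_right);
}

/* List a call's arguments plus a hidden argument, as for CALL+B. */
void list_call_args(const struct arg *args, const int *hidden, int left_to_right) {
    assert(hidden != NULL);
    firstarg = left_to_right ? hidden : NULL;
    list_args(args, left_to_right);
    if (!left_to_right || firstarg != NULL) {  /* not yet listed */
        firstarg = NULL;
        record(*hidden);
    }
}
```

For the chain built from arguments 1, 2, and 3 with hidden argument 99, the left-to-right order is 99, 1, 2, 3 and the right-to-left order is 3, 2, 1, 99; with no arguments, the hidden argument is listed by the final test alone.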

12.6 Enforcing Evaluation Order


There are only a few operators for which the standard specifies an
order of evaluation. It specifies short-circuit evaluation for AND and OR
and the usual if-statement evaluation for COND. The left operand of the
comma operator must be evaluated before its right operand. lcc
represents e1, e2 with the tree (RIGHT e1 e2), so the listnodes case for
RIGHT evaluates the left operand first, then evaluates and returns the
right operand:
(RIGHT 335)=
    if ((tp is a tree for e++ 336)) {
        (generate nodes for e++ 336)
    } else if (tp->kids[1]) {
        listnodes(tp->kids[0], 0, 0);
        p = listnodes(tp->kids[1], tlab, flab);
    } else
        p = listnodes(tp->kids[0], tlab, flab);
As Chapters 8 and 9 and this code suggest, RIGHT trees are used for
purposes other than the comma operator, and they may have one or two
operands. The value of a RIGHT tree is the value of its rightmost operand.
For example, RIGHT trees are used to unnest nested calls, those that
have calls as arguments. call hoists all such arguments into the left
operand of a RIGHT tree so that they are listed before the ARGs of the call
in which they appear. Figure 12.9 shows the tree for f(e1, g(e2), e3).
149 OR
149 RIGHT
[Figure 12.9: tree for f(e1, g(e2), e3)]


ARG --> CALL --> ARG --> ARG --> ARG --> CALL

FIGURE 12.10 Forest for f(e1, g(e2), e3).

Compare this figure to Figure 12.7. The arguments in Figure 12.9 appear
as the right operand of an extra RIGHT node, which is shaded, and
the left operand of that node is the tree for the nested call g(e2). f's
second ARG node refers to the value of the call to g.
listnodes traverses this tree in the code for CALL described in
Section 12.5. When the left and only operand of the topmost CALL is
traversed, the RIGHT trees cause the nested call to g to be traversed and
listed before any of the arguments to f. Figure 12.10 shows the
resulting forest.
RIGHT trees are also used to enforce the correct semantics for the
expressions e++ and e--. Figure 12.11 shows the tree built by postfix for
i++. The RIGHT nodes collaborate to return the value of i before it's
incremented, but there's an additional complication. To enforce an
evaluation order that evaluates the INDIR+I first, that tree must be
traversed and its node listed in the forest before the assignment to i,
and the node must annotate the RIGHT tree. Listing this INDIR node is what
requires special treatment for the RIGHT idiom depicted by the lower RIGHT
node in Figure 12.11.
(tp is a tree for e++ 336)= 335
tp->kids[0] && tp->kids[1]
&& generic(tp->kids[1]->op) == ASGN
&& (generic(tp->kids[0]->op) == INDIR
&& tp->kids[0]->kids[0] == tp->kids[1]->kids[0]
|| (tp->kids[0]->op == FIELD
&& tp->kids[0] == tp->kids[1]->kids[0]))
As this test indicates, for postincrement or postdecrement of a bit field,
a FIELD node appears instead of an INDIR node, and this FIELD node is
the target of the assignment.
When e is not a bit field, the INDIR tree is traversed, and its node is
listed before traversing the RIGHT tree's second operand:
(generate nodes for e++ 336)= 335
if (generic(tp->kids[0]->op) == INDIR) {
    p = listnodes(tp->kids[0], 0, 0);
    list(p);
    listnodes(tp->kids[1], 0, 0);
}

[Figure: tree with nodes RIGHT, RIGHT, INDIR+I, ASGN+I, ADDRG+P i, ADD+I, and CNST+I 1]

FIGURE 12.11 Tree for i++.

Figure 5.3 (page 87) shows the forest for the assignment i = *p++. The
INDIR node for p's rvalue appears before the assignment to p.
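
The ordering that listing the INDIR node enforces can be mimicked at the source level. The following is only a sketch for illustration, not lcc code: the function names and the temporary t are hypothetical, and it assumes p points into an array of at least two ints.

```c
#include <assert.h>

/* i = *p++ written directly; pp points at the pointer p. */
int fetch_and_advance(int **pp) {
    int i = *(*pp)++;
    return i;
}

/* The same statement with the evaluation order spelled out: p's rvalue
   is captured first (the listed INDIR node), then p is assigned. */
int fetch_and_advance_explicit(int **pp) {
    int *t = *pp;   /* rvalue of p, fetched before the assignment */
    *pp = t + 1;    /* assignment to p */
    return *t;      /* the result uses the saved rvalue, not the new p */
}

/* Returns 0 when both forms agree on the fetched value and the final p. */
int check_postincrement(void) {
    int a[2] = { 10, 20 };
    int *p1 = a, *p2 = a;
    int i1 = fetch_and_advance(&p1);
    int i2 = fetch_and_advance_explicit(&p2);

    assert(i1 == 10 && i2 == 10);
    assert(p1 == a + 1 && p2 == a + 1);
    return (i1 == i2 && p1 == p2) ? 0 : 1;
}
```

Calling check_postincrement() exercises both forms on the same data; the explicit version makes visible why the fetch of p must precede the store to p.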
Bit fields are problematic. listnodes can't list a FIELD node because
there isn't one - FIELD operators appear only in trees. Instead,
listnodes must look below the FIELD tree to the INDIR tree that fetches
the word in which the bit field appears, traverse that tree, and list its
node:
(generate nodes for e++ 336)+= 335
else {
    list(listnodes(tp->kids[0]->kids[0], 0, 0));
    p = listnodes(tp->kids[0], 0, 0);
    listnodes(tp->kids[1], 0, 0);
}

12.7 Driving Code Generation

Once the code list for a function is complete, funcdefn calls the
interface procedure function to generate and emit the code. As described in
Section 5.10, this interface function makes two calls back into the front
end: It calls gencode to generate code, and it calls emitcode to emit the
code it generates. Each of these functions makes a pass over the code
list, calling the appropriate interface function for each code-list entry.
funcdefn builds two arrays of pointers to symbol-table entries: The
callee array holds the parameters of the function as seen from within
the function, and caller holds the parameters as seen by any callers of
the function. These arrays are passed to function, which passes them
to gencode:
(dag.c functions)+=
void gencode(caller, callee) Symbol caller[], callee[]; {
    Code cp;
    Coordinate save;

    save = src;
    (generate caller to callee assignments 338)
    cp = codehead.next;
    for ( ; errcnt <= 0 && cp; cp = cp->next)
        switch (cp->kind) {
        case Address:  (gencode Address 339) break;
        case Blockbeg: (gencode Blockbeg 339) break;
        case Blockend: (gencode Blockend 339) break;
        case Defpoint: src = cp->u.point.src; break;
        case Gen: case Jump:
        case Label:    (gencode Gen,Jump,Label 340) break;
        case Local:    (*IR->local)(cp->u.var); break;
        case Switch:   break;
        }
    src = save;
}

The assignments to src are made so that diagnostics issued during code
generation will include the source coordinate of the offending
expression.
Before making its pass through the code list, gencode inspects the
symbols in caller and callee. For most functions, corresponding
symbols in these arrays describe the same variable. For character and short
parameters, however, the front end always promotes the argument to
an integer or an unsigned; an example of this case is described in
Section 5.5. When this promotion occurs, the types of the caller symbol
and its corresponding callee symbol are different, and gencode must
generate an assignment of the caller to the callee. This assignment
must also occur if the storage classes of a caller and callee are
different, which can occur, for example, when the back end changes the
storage class of the caller or callee to conform to the target's calling
convention.
These assignments change the code list, and they must be inserted
before the first entry for the body of the function.
(generate caller to callee assignments 338)= 338
{
    int i;
    Symbol p, q;
    cp = codehead.next->next;
    codelist = codehead.next;
    for (i = 0; (p = callee[i]) != NULL
             && (q = caller[i]) != NULL; i++)
        if (p->sclass != q->sclass || p->type != q->type)
            walk(asgn(p, idtree(q)), 0, 0);
    codelist->next = cp;
    cp->prev = codelist;
}

The manipulations of codehead and codelist before the loop collaborate
to split the code list into two pieces: codelist points to the single
Start entry and cp points to the rest of the code list. The call to walk
appends each assignment to the code list pointed to by codelist, as
usual. After the assignments are appended, the rest of the code list is
reattached. These list manipulations are similar to those used in
Section 10.7 and illustrated in Figure 10.2 to insert the selection code for
the switch statement in the correct position.
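
The split, append, and reattach steps can be sketched on a plain doubly linked list. This is a simplified illustration under stated assumptions, not lcc's code: struct entry, append, and insert_after_start are hypothetical names, and, unlike the fragment above, the sketch checks for an empty remainder before fixing its prev pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a code-list entry. */
struct entry {
    int id;
    struct entry *prev, *next;
};

/* Append e after *tail and advance the tail, much as appending to
   codelist does. */
static void append(struct entry **tail, struct entry *e) {
    e->prev = *tail;
    e->next = NULL;
    (*tail)->next = e;
    *tail = e;
}

/* Insert n entries right after the first entry that follows the dummy
   head: detach the rest, append the new entries, reattach the rest. */
void insert_after_start(struct entry *head, struct entry *extra, int n) {
    struct entry *rest = head->next->next;  /* plays the role of cp */
    struct entry *tail = head->next;        /* plays the role of codelist */
    int i;

    for (i = 0; i < n; i++)
        append(&tail, &extra[i]);
    tail->next = rest;
    if (rest)
        rest->prev = tail;
}

/* Builds head -> 1 -> 2, inserts 10 and 11 after 1, returns the length. */
int check_insert(void) {
    struct entry head = { 0, NULL, NULL }, start = { 1, NULL, NULL };
    struct entry body = { 2, NULL, NULL };
    struct entry extra[2] = { { 10, NULL, NULL }, { 11, NULL, NULL } };
    struct entry *e;
    int ids[4], n = 0;

    head.next = &start;  start.prev = &head;
    start.next = &body;  body.prev = &start;
    insert_after_start(&head, extra, 2);

    for (e = head.next; e && n < 4; e = e->next)
        ids[n++] = e->id;
    assert(n == 4);
    assert(ids[0] == 1 && ids[1] == 10 && ids[2] == 11 && ids[3] == 2);
    assert(body.prev == &extra[1]);
    return n;
}
```

The point of the idiom is that the appending code does not need to know it is inserting in the middle of a list; it always appends at the current tail.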
The cases for Blockbeg and Blockend code-list entries announce the
beginnings and ends of source-level compound statements.
(gencode Blockbeg 339)= 338
{
    Symbol *p = cp->u.block.locals;
    (*IR->blockbeg)(&cp->u.block.x);
    for ( ; *p; p++)
        if ((*p)->ref != 0.0)
            (*IR->local)(*p);
}

(gencode Blockend 339)= 338
(*IR->blockend)(&cp->u.begin->u.block.x);
95 blockbeg
The interface functions blockbeg and blockend are passed the address
of an Env value associated with the block. Back ends can use this value
to save values that must be restored at the end of the block, such as sets
of busy registers and frame offsets.
Blockbeg entries include an array of pointers to symbol-table entries
for the locals declared in the block, and these are announced by the
interface function local. Other locals, such as temporaries, appear in
Local entries and are announced similarly, as shown above.
Address entries carry the information necessary to define symbols
that depend on the addresses of locals or parameters and are created by
addrtree. These symbols are announced by calling the interface function
address:
(gencode Address 339)= 338
(*IR->address)(cp->u.addr.sym, cp->u.addr.base,
    cp->u.addr.offset);

For locals, these entries appear on the code list after the Blockbeg or
Local entries that carry the symbols on which they depend. Once these
latter symbols have been announced to the back end, the interface
function address can be called to define the symbols in Address entries.
These entries can also define symbols that depend on parameters, which
have already been announced.
Gen, Jump, and Label entries carry forests that represent the code for
expressions, jumps, and label definitions. These forests are passed to
the interface function gen:
(gencode Gen,Jump,Label 340)= 338
if (!IR->wants_dag)
    cp->u.forest = undag(cp->u.forest);
fixup(cp->u.forest);
cp->u.forest = (*IR->gen)(cp->u.forest);
gen returns a pointer to a node. Usually, it annotates the nodes in forest,
and perhaps reorganizes and returns the forest, but this interface
permits gen to return something else that can be represented by a pointer
to a node. All of the back ends in this book return a pointer to a list of
nodes for the instructions in the forest. If gen returns null, the
corresponding call to the interface function emit, described below, is not
made.
As detailed in previous sections, the forests in Gen entries can have
nodes that are referenced more than once because they represent common
subexpressions. If the interface flag wants_dag is one, gen is
passed forests with these kinds of nodes. If wants_dag is zero, however,
undag generates assignments that store common subexpressions
in temporaries, and replaces references to the nodes that compute them
by references to the temporaries. Section 12.8 reveals the details.
The syms[0] fields of nodes for the comparison operators and jumps
point to symbol-table entries for labels. These labels might be synonyms
for the real label, described in Section 10.9. fixup finds these nodes and
changes their syms[0] fields to point to the real labels.
(dag.c functions)+=
static void fixup(p) Node p; {
    for ( ; p; p = p->link)
        switch (generic(p->op)) {
        case JUMP:
            if (p->kids[0]->op == ADDRG+P)
                p->kids[0]->syms[0] =
                    equated(p->kids[0]->syms[0]);
            break;
        case EQ: case GE: case GT: case LE: case LT: case NE:
            p->syms[0] = equated(p->syms[0]);
        }
}

When equatelab makes L1 a synonym for L2, it sets the u.l.equatedto
field of the symbol-table entry for L1 to the symbol-table entry for L2.
equated follows the list of symbols formed by these fields, if there is
one, to find the real label at the end:
(dag.c functions)+=
static Symbol equated(p) Symbol p; {
    while (p->u.l.equatedto)
        p = p->u.l.equatedto;
    return p;
}

fixup need inspect only the root nodes in the forest, because JUMP and
the comparison operators always appear as roots.
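
Resolving a label through its synonyms amounts to walking a singly linked chain to its end. The sketch below illustrates that walk on a hypothetical struct label; the names make_synonym and real_label are assumptions made for the example and are not lcc's functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical label record; equatedto plays the role of the
   u.l.equatedto field and is NULL for a real label. */
struct label {
    int number;
    struct label *equatedto;
};

/* Make old a synonym for target. */
void make_synonym(struct label *old, struct label *target) {
    old->equatedto = target;
}

/* Follow the chain of synonyms to the real label at its end. */
struct label *real_label(struct label *p) {
    while (p->equatedto)
        p = p->equatedto;
    return p;
}

/* L1 is equated to L2, which is equated to L3; all resolve to L3. */
int check_labels(void) {
    struct label l1 = { 1, NULL }, l2 = { 2, NULL }, l3 = { 3, NULL };

    make_synonym(&l1, &l2);
    make_synonym(&l2, &l3);
    assert(real_label(&l1) == &l3);
    assert(real_label(&l2) == &l3);
    assert(real_label(&l3) == &l3);
    return real_label(&l1)->number;
}
```

A chain of k synonyms costs k steps per lookup in this form, which is the cost that Exercise 12.13 asks about.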
Once gencode returns, the interface procedure function has all the
information it needs, such as the size of the frame and the number of
registers used, to generate the function prologue. When it's ready to emit
the generated code, it calls emitcode:
(dag.c functions)+=
void emitcode() {
    Code cp;
    Coordinate save;

    save = src;
    cp = codehead.next;
    for ( ; errcnt <= 0 && cp; cp = cp->next)
        switch (cp->kind) {
        case Address:  break;
        case Blockbeg: (emitcode Blockbeg) break;
        case Blockend: (emitcode Blockend) break;
        case Defpoint: (emitcode Defpoint 341) break;
        case Gen: case Jump:
        case Label:    (emitcode Gen,Jump,Label 342) break;
        case Local:    (emitcode Local) break;
        case Switch:   (emitcode Switch 342) break;
        }
    src = save;
}

(emitcode Defpoint 341)= 341
src = cp->u.point.src;
217 Jump
The cases for the code-list entries for Defpoint, Blockbeg, Blockend, and
Local don't emit code. If lcc's -g option is specified, however, these
cases call the stab interface functions to emit symbol-table information
for debuggers.
Gen, Jump, and Label entries carry the forests returned by the interface
function gen, and emitcode passes the nonnull forests to the interface
function emit:
(emitcode Gen,Jump,Label 342)= 341
if (cp->u.forest)
    (*IR->emit)(cp->u.forest);
Switch code-list entries carry branch tables for switch statements
generated by swcode. The u.swtch.values and u.swtch.labels arrays in
these entries hold u.swtch.size value-label pairs that form the table.
emitcode generates a global variable for the table whose symbol-table
entry is in the u.swtch.table field, and initializes the table to the
addresses specified by the labels.
(emitcode Switch 342)= 341
{
    int i, k = cp->u.swtch.values[0];
    defglobal(cp->u.swtch.table, LIT);
    for (i = 0; i < cp->u.swtch.size; i++) {
        while (k++ < cp->u.swtch.values[i])
            (*IR->defaddress)(equated(cp->u.swtch.deflab));
        (*IR->defaddress)(equated(cp->u.swtch.labels[i]));
    }
    swtoseg(CODE);
}

The value-label pairs in u.swtch.values and u.swtch.labels are sorted
in ascending order by value, but those values may not be contiguous.
The default label in u.swtch.deflab is used for the missing values.

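
The way the loop pads gaps with the default label can be seen in a small stand-alone sketch. It is an illustration only: fill_table and check_table are hypothetical names, labels are represented as strings rather than symbol-table entries, and the caller is assumed to supply a table with room for values[n-1] - values[0] + 1 entries.

```c
#include <assert.h>
#include <string.h>

/* Fill table with one label for every value from values[0] through
   values[n-1]; values must be sorted in ascending order. Missing
   values get deflab. Returns the number of table entries written. */
int fill_table(const int *values, const char *const *labels, int n,
               const char *deflab, const char **table) {
    int i, k = values[0], count = 0;

    for (i = 0; i < n; i++) {
        while (k++ < values[i])        /* pad the gap before values[i] */
            table[count++] = deflab;
        table[count++] = labels[i];
    }
    return count;
}

/* Cases 1, 2, and 5: slots for 3 and 4 receive the default label. */
int check_table(void) {
    static const int values[] = { 1, 2, 5 };
    static const char *const labels[] = { "L1", "L2", "L5" };
    const char *table[5];
    int n = fill_table(values, labels, 3, "Ldef", table);

    assert(n == 5);
    assert(strcmp(table[0], "L1") == 0);
    assert(strcmp(table[1], "L2") == 0);
    assert(strcmp(table[2], "Ldef") == 0);
    assert(strcmp(table[3], "Ldef") == 0);
    assert(strcmp(table[4], "L5") == 0);
    return n;
}
```

The post-increment in the while condition also steps k past each value that has its own label, so k always names the next table slot to fill.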
12.8 Eliminating Multiply Referenced Nodes

The front end builds trees, but some of those trees are dags. listnodes
takes these trees and builds dags so that it can eliminate common
subexpressions. This section describes undag, which takes dags and turns
them back into proper trees, though they're still called dags. lcc's
unfortunate abuse of proper terminology is perhaps best dealt with by
remembering that "trees" refers to the intermediate representation built
and manipulated by the front end, and "dags" refers to the intermediate
representation passed to and manipulated by the back ends.
listnodes could be eliminated, but this would also sacrifice
common-subexpression elimination, which contributes significantly to the
quality of the generated code. The earliest versions of lcc did the
opposite: the front end built dags directly. This approach was abandoned for
the present scheme because dags made code transformations, like those
done by simplify, much more complicated. Maintaining the reference
counts, for example, was prone to error.
A node that represents a common subexpression is pointed to by the
elements of the kids arrays in at least two other nodes in the same forest,
and its count field records the number of those pointers. Back ends
can generate code directly from the dags in each forest passed to the
interface function gen, but these multiply referenced nodes complicate
code generation in general and register allocation in particular. Some
compilers thus eliminate these nodes, either in their front end or in their
code generator. They generate code to assign their values to temporaries,
and they replace the references to these nodes with references to their
temporaries. As mentioned in Section 12.7, setting the interface flag
wants_dag to zero causes lcc's front end to generate these assignments
and thus eliminate multiply referenced nodes. If wants_dag is zero, the
front end also generates assignments for CALLs that return values, even
if they're referenced only once, because listing a CALL node will give it a
hidden reference from the code list. All the code generators in this book
set wants_dag to zero.
gencode calls undag with each forest in the code list before passing
the forest to the interface function gen. undag builds and returns a new
forest, adding the necessary assignments to the new forest as it visits
each node in the old one.
(dag.c data)+= 333
static Node *tail;

(dag.c functions)+=
static Node undag(forest) Node forest; {
    Node p;

    tail = &forest;
    for (p = forest; p; p = p->link)
        if (generic(p->op) == INDIR
        || iscall(p) && p->count >= 1)
            visit(p, 1);
        else {
            visit(p, 1);
            *tail = p;
            tail = &p->link;
        }
    *tail = NULL;
    return forest;
}
The two arms of the if statement handle nodes that do not appear as
roots in the new forest and those that do. Listed INDIR nodes and calls
referenced by other nodes are replaced in the new forest by assignments
of their values to temporaries. All other listed nodes, such as nodes for
the comparisons, JUMP, LABEL, ASGN, and CALLs executed for side effect
only, are appended to the new forest. Here, calls includes the
multiplicative operators if the interface flag mulops_calls is one:
(dag.c macros)=
#define iscall(p) (generic((p)->op) == CALL \
    || IR->mulops_calls \
    && ((p)->op==DIV+I || (p)->op==MOD+I || (p)->op==MUL+I \
    ||  (p)->op==DIV+U || (p)->op==MOD+U || (p)->op==MUL+U))
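
The tail variable in undag is an instance of a common C idiom: a pointer to the link field that the next kept element should be stored in. The sketch below shows the idiom in isolation on a hypothetical list of integers, keeping only the even ones; struct item, keep_even, and the selection criterion are assumptions made for the example and have nothing to do with lcc's nodes.

```c
#include <assert.h>
#include <stddef.h>

struct item {
    int value;
    struct item *link;
};

/* Rebuild the list in place, keeping only items with even values.
   tail always points at the link field (or the list head) where the
   next kept item belongs. */
struct item *keep_even(struct item *list) {
    struct item **tail = &list;
    struct item *p;

    for (p = list; p; p = p->link)
        if (p->value % 2 == 0) {
            *tail = p;          /* hook p onto the new list */
            tail = &p->link;    /* the next kept item goes after p */
        }
    *tail = NULL;               /* terminate the new list */
    return list;
}

/* 1 -> 2 -> 3 -> 4 becomes 2 -> 4; returns the new length. */
int check_keep_even(void) {
    struct item d = { 4, NULL }, c = { 3, &d }, b = { 2, &c }, a = { 1, &b };
    struct item *r = keep_even(&a);
    int n = 0;

    assert(r == &b && r->link == &d && d.link == NULL);
    for (; r; r = r->link)
        n++;
    return n;
}
```

Stores through tail only ever change the link of an item that has already been visited, so advancing with p = p->link still follows the original order.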
visit traverses a dag looking for nodes that are referenced more than
once - those whose count fields exceed one. On the first encounter with
each such node, visit generates a temporary, builds an assignment of
the node to the temporary, and appends that assignment to the new
forest. When that node is encountered again, either in the same dag or
in a subsequent dag, visit replaces the reference to the node with a
new node that references the appropriate temporary. The effect is that
an assignment to a temporary appears in the new forest just before the
root of the dag that first references it.
An example helps illustrate visit's details. The forest for the
statement in
    register int n, *q;
    n = *q++ = f(n, n);
is shown in Figure 12.12. There are five common subexpressions and
thus five multiply referenced nodes: The lvalues of q and n, the rvalues
of q and n, and the call to f. Figure 12.13 shows the forest returned by
undag. Only two of these common subexpressions have been replaced by
temporaries: t2 is assigned the rvalue of q and t3 is assigned the value
returned by the call to f. There are no temporaries for the lvalues of q
and n because it's just as easy to recompute them, which is why there
are two (ADDRLP q) nodes and three (ADDRLP n) nodes in Figure 12.13.
As detailed below, there's no temporary for the rvalue of n because n is
a register, so it's cheaper to replicate the INDIR node that references n,
which is why there are two (INDIRI (ADDRLP n)) dags in Figure 12.13.

[Figure: roots INDIRP, ASGNP, ARGI, ARGI, CALLI, ASGNI, ASGNI, linked left to right]
FIGURE 12.12 Forest for n = *q++ = f(n, n).

[Figure: roots ASGNP, ASGNP, ARGI, ARGI, ASGNI, ASGNI, ASGNI, linked left to right]
FIGURE 12.13 Forest for n = *q++ = f(n, n) when wants_dag is zero.

The forest shown in Figure 12.13 is what might be generated if the state-
ment above were written as
register int n, *q, *t2, t3;
t2 = q;
q = t2 + 1;
t3 = f(n, n);
*t2 = t3;
n = t3;
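
That the sequence with temporaries leaves the same state as the original statement can be checked directly. The sketch below is an illustration only: f is a hypothetical stand-in with an arbitrary body, the register qualifiers are dropped, and the pointer is advanced with q = t2 + 1, which corresponds to the ADDP node over t2's rvalue in Figure 12.13.

```c
#include <assert.h>

/* Hypothetical stand-in for f; any function of two ints would do. */
static int f(int a, int b) { return a * 10 + b; }

/* Runs the original statement and the version with temporaries on
   separate copies of the data; returns 0 if the final states match. */
int check_rewrite(void) {
    int cell1[2] = { 0, 0 }, cell2[2] = { 0, 0 };
    int n1 = 3, *q1 = cell1;
    int n2 = 3, *q2 = cell2, *t2, t3;

    /* Original statement. */
    n1 = *q1++ = f(n1, n1);

    /* Version with the temporaries t2 and t3 made explicit. */
    t2 = q2;
    q2 = t2 + 1;
    t3 = f(n2, n2);
    *t2 = t3;
    n2 = t3;

    assert(n1 == 33 && n2 == 33);
    assert(cell1[0] == 33 && cell2[0] == 33);
    assert(q1 == cell1 + 1 && q2 == cell2 + 1);
    return (n1 == n2 && cell1[0] == cell2[0]) ? 0 : 1;
}
```

The call to f happens once in both forms, which is the property that assigning its result to t3 preserves.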

visit traverses the dag rooted at p and returns either p or a node for
the temporary that holds the value represented by p:
(dag.c functions)+=
static Node visit(p, listed) Node p; int listed; {
    if (p)
        (visit 346);
    return p;
}

listed is one when undag calls visit, and it's zero when visit calls
itself recursively.
When visit generates a temporary for a node p, it stores the
symbol-table entry for that temporary in p->syms[2], which is not otherwise
used by the front end. visit must also announce the temporary by calling
the interface function local just as if there were a Local code-list entry
for the temporary:
(p->syms[2] ← a generated temporary 346)= 348
p->syms[2] = temporary(REGISTER, btot(p->op), LOCAL);
p->syms[2]->ref = 1;
p->syms[2]->u.t.cse = p;
(*IR->local)(p->syms[2]);
p->syms[2]->defined = 1;

(temporaries 346)= 38
struct {
    Node cse;
} t;
Symbol-table entries for temporaries that hold common subexpressions
are identified as such by having nonnull u.t.cse fields. These fields
point to the nodes that represent their values. Back ends may use this
information to identify common subexpressions that are cheaper to
recompute than to burn a register for.
A nonnull p->syms[2] also marks p as a common subexpression, so
references to p must be replaced by references to the temporary, which
is visit's first step:
(visit 346)= 347 345
if (p->syms[2])
    p = tmpnode(p);

tmpnode builds and returns the dag (INDIR (ADDRLP p->syms[2])), which
references the temporary's rvalue:
(dag.c functions)+=
static Node tmpnode(p) Node p; {
    Symbol tmp = p->syms[2];

    if (--p->count == 0)
        p->syms[2] = NULL;
    p = newnode(INDIR + (type suffix for tmp->type 346),
        newnode(ADDRLP, NULL, NULL, tmp), NULL, NULL);
    p->count = 1;
    return p;
}

(type suffix for tmp->type 346)= 346 348
(isunsigned(tmp->type) ? I : ttob(tmp->type))
p->count is the number of references to p. tmpnode decrements p->count
for each reference and clears p->syms[2] on the last one. Setting the
value of p->syms[2] to null reinitializes it for use by the back end.
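
The bookkeeping that tmpnode performs on count and syms[2] can be isolated in a few lines. This sketch is a simplification under stated assumptions: struct cse and use_reference are hypothetical, and the temporary is represented by a name rather than a symbol-table entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a multiply referenced node. */
struct cse {
    int count;        /* outstanding references to the node */
    const char *tmp;  /* name of its temporary, or NULL if none */
};

/* Consume one reference: return the temporary to use in place of the
   node, and clear the marker when the last reference is consumed. */
const char *use_reference(struct cse *p) {
    const char *tmp = p->tmp;

    if (--p->count == 0)
        p->tmp = NULL;
    return tmp;
}

/* A node with two references: the marker survives the first use and
   is cleared by the second. Returns the final count. */
int check_references(void) {
    struct cse node = { 2, "t2" };
    const char *a, *b;

    a = use_reference(&node);
    assert(a != NULL && node.tmp != NULL && node.count == 1);
    b = use_reference(&node);
    assert(b == a && node.tmp == NULL && node.count == 0);
    return node.count;
}
```

Clearing the marker on the last use is what leaves the field free for another purpose once every reference has been rewritten.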
For nodes that are referenced only once and calls made for side effect
only, visit traverses and rewrites their operands:
(visit 346)+= 346 347 345
else if (p->count <= 1 && !iscall(p)
||       p->count == 0 &&  iscall(p)) {
    (visit the operands 347)
}

(visit the operands 347)= 347 348
p->kids[0] = visit(p->kids[0], 0);
p->kids[1] = visit(p->kids[1], 0);
Calls that are referenced from other nodes are not processed here be-
cause they're replaced by assignments even if they have only one refer-
ence. They're treated as common subexpressions, as shown below.
As suggested above, temporaries are not generated for the addresses
of locals and parameters because it's usually cheaper to recompute their
addresses instead of using a register to save them. visit thus builds
and returns a new node for all ADDRLP and ADDRFP nodes:
(visit 346)+= 347 347 345
else if (p->op == ADDRLP || p->op == ADDRFP) {
    p = newnode(p->op, NULL, NULL, p->syms[0]);
    p->count = 1;
}

Similarly, it's usually wasteful to store the rvalue of a register variable
in another register. It's better to build a new dag for each reference to
the register's rvalue, as shown in Figure 12.13 for the two references to
n. Figure 12.13 shows that q's rvalue is copied to a temporary. The two
references to q's rvalue must not be duplicated because the INDIRP is
listed, which indicates that the value must be copied, because q might
be changed. visit thus looks for the pattern (INDIR (ADDRxP v)) where
v is a register, but steers clear of those in which the INDIR is listed.
(visit 346)+= 347 348 345
else if (generic(p->op) == INDIR && !listed
&& (p->kids[0]->op == ADDRLP || p->kids[0]->op == ADDRFP)
&& p->kids[0]->syms[0]->sclass == REGISTER) {
    p = newnode(p->op, newnode(p->kids[0]->op, NULL, NULL,
        p->kids[0]->syms[0]), NULL, NULL);
    p->count = 1;
}
This case also reveals why undag can't be called earlier - for example,
from walk. The storage class of locals and parameters isn't certain
until the back end has seen the function. Once the function has been
consumed, funcdefn calls checkref, which changes the storage class of
frequently accessed locals and parameters to REGISTER. If undag were
called from walk, it would generate temporaries for automatic locals and
parameters that might later become registers.
The last two cases cover INDIRB nodes and nodes for common
subexpressions. Registers can't hold structures, so there's no point in
copying them to temporaries; visit just replicates the INDIRB node:
(visit 346)+= 347 345
else if (p->op == INDIRB) {
    --p->count;
    p = newnode(p->op, p->kids[0], NULL, NULL);
    p->count = 1;
    (visit the operands 347)
} else {
    (visit the operands 347)
    (p->syms[2] ← a generated temporary 346)
    *tail = asgnnode(p->syms[2], p);
    tail = &(*tail)->link;
    if (!listed)
        p = tmpnode(p);
}
The else clause handles the first encounter with a common
subexpression. After traversing the operands, visit generates a temporary,
as described above, and calls
(dag.c functions)+=
static Node asgnnode(tmp, p) Symbol tmp; Node p; {
    p = newnode(ASGN + (type suffix for tmp->type 346),
        newnode(ADDRLP, NULL, NULL, tmp), p, NULL);
    p->syms[0] = intconst(tmp->type->size);
    p->syms[1] = intconst(tmp->type->align);
    return p;
}

to generate an assignment to that temporary. It then appends the
assignment to the new forest. This code is responsible for the assignments
to t2 and t3 in Figure 12.13. If listed is zero, p is referenced from
another node, so visit must return a reference to the temporary.
Otherwise, p is referenced from the old forest, which isn't included in
p->count and thus doesn't consume a reference to the temporary.

Further Reading
Using the code list to represent a function's code is idiosyncratic to lcc.
A flow graph is the more traditional representation. As detailed in
traditional compiler texts, such as Aho, Sethi, and Ullman (1986), the nodes
in a flow graph are basic blocks and the edges represent branches from
blocks to their successors. A flow graph is the representation usually
used for optimizations that lcc doesn't do. Many intra-procedural
optimization algorithms that discover and improve the code in loops use
flow graphs, for example.
The bottom-up hashing algorithm used by node to discover common
subexpressions is also known as value numbering, and it's been used in
compilers since the late 1950s. The node numbers shown in Tables 12.1
and 12.2 are the value numbers of the nodes with which they are
associated. Value numbering is also used in data-flow algorithms that compute
information about available expressions in a flow graph. This
information can be used to eliminate common subexpressions that are used in
more than one basic block.
The scheme used in Section 12.3 to generate short-circuit code for the
&& and || operators is similar to the approach described by Logothetis
and Mishra (1981). That approach and lcc's propagate true and false
labels. Another approach, called backpatching, propagates lists of holes
- the empty targets of jumps. Once the targets are known, these lists
are traversed to fill the jumps. This approach works particularly well
with syntax-directed translations in bottom-up parsers (Aho, Sethi, and
Ullman 1986).
Most compilers generate code from trees, but some use dags; Aho,
Sethi, and Ullman (1986) describe the relevant code-generation
algorithms for trees and dags and weigh their pros and cons. Earlier
versions of lcc included code generators that accepted dags. Instruction
selection in these code generators was described with compact
"programs" in a language designed specifically for generating code from lcc's
dags (Fraser 1989). This language was used to write code generators for
the VAX, Motorola 68030, SPARC, and MIPS. All the code generators in
this book use trees.

Exercises
12.1 kill continues searching buckets for rvalues of p even after it's
found and removed the first one. Give a C fragment that illustrates
why there can be more than one kind of INDIR node for p in buckets
at the same time. Hint: casts.
12.2 Implement (OR), the case in listnodes for the OR operator.

12.3 Draw the forest generated for the statement

while (a[i] && a[i]+b[i] > 0 && a[i]+b[i] < 10) ...

where a and b are integer arrays.


12.4 Implement (RET); RET nodes are always roots.
12.5 Implement (DIV..MOD); make sure your code handles the interface
flag mulops_calls properly.
12.6 Implement unlist, which is described in Section 12.3.
12.7 Give an example of a conditional expression where the calls to
equatelab and unlist in (COND) eliminate a branch to a branch.
Hint: nested conditional expressions.
12.8 For code of the form

if (1) S1 else S2

lcc generates

    S1
    goto L + 1
L:  S2
L + 1:

Revise lcc to omit the goto and the dead code S2.

12.9 Figure 12.5 is the tree for the assignment w = x.amt = y; the lower
ASGN+I tree is the tree for the single assignment x.amt = y. If
the value of a bit-field assignment isn't used, asgntree's efforts
in building a tree for the right operand that computes the correct
result of the assignment are wasted and generate unnecessary code.
Whenever the front end realizes that the value of a tree isn't used,
it passes the tree to root and uses the tree returned in place of the
original; see exprO and expr for examples. Study root and extend
it to simplify the right-hand sides of bit-field assignments when
possible.
12.10 asgntree and the listnodes code in Section 12.4 collaborate to
compute the result of a bit-field assignment by sign-extending or
masking when necessary. Similar cases occur for other assignments.
For example,

int i;
short s;
i = s = 0xFFFF;

sets i to -1 on targets that have 16-bit shorts and 32-bit integers.
There's no special code for these kinds of assignments, but lcc
generates the correct code for this assignment. Explain how.
12.11 Draw the forest for the tree shown in Figure 12.7 when the flag
left_to_right is zero.
12.12 Draw the tree and the forest for the augmented assignment in

struct { int b:4, c:4; } x;


x.c += x.b++;

The bit fields b and c are in the same word, so that word is fetched
twice and stored twice.
12.13 Managing labels and their synonyms is an instance of the
union-find problem, which is described in Chapter 30 of Sedgewick (1990).
Replace equatelab, fixup, and equated with versions that use the
path-compression algorithm commonly used for solving union-find
problems. Measure the improvement in lcc's execution time. If
there's no significant improvement, explain why.
12.14 Why doesn't visit treat ADDRGP nodes like ADDRLP and ADDRFP
nodes?

13
Structuring the Code Generator

The code generator supplies the front end with interface functions that
find target-dependent instructions to implement machine-independent
intermediate code. Interface functions also assign variables and tempo-
raries to registers, to fixed cells in memory, and to stacks, which are also
in memory.
A recurring priority throughout the design of 1cc's back end has been
overall simplicity. Few compiling texts include any production code gen-
erators, and we present three. Typical modest handwritten code gener-
ators require 1,000-1,500 lines of C. Careful segregation of the target-
specific material has cut this figure roughly in half. The cost is about
1,000 lines of machine-independent code, but we break even at two tar-
gets and profit from there on out; more important, it's easier to get a
new code generator right if we use as much preexisting (i.e., machine-
independent) code as possible.
lcc segregates some target-specific material by simply reorganizing
mostly machine-independent functions into a large machine-independent
routine that calls a smaller target-specific routine. It segregates other
material by isolating it in tables; for example, lcc's register allocator is
largely machine-independent, and processes target-specific data held in
structures that have a target-independent form. Finally, lcc segregates
some target-specific material by capturing it in languages specialized for
concise expression of the material; for example, lcc uses a language
tailored for expressing instruction selectors, and this language includes
a sublanguage for driving a code emitter.
To the machine-independent part of the code generator, target-specific
operations are like hot coals; they must be handled indirectly, with
"tongs." If a machine-independent routine must emit a store instruc-
tion, for example, it can't just call print. It must create an ASGN dag and
generate code for it, or escape to a target-specific function that emits the
instruction, or emit a predefined target-specific template. All these solu-
tions need more code than a print call, but they can still pay off because
they simplify retargeting. For example, a less machine-independent reg-
ister spiller with target-dependent parts for each of three targets might
take less code overall than lcc's machine-independent spiller. But
debugging spillers is hard, so it can save time to debug one
machine-independent spiller instead of three simpler target-specific ones.
The next chapters cover instruction selection, register allocation, and
the machine-specific material. This chapter describes the overall
organization of the code generator and its data structures. It also treats a
few loose ends that are machine-independent but don't fit cleanly under
instruction selection or register allocation.
The rest of this book uses the term tree to denote a tree structure
stored in node records, where the previous chapters use the term dag
for structures built from nodes. To make matters worse, the previous
chapters use the term tree for structures that multiply reference at least
some nodes, so they aren't really trees. Changing terms in midstream
is confusing, but the alternative is even worse. lcc originally used code
generators that worked on dags, but the code generators in this book
require trees; if subsequent text used "dag," it would be wrong, because
some of the algorithms fail if the inputs are not pure trees. lcc still
constructs dags in order to eliminate common subexpressions, but the
code in this book clears wants_dag.

13.1 Organization of the Code Generator


Table 13.1 illustrates the overall organization of the back end by
showing highlights from the call graph. Indentation shows which routines call
which other routines. This table and section necessarily omit many
details and even many routines. They'll simply orient us; they can't answer
all questions.

Name of Routine      Purpose
function             emits the function prologue and epilogue and calls
                     gencode
  gencode            interprets the code list and passes trees to gen
    gen              drives rewrite, prune, linearize, and ralloc
      rewrite        drives prelabel, _label, and reduce
        prelabel     changes the tree to cope with register variables and
                     special targets
        _label       labels tree with all plausible implementations
        reduce       selects the cheapest implementation
      prune          projects subinstructions out of the tree
      linearize      orders instructions for output
      ralloc         allocates registers
  emitcode           interprets the code list and passes nodes to emit
    emit             runs down the list of instructions and drives emitasm
      requate        eliminates some register-to-register copies
      moveself       eliminates instructions that copy a register to itself
      emitasm        interprets assembler templates and emits most
                     instructions
        emit2        emits a few instructions too complex for templates

TABLE 13.1 Simplified back-end call tree.



The front end calls the interface procedure function to generate code
for a routine. function decides how to receive and store the formals,
then calls gencode in the front end. gencode calls gen in the back end
for each forest in the code list. When gencode returns, the back end has
seen the entire routine and has computed the stack size and registers
used, so function emits the procedure prologue, then calls emitcode in
the front end, which calls emit in the back end for each forest in the
code list. When emit returns, function emits the epilogue and returns.
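
The two passes over the code list can be pictured with a small C sketch. It is only an illustration under stated assumptions: every routine here is a stub that appends a tag to a trace string, and the names trace, note, NFORESTS, and the _stub suffixes are invented for the example rather than taken from lcc.

```c
#include <assert.h>
#include <string.h>

/* Illustrative stubs only; none of this is lcc's actual code. */
enum { NFORESTS = 2 };          /* hypothetical: forests in the code list */
static char trace[128];

/* Append one tag and a space to the trace. */
static void note(const char *s) {
    assert(strlen(trace) + strlen(s) + 1 < sizeof trace);
    strcat(trace, s);
    strcat(trace, " ");
}

/* Back end: select instructions and allocate registers for one forest. */
static void gen_stub(void)  { note("gen"); }
/* Back end: emit the instructions for one forest. */
static void emit_stub(void) { note("emit"); }

/* Front end: walk the code list, calling gen for each forest. */
static void gencode_stub(void) {
    int i;
    for (i = 0; i < NFORESTS; i++)
        gen_stub();
}

/* Front end: walk the code list again, calling emit for each forest. */
static void emitcode_stub(void) {
    int i;
    for (i = 0; i < NFORESTS; i++)
        emit_stub();
}

/* Back end: the interface procedure that brackets both passes. */
const char *function_stub(void) {
    trace[0] = '\0';
    gencode_stub();             /* frame size and registers are now known */
    note("prologue");
    emitcode_stub();
    note("epilogue");
    return trace;
}
```

The point of the ordering is that the prologue cannot be written until every forest has been through gen, because only then are the frame size and the set of used registers known.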
gen coordinates the routines that select instructions and allocate
register temporaries for those instructions: rewrite, prune, linearize, and
ralloc. rewrite selects instructions for a single tree. prune projects
subinstructions - operations such as those computed by addressing
modes - out of the tree because they don't need registers, and
eliminating them now simplifies the register allocator. linearize orders for
output the instructions that remain. ralloc accepts one node, allocates
a target register for it, and frees any source registers that are no longer
needed.
rewrite coordinates the routines that select instructions: prelabel,
_label, and reduce. prelabel identifies the set of registers that suits
each node, and edits a few trees to identify more explicitly nodes that
read and write register variables. _label is automatically generated from
a grammar that describes the target machine's instructions. It labels a
tree with all plausible implementations that use the target instructions.
reduce selects the implementation that's cheapest.
emit coordinates the routines that emit instructions and that
identify some instructions that need not be emitted: emitasm, requate,
and moveself. requate identifies some unnecessary register-to-register
copies, and moveself identifies instructions that copy a register to itself.
emitasm interprets assembler templates that are a bit like printf format
strings. emitasm escapes to a target-specific emit2 for a few instructions
too complex for templates.

13.2 Interface Extensions

The material in the back end falls into two categories: target-specific
versus machine-independent, and private to the back end versus visible
to the front end. The two categories combine to divide the back end four
ways. Here's a sample routine of each kind from Table 13.1:

Routine Name    Private?    Target-specific?
gen             no          no
function        no          yes
rewrite         yes         no
_label          yes         yes

Chapter 5 presents the public interface. This section summarizes the
back end's private internal interface; Chapters 16-18 supply example
implementations of this private interface, so they can help answer detailed
questions.
Four routines in the public interface - blockbeg, blockend, emit,
and gen - are target-independent. They could be moved into the front
end, but that would complicate using the front end with different
code-generation technologies. So one can retarget lcc by replacing all routines
in the public interface or by replacing all but blockbeg, blockend, emit,
and gen and implementing the private interface instead.
The Xinterface structure extends the interface record:
(config.h 355)=
typedef struct {
    (Xinterface 355)
} Xinterface;
This type collects all machine-specific data and routines that the target-
independent part of the back end needs to generate code. It is to the
target-independent part of this back end what the main body of the in-
terface record is to the front end.
It starts with material that helps generate efficient code for ASGNB and
ARGB, which copy blocks of memory. lcc generates loops to copy large
blocks, but it unrolls short loops into straight-line code because loop
overhead can swamp the cost of data movement for, say, an eight-byte
block copy. The block-copy generator has machine-specific and
machine-independent parts. The machine-specific material is a small integer and
three procedures:
(Xinterface initializer 355)=
blkfetch, blkstore, blkloop,
Code generators need not use the block-copy generator; for example,
Chapter 18's code generator uses the X86 block-copy instructions, so it
implements only stubs for the routines above.
The integer x.max_unaligned_load gives the maximum width in bytes
that the target machine can load and store unaligned:
(Xinterface 355)=
unsigned char max_unaligned_load;
For example, the SPARC architecture implements no unaligned loads, so
its x.max_unaligned_load is one, because only load-byte instructions
require no alignment. The MIPS architecture, however, does support
unaligned 2- and 4-byte loads, so its x.max_unaligned_load is four.
The procedure x.blkfetch emits code to load a register from a given
cell:
(Xinterface 355)+=
void (*blkfetch) ARGS((int size, int off, int reg, int tmp));
It emits code to load register tmp with size bytes from the address
formed by adding register reg and the constant offset off. The
procedure x.blkstore emits code to store a register into a given cell:
(Xinterface 355)+=
void (*blkstore) ARGS((int size, int off, int reg, int tmp));
It emits code to store size bytes from register tmp to the address formed
by adding register reg and offset off.
The procedure x.blkloop emits a loop to copy a block in memory:
(Xinterface 355)+=
void (*blkloop) ARGS((int dreg, int doff,
                      int sreg, int soff,
                      int size, int tmps[]));
x.blkloop emits a loop to copy size bytes in memory. The source
address is formed by adding register sreg and offset soff, and the
destination address is formed by adding register dreg and offset doff. tmps
is an array of three integers that represent registers available to help
implement the loop.
After the interface to the block-copy generator comes the interface to
the instruction selector:
(Xinterface 355)+=
(interface to instruction selector 379)
This fragment captures most of the target-specific code and data needed
by the machine-independent gen and emit. It is generated automatically
from a compact specification. The retargeter thus writes the specifica-
tion instead of the interface code and data. Neither the specification nor
the interface to the instruction selector can be described without
preliminaries. The introduction to Chapter 14 elaborates.
x.emit2 emits instructions that cannot be handled by emitting simple
instruction templates:
(Xinterface 355)+=
void (*emit2) ARGS((Node));
Every machine - and many calling conventions - has a few
idiosyncrasies that can be hard to accommodate without emit2's escape clause.
x.doarg computes the register or stack cell assigned to the next
argument:
(Xinterface 355)+=
void (*doarg) ARGS((Node));

The back end makes several passes over the forest of trees. The first
pass calls x.doarg as it encounters each ARG node. lcc needs doarg in
order to emit code compatible with tricky calling conventions.
x.target marks tree nodes that must be evaluated into a specific
register:
(Xinterface 355)+=
void (*target) ARGS((Node));
For example, return values must be developed into the return register,
and some machines develop quotients and remainders into fixed
registers. The mark takes the form of an assignment to the node's syms[RX],
which records the result register for the node. Section 13.5 elaborates.
x.clobber spills to memory and later reloads all registers destroyed
by a given instruction:
(Xinterface 355)+=
void (*clobber) ARGS((Node));
It usually takes the form of a switch on the node's opcode; each of the
few cases calls spill, which is a machine-independent procedure that
saves and restores a given set of registers.

13.3 Upcalls

Just as the back end uses some code and data in the front end, so
the target-specific code in the back end uses some code and data in
the machine-independent part of the back end. Most front-end routines
reached by upcalls are simple and at or near leaves in the call graph,
so it is easy for Chapter 5 to explain them. The back end's internal
analogues are less simple and cannot, in general, be described out of
context. They're summarized here so that retargeters can find them all
in one spot; consult the page cited in the mini-index for the definition
and - perhaps better yet - consult Chapters 16-18 for sample uses.
Indeed, perhaps the best way to retarget lcc is to adapt one of the existing
code generators; having a complete set of sample upcalls is one of the
attractions.
(config.h 355)+=
extern int askregvar ARGS((Symbol, Symbol));
extern void blkcopy ARGS((int, int, int,
                          int, int, int[]));
extern int getregnum ARGS((Node));
extern int mayrecalc ARGS((Node));
extern int mkactual ARGS((int, int));
extern void mkauto ARGS((Symbol));
extern Symbol mkreg ARGS((char *, int, int, int));
extern Symbol mkwildcard ARGS((Symbol *));
extern int move ARGS((Node));
extern int notarget ARGS((Node));
extern void parseflags ARGS((int, char **));
extern int range ARGS((Node, int, int));
extern void rtarget ARGS((Node, int, Symbol));
extern void setreg ARGS((Node, Symbol));
extern void spill ARGS((unsigned, int, Node));

extern int argoffset, maxargoffset;
extern int bflag, dflag;
extern int dalign, salign;
extern int framesize;
extern unsigned freemask[], usedmask[];
extern int offset, maxoffset;
extern Symbol rmap[];
extern int swap;
extern unsigned tmask[], vmask[];

13.4 Node Extensions

The code generator operates mainly by annotating extensions to the front
end's nodes. Annotations record such data as the instructions selected
and the registers allocated. The extension field in the node structure is
named x and has type Xnode:
(config.h 355)+=
typedef struct {
    (Xnode flags 359)
    (Xnode fields 358)
} Xnode;
The instruction selector identifies the instructions and addressing modes
that can implement the node, and it uses x.state to record the results:
(Xnode fields 358)=
void *state;
Chapter 14 elaborates on the information represented by the structure
at which x.state points.
Nodes implemented by instructions can need registers, but those
realized by addressing modes don't, so it is useful to distinguish these two
classes once the instruction selector has identified them. The back end
uses x.inst to mark nodes that are implemented by instructions:
(Xnode fields 358)+=
short inst;
x.inst is nonzero if the node is implemented by an instruction. The
value helps identify the instruction.
The back end forms in x.kids a tree of the instructions:
(Xnode fields 358)+=
Node kids[3];
The tree parallels the one in the front end's kids, but the nodes
computed by subinstructions like addressing modes are projected out, as
shown in Figure 1.5. That is, x.kids stores the solid lines in Figure 1.5
on page 9; kids stores all lines there.
x.kids has three elements because lcc emits SPARC and X86
instructions that read up to three source registers, namely those that store one
register to an address formed by adding two others. lcc once generated
VAX code and used instructions with up to three operands that used up
to two registers each - a base register and an index register - so that
version of the compiler had six elements in its x.kids.
At some point, the code generator must order the instructions for
output. The back end traverses the projected instruction tree in postorder
and forms in x.prev and x.next a doubly linked list of the instructions
in this execution order:
(Xnode fields 358)+=
Node prev, next;
For example, Figure 13.1 shows this list for Figure 1.5. It omits the trees
threaded through kids and x.kids.
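
A postorder walk that threads a doubly linked list can be sketched as follows. This is a minimal illustration, not the book's linearize: the struct inst type, the sentinel list head, and the names relink and linearize_sketch are assumptions made for the example, and the sketch ignores nodes that are reachable more than once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a node together with its back-end extension. */
typedef struct inst *Inst;
struct inst {
    const char *op;
    Inst kids[3];       /* projected instruction tree, like x.kids */
    Inst prev, next;    /* execution-order list, like x.prev and x.next */
};

/* Link a and b so that b directly follows a. */
static void relink(Inst a, Inst b) {
    a->next = b;
    b->prev = a;
}

/* Append the instructions under p, in postorder, at the tail of the
   circular list whose sentinel is head (head->prev is the tail). */
void linearize_sketch(Inst p, Inst head) {
    int i;
    assert(p != NULL && head != NULL);
    for (i = 0; i < 3 && p->kids[i] != NULL; i++)
        linearize_sketch(p->kids[i], head);   /* operands come first */
    relink(head->prev, p);                    /* then the node itself */
    relink(p, head);
}
```

Because children are appended before their parent, following next from the sentinel visits each instruction only after the instructions that compute its operands.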
The register allocator uses x.prevuse to link all nodes that read and
write the same temporary:
(Xnode fields 358)+=
Node prevuse;
Some calling conventions pass the first few arguments in registers, so the
back end helps out by recording the argument number in the x.argno
field of ARG nodes:
(Xnode fields 358)+=
short argno;
Each node extension holds several flags that identify properties of the
node. Roots in the forest need some special treatment from, for example,
the register allocator, so the back end flags them using x.listed:
(Xnode flags 359)=
unsigned listed:1;

gen's return value
 -> INDIRD   "fld qword ptr %0\n"
 -> ASGNF    "fstp dword ptr %0\n"
 -> INDIRF   "fld dword ptr %0\n"
 -> CVFD     "# nop\n"
 -> ADDD     "fadd%1\n"
 -> CVDI     "sub esp,4\nfistp dword ptr 0[esp]\npop %c\n"
 -> RETI     "# ret\n"
 -> LABELV   "%a:\n"

FIGURE 13.1 Figure 1.5 linearized.

The register allocator and the emitter can traverse some nodes more
than once, but they must allocate a register and emit the node only at
the first traversal, so they set x.registered and x.emitted to prevent
reprocessing:
(Xnode flags 359)+=
unsigned registered:1;
unsigned emitted:1;
lcc rearranges some expression temporaries to eliminate instructions;
to facilitate these optimizations, the back end uses x.copy to mark all
instructions that copy one register to another, and it uses x.equatable to
mark those that copy a register to a common-subexpression temporary:
(Xnode flags 359)+=
unsigned copy:1;
unsigned equatable:1;
Some common subexpressions are too cheap to deserve a register. To
save such registers, the back end uses x.mayrecalc to mark nodes
for computing common subexpressions that can be reevaluated safely.
(Xnode flags 359)+=
unsigned mayrecalc:1;

The back end adds two generic opcodes for node structures. LOAD
represents a register-to-register copy. The back end inserts a LOAD node
when a parent needs an input in one register and the child yields a differ-
ent register. For example, if a function is called and its value is assigned
to a register variable, then the child CALL yields the return register, and
the parent needs a LOAD to copy it to the register variable.
If the back end assigns a local or formal to a register, it substitutes
VREG for all ADDRFP or ADDRLP opcodes for the variable. Register and
memory references need different code, and a different opcode tells us
which to emit. There is sure to be an ASGN or INDIR node above the
VREG; otherwise, the program computes the address of a register variable,
which is forbidden. The INDIR is not torn out of the tree even though
programs fetch register variables with no true indirection.
The target-independent Regnode structure describes a target-specific
register:
(config.h 355)+=
typedef struct {
    Symbol vbl;
    short set;
    short number;
    unsigned mask;
} *Regnode;

If the register has been assigned to hold a variable - as opposed to a
temporary value - vbl points to the symbol structure for that variable.
set can handle a large number of register sets, but it handles all current
targets with just IREG and FREG:
(config.h 355)+=
enum { IREG=0, FREG=1 };
IREG and FREG distinguish general registers from floating-point
registers. number holds the register number; even if registers are identified
by a name instead of a number (as in X86 assemblers) there is usually
a companion numeric encoding used by binary emitters and debuggers.
mask has ones in bit positions corresponding to the underlying hardware
registers occupied. Most single-width registers have just a single one
bit, and most double-width registers have exactly two. For example, the
mask 1 identifies the single-width register 0, and the mask 6 identifies
the double-width register that occupies single-width registers 1 and 2.
The X86 architecture has one-, two-, and four-byte integer registers, so
its masks have one, two, or four one-bits. This representation is general
enough to describe most but not all register sets; see Exercise 13.2.
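
The mask arithmetic in the examples above can be checked with a short C sketch. The helper names regmask and overlaps are hypothetical, and the sketch assumes that unsigned has at least 32 bits and that a register occupies consecutive single-width registers.

```c
#include <assert.h>

/* Mask for a register that starts at hardware register n and occupies
   width consecutive single-width registers. Hypothetical helper. */
unsigned regmask(int n, int width) {
    assert(n >= 0 && width >= 1 && width < 32 && n + width <= 32);
    return ((1u << width) - 1u) << n;
}

/* Two registers share hardware when their masks have a bit in common. */
int overlaps(unsigned a, unsigned b) {
    return (a & b) != 0;
}
```

With this encoding, regmask(0, 1) yields the mask 1 and regmask(1, 2) yields the mask 6 from the text, and a single bitwise AND tells whether allocating one register makes another unavailable.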

13.5 Symbol Extensions

The back end also extends symbol structures. The field is named x and
has type Xsymbol:
(config.h 355)+=
typedef struct {
    char *name;
    int offset;
    (fields for temporaries 362)
    (fields for registers 362)
} Xsymbol;
x.name is what the back end emits for the symbol. For globals, it can
equal name on some targets. For locals and formals, it is a digit string
equivalent to x.offset, which is a stack offset. Offsets for local variables
are always negative, but offsets for parameters can be positive, which
explains why x.offset is signed.
If the symbol is a temporary in which the front end has stored a
common subexpression, then the back end links all nodes that read or write
the expression using x.lastuse, and it computes the number of such
uses into x.usecount:
(fields for temporaries 362)=
Node lastuse;
int usecount;
During initialization, the back end allocates one register symbol for
each allocable register. It represents the register allocated to a node with
a symbol so that the emitter can output register names and numbers
using the same mechanism that emits identifiers and constants, which
are also held in syms. These register symbols use two unique fields:
(fields for registers 362)=
Regnode regnode;
The back end points x.regnode at a structure that describes the register,
and it sets x.name to the register's name or number. When it allocates a
register to a node p, it stores the corresponding symbol in p->syms[RX].
The back end sets RX to two to avoid having to move the values that the
front end passes it in syms[0] and syms[1]:
(config.h 355)+=
enum { RX=2 };
Once the front end calls function, however, all elements of syms become
the property of the back end. The front end is done with them, and the
back end can change them as it sees fit. Most of its changes are to the
Xsymbol field and to syms[RX], but some changes are to other fields.
mkreg creates and initializes a register symbol:

(gen.c functions)=
Symbol mkreg(fmt, n, mask, set)
char *fmt; int n, mask, set; {
    Symbol p;

    NEW0(p, PERM);
    p->x.name = stringf(fmt, n);
    NEW0(p->x.regnode, PERM);
    p->x.regnode->number = n;
    p->x.regnode->mask = mask<<n;
    p->x.regnode->set = set;
    return p;
}

stringf is used to create a register name that includes the register
number. For example, if i is 7, then mkreg("r%d", i, 1, IREG) creates a
register named r7. A call like mkreg("sp", 29, 1, IREG) is used if
register 29 is generally called sp instead of r29.
The back end also represents sets of registers; for example, if a node
must be evaluated into a specific register, the back end marks the node
with the register, but if the node can be evaluated into any one of a set
of registers, then the mark is given a value that represents the set. The
back end represents a set of registers by storing a vector of pointers to
register symbols in the x.wildcard field of a special wildcard symbol:
(fields for registers 362)+=
Symbol *wildcard;
For example, the back end for a machine with 32 integer registers would
allocate 32 register symbols and store them in a 32-element vector. Then
it would allocate one wildcard symbol and store in its x.wildcard the
address of the vector. mkwildcard creates a register-set symbol:
(gen.c functions)+=
Symbol mkwildcard(syms) Symbol *syms; {
    Symbol p;

    NEW0(p, PERM);
    p->x.name = "wildcard";
    p->x.wildcard = syms;
    return p;
}

The x.name "wildcard" should never appear in lcc's output, but x.name
is initialized nonetheless, so that the emitter doesn't crash - and even
emits a telling register name - when the impossible happens.
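
The 32-register example can be sketched in self-contained C. The sketch makes several assumptions that are not lcc's: a flattened RegSym record stands in for the symbol and its Xsymbol and Regnode parts, calloc and snprintf replace NEW0 and stringf, and the names RegSym, IREG_SK, NREGS, and the _sketch suffixes are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { IREG_SK = 0, NREGS = 32 };   /* hypothetical names for this sketch */

typedef struct regsym {
    char name[16];              /* stands in for x.name */
    int number, set;            /* stand in for the x.regnode fields */
    unsigned mask;
    struct regsym **wildcard;   /* stands in for x.wildcard */
} RegSym;

/* Like mkreg, but with calloc and snprintf instead of NEW0 and stringf. */
RegSym *mkreg_sketch(const char *fmt, int n, unsigned mask, int set) {
    RegSym *p = calloc(1, sizeof *p);
    assert(p != NULL);
    snprintf(p->name, sizeof p->name, fmt, n);
    p->number = n;
    p->mask = mask << n;
    p->set = set;
    return p;
}

/* Like mkwildcard: one symbol that stands for a whole vector of registers. */
RegSym *mkwildcard_sketch(RegSym **syms) {
    RegSym *p = calloc(1, sizeof *p);
    assert(p != NULL);
    snprintf(p->name, sizeof p->name, "%s", "wildcard");
    p->wildcard = syms;
    return p;
}

/* Build r0 through r31 and a wildcard symbol that covers them all. */
RegSym *make_int_registers(RegSym *regs[NREGS]) {
    int i;
    for (i = 0; i < NREGS; i++)
        regs[i] = mkreg_sketch("r%d", i, 1, IREG_SK);
    return mkwildcard_sketch(regs);
}
```

A node that may use any integer register would then be marked with the wildcard symbol, while a node that needs one particular register would be marked with that register's own symbol.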

13.6 Frame Layout

A procedure activation record, or frame, holds all the state information
needed for one invocation of a procedure, including the automatic
variables, return address, and saved registers. A stack stores one frame for
each active procedure invocation. The stack grows down, toward lower
addresses. For example, if main calls f, and f calls itself recursively
once, the stack resembles the illustration shown in Figure 13.2. The
stack grows into the shaded area.
A logical frame pointer points somewhere into a stack frame. On all
targets, the locals have negative offsets from the frame pointer. Formals
and other data can be at positive or negative offsets, depending on the
target's convention. Figure 13.3 shows a typical frame.
Some targets hold the frame pointer in a physical register; for example,
Figure 18.1 shows that the X86 frame pointer is stored in register ebp,
and it points at one of the registers saved in the frame. Other targets
store only the stack pointer and represent the frame pointer as the sum
of the stack pointer and a constant; the MIPS code generator, for example,
does this. The virtual frame pointer for a routine with an 80-byte frame
is the address 80($sp) (80 plus the value of the stack pointer, $sp), and
-4+80($sp) references the local assigned offset -4 (see Figure 16.1).
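
The virtual-frame-pointer arithmetic amounts to one addition, sketched below. The helper name sp_relative and its parameter names are hypothetical; the sketch only restates the 80-byte example above and is not taken from the MIPS code generator.

```c
#include <assert.h>

/* Translate a frame-pointer-relative offset into a displacement from
   the stack pointer, for a target whose frame pointer is virtual:
   frame pointer = stack pointer + frame_bytes. Hypothetical helper. */
int sp_relative(int frame_bytes, int fp_offset) {
    assert(frame_bytes >= 0);
    return frame_bytes + fp_offset;
}
```

For the routine in the text, sp_relative(80, -4) gives 76, so the local assigned offset -4 is addressed as 76($sp).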
offset is the absolute value of the stack offset for the last automatic
variable, and mkauto arranges aligned stack space for the next one:
(gen.c data)=
int offset;

high addresses
                  frame for main
                  frame for f
                                       <---- frame pointer
                  frame for f
                                       <---- stack pointer
low addresses
FIGURE 13.2 Three stack frames.

high addresses
                  return address
                  saved registers
                                       <---- frame pointer
                  locals
                  outgoing arguments
low addresses                          <---- stack pointer
FIGURE 13.3 Typical frame.

(gen.c functions)+=
void mkauto(p) Symbol p; {
    offset = roundup(offset + p->type->size, p->type->align);
    p->x.offset = -offset;
    p->x.name = stringd(-offset);
}

Using the absolute value avoids questions about rounding when
dividing negative integers, and we don't assume that all offsets are negative
because for some formals, for example, they aren't.
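
The effect of mkauto on a sequence of locals can be traced with a small sketch. It assumes power-of-two alignments and defines its own ROUNDUP macro, since roundup is defined elsewhere in the book; the names offset_sk, mkauto_sketch, and reset_offset are hypothetical, and sizes and alignments are passed directly instead of through a symbol's type.

```c
#include <assert.h>

/* Round x up to a multiple of n; n must be a power of two.
   This definition is an assumption of the sketch. */
#define ROUNDUP(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

static int offset_sk;   /* absolute value of the last local's offset */

/* Reserve aligned space for a local of the given size and return its
   negative frame offset, the value mkauto stores in x.offset. */
int mkauto_sketch(int size, int align) {
    assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
    offset_sk = ROUNDUP(offset_sk + size, align);
    return -offset_sk;
}

/* Start laying out a new routine. */
void reset_offset(void) {
    offset_sk = 0;
}
```

Starting from zero, a one-byte local lands at -1, a following four-byte local aligned to four lands at -8, and a following eight-byte local aligned to eight lands at -16. Rounding the positive running total keeps every variable's lowest address aligned.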
At the beginning of each block, the front end calls blockbeg to save
the current stack offset and allocation status of each register:
(config.h 355)+=
typedef struct {
    int offset;
    unsigned freemask[2];
} Env;
(gen.c functions)+=
void blockbeg(e) Env *e; {
    e->offset = offset;
    e->freemask[IREG] = freemask[IREG];
    e->freemask[FREG] = freemask[FREG];
}

blockend restores the saved values at the end of the block:
(gen.c data)+=
int maxoffset;
(gen.c functions)+=
void blockend(e) Env *e; {
    if (offset > maxoffset)
        maxoffset = offset;
    offset = e->offset;
    freemask[IREG] = e->freemask[IREG];
    freemask[FREG] = e->freemask[FREG];
}

blockend also computes the maximum value of offset for the current
routine. The interface procedure function sets framesize
(gen.c data)+=
int framesize;
to maxoffset - or more to save space to store data like registers that
must be saved by the callee - and it emits a procedure prologue and
epilogue that adjust the stack pointer by framesize to allocate and
deallocate stack space for all blocks in the routine at once.
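
How sibling blocks share stack space, and how the maximum is tracked, can be seen in a short sketch. It is an illustration under assumptions: the register free masks are left out, locals are modeled by bumping the offset directly, and the names EnvSk, cur_offset, max_offset, and nested_blocks_demo are invented for the example.

```c
#include <assert.h>

typedef struct { int offset; } EnvSk;   /* register free masks omitted */

static int cur_offset, max_offset;

/* Save the current stack offset on entry to a block. */
void blockbeg_sk(EnvSk *e) {
    assert(e != 0);
    e->offset = cur_offset;
}

/* Record the high-water mark, then release the block's locals. */
void blockend_sk(EnvSk *e) {
    assert(e != 0);
    if (cur_offset > max_offset)
        max_offset = cur_offset;
    cur_offset = e->offset;
}

/* Hypothetical driver for { int a; { int b[4]; } { int c[2]; } }
   with four-byte ints. Returns the space needed for locals. */
int nested_blocks_demo(void) {
    EnvSk outer, inner;
    cur_offset = max_offset = 0;
    blockbeg_sk(&outer);
    cur_offset += 4;            /* a: offset reaches 4 */
    blockbeg_sk(&inner);
    cur_offset += 16;           /* b: offset reaches 20 */
    blockend_sk(&inner);        /* back to 4 */
    blockbeg_sk(&inner);
    cur_offset += 8;            /* c reuses b's space: offset reaches 12 */
    blockend_sk(&inner);
    blockend_sk(&outer);
    return max_offset;
}
```

The two inner blocks are never live at the same time, so the frame needs 20 bytes for locals rather than 28.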
Each routine's stack frame includes an argument-build area, which is
a block of memory for outgoing arguments, as shown in Figure 13.3. lcc
can pass arguments by pushing them onto the stack; the push
instructions allocate the block of memory implicitly. Current RISC machines,
however, have no push instructions, and simulating them with multiple
instructions is slow. On these machines, lcc allocates a block of
memory and moves each argument into its cell in the block. It creates one
block for each routine, making the block big enough for the largest set
of outgoing arguments.
The code and data that compute the offsets and block size in the
argument-build area resemble the ones above that manage automatics.
argoffset is the next available block offset. mkactual rounds it up to a
specified alignment, returns the result, and updates argoffset:
(gen.c data)+=
int argoffset;
(gen.c functions)+=
int mkactual(align, size) int align, size; {
    int n = roundup(argoffset, align);

    argoffset = n + size;
    return n;
}

docall is invoked on the CALL node that ends each list of arguments.
It clears argoffset for the next set of arguments, and computes in
maxargoffset the size of the largest block of outgoing arguments:
(gen.c data)+=
int maxargoffset;
(gen.c functions)+=
static void docall(p) Node p; {
    p->syms[0] = intconst(argoffset);
    if (argoffset > maxargoffset)
        maxargoffset = argoffset;
    argoffset = 0;
}

docall records in p->syms[0] the size of this call's argument block, so
that the caller can pop it off the stack if necessary. The X86 code
generator illustrates this mechanism.
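
The bookkeeping for the argument-build area can be exercised with a sketch that stands apart from lcc's node and symbol types. It assumes power-of-two alignments and its own ROUNDUP macro; the _sk names and max_arg_block are hypothetical, and docall_sk returns the block size instead of storing it in a node.

```c
#include <assert.h>

/* Round x up to a multiple of n; n must be a power of two.
   This definition is an assumption of the sketch. */
#define ROUNDUP(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

static int argoffset_sk, maxargoffset_sk;

/* Assign the next outgoing argument an aligned offset in the block. */
int mkactual_sk(int align, int size) {
    int n;
    assert(align > 0 && (align & (align - 1)) == 0 && size >= 0);
    n = ROUNDUP(argoffset_sk, align);
    argoffset_sk = n + size;
    return n;
}

/* Close one call: report its block size, track the largest block,
   and reset the offset for the next call's arguments. */
int docall_sk(void) {
    int size = argoffset_sk;
    if (argoffset_sk > maxargoffset_sk)
        maxargoffset_sk = argoffset_sk;
    argoffset_sk = 0;
    return size;
}

/* Size of the largest argument block seen so far. */
int max_arg_block(void) {
    return maxargoffset_sk;
}
```

A call that passes a four-byte int and then an eight-byte double aligned to eight uses offsets 0 and 8 and a 16-byte block; a later call with a single int uses a 4-byte block, so the routine's argument-build area stays at 16 bytes.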

13.7 Generating Code to Copy Blocks


ASGNB and ARGB copy blocks of memory. lcc generates loops to copy
large blocks, but it unrolls short loops into straight-line code because
loop overhead can swamp the cost of data movement for, say, an eight-
byte block copy.
bl kcopy is the entry point into the block-copy generator. It is machine-
independent and shares b1k1 oop's signature:
....
(gen.c functions)+=
void blkcopy(dreg, doff, sreg, soff, size, tmp)
367 368 ... 366 argoffset
int dreg, doff, sreg, soff, size, tmp[]; { 356 blkloop
460 " (MIPS)
(bl kcopy 367) 493 " (SPARC)
} 513 " (X86)
368 blkunroll
blkcopy emits code to copy size bytes in memory. The source address 49 intconst
is formed by adding register sreg and offset soff, and the destination 366 maxargoffset
address is formed by adding register dreg and offset doff. tmps gives
the numbers of three registers available for use as temporaries by the
emitted code.
bl kcopy calls bl kl oop for long blocks, but it unrolls the loops for
blocks of 16 or fewer bytes; we chose this limit somewhat arbitrarily
after determining what some other compilers used. bl kcopy is recursive,
so it starts by confirming that there's something left to copy:
(blkcopy 367)=
if (size == 0)
    return;

If fewer than four bytes remain, blkcopy calls blkunroll to emit code to
copy them:

(blkcopy 367)+=
else if (size <= 2)
    blkunroll(size, dreg, doff, sreg, soff, size, tmp);
else if (size == 3) {
    blkunroll(2, dreg, doff, sreg, soff, 2, tmp);
    blkunroll(1, dreg, doff+2, sreg, soff+2, 1, tmp);
}

If the block has 4 to 16 bytes, blkcopy rounds size down to a multiple of
four (using size&~3) and calls blkunroll to copy that number of bytes
four at a time. It then calls itself recursively to handle the remaining
zero to three bytes:

(blkcopy 367)+=
else if (size <= 16) {
    blkunroll(4, dreg, doff, sreg, soff, size&~3, tmp);
    blkcopy(dreg, doff+(size&~3),
        sreg, soff+(size&~3), size&3, tmp);
}

Loops copy blocks exceeding 16 bytes:

(blkcopy 367)+=
else
    (*IR->x.blkloop)(dreg, doff, sreg, soff, size, tmp);
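
To see how blkcopy carves up a block, the size tests can be replayed without a code generator. This is only a sketch under stated assumptions: copyplan mirrors blkcopy's case analysis, but unroll and loop are hypothetical stubs that log each request instead of emitting instructions, and registers are left out.

```c
#include <assert.h>

#define MAXLOG 32

/* One log entry per request: chunk size k (0 means "loop"), offset, bytes. */
static int logk[MAXLOG], logoff[MAXLOG], logsize[MAXLOG], nlog;

/* Stand-in for blkunroll: record "copy size bytes at off, k at a time". */
static void unroll(int k, int off, int size) {
    assert(nlog < MAXLOG);
    logk[nlog] = k;
    logoff[nlog] = off;
    logsize[nlog] = size;
    nlog++;
}

/* Stand-in for the target's blkloop, logged with k == 0. */
static void loop(int off, int size) {
    unroll(0, off, size);
}

/* Same case analysis as blkcopy, on offsets only. */
static void copyplan(int off, int size) {
    if (size == 0)
        return;
    else if (size <= 2)
        unroll(size, off, size);
    else if (size == 3) {
        unroll(2, off, 2);
        unroll(1, off + 2, 1);
    } else if (size <= 16) {
        unroll(4, off, size & ~3);              /* whole words first */
        copyplan(off + (size & ~3), size & 3);  /* then the 0 to 3 leftovers */
    } else
        loop(off, size);
}
```

For a 7-byte block, copyplan(0, 7) logs a 4-byte copy at offset 0, a 2-byte copy at offset 4, and a 1-byte copy at offset 6; size & ~3 clears the two low bits, which is what rounds 7 down to 4.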
blkunroll shares a signature with blkcopy and blkloop, except for an
extra leading integer k, which is the number of bytes to copy at a time
and must be one, two, or four:

(gen.c functions)+=
static void blkunroll(k, dreg, doff, sreg, soff, size, tmp)
int k, dreg, doff, sreg, soff, size, tmp[]; {
    int i;

    (reduce k? 369)
    (emit unrolled loop 369)
}

In a perfect world, blkunroll would interleave calls on blkfetch and
blkstore to copy a block k bytes at a time. In this world, the alignments
of the source or destination addresses may not be multiples of k, and
some targets can't load or store k-byte units unless the address is a
multiple of k. blkcopy's original caller sets globals salign and dalign to the
alignment for the source and destination blocks:

(gen.c data)+=
int dalign, salign;
If the compiler knows nothing about a source or destination alignment,
then it sets salign or dalign to one, since all blocks have an address
that's divisible by one. Using globals for dalign and salign is a tradeoff:
it would be cleaner to pass them as arguments, but the procedures
have too many arguments already, and packaging the arguments as structures
is a cure worse than the disease. blkunroll uses these values and
x.max_unaligned_load to reduce k, and thus copy smaller chunks if k
exceeds the maximum size for unaligned loads and the alignment of the
source or destination:

(reduce k? 369)=
if (k > IR->x.max_unaligned_load
&& (k > salign || k > dalign))
    k = IR->x.max_unaligned_load;
So, a large block with, say, 32-bit alignment for the destination but only
16-bit alignment for the source gets copied 16 bits at a time. Copying the
first 16 bits would give 32-bit alignment for the rest of the source, but it
would drop the rest of the destination down to 16-bit alignment, so this
step alone wouldn't help us generate better code; see Exercise 13.3.
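
The reduction of k is easiest to check as a pure function. This sketch assumes nothing beyond the test shown above; chunksize and its parameter max_unaligned are hypothetical names, with max_unaligned playing the role of IR->x.max_unaligned_load.

```c
#include <assert.h>

/* Return the chunk size blkunroll would use: the requested k, unless k
   exceeds both what the target can load unaligned and what at least one
   of the two alignments guarantees. */
static int chunksize(int k, int salign, int dalign, int max_unaligned) {
    assert(k == 1 || k == 2 || k == 4);
    if (k > max_unaligned && (k > salign || k > dalign))
        k = max_unaligned;
    return k;
}
```

With a target whose largest unaligned load is two bytes, chunksize(4, 2, 4, 2) yields 2, which is the 16-bits-at-a-time case described above; when both blocks are word-aligned, chunksize(4, 4, 4, 1) keeps k at 4 regardless of the unaligned limit.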
blkunroll's other complication caters to machines that stall when
a load comes right before an instruction that uses the value loaded.
blkunroll cuts such stalls by emitting two loads and then two stores,
so stores don't follow their companion loads immediately:
(emit unrolled loop 369)=
for (i = 0; i+k < size; i += 2*k) {
    (*IR->x.blkfetch)(k, soff+i, sreg, tmp[0]);
    (*IR->x.blkfetch)(k, soff+i+k, sreg, tmp[1]);
    (*IR->x.blkstore)(k, doff+i, dreg, tmp[0]);
    (*IR->x.blkstore)(k, doff+i+k, dreg, tmp[1]);
}
Each trip through the for loop emits one pair. It quits when no pairs
remain, and emits one last copy if the call requested an odd number:

(emit unrolled loop 369)+=
if (i < size) {
    (*IR->x.blkfetch)(k, i+soff, sreg, tmp[0]);
    (*IR->x.blkstore)(k, i+doff, dreg, tmp[0]);
}
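
The order of the emitted loads and stores can be observed with a small trace. This is a sketch only: emit is a hypothetical replacement for the target's blkfetch and blkstore that appends "L" (load) or "S" (store), the temporary's index, and the offset to a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char trace[256];   /* e.g. "L0@0 " = load into temporary 0 from offset 0 */

static void emit(const char *op, int off, int tmp) {
    char buf[32];

    sprintf(buf, "%s%d@%d ", op, tmp, off);
    assert(strlen(trace) + strlen(buf) < sizeof trace);
    strcat(trace, buf);
}

/* Same control flow as the two "emit unrolled loop" fragments. */
static void unrolled(int k, int size) {
    int i;

    for (i = 0; i + k < size; i += 2*k) {
        emit("L", i, 0);
        emit("L", i + k, 1);
        emit("S", i, 0);
        emit("S", i + k, 1);
    }
    if (i < size) {        /* odd number of chunks: one trailing copy */
        emit("L", i, 0);
        emit("S", i, 0);
    }
}
```

Copying 12 bytes four at a time gives one load-load-store-store group for offsets 0 and 4, followed by a single load and store at offset 8.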

Figure 13.4 shows lcc generating MIPS code to copy a 20-byte structure
with four-byte alignment of the source and destination. The first column
traces the calls to the procedures above. The second shows the
corresponding emitted code. tmps is initialized to {3, 9, 10}. Chapter 16
describes the MIPS instructions and the MIPS blkloop, blkfetch, and
blkunroll. Its blkloop copies eight bytes at a time. It calls blkcopy
recursively to copy the four bytes left over just before the loop.

blkcopy(25, 0, 8, 0, 20, {3,9,10})
 blkloop(25, 0, 8, 0, 20, {3,9,10})          addu $8,$8,16
                                             addu $10,$25,16
  blkcopy(10, 0, 8, 0, 4, {3,9,10})
   blkunroll(4, 10, 0, 8, 0, 4, {3,9,10})
    blkfetch(4, 0, 8, 3)                     lw $3,0($8)
    blkstore(4, 0, 10, 3)                    sw $3,0($10)
   blkcopy(10, 0, 8, 0, 0, {3,9,10})
                                             L.3:
                                             addu $8,$8,-8
                                             addu $10,$10,-8
  blkcopy(10, 0, 8, 0, 8, {3,9,10})
   blkunroll(4, 10, 0, 8, 0, 8, {3,9,10})
    blkfetch(4, 0, 8, 3)                     lw $3,0($8)
    blkfetch(4, 4, 8, 9)                     lw $9,4($8)
    blkstore(4, 0, 10, 3)                    sw $3,0($10)
    blkstore(4, 4, 10, 9)                    sw $9,4($10)
                                             bltu $25,$10,L.3

FIGURE 13.4 Generating a structure copy.

13.8 Initialization
parseflags recognizes the command-line options that affect code generation.
-d enables debugging output, which helps when retargeting lcc.
This book omits the calls that emit debugging output, but they're on the
companion diskette.
(gen.c data)+=
int dflag = 0;

(gen.c functions)+=
void parseflags(argc, argv) int argc; char *argv[]; {
    int i;

    for (i = 0; i < argc; i++)
        if (strcmp(argv[i], "-d") == 0)
            dflag = 1;
}

lcc can run on one machine - the host - and emit code for another
- the target. One machine can be a big endian and the other a little
endian, which subtly complicates emitting double constants, and is another
matter that benefits from attention during initialization.
lcc assumes that it is running on and compiling for machines with
IEEE floating-point arithmetic. The host and target machines need not
be the same, but both must use IEEE floating-point arithmetic. This
assumption was once constraining, but it sacrifices little now.
The discussion about the interface procedure defconst in Chapter 5
explained that code generators for C must encode floating-point numbers
themselves. That is, they must emit equivalent hexadecimal constants
and shun the assembler directives that convert a textual representation
of a floating-point constant to its internal form.
lcc can emit a single word for each single-precision float, but it must
emit two words for doubles. If lcc is running on a little endian and
compiling for a little endian, or if both machines are big endian, then
both encode doubles the same way, and the code generator can emit in
order the two words that comprise the double. But if one machine is a
big endian and the other a little endian, then one expects the high-order
word first and the other expects the low-order word first. defconst must
exchange the two halves as it emits them.
The interface flag little_endian classifies the target, but nothing in
the interface classifies the host. lcc classifies the host automatically
during initialization:

(gen.c data)+=
int swap;

(shared progbeg 371)=
{
    union {
        char c;
        int i;
    } u;
    u.i = 0;
    u.c = 1;
    swap = (u.i == 1) != IR->little_endian;
}

Little-endian machines define u.c on top of the low bits of u.i, so the
assignment to u.c above sets u.i to 1. Big-endian machines define u.c
on top of the high bits of u.i, so the assignment to u.c sets u.i to
0x01000000 on lcc's 32-bit targets.
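
The host test and the resulting swap can be exercised outside the compiler. What follows is a sketch, not defconst itself: doublewords is a hypothetical helper, target_little_endian plays the role of IR->little_endian, and the code assumes an 8-byte IEEE double whose byte order matches the host's integer byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the host stores the low-order byte of an int first. */
static int host_is_little_endian(void) {
    union { char c; int i; } u;

    u.i = 0;
    u.c = 1;
    return u.i == 1;
}

/* Split d into the two 32-bit words a code generator would emit, in the
   order the target expects them in memory. */
static void doublewords(double d, int target_little_endian, uint32_t out[2]) {
    uint32_t w[2];
    int swap = host_is_little_endian() != target_little_endian;

    assert(sizeof d == sizeof w);
    memcpy(w, &d, sizeof w);      /* w[0], w[1] are in host memory order */
    out[0] = w[swap ? 1 : 0];
    out[1] = w[swap ? 0 : 1];
}
```

The double 1.0 has high-order word 0x3FF00000 and low-order word 0, so under these assumptions a little-endian target receives the words 0, 0x3FF00000 and a big-endian target receives 0x3FF00000, 0, whichever kind of host runs the code.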

Further Reading
From this chapter on, it helps to be up to date on computer architecture.
For example, blkunroll's load-load-store-store pattern makes little sense
without an understanding of how loads and stores typically interact on
current machines. Patterson and Hennessy (1990) surveys computer ar-
chitecture.

Exercises
13.1 Parts of lcc assume that the target machine has at most two register
sets. Identify these parts and generalize them to handle more
register sets.
13.2 Parts of lcc assume that the target machine has at most N registers
in each register set, where N is the number of bits in an unsigned.
Identify these parts and generalize them to handle larger register
sets.
13.3 The first column in Figure 13.4 gives a call trace for
blkcopy(25, 0, 8, 0, 20, {3, 9, 10})
when the source and destination addresses are divisible by four.
Give the analogous trace when the source and destination addresses
are divisible by two but not four.
13.4 lcc unrolls loops that copy structures of 16 or fewer bytes. This
limit was chosen somewhat arbitrarily. Run experiments to deter-
mine if another limit suits your machine better.

14
Selecting and Emitting Instructions

The instruction selectors in this book are generated automatically from
compact specifications by the program lburg, which is a code-generator
generator. lcc has had other instruction selectors - some written by
hand, some written by other code-generator generators - but none
of them appear in this book. lburg's code generators can misbehave
if nodes are traversed more than once, so all back ends in this book
clear wants_dag and act on trees, although the tree elements have type
struct node, not struct tree.
lburg accepts a compact specification and emits a tree parser written
in C that selects instructions for a target machine. Just as the front end's
parser partitions its input into units like statements and expressions, a
tree parser accepts a subject tree of intermediate code and partitions it
into chunks that correspond to instructions on the target machine. The
partition is called a tree cover. This chapter refers to the generated tree
parser as BURM, but lcc needs one parser for each target machine, so
lburg emits one BURM into each of mips.c, sparc.c, and x86.c.
The core of an lburg specification is a tree grammar. Like conventional
grammars, a tree grammar is a list of rules, and each rule has a
nonterminal on the left and a pattern of terminals - operators in the
intermediate code - and nonterminals on the right.
Typical rules associate with each pattern an addressing mode or in-
struction that performs the operator that appears in the pattern. Con-
ventional patterns are compared with a linear string, but tree patterns are
compared with a structured tree, so tree patterns must describe the op-
erators they match and the relative positions of those operators in the
pattern. lburg specifications describe this structure with a functional
notation and parentheses. For example, the pattern
ADDI(reg, con)
matches a tree at an ADDI node if the node's first child recursively
matches the nonterminal reg and the second child recursively matches
the nonterminal con. The rule
addr: ADDI(reg, con)
states that the nonterminal addr matches this sample pattern, and the
rule
stmt: ASGNI(addr, reg)
states that the nonterminal stmt matches each ASGNI node whose children
recursively match the nonterminals addr and reg.

[Figure 14.1: the tree ASGNI(ADDP(INDIRP(ADDRLP(p)), CNSTI(4)), CNSTI(5)), with each node annotated by the rules that cover it: stmt: ASGNI(addr,reg), addr: ADDP(addr,con), reg: con, con: CNSTI, reg: INDIRP(addr), and addr: ADDRLP.]

FIGURE 14.1 Cover for ASGNI(ADDP(INDIRP(ADDRLP(p)), CNSTI(4)), CNSTI(5)).

The generated code generator - that is, the output of the
code-generator generator lburg - produces a tree cover, which completely
covers each input tree with patterns from the grammar rules that meet
each pattern's constraints on terminals and nonterminals. For example,
Figure 14.1 gives a cover for the tree
ASGNI(ADDP(INDIRP(ADDRLP(p)),CNSTI(4)),CNSTI(5))
using the two rules above plus a few more shown in the figure. The rules
to the side of each node identify the cover, and the shaded regions each
correspond to one instruction on most machines.
Tree grammars that describe instruction sets are usually ambiguous.
For example, one can typically increment a register by adding one to it
directly, or by loading one into another register then adding the second
register to the first. We prefer the cheapest implementation, so we aug-
ment each rule with a cost, and prefer the tree parse with the smallest
total cost. Section 14.2 shows tree labels with costs.
A partial cover that looks cheap low in the tree can look more expen-
sive when it's completed, because the cover from the root down to the
partial cover can be costly. When matching a subtree, we can't know
which matches will look good when it is completed higher in the tree,
so the generated code generator records the best match for every non-
terminal at each node. Then the higher levels can choose any available
nonterminal, even those that don't look cheap at the lower levels. This
technique - recording a set of solutions and picking one of them later
- is called dynamic programming.
The generated code generator makes two passes over each subject
tree. The first pass is a bottom-up labeller, which finds a set of patterns
that cover each subtree with minimum cost. The second pass is a top-
down reducer, which picks the cheapest cover from the set recorded by
the labeller. It generates the code associated with the minimum-cost
patterns.

14.1 Specifications
The following grammar describes lburg specifications. term and nonterm
denote identifiers that are terminals and nonterminals:

grammar:
    '%{' configuration '%}' { dcl } %% { rule } [ %% C code ]
dcl:
    %start nonterm
    %term { term = integer }
rule:
    nonterm : tree template [ C expression ]
tree:
    term [ '(' tree [ , tree ] ')' ]
    nonterm
template:
    " { any character except double quote } "

lburg specifications are line oriented. The tokens %{, %}, and %% must
appear alone in a line, and all of a dcl or rule must appear on a line.
The configuration is C code. It is copied verbatim into the beginning of
BURM. If there's a second %%, the text after it is also copied verbatim into
BURM, at the end.
The configuration interfaces BURM and the trees being parsed. It defines
NODEPTR_TYPE to be a visible type name for a pointer to a node
in the subject tree. BURM uses the functions or macros OP_LABEL(p),
LEFT_CHILD(p), and RIGHT_CHILD(p) to read the operator and children
from the node pointed to by p.
BURM computes and stores a void pointer state in each node of the
subject tree. The configuration section defines a macro STATE_LABEL(p)
to access the state field of the node pointed to by p. A macro is required
because lburg uses it as an lvalue. The other configuration operations
may be implemented as macros or functions.
All lburg specifications in this book share one configuration:

(lburg prefix 375)=
#include "c.h"
#define NODEPTR_TYPE Node
#define OP_LABEL(p) ((p)->op)
#define LEFT_CHILD(p) ((p)->kids[0])
#define RIGHT_CHILD(p) ((p)->kids[1])
#define STATE_LABEL(p) ((p)->x.state)


The %start directive names the nonterminal that the root of each tree
must match. If there is no %start directive, the default start symbol is
the nonterminal defined by the first rule.
The %term declarations declare terminals - the operators in subject
trees - and associate a unique, positive integral opcode with each one.
OP_LABEL(p) must return a valid opcode for node p. Each terminal has
fixed arity, which lburg infers from the rules using that terminal. lburg
restricts terminals to at most two children. lcc's terminal declarations,
for example, include:
(terminal declarations 376)=
%start stmt
%term ADDD=306 ADDF=305 ADDI=309 ADDP=311 ADDU=310
%term ADDRFP=279
%term ADDRGP=263
%term ADDRLP=295
%term ARGB=41 ARGD=34 ARGF=33 ARGI=37 ARGP=39
Figure 14.2 holds a partial lburg specification for lcc and a subset of
the instruction set of most machines. The second and third lines declare
terminals.
Rules define tree patterns in a fully parenthesized prefix form. Every
nonterminal denotes a tree. A chain rule is a rule whose pattern is
another nonterminal. In Figure 14.2, rules 4, 5, and 8 are chain rules.
The rules describe the instruction set and addressing modes offered
by the target machine. Each rule has an assembler code template, which
is a quoted string that specifies what to emit when this rule is used.
Section 14.6 describes the format of these templates. In Figure 14.2, the
templates are merely rule numbers.

%start stmt
%term ADDI=309 ADDRLP=295 ASGNI=53
%term CNSTI=21 CVCI=85 INDIRC=67
%%
con:  CNSTI                "1"
addr: ADDRLP               "2"
addr: ADDI(reg,con)        "3"
rc:   con                  "4"
rc:   reg                  "5"
reg:  ADDI(reg,rc)         "6"  1
reg:  CVCI(INDIRC(addr))   "7"  1
reg:  addr                 "8"  1
stmt: ASGNI(addr,reg)      "9"  1

FIGURE 14.2 Sample lburg specification.

Rules end with an optional cost. Chain rules must use constant costs,
but other rules may use arbitrary C expressions in which a denotes the
node being matched. For example, the rule

con: CNSTU (a->syms[0]->u.c.v.u < 256 ? 0 : LBURG_MAX)

notes that unsigned constants cost nothing if they fit in a byte, and have
an infinite cost otherwise. All costs must evaluate to integers between
zero and LBURG_MAX inclusive. LBURG_MAX is defined as the largest short
integer:
(config.h 355)+=
#define LBURG_MAX SHRT_MAX

Omitted costs default to zero. The cost of a derivation is the sum of the
costs for all rules applied in the derivation. The tree parser finds the
cheapest parse of the subject tree. It breaks ties arbitrarily.
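
A cost expression is ordinary C, so the byte-constant rule above can be checked on its own. In this sketch cnstu_cost takes the constant's value directly rather than a node, and addcost is a hypothetical helper; its saturation at LBURG_MAX is this sketch's choice for keeping sums within a short, not something the text specifies.

```c
#include <assert.h>
#include <limits.h>

#define LBURG_MAX SHRT_MAX

/* Cost of matching an unsigned constant as con: free if it fits in a
   byte, otherwise prohibitively expensive. */
static int cnstu_cost(unsigned v) {
    return v < 256 ? 0 : LBURG_MAX;
}

/* Add two rule costs along a derivation, clamping at LBURG_MAX
   (an assumption of this sketch). */
static int addcost(int a, int b) {
    assert(a >= 0 && b >= 0);
    return a + b > LBURG_MAX ? LBURG_MAX : a + b;
}
```

A derivation that uses a one-byte constant under a cost-1 instruction totals addcost(cnstu_cost(255), 1), which is 1; replacing the constant with 256 drives the total to LBURG_MAX, so any cheaper alternative wins.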
In Figure 14.2, con matches constants. addr matches trees that can be
computed by address calculations, like an ADDRLP or the sum of a register
and a constant. rc matches a constant or a reg, and reg matches any tree
that can be computed into a register. Rule 6 describes an add instruction;
its first operand must be in a register, its second operand must be a
register or a constant, and its result is left in a register. Rule 7 describes
an instruction that loads a byte, extends the sign bit, and leaves the result
in a register. Rule 8 describes an instruction that loads an address into
a register. stmt matches trees executed for side effect, which include
assignments. Rule 9 describes an instruction that stores a register into
the cell addressed by some addressing mode.

14.2 Labelling the Tree


BURM starts by labelling the subject tree. It works bottom-up and left-
to-right, computing the rules that cover the tree with the minimum cost.
Figure 14.3 shows the tree for the assignment in the fragment:
{ int i; char c; i = c + 4; }

The other annotations in Figure 14.3 describe the labelling. (N, C, M)
indicates that the pattern associated with rule M with rule number N
matches the node with cost C. Each C sums the costs of the nonterminals
on the rule's right-hand side and the cost of the relevant pattern or chain
rule.
For example, rule 2 of Figure 14.2 matches the node ADDRLP i with
zero cost, so the node is labelled with (2, 0, addr: ADDRLP). Rule 8 says
that anything that matches an addr also matches a reg - with an additional
cost of one - so the node is also labelled with (8, 1, reg: addr).
And rule 5 says that anything that matches a reg also matches an rc -
at no extra cost - so the node is also labelled with (5, 1, rc: reg). As it
happens, the next match higher in the tree needs an addr, so the chain
rules aren't needed here. They are needed for, say, the CNSTI node, which
matches only con directly, but its parent needs rc, and only a chain rule
records that every con is also an rc. A bottom-up tree matcher can't
know which matches are needed at a higher level, so it records all of
them and lets the top-down reduction pass select the ones required by
the winner.

ASGNI                 (9, 3, stmt: ASGNI(addr,reg))
  ADDRLP i            (2, 0, addr: ADDRLP)
                      (8, 1, reg: addr)
                      (5, 1, rc: reg)
  ADDI                (6, 2, reg: ADDI(reg,rc))
                      (5, 2, rc: reg)
                      (3, 1, addr: ADDI(reg,con))
    CVCI              (7, 1, reg: CVCI(INDIRC(addr)))
                      (5, 1, rc: reg)
      INDIRC
        ADDRLP c      (2, 0, addr: ADDRLP)
                      (8, 1, reg: addr)
                      (5, 1, rc: reg)
    CNSTI 4           (1, 0, con: CNSTI)
                      (4, 0, rc: con)

FIGURE 14.3 Labelled tree for i = c + 4.

Patterns can specify subtrees beyond the immediate children. For example,
rule 7 of Figure 14.2 refers to the grandchild of the CVCI node.
No separate pattern matches the INDIRC node, but rule 7's pattern covers
that node. The cost is the cost of matching the ADDRLP c as an addr
(using rule 2) plus one.
Nodes are annotated with (N, C, M) only if C is less than all previ-
ous matches of the nonterminal in rule M. For example, the ADDI node
matches a reg using rule 6; the total cost is 2. It also matches an addr
using rule 3, so chain rule 8 gives a second match for reg, also at a total
cost of 2. Only one of these matches for reg will be recorded. lburg
breaks ties arbitrarily, so there's no easy way to predict which match will
win, but it doesn't matter because they have the same cost.
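
The labelling just described can be reproduced by a small hand-written labeller for the nine rules of Figure 14.2. This is a sketch of the technique, not the code that lburg generates: the struct, the enum values, and the helper record are all hypothetical, and rc names the register-or-constant nonterminal. Ties are broken in favor of the first match recorded, which is one arbitrary choice among several.

```c
#include <assert.h>
#include <stddef.h>

enum { ADDI, ADDRLP, ASGNI, CNSTI, CVCI, INDIRC };   /* operators */
enum { STMT, CON, ADDR, RC, REG, NNT };              /* nonterminals */
#define INF 32767

struct node {
    int op;
    struct node *kids[2];
    int cost[NNT], rule[NNT];   /* best match per nonterminal; rule 0 = none */
};

/* Record rule r for nonterminal nt if cost c beats the best so far, then
   apply the chain rules of Figure 14.2 that start from nt. */
static void record(struct node *p, int nt, int c, int r) {
    if (c >= p->cost[nt])
        return;
    p->cost[nt] = c;
    p->rule[nt] = r;
    if (nt == CON)  record(p, RC, c, 4);        /* rc: con   "4"   */
    if (nt == REG)  record(p, RC, c, 5);        /* rc: reg   "5"   */
    if (nt == ADDR) record(p, REG, c + 1, 8);   /* reg: addr "8" 1 */
}

/* Label the tree bottom-up and left-to-right. */
static void label(struct node *p) {
    struct node *l, *r;
    int i;

    assert(p != NULL);
    l = p->kids[0];
    r = p->kids[1];
    if (l) label(l);
    if (r) label(r);
    for (i = 0; i < NNT; i++) {
        p->cost[i] = INF;
        p->rule[i] = 0;
    }
    switch (p->op) {
    case CNSTI:  record(p, CON, 0, 1); break;
    case ADDRLP: record(p, ADDR, 0, 2); break;
    case ADDI:
        if (l->cost[REG] < INF && r->cost[RC] < INF)     /* rule 6, cost 1 */
            record(p, REG, l->cost[REG] + r->cost[RC] + 1, 6);
        if (l->cost[REG] < INF && r->cost[CON] < INF)    /* rule 3, cost 0 */
            record(p, ADDR, l->cost[REG] + r->cost[CON], 3);
        break;
    case CVCI:                                           /* rule 7, cost 1 */
        if (l->op == INDIRC && l->kids[0]->cost[ADDR] < INF)
            record(p, REG, l->kids[0]->cost[ADDR] + 1, 7);
        break;
    case ASGNI:                                          /* rule 9, cost 1 */
        if (l->cost[ADDR] < INF && r->cost[REG] < INF)
            record(p, STMT, l->cost[ADDR] + r->cost[REG] + 1, 9);
        break;
    }
}
```

Run on the tree for i = c + 4, this labels the root with rule 9 at cost 3 and the ADDI node with rule 6 at cost 2 for reg and rule 3 at cost 1 for addr. The INDIRC node ends up with no match of its own, because only rule 7's pattern covers it.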
lburg generates the function

(BURM signature 378)=
static void _label ARGS((NODEPTR_TYPE a));

which labels the entire subject tree pointed to by a. State zero labels
unmatched trees; such trees may be corrupt or merely inconsistent with the
grammar. lburg starts all generated names with an underscore to avoid
colliding with names in BURM's C prologue and epilogue. The identifiers
are declared static and their addresses are stored in an interface record
so that lcc can include multiple code generators. One fragment collects
the identifiers' declarations for a structure declarator:

(interface to instruction selector 379)=
void (*_label) ARGS((Node));

Another collects the names for the C initializer that records the names
of the statics:

(Xinterface initializer 355)+=
_label,

The other identifiers that lburg defines in BURM have corresponding entries
in the two fragments above, but the text below elides them to cut
repetition.

14.3 Reducing the Tree


BURM's labeller traverses the subject tree bottom-up. It can't know which
rule will match at the next higher level, so it can't know which non-
terminal that rule will require. So it uses dynamic programming and
records the best match for all nonterminals. The label encodes a vector
of rule numbers, one for each nonterminal. lburg creates a structure
type _state in which _label stores the best (N, C, M) for each nonter-
minal:
struct _state {
    short cost[MAX_NONTERMINALS];
    short rule[MAX_NONTERMINALS];
};

The cost vector stores the cost of the best match for each nonterminal,
and the rule vector stores the rule number that achieved that cost. (Part
of the declaration above is a white lie: lburg compresses the rule field
using bit fields, but lburg supplies functions to extract the fields, so we
needn't waste time studying the encoding.)
lburg writes a function _rule, which accepts a tree's state label and
an integer representing a nonterminal:

(BURM signature 378)+=
static int _rule ARGS((void *state, int nt));

It extracts from the label's encoded vector of rule numbers the number
of the rule with the given nonterminal on the left. It returns zero if no
rule matched the nonterminal.
BURM's second pass, or reducer, traverses the subject tree top-down,
so it has the context that the labeller was missing. The root must match
the start nonterminal, so the reducer extracts the best rule for the start
nonterminal from the vector of rule numbers encoded by the root's label.
If this rule's pattern includes nonterminals, then they identify a new
frontier to reduce and the nonterminals that the frontier must match.
The process begun with the root is thus repeated recursively to expose
the best cover for the entire tree. The display below traces the process
for Figure 14.3:

_rule(root, stmt) = 9
_rule(root->kids[0], addr) = 2
_rule(root->kids[1], reg) = 6
_rule(root->kids[1]->kids[0], reg) = 7
_rule(root->kids[1]->kids[0]->kids[0]->kids[0], addr) = 2
_rule(root->kids[1]->kids[1], rc) = 4
_rule(root->kids[1]->kids[1], con) = 1

Each rule's pattern identifies the subject subtrees and nonterminals
for all recursive visits. Here, a subtree is not necessarily an immediate
child of the current node. Patterns with interior operators cause the
reducer to skip the corresponding subject nodes, so the reducer may
proceed directly to grandchildren, great-grandchildren, and so on. On the
other hand, chain rules cause the reducer to revisit the current subject
node, with a new nonterminal, so x is also regarded as a subtree of x.
lburg represents the start nonterminal with 1, so nt for the initial,
root-level call on _rule must be 1. BURM defines and initializes an array
that identifies the values for nested calls:

(BURM signature 378)+=
static short *_nts[];

_nts is an array indexed by rule numbers. Each element points to a zero-
terminated vector of short integers, which encode the nonterminals for
that rule's pattern, left-to-right. For example, the following code imple-
ments _nts for Figure 14.2:

static short _r1_nts[] = { 0 };
static short _r3_nts[] = { 4, 1, 0 };
static short _r4_nts[] = { 1, 0 };
static short _r5_nts[] = { 4, 0 };
static short _r6_nts[] = { 4, 3, 0 };
static short _r7_nts[] = { 2, 0 };
static short _r9_nts[] = { 2, 4, 0 };

short *_nts[] = {
    0,        /* (no rule zero) */
    _r1_nts,  /* con: CNSTI */
    _r1_nts,  /* addr: ADDRLP */
    _r3_nts,  /* addr: ADDI(reg,con) */
    _r4_nts,  /* rc: con */
    _r5_nts,  /* rc: reg */
    _r6_nts,  /* reg: ADDI(reg,rc) */
    _r7_nts,  /* reg: CVCI(INDIRC(addr)) */
    _r7_nts,  /* reg: addr */
    _r9_nts,  /* stmt: ASGNI(addr,reg) */
};


The user needs only _rule and _nts to write a complete reducer, but
the redundant _kids simplifies many applications:

(BURM signature 378)+=
static void _kids
    ARGS((NODEPTR_TYPE p, int rulenum, NODEPTR_TYPE kids[]));

It accepts the address of a tree p, a rule number, and an empty vector of
pointers to trees. The procedure assumes that p matched the given rule,
and it fills in the vector with the subtrees (in the sense described above)
of p that must be reduced recursively. kids is not null-terminated.
The code below shows the minimal reducer. It traverses the best cover
bottom-up and left-to-right, but it doesn't do anything during the traver-
sal. parse labels the tree and then starts the reduction. reduce gets
the number of the matching rule from _rule, the matching frontier from
_kids, and the nonterminals to use for the recursive calls from _nts.

parse(NODEPTR_TYPE p) {
    _label(p);
    reduce(p, 1);
}

reduce(NODEPTR_TYPE p, int nt) {
    int i, rulenum = _rule(STATE_LABEL(p), nt);
    short *nts = _nts[rulenum];
    NODEPTR_TYPE kids[10];

    _kids(p, rulenum, kids);
    for (i = 0; nts[i]; i++)
        reduce(kids[i], nts[i]);
}

This particular reducer does nothing with any node. If the node were
processed - for example, emitted or allocated a register - in preorder,
the processing code would go at the beginning of the reducer. Postorder
processing code would go at the end, and inorder code would go between
reduce's recursive calls on itself. A reducer may recursively traverse sub-
trees in any order, and it may interleave arbitrary actions with recursive
traversals.
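
A preorder reducer for the grammar of Figure 14.2 can be sketched without lburg by filling in each node's rule table by hand. Everything here is hypothetical scaffolding: the struct, the nonterminal numbering (only the start nonterminal's value of 1 follows the text), getkids in place of _kids, and the visited log that stands in for real per-node processing.

```c
#include <assert.h>
#include <stddef.h>

enum { STMT = 1, CON, ADDR, RC, REG, NNT };   /* numbering is this sketch's own */

struct node {
    struct node *kids[2];
    int rule[NNT];     /* rule[nt]: best rule for nt, as a labeller left it */
};

/* Nonterminals on each rule's right-hand side, zero-terminated. */
static const short nts[10][3] = {
    {0}, {0}, {0}, {REG, CON, 0}, {CON, 0}, {REG, 0},
    {REG, RC, 0}, {ADDR, 0}, {ADDR, 0}, {ADDR, REG, 0}
};

/* Fill kids[] with the subtrees that rule rulenum leaves to be reduced. */
static void getkids(struct node *p, int rulenum, struct node *kids[]) {
    switch (rulenum) {
    case 3: case 6: case 9:          /* binary patterns: both children */
        kids[0] = p->kids[0];
        kids[1] = p->kids[1];
        break;
    case 7:                          /* CVCI(INDIRC(addr)): the grandchild */
        kids[0] = p->kids[0]->kids[0];
        break;
    case 4: case 5: case 8:          /* chain rules revisit the same node */
        kids[0] = p;
        break;
    }
}

static int visited[16], nvisited;    /* rule numbers, in preorder */

static void reduce(struct node *p, int nt) {
    int i, rulenum = p->rule[nt];
    struct node *kids[2];

    assert(rulenum != 0 && nvisited < 16);
    visited[nvisited++] = rulenum;   /* preorder processing goes here */
    getkids(p, rulenum, kids);
    for (i = 0; nts[rulenum][i]; i++)
        reduce(kids[i], nts[rulenum][i]);
    /* postorder processing would go here */
}
```

With the labels of Figure 14.3 stored in the nodes, reduce(root, STMT) visits rules 9, 2, 6, 7, 2, 4, 1 in that order. Moving the line that logs the rule below the loop would turn the same traversal into a postorder one.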
Multiple reducers may be written, to implement multipass algorithms
or independent single-pass algorithms. lcc has three reducers. One
identifies the nodes that need registers, another emits code, and a third
prints a tree cover to help during debugging. They all use getrule, which
wraps _rule in some (elided) assertions and encapsulates the indirection
through IR:

(gen.c functions)+=
static int getrule(p, nt) Node p; int nt; {
    int rulenum;

    rulenum = (*IR->x._rule)(p->x.state, nt);
    return rulenum;
}

The first reducer prepares for register allocation. It augments the minimal
reducer to mark nodes that are computed by instructions and thus
may need registers:

(gen.c functions)+=
static void reduce(p, nt) Node p; int nt; {
    int rulenum, i;
    short *nts;
    Node kids[10];

    p = reuse(p, nt);
    rulenum = getrule(p, nt);
    nts = IR->x._nts[rulenum];
    (*IR->x._kids)(p, rulenum, kids);
    for (i = 0; nts[i]; i++)
        reduce(kids[i], nts[i]);
    if (IR->x._isinstruction[rulenum]) {
        p->x.inst = nt;
        (count uses of temporaries 384)
    }
}

lburg flags in x._isinstruction rules that emit instructions, in contrast
to those that emit subinstructions like addressing modes; it does so by
examining the assembler template, which Section 14.6 explains.
x.inst above is more than just a flag; it also identifies the nonterminal
responsible for the mark. The register allocator linearizes the instruction
tree, and the emitter reduces each instruction in isolation, so the emitter
needs a record of the nonterminal used in the instruction's reduction.
reduce collaborates with reuse to reverse excessive common subex-
pression elimination. The front end assigns common subexpressions to
temporaries and uses the temporaries to avoid recalculation, but this
can increase costs in some cases. For example, MIPS addressing hard-
ware adds a 16-bit constant to a register for free, so when such a sum
is used only as an address (that is, by instructions that reference
memory), putting it in a register would only add an instruction and
consume another register.
So lburg extends the labeller to look for trees that read registers -
INDIRx(VREGP). If the register holds a common subexpression, and if
the expression may be profitably recalculated, the labeller augments the
label with bonus matches equal to the set of all free matches of the
expression assigned to the temporary.
For example, consider the code for p->b=q->b when, say, p is in register
23, q is in register 30, and the field b has offset 4. Figure 14.4 shows
the trees of intermediate code.
The first tree copies the common subexpression 4 to a temporary register,
and the second tree uses the temporary twice to complete the statement.
The first label on the INDIRI node results from a typical pattern
match, but the second is a bonus match. Without the bonus match, lcc's
MIPS code generator would emit five instructions:

la  $25,4          load the constant 4 into register 25
add $24,$30,$25    compute the address of q->b into register 24
lw  $24,($24)      load value of q->b into register 24
add $25,$23,$25    compute the address of p->b into register 25
sw  $24,($25)      store the value of q->b into p->b

The bonus match enables several others, and together they save three
instructions and one register:

lw $24,4($30)      load value of i into register 24
sw $24,4($23)      store register 24 into x[0]

lcc's reducers call reuse(p, nt) to see if the reduction of node p using
nonterminal nt uses a bonus match. If so, reuse returns the common
subexpression instead of p, and thus has the reducer reprocess the
common subexpression and ignore the temporary:

[Figure 14.4: two trees of intermediate code. The first is
ASGNI(VREGP 3, CNSTI 4). The second is
ASGNI(ADDP(INDIRP(VREGP p), INDIRI(VREGP 3)),
      INDIRI(ADDP(INDIRP(VREGP q), INDIRI(VREGP 3)))).
An INDIRI(VREGP 3) node carries two labels: reg: INDIRI(VREGP), the
conventional match, and con: CNSTI, the bonus match.]

FIGURE 14.4 Excessive common subexpression elimination in p->b=q->b.



(gen.c functions)+=
    static Node reuse(p, nt) Node p; int nt; {
        struct _state {
            short cost[1];
        };
        Symbol r = p->syms[RX];

        if (generic(p->op) == INDIR && p->kids[0]->op == VREG+P
        && r->u.t.cse && p->x.mayrecalc
        && ((struct _state*)r->u.t.cse->x.state)->cost[nt] == 0)
            return r->u.t.cse;
        else
            return p;
    }

The first return effectively ignores the tree p and reuses the definition
of the common subexpression. If p uses a common subexpression, then the
definition of that subexpression is guaranteed to have been labelled al-
ready, so the reducer that called reuse can't wander off into the wind.
The cast and artificial _state above are necessary evils to access the la-
beller's cost of matching the tree to nonterminal nt. This book doesn't
expose the form of the state record (except for here, it's needed only in
code generated automatically from the lburg specification), though it's
easy to understand if you examine the companion diskette's source code
once you understand labelling. The length of the actual, target-specific
cost vector can't be known here, but it isn't needed, so the declaration
can pretend that the length is one.
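
The decision that reuse makes can be sketched in isolation. The block below is only an illustration under stated assumptions: the struct, its field names, and the nonterminal numbers are hypothetical stand-ins for lcc's Node, x.mayrecalc, u.t.cse, and the labeller's cost vector, and it is written with ANSI prototypes rather than the book's older function-definition style.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nonterminal numbers; real ones come from the lburg spec. */
enum { NT_REG = 1, NT_ADDR = 2, NT_MAX = 3 };

typedef struct node Node;
struct node {
    int is_temp_read;     /* stands in for the INDIRx(VREGP) test */
    int mayrecalc;        /* stands in for x.mayrecalc */
    Node *cse;            /* stands in for syms[RX]->u.t.cse */
    short cost[NT_MAX];   /* stands in for the labeller's cost vector */
};

/* Return the defining expression when recomputing it is free for
   nonterminal nt; otherwise return the node itself. */
static Node *reuse_sketch(Node *p, int nt) {
    assert(p != NULL && nt > 0 && nt < NT_MAX);
    if (p->is_temp_read && p->cse != NULL && p->mayrecalc
        && p->cse->cost[nt] == 0)
        return p->cse;
    return p;
}
```

In this model a use of a temporary is redirected to its definition only for the nonterminals that the definition matches at zero cost, which is the sense in which a bonus match is "free".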
reduce also counts the number of remaining uses for each temporary:
(count uses of temporaries 384)=
    if (p->syms[RX] && p->syms[RX]->temporary) {
        p->syms[RX]->x.usecount++;
    }

If reuse leaves a temporary with no readers, the register allocator will
eliminate the code that loads the temporary.
The initial version of reuse was implemented one type suffix at a time,
which illustrates what really matters in at least some C programs. lcc
comes with a testbed of 18 programs comprising roughly 9,000 lines. We
store baseline assembler code for these programs, which we compare
with the new code every time we change lcc. The first cut at reuse
eliminated only free common subexpressions with the type suffix I. It
saved the MIPS testbed 58 instructions. Adding the suffixes C, S, D, F,
and B saved nothing, but adding P saved 382 instructions.
A common subexpression can't be recalculated if even one of its inputs
has changed. Before allowing a bonus match, the labeller calls mayrecalc
to confirm that the common subexpression can be reevaluated, and it
records the answer in x.mayrecalc:
(gen.c functions)+=
    int mayrecalc(p) Node p; {
        Node q;

        (mayrecalc 385)
    }

mayrecalc fails if the node does not represent a common subexpression:

(mayrecalc 385)=
    if (!p->syms[RX]->u.t.cse)
        return 0;
It also fails if any tree earlier in the forest clobbers an input to the com-
mon subexpression:
(mayrecalc 385)+=
    for (q = head; q && q->x.listed; q = q->link)
        if (generic(q->op) == ASGN
        && trashes(q->kids[0], p->syms[RX]->u.t.cse))
            return 0;
If neither condition holds, then the common subexpression can safely be
reevaluated:
(mayrecalc 385)+=
    p->x.mayrecalc = 1;
    return 1;
trashes(p, q) traverses the common subexpression q and reports if the
assignment target p is read anywhere in q:
(gen.c functions)+=
    static int trashes(p, q) Node p, q; {
        if (!q)
            return 0;
        else if (p->op == q->op && p->syms[0] == q->syms[0])
            return 1;
        else
            return trashes(p, q->kids[0])
                || trashes(p, q->kids[1]);
    }
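
The check can be tried on a toy tree. The sketch below is an assumption-laden model, not lcc's code: the Tree type, the integer symbol ids, and the array of earlier assignment targets are hypothetical replacements for Node, syms[0], and the forest walk from head.

```c
#include <assert.h>
#include <stddef.h>

typedef struct tree Tree;
struct tree {
    int op;            /* hypothetical opcode number */
    int sym;           /* hypothetical symbol id; 0 means none */
    Tree *kids[2];
};

/* Nonzero if the assignment target occurs anywhere inside expr. */
static int trashes_sketch(const Tree *target, const Tree *expr) {
    if (expr == NULL)
        return 0;
    if (target->op == expr->op && target->sym == expr->sym)
        return 1;
    return trashes_sketch(target, expr->kids[0])
        || trashes_sketch(target, expr->kids[1]);
}

/* Nonzero if none of the n earlier assignment targets is read by cse. */
static int mayrecalc_sketch(Tree *const targets[], int n, const Tree *cse) {
    int i;

    assert(n >= 0);
    if (cse == NULL)
        return 0;              /* not a common subexpression */
    for (i = 0; i < n; i++)
        if (trashes_sketch(targets[i], cse))
            return 0;          /* an input was overwritten */
    return 1;
}
```

With a common subexpression that reads local a, an earlier assignment to b leaves it recomputable, while an earlier assignment to a does not.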

When reduce and its helpers are done, gen calls prune. It uses the
x.inst mark to construct a tree of just instructions in the x.kids fields.
The register allocator runs next, and only instructions need registers.
The rest of the nodes (for example, ADDP nodes evaluated automatically
by addressing hardware) need no registers, so lcc projects them out
of the tree that the register allocator sees. The original tree remains in
the kids fields. The call to prune follows a reducer, but prune itself isn't
a reducer.
(gen.c functions)+=
    static Node *prune(p, pp) Node p, pp[]; {
        (prune 386)
    }

pp points to an element of some node's x.kids vector, namely the next
element to fill in. p points at the tree to prune. If p represents an in-
struction, prune stores the instruction into *pp and returns pp+1, which
points at the next empty cell. Otherwise, prune stores nothing, returns
pp, and does not advance.
If the tree p is empty, prune is done:
(prune 386)=
    if (p == NULL)
        return pp;
Otherwise, prune clears any trash in the node's x.kids fields:
(prune 386)+=
    p->x.kids[0] = p->x.kids[1] = NULL;
If p is not an instruction, prune looks for instructions in the subtrees,
starting with the first child:
(prune 386)+=
    if (p->x.inst == 0)
        return prune(p->kids[1], prune(p->kids[0], pp));


Each recursive call can store zero or more instructions. Nesting the calls
above ensures that prune returns the cumulative effect on pp.
If p is an instruction that sets a temporary, and if the temporary's
x.usecount is less than two, then the temporary is set (by the instruction)
but never used, the instruction is omitted from the tree, and the traversal
continues as above:
(prune 386)+=
    else if (p->syms[RX] && p->syms[RX]->temporary
    && p->syms[RX]->x.usecount < 2) {
        p->x.inst = 0;
        return prune(p->kids[1], prune(p->kids[0], pp));
    }
Recall that reduce just computed x.usecount.
If none of the conditions above are met, p is a necessary instruction.
prune deposits it in *pp and returns the address of the next element
to set. It also prunes the node's subtrees and deposits any instructions
there into p's x.kids, because any instructions below this one must be
children of p and not the higher node into which pp points.
(prune 386)+=
    else {
        prune(p->kids[1], prune(p->kids[0], &p->x.kids[0]));
        *pp = p;
        return pp + 1;
    }

prune bumps pp and can later store another p into the addressed cell.
This process can't overshoot, because x.kids has been made long enough
to handle the maximum number of registers read by any target instruc-
tion, which is the same as the number of children that any instruction
(and thus any node) can have. Ideally, prune would confirm this asser-
tion, but checking would require at least one more argument that would
be read only by assertions.
The dashed lines in Figure 14.5 show the x.kids that prune adds to
the tree in Figure 14.3 if ASGNI, ADDI, and CVCI are instructions and the
remaining nodes are subinstructions, which would be the case on many
current machines: CVCI loads a byte and extends its sign, ADDI adds 4,
and ASGNI stores the result. The solid lines are kids.
The display below tracks the calls on prune that are made as the
dashed links are created, but it cuts clutter by omitting calls for which p
is zero, and by naming the nodes with their opcodes:

[Figure 14.5: the tree of Figure 14.3,
ASGNI(ADDRLP(i), ADDI(CVCI(INDIRC(ADDRLP(c))), CNSTI(4))),
drawn with solid lines for kids and dashed lines for x.kids. The dashed
links run from ASGNI to ADDI and from ADDI to CVCI.]

FIGURE 14.5 Figure 14.3 pruned.

    prune(ASGNI, &dummy)                called
    prune(ADDRLP, &ASGNI->x.kids[0])    called
    prune(ADDI, &ASGNI->x.kids[0])      called
    prune(CVCI, &ADDI->x.kids[0])       called
    prune(INDIRC, &CVCI->x.kids[0])     called
    prune(ADDRLP, &CVCI->x.kids[0])     called
    prune(CVCI, &ADDI->x.kids[0])       points the ADDI at the CVCI
    prune(CNSTI, &ADDI->x.kids[1])      called
    prune(ADDI, &ASGNI->x.kids[0])      points the ASGNI at the ADDI
    prune(ASGNI, &dummy)                points dummy at the ASGNI
gen calls prune and supplies a dummy cell to receive the pointer to the
top-level instruction. A dummy cell suffices because the root is executed
for side effect, so it must be an instruction, and gen knows where the
roots are without examining dummy.
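
The projection that prune performs can be reproduced on a small stand-alone tree. This is a minimal sketch under assumptions: PNode, inst, and xkids are hypothetical stand-ins for Node, x.inst, and x.kids, and the case that drops an instruction whose temporary is never read is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef struct pnode PNode;
struct pnode {
    const char *name;
    int inst;            /* nonzero if the node is an instruction */
    PNode *kids[2];      /* the full tree */
    PNode *xkids[3];     /* the instruction-only tree built by pruning */
};

/* Store the instructions found in tree p into successive cells at pp
   and return the next empty cell. */
static PNode **prune_sketch(PNode *p, PNode **pp) {
    if (p == NULL)
        return pp;
    p->xkids[0] = p->xkids[1] = p->xkids[2] = NULL;
    if (!p->inst)        /* subinstruction: look through it */
        return prune_sketch(p->kids[1], prune_sketch(p->kids[0], pp));
    /* instruction: its subtrees fill its own xkids, not the parent's */
    prune_sketch(p->kids[1], prune_sketch(p->kids[0], &p->xkids[0]));
    *pp = p;
    return pp + 1;
}
```

Applied to the tree of Figure 14.3 with ASGNI, ADDI, and CVCI marked as instructions, the sketch links ASGNI to ADDI and ADDI to CVCI and leaves every other xkids cell empty.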

14.4 Cost Functions


Most of the costs in lburg specifications are constant, but a few depend
on properties of the node being matched. For example, some instructions
that add a constant to another operand are confined to constants that fit
in a few bits. For nodes p that hold a constant (ADDRL and ADDRF nodes
hold constant stack offsets, and CNST nodes hold numeric constants),
range(p, lo, hi) determines whether the constant lies between integers
lo and hi inclusive. If it does, range returns a zero cost; otherwise it
returns a high cost, which forces the tree parser to use another match.
In an lburg cost expression, a denotes the node being matched, namely
the argument to _label when the cost expression is evaluated. A typical
use is:
    con8: CNSTI "%a" range(a, -128, 127)
The rule above matches all CNSTI nodes, but the cost is prohibitive if the
constant doesn't fit in a signed 8-bit field. The implementation is:
(gen.c functions)+=
    #define ck(i) return (i) ? 0 : LBURG_MAX

    int range(p, lo, hi) Node p; int lo, hi; {
        Symbol s = p->syms[0];

        switch (p->op) {
        case ADDRFP: ck(s->x.offset >= lo && s->x.offset <= hi);
        case ADDRLP: ck(s->x.offset >= lo && s->x.offset <= hi);
        case CNSTC:  ck(s->u.c.v.sc >= lo && s->u.c.v.sc <= hi);
        case CNSTI:  ck(s->u.c.v.i >= lo && s->u.c.v.i <= hi);
        case CNSTS:  ck(s->u.c.v.ss >= lo && s->u.c.v.ss <= hi);
        case CNSTU:  ck(s->u.c.v.u >= lo && s->u.c.v.u <= hi);
        case CNSTP:  ck(s->u.c.v.p == 0 && lo <= 0 && hi >= 0);
        }
        return LBURG_MAX;
    }

For unsigned character constants, range should zero-extend with the
value of u.c.v.uc, and not sign-extend with u.c.v.sc, but range's short-
cut can't hurt because CNSTC nodes appear only as the right-hand side of
an ASGNC, which ignores the extended bits anyway. Without this short-
cut, we'd need signed and unsigned variants of CNSTC to distinguish the
two cases. Unsigned short constants behave likewise.
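
The effect of a range-style cost function on rule selection can be seen in a few lines. The sketch below is illustrative only: COST_MAX is a hypothetical stand-in for LBURG_MAX, the functions take a plain value instead of a Node, and pick_width merely imitates a parser choosing the narrowest constant rule whose cost is not prohibitive.

```c
#include <assert.h>
#include <limits.h>

#define COST_MAX SHRT_MAX   /* hypothetical stand-in for LBURG_MAX */

/* Zero cost if lo <= value <= hi, otherwise a prohibitive cost. */
static int range_sketch(long value, long lo, long hi) {
    assert(lo <= hi);
    return (value >= lo && value <= hi) ? 0 : COST_MAX;
}

/* Pick the narrowest immediate field, in bits, that holds value. */
static int pick_width(long value) {
    if (range_sketch(value, -128L, 127L) == 0)
        return 8;
    if (range_sketch(value, -32768L, 32767L) == 0)
        return 16;
    return 32;
}
```

A rule guarded this way still matches every constant; the high cost simply guarantees that some other rule wins whenever the constant does not fit.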

14.5 Debugging

lburg augments the tree parser with an encoding of much of its in-
put specification. This material is not strictly necessary, but it can help
produce displays for debugging. For example, the vectors _opname and
_arity hold the name and number of children, respectively, for each
terminal:
(BURM signature 378)+=
    static char *_opname[];
    static char _arity[];
They are indexed by the terminal's integral opcode. lcc uses them in
dumptree, which prints the operator and any subtrees in parentheses
and separated by commas:
(gen.c functions)+=
    static void dumptree(p) Node p; {
        fprint(2, "%s(", IR->x._opname[p->op]);
        if (IR->x._arity[p->op] == 0 && p->syms[0])
            fprint(2, "%s", p->syms[0]->name);
        else if (IR->x._arity[p->op] == 1)
            dumptree(p->kids[0]);
        else if (IR->x._arity[p->op] == 2) {
            dumptree(p->kids[0]);
            fprint(2, ", ");
            dumptree(p->kids[1]);
        }
        fprint(2, ")");
    }

For leaves, dumptree adds p->syms[0] if it's present. It prints the tree
in Figure 14.3 as:
    ASGNI(ADDRLP(i), ADDI(CVCI(INDIRC(ADDRLP(c))), CNSTI(4)))
lcc uses dumptree, but this book omits the calls. They aren't interesting,
but dumptree itself is worth presenting to demonstrate lburg's debug-
ging support.
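
A stand-alone version of the same traversal is sketched below. It is a model under assumptions: DNode and its fields are hypothetical, the output goes to a caller-supplied buffer instead of fprint, and the number of children is read from the kid pointers rather than from the _arity vector.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct dnode DNode;
struct dnode {
    const char *opname;
    const char *symname;   /* symbol name for leaves, or NULL */
    DNode *kids[2];
};

/* Append s to the string in out without exceeding cap bytes. */
static void append(char *out, size_t cap, const char *s) {
    size_t n = strlen(out);

    if (n + 1 < cap)
        snprintf(out + n, cap - n, "%s", s);
}

/* Append a parenthesized, comma-separated rendering of tree p. */
static void dump_sketch(const DNode *p, char *out, size_t cap) {
    append(out, cap, p->opname);
    append(out, cap, "(");
    if (p->kids[0] == NULL && p->symname != NULL)
        append(out, cap, p->symname);
    else if (p->kids[0] != NULL) {
        dump_sketch(p->kids[0], out, cap);
        if (p->kids[1] != NULL) {
            append(out, cap, ", ");
            dump_sketch(p->kids[1], out, cap);
        }
    }
    append(out, cap, ")");
}
```

The caller passes a buffer that already holds a string (possibly empty), and the rendering is appended to it.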
The vector _string holds the text for each rule.
(BURM signature 378)+=
    static char *_string[];
It is indexed by a rule number. The reducer dumpcover extends the min-
imal reducer and uses _string to print a tree cover using indentation:
(gen.c functions)+=
    static void dumpcover(p, nt, in) Node p; int nt, in; {
        int rulenum, i;
        short *nts;
        Node kids[10];

        p = reuse(p, nt);
        rulenum = getrule(p, nt);
        nts = IR->x._nts[rulenum];
        fprint(2, "dumpcover(%x) ", p);
        for (i = 0; i < in; i++)
            fprint(2, " ");
        dumprule(rulenum);
        (*IR->x._kids)(p, rulenum, kids);
        for (i = 0; nts[i]; i++)
            dumpcover(kids[i], nts[i], in+1);
    }

    static void dumprule(rulenum) int rulenum; {
        fprint(2, "%s / %s", IR->x._string[rulenum],
            IR->x._templates[rulenum]);
        if (!IR->x._isinstruction[rulenum])
            fprint(2, "\n");
    }
When compiling MIPS code for Figure 14.3, dumpcover prints:
    dumpcover(1001e9b8) stmt: ASGNI(addr, reg) / sw $%2,%1
    dumpcover(1001e790)  addr: ADDRLP / %a($sp)
    dumpcover(1001e95c)  reg: addr / la $%c,%1
    dumpcover(1001e95c)   addr: ADDI(reg, con) / %2($%1)
    dumpcover(1001e8a4)    reg: CVCI(INDIRC(addr)) / lb $%c,%1
    dumpcover(1001e7ec)     addr: ADDRLP / %a($sp)
    dumpcover(1001e900)    con: CNSTI / %a
The next section explains x._templates and the assembler templates
after each rule.

14.6 The Emitter


lcc's emitter is what actually outputs assembler code for the target ma-
chine. The emitter is target-independent and driven by two arrays that
capture the necessary machine-specific data. lburg emits into each BURM
some C code that declares and initializes these arrays. Both arrays are
indexed by a rule number. One yields the template for the rule:
(BURM signature 378)+=
    static char *_templates[];
The other flags the templates that correspond to instructions, and thus
distinguishes them from subinstructions like addressing modes:
(BURM signature 378)+=
    static char _isinstruction[];
lburg numbers the rules starting from one, and it reports matches by
returning rule numbers, from which the templates may be found when
necessary. If a template ends with a newline character, then lburg as-
sumes that it is an instruction. If it ends with no newline character, then
it's necessarily a piece of an instruction, such as an operand.
emitasm interprets the rule structure and its assembler code template:
(gen.c functions)+=
    static unsigned emitasm(p, nt) Node p; int nt; {
        int rulenum;
        short *nts;
        char *fmt;
        Node kids[10];

        p = reuse(p, nt);
        rulenum = getrule(p, nt);
        nts = IR->x._nts[rulenum];
        fmt = IR->x._templates[rulenum];
        (emitasm 392)
        return 0;
    }

emitasm is another reducer, but it processes a partially linearized tree.
List elements are the roots of subtrees for instructions. emitasm calls
itself recursively only to process subinstructions like address calcula-
tions. Its traversal starts with an instruction and ends when the recur-
sion reaches the instructions that supply values to this instruction. That
is, emitasm's reduction traces the intra-instruction tree parse, which cor-
responds to addressing modes and other computations inside a single
instruction. emitasm's driver, emit, ensures that emitasm sees these in-
structions in the right order, which handles interinstruction ordering.
emit sets x.emitted to flag nodes as it emits them. When emitasm
encounters an instruction that it has already emitted, it emits only the
name of the register in which that instruction left its result. For all nodes
that develop a value, the register allocator has recorded the target regis-
ter in p->syms[RX]:
(emitasm 392)=
    if (IR->x._isinstruction[rulenum] && p->x.emitted)
        outs(p->syms[RX]->x.name);
If the template begins with #, the emitter calls emit2, a machine-specific
procedure:
(emitasm 392)+=
    else if (*fmt == '#')
        (*IR->x.emit2)(p);
lcc needs this escape hatch to generate arbitrary code for tricky features
like structure arguments. Otherwise, emitasm emits the template with a
little interpretation:
(emitasm 392)+=
    else {
        (omit leading register copy? 393)
        for ((*IR->x._kids)(p, rulenum, kids); *fmt; fmt++)
            if (*fmt != '%')
                *bp++ = *fmt;
            else if (*++fmt == 'F')
                print("%d", framesize);
            else if (*fmt >= '0' && *fmt <= '9')
                emitasm(kids[*fmt - '0'], nts[*fmt - '0']);
            else if (*fmt >= 'a' && *fmt < 'a' + NELEMS(p->syms))
                outs(p->syms[*fmt - 'a']->x.name);
            else
                *bp++ = *fmt;
    }
bp is the pointer into the output buffer in the module output.c. %F
tells emitasm to emit framesize, which helps emit local offsets that
are relative to the size of the frame. Substrings of the form %digit
tell it to emit recursively the subtree corresponding to the digit-th
nonterminal from the pattern, counting from zero, left to right, and
ignoring nesting. Substrings like %x tell emitasm to emit the node's
p->syms['x' - 'a']->x.name; for example, %c emits p->syms[2]->x.name.
Table 14.1 summarizes these conventions.
So the emitter interprets the string "lw r%c,%1\n" by emitting "lw r",
then the name (usually a digit string) of the target register, then a comma.
Then it recursively emits p->kids[1] as an addr, if nts[1] holds the
integer that represents the nonterminal addr. Finally, emitasm emits a
newline character.

Template               Emitted
%%                     One percent sign
%F                     framesize
%digit                 The subtree corresponding to the rule's digit-th
                       nonterminal
%letter                p->syms[letter - 'a']->x.name
any other character    The character itself
# (in position 1)      Call emit2 to emit code
? (in position 1)      Skip the first instruction if the source and
                       destination registers are the same

TABLE 14.1 Emitter template syntax.
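
The template conventions can be exercised with a simplified interpreter. The sketch below rests on several assumptions: expand_template and its parameters are hypothetical, operands arrive as already-expanded strings instead of being emitted recursively from subtrees, output goes to a caller's buffer rather than through bp, and the # and ? prefixes are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand fmt into out (capacity cap): %F gives framesize, %digit gives
   operands[digit], %letter gives syms[letter - 'a'], and any other
   character after % is copied as is. */
static void expand_template(const char *fmt,
                            const char *const syms[], int nsyms,
                            const char *const operands[], int noperands,
                            int framesize, char *out, size_t cap) {
    size_t n = 0;

    assert(cap > 0);
    out[0] = '\0';
    for (; *fmt != '\0' && n + 1 < cap; fmt++) {
        if (*fmt != '%' || fmt[1] == '\0') {
            out[n++] = *fmt;          /* ordinary character */
            out[n] = '\0';
            continue;
        }
        fmt++;                        /* look at the character after % */
        if (*fmt == 'F')
            snprintf(out + n, cap - n, "%d", framesize);
        else if (*fmt >= '0' && *fmt <= '9' && *fmt - '0' < noperands)
            snprintf(out + n, cap - n, "%s", operands[*fmt - '0']);
        else if (*fmt >= 'a' && *fmt < 'a' + nsyms)
            snprintf(out + n, cap - n, "%s", syms[*fmt - 'a']);
        else
            snprintf(out + n, cap - n, "%c", *fmt);
        n += strlen(out + n);
    }
}
```

With three symbol names whose third entry is "24" and a single operand string "4($30)", the template "lw $%c,%0\n" expands to the load instruction from the p->b=q->b example.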
Some targets have general three-operand instructions, which take two
independent sources and yield an independent destination. Other targets
save instruction bits by substituting two-operand instructions, which
constrain the destination to be the first source. The first source might
not be dead, so lcc uses two-instruction templates for opcodes like ADDI.
The first instruction copies the first source to the destination, and the
second adds the second source to the destination. If the first source is
dead, the register allocator usually arranges for the destination to share
the same register, so the first instruction copies a register to itself and is
redundant. These redundant instructions are most easily omitted at the
last minute, in the emitter. Each specification flags such instructions with
a leading question mark, and emit skips them if the source and destination
registers are the same.
(omit leading register copy? 393)=
    if (*fmt == '?') {
        fmt++;
        if (p->syms[RX] == p->kids[0]->syms[RX])
            while (*fmt++ != '\n')
                ;
    }

The interface procedure emit traverses a list of instructions and emits
them one at a time:
(gen.c functions)+=
    void emit(p) Node p; {
        for (; p; p = p->x.next) {
            if (p->x.equatable && requate(p) || moveself(p))
                ;
            else
                (*emitter)(p, p->x.inst);
            p->x.emitted = 1;
        }
    }

Most interface routines have one implementation per target, but there's
only one implementation of emit because the target-specific parts have
been factored out into the assembler code templates.
The indirect call above permits lcc to call another emitter. For exam-
ple, this feature has been used to replace this book's emitter with one
that emits binary object code directly. emitter is initialized to emitasm:
(gen.c data)+=
    unsigned (*emitter) ARGS((Node, int)) = emitasm;
emit implements two last-minute optimizations. moveself declines to
emit instructions that copy a register on top of itself:
(gen.c functions)+=
    static int moveself(p) Node p; {
        return p->x.copy
        && p->syms[RX]->x.name == p->x.kids[0]->syms[RX]->x.name;
    }
The equality test exploits the fact that the string module stores only one
copy of each distinct string. x.copy is set by the cost function move,
which is called by rules that select register-to-register moves:
(gen.c functions)+=
    int move(p) Node p; {
        p->x.copy = 1;
        return 1;
    }

emit's other optimization eliminates some register-to-register copies
by changing the instructions that use the destination register to use
the source register instead. The register allocator sets x.equatable if
p copies a register src to a temporary register tmp for use as a common
subexpression. If x.equatable is set, then the emitter calls requate,
which scans forward from p:
(gen.c functions)+=
    static int requate(q) Node q; {
        Symbol src = q->x.kids[0]->syms[RX];
        Symbol tmp = q->syms[RX];
        Node p;
        int n = 0;

        for (p = q->x.next; p; p = p->x.next)
            (requate 395)
        for (p = q->x.next; p; p = p->x.next)
            if (p->syms[RX] == tmp && readsreg(p)) {
                p->syms[RX] = src;
                if (--n <= 0)
                    break;
            }
        return 1;
    }

The first for loop holds several statements that return zero; they cause
the emitter to go ahead and emit the instruction, unless moveself in-
tervenes. The emitter omits the register-to-register copy only if requate
exits the first loop, falls into the second, and returns one. The second
loop replaces all reads of tmp with reads from src; the first loop counts
these reads in n.
If an instruction copies tmp back to src, it is changed so that moveself
will delete it, and the loop continues to see if more changes are possible:
(requate 395)=
    if (p->x.copy && p->syms[RX] == src
    && p->x.kids[0]->syms[RX] == tmp)
        p->syms[RX] = tmp;
Without this test, return f() would copy the value of f from the re-
turn register to a temporary and then back to the return register for the
current function.
If the scan hits an instruction that targets src, if the instruction
doesn't assign src to itself, and if the instruction doesn't merely read
src, then requate fails because tmp and src do not, in general, hold the
same value henceforth:
(gen.c macros)=
    #define readsreg(p) \
        (generic((p)->op)==INDIR && (p)->kids[0]->op==VREG+P)
    #define setsrc(d) ((d) && (d)->x.regnode && \
        (d)->x.regnode->set == src->x.regnode->set && \
        (d)->x.regnode->mask&src->x.regnode->mask)

(requate 395)+=
    else if (setsrc(p->syms[RX]) && !moveself(p) && !readsreg(p))
        return 0;
For example, c=*p++ generates the pseudo-instructions below when p is
in register r1. Destinations are the rightmost operands.
    move r1,r2        save value of p
    add r2,1,r1       increment p
    loadb (r2),r3     fetch character
    storeb r3,c       store character
requate could change the add to use r1 instead of r2, but it can't change
any subsequent instructions likewise, because r1 and r2 aren't equivalent
after the add.
requate also quits if it encounters an instruction that spills tmp:
(requate 395)+=
    else if (generic(p->op) == ASGN && p->kids[0]->op == ADDRLP
    && p->kids[0]->syms[0]->temporary
    && p->kids[1]->syms[RX]->x.name == tmp->x.name)
        return 0;
No explicit flag identifies the nodes that genspill inserts, but the con-
dition above catches them.
requate also gives up if it hits a call, unless it ends the forest, because
src might be a caller-saved register, which calls clobber.
(requate 395)+=
    else if (generic(p->op) == CALL && p->x.next)
        return 0;
Usually, src is a callee-saved register variable, so requate might confirm
that the register is caller-saved before giving up, but this check netted
no gains in several thousand lines of source code, so it was abandoned
as a gratuitous complication.
requate also gives up at each label, unless it ends the forest, because
src might have a different value afterward:
(requate 395)+=
    else if (p->op == LABEL+V && p->x.next)
        return 0;
If none of the tests above succeed, tmp and src hold the same value,
so if this node reads tmp, it is counted and the loop continues to see if
the rest of the uses of tmp can be replaced with src:
(requate 395)+=
    else if (p->syms[RX] == tmp && readsreg(p))
        n++;
If a node writes tmp, or if requate runs out of instructions, then the
forest is done with tmp, and requate's first loop exits:
(requate 395)+=
    else if (p->syms[RX] == tmp)
        break;

Now requate's second loop replaces all reads of tmp with reads of src;
then requate returns one, and the emitter omits the initial assignment
to tmp.
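
The two-pass structure of requate can be imitated on a flat list of pseudo-instructions. The sketch below is a deliberate simplification and not lcc's algorithm in detail: Insn, its integer register numbers, and the kind tags are hypothetical, each instruction reads at most one register and writes at most one, and the spill test is omitted.

```c
#include <assert.h>
#include <stddef.h>

enum kind { COPY, OP, CALL, LABEL };

typedef struct insn Insn;
struct insn {
    enum kind kind;
    int src;        /* register read, or -1 for none */
    int dst;        /* register written, or -1 for none */
    Insn *next;
};

/* q copies src to tmp.  Return 1 and rewrite later reads of tmp to read
   src if that is safe, so that q itself can be dropped; return 0 to
   leave everything for normal emission. */
static int requate_sketch(Insn *q) {
    int src = q->src, tmp = q->dst, n = 0;
    Insn *p;

    assert(q->kind == COPY && src != tmp);
    for (p = q->next; p != NULL; p = p->next) {
        if (p->kind == COPY && p->dst == src && p->src == tmp)
            p->dst = tmp;          /* copy back becomes a self-copy */
        else if (p->dst == src)
            return 0;              /* src changes; tmp must survive */
        else if ((p->kind == CALL || p->kind == LABEL) && p->next != NULL)
            return 0;              /* src might change here */
        else {
            if (p->src == tmp)
                n++;               /* one more read to redirect */
            if (p->dst == tmp)
                break;             /* tmp is redefined; stop scanning */
        }
    }
    for (p = q->next; p != NULL && n > 0; p = p->next)
        if (p->src == tmp && !(p->kind == COPY && p->dst == tmp)) {
            p->src = src;
            n--;
        }
    return 1;
}
```

On the four pseudo-instructions for c=*p++ the sketch returns zero at the add, which writes the source register; without the add it returns one and makes the load read the source register directly.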
At this point, the most common source of gratuitous register-to-
register copies is postincrement in a context that uses the original value,
such as c=*p++. lcc's code for these patterns starts with a copy, when
some contexts could avoid it by reordering instructions. For example,
a more ambitious optimizer could reduce the four pseudo-instructions
above to
    loadb (r1),r3     fetch character
    add r1,1,r1       increment p
    storeb r3,c       store character
Register-to-register moves now account for roughly 5 percent of the
MIPS and SPARC instructions in the standard lcc testbed. In the MIPS
code, about half copy a register variable or zero (which is a register-to-
register copy using a source register hard-wired to zero) to a register
variable or an argument or return register. Such moves are not easily
deleted. Some but not all of the rest might be removed, but we're nearing
the limit of what simple register-copy optimizations can do.

14.7 Register Targeting

Some nodes can be evaluated in any one of a large set of registers, but
others are fussier. For example, most computers can compute integer
sums into any of the general registers, but most calling conventions leave
return values in only one register.
If a node needs a child in a fixed register, register targeting tries to
compute the child into that register. If the child can't compute its value
there, then the code generator must splice a register-to-register copy into
the tree between the parent and child. For example, in
    f(a, b) { return a + b; }
the return is fussy but the sum isn't, so the code can compute the sum
directly into the return register. In contrast, in
    f() { register int i = g(); }
g generally returns a value in one register, and the register variable i
will be assigned to another register, so a register-to-register copy can't
be avoided.
The next chapter covers the actual allocation of registers to vari-
ables and temporaries, but the register-to-register copies are instruc-
tions. They can be handled just like all other instructions only if they
are represented by nodes. To that end, prelabel makes a pass over the
tree before labelling:
(gen.c functions)+=
    static void prelabel(p) Node p; {
        (prelabel 398)
    }

It marks each fussy node with the register on which it insists, and it
marks the remaining nodes (at least those that yield a result instead
of a side effect) with the wildcard symbol that represents the set
of valid registers. It also inserts LOAD nodes where register-to-register
copies might be needed.
prelabel starts by traversing the subtrees left to right:
(prelabel 398)=
    if (p == NULL)
        return;
    prelabel(p->kids[0]);
    prelabel(p->kids[1]);
Then it identifies the register class for nodes that leave a result in a
register:
(prelabel 398)+=
    if (NeedsReg[opindex(p->op)])
        setreg(p, rmap[optype(p->op)]);
The NeedsReg test distinguishes nodes executed for side effect from
those that need a register to hold their result. NeedsReg is indexed by a
generic opcode and flags the opcodes that yield a value:
(gen.c data)+=
    static char NeedsReg[] = {
        0,                      /* unused */
        1,                      /* CNST */
        0, 0,                   /* ARG ASGN */
        1,                      /* INDIR */
        1, 1, 1, 1,             /* CVC CVD CVF CVI */
        1, 1, 1, 1,             /* CVP CVS CVU NEG */
        1,                      /* CALL */
        1,                      /* LOAD */
        0,                      /* RET */
        1, 1, 1,                /* ADDRG ADDRF ADDRL */
        1, 1, 1, 1, 1,          /* ADD SUB LSH MOD RSH */
        1, 1, 1, 1,             /* BAND BCOM BOR BXOR */
        1, 1,                   /* DIV MUL */
        0, 0, 0, 0, 0, 0,       /* EQ GE GT LE LT NE */
        0, 0,                   /* JUMP LABEL */
    };
    Symbol rmap[16];
rmap is indexed by a type suffix, and holds the wildcard that repre-
sents the set of registers that hold values of each such type. For ex-
ample, rmap[I] typically holds a wildcard that represents the general
registers, and rmap[D] holds the wildcard that represents the double-
precision floating-point registers. Each register set is target-specific, so
the target's progbeg initializes rmap. setreg records the value from rmap
in the node to support targeting and register allocation:
(gen.c functions)+=
    void setreg(p, r) Node p; Symbol r; {
        p->syms[RX] = r;
    }

It would be too trivial to merit a function if it hadn't been a useful spot
for assertions and breakpoints in the past.
prelabel's call on setreg assigns the same wildcard to all opcodes
with the same type suffix; prelabel corrects fussy nodes below.
Register variables can influence targeting, so prelabel next identifies
nodes that read and write register variables. Front-end symbols distin-
guish between register and nonregister variables (the symbol's sclass
field is REGISTER), but front-end nodes don't. The back end must gen-
erate different code to access these two storage classes, so prelabel
changes some opcodes that access register variables. It replaces ADDRL
and ADDRF with VREG if the symbol referenced is a register variable, and
it replaces the wildcard in the INDIR above a VREG with the single register
assigned to the variable:
(prelabel 398)+=
    switch (generic(p->op)) {
    case ADDRF: case ADDRL:
        if (p->syms[0]->sclass == REGISTER)
            p->op = VREG+P;
        break;
    case INDIR:
        if (p->kids[0]->op == VREG+P)
            setreg(p, p->kids[0]->syms[0]);
        break;
    case ASGN:
        (prelabel case for ASGN 399)
        break;
    }

prelabel targets the right child of each assignment to a register variable
to develop its value directly into the register variable whenever possible:
(prelabel case for ASGN 399)=
    if (p->kids[0]->op == VREG+P) {
        rtarget(p, 1, p->kids[0]->syms[0]);
    }

Finally, prelabel calls a target-specific procedure that adjusts the regis-
ter class for fussy opcodes:
(prelabel 398)+=
    (IR->x.target)(p);
rtarget(p, n, r) guarantees that p->kids[n] computes its result di-
rectly into register r:
(gen.c functions)+=
    void rtarget(p, n, r) Node p; int n; Symbol r; {
        Node q = p->kids[n];

        if (!q->syms[RX]->x.wildcard) {
            q = newnode(LOAD + optype(q->op),
                q, NULL, q->syms[0]);
            if (r->u.t.cse == p->kids[n])
                r->u.t.cse = q;
            p->kids[n] = p->x.kids[n] = q;
            q->x.kids[0] = q->kids[0];
        }
        setreg(q, r);
    }
If the child has already been targeted - to another register variable
or to something special like the return register - then rtarget splices a
LOAD into the tree between parent and child, and targets the LOAD instead
of the child. The code generator emits a register-to-register copy for
LOADs. If the child has not been targeted already, then q->syms[RX] holds
a wildcard; the final setreg is copacetic because r must be a member of
the wildcard's set. If it weren't, then we'd be asking lcc to emit code to
copy a register in one register set to a member of another register set,
which doesn't happen without an explicit conversion node.
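
The splice that rtarget performs can be tried out in isolation. The following is a minimal sketch, not lcc's code: the TNode type, mknode, and toy_rtarget are hypothetical stand-ins, each node has a single child, and a NULL reg field plays the role of the wildcard.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node; lcc's struct node is much richer. */
typedef struct tnode {
    const char *op;      /* operator name, for display only */
    struct tnode *kid;   /* one child is enough for this sketch */
    const char *reg;     /* fixed target register, or NULL for a wildcard */
} TNode;

static TNode *mknode(const char *op, TNode *kid, const char *reg) {
    TNode *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->op = op;
    p->kid = kid;
    p->reg = reg;
    return p;
}

/* Make p's child deliver its value in register r. */
static void toy_rtarget(TNode *p, const char *r) {
    TNode *q = p->kid;

    if (q->reg != NULL) {             /* child already has a fixed target */
        q = mknode("LOAD", q, NULL);  /* splice a copy between parent and child */
        p->kid = q;
    }
    q->reg = r;                       /* target the child, or the new LOAD */
}
```

Targeting the child of a RETI at r0 leaves an untargeted ADDI in place but wraps an INDIRI that already yields r2 in a LOAD, which mirrors the first two trees of Figure 14.6.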
Figure 14.6 shows three sample trees before and after rtarget. They
assume that r0 is the return register and r2 is a register variable. The
first tree has an unconstrained child, so rtarget inserts no LOADI. The
second tree has an INDIRI that yields r2 below a RETI that expects r0, so
rtarget inserts a LOADI. The third tree has a CALLI that yields r0 below
an ASGNI that expects r2, so again rtarget inserts a LOADI.

FIGURE 14.6 rtarget samples. [Three trees, before and after rtarget:
RETI over ADDI, whose syms[RX] changes from ? to r0; RETI over
INDIRI(VREGP r2), which gains a LOADI with syms[RX]=r0; and ASGNI with
children VREGP r2 and CALLI (syms[RX]=r0), which gains a LOADI with
syms[RX]=r2.]

prelabel and rtarget use register targeting to fetch and assign
register variables, so lcc's templates for these operations emit no code for
either operation on any machine. All machines share the rules:

(shared rules 400) =
    reg: INDIRC(VREGP) "# read register\n"
    reg: INDIRD(VREGP) "# read register\n"
    reg: INDIRF(VREGP) "# read register\n"
    reg: INDIRI(VREGP) "# read register\n"
    reg: INDIRP(VREGP) "# read register\n"
    reg: INDIRS(VREGP) "# read register\n"
    stmt: ASGNC(VREGP,reg) "# write register\n"
    stmt: ASGND(VREGP,reg) "# write register\n"
    stmt: ASGNF(VREGP,reg) "# write register\n"
    stmt: ASGNI(VREGP,reg) "# write register\n"
    stmt: ASGNP(VREGP,reg) "# write register\n"
    stmt: ASGNS(VREGP,reg) "# write register\n"

The comment template emits no code, but it appears in debugging
output, so the descriptive comments can help.


14.8 Coordinating Instruction Selection


Section 13.1 explained that rewrite and gen coordinate some of the
processes described in this chapter. Now for the details. rewrite performs
register targeting and instruction selection for a single tree:

(gen.c functions) +=
    static void rewrite(p) Node p; {
        prelabel(p);
        (*IR->x._label)(p);
        reduce(p, 1);
    }

The interface function gen receives a forest from the front end and makes
several passes over the trees.

(gen.c data) +=
    Node head;

(gen.c functions) +=
    Node gen(forest) Node forest; {
        int i;
        struct node sentinel;
        Node dummy, p;

        head = forest;
        for (p = forest; p; p = p->link) {
            (select instructions for p 402)
        }
        for (p = forest; p; p = p->link)
            prune(p, &dummy);
        (linearize forest 414)
        (allocate registers 415)
        return forest;
    }

The first pass calls rewrite to select instructions, and the second prunes
the subinstructions out of the tree. The first pass performs any target-
specific processing for arguments and procedure calls; for example, it
arranges to pass arguments in registers when that's what the calling con-
vention specifies:

(select instructions for p 402) =
    if (generic(p->op) == CALL)
        docall(p);
    else if (   generic(p->op) == ASGN
             && generic(p->kids[1]->op) == CALL)
        docall(p->kids[1]);
    else if (generic(p->op) == ARG)
        (*IR->x.doarg)(p);
    rewrite(p);
    p->x.listed = 1;

Only doarg is target-specific. Within any one tree, the code generator
is free to evaluate the nodes in whatever order seems best, so long as
it evaluates children before parents. Calls can have side effects, so the
front end puts all calls on the forest to fix the order in which the side
effects happen. If the call returns no value, or if the returned value is
ignored, then the call itself appears on the forest; the first if statement
recognizes this pattern. Otherwise, the call appears below an assignment
to a temporary, which is later used where the returned value is needed;
the second if statement recognizes this pattern.
The first pass also marks listed nodes. Chapter 15 elaborates on this
and on the rest of gen's passes.

14.9 Shared Rules


A few rules are common to all targets in this book. They are factored out
in a target-independent fragment to save space and to keep them
consistent as lcc changes. Some common rules match the integer constants:

(shared rules 400) +=
    con: CNSTC "%a"
    con: CNSTI "%a"
    con: CNSTP "%a"
    con: CNSTS "%a"
    con: CNSTU "%a"

A convention shared by all lburg specifications in this book has the
nonterminal reg match all computations that yield a result in a register
and the nonterminal stmt match all roots, which are executed for some
side effect, typically on memory or the program counter. The rule

(shared rules 400) +=
    stmt: reg ""

is necessary when a node that yields a register appears as a root. A CALLI
is such a node when the caller ignores its value.
The following rules note that no current lcc target requires any
computation to convert an integral or pointer type to another such type of
the same size:

(shared rules 400) +=
    reg: CVIU(reg) "%0" notarget(a)
    reg: CVPU(reg) "%0" notarget(a)
    reg: CVUI(reg) "%0" notarget(a)
    reg: CVUP(reg) "%0" notarget(a)

The cost function notarget makes a zero-cost match in most cases, but
if register targeting has constrained the node to yield a fixed register -
that is, the destination register is no longer a wildcard that represents
a set of registers - then a register-to-register copy may be required, so
notarget returns a cost that aborts this rule:

(gen.c functions) +=
    int notarget(p) Node p; {
        return p->syms[RX]->x.wildcard ? 0 : LBURG_MAX;
    }

Each specification includes parallel rules that generate a unit-cost register
copy, which is used when the node has a fixed target register.

14.10 Writing Specifications


Chapters 16-18 show some complete lburg inputs. Perhaps the easiest
way to write an lburg specification is to start by adapting one explained
in this book, but a few general principles can help. Page 436 illustrates
several of these guidelines.
Write roughly one rule for each instruction and one for each addressing
mode that you want to use. The templates give the assembler syntax,
and the patterns describe the effect of the instruction using a tree of
intermediate-language operators.
Replicate rules for equivalent operators. For example, a rule for ADDI
usually requires a similar rule for ADDU and ADDP.
Write extra rules for each tree operator that can be implemented as
a special case of some more general operation. For example, ADDRLP
can often be implemented by adding a constant to the frame pointer,
so whenever you write a rule that matches the sum of a constant and a
register, write a variant of the rule that matches ADDRLP.
Write an extra rule for each degenerate case of a more general opera-
tion. For example, if some addressing mode matches the sum of a con-
stant and a register, then it can also perform simple indirect addressing,
when the constant is zero.
Write an extra rule that emits multiple instructions for each operator
in the intermediate language that no single instruction implements. For
example, many machines have no instruction that implements CVCI di-
rectly, so their specifications implement a rule whose template has two
shift instructions. These instructions propagate the sign bit by shifting
the byte left logically or arithmetically, then right arithmetically.

Use one nonterminal to derive all trees that yield a value. Use this non-
terminal wherever the instruction corresponding to a rule pattern reads a
register. This book uses the nonterminal reg this way. A variant that can
catch a few more errors uses one nonterminal for general-purpose reg-
isters and another for floating-point registers (e.g. freg). For example,
rules that use only one register nonterminal can silently accept corrupt
trees like NEGF(INDIRI(. .. )). This particular error is rare.
Similarly, use one nonterminal to derive all trees executed only for
side effect. Examples include ASGN and ARG. This book uses stmt for
side-effect trees. It is possible to write lburg specifications that combine
reg and stmt into one large class, but the register allocator assumes that
the trees with side effects are roots, and trees with values are interior
nodes or leaves; it can silently emit incorrect code - the worst nightmare
for compiler writers - if its assumptions are violated. Separating reg
from stmt makes the code generator object if these assumptions are
ever violated.
Ensure that there's at least one way to generate code for each opera-
tion in the intermediate language. One easy way to do so is to write one
register-to-register rule for each operator:
    reg: LEAF
    reg: UNARY(reg)
    reg: OPERATOR(reg,reg)

Such rules ensure that lcc can match each node at least one way and
emit assembler code with one instruction per node.
Scan your target's architecture manual for instructions or addressing
modes that perform multiple intermediate-code operations, and write
rules with patterns that match what the instructions compute. Rules 3
and 7 in Figure 14.2 are examples. If you have a full set of register-to-
register rules, these bigger rules won't be necessary, but they typically
emit code that is shorter and faster. Skip instructions and addressing
modes so exotic that you can't imagine a C program - or a C compiler
- that could use them.
Use nonterminals to factor the specification. If you find you're repeat-
ing a subpattern often, give it a rule and a nonterminal name of its own.
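
The two-shift template mentioned above for CVCI can be illustrated in C. This is a sketch of the idea rather than any target's actual template; sign_extend_byte is a hypothetical name, and the code assumes that right-shifting a negative int32_t is an arithmetic shift and that converting an out-of-range uint32_t to int32_t wraps. Both are implementation-defined in C but hold for mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of x to 32 bits with two shifts. */
static int32_t sign_extend_byte(uint32_t x) {
    uint32_t left = x << 24;     /* move bit 7 into the sign position */

    /* Assumption: the conversion wraps and >> replicates the sign bit. */
    return (int32_t)left >> 24;  /* shift back, propagating the sign */
}
```

Under those assumptions, sign_extend_byte(0xFF) yields -1 and sign_extend_byte(0x7F) yields 127. The left shift discards whatever sat above the byte, which is why the template needs no separate masking instruction.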

Further Reading
lcc's instruction selector is based on an algorithm originally described
by Aho and Johnson (1976). The interface was adapted from burg (Fraser,
Henry, and Proebsting 1992) and the implementation from the compatible
program iburg (Fraser, Hanson, and Proebsting 1992). iburg performs
dynamic programming at compile time. burg uses BURS theory
(Pelegri-Llopart and Graham 1988; Proebsting 1992) to do its dynamic
programming when processing the specification, so it is faster but
somewhat less flexible.
The retargetable compiler gcc (Stallman 1992) displays another way to
select instructions. It uses a naive code generator and a thorough
retargetable peephole optimizer that is driven by a description of the target
machine. Davidson and Fraser (1984) describe the underlying method.

Exercises
14.1 What would break if we changed the type of costs from short to
int?
14.2 _kids is not strictly necessary. Describe how you'd implement a
reducer without it.
14.3 lburg represents each nonterminal with an integer in a compact
range starting at one, which represents the start nonterminal. The
zero-terminated vector _ntname is indexed by these numbers and
holds the name of the corresponding nonterminal:

(BURM signature 378) +=
    static char *_ntname[];

Use it to help write a procedure void dumpmatches(Node p) to
display a node p and all rules that have matched it. Typical output
might be

dumpmatches(0x1001e790)=ADDRLP(i):
addr: ADDRLP I %a($sp)
re: reg I $%1
reg: addr I la $%c,%1

dumpmatches is not a reducer.


14.4 Tree parsers misbehave on dags. Does the labelling pass misbe-
have? Why or why not? Does the reducer misbehave? Why or why
not?
14.5 Use the -d or -Wf-d option to compile

f(i) { return (i-22)>>22; }

for any machine. The lines from dumpcover identify themselves.
Which correspond to reuse's bonus matches? Which correspond
to subsequent matches enabled by bonus matches?
14.6 How many instructions does the moveself optimization save when
compiling lcc on your machine?
14.7 One measure of an optimization is whether it pays off compiling the
compiler. For example, the requate optimization takes time. Can
you detect how much time it takes on your machine? When the
optimization is used to compile lcc with itself, it can save time by
generating a faster compiler. Can you measure this improvement
on your machine? Did requate pay off?

15
Register Allocation

Register allocation can be viewed as having two parts: allocation decides
which values will occupy registers, and assignment assigns a particular
register to each value. Instruction selection commits certain
subexpressions to registers and thus implicitly allocates temporaries, or
intermediate values, but the allocation for register variables and the
assignment of all registers has been left to this chapter.
Like all register allocators, lcc's has several tasks. It must keep track
of which registers are free and which are busy. It must allocate a register
at the beginning of the lifetime of a variable or an intermediate value, and
it must free the register for other use at the end of that lifetime. Finally,
when the register allocator runs out of registers, it must generate code
to spill a register to memory and to reload it when the spilled value is
needed again later.
lcc provides register variables, and it assigns some variables to
registers even without explicit declarations, but this is the extent of its
global register allocation. It does no interprocedural register allocation,
and allocates temporaries only locally, within a forest.
More sophisticated register allocators are available, but this one yields
code that is satisfactory and competitive with other compilers in wide
use. lcc's spiller is particularly modest. Typical compilations spill so
seldom that it seemed more effective to invest tuning effort elsewhere.
A more ambitious register allocator would keep more values in registers
over longer intervals, which would increase the demand for registers and
thus would increase the number of spills. lcc's register allocator is
simple, so its companion spiller can be simple too.
A top priority in the design of lcc's register allocator was that it have
enough flexibility to match existing conventions for register usage,
because we wanted lcc's code to work with common existing ANSI C
libraries. That is, we didn't want to write, maintain, or compile with an
lcc-specific library.
The second priority was overall simplicity, particularly minimizing
target-specific code. These goals can conflict. For example, lcc's spiller
is target independent, and thus must construct indirectly the
instructions to spill and reload values. That is, it creates intermediate-code
trees and passes them through the code generator; this is complicated
by the fact that we're already in the middle of the code generator and out
of registers to boot. A target-specific spiller would be simpler because it
could simply emit target instructions to spill and reload the register, but
we'd have to write and debug a new spiller for each target. Even with a
simple spiller like lcc's, spills are rare, which means that good test cases
for spillers are complex and hard to find, and spillers are thus hard to
debug. One target-specific spiller would be simpler than lcc's, but the
savings would've been lost over the long run.

15.1 Organization
Table 15.1 illustrates the overall organization of the register allocator
by showing highlights from the call graph. Indentation shows who calls
whom. This material is at a high level and is meant to orient us before
we descend to the low levels.
After the back end has selected instructions and projected the
subinstructions out of the tree - in the tree linked through the x.kids array
- linearize traverses the projected tree in postorder and links the
instructions in the order in which they will ultimately execute. gen walks
down this list and passes each instruction to ralloc, which normally
calls just putreg to free the registers no longer used by its children, and
getreg to allocate a register for itself. For temporaries, ralloc allocates
a register at the first assignment, and frees the register at the last use.
If getreg finds no free register that suits the instruction, it calls
spillee to identify the most distantly used register. Then getreg calls
spill to generate code to spill this register to memory and reload the
value again later. genspill generates the spill, and genreload replaces
all not-yet-processed uses of the register with nodes that load the value
from memory. genreload calls reprune to reestablish the relationship
between kids and x.kids that prune established before spilling changed
the forest.

Name of Routine          Purpose
linearize                orders for output one instruction tree
ralloc                   frees and allocates registers for one instruction
  putreg                 frees a busy register
  getreg                 finds and allocates a register
    askreg               finds and allocates a free register
      askfixedreg        tries to allocate a given register
    spillee              identifies a register to spill
    spill                spills one or more registers
      spillr             spills one register
        genspill         generates code to spill a register
        genreload        generates code to reload a spilled value
          reprune        updates kids after genreload updates x.kids

TABLE 15.1 Back-end call tree (simplified).


ralloc is not the only entry point for these routines. clobber calls
spill directly to spill and reload such registers as those saved across
calls by the caller. Also, each target's interface procedure local can
reach askreg via askregvar, which tries to allocate a register for a
register variable.

15.2 Tracking the Register State


Masks record which registers are free and which have been used during
the routine being compiled. freemask tracks which registers are free; it
tells the register allocator which registers it can allocate. usedmask tracks
all registers that have been used by the current routine; it tells function
which registers must be saved in the procedure prologue and restored in
the epilogue. Both masks are vectors with one element for each register
set.

(gen.c data) +=
    unsigned freemask[2];
    unsigned usedmask[2];

Each target's function interface procedure initializes the masks to record
that no registers have been used and all are free:

(clear register state 410) =
    usedmask[0] = usedmask[1] = 0;
    freemask[0] = freemask[1] = ~(unsigned)0;

Each progbeg sets the parallel masks tmask and vmask. tmask identifies
the registers to use for temporary values. vmask identifies the registers
that may be allocated to register variables.

(gen.c data) +=
    unsigned tmask[2];
    unsigned vmask[2];

Unallocable registers - the stack pointer, for instance - belong in
neither tmask nor vmask.
The values of freemask and usedmask are maintained by the low-level
routines putreg, getreg, askreg, and askregvar, which allocate and free
individual registers. putreg frees the register represented by symbol r.
Only freemask distinguishes busy registers from free ones, so putreg
need change nothing else.

(gen.c functions) +=
    static void putreg(r) Symbol r; {
        freemask[r->x.regnode->set] |= r->x.regnode->mask;
    }

askfixedreg(r) allocates a fixed register r if possible. If the register is
busy, askfixedreg returns null. Otherwise, it adjusts the record of the
register state and returns r.

(gen.c functions) +=
    static Symbol askfixedreg(s) Symbol s; {
        Regnode r = s->x.regnode;
        int n = r->set;

        if (r->mask&~freemask[n])
            return NULL;
        else {
            freemask[n] &= ~r->mask;
            usedmask[n] |= r->mask;
            return s;
        }
    }
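
The interplay of the two masks is easy to exercise outside the compiler. The sketch below is a stand-alone illustration under simplifying assumptions, not lcc's code: registers are named by a hypothetical (set, number) pair instead of a Symbol, and the names freemask_, usedmask_, clear_state, ask_fixed, and put are invented for the example.

```c
#include <assert.h>

enum { IREG = 0, FREG = 1 };     /* hypothetical register-set indices */

static unsigned freemask_[2];    /* bit set: register is free */
static unsigned usedmask_[2];    /* bit set: register was used at some point */

/* Start of a routine: nothing used, everything free. */
static void clear_state(void) {
    usedmask_[0] = usedmask_[1] = 0;
    freemask_[0] = freemask_[1] = ~0u;
}

/* Allocate register n of the given set; return 1 on success, 0 if busy. */
static int ask_fixed(int set, int n) {
    unsigned bit = 1u << n;

    if (bit & ~freemask_[set])   /* the bit is clear in freemask: busy */
        return 0;
    freemask_[set] &= ~bit;      /* mark the register busy */
    usedmask_[set] |= bit;       /* remember it for the prologue and epilogue */
    return 1;
}

/* Free register n of the given set; usedmask_ is deliberately untouched. */
static void put(int set, int n) {
    freemask_[set] |= 1u << n;
}
```

Asking twice for the same register fails the second time, and freeing it makes it available again while its bit stays in usedmask_, which is the property that lets the procedure prologue save every register the routine ever touched.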

askreg accepts a symbol that represents one fixed register or a
wildcard symbol that represents a set of registers. askreg's second argument
is a mask that can limit the wildcard. If the register is fixed, askreg
simply calls askfixedreg. Otherwise, it looks for a free register acceptable
to the mask and the set of registers represented by the wildcard:

(gen.c functions) +=
    static Symbol askreg(rs, rmask)
    Symbol rs; unsigned rmask[]; {
        int i;

        if (rs->x.wildcard == NULL)
            return askfixedreg(rs);
        for (i = 31; i >= 0; i--) {
            Symbol r = rs->x.wildcard[i];
            if (r != NULL
            && !(r->x.regnode->mask&~rmask[r->x.regnode->set])
            && askfixedreg(r))
                return r;
        }
        return NULL;
    }


The use of register masks places an upper bound on the number of
registers in a register set; the upper bound is the number of bits in an
unsigned integer mask on the machine that hosts the compiler. This
number has been 32 for every target to date, so fixing askreg's loop
to 32 iterations seemed tolerable at first. But lcc's latest code generator
- for the X86 - would compile faster if we could define smaller register
sets, and machines exist that have bigger ones. There are now machines
with thirty-two 64-bit unsigned integers, which undermines the motive
behind the shortcut. If we were doing it over, we'd represent register
sets with a structure that could accommodate sets of variable sizes.
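
One possible shape for such a variable-size register set is sketched below. It is an illustration only, not something lcc contains: the RegSet type and the regset_new and regset_ask functions are hypothetical, and the set simply stores one "free" bit per register in as many unsigned words as the register count requires.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
#define WORD(i)   ((i) / WORD_BITS)
#define BIT(i)    (1u << ((i) % WORD_BITS))

/* Hypothetical variable-size register set: one bit per register, 1 = free. */
typedef struct {
    size_t nregs;      /* number of registers in this set */
    unsigned *bits;    /* ceil(nregs / WORD_BITS) words */
} RegSet;

/* Create a set of nregs registers, all free. */
static RegSet regset_new(size_t nregs) {
    RegSet s;
    size_t i, nwords = (nregs + WORD_BITS - 1) / WORD_BITS;

    s.nregs = nregs;
    s.bits = calloc(nwords ? nwords : 1, sizeof *s.bits);
    if (s.bits == NULL)
        abort();
    for (i = 0; i < nregs; i++)
        s.bits[WORD(i)] |= BIT(i);
    return s;
}

/* Allocate the highest-numbered free register, or return -1 if none. */
static long regset_ask(RegSet *s) {
    size_t i;

    for (i = s->nregs; i-- > 0; )
        if (s->bits[WORD(i)] & BIT(i)) {
            s->bits[WORD(i)] &= ~BIT(i);
            return (long)i;
        }
    return -1;
}
```

The loop bound now comes from the set itself, so a two-register set is searched in two steps and a 40-register set is not truncated at 32. The search runs from the highest number down only to echo the direction of askreg's loop.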
getreg demands a register. If askreg can't find one, then spillee
selects one to spill, and spill edits the forest to include instructions that
store it to memory and reload it when it's needed. The second askreg is
thus guaranteed to find a register.

(gen.c functions) +=
    static Symbol getreg(s, mask, p)
    Symbol s; unsigned mask[]; Node p; {
        Symbol r = askreg(s, mask);
        if (r == NULL) {
            r = spillee(s, p);
            spill(r->x.regnode->mask, r->x.regnode->set, p);
            r = askreg(s, mask);
        }
        r->x.regnode->vbl = NULL;
        return r;
    }

If a register is allocated to a variable, x.regnode->vbl points to the
symbol that represents the variable; getreg's default assumes that the
register is not allocated to a variable, so it clears the vbl field.
askregvar tries to allocate a register to a local variable or formal
parameter. It returns one if it succeeds and zero otherwise:

(gen.c functions) +=
    int askregvar(p, regs) Symbol p, regs; {
        Symbol r;

        (askregvar 412)
    }

askregvar declines to allocate a register if the variable is an aggregate,
or if it doesn't have the register storage class:

(askregvar 412) =
    if (p->sclass != REGISTER)
        return 0;
    else if (!isscalar(p->type)) {
        p->sclass = AUTO;
        return 0;
    }

If u.t.cse is set, then the variable is a temporary allocated to hold a
common subexpression, and askregvar postpones allocation until the
register allocator processes the expression:

(askregvar 412) +=
    else if (p->temporary && p->u.t.cse) {
        p->x.name = "?";
        return 1;
    }

Waiting helps lcc use one register for more than one temporary. To
help distinguish such variables when debugging the compiler, askregvar
temporarily sets the x.name field of such temporaries to a question mark.
If none of the conditions above is met, askregvar asks askreg for a
register. If one is found, the symbol is updated to point at the register:

(askregvar 412) +=
    else if ((r = askreg(regs, vmask)) != NULL) {
        p->x.regnode = r->x.regnode;
        p->x.regnode->vbl = p;
        p->x.name = r->x.name;
        return 1;
    }

Otherwise, the variable is forced onto the stack:

(askregvar 412) +=
    else {
        p->sclass = AUTO;
        return 0;
    }

15.3 Allocating Registers

Register allocation starts by picking the order in which to execute the
instructions. linearize(p, next) linearizes the instruction tree rooted
at p. The list is doubly linked through the x.next and x.prev fields. The
parameter next points to a sentinel at the end of the list formed so far.
linearize adds the dotted lines that turn Figure 14.5 into Figure 15.1.

(gen.c macros) +=
    #define relink(a, b) ((b)->x.prev = (a), (a)->x.next = (b))

(gen.c functions) +=
    static void linearize(p, next) Node next, p; {
        int i;

        for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++)
            linearize(p->x.kids[i], next);
        relink(next->x.prev, p);
        relink(p, next);
    }

FIGURE 15.1 Ordering uses. [A tree with the nodes ASGNI, ADDRLP, ADDI,
CVCI, CNSTI 4, INDIRC, and ADDRLP c, drawn with three kinds of links:
kids, x.kids, and x.next & x.prev, the last starting at the node marked
Start.]

linearize traverses the tree in postorder, so it starts by processing
the subtrees recursively. Then it appends p to the growing list, which
amounts to inserting it between next and its predecessor. The first
relink points next's predecessor forward to p and p back at next's
predecessor. The second relink does the same operation for p and next
themselves.
gen calls relink to initialize the list to a circular list holding only the
sentinel:

(linearize forest 414) =
    relink(&sentinel, &sentinel);

Then it runs down the forest, linearizing each listed tree, and linking the
trees into the growing list before the sentinel:

(linearize forest 414) +=
    for (p = forest; p; p = p->link)
        linearize(p, &sentinel);

At the end of the loop, gen sets forest to the head of the list, which is
the node after the sentinel in the circular list:

(linearize forest 414) +=
    forest = sentinel.x.next;

Finally, it clears the first x.prev and the last x.next to break the circle:

(linearize forest 414) +=
    sentinel.x.next->x.prev = NULL;
    sentinel.x.prev->x.next = NULL;
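
The sentinel technique is compact enough to run on its own. The following sketch mirrors the steps above on a hypothetical LNode type with at most two children; the names LNode, relink_, toy_linearize, and toy_order are inventions for this example, and a single tree stands in for the whole forest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a node and its x.kids, x.next, x.prev fields. */
typedef struct lnode {
    struct lnode *kids[2];       /* children, NULL when absent */
    struct lnode *next, *prev;   /* execution-order links */
    int id;
} LNode;

static void relink_(LNode *a, LNode *b) {
    b->prev = a;
    a->next = b;
}

/* Postorder walk: children first, then p goes just before the sentinel. */
static void toy_linearize(LNode *p, LNode *sentinel) {
    int i;

    for (i = 0; i < 2 && p->kids[i]; i++)
        toy_linearize(p->kids[i], sentinel);
    relink_(sentinel->prev, p);
    relink_(p, sentinel);
}

/* Linearize one tree and return the head of a NULL-terminated list. */
static LNode *toy_order(LNode *root) {
    LNode sentinel;
    LNode *head;

    relink_(&sentinel, &sentinel);   /* circular list holding only the sentinel */
    toy_linearize(root, &sentinel);
    head = sentinel.next;            /* first node after the sentinel */
    sentinel.next->prev = NULL;      /* break the circle at both ends */
    sentinel.prev->next = NULL;
    return head;
}
```

For a root with two leaf children the resulting list is left child, right child, root, which is the order in which the instructions would execute. Because both ends are cleared before toy_order returns, no pointer to the local sentinel survives the call.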

The register allocator makes three passes over the forest. The first
builds a list of all the nodes that use each temporary. This list identifies
the last use and thus when the temporary should be freed, and
identifies the nodes that must be changed when a temporary must be spilled
to memory. If p->syms[RX] points to a temporary, then the value of
p->syms[RX]->x.lastuse points to the last node that uses that temporary;
that node's x.prevuse points to the previous user, and so on. The list
includes nodes that read and write the temporary:

(allocate registers 415) =
    for (p = forest; p; p = p->x.next)
        for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
            if (p->x.kids[i]->syms[RX]->temporary) {
                p->x.kids[i]->x.prevuse =
                    p->x.kids[i]->syms[RX]->x.lastuse;
                p->x.kids[i]->syms[RX]->x.lastuse = p->x.kids[i];
            }
        }

The fragment uses nested loops - first the instructions, then the
children of each instruction - to visit the uses in the order in which they'll be
executed. A single unnested loop over the forest is tempting but wrong:

    for (p = forest; p; p = p->x.next)
        if (p->syms[RX]->temporary) {
            p->x.prevuse = p->syms[RX]->x.lastuse;
            p->syms[RX]->x.lastuse = p;
        }

It would visit the same uses, but the order would be wrong for some
inputs. For example, a[i]=a[i]-1 uses the address of a[i] twice and
thus assigns it to a temporary. This incorrect code would visit the INDIR
that fetches the temporary for the left-hand side first, so the INDIR that
fetches the temporary for the right-hand side would appear to be the last
use. The temporary would be freed after the load and reused to hold
the difference, and the subsequent store would use a corrupt address.
Figure 15.2 shows the effect of the loop nesting on the order of the
x.prevuse chain for this example.
The second pass over the forest eliminates some instructions that copy
one register to another, by targeting the expression that computed the
source register to use the destination register instead. If the source is
a common subexpression, we use the destination to hold the common
subexpression if the code between the two instructions is straight-line
code and doesn't change the destination:

(allocate registers 415) +=
    for (p = forest; p; p = p->x.next)
        if (p->x.copy && p->x.kids[0]->syms[RX]->u.t.cse) {
            Symbol dst = p->syms[RX];
            Symbol temp = p->x.kids[0]->syms[RX];
            Node q;

            for (q = temp->u.t.cse; q; q = q->x.next)
                if (p != q && dst == q->syms[RX]
                || ((changes flow of control? 417)))
                    break;
            if (!q)
                for (q = temp->x.lastuse; q; q = q->x.prevuse)
                    q->syms[RX] = dst;
        }

FIGURE 15.2 Ordering uses. Singly nested and incorrect is shown
on the left; doubly nested and correct on the right. [Both panels show
the same tree, with nodes ASGNI, INDIRI, SUBI, INDIRP, VREGP 2, and
CNSTI 1, and mark x.lastuse for temporary 2. The legend distinguishes
kids, kids & x.kids, x.next & x.prev, and x.lastuse or x.prevuse links.]


The first inner loop scans the rest of the forest and exits early if the
destination is set anywhere later in the block or if some node changes
the flow of control. It could quit looking when the temporary dies, but
the extra logic cut only five instructions out of 25,000 in one test, so
we discarded it. If no other node sets the destination, then it's safe to
use that register for the common subexpression. The second inner loop
changes all instances of the common subexpression to use the destina-
tion instead. Once the common subexpression is computed into dst, the
original register-to-register copy copies dst to itself. The emitter and
moveself collaborate to cut such instructions.

Calls are deemed a break in straight-line code only if the destination
isn't a register variable, because calls don't change register variables:

(changes flow of control? 417) =
    q->op == LABELV || q->op == JUMPV || generic(q->op)==RET ||
    generic(q->op)==EQ || generic(q->op)==NE ||
    generic(q->op)==LE || generic(q->op)==LT ||
    generic(q->op)==GE || generic(q->op)==GT ||
    (generic(q->op) == CALL && dst->sclass != REGISTER)

The last pass over the forest finally allocates a register for each node.
rmap is a vector indexed by a type suffix; each element is the register
wildcard that represents the set of registers that suit untargeted nodes
of the corresponding type.

(allocate registers 415) +=
    for (p = forest; p; p = p->x.next) {
        ralloc(p);
        if (p->x.listed && NeedsReg[opindex(p->op)]
        && rmap[optype(p->op)]) {
            putreg(p->syms[RX]);
        }
    }

Registers are freed when the parent reaches ralloc, but a few nodes, like
CALLI, can allocate a register and have no parent, if the value goes
unused. The if statement above frees the register allocated to such nodes.
Existing targets use this code only for CALLs and LOADs.
ralloc(p) frees the registers no longer needed by p's children, then
allocates a register for p, if p needs one and wasn't processed earlier.
Finally, it calls the target's clobber to spill any registers that this node
clobbers:

(gen.c functions) +=
    static void ralloc(p) Node p; {
        int i;
        unsigned mask[2];

        mask[0] = tmask[0];
        mask[1] = tmask[1];
        (free input registers 418)
        if (!p->x.registered && NeedsReg[opindex(p->op)]
        && rmap[optype(p->op)]) {
            (assign output register 418)
        }
        p->x.registered = 1;
        (*IR->x.clobber)(p);
    }
If a child yields a register variable, or if the register holds a common
subexpression for which other uses remain, then its register must not be
freed. The if statement below catches exactly these exceptions:
(free input registers 418)= 417
for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
    Node kid = p->x.kids[i];
    Symbol r = kid->syms[RX];
    if (r->sclass != REGISTER && r->x.lastuse == kid)
        putreg(r);
}

r->x.lastuse points to r's last use. For most expression temporaries,
there is only one use, but temporaries allocated to common subexpressions
have multiple uses.
Now ralloc allocates a register to this node. prelabel has stored
in p->syms[RX] a register or wildcard that identifies the registers that
p will accept. Again, common subexpressions complicate matters because
askregvar has pointed their p->syms[RX] at a register variable
that hasn't yet been allocated. So we need to use two values: sym is
p->syms[RX], and set is the set of registers that suit p:

(assign output register 418)= 418 417
Symbol sym = p->syms[RX], set = sym;
if (sym->temporary && sym->u.t.cse)
    set = rmap[optype(p->op)];

If p needs no register, then ralloc is done. Otherwise, it asks getreg for
a register and stores it in the node or nodes that need it:
(assign output register 418)+= 418 417
if (set->sclass != REGISTER) {
    Symbol r;
    (mask out some input registers 419)
    r = getreg(set, mask, p);
    (assign r to nodes 419)
}

ralloc frees the input registers before allocating the output register,
which allows it to reuse an input register as the output register. This
economy is always safe when the node is implemented by a single instruction,
but it can be unsafe if a node is implemented by a sequence
of instructions: If the output register is also one of the input registers,
and if the sequence changes the output register before reading the corresponding
input register, then the read fetches a corrupt value. We take
care that all rules that emit instruction sequences set their output register
only after they finish reading all input registers. Most templates emit
just one instruction, so this assumption is a good default, but it does
require considerable care with multi-instruction sequences.
This rule is impractical for instructions that require the output register
to be one of the input registers. For example, the X86 add instructions
take only two operands; they add the second to the first and leave the
result in the first. If the first operand isn't dead yet, the generated code
must form the sum into a free register, and it must start by copying the
first operand to this free register. The code template is thus generally
two instructions: the first copies the first operand to the destination
register, and the second computes the sum. For example, the X86 add
template is:
reg: ADDI(reg,mril) "mov %c,%0\nadd %c,%1\n" 2
Such templates change the output register before reading all input reg-
isters, so they violate the rule above.
To handle two-operand instructions, we mark their code templates
with a leading question mark. That is, the complete form of the rule
above is:
reg: ADDI(reg,mril) "?mov %c,%0\nadd %c,%1\n" 2
When ralloc sees such a rule, it edits mask to prevent reallocation of all
input registers but the first, which is why the loop below starts at one
instead of zero:

(mask out some input registers 419)= 418
if (*IR->x._templates[getrule(p, p->x.inst)] == '?')
    for (i = 1; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
        Symbol r = p->x.kids[i]->syms[RX];
        mask[r->x.regnode->set] &= ~r->x.regnode->mask;
    }

The code generators must take care that no node targets the same register
as any of its children except the first.
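
The masking step can be tried in isolation. The following is only a sketch under assumptions: struct regnode and protect_inputs are hypothetical stand-ins for lcc's Regnode and the fragment above, and the two-element avail array plays the role of mask.

```c
/* Hypothetical stand-in for lcc's per-register record: a register
   belongs to one register set and owns one or more bits in that set. */
struct regnode {
    int set;        /* index of the register set (assumed: 0 or 1) */
    unsigned mask;  /* bits the register occupies within its set */
};

/* Remove from avail[] every input register except the first, as is
   done for a template that begins with '?'.  kids[] describes the
   nkids input registers of the node. */
static void protect_inputs(unsigned avail[2],
                           const struct regnode *kids, int nkids) {
    int i;

    for (i = 1; i < nkids; i++)     /* start at 1: kid 0 may be reused */
        avail[kids[i].set] &= ~kids[i].mask;
}
```

With avail[0] equal to 0xff and a second input occupying bit 2, the call leaves 0xfb, so the allocator can no longer hand that register out as the destination.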
Once the register is allocated, ralloc stores the allocated register into
the nodes that use it:

(assign r to nodes 419)= 418
if (sym->temporary && sym->u.t.cse) {
    Node q;
    r->x.lastuse = sym->x.lastuse;
    for (q = sym->x.lastuse; q; q = q->x.prevuse) {
        q->syms[RX] = r;
        q->x.registered = 1;
        if (q->x.copy)
            q->x.equatable = 1;
    }
} else {
    p->syms[RX] = r;
    r->x.lastuse = p;
}

If the node is not a common subexpression, the else clause stores r into
p->syms[RX] and notes the single use in r->x.lastuse. If sym is a common
subexpression, x.lastuse already identifies the users, so the fragment
runs down the list, storing r and marking the node as processed
by the register allocator. It also notes in x.equatable if the common
subexpression is already available in some other register.

15.4 Spilling
When the register allocator runs out of registers, it generates code to
spill a busy register to memory, and it replaces all not-yet-processed
uses of that register with nodes that reload the value from memory.
More ambitious alternatives are available - see Exercises 15.6 and 15.7
- but lcc omits them. Spills are rare, so lcc's spiller has been made as
simple as possible without sacrificing target independence. It would be
wasteful to tune code that is seldom used, and test cases are hard to find
and hard to isolate, so it would be hard to test a complex implementation
thoroughly.
When the register allocator runs out of registers, it spills to memory
the most distantly used register, which is the optimal choice. The spiller
replaces all not-yet-processed uses of that register with nodes that load
the value from memory, and it frees the register to satisfy the current
request.
Several routines collaborate to handle spills: spillee identifies the
best register to spill, and spillr calls genspill to insert the spill code
and genreload to insert the reloads. Figure 15.3 illustrates their operation
on the program
int i;
main() { i = f() + f(); }

which is the simplest program that spills on most targets. It spills the
value of the first call from the return register so that it won't be destroyed
by the second call.
The figure's first column shows the forest before code generation; that
is, the forest from the front end after prelabel substitutes VREGs for
ADDRLs that reference (temporary) register variables and injects LOADs to
write such registers. The second column shows the forest after linearization;
it assumes that the nodes linked by arcs with open arrowheads are
instructions - although INDIR and ASGN nodes that read and write registers
are typically just comment instructions - and the rest are subinstructions
like address calculations. The last column shows the injected
spill and reload, which use ADDRLP(4). The dark arrows in the last two
columns show kids and x.kids, which are the links that remain when
subinstructions are projected out of the tree.

[Figure: three columns of trees for i = f() + f(); the legend distinguishes kids, x.kids, link, x.next, and x.prev arcs.]
FIGURE 15.3 Spilling in i = f() + f().

When getreg runs out of registers, it calls spillee(set, here) to
identify the register in set that is used at the greatest distance from
here:
(gen.c functions)+= 417 422
static Symbol spillee(set, here) Node here; Symbol set; {
    Symbol bestreg = NULL;
    int bestdist = -1, i;

    if (!set->x.wildcard)
        return set;
    for (i = 31; i >= 0; i--) {
        Symbol ri = set->x.wildcard[i];
        if (ri != NULL
        && ri->x.regnode->mask&tmask[ri->x.regnode->set]) {
            Regnode rn = ri->x.regnode;
            Node q = here;
            int dist = 0;
            for (; q && !uses(q, rn->mask); q = q->x.next)
                dist++;
            if (q && dist > bestdist) {
                bestdist = dist;
                bestreg = ri;
            }
        }
    }
    return bestreg;
}

If set is not a wildcard, then it denotes a single register; only that register
will do, so spillee simply returns it. Otherwise, set denotes a proper
set of registers, and spillee searches for an element of that set with the
most distant use. spillee calls uses to see if node p reads one given
register:
(gen.c functions)+= 422 423
static int uses(p, mask) Node p; unsigned mask; {
    int i;
    Node q;

    for (i = 0; i < NELEMS(p->x.kids)
    && (q = p->x.kids[i]) != NULL; i++)
        if (q->x.registered
        && mask&q->syms[RX]->x.regnode->mask)
            return 1;
    return 0;
}
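
The selection policy can be sketched apart from lcc's data structures. In the sketch below, which is an illustration and not lcc's code, the instruction list is flattened into a hypothetical array uses_at, where uses_at[i] is a bitmask of the registers read by instruction i, and farthest_use is a hypothetical name.

```c
#include <stddef.h>

/* Return the number of the candidate register whose next use at or
   after position here is farthest away, or -1 if no candidate is used
   again.  uses_at[i] is a bitmask of the registers that instruction i
   reads; candidates is a bitmask of the registers eligible to spill. */
static int farthest_use(const unsigned *uses_at, size_t n, size_t here,
                        unsigned candidates) {
    int best = -1, r;
    long bestdist = -1;

    for (r = 31; r >= 0; r--) {
        unsigned bit = 1u << r;
        size_t i;

        if (!(candidates & bit))
            continue;
        /* walk forward to the first instruction that reads register r */
        for (i = here; i < n && !(uses_at[i] & bit); i++)
            ;
        if (i < n && (long)(i - here) > bestdist) {
            bestdist = (long)(i - here);
            best = r;
        }
    }
    return best;
}
```

Like spillee, the sketch ignores a register that is never read again, because the search for its next use falls off the end of the list.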

spillr(r, here) spills register r and changes each use of r after here
to use a reload instead:
(gen.c functions)+= 422 424
static void spillr(r, here) Symbol r; Node here; {
    int i;
    Node p = r->x.lastuse;
    Symbol tmp = newtemp(AUTO, optype(p->op));
    (spillr 423)
}

spillr spills the register to memory. It is sometimes possible to spill
to another register, but this complicates the logic because it risks another
spill and thus infinite recursion. spillr finds the first use - the
x.prevuse chain ends with the first use - which is the assignment that
establishes the value in r:
(spillr 423)= 423 423
while (p->x.prevuse) {
    p = p->x.prevuse;
}

r can hold a simple expression temporary with a single use or a common
subexpression with multiple uses, but both are assigned by exactly one
instruction. spillr finds it and sends it to genspill, which stitches a
spill into the forest at the assignment:

(spillr 423)+= 423 423 423
genspill(r, p, tmp);


The spill could be done anywhere between the assignment and here, but
the site of the assignment is a good safe place for it, which explains why
the spill in Figure 15.3 is in the last column's first tree.
Next, spillr changes all remaining nodes that read r to load the spill
cell instead; it concludes by freeing r:
(spillr 423)+= 423 423
for (p = here->x.next; p; p = p->x.next)
    for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
        Node k = p->x.kids[i];
        if (k->x.registered && k->syms[RX] == r)
            genreload(p, tmp, i);
    }
putreg(r);
The scan for nodes that read r starts with here->x.next instead of here
for reasons that are subtle. here can spill one of its own kids. For example,
the code (*f)() might load the value of the pointer into a caller-saved
register and then use an indirect call instruction, which clobber
must spill. Most instruction templates hold just one instruction, so they
finish reading their input registers before clobbering anything; the indirect
call, for example, doesn't clobber the address register until after
it's done with the value, unless the address register is used again by an
instruction after the call, which is at or after here->x.next.
Also, genreload doesn't call ralloc to allocate registers for the nodes
that it inserts. genreload stitches each reload into the list of instructions
just before the instruction that uses the reloaded value. Such instructions
had referenced the spilled value at or before here, but genreload
edits them to use reloads that are after here. We simply postpone register
allocation for the new instructions until ralloc encounters them on
the list of remaining instructions.
genspill(r, last, tmp) spills to tmp the assignment of r at last:
(gen.c functions)+= 423 426
static void genspill(r, last, tmp)
Symbol r, tmp; Node last; {
    Node p, q;
    Symbol s;
    unsigned ty;

    (genspill 424)
}

genspill synthesizes a register variable of the appropriate type to use
in the spill:
(genspill 424)= 425 424
ty = optype(last->op);
if (ty == U)
    ty = I;
NEW0(s, FUNC);
s->sclass = REGISTER;
s->x.name = r->x.name;
s->x.regnode = r->x.regnode;
s->x.regnode->vbl = s;
The register being spilled is not a register variable, but pretending it is
ensures that no instructions will be generated to compute the value to be
spilled, because INDIRx(VREGP) emits nothing. The value has been computed
already, and we want no additional instructions. Next, genspill
creates nodes to spill the register to memory:
(genspill 424)+= 424 425 424
q = newnode(ADDRLP, NULL, NULL, s);
q = newnode(INDIR + ty, q, NULL, NULL);
p = newnode(ADDRLP, NULL, NULL, tmp);
p = newnode(ASGN + ty, p, q, NULL);
Now genspill selects instructions, projects out the subinstructions, and
linearizes the resulting instruction tree:
(genspill 424)+= 425 425 424
rewrite(p);
prune(p, &q);
q = last->x.next;
linearize(p, q);
Finally, it passes the new nodes through the register allocator:
(genspill 424)+= 425 424
for (p = last->x.next; p != q; p = p->x.next) {
    ralloc(p);
}
357 clobber
If the call on genspill originated because ralloc ran out of registers,
these calls risk infinite recursion if they actually try to allocate a register.
We must take care that the code generator can spill a register without
allocating another register. Spills are stores, which usually take just one
instruction and thus need no additional register, but some machines have
limits on the size of the constant part of address calculations and thus
require two instructions and a temporary register to complete a store to
an arbitrary address. Therefore we must ensure that these stores use a
register that is not otherwise allocated by ralloc. The MIPS R3000 architecture
has such restrictions, but the assembler handles the problem
using a temporary register reserved for the assembler. The SPARC target
is the only one so far that requires attention from the code generator;
Section 17.2 elaborates.
genspill's ralloc calls above must allocate no register, but it calls
ralloc anyway, since ralloc is responsible for more than just allocating
a register. It also calls, for example, the target's clobber. It is unlikely
that a simple store would cause clobber to do anything, but some future
target could do so, so genspill would hide a latent bug if it didn't call
ralloc. The back end sends all other nodes through rewrite, prune,
linearize, and ralloc, so it seems unwise to omit any of these steps
for spill nodes.
genreload(p, tmp, i) changes p->x.kids[i] to load tmp instead of
reading a register that has now been spilled:
(gen.c functions)+= 424 426
static void genreload(p, tmp, i)
Node p; Symbol tmp; int i; {
    Node q;
    int ty;

    (genreload 426)
}

It changes the target node to a tree that loads tmp, selects instructions
for it, and projects out the subinstructions:
(genreload 426)= 426 426
ty = optype(p->x.kids[i]->op);
if (ty == U)
    ty = I;
q = newnode(ADDRLP, NULL, NULL, tmp);
p->x.kids[i] = newnode(INDIR + ty, q, NULL, NULL);
rewrite(p->x.kids[i]);
prune(p->x.kids[i], &q);
Next, genreload linearizes the reloading instructions, as is usual after
pruning, but we need two extra steps first:
(genreload 426)+= 426 426
reprune(&p->kids[1], reprune(&p->kids[0], 0, i, p), i, p);
prune(p, &q);
linearize(p->x.kids[i], p);

In most cases, each entry in x.kids was copied from some entry in some
kids by prune, but genreload has changed x.kids[i] without updating
the corresponding entry in any kids. The emitter uses kids, so
genreload must find and update the corresponding entry. The call on
reprune above does this, and the second call on prune makes any similar
changes to the node at which p points.
reprune(pp, k, n, p) is called to reestablish the connection between
kids and x.kids when p->x.kids[n] has changed. That is, reprune must
do whatever is necessary to make it look like the reloads were in the forest
from the beginning. reprune is thus an incremental version of prune:
prune establishes a correspondence between kids and x.kids for a complete
tree, and reprune reestablishes this correspondence after a change
to just one of them, namely the one corresponding to the reload. Figure 15.4
shows how reprune repairs the final tree shown in Figure 15.3.
The initial, root-level call on reprune has a pointer, pp, that points to
the first kids entry that might need change.
(gen.c functions)+= 426 427
static int reprune(pp, k, n, p) Node p, *pp; int k, n; {
    Node q = *pp;

    if (q == NULL || k > n)
        return k;
    else if (q->x.inst == 0)
        return reprune(&q->kids[1],
            reprune(&q->kids[0], k, n, p), n, p);
    else if (k == n) {
        *pp = p->x.kids[n];
        return k + 1;
    } else
        return k + 1;
}

[Figure: the final ASGNI tree from Figure 15.3, drawn twice, with kids and x.kids arcs.]
FIGURE 15.4 Figure 15.3's reload before and after reprune.

kids link the original tree, and x.kids link the instruction tree. The
second is a projection of the first, but an arbitrary number of nodes
have been projected out, so finding the kids entry that corresponds to
p->x.kids[i] requires a recursive tree search. reprune's recursive calls
track prune's recursive calls. They bump k, which starts out at zero, and
advance p in exactly those cases where prune finds an instruction and
sets the next entry in x.kids. So when k reaches n, reprune has found
the kids entry to update.
getreg and each target's clobber call spill(mask, n, here) to spill
all busy registers in register set n that overlap the registers indicated by
mask. A typical use is for CALL nodes, because calls generally corrupt
some registers, which must be spilled before the call and reloaded afterward.
spill marks the registers as used and runs down the rest of the
forest looking for live registers that need spilling. It economizes by first
confirming that there are registers that need spilling:
(gen.c functions)+= 426
void spill(mask, n, here) unsigned mask; int n; Node here; {
    int i;
    Node p;

    usedmask[n] |= mask;
    if (mask&~freemask[n])
        for (p = here; p; p = p->x.next)
            (spill 428)
}

The inner loop below identifies the live registers that need spilling and
calls spillr to spill them:
(spill 428)= 428
for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++) {
    Symbol r = p->x.kids[i]->syms[RX];
    if (p->x.kids[i]->x.registered && r->x.regnode->set == n
    && r->x.regnode->mask&mask)
        spillr(r, here);
}

spill gives lcc caller-saved registers for free, as a special case of a
mechanism that is needed for oddball instructions that clobber some fixed set
of registers.
Further Reading

Execution ordering determines the order of the instructions in the output.
Most languages are only partially constrained. For example, ANSI specifies
that we must evaluate assignment statements in order, but it doesn't
care which operand of an assignment is computed first. linearize
uses one fixed order, but better alternatives exist. For example, Sethi-Ullman
numbering (Aho, Sethi, and Ullman 1986) can save registers by
evaluating first the children that need the most registers.
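
As an illustration of the numbering, here is a minimal sketch of one common textbook variant, in which every leaf is assumed to need one register. struct expr and su_number are hypothetical names, and nothing here is part of lcc.

```c
#include <stddef.h>

/* Hypothetical expression tree: a leaf has no children, a unary
   operator has only a left child, a binary operator has both. */
struct expr {
    struct expr *left, *right;
};

/* Return the number of registers needed to evaluate e without
   spilling, assuming each leaf is loaded into one register and the
   child that needs more registers is evaluated first. */
static int su_number(const struct expr *e) {
    int l, r;

    if (e == NULL)
        return 0;
    if (e->left == NULL && e->right == NULL)
        return 1;                   /* leaf: one register */
    l = su_number(e->left);
    r = su_number(e->right);
    if (l == r)
        return l + 1;               /* equal needs: one extra register */
    return l > r ? l : r;           /* otherwise the larger need suffices */
}
```

Under these assumptions (a+b)+c needs two registers and (a+b)+(c+d) needs three, which is why evaluating the more demanding child first can pay off.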
Instruction scheduling interacts with register allocation. It helps to
start slow instructions long before their result is needed, but this ties up
the result register longer and thus uses more registers. Proebsting and
Fischer (1991) solve one class of trade-offs compactly. Krishnamurthy
(1990) surveys some of the literature in instruction scheduling.
Many ambitious register allocators use graph coloring. The compiler
builds a graph in which the values computed are the nodes, and it links
two nodes if and only if the two values are ever live at the same time,
which means that they can't share a register or, equivalently, a graph
color. Chaitin et al. (1981) describe the process.
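
The graph construction can be sketched under a simplifying assumption: each value is live over one contiguous, half-open range of instruction positions. Real allocators derive liveness from data-flow analysis, so the types and names below (struct interval, interferes, build_graph) are hypothetical and illustrate only the "live at the same time" test.

```c
/* Hypothetical live range: the value is live at positions
   start, start+1, ..., end-1. */
struct interval {
    int start, end;
};

/* Two values interfere if their live ranges overlap. */
static int interferes(struct interval a, struct interval b) {
    return a.start < b.end && b.start < a.end;
}

/* Fill the n-by-n adjacency matrix adj (row-major) with 1 where two
   distinct values interfere and 0 elsewhere. */
static void build_graph(const struct interval *v, int n,
                        unsigned char *adj) {
    int i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            adj[i*n + j] = (unsigned char)(i != j && interferes(v[i], v[j]));
}
```

Values joined by an edge in adj cannot share a register; coloring the graph with as many colors as there are registers then yields an assignment.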
Selecting a register to spill is related to page replacement in operating
systems. Virtual memory systems can't know the most distantly
used page, but spillers can determine the most distantly used register
(Freiburghouse 1974).

Exercises
15.1 Section 15.3 describes an optimization abandoned because it saved
only 5 instructions out of 25,000 in one test. Implement the optimization
and see if you can find useful programs that the optimization
improves more.
15.2 Adapt lcc to use Sethi-Ullman numbering. How much faster does
it make lcc's code for your programs?
15.3 Construct some input programs that exercise the spiller.
15.4 Change spillee to spill the least recently used register. Using timings,
can you detect any difference in compilation rate or quality of
the generated code for some useful programs?
15.5 The simplest compilers omit spillers and die uttering a diagnostic
instead. Remove lcc's spiller. You'll have to change clobber. How
much simpler is the new compiler? How many of your favorite C
programs do you need to compile before one is rejected? How does
this number change if you change CALLs to copy the result register
to an arbitrary register, thus avoiding the spill in f()+f()?
15.6 Some spills occur when registers are still available. For example,
the expression f()+f() must spill the first return value from the
return register because the second call needs the same register,
but other registers might be free. Change lcc's spiller to spill to
another register when it can. How much does this change improve
the code for your favorite C programs? Was it worth the effort?
15.7 lcc generates one reload for each reference processed after the
spill, but fewer reloads can suffice. Change lcc's spiller to avoid
gratuitous reloads. How much does this change improve the code
for your favorite C programs? Was it worth the effort?
16
Generating MIPS R3000 Code

The MIPS R3000 architecture and the companion R3010 floating-point
unit comprise a RISC. They have 32 32-bit registers each, a compact set
of 32-bit instructions, and one addressing mode. They access memory
only through explicit load and store instructions. The calling convention
passes some arguments in registers.
We might begin this chapter with a complete description of the MIPS
assembler language, but describing the add instruction in isolation and
then giving it again in the rule for ADDI later would be repetitious, so we
begin not with a reference manual but rather with a few sample instruc-
tions, as shown in Table 16.1. Our aim here is to appreciate the general
syntax of the assembler language - the appearance and position of regis-
ters, addresses, opcodes, and assembler directives. This understanding,
plus the templates and the text that describes the rules, plus the parallel
construction of repetitive rules, tell us what we need to know about the
target machine.
Assembler               Meaning
move $10,$11            Set register 10 to the value in register 11.
subu $10,$11,$12        Set register 10 to register 11 minus register 12.
subu $10,$11,12         Set register 10 to register 11 minus the constant 12.
lb $10,11($12)          Set register 10 to the byte at the address 11
                        bytes past the address in register 12.
sub.d $f12,$f14,$f16    Set register 12 to register 14 minus register 16.
                        Use double-precision floating-point registers
                        and arithmetic.
sub.s $f12,$f14,$f16    Set register 12 to register 14 minus register 16.
                        Use single-precision floating-point registers
                        and arithmetic.
b L1                    Jump to the instruction labelled L1.
j $31                   Jump to the address in register 31.
blt $10,$11,L1          Branch to L1 if register 10 is less than register 11.
.byte 0x20              Initialize the next byte in memory to hexadecimal 20.

TABLE 16.1 Sample MIPS assembler input lines.

The file mips.c collects all target-specific code and data for the MIPS
code generator. It's an lburg specification with the interface routines
after the grammar:
(mips.md 431)=
%{
(mips.c macros)
(lburg prefix 375)
(interface prototypes)
(MIPS prototypes)
(MIPS data 433)
%}
(terminal declarations 376)
%%
(shared rules 400)
(MIPS rules 436)
%%
(MIPS functions 433)
(MIPS interface definition 431)
The last fragment configures the front end and points to the MIPS code
and to data in the back end. Most targets have just one interface record,
but MIPS machines can be configured as either big or little endian, so lcc
needs two interface records for them:
(MIPS interface definition 431)= 431
Interface mipsebIR = {
    (MIPS type metrics 431)
    0, /* little_endian */
    (shared interface definition 432)
}, mipselIR = {
    (MIPS type metrics 431)
    1, /* little_endian */
    (shared interface definition 432)
};

Systems from Digital Equipment run the Ultrix operating system, are little
endians, and use mipselIR. Systems from Silicon Graphics run the IRIX
operating system, are big endians, and use mipsebIR. The systems share
the same type metrics:
(MIPS type metrics 431)= 431
1, 1, 0, /* char */
2, 2, 0, /* short */
4, 4, 0, /* int */
4, 4, 1, /* float */
8, 8, 1, /* double */
4, 4, 0, /* T * */
0, 1, 0, /* struct */
They also share the rest of the interface record:
(shared interface definition 432)= 431
0, /* mulops_calls */
0, /* wants_callb */
1, /* wants_argb */
1, /* left_to_right */
0, /* wants_dag */
(interface routine names)
0, 0, 0, stabinit, stabline, stabsym, 0,
{
    4, /* max_unaligned_load */
    (Xinterface initializer 355)
}

Some of the symbol-table handlers are missing. lcc, like many compilers,
assumes that all data for the debugger can be encoded using assembler
directives. MIPS compilers encode file names and line numbers this way,
but information about the type and location of identifiers is encoded in
another file, which lcc does not emit. MIPS debuggers can thus report
the location of an error in an executable file from lcc, but they can't
report or change the values of identifiers; see Exercise 16.5.
16.1 Registers
The MIPS R3000 processor has thirty-two 32-bit registers, which are
known to the assembler as $i. The MIPS R3010 floating-point coprocessor
adds thirty-two more 32-bit registers, which are usually treated as
sixteen even-numbered 64-bit registers and are known to the assembler
as $fi.
The hardware imposes only a few constraints - register $0 is always
zero, and the jump-and-link instruction puts the return address in $31 -
but lcc observes many more conventions used by other compilers, in order
to interoperate with the standard libraries and debuggers. Table 16.2
enumerates the conventions.
The assembler reserves $1 to implement pseudo-instructions. For example,
the hardware permits only 16-bit offsets in address calculations,
but the assembler permits 32-bit offsets by injecting extra instructions
that form a large offset using $1. lcc uses some pseudo-instructions, but
it forgoes others to simplify adaptations of lcc that emit binary object
code directly.
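
The text does not show the instructions that the assembler injects, so the following is only a sketch of one common way to split a 32-bit offset into a 16-bit upper part and a signed 16-bit lower part, assuming the lower part is sign-extended when it is added back. struct split, split_offset, and join_offset are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result of splitting a 32-bit offset. */
struct split {
    uint16_t hi;    /* upper half, to be shifted left by 16 */
    int16_t lo;     /* lower half, sign-extended when added */
};

/* Split off so that (hi << 16) + lo equals off modulo 2^32.
   Adding 0x8000 before the shift compensates for a negative lo. */
static struct split split_offset(int32_t off) {
    struct split s;
    uint32_t u = (uint32_t)off;
    int32_t lo = (int32_t)(u & 0xffffu);

    if (lo >= 0x8000)
        lo -= 0x10000;              /* interpret the low half as signed */
    s.lo = (int16_t)lo;
    s.hi = (uint16_t)((u + 0x8000u) >> 16);
    return s;
}

/* Recombine the halves, as 32-bit unsigned arithmetic. */
static uint32_t join_offset(struct split s) {
    return ((uint32_t)s.hi << 16) + (uint32_t)(int32_t)s.lo;
}
```

For the offset 0x12348000, the sketch yields an upper part of 0x1235 and a lower part of -32768, which recombine to the original value.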
The convention reserves $2-$3 and $f0-$f2 for return values, but lcc
uses only the first half of each. The second half is for Fortran's complex
arithmetic type. C doesn't have this type, but C compilers respect the
convention to interoperate with Fortran code.
Registers       Use
$0              zero; unchangeable
$1              reserved for the assembler
$2-$3           function return value
$4-$7           first few procedure arguments
$8-$15          scratch registers
$16-$23         register variables
$24-$25         scratch registers
$26-$27         reserved for the operating system
$28             global pointer; also called $gp
$29             stack pointer; also called $sp
$30             register variable
$31             procedure return address

$f0-$f2         function return value
$f4-$f10        scratch registers
$f12-$f14       first two procedure arguments
$f16-$f18       scratch registers
$f20-$f30       register variables

TABLE 16.2 MIPS register conventions.

progend does nothing for this target. progbeg encodes Table 16.2 in
the register allocator's data structures.
(MIPS functions 433)= 435 431
static void progbeg(argc, argv) int argc; char *argv[]; {
    int i;

    (shared progbeg 371)
    print(".set reorder\n");
    (parse -G flag 458)
    (initialize MIPS register structures 434)
}

First, it emits a harmless directive - the MIPS assembler objects to empty
inputs - and parses a target-specific flag. Then it initializes the vectors
of register symbols:
(MIPS data 433)= 434 431
static Symbol ireg[32], freg2[32], d6;

Each element of ireg represents one integer register, and freg2 represents
pairs of adjacent floating-point registers. d6 represents the pair
$6-$7.
Actually, the machine has only 31 register pairs of each type, but the
declaration supplies 32 to keep askreg's inelegant loop bounds valid.
(initialize MIPS register structures 434)= 434 433
for (i = 0; i < 31; i += 2)
    freg2[i] = mkreg("%d", i, 3, FREG);
for (i = 0; i < 32; i++)
    ireg[i] = mkreg("%d", i, 1, IREG);
ireg[29]->x.name = "sp";
d6 = mkreg("6", 6, 3, IREG);
mkreg assigns numeric register names. It renames the stack pointer -
$29 - to use the assembler mnemonic $sp. We don't need the mnemonic
$gp, or we'd rename $28 too.
rmap stores the wildcard that identifies the default register class to
use for each type:
(initialize MIPS register structures 434)+= 434 434 433
rmap[C] = rmap[S] = rmap[P] = rmap[B] = rmap[U] = rmap[I] =
    mkwildcard(ireg);
rmap[F] = rmap[D] = mkwildcard(freg2);
tmask identifies the scratch registers, and vmask the register variables:
(mips.c macros)= 443
#define INTTMP 0x0300ff00
#define INTVAR 0x40ff0000
#define FLTTMP 0x000f0ff0
#define FLTVAR 0xfff00000

(initialize MIPS register structures 434)+= 434 434 433
tmask[IREG] = INTTMP; tmask[FREG] = FLTTMP;
vmask[IREG] = INTVAR; vmask[FREG] = FLTVAR;
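
The four masks can be checked against Table 16.2 with a small helper. regrange is a hypothetical name introduced only for this sketch; it sets one bit per register number, and a floating-point pair occupies the bits of both of its registers.

```c
/* Return a mask with bits lo through hi set, one bit per register. */
static unsigned regrange(int lo, int hi) {
    unsigned m = 0;
    int i;

    for (i = lo; i <= hi; i++)
        m |= 1u << i;
    return m;
}
```

Under this reading, INTTMP is regrange(8,15)|regrange(24,25), INTVAR is regrange(16,23)|regrange(30,30), FLTTMP is regrange(4,11)|regrange(16,19), and FLTVAR is regrange(20,31).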
rmap 398 ARGB and ASGNB need not just source and destination addresses but also
tmask 410 three temporary registers to copy a block. $3 is always available for tem-
vmask 410 porary use, but we need two more, and all five registers must be distinct.
x.name 362
An easy way to enforce this rule is to target the source register to a reg-
ister triple: $8 for the source and $9 and $10 for the two temporaries.
tmpregs lists the three temporaries - $3, $9, and $10 - for blkcopy:
(MIPS data 433)+=
static int tmpregs[] = {3, 9, 10};
blkreg is the source register triple:
(MIPS data 433)+=
static Symbol blkreg;

(initialize MIPS register structures 434)+=
blkreg = mkreg("8", 8, 7, IREG);
16.2 • SELECTING INSTRUCTIONS 435

Name    What It Matches

acon    address constants
addr    address calculations for instructions that read and write memory
ar      labels and addresses in registers
con     constants
rc      registers and constants
rc5     registers and constants that fit in five bits
reg     computations that yield a result in a register
stmt    computations done for side effect

TABLE 16.3 MIPS nonterminals.

The third argument to mkreg is a mask of three ones, which identifies $8,
$9, and $10. The emitted code takes care to use $8 as the source register
and the other two as temporaries.
target calls setreg to mark nodes that need a special register, and it calls rtarget to mark nodes that need a child in a special register:
(MIPS functions 433)+=
static void target(p) Node p; {
        switch (p->op) {
        (MIPS target 437)
        }
}
If an instruction clobbers some registers, clobber calls spill to save them first and restore them later.
(MIPS functions 433)+=
static void clobber(p) Node p; {
        switch (p->op) {
        (MIPS clobber 443)
        }
}

The cases missing from target and clobber above appear with the germane instructions in the next section.

16.2 Selecting Instructions


Table 16.3 summarizes the nonterminals in lcc's lburg specification for
the MIPS code generator. It provides a high-level overview of the organi-
zation of the tree grammar.
Some assembler instructions have a suffix that identifies the data type
on which the instruction operates. The suffixes s and d identify single-
and double-precision floating-point instructions, and b, h, and w identify
8-, 16-, and 32-bit integral instructions, respectively. The optional suffix
u flags some instructions as unsigned. If it's omitted, the operation is
signed.
Constants and identifiers represent themselves in assembler:
(MIPS rules 436)=
acon: con     "%0"
acon: ADDRGP  "%a"
The instructions that access memory use address-calculation hardware
that adds an instruction field and the contents of an integer register. The
assembler syntax is the constant followed by the register in parentheses:
(MIPS rules 436)+=
addr: ADDI(reg,acon)  "%1($%0)"
addr: ADDU(reg,acon)  "%1($%0)"
addr: ADDP(reg,acon)  "%1($%0)"
Degenerate sums - with a zero constant or $0 - supply absolute and
indirect addressing. The zero field may be omitted:
(MIPS rules 436)+=
addr: acon  "%0"
addr: reg   "($%0)"

The hardware permits only 16-bit offsets in address calculations, but the assembler uses $1 and extra instructions to synthesize larger values, so lcc can ignore the hardware restriction, at least on this machine.
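As an illustration of the restriction that the assembler hides, the sketch below tests whether an offset fits the signed 16-bit field of a MIPS load or store. fits_offset16 is a hypothetical name; nothing like it appears in lcc, because the assembler makes this decision.

```c
#include <stdint.h>

/* Hypothetical helper, not part of lcc: returns 1 if offset fits the
   signed 16-bit immediate field of a load or store, and 0 if the
   assembler would have to synthesize the address with $1. */
static int fits_offset16(int32_t offset) {
    return offset >= -32768 && offset <= 32767;
}
```

For example, an operand such as 40000($4) falls outside the field, so the assembler would expand that one instruction into several.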
ADDRFP and ADDRLP add a constant offset to the stack pointer:
(MIPS rules 436)+=
addr: ADDRFP  "%a+%F($sp)"
addr: ADDRLP  "%a+%F($sp)"

%a emits p->syms[0]->x.name, and %F emits the size of the frame. $sp is decremented by the frame size when the routine starts, so the %F($sp) part recreates $sp's initial value. Locals have negative offsets, so they're below the initial $sp. Formals have positive offsets, so they're just above, in the caller's frame. Section 16.3 elaborates.
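A small worked example may help with the arithmetic. The sketch below is hypothetical helper code, not part of lcc: frame_operand formats the operand that the "%a+%F($sp)" template would yield for a given symbol offset and frame size, and effective_offset gives the displacement from the decremented $sp that the assembler ends up with.

```c
#include <stdio.h>

/* Hypothetical helper: formats the operand produced by the template
   "%a+%F($sp)", where offset plays the role of %a and framesize
   the role of %F. */
static void frame_operand(char *buf, size_t size, int offset, int framesize) {
    snprintf(buf, size, "%d+%d($sp)", offset, framesize);
}

/* Displacement from the current (already decremented) $sp. */
static int effective_offset(int offset, int framesize) {
    return offset + framesize;
}
```

With a 32-byte frame, a local at offset -8 is addressed as -8+32($sp), which is 24 bytes above the current $sp, and a formal at offset 4 lands at 36, just past the frame and in the caller's.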
addr illustrates several guidelines from Section 14.10. It includes rules
for each of the addressing modes, plus rules for degenerate cases, rules
replicated for equivalent operators, and rules that implement operators
that are special cases of a more general computation. It also factors the
specification to avoid replicating the material above for each rule that
uses addr.
The pseudo-instruction la performs an address calculation and leaves
the address in a register. For example, la $2, x($4) adds $4 to the ad-
dress x and leaves the result in $2:
(MIPS rules 436)+=
reg: addr  "la $%c,%0\n"  1
%c emits p->syms[RX]->x.name. A con is an addr, so lcc uses la whenever it needs to load a constant into a register. Zero is always available in $0, so we need no instruction to compute zero:
(MIPS rules 436)+=
reg: CNSTC  "# reg\n"  range(a, 0, 0)
reg: CNSTS  "# reg\n"  range(a, 0, 0)
reg: CNSTI  "# reg\n"  range(a, 0, 0)
reg: CNSTU  "# reg\n"  range(a, 0, 0)
reg: CNSTP  "# reg\n"  range(a, 0, 0)
Recall that cost expressions are evaluated in a context in which a denotes
the node being labelled, which here is the constant value being tested for
zero. target arranges for these nodes to return $0:
(MIPS target 437)=
case CNSTC: case CNSTI: case CNSTS: case CNSTU: case CNSTP:
        if (range(p, 0, 0) == 0) {
                setreg(p, ireg[0]);
                p->x.registered = 1;
        }
        break;
Allocating $0 makes no sense, so target marks the node to preclude register allocation.
The instructions l and s load from and store into memory. They take a type suffix, an integer register, and an addr. For example, sw $4,x stores the 32-bit integer in $4 into the memory cell labelled x. sb and sh do likewise for the low-order 8 and 16 bits of the register. lb, lh, and lw reverse the process and load an 8-, 16-, or 32-bit value:
(MIPS rules 436)+=
stmt: ASGNC(addr,reg)  "sb $%1,%0\n"  1
stmt: ASGNS(addr,reg)  "sh $%1,%0\n"  1
stmt: ASGNI(addr,reg)  "sw $%1,%0\n"  1
stmt: ASGNP(addr,reg)  "sw $%1,%0\n"  1
reg:  INDIRC(addr)     "lb $%c,%0\n"  1
reg:  INDIRS(addr)     "lh $%c,%0\n"  1
reg:  INDIRI(addr)     "lw $%c,%0\n"  1
reg:  INDIRP(addr)     "lw $%c,%0\n"  1
lb and lh propagate the sign bit to fill the top part of the register, so they implement a free CVCI and CVSI. lbu and lhu fill with zeroes instead, so they implement a free CVCU and CVSU:
(MIPS rules 436)+=
reg: CVCI(INDIRC(addr))  "lb $%c,%0\n"   1
reg: CVSI(INDIRS(addr))  "lh $%c,%0\n"   1
reg: CVCU(INDIRC(addr))  "lbu $%c,%0\n"  1
reg: CVSU(INDIRS(addr))  "lhu $%c,%0\n"  1
These rules illustrate another guideline from Section 14.10: when in-
structions evaluate more than one intermediate-language opcode, write
a rule that matches multiple nodes.
l. and s. load and store floating-point values. All floating-point instructions separate the opcode and type suffix with a period:
(MIPS rules 436)+=
reg:  INDIRD(addr)     "l.d $f%c,%0\n"  1
reg:  INDIRF(addr)     "l.s $f%c,%0\n"  1
stmt: ASGND(addr,reg)  "s.d $f%1,%0\n"  1
stmt: ASGNF(addr,reg)  "s.s $f%1,%0\n"  1
All integer-multiplicative instructions accept two source registers and leave a result in a destination register. The left and right operands follow the destination register. For example, div $4,$5,$6 divides $5 by $6 and leaves the result in $4.
(MIPS rules 436)+=
reg: DIVI(reg,reg)  "div $%c,$%0,$%1\n"   1
reg: DIVU(reg,reg)  "divu $%c,$%0,$%1\n"  1
reg: MODI(reg,reg)  "rem $%c,$%0,$%1\n"   1
reg: MODU(reg,reg)  "remu $%c,$%0,$%1\n"  1
reg: MULI(reg,reg)  "mul $%c,$%0,$%1\n"   1
reg: MULU(reg,reg)  "mul $%c,$%0,$%1\n"   1
The remaining binary integer instructions also have an immediate form,
in which the right operand may be a constant instruction field:
(MIPS rules 436)+=
rc: con  "%0"
rc: reg  "$%0"

reg: ADDI(reg,rc)   "addu $%c,$%0,%1\n"  1
reg: ADDP(reg,rc)   "addu $%c,$%0,%1\n"  1
reg: ADDU(reg,rc)   "addu $%c,$%0,%1\n"  1
reg: BANDU(reg,rc)  "and $%c,$%0,%1\n"   1
reg: BORU(reg,rc)   "or $%c,$%0,%1\n"    1
reg: BXORU(reg,rc)  "xor $%c,$%0,%1\n"   1
reg: SUBI(reg,rc)   "subu $%c,$%0,%1\n"  1
reg: SUBP(reg,rc)   "subu $%c,$%0,%1\n"  1
reg: SUBU(reg,rc)   "subu $%c,$%0,%1\n"  1
Immediate shift instructions, however, require constants between zero
and 31:
(MIPS rules 436)+=
rc5: CNSTI  "%a"   range(a,0,31)
rc5: reg    "$%0"

reg: LSHI(reg,rc5)  "sll $%c,$%0,%1\n"  1
reg: LSHU(reg,rc5)  "sll $%c,$%0,%1\n"  1
reg: RSHI(reg,rc5)  "sra $%c,$%0,%1\n"  1
reg: RSHU(reg,rc5)  "srl $%c,$%0,%1\n"  1
Only register forms are available for the unary instructions:
(MIPS rules 436)+=
reg: BCOMU(reg)  "not $%c,$%0\n"   1
reg: NEGI(reg)   "negu $%c,$%0\n"  1
reg: LOADC(reg)  "move $%c,$%0\n"  move(a)
reg: LOADS(reg)  "move $%c,$%0\n"  move(a)
reg: LOADI(reg)  "move $%c,$%0\n"  move(a)
reg: LOADP(reg)  "move $%c,$%0\n"  move(a)
reg: LOADU(reg)  "move $%c,$%0\n"  move(a)
Recall that move returns one but also marks the node as a register-to-
register move and thus a candidate for some optimizations.
The floating-point instructions also have only register forms:
(MIPS rules 436)+=
reg: ADDD(reg,reg)  "add.d $f%c,$f%0,$f%1\n"  1
reg: ADDF(reg,reg)  "add.s $f%c,$f%0,$f%1\n"  1
reg: DIVD(reg,reg)  "div.d $f%c,$f%0,$f%1\n"  1
reg: DIVF(reg,reg)  "div.s $f%c,$f%0,$f%1\n"  1
reg: MULD(reg,reg)  "mul.d $f%c,$f%0,$f%1\n"  1
reg: MULF(reg,reg)  "mul.s $f%c,$f%0,$f%1\n"  1
reg: SUBD(reg,reg)  "sub.d $f%c,$f%0,$f%1\n"  1
reg: SUBF(reg,reg)  "sub.s $f%c,$f%0,$f%1\n"  1
reg: LOADD(reg)     "mov.d $f%c,$f%0\n"       move(a)
reg: LOADF(reg)     "mov.s $f%c,$f%0\n"       move(a)
reg: NEGD(reg)      "neg.d $f%c,$f%0\n"       1
reg: NEGF(reg)      "neg.s $f%c,$f%0\n"       1
Few instructions are specialized to convert between types. CVCI and CVSI sign-extend by shifting first left and then right. CVCU and CVSU zero-extend by "anding out" the top part of the register:
(MIPS rules 436)+=
reg: CVCI(reg)  "sll $%c,$%0,24; sra $%c,$%c,24\n"  2
reg: CVSI(reg)  "sll $%c,$%0,16; sra $%c,$%c,16\n"  2
reg: CVCU(reg)  "and $%c,$%0,0xff\n"                1
reg: CVSU(reg)  "and $%c,$%0,0xffff\n"              1
These rules illustrate another guideline from Section 14.10: When no instruction directly implements an operation, write a rule that pieces the operation together using other instructions.
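The effect of the shift and mask sequences can be modelled on 32-bit register images. The sketch below is illustrative only: sra, cvci, cvsi, cvcu, and cvsu are hypothetical C functions that mimic the instruction sequences in the rules above, with the arithmetic right shift written out explicitly because C leaves right shifts of negative signed values implementation-defined.

```c
#include <stdint.h>

/* Arithmetic right shift of a 32-bit register image, 0 < n < 32:
   the vacated high bits are filled with copies of the sign bit. */
static uint32_t sra(uint32_t x, unsigned n) {
    uint32_t fill = (x & 0x80000000u) ? ~(0xffffffffu >> n) : 0;
    return (x >> n) | fill;
}

/* sll r,r,24; sra r,r,24 : sign-extend the low byte */
static uint32_t cvci(uint32_t r) { return sra(r << 24, 24); }

/* sll r,r,16; sra r,r,16 : sign-extend the low halfword */
static uint32_t cvsi(uint32_t r) { return sra(r << 16, 16); }

/* and r,r,0xff : zero-extend the low byte */
static uint32_t cvcu(uint32_t r) { return r & 0xffu; }

/* and r,r,0xffff : zero-extend the low halfword */
static uint32_t cvsu(uint32_t r) { return r & 0xffffu; }
```

In this model a register holding 0x12345680 converts to 0xffffff80 under cvci and to 0x00000080 under cvcu; whatever was in the upper bits is discarded either way.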
The rest of the conversions, which involve only integer and pointer
types, do nothing. Conversions to narrower types like CVIC need not
clear the top of the register because the front end never builds trees
that use the upper bits of a narrow value. The shared rules and the rules
below generate nothing if the existing register will do:
(MIPS rules 436)+=
reg: CVIC(reg)  "%0"  notarget(a)
reg: CVIS(reg)  "%0"  notarget(a)
reg: CVUC(reg)  "%0"  notarget(a)
reg: CVUS(reg)  "%0"  notarget(a)
More expensive rules generate a register-to-register copy if a specific reg-
ister has been targeted:
(MIPS rules 436)+=
reg: CVIC(reg)  "move $%c,$%0\n"  move(a)
reg: CVIS(reg)  "move $%c,$%0\n"  move(a)
reg: CVIU(reg)  "move $%c,$%0\n"  move(a)
reg: CVPU(reg)  "move $%c,$%0\n"  move(a)
reg: CVUC(reg)  "move $%c,$%0\n"  move(a)
reg: CVUI(reg)  "move $%c,$%0\n"  move(a)
reg: CVUP(reg)  "move $%c,$%0\n"  move(a)
reg: CVUS(reg)  "move $%c,$%0\n"  move(a)
cvt.d.s converts a float to a double, and cvt.s.d reverses the process:
(MIPS rules 436)+=
reg: CVDF(reg)  "cvt.s.d $f%c,$f%0\n"  1
reg: CVFD(reg)  "cvt.d.s $f%c,$f%0\n"  1

cvt.d.w converts an integer to a double. The integer must be in a floating-point register, so CVID starts with a mtc1, which copies a value from the integer unit to the floating-point unit:
(MIPS rules 436)+=
reg: CVID(reg)  "mtc1 $%0,$f%c; cvt.d.w $f%c,$f%c\n"  2

It sets the target register twice: first to the unconverted integer and then to the equivalent double. See Exercise 16.6.
The trunc.w.d instruction truncates a double and leaves the integral result in a floating-point register, so lcc follows up with a mfc1, which copies a value from the floating-point unit to the integer unit, where the client of the CVDI expects it:
(MIPS rules 436)+=
reg: CVDI(reg)  "trunc.w.d $f2,$f%0,$%c; mfc1 $%c,$f2\n"  2

It needs a floating-point scratch register to hold the converted value, so it uses $f2, which the calling convention reserves as a secondary return register, but which lcc does not otherwise use.
A label is defined by following it with a colon:
(MIPS rules 436)+=
stmt: LABELV  "%a:\n"

The b instruction jumps unconditionally to a fixed address, and the j instruction jumps unconditionally to an address from a register:
(MIPS rules 436)+=
stmt: JUMPV(acon)  "b %0\n"   1
stmt: JUMPV(reg)   "j $%0\n"  1

Switch statements implemented using branch tables need j. All other unconditional branches use b.
The integer conditional branches compare two registers and branch if
the named condition holds:
(MIPS rules 436)+=
stmt: EQI(reg,reg)  "beq $%0,$%1,%a\n"   1
stmt: GEI(reg,reg)  "bge $%0,$%1,%a\n"   1
stmt: GEU(reg,reg)  "bgeu $%0,$%1,%a\n"  1
stmt: GTI(reg,reg)  "bgt $%0,$%1,%a\n"   1
stmt: GTU(reg,reg)  "bgtu $%0,$%1,%a\n"  1
stmt: LEI(reg,reg)  "ble $%0,$%1,%a\n"   1
stmt: LEU(reg,reg)  "bleu $%0,$%1,%a\n"  1
stmt: LTI(reg,reg)  "blt $%0,$%1,%a\n"   1
stmt: LTU(reg,reg)  "bltu $%0,$%1,%a\n"  1
stmt: NEI(reg,reg)  "bne $%0,$%1,%a\n"   1
The hardware does not implement all these instructions directly, but the assembler compensates. For example, hardware instructions for GE, GT, LE, and LT assume that the second comparand is zero, but the assembler synthesizes the pseudo-instructions above by computing a difference into $1 if necessary. Also, only the j instruction accepts a full 32-bit address, but the pseudo-instructions above assemble arbitrary addresses from the hardware's more restricted addresses. When lcc uses pseudo-instructions, it can't know exactly what real instructions the assembler emits, so the costs above are necessarily approximate, but there's only one way to generate code for these intermediate-language operators anyway, so the inaccuracies can't hurt the quality of the emitted code.
The floating-point conditional branches test a condition flag set by a separate comparison instruction. For example, c.lt.d $f0,$f2 sets the flag if the double in $f0 is less than the double in $f2. bc1t branches if the flag is set, and bc1f branches if the flag is clear.
(MIPS rules 436)+=
stmt: EQD(reg,reg)  "c.eq.d $f%0,$f%1; bc1t %a\n"  2
stmt: EQF(reg,reg)  "c.eq.s $f%0,$f%1; bc1t %a\n"  2
stmt: LED(reg,reg)  "c.le.d $f%0,$f%1; bc1t %a\n"  2
stmt: LEF(reg,reg)  "c.le.s $f%0,$f%1; bc1t %a\n"  2
stmt: LTD(reg,reg)  "c.lt.d $f%0,$f%1; bc1t %a\n"  2
stmt: LTF(reg,reg)  "c.lt.s $f%0,$f%1; bc1t %a\n"  2
Floating-point comparisons implement only less-than, less-than-or-equal, and equal, so lcc implements the rest by inverting the sense of the relation and following it with a bc1f:
(MIPS rules 436)+=
stmt: GED(reg,reg)  "c.lt.d $f%0,$f%1; bc1f %a\n"  2
stmt: GEF(reg,reg)  "c.lt.s $f%0,$f%1; bc1f %a\n"  2
stmt: GTD(reg,reg)  "c.le.d $f%0,$f%1; bc1f %a\n"  2
stmt: GTF(reg,reg)  "c.le.s $f%0,$f%1; bc1f %a\n"  2
stmt: NED(reg,reg)  "c.eq.d $f%0,$f%1; bc1f %a\n"  2
stmt: NEF(reg,reg)  "c.eq.s $f%0,$f%1; bc1f %a\n"  2
For example, it can't use
        c.gt.d $f0,$f2
        bc1t L
so instead it uses
        c.le.d $f0,$f2
        bc1f L
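The inversion can be checked with a small model. The sketch below is illustrative C, not lcc code: the c_lt, c_le, and c_eq functions stand in for the three available comparisons, and each branch_ function returns true when the corresponding compare followed by bc1f would branch.

```c
#include <stdbool.h>

/* Stand-ins for the three comparisons the text says are available. */
static bool c_lt(double a, double b) { return a <  b; }   /* c.lt.d */
static bool c_le(double a, double b) { return a <= b; }   /* c.le.d */
static bool c_eq(double a, double b) { return a == b; }   /* c.eq.d */

/* Each function returns true when the branch is taken: the flag is
   computed with the inverted relation and bc1f branches if it is clear. */
static bool branch_ge(double a, double b) { return !c_lt(a, b); }  /* GED */
static bool branch_gt(double a, double b) { return !c_le(a, b); }  /* GTD */
static bool branch_ne(double a, double b) { return !c_eq(a, b); }  /* NED */
```

For ordinary operands the inverted forms agree with >=, >, and !=. In this C model they differ from >= and > when an operand is a NaN, because every ordered comparison with a NaN is false; the sketch takes no position on how the hardware treats that case.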
The jal instruction saves the program counter in $31 and jumps to an address stored in a constant instruction field or in a register.
(MIPS rules 436)+=
ar: ADDRGP  "%a"

reg:  CALLD(ar)  "jal %0\n"  1
reg:  CALLF(ar)  "jal %0\n"  1
reg:  CALLI(ar)  "jal %0\n"  1
stmt: CALLV(ar)  "jal %0\n"  1
CALLV yields no result and thus matches stmt instead of reg. Most calls jump to a label, but indirect calls like (*p)() need a register form:
(MIPS rules 436)+=
ar: reg  "$%0"
Some device drivers jump to fixed numeric addresses. jal insists that they fit in 28 bits:
(MIPS rules 436)+=
ar: CNSTP  "%a"  range(a, 0, 0x0fffffff)

If the constant won't fit in 28 bits, then lcc falls back on more costly rules that load an arbitrary 32-bit constant into a register and that jump indirectly using that register. The MIPS assembler makes most of the decisions that require checking ranges, but at least some versions of the assembler leave this particular check to the compiler.
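The way a cost expression like range steers rule selection can be sketched as follows. This is a simplified model, not lcc's range: range_cost and COST_MAX are hypothetical names, the function takes a plain value instead of a node, and it assumes only what the text shows, namely that the cost is zero when the constant lies in the interval and prohibitive otherwise.

```c
#include <limits.h>

/* Hypothetical stand-in for a cost so high that the labeller never
   prefers the rule. */
#define COST_MAX SHRT_MAX

/* Simplified model of a range test used as a rule cost: zero if
   lo <= value <= hi, prohibitive otherwise. */
static int range_cost(long value, long lo, long hi) {
    return (lo <= value && value <= hi) ? 0 : COST_MAX;
}
```

Under this model range_cost(0x10000000L, 0, 0x0fffffffL) is prohibitive, so a call to that address would be matched by the rules that first load the constant into a register.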
The front end and the routines function and target collaborate to
get return values into the return register, and return addresses into the
program counter, so RET nodes produce no code:
(MIPS rules 436)+=
stmt: RETD(reg)  "# ret\n"  1
stmt: RETF(reg)  "# ret\n"  1
stmt: RETI(reg)  "# ret\n"  1
CALLDs and CALLFs yield $f0, and CALLIs yield $2. Each RET has its child compute its value into the corresponding register.
(MIPS target 437)+=
case CALLD: case CALLF: setreg(p, freg2[0]);     break;
case CALLI:             setreg(p, ireg[2]);      break;
case RETD: case RETF:   rtarget(p, 0, freg2[0]); break;
case RETI:              rtarget(p, 0, ireg[2]);  break;
Recall that setreg sets the result register for the node in hand, and that rtarget sets the result register for a child of the node. rtarget exists because simply calling setreg on the child could clobber something of value. The material on rtarget elaborates.
The scratch and return registers are not preserved across calls, so any live ones must be spilled and reloaded, except for the return register used by the call itself:

(mips.c macros)+=
#define INTRET 0x00000004
#define FLTRET 0x00000003

(MIPS clobber 443)=
case CALLD: case CALLF:
        spill(INTTMP | INTRET, IREG, p);
        spill(FLTTMP,          FREG, p);
        break;
case CALLI:
        spill(INTTMP,          IREG, p);
        spill(FLTTMP | FLTRET, FREG, p);
        break;
case CALLV:
        spill(INTTMP | INTRET, IREG, p);
        spill(FLTTMP | FLTRET, FREG, p);
        break;
Floating-point values return in the double register $f0, and all other values return in $2. target and clobber collaborate. Consider CALLI; target arranges for it to yield register $2, and clobber asks spill to save all other caller-saved registers before the call and to restore them afterward.
The rules to transmit arguments require collaboration between target
and emit2:
(MIPS rules 436)+=
stmt: ARGD(reg)  "# arg\n"  1
stmt: ARGF(reg)  "# arg\n"  1
stmt: ARGI(reg)  "# arg\n"  1
stmt: ARGP(reg)  "# arg\n"  1

(MIPS functions 433)+=
static void emit2(p) Node p; {
        int dst, n, src, ty;
        static int ty0;
        Symbol q;

        switch (p->op) {
        (MIPS emit2 446)
        }
}
The MIPS calling convention passes the first four words of arguments (including gaps to satisfy alignments) in registers $4-$7, except that if the first argument is a float or a double, it is passed in $f12, and if the first argument is passed in $f12 and the second argument is a float or a double, the second argument is passed in $f14. argreg implements these rules:
(MIPS functions 433)+=
static Symbol argreg(argno, offset, ty, ty0)
int argno, offset, ty, ty0; {
        if (offset > 12)
                return NULL;
        else if (argno == 0 && (ty == F || ty == D))
                return freg2[12];
        else if (argno == 1 && (ty == F || ty == D)
        && (ty0 == F || ty0 == D))
                return freg2[14];
        else if (argno == 1 && ty == D)
                return d6;  /* Pair! */
        else
                return ireg[(offset/4) + 4];
}
argno is the argument number. offset and ty are the offset and type of an argument. ty0 is the type of the first argument, which can influence the placement of the second argument. If the argument is passed in a register, argreg returns the register. Otherwise, it returns null.
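To see the rules at work outside the compiler, the following sketch restates the same decision chain with register names as strings. It is a stand-alone model rather than lcc code: argreg_name and the TY_ constants are hypothetical, and the offsets passed in are assumed to be the ones the compiler would have assigned.

```c
#include <stddef.h>

/* Hypothetical type codes standing in for lcc's I, P, F, and D. */
enum { TY_I, TY_P, TY_F, TY_D };

/* Model of the placement rules: returns the name of the register that
   carries the argument, or NULL if the argument travels in memory. */
static const char *argreg_name(int argno, int offset, int ty, int ty0) {
    static const char *iregs[] = { "$4", "$5", "$6", "$7" };
    int isfp  = (ty  == TY_F || ty  == TY_D);
    int isfp0 = (ty0 == TY_F || ty0 == TY_D);

    if (offset > 12)
        return NULL;                 /* beyond the first four words */
    else if (argno == 0 && isfp)
        return "$f12";
    else if (argno == 1 && isfp && isfp0)
        return "$f14";
    else if (argno == 1 && ty == TY_D)
        return "$6-$7";              /* the register pair d6 */
    else
        return iregs[offset / 4];    /* offset is 0, 4, 8, or 12 here */
}
```

For a call f(int, double), the double sits at offset 8 and the model places it in the pair $6-$7; for f(double, double) the two arguments go to $f12 and $f14; a fifth integer argument at offset 16 yields NULL.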
gen calls doarg to compute argno and offset for argreg:
(MIPS functions 433)+=
static void doarg(p) Node p; {
        static int argno;
        int size;

        if (argoffset == 0)
                argno = 0;
        p->x.argno = argno++;
        size = p->syms[1]->u.c.v.i < 4 ? 4 : p->syms[1]->u.c.v.i;
        p->syms[2] = intconst(mkactual(size,
                p->syms[0]->u.c.v.i));
}
docall clears argoffset at each CALL, so a zero there alerts doarg to reset its static argument counter. mkactual uses the argument size and alignment - rounded up to 4 if necessary, because smaller arguments are widened - and returns the argument offset.
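The offset bookkeeping can be pictured with a small model. The sketch below is an assumption about how a routine like mkactual might behave, not its actual source: it supposes that the running offset is rounded up to the argument's alignment, returned as the argument's offset, and then advanced past the argument. The names mkactual_model, roundup_to, and arg_offset_model are hypothetical.

```c
/* Hypothetical stand-in for the running argument offset; the text
   says the real one is argoffset and that it is cleared at each call. */
static int arg_offset_model;

/* Rounds x up to a multiple of n, for n > 0. */
static int roundup_to(int x, int n) {
    return (x + n - 1) / n * n;
}

/* Assumed behaviour: align the running offset, hand it out as this
   argument's offset, and advance past the argument. */
static int mkactual_model(int align, int size) {
    int n = roundup_to(arg_offset_model, align);
    arg_offset_model = n + size;
    return n;
}
```

Under these assumptions, and taking a double to be 8 bytes with 8-byte alignment, the arguments of f(int, double, int) receive offsets 0, 8, and 16; the gap at offset 4 is one of the alignment gaps that the calling convention counts among the first four words.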
target uses argreg and rtarget to compute the children of ARG nodes into the argument register, if there is one:
(MIPS target 437)+=
case ARGD: case ARGF: case ARGI: case ARGP: {
        static int ty0;
        int ty = optype(p->op);
        Symbol q;

        q = argreg(p->x.argno, p->syms[2]->u.c.v.i, ty, ty0);
        if (p->x.argno == 0)
                ty0 = ty;
        if (q &&
        !((ty == F || ty == D) && q->x.regnode->set == IREG))
                rtarget(p, 0, q);
        break;
        }
The fragment also remembers the type of the first argument to help determine the register for later arguments. The long conditional omits targeting if the argument is floating point but passed in an integer register.
lcc assumes that floating-point opcodes yield floating-point registers, so no tree can develop an unconverted floating-point value into an integer register. emit2 must handle these oddballs and the arguments that travel in memory:
(MIPS emit2 446)=
case ARGD: case ARGF: case ARGI: case ARGP:
        ty = optype(p->op);
        if (p->x.argno == 0)
                ty0 = ty;
        q = argreg(p->x.argno, p->syms[2]->u.c.v.i, ty, ty0);
        src = getregnum(p->x.kids[0]);
        if (q == NULL && ty == F)
                print("s.s $f%d,%d($sp)\n", src, p->syms[2]->u.c.v.i);
        else if (q == NULL && ty == D)
                print("s.d $f%d,%d($sp)\n", src, p->syms[2]->u.c.v.i);
        else if (q == NULL)
                print("sw $%d,%d($sp)\n", src, p->syms[2]->u.c.v.i);
        else if (ty == F && q->x.regnode->set == IREG)
                print("mfc1 $%d,$f%d\n", q->x.regnode->number, src);
        else if (ty == D && q->x.regnode->set == IREG)
                print("mfc1.d $%d,$f%d\n", q->x.regnode->number, src);
        break;
If argreg returns null, then the caller passes the argument in memory, so emit2 stores it, using the offset that doarg computed. The last two conditionals above emit code for floating-point arguments transmitted in integer registers. mfc1 x,y copies a single-precision value from floating-point register y to integer register x. mfc1.d does likewise for doubles; the target is a register pair.
(MIPS) " 444 emi t2 and target also collaborate to emit block copies:
(SPARC) " 478 ....
(X86) " 511 (MIPS rules 436) += 444 431
IREG 361 stmt: ARGB(INDIRB(reg)) "# argb %0\n" 1
number 361
optype 98 stmt: ASGNB(reg,INDIRB(reg)) "# asgnb %0 %1\n" 1
reg 403 emi t2's case for ASGNB sets the globals that record the alignment of the
salign 368
source and destination blocks, then lets bl kcopy do the rest:
(MIPS emit2 446)+=
case ASGNB:
        dalign = salign = p->syms[1]->u.c.v.i;
        blkcopy(getregnum(p->x.kids[0]), 0,
                getregnum(p->x.kids[1]), 0,
                p->syms[0]->u.c.v.i, tmpregs);
        break;
The call trace shown in Figure 13.4 starts in this case. tmpregs holds the numbers of the three temporary registers, which form the triple register that progbeg assigned to blkreg. ARGB and ASGNB target their source-address register to reserve blkreg:
(MIPS target 437)+=
case ASGNB: rtarget(p->kids[1], 0, blkreg); break;
case ARGB:  rtarget(p->kids[0], 0, blkreg); break;
This source comes from a grandchild because the intervening child is a pro forma INDIRB. emit2's case for ARGB is similar to the case for ASGNB:
(MIPS emit2 446)+=
case ARGB:
        dalign = 4;
        salign = p->syms[1]->u.c.v.i;
        blkcopy(29, p->syms[2]->u.c.v.i,
                getregnum(p->x.kids[0]), 0,
                p->syms[0]->u.c.v.i, tmpregs);
        n   = p->syms[2]->u.c.v.i + p->syms[0]->u.c.v.i;
        dst = p->syms[2]->u.c.v.i;
        for ( ; dst <= 12 && dst < n; dst += 4)
                print("lw $%d,%d($sp)\n", (dst/4)+4, dst);
        break;
dalign differs because the stack space for the outgoing argument is always aligned to at least a multiple of four, which is the most that blkcopy and its helpers can use. The first argument is 29 because the destination base register is $sp, and the second argument is the stack offset for the destination block, which doarg computed. If the ARGB overlaps the first four words of arguments, then the for loop copies the overlap into the corresponding argument registers to conform with the calling convention.
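The reload loop is easy to trace by hand. The sketch below isolates it in a hypothetical function, argb_reloads, that collects the lw instructions for a block at a given stack offset and size; the loop bounds and the register formula are the ones in the ARGB case, and everything else is scaffolding for the illustration.

```c
#include <stdio.h>

/* Hypothetical wrapper around the reload loop: writes the emitted
   instructions into buf and returns how many there are. offset and
   size play the roles of p->syms[2] and p->syms[0]. */
static int argb_reloads(int offset, int size, char *buf, size_t bufsize) {
    int n = offset + size;       /* first byte past the block */
    int count = 0;
    size_t used = 0;

    buf[0] = '\0';
    for (int dst = offset; dst <= 12 && dst < n; dst += 4) {
        used += snprintf(buf + used, bufsize - used,
                         "lw $%d,%d($sp)\n", dst / 4 + 4, dst);
        count++;
    }
    return count;
}
```

A 12-byte structure passed at offset 8 occupies offsets 8 through 19, so only its first two words overlap the register area: the loop emits lw $6,8($sp) and lw $7,12($sp), and the third word stays in memory only.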

16.3 Implementing Functions

The front end calls local to announce each new local variable:
(MIPS functions 433)+=
static void local(p) Symbol p; {
        if (askregvar(p, rmap[ttob(p->type)]) == 0)
                mkauto(p);
}
Machine-independent routines do most of the work. askregvar allocates a register if it's appropriate and one is available. Otherwise, mkauto assigns a stack offset; Figure 16.1 shows the layout of the MIPS stack frame.

high addresses

locals and
temporaries

framesize

outgoing
arguments

low addresses

FIGURE 16.1 MIPS stack frame.

The front end calls function to announce each new routine. function drives most of the back end. It calls gencode, which calls gen, which calls the labeller, reducer, linearizer, and register allocator. function also calls the front end's emitcode, which calls the back end's emitter.
(MIPS functions 433)+=
static void function(f, caller, callee, ncalls)
Symbol f, callee[], caller[]; int ncalls; {
        int i, saved, sizefsave, sizeisave, varargs;
        Symbol r, argregs[4];

        (MIPS function 448)
}

The front end passes to function a symbol that represents a routine, vectors of symbols representing the caller's and callee's view of the arguments, and a count of the number of calls made by the routine. It starts by freeing all registers and clearing the variables that track the frame and that track the area into which the outgoing arguments are copied:
(MIPS function 448)=
(clear register state 410)
offset = maxoffset = maxargoffset = 0;
Then it determines whether the routine is variadic, because this attribute
influences some of the code that we're about to generate:
(MIPS function 448)+=
for (i = 0; callee[i]; i++)
        ;
varargs = variadic(f->type)
        || i > 0 && strcmp(callee[i-1]->name, "va_alist") == 0;
By convention on this machine, there must be a prototype, or the last argument must be named va_alist. function needs it to determine the location of some incoming arguments:
(MIPS function 448)+=
for (i = 0; callee[i]; i++) {
        (assign location for argument i 449)
}
Recall that the first four words of arguments (including gaps to satisfy alignments) are passed in registers $4-$7, except the first argument is passed in $f12 if it is a float or a double, and the second argument is passed in $f14 if it is a float or a double and the first argument is passed in $f12. This calling convention complicates function, particularly the body of the loop above. It starts by assigning a stack offset to the argument:
(assign location for argument i 449)=
Symbol p = callee[i];
Symbol q = caller[i];
offset = roundup(offset, q->type->align);
p->x.offset = q->x.offset = offset;
p->x.name = q->x.name = stringd(offset);
r = argreg(i, offset, ttob(q->type), ttob(caller[0]->type));
if (i < 4)
        argregs[i] = r;
offset = roundup(offset + q->type->size, 4);
Even arguments that arrive in a register and remain in one have a reserved stack slot. Indeed, the offset helps argreg determine which register holds the argument. argregs[i] records for use below argreg's result for argument i. All arguments to variadic routines are stored in the stack because the code addresses them indirectly:
(assign location for argument i 449)+=
if (varargs)
        p->sclass = AUTO;
If the argument arrived in a register and the routine makes no calls that could overwrite it, then the argument can remain in place if it is neither a structure, nor accessed indirectly, nor a floating-point argument passed in an integer register.
(leave argument in place? 449)=
r && ncalls == 0 &&
!isstruct(q->type) && !p->addressed &&
!(isfloat(q->type) && r->x.regnode->set == IREG)
(assign location for argument i 449)+=
else if ((leave argument in place? 449)) {
        p->sclass = q->sclass = REGISTER;
        askregvar(p, r);
        q->x = p->x;
        q->type = p->type;
}

askregvar is guaranteed to succeed because r can't have been allocated for any other purpose yet; a hidden assertion confirms this claim. Conforming the type and sclass fields prevents the front end from generating code to copy or convert the argument. Finally, we allocate a register for the argument that doesn't have one or can't stay in the one that it has:
(assign location for argument i 449)+=
else if ((copy argument to another register? 450)) {
        p->sclass = q->sclass = REGISTER;
        q->type = p->type;
}

The conditional succeeds if and only if the argument arrives in one register and must be moved to another one. For example, if an argument arrives in $4 but the routine makes calls, then $4 is needed for outgoing arguments. If the incoming argument is used enough to belong in a register, the code above arranges the copy. A floating-point argument could have arrived in an integer register, which is an operation that the front end can't express, so the code above conforms the type and sclass to tell the front end to generate nothing, and the fragment (save argument in a register) on page 453 generates the copy.
The conditional in the last else-if statement above tests up to three clauses. First, askregvar must allocate a register to the argument:
(copy argument to another register? 450) = ...
451 450
askregvar(p, rmap[ttob(p->type)])
If askregvar fails, then the argument will have to go in memory. If it's
not already there, the fragment (save argument in stack) on page 454 will
put it there. In this case, the two sclass fields are already conformed,
but we don't want to conform the two type fields because a conversion
might be needed. For example, a new-style character argument needs a
conversion on big endians; it is passed as an integer, so its value is in the
least significant bits of the argument word, but it's going to be accessed
as a character, so its value must be moved to the most significant end of
the word on big endians.
The second condition confirms that the argument is already in a register:

(copy argument to another register? 450) +=
    && r != NULL

If this condition fails, then the argument arrived in memory and needs to
be loaded into the register that askregvar found. For example, such an
argument might be the last of five integer arguments, which means that
it's passed in memory and thus should be loaded into a register now, if
it's used heavily. askregvar sets p->sclass to REGISTER; q->sclass is
never REGISTER, so falling through with the differing values causes the
front end to generate the load.
The third and last condition confirms that no conversion is needed:
(copy argument to another register? 450) +=
    && (isint(p->type) || p->type == q->type)

For example, if q (the caller) is a double and p is a float, then a CVDF is
needed. In this case, the sclass and type fields differ, so falling through
causes the front end to generate a conversion.
After assigning locations to all arguments, function calls gencode to
select code and allocate registers for the body of the routine:

(MIPS function 448) +=
    offset = 0;
    gencode(caller, callee);

When gencode returns, usedmask identifies the registers that the routine
touches. function adds the register that holds the return address
(unless the routine makes no calls) and removes the registers that
the caller must have saved:

(MIPS function 448) +=
    if (ncalls)
        usedmask[IREG] |= ((unsigned)1)<<31;
    usedmask[IREG] &= 0xc0ff0000;
    usedmask[FREG] &= 0xfff00000;

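To see which registers survive the two masks, the following sketch lists the bit positions set in a mask. It is an illustration only: mask_to_regs is a hypothetical helper, not part of lcc, and it assumes a 32-bit unsigned. For 0xc0ff0000 it reports bits 16-23, 30, and 31; for 0xfff00000 it reports bits 20-31.

```c
#include <stdio.h>

/* Hypothetical helper (not lcc code): write the numbers of the bits set
   in mask into buf as a space-separated list, for example "16 17".
   len must be at least 1. */
static void mask_to_regs(unsigned mask, char *buf, size_t len) {
    size_t used = 0;
    int i;

    buf[0] = '\0';
    for (i = 0; i < 32; i++)
        if (mask & (1u << i)) {
            int n = snprintf(buf + used, len - used, "%s%d",
                used ? " " : "", i);
            if (n < 0 || (size_t)n >= len - used)
                break;      /* out of room: stop rather than overflow */
            used += (size_t)n;
        }
}
```

Calling mask_to_regs(0xc0ff0000u, buf, sizeof buf) with a 128-byte buffer fills buf with the integer registers that can remain in usedmask[IREG] after the masking step.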
function then completes the computation of the size of the argument-build
area:

(MIPS function 448) +=
    maxargoffset = roundup(maxargoffset, 4);
    if (maxargoffset && maxargoffset < 16)
        maxargoffset = 16;

The calling convention requires that the size of the outgoing argument
block be divisible by four and at least 16 bytes unless it's empty. Then
function computes the size of the frame and the blocks within it needed
to save the floating-point and integer registers.

(MIPS function 448) +=
    sizefsave = 4*bitcount(usedmask[FREG]);
    sizeisave = 4*bitcount(usedmask[IREG]);
    framesize = roundup(maxargoffset + sizefsave
        + sizeisave + maxoffset, 8);

bitcount counts the ones in an unsigned integer. Figure 16.1 illustrates
these values. The convention keeps the stack aligned to double words.
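As a worked example of the arithmetic above, here is a minimal sketch. It assumes that roundup rounds its first argument up to a multiple of a power of two, which is how the fragments use it; bitcount below is a simple stand-in, and frame_size is a hypothetical wrapper that takes the masks and offsets as parameters instead of reading lcc's globals.

```c
/* Round x up to a multiple of n; n must be a power of two (assumed). */
#define roundup(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

/* Stand-in for bitcount: count the one bits in mask. */
static unsigned bitcount(unsigned mask) {
    unsigned n = 0;

    for (; mask; mask &= mask - 1)  /* clear the lowest set bit */
        n++;
    return n;
}

/* Hypothetical wrapper around the frame-size computation. */
static int frame_size(int maxargoffset, unsigned fmask, unsigned imask,
        int maxoffset) {
    int sizefsave, sizeisave;

    maxargoffset = roundup(maxargoffset, 4);
    if (maxargoffset && maxargoffset < 16)
        maxargoffset = 16;      /* nonempty argument blocks are >= 16 */
    sizefsave = 4 * (int)bitcount(fmask);
    sizeisave = 4 * (int)bitcount(imask);
    return roundup(maxargoffset + sizefsave + sizeisave + maxoffset, 8);
}
```

For instance, a routine with 8 bytes of outgoing arguments, two saved integer registers, and 12 bytes of locals gets 16 + 0 + 8 + 12 = 36 bytes, which rounds up to a 40-byte frame.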
Now function has the data it needs to start emitting the routine. The
prologue switches to the code segment, ensures word alignment, and
emits some boilerplate that starts MIPS routines:
(MIPS function 448) +=
    segment(CODE);
    print(".align 2\n");
    print(".ent %s\n", f->x.name);
    print("%s:\n", f->x.name);
    i = maxargoffset + sizefsave - framesize;
    print(".frame $sp,%d,$31\n", framesize);
    if (framesize > 0)
        print("addu $sp,$sp,%d\n", -framesize);
    if (usedmask[FREG])
        print(".fmask 0x%x,%d\n", usedmask[FREG], i - 8);
    if (usedmask[IREG])
        print(".mask 0x%x,%d\n", usedmask[IREG],
            i + sizeisave - 4);

lcc's code uses only the label and addu, which allocates a frame for the
routine. The rest of the directives describe the routine to other programs
like debuggers and profilers. .ent announces a procedure entry point;
.frame declares the stack pointer, frame size, and return-address
register; and .fmask and .mask identify the registers saved and their
locations in the stack, respectively.

The prologue continues with code to store the callee-saved registers,
as defined in Table 16.2:

(MIPS function 448) +=
    saved = maxargoffset;
    for (i = 20; i <= 30; i += 2)
        if (usedmask[FREG]&(3<<i)) {
            print("s.d $f%d,%d($sp)\n", i, saved);
            saved += 8;
        }
    for (i = 16; i <= 31; i++)
        if (usedmask[IREG]&(1<<i)) {
            print("sw $%d,%d($sp)\n", i, saved);
            saved += 4;
        }

Then it saves arguments that arrive in registers:

(MIPS function 448) +=
    for (i = 0; i < 4 && callee[i]; i++) {
        r = argregs[i];
        if (r && r->x.regnode != callee[i]->x.regnode) {
            (save argument i 453)
        }
    }

For variadic routines, lcc saves the rest of the integer argument registers
too, because the number used varies from call to call:

(MIPS function 448) +=
    if (varargs && callee[i-1]) {
        i = callee[i-1]->x.offset + callee[i-1]->type->size;
        for (i = roundup(i, 4)/4; i <= 3; i++)
            print("sw $%d,%d($sp)\n", i + 4, framesize + 4*i);
    }

This loop picks up where its predecessor left off and continues until it
has stored the last integer argument register, $7.
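The effect of this loop is easiest to see on a concrete case. The sketch below reproduces the loop outside lcc; save_varargs is a hypothetical function that takes the offset and size of the last declared argument as plain integers and collects the emitted stores in a buffer, and roundup is assumed to round up to a multiple of a power of two.

```c
#include <stdio.h>

/* Round x up to a multiple of n; n must be a power of two (assumed). */
#define roundup(x, n) (((x) + ((n) - 1)) & ~((n) - 1))

/* Hypothetical stand-in for the variadic save loop: last_offset and
   last_size describe the last declared argument. The stores are
   appended to buf (len >= 1); the result is the number emitted. */
static int save_varargs(int last_offset, int last_size, int framesize,
        char *buf, size_t len) {
    int i = last_offset + last_size, count = 0;
    size_t used = 0;

    buf[0] = '\0';
    for (i = roundup(i, 4)/4; i <= 3; i++) {
        int n = snprintf(buf + used, len - used, "sw $%d,%d($sp)\n",
            i + 4, framesize + 4*i);
        if (n < 0 || (size_t)n >= len - used)
            break;          /* buffer full: stop early */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

For a routine whose only declared argument is a 4-byte pointer at offset 0 and whose frame is 24 bytes, the sketch emits stores for $5, $6, and $7 at offsets 28, 32, and 36 from $sp. If the declared arguments already fill all four argument words, it emits nothing.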
For nonvariadic routines, the prologue saves only those argument registers
that are used and that can't stay where they are:

(save argument i 453) =
    Symbol out = callee[i];
    Symbol in = caller[i];
    int rn = r->x.regnode->number;
    int rs = r->x.regnode->set;
    int tyin = ttob(in->type);

    if (out->sclass == REGISTER
    && (isint(out->type) || out->type == in->type)) {
        (save argument in a register 454)
    } else {
        (save argument in stack 454)
    }

It distinguishes arguments assigned to some other register from those
assigned to memory. The clause after the && matches the one in (leave
argument in place?) on page 449, which determined what we should
generate here.
If a register was allocated for an argument that arrives in a register,
and if the argument can't remain where it is, then function emits code
to copy an incoming argument register to another register:

(save argument in a register 454) =
    int outn = out->x.regnode->number;
    if (rs == FREG && tyin == D)
        print("mov.d $f%d,$f%d\n", outn, rn);
    else if (rs == FREG && tyin == F)
        print("mov.s $f%d,$f%d\n", outn, rn);
    else if (rs == IREG && tyin == D)
        print("mtc1.d $%d,$f%d\n", rn, outn);
    else if (rs == IREG && tyin == F)
        print("mtc1 $%d,$f%d\n", rn, outn);
    else
        print("move $%d,$%d\n", outn, rn);

If the argument has been assigned to memory, then the prologue stores
it into the procedure activation record:

(save argument in stack 454) =
    int off = in->x.offset + framesize;
    if (rs == FREG && tyin == D)
        print("s.d $f%d,%d($sp)\n", rn, off);
    else if (rs == FREG && tyin == F)
        print("s.s $f%d,%d($sp)\n", rn, off);
    else {
        int i, n = (in->type->size + 3)/4;
        for (i = rn; i < rn+n && i <= 7; i++)
            print("sw $%d,%d($sp)\n", i, off + (i-rn)*4);
    }

The loop in the last arm usually executes only one iteration and stores
a single integer argument, but it also handles floats that arrived in an
integer register, and the loop generalizes to handle double and structure
arguments, which can occupy multiple integer registers. It terminates
when it runs out of arguments or argument registers, whichever comes
first.
After emitting the last of the procedure prologue, function emits the
procedure body:

(MIPS function 448) +=
    emitcode();

The epilogue reloads the callee-saved registers, first the floating-point
registers:

(MIPS function 448) +=
    saved = maxargoffset;
    for (i = 20; i <= 30; i += 2)
        if (usedmask[FREG]&(3<<i)) {
            print("l.d $f%d,%d($sp)\n", i, saved);
            saved += 8;
        }

and then the general registers:

(MIPS function 448) +=
    for (i = 16; i <= 31; i++)
        if (usedmask[IREG]&(1<<i)) {
            print("lw $%d,%d($sp)\n", i, saved);
            saved += 4;
        }

Now it can pop the frame off the stack:

(MIPS function 448) +=
    if (framesize > 0)
        print("addu $sp,$sp,%d\n", framesize);

and return:

(MIPS function 448) +=
    print("j $31\n");
    print(".end %s\n", f->x.name);

16.4 Defining Data

defconst emits assembler directives to allocate a scalar and initialize it
to a constant:

(MIPS functions 433) +=
    static void defconst(ty, v) int ty; Value v; {
        switch (ty) {
        (MIPS defconst 455)
        }
    }

The cases for the integer types emit a size-specific directive and the
appropriate constant field:

(MIPS defconst 455) =
    case C: print(".byte %d\n", v.uc); return;
    case S: print(".half %d\n", v.ss); return;
    case I: print(".word 0x%x\n", v.i); return;
    case U: print(".word 0x%x\n", v.u); return;

The case for numeric address constants treats them like unsigned integers:

(MIPS defconst 455) +=
    case P: print(".word 0x%x\n", v.p); return;

defaddress handles symbolic address constants:

(MIPS functions 433) +=
    static void defaddress(p) Symbol p; {
        print(".word %s\n", p->x.name);
    }

The assembler's .float and .double directives can't express floating-point
constants that result from arbitrary expressions (e.g., with casts),
so defconst emits floating-point constants in hexadecimal:

(MIPS defconst 455) +=
    case F: print(".word 0x%x\n", *(unsigned *)&v.f); return;

The two halves of each double must be exchanged if lcc is running on
a little endian and compiling for a big endian, or vice versa:

(MIPS defconst 455) +=
    case D: {
        unsigned *p = (unsigned *)&v.d;
        print(".word 0x%x\n.word 0x%x\n", p[swap], p[!swap]);
        return;
    }

Barring this possible exchange, this code assumes that the host and target
encode floating-point numbers identically. This assumption is not
particularly constraining because most targets use IEEE floating point
now.
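The exchange of the two halves can be sketched in isolation. The helper below is hypothetical: double_words is not an lcc routine, its swap parameter merely plays the role of lcc's swap flag, and it assumes a 32-bit unsigned and a 64-bit double. It uses memcpy instead of the pointer cast in the fragment, which sidesteps aliasing concerns in modern C.

```c
#include <string.h>

/* Hypothetical helper: split d into the two 32-bit words that the D
   case would print, first word in out[0]. swap is nonzero when host
   and target byte orders differ.
   Assumes sizeof(unsigned) == 4 and sizeof(double) == 8. */
static void double_words(double d, int swap, unsigned out[2]) {
    unsigned p[2];

    memcpy(p, &d, sizeof p);    /* copy the host's representation */
    out[0] = p[swap != 0];      /* same as p[swap] for swap in {0,1} */
    out[1] = p[swap == 0];      /* same as p[!swap] */
}
```

With swap equal to 0 the words come out in host order; with swap equal to 1 they come out exchanged, so the same IEEE bit pattern reaches the target whichever way the host stores its doubles.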
defstring emits directives for a series of bytes:

(MIPS functions 433) +=
    static void defstring(n, str) int n; char *str; {
        char *s;

        for (s = str; s < str + n; s++)
            print(".byte %d\n", (*s)&0377);
    }

It finds the end of the string by counting because ANSI C escape codes
permit strings with embedded null bytes.
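A small sketch shows why counting matters. emit_bytes is a hypothetical variant of defstring that writes into a buffer instead of calling print; given the three bytes 'a', 0, 'b' it still emits three directives, where a loop that stopped at the first null byte would emit only one.

```c
#include <stdio.h>

/* Hypothetical buffer-based variant of defstring: append one .byte
   directive per input byte to buf (len >= 1) and return how many
   bytes were emitted. */
static int emit_bytes(const char *str, int n, char *buf, size_t len) {
    size_t used = 0;
    int i;

    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        /* & 0377 keeps the low eight bits, so a negative char on
           hosts with signed chars prints as 128..255 */
        int k = snprintf(buf + used, len - used, ".byte %d\n",
            str[i] & 0377);
        if (k < 0 || (size_t)k >= len - used)
            break;          /* buffer full: stop early */
        used += (size_t)k;
    }
    return i;
}
```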
export uses an assembler directive to make a symbol visible in other
modules:

(MIPS functions 433) +=
    static void export(p) Symbol p; {
        print(".globl %s\n", p->x.name);
    }

import uses a companion directive to make a symbol from another mod-
ule visible in this one:

(MIPS functions 433) +=
    static void import(p) Symbol p; {
        if (!isfunc(p->type))
            print(".extern %s %d\n", p->name, p->type->size);
    }

MIPS compiler conventions omit such directives for functions.


The front end calls defsymbol to announce a new symbol and cue the
back end to initialize the x.name field:

(MIPS functions 433) +=
    static void defsymbol(p) Symbol p; {
        (MIPS defsymbol 457)
    }

defsymbol generates a unique name for local statics to keep from colliding
with other local statics with the same name:

(MIPS defsymbol 457) =
    if (p->scope >= LOCAL && p->sclass == STATIC)
        p->x.name = stringf("L.%d", genlabel(1));

By convention, such symbols start with an L and a period. If the symbol is
generated but not covered by the rule above, then the name field already
holds a digit string:

(MIPS defsymbol 457) +=
    else if (p->generated)
        p->x.name = stringf("L.%s", p->name);

Otherwise, the front- and back-end names are the same:

(MIPS defsymbol 457) +=
    else
        p->x.name = p->name;

Many UNIX assemblers normally omit from the symbol table symbols
that start with L, so compilers can save space in object files by starting
temporary labels with L.
address is like defsymbol for symbols that represent some other symbol
plus a constant offset.

(MIPS functions 433) +=
    static void address(q, p, n) Symbol q, p; int n; {
        q->x.offset = p->x.offset + n;
        if (p->scope == GLOBAL
        || p->sclass == STATIC || p->sclass == EXTERN)
            q->x.name = stringf("%s%s%d", p->x.name,
                n >= 0 ? "+" : "", n);
        else
            q->x.name = stringd(q->x.offset);
    }

For variables on the stack, address simply computes the adjusted offset.
For variables accessed using a label, it sets x.name to a string of the form
name ± n. If the offset is positive, the literal "+" emits the operator; if
the offset is negative, the %d emits it.
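The sign handling can be checked with a few lines of standard C. offset_name is a hypothetical helper that applies the same format string with snprintf in place of lcc's stringf.

```c
#include <stdio.h>

/* Hypothetical helper: build "name+n" or "name-n" in buf using the
   format string from address. A nonnegative n needs an explicit "+";
   a negative n brings its own "-" through %d. */
static void offset_name(const char *base, int n, char *buf, size_t len) {
    snprintf(buf, len, "%s%s%d", base, n >= 0 ? "+" : "", n);
}
```

An offset of 8 from x yields x+8, an offset of -4 yields x-4, and a zero offset yields x+0, because the test is n >= 0.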
MIPS conventions divide the globals into two groups so that small ones
can be accessed faster. MIPS machines form addresses by adding a register
to a signed 16-bit instruction field, so developing and accessing an
arbitrary 32-bit address takes multiple instructions. To reduce the need
for such sequences, translators put small globals into a special 64K-byte
sdata segment. The dedicated register $gp holds the base address of sdata,
so up to 64K bytes of globals can be accessed in one instruction. The -Gn
option sets the threshold gnum:

(MIPS data 433) +=
    static int gnum = 8;

(parse -G flag 458) =
    parseflags(argc, argv);
    for (i = 0; i < argc; i++)
        if (strncmp(argv[i], "-G", 2) == 0)
            gnum = atoi(argv[i] + 2);

The front end calls the interface procedure global to announce a new
global symbol:

(MIPS functions 433) +=
    static void global(p) Symbol p; {
        if (p->u.seg == BSS) {
            (define an uninitialized global 459)
        } else {
            (define an initialized global 458)
        }
    }

global puts small initialized globals into sdata and the rest into data:

(define an initialized global 458) =
    if (p->u.seg == DATA
    && (p->type->size == 0 || p->type->size > gnum))
        print(".data\n");
    else if (p->u.seg == DATA)
        print(".sdata\n");
    print(".align %c\n", ".01.2...3"[p->type->align]);
    print("%s:\n", p->x.name);

p->type->size is zero when the size is unknown, which happens when
certain array declarations omit bounds. This path through global winds
up by emitting an alignment directive and the label. ".01.2...3"[x] is
a compact expression for the logarithm to the base 2 of an alignment x,
which is what this .align directive expects.
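The string-indexing trick is worth spelling out. In the sketch below, align_arg is a hypothetical wrapper around the same expression; positions 1, 2, 4, and 8 of the string hold the digits 0, 1, 2, and 3, and every other position holds a period.

```c
/* Hypothetical wrapper around the indexing expression: return the
   character that names log2(align) for align in {1, 2, 4, 8}.
   Only 0 <= align <= 8 is a valid index; other values in that range
   yield '.', which marks an alignment that is not a power of two. */
static char align_arg(int align) {
    return ".01.2...3"[align];
}
```

An alignment of 4, for example, selects the character '2', so the directive printed is .align 2.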
The directives for uninitialized globals implicitly define the label,
reserve space, and choose the segment based on size:

(define an uninitialized global 459) =
    if (p->sclass == STATIC || Aflag >= 2)
        print(".lcomm %s,%d\n", p->x.name, p->type->size);
    else
        print(".comm %s,%d\n", p->x.name, p->type->size);

.comm also exports the symbol and marks it so that the loader generates
only one common global even if other modules emit .comm directives for
the same identifier. .lcomm takes neither step. lcc uses it for statics to
avoid the export, and the scrupulous double -A option uses it to have the
loader object when multiple modules define the same global. Pre-ANSI
C permitted multiple definitions, but ANSI C technically expects exactly
one definition; other modules should use extern declarations instead.

cseg tracks the current segment:

(MIPS data 433) +=
    static int cseg;

Since symbols in the DATA and BSS segments do their own segment
switching, segment emits directives for only the text and literal segments:

(MIPS functions 433) +=
    static void segment(n) int n; {
        cseg = n;
        switch (n) {
        case CODE: print(".text\n"); break;
        case LIT: print(".rdata\n"); break;
        }
    }

space emits a directive that reserves a block of memory unless the symbol
is in the BSS segment, because global allocates space for BSS symbols:

(MIPS functions 433) +=
    static void space(n) int n; {
        if (cseg != BSS)
            print(".space %d\n", n);
    }

.space clears the block, which the standard requires of declarations that
use it.

16.5 Copying Blocks

blkloop emits a loop to copy size bytes from a source address (formed
by adding register sreg and offset soff) to a destination address
(formed by adding register dreg and offset doff). Figure 13.4 shows
blkloop, blkfetch, and blkstore in action.

(MIPS functions 433) +=
    static void blkloop(dreg, doff, sreg, soff, size, tmps)
    int dreg, doff, sreg, soff, size, tmps[]; {
        int lab = genlabel(1);

        print("addu $%d,$%d,%d\n", sreg, sreg, size&~7);
        print("addu $%d,$%d,%d\n", tmps[2], dreg, size&~7);
        blkcopy(tmps[2], doff, sreg, soff, size&7, tmps);
        print("L.%d:\n", lab);
        print("addu $%d,$%d,%d\n", sreg, sreg, -8);
        print("addu $%d,$%d,%d\n", tmps[2], tmps[2], -8);
        blkcopy(tmps[2], doff, sreg, soff, 8, tmps);
        print("bltu $%d,$%d,L.%d\n", dreg, tmps[2], lab);
    }

tmps names three registers to use as temporaries. Each iteration copies
eight bytes. Initial code points sreg and tmps[2] at the end of the block
to copy. If the block's size is not a multiple of eight, then the first
blkcopy copies the stragglers. Then the loop decrements registers sreg
and tmps[2], calls blkcopy to copy the eight bytes at which they now
point, and iterates until the value in register tmps[2] reaches the one in
register dreg.
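The control flow of the emitted loop can be modeled in C. The sketch below is a hypothetical model, not code that lcc generates: copy_like_blkloop takes both offsets to be zero, uses memcpy where the emitted code calls blkcopy, and adds a guard for blocks shorter than eight bytes, since the emitted loop always runs at least once and so presumes a block of at least eight bytes.

```c
#include <string.h>

/* Hypothetical C model of the loop that blkloop emits, with doff and
   soff taken as 0. The buffers must not overlap. */
static void copy_like_blkloop(unsigned char *dst,
        const unsigned char *src, size_t size) {
    const unsigned char *s;
    unsigned char *d;

    if (size < 8) {             /* guard added for this sketch only */
        memcpy(dst, src, size);
        return;
    }
    s = src + (size & ~(size_t)7);  /* like addu sreg,sreg,size&~7 */
    d = dst + (size & ~(size_t)7);  /* like addu tmps[2],dreg,size&~7 */
    memcpy(d, s, size & 7);         /* the stragglers at the end */
    do {
        s -= 8;                     /* step back one 8-byte chunk */
        d -= 8;
        memcpy(d, s, 8);
    } while (dst < d);              /* like bltu dreg,tmps[2],L.n */
}
```

For a 29-byte block the model first copies the 5 stragglers at offset 24 and then copies the chunks at offsets 16, 8, and 0, in that order.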
salign 368
blkfetch emits code to load register tmp with one, two, or four bytes
from the address formed by adding register reg and offset off:

(MIPS functions 433) +=
    static void blkfetch(size, off, reg, tmp)
    int size, off, reg, tmp; {
        if (size == 1)
            print("lbu $%d,%d($%d)\n", tmp, off, reg);
        else if (salign >= size && size == 2)
            print("lhu $%d,%d($%d)\n", tmp, off, reg);
        else if (salign >= size)
            print("lw $%d,%d($%d)\n", tmp, off, reg);
        else if (size == 2)
            print("ulhu $%d,%d($%d)\n", tmp, off, reg);
        else
            print("ulw $%d,%d($%d)\n", tmp, off, reg);
    }

If the source alignment, as given by salign, is at least as great as the
size of the unit to load, then blkfetch uses ordinary aligned loads.
Otherwise, it uses assembler pseudo-instructions that load unaligned units.
For byte loads, alignment is moot. blkstore is the dual of blkfetch:

(MIPS functions 433) +=
    static void blkstore(size, off, reg, tmp)
    int size, off, reg, tmp; {
        if (size == 1)
            print("sb $%d,%d($%d)\n", tmp, off, reg);
        else if (dalign >= size && size == 2)
            print("sh $%d,%d($%d)\n", tmp, off, reg);
        else if (dalign >= size)
            print("sw $%d,%d($%d)\n", tmp, off, reg);
        else if (size == 2)
            print("ush $%d,%d($%d)\n", tmp, off, reg);
        else
            print("usw $%d,%d($%d)\n", tmp, off, reg);
    }

Further Reading

Kane and Heinrich (1992) is a reference manual for the MIPS R3000 series.
lcc's MIPS code generator works on the newer MIPS R4000 series, but it
doesn't exploit the R4000 64-bit instructions.

Exercises

16.1 Why can't small global arrays go into sdata?
16.2 Why must all nonempty argument-build areas be at least 16 bytes
     long?
16.3 Explain why the MIPS calling convention can't handle variadic
     routines for which the first argument is a float or double.
16.4 Explain why the MIPS calling convention makes it hard to pass
     structures reliably in the undeclared suffix of variable length
     argument lists. How could this problem be fixed?
16.5 Extend lcc to emit the information about the type and location
     of identifiers that your debugger needs to report and change the
     values of identifiers. The symbolic back end that appears on the
     companion diskette shows how the stab functions are used.
16.6 Page 418 describes ralloc's assumption that all templates clobber
     no target register before finishing with all source registers. lcc's
     MIPS template for CVID on page 440 satisfies this requirement in
     two ways. Describe them.
16.7 Using the MIPS code generator as a model, write a code generator
     for another RISC machine, like the DEC Alpha or Motorola PowerPC.
     Read Section 19.2 first.

17
Generating SPARC Code

The SPARC architecture is another RISC. It has 32 32-bit general registers,
32 32-bit floating-point registers, a compact set of 32-bit instructions,
and two addressing modes. It accesses memory only through explicit
load and store instructions.

The main architectural differences between the MIPS and SPARC involve
the SPARC register windows, which automatically save and restore
registers at calls and returns. The associated calling convention changes
function a lot. For a truly simple function, see the X86 function.

lcc's target is, however, the assembler language, not the machine
language, and the MIPS and SPARC assemblers differ in ways that exaggerate
the differences between the code generators. Most RISC machines can,
for example, increment a register by a small constant in one instruction,
but larger constants take more instructions. They develop a full 32-bit
constant into a temporary, which they then add to the register. The MIPS
assembler insulates us from this feature; that is, it lets us use arbitrary
constants almost everywhere, and it generates the multi-instruction
implementation when necessary. The SPARC assembler is more literal and
requires the code generator to emit different code for large and small
constants. Similarly, the MIPS assembler schedules instructions, but the
SPARC code generator does not.

SPARC assembler instructions list source operands before the destination
operand. A % precedes register names. Table 17.1 describes enough
sample instructions to get us started.

The file sparc.c collects all code and data specific to the SPARC
architecture. It's an lburg specification with the interface routines
after the grammar:

(sparc.md 463) =
    %{
    (lburg prefix 375)
    (interface prototypes)
    (SPARC prototypes)
    (SPARC data 467)
    %}
    (terminal declarations 376)
    %%
    (shared rules 400)
    (SPARC rules 469)
    %%
    (SPARC functions 466)
    (SPARC interface definition 464)

Assembler             Meaning
mov %i0,%o0           Set register o0 to the value in register i0.
sub %i0,%i1,%o0       Set register o0 to register i0 minus register i1.
sub %i0,1,%o0         Set register o0 to register i0 minus one.
ldsb [%i0+4],%o0      Set register o0 to the byte at the address four
                      bytes past the address in register i0.
ldsb [%i0+%i4],%o0    Set register o0 to the byte at the address equal
                      to the sum of registers i0 and i4.
fsubd %f0,%f2,%f4     Set register f4 to register f0 minus register f2.
                      Use double-precision floating-point arithmetic.
fsubs %f0,%f2,%f4     Set register f4 to register f0 minus register f2.
                      Use single-precision floating-point arithmetic.
ba L1                 Jump to the instruction labelled L1.
jmp [%i0]             Jump to the address in register i0.
cmp %i0,%i1           Compare registers i0 and i1 and record results in
                      the condition flags.
bl L1                 Branch to L1 if the last comparison recorded
                      less-than.
.byte 0x20            Initialize the next byte in memory to
                      hexadecimal 20.

TABLE 17.1 Sample SPARC assembler input lines.

The last fragment configures the front end and points to the SPARC routines
and data in the back end:

(SPARC interface definition 464) =
    Interface sparcIR = {
        (SPARC type metrics 465)
        0,  /* little_endian */
        1,  /* mulops_calls */
        1,  /* wants_callb */
        0,  /* wants_argb */
        1,  /* left_to_right */
        0,  /* wants_dag */
        (interface routine names)
        stabblock, 0, 0, stabinit, stabline, stabsym, stabtype,
        {
            1,  /* max_unaligned_load */
            (Xinterface initializer 355)
        }
    };

(SPARC type metrics 465) =
    1, 1, 0,  /* char */
    2, 2, 0,  /* short */
    4, 4, 0,  /* int */
    4, 4, 1,  /* float */
    8, 8, 1,  /* double */
    4, 4, 0,  /* T * */
    0, 1, 0,  /* struct */

mulops_calls is one because some SPARC processors implement
multiplication and division with code instead of hardware.

The SPARC and MIPS conventions for structure arguments and return
values are duals. The MIPS conventions use ARGB but no CALLB, and the
SPARC conventions use CALLB but no ARGB.

The symbol-table emitter is elided but complete. The two zeros in the
stab routines correspond to routines that need emit nothing on this
particular target. The other stab names above are #defined to zero when
building the SPARC code generator into a cross-compiler on another
machine because the elided code includes headers and refers to identifiers
that are known only on SPARC systems.

17.1 Registers

The SPARC assembler language programmer sees 32 32-bit general registers.
Most are organized as a stack of overlapping register windows.
Most routines allocate a new window to store locals, temporaries, and
outgoing arguments (the calling convention passes some arguments in
registers) and free the window when they return.

The general registers have at least two names each, as shown in
Table 17.2. One is r0-r31, and the other encodes a bit more about how the
register is used and where it goes in a register window. g0 is hard-wired
to zero. Instructions can write it, but the change won't take. When they
read it, they read zero.

Basic Name   Equivalent Name   Explanation
r0-r7        g0-g7             Fixed global registers. Not stacked.
r8-r15       o0-o7             Outgoing arguments. Stacked.
r16-r23      l0-l7             Locals. Stacked.
r24-r31      i0-i7             Incoming arguments. Stacked.

TABLE 17.2 SPARC general registers.

[FIGURE 17.1 main calls f. The figure shows the global registers g0-g7,
main's register window (i0-i7, l0-l7, o0-o7), and f's register window
(i0-i7, l0-l7, o0-o7), with main's o0-o7 shaded as the same physical
registers as f's i0-i7.]

The machine arranges the register windows so that the physical registers
called o0-o7 in each caller are the same registers referred to as
i0-i7 in the callee. Figure 17.1 shows the register windows for

    main() { f(); }
    f() { return; }

just before f returns. There are 32 general registers, but each call
consumes only 16, because g0-g7 aren't stacked, and the shading shows that
the caller's o0-o7 are the same physical registers as the callee's i0-i7.
The interface procedure progend does nothing for this target. progbeg
parses the target-specific flags -p and -pg, which have lcc emit code for
the SPARC profilers, but which this book omits. progbeg also initializes
the structures that describe the register set:

(SPARC functions 466) =
    static void progbeg(argc, argv) int argc; char *argv[]; {
        int i;

        (shared progbeg 371)
        (parse SPARC flags)
        (initialize SPARC register structures 467)
    }

progbeg causes each element of greg to describe one general register:

(SPARC data 467) =
    static Symbol greg[32];
    static Symbol *oreg = &greg[8], *ireg = &greg[24];

The initialization code parallels Table 17.2:

(initialize SPARC register structures 467) =
    for (i = 0; i < 8; i++) {
        greg[i +  0] = mkreg(stringf("g%d", i), i +  0, 1, IREG);
        greg[i +  8] = mkreg(stringf("o%d", i), i +  8, 1, IREG);
        greg[i + 16] = mkreg(stringf("l%d", i), i + 16, 1, IREG);
        greg[i + 24] = mkreg(stringf("i%d", i), i + 24, 1, IREG);
    }

The machine also has 32 32-bit floating-point registers, f0-f31 in
assembler language. These are conventional registers and involve nothing
like the general-register stack. Even-odd register pairs may be used
as double-precision floating-point registers. progbeg causes each element
of freg to describe one single-precision floating-point register, and
each even numbered element of freg2 to describe one double-precision
floating-point register:

(SPARC data 467) +=
    static Symbol freg[32], freg2[32];

(initialize SPARC register structures 467) +=
    for (i = 0; i < 32; i++)
        freg[i] = mkreg("%d", i, 1, FREG);
    for (i = 0; i < 31; i += 2)
        freg2[i] = mkreg("%d", i, 3, FREG);

rmap stores the wildcard that identifies the default register class to
use for each type:

(initialize SPARC register structures 467) +=
    rmap[C] = rmap[S] = rmap[P] = rmap[B] = rmap[U] = rmap[I] =
        mkwildcard(greg);
    rmap[F] = mkwildcard(freg);
    rmap[D] = mkwildcard(freg2);

lcc puts no variables or temporaries in g0-g7, i6-i7, o0, or o6-o7. The
calling convention does not preserve g0-g7 across calls. o6 is used as the
stack pointer; it's sometimes termed sp. i6 is used as the frame pointer;
it's sometimes termed fp. Each caller puts its return address in its o7,
which is known as i7 in the callee (see Figure 17.1). A function returns
its value in its i0, which its caller knows as o0. A routine that calls other
routines expects its callees to destroy its o0-o7. Floating-point values
return in f0 or f0-f1.

That leaves general registers i0-i5, l0-l7, and o1-o5. lcc will put
temporaries in any of these registers:

(initialize SPARC register structures 467) +=
    tmask[IREG] = 0x3fff3e00;

lcc will put register variables in about half of them, namely l4-l7 and
i0-i5:

(initialize SPARC register structures 467) +=
    vmask[IREG] = 0x3ff00000;

Recall that tmask identifies the registers that may serve as temporaries
while evaluating expressions, and vmask identifies those that may hold
register variables. The dividing line between temporaries and register
variables is somewhat arbitrary. Ordinarily, the two sets are mutually
exclusive: registers for variables are spilled in the routine's prologue to
avoid spilling all live register variables at all call sites, but temporaries
can be spilled at the call sites because it's easy for the register alloca-
tor to identify the few that are typically live. The SPARC register stack,
however, automatically saves many registers when it enters a routine, so
we may as well permit temporaries in all of them. We restrict variables
to about half of the registers because they get first crack at the register,
and if we leave too few temporaries, then we can get a lot of spills or
hamstring the register allocator altogether.
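The two integer masks can be decoded mechanically with the numbering of Table 17.2. The helpers below are hypothetical, written only for this illustration: sparc_regname maps a register number 0-31 to its window name, and in_mask tests one bit of a mask.

```c
#include <stdio.h>

/* Hypothetical helper: write the window name of general register r
   (0 <= r <= 31) into buf, following Table 17.2: r0-r7 are g0-g7,
   r8-r15 are o0-o7, r16-r23 are l0-l7, and r24-r31 are i0-i7. */
static void sparc_regname(int r, char *buf, size_t len) {
    snprintf(buf, len, "%c%d", "goli"[r / 8], r % 8);
}

/* Hypothetical helper: nonzero if bit r (0 <= r <= 31) is set. */
static int in_mask(unsigned mask, int r) {
    return (int)((mask >> r) & 1u);
}
```

Under this numbering, 0x3fff3e00 has bits 9-13 and 16-29 set, which are o1-o5, l0-l7, and i0-i5, and 0x3ff00000 has bits 20-29 set, which are l4-l7 and i0-i5.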
The calling convention preserves no floating-point registers across
calls. lcc uses them only for temporaries:

(initialize SPARC register structures 467) +=
    tmask[FREG] = ~(unsigned)0;
    vmask[FREG] = 0;

target calls setreg to mark nodes that need a special register, and it
calls rtarget to mark nodes that need a child in a special register:

(SPARC functions 466) +=
    static void target(p) Node p; {
        switch (p->op) {
        (SPARC target 473)
        }
    }

If an instruction clobbers some registers, clobber calls spill to save
them first and restore them later.

(SPARC functions 466) +=
    static void clobber(p) Node p; {
        switch (p->op) {
        (SPARC clobber 477)
        }
    }

The cases missing from target and clobber above appear with the germane
instructions in the next section.

17.2 Selecting Instructions


Table 17.3 summarizes the nonterminals in lcc's lburg specification for
the SPARC code generator. It provides a high-level overview of the
organization of the tree grammar.
The use of percent signs in the SPARC assembler language interacts
unattractively with lcc's use of percent sign as the template escape
character. For example, the set pseudo-instruction sets a register to a
constant integer or address, so the rule for ADDRGP is:
(SPARC rules 469)=
reg: ADDRGP "set %a,%%%c\n" 1
The template substring %% emits one %, and the template substring %c
emits the name of the destination register, so the template substring
%%%c directs the emitter to precede the register name with a percent sign
in the generated code. It isn't pretty, but it's consistent with print and
printf, and if we'd picked a different escape character, it would probably
have needed quoting on some other target.
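The escape handling can be sketched as a small template expander. This is a simplification under stated assumptions, not lcc's emitter: expand is a hypothetical function, it handles only %%, %a, %c, %0, and %1, and it takes the kids' text as ready-made strings where the real emitter would expand them recursively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity size, kept NUL-terminated); return new length. */
static size_t append(char *out, size_t len, size_t size, const char *s) {
    while (*s && len + 1 < size)
        out[len++] = *s++;
    out[len] = '\0';
    return len;
}

/* Expand tmpl into out: %% -> %, %a -> sym, %c -> dst, %0/%1 -> kid text.
   Any other character after % is copied literally. Returns the length. */
size_t expand(const char *tmpl, const char *sym, const char *dst,
              const char *kid0, const char *kid1, char *out, size_t size) {
    size_t len = 0;

    assert(size > 0);
    out[0] = '\0';
    for (; *tmpl; tmpl++) {
        char one[2] = { *tmpl, '\0' };
        if (*tmpl != '%' || tmpl[1] == '\0') {
            len = append(out, len, size, one);   /* ordinary character */
            continue;
        }
        switch (*++tmpl) {
        case 'a': len = append(out, len, size, sym);  break;
        case 'c': len = append(out, len, size, dst);  break;
        case '0': len = append(out, len, size, kid0); break;
        case '1': len = append(out, len, size, kid1); break;
        default:  one[0] = *tmpl;                     /* includes %% */
                  len = append(out, len, size, one);  break;
        }
    }
    return len;
}
```

With the destination register named o0 and the symbol _x, the template "set %a,%%%c\n" expands to "set _x,%o0" followed by a newline, which shows how %%%c produces a percent sign and then the register name.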
SPARC instructions with an immediate field store a signed 13-bit constant,
so several instructions use the target-specific cost function imm,
which returns a zero cost if p's constant value fits and a huge cost if it
doesn't:
(SPARC functions 466)+=
static int imm(p) Node p; {
    return range(p, -4096, 4095);
}
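The bounds -4096 and 4095 are exactly the values a signed 13-bit field can hold. A minimal sketch, assuming nothing about lcc's range beyond the zero-or-huge cost described above; COST_FITS, COST_HUGE, simm13_cost, and simm13_decode are hypothetical names:

```c
#include <assert.h>

/* Hypothetical stand-ins for the two outcomes the text describes. */
enum { COST_FITS = 0, COST_HUGE = 0x7fff };

/* Cost of placing v in a signed 13-bit immediate field. */
int simm13_cost(long v) {
    return (v >= -4096 && v <= 4095) ? COST_FITS : COST_HUGE;
}

/* Sign-extend a raw 13-bit field (0..0x1fff) to the value it encodes. */
long simm13_decode(unsigned field) {
    assert(field < 0x2000u);
    return (long)(field ^ 0x1000u) - 0x1000L;
}
```

The decoder shows why the range is asymmetric: bit 12 is the sign bit, so the field 0x1000 encodes -4096 while the largest positive encoding is 0x0fff.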

Name    What It Matches
addr    address calculations for instructions that read and write memory
addrg   ADDRG nodes
base    addr minus the register+register addressing mode
call    operands to call instructions
con     constants
con13   constants that fit in 13 signed bits
rc      registers and constants
reg     computations that yield a result in a register
stk     addresses of locals and formals
stk13   addresses of locals and formals that fit in 13 signed bits
stmt    computations done for side effect

TABLE 17.3 SPARC nonterminals

For example, if 13 bits are enough for the signed offset of an ADDRFP
or ADDRLP node, then one instruction can develop the address into a
register:
(SPARC rules 469)+=
stk13: ADDRFP "%a" imm(a)
stk13: ADDRLP "%a" imm(a)
reg: stk13 "add %0,%%fp,%%%c\n" 1
Otherwise, it takes more instructions:
(SPARC rules 469)+=
stk: ADDRFP "set %a,%%%c\n" 2
stk: ADDRLP "set %a,%%%c\n" 2
reg: ADDRFP "set %a,%%%c\nadd %%%c,%%fp,%%%c\n" 3
reg: ADDRLP "set %a,%%%c\nadd %%%c,%%fp,%%%c\n" 3
set is a pseudo-instruction that generates two instructions if the con-
stant can't be loaded in just one, and if one instruction would do, then
stk13 will take care of it. We might have done something similar in
the MIPS code generator, but the MIPS assembler can hide constant size-
checking completely, so we might as well use this feature. The SPARC
assembler leaves at least part of the problem to the programmer or com-
piler, so we had no choice this time.
The four rules above appear equivalent to
stk: ADDRFP "set %a,%%%c\n" 2
stk: ADDRLP "set %a,%%%c\n" 2
reg: stk "add %0,%%fp,%%%c\n" 1
but the shorter rules fail because they ask reduce to store two different
values into one x.inst. Recall that a node's x.inst records as an
instruction the nonterminal that identifies the rule that matches the node,
if there is one. The problem with the short rules above is that the x.inst
field for the ADDRLP or ADDRFP can't identify both stk and reg.
The nonterminal con13 matches small integral constants:
(SPARC rules 469)+=
con13: CNSTC "%a" imm(a)
con13: CNSTI "%a" imm(a)
con13: CNSTP "%a" imm(a)
con13: CNSTS "%a" imm(a)
con13: CNSTU "%a" imm(a)
The instructions that read and write memory cells use address calcula-
tion that can add a register to a 13-bit signed constant:
(SPARC rules 469)+=
base: ADDI(reg,con13) "%%%0+%1"
base: ADDP(reg,con13) "%%%0+%1"
base: ADDU(reg,con13) "%%%0+%1"
If the constant is zero or the register g0, the sum degenerates to a simple
indirect or direct address:
(SPARC rules 469)+=
base: reg "%%%0"
base: con13 "%0"
If the register is the frame pointer, then the sum yields the address of a
formal or local:
(SPARC rules 469)+=
base: stk13 "%%fp+%0"
The address calculation hardware can also add two registers:
(SPARC rules 469)+=
addr: base "%0"
addr: ADDI(reg,reg) "%%%0+%%%1"
addr: ADDP(reg,reg) "%%%0+%%%1"
addr: ADDU(reg,reg) "%%%0+%%%1"
addr: stk "%%fp+%%%0"
Most loads and stores can use the full set of addressing modes above:
(SPARC rules 469)+=
reg: INDIRC(addr) "ldsb [%0],%%%c\n" 1
reg: INDIRS(addr) "ldsh [%0],%%%c\n" 1
reg: INDIRI(addr) "ld [%0],%%%c\n" 1
reg: INDIRP(addr) "ld [%0],%%%c\n" 1
reg: INDIRF(addr) "ld [%0],%%f%c\n" 1
stmt: ASGNC(addr,reg) "stb %%%1,[%0]\n" 1
stmt: ASGNS(addr,reg) "sth %%%1,[%0]\n" 1
stmt: ASGNI(addr,reg) "st %%%1,[%0]\n" 1
stmt: ASGNP(addr,reg) "st %%%1,[%0]\n" 1
stmt: ASGNF(addr,reg) "st %%f%1,[%0]\n" 1
The ldd and std instructions load and store a double, but only at
addresses divisible by eight. The conventions for aligning arguments and
globals guarantee only divisibility by four, so ldd and std suit only locals:
(SPARC rules 469)+=
addrl: ADDRLP "%%%fp+%a" imm(a)
reg: INDIRD(addrl) "ldd [%0],%%f%c\n" 1
stmt: ASGND(addrl,reg) "std %%f%1,[%0]\n" 1
The pseudo-instructions ld2 and st2 generate instruction pairs to load
and store doubles aligned to a multiple of four, but some SPARC
assemblers silently emit incorrect code when the address is the sum of two
registers, so the rules for these pseudo-instructions use the nonterminal
base, which omits register-plus-register addressing:
(SPARC rules 469)+=
reg: INDIRD(base) "ld2 [%0],%%f%c\n" 2
stmt: ASGND(base,reg) "st2 %%f%1,[%0]\n" 2
But for this assembler bug, the rules defining base and addr could be
combined to define a single nonterminal.
The spiller needs to generate code that can store a register when all
allocable registers are busy. When the offset doesn't fit in a SPARC
immediate field, the ASGN rules above generate multiple instructions, and the
instructions need a register to communicate, which violates the spiller's
assumption. lcc corrects this problem with a second copy of the ASGN
rules. They use the unallocable register g1 to help store locals - lcc
spills only to locals - that are not immediately addressable:
(SPARC rules 469)+=
spill: ADDRLP "%a" !imm(a)
stmt: ASGNC(spill,reg) "set %0,%%g1\nstb %%%1,[%%fp+%%g1]\n"
stmt: ASGNS(spill,reg) "set %0,%%g1\nsth %%%1,[%%fp+%%g1]\n"
stmt: ASGNI(spill,reg) "set %0,%%g1\nst %%%1,[%%fp+%%g1]\n"
stmt: ASGNP(spill,reg) "set %0,%%g1\nst %%%1,[%%fp+%%g1]\n"
stmt: ASGNF(spill,reg) "set %0,%%g1\nst %%f%1,[%%fp+%%g1]\n"
stmt: ASGND(spill,reg) "set %0,%%g1\nstd %%f%1,[%%fp+%%g1]\n"
The rules have an artificially low cost of zero so that they'll win when
they match, which isn't often. These rules can apply to stores that aren't
spills, but using a cost of zero in those cases is harmless. See
Exercise 17.7.
ldsb and ldsh extend the sign bit of the cell that they load, so they
implement a CVCI and CVSI for free. ldub and lduh clear the top bits, so
they include a free CVCU and CVSU.
(SPARC rules 469)+=
reg: CVCI(INDIRC(addr)) "ldsb [%0],%%%c\n" 1
reg: CVSI(INDIRS(addr)) "ldsh [%0],%%%c\n" 1
reg: CVCU(INDIRC(addr)) "ldub [%0],%%%c\n" 1
reg: CVSU(INDIRS(addr)) "lduh [%0],%%%c\n" 1
The integral conversions to types no wider than the source can also
generate a register-to-register move instruction. Recall that move returns one
and marks the node for possible optimization by requate and moveself.
(SPARC rules 469)+=
reg: CVIC(reg) "mov %%%0,%%%c\n" move(a)
reg: CVIS(reg) "mov %%%0,%%%c\n" move(a)
reg: CVIU(reg) "mov %%%0,%%%c\n" move(a)
reg: CVPU(reg) "mov %%%0,%%%c\n" move(a)
reg: CVUC(reg) "mov %%%0,%%%c\n" move(a)
reg: CVUI(reg) "mov %%%0,%%%c\n" move(a)
reg: CVUP(reg) "mov %%%0,%%%c\n" move(a)
reg: CVUS(reg) "mov %%%0,%%%c\n" move(a)
If the node targets no special register, it can generate nothing at all:
(SPARC rules 469)+=
reg: CVIC(reg) "%0" notarget(a)
reg: CVIS(reg) "%0" notarget(a)
reg: CVUC(reg) "%0" notarget(a)
reg: CVUS(reg) "%0" notarget(a)
This second list looks shorter than the one above it, but the target-
independent fragment (shared rules) on page 400 makes up the differ-
ence.
LOADs also generate register copies:
(SPARC rules 469)+=
reg: LOADC(reg) "mov %%%0,%%%c\n" move(a)
reg: LOADI(reg) "mov %%%0,%%%c\n" move(a)
reg: LOADP(reg) "mov %%%0,%%%c\n" move(a)
reg: LOADS(reg) "mov %%%0,%%%c\n" move(a)
reg: LOADU(reg) "mov %%%0,%%%c\n" move(a)
It would be nice to share these rules too, but the templates are
machine-specific.
Register g0 is hard-wired to hold zero, so integral CNST nodes with the
value zero generate no code:
(SPARC rules 469)+=
reg: CNSTC "# reg\n" range(a, 0, 0)
reg: CNSTI "# reg\n" range(a, 0, 0)
reg: CNSTP "# reg\n" range(a, 0, 0)
reg: CNSTS "# reg\n" range(a, 0, 0)
reg: CNSTU "# reg\n" range(a, 0, 0)
Recall that cost expressions are evaluated in a context in which a denotes
the node being labelled, which here is the constant value being tested for
zero. target arranges for these nodes to return g0:
(SPARC target 473)=
case CNSTC: case CNSTI: case CNSTS: case CNSTU: case CNSTP:
    if (range(p, 0, 0) == 0) {
        setreg(p, greg[0]);
        p->x.registered = 1;
    }
    break;
Allocating g0 makes no sense, so target marks the node to preclude
register allocation.
The set pseudo-instruction can load any constant into a register:
(SPARC rules 469)+=
reg: con "set %0,%%%c\n" 1
set generates one instruction if the constant fits in 13 bits and two
otherwise. The assembler insulates us from the details.
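The one-or-two choice can be sketched as follows. The instruction count follows the sentence above; the split into a 22-bit high part and a 10-bit low part is an assumption about how the assembler commonly expands a two-instruction set (sethi followed by or), which the text itself does not spell out. set_length and set_split are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instructions needed per the text: one if v fits in a signed
   13-bit field, two otherwise. */
int set_length(int32_t v) {
    return (v >= -4096 && v <= 4095) ? 1 : 2;
}

/* Assumed two-instruction expansion: the first instruction supplies the
   upper 22 bits and the second ORs in the lower 10 bits. */
void set_split(uint32_t v, uint32_t *hi22, uint32_t *lo10) {
    assert(hi22 != NULL && lo10 != NULL);
    *hi22 = v >> 10;
    *lo10 = v & 0x3ffu;
}
```

Recombining the parts as (hi22 << 10) | lo10 gives back the original constant, which is the property the two-instruction form relies on.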
Most binary instructions that operate on integers can accept a register
or a 13-bit constant as the second source operand:
(SPARC rules 469)+=
rc: con13 "%0"
rc: reg "%%%0"
The first operand and the result must be registers:
(SPARC rules 469)+=
reg: ADDI(reg,rc) "add %%%0,%1,%%%c\n" 1
reg: ADDP(reg,rc) "add %%%0,%1,%%%c\n" 1
reg: ADDU(reg,rc) "add %%%0,%1,%%%c\n" 1
reg: BANDU(reg,rc) "and %%%0,%1,%%%c\n" 1
reg: BORU(reg,rc) "or %%%0,%1,%%%c\n" 1
reg: BXORU(reg,rc) "xor %%%0,%1,%%%c\n" 1
reg: SUBI(reg,rc) "sub %%%0,%1,%%%c\n" 1
reg: SUBP(reg,rc) "sub %%%0,%1,%%%c\n" 1
reg: SUBU(reg,rc) "sub %%%0,%1,%%%c\n" 1
Shift instructions, however, can't accept constant shift operands less
than zero or greater than 31:
(SPARC rules 469)+=
rc5: CNSTI "%a" range(a, 0, 31)
rc5: reg "%%%0"
The first operand and the result must be registers:
(SPARC rules 469)+=
reg: LSHI(reg,rc5) "sll %%%0,%1,%%%c\n" 1
reg: LSHU(reg,rc5) "sll %%%0,%1,%%%c\n" 1
reg: RSHI(reg,rc5) "sra %%%0,%1,%%%c\n" 1
reg: RSHU(reg,rc5) "srl %%%0,%1,%%%c\n" 1
The three Boolean operators have variants that complement the second
operand:
(SPARC rules 469)+=
reg: BANDU(reg,BCOMU(rc)) "andn %%%0,%1,%%%c\n" 1
reg: BORU(reg,BCOMU(rc)) "orn %%%0,%1,%%%c\n" 1
reg: BXORU(reg,BCOMU(rc)) "xnor %%%0,%1,%%%c\n" 1
The unary operators work on registers only:
(SPARC rules 469)+=
reg: NEGI(reg) "neg %%%0,%%%c\n" 1
reg: BCOMU(reg) "not %%%0,%%%c\n" 1
The conversions that widen a signed character or a signed short do so
by shifting left and then right arithmetically to extend the sign bit:
(SPARC rules 469)+=
reg: CVCI(reg) "sll %%%0,24,%%%c; sra %%%c,24,%%%c\n" 2
reg: CVSI(reg) "sll %%%0,16,%%%c; sra %%%c,16,%%%c\n" 2
The unsigned conversions use and instructions to clear the top bits:
(SPARC rules 469)+=
reg: CVCU(reg) "and %%%0,0xff,%%%c\n" 1
reg: CVSU(reg) "set 0xffff,%%g1; and %%%0,%%g1,%%%c\n" 2
CVSU needs a 16-bit mask, which won't fit in the instruction as CVCU's
does.
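The effect of these four conversions on a 32-bit register can be sketched in portable C. The function names mirror the operators but are otherwise hypothetical, and the sign extension is written without relying on arithmetic right shifts of negative values, whose behavior C leaves to the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* sll r,24 then sra r,24: keep the low byte, replicate its sign bit. */
int32_t cvci(uint32_t r) {
    uint32_t low = (uint32_t)(r << 24) >> 24;   /* low 8 bits survive */
    return (int32_t)(low ^ 0x80u) - 0x80;       /* portable sign extension */
}

/* sll r,16 then sra r,16: the same for the low halfword. */
int32_t cvsi(uint32_t r) {
    uint32_t low = (uint32_t)(r << 16) >> 16;
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}

/* and r,0xff: the mask fits in a 13-bit immediate. */
uint32_t cvcu(uint32_t r) { return r & 0xffu; }

/* and r,0xffff: the mask exceeds 4095, so it must first go in a register. */
uint32_t cvsu(uint32_t r) { return r & 0xffffu; }
```

The comparison between 0xff and 0xffff against the 13-bit limit of 4095 is what makes CVCU a one-instruction rule and CVSU a two-instruction rule.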
All SPARC unconditional jumps and conditional branches have a
one-instruction delay slot. The instruction after the jump or branch - which
is said to be "in the delay slot" - is always executed, just as if it had been
executed before the jump or branch. For the time being, we'll fill each
delay slot with a harmless nop. The ba instruction targets constant
addresses, and the jmp instruction targets the rest, namely the ones needed
for switch statements.
(SPARC rules 469)+=
addrg: ADDRGP "%a"
stmt: JUMPV(addrg) "ba %0; nop\n" 2
stmt: JUMPV(addr) "jmp %0; nop\n" 2
stmt: LABELV "%a:\n"
The integral relationals compare one register to another register or to a
constant:
(SPARC rules 469)+=
stmt: EQI(reg,rc) "cmp %%%0,%1; be %a; nop\n" 3
stmt: GEI(reg,rc) "cmp %%%0,%1; bge %a; nop\n" 3
stmt: GEU(reg,rc) "cmp %%%0,%1; bgeu %a; nop\n" 3
stmt: GTI(reg,rc) "cmp %%%0,%1; bg %a; nop\n" 3
stmt: GTU(reg,rc) "cmp %%%0,%1; bgu %a; nop\n" 3
stmt: LEI(reg,rc) "cmp %%%0,%1; ble %a; nop\n" 3
stmt: LEU(reg,rc) "cmp %%%0,%1; bleu %a; nop\n" 3
stmt: LTI(reg,rc) "cmp %%%0,%1; bl %a; nop\n" 3
stmt: LTU(reg,rc) "cmp %%%0,%1; blu %a; nop\n" 3
stmt: NEI(reg,rc) "cmp %%%0,%1; bne %a; nop\n" 3

The call instruction targets a constant address or a computed one:
(SPARC rules 469)+=
call: ADDRGP "%a"
call: addr "%0"
reg: CALLD(call) "call %0; nop\n" 2
reg: CALLF(call) "call %0; nop\n" 2
reg: CALLI(call) "call %0; nop\n" 2
stmt: CALLV(call) "call %0; nop\n" 2
stmt: CALLB(call,reg) "call %0; st %%%1,[%%sp+64]\n" 2
CALLB transmits the address of the return block by storing it into the
stack. The store instruction occupies the delay slot.
The front end follows each RET node with a jump to the procedure
epilogue, so RET nodes generate no code and serve only to help the back
end target the return register:
(SPARC rules 469)+=
stmt: RETD(reg) "# ret\n" 1
stmt: RETF(reg) "# ret\n" 1
stmt: RETI(reg) "# ret\n" 1
Functions return values in f0, f0-f1, or o0, which is known as i0 in the
callee. target arranges compliance with this convention:
(SPARC target 473)+=
case CALLD: setreg(p, freg2[0]); break;
case CALLF: setreg(p, freg[0]); break;
case CALLI:
case CALLV: setreg(p, oreg[0]); break;
case RETD: rtarget(p, 0, freg2[0]); break;
case RETF: rtarget(p, 0, freg[0]); break;
The case for RETI marks the node to prevent register allocation and avoid
an apparent contradiction:
(SPARC target 473)+=
case RETI:
    rtarget(p, 0, ireg[0]);
    p->kids[0]->x.registered = 1;
    break;
If a routine's first argument is integral, it resides in i0. If a function
returns an integer, i0 must hold the return value too. lcc's register
allocator can spill temporaries but not formals, so the register allocator
will fail if we ask it to allocate i0 to a RETI. Formals are, however, dead
at returns, so we simply mark the node allocated, which awards i0 to the
RETI and prevents the register allocator from doing anything, including
spilling the formal.
The register stack automatically saves and restores the general
registers at calls, so only the floating-point registers, minus the return
register, need to be explicitly saved and restored:
(SPARC clobber 477)=
case CALLB: case CALLD: case CALLF: case CALLI:
    spill(~(unsigned)3, FREG, p);
    break;
case CALLV:
    spill(oreg[0]->x.regnode->mask, IREG, p);
    spill(~(unsigned)3, FREG, p);
    break;
Recall that ralloc calls the target's clobber after allocating a register to
the node.
doarg stores in each ARG node's syms[RX] an integer constant symbol
equal to the argument offset divided by four, which names the outgoing
o-register for most arguments:
(SPARC functions 466)+=
static void doarg(p) Node p; {
    p->syms[RX] = intconst(mkactual(4,
        p->syms[0]->u.c.v.i)/4);
}
ARG nodes are executed for side effect, so they don't normally use
syms[RX], but the SPARC calling convention implements ARG nodes with
register targeting or assignment, so using RX is natural.
Targeting arranges to compute the first 24 bytes of arguments into
the registers for outgoing arguments. target calls rtarget to develop
the child into the desired o-register, and then it changes the ARG into a
LOAD into the same register, which emit and moveself optimize away:
(SPARC target 473)+=
case ARGI: case ARGP:
    if (p->syms[RX]->u.c.v.i < 6) {
        rtarget(p, 0, oreg[p->syms[RX]->u.c.v.i]);
        p->op = LOAD+optype(p->op);
        setreg(p, oreg[p->syms[RX]->u.c.v.i]);
    }
    break;
Calls with too many arguments for these registers pass the rest in
memory. To pass an argument in memory, the assembler template undoes
the division and adds 68:
(SPARC rules 469)+=
stmt: ARGI(reg) "st %%%0,[%%sp+4*%c+68]\n" 1
stmt: ARGP(reg) "st %%%0,[%%sp+4*%c+68]\n" 1
sp points at 16 words - 64 bytes - in which the operating system can
store the routine's i- and l-registers when the register windows are
exhausted and some must be spilled. The next word is reserved for the
address of the structure return block, if there is one. The next words are
for the outgoing arguments; space is reserved there for even those that
arrive in i0-i5. So the argument at argument offset n is at %sp+n+68,
which explains the template above. Figure 17.2 shows a SPARC frame.

FIGURE 17.2 SPARC frame layout. From high addresses to low: current fp
(caller's sp); locals and temporaries; outgoing arguments not in o0-o5;
space to save i0-i7 and l0-l7 if necessary.
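The offset arithmetic above can be written out directly. This is a minimal sketch with hypothetical names (arg_stack_offset, arg_oreg, and the enum constants); it only restates the 64-byte save area, the one-word structure-return slot, and the six outgoing registers described in the text.

```c
#include <assert.h>

enum { SAVE_AREA = 64, STRUCT_RET_WORD = 4, NUM_OREGS = 6 };

/* Byte offset from %sp of the stack slot reserved for the argument
   at argument offset off (a multiple of four). */
int arg_stack_offset(int off) {
    assert(off >= 0 && off % 4 == 0);
    return off + SAVE_AREA + STRUCT_RET_WORD;    /* %sp + off + 68 */
}

/* Outgoing o-register number for the argument at offset off,
   or -1 if the argument travels in memory. */
int arg_oreg(int off) {
    assert(off >= 0 && off % 4 == 0);
    return off / 4 < NUM_OREGS ? off / 4 : -1;
}
```

For example, the argument at offset 20 goes in o5 and has a reserved slot at %sp+88, while the argument at offset 24 is the first one passed in memory, at %sp+92.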
The code for variadic routines can look in only one spot for, say, the
argument at offset 20, so even floating-point arguments must travel in
o0-o5; note the parallel with the MIPS calling convention. lcc assumes that
floating-point opcodes yield floating-point registers, so no tree can
develop an unconverted floating-point value into an integer register. emit2
must handle these odd ARGs:
(SPARC rules 469)+=
stmt: ARGD(reg) "# ARGD\n" 1
stmt: ARGF(reg) "# ARGF\n" 1
(SPARC functions 466)+=
static void emit2(p) Node p; {
    switch (p->op) {
    (SPARC emit2 479)
    }
}
ARGF must get a value from a floating-point register into an o-register or
onto the stack. A stack slot is reserved for each outgoing argument, and
the only path from a floating-point register to a general register is via
memory, so emit2 copies the floating-point register into the stack, and
then loads the stack slot into the o-register, unless we're past o5:
(SPARC emit2 479)=
case ARGF: {
    int n = p->syms[RX]->u.c.v.i;
    print("st %%f%d,[%%sp+4*%d+68]\n",
        getregnum(p->x.kids[0]), n);
    if (n <= 5)
        print("ld [%%sp+4*%d+68],%%o%d\n", n, n);
    break;
}

ARGD is similar, but it needs two stores and up to two loads:
(SPARC emit2 479)+=
case ARGD: {
    int n = p->syms[RX]->u.c.v.i;
    int src = getregnum(p->x.kids[0]);
    print("st %%f%d,[%%sp+4*%d+68]\n", src, n);
    print("st %%f%d,[%%sp+4*%d+68]\n", src+1, n+1);
    if (n <= 5)
        print("ld [%%sp+4*%d+68],%%o%d\n", n, n);
    if (n <= 4)
        print("ld [%%sp+4*%d+68],%%o%d\n", n+1, n+1);
    break;
}
If a double argument is preceded by, say, five integers, then its first
half travels in o5 and its second half on the stack. Splitting the double
seems strange, but variadic routines leave no alternative, and procedure
prologues reunite the two halves.
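The split follows from the two independent tests on the argument's word index. A small sketch, using a hypothetical helper double_arg_homes that reports where each half ends up:

```c
#include <assert.h>
#include <stddef.h>

/* For a double whose first word has argument index n, report the
   o-register that receives each half, or -1 when that half is left
   only in its stack slot. */
void double_arg_homes(int n, int *first, int *second) {
    assert(n >= 0 && first != NULL && second != NULL);
    *first  = n <= 5 ? n : -1;        /* first word loaded if it maps to o0-o5 */
    *second = n <= 4 ? n + 1 : -1;    /* second word loaded only if n+1 <= 5 */
}
```

With n equal to 5, the case described above, the first half lands in o5 and the second half has no register, so it remains on the stack.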
It seems unwise to ask the register allocator to allocate a general
register to a floating-point node, so clobber calls spill to ensure that any
live value in the argument register is saved before the floating-point ARG
and restored later:
(SPARC clobber 477)+=
case ARGF:
    if (p->syms[2]->u.c.v.i <= 6)
        spill((1<<(p->syms[2]->u.c.v.i + 8)), IREG, p);
    break;
case ARGD:
    if (p->syms[2]->u.c.v.i <= 5)
        spill((3<<(p->syms[2]->u.c.v.i + 8))&0xff00, IREG, p);
    break;
The MIPS code generator avoided this step because it never allocated the
argument registers for any other purpose, but the SPARC convention uses
o1-o5 for temporaries when they aren't holding outgoing arguments.
The first SPARC systems offered no instructions to multiply, divide,
or find remainders, so the standard library supplied equivalent
functions. It is perhaps premature to abandon these systems, so lcc sets
mulops_calls and sticks with the functions even on newer machines
that offer multiplicative instructions (see Exercise 17.1):
(SPARC rules 469)+=
reg: DIVI(reg,reg) "call .div,2; nop\n" 2
reg: DIVU(reg,reg) "call .udiv,2; nop\n" 2
reg: MODI(reg,reg) "call .rem,2; nop\n" 2
reg: MODU(reg,reg) "call .urem,2; nop\n" 2
reg: MULI(reg,reg) "call .mul,2; nop\n" 2
reg: MULU(reg,reg) "call .umul,2; nop\n" 2
target arranges to pass the operands in o0 and o1, and to receive the
result in o0:
(SPARC target 473)+=
case DIVI: case MODI: case MULI:
case DIVU: case MODU: case MULU:
    setreg(p, oreg[0]);
    rtarget(p, 0, oreg[0]);
    rtarget(p, 1, oreg[1]);
    break;
The library functions allocate no new register window, and instead
destroy o1-o5:
(SPARC clobber 477)+=
case DIVI: case MODI: case MULI:
case DIVU: case MODU: case MULU:
    spill(0x00003e00, IREG, p); break;
The binary floating-point instructions accept only registers:
(SPARC rules 469)+=
reg: ADDD(reg,reg) "faddd %%f%0,%%f%1,%%f%c\n" 1
reg: ADDF(reg,reg) "fadds %%f%0,%%f%1,%%f%c\n" 1
reg: DIVD(reg,reg) "fdivd %%f%0,%%f%1,%%f%c\n" 1
reg: DIVF(reg,reg) "fdivs %%f%0,%%f%1,%%f%c\n" 1
reg: MULD(reg,reg) "fmuld %%f%0,%%f%1,%%f%c\n" 1
reg: MULF(reg,reg) "fmuls %%f%0,%%f%1,%%f%c\n" 1
reg: SUBD(reg,reg) "fsubd %%f%0,%%f%1,%%f%c\n" 1
reg: SUBF(reg,reg) "fsubs %%f%0,%%f%1,%%f%c\n" 1
Most floating-point unary operators are similar:
(SPARC rules 469)+=
reg: NEGF(reg) "fnegs %%f%0,%%f%c\n" 1
reg: LOADF(reg) "fmovs %%f%0,%%f%c\n" 1
reg: CVDF(reg) "fdtos %%f%0,%%f%c\n" 1
reg: CVFD(reg) "fstod %%f%0,%%f%c\n" 1
The conversions between doubles and integers need three instructions
each because the necessary conversion instructions use only
floating-point registers even for the integral operand. fdtoi converts a double
into an integer, but leaves the result in a floating-point register. The
parent of the CVDI expects a general register, so the template copies the
result to a general register via a temporary cell in memory, which is the
only available path:
(SPARC rules 469)+=
reg: CVDI(reg) "fdtoi %%f%0,%%f0; st %%f0,[%%sp+64]; ld [%%sp+64],%%%c\n" 3
CVID reverses the process:
(SPARC rules 469)+=
reg: CVID(reg) "st %%%0,[%%sp+64]; ld [%%sp+64],%%f%c; fitod %%f%c,%%f%c\n" 3
CVDI and CVID use the spot reserved for the address of the structure
return block for any callees. The spot is unused except between the branch
delay slot of a call instruction and the callee's prologue instruction that
allocates a new stack frame. No CVDI or CVID can appear in any such
interval.
The floating-point comparisons have one delay slot after the branch,
and another after the comparison:
(SPARC rules 469)+=
rel: EQD(reg,reg) "fcmped %%f%0,%%f%1; nop; fbue"
rel: EQF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbue"
rel: GED(reg,reg) "fcmped %%f%0,%%f%1; nop; fbuge"
rel: GEF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbuge"
rel: GTD(reg,reg) "fcmped %%f%0,%%f%1; nop; fbug"
rel: GTF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbug"
rel: LED(reg,reg) "fcmped %%f%0,%%f%1; nop; fbule"
rel: LEF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbule"
rel: LTD(reg,reg) "fcmped %%f%0,%%f%1; nop; fbul"
rel: LTF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbul"
rel: NED(reg,reg) "fcmped %%f%0,%%f%1; nop; fbne"
rel: NEF(reg,reg) "fcmpes %%f%0,%%f%1; nop; fbne"
stmt: rel "%0 %a; nop\n" 4
A few opcodes can't be implemented by any fixed assembler template,
and must be saved for emit2. No SPARC instruction copies one
double-precision register to another, so lcc emits two single-precision
instructions for each LOADD:
(SPARC rules 469)+=
reg: LOADD(reg) "# LOADD\n" 2
(SPARC emit2 479)+=
case LOADD: {
    int dst = getregnum(p);
    int src = getregnum(p->x.kids[0]);
    print("fmovs %%f%d,%%f%d; ", src, dst);
    print("fmovs %%f%d,%%f%d\n", src+1, dst+1);
    break;
}

NEGD is similar. One instruction copies the first word and changes the
sign bit in transit. The other instruction copies the second word:
(SPARC rules 469)+=
reg: NEGD(reg) "# NEGD\n" 2
(SPARC emit2 479)+=
case NEGD: {
    int dst = getregnum(p);
    int src = getregnum(p->x.kids[0]);
    print("fnegs %%f%d,%%f%d; ", src, dst);
    print("fmovs %%f%d,%%f%d\n", src+1, dst+1);
    break;
}
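The two-instruction negation works because the sign of a double lives entirely in its first word. The sketch below mimics that on the host machine; it assumes IEEE 754 64-bit doubles, and negd_by_words is a hypothetical name, not lcc code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Negate a double the way the emitted pair does: flip the sign bit in
   the first (most significant) word and copy the second word unchanged.
   Assumes IEEE 754 binary64. */
double negd_by_words(double x) {
    uint64_t bits;
    uint32_t hi, lo;

    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);
    hi = (uint32_t)(bits >> 32);    /* word held in the first register */
    lo = (uint32_t)bits;            /* word held in the second register */
    hi ^= 0x80000000u;              /* fnegs: copy, flipping the sign bit */
                                    /* fmovs: lo is copied as is */
    bits = ((uint64_t)hi << 32) | lo;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

LOADD is the same pattern with neither word modified.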
tmpregs 434
Finally, emit2 calls blkcopy to generate code to copy a block of memory:
(SPARC rules 469)+=
stmt: ASGNB(reg,INDIRB(reg)) "# ASGNB\n"
(SPARC emit2 479)+=
case ASGNB: {
    static int tmpregs[] = { 1, 2, 3 };
    dalign = salign = p->syms[1]->u.c.v.i;
    blkcopy(getregnum(p->x.kids[0]), 0,
        getregnum(p->x.kids[1]), 0,
        p->syms[0]->u.c.v.i, tmpregs);
    break;
}
Figure 13.4 traces the block-copy generator in action for the MIPS target,
but the SPARC code differs only cosmetically. The SPARC instruction
set has no unaligned loads or stores, but this is moot here because the
example in the figure doesn't use the MIPS unaligned loads and stores
anyway. Recall that salign, dalign, and x.max_unaligned_load
collaborate to copy even unaligned blocks, so the target-specific code can ignore
this complication. The g-registers aren't being used, so the emitted code
can use g1-g3 as temporaries; the MIPS code was trickier because the
conventions there made it harder to acquire so many registers at once.
emit2 omits the usual case for ARGB because wants_argb is zero on
this target.

17.3 Implementing Functions


The front end calls local to announce new local variables. Like its
counterpart for the other targets, the SPARC local calls askregvar to assign
the local to a register if possible, and it calls mkauto if askregvar can't
comply:
(SPARC functions 466)+=
static void local(p) Symbol p; {
    (structure return block? 484)
    (put even lightly used locals in registers 483)
    if (askregvar(p, rmap[ttob(p->type)]) == 0)
        mkauto(p);
}
The front end won't switch a local to use sclass REGISTER unless it
estimates that the variable will be used three or more times. This cutoff
leaves in memory locals used too little to justify spilling a register in
the procedure prologue and reloading it in the epilogue. SPARC register
windows, however, make some general registers available for locals
automatically, so our code might as well use them even if the local is used
only once or twice:
(put even lightly used locals in registers 483)=
if (isscalar(p->type) && !p->addressed && !isfloat(p->type))
    p->sclass = REGISTER;
The SPARC code generator sets wants_callb so that it can match the
SPARC convention for returning structures. When wants_callb is set,
the front end takes three actions:
1. It generates CALLB nodes to reach functions that return structures.
2. It sets the second child of each CALLB to a node that computes the
address of the block into which the callee must store the structure
that it's returning.
3. It precedes each return with an ASGNB that copies the block
addressed by the child of the return into the block addressed by the
first local.
The front end announces this local like any other, and the back end
arranges for it to address the stack slot reserved for the location of
structure return blocks:
(structure return block? 484)=
if (retstruct) {
    p->x.name = stringd(4*16);
    p->x.offset = 4*16;
    retstruct = 0;
    return;
}
function sets retstruct to one if the current function returns a
structure or union.
The front end calls the interface procedure function to announce each
new routine. function drives most of the back end. It calls gencode,
which calls gen, which calls the labeller, reducer, linearizer, and register
allocator. function also calls the front end's emitcode, which calls the
back end's emitter. The front end passes to function a symbol that
represents a routine, vectors of symbols representing the caller's and
callee's views of the arguments, and a count of the number of calls made
by the routine:
(SPARC functions 466)+=
static void function(f, caller, callee, ncalls)
Symbol f, callee[], caller[]; int ncalls; {
    int autos = 0, i, leaf, reg, varargs;

    (SPARC function 484)
}
leaf flags simple leaf routines, varargs flags variadic routines, and
autos counts the parameters in memory, which helps compute leaf.
Only varargs can be computed immediately:
(SPARC function 484)=
for (i = 0; callee[i]; i++)
    ;
varargs = variadic(f->type)
    || i > 0 && strcmp(callee[i-1]->name,
        "__builtin_va_alist") == 0;
The SPARC convention either declares the routine variadic or uses a
macro that names the last argument __builtin_va_alist.
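The test has two parts: count the parameters with an empty loop, then inspect the last one. A self-contained sketch with hypothetical names (is_varargs and its parameters); the marker name is passed in rather than fixed, so the sketch assumes nothing about its exact spelling.

```c
#include <assert.h>
#include <string.h>

/* names is a NULL-terminated array of parameter names; marker is the
   name that the variadic macro gives to the last parameter. */
int is_varargs(int declared_variadic, const char *const names[],
               const char *marker) {
    int i;

    assert(names != NULL && marker != NULL);
    for (i = 0; names[i]; i++)
        ;                       /* empty body: just count the parameters */
    return declared_variadic
        || (i > 0 && strcmp(names[i-1], marker) == 0);
}
```

The guard i > 0 matters: a routine with no parameters has no last parameter to inspect.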
function clears the back end's record of busy registers:
17.3 • IMPLEMENTING FUNCTIONS 485

....
(SPARC function 484) +=
(clear register state 410)
484 485
... 484

for (i = O; i < 8; i++)


ireg[i]->x.regnode->vbl =NULL;
The for loop above has no counterpart in the MIPS code generator. lcc
allocates no variables to MIPS argument registers, but it does allocate
variables to SPARC argument registers, so x.regnode->vbl can hold trash
from the last routine compiled by the SPARC code generator.
offset initially holds the frame offset of the next formal parameter
in this routine. function initializes it to record the fact that each frame
includes at least one word to hold the address of the target block for
functions that return a structure, plus 16 words in which to store i0-i7
and l0-l7 if the register window must be spilled:

(SPARC function 484)+=
offset = 68;

maxargoffset holds the size of the stack block for outgoing arguments.
function reserves space for at least o0-o5:

(SPARC function 484)+=
maxargoffset = 24;

Procedure prologues store incoming floating-point arguments into this
space because little can be done with them in the i-registers, and variadic
callees like printf store all incoming argument registers in this space,
because they must use a procedure prologue that works for an unknown
number of arguments, and they must access those arguments using
addresses calculated at runtime, not register numbers fixed at compile time.
function determines the i-register or stack offset for each incoming
argument. At the beginning of each iteration of the for loop below,
offset holds the stack offset reserved for the next parameter, and reg
holds the number of the register or register pair for the next parameter,
if the parameter arrives in a register. The stack needs 4-byte alignment,
so we round the parameter size up to a multiple of four before doing
anything with it. This parameter chews up size bytes of stack space and
thus size/4 registers, except for structure arguments, which are passed
by reference and thus chew up only one i-register.

(SPARC function 484)+=
reg = 0;
for (i = 0; callee[i]; i++) {
    Symbol p = callee[i], q = caller[i];
    int size = roundup(q->type->size, 4);
    (classify SPARC parameter 486)
    offset += size;
    reg += isstruct(p->type) ? 1 : size/4;
}

function can confine its attention to scalar formals because wants_argb
is zero.
If the parameter is a floating-point value or past the end of the argu-
ment registers, then it goes in memory, and this routine needs a frame
in memory to store this parameter:

(classify SPARC parameter 486)=
if (isfloat(p->type) || reg >= 6) {
    p->x.offset = q->x.offset = offset;
    p->x.name = q->x.name = stringd(offset);
    p->sclass = q->sclass = AUTO;
    autos++;
}

In the first case, function must generate code itself to store the
parameter if it arrived in a register; the front end can't help because lcc's
intermediate code gives it no way to store the floating-point value from
an integer register.
If the parameter is integral and arrived in an i-register, it still belongs
in memory if its address is taken or if the routine is variadic:

(classify SPARC parameter 486)+=
else if (p->addressed || varargs)
    (arrives in an i-register, belongs in memory 486)

(arrives in an i-register, belongs in memory 486)=
{
    p->x.offset = offset;
    p->x.name = stringd(p->x.offset);
    p->sclass = AUTO;
    q->sclass = REGISTER;
    askregvar(q, ireg[reg]);
    autos++;
}

function sets the callee's and caller's sclass to differing values so the
front end will generate an assignment to store the register.
The parameter can remain in a register if it arrived in one, if it's
integral, if its address isn't taken, and if the routine isn't variadic:

(classify SPARC parameter 486)+=
else {
    p->sclass = q->sclass = REGISTER;
    askregvar(p, ireg[reg]);
    q->x.name = p->x.name;
}
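
The three-way classification above can be tried out in isolation. The following is only a sketch: Param, Where, classify, and place are hypothetical names that do not appear in lcc, structure arguments are left out, and the starting offset of 68 and the limit of six argument registers are taken from the fragments above.

```c
#include <assert.h>

/* Hypothetical stand-in for the symbol fields that function examines. */
typedef struct {
    int size;       /* size in bytes of what is passed */
    int is_float;   /* nonzero for float or double */
    int addressed;  /* nonzero if the routine takes its address */
} Param;

typedef enum { IN_MEMORY, REG_TO_MEMORY, IN_REGISTER } Where;

static int round_up4(int n) { return (n + 3) & ~3; }

/* Apply the three cases in the order the text gives them. */
static Where classify(const Param *p, int reg, int varargs) {
    if (p->is_float || reg >= 6)
        return IN_MEMORY;        /* float, or past the argument registers */
    if (p->addressed || varargs)
        return REG_TO_MEMORY;    /* arrives in a register, lives in memory */
    return IN_REGISTER;
}

/* Walk the parameters, tracking offset and reg as the for loop does. */
static void place(const Param *ps, int n, int varargs,
                  Where *where, int *offsets) {
    int offset = 68, reg = 0, i;

    for (i = 0; i < n; i++) {
        int size = round_up4(ps[i].size);
        where[i] = classify(&ps[i], reg, varargs);
        offsets[i] = offset;
        offset += size;
        reg += size / 4;
    }
}
```

For a routine whose parameters are an int, a double, an addressed int, and three more ints, the sketch leaves the first int in a register, sends the double to memory, and sends the last int to memory because reg has reached six by then.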

Now to call gencode in the front end, which calls gen in the back end.
First, function clears offset to record that no locals have been assigned
to the stack yet, it clears maxoffset to track the largest value of offset,
and it flags each function that returns an aggregate because local must
treat its first local specially:

(SPARC data 467)+=
static int retstruct;

(SPARC function 484)+=
offset = maxoffset = 0;
retstruct = isstruct(freturn(f->type));
gencode(caller, callee);

When gencode completes the first code-generation pass and returns,
function can compute the size of the frame and of the argument-build
block, in which the outgoing arguments are marshaled. The size of the
argument-build area must be a multiple of four, or some stack fragments
will be unaligned. The frame size must be a multiple of eight, and
includes space for the locals, the argument-build area, 16 words in which
to save i0-i7 and l0-l7, and one word to store the address of any
aggregate return block:

(SPARC function 484)+=
maxargoffset = roundup(maxargoffset, 4);
framesize = roundup(maxoffset + maxargoffset + 4*(16+1), 8);
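
The arithmetic is easy to check by hand. The sketch below repeats it outside the compiler; round_up and frame_size are hypothetical helpers written for this illustration, and round_up assumes a power-of-two multiple, which may differ from how lcc's own roundup is defined.

```c
#include <assert.h>

/* Round x up to a multiple of n; n is assumed to be a power of two. */
static int round_up(int x, int n) { return (x + n - 1) & ~(n - 1); }

/* Frame size in bytes: locals, the argument-build area, 16 words for
   the register window, and one word for an aggregate-return address. */
static int frame_size(int maxoffset, int maxargoffset) {
    maxargoffset = round_up(maxargoffset, 4);
    return round_up(maxoffset + maxargoffset + 4 * (16 + 1), 8);
}
```

A routine with no locals and the minimum 24-byte argument-build area needs 0 + 24 + 68 = 92 bytes, which rounds up to a 96-byte frame.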

function emits code that saves time by allocating no new frame or
register window for routines that don't need them:

(SPARC function 484)+=
leaf = ((is this a simple leaf function? 487));

The constraints are many. The routine must make no calls:

(is this a simple leaf function? 487)=
!ncalls

It must have no locals or formals in memory:

(is this a simple leaf function? 487)+=
&& !maxoffset && !autos

It must not return a structure, because such functions use a frame
pointer in order to access the cell that holds the location of the return
block:

(is this a simple leaf function? 487)+=
&& !isstruct(freturn(f->type))

It must save no registers:

(is this a simple leaf function? 487)+=
&& !(usedmask[IREG]&0x00ffff01)
&& !(usedmask[FREG]&~(unsigned)3)

which means that it must confine itself to the incoming argument regis-
ters o0-o7. The routine must also require neither debugging nor profil-
ing, but those checks are omitted from this book. If all these conditions
are met, then the routine can make do with no frame.
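
Assembled into one expression, the test reads as follows. This is a sketch only: RoutineInfo and is_simple_leaf are hypothetical names, the two masks are copied from the fragments above, and the debugging and profiling checks are left out just as they are in the text.

```c
#include <assert.h>

/* Hypothetical summary of the facts that function consults. */
typedef struct {
    int ncalls;          /* calls made by the routine */
    int maxoffset;       /* bytes of locals in the frame */
    int autos;           /* parameters that live in memory */
    int returns_struct;  /* nonzero if the routine returns an aggregate */
    unsigned used_ireg;  /* one bit per general register used */
    unsigned used_freg;  /* one bit per floating-point register used */
} RoutineInfo;

/* Every condition must hold for the routine to do without a frame. */
static int is_simple_leaf(const RoutineInfo *r) {
    return !r->ncalls
        && !r->maxoffset && !r->autos
        && !r->returns_struct
        && !(r->used_ireg & 0x00ffff01u)
        && !(r->used_freg & ~3u);
}
```

Failing any single condition is enough to force an ordinary frame and register window.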
All prologues start with a common boilerplate:

(SPARC function 484)+=
print(".align 4\n.proc 4\n%s:\n", f->x.name);

Most continue with a save instruction, which allocates a new register win-
dow and adds a register or constant to a register. Most uses of save add
a negative constant to sp, which allocates a new frame on the downward-
growing stack:

(SPARC function 484)+=
if (leaf) {
    (emit leaf prologue 488)
} else if (framesize <= 4095)
    print("save %%sp,%d,%%sp\n", -framesize);
else
    print("set %d,%%g1; save %%sp,%%g1,%%sp\n", -framesize);

If the constant won't fit in a SPARC immediate field, then the prologue
first computes it into register g1.
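
The choice between the two forms can be exercised on its own. In this sketch, emit_save is a hypothetical helper that formats the text into a buffer rather than calling lcc's print; the 4095 threshold is the one used in the fragment above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the prologue's save sequence for a frame of framesize bytes.
   Small frames fit in the immediate field; large ones go through g1. */
static int emit_save(char *buf, size_t n, int framesize) {
    if (framesize <= 4095)
        return snprintf(buf, n, "save %%sp,%d,%%sp\n", -framesize);
    return snprintf(buf, n, "set %d,%%g1; save %%sp,%%g1,%%sp\n",
                    -framesize);
}
```

A 96-byte frame yields the one-instruction form, and an 8192-byte frame yields the set followed by the register form of save.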
Routines eligible for the leaf optimization require no prologue, but the
code generator has used the i-registers for arguments and, for that
matter, for the locals and temporaries. Now we've decided to generate no
frame or register window, so we must use the corresponding o-registers
instead. lcc's back end was not designed with wholesale register
renaming in mind, so even the best solution is clunky: function temporarily
changes the structures that store the name and number of each i-register
to name an o-register instead. It starts with its caller argument vector.
function's initial for loop copied the name of an i-register into the
argument's x.name field, so now function must correct that field:

(emit leaf prologue 488)=
for (i = 0; caller[i] && callee[i]; i++) {
    Symbol p = caller[i], q = callee[i];
    if (p->sclass == REGISTER && q->sclass == REGISTER)
        p->x.name = greg[q->x.regnode->number - 16]->x.name;
}
rename();

The procedure rename makes the remaining changes:

(SPARC functions 466)+=
#define exch(x, y, t) (((t) = x), ((x) = (y)), ((y) = (t)))

static void rename() {
    int i;

    for (i = 0; i < 8; i++) {
        char *ptmp;
        int itmp;
        if (ireg[i]->x.regnode->vbl)
            ireg[i]->x.regnode->vbl->x.name = oreg[i]->x.name;
        exch(ireg[i]->x.name, oreg[i]->x.name, ptmp);
        exch(ireg[i]->x.regnode->number,
            oreg[i]->x.regnode->number, itmp);
    }
}
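
Because rename works by exchange, calling it a second time undoes the first call. The sketch below shows that property on a pared-down model: Reg, iregs, oregs, init_regs, and swap_names are hypothetical, and the register numbers (8 and up for the o-registers, 24 and up for the i-registers) are an assumption chosen so that an i-register's number minus 16 is the matching o-register's number.

```c
#include <assert.h>
#include <string.h>

#define exch(x, y, t) (((t) = (x)), ((x) = (y)), ((y) = (t)))

/* Hypothetical, pared-down register descriptor. */
typedef struct { const char *name; int number; } Reg;

static Reg iregs[8], oregs[8];

static void init_regs(void) {
    static const char *in[8]  = { "%i0", "%i1", "%i2", "%i3",
                                  "%i4", "%i5", "%i6", "%i7" };
    static const char *out[8] = { "%o0", "%o1", "%o2", "%o3",
                                  "%o4", "%o5", "%o6", "%o7" };
    int i;

    for (i = 0; i < 8; i++) {
        iregs[i].name = in[i];
        iregs[i].number = 24 + i;   /* assumed numbering */
        oregs[i].name = out[i];
        oregs[i].number = 8 + i;    /* assumed numbering */
    }
}

/* Exchange the name and number of each i/o register pair. */
static void swap_names(void) {
    int i;

    for (i = 0; i < 8; i++) {
        const char *ptmp;
        int itmp;
        exch(iregs[i].name, oregs[i].name, ptmp);
        exch(iregs[i].number, oregs[i].number, itmp);
    }
}
```

After one call the entry for i0 names o0; after a second call it names i0 again, which is why function can call rename once in the prologue and once more before the epilogue.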

rename exchanges the name and number from corresponding i- and
o-registers, so that another exchange at the end of function will restore
normality. If the register allocator has assigned the register to a variable,
rename also corrects the name recorded in the symbol structure for that
variable. Exchanges implement rename's changes because they must be
undone at the end of the current routine, but simple assignments
implement the changes to caller and register variables because they don't
outlive the current routine.
function next emits prologue code to save any arguments that arrived
in registers but can't remain there. Variadic routines must save all of
i0-i5 because their prologue code can't know how many of them actually
hold arguments:

(SPARC function 484)+=
if (varargs)
    for (; reg < 6; reg++)
        print("st %%i%d,[%%fp+%d]\n", reg, 4*reg + 68);
else
    (spill floats and doubles from i0-i5 489)

Prologues also save floating-point values that arrive in general registers
because instructions can't do much with them there.

(spill floats and doubles from i0-i5 489)=
offset = 4*(16 + 1);
reg = 0;
for (i = 0; caller[i]; i++) {
    Symbol p = caller[i];
    if (isdouble(p->type) && reg <= 4) {
        print("st %%r%d,[%%fp+%d]\n",
            ireg[reg++]->x.regnode->number, offset);
        print("st %%r%d,[%%fp+%d]\n",
            ireg[reg++]->x.regnode->number, offset + 4);
    } else if (isfloat(p->type) && reg <= 5)
        print("st %%r%d,[%%fp+%d]\n",
            ireg[reg++]->x.regnode->number, offset);
    else
        reg++;
    offset += roundup(p->type->size, 4);
}

isfloat succeeds for floats and doubles, so the first else arm above saves
not just floats but also the first half of any double that arrives in i5; the
second half will be in memory already, courtesy of the caller.
Finally, function emits some profiling code (not shown), the body of
the current routine, and the epilogue. The general epilogue is a ret
instruction, which jumps back to the caller, and a restore instruction
in the ret's delay slot, which undoes the prologue's save instruction. If
the routine does without a register window and stack frame, there's no
save to undo, but another rename is needed to restore normality to the
names and numbers of the i-registers:

(SPARC function 484)+=
(emit profiling code)
emitcode();
if (!leaf)
    print("ret; restore\n");
else {
    rename();
    print("retl; nop\n");
}

ret and retl are both pseudo-instructions that emit an indirect branch
using the register that holds the return address. They need different
names because ret uses i7, and retl uses o7 to name the same register
because no register stack frame was pushed.

17.4 Defining Data


The SPARC defconst, defaddress, defstring, and address are the same
as their MIPS counterparts. See Chapter 16 for the code.
The front end calls export to expose a symbol to other modules, which
is the purpose of the SPARC assembler directive .global:

(SPARC functions 466)+=
static void export(p) Symbol p; {
    print(".global %s\n", p->x.name);
}

The front end calls import to make visible in the current module a sym-
bol defined in another module. The SPARC assembler assumes that un-
defined symbols are external, so the SPARC import has nothing to do:

(SPARC functions 466)+=
static void import(p) Symbol p; {}

The front end calls defsymbol to announce a new symbol and cue the
back end to initialize the x.name field. The SPARC conventions generate a
name for local statics and use the source name for the rest. The SPARC
link editor leaves symbols starting with L out of the symbol table, so
defsymbol prefixes L to generated symbols. It prefixes an underscore to
the rest, following another SPARC convention:

(SPARC functions 466)+=
static void defsymbol(p) Symbol p; {
    if (p->scope >= LOCAL && p->sclass == STATIC)
        p->x.name = stringf("%d", genlabel(1));
    else
        p->x.name = p->name;
    if (p->scope >= LABELS)
        p->x.name = stringf(p->generated ? "L%s" : "_%s",
            p->x.name);
}

Statics at file scope retain their names. Statics at deeper scope get
numbers to avoid colliding with other statics of the same name in other
routines.
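
The prefixing rule is small enough to show by itself. In the sketch below, asm_name is a hypothetical helper that writes into a caller-supplied buffer instead of using lcc's stringf, and generated stands for the symbol's generated flag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the assembler name for a symbol: L for generated names,
   which the link editor omits, and an underscore for the rest. */
static void asm_name(char *buf, size_t n, const char *name, int generated) {
    if (generated)
        snprintf(buf, n, "L%s", name);
    else
        snprintf(buf, n, "_%s", name);
}
```

Under this rule a source-level name such as main becomes _main, and a generated label numbered 12 becomes L12.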
The interface routine segment emits the .seg "name", which switches
to a new segment:

(SPARC functions 466)+=
static void segment(n) int n; {
    cseg = n;
    switch (n) {
    case CODE: print(".seg \"text\"\n"); break;
    case BSS:  print(".seg \"bss\"\n");  break;
    case DATA: print(".seg \"data\"\n"); break;
    case LIT:  print(".seg \"text\"\n"); break;
    }
}

segment tracks the current segment in cseg for the interface procedure
space, which emits the SPARC .skip assembler directive to reserve n
bytes of memory for an initialized global or static:

(SPARC data 467)+=
static int cseg;

(SPARC functions 466)+=
static void space(n) int n; {
    if (cseg != BSS)
        print(".skip %d\n", n);
}

.skip arranges to clear the space that it allocates, which the standard
requires.
If we're in the BSS segment, then the interface procedure global can
define the label and reserve space in one fell swoop, using .common for
external symbols and .reserve for the rest:

(SPARC functions 466)+=
static void global(p) Symbol p; {
    print(".align %d\n", p->type->align);
    if (p->u.seg == BSS
    && (p->sclass == STATIC || Aflag >= 2))
        print(".reserve %s,%d\n", p->x.name, p->type->size);
    else if (p->u.seg == BSS)
        print(".common %s,%d\n", p->x.name, p->type->size);
    else
        print("%s:\n", p->x.name);
}

It also emits an alignment directive and, for initialized globals, the
label. .common also exports the symbol and marks it so that the loader
generates only one common global even if other modules emit .common
directives for the same identifier. .reserve takes neither step. Statics
use it to avoid the export, and the scrupulous double -A option uses it to
have the loader complain when multiple modules define the same global.
Pre-ANSI C permitted multiple definitions, but ANSI C technically expects
exactly one definition; other modules should use extern declarations
instead.

17.5 Copying Blocks


blkfetch emits code to load register tmp with k bytes from the address
formed by adding register reg and offset off. k is 1, 2, or 4:

(SPARC functions 466)+=
static void blkfetch(k, off, reg, tmp)
int k, off, reg, tmp; {
    if (k == 1)
        print("ldub [%%r%d+%d],%%r%d\n", reg, off, tmp);
    else if (k == 2)
        print("lduh [%%r%d+%d],%%r%d\n", reg, off, tmp);
    else
        print("ld [%%r%d+%d],%%r%d\n", reg, off, tmp);
}
No SPARC instructions load unaligned values, so blkfetch needn't decide
between aligned and unaligned loads, which the MIPS blkfetch does.
blkunroll has used x.max_unaligned_load to pick a block size and
guarantee that the alignment is no smaller than the block size. blkfetch
need only choose between loading an 8-bit byte, a 16-bit halfword, or a
32-bit word. blkstore mirrors blkfetch:

(SPARC functions 466)+=
static void blkstore(k, off, reg, tmp)
int k, off, reg, tmp; {
    if (k == 1)
        print("stb %%r%d,[%%r%d+%d]\n", tmp, reg, off);
    else if (k == 2)
        print("sth %%r%d,[%%r%d+%d]\n", tmp, reg, off);
    else
        print("st %%r%d,[%%r%d+%d]\n", tmp, reg, off);
}

All SPARC blk procedures use generic register names like r9. If we tried
to use the g, i, l, and o names that we use elsewhere, we'd need to
change the interface between the blk procedures to pass symbolic
register names instead of integral register numbers, which would complicate
adapting lcc to emit binary object code directly, for example.
blkloop emits a loop to copy size bytes from a source address (formed
by adding register sreg and offset soff) to a destination address
(formed by adding register dreg and offset doff):

(SPARC functions 466)+=
static void blkloop(dreg, doff, sreg, soff, size, tmps)
int dreg, doff, sreg, soff, size, tmps[]; {
    (SPARC blkloop 494)
}

tmps names three registers to use as temporaries. Each iteration copies
eight bytes. Initial code points sreg to the end of the source block and
tmps[2] to the end of the target block. This fragment has two arms. Block
sizes that fit in a signed 13-bit field are processed directly. Larger block
sizes are computed into register tmps[2], and the register is added to
the incoming source and destination addresses.

(SPARC blkloop 494)=
if ((size&~7) < 4096) {
    print("add %%r%d,%d,%%r%d\n", sreg, size&~7, sreg);
    print("add %%r%d,%d,%%r%d\n", dreg, size&~7, tmps[2]);
} else {
    print("set %d,%%r%d\n", size&~7, tmps[2]);
    print("add %%r%d,%%r%d,%%r%d\n", sreg, tmps[2], sreg);
    print("add %%r%d,%%r%d,%%r%d\n", dreg, tmps[2], tmps[2]);
}

If the block's size is not divisible by eight, then an initial blkcopy copies
the stragglers:

(SPARC blkloop 494)+=
blkcopy(tmps[2], doff, sreg, soff, size&7, tmps);

The loop decrements registers sreg and tmps[2] by eight for each
iteration. It does tmps[2] immediately, but pushes sreg's decrement forward
to fill the branch delay slot at the end of the loop:

(SPARC blkloop 494)+=
print("1: dec 8,%%r%d\n", tmps[2]);

The loop next calls blkcopy to copy eight bytes from the source to the
destination. The source offset is adjusted to account for the fact that
sreg should've been decremented by now:

(SPARC blkloop 494)+=
blkcopy(tmps[2], doff, sreg, soff - 8, 8, tmps);

Finally, the loop continues if more bytes remain:

(SPARC blkloop 494)+=
print("cmp %%r%d,%%r%d; ", tmps[2], dreg);
print("bgt 1b; ");
print("dec 8,%%r%d\n", sreg);
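
The order in which the emitted code visits the block can be mimicked in C. The sketch below is an illustration, not code from lcc: copy_backward is a hypothetical function, memcpy stands in for the instructions that blkcopy emits, and the loop tests at the top rather than at the bottom so that it is also safe for blocks shorter than eight bytes.

```c
#include <assert.h>
#include <string.h>

/* Copy size bytes from src to dst in the emitted loop's order:
   the stragglers at the tail first, then 8-byte chunks from the
   end of the block back toward its start. */
static void copy_backward(unsigned char *dst, const unsigned char *src,
                          int size) {
    const unsigned char *s = src + (size & ~7);  /* end of the 8-byte part */
    unsigned char *d = dst + (size & ~7);

    memcpy(d, s, (size_t)(size & 7));            /* the initial blkcopy */
    while (d > dst) {                            /* plays the role of bgt 1b */
        d -= 8;                                  /* dec 8 on the destination */
        s -= 8;                                  /* dec 8 on the source */
        memcpy(d, s, 8);                         /* the blkcopy in the loop */
    }
}
```

For a 21-byte block, the sketch copies bytes 16 through 20 first, then bytes 8 through 15, then bytes 0 through 7.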

Further Reading
The SPARC reference manual elaborates on the architecture of this ma-
chine (SPARC International 1992). Patterson and Hennessy (1990) explain
the reasons behind delay slots. Krishnamurthy (1990) surveys the liter-
ature in instruction scheduling, which fills delay slots.
Exercises
17.1 Add a flag that directs the back end to emit instructions instead of
calls to multiply and divide signed and unsigned integers.
17.2 Adapt lcc's SPARC code generator to make better use of g1-g7 and
to keep some floating-point variables in floating-point registers.
Recall that the calling convention and thus all previously compiled
library routines preserve none of these registers.
17.3 Find some use for at least some of the delay slots after uncondi-
tional jumps. For example, the slot after an unconditional jump
can be filled with a copy of the instruction at the jump target, and
the jump can be rewritten to target the next instruction. Some opti-
mizations require buffering code and making an extra pass over it.
The MIPS R3000 architecture has such delay slots too, but the stan-
dard assembler reorders instructions to fill them with something
more useful, so we could ignore the problem there.
17.4 Find some use for at least some of the delay slots after conditional
branches. It may help to exploit the annul bit, which specifies that
the instruction in the delay slot is to have no effect unless the
branch is conditional and taken. Set the annul bit by appending
,a to the opcode (e.g., be,a L4).
17.5 Some SPARC chips stall for at least one clock cycle when a load
instruction immediately precedes an instruction that uses the value
loaded. The object code would run just as fast with a single nop
after the load, though it would be one word longer. Reorder the
emitted assembler code to eliminate at least some of these stalls.
Proebsting and Fischer (1991) describe one solution.
17.6 Some leaf routines need no register window, but still lose the leaf
optimization because they need a frame pointer. For example, some
functions that return structures need no window, but do use a
frame pointer. Change lcc to generate a frame but no register
window for such routines.
17.7 The SPARC code generator includes idiosyncratic code to ensure
that the spiller can emit code to store a register when no allocable
registers are free. Devise a short test program that exercises this
code.
18
Generating X86 Code

This book uses the name X86 for machines compatible for the purposes
of code generation with the Intel 386 architecture, which include the
Intel 486 and Pentium architectures, plus clones from manufacturers like
AMD and Cyrix. The lburg specification uses approximate Intel 486 cycle
counts for costs, which often but not always gives the best result for
compatibles. Some costs are omitted because they aren't needed. For
example, if only one rule matches some operator, there is no need for
costs to break ties between derivations.
The X86 architecture is a CISC, or complex instruction set computer.
It has a large set of variable-length instructions and addressing modes.
It has eight 32-bit largely general registers and eight 80-bit floating-point
registers organized as a stack.
There are many C compilers for the X86, and their conventions (e.g.,
for calling functions and returning values) differ. The code generator in
this chapter uses the conventions of Borland C++ 4.0. That is, it
interoperates with Borland's standard include files, libraries, and linker. Using
lcc with other X86 environments may require a few changes;
documentation on the companion diskette elaborates.
There are many X86 assemblers, and they don't all use the same
syntax. lcc works with Microsoft's MASM 6.11 and Borland's Turbo
Assembler 4.0. That is, it emits code in the intersection of the languages
accepted by these two assemblers. Both have instructions that list the
destination operand before the source operand. The registers have names
instead of numbers. Table 18.1 describes enough sample instructions to
get us started.
The file x86.c collects all X86-specific code and data. It's an lburg
specification with the interface routines after the grammar:

(x86.md 496)=
%{
(X86 macros 498)
(lburg prefix 375)
(interface prototypes)
(X86 prototypes)
(X86 data 499)
%}
(terminal declarations 376)
%%
(shared rules 400)
(X86 rules 503)
%%
(X86 functions 498)
(X86 interface definition 497)

Assembler                   Meaning
mov al,byte ptr 8           Set register al to the byte at address 8.
mov dword ptr 8[edi*4],1    Set to one the 32-bit word at the address
                            formed by adding eight to the product of
                            register edi and four.
sub eax,7                   Subtract seven from register eax.
fsub qword ptr x            Subtract the double-precision
                            floating-point value in the memory cell
                            labelled x from the top of the
                            floating-point stack.
jmp L1                      Jump to the instruction labelled L1.
cmp dword ptr x,7           Compare the 32-bit word at address x with
                            seven and record the results in the
                            condition flags.
jl L1                       Branch to L1 if the last comparison
                            recorded less-than.
dword 020H                  Initialize the next 32-bit word in memory
                            to hexadecimal 20.

TABLE 18.1 Sample X86 assembler input lines.

The last fragment configures the front end and points to the X86-specific
routines and data in the back end:

(X86 interface definition 497)=
Interface x86IR = {
    1, 1, 0,  /* char */
    2, 2, 0,  /* short */
    4, 4, 0,  /* int */
    4, 4, 1,  /* float */
    8, 4, 1,  /* double */
    4, 4, 0,  /* T * */
    0, 4, 0,  /* struct; so that ARGB keeps stack aligned */
    1,        /* little_endian */
    0,        /* mulops_calls */
    0,        /* wants_callb */
    1,        /* wants_argb */
    0,        /* left_to_right */
    0,        /* wants_dag */
    (interface routine names)
    (symbol-table emitters 498)
    {1, (Xinterface initializer 355)}
};

The MIPS and SPARC conventions evaluate arguments left to right, but
the X86 conventions evaluate them right to left, which is why the inter-
face flag left_to_right is zero.
X86 conventions offer no standard way for compilers to encode symbol
tables in assembler code for debuggers, so lcc's X86 back end
includes no symbol-table emitters:

(symbol-table emitters 498)=
0, 0, 0, 0, 0, 0, 0,

18.1 Registers
The X86 architecture includes eight general registers. Assemblers
typically refer to them by a name (eax, ecx, edx, ebx, esp, ebp, esi, and edi)
rather than by a number. lcc's register allocator needs a number to
compute shift distances for register masks, so lcc borrows the encoding
from the binary representation of some instructions:

(X86 macros 498)=
enum { EAX=0, ECX=1, EDX=2, EBX=3, ESI=6, EDI=7 };

Conventions reserve ebp for the frame pointer and esp for the stack
pointer, so lcc doesn't allocate them.
progbeg builds the structures that describe the registers:

(X86 functions 498)=
static void progbeg(argc, argv) int argc; char *argv[]; {
    int i;

    (shared progbeg 371)
    parseflags(argc, argv);
    intreg[EAX] = mkreg("eax", EAX, 1, IREG);
    intreg[EDX] = mkreg("edx", EDX, 1, IREG);
    intreg[ECX] = mkreg("ecx", ECX, 1, IREG);
    intreg[EBX] = mkreg("ebx", EBX, 1, IREG);
    intreg[ESI] = mkreg("esi", ESI, 1, IREG);
    intreg[EDI] = mkreg("edi", EDI, 1, IREG);
    (X86 progbeg 499)
}

Assembler code uses different names for the full 32-bit register and
its low order 8- and 16-bit subregisters. For example, assembler code
uses eax for the first 32-bit register, ax for its bottom half, and al for
its bottom byte. This rule requires initializing separate register vectors
for shorts and characters:

(X86 data 499)=
static Symbol charreg[32], shortreg[32], intreg[32];
static Symbol fltreg[32];

(X86 progbeg 499)=
shortreg[EAX] = mkreg("ax", EAX, 1, IREG);
shortreg[ECX] = mkreg("cx", ECX, 1, IREG);
shortreg[EDX] = mkreg("dx", EDX, 1, IREG);
shortreg[EBX] = mkreg("bx", EBX, 1, IREG);
shortreg[ESI] = mkreg("si", ESI, 1, IREG);
shortreg[EDI] = mkreg("di", EDI, 1, IREG);

(X86 progbeg 499)+=
charreg[EAX] = mkreg("al", EAX, 1, IREG);
charreg[ECX] = mkreg("cl", ECX, 1, IREG);
charreg[EDX] = mkreg("dl", EDX, 1, IREG);
charreg[EBX] = mkreg("bl", EBX, 1, IREG);

No instructions address the bottom byte of esi or edi, so there is no byte
version of those registers. Byte instructions can address the top half of
each 16-bit register, but lcc does without these byte registers because
using them would complicate code generation. For example, CVCI would
need to generate one sequence of instructions when the operand is in
the low-order byte and another sequence when the operand is next door.
Table 18.2 summarizes the allocable registers.
The floating-point registers are organized as a stack. Some operands
of some instructions can address an arbitrary floating-point register
(from the top down), but some crucial instructions effectively assume a
stack. For example, all variants of floating-point addition require at least
one operand to be atop the stack. The assembler operand st denotes the
top of the stack, and st(1) denotes the value underneath it. Pushing a
value on the stack causes st to denote a new cell and st(1) to denote
the cell previously denoted by st.
lcc was tailored for registers with fixed names, not names that change
as a stack grows and shrinks. The X86 floating-point registers violate
these assumptions, so lcc disables its register allocator for the X86
floating-point registers and lets the instructions manage the registers.
For example, a load instruction pushes a value onto the stack and thus
effectively allocates a register; an addition pops two operands and pushes
their sum, so it effectively releases two registers and allocates one.

Int     Short   Char
eax     ax      al
ecx     cx      cl
edx     dx      dl
ebx     bx      bl
esi     si
edi     di

TABLE 18.2 Allocable X86 registers.

The register allocator can't be disabled by simply clearing the entries
in rmap for floats and doubles. If a node yields a value, then the reg-
ister allocator assumes that it needs a register, and expects the node's
syms[RX] to give a register class. So we need a representation of the
floating-point registers, but the representation needs to render the reg-
ister allocator harmless. One easy way to do this is to create registers
with zero masks, which causes getreg to succeed always and to change
no significant compiler state:

(X86 progbeg 499)+=
for (i = 0; i < 8; i++)
    fltreg[i] = mkreg("%d", i, 0, FREG);

This dodge permits lcc's register allocator to work, but it can't do a very
good job. This problem exemplifies a trade-off common in retargetable
compilers. We move as much code as seems reasonable into the
machine-independent parts of the compiler, but then those parts are fixed, and
code generators for targets with features not anticipated in the design
require idiosyncratic work-arounds and emit suboptimal code.
rmap stores the wildcard that identifies the default register class to
use for each type:

(X86 progbeg 499)+=
rmap[C] = mkwildcard(charreg);
rmap[S] = mkwildcard(shortreg);
rmap[P] = rmap[B] = rmap[U] = rmap[I] = mkwildcard(intreg);
rmap[F] = rmap[D] = mkwildcard(fltreg);

tmask and vmask identify the registers to use for temporaries and to
allocate to register variables. The X86 gives lcc only six general registers,
and some of these are spilled by calls, block copies, and other special
instructions or sequences of instructions. If there are too many common
subexpressions, lcc's simple register allocator can emit code that does
to the registers what thrashing does to pages of virtual memory. The
conservative solution thus reserves all six general registers for
temporaries and allocates no variables to registers.

(X86 progbeg 499)+=
tmask[IREG] = (1<<EDI) | (1<<ESI) | (1<<EBX)
            | (1<<EDX) | (1<<ECX) | (1<<EAX);
vmask[IREG] = 0;
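
The value of the temporary mask follows directly from the register encoding. The sketch below recomputes it; general_mask and in_mask are hypothetical helpers, and the enumeration repeats the one defined earlier in this section.

```c
#include <assert.h>

enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESI = 6, EDI = 7 };

/* Mask with one bit set for each allocable general register. */
static unsigned general_mask(void) {
    return (1u << EDI) | (1u << ESI) | (1u << EBX)
         | (1u << EDX) | (1u << ECX) | (1u << EAX);
}

/* Nonzero if register number r is a member of mask. */
static int in_mask(unsigned mask, int r) {
    return (int)((mask >> r) & 1u);
}
```

The result is 0xcf. Bits 4 and 5, which encode esp and ebp, stay clear, so neither register is ever handed out as a temporary.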

lcc does likewise for the floating-point registers.

(X86 progbeg 499)+=
tmask[FREG] = 0xff;
vmask[FREG] = 0;

progbeg also emits some boilerplate required to assemble and link the
emitted code:

(X86 progbeg 499)+=
print(".486\n");
print(".model small\n");
print("extrn _turboFloat:near\n");
print("extrn _setargv:near\n");

The references to external symbols direct the linker to arrange a partic-
ular floating-point package and code to set argc and argv in each main
routine.
To switch from one segment to another requires two directives: an
ends that names the current segment, and a segment that names the
new one:
(X86 data 499)+=
    static int cseg;

(X86 functions 498)+=
    static void segment(n) int n; {
        if (n == cseg)
            return;
        if (cseg == CODE)
            print("_TEXT ends\n");
        else if (cseg == DATA || cseg == BSS || cseg == LIT)
            print("_DATA ends\n");
        cseg = n;
        if (cseg == CODE)
            print("_TEXT segment\n");
        else if (cseg == DATA || cseg == BSS || cseg == LIT)
            print("_DATA segment\n");
    }

export needs a directive that must appear between segments. CODE, DATA,
LIT, and BSS are all positive, so export can use segment(0) to close the
active segment without opening a new one.
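The protocol can be exercised outside the compiler with a sketch like the one below. It is not lcc's code: it writes the directives into a caller-supplied buffer instead of calling print, the names switch_segment, seg_name, and cur_seg are hypothetical, and the numeric values of the segment codes are assumed (only their being positive matters).

```c
#include <stdio.h>
#include <string.h>

/* Segment codes; the values are assumptions, but all are positive. */
enum { CODE = 1, BSS, DATA, LIT };

static int cur_seg = 0;        /* 0 means "between segments" */

static const char *seg_name(int seg) {
    return seg == CODE ? "_TEXT" : "_DATA";
}

/*
 * Write into out the directives needed to move from the current
 * segment to n, where n == 0 only closes the current segment.
 * Returns out so that the result can be used in an expression.
 */
char *switch_segment(int n, char *out, size_t size) {
    size_t len = 0;

    out[0] = '\0';
    if (n == cur_seg)
        return out;
    if (cur_seg != 0) {
        snprintf(out, size, "%s ends\n", seg_name(cur_seg));
        len = strlen(out);
    }
    cur_seg = n;
    if (cur_seg != 0)
        snprintf(out + len, size - len, "%s segment\n", seg_name(cur_seg));
    return out;
}
```

Starting between segments, switch_segment(CODE, ...) yields only the opening directive, a repeated request yields nothing, and switch_segment(0, ...) yields only the closing ends, which is the case that export relies on.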
progbeg clears cseg, which records that the back end is between seg-
ments:
(X86 progbeg 499)+=
    cseg = 0;
progend emits boilerplate that closes the current segment and the
entire assembler program:
(X86 functions 498)+=
    static void progend() {
        segment(0);
        print("end\n");
    }

target records that an operator needs a specific register, and clobber
calls spill to spill and reload busy registers that are overwritten by a
few operators:
(X86 functions 498)+=
    static void target(p) Node p; {
        switch (p->op) {
        (X86 target 508)
        }
    }

    static void clobber(p) Node p; {
        static int nstack = 0;

        nstack = ckstack(p, nstack);
        switch (p->op) {
        (X86 clobber 513)
        }
    }
The cases missing above appear with the instructions for the germane
operators in the next section. clobber tracks in nstack the height of
the stack of floating-point registers. When progbeg disabled allocation
of these registers, it also disabled the spiller, so the X86 code generator
must cope with floating-point spills itself. ckstack adjusts nstack to
record the result of the current instruction:
(X86 functions 498)+=
    #define isfp(p) (optype((p)->op)==F || optype((p)->op)==D)

    static int ckstack(p, n) Node p; int n; {
        int i;

        for (i = 0; i < NELEMS(p->x.kids) && p->x.kids[i]; i++)
            if (isfp(p->x.kids[i]))
                n--;
        if (isfp(p) && p->count > 0)
            n++;
        if (n > 8)
            error("expression too complicated\n");
        return n;
    }
The for loop pops the source registers, and the subsequent if statement
pushes any result. Floating-point instructions done for side effect,
such as assignments and conditional branches, push nothing. ckstack
directs the programmer to simplify the expression to avoid the spill. lcc
merely reports the error because such spills are rare, so reports are
unlikely to irritate users. If lcc ignored the problem completely, however,
it would silently emit incorrect code for some programs, which is
unacceptable. Exercises 18.8 and 18.9 explore related matters.
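The bookkeeping reduces to a few lines of arithmetic. The sketch below is a simplified stand-in for ckstack that takes counts instead of a Node and returns -1 instead of reporting an error; the name fp_depth_after and its parameters are hypothetical.

```c
enum { FP_STACK_SIZE = 8 };   /* the X86 has eight floating-point registers */

/*
 * depth:      stack height before the instruction
 * fp_kids:    number of floating-point operands the instruction pops
 * has_result: nonzero if it pushes a floating-point result
 * Returns the new height, or -1 if the stack would overflow.
 */
int fp_depth_after(int depth, int fp_kids, int has_result) {
    depth -= fp_kids;
    if (has_result)
        depth++;
    return depth > FP_STACK_SIZE ? -1 : depth;
}
```

A binary operator with two stacked operands lowers the height by one, a load raises it by one, and a load onto a full stack is the overflow that lcc reports.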

18.2 Selecting Instructions


Table 18.3 summarizes the nonterminals in lcc's lburg specification for
the X86. It provides a high-level overview of the organization of the tree
grammar.
Integer and address constants denote themselves:
(X86 rules 503)=
    acon: ADDRGP  "%a"
    acon: con     "%0"

Name    What It Matches
acon    address constants
addr    address calculations for instructions that read and write memory
addrj   address calculations for instructions that jump
base    unindexed address calculations
cmpf    floating-point comparands
con     constants
con1    the integer constant 1
con2    the integer constant 2
con3    the integer constant 3
flt     floating-point operands
index   indexed address calculations
mem     memory cells used by general-purpose operators
memf    memory cells used by floating-point operators
mr      memory cells and registers
mrc0    memory cells, registers, and constants whose memory cost is 0
mrc1    memory cells, registers, and constants whose memory cost is 1
mrc3    memory cells, registers, and constants whose memory cost is 3
rc      registers and constants
rc5     register cl and constants between 0 and 31 inclusive
reg     computations that yield a result in a register
stmt    computations done for side effect

TABLE 18.3 X86 nonterminals.

A base address may be an ADDRGP or the sum of an acon and one of the
general registers. The assembler syntax puts the register name in square
brackets:
(X86 rules 503)+=
    base: ADDRGP          "%a"
    base: reg             "[%0]"
    base: ADDI(reg,acon)  "%1[%0]"
    base: ADDP(reg,acon)  "%1[%0]"
    base: ADDU(reg,acon)  "%1[%0]"

If the register is the frame pointer, the same operation computes the
address of a formal or local:
(X86 rules 503)+=
    base: ADDRFP  "%a[ebp]"
    base: ADDRLP  "%a[ebp]"
Some addresses use an index, which is a register scaled by one, two, four,
or eight:
(X86 rules 503)+=
    index: reg             "%0"
    index: LSHI(reg,con1)  "%0*2"
    index: LSHI(reg,con2)  "%0*4"
    index: LSHI(reg,con3)  "%0*8"

    con1: CNSTI  "1"  range(a, 1, 1)
    con1: CNSTU  "1"  range(a, 1, 1)
    con2: CNSTI  "2"  range(a, 2, 2)
    con2: CNSTU  "2"  range(a, 2, 2)
    con3: CNSTI  "3"  range(a, 3, 3)
    con3: CNSTU  "3"  range(a, 3, 3)
Recall that cost expressions are evaluated in a context in which a denotes
the node being labelled, which here is the constant value being compared
with small integers. The unsigned shifts to the left are equivalent to the
integer shifts:
(X86 rules 503)+=
    index: LSHU(reg,con1)  "%0*2"
    index: LSHU(reg,con2)  "%0*4"
    index: LSHU(reg,con3)  "%0*8"
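The way a range cost gates the index rules can be imitated with plain integers. This is a minimal sketch, not lcc's range function, which inspects a Node; range_cost, index_scale, and MAX_COST are hypothetical names, and MAX_COST merely stands in for the cost that lburg treats as "no match".

```c
#include <limits.h>

#define MAX_COST SHRT_MAX   /* stand-in for lburg's "no match" cost */

/* Cost 0 if lo <= value <= hi, otherwise a cost that rules the match out. */
int range_cost(long value, long lo, long hi) {
    return value >= lo && value <= hi ? 0 : MAX_COST;
}

/*
 * Scale factor for an index of the form reg<<count, or 0 if the
 * shift cannot be folded into an addressing mode.  An unshifted
 * register (scale 1) is handled by a separate rule.
 */
int index_scale(long count) {
    if (range_cost(count, 1, 3) != 0)
        return 0;
    return 1 << count;
}
```

Shift counts of one, two, and three give scales of two, four, and eight; any other count falls through to the general shift rules.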
A general address may be a base address or the sum of a base address
and an index. The front end puts index operations on the left; see Sec-
tion 9.7.
(X86 rules 503)+=
    addr: base              "%0"
    addr: ADDI(index,base)  "%1[%0]"
    addr: ADDP(index,base)  "%1[%0]"
    addr: ADDU(index,base)  "%1[%0]"
If the base address is zero, the sum degenerates to just the index:
(X86 rules 503)+=
    addr: index  "[%0]"
Many instructions accept an operand in memory. Assemblers for many
machines encode the datatype in the instruction opcode, but here the
operand specifier does the job. word denotes a 16-bit operand, and dword
a 32-bit operand.
(X86 rules 503)+=
    mem: INDIRC(addr)  "byte ptr %0"
    mem: INDIRI(addr)  "dword ptr %0"
    mem: INDIRP(addr)  "dword ptr %0"
    mem: INDIRS(addr)  "word ptr %0"
Some instructions accept a register or immediate operand, some accept
an operand in a register or memory, and some accept all three:
(X86 rules 503)+=
    rc: reg  "%0"
    rc: con  "%0"

    mr: reg  "%0"
    mr: mem  "%0"

    mrc0: mem  "%0"
    mrc0: rc   "%0"
Some instructions in the last class access memory without cost; others
suffer a penalty of one cycle, and still others a penalty of three cycles:
(X86 rules 503)+=
    mrc1: mem  "%0"  1
    mrc1: rc   "%0"

    mrc3: mem  "%0"  3
    mrc3: rc   "%0"
The lea instruction loads an address into a register, and the mov
instruction loads a register, constant, or memory cell:
(X86 rules 503)+=
    reg: addr        "lea %c,%0\n"  1
    reg: mrc0        "mov %c,%0\n"  1
    reg: LOADC(reg)  "mov %c,%0\n"  move(a)
    reg: LOADI(reg)  "mov %c,%0\n"  move(a)
    reg: LOADP(reg)  "mov %c,%0\n"  move(a)
    reg: LOADS(reg)  "mov %c,%0\n"  move(a)
    reg: LOADU(reg)  "mov %c,%0\n"  move(a)
mov incurs no additional penalty for its memory access, so it uses mrcO.
Recall that the cost function move returns one but also marks the node
as a register-to-register copy; emit, requate, and moveself collaborate
to remove some marked instructions.
Integral addition and subtraction incur a one-cycle penalty when
accessing memory, so they use mrc1:
(X86 rules 503)+=
    reg: ADDI(reg,mrc1)  "?mov %c,%0\nadd %c,%1\n"  1
    reg: ADDP(reg,mrc1)  "?mov %c,%0\nadd %c,%1\n"  1
    reg: ADDU(reg,mrc1)  "?mov %c,%0\nadd %c,%1\n"  1
    reg: SUBI(reg,mrc1)  "?mov %c,%0\nsub %c,%1\n"  1
    reg: SUBP(reg,mrc1)  "?mov %c,%0\nsub %c,%1\n"  1
    reg: SUBU(reg,mrc1)  "?mov %c,%0\nsub %c,%1\n"  1
The bitwise instructions are similar:
(X86 rules 503)+=
    reg: BANDU(reg,mrc1)  "?mov %c,%0\nand %c,%1\n"  1
    reg: BORU(reg,mrc1)   "?mov %c,%0\nor %c,%1\n"   1
    reg: BXORU(reg,mrc1)  "?mov %c,%0\nxor %c,%1\n"  1
Recall that a leading question mark in the assembler template tells emit
to omit the first instruction in the template if the current instruction
reuses the first kid's destination register. That is, if %c is eax, %0 is ebx,
and %1 is ecx, then the SUBU template above emits
    mov eax,ebx
    sub eax,ecx
but if %c is eax, %0 is eax, and %1 is ecx, the same template emits
    sub eax,ecx
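The effect of the question mark can be reproduced with an ordinary string comparison. The sketch below is only an illustration: emit_binary is a hypothetical helper that takes register names directly, whereas lcc's emit works from the template and the allocated registers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the assembly for "dst = src0 op src1" into out.  The initial
 * mov is dropped when dst and src0 name the same register, which is
 * the effect of the leading question mark in the templates.
 */
char *emit_binary(const char *op, const char *dst, const char *src0,
                  const char *src1, char *out, size_t size) {
    if (strcmp(dst, src0) == 0)
        snprintf(out, size, "%s %s,%s\n", op, dst, src1);
    else
        snprintf(out, size, "mov %s,%s\n%s %s,%s\n",
                 dst, src0, op, dst, src1);
    return out;
}
```

Calling emit_binary("sub", "eax", "ebx", "ecx", ...) produces the two-instruction sequence, and passing "eax" for both dst and src0 produces the lone sub.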
The binary instructions clobber their first operand, so the general
implementation must start by copying their first operand into the destination
register, but the copy is redundant in many cases. The costs above are
estimates, because lcc doesn't determine whether the mov is needed until
it allocates registers, which is too late to help select instructions. This is
a classic phase-ordering problem: the compiler must select instructions
to allocate registers, but it must allocate registers to compute instruction
costs accurately.
The binary operators above have a variant that modifies a memory
cell. Some fix the other operand to be one. For example, the instruction
    inc dword ptr i
bumps i by one.
(X86 rules 503)+=
    stmt: ASGNI(addr,ADDI(mem,con1))  "inc %1\n"  memop(a)
    stmt: ASGNI(addr,ADDU(mem,con1))  "inc %1\n"  memop(a)
    stmt: ASGNP(addr,ADDP(mem,con1))  "inc %1\n"  memop(a)
    stmt: ASGNI(addr,SUBI(mem,con1))  "dec %1\n"  memop(a)
    stmt: ASGNI(addr,SUBU(mem,con1))  "dec %1\n"  memop(a)
    stmt: ASGNP(addr,SUBP(mem,con1))  "dec %1\n"  memop(a)
The lone operand identifies the source operand and the destination.
memop confirms that the tree has the form ASGNa(x,b(INDIR(x),c)):
(X86 functions 498)+=
    static int memop(p) Node p; {
        if (generic(p->kids[1]->kids[0]->op) == INDIR
        && sametree(p->kids[0], p->kids[1]->kids[0]->kids[0]))
            return 3;
        else
            return LBURG_MAX;
    }

memop confirms the overall shape of the tree, and sametree confirms that
the destination is the same as the first source operand:
(X86 functions 498)+=
    static int sametree(p, q) Node p, q; {
        return p == NULL && q == NULL
        || p && q && p->op == q->op && p->syms[0] == q->syms[0]
            && sametree(p->kids[0], q->kids[0])
            && sametree(p->kids[1], q->kids[1]);
    }
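The same test can be tried on a toy tree type. The sketch below is written in ANSI-prototype style and is not lcc's code: struct tree, the operator codes, same_tree, and is_memop are all hypothetical, and symbols are compared by pointer on the assumption that each name has one unique symbol object.

```c
#include <stddef.h>

enum { ASGN = 1, INDIR, ADD, ADDR, CNST };   /* hypothetical operator codes */

struct tree {
    int op;
    const char *sym;           /* symbol for leaves, else NULL */
    struct tree *kids[2];
};

/* Two trees match if they have the same shape, operators, and symbols. */
int same_tree(const struct tree *p, const struct tree *q) {
    if (p == NULL || q == NULL)
        return p == q;
    return p->op == q->op && p->sym == q->sym
        && same_tree(p->kids[0], q->kids[0])
        && same_tree(p->kids[1], q->kids[1]);
}

/* Nonzero if p has the form ASGN(x, b(INDIR(x), c)). */
int is_memop(const struct tree *p) {
    const struct tree *rhs = p->kids[1];

    return p->op == ASGN && rhs != NULL && rhs->kids[0] != NULL
        && rhs->kids[0]->op == INDIR
        && same_tree(p->kids[0], rhs->kids[0]->kids[0]);
}
```

A tree for i = i + 1 satisfies is_memop, so a single inc can update i in place; a tree for j = i + 1 does not, because the destination address differs from the address that is fetched.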

Other variants on the binary operators permit the second operand to be
a register or constant:
(X86 rules 503)+=
    stmt: ASGNI(addr,ADDI(mem,rc))   "add %1,%2\n"  memop(a)
    stmt: ASGNI(addr,ADDU(mem,rc))   "add %1,%2\n"  memop(a)
    stmt: ASGNI(addr,SUBI(mem,rc))   "sub %1,%2\n"  memop(a)
    stmt: ASGNI(addr,SUBU(mem,rc))   "sub %1,%2\n"  memop(a)

    stmt: ASGNI(addr,BANDU(mem,rc))  "and %1,%2\n"  memop(a)
    stmt: ASGNI(addr,BORU(mem,rc))   "or %1,%2\n"   memop(a)
    stmt: ASGNI(addr,BXORU(mem,rc))  "xor %1,%2\n"  memop(a)
Each integral unary operator clobbers its lone operand:
(X86 rules 503)+=
    reg: BCOMU(reg)  "?mov %c,%0\nnot %c\n"  2
    reg: NEGI(reg)   "?mov %c,%0\nneg %c\n"  2

    stmt: ASGNI(addr,BCOMU(mem))  "not %1\n"  memop(a)
    stmt: ASGNI(addr,NEGI(mem))   "neg %1\n"  memop(a)
The shift instructions are similar to the other binary integral
instructions, except that the shift distance must be constant or in byte
register cl, which is the bottom of register ecx:
(X86 rules 503)+=
    reg: LSHI(reg,rc5)  "?mov %c,%0\nsal %c,%1\n"  2
    reg: LSHU(reg,rc5)  "?mov %c,%0\nshl %c,%1\n"  2
    reg: RSHI(reg,rc5)  "?mov %c,%0\nsar %c,%1\n"  2
    reg: RSHU(reg,rc5)  "?mov %c,%0\nshr %c,%1\n"  2

    stmt: ASGNI(addr,LSHI(mem,rc5))  "sal %1,%2\n"  memop(a)
    stmt: ASGNI(addr,LSHU(mem,rc5))  "shl %1,%2\n"  memop(a)
    stmt: ASGNI(addr,RSHI(mem,rc5))  "sar %1,%2\n"  memop(a)
    stmt: ASGNI(addr,RSHU(mem,rc5))  "shr %1,%2\n"  memop(a)

    rc5: CNSTI  "%a"  range(a, 0, 31)
    rc5: reg    "cl"
We take care to emit no shifts by constants less than zero or greater than
31. There are many X86 assemblers, so we can't be sure that some won't
issue a diagnostic for undefined shifts. rtarget arranges to compute into
cl all shift counts that aren't constants between zero and 31 inclusive:
(is p->kids[1] a constant common subexpression? 508)=
    generic(p->kids[1]->op) == INDIR
    && p->kids[1]->kids[0]->op == VREG+P
    && p->kids[1]->syms[RX]->u.t.cse
    && generic(p->kids[1]->syms[RX]->u.t.cse->op) == CNST
(X86 target 508)=
    case RSHI: case RSHU: case LSHI: case LSHU:
        if (generic(p->kids[1]->op) != CNST
        && !((is p->kids[1] a constant common subexpression? 508))) {
            rtarget(p, 1, intreg[ECX]);
            setreg(p, intreg[EAX]);
        }
        break;
The call on setreg above ensures that this node doesn't target ecx. If it
did, the mov instruction that starts the template would clobber ecx and
thus cl before its value has been used. eax is not the only acceptable
register, but non-constant shift amounts were rare in our tests, so it
wasn't worth tailoring a wildcard without ecx for these shifts.
The imul instruction multiplies signed integers. One variant multiplies
a register by a register, constant, or memory cell:
(X86 rules 503)+=
    reg: MULI(reg,mrc3)  "?mov %c,%0\nimul %c,%1\n"  14
Another variant takes three operands and leaves in a register the product
of a constant and a register or memory cell:
(X86 rules 503)+=
    reg: MULI(con,mr)  "imul %c,%1,%0\n"  13
The remaining multiplicative instructions are more constrained. The
mul instruction multiplies unsigned integers.
(X86 rules 503)+=
    reg: MULU(reg,mr)  "mul %1\n"  13
It expects its first operand in eax and leaves its result in the double
register edx-eax; eax holds the low-order bits, which is the result of the
operation, unless the operation overflows, in which case ANSI calls the
result undefined, so eax is as good a result as any:
(X86 target 508)+=
    case MULU:
        setreg(p, quo);
        rtarget(p, 0, intreg[EAX]);
        break;
quo and rem denote the eax-edx register pair, which holds a product after
an unsigned multiplication and a dividend before a division. After a
division, eax holds the quotient and edx the remainder.
(X86 data 499)+=
    static Symbol quo, rem;

(X86 progbeg 499)+=
    quo = mkreg("eax", EAX, 1, IREG);
    quo->x.regnode->mask |= 1<<EDX;
    rem = mkreg("edx", EDX, 1, IREG);
    rem->x.regnode->mask |= 1<<EAX;
The div instruction divides integers. It expects its first argument in the
edx-eax double register, and it leaves the quotient in eax and the
remainder in edx:
(X86 target 508)+=
    case DIVI: case DIVU:
        setreg(p, quo);
        rtarget(p, 0, intreg[EAX]);
        rtarget(p, 1, intreg[ECX]);
        break;
    case MODI: case MODU:
        setreg(p, rem);
        rtarget(p, 0, intreg[EAX]);
        rtarget(p, 1, intreg[ECX]);
        break;
An xor instruction clears edx to prepare for an unsigned division:
(X86 rules 503)+=
    reg: DIVU(reg,reg)  "xor edx,edx\ndiv %1\n"
    reg: MODU(reg,reg)  "xor edx,edx\ndiv %1\n"
The cdq instruction propagates eax's sign bit through edx to prepare for
a signed division:
(X86 rules 503)+=
    reg: DIVI(reg,reg)  "cdq\nidiv %1\n"
    reg: MODI(reg,reg)  "cdq\nidiv %1\n"


The first instruction clobbers edx, so it's vital that the divisor be
elsewhere. Targeting it into ecx above is one solution. It's gratuitously
restrictive, but integer division and modulus are not particularly
common.
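The roles of edx and eax in these sequences can be modeled with 64-bit arithmetic. The following sketch is illustrative only; cdq_high and div_edx_eax are hypothetical names, and the model ignores the fault that the hardware raises when the divisor is zero or the quotient does not fit in 32 bits.

```c
#include <stdint.h>

/* What cdq leaves in edx: 32 copies of eax's sign bit. */
uint32_t cdq_high(int32_t eax) {
    return eax < 0 ? 0xFFFFFFFFu : 0u;
}

/*
 * Unsigned division of the 64-bit value edx:eax by divisor, as div
 * does it.  Stores the remainder through rem and returns the quotient.
 * The caller must ensure divisor != 0 and that the quotient fits in
 * 32 bits, which always holds when edx is 0.
 */
uint32_t div_edx_eax(uint32_t edx, uint32_t eax, uint32_t divisor,
                     uint32_t *rem) {
    uint64_t dividend = ((uint64_t)edx << 32) | eax;

    *rem = (uint32_t)(dividend % divisor);
    return (uint32_t)(dividend / divisor);
}
```

Clearing edx first, as the xor in the unsigned templates does, makes the 64-bit dividend equal to eax, so the quotient always fits; this is also why the divisor itself must not live in edx.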
The conversions between integral and pointer types are vacuous and
thus implemented by mov instructions. move marks them as such in
hopes of eliminating them:
(X86 rules 503)+=
    reg: CVIU(reg)  "mov %c,%0\n"  move(a)
    reg: CVPU(reg)  "mov %c,%0\n"  move(a)
    reg: CVUI(reg)  "mov %c,%0\n"  move(a)
    reg: CVUP(reg)  "mov %c,%0\n"  move(a)
movsx and movzx are like mov, but they sign- or zero-extend to widen the
input:
(X86 rules 503)+=
    reg: CVCI(INDIRC(addr))  "movsx %c,byte ptr %0\n"  3
    reg: CVCU(INDIRC(addr))  "movzx %c,byte ptr %0\n"  3
    reg: CVSI(INDIRS(addr))  "movsx %c,word ptr %0\n"  3
    reg: CVSU(INDIRS(addr))  "movzx %c,word ptr %0\n"  3
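In C terms, the two widenings differ only in how they fill the upper bits. The sketch below uses hypothetical helper names and computes the sign extension arithmetically so that it does not depend on implementation-defined conversions.

```c
#include <stdint.h>

/* movzx from a byte: the upper 24 bits of the result are zero. */
uint32_t zero_extend8(uint8_t b) {
    return b;
}

/* movsx from a byte: the upper 24 bits copy bit 7 of the input. */
int32_t sign_extend8(uint8_t b) {
    return (int32_t)(b ^ 0x80) - 0x80;
}

/* movsx from a 16-bit word: the upper 16 bits copy bit 15. */
int32_t sign_extend16(uint16_t w) {
    return (int32_t)(w ^ 0x8000) - 0x8000;
}
```

The byte 0xff thus widens to 255 under movzx and to -1 under movsx, which is the difference between CVCU and CVCI.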
movsx and movzx can also operate on registers, but they require help
from emit2, because the source operand must name the 8- or 16-bit
subregister:
(X86 rules 503)+=
    reg: CVCI(reg)  "# extend\n"  3
    reg: CVCU(reg)  "# extend\n"  3
    reg: CVSI(reg)  "# extend\n"  3
    reg: CVSU(reg)  "# extend\n"  3

(X86 functions 498)+=
    static void emit2(p) Node p; {
        (X86 emit2 511)
    }

(result 511)=
    p->syms[RX]->x.name

(X86 emit2 511)=
    #define preg(f) ((f)[getregnum(p->x.kids[0])]->x.name)

    if (p->op == CVCI)
        print("movsx %s,%s\n", (result 511), preg(charreg));
    else if (p->op == CVCU)
        print("movzx %s,%s\n", (result 511), preg(charreg));
    else if (p->op == CVSI)
        print("movsx %s,%s\n", (result 511), preg(shortreg));
    else if (p->op == CVSU)
        print("movzx %s,%s\n", (result 511), preg(shortreg));
The integral narrowing conversions also require special treatment:
(X86 rules 503)+=
    reg: CVIC(reg)  "# truncate\n"  1
    reg: CVIS(reg)  "# truncate\n"  1
    reg: CVUC(reg)  "# truncate\n"  1
    reg: CVUS(reg)  "# truncate\n"  1
The template "?mov %c,%0\n" and cost move(a) would move the input
to the output and omit the move when the source and destination are the
same, but mov expects both its source and target to be the same size, so
when a mov is necessary, emit2 emits one but uses the 16-bit version of
the source and target registers, which mollifies the assembler and copies
enough bits for all integral narrowing conversions:
(X86 emit2 511)+=
    else if (p->op == CVIC || p->op == CVIS
          || p->op == CVUC || p->op == CVUS) {
        char *dst = shortreg[getregnum(p)]->x.name;
        char *src = preg(shortreg);
        if (dst != src)
            print("mov %s,%s\n", dst, src);
    }
The mov instruction stores as well as loads:
(X86 rules 503)+=
    stmt: ASGNC(addr,rc)  "mov byte ptr %0,%1\n"   1
    stmt: ASGNI(addr,rc)  "mov dword ptr %0,%1\n"  1
    stmt: ASGNP(addr,rc)  "mov dword ptr %0,%1\n"  1
    stmt: ASGNS(addr,rc)  "mov word ptr %0,%1\n"   1
ARGI and ARGP are analogous to ASGNI and ASGNP, but their target is a
new cell atop the stack. They use the push instruction, which pushes an
argument onto the stack:
(X86 rules 503)+=
    stmt: ARGI(mrc3)  "push %0\n"  1
    stmt: ARGP(mrc3)  "push %0\n"  1
The mrc3 above is correct, if counter-intuitive. push 0 takes four cycles
even though
    mov eax,0
    push eax
takes only two.
doarg calls mkactual, which computes the stack offset for the next
actual argument and updates maxargoffset. Unlike lcc's RISC targets,
the X86 has a push instruction that obviates any need for mkactual's
stack offset, but doarg still calls mkactual to compute maxargoffset,
which docall stores in CALL nodes because the call instructions need
it to pop the actual arguments off the stack after the call.
(X86 functions 498)+=
    static void doarg(p) Node p; {
        mkactual(4, p->syms[0]->u.c.v.i);
    }
ASGNB copies a block of memory. The movsb instruction copies a byte
from the address in esi to the address in edi, then adds one to each
of those registers. The rep string-instruction prefix repeats the suffix
instruction ecx times, so the combination rep movsb copies ecx bytes from
the address in esi to the address in edi. target arranges to compute
the source and destination addresses into esi and edi:
(X86 target 508)+=
    case ASGNB:
        rtarget(p, 0, intreg[EDI]);
        rtarget(p, 1, intreg[ESI]);
        break;
The template for ASGNB copies the size of the block into ecx and issues
the rep movsb:
(X86 rules 503)+=
    stmt: ASGNB(reg,INDIRB(reg))  "mov ecx,%a\nrep movsb\n"
ARGB is similar. The source is the ARGB's lone child:
(X86 target 508)+=
    case ARGB:
        rtarget(p->kids[0], 0, intreg[ESI]);
        break;
The destination is fixed to be the top of the stack, so the template starts
by allocating a block atop the stack and pointing edi at it:
(X86 rules 503)+=
    stmt: ARGB(INDIRB(reg))  "sub esp,%a\nmov edi,esp\nmov ecx,%a\nrep movsb\n"
rep clobbers ecx and movsb clobbers esi and edi:
(X86 clobber 513)=
    case ASGNB: case ARGB:
        spill(1<<ECX | 1<<ESI | 1<<EDI, IREG, p);
        break;
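The block copy that rep movsb performs, and the reason it leaves all three registers changed, can be seen in a C model. The sketch below is an illustration with a hypothetical name, and it assumes that the direction flag is clear so that addresses ascend.

```c
#include <stddef.h>

/*
 * Model of rep movsb: copy count (ecx) bytes from src (esi) to dst
 * (edi), advancing both pointers by one after each byte.  The
 * direction flag is assumed to be clear, so addresses ascend.
 * Returns the final value of the destination pointer.
 */
unsigned char *rep_movsb(unsigned char *dst, const unsigned char *src,
                         size_t count) {
    while (count != 0) {
        *dst++ = *src++;
        count--;
    }
    return dst;
}
```

When the loop ends, the count is zero and both pointers have moved past the block, which corresponds to the three registers that the clobber code spills.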
The blk procedures aren't needed:
(X86 functions 498)+=
    static void blkfetch(k, off, reg, tmp)
    int k, off, reg, tmp; {}
    static void blkstore(k, off, reg, tmp)
    int k, off, reg, tmp; {}
    static void blkloop(dreg, doff, sreg, soff, size, tmps)
    int dreg, doff, sreg, soff, size, tmps[]; {}

The fragment (interface routine names) expects static blk procedures
and appears in x86IR, so we must define the routines, but they don't
have to do anything because nothing calls them.
The floating-point instructions use a stack of eight 80-bit registers.
All temporary values are 80 bits; ANSI C allows calculations to use extra
precision, so the code generator need not compensate.
Some floating-point instructions take an operand from memory. The
operand, not the operator, specifies the type:
(X86 rules 503)+=
    memf: INDIRD(addr)        "qword ptr %0"
    memf: INDIRF(addr)        "dword ptr %0"
    memf: CVFD(INDIRF(addr))  "dword ptr %0"
The fld instruction loads a floating-point value from memory and pushes
it onto the floating-point stack:
(X86 rules 503)+=
    reg: memf  "fld %0\n"  3
fstp pops the floating-point stack and stores the result in memory:
(X86 rules 503)+=
    stmt: ASGND(addr,reg)        "fstp qword ptr %0\n"  7
    stmt: ASGNF(addr,reg)        "fstp dword ptr %0\n"  7
    stmt: ASGNF(addr,CVDF(reg))  "fstp dword ptr %0\n"  7
Floating-point arguments travel on the memory stack, so a subtraction
allocates space for them, and an fstp fills the space:
(X86 rules 503)+=
    stmt: ARGD(reg)  "sub esp,8\nfstp qword ptr [esp]\n"
    stmt: ARGF(reg)  "sub esp,4\nfstp dword ptr [esp]\n"
The unary operators change the element atop the floating-point stack.
For example, the fchs instruction negates the top of the stack:
(X86 rules 503)+=
    reg: NEGD(reg)  "fchs\n"
    reg: NEGF(reg)  "fchs\n"
The binary operators work on the top two elements of the floating-point
stack or on the top element and an operand from memory:
(X86 rules 503)+=
    flt: memf  " %0"
    flt: reg   "p st(1),st"
For example, the instruction
    fsubp st(1),st
subtracts the top of the stack (st) from the element underneath it
(st(1)) and pops (the p suffix) the stack once, discarding the value
subtracted. The instruction
    fsub qword ptr x
subtracts the 64-bit value of x from the top of the stack. The p suffix
is missing, so the height of the floating-point stack doesn't change. The
other binary operators are similar:
(X86 rules 503)+=
    reg: ADDD(reg,flt)  "fadd%1\n"
    reg: ADDF(reg,flt)  "fadd%1\n"
    reg: DIVD(reg,flt)  "fdiv%1\n"
    reg: DIVF(reg,flt)  "fdiv%1\n"
    reg: MULD(reg,flt)  "fmul%1\n"
    reg: MULF(reg,flt)  "fmul%1\n"
    reg: SUBD(reg,flt)  "fsub%1\n"
    reg: SUBF(reg,flt)  "fsub%1\n"
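A small model makes the stack discipline of the two flt forms easier to follow. This sketch is an illustration only: struct fpstack and its functions are hypothetical, values are held as double even though the hardware registers are 80 bits wide, and the callers are trusted to supply enough operands.

```c
enum { FP_REGS = 8 };

struct fpstack {
    double st[FP_REGS];   /* st[top-1] models st, st[top-2] models st(1) */
    int top;              /* number of occupied registers */
};

/* fld: push a value; returns 0 on success, -1 if the stack is full. */
int fp_push(struct fpstack *s, double v) {
    if (s->top == FP_REGS)
        return -1;
    s->st[s->top++] = v;
    return 0;
}

/* fsubp st(1),st: st(1) = st(1) - st, then pop, leaving the difference on top. */
void fp_subp(struct fpstack *s) {
    s->st[s->top - 2] -= s->st[s->top - 1];
    s->top--;
}

/* fsub with a memory operand: st = st - mem, and the height is unchanged. */
void fp_sub_mem(struct fpstack *s, double mem) {
    s->st[s->top - 1] -= mem;
}
```

Pushing 10 and then 3 and applying fp_subp leaves 7 in a stack of height one, whereas fp_sub_mem changes the top value without changing the height.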
The conversion from float to double does nothing to a floating-point
register, because the register is already 80 bits wide and thus needs no
further widening. CVFD's widening has thus already been done:
(X86 rules 503)+=
    reg: CVFD(reg)  "# CVFD\n"
No instruction directly narrows a double to a float, so we must store the
value into a temporary float, which narrows the value. Then we reload
the value, which widens it again, but the extra precision is gone:
(X86 rules 503)+=
    reg: CVDF(reg)  "sub esp,4\nfstp dword ptr 0[esp]\nfld dword ptr 0[esp]\nadd esp,4\n"  12
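The same store-and-reload idiom can be written in C. The sketch below is an analogy to the template, not something lcc generates; narrow_to_float is a hypothetical name, and the volatile qualifier is there to force the value through a float-sized object.

```c
/*
 * Narrow d to float precision by forcing a store to a float object
 * and reloading it.  volatile keeps the compiler from skipping the
 * store, which mirrors the fstp/fld pair in the template.
 */
double narrow_to_float(double d) {
    volatile float tmp = (float)d;

    return tmp;
}
```

A value such as 0.5 survives the round trip unchanged because it is exactly representable as a float, but 0.1 comes back slightly different from the double it started as.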


The conversion from double to integer is similar. The instruction fistp
pops the floating-point stack, converts the value to an integral value, and
stores it in memory:
(X86 rules 503)+=
    stmt: ASGNI(addr,CVDI(reg))  "fistp dword ptr %0\n"  29
If the code needs the integral result in a (general) register, then we'll
create, use, and free a temporary on the stack in memory:
(X86 rules 503)+=
    reg: CVDI(reg)  "sub esp,4\nfistp dword ptr 0[esp]\npop %c\n"  31
The fild instruction loads an integer, converts it to an 80-bit
floating-point value, and pushes it onto the floating-point stack:
(X86 rules 503)+=
    reg: CVID(INDIRI(addr))  "fild dword ptr %0\n"  10
If the operand comes from a (general) register, then we create, use, and
free another temporary on the stack in memory:
(X86 rules 503)+=
    reg: CVID(reg)  "push %0\nfild dword ptr 0[esp]\nadd esp,4\n"  12


The jmp instruction jumps unconditionally. It accepts a label, register,
or memory cell:
(X86 rules 503)+=
    addrj: ADDRGP  "%a"
    addrj: reg     "%0"  2
    addrj: mem     "%0"  2

    stmt: JUMPV(addrj)  "jmp %0\n"  3
    stmt: LABELV        "%a:\n"
The conditional branches compare two values and branch when the
condition is met. The cmp instruction does the comparisons and has several
variants. One compares a memory cell with a register or constant. The
signed integers have all six relationals:
(X86 rules 503)+=
    stmt: EQI(mem,rc)  "cmp %0,%1\nje %a\n"   5
    stmt: GEI(mem,rc)  "cmp %0,%1\njge %a\n"  5
    stmt: GTI(mem,rc)  "cmp %0,%1\njg %a\n"   5
    stmt: LEI(mem,rc)  "cmp %0,%1\njle %a\n"  5
    stmt: LTI(mem,rc)  "cmp %0,%1\njl %a\n"   5
    stmt: NEI(mem,rc)  "cmp %0,%1\njne %a\n"  5
The unsigned integers have only four because EQI and NEI work for
unsigned integers too:
(X86 rules 503)+=
    stmt: GEU(mem,rc)  "cmp %0,%1\njae %a\n"  5
    stmt: GTU(mem,rc)  "cmp %0,%1\nja %a\n"   5
    stmt: LEU(mem,rc)  "cmp %0,%1\njbe %a\n"  5
    stmt: LTU(mem,rc)  "cmp %0,%1\njb %a\n"   5
Another variant of cmp compares a register to a constant, a memory cell,
or another register, so we repeat the signed and unsigned rules above
with this combination of operands:
(X86 rules 503)+=
    stmt: EQI(reg,mrc1)  "cmp %0,%1\nje %a\n"   4
    stmt: GEI(reg,mrc1)  "cmp %0,%1\njge %a\n"  4
    stmt: GTI(reg,mrc1)  "cmp %0,%1\njg %a\n"   4
    stmt: LEI(reg,mrc1)  "cmp %0,%1\njle %a\n"  4
    stmt: LTI(reg,mrc1)  "cmp %0,%1\njl %a\n"   4
    stmt: NEI(reg,mrc1)  "cmp %0,%1\njne %a\n"  4

    stmt: GEU(reg,mrc1)  "cmp %0,%1\njae %a\n"  4
    stmt: GTU(reg,mrc1)  "cmp %0,%1\nja %a\n"   4
    stmt: LEU(reg,mrc1)  "cmp %0,%1\njbe %a\n"  4
    stmt: LTU(reg,mrc1)  "cmp %0,%1\njb %a\n"   4
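The pairing of relationals with jump mnemonics in the rules above amounts to a small table. The sketch below restates it as a lookup; jump_for and its string keys are hypothetical and serve only to summarize the rules.

```c
#include <string.h>

/*
 * Return the conditional jump for relational rel ("EQ", "NE", "GE",
 * "GT", "LE", or "LT") applied after cmp, or NULL if rel is unknown.
 * Signed and unsigned comparisons share je and jne but differ elsewhere.
 */
const char *jump_for(const char *rel, int is_unsigned) {
    static const struct {
        const char *rel, *sjump, *ujump;
    } table[] = {
        { "EQ", "je",  "je"  },
        { "NE", "jne", "jne" },
        { "GE", "jge", "jae" },
        { "GT", "jg",  "ja"  },
        { "LE", "jle", "jbe" },
        { "LT", "jl",  "jb"  },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(rel, table[i].rel) == 0)
            return is_unsigned ? table[i].ujump : table[i].sjump;
    return NULL;
}
```

The table shows at a glance why no EQU or NEU rules are needed: the first two rows are identical in both columns.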
The instruction fcomp x pops one element from the floating-point
stack and compares it with the operand x in memory. The fcompp variant
pops both comparands from the floating-point stack. The nonterminal
cmpf allows one rule to emit both variants:
(X86 rules 503)+=
    cmpf: memf  " %0"
    cmpf: reg   "p"
The similar nonterminal flt, which is defined on page 514, won't do,
because the assembler requires st(1),st on binary operators but
curiously forbids it on fcomp. fcomp stores the result of the comparison in
some machine flags. The instruction fstsw ax stores the flags in the
bottom of eax, and the instruction sahf loads them into the flags tested
by the conditional branch instructions:
(X86 rules 503)+=
    stmt: EQD(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\nje %a\n"
    stmt: GED(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njbe %a\n"
    stmt: GTD(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njb %a\n"
    stmt: LED(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njae %a\n"
    stmt: LTD(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\nja %a\n"
    stmt: NED(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njne %a\n"

    stmt: EQF(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\nje %a\n"
    stmt: GEF(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njbe %a\n"
    stmt: GTF(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njb %a\n"
    stmt: LEF(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njae %a\n"
    stmt: LTF(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\nja %a\n"
    stmt: NEF(cmpf,reg)  "fcomp%0\nfstsw ax\nsahf\njne %a\n"
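The jumps in these rules look reversed (GTD uses jb, for example) because the second operand is the one on top of the stack, and fcomp compares the top of the stack with its operand. The sketch below models that reading. It assumes the X86's documented behavior that fstsw and sahf deliver condition bit C0 to the carry flag and C3 to the zero flag, it ignores unordered comparisons involving NaNs, and all of its names are hypothetical.

```c
struct flags { int cf, zf; };

/* Flags after "fcomp src; fstsw ax; sahf" with st on top of the stack. */
struct flags fcomp_flags(double st, double src) {
    struct flags f;

    f.cf = st < src;     /* C0 lands in the carry flag */
    f.zf = st == src;    /* C3 lands in the zero flag */
    return f;
}

/* Is the jb emitted for GTD(a,b) taken?  b is on top of the stack. */
int gtd_taken(double a, double b) {
    struct flags f = fcomp_flags(b, a);

    return f.cf;                 /* jb: jump if carry */
}

/* Is the jbe emitted for GED(a,b) taken? */
int ged_taken(double a, double b) {
    struct flags f = fcomp_flags(b, a);

    return f.cf || f.zf;         /* jbe: jump if carry or zero */
}
```

Under these assumptions a > b holds exactly when the top of the stack (b) is below the operand (a), so the carry flag is set and jb branches; the unsigned-style mnemonics are used because only the carry and zero flags carry the result.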
clobber records that the floating-point conditional branches destroy eax:
(X86 clobber 513)+=
    case EQD: case LED: case GED: case LTD: case GTD: case NED:
    case EQF: case LEF: case GEF: case LTF: case GTF: case NEF:
        spill(1<<EAX, IREG, p);
        break;
The call instruction pushes on the stack the address of the next
instruction and jumps to the address specified by its operand:
(X86 rules 503)+=
    reg: CALLI(addrj)   "call %0\nadd esp,%a\n"
    stmt: CALLV(addrj)  "call %0\nadd esp,%a\n"
The add instruction pops the arguments off the stack after the call. The
front end points each call node's syms[0] at a symbol equal to the
number of bytes of actual arguments. The %a causes this number to be
emitted. The return value arrives in eax:
(X86 target 508)+=
    case CALLI: case CALLV:
        setreg(p, intreg[EAX]);
        break;
    case RETI:
        rtarget(p, 0, intreg[EAX]);
        break;
Floating-point functions return a value in the top of the stack of
floating-point registers:
(X86 rules 503)+=
    reg: CALLF(addrj)  "call %0\nadd esp,%a\n"
    reg: CALLD(addrj)  "call %0\nadd esp,%a\n"

(X86 clobber 513)+=
    case CALLD: case CALLF:
        spill(1<<EDX | 1<<EAX, IREG, p);
        break;
Return nodes exist, as usual, more to guide register targeting than to
emit code:
(X86 rules 503)+=
    stmt: RETI(reg)  "# ret\n"
    stmt: RETF(reg)  "# ret\n"
    stmt: RETD(reg)  "# ret\n"

18.3 Implementing Functions


The front end calls local to announce a local variable, including the
temporaries that it generates. The code generator assigns no
floating-point locals, not even temporaries, to registers, so local starts by
forcing them onto the stack:
(X86 functions 498)+=
    static void local(p) Symbol p; {
        if (isfloat(p->type))
            p->sclass = AUTO;
        if (askregvar(p, rmap[ttob(p->type)]) == 0)
            mkauto(p);
    }

Floating-point and integral locals are handled asymmetrically because
integral temporaries are assigned to registers. Other locals aren't, but
progbeg cleared vmask[IREG], which directs askregvar to keep bona fide
variables out of registers.
The front end calls the interface procedure function to announce a
new routine:
(X86 functions 498)+=
    static void function(f, caller, callee, n)
    Symbol f, callee[], caller[]; int n; {
        int i;

        (X86 function 519)
    }

It emits the procedure prologue, which includes a label and instructions
to save ebx, esi, edi, and ebp:
(X86 function 519)=
print("%s:\n", f->x.name);
print("push ebx\n");
print("push esi\n");
print("push edi\n");
print("push ebp\n");
print("mov ebp,esp\n");
The prologue code also updates ebp. Figure 18.1 shows an X86 frame.
Next, function clears the state of the register allocator and calculates
the stack offset for each incoming argument. The first resides 20 bytes
from ebp: 16 bytes save registers and four more save the return address.
(X86 function 519)+=
(clear register state 410)
offset = 16 + 4;
for (i = 0; callee[i]; i++) {
    (assign offset to argument i 520)
}
high addresses

                          offset from ebp
    incoming arguments    20
    return address        16
    saved ebx             12
    saved esi              8
    saved edi              4
    saved ebp              0  <- ebp
    locals and
    temporaries

low addresses

FIGURE 18.1 A frame for the X86.



offset gives the offset from ebp to the next argument. It determines
the x.offset and x.name fields of the callee and caller views of the
arguments:
(assign offset to argument i 520)=
Symbol p = callee[i];
Symbol q = caller[i];
p->x.offset = q->x.offset = offset;
p->x.name = q->x.name = stringf("%d", p->x.offset);
p->sclass = q->sclass = AUTO;
offset += roundup(q->type->size, 4);
The sclass fields are set to record that no arguments are assigned to
registers, and offset is adjusted for the next argument and to keep the
stack aligned.
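The offset assignment can be exercised outside the compiler. The following
is a minimal sketch, not lcc's code: roundup4 and arg_offsets are
hypothetical names, and argument sizes arrive in a plain array instead of
through the callee and caller symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 4, as roundup(n, 4) does in the text. */
static int roundup4(int n) {
    return (n + 3) & ~3;
}

/*
 * Hypothetical helper: given the sizes in bytes of nargs incoming
 * arguments, store each argument's offset from ebp in offsets[] and
 * return the offset just past the last argument. The first argument
 * sits 20 bytes from ebp: 16 bytes of saved registers plus 4 bytes
 * of return address.
 */
static int arg_offsets(const int sizes[], size_t nargs, int offsets[]) {
    int offset = 16 + 4;
    size_t i;

    for (i = 0; i < nargs; i++) {
        assert(sizes[i] > 0);
        offsets[i] = offset;
        offset += roundup4(sizes[i]);   /* keeps the stack 4-byte aligned */
    }
    return offset;
}
```

For arguments of 4, 6, and 8 bytes, the sketch assigns offsets 20, 24,
and 32, and the next free offset is 40.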
function then calls gencode to process the body of the routine. It
first resets offset and maxoffset to record that no locals have yet been
allocated:
(X86 function 519)+=
offset = maxoffset = 0;
gencode(caller, callee);
framesize = roundup(maxoffset, 4);
if (framesize > 0)
    print("sub esp,%d\n", framesize);
When gencode returns, maxoffset is the largest value that offset took
on during the lifetime of gencode, so code to allocate the rest of the
frame can now be emitted into the prologue. Then function calls
emitcode to emit the body of the routine, and it calls print directly
to emit the epilogue, which merely undoes the prologue:
(X86 function 519)+=
emitcode();
print("mov esp,ebp\n");
print("pop ebp\n");
print("pop edi\n");
print("pop esi\n");
print("pop ebx\n");
print("ret\n");

18.4 Defining Data


The front end calls defsymbol to announce each new symbol:
(X86 functions 498)+=
static void defsymbol(p) Symbol p; {
    (X86 defsymbol 521)
}

Static locals get a generated name to avoid other static locals of the same
name:
(X86 defsymbol 521)=
if (p->scope >= LOCAL && p->sclass == STATIC)
    p->x.name = stringf("L%d", genlabel(1));
Generated symbols already have a unique numeric name. defsymbol simply
prefixes a letter to make a valid assembler identifier:
(X86 defsymbol 521)+=
else if (p->generated)
    p->x.name = stringf("L%s", p->name);


Conventions for exported globals prefix an underscore to the name:
(X86 defsymbol 521)+=
else if (p->scope == GLOBAL || p->sclass == EXTERN)
    p->x.name = stringf("_%s", p->name);
Hexadecimal constants must be reformatted. Where the front end uses
0xff, the X86 assembler expects 0ffH:
(X86 defsymbol 521)+=
else if (p->scope == CONSTANTS
&& (isint(p->type) || isptr(p->type))
&& p->name[0] == '0' && p->name[1] == 'x')
    p->x.name = stringf("0%sH", &p->name[2]);
The front end and back ends share the same name for the remaining
symbols, such as decimal constants and static globals:
(X86 defsymbol 521)+=
else
    p->x.name = p->name;
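The naming rules that defsymbol applies can be collected into one
stand-alone function. This is only a sketch under stated assumptions:
the enum symkind tag and the function x86_name are hypothetical, whereas
lcc itself tests the scope, sclass, and generated fields of the symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol categories; lcc tests scope, sclass, and
   generated fields instead of a single tag. */
enum symkind { STATIC_LOCAL, GENERATED, EXPORTED, INT_CONSTANT, OTHER };

/* Write the assembler name for a symbol into out. label is used
   only for static locals, which get a fresh generated name. */
static void x86_name(char *out, size_t size, enum symkind kind,
                     const char *name, int label) {
    assert(out != NULL && name != NULL);
    if (kind == STATIC_LOCAL)
        snprintf(out, size, "L%d", label);
    else if (kind == GENERATED)
        snprintf(out, size, "L%s", name);
    else if (kind == EXPORTED)
        snprintf(out, size, "_%s", name);
    else if (kind == INT_CONSTANT && strncmp(name, "0x", 2) == 0)
        snprintf(out, size, "0%sH", name + 2);   /* 0xff becomes 0ffH */
    else
        snprintf(out, size, "%s", name);         /* e.g., decimal constants */
}
```

With this sketch, the constant 0xff is renamed 0ffH, an exported main
becomes _main, and a decimal constant such as 255 keeps its spelling.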
The interface procedure address does for symbols that use offset
arithmetic, like _up+28, what defsymbol does for ordinary symbols:
(X86 functions 498)+=
static void address(q, p, n) Symbol q, p; int n; {
    if (p->scope == GLOBAL
    || p->sclass == STATIC || p->sclass == EXTERN)
        q->x.name = stringf("%s%s%d",
            p->x.name, n >= 0 ? "+" : "", n);
    else {
        q->x.offset = p->x.offset + n;
        q->x.name = stringd(q->x.offset);
    }
}
For variables on the stack, address simply computes the adjusted offset.
For variables accessed using a label, it sets x.name to a string of the
form name±n. If the offset is positive, the literal "+" emits the
operator; if the offset is negative, the %d emits it.
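The sign handling in the format string is easy to check in isolation.
The sketch below assumes a hypothetical helper, label_plus_offset, that
uses the standard snprintf in place of lcc's stringf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "name+n" or "name-n" into out. */
static void label_plus_offset(char *out, size_t size,
                              const char *name, int n) {
    /* Room for the name, a sign, up to 10 digits, and the NUL. */
    assert(out != NULL && strlen(name) + 12 < size);
    /* For n >= 0 the literal "+" supplies the operator;
       for n < 0, %d prints the minus sign itself. */
    snprintf(out, size, "%s%s%d", name, n >= 0 ? "+" : "", n);
}
```

Given the name _up, the sketch produces _up+28 for an offset of 28 and
_up-4 for an offset of -4.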
The front end calls defconst to emit assembler directives to allocate
and initialize a scalar to a constant. The argument ty identifies the
proper member of the union v:
(X86 functions 498)+=
static void defconst(ty, v) int ty; Value v; {
    switch (ty) {
    (X86 defconst 522)
    }
}
Most cases simply emit the member into an assembler directive that al-
locates and initializes a cell of the type ty:
(X86 defconst 522)=
case C: print("db %d\n", v.uc); return;
case S: print("dw %d\n", v.ss); return;
case I: print("dd %d\n", v.i ); return;
case U: print("dd 0%xH\n", v.u ); return;
case P: print("dd 0%xH\n", v.p ); return;
The assembler's real4 and real8 directives are unusable because they
can't express floating-point constants that result from arbitrary
expressions (e.g., with casts), so defconst emits floating-point
constants in hexadecimal:
(X86 defconst 522)+=
case F:
    print("dd 0%xH\n", *(unsigned *)&v.f);
    return;
The two halves of each double must be exchanged if lcc is running on
a little-endian machine and compiling for a big-endian one, or vice versa:
(X86 defconst 522)+=
case D: {
    unsigned *p = (unsigned *)&v.d;
    print("dd 0%xH,0%xH\n", p[swap], p[1 - swap]);
    return;
}
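The hexadecimal encoding of floating-point constants can be tried on its
own. The sketch below is not lcc's code: float_directive and
double_directive are hypothetical names, it assumes a 32-bit unsigned
and IEEE 754 formats on the host, and it copies the bits with memcpy
where the fragments above use a pointer cast.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a float as the directive "dd 0<hex>H". */
static void float_directive(char *out, size_t size, float f) {
    unsigned bits;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);    /* copy the bit pattern unchanged */
    snprintf(out, size, "dd 0%xH", bits);
}

/* Format a double as two 32-bit halves. swap is 0 to keep the host's
   order of the halves and 1 to exchange them. */
static void double_directive(char *out, size_t size, double d, int swap) {
    unsigned p[2];

    assert(sizeof p == sizeof d && (swap == 0 || swap == 1));
    memcpy(p, &d, sizeof p);
    snprintf(out, size, "dd 0%xH,0%xH", p[swap], p[1 - swap]);
}
```

Under those assumptions, float_directive formats 1.0f as dd 03f800000H,
and calling double_directive with swap set to 1 exchanges the two words
that it prints with swap set to 0.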
The interface procedure defaddress allocates space for a pointer and
initializes it to a symbolic address:
(X86 functions 498)+=
static void defaddress(p) Symbol p; {
    print("dd %s\n", p->x.name);
}

defconst's switch case for pointers initializes a pointer to a numeric
address.
The interface procedure defstring emits directives that initialize a
series of bytes:
(X86 functions 498)+=
static void defstring(n, str) int n; char *str; {
    char *s;

    for (s = str; s < str + n; s++)
        print("db %d\n", (*s)&0377);
}

It finds the end of the string by counting, because ANSI C escape codes
permit strings with embedded null bytes.
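Counting instead of scanning for a terminator is what lets an embedded
null byte survive. The sketch below illustrates the point with a
hypothetical function, format_bytes, that writes the directives into a
caller-supplied buffer instead of calling lcc's print.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of defstring: append one "db" directive per
   byte to out and return the number of characters written. */
static size_t format_bytes(char *out, size_t size, int n, const char *str) {
    size_t used = 0;
    const char *s;

    assert(out != NULL && size > 0 && n >= 0);
    out[0] = '\0';
    for (s = str; s < str + n; s++) {
        /* The mask makes bytes of 0x80 and above print as 128..255
           even where char is signed. */
        int w = snprintf(out + used, size - used, "db %d\n", (*s) & 0377);
        assert(w > 0 && (size_t)w < size - used);  /* buffer must be large enough */
        used += (size_t)w;
    }
    return used;
}
```

Passing the four bytes of "a\0b\377" with n equal to 4 yields the lines
db 97, db 0, db 98, and db 255. A loop that stopped at the first null
byte would emit only the first of them.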
The front end calls export to expose a symbol to other modules. The
public assembler directive does just that:
(X86 functions 498)+=
static void export(p) Symbol p; {
    print("public %s\n", p->x.name);
}

The extern directive makes visible in the current module a symbol
exported by another module, but it may not appear inside a segment, so
the interface procedure import temporarily switches out of the current
segment:
(X86 functions 498)+=
static void import(p) Symbol p; {
    int oldseg = cseg;

    if (p->ref > 0) {
        segment(0);
        print("extrn %s:near\n", p->x.name);
        segment(oldseg);
    }
}

The near directive declares that the external can be addressed directly.
The flat memory model and its 32-bit addresses permit direct addresses
for everything, so it's unnecessary to understand near and the related
directives unless one is generating segmented code, which is harder.

lcc's implementation of segment for the X86 takes care that the call
segment(0) switches out of the current segment but not into any new
segment. import checks the symbol's ref field to emit the directives
only if the symbol is used, because some X86 linkers object to gratuitous
extrns.
The front end calls the interface procedure global to define a new
global. If the global is initialized, the front end next calls defconst, so
global allocates space only for uninitialized globals, which are in the
BSS segment:
(X86 functions 498)+=
static void global(p) Symbol p; {
    print("align %d\n",
        p->type->align > 4 ? 4 : p->type->align);
    print("%s label byte\n", p->x.name);
    if (p->u.seg == BSS)
        print("db %d dup (0)\n", p->type->size);
}

The front end calls the interface procedure space to define a block of
global data initialized to zero:
(X86 functions 498)+=
static void space(n) int n; {
    if (cseg != BSS)
        print("db %d dup (0)\n", n);
}

Further Reading

Various reference manuals elaborate on the architecture of this
machine (Intel Corp. 1993). The assembler manuals that come with
Microsoft's MASM and Borland's Turbo Assembler elaborate on the
assembler language in general and the directives that control the
various memory models in particular.

Exercises

18.1 Scan the X86 reference manual for instructions that lcc could use
but doesn't. Add rules to emit these instructions. Benchmark the
compiler before and after each change to determine which changes
pay off.
18.2 Some of lcc's opcodes commute, which means that for every rule
like

reg: ADDI(reg,mrc1) "mov %c,%0\nadd %c,%1\n" 2

we might also have a rule

reg: ADDI(mrc1,reg) "mov %c,%1\nadd %c,%0\n" 2

Experiment with adding some commuted rules. Which ones make a
significant difference? Which can't make a difference because the
front end never generates them?
18.3 Some noncommutative operations have a dual that exchanges their
operands. For example, the rule

stmt: GTI(reg,mrc1) "cmp %0,%1\njg %a\n" 2

has the dual

stmt: GTI(mrc1,reg) "cmp %1,%0\njl %a\n" 2

because x > y if and only if y < x. Try to find some X86 dual
rules that pay off.
18.4 rep movsb copies ecx bytes one at a time. rep movsw copies ecx
16-bit units about twice as fast, and rep movsd copies ecx 32-bit
units another rough factor of two faster. Change the block-copy
code to exploit these instructions when it can.
18.5 lcc's function prologue and epilogue save and restore ebx, esi,
and edi even if the routine doesn't touch them. Correct this blemish
and determine if it was worth the effort.
18.6 Reserve one general register and assign it to the most promising
local. Measure the improvement. Repeat the experiment for more
registers. Which number of register variables gives the best result?
18.7 lcc emits lea edi,1[edi] for the addition in f(i+1). We'd prefer
inc edi, but it's hard to adapt the X86 code generator to emit that
code for this particular case. Explain why.
18.8 Construct a small C program that draws ckstack's diagnostic.
18.9 Revise the X86 code generator to spill and reload floating-point
registers without help from the programmer. See the discussion of
ckstack.
19
Retrospective

lcc is one way to build a C compiler. Hundreds of technical decisions
were made during lcc's design and implementation, and there are viable
alternatives for many of them. The exercises in the previous chapters
suggest some alternatives. This chapter looks back at lcc's design and
discusses some of the global design alternatives that would most affect
the current implementation. These alternatives are the ones that, with
the benefit of hindsight, we might now prefer.
Many of the programming techniques used in lcc, such as Chapter 2's
storage allocator and the string management described in Section 2.5,
are useful in a wide range of applications. The symbol-table module,
described in Chapter 3, is specific to lcc, but can be easily adapted to
other applications that need similar functions, and the input module in
Section 6.1 can be used anywhere high-speed input is important.
The parsing techniques detailed in Chapter 8 are useful in applications
that must parse and evaluate expressions, such as spreadsheets. Even
lburg has applications beyond its use for selecting instructions, as
described in Chapter 14. The matchers lburg generates know little about
lcc's nodes and they can be used for problems that boil down to
matching patterns in trees. The approach epitomized by lburg - generating
a program from a compact specification of its salient attributes - has
wide applicability. Other compilers routinely use this approach for
generating lexical analyzers and parsers with tools like LEX and YACC, for
example.

19.1 Data Structures


Sharing data structures between the front end and the code generator
is manageable because there are few such structures. A disadvantage
of this approach, however, is that the structures are more complex than
they might be in other designs, which compromises simplicity. For exam-
ple, symbols represent all identifiers across the interface. Symbols have
many fields, but some are relevant only to the front end, and access to
them can be regulated only by convention. Some symbols use only a few
of the fields; labels, for example, use only the name field and the fields
in u.l. A data structure tailored to labels would be much less cluttered.
C shares the blame for this complexity: Specifying all of the
possibilities requires a type system richer than C's. Some of the
complexity might be avoided by defining separate structures - for
example, one for each kind of symbol and another for private front-end
data - but doing so increases the data-structure vocabulary and hence
complexity.
Type systems with inheritance simplify defining variants of a structure
without also complicating uses of those variants. The type systems in
object-oriented languages, such as Oberon-2, Modula-3, and C++, have
the necessary machinery. In these languages, we would define a base
symbol type with only the fields common to all symbols, and separate
types for each kind of symbol. These types would use inheritance to
extend the base type with symbol-specific fields.
In Modula-3, for example, the base type might be defined simply as
TYPE Symbol = OBJECT
name: TEXT
END;
which defines an object type with one field, name, that holds a string. A
type for labels would add fields specific to labels:
TYPE Label = Symbol OBJECT
label: INTEGER;
equatedto: Label
END;
which defines Label to be an object type with all of Symbol's fields plus
the two label-specific fields. Procedures that manipulate Symbols can
also manipulate Labels, because a Label is also a Symbol.
The same mechanism could be used for the other data structures, such
as types, trees, and nodes. The back-end extensions - the x fields of
symbols and nodes - would be unnecessary because the back end could
define additional types that extend front-end types with target-specific
fields.
Object-oriented languages also support methods, which are proce-
dures that are associated with and operate on values of a specific type.
Methods would replace some of the interface functions, and they would
eliminate switch statements like the ones in the implementations of
defconst, because the methods would be applied to only specific types.

19.2 Interface
lcc's code-generation interface is compact because it omits the
inessential and makes simplifying assumptions. These omissions and
assumptions do, however, limit the interface's applicability to other
languages and machines.
The interface assumes that signed and unsigned integers and long
integers all have the same size. This assumption lets lcc make do with
nine type suffixes and 108 type-specific operators, but it complicates
full use of some 64-bit machines. If we had it to do over again, we might
use distinct type suffixes for signed and unsigned characters, shorts,
integers, and long integers, and for floats, doubles, and long doubles.
We've even considered backing such types into lcc, though it's hard to be
enthusiastic about the chore. For example, adding a suffix for just long
doubles would add at least 19 operators and code in both the front and
back ends to handle them. This change wouldn't need a lot of additional
code in a few places; it would need a few lines of code in many places.
Another alternative is for the suffixes to denote only datatype, not size,
and to add separate suffixes for each size. For example, ADDI2 and ADDI4
would denote addition of 2-byte and 4-byte integers. The sizes could also
be carried elsewhere in a node instead of being encoded in the operator
names.
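One way to picture the alternative is an operator code that packs the
generic operation, the datatype, and the size into separate bit fields.
The layout below is purely an assumption made for illustration: the
names make_op, op_generic, op_size, and op_datatype, the enumerators,
and the field widths are all hypothetical and do not describe the
interface in this book.

```c
#include <assert.h>

enum generic_op { ADD = 1, SUB = 2, MUL = 3 };
enum datatype { TYPE_I = 1, TYPE_U = 2, TYPE_F = 3 };  /* datatype only, no size */

/* Pack a generic operator, a datatype, and a size in bytes into one
   code: bits 0-3 hold the datatype, bits 4-9 the size, and the
   remaining bits the generic operator. */
static int make_op(enum generic_op op, enum datatype type, int size) {
    assert(size > 0 && size < 64);
    return ((int)op << 10) | (size << 4) | (int)type;
}

static int op_generic(int code)  { return code >> 10; }
static int op_size(int code)     { return (code >> 4) & 0x3F; }
static int op_datatype(int code) { return code & 0xF; }
```

In this sketch, ADDI2 would be make_op(ADD, TYPE_I, 2) and ADDI4 would
be make_op(ADD, TYPE_I, 4). The two codes differ only in the size field,
so code that cares only about the generic operation can ignore it.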
The interface assumes that all pointers have the same representation.
This assumption complicates targeting word-addressed machines, where
pointers to units smaller than a word - like characters - need extra bits
to identify a unit within the word. Differentiating between character and
word pointers would add another suffix and at least 13 more operators.
We don't regret this assumption yet, but we haven't targeted a word-
addressed machine yet either.
The operator repertoire omits some operators whose effect can be
synthesized from simpler ones. For example, bit fields are accessed with
shifting and masking instead of specific bit-field operators, which may
complicate thorough exploitation of machines with bit-field instructions.
On the other hand, the front end special-cases one-bit fields and
generates efficient masking dags, which often yields better code than
code that uses bit-field instructions.
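The shifting and masking can be written out directly in C. The sketch
below only illustrates the technique and is not the dags that lcc's
front end builds: get_field, set_field, and test_bit are hypothetical
names, and the code assumes that unsigned has at least 32 bits.

```c
#include <assert.h>

/* Extract an unsigned field of width bits whose least significant
   bit is at position lsb. */
static unsigned get_field(unsigned word, int lsb, int width) {
    assert(lsb >= 0 && width > 0 && width < 32 && lsb + width <= 32);
    return (word >> lsb) & ((1u << width) - 1);
}

/* Store value into the field, leaving the other bits unchanged. */
static unsigned set_field(unsigned word, int lsb, int width, unsigned value) {
    unsigned mask;

    assert(lsb >= 0 && width > 0 && width < 32 && lsb + width <= 32);
    mask = ((1u << width) - 1) << lsb;
    return (word & ~mask) | ((value << lsb) & mask);
}

/* Testing a one-bit field needs only a mask, with no shift. */
static int test_bit(unsigned word, int bit) {
    assert(bit >= 0 && bit < 32);
    return (word & (1u << bit)) != 0;
}
```

For example, get_field(0xABCD1234u, 8, 8) yields 0x12. The one-bit case
shows why a special case can pay off: the test reduces to a single AND
with a constant.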
The interface has gone through several revisions and has been simpli-
fied each time by moving functionality into the front end or by pruning
the interface vocabulary. For example, earlier versions had an interface
function and an operator to implement switches. Each revision made the
back ends smaller, but blemishes remain.
On one hand, we may have moved too much into the front end. For
example, there were once operators for such holes in the opcode × type
matrix as INDIRU, RETP, and CVUD; cutting the redundancy saved a
little code, but a more regular operator set would be easier to learn.
As another example, the back end doesn't see the code list and can
traverse it only via gencode and emitcode. Several people have used lcc
to study global optimizations, and some found that they needed finer
control over the traversals. To get this control, they had to expose the
code list to the back end - that is, have the code list be the interface
- and move more ambitious versions of gencode and emitcode into the
target-independent part of the back end. This change replaces interface
functions like local with the equivalent code-list entry. An interface that
exposed the code list - or a flow graph - together with standard
implementations of gencode and emitcode would permit clients to choose
between simplicity and flexibility.
On the other hand, the interface could be simpler yet. For example,
ASGN and CALL have type-specific variants that take different numbers
of operands. This variability complicates decisions that otherwise could
be made by inspecting only the generic operation. Operators that al-
ways generate trivial target code are another example. A few operators
generate nothing on some targets, but some, like CVUI and CVIU, gener-
ate nothing on all current or conceivable targets. Production back ends,
like those described in this book, take pains to avoid generating vacuous
register-to-register moves for these operators. Similarly, the narrowing
conversions CV{UI} × {CS} are vacuous on all targets and might well be
omitted.
Several interface conventions, if not obeyed, can cause subtle errors.
For example, the interface functions local and function, and the code
for the operator CALLB collaborate to generate code for functions that
return structures. Three sites in the back end must cooperate perfectly,
or the compiler will silently generate incorrect code. The front end could
deal with such functions completely and thus eliminate the interface flag
wants_callb, but this would exclude some established calling sequences.
Similar comments apply to ARGB and the flag wants_argb. The trade-off
for generating compatible calling-sequence code is a more complex
code-generation interface.
lcc's interface was designed for use in a monolithic compiler in which
the front end and back ends are linked together into a single program.
This design complicates separating the front and back ends into separate
programs. Some of the interaction is two-way; the upcalls from the
interface function function to gencode and emitcode are examples. These
upcalls permit the front end to generate conversion code required at
function entry. The back end examines few fields in the source-language
type representation; it uses front-end functions like isstruct to query
types. To make the back end a separate program, type data must be
transmitted to answer such queries, and the back end might have to
implement the function entry conversions.

19.3 Syntactic and Semantic Analyses


lcc interleaves parsing and semantic analyses. This approach is typical
of many compilers based on the classical design for recursive-descent
parsers that has been used widely since the early 1960s. It's easy to
understand and to implement by hand, and it yields fast compilers.
Many languages, such as C, were designed for one-pass compilation,
in which code is emitted as the source program is consumed, as in lcc.
Most languages have a declaration-before-use rule: They insist that
identifiers be declared before they are used, except in specific contexts,
and they provide mechanisms that help programmers comply. For example,
the C declaration
extern Tree (*optree[])(int, Tree, Tree);
declares, but does not define, optree so that it can be used before it's
defined. Other examples include the forward structure declaration de-
scribed on page 276 and Pascal's forward declaration. The sole purpose
of these kinds of declarations is to make one-pass compilation possible.
Modern languages, such as Modula-3 and ML, have no such rules.
In Modula-3, for example, declarations are definitions; they introduce
names for constants, types, variables, exceptions, and procedures, and
they can appear in any order. The order in which they appear affects
only the order in which initializations are executed. This flexibility can
be confusing at first, but Modula-3 has fewer linguistic rules and special
cases than does C, which makes it easier to understand in the long run.
Languages with these kinds of features demand multiple-pass compil-
ers because the entire source must be consumed in order to resolve the
dependencies between declarations, for example. These compilers sepa-
rate syntax analysis from semantic analysis because they must. The first
pass usually builds an AST (abstract syntax tree) for the entire source
program, and subsequent passes traverse the AST adding pass-specific
annotations. For example, the declarations pass in a Modula-3 com-
piler analyzes only declarations, builds symbol tables, and annotates the
nodes in the AST with pointers to symbol-table entries. The code genera-
tion passes might visit procedure nodes and their descendants, generate
code, and annotate procedure nodes with the equivalent of lcc's code
lists.
lcc's one-pass approach has its advantages. It consumes less memory
than ASTs require, and it can be faster because simple constructs don't
pay for the time overhead associated with building and traversing ASTs.
Initializations of large arrays exemplify these advantages; lcc compiles
them in space proportional to only their largest single initializer, and
can thus handle initializations of any size. Compilers that use ASTs
usually build a tree for an entire list of initializers, and thus may
limit the maximum size of an initialization in order to avoid excessive
memory use.
On modern computers, however, the time and space efficiency of
one-pass compilers is no longer as important as the flexibility of the
AST approach. Separating the various compilation passes into AST
traversals can simplify the code for each pass. This approach would
simplify lcc's modules that parse and analyze declarations, expressions,
and statements, and it would make the corresponding chapters in this
book easier to understand.
Using ASTs would also make it easier to use lcc for other purposes.
Parts of lcc have been used to build browsers, front ends for other back
ends, back ends for other front ends, and link-time and run-time code
generators, and it has been used to generate code from within an
interpreter and a debugger. lcc's design did not anticipate some of these
uses, and at least some of these projects would have been easier if lcc
had built ASTs and let clients traverse and annotate them.

19.4 Code Generation and Optimization


Code generation requires trade-offs. Ambitious optimizers emit better
code, but they're bigger and slower. A bigger compiler would've taken
us longer and wouldn't have fit in a book, and a slower compiler would
cost time for us and for all programmers, for whom compilation time is
often a bottleneck. So lcc emits satisfactory code, but other compilers
can beat it on this score.
lcc's instruction selection is optimal for each tree in isolation, but the
boundaries between the code for adjacent trees may be suboptimal. lcc
would benefit from a final peephole optimization pass to clean up such
problems. Of the various optimizations that one might add, however,
this one is probably the simplest, but our past experience suggests it
would yield the least.
lcc's interface is designed to support only code generation; it has no
direct support for building a flow graph or other structures that facilitate
global optimization. More elaborate versions of function and gen could
collaborate to build the relevant structures, perform optimizations, and
invoke the simpler gen, but generating flow graphs from ASTs is a more
general solution.
lcc's register allocator is rudimentary. It allocates some variables and
local common subexpressions to registers, but in all other respects it is
minimal. A modern graph-coloring register allocator would do better. We
resisted a more ambitious register allocator mainly because we estimated
that it would add over 1,000 lines, or roughly 10%, to the compiler, and
we already had to omit parts of the compiler from this book.
lcc's SPARC code needs instruction scheduling now, and other targets
are likely to need scheduling in the future. Ideally, scheduling interacts
with register allocation, but a postpass scheduler would probably be
simpler and would thus fit lcc better.

19.5 Testing and Validation


This book's companion diskette includes some programs that we use to
test lcc at every change. This first-level testing compares the emitted
assembler code and the output of the assembled program with saved
baseline assembler code and output. Sometimes we expect the assembler
code to change, so the first comparison can tell us nothing, but it's worth
doing because sometimes it fails unexpectedly and thus tells us that a
change to the compiler went overboard.
We also test, though somewhat less often, using the language confor-
mance section of the Plum-Hall Validation Suite for ANSI C compilers
and with a large set of numeric programs translated from Fortran. The
numeric programs have more variables, longer expressions, and more
common subexpressions than the other tests, which strains the register
allocator and thus tests the spiller better. Spills are rare, so spillers are
often hard to test.
lcc's test suite includes material that came to us as bug reports, but
we wish we'd saved more. lcc has been in use at AT&T Bell Laboratories
and Princeton University since 1988 and at many other sites since then.
Many errors have been reported, diagnosed, and corrected. Electronic
news summarized each repair for users at Bell Laboratories and Prince-
ton, so that users might know if they needed to discard old binaries. We
recorded all the news messages, but next time we'd record more.
First, we'd record the shortest possible input that exposes each bug.
Just finding this input can be half the battle. Some bug reports were
nothing more than a note that lcc's code for the program gave a wrong
answer and a pointer to a directory full of source code. It's hard to find
a compiler error when all you have is a large, unknown source program
and thousands of lines of object code. We usually start by trimming
the program until another cut causes the bug to vanish. Almost all bugs
have, in the end, been demonstrated by sample code of five lines or fewer.
Next time, we'd save these programs with sample input and output, and
create a test harness that would automatically recheck them. One must
resist the temptation to omit bugs deemed too arcane to reoccur. We've
sometimes reintroduced an old bug when fixing a new one, and thus had
to track and fix the old one a second time. A test harness would probably
pay for itself after one or two reintroduced bugs.
We'd also link at least some bugs with the code that corrects them.
lcc was not originally written as a literate program; the English here
was retrofitted to the code. In retrofitting it, we encountered several
compiler fragments that we could no longer explain immediately. Most of them
turned out to repair bugs, but we'd have saved time if we'd kept more
sample bugs - that is, the source code and sample input and output -
nearby in comments or, now, in possibly elided fragments of the literate
program.
Another kind of test suite would help retargeters. When writing a
back end for a new target, we don't implement the entire code generator
before we start testing. Instead, we implement enough to compile, say,
the trivial program
main() {
    printf("Hello world\n");
}

When we get lcc to compile that correctly, we trust - perhaps naively -
all simple function calls and use them to test another primitive feature
that is needed for most other testing. Integer assignment is a typical
second step:
main() {
    int i = 0;
    printf("%d\n", i);
}

We continue testing with a series of similar programs. Each is simple,
tests exactly one new feature, and uses as few other features as possible,
in order to minimize the amount of assembler code and compiler traces
we must read if the test program fails. We never took the time to collect
the tiny test programs as a guide for future retargetings, but doing so
would have saved us time in the long run, and it would save you time
when you write an lcc back end for your favorite computer.

Further Reading
Schreiner and Friedman (1985) describe how to use LEX (Lesk 1975) and
YACC (Johnson 1975) by building a toy compiler for a small language.
Holub (1990) and Gray et al. (1992) describe more modern variants of
these compiler tools and how to implement them.
Budd (1991) is a gentle introduction to object-oriented programming
and object-oriented programming languages; he describes Smalltalk,
C++, Object Pascal, and Objective-C. The reference manuals for C++
(Ellis and Stroustrup 1990), Oberon-2 (Mössenböck and Wirth 1991), and
Modula-3 (Nelson 1991) are the definitive sources for those languages.
Ramsey (1993) adapted lcc to be an expression server for the
retargetable debugger ldb. The server accepts a C expression entered during
debugging and a symbol table, compiles the expression as if it appeared
in a context described by the supplied symbol table, and evaluates it.
Ramsey wrote a back end that emits PostScript instead of assembler
language, and ldb's embedded PostScript interpreter evaluates the
generated code and thus evaluates the expression. He also modified lcc to
emit ldb symbol tables.
Appel (1992) describes a research compiler for ML that builds ASTs
and makes more than 30 passes over them during compilation.
Our paper describing an earlier version of lcc (Fraser and Hanson
1991b) compares lcc's size and speed and the speed of its generated
code with the vendor's compilers and with gcc on the VAX, Motorola
68020, SPARC, and MIPS R3000. lcc generated code that was usually
better than the code generated by the commercial compiler without
optimization enabled. A companion paper gives measurements that support
our intuition that register spills are rare (Fraser and Hanson 1992).
Lamb (1981) describes a typical peephole optimizer. The peephole
optimizer copt is about the simplest possible; it is available by
anonymous ftp from research.att.com. Davidson and Fraser (1984) describe
a peephole optimizer driven by a formal description of the target
machine.
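
As a rough illustration of what a peephole optimizer does, the following
C sketch slides a two-instruction window over an array of instructions
and drops a move that merely copies back the value the previous move
just copied. It is not copt's rule language, nor the design of any paper
cited here; the struct insn record, the single hard-wired rule, and the
function names are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical instruction record: "op dst,src" means dst <- src. */
struct insn {
    const char *op;
    const char *dst;
    const char *src;
};

/* True if b is "mov X,Y" directly after a's "mov Y,X": b stores back
   a value that X already holds, so b can be deleted. */
static int redundant_pair(const struct insn *a, const struct insn *b)
{
    return strcmp(a->op, "mov") == 0 && strcmp(b->op, "mov") == 0
        && strcmp(a->dst, b->src) == 0 && strcmp(a->src, b->dst) == 0;
}

/* Compact code[0..n-1] in place, deleting redundant moves.
   Returns the number of instructions that remain. */
int peephole(struct insn *code, int n)
{
    int out = 0;

    assert(code != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        /* Compare against the last instruction kept, not code[i-1],
           so the window always covers adjacent surviving code. */
        if (out > 0 && redundant_pair(&code[out - 1], &code[i]))
            continue;
        code[out++] = code[i];
    }
    return out;
}
```

A practical optimizer reads its patterns from a table or a machine
description instead of hard-wiring one rule, and it must know which
operands may be changed behind its back.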
Chaitin et al. (1981) describe register allocation by graph coloring,
and Krishnamurthy (1990) surveys some of the literature in instruction
scheduling. Proebsting and Fischer (1991) describe one of the simplest
integrations of register allocation and instruction scheduling.
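
The core of coloring-based allocation can be sketched briefly, assuming a
symmetric interference matrix and ignoring spilling and coalescing, both
of which a real allocator must handle. The code below is a simplified
rendering of the simplify-and-select idea, not the algorithm as published;
color_graph, MAXN, and the matrix representation are assumptions of this
example.

```c
#include <assert.h>

#define MAXN 16   /* illustrative limit on the number of live ranges */

/*
 * Color an interference graph with k registers.
 * adj[i][j] != 0 means live ranges i and j interfere (adj is symmetric).
 * Returns 1 and fills color[0..n-1] with values in 0..k-1 on success;
 * returns 0 when simplification gets stuck, which is where a real
 * allocator would choose a live range to spill.
 */
int color_graph(int n, int adj[MAXN][MAXN], int k, int color[MAXN])
{
    int removed[MAXN] = {0};
    int stack[MAXN];
    int top = 0;

    assert(n >= 0 && n <= MAXN && k > 0);

    /* Simplify: repeatedly remove a node with fewer than k neighbors. */
    while (top < n) {
        int picked = -1;
        for (int i = 0; i < n && picked < 0; i++) {
            if (removed[i])
                continue;
            int degree = 0;
            for (int j = 0; j < n; j++)
                if (j != i && !removed[j] && adj[i][j])
                    degree++;
            if (degree < k)
                picked = i;
        }
        if (picked < 0)
            return 0;   /* every remaining node has degree >= k */
        removed[picked] = 1;
        stack[top++] = picked;
    }

    /* Select: color nodes in reverse order of removal. */
    for (int i = 0; i < n; i++)
        color[i] = -1;
    while (top > 0) {
        int v = stack[--top];
        for (int c = 0; c < k && color[v] < 0; c++) {
            int clash = 0;
            for (int j = 0; j < n; j++)
                if (j != v && adj[v][j] && color[j] == c)
                    clash = 1;
            if (!clash)
                color[v] = c;
        }
        /* v had fewer than k neighbors when removed, so a color exists. */
        assert(color[v] >= 0);
    }
    return 1;
}
```

Three mutually interfering live ranges, for example, can be colored with
three registers but not with two.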
Bibliography

Aho, A. V., and S. C. Johnson. 1974. LR parsing. ACM Computing
Surveys 6(2), 99-124.
Aho, A. V., and S. C. Johnson. 1976. Optimal code generation for
expression trees. Journal of the ACM 23(3), 488-501.
Aho, A. V., R. Sethi, and J. D. Ullman. 1986. Compilers: Principles, Tech-
niques, and Tools. Reading, MA: Addison Wesley.

American National Standards Institute, Inc. 1990. American National
Standard for Information Systems, Programming Language C ANSI
X3.159-1989. New York: American National Standards Institute, Inc.
Appel, A. W. 1991. Garbage collection. In P. Lee, Ed., Topics in Advanced
Language Implementation Techniques, 89-100. Cambridge, MA: MIT
Press.
Appel, A. W. 1992. Compiling with Continuations. Cambridge: Cambridge
University Press.
Baskett, F. 1978. The best simple code generation technique for while,
for, and do loops. SIGPLAN Notices 13(4), 31-32.
Bernstein, R. L. 1985. Producing good code for the case statement.
Software-Practice and Experience 15(10), 1021-1024.
Boehm, H.-J., and M. Weiser. 1988. Garbage collection in an uncooperative
environment. Software-Practice and Experience 18(9), 807-820.
Budd, T. A. 1991. An Introduction to Object-Oriented Programming. Read-
ing, MA: Addison Wesley.
Bumbulis, P., and D. D. Cowan. 1993. RE2C: A more versatile scanner
generator. ACM Letters on Programming Languages and Systems 2(1-
4), 70-84.
Burke, M. G., and G. A. Fisher. 1987. A practical method for LR and LL syn-
tactic error diagnosis. ACM Transactions on Programming Languages
and Systems 9(2), 164-197.
Chaitin, G. J., M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins,
and P. W. Markstein. 1981. Register allocation via coloring. Journal of
Computer Languages 6, 47-57.

Cichelli, R. J. 1980. Minimal perfect hash functions made simple.
Communications of the ACM 23(1), 17-19.
Clinger, W. D. 1990. How to read floating-point numbers accurately.
Proceedings of the SIGPLAN'90 Conference on Programming Language
Design and Implementation, SIGPLAN Notices 25(6), 92-101.
Davidson, J. W., and C. W. Fraser. 1984. Code selection through object
code optimization. ACM Transactions on Programming Languages and
Systems 6(4), 505-526.
Davie, A. J. T., and R. Morrison. 1981. Recursive Descent Compiling. New
York: John Wiley & Sons.
Ellis, M. A., and B. Stroustrup. 1990. The Annotated C++ Reference Man-
ual. Reading, MA: Addison Wesley.

Fischer, C. N., and R. J. LeBlanc, Jr. 1991. Crafting a Compiler with C.
Redwood City, CA: Benjamin/Cummings.
Fraser, C. W. 1989. A language for writing code generators. Proceedings
of the SIGPLAN'89 Conference on Programming Language Design and
Implementation, SIGPLAN Notices 24(7), 238-245.

Fraser, C. W., and D. R. Hanson. 1991a. A code generation interface for
ANSI C. Software-Practice and Experience 21(9), 963-988.
Fraser, C. W., and D. R. Hanson. 1991b. A retargetable compiler for
ANSI C. SIGPLAN Notices 26(10), 29-43.
Fraser, C. W., and D. R. Hanson. 1992. Simple register spilling in a
retargetable compiler. Software-Practice and Experience 22(1), 85-99.
Fraser, C. W., D. R. Hanson, and T. A. Proebsting. 1992. Engineering a
simple, efficient code-generator generator. ACM Letters on Programming
Languages and Systems 1(3), 213-226.
Fraser, C. W., R. R. Henry, and T. A. Proebsting. 1992. BURG-Fast optimal
instruction selection and tree parsing. SIGPLAN Notices 27(4), 68-76.
Freiburghouse, R. A. 1974. Register allocation via usage counts. Commu-
nications of the ACM 17(11), 638-642.

Gray, R. W., V. P. Heuring, S. P. Levi, A. M. Sloane, and W. M. Waite. 1992.
Eli: A complete, flexible compiler construction system.
Communications of the ACM 35(2), 121-131.
Griswold, R. E. 1972. The Macro Implementation of SNOBOL4. San
Francisco: W. H. Freeman.

Hansen, W. J. 1992. Subsequence references: First-class values for
substrings. ACM Transactions on Programming Languages and
Systems 14(4), 471-489.
Hanson, D. R. 1974. A simple technique for representing strings in
Fortran IV. Communications of the ACM 17(11), 646-647.
Hanson, D. R. 1983. Simple code optimizations. Software-Practice and
Experience 13(8), 745-763.
Hanson, D. R. 1985. Compact recursive-descent parsing of expressions.
Software-Practice and Experience 15(12), 1205-1212.
Hanson, D. R. 1990. Fast allocation and deallocation of memory based on
object lifetimes. Software-Practice and Experience 20(1), 5-12.
Harbison, S. P., and G. L. Steele, Jr. 1991. C: A Reference Manual (third
edition). Englewood Cliffs, NJ: Prentice Hall.
Hennessy, J. L., and N. Mendelsohn. 1982. Compilation of the Pascal case
statement. Software-Practice and Experience 12(9), 879-882.
Heuring, V. P. 1986. The automatic generation of fast lexical analyzers.
Software-Practice and Experience 16(9), 801-808.

Holub, A. I. 1990. Compiler Design in C. Englewood Cliffs, NJ: Prentice
Hall.
Holzmann, G. J. 1988. Beyond Photography. Englewood Cliffs, NJ: Prentice
Hall.
Intel Corp. 1993. Intel486 Microprocessor Family Programmer's Reference
Manual. Intel Corp.

Jaeschke, G., and G. Osterburg. 1980. On Cichelli's minimal perfect hash
function method. Communications of the ACM 23(12), 728-729.
Johnson, S. C. 1975. YACC-Yet another compiler compiler. Technical
Report 32, Murray Hill, NJ: Computing Science Research Center, AT&T
Bell Laboratories.
Johnson, S. C. 1978. A portable compiler: Theory and practice. In
Conference Record of the ACM Symposium on Principles of Programming
Languages, Tucson, AZ, 97-104.

Kane, G., and J. Heinrich. 1992. MIPS RISC Architecture. Englewood Cliffs,
NJ: Prentice Hall.
Kannan, S., and T. A. Proebsting. 1994. Correction to 'producing good
code for the case statement'. Software-Practice and Experience 24(2),
233.

Kernighan, B. W., and R. Pike. 1984. The UNIX Programming Environment.
Englewood Cliffs, NJ: Prentice Hall.
Kernighan, B. W., and D. M. Ritchie. 1988. The C Programming Language
(second edition). Englewood Cliffs, NJ: Prentice Hall.
Knuth, D. E. 1973a. The Art of Computer Programming: Volume 1,
Fundamental Algorithms (second edition). Reading, MA: Addison Wesley.
Knuth, D. E. 1973b. The Art of Computer Programming: Volume 3, Sorting
and Searching. Reading, MA: Addison Wesley.
Knuth, D. E. 1984. The TeXbook. Reading, MA: Addison Wesley.
Knuth, D. E. 1992. Literate Programming. CSLI Lecture Notes Number 27.
Stanford, CA: Center for the Study of Language and Information, Stanford
University.
Krishnamurthy, S. M. 1990. A brief survey of papers on scheduling for
pipelined processors. SIGPLAN Notices 25(7), 97-106.
Lamb, D. A. 1981. Construction of a peephole optimizer. Software-
Practice and Experience 11(12), 639-648.

Lesk, M. E. 1975. LEX-A lexical analyzer generator. Technical Report 39,
Murray Hill, NJ: Computing Science Research Center, AT&T Bell
Laboratories.
Logothetis, G., and P. Mishra. 1981. Compiling short-circuit Boolean ex-
pressions in one pass. Software-Practice and Experience 11(11), 1197-
1214.
McKeeman, W. M., J. J. Horning, and D. B. Wortman. 1970. A Compiler
Generator. Englewood Cliffs, NJ: Prentice Hall.

Mössenböck, H., and N. Wirth. 1991. The programming language
Oberon-2. Structured Programming 12(4), 179-195.
Nelson, G. 1991. Systems Programming with Modula-3. Englewood Cliffs,
NJ: Prentice Hall.
Patterson, D. A., and J. L. Hennessy. 1990. Computer Architecture: A
Quantitative Approach. San Mateo, CA: Morgan Kaufmann.

Pelegri-Llopart, E., and S. L. Graham. 1988. Optimal code generation for
expression trees: An application of BURS theory. In Conference Record
of the ACM Symposium on Principles of Programming Languages, San
Diego, CA, 294-308.

Proebsting, T. A. 1992. Simple and efficient BURS table generation.
Proceedings of the SIGPLAN'92 Conference on Programming Language
Design and Implementation, SIGPLAN Notices 27(6), 331-340.
Proebsting, T. A., and C. N. Fischer. 1991. Linear-time, optimal code
scheduling for delayed-load architectures. Proceedings of the
SIGPLAN'91 Conference on Programming Language Design and
Implementation, SIGPLAN Notices 26(6), 256-267.
Ramsey, N. 1993. A Retargetable Debugger. Ph.D. diss., Princeton
University, Princeton, NJ.
Ramsey, N. 1994. Literate programming simplified. IEEE Software 11(5),
97-105.
Ramsey, N., and D. R. Hanson. 1992. A retargetable debugger. Proceedings
of the SIGPLAN'92 Conference on Programming Language Design and
Implementation, SIGPLAN Notices 27(7), 22-31.

Richards, M., and C. Whitby-Strevens. 1979. BCPL-The Language and Its
Compiler. Cambridge: Cambridge University Press.
Ritchie, D. M. 1993. The development of the C language. Preprints of the
Second ACM SIGPLAN History of Programming Languages Conference
(HOPL-II), SIGPLAN Notices 28(3), 201-208.
Sager, T. J. 1985. A polynomial time generator for minimal perfect hash
functions. Communications of the ACM 28(5), 523-532.
Schreiner, A. T., and H. G. Friedman, Jr. 1985. Introduction to Compiler
Construction with UNIX. Englewood Cliffs, NJ: Prentice Hall.

Sedgewick, R. 1990. Algorithms in C. Reading, MA: Addison Wesley.


Sethi, R. 1981. Uniform syntax for type expressions and declarators.
Software-Practice and Experience 11(6), 623-628.

SPARC International. 1992. The SPARC Architecture Manual, Version 8.
Englewood Cliffs, NJ: Prentice Hall.
Stallman, R. M. 1992. Using and porting GNU CC. Technical report, Cam-
bridge, MA: Free Software Foundation.
Steele, Jr., G. L., and J. L. White. 1990. How to print floating-point numbers
accurately. Proceedings of the SIGPLAN'90 Conference on Programming
Language Design and Implementation, SIGPLAN Notices 25(6), 112-126.
Stirling, C. 1985. Follow set error recovery. Software-Practice and
Experience 15(3), 239-257.

Tanenbaum, A. S., H. van Staveren, and J. W. Stevenson. 1982. Using
peephole optimization on intermediate code. ACM Transactions on
Programming Languages and Systems 4(1), 21-36.
Ullman, J. D. 1994. Elements of ML Programming. Englewood Cliffs, NJ:
Prentice Hall.
Waite, W. M. 1986. The cost of lexical analysis. Software-Practice and
Experience 16(5), 473-488.

Waite, W. M., and L. R. Carter. 1993. An Introduction to Compiler
Construction. New York: Harper Collins.
Waite, W. M., and G. Goos. 1984. Compiler Construction. New York:
Springer-Verlag.
Weinstock, C. B., and W. A. Wulf. 1988. Quick fit: An efficient algorithm
for heap storage management. SIGPLAN Notices 23(10), 141-148.
Wilson, P. R. 1994. Uniprocessor garbage collection techniques. ACM
Computing Surveys 27, to appear.

Wirth, N. 1976. Algorithms + Data Structures = Programs. Englewood
Cliffs, NJ: Prentice Hall.
Wirth, N. 1977. What can be done about the unnecessary diversity of
notation for syntactic definitions? Communications of the ACM 20(11),
822-823.
Index
Bold page numbers refer to definitions. For a fragment or an identifier,
roman numbers refer to its uses in code, and italic numbers refer to its
uses in the text. Fragments and identifiers without definitions identify
those omitted from this book.

abstract-declarator, 270, 308 (ADDRL), 319


(abstract function), 267,270 ADDRL, 84, 168,169, 179,319, 388,
abstract machine code, 99 399,420
abstract syntax trees, 5, 147, 530 ADDRLP, 346, 341, 351, 361, 376, 371,
dags in, 312, 318, 342 388, 396, 404, 425-26, 436, 469,
operators specific to, 149, 313 470-72, 504
prefix form for, 148 addrof, 189, 190,191
(access the field described by q), (ADD..RSH),318,319
182-83 addrtree,210,211,212,219, 328,339
activation record, see frame addtree,109,155, 159, 192
ADA, 127 ADD+U,204
ADD+D,6 ADDU,82,376,404,436,438,470-71,
ADDD,82,376,439,480, 514 474,504-7
ADDF,82,376,439,480, 514 (affirmation), 164
ADD,82, 84, 109, 158, 177, 181,183, Aflag,62, 159,160,180,188,221,
192,204-5,209,211-12,242,318 225,235, 243,260, 262,266, 280,
add,205,206,283-84 290-91,297,459,492
ADD+I, 148-49 (a fragment label), 1-2
ADDI, 82, 149, 313, 376, 387, 393, 404, aggregate types, 54
430,436,438,470-71,474, 504-7 -A,62,124, 160,244,263,281,292,
addlocal,211,219,234,319,325 296,459,492
ADDP, 82-83,376,386,404,436,438, -a,220
470-71,474,504-7 align,26-27, 54, 56-58,61-63, 72, 78,
(ADD+P transformations), 209, 210-12 19, 182, 282-83, 285,328, 334,
address calculations 348,365-66,449,458,492, 524
MIPS,436 alignment
SPARC, 470 of allocated memory, 27
X86, 503 of structures and unions, 283, 285
addressed,49, 58,61, 7~ 95, 179, (allocate a new block), 27, 28
210, 296, 327, 449, 483, 486 allocate, 24-25, 26, 21, 28, 31, 32-34,
(Address),217,219 91
Address, 211, 217, 219, 338, 339-40, (allocate registers), 402, 415, 417
341 allocating registers, 354, 408, 413, 417
address, 89, 90, 211, 217, 219, 339-40, allocation arenas, see arenas
457,458,490,521, 522 (alloc.c), 25
(address of), 164, 179 alloc.c, 15-16
ADDRF, 84, 168, 169, 179,319,388,399 ambiguous languages, 129
ADDRF+P, 5-6 ANDAND, 109, 163
ADDRFP,86,34~351, 361,376,388, (AND), 318, 323
436,469,470,504 AND,109,149, 160, 174,225,313,318,
(ADDRG,ADDRF),319 322-23, 335
ADDRG,84,168, 169, 179,319, 327, andtree, 109, 192, 322
340,469 (an identifier), 167, 170
ADDRG+P, 148 (announce q), 211
ADDRGP, 6, 8, 83, 247, 351, 376, 436, (announce the constant, if necessary),
442,469,475-76, 503,515 48,49


append,34, 73,271,274,294-95,299 ASGN+I, 166,331,350


applicative functions, 150 ASGNI, 86, 374, 387, 400, 401, 437,
apply,41,42 471-72, 507-8, 512,515
arena, 25, 26, 44 asgnnode,348
arenas,23,97, 150,224,229,254 ASGNP,86,401,437,471-72, 507, 512
ARG+B,185,246 (asgntree), 197-99
ARGB, 82, 88, 355, 367, 376, 434, 446, asgntree, 157, 192, 196, 197, 245,
447, 465, 483, 513, 529 282,350
argc, 89, 305-6, 307, 370, 433, 458, askfixedreg,409,411
466,498 askreg,409-10,411,412-13,433
ARGD,82, 376,444-46,478,479, 514 (askregvar),412-13
ARGF,82, 376,444-46,47~ 514 askregvar, 357, 410-11, 412, 418,
(ARG), 318, 334 447, 450-51, 483, 486, 518
ARG, 82-83, 84-86, 88, 150, 184-86, assembler templates, 8, 354, 376, 392
18~ 318, 332,333,334-36,35~ question mark in, 506
359,403,405,445,477-79 assert(0),3
ARG+I,185 assertions, use of, 3
ARGI,83, 376,444-46,477, 512 (assign field offsets), 280, 282, 285
argoffset,358,366,366,445 (assign), 196
ARGP,83,332,376,444-46,477,512 assign, 18~195, 196, 197,244
argreg,444,445-46,449 assigning registers, 354, 408
ARGS, 17, 18 (assign location for argument i ),
argument-build area, 94, 366 449-50
MIPS, 452 assignment-expression, 154, 157
SPARC, 477, 487 assignments
argument evaluation order, 88, 183, dags for, 328
332 multiple, 331
argument transmission, 356 of arguments to parameters, 338
MIPS, 444, 449-50 of structures, 199
SPARC, 477 results of, 198
X86, 512, 519 to bit fields, 198, 329
argv, 89, 305-6, 307, 370, 433, 458, to pointers, 196
466,498 to qualified types, 198
arithmetic conversions, 173 values of, 328
arithmetic types, 54 (assign offset to argument i ), 519, 520
_arity,389 (assign output register), 417, 418
(array), 267, 268 (assign r to nodes), 418, 419
ARRAY,49, 54, 57,60,62-63,69, 72-73, associativity, 130, 152
109,266,268 AST, see abstract syntax trees
array,61,62, 72, 123, 182,242,266, atop,62, 174, 193,275,327
302, 304 (augmented assignment), 157, 158
arrays augmented assignments, 158, 312
parsing declarations of, 268 AUT0,39,80, 94-95, 187, 191,211,
sizes of initialized, 264 255-56,261-63,264,275,28~
structures in, 285 291, 294, 295, 297, 304, 412-13,
(arrives in an i -register, belongs in 423,449,486, 518, 520
memory), 486 (auto local), 299
ASGN+B,246 autos,294, 294-95,299,484,486-87
ASGNB, 83, 85, 88, 355, 367, 434, avail, 25, 26-27
446-47, 482, 484, 512
ASGNC,389,401,437,471-72, 512 backpatching, 349
ASGN+F, 5 backslash,124, 126
ASGNF,~401,438,471-72,514 backtracking, 133
(ASGN),318, 328-29 BAND,84, 177,198,207,209,318,
ASGN, 83-86, 157-58, 190, 197, 198, 330-31
245, 316, 318, 320, 328-29, 334, basic blocks, 313
336,343,348, 352,361,385,396, BCOM,84,318
399, 402, 405, 422, 425, 472, 529, BCOMI,83
555 BCOMU,83,439,474-75, 508
asgn,191,202,234,302,339 -b,220,249

big endian, 87, 370, 431 X86 indirect jumps for, 515
binary-expression, 154, 161 (break statement), 221, 232
binary, 173, 192-93, 200 bsize,105, 106-107
(bind.c), 96 BSS,91,265,300, 304,458,459,491,
bind.c,96 192, 501, 524
Binding,96 btot,50, 74,346
bindings,96, 306 buffer,105, 106, 107
bitcount,452 BUFSIZE, 105, 105, 107, 112, 122, 125
bit fields, 13, 66 (build an ADD+P tree), 192, 193
and endianness, 66, 87 (build the prototype), 273, 274
assigning constants to, 330 ~builtin_va_alist,484
assignments to, 329, 350 BURM, 373, see also lburg
extracting, 320 (BURM signature), 378-81, 389-91,
postincrementing, 336 406
sign extending, 320 BXOR, 84, 318
simplifying references to, 208
storage layout of, 2 79 C++, 527
types permitted for, 281 CALL+B, 184-86, 189-90,245-46,
unnamed, 309 332-33
bittree, 192, 198, 209,215, 330-31, CALLB, 85, 88, 186, 332, 465, 476, 483,
332 529
BLANK, 110, 111-12 CALLD, 86,442,443,476-77, 518
(blkcopy),367-68 callee,93,94,286, 290,292-93,
blkcopy,357,367,368, 372,434, 337-38, 448-49, 451, 453, 484-85,
446-47, 460, 482, 494 487-88, 518-20
blkfetch, 355, 356, 368-69, 460, 461, callee-saved registers
492,493, 513 MIPS, 452, 454
blkloop,355,356,367-69,460,493, SPARC, 468
513 caller, 93, 94, 286, 292-93, 337-38,
blkreg,434,447 448-49,451,453,484-85,487-89,
blkstore,355,356,368,460,461, 518, 520
493, 513 caller-saved registers, 410, 428
blkunroll, 367, 368, 371, 493 MIPS, 444
(Blockbeg),217,219 SPARC, 468
Blockbeg, 7, 217, 219, 293, 294, 295, CALLF, 86,442,443,476-77,518
338, 339, 341 (CALL), 318, 332
blockbeg,93,95, 339,355, 365 CALL, 84-86, 88, 151, 171, 184, 186,
(Blockend),217,220 189, 199,245,316, 318, 332, 333,
Blockend, 7, 217, 219, 293, 294, 338, 336,343, 344, 361, 366,396,402,
339, 341 417, 427, 429, 445, 512, 529
blockend,93,95,339,355,365, 366 call,186, 190, 199,335,476
block moves, 199, 355, 367 CALL+I,185-86, 190,245
MIPS, 434, 446, 460 CALLI, 86, 400, 403, 417, 442, 443-44,
SPARC, 482, 492 476-77, 517
X86, 512 calling conventions, 93, 94, 184, 338,
BOR, 84, 318,330-31 529
Borland International, Inc., 496 MIPS, 432, 449
bottom-up hashing, 349 SPARC, 465, 468
bottom-up parsing, 127, 145 X86,496
bp,97, 392 CALLP, 86
branch, 224, 225, 227, 230, 232, 237, calls, 85
243, 244, 246, 247 as common subexpressions, 347
branch tables (calls), 166, 186
density of, 238, 250 calltree,187, 189, 190
emitting, 342 CALL+V,186
generated code for, 242 CALLV, 85, 332,333,442,476-77,517
MIPS indirect jumps for, 441 (case label), 221, 234
overhead of, 241 caselabel,234,235
SPARC indirect jumps for, 475 (cases for one-character operators and
traversing, 238, 240, 251 punctuation), 112

(cases for two-character operators), CNSTI, 82, 378, 388, 403, 437, 439,
112 470,473-74, 504,508
cast, 174, 175, 171, 178, 119, 180, code generator, overview of, 353
181, 188, 189, 192-94, 197, 202-3, code-generator generators, 13, 373
210, 212, 214, 233-34, 235, 242, codehead,217,291,338, 339,341
245, 331 CODE,91,265,293,342,452,459,491,
(cast l and r to type ty), 192, 193-95 501
casts, 179 Code, 211,217,218, 220,233, 246-47,
cfields, 65, 66, 197, 282 291, 293,338,341
cfoldcnst,208,209 code, 211, 217, 218, 219-20, 233, 243,
cfunc,243, 244,290,291,293-94,333 246-41, 294, 311
chain rules, 376 codelist, 217, 218, 236-37, 243,
(changes flow of control?), 416, 417 246-47,249, 291,338,339
character-constant, 122 code lists, 7, 217, 291, 311, 528
characters appending forests to, 223, 311
classifications of, 110 codes for entries in, 218
signed vs. unsigned, 206, 257 emitting code from, 341
CHAR,48, 54, 58,60, 69, 73,82, 109, generating code from, 337
115, 175, 253, 256, 251, 271, 280, command-line options, 307
295 -A, 62, 124, 160, 244,263, 281, 292,
charmetric, 58, 78 296,459,492
chartype,57, 58, 74, 123, 177,207 -a,220
(check for floating constant), 117 -b, 220,249
(check for inconsistent linkage), 261, -d, 238,370
262 -g,219,341
(check for invalid use of the specifier), -G,458
255,256 -P, 75,304
(check for legal statement -p,466
termination), 221, 222 -pg,466
(check for redefinition of tag), 67 -target,96, 306
(check for unreachable code), 218 -x, 51
(check if prototype is upward comma operator, 156, 335
compatible), 70, 71 (comment or/), 112
checklab, 22~293, 309 common subexpressions, 6, 80, 223,
(checkref), 296-97 312, 313, 342, 418
checkref, 292-93, 296, 297, 299, 303, allocating registers for, 343
348 bonus match for, 383
(c.h), 16 invalidating, 223, 316, 321, 323,
c.h, 16, 18 326,328
%c,392 recomputing, 360, 382
CISC, 496 commutative operators, 204
ck, 388 commute,204,204,207,208
ckstack,502, 503, 525 comparing strings, 29, 45
(classify SPARC parameter), 485, 486 compatible, 193, 194
(clear register state), 410, 448, 485, compiler-construction tools, 12
519 (complement), 164
clobber, 357, 396, 410, 417, 424-25, compose, 72, 72,261,298
427, 429, 435, 444, 468, 471, 479, composing conversions, 1 74
502,517 composite types, 71
(close a scope in a parameter list), 272 (compound),294-96
closures, 42 compound,221,245, 291-92,293,298
cmp,242,251 compound-statement, 216, 285, 293
cmptree,109, 192, 193, 194, 195 compound statements, 339, 365
CNSTC,82,388, 389,403,437,470,473 computed, 197,210,211, 328
CNST+D,6 computed symbols, 90, 210, 339, see
(CNST), 318, 327 also Address
CNST, 82, 84, 167, 177, 193-94, 198, (compute p->offset), 282, 283-84
202, 203-5, 234, 318, 326-27, 330, (computety), 255, 257
388, 473, 508 (concrete function), 267, 268
(COND),318, 325-26

COND,149, 159-60, 190, 200,318, 322, (copy argument to another register?),


324-26, 335 450-51
cond, 174,200,206,225, 322 costs
conditional-expression, 154, 157, 159 in tree grammars, 3 74
conditional,224,225,229, 326 lburg, 376, 388
conditionals, 200, 224 of chain rules, 3 76
values of, 326 count,81, 82, 85,315,343,346-48,
(condtree), 200-202 502
condtree, 159, 192,200 (count uses of temporaries), 382, 384
(config.h), 355, 357-58, 361-62, 365, cover, tree, 373, 377
377 (cp points to a Jump to lab), 246, 247
config.h, 16, 79,81-82,95 +
(cp points to a Label lab), 247, 248
constant, 115 cross-compiler, 14, 79, 370
constant-expression, 202 cross-reference lists, 51
constant expressions, 203 cseg,459,491,492,501,523-24
overflow in, 205 cse,346,384-85,400,413,416,
constant folding, 147, 202 418-19,508
ofconversions,206 (CVx,NEG,BCOM), 318,319
constant,47,49, 119, 168,327 CVC,84, 175,206,318
constants, 80 CVCI, 387, 404, 437, 438, 439, 472,
enumeration, 68 475, 499, 510-11
floating point, 47, 92, 456 cvcu, 437, 438, 439, 472, 475, 510-11
initializing, 91, 305 CVD+F,5
installing, 98 CVDF,8,440,451,481,514-15
integer, 49 CVD,84, 176-77,207,318
out-of-line, 47, 49, 78, 327 CVD+I,6
overflow of, 117 CVDI,440,441,481, 515
representing, 4 7 CVFD,440,481,513, 515
(constants), 38, 47 CVF,84, 175,318
CONSTANTS, 38, 40, 47, 48, 80, 89, 303, CVIC,440,472-73, 511
521 CVID,440,462,481, 515
constants, 39, 40, 47, 48, 51, 303, CVI,84, 176-77,206-7,318
305, 313 CVIU,403,440,472, 510,529
constexpr,202,203,234 CVP,84, 175,206,318
CONST, 54,60,63,69, 72-73, 109, 180, CVP+U, 176
183,201,256-57,266,268,302 CVPU,86,404,440,472,510
const,48 CVS, 84, 175,318
const locations, 197 CVS!, 83, 437, 438, 439, 472, 475,
consttree, 160, 165-67, 170, 174, 177, 510-11
183, 193, 198, 201,208-9, 242, CVSU, 83, 437, 438, 439, 472, 475,
320, 330-31 510-11
continued fragments, 2 cvtcnst,206
(continue statement), 221, 228 cvtconst, 207, 327
conventions, see calling conventions cvu, 84, 176-77,206,318
conversions CVUI,83,404,440,473, 510, 529
between unsigneds and doubles, CVU+P,245
176 CVUP,86,404,440,473, 510
MIPS arguments, 450
narrowing,440,472,511 d6,433,445
of conditionals to values, 160 dag.c, 311
of values to conditionals, 174 dag,314,315-16
preserving value or sign, 173 dagnode,315
(convertp tosuper(pty)), 175 dags, 6, 78,81,342
(convertp tosuper(ty)), 175, 176 allocating and initializing, 98, 315
(convertp toty), 175, 177 argument operators, 85
Coordinate,37,38, 39,41, 51-52,80, assignment operators, 83
99, 108, 159, 162, 186, 220,228, avoiding, 89, 340, 343
258,260,274,277, 286,290,298, back-end extension to, 82
338,341 conversion operators, 83
converting trees to, 223, 311

execution order of, 86 (define an uninitialized global), 458,


forests of, 223, 311 459
jmilpi;, 83 defined,48, 50, 6~ 90,210,219, 226,
multiplicative operators as calls, 87, 261, 264,271-72, 274-75, 278,
' 31l:\, 343 288,290,294, 297,299-300,302,
operators, 81-82, 84, 98 304-5, 346
references to, 81 (define function id), 259
roots, see root nodes definelab,224,225,226,230,234,
searching for, 315 236, 241, 246, 241, 291, 292, 312
dalign, 358, 368, 369, 446, 447, 461, definept,220,221-22,224, 227-30,
482,483 232-33,243,291,292,302
dangling-else ambiguity, 131 defining constants, 91
DATA,91, 264,265,458,459,491,501 MIPS, 455
(dclglobal),260,261-63 SPARC, 490
dclglobal,253,260,261-63,290,298 X86, 522
(dcllocal),298-99,301 definitions, 254
dcllocal,253,260,263,294,295, (Defpoint),217,220
291, 298, 303, 309 Defpoint, 7, 217, 220, 338, 341
(dclparam),274,275 defpoint,229
dclparam, 253, 271, 272, 274, 275, defstring,92, 305,456,490, 523
287-88, 298 defsymbol,46, 49-50,89,90,242,263,
dclr1,265,266,267,268,270,308 delay slots, 475, 481, 494
dclr, 258-59, 265, 266, 268-70, 273, density,238,239, 250
281, 308-9 density,238,239, 250
dead jumps, 246 DEREF,109, 166
(deallocate arenas), 253, 254 deref,61, 169,182
deallocate, 24, 25, 27, 28, 32-33, 223, derivations, 128
224, 254, 311 deterministic finite automaton, 107
(debugger extension), 38 dfl ag, 358, 370
debuggers, 217,249 diagnostics, see errors, reporting
symbol tables for, 39, 51, 79, -d,238
219-20,341,432,465 digit, 114
(debugging implementation), 25 Digital Equipment Corporation, 431
decimal-constant, 116 DIGIT, 110, 111, 113-14,117,119-21
(decimal constant), 116, 117 directed acyclic graphs, see dags
declaration, 254 discarding tokens, 140, 144
declaration-before-use rule, 308, 530 DIV,84, 109,318,319,344
declarations, 254 div,206
declaration-specifiers, 254 (DIV •. MOD), 318
declarator, 266 doarg, 356, 357, 403, 445, 446-41,
(declare a parameter and append it to 477, 512
list), 273 doargs,306,307
(declare id a typedef for tyl), 260 docall, 366,367,402-3,445, 512
(declare id with typetyl), 258, 260 doconst,303, 305,327
decl.c,253 doextern,264,303, 308
decl, 253, 257, 258, 265, 269, 285-86, doglobal,303,304, 308
287,295,298 (do statement), 221
DECR, 109, 110, 164, 166 DOUBLE,49, 54, 58,60,69, 73,82, 109,
defaddress,91, 92,242,251,342, 175, 256-57
456,490, 522, 523 doublemetric,58, 79
default argument promotions, 71, 189, (double the size of values and
288 1 abel s), 235
(default label), 221, 234 (double-to-unsigned conversion), 176
defconst, 91, 92, 305, 371, 455, 456, doubletype,57, 58, 74,93, 121, 173,
490, 522, 523-24, 521 175-77,189,202,288
defglobal,264,265,300,304-5,342 dumpcover,390,406
(define L, if necessary, and L + 1), 233, dumpmatches,406
236 dumprule,390
(define an initialized global), 458 dumptree,389,390
dynamic programming, 374, 379

EAX,498,499-500, 508-10, 517-18 equated,340,341, 342, 351


EBNF, 20 equatedto,46,248,341
EBX,498,499-500 (equate LABEL to L + 1), 325
ECX,498,499-500, 508,510, 513 equate lab, 247, 248, 323, 325, 340,
EDI,498,499-500, 512-13 350-51
EDX,498,499-500, 509, 518 ERANGE, 120, 121
(eliminate or plant the jump), 248, 249 errcnt, 142-43, 30~338,341
emit2, 353-54, 356, 392-93,444, errno, 120, 121
446-47, 478, 479, 482-83, 510, error. c, 141
511 error, 142, 143
(emitasm),391, 392 errors
emitasm, 353-54,391,392,394 detecting, 140
(emitcode Blockbeg),341 recovering from, 140
(emitcode Blockend),341 reporting, 20, 142
(emitcode Defpoint),341 escape-sequence, 122
(emitcode Gen,Jump,Label), 341, escape sequences, 121, 126
342 ESI,498,499-500, 512-13
emitcode,93,217,243,286, 311,337, evaluation order, 166, 172, 312, 326,
341, 342, 353-54, 448, 454, 484, 335
490,520, 528-29 event.c, 14
(emitcode Local),341 event hooks, 160,244, 293
(emitcode Switch),341,342 execution ordering, 354, 359, 409, 413,
(emit epilogue), 93 428
emit,92, 93, 96, 100-101, 340-41,342, execution points, 217, 220
353-56, 391-92, 393, 477, 506 exitparams, 259, 269-70,272,274
( emitleaf prologue), 488 exitscope,42,44, 51,59, 259,269,
(emit profiling code), 490 272, 274, 292-94
(emit prologue), 93 exit sequence, 10,244,292,354,366
emitter,394 MIPS, 454
emitting instructions, 354, 356, 391 SPARC,490
(emit unrolled loop), 368, 369 X86, 520
enclosing scope, see nested scopes expanded basic blocks, 313
endianness, 87, 370, 431 expect, 141, 142, 14~ 158, 165, 180,
(end of input), 112, 124 187, 222, 224, 226, 228-29,
enode.c, 147, 155, 171 233-34, 268,270-71, 293-95,302
(ensure there are at least MAXLINE explicit conversions, 179
characters), 114, 115, 116, 120 export,90, 265,293,456,490, 501,
enterscope,42, 51, 269,271,287,294 523
entry sequence, 9, 341, 354, 366 expr0, 141, 156,222, 223,229, 350
MIPS, 452 expr1, 153, 155, 156, 157, 158, 163,
SPARC, 488 188,202, 203,276, 302
X86,519 expr2, 153,157, 159, 161,203
(enum constants), 38, 69 expr3, 153,158, 159, 161, 162, 163,
enumdcl, 256-57, 310 171,191
enumeration-constant, 115 expr.c, 147
enumeration constants, 68 expression, 154
ENUM, 39, 54,60, 68,69, 73, 109, 170, expressions, rearranging, 212
175,256, 301, 310 (expression statement), 221, 222
enum-specifier, 310 expr, 153, 155, 156, 159, 164-65, 172,
Env, 9~ 219, 339,365 181,224,225,233,243, 350
EOI, 5, 112, 134, 144,222,253 extended Backus-Naur form, see EBNF
epilogue, see also exit sequence extension
lburg, 375 to interface records, 3 54
EQ, 84, 160, 174,318,322,340,417 to nodes, 358
EQI, 441, 475, 516 to symbols, 362
(EQ •. LT), 318, 321 external identifiers, 263, 299
eqtree, 192, 194, 195,209,242 consistency of declarations for, 300
eqtype,48,69, 70-71, 72, 194-9~200, definitions of, 304
261,263, 289-90,298,301 importing, 303
equal,248, 248 MIPS, 456

SPARC, 490 FLTVAR,434


X86,523 foldaddp,209,210
external linkage, 261 foldcnst,204,205,207,208-9
externals,39,40, 111,263,297, foldcond, 230,250
300-301, 303 f.oldstyle,63, 189
EXTERN, 39,80, 168,211,256,259, FOLLOW sets, 136, 140, 146
261-64,297-301, 304-5,457, 521 foreach, 41, 42, 68, 269, 288, 292-93,
(extern local), 299, 300 296, 303
forest,220,246-49,311,321,325,
fatal, 143 328,334,340, 342-43,402,
fcon, 117, 120 414-15, 417
%F,392,436 forests
(FIELD), 319, 320 circularly linked lists for, 311
FIELD, 149, 165, 183, 197-98,209, generating code for, 337
313, 319, 320, 328, 329, 336-37 formals
Field,65,66,68, 76, 182-83,280,282, MIPS, 436
329 MIPS register, 450
field, 66, 76, 182, 182-83, 198, 209, offsets for, 362
279-82, 320, 329 %S, 118
fieldleft,66,320 X86, 519
fieldmask,66, 198,209,329 format codes, 99
fieldref,68, 76, 182 %S, 118
fieldright,66,209,329,331 %t,62
fields (for statement), 221, 228
computing offsets and alignments (forstmt), 229-30
of, 282, 309 forstmt,221,228,230,250
qualified, 282 fprint,97, 142,144,306-7,389-90
references to, 181 f.proto,63, 189
simplifying references to, 211 fragment definitions, 1
storage layout of, 279 fragment uses, 2
fields, 276 frame, 93, 354, 364
(fields for registers), 362-63 MIPS, 447, 452
(fields for temporaries), 362 SPARC, 487
fields, 68, 217, 278, 279, 280, 283, X86, 520
285, 309-10 frame pointer, 9, 364
fieldsize,66, 198, 320,329 SPARC, 467
file,38, 104, 105,111, 125,142 X86,498
file scope, 35 framesize,358,366, 392,452-55,
fillbuf, 103-105, 106, 111-12, 115, 487-88, 520
124 freeblocks,27,27-28,33
finalize, 168, 263, 297, 303, 305, free,23,25,32-33
307, 321 (free input registers), 417, 418
findlabel,46,222,224,230,234-36, freemask,358,365-66,410,410,428
246-47,322-23,325 freg2,433,434,443-44,467,476
finite automaton, 107 FREG,361, 365-66,434,443-44,
firstarg,332,333,334 451-52,454,467-68,477,488,
firstfile,104, 105, 142 500-501
first,25,26,28 freg, 467, 476
FIRST sets, 134, 143, 146 freturn, 64, 97, 186, 243-44, 286, 294,
fixup,340, 341, 351 487
flist,65, 66,68,282 (funcdefn),286,290-93
FLOAT,48,54,58, 73,82,109,175,256 funcdefn, 41, 221, 259, 212, 274, 285,
floating-constant, 120 286, 290-94, 333, 337, 348
floating types, 54 FUNC,41,46,49,97,205,210,218,226,
floatmetric,58, 79 229, 233, 239, 254, 260, 271,
floattype,57, 58, 71, 74,93,121, 274-75, 287, 292, 295, 299, 315,
173, 177,189,207,288 424
flow graphs, 313, 531 func,64, 73, 186,266,290
FLTRET,443,444 funcname, 159, 186, 187-89,225
FLTTMP, 434, 443-44 function-definition, 285

(function definition?), 259 generated, 46, 49, 50, 80, 197, 210,


FUNCTION,49, 54, 57, 59-60,64,69, 305,457,491,521
72-73,109,266,268,270 generated symbols, 47, 49, 80, 168
function, 85, 89-90, 92, 93-94, 100, MIPS, 457
211, 216, 252, 276, 286, 287, 289, SPARC, 491
292, 293, 294, 311, 333, 337, 341, (generate nodes for e++), 335, 336-37
353-54, 362, 366,410,443,447, (generate the selection code), 233,
448, 451-54, 463, 484, 485-90, 236-37
518, 519-20, 529, 531 generic,97, 98, 151
function pointers, 176, 193 generic operators, 84, 149, 203
functions Gen, 7, 217, 220, 223, 225, 311, 313,
declarations of, 270 338, 340, 341
definitions of, 259, 270, 285, see gen, 79, 81, 92, 93, 95-96, 100-101,
also funcdefn 340-41, 343, 353-56, 385, 388,
emitting code for, 92 402, 403, 409, 414, 445, 447, 484,
leaf, 487 487, 531
order of evaluating arguments to, 88 genident,49, 50, 168,234, 242,291,
passing structures to, 88, 169, 183, 294,302,327
185, 190 genlabel,45,46,49,67-68,98,210,
returning structures from, 85, 88, 222,224,227-28,232,234, 241,
183, 187, 245, 292, 294,332, 278,281, 290,323,325,457,460,
529, see also retv 491,521
returning structures from on the (genreload), 426
SPARC, 484 genreload,409,420,424-25,426
variadic, see variadic functions (genspill),424-25
with no arguments, 64 genspill,396,409,420,423,424
function scope, 37 (get a new block), 26, 27
(function symbols), 38, 290 getchr, 108, 126,226,295
function types, 54 getreg,409-10,412,418,422,427,
500
garbage collection, 32 getrule,382,390-91,419
gcc,4, 533 (gettok cases), 111, 112-14, 116, 119,
GE,84, 160, 174,318,322,340,417, 122
441 gettok, 108, 110, 111, 115, 117, 119,
(gencode Address), 338, 339 123, 125, 134, 142
(gencode Blockbeg), 338,339 -g,219,341
(gencode Blockend),338,339 GLOBAL,38,40,41,42,49,58-59,61,
(gencode Gen,Jump,Label),338,340 62,80, 89, 168,211,253,256,259,
gencode, 93-95, 216, 219, 286, 292, 261-62,289,297,300,302,303,
299, 311, 337, 338, 341, 343, 327,457, 521
353-54, 447, 451, 484, 487, 520, global, 51,90,265,458,459,492, 524
528-29 global optimization, 531
(generate a linear search), 241 (globals), 38, 265
(generate an indirect jump and a globals,39,40,41,262,301
branch table), 241, 242-43 GNU C compiler, see gcc
(generate an initialized static t1), 302 gnum, 458
(generate a temporary to hold e, if (goto statement), 221, 227
necessary), 233 gp,433,458
(generate caller to callee grammars, 19
assignments), 338 tree, 373
generated code greg,466,467,473,488
for branch tables, 242 GT, 84,160, 174,242,318,322,340,
for conditional expressions, 324 417,441
for if statements, 224
for loops, 227 (h ← hash code for str, end ← 1 past
for switch selection, 236, see also end of str), 30, 31
branch tables hascall, 151, 171, 186, 187-88
for switch statements, 231 hashing strings, 30
for &&, 322 (hash op and ty), 56, 57
HASHSIZE,40,44-46,48

hash tables, 30, 40, 314 infd,104, 105,106,307


hasproto,75,260 infile, 307
hexadecimal-constant, 126 inheritance, 527
(hexadecimal constant), 116 init.c, 14
HEX, 110, 111 initglobal, 264, 299, 300, 302, 309
(initialize for a struct function), 187
ICON, 102, 116, 123, 167 (initialize), 93
icon, 117, 126 (initialize MIPS register structures),
(id, ty1 ← the first declarator), 258 433, 434
(id, ty1 ← the next declarator), 258 (initialize new-style parameters), 287
(ident), 267 (initialize old-style parameters),
identifier, 114 287-89
identifiers, number of references to, initializer, 254
296, see also refinc initializer, 264
identifiers, 39, 40, 41, 44, 68, 115, initializers, 263
219, 260, 270, 272, 275, 288, 292, (initialize SPARC register structures),
294, 296-97, 298-99, 303-4, 310 466, 467-68
identities, eliminating, 207 input buffer sentinel, 103, 106
identity,207,208,209,210 input.c, 125
(id), 275 input.c, 125
ID, 109, 114, 115,155, 157, 167, 182, install, 44, 45, 51, 58-59, 61, 67,
221-22, 227, 229, 234, 253, 256, 226-27, 260, 262, 275, 299-300
258, 267-68, 271, 277, 295 (install new string str), 30, 31
idtree, 168, 170, 185, 189, 191, 199, (install token in stmtlabs, if
222, 225, 242, 245, 292, 296, 302, necessary), 226, 227
326-27, 339 instructions
IEEE floating point, 370, 456 emitting, 391
IF,108, 113, 155,157,221-22,229, MIPS samples, 430
234,271,277, 280,295 ordering, 409, 413, 428
(if statement), 221, 224 scheduling, 428, 463, 475, 481, 494,
ifstmt, 224, 225, 228, 326 531
(illegal character), 111 selecting, 354, 373, 402, 531
immediate instructions SPARC samples, 463
MIPS, 438, 443, 463 two-operand, 393, 419
SPARC, 463, 469 X86 samples, 496
imm,469,470-72 instruction trees
implicit conversions, 6, 172 projecting, 359, 385, 426
import,90,264,30~456,457,491, intconst,49,328,334,348,367,445,
523, 524 477
incomplete types, 56 integer-constant, 116
(increment sum), 1 integral promotions, 71, 172, 189
incr, 158, 165-66, 171, 192 integral types, 54
INDIRB, 348, 446, 447, 482, 513 Intel Corporation, 496
INDIRC, 86, 400, 437-38, 471-72, 505, (interface flags), 79, 87-89
510 (interface), 16, 78-79, 96
INDIR+D,5 Interface, 79,96,306,431,464,497
INDIRD, 6, 8,401,438,471-72,513 interface record, 79, 252
(indirection), 164, 179 back-end extension to, 354
INDIR+F, 6 binding to a specific, 96, 306
(INDIR), 319 (interface routine names), 432, 464,
INDIR, 83-84, 86, 169, 178-79, 181, 497
190, 191, 233, 316, 319-21, (interface to instruction selector), 356,
336-37, 343, 345, 346, 347, 349, 379
361, 383, 384, 395, 399, 415, 422, internal linkage, 261
425-26, 507-8 intexpr,203,268,281
INDIR+I, 148,336 INT,48,54,58,69, 73-74,82, 109, 113,
INDIRI, 86, 311, 383, 400, 401, 437, 175, 256-57
471, 505, 515 intmetric,58, 78, 79
INDIRP, 86, 95, 321, 341, 401, 437, INT_MIN,29, 30,205,207
471, 505 INTRET,443,444

INTTMP,434,443-44 (JUMP), 318, 321


inttype,57,60, 66,173 JUMP,84,242,318,340, 341,343
INTVAR,434 Jump, 217, 218, 220, 227, 246, 247,
inverted type, 265 249,291, 311,338, 340,341
IREG,361, 365-66,434,443-46,449, jump,247, 312, 325,327
451-52,454-55,467-6~47~ jumpstojumps, 247
479-80,488,498-500, 509, 513, JUMPV, 83, 227, 247, 321, 417, 441,
517-18 475, 516
ireg,433,434,437,443,445,467,
476,485-86,489-90 keyword, 113
IR,46,49,50, 58,61, 96, 168, 188,211, keywords, 102
242,263, 265,282, 284, 291-94, %k,99
300,303-5, 306, 333-34, 338-40, _kids, 381, 406
342, 344, 346, 368-69, 371, 382, kids, 81, 83, 86, 98, 149, 359, 386-87,
389-92,400,402-3,417,419 409, 422, 426-27
IRIX operating system, 431 kill, 316, 317, 328, 349
isaddrop, 179, 191, 197, 199, 210-12, kind, 115, 143, 143-44,217-18,222,
233,316, 328 229,246-49, 253, 259,268, 271,
isarith, 60, 178, 180, 192-93, 196, 287,291,295,338, 341
200 K&R C, see pre-ANSI C compilers
isarray,60,62-64,81, 168, 174, 179,
181-83, 189, 193, 264, 275, 298, (l, r ← for a bit-field assignment), 328,
302, 304-5, 327 329
iscallb, 191, 199,245,246 (Label,Gen,Jump),217,220
iscall,343,344,347 LABEL, 83-84,343, 396
ischar,60 Label, 7, 217, 218, 220, 226, 246,
isconst,60,63, 73, 180, 183, 196-97, 247-48,291,311,338,340, 341
201,264, 282,302 _label,353-54, 378,379, 388
isdouble,60, 173,489 labelnode,323,325
isenum,60,61, 71, 74, 180, 189-90, labels, 80
195 appending code-list entries for, 246
isfloat,60,449,483,486,490, 518 appending dags to the forest for,
isfunc,60,62-65, 159, 165, 168, 174, 323
176, 179, 186, 192, 194-95,201, case and default, 234
225,259, 261,263-64, 275, 282, compiler-generated, 45-46
289-90,296,298, 302,304-5,457 equality of, 248
isint,60, 71, 74,180, 189, 192, 194, equated, 248, 340
203,233-34,451,453,521 exit-point, 292
isnullptr, 194, 195-96,201 local, 222
(is p->kids[1] a constant common removing dags from the forest for,
subexpression?), 508 325
ispow2,208 source-language, 45-46, 226
isptr,60,61, 175, 177, 179-82, 186, true and false, 225, 312, 321
190, 192, 194-97, 201,245, 319, undefined, 227
521 (labels), 38, 46
isqual,60,63 LABELS,38,46, 80, 89,227,242, 291,
isscalar,60,297,302,412,483 293,491
isstruct,60, 168, 182, 187-89, labels, 40, 41, 46-47, 230, 232-33,
196-97, 199,275, 286, 291-94, 235,243,291,342
298,302,319,449,485,487, 529 LABELV, 101, 226,246, 248-49, 323-24,
(is this a simple leaf function?), 325,417,441,475, 516
487-88 lburg, 13, 373
istypename, 115, 164-65,256,259, approximate costs, 441
270-71,273,278,280,287, 295 arity of terminals, 376
isunion, 60 chain rules, 376
isunsigned,60, 71, 118, 169, 173, configuration, 375
178, 197-98, 346 costs, 376, 388
isvoidptr, 194, 195-96,201 dags versus trees, 373
isvolatile,60,63, 73, 180, 183, 196, epilogue,375
201,233,275,282,296,298,319 guidelines, 404, 436, 438

labeller, 374, 377 LOADD,439,482


MIPS nonterminals, 435 LOAD, 361, 398, 400, 417, 420, 473, 477
nonterminals, 373 LOADI, 400, 439, 473, 506
prologue, 375 (Local), 217, 219
reducers, 374, 379, 381 LOCAL,38,49-50, 80, 89,260,269,294,
SPARC nonterminals, 469 297-98,299, 32~346,457,491,
specifications, 375 521
subtrees, 380 Local, 217, 219, 319, 338, 339, 341,
terminals, 373 346
tree cover, 377 local, 50, 85, 89, 90, 93, 95, 98, 211,
X86 nonterminals, 503 217, 338, 339, 346, 410, 447, 483,
LBURG_MAX,37~388-89,404, 507 487, 518, 528-29
(lburg prefix), 375, 431, 463, 496 locals, 217, 294
leaf functions, 487 assigning registers to, 296
(leave argument in place?), 449, 450 declared explicitly as registers, 297
LEFT_CHILD, 375 declared extern, 300
left factoring, 139 initialization of, 302
leftmost derivation, 128 MIPS, 436, 447
left_to_right, 88, 183, 186, 332, offsets for, 362
333, 334, 351, 498 SPARC, 469, 483
LE, 84, 109, 160, 174, 318, 322,340, X86, 518
417,441 loci, 52
length,34,274,295 locus,52
LEQ, 109 (logical not), 164
LETTER,110, 111, 113-14, 119 longdouble,57, 58, 121,257
level,40-41,42,44,62,67, 187,191, LONG, 54, 109,256-57
202, 219,234, 253,256, 259-60, long input lines, 103, 105, 122, 125
269, 272,275, 278,292, 294, long string literals, 2
296-300 longtype,57, 58, 71,118,257
lexical analyzer, 5, 102 lookup, 45, 51, 67, 115, 226, 260-61,
generators, 107, 124 263, 275,278, 289-90,297-98,
lifetimes of allocated objects, 23 300, 301,303
limit,17,25-26,28,103, 104, loop handles, 222, 228, 293
105-107, 111, 112, 115, 122, LSH,84, 198,208,318,320,331
123-24 LT,84, 160, 174,242,318,322,340,
limits.h,30 417,441
(linearize forest), 402, 414 ltov,34, 73,271,272,274,295
linearize, 353-54, 409, 413, 413-14, lvalue, 169, 174, 178, 179, 181, 190,
425, 428 191, 197,329,331
line boundaries, 103 lvalues, 53, 169
line, 104, 10~ 106, 10~ 111
lineno,104, 10~111, 12~215 main.c, 305
linkage,261 (main), 305, 306-7
linked lists, use of, 34 main,305,306-7
(list CALL+B arguments), 333 malloc,23,2~2~28, 32-33
list.c,14,34 map, 110, 111, 112, 117, 119-21,
listed nodes, see root nodes 123-24
List, 34, 37, 51, 52, 73, 271, 294 (map initializer), 110
list, 17,34,271,274,321,321,325, mask,329-31,361, 363,395,410-12,
327-28, 333-34, 336-37 417-19,422-23,427-28,477, 509
(listnodes cases), 318 (mask out some input registers), 418,
listnodes, 111, 186, 222, 223, 225, 419
242, 311-13, 315, 317, 318, 319, maxargoffset,358,366,367,448,
320-23, 325, 326, 328-30, 331, 451-52,454,48~487, 512
332, 333, 334-37, 342, 349-50 maxlevel, 59, 59, 61
literate programming, see noweb MAXLINE, 105, 105, 112, 114, 115, 116,
LIT,91,264,26~302,305,342,459, 122, 126
491, 501 maxoffset,358,365,366,448,452,
little endian, 87, 370, 431 481, 520
little_endian,87,284, 371 MAXTOKEN,111, 114, 124

max_unaligned_load,355 mkauto,357, 364,365,44~ 483,518


(mayrecalc), 385 mkreg, 358,362,363,434-35,467,
mayrecalc,357,361, 384,385 498-500, 509
memcmp,34 mkwildcard,358, 363,434,467, 500
memop, 507,507 ML, 74, 530
memset,24,57,317 MOD, 84,318,319,344
(metrics), 78, 79 Modula-3, 308, 527, 530
Metrics, 78, 79 modules in lcc, 15
(M), 15 monolithic compilers, 529
Microsoft Corporation, 496 move,358, 394,439,472, 506,510
mini-indices, 4 moveself, 353-54, 393, 394, 395, 406,
MIPS 416, 472, 477, 506
address calculations, 436 (move the tail portion), 106, 107
argument-build area, 452 MS-DOS, 496
argument transmission, 444, 449-50 MUL, 84, 109, 177, 193, 208, 318, 319,
block moves, 434, 446, 460 344
callee-saved registers, 452, 454 mul, 206
caller-saved registers, 444 MUL+I, 148
calling conventions, 432, 449 mulops_calls, 87, 171, 319, 343, 344,
defining constants, 455 350, 465, 480
entry sequence, 452 multiple assignments, 331
exit sequence, 454 multiple-pass compilation, 530
external identifiers, 456 multree, 109, 192, 193,215
formals, 436
frame, 447, 452 (n ← *ty's size), 193
generated symbols, 457 name, 37, 44-46, 48-49, 58, 66-68, 76,
immediate instructions, 438, 443, 80,96,170, 182,197, 210,263-64,
463 281-82, 286-88, 290, 297-98, 300,
instruction suffixes, 436 303-4,306, 362,389,449,45~
lburg nonterminals, 435 name spaces, 36
locals, 436, 447 name spaces, 36
pseudo-instructions, 432, 441 navigating fragment definitions, 2
register formals, 450 ncalls,93, 290, 293, 333,448-49,451,
registers, 432 484,487
register variables, 432, 434 needconst, 202, 203, 205, 207, 234,
return address, 432, 442-43, 455 235
return register, 432, 443 NeedsReg, 398,417
sample instructions, 430 (negation), 164, 178
scratch registers, 432, 434, 443 NEGD,439,482,514
segments, 459 NEG, 84, 178,318
stack pointer, 432, 434 NE, 84,160, 174,208,318,322, 340,
structure arguments, 449, 454 417
variadic functions, 448-49, 453 NEI, 441, 475, 516
wildcards, 434 NELEMS,19, 30,31,40, 56,59,315-16,
zero register, 432, 437 392,413,415,418-19,422-23,
mips.c,373,430 428,502
(MIPS clobber), 435, 443 nested calls, 183, 186, 335
(MIPS defconst), 455-56 nested scopes, 35
(MIPS defsymbol), 457 nesting levels, see scope, levels
mipsebIR,96,431 NEW0,24,41,44,46,48-49,68, 150,
mipselIR, 96, 431 210, 266, 315, 363, 424
(MIPS emit2), 444, 446-47 newarray,24-25,28, 233,239,274,
(MIPS function), 448-49, 451-55 287,290, 292
(MIPS interface definition), 431 newconst, 98
(mips.md), 431 newfield, 68, 281
(MIPS rules), 431, 436-44, 446 NEW,24, 31,33,57,218, 287,292
(MIPS target), 435, 437, 443, 445, 447 NEWLINE, 110, 111,123-24
(MIPS type metrics), 431 newnode, 98, 246, 311, 312, 315,
missing tokens, 141 316-1~ 320,328,333-34,346-48,
mkactual,357, 366,445,477, 512 400,425-26

newstruct, 66, 67, 68, 77, 277, 278, omitted fragments, 3


310 omitted modules, 14
(new-style argument), 188 one-pass compilation, 529
new-style functions, 63, 187-88, 270, opaquetypes,39, 56
272, 287 (open a scope in a parameter list), 268,
and old-style declarations, 286, 289 269,270
newtemp,50, 98,423 operating systems
nextline, 103-104, 105, 106, 111, IRIX, 431
112, 122, 124, 125 MS-DOS, 496
nodecount,222, 223,314,315-16, 317 Ultrix, 431
Node,81 operators, 528
node,81, 149, 313, 314,315,31~ (operators), 82
318-20, 327, 349, 353, 358, 361, oper, 155, 157, 158, 163
397, 402, 527-28 opindex,98, 151,315,398,417
NODEPTR_TYPE,375,378,381 OP_LABEL,375, 376
nodes, back-end extension to, 358 _opname,389
nondigit, 114 optree, 155, 158, 163, 174, 177, 181,
nonterminals, 19 191,212,242
lburg, 373 optype, 74, 98, 151, 203, 322, 398,
MIPS lburg, 435 400, 417-18, 423-24, 426, 445-46,
SPARC lburg, 469 477, 502
X86 lburg, 503 ordering instructions, 359
notangle, 1-2, 16 oreg,467,476-77,480,489
notarget,358,403,404,440,473 (OR), 318
(NOT), 318, 322 OR,109, 149, 160, 174,225,318,
NOT, 149, 160, 174,225,318, 322 322-23, 335, 349
noweave, 1 OROR,109, 163
noweb, 1, 14,21 OTHER, 110, 111
nstack, 502 outfile, 307
nstate, 502 outflush,17, 9~98,99,293,307
_ntname,406 outofline, 58,61, 78, 79
_nts, 380-81 output buffer, 97
null characters, 123 output.c, 14, 16, 18, 392
nullcheck, 179, 181-82,215 outs, 16, 1~9~99,392
null-pointer errors, 179, 214 outtype,75
null pointers, 194, 196, 201
number, 361, 363, 446, 453-54, 488-90 (p ← unary), 164, 165
(numeric constants), 167 parameters, see also formals
representing with two arrays, 286,
Oberon-2, 527 337
object-oriented languages, 527 parameters, 259, 268, 270, 271, 274,
object pointers, 176, 193 286-88
octal-constant, 126 PARAM,38, 6~ 80, 89, 168,269, 272,
(octal constant), 116 288,291,293,296-98
offset,66, 183, 211, 219,283, 339, (parse fields), 280
358, 362,364,365, 366,444-45, parseflags,358,370,458,498
448-49, 451, 485, 487, 489-90, (parse -G flag), 433, 458
519, 520 (parse new-style parameter list), 271,
offsets 273
for formals, 362 (parse old-style parameter list), 271
for locals, 362 (parse one argument), 187, 188
initializing, 365 (parse one field), 280, 281-82
oldparam,288 parser, tree, 373
(old-style argument), 188, 189 parser generators, 127, 145
old-style functions, 188, 270-71, 287 (parse SPARC flags), 466
and new-style declarations, 286, 289 parse trees, 129, 147
oldstyle,63,64, 73, 187, 189,259, parsing functions, 133, 137, 151
266,272-73,286 simplifying, 139, 161
(omit leading register copy?), 392, 393 partitioning case labels, 239, 250
omitted assertions, 3 (pass a structure directly), 188, 191

patterns, tree, 373 print, 18, 9~ 99,352,469, 520


peephole optimization, 406, 531, 534 printproto, 75, 76
perfect hashing, 114, 125 procedure activation record, see frame
PERM, 31,48-49, 57-59,61,67-68, 73, productions, 19, 127
97,254,260, 262,274,290,300, prof.c, 14
363 profiling, 220
-P, 75,304 profio.c, 14
(plant an event hook for a call), 187 progbeg,89, 305,307, 399,410,433,
(plant an event hook for a struct 447, 466, 467, 498, 501-2, 518
return), 245 progend, 89, 305, 307, 433, 466, 501,
(plant an event hook for return p), 245 502
(plant event hook for return), 244 program,253,305,307
(plant event hook), 221 projecting instruction trees, 359, 385,
(plant event hooks for ?:), 159 projecting instruction trees, 359, 385,
(plant event hooks for && ||), 163 426
pointer, 267 lburg,375
(pointer->field), 166, 182 promote, 71, 172-73, 174,178, 189-90,
(pointer), 267, 268 193,233,245,287-88
POINTER,49, 54,60-61,69, 72-73,82, promoting subword arguments, 338
109, 175,266,268 prototypes
pointer, 155, 156, 159, 163, 165, 173, in types, see types
174, 178-82, 186, 188, 197,242, printing, 75
244 (prune), 386-87
pointers prune, 353-54, 385, 386,387,402,
additions of integers to, 192 409, 425-27
assignments to, 196 pseudo-instructions
comparing, 201 MIPS, 432, 441
simplifying additions involving, 209 SPARC, 463, 470
types for, 54, 528 (p->syms[2] ← a generated
pointersym,61 temporary), 346, 348
(pointer-to-pointer conversion), 175, ptr,61, 61, 64, 72, 169-70, 174, 179,
176 182-83, 191, 201, 266, 275, 291,
position-independent code, 251 293-94, 515
(postdecrement), 166 ptrmetric,61, 79
postfix-expression, 154, 164 (put even lightly used locals in
postfix, 153, 164-65,166, 186, 214, registers), 483
336 putreg,409,410,417-18,424
(postincrement), 166
Postscript, 533 (q ← the r.h.s. tree), 329, 330-31
pp-number, 119 (*q represents p's rvalue), 316
ppnumber,118, 119, 120 qual,62,63, 72-73, 180, 183,201, 257,
pre-ANSI C compilers, 16, 122 266,302
precedence, 131, 152 qualified types, 54, 182, 197
prec, 152-53, 155,157, 162 (qualify ty, when necessary), 182, 183
(predecrement), 164 question mark
(preincrement), 164, 165 in assembler templates, 506
(prelabel case for ASGN), 399 quick-fit allocation, 32
(prelabel), 398-400 quo,509, 509
prelabel, 353-54, 397, 398, 399-400,
402,418, 420 ralloc, 353-54, 409-10, 417, 418-19,
preload,398 424, 425, 462, 477
preprocessing numbers, 119 range, 358,388, 389,437,439,443,
preprocessor, ANSI C, 4 rcp, 111, 112-17, 119-20
preprocessor output, 4, 125 rep, 111,112-17, 119-20
primary-expression, 154, 166 readsreg,395,396
primary, 123, 164, 166, 167 (recompute maxlevel), 59, 60
(print an ANSI declaration for p), 304, recursive-descent parsing, 127, 133
305 recursive structure declarations, 276
printdecl, 75-76,305 redeclaration errors, 67, 252, 260-61,
printf,2, 99, 142, 188 269,275, 278,298

reduce, 353-54, 382, 384-85, 387, 402, register variables, 361, 399, 418
470 MIPS, 434
(reduce k?), 368, 369 SPARC, 468, 483
reducers, 379, 381 X86, 500
ref,38, 168, 211, 221, 224,230, 236, register windows on the SPARC, 463,
246-49,294, 296-98, 302-3, 322, 465
339, 346, 523, 524 Regnode,361,362,411,422
(refill buffer), 105, 106 relink, 413, 414
refinc, 168, 169, 220, 221, 222, 224, rem, 509, 509
225, 229, 233, 290, 291 (remove the entry at cp), 246, 247
regcount,291,290299 (remove types with u.sym->scope >=
reg, 356, 400-401, 403, 404, 405, lev), 59
436-44,446,460-61,469-78, rename,488,489,490
480-82,484-86,489-90,492-93, reprune,409,426,426-27
504-6, 508-11, 513-18 {requate), 395-96
register allocation, 354 requate, 353-54, 393, 394, 395-96,
by graph coloring, 428, 531 407, 472, 506
register allocator reset, 311, 317, 321, 323-24, 325,
overview, 409 326,328-29,333
register assignment, 354 (reset refinc if -a was specified), 221
REGISTER, 39, 80, 94-95, 179, 202, 234, (result), 511
256, 270,273,275,296, 297-99, resynch,106, 125
346-47, 348, 399, 412, 417-18, resynchronization directives, 106, 125
424, 450, 451, 453, 483, 486, 488 retargeting lcc, 357
(register local), 299 RET+B, 245
registers RETB, 85
allocation of, 408, 413, 417 {retcode), 244-45
assigning variables to, 94 (retcode), 244-45
assignment of, 408 retcode, 243, 244, 290, 291, 295
caller-saved, 410, 428 (RET), 318
floating-point, 361 RET, 84-86, 244, 245, 318, 350, 417,
general-purpose, 361 RET+I, 6, 245
global allocation, 408 RETI,86,400,443,476,517-18
MIPS, 432 RET+P,245
MIPS callee-saved, 452, 454 RETP,86
MIPS caller-saved, 444 retstruct,484,487
MIPS formals, 450 return address
MIPS return, 432, 443 MIPS,432,442-43,455
MIPS scratch, 432, 434, 443 SPARC, 475, 490
MIPS zero, 432, 437 X86, 519
reloading spilled, 425 (return a structure), 245
SPARC, 465, 476 (return a tree for a struct parameter),
SPARC callee-saved, 468 168, 170
SPARC caller-saved, 468 returning structures, see also functions,
SPARC return, 476 returning structures from
SPARC scratch, 467-68 SPARC, 487
SPARC zero, 465, 473 return register
spilling, 357, 409, 420, 472, 502 MIPS,432,443
targeting, 357 SPARC, 476
X86,498 X86,517
X86 return, 517 (return statement), 221, 243
X86 scratch, 500 (return the symbol if p's value == v),
register sets, see wildcards 48
registers,294,295,299 retv,245-46,291, 291-92,294-95
register symbols, 362 retype, 151, 171, 174-77, 179, 181-82,
initializing, 362 197,202,209,233
register targeting, 397 reuse,382-83,384,390-91,406
register-to-register copies, 354, 360, rewrite,353-54,402,403,425
394, 397, 415, see also moveself RIGHT_CHILD,375
and requate right context, 136

(RIGHT), 318, 335 scratch registers


RIGHT, 149, 151, 155, 156, 166, 171, MIPS, 432, 434, 443
181,184-86,187-89,190, 191, SPARC, 467-68
199-200,208,212-13,245, 318, X86, 500
324, 333, 335-36 (search for an existing type), 57
rightkid, 151, 160,165, 171, 174 seg,265,458,492, 524
RISC, 430, 463 segment,90,91,265,452,459,491,
rmap, 358, 398, 399, 417, 434, 447, 501,502,523, 524
450, 467, 483, 500, 518 segments, 90
rmtypes,42,44,59,67 MIPS, 459
root,155, 156, 190, 191,200,208, SPARC, 491
302,350 X86,501
root nodes, 321, 343 selecting instructions, 354, 373, 402
roundup,19,26,99,283,285,365-66, (select instructions for p), 402
449,451-53,485,487,490, 520 selection code, 236
RSH,84, 177, 198, 318,320 semantic errors, 140
RSH+I,320 (semicolon), 221, 222
RSHI,331,439,474, 508 sentential forms, 128
RSH+U,320 Sethi-Ullman numbering, 428
rtarget, 358, 400, 401, 435, 443, 445, set, 141,143-44, 361,363,395,
447, 468, 476, 477, 480, 508, 410-12,418-19,422,428,445-46,
512-13, 517 449,453
_rule,379, 380-82 (set p and ty), 255, 256
rule, 391 setreg, 358, 398, 399, 400, 435, 437,
rules, in tree grammars, 373 443, 468, 473, 476-77, 480, 508,
rvalue, 169, 169, 174, 179, 181-83, 517
242, 245, 331 (set tval's type and value), 120, 121
rvalues, 53, 168 (set tval's value), 118
RX, 362, 384-86, 392-96, 399-400, 404, (set tval and return ICON or SCON),
415-20, 423-24, 428, 477, 479, 123
508, 511 (shared interface definition), 431, 432
(shared progbeg), 371, 433, 466, 498
salign,358,368,369,446-47,460, (shared rules), 400, 403, 431, 463, 496
461,482,483 short-circuit evaluation, 322, 335
sametree,507 SHORT,48,54,58,69, 73,82, 109, 175,
sample instructions 256-57
MIPS,430 shortmetric,58, 78
SPARC, 463 shorttype,57, 58, 74,177,207,257
X86, 496 shtree, 192, 198, 215, 320, 331
(save argument i), 453 %S, 99, 118
(save argument in a register), 453, 454 side effects, 156
(save argument in stack), 453, 454 signedchar,57,58, 177,257
scanner, see lexical analyzer Silicon Graphics Corporation, 431
(scan one string literal), 122, 123 simp.c,147
(scan past a floating constant), 120, (sizeof), 164, 165
121 simplify, 175, 176, 183, 192-95,203,
(scan past a run of digits), 121 204, 207, 208, 212, 214-15, 235,
scatter,31 250,326, 342
scheduling instructions, 428, 475, 481, (sizeof). 164, 165
494 size_t,24,165
sclass, 38, 39, 80, 93-95, 399, 450-51, skipto, 141, 143,144,222,271
483, 486, 520 (skip whitespace), 111, 112
scope, 35, 80 source-code fragments, 1
in parameter lists, 67, 259, 269 source coordinates, 5, 38, 51, 99, 102,
interrupted, 36 338
levels, 38 space,92,300,304,459,491,492, 524
scope, 37, 41, 44, 46, 47, 48-49, 59-60, SPARC
67, 80, 89, 168, 210-11, 219, 227, address calculation, 470
260-62, 275, 278, 289, 296-98, argument transmission, 477
300,457,491, 521 argument transmission, 4 77

block moves, 482, 492 stabsym,80,432,464


branch tables, 475 stabtype,80,464
calling conventions, 465, 468 stack pointer, 364
defining constants, 490 MIPS, 432, 434
entry sequence, 488 SPARC, 467
exit sequence, 490 X86,498
external identifiers, 490 %start,376
formals, 469, 485 Start,7,217,217-18,290,339
frame, 487 start nonterminal, 127
frame pointer, 467 _state,379,384
generated symbols, 491 state,358, 379,384
immediate instructions, 463, 469 STATE_LABEL, 375
lburg nonterminals, 469 statement, 216
locals, 469, 483 statement,221,222, 224-25,226,
pseudo-instructions, 463, 470 228, 230, 232, 233-35, 236, 290,
registers, 476 294, 295
register variables, 468, 483 (statement label or fall thru to
register windows, 463, 465 default), 221, 226
return address, 475, 490 statements
returning structures, 484, 487 break, 232
return register, 476 case labels, 234
sample instructions, 463 continue, 228
scratch registers, 467-68 for, 228
segments, 491 if,224
stack pointer, 467 loop, 227
variadic functions, 478, 484-85, 489 missing values in return, 244
wildcards, 467 nesting level, 222, 293
zero register, 465, 473 return, 225,243,291
(SPARC blkloop), 493, 494 selection code for switch, 236
sparc.c,373,463 switch, 230
(SPARC clobber), 468, 477, 479-80 using semicolons to terminate, 216
(SPARC emit2), 478, 479, 482 STATIC, 39, 48, 80, 168, 211, 242, 253,
(SPARC function), 484-85, 487-90 256, 258-59,261-62, 263,271,
(SPARC interface definition), 464 287,293,295,297,299-300,302,
sparcIR,96,464 304, 327,457,459,491-92, 521
(sparc.md), 463 (static local), 299, 300
(SPARC rules), 463, 469-78, 480-82 stdarg.h, 17
(SPARC target), 468, 473, 476-77, __STDC__, 17-18
480 (still in a new-style prototype?), 187,
(SPARC type metrics), 464, 465 188
specifier, 255, 256-57, 258, 265, STMT,97, 150,223, 224, 229, 254, 266,
273, 280 295, 311
spillee,409,412,420,422,429 stmt, 376, 401, 403, 405, 437-38,
(spill floats and doubles from i0-i5), 441-44, 446, 471-72, 475-78,
489 481-82, 507-8, 512-18
(spill), 428 stmtlabel,226
spill, 357, 358, 409-10, 412, 427, stmtlabs,226,226,291,293,309
435, 443, 444, 468, 472, 477, 479, strength reduction, 208
502, 513, 517-18 string.c,28
spilling registers, 357, 409, 420, 472, (string constants), 167, 168
502 stringd,29, 33,46,49,67-68, 99,210,
(spillr), 423 274, 281, 365, 449, 458, 484, 486,
spillr,409,420,423,428 522
sprintf, 64-65, 75 stringf, 99, 275, 363, 457, 467, 491,
src,108, 110,220,338 520-21
stabblock,80,464 _string,390
stabend,80 string,29,29, 58-59,61,99, 123
stabfend,80 string-literal, 122
stabinit,80,432,464 string literals, 103, 122
stabline,80,432,464

stringn,29,30,33-34, 56, 102,114, tables, 52


123, 168 tags, see types
strings, storing, 29 tail,233,236-37,343,348
string table, 30 -target,96,306
strtod, 120-21, 125 target,357,435,43~443-46,468,
structarg, 80, 88, 292, 293 473, 476-77, 480, 502, 512
(structdcl), 277-78 targeting registers, 357, 397
structdcl, 66-67, 256-57, 277, 310 templates, assembler, 354, 376, 392
(struct.field), 166 temporaries, 46, 50, 80, 90, 98, 158,
STRUCT, 54,60,69, 73,82,109,256 319,339
structmetric, 79,282 for common subexpressions, 340,
struct-or-union-specifier, 276 344,346
structure arguments, see also for conditionals, 326
functions, passing structures to for structure return values, 185, 332
MIPS, 449, 454 for switch expressions, 231
(structure return block?), 483, 484 linking uses of, 362, 415
SUB, 84, 109, 178, 212, 242, 318 (temporaries), 38, 346
sub, 206 temporary, 50, 80, 90, 187, 191, 202,
subinstructions, 354, 385, 409 210, 319, 346, 384, 386, 396, 413,
subject tree, 373 415, 418-19
SUBP, 83, 438, 474, 506-7 %term, 376
(subscript), 166, 181 (terminal declarations), 376, 431, 463,
subscripting, 181 (tenninal declarations), 376, 431, 463,
subtree, lburg, 380 terminals, 19
subtree, 109, 192,215 terminals, 19
SUBU,438,474, 506 lburg,373
Sun Microsystems, Inc., 463 (terminate list for a varargs
super, 175, 177, 214 function), 273, 274
supertypes, 174 (test for correct termination), 155,
swap,358,371,456, 522 156, 157
(swcode), 240, 241 test,141, 142, 14~ 156, 229,235,
swcode,239,240,241,251,342 258,278,280
swgen,237,239,240, 250 test suites, 532
switch branch tables, see branch tables texpr,150,229-30
(Switch), 217,242 %t,62,99
switch handles, 222, 232, 293 t, 108
Switch, 217, 218, 232, 233, 236, 242, tmask, 358, 410, 417, 422, 434, 468,
243,249,338,341,342 500
(switch statement), 221, 232 tmpnode, 346, 347, 348
SWSIZE,232,233 tmpregs,434,446,482
swstmt,232,233, 236 tnode,266,268,270
Swtch,221,224,228,231,235,239-40 token, 108
swtoseg,26~293, 342 token codes, 5, 99, 102
(symbol flags), 38, 50, 179, 211, 292 associated values for, 102, 110, 113
Symbol,37,39, 52, 110 definitions of, 110
symbol, 37, 40, 55, 65, 68, 78, 80, 117, (token.h), 109
274, 362, 527 token.h,60, 109, 110, 143, 155,191
symbols including, 109, 143, 155, 191
back-end extension to, 39, 81 token, 108, 110, 114-16, 117, 123,
computed, see computed symbols 142, 182,226-27,267,271,277
target-specific names for, 89 tokens, 5, 19, 102
symbols,52 top-down parsing, 128
(symbol-table emitters), 497, 498 (tp is a tree for e++), 335, 336
symbol tables, 35 trace.c, 14
sym.c, 51, 55 translation-unit, 253
syms, 81, 83, 98, 362 trashes, 385
syntax-directed translation, 349 tree
cover, 373, 377
Table,39,40,44-45, 52,219,226 parser, 373
table, 39-40,41,44,242-43,291,342 partial cover, 374

patterns, 373 metrics, 58, 78


subject, 373 new-style function, 63
transformations, 12 old-style function, 63
tree.c, 147, 150 opaque, see opaque types
tree grammars, 8, 373 operators, 54, 60, 74
ambiguity in, 374 parameter prototypes in, 63, 75
costs in, 374 predefined, 58
Tree, 148 predicates, 98
tree, 148, 150, 155, 166-70, 176, 177, prefix form for, 55
181, 183, 187-91, 193, 197, qualified,60,267
199-200, 203, 211-12, 229, 242, scope of, 59
245,327,333 sizes of, 55, 78, 81
(tsym ← type named by token), 114, tags, 55, 65, 276
115 typedefs for, 115, 256
tsym, 108, 110, 112, 113, 115, 117, unqualified,60
120, 123, 164-65, 167-68, 170, types.c,75
256,259,270-71,273, 278,280, types, 40, 41, 44, 55, 58, 59, 61, 66, 67,
287, 295 219, 278, 294
ttob, 73, 73-74, 81, 99, 167, 169, type-specifier, 254
193-94,197,203-7,209, 305,346, typestring, 75-76
447,449-50,453,483, 518 type suffixes, 50, 73, 82, 91, 98
tval, 117, 118, 120-21, 123 (type suffix for tmp->type), 346, 348
(ty is an enumeration or has a tag), (types with names or tags), 54, 55
258 typetable,56, 56-58, 59,67
(type cast), 164, 180-81 (typicalfunction), 93
(type-check e), 233
type-checking, 53, 147, 172 u.block,295
actual arguments, 189 u.c, 371
pointer comparisons, 201 u.c.loc, 81
type constructors, 53 u.c.v,80
TYPEDEF, 39, 115, 170, 256, 259-61, u.c.v.sc,389
301 u.c.v.uc,389
typedef,34,37-39,47,54,66, 78-79, u.f,290
81,96, 148,217,231,355,358, (u fields for Tree variants), 149, 168,
361-62, 365 183
(typedefs), 16 u.i,371
typeerror,178, 192,194, 197,201 UINT_MAX, 117
Type, 39, 47-48, 54, 55, 57, 63, 73, 75, u.l.label, 80
149, 180 Ultrix operating system, 431
type, 56, 57, 61, 69, 76, 78, 80-81, 85, unary-expression, 154, 163
93, 450-51 unary, 153, 157, 161, 162, 164, 165,
(typeInit), 58-59, 61 166, 180, 214
typeInit, 58, 59-60, 306, 327 unary operators, 153
type metrics, 58, 78 undag,340, 342,343, 348
type-name, 308 (undeclared identifier), 170
typename,165,180,309 uninitialized objects, 304
type predicates, 53 union-find, 351
type-qualifier, 254 UNION, 54, 60, 69, 73, 109, 256, 283
types unlist, 325, 350
alignments of, 55, 78, 81 unqual,48,60, 60-62, 71, 73, 168-69,
assumptions about, 58, 79, 527 175, 180, 182, 186-87, 190-91,
back-end extension to, 55 193-95, 197, 200-202, 281,
categories of, 54 288-89,319, 327
compatible, 69 unreachable code, 218, 246
composite, 71 unreferenced identifiers, 292, 296, 298
decaying to other types, 62, 173 unsignedchar, 57, 58, 177,257
enumeration, 68 UNSIGNED,48, 54, 58,60,69, 73,82,
equality of, 56, 69 109, 175, 256-57
incomplete, 264 unsignedlong, 57,58, 117-18,257
intermediate, 53 unsigned operands of unary-, 178

unsigned-preserving conversions, 173 walk,223-2~227-28,229,232,234,


unsignedshort,58, 177,206,257 242,244, 245-47,294,296, 302,
(unsigned-to-double conversion), 176, 311, 312, 339, 348
177 ending a basic block with, 223
unsignedtype,58, 66, 74, 118,165, wants_argb,80,88, 168, 169-70,
173, 175-78, 190, 194-95, 198, 183-85, 188, 190, 292, 483, 486,
202,204, 206-9, 245, 257,281, 529
283-84,330-31 wants_callb, 85, 88, 183, 185-86,
upcalls, 97 245, 291-92, 294-95, 332, 333,
in the back end, 357 483, 529
usedmask, 358, 410, 410, 428, 451, wants_dag, 80, 82, 85, 89, 340, 343,
454-55,488 353, 373
u.seg, 80,90 (warn about non-ANSI literals), 123
use,51, 52, 170,227,256,278,290 (warn about overflow), 120, 121
uses,37, 38-39, 51,422 (warn if more than 127 identifiers), 44
u.s.flist, 279 (warn if p denotes the address of a
u.t.cse,80,412 local), 245
u.value,310 warning, 143
warnings, see errors, reporting
va_alist, 18, 142,449 wchar_t, 122, 126
va_init, 17, 1~ 142 where,150
Value,47,49,91,168, 204,455, 522 (while statement), 221
value,47,69, 155, 156, 157,160, %w,99
169-70, 174, 175, 188-89 wide-character constants, 121
value numbering, 349 (wide-character constants), 112
value-preserving conversions, 173 widen, 73, 74, 188, 190,245
varargs.h, 17 widening, 185
VARARGS, 18, 142 wildcard,363, 398-99
variadic functions, 17, 64, 99, 187, 274 wildcards, 363, 399
MIPS, 448-49, 453 initializing, 363
SPARC,478,484-85,489 MIPS, 434
variadic,65, 71, 99,449,484 SPARC, 467
va_start, 17 X86, 500
vbl,361,412,424,48~489 word-addressed machines, 528
vfields,65, 66,282, 319
visibility, see scope X86
(visit), 345, 346-48 address calculations, 503
visit,343,344, 345,347,351 argument transmission, 512, 519
(visit the operands), 347, 348 assemblers, 496
vmask, 358, 410, 413, 434, 468, 500, block moves, 512
518 branch tables, 515
VOID, 54-5~ 59, 73,82, 109,256 calling conventions, 496
void pointers, 194, 196, 201 defining constants, 522
voidptype,58,61, 74,201,242 entry sequence, 519
voidtype,58, 59, 61-62,65, 168-69, exit sequence, 520
181, 186-87, 18~ 194, 197, 202, external identifiers, 523
242-43, 273-74, 291 formals, 519
VOLATILE, 54,60,63,69, 72-73,109, frame, 520
180,183, 201,256-57,266,268 frame pointer, 498
volatile, 48 lburg nonterminals, 503
volatile variables, 319 locals, 518
v.p,92 registers, 498
VREG,361,384, 395,399,420,508 register variables, 500
VREGP,383,400-401 return address, 519
v.sc,91 return register, 517
v.ss, 91 sample instructions, 496
vtoa,48 scratch registers, 500
v.uc, 91 segments, 501
v.us,91 stack pointer, 498
wildcards, 500

x86.c, 373, 496 Xsymbol, 38, 81, 89-90, 93, 362


(X86 clobber), 502, 513, 517-18 x.target, 357, 400
(X86 defconst), 522 x._templates, 390, 419
(X86 defsymbol), 521 (*xty has all of *yty's qualifiers), 196
(X86 emit2), 511 Xtype, 54, 55
(X86 function), 519-20 x.usecount,362,384,386-87
(X86 interface definition), 497 x.wildcard,363,400,404,411,422
x86IR,96,497, 513
(x86.md),496 YYnull,179,181-82, 214
(X86 progbeg), 498, 499-501, 509
(X86 rules), 497, 503-18
(X86 target), 502, 508-10, 512-13, zerofield, 208, 209
517 zero register
x.argno,359,445-46 MIPS, 432, 437
x.blkfetch,355,369 SPARC, 465, 473
x.blkloop,356,368
x.blkstore,356,369
x.clobber,357,417
x.copy,360, 394,416,419
x.doarg, 356, 357, 403
x.emit2,356, 392
x.emitted,360, 392,394
x.equatable,360, 393, 394,419,420
xfoldcnst,205
-x, 51
x.inst,358, 359,382, 385,386, 394,
419, 427, 470
(Xinterface),355-57
Xinterface, 79,355
(Xinterface initializer), 355, 379,
432,464,498
x.isinstruction,382
x.kids,359, 385,386-87,394-95,400,
409, 413, 415-16, 418-19, 422,
426-27, 446-47, 479, 482, 502,
511
x.lastuse,362,415-16,418,420,423
x.listed,359,385,403,417
x.max_unaligned_load, 355, 369,
483, 493
x.mayrecalc,360, 384, 385
x.name,90,362,363,365,392,394,
396,413,424,434,449,452,
455-56, 457-58, 484, 486, 488,
491, 511,519, 520, 522
x.next,359, 393,395-96,413-14,
422-23,425,428
(Xnode fields), 358-59
(Xnode flags), 358, 359-61
Xnode,81, 82,358
x.offset,362,365,388,449,453-54,
457-58,484,486, 520
x.prev,359,413-14
x.prevuse,359,415,419,423
x.registered,360,417,419,423-24,
428, 437, 473, 476
x.regnode, 362, 363, 395, 410-11, 412,
419,422-24,428,445-46,449,
453-54,477,485,488-90,509
x.state,358,375,382,384
How to Obtain lcc

The complete source code for lcc is available free of charge to the
purchaser of this book. All distributions include the source code for the
front end, the code generators for the SPARC, MIPS R3000 and Intel 386,
the source code for the code-generator generator, and documentation
that gives instructions for installing and running lcc on a variety of
platforms. lcc runs on UNIX systems and on PCs with a 386 processor or its
successor running DOS 6.0 or Windows 3.1.
There is an electronic lcc mailing list. To subscribe, send an e-mail
message with the one-line body
subscribe lcc
to majordomo@cs.princeton.edu. This line must appear in the message
body; "Subject:" lines are ignored. Additional information about lcc is
also available on the World Wide Web via Mosaic and other Web browsers.
The universal resource locator is
http://www.cs.princeton.edu/software/lcc
lcc may be obtained from the sources listed below.

Internet
The distribution is available for downloading via anonymous ftp from
ftp.cs.princeton.edu (128.112.152.13) in the directory pub/lcc. To
retrieve information about the distribution, ftp to ftp.cs.princeton.edu;
for example, on UNIX systems, use the command
ftp ftp.cs.princeton.edu
Log in as anonymous, and use your e-mail address as your password.
Once connected, change to the lcc directory with the command
cd pub/lcc
The file named README gives instructions for retrieving the distribution
with ftp and information about lcc since this book went to press. The
command
get README
will retrieve this file. Follow the instructions therein for retrieving the
distribution in the form that is appropriate for your system.

Diskette
The distribution is available free of charge on a 3.5", high-density diskette
to the original purchaser of this book. To obtain your copy, fill in the
coupon on the next page and return it to Benjamin/Cummings.
lcc is an active research compiler and will continue to change over
time. Thus, the diskette version cannot be as up to date as the online
versions.
To obtain a free 3.5" diskette containing the lcc distribution, fill in
the coupon below, carefully remove this entire page from the book, fold
the page so that the Benjamin/Cummings Publishing Company address,
printed on the reverse side, is visible, attach appropriate postage, and
mail. Allow two weeks from receipt of this coupon for delivery.

Only an original of this page can be redeemed for a diskette;
photocopies are not accepted.
Computer Science Marketing Department
Benjamin/Cummings Publishing Company
390 Bridge Parkway
Redwood City, CA 94065
Attention: lcc Disk Fulfillment
