GCC Internals 1
Abhijat Vichare
CFDVS, Indian Institute of Technology, Bombay
January 2008
A.Vichare
GCC Internals
Plan
Part II
Gimple
The MD-RTL and IR-RTL Languages in GCC
GCC Machine Descriptions
(1:1:2)
Figure: the gcc driver and the programs it runs to produce the Target Program: cpp, cc1, as, and ld. The diagram also marks the parts supplied by GCC and by glibc/newlib.
GCC is:
Retargetable: Can generate code for many back ends
Re-sourcable: Can accept code in many HLLs
Sequence
Parsing
Semantic Analysis
Optimization
tdevelop: The Development time (the gcc developer view)
tbuild: The Build time (the gcc builder view)
top: The Operation time (the gcc user view)
The downloaded GCC sources correspond to the gcc developer view, and are ready for the gcc builder view.
(2:1:3)
Figure: the GCC architecture across the three time frames.
At tdev, the GCC sources comprise:
  HLL Specific Code, per HLL
  Language and Machine Independent Generic Code
  Machine dependent Generator Code
  Set of Machine Descriptions
At tbuild, parts of these sources are Selected, Copied, or Generated to form cc1/gcc:
  Parser, Genericizer, Gimplifier, Tree SSA Optimizer, RTL Generator, Optimizer, Code Generator
At top, the built compiler translates a Source Program into an Assembly Program.
Is GCC complex?
As a Compiler:
  Architecture? Not quite!
  Implementation? Very much!
Architecture wise:
  Superficially: GCC is similar to typical compilers!
  Deeper down: Differences are due to Retargetability. GCC can be (and is) used as a Cross Compiler!
(1:1:3)
For the targeted (= pristine + generated) C compiler:
Total lines of code                   810827
Total lines of pure code              606980
Total pure code WITHOUT #include      602351
Total number of #include directives   4629
Total #include files                  336
(1:1:4)
Realistic code size information (excludes comments):
Total lines of code          47290
Total lines of .md code      23566
Total lines of header code   9986
Total lines of C code        16961
(1:1:6,7)
Figure: diagram with nodes labelled Args, Stmt1, and Stmt2.
(1:1:6,7)
f (a)
{
  unsigned int i.0;
  char * i.1;
  char * D.1140;
  int D.1141;
  ...
  goto <D1136>;
<D1135>:
  ...
  D.1140 = a + i.1;
  D.1141 = g * i;
  ...
<D1136>:
  if (i < n) goto <D1135>;
  ...
}
(1:1:6,7)
f (a)
{
  ...
  int D.1144;
  ...
<bb 0>:
  n_2 = 10;
  i_3 = 0;
  goto <bb 2> (<L1>);
<L0>:
  ...
  D.1140_9 = a_8 + i.1_7;
  D.1141_11 = g_10 * i_1;
  ...
<L1>:;
  if (i_1 < n_2) goto <L0>; else ...;
  ...
}
(1:1:6,7)
(insn 21 20 22 2 (parallel [
      (set (reg:SI 61 [ D.1141 ])
           (mult:SI (reg:SI 66)
                    (mem/i:SI (plus:SI (reg/f:SI 54 ...)
                                       (const_int -8 ...)))))
      (clobber (reg:CC 17 flags))
    ]) -1 (nil)
  (nil))
(2:1:10)
Creating GIMPLE representation in cc1 and GCC
c_genericize()                            c-gimplify.c
gimplify_function_tree()                  gimplify.c
gimplify_body()                           gimplify.c
gimplify_stmt()                           gimplify.c
gimplify_expr()                           gimplify.c
lang_hooks.callgraph.expand_function()
tree_rest_of_compilation()                tree-optimize.c
tree_register_cfg_hooks()                 cfghooks.c
execute_pass_list()                       passes.c
/* TO: Gimple Optimisations passes */
(2:1:13)
Tree Pass Organisation
Data structure records pass info: name, function to execute etc. (struct tree_opt_pass in tree-pass.h)
Instantiate a struct tree_opt_pass variable in each pass file.
List the pass variables (in passes.c).
Dead Code Elimination (tree-ssa-dce.c)
  struct tree_opt_pass pass_dce = {
    "dce",          // pass name
    tree_ssa_dce,   // fn to execute
    NULL,           // sub passes
    ...             // and much more
  };
RTL Passes
(2:1:20)
Driver: passes.c:rest_of_compilation ()
Basic Structure: Sequence of calls to rest_of_handle_* () + bookkeeping calls. (over 40 calls!)
Bulk of generated code used here! (generated code in: $GCCBUILDDIR/gcc/*.[ch])
Goals:
  Optimise RTL
  Complete the non strict RTL
Manipulate
  either the list of RTL representation of input,
  or contents of an RTL expression,
  or both.
(2:1:26)
passes.c:rest_of_handle_final () calls
assemble_start_function ();   varasm.c
final_start_function ();      final.c
final ();                     final.c
final_end_function ();        final.c
assemble_end_function ();     varasm.c
(1:1:14)
Some Terminology
The sources of a compiler are compiled (i.e. built) on machine X
  X is called the Build system
The built compiler runs on machine Y
  Y is called the Host system
The compiler compiles code for target Z
  Z is called the Target system
Note: The built compiler itself runs on the Host machine and generates executables that run on the Target machine!
(1:1:15)
Some Definitions
A few interesting permutations of X, Y and Z are:
X = Y = Z    Native build
X = Y ≠ Z    Cross compiler
X ≠ Y ≠ Z    Canadian Cross compiler
Example
Native i386: built on i386, hosted on i386, produces i386 code.
Sparc cross on i386: built on i386, hosted on i386, produces Sparc code.
Building a Compiler: Bootstrapping
A compiler is just another program
It is improved, bugs are fixed and newer versions are released
To build a new version given a built old version:
  Stage 1: Build the new compiler using the old compiler
  Stage 2: Build another new compiler using compiler from stage 1
  Stage 3: Build another new compiler using compiler from stage 2
Stage 2 and stage 3 builds must result in identical compilers
(1:1:11)
Our conventions
GCC source directory : $(GCCHOME)
GCC build directory : $(GCCBUILDDIR)
GCC install directory : $(GCCINSTALLDIR)
$(GCCHOME) ≠ $(GCCBUILDDIR) ≠ $(GCCINSTALLDIR)
(1:1:16)
Some Information
Build-Host-Target systems inferred for native builds
Specify Target system for cross builds
  Build, Host systems: inferred
Build-Host-Target systems can be explicitly specified too
For GCC: A system = three entities
  cpu
  vendor
  os
(1:1:17,19)
Specify target: optional for native builds, necessary for others (option --target=<host-cpu-vendor string>)
Choose source languages (option --enable-languages=<CSV lang list (c,java)>)
Specify the installation directory (option --prefix=<absolute path of $(GCCINSTALLDIR)>)
configure output: customized Makefile
prompt$ make 2> make.err > make.log
prompt$ make install 2> install.err > install.log
Tip: Run configure in $(GCCBUILDDIR). See $(GCCHOME)/INSTALL/.
Adding a New MD
(1:1:18)
To add a new backend to GCC
Define a new system name, typically a triple. e.g. spim-gnu-linux
Edit $GCCHOME/config.sub to recognize the triple
Edit $GCCHOME/gcc/config.gcc to define
  any backend specific variables
  any backend specific files
  $GCCHOME/gcc/config/<cpu> is used as the backend directory
for recognized system names.
Tip: Read comments in $GCCHOME/config.sub & $GCCHOME/gcc/config/<cpu>.
(1:1:20)
GCC builds in two main phases:
Adapt the compiler source for the specified build/host/target systems
Consider a cross compiler:
  Find the target MD in the source tree
  Include MD info into the sources (details follow)
(1:1:21)
make first compiles and runs a series of programs that process the target MD
Typically, the program source file names are prefixed with gen
The $GCCHOME/gcc/gen*.c programs
  read the target MD files, and
  extract info to create & populate the main GCC data structures
Figure: the generator programs, the files they generate, and what those files contain.
Common support code: libiberty.a, gensupport.c, rtl.c, read-rtl.c, print-rtl1.c, errors.c, bitmap.c, ggc-none.c
genconditions   insn-conditions.c   struct c_test insn_conditions[], size_t n_insn_conditions
genconstants    insn-constants.h    GCC_INSN_CONSTANTS_H
genflags        insn-flags.h        HAVE_(md instructions)
genconfig       insn-config.h
gencodes        insn-codes.h        enum insn_code { CODE_FOR_(md inst) = .. ... };
genattr         insn-attr.h         HAVE_ATTR_(md_inst_attribs)
genemit         insn-emit.c         RTX emission functions for every insn in MD file
genextract      insn-extract.c      Extract operands of RTL instructions in MD file
genopinit       insn-opinit.c       Writes a function that initialises an array with the code for each insn/expand in MD file
genpeep         insn-peep.c         Extract peephole optimisation information in MD files
(1:1:23)
Run: make install to install the compiler
Tip: Redirect all the outputs: $ make > make.log 2> make.err