Build a GCC Cross Compiler for a Specific CPU

This document discusses building a GCC cross compiler for a custom CPU. It first describes designing a simple 32-bit RISC CPU, including its specification, registers, instruction formats, and instruction sets. It then discusses the structure of GCC and the knowledge required to port it. The build flow involves first building a GCC cross assembler and linker, then the cross compiler. A test program is used to validate the cross compiler.

Uploaded by

Atul Jadhav
Copyright
© Attribution Non-Commercial (BY-NC)

Build a GCC Cross Compiler for a Specific CPU

Chia-Tsun Wu
D92943007
Outline

 Introduction to SoC
 Motivation and project goal
 Design a CPU
 Tools used to design the CPU hardware
 CPU Specification
 CPU Design flow
 Simulation and Results
Outline (cont’d)

 Build a GCC Cross Compiler
 GCC structure
 Knowledge to port GCC
 Build Flow
 Build a GCC Cross Assembler and Cross Linker
 Build a GCC Cross Compiler
 A simple test program
 Summary
Introduction to SoC
 SoC: System on a Chip.
 Highly integrated; typically includes:
 CPU
 System Bus
 Peripherals
 Co-processor
 …………

 Low cost, low area, high performance.


What is SOC?

 Portable / reusable IP
 Software (both on-chip and off)
 Embedded CPU
 Mixed-signal blocks
 Embedded memory
 Programmable HW (FPGAs)
 Real-world interfaces (USB, PCI, Ethernet)
 > 500K gates
SOC Design Flow
[Figure: SoC design flow. System specs → HW/SW partitioning → hardware description and software description → HW synthesis and configuration, interface synthesis, software generation and parameterization → configuration modules, hardware components, HW/SW interfaces, software modules → HW/SW integration and cosimulation → integrated system → system evaluation and design coverification → system validation.]
Motivation and project goal
 Motivation:
 SoC has been the major trend in recent years
 The CPU is one of the key cores of an SoC design
 The development environment is the most important thing for a CPU
 Goal:
 Design a simple 32-bit RISC CPU
 Build a cross assembler and cross linker for a specific CPU
 Build a cross compiler for a specific CPU
Design a CPU
 Specification
 32-bit RISC based CPU
 General-purpose register architecture
 32-bit (4 Gbyte) addressing
 32-bit fixed instruction length (excluding immediate data)
 MSB first
 Reset address 0x000ffffc
 No pipeline; each instruction cycle takes four clock cycles:
 Instruction fetch
 Instruction decode and Data fetch
 Execution
 Write back
 No interrupt
 No timer
Registers
 General purpose register R0~R15
 R13: Accumulator
 R14: memory data pointer
 R15: stack pointer

 Program counter (PC) (0x000ffffc after reset)
 Program status (PS) (Sign flag, Zero flag, oVerflow flag, Carry flag)
Instruction formats
 General: OP Rn1, Rn2
 OP: 8 bits
 n: 4-bit register number (0000: R0 ... 1111: R15)
 Immediate: OP #data, Rn2
 OP: 8 bits
 n: 4-bit register number (0000: R0 ... 1111: R15)
 #data: 32-bit data
 Branch: OP Addr
 OP: 16 bits (low byte = 0x00)
 Addr: 32-bit branch address
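The three formats can be sketched as a small encoder. This is only an illustration, not the author's assembler: the function names are hypothetical, and the bit placement (opcode in bits 15..8, Rn1 in bits 7..4, Rn2 in bits 3..0 of a 32-bit word) and the LDI opcode 0x04 are assumptions read off the test vectors shown later in the deck.

```python
# Opcodes as they appear in the instruction-set and test-vector slides.
OPCODES = {"ADD": 0x00, "ADDC": 0x01, "SUB": 0x02, "SUBC": 0x03,
           "LDI": 0x04, "MOV": 0x05, "RET": 0x06, "JMP": 0x07}


def encode_general(op, rn1, rn2):
    """General format 'OP Rn1, Rn2' -> one 32-bit instruction word."""
    if not (0 <= rn1 <= 15 and 0 <= rn2 <= 15):
        raise ValueError("register numbers must be 0..15")
    return (OPCODES[op] << 8) | (rn1 << 4) | rn2


def encode_immediate(op, data, rn2):
    """Immediate format 'OP #data, Rn2' -> instruction word plus 32-bit data word."""
    return [encode_general(op, 0, rn2), data & 0xFFFFFFFF]


def encode_branch(op, addr):
    """Branch format 'OP Addr' -> 16-bit opcode (low byte 0x00) plus 32-bit address."""
    return [OPCODES[op] << 8, addr & 0xFFFFFFFF]


def bits(word):
    """Render a word the way the test-vector slide prints it."""
    return format(word, "032b")


print(bits(encode_general("ADDC", 2, 3)))
print([bits(w) for w in encode_immediate("LDI", 0xF, 15)])
```

Comparing the printed strings with the test-vector slide is a quick way to check that the assumed field layout agrees with the hardware description.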
Instruction sets
 ADD Rn1,Rn2 Machine code:00000000Rn1Rn2
 Rn2=Rn1+Rn2
 Flag: SZVC
 ADDC Rn1,Rn2 Machine code:00000001Rn1Rn2
 Rn2=Rn1+Rn2+C
 Flag: SZVC
 SUB Rn1,Rn2 Machine code:00000010Rn1Rn2
 Rn2=Rn2-Rn1
 Flag: SZVC
 SUBC Rn1,Rn2 Machine code:00000011Rn1Rn2
 Rn2=Rn2-Rn1-C
 Flag: SZVC
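The slides only name the four flags, so the following is a sketch under the usual conventions (an assumption, not taken from the CPU's RTL): S is bit 31 of the result, Z marks a zero result, C is the carry out of bit 31, and V marks signed overflow. The helper name `add32` is hypothetical; with `carry_in=0` it models ADD, and with the previous C flag as `carry_in` it models an add-with-carry.

```python
MASK = 0xFFFFFFFF


def add32(rn1, rn2, carry_in=0):
    """Return (result, flags) for Rn2 = Rn1 + Rn2 + carry_in on 32-bit values."""
    a, b = rn1 & MASK, rn2 & MASK
    total = a + b + carry_in
    result = total & MASK
    flags = {
        "S": result >> 31,             # sign: bit 31 of the result
        "Z": int(result == 0),         # zero result
        "C": int(total > MASK),        # carry out of bit 31
        # signed overflow: both operands differ in sign from the result
        "V": int(((a ^ result) & (b ^ result) & 0x80000000) != 0),
    }
    return result, flags


# A 64-bit addition built from two 32-bit steps: low words first, then
# the high words plus the carry produced by the low step.
low, low_flags = add32(0xFFFFFFFF, 0x00000001)
high, _ = add32(0x00000000, 0x00000000, carry_in=low_flags["C"])
print(hex(high), hex(low))
```

The two-step addition at the end is the usual reason an instruction set carries both a plain add and an add-with-carry.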
Instruction sets
 LDI #data,Rn2 Machine code:000001000000Rn2#Data
 Rn2=data
 Flag:
 MOV Rn1,Rn2 Machine code:00000101Rn1Rn2
 Rn2=Rn1
 Flag:
 RET Machine code:0000011000000000
 PC=[SP--]
 Flag:
 JMP #Addr Machine code:0000011100000000#Addr
 PC=[Addr]
 Flag:
Tools used
 Synopsys Design Compiler
 Mentor Graphics ModelSim
 Synopsys Apollo
 TSMC 0.25um standard cell libraries
Design Flow
 CPU specifications
 RTL coding
 Functional simulation (input: test bench)
 Synthesis with Design Compiler (input: constraints)
 Gate-level simulation (input: test bench)
 Place and route with Apollo (input: constraints)
 Post-layout simulation (input: test bench)
 Tape out

Test vectors (assembly, 32-bit instruction word, then the 32-bit immediate or address word where one follows)
LDI #0x0,R0 00000000000000000000010000000000 00000000000000000000000000000000
LDI #0x1,R1 00000000000000000000010000000001 00000000000000000000000000000001
LDI #0x2,R2 00000000000000000000010000000010 00000000000000000000000000000010
LDI #0x3,R3 00000000000000000000010000000011 00000000000000000000000000000011
LDI #0x4,R4 00000000000000000000010000000100 00000000000000000000000000000100
LDI #0x5,R5 00000000000000000000010000000101 00000000000000000000000000000101
LDI #0x6,R6 00000000000000000000010000000110 00000000000000000000000000000110
LDI #0x7,R7 00000000000000000000010000000111 00000000000000000000000000000111
LDI #0x8,R8 00000000000000000000010000001000 00000000000000000000000000001000
LDI #0x9,R9 00000000000000000000010000001001 00000000000000000000000000001001
LDI #0xa,R10 00000000000000000000010000001010 00000000000000000000000000001010
LDI #0xb,R11 00000000000000000000010000001011 00000000000000000000000000001011
LDI #0xc,R12 00000000000000000000010000001100 00000000000000000000000000001100
LDI #0xd,R13 00000000000000000000010000001101 00000000000000000000000000001101
LDI #0xe,R14 00000000000000000000010000001110 00000000000000000000000000001110
LDI #0xf,R15 00000000000000000000010000001111 00000000000000000000000000001111
ADD R0,R1 00000000000000000000000000000001
ADDC R2,R3 00000000000000000000000100100011
SUB R4,R5 00000000000000000000001001000101
SUBC R6,R7 00000000000000000000001101100111
MOV R8,R9 00000000000000000000010110001001
JMP 0x000000 00000000000000000000011100000000 00000000000000000000000000000000
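Reading the vectors back is a useful cross-check. The sketch below is a hypothetical disassembler (the name `decode` and the opcode table are assumptions derived from the vectors themselves), showing how each 32-bit word splits into opcode, Rn1 and Rn2, and how LDI and JMP consume a second word.

```python
MNEMONICS = {0x00: "ADD", 0x01: "ADDC", 0x02: "SUB", 0x03: "SUBC",
             0x04: "LDI", 0x05: "MOV", 0x06: "RET", 0x07: "JMP"}


def decode(words):
    """Disassemble a list of 32-bit words into assembly-like strings."""
    out, i = [], 0
    while i < len(words):
        word = words[i]
        op, rn1, rn2 = (word >> 8) & 0xFF, (word >> 4) & 0xF, word & 0xF
        name = MNEMONICS[op]
        if name == "LDI":                      # second word is the immediate
            out.append(f"LDI #{words[i + 1]:#x},R{rn2}")
            i += 2
        elif name == "JMP":                    # second word is the address
            out.append(f"JMP {words[i + 1]:#x}")
            i += 2
        elif name == "RET":
            out.append("RET")
            i += 1
        else:                                  # register-to-register format
            out.append(f"{name} R{rn1},R{rn2}")
            i += 1
    return out


vectors = [int(b, 2) for b in (
    "00000000000000000000010000001010",  # LDI #0xa,R10 (instruction word)
    "00000000000000000000000000001010",  # LDI immediate word
    "00000000000000000000000100100011",  # ADDC R2,R3
    "00000000000000000000010110001001",  # MOV R8,R9
)]
print(decode(vectors))
```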
Simulation result
Synthesis results
 TSMC 0.25um
 Area: 0.35 mm²
 Clock: 400 MHz
 Power: 1.73 mW
 UMC 0.18um
 Area: 0.19 mm²
 Clock: 600 MHz
 Power: 1 mW
Build a GCC Cross Compiler

 GCC structure
 Knowledge to port GCC
 Build Flow
 Build a GCC Cross Assembler and Cross Linker
 Build a GCC Cross Compiler
 A simple test program
 Summary
GCC Execution

[Figure: the gcc driver takes the input file and produces the output file by running cpp, then cc1 or g++, then gas (assembler), then ld (linker).]
The Structure of a Compiler
The Structure of GCC
 Front ends: C, C++, ObjC, Fortran
 Parsing produces TREE, which is converted to RTL
 The machine description and macro definitions feed the RTL passes
 Global optimizations
 Jump optimization
 Common subexpression elimination
 Loop optimization
 Data flow analysis
 Instruction combining
 Instruction scheduling
 Register class preferencing
 Register allocation
 Peephole optimizations
 Assembly output
GCC Code Generation

 Back-end machine description patterns match the intermediate format (RTL).
 The machine description acts like a template.
 The machine description includes
 type bit widths, memory alignment
 instruction patterns, register classes
 peephole optimization rules

GCC Code Generation (cont’d)

Intermediate format (RTL):
(set (reg:SF 12)
     (minus:SF (reg:SF 13)
               (reg:SF 14)))

Machine description:
(define_insn "subsf3"
  [(set (match_operand:SF 0 "register_operand" "=f")
        (minus:SF (match_operand:SF 1 "register_operand" "f")
                  (match_operand:SF 2 "register_operand" "f")))]
  ""
  "subf\\t%0,%1,%2")

Output assembly:
subf r1, r2, r3
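The matching step can be modelled in a few lines. This is a toy sketch, not how GCC's generated recognizer works internally: RTL is represented as Python tuples of the form (code, mode, operands...), and `match`, `emit` and the tuple layout are all hypothetical names introduced for illustration.

```python
import re


def register_operand(rtx, mode):
    """Predicate: accept a (reg, mode, number) tuple of the requested mode."""
    return isinstance(rtx, tuple) and rtx[:2] == ("reg", mode)


def match(pattern, rtx, operands):
    """Recursively match `rtx` against `pattern`, recording operands by number."""
    if not isinstance(pattern, tuple):
        return pattern == rtx
    if pattern[0] == "match_operand":
        _, mode, number, predicate = pattern
        if not predicate(rtx, mode):
            return False
        operands[number] = rtx
        return True
    if not isinstance(rtx, tuple) or len(rtx) != len(pattern):
        return False
    if pattern[:2] != rtx[:2]:                 # RTL code and mode must agree
        return False
    return all(match(p, r, operands) for p, r in zip(pattern[2:], rtx[2:]))


def emit(template, operands):
    """Fill %0, %1, ... in the output template with register names."""
    return re.sub(r"%(\d)",
                  lambda m: "r%d" % operands[int(m.group(1))][2], template)


# Tuple model of the subsf3 pattern from the slide.
SUBSF3 = ("set", None,
          ("match_operand", "SF", 0, register_operand),
          ("minus", "SF",
           ("match_operand", "SF", 1, register_operand),
           ("match_operand", "SF", 2, register_operand)))

insn = ("set", None, ("reg", "SF", 12),
        ("minus", "SF", ("reg", "SF", 13), ("reg", "SF", 14)))

operands = {}
if match(SUBSF3, insn, operands):
    print(emit("subf\t%0,%1,%2", operands))
```

The sketch prints the RTL register numbers directly; in the real compiler, register allocation decides which hard registers appear in the assembly, which is why the slide's output line uses different numbers.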


Example of RTL

(plus:SI (reg:SI 8) (const_int 123))

 Adds two 4-byte integer (SImode) operands.
 First operand is register
 Register is also 4-byte integer.
 Register number is 8.
 Second operand is constant integer.
 Value is “123”.
 Mode is VOIDmode (not given).
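To make the reading of this expression concrete, here is a minimal sketch that parses the textual form into nested lists and separates each RTL code from its mode. The helper names `parse_rtl` and `code_and_mode` are hypothetical; GCC itself reads RTL with its own C routines.

```python
def parse_rtl(text):
    """Parse one RTL S-expression into nested lists of strings and ints."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] != "(":
            tok = tokens[pos]
            value = int(tok) if tok.lstrip("-").isdigit() else tok
            return value, pos + 1
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        return items, pos + 1                  # skip the closing parenthesis

    return read(0)[0]


def code_and_mode(head):
    """Split 'plus:SI' into ('plus', 'SI'); a missing mode means VOIDmode."""
    code, _, mode = head.partition(":")
    return code, mode or "VOID"


expr = parse_rtl("(plus:SI (reg:SI 8) (const_int 123))")
print(expr)
for operand in expr[1:]:
    print(code_and_mode(operand[0]), operand[1])
```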
Templates

 Used for three purposes:


 Generating RTL from parse tree.
 Generating machine insns from RTL.
 Specifying parameters about instructions.
 Sample Template for RISC machine:
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %0,%1,%2"
  [(set_attr "type" "arith")])
GCC Porting and Retargeting

 Porting to new machines/processors
 See the “Using and Porting the GCC” book, which is self-contained.
 Done by describing the machine, not by describing how to compile for it.
 Using GCC as a back end for other languages
 Poorly documented.
 Few examples.
 See the GNAT, GNU Cobol, and Fortran front ends.
 In both cases, start by copying from a similar port.
How to port GCC

 In directory gcc-xxx/gcc/config/machine/
 machine.h
 Contains C macros that define general attributes of the machine.
 machine.md
 Contains RTL expressions that define the instruction set.
 Input to the programs that produce .h and .c files.
 machine.c
 Machine-dependent functions; normally things too large to put cleanly into the two files above.
How to port GCC (cont’d)
gcc/config
--Architecture characteristic key
 H A hardware implementation does not exist.
 M A hardware implementation is not currently being manufactured.
 S A Free simulator does not exist.
 L Integer registers are narrower than 32 bits.
 Q Integer registers are at least 64 bits wide.
 N Memory is not byte addressable, and/or bytes are not eight bits.
 F Floating point arithmetic is not included in the instruction set
 I Architecture does not use IEEE format floating point numbers
 C Architecture does not have a single condition code register.
 B Architecture has delay slots.
 D Architecture has a stack that grows upward.
 l Port cannot use ILP32 mode integer arithmetic.
gcc/config
--Architecture characteristic key
 q Port can use LP64 mode integer arithmetic.
 r Port can switch between ILP32 and LP64 at runtime. (Not necessarily supported by all subtargets.)
 c Port uses cc0.
 p Port does not use define_peephole.
 f Port does not define prologue and/or epilogue RTL expanders.
 g Port does not define TARGET_ASM_FUNCTION_(PRO|EPI)LOGUE.
 m Port does not use define_constants.
 b Port does not use '"* ..."' notation for output template code.
 d Port uses DFA scheduler descriptions.
 h Port contains old scheduler descriptions.
 a Port generates multiple inheritance thunks using TARGET_ASM_OUTPUT_MI(_VCALL)_THUNK.
 t All insns either produce exactly one assembly instruction, or trigger a define_split.
 e <arch>-elf is not a supported target.
 s <arch>-elf is the correct target to use with the simulator in /cvs/src.
gcc/config
--Architecture characteristic key
 Gcc-config.txt
define_peephole
 In addition to instruction patterns, the `md' file may contain definitions of machine-specific peephole optimizations.
 The combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them.
 For example, sometimes two consecutive insns related in purpose can be combined even though the second one does not appear to use a register computed in the first one. A machine-specific peephole optimizer can detect such opportunities.
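A peephole rule of this kind can be sketched as a scan over adjacent instruction pairs. The rule below is a hypothetical example, not one from any GCC port: a load that immediately re-reads the location just stored is replaced by a register move. The load does not use a register computed by the store, so data-flow-driven combining would not pair them. The tuple format (mnemonic, source, destination) is an assumption, and the rule presumes ordinary (non-volatile) memory.

```python
def peephole(insns):
    """Replace 'st rX,[a]; ld [a],rY' with 'st rX,[a]; mov rX,rY'."""
    out = []
    for insn in insns:
        prev = out[-1] if out else None
        if (prev is not None and prev[0] == "st" and insn[0] == "ld"
                and insn[1] == prev[2]):
            # The value just stored is still in prev[1]; skip the memory read.
            out.append(("mov", prev[1], insn[2]))
        else:
            out.append(insn)
    return out


before = [("st", "r1", "[sp]"), ("ld", "[sp]", "r2"), ("add", "r2", "r3")]
print(peephole(before))
```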
define_splits
 Often you can rewrite a single insn as a list of individual insns, each corresponding to one machine instruction.
 The compiler splits the insn if there is reason to believe that doing so might improve instruction or delay slot scheduling.
 Splits are evaluated after the combiner pass and before the scheduling passes.
 Splits can optimize for both speed and instruction length, so they are the right place to put this intelligence.
 Ex: When loading a small negative constant, we can save space and time by loading the positive value and then sign-extending it.
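The constant-loading example can be sketched as a function that picks the instruction sequence for a given value. The mnemonics `ldi:8` and `extsb` are assumed here to behave as in the FR30 instruction set (an 8-bit unsigned immediate load and a byte sign extension); `ldi:32` appears in the compiler output later in the deck. The function names are hypothetical.

```python
def split_load_constant(value, reg):
    """Split 'load constant into reg' into machine instructions, shortest first."""
    if 0 <= value <= 0xFF:
        return [f"ldi:8 #{value}, {reg}"]
    if -0x80 <= value < 0:
        # Load the low byte as an unsigned value, then sign-extend it.
        return [f"ldi:8 #{value & 0xFF}, {reg}", f"extsb {reg}"]
    return [f"ldi:32 #{value}, {reg}"]


def sign_extend_byte(byte):
    """Model of the sign extension: reinterpret 0..255 as -128..127."""
    return byte - 0x100 if byte & 0x80 else byte


print(split_load_constant(-5, "r4"))
print(sign_extend_byte(-5 & 0xFF))
```

The second print confirms that loading the byte 251 and sign-extending it reproduces the original value.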
define_expand
 On some target machines, some standard pattern names for RTL generation cannot be handled with a single insn, but a sequence of RTL insns can represent them.
 For these target machines, you can write a `define_expand' to specify how to generate the sequence of RTL.
 A `define_expand' is an RTL expression that looks almost like a `define_insn'; but, unlike the latter, a `define_expand' is used only for RTL generation and it can produce more than one RTL insn.
 The combiner pass cares only about reducing the number of instructions; it does not care about instruction lengths or speeds.
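The idea behind `define_expand' can be illustrated with a function that turns one named operation into a sequence of simpler insns. This sketch assumes a two-address machine like the CPU designed earlier in the deck (ADD Rn1,Rn2 computes Rn2 = Rn1 + Rn2, and constants must first be loaded with LDI); the function name, the tuple format and the temporary-register callback are hypothetical.

```python
import itertools


def expand_addsi3(dest, src1, src2, new_temp):
    """Return the insn list for dest = src1 + src2 on a two-address machine."""
    insns = []
    if isinstance(src2, int):          # constant operand: load it into a temp
        temp = new_temp()
        insns.append(("ldi", src2, temp))
        src2 = temp
    if src2 == dest:                   # addition commutes: keep dest as operand 1
        src1, src2 = src2, src1
    if src1 != dest:                   # two-address add needs src1 in dest first
        insns.append(("mov", src1, dest))
    insns.append(("add", src2, dest))
    return insns


counter = itertools.count()


def new_temp():
    """Hand out fresh pseudo-register names t0, t1, ..."""
    return f"t{next(counter)}"


print(expand_addsi3("r4", "r5", 100, new_temp))
print(expand_addsi3("r4", "r4", "r6", new_temp))
```

One source-level addition thus becomes one, two or three insns depending on its operands, which is what a single `define_insn' cannot express.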
define_insn
 Push and pop: movsi_push, movsi_popmove
 Move: movqi_unsigned_register_load, movqi_signed_register_load, *movqi_internal, movhi, movhi_unsigned_register_load, movhi_signed_register_load, *movhi_internal, movsi, movsi_internal, movdi, *movdi_insn, movsf, *movsf_internal, *movsf_constant_store
 Signed conversions from a smaller integer to a larger integer: extendqisi2, extendhisi2, zero_extendqisi2, zero_extendhisi2
 Addition: add_to_stack, addsi3, addsi_regs, addsi_small_int, addsi_big_int, *addsi_for_reload
 Subtraction: subsi3
 Multiplication: mulsidi3, umulsidi3, mulhisi3, umulhisi3, mulsi3
 Negation: negsi2
 Shifts: ashlsi3, ashrsi3, lshrsi3
define_insn (cont’d)
 Logical Operations: andsi3, iorsi3, xorsi3, one_cmplsi2
 Comparisons: cmpsi, *cmpsi_internal
 Branches: beq, bne, blt, ble, bgt, bge, bltu, bleu, bgtu, bgeu, *branch_true, *branch_false
 Calls & Jumps: call, call_value, jump, indirect_jump, tablejump
 Function Prologues and Epilogues: prologue, epilogue, return_from_func, leave_func, enter_func
 Miscellaneous: nop, blockage
define_insn “addsi_regs”
(define_insn "addsi_regs"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %2, %0"
)
; (set value x)   chapter 9.15 p110
;   value = x
; (plus:m x y)
;   x + y, carried out in machine mode m
define_insn “addsi_regs” (cont’d)
; (match_operand:m n predicate constraint)   chapter 10.4 p131
;   matches operand number n if the predicate accepts it
;   n counts from 0
;   each operand number n has only one match_operand expression
;   predicate is the name of a C function; it returns 0 on failure
;     general_operand: the operand is a constant, a register, or a memory reference
;     register_operand: the operand is a register
;     immediate_operand: the operand is immediate data
;   constraint: describes the kinds of operand that are permitted
;     r: register
;     m: any kind of memory operand
;     o: offsettable memory operand only
;     V: non-offsettable memory operand only
;     <: memory operand with autodecrement addressing
;     >: memory operand with autoincrement addressing
;     i: immediate integer operand
;     0~9: an operand that matches the specified operand number is allowed
Build a GCC Cross Compiler

[Figure: build flow. Binutils: configure, make, make install. GCC: write the machine description, configure, make, make install, producing the GCC compiler.]
Build a GCC Cross Assembler and Cross Linker
 Binutils: ver 2.14
 configure --target=fr30-elf --prefix=dir
 make
 make install
Build a GCC Cross Compiler
 GCC: ver 3.3.1
 ../configure --target=fr30-elf --prefix=dir --enable-languages=c
 make
 make install
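The order of the two builds matters, so it can help to write the whole sequence down in one place. The sketch below only assembles the command lines from the slides into an ordered plan; it does not run them. The function name is hypothetical, and the use of `../configure` for binutils assumes a separate build directory inside each source tree, as the GCC slide suggests.

```python
def cross_build_plan(target="fr30-elf", prefix="dir"):
    """Return the ordered (package, commands) steps for a cross toolchain."""
    common = [f"--target={target}", f"--prefix={prefix}"]
    binutils = [
        ["../configure"] + common,
        ["make"],
        ["make", "install"],
    ]
    gcc = [
        ["../configure"] + common + ["--enable-languages=c"],
        ["make"],
        ["make", "install"],
    ]
    # Binutils comes first: the GCC build needs the target assembler and linker.
    return [("binutils", binutils), ("gcc", gcc)]


for package, commands in cross_build_plan():
    print(package)
    for command in commands:
        print("   ", " ".join(command))
```

Using the same prefix for both packages lets the GCC build find the freshly installed fr30-elf assembler and linker.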
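A simple C program to test the cross compiler
int test(int i, int j, int k)
{
    int a;
    int b;
    a = 49999999;   /* constants too large for a short immediate */
    b = 39999999;
    a += k;
    b += j;
    a++;            /* the optimizer can fold these into the constants */
    b--;
    i += a + b;
    return i;
}

 fr30-elf-gcc -S -O2 t.c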


A simple C program to test the cross compiler (cont’d)
        .file   "t.c"
        .text
        .p2align 2
        .globl  test
        .type   test, @function
test:
        mov     r4, r2          ;00000000000000000000010101000010
        ldi:32  #50000000, r4   ;00000000000000000000010000000100
                                ;10111110101111000010000000
        ldi:32  #39999998, r1   ;00000000000000000000010000000001
                                ;10011000100101100111111110
        add     r6, r4          ;00000000000000000000000001100100
        add     r5, r1          ;00000000000000000000000001010001
        add     r1, r4          ;00000000000000000000000000010100
        add     r2, r4          ;00000000000000000000000000100100
        ret
        .size   test, .-test
        .ident  "GCC: (GNU) 3.3.1 (cygming special)"
Summary

 Studying RTL is more important than studying the MD.
 Build the cross assembler and cross linker before building the cross compiler.
 Little documentation is available on porting GCC as a cross compiler.
 Modifying an existing MD is easier than creating a new one.
 “The main goal of GCC was to make a good, fast compiler for machines in the class that the GNU system aims to run on: 32-bit machines that address 8-bit bytes and have several general registers.” -- Richard Stallman.
 For a GIEE student, designing a new CPU seems easier than building a cross compiler for it.
 http://gcc.gnu.org