
\ingroup module_hidden
\defgroup ansi-c ansi-c

Folder ansi-c

\author Kareem Khazem, Martin Brain

\section overview Overview

This folder contains the front-end for ANSI C, plus a variety of common extensions. It parses the file, performs some basic sanity checks (this is one area in which the UI could be improved; patches most welcome), and then produces a goto-program (see below). The parser is a traditional Flex/Bison system.

ansi_c_internal_additions.cpp contains the implementation of various ‘magic’ functions that allow control of the analysis from the source-code level. These include assertions, assumptions, atomic blocks, memory fences and rounding modes.
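As an illustration, here is a minimal sketch of a harness that uses some of these magic functions. It is written in the common subset of C and C++, so the same file can be saved as `harness.c` for the ansi-c front-end. `__CPROVER_assume`, `__CPROVER_assert`, `__CPROVER_atomic_begin` and `__CPROVER_atomic_end` are CPROVER built-ins. The `ANALYSE_WITH_CBMC` macro, the wrapper macros, and the functions `clamp` and `clamp_harness` are hypothetical names chosen for this example. The `#else` branch only exists so that the file also builds with an ordinary compiler. Under CBMC one might run something like `cbmc -D ANALYSE_WITH_CBMC harness.c --function clamp_harness`; this assumes cbmc's `-D` (define a macro) and `--function` (choose the entry point) options.

```cpp
/* Common subset of C and C++. ANALYSE_WITH_CBMC, the wrapper macros and both
   functions are hypothetical names used only for this example. */
#ifdef ANALYSE_WITH_CBMC
/* Under CBMC: map the wrappers onto the CPROVER 'magic' functions. */
#define ASSUME(c) __CPROVER_assume(c)
#define CHECK(c, msg) __CPROVER_assert((c), msg)
#define ATOMIC_BEGIN() __CPROVER_atomic_begin()
#define ATOMIC_END() __CPROVER_atomic_end()
#else
/* Under an ordinary compiler: plain C fallbacks so the file still builds. */
#include <assert.h>
#define ASSUME(c) do { if(!(c)) return 0; } while(0)
#define CHECK(c, msg) assert(c)
#define ATOMIC_BEGIN() ((void)0)
#define ATOMIC_END() ((void)0)
#endif

static int shared_counter = 0;

/* Clamp x into the interval [lo, hi]. */
int clamp(int x, int lo, int hi)
{
  if(x < lo)
    return lo;
  if(x > hi)
    return hi;
  return x;
}

/* Returns 1 when the checks ran; under the fallback, returns 0 when the
   assumption rules the inputs out. */
int clamp_harness(int x, int lo, int hi)
{
  ASSUME(lo <= hi); /* restrict the inputs the analysis considers */
  int r = clamp(x, lo, hi);
  CHECK(lo <= r && r <= hi, "clamp result lies within [lo, hi]");
  ATOMIC_BEGIN(); /* no thread interleaving inside this block */
  shared_counter++;
  ATOMIC_END();
  return 1;
}
```

Under CBMC, `__CPROVER_assume` prunes the paths where `lo > hi` instead of returning. The assertion is then checked for all remaining values of `x`, `lo` and `hi`, not just the ones a test happens to try.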

The library/ subdirectory contains models of parts of the C standard library, written in C using the CPROVER built-in functions. This allows the CPROVER tools to be ‘aware’ of this functionality and model it correctly. Examples include stdio.c, string.c, setjmp.c and various threading interfaces.

\section preprocessing Preprocessing & Parsing

In the \ref ansi-c directory

Key classes:

- \ref languaget and its subclasses
- ansi_c_parse_treet

\dot
digraph G {
  node [shape=box];
  rankdir="LR";
  1 [shape=none, label=""];
  2 [label="preprocessing & parsing"];
  3 [shape=none, label=""];
  1 -> 2 [label="Command line options, file names"];
  2 -> 3 [label="Parse tree"];
}
\enddot

\section type-checking Type-checking

In the \ref ansi-c directory

Key classes:

- \ref languaget and its subclasses
- \ref irept
- \ref irep_idt
- \ref symbolt
- symbol_tablet

\dot
digraph G {
  node [shape=box];
  rankdir="LR";
  1 [shape=none, label=""];
  2 [label="type checking"];
  3 [shape=none, label=""];
  1 -> 2 [label="Parse tree"];
  2 -> 3 [label="Symbol table"];
}
\enddot

This stage generates a symbol table, mapping identifiers to symbols; \ref symbolt "symbols" are tuples of (value, type, location, flags).

This is a good point to introduce the \ref irept ("internal representation") class, the base type of many of CBMC's hierarchical data structures. In particular, \ref exprt "expressions", \ref typet "types" and \ref codet "statements" are all subtypes of \ref irept. An irep is a tree of ireps. A subtlety is that, besides its id, an irep has two disjoint sets of children: \ref irept::get_sub() returns a list of children, and \ref irept::get_named_sub() returns an association from names to children. Most clients never use these functions directly, as subtypes of irept generally provide more descriptive functions. For example, the operands of an \ref exprt "expression" (\ref exprt::op0() "op0", op1, etc.) are really that expression's children, and the \ref code_assignt::lhs() "left-hand" and right-hand sides of an \ref code_assignt "assignment" are the children of that assignment. The \ref irept::pretty() function provides a descriptive string representation of an irep.
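To make the two kinds of children concrete, here is a heavily simplified, hypothetical `mini_irept`. It is not CBMC's class. Each node has an id, an ordered `sub` list of unnamed children, and a `named_sub` map. The sketch stores an expression's type and a symbol's identifier as named children, which is our understanding of how exprt encodes them. Treat that layout and every name below as illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for irept (not CBMC's API).
// Children are held through shared pointers so a subtree can be reused.
struct mini_irept
{
  using ptr = std::shared_ptr<const mini_irept>;

  std::string id;                       // e.g. "+", "symbol", "signedbv"
  std::vector<ptr> sub;                 // unnamed, ordered children (operands)
  std::map<std::string, ptr> named_sub; // children looked up by name

  // Operand access in the spirit of exprt::op0()/op1().
  const mini_irept &op(std::size_t i) const
  {
    assert(i < sub.size());
    return *sub[i];
  }

  // Indented dump, loosely modelled on irept::pretty().
  std::string pretty(std::size_t indent = 0) const
  {
    std::ostringstream out;
    out << id << '\n';
    const std::string pad(indent + 2, ' ');
    for(const auto &entry : named_sub)
      out << pad << "* " << entry.first << ": "
          << entry.second->pretty(indent + 4);
    for(std::size_t i = 0; i < sub.size(); ++i)
      out << pad << i << ": " << sub[i]->pretty(indent + 4);
    return out.str();
  }
};

// Create a leaf node with the given id.
mini_irept::ptr leaf(std::string id)
{
  auto n = std::make_shared<mini_irept>();
  n->id = std::move(id);
  return n;
}

// Build the expression `x + 1`; every node refers to one shared type node.
mini_irept::ptr make_x_plus_1()
{
  const mini_irept::ptr int_type = leaf("signedbv");

  auto x = std::make_shared<mini_irept>();
  x->id = "symbol";
  x->named_sub["identifier"] = leaf("x");
  x->named_sub["type"] = int_type;

  auto one = std::make_shared<mini_irept>();
  one->id = "constant";
  one->named_sub["value"] = leaf("1");
  one->named_sub["type"] = int_type;

  auto plus = std::make_shared<mini_irept>();
  plus->id = "+";
  plus->sub = {x, one}; // operands are the unnamed children
  plus->named_sub["type"] = int_type;
  return plus;
}
```

Printing `make_x_plus_1()->pretty()` lists each node's id, with named children marked by `*` and operands numbered from 0. That is the same distinction that get_named_sub() and get_sub() draw in the real class.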

\ref irep_idt "irep_idts" ("identifiers") are strings that use sharing to reduce memory consumption. A common pattern is a map from irep_idts to ireps. A goto-program contains a single symbol table (with a single scope), so the names of identifiers in the target program are lightly mangled to make them globally unique. If there is an identifier `foo` in the target program, the name field of `foo`'s \ref symbolt "symbol" in the goto-program will be:

- `foo` if it is global;
- `bar::foo` if it is a parameter to a function `bar()`;
- `bar::3::foo` if it is a local variable in a function `bar()`, where `3` is a counter that is incremented every time a newly-scoped `foo` is encountered in that function.
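The three naming rules can be sketched as a small, hypothetical `name_manglert` class, which is not part of CBMC. It follows the description above literally. It keeps one counter per (function, identifier) pair and starts that counter at 1. Both choices are assumptions, and the real front-end's scope numbering may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper that applies the three naming rules described above.
// Not CBMC code: the per-(function, name) counter starting at 1 is an assumption.
class name_manglert
{
public:
  // Global identifiers keep their name unchanged.
  std::string global(const std::string &name) const
  {
    return name;
  }

  // Parameters are prefixed with the function name.
  std::string parameter(const std::string &function, const std::string &name) const
  {
    return function + "::" + name;
  }

  // Each call models encountering a newly-scoped declaration of `name`
  // inside `function`, so the counter advances on every call.
  std::string local(const std::string &function, const std::string &name)
  {
    assert(!function.empty() && !name.empty());
    unsigned &counter = counters[function + "::" + name];
    ++counter;
    return function + "::" + std::to_string(counter) + "::" + name;
  }

private:
  std::map<std::string, unsigned> counters; // value-initialised to 0 on first use
};
```

Under this sketch, two sibling blocks in `bar()` that each declare `foo` receive `bar::1::foo` and `bar::2::foo`. Both can then coexist in the single, flat symbol table.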

The use of sharing to save memory is a pervasive design decision in the implementation of ireps and identifiers. Sharing makes equality comparisons fast (as there is no need to traverse entire trees), which is especially important given the large number of map lookups throughout the codebase. More importantly, sharing saves vast amounts of memory, as there is plenty of duplication within the goto-program data structures. For example, every statement, and every sub-expression of a statement, contains a \ref source_locationt that indicates the source file and location it came from, and every symbol in every expression has fields indicating its type and location. Although each of these is constructed as a separate object, the values they eventually point to are shared throughout the codebase, which decreases memory consumption dramatically.
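Two benefits of sharing are described above: cheap identifier comparison and cheap tree comparison. The hypothetical sketch below illustrates both. `string_tablet` interns strings, roughly the idea behind irep_idt. `equal` short-circuits when both sides are the same shared node. The sketch uses `std::shared_ptr` and is not CBMC's implementation. As far as we know, the real irept relies on its own reference counting with copy-on-write.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: each distinct string is stored once and
// represented by its index, so comparing two identifiers is an integer compare.
class string_tablet
{
public:
  std::size_t intern(const std::string &s)
  {
    const auto it = index.find(s);
    if(it != index.end())
      return it->second; // already stored: reuse the existing entry
    strings.push_back(s);
    index.emplace(s, strings.size() - 1);
    return strings.size() - 1;
  }

  const std::string &str(std::size_t no) const { return strings.at(no); }
  std::size_t size() const { return strings.size(); }

private:
  std::vector<std::string> strings;
  std::unordered_map<std::string, std::size_t> index;
};

// Hypothetical tree node whose children may be shared between parents.
struct nodet
{
  std::size_t id; // interned string number
  std::vector<std::shared_ptr<const nodet>> sub;
};

using node_ptr = std::shared_ptr<const nodet>;

// Structural equality with a pointer fast path: if both sides are the same
// shared object, the subtree is equal without being traversed.
bool equal(const node_ptr &a, const node_ptr &b)
{
  assert(a && b);
  if(a == b)
    return true;
  if(a->id != b->id || a->sub.size() != b->sub.size())
    return false;
  for(std::size_t i = 0; i < a->sub.size(); ++i)
    if(!equal(a->sub[i], b->sub[i]))
      return false;
  return true;
}
```

Interning `"foo"` twice yields the same number and stores the text once. Comparing a tree with itself, or with a tree that reuses its children, touches only the nodes that are not physically shared.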

The Type Checking stage turns a parse tree into a \ref symbol_tablet "symbol table". In this context, the 'symbols' consist of code statements as well as what might more traditionally be called symbols. Thus, for example:

- The declaration `int foo = 11;` is converted into a symbol whose type is a \ref signedbv_typet (the bit-vector type used for C's `int`) and whose value is the \ref constant_exprt "constant expression" 11. That symbol is stored in the symbol table using the mangled name of `foo` as the key.
- The function definition `void foo(){ int x = 11; bar(); }` is converted into a symbol whose type is a \ref code_typet (not to be confused with \ref typet or \ref codet!); the code_typet contains the parameter and return types of the function. The value of the symbol is the function's body (a \ref codet), and the symbol is stored in the symbol table with `foo` as the key.
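The two conversions above can be sketched with hypothetical `toy_typet`, `toy_symbolt` and `toy_symbol_tablet` types. None of these are CBMC classes. The sketch shows a map from mangled names to (value, type, location, flags) tuples, in which a function is just a symbol whose type records its signature. Types and values are plain structs and strings here, whereas CBMC uses typet, exprt and codet ireps. The 32-bit `int` and the source locations are assumptions. Both examples above are named `foo`, so the usage puts them in separate tables.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for typet: a signed bit-vector or a code (function)
// type that carries parameter and return types.
struct toy_typet
{
  std::string kind;              // "signedbv" or "code"
  unsigned width = 0;            // bit width, for "signedbv"
  std::vector<toy_typet> params; // parameter types, for "code"
  std::string return_type;       // return type name, for "code"
};

// Hypothetical stand-in for symbolt: the (value, type, location, flags) tuple.
struct toy_symbolt
{
  std::string name;                // mangled name, also the table key
  toy_typet type;
  std::string value;               // stands in for an exprt / codet
  std::string location;            // stands in for source_locationt
  bool is_static_lifetime = false; // one example of a flag
};

// Hypothetical stand-in for symbol_tablet: a map from names to symbols.
class toy_symbol_tablet
{
public:
  // Returns false, leaving the table unchanged, if the name is already taken.
  bool insert(const toy_symbolt &symbol)
  {
    assert(!symbol.name.empty());
    return symbols.emplace(symbol.name, symbol).second;
  }

  // Returns nullptr if no symbol has this name.
  const toy_symbolt *lookup(const std::string &name) const
  {
    const auto it = symbols.find(name);
    return it == symbols.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, toy_symbolt> symbols;
};

// `int foo = 11;` at file scope.
toy_symbolt make_global_foo()
{
  toy_symbolt s;
  s.name = "foo";
  s.type.kind = "signedbv";
  s.type.width = 32; // assumption: a target where int is 32 bits wide
  s.value = "11";
  s.location = "main.c:1"; // hypothetical location
  s.is_static_lifetime = true;
  return s;
}

// `void foo(){ int x = 11; bar(); }`: a code type with no parameters.
toy_symbolt make_function_foo()
{
  toy_symbolt s;
  s.name = "foo";
  s.type.kind = "code";
  s.type.return_type = "void";
  s.value = "{ int x = 11; bar(); }"; // the body, standing in for a codet
  s.location = "main.c:1";            // hypothetical location
  return s;
}
```

Keying the table by the mangled name is what lets lookups during type-checking, and later goto-program conversion, be plain map lookups.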

\section performance Parsing performance considerations

Measured on `trunk/regression/ansi-c/windows_h_VS_2012/main.i`:

- 13%: copying into `i_preprocessed`
- 5%: `ansi_c_parser.read()`
- 53%: `yyansi_clex()`
- 29%: parser (without type-checking)

\section references Compiler References

CodeWarrior C Compilers Reference 3.2:

http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/CCOMPILERRM.pdf

http://cache.freescale.com/files/soft_dev_tools/doc/ref_manual/ASMX86RM.pdf

ARM 4.1 Compiler Reference:

http://infocenter.arm.com/help/topic/com.arm.doc.dui0491c/DUI0491C_arm_compiler_reference.pdf
