\ingroup module_hidden
\page tutorial Tutorials
\section cbmc_tutorial CBMC Developer Tutorial
\tableofcontents
\author Kareem Khazem
This is an introduction to hacking on the cprover
codebase. It is not
intended as a user guide to CBMC
or related tools. It is structured
as a series of programming exercises that aim to acclimatise the reader
to the basic data structures and workflow needed for contributing to
CBMC
.
Clone the CBMC repository and build it. The build instructions are
written in a file called COMPILING.md in the top level of the repository. I
recommend that you build using CMake---this will place all of the CBMC tools in
a single directory, which you can add to your $PATH
. For example, if you built
the codebase with the following two commands at the top level of the repository:
cmake -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-B build -S .
cmake --build build
then all the CBMC binaries will be built into build/bin
, and you can run the
following commands to make CBMC invokable from your terminal.
# Assuming you cloned CBMC into ~/code
export PATH=~/code/cbmc/build/bin:$PATH
# Add to your shell's startup configuration file so that you don't have to run that command every time.
echo 'export PATH=~/code/cbmc/build/bin:$PATH' >> .bashrc
If you are using Make instead of CMake, the built binaries will be
scattered throughout the source tree. This tutorial uses the binaries
src/cbmc/cbmc
, src/goto-instrument/goto-instrument
, and
src/goto-cc/goto-gcc
, so you will need to add each of those
directories to your $PATH
, or symlink to those binaries from a
directory that is already in your $PATH
.
Ensure that graphviz is installed on your system (in
particular, you should be able to run a program called dot
). Install
Doxygen and generate doxygen documentation:
# In the src directory
doxygen doxyfile
# View the documentation in a web browser
firefox ~/code/cbmc/doc/html/index.html
If you've never used doxygen documentation before, get familiar with the
layout. Open the generated HTML page in a web browser; search for the
class goto_programt
in the search bar, and jump to the documentation
for that class; and read through the copious documentation.
CBMC's code is located under the src
directory. Even if you plan to
contribute only to CBMC, it is important to be familiar with several
other of cprover's auxiliary tools.
If you built using CMake on Unix, you should be able to run the
goto-gcc
command.
Find or write a moderately-interesting C program; we'll call it main.c
.
Run the following commands:
goto-gcc -o main.gb main.c
cc -o main.exe main.c
Invoke ./main.gb
and ./main.exe
and observe that they run identically.
The version that was compiled with goto-gcc
is larger, though:
du -hs *.{goto,exe}
Programs compiled with goto-gcc
are mostly identical to their clang
-
or gcc
-compiled counterparts, but contain additional object code in
cprover's intermediate representation. The intermediate representation
is (informally) called a goto-program.
goto-instrument
is a Swiss army knife for viewing goto-programs and
performing single program analyses on them. Run the following command:
goto-instrument --show-goto-functions main.gb
Many of the instructions in the goto-program intermediate representation
are similar to their C counterparts. if
and goto
statements replace
structured programming constructs.
Find or write a small C program (2 or 3 functions, each containing a few
varied statements). Compile it using goto-gcc
as above into an object
file called main
. You can write the diagram to a file and then view it:
goto-instrument --dot main.gb | tail -n +2 | dot -Tpng > main.png
open main.png
(the invocation of tail
is used to filter out the first line of
goto-instrument
output. If goto-instrument
writes more or less
debug output by the time you read this, read the output of
goto-instrument --dot main
and change the invocation of tail
accordingly.)
There are a few other views of goto-programs. Run goto-instrument -h
and try the various switches under the "Diagnosis" section.
In this section, you will learn about the basic goto-program data structures. Reading from and manipulating these data structures form the core of writing an analysis for CBMC.
The entry point of goto-instrument
is in goto_instrument_main.cpp
.
Follow the control flow into goto_instrument_parse_optionst::doit()
, located in goto_instrument_parse_options.cpp
.
At some point in that function, there will be a long sequence of if
statements.
$ goto-instrument --greet main.gb
hello, world!
$ goto-instrument --greet Leperina main
hello, Leperina!
You will also need to add the greet
option to the
goto_instrument_parse_options.h
file in order for this to work.
Notice that in the .h
file, options that take an argument are followed
by a colon (like (property):
), while simple switches have no colon.
Make sure that you return 0;
after printing the message.
The idea behind goto-instrument
is that it parses a goto-program and
then performs one single analysis on that goto-program, and then
returns. Each of the switches in doit
function of
goto_instrument_parse_options
does something different with the
goto-program that was supplied on the command line.
At this point in goto-instrument_parse_options
(where the if
statements are), the goto-program will have been loaded into the object
goto_functions
, of type goto_functionst
. This has a field called
function_map
, a map from function names to functions.
The following is quite difficult to follow from doxygen, but: the value
type of function_map
is goto_function_templatet<goto_programt>
.
Each \ref goto_programt object contains a list of
\ref goto_programt::instructiont called
instructions
. Each instruction has a field called code
, which has
type \ref codet.
index
* type: unsignedbv
* width: 8
* #c_type: char
0: symbol
* type: array
* size: nil
* type:
* #source_location:
* file: src/main.c
* line: 18
* function:
* working_directory: /some/dir
0: unsignedbv
* width: 8
* #c_type: char
...
The sub-nodes of a particular node in the pretty representation are
numbered, starting from 0. They can be accessed through the op0()
,
op1()
and op2()
methods in the exprt
class.
Every node in the pretty representation has an identifier, accessed
through the id()
function. The file util/irep_ids.def
lists the
possible values of these identifiers; have a quick scan through that
file. In the pretty representation above, the following facts are true
of that particular node:
node.id() == ID_index
node.type().id() == ID_unsignedbv
node.op0().id() == ID_symbol
node.op0().type().id() == ID_array
The fact that the op0()
child has a symbol
ID means that you could
cast it to a symbol_exprt
(which is a subtype of exprt
) using the
function to_symbol_expr
.