This document is designed to help AI coding assistants (Kiro, Claude Code, Copilot, etc.) effectively work with the CBMC codebase. It provides a comprehensive overview of the project structure, key concepts, and development practices.
- Project Overview
- Repository Structure
- Key Directories
- Architectural Concepts
- Central Data Structures
- Build System
- Testing Framework
- Coding Standards
- Documentation Practices
- Common Development Workflows
- Navigation Tips
- Important Links
CBMC (C Bounded Model Checker) is the main tool in the CProver suite for formal verification of C and C++ programs.
- Bounded model checking for C/C++ programs
- Supports C89, C99, most of C11, C17, C23
- Supports most compiler extensions from gcc and Visual Studio
- Verifies array bounds, pointer safety, exceptions, and user-specified assertions
- Performs verification by unwinding loops and passing equations to decision procedures
- Also includes JBMC for Java bytecode verification
- Main site: cprover.org
- Documentation: diffblue.github.io/cbmc
cbmc/
├── src/ # Main source code
├── jbmc/ # Java Bounded Model Checker
├── regression/ # Regression test suites
├── unit/ # Unit test suites
├── doc/ # Documentation
│ ├── architectural/ # Architecture documentation
│ ├── ADR/ # Architecture Decision Records
│ ├── cprover-manual/ # User manual
│ └── man/ # Man pages
├── scripts/ # Build and development scripts
├── cmake/ # CMake configuration
├── integration/ # Integration test examples
│ ├── linux/ # Linux integration examples
│ └── xen/ # Xen hypervisor examples
├── .github/ # GitHub configuration and workflows
│ ├── workflows/ # CI/CD workflow definitions
│ │ ├── build-and-test-Linux.yaml # Main Linux build and test
│ │ ├── pull-request-checks.yaml # PR validation checks
│ │ ├── coverage.yaml # Code coverage reporting
│ │ ├── syntax-checks.yaml # Code style and linting
│ │ ├── codeql-analysis.yml # Security analysis
│ │ ├── performance.yaml # Performance benchmarking
│ │ └── release-packages.yaml # Release automation
│ └── dependabot.yml # Dependency update automation
├── CODING_STANDARD.md # Coding conventions
├── COMPILING.md # Build instructions
├── TOOLS_OVERVIEW.md # Overview of all tools
└── README.md # Main readme
The source is organized into modular directories by functionality:
- util/ - Fundamental utilities and data structures
  - Base data structures like irept, exprt, typet
  - String handling, expression utilities
  - Foundation for everything else
- goto-programs/ - GOTO intermediate representation
  - Core IR data structures: goto_programt, goto_functiont, goto_modelt
  - The heart of CBMC's program representation
- linking/ - Linking GOTO programs together
  - Combines multiple GOTO programs
- big-int/ - Big integer arithmetic
  - Arbitrary precision integer operations
  - Used throughout CBMC for large numeric computations
- langapi/ - Language API interface
  - Abstract interface for language front-ends
- ansi-c/ - C language front-end
  - Parsing and type-checking for C
- cpp/ - C++ language front-end
  - C++ specific parsing (depends on ansi-c)
- goto-symex/ - Symbolic execution engine
  - Core symbolic execution implementation
  - Transforms GOTO programs into logical formulas
- analyses/ - Static analyses
  - Various static analysis passes
  - Abstract interpretation implementations
- pointer-analysis/ - Pointer analysis
  - Points-to analysis and pointer tracking
- goto-checker/ - Verification orchestration
  - Coordinates the verification process
- solvers/ - Decision procedures
  - SAT/SMT solver interfaces
  - Bit-blasting and encoding
- cbmc/ - Main CBMC tool
- goto-cc/ - Compiler wrapper
- goto-instrument/ - Program instrumentation
- goto-analyzer/ - Abstract interpretation tool
- goto-diff/ - Diff tool for GOTO programs
- goto-harness/ - Test harness generation
- goto-bmc/ - Bounded model checking
- memory-analyzer/ - Memory analysis with gdb
- symtab2gb/ - Symbol table to GOTO binary

- json/ - JSON handling
- xmllang/ - XML support
- assembler/ - Assembly support
Parallel structure to main CBMC for Java:
- jbmc/src/ - Java-specific source code
  - java_bytecode/ - Java bytecode front-end
    - Parsing and analysis of Java .class files
    - Java-specific type system and language features
    - JVM instruction handling
  - jbmc/ - Main JBMC tool executable
    - Entry point for Java verification
  - janalyzer/ - Java static analyzer
    - Abstract interpretation for Java
  - jdiff/ - Diff tool for Java programs
    - Comparison of Java GOTO programs
  - miniz/ - ZIP compression library
    - Used for reading JAR files and compressed class files
- jbmc/regression/ - Java regression tests
- jbmc/unit/ - Java unit tests
Extensive test suites organized by tool and feature:
- cbmc/ - Main CBMC tests
- goto-instrument/ - Instrumentation tests
- goto-analyzer/ - Analysis tests
- contracts/ - Contract tests
- cbmc-cpp/ - C++ specific tests
- smt2_solver/ - SMT solver tests
- Many more specialized test directories
See regression/README.md for test tags and categories.
Unit tests using the Catch framework:
- Organized by module, matching the src/ structure
- Tests for individual components and utilities
- Run with the unit executable
- architectural/ - Architecture documentation
  - background-concepts.md - Key concepts
  - cbmc-architecture.md - High-level architecture
  - central-data-structures.md - Core data structures
  - folder-walkthrough.md - Directory guide
  - goto-program-transformations.md - Instrumentation passes
- ADR/ - Architecture Decision Records
  - Documents key architectural decisions
  - Useful for understanding design rationale
- cprover-manual/ - User manual
- man/ - Man pages for tools
- Build helpers and utilities
- cpplint.py - Style checker
- test.pl - Regression test runner (in regression/)
- CI/CD related scripts
CBMC follows a compiler-like architecture with these stages:
Source Code → Preprocessing → Parsing → Type Checking
↓
Goto Conversion
↓
Goto Program (IR)
↓
Instrumentation/Transformations
↓
Symbolic Execution
↓
SAT/SMT Encoding
↓
Decision Procedure
↓
Counterexample/Trace
The GOTO program is CBMC's intermediate representation (IR):
- Language-agnostic representation of programs
- Similar to control flow graphs (CFGs)
- Can be saved to "goto binaries" (by goto-cc)
- Processed by all back-end tools
- Symbol Table - Maps identifiers to their definitions
- GOTO Functions - Collection of functions in IR form
- GOTO Instructions - Individual instructions with guards and types
- Symbolic Execution - Explores program paths symbolically
- Bounded Model Checking - Unwinds loops to finite depth
- Decision Procedures - SAT/SMT solvers that check satisfiability
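As a rough illustration of what "unwinding loops to finite depth" means, the sketch below writes out by hand a loop that has been unwound three times. The names `sum_to`, `sum_to_unwound_3`, `unwound_resultt` and the bound of 3 are hypothetical and chosen only for this example; CBMC performs the transformation internally rather than on source code like this.

```cpp
// Reference implementation: an ordinary loop with no fixed bound.
int sum_to(int n)
{
  int total = 0;
  for(int i = 0; i < n; i++)
    total += i;
  return total;
}

// Result of the unwound version: the value is only meaningful when the
// chosen bound was large enough to cover every iteration.
struct unwound_resultt
{
  bool bound_sufficient;
  int value;
};

// The same loop unwound to a depth of 3. Each nested `if` stands for one copy
// of the loop body. The final check plays the role of an unwinding assertion:
// if the loop condition could still hold, the bound was too small.
unwound_resultt sum_to_unwound_3(int n)
{
  int total = 0;
  int i = 0;
  if(i < n)
  {
    total += i;
    i++;
    if(i < n)
    {
      total += i;
      i++;
      if(i < n)
      {
        total += i;
        i++;
      }
    }
  }
  if(i < n)
    return {false, 0}; // more iterations would be needed
  return {true, total};
}
```

For inputs that need at most three iterations the two functions agree; for larger inputs the unwound version reports that its bound was insufficient instead of silently returning a partial sum.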
goto_modelt is the top-level data structure representing a complete program:
class goto_modelt {
  symbol_tablet symbol_table; // All symbols (variables, functions, types)
  goto_functionst goto_functions; // All functions in GOTO form
};
goto_functionst is a map from function names to function definitions:
// Conceptually: map<identifier, goto_functiont>
goto_functiont represents a single function:
class goto_functiont {
  goto_programt body; // Function body (instruction sequence)
  std::vector<irep_idt> parameter_identifiers; // Parameter names
};
goto_programt is a sequence of GOTO instructions forming a function body:
class goto_programt {
  std::list<instructiont> instructions; // Ordered list of instructions
};
See src/goto-programs/goto_program.h for details.
instructiont (nested inside goto_programt) is a single instruction in the GOTO program:
class instructiont { // i.e. goto_programt::instructiont
  goto_program_instruction_typet type; // Instruction type (ASSIGN, GOTO, etc.)
  codet code; // The actual code/statement
  exprt guard; // Boolean condition (optional)
  source_locationt source_location; // Original source location
  // ... and other fields
};
Instruction types include:
- ASSIGN - Assignment
- FUNCTION_CALL - Function call
- SET_RETURN_VALUE - Return statement (named RETURN in older versions)
- GOTO - Conditional/unconditional jump
- ASSUME - Assumption (path constraint)
- ASSERT - Assertion to verify
- SKIP - No-op
- And more...
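To make the shape of an instruction sequence concrete, here is a minimal self-contained sketch. Every `toy_` name is hypothetical and heavily simplified (strings stand in for code and guards); the real definitions are in src/goto-programs/goto_program.h and carry many more fields.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Simplified stand-ins for illustration only.
enum class toy_instruction_typet
{
  ASSIGN,
  GOTO,
  ASSUME,
  ASSERT,
  SKIP
};

struct toy_instructiont
{
  toy_instruction_typet type;
  std::string code;  // textual placeholder for the statement
  std::string guard; // textual placeholder for the condition
};

struct toy_programt
{
  std::list<toy_instructiont> instructions;
};

// Counts the instructions of a given type, e.g. the number of properties
// (ASSERT instructions) a function body contains.
std::size_t count_instructions(
  const toy_programt &program,
  toy_instruction_typet type)
{
  std::size_t count = 0;
  for(const auto &instruction : program.instructions)
  {
    if(instruction.type == type)
      ++count;
  }
  return count;
}

// Builds a three-instruction body loosely corresponding to:
//   x = 1; assume(x > 0); assert(x == 1);
toy_programt make_example_program()
{
  toy_programt program;
  auto &body = program.instructions;
  body.push_back({toy_instruction_typet::ASSIGN, "x = 1", ""});
  body.push_back({toy_instruction_typet::ASSUME, "", "x > 0"});
  body.push_back({toy_instruction_typet::ASSERT, "", "x == 1"});
  return program;
}
```

Walking the list and switching on the instruction type, as `count_instructions` does, is the basic pattern this sketch is meant to show.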
symbolt represents a symbol (variable, function, type):
class symbolt {
  irep_idt name; // Unique identifier
  typet type; // Type of symbol
  exprt value; // Initial value (if applicable)
  source_locationt location;
  // ... other metadata
};
irept is the base data structure for most CBMC types:
class irept {
  // Tree structure with:
  // - An ID (string)
  // - Named sub-trees (map)
  // - Ordered sub-trees (vector)
};
Key classes built on irept:
- exprt - Expressions
- typet - Types
- codet - Code/statements
- source_locationt - Source locations
Important: Use the specific subclass methods rather than raw irept access.
- exprt - Represents expressions (operators, literals, variables, etc.)
- typet - Represents types (int, pointer, array, struct, etc.)
Both inherit from irept and provide type-safe accessors.
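The idea of a generic tree with typed accessors layered on top can be sketched as follows. The types `toy_irept` and `toy_plus_exprt` and the helpers `make_symbol` and `symbol_identifier` are hypothetical simplifications written for this guide, not the real classes or their API.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// A toy tree node in the spirit of irept: an ID, named sub-trees, and
// ordered sub-trees.
struct toy_irept
{
  std::string id;
  std::map<std::string, toy_irept> named_sub;
  std::vector<toy_irept> sub;
};

// A typed wrapper in the spirit of the exprt subclasses: callers use named
// accessors instead of indexing into the raw tree.
struct toy_plus_exprt : public toy_irept
{
  toy_plus_exprt(toy_irept lhs, toy_irept rhs)
  {
    id = "+";
    sub.push_back(std::move(lhs));
    sub.push_back(std::move(rhs));
  }

  const toy_irept &lhs() const
  {
    return sub[0];
  }

  const toy_irept &rhs() const
  {
    return sub[1];
  }
};

// Creates a node with ID "symbol" and its name stored as a named sub-tree.
toy_irept make_symbol(const std::string &identifier)
{
  toy_irept result;
  result.id = "symbol";
  result.named_sub["identifier"].id = identifier;
  return result;
}

// Reads the name back; returns an empty string if the node has none.
std::string symbol_identifier(const toy_irept &symbol)
{
  const auto it = symbol.named_sub.find("identifier");
  return it == symbol.named_sub.end() ? std::string() : it->second.id;
}
```

Code that calls `lhs()` and `rhs()` keeps working if the underlying layout changes, which is the motivation for preferring subclass accessors over raw tree access.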
Key dependency relationships:
- Tools (cbmc, goto-cc, etc.) → goto-instrument → goto-symex
- goto-symex → solvers, pointer-analysis
- Languages (cpp, ansi-c) → langapi → goto-programs
- Almost everything → util
- util → big-int
See doc/architectural/folder-walkthrough.md for the full dependency graph.
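One way to reason about these relationships is as a directed graph with a reachability query. The sketch below hard-codes only edges listed in this guide; it is not generated from the repository, and the names `dependency_grapht`, `listed_dependencies` and `depends_on` are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using dependency_grapht = std::map<std::string, std::vector<std::string>>;

// A hand-copied subset of the edges listed in this guide.
dependency_grapht listed_dependencies()
{
  return {
    {"cbmc", {"goto-instrument"}},
    {"goto-instrument", {"goto-symex"}},
    {"goto-symex", {"solvers", "pointer-analysis"}},
    {"ansi-c", {"langapi"}},
    {"cpp", {"langapi"}},
    {"langapi", {"goto-programs"}},
    {"goto-programs", {"util"}},
    {"util", {"big-int"}}};
}

// Returns true if `to` is reachable from `from` via one or more edges.
bool depends_on(
  const dependency_grapht &graph,
  const std::string &from,
  const std::string &to)
{
  std::set<std::string> visited;
  std::vector<std::string> worklist{from};
  while(!worklist.empty())
  {
    const std::string current = worklist.back();
    worklist.pop_back();
    if(!visited.insert(current).second)
      continue; // already explored
    const auto it = graph.find(current);
    if(it == graph.end())
      continue; // no outgoing edges recorded
    for(const auto &next : it->second)
    {
      if(next == to)
        return true;
      worklist.push_back(next);
    }
  }
  return false;
}
```

A query such as `depends_on(listed_dependencies(), "util", "cbmc")` returning false reflects the rule that low-level modules should not reach back up into tools.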
CBMC requires bison, flex, a C and C++ compiler, and make or CMake. To
speed up rebuilds, install ccache.
CBMC uses CMake 3.8+ as the primary build system.
# 1. Update submodules
git submodule update --init
# 2. Generate build files
cmake -S . -Bbuild
# 3. Build
cmake --build build --parallel $(nproc)
# 4. Run tests
ctest --test-dir build -V -L CORE
# Use specific compiler
cmake -S . -Bbuild -DCMAKE_CXX_COMPILER=clang++
# Build with different SAT solver
cmake -S . -Bbuild -Dsat_impl=cadical
# Debug build
cmake -S . -Bbuild -DCMAKE_BUILD_TYPE=Debug
# Release build
cmake -S . -Bbuild -DCMAKE_BUILD_TYPE=Release
After building, executables are in:
- build/bin/ - Main executables (cbmc, goto-cc, etc.)
- build/lib/ - Libraries
Traditional makefiles are also available:
cd src
make minisat2-download
make -j$(nproc) # Parallel build
Configuration in src/config.inc (SAT solver paths, etc.).
CBMC can use various SAT/SMT solvers:
- MiniSat (default)
- CaDiCaL
- Glucose
- Z3
- And others
CMake automatically downloads MiniSat during configuration.
See COMPILING.md for detailed build instructions for all platforms.
Location: regression/
# Run all regression tests
make -C regression test
# Run specific test directory
cd regression/cbmc
make test
# Using CMake
ctest --test-dir build -V -L CORE -j$(nproc)
Each test directory contains:
- Test cases (.c, .cpp, .java files)
- Test descriptor (test.desc) with flags (see regression/test.pl --help for format details)
Important tags (see regression/README.md):
- smt-backend - Requires SMT backend
- broken-smt-backend - Known issues with SMT
- thorough-smt-backend - Too slow for CI
- Similar tags for specific solvers
Location: unit/
# Build and run all unit tests
make -C unit test
# Using CMake
cmake --build build --target unit
cd unit && ../build/bin/unit
# Run specific test suite
cd unit && ../build/bin/unit "[solvers]" # Run only solver tests
- Uses Catch2 testing framework
- Tests organized by module
- Each test is a TEST_CASE or SCENARIO
Regression Test Example:
# In regression/cbmc/my-test/
main.c # Test input
test.desc # Test configuration
test.desc format:
CORE
main.c
--bounds-check --unwind 5
^VERIFICATION SUCCESSFUL$
Unit Test Example:
TEST_CASE("My feature test", "[my-module]")
{
  // Setup
  // Test
  REQUIRE(result == expected);
}
CBMC follows strict coding standards documented in CODING_STANDARD.md.
Enforced by clang-format - Run before committing!
Key rules:
- 2 spaces for indentation (no tabs)
- 80 character line limit
- Matching { } in same column (except initializers/lambdas)
- Spaces around binary operators (=, +, ==)
- Space after comma and colon (in for loops)
- */& attached to variable name: int *ptr;
- No trailing whitespace
- Newline at end of file
// Correct
if(condition)
{
  do_something();
}
else
{
  do_other();
}
// Single-line blocks (allowed)
if(condition)
  do_something();
// For loops
for(int i = 0; i < n; i++)
{
  // body
}
- No /* */ style comments (use // instead)
- Every file must start with an author comment and a \file Doxygen tag
/// \file
/// Brief description of this file's purpose
- Document all classes, functions, and non-obvious members
/// Brief description ending with period. Longer description can follow.
/// \param arg: Description of parameter
/// \param [out] result: Output parameter description
/// \param [in,out] state: In-out parameter description
/// \return Description of return value
int my_function(int arg, int &result, state_t &state);
- Methods longer than 50 lines should be broken into smaller functions
- Use blank lines to separate logical blocks
- Prefer clear variable names over comments
- Type safety: Use proper const-correctness
- Error handling: Use descriptive INVARIANT messages
- Consider impact on external users
- Public interfaces should be stable
- Document deprecations clearly
- Interfaces = anything used outside a single directory
- Const-correctness - Mark const what should be const
- Type safety - Avoid casts when possible
- Error handling - Use INVARIANT/PRECONDITION/POSTCONDITION
- Documentation - Prioritize readability
- Testing - Include regression tests for changes
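The style of stating preconditions and invariants explicitly, with a descriptive message, can be sketched as below. The `TOY_PRECONDITION` and `TOY_INVARIANT` macros and the helper `toy_check` are local stand-ins defined only for this example; they merely echo the naming of CBMC's macros and may behave differently from them. `largest_element` is likewise a hypothetical function.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Local stand-in: reports a violated condition by throwing.
inline void toy_check(
  bool condition,
  const std::string &kind,
  const std::string &reason)
{
  if(!condition)
    throw std::logic_error(kind + ": " + reason);
}

// Hypothetical macros for this sketch only.
#define TOY_PRECONDITION(cond) toy_check((cond), "precondition", #cond)
#define TOY_INVARIANT(cond, reason) toy_check((cond), "invariant", (reason))

// Returns the largest element; the caller must pass a non-empty vector.
int largest_element(const std::vector<int> &values)
{
  TOY_PRECONDITION(!values.empty());
  int largest = values.front();
  for(const int value : values)
  {
    if(value > largest)
      largest = value;
  }
  TOY_INVARIANT(
    largest >= values.front(), "maximum is at least the first element");
  return largest;
}
```

The point of the pattern is that a violated assumption fails loudly at the place where it was assumed, with a message that says what was expected.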
CBMC uses Doxygen for API documentation.
cd src
doxygen
# Output in doc/html/
Follow LLVM guidelines with extensions:
/// This is the brief description (first sentence).
///
/// More detailed explanation can follow in subsequent paragraphs.
/// Feel free to break into multiple paragraphs for readability.
///
/// \param param1: Short description
/// \param param2: Longer description that needs multiple lines.
/// Additional lines indented by two spaces for clarity.
/// \param [out] output: This parameter is modified by the function
/// \return Description of return value
- File documentation - Every .cpp and .h file
- Class documentation - Purpose and usage
- Function documentation - Parameters, return values, behavior
- Complex algorithms - Explain the approach
- Non-obvious code - Clarify intent
Read before coding:
- CODING_STANDARD.md - Style and conventions
- COMPILING.md - Build instructions
- TOOLS_OVERVIEW.md - Tool descriptions
- doc/architectural/ - Architecture deep-dives
- doc/ADR/ - Design decisions
- Doxygen output - API reference
# 1. Create a feature branch from develop
git checkout develop
git checkout -b feature/my-improvement
# 2. Make changes following coding standards
# Edit files in src/
# 3. Format code
# (clang-format is enforced in CI)
# 4. Build
cmake --build build
# 5. Run relevant tests
ctest --test-dir build -V -L CORE -R <relevant-module>
cd unit && ../build/bin/unit "[relevant-module]"
# 6. Commit with clear message
git commit -m "Add feature X to improve Y (explains WHAT)" -m "Doing X is important, because ... (explain WHY and possibly HOW)"
# 7. Push and create PR targeting develop
git push origin feature/my-improvement

# 1. Implement the feature in appropriate src/ directory
# 2. Add unit tests in unit/
# 3. Add regression tests in regression/
# 4. Update documentation if needed
# 5. Ensure all tests pass
# 6. Create PR with detailed description

# 1. Create regression test that reproduces the bug
# 2. Confirm test fails with current code
# 3. Fix the bug
# 4. Confirm test now passes
# 5. Ensure no other tests break
# 6. Create PR referencing the issue

# Generate GOTO binary from C code
goto-cc -o program.gb program.c
# View GOTO program
goto-instrument --show-goto-functions program.gb
# Instrument GOTO program
goto-instrument --bounds-check program.gb instrumented.gb
# Verify with CBMC
cbmc instrumented.gb

# Single test
cd regression/cbmc
../test.pl -C -p -c ../../../src/cbmc/cbmc my-test
# All tests in a category
cd regression/cbmc
make test
By Functionality:
- Parsing C code → src/ansi-c/
- Symbolic execution → src/goto-symex/
- SAT/SMT solvers → src/solvers/
- Main CBMC tool → src/cbmc/
- Instrumentation → src/goto-instrument/
By Data Structure:
- GOTO programs → src/goto-programs/goto_program.h
- Expressions → src/util/std_expr.h, src/util/expr.h
- Types → src/util/type.h, src/util/std_types.h
- Symbols → src/util/symbol.h
- Symbol table → src/util/symbol_table.h
By Concept:
- Loop unwinding → src/goto-symex/ and src/goto-instrument/
- Pointer analysis → src/pointer-analysis/
- Static analysis → src/analyses/
- Abstract interpretation → src/analyses/
# Find where a class is defined
rg "class goto_programt" src/
# Find usages of a function
rg "goto_convert\(" src/
# Find test cases for a feature
rg "bounds.check" regression/ -l
# Find documentation
rg -F '\page' doc/ -A 5 # -F: match the literal string, not a regex
Each source directory has a module_dependencies.txt file:
# Check what a module depends on
cat src/goto-symex/module_dependencies.txt
To understand how data flows through CBMC:
- Start - Source code input
- Frontend - src/ansi-c/ or src/cpp/ parses to AST
- Type Checking - Language-specific type checking
- Symbol Table - Symbols populated in symbol_tablet
- GOTO Conversion - AST → goto_programt
- Goto Model - Complete goto_modelt created
- Instrumentation - goto-instrument transforms
- Symbolic Execution - goto-symex explores paths
- Solver - SAT/SMT solver in src/solvers/
- Result - Verification result or counterexample
Start with:
- src/cbmc/cbmc_parse_options.cpp - Entry point for CBMC
- src/util/irep.h - Core data structure of all intermediate representations
- Headers in src/goto-programs/ - Core IR structures
- doc/architectural/ - Architecture documentation
Understand patterns:
- Most data structures inherit from irept
- Use id2string() to convert irep_idt to std::string
- Expression/type casting with expr_cast.h
- Visitors for traversing expressions/instructions
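The visitor pattern mentioned in the list above can be illustrated with a small pre-order traversal over a toy expression tree. The names `toy_exprt`, `visit_pre` and `count_nodes_with_id` are hypothetical and exist only in this sketch; consult the headers in src/util/ for the traversal facilities the codebase actually provides.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A toy expression node: an ID plus ordered operands.
struct toy_exprt
{
  std::string id;
  std::vector<toy_exprt> operands;
};

// Pre-order traversal: calls `visitor` on the node, then on each operand.
void visit_pre(
  const toy_exprt &expr,
  const std::function<void(const toy_exprt &)> &visitor)
{
  visitor(expr);
  for(const auto &operand : expr.operands)
    visit_pre(operand, visitor);
}

// Counts the nodes in the tree whose ID equals `id`.
std::size_t count_nodes_with_id(const toy_exprt &expr, const std::string &id)
{
  std::size_t count = 0;
  visit_pre(expr, [&](const toy_exprt &node) {
    if(node.id == id)
      ++count;
  });
  return count;
}
```

Separating the traversal (`visit_pre`) from the action (the lambda) lets the same walk serve many analyses, which is the main appeal of the pattern.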
- CProver Documentation - Complete developer docs
- CBMC Architecture - High-level overview
- Background Concepts - Key concepts
- Developer Tutorial - Getting started
- CODING_STANDARD.md - Coding conventions
- COMPILING.md - Build instructions
- TOOLS_OVERVIEW.md - All tools explained
- FEATURE_IDEAS.md - Mini-projects for contributors
- README.md - Main readme
- doc/architectural/folder-walkthrough.md - Directory structure
- doc/architectural/central-data-structures.md - Core data structures
- doc/architectural/goto-program-transformations.md - Instrumentation passes
- doc/architectural/compilation-and-development.md - Development guide
- doc/ADR/ - Design decisions
- doc/ADR/cpp_api_modularisation.md - API design
- doc/ADR/symex_ready_goto.md - Symex design
- Main Website - User documentation
- GitHub Repository - Public repository
- CProver Manual - User manual
| Task | Command |
|---|---|
| Build everything | cmake --build build |
| Build CBMC only | cmake --build build --target cbmc |
| Run all tests | ctest --test-dir build -V -L CORE -j$(nproc) |
| Run unit tests | cd unit && ../build/bin/unit |
| Format code | clang-format -i <file> |
| Generate docs | cd src && doxygen |
| Create GOTO binary | goto-cc -o out.gb input.c |
| View GOTO program | goto-instrument --show-goto-functions prog.gb |
| Run CBMC | cbmc program.gb or cbmc program.c |
- src/goto-programs/goto_program.h - Core IR structures
- src/util/irep.h - Base data structure
- src/util/expr.h - Expressions
- src/util/type.h - Types
- src/util/symbol.h - Symbols
- src/goto-symex/goto_symext.h - Symbolic execution
- src/solvers/ - SAT/SMT interfaces
cbmc → goto-instrument → goto-symex → {solvers, pointer-analysis}
goto-cc → cpp → ansi-c → langapi → goto-programs
goto-programs → util → big-int
- Check existing patterns in similar code
- Follow the module structure (don't cross module boundaries unnecessarily)
- Add tests (unit and regression)
- Update documentation
- Follow coding standards (formatting, comments)
- Check if there's a regression test that demonstrates the issue
- Use goto-instrument --show-goto-functions to inspect IR
- Look for similar fixed bugs in git history
- Check module dependencies if getting linker errors
- Understand data flow first (see Navigation Tips)
- Check impact on public interfaces
- Run full test suite
- Update documentation
- Don't break 80-char line limit (enforced by CI)
- Don't use /* */ comments
- Don't forget Doxygen comments on public interfaces
- Don't skip regression tests
- Don't modify irept directly; use subclass accessors
- Don't forget to update submodules after pulling
- CBMC outputs verification results and traces
- Traces show path through program leading to property violation
- Check source_locationt in instructions for source mapping
Last Updated: 2026-01-19
This guide is maintained to help AI coding assistants work effectively with CBMC. For questions or updates, refer to the main documentation or ask the development team.