This library can be seen as an extension of modarith, originally a Python script developed by Michael Scott (TII) for general finite field arithmetic using portable C or Rust code. Our library, named m4-modarith, generates C code with embedded assembly tailored for the ARM Cortex-M4.
Its main script takes as input the target prime and desired reduction method. It generates efficient C code, optimized for the target prime.
The library can be used in two main ways:
- As a standalone script for generating efficient code for a specific prime and reduction method tailored for the ARM Cortex-M4 architecture.
  - Usage: python m4generator.py <prime name or expression> <reduction>
    - prime name: a short name for the prime of interest, using valid C identifier characters; e.g. C.25519. Alternatively, an integer or expression can be used directly.
    - Valid values for reduction: pmp_asm, pmp_c, mont
  - Description of reduction types and their parameters:
    - pmp_asm: reduction modulo pseudo-Mersenne primes, i.e. those of the form $2^n - c$.
    - pmp_c: same as pmp_asm, except that reduction is implemented mostly in C, with inline assembly blocks only as needed.
    - mont: reduction using Montgomery multiplication.
  - For our inlining policy, please refer to inline_policy.md.
  - Refer to modifying.md#adding-a-new-prime-number for a reference of other possible options for the script.
- As a benchmark suite of generated code for different primes and reduction methods, as well as a testing platform for each generated function.
  - Refer to modifying.md for instructions for adding new primes, unit tests, and benchmarks to the project.
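As a rough illustration of the idea behind the pmp_asm and pmp_c reduction types, the following Python sketch reduces a non-negative integer modulo $p = 2^n - c$ by repeatedly folding the high bits into the low bits. It models only the arithmetic on Python integers, not the limb-based C or assembly that the generator emits. The function name pmp_reduce and its parameters are hypothetical and are not part of the library.

```python
def pmp_reduce(x, n, c):
    """Reduce a non-negative integer x modulo p = 2**n - c.

    Illustrative only: assumes 0 < c < 2**(n - 1), so that a single
    final subtraction is enough.
    """
    p = (1 << n) - c
    mask = (1 << n) - 1
    # Since 2**n == c (mod p), hi * 2**n + lo == hi * c + lo (mod p).
    # Each fold strictly decreases x while any bits above position n remain.
    while x >> n:
        x = (x >> n) * c + (x & mask)
    # Now 0 <= x < 2**n = p + c, so at most one subtraction is needed.
    if x >= p:
        x -= p
    return x
```

For example, with n = 255 and c = 19, pmp_reduce(a * b, 255, 19) agrees with (a * b) % (2**255 - 19) for any non-negative integers a and b.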
This library was built and tested on NUCLEO-L4R5ZI and STM32F407G-DISC1 evaluation boards, as well as under QEMU emulation. Any other setup should be considered experimental, and no guarantees are made about its build process.
- arm-none-eabi-gcc (clang toolchain currently not supported).
  - For Mac users, please use homebrew's package gcc-arm-embedded (and not the arm-none-eabi-gcc package).
- stlink to flash binaries into the development board. Some distributions have packages for this.
- Standard development tools: CMake, make, git, ninja. Although not strictly necessary, clang-format is used by modarith; if it is not present, error messages are printed during the build. These messages do not interrupt the build, they merely clutter the output.
- python and python3 (both commands must be present on the development machine).
- addchain; binaries for your system may be downloaded from https://github.com/mmcloughlin/addchain/releases, and must be copied to a location accessible from your system PATH.
- Suggested VS Code extensions: CMake Tools, Cortex-Debug, C++ TestMate. Use of the Remote - SSH extension is also supported.
- modarith itself needs some additional dependencies. The main ones are listed here; please check its GitHub repository for any others.
For a complete outline of the build process, including code generation, compilation, and testing steps, please refer to build.md.
As a quick guide for configuring and building, these commands should fully build the project:
# From the main project directory
rm -rf build
mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc-libopencm3.toolchain -DCMAKE_BUILD_TYPE=Release -G Ninja ..
ninja
# From the main project directory
qemu-system-arm -M olimex-stm32-h405 -semihosting -nographic -serial null -serial stdio -monitor null -kernel build/run_tests.elf
The hardware abstraction layer developed for this project employs the Cortex-M4's Single Wire Output (SWO) feature to redirect standard output. This avoids the need to modify the board or connect a USB-serial converter in boards such as the STM32F407G-DISC1. To read from SWO, it is necessary to use tools from either OpenOCD or stlink, with the latter being the simplest option. To clarify, reading from a serial port is NOT supported, as standard output is not redirected to any of the MCU's serial ports.
The following commands can be used to flash the board and monitor standard output using SWO for stlink versions >= 1.8.0, assuming the use of the NUCLEO-L4R5ZI evaluation board:
# From the main project directory
st-flash --reset write build/bench.bin 0x08000000
st-trace -c20m
If using the STM32F407G-DISC1 board, the clock frequency passed to st-trace must be modified, again when using stlink versions >= 1.8.0:
st-trace -c24m
Older versions of stlink implicitly assumed that clock frequencies passed to the st-trace tool were given in MHz, without the "m" suffix. With those versions, the suffix should be removed from the argument, as illustrated next for the STM32F407G-DISC1 evaluation board:
st-trace -c24
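The version-dependent form of the clock argument can be summarised in a small helper. This is only a sketch of the rule described above: the function name st_trace_clock_arg is hypothetical, and the stlink version is assumed to be supplied by the caller as a (major, minor, patch) tuple rather than queried from the tool.

```python
def st_trace_clock_arg(stlink_version, clock_mhz):
    """Build the -c argument for st-trace.

    stlink_version: (major, minor, patch) tuple, supplied by the caller.
    clock_mhz: core clock in MHz (20 for NUCLEO-L4R5ZI and 24 for
               STM32F407G-DISC1 in the commands above).
    """
    # stlink >= 1.8.0 expects an explicit "m" suffix; older versions
    # assume the value is already in MHz and take no suffix.
    suffix = "m" if tuple(stlink_version) >= (1, 8, 0) else ""
    return f"-c{clock_mhz}{suffix}"
```

A wrapper script could then invoke st-trace with the returned string as its only argument.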
The scripts directory in the root of the project includes some shell scripts that serve two main purposes:
- Automating the development in a Visual Studio Code environment. It is possible to perform a series of tasks directly within the environment, such as flashing binaries to the board, monitoring the standard output of the board to get test or benchmark results, automated running of tests (under QEMU) using the Testing pane of Visual Studio Code, and debugging the project (whether under QEMU or on the hardware itself, using the ST-Link programmer present on ST evaluation boards).
- Supporting a development workflow where boards are not connected locally to the development computer, but instead to a remote PC running a Unix-like operating system (tested on Linux) which is accessible via ssh. The same tasks mentioned in item 1 can be transparently run via ssh.
To allow the use of these scripts, some information must be edited in the file scripts/select_host_board.sh: the remote host to be accessed, in the REMOTE_HOST variable, and the serial number of the corresponding evaluation board (STM32F407G-DISC1, denoted as STM32F4DISCOVERY in the script, or NUCLEO-L4R5ZI).
It is recommended that a hostname is configured in ~/.ssh/config for the remote workflow, so as to ensure that passwords are not prompted when connecting. If not using a remote development workflow, a valid string must nevertheless be inserted to prevent syntax errors in the script; however, the script detects that a board is connected to the local machine, and does not perform any remote connections in that case.
To obtain the serial number for the connected board(s), run st-info --probe, and look for a 24-digit hexadecimal string output by the tool. It should be copied and pasted into the corresponding line (according to the development board in use) starting with STLINK_SERIAL=.
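If the serial number needs to be picked out automatically, a sketch along the following lines may help. It simply searches text for tokens of exactly 24 hexadecimal digits, as described above. The function name find_stlink_serials is hypothetical, and the sample text is made up for illustration; it is not a transcript of real st-info output.

```python
import re


def find_stlink_serials(probe_output):
    """Return every standalone 24-digit hexadecimal token in the text."""
    # \b on both sides rejects longer hex runs and partial matches.
    return re.findall(r"\b[0-9A-Fa-f]{24}\b", probe_output)


# Made-up sample text (an assumption, not actual tool output).
SAMPLE_OUTPUT = "serial: 0123456789ABCDEF01234567\nflash: 1048576\n"
SAMPLE_SERIALS = find_stlink_serials(SAMPLE_OUTPUT)
```

In practice, the text passed to the function would be the captured output of st-info --probe, for instance obtained with subprocess.run(["st-info", "--probe"], capture_output=True, text=True).stdout.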
Although these scripts are primarily intended for use in a Visual Studio Code environment, they can be used directly from a terminal, or set up to be used in a different development environment. After editing scripts/select_host_board.sh, the following steps can be used to flash and monitor the board with the aid of these scripts:
- From the scripts directory, run ./upload.sh ../build/bench (resp. ./upload.sh ../build/run_tests) to flash the benchmarking (resp. test) binaries to the board.
- From the scripts directory, run ./monitor.sh to display the standard output from the board.
Occasionally, the board may fail to flash using the default tool (OpenOCD) used by the scripts/upload.sh script. We have found that retrying while switching between different tools (OpenOCD and stlink) a couple of times may remedy this problem, without requiring a physical disconnection of the board, which may be inconvenient or even impossible in a remote workflow. To use the stlink tools instead of the default OpenOCD tools for flashing, edit scripts/upload.sh and replace USE_OPENOCD=1 with USE_OPENOCD=0.
For a more complete overview, please refer to project_structure.md. The top-level directories are:
- build: Contains CMake-generated build artifacts (object files, binaries).
- cmake: Hosts CMake modules for code generation, toolchain setup, and dependency management.
- hw: Hardware-specific code (e.g., STM32 HAL, linker scripts).
- libopencm3: Submodule for the libopencm3 firmware library (STM32 support).
- m4-bench: Generated benchmarks for M4 implementations of primes.
- m4-codegen: Generated M4 code for primes.
- m4-custom: Custom M4 assembly code for specific primes.
- m4-external: Benchmark code for external implementations.
- m4-tests: Generated unit tests for M4 implementations.
- mini-gmp: Minimal GMP implementation for testing.
- modarith: Python scripts for original modarith.
- modarith-bench: Generated benchmarks for modarith implementations of primes.
- modarith-codegen: Generated modarith code for primes.
- scripts: Python scripts for M4 code generation, testing, and debugging.
- src: Necessary support C test and utility code.
- unity: Submodule for the Unity unit testing framework.
We follow the same API as modarith. From its README:
Assume the modulus is
nres() -- Convert a big number to internal format
redc() -- Convert back from internal format, result
modfsb() -- Perform final subtraction to reduce from
modadd() -- Modular addition, result
modsub() -- Modular subtraction, result
modneg() -- Modular negation, result
modmul() -- Modular multiplication, result
modsqr() -- Modular squaring, result
modmli() -- Modular multiplication by a small integer, result
modcpy() -- Copy a big number
modpro() -- Calculate progenitor, for subsequent use for modular inverses and square roots
modinv() -- Modular inversion
modsqrt() -- Modular square root
modis1() -- Test for equal to unity
modis0() -- Test for equal to zero
modone() -- Set equal to unity
modzer() -- Set equal to zero
modint() -- Convert an integer to internal format
modqr() -- Test for quadratic residue
modcmv() -- Conditional constant time move
modcsw() -- Conditional constant time swap
modshl() -- shift left by bits
modshr() -- shift right by bits
mod2r() -- set to 2^r
modexp() -- export from internal format to byte array
modimp() -- import to internal format from byte array
modsign() -- Extract sign (parity bit)
modcmp() -- Test for equality
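To give a feel for what "internal format" can mean when Montgomery reduction is selected, the following Python sketch models nres(), redc(), and modmul() on plain integers. It is a toy model under stated assumptions, not the generated code. The modulus and the radix R = 2^256 are chosen only for illustration, and the Python functions take and return integers, whereas the generated C functions operate on limb arrays with their own signatures. Results here are always fully reduced, which the generated code does not necessarily guarantee. Requires Python 3.8 or later for the modular inverse via pow().

```python
P = 2**255 - 19                    # example modulus (assumption)
R_BITS = 256                       # assumed radix exponent, R = 2**R_BITS > P
R = 1 << R_BITS
N_PRIME = (-pow(P, -1, R)) % R     # -P^(-1) mod R


def mont_reduce(t):
    """Return t * R^(-1) mod P for 0 <= t < P * R."""
    m = ((t & (R - 1)) * N_PRIME) & (R - 1)
    u = (t + m * P) >> R_BITS      # exact: t + m*P is a multiple of R
    return u - P if u >= P else u  # u < 2P, so one subtraction suffices


def nres(x):
    """Convert x to Montgomery (internal) form: x * R mod P."""
    return mont_reduce((x % P) * (R * R % P))


def redc(x):
    """Convert back from Montgomery form to an ordinary residue."""
    return mont_reduce(x)


def modmul(a, b):
    """Multiply two values that are both in Montgomery form."""
    return mont_reduce(a * b)
```

With these definitions, redc(modmul(nres(a), nres(b))) equals (a * b) % P, which mirrors the usual pattern of converting inputs once, computing in internal format, and converting the result back at the end.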