Crypto-TII/m4-modarith

M4-MODARITH

This library can be seen as an extension of modarith, a Python script originally developed by Michael Scott (TII) that generates portable C or Rust code for general finite field arithmetic. Our library, named m4-modarith, generates C code with embedded assembly tailored for the ARM Cortex-M4.

Its main script takes as input the target prime and desired reduction method. It generates efficient C code, optimized for the target prime.

Main usage

The library can be used in two main ways:

  • As a standalone script for generating efficient code for a specific prime and reduction method tailored for the ARM Cortex-M4 architecture

    • Usage: python m4generator.py <prime name or expression> <reduction>
    • prime name: a short name for the prime of interest, using valid C identifier characters; e.g. C.25519. Alternatively, an integer or expression can be used directly.
    • Valid values for reduction: pmp_asm, pmp_c, mont
      • Description of reduction types and their parameters:
      • pmp_asm: reduction modulo pseudo-Mersenne primes, i.e. those of the form $2^n - c$.
      • pmp_c: same as pmp_asm, except that reduction is implemented mostly in C, with inline assembly blocks only as needed.
      • mont: reduction using Montgomery multiplication.
    • For our inlining policy, please refer to inline_policy.md.
    • Refer to modifying.md#adding-a-new-prime-number for a reference of the other options accepted by the script.
  • As a benchmark suite of generated code for several primes and reduction methods, as well as a testing platform for each generated function.

    • Refer to modifying.md for instructions for adding new primes, unit tests, and benchmarks to the project.
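
To illustrate the idea behind the pmp_asm and pmp_c reduction types, the following is a minimal Python sketch of reduction modulo a pseudo-Mersenne prime $2^n - c$. It is not the code emitted by m4generator.py; the function name pmp_reduce and its parameters are hypothetical and chosen only for this example. It relies on the identity $2^n \equiv c \pmod{p}$ to fold the high bits of a value back into the low bits.

```python
# Illustrative sketch only; NOT the code generated by m4generator.py.
# The name pmp_reduce and its signature are assumptions made for this example.

def pmp_reduce(x: int, n: int, c: int) -> int:
    """Reduce a non-negative x modulo p = 2**n - c, assuming 0 < c < 2**(n-1)."""
    p = (1 << n) - c
    mask = (1 << n) - 1
    # Write x = hi * 2**n + lo. Since 2**n is congruent to c modulo p,
    # x is congruent to hi * c + lo, which is strictly smaller than x.
    while x >> n:
        x = (x >> n) * c + (x & mask)
    # Now x < 2**n = p + c, so a single conditional subtraction suffices.
    return x - p if x >= p else x


# Usage: multiply two field elements modulo 2**255 - 19.
a = (1 << 255) - 20
b = (1 << 254) + 12345
product = pmp_reduce(a * b, 255, 19)
```

Because c is small for primes of this form, each folding step costs one multiplication by a short constant, which is what makes this reduction cheaper than a generic one.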

Overall building process

This library was built and tested on NUCLEO-L4R5ZI and STM32F407G-DISC1 evaluation boards, as well as using QEMU for emulation. Any other usage should be considered experimental, and no guarantees are made that it will compile.

Dependencies

  • arm-none-eabi-gcc (clang toolchain currently not supported).
    • For Mac users, please use homebrew's package gcc-arm-embedded (and not the arm-none-eabi-gcc package)
  • stlink to flash binaries into the development board. Some distributions have packages for this.
  • Standard development tools: CMake, make, git, ninja. Although not strictly necessary, clang-format is used by modarith; if it is not present, error messages are printed during the build. These messages do not interrupt the build; they merely clutter the output.
  • python and python3 (both commands must be present on the development machine).
  • addchain; binaries for your system may be downloaded from https://github.com/mmcloughlin/addchain/releases, and must be copied to a location accessible from your system PATH.
  • Suggested VS Code extensions: CMake Tools, Cortex-Debug, C++ TestMate. Use of the Remote - SSH extension is also supported.
  • For modarith, some additional dependencies are needed. The list above covers the main ones, but please check its GitHub repository for any others.

Configure and build

For a complete outline of the build process, including code generation, compilation, and testing steps, please refer to build.md.

As a quick guide for configuring and building, these commands should fully build the project:

# From the main project directory
rm -rf build
mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc-libopencm3.toolchain -DCMAKE_BUILD_TYPE=Release -G Ninja ..
ninja

Emulate with QEMU (mostly to run unit tests)

# From the main project directory
qemu-system-arm -M olimex-stm32-h405 -semihosting -nographic -serial null -serial stdio -monitor null -kernel build/run_tests.elf

Flash to board and monitor via SWO

The hardware abstraction layer developed for this project employs the Cortex-M4's Single Wire Output (SWO) feature to redirect standard output. This avoids the need to modify the board or connect a USB-serial converter in boards such as the STM32F407G-DISC1. To read from SWO, it is necessary to use tools from either OpenOCD or stlink, with the latter being the simplest option. To clarify, reading from a serial port is NOT supported, as standard output is not redirected to any of the MCU's serial ports.

The following commands can be used to flash the board and monitor standard output using SWO for stlink versions >= 1.8.0, assuming the use of the NUCLEO-L4R5ZI evaluation board:

# From the main project directory
st-flash --reset write build/bench.bin 0x08000000
st-trace -c20m

If using the STM32F407G-DISC1 board, the clock frequency passed to st-trace must be modified, again when using stlink versions >= 1.8.0:

st-trace -c24m

Older versions of stlink implicitly assumed that clock frequencies were given to the st-trace tool in MHz, without the "m" suffix. With those versions, the suffix should be removed from the argument, as illustrated next for the STM32F407G-DISC1 evaluation board:

st-trace -c24
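
The version-dependent form of the clock argument described above can be summarized in a small Python helper. This is only a sketch: the function st_trace_clock_arg is hypothetical and not part of this project's scripts, and the version is assumed to be supplied by the caller as a tuple.

```python
# Hypothetical helper, not part of the project's scripts directory.

def st_trace_clock_arg(stlink_version: tuple, mhz: int) -> str:
    """Build the -c argument for st-trace.

    stlink >= 1.8.0 expects an explicit "m" suffix for MHz; older
    versions assume MHz and take the bare number.
    """
    suffix = "m" if tuple(stlink_version) >= (1, 8, 0) else ""
    return f"-c{mhz}{suffix}"


# Clock values taken from the commands above: 20 MHz for the
# NUCLEO-L4R5ZI and 24 MHz for the STM32F407G-DISC1.
nucleo_arg = st_trace_clock_arg((1, 8, 0), 20)
disco_old_arg = st_trace_clock_arg((1, 7, 0), 24)
```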

Helper scripts

The scripts directory in the root of the project includes some shell scripts that serve two main purposes:

  1. Automating the development in a Visual Studio Code environment. It is possible to perform a series of tasks directly within the environment, such as flashing binaries to the board, monitoring the standard output of the board to get test or benchmark results, automated running of tests (under QEMU) using the Testing pane of Visual Studio Code, and debugging the project (whether under QEMU or on the hardware itself, using the ST-Link programmer present on ST evaluation boards).

  2. Supporting a development workflow where boards are not connected locally to the development computer, but instead to a remote PC running a Unix-like operating system (tested on Linux) which is accessible via ssh. The same tasks mentioned in item 1 can be transparently run via ssh.

To allow the use of these scripts, some information must be edited in the file scripts/select_host_board.sh: the remote host to be accessed, in the REMOTE_HOST variable, and the serial number of the corresponding evaluation board (STM32F407G-DISC1, denoted as STM32F4DISCOVERY in the script, or NUCLEO-L4R5ZI).

It is recommended that a hostname is configured in ~/.ssh/config for the remote workflow, so as to ensure that passwords are not prompted when connecting. If not using a remote development workflow, a valid string must nevertheless be inserted to prevent syntax errors in the script; however, the script detects that a board is connected to the local machine, and does not perform any remote connections in that case.

To obtain the serial number for the connected board(s), run st-info --probe, and look for a 24-digit hexadecimal string output by the tool. It should be copied and pasted into the corresponding line (according to the development board in use) starting with STLINK_SERIAL=.
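
When several boards are attached, picking the serial numbers out of the tool's output by eye can be error-prone. The following Python sketch extracts every 24-digit hexadecimal string from a block of text. The function name find_stlink_serials is hypothetical, and the sample text is made up for the example; it is not real st-info output.

```python
import re

# Matches a standalone run of exactly 24 hexadecimal digits.
SERIAL_RE = re.compile(r"\b[0-9A-Fa-f]{24}\b")


def find_stlink_serials(probe_output: str) -> list:
    """Return all 24-digit hexadecimal strings found in the given text."""
    return SERIAL_RE.findall(probe_output)


# Made-up sample text for illustration only (not real tool output).
sample = "serial: 0123456789ABCDEF01234567\nother: 12345\n"
serials = find_stlink_serials(sample)
```

In practice the text would come from running st-info --probe, for example through subprocess.run with capture_output=True.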

Although these scripts are primarily intended for use in a Visual Studio Code environment, they can be used directly from a terminal, or set up to be used in a different development environment. After editing scripts/select_host_board.sh, the following steps can be used to flash and monitor the board with the aid of these scripts:

  1. From the scripts directory, run ./upload.sh ../build/bench (resp. ./upload.sh ../build/run_tests) to flash the benchmarking (resp. test) binaries to the board.

  2. From the scripts directory, run ./monitor.sh to display the standard output from the board.

Occasionally, the board may fail to flash using the default tool (OpenOCD) used by the scripts/upload.sh script. We have found that retrying while switching between different tools (OpenOCD and stlink) a couple of times may remedy this problem, without requiring a physical disconnection of the board, which may be inconvenient or even impossible in a remote workflow. To use the stlink tools instead of the default OpenOCD tools for flashing, edit scripts/upload.sh and replace USE_OPENOCD=1 with USE_OPENOCD=0.


Library Structure Overview

For a more complete overview, please refer to project_structure.md. The top-level directories are:

  • build: Contains CMake-generated build artifacts (object files, binaries).
  • cmake: Hosts CMake modules for code generation, toolchain setup, and dependency management.
  • hw: Hardware-specific code (e.g., STM32 HAL, linker scripts).
  • libopencm3: Submodule for the libopencm3 firmware library (STM32 support).
  • m4-bench: Generated benchmarks for M4 implementations of primes.
  • m4-codegen: Generated M4 code for primes.
  • m4-custom: Custom M4 assembly code for specific primes.
  • m4-external: Benchmark code for external implementations.
  • m4-tests: Generated unit tests for M4 implementations.
  • mini-gmp: Minimal GMP implementation for testing.
  • modarith: Python scripts for original modarith.
  • modarith-bench: Generated benchmarks for modarith implementations of primes.
  • modarith-codegen: Generated modarith code for primes.
  • scripts: Python scripts of M4 for code generation, testing, and debugging.
  • src: Necessary support C test and utility code.
  • unity: Submodule for the Unity unit testing framework.

Library API

We follow the same API as modarith. From its README:

Assume the modulus is $p$. The provided functions are

nres() -- Convert a big number to internal format
redc() -- Convert back from internal format, result $\lt p$
modfsb() -- Perform final subtraction to reduce from $\lt 2p$ to $\lt p$
modadd() -- Modular addition, result $\lt 2p$
modsub() -- Modular subtraction, result $\lt 2p$
modneg() -- Modular negation, result $\lt 2p$
modmul() -- Modular multiplication, result $\lt 2p$
modsqr() -- Modular squaring, result $\lt 2p$
modmli() -- Modular multiplication by a small integer, result $\lt 2p$
modcpy() -- Copy a big number
modpro() -- Calculate progenitor, for subsequent use for modular inverses and square roots
modinv() -- Modular inversion
modsqrt() -- Modular square root
modis1() -- Test for equal to unity
modis0() -- Test for equal to zero
modone() -- Set equal to unity
modzer() -- Set equal to zero
modint() -- Convert an integer to internal format
modqr() -- Test for quadratic residue
modcmv() -- Conditional constant time move
modcsw() -- Conditional constant time swap
modshl() -- shift left by bits
modshr() -- shift right by bits
mod2r() -- set to 2^r
modexp() -- export from internal format to byte array
modimp() -- import to internal format from byte array
modsign() -- Extract sign (parity bit)
modcmp() -- Test for equality
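
To show how nres(), modmul(), redc() and modfsb() relate to one another, here is a minimal Python model of these operations for the Montgomery (mont) case. It is a sketch under stated assumptions, not the generated C API: the Python signatures, the modulus P, the radix R (chosen with R > 4P so that products of values below 2P stay below 2P), and the helper _mont_reduce are all illustrative choices. The generated C functions operate on limb arrays and have different signatures.

```python
# Python model of a few modarith-style operations in Montgomery form.
# Assumptions: P, R, and these signatures are illustrative only and are
# not taken from the generated C code.

P = 2**255 - 19                      # example modulus
R = 1 << 264                         # power of two with R > 4 * P
R2 = (R * R) % P                     # R^2 mod P, used to enter Montgomery form
NEG_P_INV = (-pow(P, -1, R)) % R     # -P^{-1} mod R (requires Python 3.8+)


def _mont_reduce(t: int) -> int:
    """Return a value congruent to t * R^{-1} mod P (not fully reduced)."""
    m = ((t % R) * NEG_P_INV) % R
    return (t + m * P) // R          # exact: t + m * P is a multiple of R


def modfsb(a: int) -> int:
    """Final subtraction: map a value below 2P into the range [0, P)."""
    return a - P if a >= P else a


def modmul(a: int, b: int) -> int:
    """Multiply internal-format values; result < 2P when a, b < 2P."""
    return _mont_reduce(a * b)


def nres(x: int) -> int:
    """Convert x to internal format (congruent to x * R mod P)."""
    return modmul(x, R2)


def redc(a: int) -> int:
    """Convert back from internal format; result < P."""
    return modfsb(_mont_reduce(a))


# Usage: multiply two numbers modulo P via the internal format.
x, y = 123456789, 2**200 + 7
z = redc(modmul(nres(x), nres(y)))
```

The model mirrors the lazy-reduction convention in the list above: intermediate results are only guaranteed to be below $2p$, and the final subtraction happens when converting back with redc().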


About

Script-generated finite field arithmetic for elliptic curve cryptography targeting ARM Cortex-M4
