Commit eb752e1

Add an elf2bin utility. (#328)
This tool reads ELF image files, paying attention only to the 'segment' view of the file presented by the program header table, and ignoring the section header table completely (including not even noticing whether one exists). It extracts the contents of the loadable segments of the file, and outputs them in various simple forms: plain binary files, two well-known hex file formats (Intel Hex and Motorola S-records), and the much simpler 'Verilog hex' format which is just a binary file with each byte translated into a two-digit hex number. Additional features include the ability to restrict which segments are extracted; the choice of whether to output one binary file per segment or a single one that combines all segments at their correct relative offsets with padding in between; and the ability to distribute the output bytes across multiple output files suitable for loading into a banked ROM setup in which (for example) each of four ROMs holds only the bytes living at addresses with a particular residue mod 4. In other words, this tool provides essentially all the same functionality that Arm Compiler's `fromelf` did, in its binary and hex output modes.
1 parent aeb9a29 commit eb752e1

12 files changed, +3304 -1 lines changed

arm-software/docs/arm-toolchain-for-embedded-features.md

Lines changed: 6 additions & 1 deletion
@@ -17,4 +17,9 @@ There are no experimental features implemented.
 
 # Features
 
-There are no additional features implemented.
+In addition to the LLVM tools, Arm Toolchain for Embedded provides a
+utility `elf2bin`. This extracts the contents of the loadable segments
+from an ELF executable file, and outputs them in various forms suitable
+for loading into embedded targets, such as Intel Hex, Motorola
+S-records, or raw binary files. The documentation for `elf2bin` can be
+found in `elf2bin.md`.

arm-software/docs/elf2bin.md

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
# `elf2bin`: convert ELF images to binary or hex via their segment view

`elf2bin` is a tool for converting ELF images to binary or hex
formats.

It does a similar job to `llvm-objcopy` with binary or hex output, but
unlike `llvm-objcopy`, it exclusively uses the ELF program header
table, ignoring the sections. So it can cope with ELF images which
have no section header table at all, or images whose sections don't
exactly match up to the segments, e.g. have a gap in a segment which
no section covers. Also, it supports a wider range of binary and hex
output options.
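
To make the segment view concrete, here is a minimal Python sketch. Python is used because the project's test suite is Python; `elf2bin` itself is C++. The sketch assumes a 32-bit little-endian ELF image, as typically produced for Arm embedded targets. It reads only the ELF header and the program header table and keeps the `PT_LOAD` entries, so it never touches the section header table. The names `Segment` and `load_segments` are hypothetical and do not describe `elf2bin`'s internals.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1  # p_type value for a loadable segment


class Segment(NamedTuple):
    paddr: int   # p_paddr: physical address
    vaddr: int   # p_vaddr: virtual address
    data: bytes  # the p_filesz bytes stored in the file
    memsz: int   # p_memsz: size in memory, including any zero-initialized tail


def load_segments(image: bytes) -> List[Segment]:
    """Return the PT_LOAD segments of a 32-bit little-endian ELF image.

    Only the ELF header and the program header table are read; the
    section header table is never looked at, so it may be absent."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_CLASS] == ELFCLASS32 and e_ident[EI_DATA] == ELFDATA2LSB
    if image[4] != 1 or image[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<I", image, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    segments = []
    for i in range(e_phnum):
        # Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr,
        #             p_filesz, p_memsz, p_flags, p_align
        (p_type, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, _flags, _align) = struct.unpack_from(
            "<8I", image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append(Segment(p_paddr, p_vaddr,
                                    image[p_offset:p_offset + p_filesz], p_memsz))
    return segments
```

In this sketch, choosing `seg.paddr` or `seg.vaddr` as a segment's address mirrors the `--physical`/`--virtual` choice described under Other options. `len(seg.data)` is the segment's `p_filesz`.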

The feature set of `elf2bin` is similar to the feature set of the
`fromelf` tool shipped as part of the proprietary Arm Compiler 6
toolchain, although the detailed syntax is different. Users migrating
from that toolchain should find that `elf2bin` will support similar
use cases.

(However, `elf2bin` is focused on binary and hex output, and does not
support the other modes of `fromelf`, such as converting one ELF file
to another, or generating diagnostic dumps and disassembly. For that
functionality, use LLVM supporting tools such as `llvm-objcopy`,
`llvm-objdump`, `llvm-nm` and `llvm-size`.)

## Using `elf2bin`

The general format of an `elf2bin` command involves:

* An output mode option, telling `elf2bin` what kind of binary or hex
  output it's generating.

* One or more input file names, which must be ELF images or dynamic
  libraries.

* Either a single output file name, or a pattern that tells `elf2bin`
  how to name multiple output files.

* Optionally, other options to adjust behavior.

### Output modes

This section lists the available options to set the type of output file.

#### `--ihex`: Intel hex format

The Intel hex format is a record-based format. Each data record states
the address it is expected to be loaded at. So a single output file
can specify two or more segments at widely separated addresses without
having to include all the space in between.

Each line of an Intel hex file begins with a `:`. (This makes it easy
to tell apart from the Motorola format which starts lines with `S`.)

The Intel hex format allows addresses to be specified in the form of a
32-bit linear address, or in an 8086-style segment:offset pair. Like
`fromelf` (and unlike GNU `objcopy`), `elf2bin` always uses the linear
address option, so that its hex files are as easy as possible to
interpret.

There is no version of the Intel hex format that supports 64-bit
addresses. `elf2bin` will give an error if a 64-bit input file
specifies data to be loaded at an address that does not fit in 32
bits.
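
Here is a minimal Python sketch of the linear-address form described above. It assumes segments arrive as hypothetical `(address, data)` pairs. It emits a type 04 (extended linear address) record whenever the upper 16 address bits change. It emits type 00 data records of at most `reclen` bytes, which is the role `--datareclen` plays, and finishes with a type 01 end-of-file record. `to_ihex` and `ihex_record` are illustrative names. The sketch writes no start-address record, and `elf2bin`'s actual record boundaries may differ.

```python
from typing import Iterable, List, Tuple


def ihex_record(rectype: int, offset: int, data: bytes) -> str:
    """One Intel hex record: ':' + count, 16-bit offset, type, data, checksum."""
    body = bytes([len(data), (offset >> 8) & 0xFF, offset & 0xFF, rectype]) + data
    checksum = -sum(body) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_ihex(segments: Iterable[Tuple[int, bytes]], reclen: int = 16) -> List[str]:
    """Encode (address, data) pairs using 32-bit linear addressing only."""
    if not 1 <= reclen <= 255:  # the count field is a single byte
        raise ValueError("record length must be 1..255")
    lines = []
    upper = None  # upper 16 address bits most recently announced
    for base, data in segments:
        if base + len(data) > 1 << 32:
            raise ValueError("data does not fit in 32-bit addresses")
        pos = 0
        while pos < len(data):
            addr = base + pos
            # End each record at a 64 KiB boundary so its 16-bit offset can't wrap.
            n = min(reclen, len(data) - pos, 0x10000 - (addr & 0xFFFF))
            if addr >> 16 != upper:
                upper = addr >> 16
                # Type 04: extended linear address (upper 16 bits, big-endian)
                lines.append(ihex_record(4, 0, upper.to_bytes(2, "big")))
            lines.append(ihex_record(0, addr & 0xFFFF, data[pos:pos + n]))
            pos += n
    lines.append(ihex_record(1, 0, b""))  # type 01: end of file
    return lines
```

For a 4-byte segment at 0x8000 holding bytes 00 to 03, `to_ihex` produces `:020000040000FA`, `:048000000001020376` and `:00000001FF`.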

#### `--srec`: Motorola hex format

The Motorola hex format is similar in concept to the Intel one: each
data record specifies an address and some data to load at that address.

Each line of a Motorola hex file begins with an `S`. (This makes it
easy to tell apart from the Intel format which starts lines with `:`.)

In the Motorola format, there are multiple record types which store
addresses in 16-bit, 24-bit or 32-bit format. `elf2bin` keeps its
output as simple and consistent as possible, by always using the
32-bit record types (`S3` and `S7`).
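
The `S3`/`S7` layout can be sketched the same way in Python. Each record's count byte covers the four address bytes, the data and the checksum. The checksum is the ones' complement of the low byte of the sum of the count, address and data bytes. The function names are hypothetical. The entry address written into the `S7` record is assumed to be supplied by the caller, and no `S0` header record is written, so `elf2bin` may emit records that differ in those details.

```python
from typing import Iterable, List, Tuple


def srec_record(rectype: str, address: int, data: bytes = b"") -> str:
    """One S3 or S7 record: type, count, 32-bit address, data, checksum."""
    # The count byte covers the 4 address bytes, the data and the checksum.
    body = bytes([4 + len(data) + 1]) + address.to_bytes(4, "big") + data
    checksum = ~sum(body) & 0xFF  # ones' complement of the low byte of the sum
    return rectype + (body + bytes([checksum])).hex().upper()


def to_srec(segments: Iterable[Tuple[int, bytes]], entry: int = 0,
            reclen: int = 16) -> List[str]:
    """Encode (address, data) pairs as S3 data records plus one S7 record."""
    if not 1 <= reclen <= 250:  # 4 + 250 + 1 = 255, the largest count byte
        raise ValueError("record length must be 1..250")
    lines = []
    for base, data in segments:
        for pos in range(0, len(data), reclen):
            lines.append(srec_record("S3", base + pos, data[pos:pos + reclen]))
    lines.append(srec_record("S7", entry))  # termination record
    return lines
```

With bytes 01 to 04 at 0x8000 and entry address 0, this gives `S30900008000010203046C` followed by `S70500000000FA`.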

#### `--bin`: one binary file per segment

The `--bin` option writes each loadable segment into a raw binary
file, containing the bytes of data in the segment and nothing else.

If there is more than one loadable segment, then you must use `-O` to
specify a pattern for the output file names, instead of `-o` to
specify a single output file name.

#### `--bincombined`: one single binary file

The `--bincombined` mode writes out a _single_ binary file, which
contains all the loadable segments in the image, with padding between
them to put them at the correct relative offsets from each other.

The resulting file is suitable for loading at the base address of the
first segment in memory.

(You can adjust the base address further downwards with `--base`,
which adds padding before the first segment.)
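
The combined layout can be sketched in Python as follows, assuming segments are given as hypothetical `(address, data)` pairs and that gaps are filled with zero bytes, as in the `--base` example later in this document. The optional `base` argument plays the role of `--base`: any gap between it and the lowest segment becomes leading zero padding. Rejecting a base above the lowest segment is this sketch's own choice, not documented `elf2bin` behavior.

```python
from typing import Iterable, Optional, Tuple


def combine(segments: Iterable[Tuple[int, bytes]], base: Optional[int] = None) -> bytes:
    """Lay (address, data) segments out in one zero-padded image.

    The image starts at `base`, or at the lowest segment address if no
    base is given, so offset 0 in the result corresponds to that address."""
    segs = sorted(segments)
    if not segs:
        return b""
    start = segs[0][0] if base is None else base
    if start > segs[0][0]:
        raise ValueError("base is above the lowest segment")  # this sketch's choice
    end = max(addr + len(data) for addr, data in segs)
    image = bytearray(end - start)  # zero-filled, so gaps become padding
    for addr, data in segs:
        image[addr - start:addr - start + len(data)] = data
    return bytes(image)
```

Suppose the input is one byte at 0x8000 and `base=0x6000`. The result then starts with 0x2000 zero bytes, matching the `--base 0x6000` example given for that option.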

#### `--vhx` and `--vhxcombined`: Verilog hex format

The Verilog hex format is a translation of a binary file into hex, by
turning each binary byte into a two-digit hex number on a line by
itself.

So, unlike the Intel and Motorola hex formats, there is no data inside
the file that specifies the address to load data at.

`--vhx` behaves similarly to `--bin`: it outputs one hex file per
loadable segment. `--vhxcombined` behaves similarly to
`--bincombined`: it outputs a single hex file containing all the
segments, with padding between them if necessary.
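
Here is a minimal Python sketch of the translation itself. The function name `to_vhx` is hypothetical. Using upper-case hex digits is an assumption; Verilog's `$readmemh` accepts either case.

```python
def to_vhx(data: bytes) -> str:
    """Verilog hex: each byte as a two-digit hex number on a line of its own."""
    return "".join(f"{byte:02X}\n" for byte in data)
```

Applied to one segment's bytes, this corresponds to `--vhx`. Applied to a combined, padded image, it corresponds to `--vhxcombined`.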

### Output file naming

If `elf2bin` is writing a single output file, you can use the `-o` (or
`--output-file`) option to tell it the name of the file, e.g.

```
elf2bin --srec -o output.hex input.elf
```

But in many situations `elf2bin` will produce multiple output files:

* because you gave it multiple input files
* because you used `--bin` or `--vhx` on a file with multiple segments
* because you used `--banks` to split binary output into interleaved bank files
* more than one of the above

In that case, using `-o` will produce an error, because `elf2bin` will
notice that you've asked it to write more than one output file to the
same location. Instead, you must use the `-O` or `--output-pattern`
option to provide a _pattern_ for constructing each output file name.

Patterns look a bit like `printf` format strings: they consist of
literal characters interleaved with formatting directives introduced
by `%`. The available formatting directives are:

* `%f` expands to the base name of the input file, with directory path
  and file extension removed. For example, if an input file is called
  `foo/bar/baz.elf`, then `%f` will expand to just `baz` when
  generating output from that file.

* `%F` expands to the _full_ name of the input file, with the
  directory path still removed, but the extension left on. For
  example, `foo/bar/baz.elf` will turn into `baz.elf`.

* `%a` and `%A` expand to the base address of a particular ELF
  segment. These are for the `--bin` or `--vhx` modes, where each
  segment is output to a separate file. ELF contains no way to assign
  segments a human-readable name, so the base address is the simplest
  way to distinguish them. The address is generated in hex, with no
  leading zeroes (unless it's actually `0`). `%a` generates hex digits
  `a`-`f` in lower case, and `%A` generates them in upper case.

* `%b` expands to the bank number, if you're using `--banks` to split
  binary (or VHX) output into more than one bank. Banks are numbered
  consecutively upwards from 0, and are written in decimal.

* `%%` expands to a literal `%`, if you need one in the output file
  name.

Some examples:

```
elf2bin --ihex -O %f.hex one.elf two.elf # generates one.hex and two.hex
elf2bin --ihex -O %F.hex one.elf two.elf # generates one.elf.hex and two.elf.hex
elf2bin --bin -O out-%a.bin input.elf # might generate, say, out-0.bin and out-f000.bin
elf2bin --bin -O out-%A.bin input.elf # same but you'd get out-F000.bin
elf2bin --bincombined --banks 1x2 -O out-%b.bin input.elf # out-0.bin and out-1.bin
elf2bin --srec -O out-%%.hex input.elf # just gives out-%.hex
```

In all cases, `elf2bin` will check its set of output files to ensure
you haven't tried to direct two output files to the same name.

In a complex case, you may need to use more than one of these
directives. For example, if you're using `--bin` with multiple ELF
files at once, some of which have multiple segments, _and_ you're
using bank interleaving, then you'll need to use all of `%f`, `%a` (or
`%A`) and `%b` to generate a distinct name for each output file:

```
elf2bin --bin --banks 2x4 -O %f-%a-%b.bin one.elf two.elf
```
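
The expansion rules and the duplicate-name check can be sketched in Python. `expand_pattern` and `check_unique` are hypothetical names. The sketch makes two assumptions about corner cases. First, `%f` removes only the last extension, as `os.path.splitext` does. Second, using `%a`, `%A` or `%b` without a segment address or bank number is an error.

```python
import os
from typing import Iterable, Optional


def expand_pattern(pattern: str, input_path: str,
                   seg_addr: Optional[int] = None,
                   bank: Optional[int] = None) -> str:
    """Expand %f, %F, %a, %A, %b and %% in an output file name pattern."""
    full = os.path.basename(input_path)  # %F: directory removed, extension kept
    stem = os.path.splitext(full)[0]     # %f: last extension removed too
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "%":
            out.append(pattern[i])
            i += 1
            continue
        if i + 1 == len(pattern):
            raise ValueError("pattern ends with a lone '%'")
        d = pattern[i + 1]
        if d == "f":
            out.append(stem)
        elif d == "F":
            out.append(full)
        elif d in ("a", "A"):
            if seg_addr is None:
                raise ValueError("%a/%A used without a segment address")
            digits = format(seg_addr, "x")  # no leading zeroes; 0 gives "0"
            out.append(digits.upper() if d == "A" else digits)
        elif d == "b":
            if bank is None:
                raise ValueError("%b used without a bank number")
            out.append(str(bank))  # decimal, counting up from 0
        elif d == "%":
            out.append("%")
        else:
            raise ValueError(f"unknown directive %{d}")
        i += 2
    return "".join(out)


def check_unique(names: Iterable[str]) -> None:
    """Refuse to send two different outputs to the same file name."""
    seen = set()
    for name in names:
        if name in seen:
            raise ValueError(f"two outputs would both be written to {name!r}")
        seen.add(name)
```

For example, `expand_pattern("%f-%a-%b.bin", "one.elf", seg_addr=0x8000, bank=3)` gives `one-8000-3.bin`.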

### Other options

#### `--base`: set the base address of a combined output file

If you're using the `--bincombined` or `--vhxcombined` output modes,
you can use the `--base` option to specify the address you want the
output file to begin at.

If this is lower than the start address of the lowest segment,
`elf2bin` will prepend padding to the file to fill the gap.

For example, if `input.elf` has its lowest segment starting at 0x8000,
then you'll normally get an output file beginning with the data of
that segment. But adding `--base 0x6000` will give an output file
beginning with 0x2000 zero bytes, so that you could load the whole
file beginning at address 0x6000 and all the segments would end up in
the right places.

#### `--banks`: split the output between banks intended for separate ROMs

In binary and VHX formats, you can use `--banks` to request that the
output be split into interleaved banks, for example so that you can
connect a CPU's 32-bit data bus to four ROMs, each with an 8-bit data bus.

The argument to `--banks` consists of two numbers separated by an `x`.
The first number is the 'width' of each bank: the number of
consecutive bytes of data that go into each bank file before moving on
to the next. The second is the number of banks.

For example, `--banks 2x4` generates four banks, each of which
receives 2 consecutive bytes of the data in turn. That is, the output
file for bank 0 would get all the bytes intended to end up in memory
at addresses 0,1 (mod 8), bank 1 would get addresses 2,3 (mod 8), bank
2 would get 4,5 and bank 3 would get 6,7.
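
The interleaving rule can be sketched in Python. The sketch assumes the data starts at an address that is a multiple of `width * count`, so the byte at offset `i` has the same residue as its address. `split_banks` is a hypothetical name. Unaligned starting addresses are outside what this sketch models.

```python
from typing import List


def split_banks(data: bytes, width: int, count: int) -> List[bytes]:
    """Deal `data` out to `count` banks, `width` consecutive bytes at a time.

    Assumes data[0] sits at an address that is a multiple of width * count,
    so byte i goes to bank (i // width) % count."""
    if width < 1 or count < 1:
        raise ValueError("width and count must both be at least 1")
    banks = [bytearray() for _ in range(count)]
    for i, byte in enumerate(data):
        banks[(i // width) % count].append(byte)
    return [bytes(bank) for bank in banks]
```

With `width=2`, `count=4` and 16 bytes of data, bank 0 receives offsets 0, 1, 8 and 9. This matches the `--banks 2x4` description above.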

#### `--datareclen`: control data record length in hex output formats

In the record-based hex formats `--ihex` and `--srec`, you can use
`--datareclen` to control the number of bytes of the ELF file that
appear in each data record. By default this is 16. The upper limit is
different for the two formats.

#### `--segments`: control which loadable segments to output

You can use `--segments` to restrict `elf2bin` to writing only a
subset of the loadable segments in the ELF file.

The argument is a comma-separated list of base addresses.

For example, if you had an input file containing segments at addresses
0x8000, 0x20000 and 0x10000000, then `--segments 0x8000,0x10000000`
would skip the middle one. This option applies to all output modes.

#### `--physical` and `--virtual`: choose which segment address field to use

In the ELF program header table, each segment has a 'physical address'
and 'virtual address' field, called `p_paddr` and `p_vaddr`
respectively in the ELF specification. Some ELF files set the two
addresses differently, to indicate that the image is loaded into
memory in one layout and then remapped (or physically moved) into a
different layout to be run.

By default `elf2bin` uses the physical address field as the address of
the segment. You can use `--virtual` to make it use the virtual
address field instead.

(The `--physical` option is also provided, to explicitly ask for the
physical address.)

#### `--zi`: include zero-initialized data after each segment

Normally, `elf2bin` treats each segment as containing only the bytes
actually stored in the ELF file. That is, the segment is treated as
having length corresponding to its `p_filesz` field, not its
`p_memsz`.

You can use `--zi`, in any mode, to tell `elf2bin` to include zero
padding after each segment to bring it up to its `p_memsz` length.

(If the ELF file specifies different physical and virtual addresses
for each segment, then this option probably makes more sense in
combination with `--virtual`, since the physical layout might pack all
the segments tightly together without leaving room for the
zero-initialized trailer of each one.)
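
The `--zi` padding rule amounts to the following Python sketch. `with_zi` is a hypothetical helper, and `data` holds the `p_filesz` bytes stored in the file. The ELF specification does not allow `p_filesz` to exceed `p_memsz`, so the sketch treats that case as an error.

```python
def with_zi(data: bytes, memsz: int) -> bytes:
    """Pad a segment's file contents (p_filesz bytes) with zeroes up to p_memsz."""
    if memsz < len(data):
        raise ValueError("p_memsz is smaller than p_filesz")  # not valid ELF
    return data + bytes(memsz - len(data))
```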

arm-software/embedded/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
@@ -354,6 +354,9 @@ add_subdirectory(
 ${llvmproject_src_dir}/llvm llvm
 )
 
+add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/../shared/elf2bin elf2bin)
+add_dependencies(check-all check-elf2bin)
+
 if(LLVM_TOOLCHAIN_C_LIBRARY STREQUAL llvmlibc)
 install(
 FILES
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# cmake build script for elf2bin
#
# Copyright (c) 2022-2025, Arm Limited and affiliates.
#
# Part of the Arm Toolchain project, under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#

# Tell add_llvm_tool where to find tablegen to process Opts.td
set(LLVM_TABLEGEN_EXE ${CMAKE_BINARY_DIR}/llvm/bin/llvm-tblgen)

# Specify which LLVM libraries we want to link with
set(LLVM_LINK_COMPONENTS
  Object
  Option
  Support
)

# Add the LLVM include directories (in both source and build dirs),
# and the elf2bin directory in the build dir where tablegen will write
# Opts.inc to.
include_directories(
  ${CMAKE_CURRENT_SOURCE_DIR}/../../../llvm/include
  ${CMAKE_BINARY_DIR}/llvm/include
  ${CMAKE_BINARY_DIR}/elf2bin
)

# Turn off RTTI: the LLVM libraries are compiled without it, so we
# must compile without it too, or we'll get a link error trying to
# find the RTTI for LLVM types such as llvm::opt::OptTable.
if(CMAKE_COMPILER_IS_GNUCXX)
  list(APPEND LLVM_COMPILE_FLAGS "-fno-rtti")
elseif(MSVC)
  list(APPEND LLVM_COMPILE_FLAGS "/GR-")
endif()

# Process Opts.td into a list of command-line options.
set(LLVM_TARGET_DEFINITIONS Opts.td)
tablegen(LLVM Opts.inc -gen-opt-parser-defs)
add_public_tablegen_target(Elf2BinOptsTableGen)

# Build the elf2bin binary itself.
add_llvm_tool(elf2bin
  elf2bin.cpp
  elf.cpp
  bin.cpp
  hex.cpp
  DEPENDS
  Elf2BinOptsTableGen
)

# And run its Python-based test suite.
add_custom_target(check-elf2bin
  DEPENDS elf2bin
  COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/test/test.py
  $<TARGET_FILE:elf2bin>)
