@@ -2,42 +2,288 @@ Summary
22~~~~~~~
33
44The bmap-tools project implements bmap-related tools and API modules. The
5- entire project is written in python, and requires python 2.7+.
6-
7- Currently the main user of this project is Tizen IVI, but the project is
8- generic and can be used everywhere, when dealing with raw images.
5+ entire project is written in python and supports python 2.7 and python 3.x.
96
107The project author and maintainer is Artem Bityutskiy <
[email protected] >.
118Please, feel free to contact me if you have questions.
129
13- The project is documented here:
14- https://source.tizen.org/documentation/reference/bmaptool
10+ Project git repository is here:
11+ https://github.com/01org/bmap-tools.git
1512
16- The project mailing list is (no need to subscribe to post there):
17- 1813
19- Mailing list archives:
20- http://lists.infradead.org/pipermail/bmap-tools/
14+ Introduction
15+ ~~~~~~~~~~~~
2116
22- Subscribe here:
23- http://lists.infradead.org/mailman/listinfo/bmap-tools
17+ Bmaptool is a generic tool for creating the block map (bmap) for a file and
18+ copying files using the block map. The idea is that large files, like raw
19+ system image files, can be copied or flashed a lot faster and more reliably
20+ with bmaptool than with traditional tools, like "dd" or "cp".
2421
25- The project git is here:
26- https://github.com/01org/bmap-tools.git
22+ Bmaptool was originally created for the "Tizen IVI" project and it was used for
23+ flashing system images to USB sticks and other block devices. Bmaptool can also
24+ be used for general image flashing purposes, for example, flashing Fedora Linux
25+ OS distribution images to USB sticks.
2726
28- Signed release tarballs are available here:
29- ftp://ftp.infradead.org/pub/bmap-tools/
27+ Originally Tizen IVI images had been flashed using the "dd" tool, but bmaptool
28+ brought a number of advantages.
3029
31- Packages for various distributions are available here:
32- * The latest release: http://download.tizen.org/tools/latest-release/
33- * The latest pre-release: http://download.tizen.org/tools/pre-release/
34- * Older releases: http://download.tizen.org/tools/archive
30+ * Faster. Depending on various factors, like write speed, image size, how full
31+ the image is, and so on, bmaptool was 5-7 times faster than "dd" in the Tizen
32+ IVI project.
33+ * Integrity. Bmaptool verifies data integrity while flashing, which means that
34+ possible data corruptions will be noticed immediately.
35+ * Usability. Bmaptool can read images directly from the remote server, so users
36+ do not have to download images and save them locally.
37+ * Protects user's data. Unlike "dd", if you make a mistake and specify a wrong
38+ block device name, bmaptool is less likely to destroy your data because it has
39+ protection mechanisms which, for example, prevent bmaptool from writing to a
40+ mounted block device.
3541
36- Please, contribute by sending patches to the mailing list, feel free to CC
37- me: Artem Bityutskiy <
[email protected] >
3842
39- The project structure
40- ~~~~~~~~~~~~~~~~~~~~~
43+ Usage
44+ ~~~~~
45+
46+ Bmaptool supports 2 subcommands:
47+ * "copy" - copy a file to another file using bmap or flash an image to a block
48+ device
49+ * "create" - create a bmap for a file
50+
51+ You can get a usage reference for bmaptool and all the supported commands using
52+ the "-h" or "--help" options:
53+
54+ $ bmaptool -h # General bmaptool help
55+ $ bmaptool cmd -h # Help on the "cmd" sub-command
56+
57+ You can also refer to the bmaptool manual page:
58+ $ man bmaptool
59+
60+
61+ Concept
62+ ~~~~~~~
63+
64+ This section provides general information about the block map (bmap) necessary
65+ for understanding how bmaptool works. The structure of the section is:
66+
67+ * "Sparse files" - the bmap ideas are based on sparse files, so it is important
68+ to understand what sparse files are.
69+ * "The block map" - explains what bmap is.
70+ * "Raw images" - the main usage scenario for bmaptool is flashing raw images,
71+ which this section discusses.
72+ * "Usage scenarios" - describes various possible bmap and bmaptool usage
73+ scenarios.
74+
75+ Sparse files
76+
77+ One of the main roles of a filesystem, generally speaking, is to map blocks of
78+ file data to disk sectors. Different file-systems do this mapping differently,
79+ and filesystem performance largely depends on how well the filesystem can do
80+ the mapping. The filesystem block size is usually 4KiB, but may also be 8KiB or
81+ larger.
82+
83+ Obviously, to implement the mapping, the file-system has to maintain some kind
84+ of on-disk index. For any file on the file-system, and any offset within the
85+ file, the index allows you to find the corresponding disk sector, which stores
86+ the file's data. Whenever we write to a file, the filesystem looks up the index
87+ and writes to the corresponding disk sectors. Sometimes the filesystem has to
88+ allocate new disk sectors and update the index (such as when appending data to
89+ the file). The filesystem index is sometimes referred to as the "filesystem
90+ metadata".
91+
92+ What happens if a file area is not mapped to any disk sectors? Is this
93+ possible? The answer is yes. It is possible and these unmapped areas are often
94+ called "holes". And those files which have holes are often called "sparse
95+ files".
96+
97+ All reasonable file-systems like Linux ext[234], btrfs, XFS, or Solaris ZFS,
98+ and even Windows' NTFS, support sparse files. Old and less reasonable
99+ filesystems, like FAT, do not support holes.
100+
101+ Reading holes returns zeroes. Writing to a hole causes the filesystem to
102+ allocate disk sectors for the corresponding blocks. Here is how you can create
103+ a 4GiB file with all blocks unmapped, which means that the file consists of a
104+ huge 4GiB hole:
105+
106+ $ truncate -s4G image.raw
107+ $ stat image.raw
108+ File: image.raw
109+ Size: 4294967296 Blocks: 0 IO Block: 4096 regular file
110+
111+ Notice that "image.raw" is a 4GiB file, which occupies 0 blocks on the disk!
112+ So, the entire file's contents are not mapped anywhere. Reading this file would
113+ result in reading 4GiB of zeroes. If you write to the middle of the image.raw
114+ file, you'll end up with 2 holes and a mapped area in the middle.
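
The same experiment can be reproduced from Python. The following is a minimal sketch, not part of bmaptool: the helper names "create_sparse_file" and "allocated_bytes" are made up for illustration, and a 64MiB file is used instead of 4GiB to keep the example light. It assumes a Linux filesystem with sparse file support.

```python
import os
import tempfile


def create_sparse_file(path, size):
    """Create a file of 'size' bytes which consists of a single hole."""
    with open(path, "wb") as f:
        # Extending a file with truncate() does not write any data.
        f.truncate(size)


def allocated_bytes(path):
    """Return the amount of disk space allocated to the file, in bytes."""
    # On Linux, st_blocks is counted in 512-byte units.
    return os.stat(path).st_blocks * 512


with tempfile.TemporaryDirectory() as tmpdir:
    image = os.path.join(tmpdir, "image.raw")
    create_sparse_file(image, 64 * 1024 * 1024)
    apparent_size = os.path.getsize(image)
    on_disk = allocated_bytes(image)
    print("apparent size:", apparent_size, "allocated:", on_disk)
```

On a filesystem that supports holes, the allocated size is expected to be far smaller than the apparent size, which mirrors the "Blocks: 0" value reported by "stat".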
115+
116+ Therefore:
117+ * Sparse files are files with holes.
118+ * Sparse files help save disk space, because, roughly speaking, holes do not
119+ occupy disk space.
120+ * A hole is an unmapped area of a file, meaning that it is not mapped anywhere
121+ on the disk.
122+ * Reading data from a hole returns zeroes.
123+ * Writing data to a hole destroys it by forcing the filesystem to map
124+ corresponding file areas to disk sectors.
125+ * Filesystems usually operate with blocks, so sizes and offsets of holes are
126+ aligned to the block boundary.
127+
128+ It is also useful to know that you should work with sparse files carefully. It
129+ is easy to accidentally expand a sparse file, that is, to map all holes to
130+ zero-filled disk areas. For example, "scp" always expands sparse files, and the
131+ "tar" and "rsync" tools do the same by default, unless you use the "--sparse"
132+ option. Compressing and then decompressing a sparse file usually expands it.
133+
134+ There are 2 ioctl's in Linux which allow you to find mapped and unmapped areas:
135+ "FIBMAP" and "FIEMAP". The former is very old and is probably supported by all
136+ Linux systems, but it is rather limited and requires root privileges. The
137+ latter is a lot more advanced and does not require root privileges, but it is
138+ relatively new (added in Linux kernel, version 2.6.28).
139+
140+ Recent versions of the Linux kernel (starting from 3.1) also support the
141+ "SEEK_HOLE" and "SEEK_DATA" values for the "whence" argument of the standard
142+ "lseek()" system call. They allow positioning to the next hole and the next
143+ mapped area of the file.
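
As an illustration of the "lseek()" approach, here is a minimal Python sketch which lists the mapped areas of a file. It is only an example and not how bmaptool is implemented: the "mapped_ranges" helper is a hypothetical name, and the code assumes Linux with Python 3.3 or newer, where "os.SEEK_DATA" and "os.SEEK_HOLE" are available. On filesystems without real support for these values, the kernel may report the whole file as one mapped area.

```python
import errno
import os
import tempfile


def mapped_ranges(path):
    """Yield (start, end) byte offsets of the mapped areas of a file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # only a hole is left after 'offset'
                raise
            # There is an implicit hole at the end of every file, so a
            # hole is always found after a mapped area.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            yield start, end
            offset = end
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "image.raw")
    with open(path, "wb") as f:
        f.truncate(1024 * 1024)   # a 1MiB hole
        f.seek(512 * 1024)
        f.write(b"x" * 4096)      # map one area in the middle
    ranges = list(mapped_ranges(path))
    print(ranges)
```

The exact ranges depend on the filesystem, because holes and mapped areas are aligned to the filesystem block size.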
144+
145+ Advanced Linux filesystems, in modern kernels, also allow "punching holes",
146+ meaning that it is possible to unmap any aligned area and turn it into a hole.
147+ This is implemented using the "FALLOC_FL_PUNCH_HOLE" "mode" of the
148+ "fallocate()" system call.
149+
150+ The bmap
151+
152+ The bmap is an XML file, which contains a list of mapped areas, plus some
153+ additional information about the file it was created for, for example:
154+ * SHA256 checksum of the bmap file itself
155+ * SHA256 checksum of the mapped areas
156+ * the original file size
157+ * amount of mapped data
158+
159+ The bmap file is designed to be both easily machine-readable and
160+ human-readable. All the machine-readable information is provided by XML tags.
161+ The human-oriented information is in XML comments, which explain the meaning of
162+ XML tags and provide useful information like amount of mapped data in percent
163+ and in MiB or GiB.
164+
165+ So, the best way to understand bmap is to just read it. Here is an example
166+ of a Tizen IVI 2.0 alpha snapshot bmap file. Most of the block ranges
167+ have been removed, though, to keep it shorter.
168+
169+ Raw images
170+
171+ Raw images are the simplest type of system images which may be flashed to the
172+ target block device, block-by-block, without any further processing. Raw images
173+ just "mirror" the target block device: they usually start with the MBR sector.
174+ There is a partition table at the beginning of the image and one or more
175+ partitions containing filesystems, like ext4. Usually, no special tools are
176+ required to flash a raw image to the target block device. The standard "dd"
177+ command can do the job:
178+
179+ $ dd if=tizen-ivi-image.raw of=/dev/usb_stick
180+
181+ At first glance, raw images do not look very appealing because they are large
182+ and it takes a lot of time to flash them. However, with bmap, raw images become
183+ a much more attractive type of image. We will demonstrate this, using Tizen IVI
184+ as an example.
185+
186+ The Tizen IVI project uses raw images which take 3.7GiB in Tizen IVI 2.0 alpha.
187+ The images are created by the MIC tool. Here is a brief description of how MIC
188+ creates them:
189+
190+ * create a 3.7GiB sparse file, which will become the Tizen IVI image in the end
191+ * partition the file using the "parted" tool
192+ * format the partitions using the "mkfs.ext4" tool
193+ * loop-back mount all the partitions
194+ * install all the required packages to the partitions: copy all the needed
195+ files and do all the tweaks
196+ * unmount all loop-back-mounted image partitions, the image is ready
197+ * generate the block map file for the image
198+ * compress the image using "bzip2", turning it into a small file, around
199+ 300MiB
200+
201+ The Tizen IVI raw images are initially sparse files. All the mapped blocks
202+ represent useful data and all the holes represent unused regions, which
203+ "contain" zeroes and do not have to be copied when flashing the image. Although
204+ information about holes is lost once the image gets compressed, the bmap file
205+ still has it and it can be used to reconstruct the uncompressed image or to
206+ flash the image quickly, by copying only the mapped regions.
207+
208+ Raw images compress extremely well because the holes are essentially zeroes,
209+ which compress perfectly. This is why 3.7GiB Tizen IVI raw images, which
210+ contain about 1.1GiB of mapped blocks, take only 300MiB in a compressed form.
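
How well zeroes compress is easy to check with the standard Python "bz2" module. This is just an illustrative sketch with an arbitrary 16MiB of zeroes. It does not reproduce the Tizen IVI numbers above, which also include 1.1GiB of real data.

```python
import bz2

CHUNK = 1024 * 1024      # feed the compressor 1MiB at a time
CHUNKS = 16              # 16MiB of zeroes in total
zeroes = bytes(CHUNK)    # a buffer filled with zero bytes

compressor = bz2.BZ2Compressor()
compressed_size = 0
for _ in range(CHUNKS):
    compressed_size += len(compressor.compress(zeroes))
compressed_size += len(compressor.flush())

raw_size = CHUNK * CHUNKS
print("raw:", raw_size, "compressed:", compressed_size)
```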
211+ And the important point is that you need to decompress them only while
212+ flashing. The bmaptool does this "on-the-fly".
213+
214+ Therefore:
215+ * raw images are distributed in a compressed form, and they are almost as small
216+ as a tarball (that includes all the data the image would take)
217+ * the bmap file and the bmaptool make it possible to quickly flash the
218+ compressed raw image to the target block device
219+ * optionally, the bmaptool can reconstruct the original uncompressed sparse raw
220+ image file
221+
222+ And, what is even more important, flashing raw images is extremely fast
223+ because you write directly to the block device, and write sequentially.
224+
225+ Another great thing about raw images is that they may be 100% ready-to-go and
226+ all you need to do is to put the image on your device "as-is". You do not have
227+ to know the image format, which partitions and filesystems it contains, etc.
228+ This is simple and robust.
229+
230+ Usage scenarios
231+
232+ Flashing or copying large images is the main bmaptool use case. The idea is
233+ that if you have a raw image file and its bmap, you can flash it to a device by
234+ writing only the mapped blocks and skipping the unmapped blocks.
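
The idea can be sketched in a few lines of Python. This is a simplified illustration and not the bmaptool implementation: the "copy_mapped_ranges" function is hypothetical, the block ranges are passed as a plain list of inclusive (first block, last block) pairs instead of being parsed from a bmap file, and the 4096-byte block size is an assumption.

```python
import os
import tempfile


def copy_mapped_ranges(src_path, dst_path, ranges, block_size=4096):
    """Copy only the listed block ranges from the image to the target.

    'ranges' is a list of inclusive (first_block, last_block) pairs.
    The target must already exist, as a block device does.
    """
    image_size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        for first, last in ranges:
            offset = first * block_size
            remaining = min((last + 1) * block_size, image_size) - offset
            src.seek(offset)
            dst.seek(offset)
            while remaining > 0:
                chunk = src.read(min(remaining, 1024 * 1024))
                if not chunk:
                    raise OSError("unexpected end of the image file")
                dst.write(chunk)
                remaining -= len(chunk)


with tempfile.TemporaryDirectory() as tmpdir:
    image = os.path.join(tmpdir, "image.raw")
    target = os.path.join(tmpdir, "target.bin")
    payload = b"A" * (2 * 4096)
    with open(image, "wb") as f:
        f.truncate(8 * 4096)      # 8 blocks, all unmapped
        f.seek(2 * 4096)
        f.write(payload)          # blocks 2 and 3 become mapped
    with open(target, "wb") as f:
        f.write(b"\xff" * (8 * 4096))   # stale data on the "device"
    copy_mapped_ranges(image, target, [(2, 3)])
    with open(target, "rb") as f:
        result = f.read()
```

Here an ordinary file stands in for the block device. Only blocks 2 and 3 are written and the rest of the target keeps its old contents, which is exactly why unmapped blocks must be the ones nobody expects anything from.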
235+
236+ What this basically means is that with bmap it is not necessary to try to
237+ minimize the raw image size by making the partitions small, which would require
238+ resizing them. The image can contain huge multi-gigabyte partitions, just like
239+ the target device requires. The image will then be a huge sparse file, with
240+ little mapped data. And because unmapped areas "contain" zeroes, the huge image
241+ will compress extremely well, so the huge image will be very small in
242+ compressed form. It can then be distributed in compressed form, and flashed
243+ very quickly with bmaptool and the bmap file, because bmaptool will decompress
244+ the image on-the-fly and write only mapped areas.
245+
246+ The additional benefit of using bmap for flashing is the checksum verification.
247+ Indeed, the "bmaptool create" command generates SHA256 checksums for all mapped
248+ block ranges, and the "bmaptool copy" command verifies the checksums while
249+ writing. Integrity of the bmap file itself is also protected by a SHA256
250+ checksum and bmaptool verifies it before starting flashing.
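
A per-range checksum of this kind could be computed roughly as follows. This is a sketch under assumptions, not the actual bmaptool code: "range_sha256" is a hypothetical helper, and block ranges are taken to be inclusive with a 4096-byte block size.

```python
import hashlib
import os
import tempfile


def range_sha256(path, first_block, last_block, block_size=4096):
    """Return the SHA256 hex digest of an inclusive block range."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(first_block * block_size)
        remaining = (last_block - first_block + 1) * block_size
        while remaining > 0:
            chunk = f.read(min(remaining, 1024 * 1024))
            if not chunk:
                break  # the last block of a file may be incomplete
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "image.raw")
    data = bytes(range(256)) * 64     # 16KiB of sample data, 4 blocks
    with open(path, "wb") as f:
        f.write(data)
    checksum = range_sha256(path, 1, 2)
    expected = hashlib.sha256(data[4096:3 * 4096]).hexdigest()
    print(checksum == expected)
```

While copying, a tool can compute the same digest over the data it has just read for a range and compare it with the value stored for that range.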
251+
252+ On top of this, the bmap file can be signed using OpenPGP (gpg) and bmaptool
253+ automatically verifies the signature if it is present. This allows for
254+ verifying the bmap file integrity and authorship. And since the bmap file
255+ contains SHA256 checksums for all the mapped image data, the bmap file
256+ signature verification should be enough to guarantee integrity and authorship of
257+ the image file.
258+
259+ The second usage scenario is reconstructing sparse files. Generally speaking, if
260+ you had a sparse file but then expanded it, there is no way to reconstruct it.
261+ In some cases, something like
262+
263+ $ cp --sparse=always expanded.file reconstructed.file
264+
265+ would be enough. However, a file reconstructed this way will not necessarily be
266+ the same as the original sparse file. The original sparse file could have
267+ contained mapped blocks filled with all zeroes (not holes), and, in the
268+ reconstructed file, these blocks will become holes. In some cases, this does
269+ not matter, for example, if you just want to save disk space. However, for
270+ flashing raw images, it does matter, because it is essential to write zero-filled
271+ blocks and not skip them. Indeed, if you do not write the zero-filled block to
272+ corresponding disk sectors which, presumably, contain garbage, you end up with
273+ garbage in those blocks. In other words, when we are talking about flashing raw
274+ images, the difference between zero-filled blocks and holes in the original
275+ image is essential because zero-filled blocks are the required blocks which are
276+ expected to contain zeroes, while holes are just unneeded blocks with no
277+ expectations regarding the contents.
278+
279+ Bmaptool may be helpful for reconstructing sparse files properly. Before the
280+ sparse file is expanded, you should generate its bmap (for example, by using
281+ the "bmaptool create" command). Then you may compress your file or, otherwise,
282+ expand it. Later on, you may reconstruct it using the "bmaptool copy" command.
283+
284+
285+ Project structure
286+ ~~~~~~~~~~~~~~~~~
41287
42288--------------------------------------------------------------------------------
43289| - bmaptool | A tool to create bmap and copy with bmap. Based |
@@ -69,7 +315,10 @@ The project structure
70316| | - TransRead.py | Provides a transparent way to read various kinds of |
70316| | | files (compressed, etc) |
71317| - debian/* | Debian packaging for the project. |
318+ | - doc/* | Project documentation. |
72319| - packaging/* | RPM packaging (Fedora & OpenSuse) for the project. |
320+ | - contrib/* | Various contributions that may be useful, but |
321+ | | project maintainers do not really test or maintain. |
73322--------------------------------------------------------------------------------
74323
75324How to run unit tests
@@ -79,40 +328,6 @@ Just install the 'nose' python test framework and run the 'nosetests' command in
79328the project root directory. If you want to see tests coverage report, run
80329'nosetests --with-coverage'.
81330
82- Branches and releases
83- ~~~~~~~~~~~~~~~~~~~~~
84-
85- The project uses the following git branches:
86- 1. devel - here we do all the development, so this branch contains the latest
87- code. Things may be broken in this branch, although we do not commit
88- anything before it passes the unit-tests. But of course, the unit-tests
89- have limited coverage. Anyway, do not use this branch unless you are a
90- developer or you know what you are doing.
91- 2. master - we do not use this branch for anything but pointing to the latest
92- release. This means that you may safely take this branch and be sure this
93- is the latest stable code.
94- 3. release-x.0 - pre-releases or releases or bug-fix releases of version "x".
95-
96- Let's take an example. When we start developing the 'bmap-tools' project from
97- scratch, and have the first version 1.0-rc1 which somehow works, we create the
98- 'release-1.0' branch. The idea is that this branch will eventually contain the
99- first bmap-tools release version 1.0. But at the moment it contains the
100- pre-release version 0.1. As we move forward, we cut pre-releases
101- 1.0-rc2, 1.0-rc3..., 1.0-rc7, and so on. They are all published in the
102- 'release-1.0' branch. And of course, the 'master' branch points to the latest
103- release (same as release candidate, rc).
104-
105- Then at some point we finally release the first 'bmap-tools' version 1.0. No
106- more features are added to the 1.0 release. At the same time we continue
107- developing in the 'devel' branch and add major features for the next '2.0'
108- release. We create the 'release-2.0' branch, and publish 2.0 pre-releases
109- there: 2.0-rc1, 2.0-rc2, etc.
110-
111- Meanwhile, users report brown-paperbag flaws in bmap-tools-1.0. We fix the
112- issues, and publish bug-fix releases: 1.1, 1.2, etc. They are also published in
113- the 'release-1.0' branch. The 'master' branch points to the latest 2.0
114- release, though.
115-
116331Credits
117332~~~~~~~
118333