This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Commit 0bb7b49

shoehorned CRC32 thanks to Clang optimization improvements
1 parent 83b6ea5 commit 0bb7b49

5 files changed: +214 -36 lines changed
README.md

Lines changed: 3 additions & 9 deletions
@@ -11,13 +11,7 @@ It is a much more space efficient alternative to the 4kB Atmel/Microchip [AN_423

 Downloading can be accomplished with the existing [dfu-util](http://dfu-util.sourceforge.net/) utilities.

-Downloading a raw binary file looks like this:
-
-```
-dfu-util -D write.bin
-```
-
-or by using the provided dx1elf2dfu utility, one can create a .dfu file to be downloaded:
+Using the provided dx1elf2dfu utility, one can create a .dfu file; that file can then be downloaded like so:

 ```
 dfu-util -D write.dfu
@@ -27,13 +21,13 @@ dfu-util -D write.dfu

 The linker memory map of the user application must be modified to have an origin at 0x0000_0400 rather than at 0x0000_0000. This bootloader resides at 0x0000_0000.

-When booting, the bootloader checks whether a GPIO pin (nominally PA15) is connected to ground. If so, it runs the bootloader instead of the user application.
+When booting, the bootloader checks whether a GPIO pin (nominally PA15) is connected to ground. It also computes a CRC32 of the user application. If the user application is unprogrammed or corrupted, the CRC32 check should fail. If the CRC32 check fails or the GPIO pin is grounded, it runs the bootloader instead of the user application.

 When branching to the user application, the bootloader includes functionality to update the [VTOR (Vector Table Offset Register)](http://infocenter.arm.com/help/topic/com.arm.doc.dui0662a/Ciheijba.html) and update the stack pointer to suit the value in the user application's vector table.

 ## Requirements for compiling

-[Rowley Crossworks for ARM](http://www.rowley.co.uk/arm/) is presently needed to compile this code. With Crossworks for ARM v4.1.1, using the Clang 5.0.1 compiler produces an 1010 byte image. The more mainstream GCC does not appear to be optimized enough to produce an image that comes anywhere close to fitting into 1024 bytes.
+[Rowley Crossworks for ARM](http://www.rowley.co.uk/arm/) is presently needed to compile this code. With Crossworks for ARM v4.3.0, using the Clang 7.0.0 compiler produces a 1014 byte image. The more mainstream GCC does not appear to be optimized enough to produce an image that comes anywhere close to fitting into 1024 bytes.

 There is no dependency in the code on the [Rowley Crossworks for ARM](http://www.rowley.co.uk/arm/) toolchain per se, but at this time I am not aware of any other ready-to-use Clang ARM cross-compiler package that I can readily point users to.
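The boot-time rule described in the README change above can be summarized as a small pure function. This is only an illustrative sketch: `should_run_bootloader` is a hypothetical helper that does not exist in the repository, and it assumes (as the bootloader.c change in this commit suggests) that a valid image leaves a CRC32 result of zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the repository): returns true when the
   bootloader should stay resident instead of branching to the user app. */
static bool should_run_bootloader(bool pin_grounded, uint32_t crc_result)
{
  /* a grounded pin always forces the bootloader */
  if (pin_grounded)
    return true;

  /* assumption: a valid image yields a CRC32 result of zero, so any other
     value means the flash is unprogrammed or corrupted */
  return crc_result != 0;
}
```

Note that the pin takes priority: a grounded pin enters the bootloader even when the user application is intact, which keeps a recovery path open if a valid but misbehaving image has been programmed.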

bootloader.c

Lines changed: 16 additions & 2 deletions
@@ -242,9 +242,23 @@ void bootloader(void)
   PORT->Group[0].PINCFG[15].reg = PORT_PINCFG_PULLEN | PORT_PINCFG_INEN;
   PORT->Group[0].OUTSET.reg = (1UL << 15);

-  if (PORT->Group[0].IN.reg & (1UL << 15))
-    return; /* pin not grounded, so run user app */
+  PAC1->WPCLR.reg = 2; /* clear DSU */

+  DSU->ADDR.reg = 0x400; /* start CRC check at beginning of user app */
+  DSU->LENGTH.reg = *(volatile uint32_t *)0x410; /* use length encoded into unused vector address in user app */
+
+  /* ask DSU to compute CRC */
+  DSU->DATA.reg = 0xFFFFFFFF;
+  DSU->CTRL.bit.CRC = 1;
+  while (!DSU->STATUSA.bit.DONE);
+
+  if (!(PORT->Group[0].IN.reg & (1UL << 15)))
+    goto run_bootloader; /* pin grounded, so run bootloader */
+
+  if (0 == DSU->DATA.reg)
+    return; /* CRC passes, so run user app */
+
+run_bootloader:
   /*
   configure oscillator for crystal-free USB operation
   */
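For readers who want to reproduce the check on a host machine, the following is a minimal software model of the test performed above. It assumes the DSU uses the conventional reflected CRC-32 polynomial (0xEDB88320), seeded from the DATA register and read back without a final inversion, which is consistent with how dx1elf2dfu verifies its own output in this commit. The names `crc32_raw` and `image_is_valid` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) with a caller-supplied
   seed and no final inversion. */
static uint32_t crc32_raw(uint32_t crc, const uint8_t *data, size_t length)
{
  while (length--)
  {
    crc ^= *data++;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc;
}

/* Hypothetical host-side equivalent of the bootloader's test: an image is
   accepted when the CRC over its declared length comes out as zero. */
static int image_is_valid(const uint8_t *image, size_t length)
{
  return 0 == crc32_raw(0xFFFFFFFFu, image, length);
}
```

A zero result is a convenient pass criterion because the bootloader then needs no stored reference value to compare against; it only tests the register against zero, which is cheap in a 1024-byte budget.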

dx1elf2dfu/README.md

Lines changed: 8 additions & 1 deletion
@@ -1,11 +1,18 @@
 dx1elf2dfu
 ==========

-This tool is a possible aid for developers who use the Dx1bootloader and want to generate a DFU image directly from a Dx1 ELF object file.
+This tool is for users of the Dx1bootloader to generate a DFU image directly from a SAMD11/SAMD21 ELF object file.
+
+This DFU image output includes a CRC32 calculation that is stored inside the user application; this is checked by the bootloader to verify the user application's integrity.

 ## Sample Usage

 ```
 dx1elf2dfu myapp.elf myapp.dfu
 ```

+## Theory of Operation
+
+The Cortex-M0 vector table has nine unused 32-bit entries marked 'Reserved' that only serve as wasted flash space. By using two of these entries to store the user application length and its CRC32, the dx1elf2dfu utility can communicate this information to the bootloader without consuming additional space.
+
+By storing the user application length, the bootloader will only compute the CRC32 over a prescribed portion of the flash. This frees the user application, if it wishes, to store and re-write data in upper portions of the flash without impacting the CRC32 protection.
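As a rough illustration of the storage scheme described in the Theory of Operation text, the sketch below writes and reads 32-bit little-endian values in the two reserved slots. The offsets 0x10 and 0x14 are taken from dx1elf2dfu.c in this commit; the helper names `put_le32` and `get_le32` are hypothetical.

```c
#include <stdint.h>

/* offsets, relative to the start of the user application, of the two
   reserved vector entries used by this commit (see dx1elf2dfu.c) */
enum { APP_LEN_OFFSET = 0x10, APP_CRC_OFFSET = 0x14 };

/* store a 32-bit value in little-endian byte order, which is how the
   Cortex-M0 will read it back as a word */
static void put_le32(uint8_t *p, uint32_t value)
{
  p[0] = (uint8_t)(value >> 0);
  p[1] = (uint8_t)(value >> 8);
  p[2] = (uint8_t)(value >> 16);
  p[3] = (uint8_t)(value >> 24);
}

/* read back a 32-bit little-endian value */
static uint32_t get_le32(const uint8_t *p)
{
  return ((uint32_t)p[0] << 0) | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With the application linked at 0x400, offset 0x10 corresponds to absolute address 0x410, which matches the address the bootloader reads the length from.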

dx1elf2dfu/dx1elf2dfu.c

Lines changed: 186 additions & 23 deletions
@@ -6,6 +6,10 @@
     https://github.com/majbthrd/SAMDx1-USB-DFU-Bootloader
     and thus expects the input ELF to not use the first 1024 bytes.

+    The bootloader expects an application length and CRC32 to be stored
+    within the user application (using unused vector table entries). This
+    tool calculates these quantities and inserts them into the output DFU.
+
     Permission is hereby granted, free of charge, to any person obtaining a
     copy of this software and associated documentation files (the "Software"),
     to deal in the Software without restriction, including without limitation
@@ -33,6 +37,10 @@
 #define USB_VENDOR_ID 0x1209
 #define USB_PRODUCT_ID 0x2003

+static const uint32_t origin_addr = 0x400; /* origin of the application (first address available after the bootloader) */
+static const uint32_t app_len_offset = 0x10; /* reserved application vector where the application size is stored */
+static const uint32_t app_crc_offset = 0x14; /* reserved application vector where the CRC value is stored */
+
 typedef struct Elf32_Ehdr
 {
   uint8_t e_ident[16];
@@ -241,7 +249,7 @@ static struct memory_blob *find_blob(uint32_t address, uint32_t count, struct me
   }

   addition = malloc(sizeof(struct memory_blob));
-  memset(addition, 0, sizeof(struct memory_blob));
+  memset(addition, 0xFF, sizeof(struct memory_blob));

   addition->data = malloc(count);
   addition->address = address;
@@ -324,6 +332,26 @@ static const uint32_t crc32_table[256] =
   0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D,
 };

+static const uint8_t high_lookup[256] =
+{
+  0, 65, 195, 130, 134, 199, 69, 4, 77, 12, 142, 207, 203, 138, 8, 73,
+  154, 219, 89, 24, 28, 93, 223, 158, 215, 150, 20, 85, 81, 16, 146, 211,
+  117, 52, 182, 247, 243, 178, 48, 113, 56, 121, 251, 186, 190, 255, 125, 60,
+  239, 174, 44, 109, 105, 40, 170, 235, 162, 227, 97, 32, 36, 101, 231, 166,
+  234, 171, 41, 104, 108, 45, 175, 238, 167, 230, 100, 37, 33, 96, 226, 163,
+  112, 49, 179, 242, 246, 183, 53, 116, 61, 124, 254, 191, 187, 250, 120, 57,
+  159, 222, 92, 29, 25, 88, 218, 155, 210, 147, 17, 80, 84, 21, 151, 214,
+  5, 68, 198, 135, 131, 194, 64, 1, 72, 9, 139, 202, 206, 143, 13, 76,
+  149, 212, 86, 23, 19, 82, 208, 145, 216, 153, 27, 90, 94, 31, 157, 220,
+  15, 78, 204, 141, 137, 200, 74, 11, 66, 3, 129, 192, 196, 133, 7, 70,
+  224, 161, 35, 98, 102, 39, 165, 228, 173, 236, 110, 47, 43, 106, 232, 169,
+  122, 59, 185, 248, 252, 189, 63, 126, 55, 118, 244, 181, 177, 240, 114, 51,
+  127, 62, 188, 253, 249, 184, 58, 123, 50, 115, 241, 176, 180, 245, 119, 54,
+  229, 164, 38, 103, 99, 34, 160, 225, 168, 233, 107, 42, 46, 111, 237, 172,
+  10, 75, 201, 136, 140, 205, 79, 14, 71, 6, 132, 197, 193, 128, 2, 67,
+  144, 209, 83, 18, 22, 87, 213, 148, 221, 156, 30, 95, 91, 26, 152, 217,
+};
+
 static uint32_t crc32_calc(uint32_t crc, uint8_t *buffer, uint32_t length)
 {
   while (length--)
@@ -332,6 +360,46 @@ static uint32_t crc32_calc(uint32_t crc, uint8_t *buffer, uint32_t length)
   return crc;
 }

+static uint32_t reverse_crc32_calc(uint32_t crc, const uint8_t *buffer, uint32_t length)
+{
+  const uint8_t *data = buffer + length;
+  uint8_t high_byte;
+
+  while (length--)
+  {
+    data--;
+    high_byte = (uint8_t)(crc >> 24);
+    crc ^= crc32_table[high_lookup[high_byte]];
+    crc <<= 8;
+    crc += high_lookup[high_byte] ^ *data;
+  }
+
+  return crc;
+}
+
+static uint32_t calc_span(uint32_t prior_crc, uint32_t post_crc)
+{
+  int index;
+  uint8_t byte;
+  uint32_t span = 0, table = 0; /* zero-initialized so the shifts below never read indeterminate values */
+
+  for (index = 3; index >= 0; index--)
+  {
+    table = (table << 8) | high_lookup[post_crc >> 24];
+    post_crc = (post_crc ^ crc32_table[table & 0xFF]) << 8;
+  }
+  for (index = 0; index < 4; index++)
+  {
+    byte = (uint8_t)(prior_crc ^ table);
+    prior_crc = (prior_crc >> 8) ^ crc32_table[table & 0xFF];
+    span >>= 8;
+    span |= (uint32_t)byte << 24;
+    table >>= 8;
+  }
+
+  return span;
+}
+
 int main(int argc, char *argv[])
 {
   FILE *elffp;
@@ -340,8 +408,10 @@ int main(int argc, char *argv[])
   Elf32_Phdr *ph;
   Elf32_Shdr sh;
   struct memory_blob *blob, *pm_list;
-  uint32_t phy_addr, origin_addr, crc32, stuff_size;
+  uint32_t phy_addr, dfu_crc32, stuff_size, max_offset;
   uint8_t scratchpad[64 /* sized to be at least as large as the DFU suffix */];
+  uint32_t pre_crc, post_crc, span;
+  uint8_t *binary;

   if (argc < 3)
   {
@@ -407,7 +477,7 @@ int main(int argc, char *argv[])
   fclose(elffp);

   /*
-  write blob list to raw file
+  sanity check blob list
   */

   blob = pm_list;
@@ -418,43 +488,136 @@ int main(int argc, char *argv[])
     return -1;
   }

-  origin_addr = 0x400; /* this is the first address available after the bootloader */
-  crc32 = 0xFFFFFFFF;
-
   if (blob->address < origin_addr)
   {
     printf("ERROR: provided ELF intrudes into bootloader space; DFU cannot be created\n");
     return -1;
   }

+  if (blob->address != origin_addr)
+  {
+    printf("ERROR: provided ELF must start at 0x%x; DFU cannot be created\n", origin_addr);
+    return -1;
+  }
+
   dfufp = fopen(argv[2], "wb");
   if (!dfufp)
   {
     printf("ERROR: unable to open file <%s> for writing\n", argv[2]);
     return -1;
   }

+  blob = pm_list; phy_addr = origin_addr;
+
   while (blob)
   {
-    phy_addr = blob->address;
+    stuff_size = blob->address - phy_addr;

-    while (origin_addr < phy_addr)
+    if (stuff_size)
     {
-      memset(scratchpad, 0xFF, sizeof(scratchpad));
-      stuff_size = phy_addr - origin_addr;
-      if (stuff_size > sizeof(scratchpad))
-        stuff_size = sizeof(scratchpad);
-      crc32 = crc32_calc(crc32, scratchpad, stuff_size);
-      fwrite(scratchpad, stuff_size, 1, dfufp);
-      origin_addr += stuff_size;
+      /* there is a gap, so create a blank region here */
+      find_blob(phy_addr, stuff_size, &pm_list);
     }

-    crc32 = crc32_calc(crc32, blob->data, blob->count);
-    fwrite(blob->data, blob->count, 1, dfufp);
-    origin_addr += blob->count;
+    /* figure the best-case starting point of the next blob */
+    phy_addr = blob->address + blob->count;
+
+    /* advance to next blob */
     blob = blob->next;
   }

+  max_offset = phy_addr - origin_addr;
+
+  /* check if program image ends on a 32-bit boundary */
+  if (phy_addr & 0x3)
+  {
+    stuff_size = 4 - (phy_addr & 0x3);
+    max_offset += stuff_size;
+    /* create blank region at end to pad out to whole 32-bits (as the CRC will require this) */
+    find_blob(phy_addr, stuff_size, &pm_list);
+  }
+
+  /*
+  The upcoming CRC calculations are artificially complicated with using blobs, so we
+  take the approach of moving all the blobs into a malloc-ed linear region of memory.
+  */
+
+  binary = malloc(max_offset);
+
+  while (pm_list)
+  {
+    blob = pm_list;
+    memcpy(binary + blob->address - origin_addr, blob->data, blob->count);
+    pm_list = pm_list->next;
+    free(blob);
+  }
+
+  /*
+  as a sanity check, we check what the value is of the vector entry we over-write with the length
+  */
+
+  stuff_size = (uint32_t)binary[app_len_offset + 3] << 24;
+  stuff_size |= (uint32_t)binary[app_len_offset + 2] << 16;
+  stuff_size |= (uint32_t)binary[app_len_offset + 1] << 8;
+  stuff_size |= (uint32_t)binary[app_len_offset + 0] << 0;
+
+  if (stuff_size)
+  {
+    printf("WARNING: overwriting 0x%x at 0x%x with length value (%d)\n", stuff_size, origin_addr + app_len_offset, max_offset);
+  }
+
+  /* store app length within application itself */
+  binary[app_len_offset + 0] = (uint8_t)(max_offset >> 0);
+  binary[app_len_offset + 1] = (uint8_t)(max_offset >> 8);
+  binary[app_len_offset + 2] = (uint8_t)(max_offset >> 16);
+  binary[app_len_offset + 3] = (uint8_t)(max_offset >> 24);
+
+  /*
+  perform the calculation to determine what 32-bit value to write
+  in order to cause the overall CRC32 to calculate to be zero
+  */
+
+  pre_crc = crc32_calc(0xFFFFFFFF /* starting CRC value */, binary, app_crc_offset);
+  post_crc = reverse_crc32_calc(0 /* desired end value */, binary + app_crc_offset + 4, max_offset - (app_crc_offset + 4));
+  span = calc_span(pre_crc, post_crc);
+
+  /*
+  as a sanity check, we check what the value is of the vector entry we over-write with the CRC32
+  */
+
+  stuff_size = (uint32_t)binary[app_crc_offset + 3] << 24;
+  stuff_size |= (uint32_t)binary[app_crc_offset + 2] << 16;
+  stuff_size |= (uint32_t)binary[app_crc_offset + 1] << 8;
+  stuff_size |= (uint32_t)binary[app_crc_offset + 0] << 0;
+
+  if (stuff_size)
+  {
+    printf("WARNING: overwriting 0x%x at 0x%x with CRC value (%d)\n", stuff_size, origin_addr + app_crc_offset, span);
+  }
+
+  /* store app CRC within application itself */
+  binary[app_crc_offset + 0] = (uint8_t)(span >> 0);
+  binary[app_crc_offset + 1] = (uint8_t)(span >> 8);
+  binary[app_crc_offset + 2] = (uint8_t)(span >> 16);
+  binary[app_crc_offset + 3] = (uint8_t)(span >> 24);
+
+  stuff_size = crc32_calc(0xFFFFFFFF, binary, max_offset);
+
+  if (stuff_size)
+  {
+    printf("ERROR: something has gone wrong with the CRC calculation; value was 0x%x\n", stuff_size);
+    return -1;
+  }
+
+  /* start the DFU CRC32 calculation */
+  dfu_crc32 = crc32_calc(0xFFFFFFFF, binary, max_offset);
+
+  /* write the tweaked image into the DFU file */
+  fwrite(binary, max_offset, 1, dfufp);
+
+  /* free remaining malloc-ed memory */
+  free(binary);
+
   /*
   append DFU standard suffix
   */
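The pre_crc / post_crc / span sequence above can be hard to follow inside a diff, so here is a compact, self-contained sketch of the same idea: run the CRC forward up to the patch location, run it backward from the desired final value (zero) to just after the patch, then solve for the four bytes that join the two. It is not the tool's code. The tables are generated at run time instead of being stored as constants, and the names `build_tables`, `crc_forward`, `crc_backward` and `patch_for_zero_crc` are hypothetical. It assumes the conventional reflected CRC-32 polynomial 0xEDB88320 with no final inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t table[256];  /* forward CRC-32 table */
static uint8_t inverse[256]; /* maps a table entry's top byte back to its index */

static void build_tables(void)
{
  for (uint32_t i = 0; i < 256; i++)
  {
    uint32_t crc = i;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    table[i] = crc;
    inverse[crc >> 24] = (uint8_t)i; /* top bytes are unique across the table */
  }
}

static uint32_t crc_forward(uint32_t crc, const uint8_t *p, size_t n)
{
  while (n--)
    crc = (crc >> 8) ^ table[(crc ^ *p++) & 0xFF];
  return crc;
}

/* undo crc_forward over p[0..n-1], walking from the last byte to the first */
static uint32_t crc_backward(uint32_t crc, const uint8_t *p, size_t n)
{
  while (n--)
  {
    uint8_t index = inverse[crc >> 24];
    crc = ((crc ^ table[index]) << 8) | (uint8_t)(index ^ p[n]);
  }
  return crc;
}

/* choose image[offset..offset+3] so that the CRC over the whole image is zero */
static void patch_for_zero_crc(uint8_t *image, size_t length, size_t offset)
{
  uint8_t index[4];

  assert(offset + 4 <= length);

  uint32_t before = crc_forward(0xFFFFFFFFu, image, offset);
  uint32_t after = crc_backward(0, image + offset + 4, length - offset - 4);

  /* recover the table indices used while the four patch bytes are consumed */
  for (int i = 3; i >= 0; i--)
  {
    index[i] = inverse[after >> 24];
    after = (after ^ table[index[i]]) << 8;
  }

  /* replay forward from the known register value, solving for each byte */
  for (int i = 0; i < 4; i++)
  {
    image[offset + i] = (uint8_t)(before ^ index[i]);
    before = (before >> 8) ^ table[index[i]];
  }
}
```

Calling `build_tables()` once and then `patch_for_zero_crc(image, length, 0x14)` would mirror the role of app_crc_offset above. Patching in the middle of the image, rather than appending the CRC, is what lets the value live in a reserved vector entry.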
@@ -474,12 +637,12 @@ int main(int argc, char *argv[])
   scratchpad[i++] = 16; // bLength

   /* the CRC-32 has now been calculated over the entire file, save for the CRC field itself */
-  crc32 = crc32_calc(crc32, scratchpad, i);
+  dfu_crc32 = crc32_calc(dfu_crc32, scratchpad, i);

-  scratchpad[i++] = (uint8_t)(crc32 >> 0);
-  scratchpad[i++] = (uint8_t)(crc32 >> 8);
-  scratchpad[i++] = (uint8_t)(crc32 >> 16);
-  scratchpad[i++] = (uint8_t)(crc32 >> 24);
+  scratchpad[i++] = (uint8_t)(dfu_crc32 >> 0);
+  scratchpad[i++] = (uint8_t)(dfu_crc32 >> 8);
+  scratchpad[i++] = (uint8_t)(dfu_crc32 >> 16);
+  scratchpad[i++] = (uint8_t)(dfu_crc32 >> 24);

   fwrite(scratchpad, i, 1, dfufp);
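Only the tail of the suffix code is visible in this hunk, so the sketch below fills in the rest from the DFU 1.1 file suffix layout rather than from the tool itself: bcdDevice, idProduct, idVendor, bcdDFU, the signature bytes 'U' 'F' 'D', bLength, then the CRC, all little-endian. The names `crc32_raw` and `make_dfu_suffix` are hypothetical, and bcd_device is left as a parameter because the value the tool writes is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* bitwise CRC-32 (reflected polynomial 0xEDB88320), no final inversion */
static uint32_t crc32_raw(uint32_t crc, const uint8_t *data, size_t length)
{
  while (length--)
  {
    crc ^= *data++;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc;
}

/* Build the 16-byte suffix for a payload whose running CRC (seeded with
   0xFFFFFFFF) is payload_crc. Field order is an assumption based on the
   DFU 1.1 specification, not copied from dx1elf2dfu.c. */
static void make_dfu_suffix(uint8_t suffix[16], uint32_t payload_crc,
                            uint16_t vendor, uint16_t product, uint16_t bcd_device)
{
  size_t i = 0;

  suffix[i++] = (uint8_t)(bcd_device >> 0); /* bcdDevice */
  suffix[i++] = (uint8_t)(bcd_device >> 8);
  suffix[i++] = (uint8_t)(product >> 0);    /* idProduct */
  suffix[i++] = (uint8_t)(product >> 8);
  suffix[i++] = (uint8_t)(vendor >> 0);     /* idVendor */
  suffix[i++] = (uint8_t)(vendor >> 8);
  suffix[i++] = 0x00;                       /* bcdDFU = 0x0100 */
  suffix[i++] = 0x01;
  suffix[i++] = 'U';                        /* ucDfuSignature */
  suffix[i++] = 'F';
  suffix[i++] = 'D';
  suffix[i++] = 16;                         /* bLength */

  /* the CRC covers the whole file except the CRC field itself */
  payload_crc = crc32_raw(payload_crc, suffix, i);

  suffix[i++] = (uint8_t)(payload_crc >> 0); /* dwCRC */
  suffix[i++] = (uint8_t)(payload_crc >> 8);
  suffix[i++] = (uint8_t)(payload_crc >> 16);
  suffix[i++] = (uint8_t)(payload_crc >> 24);
}
```

This suffix CRC is separate from the CRC embedded in the application: the former protects the .dfu file on the host side, while the latter is what the bootloader checks in flash.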

usb_descriptors.c

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ usb_device_descriptor_t usb_device_descriptor __attribute__ ((aligned (4))) = /*
   .bMaxPacketSize0 = 64,
   .idVendor = 0x1209,
   .idProduct = 0x2003,
-  .bcdDevice = 0x0101,
+  .bcdDevice = 0x0102,

   .iManufacturer = USB_STR_ZERO,
   .iProduct = USB_STR_ZERO,
