
Explicit support for embedded targets #253

@LoupVaillant

Description


I've been approached lately by @Laczen about possibly making an "embedded edition" of Monocypher, less focused on speed and more focused on reducing footprint (binary & stack sizes). One particular step in this direction is implementing Blake2s.

I have resisted Blake2s for a long time, most notably because of the redundancy with Blake2b (we already have SHA-512 for compatibility, I don't want to make things even worse), but I have to admit it makes a whole lot of sense for what is arguably our biggest user base.

Ideally we would be able to share code between Blake2b and Blake2s, and have Blake2s be supported by a flag on the current Blake2 functions. To be checked, but I believe such sharing is not actually possible, because the internal loops of Blake2b use 64-bit integers, while Blake2s uses 32-bit integers instead. So we should probably provide Blake2s separately, in a new optional compilation unit: src/optional/blake2s.h and src/optional/blake2s.c.
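To illustrate why a simple flag is awkward, here is a minimal sketch of the two mixing functions (G) as described in RFC 7693. This is not Monocypher's code, and the function names are hypothetical. The control flow is identical, but the word type and all four rotation constants differ, so sharing would likely need macros or code generation rather than a runtime switch.

```c
#include <stdint.h>

/* Rotate right; n must be in 1..width-1 (true for every constant below). */
static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Blake2b G: 64-bit words, rotation constants 32, 24, 16, 63 (RFC 7693). */
static void blake2b_g(uint64_t v[16], int a, int b, int c, int d,
                      uint64_t x, uint64_t y)
{
    v[a] = v[a] + v[b] + x;  v[d] = rotr64(v[d] ^ v[a], 32);
    v[c] = v[c] + v[d];      v[b] = rotr64(v[b] ^ v[c], 24);
    v[a] = v[a] + v[b] + y;  v[d] = rotr64(v[d] ^ v[a], 16);
    v[c] = v[c] + v[d];      v[b] = rotr64(v[b] ^ v[c], 63);
}

/* Blake2s G: same structure, but 32-bit words and rotations 16, 12, 8, 7. */
static void blake2s_g(uint32_t v[16], int a, int b, int c, int d,
                      uint32_t x, uint32_t y)
{
    v[a] = v[a] + v[b] + x;  v[d] = rotr32(v[d] ^ v[a], 16);
    v[c] = v[c] + v[d];      v[b] = rotr32(v[b] ^ v[c], 12);
    v[a] = v[a] + v[b] + y;  v[d] = rotr32(v[d] ^ v[a], 8);
    v[c] = v[c] + v[d];      v[b] = rotr32(v[b] ^ v[c], 7);
}
```

Beyond word size and rotations, the two also differ in round count (12 for Blake2b, 10 for Blake2s) and block size (128 vs 64 bytes), which is one more reason a separate compilation unit looks simpler than a flag.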

There's also Blake3 to consider, since it's 32-bit too, but takes advantage of vector instructions on bigger processors. I have two problems with it, though. First, it has tighter security margins, in accordance with Aumasson's Too Much Crypto. Not exactly unsafe, but perhaps a bit risky? Second, it would replace Blake2b, which would be a hard break, and so far that has never happened with Monocypher: we broke the API several times, but we never completely broke users' ability to keep their old wire formats (even for AEAD, they could use the basic primitives to achieve it again). Another problem with removing Blake2b would be the inability to implement Argon2. A variant would be easy, for sure, but we'd lose standard compliance. Oh, and I recall @fscoto warning me that Blake3 needs a bigger footprint than Blake2s, making it less ideal for small embedded targets.

Anyway, adding Blake2s would be a step towards the support of embedded targets.

The next step would be writing an actual embedded edition of Monocypher: something that sacrifices speed so it can be smaller (this makes sense for the smallest targets, which have limited program memory and teeny tiny stacks). The problem here is how far we should be willing to go, and that very much depends on the embedded market. Are 8-bit and 16-bit processors relevant? Would we even try to do cryptography on them? How about extremely low-powered 32-bit processors? Do they have enough memory that we should optimise for speed to reduce energy consumption? Or do they need smaller crypto code? What multipliers should our bignum arithmetic target? Are the current 32->64 bit multiplications okay, or should we go down to 16->32, or even 8->16 like C25519 does?
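To make the multiplier question concrete: a 32->64 bit product can be rebuilt from four 16->32 bit partial products, at the cost of extra multiplications and carry handling. The sketch below is purely illustrative; `mul32_via16` is a hypothetical name, not part of Monocypher. It still compiles on an ordinary C implementation, and only restricts the multiplications themselves to 16-bit operands.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32 -> 64 bit multiply built only from
 * 16x16 -> 32 bit partial products, as a target without a wide
 * multiplier would need. */
static uint64_t mul32_via16(uint32_t a, uint32_t b)
{
    /* Split operands into 16-bit halves. Keeping them as uint32_t avoids
     * promotion to a signed int, where 0xFFFF * 0xFFFF could overflow. */
    uint32_t a0 = a & 0xFFFF, a1 = a >> 16;
    uint32_t b0 = b & 0xFFFF, b1 = b >> 16;

    /* Four partial products, each fitting in 32 bits. */
    uint32_t p00 = a0 * b0;
    uint32_t p01 = a0 * b1;
    uint32_t p10 = a1 * b0;
    uint32_t p11 = a1 * b1;

    /* Middle terms have weight 2^16. Their sum may carry out of 32 bits;
     * that carry has weight 2^48, i.e. bit 16 of the high word. */
    uint32_t mid       = p01 + p10;
    uint32_t mid_carry = mid < p01;

    /* Low word: p00 plus the low half of mid shifted into place. */
    uint32_t lo       = p00 + (mid << 16);
    uint32_t lo_carry = lo < p00;

    /* High word: p11, the high half of mid, and both carries. */
    uint32_t hi = p11 + (mid >> 16) + (mid_carry << 16) + lo_carry;

    return ((uint64_t)hi << 32) | lo;
}
```

With schoolbook multiplication, each halving of the multiplier width quadruples the partial products per limb product: 4 for 16->32, and 16 for 8->16, which is the speed cost being weighed against footprint here.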

These are all hard questions, and I don't think I can answer them right now. What I can answer for sure is that 32-bit targets all prefer Blake2s over Blake2b. And since it looks like we have actual demand for it, we should probably provide it. Only problem being, well… adding a primitive. We want to stay focused and not turn ourselves into a kitchen sink. @fscoto, do you have some thoughts about it all?
