224 changes: 224 additions & 0 deletions bip.mediawiki
@@ -0,0 +1,224 @@
<pre>
BIP: ?
Layer: Applications
Title: Formosa --- Themed mnemonic sentences for generating deterministic keys
Review comment (Member): Title is limited to 50 characters

Author: Yuri S Villas Boas <yuri@t3infosecurity.com>
André Fidencio Gonçalves <andre7c4@gmail.com>
Comments-Summary: No comments yet.
Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-formosa
Status: Draft
Type: Standards Track
Created: 2021-12-10
License: BSD-2-Clause
Requires: BIP-0032, BIP-0039
Post-History: https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management
Review comment (Member) on lines +5 to +14: As we are now following BIP3 for the BIP Process, the Preamble is formatted slightly differently.

Suggested change:
Authors: Yuri S Villas Boas <yuri@t3infosecurity.com>
André Fidencio Gonçalves <andre7c4@gmail.com>
Status: Draft
Type: Specification
Assigned: ?
License: BSD-2-Clause
Requires: 32, 39
Discussion: https://gnusha.org/pi/bitcoindev/jQqInjh7VTC5byefTzENidJjigvRqf5Y7UvbrWjKPJykvhdlLETeglGE3zoAiVAxUyAXU8uWHsHEjJ0MHqqPTy4prgaIhgMyIrD9c6ZUuE0=@pm.me/#t
https://gnusha.org/pi/bitcoindev/F4cs-RJRQYBXhjoS9fc_cUc93yLrkQS5DNQAeFRHrLEQ5bScCjKSnaqN-IcXb16fxqO053muqFCx8_GzzKN5XCGCIHD9Ir1_baI5voKYfOo=@pm.me/
https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management

</pre>

==Abstract==

This BIP describes an extension of BIP-0039 for the generation of deterministic
wallets. Where BIP-0039 uses a flat list of unrelated words, Formosa organizes
mnemonic words into themed sentences with syntactic structure and semantic
coherence, substantially improving memorability while retaining all properties
of the original scheme.

It consists of two parts: generating the mnemonic and converting it into a
binary seed. This seed can later be used to generate deterministic wallets using
BIP-0032 or similar methods.

Full forward and backward compatibility with BIP-0039 is maintained: seed
derivation internally converts any Formosa mnemonic back to its equivalent
BIP-0039 representation, so existing keys and addresses are preserved.

==Copyright==

This BIP is licensed under the BSD 2-clause license.

==Motivation==

A mnemonic code or sentence is superior for human interaction compared to the
handling of raw binary or hexadecimal representations of a wallet seed. The
sentence could be written on paper or spoken over the telephone.

However, human memory is an associative process: information is more readily
retained when it can be linked to existing knowledge through semantic
associations, visual imagery, and narrative context. A BIP-0039 mnemonic is a
sequence of unrelated words with no syntactic or semantic relationship, making
it difficult to form the mental associations that aid long-term retention.

Formosa builds upon BIP-0039 by organizing mnemonic words into themed sentences
with syntactic roles (e.g., subject, adjective, object, location). Each sentence
draws vocabulary from a coherent semantic domain --- medieval fantasy, science
fiction, nature, finance, or any custom theme --- enabling the user to form vivid
mental images that reduce memorization effort per bit of entropy.

Formosa is meant to be a way to transport computer-generated randomness in
a human-readable transcription. It is not a way to process user-created
sentences (also known as brainwallets) into a wallet seed.

==Generating the mnemonic==

The mnemonic must encode entropy in a multiple of 32 bits. More entropy
improves security but increases the sentence length. We refer to the
initial entropy length as ENT. The allowed size of ENT is 128-256 bits.

First, an initial entropy of ENT bits is generated. A checksum is generated by
taking the first <code>ENT / 32</code> bits of its SHA256 hash. This checksum is
appended to the end of the initial entropy. Next, these concatenated bits
are split into groups of 33 bits, which we call '''sentences'''. Each sentence is
further subdivided into variable-length bit fields, one per syntactic category,
whose lengths are defined by the active theme. Each bit field encodes an index
into the corresponding category's word list. Finally, we convert these indices
into words and use the joined words as a mnemonic sentence.

BIP-0039 is a special case where each sentence contains three 11-bit fields
indexing a single 2048-word list (3 x 11 = 33).
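
The checksum and splitting steps above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation; the function names <code>entropy_to_sentence_bits</code> and <code>bip39_indices</code> are hypothetical.

```python
import hashlib


def entropy_to_sentence_bits(entropy: bytes) -> list[str]:
    """Append the checksum to the entropy and split the result into 33-bit groups."""
    ent = len(entropy) * 8
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32  # checksum length in bits
    entropy_bits = format(int.from_bytes(entropy, "big"), f"0{ent}b")
    digest = hashlib.sha256(entropy).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    bits = entropy_bits + digest_bits[:cs]  # ENT + CS bits in total
    return [bits[i:i + 33] for i in range(0, len(bits), 33)]


def bip39_indices(sentence_bits: str) -> list[int]:
    """BIP-0039 special case: three 11-bit indices per 33-bit sentence."""
    return [int(sentence_bits[i:i + 11], 2) for i in range(0, 33, 11)]


sentences = entropy_to_sentence_bits(bytes(16))  # 128 bits of zero entropy
print(len(sentences), [bip39_indices(s) for s in sentences])
```

A themed encoding differs only in how each 33-bit group is cut into fields; the bit stream itself is the same as in BIP-0039.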

The following table describes the relation between the initial entropy
length (ENT), the checksum length (CS), the number of 33-bit sentences (S),
and the length of the generated mnemonic sentence (MS) in words. The word
count assumes a 6-word theme; for BIP-0039 (3 words per sentence), divide by 2.

<pre>
CS = ENT / 32
S = (ENT + CS) / 33

|  ENT  | CS | ENT+CS |  S  | MS (6-word) | MS (BIP-0039) |
+-------+----+--------+-----+-------------+---------------+
|   128 |  4 |    132 |   4 |          24 |            12 |
|   160 |  5 |    165 |   5 |          30 |            15 |
|   192 |  6 |    198 |   6 |          36 |            18 |
|   224 |  7 |    231 |   7 |          42 |            21 |
|   256 |  8 |    264 |   8 |          48 |            24 |
</pre>
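
The table rows follow directly from the two formulas. A short Python sketch, with the hypothetical helper <code>mnemonic_sizes</code>, reproduces them for any number of words per sentence:

```python
def mnemonic_sizes(ent: int, words_per_sentence: int = 6) -> tuple[int, int, int]:
    """Return (CS, S, MS) for an entropy length ENT given in bits."""
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32          # checksum length
    s = (ent + cs) // 33    # number of 33-bit sentences
    return cs, s, s * words_per_sentence


for ent in range(128, 257, 32):
    cs, s, ms_six = mnemonic_sizes(ent)
    _, _, ms_bip39 = mnemonic_sizes(ent, words_per_sentence=3)
    print(ent, cs, ent + cs, s, ms_six, ms_bip39)
```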

For each 33-bit sentence, the word selection algorithm proceeds as follows:

# Initialize an empty sentence array with one slot per category.
# For each category in the theme's ''filling order'':
## Extract <code>BIT_LENGTH</code> bits from the current position in the bit stream.
## Interpret them as an unsigned integer index.
## If the category is ''led by'' another category, look up the appropriate sub-list from the leading category's mapping using the already-selected leading word. Otherwise, use the category's total word list.
## Select the word at the computed index from the resolved word list.
## Place the word into the sentence array at the position given by the theme's ''natural order''.
# Output the words in natural order.
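
The steps above can be sketched in Python. Everything about the theme layout here is an assumption made for illustration: the dictionary keys (<code>filling_order</code>, <code>natural_order</code>, <code>bit_length</code>, <code>led_by</code>, <code>total_list</code>, <code>mapping</code>), the category names, and the synthetic word lists are hypothetical and are not taken from the reference implementation's theme files.

```python
def words(prefix: str, bits: int) -> list[str]:
    """Synthetic placeholder word list with 2**bits entries."""
    return [f"{prefix}{i:02d}" for i in range(2 ** bits)]


# Hypothetical 6-category theme; the bit-widths sum to 33.
BITS = {"VERB": 5, "SUBJECT": 6, "OBJECT": 6, "ADJECTIVE": 5, "WILDCARD": 6, "PLACE": 5}
THEME = {
    "filling_order": ["VERB", "SUBJECT", "OBJECT", "ADJECTIVE", "WILDCARD", "PLACE"],
    "natural_order": ["SUBJECT", "ADJECTIVE", "VERB", "OBJECT", "WILDCARD", "PLACE"],
    "categories": {
        name: {"bit_length": bits, "led_by": None, "total_list": words(name.lower(), bits)}
        for name, bits in BITS.items()
    },
}
# SUBJECT is led by VERB: each verb maps to its own 64-word subject sub-list.
THEME["categories"]["SUBJECT"]["led_by"] = "VERB"
THEME["categories"]["VERB"]["mapping"] = {
    "SUBJECT": {
        verb: words(f"{verb}-subject", 6)
        for verb in THEME["categories"]["VERB"]["total_list"]
    }
}


def select_words(sentence_bits: str, theme: dict) -> list[str]:
    """Turn one 33-bit sentence into words, following the steps listed above."""
    if len(sentence_bits) != 33:
        raise ValueError("a sentence must be exactly 33 bits")
    chosen = {}
    position = 0
    for category in theme["filling_order"]:
        spec = theme["categories"][category]
        # Extract BIT_LENGTH bits and read them as an unsigned index.
        index = int(sentence_bits[position:position + spec["bit_length"]], 2)
        position += spec["bit_length"]
        leader = spec["led_by"]
        if leader is not None:
            # Resolve the sub-list through the already-selected leading word.
            word_list = theme["categories"][leader]["mapping"][category][chosen[leader]]
        else:
            word_list = spec["total_list"]
        chosen[category] = word_list[index]
    # Output the words in natural order.
    return [chosen[category] for category in theme["natural_order"]]


print(select_words("00001" "000010" "000011" "00100" "000101" "00110", THEME))
```

Note that a leading category has to appear before the categories it leads in the filling order; otherwise the leading word is not yet known when the sub-list is resolved.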

==Themes==

The Formosa equivalent to a BIP-0039 wordlist is a '''theme'''. A theme is a JSON
document that defines syntactic categories, their word lists, bit-widths, and
optional semantic restrictions between categories. The sum of all category
bit-widths in a theme MUST equal 33.
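
As an illustration of the 33-bit rule, the following Python sketch validates a theme document. The JSON layout and key names are hypothetical placeholders, not the schema used by the reference implementation; word lists are omitted for brevity.

```python
import json

# Hypothetical theme skeleton: four 7-bit categories and one 5-bit category.
THEME_JSON = """
{
  "categories": {
    "NATIONALITY_1": {"bit_length": 7},
    "NATIONALITY_2": {"bit_length": 7},
    "NATIONALITY_3": {"bit_length": 7},
    "NATIONALITY_4": {"bit_length": 7},
    "PROFESSION":    {"bit_length": 5}
  }
}
"""


def check_bit_widths(theme: dict) -> None:
    """Raise ValueError unless the category bit-widths sum to exactly 33."""
    total = sum(spec["bit_length"] for spec in theme["categories"].values())
    if total != 33:
        raise ValueError(f"category bit-widths sum to {total}, expected 33")


theme = json.loads(THEME_JSON)
check_bit_widths(theme)  # passes silently: 4 * 7 + 5 = 33
```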

An ideal theme has the following characteristics:

a) specific semantic scope (memory block)
- the entire vocabulary should adhere to a single coherent topic, enabling
the user to form a unified mental scene

b) concrete imagery
- categories should consist of elements easily associated with mental images.
Prefer concrete nouns and tangible adjectives over abstract terms

c) sorted wordlists
- the wordlist is sorted, which allows for more efficient lookup of the code words
(i.e. implementations can use binary search instead of linear search)

d) first-letters uniqueness
- the wordlist is created in such a way that it's enough to type the first two
letters to unambiguously identify the word

The first-letters uniqueness property yields higher information density than
BIP-0039. In BIP-0039, four characters are needed to identify each word,
encoding 11 bits per 4 characters = 2.75 bits/character. In Formosa, two
characters suffice per word. The achievable density depends on the theme's
category bit-widths:

<pre>
| List size | Bits | Chars to identify | Density (bits/char) |
+-----------+------+-------------------+---------------------+
|      2048 |   11 |                 4 | 2.75 (BIP-0039)     |
|        32 |    5 |                 2 | 2.50                |
|        64 |    6 |                 2 | 3.00                |
|       128 |    7 |                 2 | 3.50                |

As an example, the ''nationalities'' theme uses four 7-bit nationality
categories (128 entries each) and one 5-bit profession category (32 entries),
yielding 33 bits per 5-word sentence. A user typing only the first two
characters of each word types 10 characters to encode 33 bits, achieving an
information density of 33 / 10 = 3.30 bits/character --- a 20% improvement
over BIP-0039's 2.75 bits/character.

e) semantic restrictions (optional)
- themes may define restrictions between categories so that the available word list
for one category changes depending on the word selected in a leading category,
producing more semantically coherent sentences. Restriction relationships MUST
be acyclic
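
Three of the properties above lend themselves to mechanical checks. The Python sketch below uses hypothetical helper names and assumes each category is led by at most one other category, represented as a plain <code>led_by</code> dictionary; it is an illustration rather than the reference implementation's validation logic.

```python
def has_unique_prefixes(word_list: list, length: int = 2) -> bool:
    """True if the first `length` letters identify every word unambiguously."""
    prefixes = [word[:length] for word in word_list]
    return len(set(prefixes)) == len(prefixes)


def density(bit_widths: list, chars_per_word: int = 2) -> float:
    """Bits encoded per typed character for one sentence."""
    return sum(bit_widths) / (chars_per_word * len(bit_widths))


def is_acyclic(led_by: dict) -> bool:
    """True if following the 'led by' links never revisits a category."""
    for start in led_by:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return False
            seen.add(node)
            node = led_by.get(node)
    return True


print(density([7, 7, 7, 7, 5]))         # nationalities example: 33 bits / 10 chars
print(density([11], chars_per_word=4))  # BIP-0039: 11 bits / 4 chars
print(is_acyclic({"VERB": None, "SUBJECT": "VERB"}))
```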

The wordlist can contain native characters, but they must be encoded in UTF-8
using Normalization Form Compatibility Decomposition (NFKD).
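
In Python, NFKD normalization is available through the standard <code>unicodedata</code> module. A minimal sketch (the function name <code>normalize_word</code> is hypothetical):

```python
import unicodedata


def normalize_word(word: str) -> bytes:
    """Return the UTF-8 encoding of the NFKD-normalized word."""
    return unicodedata.normalize("NFKD", word).encode("utf-8")


# A precomposed and a decomposed spelling end up as the same byte sequence.
print(normalize_word("caf\u00e9") == normalize_word("cafe\u0301"))
```

Normalizing every word before comparison ensures that visually identical words entered with different Unicode compositions are treated as the same word.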

==From mnemonic to seed==

A user may decide to protect their mnemonic with a passphrase. If a passphrase is not
present, an empty string "" is used instead.

To ensure forward and backward compatibility with BIP-0039, seed derivation first
converts any Formosa mnemonic back to its equivalent BIP-0039 mnemonic by extracting
the underlying entropy and re-encoding it using the BIP-0039 English word list. This
guarantees that the same entropy always produces the same seed, keys, and addresses
regardless of which theme was used.
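
Because a Formosa sentence and three BIP-0039 words encode the same 33 bits, the conversion amounts to regrouping the bit stream. The Python sketch below shows this for one sentence; it assumes the word indices have already been looked up in the theme and are supplied in filling order, and the function name is hypothetical. Mapping the resulting indices to words requires the BIP-0039 English word list, which is not reproduced here.

```python
def fields_to_bip39_indices(fields: list) -> list:
    """Regroup one sentence's (index, bit_length) pairs into three 11-bit indices.

    `fields` must be given in the theme's filling order, since that is the
    order in which the bit fields were read from the bit stream.
    """
    for index, width in fields:
        if not 0 <= index < 2 ** width:
            raise ValueError(f"index {index} does not fit in {width} bits")
    bits = "".join(format(index, f"0{width}b") for index, width in fields)
    if len(bits) != 33:
        raise ValueError("a sentence must contain exactly 33 bits")
    return [int(bits[i:i + 11], 2) for i in range(0, 33, 11)]


# Six fields with widths 5, 6, 6, 5, 6, 5 from a hypothetical 6-word theme.
print(fields_to_bip39_indices([(1, 5), (2, 6), (3, 6), (4, 5), (5, 6), (6, 5)]))
```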

To create a binary seed from the resulting BIP-0039 mnemonic, we use the PBKDF2 function
with the mnemonic sentence (in UTF-8 NFKD) as the password and the string "mnemonic" +
passphrase (again in UTF-8 NFKD) as the salt. The iteration count is set to 2048 and
HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512
bits (= 64 bytes).
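
With the parameters stated above, the derivation can be sketched using Python's standard library. The function name <code>mnemonic_to_seed</code> is hypothetical, and the input is assumed to already be the BIP-0039 form of the mnemonic.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(bip39_mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed with PBKDF2-HMAC-SHA512 and 2048 iterations."""
    password = unicodedata.normalize("NFKD", bip39_mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


mnemonic = " ".join(["abandon"] * 11 + ["about"])  # BIP-0039 form of all-zero entropy
print(mnemonic_to_seed(mnemonic, "TREZOR").hex())
```

Every passphrase, including the empty one, yields a different but equally valid seed.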

This seed can later be used to generate deterministic wallets using BIP-0032 or
similar methods.

The conversion of the BIP-0039 mnemonic to a binary seed is completely independent
of how the sentence was generated. This results in rather simple code; the seed
derivation places no constraints on sentence structure, and clients are free to
implement their own themes or even whole sentence generators, allowing for
flexibility in wordlists for typo detection or other purposes.

Although it is possible to use a mnemonic not generated by the algorithm described in
the "Generating the mnemonic" section, this is not advised. Software must compute the
checksum for the mnemonic sentence using a wordlist and issue a warning if it is
invalid.
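
A checksum check of this kind can be sketched as follows. The sketch assumes the full ENT+CS bit string has already been recovered from the word indices; the function name <code>checksum_is_valid</code> is hypothetical.

```python
import hashlib


def checksum_is_valid(bits: str) -> bool:
    """Check the trailing CS bits of an ENT+CS bit string against SHA256(entropy)."""
    if len(bits) % 33 != 0 or not 132 <= len(bits) <= 264:
        return False
    cs = len(bits) // 33   # one checksum bit per 33-bit sentence
    ent = len(bits) - cs
    entropy = int(bits[:ent], 2).to_bytes(ent // 8, "big")
    digest = hashlib.sha256(entropy).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    return bits[ent:] == digest_bits[:cs]


print(checksum_is_valid("0" * 128 + "0011"))  # all-zero entropy with its checksum
print(checksum_is_valid("0" * 132))           # same entropy, wrong checksum
```

A wallet would issue the warning whenever this check returns <code>False</code>.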

The described method also provides plausible deniability, because every passphrase
generates a valid seed (and thus a deterministic wallet) but only the correct one
will make the desired wallet available.

==Standard themes==

The reference implementation ships with standard themes listed at the link below.
Since BIP-0039 is a valid Formosa theme, all existing BIP-0039 mnemonics work
without modification.

It is '''strongly discouraged''' to use non-standard custom themes for generating
mnemonic sentences, as the user assumes responsibility for ensuring the theme file
remains available and structurally valid. Users with proper training in security
protocols who understand these risks may benefit from custom themes through higher
memorization efficiency or an additional layer of obscurity.

* [[https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes|Standard Formosa Themes]]

==Test vectors==

The test vectors include input entropy, mnemonic and seed. The
passphrase "TREZOR" is used for all vectors. Since Formosa converts back to
BIP-0039 before seed derivation, the same test vectors apply to all themes
given the same underlying entropy.

https://github.com/Yuri-SVB/formosa/blob/master/vectors.json

==Reference Implementation==

The reference implementation, including themes, is available from:

https://github.com/Yuri-SVB/formosa