Skip to content

Commit 053a91d

Browse files
authored
Merge pull request #11532 from prestwich/patch-1
fix: address ambiguities in RLP spec description
2 parents 03d73db + 4e76922 commit 053a91d

File tree

1 file changed

+10
-6
lines changed
  • public/content/developers/docs/data-structures-and-encoding/rlp

1 file changed

+10
-6
lines changed

public/content/developers/docs/data-structures-and-encoding/rlp/index.md

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ lang: en
55
sidebarDepth: 2
66
---
77

8-
Recursive Length Prefix (RLP) serialization is used extensively in Ethereum's execution clients. RLP standardizes the transfer of data between nodes in a space-efficient format. The purpose of RLP is to encode arbitrarily nested arrays of binary data, and RLP is the primary encoding method used to serialize objects in Ethereum's execution layer. The only purpose of RLP is to encode structure; encoding specific data types (e.g. strings, floats) is left up to higher-order protocols; but positive RLP integers must be represented in big-endian binary form with no leading zeroes (thus making the integer value zero equivalent to the empty byte array). Deserialized positive integers with leading zeroes get treated as invalid. The integer representation of string length must also be encoded this way, as well as integers in the payload.
8+
Recursive Length Prefix (RLP) serialization is used extensively in Ethereum's execution clients. RLP standardizes the transfer of data between nodes in a space-efficient format. The purpose of RLP is to encode arbitrarily nested arrays of binary data, and RLP is the primary encoding method used to serialize objects in Ethereum's execution layer. The main purpose of RLP is to encode structure; with the exception of positive integers, RLP delegates encoding specific data types (e.g. strings, floats) to higher-order protocols. Positive integers must be represented in big-endian binary form with no leading zeroes (thus making the integer value zero equivalent to the empty byte array). Deserialized positive integers with leading zeroes must be treated as invalid by any higher-order protocol using RLP.
99

1010
More information in [the Ethereum yellow paper (Appendix B)](https://ethereum.github.io/yellowpaper/paper.pdf#page=19).
1111

@@ -20,21 +20,25 @@ The RLP encoding function takes in an item. An item is defined as follows:
2020

2121
- a string (i.e. byte array) is an item
2222
- a list of items is an item
23+
- a positive integer is an item
2324

2425
For example, all of the following are items:
2526

2627
- an empty string;
2728
- the string containing the word "cat";
2829
- a list containing any number of strings;
2930
- and a more complex data structures like `["cat", ["puppy", "cow"], "horse", [[]], "pig", [""], "sheep"]`.
31+
- the number `100`
3032

31-
Note that in the context of the rest of this page, 'string' means "a certain number of bytes of binary data"; no special encodings are used, and no knowledge about the content of the strings is implied.
33+
Note that in the context of the rest of this page, 'string' means "a certain number of bytes of binary data"; no special encodings are used, and no knowledge about the content of the strings is implied (except as required by the rule against non-minimal positive integers).
3234

3335
RLP encoding is defined as follows:
3436

37+
- For a positive integer, it is converted to the the shortest byte array whose big-endian interpretation is the integer, and then encoded as a string according to the rules below.
3538
- For a single byte whose value is in the `[0x00, 0x7f]` (decimal `[0, 127]`) range, that byte is its own RLP encoding.
3639
- Otherwise, if a string is 0-55 bytes long, the RLP encoding consists of a single byte with value **0x80** (dec. 128) plus the length of the string followed by the string. The range of the first byte is thus `[0x80, 0xb7]` (dec. `[128, 183]`).
3740
- If a string is more than 55 bytes long, the RLP encoding consists of a single byte with value **0xb7** (dec. 183) plus the length in bytes of the length of the string in binary form, followed by the length of the string, followed by the string. For example, a 1024 byte long string would be encoded as `\xb9\x04\x00` (dec. `185, 4, 0`) followed by the string. Here, `0xb9` (183 + 2 = 185) as the first byte, followed by the 2 bytes `0x0400` (dec. 1024) that denote the length of the actual string. The range of the first byte is thus `[0xb8, 0xbf]` (dec. `[184, 191]`).
41+
- If a string is 2^64 bytes long, or longer, it may not be encoded.
3842
- If the total payload of a list (i.e. the combined length of all its items being RLP encoded) is 0-55 bytes long, the RLP encoding consists of a single byte with value **0xc0** plus the length of the payload followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus `[0xc0, 0xf7]` (dec. `[192, 247]`).
3943
- If the total payload of a list is more than 55 bytes long, the RLP encoding consists of a single byte with value **0xf7** plus the length in bytes of the length of the payload in binary form, followed by the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus `[0xf8, 0xff]` (dec. `[248, 255]`).
4044

@@ -73,9 +77,9 @@ def to_binary(x):
7377
- the empty string ('null') = `[ 0x80 ]`
7478
- the empty list = `[ 0xc0 ]`
7579
- the integer 0 = `[ 0x80 ]`
76-
- the encoded integer 0 ('\\x00') = `[ 0x00 ]`
77-
- the encoded integer 15 ('\\x0f') = `[ 0x0f ]`
78-
- the encoded integer 1024 ('\\x04\\x00') = `[ 0x82, 0x04, 0x00 ]`
80+
- the byte '\\x00' = `[ 0x00 ]`
81+
- the byte '\\x0f' = `[ 0x0f ]`
82+
- the bytes '\\x04\\x00' = `[ 0x82, 0x04, 0x00 ]`
7983
- the [set theoretical representation](http://en.wikipedia.org/wiki/Set-theoretic_definition_of_natural_numbers) of three, `[ [], [[]], [ [], [[]] ] ] = [ 0xc7, 0xc0, 0xc1, 0xc0, 0xc3, 0xc0, 0xc1, 0xc0 ]`
8084
- the string "Lorem ipsum dolor sit amet, consectetur adipisicing elit" = `[ 0xb8, 0x38, 'L', 'o', 'r', 'e', 'm', ' ', ... , 'e', 'l', 'i', 't' ]`
8185

@@ -85,7 +89,7 @@ According to the rules and process of RLP encoding, the input of RLP decode is r
8589

8690
1. according to the first byte (i.e. prefix) of input data and decoding the data type, the length of the actual data and offset;
8791

88-
2. according to the type and offset of data, decode the data correspondingly;
92+
2. according to the type and offset of data, decode the data correspondingly, respecting the minimal encoding rule for positive integers;
8993

9094
3. continue to decode the rest of the input;
9195

0 commit comments

Comments
 (0)