ec6cc56 docs: update README (Jose Celano)
68d9915 refactor: rename json::BencodeParser to json::Generator (Jose Celano)
a3c7c4b refactor: remove parent parser mod (Jose Celano)
3052d6a refactor: rename BencodeTOkenizer to Tokenizer (Jose Celano)
331c76e refactor: reorganize modules (Jose Celano)
9e0db6c refactor: remove writer from tokenizer string parser (Jose Celano)
0a05544 refactor: remove old int and str parsers with writers (Jose Celano)
75ffdb4 refactor: remove writer from tokenizer integer parser (Jose Celano)
77ad5af refactor: remove writer from main tokenizer (Jose Celano)
f6a0584 refactor: duplicate integer and strig parser before removing writer (Jose Celano)
3a7ea5d refactor: extract mod tokenizer (Jose Celano)
63b9b73 refactor: extract struct BencodeTokenizer (Jose Celano)
83eeefd refactor: extract bencode tokenizer (Jose Celano)
Pull request description:
This refactoring extracts the tokenizer from the current parser implementation. It splits the parsing logic into two types (sketched below):
- **Tokenizer**: returns bencoded tokens.
- **Generator**: iterates over the bencoded tokens to generate the JSON output.
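As an illustration, a minimal sketch of that split with hypothetical names and hard-wired data (not the crate's actual API):

```rust
// Hypothetical sketch of the Tokenizer/Generator split; names and signatures are illustrative only.

/// Tokens a tokenizer could emit while scanning bencoded input.
#[derive(Debug)]
enum Token {
    Integer(String), // e.g. "42" taken from "i42e"
    Str(Vec<u8>),    // raw bytes of a bencoded string, e.g. b"spam" from "4:spam"
}

/// Stage 1 (Tokenizer): turn raw bencoded bytes into tokens.
/// A real tokenizer reads from an `io::Read` and validates the input; this stub is hard-wired.
fn tokenize(_input: &[u8]) -> Vec<Token> {
    vec![Token::Str(b"spam".to_vec())]
}

/// Stage 2 (Generator): iterate over the tokens and emit JSON.
fn generate_json(tokens: &[Token]) -> String {
    let mut out = String::new();
    for token in tokens {
        match token {
            Token::Integer(digits) => out.push_str(digits),
            Token::Str(bytes) => {
                // A real generator checks UTF-8 validity and escapes the string properly.
                out.push('"');
                out.push_str(&String::from_utf8_lossy(bytes));
                out.push('"');
            }
        }
    }
    out
}

fn main() {
    println!("{}", generate_json(&tokenize(b"4:spam"))); // "spam"
}
```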
**NOTES**
- It keeps the hand-rolled iteration with an explicit stack for the time being (see the sketch below), instead of using real recursion as @da2ce7 did [here](#12 (comment)). That could be changed later if we think it improves readability and maintainability.
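For reference, a small hedged sketch (hypothetical types, not the crate's code) of how a generator can track nesting with an explicit `Vec` used as a stack instead of relying on call-stack frames:

```rust
// Hypothetical sketch, not the crate's code: nesting is tracked with an explicit Vec
// instead of recursive function calls.
enum Token {
    Int(i64),
    ListStart,
    End,
}

fn to_json(tokens: &[Token]) -> String {
    let mut out = String::new();
    let mut stack: Vec<usize> = Vec::new(); // items already written at each open nesting level
    for token in tokens {
        match token {
            Token::ListStart => {
                if let Some(count) = stack.last_mut() {
                    if *count > 0 { out.push(','); }
                    *count += 1;
                }
                out.push('[');
                stack.push(0);
            }
            Token::Int(value) => {
                if let Some(count) = stack.last_mut() {
                    if *count > 0 { out.push(','); }
                    *count += 1;
                }
                out.push_str(&value.to_string());
            }
            Token::End => {
                stack.pop();
                out.push(']');
            }
        }
    }
    out
}

fn main() {
    // Tokens for the bencoded value "lli1ei2eee", i.e. a list containing a list of two integers.
    use Token::{End, Int, ListStart};
    let tokens = [ListStart, ListStart, Int(1), Int(2), End, End];
    println!("{}", to_json(&tokens)); // [[1,2]]
}
```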
**SUBTASKS**
- [x] Separate logic for tokenizer.
- [x] Extract tokenizer.
- [x] Remove `Writer` from the tokenizer. It's not needed.
**PERFORMANCE**
In the current version, bencoded strings are cached in memory before being written to the output (because we need the whole string to check whether it is valid UTF-8). In this PR, bencoded integers are also cached in memory, because the whole integer value is a single token. This should not be a problem since integers are short, unlike strings.
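A tiny, simplified illustration of that buffering (stand-in data, not the crate's code):

```rust
// Simplified illustration: the string's bytes are collected first, then checked as UTF-8.
fn main() {
    // Raw bytes carried by the bencoded value "4:spam".
    let buffered: Vec<u8> = b"spam".to_vec();

    // The check is done on the complete value, which is why it is kept in memory.
    match std::str::from_utf8(&buffered) {
        Ok(text) => println!("valid UTF-8: {text}"),
        Err(_) => println!("not valid UTF-8; would need a byte-level representation"),
    }
}
```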
**FUTURE PRs**
We could:
- [ ] Implement the `Iterator` trait for the tokenizer (see the sketch after this list).
- [ ] Use recursion for the generator like @da2ce7's proposal [here](#12).
- [ ] Implement another generator for TOML, for example. Check if this design can be easily extended to other output formats.
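For the `Iterator` idea, a rough, hypothetical sketch of what an iterator-based tokenizer could look like (deliberately incomplete: it ignores bencoded strings and error handling):

```rust
// Hypothetical sketch only: recognizes integers, list/dict delimiters and end markers,
// and deliberately leaves out bencoded strings ("4:spam") and error handling.
#[derive(Debug)]
enum Token {
    Integer(String),
    ListStart,
    DictStart,
    End,
}

struct Tokenizer<'a> {
    rest: &'a [u8],
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let (&first, tail) = self.rest.split_first()?;
        match first {
            b'l' => { self.rest = tail; Some(Token::ListStart) }
            b'd' => { self.rest = tail; Some(Token::DictStart) }
            b'e' => { self.rest = tail; Some(Token::End) }
            b'i' => {
                // Take everything up to the closing 'e' of the integer.
                let end = tail.iter().position(|&b| b == b'e')?;
                let digits = String::from_utf8_lossy(&tail[..end]).into_owned();
                self.rest = &tail[end + 1..];
                Some(Token::Integer(digits))
            }
            _ => None,
        }
    }
}

fn main() {
    let tokens: Vec<Token> = Tokenizer { rest: &b"li1ei2ee"[..] }.collect();
    println!("{tokens:?}"); // [ListStart, Integer("1"), Integer("2"), End]
}
```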
ACKs for top commit:
josecelano:
ACK ec6cc56
Tree-SHA512: 9210211d802c8e19aef1f02f814b494c5919c7da81f299cf2c7f4d9fb12b4c63cbec4ac526996e6b1b3d69f75ca58894b9d64936bef2d9da851e70d51234c675
-Error: Leading zeros in integers are not allowed, for example b'i00e'; read context: byte `48` (char: `0`), input pos 3, latest input bytes dump: [105, 48, 48] (UTF-8 string: `i00`); write context: byte `48` (char: `0`), output pos 2, latest output bytes dump: [48, 48] (UTF-8 string: `00`)
+Error: Leading zeros in integers are not allowed, for example b'i00e'; read context: byte `48` (char: `0`), input pos 3, latest input bytes dump: [105, 48, 48] (UTF-8 string: `i00`)
-println!("{output}"); // It prints the JSON string: "<string>spam</string>"
-```
-
-More [examples](./examples/).
+See [examples](./examples/).

 ## Test
@@ -167,21 +141,19 @@ cargo cov

 ## Performance

 In terms of memory usage this implementation consumes at least the size of the
-biggest bencoded string. The string parser keeps all the string bytes in memory until
-it parses the whole string, in order to convert it to UTF-8, when it's possible.
+biggest bencoded integer or string. The string and integer parsers keeps all the bytes in memory until
+it parses the whole value.

 The library also wraps the input and output streams in a [BufReader](https://doc.rust-lang.org/std/io/struct.BufReader.html)
 and [BufWriter](https://doc.rust-lang.org/std/io/struct.BufWriter.html) because it can be excessively inefficient to work directly with something that implements [Read](https://doc.rust-lang.org/std/io/trait.Read.html) or [Write](https://doc.rust-lang.org/std/io/trait.Write.html).

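(Aside on the unchanged paragraph above: a minimal, self-contained sketch of that buffering pattern, using in-memory stand-ins rather than the crate's actual I/O.)

```rust
use std::io::{BufReader, BufWriter, Read, Write};

fn main() -> std::io::Result<()> {
    let input: &[u8] = b"4:spam";     // any type implementing `Read`
    let output: Vec<u8> = Vec::new(); // any type implementing `Write`

    // The wrappers batch the many tiny reads and writes a byte-by-byte parser performs.
    let mut reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);

    let mut byte = [0u8; 1];
    while reader.read(&mut byte)? == 1 {
        writer.write_all(&byte)?; // the buffer absorbs the per-byte calls
    }
    writer.flush()?;
    Ok(())
}
```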
 ## TODO

-- [ ] More examples of using the library.
 - [ ] Counter for number of items in a list for debugging and errors.
 - [ ] Fuzz testing: Generate random valid bencoded values.