|
| 1 | +--- |
| 2 | +title: Be Careful Zero-Copying Strings with serde |
| 3 | +--- |
| 4 | + |
| 5 | +# Be Careful Zero-Copying Strings with `serde` |
| 6 | + |
| 7 | +When deserializing a string using `serde`, it is possible to use a borrowed `&str` instead of an owned `String`: |
| 8 | + |
| 9 | +```rust |
| 10 | +use serde::Deserialize; |
| 11 | +use serde_json; |
| 12 | + |
| 13 | +#[derive(Deserialize)] |
| 14 | +struct Foo<'a> { |
| 15 | + // This string is borrowed. |
| 16 | + text: &'a str, |
| 17 | +} |
| 18 | + |
| 19 | +fn main() { |
| 20 | + let json = r#"{ "text": "Hello, world!" }"#; |
| 21 | + |
| 22 | + let foo: Foo = serde_json::from_str(json).unwrap(); |
| 23 | + |
| 24 | + println!("{}", foo.text); // Hello, world! |
| 25 | +} |
| 26 | +``` |
| 27 | + |
| 28 | +The borrowed string is a reference to a portion of the original serialized data. In this case, `foo.text` refers to a slice of the `json` variable that contains the text `Hello, world!`. |
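To make "a reference to a portion of the original data" concrete, here is a small standard-library-only sketch. The helper `is_slice_of` is a hypothetical name introduced for illustration, and the manual slicing stands in for what a zero-copy deserializer hands back; it is not how `serde_json` is implemented.

```rust
/// Hypothetical helper: returns true if `part` points into the memory of `whole`.
fn is_slice_of(whole: &str, part: &str) -> bool {
    let whole_range = whole.as_bytes().as_ptr_range();
    let part_range = part.as_bytes().as_ptr_range();
    whole_range.start <= part_range.start && part_range.end <= whole_range.end
}

fn main() {
    let json = String::from(r#"{ "text": "Hello, world!" }"#);

    // Stand-in for zero-copy deserialization: slice the value out of the input.
    let start = json.find("Hello").unwrap();
    let text = &json[start..start + "Hello, world!".len()];

    // The borrowed slice lives inside `json`; an owned copy does not.
    let copy = text.to_owned();
    println!("borrowed is inside json: {}", is_slice_of(&json, text));
    println!("copy is inside json: {}", is_slice_of(&json, &copy));
}
```

Because the borrowed slice points into `json`, the compiler will not let `foo` outlive `json`, which is what the `'a` lifetime on `Foo<'a>` expresses.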
| 29 | + |
| 30 | +This process is called zero-copy deserialization, and it can be more efficient than allocating a new `String` and copying the data into it. Be warned, however: **some strings cannot be deserialized into `&str`, and must be deserialized into a `String` instead**. |
| 31 | + |
| 32 | +I found this out while deserializing text that contained backslashes: |
| 33 | + |
| 34 | +```rust |
| 35 | +let json = r#"{ "text": "Go to C:\\Users\\bd\\Desktop" }"#; |
| 36 | + |
| 37 | +let foo: Foo = serde_json::from_str(json).unwrap(); |
| 38 | + |
| 39 | +println!("{}", foo.text); |
| 40 | +``` |
| 41 | + |
| 42 | +I expected it to print `Go to C:\Users\bd\Desktop`, but it panicked instead! |
| 43 | + |
| 44 | +``` |
| 45 | +thread 'main' panicked at src/main.rs:12:47: |
| 46 | +called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"Go to C:\\\\Users\\\\bd\\\\Desktop\", expected a borrowed string", line: 1, column: 34) |
| 47 | +``` |
| 48 | + |
| 49 | +When deserializing the text, `serde_json` needs to convert the escaped `Go to C:\\Users\\bd\\Desktop` into `Go to C:\Users\bd\Desktop`. The unescaped text does not appear anywhere in the input, so there is nothing to borrow: the only way to produce it is by *allocating a new string*. A `&str` field has nowhere to store an owned string, so `serde_json` returns an error instead. |
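The following standard-library-only sketch illustrates why escapes force an allocation. The `unescape` function is a hypothetical, simplified stand-in (it does not handle `\uXXXX` and is not `serde_json`'s actual implementation): it can return a borrowed slice only when the input contains no backslashes, and must build a new `String` otherwise.

```rust
use std::borrow::Cow;

/// Hypothetical helper: resolves simple backslash escapes in `raw`.
/// Simplified for illustration; `\uXXXX` escapes are not handled.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // No escapes: the decoded text already exists in the input, so borrow it.
        return Cow::Borrowed(raw);
    }

    // Escapes present: the decoded text exists nowhere in memory yet,
    // so it has to be built in a freshly allocated String.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some(other) => out.push(other), // covers `\\`, `\"`, and `\/`
            None => out.push('\\'),         // lone trailing backslash kept as-is
        }
    }
    Cow::Owned(out)
}

fn main() {
    let plain = unescape("Hello, world!");
    let escaped = unescape(r"Go to C:\\Users\\bd\\Desktop");

    println!("plain borrowed: {}", matches!(plain, Cow::Borrowed(_)));
    println!("escaped owned: {}", matches!(escaped, Cow::Owned(_)));
    println!("{escaped}");
}
```

A `&str` field can only hold the first case. The second case produces an owned value, which is the situation behind the "expected a borrowed string" error above.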
| 50 | + |
| 51 | +In order to fix this, you need to replace the borrowed `&str` with an owned `String`[^1]. It can be slower than zero-copy deserialization, but it supports *all* possible data inputs: |
| 52 | + |
| 53 | +```rust |
| 54 | +use serde::Deserialize; |
| 55 | +use serde_json; |
| 56 | + |
| 57 | +#[derive(Deserialize)] |
| 58 | +struct Foo { |
| 59 | + text: String, |
| 60 | +} |
| 61 | + |
| 62 | +fn main() { |
| 63 | + let json = r#"{ "text": "Go to C:\\Users\\bd\\Desktop" }"#; |
| 64 | + |
| 65 | + let foo: Foo = serde_json::from_str(json).unwrap(); |
| 66 | + |
| 67 | + println!("{}", foo.text); // Go to C:\Users\bd\Desktop |
| 68 | +} |
| 69 | +``` |
| 70 | + |
| 71 | +The same issue arises with any other JSON escape sequence, such as `\n`, `\t`, or `\"`. It can also occur with other types that can be zero-copied, such as `&Path`[^2]. Next time you consider using zero-copy deserialization, be sure you're OK with limiting what data you can support. |
| 72 | + |
| 73 | +**Further Reading:** |
| 74 | + |
| 75 | +- [Deserializer lifetimes](https://serde.rs/lifetimes.html) |
| 76 | +- [JSON string with backslashes does not deserialize into borrowed `&str`](https://github.com/serde-rs/serde/issues/1746) |
| 77 | + |
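| 78 | +[^1]: You can also use `Cow<str>`. By default it allocates a new `String` even if the text could be zero-copied, so it has the same effect as using `String` directly. If you annotate the field with `#[serde(borrow)]`, however, it borrows when the input allows it and allocates only when it has to (for example, when the text contains escapes). |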
| 79 | + |
| 80 | +[^2]: Be especially careful about using this type. Backslashes must be escaped in JSON, so a borrowed `&Path` cannot be deserialized from any path that contains them, which essentially eliminates support for Windows paths. |