---
title: Migrate contract storage data when upgrading data structures
hide_table_of_contents: true
description: Use a version marker or a versioned enum to safely read and migrate stored data when a contract upgrade changes a data structure
---

When a contract is upgraded and a stored data structure gains new fields, the data already written to the ledger still uses the old layout. Reading those old entries naively with the new type causes the host to trap. This guide explains why that happens, introduces the version marker pattern as a reliable solution, covers lazy versus eager migration strategies and how to test them, and then describes the versioned enum pattern as an alternative.

## Why intuitive approaches fail

Suppose a contract stores `DataV1` entries and is upgraded to use `DataV2`, which adds an optional field `c`:

```rust
use soroban_sdk::contracttype;

#[contracttype]
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

#[contracttype]
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }
```

### Approach 1: Read old entries directly with the new type

The most natural approach is to read the stored value directly as `DataV2` and expect `c` to default to `None`:

```rust
// Reading an entry that was written as DataV1 using the DataV2 type.
// A developer might expect c = None for old entries, but this traps
// with Error(Object, UnexpectedSize).
let data: DataV2 = env.storage().persistent().get(&key).unwrap();
```

This traps with `Error(Object, UnexpectedSize)`. The Soroban host validates the field count of the stored value against the type definition before returning anything to the contract. Because `DataV1` has two fields and `DataV2` has three, the host rejects the entry before the SDK can handle it. Declaring `c` as an `Option` does not help: an `Option` field is still encoded as an entry of the struct (with `None` stored as a void value), so every `DataV2` value has exactly three fields.
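
To make the failure mode concrete, here is a std-only Rust model of a strict decoder. It is not the Soroban host: the `Encoded` map is a hypothetical stand-in for a stored struct, and the panic stands in for a host trap. It shows that an `Option` field still occupies an entry, so a two-entry V1 value can never satisfy a three-entry V2 decode.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an encoded struct: field name -> value.
/// `None` models a void value; it still occupies an entry.
pub type Encoded = BTreeMap<&'static str, Option<i64>>;

#[derive(Debug, PartialEq)]
pub struct DataV2 {
    pub a: i64,
    pub b: i64,
    pub c: Option<i64>,
}

/// Encode a V1 value: two entries.
pub fn encode_v1(a: i64, b: i64) -> Encoded {
    BTreeMap::from([("a", Some(a)), ("b", Some(b))])
}

/// Encode a V2 value: always three entries, even when `c` is None.
pub fn encode_v2(d: &DataV2) -> Encoded {
    BTreeMap::from([("a", Some(d.a)), ("b", Some(d.b)), ("c", d.c)])
}

/// Strict decode: the entry count must match the field count exactly.
/// A mismatch panics, modelling a host trap rather than a recoverable Err.
pub fn decode_v2(e: &Encoded) -> DataV2 {
    if e.len() != 3 {
        panic!("Error(Object, UnexpectedSize): expected 3 entries, found {}", e.len());
    }
    DataV2 {
        a: e["a"].expect("field a is required"),
        b: e["b"].expect("field b is required"),
        c: e["c"],
    }
}
```

In the model, a round trip through `encode_v2` and `decode_v2` succeeds, while decoding the output of `encode_v1` as V2 aborts before any conversion code could run.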

### Approach 2: Use `try_from_val` as a fallback

Another approach is to read the raw `Val` and use `try_from_val`, expecting to catch a deserialization error and recover:

```rust
use soroban_sdk::{TryFromVal, Val};

let raw: Val = env.storage().persistent().get(&key).unwrap();
let data: DataV2 = if let Ok(v2) = DataV2::try_from_val(&env, &raw) {
    v2
} else {
    // This branch is never reached: the host traps before returning Err.
    let v1 = DataV1::try_from_val(&env, &raw).unwrap();
    DataV2 { a: v1.a, b: v1.b, c: None }
};
```

This also traps at the host level. The field count validation happens in the host environment during deserialization; it does not produce a Rust `Err` that the SDK can intercept. There is no way to catch or recover from the mismatch at the contract level.

The root issue is that a contract cannot determine which type an existing storage entry was written as just by reading it. That information must be stored explicitly.

## Version marker pattern

The solution is to store a version number alongside each data entry, keyed by the same identifier. The contract reads the version first, then branches on the result to decode the payload with the correct type.

### Key layout

Define two variants in your key enum, one for the version marker and one for the payload, both keyed by the same `id`:

```rust
use soroban_sdk::contracttype;

#[contracttype]
#[derive(Clone)]
pub enum DataKey {
    DataVersion(u32), // version marker keyed by id
    Data(u32),        // payload keyed by id
}
```

Each logical record occupies two storage slots. Because the version is stored per record rather than globally, each entry is versioned independently: old and new entries can coexist, and there is no all-or-nothing upgrade requirement.
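
The per-record layout can be modelled in plain Rust with a `HashMap` standing in for persistent storage. This is an illustration only, not the Soroban storage API, and `Stored` and `mixed_storage` are hypothetical names. The sketch shows ledger state shortly after an upgrade: one legacy record with no marker next to one record stamped as version 2.

```rust
use std::collections::HashMap;

/// Mirrors the guide's DataKey: two slots per logical record.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DataKey {
    DataVersion(u32),
    Data(u32),
}

/// Hypothetical stand-in for stored values (the contract stores encoded Vals).
#[derive(Clone, Debug, PartialEq)]
pub enum Stored {
    Version(u32),
    V1 { a: i64, b: i64 },
    V2 { a: i64, b: i64, c: Option<i64> },
}

pub type Storage = HashMap<DataKey, Stored>;

/// Ledger state after an upgrade: one legacy record and one new record.
pub fn mixed_storage() -> Storage {
    let mut s = Storage::new();
    // Record 1 was written by the old contract: payload only, no marker.
    s.insert(DataKey::Data(1), Stored::V1 { a: 10, b: 20 });
    // Record 2 was written by the new contract: marker plus V2 payload.
    s.insert(DataKey::DataVersion(2), Stored::Version(2));
    s.insert(DataKey::Data(2), Stored::V2 { a: 1, b: 2, c: Some(3) });
    s
}
```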

### Reading with version awareness

Before decoding a storage entry, read its version marker. Use `unwrap_or(1)` to handle entries that were written before versioning was introduced: the absence of a version key is itself a signal that the entry is version 1.

```rust
use soroban_sdk::Env;

fn read_data(env: &Env, id: u32) -> DataV2 {
    let version: u32 = env
        .storage()
        .persistent()
        .get(&DataKey::DataVersion(id))
        .unwrap_or(1); // no version marker: the entry predates versioning

    match version {
        1 => {
            // Decode with the old type, then up-convert in memory.
            let v1: DataV1 = env.storage().persistent().get(&DataKey::Data(id)).unwrap();
            DataV2 { a: v1.a, b: v1.b, c: None }
        }
        // Version 2, the current layout: decode directly.
        _ => env.storage().persistent().get(&DataKey::Data(id)).unwrap(),
    }
}
```
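
The same decision logic can be sketched in plain Rust, again with a `HashMap` as a hypothetical stand-in for persistent storage and a `Stored` enum standing in for encoded values. The marker, defaulting to 1, decides which payload type is expected; a payload that does not match its marker panics, much as decoding with the wrong type traps on-chain.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DataKey {
    DataVersion(u32),
    Data(u32),
}

#[derive(Clone, Debug, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

#[derive(Clone, Debug, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }

/// Hypothetical stand-in for stored values (the contract stores encoded Vals).
#[derive(Clone, Debug, PartialEq)]
pub enum Stored {
    Version(u32),
    V1(DataV1),
    V2(DataV2),
}

pub type Storage = HashMap<DataKey, Stored>;

/// Mirrors read_data: the marker (defaulting to 1) decides how to decode.
pub fn read_data(s: &Storage, id: u32) -> DataV2 {
    let version = match s.get(&DataKey::DataVersion(id)) {
        Some(Stored::Version(v)) => *v,
        _ => 1, // no marker: the entry predates versioning
    };
    let payload = s.get(&DataKey::Data(id));
    match version {
        1 => match payload {
            // Decode as V1, then up-convert in memory.
            Some(Stored::V1(v1)) => DataV2 { a: v1.a, b: v1.b, c: None },
            _ => panic!("expected a V1 payload for record {id}"),
        },
        _ => match payload {
            Some(Stored::V2(v2)) => v2.clone(),
            _ => panic!("expected a V2 payload for record {id}"),
        },
    }
}
```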

### Writing always uses the current version

Every write stamps the entry with the current version number. An entry that was originally `DataV1` will carry a `DataVersion` marker of `2` the next time it is written back:

```rust
fn write_data(env: &Env, id: u32, data: &DataV2) {
    // Stamp the record with the current layout version, then store the payload.
    env.storage().persistent().set(&DataKey::DataVersion(id), &2u32);
    env.storage().persistent().set(&DataKey::Data(id), data);
}
```

### Lazy vs eager migration

Once version-aware read and write logic is in place, there are two strategies for converting old entries.

#### Lazy migration (convert on read)

In lazy migration, old entries are left untouched on the ledger. When a record is read, its version is detected and it is up-converted in memory. When that record is later written back, it is stamped with the new version. No explicit migration step is needed; conversion happens as records are accessed during normal contract use.

Lazy migration is generally preferred on blockchains. Leaving old entries untouched has no upfront cost and no risk of hitting instruction or ledger-entry limits at upgrade time. Records that are never accessed again are never migrated, which is usually acceptable.

The `read_data` function shown above already implements lazy migration. Each time an old `DataV1` entry is read and then passed to `write_data`, the entry is silently upgraded in place.
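
The lazy round trip can be sketched in plain Rust under the same assumptions as before: a `HashMap` stands in for persistent storage, and `Stored`, `stored_version`, and the `CURRENT_VERSION` constant are hypothetical names. Using a named constant instead of a literal `2` is one design option that keeps every write path stamping the same version.

```rust
use std::collections::HashMap;

/// Hypothetical constant: the layout version that every write stamps.
pub const CURRENT_VERSION: u32 = 2;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DataKey {
    DataVersion(u32),
    Data(u32),
}

#[derive(Clone, Debug, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }

/// Hypothetical stand-in for stored values.
#[derive(Clone, Debug, PartialEq)]
pub enum Stored {
    Version(u32),
    V1 { a: i64, b: i64 },
    V2(DataV2),
}

pub type Storage = HashMap<DataKey, Stored>;

/// Version marker of a record; a missing marker means version 1.
pub fn stored_version(s: &Storage, id: u32) -> u32 {
    match s.get(&DataKey::DataVersion(id)) {
        Some(Stored::Version(v)) => *v,
        _ => 1,
    }
}

/// Lazy read: up-convert V1 in memory and leave storage untouched.
pub fn read_data(s: &Storage, id: u32) -> DataV2 {
    match (stored_version(s, id), s.get(&DataKey::Data(id))) {
        (1, Some(Stored::V1 { a, b })) => DataV2 { a: *a, b: *b, c: None },
        (2, Some(Stored::V2(d))) => d.clone(), // current layout
        (v, _) => panic!("no decodable payload for version {v}"),
    }
}

/// Every write stamps the record with CURRENT_VERSION.
pub fn write_data(s: &mut Storage, id: u32, data: &DataV2) {
    s.insert(DataKey::DataVersion(id), Stored::Version(CURRENT_VERSION));
    s.insert(DataKey::Data(id), Stored::V2(data.clone()));
}
```

Reading alone leaves the stored version at 1; only the write back moves the record to the current version.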

#### Eager migration (batch conversion)

In eager migration, an explicit admin function iterates over all known records and rewrites them in the new format immediately after the upgrade is deployed:

```rust
use soroban_sdk::{Env, Vec};

pub fn migrate_all(env: &Env, ids: Vec<u32>) {
    // The caller should be an authorized admin (this sketch omits the check).
    for id in ids.iter() {
        let version: u32 = env
            .storage()
            .persistent()
            .get(&DataKey::DataVersion(id))
            .unwrap_or(1);

        if version < 2 {
            // read_data up-converts to DataV2 in memory.
            let migrated = read_data(env, id);
            // write_data stamps the entry as version 2.
            write_data(env, id, &migrated);
        }
    }
}
```

Eager migration is rarely practical for large datasets on Soroban. Each rewrite consumes fees and instructions, and a single transaction cannot migrate an unbounded number of records: the contract will hit instruction or ledger-entry limits. If the batch must span multiple transactions, the contract is in a mixed-version state throughout that window, so version-aware read logic is still required anyway.

Eager migration is occasionally appropriate when the total number of records is small and known in advance (for example, a fixed registry of a few dozen entries), or when you need to permanently drop old version branches from the read path.
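
When an eager migration has to span several transactions, one way to keep each call bounded is to process a fixed-size slice of ids per call and return a cursor for the next call. The std-only sketch below models that idea; `migrate_batch` and the `HashMap` of version markers are hypothetical, payloads are omitted, and a real per-call limit would have to be tuned against actual resource limits.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-record version markers (id -> version).
/// A missing entry means the record predates versioning (version 1).
pub type Versions = HashMap<u32, u32>;

/// Migrate at most `limit` ids, starting at position `cursor` in `ids`.
/// Returns the cursor for the next call, or None once every id has been visited.
pub fn migrate_batch(
    versions: &mut Versions,
    ids: &[u32],
    cursor: usize,
    limit: usize,
) -> Option<usize> {
    let start = cursor.min(ids.len());
    let end = start.saturating_add(limit).min(ids.len());
    for &id in &ids[start..end] {
        let version = versions.get(&id).copied().unwrap_or(1);
        if version < 2 {
            // Stand-in for read_data followed by write_data: stamp as version 2.
            versions.insert(id, 2);
        }
    }
    if end < ids.len() { Some(end) } else { None }
}
```

Between calls, some records are still at version 1, which is why the version-aware read path must stay in place until the final call returns `None`.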

:::caution

Never remove a version branch from `read_data` while old entries of that version can still exist on the ledger. Doing so will cause any remaining old entries to trap when accessed.

:::
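
Before removing a branch you need evidence that no entries of that version remain. Assuming you can enumerate every record id (for example, a fixed registry or an off-chain index of ids, which is an assumption and not something the contract provides), a check can count records still below a given version. The std-only sketch uses hypothetical `count_below` and `safe_to_drop_branches_below` helpers over a `HashMap` of version markers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: id -> stored version marker (absent means version 1).
pub type Versions = HashMap<u32, u32>;

/// Count how many of the given ids are still stored below `min_version`.
pub fn count_below(versions: &Versions, ids: &[u32], min_version: u32) -> usize {
    ids.iter()
        .filter(|&&id| versions.get(&id).copied().unwrap_or(1) < min_version)
        .count()
}

/// True only when every known id has reached `min_version`, i.e. read
/// branches for older versions are no longer needed for these ids.
pub fn safe_to_drop_branches_below(versions: &Versions, ids: &[u32], min_version: u32) -> bool {
    count_below(versions, ids, min_version) == 0
}
```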

### Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

The Soroban test environment allows you to set storage state directly. Storage can only be accessed from within a contract context, so register the contract (assumed here to be the `#[contract]` type named `Contract`) and run storage operations inside `env.as_contract`. Use this to write `DataV1` entries (without a `DataVersion` key) and verify that `read_data` up-converts them correctly:

```rust
#[cfg(test)]
mod test {
    use super::*;
    use soroban_sdk::Env;

    #[test]
    fn test_reads_v1_entry_as_v2() {
        let env = Env::default();
        let contract_id = env.register(Contract, ());
        let id: u32 = 42;

        env.as_contract(&contract_id, || {
            // Simulate what the old contract wrote: a DataV1 payload and
            // no DataVersion entry (old contracts did not write one).
            let v1_data = DataV1 { a: 10, b: 20 };
            env.storage().persistent().set(&DataKey::Data(id), &v1_data);

            let result = read_data(&env, id);
            assert_eq!(result.a, 10);
            assert_eq!(result.b, 20);
            assert_eq!(result.c, None);
        });
    }

    #[test]
    fn test_reads_v2_entry_correctly() {
        let env = Env::default();
        let contract_id = env.register(Contract, ());
        let id: u32 = 99;

        env.as_contract(&contract_id, || {
            let v2_data = DataV2 { a: 1, b: 2, c: Some(3) };
            write_data(&env, id, &v2_data);

            let result = read_data(&env, id);
            assert_eq!(result.a, 1);
            assert_eq!(result.b, 2);
            assert_eq!(result.c, Some(3));
        });
    }

    #[test]
    fn test_write_upgrades_v1_entry_to_v2() {
        let env = Env::default();
        let contract_id = env.register(Contract, ());
        let id: u32 = 7;

        env.as_contract(&contract_id, || {
            // Write a v1 entry directly, as the old contract would have.
            let v1_data = DataV1 { a: 5, b: 6 };
            env.storage().persistent().set(&DataKey::Data(id), &v1_data);

            // Read it: lazy migration produces a DataV2 in memory.
            let migrated = read_data(&env, id);
            assert_eq!(migrated.c, None);

            // Write it back: this stamps the entry as version 2.
            write_data(&env, id, &migrated);

            let stored_version: u32 = env
                .storage()
                .persistent()
                .get(&DataKey::DataVersion(id))
                .unwrap();
            assert_eq!(stored_version, 2);

            // Subsequent reads take the v2 branch.
            let result = read_data(&env, id);
            assert_eq!(result.a, 5);
            assert_eq!(result.b, 6);
            assert_eq!(result.c, None);
        });
    }
}
```

The three test cases cover the three states a record can be in after an upgrade:

- A `DataV1` entry with no version marker (a record written before versioning was introduced)
- A `DataV2` entry written by the new contract
- A `DataV1` entry that is read and then written back (the lazy migration round trip)
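
## Versioned enum pattern

Another approach is to store every record wrapped in a versioned enum that can hold either a `V1` or a `V2` data struct. Each stored value then carries its own version as the enum variant, so no separate version key is needed.

This pattern only works if the contract stored `Data::V1(...)` from the start. An enum variant is encoded differently from a bare struct, so entries that an earlier contract wrote as a plain `DataV1` cannot be read back as `Data`; for those, use the version marker pattern.

```rust
use soroban_sdk::contracttype;

#[contracttype]
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
}

#[contracttype]
#[derive(Clone)]
pub enum DataKey {
    Data(u32),
}
```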
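
The encoding difference can be sketched with a simplified, hypothetical value model in plain Rust. It is assumed to resemble, not reproduce, how contract types are encoded: a struct as a map of named fields, and an enum tuple variant as a vector holding the variant name followed by its payload. In the model, a bare struct has no variant tag, so it cannot be decoded as the enum.

```rust
/// Simplified, hypothetical model of encoded contract values.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    I64(i64),
    Symbol(&'static str),
    Map(Vec<(&'static str, Value)>),
    Vec(Vec<Value>),
}

/// A bare DataV1 struct: a map of named fields.
pub fn encode_bare_v1(a: i64, b: i64) -> Value {
    Value::Map(vec![("a", Value::I64(a)), ("b", Value::I64(b))])
}

/// Data::V1(DataV1): the variant name followed by the struct payload.
pub fn encode_enum_v1(a: i64, b: i64) -> Value {
    Value::Vec(vec![Value::Symbol("V1"), encode_bare_v1(a, b)])
}

/// Returns the variant tag if the value has the versioned-enum shape.
/// None models a value that cannot be decoded as the enum.
pub fn enum_variant(v: &Value) -> Option<&'static str> {
    match v {
        Value::Vec(items) => match items.as_slice() {
            [Value::Symbol(tag), Value::Map(_)] => Some(*tag),
            _ => None,
        },
        _ => None,
    }
}
```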

### Migration logic

The migration logic matches on the two data formats. A `V1` value is converted to the `V2` format: fields `a` and `b` are copied over, and the new field `c`, which was added in `V2`, is set to `None`. A `V2` value passes through unchanged. This is a lazy migration: old data is upgraded on read, not in a bulk migration.

```rust
impl Data {
    /// Up-convert any stored variant to the current DataV2 layout.
    pub fn into_v2(self) -> DataV2 {
        match self {
            Data::V1(v1) => DataV2 { a: v1.a, b: v1.b, c: None },
            Data::V2(v2) => v2,
        }
    }
}
```
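
As a design note, and only as a hedged sketch: when a later upgrade adds a third layout, one option is to give each version a single-step upgrade and chain them, so every old variant reaches the latest layout through the same path. The plain-Rust example below assumes a hypothetical `DataV3` with an added field `d` and a hypothetical `into_latest` method; neither is part of the guide's contract.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

#[derive(Clone, Debug, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }

/// Hypothetical next layout, adding field `d`.
#[derive(Clone, Debug, PartialEq)]
pub struct DataV3 { pub a: i64, pub b: i64, pub c: Option<i64>, pub d: Option<i64> }

#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
    V3(DataV3),
}

impl DataV1 {
    /// One step: V1 -> V2.
    fn upgrade(self) -> DataV2 {
        DataV2 { a: self.a, b: self.b, c: None }
    }
}

impl DataV2 {
    /// One step: V2 -> V3.
    fn upgrade(self) -> DataV3 {
        DataV3 { a: self.a, b: self.b, c: self.c, d: None }
    }
}

impl Data {
    /// Chain single-step upgrades so each version only knows about the next one.
    pub fn into_latest(self) -> DataV3 {
        match self {
            Data::V1(v1) => v1.upgrade().upgrade(),
            Data::V2(v2) => v2.upgrade(),
            Data::V3(v3) => v3,
        }
    }
}
```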
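
### Reading with version awareness

The value is read from storage, and `into_v2()` ensures that the returned value is in the `V2` format. In this pattern `read_data` and `write_data` are exposed as contract functions (the test below calls them through the generated `ContractClient`), so they take `Env` by value and live in the contract's `#[contractimpl]` block.

```rust
// Inside the contract's #[contractimpl] impl block.
pub fn read_data(e: Env, id: u32) -> Option<DataV2> {
    // `?` returns None if no entry exists for this id.
    let data_enum: Data = e.storage().persistent().get(&DataKey::Data(id))?;
    Some(data_enum.into_v2())
}
```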

### Writing always uses the current version

The write function `write_data()` takes its argument in the `DataV2` format and always stores it wrapped as `Data::V2`.

```rust
// Inside the contract's #[contractimpl] impl block.
pub fn write_data(e: Env, id: u32, data: DataV2) {
    e.storage().persistent().set(&DataKey::Data(id), &Data::V2(data));
}
```

### Testing migrations

As with the version marker pattern, the test simulates state written by an old contract version and verifies that the new contract reads it correctly.

In this test, data in the `V1` format is stored first. It is then read with the `read_data` function, which converts it to the `V2` format with `into_v2()` before returning it. The result is checked with `assert_eq!()` and written back under the same `id`, which overwrites the `V1` entry with the same data in the `V2` format.

Next, the entry is read directly from storage to verify that it is now stored as `Data::V2`. Finally, it is read again through `read_data()` to verify that the function also returns it in the `V2` format.

```rust
#[cfg(test)]
mod test {
    use super::*;
    use soroban_sdk::Env;

    #[test]
    fn test_write_upgrades_v1_entry_to_v2_1() {
        let env = Env::default();
        let id: u32 = 7;
        let contract_id = env.register(Contract, ());
        let client = ContractClient::new(&env, &contract_id);

        // Inject a V1 entry directly, simulating legacy on-chain state.
        env.as_contract(&contract_id, || {
            env.storage()
                .persistent()
                .set(&DataKey::Data(id), &Data::V1(DataV1 { a: 5, b: 6 }));
        });

        // Read it: into_v2() migrates lazily, so c must be None.
        let migrated = client.read_data(&id).unwrap();
        assert_eq!(migrated.a, 5);
        assert_eq!(migrated.b, 6);
        assert_eq!(migrated.c, None);

        // Write it back: write_data always stores Data::V2(...).
        client.write_data(&id, &migrated);

        // Confirm the stored enum variant is now V2, not V1.
        let stored: Data = env
            .as_contract(&contract_id, || {
                env.storage().persistent().get(&DataKey::Data(id))
            })
            .unwrap();

        match stored {
            Data::V2(v2) => {
                assert_eq!(v2.a, 5);
                assert_eq!(v2.b, 6);
                assert_eq!(v2.c, None);
            }
            Data::V1(_) => panic!("expected Data::V2 after write_data, found Data::V1"),
        }

        // Subsequent reads go through the V2 branch and return identical values.
        let result = client.read_data(&id).unwrap();
        assert_eq!(result.a, 5);
        assert_eq!(result.b, 6);
        assert_eq!(result.c, None);
    }
}
```
