---
title: Migrate contract storage data when upgrading data structures
hide_table_of_contents: true
description: Use the version marker pattern to safely read and migrate stored data when a contract upgrade changes a data structure
---

When a contract is upgraded and a stored data structure gains new fields, the data already written to the ledger still uses the old layout. Naively reading those old entries with the new type causes the host to trap. This guide explains why that happens, introduces the version marker pattern as the correct solution, and covers lazy versus eager migration strategies and how to test them.

## Why intuitive approaches fail

Suppose a contract stores `DataV1` entries and is upgraded to use `DataV2`, which adds an optional field `c`:

```rust
use soroban_sdk::contracttype;

#[contracttype]
pub struct DataV1 { a: i64, b: i64 }

#[contracttype]
pub struct DataV2 { a: i64, b: i64, c: Option<i64> }
```

### Approach 1: Read old entries directly with the new type

The most natural approach is to read the stored bytes directly as `DataV2` and expect `c` to default to `None`:

```rust
// Reading a DataV1 entry with the DataV2 type.
// A developer might expect c = None for old entries — but this traps.
let data: DataV2 = env.storage().persistent().get(&key).unwrap();
// Error(Object, UnexpectedSize)
```

This traps with `Error(Object, UnexpectedSize)`. The Soroban host validates the field count of the XDR-encoded value against the type definition before returning anything to the contract. Because `DataV1` has two fields and `DataV2` has three, the host rejects the entry before the SDK can handle it.
| 33 | + |
| 34 | +### Approach 2: Use `try_from_val` as a fallback |
| 35 | + |
| 36 | +Another approach is to use `try_from_val` expecting to catch a deserialization error and recover: |
| 37 | + |
| 38 | +```rust |
| 39 | +let raw: Val = env.storage().persistent().get(&key).unwrap(); |
| 40 | +if let Ok(v2) = DataV2::try_from_val(&env, &raw) { |
| 41 | + v2 |
| 42 | +} else { |
| 43 | + // This branch is never reached — the host traps before returning Err. |
| 44 | + let v1 = DataV1::try_from_val(&env, &raw).unwrap(); |
| 45 | + DataV2 { a: v1.a, b: v1.b, c: None } |
| 46 | +} |
| 47 | +``` |
| 48 | + |
| 49 | +This also traps at the host level. The field count validation happens in the host environment during deserialization — it does not produce a Rust `Err` that the SDK can intercept. There is no way to catch or recover from the mismatch at the contract level. |
| 50 | + |
| 51 | +The root issue is that a contract cannot determine which type an existing storage entry was written as just by reading it. That information must be stored explicitly. |

## Version Marker Pattern

The solution is to store a version number alongside each data entry, keyed by the same identifier. The contract reads the version first, then branches on the result to decode the payload with the correct type.

### Key layout

Define two variants in your key enum — one for the version marker and one for the payload — both keyed by the same `id`:

```rust
#[contracttype]
pub enum DataKey {
    DataVersion(u32), // version marker keyed by id
    Data(u32),        // data keyed by id
}
```

Each logical record occupies two storage slots. Because the version is stored per record rather than globally, each entry is independently versioned. There is no all-or-nothing upgrade requirement.

### Reading with version awareness

Before decoding a storage entry, read its version marker. Use `unwrap_or(1)` to handle entries that were written before versioning was introduced — the absence of a version key is itself a signal that the entry is version 1:

```rust
fn read_data(env: &Env, id: u32) -> DataV2 {
    let version: u32 = env.storage().persistent()
        .get(&DataKey::DataVersion(id))
        .unwrap_or(1); // default to v1 for entries without a version marker

    match version {
        1 => {
            let v1: DataV1 = env.storage().persistent().get(&DataKey::Data(id)).unwrap();
            DataV2 { a: v1.a, b: v1.b, c: None }
        }
        _ => env.storage().persistent().get(&DataKey::Data(id)).unwrap(),
    }
}
```

### Writing always uses the current version

Every write stamps the entry with the current version number. An entry that was originally `DataV1` will carry a `DataVersion` marker of `2` the next time it is written back:

```rust
fn write_data(env: &Env, id: u32, data: &DataV2) {
    env.storage().persistent().set(&DataKey::DataVersion(id), &2u32);
    env.storage().persistent().set(&DataKey::Data(id), data);
}
```

### Lazy vs eager migration

Once version-aware read/write logic is in place, there are two strategies for converting old entries.

#### Lazy migration (convert on read)

In lazy migration, old entries are left untouched on the ledger. When a record is read, its version is detected and it is up-converted in memory. When that record is later written back, it is stamped with the new version. No explicit migration step is needed — conversion happens as records are accessed in normal contract use.

Lazy migration is generally preferred on blockchains. Leaving old entries untouched has no upfront cost and no risk of hitting instruction or ledger-entry limits at upgrade time. Records that are never accessed again are never migrated, which is usually acceptable.

The `read_data` function shown above already implements lazy migration. Each time an old `DataV1` entry is read and then passed to `write_data`, the entry is silently upgraded in place.

#### Eager migration (batch conversion)

In eager migration, an explicit admin function iterates all known records and rewrites them in the new format immediately after the upgrade is deployed:

```rust
pub fn migrate_all(env: Env, ids: Vec<u32>) {
    // Caller should be an authorized admin.
    for id in ids.iter() {
        let version: u32 = env.storage().persistent()
            .get(&DataKey::DataVersion(id))
            .unwrap_or(1);

        if version < 2 {
            // read_data up-converts to DataV2 in memory.
            let migrated = read_data(&env, id);
            // write_data stamps the entry as version 2.
            write_data(&env, id, &migrated);
        }
    }
}
```

Eager migration is rarely practical for large datasets on Soroban. Each rewrite consumes fees and burns instructions, and a single transaction cannot migrate an unbounded number of records — the contract will hit instruction or ledger-entry limits. If the batch must span multiple transactions, the contract is in a mixed-version state throughout the window, which means version-aware read logic is still required anyway.

Eager migration is occasionally appropriate when the total number of records is small and known in advance (for example, a fixed registry of a few dozen entries), or when you need to permanently drop old version branches from the read path.

:::caution

Never remove a version branch from `read_data` while old entries of that version can still exist on the ledger. Doing so will cause any remaining old entries to trap when accessed.

:::

### Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

The Soroban test environment allows you to set storage state directly. Note that contract storage can only be accessed from within a contract context, so direct reads and writes in tests must be wrapped in `env.as_contract`. Use this to write `DataV1` entries (without a `DataVersion` key) and verify that `read_data` up-converts them correctly:

```rust
#![cfg(test)]
use super::*;
use soroban_sdk::Env;

#[test]
fn test_reads_v1_entry_as_v2() {
    let env = Env::default();
    let contract_id = env.register(Contract, ());
    let id: u32 = 42;

    // Storage is only accessible inside a contract context.
    env.as_contract(&contract_id, || {
        // Simulate what the old contract wrote: a DataV1 payload,
        // no DataVersion entry (old contracts did not write one).
        let v1_data = DataV1 { a: 10, b: 20 };
        env.storage().persistent().set(&DataKey::Data(id), &v1_data);

        let result = read_data(&env, id);

        assert_eq!(result.a, 10);
        assert_eq!(result.b, 20);
        assert_eq!(result.c, None);
    });
}

#[test]
fn test_reads_v2_entry_correctly() {
    let env = Env::default();
    let contract_id = env.register(Contract, ());
    let id: u32 = 99;

    env.as_contract(&contract_id, || {
        let v2_data = DataV2 { a: 1, b: 2, c: Some(3) };
        write_data(&env, id, &v2_data);

        let result = read_data(&env, id);

        assert_eq!(result.a, 1);
        assert_eq!(result.b, 2);
        assert_eq!(result.c, Some(3));
    });
}

#[test]
fn test_write_upgrades_v1_entry_to_v2() {
    let env = Env::default();
    let contract_id = env.register(Contract, ());
    let id: u32 = 7;

    env.as_contract(&contract_id, || {
        // Write a v1 entry directly, as the old contract would have.
        let v1_data = DataV1 { a: 5, b: 6 };
        env.storage().persistent().set(&DataKey::Data(id), &v1_data);

        // Read it — lazy migration produces a DataV2 in memory.
        let migrated = read_data(&env, id);
        assert_eq!(migrated.c, None);

        // Write it back — this stamps the entry as version 2.
        write_data(&env, id, &migrated);

        let stored_version: u32 = env.storage().persistent()
            .get(&DataKey::DataVersion(id))
            .unwrap();
        assert_eq!(stored_version, 2);

        // Subsequent reads should take the v2 branch.
        let result = read_data(&env, id);
        assert_eq!(result.a, 5);
        assert_eq!(result.b, 6);
        assert_eq!(result.c, None);
    });
}
```

The three test cases cover the three states a record can be in after an upgrade:

- A `DataV1` entry with no version marker (pre-versioning era records)
- A `DataV2` entry written by the new contract
- A `DataV1` entry that is read and then written back (the lazy migration round-trip)

## Versioned Enum Pattern

Another approach is to wrap the payload in a versioned enum that can hold either a `V1` or a `V2` data struct. The enum variant itself records the version, so no separate marker key is needed:

```rust
#[contracttype]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
}

#[contracttype]
pub enum DataKey {
    Data(u32),
}
```

### Migration Logic

The migration logic matches on the two variants. A `V1` value is converted: fields `a` and `b` are mapped over, and the new `c` field (the one added in `V2`) is set to `None`. A `V2` value passes through unchanged. This is lazy migration — old data is upgraded on read, not in a bulk step.

```rust
impl Data {
    pub fn into_v2(self) -> DataV2 {
        match self {
            Data::V1(v1) => DataV2 { a: v1.a, b: v1.b, c: None },
            Data::V2(v2) => v2,
        }
    }
}
```

### Reading with version awareness

The value is read from storage, and `into_v2()` ensures that the returned value is in the `V2` format:

```rust
pub fn read_data(e: Env, id: u32) -> Option<DataV2> {
    let data_enum: Data = e.storage().persistent().get(&DataKey::Data(id))?;
    Some(data_enum.into_v2())
}
```

### Writing always uses the current version

The write function `write_data()` takes its data argument in the `DataV2` format and always stores it wrapped as `Data::V2`:

```rust
pub fn write_data(e: Env, id: u32, data: DataV2) {
    e.storage().persistent().set(&DataKey::Data(id), &Data::V2(data));
}
```

### Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

In this test, data in the `V1` format is stored first. It is then read with `read_data`, which converts the `V1` value to the `V2` format via `into_v2()` before returning it. The result is checked with `assert_eq!` and written back under the same `id`, overwriting the `V1` entry with the same data in `V2` format.

The raw entry is then read from storage to verify it is now stored in the `V2` format, and finally `read_data()` is called once more to verify that the read function also returns the data in the `V2` format.

```rust
#[test]
fn test_write_upgrades_v1_entry_to_v2_enum() {
    let env = Env::default();
    let id: u32 = 7;
    let contract_id = env.register(Contract, ());
    let client = ContractClient::new(&env, &contract_id);

    // Inject a V1 entry directly, simulating legacy on-chain state.
    env.as_contract(&contract_id, || {
        env.storage()
            .persistent()
            .set(&DataKey::Data(id), &Data::V1(DataV1 { a: 5, b: 6 }));
    });

    // Read it — into_v2() migrates lazily; c must be None.
    let migrated = client.read_data(&id).unwrap();
    assert_eq!(migrated.a, 5);
    assert_eq!(migrated.b, 6);
    assert_eq!(migrated.c, None);

    // Write it back — write_data always stores Data::V2(...).
    client.write_data(&id, &migrated);

    // Confirm the stored enum variant is now V2, not V1.
    let stored: Data = env.as_contract(&contract_id, || {
        env.storage().persistent().get(&DataKey::Data(id))
    })
    .unwrap();

    match stored {
        Data::V2(v2) => {
            assert_eq!(v2.a, 5);
            assert_eq!(v2.b, 6);
            assert_eq!(v2.c, None);
        }
        Data::V1(_) => panic!("expected Data::V2 after write_data, found Data::V1"),
    }

    // Subsequent reads go through the V2 branch and return identical values.
    let result = client.read_data(&id).unwrap();
    assert_eq!(result.a, 5);
    assert_eq!(result.b, 6);
    assert_eq!(result.c, None);
}
```