Skip to content

Conversation

@corylanou
Copy link
Collaborator

@corylanou corylanou commented Nov 25, 2025

Summary

Fix variable scoping bug where compaction errors were being lost during restore operations.

The Bug

The error from c.Compact() was captured in an if-statement scope, but the outer err variable (which was nil after successful NewCompactor) was passed to CloseWithError. This caused compaction failures to result in misleading EOF errors instead of the actual error message.

// Before (bug):
if err := c.Compact(ctx); err != nil { ... }  // err scoped to if
_ = pw.CloseWithError(err)  // This was always nil!

// After (fix):
compactErr := c.Compact(ctx)
_ = pw.CloseWithError(compactErr)  // Now properly propagates

Relation to Issue #858

Note: The root cause of the GCS restore EOF errors reported in issue #858 was already fixed in PR #807 (commit 5cb6fd6), which translates size=0 to length=-1 for GCS range reads.

This PR fixes a secondary bug discovered during the investigation of #858. The user confirmed via git bisect that #807 resolves their issue.

Test Plan

  • All existing tests pass
  • Pre-commit hooks pass
  • Code reviewed

Related: #807

🤖 Generated with Claude Code

Add validation in Restore() to check that LTX files have valid sizes
before attempting to open and compact them. Files with size less than
the LTX header size (100 bytes) now return a clear error message
identifying the problematic file instead of the cryptic
"decode database: decode header: EOF" error.

Fixes #848

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@smerschjohann
Copy link

smerschjohann commented Nov 25, 2025

this can't be a real fix.

  1. Why are some files broken in the first place? It happens all the time. Even if I stop my app first, wait a bit, then stop litestream.

  2. If one file is broken (with this code), the restore will still abort.

With litestream v3 everything works fine (I downgraded for now)

@benbjohnson
Copy link
Owner

@smerschjohann No, this is not the fix. I talked with @corylanou offline about it and decided that a length check is probably a good idea but not the root cause.

@smerschjohann
Copy link

ah, I see. I just wanted to mention it. Thanks for looking into it.

@corylanou
Copy link
Collaborator Author

Thanks for the feedback @smerschjohann, and you raise the key question: why are files ending up broken in the first place?

You're right - this PR is defensive (better error message), not a root cause fix. As Ben mentioned, that's intentional while we investigate further.

My Investigation So Far

The error decode database: decode header: EOF follows this path:

  1. replica.go:485 - dec.DecodeDatabaseTo(f) wraps the error
  2. ltx/decoder.go - DecodeHeader() uses io.ReadFull() which returns EOF if it can't read the full header (100 bytes)
  3. ltx/compactor.go - The Compactor reads headers from input readers during restore, but doesn't include context about which specific input file failed

The restore flow:

flowchart TB
    A["1. CalcRestorePlan()"] --> B["2. LTXFiles()<br/><i>list files in storage</i>"]
    B --> C["3. OpenLTXFile()<br/><i>open each file</i>"]
    C --> D["4. ltx.NewCompactor()<br/><i>read headers from readers</i>"]
    D --> E["5. DecodeDatabaseTo()<br/><i>write to disk</i>"]
    E --> F[("Restored DB")]
Loading

Files examined

  • gs/replica_client.go:169-195 - OpenLTXFile() implementation
  • gs/replica_client.go:118-165 - WriteLTXFile() implementation
  • replica.go:406-497 - Restore() implementation
  • replica.go:499-600 - CalcRestorePlan() implementation

Potential causes I'm considering

  1. Empty/truncated files in storage - File exists but has 0 bytes or < 100 bytes
  2. File listed but doesn't exist - LTXFiles() returns file info, but file missing when OpenLTXFile() is called
  3. Race condition - File deleted/overwritten between listing and reading
  4. Incomplete upload - Write interrupted before completing
  5. GCS-specific issue - Something in the upload path not completing properly

Related: PR #807

I fixed extract timestamp from LTX header: EOF where GCS NewRangeReader(ctx, offset, length) treats length=0 as "return 0 bytes" instead of "read to EOF". That fix is in OpenLTXFile() which is used by both compaction and restore. Since you're still seeing this error, it suggests the files in storage ARE actually empty/truncated - not that we're failing to read them correctly.

Key difference from v0.3.x

  • v0.3.x used single-level storage with NewReader() (always streamed full object)
  • v0.5.x uses multi-level LTX with NewRangeReader() (supports partial reads)
  • The entire storage layer was rewritten

To Help Diagnose

@smerschjohann - Since you can reproduce this consistently and you're asking the right question (why are files broken), could you provide:

1. File listing when the error occurs:

gsutil ls -la gs://your-bucket/path/

This will confirm if files are actually 0 bytes or suspiciously small (< 100 bytes).

2. Debug logs during replication (before you stop litestream):

# Run litestream with debug logging
litestream replicate -log-level debug -config litestream.yml

This might show what's happening when files are being written.

3. Debug logs during restore:

litestream restore -log-level debug -o test.db gs://your-bucket/path

This will show which specific file it fails on.


For @benbjohnson

Any thoughts on other areas worth diving deeper into?

  1. Other code paths that could cause this EOF?
  2. Something in how CalcRestorePlan builds the file list vs what's actually readable?
  3. Why files might end up empty/truncated during normal replication? smerschjohann says this happens "all the time" even with careful shutdown - that suggests something in the write path rather than a crash/interrupt issue.

@smerschjohann
Copy link

there is no -log-level flag


❯ export LITESTREAM_LOG_LEVEL=DEBUG 
❯ export REPLICA_URL=gs://BUCKET_NAME/test 
❯ cat litestream.yml
dbs:
  - path: ./database.db
    replicas:
      - url: ${REPLICA_URL}
      
❯ ./litestream replicate -config ./litestream.yml
time=2025-11-26T21:32:48.710+01:00 level=INFO msg=litestream version=0.5.2 level=""
time=2025-11-26T21:32:48.711+01:00 level=INFO msg="initialized db" path=/home/simon/tmp/test/database.db
time=2025-11-26T21:32:48.711+01:00 level=INFO msg="replicating to" type=gs sync-interval=1s bucket=BUCKET_NAME path=test
time=2025-11-26T21:32:48.713+01:00 level=INFO msg="starting compaction monitor" level=9 interval=24h0m0s
time=2025-11-26T21:32:48.713+01:00 level=INFO msg="starting compaction monitor" level=2 interval=5m0s
time=2025-11-26T21:32:48.713+01:00 level=INFO msg="starting compaction monitor" level=1 interval=30s
time=2025-11-26T21:32:48.713+01:00 level=INFO msg="starting compaction monitor" level=3 interval=1h0m0s
time=2025-11-26T21:32:49.349+01:00 level=ERROR msg="compaction failed" level=9 error="page size not initialized yet"
time=2025-11-26T21:33:00.269+01:00 level=INFO msg="compaction complete" level=1 txid.min=0000000000000001 txid.max=0000000000000001 size=7484
time=2025-11-26T21:34:30.387+01:00 level=INFO msg="compaction complete" level=1 txid.min=0000000000000002 txid.max=0000000000000002 size=548
^Ctime=2025-11-26T21:34:38.711+01:00 level=INFO msg="signal received, litestream shutting down"
time=2025-11-26T21:34:38.773+01:00 level=INFO msg="litestream shut down"

# (while litestream replicate was running the application added a single row

❯ gsutil ls -la gs://BUCKET_NAME/test/
                                 gs://BUCKET_NAME/test/ltx/
❯ gsutil ls -la gs://BUCKET_NAME/test/ltx/
                                 gs://BUCKET_NAME/test/ltx/0/
                                 gs://BUCKET_NAME/test/ltx/1/
❯ gsutil ls -la gs://BUCKET_NAME/test/ltx/0/
       548  2025-11-26T20:34:03Z  gs://BUCKET_NAME/test/ltx/0/0000000000000002-0000000000000002.ltx#1764189243815797  metageneration=1
TOTAL: 1 objects, 548 bytes (548 B)
❯ gsutil ls -la gs://BUCKET_NAME/test/ltx/1/
      7484  2025-11-26T20:33:00Z  gs://BUCKET_NAME/test/ltx/1/0000000000000001-0000000000000001.ltx#1764189180101881  metageneration=1
       548  2025-11-26T20:34:30Z  gs://BUCKET_NAME/test/ltx/1/0000000000000002-0000000000000002.ltx#1764189270117818  metageneration=1
TOTAL: 2 objects, 8032 bytes (7.84 KiB)


❯ ./litestream restore -o database.db gs://BUCKET_NAME/test
time=2025-11-26T21:37:39.853+01:00 level=ERROR msg="failed to run" error="decode database: decode header: EOF"

@smerschjohann
Copy link

ok found out that it is LOG_LEVEL=DEBUG


❯ export REPLICA_URL=gs://BUCKET_NAME/test   
❯ export LOG_LEVEL=DEBUG  
❯ cat litestream.yml
dbs:
  - path: ./database.db
    replicas:
      - url: ${REPLICA_URL}

❯ ./litestream replicate -config ./litestream.yml
time=2025-11-26T21:49:55.636+01:00 level=INFO msg=litestream version=0.5.2 level=DEBUG
time=2025-11-26T21:49:55.636+01:00 level=INFO msg="initialized db" path=/home/simon/tmp/test/database.db
time=2025-11-26T21:49:55.636+01:00 level=INFO msg="replicating to" type=gs sync-interval=1s bucket=BUCKET_NAME path=test
time=2025-11-26T21:49:55.638+01:00 level=INFO msg="starting compaction monitor" level=9 interval=24h0m0s
time=2025-11-26T21:49:55.639+01:00 level=INFO msg="starting compaction monitor" level=1 interval=30s
time=2025-11-26T21:49:55.639+01:00 level=INFO msg="starting compaction monitor" level=2 interval=5m0s
time=2025-11-26T21:49:55.639+01:00 level=INFO msg="starting compaction monitor" level=3 interval=1h0m0s
time=2025-11-26T21:49:56.325+01:00 level=ERROR msg="compaction failed" level=9 error="page size not initialized yet"
time=2025-11-26T21:49:56.438+01:00 level=DEBUG msg="no compaction" level=1 path=/home/simon/tmp/test/database.db
time=2025-11-26T21:49:56.550+01:00 level=DEBUG msg="no compaction" level=3 path=/home/simon/tmp/test/database.db
time=2025-11-26T21:49:56.550+01:00 level=DEBUG msg="no compaction" level=2 path=/home/simon/tmp/test/database.db
time=2025-11-26T21:49:56.769+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000001 offset=32 snap=true
time=2025-11-26T21:49:56.799+01:00 level=DEBUG msg="db sync" db=database.db status=ok
time=2025-11-26T21:49:56.863+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:49:56.989+01:00 level=DEBUG msg="ltx file uploaded" db=database.db replica=gs filename=/home/simon/tmp/test/.database.db-litestream/ltx/0/0000000000000001-0000000000000001.ltx minTXID=0000000000000001 maxTXID=0000000000000001
time=2025-11-26T21:49:57.326+01:00 level=DEBUG msg="enforcing snapshot retention" db=database.db timestamp=2025-11-25T21:49:57.326+01:00
time=2025-11-26T21:49:57.380+01:00 level=DEBUG msg="enforcing retention" db=database.db level=1 txid=0000000000000000
time=2025-11-26T21:49:57.433+01:00 level=DEBUG msg="enforcing retention" db=database.db level=2 txid=0000000000000000
time=2025-11-26T21:49:57.487+01:00 level=DEBUG msg="enforcing retention" db=database.db level=3 txid=0000000000000000
time=2025-11-26T21:49:57.646+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:49:57.646+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:49:57.647+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:49:57.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:49:58.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:49:58.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:49:58.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:49:58.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:49:59.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:49:59.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:49:59.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:49:59.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:00.000+01:00 level=DEBUG msg="no compaction" level=2 path=/home/simon/tmp/test/database.db
time=2025-11-26T21:50:00.269+01:00 level=DEBUG msg="enforcing retention" db=database.db level=0 txid=0000000000000001
time=2025-11-26T21:50:00.330+01:00 level=INFO msg="compaction complete" level=1 txid.min=0000000000000001 txid.max=0000000000000001 size=7484
time=2025-11-26T21:50:00.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:00.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:00.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:00.772+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:01.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:01.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:01.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:01.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:02.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:02.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:02.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:02.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:03.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:03.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:03.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:03.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:04.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:04.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:04.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:04.773+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:05.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:05.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:05.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:05.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:06.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:06.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:06.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:06.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:07.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:07.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:07.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:07.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:08.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:08.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:08.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:08.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:09.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:09.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:09.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:09.772+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:10.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:10.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:10.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:10.773+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:11.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:11.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:11.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:11.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:12.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:12.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:12.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:12.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:13.640+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:13.641+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:13.642+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:13.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:14.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:14.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:14.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:14.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:15.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:15.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:15.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:15.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:16.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:16.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:16.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:16.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:17.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:17.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:17.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:17.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:18.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:18.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:18.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:18.772+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:19.640+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:19.641+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:19.642+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:19.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:20.639+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:20.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:20.642+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:20.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:21.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:21.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:21.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:21.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:22.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:22.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:22.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:22.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:23.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:23.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:23.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:23.772+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:24.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:24.640+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:24.641+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:24.772+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000001
time=2025-11-26T21:50:25.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=300792
time=2025-11-26T21:50:25.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:25.640+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000002 offset=304912
time=2025-11-26T21:50:25.670+01:00 level=DEBUG msg="db sync" db=database.db status=ok
time=2025-11-26T21:50:25.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:25.886+01:00 level=DEBUG msg="ltx file uploaded" db=database.db replica=gs filename=/home/simon/tmp/test/.database.db-litestream/ltx/0/0000000000000002-0000000000000002.ltx minTXID=0000000000000002 maxTXID=0000000000000002
time=2025-11-26T21:50:26.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:26.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:26.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:26.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:27.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:27.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:27.639+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:27.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:28.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:28.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:28.639+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:28.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:29.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:29.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:29.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:29.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:30.216+01:00 level=DEBUG msg="enforcing retention" db=database.db level=0 txid=0000000000000002
time=2025-11-26T21:50:30.277+01:00 level=DEBUG msg="deleting ltx file" gs.level=0 gs.minTXID=0000000000000001 gs.maxTXID=0000000000000001 gs.key=test/ltx/0/0000000000000001-0000000000000001.ltx
time=2025-11-26T21:50:30.353+01:00 level=DEBUG msg="deleting local ltx file" db=database.db level=0 minTXID=0000000000000001 maxTXID=0000000000000001 path=/home/simon/tmp/test/.database.db-litestream/ltx/0/0000000000000001-0000000000000001.ltx
time=2025-11-26T21:50:30.353+01:00 level=INFO msg="compaction complete" level=1 txid.min=0000000000000002 txid.max=0000000000000002 size=548
time=2025-11-26T21:50:30.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:30.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:30.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:30.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:31.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:31.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:31.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:31.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:32.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:32.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:32.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:32.769+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:33.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:33.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:33.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:33.769+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:34.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:34.639+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:34.639+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:34.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:35.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:35.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:35.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:35.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:36.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:36.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:36.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:36.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:37.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:37.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:37.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:37.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:38.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:38.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:38.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:38.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:39.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:39.637+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:39.637+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:39.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:40.637+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:40.637+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:40.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:40.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:41.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:41.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:41.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:41.771+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:42.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:42.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:42.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:42.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:43.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:43.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:43.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:43.770+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:44.638+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:44.638+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:44.638+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:44.769+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
^Ctime=2025-11-26T21:50:45.196+01:00 level=INFO msg="signal received, litestream shutting down"
time=2025-11-26T21:50:45.197+01:00 level=DEBUG msg=verify saltMatch=true prevWALOffset=313152
time=2025-11-26T21:50:45.197+01:00 level=DEBUG msg=verify.2 lastPageMatch=true
time=2025-11-26T21:50:45.197+01:00 level=DEBUG msg=sync db=database.db txid=0000000000000003 offset=317272
time=2025-11-26T21:50:45.197+01:00 level=DEBUG msg="replica sync" db=database.db replica=gs txid=0000000000000002
time=2025-11-26T21:50:45.237+01:00 level=ERROR msg="failed to close database" error="sql: transaction has already been committed or rolled back"
time=2025-11-26T21:50:45.237+01:00 level=INFO msg="litestream shut down"




❯ gsutil ls -lar gs://BUCKET_NAME/test/
gs://BUCKET_NAME/test/:

gs://BUCKET_NAME/test/ltx/:

gs://BUCKET_NAME/test/ltx/0/:
       548  2025-11-26T20:50:25Z  gs://BUCKET_NAME/test/ltx/0/0000000000000002-0000000000000002.ltx#1764190225769698  metageneration=1

gs://BUCKET_NAME/test/ltx/1/:
      7484  2025-11-26T20:50:00Z  gs://BUCKET_NAME/test/ltx/1/0000000000000001-0000000000000001.ltx#1764190200145270  metageneration=1
       548  2025-11-26T20:50:30Z  gs://BUCKET_NAME/test/ltx/1/0000000000000002-0000000000000002.ltx#1764190230100306  metageneration=1
TOTAL: 3 objects, 8580 bytes (8.38 KiB)





❯ ./litestream restore -o database.db gs://BUCKET_NAME/test
time=2025-11-26T21:52:19.060+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=8
time=2025-11-26T21:52:19.118+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=7
time=2025-11-26T21:52:19.176+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=6
time=2025-11-26T21:52:19.233+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=5
time=2025-11-26T21:52:19.289+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=4
time=2025-11-26T21:52:19.345+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=3
time=2025-11-26T21:52:19.399+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=2
time=2025-11-26T21:52:19.449+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=1
time=2025-11-26T21:52:19.506+01:00 level=DEBUG msg="matching LTX file for restore" replica=gs target=0000000000000000 filename=0000000000000001-0000000000000001.ltx
time=2025-11-26T21:52:19.506+01:00 level=DEBUG msg="matching LTX file for restore" replica=gs target=0000000000000000 filename=0000000000000002-0000000000000002.ltx
time=2025-11-26T21:52:19.506+01:00 level=DEBUG msg="finding ltx files for level" replica=gs target=0000000000000000 level=0
time=2025-11-26T21:52:19.573+01:00 level=DEBUG msg="restore plan" replica=gs n=2 txid=0000000000000002 timestamp=2025-11-26T20:50:25.640Z
time=2025-11-26T21:52:19.574+01:00 level=DEBUG msg="opening ltx file for restore" replica=gs level=1 min=0000000000000001 max=0000000000000001
time=2025-11-26T21:52:19.632+01:00 level=DEBUG msg="opening ltx file for restore" replica=gs level=1 min=0000000000000002 max=0000000000000002
time=2025-11-26T21:52:19.689+01:00 level=DEBUG msg="compacting into database" replica=gs path=database.db.tmp n=2
time=2025-11-26T21:52:19.690+01:00 level=ERROR msg="failed to run" error="decode database: decode header: EOF"

@corylanou
Copy link
Collaborator Author

@smerschjohann Thanks for those logs - they're really helpful.

So the compaction shows size=7484 and gsutil confirms the files exist with correct sizes, but restore still gets EOF. I want to check if the files in GCS are actually corrupted or if there's a bug in the read path.

Can you run these to dump the actual file content?

# Check if the file has valid LTX header (should start with "LTX1" / 4c 54 58 31)
gsutil cat gs://BUCKET_NAME/test/ltx/1/0000000000000001-0000000000000001.ltx | xxd | head -10

# Or download and check locally
gsutil cp gs://BUCKET_NAME/test/ltx/1/0000000000000001-0000000000000001.ltx /tmp/test.ltx
ls -la /tmp/test.ltx
xxd /tmp/test.ltx | head -20

If the local litestream cache still exists (run from your test directory):

cd /home/simon/tmp/test
xxd .database.db-litestream/ltx/1/0000000000000001-0000000000000001.ltx | head -20

If the hex dump shows valid LTX headers (4c 54 58 31 at the start), then the bug is in restore's read path. If the file is empty or garbage, then the upload is corrupting the data somehow.

@smerschjohann
Copy link

The files look good.

❯ gsutil cat gs://BUCKET_NAME/test/ltx/1/0000000000000001-0000000000000001.ltx | xxd | head -10
00000000: 4c54 5831 0000 0002 0000 1000 0000 001f  LTX1............
00000010: 0000 0000 0000 0001 0000 0000 0000 0001  ................
00000020: 0000 019a c1ee 1423 0000 0000 0000 0000  .......#........
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0001 0000 0422 4d18 6440  ..........."M.d@
00000070: a74d 0600 00f6 1253 514c 6974 6520 666f  .M.....SQLite fo
00000080: 726d 6174 2033 0010 0002 0200 4020 2000  rmat 3......@  .
00000090: 0000 0300 0000 1f00 0100 5510 0000 0004  ..........U.....

Maybe this line from the replicate command is also relevant?

time=2025-11-26T21:50:45.237+01:00 level=ERROR msg="failed to close database" error="sql: transaction has already been committed or rolled back"

corylanou added a commit that referenced this pull request Nov 27, 2025
…failures

- Add detailed restore plan logging showing each file's level, TXID range, and size
- Fix compaction error capture (was being swallowed due to variable scoping)
- Improve decode error message to include first file info for debugging EOF errors
- Add warning when restore plan starts from non-TXID-1 (seeded replica detection)
- Add GCS-specific fail-fast for truncated files (< header size)
- Add debug logging when opening LTX files from GCS

These changes help diagnose issue #858 where GCS restores fail with
"decode database: decode header: EOF" by providing better visibility
into the restore plan and catching truncated files early.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@corylanou corylanou changed the title fix(restore): validate LTX file sizes before compaction fix(restore): improve diagnostics and error handling for GCS restore failures Nov 27, 2025
@smerschjohann
Copy link

I just checked out your branch and tried with it, and it works. Multiple times, when I use the executable from the release page it does not work.. Currently try to figure out if it is one of the static build flags

@smerschjohann
Copy link

I successfully dissected the issue and this commit fixes it: 5cb6fd6

Can you create another release?

Fix variable scoping bug where compaction errors were being lost.
The error from c.Compact() was captured in an if-statement scope,
but the outer `err` variable (nil on NewCompactor success) was
passed to CloseWithError, causing compaction failures to result
in misleading EOF errors instead of the actual error.

Note: The root cause of issue #858 (GCS restore EOF errors) was
already fixed in PR #807 (commit 5cb6fd6) which translates size=0
to length=-1 for GCS range reads. This PR fixes a secondary bug
discovered during investigation.

Related: #807

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@corylanou corylanou force-pushed the fix/848-restore-empty-file-error branch from 51ddcea to e4c9b89 Compare November 27, 2025 21:57
@corylanou corylanou changed the title fix(restore): improve diagnostics and error handling for GCS restore failures fix(restore): properly propagate compaction errors during restore Nov 27, 2025
@corylanou
Copy link
Collaborator Author

Thanks for the git bisect work @smerschjohann! That confirms it - commit 5cb6fd6 (PR #807) is indeed the fix for your issue.

Why we couldn't reproduce: We were testing on the main branch which already includes that fix. You were running v0.5.2, which was released before PR #807 was merged. That's my fault - I should have tried to replicate from your version instead of the current main branch. Sorry it took so long to diagnose!

Good news: A new release is coming soon - we're preparing to release the VFS functionality, and that release will include the fix from #807.

This PR (#858) started as an investigation into your issue, but since #807 already fixed the root cause, we've repurposed it to fix a secondary bug we discovered during the investigation (compaction errors weren't being properly propagated).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants