
Slow deployment with litefs on fly.io on 6.7gb database #446

@luisiacc

Description

I'm having an issue where my app, running on 2 machines on fly.io, currently takes about 3 minutes per machine to deploy, and about 80% of that time is spent in LiteFS.

I'm looking for help on how to fix this and make it fast; I don't think 6.7GB is that big of a database. While talking to the fly.io support team, we also saw CPU usage spike to almost 100% during that time. Here are some screenshots and logs of what happened.

LiteFS version: 0.5.14

[screenshot]

Here is a log of one of my deployments where one step takes almost 2 minutes:

08:13:15 level=INFO msg="initializing consul: key=iuspro-20250530/notario url=https://:656117fa-394f-1726-8904-f31ddd6cce70@consul-iad-11.fly-shared.net/notario-yexkqwp8dm79m38d/ hostname=4d894672a06148 advertise-url=http://4d894672a06148.vm.notario.internal:20202"

08:13:15 2025/12/15 08:13:15 INFO SSH listening listen_address=[fdaa:2:fb49:a7b:1eb:5c53:e84d:2]:22

08:13:15 level=INFO msg="wal-sync: short wal file exists on \"cache.db\", skipping sync with ltx"

08:13:16 Health check 'servicecheck-01-http-8080' on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.

08:13:16 Health check 'servicecheck-00-tcp-8080' on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.

08:13:23 level=INFO msg="wal-sync: database \"sqlite.db\" has wal size of 2319592 bytes within range of ltx file (@1161872, 1157720 bytes)"


====================== 2 minutes here ===============================================

08:15:00 level=INFO msg="using existing cluster id: \"LFSCD2310317B4BCC32D\""

08:15:00 level=INFO msg="LiteFS mounted to: /litefs/data"

08:15:00 level=INFO msg="http server listening on: http://localhost:20202"
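
For reference, this is the kind of thing that can be run from fly ssh console while a machine sits in that gap, to see what the WAL files and the LiteFS data directory look like (just a sketch: /litefs/data is the mount dir from the log above, /data/litefs is the data dir from my litefs.yml below, and it assumes top is available in the image):

# Run inside the machine (via `fly ssh console`) during the slow phase.
ls -lh /litefs/data       # mounted databases plus their -wal/-shm files
ls -lh /data/litefs       # LiteFS internal data directory (see litefs.yml below)
top -b -n 1 | head -n 15  # confirm it is the litefs process pinning the CPU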

During one of the deployments, while debugging, I ran litefs mount -tracing -fuse.debug to get more info. Here are the relevant logs, newest entries first (full logs of that session at https://pastebin.com/RR2nw7Yr):

2025/12/14 04:55:37.093550 [ApplyLTX(sqlite.db)]: txid=000000000001ba80-000000000001ba80 chksum=aa0e807bec87e251-c05a18b70e4846a8 commit=1582701 pageSize=4096 timestamp=2025-12-14T03:44:17Z mode=(WAL_MODE→WAL_MODE) path=000000000001ba80-000000000001ba80.ltx
2025/12/14 04:55:37.093445 [UpdateSHMDone(sqlite.db)]
2025/12/14 04:55:37.093122 [UpdateSHM(sqlite.db)]
2025/12/14 04:55:37.090544 [TruncateDatabase(sqlite.db)]: pageN=1582701 prevPageN=1582701 pageSize=4096
2025/12/14 04:55:37.090107 [WriteDatabasePage(sqlite.db)]: pgno=1582492 chksum=90fa73c90670675e prev=90fa73c90670675e
2025/12/14 04:55:37.089989 [WriteDatabasePage(sqlite.db)]: pgno=1449939 chksum=ce7cdf1ee1e5c640 prev=ce7cdf1ee1e5c640
2025/12/14 04:55:37.089562 [AcquireWriteLock.DONE(sqlite.db)]:
2025/12/14 04:55:37.089229 [AcquireWriteLock(sqlite.db)]:
======================================== 2 minutes here ==========================================
2025/12/14 04:53:28.153495 [Recover(sqlite.db)]:
2025/12/14 04:53:28.153420 [CheckpointDone(sqlite.db)] <nil>
2025/12/14 04:53:28.153378 [UpdateSHMDone(sqlite.db)]
2025/12/14 04:53:28.153112 [UpdateSHM(sqlite.db)]

My setup:

  • I have a Dockerfile with an entrypoint:
# entrypoint
#!/usr/bin/env sh

if [ "$FLY_PROCESS_GROUP" = "app" ]; then
    export LITEFS_EXEC_CMD="npm start"
    exec litefs mount
elif [ "$FLY_PROCESS_GROUP" = "worker" ]; then
    exec npm run worker
fi
# litefs.yml
fuse:
  # Required. This is the mount directory that applications will
  # use to access their SQLite databases.
  dir: '${LITEFS_DIR}'

data:
  # Path to internal data storage.
  dir: '/data/litefs'

proxy:
  # matches the internal_port in fly.toml
  addr: ':${INTERNAL_PORT}'
  target: 'localhost:${PORT}'
  db: '${DATABASE_FILENAME}'

# The lease section specifies how the cluster will be managed. We're using the
# "consul" lease type so that our application can dynamically change the primary.
#
# These environment variables will be available in your Fly.io application.
lease:
  type: 'consul'
  candidate: ${FLY_PROCESS_GROUP == "app"}
  promote: true
  advertise-url: 'http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202'

  consul:
    url: '${FLY_CONSUL_URL}'
    key: '<key-here>/${FLY_APP_NAME}'

exec:
  - cmd: 'npx --yes prisma migrate deploy'
    if-candidate: true

  # Set the journal mode for the database to WAL. This reduces concurrency deadlock issues
  - cmd: 'sqlite3 $DATABASE_PATH "PRAGMA journal_mode = WAL;"'
    if-candidate: true

  # Set the journal mode for the cache to WAL. This reduces concurrency deadlock issues
  - cmd: 'sqlite3 $CACHE_DATABASE_PATH "PRAGMA journal_mode = WAL;"'
    if-candidate: true

  - cmd: 'npx prisma generate'

  # Execute the command appropriate to the process group, based on the LITEFS_EXEC_CMD environment variable set by the entrypoint
  - cmd: '${LITEFS_EXEC_CMD}'
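
To rule out lease/candidate problems, this is a quick check that can be run after the mount finishes: LiteFS exposes a .primary file in the mount directory on replicas containing the primary's hostname (sketch; mount path taken from the logs above):

# Sketch: .primary only exists on replicas; on the primary it is absent.
if [ -f /litefs/data/.primary ]; then
    echo "replica; primary is $(cat /litefs/data/.primary)"
else
    echo "this node is the primary"
fi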
