So I'm having this issue where my app, which runs 2 machines on Fly.io, currently takes about 3 minutes per machine to deploy, and about 80% of that time is taken up by LiteFS.
I'm looking for help on how to fix this and make it fast; I don't think 6.7 GB is that big of a database. While talking with the Fly.io support team, we also noticed CPU usage spike to almost 100% during that window. Here are some pictures and logs of what happened.
LiteFS version: 0.5.14
Here is a log of one of my deployments where a single step takes almost 2 minutes:
08:13:15 level=INFO msg="initializing consul: key=iuspro-20250530/notario url=https://:656117fa-394f-1726-8904-f31ddd6cce70@consul-iad-11.fly-shared.net/notario-yexkqwp8dm79m38d/ hostname=4d894672a06148 advertise-url=http://4d894672a06148.vm.notario.internal:20202"
08:13:15 2025/12/15 08:13:15 INFO SSH listening listen_address=[fdaa:2:fb49:a7b:1eb:5c53:e84d:2]:22
08:13:15 level=INFO msg="wal-sync: short wal file exists on \"cache.db\", skipping sync with ltx"
08:13:16 Health check 'servicecheck-01-http-8080' on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
08:13:16 Health check 'servicecheck-00-tcp-8080' on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
08:13:23 level=INFO msg="wal-sync: database \"sqlite.db\" has wal size of 2319592 bytes within range of ltx file (@1161872, 1157720 bytes)"
====================== 2 minutes here ===============================================
08:15:00 level=INFO msg="using existing cluster id: \"LFSCD2310317B4BCC32D\""
08:15:00 level=INFO msg="LiteFS mounted to: /litefs/data"
08:15:00 level=INFO msg="http server listening on: http://localhost:20202"
During one of the deployments, while debugging, I ran litefs mount -tracing -fuse.debug to get more info. Here are the relevant trace entries, newest first (full logs of that session: https://pastebin.com/RR2nw7Yr):
2025/12/14 04:55:37.093550 [ApplyLTX(sqlite.db)]: txid=000000000001ba80-000000000001ba80 chksum=aa0e807bec87e251-c05a18b70e4846a8 commit=1582701 pageSize=4096 timestamp=2025-12-14T03:44:17Z mode=(WAL_MODE→WAL_MODE) path=000000000001ba80-000000000001ba80.ltx
2025/12/14 04:55:37.093445 [UpdateSHMDone(sqlite.db)]
2025/12/14 04:55:37.093122 [UpdateSHM(sqlite.db)]
2025/12/14 04:55:37.090544 [TruncateDatabase(sqlite.db)]: pageN=1582701 prevPageN=1582701 pageSize=4096
2025/12/14 04:55:37.090107 [WriteDatabasePage(sqlite.db)]: pgno=1582492 chksum=90fa73c90670675e prev=90fa73c90670675e
2025/12/14 04:55:37.089989 [WriteDatabasePage(sqlite.db)]: pgno=1449939 chksum=ce7cdf1ee1e5c640 prev=ce7cdf1ee1e5c640
2025/12/14 04:55:37.089562 [AcquireWriteLock.DONE(sqlite.db)]:
2025/12/14 04:55:37.089229 [AcquireWriteLock(sqlite.db)]:
======================================== 2 minutes here ==========================================
2025/12/14 04:53:28.153495 [Recover(sqlite.db)]:
2025/12/14 04:53:28.153420 [CheckpointDone(sqlite.db)] <nil>
2025/12/14 04:53:28.153378 [UpdateSHMDone(sqlite.db)]
2025/12/14 04:53:28.153112 [UpdateSHM(sqlite.db)]
My setup:
- I have a Dockerfile with an entrypoint (see the Dockerfile sketch after the script):
// entrypoint
#!/usr/bin/env sh
if [ "$FLY_PROCESS_GROUP" = "app" ]; then
  export LITEFS_EXEC_CMD="npm start"
  exec litefs mount
elif [ "$FLY_PROCESS_GROUP" = "worker" ]; then
  exec npm run worker
fi
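For context, my Dockerfile is wired up roughly like this. Treat it as a sketch rather than the exact file: the base image, package list, and file paths below are assumptions, with the LiteFS binary copied in from the official flyio/litefs image the way the LiteFS docs suggest.
// Dockerfile (sketch; base image, packages, and paths are assumptions)
FROM node:20-slim

# LiteFS ships as a static binary inside the flyio/litefs image.
COPY --from=flyio/litefs:0.5 /usr/local/bin/litefs /usr/local/bin/litefs

# fuse3 is required for the LiteFS FUSE mount; sqlite3 is used by the PRAGMA exec steps in litefs.yml.
RUN apt-get update -y && apt-get install -y ca-certificates fuse3 sqlite3

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .

# LiteFS config and the process-group entrypoint shown above.
COPY litefs.yml /etc/litefs.yml
COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint
RUN chmod +x /usr/local/bin/docker-entrypoint
ENTRYPOINT ["docker-entrypoint"]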
// litefs.yml
fuse:
  # Required. This is the mount directory that applications will
  # use to access their SQLite databases.
  dir: '${LITEFS_DIR}'

data:
  # Path to internal data storage.
  dir: '/data/litefs'

proxy:
  # matches the internal_port in fly.toml
  addr: ':${INTERNAL_PORT}'
  target: 'localhost:${PORT}'
  db: '${DATABASE_FILENAME}'

# The lease section specifies how the cluster will be managed. We're using the
# "consul" lease type so that our application can dynamically change the primary.
#
# These environment variables will be available in your Fly.io application.
lease:
  type: 'consul'
  candidate: ${FLY_PROCESS_GROUP == "app"}
  promote: true
  advertise-url: 'http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202'

  consul:
    url: '${FLY_CONSUL_URL}'
    key: '<key-here>/${FLY_APP_NAME}'

exec:
  - cmd: 'npx --yes prisma migrate deploy'
    if-candidate: true

  # Set the journal mode for the database to WAL. This reduces concurrency deadlock issues.
  - cmd: 'sqlite3 $DATABASE_PATH "PRAGMA journal_mode = WAL;"'
    if-candidate: true

  # Set the journal mode for the cache to WAL. This reduces concurrency deadlock issues.
  - cmd: 'sqlite3 $CACHE_DATABASE_PATH "PRAGMA journal_mode = WAL;"'
    if-candidate: true

  - cmd: 'npx prisma generate'

  # Execute the command appropriate to the process group, based on the LITEFS_EXEC_CMD environment variable.
  - cmd: '${LITEFS_EXEC_CMD}'
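For reference, here is roughly how this lines up with fly.toml. Again a sketch: the volume name, app port, and env values below are assumptions rather than the exact file, though internal_port 8080 and the /litefs/data mount dir match the logs above.
// fly.toml (sketch; values are illustrative assumptions)
app = "notario"

[http_service]
  internal_port = 8080          # the LiteFS proxy addr ':${INTERNAL_PORT}' listens here
  force_https = true
  processes = ["app"]           # the worker group is not a lease candidate

[[mounts]]
  source = "litefs"             # Fly volume that backs /data/litefs
  destination = "/data"
  processes = ["app"]

[env]
  PORT = "3000"                 # port the proxy forwards to ('localhost:${PORT}')
  INTERNAL_PORT = "8080"        # must match internal_port above
  LITEFS_DIR = "/litefs/data"   # FUSE mount dir, matches "LiteFS mounted to: /litefs/data"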