Skip to content

DB init does not wait for shutdown before starting main db process #730

@das7pad

Description

@das7pad

We are using a db-init script for setting up the mongo replica set in CI. The process has been quite flaky following the upgrade to mongodb 8.

https://github.com/overleaf/overleaf/blob/898a0b534ec2b3832beda1ac829e8951250b1a20/server-ce/test/docker-compose.yml#L36-L40

https://github.com/overleaf/overleaf/blob/898a0b534ec2b3832beda1ac829e8951250b1a20/bin/shared/mongodb-init-replica-set.js#L3

Taking a look at the entrypoint suggests that the shutdown of the init db is merely initiated, but not awaited. When the main db process is started, the init db might still be running.

...
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:08.804+00:00"},"s":"I",  "c":"STORAGE",  "id":22279,   "ctx":"SignalHandler","msg":"shutdown: removing fs lock..."}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:08.804+00:00"},"s":"I",  "c":"STORAGE",  "id":7474902, "ctx":"SignalHandler","msg":"Shutting down oplog cap maintainer thread"}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.000+00:00"},"s":"W",  "c":"REPL",     "id":6100702, "ctx":"ftdc","msg":"Failed to get last stable recovery timestamp due to lock acquire timeout. Note this is expected if shutdown is in progress."}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.002+00:00"},"s":"I",  "c":"STORAGE",  "id":7474901, "ctx":"SignalHandler","msg":"Finished shutting down oplog cap maintainer thread"}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.002+00:00"},"s":"I",  "c":"-",        "id":4784931, "ctx":"SignalHandler","msg":"Dropping the scope cache for shutdown"}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.002+00:00"},"s":"I",  "c":"FTDC",     "id":20626,   "ctx":"SignalHandler","msg":"Shutting down full-time diagnostic data capture"}
17:17:11  mongo-1  | 
17:17:11  mongo-1  | MongoDB init process complete; ready for start up.
17:17:11  mongo-1  | 
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.776+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.776+00:00"},"s":"I",  "c":"CONTROL",  "id":5945603, "ctx":"main","msg":"Multi threading initialized"}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.776+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set at least one of the related parameters","attr":{"relatedParameters":["tcpFastOpenServer","tcpFastOpenClient","tcpFastOpenQueueSize"]}}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.777+00:00"},"s":"I",  "c":"NETWORK",  "id":4915701, "ctx":"main","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":25},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":25},"outgoing":{"minWireVersion":6,"maxWireVersion":25},"isInternalClient":true}}}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.777+00:00"},"s":"I",  "c":"TENANT_M", "id":7091600, "ctx":"main","msg":"Starting TenantMigrationAccessBlockerRegistry"}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.777+00:00"},"s":"I",  "c":"CONTROL",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"886f613caec4"}}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.777+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"8.0.11","gitVersion":"bed99f699da6cb2b74262aa6d473446c41476643","openSSLVersion":"OpenSSL 3.0.13 30 Jan 2024","modules":[],"allocator":"tcmalloc-google","environment":{"distmod":"ubuntu2404","distarch":"x86_64","target_arch":"x86_64"}}}}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.777+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"24.04"}}}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.777+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"},"replication":{"replSet":"overleaf"}}}}
17:17:11  mongo-1  | {"t":{"$date":"2025-08-11T16:17:09.778+00:00"},"s":"E",  "c":"CONTROL",  "id":20568,   "ctx":"initandlisten","msg":"Error setting up listener","attr":{"error":{"code":9001,"codeName":"SocketException","errmsg":"0.0.0.0:27017 :: caused by :: setup bind :: caused by :: Address already in use"}}}

Suggested fix: Wait for the pid ahead of proceeding with the startup of the main db.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions