A Tarantool role that exposes configurable HTTP health endpoints (e.g. /healthcheck), runs built-in checks (cluster and replication), executes your own checks, and can emit alerts.
- Quick start (working config)
- Why use it
- Configuration (from simple to advanced)
- Default checks
- Additional checks
- Custom checks (user-defined)
- Response format (default)

Create config.yml:

    roles_cfg:
      roles.healthcheck:
        http:
          - endpoints:
              - path: /healthcheck

    groups:
      group-001:
        replicasets:
          router:
            instances:
              router:
                roles: [roles.httpd, roles.healthcheck]
                roles_cfg:
                  roles.httpd:
                    default:
                      listen: '127.0.0.1:8081'

Create instances.yml:

    router:

Then initialize and start the instance with tt:

    tt init
    tt start
    curl http://127.0.0.1:8081/healthcheck
    {"status":"alive"}

After start, http://127.0.0.1:8081/healthcheck returns 200 when all checks pass, and 500 with details when some checks fail.
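
For scripted probing, the same request can be made from Python. This is a minimal sketch using only the standard library; the helper name `probe` is hypothetical and not part of the role. It shows one detail that is easy to miss: `urllib` raises `HTTPError` for a 500 response, yet the response body with the failure details is still readable from the exception.

```python
import json
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return (http_status, parsed_json_body) for a healthcheck URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the body is still available.
        status, raw = err.code, err.read()
    try:
        body = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Body is not JSON (for example, a custom response format).
        body = None
    return status, body
```

Calling `probe("http://127.0.0.1:8081/healthcheck")` against the instance started above should return the status code together with the decoded body. Connection errors (instance not running) are deliberately not caught here, so they surface to the caller.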

- HTTP endpoint(s) for liveness with meaningful failure reasons.
- Built-in defaults: Tarantool status (box.info.status) and ability to write snapshot/WAL files.
- Optional additional checks (e.g. replication).
- Custom criteria: add your own healthcheck.check_* functions.
- Optional alerts, rate limiting, and custom response formats.

The snippet above enables one endpoint at /healthcheck on the default HTTP server; you can add more paths/endpoints if needed.
For details on HTTP server configuration, see the tarantool/http README.

To serve an endpoint from a non-default HTTP server, reference that server by name:

    roles_cfg:
      roles.httpd:
        default:
          listen: '127.0.0.1:8081'
        additional:
          listen: '127.0.0.1:8082'
      roles.healthcheck:
        http:
          - server: additional
            endpoints:
              - path: /hc

To limit the request rate, set ratelim_rps:

    roles_cfg:
      roles.healthcheck:
        ratelim_rps: 5 # requests per second; null (default) disables
        http:
          - endpoints:
              - path: /healthcheck

Excess requests return 429.
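
A client that polls a rate-limited endpoint may want to back off when it sees 429 instead of treating it as a failed check. The sketch below is one possible approach, not something the role provides: `poll_with_backoff`, its parameters, and the delay values are all hypothetical. It takes any callable that returns `(http_status, body)`, so it does not depend on a particular HTTP library.

```python
import time


def poll_with_backoff(fetch, attempts=5, base_delay=0.2, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429.

    fetch is any callable returning (http_status, body).
    Returns the last (http_status, body) seen, which may still be 429
    if every attempt was rate-limited.
    """
    status, body = fetch()
    for attempt in range(1, attempts):
        if status != 429:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch()
    return status, body
```

With `ratelim_rps: 5`, a base delay of 0.2 seconds matches the interval between allowed requests; that pairing is an illustrative choice, not a documented recommendation.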

To raise alerts for failed checks, enable set_alerts:

    roles_cfg:
      roles.healthcheck:
        set_alerts: true
        http:
          - endpoints:
              - path: /healthcheck

Failed checks are mirrored into alerts. Alerts are visible via box.info.config.alerts (see the config.info() reference) and in the TCM web interface.

To choose which checks run, use checks.include and checks.exclude:

    roles_cfg:
      roles.healthcheck:
        checks:
          include: [all] # default
          exclude: ['replication.upstream_absent', 'replication.state_bad'] # default {}
        http:
          - endpoints:
              - path: /healthcheck

include and exclude apply to the built-in additional checks. If a check matches both lists, exclude wins. User checks run unless explicitly excluded.
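
The filter rules can be modelled in a few lines. This is a sketch of the behaviour as described here, not the role's source code. The function name `is_enabled` is hypothetical, and the matching rule is an assumption: an entry is taken to cover its exact key and any dotted sub-key, so that `replication.state_bad` covers `replication.state_bad.<peer>`.

```python
def is_enabled(check_key, include=("all",), exclude=(), user_check=False):
    """Model of the include/exclude rules (illustrative only)."""

    def matches(key, patterns):
        # Assumption: an entry matches its exact key and any dotted sub-key.
        return any(key == p or key.startswith(p + ".") for p in patterns)

    if matches(check_key, exclude):
        return False  # exclude wins over include
    if user_check:
        return True  # user checks run unless explicitly excluded
    return "all" in include or matches(check_key, include)
```

For example, `is_enabled("replication.state_bad.peer1", exclude=["replication.state_bad"])` evaluates to `False` even though `include` defaults to `all`.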

To change the response format, provide a formatter function in box.func that takes (is_healthy, details) and returns {status=<number>, headers=?, body=?}. For details on the HTTP request/response format, see Fields and methods of the request object.

    box.schema.func.create('custom_healthcheck_format', {
        language = 'LUA',
        body = [[
            function(is_healthy, details)
                local json = require('json')
                if is_healthy then
                    return { status = 200, body = json.encode({ok=true}) }
                end
                return {
                    status = 560,
                    headers = {['content-type'] = 'application/json'},
                    body = json.encode({errors = details}),
                }
            end
        ]]
    })

Use it in the endpoint:

    roles_cfg:
      roles.healthcheck:
        http:
          - endpoints:
              - path: /healthcheck
                format: custom_healthcheck_format

The default checks are:

| Check key | What it does | Fails when |
|---|---|---|
| check_box_info_status | box.info.status == 'running' | Tarantool status is not running |
| check_snapshot_dir | snapshot.dir exists (respecting work_dir) | Snapshot dir missing or inaccessible |
| check_wal_dir | wal.dir exists (respecting work_dir) | WAL dir missing or inaccessible |


| Key prefix / detail | Runs when | Fails when / detail example |
|---|---|---|
| replication.upstream_absent.<peer> | Replica nodes | No upstream for a peer; Replication from <peer> to <self> is not running |
| replication.state_bad.<peer> | Replica nodes | Upstream state not follow/sync; includes upstream state/message |

Additional checks are included by default; refine with checks.include / checks.exclude.
Only follow and sync states are considered healthy for replication.state_bad.*.
Any box.func named healthcheck.check_* is executed unless excluded. If a user-defined
check throws an error or returns a non-boolean result, the healthcheck stops iterating over
the remaining user checks; this fail-fast approach keeps broken checks visible and nudges
you to fix or exclude them explicitly.

    -- migration or role code
    box.schema.func.create('healthcheck.check_space_size', {
        if_not_exists = true,
        language = 'LUA',
        body = [[
            function()
                local limit = 10 * 1024 * 1024
                local used = box.space.my_space:bsize()
                if used > limit then
                    return false, 'my_space is larger than 10MB'
                end
                return true
            end
        ]]
    })

Exclude if needed:

    roles_cfg:
      roles.healthcheck:
        checks:
          exclude:
            - healthcheck.check_space_size
        http:
          - endpoints:
              - path: /healthcheck

By default, the endpoint responds as follows:

- 200 OK with body {"status":"alive"}
- 500 Internal Server Error with body {"status":"dead","details":["<key>: <reason>", ...]} (details sorted)
- Rate-limited requests return 429 with {"status":"rate limit exceeded"}
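
A monitoring script can turn these three default responses into a small result it can act on. The following is a minimal sketch that assumes the default format only; an endpoint with a custom formatter would need its own parsing. The name `summarize` and the returned state labels are hypothetical.

```python
import json


def summarize(http_status, raw_body):
    """Map a default-format response to (state, details).

    details is a list of (key, reason) pairs split from "<key>: <reason>".
    """
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    if http_status == 200 and payload.get("status") == "alive":
        return "alive", []
    if http_status == 429:
        return "rate_limited", []
    if http_status == 500 and payload.get("status") == "dead":
        pairs = []
        for item in payload.get("details", []):
            # Split on the first ": " only; the reason may contain colons.
            key, _, reason = str(item).partition(": ")
            pairs.append((key, reason))
        return "dead", pairs
    return "unknown", []
```

Anything that does not match the three documented shapes is reported as `unknown` rather than guessed at, which keeps a misconfigured or custom-formatted endpoint from being mistaken for a healthy one.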