Shift-left reliability for platform teams.
Define reliability requirements as code. Validate SLOs against dependency chains. Detect drift before incidents. Gate deployments on real data.
pip install nthlayer
Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.
NthLayer moves reliability left:
service.yaml → validate → check-deploy → deploy
                  │            │
                  │            └── Error budget ok? Drift acceptable?
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?
Predict SLO exhaustion before it happens. Don't wait for the budget to hit zero.
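To make the idea of projecting exhaustion concrete, here is a minimal sketch of a naive linear extrapolation. NthLayer's actual projection model is not described in this README, so the function name `days_until_exhaustion` and the linear assumption are illustrative only, and its results need not match the tool's output.

```python
from typing import Optional


def days_until_exhaustion(budget_remaining_pct: float,
                          trend_pct_per_day: float) -> Optional[float]:
    """Linearly extrapolate when the error budget reaches zero.

    Assumption: the budget keeps shrinking at a constant daily rate.
    Returns None when the budget is flat or recovering (trend >= 0).
    """
    if budget_remaining_pct <= 0:
        return 0.0  # already exhausted
    if trend_pct_per_day >= 0:
        return None  # no exhaustion projected under a linear model
    return budget_remaining_pct / -trend_pct_per_day


# Hypothetical figures: 60% of the budget left, shrinking 3 points per day.
print(days_until_exhaustion(60.0, -3.0))  # 20.0
```

A real drift detector would likely fit the trend over a window of observations rather than take a single rate as input.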
$ nthlayer drift payment-api
payment-api: CRITICAL
Current: 73.2% budget remaining
Trend: -2.1%/day (gradual decline)
Projection: Budget exhausts in 23 days
Recommendation: Investigate error rate increase before next release
Your SLO ceiling is your weakest dependency chain. NthLayer calculates it.
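For dependencies that must all be up for a request to succeed, the ceiling is the product of their availabilities. The sketch below assumes independent failures and strictly serial dependencies. The helper names are hypothetical, and whether NthLayer uses exactly this formula is an assumption, although it reproduces the 99.84% figure in the sample output that follows.

```python
import math


def serial_availability(dependency_slos_pct):
    """Availability ceiling (in percent) when every dependency must be up."""
    return 100.0 * math.prod(p / 100.0 for p in dependency_slos_pct)


def is_feasible(target_pct, dependency_slos_pct):
    """A target is feasible only if it does not exceed the serial ceiling."""
    return target_pct <= serial_availability(dependency_slos_pct)


# Dependency SLOs taken from the validate-slo sample output.
deps = {"postgresql": 99.95, "redis": 99.99, "user-service": 99.9}

ceiling = serial_availability(deps.values())
print(f"Serial availability: {ceiling:.2f}%")  # Serial availability: 99.84%
print(is_feasible(99.99, deps.values()))       # False
print(is_feasible(99.8, deps.values()))        # True
```

Because the product can never exceed its smallest factor, adding a dependency can only lower the ceiling.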
$ nthlayer validate-slo payment-api
Target: 99.99% availability
Dependencies:
→ postgresql (99.95%)
→ redis (99.99%)
→ user-service (99.9%)
Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%
Recommendation: Reduce target to 99.8% or improve user-service SLO
Block deploys when error budget is exhausted or drift is critical.
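The gating rule can be sketched as a small decision function. This is an illustration, not NthLayer's implementation: the function and enum names are hypothetical, and only exit code 2 (BLOCKED) is taken from the sample output. The success code 0 is an assumed convention.

```python
from enum import IntEnum


class GateResult(IntEnum):
    OK = 0       # assumed: conventional success exit code
    BLOCKED = 2  # matches "Exit code: 2 (BLOCKED)" in the sample output


def check_deploy(budget_remaining_minutes, drift_severity):
    """Block when the error budget is exhausted or drift is critical."""
    reasons = []
    if budget_remaining_minutes <= 0:
        reasons.append(
            f"Error budget: {budget_remaining_minutes:g} minutes (exhausted)"
        )
    if drift_severity == "critical":
        reasons.append("Drift severity: critical")
    result = GateResult.BLOCKED if reasons else GateResult.OK
    return result, reasons


result, reasons = check_deploy(budget_remaining_minutes=-47,
                               drift_severity="critical")
print(result.name, int(result))  # BLOCKED 2
for reason in reasons:
    print("-", reason)
```

In a pipeline, passing `int(result)` to `sys.exit` would let the CI system fail the job on a non-zero code.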
$ nthlayer check-deploy payment-api
ERROR: Deployment blocked
- Error budget: -47 minutes (exhausted)
- Drift severity: critical
- 3 P1 incidents in last 7 days
Exit code: 2 (BLOCKED)
Understand impact before making changes.
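Conceptually, a blast radius is a traversal of the reversed dependency graph: start from the service being changed and follow "who calls me" edges. The sketch below is a minimal breadth-first version under that assumption. The function name and the `storefront` service are hypothetical, and how NthLayer discovers the graph is not covered here.

```python
from collections import deque


def dependents_of(service, depends_on):
    """Return (direct, all_affected) dependents of `service`.

    `depends_on` maps each service to the services it calls.
    `all_affected` includes the direct dependents.
    """
    # Invert the graph: for each service, record who calls it.
    callers = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(svc)

    direct = set(callers.get(service, ()))
    all_affected = set()
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        if current in all_affected or current == service:
            continue  # already visited, or a cycle back to the origin
        all_affected.add(current)
        queue.extend(callers.get(current, ()))
    return direct, all_affected


# Illustrative graph; "storefront" is a made-up transitive dependent.
graph = {
    "checkout-service": ["payment-api"],
    "order-service": ["payment-api"],
    "refund-worker": ["payment-api"],
    "storefront": ["checkout-service"],
    "payment-api": ["postgresql", "redis"],
}
direct, all_affected = dependents_of("payment-api", graph)
print(sorted(direct))
print(sorted(all_affected))
```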
$ nthlayer blast-radius payment-api
Direct dependents (3):
• checkout-service (critical) - 847K req/day
• order-service (critical) - 523K req/day
• refund-worker (standard) - 12K req/day
Transitive impact: 12 services, 2.1M daily requests
Risk: HIGH - affects checkout flow
Enforce OpenTelemetry conventions. Know what's missing before production.
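At its core, this kind of check compares the metrics a service is expected to emit with the ones it actually exports. A minimal sketch, assuming the exported metric names are already available as a list (for example, pulled from a metrics backend); the function name `audit_metrics` is hypothetical:

```python
# Required metric names taken from the recommend-metrics sample output.
REQUIRED = {
    "http.server.request.duration",
    "http.server.active_requests",
}


def audit_metrics(required, exported):
    """Map each required metric name to True (found) or False (missing)."""
    exported = set(exported)
    return {name: name in exported for name in sorted(required)}


# Illustrative list of metrics the service currently exports.
exported_now = ["http.server.request.duration", "process.cpu.time"]

report = audit_metrics(REQUIRED, exported_now)
for name, found in report.items():
    print("✓" if found else "✗", name, "FOUND" if found else "MISSING")
```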
$ nthlayer recommend-metrics payment-api
Required (SLO-critical):
✓ http.server.request.duration FOUND
✗ http.server.active_requests MISSING
Run with --show-code for instrumentation examples.
Generate dashboards, alerts, and SLOs from a single spec.
$ nthlayer apply service.yaml
Generated:
→ dashboard.json (Grafana)
→ alerts.yaml (Prometheus)
→ recording-rules.yaml (Prometheus)
→ slos.yaml (OpenSLO)
# Install
pip install nthlayer
# Create a service spec
nthlayer init
# Validate and generate
nthlayer apply service.yaml
# Check deployment readiness
nthlayer check-deploy payment-api
name: payment-api
tier: critical
type: api
team: payments
dependencies:
- postgresql
- redis
NthLayer also supports the OpenSRM format (apiVersion: srm/v1) for contracts, deployment gates, and more. See the full spec reference for all options.
# GitHub Actions
- name: Validate reliability
  run: |
    nthlayer validate-slo ${{ matrix.service }}
    nthlayer check-deploy ${{ matrix.service }}
Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins
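For CI systems without a native step syntax, the same two checks can be driven from a small wrapper. The sketch below only uses the `validate-slo` and `check-deploy` commands shown above. It assumes both exit non-zero on failure (the README only shows exit code 2 for a blocked deploy), and the function name `run_gates` is hypothetical.

```python
import subprocess


def run_gates(services, run=subprocess.run):
    """Run validate-slo, then check-deploy, for each service.

    Returns the first non-zero exit code, or 0 if every check passes.
    `run` is injectable so the logic can be exercised without the CLI.
    """
    for service in services:
        for command in ("validate-slo", "check-deploy"):
            completed = run(["nthlayer", command, service])
            if completed.returncode != 0:
                return completed.returncode  # stop at the first failure
    return 0
```

A pipeline step could then call `sys.exit(run_gates(sys.argv[1:]))` so that a blocked deploy fails the job.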
| Traditional Approach | NthLayer |
|---|---|
| Set SLOs in isolation | Validate against dependency chains |
| Alert when budget exhausted | Predict exhaustion with drift detection |
| Discover missing metrics in incidents | Enforce before deployment |
| Manual dashboard creation | Generate from spec |
| "Is this ready?" = opinion | "Is this ready?" = deterministic check |
Full Documentation - Comprehensive guides and reference.
| Guide | Description |
|---|---|
| Quick Start | Get running in 5 minutes |
| Drift Detection | Predict SLO exhaustion |
| Dependency Discovery | Automatic dependency mapping |
| CI/CD Integration | Pipeline setup |
| CLI Reference | All commands |
- Artifact generation (dashboards, alerts, SLOs)
- Deployment gates (check-deploy)
- Error budget tracking
- Portfolio view
- Drift detection
- Dependency discovery
- validate-slo
- blast-radius
- Metric recommendations
- OpenSRM manifest format (srm/v1)
- Reliability scorecard
- Loki alert generation
- Recording rules generation
- Contract & dependency validation
- Intelligent alerts pipeline
- Identity resolution & ownership
- CI/CD GitHub Action
- MCP server integration
- Backstage plugin
# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup # Install deps, start services
make test # Run tests
See CONTRIBUTING.md for details.
MIT - See LICENSE.txt
Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.
