Generate the complete reliability stack from a service spec in 5 minutes. Dashboards, alerts, SLOs, PagerDuty - zero toil.

rsionnach/nthlayer


NthLayer

Shift-left reliability for platform teams.

Define reliability requirements as code. Validate SLOs against dependency chains. Detect drift before incidents. Gate deployments on real data.

Status: Alpha | License: MIT

TL;DR

pip install nthlayer
nthlayer check-deploy demo

⚠️ The Problem

Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.

💡 The Solution

NthLayer moves reliability left:

service.yaml → validate → check-deploy → deploy
                  │            │
                  │            └── Error budget ok? Drift acceptable?
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?

⚡ Core Features

Drift Detection

Predict error budget exhaustion before it happens. Don't wait for the budget to hit zero.

$ nthlayer drift payment-api

payment-api: CRITICAL
  Current: 73.2% budget remaining
  Trend: -2.1%/day (gradual decline)
  Projection: Budget exhausts in 23 days

  Recommendation: Investigate error rate increase before next release

Dependency-Aware SLO Validation

Your SLO can't exceed the combined availability of the dependencies you call in series, which is lower than that of any single dependency. NthLayer calculates that ceiling.

$ nthlayer validate-slo payment-api

Target: 99.99% availability
Dependencies:
  → postgresql (99.95%)
  → redis (99.99%)
  → user-service (99.9%)

Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%

Recommendation: Reduce target to 99.8% or improve user-service SLO
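The 99.84% figure is the product of the three dependency availabilities: a request that needs all three succeeds only when all three do. A minimal sketch of that arithmetic, assuming independent failures and strictly serial calls; `serial_availability` is an illustrative name, not NthLayer's API, and how NthLayer treats parallel, cached, or optional dependencies is not covered here.

```python
from math import prod


def serial_availability(dependency_slos: dict[str, float]) -> float:
    """Availability ceiling when every dependency is called in series.

    The request succeeds only if all dependencies succeed, so (assuming
    independent failures) their availabilities multiply.
    """
    return prod(dependency_slos.values())


deps = {"postgresql": 0.9995, "redis": 0.9999, "user-service": 0.999}
ceiling = serial_availability(deps)
target = 0.9999

print(f"Serial availability: {ceiling:.2%}")  # Serial availability: 99.84%
print("feasible" if target <= ceiling else "infeasible")  # infeasible
```

Because the product is always below the weakest factor, improving the weakest dependency (here user-service at 99.9%) raises the ceiling the most.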

Deployment Gates

Block deploys when error budget is exhausted or drift is critical.

$ nthlayer check-deploy payment-api

ERROR: Deployment blocked
  - Error budget: -47 minutes (exhausted)
  - Drift severity: critical
  - 3 P1 incidents in last 7 days

Exit code: 2 (BLOCKED)
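The gate combines a few independent checks. Below is a minimal sketch of that style of decision, assuming the error budget is measured in minutes as (1 − target) × window minus time already spent in violation. `GateInput`, `check_deploy`, and the `max_p1_incidents` threshold are hypothetical, not NthLayer defaults; only exit code 2 for a blocked deploy comes from the output above. The inputs in the usage line are illustrative values chosen to reproduce that output.

```python
from dataclasses import dataclass

BLOCKED_EXIT_CODE = 2  # matches the check-deploy output above; 0 for "allowed" is assumed


@dataclass
class GateInput:
    slo_target: float      # e.g. 0.999 for 99.9%
    window_minutes: float  # SLO window, e.g. 30 days = 43_200 minutes
    bad_minutes: float     # minutes spent violating the SLO in this window
    drift_severity: str    # e.g. "ok", "warning", "critical" (hypothetical labels)
    p1_incidents_7d: int   # P1 incidents in the last 7 days


def budget_remaining_minutes(g: GateInput) -> float:
    """Total budget is the allowed unavailability: (1 - target) * window."""
    return (1 - g.slo_target) * g.window_minutes - g.bad_minutes


def check_deploy(g: GateInput, max_p1_incidents: int = 2) -> tuple[int, list[str]]:
    """Hypothetical gate returning (exit_code, reasons); any reason blocks."""
    reasons: list[str] = []
    remaining = budget_remaining_minutes(g)
    if remaining <= 0:
        reasons.append(f"Error budget: {remaining:.0f} minutes (exhausted)")
    if g.drift_severity == "critical":
        reasons.append("Drift severity: critical")
    if g.p1_incidents_7d > max_p1_incidents:
        reasons.append(f"{g.p1_incidents_7d} P1 incidents in last 7 days")
    return (BLOCKED_EXIT_CODE if reasons else 0), reasons


# 99.9% over 30 days allows 43.2 minutes; 90.2 bad minutes leaves -47.
code, reasons = check_deploy(GateInput(0.999, 43_200, 90.2, "critical", 3))
print(code)  # 2
for reason in reasons:
    print("  -", reason)
```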

Blast Radius Analysis

Understand impact before making changes.

$ nthlayer blast-radius payment-api

Direct dependents (3):
  • checkout-service (critical) - 847K req/day
  • order-service (critical) - 523K req/day
  • refund-worker (standard) - 12K req/day

Transitive impact: 12 services, 2.1M daily requests
Risk: HIGH - affects checkout flow
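Finding direct and transitive dependents is a graph walk over the inverted dependency map. A minimal breadth-first sketch, assuming each service lists what it calls (as in the `dependencies:` field of service.yaml); the `calls` map and services such as `web-frontend` and `mobile-bff` are hypothetical, and request volumes and criticality are not modeled.

```python
from collections import defaultdict, deque


def dependents_index(dependencies: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'service -> what it calls' into 'service -> who calls it'."""
    index: dict[str, set[str]] = defaultdict(set)
    for service, deps in dependencies.items():
        for dep in deps:
            index[dep].add(service)
    return index


def blast_radius(target: str, dependencies: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Return (direct dependents, all transitive dependents) of `target` via BFS."""
    index = dependents_index(dependencies)
    direct = set(index.get(target, set()))
    seen = {target}  # guards against cycles leading back to the target
    queue = deque(direct)
    while queue:
        svc = queue.popleft()
        if svc in seen:
            continue
        seen.add(svc)
        queue.extend(index.get(svc, set()) - seen)
    return direct, seen - {target}


# Hypothetical call graph: each key calls the services in its list.
calls = {
    "checkout-service": ["payment-api", "cart-service"],
    "order-service": ["payment-api"],
    "refund-worker": ["payment-api"],
    "web-frontend": ["checkout-service", "order-service"],
    "mobile-bff": ["checkout-service"],
}
direct, transitive = blast_radius("payment-api", calls)
print(sorted(direct))               # ['checkout-service', 'order-service', 'refund-worker']
print(sorted(transitive - direct))  # ['mobile-bff', 'web-frontend']
```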

Metric Recommendations

Enforce OpenTelemetry conventions. Know what's missing before production.

$ nthlayer recommend-metrics payment-api

Required (SLO-critical):
  ✓ http.server.request.duration    FOUND
  ✗ http.server.active_requests     MISSING

Run with --show-code for instrumentation examples.
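At its core, the FOUND/MISSING report compares a required list against the metric names a service actually exports. A minimal sketch, assuming you already have the exported names as a set; `metric_report` is a hypothetical helper, and the two required names are the ones shown above, not NthLayer's complete list.

```python
REQUIRED_SLO_METRICS: tuple[str, ...] = (
    "http.server.request.duration",  # OpenTelemetry HTTP server duration histogram
    "http.server.active_requests",   # OpenTelemetry HTTP server in-flight requests
)


def metric_report(exported: set[str], required: tuple[str, ...] = REQUIRED_SLO_METRICS) -> dict[str, bool]:
    """Map each required metric name to whether the service exports it."""
    return {name: name in exported for name in required}


found = {"http.server.request.duration", "process.cpu.time"}
for name, ok in metric_report(found).items():
    print(f"{name:<34} {'FOUND' if ok else 'MISSING'}")
```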

Artifact Generation

Generate dashboards, alerts, and SLOs from a single spec.

$ nthlayer apply service.yaml

Generated:
  → dashboard.json (Grafana)
  → alerts.yaml (Prometheus)
  → recording-rules.yaml (Prometheus)
  → slos.yaml (OpenSLO)

🚀 Quick Start

# Install
pip install nthlayer

# Create a service spec
nthlayer init

# Validate and generate
nthlayer apply service.yaml

# Check deployment readiness
nthlayer check-deploy payment-api

Minimal service.yaml

name: payment-api
tier: critical
type: api
team: payments

dependencies:
  - postgresql
  - redis

NthLayer also supports the OpenSRM format (apiVersion: srm/v1) for contracts, deployment gates, and more. See the full spec reference for all options.


🔄 CI/CD Integration

# GitHub Actions (a step in a job whose strategy.matrix defines `service`)
- name: Validate reliability
  run: |
    nthlayer validate-slo ${{ matrix.service }}
    nthlayer check-deploy ${{ matrix.service }}

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins
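A shell step already fails on any non-zero exit. If a pipeline needs to tell a deliberate block apart from a crashed or misconfigured tool, a small wrapper can branch on the exit code. A minimal sketch: `gate` and `interpret_exit_code` are hypothetical names, only exit code 2 (BLOCKED) comes from the check-deploy output shown earlier, and treating 0 as allowed and everything else as a tool error is an assumption.

```python
import subprocess

# Exit code 2 is documented as BLOCKED; treating 0 as ALLOWED and any other
# code as a tool error is an assumption for this sketch.
OUTCOMES = {0: "ALLOWED", 2: "BLOCKED"}


def interpret_exit_code(code: int) -> str:
    """Classify a check-deploy exit code."""
    return OUTCOMES.get(code, "ERROR")


def gate(service: str) -> int:
    """Run `nthlayer check-deploy <service>`, print the outcome, return the code."""
    result = subprocess.run(["nthlayer", "check-deploy", service])
    print(f"{service}: {interpret_exit_code(result.returncode)}")
    return result.returncode
```

For example, a pipeline script could call `gate("payment-api")` and, on ERROR, page the platform team instead of the service owners.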


🎯 How It's Different

Traditional approach | NthLayer
--- | ---
Set SLOs in isolation | Validate against dependency chains
Alert when budget exhausted | Predict exhaustion with drift detection
Discover missing metrics in incidents | Enforce before deployment
Manual dashboard creation | Generate from spec
"Is this ready?" = opinion | "Is this ready?" = deterministic check

📚 Documentation

Full Documentation - Comprehensive guides and reference.

Guide | Description
--- | ---
Quick Start | Get running in 5 minutes
Drift Detection | Predict SLO exhaustion
Dependency Discovery | Automatic dependency mapping
CI/CD Integration | Pipeline setup
CLI Reference | All commands

🗺️ Roadmap

  • Artifact generation (dashboards, alerts, SLOs)
  • Deployment gates (check-deploy)
  • Error budget tracking
  • Portfolio view
  • Drift detection
  • Dependency discovery
  • validate-slo
  • blast-radius
  • Metric recommendations
  • OpenSRM manifest format (srm/v1)
  • Reliability scorecard
  • Loki alert generation
  • Recording rules generation
  • Contract & dependency validation
  • Intelligent alerts pipeline
  • Identity resolution & ownership
  • CI/CD GitHub Action
  • MCP server integration
  • Backstage plugin

🤝 Contributing

# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests

See CONTRIBUTING.md for details.


📄 License

MIT - See LICENSE.txt


🙏 Acknowledgments

Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.
