---
name: be-cost-monitoring
description: "Monitor backend GPU costs for LTX API and LTX Studio with data-driven thresholds. Detects cost anomalies and utilization issues. Use when: (1) detecting cost spikes, (2) analyzing idle vs inference costs, (3) investigating efficiency drift by endpoint/model/org."
tags: [monitoring, costs, gpu, infrastructure, alerts]
compatibility:
  - BigQuery access (ltx-dwh-prod-processed)

# Backend Cost Monitoring

## 1. Overview (Why?)

This skill provides **autonomous cost monitoring** using statistical anomaly detection. It detects genuine cost problems (autoscaler issues, efficiency regressions, volume surges) while suppressing noise from expected patterns.

**Problem solved**: Detect cost spikes, utilization degradation, and efficiency drift that indicate infrastructure problems or wasteful spending — without manual threshold tuning or false alarms.

**Critical timing**: Analyzes data from **3 days ago** (cost data needs time to finalize).

## 2. Requirements (What?)

Monitor these outcomes autonomously:

- [ ] Idle cost spikes (over-provisioning, or a traffic drop without scaledown)
- [ ] Inference cost spikes (volume surge, costlier model, heavy customer)
- [ ] Idle-to-inference ratio degradation (utilization dropping)
- [ ] Failure rate spikes (wasted compute plus a service-quality issue)
- [ ] Cost-per-request drift (model regression or resolution creep)
- [ ] Day-over-day cost jumps (early warning signals)
- [ ] Volume drops (potential outage)
- [ ] Overhead spikes (system anomalies)
- [ ] Alerts prioritized by tier (High/Medium/Low)
- [ ] Vertical-specific thresholds (API vs Studio)

## 3. Progress Tracker

- [ ] Read shared knowledge (schema, query templates, analysis patterns)
- [ ] Execute the monitoring query for 3 days ago
- [ ] Calculate z-scores and classify severity (NOTICE/WARNING/CRITICAL)
- [ ] Identify the root cause (endpoint, model, org, process)
- [ ] Present findings with statistical details and recommendations

## 4. Implementation Plan

### Phase 1: Read Shared Knowledge

Before analyzing, read:

- **`shared/bq-schema.md`** — GPU cost table schema (lines 418-615)
- **`shared/gpu-cost-query-templates.md`** — 11 production-ready SQL queries
- **`shared/gpu-cost-analysis-patterns.md`** — analysis workflows and benchmarks

**Statistical method**: compare the current value against the last 10 same-day-of-week values, with a 2σ threshold.

**Alert logic**: `|current_value - μ| > 2σ`

Where:

- **μ (mean)** = average of the last 10 same-day-of-week values
- **σ (stddev)** = standard deviation of the last 10 same-day-of-week values
- **z-score** = `(current - μ) / σ`

**Severity levels**:

- **NOTICE**: `2 < |z| ≤ 3` - Minor deviation, monitor
- **WARNING**: `3 < |z| ≤ 4.5` - Significant anomaly, investigate
- **CRITICAL**: `|z| > 4.5` - Extreme anomaly, immediate action
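
The alert logic above can be sketched in Python; `classify` is a hypothetical helper (the shared query templates compute the same statistics in SQL), and the baseline numbers are made up:

```python
from statistics import mean, stdev

def classify(current: float, baseline: list[float]) -> tuple[float, str]:
    """Classify one daily value against its last 10 same-day-of-week values.

    Severity tiers follow the skill: NOTICE (2 < |z| <= 3),
    WARNING (3 < |z| <= 4.5), CRITICAL (|z| > 4.5).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample stddev of the 10 baseline days
    z = (current - mu) / sigma
    if abs(z) > 4.5:
        return z, "CRITICAL"
    if abs(z) > 3:
        return z, "WARNING"
    if abs(z) > 2:
        return z, "NOTICE"
    return z, "OK"  # within 2 sigma: no alert

# Ten prior same-weekday daily costs (illustrative numbers, in $):
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 100.0]
z, severity = classify(110.0, baseline)  # far above 4.5 sigma, so CRITICAL
```

Comparing only against the same day of week keeps weekly seasonality out of σ, which is what makes a flat 2σ threshold workable.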

### Phase 2: Execute Monitoring Query

Run the query for **3 days ago** (cost data needs time to finalize):

```bash
bq --project_id=ltx-dwh-explore query --use_legacy_sql=false --format=pretty "
<query from gpu-cost-query-templates.md>
"
```
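
The target-date rule and the `bq` invocation can also be assembled programmatically. This is a minimal sketch: `target_date` and `build_bq_command` are hypothetical helpers, and the embedded SQL is a placeholder, not one of the 11 templates:

```python
from datetime import date, timedelta

def target_date(today: date) -> date:
    """Always analyze 3 days back: cost data needs time to finalize."""
    return today - timedelta(days=3)

def build_bq_command(sql: str) -> list[str]:
    """Assemble the bq CLI call shown above (hypothetical helper)."""
    return [
        "bq", "--project_id=ltx-dwh-explore", "query",
        "--use_legacy_sql=false", "--format=pretty", sql,
    ]

dt = target_date(date.today())
# Placeholder query; always filter on the `dt` partition column.
sql = (
    "SELECT SUM(row_cost) "
    "FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost` "
    f"WHERE dt = '{dt.isoformat()}'"
)
cmd = build_bq_command(sql)  # e.g. pass to subprocess.run(cmd)
```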

✅ **PREFERRED: run all 11 templates for a comprehensive overview**

### Phase 3: Analyze and Present Results

Format findings with:

- **Summary**: the key finding (e.g., "Idle costs spiked 45% on March 5")
- **Severity**: NOTICE / WARNING / CRITICAL
- **Z-score**: statistical deviation from the baseline
- **Root cause**: top contributors by endpoint/model/org/process
- **Recommendation**: the action to take and the team to notify
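
As a sketch of that report shape (`format_finding` is a hypothetical helper, and the field contents below are illustrative):

```python
def format_finding(summary: str, severity: str, z: float,
                   root_cause: str, recommendation: str) -> str:
    """Render one finding using the structure listed above."""
    assert severity in {"NOTICE", "WARNING", "CRITICAL"}
    return (
        f"[{severity}] {summary}\n"
        f"  z-score: {z:+.1f} vs the same-day-of-week baseline\n"
        f"  Root cause: {root_cause}\n"
        f"  Recommendation: {recommendation}"
    )

report = format_finding(
    "Idle costs spiked 45% on March 5", "WARNING", 3.8,
    "idle cost concentrated in one autoscaling pool (illustrative)",
    "investigate scaledown behavior; notify the engineering team",
)
```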

## 5. Constraints & Done

### DO NOT

- **DO NOT** analyze yesterday's data — use 3 days ago (cost data needs time to finalize)
- **DO NOT** sum `row_cost + attributed_*` across all cost categories — this double-counts idle and overhead spend
- **DO NOT** skip partition filtering — always filter on `dt`

### DO

- **DO** use `DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)` as the target date
- **DO** filter `cost_category = 'inference'` for request-level analysis
- **DO** exclude the Lightricks team with `is_lt_team IS FALSE` for customer-facing cost analysis
- **DO** use `ltx-dwh-explore` as the execution project
- **DO** sum all three cost columns (`row_cost + attributed_idle_cost + attributed_overhead_cost`) for the fully loaded cost per request
- **DO** flag anomalies with `|z| > 2` (the statistical threshold)
- **DO** route alerts: API team (API costs), Studio team (Studio costs), Engineering (infrastructure)
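
The two summation rules, and the double-counting pitfall from DO NOT, can be illustrated with toy rows. The dicts below mirror the relevant columns but are made-up values, not real table output:

```python
# Idle and overhead spend exists as its own rows AND is attributed
# onto inference rows, so the two formulas slice the table differently.
rows = [
    {"cost_category": "inference", "row_cost": 10.0,
     "attributed_idle_cost": 2.0, "attributed_overhead_cost": 1.0},
    {"cost_category": "idle", "row_cost": 2.0,
     "attributed_idle_cost": 0.0, "attributed_overhead_cost": 0.0},
    {"cost_category": "overhead", "row_cost": 1.0,
     "attributed_idle_cost": 0.0, "attributed_overhead_cost": 0.0},
]

# Fully loaded cost per request: sum all three columns, inference rows ONLY.
fully_loaded = sum(
    r["row_cost"] + r["attributed_idle_cost"] + r["attributed_overhead_cost"]
    for r in rows if r["cost_category"] == "inference"
)

# Total infrastructure cost: SUM(row_cost) across ALL categories.
infrastructure = sum(r["row_cost"] for r in rows)

# The forbidden version sums all three columns across all categories,
# counting idle/overhead twice (16.0 here instead of 13.0).
naive = sum(
    r["row_cost"] + r["attributed_idle_cost"] + r["attributed_overhead_cost"]
    for r in rows
)
```

Both correct formulas agree on the toy data (13.0) because all idle/overhead spend is attributed; the naive sum does not.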

### Completion Criteria

- ✅ Query executed for 3 days ago with partition filtering
- ✅ Statistical analysis applied (2σ threshold)
- ✅ Alerts classified by severity (NOTICE/WARNING/CRITICAL)
- ✅ Root cause identified (endpoint/model/org/process)
- ✅ Results presented with cost breakdown and z-scores