 ---
 name: be-cost-monitoring
-description: Monitor and analyze backend GPU costs for LTX API and LTX Studio. Use when analyzing cost trends, detecting anomalies, breaking down costs by endpoint/model/org/process, monitoring utilization, or investigating cost efficiency drift.
+description: "Monitor backend GPU costs for LTX API and LTX Studio with data-driven thresholds. Detects cost anomalies and utilization issues. Use when: (1) detecting cost spikes, (2) analyzing idle vs inference costs, (3) investigating efficiency drift by endpoint/model/org."
 tags: [monitoring, costs, gpu, infrastructure, alerts]
 compatibility:
   - BigQuery access (ltx-dwh-prod-processed)
@@ -9,52 +9,98 @@ compatibility:
 
 # Backend Cost Monitoring
 
-## When to use
+## 1. Overview (Why?)
 
-- "Monitor GPU costs"
-- "Analyze cost trends (daily/weekly/monthly)"
-- "Detect cost anomalies or spikes"
-- "Break down API costs by endpoint/model/org"
-- "Break down Studio costs by process/workspace"
-- "Compare API vs Studio cost distribution"
-- "Investigate cost-per-request efficiency drift"
-- "Monitor GPU utilization and idle costs"
-- "Alert on cost budget breaches"
-- "Day-over-day or week-over-week cost comparisons"
+LTX GPU infrastructure spending is dominated by idle costs (~72% of total), with actual inference representing only ~27%. Cost patterns vary significantly by vertical (API vs Studio) and day of week. Generic alerting thresholds miss true anomalies while flagging normal variance.
 
-## Steps
+This skill provides **autonomous cost monitoring** with data-driven thresholds derived from 60-day statistical analysis. It detects genuine cost problems (autoscaler issues, efficiency regression, volume surges) while suppressing noise from expected patterns.
 
-### 1. Gather Requirements
+**Problem solved**: Detect cost spikes, utilization degradation, and efficiency drift that indicate infrastructure problems or wasteful spending, without manual threshold tuning or false alarms.
 
-Ask the user:
-- **What to monitor**: Total costs, cost by product/feature, cost efficiency, utilization, anomalies
-- **Scope**: LTX API, LTX Studio, or both? Specific endpoint/org/process?
-- **Time window**: Daily, weekly, monthly? How far back?
-- **Analysis type**: Trends, comparisons (DoD/WoW), anomaly detection, breakdowns
-- **Alert threshold** (if setting up alerts): Absolute ($X/day) or relative (spike > X% vs baseline)
+**Critical timing**: Analyzes data from **3 days ago** (cost data needs time to finalize).
-### 2. Read Shared Knowledge
+## 2. Requirements (What?)
 
-Before writing SQL:
-- **`shared/product-context.md`** — LTX products and business context
+Monitor these outcomes autonomously:
+
+- [ ] Idle cost spikes (over-provisioning or traffic drop without scaledown)
+- [ ] Inference cost spikes (volume surge, costlier model, heavy customer)
+- [ ] Idle-to-inference ratio degradation (utilization dropping)
+- [ ] Failure rate spikes (wasted compute + service quality issue)
+- [ ] Cost-per-request drift (model regression or resolution creep)
+- [ ] Day-over-day cost jumps (early warning signals)
+- [ ] Volume drops (potential outage)
+- [ ] Overhead spikes (system anomalies)
+- [ ] Alerts prioritized by tier (High/Medium/Low)
+- [ ] Vertical-specific thresholds (API vs Studio)
+
+## 3. Progress Tracker
+
+- [ ] Read production thresholds and shared knowledge
+- [ ] Select appropriate query template(s)
+- [ ] Execute query for 3 days ago
+- [ ] Analyze results by tier priority
+- [ ] Identify root cause (endpoint, model, org, process)
+- [ ] Present findings with cost breakdown
+- [ ] Route alerts to appropriate team (API/Studio/Engineering)
+
+## 4. Implementation Plan
+
+### Phase 1: Read Production Thresholds
+
+Production thresholds from `/Users/dbeer/workspace/dwh-data-model-transforms/queries/gpu_cost_alerting_thresholds.md`:
+
+#### Tier 1 — High Priority (daily monitoring)
+| Alert | Threshold | Signal |
+|-------|-----------|--------|
+| **Idle cost spike** | > $15,600/day | Autoscaler over-provisioning or traffic drop without scaledown |
+| **Inference cost spike** | > $5,743/day | Volume surge, costlier model, or heavy new customer |
+| **Idle-to-inference ratio** | > 4:1 | GPU utilization degrading (baseline ~2.7:1) |
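
The Tier 1 ratio can be checked directly against the cost table. A minimal sketch, assuming the `cost_category` and `row_cost` columns described in `shared/bq-schema.md`:

```sql
-- Idle-to-inference cost ratio for the target date (3 days ago).
-- Alert when the ratio exceeds 4:1; the baseline is ~2.7:1.
SELECT
  SAFE_DIVIDE(
    SUM(IF(cost_category = 'idle', row_cost, 0)),
    SUM(IF(cost_category = 'inference', row_cost, 0))
  ) AS idle_to_inference_ratio
FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
WHERE dt = DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY);
```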
+
+#### Tier 2 — Medium Priority (daily monitoring)
+| Alert | Threshold | Signal |
+|-------|-----------|--------|
+| **Failure rate spike** | > 20.4% overall | Wasted compute + service quality issue |
+| **Cost-per-request drift** | API > $0.70, Studio > $0.46 | Model regression or duration/resolution creep |
+| **DoD cost jump** | API > 30%, Studio > 15% | Early warning before absolute thresholds breach |
+
+#### Tier 3 — Low Priority (weekly review)
+| Alert | Threshold | Signal |
+|-------|-----------|--------|
+| **Volume drop** | < 18,936 requests/day | Possible outage or upstream feed issue |
+| **Overhead spike** | > $138/day | System overhead anomaly (review weekly) |
+
+**Per-Vertical Thresholds:**
+- **LTX API**: Total daily > $5,555, Failure rate > 5.7%, DoD change > 30%
+- **LTX Studio**: Total daily > $11,928, Failure rate > 22.6%, DoD change > 15%
+
+**Alert logic**: a cost metric alerts when it exceeds its WARNING (avg + 2σ) or CRITICAL (avg + 3σ) threshold.
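
The avg + 2σ / avg + 3σ logic can be re-derived from history when thresholds need refreshing. A sketch over the trailing ~60 finalized days (the window bounds are illustrative):

```sql
WITH daily AS (
  SELECT dt, SUM(row_cost) AS total_cost
  FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
  WHERE dt BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 62 DAY)
               AND DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
  GROUP BY dt
)
SELECT
  ROUND(AVG(total_cost) + 2 * STDDEV(total_cost), 2) AS warning_threshold,   -- avg + 2σ
  ROUND(AVG(total_cost) + 3 * STDDEV(total_cost), 2) AS critical_threshold   -- avg + 3σ
FROM daily;
```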
+
+### Phase 2: Read Shared Knowledge
+
+Before writing SQL, read:
+- **`shared/product-context.md`** — LTX products, user types, business model
 - **`shared/bq-schema.md`** — GPU cost table schema (lines 418-615)
 - **`shared/metric-standards.md`** — GPU cost metric patterns (section 13)
-- **`shared/event-registry.yaml`** — Feature events (if analyzing feature-level costs)
 - **`shared/gpu-cost-query-templates.md`** — 11 production-ready SQL queries
 - **`shared/gpu-cost-analysis-patterns.md`** — Analysis workflows and benchmarks
+- **`shared/event-registry.yaml`** — Feature events (if analyzing feature-level costs)
 
-Key learnings:
+**Data nuances**:
 - Table: `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
 - Partitioned by `dt` (DATE) — always filter for performance
-- `cost_category`: inference (requests), idle, overhead, unused
-- Total cost = `row_cost + attributed_idle_cost + attributed_overhead_cost` (inference rows only)
-- For infrastructure cost: `SUM(row_cost)` across all categories
+- **Target date**: `DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)` (cost data from 3 days ago)
+- Cost categories: inference, idle, overhead, unused
+- Total cost per request = `row_cost + attributed_idle_cost + attributed_overhead_cost` (inference only)
+- Infrastructure cost = `SUM(row_cost)` across all categories
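
The two formulas above differ in scope, which is easy to get wrong. A sketch computing both for the target date, assuming the columns listed in `shared/bq-schema.md`:

```sql
SELECT
  -- Fully loaded cost: all three cost columns, inference rows only.
  ROUND(SUM(IF(cost_category = 'inference',
               row_cost + attributed_idle_cost + attributed_overhead_cost, 0)), 2)
    AS fully_loaded_inference_cost,
  -- Infrastructure cost: row_cost across every cost_category.
  ROUND(SUM(row_cost), 2) AS total_infrastructure_cost
FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
WHERE dt = DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY);
```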
 
-### 3. Run Query Templates
+### Phase 3: Select Query Template
 
-**If user didn't specify anything:** Run all 11 query templates from `shared/gpu-cost-query-templates.md` to provide comprehensive cost overview.
+✅ **PREFERRED: Run all 11 templates for a comprehensive overview.**
 
-**If user specified specific analysis:** Select appropriate template:
+If the user didn't specify an analysis, run all 11 templates from `shared/gpu-cost-query-templates.md`.
+
+**If the user specified a specific analysis**, select the appropriate template:
 
 | User asks... | Use template |
 |-------------|-------------|
@@ -68,9 +114,7 @@ Key learnings:
 | "Cost efficiency by model" | Cost per Request by Model |
 | "Which orgs cost most?" | API Cost by Organization |
 
-See `shared/gpu-cost-query-templates.md` for all 11 query templates.
-
-### 4. Execute Query
+### Phase 4: Execute Query
 Run query using:
 ```bash
76120``` bash
@@ -79,84 +123,107 @@ bq --project_id=ltx-dwh-explore query --use_legacy_sql=false --format=pretty "
 "
 ```
 
-Or use BigQuery console with project `ltx-dwh-explore`.
+**Or** use BigQuery console with project `ltx-dwh-explore`.
+
+**Critical**: Always filter on `dt = DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)`.
 
-### 5. Analyze Results
+### Phase 5: Analyze Results
 
-**For cost trends:**
+**For cost trends**:
 - Compare current period vs baseline (7-day avg, prior week, prior month)
 - Calculate % change and flag significant shifts (>15-20%)
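
The baseline comparison can be sketched with window functions, computing day-over-day change and a trailing 7-day average per day:

```sql
WITH daily AS (
  SELECT dt, SUM(row_cost) AS total_cost
  FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
  WHERE dt BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 13 DAY)
               AND DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
  GROUP BY dt
)
SELECT
  dt,
  ROUND(total_cost, 2) AS total_cost,
  -- % change vs the prior day; flag shifts beyond 15-20%
  ROUND(100 * SAFE_DIVIDE(total_cost - LAG(total_cost) OVER (ORDER BY dt),
                          LAG(total_cost) OVER (ORDER BY dt)), 1) AS dod_pct_change,
  -- trailing 7-day average, excluding the current day
  ROUND(AVG(total_cost) OVER (ORDER BY dt ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING), 2)
    AS prior_7d_avg
FROM daily
ORDER BY dt;
```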
 
-**For anomaly detection:**
+**For anomaly detection**:
 - Flag days with Z-score > 2 (cost or volume deviates > 2 std devs from rolling avg)
 - Investigate root cause: specific endpoint/model/org, error rate spike, billing type change
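
The Z-score flag can be expressed with an analytic window over daily totals. A sketch using a trailing 28-day window (the window length is illustrative):

```sql
WITH daily AS (
  SELECT dt, SUM(row_cost) AS total_cost
  FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
  WHERE dt >= DATE_SUB(CURRENT_DATE(), INTERVAL 33 DAY)
  GROUP BY dt
)
SELECT dt, total_cost, z_score
FROM (
  SELECT
    dt,
    total_cost,
    -- deviation from the trailing mean, in standard deviations
    SAFE_DIVIDE(total_cost - AVG(total_cost) OVER w,
                STDDEV(total_cost) OVER w) AS z_score
  FROM daily
  WINDOW w AS (ORDER BY dt ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING)
)
WHERE ABS(z_score) > 2
ORDER BY dt;
```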
 
-**For breakdowns:**
+**For breakdowns**:
 - Identify top cost drivers (endpoint, model, org, process)
 - Calculate cost per request to spot efficiency issues
 - Check failure costs (wasted spend on errors)
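
A sketch of an endpoint-level breakdown with fully loaded cost per request and failure cost; the `result = 'failure'` predicate is an assumption, so check `shared/bq-schema.md` for the actual success/failure encoding:

```sql
SELECT
  endpoint,
  COUNT(*) AS requests,
  ROUND(SUM(row_cost + attributed_idle_cost + attributed_overhead_cost), 2) AS total_cost,
  -- SAFE_DIVIDE avoids division by zero on empty groups
  ROUND(SAFE_DIVIDE(SUM(row_cost + attributed_idle_cost + attributed_overhead_cost),
                    COUNT(*)), 4) AS cost_per_request,
  -- wasted spend attributed to failed requests ('failure' value assumed)
  ROUND(SUM(IF(result = 'failure',
               row_cost + attributed_idle_cost + attributed_overhead_cost, 0)), 2)
    AS failure_cost
FROM `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
WHERE dt = DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
  AND cost_category = 'inference'
  AND is_lt_team IS FALSE
GROUP BY endpoint
ORDER BY total_cost DESC
LIMIT 10;
```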
 
-### 6. Present Findings
+### Phase 6: Present Findings
 
 Format results with:
-- **Summary**: Key finding (e.g., "GPU costs spiked 45% yesterday")
+- **Summary**: Key finding (e.g., "GPU costs spiked 45% 3 days ago")
 - **Root cause**: What drove the change (e.g., "LTX API /v1/text-to-video requests +120%")
-- **Breakdown**: Top contributors by dimension
+- **Breakdown**: Top contributors by dimension (endpoint, model, org, process)
 - **Recommendation**: Action to take (investigate org X, optimize model Y, alert team)
+- **Priority tier**: Tier 1 (High), Tier 2 (Medium), or Tier 3 (Low)
 
-### 7. Set Up Alert (if requested)
+### Phase 7: Set Up Alert (Optional)
 
 For ongoing monitoring:
 1. Save SQL query
 2. Set up in BigQuery scheduled query or Hex Thread
-3. Configure notification threshold
-4. Route alerts to Slack channel or Linear issue
+3. Configure notification threshold by tier
+4. Route alerts to appropriate team:
+   - API cost spikes → API team
+   - Studio cost spikes → Studio team
+   - Infrastructure issues → Engineering team
 
-## Schema Reference
+## 5. Context & References
 
-For detailed table schema including all dimensions, columns, and cost calculations, see `references/schema-reference.md`.
+### Production Thresholds
+- **`/Users/dbeer/workspace/dwh-data-model-transforms/queries/gpu_cost_alerting_thresholds.md`** — Production thresholds for the LTX division (60-day analysis)
 
-## Reference Files
+### Shared Knowledge
+- **`shared/product-context.md`** — LTX products and business context
+- **`shared/bq-schema.md`** — GPU cost table schema (lines 418-615)
+- **`shared/metric-standards.md`** — GPU cost metric patterns (section 13)
+- **`shared/gpu-cost-query-templates.md`** — 11 production-ready SQL queries
+- **`shared/gpu-cost-analysis-patterns.md`** — Analysis workflows, benchmarks, investigation playbooks
+- **`shared/event-registry.yaml`** — Feature events (if analyzing feature-level costs)
+
+### Data Source
+Table: `ltx-dwh-prod-processed.gpu_costs.gpu_request_attribution_and_cost`
+- Partitioned by `dt` (DATE)
+- Division: `LTX` (LTX API + LTX Studio)
+- Cost categories: inference (27%), idle (72%), overhead (0.3%)
+- Key columns: `cost_category`, `row_cost`, `attributed_idle_cost`, `attributed_overhead_cost`, `result`, `endpoint`, `model_type`, `org_name`, `process_name`
+
+### Query Templates
+See `shared/gpu-cost-query-templates.md` for the 11 production-ready queries.
 
-| File | Read when |
-|------|-----------|
-| `references/schema-reference.md` | GPU cost table dimensions, columns, and cost calculations |
-| `shared/bq-schema.md` | Understanding GPU cost table schema (lines 418-615) |
-| `shared/metric-standards.md` | GPU cost metric SQL patterns (section 13) |
-| `shared/gpu-cost-query-templates.md` | Selecting query template for analysis (11 production-ready queries) |
-| `shared/gpu-cost-analysis-patterns.md` | Interpreting results, workflows, benchmarks, investigation playbooks |
+## 6. Constraints & Done
 
-## Rules
+### DO NOT
 
-### Query Best Practices
+- **DO NOT** analyze yesterday's data — use 3 days ago (cost data needs time to finalize)
+- **DO NOT** sum row_cost + attributed_* across all cost_categories — causes double-counting
+- **DO NOT** mix inference and non-inference rows in same aggregation without filtering
+- **DO NOT** use absolute thresholds — always compare to baseline (avg+2σ/avg+3σ)
+- **DO NOT** skip partition filtering — always filter on `dt` for performance
+
+### DO
 
 - **DO** always filter on `dt` partition column for performance
+- **DO** use `DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)` as the target date
 - **DO** filter `cost_category = 'inference'` for request-level analysis
-- **DO** exclude Lightricks team requests with `is_lt_team IS FALSE` for customer-facing cost analysis
-- **DO** include LT team requests only when analyzing total infrastructure spend or debugging
+- **DO** exclude Lightricks team with `is_lt_team IS FALSE` for customer-facing cost analysis
+- **DO** include LT team only when analyzing total infrastructure spend or debugging
 - **DO** use `ltx-dwh-explore` as execution project
 - **DO** calculate cost per request with `SAFE_DIVIDE` to avoid division by zero
-- **DO** compare against baseline (7-day avg, prior period) for trends
-- **DO** round cost values to 2 decimal places for readability
-
-### Cost Calculation
-
 - **DO** sum all three cost columns (row_cost + attributed_idle + attributed_overhead) for fully loaded cost per request
 - **DO** use `SUM(row_cost)` across all rows for total infrastructure cost
-- **DO NOT** sum row_cost + attributed_* across all cost_categories (double-counting)
-- **DO NOT** mix inference and non-inference rows in same aggregation without filtering
-
-### Analysis
-
+- **DO** compare against baseline (7-day avg, prior period) for trends
+- **DO** round cost values to 2 decimal places for readability
 - **DO** flag anomalies with Z-score > 2 (cost or volume deviation > 2 std devs)
 - **DO** investigate failure costs (wasted spend on errors)
 - **DO** break down by endpoint/model for API, by process for Studio
 - **DO** check cost per request trends to spot efficiency degradation
 - **DO** validate results against total infrastructure spend
-
-### Alerts
-
 - **DO** set thresholds based on historical baseline, not absolute values
 - **DO** alert engineering team for cost spikes > 30% vs baseline
 - **DO** include cost breakdown and root cause in alerts
 - **DO** route API cost alerts to API team, Studio alerts to Studio team
+
+### Completion Criteria
+
+✅ All cost categories analyzed (idle, inference, overhead)
+✅ Production thresholds applied by tier (High/Medium/Low)
+✅ Alerts prioritized and routed to appropriate teams
+✅ Root cause investigation completed (endpoint/model/org/process)
+✅ Findings presented with cost breakdown and recommendations
+✅ Query uses 3-day lookback (not yesterday)
+✅ Partition filtering applied for performance