Commit 9e1f78c

Merge branch 'feature/add-dynamic-pricing-to-metering-reporting' into 'develop'
Add dynamic pricing to metering table in reporting database

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!243
2 parents 377f0e9 + 1a03a83 commit 9e1f78c

10 files changed: +1242 -181 lines changed

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,13 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+### Added
+
+- **Dynamic Cost Calculation for Metering Data**
+  - Added automated unit cost and estimated cost calculation to metering table with new `unit_cost` and `estimated_cost` columns
+  - Dynamic pricing configuration loading from configuration
+  - Enhanced cost analysis capabilities with comprehensive Athena queries for cost tracking, trend analysis, and efficiency metrics
+  - Automatic cost calculation as `estimated_cost = value × unit_cost` for all metering records
 
 ## [0.3.11]
 

docs/reporting-database.md

Lines changed: 144 additions & 5 deletions
@@ -85,7 +85,7 @@ This table is partitioned by date (YYYY-MM-DD format).
 
 ## Metering Table
 
-The `metering` table captures detailed usage metrics for each document processing operation:
+The `metering` table captures detailed usage metrics and cost information for each document processing operation:
 
 | Column | Type | Description |
 |--------|------|-------------|
@@ -95,15 +95,52 @@ The `metering` table captures detailed usage metrics for each document processin
 | unit | string | Unit of measurement (pages, inputTokens, outputTokens, etc.) |
 | value | double | Quantity of the unit consumed |
 | number_of_pages | int | Number of pages in the document |
+| unit_cost | double | Cost per unit in USD (e.g., cost per token, cost per page) |
+| estimated_cost | double | Calculated total cost in USD (value × unit_cost) |
 | timestamp | timestamp | When the operation was performed |
 
 This table is partitioned by date (YYYY-MM-DD format).
 
+### Cost Calculation and Pricing
+
+The metering table now includes automated cost calculation capabilities:
+
+- **unit_cost**: Retrieved from pricing configuration for each service_api/unit combination
+- **estimated_cost**: Automatically calculated as value × unit_cost for each record
+- **Dynamic Pricing**: Costs are loaded from configuration and cached for performance
+- **Fallback Handling**: When pricing data is not available, unit_cost defaults to $0.0
+
+#### Pricing Configuration Format
+
+Pricing data is loaded from the system configuration in the following format:
+
+```yaml
+pricing:
+  - name: "bedrock/us.anthropic.claude-3-sonnet-20240229-v1:0"
+    units:
+      - name: "inputTokens"
+        price: "3.0e-6"  # $0.000003 per input token
+      - name: "outputTokens"
+        price: "1.5e-5"  # $0.000015 per output token
+  - name: "textract/analyze_document"
+    units:
+      - name: "pages"
+        price: "0.0015"  # $0.0015 per page
+```
+
+#### Cost Calculation Process
+
+1. **Service/Unit Matching**: System attempts exact match for service_api/unit combination
+2. **Partial Matching**: If exact match fails, uses fuzzy matching for common patterns
+3. **Cost Calculation**: estimated_cost = value × unit_cost
+4. **Caching**: Pricing data is cached to avoid repeated configuration lookups
+
 The metering table is particularly valuable for:
-- Cost analysis and allocation
-- Usage pattern identification
-- Resource optimization
-- Performance benchmarking across different document types and sizes
+- **Cost analysis and allocation** - Track spending by document type, service, or time period
+- **Usage pattern identification** - Analyze consumption patterns across different models
+- **Resource optimization** - Identify cost-effective processing approaches
+- **Performance benchmarking** - Compare cost efficiency across different document types and sizes
+- **Budget monitoring** - Track actual costs against budgets and forecasts
 
 ## Document Sections Tables
 
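
Reviewer note: the four-step process documented in the hunk above (exact match, partial match, cost calculation, caching) could look roughly like the sketch below. It is an illustration under assumptions, not the code in this merge request. The names `build_price_table`, `lookup_unit_cost`, and `estimated_cost` are hypothetical, and the partial-match rule shown (one service name being a prefix of the other) is only a guess at what the docs call fuzzy matching. Prices are converted with `float()` because the sample configuration stores them as quoted strings.

```python
from functools import lru_cache

# Same shape as the `pricing` YAML in the docs, already parsed into Python objects.
PRICING_CONFIG = [
    {
        "name": "bedrock/us.anthropic.claude-3-sonnet-20240229-v1:0",
        "units": [
            {"name": "inputTokens", "price": "3.0e-6"},
            {"name": "outputTokens", "price": "1.5e-5"},
        ],
    },
    {
        "name": "textract/analyze_document",
        "units": [{"name": "pages", "price": "0.0015"}],
    },
]


def build_price_table(pricing):
    """Flatten the config into {(service_api, unit): unit_cost_in_usd}."""
    table = {}
    for service in pricing:
        for unit in service["units"]:
            table[(service["name"], unit["name"])] = float(unit["price"])
    return table


# Built once so the configuration is not re-read for every metering record.
PRICE_TABLE = build_price_table(PRICING_CONFIG)


@lru_cache(maxsize=None)
def lookup_unit_cost(service_api, unit):
    """Exact match first, then an assumed partial match, else 0.0."""
    exact = PRICE_TABLE.get((service_api, unit))
    if exact is not None:
        return exact
    if service_api:
        # Hypothetical partial-match rule: one name is a prefix of the other.
        for (name, unit_name), price in PRICE_TABLE.items():
            if unit_name == unit and (
                name.startswith(service_api) or service_api.startswith(name)
            ):
                return price
    return 0.0  # documented fallback when pricing data is not available


def estimated_cost(service_api, unit, value):
    """estimated_cost = value x unit_cost, as stated in the docs."""
    return value * lookup_unit_cost(service_api, unit)
```

With the sample prices above, 1,000 `inputTokens` on the Claude 3 Sonnet entry come to about 0.003 USD, and a service with no pricing entry yields 0.0, which matches the documented fallback.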
@@ -319,6 +356,108 @@ ORDER BY
   month;
 ```
 
+**Cost analysis queries:**
+```sql
+-- Total estimated costs by service API
+SELECT
+  service_api,
+  SUM(estimated_cost) as total_cost,
+  AVG(estimated_cost) as avg_cost_per_operation,
+  COUNT(*) as operation_count,
+  COUNT(DISTINCT document_id) as document_count
+FROM
+  metering
+WHERE
+  date BETWEEN '2024-01-01' AND '2024-01-31'
+GROUP BY
+  service_api
+ORDER BY
+  total_cost DESC;
+
+-- Cost per page analysis by document type
+SELECT
+  se.section_type,
+  SUM(m.estimated_cost) / SUM(m.number_of_pages) as cost_per_page,
+  SUM(m.estimated_cost) as total_cost,
+  SUM(m.number_of_pages) as total_pages,
+  COUNT(DISTINCT m.document_id) as document_count
+FROM
+  metering m
+JOIN
+  section_evaluations se ON m.document_id = se.document_id
+WHERE
+  m.number_of_pages > 0
+  AND m.date BETWEEN '2024-01-01' AND '2024-01-31'
+GROUP BY
+  se.section_type
+ORDER BY
+  cost_per_page DESC;
+
+-- Daily cost trends
+SELECT
+  date,
+  SUM(estimated_cost) as daily_cost,
+  COUNT(DISTINCT document_id) as documents_processed,
+  SUM(estimated_cost) / COUNT(DISTINCT document_id) as avg_cost_per_document
+FROM
+  metering
+WHERE
+  date BETWEEN '2024-01-01' AND '2024-01-31'
+GROUP BY
+  date
+ORDER BY
+  date;
+
+-- Most expensive documents
+SELECT
+  document_id,
+  SUM(estimated_cost) as total_document_cost,
+  SUM(value) as total_units_consumed,
+  COUNT(*) as operations_count,
+  MAX(number_of_pages) as page_count
+FROM
+  metering
+WHERE
+  date BETWEEN '2024-01-01' AND '2024-01-31'
+GROUP BY
+  document_id
+ORDER BY
+  total_document_cost DESC
+LIMIT 10;
+
+-- Cost efficiency by model (cost per token)
+SELECT
+  service_api,
+  SUM(estimated_cost) / SUM(value) as cost_per_token,
+  SUM(estimated_cost) as total_cost,
+  SUM(value) as total_tokens,
+  COUNT(DISTINCT document_id) as document_count
+FROM
+  metering
+WHERE
+  unit IN ('inputTokens', 'outputTokens', 'totalTokens')
+  AND date BETWEEN '2024-01-01' AND '2024-01-31'
+GROUP BY
+  service_api
+ORDER BY
+  cost_per_token ASC;
+
+-- Cost breakdown by processing context
+SELECT
+  context,
+  SUM(estimated_cost) as total_cost,
+  COUNT(DISTINCT document_id) as document_count,
+  SUM(estimated_cost) / COUNT(DISTINCT document_id) as avg_cost_per_document
+FROM
+  metering
+WHERE
+  date BETWEEN '2024-01-01' AND '2024-01-31'
+GROUP BY
+  context
+ORDER BY
+  total_cost DESC;
+```
+
 ### Creating Dashboards
 
 For more advanced visualization and dashboarding:
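
Reviewer note: to sanity-check what the "Total estimated costs by service API" query added in this hunk returns, the same SQL can be run locally against an in-memory SQLite table. This is a rough sketch under assumptions. The rows are made up for illustration, `bedrock/example-model` is a hypothetical service name, only a subset of the documented metering columns is created, and SQLite stands in for Athena only because this particular query uses plain aggregate functions.

```python
import sqlite3

# The first cost analysis query from the diff, unchanged apart from layout.
QUERY = """
SELECT
  service_api,
  SUM(estimated_cost) AS total_cost,
  AVG(estimated_cost) AS avg_cost_per_operation,
  COUNT(*) AS operation_count,
  COUNT(DISTINCT document_id) AS document_count
FROM metering
WHERE date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY service_api
ORDER BY total_cost DESC
"""

# Illustrative rows: (document_id, service_api, unit, value, unit_cost, date).
SAMPLE_ROWS = [
    ("doc-1", "textract/analyze_document", "pages", 10, 0.0015, "2024-01-05"),
    ("doc-2", "textract/analyze_document", "pages", 4, 0.0015, "2024-01-06"),
    ("doc-1", "bedrock/example-model", "inputTokens", 2000, 3.0e-6, "2024-01-05"),
    # Outside the January date range, so the query should ignore it.
    ("doc-3", "textract/analyze_document", "pages", 100, 0.0015, "2024-02-01"),
]


def run_cost_by_service(rows):
    """Load rows into an in-memory metering table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metering ("
        "document_id TEXT, service_api TEXT, unit TEXT, value REAL, "
        "unit_cost REAL, estimated_cost REAL, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO metering VALUES (?, ?, ?, ?, ?, ?, ?)",
        # estimated_cost is derived as value x unit_cost, as the docs describe.
        [(d, s, u, v, c, v * c, day) for d, s, u, v, c, day in rows],
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_cost_by_service(SAMPLE_ROWS):
        print(row)
```

With these sample rows the Textract service sorts first, with two operations across two documents, and the February row is excluded by the date filter.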
