Commit fa8a324

feat: Per-User Token Quota Monitoring with Automated Alerts (#31)
* fix: update with arch detection for linux builds
* fix: update codebuild region
* feat: Add per-user token quota monitoring with automated alerts

- Implement quota monitoring CloudFormation stack with DynamoDB storage
- Add Lambda function to check usage thresholds (80%, 90%, 100%)
- Integrate SNS alerting with configurable monthly limits
- Refactor metrics aggregator for improved performance and quota tracking
- Update CLI to support quota stack deployment and configuration
- Add comprehensive documentation for quota monitoring setup
- Enhance analytics pipeline with better data organization

1 parent e2da56d commit fa8a324

15 files changed, +1608 -460 lines changed

assets/docs/ANALYTICS.md

Lines changed: 79 additions & 53 deletions
@@ -53,74 +53,85 @@ aws cloudformation deploy \
 1. Navigate to the Athena console URL provided in the stack outputs
 2. Select the workgroup created by the stack (e.g., `claude-code-analytics-workgroup`)
 3. Select the database (e.g., `claude_code_analytics_analytics`)
+4. Access the saved queries from the "Saved queries" tab in the Athena console
 
-### Sample Queries
+### Pre-Built Named Queries
 
-The stack creates several named queries that you can use as starting points:
+The stack automatically creates 10 named queries associated with your workgroup. These queries provide comprehensive analytics capabilities:
 
-#### Top Users by Token Usage (Last 7 Days)
+#### 1. Top Users by Token Usage
+Identifies your top 10 users by token consumption over the last 7 days, including user email, organization, session count, and estimated costs.
 
-```sql
-WITH user_totals AS (
-  SELECT
-    user_id,
-    SUM(token_usage) as total_tokens,
-    COUNT(DISTINCT session_id) as session_count,
-    COUNT(DISTINCT DATE(from_unixtime(timestamp/1000))) as active_days
-  FROM metrics
-  WHERE year >= YEAR(CURRENT_DATE - INTERVAL '7' DAY)
-    AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
-  GROUP BY user_id
-)
-SELECT
-  SUBSTR(user_id, 1, 8) || '...' as user_id_short,
-  total_tokens,
-  session_count,
-  active_days,
-  ROUND(total_tokens * 0.000015, 2) as estimated_cost_usd
-FROM user_totals
-ORDER BY total_tokens DESC
-LIMIT 10;
-```
+**Use Case:** Understand who your power users are and track usage patterns.
 
-#### Token Usage by Model
+#### 2. Token Usage by Model and Type
+Analyzes token usage patterns across different models (Opus, Sonnet, Haiku) and token types (input/output) with cost estimates.
 
-```sql
-SELECT
-  model,
-  type as token_type,
-  SUM(token_usage) as total_tokens,
-  COUNT(DISTINCT user_id) as unique_users,
-  ROUND(SUM(token_usage) * 0.000015, 2) as estimated_cost_usd
-FROM metrics
-WHERE year >= YEAR(CURRENT_DATE - INTERVAL '30' DAY)
-  AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '30' DAY
-GROUP BY model, type
-ORDER BY total_tokens DESC;
-```
+**Use Case:** Optimize model selection and understand cost distribution.
 
-#### User Activity by Hour
+#### 3. User Activity Pattern
+Shows user activity patterns by hour of day to identify peak usage times.
 
-```sql
-SELECT
-  HOUR(from_unixtime(timestamp/1000)) as hour_of_day,
-  COUNT(DISTINCT user_id) as active_users,
-  SUM(token_usage) as total_tokens
-FROM metrics
-WHERE year >= YEAR(CURRENT_DATE - INTERVAL '7' DAY)
-  AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
-GROUP BY HOUR(from_unixtime(timestamp/1000))
-ORDER BY hour_of_day;
-```
+**Use Case:** Capacity planning and understanding when your users are most active.
+
+#### 4. Token Usage by Organization
+Tracks token usage across different organizations with user counts and cost attribution.
+
+**Use Case:** Organizational billing and chargeback.
+
+#### 5. Token Usage by Email Domain
+Analyzes usage patterns by email domain to understand user demographics.
+
+**Use Case:** Identify which teams or departments are using the service.
+
+#### 6. Detailed TPM and RPM Analysis
+Calculates tokens per minute (TPM) and requests per minute (RPM) metrics for rate limit monitoring.
+
+**Use Case:** Monitor API usage patterns and prevent rate limiting issues.
+
+#### 7. User Session Analysis
+Analyzes user sessions including duration, intensity, models used, and per-session costs.
+
+**Use Case:** Understand user behavior and session patterns.
+
+#### 8. Detailed Cost Attribution
+Provides precise cost calculations by user, organization, and model with cumulative tracking.
 
-### Custom Time Ranges
+**Use Case:** Accurate billing and cost management.
 
-To query different time ranges, modify the WHERE clause:
+#### 9. Peak Usage and Rate Limit Analysis
+Identifies peak usage periods and highlights when you're approaching rate limits.
+
+**Use Case:** Proactive monitoring to prevent service disruptions.
+
+#### 10. Usage Analysis by Identity Provider
+Compares usage patterns across different identity providers (Okta, Auth0, Cognito).
+
+**Use Case:** Understand usage by authentication method.
+
+### Working with the Queries
+
+Once you've selected your workgroup and database in the Athena console:
+
+1. **Access Saved Queries**: Click on the "Saved queries" tab
+2. **Load a Query**: Select any of the 10 pre-built queries to load it into the query editor
+3. **Run the Query**: Click "Run" to execute the query with your current data
+4. **Export Results**: Download results as CSV for further analysis
+
+### Customizing Queries
+
+#### Adjusting Time Ranges
+
+Modify the WHERE clause in any query to change the time range:
 
 ```sql
 -- Last 24 hours
 WHERE from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
 
+-- Last 7 days
+WHERE year >= YEAR(CURRENT_DATE - INTERVAL '7' DAY)
+  AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
+
 -- Last 30 days
 WHERE year >= YEAR(CURRENT_DATE - INTERVAL '30' DAY)
   AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '30' DAY
@@ -129,6 +140,21 @@ WHERE year >= YEAR(CURRENT_DATE - INTERVAL '30' DAY)
 WHERE from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2024-01-01' AND TIMESTAMP '2024-01-31'
 ```
 
+#### Filtering by Specific Users or Organizations
+
+Add additional WHERE conditions to focus on specific users:
+
+```sql
+-- Filter by email domain
+AND user_email LIKE '%@example.com'
+
+-- Filter by organization
+AND organization_id = 'your-org-id'
+
+-- Filter by specific model
+AND model LIKE '%opus%'
+```
+
 ## Data Retention
 
 - **S3 Standard**: 90 days (configurable via `DataRetentionDays` parameter)
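
Review note on the time-range examples in `ANALYTICS.md`: the 7-day and 30-day patterns pair a coarse `year` partition filter with a precise timestamp filter, while the 24-hour pattern uses the timestamp filter alone. A minimal Python sketch that generates the same clauses for an arbitrary window is given here. The helper name `time_range_clause` is hypothetical and not part of this commit, and it assumes the `metrics` table exposes `year` and a millisecond `timestamp` column, as the documented queries imply.

```python
def time_range_clause(amount: int, unit: str = "DAY") -> str:
    """Build a WHERE clause matching the documented time-range patterns.

    Hypothetical helper for illustration; not part of the repository.
    """
    unit = unit.upper()
    if unit not in {"HOUR", "DAY"}:
        raise ValueError(f"unsupported unit: {unit}")
    if amount <= 0:
        raise ValueError("amount must be positive")

    interval = f"INTERVAL '{amount}' {unit}"
    ts_filter = f"from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - {interval}"

    if unit == "HOUR":
        # The documented 24-hour example uses only the timestamp filter.
        return f"WHERE {ts_filter}"

    # Day-based windows add the coarse year filter first, as in the docs.
    return f"WHERE year >= YEAR(CURRENT_DATE - {interval})\n  AND {ts_filter}"


if __name__ == "__main__":
    print(time_range_clause(7))
    print(time_range_clause(24, "hour"))
```

The generated text is meant to be pasted into a saved query in place of its existing WHERE clause; the extra user, organization, and model filters from the docs can be appended as further `AND` conditions.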

assets/docs/CLI_REFERENCE.md

Lines changed: 24 additions & 2 deletions
@@ -80,7 +80,7 @@ poetry run ccwb deploy [stack] [options]
 
 **Arguments:**
 
-- `stack` - Specific stack to deploy: auth, networking, monitoring, dashboard, or analytics (optional)
+- `stack` - Specific stack to deploy: auth, networking, monitoring, dashboard, analytics, or quota (optional)
 
 **Options:**
 
@@ -102,7 +102,29 @@ poetry run ccwb deploy [stack] [options]
 3. **monitoring** - OpenTelemetry collector on ECS Fargate (optional)
 4. **dashboard** - CloudWatch dashboard for usage metrics (optional)
 5. **analytics** - Kinesis Firehose and Athena for analytics (optional)
-6. **codebuild** - AWS CodeBuild for Windows binary builds (optional, only if enabled during init)
+6. **quota** - Per-user token quota monitoring and alerts (optional, requires dashboard)
+7. **codebuild** - AWS CodeBuild for Windows binary builds (optional, only if enabled during init)
+
+**Examples:**
+
+```bash
+# Deploy all configured stacks
+poetry run ccwb deploy
+
+# Deploy only authentication
+poetry run ccwb deploy auth
+
+# Deploy quota monitoring (requires dashboard stack first)
+poetry run ccwb deploy quota
+
+# Show commands without executing
+poetry run ccwb deploy --show-commands
+
+# Dry run to see what would be deployed
+poetry run ccwb deploy --dry-run
+```
+
+> **Note**: Quota monitoring requires the dashboard stack to be deployed first. See [Quota Monitoring Guide](QUOTA_MONITORING.md) for detailed information.
 
 ### `test` - Test Package
 
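
Review note on the deploy ordering in `CLI_REFERENCE.md`: the only dependency this hunk documents is that `quota` requires `dashboard`. A minimal Python sketch of a pre-deploy check for that rule follows. The `PREREQUISITES` map and `missing_prerequisites` function are hypothetical names for illustration; the commit's actual CLI logic is not shown in this diff and may differ.

```python
from __future__ import annotations

# Hypothetical prerequisite map. Only the quota -> dashboard dependency
# is stated in the documentation; other stacks are assumed independent here.
PREREQUISITES: dict[str, set[str]] = {"quota": {"dashboard"}}


def missing_prerequisites(stack: str, deployed: set[str]) -> set[str]:
    """Return the stacks that must be deployed before `stack` can be."""
    return PREREQUISITES.get(stack, set()) - deployed


if __name__ == "__main__":
    blockers = missing_prerequisites("quota", {"auth", "networking"})
    if blockers:
        print(f"Deploy these stacks first: {', '.join(sorted(blockers))}")
```

A check like this would let a deploy command fail early with a clear message instead of surfacing a CloudFormation error partway through.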

assets/docs/MONITORING.md

Lines changed: 13 additions & 0 deletions
@@ -32,6 +32,15 @@ Claude Code sends several metric types that the collector processes:
 - `claude_code.cost.usage` - Estimates costs based on token usage
 - `claude_code.code_edit_tool.decision` - Records code editing decisions
 
+## Usage Quota Monitoring
+
+The monitoring system supports optional quota tracking to alert administrators when users approach or exceed token usage limits. This helps manage costs and prevent unexpected overages.
+
+Quota monitoring deploys as a separate CloudFormation stack that integrates with the dashboard infrastructure. When enabled, it tracks monthly token consumption per user and sends automated alerts through Amazon SNS when usage thresholds are exceeded.
+
+> **Detailed Information**: For complete quota monitoring setup, configuration, and usage instructions, see the [Quota Monitoring Guide](QUOTA_MONITORING.md).
+
+
 ## Analytics Pipeline (Optional)
 
 Beyond real-time monitoring through CloudWatch, you can enable an analytics pipeline for advanced reporting and historical analysis. The analytics stack creates a data lake for long-term metric storage and analysis.
@@ -54,6 +63,8 @@ The deployment creates the complete monitoring infrastructure: a VPC with public
 
 If you provide a custom domain name and hosted zone ID during setup, the system automatically provisions an ACM certificate and configures HTTPS. This ensures encrypted transmission of metrics from Claude Code to your collector.
 
+The dashboard stack creates the metrics aggregation infrastructure that supports quota monitoring. If you choose to deploy quota monitoring as a separate stack, it integrates with the dashboard's metrics table to track user consumption.
+
 ## Claude Code Configuration
 
 The package command generates a `claude-settings/settings.json` file in the distribution package that configures Claude Code for telemetry collection. During installation, this file gets copied to `~/.claude/settings.json` in the user's home directory and contains all the settings needed for monitoring.
@@ -131,3 +142,5 @@ The Application Load Balancer is internet-facing to receive metrics from Claude
 The monitoring system provides comprehensive visibility into Claude Code usage across your organization. Deployment is automated through the `ccwb` CLI tools, creating all necessary infrastructure with minimal configuration. The OTEL Collector on ECS Fargate handles metric collection and transformation, while CloudWatch provides storage and visualization.
 
 User attribution happens automatically through the OTEL helper binary that extracts information from authentication tokens. This enables detailed usage tracking by user, department, team, and other organizational dimensions without requiring manual configuration.
+
+Quota monitoring provides proactive alerts when users approach or exceed token usage limits. The system sends detailed notifications through SNS, allowing organizations to manage costs and usage patterns effectively.
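
Review note on the quota thresholds: the commit message lists alert thresholds of 80%, 90%, and 100% of a configurable monthly limit. A minimal Python sketch of how a usage figure could be mapped to the highest threshold it has reached is shown here. The names `ALERT_THRESHOLDS` and `crossed_threshold` are hypothetical, and treating a threshold as reached at "greater than or equal" is an assumption; the Lambda function added by this commit is not shown in this diff and may behave differently.

```python
from __future__ import annotations

# Percentages of the monthly limit, taken from the commit message.
ALERT_THRESHOLDS = (80, 90, 100)


def crossed_threshold(tokens_used: int, monthly_limit: int) -> int | None:
    """Return the highest alert threshold reached, or None if below all of them.

    Hypothetical helper for illustration; ">=" is an assumed comparison.
    """
    if monthly_limit <= 0:
        raise ValueError("monthly_limit must be positive")
    # Integer comparison avoids floating-point rounding at exact boundaries.
    reached = [t for t in ALERT_THRESHOLDS if tokens_used * 100 >= t * monthly_limit]
    return max(reached) if reached else None


if __name__ == "__main__":
    for used in (100, 850, 900, 1200):
        print(used, crossed_threshold(used, monthly_limit=1000))
```

Returning only the highest threshold keeps a single check from producing three alerts for a user who jumps straight past 100%; suppressing repeat alerts across checks would additionally need stored state, which this sketch leaves out.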
