This directory contains the CloudFormation templates for setting up an analytics pipeline to track Claude Code usage metrics.
The analytics pipeline consists of:
- Kinesis Data Firehose: Streams CloudWatch Logs to S3 in Parquet format
- S3 Data Lake: Stores historical metrics data with automatic archival
- AWS Athena: Enables SQL queries on the metrics data
- Partition Projection: Eliminates the need for Glue crawlers
Prerequisites:
- AWS CLI configured with appropriate credentials
- Claude Code OTEL collector already deployed and sending metrics to CloudWatch Logs
# Deploy the analytics pipeline
aws cloudformation deploy \
--template-file analytics-pipeline.yaml \
--stack-name claude-code-analytics \
--capabilities CAPABILITY_IAM
# Get the Athena console URL
aws cloudformation describe-stacks \
--stack-name claude-code-analytics \
--query 'Stacks[0].Outputs[?OutputKey==`AthenaConsoleUrl`].OutputValue' \
--output text
# Update the dashboard to remove hard-coded users
aws cloudformation deploy \
--template-file monitoring-dashboard.yaml \
--stack-name claude-code-auth-dashboard \
--parameter-overrides TokenCostPerMillion=15.0
- Navigate to the Athena console URL provided in the stack outputs
- Select the workgroup created by the stack (e.g., claude-code-analytics-workgroup)
- Select the database (e.g., claude_code_analytics_analytics)
- Access the saved queries from the "Saved queries" tab in the Athena console
The stack automatically creates 10 named queries associated with your workgroup. These queries provide comprehensive analytics capabilities:
Identifies your top 10 users by token consumption over the last 7 days, including user email, organization, session count, and estimated costs.
Use Case: Understand who your power users are and track usage patterns.
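As a rough sketch of what a top-users query like this looks like, the following assumes a metrics table named claude_code_metrics with columns user_email, organization_id, session_id, token_count, and a millisecond-epoch timestamp, plus a flat $15 per million tokens rate; the actual table and column names come from your deployed stack, so check them in the Athena console before running anything like this:

```sql
-- Illustrative only: table, columns, and the $15/M rate are assumptions.
SELECT
  user_email,
  organization_id,
  COUNT(DISTINCT session_id)    AS sessions,
  SUM(token_count)              AS total_tokens,
  SUM(token_count) / 1e6 * 15.0 AS est_cost_usd  -- placeholder flat rate
FROM claude_code_metrics
WHERE from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY user_email, organization_id
ORDER BY total_tokens DESC
LIMIT 10;
```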
Analyzes token usage patterns across different models (Opus, Sonnet, Haiku) and token types (input/output) with cost estimates.
Use Case: Optimize model selection and understand cost distribution.
Shows user activity patterns by hour of day to identify peak usage times.
Use Case: Capacity planning and understanding when your users are most active.
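An hourly-activity query of this kind can be sketched as below; the table and column names are assumptions standing in for whatever your stack actually created:

```sql
-- Illustrative only: counts distinct active users per hour of day.
SELECT
  hour(from_unixtime(timestamp/1000)) AS hour_of_day,
  COUNT(DISTINCT user_email)          AS active_users
FROM claude_code_metrics
WHERE from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY 1
ORDER BY 1;
```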
Tracks token usage across different organizations with user counts and cost attribution.
Use Case: Organizational billing and chargeback.
Analyzes usage patterns by email domain to understand user demographics.
Use Case: Identify which teams or departments are using the service.
Calculates tokens per minute (TPM) and requests per minute (RPM) metrics for rate limit monitoring.
Use Case: Monitor API usage patterns and prevent rate limiting issues.
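A TPM/RPM calculation along these lines can be done by bucketing events per minute; again, claude_code_metrics and its columns are illustrative names, not the saved query's actual definition:

```sql
-- Illustrative only: tokens-per-minute and requests-per-minute buckets.
SELECT
  date_trunc('minute', from_unixtime(timestamp/1000)) AS minute,
  SUM(token_count) AS tpm,  -- tokens per minute
  COUNT(*)         AS rpm   -- requests (rows) per minute
FROM claude_code_metrics
WHERE from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1
ORDER BY 1;
```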
Analyzes user sessions including duration, intensity, models used, and per-session costs.
Use Case: Understand user behavior and session patterns.
Provides precise cost calculations by user, organization, and model with cumulative tracking.
Use Case: Accurate billing and cost management.
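Per-model cost attribution typically applies a different rate per model family. The sketch below uses placeholder rates and an assumed table name; substitute your real pricing and schema:

```sql
-- Illustrative only: per-user, per-model cost with placeholder USD/1M-token rates.
SELECT
  user_email,
  model,
  SUM(token_count) AS tokens,
  SUM(token_count) / 1e6 *
    CASE
      WHEN model LIKE '%opus%'   THEN 15.0  -- placeholder rate
      WHEN model LIKE '%sonnet%' THEN 3.0   -- placeholder rate
      ELSE 0.25                             -- placeholder rate
    END AS est_cost_usd
FROM claude_code_metrics
WHERE from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '30' DAY
GROUP BY user_email, model
ORDER BY est_cost_usd DESC;
```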
Identifies peak usage periods and highlights when you're approaching rate limits.
Use Case: Proactive monitoring to prevent service disruptions.
Compares usage patterns across different identity providers (Okta, Auth0, Cognito).
Use Case: Understand usage by authentication method.
Once you've selected your workgroup and database in the Athena console:
- Access Saved Queries: Click on the "Saved queries" tab
- Load a Query: Select any of the 10 pre-built queries to load it into the query editor
- Run the Query: Click "Run" to execute the query with your current data
- Export Results: Download results as CSV for further analysis
Modify the WHERE clause in any query to change the time range:
-- Last 24 hours
WHERE from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
-- Last 7 days
WHERE year >= YEAR(CURRENT_DATE - INTERVAL '7' DAY)
AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
-- Last 30 days
WHERE year >= YEAR(CURRENT_DATE - INTERVAL '30' DAY)
AND from_unixtime(timestamp/1000) >= CURRENT_TIMESTAMP - INTERVAL '30' DAY
-- Specific date range
WHERE from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2024-01-01' AND TIMESTAMP '2024-01-31'
Add additional WHERE conditions to focus on specific users:
-- Filter by email domain
AND user_email LIKE '%@example.com'
-- Filter by organization
AND organization_id = 'your-org-id'
-- Filter by specific model
AND model LIKE '%opus%'
- S3 Standard: 90 days (configurable via the DataRetentionDays parameter)
- S3 Glacier: After 90 days (automatic transition)
- Athena Query Results: 7 days (auto-deleted)
- Partition Projection: No need to run Glue crawlers
- Parquet Format: Columnar storage reduces query costs
- S3 Lifecycle: Automatic archival to Glacier
- Query Result Caching: Athena caches results for 7 days
- Use partition columns (year, month, day, hour) in WHERE clauses
- Limit time ranges to reduce data scanned
- Use LIMIT for exploratory queries
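The tips above can be combined in one query. The partition column comparisons below assume string-typed projected partitions and an illustrative table name; check the partition projection settings in your stack's table definition, since integer-typed partitions would use unquoted values:

```sql
-- Illustrative only: partition pruning plus LIMIT keeps scanned data small.
SELECT user_email, SUM(token_count) AS tokens
FROM claude_code_metrics
WHERE year = '2024' AND month = '06' AND day = '15'  -- partition pruning
GROUP BY user_email
ORDER BY tokens DESC
LIMIT 20;  -- cap output while exploring
```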
