Complete guide for deploying your cloud cost analytics to ClickHouse Cloud with optional data anonymization for public dashboards.
- Overview
- Quick Start
- Detailed Setup
- Switching Between Local and Cloud
- Data Anonymization
- GitHub Actions Automation
- Troubleshooting
## Overview

Local Development (default):

- Data: Parquet files in `viz_rill/data/`
- Visualization: local Rill server
- Command: `make run-all`
Cloud Production:

- Data: ClickHouse Cloud database
- Visualization: Rill Cloud, or local Rill pointing to ClickHouse
- Command: `make run-all-cloud`
Local Mode:

```
Cloud APIs → dlt → Parquet files → Rill (DuckDB) → localhost:9009
```

Cloud Mode:

```
Cloud APIs → dlt → ClickHouse Cloud → Rill Cloud → Public dashboard
```
Key feature: the same codebase works for both modes; you just switch environment variables.
## Quick Start

Prerequisites:

- ✅ Working local setup (`make run-all` works)
- ✅ ClickHouse Cloud account (sign-up is free)
```bash
# 1. Add ClickHouse credentials to .dlt/secrets.toml
[destination.clickhouse.credentials]
host = "xxxxx.europe-west4.gcp.clickhouse.cloud"
port = 8443
username = "default"
password = "your-password"
secure = 1

# 2. Initialize database
make init-clickhouse

# 3. Load data to ClickHouse
make run-etl-clickhouse

# 4. (Optional) Anonymize for public dashboards
make anonymize-clickhouse

# 5. Configure Rill to use ClickHouse
cd viz_rill
echo 'RILL_CONNECTOR=clickhouse' >> .env
echo 'CONNECTOR_CLICKHOUSE_DSN=clickhouse://user:pass@host:8443/default?secure=true' >> .env

# 6. View dashboards
cd .. && make serve
```

Done! Your dashboards now show ClickHouse data.
## Detailed Setup

Create a ClickHouse Cloud service:

- Go to the ClickHouse Cloud Console
- Click "Create new service"
- Choose:
  - Provider: GCP, AWS, or Azure
  - Region: the one closest to your data sources
  - Tier: Development (free tier) or Production
- Click "Create service"
- Save the credentials (you won't see the password again):

```
Host:     xxxxx.region.gcp.clickhouse.cloud
Port:     8443
Username: default
Password: [copy this!]
```
Configure credentials by editing `.dlt/secrets.toml`:

```toml
[destination.clickhouse.credentials]
host = "xxxxx.europe-west4.gcp.clickhouse.cloud"
port = 8443
username = "default"
password = "your-secure-password"
secure = 1
```

See `.dlt/secrets.toml.example` for the full template.
Initialize the database:

```bash
make init-clickhouse
```

This creates the necessary database and permissions for dlt.
Load data, starting with one pipeline:

```bash
make run-aws-clickhouse
```

Verify the data arrived:

```bash
# Using clickhouse-client
clickhouse-client --host xxxxx.clickhouse.cloud --secure \
    --password 'your-password' \
    --query "SELECT count(*) FROM aws_costs___cur_export_test_00001"
```

Or use the ClickHouse Cloud SQL console in the web UI.

Then load everything:

```bash
make run-etl-clickhouse
```

This runs the AWS, GCP, and Stripe pipelines against ClickHouse.
## Switching Between Local and Cloud

The project uses environment variables to switch between local and cloud:

ETL destination (set automatically by the Makefile):

- `DLT_DESTINATION=filesystem` → Parquet files (local)
- `DLT_DESTINATION=clickhouse` → ClickHouse Cloud

Rill connector (set in `viz_rill/.env`):

- `RILL_CONNECTOR=""` or unset → DuckDB reads parquet files (local)
- `RILL_CONNECTOR=clickhouse` → Rill queries ClickHouse
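As a rough sketch of what the Makefile targets do under the hood, the switch boils down to exporting one variable before the ETL entry point (the script name here is hypothetical; check the Makefile for the real command):

```bash
# Hypothetical manual equivalent of the Makefile targets:
# the destination is chosen purely by this environment variable.
DLT_DESTINATION=clickhouse uv run python run_etl.py   # run_etl.py is illustrative
```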
Run in local mode:

```bash
# ETL to parquet
make run-all

# Rill automatically uses parquet files
make serve
```

No configuration needed; this is the default.
Run in cloud mode.

Option A: ETL + view locally

```bash
# 1. Load to ClickHouse
make run-etl-clickhouse

# 2. Configure Rill
cd viz_rill
cat > .env << 'EOF'
RILL_CONNECTOR=clickhouse
CONNECTOR_CLICKHOUSE_DSN=clickhouse://user:pass@host:8443/default?secure=true
EOF

# 3. View dashboards
cd .. && make serve
```

Option B: Full cloud pipeline (ETL + anonymize)

```bash
# Loads data and anonymizes in one command
make run-all-cloud

# Then configure Rill as above
```

Switch back to local:

```bash
cd viz_rill
# Remove or comment out RILL_CONNECTOR
echo 'RILL_CONNECTOR=""' > .env
cd .. && make serve
```

Rill now reads the parquet files again.
ClickHouse tables created:

```
aws_costs___cur_export_test_00001    # AWS raw data
gcp_costs___bigquery_billing_table   # GCP raw data
stripe_costs___balance_transaction   # Stripe raw data
aws_costs                            # Rill model (materialized)
gcp_costs                            # Rill model (materialized)
stripe_revenue                       # Rill model (materialized)
unified_cost_model                   # Rill model (materialized)
```
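To confirm which tables exist, you can list them with `clickhouse-client`; this assumes your data landed in the `default` database, as in the DSN above:

```bash
clickhouse-client --host xxxxx.clickhouse.cloud --secure \
    --password 'your-password' \
    --query "SHOW TABLES FROM default"
```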
## Data Anonymization

For public dashboards, anonymize your cost data. The anonymization step changes three things:

- Cost values: multiplied by a random 2-8× factor
- Row duplication: 3× more data by default
- IDs hashed: account IDs and project IDs are anonymized
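Conceptually, the transformation is a ClickHouse mutation like the one below. This is an illustrative sketch only: the column names are CUR-style examples, and the bundled script's actual SQL may differ.

```bash
# Sketch of the anonymization idea (not the script's real SQL):
# scale each cost by a random factor in [2, 8] and hash the account ID.
clickhouse-client --host xxxxx.clickhouse.cloud --secure --password 'your-password' --query "
  ALTER TABLE aws_costs___cur_export_test_00001
  UPDATE
    line_item_unblended_cost = line_item_unblended_cost * (2 + (rand() / 4294967295.0) * 6),
    line_item_usage_account_id = hex(sipHash64(line_item_usage_account_id))
  WHERE 1"
```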
Run it:

```bash
# After loading data to ClickHouse
make anonymize-clickhouse

# Customize with environment variables
COST_MULTIPLIER_MIN=5.0 COST_MULTIPLIER_MAX=10.0 DUPLICATE_ROWS=5 make anonymize-clickhouse
```

To clear ClickHouse data:

```bash
# Interactive (asks for confirmation)
make clear-clickhouse

# Force mode (for scripts)
make clear-clickhouse-force

# Dry run (see what would be deleted)
uv run python scripts/clear_clickhouse.py --dry-run
```

This drops all dlt tables and Rill model tables.
| Variable | Default | Description |
|---|---|---|
| `COST_MULTIPLIER_MIN` | `2.0` | Minimum cost multiplier |
| `COST_MULTIPLIER_MAX` | `8.0` | Maximum cost multiplier |
| `DUPLICATE_ROWS` | `3` | Row duplication factor |
See ANONYMIZATION.md for details.
## GitHub Actions Automation

Automate daily data loads to ClickHouse.

Setup:

- Add GitHub Secrets (Settings → Secrets → Actions):
  - `CLICKHOUSE_HOST` - your ClickHouse host
  - `CLICKHOUSE_USERNAME` - usually `default`
  - `CLICKHOUSE_PASSWORD` - your ClickHouse password
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `GCP_PROJECT_ID`, `GCP_PRIVATE_KEY`, `GCP_CLIENT_EMAIL`
  - `STRIPE_SECRET_KEY`
- The workflow already exists at `.github/workflows/etl-pipeline.yml`
- Test a manual trigger (or use the CLI sketch after this list):
  - Go to the Actions tab → Cloud Cost ETL Pipeline
  - Click "Run workflow"
  - Monitor progress

What happens:

- Runs daily at 2 AM UTC automatically
- Loads only new data (incremental)
- Takes ~5-10 minutes per run

Monitoring:

- Check the Actions tab for status
- Green checkmark = success
- Logs are available in each workflow run
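If you prefer the command line, the GitHub CLI can trigger and inspect the same workflow (assuming `gh` is installed and authenticated against the repo):

```bash
# Trigger the workflow manually
gh workflow run etl-pipeline.yml

# List recent runs of this workflow
gh run list --workflow=etl-pipeline.yml --limit 5
```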
"Cannot connect to ClickHouse"
Solutions:
- Verify ClickHouse service is running (check cloud console)
- Check host includes full domain:
xxxxx.region.gcp.clickhouse.cloud - Ensure
secure = 1in secrets.toml - Test connection:
clickhouse-client --host xxx --secure --password 'xxx' --query "SELECT 1"
"Authentication failed"
Solutions:
- Verify credentials in
.dlt/secrets.toml - No trailing spaces in values
- Try regenerating password in ClickHouse console
- Update both local config and GitHub Secrets
"Model execution is disabled" or "read_parquet not found"
Solutions:
- Check
RILL_CONNECTOR=clickhouseis set inviz_rill/.env - Verify ClickHouse DSN is correct
- Make sure
rill.yamlhasolap_connector: clickhouse - Restart Rill after changing config
"No data in dashboards"
Solutions:
- Verify data loaded:
SELECT count(*) FROM aws_costs___cur_export_test_00001 - Check Rill is using correct connector
- Look for errors in Rill startup logs
- Verify models reconciled successfully
"Dashboards show errors on some visualizations"
Solutions:
- Some visualizations may need column adjustments
- Check metrics files reference correct columns
- Models select all columns with
SELECT *, so schema should match
"Queries are slow"
Solutions:
- Add indexes in ClickHouse for common query patterns
- Use materialized views for aggregations
- Optimize metrics to reduce data scanned
- Consider table partitioning for large datasets
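For example, a data-skipping index on a frequently filtered column might look like this; the column name is an assumption, so replace it with one your dashboards actually filter on:

```bash
# Illustrative: add a data-skipping index to the raw AWS table
# (product_servicecode is an example column, not guaranteed to exist).
clickhouse-client --host xxxxx.clickhouse.cloud --secure --password 'your-password' --query "
  ALTER TABLE aws_costs___cur_export_test_00001
  ADD INDEX idx_service product_servicecode TYPE set(100) GRANULARITY 4"
```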
Start fresh:

```bash
# 1. Clear ClickHouse
make clear-clickhouse-force

# 2. Reload data
make run-etl-clickhouse

# 3. Anonymize if needed
make anonymize-clickhouse
```

ClickHouse Cloud costs:

- Free tier: Development tier available
- Idle scaling: auto-scales to zero when not in use
- Optimization: run the ETL once daily (sufficient for most use cases)

GitHub Actions costs:

- Free tier: 2,000 minutes/month for private repos (standard runners are free for public repos)
- Usage: ~5-10 minutes per run × 30 days ≈ 150-300 minutes/month
- Well under the limit for a daily schedule
Is normalization required?

Short answer: no. Normalization is optional for both local and cloud modes.

The normalization scripts (`normalize.py`, `normalize_gcp.py`) are now optional because:

- Models use `SELECT *` to get all columns from the raw data
- ClickHouse and DuckDB can query nested structures directly
- The current dashboards work with raw, unnormalized data
- Models add derived columns (like `product_product_name`) where needed

Only use normalization if you need:

- CUR Wizard dynamic dashboards (canvas generation)
- Pre-flattened tag/label columns for external tools
- Compatibility with tools that can't query nested structures

Current approach (no normalization):

- Pipelines load raw data to parquet/ClickHouse
- Models query the raw data and add derived columns
- Dashboards use the models
Important: dynamic dashboard generation (`make aws-dashboards` / `make gcp-dashboards`) works only in local mode, because it needs parquet files to analyze. ClickHouse mode doesn't create parquet files.

For local mode only:

```bash
# These commands work ONLY with local parquet files
make aws-dashboards   # Requires: viz_rill/data/aws_costs/*.parquet
make gcp-dashboards   # Requires: viz_rill/data/gcp_costs/*.parquet
```

For ClickHouse, the alternative approach is a materialized view:
```sql
-- Create a materialized view instead of running Python normalization.
-- ClickHouse requires an ENGINE (or a TO table) for materialized views;
-- POPULATE backfills rows that already exist at creation time.
CREATE MATERIALIZED VIEW aws_costs_normalized
ENGINE = MergeTree ORDER BY tuple()
POPULATE
AS SELECT
    *,
    resource_tags['Environment'] AS tag_environment,
    resource_tags['Team'] AS tag_team
FROM aws_costs___cur_export_test_00001;
```

ClickHouse materialized views are:

- ✅ Faster than Python scripts
- ✅ No intermediate parquet files needed
- ✅ Updated automatically as new data arrives
- ✅ Native ClickHouse performance
After a successful ClickHouse deployment:

- Configure Rill Cloud (optional):
  - Deploy dashboards: `cd viz_rill && rill deploy`
  - Set ClickHouse credentials in the Rill Cloud settings
  - Share dashboards with your team
- Set up monitoring (optional):
  - Add Slack/email notifications to the workflow
  - Monitor data freshness
  - Set up alerts for failed runs
- Optimize performance (optional):
  - Add ClickHouse indexes for common queries
  - Create aggregation tables
  - Adjust the workflow frequency based on your needs
Resources:

- ClickHouse docs: https://clickhouse.com/docs
- dlt ClickHouse destination: https://dlthub.com/docs/dlt-ecosystem/destinations/clickhouse
- Rill docs: https://docs.rilldata.com