4146 add troubleshooting section #4181
Open: dhtclk wants to merge 29 commits into main from 4146-add-troubleshooting-section
Commits (29, all by dhtclk), showing changes from all commits:

- 32feb0b Troubleshooting page, temporary placement
- 60f3fbe Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs …
- c619204 Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs …
- aa15a8a Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs …
- 0c7d189 Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs …
- 7c94a46 Adding Lessons Learned Guide with Interactable Queries
- 9166323 Split into multiple guides under a new section
- 2bbe1bc Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs …
- bf67a27 Keywords, cross-linking, clean-up
- e10f908 adding ask ai link to troubleshooting, simple kapa link component.
- dc751b2 commenting out C++ link
- 0ba3eb4 spelling and dictionary update
- e9a6adf Update docs/tips-and-tricks/too-many-parts.md
- 0f7d07e Update docs/tips-and-tricks/too-many-parts.md
- 7159177 Update docs/tips-and-tricks/too-many-parts.md
- 03abb7d Update docs/tips-and-tricks/too-many-parts.md
- 8e8fd7b Update docs/tips-and-tricks/debugging-toolkit.md
- 4867bbe Update docs/tips-and-tricks/cost-optimization.md
- ec2036f Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs …
- de5e966 Rewriting Creative Use Cases
- 21e518c fix formatting
- 33935f4 Rewrite cost-optimization doc
- f3806da Performance Optimization Guide
- d8e3ca2 Too Many Parts
- 395d484 MVs and Debugging Toolkit
- aa308be Fixing nav link
- bc0e0eb adding to dictionary
- 1dd1100 fixing dictionary
- f37c70a adding header ids
@@ -0,0 +1,57 @@
---
sidebar_position: 1
slug: /tips-and-tricks/community-wisdom
sidebar_label: 'Community Wisdom'
doc_type: 'overview'
keywords: [
  'database tips',
  'community wisdom',
  'production troubleshooting',
  'performance optimization',
  'database debugging',
  'clickhouse guides',
  'real world examples',
  'database best practices',
  'meetup insights',
  'production lessons',
  'interactive tutorials',
  'database solutions'
]
title: 'ClickHouse Community Wisdom'
description: 'Learn from the ClickHouse community with real world scenarios and lessons learned'
---
# ClickHouse Community Wisdom: Tips and Tricks from Meetups {#community-wisdom}

*These interactive guides collect lessons from hundreds of production deployments shared at community meetups. Each runnable example uses real GitHub events data to illustrate ClickHouse patterns - practice these concepts to avoid common mistakes and accelerate your success.*

Combine this collected knowledge with our [Best Practices](/best-practices) guide for an optimal ClickHouse experience.

## Problem-Specific Quick Jumps {#problem-specific-quick-jumps}

| Issue | Document | Description |
|-------|----------|-------------|
| **Production Issue** | [Debugging Toolkit](./debugging-toolkit.md) | Copy/paste queries and production debugging guidance |
| **Slow Queries** | [Performance Optimization](./performance-optimization.md) | Community-tested approaches to diagnosing and speeding up slow queries |
| **Materialized Views** | [MV Double-Edged Sword](./materialized-views.md) | Avoid 10x storage amplification from materialized views |
| **Too Many Parts** | [Too Many Parts](./too-many-parts.md) | Addressing the 'Too Many Parts' error and related performance slowdowns |
| **High Costs** | [Cost Optimization](./cost-optimization.md) | Strategies for reducing storage and infrastructure costs |
| **Creative Use Cases** | [Success Stories](./creative-usecases.md) | Examples of ClickHouse in 'outside the box' use cases |

### Usage Instructions {#usage-instructions}

1. **Run the examples** - Many SQL blocks are directly executable
2. **Experiment freely** - Modify the queries to test different patterns
3. **Adapt to your data** - Use the templates with your own table names
4. **Monitor regularly** - Implement the health-check queries as ongoing monitoring
5. **Learn progressively** - Start with the basics, then advance to optimization patterns

### Interactive Features {#interactive-features}

- **Real Data Examples**: Queries use actual GitHub events from the ClickHouse playground (see the sketch below)
- **Production-Ready Templates**: Adapt the examples for your production systems
- **Progressive Difficulty**: From basic concepts to advanced optimization
- **Emergency Procedures**: Ready-to-use debugging and recovery queries

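As a rough illustration of what the runnable examples look like, the query below counts GitHub events by type in the public `github_events` dataset that backs the ClickHouse playground. The table and column names follow the published GitHub Events schema; adjust them if your environment differs.

```sql
-- Count events by type in the public GitHub events dataset.
-- Table and column names assume the published github_events schema
-- used by the ClickHouse playground.
SELECT
    event_type,
    count() AS events
FROM github_events
GROUP BY event_type
ORDER BY events DESC
LIMIT 10;
```
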
**Last Updated:** Based on community meetup insights through 2024-2025
**Contributing:** Found a mistake or have a new lesson? Community contributions are welcome.
@@ -0,0 +1,100 @@
---
sidebar_position: 1
slug: /community-wisdom/cost-optimization
sidebar_label: 'Cost Optimization'
doc_type: 'how-to-guide'
keywords: [
  'cost optimization',
  'storage costs',
  'partition management',
  'data retention',
  'storage analysis',
  'database optimization',
  'clickhouse cost reduction',
  'storage hot spots',
  'ttl performance',
  'disk usage',
  'compression strategies',
  'retention analysis'
]
title: 'Lessons - Cost Optimization'
description: 'Battle-tested cost optimization strategies from ClickHouse community meetups with real production examples and verified techniques.'
---
# Cost Optimization: Battle-Tested Strategies {#cost-optimization}

*This guide is part of a collection of findings gained from community meetups. The findings on this page cover community wisdom related to optimizing cost while using ClickHouse. For more real-world solutions and insights you can [browse by specific problem](./community-wisdom.md).*

## The ContentSquare Migration: 11x Cost Reduction {#contentsquare-migration}

ContentSquare's migration from Elasticsearch to ClickHouse shows the cost optimization potential of moving analytics workloads to ClickHouse. Their platform serves over 1,000 enterprise customers and processes more than one billion page views daily. Before the migration, ContentSquare ran 14 Elasticsearch clusters of 30 nodes each and struggled to scale them further while keeping them stable. They were unable to host very large clients with high traffic, and frequently had to move clients between clusters as their traffic outgrew cluster capacity.

ContentSquare took a phased approach to avoid disrupting business operations. They first tested ClickHouse on a new mobile analytics product, which took four months to ship to production. This success convinced them to migrate their main web analytics platform. The full web migration took ten months to port all endpoints, followed by a careful client-by-client migration of 600 clients in batches to avoid performance issues. They built extensive automation for non-regression testing, allowing them to complete the migration with zero regressions.

After the migration, the infrastructure became 11x cheaper while storing six times more data and delivering 10x faster performance on 99th-percentile queries. *"We are saving multiple millions per year using ClickHouse,"* the team noted. The improvement was most pronounced for their slowest queries: fast queries (200ms on Elasticsearch) only improved to about 100ms on ClickHouse, but their worst-performing queries went from over 15 seconds on Elasticsearch to under 2 seconds on ClickHouse.

Their current ClickHouse setup includes 16 clusters across four regions on AWS and Azure, with over 100 nodes in total. Each cluster typically has nine shards with two replicas per shard. They process approximately 100,000 analytics queries daily with an average response time of 200 milliseconds, and they increased data retention from 3 months to 13 months.

**Key Results:**
- 11x reduction in infrastructure costs
- 6x increase in data storage capacity
- 10x faster 99th-percentile query performance
- Multiple millions in annual savings
- Increased data retention from 3 months to 13 months
- Zero regressions during migration

## Compression Strategy: LZ4 vs ZSTD in Production {#compression-strategy}

When Microsoft Clarity needed to handle hundreds of terabytes of data, they discovered that compression choices have dramatic cost implications. At their scale, every bit of storage savings matters, and they faced a classic trade-off: performance versus storage costs. Microsoft Clarity handles massive volumes - two petabytes of uncompressed data per month across all accounts, around 60,000 queries per hour across eight nodes, and billions of page views from millions of websites. At this scale, compression strategy becomes a critical cost factor.

They initially used ClickHouse's default LZ4 compression but discovered that significant cost savings were possible with ZSTD. While LZ4 is faster, ZSTD provides better compression at the cost of slightly slower performance. After testing both approaches, they made a strategic decision to prioritize storage savings. The results were significant: 50% storage savings on large tables with a manageable performance impact on ingestion and queries. A sketch of how to apply and evaluate column codecs follows the results below.

**Key Results:**
- 50% storage savings on large tables through ZSTD compression
- 2 petabytes of monthly data processing capacity
- Manageable performance impact on ingestion and queries
- Significant cost reduction at hundreds-of-terabytes scale

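The snippet below is a minimal, hypothetical sketch (the table and column names are invented) of specifying ZSTD column codecs at table creation; the follow-up query uses `system.columns` to compare compressed and uncompressed sizes so you can measure the effect on your own data:

```sql
-- Hypothetical table: ZSTD codecs chosen per column instead of the default LZ4.
CREATE TABLE events_zstd
(
    event_time DateTime CODEC(Delta, ZSTD(1)),
    session_id UInt64   CODEC(ZSTD(1)),
    payload    String   CODEC(ZSTD(3))   -- higher level: better ratio, slower writes
)
ENGINE = MergeTree
ORDER BY (session_id, event_time);

-- Check how well each column compresses on disk.
SELECT
    table,
    name AS column,
    compression_codec,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase()
ORDER BY data_compressed_bytes DESC
LIMIT 20;
```

Existing columns can also be switched with `ALTER TABLE ... MODIFY COLUMN ... CODEC(ZSTD(1))`; newly written and newly merged parts then use the new codec.
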
## Column-Based Retention Strategy {#column-retention}

One of the most powerful cost optimization techniques comes from analyzing which columns are actually being used. Microsoft Clarity implements sophisticated column-based retention strategies using ClickHouse's built-in telemetry. ClickHouse provides detailed metrics on storage usage by column as well as comprehensive query patterns: which columns are accessed, how frequently, query duration, and overall usage statistics.

This data-driven approach enables strategic decisions about retention policies and column lifecycle management. By analyzing this telemetry data, Microsoft can identify storage hot spots - columns that consume significant space but receive minimal queries. For these low-usage columns they can apply aggressive retention policies, reducing storage time from 30 months to just one month, or delete the columns entirely if they are not queried at all. This selective retention strategy reduces storage costs without impacting the user experience.

**The Strategy:**
- Analyze column usage patterns using ClickHouse telemetry
- Identify high-storage, low-query columns
- Implement selective retention policies (see the sketch after this list)
- Monitor query patterns for data-driven decisions

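The exact telemetry queries Microsoft uses are not public, but a starting point (assuming `system.query_log` is enabled and your ClickHouse version records the accessed `columns` array) is to rank columns by on-disk size and by how often queries touch them, then tighten retention on the cold ones with a column-level TTL. The table and column names below are hypothetical:

```sql
-- Storage hot spots: largest columns in the current database.
SELECT
    table,
    name AS column,
    formatReadableSize(data_compressed_bytes) AS stored
FROM system.columns
WHERE database = currentDatabase()
ORDER BY data_compressed_bytes DESC
LIMIT 20;

-- How often each column was read in the last 30 days
-- (requires query_log; its `columns` array lists accessed columns).
SELECT
    accessed_column,
    count() AS queries
FROM system.query_log
ARRAY JOIN `columns` AS accessed_column
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 30 DAY
GROUP BY accessed_column
ORDER BY queries ASC
LIMIT 20;

-- Example: keep a rarely-read column for only one month
-- (hypothetical table/column; event_time is the table's DateTime column).
ALTER TABLE events MODIFY COLUMN debug_payload String TTL event_time + INTERVAL 1 MONTH;
```
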
## Partition-Based Data Management {#partition-management}

Microsoft Clarity discovered that partitioning strategy impacts both performance and operational simplicity. Their approach: partition by date, order by hour. This delivers multiple benefits beyond cleanup efficiency - it enables trivial data cleanup, simplifies billing calculations for their customer-facing service, and supports GDPR compliance requirements for row-based deletion. A sketch of the pattern follows the list below.

**Key Benefits:**
- Trivial data cleanup (drop a partition instead of deleting row by row)
- Simplified billing calculations
- Better query performance through partition elimination
- Easier operational management

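Microsoft's exact schema is not public; the sketch below (invented table and column names) shows the general shape of the pattern: daily partitions so old data can be dropped as whole partitions, with an hour-level component in the sorting key:

```sql
-- Daily partitions, hour-level ordering (hypothetical schema).
CREATE TABLE page_events
(
    event_time DateTime,
    project_id UInt64,
    url        String
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (project_id, toStartOfHour(event_time));

-- Retiring a day of data is a cheap metadata operation,
-- not a row-by-row DELETE.
ALTER TABLE page_events DROP PARTITION '2024-01-01';
```
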
## String-to-Integer Conversion Strategy {#string-integer-conversion}

Analytics platforms often face a storage challenge with categorical data that repeats across millions of rows. Microsoft's engineering team encountered this problem with their search analytics data and developed a solution that achieved a 60% storage reduction on the affected datasets.

In Microsoft's web analytics system, search results trigger different types of answers - weather cards, sports information, news articles, and factual responses. Each query result was tagged with descriptive strings like "weather_answer," "sports_answer," or "factual_answer." With billions of search queries processed, these string values were stored repeatedly in ClickHouse, consuming massive amounts of storage space and requiring expensive string comparisons during queries.

Microsoft implemented a string-to-integer mapping system backed by a separate MySQL database. Instead of storing the actual strings in ClickHouse, they store only integer IDs. When users run queries through the UI and request data for "weather_answer," the query optimizer first consults the MySQL mapping table to get the corresponding integer ID, then rewrites the query to use that integer before sending it to ClickHouse.

This architecture preserves the user experience - people still see meaningful labels like "weather_answer" in their dashboards - while the backend storage and queries operate on much more efficient integers. The mapping system handles all translation transparently, requiring no changes to the user interface or user workflows. A sketch of the ClickHouse side of this pattern follows the list below.

**Key Benefits:**
- 60% storage reduction on affected datasets
- Faster query performance on integer comparisons
- Reduced memory usage for joins and aggregations
- Lower network transfer costs for large result sets

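A minimal sketch of the ClickHouse side of this pattern, with invented table and column names (the MySQL mapping table and the query-rewrite layer live outside ClickHouse):

```sql
-- ClickHouse stores only the integer ID for the answer type.
CREATE TABLE search_results
(
    event_time     DateTime,
    query_id       UInt64,
    answer_type_id UInt16   -- e.g. 1 = 'weather_answer' in the external mapping table
)
ENGINE = MergeTree
ORDER BY (answer_type_id, event_time);

-- The application layer resolves 'weather_answer' -> 1 against MySQL,
-- so the query that reaches ClickHouse compares integers, not strings.
SELECT count() AS weather_results
FROM search_results
WHERE answer_type_id = 1
  AND event_time >= today() - 7;
```
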
## Video Sources {#video-sources}

- **[Microsoft Clarity and ClickHouse](https://www.youtube.com/watch?v=rUVZlquVGw0)** - Microsoft Clarity Team
- **[ClickHouse journey in Contentsquare](https://www.youtube.com/watch?v=zvuCBAl2T0Q)** - Doron Hoffman & Guram Sigua (ContentSquare)

*These community cost optimization insights represent strategies from companies processing hundreds of terabytes to petabytes of data, showing real-world approaches to reducing ClickHouse operational costs.*
We had taken to sentence casing, which is how Google does it (https://developers.google.com/style/capitalization). I'm not strongly opinionated on which style we use, but it would be nice to keep it consistent.