diff --git a/docs/tips-and-tricks/community-wisdom.md b/docs/tips-and-tricks/community-wisdom.md new file mode 100644 index 00000000000..6f67c656fb3 --- /dev/null +++ b/docs/tips-and-tricks/community-wisdom.md @@ -0,0 +1,57 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/community-wisdom +sidebar_label: 'Community Wisdom' +doc_type: 'overview' +keywords: [ + 'database tips', + 'community wisdom', + 'production troubleshooting', + 'performance optimization', + 'database debugging', + 'clickhouse guides', + 'real world examples', + 'database best practices', + 'meetup insights', + 'production lessons', + 'interactive tutorials', + 'database solutions' +] +title: 'ClickHouse Community Wisdom' +description: 'Learn from the ClickHouse community with real world scenarios and lessons learned' +--- + +# ClickHouse Community Wisdom: Tips and Tricks from Meetups {#community-wisdom} + +*These interactive guides represent collective wisdom from hundreds of production deployments. Each runnable example helps you understand ClickHouse patterns using real GitHub events data - practice these concepts to avoid common mistakes and accelerate your success.* + +Combine this collected knowledge with our [Best Practices](/best-practices) guide for optimal ClickHouse Experience. + +## Problem-Specific Quick Jumps {#problem-specific-quick-jumps} + +| Issue | Document | Description | +|-------|---------|-------------| +| **Production Issue** | [Debugging-toolkit](./debugging-toolkit.md) | Copy/Paste Queries, production debugging guidance | +| **Slow Queries** | [Performance optimization](./performance-optimization.md) | Optimize Performance | +| **Materialized Views** | [MV double-edged sword](./materialized-views.md) | Avoid 10x storage instances | +| **Too Many Parts** | [Too many parts](./too-many-parts.md) | Addressing the 'Too Many Parts' error and performance slowdown | +| **High Costs** | [Cost optimization](./cost-optimization.md) | Optimize Cost | +| **Creative Use Cases** | [Success stories](./creative-usecases.md) | Examples of ClickHouse in 'Outside the Box' use cases | + +### Usage Instructions {#usage-instructions} + +1. **Run the examples** - Many SQL blocks executable +2. **Experiment freely** - Modify queries to test different patterns +3. **Adapt to your data** - Use templates with your own table names +4. **Monitor regularly** - Implement health check queries as ongoing monitoring +5. **Learn progressively** - Start with basics, advance to optimization patterns + +### Interactive Features {#interactive-features} + +- **Real Data Examples**: Using actual GitHub events from ClickHouse playground +- **Production-Ready Templates**: Adapt examples for your production systems +- **Progressive Difficulty**: From basic concepts to advanced optimization +- **Emergency Procedures**: Ready-to-use debugging and recovery queries + +**Last Updated:** Based on community meetup insights through 2024-2025 +**Contributing:** Found a mistake or have a new lesson? 
Community contributions welcome \ No newline at end of file diff --git a/docs/tips-and-tricks/cost-optimization.md b/docs/tips-and-tricks/cost-optimization.md new file mode 100644 index 00000000000..ac669fc2229 --- /dev/null +++ b/docs/tips-and-tricks/cost-optimization.md @@ -0,0 +1,100 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/cost-optimization +sidebar_label: 'Cost Optimization' +doc_type: 'how-to-guide' +keywords: [ + 'cost optimization', + 'storage costs', + 'partition management', + 'data retention', + 'storage analysis', + 'database optimization', + 'clickhouse cost reduction', + 'storage hot spots', + 'ttl performance', + 'disk usage', + 'compression strategies', + 'retention analysis' +] +title: 'Lessons - Cost Optimization' +description: 'Battle-tested cost optimization strategies from ClickHouse community meetups with real production examples and verified techniques.' +--- + +# Cost Optimization: Battle-Tested Strategies {#cost-optimization} +*This guide is part of a collection of findings gained from community meetups. The findings on this page cover community wisdom related to optimizing cost while using ClickHouse. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* + +## The ContentSquare Migration: 11x Cost Reduction {#contentsquare-migration} + +ContentSquare's migration from Elasticsearch to ClickHouse shows the cost optimization potential when moving to ClickHouse for analytics workloads, involving over 1,000 enterprise customers and processing over one billion page views daily. Before migration, ContentSquare ran 14 Elasticsearch clusters, each with 30 nodes, and struggled to make them bigger while keeping them stable. They were unable to host very large clients with high traffic, and frequently had to move clients between clusters as their traffic grew beyond cluster capacity. + +ContentSquare took a phased approach to avoid disrupting business operations. They first tested ClickHouse on a new mobile analytics product, which took four months to ship to production. This success convinced them to migrate their main web analytics platform. The full web migration took ten months to port all endpoints, followed by careful client-by-client migration of 600 clients in batches to avoid performance issues. They built extensive automation for non-regression testing, allowing them to complete the migration with zero regressions. + +After migration, the infrastructure became 11x cheaper while storing six times more data and delivering 10x faster performance on the 99th percentile queries. *"We are saving multiple millions per year using ClickHouse,"* the team noted. The performance improvements were particularly notable for their slowest queries—while fast queries (200ms on Elasticsearch) only improved to about 100ms on ClickHouse, their worst-performing queries went from over 15 seconds on Elasticsearch to under 2 seconds on ClickHouse. + +Their current ClickHouse setup includes 16 clusters across four regions on AWS and Azure, with over 100 nodes total. Each cluster typically has nine shards with two replicas per shard. They process approximately 100,000 analytics queries daily with an average response time of 200 milliseconds, while also increasing data retention from 3 months to 13 months. 
+ +**Key Results:** +- 11x reduction in infrastructure costs +- 6x increase in data storage capacity +- 10x faster 99th percentile query performance +- Multiple millions in annual savings +- Increased data retention from 3 months to 13 months +- Zero regressions during migration + +## Compression Strategy: LZ4 vs ZSTD in Production {#compression-strategy} + +When Microsoft Clarity needed to handle hundreds of terabytes of data, they discovered that compression choices have dramatic cost implications. At their scale, every bit of storage savings matters, and they faced a classic trade-off: performance versus storage costs. Microsoft Clarity handles massive volumes—two petabytes of uncompressed data per month across all accounts, processing around 60,000 queries per hour across eight nodes and serving billions of page views from millions of websites. At this scale, compression strategy becomes a critical cost factor. + +They initially used ClickHouse's default LZ4 compression but discovered significant cost savings were possible with ZSTD. While LZ4 is faster, ZSTD provides better compression at the cost of slightly slower performance. After testing both approaches, they made a strategic decision to prioritize storage savings. The results were significant: 50% storage savings on large tables with manageable performance impact on ingestion and queries. + +**Key Results:** +- 50% storage savings on large tables through ZSTD compression +- 2 petabytes monthly data processing capacity +- Manageable performance impact on ingestion and queries +- Significant cost reduction at hundreds of TB scale + +## Column-Based Retention Strategy {#column-retention} + +One of the most powerful cost optimization techniques comes from analyzing which columns are actually being used. Microsoft Clarity implements sophisticated column-based retention strategies using ClickHouse's built-in telemetry capabilities. ClickHouse provides detailed metrics on storage usage by column as well as comprehensive query patterns—which columns are accessed, how frequently, query duration, and overall usage statistics. + +This data-driven approach enables strategic decisions about retention policies and column lifecycle management. By analyzing this telemetry data, Microsoft can identify storage hot spots—columns that consume significant space but receive minimal queries. For these low-usage columns, they can implement aggressive retention policies, reducing storage time from 30 months to just one month, or delete the columns entirely if they're not queried at all. This selective retention strategy reduces storage costs without impacting user experience. + +**The Strategy:** +- Analyze column usage patterns using ClickHouse telemetry +- Identify high-storage, low-query columns +- Implement selective retention policies +- Monitor query patterns for data-driven decisions + +## Partition-Based Data Management {#partition-management} + +Microsoft Clarity discovered that partitioning strategy impacts both performance and operational simplicity. Their approach: partition by date, order by hour. This strategy delivers multiple benefits beyond just cleanup efficiency—it enables trivial data cleanup, simplifies billing calculations for their customer-facing service, and supports GDPR compliance requirements for row-based deletion. 
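A minimal sketch of what this kind of layout can look like, assuming a hypothetical `events` table (the column names, ZSTD level, and partition value below are illustrative, not Microsoft Clarity's actual schema):

```sql
-- Hypothetical table combining the ideas above: ZSTD on the bulky column,
-- daily partitions for cheap cleanup, and an hour-first sort key
CREATE TABLE events
(
    event_time  DateTime,
    customer_id UInt64,
    payload     String CODEC(ZSTD(3))    -- trade a little CPU for much smaller storage
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)          -- one partition per day
ORDER BY (toStartOfHour(event_time), customer_id);

-- Retiring a day of data is a lightweight partition operation, not a row-by-row delete
ALTER TABLE events DROP PARTITION '2024-01-15';
```

Dropping a partition avoids the heavy mutations that rewriting parts would require, which is what makes date-based partitioning attractive for retention and billing boundaries.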
+ +**Key Benefits:** +- Trivial data cleanup (drop partition vs row-by-row deletion) +- Simplified billing calculations +- Better query performance through partition elimination +- Easier operational management + +## String-to-Integer Conversion Strategy {#string-integer-conversion} + +Analytics platforms often face a storage challenge with categorical data that appears repeatedly across millions of rows. Microsoft's engineering team encountered this problem with their search analytics data and developed an effective solution that achieved 60% storage reduction on affected datasets. + +In Microsoft's web analytics system, search results trigger different types of answers—weather cards, sports information, news articles, and factual responses. Each query result was tagged with descriptive strings like "weather_answer," "sports_answer," or "factual_answer." With billions of search queries processed, these string values were being stored repeatedly in ClickHouse, consuming massive amounts of storage space and requiring expensive string comparisons during queries. + +Microsoft implemented a string-to-integer mapping system using a separate MySQL database. Instead of storing the actual strings in ClickHouse, they store only integer IDs. When users run queries through the UI and request data for "weather_answer," their query optimizer first consults the MySQL mapping table to get the corresponding integer ID, then converts the query to use that integer before sending it to ClickHouse. + +This architecture preserves the user experience—people still see meaningful labels like "weather_answer" in their dashboards—while the backend storage and queries operate on much more efficient integers. The mapping system handles all translation transparently, requiring no changes to the user interface or user workflows. + +**Key Benefits:** +- 60% storage reduction on affected datasets +- Faster query performance on integer comparisons +- Reduced memory usage for joins and aggregations +- Lower network transfer costs for large result sets + +## Video Sources {#video-sources} + +- **[Microsoft Clarity and ClickHouse](https://www.youtube.com/watch?v=rUVZlquVGw0)** - Microsoft Clarity Team +- **[ClickHouse journey in Contentsquare](https://www.youtube.com/watch?v=zvuCBAl2T0Q)** - Doron Hoffman & Guram Sigua (ContentSquare) + +*These community cost optimization insights represent strategies from companies processing hundreds of terabytes to petabytes of data, showing real-world approaches to reducing ClickHouse operational costs.* \ No newline at end of file diff --git a/docs/tips-and-tricks/creative-usecases.md b/docs/tips-and-tricks/creative-usecases.md new file mode 100644 index 00000000000..5ecac5e5e3c --- /dev/null +++ b/docs/tips-and-tricks/creative-usecases.md @@ -0,0 +1,89 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/creative-use-cases +sidebar_label: 'Creative Use Cases' +doc_type: 'how-to-guide' +keywords: [ + 'clickhouse creative use cases', + 'clickhouse success stories', + 'unconventional database uses', + 'clickhouse rate limiting', + 'analytics database applications', + 'clickhouse mobile analytics', + 'customer-facing analytics', + 'database innovation', + 'clickhouse real-time applications', + 'alternative database solutions', + 'breaking database conventions', + 'production success stories' +] +title: 'Lessons - Creative Use Cases' +description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.' 
+--- + +# Breaking the Rules: Success Stories {#breaking-the-rules} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Need tips on debugging an issue in prod? Check out the [Debugging Toolkit](./debugging-toolkit.md) community insights guide.* + +These stories showcase how companies found success by using ClickHouse for unconventional use cases, challenging traditional database categories and proving that sometimes the "wrong" tool becomes exactly the right solution. + +## ClickHouse as Rate Limiter {#clickhouse-rate-limiter} + +When Craigslist needed to add tier-one rate limiting to protect their users, they faced the same decision every engineering team encounters: follow conventional wisdom and use Redis, or explore something different. Brad Lhotsky, working at Craigslist, knew Redis was the standard choice—virtually every rate limiting tutorial and example online uses Redis for good reason. It has rich primitives for rate limiting operations, well-established patterns, and proven track record. But Craigslist's experience with Redis wasn't matching the textbook examples. *"Our experience with Redis is not like what you've seen in the movies... there are a lot of weird maintenance issues that we've hit where we reboot a node in a Redis cluster and some latency spike hits the front end."* For a small team that values maintenance simplicity, these operational headaches were becoming a real problem. + +So when Brad was approached with the rate limiting requirements, he took a different approach: *"I asked my boss, 'What do you think of this idea? Maybe I can try this with ClickHouse?'"* The idea was unconventional—using an analytical database for what's typically a caching layer problem—but it addressed their core requirements: fail open, impose no latency penalties, and be maintenance-safe for a small team. The solution leveraged their existing infrastructure where access logs were already flowing into ClickHouse via Kafka. Instead of maintaining a separate Redis cluster, they could analyze request patterns directly from the access log data and inject rate limiting rules into their existing ACL API. The approach meant slightly higher latency than Redis, which *"is kind of cheating by instantiating that data set upfront"* rather than doing real-time aggregate queries, but the queries still completed in under 100 milliseconds. + +**Key Results:** +- Dramatic improvement over Redis infrastructure +- Built-in TTL for automatic cleanup eliminated maintenance overhead +- SQL flexibility enabled complex rate limiting rules beyond simple counters +- Leveraged existing data pipeline instead of requiring separate infrastructure + +## ClickHouse for Customer Analytics {#customer-analytics} + +When ServiceNow needed to upgrade their mobile analytics platform, they faced a simple question: *"Why would we replace something that works?"* Amir Vaza from ServiceNow knew their existing system was reliable, but customer demands were outgrowing what it could handle. *"The motivation to replace an existing reliable model is actually from the product world,"* Amir explained. ServiceNow offered mobile analytics as part of their solution for web, mobile, and chatbots, but customers wanted analytical flexibility that went beyond pre-aggregated data. 
+ +Their previous system used about 30 different tables with pre-aggregated data segmented by fixed dimensions: application, app version, and platform. For custom properties—key-value pairs that customers could send—they created separate counters for each group. This approach delivered fast dashboard performance but came with a major limitation. *"While this is great for quick value breakdown, I mentioned limitation leads to a lot of loss of analytical context,"* Amir noted. Customers couldn't perform complex customer journey analysis or ask questions like "how many sessions started with the search term 'research RSA token'" and then analyze what those users did next. The pre-aggregated structure destroyed the sequential context needed for multi-step analysis, and every new analytical dimension required engineering work to pre-aggregate and store. + +So when the limitations became clear, ServiceNow moved to ClickHouse and eliminated these pre-computation constraints entirely. Instead of calculating every variable upfront, they broke metadata into data points and inserted everything directly into ClickHouse. They used ClickHouse's async insert queue, which Amir called *"actually amazing,"* to handle data ingestion efficiently. The approach meant customers could now create their own segments, slice data freely across any dimensions, and perform complex customer journey analysis that wasn't possible before. + +**Key Results:** +- Dynamic segmentation across any dimensions without pre-computation +- Complex customer journey analysis became possible +- Customers could create their own segments and slice data freely +- No more engineering bottlenecks for new analytical requirements + +```sql runnable editable +-- Challenge: Try different customer journey analysis - track user flows across multiple touchpoints +-- Experiment: Test complex segmentation that wasn't possible with pre-aggregated tables +SELECT + 'Dynamic Customer Journey Analysis' as feature, + actor_login as user_id, + arrayStringConcat(groupArray(event_type), ' -> ') as user_journey, + count() as journey_frequency, + toStartOfDay(min(created_at)) as journey_start_date, + 'Real-time multi-dimensional analysis' as capability +FROM github.github_events +WHERE created_at >= '2024-01-15' + AND created_at < '2024-01-16' + AND event_type IN ('WatchEvent', 'StarEvent', 'ForkEvent', 'IssuesEvent') +GROUP BY user_id +HAVING journey_frequency >= 3 +ORDER BY journey_frequency DESC +LIMIT 15; +``` + +### The Pattern of Innovation {#pattern-of-innovation} + +Both success stories follow a similar pattern: teams that succeeded by questioning database orthodoxy rather than accepting conventional limitations. The breakthrough came when engineering leaders asked themselves whether the "right" tool was actually serving their specific needs. + +Craigslist's moment came when Brad asked: *"What do you think of this idea? Maybe I can try this with ClickHouse?"* Instead of accepting Redis maintenance complexity, they found a path that leveraged existing infrastructure. ServiceNow's realization was similar—rather than accepting that analytics must be slow or pre-computed, they recognized that customers needed the ability to segment data and slice it dynamically without constraints. + +Both teams succeeded because they designed around ClickHouse's unique strengths rather than trying to force it into traditional database patterns. 
They understood that sometimes the "analytical database" becomes the perfect operational solution when speed and SQL flexibility matter more than traditional OLTP guarantees. ClickHouse's combination of speed, SQL flexibility, and operational simplicity enables use cases that traditional database categories can't address—proving that the best tool is often the one that solves your specific problems, not the one that fits the textbook definition. + +## Video Sources {#video-sources} + +- **[Breaking the Rules - Building a Rate Limiter with ClickHouse](https://www.youtube.com/watch?v=wRwqrbUjRe4)** - Brad Lhotsky (Craigslist) +- **[ClickHouse as an Analytical Solution in ServiceNow](https://www.youtube.com/watch?v=b4Pmpx3iRK4)** - Amir Vaza (ServiceNow) + +*These stories demonstrate how questioning conventional database wisdom can lead to breakthrough solutions that redefine what's possible with analytical databases.* \ No newline at end of file diff --git a/docs/tips-and-tricks/debugging-toolkit.md b/docs/tips-and-tricks/debugging-toolkit.md new file mode 100644 index 00000000000..8e40ba38cc0 --- /dev/null +++ b/docs/tips-and-tricks/debugging-toolkit.md @@ -0,0 +1,176 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/debugging-toolkit +sidebar_label: 'Debugging Toolkit' +doc_type: 'how-to-guide' +keywords: [ + 'clickhouse troubleshooting', + 'clickhouse errors', + 'slow queries', + 'memory problems', + 'connection issues', + 'performance optimization', + 'database errors', + 'configuration problems', + 'debug', + 'solutions' +] +title: 'Lessons - Debugging Toolkit' +description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.' +--- + +# Operations: The 2AM Debugging Toolkit {#operations-debugging} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Suffering from high operational costs? Check out the [Cost Optimization](./cost-optimization.md) community insights guide.* + +## When Everything is Broken: Emergency Diagnostics {#emergency-diagnostics} + +**Community philosophy:** *"If something looks odd, even just slightly, something is wrong. Investigate before it gets worse."* + +## EMERGENCY: Production Incident Queries (Copy-Paste Ready) {#emergency-queries} + +**When your ClickHouse is down at 2AM, run these in order:** + +```sql +-- Step 1: What's broken right now? 
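-- (Counters in system.errors are cumulative since server start; pair large values
--  with last_error_time from the same table to separate old noise from what is
--  failing right now.)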
+SELECT name, value, 'CRITICAL ERROR' as urgency +FROM system.errors +WHERE value > 0 +ORDER BY value DESC; +``` + +```sql +-- Step 2: Disk space check (most common killer) +SELECT + database, table, + formatReadableSize(sum(bytes_on_disk)) as size, + count() as parts, + CASE + WHEN sum(bytes_on_disk) > 15*1024*1024*1024*1024 THEN 'CRITICAL: Near 16TB limit' + WHEN count() > 1000 THEN 'PARTS EXPLOSION' + ELSE 'OK' + END as status +FROM system.parts +WHERE active=1 AND database NOT IN ('system') +GROUP BY database, table +ORDER BY sum(bytes_on_disk) DESC; +``` + +```sql +-- Step 3: Replication problems +SELECT + database, table, absolute_delay, queue_size, + CASE + WHEN absolute_delay > 300 THEN 'CRITICAL: 5+ min lag' + WHEN is_readonly = 1 THEN 'READ-ONLY ERROR' + ELSE 'OK' + END as status +FROM system.replicas +ORDER BY absolute_delay DESC; +``` + +```sql +-- Step 4: Kill resource hogs +SELECT query_id, user, elapsed, formatReadableSize(memory_usage) as memory, + substring(query, 1, 80) as query_preview +FROM system.processes +WHERE elapsed > 60 OR memory_usage > 4*1024*1024*1024 +ORDER BY memory_usage DESC; + +-- To kill: KILL QUERY WHERE query_id = 'paste_id_here'; +``` + +```sql +-- Step 5: Stuck merges +SELECT database, table, elapsed, progress, + CASE WHEN elapsed > 3600 AND progress < 0.1 THEN 'STUCK' ELSE 'OK' END +FROM system.merges +ORDER BY elapsed DESC; +``` + +## Learning: Incident Pattern Recognition {#incident-patterns} + +**Understand the failure modes with working examples:** + +### Memory Exhaustion Detection {#memory-exhaustion} + +```sql runnable editable +-- Challenge: Try different cardinality combinations to see which ones are most dangerous +-- Experiment: Add SAMPLE 0.1 to this query if it's slow on large datasets +SELECT + 'Memory Risk Analysis' as analysis_type, + count() as total_events, + uniq(actor_login, repo_name, event_type) as unique_combinations, + round(uniq(actor_login, repo_name, event_type) / count() * 100, 2) as cardinality_percent, + CASE + WHEN uniq(actor_login, repo_name, event_type) / count() > 0.9 + THEN 'CRITICAL: Nearly every row unique - will exhaust memory!' + WHEN uniq(actor_login, repo_name, event_type) / count() > 0.5 + THEN 'HIGH RISK: Too many unique groups' + ELSE 'SAFE: Reasonable aggregation ratio' + END as memory_risk_level +FROM github.github_events +WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02' +LIMIT 1; +``` + +### Bad Data Detection {#bad-data-detection} + +```sql runnable editable +-- Challenge: Modify the year thresholds (2010, 2030) based on your expected data ranges +-- Experiment: Try different time ranges to see what suspicious data patterns emerge +SELECT + 'Data Quality Check' as analysis, + data_year, + count() as events, + CASE + WHEN data_year < 2010 THEN 'BAD: Suspiciously old timestamps' + WHEN data_year > 2030 THEN 'BAD: Far future timestamps' + ELSE 'NORMAL' + END as data_quality +FROM ( + SELECT toYear(created_at) as data_year + FROM github.github_events + WHERE created_at >= '2020-01-01' +) +GROUP BY data_year +ORDER BY data_year DESC; +``` + +## The 2AM Methodology {#the-2am-methodology} + +**Follow this exact sequence when everything is broken:** + +### Phase 1: Immediate Triage (30 seconds) {#phase-1-immediate-triage} + +1. Run `system.errors` - any non-zero = active incident +2. Check disk space - *"It was as simple as low disk it took us from 12 to 4 AM"* +3. Look for replication lag > 5 minutes + +### Phase 2: Resource Investigation (2 minutes) {#phase-2-resource-investigation} + +4. 
Find memory-hungry queries in `system.processes` +5. Check for stuck merges running >1 hour +6. Kill obviously problematic queries + +### Phase 3: Data Quality Check (5 minutes) {#phase-3-data-quality-check} + +7. Look for bad partitions (1998, 2050 dates) +8. Check for parts explosion (>1000 parts per table) + +## Emergency Actions Reference {#emergency-actions} + +**Production-tested solutions:** + +| Problem | Detection Query | Solution | +|---------|-----------------|----------| +| **Memory OOM** | `SELECT * FROM system.processes WHERE memory_usage > 8GB` | *"Enable external aggregation-it will be a little bit slower...but it will use much less memory"* | +| **Disk Full** | `SELECT sum(bytes_on_disk) FROM system.parts` | Delete old partitions, expand disk | +| **Replication Lag** | `SELECT * FROM system.replicas WHERE absolute_delay > 300` | Check network, restart lagging replica | +| **Stuck Query** | `SELECT * FROM system.processes WHERE elapsed > 300` | `KILL QUERY WHERE query_id = '...'` | +| **Parts Explosion** | `SELECT count() FROM system.parts WHERE active=1` | Enable async_insert, increase batch sizes | + +**The golden rule:** *"Problems very rarely just pop out of nowhere there are signs... investigate it before it goes from 15 milliseconds to 30 seconds"* + +## Video Sources {#video-sources} +- [10 Lessons from Operating ClickHouse](https://www.youtube.com/watch?v=liTgGiTuhJE) - Source of the disk space, memory, and bad data lessons from production operations \ No newline at end of file diff --git a/docs/tips-and-tricks/materialized-views.md b/docs/tips-and-tricks/materialized-views.md new file mode 100644 index 00000000000..c0f0b4b1ed2 --- /dev/null +++ b/docs/tips-and-tricks/materialized-views.md @@ -0,0 +1,93 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/materialized-views +sidebar_label: 'Materialized Views' +doc_type: 'how-to' +keywords: [ + 'clickhouse materialized views', + 'materialized view optimization', + 'materialized view storage issues', + 'materialized view best practices', + 'database aggregation patterns', + 'materialized view anti-patterns', + 'storage explosion problems', + 'materialized view performance', + 'database view optimization', + 'aggregation strategy', + 'materialized view troubleshooting', + 'view storage overhead' +] +title: 'Lessons - Materialized Views' +description: 'Real world examples of materialized views, problems and solutions' +--- + +# Materialized Views: The Double-Edged Sword {#materialized-views-the-double-edged-sword} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Too many parts bogging your database down? Check out the [Too Many Parts](./too-many-parts.md) community insights guide.* +*Learn more about [Materialized Views](/materialized-views).* + +## The 10x Storage Anti-Pattern {#storage-antipattern} + +**Real production problem:** *"We had a materialized view... the raw log table was around 20 gig but the view from that log table got exploded to 190 gig so almost 10x the size of the raw table... this happened because... we were creating one row per attribute and each log can have 10 attributes"* + +**Rule:** If your GROUP BY creates more rows than it eliminates, you're building an expensive index, not a materialized view. 
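A hedged sketch of both shapes, assuming a hypothetical `raw_logs` table with an `attributes Array(Tuple(String, String))` column (the schema is illustrative, not the team's actual one):

```sql
-- Anti-pattern: the view explodes each log into one row per attribute,
-- so it multiplies rows instead of reducing them (the 20 GB -> 190 GB case)
CREATE MATERIALIZED VIEW logs_by_attribute_mv
ENGINE = MergeTree
ORDER BY (attribute_key, timestamp)
AS SELECT
    timestamp,
    log_id,
    attr.1 AS attribute_key,
    attr.2 AS attribute_value
FROM raw_logs
ARRAY JOIN attributes AS attr;    -- 10 attributes per log = 10x the rows

-- Healthier pattern: GROUP BY a low-cardinality key so the view shrinks the data
CREATE MATERIALIZED VIEW logs_hourly_mv
ENGINE = SummingMergeTree
ORDER BY (service, hour)
AS SELECT
    service,
    toStartOfHour(timestamp) AS hour,
    count() AS log_count
FROM raw_logs
GROUP BY service, hour;
```

The first view can only end up larger than its source; the second collapses many rows into one per `(service, hour)` pair, which is where materialized views actually pay off.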
+ +## Production MV Health Validation {#mv-health-validation} + +```sql runnable editable +-- Challenge: Replace table name with your own data to check MV viability +-- Experiment: Try different GROUP BY combinations to see aggregation ratios +SELECT + 'MV Pre-deployment Health Check' as analysis_type, + count() as source_rows, + uniq(pickup_datetime, dropoff_datetime, passenger_count) as mv_unique_combinations, + round(uniq(pickup_datetime, dropoff_datetime, passenger_count) / count() * 100, 2) as aggregation_ratio_percent, + formatReadableSize(count() * 100) as estimated_source_size, + formatReadableSize(uniq(pickup_datetime, dropoff_datetime, passenger_count) * 100) as estimated_mv_size, + round(uniq(pickup_datetime, dropoff_datetime, passenger_count) * 100 / (count() * 100), 1) as storage_multiplier, + CASE + WHEN uniq(pickup_datetime, dropoff_datetime, passenger_count) / count() > 0.95 THEN 'PROBLEM: MV will be larger than source!' + WHEN uniq(pickup_datetime, dropoff_datetime, passenger_count) / count() > 0.7 THEN 'BAD: Massive storage waste (190GB scenario)' + WHEN uniq(pickup_datetime, dropoff_datetime, passenger_count) / count() > 0.3 THEN 'QUESTIONABLE: High storage overhead' + ELSE 'GOOD: Substantial aggregation benefit' + END as mv_assessment +FROM nyc_taxi.trips +WHERE pickup_datetime >= '2015-01-01' + AND pickup_datetime < '2015-01-02' +LIMIT 1; +``` + +## The Successful MV Patterns {#successful-mv-patterns} + +```sql runnable editable +-- Challenge: Try this pattern with different low-cardinality column combinations +-- Experiment: Change the time granularity to see how it affects compression +SELECT + 'Production Success Pattern' as analysis_type, + count() as source_rows, + uniq(event_type, toStartOfHour(created_at)) as unique_combinations, + round(uniq(event_type, toStartOfHour(created_at)) / count() * 100, 4) as aggregation_ratio_percent, + formatReadableSize(count() * 50) as estimated_source_size, + formatReadableSize(uniq(event_type, toStartOfHour(created_at)) * 50) as estimated_mv_size, + CASE + WHEN uniq(event_type, toStartOfHour(created_at)) / count() < 0.001 THEN 'OUTSTANDING: Like the 72GB→3GB compression example' + WHEN uniq(event_type, toStartOfHour(created_at)) / count() < 0.01 THEN 'EXCELLENT: Massive aggregation benefit' + WHEN uniq(event_type, toStartOfHour(created_at)) / count() < 0.1 THEN 'GOOD: Strong aggregation' + ELSE 'REVIEW: Limited benefit' + END as mv_assessment +FROM github.github_events +WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-08' +LIMIT 1; +``` + +## When MVs Become a Problem {#mv-problems} + +**Common mistake:** Teams create too many materialized views and hurt insert performance. + +**Simple fix:** Replace non-critical MVs with regular tables populated by cron jobs. You get the same query benefits without slowing down inserts. + +**Which MVs to remove:** Start with redundant time windows (like 2-hour aggregations when you already have 1-hour) and low-frequency queries. 
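A minimal sketch of that swap, using the playground's `github.github_events` table as the source (the target table and the schedule are assumptions):

```sql
-- Plain target table: nothing fires on every insert into the source table
CREATE TABLE daily_event_summary
(
    day        Date,
    event_type LowCardinality(String),
    events     UInt64
)
ENGINE = SummingMergeTree
ORDER BY (day, event_type);

-- Run from cron (or any scheduler) instead of paying the cost on each insert
INSERT INTO daily_event_summary
SELECT
    toDate(created_at) AS day,
    event_type,
    count()            AS events
FROM github.github_events
WHERE created_at >= yesterday() AND created_at < today()
GROUP BY day, event_type;
```

The query-side benefit is the same; the difference is that ingestion into the source table no longer carries per-insert view maintenance.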
+ +## Video Sources {#video-sources} +- [ClickHouse at CommonRoom - Kirill Sapchuk](https://www.youtube.com/watch?v=liTgGiTuhJE) - Source of the "over enthusiastic about materialized views" and "20GB→190GB explosion" case study \ No newline at end of file diff --git a/docs/tips-and-tricks/performance-optimization.md b/docs/tips-and-tricks/performance-optimization.md new file mode 100644 index 00000000000..5c64f6b3701 --- /dev/null +++ b/docs/tips-and-tricks/performance-optimization.md @@ -0,0 +1,322 @@ +--- +sidebar_position: 1 +slug: /community-wisdom/performance-optimization +sidebar_label: 'Performance Optimization' +doc_type: 'how-to-guide' +keywords: [ + 'performance optimization', + 'query performance', + 'database tuning', + 'slow queries', + 'memory optimization', + 'cardinality analysis', + 'indexing strategies', + 'aggregation optimization', + 'sampling techniques', + 'database performance', + 'query analysis', + 'performance troubleshooting' +] +title: 'Lessons - Performance Optimization' +description: 'Real world examples of performance optimization strategies' +--- + +# Performance Optimization: Production-Tested Strategies {#performance-optimization} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Having trouble with Materialized Views? Check out the [Materialized Views](./materialized-views.md) community insights guide.* +*If you're experiencing slow queries and want more examples, we also have a [Query Optimization](/optimize/query-optimization) guide.* + +## Order by Cardinality (Lowest to Highest) {#rule-1-cardinality-ordering} + +```sql runnable editable +-- Challenge: Try filtering by different event types or time ranges +-- Experiment: Add SAMPLE 0.1 to see how sampling affects cardinality analysis +SELECT + column_name, + unique_values, + total_rows, + cardinality_percentage, + assessment +FROM ( + SELECT + 'event_type' as column_name, + uniq(event_type) as unique_values, + count() as total_rows, + round(uniq(event_type) / count() * 100, 4) as cardinality_percentage, + 'Low cardinality - good for sort key prefix' as assessment + FROM github.github_events + WHERE created_at >= '2024-01-01' + + UNION ALL + + SELECT + 'actor_login' as column_name, + uniq(actor_login) as unique_values, + count() as total_rows, + round(uniq(actor_login) / count() * 100, 4) as cardinality_percentage, + 'High cardinality - use later in sort key' as assessment + FROM github.github_events + WHERE created_at >= '2024-01-01' +) +ORDER BY cardinality_percentage; +``` + +## Time Granularity Matters {#rule-2-time-granularity} + +```sql runnable editable +-- Challenge: Try different time functions like toStartOfMinute or toStartOfWeek +-- Experiment: Compare the cardinality differences with your own timestamp data +SELECT + 'Microsecond precision' as granularity, + uniq(created_at) as unique_values, + 'Creates massive cardinality - bad for sort key' as impact +FROM github.github_events +WHERE created_at >= '2024-01-01' +UNION ALL +SELECT + 'Hour precision', + uniq(toStartOfHour(created_at)), + 'Much better for sort key - enables skip indexing' +FROM github.github_events +WHERE created_at >= '2024-01-01' +UNION ALL +SELECT + 'Day precision', + uniq(toStartOfDay(created_at)), + 'Best for reporting queries' +FROM github.github_events +WHERE created_at >= '2024-01-01'; +``` + +## Focus on Individual Queries, Not Averages {#focus-on-individual-queries-not-averages} + 
+*"The right way is to ask yourself why this particular query was processed in five seconds... I don't care if median and other queries process quickly. I only care about my query"* + +Instead of looking at average performance, identify specific query patterns that cause problems: + +```sql runnable editable +-- Challenge: Change the cardinality threshold values (0.8, 0.3) to be more or less strict +-- Experiment: Try different column combinations to find the riskiest patterns +SELECT + event_type, + count() as total_events, + uniq(actor_login) as unique_actors, + uniq(repo_name) as unique_repos, + -- High cardinality combinations can cause memory problems + uniq(actor_login, repo_name) as unique_combinations, + round(uniq(actor_login, repo_name) / count() * 100, 2) as cardinality_ratio_percent, + CASE + WHEN uniq(actor_login, repo_name) / count() > 0.8 THEN 'HIGH MEMORY RISK: Almost every row unique' + WHEN uniq(actor_login, repo_name) / count() > 0.3 THEN 'MODERATE RISK: High cardinality grouping' + ELSE 'LOW RISK: Good aggregation potential' + END as memory_risk_assessment +FROM github.github_events +WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' +GROUP BY event_type +ORDER BY cardinality_ratio_percent DESC +LIMIT 8; +``` + +**Spot queries with different performance bottlenecks:** + +```sql runnable editable +-- Challenge: Modify the filter conditions to test different early vs late filtering scenarios +-- Experiment: Try adding more complex WHERE conditions to see the efficiency impact +WITH early_filter AS ( + SELECT count() as rows_after_early_filter + FROM github.github_events + WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' + AND event_type = 'PushEvent' -- Filter early + AND actor_login LIKE 'a%' +), +late_filter_simulation AS ( + SELECT + count() as total_rows_processed, + countIf(event_type = 'PushEvent' AND actor_login LIKE 'a%') as rows_after_late_filter + FROM github.github_events + WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' +) +SELECT + 'Early Filtering' as strategy, + early_filter.rows_after_early_filter as result_rows, + early_filter.rows_after_early_filter as rows_scanned, + 'Efficient: Only processes needed rows' as assessment +FROM early_filter +UNION ALL +SELECT + 'Late Filtering Simulation', + late_filter_simulation.rows_after_late_filter, + late_filter_simulation.total_rows_processed, + concat('Inefficient: Processes ', + toString(round(late_filter_simulation.total_rows_processed / late_filter_simulation.rows_after_late_filter)), + 'x more rows than needed') as assessment +FROM late_filter_simulation; +``` + +**The key lesson from production teams:** When a query is slow, don't just look at averages. Ask "Why was THIS specific query slow?" and examine the actual resource usage patterns. + +## Memory vs Row Scanning Trade-offs {#memory-vs-row-scanning-trade-offs} + +**Sentry's key insight:** *"The cardinality of the grouping key that's going to drive memory in this particular situation"* - High cardinality aggregations kill performance through memory exhaustion, not row scanning. + +**Pattern Recognition:** When queries fail, determine if it's a memory problem (too many groups) or scanning problem (too many rows). 
+ +**Compare granularity impact:** + +```sql runnable editable +-- Challenge: Try different time granularities like toStartOfMinute or toStartOfWeek +-- Experiment: See how the memory multiplier changes with different time functions +SELECT + 'Granularity Analysis' as analysis_type, + uniq(toStartOfHour(created_at)) as hourly_groups, + uniq(toStartOfMinute(created_at)) as minute_groups, + uniq(created_at) as microsecond_groups, + concat('Minute granularity = ', toString(round(uniq(toStartOfMinute(created_at)) / uniq(toStartOfHour(created_at)))), 'x more memory') as minute_impact, + concat('Microsecond = ', toString(round(uniq(created_at) / uniq(toStartOfHour(created_at)))), 'x more memory') as microsecond_impact +FROM github.github_events +WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' +LIMIT 1; +``` + +**High-cardinality danger pattern:** + +```sql runnable editable +-- Challenge: Try different column combinations to see which create the most unique groups +-- Experiment: Add or remove columns from the uniq() function to test cardinality impact +SELECT + 'Dangerous Pattern Analysis' as analysis_type, + count() as total_events, + uniq(actor_login, repo_name, event_type) as unique_combinations, + round(uniq(actor_login, repo_name, event_type) / count() * 100, 2) as cardinality_percent, + CASE + WHEN uniq(actor_login, repo_name, event_type) / count() > 0.9 + THEN 'CRITICAL: Nearly every row unique - will exhaust memory!' + WHEN uniq(actor_login, repo_name, event_type) / count() > 0.5 + THEN 'HIGH RISK: Too many unique groups' + ELSE 'SAFE: Reasonable aggregation ratio' + END as memory_risk +FROM github.github_events +WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' +LIMIT 1; +``` + +**Sentry's sampling solution for memory problems:** + +```sql runnable editable +-- Challenge: Try different sampling rates (0.01, 0.05, 0.2) to see accuracy vs speed trade-offs +-- Experiment: Change the hash function or modulus to see different sampling methods +WITH sampled_data AS ( + SELECT + count() as sampled_events, + uniq(actor_login) as sampled_unique_users, + round(avg(length(repo_name)), 2) as sampled_avg_repo_length + FROM github.github_events + WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' + AND cityHash64(actor_login) % 10 = 0 -- Deterministic 10% sample +), +full_data AS ( + SELECT + count() as full_events, + uniq(actor_login) as full_unique_users, + round(avg(length(repo_name)), 2) as full_avg_repo_length + FROM github.github_events + WHERE created_at >= '2024-01-01' + AND created_at < '2024-01-02' +) +SELECT + 'Sampling Comparison' as analysis, + sampled_events * 10 as estimated_total_from_sample, + full_events as actual_total, + sampled_unique_users as users_in_sample, + full_unique_users as actual_unique_users, + '90% memory reduction with sampling' as benefit +FROM sampled_data, full_data; +``` + +## Sentry's Bit Mask Optimization: From Memory Explosion to 8 Bytes {#bit-mask-optimization} + +**The Problem:** When aggregating by high-cardinality columns (like URLs), each unique value creates a separate aggregation state in memory, leading to memory exhaustion. + +**Sentry's Solution:** Instead of grouping by the actual URL strings, group by boolean expressions that collapse into bit masks. 
+ +**PROBLEM: Memory explosion with unbounded string arrays** + +```sql runnable editable +-- This creates separate aggregation states for every unique repo +-- Challenge: Watch the memory usage - each user stores ALL their repo names +SELECT + actor_login, + groupArray(repo_name) as all_repos, + length(all_repos) as repo_count +FROM github.github_events +WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02' +GROUP BY actor_login +HAVING repo_count > 5 +ORDER BY repo_count DESC +LIMIT 10; +``` + +**SOLUTION: Boolean expressions that collapse to single integers** + +```sql runnable editable +-- Each condition becomes a bit in a single integer per user +-- Challenge: Add more repo conditions and see how memory stays bounded +SELECT + actor_login, + -- Each sumIf creates a single integer counter, not arrays of strings + sumIf(1, repo_name LIKE '%microsoft%') as microsoft_activity, + sumIf(1, repo_name LIKE '%google%') as google_activity, + sumIf(1, repo_name LIKE '%kubernetes%') as k8s_activity, + sumIf(1, event_type = 'PushEvent') as total_pushes, + -- Complex conditions still collapse to single counters + sumIf(1, repo_name LIKE '%microsoft%' AND event_type = 'PushEvent') as microsoft_pushes +FROM github.github_events +WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02' +GROUP BY actor_login +HAVING microsoft_activity > 0 OR google_activity > 0 +ORDER BY (microsoft_activity + google_activity + k8s_activity) DESC +LIMIT 20; +``` + +**Compare the memory efficiency:** + +```sql runnable editable +-- Challenge: Compare cardinality - how many unique values vs simple counters? +SELECT + 'High Cardinality (Memory Explosion)' as approach, + uniq(repo_name) as unique_repos, + 'Each user aggregation state stores array of ALL repo names' as memory_impact +FROM github.github_events +WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02' + +UNION ALL + +SELECT + 'Bit Mask Approach (Memory Bounded)', + 4 as unique_conditions, -- microsoft, google, k8s, pushes + 'Each user aggregation state = 4 integers (32 bytes total)' as memory_impact; +``` + +**The Memory Impact:** +- **Before:** Each user stores arrays of ALL unique repo names (potentially MBs per user) +- **After:** Each user stores fixed number of counters (32 bytes for 4 conditions) +- **Result:** Sentry achieved 100x memory reduction for certain query patterns + +**Production Insight:** *"Don't look for 10 or 20% improvements, look for orders of magnitude... you want to see like 10x, 100x less memory consumed."* + +**Why This Works:** Instead of storing every unique string in memory, you're storing the *answer to questions about those strings* as integers. The aggregation state becomes bounded and tiny, regardless of data diversity. 
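For completeness, the "bit mask" idea can also be written literally: give each condition a bit and fold a user's activity into one integer with `groupBitOr`. A sketch against the same GitHub dataset (the conditions are illustrative, not Sentry's actual ones):

```sql
SELECT
    actor_login,
    groupBitOr(
        toUInt8(repo_name LIKE '%microsoft%')                   -- bit 0
        + bitShiftLeft(toUInt8(repo_name LIKE '%google%'), 1)   -- bit 1
        + bitShiftLeft(toUInt8(event_type = 'PushEvent'), 2)    -- bit 2
    ) AS activity_mask
FROM github.github_events
WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02'
GROUP BY actor_login
HAVING activity_mask > 0
ORDER BY activity_mask DESC
LIMIT 10;
```

Each user's aggregation state is a single small integer no matter how many rows or distinct strings feed into it.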
+ +## Video Sources {#video-sources} +- [Lost in the Haystack - Optimizing High Cardinality Aggregations](https://www.youtube.com/watch?v=paK84-EUJCA) - Sentry's production lessons on memory optimization +- [ClickHouse Performance Analysis](https://www.youtube.com/watch?v=lxKbvmcLngo) - Alexey Milovidov on debugging methodology +- [ClickHouse Meetup: Query Optimization Techniques](https://www.youtube.com/watch?v=JBomQk4Icjo) - Community optimization strategies + +**Read Next**: +- [Query Optimization Guide](/optimize/query-optimization) +- [Materialized Views Community Insights](./materialized-views.md) \ No newline at end of file diff --git a/docs/tips-and-tricks/too-many-parts.md b/docs/tips-and-tricks/too-many-parts.md new file mode 100644 index 00000000000..fd55ffd40a3 --- /dev/null +++ b/docs/tips-and-tricks/too-many-parts.md @@ -0,0 +1,111 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/too-many-parts +sidebar_label: 'Too Many Parts' +doc_type: 'how-to' +keywords: [ + 'clickhouse too many parts', + 'too many parts error', + 'clickhouse insert batching', + 'part explosion problem', + 'clickhouse merge performance', + 'batch insert optimization', + 'clickhouse async inserts', + 'small insert problems', + 'clickhouse parts management', + 'insert performance optimization', + 'clickhouse batching strategy', + 'database insert patterns' +] +title: 'Lessons - Too Many Parts Problem' +description: 'Solutions and prevention of Too Many Parts' +--- + +# The Too Many Parts Problem {#the-too-many-parts-problem} +*This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can [browse by specific problem](./community-wisdom.md).* +*Need more performance optimization tips? Check out the [Performance Optimization](./performance-optimization.md) community insights guide.* + +**Universal pain point:** Small frequent inserts create performance degradation through part explosion. 
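The mechanics are simple: with synchronous inserts into a MergeTree table, every `INSERT` statement writes at least one new part, so row-at-a-time ingestion outruns the background merges that are supposed to compact those parts. A sketch with a hypothetical `events` table:

```sql
-- Anti-pattern: each statement writes its own part
INSERT INTO events VALUES (1, 'click', now());
INSERT INTO events VALUES (2, 'click', now());
-- ...repeated thousands of times per minute, parts accumulate faster than merges run

-- Preferred: one batched statement produces a single part
INSERT INTO events VALUES
    (1, 'click', now()),
    (2, 'view',  now()),
    (3, 'click', now());
```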
+ +## Recognize the Problem Early {#recognize-parts-problem} + +```sql runnable editable +-- Challenge: Replace with your actual database and table names for production use +-- Experiment: Adjust the part count thresholds (1000, 500, 100) based on your system +SELECT + database, + table, + count() as total_parts, + sum(rows) as total_rows, + round(avg(rows), 0) as avg_rows_per_part, + min(rows) as min_rows_per_part, + max(rows) as max_rows_per_part, + round(sum(bytes_on_disk) / 1024 / 1024, 2) as total_size_mb, + CASE + WHEN count() > 1000 THEN 'CRITICAL - Too many parts (>1000)' + WHEN count() > 500 THEN 'WARNING - Many parts (>500)' + WHEN count() > 100 THEN 'CAUTION - Getting many parts (>100)' + ELSE 'OK - Reasonable part count' + END as parts_assessment, + CASE + WHEN avg(rows) < 1000 THEN 'POOR - Very small parts' + WHEN avg(rows) < 10000 THEN 'FAIR - Small parts' + WHEN avg(rows) < 100000 THEN 'GOOD - Medium parts' + ELSE 'EXCELLENT - Large parts' + END as part_size_assessment +FROM system.parts +WHERE active = 1 + AND database NOT IN ('system', 'information_schema') +GROUP BY database, table +ORDER BY total_parts DESC +LIMIT 20; +``` + +## Proper Insert Batching {#proper-insert-batching} + +**Community-proven batching strategy from production deployments:** + +```python +# Python example - battle-tested batching approach from production systems +import clickhouse_driver +import time + +class ProductionBatchInserter: + """Based on patterns from companies processing TB/day""" + def __init__(self, client, batch_size=10000, batch_timeout=30): + self.client = client + self.batch_size = batch_size # Or 200MB as used in production + self.batch_timeout = batch_timeout # 30 seconds proven threshold + self.buffer = [] + self.last_flush = time.time() + + def insert_event(self, event_data): + self.buffer.append(event_data) + + # Flush on size or time threshold - prevents "too many parts" + if (len(self.buffer) >= self.batch_size or + time.time() - self.last_flush >= self.batch_timeout): + self.flush() + + def flush(self): + if self.buffer: + self.client.execute('INSERT INTO events VALUES', self.buffer) + self.buffer.clear() + self.last_flush = time.time() +``` + +**Alternative: Async Inserts (ClickHouse 21.11+)** + +*"We developed a function called async insert... 
this mechanism is straightforward similar to buffer table we insert to the server side and use some buffer to collect these inserts by default we have 16 threads to collect this buffer and if the buffer is large enough or reach timeout we will flush the buffer to the storage so a part will contain multiple inserts"* - ClickHouse team explaining built-in solution + +```sql +-- Enable async inserts to automatically batch small inserts +SET async_insert = 1; +SET wait_for_async_insert = 1; -- For consistency guarantees +SET async_insert_max_data_size = 10485760; -- 10MB buffer size +SET async_insert_busy_timeout_ms = 30000; -- 30 second timeout +``` + +## Video Sources {#video-sources} +- [Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse](https://www.youtube.com/watch?v=AsMPEfN5QtM) - ClickHouse team member explains async inserts and the too many parts problem +- [Production ClickHouse at Scale](https://www.youtube.com/watch?v=liTgGiTuhJE) - Real-world batching strategies from observability platforms \ No newline at end of file diff --git a/docs/tips-and-tricks/troubleshooting.md b/docs/tips-and-tricks/troubleshooting.md new file mode 100644 index 00000000000..59c47fada95 --- /dev/null +++ b/docs/tips-and-tricks/troubleshooting.md @@ -0,0 +1,149 @@ +--- +sidebar_position: 1 +slug: /tips-and-tricks/troubleshooting +sidebar_label: 'Troubleshooting' +doc_type: 'reference' +keywords: [ + 'clickhouse troubleshooting', + 'clickhouse errors', + 'database troubleshooting', + 'clickhouse connection issues', + 'memory limit exceeded', + 'clickhouse performance problems', + 'database error messages', + 'clickhouse configuration issues', + 'connection refused error', + 'clickhouse debugging', + 'database connection problems', + 'troubleshooting guide' +] +title: 'Troubleshooting Common Issues' +description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.' +--- + +# Troubleshooting Common Issues {#troubleshooting-common-issues} + +Having problems with ClickHouse? Find the solutions to common issues here. + +## Performance and Errors {#performance-and-errors} + +Queries running slowly, timeouts, or getting specific error messages like "Memory limit exceeded" or "Connection refused." + +
+Show performance and error solutions + +### Query Performance {#query-performance} +- [Find which queries are using the most resources](/knowledgebase/find-expensive-queries) +- [Complete query optimization guide](/docs/optimize/query-optimization) +- [Optimize JOIN operations](/docs/best-practices/minimize-optimize-joins) +- [Run diagnostic queries to find bottlenecks](/docs/knowledgebase/useful-queries-for-troubleshooting) +
+### Data Insertion Performance {#data-insertion-performance} +- [Speed up data insertion](/docs/optimize/bulk-inserts) +- [Set up asynchronous inserts](/docs/optimize/asynchronous-inserts) +
+### Advanced Analysis Tools {#advanced-analysis-tools} + +- [Check what processes are running](/docs/knowledgebase/which-processes-are-currently-running) +- [Monitor system performance](/docs/operations/system-tables/processes) +
+### Error Messages {#error-messages} +- **"Memory limit exceeded"** → [Debug memory limit errors](/docs/guides/developer/debugging-memory-issues) +- **"Connection refused"** → [Fix connection problems](#connections-and-authentication) +- **"Login failures"** → [Set up users, roles, and permissions](/docs/operations/access-rights) +- **"SSL certificate errors"** → [Fix certificate problems](/docs/knowledgebase/certificate_verify_failed_error) +- **"Table/database errors"** → [Database creation guide](/docs/sql-reference/statements/create/database) | [Table UUID problems](/docs/engines/database-engines/atomic) +- **"Network timeouts"** → [Network troubleshooting](/docs/interfaces/http) +- **Other issues** → [Track errors across your cluster](/docs/operations/system-tables/errors) +
+ +## Memory and Resources {#memory-and-resources} + +High memory usage, out-of-memory crashes, or need help sizing your ClickHouse deployment. + +
+Show memory solutions + +### Memory debugging and monitoring: {#memory-debugging-and-monitoring} +- [Identify what's using memory](/docs/guides/developer/debugging-memory-issues) +- [Check current memory usage](/docs/operations/system-tables/processes) +- [Memory allocation profiling](/docs/operations/allocation-profiling) +- [Analyze memory usage patterns](/docs/operations/system-tables/query_log) +
+### Memory configuration: {#memory-configuration} +- [Configure memory limits](/docs/operations/settings/memory-overcommit) +- [Server memory settings](/docs/operations/server-configuration-parameters/settings) +- [Session memory settings](/docs/operations/settings/settings) +
+### Scaling and sizing: {#scaling-and-sizing} +- [Right-size your service](/docs/operations/tips) +- [Configure automatic scaling](/docs/manage/scaling) + +
+ +## Connections and Authentication {#connections-and-authentication} + +Can't connect to ClickHouse, authentication failures, SSL certificate errors, or client setup issues. + +
+Show connection solutions + +### Basic Connection Issues {#basic-connection-issues} +- [Fix HTTP interface issues](/docs/interfaces/http) +- [Handle SSL certificate problems](/docs/knowledgebase/certificate_verify_failed_error) +- [User authentication setup](/docs/operations/access-rights) +
+### Client Interfaces {#client-interfaces} +- [Native ClickHouse clients](/docs/interfaces/natives-clients-and-interfaces) +- [MySQL interface problems](/docs/interfaces/mysql) +- [PostgreSQL interface issues](/docs/interfaces/postgresql) +- [gRPC interface configuration](/docs/interfaces/grpc) +- [SSH interface setup](/docs/interfaces/ssh) +
+### Network and Data {#network-and-data} +- [Network security settings](/docs/operations/server-configuration-parameters/settings) +- [Data format parsing issues](/docs/interfaces/formats) + +
+ +## Setup and Configuration {#setup-and-configuration} + +Initial installation, server configuration, database creation, data ingestion issues, or replication setup. + +
+Show setup and configuration solutions + +### Initial Setup {#initial-setup} +- [Configure server settings](/docs/operations/server-configuration-parameters/settings) +- [Set up security and access control](/docs/operations/access-rights) +- [Configure hardware properly](/docs/operations/tips) +
+### Database Management {#database-management} +- [Create and manage databases](/docs/sql-reference/statements/create/database) +- [Choose the right table engine](/docs/engines/table-engines) + +
+### Data Operations {#data-operations} +- [Optimize bulk data insertion](/docs/optimize/bulk-inserts) +- [Handle data format problems](/docs/interfaces/formats) +- [Set up streaming data pipelines](/docs/optimize/asynchronous-inserts) +- [Improve S3 integration performance](/docs/integrations/s3/performance) +
+### Advanced Configuration {#advanced-configuration} +- [Set up data replication](/docs/engines/table-engines/mergetree-family/replication) +- [Configure distributed tables](/docs/engines/table-engines/special/distributed) + +- [Set up backup and recovery](/docs/operations/backup) +- [Configure monitoring](/docs/operations/system-tables/overview) + +
+ +## Still Need Help? {#still-need-help} + +If you can't find a solution: + +1. **Ask AI** - Ask AI for instant answers. +1. **Check system tables** - Run `SELECT * FROM system.processes` and `SELECT * FROM system.query_log ORDER BY event_time DESC LIMIT 10` +2. **Review server logs** - Look for error messages in your ClickHouse logs +3. **Ask the community** - [Join Our Community Slack](https://clickhouse.com/slack), [GitHub Discussions](https://github.com/ClickHouse/ClickHouse/discussions) +4. **Get professional support** - [ClickHouse Cloud support](https://clickhouse.com/support) \ No newline at end of file diff --git a/knowledgebase/find-expensive-queries.mdx b/knowledgebase/find-expensive-queries.mdx index db30bd4a81d..5e1e59e6bdf 100644 --- a/knowledgebase/find-expensive-queries.mdx +++ b/knowledgebase/find-expensive-queries.mdx @@ -2,6 +2,7 @@ title: How to Identify the Most Expensive Queries in ClickHouse description: Learn how to use the `query_log` table in ClickHouse to identify the most memory and CPU-intensive queries across distributed nodes. date: 2023-03-26 +slug: find-expensive-queries tags: ['Performance and Optimizations'] keywords: ['Expensive Queries'] --- diff --git a/knowledgebase/finding_expensive_queries_by_memory_usage.mdx b/knowledgebase/finding_expensive_queries_by_memory_usage.mdx index 100a8317d96..6f55d1c57b1 100644 --- a/knowledgebase/finding_expensive_queries_by_memory_usage.mdx +++ b/knowledgebase/finding_expensive_queries_by_memory_usage.mdx @@ -2,6 +2,7 @@ title: Identifying Expensive Queries by Memory Usage in ClickHouse description: Learn how to use the `system.query_log` table to find the most memory-intensive queries in ClickHouse, with examples for clustered and standalone setups. date: 2023-06-07 +slug: find-expensive-queries-by-memory-usage tags: ['Performance and Optimizations'] keywords: ['Expensive Queries', 'Memory Usage'] --- diff --git a/package.json b/package.json index 31a7664c6b2..1eb30f183d5 100644 --- a/package.json +++ b/package.json @@ -44,6 +44,7 @@ "@docusaurus/theme-mermaid": "3.7.0", "@docusaurus/theme-search-algolia": "^3.7.0", "@mdx-js/react": "^3.1.0", + "@monaco-editor/react": "^4.7.0", "@radix-ui/react-navigation-menu": "^1.2.13", "@redocly/cli": "^1.34.0", "axios": "^1.11.0", diff --git a/scripts/aspell-ignore/en/aspell-dict.txt b/scripts/aspell-ignore/en/aspell-dict.txt index 9c99dc35314..9c81dfff7de 100644 --- a/scripts/aspell-ignore/en/aspell-dict.txt +++ b/scripts/aspell-ignore/en/aspell-dict.txt @@ -1,4 +1,4 @@ -personal_ws-1.1 en 3611 +personal_ws-1.1 en 3638 AArch ACLs AICPA @@ -32,6 +32,7 @@ Airbyte Akka AlertManager Alexey +Amir Anthropic AnyEvent AnythingLLM @@ -192,9 +193,13 @@ ClickBench ClickCat ClickHouse ClickHouse's +ClickHouseAccess ClickHouseClient +ClickHouseIO ClickHouseMigrator ClickHouseNIO +ClickHouseSettings +ClickHouseType ClickHouseVapor ClickPipe ClickPipes @@ -215,6 +220,7 @@ CodeLLDB Codecs CollapsingMergeTree Combinators +CommonRoom Compat CompiledExpressionCacheBytes CompiledExpressionCacheCount @@ -227,12 +233,17 @@ ConcurrencyControlSoftLimit Config ConnectionDetails Const +ContentSquare +ContentSquare's +Contentsquare ContextLockWait Contrib CopilotKit Copilotkit CountMin Covid +Craigslist +Craigslist's Cramer's Criteo Crotty @@ -321,6 +332,7 @@ DiskSpaceReservedForMerge DiskTotal DiskUnreserved DiskUsed +Displayce DistributedCacheLogMode DistributedCachePoolBehaviourOnLimit DistributedDDLOutputMode @@ -328,6 +340,7 @@ DistributedFilesToInsert DistributedProductMode DistributedSend 
DockerHub +Doron DoubleDelta Doxygen Draxlr @@ -437,6 +450,7 @@ GraphQL GraphiteMergeTree Greenwald Gunicorn +Guram HANA HDDs HHMM @@ -447,6 +461,7 @@ HSTS HTAP HTTPConnection HTTPThreads +Hashboard's HashedDictionary HashedDictionaryThreads HashedDictionaryThreadsActive @@ -609,6 +624,7 @@ Kerberos Khanna Kibana Kinesis +Kirill KittenHouse Klickhouse Kolmogorov @@ -634,6 +650,7 @@ LangGraph Langchain Lemire Levenshtein +Lhotsky Liao LibFuzzer LibreChat @@ -707,6 +724,7 @@ MaxPartCountForPartition MaxPushedDDLEntryID MaxThreads Mbps +McClickHouse McNeal Memcheck MemoryCode @@ -744,6 +762,7 @@ MetroHash MiB Milli Milovidov +Milovidov's MinHash MinIO MinMax @@ -1126,6 +1145,7 @@ SaaS Sackmann's Sanjeev Sankey +Sapchuk Scalable Scatterplot Schaefer @@ -1141,6 +1161,8 @@ SendExternalTables SendScalars SerDe Serverless +ServiceNow +ServiceNow's SetOperationMode SeverityText ShareAlike @@ -1151,6 +1173,7 @@ SharedMergeTree ShortCircuitFunctionEvaluation Shortkeys Signup +Sigua SimHash Simhash SimpleAggregateFunction @@ -1290,6 +1313,7 @@ TotalPrimaryKeyBytesInMemory TotalPrimaryKeyBytesInMemoryAllocated TotalRowsOfMergeTreeTables TotalTemporaryFiles +Totalprices TotalsMode Tradeoff Transactional @@ -1348,6 +1372,7 @@ VPCs VPNs Vadim Valgrind +Vaza Vectorization Vectorized Vercel @@ -1709,6 +1734,7 @@ changelogs charset charsets chartdb +chatbots chconn chdb cheatsheet @@ -2360,6 +2386,7 @@ kernal keyspace keytab kittenhouse +knowledgebase kolmogorovSmirnovTest kolmogorovsmirnovtest kolya @@ -2372,6 +2399,7 @@ kurtSamp kurtosis kurtpop kurtsamp +kusto lagInFrame laion lakehouse diff --git a/sidebars.js b/sidebars.js index 7038ceeeb04..f42f2536f56 100644 --- a/sidebars.js +++ b/sidebars.js @@ -241,6 +241,20 @@ const sidebars = { }, ], }, + { + type: "category", + label: "Tips and Community Wisdom", + className: "top-nav-item", + collapsed: true, + collapsible: true, + link: { type: "doc", id: "tips-and-tricks/community-wisdom" }, + items: [ + { + type: "autogenerated", + dirName: "tips-and-tricks", + } + ] + }, { type: "category", label: "Example Datasets", @@ -1798,6 +1812,12 @@ const sidebars = { description: "Common use case guides for ClickHouse", href: "/use-cases" }, + { + type: "link", + label: "Tips and Community Wisdom", + description: "Community Lessons and Troubleshooting", + href: "/tips-and-tricks/community-wisdom" + }, { type: "link", label: "Example datasets", diff --git a/src/components/CodeViewer/index.tsx b/src/components/CodeViewer/index.tsx index 6f7292bace9..12c776146a3 100644 --- a/src/components/CodeViewer/index.tsx +++ b/src/components/CodeViewer/index.tsx @@ -1,11 +1,12 @@ -import { CodeBlock, ClickUIProvider, Text } from '@clickhouse/click-ui/bundled' +import { CodeBlock, ClickUIProvider, Text, Button } from '@clickhouse/click-ui/bundled' import CodeInterpreter from './CodeInterpreter' import { DefaultView } from './CodeResults' import { ChartConfig, ChartType } from './types' import { base64Decode } from './utils' import { useColorMode } from '@docusaurus/theme-common' -import { isValidElement } from 'react' -import DocusaurusCodeBlock from '@theme-original/CodeBlock'; +import { isValidElement, useState } from 'react' +import DocusaurusCodeBlock from '@theme-original/CodeBlock' +import Editor from '@monaco-editor/react' function getCodeContent(children: any): string { if (typeof children === 'string') return children @@ -42,6 +43,7 @@ function CodeViewer({ language = 'sql', show_line_numbers = false, runnable = 'false', + editable = 'false', run = 'false', link, view = 
'table', @@ -54,9 +56,12 @@ function CodeViewer({ children, ...props }: any) { + const [code, setCode] = useState(typeof children === 'string' ? children : getCodeContent(children)) + const showLineNumbers = show_line_numbers === 'true' const runBoolean = run === 'true' const runnableBoolean = runnable === 'true' + const editableBoolean = editable === 'true' const showStatistics = show_statistics === 'true' let chart: { type: ChartType; config?: ChartConfig } | undefined @@ -71,58 +76,106 @@ function CodeViewer({ } catch { console.log('chart config is not valid') } - const { colorMode } = useColorMode(); // returns 'light' or 'dark' + + const { colorMode } = useColorMode() const extraStyle = parseInlineStyle(style) - const combinedStyle:React.CSSProperties = { + const combinedStyle: React.CSSProperties = { wordBreak: 'break-word', ...extraStyle } + + const handleKeyDown = (e: React.KeyboardEvent) => { + // Allow tab in textarea + if (e.key === 'Tab') { + e.preventDefault() + const target = e.target as HTMLTextAreaElement + const start = target.selectionStart + const end = target.selectionEnd + const newValue = code.substring(0, start) + ' ' + code.substring(end) + setCode(newValue) + setTimeout(() => { + target.selectionStart = target.selectionEnd = start + 2 + }, 0) + } + } + const header = title ? ( - <> - {title} - - ): null + {title} + ) : null - const code_block = click_ui === 'true' ? ( - - {typeof children === 'string' ? children : getCodeContent(children)} - - ): ( - + // Always show as editable Monaco editor when editable=true + const code_block = editableBoolean ? ( +
+ setCode(value || '')} + language={language} + theme={colorMode === 'dark' ? 'vs-dark' : 'vs-light'} + height={`${Math.max(200, (code.split('\n').length + 2) * 19)}px`} + options={{ + minimap: { enabled: false }, + scrollBeyondLastLine: false, + fontSize: 14, + lineNumbers: showLineNumbers ? 'on' : 'off', + wordWrap: 'on', + automaticLayout: true, + tabSize: 2, + insertSpaces: true, + folding: false, + glyphMargin: false, + lineDecorationsWidth: 0, + lineNumbersMinChars: 3, + renderLineHighlight: 'line', + selectOnLineNumbers: true, + roundedSelection: false, + scrollbar: { + verticalScrollbarSize: 8, + horizontalScrollbarSize: 8 + } + }} + /> +
+ ) : ( + click_ui === 'true' ? ( + + {code} + + ) : ( + + ) ) - const results = runnable ? ( + + const results = runnableBoolean ? ( - ): null + ) : null return ( -
- - { header } - { code_block } - { results } - -
- - +
+ + {header} + {code_block} + {results} + +
) } -export default CodeViewer +export default CodeViewer \ No newline at end of file diff --git a/src/components/KapaAI/KapaLink.tsx b/src/components/KapaAI/KapaLink.tsx new file mode 100644 index 00000000000..f77c24f6b5e --- /dev/null +++ b/src/components/KapaAI/KapaLink.tsx @@ -0,0 +1,22 @@ +import React from 'react'; + +declare global { + interface Window { + Kapa?: (action: string, params?: any) => void; + } +} + +export default function KapaLink({ children, query }) { + const handleClick = (e) => { + e.preventDefault(); + if (window.Kapa) { + window.Kapa('open', query ? { query, submit: true } : {}); + } + }; + + return ( + + {children} + + ); +} \ No newline at end of file diff --git a/src/theme/CodeBlock/index.js b/src/theme/CodeBlock/index.js index 1d523f36c26..d4b7408d5b3 100644 --- a/src/theme/CodeBlock/index.js +++ b/src/theme/CodeBlock/index.js @@ -18,7 +18,7 @@ function countLines(text = '') { function parseMetaString(meta = '') { const result = {} - const implicit_settings = ['runnable', 'run', 'show_statistics', 'click_ui'] + const implicit_settings = ['runnable', 'run', 'show_statistics', 'click_ui', 'editable'] meta.split(' ').forEach((part) => { if (!part) return diff --git a/src/theme/MDXComponents.js b/src/theme/MDXComponents.js index 07b73d1081c..395d61909f3 100644 --- a/src/theme/MDXComponents.js +++ b/src/theme/MDXComponents.js @@ -6,10 +6,12 @@ import MDXComponents from '@theme-original/MDXComponents'; // Make sure the path matches your project structure import VStepper from '@site/src/components/Stepper/Stepper'; import GlossaryTooltip from '@site/src/components/GlossaryTooltip/GlossaryTooltip'; +import KapaLink from '@site/src/components/KapaAI/KapaLink'; // Define the enhanced components const enhancedComponents = { ...MDXComponents, + KapaLink, GlossaryTooltip, ul: (props) =>
    , ol: (props) =>
      , diff --git a/yarn.lock b/yarn.lock index a8ff506a886..351f6feaed5 100644 --- a/yarn.lock +++ b/yarn.lock @@ -2700,6 +2700,20 @@ "@module-federation/runtime" "0.8.4" "@module-federation/sdk" "0.8.4" +"@monaco-editor/loader@^1.5.0": + version "1.5.0" + resolved "https://registry.yarnpkg.com/@monaco-editor/loader/-/loader-1.5.0.tgz#dcdbc7fe7e905690fb449bed1c251769f325c55d" + integrity sha512-hKoGSM+7aAc7eRTRjpqAZucPmoNOC4UUbknb/VNoTkEIkCPhqV8LfbsgM1webRM7S/z21eHEx9Fkwx8Z/C/+Xw== + dependencies: + state-local "^1.0.6" + +"@monaco-editor/react@^4.7.0": + version "4.7.0" + resolved "https://registry.yarnpkg.com/@monaco-editor/react/-/react-4.7.0.tgz#35a1ec01bfe729f38bfc025df7b7bac145602a60" + integrity sha512-cyzXQCtO47ydzxpQtCGSQGOC8Gk3ZUeBXFAxD+CWXYFo5OqZyZUonFl0DwUlTyAfRHntBfw2p3w4s9R6oe1eCA== + dependencies: + "@monaco-editor/loader" "^1.5.0" + "@napi-rs/wasm-runtime@^0.2.9": version "0.2.9" resolved "https://registry.yarnpkg.com/@napi-rs/wasm-runtime/-/wasm-runtime-0.2.9.tgz#7278122cf94f3b36d8170a8eee7d85356dfa6a96" @@ -13041,6 +13055,11 @@ srcset@^4.0.0: resolved "https://registry.yarnpkg.com/srcset/-/srcset-4.0.0.tgz#336816b665b14cd013ba545b6fe62357f86e65f4" integrity sha512-wvLeHgcVHKO8Sc/H/5lkGreJQVeYMm9rlmt8PuR1xE31rIuXhuzznUUqAt8MqLhB3MqJdFzlNAfpcWnxiFUcPw== +state-local@^1.0.6: + version "1.0.7" + resolved "https://registry.yarnpkg.com/state-local/-/state-local-1.0.7.tgz#da50211d07f05748d53009bee46307a37db386d5" + integrity sha512-HTEHMNieakEnoe33shBYcZ7NX83ACUjCu8c40iOGEZsngj9zRnkqS9j1pqQPXwobB0ZcVTk27REb7COQ0UR59w== + statuses@2.0.1: version "2.0.1" resolved "https://registry.yarnpkg.com/statuses/-/statuses-2.0.1.tgz#55cb000ccf1d48728bd23c685a063998cf1a1b63"