4146 add troubleshooting section #4181


Status: Open. Wants to merge 36 commits into base `main`.
Changes shown from 12 of 36 commits.

Commits:

- 32feb0b: Troubleshooting page, temporary placement (dhtclk, Jul 26, 2025)
- 60f3fbe: Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs … (dhtclk, Jul 26, 2025)
- c619204: Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs … (dhtclk, Jul 28, 2025)
- aa15a8a: Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs … (dhtclk, Aug 1, 2025)
- 0c7d189: Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs … (dhtclk, Aug 4, 2025)
- 7c94a46: Adding Lessons Learned Guide with Interactable Queries (dhtclk, Aug 6, 2025)
- 9166323: Split into multiple guides under a new section (dhtclk, Aug 6, 2025)
- 2bbe1bc: Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs … (dhtclk, Aug 7, 2025)
- bf67a27: Keywords, cross-linking, clean-up (dhtclk, Aug 7, 2025)
- e10f908: adding ask ai link to troubleshooting, simple kapa link component. (dhtclk, Aug 7, 2025)
- dc751b2: commenting out C++ link (dhtclk, Aug 7, 2025)
- 0ba3eb4: spelling and dictionary update (dhtclk, Aug 7, 2025)
- e9a6adf: Update docs/tips-and-tricks/too-many-parts.md (dhtclk, Aug 8, 2025)
- 0f7d07e: Update docs/tips-and-tricks/too-many-parts.md (dhtclk, Aug 8, 2025)
- 7159177: Update docs/tips-and-tricks/too-many-parts.md (dhtclk, Aug 8, 2025)
- 03abb7d: Update docs/tips-and-tricks/too-many-parts.md (dhtclk, Aug 8, 2025)
- 8e8fd7b: Update docs/tips-and-tricks/debugging-toolkit.md (dhtclk, Aug 8, 2025)
- 4867bbe: Update docs/tips-and-tricks/cost-optimization.md (dhtclk, Aug 8, 2025)
- ec2036f: Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs … (dhtclk, Aug 11, 2025)
- de5e966: Rewriting Creative Use Cases (dhtclk, Aug 11, 2025)
- 21e518c: fix formatting (dhtclk, Aug 11, 2025)
- 33935f4: Rewrite cost-optimization doc (dhtclk, Aug 11, 2025)
- f3806da: Performance Optimization Guide (dhtclk, Aug 11, 2025)
- d8e3ca2: Too Many Parts (dhtclk, Aug 11, 2025)
- 395d484: MVs and Debugging Toolkit (dhtclk, Aug 11, 2025)
- aa308be: Fixing nav link (dhtclk, Aug 11, 2025)
- bc0e0eb: adding to dictionary (dhtclk, Aug 11, 2025)
- 1dd1100: fixing dictionary (dhtclk, Aug 11, 2025)
- f37c70a: adding header ids (dhtclk, Aug 11, 2025)
- c0cf0fa: removing garbage AI quotes (dhtclk, Aug 12, 2025)
- d546215: removing another garbage quote and fixing capitalization (dhtclk, Aug 12, 2025)
- 6e98868: fixing another quote (dhtclk, Aug 12, 2025)
- 12fb3ef: rewriting debugging insights (dhtclk, Aug 12, 2025)
- 500d515: slight header change (dhtclk, Aug 12, 2025)
- fc5d1a1: adding header ids (dhtclk, Aug 12, 2025)
- 267527c: Further pruning innaccuracies and renaming debugging toolkit (dhtclk, Aug 12, 2025)
57 changes: 57 additions & 0 deletions docs/tips-and-tricks/community-wisdom.md
@@ -0,0 +1,57 @@
---
sidebar_position: 1
slug: /tips-and-tricks/community-wisdom
sidebar_label: 'Community Wisdom'
doc_type: 'overview'
keywords: [
'database tips',
'community wisdom',
'production troubleshooting',
'performance optimization',
'database debugging',
'clickhouse guides',
'real world examples',
'database best practices',
'meetup insights',
'production lessons',
'interactive tutorials',
'database solutions'
]
title: 'ClickHouse Community Wisdom'
description: 'Learn from the ClickHouse community with real world scenarios and lessons learned'
---

# ClickHouse Community Wisdom: Tips and Tricks from Meetups {#community-wisdom}
Review comment (Member), suggested change:
# ClickHouse Community Wisdom: Tips and Tricks from Meetups {#community-wisdom}
# ClickHouse community wisdom: tips and tricks from meetups {#community-wisdom}

We had taken to sentence casing, which is how Google does it https://developers.google.com/style/capitalization. I'm not strongly opinionated on which style we have but it will be nice to keep it consistent.


*These interactive guides represent collective wisdom from hundreds of production deployments. Each runnable example helps you understand ClickHouse patterns using real GitHub events data - practice these concepts to avoid common mistakes and accelerate your success.*

Combine this collected knowledge with our [Best Practices](/best-practices) guide for an optimal ClickHouse experience.

## Problem-Specific Quick Jumps {#problem-specific-quick-jumps}

| Issue | Document | Description |
|-------|---------|-------------|
| **Production issue** | [Debugging Toolkit](./debugging-toolkit.md) | Copy/paste queries and production debugging guidance |
| **Slow queries** | [Performance Optimization](./performance-optimization.md) | Diagnose and speed up slow queries |
| **Materialized views** | [MV Double-Edged Sword](./materialized-views.md) | Avoid 10x storage amplification from materialized views |
| **Too many parts** | [Too Many Parts](./too-many-parts.md) | Address the 'Too many parts' error and the slowdowns it causes |
| **High costs** | [Cost Optimization](./cost-optimization.md) | Reduce storage and compute costs |
| **Creative use cases** | [Success Stories](./creative-usecases.md) | Examples of ClickHouse in 'outside the box' use cases |

### Usage Instructions {#usage-instructions}

1. **Run the examples** - Many SQL blocks are executable directly against real data
2. **Experiment freely** - Modify queries to test different patterns
3. **Adapt to your data** - Use templates with your own table names
4. **Monitor regularly** - Implement health check queries as ongoing monitoring
5. **Learn progressively** - Start with basics, advance to optimization patterns

### Interactive Features {#interactive-features}

- **Real Data Examples**: Using actual GitHub events from ClickHouse playground
- **Production-Ready Templates**: Adapt examples for your production systems
- **Progressive Difficulty**: From basic concepts to advanced optimization
- **Emergency Procedures**: Ready-to-use debugging and recovery queries

**Last Updated:** Based on community meetup insights through 2024-2025
**Contributing:** Found a mistake or have a new lesson? Community contributions are welcome.
159 changes: 159 additions & 0 deletions docs/tips-and-tricks/cost-optimization.md
@@ -0,0 +1,159 @@
---
sidebar_position: 1
slug: /community-wisdom/cost-optimization
sidebar_label: 'Cost Optimization'
doc_type: 'how-to-guide'
keywords: [
'cost optimization',
'storage costs',
'partition management',
'data retention',
'storage analysis',
'database optimization',
'clickhouse cost reduction',
'storage hot spots',
'ttl performance',
'disk usage',
'compression strategies',
'retention analysis'
]
title: 'Lessons - Cost Optimization'
description: 'Battle-tested strategies for reducing ClickHouse storage and compute costs, drawn from community production deployments.'
---

# Cost Optimization: Battle-Tested Strategies {#cost-optimization}
*This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights you can [browse by specific problem](./community-wisdom.md).*
*Want to learn about creative use cases for ClickHouse? Check out the [Creative Use Cases](./creative-usecases.md) community insights guide.*

## The Partition Deletion vs TTL Discovery {#partition-vs-ttl}

**Hard-learned lesson from production:** TTL mutations are resource-intensive and slow down everything.

*"Don't try to mutate data if there isn't a world where you absolutely need to... when you mutate data ClickHouse creates a new version of the data and then it merges it with the existing data... it's resource intensive... significantly significant performance impact"*

**Better strategy:** Delete entire partitions instead of TTL row-by-row deletion.

```sql runnable editable
-- Challenge: Adjust the month thresholds (3 months, 1 month) based on your retention needs
-- Experiment: Try different partition patterns like weekly or daily instead of monthly
SELECT
    toYYYYMM(created_at) as year_month,
    count() as events,
    min(created_at) as oldest_event,
    max(created_at) as newest_event,
    formatReadableSize(count() * 200) as estimated_size, -- rough heuristic: ~200 bytes per event
    CASE
        WHEN toYYYYMM(created_at) < toYYYYMM(now() - INTERVAL 3 MONTH)
            THEN 'DELETE PARTITION - older than 3 months'
        WHEN toYYYYMM(created_at) < toYYYYMM(now() - INTERVAL 1 MONTH)
            THEN 'ARCHIVE CANDIDATE - 1-3 months old'
        ELSE 'KEEP - recent data'
    END as retention_strategy
FROM github.github_events
WHERE created_at >= '2023-01-01'
GROUP BY year_month
ORDER BY year_month DESC
LIMIT 12;
```
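
Once you have identified partitions to drop, the deletion itself is a lightweight metadata operation rather than a mutation. A minimal sketch, assuming a hypothetical `events` table partitioned by month (table and partition names are illustrative, not from the playground):

```sql
-- Assumes a table defined with monthly partitions, e.g.:
-- CREATE TABLE events (...) ENGINE = MergeTree
-- PARTITION BY toYYYYMM(created_at) ORDER BY created_at;

-- Dropping a whole partition removes its parts at the metadata level,
-- unlike a TTL DELETE or ALTER TABLE ... DELETE, which rewrite data:
ALTER TABLE events DROP PARTITION 202405;
```

Because no data is rewritten, a partition drop avoids the merge pressure that row-level TTL deletes create.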

## Storage Hot Spots Analysis {#storage-hot-spots}

**Find your biggest storage consumers:** Identify which columns and patterns drive your storage costs.

```sql runnable editable
-- Challenge: Replace column names with your own table's columns to find storage hot spots
-- Experiment: Try different size thresholds (50MB) and repetition factors (10, 3, 5)
SELECT
    column_name,
    total_size_mb,
    unique_values,
    repetition_factor,
    storage_efficiency,
    optimization_priority
FROM
(
    SELECT
        'repo_name' as column_name,
        round(sum(length(repo_name)) / 1024 / 1024, 2) as total_size_mb,
        count(DISTINCT repo_name) as unique_values,
        round(count() / count(DISTINCT repo_name), 1) as repetition_factor,
        CASE
            WHEN count() / count(DISTINCT repo_name) > 10 THEN 'HIGH compression potential'
            WHEN count() / count(DISTINCT repo_name) > 3 THEN 'MEDIUM compression potential'
            ELSE 'LOW compression potential'
        END as storage_efficiency,
        CASE
            WHEN round(sum(length(repo_name)) / 1024 / 1024, 2) > 50 AND count() / count(DISTINCT repo_name) > 5
                THEN 'OPTIMIZE FIRST - large + repetitive'
            WHEN round(sum(length(repo_name)) / 1024 / 1024, 2) > 50
                THEN 'SIZE CONCERN - consider retention'
            ELSE 'LOW PRIORITY'
        END as optimization_priority
    FROM github.github_events
    WHERE created_at >= '2024-01-01' AND created_at < '2024-01-08'

    UNION ALL

    SELECT
        'actor_login',
        round(sum(length(actor_login)) / 1024 / 1024, 2),
        count(DISTINCT actor_login),
        round(count() / count(DISTINCT actor_login), 1),
        CASE
            WHEN count() / count(DISTINCT actor_login) > 10 THEN 'HIGH compression potential'
            WHEN count() / count(DISTINCT actor_login) > 3 THEN 'MEDIUM compression potential'
            ELSE 'LOW compression potential'
        END,
        CASE
            WHEN round(sum(length(actor_login)) / 1024 / 1024, 2) > 50 AND count() / count(DISTINCT actor_login) > 5
                THEN 'OPTIMIZE FIRST - large + repetitive'
            WHEN round(sum(length(actor_login)) / 1024 / 1024, 2) > 50
                THEN 'SIZE CONCERN - consider retention'
            ELSE 'LOW PRIORITY'
        END
    FROM github.github_events
    WHERE created_at >= '2024-01-01' AND created_at < '2024-01-08'
)
ORDER BY total_size_mb DESC;
```
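
Note that the query above estimates sizes from uncompressed string lengths. On your own deployment you can read actual on-disk sizes straight from `system.columns`; a sketch (swap in your own database and table names):

```sql
-- Actual compressed and uncompressed bytes per column, largest first
SELECT
    name,
    formatReadableSize(data_compressed_bytes) as compressed,
    formatReadableSize(data_uncompressed_bytes) as uncompressed,
    round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as compression_ratio
FROM system.columns
WHERE database = 'github' AND table = 'github_events'
ORDER BY data_compressed_bytes DESC
LIMIT 10;
```

Columns with a low compression ratio and a large compressed size are usually the best candidates for codec changes or retention policies.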

## Cost-Driven Retention Analysis {#cost-driven-retention}

**Real production strategy:** *"Once we get this kind of deletion signal... we do the row based deletion... we know what needs to be deleted and keep on tracking"*

```sql runnable editable
-- Challenge: Modify the age thresholds (7, 30, 90 days) to match your business needs
-- Experiment: Try different retention strategies for each temperature tier
SELECT
    data_temperature,
    count() as event_count,
    round(count() * 100.0 / sum(count()) OVER(), 2) as percentage_of_total,
    formatReadableSize(count() * 200) as estimated_storage_size,
    retention_strategy
FROM
(
    SELECT
        CASE
            WHEN dateDiff('day', created_at, now()) <= 7 THEN 'Hot Data (0-7 days)'
            WHEN dateDiff('day', created_at, now()) <= 30 THEN 'Warm Data (8-30 days)'
            WHEN dateDiff('day', created_at, now()) <= 90 THEN 'Cool Data (31-90 days)'
            ELSE 'Cold Data (90+ days)'
        END as data_temperature,
        CASE
            WHEN dateDiff('day', created_at, now()) <= 7 THEN 'Keep all columns - high query value'
            WHEN dateDiff('day', created_at, now()) <= 30 THEN 'Consider column-based TTL for large fields'
            WHEN dateDiff('day', created_at, now()) <= 90 THEN 'Drop expensive columns, keep core data'
            ELSE 'DELETE PARTITION - storage cost > query value'
        END as retention_strategy,
        CASE
            WHEN dateDiff('day', created_at, now()) <= 7 THEN 1
            WHEN dateDiff('day', created_at, now()) <= 30 THEN 2
            WHEN dateDiff('day', created_at, now()) <= 90 THEN 3
            ELSE 4
        END as sort_order
    FROM github.github_events
    WHERE created_at >= '2023-01-01'
)
GROUP BY data_temperature, retention_strategy, sort_order
ORDER BY sort_order;
```

**The key insight:** Instead of deleting entire rows, strategically drop the expensive columns first while preserving the essential data structure for longer periods. This can save "several terabytes" as Displayce discovered.
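
Column-level TTL can express exactly that strategy: expire the expensive columns while keeping the row. A minimal sketch with a hypothetical table (names and intervals are illustrative):

```sql
-- Hypothetical table: rows live for a year via partition drops,
-- but the large payload column is expired after 30 days.
-- When the column TTL fires, ClickHouse resets the column to its
-- default value instead of deleting the row.
CREATE TABLE events
(
    created_at DateTime,
    event_type LowCardinality(String),
    payload String TTL created_at + INTERVAL 30 DAY
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (event_type, created_at);
```

This keeps the cheap, high-value columns queryable for the full retention window while the storage-heavy field ages out early.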
136 changes: 136 additions & 0 deletions docs/tips-and-tricks/creative-usecases.md
@@ -0,0 +1,136 @@
---
sidebar_position: 1
slug: /community-wisdom/creative-use-cases
sidebar_label: 'Creative Use Cases'
doc_type: 'how-to-guide'
keywords: [
'clickhouse creative use cases',
'clickhouse success stories',
'unconventional database uses',
'clickhouse rate limiting',
'analytics database applications',
'clickhouse mobile analytics',
'customer-facing analytics',
'database innovation',
'clickhouse real-time applications',
'alternative database solutions',
'breaking database conventions',
'production success stories'
]
title: 'Lessons - Creative Use Cases'
description: 'Community success stories of ClickHouse in unconventional roles: rate limiting, mobile analytics, and customer-facing real-time applications.'
---

# Breaking the Rules: Success Stories {#breaking-the-rules}
*This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights you can [browse by specific problem](./community-wisdom.md).*
*Need tips on debugging an issue in prod? Check out the [Debugging Toolkit](./debugging-toolkit.md) community insights guide.*

## ClickHouse as Rate Limiter (Craigslist Story) {#clickhouse-rate-limiter}

**Conventional wisdom:** Use Redis for rate limiting.

**Craigslist's breakthrough:** *"Everyone uses Redis for rate limiter implementations... Why not just do it in Redis?"*

**The problem with Redis:** *"Our experience with Redis is not like what you've seen in the movies... weird maintenance issues... we will reboot a node in a Redis cluster and some weird latency spike hits the front end"*

**Test rate limiting logic using ClickHouse approach:**
Review comment (Member):

  • The bold: point format feels too obviously AI generated to me. Wonder if we couldn’t rework these to rather be paragraphs of text. This could just be me though, let's get feedback from others on that.

Also feel that if we're referencing specific companies and quoting from the videos we should probably link to the meetup videos.


```sql runnable editable
-- Challenge: Try different rate limit thresholds (100, 50) or time windows (hour vs minute)
-- Experiment: Test with different user patterns by changing the HAVING clause
SELECT
    actor_login as user_id,
    toStartOfHour(created_at) as hour,
    count() as requests_per_hour,
    CASE
        WHEN count() > 100 THEN 'RATE_LIMITED'
        WHEN count() > 50 THEN 'WARNING'
        ELSE 'ALLOWED'
    END as rate_limit_status
FROM github.github_events
WHERE created_at >= '2024-01-15'
  AND created_at < '2024-01-16'
GROUP BY actor_login, hour
HAVING count() > 10
ORDER BY requests_per_hour DESC
LIMIT 20;
```

**Results:** *"Running untouched for nearly a year without any alert"* - a dramatic improvement over Redis infrastructure.

**Why it works:**
- Incredible write performance for access log data
- Built-in TTL for automatic cleanup
- SQL flexibility for complex rate limiting rules
- No Redis cluster maintenance headaches
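
A hypothetical sketch of what a ClickHouse-backed rate-limit log could look like, with a table-level TTL doing the cleanup that Redis would need explicit key expiry for (all names here are illustrative, not from the Craigslist talk):

```sql
-- Hypothetical access-log table for rate limiting.
-- The TTL drops request records automatically after 24 hours.
CREATE TABLE rate_limit_log
(
    user_id String,
    request_time DateTime
)
ENGINE = MergeTree
ORDER BY (user_id, request_time)
TTL request_time + INTERVAL 1 DAY;

-- Before serving a request, check the user's count in the last hour:
SELECT count() as requests_last_hour
FROM rate_limit_log
WHERE user_id = 'some_user'
  AND request_time >= now() - INTERVAL 1 HOUR;
```

The sort key `(user_id, request_time)` makes the per-user lookup a narrow range scan, which is what keeps this pattern fast enough for the hot path.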

## Mobile Analytics: The 7-Eleven Success Story {#mobile-analytics}

**Conventional wisdom:** Analytics databases aren't for mobile applications.

**The reality:** *"People out in the factory floors... people out in health care facilities construction sites... they like to be able to look at reports... to sit at a computer at a desktop... is just not optimal"*

**7-Eleven's breakthrough:** Store managers using ClickHouse-powered analytics on mobile devices.

```sql runnable editable
-- Challenge: Modify this to show weekly or monthly patterns instead of daily
-- Experiment: Add different metrics like peak activity hours or user retention patterns
SELECT
    'Daily Sales Summary' as report_type,
    toDate(created_at) as date,
    count() as total_transactions,
    uniq(actor_login) as unique_customers,
    round(count() / uniq(actor_login), 1) as avg_transactions_per_customer,
    'Perfect for mobile dashboard' as mobile_optimized
FROM github.github_events
-- Fixed date window: the playground dataset is a static snapshot,
-- so a relative filter like today() - 7 may match no rows
WHERE created_at >= '2024-01-01'
  AND created_at < '2024-01-08'
GROUP BY date
ORDER BY date DESC;
```

**The use case:** *"The person who runs a store they're going back and forth between the stock room out to the front into the register and then going between stores"*

**Success metrics:**
- Daily sales by store (corporate + franchise)
- Out-of-stock alerts in real-time
- *"Full feature capability between your phone and your desktop"*

## Customer-Facing Real-Time Applications {#customer-facing-applications}

**Conventional wisdom:** ClickHouse is for internal analytics, not customer-facing apps.

**ServiceNow's reality:** *"We offer an analytic solution both for internal needs and for customers across web mobile and chatbots"*

**The breakthrough insight:** *"It enables you to build applications that are highly responsive... customer facing applications... whether they're web apps or mobile apps"*

```sql runnable editable
-- Challenge: Try different segmentation approaches like geographic or time-based grouping
-- Experiment: Add percentage calculations or ranking functions for customer insights
SELECT
    'Customer Segmentation' as feature,
    event_type as segment,
    count() as segment_size,
    round(count() * 100.0 / sum(count()) OVER(), 1) as percentage,
    'Real-time customer insights' as value_proposition
FROM github.github_events
WHERE created_at >= '2024-01-01'
  AND created_at < '2024-01-02'
GROUP BY event_type
ORDER BY segment_size DESC;
```

**Why this breaks conventional rules:**
- **Real-time customer segmentation:** *"Give customers the ability to real-time segments the data and dynamically slicing"*
- **User expectations:** *"In 2024 we have been very much trained to expect a certain degree of responsiveness"*
- **Retention impact:** *"If that repeats often enough you're either not going to come back"*

**Success pattern:** ClickHouse's speed enables customer-facing applications with sub-second response times, challenging the notion that analytical databases are only for internal use.

### The Rule-Breaking Philosophy {#rule-breaking-philosophy}
Review comment (Member):
Should this be H3 as a sub point of "Customer-Facing Real-Time Applications"?


**Common thread:** These successes came from questioning assumptions:
- *"I asked my boss like what do you think of this idea maybe I can try this with ClickHouse"* - Craigslist
- *"Mobile first actually became a big part of how we thought about this"* - Mobile analytics pioneers
- *"We wanted to give customers the ability to... slice and dice everything as much as they wanted"* - ServiceNow

**The lesson:** Sometimes the "wrong" tool for the job becomes the right tool when you understand its strengths and design around them.