
Commit defc5f8

Adding custom tooltips for the concepts and getting-started docs, including a script for re-use. Starting with concepts and getting started to reduce scope.
1 parent: 360925a

12 files changed · +195 / -12 lines


docs/concepts/glossary.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -5,6 +5,8 @@ title: 'Glossary'
 slug: /concepts/glossary
 ---
 
+<!-- no-glossary -->
+
 # Glossary
 
 ## Atomicity {#atomicity}
```
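The `<!-- no-glossary -->` marker opts a page out of automatic tooltip injection: the script added below skips any file that contains it, which keeps the glossary page itself from being rewritten.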

docs/concepts/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,7 +11,7 @@ In this section of the docs we'll dive into the concepts around what makes Click
 
 | Page | Description |
 |------------------------------------------------------------------|---------------------------------------------------------------------------------------|
-| [Why is ClickHouse so Fast?](./why-clickhouse-is-so-fast.md) | Learn what makes ClickHouse so fast.
+| [Why is ClickHouse so Fast?](./why-clickhouse-is-so-fast.mdx) | Learn what makes ClickHouse so fast.
 | [What is OLAP?](./olap.md) | Learn what Online Analytical Processing is.
 | [Why is ClickHouse unique?](../about-us/distinctive-features.md) | Learn what makes ClickHouse unique.
 | [Glossary](./glossary.md) | This page contains a glossary of terms you'll commonly encounter throughout the docs.
```

docs/concepts/why-clickhouse-is-so-fast.md renamed to docs/concepts/why-clickhouse-is-so-fast.mdx

Lines changed: 3 additions & 3 deletions
```diff
@@ -19,7 +19,7 @@ From an architectural perspective, databases consist (at least) of a storage lay
 
 <iframe width="1024" height="576" src="https://www.youtube.com/embed/vsykFYns0Ws?si=hE2qnOf6cDKn-otP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
 
-In ClickHouse, each table consists of multiple "table parts". A [part](/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.
+In ClickHouse, each table consists of multiple "table <GlossaryTooltip term="Parts" />". A [part](/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.
 
 To avoid that too many parts accumulate, ClickHouse runs a [merge](/merges) operation in the background which continuously combines multiple smaller parts into a single bigger part.
 
@@ -97,7 +97,7 @@ Finally, ClickHouse uses a vectorized query processing layer that parallelizes q
 
 Modern systems have dozens of CPU cores. To utilize all cores, ClickHouse unfolds the query plan into multiple lanes, typically one per core. Each lane processes a disjoint range of the table data. That way, the performance of the database scales "vertically" with the number of available cores.
 
-If a single node becomes too small to hold the table data, further nodes can be added to form a cluster. Tables can be split ("sharded") and distributed across the nodes. ClickHouse will run queries on all nodes that store table data and thereby scale "horizontally" with the number of available nodes.
+If a single node becomes too small to hold the table data, further nodes can be added to form a <GlossaryTooltip term="Cluster" />. Tables can be split ("sharded") and distributed across the nodes. ClickHouse will run queries on all nodes that store table data and thereby scale "horizontally" with the number of available nodes.
 
 🤿 Deep dive into this in the [Query Processing Layer](/academic_overview#4-query-processing-layer) section of the web version of our VLDB 2024 paper.
 
@@ -143,4 +143,4 @@ You can read a [PDF of the paper](https://www.vldb.org/pvldb/vol17/p3731-schulze
 Alexey Milovidov, our CTO and the creator of ClickHouse, presented the paper (slides [here](https://raw.githubusercontent.com/ClickHouse/clickhouse-presentations/master/2024-vldb/VLDB_2024_presentation.pdf)), followed by a Q&A (that quickly ran out of time!).
 You can catch the recorded presentation here:
 
-<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
```
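The `<GlossaryTooltip />` component itself is not part of the hunks shown here; presumably it renders the wrapped term with a hover tooltip whose text comes from `src/components/GlossaryTooltip/glossary.json` (the same file the injection script below reads), and its optional `capitalize` and `plural` attributes let the rendered word still match the surrounding sentence.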

docs/faq/general/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@ description: 'Index page listing general questions about ClickHouse'
 # General questions about ClickHouse
 
 - [What is ClickHouse?](../../intro.md)
-- [Why is ClickHouse so fast?](../../concepts/why-clickhouse-is-so-fast.md)
+- [Why is ClickHouse so fast?](../../concepts/why-clickhouse-is-so-fast.mdx)
 - [Who is using ClickHouse?](../../faq/general/who-is-using-clickhouse.md)
 - [What does "ClickHouse" mean?](../../faq/general/dbms-naming.md)
 - [What does "Не тормозит" mean?](../../faq/general/ne-tormozit.md)
```

docs/faq/general/olap.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -34,7 +34,7 @@ All database management systems could be classified into two groups: OLAP (Onlin
 
 In practice OLAP and OLTP are not categories, it's more like a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple storage systems integrated, which might be not so big deal but having more systems make it more expensive to maintain. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**) when both kinds of the workload are handled equally well by a single database management system.
 
-Even if a DBMS started as a pure OLAP or pure OLTP, they are forced to move towards that HTAP direction to keep up with their competition. And ClickHouse is no exception, initially, it has been designed as [fast-as-possible OLAP system](../../concepts/why-clickhouse-is-so-fast.md) and it still does not have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data had to be added.
+Even if a DBMS started as a pure OLAP or pure OLTP, they are forced to move towards that HTAP direction to keep up with their competition. And ClickHouse is no exception, initially, it has been designed as [fast-as-possible OLAP system](../../concepts/why-clickhouse-is-so-fast.mdx) and it still does not have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data had to be added.
 
 The fundamental trade-off between OLAP and OLTP systems remains:
 
```

docs/faq/use-cases/time-series.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@ description: 'Page describing how to use ClickHouse as a time-series database'
 
 _Note: Please see the blog [Working with Time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse) for additional examples of using ClickHouse for time series analysis._
 
-ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized [time-series database management systems](https://clickhouse.com/engineering-resources/what-is-time-series-database). Nevertheless, ClickHouse's [focus on query execution speed](../../concepts/why-clickhouse-is-so-fast.md) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there, so we're not going to conduct one here. Instead, let's focus on ClickHouse features that are important to use if that's your use case.
+ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized [time-series database management systems](https://clickhouse.com/engineering-resources/what-is-time-series-database). Nevertheless, ClickHouse's [focus on query execution speed](../../concepts/why-clickhouse-is-so-fast.mdx) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there, so we're not going to conduct one here. Instead, let's focus on ClickHouse features that are important to use if that's your use case.
 
 First of all, there are **[specialized codecs](../../sql-reference/statements/create/table.md#specialized-codecs)** which make typical time-series. Either common algorithms like `DoubleDelta` and `Gorilla` or specific to ClickHouse like `T64`.
 
```

docs/getting-started/quick-start/cloud.mdx

Lines changed: 2 additions & 2 deletions
```diff
@@ -60,7 +60,7 @@ Select your desired region for deploying the service, and give your new service
 <Image img={createservice1} size="md" alt='New ClickHouse Service' border/>
 <br/>
 
-By default, the scale tier will create 3 replicas each with 4 VCPUs and 16 GiB RAM. [Vertical autoscaling](/manage/scaling#vertical-auto-scaling) will be enabled by default in the Scale tier.
+By default, the scale tier will create 3 <GlossaryTooltip term="Replica" plural="s" /> each with 4 VCPUs and 16 GiB RAM. [Vertical autoscaling](/manage/scaling#vertical-auto-scaling) will be enabled by default in the Scale tier.
 
 Users can customize the service resources if required, specifying a minimum and maximum size for replicas to scale between. When ready, select `Create service`.
 
@@ -329,4 +329,4 @@ Suppose we have the following text in a CSV file named `data.csv`:
 - Check out our 25-minute video on [Getting Started with ClickHouse](https://clickhouse.com/company/events/getting-started-with-clickhouse/)
 - If your data is coming from an external source, view our [collection of integration guides](/integrations/index.mdx) for connecting to message queues, databases, pipelines and more
 - If you are using a UI/BI visualization tool, view the [user guides for connecting a UI to ClickHouse](/integrations/data-visualization)
-- The user guide on [primary keys](/guides/best-practices/sparse-primary-indexes.md) is everything you need to know about primary keys and how to define them
+- The user guide on [primary keys](/guides/best-practices/sparse-primary-indexes.md) is everything you need to know about primary keys and how to define them
```
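The second hunk only re-adds the file's final line: the script joins lines with `'\n'`, which likely drops the trailing newline when it rewrites a page (the same effect removes the final blank line in `oss.mdx` below).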

docs/getting-started/quick-start/oss.mdx

Lines changed: 1 addition & 2 deletions
```diff
@@ -107,7 +107,7 @@ PRIMARY KEY (user_id, timestamp)
 
 You can use the familiar `INSERT INTO TABLE` command with ClickHouse, but it is
 important to understand that each insert into a `MergeTree` table causes what we
-call a **part** in ClickHouse to be created in storage. These parts later get
+call a **part** in ClickHouse to be created in storage. These <GlossaryTooltip term="Parts" /> later get
 merged in the background by ClickHouse.
 
 In ClickHouse, we try to bulk insert lots of rows at a time
@@ -373,4 +373,3 @@ technologies that integrate with ClickHouse.
 - The user guide on [primary keys](/guides/best-practices/sparse-primary-indexes.md) is everything you need to know about primary keys and how to define them.
 
 </VerticalStepper>
-
```
docs/intro.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -88,7 +88,7 @@ ClickHouse chooses the join algorithm adaptively, it starts with fast hash joins
 ## Superior query performance {#superior-query-performance}
 
 ClickHouse is well known for having extremely fast query performance.
-To learn why ClickHouse is so fast, see the [Why is ClickHouse fast?](/concepts/why-clickhouse-is-so-fast.md) guide.
+To learn why ClickHouse is so fast, see the [Why is ClickHouse fast?](/concepts/why-clickhouse-is-so-fast.mdx) guide.
 
 <!--
 ## What is OLAP? {#what-is-olap}
```

scripts/inject-glossary-tooltips.py

Lines changed: 131 additions & 0 deletions
```python
import os
import re
import json
import argparse
import difflib

# This script injects glossary tooltips into Markdown files based on a glossary JSON file.
# Path to the glossary JSON file and target directory for Markdown files.
# Adjust these paths as necessary.
GLOSSARY_FILE = 'src/components/GlossaryTooltip/glossary.json'
TARGET_DIR = 'docs/getting-started/quick-start'

with open(GLOSSARY_FILE, 'r', encoding='utf-8') as f:
    glossary = json.load(f)

# Longest terms first, so a longer term wins over a shorter one it contains.
terms = sorted(glossary.keys(), key=len, reverse=True)

def build_term_regex(term):
    escaped = re.escape(term)
    return re.compile(rf'\b({escaped})(s|es)?\b', re.IGNORECASE)

term_regexes = {term: build_term_regex(term) for term in terms}

def capitalize_word(word):
    # (currently unused)
    return word[0].upper() + word[1:] if word else word

def replace_terms(line, replaced_terms):
    # Wrap the first occurrence of a glossary term in a <GlossaryTooltip/>.
    # Each term is tooltipped at most once per file, one term per line.
    for term in terms:
        if term in replaced_terms:
            continue

        regex = term_regexes[term]

        def _replacer(match):
            if term in replaced_terms:
                return match.group(0)

            base, plural = match.group(1), match.group(2) or ''
            capitalize = base[0].isupper()
            capital_attr = ' capitalize' if capitalize else ''
            plural_attr = f' plural="{plural}"' if plural else ''
            replaced_terms.add(term)
            return f'<GlossaryTooltip term="{term}"{capital_attr}{plural_attr} />'

        line, count = regex.subn(_replacer, line, count=1)
        if count > 0:
            break  # one term per line max

    return line

def process_markdown(content):
    if '<!-- no-glossary -->' in content:
        return content, False

    lines = content.splitlines()
    inside_code_block = False
    replaced_terms = set()
    modified = False
    output_lines = []

    for line in lines:
        stripped = line.strip()

        # Fence detection for code blocks
        if stripped.startswith('```'):
            inside_code_block = not inside_code_block
            output_lines.append(line)
            continue

        # Skip inside code or headings
        if inside_code_block or stripped.startswith('#'):
            output_lines.append(line)
            continue

        new_line = replace_terms(line, replaced_terms)
        if new_line != line:
            modified = True
        output_lines.append(new_line)

    return '\n'.join(output_lines), modified

def rename_md_to_mdx(filepath):
    if filepath.endswith('.md'):
        new_path = filepath[:-3] + '.mdx'
        os.rename(filepath, new_path)
        print(f'Renamed: {filepath} -> {new_path}')
        return new_path
    return filepath

def walk_files(target_dir):
    for root, _, files in os.walk(target_dir):
        for filename in files:
            if filename.endswith('.md') or filename.endswith('.mdx'):
                yield os.path.join(root, filename)

def print_diff(original, modified, path):
    diff_lines = list(difflib.unified_diff(
        original.splitlines(),
        modified.splitlines(),
        fromfile=path,
        tofile=path,
        lineterm=''
    ))
    if diff_lines:
        print('\n'.join(diff_lines))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--dry-run', action='store_true', help='Show diffs without writing')
    args = parser.parse_args()

    for filepath in walk_files(TARGET_DIR):
        with open(filepath, 'r', encoding='utf-8') as f:
            original = f.read()

        modified, changed = process_markdown(original)

        if changed:
            if args.dry_run:
                print(f'\n--- DRY RUN: {filepath} ---')
                print_diff(original, modified, filepath)
            else:
                with open(filepath, 'w', encoding='utf-8') as f:
                    f.write(modified)
                print(f'✅ Updated: {filepath}')

                # Rename to .mdx if needed so the injected JSX components are parsed
                if filepath.endswith('.md'):
                    filepath = rename_md_to_mdx(filepath)

if __name__ == '__main__':
    main()
```
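For reference, a minimal sketch of the data the script expects and the markup it emits. The glossary keys below are inferred from the tooltips injected in this commit; the definition strings are illustrative placeholders, not the contents of the real `glossary.json`:

```python
# Illustrative only: keys inferred from the diffs above ("Parts", "Replica",
# "Cluster"); the definition text is placeholder, not the shipped glossary.
glossary = {
    "Parts": "Immutable chunks of a MergeTree table's data on disk.",
    "Replica": "A copy of a shard's data stored on another server.",
    "Cluster": "A group of ClickHouse servers working together.",
}

# With that glossary loaded, replace_terms() rewrites the first matching
# term on a line, once per file, e.g.:
#   "create 3 replicas each with 4 VCPUs"
#   -> 'create 3 <GlossaryTooltip term="Replica" plural="s" /> each with 4 VCPUs'
```

To preview what would change without touching any files, run `python scripts/inject-glossary-tooltips.py --dry-run`; run it again without the flag to write the tooltips, after which any modified `.md` file is renamed to `.mdx`.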
