Commit 4881f0a

Glossary transformer: finds terms wrapped in ^^...^^ and transforms them into GlossaryTooltip components at build time.
1 parent e0f5675 commit 4881f0a

8 files changed: +165 -141 lines changed

docs/concepts/why-clickhouse-is-so-fast.mdx

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ From an architectural perspective, databases consist (at least) of a storage lay

 <iframe width="1024" height="576" src="https://www.youtube.com/embed/vsykFYns0Ws?si=hE2qnOf6cDKn-otP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

-In ClickHouse, each table consists of multiple "table <GlossaryTooltip term="Parts" />". A [part](/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.
+In ClickHouse, each table consists of multiple "table ^^parts^^". A [part](/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.

 To avoid that too many parts accumulate, ClickHouse runs a [merge](/merges) operation in the background which continuously combines multiple smaller parts into a single bigger part.

@@ -97,7 +97,7 @@ Finally, ClickHouse uses a vectorized query processing layer that parallelizes q

 Modern systems have dozens of CPU cores. To utilize all cores, ClickHouse unfolds the query plan into multiple lanes, typically one per core. Each lane processes a disjoint range of the table data. That way, the performance of the database scales "vertically" with the number of available cores.

-If a single node becomes too small to hold the table data, further nodes can be added to form a <GlossaryTooltip term="Cluster" />. Tables can be split ("sharded") and distributed across the nodes. ClickHouse will run queries on all nodes that store table data and thereby scale "horizontally" with the number of available nodes.
+If a single node becomes too small to hold the table data, further nodes can be added to form a ^^cluster^^. Tables can be split ("sharded") and distributed across the nodes. ClickHouse will run queries on all nodes that store table data and thereby scale "horizontally" with the number of available nodes.

 🤿 Deep dive into this in the [Query Processing Layer](/academic_overview#4-query-processing-layer) section of the web version of our VLDB 2024 paper.

docs/getting-started/quick-start/cloud.mdx

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ Select your desired region for deploying the service, and give your new service
 <Image img={createservice1} size="md" alt='New ClickHouse Service' border/>
 <br/>

-By default, the scale tier will create 3 <GlossaryTooltip term="Replica" plural="s" /> each with 4 VCPUs and 16 GiB RAM. [Vertical autoscaling](/manage/scaling#vertical-auto-scaling) will be enabled by default in the Scale tier.
+By default, the scale tier will create 3 ^^replica^^s each with 4 VCPUs and 16 GiB RAM. [Vertical autoscaling](/manage/scaling#vertical-auto-scaling) will be enabled by default in the Scale tier.

 Users can customize the service resources if required, specifying a minimum and maximum size for replicas to scale between. When ready, select `Create service`.

docs/getting-started/quick-start/oss.mdx

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 slug: /getting-started/quick-start/oss
 sidebar_label: 'OSS'
-sidebar_position: 1
+sidebar_position: 2
 keywords: ['getting started', 'quick start', 'beginner-friendly']
 title: 'ClickHouse OSS quick start'
 description: 'ClickHouse Quick Start guide'
@@ -107,7 +107,7 @@ PRIMARY KEY (user_id, timestamp)

 You can use the familiar `INSERT INTO TABLE` command with ClickHouse, but it is
 important to understand that each insert into a `MergeTree` table causes what we
-call a **part** in ClickHouse to be created in storage. These <GlossaryTooltip term="Parts" /> later get
+call a **part** in ClickHouse to be created in storage. These ^^parts^^ later get
 merged in the background by ClickHouse.

 In ClickHouse, we try to bulk insert lots of rows at a time

docusaurus.config.en.js

Lines changed: 2 additions & 1 deletion
@@ -13,6 +13,7 @@ const frontmatterValidator = require('./plugins/frontmatter-validation/frontmatt
 import pluginLlmsTxt from './plugins/llms-txt-plugin.ts'
 import prismLight from "./src/utils/prismLight";
 import prismDark from "./src/utils/prismDark";
+import glossaryTransformer from "./plugins/glossary-transformer.js";

 // Helper function to skip over index.md files.
 function skipIndex(items) {
@@ -151,7 +152,7 @@ const config = {
 showLastUpdateTime: false,
 sidebarCollapsed: true,
 routeBasePath: "/",
-remarkPlugins: [math, remarkCustomBlocks],
+remarkPlugins: [math, remarkCustomBlocks, glossaryTransformer],
 beforeDefaultRemarkPlugins: [fixLinks],
 rehypePlugins: [katex],
 },

plugins/glossary-transformer.js

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
+// plugins/glossary-transformer/index.js
+const { visit } = require('unist-util-visit');
+const fs = require('fs');
+const path = require('path');
+
+// Cache glossary terms globally
+let cachedGlossary = null;
+let glossaryModTime = null;
+
+function createGlossaryTransformer(options = {}) {
+  const config = {
+    caseSensitive: false,
+    validateTerms: true,
+    glossaryFile: path.resolve(__dirname, '../src/components/GlossaryTooltip/glossary.json'),
+    skipPatterns: [],
+    ...options
+  };
+
+  if (!Array.isArray(config.skipPatterns)) {
+    config.skipPatterns = [];
+  }
+
+  function loadGlossary() {
+    if (!fs.existsSync(config.glossaryFile)) {
+      console.warn(`Glossary file not found: ${config.glossaryFile}`);
+      return new Map();
+    }
+
+    const stats = fs.statSync(config.glossaryFile);
+    if (cachedGlossary && glossaryModTime && stats.mtime <= glossaryModTime) {
+      return cachedGlossary;
+    }
+
+    try {
+      const glossaryData = JSON.parse(fs.readFileSync(config.glossaryFile, 'utf8'));
+      const glossaryMap = new Map();
+
+      Object.entries(glossaryData).forEach(([term, definition]) => {
+        glossaryMap.set(term.toLowerCase(), { originalTerm: term, definition });
+      });
+
+      cachedGlossary = glossaryMap;
+      glossaryModTime = stats.mtime;
+      console.log(`Loaded ${glossaryMap.size} glossary terms`);
+
+      return glossaryMap;
+    } catch (error) {
+      console.error('Error loading glossary:', error.message);
+      return new Map();
+    }
+  }
+
+  function shouldProcess(filePath, fileContent) {
+    return filePath?.endsWith('.mdx') &&
+      fileContent?.includes('^^') &&
+      !config.skipPatterns.some(pattern =>
+        pattern instanceof RegExp ? pattern.test(filePath) : filePath.includes(pattern)
+      );
+  }
+
+  return function transformer(tree, file) {
+    const filePath = file.path || file.history?.[0] || '';
+    const fileContent = file.value || '';
+
+    if (!shouldProcess(filePath, fileContent)) {
+      return tree;
+    }
+
+    const glossary = loadGlossary();
+    if (glossary.size === 0) return tree;
+
+    let transformCount = 0;
+
+    visit(tree, 'text', (node, index, parent) => {
+      if (!node.value?.includes('^^') || !parent) return;
+
+      const pattern = /\^\^([^^\n|]+?)(?:\|([^^\n]*?))?\^\^/g;
+      const newNodes = [];
+      let lastIndex = 0;
+      let match;
+
+      while ((match = pattern.exec(node.value)) !== null) {
+        const [fullMatch, term, plural = ''] = match;
+        const cleanTerm = term.trim();
+        const cleanPlural = plural.trim();
+
+        // Add text before match
+        if (match.index > lastIndex) {
+          newNodes.push({
+            type: 'text',
+            value: node.value.slice(lastIndex, match.index)
+          });
+        }
+
+        // Get original term from glossary or use as-is
+        const glossaryEntry = glossary.get(cleanTerm.toLowerCase());
+        const originalTerm = glossaryEntry?.originalTerm || cleanTerm;
+
+        if (!glossaryEntry && config.validateTerms) {
+          console.warn(`Glossary term not found: ${cleanTerm}`);
+        }
+
+        // Create MDX JSX element
+        newNodes.push({
+          type: 'mdxJsxTextElement',
+          name: 'GlossaryTooltip',
+          attributes: [
+            { type: 'mdxJsxAttribute', name: 'term', value: originalTerm },
+            { type: 'mdxJsxAttribute', name: 'plural', value: cleanPlural }
+          ],
+          children: []
+        });
+
+        transformCount++;
+        lastIndex = match.index + fullMatch.length;
+      }
+
+      // Add remaining text
+      if (lastIndex < node.value.length) {
+        newNodes.push({
+          type: 'text',
+          value: node.value.slice(lastIndex)
+        });
+      }
+
+      // Replace node if we made changes
+      if (newNodes.length > 0) {
+        parent.children.splice(index, 1, ...newNodes);
+      }
+    });
+
+    if (transformCount > 0) {
+      console.log(`Processed ${transformCount} glossary terms in: ${path.basename(filePath)}`);
+    }
+
+    return tree;
+  };
+}
+
+module.exports = createGlossaryTransformer;
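
For readers who want to see how the `^^term^^` / `^^term|plural^^` pattern in this plugin splits a text node, here is a minimal standalone sketch. It reuses the regular expression from the diff, but `splitGlossaryMarkers` and the `{ type: 'glossary' }` segment shape are hypothetical names introduced for illustration; the plugin itself emits `mdxJsxTextElement` nodes and also consults `glossary.json`, which this sketch skips.

```javascript
// Same pattern as in the plugin: ^^term^^ or ^^term|plural^^.
const GLOSSARY_PATTERN = /\^\^([^^\n|]+?)(?:\|([^^\n]*?))?\^\^/g;

// Hypothetical helper (not part of the commit): splits a string into
// plain-text segments and glossary references, in document order.
function splitGlossaryMarkers(text) {
  const segments = [];
  let lastIndex = 0;

  for (const match of text.matchAll(GLOSSARY_PATTERN)) {
    const [fullMatch, term, plural = ''] = match;

    // Keep any plain text that precedes this marker.
    if (match.index > lastIndex) {
      segments.push({ type: 'text', value: text.slice(lastIndex, match.index) });
    }

    segments.push({ type: 'glossary', term: term.trim(), plural: plural.trim() });
    lastIndex = match.index + fullMatch.length;
  }

  // Keep any plain text after the last marker.
  if (lastIndex < text.length) {
    segments.push({ type: 'text', value: text.slice(lastIndex) });
  }

  return segments;
}

console.log(splitGlossaryMarkers('create 3 ^^replica^^s and merge ^^part|s^^ later'));
```

Note that in `^^replica^^s` the trailing `s` stays in the following text segment, whereas `^^part|s^^` carries it as the `plural` value.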

scripts/inject-glossary-tooltips.py

Lines changed: 0 additions & 131 deletions
This file was deleted.

src/components/GlossaryTooltip/GlossaryTooltip.tsx

Lines changed: 17 additions & 3 deletions
@@ -4,7 +4,21 @@ import Link from '@docusaurus/Link';

 const GlossaryTooltip = ({ term, capitalize = false, plural = '' }) => {
   const [visible, setVisible] = useState(false);
-  const definition = glossary[term];
+
+  // Case-insensitive lookup
+  let definition = glossary[term]; // Try exact match first
+  let matchedKey = term;
+
+  if (!definition) {
+    // Try to find a case-insensitive match
+    const foundKey = Object.keys(glossary).find(key =>
+      key.toLowerCase() === term.toLowerCase()
+    );
+    if (foundKey) {
+      definition = glossary[foundKey];
+      matchedKey = foundKey;
+    }
+  }

   if (!definition) {
     console.warn(`Glossary term not found: ${term}`);
@@ -15,7 +29,7 @@ const GlossaryTooltip = ({ term, capitalize = false, plural = '' }) => {
   }

   const displayTerm = capitalize ? capitalizeWord(term) : term.toLowerCase();
-  const anchorId = term.toLowerCase().replace(/\s+/g, '-');
+  const anchorId = matchedKey.toLowerCase().replace(/\s+/g, '-');
   const glossarySlug = `/concepts/glossary#${anchorId}`;

   return (
@@ -46,4 +60,4 @@ function capitalizeWord(word) {
   return word.charAt(0).toUpperCase() + word.slice(1);
 }

-export default GlossaryTooltip;
+export default GlossaryTooltip;
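
The case-insensitive lookup in this component scans `Object.keys(glossary)` whenever the exact match misses. As a possible alternative, the sketch below builds a lowercase index once and derives the anchor from the matched key. The names `buildGlossaryIndex`, `lookupTerm`, and `anchorFor` are hypothetical, and the sample glossary entries are placeholders rather than content from the real `glossary.json`. Unlike the component, it does not try an exact match first, so two keys that differ only in case would collide.

```javascript
// Placeholder data for illustration only.
const sampleGlossary = {
  Parts: 'placeholder definition for Parts',
  'Primary Key': 'placeholder definition for Primary Key',
};

// Build a lowercase -> { matchedKey, definition } index once.
function buildGlossaryIndex(glossary) {
  const index = new Map();
  for (const [key, definition] of Object.entries(glossary)) {
    index.set(key.toLowerCase(), { matchedKey: key, definition });
  }
  return index;
}

// Case-insensitive lookup; returns null when the term is unknown.
function lookupTerm(index, term) {
  return index.get(term.toLowerCase()) ?? null;
}

// Same anchor derivation as in the component: lowercase, whitespace to hyphens.
function anchorFor(matchedKey) {
  return matchedKey.toLowerCase().replace(/\s+/g, '-');
}

const index = buildGlossaryIndex(sampleGlossary);
const entry = lookupTerm(index, 'primary key');
console.log(entry && `/concepts/glossary#${anchorFor(entry.matchedKey)}`);
```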

src/theme/MDXComponents.js

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ import MDXComponents from '@theme-original/MDXComponents';
 // Import the custom Stepper component
 // Make sure the path matches your project structure
 import VStepper from '@site/src/components/Stepper/Stepper';
-import GlossaryTooltip from '../../src/components/GlossaryTooltip/GlossaryTooltip.tsx';
+import GlossaryTooltip from '@site/src/components/GlossaryTooltip/GlossaryTooltip';

 // Define the enhanced components
 const enhancedComponents = {
