Commit f954ba7

update main

2 parents 382aa1f + 95b645f

File tree: 20 files changed (+547, -159 lines)

Lines changed: 74 additions & 0 deletions (new file: a GitHub Actions workflow)

# This GitHub Action is used for triggering updates of
# the toc.json files present in any directory that
# needs an automatically generated table of contents.

name: Generate Table of Contents files

env:
  # Force the stdout and stderr streams to be unbuffered
  PYTHONUNBUFFERED: 1

on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:

permissions:
  contents: write

jobs:
  generate_toc_formats:
    runs-on: ubuntu-latest
    steps:
      # Step 1: Check out the repository
      - name: Check out repository
        uses: actions/checkout@v3

      # Step 2: Set up Python
      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.x'

      # Step 3: Install Python dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r 'scripts/table-of-contents-generator/requirements.txt'

      # Step 4: Pull main repo docs, run script to generate TOCs
      - name: Generate TOCs
        id: toc_gen               # required by the outcome check in Step 5
        continue-on-error: true   # let Step 5 run and fail the job explicitly
        run: |
          yarn prep-from-master
          python -u ./scripts/table-of-contents-generator/toc_gen.py --dir="docs/en/interfaces/formats" --single-toc --out="table-of-contents-files" --ignore "_snippets"

      # Step 5: Fail the workflow if the script returned exit code 1
      - name: Check exit code
        run: |
          if [[ "${{ steps.toc_gen.outcome }}" == "failure" ]]; then
            echo "Ran into trouble generating a table of contents. See the logs for details."
            exit 1
          fi

      # Step 6: Check if anything was actually updated
      - name: Check for Changes
        id: check_changes
        run: |
          git status -u
          if [[ -n "$(git diff --exit-code)" ]]; then
            echo "Changes detected."
            echo "has_changes=true" >> $GITHUB_OUTPUT
          else
            echo "No changes detected."
            echo "has_changes=false" >> $GITHUB_OUTPUT
          fi

      # Step 7: Commit and push generated Table of Contents files
      - uses: stefanzweifel/git-auto-commit-action@v5
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          # note: the $(date ...) below is not shell-expanded in a `with:` value;
          # it will appear literally in the commit message
          commit_message: "Autogenerate table of contents files from GitHub action - $(date '+%Y-%m-%d %H:%M:%S')"
          file_pattern: 'table-of-contents-files/*'
          branch: generate_table_of_contents
          create_branch: true
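
For anyone testing this locally, the generation step can be reproduced from a repo checkout. A minimal sketch, assuming the script path and flags shown in the workflow above and a Python 3 environment (in CI this is preceded by `yarn prep-from-master` to pull the docs):

```bash
# Local dry run of the TOC generation step; paths and flags are copied
# verbatim from the workflow above, and a docs checkout is assumed.
python -m pip install --upgrade pip
pip install -r scripts/table-of-contents-generator/requirements.txt
python -u ./scripts/table-of-contents-generator/toc_gen.py \
  --dir="docs/en/interfaces/formats" \
  --single-toc \
  --out="table-of-contents-files" \
  --ignore "_snippets"
```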

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -49,3 +49,6 @@ docs/en/cloud/manage/api/services-api-reference.md
 .vscode
 .aspell.en.prepl
 *.md.bak
+
+# Don't ignore generated table of contents files
+!toc.json
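
The `!toc.json` line is a gitignore negation pattern: it re-includes files named toc.json that an earlier, broader pattern would otherwise ignore. A quick way to check how the rules resolve for a generated file (the path below is illustrative, borrowed from the workflow's --out directory):

```bash
# Ask git which ignore rule, if any, matches the generated TOC file.
# check-ignore exits non-zero when the file is not ignored, hence the fallback echo.
git check-ignore -v table-of-contents-files/toc.json || echo "not ignored"
```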

copyClickhouseRepoDocs.sh

Lines changed: 7 additions & 4 deletions
@@ -31,10 +31,13 @@ echo "[$SCRIPT_NAME] Copying completed"

 echo "[$SCRIPT_NAME] Generate changelog"
 cp docs/en/_placeholders/changelog/_index.md docs/en/whats-new/changelog/index.md
-if grep -q '^# $(date +%Y) Changelog' ClickHouse/CHANGELOG.md; then
-    sed '/^# $(date +%Y) Changelog/d' ClickHouse/CHANGELOG.md > temp.txt
-    cat >> docs/en/whats-new/changelog/index.md
-    rm temp.txt
+if grep -q "^# $(date +%Y) Changelog" ClickHouse/CHANGELOG.md; then
+    echo "Generating $(date +%Y) Changelog..."
+    sed "/^# $(date +%Y) Changelog/d" ClickHouse/CHANGELOG.md > temp.txt
+    echo "Changelog copied to temp.txt"
+    cat temp.txt >> docs/en/whats-new/changelog/index.md
+    echo "Changelog written to docs/en/whats-new/changelog/index.md"
+    rm -f temp.txt
     echo "$(date +%Y) Changelog was updated."
 else
     current_year="$(date +%Y)"
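
The quoting change is the substantive fix in this hunk: command substitution such as `$(date +%Y)` expands inside double quotes but stays literal inside single quotes, so the old grep/sed patterns could never match a real year heading (and the old `cat >>` read from stdin rather than from temp.txt, which `cat temp.txt >>` corrects). A two-line illustration of the quoting behaviour:

```bash
# Single quotes keep $(date +%Y) literal; double quotes expand it.
echo '^# $(date +%Y) Changelog'   # prints: ^# $(date +%Y) Changelog
echo "^# $(date +%Y) Changelog"   # prints, e.g.: ^# 2025 Changelog
```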

docs/en/about-us/history.md

Lines changed: 10 additions & 9 deletions
@@ -2,16 +2,17 @@
 slug: /en/about-us/history
 sidebar_label: ClickHouse History
 sidebar_position: 40
-description: Where it all began...
+description: History of ClickHouse development
+tags: ['history', 'development', 'Metrica']
 ---

 # ClickHouse History {#clickhouse-history}

-ClickHouse has been developed initially to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be the core component of this system. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.
+ClickHouse was initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be its core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.

-Yandex.Metrica builds customized reports on the fly based on hits and sessions, with arbitrary segments defined by the user. Doing so often requires building complex aggregates, such as the number of unique users. New data for building a report arrives in real-time.
+Yandex.Metrica builds customized reports on the fly based on hits and sessions, with arbitrary segments defined by the user. Doing so often requires building complex aggregates, such as the number of unique users, with new data for building reports arriving in real-time.

-As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events must be stored to build custom reports. A single query may require scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.
+As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events needed to be stored in order to build custom reports. A single query may have required scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.

 ## Usage in Yandex.Metrica and Other Yandex Services {#usage-in-yandex-metrica-and-other-yandex-services}

@@ -26,30 +27,30 @@ ClickHouse also plays a key role in the following processes:
 - Running queries for debugging the Yandex.Metrica engine.
 - Analyzing logs from the API and the user interface.

-Nowadays, there are multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others.
+Nowadays, there are several dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others.

 ## Aggregated and Non-aggregated Data {#aggregated-and-non-aggregated-data}

 There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data.

-But data aggregation comes with a lot of limitations:
+However, data aggregation comes with a lot of limitations:

 - You must have a pre-defined list of required reports.
 - The user can’t make custom reports.
 - When aggregating over a large number of distinct keys, the data volume is barely reduced, so aggregation is useless.
 - For a large number of reports, there are too many aggregation variations (combinatorial explosion).
 - When aggregating keys with high cardinality (such as URLs), the volume of data is not reduced by much (less than twofold).
 - For this reason, the volume of data with aggregation might grow instead of shrink.
-- Users do not view all the reports we generate for them. A large portion of those calculations is useless.
-- The logical integrity of data may be violated for various aggregations.
+- Users do not view all the reports we generate for them. A large portion of those calculations are useless.
+- The logical integrity of the data may be violated for various aggregations.

 If we do not aggregate anything and work with non-aggregated data, this might reduce the volume of calculations.

 However, with aggregation, a significant part of the work is taken offline and completed relatively calmly. In contrast, online calculations require calculating as fast as possible, since the user is waiting for the result.

 Yandex.Metrica has a specialized system for aggregating data called Metrage, which was used for the majority of reports.
 Starting in 2009, Yandex.Metrica also used a specialized OLAP database for non-aggregated data called OLAPServer, which was previously used for the report builder.
-OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included the lack of support for data types (only numbers), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.
+OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included a lack of support for data types (numbers only), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.

 The initial goal for ClickHouse was to remove the limitations of OLAPServer and solve the problem of working with non-aggregated data for all reports, but over the years, it has grown into a general-purpose database management system suitable for a wide range of analytical tasks.

docs/en/cloud/reference/cloud-compatibility.md

Lines changed: 1 addition & 15 deletions
@@ -122,19 +122,5 @@ ClickHouse Cloud is tuned for variable workloads, and for that reason most syste
 As part of creating the ClickHouse service, we create a default database, and the default user that has broad permissions to this database. This initial user can create additional users and assign their permissions to this database. Beyond this, the ability to enable the following security features within the database using Kerberos, LDAP, or SSL X.509 certificate authentication are not supported at this time.

 ## Roadmap
-The table below summarizes our efforts to expand some of the capabilities described above. If you have feedback, please [submit it here](mailto:[email protected]).

-| Capability | Status |
-|------------|:-------|
-| Dictionary support: PostgreSQL, MySQL, remote and local ClickHouse servers, Redis, MongoDB and HTTP sources | **Added in GA** |
-| SQL user-defined functions (UDFs) | **Added in GA** |
-| MySQL and PostgreSQL engine | **Added in GA** |
-| MySQL interface | **Added in GA** |
-| Redis | Coming soon |
-| Postgres interfaces | Evaluating demand |
-| Engines for SQLite, ODBC, HDFS, and Hive | Evaluating demand |
-| Protobuf, Cap'n'Proto formats | Evaluating demand |
-| Kafka Table Engine | Not recommended; see alternatives above |
-| JDBC Table Engine | Not recommended |
-| EmbeddedRocksDB Engine | Evaluating demand |
-| Executable user-defined functions | Evaluating demand |
+The table below summarizes our efforts to expand some of the capabilities described above. If you have feedback, please [submit it here](mailto:[email protected]).

docs/en/guides/developer/lightweight-update.md

Lines changed: 2 additions & 6 deletions
@@ -5,12 +5,8 @@ title: Lightweight Update
 keywords: [lightweight update]
 ---

-import CloudAvailableBadge from '@theme/badges/CloudAvailableBadge';
-
 ## Lightweight Update

-<CloudAvailableBadge/>
-
 When lightweight updates are enabled, updated rows are marked as updated immediately and subsequent `SELECT` queries will automatically return with the changed values. When lightweight updates are not enabled, you may have to wait for your mutations to be applied via a background process to see the changed values.

 Lightweight updates can be enabled for `MergeTree`-family tables by enabling the query-level setting `apply_mutations_on_fly`.
@@ -23,7 +19,7 @@ SET apply_mutations_on_fly = 1;

 Let's create a table and run some mutations:
 ```sql
-CREATE TABLE test_on_fly_mutations (id UInt64, v String) 
+CREATE TABLE test_on_fly_mutations (id UInt64, v String)
 ENGINE = MergeTree ORDER BY id;

 -- Disable background materialization of mutations to showcase
@@ -93,4 +89,4 @@ These behaviours are controlled by the following settings:
 - `mutations_execute_nondeterministic_on_initiator` - if true, non-deterministic functions are executed on the initiator replica and are replaced as literals in `UPDATE` and `DELETE` queries. Default value: `false`.
 - `mutations_execute_subqueries_on_initiator` - if true, scalar subqueries are executed on the initiator replica and are replaced as literals in `UPDATE` and `DELETE` queries. Default value: `false`.
 - `mutations_max_literal_size_to_replace` - The maximum size of serialized literals in bytes to replace in `UPDATE` and `DELETE` queries. Default value: `16384` (16 KiB).
-
+
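
To see the documented behaviour end to end, here is a hedged sketch using clickhouse-client against the `test_on_fly_mutations` table from the page above (a local server, the table, and a row with `id = 1` are assumptions):

```bash
# Illustrative only: enable on-the-fly mutations for the session, issue an
# update mutation, then read the value back before background materialization.
clickhouse-client --multiquery --query "
SET apply_mutations_on_fly = 1;
ALTER TABLE test_on_fly_mutations UPDATE v = 'updated' WHERE id = 1;
SELECT v FROM test_on_fly_mutations WHERE id = 1;
"
```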

docs/en/integrations/cli.mdx

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ sidebar_position: 30
 sidebar_label: clickhouse-client
 title: clickhouse-client
 slug: /en/integrations/sql-clients/cli
+displayed_sidebar: integrations
 ---

 import Content from '@site/docs/en/interfaces/cli.md';

docs/en/integrations/language-clients/java/client-v1.md

Lines changed: 2 additions & 10 deletions
@@ -1,21 +1,13 @@
----
-sidebar_label: Client V1
-sidebar_position: 3
-keywords: [clickhouse, java, client, integrate]
-description: Java ClickHouse Connector v1
-slug: /en/integrations/java/client-v1
----
-
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';

-# Client (V1)
+# Client (0.7.x and earlier)

 Java client library to communicate with a DB server thru its protocols. Current implementation supports only [HTTP interface](/docs/en/interfaces/http). The library provides own API to send requests to a server.

 :::warning Deprecation
-This library will be deprecated soon. Use Client-v2 for new projects
+This library will be deprecated soon. Use the latest [Java Client](/docs/en/integrations/language-clients/java/client-v2.md) for new projects.
 :::

 ## Setup

docs/en/integrations/language-clients/java/client-v2.md

Lines changed: 8 additions & 3 deletions
@@ -1,8 +1,8 @@
 ---
-sidebar_label: Client V2
+sidebar_label: Client 0.8+
 sidebar_position: 2
 keywords: [clickhouse, java, client, integrate]
-description: Java ClickHouse Connector v2
+description: Java ClickHouse Connector 0.8+
 slug: /en/integrations/java/client-v2
 ---

@@ -12,7 +12,12 @@ import CodeBlock from '@theme/CodeBlock';

 # Java Client (V2)

-Java client library to communicate with a DB server through its protocols. The current implementation only supports the [HTTP interface](/docs/en/interfaces/http). The library provides its own API to send requests to a server. The library also provides tools to work with different binary data formats (RowBinary* & Native*).
+Java client library to communicate with a DB server through its protocols. The current implementation only supports the [HTTP interface](/docs/en/interfaces/http).
+The library provides its own API to send requests to a server. The library also provides tools to work with different binary data formats (RowBinary* & Native*).
+
+:::note
+If you're looking for a prior version of the Java client docs, please see [here](/docs/en/integrations/language-clients/java/client-v1.md).
+:::

 ## Setup
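
Since the client only talks to ClickHouse over the HTTP interface, a quick connectivity check from the shell can save a debugging round-trip before wiring up the Java client (localhost and the default HTTP port 8123 are assumptions):

```bash
# Verify the HTTP interface the Java client depends on; host and port are assumed.
curl 'http://localhost:8123/?query=SELECT%20version()'
```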

docs/en/integrations/language-clients/java/index.md

Lines changed: 2 additions & 3 deletions
@@ -11,9 +11,8 @@ import CodeBlock from '@theme/CodeBlock';

 # Java Clients Overview

-- [Client-V2](./client-v2.md)
-- [Client-V1 (Old)](./client-v1.md)
-- [JDBC Driver](./jdbc-driver.md)
+- [Client 0.8+](./client-v2.md)
+- [JDBC 0.8+](./jdbc-v2.md)
 - [R2DBC Driver](./r2dbc.md)

 ## ClickHouse Client
