Commit ed1b043

Merge branch 'main' of github.com:ClickHouse/clickhouse-docs

2 parents 47d01a1 + c63583f

102 files changed: +3026 −411 lines

Note: large commits have some content hidden by default, so only a subset of the 102 changed files is shown below.

.github/workflows/build-search.yml

Lines changed: 44 additions & 0 deletions

@@ -0,0 +1,44 @@

```yaml
name: Update Algolia Search

on:
  pull_request:
    types:
      - closed
  workflow_dispatch:
  schedule:
    - cron: '0 4 * * *'

env:
  PYTHONUNBUFFERED: 1 # Force the stdout and stderr streams to be unbuffered

jobs:
  update-search:
    # Run for merged PRs labeled 'update search' that target main. Scheduled and
    # manually dispatched runs have no pull_request context (the original
    # condition would always skip them), so let those events through as well.
    if: github.event_name != 'pull_request' || (github.event.pull_request.merged == true && contains(github.event.pull_request.labels.*.name, 'update search') && github.event.pull_request.base.ref == 'main')
    #if: contains(github.event.pull_request.labels.*.name, 'update search') # Updated to trigger directly on PRs with the label
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'

      - name: Run Prep from Master
        run: yarn copy-clickhouse-repo-docs

      - name: Run Auto Generate Settings
        run: yarn auto-generate-settings

      - name: Run Indexer
        run: yarn run-indexer
        env:
          ALGOLIA_API_KEY: ${{ secrets.ALGOLIA_API_KEY }}
          ALGOLIA_APP_ID: 5H9UG7CX5W

      - name: Verify Completion
        run: echo "All steps completed successfully!"
```
Lines changed: 79 additions & 0 deletions

@@ -0,0 +1,79 @@

```yaml
# This GitHub Action is used for triggering updates of
# the toc.json files present in any directory that
# needs an automatically generated table of contents.

name: Generate Table of Contents files

env:
  # Force the stdout and stderr streams to be unbuffered
  PYTHONUNBUFFERED: 1

on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:

permissions:
  contents: write

jobs:
  generate_toc_formats:
    runs-on: ubuntu-latest
    steps:

      # Step 1 - Check out the repository
      - name: Check out repository
        uses: actions/checkout@v3

      # Step 2 - Pull changes
      - name: Pull remote Changes
        run: git pull

      # Step 3 - Setup python
      - name: Set up python
        uses: actions/setup-python@v3
        with:
          python-version: '3.x'

      # Step 4 - Install python dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r 'scripts/table-of-contents-generator/requirements.txt'

      # Step 5 - Pull main repo docs, run script to generate TOCs.
      # The id is required so that Step 6 can reference this step's outcome.
      - name: Generate table of contents files
        id: toc_gen
        run: |
          yarn prep-from-master
          python -u ./scripts/table-of-contents-generator/toc_gen.py --dir="docs/en/interfaces/formats" --single-toc --out="table-of-contents-files" --ignore "_snippets"

      # Step 6 - Fail the workflow if the script returns exit code 1
      - name: Check exit code
        run: |
          if [[ "${{ steps.toc_gen.outcome }}" == "failure" ]]; then
            echo "Ran into trouble generating a table of contents. See the logs for details."
            exit 1
          fi

      # Step 7 - Check if anything was actually updated
      - name: Check for changes
        id: check_changes
        run: |
          git status -u
          if [[ -n "$(git diff --exit-code)" ]]; then
            echo "Changes detected."
            echo "has_changes=true" >> $GITHUB_OUTPUT
          else
            echo "No changes detected."
            echo "has_changes=false" >> $GITHUB_OUTPUT
          fi

      # Step 8 - Commit and push generated Table of Contents files
      - uses: stefanzweifel/git-auto-commit-action@v5
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          commit_message: "Autogenerate table of contents files from GitHub action - $(date '+%Y-%m-%d %H:%M:%S')"
          file_pattern: 'table-of-contents-files/*'
          branch: generate_table_of_contents
          create_branch: true
```
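For reference, the flags passed to `toc_gen.py` in Step 5 suggest a command-line interface along the following lines. This is a hypothetical sketch built only from the flag names visible above (`--dir`, `--single-toc`, `--out`, `--ignore`); the real implementation lives in `scripts/table-of-contents-generator/toc_gen.py` and may differ.

```python
# Hypothetical reconstruction of the CLI surface that Step 5 invokes.
# Only the flag names are taken from the workflow; everything else is assumed.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Generate toc.json table-of-contents files for a docs directory"
    )
    parser.add_argument("--dir", required=True,
                        help="Docs directory to scan, e.g. docs/en/interfaces/formats")
    parser.add_argument("--single-toc", action="store_true",
                        help="Emit one combined toc.json rather than one per subdirectory")
    parser.add_argument("--out", default=".",
                        help="Directory the generated toc.json file(s) are written to")
    parser.add_argument("--ignore", nargs="*", default=[],
                        help="Subdirectory names to skip, e.g. _snippets")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Scanning {args.dir}, writing to {args.out}, ignoring {args.ignore}")
```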

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -49,3 +49,6 @@ docs/en/cloud/manage/api/services-api-reference.md
 .vscode
 .aspell.en.prepl
 *.md.bak
+
+# Don't ignore generated table of contents files
+!toc.json
```

copyClickhouseRepoDocs.sh

Lines changed: 7 additions & 4 deletions

```diff
@@ -31,10 +31,13 @@ echo "[$SCRIPT_NAME] Copying completed"

 echo "[$SCRIPT_NAME] Generate changelog"
 cp docs/en/_placeholders/changelog/_index.md docs/en/whats-new/changelog/index.md
-if grep -q '^# $(date +%Y) Changelog' ClickHouse/CHANGELOG.md; then
-    sed '/^# $(date +%Y) Changelog/d' ClickHouse/CHANGELOG.md > temp.txt
-    cat >> docs/en/whats-new/changelog/index.md
-    rm temp.txt
+if grep -q "^# $(date +%Y) Changelog" ClickHouse/CHANGELOG.md; then
+    echo "Generating $(date +%Y) Changelog..."
+    sed "/^# $(date +%Y) Changelog/d" ClickHouse/CHANGELOG.md > temp.txt
+    echo "Changelog copied to temp.txt"
+    cat temp.txt >> docs/en/whats-new/changelog/index.md
+    echo "Changelog written to docs/en/whats-new/changelog/index.md"
+    rm -f temp.txt
     echo "$(date +%Y) Changelog was updated."
 else
     current_year="$(date +%Y)"
```

docs/en/about-us/cloud.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -8,7 +8,7 @@ description: ClickHouse Cloud
 # ClickHouse Cloud

 ClickHouse Cloud is the cloud offering created by the original creators of the popular open-source OLAP database ClickHouse.
-You can experience ClickHouse Cloud by [starting a free trial](https://clickhouse.cloud/signUp).
+You can experience ClickHouse Cloud by [starting a free trial](https://console.clickhouse.cloud/signUp).

 ### ClickHouse Cloud benefits:
```

docs/en/about-us/history.md

Lines changed: 10 additions & 9 deletions

```diff
@@ -2,16 +2,17 @@
 slug: /en/about-us/history
 sidebar_label: ClickHouse History
 sidebar_position: 40
-description: Where it all began...
+description: History of ClickHouse development
+tags: ['history', 'development', 'Metrica']
 ---

 # ClickHouse History {#clickhouse-history}

-ClickHouse has been developed initially to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be the core component of this system. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.
+ClickHouse was initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be its core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.

-Yandex.Metrica builds customized reports on the fly based on hits and sessions, with arbitrary segments defined by the user. Doing so often requires building complex aggregates, such as the number of unique users. New data for building a report arrives in real-time.
+Yandex.Metrica builds customized reports on the fly based on hits and sessions, with arbitrary segments defined by the user. Doing so often requires building complex aggregates, such as the number of unique users, with new data for building reports arriving in real-time.

-As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events must be stored to build custom reports. A single query may require scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.
+As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events needed to be stored, in order to build custom reports. A single query may have required scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.

 ## Usage in Yandex.Metrica and Other Yandex Services {#usage-in-yandex-metrica-and-other-yandex-services}

@@ -26,30 +27,30 @@ ClickHouse also plays a key role in the following processes:
 - Running queries for debugging the Yandex.Metrica engine.
 - Analyzing logs from the API and the user interface.

-Nowadays, there are multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others.
+Nowadays, there are a multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others.

 ## Aggregated and Non-aggregated Data {#aggregated-and-non-aggregated-data}

 There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data.

-But data aggregation comes with a lot of limitations:
+However data aggregation comes with a lot of limitations:

 - You must have a pre-defined list of required reports.
 - The user can’t make custom reports.
 - When aggregating over a large number of distinct keys, the data volume is barely reduced, so aggregation is useless.
 - For a large number of reports, there are too many aggregation variations (combinatorial explosion).
 - When aggregating keys with high cardinality (such as URLs), the volume of data is not reduced by much (less than twofold).
 - For this reason, the volume of data with aggregation might grow instead of shrink.
-- Users do not view all the reports we generate for them. A large portion of those calculations is useless.
-- The logical integrity of data may be violated for various aggregations.
+- Users do not view all the reports we generate for them. A large portion of those calculations are useless.
+- The logical integrity of the data may be violated for various aggregations.

 If we do not aggregate anything and work with non-aggregated data, this might reduce the volume of calculations.

 However, with aggregation, a significant part of the work is taken offline and completed relatively calmly. In contrast, online calculations require calculating as fast as possible, since the user is waiting for the result.

 Yandex.Metrica has a specialized system for aggregating data called Metrage, which was used for the majority of reports.
 Starting in 2009, Yandex.Metrica also used a specialized OLAP database for non-aggregated data called OLAPServer, which was previously used for the report builder.
-OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included the lack of support for data types (only numbers), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.
+OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included a lack of support for data types (numbers only), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.

 The initial goal for ClickHouse was to remove the limitations of OLAPServer and solve the problem of working with non-aggregated data for all reports, but over the years, it has grown into a general-purpose database management system suitable for a wide range of analytical tasks.
```

docs/en/about-us/support.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -7,7 +7,7 @@ sidebar_position: 30

 ClickHouse provides Support Services for our ClickHouse Cloud users and customers. Our objective is a Support Services team that represents the ClickHouse product – unparalleled performance, ease of use, and exceptionally fast, high-quality results. For details, [visit our Support Services](https://clickhouse.com/support/program/) page.

-[Login to the Cloud console](https://clickhouse.cloud/support) and select **Help -> Support** from the menu options to open a new support case and view the status of your submitted cases.
+[Login to the Cloud console](https://console.clickhouse.cloud/support) and select **Help -> Support** from the menu options to open a new support case and view the status of your submitted cases.

 **Service level agreements ("SLA")**
 Classification of severity levels for issues reported and response times.
```

docs/en/chdb/getting-started.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -49,7 +49,7 @@ pip install pandas pyarrow
 ## Querying a JSON file in S3

 Let's now have a look at how to query a JSON file that's stored in an S3 bucket.
-The [YouTube dislikes dataset](https://clickhouse.com/docs/en/getting-started/example-datasets/youtube-dislikes) contains more than 4 billion rows of dislikes on YouTube videos up to 2021.
+The [YouTube dislikes dataset](/docs/en/getting-started/example-datasets/youtube-dislikes) contains more than 4 billion rows of dislikes on YouTube videos up to 2021.
 We're going to work with one of the JSON files from that dataset.

 Import chdb:
```
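The hunk ends just before the section's code sample. A minimal sketch of the kind of query this section builds toward is shown below; the S3 URL is a placeholder rather than the dataset's actual path, and `JSONEachRow` is assumed as the file format:

```python
import chdb

# Count rows in a newline-delimited JSON file read straight from S3 via the
# s3() table function. Replace the placeholder URL with a real file from the
# YouTube dislikes dataset.
result = chdb.query(
    """
    SELECT count()
    FROM s3('https://example-bucket.s3.amazonaws.com/youtube/dislikes.jsonl', 'JSONEachRow')
    """,
    "Pretty",
)
print(result)
```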

docs/en/cloud/bestpractices/usagelimits.md

Lines changed: 0 additions & 6 deletions

```diff
@@ -12,12 +12,6 @@ Clickhouse is very fast and reliable, but any database has its limits. For examp
 If you've reached one of those limits, it may mean that you are implementing your use case in an unoptimized way. You can contact our support so we can help you refine your use case to avoid going through the limits or to increase the limits in a guided way.
 :::

-# Tables
-Clickhouse Cloud have a limit of **5000** tables per instance
-
-# Databases
-Clickhouse Cloud have a limit of **1000** databases per instance
-
 # Partitions
 Clickhouse Cloud have a limit of **50000** [partitions](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key) per instance
```

docs/en/cloud/get-started/cloud-quick-start.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -10,13 +10,13 @@ import SQLConsoleDetail from '@site/docs/en/_snippets/_launch_sql_console.md';
 # ClickHouse Cloud Quick Start

 The quickest and easiest way to get up and running with ClickHouse is to create a new
-service in [ClickHouse Cloud](https://clickhouse.cloud).
+service in [ClickHouse Cloud](https://console.clickhouse.cloud).

 ## 1. Create a ClickHouse service

-To create a free ClickHouse service in [ClickHouse Cloud](https://clickhouse.cloud), you just need to sign up by completing the following steps:
+To create a free ClickHouse service in [ClickHouse Cloud](https://console.clickhouse.cloud), you just need to sign up by completing the following steps:

-- Create an account on the [sign-up page](https://clickhouse.cloud/signUp)
+- Create an account on the [sign-up page](https://console.clickhouse.cloud/signUp)
 - You can choose to sign up using your email or via Google SSO, Microsoft SSO, AWS Marketplace, Google Cloud or Microsoft Azure
 - If you sign up using an email and password, remember to verify your email address within the next 24h via the link you receive in your email
 - Login using the username and password you just created
```
