Commit d90ff14

Merge branch 'customizations/24.8.14' into backport/24.8/83844
2 parents d14d70e + fda8a18 commit d90ff14

40 files changed, +998 -290 lines changed

docs/en/operations/system-tables/view_refreshes.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -17,7 +17,8 @@ Columns:
1717
- `duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md)) — How long the last refresh attempt took.
1818
- `next_refresh_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Time at which the next refresh is scheduled to start.
1919
- `remaining_dependencies` ([Array(String)](../../sql-reference/data-types/array.md)) — If the view has [refresh dependencies](../../sql-reference/statements/create/view.md#refresh-dependencies), this array contains the subset of those dependencies that are not satisfied for the current refresh yet. If `status = 'WaitingForDependencies'`, a refresh is ready to start as soon as these dependencies are fulfilled.
20-
- `exception` ([String](../../sql-reference/data-types/string.md)) — if `last_refresh_result = 'Exception'`, i.e. the last refresh attempt failed, this column contains the corresponding error message and stack trace.
20+
- `exception` ([String](../../sql-reference/data-types/string.md)) — if `last_refresh_result = 'Error'`, i.e. the last refresh attempt failed, this column contains the corresponding error message and stack trace.
21+
- `retry` ([UInt64](../../sql-reference/data-types/int-uint.md)) — If nonzero, the current or next refresh is a retry (see `refresh_retries` refresh setting), and `retry` is the 1-based index of that retry.
2122
- `refresh_count` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Number of successful refreshes since last server restart or table creation.
2223
- `progress` ([Float64](../../sql-reference/data-types/float.md)) — Progress of the current refresh, between 0 and 1.
2324
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Number of rows read by the current refresh so far.

docs/en/sql-reference/statements/create/view.md

Lines changed: 30 additions & 12 deletions
@@ -13,8 +13,8 @@ Creates a new view. Views can be [normal](#normal-view), [materialized](#materia
1313
Syntax:
1414

1515
``` sql
16-
CREATE [OR REPLACE] VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster_name]
17-
[DEFINER = { user | CURRENT_USER }] [SQL SECURITY { DEFINER | INVOKER | NONE }]
16+
CREATE [OR REPLACE] VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster_name]
17+
[DEFINER = { user | CURRENT_USER }] [SQL SECURITY { DEFINER | INVOKER | NONE }]
1818
AS SELECT ...
1919
[COMMENT 'comment']
2020
```
@@ -55,8 +55,8 @@ SELECT * FROM view(column1=value1, column2=value2 ...)
5555
## Materialized View
5656

5757
``` sql
58-
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] [TO[db.]name] [ENGINE = engine] [POPULATE]
59-
[DEFINER = { user | CURRENT_USER }] [SQL SECURITY { DEFINER | INVOKER | NONE }]
58+
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] [TO[db.]name] [ENGINE = engine] [POPULATE]
59+
[DEFINER = { user | CURRENT_USER }] [SQL SECURITY { DEFINER | INVOKER | NONE }]
6060
AS SELECT ...
6161
[COMMENT 'comment']
6262
```
@@ -92,7 +92,7 @@ Given that `POPULATE` works like `CREATE TABLE ... AS SELECT ...` it has limitat
9292
- It is not supported with Replicated database
9393
- It is not supported in ClickHouse cloud
9494

95-
Instead a separate `INSERT ... SELECT` can be used.
95+
Instead a separate `INSERT ... SELECT` can be used.
9696
:::
9797

9898
A `SELECT` query can contain `DISTINCT`, `GROUP BY`, `ORDER BY`, `LIMIT`. Note that the corresponding conversions are performed independently on each block of inserted data. For example, if `GROUP BY` is set, data is aggregated during insertion, but only within a single packet of inserted data. The data won’t be further aggregated. The exception is when using an `ENGINE` that independently performs data aggregation, such as `SummingMergeTree`.
@@ -110,7 +110,7 @@ To delete a view, use [DROP VIEW](../../../sql-reference/statements/drop.md#drop
110110
`DEFINER` and `SQL SECURITY` allow you to specify which ClickHouse user to use when executing the view's underlying query.
111111
`SQL SECURITY` has three legal values: `DEFINER`, `INVOKER`, or `NONE`. You can specify any existing user or `CURRENT_USER` in the `DEFINER` clause.
112112

113-
The following table will explain which rights are required for which user in order to select from view.
113+
The following table will explain which rights are required for which user in order to select from view.
114114
Note that regardless of the SQL security option, in every case it is still required to have `GRANT SELECT ON <view>` in order to read from it.
115115

116116
| SQL security option | View | Materialized View |
@@ -130,7 +130,7 @@ If `DEFINER`/`SQL SECURITY` aren't specified, the default values are used:
130130

131131
If a view is attached without `DEFINER`/`SQL SECURITY` specified, the default value is `SQL SECURITY NONE` for the materialized view and `SQL SECURITY INVOKER` for the normal view.
132132

133-
To change SQL security for an existing view, use
133+
To change SQL security for an existing view, use
134134
```sql
135135
ALTER TABLE MODIFY SQL SECURITY { DEFINER | INVOKER | NONE } [DEFINER = { user | CURRENT_USER }]
136136
```
@@ -161,6 +161,8 @@ CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name
161161
REFRESH EVERY|AFTER interval [OFFSET interval]
162162
RANDOMIZE FOR interval
163163
DEPENDS ON [db.]name [, [db.]name [, ...]]
164+
SETTINGS name = value [, name = value [, ...]]
165+
[APPEND]
164166
[TO[db.]name] [(columns)] [ENGINE = engine] [EMPTY]
165167
AS SELECT ...
166168
[COMMENT 'comment']
@@ -170,18 +172,23 @@ where `interval` is a sequence of simple intervals:
170172
number SECOND|MINUTE|HOUR|DAY|WEEK|MONTH|YEAR
171173
```
172174

173-
Periodically runs the corresponding query and stores its result in a table, atomically replacing the table's previous contents.
175+
Periodically runs the corresponding query and stores its result in a table.
176+
* If the query says `APPEND`, each refresh inserts rows into the table without deleting existing rows. The insert is not atomic, just like a regular INSERT SELECT.
177+
* Otherwise each refresh atomically replaces the table's previous contents.
174178

175179
Differences from regular non-refreshable materialized views:
176-
* No insert trigger. I.e. when new data is inserted into the table specified in SELECT, it's *not* automatically pushed to the refreshable materialized view. The periodic refresh runs the entire query and replaces the entire table.
180+
* No insert trigger. I.e. when new data is inserted into the table specified in SELECT, it's *not* automatically pushed to the refreshable materialized view. The periodic refresh runs the entire query.
177181
* No restrictions on the SELECT query. Table functions (e.g. `url()`), views, UNION, JOIN, are all allowed.
178182

183+
:::note
184+
The settings in the `REFRESH ... SETTINGS` part of the query are refresh settings (e.g. `refresh_retries`), distinct from regular settings (e.g. `max_threads`). Regular settings can be specified using `SETTINGS` at the end of the query.
185+
:::
186+
179187
:::note
180188
Refreshable materialized views are a work in progress. Setting `allow_experimental_refreshable_materialized_view = 1` is required for creating one. Current limitations:
181189
* not compatible with Replicated database or table engines
182190
* It is not supported in ClickHouse Cloud
183191
* require [Atomic database engine](../../../engines/database-engines/atomic.md),
184-
* no retries for failed refresh - we just skip to the next scheduled refresh time,
185192
* no limit on number of concurrent refreshes.
186193
:::
187194

@@ -246,15 +253,22 @@ A few more examples:
246253
`DEPENDS ON` only works between refreshable materialized views. Listing a regular table in the `DEPENDS ON` list will prevent the view from ever refreshing (dependencies can be removed with `ALTER`, see below).
247254
:::
248255

256+
### Settings
257+
258+
Available refresh settings:
259+
* `refresh_retries` - How many times to retry if refresh query fails with an exception. If all retries fail, skip to the next scheduled refresh time. 0 means no retries, -1 means infinite retries. Default: 0.
260+
* `refresh_retry_initial_backoff_ms` - Delay before the first retry, if `refresh_retries` is not zero. Each subsequent retry doubles the delay, up to `refresh_retry_max_backoff_ms`. Default: 100 ms.
261+
* `refresh_retry_max_backoff_ms` - Limit on the exponential growth of delay between refresh attempts. Default: 60000 ms (1 minute).
262+
249263
### Changing Refresh Parameters {#changing-refresh-parameters}
250264

251265
To change refresh parameters:
252266
```
253-
ALTER TABLE [db.]name MODIFY REFRESH EVERY|AFTER ... [RANDOMIZE FOR ...] [DEPENDS ON ...]
267+
ALTER TABLE [db.]name MODIFY REFRESH EVERY|AFTER ... [RANDOMIZE FOR ...] [DEPENDS ON ...] [SETTINGS ...]
254268
```
255269

256270
:::note
257-
This replaces refresh schedule *and* dependencies. If the table had a `DEPENDS ON`, doing a `MODIFY REFRESH` without `DEPENDS ON` will remove the dependencies.
271+
This replaces *all* refresh parameters at once: schedule, dependencies, settings, and APPEND-ness. E.g. if the table had a `DEPENDS ON`, doing a `MODIFY REFRESH` without `DEPENDS ON` will remove the dependencies.
258272
:::
259273

260274
### Other operations
@@ -263,6 +277,10 @@ The status of all refreshable materialized views is available in table [`system.
263277

264278
To manually stop, start, trigger, or cancel refreshes use [`SYSTEM STOP|START|REFRESH|CANCEL VIEW`](../system.md#refreshable-materialized-views).
265279

280+
:::note
281+
Fun fact: the refresh query is allowed to read from the view that's being refreshed, seeing pre-refresh version of the data. This means you can implement Conway's game of life: https://pastila.nl/?00021a4b/d6156ff819c83d490ad2dcec05676865#O0LGWTO7maUQIA4AcGUtlA==
282+
:::
283+
266284
## Window View [Experimental]
267285

268286
:::info
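
The retry schedule described by the new `refresh_retry_initial_backoff_ms` and `refresh_retry_max_backoff_ms` settings in the view.md changes above can be sketched as follows. This is only an illustration of the documented rule (start at the initial delay, double on each subsequent retry, cap at the maximum), not the server's actual scheduling code. The helper name `retryDelayMs` is hypothetical, and its default arguments mirror the documented defaults of 100 ms and 60000 ms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

/// Hypothetical helper, not part of the commit.
/// Returns the delay before the retry with the given 1-based index,
/// following the documented rule: the first retry waits the initial backoff,
/// each later retry doubles the delay, and the delay never exceeds the maximum.
uint64_t retryDelayMs(uint64_t retry, uint64_t initial_backoff_ms = 100, uint64_t max_backoff_ms = 60000)
{
    assert(retry >= 1); /// `retry` is 1-based, like the `retry` column in system.view_refreshes.

    uint64_t delay = std::min(initial_backoff_ms, max_backoff_ms);
    for (uint64_t i = 1; i < retry && delay < max_backoff_ms; ++i)
    {
        /// Double the delay, saturating at the maximum without overflowing.
        delay = (delay > max_backoff_ms / 2) ? max_backoff_ms : delay * 2;
    }
    return delay;
}
```

With the default values, the delay under this sketch reaches the 60000 ms cap at the eleventh retry, since 100 ms doubled ten times already exceeds one minute.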

docs/en/sql-reference/statements/system.md

Lines changed: 5 additions & 1 deletion
@@ -406,7 +406,7 @@ SYSTEM SYNC REPLICA [ON CLUSTER cluster_name] [db.]replicated_merge_tree_family_
406406
After running this statement the `[db.]replicated_merge_tree_family_table_name` fetches commands from the common replicated log into its own replication queue, and then the query waits till the replica processes all of the fetched commands. The following modifiers are supported:
407407

408408
- If a `STRICT` modifier was specified then the query waits for the replication queue to become empty. The `STRICT` version may never succeed if new entries constantly appear in the replication queue.
409-
- If a `LIGHTWEIGHT` modifier was specified then the query waits only for `GET_PART`, `ATTACH_PART`, `DROP_RANGE`, `REPLACE_RANGE` and `DROP_PART` entries to be processed.
409+
- If a `LIGHTWEIGHT` modifier was specified then the query waits only for `GET_PART`, `ATTACH_PART`, `DROP_RANGE`, `REPLACE_RANGE` and `DROP_PART` entries to be processed.
410410
Additionally, the LIGHTWEIGHT modifier supports an optional FROM 'srcReplicas' clause, where 'srcReplicas' is a comma-separated list of source replica names. This extension allows for more targeted synchronization by focusing only on replication tasks originating from the specified source replicas.
411411
- If a `PULL` modifier was specified then the query pulls new replication queue entries from ZooKeeper, but does not wait for anything to be processed.
412412

@@ -532,6 +532,10 @@ Trigger an immediate out-of-schedule refresh of a given view.
532532
SYSTEM REFRESH VIEW [db.]name
533533
```
534534
535+
### WAIT VIEW
536+
537+
Wait for the currently running refresh to complete. If the refresh fails, throws an exception. If no refresh is running, completes immediately, throwing an exception if previous refresh failed.
538+
535539
### STOP VIEW, STOP VIEWS
536540
537541
Disable periodic refreshing of the given view or all refreshable views. If a refresh is in progress, cancel it too.

src/Common/ErrorCodes.cpp

Lines changed: 1 addition & 0 deletions
@@ -608,6 +608,7 @@
608608
M(727, UNEXPECTED_TABLE_ENGINE) \
609609
M(728, UNEXPECTED_DATA_TYPE) \
610610
M(729, ILLEGAL_TIME_SERIES_TAGS) \
611+
M(730, REFRESH_FAILED) \
611612
\
612613
M(900, DISTRIBUTED_CACHE_ERROR) \
613614
M(901, CANNOT_USE_DISTRIBUTED_CACHE) \

src/Core/Settings.h

Lines changed: 1 addition & 0 deletions
@@ -618,6 +618,7 @@ class IColumn;
618618
M(Bool, throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert, true, "Throw exception on INSERT query when the setting `deduplicate_blocks_in_dependent_materialized_views` is enabled along with `async_insert`. It guarantees correctness, because these features can't work together.", 0) \
619619
M(Bool, materialized_views_ignore_errors, false, "Allows to ignore errors for MATERIALIZED VIEW, and deliver original block to the table regardless of MVs", 0) \
620620
M(Bool, ignore_materialized_views_with_dropped_target_table, false, "Ignore MVs with dropped target table during pushing to views", 0) \
621+
M(Bool, allow_materialized_view_with_bad_select, true, "Allow CREATE MATERIALIZED VIEW with SELECT query that references nonexistent tables or columns. It must still be syntactically valid. Doesn't apply to refreshable MVs. Doesn't apply if the MV schema needs to be inferred from the SELECT query (i.e. if the CREATE has no column list and no TO table). Can be used for creating MV before its source table.", 0) \
621622
M(Bool, use_compact_format_in_distributed_parts_names, true, "Changes format of directories names for distributed table insert parts.", 0) \
622623
M(Bool, validate_polygons, true, "Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result.", 0) \
623624
M(UInt64, max_parser_depth, DBMS_DEFAULT_MAX_PARSER_DEPTH, "Maximum parser depth (recursion depth of recursive descend parser).", 0) \

src/Core/SettingsChangesHistory.cpp

Lines changed: 1 addition & 0 deletions
@@ -84,6 +84,7 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
8484
{"use_hive_partitioning", false, false, "Allows to use hive partitioning for File, URL, S3, AzureBlobStorage and HDFS engines."},
8585
{"allow_experimental_kafka_offsets_storage_in_keeper", false, false, "Allow the usage of experimental Kafka storage engine that stores the committed offsets in ClickHouse Keeper"},
8686
{"allow_archive_path_syntax", true, true, "Added new setting to allow disabling archive path syntax."},
87+
{"allow_materialized_view_with_bad_select", true, true, "Support (but not enable yet) stricter validation in CREATE MATERIALIZED VIEW"},
8788
{"query_cache_tag", "", "", "New setting for labeling query cache settings."},
8889
{"allow_experimental_time_series_table", false, false, "Added new setting to allow the TimeSeries table engine"},
8990
{"enable_analyzer", 1, 1, "Added an alias to a setting `allow_experimental_analyzer`."},

src/Interpreters/BloomFilter.cpp

Lines changed: 21 additions & 9 deletions
@@ -6,6 +6,7 @@
66
#include <DataTypes/DataTypeArray.h>
77
#include <DataTypes/DataTypeNullable.h>
88
#include <DataTypes/DataTypeLowCardinality.h>
9+
#include <libdivide.h>
910

1011

1112
namespace DB
@@ -39,7 +40,8 @@ BloomFilter::BloomFilter(const BloomFilterParameters & params)
3940
}
4041

4142
BloomFilter::BloomFilter(size_t size_, size_t hashes_, size_t seed_)
42-
: size(size_), hashes(hashes_), seed(seed_), words((size + sizeof(UnderType) - 1) / sizeof(UnderType)), filter(words, 0)
43+
: size(size_), hashes(hashes_), seed(seed_), words((size + sizeof(UnderType) - 1) / sizeof(UnderType)),
44+
modulus(8 * size_), divider(modulus), filter(words, 0)
4345
{
4446
chassert(size != 0);
4547
chassert(hashes != 0);
@@ -49,6 +51,8 @@ void BloomFilter::resize(size_t size_)
4951
{
5052
size = size_;
5153
words = ((size + sizeof(UnderType) - 1) / sizeof(UnderType));
54+
modulus = 8 * size;
55+
divider = libdivide::divider<size_t, libdivide::BRANCHFREE>(modulus);
5256
filter.resize(words);
5357
}
5458

@@ -57,11 +61,15 @@ bool BloomFilter::find(const char * data, size_t len)
5761
size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed);
5862
size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B);
5963

64+
size_t acc = hash1;
6065
for (size_t i = 0; i < hashes; ++i)
6166
{
62-
size_t pos = (hash1 + i * hash2 + i * i) % (8 * size);
63-
if (!(filter[pos / (8 * sizeof(UnderType))] & (1ULL << (pos % (8 * sizeof(UnderType))))))
67+
/// It accumulates in the loop as follows:
68+
/// pos = (hash1 + hash2 * i + i * i) % (8 * size)
69+
size_t pos = fastMod(acc + i * i);
70+
if (!(filter[pos / word_bits] & (1ULL << (pos % word_bits))))
6471
return false;
72+
acc += hash2;
6573
}
6674
return true;
6775
}
@@ -71,10 +79,14 @@ void BloomFilter::add(const char * data, size_t len)
7179
size_t hash1 = CityHash_v1_0_2::CityHash64WithSeed(data, len, seed);
7280
size_t hash2 = CityHash_v1_0_2::CityHash64WithSeed(data, len, SEED_GEN_A * seed + SEED_GEN_B);
7381

82+
size_t acc = hash1;
7483
for (size_t i = 0; i < hashes; ++i)
7584
{
76-
size_t pos = (hash1 + i * hash2 + i * i) % (8 * size);
77-
filter[pos / (8 * sizeof(UnderType))] |= (1ULL << (pos % (8 * sizeof(UnderType))));
85+
/// It accumulates in the loop as follows:
86+
/// pos = (hash1 + hash2 * i + i * i) % (8 * size)
87+
size_t pos = fastMod(acc + i * i);
88+
filter[pos / word_bits] |= (1ULL << (pos % word_bits));
89+
acc += hash2;
7890
}
7991
}
8092

@@ -111,14 +123,14 @@ bool operator== (const BloomFilter & a, const BloomFilter & b)
111123

112124
void BloomFilter::addHashWithSeed(const UInt64 & hash, const UInt64 & hash_seed)
113125
{
114-
size_t pos = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(hash, hash_seed)) % (8 * size);
115-
filter[pos / (8 * sizeof(UnderType))] |= (1ULL << (pos % (8 * sizeof(UnderType))));
126+
size_t pos = fastMod(CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(hash, hash_seed)));
127+
filter[pos / word_bits] |= (1ULL << (pos % word_bits));
116128
}
117129

118130
bool BloomFilter::findHashWithSeed(const UInt64 & hash, const UInt64 & hash_seed)
119131
{
120-
size_t pos = CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(hash, hash_seed)) % (8 * size);
121-
return bool(filter[pos / (8 * sizeof(UnderType))] & (1ULL << (pos % (8 * sizeof(UnderType)))));
132+
size_t pos = fastMod(CityHash_v1_0_2::Hash128to64(CityHash_v1_0_2::uint128(hash, hash_seed)));
133+
return bool(filter[pos / word_bits] & (1ULL << (pos % word_bits)));
122134
}
123135

124136
DataTypePtr BloomFilter::getPrimitiveType(const DataTypePtr & data_type)

src/Interpreters/BloomFilter.h

Lines changed: 12 additions & 5 deletions
@@ -1,13 +1,14 @@
11
#pragma once
22

3-
#include <vector>
43
#include <base/types.h>
5-
#include <Core/Field.h>
6-
#include <Common/PODArray.h>
7-
#include <Common/Allocator.h>
84
#include <Columns/IColumn.h>
9-
#include <Columns/ColumnVector.h>
105
#include <DataTypes/IDataType.h>
6+
#include <libdivide.h>
7+
8+
//#include <vector>
9+
//#include <Common/PODArray.h>
10+
//#include <Common/Allocator.h>
11+
//#include <Columns/ColumnVector.h>
1112

1213

1314
namespace DB
@@ -58,12 +59,18 @@ class BloomFilter
5859
friend bool operator== (const BloomFilter & a, const BloomFilter & b);
5960
private:
6061

62+
static constexpr size_t word_bits = 8 * sizeof(UnderType);
63+
6164
size_t size;
6265
size_t hashes;
6366
size_t seed;
6467
size_t words;
68+
size_t modulus; /// 8 * size, cached for fast modulo.
69+
libdivide::divider<size_t, libdivide::BRANCHFREE> divider; /// Divider for fast modulo by modulus.
6570
Container filter;
6671

72+
inline size_t fastMod(size_t value) const { return value - (value / divider) * modulus; }
73+
6774
public:
6875
static ColumnPtr getPrimitiveColumn(const ColumnPtr & column);
6976
static DataTypePtr getPrimitiveType(const DataTypePtr & data_type);
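
The `fastMod` member added above computes a remainder as `value - (value / divider) * modulus`, so the only division is the one by the precomputed `libdivide` divider. The sketch below shows the same identity with an ordinary integer division substituted for `libdivide`, to keep the example dependency-free. The struct name `FastModSketch` is hypothetical, and the sketch does not reproduce the performance benefit, only the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

/// Hypothetical stand-in for the relevant BloomFilter members.
/// `modulus` caches 8 * size (the number of bits in the filter), as in the diff.
struct FastModSketch
{
    size_t modulus;

    explicit FastModSketch(size_t size_bytes) : modulus(8 * size_bytes)
    {
        assert(size_bytes != 0); /// Mirrors chassert(size != 0) in the constructor.
    }

    /// Same formula as the added fastMod, with plain `/` in place of the
    /// libdivide divider (an assumption made for this sketch only).
    /// For unsigned integers, value - (value / m) * m equals value % m.
    size_t fastMod(size_t value) const { return value - (value / modulus) * modulus; }
};
```

For a 4-byte filter the modulus is 32, so `FastModSketch(4).fastMod(100)` yields 4, matching `100 % 32`.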
