Commit c7eccb4

Merge pull request #4135 from Blargian/final_language
Review of language around `FINAL`
2 parents c2c0100 + 1b1ab57 commit c7eccb4

7 files changed

+78
-7
lines changed

docs/best-practices/_snippets/_avoid_optimize_final.md

Lines changed: 9 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -16,7 +16,15 @@ While it's tempting to manually trigger this merge using:
1616
OPTIMIZE TABLE <table> FINAL;
1717
```
1818

19-
**you should avoid this operation in most cases** as it initiates resource intensive operations which may impact cluster performance.
19+
**you should avoid the `OPTIMIZE FINAL` operation in most cases** as it initiates
20+
resource intensive operations which may impact cluster performance.
21+
22+
:::note OPTIMIZE FINAL vs FINAL
23+
`OPTIMIZE FINAL` is not the same as `FINAL`, which is sometimes necessary to use
24+
to get results without duplicates, such as with the `ReplacingMergeTree`. Generally,
25+
`FINAL` is okay to use if your queries are filtering on the same columns as those
26+
in your primary key.
27+
:::
2028

2129
## Why avoid? {#why-avoid}
2230

docs/best-practices/avoid_optimize_final.md

Lines changed: 3 additions & 0 deletions
@@ -5,8 +5,11 @@ sidebar_label: 'Avoid Optimize Final'
55
title: 'Avoid OPTIMIZE FINAL'
66
description: 'Page describing why you should avoid the OPTIMIZE FINAL clause in ClickHouse'
77
keywords: ['avoid OPTIMIZE FINAL', 'background merges']
8+
hide_title: true
89
---
910

11+
# Avoid `OPTIMIZE FINAL`
12+
1013
import Content from '@site/docs/best-practices/_snippets/_avoid_optimize_final.md';
1114

1215
<Content />

docs/cloud/bestpractices/index.md

Lines changed: 1 addition & 1 deletion
@@ -27,5 +27,5 @@ These are in addition to the standard best practices which apply to all deployme
2727
| [Selecting an Insert Strategy](/best-practices/selecting-an-insert-strategy) | Strategies for efficient data insertion in ClickHouse. |
2828
| [Data Skipping Indices](/best-practices/use-data-skipping-indices-where-appropriate) | When to apply data skipping indices for performance gains. |
2929
| [Avoid Mutations](/best-practices/avoid-mutations) | Reasons to avoid mutations and how to design without them. |
30-
| [Avoid OPTIMIZE FINAL](/best-practices/avoid-optimize-final) | Why `OPTIMIZE FINAL` can be costly and how to work around it. |
30+
| [Avoid `OPTIMIZE FINAL`](/best-practices/avoid-optimize-final) | Why `OPTIMIZE FINAL` can be costly and how to work around it. |
3131
| [Use JSON where appropriate](/best-practices/use-json-where-appropriate) | Considerations for using JSON columns in ClickHouse. |

docs/guides/best-practices/index.md

Lines changed: 3 additions & 3 deletions
@@ -11,8 +11,8 @@ This section contains tips and best practices for improving performance with Cli
1111
We recommend users read [Core Concepts](/parts) as a precursor to this section,
1212
which covers the main concepts required to improve performance.
1313

14-
| Topic | Description |
15-
|---------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
14+
| Topic | Description |
15+
|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1616
| [Query Optimization Guide](/optimize/query-optimization) | A good place to start for query optimization, this simple guide describes common scenarios of how to use different performance and optimization techniques to improve query performance. |
1717
| [Primary Indexes Advanced Guide](/guides/best-practices/sparse-primary-indexes) | A deep dive into ClickHouse indexing including how it differs from other DB systems, how ClickHouse builds and uses a table's sparse primary index and what some of the best practices are for indexing in ClickHouse. |
1818
| [Query Parallelism](/optimize/query-parallelism) | Explains how ClickHouse parallelizes query execution using processing lanes and the max_threads setting. Covers how data is distributed across lanes, how max_threads is applied, when it isn't fully used, and how to inspect execution with tools like EXPLAIN and trace logs. |
@@ -23,7 +23,7 @@ which covers the main concepts required to improve performance.
2323
| [Asynchronous Inserts](/optimize/asynchronous-inserts) | Focuses on ClickHouse's asynchronous inserts feature. It likely explains how asynchronous inserts work (batching data on the server for efficient insertion) and their benefits (improved performance by offloading insert processing). It might also cover enabling asynchronous inserts and considerations for using them effectively in your ClickHouse environment. |
2424
| [Avoid Mutations](/optimize/avoid-mutations) | Discusses the importance of avoiding mutations (updates and deletes) in ClickHouse. It recommends using append-only inserts for optimal performance and suggests alternative approaches for handling data changes. |
2525
| [Avoid nullable columns](/optimize/avoid-nullable-columns) | Discusses why you may want to avoid nullable columns to save space and increase performance. Demonstrates how to set a default value for a column. |
26-
| [Avoid Optimize Final](/optimize/avoidoptimizefinal) | Explains how the `OPTIMIZE TABLE ... FINAL` query is resource-intensive and suggests alternative approaches to optimize ClickHouse performance. |
26+
| [Avoid `OPTIMIZE FINAL`](/optimize/avoidoptimizefinal) | Explains how the `OPTIMIZE TABLE ... FINAL` query is resource-intensive and suggests alternative approaches to optimize ClickHouse performance. |
2727
| [Analyzer](/operations/analyzer) | Looks at the ClickHouse Analyzer, a tool for analyzing and optimizing queries. Discusses how the Analyzer works, its benefits (e.g., identifying performance bottlenecks), and how to use it to improve your ClickHouse queries' efficiency. |
2828
| [Query Profiling](/operations/optimizing-performance/sampling-query-profiler) | Explains ClickHouse's Sampling Query Profiler, a tool that helps analyze query execution. |
2929
| [Query Cache](/operations/query-cache) | Details ClickHouse's Query Cache, a feature that aims to improve performance by caching the results of frequently executed `SELECT` queries. |

docs/guides/developer/deduplication.md

Lines changed: 3 additions & 1 deletion
@@ -105,7 +105,9 @@ FINAL
105105
The result only has 2 rows, and the last row inserted is the row that gets returned.
106106

107107
:::note
108-
Using `FINAL` works OK if you have a small amount of data. If you are dealing with a large amount of data, using `FINAL` is probably not the best option. Let's discuss a better option for finding the latest value of a column...
108+
Using `FINAL` works okay if you have a small amount of data. If you are dealing with a large amount of data,
109+
using `FINAL` is probably not the best option. Let's discuss a better option for
110+
finding the latest value of a column.
109111
:::
110112

111113
### Avoiding FINAL {#avoiding-final}

docs/guides/developer/replacing-merge-tree.md

Lines changed: 5 additions & 1 deletion
@@ -220,7 +220,11 @@ Peak memory usage: 8.14 MiB.
220220

221221
## FINAL performance {#final-performance}
222222

223-
The `FINAL` operator will have a performance overhead on queries despite ongoing improvements. This will be most appreciable when queries are not filtering on primary key columns, causing more data to be read and increasing the deduplication overhead. If users filter on key columns using a `WHERE` condition, the data loaded and passed for deduplication will be reduced.
223+
The `FINAL` operator does have a small performance overhead on queries.
224+
This will be most noticeable when queries are not filtering on primary key columns,
225+
causing more data to be read and increasing the deduplication overhead. If users
226+
filter on key columns using a `WHERE` condition, the data loaded and passed for
227+
deduplication will be reduced.
224228

225229
If the `WHERE` condition does not use a key column, ClickHouse does not currently utilize the `PREWHERE` optimization when using `FINAL`. This optimization aims to reduce the rows read for non-filtered columns. Examples of emulating this `PREWHERE` and thus potentially improving performance can be found [here](https://clickhouse.com/blog/clickhouse-postgresql-change-data-capture-cdc-part-1#final-performance).
226230

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
---
2+
date: 2025-07-20
3+
title: What is the difference between OPTIMIZE FINAL and FINAL?
4+
tags: ['Core Data Concepts']
5+
keywords: ['OPTIMIZE FINAL', 'FINAL']
6+
description: 'Discusses the differences between OPTIMIZE FINAL and FINAL, and when to use and avoid them.'
7+
---
8+
9+
{frontMatter.description}
10+
{/* truncate */}
11+
12+
# What is the difference between `OPTIMIZE FINAL` and `FINAL`?
13+
14+
`OPTIMIZE FINAL` is a DDL command that physically and permanently reorganizes
15+
and optimizes data on disk. It physically merges data parts in `MergeTree` tables,
16+
performing data deduplication in the process by removing duplicate rows from storage.
17+
18+
`FINAL` is a **query-time** modifier that provides deduplicated results without
19+
changing the structure of the stored data. It works by performing merge logic at
20+
read-time. It is temporary, only affecting the current query result.
21+
22+
Users are often advised to avoid using `OPTIMIZE FINAL`, as it has a significant
23+
performance overhead; however, they should not confuse the two. It is often necessary
24+
to use `FINAL` to get back results without duplicates, especially when using table
25+
engines like `ReplacingMergeTree`, which may contain duplicate rows that have not
26+
been replaced during the eventual, background merge process.
27+
28+
The table below summarises the key differences:
29+
30+
|Aspect |`OPTIMIZE FINAL` | `FINAL` |
31+
|------------------|--------------------------------------------|----------------------------------------------------|
32+
|Type | DDL Command | Query Modifier |
33+
|Effect | Permanent storage optimization | Temporary query-time deduplication |
34+
|Performance | Impact High cost once, then faster queries | Lower individual cost, but repeated for each query |
35+
|Data Modification | Yes - physically changes storage | No - read-only operation |
36+
|Use Case | Periodic maintenance/optimization | Real-time deduplicated queries |
37+
38+
## When to use each {#when-to-use-each}
39+
40+
Use `OPTIMIZE FINAL` when:
41+
42+
- You want to permanently improve query performance
43+
- You can afford the one-time optimization cost
44+
- You're doing periodic table maintenance
45+
- You want to physically clean up duplicate data
46+
47+
Use `FINAL` when:
48+
49+
- You need deduplicated results immediately
50+
- You can't wait for or don't want permanent optimization
51+
- You only occasionally need deduplicated data
52+
- You're working with frequently changing data
53+
54+
Both are valuable tools, but they serve different purposes in ClickHouse's deduplication strategy.
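
The distinction drawn in the new page can be illustrated with a minimal sketch. The table `user_events` and its columns are hypothetical and are not part of this commit; the statements assume a `ReplacingMergeTree` table in which two inserts share the same sorting key.

```sql
-- Hypothetical table, for illustration only (not part of the commit).
CREATE TABLE user_events
(
    user_id    UInt64,
    status     String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Two separate inserts create two parts holding rows with the same key.
INSERT INTO user_events VALUES (1, 'pending', '2025-01-01 00:00:00');
INSERT INTO user_events VALUES (1, 'active',  '2025-01-02 00:00:00');

-- Without FINAL: both rows may be returned until a background merge runs.
SELECT * FROM user_events WHERE user_id = 1;

-- FINAL: query-time modifier; merge logic is applied while reading,
-- and the stored parts are left unchanged.
SELECT * FROM user_events FINAL WHERE user_id = 1;

-- OPTIMIZE ... FINAL: forces a physical merge of the parts on disk.
-- This is the resource-intensive operation the best-practices page advises against.
OPTIMIZE TABLE user_events FINAL;
```

Filtering on `user_id`, the sorting key in this sketch, matches the guidance in the note above that `FINAL` is generally acceptable when queries filter on primary key columns.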
