Merge pull request #4135 from Blargian/final_language

Blargian · web-flow · commit c7eccb428deb · 2025-07-22T22:09:24.000+02:00
Review of language around `FINAL`
diff --git a/docs/best-practices/_snippets/_avoid_optimize_final.md b/docs/best-practices/_snippets/_avoid_optimize_final.md
@@ -16,7 +16,15 @@ While it's tempting to manually trigger this merge using:
 OPTIMIZE TABLE <table> FINAL;
 ```
 
-**you should avoid this operation in most cases** as it initiates resource intensive operations which may impact cluster performance.
+**you should avoid the `OPTIMIZE FINAL` operation in most cases** as it initiates
+resource-intensive operations which may impact cluster performance.
+
+:::note OPTIMIZE FINAL vs FINAL
+`OPTIMIZE FINAL` is not the same as the `FINAL` query modifier, which is sometimes
+necessary to get results without duplicates, for example with `ReplacingMergeTree`.
+Generally, `FINAL` is okay to use if your queries filter on the same columns as
+those in your primary key.
+:::
 
 ## Why avoid?  {#why-avoid}
 
diff --git a/docs/best-practices/avoid_optimize_final.md b/docs/best-practices/avoid_optimize_final.md
@@ -5,8 +5,11 @@ sidebar_label: 'Avoid Optimize Final'
 title: 'Avoid OPTIMIZE FINAL'
 description: 'Page describing why you should avoid the OPTIMIZE FINAL clause in ClickHouse'
 keywords: ['avoid OPTIMIZE FINAL', 'background merges']
+hide_title: true
 ---
 
+# Avoid `OPTIMIZE FINAL`
+
 import Content from '@site/docs/best-practices/_snippets/_avoid_optimize_final.md';
 
 <Content />
diff --git a/docs/cloud/bestpractices/index.md b/docs/cloud/bestpractices/index.md
@@ -27,5 +27,5 @@ These are in addition to the standard best practices which apply to all deployme
 | [Selecting an Insert Strategy](/best-practices/selecting-an-insert-strategy) | Strategies for efficient data insertion in ClickHouse.             |
 | [Data Skipping Indices](/best-practices/use-data-skipping-indices-where-appropriate) | When to apply data skipping indices for performance gains.    |
 | [Avoid Mutations](/best-practices/avoid-mutations)                   | Reasons to avoid mutations and how to design without them.               |
-| [Avoid OPTIMIZE FINAL](/best-practices/avoid-optimize-final)         | Why `OPTIMIZE FINAL` can be costly and how to work around it.           |
+| [Avoid `OPTIMIZE FINAL`](/best-practices/avoid-optimize-final)         | Why `OPTIMIZE FINAL` can be costly and how to work around it.           |
 | [Use JSON where appropriate](/best-practices/use-json-where-appropriate) | Considerations for using JSON columns in ClickHouse.               |
diff --git a/docs/guides/best-practices/index.md b/docs/guides/best-practices/index.md
@@ -11,8 +11,8 @@ This section contains tips and best practices for improving performance with Cli
 We recommend users read [Core Concepts](/parts) as a precursor to this section, 
 which covers the main concepts required to improve performance.
 
-| Topic                                                                                 | Description                                                                                                                                                                                                                                                                                                                                                             |
-|---------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Topic                                                                           | Description                                                                                                                                                                                                                                                                                                                                                             |
+|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Query Optimization Guide](/optimize/query-optimization)                        | A good place to start for query optimization, this simple guide describes common scenarios of how to use different performance and optimization techniques to improve query performance.                                                                                                                                                                                |
 | [Primary Indexes Advanced Guide](/guides/best-practices/sparse-primary-indexes) | A deep dive into ClickHouse indexing including how it differs from other DB systems, how ClickHouse builds and uses a table's spare primary index and what some of the best practices are for indexing in ClickHouse.                                                                                                                                                   |
 | [Query Parallelism](/optimize/query-parallelism)                                | Explains how ClickHouse parallelizes query execution using processing lanes and the max_threads setting. Covers how data is distributed across lanes, how max_threads is applied, when it isn't fully used, and how to inspect execution with tools like EXPLAIN and trace logs.                                                                                        |
@@ -23,7 +23,7 @@ which covers the main concepts required to improve performance.
 | [Asynchronous Inserts](/optimize/asynchronous-inserts)                          | Focuses on ClickHouse's asynchronous inserts feature. It likely explains how asynchronous inserts work (batching data on the server for efficient insertion) and their benefits (improved performance by offloading insert processing). It might also cover enabling asynchronous inserts and considerations for using them effectively in your ClickHouse environment. |
 | [Avoid Mutations](/optimize/avoid-mutations)                                    | Discusses the importance of avoiding mutations (updates and deletes) in ClickHouse. It recommends using append-only inserts for optimal performance and suggests alternative approaches for handling data changes.                                                                                                                                                      |
 | [Avoid nullable columns](/optimize/avoid-nullable-columns)                      | Discusses why you may want to avoid nullable columns to save space and increase performance. Demonstrates how to set a default value for a column.                                                                                                                                                                                                                      |
-| [Avoid Optimize Final](/optimize/avoidoptimizefinal)                            | Explains how the `OPTIMIZE TABLE ... FINAL` query is resource-intensive and suggests alternative approaches to optimize ClickHouse performance.                                                                                                                                                                                                                         |
+| [Avoid `OPTIMIZE FINAL`](/optimize/avoidoptimizefinal)                       | Explains how the `OPTIMIZE TABLE ... FINAL` query is resource-intensive and suggests alternative approaches to optimize ClickHouse performance.                                                                                                                                                                                                                         |
 | [Analyzer](/operations/analyzer)                                                | Looks at the ClickHouse Analyzer, a tool for analyzing and optimizing queries. Discusses how the Analyzer works, its benefits (e.g., identifying performance bottlenecks), and how to use it to improve your ClickHouse queries' efficiency.                                                                                                                            |
 | [Query Profiling](/operations/optimizing-performance/sampling-query-profiler)   | Explains ClickHouse's Sampling Query Profiler, a tool that helps analyze query execution.                                                                                                                                                                                                                                                                               |
 | [Query Cache](/operations/query-cache)                                          | Details ClickHouse's Query Cache, a feature that aims to improve performance by caching the results of frequently executed `SELECT` queries.                                                                                                                                                                                                                            |
diff --git a/docs/guides/developer/deduplication.md b/docs/guides/developer/deduplication.md
@@ -105,7 +105,9 @@ FINAL
 The result only has 2 rows, and the last row inserted is the row that gets returned.
 
 :::note
-Using `FINAL` works OK if you have a small amount of data. If you are dealing with a large amount of data, using `FINAL` is probably not the best option. Let's discuss a better option for finding the latest value of a column...
+Using `FINAL` works okay if you have a small amount of data. If you are dealing with a large amount of data, 
+using `FINAL` is probably not the best option. Let's discuss a better option for 
+finding the latest value of a column.
 :::
 
 ### Avoiding FINAL {#avoiding-final}
diff --git a/docs/guides/developer/replacing-merge-tree.md b/docs/guides/developer/replacing-merge-tree.md
@@ -220,7 +220,11 @@ Peak memory usage: 8.14 MiB.
 
 ## FINAL performance {#final-performance}
 
-The `FINAL` operator will have a performance overhead on queries despite ongoing improvements. This will be most appreciable when queries are not filtering on primary key columns, causing more data to be read and increasing the deduplication overhead. If users filter on key columns using a `WHERE` condition, the data loaded and passed for deduplication will be reduced.
+The `FINAL` operator does have a small performance overhead on queries.
+This will be most noticeable when queries are not filtering on primary key columns,
+causing more data to be read and increasing the deduplication overhead. If users
+filter on key columns using a `WHERE` condition, the data loaded and passed for
+deduplication will be reduced.
 
 If the `WHERE` condition does not use a key column, ClickHouse does not currently utilize the `PREWHERE` optimization when using `FINAL`. This optimization aims to reduce the rows read for non-filtered columns. Examples of emulating this `PREWHERE` and thus potentially improving performance can be found [here](https://clickhouse.com/blog/clickhouse-postgresql-change-data-capture-cdc-part-1#final-performance).
 
diff --git a/knowledgebase/optimize_final_vs_final.mdx b/knowledgebase/optimize_final_vs_final.mdx
@@ -0,0 +1,54 @@
+---
+date: 2025-07-20
+title: What is the difference between OPTIMIZE FINAL and FINAL?
+tags: ['Core Data Concepts']
+keywords: ['OPTIMIZE FINAL', 'FINAL']
+description: 'Discusses the differences between OPTIMIZE FINAL and FINAL, and when to use and avoid them.'
+---
+
+{frontMatter.description}
+{/* truncate */}
+
+# What is the difference between `OPTIMIZE FINAL` and `FINAL`?
+
+`OPTIMIZE FINAL` is a DDL command that physically and permanently reorganizes
+and optimizes data on disk. It physically merges data parts in `MergeTree` tables,
+performing data deduplication in the process by removing duplicate rows from storage.
+
+`FINAL` is a **query-time** modifier that provides deduplicated results without
+changing the structure of the stored data. It works by performing merge logic at
+read-time. It is temporary, only affecting the current query result.
+
+Users are often advised to avoid `OPTIMIZE FINAL` because it has a significant
+performance overhead. However, the two should not be confused. `FINAL` is often
+necessary to get results without duplicates, especially with table engines like
+`ReplacingMergeTree`, where duplicate rows may remain in storage until the
+eventual background merge process replaces them.
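+
+As a rough illustration of the difference, consider the sketch below. The
+`user_profiles` table and its columns are hypothetical and exist only for this
+example. Whether the plain `SELECT` returns one row or two depends on whether a
+background merge has already happened.
+
+```sql
+-- Hypothetical table, used only for illustration.
+CREATE TABLE user_profiles
+(
+    user_id UInt64,
+    email String,
+    updated_at DateTime
+)
+ENGINE = ReplacingMergeTree(updated_at)
+ORDER BY user_id;
+
+-- Two separate inserts create two parts, so both versions of user 1
+-- can coexist in storage until a merge replaces the older one.
+INSERT INTO user_profiles VALUES (1, 'old@example.com', '2025-01-01 00:00:00');
+INSERT INTO user_profiles VALUES (1, 'new@example.com', '2025-02-01 00:00:00');
+
+-- Without FINAL: may return both rows if the parts are not yet merged.
+SELECT * FROM user_profiles WHERE user_id = 1;
+
+-- FINAL: deduplicates at read time for this query only.
+-- The stored parts are left unchanged.
+SELECT * FROM user_profiles FINAL WHERE user_id = 1;
+
+-- OPTIMIZE FINAL: forces a merge and rewrites data on disk.
+-- This can be expensive on large tables.
+OPTIMIZE TABLE user_profiles FINAL;
+```
+
+The `FINAL` query here filters on `user_id`, the sorting key column, which
+limits how much data has to be read and deduplicated.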
+
+The table below summarizes the key differences:
+
+| Aspect             | `OPTIMIZE FINAL`                    | `FINAL`                                            |
+|--------------------|-------------------------------------|----------------------------------------------------|
+| Type               | DDL Command                         | Query Modifier                                     |
+| Effect             | Permanent storage optimization      | Temporary query-time deduplication                 |
+| Performance Impact | High cost once, then faster queries | Lower individual cost, but repeated for each query |
+| Data Modification  | Yes - physically changes storage    | No - read-only operation                           |
+| Use Case           | Periodic maintenance/optimization   | Real-time deduplicated queries                     |
+
+## When to use each {#when-to-use-each}
+
+Use `OPTIMIZE FINAL` when:
+
+- You want to permanently improve query performance
+- You can afford the one-time optimization cost
+- You're doing periodic table maintenance
+- You want to physically clean up duplicate data
+
+Use `FINAL` when:
+
+- You need deduplicated results immediately
+- You can't wait for or don't want permanent optimization
+- You only occasionally need deduplicated data
+- You're working with frequently changing data
+
+Both are valuable tools, but they serve different purposes in ClickHouse's deduplication strategy.