azat
diff --git a/docs/en/engines/table-engines/mergetree-family/annindexes.md b/docs/en/engines/table-engines/mergetree-family/annindexes.md
Lines changed: 20 additions & 14 deletions
diff --git a/src/Core/Settings.cpp b/src/Core/Settings.cpp
Lines changed: 7 additions & 7 deletions
diff --git a/src/Core/Settings.h b/src/Core/Settings.h
Lines changed: 1 addition & 1 deletion
diff --git a/src/Core/SettingsChangesHistory.cpp b/src/Core/SettingsChangesHistory.cpp
Lines changed: 2 additions & 2 deletions
diff --git a/src/Core/SettingsEnums.cpp b/src/Core/SettingsEnums.cpp
Lines changed: 6 additions & 4 deletions
diff --git a/src/Core/SettingsEnums.h b/src/Core/SettingsEnums.h
Lines changed: 2 additions & 2 deletions
diff --git a/src/Processors/QueryPlan/Optimizations/Optimizations.h b/src/Processors/QueryPlan/Optimizations/Optimizations.h
Lines changed: 1 addition & 1 deletion
diff --git a/src/Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.cpp b/src/Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.cpp
Lines changed: 10 additions & 10 deletions
diff --git a/src/Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h b/src/Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h
Lines changed: 2 additions & 2 deletions
diff --git a/src/Processors/QueryPlan/Optimizations/optimizeTree.cpp b/src/Processors/QueryPlan/Optimizations/optimizeTree.cpp
Lines changed: 1 addition & 1 deletion
@@ -238,16 +238,14 @@ These two strategies determine the order in which the filters are evaluated:
 - With pre-filtering, the filter evaluation order is the other way round.
 
 Both strategies have different trade-offs:
-- Post-filtering has the general problem that it may return less than the number of rows requested in the `LIMIT <N>` clause. This happens when at least one of the result rows returned by the vector similarity index fails to satisfy the additional filters.
-- Pre-filtering is an unsolved problem. Some specialized vector databases implement it but most databases including ClickHouse will fall back to exact neighbor search, i.e., a brute-force scan without index.
+- Post-filtering has the general problem that it may return fewer rows than requested in the `LIMIT <N>` clause. This happens when at least one of the result rows returned by the vector similarity index fails to satisfy the additional filters.
+- Pre-filtering is generally an unsolved problem. Some specialized vector databases implement it, but most databases, including ClickHouse, fall back to exact neighbor search, i.e., a brute-force scan without an index.
 
 What strategy is used comes down to whether ClickHouse can use indexes for the additional filter conditions.
-
 If no index can be used, post-filtering will be applied.
 
 If the additional filter condition is part of the partition key, then ClickHouse will apply partition pruning.
-
-Example, assuming that the table is range-partitioned by `year`:
+For example, assuming that the table is range-partitioned by `year`:
 
 ```sql
 WITH [0., 2.] AS reference_vec
@@ -261,14 +259,17 @@ LIMIT 3;
 ClickHouse will ignore all partitions but the one for year 2025.
 Within this partition, a post-filtering strategy will be applied.
 
-If the additional filter condition is on the primary key and the filter selects some but not all ranges of a part, then Clickhouse will fall back to exact neighbour search i.e brute force scan without index, on the selected ranges of the part. If the primary key filter selects entire parts, Clickhouse will use the vector similarity index on those parts to retrieve results.
+If the additional filter condition is on the primary key columns and the filter selects some but not all ranges of a part, then ClickHouse will fall back to exact neighbour search (brute-force scan without index) on the selected ranges of the part.
+If the primary key filter selects entire parts, ClickHouse will use the vector similarity index on those parts to retrieve results.
 
-In case additional filter conditions on columns can make use of skip indexes (minmax, set etc), Clickhouse by default chooses a post-filtering strategy. Clickhouse gives higher priority to the vector similarity index because the vector index is expected to deliver business value by accelerating semantic search response times.
+If additional filter conditions on columns can make use of skip indexes (minmax, set, etc.), ClickHouse chooses a post-filtering strategy by default.
+ClickHouse gives higher priority to the vector similarity index because the vector index is expected to deliver business value by accelerating semantic search response times.
 
-Clickhouse provides 2 settings for finer control on post-filtering and pre-filtering -
+ClickHouse provides two settings for finer control over post-filtering and pre-filtering:
 
-- vector_search_filtering
-When the additional filter conditions are extremely selective, it is possible that brute force search on a small filtered set of rows gives better results then post-filtering using the vector search. Users can request explicit pre-filtering by setting ```vector_search_filtering``` to "prefilter" (default is "auto" which equates to "postfilter"). An example query where pre-filtering could be a good choice is -
+When the additional filter conditions are extremely selective, brute-force search on a small filtered set of rows may give better results than post-filtering using the vector index.
+Users can force pre-filtering by setting [vector_search_filter_strategy](../../../operations/settings/settings#vector_search_filter_strategy) to `prefilter` (the default is `auto`, which is equivalent to `postfilter`).
+An example query where pre-filtering could be a good choice:
 
 ```sql
 SELECT bookid, author, title
@@ -278,10 +279,11 @@ ORDER BY cosineDistance(book_vector, getEmbedding('Books on ancient Asian empire
 LIMIT 10
 ```
 
-Assuming books priced less that $2 are a tiny portion, post-filtering approach may return 0 rows because the top `LIMIT <N>` matches returned by the vector index could all be priced above $2. By opting for explicit pre-filtering, the subset of all books priced less than $2 are shortlisted and then brute-force vector search executed on the subset to return the closest matches.
+Assuming that only very few books cost less than $2, post-filtering may return zero rows because the top 10 matches returned by the vector index could all be priced above $2.
+By forcing pre-filtering (add `SETTINGS vector_search_filter_strategy = 'prefilter'` to the query), ClickHouse first finds all books with a price of less than $2 and then executes a brute-force vector search on the matches.
 
-- vector_search_postfilter_multiplier
-As explained above in the trade-offs, post-filtering could return lesser number of rows then specified in the `LIMIT <N>` clause. Consider this query -
+As mentioned above, post-filtering may return fewer matches than specified in the `LIMIT <N>` clause.
+Consider the following query:
 
 ```sql
 SELECT bookid, author, title
@@ -290,7 +292,11 @@ WHERE published_year <= 2000
 ORDER BY cosineDistance(book_vector, getEmbedding('Books on ancient Asian empires'))
 LIMIT 10
 ```
-One or more of the 10 nearest matching books returned by the vector index could be published after year 2000. Hence the query will end up returning less than 10 rows, contrary to user expectations. For such cases, the parameter ```vector_search_postfilter_multiplier``` can be set to a value like 2 or 10 to indicate that 20 or 100 nearest matching books should be returned by the vector index and then the additional filter to be applied on those rows to return the result of 10 rows.
+
+With post-filtering, some of the 10 nearest matching books returned by the vector index may be pruned from the result because they were published after the year 2000.
+As a result, the query may return fewer rows than the user requested.
+For such cases, you can set [vector_search_postfilter_multiplier](../../../operations/settings/settings#vector_search_postfilter_multiplier) to a value > 1.0 (for example, 2.0) so that the vector index returns N times this factor many matches (20 for `LIMIT 10` and a factor of 2.0); the additional filter is then applied to those rows to produce up to 10 result rows.
+Note that this method mitigates the problem with post-filtering, but in extreme cases (an extremely selective WHERE condition) the query may still return fewer than the N requested rows.
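To illustrate the two filter strategies described in this section, here is a minimal C++ sketch (not ClickHouse code). Each row carries a precomputed distance to the reference vector and a `price` column. The `Book` type and the helpers `topK`, `postfilter`, `prefilter` and `sampleBooks` are hypothetical. The exact sort in `topK` stands in for both the vector similarity index (which in reality is approximate) and the brute-force scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical row: precomputed distance to the reference vector plus one filter column.
struct Book
{
    int id;
    double distance; // stands in for cosineDistance(book_vector, reference_vec)
    double price;
};

using Predicate = std::function<bool(const Book &)>;

// The `k` rows with the smallest distance. Exact here; a real vector index is approximate.
std::vector<Book> topK(std::vector<Book> rows, std::size_t k)
{
    std::sort(rows.begin(), rows.end(), [](const Book & a, const Book & b) { return a.distance < b.distance; });
    if (rows.size() > k)
        rows.resize(k);
    return rows;
}

// Post-filtering: nearest-neighbour lookup first, then the WHERE condition.
std::vector<Book> postfilter(const std::vector<Book> & rows, std::size_t limit, const Predicate & where)
{
    std::vector<Book> result;
    for (const Book & b : topK(rows, limit))
        if (where(b))
            result.push_back(b);
    return result;
}

// Pre-filtering: WHERE condition first, then brute-force search over the surviving rows.
std::vector<Book> prefilter(const std::vector<Book> & rows, std::size_t limit, const Predicate & where)
{
    std::vector<Book> filtered;
    for (const Book & b : rows)
        if (where(b))
            filtered.push_back(b);
    return topK(filtered, limit);
}

// Sample data: only books 3 and 5 cost less than $2, and neither is among the two nearest.
std::vector<Book> sampleBooks()
{
    return {{1, 0.10, 15.0}, {2, 0.12, 9.5}, {3, 0.20, 1.5}, {4, 0.25, 22.0}, {5, 0.40, 1.0}, {6, 0.55, 30.0}};
}
```

With `sampleBooks()`, `LIMIT 2` and the condition `price < 2`, `postfilter` returns no rows because the two nearest books are both expensive, whereas `prefilter` returns books 3 and 5.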
 
 ### Performance Tuning {#performance-tuning}
 
 
@@ -6586,17 +6586,17 @@ SELECT queries with LIMIT bigger than this setting cannot use vector similarity
     DECLARE(UInt64, hnsw_candidate_list_size_for_search, 256, R"(
 The size of the dynamic candidate list when searching the vector similarity index, also known as 'ef_search'.
 )", EXPERIMENTAL) \
-    DECLARE(VectorSearchFilteringType, vector_search_filtering, VectorSearchFilteringType::AUTO, R"(
-If a vector search query has a WHERE clause, this parameter determines if the predicates are evaluated first (pre-filtering) OR if the vector similarity index is looked up first (post-filtering). Please check documentation for additional specifics.
+    DECLARE(VectorSearchFilterStrategy, vector_search_filter_strategy, VectorSearchFilterStrategy::AUTO, R"(
+If a vector search query has a WHERE clause, this setting determines if it is evaluated first (pre-filtering) OR if the vector similarity index is checked first (post-filtering).
 
 Possible values:
 
-AUTO - Currently maps to POSTFILTER semantics.
-PREFILTER - Evaluate other column predicates first and then perform brute-force search to identify neighbours.
-POSTFILTER - Use vector similarity index to identify neighbours and then apply other column predicates.
+'auto' - Postfiltering (the exact semantics may change in the future).
+'postfilter' - Use the vector similarity index to identify the nearest neighbours, then apply other filters.
+'prefilter' - Evaluate other filters first, then perform brute-force search to identify neighbours.
 )", EXPERIMENTAL) \
-    DECLARE(UInt64, vector_search_postfilter_multiplier, 1, R"(
-Determines the number of neighbours to fetch from the vector similarity index before performing post-filtering on other predicates. The number of neighbours fetched is (LIMIT n X ann_post_filter_multiplier).
+    DECLARE(Float, vector_search_postfilter_multiplier, 1.0, R"(
+Multiply the number of nearest neighbours fetched from the vector similarity index by this factor before performing post-filtering on other predicates.
 )", EXPERIMENTAL) \
     DECLARE(Bool, throw_on_unsupported_query_inside_transaction, true, R"(
 Throw exception if unsupported query is used inside transaction
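`vector_search_postfilter_multiplier` changes from `UInt64` to `Float` here. The sketch below shows what over-fetching does during post-filtering; it is an illustration, not ClickHouse's implementation. The helper names are hypothetical, and rounding `LIMIT × multiplier` up to the next integer is an assumption (the documentation only says "N times this factor").

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of neighbours requested from the index for a given LIMIT.
// Rounding up is an assumption made for this sketch.
std::size_t neighboursToFetch(std::size_t limit, double multiplier)
{
    return static_cast<std::size_t>(std::ceil(static_cast<double>(limit) * multiplier));
}

// Post-filtering with over-fetching. `ranked` holds row ids ordered by distance
// (stand-in for the index result); `passes[id]` says whether row `id` satisfies WHERE.
std::vector<int> postfilterWithMultiplier(
    const std::vector<int> & ranked, const std::vector<bool> & passes, std::size_t limit, double multiplier)
{
    const std::size_t fetch = std::min(ranked.size(), neighboursToFetch(limit, multiplier));
    std::vector<int> result;
    for (std::size_t i = 0; i < fetch && result.size() < limit; ++i)
        if (passes[ranked[i]])
            result.push_back(ranked[i]);
    return result;
}
```

With `ranked = {0, 1, 2, 3, 4, 5}` and only rows 0, 2, 4 and 5 passing the filter, `LIMIT 2` with a multiplier of 1.0 yields only row 0. A multiplier of 2.0 fetches four neighbours and yields rows 0 and 2. If none of the fetched rows passes the filter, the result is still short, as the documentation notes.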
 
@@ -104,7 +104,7 @@ class WriteBuffer;
     M(CLASS_NAME, UInt64) \
     M(CLASS_NAME, UInt64Auto) \
     M(CLASS_NAME, URI) \
-    M(CLASS_NAME, VectorSearchFilteringType)
+    M(CLASS_NAME, VectorSearchFilterStrategy)
 
 
 COMMON_SETTINGS_SUPPORTED_TYPES(Settings, DECLARE_SETTING_TRAIT)
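The `M(CLASS_NAME, ...)` list above follows the X-macro pattern. The list of supported setting types is written once and expanded with different per-entry macros, such as `DECLARE_SETTING_TRAIT`, so registering `VectorSearchFilterStrategy` takes a single line. The following is a generic, self-contained illustration of the pattern. `SUPPORTED_TYPES` and `TYPE_NAME_AS_STRING` are hypothetical names, not ClickHouse's macros.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The list is written once; each entry is expanded by whatever macro is passed as M.
#define SUPPORTED_TYPES(CLASS_NAME, M) \
    M(CLASS_NAME, UInt64) \
    M(CLASS_NAME, Float) \
    M(CLASS_NAME, VectorSearchFilterStrategy)

// One possible per-entry expansion: turn each (class, type) pair into a string literal.
#define TYPE_NAME_AS_STRING(CLASS_NAME, TYPE) #CLASS_NAME "::" #TYPE,

std::vector<std::string> supportedTypeNames()
{
    return {SUPPORTED_TYPES(Settings, TYPE_NAME_AS_STRING)};
}
```

Adding a fourth `M(...)` line to `SUPPORTED_TYPES` would extend every expansion of the list at once. This is why the rename only touches one line in Settings.h.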
 
@@ -98,8 +98,8 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory()
             {"allow_experimental_lightweight_update", false, false, "A new setting"},
             {"allow_experimental_delta_kernel_rs", true, true, "New setting"},
             {"allow_experimental_database_hms_catalog", false, false, "Allow experimental database engine DataLakeCatalog with catalog_type = 'hive'"},
-            {"vector_search_filtering", "auto", "auto", "Vector search related "},
-            {"vector_search_postfilter_multiplier", 1, 1, "Vector search related "},
+            {"vector_search_filter_strategy", "auto", "auto", "New setting"},
+            {"vector_search_postfilter_multiplier", 1, 1, "New setting"},
             {"compile_expressions", false, true, "We believe that the LLVM infrastructure behind the JIT compiler is stable enough to enable this setting by default."},
             {"use_legacy_to_time", false, false, "New setting. Allows for user to use the old function logic for toTime, which works as toTimeWithFixedDate."},
         });
 
@@ -305,9 +305,11 @@ IMPLEMENT_SETTING_ENUM(
      {"glue", DatabaseDataLakeCatalogType::GLUE},
      {"hive", DatabaseDataLakeCatalogType::ICEBERG_HIVE}})
 
-IMPLEMENT_SETTING_ENUM(VectorSearchFilteringType, ErrorCodes::BAD_ARGUMENTS,
-    {{"auto", VectorSearchFilteringType::AUTO},
-     {"prefilter", VectorSearchFilteringType::PREFILTER},
-     {"postfilter", VectorSearchFilteringType::POSTFILTER}})
+IMPLEMENT_SETTING_ENUM(
+    VectorSearchFilterStrategy,
+    ErrorCodes::BAD_ARGUMENTS,
+    {{"auto", VectorSearchFilterStrategy::AUTO},
+     {"prefilter", VectorSearchFilterStrategy::PREFILTER},
+     {"postfilter", VectorSearchFilterStrategy::POSTFILTER}})
 
 }
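The renamed enum maps the strings `auto`, `prefilter` and `postfilter` to values and rejects anything else with `BAD_ARGUMENTS`. The sketch below is a hand-written equivalent of that mapping, not the code `IMPLEMENT_SETTING_ENUM` generates. `parseFilterStrategy` and `resolveAuto` are hypothetical helpers, and `std::invalid_argument` stands in for ClickHouse's `BAD_ARGUMENTS` exception. Resolving `auto` to `POSTFILTER` follows the setting description, which says 'auto' currently means post-filtering.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

enum class VectorSearchFilterStrategy : std::uint8_t
{
    AUTO,
    PREFILTER,
    POSTFILTER,
};

// Hand-written string -> enum mapping; unknown names are rejected.
VectorSearchFilterStrategy parseFilterStrategy(std::string_view s)
{
    if (s == "auto")
        return VectorSearchFilterStrategy::AUTO;
    if (s == "prefilter")
        return VectorSearchFilterStrategy::PREFILTER;
    if (s == "postfilter")
        return VectorSearchFilterStrategy::POSTFILTER;
    throw std::invalid_argument("Unknown vector_search_filter_strategy: " + std::string(s));
}

// Per the setting description, 'auto' currently behaves like 'postfilter'.
VectorSearchFilterStrategy resolveAuto(VectorSearchFilterStrategy s)
{
    return s == VectorSearchFilterStrategy::AUTO ? VectorSearchFilterStrategy::POSTFILTER : s;
}
```

Keeping `AUTO` as a distinct value, rather than aliasing it to `POSTFILTER`, leaves room for the "exact semantics may change in the future" note in the setting description.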
@@ -396,13 +396,13 @@ enum class DatabaseDataLakeCatalogType : uint8_t
 
 DECLARE_SETTING_ENUM(DatabaseDataLakeCatalogType)
 
-enum class VectorSearchFilteringType : uint8_t
+enum class VectorSearchFilterStrategy : uint8_t
 {
     AUTO,
     PREFILTER,
     POSTFILTER,
 };
 
-DECLARE_SETTING_ENUM(VectorSearchFilteringType)
+DECLARE_SETTING_ENUM(VectorSearchFilterStrategy)
 
 }
@@ -32,9 +32,9 @@ struct Optimization
     struct ExtraSettings
     {
         size_t max_limit_for_vector_search_queries;
+        VectorSearchFilterStrategy vector_search_filter_strategy;
         size_t use_index_for_in_with_subqueries_max_values;
         SizeLimits network_transfer_limits;
-        VectorSearchFilteringType vector_search_filtering;
     };
 
     using Function = size_t (*)(QueryPlan::Node *, QueryPlan::Nodes &, const ExtraSettings &);
 
@@ -37,22 +37,22 @@ namespace Setting
     extern const SettingsBool query_plan_convert_join_to_in;
     extern const SettingsBool use_query_condition_cache;
     extern const SettingsBool query_condition_cache_store_conditions_as_plaintext;
+    extern const SettingsBool collect_hash_table_stats_during_joins;
+    extern const SettingsBool query_plan_join_shard_by_pk_ranges;
+    extern const SettingsBool query_plan_optimize_lazy_materialization;
     extern const SettingsBoolAuto query_plan_join_swap_table;
     extern const SettingsMaxThreads max_threads;
+    extern const SettingsOverflowMode transfer_overflow_mode;
     extern const SettingsSeconds lock_acquire_timeout;
     extern const SettingsString force_optimize_projection_name;
-    extern const SettingsUInt64 max_limit_for_vector_search_queries;
-    extern const SettingsUInt64 query_plan_max_optimizations_to_apply;
-    extern const SettingsBool query_plan_optimize_lazy_materialization;
-    extern const SettingsUInt64 query_plan_max_limit_for_lazy_materialization;
-    extern const SettingsBool query_plan_join_shard_by_pk_ranges;
     extern const SettingsUInt64 max_bytes_to_transfer;
+    extern const SettingsUInt64 max_limit_for_vector_search_queries;
     extern const SettingsUInt64 max_rows_to_transfer;
-    extern const SettingsOverflowMode transfer_overflow_mode;
-    extern const SettingsUInt64 use_index_for_in_with_subqueries_max_values;
     extern const SettingsUInt64 max_size_to_preallocate_for_joins;
-    extern const SettingsBool collect_hash_table_stats_during_joins;
-    extern const SettingsVectorSearchFilteringType vector_search_filtering;
+    extern const SettingsUInt64 query_plan_max_limit_for_lazy_materialization;
+    extern const SettingsUInt64 query_plan_max_optimizations_to_apply;
+    extern const SettingsUInt64 use_index_for_in_with_subqueries_max_values;
+    extern const SettingsVectorSearchFilterStrategy vector_search_filter_strategy;
 }
 
 namespace ServerSetting
@@ -107,7 +107,7 @@ QueryPlanOptimizationSettings::QueryPlanOptimizationSettings(
     optimize_lazy_materialization = from[Setting::query_plan_optimize_lazy_materialization];
     max_limit_for_lazy_materialization = from[Setting::query_plan_max_limit_for_lazy_materialization];
 
-    vector_search_filtering = from[Setting::vector_search_filtering].value;
+    vector_search_filter_strategy = from[Setting::vector_search_filter_strategy].value;
     max_limit_for_vector_search_queries = from[Setting::max_limit_for_vector_search_queries].value;
 
     query_plan_join_shard_by_pk_ranges = from[Setting::query_plan_join_shard_by_pk_ranges].value;
 
@@ -1,9 +1,9 @@
 #pragma once
 
+#include <Core/SettingsEnums.h>
 #include <Interpreters/Context_fwd.h>
 #include <Interpreters/ExpressionActionsSettings.h>
 #include <QueryPipeline/SizeLimits.h>
-#include <Core/Settings.h>
 
 #include <cstddef>
 
@@ -88,7 +88,7 @@ struct QueryPlanOptimizationSettings
     bool optimize_lazy_materialization = false;
     size_t max_limit_for_lazy_materialization = 0;
 
-    VectorSearchFilteringType vector_search_filtering;
+    VectorSearchFilterStrategy vector_search_filter_strategy;
     size_t max_limit_for_vector_search_queries;
 
     /// Setting needed for Sets (JOIN -> IN optimization)
 
@@ -48,9 +48,9 @@ void optimizeTreeFirstPass(const QueryPlanOptimizationSettings & optimization_se
 
     Optimization::ExtraSettings extra_settings = {
         optimization_settings.max_limit_for_vector_search_queries,
+        optimization_settings.vector_search_filter_strategy,
         optimization_settings.use_index_for_in_with_subqueries_max_values,
         optimization_settings.network_transfer_limits,
-        optimization_settings.vector_search_filtering,
     };
 
     while (!stack.empty())
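`extra_settings` is built with positional aggregate initialization, so the PR moves `vector_search_filter_strategy` to the same position in both the `ExtraSettings` declaration and the brace initializer in `optimizeTreeFirstPass`. Positional initialization binds values strictly by declaration order. Swapping the two `size_t` members, for example, would still compile and silently mix up the values. If the codebase builds as C++20 or later, designated initializers name each member instead, and the standard requires them to appear in declaration order. The struct below is a simplified stand-in that omits `network_transfer_limits`, and `makeExtraSettings` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class VectorSearchFilterStrategy : std::uint8_t
{
    AUTO,
    PREFILTER,
    POSTFILTER,
};

// Simplified stand-in for Optimization::ExtraSettings (network_transfer_limits omitted).
struct ExtraSettings
{
    std::size_t max_limit_for_vector_search_queries;
    VectorSearchFilterStrategy vector_search_filter_strategy;
    std::size_t use_index_for_in_with_subqueries_max_values;
};

// C++20 designated initializers: each value is bound to a member by name.
ExtraSettings makeExtraSettings(std::size_t max_limit, VectorSearchFilterStrategy strategy, std::size_t max_in_values)
{
    return ExtraSettings{
        .max_limit_for_vector_search_queries = max_limit,
        .vector_search_filter_strategy = strategy,
        .use_index_for_in_with_subqueries_max_values = max_in_values,
    };
}
```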