-
Notifications
You must be signed in to change notification settings - Fork 88
feat(kv-ir): Add support for duplicate columns in projections. #1245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
75ac68d
082296d
c24e269
deaa66c
873f122
99d3083
1343e49
a91aaf1
d0e60cc
cfc431e
f998bbb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,6 +1,7 @@ | ||||||||||||||
| #ifndef CLP_FFI_IR_STREAM_SEARCH_QUERYHANDLERIMPL_HPP | ||||||||||||||
| #define CLP_FFI_IR_STREAM_SEARCH_QUERYHANDLERIMPL_HPP | ||||||||||||||
|
|
||||||||||||||
| #include <cstddef> | ||||||||||||||
| #include <memory> | ||||||||||||||
| #include <optional> | ||||||||||||||
| #include <string> | ||||||||||||||
|
|
@@ -117,7 +118,8 @@ class QueryHandlerImpl { | |||||||||||||
| clp_s::search::ast::DescriptorList::iterator m_next_token_it; | ||||||||||||||
| }; | ||||||||||||||
|
|
||||||||||||||
| using ProjectionMap = std::unordered_map<clp_s::search::ast::ColumnDescriptor*, std::string>; | ||||||||||||||
| using ProjectionMap = std:: | ||||||||||||||
| unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>; | ||||||||||||||
|
Comment on lines
+121
to
+122
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A high-level question: the current implementation will create different column descriptors for the same key. What's the reason for not using the same descriptor to track all project columns? I can see the major concern would be that they might be projecting different types of node values. Are there any other concerns?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The major one is projections with the same key name but different types, as you say. Generally though it's also unclear to me that sharing column descriptors for keys with the same name would give us any measurable benefits, so I prefer the simpler solution of just not sharing them.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, make sense.
Comment on lines
+121
to
+122
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Name the pair type to improve readability. Using std::pair<std::string, size_t> directly obscures intent. A named alias (or a tiny struct) clarifies semantics and reduces .first/.second leakage elsewhere. Apply this diff near the type section: - using ProjectionMap = std::
- unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>;
+ using OriginalKeyAndIndex = std::pair<std::string, size_t>;
+ using ProjectionMap = std::
+ unordered_map<clp_s::search::ast::ColumnDescriptor*, OriginalKeyAndIndex>;📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||
|
|
||||||||||||||
gibber9809 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
| using PartialResolutionMap = std:: | ||||||||||||||
| unordered_map<SchemaTree::Node::id_t, std::vector<ColumnDescriptorTokenIterator>>; | ||||||||||||||
|
|
@@ -127,6 +129,7 @@ class QueryHandlerImpl { | |||||||||||||
| * @param query The search query. | ||||||||||||||
| * @param projections The columns to project. | ||||||||||||||
| * @param case_sensitive_match Whether to use case-sensitive match for string comparison. | ||||||||||||||
| * @param allow_duplicate_projected_columns Whether to allow duplicate projected columns. | ||||||||||||||
| * @return A result containing the newly constructed `QueryHandler` on success, or an error code | ||||||||||||||
| * indicating the failure: | ||||||||||||||
| * - Forwards `preprocess_query`'s return values. | ||||||||||||||
|
|
@@ -137,7 +140,8 @@ class QueryHandlerImpl { | |||||||||||||
| std::shared_ptr<clp_s::search::ast::Expression> query, | ||||||||||||||
| std::vector<std::pair<std::string, clp_s::search::ast::literal_type_bitmask_t>> const& | ||||||||||||||
| projections, | ||||||||||||||
| bool case_sensitive_match | ||||||||||||||
| bool case_sensitive_match, | ||||||||||||||
| bool allow_duplicate_projected_columns | ||||||||||||||
| ) -> ystdlib::error_handling::Result<QueryHandlerImpl>; | ||||||||||||||
gibber9809 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
|
|
||||||||||||||
| // Delete copy constructor and assignment operator | ||||||||||||||
|
|
@@ -292,7 +296,7 @@ class QueryHandlerImpl { | |||||||||||||
| PartialResolutionMap auto_gen_namespace_partial_resolutions, | ||||||||||||||
| PartialResolutionMap user_gen_namespace_partial_resolutions, | ||||||||||||||
| std::vector<std::shared_ptr<clp_s::search::ast::ColumnDescriptor>> projected_columns, | ||||||||||||||
| ProjectionMap projected_column_to_original_key, | ||||||||||||||
| ProjectionMap projected_column_to_original_key_and_index, | ||||||||||||||
| bool case_sensitive_match | ||||||||||||||
| ) | ||||||||||||||
| : m_query{std::move(query)}, | ||||||||||||||
|
|
@@ -306,7 +310,9 @@ class QueryHandlerImpl { | |||||||||||||
| std::move(user_gen_namespace_partial_resolutions) | ||||||||||||||
| }, | ||||||||||||||
| m_projected_columns{std::move(projected_columns)}, | ||||||||||||||
| m_projected_column_to_original_key{std::move(projected_column_to_original_key)}, | ||||||||||||||
| m_projected_column_to_original_key_and_index{ | ||||||||||||||
| std::move(projected_column_to_original_key_and_index) | ||||||||||||||
| }, | ||||||||||||||
| m_case_sensitive_match{case_sensitive_match} {} | ||||||||||||||
|
|
||||||||||||||
| // Methods | ||||||||||||||
|
|
@@ -388,7 +394,7 @@ class QueryHandlerImpl { | |||||||||||||
| std::unordered_set<SchemaTree::Node::id_t>> | ||||||||||||||
| m_resolved_column_to_schema_tree_node_ids; | ||||||||||||||
| std::vector<std::shared_ptr<clp_s::search::ast::ColumnDescriptor>> m_projected_columns; | ||||||||||||||
| ProjectionMap m_projected_column_to_original_key; | ||||||||||||||
| ProjectionMap m_projected_column_to_original_key_and_index; | ||||||||||||||
| bool m_case_sensitive_match; | ||||||||||||||
| std::vector<std::pair<AstExprIterator, ast_evaluation_result_bitmask_t>> m_ast_dfs_stack; | ||||||||||||||
| }; | ||||||||||||||
|
|
@@ -492,11 +498,13 @@ auto QueryHandlerImpl::handle_column_resolution_on_new_schema_tree_node( | |||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| auto* col{token_it.get_column_descriptor()}; | ||||||||||||||
| if (m_projected_column_to_original_key.contains(col)) { | ||||||||||||||
| auto const original_key_and_index_it = m_projected_column_to_original_key_and_index.find(col); | ||||||||||||||
| if (m_projected_column_to_original_key_and_index.end() != original_key_and_index_it) { | ||||||||||||||
| auto const& [original_key, projected_index] = original_key_and_index_it->second; | ||||||||||||||
|
Comment on lines
+501
to
+503
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I think our guideline prefers this way more than using
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you sure? Older versions of g++ definitely don't optimize this away, and briefly playing around with this in godbolt recent versions of g++ still don't seem to optimize this away.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let me double check.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think u're right. Tried both ways on the latest GCC and clang. Using
Anyway, I think the take-away is we should use |
||||||||||||||
| YSTDLIB_ERROR_HANDLING_TRYV(new_projected_schema_tree_node_callback( | ||||||||||||||
| is_auto_generated, | ||||||||||||||
| node_id, | ||||||||||||||
| m_projected_column_to_original_key.at(col) | ||||||||||||||
| std::make_pair(original_key, projected_index) | ||||||||||||||
| )); | ||||||||||||||
| return ystdlib::error_handling::success(); | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,4 @@ | ||
| #include <cstddef> | ||
| #include <map> | ||
| #include <memory> | ||
| #include <set> | ||
|
|
@@ -410,7 +411,7 @@ TEST_CASE( | |
| auto query_stream{std::istringstream{kql_query_str}}; | ||
| auto query{clp_s::search::kql::parse_kql_expression(query_stream)}; | ||
|
|
||
| auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true)}; | ||
| auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)}; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Create() call sites updated; add coverage for allow_duplicate_projected_columns Passing the new fourth parameter (false) is correct. To validate the new feature flag end-to-end, add a test path that sets allow_duplicate_projected_columns = true with intentionally duplicated projections and asserts multiple callback invocations with the correct indices. Would you like me to open a follow-up to add a focused “duplicates in projections” section with synthetic duplicates and explicit index assertions? Also applies to: 525-525, 620-620 🤖 Prompt for AI Agents |
||
| REQUIRE_FALSE(query_handler_impl_result.has_error()); | ||
| auto& query_handler_impl{query_handler_impl_result.value()}; | ||
|
|
||
|
|
@@ -521,19 +522,21 @@ TEST_CASE("query_handler_handle_projection", "[ffi][ir_stream][search][QueryHand | |
| unresolvable_projections_from_unrecognized_namespaces.cend() | ||
| ); | ||
|
|
||
| auto query_handler_impl_result{QueryHandlerImpl::create(null_query, projections, true)}; | ||
| auto query_handler_impl_result{QueryHandlerImpl::create(null_query, projections, true, false)}; | ||
| REQUIRE_FALSE(query_handler_impl_result.has_error()); | ||
| auto& query_handler_impl{query_handler_impl_result.value()}; | ||
|
|
||
| SchemaTree::Node::id_t node_id{1}; | ||
| std::map<std::string, std::set<SchemaTree::Node::id_t>> actual_resolved_projections; | ||
| auto new_projected_schema_tree_node_callback | ||
| = [&](bool is_auto_gen, | ||
| SchemaTree::Node::id_t node_id, | ||
| std::string_view key) -> ystdlib::error_handling::Result<void> { | ||
| = [&]( | ||
| bool is_auto_gen, | ||
| SchemaTree::Node::id_t node_id, | ||
| std::pair<std::string_view, size_t> key_and_index | ||
| ) -> ystdlib::error_handling::Result<void> { | ||
| REQUIRE((is_auto_generated == is_auto_gen)); | ||
| auto [column_it, column_inserted] = actual_resolved_projections.try_emplace( | ||
| std::string{key}, | ||
| std::string{key_and_index.first}, | ||
| std::set<SchemaTree::Node::id_t>{} | ||
| ); | ||
| auto [node_id_it, node_id_inserted] = column_it->second.emplace(node_id); | ||
|
|
@@ -615,7 +618,7 @@ TEST_CASE("query_handler_evaluation_kv_pair_log_event", "[ffi][ir_stream][search | |
| auto query{clp_s::search::kql::parse_kql_expression(query_stream)}; | ||
| REQUIRE((nullptr != query)); | ||
|
|
||
| auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true)}; | ||
| auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)}; | ||
| REQUIRE_FALSE(query_handler_impl_result.has_error()); | ||
| auto& query_handler_impl{query_handler_impl_result.value()}; | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Remove unused exception variable to silence -Wunused warnings.
The exception object isn’t used.
📝 Committable suggestion
🤖 Prompt for AI Agents