feat(kv-ir): Add support for duplicate columns in projections.#1245
Conversation
WalkthroughAdds a projected-key index to the NewProjectedSchemaTreeNode callback (now a std::pair<std::string_view, size_t>) and propagates that change through QueryHandler/QueryHandlerImpl, projection mapping, tests, and a KV IR integration site. Introduces allow_duplicate_projected_columns (bool) to creation APIs and updates ProjectionMap to store (key, index). Adds necessary / includes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant QH as QueryHandler
participant QHI as QueryHandlerImpl
participant PM as create_projected_columns_and_projection_map
participant CB as new_projected_schema_tree_node_callback
Client->>QH: create(cb, query, projections, case_sensitive, allow_dup)
QH->>QHI: create(query, projections, case_sensitive, allow_dup)
QHI->>PM: build map (projections, allow_dup)
note right of PM #DFF3F4: Map: ColumnDescriptor* -> (original_key, index)
note right of PM #DFF3F4: Duplicates allowed iff allow_dup=true
PM-->>QHI: ProjectionMap (key,index)
QHI-->>Client: QueryHandler instance
rect rgba(224,247,250,0.6)
note over QHI: When new schema tree nodes are resolved
QHI->>CB: invoke(is_auto_generated, node_id, (projected_key_path, index))
CB-->>QHI: Result<void>
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
…ist of projections.
| for (size_t index = 0; index < projections.size(); ++index) { | ||
| auto const& [key, types] = projections[index]; |
There was a problem hiding this comment.
If we were using c++23 I would use something like std::views::enumerate, but I couldn't think of a better way of doing this in c++20.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)
531-545: Assert index invariants inside the test callbackSince the callback exposes both key and its index into the projections list, we can cheaply assert contract correctness here.
Apply this diff to add invariants:
- auto new_projected_schema_tree_node_callback - = [&](bool is_auto_gen, - SchemaTree::Node::id_t node_id, - std::string_view key, - size_t index) -> ystdlib::error_handling::Result<void> { + auto new_projected_schema_tree_node_callback + = [&](bool is_auto_gen, + SchemaTree::Node::id_t node_id, + std::string_view key, + size_t index) -> ystdlib::error_handling::Result<void> { REQUIRE((is_auto_generated == is_auto_gen)); + // Index must be within bounds of the original projections vector + REQUIRE((index < projections.size())); + // The key returned should equal the projection’s original key at this index + REQUIRE((key == std::string_view{projections[index].first})); auto [column_it, column_inserted] = actual_resolved_projections.try_emplace( std::string{key}, std::set<SchemaTree::Node::id_t>{} ); auto [node_id_it, node_id_inserted] = column_it->second.emplace(node_id); REQUIRE(node_id_inserted); return ystdlib::error_handling::success(); };components/core/src/clp_s/kv_ir_search.cpp (1)
213-219: Plan for wiring the new duplicates flag when projections are enabledYou currently don’t pass projections from CLI (feature unimplemented) and rely on the default allow_duplicate_projected_columns = false. When projection support lands here, please surface and thread a CLI/config flag to this parameter so non-JSON sinks that require repeated columns can opt in.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
319-330: Fix typos in the method docstring.“on the a new schema-tree tree node” has grammatical issues and duplication.
Apply this doc-only diff:
- /** - * Handles column resolution of the given token iterator on the a new schema-tree tree node. + /** + * Handles column resolution of the given token iterator on a new schema-tree node.components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
60-67: Update return-value docs to reflect key+index mapping.The function now returns a projected-column-to-original-key-and-index map; the comment still mentions only the key.
Apply this diff:
* - The pair: * - A vector of projected columns. - * - A projected-column-to-original-key map. + * - A projected-column-to-original-key-and-index map.
♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
186-205: Note on std::string_view in unique_projected_columns.Using std::string_view here is fine since the set is local to the function and references the input projections’ std::string storage, which outlives the set; this aligns with prior team guidance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp(3 hunks)components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp(2 hunks)components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp(7 hunks)components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp(7 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp(5 hunks)components/core/src/clp/ffi/ir_stream/search/test/utils.cpp(1 hunks)components/core/src/clp/ffi/ir_stream/search/test/utils.hpp(1 hunks)components/core/src/clp_s/kv_ir_search.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
Applied to files:
components/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.
Applied to files:
components/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Applied to files:
components/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-11-12T18:47:03.828Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-07T16:56:53.901Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧬 Code graph analysis (7)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)
create(417-443)create(417-422)create(471-503)create(471-472)components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
query(138-144)
components/core/src/clp_s/kv_ir_search.cpp (4)
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp (2)
maybe_unused(51-54)maybe_unused(58-62)components/core/tests/test-ffi_IrUnitHandlerReq.cpp (1)
maybe_unused(45-49)components/core/src/clp_s/JsonParser.cpp (2)
maybe_unused(56-59)maybe_unused(63-67)components/core/tests/test-ir_encoding_methods.cpp (2)
maybe_unused(114-117)maybe_unused(131-135)
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)
components/core/src/clp_s/kv_ir_search.cpp (2)
maybe_unused(81-84)maybe_unused(88-92)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (4)
nodiscard(43-62)nodiscard(84-95)nodiscard(104-107)is_auto_generated(84-88)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)
new_projected_schema_tree_node_callback(43-50)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
query(138-144)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)
create(417-443)create(417-422)create(471-503)create(471-472)components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
query(138-144)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (10)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (2)
4-4: Include for size_t is correct and necessaryAdding is appropriate for size_t usage in the public signature.
27-34: Manual verification required: confirm all callbacks updated to the new 4-parameter signatureI ran a heuristic search across
.cpp/.hpp/.cc/.hfiles for any occurrences ofProjectedSchemaTreeNodeCallback…(and specifically for the test utilitytrivial_new_projected_schema_tree_node_callback(…), but the scan produced no hits for 3-parameter usages nor mismatched definitions.Please manually verify the following:
- All aliases and typedefs of
NewProjectedSchemaTreeNodeCallbackTypeand related callback types now expect exactly four parameters(bool, SchemaTree::Node::id_t, std::string_view, size_t).- The declaration and definition of
trivial_new_projected_schema_tree_node_callbackin test utilities each accept four arguments, and no old overload or legacy fixture remains.- Every call site—both in production code and tests, including any out-of-tree consumers—has been updated to pass four parameters in the new order.
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)
70-72: Callback extension matches concept and uses [[maybe_unused]] appropriatelySignature aligns with the new concept and avoids unused-parameter warnings. No issues here.
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)
1-1: Include is appropriate for size_t in lambdasGood addition to keep test TU self-sufficient.
components/core/src/clp_s/kv_ir_search.cpp (1)
203-205: Lambda signature updated to 4-arg; matches QueryHandler’s conceptChange is correct and keeps QueryHandlerType deduction intact.
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)
84-86: Header declaration aligned with implementation and conceptSignature matches utils.cpp and the concept; include ensures size_t availability.
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)
48-50: API surface: sensible defaulting of allow_duplicate_projected_columns.Defaulting the flag to false preserves source compatibility while exposing the feature gating knob. Looks good.
57-59: Forwarding of new parameter is correct.The factory forwards both case_sensitive_match and allow_duplicate_projected_columns to QueryHandlerImpl::create in the right order. No issues.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (2)
197-205: Duplicate detection semantics: confirm “by key only” is intentional.The DuplicateProjectedColumn check keys solely on the raw projection key string, ignoring the type bitmask. If two entries specify the same key with different type masks, they’ll be treated as duplicates when allow_duplicate_projected_columns == false. Confirm this matches intended behaviour.
If you want to permit same-key-with-different-types as distinct when duplicates are otherwise disallowed, we can change the uniqueness key to (key, types).
424-441: LGTM: end-to-end wiring of the duplicate-column flag and index propagation.create() threads allow_duplicate_projected_columns through, builds the new map, and stores it. The subsequent partial-resolution creation uses the updated map. Looks coherent and self-contained.
| * @param arg_0 Whether the schema-tree node is for an auto-generated kv-pair. | ||
| * @param arg_1 The ID of the schema-tree node. | ||
| * @param arg_2 The projected key path. | ||
| * @param arg_3 The index of the projected key in the list of projections. |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Clarify semantics of the new index parameter and lifetime of the key view
Please tighten the docs to be explicit that arg_3 is zero-based and refers to the position in the caller-provided projections list (including duplicates when enabled). Also note that arg_2 is a string_view whose lifetime is only guaranteed during the callback invocation.
Apply this doc-only diff:
- * @param arg_2 The projected key path.
- * @param arg_3 The index of the projected key in the list of projections.
+ * @param arg_2 The projected key path as provided in the projections list at arg_3.
+ * Note: The string_view is only valid for the duration of this call.
+ * @param arg_3 Zero-based index of this projected key in the original projections list
+ * supplied to QueryHandler::create (duplicates preserved if enabled).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| * @param arg_3 The index of the projected key in the list of projections. | |
| * @param arg_2 The projected key path as provided in the projections list at arg_3. | |
| * Note: The string_view is only valid for the duration of this call. | |
| * @param arg_3 Zero-based index of this projected key in the original projections list | |
| * supplied to QueryHandler::create (duplicates preserved if enabled). |
🤖 Prompt for AI Agents
In
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
around line 23, update the parameter docs to state that arg_3 is a zero-based
index that refers to the position in the caller-provided projections list (and
therefore counts duplicates when duplicates are enabled) and explicitly document
that arg_2 is a std::string_view whose lifetime is only valid for the duration
of the callback invocation (callers must not retain references beyond the
callback).
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
Outdated
Show resolved
Hide resolved
| auto query{clp_s::search::kql::parse_kql_expression(query_stream)}; | ||
|
|
||
| auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true)}; | ||
| auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)}; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Create() call sites updated; add coverage for allow_duplicate_projected_columns
Passing the new fourth parameter (false) is correct. To validate the new feature flag end-to-end, add a test path that sets allow_duplicate_projected_columns = true with intentionally duplicated projections and asserts multiple callback invocations with the correct indices.
Would you like me to open a follow-up to add a focused “duplicates in projections” section with synthetic duplicates and explicit index assertions?
Also applies to: 525-525, 620-620
🤖 Prompt for AI Agents
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
around lines 414 (also apply similar changes at 525 and 620): the new Create()
signature added a fourth parameter allow_duplicate_projected_columns which
should be exercised true in tests; add a new test path that calls
QueryHandlerImpl::create(query, {}, true, true) (fourth arg true), supply
intentionally duplicated projection entries in the query, then assert that the
query handler invokes the projection callback multiple times for duplicate
projections and that each callback receives the correct projection indices (use
explicit expected index list and assert call count and order); update the three
call sites (lines ~414, ~525, ~620) to include this new test branch and add
assertions verifying multiple callback invocations and exact indices for
duplicates.
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
Show resolved
Hide resolved
LinZhihao-723
left a comment
There was a problem hiding this comment.
Left some high-level questions and small fixes. Will do another round when the conversation is finalized.
| for (auto const& [key, types] : projections) { | ||
| if (unique_projected_columns.contains(key)) { | ||
| return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn}; | ||
| for (size_t index = 0; index < projections.size(); ++index) { |
There was a problem hiding this comment.
Missing <cstddef> for size_t
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
Outdated
Show resolved
Hide resolved
| std::string_view, | ||
| size_t>; |
There was a problem hiding this comment.
What do u think if we bind the index and the key into a pair?
There was a problem hiding this comment.
No strong preference either way myself -- if you prefer binding them into a pair I'll make the change.
| using ProjectionMap = std:: | ||
| unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>; |
There was a problem hiding this comment.
A high-level question: the current implementation will create different column descriptors for the same key. What's the reason for not using the same descriptor to track all project columns? I can see the major concern would be that they might be projecting different types of node values. Are there any other concerns?
There was a problem hiding this comment.
The major one is projections with the same key name but different types, as you say. Generally though it's also unclear to me that sharing column descriptors for keys with the same name would give us any measurable benefits, so I prefer the simpler solution of just not sharing them.
| auto const original_key_and_index_it = m_projected_column_to_original_key_and_index.find(col); | ||
| if (m_projected_column_to_original_key_and_index.end() != original_key_and_index_it) { | ||
| auto const& [original_key, projected_index] = original_key_and_index_it->second; |
There was a problem hiding this comment.
| auto const original_key_and_index_it = m_projected_column_to_original_key_and_index.find(col); | |
| if (m_projected_column_to_original_key_and_index.end() != original_key_and_index_it) { | |
| auto const& [original_key, projected_index] = original_key_and_index_it->second; | |
| if (m_projected_column_to_original_key_and_index.contains(col)) { | |
| auto const& [original_key, projected_index] | |
| = m_projected_column_to_original_key_and_index.at(col); |
I think our guideline prefers this way more than using find. With the compiler optimization it should be equivalent to the find form.
There was a problem hiding this comment.
Are you sure? Older versions of g++ definitely don't optimize this away, and briefly playing around with this in godbolt recent versions of g++ still don't seem to optimize this away.
There was a problem hiding this comment.
I think u're right. Tried both ways on the latest GCC and clang. Using contains + at involves two hash lookups, while using find + iterator comparison is only one. From the generated assembly, it seems like they can be manually optimized, as the first hash lookup dominates the second. But I guess it's also fair for the compiler not to aggressively look into this case as:
- The user-defined hash function may have side effects.
atinvolves a throw on bad keys.
Anyway, I think the take-away is we should use find for better performance.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
…te-projection-columns
…ction resolution callback.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/core/src/clp_s/kv_ir_search.cpp (1)
212-218: Be explicit about allow_duplicate_projected_columns in the create(...) callAlthough the 5th parameter defaults to false, being explicit here makes intent clear and future-proofs against default changes.
QueryHandlerType::create( trivial_new_projected_schema_tree_node_callback, std::move(query), {}, - false == command_line_arguments.get_ignore_case() + false == command_line_arguments.get_ignore_case(), + false )components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)
525-546: Add a focused duplicates test to assert multiple callbacks and exact indicesTo validate the PR objective end-to-end, add a test path that:
- constructs projections with intentional duplicates
- sets allow_duplicate_projected_columns = true
- asserts that the callback fires once per duplicate with the correct original indices
Sketch (add as a new SECTION near this block):
SECTION("projections with duplicates trigger multiple callbacks with correct indices") { auto null_query = std::shared_ptr<clp_s::search::ast::Expression>{}; // Start from resolvable projections and inject duplicates of the first two entries auto projections_with_dups = resolvable_projections; REQUIRE(projections_with_dups.size() >= 2); auto const dup0 = projections_with_dups[0]; auto const dup1 = projections_with_dups[1]; projections_with_dups.push_back(dup0); projections_with_dups.push_back(dup1); // Build expected indices for each key path (vector of zero-based positions in the original list) std::map<std::string, std::vector<size_t>> expected_indices; for (size_t i = 0; i < projections_with_dups.size(); ++i) { expected_indices[projections_with_dups[i].first].push_back(i); } auto query_handler_impl_result = QueryHandlerImpl::create( null_query, projections_with_dups, true, true); REQUIRE_FALSE(query_handler_impl_result.has_error()); auto& query_handler_impl = query_handler_impl_result.value(); std::map<std::string, std::vector<size_t>> actual_indices; auto cb = [&](bool /*is_auto_gen*/, SchemaTree::Node::id_t /*node_id*/, std::pair<std::string_view, size_t> key_and_index) -> ystdlib::error_handling::Result<void> { actual_indices[std::string{key_and_index.first}].push_back(key_and_index.second); return ystdlib::error_handling::success(); }; // Drive discovery through all nodes (same as the surrounding test) SchemaTree::Node::id_t node_id{1}; for (auto const& locator : locators) { REQUIRE_FALSE(query_handler_impl .update_partially_resolved_columns(true, locator, node_id, cb) .has_error()); REQUIRE_FALSE(query_handler_impl .update_partially_resolved_columns(false, locator, node_id, cb) .has_error()); ++node_id; } // For keys that were projected, the callback must have been invoked for each index for (auto const& [key, exp] : expected_indices) { // Only assert for keys that resolve to at least one node in this schema if (expected_resolved_projections.contains(key)) { REQUIRE(actual_indices.contains(key)); auto got = actual_indices[key]; std::sort(got.begin(), got.end()); auto exp_sorted = exp; std::sort(exp_sorted.begin(), exp_sorted.end()); REQUIRE(got == exp_sorted); } } }This directly validates index preservation and duplicate callback emission.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
468-474: Rename shadowing loop variable for clarity.The loop variable node_id shadows the function parameter node_id. Rename the loop binding to avoid confusion during maintenance.
- for (auto const [node_id, token_it] : new_partial_resolutions) { - auto [it, inserted] = partial_resolutions_to_update.try_emplace( - node_id, + for (auto const [dst_node_id, token_it] : new_partial_resolutions) { + auto [it, inserted] = partial_resolutions_to_update.try_emplace( + dst_node_id, std::vector<ColumnDescriptorTokenIterator>{} ); it->second.emplace_back(token_it); }
♻️ Duplicate comments (11)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (1)
23-23: Tighten arg_2 docs: pair semantics, index base, and string_view lifetimePlease be explicit about:
- first: projected key path; lifetime of the string_view (valid only during callback)
- second: zero-based index into the original projections list; duplicates preserved when enabled
Doc-only diff:
- * @param arg_2 The projected key path and its original index in the list of projections. + * @param arg_2 A pair consisting of: + * - first: projected key path as provided in the original projections list. + * Note: the std::string_view is only guaranteed to be valid for the + * duration of this callback; callers must not retain it. + * - second: zero-based index of this projected key in the original projections list + * supplied to QueryHandler::create (duplicates preserved when + * allow_duplicate_projected_columns == true).components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)
532-546: Destructure and assert index bounds for clarity and early failureImprove readability by naming the pair elements and assert the index is within the original projections list.
- = [&]( - bool is_auto_gen, - SchemaTree::Node::id_t node_id, - std::pair<std::string_view, size_t> key_and_index - ) -> ystdlib::error_handling::Result<void> { + = [&]( + bool is_auto_gen, + SchemaTree::Node::id_t node_id, + std::pair<std::string_view, size_t> projected_key_path_and_index + ) -> ystdlib::error_handling::Result<void> { REQUIRE((is_auto_generated == is_auto_gen)); - auto [column_it, column_inserted] = actual_resolved_projections.try_emplace( - std::string{key_and_index.first}, + auto const& [projected_key_path, projected_key_index] = projected_key_path_and_index; + REQUIRE(projected_key_index < projections.size()); + auto [column_it, column_inserted] = actual_resolved_projections.try_emplace( + std::string{projected_key_path}, std::set<SchemaTree::Node::id_t>{} );
414-415: Exercise allow_duplicate_projected_columns in this test as wellYou currently pass false for allow_duplicate_projected_columns. To cover both behaviours, parameterize it with GENERATE(false, true). This won’t affect existing assertions since this test doesn’t inject duplicate projections.
- auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)}; + auto const allow_duplicate_projected_columns = GENERATE(false, true); + auto query_handler_impl_result{ + QueryHandlerImpl::create(query, {}, true, allow_duplicate_projected_columns)};
525-526: Also parameterize allow_duplicate_projected_columns hereSame rationale as above; this test doesn’t create duplicates, so behaviour remains unchanged while adding coverage.
- auto query_handler_impl_result{QueryHandlerImpl::create(null_query, projections, true, false)}; + auto const allow_duplicate_projected_columns = GENERATE(false, true); + auto query_handler_impl_result{ + QueryHandlerImpl::create(null_query, projections, true, allow_duplicate_projected_columns)};
621-623: Parameterize allow_duplicate_projected_columns in helper as wellKeep consistency across tests and allow running the same evaluation paths with duplicates enabled.
- auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)}; + auto const allow_duplicate_projected_columns = GENERATE(false, true); + auto query_handler_impl_result{ + QueryHandlerImpl::create(query, {}, true, allow_duplicate_projected_columns)};components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)
72-84: Clarify pair semantics and string_view lifetime in the test helper docAdd detail so test authors don’t accidentally retain the view.
* Trivial implementation of `NewProjectedSchemaTreeNodeCallback` that always return success without * doing anything. * @param is_auto_generated * @param node_id - * @param projected_key_path_and_index + * @param projected_key_path_and_index A pair of: + * - first: projected key path as provided in the projections list at the given index. + * Note: std::string_view is only guaranteed to be valid during the callback; + * tests must not store it; copy to std::string if needed. + * - second: zero-based index of this projected key in the original projections vector + * (duplicates preserved when enabled).components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (2)
4-4: Explicitly including resolves prior concern.Good fix; this header uses size_t in a public alias and now includes explicitly.
143-145: Signature change acknowledged; verify all direct impl call sites are updated.Public wrapper appears to accept and forward the new flag. As a safety net, scan for any direct uses of QueryHandlerImpl::create that might still be calling the 3-arg overload.
#!/bin/bash # Find direct calls to QueryHandlerImpl::create across the tree with context rg -nP -C3 '\bQueryHandlerImpl::create\s*\(' --type=cpp --type=hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (3)
3-3: Including in the implementation is appropriate.This file uses size_t; the explicit include avoids relying on transitive headers.
243-253: Param threading into create_initial_partial_resolutions is correct.Iterating projected_column_to_original_key_and_index by descriptor pointer is appropriate and supports duplicates (distinct descriptors). Also, adopting a placeholder for the unused mapped value improves readability.
194-197: Pre-reserve containers and use try_emplace/in-place construction for minor wins.Avoid rehash/realloc under typical sizes and construct mapped values directly to skip temporaries and copies. Also, when deduplication is enabled, combine contains+emplace into a single emplace that reports insertion.
- std::unordered_set<std::string_view> unique_projected_columns; - std::vector<std::shared_ptr<ColumnDescriptor>> projected_columns; - QueryHandlerImpl::ProjectionMap projected_column_to_original_key_and_index; + std::unordered_set<std::string_view> unique_projected_columns; + std::vector<std::shared_ptr<ColumnDescriptor>> projected_columns; + QueryHandlerImpl::ProjectionMap projected_column_to_original_key_and_index; + unique_projected_columns.reserve(projections.size()); + projected_columns.reserve(projections.size()); + projected_column_to_original_key_and_index.reserve(projections.size()); @@ - for (size_t index{0}; index < projections.size(); ++index) { + for (size_t index{0}; index < projections.size(); ++index) { auto const& [key, types] = projections[index]; - if (false == allow_duplicate_projected_columns) { - if (unique_projected_columns.contains(key)) { - return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn}; - } - unique_projected_columns.emplace(key); - } + if (false == allow_duplicate_projected_columns) { + auto [_, inserted] = unique_projected_columns.emplace(key); + if (false == inserted) { + return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn}; + } + } @@ - projected_column_to_original_key_and_index.emplace( - column_descriptor.get(), - std::make_pair(key, index) - ); + projected_column_to_original_key_and_index.try_emplace( + column_descriptor.get(), key, index); @@ - return {std::move(projected_columns), std::move(projected_column_to_original_key_and_index)}; + return {std::move(projected_columns), + std::move(projected_column_to_original_key_and_index)};Also applies to: 198-205, 230-233, 240-241
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp(3 hunks)components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp(8 hunks)components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp(8 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp(4 hunks)components/core/src/clp/ffi/ir_stream/search/test/utils.cpp(1 hunks)components/core/src/clp/ffi/ir_stream/search/test/utils.hpp(1 hunks)components/core/src/clp_s/kv_ir_search.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
🧠 Learnings (12)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp_s/kv_ir_search.cpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2024-11-12T18:47:03.828Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
📚 Learning: 2024-11-12T18:56:31.067Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cppcomponents/core/src/clp_s/kv_ir_search.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2025-05-07T16:56:53.901Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hppcomponents/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
🧬 Code graph analysis (3)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (7)
create(418-444)create(418-423)create(472-504)create(472-473)is_auto_generated(118-118)is_auto_generated(346-354)is_auto_generated(346-346)components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)
new_projected_schema_tree_node_callback(44-51)is_auto_generated(85-89)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)
new_projected_schema_tree_node_callback(44-51)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
query(139-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
🔇 Additional comments (10)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (2)
4-4: Includes for size_t and std::pair are correct and sufficientand are the right additions for size_t and std::pair in this header. No issues.
Also applies to: 7-7
27-33: Sanity-check all downstream instantiations of the new callback signatureI wasn’t able to automatically spot any “stale” uses of a lone
std::string_viewin your recent sweep, but we also didn’t surface the actual call-sites where the callback is passed in. To be safe, please manually verify every instantiation of theNewProjectedSchemaTreeNodeCallbackTypecallback—especially where you call
QueryHandler::create(…)or invoke
QueryHandlerImpl::update_partially_resolved_columns(…)
or
QueryHandlerImpl::handle_column_resolution_on_new_schema_tree_node(…)—to ensure each site now passes a
std::pair<std::string_view, size_t>rather than a rawstd::string_view.Key files to review:
- components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
- components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp
- components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
- components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
Double-check any FFI bindings or test helpers that wrap these calls to confirm all lambdas/func-ptrs match the updated
(bool, id_t, std::pair<string_view,size_t>)signature.components/core/src/clp_s/kv_ir_search.cpp (1)
200-205: Callback signature update is correctSwitching the third parameter to std::pair<std::string_view, size_t> is aligned with the updated concept; [[maybe_unused]] annotations are appropriate.
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)
67-73: LGTM: test helper updated to the new pair-based signatureThe trivial callback continues to return success and now accepts (path, index). No functional concerns.
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)
1-1: Include is appropriateNeeded for size_t in test code. Looks good.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (3)
132-133: API doc addition for allow_duplicate_projected_columns looks correct.The parameter and behaviour are documented where users will see it.
299-301: Wiring the new ProjectionMap through ctor and member is consistent.Passing and storing the map of (original_key, index) is correct and keeps ownership/lifetime simple via std::string copies.
Also applies to: 313-316, 397-398
500-508: Avoid constructing a transient std::string when invoking the callback (lifetime and perf).Currently you build a std::pair<std::string, size_t> (via std::make_pair) and rely on conversion to std::pair<std::string_view, size_t>. That creates a short-lived std::string which the std::string_view will reference only during the call; if the callback stores the view, it risks dangling. Also it adds an unnecessary allocation/copy.
Construct the target pair in-place from the stable std::string stored in the map to yield a std::string_view that remains valid for the lifetime of QueryHandlerImpl.
- YSTDLIB_ERROR_HANDLING_TRYV(new_projected_schema_tree_node_callback( - is_auto_generated, - node_id, - std::make_pair(original_key, projected_index) - )); + YSTDLIB_ERROR_HANDLING_TRYV(new_projected_schema_tree_node_callback( + is_auto_generated, + node_id, + std::pair<std::string_view, size_t>{original_key, projected_index} + ));⛔ Skipped due to learnings
Learnt from: LinZhihao-723 PR: y-scope/clp#873 File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157 Timestamp: 2025-05-02T22:27:59.347Z Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.Learnt from: gibber9809 PR: y-scope/clp#584 File: components/core/src/clp_s/SchemaTree.hpp:171-171 Timestamp: 2024-11-12T18:46:20.933Z Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.Learnt from: gibber9809 PR: y-scope/clp#584 File: components/core/src/clp_s/SchemaTree.hpp:40-55 Timestamp: 2024-11-12T18:56:31.067Z Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.Learnt from: gibber9809 PR: y-scope/clp#584 File: components/core/src/clp_s/SchemaTree.hpp:91-94 Timestamp: 2024-11-12T18:47:03.828Z Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.Learnt from: LinZhihao-723 PR: y-scope/clp#882 File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166 Timestamp: 2025-05-11T22:22:49.286Z Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.Learnt from: LinZhihao-723 PR: y-scope/clp#873 File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84 Timestamp: 2025-05-02T23:49:30.798Z Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.Learnt from: LinZhihao-723 PR: y-scope/clp#873 File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84 Timestamp: 2025-05-02T23:49:30.798Z Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (2)
58-59: Docs and signatures updated consistently for duplicate projections and the new map.The added allow_duplicate_projected_columns parameter and the updated ProjectionMap usage are coherent across declarations.
Also applies to: 69-76, 91-97
418-444: Factory wires the new flag and map end-to-end correctly.QueryHandlerImpl::create now preprocesses, builds projections with the new flag, seeds partial resolutions using the updated map, and forwards everything to the ctor. Looks good.
| } catch (std::exception const& e) { | ||
| return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure}; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Remove unused exception variable to silence -Wunused warnings.
The exception object isn’t used.
- } catch (std::exception const& e) {
+ } catch (std::exception const&) {
return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure};
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } catch (std::exception const& e) { | |
| return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure}; | |
| } | |
| } catch (std::exception const&) { | |
| return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure}; | |
| } |
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp around
lines 235-237, the catch clause declares an exception variable 'e' that is never
used causing -Wunused warnings; replace "catch (std::exception const& e)" with
"catch (std::exception const&)" (or use "catch (...)" if you intend to catch all
exceptions) and leave the return statement as-is to silence the warning.
| using ProjectionMap = std:: | ||
| unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>; |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Name the pair type to improve readability.
Using std::pair<std::string, size_t> directly obscures intent. A named alias (or a tiny struct) clarifies semantics and reduces .first/.second leakage elsewhere.
Apply this diff near the type section:
- using ProjectionMap = std::
- unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>;
+ using OriginalKeyAndIndex = std::pair<std::string, size_t>;
+ using ProjectionMap = std::
+ unordered_map<clp_s::search::ast::ColumnDescriptor*, OriginalKeyAndIndex>;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| using ProjectionMap = std:: | |
| unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>; | |
| // Name the pair type to improve readability | |
| using OriginalKeyAndIndex = std::pair<std::string, size_t>; | |
| using ProjectionMap = std:: | |
| unordered_map<clp_s::search::ast::ColumnDescriptor*, OriginalKeyAndIndex>; |
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp around
lines 121-122, the map value type std::pair<std::string, size_t> is unnamed
which obscures meaning; introduce a named type (either a using alias like
ProjectionInfo = std::pair<std::string, size_t> or preferably a small struct
e.g. struct ProjectionInfo { std::string name; size_t index; };) and change
ProjectionMap to use that named type; update all downstream uses to replace
.first/.second with the explicit field names (e.g., .name/.index) for clarity.
Description
This PR follows up on #1217 by also allowing projections in kv-ir search to specify duplicate columns, guarded by a flag. The general idea is that if a user is serializing to some non-JSON format they aren't limited to outputting each column a single time, and we run into such situations with the velox connector implementation.
To better support duplicate columns we also change the signature expected by
NewProjectedSchemaTreeNodeCallbackReqto pass the user both the original key name as well as the index of the projected key within the original list of projections.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Documentation