Skip to content

feat(kv-ir): Add support for duplicate columns in projections.#1245

Merged
gibber9809 merged 11 commits intoy-scope:mainfrom
gibber9809:kv-ir-allow-duplicate-projection-columns
Sep 3, 2025
Merged

feat(kv-ir): Add support for duplicate columns in projections.#1245
gibber9809 merged 11 commits intoy-scope:mainfrom
gibber9809:kv-ir-allow-duplicate-projection-columns

Conversation

@gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented Aug 21, 2025

Description

This PR follows up on #1217 by also allowing projections in kv-ir search to specify duplicate columns, guarded by a flag. The general idea is that if a user is serializing to some non-JSON format they aren't limited to outputting each column a single time, and we run into such situations with the velox connector implementation.

To better support duplicate columns we also change the signature expected by NewProjectedSchemaTreeNodeCallbackReq to pass the user both the original key name as well as the index of the projected key within the original list of projections.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Validated that existing unit tests pass.

Summary by CodeRabbit

  • New Features

    • Added an option to allow duplicate projected columns in search queries.
    • Extended the callback for newly resolved projections to include a projection index (path + index) for order-aware handling.
  • Documentation

    • Updated parameter descriptions and examples to document the duplicate-allowance option and the extended callback signature.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 21, 2025

Walkthrough

Adds a projected-key index to the NewProjectedSchemaTreeNode callback (now a std::pair<std::string_view, size_t>) and propagates that change through QueryHandler/QueryHandlerImpl, projection mapping, tests, and a KV IR integration site. Introduces allow_duplicate_projected_columns (bool) to creation APIs and updates ProjectionMap to store (key, index). Adds necessary / includes.

Changes

Cohort / File(s) Summary of changes
Callback concept update
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
Callback type changed to accept a std::pair<std::string_view, size_t> as the projected-key argument; added <cstddef> and <utility> includes and updated docs.
Public factory API
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
QueryHandler::create signature gains bool allow_duplicate_projected_columns = false and forwards it to the impl; docs updated to describe the new parameter and the projection callback.
QueryHandlerImpl core
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp, .../QueryHandlerImpl.cpp
ProjectionMap value switched to std::pair<std::string, std::size_t>; member renamed to ..._key_and_index. create/create_projected_columns_and_projection_map/create_initial_partial_resolutions updated to accept allow_duplicate_projected_columns and the new ProjectionMap; projection building preserves original index and optionally allows duplicates; callback invocations pass (original_key, index). <cstddef> added.
Tests: QueryHandlerImpl
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
Calls to QueryHandlerImpl::create(...) updated to pass the new boolean arg; test callbacks now accept std::pair<std::string_view, size_t> (index currently maybe_unused). Added <cstddef>.
Test utilities
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp, .../utils.cpp
trivial_new_projected_schema_tree_node_callback signature changed to accept std::pair<std::string_view, size_t>; docs updated.
KV IR integration
components/core/src/clp_s/kv_ir_search.cpp
Local callback used by deserialize_and_search_kv_ir_stream updated to accept std::pair<std::string_view, size_t> (index maybe_unused); decltype changes propagate to QueryHandlerType but behaviour unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant QH as QueryHandler
  participant QHI as QueryHandlerImpl
  participant PM as create_projected_columns_and_projection_map
  participant CB as new_projected_schema_tree_node_callback

  Client->>QH: create(cb, query, projections, case_sensitive, allow_dup)
  QH->>QHI: create(query, projections, case_sensitive, allow_dup)
  QHI->>PM: build map (projections, allow_dup)
  note right of PM #DFF3F4: Map: ColumnDescriptor* -> (original_key, index)
  note right of PM #DFF3F4: Duplicates allowed iff allow_dup=true
  PM-->>QHI: ProjectionMap (key,index)
  QHI-->>Client: QueryHandler instance

  rect rgba(224,247,250,0.6)
    note over QHI: When new schema tree nodes are resolved
    QHI->>CB: invoke(is_auto_generated, node_id, (projected_key_path, index))
    CB-->>QHI: Result<void>
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • wraymo
  • LinZhihao-723
  • kirkrodrigues

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Comment on lines 197 to 198
for (size_t index = 0; index < projections.size(); ++index) {
auto const& [key, types] = projections[index];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were using c++23 I would use something like std::views::enumerate, but I couldn't think of a better way of doing this in c++20.

@gibber9809 gibber9809 marked this pull request as ready for review August 21, 2025 15:51
@gibber9809 gibber9809 requested review from a team and wraymo as code owners August 21, 2025 15:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)

531-545: Assert index invariants inside the test callback

Since the callback exposes both key and its index into the projections list, we can cheaply assert contract correctness here.

Apply this diff to add invariants:

-    auto new_projected_schema_tree_node_callback
-            = [&](bool is_auto_gen,
-                  SchemaTree::Node::id_t node_id,
-                  std::string_view key,
-                  size_t index) -> ystdlib::error_handling::Result<void> {
+    auto new_projected_schema_tree_node_callback
+            = [&](bool is_auto_gen,
+                  SchemaTree::Node::id_t node_id,
+                  std::string_view key,
+                  size_t index) -> ystdlib::error_handling::Result<void> {
         REQUIRE((is_auto_generated == is_auto_gen));
+        // Index must be within bounds of the original projections vector
+        REQUIRE((index < projections.size()));
+        // The key returned should equal the projection’s original key at this index
+        REQUIRE((key == std::string_view{projections[index].first}));
         auto [column_it, column_inserted] = actual_resolved_projections.try_emplace(
                 std::string{key},
                 std::set<SchemaTree::Node::id_t>{}
         );
         auto [node_id_it, node_id_inserted] = column_it->second.emplace(node_id);
         REQUIRE(node_id_inserted);
         return ystdlib::error_handling::success();
     };
components/core/src/clp_s/kv_ir_search.cpp (1)

213-219: Plan for wiring the new duplicates flag when projections are enabled

You currently don’t pass projections from CLI (feature unimplemented) and rely on the default allow_duplicate_projected_columns = false. When projection support lands here, please surface and thread a CLI/config flag to this parameter so non-JSON sinks that require repeated columns can opt in.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)

319-330: Fix typos in the method docstring.

“on the a new schema-tree tree node” has grammatical issues and duplication.

Apply this doc-only diff:

-    /**
-     * Handles column resolution of the given token iterator on the a new schema-tree tree node.
+    /**
+     * Handles column resolution of the given token iterator on a new schema-tree node.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)

60-67: Update return-value docs to reflect key+index mapping.

The function now returns a projected-column-to-original-key-and-index map; the comment still mentions only the key.

Apply this diff:

  * - The pair:
  *   - A vector of projected columns.
- *   - A projected-column-to-original-key map.
+ *   - A projected-column-to-original-key-and-index map.
♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)

186-205: Note on std::string_view in unique_projected_columns.

Using std::string_view here is fine since the set is local to the function and references the input projections’ std::string storage, which outlives the set; this aligns with prior team guidance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8a00bff and 082296d.

📒 Files selected for processing (8)
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (3 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (7 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (7 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (5 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1 hunks)
  • components/core/src/clp_s/kv_ir_search.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.

Applied to files:

  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.

Applied to files:

  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.

Applied to files:

  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-11-12T18:47:03.828Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-07T16:56:53.901Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧬 Code graph analysis (7)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)
  • create (417-443)
  • create (417-422)
  • create (471-503)
  • create (471-472)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
  • query (138-144)
components/core/src/clp_s/kv_ir_search.cpp (4)
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp (2)
  • maybe_unused (51-54)
  • maybe_unused (58-62)
components/core/tests/test-ffi_IrUnitHandlerReq.cpp (1)
  • maybe_unused (45-49)
components/core/src/clp_s/JsonParser.cpp (2)
  • maybe_unused (56-59)
  • maybe_unused (63-67)
components/core/tests/test-ir_encoding_methods.cpp (2)
  • maybe_unused (114-117)
  • maybe_unused (131-135)
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)
components/core/src/clp_s/kv_ir_search.cpp (2)
  • maybe_unused (81-84)
  • maybe_unused (88-92)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (4)
  • nodiscard (43-62)
  • nodiscard (84-95)
  • nodiscard (104-107)
  • is_auto_generated (84-88)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)
  • new_projected_schema_tree_node_callback (43-50)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
  • query (138-144)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)
  • create (417-443)
  • create (417-422)
  • create (471-503)
  • create (471-472)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
  • query (138-144)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (10)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (2)

4-4: Include for size_t is correct and necessary

Adding is appropriate for size_t usage in the public signature.


27-34: Manual verification required: confirm all callbacks updated to the new 4-parameter signature

I ran a heuristic search across .cpp/.hpp/.cc/.h files for any occurrences of ProjectedSchemaTreeNodeCallback…( and specifically for the test utility trivial_new_projected_schema_tree_node_callback(…), but the scan produced no hits for 3-parameter usages nor mismatched definitions.

Please manually verify the following:

  • All aliases and typedefs of NewProjectedSchemaTreeNodeCallbackType and related callback types now expect exactly four parameters (bool, SchemaTree::Node::id_t, std::string_view, size_t).
  • The declaration and definition of trivial_new_projected_schema_tree_node_callback in test utilities each accept four arguments, and no old overload or legacy fixture remains.
  • Every call site—both in production code and tests, including any out-of-tree consumers—has been updated to pass four parameters in the new order.
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)

70-72: Callback extension matches concept and uses [[maybe_unused]] appropriately

Signature aligns with the new concept and avoids unused-parameter warnings. No issues here.

components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)

1-1: Include is appropriate for size_t in lambdas

Good addition to keep test TU self-sufficient.

components/core/src/clp_s/kv_ir_search.cpp (1)

203-205: Lambda signature updated to 4-arg; matches QueryHandler’s concept

Change is correct and keeps QueryHandlerType deduction intact.

components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)

84-86: Header declaration aligned with implementation and concept

Signature matches utils.cpp and the concept; include ensures size_t availability.

components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)

48-50: API surface: sensible defaulting of allow_duplicate_projected_columns.

Defaulting the flag to false preserves source compatibility while exposing the feature gating knob. Looks good.


57-59: Forwarding of new parameter is correct.

The factory forwards both case_sensitive_match and allow_duplicate_projected_columns to QueryHandlerImpl::create in the right order. No issues.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (2)

197-205: Duplicate detection semantics: confirm “by key only” is intentional.

The DuplicateProjectedColumn check keys solely on the raw projection key string, ignoring the type bitmask. If two entries specify the same key with different type masks, they’ll be treated as duplicates when allow_duplicate_projected_columns == false. Confirm this matches intended behaviour.

If you want to permit same-key-with-different-types as distinct when duplicates are otherwise disallowed, we can change the uniqueness key to (key, types).


424-441: LGTM: end-to-end wiring of the duplicate-column flag and index propagation.

create() threads allow_duplicate_projected_columns through, builds the new map, and stores it. The subsequent partial-resolution creation uses the updated map. Looks coherent and self-contained.

* @param arg_0 Whether the schema-tree node is for an auto-generated kv-pair.
* @param arg_1 The ID of the schema-tree node.
* @param arg_2 The projected key path.
* @param arg_3 The index of the projected key in the list of projections.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Clarify semantics of the new index parameter and lifetime of the key view

Please tighten the docs to be explicit that arg_3 is zero-based and refers to the position in the caller-provided projections list (including duplicates when enabled). Also note that arg_2 is a string_view whose lifetime is only guaranteed during the callback invocation.

Apply this doc-only diff:

- * @param arg_2 The projected key path.
- * @param arg_3 The index of the projected key in the list of projections.
+ * @param arg_2 The projected key path as provided in the projections list at arg_3.
+ *              Note: The string_view is only valid for the duration of this call.
+ * @param arg_3 Zero-based index of this projected key in the original projections list
+ *              supplied to QueryHandler::create (duplicates preserved if enabled).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* @param arg_3 The index of the projected key in the list of projections.
* @param arg_2 The projected key path as provided in the projections list at arg_3.
* Note: The string_view is only valid for the duration of this call.
* @param arg_3 Zero-based index of this projected key in the original projections list
* supplied to QueryHandler::create (duplicates preserved if enabled).
🤖 Prompt for AI Agents
In
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
around line 23, update the parameter docs to state that arg_3 is a zero-based
index that refers to the position in the caller-provided projections list (and
therefore counts duplicates when duplicates are enabled) and explicitly document
that arg_2 is a std::string_view whose lifetime is only valid for the duration
of the callback invocation (callers must not retain references beyond the
callback).

auto query{clp_s::search::kql::parse_kql_expression(query_stream)};

auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true)};
auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Create() call sites updated; add coverage for allow_duplicate_projected_columns

Passing the new fourth parameter (false) is correct. To validate the new feature flag end-to-end, add a test path that sets allow_duplicate_projected_columns = true with intentionally duplicated projections and asserts multiple callback invocations with the correct indices.

Would you like me to open a follow-up to add a focused “duplicates in projections” section with synthetic duplicates and explicit index assertions?

Also applies to: 525-525, 620-620

🤖 Prompt for AI Agents
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
around lines 414 (also apply similar changes at 525 and 620): the new Create()
signature added a fourth parameter allow_duplicate_projected_columns which
should be exercised true in tests; add a new test path that calls
QueryHandlerImpl::create(query, {}, true, true) (fourth arg true), supply
intentionally duplicated projection entries in the query, then assert that the
query handler invokes the projection callback multiple times for duplicate
projections and that each callback receives the correct projection indices (use
explicit expected index list and assert call count and order); update the three
call sites (lines ~414, ~525, ~620) to include this new test branch and add
assertions verifying multiple callback invocations and exact indices for
duplicates.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some high-level questions and small fixes. Will do another round when the conversation is finalized.

for (auto const& [key, types] : projections) {
if (unique_projected_columns.contains(key)) {
return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn};
for (size_t index = 0; index < projections.size(); ++index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing <cstddef> for size_t

Comment on lines 32 to 33
std::string_view,
size_t>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do u think if we bind the index and the key into a pair?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong preference either way myself -- if you prefer binding them into a pair I'll make the change.

Comment on lines +121 to +122
using ProjectionMap = std::
unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high-level question: the current implementation will create different column descriptors for the same key. What's the reason for not using the same descriptor to track all project columns? I can see the major concern would be that they might be projecting different types of node values. Are there any other concerns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The major one is projections with the same key name but different types, as you say. Generally though it's also unclear to me that sharing column descriptors for keys with the same name would give us any measurable benefits, so I prefer the simpler solution of just not sharing them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, make sense.

Comment on lines +501 to +503
auto const original_key_and_index_it = m_projected_column_to_original_key_and_index.find(col);
if (m_projected_column_to_original_key_and_index.end() != original_key_and_index_it) {
auto const& [original_key, projected_index] = original_key_and_index_it->second;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto const original_key_and_index_it = m_projected_column_to_original_key_and_index.find(col);
if (m_projected_column_to_original_key_and_index.end() != original_key_and_index_it) {
auto const& [original_key, projected_index] = original_key_and_index_it->second;
if (m_projected_column_to_original_key_and_index.contains(col)) {
auto const& [original_key, projected_index]
= m_projected_column_to_original_key_and_index.at(col);

I think our guideline prefers this way more than using find. With the compiler optimization it should be equivalent to the find form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? Older versions of g++ definitely don't optimize this away, and briefly playing around with this in godbolt recent versions of g++ still don't seem to optimize this away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think u're right. Tried both ways on the latest GCC and clang. Using contains + at involves two hash lookups, while using find + iterator comparison is only one. From the generated assembly, it seems like they can be manually optimized, as the first hash lookup dominates the second. But I guess it's also fair for the compiler not to aggressively look into this case as:

  • The user-defined hash function may have side effects.
  • at involves a throw on bad keys.

Anyway, I think the take-away is we should use find for better performance.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
components/core/src/clp_s/kv_ir_search.cpp (1)

212-218: Be explicit about allow_duplicate_projected_columns in the create(...) call

Although the 5th parameter defaults to false, being explicit here makes intent clear and future-proofs against default changes.

             QueryHandlerType::create(
                     trivial_new_projected_schema_tree_node_callback,
                     std::move(query),
                     {},
-                    false == command_line_arguments.get_ignore_case()
+                    false == command_line_arguments.get_ignore_case(),
+                    false
             )
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)

525-546: Add a focused duplicates test to assert multiple callbacks and exact indices

To validate the PR objective end-to-end, add a test path that:

  • constructs projections with intentional duplicates
  • sets allow_duplicate_projected_columns = true
  • asserts that the callback fires once per duplicate with the correct original indices

Sketch (add as a new SECTION near this block):

SECTION("projections with duplicates trigger multiple callbacks with correct indices") {
    auto null_query = std::shared_ptr<clp_s::search::ast::Expression>{};
    // Start from resolvable projections and inject duplicates of the first two entries
    auto projections_with_dups = resolvable_projections;
    REQUIRE(projections_with_dups.size() >= 2);
    auto const dup0 = projections_with_dups[0];
    auto const dup1 = projections_with_dups[1];
    projections_with_dups.push_back(dup0);
    projections_with_dups.push_back(dup1);

    // Build expected indices for each key path (vector of zero-based positions in the original list)
    std::map<std::string, std::vector<size_t>> expected_indices;
    for (size_t i = 0; i < projections_with_dups.size(); ++i) {
        expected_indices[projections_with_dups[i].first].push_back(i);
    }

    auto query_handler_impl_result = QueryHandlerImpl::create(
            null_query, projections_with_dups, true, true);
    REQUIRE_FALSE(query_handler_impl_result.has_error());
    auto& query_handler_impl = query_handler_impl_result.value();

    std::map<std::string, std::vector<size_t>> actual_indices;
    auto cb = [&](bool /*is_auto_gen*/,
                  SchemaTree::Node::id_t /*node_id*/,
                  std::pair<std::string_view, size_t> key_and_index)
            -> ystdlib::error_handling::Result<void> {
        actual_indices[std::string{key_and_index.first}].push_back(key_and_index.second);
        return ystdlib::error_handling::success();
    };

    // Drive discovery through all nodes (same as the surrounding test)
    SchemaTree::Node::id_t node_id{1};
    for (auto const& locator : locators) {
        REQUIRE_FALSE(query_handler_impl
                              .update_partially_resolved_columns(true, locator, node_id, cb)
                              .has_error());
        REQUIRE_FALSE(query_handler_impl
                              .update_partially_resolved_columns(false, locator, node_id, cb)
                              .has_error());
        ++node_id;
    }

    // For keys that were projected, the callback must have been invoked for each index
    for (auto const& [key, exp] : expected_indices) {
        // Only assert for keys that resolve to at least one node in this schema
        if (expected_resolved_projections.contains(key)) {
            REQUIRE(actual_indices.contains(key));
            auto got = actual_indices[key];
            std::sort(got.begin(), got.end());
            auto exp_sorted = exp;
            std::sort(exp_sorted.begin(), exp_sorted.end());
            REQUIRE(got == exp_sorted);
        }
    }
}

This directly validates index preservation and duplicate callback emission.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)

468-474: Rename shadowing loop variable for clarity.

The loop variable node_id shadows the function parameter node_id. Rename the loop binding to avoid confusion during maintenance.

-    for (auto const [node_id, token_it] : new_partial_resolutions) {
-        auto [it, inserted] = partial_resolutions_to_update.try_emplace(
-                node_id,
+    for (auto const [dst_node_id, token_it] : new_partial_resolutions) {
+        auto [it, inserted] = partial_resolutions_to_update.try_emplace(
+                dst_node_id,
                 std::vector<ColumnDescriptorTokenIterator>{}
         );
         it->second.emplace_back(token_it);
     }
♻️ Duplicate comments (11)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (1)

23-23: Tighten arg_2 docs: pair semantics, index base, and string_view lifetime

Please be explicit about:

  • first: projected key path; lifetime of the string_view (valid only during callback)
  • second: zero-based index into the original projections list; duplicates preserved when enabled

Doc-only diff:

- * @param arg_2 The projected key path and its original index in the list of projections.
+ * @param arg_2 A pair consisting of:
+ *              - first: projected key path as provided in the original projections list.
+ *                        Note: the std::string_view is only guaranteed to be valid for the
+ *                        duration of this callback; callers must not retain it.
+ *              - second: zero-based index of this projected key in the original projections list
+ *                        supplied to QueryHandler::create (duplicates preserved when
+ *                        allow_duplicate_projected_columns == true).
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)

532-546: Destructure and assert index bounds for clarity and early failure

Improve readability by naming the pair elements and assert the index is within the original projections list.

-            = [&](
-                      bool is_auto_gen,
-                      SchemaTree::Node::id_t node_id,
-                      std::pair<std::string_view, size_t> key_and_index
-              ) -> ystdlib::error_handling::Result<void> {
+            = [&](
+                      bool is_auto_gen,
+                      SchemaTree::Node::id_t node_id,
+                      std::pair<std::string_view, size_t> projected_key_path_and_index
+              ) -> ystdlib::error_handling::Result<void> {
         REQUIRE((is_auto_generated == is_auto_gen));
-        auto [column_it, column_inserted] = actual_resolved_projections.try_emplace(
-                std::string{key_and_index.first},
+        auto const& [projected_key_path, projected_key_index] = projected_key_path_and_index;
+        REQUIRE(projected_key_index < projections.size());
+        auto [column_it, column_inserted] = actual_resolved_projections.try_emplace(
+                std::string{projected_key_path},
                 std::set<SchemaTree::Node::id_t>{}
         );

414-415: Exercise allow_duplicate_projected_columns in this test as well

You currently pass false for allow_duplicate_projected_columns. To cover both behaviours, parameterize it with GENERATE(false, true). This won’t affect existing assertions since this test doesn’t inject duplicate projections.

-    auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)};
+    auto const allow_duplicate_projected_columns = GENERATE(false, true);
+    auto query_handler_impl_result{
+            QueryHandlerImpl::create(query, {}, true, allow_duplicate_projected_columns)};

525-526: Also parameterize allow_duplicate_projected_columns here

Same rationale as above; this test doesn’t create duplicates, so behaviour remains unchanged while adding coverage.

-    auto query_handler_impl_result{QueryHandlerImpl::create(null_query, projections, true, false)};
+    auto const allow_duplicate_projected_columns = GENERATE(false, true);
+    auto query_handler_impl_result{
+            QueryHandlerImpl::create(null_query, projections, true, allow_duplicate_projected_columns)};

621-623: Parameterize allow_duplicate_projected_columns in helper as well

Keep consistency across tests and allow running the same evaluation paths with duplicates enabled.

-        auto query_handler_impl_result{QueryHandlerImpl::create(query, {}, true, false)};
+        auto const allow_duplicate_projected_columns = GENERATE(false, true);
+        auto query_handler_impl_result{
+                QueryHandlerImpl::create(query, {}, true, allow_duplicate_projected_columns)};
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)

72-84: Clarify pair semantics and string_view lifetime in the test helper doc

Add detail so test authors don’t accidentally retain the view.

  * Trivial implementation of `NewProjectedSchemaTreeNodeCallback` that always return success without
  * doing anything.
  * @param is_auto_generated
  * @param node_id
- * @param projected_key_path_and_index
+ * @param projected_key_path_and_index A pair of:
+ *        - first: projected key path as provided in the projections list at the given index.
+ *                 Note: std::string_view is only guaranteed to be valid during the callback;
+ *                 tests must not store it; copy to std::string if needed.
+ *        - second: zero-based index of this projected key in the original projections vector
+ *                 (duplicates preserved when enabled).
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (2)

4-4: Explicitly including resolves prior concern.

Good fix; this header uses size_t in a public alias and now includes explicitly.


143-145: Signature change acknowledged; verify all direct impl call sites are updated.

Public wrapper appears to accept and forward the new flag. As a safety net, scan for any direct uses of QueryHandlerImpl::create that might still be calling the 3-arg overload.

#!/bin/bash
# Find direct calls to QueryHandlerImpl::create across the tree with context
rg -nP -C3 '\bQueryHandlerImpl::create\s*\(' --type=cpp --type=hpp
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (3)

3-3: Including in the implementation is appropriate.

This file uses size_t; the explicit include avoids relying on transitive headers.


243-253: Param threading into create_initial_partial_resolutions is correct.

Iterating projected_column_to_original_key_and_index by descriptor pointer is appropriate and supports duplicates (distinct descriptors). Also, adopting a placeholder for the unused mapped value improves readability.


194-197: Pre-reserve containers and use try_emplace/in-place construction for minor wins.

Avoid rehash/realloc under typical sizes and construct mapped values directly to skip temporaries and copies. Also, when deduplication is enabled, combine contains+emplace into a single emplace that reports insertion.

-    std::unordered_set<std::string_view> unique_projected_columns;
-    std::vector<std::shared_ptr<ColumnDescriptor>> projected_columns;
-    QueryHandlerImpl::ProjectionMap projected_column_to_original_key_and_index;
+    std::unordered_set<std::string_view> unique_projected_columns;
+    std::vector<std::shared_ptr<ColumnDescriptor>> projected_columns;
+    QueryHandlerImpl::ProjectionMap projected_column_to_original_key_and_index;
+    unique_projected_columns.reserve(projections.size());
+    projected_columns.reserve(projections.size());
+    projected_column_to_original_key_and_index.reserve(projections.size());
@@
-    for (size_t index{0}; index < projections.size(); ++index) {
+    for (size_t index{0}; index < projections.size(); ++index) {
         auto const& [key, types] = projections[index];
-        if (false == allow_duplicate_projected_columns) {
-            if (unique_projected_columns.contains(key)) {
-                return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn};
-            }
-            unique_projected_columns.emplace(key);
-        }
+        if (false == allow_duplicate_projected_columns) {
+            auto [_, inserted] = unique_projected_columns.emplace(key);
+            if (false == inserted) {
+                return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn};
+            }
+        }
@@
-            projected_column_to_original_key_and_index.emplace(
-                    column_descriptor.get(),
-                    std::make_pair(key, index)
-            );
+            projected_column_to_original_key_and_index.try_emplace(
+                    column_descriptor.get(), key, index);
@@
-    return {std::move(projected_columns), std::move(projected_column_to_original_key_and_index)};
+    return {std::move(projected_columns),
+            std::move(projected_column_to_original_key_and_index)};

Also applies to: 198-205, 230-233, 240-241

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 873f122 and a91aaf1.

📒 Files selected for processing (7)
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (3 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (8 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (8 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1 hunks)
  • components/core/src/clp_s/kv_ir_search.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
🧠 Learnings (12)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2025-05-02T23:49:30.798Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp_s/kv_ir_search.cpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2024-11-12T18:47:03.828Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
📚 Learning: 2024-11-12T18:56:31.067Z
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
  • components/core/src/clp_s/kv_ir_search.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
📚 Learning: 2025-05-07T16:56:53.901Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
🧬 Code graph analysis (3)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (7)
  • create (418-444)
  • create (418-423)
  • create (472-504)
  • create (472-473)
  • is_auto_generated (118-118)
  • is_auto_generated (346-354)
  • is_auto_generated (346-346)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)
  • new_projected_schema_tree_node_callback (44-51)
  • is_auto_generated (85-89)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)
  • new_projected_schema_tree_node_callback (44-51)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)
  • query (139-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
🔇 Additional comments (10)
components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp (2)

4-4: Includes for size_t and std::pair are correct and sufficient

and are the right additions for size_t and std::pair in this header. No issues.

Also applies to: 7-7


27-33: Sanity-check all downstream instantiations of the new callback signature

I wasn’t able to automatically spot any “stale” uses of a lone std::string_view in your recent sweep, but we also didn’t surface the actual call-sites where the callback is passed in. To be safe, please manually verify every instantiation of the NewProjectedSchemaTreeNodeCallbackType callback—especially where you call
QueryHandler::create(…) or invoke
QueryHandlerImpl::update_partially_resolved_columns(…)
or
QueryHandlerImpl::handle_column_resolution_on_new_schema_tree_node(…)—to ensure each site now passes a
std::pair<std::string_view, size_t> rather than a raw std::string_view.

Key files to review:

  • components/core/src/clp/ffi/ir_stream/search/NewProjectedSchemaTreeNodeCallbackReq.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp

Double-check any FFI bindings or test helpers that wrap these calls to confirm all lambdas/func-ptrs match the updated (bool, id_t, std::pair<string_view,size_t>) signature.

components/core/src/clp_s/kv_ir_search.cpp (1)

200-205: Callback signature update is correct

Switching the third parameter to std::pair<std::string_view, size_t> is aligned with the updated concept; [[maybe_unused]] annotations are appropriate.

components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)

67-73: LGTM: test helper updated to the new pair-based signature

The trivial callback continues to return success and now accepts (path, index). No functional concerns.

components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)

1-1: Include is appropriate

Needed for size_t in test code. Looks good.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (3)

132-133: API doc addition for allow_duplicate_projected_columns looks correct.

The parameter and behaviour are documented where users will see it.


299-301: Wiring the new ProjectionMap through ctor and member is consistent.

Passing and storing the map of (original_key, index) is correct and keeps ownership/lifetime simple via std::string copies.

Also applies to: 313-316, 397-398


500-508: Avoid constructing a transient std::string when invoking the callback (lifetime and perf).

Currently you build a std::pair<std::string, size_t> (via std::make_pair) and rely on conversion to std::pair<std::string_view, size_t>. That creates a short-lived std::string which the std::string_view will reference only during the call; if the callback stores the view, it risks dangling. Also it adds an unnecessary allocation/copy.

Construct the target pair in-place from the stable std::string stored in the map to yield a std::string_view that remains valid for the lifetime of QueryHandlerImpl.

-        YSTDLIB_ERROR_HANDLING_TRYV(new_projected_schema_tree_node_callback(
-                is_auto_generated,
-                node_id,
-                std::make_pair(original_key, projected_index)
-        ));
+        YSTDLIB_ERROR_HANDLING_TRYV(new_projected_schema_tree_node_callback(
+                is_auto_generated,
+                node_id,
+                std::pair<std::string_view, size_t>{original_key, projected_index}
+        ));
⛔ Skipped due to learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function `trivial_new_projected_schema_tree_node_callback` in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (2)

58-59: Docs and signatures updated consistently for duplicate projections and the new map.

The added allow_duplicate_projected_columns parameter and the updated ProjectionMap usage are coherent across declarations.

Also applies to: 69-76, 91-97


418-444: Factory wires the new flag and map end-to-end correctly.

QueryHandlerImpl::create now preprocesses, builds projections with the new flag, seeds partial resolutions using the updated map, and forwards everything to the ctor. Looks good.

Comment on lines 235 to 237
} catch (std::exception const& e) {
return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Remove unused exception variable to silence -Wunused warnings.

The exception object isn’t used.

-        } catch (std::exception const& e) {
+        } catch (std::exception const&) {
             return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure};
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (std::exception const& e) {
return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure};
}
} catch (std::exception const&) {
return ErrorCode{ErrorCodeEnum::ProjectionColumnDescriptorCreationFailure};
}
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp around
lines 235-237, the catch clause declares an exception variable 'e' that is never
used causing -Wunused warnings; replace "catch (std::exception const& e)" with
"catch (std::exception const&)" (or use "catch (...)" if you intend to catch all
exceptions) and leave the return statement as-is to silence the warning.

Comment on lines +121 to +122
using ProjectionMap = std::
unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Name the pair type to improve readability.

Using std::pair<std::string, size_t> directly obscures intent. A named alias (or a tiny struct) clarifies semantics and reduces .first/.second leakage elsewhere.

Apply this diff near the type section:

-    using ProjectionMap = std::
-            unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>;
+    using OriginalKeyAndIndex = std::pair<std::string, size_t>;
+    using ProjectionMap = std::
+            unordered_map<clp_s::search::ast::ColumnDescriptor*, OriginalKeyAndIndex>;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
using ProjectionMap = std::
unordered_map<clp_s::search::ast::ColumnDescriptor*, std::pair<std::string, size_t>>;
// Name the pair type to improve readability
using OriginalKeyAndIndex = std::pair<std::string, size_t>;
using ProjectionMap = std::
unordered_map<clp_s::search::ast::ColumnDescriptor*, OriginalKeyAndIndex>;
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp around
lines 121-122, the map value type std::pair<std::string, size_t> is unnamed
which obscures meaning; introduce a named type (either a using alias like
ProjectionInfo = std::pair<std::string, size_t> or preferably a small struct
e.g. struct ProjectionInfo { std::string name; size_t index; };) and change
ProjectionMap to use that named type; update all downstream uses to replace
.first/.second with the explicit field names (e.g., .name/.index) for clarity.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title lgtm.

@gibber9809 gibber9809 merged commit a8444ff into y-scope:main Sep 3, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants