Skip to content

Conversation

@LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Nov 5, 2025

Description

This is one of PR series to address #1541.

This PR adds a new EncodedTextAst implementation, using StringBlob to store dictionary variables and logtypes. Besides storage, it implements a generic decoding method similar to this as a member method to allow text decoding. Similarly, it implements a to_string method similar to this to decode and unparse the encoded text AST to its string form.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Add unit tests to cover basic decoding and error handling behaviors.

Summary by CodeRabbit

  • New Features

    • Added a type-safe API to parse and render encoded text with embedded integers, floats and dictionary placeholders, plus optional unescaping.
    • Added a method to append strings to the string storage used by the encoded-text API.
  • Tests

    • Added comprehensive unit tests covering decoding, round-trip fidelity, boundary cases and explicit error scenarios across supported encoded-variable types.
  • Bug Fixes

    • Clarified user-facing error messages for missing components and trailing-escape conditions.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner November 5, 2025 22:14
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds a templated EncodedTextAst type with decode/to_string, a new EncodedTextAst error enum and category, a concept restricting encoded-variable types, StringBlob::append, new Catch2 tests, and updates CMake to include the new sources.

Changes

Cohort / File(s) Summary
EncodedTextAst Core API
components/core/src/clp/ffi/EncodedTextAst.hpp
New templated class EncodedTextAst<encoded_variable_t> with create, get_logtype, templated decode<unescape_logtype>(...), and to_string; defines handler concepts for constant/int/float/dict handlers; stores encoded vars, StringBlob and dict var count.
Error Type & Category
components/core/src/clp/ffi/EncodedTextAstError.hpp, components/core/src/clp/ffi/EncodedTextAstError.cpp
New EncodedTextAstErrorEnum (MissingDictVar, MissingEncodedVar, MissingLogtype, UnexpectedTrailingEscapeCharacter), alias EncodedTextAstError, and specialized error category with human-readable messages.
IR Type Constraint
components/core/src/clp/ir/types.hpp
New concept clp::ir::EncodedVariableTypeReq<encoded_variable_t> constraining encoded-variable types to eight_byte_encoded_variable_t or four_byte_encoded_variable_t.
StringBlob API
components/core/src/clp/ffi/StringBlob.hpp
Added auto append(std::string_view str) -> void to append data and update internal offsets.
Tests
components/core/src/clp/ffi/test/test_EncodedTextAst.cpp, components/core/src/clp/ffi/test/test_StringBlob.cpp
New Catch2 test test_EncodedTextAst.cpp exercising decoding, round-trip and error conditions across two encoded-variable types; test_StringBlob.cpp wrapped in clp::ffi::test namespace.
Build Configuration
components/core/CMakeLists.txt
Appended new source files to unit test sources: src/clp/ffi/EncodedTextAst.hpp, src/clp/ffi/EncodedTextAstError.cpp, src/clp/ffi/EncodedTextAstError.hpp, src/clp/ffi/test/test_EncodedTextAst.cpp.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant EncodedTextAst
    participant StringBlob
    participant Handlers

    Caller->>EncodedTextAst: create(encoded_vars, string_blob)
    activate EncodedTextAst
    Caller->>EncodedTextAst: decode<unescape_logtype>(constant,intVar,floatVar,dictVar)
    EncodedTextAst->>StringBlob: read logtype & placeholders
    loop per segment/placeholder
        alt Constant
            EncodedTextAst->>Handlers: constant(segment)
        else Integer
            EncodedTextAst->>Handlers: intVar(encoded_var)
        else Float
            EncodedTextAst->>Handlers: floatVar(encoded_var)
        else Dictionary
            EncodedTextAst->>Handlers: dictVar(dict_index)
        else Escape
            EncodedTextAst->>EncodedTextAst: validate escape (no trailing)
        end
    end
    EncodedTextAst->>EncodedTextAst: validate counts (encoded/dict)
    alt OK
        EncodedTextAst->>Caller: Result<void>::ok()
    else Error
        EncodedTextAst->>Caller: Result<void>::err(EncodedTextAstError)
    end
    deactivate EncodedTextAst
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect templated decode implementation and handler concept signatures in components/core/src/clp/ffi/EncodedTextAst.hpp (edge cases: trailing escape handling, correct consumption/counting of encoded and dict vars).
  • Verify enum values and messages in components/core/src/clp/ffi/EncodedTextAstError.cpp / .hpp.
  • Validate StringBlob::append modifies offsets correctly and is used consistently in tests.
  • Review components/core/src/clp/ffi/test/test_EncodedTextAst.cpp for thoroughness and correctness across both encoded-variable types.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main change: adding a new EncodedTextAst implementation that uses StringBlob, which aligns with the key additions across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e59609d and bd72701.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/EncodedTextAst.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
🧠 Learnings (8)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2025-06-24T20:13:46.758Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-24T20:13:46.758Z
Learning: When users ask CodeRabbit to create an issue after providing suggestions, they want a GitHub issue created with the high-level requirements and context, not specific code implementations.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/EncodedTextAst.hpp (1)
components/core/src/clp/ffi/StringBlob.hpp (3)
  • nodiscard (21-21)
  • nodiscard (28-35)
  • nodiscard (44-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec509bf and 3380975.

📒 Files selected for processing (7)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp/ffi/EncodedTextAst.hpp (1 hunks)
  • components/core/src/clp/ffi/EncodedTextAstError.cpp (1 hunks)
  • components/core/src/clp/ffi/EncodedTextAstError.hpp (1 hunks)
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1 hunks)
  • components/core/src/clp/ffi/test/test_StringBlob.cpp (3 hunks)
  • components/core/src/clp/ir/types.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ir/types.hpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
🧠 Learnings (17)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/ir/types.hpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ir/types.hpp
  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

  • components/core/src/clp/ir/types.hpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-26T19:12:09.102Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:131-134
Timestamp: 2024-11-26T19:12:09.102Z
Learning: In the error handling system (`components/core/src/clp/error_handling/ErrorCode.hpp`), the `ErrorCode` class is intentionally designed to wrap the `ErrorCodeEnum`, not to use the enum directly. This wrapper provides a general interface for adding customized error enums.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.hpp
  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2024-11-26T19:16:19.933Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-26T19:10:05.051Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:34-34
Timestamp: 2024-11-26T19:10:05.051Z
Learning: In the `ErrorCategory` template class in `ErrorCode.hpp`, the `name()` and `message(ErrorCodeEnum)` methods should not be declared as pure virtual functions because the class relies on template specializations, and making them pure virtual would make the class abstract, which is not intended.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-11-29T22:51:00.355Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:141-142
Timestamp: 2024-11-29T22:51:00.355Z
Learning: In 'components/core/tests/test-clp_s-end_to_end.cpp', using `WIFEXITED` and `WEXITSTATUS` with the return value of `std::system` is acceptable for checking the exit status of system commands, as per the C++ documentation.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
📚 Learning: 2024-12-10T16:56:33.545Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.

Applied to files:

  • components/core/src/clp/ffi/test/test_StringBlob.cpp
🧬 Code graph analysis (2)
components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1)
components/core/src/clp/ffi/EncodedTextAst.hpp (5)
  • nodiscard (78-85)
  • nodiscard (97-99)
  • nodiscard (130-145)
  • nodiscard (162-261)
  • encoded_vars (79-80)
components/core/src/clp/ffi/test/test_StringBlob.cpp (1)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (44-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (6)
components/core/src/clp/ffi/test/test_StringBlob.cpp (2)

12-12: LGTM! Good namespace organization.

The namespace wrapper properly organizes the test code in the clp::ffi::test namespace, following good practices for test code organization.

Also applies to: 47-47


14-14: LGTM! Unqualified names resolve correctly.

The change to unqualified StringBlob and BufferReader is correct. From the clp::ffi::test namespace, both names properly resolve to their fully qualified versions (clp::ffi::StringBlob and clp::BufferReader respectively) through namespace lookup.

Also applies to: 26-26

components/core/CMakeLists.txt (1)

484-491: Nice job wiring the new EncodedTextAst sources into the test target.
With these entries in SOURCE_FILES_unitTest, the new implementation and its tests will actually build under the unit test binary. Looks good.

components/core/src/clp/ffi/EncodedTextAstError.hpp (1)

12-20: Enum surface looks spot on.
The enum values and error-code alias align cleanly with the decode paths and ystdlib’s error-handling conventions. Nicely done.

components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1)

73-191: Great coverage across success and failure paths.
Exercising both encoded-variable widths, real decoding, and each distinct error branch gives solid confidence in the new API. Thanks for the thorough tests.

components/core/src/clp/ffi/EncodedTextAst.hpp (1)

130-145: to_string neatly reuses the generic decode path.
Rebuilding the string via the handler-based decode keeps the code path single-sourced while honouring the unescape semantics. Looks good.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3380975 and 332f586.

📒 Files selected for processing (2)
  • components/core/src/clp/ffi/EncodedTextAst.hpp (1 hunks)
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
🧠 Learnings (8)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1)
components/core/src/clp/ffi/EncodedTextAst.hpp (5)
  • nodiscard (78-85)
  • nodiscard (97-99)
  • nodiscard (130-145)
  • nodiscard (162-261)
  • encoded_vars (79-80)
🔇 Additional comments (10)
components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (4)

1-64: LGTM! Well-structured test helper.

The create_encoded_text_ast_from_string helper function correctly encodes the input string, loads dictionary variables and logtype into the StringBlob, and constructs the EncodedTextAst. The logic properly handles the dict var bounds and aligns with the expected structure where dict vars precede the logtype.


73-109: LGTM! Comprehensive round-trip test.

This test section thoroughly exercises the encoding and decoding pipeline with a variety of variable types (small/medium/large integers, various float formats, dictionary variables) and embedded placeholder characters. The round-trip verification ensures bit-accurate decoding.


111-118: LGTM! Static text test is correct.

The test correctly verifies that static text (no variables) is preserved through encoding and decoding, and that the logtype equals the original text.


120-191: LGTM! Thorough error-handling coverage.

The error test sections comprehensively cover all error cases defined in EncodedTextAstErrEnum:

  • Missing logtype when StringBlob is empty
  • Missing encoded variables (Integer, Float) and dictionary variables using Catch2 generators
  • Trailing escape character without a following character

The assertions correctly verify both the error condition and the specific error enum value. Based on learnings, exception handling for Catch2 tests can be omitted as the framework captures exceptions.

components/core/src/clp/ffi/EncodedTextAst.hpp (6)

21-61: LGTM! Well-defined handler concepts.

The concept definitions clearly specify the signature requirements for each handler type (constant, int var, float var, dict var). The use of std::same_as<void> ensures handlers return void, which is appropriate for side-effect-only callbacks.


78-85: LGTM! Safe factory function with validation.

The factory function correctly validates that string_blob contains at least one string (the logtype) before constructing the object. This ensures get_logtype() at Line 98 and the constructor logic at Line 152 will always work safely.


97-99: LGTM! Safe unchecked .value() call.

The .value() call is safe because the factory function ensures get_num_strings() >= 1, and m_num_dict_vars is computed as get_num_strings() - 1 in the constructor, so the index is always valid.


130-145: LGTM! Clean to_string implementation.

The method correctly uses the decode template with unescape_logtype = true and lambda handlers to build the decoded string. The use of YSTDLIB_ERROR_HANDLING_TRYV macro properly propagates errors from the decode method.


176-250: LGTM! Robust decode loop with proper bounds checking.

The decode implementation correctly:

  • Iterates through the logtype character by character
  • Handles all placeholder types (Float, Integer, Dictionary, Escape)
  • Validates bounds before accessing m_encoded_vars (lines 185, 199) and m_string_blob (line 213)
  • Handles escape characters by checking for trailing escape (line 223) and incrementing curr_pos to skip the escaped character (line 242)
  • Conditionally emits constant text segments based on the unescape_logtype template parameter

252-260: LGTM! Proper remainder handling.

After the loop, any remaining constant text after the last placeholder is correctly emitted via constant_handler. The implementation returns success after completing the decode.

Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial review, mostly looks good. Besides the comments I left I'm also wondering about the [] usage in the decode method -- I'm personally fine with it since we're doing bounds checking ourselves before every access, but I'm wondering if we should add a comment since I think our coding standard prefers .at().

Comment on lines 163 to 166
EncodedTextAstConstantHandlerReq auto&& constant_handler,
EncodedTextAstIntVarHandlerReq<encoded_variable_t> auto&& int_var_handler,
EncodedTextAstFloatVarHandlerReq<encoded_variable_t> auto&& float_var_handler,
EncodedTextAstDictVarHandlerReq auto&& dict_var_handler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why rvalue references here? I think passing callbacks by value is more common + it's what the previous implementation does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will switch back to use passing-by-value. However, do u think it makes more sense to update our code base to use rvalue ref for callbacks? This makes the callback object more flexible: the callback can be not just a lambda but also an object that holds stateful values. Using rvalue allows us to pass the object into the call and use it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai can u create an issue to track this codebase-level update?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track the callback parameter passing strategy discussion

gh issue create \
  --title "Evaluate using rvalue references for callback parameters codebase-wide" \
  --body "## Context

During the review of PR #1561, a discussion arose regarding the parameter passing strategy for callbacks in the codebase.

Currently, most of the codebase passes callbacks by value. The PR initially used rvalue references (\`auto&&\`) for callback parameters, which offers potential benefits:

- **Increased flexibility**: Allows callbacks to be not just lambdas, but also objects that hold stateful values
- **Efficiency**: Using rvalue references allows passing objects into the call and using them later without unnecessary copies

## Proposal

Evaluate whether it would be beneficial to update the codebase to consistently use rvalue references for callback parameters instead of pass-by-value.

## References

- Discussion: https://github.com/y-scope/clp/pull/1561#discussion_r2500321748
- Related PR: https://github.com/y-scope/clp/pull/1561

Requested by: @LinZhihao-723" \
  --assignee LinZhihao-723 \
  --repo y-scope/clp

Length of output: 42


@LinZhihao-723, I've created issue #1570 to track the evaluation of using rvalue references for callback parameters codebase-wide. The issue includes the context from this discussion, the rationale for the potential change, and references back to this PR and comment thread.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-24T20:13:46.758Z
Learning: When users ask CodeRabbit to create an issue after providing suggestions, they want a GitHub issue created with the high-level requirements and context, not specific code implementations.

Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.

Comment on lines 19 to 25
return "An encoded variable is missing from the `EncodedTextAst`";
case EncodedTextAstErrEnum::MissingDictVar:
return "A dictionary variable is missing from the `EncodedTextAst`";
case EncodedTextAstErrEnum::MissingLogtype:
return "The logtype is missing from the `EncodedTextAst`";
case EncodedTextAstErrEnum::UnexpectedTrailingEscapeCharacter:
return "Unexpected escape character without escaped value at the end of the logtype";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "An encoded variable is missing from the `EncodedTextAst`";
case EncodedTextAstErrEnum::MissingDictVar:
return "A dictionary variable is missing from the `EncodedTextAst`";
case EncodedTextAstErrEnum::MissingLogtype:
return "The logtype is missing from the `EncodedTextAst`";
case EncodedTextAstErrEnum::UnexpectedTrailingEscapeCharacter:
return "Unexpected escape character without escaped value at the end of the logtype";
return "An encoded variable is missing from the `EncodedTextAst`.";
case EncodedTextAstErrEnum::MissingDictVar:
return "A dictionary variable is missing from the `EncodedTextAst`.";
case EncodedTextAstErrEnum::MissingLogtype:
return "The logtype is missing from the `EncodedTextAst`.";
case EncodedTextAstErrEnum::UnexpectedTrailingEscapeCharacter:
return "Unexpected escape character without escaped value at the end of the logtype.";

We end the error messages with periods for our other error categories, so editing this for consistency.

Copy link
Member Author

@LinZhihao-723 LinZhihao-723 Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose that we try to avoid adding periods unless it's a logging statement. Two benefits of doing so:

  1. In a logging statement we usually do:
  LOG(fmt("{}: {}", err.category().name(), err.message()));

While in other logging statement we do:

  LOG(fmt("{} is not valid.", addr))

In the first statement we're assuming the error message will end with a period. But this becomes hard to manage as some error codes don't end with . while others do (e.g., std::errc doesn't end with .: https://en.cppreference.com/w/cpp/error/errc.html)

  1. Not including the period in the error message allows us to format the message in the middle of a sentence if needed.

auto const length{static_cast<size_t>(dict_var_bounds[i + 1]) - begin};
auto const dict_var{input.substr(begin, length)};
BufferReader reader{dict_var.data(), dict_var.size()};
REQUIRE_FALSE(string_blob.read_from(reader, length).has_value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to also add an emplace_back(std::string_view) signature to StringBlob? Besides tests I think it might also end up being useful for the encode path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I doubt we're gonna need this in the encode path: I prefer to preserve a list of string views and encode them directly, since the lifetime of the input message should outlive the encode method and we can avoid a memory copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, leave it as is then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

text += " and a medium float " + var_strs[var_ix++];
text += " and a weird float " + var_strs[var_ix++];
text += " and a string with numbers " + var_strs[var_ix++];
text += " and another string with numbers " + var_strs[var_ix++];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this sort of mimics the long test case from test-EncodedVariableInterpreter.cpp could we also add the case with a variable that contains a backslash? Figure we might as well include that edge case since we're doing all of these other cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Also modernize the text string construction.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd6d6f7 and d0910a4.

📒 Files selected for processing (2)
  • components/core/src/clp/ffi/EncodedTextAst.hpp (1 hunks)
  • components/core/src/clp/ffi/EncodedTextAstError.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
🧠 Learnings (9)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-26T19:16:19.933Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-11-26T19:10:05.051Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:34-34
Timestamp: 2024-11-26T19:10:05.051Z
Learning: In the `ErrorCategory` template class in `ErrorCode.hpp`, the `name()` and `message(ErrorCodeEnum)` methods should not be declared as pure virtual functions because the class relies on template specializations, and making them pure virtual would make the class abstract, which is not intended.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-11-26T19:12:09.102Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:131-134
Timestamp: 2024-11-26T19:12:09.102Z
Learning: In the error handling system (`components/core/src/clp/error_handling/ErrorCode.hpp`), the `ErrorCode` class is intentionally designed to wrap the `ErrorCodeEnum`, not to use the enum directly. This wrapper provides a general interface for adding customized error enums.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (11)
components/core/src/clp/ffi/EncodedTextAstError.cpp (3)

1-8: LGTM!

The includes and type alias setup follow the established error handling patterns in the codebase.


10-13: LGTM!

The error category name correctly identifies this as an EncodedTextAst error domain, distinguishing it from other error categories.


15-29: LGTM!

The error messages are clear, consistent, and cover all enum cases. The decision to omit trailing periods aligns with the team's preference for composable error messages.

components/core/src/clp/ffi/EncodedTextAst.hpp (8)

1-19: LGTM!

Header guards and includes are appropriate for this template class definition.


21-61: LGTM!

The handler concepts clearly define the required signatures and leverage C++20 concepts effectively to constrain the decode method's template parameters.


71-85: LGTM!

The factory function provides essential validation that the string blob contains at least the required logtype, preventing construction of invalid instances.


88-94: LGTM!

Explicitly defaulted special member functions with appropriate noexcept specifications follow modern C++ best practices.


97-99: LGTM!

The direct .value() call is safe because the factory function ensures at least one string exists in the blob. The logtype is stored at index m_num_dict_vars which is always valid.


101-122: LGTM!

The method signature uses auto&& forwarding references with concepts, which is the idiomatic C++20 approach for generic callable parameters. This provides perfect forwarding and works efficiently with both lvalue and rvalue handlers without forcing copies.


130-145: LGTM!

The to_string() implementation cleanly composes the decode method with lambda handlers to reconstruct the original text, with proper error propagation.


160-261: LGTM!

The decode implementation correctly handles all placeholder types and escape sequences with appropriate bounds checking and error handling. The logic for skipping escaped characters (line 242) and processing the remainder (lines 252-258) is sound.

@LinZhihao-723
Copy link
Member Author

Did an initial review, mostly looks good. Besides the comments I left I'm also wondering about the [] usage in the decode method -- I'm personally fine with it since we're doing bounds checking ourselves before every access, but I'm wondering if we should add a comment since I think our coding standard prefers .at().

Switched to use .at. Should do this refactoring when copying the code in the first place. My bad.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0910a4 and dbfcd7d.

📒 Files selected for processing (4)
  • components/core/src/clp/ffi/EncodedTextAst.hpp (1 hunks)
  • components/core/src/clp/ffi/EncodedTextAstError.cpp (1 hunks)
  • components/core/src/clp/ffi/EncodedTextAstError.hpp (1 hunks)
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
🧠 Learnings (15)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-26T19:16:19.933Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-11-26T19:10:05.051Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:34-34
Timestamp: 2024-11-26T19:10:05.051Z
Learning: In the `ErrorCategory` template class in `ErrorCode.hpp`, the `name()` and `message(ErrorCodeEnum)` methods should not be declared as pure virtual functions because the class relies on template specializations, and making them pure virtual would make the class abstract, which is not intended.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-10-07T21:38:35.979Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
📚 Learning: 2024-11-26T19:12:09.102Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:131-134
Timestamp: 2024-11-26T19:12:09.102Z
Learning: In the error handling system (`components/core/src/clp/error_handling/ErrorCode.hpp`), the `ErrorCode` class is intentionally designed to wrap the `ErrorCodeEnum`, not to use the enum directly. This wrapper provides a general interface for adding customized error enums.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.cpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
  • components/core/src/clp/ffi/EncodedTextAstError.hpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/ffi/test/test_EncodedTextAst.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAstError.hpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp/ffi/EncodedTextAst.hpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1)
components/core/src/clp/ffi/EncodedTextAst.hpp (5)
  • nodiscard (78-85)
  • nodiscard (97-99)
  • nodiscard (130-145)
  • nodiscard (162-261)
  • encoded_vars (79-80)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
components/core/src/clp/ffi/EncodedTextAstError.hpp (1)

12-20: Nice consistency with the error handling stack.

Enum sizing plus the YSTDLIB_ERROR_HANDLING_MARK_AS_ERROR_CODE_ENUM registration keeps this aligned with the existing error plumbing—looks good to me.

components/core/src/clp/ffi/EncodedTextAstError.cpp (1)

10-28: Clear, user-facing messages.

Distinct category name and concise per-enum text keep the diagnostics easy to trace—nicely done.

components/core/src/clp/ffi/test/test_EncodedTextAst.cpp (1)

70-195: Great coverage breadth.

The template test spanning both encoded variable widths plus the targeted error sections (missing pieces, trailing escape) gives strong confidence in the new decode path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants