refactor(clp): Prepare parsing & search code for deduplication with clp-s: #1138
…dependence on specific clp implementations; Use string_view where possible instead of std::string const&
… future concept interface
…y to EncodedVariableInterpreter
## Walkthrough
This set of changes modernizes and refactors the variable encoding/decoding and dictionary logic in the core CLP components. It introduces generic, templated implementations for variable handling, switches to `std::string_view` for efficient string passing, and moves static placeholder-adding logic from `LogTypeDictionaryEntry` to `EncodedVariableInterpreter`. Several interfaces are updated for flexibility and performance, and large portions of legacy logic are removed or replaced with more generic methods.
## Changes
| Cohort / File(s) | Change Summary |
|---|---|
| **Dictionary Reader/Writer API Modernization**<br>`components/core/src/clp/DictionaryReader.hpp`, `components/core/src/clp/DictionaryWriter.hpp`, `components/core/src/clp/VariableDictionaryWriter.cpp`, `components/core/src/clp/VariableDictionaryWriter.hpp` | Updated method signatures to use `std::string_view` for string parameters, added type aliases for template parameters, switched internal map to `absl::flat_hash_map`, and updated includes. |
| **EncodedVariableInterpreter Major Refactor**<br>`components/core/src/clp/EncodedVariableInterpreter.cpp`, `components/core/src/clp/EncodedVariableInterpreter.hpp` | Removed legacy variable encoding/decoding and dictionary methods from `.cpp`. Introduced generic, templated, and more efficient static methods for encoding/decoding, placeholder handling, and dictionary operations in `.hpp`. Improved error handling and logging. |
| **LogTypeDictionaryEntry Interface and Logic Update**<br>`components/core/src/clp/LogTypeDictionaryEntry.cpp`, `components/core/src/clp/LogTypeDictionaryEntry.hpp` | Updated methods to use `std::string_view`, removed static placeholder-adding methods, and switched calls to `EncodedVariableInterpreter` for placeholder logic. |
| **Query/Grep Integration Update**<br>`components/core/src/clp/Grep.cpp` | Replaced calls to now-removed static methods in `LogTypeDictionaryEntry` with the new static methods in `EncodedVariableInterpreter` for adding variable placeholders. |
| **CMake Build Integration for Abseil**<br>`components/core/src/clp/clg/CMakeLists.txt`, `components/core/src/clp/clo/CMakeLists.txt`, `components/core/src/clp/clp/CMakeLists.txt` | Added `absl::flat_hash_map` as a private library dependency to `clg`, `clo`, and `clp` targets. |
| **Test Header Inclusion Update**<br>`components/core/tests/test-EncodedVariableInterpreter.cpp` | Added includes for `LogTypeDictionaryEntry.hpp`, `VariableDictionaryReader.hpp`, and `VariableDictionaryWriter.hpp` to enable test compilation with updated interfaces. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant EncodedVariableInterpreter
participant LogTypeDictionaryEntry
participant VariableDictionaryWriter
participant VariableDictionaryReader
User->>EncodedVariableInterpreter: encode_and_add_to_dictionary(message, logtype_entry, var_dict, ...)
EncodedVariableInterpreter->>LogTypeDictionaryEntry: parse_next_var(message, ...)
EncodedVariableInterpreter->>VariableDictionaryWriter: add_entry(var, ...)
EncodedVariableInterpreter->>LogTypeDictionaryEntry: add_constant/placeholder(logtype)
EncodedVariableInterpreter-->>User: encoded_vars, var_ids
User->>EncodedVariableInterpreter: decode_variables_into_message(logtype_entry, var_dict, encoded_vars, ...)
EncodedVariableInterpreter->>VariableDictionaryReader: get_entry_matching_value(id)
EncodedVariableInterpreter-->>User: decompressed_msg
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
haiqi96
left a comment
partial review
| #ifndef CLP_LOGTYPEDICTIONARYENTRY_HPP | ||
| #define CLP_LOGTYPEDICTIONARYENTRY_HPP | ||
|
|
||
| #include <string> |
It seems the string is not used in this header.
| size_t& var_begin_pos, | ||
| size_t& var_end_pos, | ||
| std::string& var | ||
| std::string_view& var |
At first glance, it's not very intuitive to return a string_view by reference.
I double-checked the code and I think it will indeed work, since the function updates var so that it refers to the correct portion of the string. So I guess I will approve this.
@kirkrodrigues just want to double check if you also agree.
I agree it would be better to eventually change the interface (in a few other places as well). My reasoning for keeping it mostly the same in this PR, apart from switching to string_view, was to minimize how much the call sites for this code need to change, which simplifies the whole series of PRs.
Yeah, seems fine.
| * @param logtype | ||
| */ | ||
| static void add_dict_var(std::string& logtype) { | ||
| logtype += enum_to_underlying_type(ir::VariablePlaceholder::Dictionary); |
can we use append instead of +=?
Sure -- this was inherited from the old code, but may as well clean it up.
| typename VariableDictionaryReaderType, | ||
| typename VariableDictionaryEntryType = VariableDictionaryReaderType::entry_t> |
Any plan to use a concept to enforce that VariableDictionaryReaderType has an entry_t type?
Yeah that's part of the concept (see here).
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/EncodedVariableInterpreter.hpp(7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer `false == <expression>` rather than `!<expression>`.
Files:
components/core/src/clp/EncodedVariableInterpreter.hpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: quinntaylormitchell
PR: y-scope/clp#961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
components/core/src/clp/EncodedVariableInterpreter.hpp (12)
Learnt from: LinZhihao-723
PR: #557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In components/core/src/clp/ffi/ir_stream/utils.hpp, the function size_dependent_encode_and_serialize_schema_tree_node_id assumes that the caller checks that node_id fits within the range of encoded_node_id_t before casting.
Learnt from: LinZhihao-723
PR: #554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In KeyValuePairLogEvent.cpp, for the class JsonSerializationIterator, it's acceptable to use raw pointers for member variables like m_schema_tree_node, and there's no need to replace them with references or smart pointers in this use case.
Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the parse_from_ir method, encountering errors from kv_log_event_result.error() aside from std::errc::no_message_available and std::errc::result_out_of_range is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: LinZhihao-723
PR: #856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the parse_from_ir() function, reaching the end of log events in a given IR is not considered an error case. The errors std::errc::no_message_available and std::errc::result_out_of_range are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In components/core/src/clp_s/JsonParser.cpp, when handling errors in parse_from_ir, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: LinZhihao-723
PR: #558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In components/core/src/clp/ffi/KeyValuePairLogEvent.cpp, node IDs are validated before accessing child_schema_tree_node in the function serialize_node_id_value_pairs_to_json, ensuring get_node does not throw exceptions due to invalid node IDs.
Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In components/core/src/clp_s/SchemaTree.hpp, it's acceptable to use std::string_view as keys in m_node_map because SchemaNode's m_key_name remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: davemarco
PR: #700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (3)
components/core/src/clp/EncodedVariableInterpreter.hpp (3)
54-84: LGTM! Clean placeholder helper methods.
The static helper methods for adding variable placeholders are well-structured and provide a clear interface for logtype construction.
93-93: Good use of `std::string_view` for read-only string parameters.
The conversion to `std::string_view` for these parameters is appropriate as it avoids unnecessary string copies and provides a more flexible interface.
Also applies to: 103-103, 126-126, 201-201, 222-222
340-352: Verify the dict_var_handler logic for 4-byte encoded variables.
In the `dict_var_handler`, when handling 4-byte encoded variables (line 349), the code calls `encode_var`, which could potentially return a non-dictionary encoded variable if the string looks like an integer or float. This seems inconsistent with the handler's purpose of processing dictionary variables.
Consider using `add_dict_var` directly for 4-byte types as well, or document why this different behavior is intended.
⛔ Skipped due to learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:265-275
Timestamp: 2024-10-03T14:39:03.539Z
Learning: In `generate_IR`, the `run_serializer` function is already a template, and the branching between `int32_t` and `int64_t` occurs outside the loop, making further template-based refactoring unnecessary.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:265-275
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `generate_IR`, the `run_serializer` function is already a template, and the branching between `int32_t` and `int64_t` occurs outside the loop, making further template-based refactoring unnecessary.
Learnt from: LinZhihao-723
PR: y-scope/clp#544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:230-261
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `deserialize_int_val`, the integer type needs to be determined at runtime, so refactoring into a templated helper function does not simplify the logic.
Learnt from: LinZhihao-723
PR: y-scope/clp#544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:230-261
Timestamp: 2024-09-24T22:34:58.746Z
Learning: In `deserialize_int_val`, the integer type needs to be determined at runtime, so refactoring into a templated helper function does not simplify the logic.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
haiqi96
left a comment
In general the change looks good to me. I compiled it locally and made sure the unit tests passed.
Given that there are more changes to come, I would prefer to approve this one rather than keep reviewing it (and get diminishing returns).
kirkrodrigues
left a comment
For the PR title & body, how about:
refactor(clp): Prepare for deduplication with clp-s:
- Use templates in `EncodedVariableInterpreter` instead of CLP dictionary implementations.
- Use `std::string_view` where possible in `EncodedVariableInterpreter` and dictionary classes.
| * @param encoded_vars | ||
| * @param var_ids | ||
| */ | ||
| template <typename VariableDictionaryWriterType, typename LogTypeDictionaryEntryType> |
Nit: Can we swap these two template params for consistency? (Also in the docstring and definition.)
| */ | ||
| template < | ||
| typename VariableDictionaryReaderType, | ||
| typename VariableDictionaryEntryType = VariableDictionaryReaderType::entry_t> |
My IDE suggests:
| typename VariableDictionaryEntryType = VariableDictionaryReaderType::entry_t> | |
| typename VariableDictionaryEntryType = typename VariableDictionaryReaderType::entry_t> |
Since entry_t is a template type that's dependent on what VariableDictionaryReaderType is. I suppose the compiler could infer that since the lhs is a typename, the rhs must also be a typename, but I'm not sure. I'll leave it up to you based on your knowledge. Same for line 220.
I think you're right -- I'll update it
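For readers following this thread, here is a small sketch of the pattern under discussion: a default template argument that names a nested type of another template parameter. `ToyReader` and `Decoder` are hypothetical illustrations, not code from the PR. As far as I understand, C++20 makes the `typename` keyword optional in this position, but compiler support for that relaxation has varied, so spelling it out is the portable form.

```cpp
#include <string>
#include <type_traits>

// Hypothetical reader type exposing a nested `entry_t`.
struct ToyReader {
    using entry_t = std::string;
};

// `ReaderType::entry_t` is a dependent name: what it refers to is unknown
// until `ReaderType` is substituted, so `typename` tells the compiler to
// treat it as a type.
template <typename ReaderType, typename EntryType = typename ReaderType::entry_t>
struct Decoder {
    using entry_type = EntryType;
};

// The default is picked up from the reader...
static_assert(std::is_same_v<Decoder<ToyReader>::entry_type, std::string>);
// ...and can still be overridden explicitly.
static_assert(std::is_same_v<Decoder<ToyReader, int>::entry_type, int>);
```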
| */ | ||
| std::vector<EntryType const*> | ||
| get_entry_matching_value(std::string const& search_string, bool ignore_case) const; | ||
| get_entry_matching_value(std::string_view search_string, bool ignore_case) const; |
There's a test case in test-EncodedVariableInterpreter.cpp that currently casts a string_view to a string to call this method. We can now get rid of those casts.
kirkrodrigues
left a comment
On second thought, we might need to make the title a little more specific. How about:
refactor(clp): Prepare parsing & search code for deduplication with clp-s:
Description
This PR is the first of several that attempt to bring the core CLP parsing and search code to a state where it can be shared by both clp and clp-s. The full end-to-end integration can be seen here.
This initial PR does two things:
- Updates `EncodedVariableInterpreter` to use templates instead of dictionary and dictionary entry implementations specific to CLP.
  This will end up allowing both clp and clp-s to share the same core logic while using their own respective dictionary implementations. These templates will be enhanced in a follow-up PR to use concepts in order to ensure dictionaries passed to these methods follow a certain interface.
- Updates `EncodedVariableInterpreter` and the various dictionary classes to accept `std::string_view` instead of `std::string const&`.
  This allows us to avoid some string copies (more once combined with a follow-up PR that modifies `Grep`). It also makes the `EncodedVariableInterpreter` interface agree more with the ffi parsing interfaces.
  As part of changing the dictionary writers to accept `std::string_view`, we also switch their internal hashmap representation to `absl::flat_hash_map<std::string,...>`. The performance of flat hash map is a bit better than `std::unordered_map`, but the main advantage here is that absl has defined their hash functions such that `std::string_view` can be hashed in the same way as `std::string` to do lookups in a hashmap keyed by `std::string` (which allows us to avoid a string copy on the fast path). With `std::unordered_map`, we would have to provide our own hash function to accomplish the same thing, which is error prone and forces us to use a more verbose template for `std::unordered_map`.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Refactor
`std::string_view` for more efficient string handling.
Chores