perf(clp-s::timestamp_parser): Improve marshalling speed for padded integer fields (resolves #1968). by gibber9809 · Pull Request #2013 · y-scope/clp

gibber9809 · 2026-02-20T18:16:58Z

Description

This PR speeds up marshalling of padded integer fields in timestamps, which substantially improves overall decompression speed. As identified in #1968, most of the timestamp-marshalling overhead came from fmt::format() calls, which appeared to parse their format strings at runtime on every call.

Decompression speedup on open-source datasets, relative to decompression without this optimization:

Dataset	Speedup
mongodb	+23.14%
cockroachdb	+3.75%
elasticsearch	+16.81%

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Validated that unit tests still pass
Benchmarked improvement in decompression speed

Summary by CodeRabbit

Refactor
- Enhanced error handling and validation in date and time parsing operations to improve overall reliability.
- Strengthened consistency in timestamp formatting throughout the application.
- Improved robustness of temporal data processing with refined internal validation logic to ensure more predictable behaviour.


coderabbitai · 2026-02-20T18:17:06Z

Walkthrough

Introduces a new helper function, append_positive_left_padded_integer(), that replaces the fmt::format calls used to left-pad integers throughout timestamp parsing and marshalling. The helper rejects negative values and reports failures through the existing error-handling mechanisms.
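To make the walkthrough concrete, here is a minimal sketch of a helper with this shape. It is not the PR's code: the parameter order, the `int` value type (which the review below flags), and the `PaddingResult` enum standing in for the parser's YSTDLIB_ERROR_HANDLING_TRYV-based error propagation are all assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; the real parser propagates errors through
// YSTDLIB_ERROR_HANDLING_TRYV rather than returning an enum.
enum class PaddingResult { Success, NegativeValue };

// Sketch: append `value` to `buffer`, left-padded with `padding_character` so that
// at least `length` characters are written. Negative values are rejected.
[[nodiscard]] inline auto append_positive_left_padded_integer(
        int value,
        std::size_t length,
        char padding_character,
        std::string& buffer
) -> PaddingResult {
    if (value < 0) {
        return PaddingResult::NegativeValue;
    }
    // Temporary string holding the decimal digits (no sign, since value >= 0).
    auto const value_str{std::to_string(value)};
    if (value_str.size() < length) {
        buffer.append(length - value_str.size(), padding_character);
    }
    buffer.append(value_str);
    return PaddingResult::Success;
}
```

For example, appending month `2` with `length` 2 and padding `'0'` yields `"02"`, while a negative value leaves the buffer untouched and returns `PaddingResult::NegativeValue`.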

Changes

Cohort / File(s)	Summary
Timestamp Parser Helper Function `components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`	Adds `append_positive_left_padded_integer()` helper function for left-padded integer formatting with negative value validation. Replaces `fmt::format` calls in date-time and numeric timestamp marshaling for years, months, days, hours, minutes, seconds, sub-seconds, and timezone handling. Implements error propagation via YSTDLIB_ERROR_HANDLING_TRYV throughout affected code paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

perf(clp-s::timestamp_parser): Improve marshalling speed for format specifiers representing padded integer fields. #1968: This PR implements the optimization requested in the issue by adding the append_positive_left_padded_integer() helper to replace fmt::format calls in timestamp marshalling.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: a performance optimization for timestamp marshalling by improving padded integer field handling, and references the related issue.


coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
751-773: 🧹 Nitpick | 🔵 Trivial

Near-duplicate logic in the 'T' case could be extracted into a shared helper.

Both the date-time marshalling path (lines 751–773) and the numeric marshalling path (lines 872–894) create a temporary string, pad it, strip trailing zeros, reject an all-zero result, and append the remainder to the buffer. The pattern predates this PR, but since the PR touches both blocks, consider extracting it into a small helper to reduce duplication.

Also applies to: 872-894
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines 751 - 773: extract the repeated logic that builds a 9-digit left-padded nanoseconds string, trims trailing zeros, validates that it is not all zeros, and appends it to the output into a small helper (e.g., formatOrAppendFractionalNanoseconds or appendTrimmedSubseconds). Call it from the 'T' branch in TimestampParser::... (the date-time marshal case) and from the numeric marshal case. The helper should accept the nanoseconds value (from time_of_day.subseconds().count() or equivalent) plus a reference to the buffer (or return a string and an ErrorCode), and must reproduce the current behavior: pad to 9 digits, strip trailing '0's, return ErrorCode::IncompatibleTimestampPattern when all digits are zero, and otherwise append the trimmed substring to the buffer, so that both code paths use the same implementation.
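The extraction suggested above might look roughly like the following. This is a hedged sketch, not the project's code: `append_trimmed_subseconds` and `SubsecondResult` are hypothetical names, and the function assumes the caller passes a nanosecond count in [0, 999'999'999].

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type; the real code returns ErrorCode::IncompatibleTimestampPattern.
enum class SubsecondResult { Success, IncompatibleTimestampPattern };

// Sketch of a shared helper for both marshalling paths. Precondition (assumed):
// 0 <= nanoseconds <= 999'999'999. Builds a 9-digit zero-left-padded fraction,
// strips trailing zeros, and appends what remains to `buffer`.
[[nodiscard]] inline auto
append_trimmed_subseconds(std::int64_t nanoseconds, std::string& buffer) -> SubsecondResult {
    constexpr std::size_t cNumDigits{9};
    std::string digits(cNumDigits, '0');
    // Write digits from the right; positions never written stay '0' (the left padding).
    for (std::size_t i{cNumDigits}; i > 0 && nanoseconds > 0; --i) {
        digits[i - 1] = static_cast<char>('0' + nanoseconds % 10);
        nanoseconds /= 10;
    }
    auto const last_nonzero{digits.find_last_not_of('0')};
    if (std::string::npos == last_nonzero) {
        // Nothing would remain after stripping, so report the error as the current code does.
        return SubsecondResult::IncompatibleTimestampPattern;
    }
    buffer.append(digits, 0, last_nonzero + 1);
    return SubsecondResult::Success;
}
```

Calling it with `120'000'000` appends `"12"`, and calling it with `0` returns the error without modifying the buffer, which is the all-zero case both marshalling paths currently reject.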

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around lines 946-951: The current code creates a temporary std::string via std::to_string(value) before appending the padding and the value to buffer. Replace that allocation by formatting the integer directly into a small stack char array using std::to_chars (include <charconv>): compute the number of digits written, compute num_padding_characters as before, then call buffer.append(num_padding_characters, padding_character) and buffer.append(ptr, ptr + digits_written) instead of using value_str. Target the block that declares value, length, padding_character and the buffer appends to eliminate the heap allocation.
- Around lines 936-953: append_positive_left_padded_integer currently takes an int, which causes implicit narrowing at call sites. Change the type of the value parameter from int to int64_t in both the declaration and the definition so that it accepts int64_t (matching extract_absolute_subsecond_nanoseconds()/epochtime_t and chrono .count() results). Ensure <cstdint> is included, and update any forward declarations or prototypes to the new signature so that callers no longer perform narrowing conversions.


coderabbitai · 2026-02-20T20:09:43Z