Properly strip whitespace from the right side of header values #48

kenballus · 2025-09-08T19:03:13Z

Fixes #47.

Contribution

I added tests for my changes.
I tested my changes locally.
I agree to the Developer's Certificate of Origin 1.1.

ioquatix · 2025-09-09T02:40:25Z

This looks good to me, but I want to draw your attention to https://github.com/socketry/protocol-http1/blob/main/test/protocol/http1/parser.rb which is testing these patterns to ensure they match in linear time.

kenballus · 2025-09-09T21:12:12Z

> This looks good to me, but I want to draw your attention to https://github.com/socketry/protocol-http1/blob/main/test/protocol/http1/parser.rb which is testing these patterns to ensure they match in linear time.

Interesting :)

Assuming the test works, I'm not sure why it's passing. Worst-case on this regex should be $\Theta(n^2)$ with a naive algorithm. Maybe the regex engine here is using the \z anchor in the pattern to optimize this case?
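
To make the quadratic estimate concrete, here is a rough model of a naive backtracking matcher. It assumes a pattern shaped like "lazy value, then `[ \t]*`, then end of string"; the exact pattern in this PR is not reproduced here, and `naive_steps` is a hypothetical helper that only counts character comparisons.

```ruby
# Simplified model: count the comparisons a naive backtracking matcher makes
# for "lazy value, then [ \t]*, then end of string".
# Assumption: validity checks on the value characters are ignored.
def naive_steps(input)
  steps = 0
  (0..input.length).each do |value_end|
    # The lazy quantifier tries the shortest value first, then grows by one.
    i = value_end
    while i < input.length && (input[i] == " " || input[i] == "\t")
      steps += 1
      i += 1
    end
    steps += 1 # the end-of-string check, or the comparison that fails
    return steps if i == input.length
  end
  steps
end

# Adversarial shape: a long run of spaces that is NOT at the end of the value.
[100, 200, 400].each do |n|
  puts "n=#{n}: #{naive_steps("a" + " " * n + "a")} steps"
end
```

Doubling `n` roughly quadruples the step count in this model, which is the $\Theta(n^2)$ behaviour described above. A real engine may well do better.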

kenballus · 2025-09-09T21:36:57Z

Okay, clearly I don't understand regexes as well as I thought I did :)

Regexp.linear_time? definitely returns true for this pattern, but I remain unsure as to why.

ioquatix · 2025-09-09T23:38:00Z

Modern Ruby implementations use memoization to avoid exponential backtracking (trading memory for computation), so it can depend on the implementation, which is why linear_time? is a function. It may be that some Ruby implementations return false for the same regex. However, one of the reasons why I changed this code in the first place was actually because I found the regex was not linear.
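
A small sketch of how the check can be run by hand. It assumes CRuby 3.2 or newer, where `Regexp.linear_time?` is available. `linear_report` is a hypothetical helper, and the first pattern is only an illustrative stand-in, not the library's constant.

```ruby
# Report whether each pattern can be matched in linear time on this Ruby.
# Returns nil for every pattern when Regexp.linear_time? is unavailable.
def linear_report(patterns)
  available = Regexp.respond_to?(:linear_time?)
  patterns.to_h do |name, pattern|
    [name, available ? Regexp.linear_time?(pattern) : nil]
  end
end

report = linear_report(
  lazy_value: /\A[^\0\r\n]*?[ \t]*\z/, # illustrative stand-in pattern
  backreference: /(a)\1/               # backreferences are not memoized
)

report.each { |name, result| puts "#{name}: #{result.inspect}" }
```

Because the answer depends on what the running engine can optimize, the same pattern may report differently on another Ruby implementation or version.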

kenballus · 2025-09-10T18:00:21Z

> Modern Ruby implementations use memoization to avoid exponential backtracking (trading memory for computation), so it can depend on the implementation, which is why linear_time? is a function.

I ended up going down that rabbithole last night and reading the paper referenced in the Ruby docs. Pretty cool!

> However, one of the reasons why I changed this code in the first place was actually because I found the regex was not linear.

As it is, the current pattern clearly needs to change, given that the final OWS is useless (the preceding [^\r\n\0] always eats its lunch).
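
The effect is easy to see with a stand-in pattern. `GREEDY_VALUE` below is a hypothetical reconstruction (a greedy `[^\r\n\0]*` followed by optional whitespace), not a copy of the constant in the library.

```ruby
# Hypothetical pattern for illustration only: the greedy character class
# also accepts spaces and tabs, so it consumes the trailing whitespace itself.
GREEDY_VALUE = /\A([^\r\n\0]*)[ \t]*\z/

match = GREEDY_VALUE.match("example.com \t")

# The trailing [ \t]* is left with nothing to match.
p match[1] # => "example.com \t"
```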

Anyway, the only pattern I can think of that doesn't use non-greedy matching would be something like this (imo, gross):

FIELD_VALUE = /|[^ \t\0\r\n]|[^ \t\0\r\n][^\0\r\n]*[^ \t\0\r\n]/.freeze

That is, 3 cases:

- empty
- a single non-whitespace character
- a non-whitespace character, any number of potentially-whitespace characters, then another non-whitespace character

Would this be preferable?
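
A quick way to exercise the three cases. `FIELD_VALUE` is copied from the comment above. `PADDED_VALUE` and `extract_value` are hypothetical names added for this sketch, and the anchoring is an assumption about how the pattern would be used.

```ruby
# Pattern as proposed above.
FIELD_VALUE = /|[^ \t\0\r\n]|[^ \t\0\r\n][^\0\r\n]*[^ \t\0\r\n]/.freeze

# Hypothetical wrapper: anchor the pattern and allow optional whitespace
# on both sides of the captured value.
PADDED_VALUE = /\A[ \t]*(#{FIELD_VALUE})[ \t]*\z/

# Returns the value without surrounding spaces/tabs, or nil if it is invalid.
def extract_value(raw)
  match = PADDED_VALUE.match(raw)
  match && match[1]
end

p extract_value("")                    # => ""
p extract_value("  a ")                # => "a"
p extract_value(" text/html; q=1 \t")  # => "text/html; q=1"
p extract_value("bad\0value")          # => nil
```

Interior whitespace survives because only the first and last characters of the captured value are required to be non-whitespace.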

ioquatix · 2025-09-11T02:14:45Z

TBH, as long as linear_time? is true, I think it's okay, but it might be nice to evaluate the performance vs rstrip! before we make a final decision?
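
One rough way to run that comparison, as a sketch only. `LAZY_VALUE_WITH_OWS` is an assumed stand-in pattern, `elapsed_seconds` is a hypothetical helper, and the sample value and iteration count are arbitrary. Timings will vary by machine and Ruby version, so no numbers are claimed here.

```ruby
# Hypothetical helper: time a block using the monotonic clock.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Assumed stand-in pattern, not the library's constant.
LAZY_VALUE_WITH_OWS = /\A([^\0\r\n]*?)[ \t]*\z/

sample = "text/html; charset=utf-8" + " " * 16
iterations = 50_000

regex_time = elapsed_seconds do
  iterations.times { sample.match(LAZY_VALUE_WITH_OWS)[1] }
end

rstrip_time = elapsed_seconds do
  iterations.times { sample.rstrip }
end

puts format("regex capture: %.4fs", regex_time)
puts format("String#rstrip: %.4fs", rstrip_time)
```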

kenballus · 2025-09-11T15:40:09Z

We want to strip only spaces and tabs, but rstrip will also strip vertical tab and form feed (and also CR, LF, and NUL, but these wouldn't match the regex anyway).
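
The difference shows up on a value that ends in a vertical tab or form feed. `rstrip_ows` is a hypothetical helper written for this sketch; it is a plain right-to-left scan and is not code from the PR.

```ruby
# Hypothetical helper: remove only trailing spaces and horizontal tabs.
# Scans from the right, so the work is linear in the string length.
def rstrip_ows(value)
  finish = value.length
  finish -= 1 while finish > 0 && (value[finish - 1] == " " || value[finish - 1] == "\t")
  value[0, finish]
end

sample = "value \t\v\f"

p sample.rstrip           # => "value"
p rstrip_ows(sample)      # => "value \t\v\f"
p rstrip_ows("value \t")  # => "value"
```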

Properly strip whitespace from the right side of header values

93884b8

kenballus force-pushed the main branch from ae87e17 to 93884b8 on September 9, 2025, 20:56
