-
Notifications
You must be signed in to change notification settings - Fork 286
Refactor mime.zig
#1085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor mime.zig
#1085
Conversation
karlseguin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve of changes for the sake of changes, but can you benchmark it to compare? At least we'll have a better idea of further changes like this is worth it.
|
|
||
| /// Matches the first 3 characters of data with given characters. | ||
| inline fn match3(data: *const [3]u8, c0: u8, c1: u8, c2: u8) bool { | ||
| return data[0] == c0 and data[1] == c1 and data[2] == c2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use a u24 here? (genuine question)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure to prefer arbitrary bit widths, it can definitely suit here.
| continue; | ||
| } | ||
|
|
||
| const charset_: u64 = @bitCast([_]u8{ 'c', 'h', 'a', 'r', 's', 'e', 't', '=' }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno if it's valid, but go parses "charset = hello" (space after charset)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same but the format specified as:
parameter = parameter-name "=" parameter-value
parameter-name = token
parameter-value = ( token / quoted-string )
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
; any VCHAR, except delimiters
I've also tested it in the wild, web servers seem to comply.
| return parseOther(normalized); | ||
| }, | ||
| // Perfect cases. | ||
| text_xml => break :blk .{ .text_xml = {} }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this match text/xmlover9000! ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It Fails since after text/xml, following bytes should contain charset= though text/xmlcharset=utf-8 passes sadly...
| if (value.len == 0) { | ||
| return error.Invalid; | ||
| } | ||
| if (match4(rem[0..4], 'a', 's', 'c', 'r') and match3(rem[4..7], 'i', 'p', 't')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, would match text/javascript_is_java_written_in_cursive ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before, text/javascriptcharset=utf-8 passes, will handle semicolon case.
| continue; | ||
| } | ||
|
|
||
| const charset_: u64 = @bitCast([_]u8{ 'c', 'h', 'a', 'r', 's', 'e', 't', '=' }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enough repeated logic here that I'd be tempted to try to DRY it if possible.
I've benchmarked both implementations in a separate environment with large data, I think the performance gain compared to code complexity isn't worth it: Closing in favor of this. |
This PR refactors
Mimeand introduces different parsing logic (prefers integer comparisons where possible) while giving up readability a bit.