tokenizeString performance improved by 57% #781
ahfuzhang wants to merge 4 commits into VictoriaMetrics:master
Conversation
@func25 could you review this again?
func25
left a comment
LGTM. Could you add more tests to cover the cases we discussed?
// Register the token.
token := unsafe.String((*byte)(unsafe.Add(ptr, start)), end-start)
if curUnicodeFlag == 1 {
	// Only perform tokenizeStringUnicode on very short substrings if the string contains Unicode characters
Suggested change:
- // Only perform tokenizeStringUnicode on very short substrings if the string contains Unicode characters
+ // If the current token contains non-ASCII bytes, delegate to tokenizeStringUnicode for rune-aware tokenization. Otherwise hash directly
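The approach described in the suggested comment can be sketched roughly as follows. This is not the code from this pull request: `tokenizeSketch`, `tokenizeUnicodeSketch` and `isASCIITokenByte` are hypothetical names, the token definition (letters, digits and underscore) is an assumption, and the sketch appends tokens to a slice instead of hashing them. It only illustrates the idea of scanning bytes on a fast path and delegating to a rune-aware tokenizer for chunks that contain non-ASCII bytes.

```go
package main

import (
	"unicode"
	"unicode/utf8"
)

// isASCIITokenByte reports whether c is an ASCII letter, digit or underscore.
// This token definition is an assumption made for the sketch.
func isASCIITokenByte(c byte) bool {
	return c == '_' || (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// tokenizeUnicodeSketch is the slow path: it walks s rune by rune
// and appends every run of letters, digits and underscores to dst.
func tokenizeUnicodeSketch(dst []string, s string) []string {
	start := -1
	for i, r := range s {
		isTok := r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
		if isTok && start < 0 {
			start = i
		} else if !isTok && start >= 0 {
			dst = append(dst, s[start:i])
			start = -1
		}
	}
	if start >= 0 {
		dst = append(dst, s[start:])
	}
	return dst
}

// tokenizeSketch is the fast path: it scans bytes and only falls back to
// tokenizeUnicodeSketch for chunks that contain at least one non-ASCII byte.
func tokenizeSketch(dst []string, s string) []string {
	i := 0
	for i < len(s) {
		// Skip ASCII separators.
		for i < len(s) && s[i] < utf8.RuneSelf && !isASCIITokenByte(s[i]) {
			i++
		}
		start := i
		hasNonASCII := false
		// Extend the chunk over ASCII token bytes and any non-ASCII bytes.
		for i < len(s) && (s[i] >= utf8.RuneSelf || isASCIITokenByte(s[i])) {
			if s[i] >= utf8.RuneSelf {
				hasNonASCII = true
			}
			i++
		}
		if start == i {
			continue // only separators were left
		}
		if hasNonASCII {
			// Rune-aware tokenization, applied to this short chunk only.
			dst = tokenizeUnicodeSketch(dst, s[start:i])
		} else {
			// Pure ASCII token: register it directly.
			dst = append(dst, s[start:i])
		}
	}
	return dst
}
```

Calling `tokenizeSketch(nil, "foo bar-baz")` stays on the byte-scanning path for the whole input, while an input such as `"héllo a→b"` sends only the chunks containing multi-byte runes through the slower rune-aware function.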
@valyala I don't understand why this project has such low activity levels. Has the core team moved on to another project?
The team is focused on higher-priority work before the end of the year. Also, we agreed to merge this PR if no one else reviews it (it may be reverted later), but it still needs test cases, so I can't merge it right now.
@ahfuzhang, could you provide benchmark results for BenchmarkTokenizeStrings before and after the optimization applied in this pull request?
goos: darwin (both runs): 2267.84 MB/s vs 1796.11 MB/s, about 26% speedup
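The quoted percentage can be checked with a small helper. This sketch assumes that 2267.84 MB/s is the throughput after the optimization and 1796.11 MB/s the throughput before it, which the word "speedup" implies but the comment does not state explicitly. `speedupPercent` is a hypothetical name used only for illustration.

```go
package main

// speedupPercent returns how much higher the `after` throughput is
// than the `before` throughput, expressed in percent.
func speedupPercent(before, after float64) float64 {
	return (after/before - 1) * 100
}
```

With the assumed ordering, `speedupPercent(1796.11, 2267.84)` evaluates to a little over 26, consistent with the "about 26%" figure in the comment.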
@valyala
I'm sorry, I don't participate much in open source projects, so I may not be following the proper procedures and etiquette. I'd appreciate some guidance if I do something inappropriate.
Could you please take a moment to review my PR? My idol