kkrik-es
Contributor

@kkrik-es kkrik-es commented Mar 7, 2025

No description provided.

@kkrik-es kkrik-es self-assigned this Mar 7, 2025
@eyalkoren
Contributor

I started microbenchmarking and am pushing this playground code so that we have a reference for what the benchmark results correspond to. It is all reversible.
For ebf5d42, these are some initial microbenchmark results:

UUID recognition

Benchmark                                                                                                (uuid)  Mode  Cnt    Score    Error  Units
PatternedTextMapperOperationsBenchmark.testUuidMatchManual                 550e8400-e29b-41d4-a716-446655440000  avgt    3    1.267 ±  0.092  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchManual                                           not-a-uuid  avgt    3    0.870 ±  0.368  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchManual                123e4567-e89b-12d3-a456-4266141740000  avgt    3    0.833 ±  0.050  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchManualWithValidation   550e8400-e29b-41d4-a716-446655440000  avgt    3   49.799 ±  3.024  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchManualWithValidation                             not-a-uuid  avgt    3    0.833 ±  0.016  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchManualWithValidation  123e4567-e89b-12d3-a456-4266141740000  avgt    3    0.856 ±  0.598  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchRegex                  550e8400-e29b-41d4-a716-446655440000  avgt    3  220.213 ± 35.570  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchRegex                                            not-a-uuid  avgt    3   16.270 ± 17.619  ns/op
PatternedTextMapperOperationsBenchmark.testUuidMatchRegex                 123e4567-e89b-12d3-a456-4266141740000  avgt    3  248.779 ± 88.636  ns/op

Summary

Regex is out of the question: even when it should fail fast, it takes too much CPU time.
Between the two "manual" (non-regex) versions, the one with validation is ~40x more expensive for true positives, which is probably not worth the cost, as false positives are not so terrible.
So we stay with the simple and very efficient recognition.
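
To make the trade-off concrete, here is a rough sketch of what a cheap "shape only" check and a stricter validating check could look like. This is not the code from the commit: the class name `UuidShapeCheck` and both method names are hypothetical, and I am assuming the simple variant only looks at the length and the hyphen positions of the 8-4-4-4-12 layout.

```java
// Hypothetical illustration only; not the implementation benchmarked above.
final class UuidShapeCheck {
    private UuidShapeCheck() {}

    /** Cheap check: length 36 and hyphens at indices 8, 13, 18 and 23. */
    static boolean looksLikeUuid(CharSequence s) {
        return s.length() == 36
            && s.charAt(8) == '-'
            && s.charAt(13) == '-'
            && s.charAt(18) == '-'
            && s.charAt(23) == '-';
    }

    /** Stricter check: same shape, plus every other character must be an ASCII hex digit. */
    static boolean isValidUuid(CharSequence s) {
        if (!looksLikeUuid(s)) {
            return false; // wrong shape, so this path stays as cheap as the simple check
        }
        for (int i = 0; i < 36; i++) {
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                continue; // hyphens were already verified
            }
            if (!isHex(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```

In a sketch like this, the extra cost of validation is only paid for inputs that already have the right shape, which would be consistent with the validated variant being slower only for true positives. The price of the cheap check is that something like `zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz` is accepted.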

IPv4 recognition

Benchmark                                                                     (input)  Mode  Cnt    Score    Error  Units
PatternedTextMapperOperationsBenchmark.testIpv4MatchManual                   172.16.0  avgt    3   59.765 ±  3.055  ns/op
PatternedTextMapperOperationsBenchmark.testIpv4MatchManual            255.255.255.255  avgt    3  162.797 ±  7.804  ns/op
PatternedTextMapperOperationsBenchmark.testIpv4MatchManual_Iterative         172.16.0  avgt    3    6.363 ±  2.359  ns/op
PatternedTextMapperOperationsBenchmark.testIpv4MatchManual_Iterative  255.255.255.255  avgt    3   11.538 ±  3.148  ns/op
PatternedTextMapperOperationsBenchmark.testIpv4MatchRegex                    172.16.0  avgt    3  241.109 ± 47.092  ns/op
PatternedTextMapperOperationsBenchmark.testIpv4MatchRegex             255.255.255.255  avgt    3  110.812 ±  3.420  ns/op

Summary

Regex is out of the question.
The algorithm I propose in this commit is an order of magnitude more efficient than the former one, so we'll stick with it. It also provides full validation of IPv4 addresses, which may be more important in this case, as we may at some point in the future turn such arguments into an IP field type.
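
For reference, a single-pass validator in the spirit of the iterative approach might look roughly like the sketch below. It is not the code from the commit: the class name `Ipv4Check` and the method `isIpv4` are hypothetical, and I am assuming a plain dotted-decimal format with four octets in the range 0-255. Leading zeros such as `01.2.3.4` are accepted here, which is a choice the real implementation may make differently.

```java
// Hypothetical illustration only; not the implementation benchmarked above.
final class Ipv4Check {
    private Ipv4Check() {}

    /** Validates dotted-decimal IPv4 in one pass, without allocation or regex. */
    static boolean isIpv4(CharSequence s) {
        int len = s.length();
        if (len < 7 || len > 15) {
            return false; // shortest is "0.0.0.0", longest is "255.255.255.255"
        }
        int dots = 0;   // number of '.' separators seen so far
        int value = 0;  // numeric value of the octet being read
        int digits = 0; // number of digits in the octet being read
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                digits++;
                if (digits > 3 || value > 255) {
                    return false;
                }
            } else if (c == '.') {
                if (digits == 0 || ++dots > 3) {
                    return false; // empty octet or too many separators
                }
                value = 0;
                digits = 0;
            } else {
                return false; // any other character is invalid
            }
        }
        // Exactly three separators and a non-empty final octet.
        return dots == 3 && digits > 0;
    }
}
```

With this sketch, `255.255.255.255` is accepted, while `172.16.0` is rejected only at the end of the scan because it has two separators instead of three.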


3 participants