Allow prefiltering based on string fields#341

Open
Dr-Emann wants to merge 64 commits into Kong:main from Dr-Emann:prefilter

@Dr-Emann
Contributor

Dr-Emann commented Feb 3, 2026

Closes #339
Fixes #338

Note: this has no effect unless the enable_prefilter method is called; without that call, nothing changes.

When most expressions can be ruled out based on prefixes, it can greatly speed up matching by quickly eliminating large swaths of candidate expressions.

I believe the http.path field will usually be the ideal choice for this filtering: in most cases, looking at the prefix of the value narrows matching down to a small number of possible matches.
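
To make the idea concrete, here is a minimal Rust sketch of prefix-based prefiltering. It is not the code from this PR: `Rule`, `Prefilter`, `candidates`, and both `first_match_*` functions are hypothetical names, and each toy rule is nothing more than a path prefix. The only point is that a prefix index lets the matcher check a handful of candidates instead of scanning every rule.

```rust
use std::collections::BTreeMap;

/// Toy rule: matches when the path starts with `prefix`.
/// Hypothetical type for illustration, not the router's real expression type.
struct Rule {
    id: usize,
    prefix: String,
}

/// Toy prefilter: maps each required prefix to the indices of the rules that need it.
struct Prefilter {
    by_prefix: BTreeMap<String, Vec<usize>>,
}

impl Prefilter {
    fn build(rules: &[Rule]) -> Self {
        let mut by_prefix: BTreeMap<String, Vec<usize>> = BTreeMap::new();
        for (idx, rule) in rules.iter().enumerate() {
            by_prefix.entry(rule.prefix.clone()).or_default().push(idx);
        }
        Prefilter { by_prefix }
    }

    /// Indices of rules whose prefix is a prefix of `path`, in ascending order.
    /// Cost depends on the path length, not on the number of rules.
    fn candidates(&self, path: &str) -> Vec<usize> {
        let mut out = Vec::new();
        // Try every prefix of `path` that ends on a char boundary, including "" and `path` itself.
        let ends = path.char_indices().map(|(i, _)| i).chain(std::iter::once(path.len()));
        for end in ends {
            if let Some(ids) = self.by_prefix.get(&path[..end]) {
                out.extend_from_slice(ids);
            }
        }
        out.sort_unstable();
        out
    }
}

/// Baseline: scan every rule in order.
fn first_match_linear(rules: &[Rule], path: &str) -> Option<usize> {
    rules.iter().find(|r| path.starts_with(r.prefix.as_str())).map(|r| r.id)
}

/// Prefiltered: only fully evaluate the rules the prefilter could not rule out.
fn first_match_prefiltered(rules: &[Rule], pf: &Prefilter, path: &str) -> Option<usize> {
    pf.candidates(path)
        .into_iter()
        // A real router would evaluate the full expression here.
        .find(|&i| path.starts_with(rules[i].prefix.as_str()))
        .map(|i| rules[i].id)
}

fn main() {
    let rules: Vec<Rule> = (0..1000)
        .map(|i| Rule { id: i, prefix: format!("/api/v{}/", i) })
        .collect();
    let pf = Prefilter::build(&rules);
    let path = "/api/v999/users";
    println!("linear:      {:?}", first_match_linear(&rules, path));
    println!("prefiltered: {:?}", first_match_prefiltered(&rules, &pf, path));
    println!("candidates checked: {}", pf.candidates(path).len());
}
```

In this toy setup the linear scan touches all 1000 rules before reaching the last one, while the prefiltered lookup evaluates a single candidate. Building the index is extra work up front, which is consistent with a slower router build.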

It makes building the router a bit slower:

Build Router            time:   [4.3063 ms 4.3118 ms 4.3177 ms]
Build Router with Prefilter
                        time:   [5.0742 ms 5.0842 ms 5.0949 ms] (1.18x)

But it can greatly decrease the cost of matching on all but the very first few expressions:

match_mix/match/0       time:   [248.85 ns 249.38 ns 250.00 ns]
match_mix/match/10      time:   [525.80 ns 526.30 ns 526.99 ns]
match_mix/match/49999   time:   [15.977 ms 16.450 ms 16.946 ms]
match_mix/match/99999   time:   [32.377 ms 33.620 ms 34.941 ms]
match_mix/no match/100001
                        time:   [30.529 ms 31.400 ms 32.325 ms]

match_mix with prefilter/match/0
                        time:   [337.58 ns 338.34 ns 339.10 ns] (1.36x)
match_mix with prefilter/match/10
                        time:   [408.87 ns 410.04 ns 411.50 ns] (0.779x)
match_mix with prefilter/match/49999
                        time:   [578.56 ns 579.68 ns 580.93 ns] (0.0000352x)
match_mix with prefilter/match/99999
                        time:   [616.64 ns 617.38 ns 618.24 ns] (0.0000184x)
match_mix with prefilter/no match/100001
                        time:   [379.15 ns 379.83 ns 380.60 ns] (0.0000121x)
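
For reading the parenthesized ratios: each appears to be the median time with the prefilter divided by the corresponding median without it, so values below 1 are speedups. A small sketch that recomputes them from the medians above (the `ratio` helper is only for illustration):

```rust
/// Ratio of the prefiltered median to the baseline median (both in nanoseconds).
fn ratio(prefilter_ns: f64, baseline_ns: f64) -> f64 {
    prefilter_ns / baseline_ns
}

fn main() {
    // (benchmark, baseline median in ns, prefilter median in ns), copied from the output above.
    let rows = [
        ("match/0", 249.38, 338.34),
        ("match/10", 526.30, 410.04),
        ("match/49999", 16.450e6, 579.68),
        ("match/99999", 33.620e6, 617.38),
        ("no match/100001", 31.400e6, 379.83),
    ];
    for (name, base, pre) in rows {
        println!("{}: {:.3e}x", name, ratio(pre, base));
    }
}
```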

@StarlightIbuki

This should be effective. However, I need to find a more representative test suite to verify it. Working on it.

@StarlightIbuki

The test results give high confidence in the improvement. I used the GitHub OpenAPI spec and random requests as the test scenario. With a constant config, throughput shows a slight but statistically significant improvement (which is kind of expected, since the cache plays a role in this case). With config changes (routes being reconfigured during the tests), P99 tail latency improves greatly.
The plots are on the way.

@Dr-Emann
Contributor Author

Dr-Emann commented Mar 2, 2026

> I used GitHub OpenAPI spec and random requests as the test scenario
> ...
> which is kind of expected as cache plays a role in this case

What does "random requests" mean in this case? Totally random URLs, or URLs with a random component such as /users/{uuid} (the cache wouldn't work in that case)? Or maybe random choices from a set of matching URLs?

@StarlightIbuki

> > I used GitHub OpenAPI spec and random requests as the test scenario
> > ...
> > which is kind of expected as cache plays a role in this case
>
> What does "random requests" mean in this case? Totally random urls, or urls with a random component (/users/{uuid}) (cache wouldn't work in that case)? Or maybe random choices from a set of matching urls?

URLs for random valid APIs, with random valid parameters.

@StarlightIbuki

StarlightIbuki commented Mar 3, 2026

Config updating (changing router)
[Plot: changeroute conf-github script-github default_major]

Constant config
[Plot: defualt conf-github script-github default_major]

@StarlightIbuki

Lat Max is heavily dominated by randomness. Lat P99 and P90 are better indicators of tail latency.
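
One way to see why the maximum is a noisy indicator: a single outlier sets the max, while P90 and P99 barely move. The sketch below uses made-up latency samples and a nearest-rank `percentile` helper; both are assumptions for illustration, not the benchmark tooling used for the plots.

```rust
/// Nearest-rank percentile (p in 1..=100) of a sorted, non-empty slice.
fn percentile(sorted: &[u64], p: usize) -> u64 {
    assert!(!sorted.is_empty() && (1..=100).contains(&p));
    // 1-based rank = ceil(p * n / 100)
    let rank = (p * sorted.len() + 99) / 100;
    sorted[rank - 1]
}

/// Made-up latencies in microseconds: 999 ordinary samples plus one `extra` sample.
fn samples(extra: u64) -> Vec<u64> {
    let mut v: Vec<u64> = (1..=999u64).map(|i| 100 + i % 50).collect();
    v.push(extra);
    v.sort_unstable();
    v
}

fn main() {
    for (label, extra) in [("quiet run", 120), ("one outlier", 250_000)] {
        let v = samples(extra);
        println!(
            "{}: P90={} P99={} max={}",
            label,
            percentile(&v, 90),
            percentile(&v, 99),
            v[v.len() - 1]
        );
    }
}
```

With these samples, P99 is 149 in both runs and P90 shifts by one microsecond, while the max jumps from 149 to 250000 because of one request.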

@Dr-Emann
Contributor Author

Dr-Emann commented Mar 4, 2026

I wonder how CPU usage is affected, as well. The Kong blog has said:

> Router matching represents one of the most resource-intensive operations within our core proxy path

@StarlightIbuki

> I wonder how CPU usage is affected, as well. The Kong blog has said:
>
> > Router matching represents one of the most resource-intensive operations within our core proxy path

The recorded CPU usage shows 100% utilization. The test is CPU-bound, so any change in CPU cost is already reflected in the throughput metrics.
Checking how the CPU usage itself changes would require a different test scenario.

Member

@Oyami-Srk Oyami-Srk left a comment

Overall LGTM. Could you please extend the existing tests to use enable_prefilter?

Member

@Oyami-Srk Oyami-Srk left a comment

LGTM. This still needs some review and a decision from the Kong 3 team. cc @michaelxiong-byte

Development

Successfully merging this pull request may close these issues.

Prefilter based on http.path
match_mix benchmark seems misleading
