Description
ETA: 5 minutes
Hello! 👋
TL;DR
Just add:

```kotlin
.publishOn(Schedulers.boundedElastic()) // or maybe Schedulers.parallel()?
```

somewhere in WebClient internals to improve app performance 🙂
Context
Recently I was deep diving into a Spring WebFlux + Spring WebClient app to figure out which parts of the code are executed by which thread. It turns out that if I have a CPU-bound operation, even one as simple as encoding an object to JSON, it is executed on WebClient threads by default. I've created an MVCE app for this. Basically the whole app is something like this:
```kotlin
@GetMapping("/endpoint")
fun endpoint(): Mono<ResponseEntity<AppResponse>> =
    webClient
        .get()
        .uri("http://some-external-service/endpoint")
        .retrieve()
        .bodyToMono(MockServerResponse::class.java)
        // comment this line if needed
        .publishOn(Schedulers.parallel())
        .map {
            heavyCpuOperation()
            it
        }
        .map { ResponseEntity.ok(AppResponse(it.data)) }

private fun heavyCpuOperation() {
    var bigInteger = BigInteger.ZERO
    for (i in 0..500_000) {
        bigInteger = bigInteger.add(BigInteger.valueOf(i.toLong()))
    }
}
```

Here are the results:
- Logs without the `.publishOn()` operator:
```
reactor-http-nio-3 ### com.nalepa.publishon.AppEndpoint ###
reactor-http-nio-3 ### com.nalepa.publishon.AppEndpoint ### ENDPOINT: Start processing request
http-client-nio-2 ### io.netty.channel.DefaultChannelPipeline$HeadContext ### Writing data to socket
http-client-nio-2 ### org.springframework.http.codec.json.Jackson2JsonDecoder ### Decoding webClient response
http-client-nio-2 ### com.nalepa.publishon.AppEndpoint ### WEBCLIENT: I hava response from external service
http-client-nio-2 ### com.nalepa.publishon.AppEndpoint ### CPU OPERATION: Started heavy operation
http-client-nio-2 ### com.nalepa.publishon.AppEndpoint ### CPU OPERATION: Ended heavy operation
http-client-nio-2 ### org.springframework.http.codec.json.Jackson2JsonEncoder ### Encoding endpoint response
reactor-http-nio-3 ### io.netty.channel.DefaultChannelPipeline$HeadContext ### Writing data to socket
http-client-nio-2 ### com.nalepa.publishon.AppEndpoint ### ENDPOINT: Ended processing request
```
- Logs with the `.publishOn(Schedulers.parallel())` operator:
```
reactor-http-nio-4 ### com.nalepa.publishon.AppEndpoint ###
reactor-http-nio-4 ### com.nalepa.publishon.AppEndpoint ### ENDPOINT: Start processing request
http-client-nio-2 ### io.netty.channel.DefaultChannelPipeline$HeadContext ### Writing data to socket
http-client-nio-2 ### org.springframework.http.codec.json.Jackson2JsonDecoder ### Decoding webClient response
http-client-nio-2 ### com.nalepa.publishon.AppEndpoint ### WEBCLIENT: I hava response from external service
parallel-1 ### com.nalepa.publishon.AppEndpoint ### CPU OPERATION: Started heavy operation
parallel-1 ### com.nalepa.publishon.AppEndpoint ### CPU OPERATION: Ended heavy operation
parallel-1 ### org.springframework.http.codec.json.Jackson2JsonEncoder ### Encoding endpoint response
parallel-1 ### com.nalepa.publishon.AppEndpoint ### ENDPOINT: Ended processing request
reactor-http-nio-4 ### io.netty.channel.DefaultChannelPipeline$HeadContext ### Writing data to socket
```
As you can see, the CPU operation is executed on an HTTP thread when there is no `.publishOn()`. I've decided to perform some tests related to this.
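To get a feel for how much work each call does, the helper can be run standalone (a sketch of the same loop as above, with the sum returned so the result is observable; timings will of course vary by machine):

```kotlin
import java.math.BigInteger

// Same loop as heavyCpuOperation() above, but returning the accumulated sum.
fun heavyCpuSum(): BigInteger {
    var acc = BigInteger.ZERO
    for (i in 0..500_000) {
        acc = acc.add(BigInteger.valueOf(i.toLong()))
    }
    return acc
}

fun main() {
    val start = System.nanoTime()
    val sum = heavyCpuSum()
    val tookMs = (System.nanoTime() - start) / 1_000_000
    // 0 + 1 + ... + 500_000 = 500_000 * 500_001 / 2 = 125_000_250_000
    println("sum = $sum, took ~$tookMs ms")
}
```

Every millisecond this loop takes is a millisecond the event loop cannot spend on I/O when it runs on an HTTP thread.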
Few words about testing
Dependencies used:
- Spring Boot 3.4.1
- Java 21
Platform:
- MacBook Air M2, 16GB RAM
Testing App:
I know that for proper testing it's good to test many scenarios, many schedulers, etc. I've done that. In this issue I'm putting the results for the so-called Base and Complex scenarios. Also, maybe I'm missing something and those tests simply don't make any sense; if so, please let me know!
In the tests I also ignored roughly the first 2-3 minutes of results, due to JVM warmup.
Performance Testing
I've decided to perform some tests to find out whether there is any performance improvement from publishing the WebClient response on another scheduler.
Base Scenario
I've started with the simplest one:

```kotlin
@GetMapping("/endpoint")
fun endpoint(): Mono<ResponseEntity<String>> =
    webClient
        .get()
        .uri("http://some-external-service/endpoint")
        .retrieve()
        .bodyToMono(String::class.java)
        .map { ResponseEntity.ok(it) }
```

Here's the architecture diagram:

So basically the flow is something like:
1. Send `only one` request to the app
2. The app gets data from the mock server by using `only one` WebClient
3. Go to step 1
I've run this scenario in 3 variants:
- without `.publishOn()`
- with `.publishOn(Schedulers.parallel())`
- with `.publishOn(Schedulers.boundedElastic())`
For every one of them the results were similar, so I will post only one screenshot from Grafana.
- About 8K RPS: `sum by (instance) (irate(http_server_requests_seconds_count[15s]))`
- About 5% CPU usage: `max by (instance) (process_cpu_usage)`
So it's good to know that adding `.publishOn()` did not have any impact on the simplest app.
Complex Scenario
I've added:
- decoding the response from the Mock Server:

```kotlin
data class MockServerResponse(
    val value: String,
)
```

- encoding the response from the TestApp by simply returning `List<String>`
So now the app looks like this:

```kotlin
@GetMapping("/endpoint")
fun endpoint(): Mono<ResponseEntity<List<String>>> =
    Flux
        .fromIterable(webClients)
        .flatMap {
            it
                .getResponseFromWebClient()
                // comment if needed
                .publishOn(Schedulers.boundedElastic())
        }
        .collectList()
        .map { ResponseEntity.ok(it) }
```

I've also changed the architecture a little bit. Here's the diagram:

So basically the flow is something like:
1. Send `N` requests to the app
2. For every request the app gets data from the mock server by using `M` WebClients
3. Go to step 1
I've run this scenario in 3 variants:
- without `.publishOn()`
- with `.publishOn(Schedulers.parallel())`
- with `.publishOn(Schedulers.boundedElastic())`
Results without `.publishOn()` and with `Schedulers.parallel()` were similar:
- About 240 RPS: `sum by (instance) (irate(http_server_requests_seconds_count[15s]))`
- About 33% CPU usage: `max by (instance) (process_cpu_usage)`
- About 260 ms response times: `max by (instance) (http_server_requests_seconds{uri="/dummy/{id}", quantile="0.999"})`
Results for `.publishOn(Schedulers.boundedElastic())` were better:
- About 300 RPS: `sum by (instance) (irate(http_server_requests_seconds_count[15s]))`
- About 53% CPU usage: `max by (instance) (process_cpu_usage)`
- About 185 ms response times: `max by (instance) (http_server_requests_seconds{uri="/dummy/{id}", quantile="0.999"})`
So adding `.publishOn(Schedulers.boundedElastic())` brings performance benefits! ❤️
- RPS: ~240 -> ~300
- CPU usage: ~33% -> ~53%
- Response times: ~260 ms -> ~185 ms

Based on my tests I would say that when all WebClient threads are executing CPU-bound operations, using `.boundedElastic()` shines ❤️
Few words about Schedulers
As far as I know:
- `parallel` - every thread has its own task queue
- `boundedElastic` - all threads share one task queue
I also did a comparison of:

```kotlin
Schedulers.newBoundedElastic(
    Runtime.getRuntime().availableProcessors(), 100_000, "customBounded"
)
```

vs

```kotlin
Schedulers.newParallel(
    "customParallel", Runtime.getRuntime().availableProcessors(),
)
```

`boundedElastic` was also better in that case.
I've tried to find out why, and I have no answer for that. Maybe:
- it's related to the task queue?
- those threads are blocking/synchronizing somewhere? The Project Reactor docs on Schedulers say that `.parallel()` should not execute blocking code.
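The task-queue hypothesis can be illustrated with plain executors (a stdlib sketch of the model described above, not of Reactor's actual internals; the task durations are made up): with per-thread queues and round-robin assignment, one long task delays the short tasks queued behind it on the same worker, while with a shared queue any free thread picks up the next task.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Submits four tasks (one long, three short) via the given strategy
// and returns the wall-clock time until all of them finish.
fun runTasks(submit: (Int, () -> Unit) -> Unit, shutdown: () -> Unit): Long {
    val durationsMs = listOf(400L, 100L, 100L, 100L)
    val start = System.nanoTime()
    durationsMs.forEachIndexed { i, ms -> submit(i) { Thread.sleep(ms) } }
    shutdown()
    return (System.nanoTime() - start) / 1_000_000
}

fun main() {
    // Per-thread queues, round-robin assignment (the model described for parallel).
    val pinned = List(2) { Executors.newSingleThreadExecutor() }
    val pinnedMs = runTasks(
        { i, task -> pinned[i % 2].execute { task() } },
        { pinned.forEach { it.shutdown(); it.awaitTermination(5, TimeUnit.SECONDS) } },
    )

    // One shared queue drained by two threads (the model described for boundedElastic).
    val shared = Executors.newFixedThreadPool(2)
    val sharedMs = runTasks(
        { _, task -> shared.execute { task() } },
        { shared.shutdown(); shared.awaitTermination(5, TimeUnit.SECONDS) },
    )

    // Pinned: worker 0 gets 400 + 100 = 500 ms of work; shared: busiest thread gets ~400 ms.
    println("per-thread queues: ~$pinnedMs ms, shared queue: ~$sharedMs ms")
}
```

If Reactor behaves anything like this model, uneven task sizes (some requests decoding big payloads, some small) would favor the shared-queue scheduler.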
Question
What do you think about publishing the response from WebClient on `.boundedElastic()` by default, with the possibility to override or disable it?
Proposal
I've focused only on the WebClient.Builder API, because IMO the programming experience is more important here than the internals.

```kotlin
fun create(number: Int, size: String): WebClient =
    webClientBuilder
        // disable this new option
        .disablePublishResponseOnAnotherThread()
        // publish the response on another scheduler
        .publishResponseOn(schedulerProvidedByProgrammer)
        .build()
```

Alternatives
In my tests I was just adding `.publishOn()` after getting the response from the WebClient. But WebClient threads also decode the response from the downstream service. Maybe we should apply `.publishOn()` even before that deserialization?
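Where the hop happens matters, because everything upstream of `.publishOn()` keeps running on the source thread. The hand-off itself can be sketched with plain executors (stdlib only; `eventLoop` stands in for the Netty/WebClient thread and `workers` for the target scheduler, both names are mine):

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors

fun main() {
    // Stand-in for reactor-http-nio / http-client-nio threads.
    val eventLoop = Executors.newSingleThreadExecutor { r -> Thread(r, "event-loop") }
    // Stand-in for the target scheduler (parallel / boundedElastic).
    val workers = Executors.newFixedThreadPool(2) { r -> Thread(r, "worker") }

    val threads = CompletableFuture
        .supplyAsync({ Thread.currentThread().name }, eventLoop) // decoding would happen here
        .thenApplyAsync({ decodedOn ->
            // analogous to publishOn: the continuation hops to the worker pool,
            // so CPU-heavy work from this point on no longer blocks the event loop
            decodedOn to Thread.currentThread().name
        }, workers)
        .join()

    // prints "decoded on event-loop, heavy work on worker"
    println("decoded on ${threads.first}, heavy work on ${threads.second}")
    eventLoop.shutdown()
    workers.shutdown()
}
```

Moving the hop before decoding would correspond to hopping earlier in this chain, so even deserialization lands on the workers; whether that's a win depends on how expensive decoding is relative to the I/O the event loop is starved of.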
Also, if these tests make sense, maybe you could run some additional tests, just to double-confirm the results?
I didn't check database clients, but maybe they work the same way?
Summary
Publishing the WebClient response on `.boundedElastic()` brings a performance improvement. It leads to:
- shorter response times
- higher RPS
- higher CPU usage
Please let me know what you think about all of this 🙂


