Summary
A Spring Boot 2 application based on Spring WebFlux outperforms a Spring Boot 1 application by a huge margin for IO-heavy workloads. The following summarizes the result of a load test measuring the response time of an IO-heavy transaction with varying numbers of concurrent users: when the number of concurrent users remains low (say, less than 1000), both Spring Boot 1 and Spring Boot 2 handle the load well, and the 95th percentile response time stays just a few milliseconds above the expected value of 300 ms.
At higher concurrency levels, the async non-blocking IO and reactive support in Spring Boot 2 starts to show its colors: even under a very heavy load of 5000 users, the 95th percentile response time remains at around 312 ms, whereas Spring Boot 1 records a lot of failures and high response times at these concurrency levels.
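This behavior is not specific to Spring; it falls out of the thread-pool arithmetic. The following toy simulation (not part of the original test; pure JDK, with illustrative pool sizes and delays) contrasts the two models: a blocking model parks a worker thread for the full downstream delay, while a non-blocking model only schedules a callback and keeps no thread occupied during the wait.

```java
import java.util.concurrent.*;

public class ConcurrencyModels {

    // Blocking model: each in-flight request occupies a worker thread for the
    // full downstream delay, roughly like a Tomcat worker pool.
    static long blocking(int requests, int poolSize, long delayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                try { Thread.sleep(delayMs); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                done.countDown();
            });
        }
        try { done.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Non-blocking model: the delay is just a scheduled callback, no thread is
    // parked while waiting, roughly like a Netty event loop with WebClient.
    static long nonBlocking(int requests, long delayMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            timer.schedule(done::countDown, delayMs, TimeUnit.MILLISECONDS);
        }
        try { done.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        timer.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 100 concurrent "users" against a 300 ms downstream delay:
        // blocking with 10 workers needs ~10 batches (~3000 ms),
        // non-blocking completes everything in ~300 ms.
        System.out.println("blocking:     ~" + blocking(100, 10, 300) + " ms");
        System.out.println("non-blocking: ~" + nonBlocking(100, 300) + " ms");
    }
}
```

Once the requests outnumber the worker threads, total time for the blocking model grows linearly with the number of batches, which is exactly the degradation the Boot 1 numbers show at high concurrency.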
Details
My set-up for the performance test is the following:
The sample applications expose an endpoint (/passthrough/message) which in turn calls a downstream service. The request message to the endpoint looks something like this:

```json
{
  "id": "1",
  "payload": "sample payload",
  "delay": 3000
}
```

The downstream service delays its response by the number of milliseconds given in the "delay" attribute of the message.
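The post does not include the downstream service's source. As a rough stand-in, a delaying stub can be sketched with the JDK's built-in com.sun.net.httpserver (the /messages path and the ack payload here are assumptions for illustration, not the actual service):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelayService {

    // Starts a stub downstream service; pass port 0 to bind an ephemeral port.
    // It pulls the "delay" attribute out of the request body, sleeps that many
    // milliseconds, then returns a small JSON ack.
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/messages", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            Matcher m = Pattern.compile("\"delay\"\\s*:\\s*(\\d+)").matcher(body);
            long delay = m.find() ? Long.parseLong(m.group(1)) : 0;
            try { Thread.sleep(delay); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            byte[] ack = "{\"received\":true}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, ack.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(ack); }
        });
        // One thread per in-flight request, so concurrent delays overlap.
        server.setExecutor(Executors.newCachedThreadPool());
        server.start();
        return server;
    }
}
```

A stub like this keeps the load test honest: the downstream latency is fixed and predictable, so any difference in the measured response times comes from the pass-through application itself.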
Spring Boot 1 Application
I have used Spring Boot 1.5.8.RELEASE for the Boot 1 version of the application. The endpoint is a simple Spring MVC controller which in turn uses Spring's RestTemplate to make the downstream call. Everything is synchronous and blocking, and I have used the default embedded Tomcat container as the runtime. This is the raw code for the downstream call:

```java
public MessageAck handlePassthrough(Message message) {
    ResponseEntity<MessageAck> responseEntity =
            this.restTemplate.postForEntity(targetHost + "/messages", message, MessageAck.class);
    return responseEntity.getBody();
}
```
Spring Boot 2 Application
The Spring Boot 2 version of the application exposes a Spring WebFlux based endpoint and uses WebClient, the new non-blocking, reactive alternative to RestTemplate, to make the downstream call. I have also used Kotlin for the implementation, which has no bearing on the performance. The runtime server is Netty:

```kotlin
import org.springframework.http.HttpHeaders
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.BodyInserters.fromObject
import org.springframework.web.reactive.function.client.ClientResponse
import org.springframework.web.reactive.function.client.WebClient
import org.springframework.web.reactive.function.client.bodyToMono
import org.springframework.web.reactive.function.server.ServerRequest
import org.springframework.web.reactive.function.server.ServerResponse
import org.springframework.web.reactive.function.server.bodyToMono
import reactor.core.publisher.Mono

class PassThroughHandler(private val webClient: WebClient) {

    fun handle(serverRequest: ServerRequest): Mono<ServerResponse> {
        val messageMono = serverRequest.bodyToMono<Message>()
        return messageMono.flatMap { message ->
            passThrough(message)
                .flatMap { messageAck -> ServerResponse.ok().body(fromObject(messageAck)) }
        }
    }

    fun passThrough(message: Message): Mono<MessageAck> {
        return webClient.post()
            .uri("/messages")
            .header(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
            .header(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
            .body(fromObject<Message>(message))
            .exchange()
            .flatMap { response: ClientResponse -> response.bodyToMono<MessageAck>() }
    }
}
```
Details of the Performance Test
The test is simple: for different sets of concurrent users (300, 1000, 1500, 3000, 5000), I send a message with the delay attribute set to 300 ms, and each user repeats the scenario 30 times with a pause of 1 to 2 seconds between requests. I am using the excellent Gatling tool to generate this load.
Results
These are the results as captured by Gatling (report screenshots for each run, Boot 1 and Boot 2 side by side):

300 concurrent users:

| Boot 1 | Boot 2 |
| --- | --- |
| (Gatling report) | (Gatling report) |

1000 concurrent users:

| Boot 1 | Boot 2 |
| --- | --- |
| (Gatling report) | (Gatling report) |

1500 concurrent users:

| Boot 1 | Boot 2 |
| --- | --- |
| (Gatling report) | (Gatling report) |

3000 concurrent users:

| Boot 1 | Boot 2 |
| --- | --- |
| (Gatling report) | (Gatling report) |

5000 concurrent users:

| Boot 1 | Boot 2 |
| --- | --- |
| (Gatling report) | (Gatling report) |
Great Post.
Thanks
It's not fair, since Spring Boot 1 has an embedded Tomcat which is not tuned.
First, who tunes Tomcat? Second, would that matter? Is Netty tuned?
I like the post, but I tried to run this on my local computer, and it looks like my system is the bottleneck in the Spring Boot 2 solution when testing above 1500 concurrent users. Not sure why, but my CPU load is limited to 50% while running Gatling. Is there some limitation that uses only 50% of each core's power? My CPU is a 6-core i7-4930K @ 3.40GHz. Gatling is using all 12 logical cores, but does not exceed 50% on any single core.
I like it!
ReplyDeleteMy curiosity: have you tried with less than 100 concurrent users?
I got result where Boot 1 runs very faster than WebFlux; can you confirm my feeling?
Thank you.