This is the second in a series of posts exploring service to service call patterns in some of the application runtimes on Google Cloud. The first in the series explored service to service call patterns in GKE.
This post expands on it by adding a service mesh, specifically Anthos Service Mesh, and explores how the service to service patterns change in the presence of a mesh. The service to service calls will be between services in a single cluster. The next post will explore services deployed to multiple GKE clusters.
Set-Up
The steps to set up a GKE cluster and install Anthos Service Mesh on top of it are described in this document - https://cloud.google.com/service-mesh/docs/unified-install/install. In brief, these are the commands that I had to run in my GCP project to get a cluster running:
The services that I will be installing are fairly simple and look like this:
- Introduce response time delays
- Respond with certain status codes
The code for the "caller" and "producer" is in this repository - https://github.com/bijukunjummen/sample-service-to-service. Kubernetes manifests are available in the repository to bring up these services.
Behavior 1 - Mutual TLS
The first behavior that I want to see is for the caller and the producer to verify each other's identities by presenting and validating their certificates.
This can be done by adding an Istio DestinationRule for the producer, along these lines:
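As a rough sketch, such a DestinationRule could look like the following. The resource name and the host `sample-producer` are placeholders assumed here for illustration, not necessarily the names used in the repository manifests.

```yaml
# Sketch only: metadata.name and spec.host are hypothetical placeholders.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-producer-mtls        # hypothetical resource name
spec:
  host: sample-producer             # assumed Kubernetes service name of the producer
  trafficPolicy:
    tls:
      # ISTIO_MUTUAL makes the client-side sidecar originate mutual TLS
      # using the certificates that the mesh issues to each workload.
      mode: ISTIO_MUTUAL
```

With `ISTIO_MUTUAL` the sidecars handle the certificate exchange, so neither the caller nor the producer code needs to load or present certificates itself.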
Alright, now that the set-up is in place, the following is what gets captured as the request flows from the browser to the Ingress Gateway to the Caller to the Producer.
Behavior 2 - Timeout
The second behavior that I want to explore is timeouts. A request timeout can be set for the call from the Caller to the Producer by creating a Virtual Service for the Producer with the timeout value set, along these lines:
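A minimal sketch of such a Virtual Service is shown here. The resource name, the host `sample-producer`, and the 5 second value are all assumptions made for illustration; the actual manifest in the repository may differ.

```yaml
# Sketch only: names and the timeout value are hypothetical.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sample-producer-timeout     # hypothetical resource name
spec:
  hosts:
    - sample-producer               # assumed Kubernetes service name of the producer
  http:
    - timeout: 5s                   # assumed value; the sidecar gives up after this long
      route:
        - destination:
            host: sample-producer
```

The timeout is enforced by the caller's sidecar proxy, so a slow producer response is cut off without any timeout handling in the caller code.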
When the Producer takes longer than the configured timeout to respond, the mesh responds with an HTTP status code of 504 and a message of "Upstream timed out".
Behavior 3 - Circuit Breaker
A circuit breaker is implemented using a Destination Rule resource.
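A sketch of a Destination Rule with outlier detection, using the thresholds discussed in this section, could look like this. The resource name, the host `sample-producer`, and the `maxEjectionPercent` setting are assumptions added for illustration.

```yaml
# Sketch only: metadata.name, spec.host and maxEjectionPercent are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-producer-circuit-breaker   # hypothetical resource name
spec:
  host: sample-producer                   # assumed Kubernetes service name of the producer
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3             # eject a host after 3 consecutive 5XX responses
      interval: 15s                       # how often hosts are analyzed for ejection
      baseEjectionTime: 15s               # minimum time an ejected host stays out of the pool
      maxEjectionPercent: 100             # assumed: allow every producer instance to be ejected
```

If a Destination Rule already exists for the same host, for example one that sets the TLS mode, these settings would typically go under that rule's `trafficPolicy` rather than in a second rule for the same host.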
Here I have a configuration which breaks the circuit if 3 consecutive 5XX responses are received from the Producer in a 15 second interval, and then does not make a request for another 15 seconds.
With this configuration in place, a request with a broken circuit looks like this:
Conclusion
The neat thing is that in all scenarios so far, the way the Caller calls the Producer remains exactly the same. It is the mesh that injects the appropriate security controls through mTLS and makes the call more resilient through timeouts and circuit breaking.