Problem Statement
In a dynamic cloud environment, intermittent network connectivity errors can make your service temporarily unavailable. These issues are generally self-correcting: if you retry the operation after a short delay, it will most likely succeed. You need to design your microservice architecture to handle such transient errors gracefully.
Retry Design Pattern
The Retry Design Pattern states that a call which has failed with an exception can be retried automatically. This is very handy for temporary or one-off issues with your services, where a simple retry is often enough to fix the problem. For example, the load balancer might route the retry to a different, healthy server, and the call then succeeds.
You might have run into scenarios where your application is not able to connect to the database. However, if the application retries after a short delay, it might successfully establish the database connection. A Retry pattern can shield your applications from these intermittent network issues. It also reduces the burden on the application of handling failures caused by such transient errors.
In Istio, you can specify the number of retry attempts for an HTTP request in a virtual service. You can also set a timeout for each attempt (perTryTimeout), so that a busy or slow instance does not hold up the request before the next try; a later attempt sometimes does the trick. If the request is still unsuccessful after the retry attempts, the service should treat it as an error and handle it accordingly.
A sample VirtualService is shown below:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB
spec:
  hosts:
  - serviceB
  http:
  - route:
    - destination:
        host: serviceB
    retries:
      attempts: 3
      perTryTimeout: 2s
The Retry Policy should be implemented carefully so that it does not negatively impact the service. For example, you should not retry a large number of times, nor retry after a very small delay, because that adds load to a service that may already be struggling. Set the Retry Policy so that it covers genuinely transient errors and still reports a real failure as soon as possible. If operations are not idempotent, retries need extra care: a request that was actually processed before the failure was reported could be applied twice.
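The cautions above could be expressed in the VirtualService roughly as follows. This is a minimal sketch rather than a recommended configuration: it reuses serviceB from the earlier sample, and the retryOn conditions and the 10s overall timeout are illustrative assumptions that you would tune for your own service.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB
spec:
  hosts:
  - serviceB
  http:
  - route:
    - destination:
        host: serviceB
    # Assumed value: upper bound on the whole request, retries included,
    # so a real failure is reported without a long wait.
    timeout: 10s
    retries:
      # Keep the number of attempts small to avoid adding load to a struggling service.
      attempts: 3
      # Each attempt is abandoned after 2s.
      perTryTimeout: 2s
      # Assumed conditions: retry only on failures that are likely transient.
      retryOn: gateway-error,connect-failure,refused-stream
```

Restricting retryOn to connection-level and gateway errors is one way to reduce the chance of repeating a request that the service has already processed, which matters most for operations that are not idempotent.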
Conclusion
Handling transient failures is required in a microservice architecture. The network is not always reliable, and sometimes retrying the failed operation saves the day. Istio provides a transparent way of handling application retries for such intermittent network errors.
Read more about Retry Pattern here