Retry Design Pattern with Istio

By Samir Behara on June 5, 2019

Problem Statement

In a dynamic cloud environment, intermittent network connectivity errors can make your service temporarily unavailable. These issues are generally self-correcting, and if you retry the operation after a small delay it will most probably succeed. You need to design your microservice architecture to handle such transient errors gracefully.

Retry Design Pattern

The Retry design pattern states that an operation which failed earlier because of an exception can be retried automatically. This is very handy for temporary or one-off issues with your services. Often a simple retry fixes the issue: the load balancer might route the retry to a different, healthy server, and the call succeeds.

You might have run into scenarios where your application cannot connect to the database, yet it establishes the connection successfully when it retries after a short delay. Having a retry pattern in place can shield your applications from these intermittent network issues. It also reduces the burden on the application of handling such transient errors itself.
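The pattern itself can be sketched in a few lines of application code. The Python below is only a minimal illustration and is not part of Istio; the names `call_with_retry` and `TransientError` are hypothetical, and which failures count as transient is an assumption you would adapt to your own service.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, attempts=3, delay_seconds=0.1, sleep=time.sleep):
    """Call operation(); on TransientError, wait briefly and try again.

    Re-raises the last TransientError once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: report the failure to the caller
            sleep(delay_seconds)  # short pause before the next attempt
```

A caller would wrap the flaky operation, for example `call_with_retry(connect_to_database, attempts=3)`, where `connect_to_database` stands in for your own function. Any exception other than `TransientError` is deliberately not retried and propagates immediately.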

You can specify the number of retry attempts for an HTTP request in a virtual service. You can also set a timeout for each attempt (perTryTimeout in the sample below); the service might be busy, and trying the request again a little later sometimes does the trick. If the request is still unsuccessful after all retry attempts, the service should treat it as an error and handle it accordingly.

Sample VirtualService below:

	apiVersion: networking.istio.io/v1alpha3
	kind: VirtualService
	metadata:
	  name: serviceB
	spec:
	  hosts:
	  - serviceB
	  http:
	  - route:
	    - destination:
	        host: serviceB
	    retries:
	      attempts: 3
	      perTryTimeout: 2s

The retry policy should be implemented carefully so that it does not negatively impact the service. For example, you should not retry a large number of times, nor retry after only a very short delay, since both add load to a service that may already be struggling. Set the policy so that transient errors get a fair chance to clear while persistent failures are reported as soon as possible. If operations are not idempotent, a retry can repeat their side effects, so the retry logic might introduce additional complexity.
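One common way to respect both limits is to cap the number of attempts and lengthen the wait after each failure (exponential backoff). This technique is not spelled out in the configuration above; the Python helper below is a hypothetical sketch, and its name, parameters, and default values are assumptions chosen for illustration.

```python
def backoff_delays(attempts, base_seconds=0.5, factor=2.0, cap_seconds=5.0):
    """Return the wait before each retry for a bounded number of attempts.

    attempts counts every try including the first one, so the result
    holds attempts - 1 waits. Each wait grows by factor, up to cap_seconds.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = []
    delay = base_seconds
    for _ in range(attempts - 1):
        delays.append(min(delay, cap_seconds))  # never wait longer than the cap
        delay *= factor
    return delays
```

With these assumed defaults, `backoff_delays(4)` gives three waits of 0.5, 1.0, and 2.0 seconds, so the caller gives up after a bounded total delay instead of retrying indefinitely.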

Conclusion
Handling transient failures is required in a microservice architecture. The network is not reliable, and sometimes retrying the failed operation saves the day. Istio provides a transparent way of handling application retries in case of such intermittent network errors.