Reliability is key to a microservices architecture. Circuit Breaker is a design pattern for creating resilient microservices by limiting the impact of service failures and latencies. One of its primary goals is to handle failures gracefully so that they do not cascade to other services. In a microservice landscape, failing fast is critical, and a circuit breaker does a great job of protecting services from heavy load.
If a service in your microservice ecosystem starts failing, you need to fail fast by opening the circuit. Once the circuit is open, no additional calls are made to the failing service, and callers get an error immediately instead of waiting. The pattern also monitors the system, and once things are back to normal, the circuit is closed again to allow normal traffic.
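The open/closed behavior described above can be sketched as a small state machine. This is a minimal, illustrative Python sketch of the general Circuit Breaker pattern, not how Istio, Envoy, or Hystrix implement it. The class name CircuitBreaker, the half-open trial call, and the failure_threshold and reset_timeout parameters are assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative closed -> open -> half-open circuit breaker (hypothetical API)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial call
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            assert self.opened_at is not None
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: reject without calling the failing service.
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: let a trial call through to probe for recovery.
            self.state = "half-open"
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # A successful call closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def failing_service() -> str:
        raise ConnectionError("upstream unavailable")

    for _ in range(2):
        try:
            breaker.call(failing_service)
        except ConnectionError:
            pass
    print(breaker.state)  # open

    try:
        breaker.call(lambda: "ok")
    except CircuitOpenError:
        print("rejected immediately")

    now[0] = 11.0  # reset_timeout has elapsed
    print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

Injecting the clock keeps the sketch testable; a real breaker would also need to handle concurrent callers while it is half-open.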
In my earlier blog post, I explained Outlier Detection, an Istio resiliency strategy that detects unusual host behavior and evicts unhealthy hosts from the set of load-balanced healthy hosts inside a cluster. Read more about it here:
Hystrix vs Istio
- The Hystrix library, part of Netflix OSS, has been the leading circuit breaker tooling in the microservices world. Hystrix can be considered white-box monitoring, whereas Istio can be considered black-box monitoring: Istio observes the system from the outside and does not know how it works internally, while the Hystrix library is added to each individual service to capture the required data.
- You can configure and use Istio's advanced resiliency features without changing application code. Hystrix, in contrast, requires changing each of your services to include the Hystrix library.
- Istio improves the reliability and availability of services in the mesh. However, applications still need to handle errors and take appropriate fallback actions. For example, when all instances in a load balancing pool have failed, Envoy returns HTTP 503. It is the responsibility of the application to implement any fallback logic needed to handle an HTTP 503 from an upstream service. Hystrix, on the other hand, provides a built-in fallback mechanism, which is very helpful: a Hystrix fallback can return an error message or a single default value, serve a response from a cache, or even call another service.
- Envoy is completely transparent to the application, whereas the Hystrix library has to be embedded in each service call.
- Istio can act as a circuit breaker in a polyglot landscape, whereas Hystrix is focused primarily on Java applications.
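With Istio, the fallback described in the comparison above lives in application code. A minimal sketch, assuming a hypothetical get_with_fallback helper and an injected call_upstream function that returns a status code and a body (in a real service this would be an HTTP request going through the Envoy sidecar): on HTTP 503 it serves the last good response from a local cache, or a default value if nothing is cached.

```python
from typing import Callable, Dict, Tuple

# Hypothetical upstream call returning (status_code, body). In a real service this
# would be an HTTP request that goes through the Envoy sidecar.
UpstreamCall = Callable[[], Tuple[int, str]]


def get_with_fallback(
    call_upstream: UpstreamCall, cache: Dict[str, str], key: str, default: str
) -> str:
    """Return the upstream body, falling back to the cache or a default on HTTP 503."""
    status, body = call_upstream()
    if status == 200:
        cache[key] = body  # remember the last good response for future fallbacks
        return body
    if status == 503:
        # Envoy answered 503 (for example, no healthy upstream host is available):
        # degrade gracefully instead of propagating the error to our own callers.
        return cache.get(key, default)
    raise RuntimeError(f"unexpected upstream status {status}")


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    print(get_with_fallback(lambda: (200, "live ratings"), cache, "ratings", "no ratings"))  # live ratings
    print(get_with_fallback(lambda: (503, ""), cache, "ratings", "no ratings"))  # live ratings
    print(get_with_fallback(lambda: (503, ""), {}, "ratings", "no ratings"))  # no ratings
```

Whether to serve stale data, a default value, or an error is a product decision; the sketch only shows where that decision lives when the mesh, rather than a library such as Hystrix, handles circuit breaking.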
Resiliency and Fault Tolerance Capabilities
Istio adds fault tolerance to your application without any changes to the code. Some of the resiliency features it supports are:
- Retries and Timeouts
- Circuit breakers
- Health checks
- Outlier Detection
- Fault injection
Circuit Breaker Settings
Envoy provides a set of out-of-the-box, opt-in failure recovery features that the services in an application can take advantage of. You can limit the number of concurrent connections and requests to upstream services so that they are not overwhelmed by a large number of requests.
- Maximum Connections: Maximum number of HTTP/1.1 and TCP connections to a destination host. Requests that cannot get a connection wait in a pending queue. You can modify this number by changing the maxConnections field (under connectionPool.tcp).
- Maximum Pending Requests: Maximum number of requests that can be queued while waiting for a connection. Any requests beyond this limit are rejected. You can modify this number by changing the http1MaxPendingRequests field (under connectionPool.http).
- Maximum Requests: Maximum number of active requests to a destination host at any given time. You can modify this number by changing the http2MaxRequests field, which applies to both HTTP/1.1 and HTTP/2. Do not confuse it with maxRequestsPerConnection, which limits how many requests are sent over a single connection (setting it to 1 disables keep-alive).
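The interplay of the first two limits can be illustrated with a toy model. This is a simplified Python sketch, assuming HTTP/1.1 semantics where each in-flight request occupies one connection; it is not Envoy's actual connection pool, and the ConnectionPoolModel class and its method names are hypothetical. Requests first take a free connection (up to maxConnections), then wait in a bounded queue (up to http1MaxPendingRequests), and anything beyond that is rejected immediately.

```python
from collections import deque
from typing import Deque


class ConnectionPoolModel:
    """Toy model of maxConnections + http1MaxPendingRequests (not Envoy's code)."""

    def __init__(self, max_connections: int, max_pending: int) -> None:
        self.max_connections = max_connections
        self.max_pending = max_pending
        self.active = 0  # requests currently holding a connection
        self.pending: Deque[int] = deque()  # requests waiting for a connection
        self.rejected = 0

    def submit(self, request_id: int) -> str:
        if self.active < self.max_connections:
            self.active += 1
            return "active"
        if len(self.pending) < self.max_pending:
            self.pending.append(request_id)
            return "pending"
        # Both limits reached: fail fast instead of queueing without bound.
        self.rejected += 1
        return "rejected"

    def complete(self) -> None:
        """An active request finishes; its connection goes to the oldest pending request."""
        if self.active == 0:
            raise ValueError("no active request to complete")
        if self.pending:
            self.pending.popleft()  # active count is unchanged: the connection is reused
        else:
            self.active -= 1


if __name__ == "__main__":
    pool = ConnectionPoolModel(max_connections=1, max_pending=1)
    print([pool.submit(i) for i in range(3)])  # ['active', 'pending', 'rejected']
    pool.complete()  # request 0 finishes, request 1 takes over the connection
    print(pool.submit(3))  # pending
```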
When creating a DestinationRule, you specify these circuit breaker fields in the connectionPool settings of the trafficPolicy section. Sample DestinationRule below:
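A minimal sketch of such a rule, generated from Python so that the values can be checked before they are applied. Kubernetes accepts manifests as JSON as well as YAML, so the printed output could be piped to kubectl apply -f -. The service host reviews, the rule name, the networking.istio.io/v1beta1 API version, and all threshold values are illustrative assumptions, not recommendations. The outlierDetection block is included to show where the outlier detection settings fit in the same trafficPolicy section.

```python
import json
from typing import Any, Dict


def circuit_breaker_rule(
    host: str,
    max_connections: int,
    max_pending_requests: int,
    max_requests: int,
    consecutive_5xx_errors: int,
) -> Dict[str, Any]:
    """Build an Istio DestinationRule with connection-pool limits and outlier detection."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": f"{host}-circuit-breaker"},  # hypothetical naming scheme
        "spec": {
            "host": host,
            "trafficPolicy": {
                "connectionPool": {
                    # Maximum HTTP/1.1 and TCP connections to a destination host.
                    "tcp": {"maxConnections": max_connections},
                    "http": {
                        # Requests queued while waiting for a connection; extras are rejected.
                        "http1MaxPendingRequests": max_pending_requests,
                        # Maximum active requests to a destination host at any time.
                        "http2MaxRequests": max_requests,
                    },
                },
                "outlierDetection": {
                    # Eject a host after this many consecutive 5xx errors.
                    "consecutive5xxErrors": consecutive_5xx_errors,
                    "interval": "10s",  # time between ejection sweep analyses
                    "baseEjectionTime": "30s",  # minimum ejection duration
                    "maxEjectionPercent": 100,  # share of hosts that may be ejected
                },
            },
        },
    }


if __name__ == "__main__":
    rule = circuit_breaker_rule(
        host="reviews",
        max_connections=100,
        max_pending_requests=10,
        max_requests=100,
        consecutive_5xx_errors=5,
    )
    print(json.dumps(rule, indent=2))
```

Passing the thresholds as parameters keeps them in one place for review; the structure is the same as the YAML form of a DestinationRule.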
The circuit breaker short-circuits any pending requests or connections that exceed the specified thresholds, which is exactly the fail-fast behavior the pattern aims for.
In this article, we looked at how you can protect your services from an unexpected number of requests or from an outage of a dependent service. You can use the circuit breaker configuration to implement throttling logic that rejects excess incoming requests.