Circuit breakers in microservices
microservices, distributed computing, circuit breakers, messaging, HTTP, synchronous
It is well known that distributed computing has its problems, and a microservices architecture is distributed by definition. The eight fallacies of distributed computing, a list commonly attributed to L. Peter Deutsch and dated to 1994, summarize many of those problems by naming the false assumptions made by people who preferred to ignore them.
But I am not going to hide these problems. Instead, I want to talk about one of the ways we have to stay resilient when we run into some of them while our systems are running in production: the circuit breaker.
Avoid depending on other services to begin with
Although I am going to talk about how to be resilient when the external services we depend on are failing, I must first advise you not to depend on synchronous communication unless it is absolutely necessary. You can find more information on how to make microservices communicate properly, which also applies to external services, here.
When you depend on an external service, or on another microservice in your own architecture, and you need to make synchronous calls to it, that service may:
- Be broken: a developer made a mistake and the service does respond, but with an error such as an HTTP 500 status code.
- Respond eventually, but so slowly that it triggers a timeout on your side.
- Not respond at all because it is down.
- Be unreliable: it works only some of the time and otherwise returns errors or does not respond, with no predictable pattern.
In any of these situations, if you have not prepared your system to be resilient, it will crash and your users will be affected.
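To make these failure modes concrete, here is a minimal sketch of how a caller might tell them apart. The `classify` function and its labels are hypothetical names of mine, and I am assuming the HTTP client surfaces slow calls as `TimeoutError` and unreachable hosts as `ConnectionError`; a real client library raises its own exception types, which you would need to map.

```python
def classify(call):
    """Run `call` (a zero-argument function that returns an HTTP status
    code) and label the outcome. The labels are illustrative only."""
    try:
        status = call()
    except TimeoutError:
        return "timeout"        # the service answered too slowly
    except ConnectionError:
        return "unreachable"    # the service is down
    if status >= 500:
        return "server_error"   # the service responds, but it is broken
    return "ok"
```

The unreliable service has no label of its own: it shows up as a mix of these outcomes over many calls, which is why failures have to be tracked over time.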
The circuit breaker
Could you prepare a plan B to run when your system depends on a service that has one of the issues above? The circuit breaker is the concept of switching from plan A (your synchronous call) to plan B when it detects that the service is failing too often. Let's see the flow:
- We keep track of the number of failed requests to the service.
- If the failures reach some threshold, we mark the service as broken and switch to plan B.
- From time to time we check the service: when it works again, we mark it as working and switch back to plan A.
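The flow above could be sketched like this. It is a simplified illustration, not a production implementation: the `CircuitBreaker` class, its parameter names, and the default values are my own assumptions. I also assume that failures are counted consecutively, and that "from time to time" means letting one real call through once `retry_after` seconds have passed.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, retry_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before marking broken
        self.retry_after = retry_after              # seconds to wait before re-checking
        self.clock = clock                          # injectable so it can be tested
        self.failures = 0                           # consecutive failed requests
        self.broken_since = None                    # None means "marked as working"

    @property
    def broken(self):
        return self.broken_since is not None

    def call(self, plan_a, plan_b):
        # While the service is marked as broken, skip plan A entirely
        # until it is time to check the service again.
        if self.broken and self.clock() - self.broken_since < self.retry_after:
            return plan_b()
        try:
            result = plan_a()
        except Exception:
            self.failures += 1
            # A failed re-check restarts the wait; otherwise the service is
            # marked as broken only once the threshold is reached.
            if self.broken or self.failures >= self.failure_threshold:
                self.broken_since = self.clock()
            return plan_b()
        # Plan A worked: mark the service as working and reset the counter.
        self.failures = 0
        self.broken_since = None
        return result
```

You would wrap each synchronous call in something like `breaker.call(lambda: fetch_price(item), lambda: cached_price(item))`, where both functions are placeholders for your own code. Note that in this sketch a single failed call also falls back to plan B, even before the threshold is reached.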
It is important to note that all of these situations need to be properly logged, and we need to set alarms to detect them, because they are not always going to be fixed by someone else: for example, when the failing service is one of our own microservices. More on monitoring properly here.
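As a rough illustration of what logging these situations could look like, the sketch below records only the state changes of the breaker. The logger name and the `log_transition` function are hypothetical, and I am assuming that your alerting system raises alarms on `ERROR`-level log records.

```python
import logging

logger = logging.getLogger("circuit_breaker")


def log_transition(service_name, was_broken, is_broken, failures):
    """Log only when the service changes state, so the alarm is not buried
    under one log line per failed request."""
    if is_broken and not was_broken:
        # ERROR level on purpose: assumed to be what triggers the alarm.
        logger.error(
            "%s marked as broken after %d failed requests; plan B is active",
            service_name, failures,
        )
    elif was_broken and not is_broken:
        logger.info("%s marked as working again; plan A restored", service_name)
```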
The plan B
We've talked about switching to a plan B when our circuit breaker detects that the target service is failing, but what should this plan B be?
There is no single answer to this, as it depends on the situation, the kind of flow, the target service, and more. But here are some options:
- Send a message to a queue and set up a daemon to process those messages. The daemon ignores the messages while the service is marked as broken and processes them while it is marked as working, so when the service works again, the daemon processes all the pending messages. There are some considerations, though: message ordering, whether it still makes sense to process a message later than expected, idempotency, transactionality...
- Respond using a cache: if the service you depend on just returns some information, you could save that information in a cache and return the cached copy when the service is down.
- Disable the full feature: in some situations, disabling a whole feature of your product can be the only way to react to this kind of situation properly.
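The first two options could look roughly like the sketch below. Everything in it is an assumption for illustration: an in-memory `deque` stands in for a real message queue, a plain dictionary stands in for a real cache, and `get_with_cache` and `drain_pending` are hypothetical names.

```python
from collections import deque

cache = {}         # last known good responses, keyed by request
pending = deque()  # stand-in for a real message queue


def get_with_cache(key, fetch):
    """Plan A refreshes the cache; plan B serves the last known value."""
    try:
        value = fetch(key)
    except Exception:
        if key in cache:
            return cache[key]  # possibly stale, but better than an error
        raise                  # nothing cached: the caller has to decide
    cache[key] = value
    return value


def drain_pending(send, service_is_broken):
    """One pass of the daemon: do nothing while the service is marked as
    broken, otherwise deliver the pending messages in arrival order."""
    delivered = 0
    while pending and not service_is_broken():
        send(pending[0])   # if this raises, the message stays queued
        pending.popleft()  # remove only after a successful send
        delivered += 1
    return delivered
```

Because a message is removed only after `send` returns, a crash between the two steps would deliver it twice on the next pass. That is exactly why idempotency appears among the considerations above.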
Distributed computing is not easy, and the same applies to a microservices architecture. Even so, there are dozens of situations in which it is still the best option, and in those situations we need to deal with all of its drawbacks. A circuit breaker is an excellent tool for this.