Recently I tried to visualize the current state of a circuit breaker in Grafana as a Stat. The monitored system is based on Spring Boot and resilience4j, and gathers its metrics using Micrometer and InfluxDB.
When I tried to visualize all states in a single Stat, I at first only got one Stat per circuit breaker state. Combining them all into a single Stat wasn’t as easy as I expected.
The problem
Sadly, resilience4j doesn’t combine the states into one single metric value. Instead it emits one metric with a separate “state” tag value for every state (see circuitbreaker-metrics). So for every state/tag there is a value of 0 or 1, indicating that the state is not active (= 0) or active (= 1), respectively.
Naively using the “group by” function with the tag “state” results in multiple Stats, each showing only the value for a single state tag:
Not my desired outcome.
A solution
So how can we combine these states into a single Stat, ideally showing the state as text (rather than a numeric value)?
Like this:
- Create a Query per state (and multiply the value by a factor to spread out the result set)
- Reduce the queries using the Transformation tab (by simply summing every value up)
- (Optionally) Add a Value Mapping and Threshold for every possible value in the result set
So let’s get a little bit into the details.
Create a Query per state
First we have to create a query per state (“closed”, “half_open”, “open”, “forced_open” and “disabled”) as shown in the image below.
Note:
- Every query multiplies its value by a different factor (which will be explained in the next step).
- The “application_name” is a custom tag to distinguish different applications inside InfluxDB, and can be ignored here.
- The “name” tag corresponds to the resilience4j circuit breaker’s name given inside my application.
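The five queries can also be written down as text. The following is a minimal sketch that builds one InfluxQL query per state. The measurement name "resilience4j_circuitbreaker_state", the field name "value", the aggregation last() and the circuit breaker name "backendA" are assumptions for illustration and may differ in your setup; only the tags “name” and “state” and the factors come from this article.

```python
# Factors from the article: one distinct power of two per state.
FACTORS = {"closed": 1, "half_open": 2, "open": 4, "forced_open": 8, "disabled": 16}

# Assumed names (not confirmed by the article); adjust them to your setup.
MEASUREMENT = "resilience4j_circuitbreaker_state"
FIELD = "value"

# $timeFilter and $__interval are macros that Grafana substitutes at query time.
QUERY_TEMPLATE = (
    'SELECT last("{field}") * {factor} FROM "{measurement}" '
    "WHERE \"name\" = '{name}' AND \"state\" = '{state}' "
    "AND $timeFilter GROUP BY time($__interval) fill(null)"
)


def build_query(breaker_name, state, factor):
    """Return the InfluxQL text for one state, scaled by its factor."""
    return QUERY_TEMPLATE.format(
        field=FIELD,
        factor=factor,
        measurement=MEASUREMENT,
        name=breaker_name,
        state=state,
    )


# "backendA" is a hypothetical circuit breaker name.
queries = [build_query("backendA", state, factor) for state, factor in FACTORS.items()]
for query in queries:
    print(query)
```

Each printed line could be pasted into the text mode of one query in the panel editor, as an alternative to clicking the query together.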
Reduce the queries using the Transformation tab
After creating the queries we reduce all of them to a single, new value by simply calculating the sum of all the queries’ results. We’re naming this new value “state” (because we don’t have enough ambiguity by now :-). The “Time” field has to be excluded.
This is where multiplying every query by a factor comes into play.
If the circuit breaker is in a certain state, the corresponding metric changes to 1 (while all others remain 0). Multiplying each metric by its own factor gives a one-to-one mapping between the value “state” (obtained by summing up the queries) and the circuit breaker’s state:
- 1 = closed
- 2 = half_open
- 4 = open
- 8 = forced_open
- 16 = disabled
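To make the arithmetic concrete, here is a minimal sketch of what the transformation computes. The function name and the sample reading are made up for illustration; the factors are the ones listed above.

```python
# Factors from the list above.
FACTORS = {"closed": 1, "half_open": 2, "open": 4, "forced_open": 8, "disabled": 16}

# Reverse lookup: combined value -> state name.
STATE_BY_VALUE = {factor: state for state, factor in FACTORS.items()}


def combined_state(raw):
    """Scale each 0/1 reading by its factor and sum the results."""
    return sum(FACTORS[state] * reading for state, reading in raw.items())


# Hypothetical reading: the circuit breaker is open, so only "open" reports 1.
raw = {"closed": 0, "half_open": 0, "open": 1, "forced_open": 0, "disabled": 0}
value = combined_state(raw)
print(value, STATE_BY_VALUE.get(value, "unknown"))  # prints: 4 open
```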
Noteworthy:
- If “state” has a value of 0, this means that there is no state for the circuit breaker, which has to be a measurement failure.
- If (for whatever reason) more than one circuit breaker state is measured with “1”, the sum of all states will result in a “state” which isn’t covered in our list above. E.g. if “closed” and “open” are both measured as 1, then the sum will be 1 + 4 = 5, and 5 is obviously not a covered condition (and will be shown in Grafana’s visualization accordingly).
- This is also the reason we choose powers of 2 for the factors. If we had chosen factors like [closed=1, half_open=2, open=3, forced_open=4, disabled=5] we wouldn’t be able to recognize an erroneous situation like this: “closed” and “open” together would sum to 1 + 3 = 4, which looks exactly like “forced_open”.
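The difference between the two factor sets can be checked by brute force. The sketch below (the helper name is made up) lists every sum of two or more factors that collides with the factor of a single state.

```python
from itertools import combinations


def ambiguous_sums(factors):
    """Return sums of two or more factors that equal a single state's factor."""
    singles = set(factors.values())
    clashes = set()
    for size in range(2, len(factors) + 1):
        for combo in combinations(factors.values(), size):
            if sum(combo) in singles:
                clashes.add(sum(combo))
    return sorted(clashes)


POWERS_OF_TWO = {"closed": 1, "half_open": 2, "open": 4, "forced_open": 8, "disabled": 16}
SEQUENTIAL = {"closed": 1, "half_open": 2, "open": 3, "forced_open": 4, "disabled": 5}

print(ambiguous_sums(POWERS_OF_TWO))  # prints: []
print(ambiguous_sums(SEQUENTIAL))     # prints: [3, 4, 5]
```

With the sequential factors, a “state” of 3, 4 or 5 could mean either one valid state or an invalid combination of several. With powers of 2, every combination produces a value that no single state uses.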
Finally, an additional step is needed to reduce the states to a single Stat: select the transformation outcome “state” at “Value options / Fields” in the “Panel display options”.
Add a Value Mapping and Threshold for every possible value in the result set
By now we’ve reached our goal to have a single Stat combining every circuit breaker state.
Let’s tweak the appearance by adding a value mapping for every possible value. This results in a nicer user experience because we now show the state as readable text instead of only a numeric value. Additionally we set a Threshold to color every state other than “closed” in a heartwarming red.
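The effect of the value mapping and the threshold can be sketched roughly as follows. This only models the behavior and is not Grafana’s configuration format. The green color for “closed” and the exact threshold steps are assumptions, since all that is fixed above is that every other state is red.

```python
import math

# Value mapping: combined value -> text shown in the Stat.
VALUE_MAPPINGS = {1: "closed", 2: "half_open", 4: "open", 8: "forced_open", 16: "disabled"}

# Assumed threshold steps as (lower bound, color); the last matching step wins.
# A red base also covers the value 0 (no state measured).
THRESHOLDS = [(-math.inf, "red"), (1, "green"), (2, "red")]


def render(value):
    """Return the (text, color) pair for a combined state value."""
    text = VALUE_MAPPINGS.get(value, str(value))  # unmapped values stay numeric
    color = THRESHOLDS[0][1]
    for lower_bound, step_color in THRESHOLDS:
        if value >= lower_bound:
            color = step_color
    return text, color


print(render(1))  # prints: ('closed', 'green')
print(render(4))  # prints: ('open', 'red')
print(render(5))  # prints: ('5', 'red')
```

An unmapped value such as 5 stays numeric and red, so an erroneous measurement remains visible in the panel.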
This will lead to the desired outcome if the Circuit Breaker is closed:
And if the Circuit Breaker is open (all other states accordingly):