Circuit breaking

Circuit breaking is a critical component of distributed systems. It’s nearly always better to fail quickly and apply back pressure downstream as soon as possible. One of the main benefits of an Envoy mesh is that Envoy enforces circuit breaking limits at the network level as opposed to having to configure and code each application independently. Envoy supports various types of fully distributed (not coordinated) circuit breaking:

  • Cluster maximum connections: The maximum number of connections that Envoy will establish to all hosts in an upstream cluster. If this circuit breaker overflows the upstream_cx_overflow counter for the cluster will increment.

  • Cluster maximum pending requests: The maximum number of requests that will be queued while waiting for a ready connection pool connection. Requests are added to the list of pending requests whenever there aren’t enough upstream connections available to immediately dispatch the request. For HTTP/2 connections, if max concurrent streams and max requests per connection are not configured, all requests will be multiplexed over the same connection so this circuit breaker will only be hit when no connection is already established. If this circuit breaker overflows the upstream_rq_pending_overflow counter for the cluster will increment.

  • Cluster maximum requests: The maximum number of requests that can be outstanding to all hosts in a cluster at any given time. If this circuit breaker overflows the upstream_rq_pending_overflow counter for the cluster will increment.

  • Cluster maximum active retries: The maximum number of retries that can be outstanding to all hosts in a cluster at any given time. In general we recommend using retry budgets; however, if static circuit breaking is preferred it should aggressively circuit break retries. This is so that retries for sporadic failures are allowed, but the overall retry volume cannot explode and cause large scale cascading failure. If this circuit breaker overflows the upstream_rq_retry_overflow counter for the cluster will increment.

  • Cluster maximum concurrent connection pools: The maximum number of connection pools that can be concurrently instantiated. Some features, such as the Original Src Listener Filter, can create an unbounded number of connection pools. When a cluster has exhausted its concurrent connection pools, it will attempt to reclaim an idle one. If it cannot, then the circuit breaker will overflow. This differs from Cluster maximum connections in that connection pools never time out, whereas connections typically will. Connections automatically clean up; connection pools do not. Note that in order for a connection pool to function it needs at least one upstream connection, so this value should likely be no greater than Cluster maximum connections. If this circuit breaker overflows the upstream_cx_pool_overflow counter for the cluster will increment.

Each circuit breaking limit is configurable and tracked on a per upstream cluster and per priority basis. This allows different components of the distributed system to be tuned independently and have different limits. The live state of these circuit breakers, including the number of resources remaining until a circuit breaker opens, can be observed via statistics.

Note that circuit breaking will cause the x-envoy-overloaded header to be set by the router filter in the case of HTTP requests.