Rails adds jitter to ActiveJob::Exceptions.retry_on


Rails has added jitter to ActiveJob::Exceptions.retry_on to prevent the thundering herd effect.

Before Rails 6.1

Thundering herd effect in the context of ActiveJob job failures

When a job fails, ActiveJob::Exceptions.retry_on catches the exception and reschedules the job after a specified delay. The delay can be a number of seconds (an ActiveSupport::Duration is also allowed), :exponentially_longer, or a block that receives the number of executions so far as its argument.

Consider a situation where a lot of jobs fail around the same time. The cause could be anything, such as a network error or heavy load on the database. The exception handler catches the error for each job and reschedules it with a static wait time.

The problem with a static wait time is that jobs which failed at around the same time are also rescheduled, and therefore retried, at around the same time. Many failed jobs retrying at once is the thundering herd effect: it can easily overwhelm the system and cause the jobs to fail again. They are then retried after the same static interval, with similar results, until the retry limit is exhausted.
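The effect can be illustrated with a small simulation in plain Ruby. This is only a sketch under simplifying assumptions: every job fails at t = 0, is rescheduled once, and the helper names `retry_times` and `retries_per_second` are made up for this example rather than taken from Rails.

```ruby
# Simplified model: every job failed at t = 0 and is rescheduled once.
# `jitter` is the fraction of the wait that may be added at random.
def retry_times(job_count, wait:, jitter: 0.0, rng: Random.new(42))
  Array.new(job_count) { wait + (rng.rand * wait * jitter) }
end

# Count how many retries land in each one-second bucket.
def retries_per_second(times)
  times.group_by(&:floor).transform_values(&:size)
end

static   = retries_per_second(retry_times(1_000, wait: 60))
jittered = retries_per_second(retry_times(1_000, wait: 60, jitter: 0.15))

puts "static wait: peak of #{static.values.max} retries in a single second"
puts "with jitter: retries spread over #{jittered.size} seconds, " \
     "peak of #{jittered.values.max}"
```

With a static wait, all 1,000 retries fall into the same second. With a jitter of 0.15 they are spread across the roughly nine seconds that follow the base wait.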

With Rails 6.1

Jitter

The thundering herd effect can be avoided with a custom wait block. But because many users are not aware that this situation can arise, jitter is now applied by default to randomize the wait.
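A custom wait block of that kind might look like the sketch below. `jittered_wait` is a hypothetical helper written for this example, not part of Rails. It returns a lambda that takes the execution count, which is the shape retry_on expects for a block passed as wait.

```ruby
# Hypothetical helper (not part of Rails): builds a wait block that adds a
# random extra delay of up to `spread` times the base delay to each retry.
def jittered_wait(base_seconds, spread: 0.15)
  ->(_executions) { base_seconds + (rand * base_seconds * spread) }
end

wait_block = jittered_wait(5)

# In a job class this would be passed as:
#   retry_on CustomAppException, wait: jittered_wait(5), attempts: 3
3.times do |i|
  puts "attempt #{i + 1}: retry in #{wait_block.call(i + 1).round(2)}s"
end
```

Each call returns a slightly different delay between 5 and 5.75 seconds, so jobs that failed together no longer retry together.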

The wait time is randomized depending on the type of wait argument provided. Essentially, a small random delay calculated from the jitter value is added to the wait time. Note that jitter is only applied when an Integer, an ActiveSupport::Duration, or :exponentially_longer is passed as the wait argument to retry_on.
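The calculation can be modelled roughly as follows. This is a simplified sketch, not the Rails source: `delay_with_jitter` is a made-up name, durations are represented as plain integer seconds, and `rand_value` stands in for the random number so the result can be checked by hand. The base delay for :exponentially_longer follows the documented `(executions**4) + 2` formula.

```ruby
# Simplified model of the delay calculation, not the Rails implementation.
# `rand_value` stands in for Kernel.rand so the result can be inspected.
def delay_with_jitter(wait, executions:, jitter: 0.15, rand_value: rand)
  case wait
  when :exponentially_longer
    base = executions**4
    base + (rand_value * base * jitter) + 2
  when Integer
    wait + (rand_value * wait * jitter)
  when Proc
    wait.call(executions) # custom blocks are used as-is, no jitter is added
  else
    raise ArgumentError, "unsupported wait: #{wait.inspect}"
  end
end

# 5 + (0.5 * 5 * 0.15)
puts delay_with_jitter(5, executions: 1, rand_value: 0.5)
# 16 + (0.5 * 16 * 0.15) + 2
puts delay_with_jitter(:exponentially_longer, executions: 2, rand_value: 0.5)
# The block decides the delay on its own: 3 * 10
puts delay_with_jitter(->(n) { n * 10 }, executions: 3)
```

Because the random factor lies between 0 and 1, the added delay never exceeds the jitter fraction of the base delay, and a jitter of 0 leaves the wait unchanged.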

The jitter value defaults to 15% (represented as 0.15), but it can be overridden by passing it as an argument:

retry_on CustomAppException, wait: 5.seconds, attempts: 3, jitter: 0.30

A default jitter value can also be set at the config level (defaults to 0.15):

config.active_job.retry_jitter = 0.45

If you do not want jitter, it can be disabled by passing zero or false:

retry_on CustomAppException, wait: 5.seconds, attempts: 3, jitter: 0 # or false

Or at the config level:

config.active_job.retry_jitter = false # or 0