Rails has added jitter to ActiveJob::Exceptions.retry_on to prevent the thundering herd effect.
Before Rails 6.1
Thundering herd effect in the context of ActiveJob job failures
When a job fails, ActiveJob::Exceptions.retry_on catches the exception for the failed job and reschedules the job with a specified delay. The delay can be a number of seconds (an ActiveSupport::Duration is also allowed), :exponentially_longer, or a block that receives the number of executions so far as its argument.
Consider a situation where a lot of jobs fail around the same time. The cause can be anything, such as a network error or heavy load on the database. The exception handler will catch the error for each job and reschedule it with a static wait time.
The problem with a static wait time is that the jobs are all rescheduled around the same time, so they are all retried together once the interval elapses. Many failed jobs being retried at the same moment causes the thundering herd effect, which can easily bring down the system and make the jobs fail again. The jobs are then retried after the same static interval with similar results, until the retry limit is exhausted.
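To make the problem concrete, here is a minimal plain-Ruby sketch (no Rails required) of what a static wait does to a burst of failures. The job count, the failure time, and the retry_times helper are all hypothetical and chosen only for illustration.

```ruby
# Hypothetical scenario: 1,000 jobs all fail within the same second
# (for example, during a brief database outage).
FAILED_AT = Array.new(1_000) { 0 } # failure time of each job, in whole seconds

# With a static wait, every job is rescheduled exactly `wait` seconds
# after the moment it failed.
def retry_times(failed_at, wait:)
  failed_at.map { |t| t + wait }
end

# Count how many retries land on each second.
retries_per_second = retry_times(FAILED_AT, wait: 5).tally

retries_per_second.each do |second, count|
  puts "second #{second}: #{count} retries"
end
# prints "second 5: 1000 retries"
```

Every retry lands on the same second, so the system sees the same spike of load that caused the failures in the first place.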
With Rails 6.1
Jitter
The thundering herd effect can be taken care of with a custom wait argument block. But since many users are not aware that a situation like this can happen, jitter was introduced by default to help randomize the wait.
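As a rough sketch of the custom block approach, the lambda below adds a random offset to a base delay by hand. The names JITTERED_WAIT, BASE_DELAY and SPREAD, and the 5 second / 15% values, are assumptions for illustration. In a job class, a callable like this would be passed as the wait argument to retry_on, which calls it with the number of executions.

```ruby
# Hypothetical hand-rolled jitter: a 5 second base delay plus a random
# offset of up to 15% of that delay (roughly 5.0 to 5.75 seconds).
BASE_DELAY = 5
SPREAD = 0.15

JITTERED_WAIT = lambda do |executions|
  # `executions` is how many times the job has run so far. This sketch
  # ignores it, but it could be used to grow the delay on each attempt.
  BASE_DELAY + (rand * BASE_DELAY * SPREAD)
end

# In a job class this would be used as:
#   retry_on CustomAppException, wait: JITTERED_WAIT, attempts: 3
puts JITTERED_WAIT.call(1)
```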
The wait time is randomized with jitter depending on the type of wait argument provided. Essentially, a small random delay calculated from the jitter value is added to the wait time. Note that jitter is only applied when an Integer, an ActiveSupport::Duration, or :exponentially_longer is passed as the wait argument to retry_on.
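The calculation can be sketched in plain Ruby as follows. This is an approximation of the behaviour, not the Rails source: the method names delay_with_jitter and jitter_for are hypothetical, the random source is injectable only so the example can be made reproducible, and the executions**4 + 2 formula for :exponentially_longer follows the one given in the retry_on documentation.

```ruby
# Random extra delay: somewhere between 0 and `jitter` times the base delay.
# A jitter of 0, false or nil adds nothing.
def jitter_for(delay, jitter, random: Kernel)
  return 0.0 unless jitter && jitter.positive?

  random.rand * delay * jitter
end

# Hypothetical helper that picks the retry delay for a given `wait` value.
def delay_with_jitter(wait, executions: 1, jitter: 0.15, random: Kernel)
  case wait
  when :exponentially_longer
    base = executions**4
    base + jitter_for(base, jitter, random: random) + 2
  when Integer
    # In Rails an ActiveSupport::Duration is treated the same way,
    # after being converted to whole seconds.
    wait + jitter_for(wait, jitter, random: random)
  when Proc
    # A custom block is used as-is: no jitter is added.
    wait.call(executions)
  else
    raise ArgumentError, "unsupported wait: #{wait.inspect}"
  end
end

puts delay_with_jitter(5)            # somewhere between 5 and 5.75
puts delay_with_jitter(5, jitter: 0) # 5.0
puts delay_with_jitter(:exponentially_longer, executions: 2, jitter: false) # 18.0
```

With the default of 0.15, a 5 second wait becomes a delay of up to 5.75 seconds, which is enough to spread a burst of retries over a short window instead of a single instant.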
The jitter value defaults to 15% (represented as 0.15), but it can be overridden by passing it as an argument:
retry_on CustomAppException, wait: 5.seconds, attempts: 3, jitter: 0.30
There is also a provision to set the default jitter value at the config level (defaults to 0.15):
config.active_job.retry_jitter = 0.45
If you do not wish to add jitter, it can be disabled by providing a zero/false value:
retry_on CustomAppException, wait: 5.seconds, attempts: 3, jitter: 0 # or false
Or at the config level:
config.active_job.retry_jitter = false # or 0