Recovering cleanly from Resque::TermException or SIGTERM on Heroku
When we restart or deploy, we get a number of Resque jobs in the failed queue with either Resque::TermException (SIGTERM) or Resque::DirtyExit.
We’re using the new TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 settings in our Procfile, so our worker line looks like:
worker: TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake environment resque:work QUEUE=critical,high,low
We’re also using resque-retry, which I thought might automatically retry on these two exceptions, but it doesn’t seem to.
So I guess two questions:
- We could manually rescue from Resque::TermException in each job, and use this to reschedule the job. But is there a clean way to do this for all jobs? Even a monkey patch.
- Shouldn’t resque-retry auto-retry these? Can you think of any reason why it wouldn’t?
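To make the first question concrete, here is a minimal sketch of what a catch-all could look like. It assumes Resque's `around_perform_*` hook convention and that re-enqueueing with the same arguments is safe for the job. `RequeueOnTerm` and `ExampleJob` are hypothetical names, not part of Resque or resque-retry, and the fallback stub at the top exists only so the snippet runs without the gem installed.

```ruby
begin
  require 'resque'
rescue LoadError
  # Stand-in used only when the resque gem is not installed, so this sketch
  # can be run in isolation. The real gem defines both of these.
  module Resque
    class TermException < SignalException; end

    def self.enqueue(klass, *args)
      puts "would enqueue #{klass} with #{args.inspect}"
    end
  end
end

# Hypothetical mixin; extend it from any job class that should be re-queued.
module RequeueOnTerm
  # Resque treats class methods named around_perform_* as hooks and expects
  # them to yield in order to run the job.
  def around_perform_requeue_on_term(*args)
    yield
  rescue Resque::TermException
    # The worker is shutting down: put the job back on its queue instead of
    # letting the exception send it to the failed queue.
    Resque.enqueue(self, *args)
  end
end

# Hypothetical job showing how the mixin is applied.
class ExampleJob
  extend RequeueOnTerm
  @queue = :low

  def self.perform(record_id)
    # long-running work goes here
  end
end
```

A job re-queued this way restarts from the beginning, so this only suits jobs that are idempotent. It also only covers Resque::TermException, which is raised inside the job. It would not catch Resque::DirtyExit.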
Thanks!
Edit: Getting all jobs to complete in less than 10 seconds seems unreasonable at scale. It seems like there needs to be a way to automatically re-queue these jobs when the Resque::DirtyExit exception is raised.