Recovering cleanly from Resque::TermException or SIGTERM on Heroku

11 years ago

admin


When we restart or deploy, a number of Resque jobs end up in the failed queue with either Resque::TermException (SIGTERM) or Resque::DirtyExit.

We’re using the new TERM_CHILD=1 and RESQUE_TERM_TIMEOUT=10 settings in our Procfile, so our worker line looks like this:

worker:  TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake environment resque:work QUEUE=critical,high,low

We’re also using resque-retry, which I expected to retry jobs that fail with these two exceptions automatically, but it doesn’t seem to.

So I guess two questions:

1. We could manually rescue Resque::TermException in each job and use that to reschedule the job. But is there a clean way to do this for all jobs? Even a monkey patch would do.
2. Shouldn’t resque-retry retry these automatically? Can you think of any reason why it wouldn’t?
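For the first question, one possible way to cover every job without a rescue block in each perform is a small mixin built on Resque's hook convention: class methods whose names start with around_perform are called with the job's args and must yield to run the job. This is a hypothetical sketch; RequeueOnTerm, requeue_after_term and ReportJob are illustrative names, not part of Resque or resque-retry. It assumes Resque.enqueue can still reach Redis during shutdown and that the re-enqueue finishes before RESQUE_TERM_TIMEOUT expires and the parent worker sends KILL. Because the job will run again from the start, its work should be safe to repeat.

```ruby
# Hypothetical mixin (not part of Resque or resque-retry): re-enqueue a job
# whose run is interrupted by Resque::TermException during a shutdown.
# Relies on Resque's convention that class methods named around_perform*
# wrap the job and must yield to run it.
module RequeueOnTerm
  def around_perform_requeue_on_term(*args)
    yield
  rescue Resque::TermException
    # Put the same job back on its queue, then swallow the exception so
    # this interrupted run is not also recorded in the failed queue.
    requeue_after_term(*args)
  end

  # Default re-enqueue step; a job class can override it.
  # Assumption: this completes before the parent worker sends KILL.
  def requeue_after_term(*args)
    Resque.enqueue(self, *args)
  end
end

# Example job (hypothetical name): extend once per class instead of adding
# a rescue block to every perform method.
class ReportJob
  extend RequeueOnTerm
  @queue = :low

  def self.perform(report_id)
    # long-running work goes here
  end
end
```

Swallowing the exception after re-enqueueing is deliberate: re-raising it would record a failed-queue entry in addition to the re-enqueued copy of the job.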

Thanks!

Edit: Getting every job to finish in under 10 seconds isn’t realistic at scale. There should be a way to re-queue these jobs automatically when they fail with Resque::DirtyExit.
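Resque::DirtyExit is typically recorded by the parent worker after the forked child has already died (for example, from the KILL sent once RESQUE_TERM_TIMEOUT runs out), so code inside perform cannot rescue it. A failure hook is one possible place to react instead: Resque calls class methods whose names start with on_failure, passing the exception followed by the job's args. This is a hypothetical sketch; RequeueOnShutdownFailure and its method names are made up, and it assumes your Resque version runs failure hooks when it records DirtyExit, which is worth confirming before relying on it.

```ruby
# Hypothetical mixin: re-enqueue a job when Resque records a failure caused
# by the worker shutting down rather than by the job itself.
# Relies on Resque's convention that class methods named on_failure* are
# called with the exception followed by the job's args.
module RequeueOnShutdownFailure
  # Compared by class name so this file loads even before resque is required.
  SHUTDOWN_ERRORS = %w[Resque::DirtyExit Resque::TermException].freeze

  def on_failure_requeue_on_shutdown(exception, *args)
    # Ordinary job errors are left alone.
    return unless SHUTDOWN_ERRORS.include?(exception.class.name)

    requeue_after_shutdown(*args)
  end

  # Default re-enqueue step; a job class can override it.
  def requeue_after_shutdown(*args)
    Resque.enqueue(self, *args)
  end
end
```

A job opts in with extend RequeueOnShutdownFailure. The hook only adds a re-enqueue; failure hooks run alongside Resque's failure recording rather than replacing it, so expect the failed-queue entries to remain.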