Monday, December 21, 2009

Introducing resque-scheduler


At Avvo, we've got a lot of backend Ruby/Rails jobs running as crons. Jobs to refresh leaderboards, jobs to warm caches, jobs to pull data from third parties. The list is long and distinguished. These jobs run on different machines, at different times, at different intervals. Up until recently, we've been managing it all with Puppet. We guess at which machine has spare cycles to run a job, and when. We've ignored the problem long enough that we now have cron configuration spread across a couple dozen files, running all sorts of different things. It's annoying and difficult to keep track of.

Just a few weeks ago, we integrated Resque and started porting over our jobs. Having distributed, generic workers is awesome. It's another thing we'd long put off because I couldn't find just the right fit. Resque is that fit. Only one thing was missing: a way to add things to queues based on a schedule. What I wanted in addition to job processing was job scheduling, to replace our gazillion cron jobs.

So we hacked up resque-scheduler. Resque-scheduler is a gem that extends Resque to support scheduled jobs. Installation (hosted on gemcutter.org):
  gem install resque-scheduler
Resque-scheduler takes the schedule as a hash, which is easily represented in YAML...
queue_documents_for_indexing:
  cron: "0 0 * * *"
  class: QueueDocuments
  args: 
  description: "This job queues all content for indexing in solr"

clear_leaderboards_contributors:
  cron: "30 6 * * 1"
  class: ClearLeaderboards
  args: contributors
  description: "This job resets the weekly leaderboard for contributions"

clear_leaderboards_moderator:
  cron: "30 6 * * 1"
  class: ClearLeaderboards
  args: moderators
  description: "This job resets the weekly leaderboard for moderators"

... that can be set in your resque initializer like so:
  require 'resque-scheduler'
  ResqueScheduler.schedule = YAML.load_file(File.join(File.dirname(__FILE__), '../resque_schedule.yml'))
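
For context, each `class` value in the schedule has to name an ordinary Resque job: a class with a `@queue` and a `self.perform` method. The sketch below shows roughly how one schedule entry could be turned into a queue, a class, and an argument list. It is an illustration only, not resque-scheduler's actual code: the `enqueue_args_for` helper, the queue names, and the bodies of `perform` are all made up for the example.

```ruby
require 'yaml'

# Two entries in the same shape as the schedule file above.
SCHEDULE_YAML = <<~YAML
  queue_documents_for_indexing:
    cron: "0 0 * * *"
    class: QueueDocuments
    args:
  clear_leaderboards_contributors:
    cron: "30 6 * * 1"
    class: ClearLeaderboards
    args: contributors
YAML

# Hypothetical job classes. Resque jobs declare a queue and a self.perform.
class QueueDocuments
  @queue = :indexing # assumed queue name

  def self.perform
    'queued documents'
  end
end

class ClearLeaderboards
  @queue = :leaderboards # assumed queue name

  def self.perform(board)
    "cleared #{board} leaderboard"
  end
end

# Hypothetical helper: resolve a schedule entry to [queue, class, args].
# Array() turns a missing args value into [] and a single value into [value].
def enqueue_args_for(config)
  klass = Object.const_get(config['class'])
  args  = Array(config['args'])
  queue = klass.instance_variable_get(:@queue)
  [queue, klass, args]
end

schedule = YAML.load(SCHEDULE_YAML)
schedule.each do |name, config|
  queue, klass, args = enqueue_args_for(config)
  # With Resque loaded, this is where Resque.enqueue(klass, *args) would go.
  puts "#{name}: #{klass}.perform(#{args.join(', ')}) on queue #{queue}"
end
```

The point of the sketch is that a scalar `args` value such as `contributors` ends up as the single argument to `perform`, and an empty `args` means `perform` is called with no arguments.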
Then to run the scheduler process, a simple rake task:
  $ rake resque:scheduler
This process idles until a schedule item fires, then it stuffs it into the appropriate queue. See http://github.com/bvandenbos/resque-scheduler for the complete details. Obviously, you need to be using Resque to take advantage of resque-scheduler.

For the future, I'd like to find a clean way to extend resque-web to display the schedule and allow users to click a button to manually queue items in the schedule.

Many thanks to resque (defunkt) and rufus-scheduler (jmettraux), which are doing all the heavy lifting.
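
To make the cron strings in the schedule concrete, here is a deliberately simplified sketch of what "a schedule item fires" means. The real parsing and timing are handled by rufus-scheduler; the `cron_matches?` and `cron_field_matches?` helpers below are hypothetical, support only `*` and plain integers in the five fields, and ignore ranges, lists, step values, and cron's special handling of day-of-month versus day-of-week.

```ruby
# Hypothetical helper: a field matches if it is "*" or equals the value.
def cron_field_matches?(field, value)
  field == '*' || field.to_i == value
end

# Hypothetical helper: does a five-field cron expression
# (minute hour day-of-month month day-of-week) match the given Time?
def cron_matches?(expr, time)
  minute, hour, day, month, weekday = expr.split
  cron_field_matches?(minute, time.min) &&
    cron_field_matches?(hour, time.hour) &&
    cron_field_matches?(day, time.day) &&
    cron_field_matches?(month, time.month) &&
    cron_field_matches?(weekday, time.wday) # 0 = Sunday, 1 = Monday
end

monday_morning  = Time.new(2009, 12, 21, 6, 30) # a Monday, 06:30
tuesday_morning = Time.new(2009, 12, 22, 6, 30) # the next day, 06:30

puts cron_matches?('30 6 * * 1', monday_morning)  # weekly leaderboard reset
puts cron_matches?('30 6 * * 1', tuesday_morning)
puts cron_matches?('0 0 * * *', Time.new(2009, 12, 21, 0, 0)) # nightly indexing
```

Read this way, `"30 6 * * 1"` is 06:30 every Monday and `"0 0 * * *"` is midnight every day.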
