Queue Master

Grant Carthew edited this page Aug 8, 2016 · 44 revisions

Description

When creating a Queue object within rethinkdb-job-queue, you can customize its operation with configuration options. One of these options is masterInterval. If it is set to false, the Queue object will not be a Master Queue. If it is set to a number, the Queue object will be a Master Queue, and the number is the repeat period in seconds for the database review. The default masterInterval is 310 seconds (five minutes and ten seconds), which is ten seconds past the default job timeout of 300 seconds.
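The rules for masterInterval can be sketched in plain JavaScript. This is only an illustration of the behaviour described above, not code from rethinkdb-job-queue: the resolveMaster helper and its return shape are hypothetical, and rejecting non-positive numbers is an assumption.

```javascript
// Hypothetical helper, not part of rethinkdb-job-queue.
const DEFAULT_JOB_TIMEOUT = 300 // seconds, the default job timeout
const DEFAULT_MASTER_INTERVAL = DEFAULT_JOB_TIMEOUT + 10 // 310 seconds

function resolveMaster (options = {}) {
  const value = options.masterInterval === undefined
    ? DEFAULT_MASTER_INTERVAL
    : options.masterInterval

  // false disables the master role entirely.
  if (value === false) {
    return { isMaster: false, intervalSeconds: null }
  }
  // Assumption: anything other than a positive number is rejected.
  if (typeof value !== 'number' || !(value > 0)) {
    throw new TypeError('masterInterval must be false or a positive number')
  }
  return { isMaster: true, intervalSeconds: value }
}

console.log(resolveMaster()) // master, reviewing every 310 seconds
console.log(resolveMaster({ masterInterval: false })) // not a master
console.log(resolveMaster({ masterInterval: 60 })) // master, every 60 seconds
```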

It is worth noting that only one master review can be enabled per Node.js process. If the process already has a Queue object that is a master, then creating more Master Queue objects will not enable additional database reviews.
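One way to picture the one-review-per-process rule is a module-level guard. The sketch below is a hypothetical model. The names enableReview and disableReview are invented for illustration, and the library may enforce the rule differently.

```javascript
// Hypothetical model of "one review per Node.js process".
// Module-level state is shared by every queue created in this process.
let activeReview = null

function enableReview (queueName, intervalSeconds, reviewFn) {
  if (activeReview) return false // a master review is already running
  const timer = setInterval(reviewFn, intervalSeconds * 1000)
  timer.unref() // do not keep the process alive just for the review timer
  activeReview = { queueName, timer }
  return true
}

function disableReview () {
  if (!activeReview) return false
  clearInterval(activeReview.timer)
  activeReview = null
  return true
}

const first = enableReview('emails', 310, () => {})
const second = enableReview('reports', 310, () => {})
console.log(first, second) // true false
disableReview()
```

In this model the second queue still works as a normal queue. It simply does not start a second review timer.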

The queue master role in rethinkdb-job-queue is integral to ensuring failed jobs get processed and the database is cleaned. A Master Queue will perform three tasks within the job queue:

  1. Retry jobs if the queue worker node stops responding.
  2. Process jobs delayed for retry if the Queue object is idle.
  3. Remove completed, cancelled, or terminated jobs from the queue.

Failed Queue Worker Node

During normal queue operation, worker nodes processing jobs will detect when a job has taken too long and is operating past its timeout value. If this situation occurs the job status in the database is set to 'failed' and the job will be delayed based on the retryDelay, retryCount, and retryMax values. See Job Retry for more detail on the delay process.

However, if a queue worker node fails for any reason whilst working on a job, the job will not complete and will remain in the database with an active status, leaving it orphaned.

To ensure the job is not forgotten, a Master Queue will repeatedly review the queue's backing table in the database based on the masterInterval. When the master node reviews the backing table, it looks for jobs that are active and past their dateRetry value, which is set when the job is retrieved from the database and made active.

Again, for more detail on the dateRetry value see the Job Retry document.

If a job is in the database with a status of active and the current time is past the dateRetry value, this indicates the worker node has stalled or failed. The database review process will update the job to a failed status and set its priority to retry (a stored value of 1), which is the highest priority. The job's retryCount value will also be incremented.
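The review step for orphaned jobs can be modelled in memory as follows. This is a sketch of the rule described above, not the library's implementation, which runs against the RethinkDB backing table. The reviewOrphanedJobs function and the sample job objects are hypothetical.

```javascript
// In-memory model of the orphaned job review.
const PRIORITY_RETRY = 1 // stored value of the "retry" priority

function reviewOrphanedJobs (jobs, now = new Date()) {
  const updatedIds = []
  for (const job of jobs) {
    // Active and past dateRetry means the worker stalled or failed.
    if (job.status === 'active' && job.dateRetry < now) {
      job.status = 'failed'
      job.priority = PRIORITY_RETRY
      job.retryCount += 1
      updatedIds.push(job.id)
    }
  }
  return updatedIds
}

const now = new Date('2016-08-08T12:00:00Z')
const jobs = [
  // priority 40 is just a sample value for this sketch
  { id: 'a', status: 'active', priority: 40, retryCount: 0, dateRetry: new Date('2016-08-08T11:55:00Z') },
  { id: 'b', status: 'active', priority: 40, retryCount: 0, dateRetry: new Date('2016-08-08T12:03:00Z') },
  { id: 'c', status: 'waiting', priority: 40, retryCount: 0, dateRetry: new Date('2016-08-08T11:00:00Z') }
]

console.log(reviewOrphanedJobs(jobs, now)) // only job 'a' is past its dateRetry while active
```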

It is possible for a job that is still being processed to run past its initial timeout value and be marked as failed by the Queue Master review process. To prevent this, call the Job.progress method on the Job object. When progress for a job is updated, the dateRetry value is also updated. Therefore, calling Job.progress periodically within the job timeout period will prevent the job from erroneously being marked as failed on review.

It is worth noting that the database review process also runs when the queue.process() function is first called. This means that if you don't have a master queue, orphaned jobs will still be updated on process restart.
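The effect of progress updates on the review can be illustrated with a small model. It assumes dateRetry is recalculated as the current time plus the job timeout, both on activation and on each progress update. The exact calculation is not given here, so treat that as an assumption. All function names are hypothetical.

```javascript
// Hypothetical model, not the library's code.
// Assumption: dateRetry = time of last activity + job timeout (seconds).
function retryDeadline (job, now) {
  return new Date(now.getTime() + job.timeout * 1000)
}

function activate (job, now) {
  job.status = 'active'
  job.dateRetry = retryDeadline(job, now)
}

function updateProgress (job, percent, now) {
  job.progress = percent
  job.dateRetry = retryDeadline(job, now) // pushes the deadline forward
}

function looksOrphaned (job, now) {
  return job.status === 'active' && now > job.dateRetry
}

const start = new Date('2016-08-08T12:00:00Z')
const at = seconds => new Date(start.getTime() + seconds * 1000)

const silent = { timeout: 300 }
const reporting = { timeout: 300 }
activate(silent, start)
activate(reporting, start)
updateProgress(reporting, 50, at(200)) // deadline moves to start + 500 s

console.log(looksOrphaned(silent, at(400))) // true
console.log(looksOrphaned(reporting, at(400))) // false
```

Both jobs are still running at 400 seconds, but only the one that reported progress survives a review at that moment.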

Delayed Job Processing

In a busy queue the database will be queried often on completion of jobs to find more jobs to process. This includes finding jobs with a status of waiting, timeout, or retry.

If the last job in the queue fails and the retryDelay value is not 0, the job's status will be set to 'retry' and the queue will enter an idle state.

Without something initiating the queue to process jobs, the last job will remain in the database until more jobs are added to the queue.

To prevent this situation from delaying the last job well beyond its dateRetry value, the Master Queue database review process completes by calling the queue process task. The queue process task will query the database discovering the delayed job and retrieve it for processing.
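The idea that each review finishes by kicking off job processing can be sketched like this. The model is hypothetical: findReadyJobs and review are invented names, and treating a 'retry' job as ready once the current time reaches its dateRetry is an assumption.

```javascript
// Hypothetical model of "the review completes by calling the process task".
function findReadyJobs (jobs, now) {
  return jobs.filter(job =>
    job.status === 'waiting' ||
    // Assumption: a delayed job is ready once its dateRetry has passed.
    (job.status === 'retry' && job.dateRetry <= now)
  )
}

function review (jobs, now, processJobs) {
  // Orphan detection and old job removal would run here first.
  processJobs(findReadyJobs(jobs, now))
}

const delayed = [
  { id: 'last', status: 'retry', dateRetry: new Date('2016-08-08T12:05:00Z') }
]
const picked = []
const processJobs = ready => ready.forEach(job => picked.push(job.id))

review(delayed, new Date('2016-08-08T12:00:00Z'), processJobs) // too early
review(delayed, new Date('2016-08-08T12:10:00Z'), processJobs) // delay elapsed
console.log(picked) // [ 'last' ]
```

Even though the queue is idle and nothing new is added, the second review finds the delayed job and hands it over for processing.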

Remove Old Jobs

Once a job has finished processing and its status is changed to either completed, cancelled, or terminated, it will no longer be an active part of the queue. The job details in the database including its log entries and other properties are just taking up space.

If you are processing thousands of jobs a day, this might not be a big deal, and you may well be happy to leave the job details in the database for future reference. However, if you are processing millions of jobs a day, the space taken up by the completed jobs could add up over a year or more. In that case you will want to remove completed, cancelled, or terminated jobs from the database to free up space.

Fortunately rethinkdb-job-queue has three options for cleaning up jobs once they are finished. If you set the Queue.removeFinishedJobs property to true, jobs that are completed, cancelled, or terminated will be removed from the database immediately.

If you set the Queue.removeFinishedJobs property to false, jobs will never be removed from the database no matter what their status is.

You do have the option of setting the Queue.removeFinishedJobs property to a number representing days. The default is 180 days. If the property value is a number, then at some point in the future after a job has been completed, cancelled, or terminated it will need to be removed from the database. This is the final task for a Master Queue.

When the Master Queue reviews the database and the Queue.removeFinishedJobs property is a number, any completed, cancelled, or terminated jobs whose expiry day has passed will be removed.
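The expiry check for a numeric removeFinishedJobs value could look something like the sketch below. It is an in-memory model only. The selectExpiredJobs function is hypothetical, and it assumes each finished job carries a dateFinished timestamp from which its age is measured.

```javascript
// Hypothetical model of the "remove old jobs" part of the review.
const FINISHED = ['completed', 'cancelled', 'terminated']
const MS_PER_DAY = 24 * 60 * 60 * 1000

function selectExpiredJobs (jobs, removeFinishedJobs, now) {
  // true: jobs are removed immediately when they finish, not by the review.
  // false: jobs are never removed.
  if (typeof removeFinishedJobs !== 'number') return []

  const cutoff = now.getTime() - removeFinishedJobs * MS_PER_DAY
  return jobs.filter(job =>
    FINISHED.includes(job.status) &&
    // Assumption: dateFinished records when the job finished.
    job.dateFinished.getTime() < cutoff
  )
}

const now = new Date('2017-03-01T00:00:00Z')
const jobs = [
  { id: 'old', status: 'completed', dateFinished: new Date('2016-08-01T00:00:00Z') }, // 212 days ago
  { id: 'recent', status: 'cancelled', dateFinished: new Date('2016-12-01T00:00:00Z') }, // 90 days ago
  { id: 'running', status: 'active' }
]

console.log(selectExpiredJobs(jobs, 180, now).map(job => job.id)) // [ 'old' ]
console.log(selectExpiredJobs(jobs, false, now).length) // 0
```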
