Queue Master

Grant Carthew edited this page Aug 16, 2016 · 44 revisions

Description

When creating a Queue object within rethinkdb-job-queue you can customize its operation with configuration options. One of these options is masterInterval. If it is set to false, the Queue object will not be a Queue Master. If it is set to a positive integer, the Queue object will be a Queue Master. See the Queue Options document for more detail.

The value of the masterInterval represents a repeat time period in seconds. The default value for the masterInterval is 310 seconds or five minutes and ten seconds. This is ten seconds past the default job timeout value of 300 seconds. The extra 10 seconds is to assist in detecting failed jobs directly after queue startup. During long term operation the extra 10 seconds will make no difference.
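The relationship between the two defaults can be sketched as follows. This is a minimal illustration and not the library's code: the constant names and the resolveMasterInterval helper are hypothetical, and the only facts taken from the text are the values 300, 10, and 310, the unit of seconds, and the meaning of false.

```javascript
// Hypothetical constants mirroring the defaults described above (in seconds).
const DEFAULT_JOB_TIMEOUT = 300
const STARTUP_MARGIN = 10
const DEFAULT_MASTER_INTERVAL = DEFAULT_JOB_TIMEOUT + STARTUP_MARGIN // 310

// Hypothetical helper: decide whether a Queue would act as a Queue Master
// and, if so, how many milliseconds a timer would wait between reviews.
function resolveMasterInterval (masterInterval = DEFAULT_MASTER_INTERVAL) {
  if (masterInterval === false) {
    return { isMaster: false, intervalMs: null }
  }
  if (!Number.isInteger(masterInterval) || masterInterval <= 0) {
    throw new TypeError('masterInterval must be false or a positive integer')
  }
  return { isMaster: true, intervalMs: masterInterval * 1000 }
}

console.log(resolveMasterInterval())      // master, reviewing every 310 seconds
console.log(resolveMasterInterval(false)) // not a master
```

The conversion to milliseconds is only there because Node.js timers such as setInterval take milliseconds; how the library schedules its review internally is not shown here.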

When the time period elapses, the Queue Master will review the database Table backing the queue. This is called the Queue Review process.

It is worth noting that only one Queue Master can be enabled per Node.js process. If the Node.js process already has a Queue object that is a master, then creating more Queue Master objects will not enable multiple database reviews.

The Queue Master role in rethinkdb-job-queue is integral to ensuring that failed jobs get processed and the database is cleaned. A Queue Master will perform three tasks within the job queue during the Queue Review process:

  1. Failed Node.js Process
  • Discover and retry jobs that have failed due to the Node.js process crashing or hanging.
  2. Delayed Job Processing
  • Process failed jobs delayed for retry if the Queue object is idle.
  3. Remove Finished Jobs
  • Remove completed, cancelled, or terminated jobs from the queue.

If you do not enable a Queue Master against a queue, these tasks will still be performed during Node.js process start as long as a handler function has been added to a Queue object. See the Queue.process document for more detail.

Failed Node.js Process

During normal queue operation, worker nodes processing jobs will detect when a job has taken too long and is operating past its timeout value. If this situation occurs the job status in the database is set to 'failed' and the job will be delayed based on the retryDelay, retryCount, and retryMax values. See Job Retry for more detail on the delay process.

However, if a queue worker node fails for any reason whilst working on a job, the job will not complete and will remain in the database with an active status, leaving an orphaned job.

To ensure the job is not forgotten, a Queue Master will repeatedly review the queue's backing table at the period set by the masterInterval. When the Queue Master reviews the backing table, it looks for jobs that are active and past their dateRetry value, which is set when the job is retrieved from the database and made active.

Again, for more detail on the dateRetry value see the Job Retry document.

If a job is in the database with a status of active and the current time is past the dateRetry value, this indicates the worker node has stalled or failed. The database review process will update the job to a failed status and set its priority to retry (a stored value of 1), which is the highest priority. The job's retryCount value will also be incremented.
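The orphan check described above can be modelled in memory as follows. This is a sketch under stated assumptions and not the library's implementation: markOrphanedJobs and the sample job objects are hypothetical, the real review runs against the database table, and the starting priority of 40 is just a placeholder value.

```javascript
// Stored priority value for a retry, as stated in the text above.
const PRIORITY_RETRY = 1

// Hypothetical in-memory model of the orphan check. `jobs` is an array of
// plain objects with status, dateRetry (Date), priority, and retryCount.
function markOrphanedJobs (jobs, now = new Date()) {
  const orphaned = []
  for (const job of jobs) {
    // An active job past its dateRetry means its worker stalled or failed.
    if (job.status === 'active' && job.dateRetry < now) {
      job.status = 'failed'
      job.priority = PRIORITY_RETRY
      job.retryCount += 1
      orphaned.push(job)
    }
  }
  return orphaned
}

const reviewTime = new Date('2016-08-16T12:00:00Z')
const sampleJobs = [
  { id: 'a', status: 'active', dateRetry: new Date('2016-08-16T11:55:00Z'), priority: 40, retryCount: 0 },
  { id: 'b', status: 'active', dateRetry: new Date('2016-08-16T12:05:00Z'), priority: 40, retryCount: 0 },
  { id: 'c', status: 'waiting', dateRetry: new Date('2016-08-16T11:00:00Z'), priority: 40, retryCount: 0 }
]
// Only job 'a' is both active and past its dateRetry.
console.log(markOrphanedJobs(sampleJobs, reviewTime).map(job => job.id))
```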

It is possible for a job that is still being processed to run past its initial timeout value and be marked as failed by the Queue Master review process. To prevent this, call the Job.progress method on the Job object. When progress for a job is updated, the dateRetry value is also updated. Therefore calling Job.progress periodically within the job timeout period will prevent the job from erroneously being marked as failed on review.
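The effect of a progress update on the review can be illustrated with a small model. The functions below are hypothetical and do not call the library. In particular, the formula that moves dateRetry to the current time plus one timeout period is an assumption made for illustration; the text only states that dateRetry is updated.

```javascript
// Hypothetical model: reporting progress pushes dateRetry forward.
// ASSUMPTION: the new dateRetry is "now + timeout seconds".
function reportProgress (job, percent, now = new Date()) {
  job.progress = percent
  job.dateRetry = new Date(now.getTime() + job.timeout * 1000)
  return job
}

// The condition a review would use to treat an active job as orphaned.
function isOverdue (job, now) {
  return job.status === 'active' && job.dateRetry < now
}

const longJob = {
  status: 'active',
  timeout: 300, // seconds
  progress: 0,
  dateRetry: new Date('2016-08-16T12:05:00Z')
}
const reviewAt = new Date('2016-08-16T12:06:00Z')

console.log(isOverdue(longJob, reviewAt)) // true: no progress was reported
reportProgress(longJob, 50, new Date('2016-08-16T12:04:00Z'))
console.log(isOverdue(longJob, reviewAt)) // false: dateRetry is now 12:09:00
```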

It is worth noting that the database review process also runs when the queue.process() function is first called. This means that if you don't have a Queue Master, orphaned jobs will still be updated on process restart.

Delayed Job Processing

In a busy queue the database will be queried often on completion of jobs to find more jobs to process. This includes finding jobs with a status of waiting, timeout, or retry.

If the last job in the queue fails and the retryDelay value is not 0, the job's status will be set to 'retry' and the queue will enter an idle state.

Without something initiating the queue to process jobs, the last job will remain in the database until more jobs are added to the queue.

To prevent this situation from delaying the last job well beyond its dateRetry value, the Queue Master database review process completes by calling the queue process task. The queue process task queries the database, discovers the delayed job, and retrieves it for processing.
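A simplified model of that final step is sketched below. The function names, the queue object shape, and its running counter are hypothetical. Using dateRetry as the readiness test for every eligible status is a simplifying assumption; the statuses themselves are the ones listed in the text.

```javascript
// Statuses the text above lists as eligible for processing.
const RUNNABLE_STATUSES = ['waiting', 'timeout', 'retry']

// Hypothetical query: eligible jobs whose dateRetry has already passed.
function findReadyJobs (jobs, now) {
  return jobs.filter(job =>
    RUNNABLE_STATUSES.includes(job.status) && job.dateRetry <= now)
}

// Hypothetical end of a review: if the queue is idle, look for delayed
// jobs so they are not left waiting until new jobs are added.
function reviewThenProcess (queue, now) {
  if (queue.running > 0) return [] // a busy queue will query again on its own
  return findReadyJobs(queue.jobs, now)
}

const idleQueue = {
  running: 0,
  jobs: [
    { id: 'last', status: 'retry', dateRetry: new Date('2016-08-16T12:00:00Z') }
  ]
}
// The delayed job is picked up by the review, not by a new job arriving.
console.log(reviewThenProcess(idleQueue, new Date('2016-08-16T12:05:10Z')).map(job => job.id))
```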

Remove Finished Jobs

Once a job has finished processing and its status is changed to either completed, cancelled, or terminated, it will no longer be an active part of the queue. The job details in the database including its log entries and other properties are just taking up space.

If you are processing thousands of jobs a day, this might not be a big deal, and you may be happy to leave the job details in the database for future reference. However, if you are processing millions of jobs a day, the space taken up by the completed jobs could add up over a year or more. In that case you will want to remove completed, cancelled, or terminated jobs from the database to free up space.

Fortunately rethinkdb-job-queue has three options for cleaning up jobs once they are finished. If you set the Queue.removeFinishedJobs property to true, jobs that are completed, cancelled, or terminated will be removed from the database immediately.

If you set the Queue.removeFinishedJobs property to false, jobs will never be removed from the database no matter what their status is.

You can also set the Queue.removeFinishedJobs property to a number representing days. The default is 180 days. When the value is a number, each job is removed once that many days have passed since it was completed, cancelled, or terminated. This removal is the final task for a Queue Master.

When the Queue Master reviews the database and the Queue.removeFinishedJobs property is a number, any finished job whose expiry day has passed will be removed.
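The three settings can be summarised as a single decision function. This is a hedged sketch and not the library's code: shouldRemoveJob is a hypothetical name, and measuring the expiry from a dateFinished field on the job is an assumption made for illustration.

```javascript
const FINISHED_STATUSES = ['completed', 'cancelled', 'terminated']
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Hypothetical decision: should a job be removed from the database?
// removeFinishedJobs is true, false, or a number of days (default 180).
function shouldRemoveJob (job, removeFinishedJobs, now) {
  if (!FINISHED_STATUSES.includes(job.status)) return false
  if (removeFinishedJobs === true) return true   // remove immediately
  if (removeFinishedJobs === false) return false // never remove
  // ASSUMPTION: expiry is counted in days from the job's dateFinished value.
  const expiry = new Date(job.dateFinished.getTime() + removeFinishedJobs * MS_PER_DAY)
  return now > expiry
}

const oldJob = { status: 'completed', dateFinished: new Date('2016-01-01T00:00:00Z') }
const today = new Date('2016-08-16T00:00:00Z') // 228 days later

console.log(shouldRemoveJob(oldJob, 180, today))   // true: past the 180 day expiry
console.log(shouldRemoveJob(oldJob, 365, today))   // false: not yet expired
console.log(shouldRemoveJob(oldJob, false, today)) // false: jobs are kept forever
```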
