Description
Update the CD crawler so that timeToLive for queue messages is configurable. The crawler currently uses the default of 7 days, which is too short: messages that are not processed within that window expire and are lost. This affects our internal harvester and the missing-license backfill process. If this can't be fixed, the DAG will have to reduce the number of packages it sends to the harvester, which will likely slow down processing. Processing currently averages only 125k per day, although it reaches closer to 500k on some days. The primary driver of the processing rate is the number of files being scanned by scancode. Some thought is needed on how best to keep the process running without missing packages that drop off the queue after expiring.
Rationale
The backfill DAG puts messages on the queue faster than the GH CD harvester can process them. If these messages drop off the queue unprocessed, the affected packages will appear to be missing their license, which may be incorrect.
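To make the expiry risk concrete, here is a rough back-of-the-envelope sketch using the rates quoted in the description. It assumes strict first-in, first-out processing at a constant daily rate, which is a simplification. The function names and the 1,000,000-message backlog are hypothetical and used only for illustration.

```typescript
// Largest backlog that can drain before the oldest message expires,
// assuming FIFO processing at a constant daily rate (a simplification).
function maxBacklogBeforeExpiry(processedPerDay: number, ttlDays: number): number {
  return processedPerDay * ttlDays;
}

// Days a newly enqueued message waits behind an existing backlog.
function daysToReach(backlog: number, processedPerDay: number): number {
  return backlog / processedPerDay;
}

// Rates taken from the description: 125k/day on average, about 500k/day on good days.
const atAverageRate = maxBacklogBeforeExpiry(125_000, 7);
const atPeakRate = maxBacklogBeforeExpiry(500_000, 7);

// Hypothetical backlog of 1,000,000 messages at the average rate.
const waitDays = daysToReach(1_000_000, 125_000);
```

Under these assumptions, a backlog above roughly 875k messages at the average rate means the newest messages would expire before they are reached, unless the TTL is raised or the DAG sends less.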
Definition of Done
- There is a new config setting for message expiration, and messages in the queue show the configured expiration.
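One possible shape for the new setting, as a minimal sketch rather than the crawler's actual code. It assumes the TTL is supplied as a string (for example from an environment variable) and expressed in seconds. The names `resolveMessageTtlSeconds` and `buildSendOptions` are hypothetical. The `messageTimeToLive` option name, and the use of `-1` to mean "never expires", are assumptions based on Azure Storage Queue conventions and should be checked against the queue client the crawler actually uses.

```typescript
// Default described in this issue: 7 days, expressed in seconds.
const DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60;

// Parse a raw config value into a TTL in seconds.
// Falls back to the default when the value is unset or blank.
// Accepts positive integers, or -1 (assumed to mean "never expires").
function resolveMessageTtlSeconds(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_TTL_SECONDS;
  }
  const ttl = Number(raw);
  if (!Number.isInteger(ttl) || (ttl <= 0 && ttl !== -1)) {
    throw new Error(`Invalid queue message TTL: ${raw}`);
  }
  return ttl;
}

// Build the options object passed along when a message is enqueued.
// The property name is an assumption about the queue client's API.
function buildSendOptions(raw: string | undefined): { messageTimeToLive: number } {
  return { messageTimeToLive: resolveMessageTtlSeconds(raw) };
}

// Example: a hypothetical 14-day TTL.
const options = buildSendOptions(String(14 * 24 * 60 * 60));
```

Failing fast on an invalid value keeps a typo in the config from silently falling back to the 7-day default, which is the behavior this issue is trying to get away from.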