Indexer Reliability and Performance Epic #250

We have a functional indexer codebase, but there are some reliability and performance aspects that need improvement:

1. Index-failure handling
  - After being consumed from RabbitMQ, an indexing task can fail for many reasons, which usually fall into one of two categories:
    - Retryable: the task failed because of a temporary issue (e.g. a network problem or a pod restart) and can be retried with a reasonable likelihood of success
    - Terminal: the task will never succeed, so it should not be retried. Instead, the operator should be notified of the failure, the reason for it, and the steps needed to resolve it
  - See:
    - #23
    - #240
2. Some indexing tasks take a long time, e.g.:
  - Indexing EML docs with lots of annotations, which requires the use of `OntologyModelService` (see #34)
  - Resource maps are not indexed until all referenced objects are indexed. An alternative approach has been proposed (see #101)
3. We should be able to assign priorities to index tasks (see #103). For example, big re-indexes should be low-priority background jobs, so that they do not disrupt people submitting new data packages or edits.
4. It would be good to have a periodic audit task that determines whether any objects are missing from Solr and reindexes them.
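
The retryable/terminal split in item 1 could be sketched roughly as follows. This is only an illustration in Python, not the indexer's actual code: the names `FailureKind`, `RETRYABLE_ERRORS`, `classify`, and `run_with_retries` are hypothetical, and the choice of which exception types count as temporary is an assumption.

```python
import time
from enum import Enum


class FailureKind(Enum):
    RETRYABLE = "retryable"
    TERMINAL = "terminal"


# Assumption: these exception types stand in for "temporary" faults
# such as network problems. A real indexer would need its own mapping.
RETRYABLE_ERRORS = (ConnectionError, TimeoutError)


def classify(error):
    """Return RETRYABLE for temporary faults and TERMINAL for everything else."""
    if isinstance(error, RETRYABLE_ERRORS):
        return FailureKind.RETRYABLE
    return FailureKind.TERMINAL


def run_with_retries(task, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run task(), retrying temporary failures with exponential backoff.

    Returns ("ok", result) on success, or ("failed", reason) so that the
    caller can notify the operator.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return ("ok", task())
        except Exception as error:
            if classify(error) is FailureKind.TERMINAL:
                # Retrying cannot help, so report immediately.
                return ("failed", f"terminal: {error}")
            if attempt == max_attempts:
                return ("failed", f"retries exhausted: {error}")
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. In a RabbitMQ-based setup the retry would more likely be a requeue than an in-process loop, but the classification step would be the same.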

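Items 3 and 4 fit together naturally: an audit can find objects missing from the index and resubmit them at low priority, so that new submissions are still served first. The sketch below illustrates that idea with an in-memory heap standing in for the message broker. All names (`IndexTaskQueue`, `find_missing`, `audit`, `HIGH`, `LOW`) are hypothetical, and the identifier lists are assumed to come from the object store and from Solr respectively.

```python
import heapq
import itertools

# Hypothetical priority levels; in this sketch a lower number is served first.
HIGH, LOW = 0, 10


class IndexTaskQueue:
    """In-memory stand-in for a priority-aware task queue."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so tasks of equal priority stay in FIFO order.
        self._counter = itertools.count()

    def submit(self, pid, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), pid))

    def next_task(self):
        """Return the next identifier to index, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


def find_missing(expected_pids, indexed_pids):
    """Return identifiers that should be indexed but are not, in sorted order."""
    return sorted(set(expected_pids) - set(indexed_pids))


def audit(queue, expected_pids, indexed_pids):
    """Resubmit every missing identifier as a low-priority background task."""
    missing = find_missing(expected_pids, indexed_pids)
    for pid in missing:
        queue.submit(pid, LOW)
    return missing
```

For example, if the audit queues `"a"` and `"c"` and a user then submits `"new"` with `HIGH` priority, `next_task()` returns `"new"` before either audit task.
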
Already Done:
- Metacat resubmits index tasks that were not successfully submitted to RabbitMQ ([Metacat Issue #1603](https://github.com/NCEAS/metacat/issues/1603))
