[Metadata Correction]: Heuristics to improve EXTRACT taxonomic extractions #38

@gtsueng

Correction Scope

most or all records - e.g. it seems to be wrong for most or all of the records.

Metadata Record Identifier

No response

metadata field

species

Other metadata field

infectiousAgent

Term in Question

vectors and more

Error Description

EXTRACT is erroneously identifying 'vectors' and 'collections' as specific NCBI Taxonomy IDs.

Proposed Correction Type

remove value

Proposed Correction

To address this, please add a filtration step after Text2Term to ignore/drop an EXTRACT term if it is a child of:

  • NCBITAXON: 2787854 - "other entries"
  • NCBITAXON: 2787823 - "unclassified entries", EXCEPT if it is equal to, or a child of, NCBITAXON: 408169 - "metagenomes"
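
For illustration, the filter above could be sketched roughly as follows. This is a minimal sketch and not the pipeline's actual code. It assumes the EXTRACT/Text2Term output has already been reduced to integer NCBI Taxonomy IDs and that a child-to-parent lookup (for example, one built from the NCBI taxonomy dump) is available. The names `PARENTS`, `lineage`, `should_drop`, and `filter_terms` are hypothetical, and the `900000x` IDs are made-up placeholders, not real taxa. It also assumes "child of" means any descendant, and that the two top-level nodes themselves are not dropped.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Taxonomy IDs named in this issue.
OTHER_ENTRIES = 2787854         # "other entries"
UNCLASSIFIED_ENTRIES = 2787823  # "unclassified entries"
METAGENOMES = 408169            # "metagenomes"

# Hypothetical child -> parent map for demonstration only.
# The 900000x IDs are placeholders, NOT real NCBI Taxonomy IDs.
PARENTS: Dict[int, Optional[int]] = {
    1: None,                    # root
    OTHER_ENTRIES: 1,
    UNCLASSIFIED_ENTRIES: 1,
    9000001: OTHER_ENTRIES,     # e.g. a "vectors"-like node
    9000002: 9000001,           # descendant of the node above
    9000003: UNCLASSIFIED_ENTRIES,
    9000004: 9000003,           # unclassified, not a metagenome
    METAGENOMES: 9000003,
    9000005: METAGENOMES,       # a specific metagenome
    9000006: 1,                 # an ordinary organism
}


def lineage(taxid: int, parents: Dict[int, Optional[int]]) -> Iterator[int]:
    """Yield taxid, then each ancestor, stopping at the root or on a cycle."""
    seen = set()
    current: Optional[int] = taxid
    while current is not None and current not in seen:
        seen.add(current)
        yield current
        current = parents.get(current)


def should_drop(taxid: int, parents: Dict[int, Optional[int]]) -> bool:
    """Return True if the term falls under one of the excluded branches."""
    chain = list(lineage(taxid, parents))
    ancestors = chain[1:]  # strict ancestors: the node itself is excluded
    if OTHER_ENTRIES in ancestors:
        return True
    if UNCLASSIFIED_ENTRIES in ancestors:
        # Exception: keep "metagenomes" itself and anything below it.
        return METAGENOMES not in chain
    return False


def filter_terms(taxids: Iterable[int],
                 parents: Dict[int, Optional[int]]) -> List[int]:
    """Keep only the terms that should not be dropped, preserving order."""
    return [t for t in taxids if not should_drop(t, parents)]


kept = filter_terms([9000002, 9000004, METAGENOMES, 9000005, 9000006], PARENTS)
print(kept)
```

With the placeholder map above, the two terms under "other entries" and non-metagenome "unclassified entries" are removed, while "metagenomes", its child, and the ordinary organism are kept. IDs missing from the lookup are kept by default in this sketch; whether that is the desired behavior is an open choice.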

Evidence or Reasoning for the Correction

No response

Additional References

No response

Urgency

  • This is an urgent correction
