
ingest: Remove strain name extraction #48

@victorlin


scripts/extract_from_strain.py extracts information from strain names. With the current data, it yields a more accurate collection year for 35 samples. The full output can be found in any recent run of Ingest to phylogenetic.

'PP_000N5LB': date '2015-XX-XX' → '2001-XX-XX'
'PP_000N73A': date '2015-XX-XX' → '2001-XX-XX'
'PP_000N748': date '2015-XX-XX' → '2001-XX-XX'
…
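
For readers unfamiliar with this kind of correction, here is a minimal sketch of how a collection year might be recovered from a strain name. It is not the code in scripts/extract_from_strain.py, whose logic is not shown in this issue. The regex, the function names `year_from_strain` and `corrected_date`, and the example strain name are all hypothetical, and the sketch assumes the strain name embeds a standalone four-digit year.

```python
import re
from typing import Optional

# Assumption: the strain name contains a standalone four-digit year (1900-2099).
# The real naming convention may differ; this pattern is illustrative only.
YEAR_PATTERN = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def year_from_strain(strain: str) -> Optional[str]:
    """Return the last four-digit year found in the strain name, or None."""
    matches = YEAR_PATTERN.findall(strain)
    return matches[-1] if matches else None


def corrected_date(date: str, strain: str) -> str:
    """Override the year of a year-only date (YYYY-XX-XX) when the strain name disagrees.

    Dates with a known month or day are returned unchanged.
    """
    year = year_from_strain(strain)
    if year is None or not date.endswith("-XX-XX") or date.startswith(year):
        return date
    return f"{year}-XX-XX"


# Hypothetical strain name, chosen to mirror the date change shown above.
print(corrected_date("2015-XX-XX", "Hypothetical/Sample-1/2001"))  # 2001-XX-XX
```

Once the upstream dates are corrected, a function like `corrected_date` would return every date unchanged, which is the condition under which the extraction step becomes redundant.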

All 35 collection date inaccuracies have already been reported in pathoplexus/curation_reports#7. Once Pathoplexus is updated with the corrected dates, it should be safe to remove this script and simplify the workflow.


Labels: blocked (Dependent on external development)
