
Y26-022 - Sample Manifest country of origin data #5482

@andrewsparkes

Description


Describe the Housekeeping
Plate manifests uploaded before validation of country of origin values was added may contain country values that do not exactly match a valid country in the EBI list (the match is case-sensitive). For those samples, accessioning will fail.

This story is to potentially fix that metadata in the sample_metadata table and re-accession the samples to update EBI.
At a minimum, fix the country names that have incorrect case,
e.g. TANZANIA should be Tanzania.
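A minimal sketch of the case-only fix, written in Python purely for illustration. It assumes the valid EBI country names are available as a list. `VALID_COUNTRIES` is a hypothetical four-entry stand-in, not the real EBI list. A new value is proposed only when a case-insensitive match exists, and leading or trailing whitespace is also ignored, which is an extra assumption.

```python
from typing import Optional

# Hypothetical stand-in for the EBI country list; load the real list in practice.
VALID_COUNTRIES = ["Tanzania", "United Kingdom", "Kenya", "Uganda"]

# Lower-cased name -> canonical spelling, for case-insensitive lookup.
CANONICAL_BY_LOWER = {name.lower(): name for name in VALID_COUNTRIES}


def fix_case(value: str) -> Optional[str]:
    """Return the corrected value for a case-only mismatch, else None.

    None means either the value already matches exactly (nothing to change)
    or it matches no valid country even ignoring case (not a case-only fix).
    """
    # Assumption: surrounding whitespace is also safe to drop.
    canonical = CANONICAL_BY_LOWER.get(value.strip().lower())
    if canonical is None or canonical == value:
        return None
    return canonical


for raw in ["TANZANIA", "Tanzania", "united kingdom", "U.K."]:
    print(f"{raw!r} -> {fix_case(raw)!r}")
```

Values such as U.K. fall outside this minimum fix because no change of case turns them into a valid name.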

Potentially also fix those with spelling variations or that otherwise do not match the EBI list, e.g. UK variations include:
UK, U.K, U.K., United Kingdom, United Kindom, etc.
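Spelling variants need an explicit alias table because no casing rule turns U.K. into United Kingdom. The sketch below is illustrative only. `ALIASES`, `normalize` and `suggest_country` are hypothetical names, `VALID_COUNTRIES` is a hypothetical subset of the EBI list, and dropping full stops and collapsing whitespace before the lookup is an assumption about which differences are safe to ignore. Values that still find no match return None, leaving them for manual review.

```python
import re
from typing import Optional

# Hypothetical stand-in for the EBI country list.
VALID_COUNTRIES = ["Tanzania", "United Kingdom", "Kenya", "Uganda"]

# Hypothetical alias table: normalized variant -> valid country name.
# Keys must already be in normalize() form.
ALIASES = {
    "uk": "United Kingdom",             # covers UK, U.K and U.K.
    "united kindom": "United Kingdom",  # misspelling listed in this issue
}


def normalize(value: str) -> str:
    """Drop full stops, collapse runs of whitespace, and lower-case."""
    return re.sub(r"\s+", " ", value.replace(".", "")).strip().lower()


# Valid names first, then aliases, all keyed by normalized form.
LOOKUP = {normalize(name): name for name in VALID_COUNTRIES}
LOOKUP.update(ALIASES)


def suggest_country(value: str) -> Optional[str]:
    """Return the valid country a raw value most likely means, or None."""
    return LOOKUP.get(normalize(value))


for raw in ["UK", "U.K", "U.K.", "United Kingdom", "United Kindom", "TANZANIA"]:
    print(f"{raw!r} -> {suggest_country(raw)!r}")
```

Keeping the alias table explicit means every non-case correction is a reviewed decision rather than a fuzzy guess.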

Potentially also change any non-country values to 'Unknown', although this is riskier as we don't understand why those values are stored in that field.

For any sample_metadata rows changed, whether or not the samples already have an accession number, we should also trigger an accessioning attempt with EBI to create or update the sample there, so that the two stay in sync.

Blocking issues
None.

Additional context
Related validation story: #5218

MySQL query (for the Training environment) used to list case-sensitive differences in country values (and all the nonsense values):

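```sql
-- List every distinct country_of_origin value, compared case-sensitively:
-- the binary collation keeps 'TANZANIA' and 'Tanzania' as separate rows.
SELECT DISTINCT
  (CAST(country_of_origin AS CHAR CHARACTER SET utf8) COLLATE utf8_bin) AS Country_of_origin
FROM sequencescape_snapshot.sample_metadata sm
ORDER BY Country_of_origin ASC;
```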

Run against the 10 million rows of sample_metadata in Training, this query returns 651 distinct values, far more than the number of countries in the world.
There are many nonsense or non-country values, many spelling variations, and some case-sensitive variations.
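To size the work before changing anything, the distinct values returned by the query could be split into three buckets: exact matches, case-only variants covered by the minimum fix, and everything else (spelling variants and non-country values) needing review. This is a sketch under the assumption that the query output is available as a list of strings; `classify` is a hypothetical helper and `VALID_COUNTRIES` is a hypothetical subset standing in for the EBI list.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for the EBI country list.
VALID_COUNTRIES = ["Tanzania", "United Kingdom", "Kenya", "Uganda"]


def classify(distinct_values: Iterable[str]) -> Dict[str, List[str]]:
    """Split distinct country_of_origin values into review buckets."""
    valid = set(VALID_COUNTRIES)
    valid_lower = {name.lower() for name in VALID_COUNTRIES}
    buckets: Dict[str, List[str]] = {"exact": [], "case_only": [], "no_match": []}
    for value in distinct_values:
        if value in valid:
            buckets["exact"].append(value)       # already accepted by EBI
        elif value.lower() in valid_lower:
            buckets["case_only"].append(value)   # minimum fix: recase
        else:
            buckets["no_match"].append(value)    # spelling variant or non-country
    return buckets


# Sample input uses values mentioned in this issue.
sample = ["Tanzania", "TANZANIA", "UK", "U.K.", "United Kindom", "United Kingdom"]
print(classify(sample))
```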

Metadata


Assignees

No one assigned

    Labels

    Size: Medium - medium effort & risk

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
