DPL-471-1: malformed root_sample_ids that have duplicates in MLWH, and WERE picked #539

@Jonnie-Bevan

Description

User Story
Part of the wider DPL-471 issue, which was in turn spawned from DPL-048. Related stories are DPL-048-2, DPL-048-3 and DPL-048-4.

This story concerns the 94 malformed root_sample_ids in the MLWH lighthouse_sample table (and MongoDB) that have duplicate samples carrying the correct root_sample_id. They came from the MK lighthouse lab in August 2021 and have an extra substring (something like '_RNA123456789') appended to the end of the correct ID.
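To make the malformation concrete, here is a minimal Python sketch of how such IDs might be detected and split into the candidate correct ID and the extra substring. The exact suffix shape is an assumption: the story only says "something like '_RNA123456789'", so the pattern below (`_RNA` followed by digits at the end of the string) and the function name `split_malformed` are illustrative and should be checked against the real data.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: the extra substring is "_RNA" followed by digits at the very end.
# The story only gives "something like '_RNA123456789'", so adjust as needed.
SUFFIX_RE = re.compile(r"_RNA\d+$")


def split_malformed(root_sample_id: str) -> Tuple[str, Optional[str]]:
    """Return (candidate_correct_id, suffix), or (id, None) if no suffix is found."""
    match = SUFFIX_RE.search(root_sample_id)
    if match is None:
        return root_sample_id, None
    return root_sample_id[: match.start()], match.group(0)
```

For example, `split_malformed("ABC123_RNA123456789")` separates the hypothetical ID `ABC123` from its suffix, while an ID without the suffix is returned unchanged with `None`.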

The samples were picked at some point and are therefore found in SequenceScape and Event Warehouse as well as in the MLWH sample table. They therefore need to be addressed in all 5 places (MLWH x2, Mongo, SS, EW). They are also in the iq_seq_flowcell table, meaning they have been picked/sequenced, and we need to investigate how far they have gone. Indeed, all 94 samples show up twice in the iq_seq_flowcell table, so they have been picked/sequenced twice.

Fix
The main issue is that, because these are duplicated, we cannot simply correct the root_sample_id in the databases. The root_sample_id/plate_barcode/coordinate combination must be unique, and the corrected IDs would break this uniqueness. We also can't really delete the rows, since these samples were used and have a paper trail (picking -> sequencing etc.) that shouldn't be broken.
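The uniqueness problem can be illustrated with a small Python sketch that takes rows exported as (root_sample_id, plate_barcode, coordinate) tuples and reports which keys would collide if the suffix were simply stripped. This runs on in-memory data only and is not a query against the real tables. The suffix pattern and the function name `colliding_keys` are assumptions made for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: the extra substring looks like "_RNA" + digits at the end of the ID.
SUFFIX_RE = re.compile(r"_RNA\d+$")


def colliding_keys(rows):
    """Return the (root_sample_id, plate_barcode, coordinate) keys that would
    appear more than once if every malformed ID had its suffix stripped.

    rows: iterable of (root_sample_id, plate_barcode, coordinate) tuples.
    """
    fixed = Counter(
        (SUFFIX_RE.sub("", rid), barcode, coord) for rid, barcode, coord in rows
    )
    return sorted(key for key, count in fixed.items() if count > 1)
```

Any key returned here is a row where correcting the ID in place would violate the unique combination, which is why those rows need a different treatment (for example a flag) rather than a straight update.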

For SequenceScape and the MLWH sample table, we can add a flag in the description or comments to show that the sample is a duplicate. For the MLWH lighthouse_sample table and MongoDB, we may not be able to do this, and the data might have to be left as is (not ideal) unless some other option can be found. This could be difficult/complicated, but the good news is that it is just 94 samples.

Who are the primary contacts for this story
Jonnie B
Alan K

Acceptance criteria

  • data assessed to work out if it has been sequenced, and NPG alerted if need be
  • data are assessed and dealt with in MLWH lighthouse_sample table
  • data are assessed and dealt with in MongoDB sample table
  • data are assessed and dealt with in SequenceScape (propagates to MLWH sample table)
  • data are assessed and dealt with in Events Warehouse
