Status of Database Creation Pipeline

This page documents the progress towards getting a working pipeline that processes all of our messy experimental PAM data into a cleanly organized data table.

Each row of the data table represents one well in a single experiment (i.e. a well from a particular plate number and measurement method). Together, the well ID (row x column), plate number (1-20 or 99) and measurement ID (M1-M6) form a unique identifier.
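A minimal sketch of the composite key. The field names (`well_id`, `plate`, `measurement`) and the well ID format (e.g. "A01") are hypothetical; the real pipeline's column names may differ. Making the key a frozen dataclass keeps it hashable, so it can be used directly in sets and dicts when checking uniqueness.

```python
from dataclasses import dataclass

# Ranges taken from this page; field names are hypothetical.
VALID_PLATES = set(range(1, 21)) | {99}
VALID_MEASUREMENTS = {f"M{i}" for i in range(1, 7)}


@dataclass(frozen=True)
class WellKey:
    """Composite identifier for one row: a well in a single experiment."""

    well_id: str      # row x column, e.g. "A01" (format assumed)
    plate: int        # 1-20 or 99
    measurement: str  # "M1" .. "M6"

    def __post_init__(self) -> None:
        # Reject keys outside the documented ranges.
        if self.plate not in VALID_PLATES:
            raise ValueError(f"unexpected plate number: {self.plate}")
        if self.measurement not in VALID_MEASUREMENTS:
            raise ValueError(f"unexpected measurement id: {self.measurement}")


key = WellKey("A01", 99, "M3")
print(key)  # WellKey(well_id='A01', plate=99, measurement='M3')
```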

Targets

No duplicated rows
100% of rows should have exactly 84 or 164 frames
100% of Y(II) values should be floats (not NaN)
1. Counted over the individual Y(II) values, i.e. each row has 41 or 81 values, and a fraction of valid values is reported for each row
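The per-row targets can be sketched as a single check function. This assumes the two targets pair up in order (84 frames with 41 Y(II) values, 164 frames with 81); the argument names are hypothetical, not the pipeline's actual columns.

```python
import math

# Assumed mapping from frame count to expected number of Y(II) values,
# pairing the two targets in order (84 -> 41, 164 -> 81).
EXPECTED_YII_COUNT = {84: 41, 164: 81}


def check_row(num_frames: int, yii_values: list) -> dict:
    """Check one row against the targets (argument names are hypothetical)."""
    expected = EXPECTED_YII_COUNT.get(num_frames)
    # A value is valid only if it is a float and not NaN.
    n_valid = sum(
        1 for v in yii_values if isinstance(v, float) and not math.isnan(v)
    )
    return {
        "frames_ok": expected is not None,
        "count_ok": expected is not None and len(yii_values) == expected,
        # Per-row fraction of valid Y(II) values (sub-target 1).
        "valid_fraction": n_valid / len(yii_values) if yii_values else 0.0,
    }


result = check_row(84, [0.5] * 40 + [float("nan")])
```

Reporting the fraction over the row's own values keeps it in [0, 1] even when the value count is wrong; the `count_ok` flag catches that case separately.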

3/18/24

Duplicated rows

Out of 33,792 unique wells:
Fraction duplicated: 0.0682 (6.82%)
Number duplicated: 2,304

Note: the duplicated rows might be genuine duplicates from plate 99. If so, the fix belongs on the Google Drive side: the duplicate files need to be removed.
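A sketch of the duplicate count over (well_id, plate, measurement) keys. The page does not say which counting convention produced the numbers above; this version assumes every copy of a repeated key counts as duplicated (like pandas `duplicated(keep=False)`).

```python
from collections import Counter


def duplicate_stats(keys: list) -> tuple:
    """Return (number of duplicated rows, fraction duplicated).

    Assumption: a row counts as duplicated when any other row shares its
    (well_id, plate, measurement) key, so every copy is counted.
    """
    counts = Counter(keys)
    n_dup = sum(c for c in counts.values() if c > 1)
    return n_dup, (n_dup / len(keys) if keys else 0.0)


rows = [("A01", 99, "M1"), ("A01", 99, "M1"), ("A02", 1, "M1")]
print(duplicate_stats(rows))  # (2, 0.6666666666666666)
```

Counting only the extra copies would instead sum `c - 1` for each repeated key.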

Correct number of frames

Out of 33,792 unique wells:
Fraction with too many frames: 0.0341 (3.41%)
Number with too many frames: 1,152

Plates with extra frames should be discarded.
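One way to discard whole plates rather than individual wells, sketched under assumptions: rows are dicts with hypothetical keys `plate`, `measurement` and `num_frames`, a plate is identified by (plate, measurement), and any frame count outside {84, 164} marks the plate as bad.

```python
ALLOWED_FRAMES = {84, 164}


def drop_plates_with_bad_frames(rows: list) -> list:
    """Drop every row of any plate where some well has a bad frame count.

    Row keys are hypothetical; a plate is identified by (plate, measurement).
    """
    bad_plates = {
        (r["plate"], r["measurement"])
        for r in rows
        if r["num_frames"] not in ALLOWED_FRAMES
    }
    return [r for r in rows if (r["plate"], r["measurement"]) not in bad_plates]
```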

Valid Y(II) values

Out of 32,640 unique wells with the correct number of frames:
Fraction of NaN time series: 0.0825 (8.25%)
Number of NaN time series: 2,693

The light/dark frame pairing might be off; all of these NaN time series are due to incorrect Fv/Fm values.
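The NaN statistics above can be sketched as follows. The page does not define "NaN time series"; this assumes it means a Y(II) series with at least one NaN, matching the "100% of Y(II) values are not NaN" target. Row keys are hypothetical.

```python
import math


def nan_series_stats(rows: list) -> tuple:
    """Among rows with 84 or 164 frames, count Y(II) series containing a NaN.

    Returns (eligible rows, NaN series, fraction). Assumes each row is a
    dict with hypothetical keys "num_frames" and "yii" (a list of floats).
    """
    eligible = [r for r in rows if r["num_frames"] in (84, 164)]
    n_nan = sum(1 for r in eligible if any(math.isnan(v) for v in r["yii"]))
    return len(eligible), n_nan, (n_nan / len(eligible) if eligible else 0.0)
```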

Summary

Out of 33,792 wells with data collected, 29,947 (88.6%) have the full set of valid data required by these checks.
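The summary figure appears consistent with subtracting the two failed checks in sequence (frame count, then NaN series); the duplicated-row count does not seem to enter into it. A quick arithmetic check using the reported numbers:

```python
# Reported counts from the 3/18/24 checks.
total_wells = 33792
too_many_frames = 1152
nan_series = 2693

correct_frames = total_wells - too_many_frames  # 32640, as reported
fully_valid = correct_frames - nan_series       # 29947, as reported
print(fully_valid, f"{fully_valid / total_wells:.1%}")  # 29947 88.6%
```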

Other notes: make sure that the Fv/Fm of the 3 WTs differs from the previous day.
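A sketch of that WT check, presumably meant to catch data copied from one day to the next. The input structure is hypothetical: a dict mapping a sortable date string to a tuple of the three WT Fv/Fm values.

```python
def unchanged_wt_days(wt_fvfm_by_day: dict) -> list:
    """Return days whose 3 WT Fv/Fm values equal the previous day's values.

    `wt_fvfm_by_day` maps a sortable date string (e.g. "2024-03-18") to a
    tuple of three WT Fv/Fm values; this structure is hypothetical.
    """
    days = sorted(wt_fvfm_by_day)
    return [
        day
        for prev, day in zip(days, days[1:])
        if wt_fvfm_by_day[day] == wt_fvfm_by_day[prev]
    ]
```

Exact equality is deliberate here: independently measured values should essentially never match to the last digit, whereas a copied file would.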

8th April 2024 update from Murray

Commit hash to reproduce: 670cee00a37c0d345d8bfba3d167417fbf3a6fee

I started from the main branch and made some changes so that database creation could be re-run locally on my laptop. The full set of TIF and CSV data from the Google Drive came to 6.7 GB.
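To compare a local copy against that total, a sketch that sums file sizes by extension. The directory name is hypothetical, and whether the 6.7 GB figure is decimal gigabytes or GiB is not stated.

```python
from pathlib import Path


def data_size_bytes(root, suffixes=(".tif", ".csv")) -> int:
    """Sum the sizes of all files under `root` whose suffix is in `suffixes`."""
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Hypothetical local download location:
# print(f"{data_size_bytes('data') / 1e9:.1f} GB")
```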

Some summary data:

Number of mutant_ID values: 6,461
Number of Y(II) time series that are not all NaN: 33,395
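Both summary numbers can be sketched in one pass. This assumes rows are dicts with the `mutant_ID` column named on this page plus a hypothetical `yii` list of floats, and that "number of mutant_ID values" means distinct values. Note that "not all NaN" is a looser condition than the "contains no NaN" check used for the 3/18/24 numbers.

```python
import math


def summary_counts(rows: list) -> tuple:
    """Return (distinct mutant_ID values, Y(II) series that are not all NaN).

    Assumes each row is a dict with "mutant_ID" and a hypothetical "yii"
    list of floats; an empty series counts as all NaN.
    """
    mutant_ids = {r["mutant_ID"] for r in rows}
    not_all_nan = sum(
        1 for r in rows if any(not math.isnan(v) for v in r["yii"])
    )
    return len(mutant_ids), not_all_nan
```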
