Add contamination and relatedness QC checks #13

Open
iamh2o wants to merge 4 commits into codex/refactor-script-directives-to-shell from codex/add-contamination-and-relatedness-checks-for-rnaseq

iamh2o commented Sep 18, 2025

Summary

  • add relatedness configuration options and validation
  • generate new Snakemake rules to assess read pair contamination and patient relatedness
  • implement a reusable analysis script that derives correlation-based flags from STAR counts
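
The analysis script itself is not shown in this conversation, so the following is only a minimal sketch of what "correlation-based flags" could look like, not the PR's implementation. It assumes a genes-by-samples count table (for example, assembled from STAR gene counts) and a sample-to-patient mapping. The function name `flag_pairs`, the output column names, and both thresholds are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_pairs(counts, patients, same_min=0.9, diff_max=0.98):
    """Flag sample pairs whose correlation disagrees with their patient labels.

    counts:   genes x samples DataFrame of raw counts (assumed layout).
    patients: dict mapping sample name -> patient ID.
    same_min / diff_max: hypothetical thresholds, not taken from the PR.
    """
    # Log-transform so a handful of highly expressed genes do not dominate.
    log_counts = np.log1p(counts.astype(float))
    corr = log_counts.corr(method="pearson")

    samples = list(corr.columns)
    rows = []
    for i, a in enumerate(samples):
        for b in samples[i + 1:]:
            r = float(corr.loc[a, b])
            same_patient = patients[a] == patients[b]
            if same_patient and r < same_min:
                flag = "low_correlation_same_patient"
            elif not same_patient and r > diff_max:
                flag = "high_correlation_different_patients"
            else:
                flag = "ok"
            rows.append({"sample_a": a, "sample_b": b, "r": r, "flag": flag})
    return pd.DataFrame(rows)
```

Suitable thresholds depend on tissue type, library preparation, and sequencing depth, so they would most likely belong with the relatedness options in the workflow config rather than being hard-coded.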

Testing

  • python -m compileall workflow/scripts/relatedness_analysis.py

https://chatgpt.com/codex/tasks/task_e_68cc900403e48331b0692e842405ab9d

chatgpt-codex-connector bot left a comment

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines +111 to +116
samples_df = pd.read_csv(samples_path, sep="\t", dtype=str)

if "patient" not in samples_df.columns:
    raise ValueError(
        "The samples sheet must contain a 'patient' column to assess relatedness."
    )

[P1] Avoid hard requirement for patient column in samples sheet

The new attach_patient_metadata function aborts when the samples sheet lacks a patient column (raise ValueError(...)). The schema in workflow/schemas/samples.schema.yaml still treats patient as optional, and the function already implements a sensible fallback by logging a warning and defaulting missing patients to the sample name. With the default config now activating relatedness checks, existing projects that omit the optional patient column will fail before reaching the fallback logic. Consider removing the hard check and letting the existing warning/default behaviour handle this case so backward‑compatible sample sheets continue to run.
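
One way the suggested change could look is sketched below. This is an illustration under stated assumptions rather than the PR's actual code. The real `attach_patient_metadata` reads the sheet from `samples_path`, whereas this version takes an already-loaded DataFrame so that it stays self-contained. The `sample` column name and the warning text are also assumptions.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def attach_patient_metadata(samples_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the sheet with a usable 'patient' value in every row.

    Hypothetical signature: takes a loaded samples sheet with a 'sample'
    column (assumed name) and an optional 'patient' column.
    """
    out = samples_df.copy()

    # No hard failure: an absent column is handled like missing values.
    if "patient" not in out.columns:
        logger.warning(
            "Samples sheet has no 'patient' column; using sample names as patient IDs."
        )
        out["patient"] = out["sample"]
        return out

    # Treat NaN and blank strings as missing and fall back per row.
    missing = out["patient"].fillna("").str.strip().eq("")
    if missing.any():
        logger.warning(
            "%d sample(s) lack a patient ID; defaulting to the sample name.",
            int(missing.sum()),
        )
        out.loc[missing, "patient"] = out.loc[missing, "sample"]
    return out
```

With this shape, a sheet that omits the optional column still runs. Every sample becomes its own patient, so same-patient comparisons are simply empty instead of the workflow aborting.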

AWS ParallelCluster user added 2 commits September 18, 2025 23:52