@@ -54,5 +54,76 @@ export TAXONOMY_URI="mysql+pymysql://user:pass@host:port/ncbi_taxonomy"
 
 **Usage:**
 ```bash
+# Basic usage
 python beta_patcher.py patches.csv --jira-ticket EBD-1111 --output-dir ./patches/
-```
+
+# With team filter (only applies patches where all affected genomes belong to the specified team)
+python beta_patcher.py patches.csv --jira-ticket EBD-1111 --team-filter Genebuild
+```
+
+### Finding genome_uuid for organism/assembly patches
+
+When patching the `organism` or `assembly` tables, you must provide a `genome_uuid`. Use the following queries to find genome UUIDs:
+
+**Find all genomes for a specific assembly (by accession):**
+```sql
+SELECT DISTINCT
+    genome.genome_uuid,
+    genome.production_name,
+    assembly.accession,
+    assembly.name AS assembly_name,
+    (SELECT da.value
+     FROM genome_dataset gd
+     JOIN dataset d ON gd.dataset_id = d.dataset_id AND d.name = 'genebuild'
+     JOIN dataset_attribute da ON d.dataset_id = da.dataset_id
+     JOIN attribute a ON da.attribute_id = a.attribute_id AND a.name = 'genebuild.team_responsible'
+     WHERE gd.genome_id = genome.genome_id
+     LIMIT 1) AS team_responsible
+FROM genome
+JOIN assembly ON genome.assembly_id = assembly.assembly_id
+WHERE assembly.accession = 'GCA_000001405.14'
+ORDER BY team_responsible, genome.production_name;
+```
+
+**Find all genomes for a specific organism (by biosample_id):**
+```sql
+SELECT DISTINCT
+    genome.genome_uuid,
+    genome.production_name,
+    organism.biosample_id,
+    organism.scientific_name,
+    organism.strain,
+    (SELECT da.value
+     FROM genome_dataset gd
+     JOIN dataset d ON gd.dataset_id = d.dataset_id AND d.name = 'genebuild'
+     JOIN dataset_attribute da ON d.dataset_id = da.dataset_id
+     JOIN attribute a ON da.attribute_id = a.attribute_id AND a.name = 'genebuild.team_responsible'
+     WHERE gd.genome_id = genome.genome_id
+     LIMIT 1) AS team_responsible
+FROM genome
+JOIN organism ON genome.organism_id = organism.organism_id
+WHERE organism.biosample_id = 'SAMN04851098'
+ORDER BY team_responsible, genome.production_name;
+```
+
+**Find genomes by organism scientific name:**
+```sql
+SELECT DISTINCT
+    genome.genome_uuid,
+    genome.production_name,
+    organism.scientific_name,
+    organism.strain,
+    (SELECT da.value
+     FROM genome_dataset gd
+     JOIN dataset d ON gd.dataset_id = d.dataset_id AND d.name = 'genebuild'
+     JOIN dataset_attribute da ON d.dataset_id = da.dataset_id
+     JOIN attribute a ON da.attribute_id = a.attribute_id AND a.name = 'genebuild.team_responsible'
+     WHERE gd.genome_id = genome.genome_id
+     LIMIT 1) AS team_responsible
+FROM genome
+JOIN organism ON genome.organism_id = organism.organism_id
+WHERE organism.scientific_name = 'Homo sapiens'
+ORDER BY team_responsible, genome.production_name;
+```
+
+Pick any one of the returned `genome_uuid` values to use in your CSV. The script automatically detects and warns about all other genomes that share that organism or assembly.
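
The two rules described above (the script warns about other genomes sharing an organism/assembly, and `--team-filter` applies a patch only when all affected genomes belong to the given team) can be illustrated with a small sketch. This is not taken from `beta_patcher.py`: the function names `other_genomes` and `passes_team_filter`, the row shape, and the placeholder UUIDs are hypothetical, and it assumes an exact, case-sensitive team match with a missing `team_responsible` treated as a non-match.

```python
from typing import Iterable, List, Mapping, Optional

# Hypothetical row shape: one dict per row returned by the queries above.
Row = Mapping[str, Optional[str]]


def other_genomes(rows: Iterable[Row], chosen_uuid: str) -> List[Optional[str]]:
    """Return the genome UUIDs that share the organism/assembly with the chosen one."""
    return [row["genome_uuid"] for row in rows if row["genome_uuid"] != chosen_uuid]


def passes_team_filter(rows: Iterable[Row], team: str) -> bool:
    """Return True only if every affected genome is owned by `team`.

    Assumption: an empty result set and a NULL team_responsible both fail the filter.
    """
    rows = list(rows)
    if not rows:
        return False
    return all(row.get("team_responsible") == team for row in rows)


# Placeholder data standing in for query results (UUIDs are made up).
rows = [
    {"genome_uuid": "uuid-a", "team_responsible": "Genebuild"},
    {"genome_uuid": "uuid-b", "team_responsible": "Genebuild"},
    {"genome_uuid": "uuid-c", "team_responsible": None},
]

shared = other_genomes(rows, "uuid-a")            # genomes the script would warn about
allowed = passes_team_filter(rows, "Genebuild")   # one genome has no team, so the filter fails
```

Under these assumptions, checking the `team_responsible` column in the query output before running with `--team-filter` shows in advance whether a patch is likely to be skipped.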