@@ -22,6 +22,15 @@ deployments:
2222 job_variables : {}
2323 schedules : []
2424
25+ - name : analysis_amplicon_study_deployment
26+ description : |-
27+ Get a study from ENA and import it into MGnify.
28+ Kick off amplicon-v6 pipeline.
29+ :param study_accession: Study accession e.g. PRJxxxxxx
30+ entrypoint : workflows/flows/analysis_amplicon_study.py:analysis_amplicon_study
31+ work_pool :
32+ name : slurm
33+
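For context, the `study_accession` parameter of the study-analysis deployments expects an ENA project accession (e.g. `PRJEB12345`). A minimal validation sketch; the helper name and the restriction to the three common archive prefixes (EB for ENA, NA for NCBI BioProject, DB for DDBJ) are assumptions for illustration, not part of the flow code:

```python
import re

# ENA project accessions look like PRJEB12345 / PRJNA98765 / PRJDB4321:
# "PRJ" + archive prefix (EB, NA, or DB) + digits.
ENA_STUDY_RE = re.compile(r"^PRJ(EB|NA|DB)\d+$")

def is_valid_study_accession(study_accession: str) -> bool:
    """Return True if the accession matches the ENA project pattern."""
    return bool(ENA_STUDY_RE.match(study_accession))
```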
2534- name : analysis_assembly_study_deployment
2635 description : |-
2736 Get a study from ENA (or MGnify), and run assembly analysis on the assemblies of the study.
@@ -177,3 +186,107 @@ deployments:
177186 entrypoint : workflows/flows/nf_traces/flows.py:nextflow_trace_etl_flow
178187 work_pool :
179188 name : slurm
189+
190+ - name : analysis_rawreads_study_deployment
191+ description : |-
192+ Get a study from ENA and import it into MGnify.
193+ Kick off raw-reads-v6 pipeline.
194+ :param study_accession: Study accession e.g. PRJxxxxxx
195+ entrypoint : workflows/flows/analysis_rawreads_study.py:analysis_rawreads_study
196+ work_pool :
197+ name : slurm
198+
199+ - name : import_genomes_flow_deployment
200+ description : |-
201+ Imports genomes from a catalogue directory into the database.
202+
203+ This flow processes genome results from a catalogue directory, performs sanity checks,
204+ and imports genome data including annotations, files, and metadata.
205+
206+ :param results_directory: Path to the catalogue directory containing genome results
207+ :param catalogue_name: Name of the genome catalogue
208+ :param catalogue_version: Version of the genome catalogue
209+ :param gold_biome: Biome classification for the catalogue
210+ :param pipeline_version: Version of the pipeline used to generate the genomes
211+ :param catalogue_type: Type of catalogue (e.g., prokaryotes, eukaryotes)
212+ :param catalogue_biome_label: Optional label for the catalogue biome
213+ entrypoint : workflows/flows/import_genomes_flow.py:import_genomes_flow
214+ work_pool :
215+ name : slurm
216+
217+ - name : import_genome_assembly_links_flow_deployment
218+ description : |-
219+ Imports data from a TSV file into the GenomeAssemblyLink model.
220+
221+ This flow processes a TSV file containing genome assembly link information and
222+ imports it into the database, creating relationships between MAGs, genomes, and species representatives.
223+
224+ :param tsv_path: Path to the TSV file containing genome assembly link data
225+ entrypoint : workflows/flows/import_genome_assembly_links_flow.py:import_genome_assembly_links_flow
226+ work_pool :
227+ name : slurm
228+
229+ - name : import_additional_contained_genomes_flow_deployment
230+ description : |-
231+ Imports data from a large TSV file into the AdditionalContainedGenomes model.
232+
233+ The TSV must contain the following columns:
234+ - Run
235+ - Genome_Mgnify_accession
236+ - Containment
237+ - cANI
238+
239+ The flow reads the file in streaming chunks and performs batched DB operations.
240+
241+ :param csv_path: Path to the TSV file containing additional contained genomes data
242+ :param chunk_size: Size of chunks to read from the file (default: 50000)
243+ :param insert_batch_size: Size of batches for database insertion (default: 10000)
244+ entrypoint : workflows/flows/import_additional_contained_genomes_flow.py:import_additional_contained_genomes_flow
245+ work_pool :
246+ name : slurm
247+
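The flow above describes a stream-and-batch pattern: read the TSV in chunks rather than loading it whole, and hand fixed-size batches to the database layer. A simplified sketch of that pattern using only the standard library; the function name is hypothetical and `insert_batch` stands in for whatever bulk-insert helper (e.g. a `bulk_create` wrapper) the real flow uses:

```python
import csv
from itertools import islice
from typing import Callable, List

REQUIRED_COLUMNS = {"Run", "Genome_Mgnify_accession", "Containment", "cANI"}

def import_contained_genomes(
    tsv_path: str,
    insert_batch: Callable[[List[dict]], None],
    insert_batch_size: int = 10_000,
) -> int:
    """Stream a TSV and pass rows to `insert_batch` in fixed-size batches.

    Returns the total number of rows processed.
    """
    total = 0
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"TSV is missing columns: {sorted(missing)}")
        while True:
            # islice pulls at most insert_batch_size rows without
            # materialising the whole file in memory
            batch = list(islice(reader, insert_batch_size))
            if not batch:
                break
            insert_batch(batch)
            total += len(batch)
    return total
```

Keeping the read chunking and the insert batch size as separate knobs, as the deployment parameters do, lets memory use and transaction size be tuned independently.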
248+ - name : update_ena_accession_from_json_flow_deployment
249+ description : |-
250+ Traverse per-genome JSON files to update Genome.ena_genome_accession from the
251+ 'ncbi_genome_accession' value found in each file.
252+
253+ :param base_dir: Directory containing one subdirectory per genome accession, each with
254+ a JSON file named <accession>.json
255+ :param read_chunk_size: Django iterator chunk size when scanning genomes (default: 5000)
256+ :param update_batch_size: Number of rows to bulk update at once (default: 2000)
257+ :param catalogue_name: Optional; if provided, restrict processing to genomes whose
258+ catalogue has this exact name
259+ entrypoint : workflows/flows/update_ena_accession_from_json_flow.py:update_ena_accession_from_json_flow
260+ work_pool :
261+ name : slurm
262+
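The directory layout the flow above expects is one subdirectory per genome accession, each holding `<accession>.json` with an `ncbi_genome_accession` key. A sketch of the traversal step only (the function name is hypothetical; the Django bulk-update side is not reproduced):

```python
import json
from pathlib import Path
from typing import Dict

def collect_ena_accessions(base_dir: str) -> Dict[str, str]:
    """Map genome accession -> ncbi_genome_accession from per-genome JSON files.

    Subdirectory names are treated as genome accessions; entries without a
    matching JSON file or without the key are skipped silently.
    """
    updates: Dict[str, str] = {}
    for sub in Path(base_dir).iterdir():
        if not sub.is_dir():
            continue
        json_file = sub / f"{sub.name}.json"
        if not json_file.is_file():
            continue
        data = json.loads(json_file.read_text())
        accession = data.get("ncbi_genome_accession")
        if accession:
            updates[sub.name] = accession
    return updates
```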
263+ - name : upload_assembly_deployment
264+ description : |-
265+ This flow performs a sanity check and uploads an assembly for a specific run to ENA.
266+
267+ It is intended to be executed *per run* after the assembly flow. The assembly uploader
268+ is a separate Python library that prepares the upload files. The assembly submission
269+ via `webin-cli` is launched as a SLURM cluster job.
270+
271+ :param assembly_id: ID of the assembly to upload
272+ :param dry_run: If True, perform a dry run without actual upload (default: True)
273+ :param custom_upload_folder: Optional custom path for upload folder
274+ entrypoint : workflows/flows/upload_assembly.py:upload_assembly
275+ work_pool :
276+ name : slurm
277+
278+ - name : move_data_deployment
279+ description : |-
280+ Move files on the cluster filesystem.
281+
282+ This flow uses a SLURM job running on the datamover partition to move files
283+ between locations on the cluster filesystem.
284+
285+ :param source: fully qualified path of the source location (file or folder)
286+ :param target: fully qualified path of the target location (file or folder)
287+ :param move_command: tool command for the move. Default is `cp`, but could be `mv`, `rsync`, etc.
288+ :param make_target: mkdir the target location path before copying.
289+ :param kwargs: Other keywords to pass to run_cluster_job
290+ entrypoint : workflows.prefect_utils.datamovers:move_data
291+ work_pool :
292+ name : slurm
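The `move_data` flow composes a shell command from its `source`, `target`, `move_command`, and `make_target` parameters and hands it to a SLURM datamover job. A simplified local sketch of just the command construction; the function name is hypothetical, and `run_cluster_job` and the SLURM submission are deliberately not reproduced:

```python
import shlex

def build_move_command(
    source: str,
    target: str,
    move_command: str = "cp",
    make_target: bool = False,
) -> str:
    """Assemble the shell command a datamover job would execute (sketch)."""
    steps = []
    if make_target:
        # create the target directory tree before moving anything into it
        steps.append(f"mkdir -p {shlex.quote(target)}")
    steps.append(f"{move_command} {shlex.quote(source)} {shlex.quote(target)}")
    return " && ".join(steps)
```

Quoting the paths with `shlex.quote` keeps the command safe when locations contain spaces or shell metacharacters.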