Commit ed588a7

1.2.0 (#18)
* prep changelog
* Fix PR template
* WIP parallelise eager job submission
* Correct syntax error
* No printing to screen for arrays. fix whitespace
* fix number of jobs
* make executable
* Add array qsub command
* update .gitignore
* print qsub command before submission
* Fix job naming
* Initial commit of poseidon package creation
* Rscript to fill in janno and overwrite columns
* Bugfixes
* Add suffix option and correct output janno path
* move script
* Add janno recreation. Other minor changes
* Add pandora results to janno
* Add log info. New package creation completed.
* Minor changes. Add Library_Names column
* Add script to mirror Population and Sex from janno to fam/ind
* Update CHANGELOG.md
* Update package updating.
* Add debug option. Add AE version in poseidon pkgs
* Remove debug cause of clash. Error when update fails.
* Update CHANGELOG.md
* Only delete temp files if validation passed.
* Bugfix. Runs now updated only if a change in the data occurs.
* move update script to scripts/
* Server-side testing paths
* Add path to trident executable
* server-paths
* Bump version
* Add environment yml file
* Update CHANGELOG.md
* Update output folder to live
* increase resources for AE_spawner jobs
* More resource tweaking for array jobs
* Increase memory further
* Remove path from environment yml
* Bump version
* Match Run_ID, not Batch_ID
* Array log subdir
* Update CHANGELOG.md
* prep CHANGELOG.md
* 40G memory max for array job
* indentation fix
* correct column naming
* document changes
* correct Nr_libs in column selection
* correct paths
* Update .gitignore
* Optimisation. Version bump. Distinct iids used for joining.
* Bump version
* Update CHANGELOG.md
* bump version
* Add mention of memory changes
* Prep Changelog
* Apply stash
* non-local paths
* Unique lib names
* Keep both lowercase and uppercase version of pandora analysis IDs
* Add whitelist option to update only specific TSVs
* pre-release version bump.
* document whitelist option
* fix formatting
1 parent 11d1434 commit ed588a7

6 files changed: 43 additions, 12 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ test_data/
 .tmp/
 eager_inputs_old/
 eager_outputs_old/
-array_Logs/
+array_Logs/

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -3,6 +3,19 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.2.0] - 21/03/2023
+
+### `Added`
+- `prepare_eager_tsv.R`: Added `-w/--whitelist` option. A whitelist of Pandora Individual IDs can be provided. Only the TSVs of individuals in the whitelist will be updated.
+
+### `Fixed`
+- `update_poseidon_packages.sh`: `Library_Names` field now includes only unique library names.
+- `prepare_eager_tsv.R`: Camel_Case versions of Pandora Analysis IDs are no longer filtered out.
+
+### `Dependencies`
+
+### `Deprecated`
+
 ## [1.1.3] - 17/03/2023

 ### `Added`

README.md

Lines changed: 4 additions & 0 deletions
@@ -67,6 +67,10 @@ Options:
 Some tools used in nf-core/eager will strip everything after the first dot (.)
 from the name of the input file, which can cause naming conflicts in rare cases.

+-w WHITELIST, --whitelist=WHITELIST
+An optional file that includes the IDs of whitelisted individuals,
+one per line. Only the TSVs for these individuals will be updated.
+
 -o OUTDIR/, --outDir=OUTDIR/
 The desired output directory. Within this directory, one subdirectory will be
 created per analysis type, within that one subdirectory per individual ID,
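
The whitelist behaviour described in the help text above can be sketched in base R as follows. This is a minimal illustration, not the script's own code: the individual IDs, library IDs, and the toy `results` table are made up, and `readLines()` stands in for the script's `read_tsv()` call so that the example has no package dependencies.

```r
# Write a hypothetical whitelist file: one individual ID per line.
whitelist_fn <- tempfile(fileext = ".txt")
writeLines(c("ABC001", "ABC003"), whitelist_fn)

# Toy results table with one row per library (all IDs are invented).
results <- data.frame(
  individual.Full_Individual_Id = c("ABC001", "ABC002", "ABC003", "ABC003"),
  library.Full_Library_Id = c("ABC001.A0101", "ABC002.A0101",
                              "ABC003.A0101", "ABC003.A0102"),
  stringsAsFactors = FALSE
)

# Read the whitelist and keep only rows whose individual ID is listed in it.
whitelist <- readLines(whitelist_fn)
filtered <- results[results$individual.Full_Individual_Id %in% whitelist, ]

print(filtered)
```

Individuals absent from the whitelist (here `ABC002`) are dropped before any per-individual TSV would be written, so their existing TSVs are left untouched.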

scripts/fill_in_janno.R

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ updated_columns <- eager2poseidon::compile_eager_result_tables(
 ))) %>%
 ## Remove ss_suffix from library names, so they match Pandora Library IDs
 dplyr::mutate(
-Library_Names=gsub('_ss','',.data$Library_Names)
+Library_Names=gsub('_ss','',.data$Library_Names) %>% vctrs::vec_unique()
 ) %>%
 ## Keep distinct rows, now that Library_ID has been dropped
 dplyr::distinct() %>%
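
The effect of stripping the `_ss` suffix and then de-duplicating can be seen on a standalone vector. The sketch below uses base R `unique()` in place of `vctrs::vec_unique()`, and the library names are hypothetical; it only illustrates why de-duplication is needed once the suffix is gone, not how the change behaves inside `dplyr::mutate()`.

```r
# Hypothetical library names; two of them differ only by the "_ss" suffix.
library_names <- c("ABC001.A0101", "ABC001.A0101_ss", "ABC001.A0102_ss")

# Strip the suffix so the names match the plain library IDs.
stripped <- gsub("_ss", "", library_names, fixed = TRUE)

# After stripping, "ABC001.A0101" appears twice; keep each name once.
unique_names <- unique(stripped)

print(unique_names)
```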

scripts/prepare_eager_tsv.R

Lines changed: 22 additions & 8 deletions
@@ -46,20 +46,20 @@ save_ind_tsv <- function(data, rename, output_dir, ...) {
 data %>% select(-individual.Full_Individual_Id) %>% readr::write_tsv(file=paste0(ind_dir,"/",ind_id,".tsv")) ## Output structure can be changed here.

 ## Print Autorun_eager version to file
-AE_version <- "1.1.3"
+AE_version <- "1.2.0"
 cat(AE_version, file=paste0(ind_dir,"/autorun_eager_version.txt"), fill=T, append = F)
 }

 ## Correspondance between '-a' analysis type and the name of Kay's pipeline.
 ## Only bams from the output autorun_name will be included in the output
-autorun_name_from_analysis_type <- function(analysis_type) {
-autorun_name <- case_when(
-analysis_type == "TF" ~ "HUMAN_1240K",
-analysis_type == "SG" ~ "HUMAN_SHOTGUN",
+autorun_names_from_analysis_type <- function(analysis_type) {
+autorun_names <- case_when(
+analysis_type == "TF" ~ c( "HUMAN_1240K", "Human_1240k" ),
+analysis_type == "SG" ~ c( "HUMAN_SHOTGUN", "Human_Shotgun" ),
 ## Future analyses can be added here to pull those bams for eager processsing.
 TRUE ~ NA_character_
 )
-return(autorun_name)
+return(autorun_names)
 }

 ## MAIN ##
@@ -81,6 +81,11 @@ parser <- add_option(parser, c("-r", "--rename"), type = 'logical',
 Some tools used in nf-core/eager will strip everything after the first dot (.)
 from the name of the input file, which can cause naming conflicts in rare cases."
 )
+parser <- add_option(parser, c("-w", "--whitelist"), type = 'character',
+action = 'store', dest = 'whitelist_fn', default=NA_character_,
+help = "An optional file that includes the IDs of whitelisted individuals,
+one per line. Only the TSVs for these individuals will be updated."
+)
 parser <- add_option(parser, c("-o", "--outDir"), type = 'character',
 action = "store", dest = "outdir",
 help= "The desired output directory. Within this directory, one subdirectory will be
@@ -99,6 +104,7 @@ opts <- arguments$options
 cred_file <- arguments$args
 sequencing_batch_id <- opts$sequencing_batch_id
 analysis_type <- opts$analysis_type
+whitelist_fn <- opts$whitelist_fn

 if (is.na(analysis_type)) {
 stop(call.=F, "\n[prepare_eager_tsv.R] error: No analysis type provided with -a. Please see --help for more information.\n")
@@ -128,9 +134,9 @@ tibble_input_iids <- complete_pandora_table %>% filter(sequencing.Run_Id == sequ

 ## Pull information from pandora, keeping only matching IIDs and requested Sequencing types.
 results <- inner_join(complete_pandora_table, tibble_input_iids, by=c("individual.Full_Individual_Id"="individual.Full_Individual_Id")) %>%
-filter(grepl(paste0("\\.", analysis_type), sequencing.Full_Sequencing_Id), analysis.Analysis_Id == autorun_name_from_analysis_type(analysis_type)) %>%
+filter(grepl(paste0("\\.", analysis_type), sequencing.Full_Sequencing_Id), analysis.Analysis_Id %in% autorun_names_from_analysis_type(analysis_type)) %>%
 select(individual.Full_Individual_Id,individual.Organism,library.Full_Library_Id,library.Protocol,analysis.Result_Directory,sequencing.Sequencing_Id,sequencing.Full_Sequencing_Id,sequencing.Single_Stranded) %>%
-distinct() %>% ## Need distinct() call because of hoe analysis tab is read in, which created one copy of each row per analysis field.
+distinct() %>% ## Need distinct() call because of how analysis tab is read in, which created one copy of each row per analysis field.
 group_by(individual.Full_Individual_Id) %>%
 filter(!is.na(analysis.Result_Directory)) %>% ## Exclude individuals with no results directory (seem to mostly be controls)
 mutate(
@@ -183,5 +189,13 @@ results <- inner_join(complete_pandora_table, tibble_input_iids, by=c("individua
 ## Save results into single file for debugging
 if ( opts$debug ) { write_tsv(results, file=paste0(sequencing_batch_id, ".", analysis_type, ".results.txt")) }

+## Read in the whitelist if any, and filter the results table
+if (! is.na(whitelist_fn) ){
+whitelist <- read_tsv(whitelist_fn, col_types='c', col_names='Pandora_ID')
+
+results <- results %>% filter(individual.Full_Individual_Id %in% whitelist$Pandora_ID)
+# write_tsv(results, file=paste0(sequencing_batch_id, ".", analysis_type, ".whitelist.results.txt"))
+}
+
 ## Group by individual IDs and save each chunk as TSV
 results %>% group_by(individual.Full_Individual_Id) %>% group_walk(~save_ind_tsv(., rename=F, output_dir=output_dir), .keep=T)
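
The switch from `==` to `%in%` in the filter above matters once more than one spelling of an analysis ID is accepted. The following base R sketch shows the difference on a toy vector. The four ID strings are the ones named in this diff, while the variable names are illustrative assumptions.

```r
# Analysis IDs as they might appear in a table, in both spellings.
analysis_ids <- c("HUMAN_1240K", "Human_1240k", "HUMAN_SHOTGUN", "Human_Shotgun")

# Accepted names for the "TF" analysis type after this change.
tf_names <- c("HUMAN_1240K", "Human_1240k")

# Comparing against a single name matches only one spelling.
only_upper <- analysis_ids == "HUMAN_1240K"

# %in% tests each element for membership in the full set of accepted names.
either_case <- analysis_ids %in% tf_names

print(only_upper)
print(either_case)
```

Note that `analysis_ids == tf_names` would not be a substitute: `==` recycles the shorter vector and compares element by element, rather than testing set membership.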

scripts/update_poseidon_package.sh

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash

-VERSION="1.1.3"
+VERSION="1.2.0"

 ## Colours for printing to terminal
 Yellow=$(tput sgr0)'\033[1;33m' ## Yellow normal face
@@ -43,7 +43,7 @@ done

 autorun_root_dir='/mnt/archgen/Autorun_eager/'
 root_input_dir='/mnt/archgen/Autorun_eager/eager_outputs' ## Directory should include subdirectories for each analysis type (TF/SG) and sub-subdirectories for each site and individual.
-root_output_dir='/mnt/archgen/Autorun_eager/poseidon_packages' ## Directory that includes data type, site ID and ind ID subdirs.
+root_output_dir='/mnt/archgen/Autorun_eager/dev/poseidon_packages' ## Directory that includes data type, site ID and ind ID subdirs.
 input_dir="${root_input_dir}/TF/${ind_id:0:3}/${ind_id}/genotyping/"
 output_dir="${root_output_dir}/TF/${ind_id:0:3}/${ind_id}/"
 cred_file="${autorun_root_dir}/.eva_credentials"
