- Input discovery and gatekeeping (`main.nf`)
  - The pipeline reads batch directories from `--input_dir` using `Channel.fromPath("${params.input_dir}/*/")`.
  - It then computes global label counts (`ALL_LABEL_COUNTS`) and explicitly fails if the total count is zero (`CHECK_LABEL_COUNTS`).
- Label preparation (`main.nf`)
  - It recomputes a per-label table (`GET_ALL_LABEL_RECOUNTS`).
  - It applies heuristic negative-label relabeling to each quantification table (`BOOST_NEGATIVE_LABELS`).
- Optional normalization (`main.nf`)
  - If `params.use_boxcox_transformation` is true, each modified table is transformed by `BOXCOX_TRANSFORM`.
- Modeling and reporting sub-workflow (`modules/fit_new_models.nf`)
  - Training sets are generated from relabeled/normalized tables (`GET_SINGLE_MARKER_TRAINING_DF`).
  - Marker training files are grouped and merged by marker (`MERGE_TRAINING_BY_MARKER`).
  - Binary models are trained (`BINARY_MODEL_TRAINING`).
  - Each trained model is paired with each input table, then predictions are made (`PREDICTIONS_FROM_BEST_MODEL`).
  - Predictions are grouped per image and merged (`MERGE_BY_PRED_IMAGE`).
  - Per-image PNG/HTML reports are produced (`REPORT_PER_IMAGE`).
- Output layout
  - Reports: `${output_dir}/reports/` and `${output_dir}/per_image_reports/<image_id>/`
  - Merged prediction tables: `${output_dir}/merged/`
  - Normalization PDFs: `${output_dir}/normalization_reports/`
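
The pairing of each trained model with each input table, described in the modeling step above, behaves like a cross product. The following is a minimal Python sketch of how the number of prediction jobs grows under that assumption. The file names, the `pair_models_with_tables` helper, and the `max_pairs` budget are hypothetical and do not come from the pipeline.

```python
from itertools import product


def pair_models_with_tables(models, tables):
    """Pair every model with every table, as an unkeyed cross product would."""
    return list(product(models, tables))


def check_pair_budget(models, tables, max_pairs):
    """Raise if the cross product would exceed a hypothetical job budget."""
    expected = len(models) * len(tables)
    if expected > max_pairs:
        raise ValueError(
            f"{expected} model/table pairs exceed the budget of {max_pairs}"
        )
    return expected


# Hypothetical inputs: two marker models and three quantification tables.
models = ["CD3.model", "CD8.model"]
tables = ["img1.tsv", "img2.tsv", "img3.tsv"]

pairs = pair_models_with_tables(models, tables)
print(len(pairs))  # one prediction job per (model, table) pair
```

With M models and T tables the job count is M × T, so a cheap count check before pairing makes unexpected growth visible early.
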
- Parameter overriding bug risk in `main.nf`
  - `params.input_dir` and `params.output_dir` are assigned inside `main.nf`, which can override user-provided CLI/config values depending on evaluation order.
  - This is fragile and can surprise users if custom values are ignored.
- Missing local profile config reference
  - `nextflow.config` defines a `local` profile with `includeConfig 'conf/local.config'`, but `conf/local.config` is absent.
  - Running with `-profile local` can fail immediately.
- Potential channel shape mismatch in prediction pairing
  - `fitting.combine(tablesOfQuantification.flatMap { it })` can create broad Cartesian-style pairings if not carefully constrained.
  - Depending on the number of models and tables, this can explode compute and produce logically invalid model-to-table pairs.
- Non-deterministic file grouping/name cleanup
  - Grouping logic strips an 8-char hash suffix only if basename matches `/_([a-zA-Z0-9]{8})$/`.
  - If upstream naming differs, grouping may be inconsistent (mixture of tuple/file return paths), leading to fragile behavior.
- Optional outputs may hide silent failures
  - Several outputs are marked `optional: true` (`GET_SINGLE_MARKER_TRAINING_DF`, `BINARY_MODEL_TRAINING`).
  - This prevents early hard failures and can cause downstream steps to run with partial/no data.
- Hard-coded schema assumptions
  - Defaults like `singleLabelColumn = "Classification"`, context columns with special spacing/Unicode (`" Centroid X µm"`), and `nucleus_marker = "NA2"` assume a stable export schema.
  - Any slight naming drift in source TSVs can break scripts.
- Typo-prone heuristic parameter names
  - Parameter names use the `huerustic_*` spelling consistently; this works internally but increases the risk of operator mistakes when overriding values.
- Dependency/environment coupling
  - Processes call Python scripts directly (no explicit conda/container declaration in workflow files).
  - Reproducibility depends on external environment setup, reducing portability.
- README command typo
  - Usage example shows `-profile -slrum` (typo/order issue), which may confuse users and reduce successful first runs.
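
The hash-suffix grouping concern noted above can be reproduced with a small Python sketch that applies the same pattern to a few basenames. The basenames and helper names are hypothetical. The pipeline's actual grouping closure is written in Groovy and is not shown here.

```python
import re
from collections import defaultdict

# Same pattern as quoted above: underscore plus exactly 8 alphanumerics at the end.
HASH_SUFFIX = re.compile(r"_([a-zA-Z0-9]{8})$")


def grouping_key(basename):
    """Strip a trailing 8-character suffix if present, else return the name unchanged."""
    return HASH_SUFFIX.sub("", basename)


def group_by_key(basenames):
    """Group basenames by their suffix-stripped key."""
    groups = defaultdict(list)
    for name in basenames:
        groups[grouping_key(name)].append(name)
    return dict(groups)


# Hypothetical basenames (file extension already removed).
names = [
    "CD3_a1b2c3d4",  # 8-char suffix: stripped, key "CD3"
    "CD3_e5f6a7b8",  # 8-char suffix: stripped, key "CD3"
    "CD3_a1b2c3",    # 6-char suffix: not stripped, lands in its own group
    "CD8_controls",  # "controls" is 8 alphanumerics, so it is stripped too
]

for key, members in group_by_key(names).items():
    print(key, members)
```

The last two names show both failure modes: a shorter suffix splits one marker into two groups, and an ordinary 8-letter word is mistaken for a hash.
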
- Move default param assignments out of `main.nf` and keep them solely in `nextflow.config`.
- Add `conf/local.config` or remove/adjust the `local` profile reference.
- Validate channel cardinality before `combine` to prevent combinatorial blow-up.
- Add explicit schema validation step for required columns before modeling.
- Add process-level software environment declarations (conda/container) and pinned versions.
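
The schema validation step suggested above could look roughly like the following Python sketch, which checks a TSV header for required columns before any modeling runs. The two column names are taken from the defaults quoted earlier. Treating exactly these as the required set, and the helper names `missing_columns` and `check_schema`, are assumptions for illustration.

```python
import csv
import io

# Column names quoted from the defaults; the full required set is an assumption.
REQUIRED_COLUMNS = ("Classification", " Centroid X µm")


def missing_columns(tsv_stream, required=REQUIRED_COLUMNS):
    """Return the required columns absent from the TSV header (exact string match)."""
    reader = csv.reader(tsv_stream, delimiter="\t")
    header = next(reader, [])
    present = set(header)
    return [col for col in required if col not in present]


def check_schema(tsv_stream, required=REQUIRED_COLUMNS):
    """Raise ValueError listing any missing required columns."""
    missing = missing_columns(tsv_stream, required)
    if missing:
        raise ValueError(f"Missing required columns: {missing!r}")


# Hypothetical headers: the second one has lost the leading space in the centroid column.
good = io.StringIO("Image\tClassification\t Centroid X µm\nimg1\tCD3\t10.5\n")
drifted = io.StringIO("Image\tClassification\tCentroid X µm\nimg1\tCD3\t10.5\n")

print(missing_columns(good))
print(missing_columns(drifted))
```

The match is deliberately exact, so whitespace or Unicode drift in an export shows up as a named missing column instead of a failure deep inside a modeling script.
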