snakemake · seneschall · Dec 10, 2025 · Dec 10, 2025 · Dec 12, 2025 · Dec 18, 2025
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+# SCM syntax highlighting & preventing 3-way merges
+pixi.lock merge=binary linguist-language=YAML linguist-generated=true
diff --git a/.gitignore b/.gitignore
@@ -6,4 +6,7 @@ __pycache__
 .idea
 *~
 docs/_build
-docs/meta-wrappers
+docs/meta-wrappers
+# pixi environments
+.pixi/*
+!.pixi/config.toml
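The pixi block is appended to the end of the existing `.gitignore`. If that file does not end in a newline, the first appended line is fused with the last existing entry (for example, `docs/meta-wrappers# pixi environments`), and that combined pattern matches neither path. Below is a minimal sketch of a newline-safe append. `append_lines` is a hypothetical helper for illustration only, not part of pixi or of this repository.

```python
import os
import tempfile


def append_lines(path: str, lines: list[str]) -> None:
    """Append lines to a text file. If the file is non-empty and does not
    already end in a newline, write one first so that the last existing
    entry is not fused with the first appended line."""
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as fh:
            fh.seek(-1, os.SEEK_END)  # inspect only the final byte
            needs_newline = fh.read(1) != b"\n"
    with open(path, "a", encoding="utf-8") as fh:
        if needs_newline:
            fh.write("\n")
        for line in lines:
            fh.write(line + "\n")


# Usage: a .gitignore whose last entry lacks a trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    gitignore = os.path.join(tmp, ".gitignore")
    with open(gitignore, "w", encoding="utf-8") as fh:
        fh.write("docs/_build\ndocs/meta-wrappers")  # no trailing newline
    append_lines(gitignore, ["# pixi environments", ".pixi/*", "!.pixi/config.toml"])
    with open(gitignore, encoding="utf-8") as fh:
        result = fh.read().splitlines()
```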
diff --git a/bio/mofa2/.gitattributes b/bio/mofa2/.gitattributes
@@ -0,0 +1,2 @@
+# SCM syntax highlighting & preventing 3-way merges
+pixi.lock merge=binary linguist-language=YAML linguist-generated=true -diff
diff --git a/bio/mofa2/.gitignore b/bio/mofa2/.gitignore
@@ -0,0 +1,3 @@
+# pixi environments
+.pixi/*
+!.pixi/config.toml
diff --git a/bio/mofa2/environment.linux-64.pin.txt b/bio/mofa2/environment.linux-64.pin.txt
diff --git a/bio/mofa2/environment.yaml b/bio/mofa2/environment.yaml
@@ -0,0 +1,10 @@
+channels:
+  - conda-forge
+  - bioconda
+  - nodefaults
+dependencies:
+  - bioconductor-mofa2 =1.16.0
+  - r-base =4.4.3
+  - r-arrow =22.0.0
+  - mofapy2 =0.7.2
+  - python =3.14.2
diff --git a/bio/mofa2/meta.yaml b/bio/mofa2/meta.yaml
@@ -0,0 +1,26 @@
+name: mofa2
+description: |
+  Train a MOFA2 model on a multi-omics data set with default options.
+url: https://www.bioconductor.org/packages/release/bioc/html/MOFA2.html
+authors:
+  - Simon Sack
+input:
+  - |
+    A Parquet file in tidy (long) format with the columns `sample, feature, view, group (optional), value`:
+
+    `sample`: The name of the sample
+
+    `feature`: The name of the observed feature
+
+    `view`: The view (e.g. omics layer) that the feature belongs to
+
+    `group` (optional, advanced): The group that the sample belongs to. Not recommended for first-time users. The multi-group framework does not capture differences in mean levels between groups (as, for example, in differential RNA expression analysis); instead, it compares the sources of variability that drive each group.
+
+    `value`: The observed value
+output:
+  - An HDF5 file containing the trained model.
+notes: |
+  In the params, set `scale_groups` and/or `scale_views` to `"TRUE"` if your groups/views
+  have different ranges/variances; this scales them to unit variance.
+  Both default to `FALSE` if not given.
+  All other data, model, and training options keep MOFA2's default values.
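The input is one row per observed value, not one matrix per view. The following is a minimal sketch of flattening per-view sample-by-feature data into rows with the columns listed above. The helper name `to_tidy`, the nested-dict input layout, and the example view, sample, and feature names are hypothetical and are not taken from the test data.

```python
def to_tidy(views, groups=None):
    """Flatten per-view data into tidy rows with the wrapper's columns.

    views:  {view_name: {sample: {feature: value}}}
    groups: optional {sample: group_name}; omit it for single-group mode,
            in which case no `group` column is emitted.
    """
    rows = []
    for view, samples in views.items():
        for sample, features in samples.items():
            for feature, value in features.items():
                row = {"sample": sample, "feature": feature, "view": view, "value": value}
                if groups is not None:
                    row["group"] = groups[sample]
                rows.append(row)
    return rows


# Hypothetical example: sample s2 has no methylation measurements at all.
views = {
    "rna": {"s1": {"geneA": 1.5, "geneB": 0.2}, "s2": {"geneA": 0.9}},
    "methylation": {"s1": {"cg01": 0.7}},
}
rows = to_tidy(views)
grouped = to_tidy(views, groups={"s1": "g1", "s2": "g2"})
```

To produce the Parquet input, the rows can be written with, for example, `pandas.DataFrame(rows).to_parquet("data.parquet")`. This requires pandas plus a Parquet engine such as pyarrow, neither of which is part of this wrapper's environment.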
diff --git a/bio/mofa2/test/Snakefile b/bio/mofa2/test/Snakefile
@@ -0,0 +1,12 @@
+rule mofa2:
+    input:
+        "{data}.parquet",
+    output:
+        "{data}.hdf5",
+    log:
+        "log/{data}.log",
+    params:
+        scale_groups="FALSE",  # set to TRUE if groups have different ranges/variances
+        scale_views="FALSE",  # set to TRUE if views have different ranges/variances
+    wrapper:
+        "master/bio/mofa2"
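The test rule is generic over `{data}`. `test_mofa2` requests `data.hdf5`, so the wildcard becomes `data`, the input becomes `data.parquet`, and the log becomes `log/data.log`. The sketch below is a simplified model of that substitution. It assumes Snakemake's default wildcard pattern `.+` and is not Snakemake's actual implementation. `resolve_rule` is a hypothetical name.

```python
import re


def resolve_rule(target: str) -> dict:
    """Simplified model of resolving the rule's `{data}` wildcard: match
    the requested output against "{data}.hdf5", then substitute the
    captured value into the input and log patterns."""
    match = re.fullmatch(r"(?P<data>.+)\.hdf5", target)
    if match is None:
        raise ValueError(f"{target!r} does not match '{{data}}.hdf5'")
    data = match.group("data")
    return {
        "input": f"{data}.parquet",
        "output": target,
        "log": f"log/{data}.log",
    }
```

The same mapping applies to the second test data set: requesting `microbiome.hdf5` would read `microbiome.parquet` and write `log/microbiome.log`.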
diff --git a/bio/mofa2/test/data.parquet b/bio/mofa2/test/data.parquet
diff --git a/bio/mofa2/test/log/data.log b/bio/mofa2/test/log/data.log
@@ -0,0 +1,29 @@
+Creating MOFA object from a data.frame...
+
+# Multi-group mode requested.
+
+This is an advanced option, if this is the first time that you are running MOFA, we suggest that you try do some exploration first without specifying groups. Two important remarks:
+
+ - The aim of the multi-group framework is to identify the sources of variability *within* the groups. If your aim is to find a factor that 'separates' the groups, you DO NOT want to use the multi-group framework. Please see the FAQ on the MOFA2 webpage.
+
+ - It is important to account for the group effect before selecting highly variable features (HVFs). We suggest that either you calculate HVFs per group and then take the union, or regress out the group effect before HVF selection
+Checking data options...
+Checking training options...
+Checking model options...
+Connecting to the mofapy2 python package using reticulate (use_basilisk = FALSE)... 
+    Please make sure to manually specify the right python binary when loading R with reticulate::use_python(..., force=TRUE) or the right conda environment with reticulate::use_condaenv(..., force=TRUE)
+    If you prefer to let us automatically install a conda environment with 'mofapy2' installed using the 'basilisk' package, please use the argument 'use_basilisk = TRUE'
+
+10 factors were found to explain no variance and they were removed for downstream analysis. You can disable this option by setting load_model(..., remove_inactive_factors = FALSE)
+Trained MOFA with the following characteristics: 
+ Number of views: 2 
+ Views names: view_0 view_1 
+ Number of features (per view): 1000 1000 
+ Number of groups: 2 
+ Groups names: group_0 group_1 
+ Number of samples (per group): 100 100 
+ Number of factors: 5 
+
+Warning message:
+In run_mofa(mofa_object, outfile, ) :
+  The latest mofapy2 version is 0.7.0, you are using 0.7.2. Please upgrade with 'pip install mofapy2'
diff --git a/bio/mofa2/test/log/microbiome.log b/bio/mofa2/test/log/microbiome.log
@@ -0,0 +1,24 @@
+Creating MOFA object from a data.frame...
+Checking data options...
+Checking training options...
+Checking model options...
+Warning message:
+In prepare_mofa(object = mofa_object, data_options = data_opts,  :
+  The total number of samples is very small for learning 15 factors.  
+    Try to reduce the number of factors to obtain meaningful results. It should not exceed ~14.
+Connecting to the mofapy2 python package using reticulate (use_basilisk = FALSE)... 
+    Please make sure to manually specify the right python binary when loading R with reticulate::use_python(..., force=TRUE) or the right conda environment with reticulate::use_condaenv(..., force=TRUE)
+    If you prefer to let us automatically install a conda environment with 'mofapy2' installed using the 'basilisk' package, please use the argument 'use_basilisk = TRUE'
+
+Trained MOFA with the following characteristics: 
+ Number of views: 3 
+ Views names: Bacteria Fungi Viruses 
+ Number of features (per view): 180 18 42 
+ Number of groups: 1 
+ Groups names: single_group 
+ Number of samples (per group): 59 
+ Number of factors: 15 
+
+Warning message:
+In run_mofa(mofa_object, outfile, ) :
+  The latest mofapy2 version is 0.7.0, you are using 0.7.2. Please upgrade with 'pip install mofapy2'
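Both logs indicate that 15 factors were requested. In `data.log`, 10 inactive factors were removed, leaving 5. The wrapper keeps MOFA2's default model options, so the number of factors cannot currently be changed through `params`. The microbiome warning suggests at most ~14 factors for 59 samples. This is consistent with a bound of floor(N / 4) on the total number of samples N (59 // 4 = 14), and also with `data.log` showing no such warning (200 samples). The rule below is inferred from these two logs, not copied from the MOFA2 source. Both function names are hypothetical.

```python
def max_recommended_factors(samples_per_group: list[int]) -> int:
    """Assumed upper bound on the number of factors: floor(total samples / 4).
    Inferred from the warning in log/microbiome.log."""
    return sum(samples_per_group) // 4


def factor_warning(num_factors: int, samples_per_group: list[int]) -> bool:
    """True if the requested number of factors exceeds the assumed bound."""
    return num_factors > max_recommended_factors(samples_per_group)


# microbiome.parquet: 1 group of 59 samples, 15 factors requested
print(max_recommended_factors([59]), factor_warning(15, [59]))  # 14 True
# data.parquet: 2 groups of 100 samples, 15 factors requested
print(max_recommended_factors([100, 100]), factor_warning(15, [100, 100]))  # 50 False
```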
diff --git a/bio/mofa2/test/microbiome.parquet b/bio/mofa2/test/microbiome.parquet
diff --git a/bio/mofa2/wrapper.R b/bio/mofa2/wrapper.R
@@ -0,0 +1,68 @@
+#!/usr/bin/env Rscript
+
+# load libraries
+library(MOFA2)
+library(arrow)
+
+# connect to conda environment
+conda_prefix <- Sys.getenv("CONDA_PREFIX")
+reticulate::use_condaenv(conda_prefix)
+
+# if a log file is provided, redirect both output and messages to it
+if (length(snakemake@log) > 0) {
+  log_con <- file(snakemake@log[[1]], open = "wt")
+  sink(log_con)
+  sink(log_con, type = "message")
+}
+
+# load the long-format data frame from the Parquet file; expected columns:
+# `sample, feature, view, group (optional), value`
+
+# cast input path as character to avoid errors
+path <- as.character(snakemake@input[[1]])
+
+df <- read_parquet(path)
+
+mofa_object <- create_mofa(df)
+
+data_opts <- get_default_data_options(mofa_object)
+model_opts <- get_default_model_options(mofa_object)
+train_opts <- get_default_training_options(mofa_object)
+
+# add params:
+# data options: scale_groups, scale_views ("TRUE" or "FALSE"; other values keep the default)
+
+if ("scale_groups" %in% names(snakemake@params)) {
+  if (snakemake@params[["scale_groups"]] == "FALSE") {
+    data_opts$scale_groups <- FALSE
+  }
+  if (snakemake@params[["scale_groups"]] == "TRUE") {
+    data_opts$scale_groups <- TRUE
+  }
+}
+
+if ("scale_views" %in% names(snakemake@params)) {
+  if (snakemake@params[["scale_views"]] == "FALSE") {
+    data_opts$scale_views <- FALSE
+  }
+  if (snakemake@params[["scale_views"]] == "TRUE") {
+    data_opts$scale_views <- TRUE
+  }
+}
+
+# training options (e.g. maxiter, convergence_mode, gpu_mode, verbose) keep their defaults
+
+mofa_object <- prepare_mofa(
+  object = mofa_object,
+  data_options = data_opts,
+  model_options = model_opts,
+  training_options = train_opts
+)
+
+outfile <- file.path(getwd(), snakemake@output[[1]])
+
+# train the MOFA model and write the result to `outfile`
+run_mofa(
+  mofa_object,
+  outfile
+)
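The scaling parameters are only applied when they equal the exact strings `"FALSE"` or `"TRUE"`. Any other value silently keeps the default. The sketch below mirrors that rule in Python to make the behaviour explicit. It covers string values only. It assumes that both defaults are `FALSE`, as stated in `meta.yaml`. `DEFAULT_DATA_OPTS` and `resolve_data_opts` are hypothetical names.

```python
# Assumed defaults for the two exposed data options (per meta.yaml notes).
DEFAULT_DATA_OPTS = {"scale_groups": False, "scale_views": False}


def resolve_data_opts(params: dict) -> dict:
    """Mirror of the wrapper's handling for string params: only the exact
    strings "TRUE" / "FALSE" override a default; anything else (missing,
    "True", "yes", ...) leaves the default untouched."""
    opts = dict(DEFAULT_DATA_OPTS)
    for key in ("scale_groups", "scale_views"):
        value = params.get(key)
        if value == "TRUE":
            opts[key] = True
        elif value == "FALSE":
            opts[key] = False
    return opts
```

With a silent fallback like this, a typo such as `"True"` is ignored without any error. Raising an error on unrecognised values would be a stricter alternative.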
diff --git a/test_wrappers.py b/test_wrappers.py
@@ -107,7 +107,6 @@ def _run(wrapper, cmd, check_log=None, compare_results_with_expected=None):
                 f"file://{tmp_test_subdir}/",
             ]
 
-
         if CONTAINERIZED:
             # run snakemake in container
             cmd = [
@@ -129,9 +128,7 @@ def _run(wrapper, cmd, check_log=None, compare_results_with_expected=None):
                         with open(generated) as genf, open(expected) as expf:
                             gen_lines = genf.readlines()
                             exp_lines = expf.readlines()
-                        diff = "".join(
-                            difflib.Differ().compare(gen_lines, exp_lines)
-                        )
+                        diff = "".join(difflib.Differ().compare(gen_lines, exp_lines))
                         raise ValueError(
                             f"Unexpected results: {generated} != {expected}."
                             f"Diff:\n{diff}"
@@ -271,9 +268,19 @@ def test_agat(run):
 def test_alignoth(run):
     run(
         "bio/alignoth",
-        ["snakemake", "--cores", "1", "--use-conda", "-F", "out/json_plot.vl.json", "out/plot.html", "output-dir/"],
+        [
+            "snakemake",
+            "--cores",
+            "1",
+            "--use-conda",
+            "-F",
+            "out/json_plot.vl.json",
+            "out/plot.html",
+            "output-dir/",
+        ],
     )
 
+
 def test_alignoth_report_meta(run):
     run(
         "meta/bio/alignoth_report",
@@ -7165,3 +7172,7 @@ def test_orthanq(run):
             "out/calls_virus",
         ],
     )
+
+
+def test_mofa2(run):
+    run("bio/mofa2", ["snakemake", "--cores", "1", "data.hdf5", "--use-conda", "-F"])