Implementing and Testing DIANN converter for MSstatsBIG. #9
base: devel
📝 Walkthrough
Adds chunked DIANN-to-MSstats conversion with a new chunked reader.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant bigDIANN as bigDIANNtoMSstatsFormat
    participant reduce as reduceBigDIANN
    participant chunk as cleanDIANNChunk
    participant preprocess as MSstatsPreprocessBig
    participant output as Arrow/Sparklyr
    User->>bigDIANN: call with DIANN file & params
    bigDIANN->>reduce: orchestrate chunked reduction
    reduce->>reduce: detect delimiter
    rect rgba(200,220,255,0.5)
        Note over reduce,chunk: Chunked loop (1e6 rows/chunk)
        reduce->>chunk: process chunk 1
        chunk->>chunk: select columns (MBR-aware)
        chunk->>chunk: un-nest semicolon fields
        chunk->>chunk: parse FragmentIon & ProductCharge
        chunk->>chunk: apply Q-value and fragment filters
        chunk->>reduce: return processed chunk
        reduce->>reduce: append to CSV output
        reduce->>chunk: process chunk 2...N
    end
    reduce-->>bigDIANN: completed reduced CSV
    bigDIANN->>preprocess: MSstatsPreprocessBig (feature selection, filtering, anomaly scoring)
    preprocess->>output: produce Arrow/Sparklyr table
    bigDIANN->>User: return processed data
```
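The chunked loop above can be sketched in R. This is a minimal illustration, not the package's actual implementation; it assumes the readr package, and the helper name `reduce_in_chunks` and the `clean_fun` argument are hypothetical:

```r
library(readr)

# Read the DIANN report in 1e6-row chunks, clean each chunk,
# and append the result to a single reduced CSV.
reduce_in_chunks <- function(input_file, output_path, clean_fun,
                             delim = "\t", chunk_size = 1e6) {
  first_chunk <- TRUE
  read_delim_chunked(
    input_file, delim = delim, chunk_size = chunk_size,
    callback = function(chunk, pos) {
      cleaned <- clean_fun(chunk)
      # Write the header only for the first chunk, then append
      write_csv(cleaned, output_path, append = !first_chunk)
      first_chunk <<- FALSE
    }
  )
  invisible(output_path)
}
```

This keeps peak memory proportional to the chunk size rather than the full report.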
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 checks passed
Actionable comments posted: 5
Fix all issues with AI Agents 🤖
In @DESCRIPTION:
- Around line 26-27: Move the test-only packages out of Imports and into
Suggests in DESCRIPTION: remove "testthat" and "mockery" from the Imports list
and add them under Suggests so they are only required for development/testing;
update the DESCRIPTION sections accordingly so Imports contains runtime
dependencies only and Suggests lists testthat and mockery.
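For illustration, the resulting DESCRIPTION sections might look like this (the runtime package list is shortened and illustrative; only the testthat and mockery placements come from the comment above):

```
Imports:
    dplyr,
    tidyr
Suggests:
    testthat,
    mockery
```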
In @man/bigDIANNtoMSstatsFormat.Rd:
- Around line 33-34: The documentation and default parameter use incorrect DIANN
column names without dots; update the man page entry for quantificationColumn to
use "Fragment.Quant.Corrected" (DIANN 1.8.x) and "Fragment.Quant.Raw" (DIANN
1.9.x) instead of the dotted-less variants, and update the default value in the
reduceBigDIANN function (R/clean_DIANN.R, function reduceBigDIANN) to
"Fragment.Quant.Corrected" so the code and docs match DIANN's actual output
column names.
In @R/clean_DIANN.R:
- Around line 49-58: The current sub('\\^\\.\\*', '', .data$Fragment.Info) in
the dplyr::mutate (assigning FragmentIon) uses an incorrect literal regex and
does nothing; either remove the sub and set FragmentIon = .data$Fragment.Info
directly, or replace the pattern with one that strips the charge/suffix (e.g.
sub("\\^.*$", "", .data$Fragment.Info)) so "y7^1/1" -> "y7" and "b3-H2O^1/1" ->
"b3-H2O"; update the FragmentIon assignment accordingly in the mutate call.
In @R/converters.R:
- Around line 162-163: The roxygen docs for the quantificationColumn parameter
are inconsistent with the actual default; update the documentation text for
quantificationColumn to match the real default value used in the function
(change the displayed 'FragmentQuantCorrected' to 'Fragment.Quant.Corrected')
and likewise ensure the alternative value text matches the actual option name
used in code (e.g., use 'Fragment.Quant.Raw' if that is the real token), so the
docstring and the default parameter value for quantificationColumn are
identical.
🧹 Nitpick comments (6)
R/clean_DIANN.R (2)
91-91: Address the TODO comment regarding annotation columns. The TODO indicates uncertainty about whether `Condition` and `BioReplicate` columns are needed. This should be confirmed with Tony (or through documentation review) to ensure the converter produces the correct MSstats format. Would you like me to help search the MSstats documentation or existing converter implementations to verify the required columns?
2-21: Consider adding error handling for I/O operations. Both `reduceBigDIANN` and `cleanDIANNChunk` lack error handling for file operations. Consider wrapping I/O operations in `tryCatch` blocks to provide informative error messages when:
- Input files are missing, corrupted, or have incorrect format
- Output path is invalid or write-protected
- Disk space is exhausted during chunked writes
This would improve the user experience when processing large files that might fail midway through.
Also applies to: 24-113
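One possible shape for such a wrapper (a sketch; the helper name `safe_read` is illustrative, not part of the package):

```r
# Wrap a reader function so failures surface as informative errors
safe_read <- function(path, reader, ...) {
  if (!file.exists(path)) {
    stop("Input file not found: ", path, call. = FALSE)
  }
  tryCatch(
    reader(path, ...),
    error = function(e) {
      stop("Failed to read '", path, "': ", conditionMessage(e), call. = FALSE)
    }
  )
}
```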
tests/testthat/test-converters.R (4)
31-32: Use `tempfile()` for the output file to avoid test pollution. The input file uses `tempfile()` but the output file is hardcoded as `"preprocess_output.csv"`. This could cause issues with parallel test execution or leave artifacts in the working directory if the test fails before cleanup.

🔎 Proposed fix

```diff
  input_file <- tempfile(fileext = ".csv")
- output_file <- "preprocess_output.csv"
+ output_file <- tempfile(fileext = ".csv")
```
57-60: Use `on.exit()` for guaranteed cleanup. If an assertion fails before reaching cleanup, temporary files may persist. Using `on.exit()` ensures cleanup runs regardless of test outcome.

🔎 Proposed fix

```diff
  input_file <- tempfile(fileext = ".csv")
- output_file <- "preprocess_output.csv"
+ output_file <- tempfile(fileext = ".csv")
+ on.exit({
+   if (file.exists(input_file)) file.remove(input_file)
+   if (file.exists(output_file)) file.remove(output_file)
+ }, add = TRUE)
  # ... test code ...
- # Cleanup
- file.remove(input_file)
- if (file.exists(output_file)) file.remove(output_file)
 })
```
64-64: Unused `mock_reduce` variable. `mock_reduce` is created but never used. The `stub()` call on line 66 is what actually replaces `reduceBigSpectronaut`. This line can be removed.

🔎 Proposed fix

```diff
 test_that("bigSpectronauttoMSstatsFormat works correctly", {
-  # Mock reduceBigSpectronaut as its source is not provided
-  mock_reduce <- mock(NULL)
   stub(bigSpectronauttoMSstatsFormat, "reduceBigSpectronaut", function(input_file, output_path, ...) {
```
76-77: Use `tempfile()` for test files. Similar to the previous test, hardcoded filenames can cause test pollution. While `input_file` isn't actually read (due to mocking), using `tempfile()` for `output_file` ensures proper isolation.

🔎 Proposed fix

```diff
- input_file <- "dummy_spectro_input.csv"
- output_file <- "spectro_output.csv"
+ input_file <- "dummy_spectro_input.csv"  # Not actually read due to mock
+ output_file <- tempfile(fileext = ".csv")
+ on.exit({
+   if (file.exists(output_file)) file.remove(output_file)
+   reduce_file <- paste0("reduce_output_", basename(output_file))
+   if (file.exists(reduce_file)) file.remove(reduce_file)
+ }, add = TRUE)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `tests/testthat/topN_preprocess_output.csv` is excluded by `!**/*.csv`
- `tests/testthat/topN_spectro_output.csv` is excluded by `!**/*.csv`
- `tests/testthat/topN_test_diann_output.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (7)
- `DESCRIPTION`
- `NAMESPACE`
- `R/clean_DIANN.R`
- `R/converters.R`
- `man/bigDIANNtoMSstatsFormat.Rd`
- `tests/testthat/test-converters.R`
- `tests/testthat/test-diann_converter.R`
🔇 Additional comments (6)
NAMESPACE (1)
5-5: LGTM! The new export for `bigDIANNtoMSstatsFormat` is correctly added and aligns with the new converter functionality.

tests/testthat/test-diann_converter.R (1)

6-109: LGTM! Comprehensive test coverage. The test suite provides excellent coverage of the DIANN converter functionality:
- cleanDIANNChunk test: Validates chunk processing, fragment filtering (H2O removal), and column mapping
- reduceBigDIANN test: Tests file-based chunked processing with multiple proteins and charge extraction
- bigDIANNtoMSstatsFormat test: End-to-end validation of feature selection logic with arrow backend
The tests correctly exercise key edge cases and validate expected outputs.
tests/testthat/test-converters.R (3)
1-4: LGTM on test setup. The imports and context setup are appropriate for testing converter functions.

6-28: LGTM! Good coverage of the annotation merge functionality. The test validates that `Run` and `Intensity` are preserved while `Condition` and `BioReplicate` are correctly joined.

92-95: Cleanup may miss the intermediate file with the `tempfile()` approach. If you adopt `tempfile()` for `output_file`, the `paste0("reduce_output_", output_file)` pattern will break because `output_file` would be a full path like `/tmp/RtmpXXX/file123.csv`. Consider adjusting how the reduce output path is constructed or use `basename()`. Also, consider wrapping cleanup in `on.exit()` as suggested for the previous test.

R/converters.R (1)
170-199: Verify `reduceBigDIANN` exists and parameter alignment. The function implementation follows the established pattern from `bigSpectronauttoMSstatsFormat`. The function `reduceBigDIANN` is correctly defined in R/clean_DIANN.R and properly invoked.

Looking at the `MSstatsPreprocessBig` signature (R/converters.R:44-54), the parameter order is: `input_file, output_file_name, backend, max_feature_count, filter_unique_peptides, aggregate_psms, filter_few_obs, remove_annotation, calculateAnomalyScores, anomalyModelFeatures, connection`.

For `bigSpectronauttoMSstatsFormat` (R/converters.R:147-151):

```r
MSstatsPreprocessBig(
  paste0("reduce_output_", output_file_name),
  output_file_name, backend, max_feature_count,
  aggregate_psms, filter_few_obs, remove_annotation,
  calculateAnomalyScores, anomalyModelFeatures, connection)
```

This is missing `filter_unique_peptides` entirely, a bug in the existing function.

Your new function correctly includes all parameters in the proper order:

```r
MSstatsPreprocessBig(
  paste0("reduce_output_", output_file_name),
  output_file_name, backend, max_feature_count,
  filter_unique_peptides, aggregate_psms, filter_few_obs,
  remove_annotation, calculateAnomalyScores,
  anomalyModelFeatures, connection)
```
```
\item{quantificationColumn}{Use 'FragmentQuantCorrected'(default) column for quantified intensities for DIANN 1.8.x.
Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x.}
```
🧩 Analysis chain (verification scripts run against the Vitek-Lab/MSstatsBig repository; output omitted)
Correct the column name documentation to match DIANN's actual output format.
The documentation incorrectly specifies column names without dots. DIANN exports columns with dots in their names. The actual column names are:
- `Fragment.Quant.Corrected` (for DIANN 1.8.x), not `FragmentQuantCorrected`
- `Fragment.Quant.Raw` (for DIANN 1.9.x), not `FragmentQuantRaw`
Update lines 33-34 to use the correct dotted format, and also fix the default value in reduceBigDIANN (R/clean_DIANN.R, line 3) which currently uses "FragmentQuantCorrected" instead of "Fragment.Quant.Corrected".
🤖 Prompt for AI Agents
In @man/bigDIANNtoMSstatsFormat.Rd around lines 33-34, The documentation and
default parameter use incorrect DIANN column names without dots; update the man
page entry for quantificationColumn to use "Fragment.Quant.Corrected" (DIANN
1.8.x) and "Fragment.Quant.Raw" (DIANN 1.9.x) instead of the dotted-less
variants, and update the default value in the reduceBigDIANN function
(R/clean_DIANN.R, function reduceBigDIANN) to "Fragment.Quant.Corrected" so the
code and docs match DIANN's actual output column names.
Done
🧩 Analysis chain (verification scripts; output omitted)
@Rudhik1904, I don't see the changes yet in the repository. The files still contain the incorrect column names:
man/bigDIANNtoMSstatsFormat.Rd (lines 33-34) still shows:
- `'FragmentQuantCorrected'` → should be `'Fragment.Quant.Corrected'`
- `'FragmentQuantRaw'` → should be `'Fragment.Quant.Raw'`
R/clean_DIANN.R (line 2) still shows:
quantificationColumn = "FragmentQuantCorrected"→ should be"Fragment.Quant.Corrected"
Could you please commit and push the changes? The column names need to include the dots to match DIANN's actual output format.
| if ("Condition" %in% colnames(input)) msstats_cols <- c(msstats_cols, "Condition") | ||
| if ("BioReplicate" %in% colnames(input)) msstats_cols <- c(msstats_cols, "BioReplicate") |
Confirmed with Devon, you can remove this code and instead add an additional parameter for users to provide an annotation file (i.e. the table with Run, BioReplicate, and Condition)
Removed
```r
mbr_cols <- if (MBR) {
  c('Lib.Q.Value', 'Lib.PG.Q.Value')
} else {
  c('Global.Q.Value', 'Global.PG.Q.Value')
}
```
Here, we need to add a filter that drops rows whose values in these columns exceed a certain threshold (see reference).
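A minimal sketch of such a filter in the MBR branch, assuming a conventional 0.01 q-value cutoff (the threshold and its exact placement are illustrative, not confirmed project defaults):

```r
q_cutoff <- 0.01  # illustrative threshold, not a confirmed project default
input <- dplyr::filter(
  input,
  .data$Lib.Q.Value <= q_cutoff,
  .data$Lib.PG.Q.Value <= q_cutoff
)
```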
Done
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@R/clean_DIANN.R`:
- Around line 2-3: The default quantificationColumn in reduceBigDIANN is
incorrect ("FragmentQuantCorrected") and should match bigDIANNtoMSstatsFormat's
column name ("Fragment.Quant.Corrected") so the column can be found; update the
default value of the quantificationColumn parameter in reduceBigDIANN to
"Fragment.Quant.Corrected" (or otherwise normalize dotted vs. non-dotted names
inside reduceBigDIANN to accept both), referencing the reduceBigDIANN function
and the bigDIANNtoMSstatsFormat expectation when making the change.
- Around line 35-58: The code unconditionally uses columns "Fragment.Info" and
the variable quantificationColumn after only using intersect() earlier, so add
guards to skip downstream processing when those columns are missing: check if
quantificationColumn exists in colnames(input) before calling
tidyr::separate_rows on it or converting input[[quantificationColumn]] <-
as.numeric(...), and check for "Fragment.Info" in colnames(input) before
separating rows or any parsing of fragment data; use the existing split_cols or
explicit logical checks (e.g., if (quantificationColumn %in% colnames(input)) {
... } and if ("Fragment.Info" %in% colnames(input)) { ... }) around the
tidyr::separate_rows, fragment processing, and numeric conversion to avoid
crashes when columns are absent.
- Around line 64-70: ProductCharge currently becomes NA when Fragment.Info
contains "/" but no digits follow because as.integer(stringr::str_extract(...))
yields NA; modify the ProductCharge computation (the dplyr::if_else branch that
uses stringr::str_extract on .data$Fragment.Info) to wrap the extracted value
with dplyr::coalesce (or equivalent) so that if str_extract returns NA it falls
back to "1" before coercion, ensuring the final value is 1L when parsing fails.
In `@R/converters.R`:
- Around line 184-193: The intermediate path construction using
paste0("reduce_output_", output_file_name) is unsafe when output_file_name
contains directories; update calls that create the reduced intermediate filename
(in reduceBigDIANN -> MSstatsPreprocessBig sequence and the analogous locations
in bigSpectronauttoMSstatsFormat) to build the path with
file.path(dirname(output_file_name), paste0("reduce_output_",
basename(output_file_name))) so the reduce_output_* file is created alongside
the intended output file regardless of directory components.
In `@tests/testthat/test-diann_converter.R`:
- Around line 78-108: The test uses a static output filename
("test_diann_output.csv") which can cause collisions; change the output_file
assignment to use tempfile(fileext = ".csv") and update all subsequent
references (the call to bigDIANNtoMSstatsFormat, the dplyr::collect on
converted, and the cleanup file.remove calls that use output_file and
paste0("reduce_output_", output_file)) to use that tempfile variable so the test
writes/cleans unique temp files; relevant symbols: output_file,
bigDIANNtoMSstatsFormat, converted, and the cleanup file.remove/paste0 calls.
♻️ Duplicate comments (2)
R/clean_DIANN.R (2)
7-13: Stop on unsupported extensions; avoid treating .xlsx as TSV.
grepl("tsv|xls")matches.xlsx, which is binary. The";"fallback can silently mis-parse files. Prefer anchored, case-insensitive extension checks and error for unsupported formats.🐛 Proposed fix
- if (grepl("csv", input_file)) { + if (grepl("\\.csv$", input_file, ignore.case = TRUE)) { delim = "," - } else if (grepl("tsv|xls", input_file)) { + } else if (grepl("\\.(tsv|txt)$", input_file, ignore.case = TRUE)) { delim = "\t" } else { - delim <- ";" + stop("Unsupported file extension. Please provide a .csv or .tsv/.txt file.") }
60-63: Remove the ineffective FragmentIon regex.
`sub('\\^\\.\\*', ...)` matches the literal "^.*" and is effectively a no-op. If no transformation is intended, assign the value directly.

🔧 Suggested change

```diff
- FragmentIon = sub('\\^\\.\\*', '', .data$Fragment.Info),
+ FragmentIon = .data$Fragment.Info,
```
```r
reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
                           quantificationColumn = "FragmentQuantCorrected",
```
🧩 Analysis chain (verification scripts; output omitted)
Align quantificationColumn default with bigDIANNtoMSstatsFormat.
The default "FragmentQuantCorrected" differs from bigDIANNtoMSstatsFormat ("Fragment.Quant.Corrected"), and test data confirms DIANN output uses the dotted notation. Calling reduceBigDIANN without override will fail to locate the column.
🔧 Suggested change
```diff
-reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
-                           quantificationColumn = "FragmentQuantCorrected",
+reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
+                           quantificationColumn = "Fragment.Quant.Corrected",
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```r
reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
                           quantificationColumn = "Fragment.Quant.Corrected",
```
🤖 Prompt for AI Agents
In `@R/clean_DIANN.R` around lines 2 - 3, The default quantificationColumn in
reduceBigDIANN is incorrect ("FragmentQuantCorrected") and should match
bigDIANNtoMSstatsFormat's column name ("Fragment.Quant.Corrected") so the column
can be found; update the default value of the quantificationColumn parameter in
reduceBigDIANN to "Fragment.Quant.Corrected" (or otherwise normalize dotted vs.
non-dotted names inside reduceBigDIANN to accept both), referencing the
reduceBigDIANN function and the bigDIANNtoMSstatsFormat expectation when making
the change.
```r
# 1. Select required columns
base_cols <- c('Protein.Names', 'Stripped.Sequence', 'Modified.Sequence',
               'Precursor.Charge', quantificationColumn, 'Q.Value',
               'Precursor.Mz', 'Fragment.Info', 'Run')

mbr_cols <- if (MBR) {
  c('Lib.Q.Value', 'Lib.PG.Q.Value')
} else {
  c('Global.Q.Value', 'Global.PG.Q.Value')
}

req_cols <- intersect(c(base_cols, mbr_cols), colnames(input))
input <- dplyr::select(input, all_of(req_cols))

# 2. Split concatenated values (un-nest)
split_cols <- intersect(c(quantificationColumn, "Fragment.Info"), colnames(input))
if (length(split_cols) > 0) {
  input <- tidyr::separate_rows(input, all_of(split_cols), sep = ";")
}

# 3. Process fragment information

# Convert Intensity to numeric from character strings
input[[quantificationColumn]] <- as.numeric(input[[quantificationColumn]])
```
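The un-nesting step in isolation behaves like this on toy data (column names follow the DIANN format; `Quant` stands in for the chosen quantification column):

```r
library(tidyr)

df <- data.frame(
  Fragment.Info = "y7^1/1;b3^1/2",
  Quant = "100;200"
)
separate_rows(df, Fragment.Info, Quant, sep = ";")
# Two rows: (y7^1/1, 100) and (b3^1/2, 200)
```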
🧩 Analysis chain (verification scripts; output omitted)
Guard against missing Fragment.Info or quantificationColumn before unconditional use.
The code treats Fragment.Info and quantificationColumn as optional via intersect() at lines 46–50, but uses them unconditionally at lines 58, 62–68, and 76. If either column is absent from the input, the code will crash. Either enforce these as required columns upfront or conditionally apply the downstream processing when they are missing.
🔧 Example guard

```diff
- req_cols <- intersect(c(base_cols, mbr_cols), colnames(input))
- input <- dplyr::select(input, all_of(req_cols))
+ required_cols <- c(base_cols, mbr_cols)
+ missing_cols <- setdiff(required_cols, colnames(input))
+ if (length(missing_cols) > 0) {
+   stop("Missing required DIANN columns: ", paste(missing_cols, collapse = ", "))
+ }
+ input <- dplyr::select(input, all_of(required_cols))
```

🤖 Prompt for AI Agents
In `@R/clean_DIANN.R` around lines 35 - 58, The code unconditionally uses columns
"Fragment.Info" and the variable quantificationColumn after only using
intersect() earlier, so add guards to skip downstream processing when those
columns are missing: check if quantificationColumn exists in colnames(input)
before calling tidyr::separate_rows on it or converting
input[[quantificationColumn]] <- as.numeric(...), and check for "Fragment.Info"
in colnames(input) before separating rows or any parsing of fragment data; use
the existing split_cols or explicit logical checks (e.g., if
(quantificationColumn %in% colnames(input)) { ... } and if ("Fragment.Info" %in%
colnames(input)) { ... }) around the tidyr::separate_rows, fragment processing,
and numeric conversion to avoid crashes when columns are absent.
```r
# Extract product charge
ProductCharge = dplyr::if_else(
  grepl("/", .data$Fragment.Info),
  # Extract charge (number right after "/" in string), default to 1 if parsing fails
  as.integer(stringr::str_extract(.data$Fragment.Info, "(?<=/)[0-9]+")),
  1L
)
```
🧩 Analysis chain (verification scripts; output omitted)
Ensure ProductCharge defaults to 1 when parsing fails.
The current code returns NA when a / is present in Fragment.Info but no digits follow (e.g., "y1/"). The regex pattern "(?<=/)[0-9]+" fails to match, causing stringr::str_extract() to return NA, and subsequently as.integer(NA) returns NA instead of the intended default of 1L. Use dplyr::coalesce() to handle this edge case:
🔧 Suggested change

```diff
- ProductCharge = dplyr::if_else(
-   grepl("/", .data$Fragment.Info),
-   # Extract charge (number right after "/" in string), default to 1 if parsing fails
-   as.integer(stringr::str_extract(.data$Fragment.Info, "(?<=/)[0-9]+")),
-   1L
- )
+ ProductCharge = dplyr::if_else(
+   grepl("/", .data$Fragment.Info),
+   dplyr::coalesce(
+     as.integer(stringr::str_extract(.data$Fragment.Info, "(?<=/)[0-9]+")),
+     1L
+   ),
+   1L
+ )
```

🤖 Prompt for AI Agents
In `@R/clean_DIANN.R` around lines 64 - 70, ProductCharge currently becomes NA
when Fragment.Info contains "/" but no digits follow because
as.integer(stringr::str_extract(...)) yields NA; modify the ProductCharge
computation (the dplyr::if_else branch that uses stringr::str_extract on
.data$Fragment.Info) to wrap the extracted value with dplyr::coalesce (or
equivalent) so that if str_extract returns NA it falls back to "1" before
coercion, ensuring the final value is 1L when parsing fails.
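The edge case and the `coalesce()` fallback are easy to reproduce at the console:

```r
library(stringr)
library(dplyr)

info <- c("y7^1/2", "y1/", "b3")
if_else(
  grepl("/", info),
  coalesce(as.integer(str_extract(info, "(?<=/)[0-9]+")), 1L),
  1L
)
# 2 1 1
```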
```r
# Reduce and clean the DIANN report file in chunks
reduceBigDIANN(input_file,
               paste0("reduce_output_", output_file_name),
               MBR,
               quantificationColumn)

# Preprocess the cleaned data (feature selection, etc.)
msstats_data <- MSstatsPreprocessBig(
  paste0("reduce_output_", output_file_name),
  output_file_name, backend, max_feature_count,
```
🧩 Analysis chain (verification scripts; output omitted)
Fix unsafe path construction that breaks when output_file_name contains directories.
The paste0("reduce_output_", output_file_name) calls at lines 186 and 192 produce invalid paths if output_file_name includes directory separators (e.g., /tmp/out.csv → reduce_output_/tmp/out.csv). Use file.path() with dirname() and basename() to safely place the intermediate file alongside the output.
Note: The same pattern exists in bigSpectronauttoMSstatsFormat at lines 143 and 148 and should be fixed identically.
🐛 Proposed fix

```diff
+ reduce_path <- file.path(
+   dirname(output_file_name),
+   paste0("reduce_output_", basename(output_file_name))
+ )
+ reduceBigDIANN(input_file, reduce_path, MBR, quantificationColumn)
- reduceBigDIANN(input_file,
-                paste0("reduce_output_", output_file_name),
-                MBR,
-                quantificationColumn)
- msstats_data <- MSstatsPreprocessBig(
-   paste0("reduce_output_", output_file_name),
+ msstats_data <- MSstatsPreprocessBig(
+   reduce_path,
    output_file_name, backend, max_feature_count,
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```r
  # Reduce and clean the DIANN report file in chunks
  reduce_path <- file.path(
    dirname(output_file_name),
    paste0("reduce_output_", basename(output_file_name))
  )
  reduceBigDIANN(input_file, reduce_path, MBR, quantificationColumn)
  # Preprocess the cleaned data (feature selection, etc.)
  msstats_data <- MSstatsPreprocessBig(
    reduce_path,
    output_file_name, backend, max_feature_count,
```
🤖 Prompt for AI Agents
In `@R/converters.R` around lines 184 - 193, The intermediate path construction
using paste0("reduce_output_", output_file_name) is unsafe when output_file_name
contains directories; update calls that create the reduced intermediate filename
(in reduceBigDIANN -> MSstatsPreprocessBig sequence and the analogous locations
in bigSpectronauttoMSstatsFormat) to build the path with
file.path(dirname(output_file_name), paste0("reduce_output_",
basename(output_file_name))) so the reduce_output_* file is created alongside
the intended output file regardless of directory components.
```r
test_that("bigDIANNtoMSstatsFormat works with arrow backend", {
  input_file <- tempfile(fileext = ".csv")
  output_file <- "test_diann_output.csv"

  # 4 features for one protein. Feature selection should pick the top 2.
  diann_data <- rbind(
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(1000, 1100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y1", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(500, 600), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y2", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(100, 100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y3", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(2000, 2100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y4", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001)
  )
  write.csv(diann_data, input_file, row.names = FALSE)

  converted <- bigDIANNtoMSstatsFormat(
    input_file = input_file,
    output_file_name = output_file,
    backend = "arrow",
    max_feature_count = 2
  )
  result <- dplyr::collect(converted)

  # Avg intensities: y1=1050, y2=550, y3=100, y4=2050.
  # Top 2 features are y4 and y1.
  expect_equal(nrow(result), 4) # 2 features * 2 runs
  expect_true(all(c("y1", "y4") %in% unique(result$FragmentIon)))
  expect_false(any(c("y2", "y3") %in% unique(result$FragmentIon)))

  # Cleanup
  file.remove(input_file)
  if (file.exists(output_file)) file.remove(output_file)
  if (file.exists(paste0("reduce_output_", output_file))) file.remove(paste0("reduce_output_", output_file))
})
```
Use `tempfile()` instead of a static filename to prevent test collisions.
The other tests in this file (lines 7 and 45) correctly use `tempfile(fileext = ".csv")` for output files. This test should follow the same pattern to avoid collisions in parallel test runs or leftovers from previous test failures. Evidence of incomplete cleanup is already visible in the repository as stray CSV files (e.g., `topN_test_diann_output.csv`).
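A common testthat pattern (a sketch assuming only base R and testthat, not code from this PR) pairs `tempfile()` with `on.exit()` so the unique path is cleaned up even when an expectation fails mid-test:

```r
library(testthat)

test_that("output file is unique and always cleaned up", {
  output_file <- tempfile(fileext = ".csv")   # unique path per call
  on.exit(unlink(output_file), add = TRUE)    # cleanup runs even on failure

  write.csv(data.frame(x = 1:3), output_file, row.names = FALSE)
  expect_true(file.exists(output_file))
})
```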
🔧 Suggested change

```diff
-  output_file <- "test_diann_output.csv"
+  output_file <- basename(tempfile(fileext = ".csv"))
```

📝 Committable suggestion
```r
test_that("bigDIANNtoMSstatsFormat works with arrow backend", {
  input_file <- tempfile(fileext = ".csv")
  output_file <- basename(tempfile(fileext = ".csv"))

  # 4 features for one protein. Feature selection should pick the top 2.
  diann_data <- rbind(
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(1000, 1100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y1", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(500, 600), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y2", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(100, 100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y3", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(2000, 2100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y4", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001)
  )
  write.csv(diann_data, input_file, row.names = FALSE)

  converted <- bigDIANNtoMSstatsFormat(
    input_file = input_file,
    output_file_name = output_file,
    backend = "arrow",
    max_feature_count = 2
  )
  result <- dplyr::collect(converted)

  # Avg intensities: y1=1050, y2=550, y3=100, y4=2050.
  # Top 2 features are y4 and y1.
  expect_equal(nrow(result), 4) # 2 features * 2 runs
  expect_true(all(c("y1", "y4") %in% unique(result$FragmentIon)))
  expect_false(any(c("y2", "y3") %in% unique(result$FragmentIon)))

  # Cleanup
  file.remove(input_file)
  if (file.exists(output_file)) file.remove(output_file)
  if (file.exists(paste0("reduce_output_", output_file))) file.remove(paste0("reduce_output_", output_file))
})
```
🤖 Prompt for AI Agents
In `@tests/testthat/test-diann_converter.R` around lines 78 - 108, The test uses a
static output filename ("test_diann_output.csv") which can cause collisions;
change the output_file assignment to use tempfile(fileext = ".csv") and update
all subsequent references (the call to bigDIANNtoMSstatsFormat, the
dplyr::collect on converted, and the cleanup file.remove calls that use
output_file and paste0("reduce_output_", output_file)) to use that tempfile
variable so the test writes/cleans unique temp files; relevant symbols:
output_file, bigDIANNtoMSstatsFormat, converted, and the cleanup
file.remove/paste0 calls.
WIP: Added Clean_DIANN, the DIANN converter, and some basic tests (a more comprehensive test suite is still needed).
Summary by CodeRabbit
New Features
Tests
Chores