Conversation


@Rudhik1904 Rudhik1904 commented Jan 5, 2026

WIP: Added Clean_DIANN, the DIANN converter, and some basic tests (more comprehensive tests are still needed)

Summary by CodeRabbit

  • New Features

    • Out-of-core (larger-than-memory) DIANN → MSstats conversion with chunked, delimiter-aware processing and MSstats-formatted output.
    • New exported converter for end-to-end DIANN-to-MSstats workflow with options for MBR, quant column selection, filtering, aggregation, and anomaly scoring.
  • Tests

    • Added unit and integration tests covering conversion, preprocessing, annotation merging, and feature selection.
  • Chores

    • Updated package metadata to bump tooling and add testing/development dependencies.


@coderabbitai coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Adds chunked DIANN-to-MSstats conversion: new chunked reader (reduceBigDIANN), per-chunk processor (cleanDIANNChunk), a public converter (bigDIANNtoMSstatsFormat), tests for DIANN/converter flows, and DESCRIPTION/NAMESPACE dependency and export updates (new Imports and Suggests; RoxygenNote bump).
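For orientation, a minimal usage sketch of the new converter, using only the arguments exercised in this PR's tests (input_file, output_file_name, backend, max_feature_count); the paths are illustrative and all other arguments keep their defaults:

library(MSstatsBig)

converted <- bigDIANNtoMSstatsFormat(
  input_file = "diann_report.tsv",         # raw DIANN report (illustrative path)
  output_file_name = "msstats_input.csv",  # where the reduced MSstats-format CSV is written
  backend = "arrow",                       # Arrow backend for larger-than-memory data
  max_feature_count = 20                   # keep at most 20 features per protein
)
result <- dplyr::collect(converted)        # materialize the Arrow table when needed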

Changes

  • Build Configuration (DESCRIPTION): Bumped RoxygenNote to 7.3.3; expanded Imports to include dplyr, MSstats, MSstatsConvert; added Suggests: testthat, mockery.
  • Public API (NAMESPACE): Exported the new function bigDIANNtoMSstatsFormat.
  • DIANN Processing (R/clean_DIANN.R): Added reduceBigDIANN() (delimiter detection plus read_delim_chunked with 1e6-row chunks; a sketch of this pattern follows the table) and cleanDIANNChunk() (MBR-aware column selection, un-nesting of semicolon-delimited fields, numeric conversion, Fragment.Info → FragmentIon/ProductCharge parsing, Q-value filtering, MSstats column mapping, per-chunk CSV write/append).
  • Converter Integration (R/converters.R): Added the exported bigDIANNtoMSstatsFormat(), which invokes reduceBigDIANN and then calls MSstatsPreprocessBig, returning an Arrow/sparklyr-compatible object. Note: a duplicate definition appears in the file.
  • Documentation (man/bigDIANNtoMSstatsFormat.Rd): Added the roxygen-generated man page for bigDIANNtoMSstatsFormat() (usage, arguments, return type).
  • Tests — Converters (tests/testthat/test-converters.R): New tests for MSstatsAddAnnotationBig and MSstatsPreprocessBig feature selection, plus an integration test of the bigSpectronaut converter flow using the Arrow backend and temporary file I/O.
  • Tests — DIANN (tests/testthat/test-diann_converter.R): New tests for cleanDIANNChunk, reduceBigDIANN, and bigDIANNtoMSstatsFormat (end-to-end Arrow-backend checks, feature selection, and cleanup).
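As promised above, a minimal sketch of the chunked-reduction pattern used by reduceBigDIANN, assuming readr's chunked reader; the callback wiring and write logic are illustrative rather than the PR's exact code, and cleanDIANNChunk's extra arguments (MBR, quantification column, etc.) are omitted:

# Process a DIANN report in 1e6-row chunks, appending each cleaned chunk to
# one CSV so the full report never has to fit in memory.
process_chunk <- function(chunk, pos) {
  cleaned <- cleanDIANNChunk(chunk)        # per-chunk processor added in this PR
  readr::write_csv(cleaned, output_path,
                   append = pos > 1,       # pos is 1 for the first chunk
                   col_names = pos == 1)   # header only once
}

readr::read_delim_chunked(
  input_file,
  callback = readr::SideEffectChunkCallback$new(process_chunk),
  delim = delim,                           # delimiter detected from the file extension
  chunk_size = 1e6
)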

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant bigDIANN as bigDIANNtoMSstatsFormat
    participant reduce as reduceBigDIANN
    participant chunk as cleanDIANNChunk
    participant preprocess as MSstatsPreprocessBig
    participant output as Arrow/Sparklyr

    User->>bigDIANN: call with DIANN file & params
    bigDIANN->>reduce: orchestrate chunked reduction
    reduce->>reduce: detect delimiter

    rect rgba(200,220,255,0.5)
    Note over reduce,chunk: Chunked loop (1e6 rows/chunk)
    reduce->>chunk: process chunk 1
    chunk->>chunk: select columns (MBR-aware)
    chunk->>chunk: un-nest semicolon fields
    chunk->>chunk: parse FragmentIon & ProductCharge
    chunk->>chunk: apply Q-value and fragment filters
    chunk->>reduce: return processed chunk
    reduce->>reduce: append to CSV output
    reduce->>chunk: process chunk 2...N
    end

    reduce-->>bigDIANN: completed reduced CSV
    bigDIANN->>preprocess: MSstatsPreprocessBig (feature selection, filtering, anomaly scoring)
    preprocess->>output: produce Arrow/Sparklyr table
    bigDIANN->>User: return processed data

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Feature anomaly #6: Shares anomaly-scoring parameters and preprocessing workflow changes that overlap with the new converter signature and MSstatsPreprocessBig integration.

Poem

🐰
I nibble through DIANN in million-row bites,
unnesting fragments and parsing the nights.
I hop, I filter, I map every line,
then bundle for MSstats — tidy and fine. 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title accurately summarizes the main change (implementing and testing a DIANN converter for MSstatsBig), which aligns with the substantial additions across multiple files, including converter functions, tests, and documentation.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.




@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Fix all issues with AI Agents 🤖
In @DESCRIPTION:
- Around lines 26-27: Move the test-only packages out of Imports and into Suggests: remove "testthat" and "mockery" from the Imports list and add them under Suggests so they are required only for development and testing, leaving Imports with runtime dependencies only.

In @man/bigDIANNtoMSstatsFormat.Rd:
- Around lines 33-34: The documentation and the default parameter use DIANN column names without dots; update the man page entry for quantificationColumn to use "Fragment.Quant.Corrected" (DIANN 1.8.x) and "Fragment.Quant.Raw" (DIANN 1.9.x) instead of the dot-less variants, and update the default value in reduceBigDIANN (R/clean_DIANN.R) to "Fragment.Quant.Corrected" so the code and docs match DIANN's actual output column names.

In @R/clean_DIANN.R:
- Around lines 49-58: The sub('\\^\\.\\*', '', .data$Fragment.Info) in the dplyr::mutate assigning FragmentIon uses an incorrect literal regex and does nothing; either remove the sub and set FragmentIon = .data$Fragment.Info directly, or replace the pattern with one that strips the charge suffix (e.g. sub("\\^.*$", "", .data$Fragment.Info)) so that "y7^1/1" becomes "y7" and "b3-H2O^1/1" becomes "b3-H2O" (demonstrated after this list); update the FragmentIon assignment accordingly in the mutate call.

In @R/converters.R:
- Around lines 162-163: The roxygen docs for the quantificationColumn parameter are inconsistent with the actual default; update the documentation text to match the real default used in the function (change the displayed 'FragmentQuantCorrected' to 'Fragment.Quant.Corrected') and make the alternative value match the option name used in code (e.g. 'Fragment.Quant.Raw' if that is the real token), so the docstring and the default parameter value for quantificationColumn are identical.
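A quick demonstration of the Fragment.Info parsing suggested in the R/clean_DIANN.R item, using the sample values from that comment:

x <- c("y7^1/1", "b3-H2O^1/1")
sub("\\^.*$", "", x)                                 # "y7"  "b3-H2O"  (strip from "^" onward)
as.integer(stringr::str_extract(x, "(?<=/)[0-9]+"))  # 1 1   (charge digits after "/")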
🧹 Nitpick comments (6)
R/clean_DIANN.R (2)

91-91: Address the TODO comment regarding annotation columns.

The TODO indicates uncertainty about whether Condition and BioReplicate columns are needed. This should be confirmed with Tony (or through documentation review) to ensure the converter produces the correct MSstats format.

Would you like me to help search the MSstats documentation or existing converter implementations to verify the required columns?


2-21: Consider adding error handling for I/O operations.

Both reduceBigDIANN and cleanDIANNChunk lack error handling for file operations. Consider wrapping I/O operations in tryCatch blocks to provide informative error messages when:

  • Input files are missing, corrupted, or have incorrect format
  • Output path is invalid or write-protected
  • Disk space is exhausted during chunked writes

This would improve the user experience when processing large files that might fail midway through; a minimal sketch follows below.

Also applies to: 24-113
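A minimal sketch of such a wrapper, with an assumed name and illustrative messages:

safeReduceBigDIANN <- function(input_file, output_path, ...) {
  if (!file.exists(input_file)) {
    stop("Input file not found: ", input_file)
  }
  tryCatch(
    reduceBigDIANN(input_file, output_path, ...),
    error = function(e) {
      stop("Chunked DIANN reduction failed (check the input format, write ",
           "permissions on '", output_path, "', and free disk space): ",
           conditionMessage(e))
    }
  )
}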

tests/testthat/test-converters.R (4)

31-32: Use tempfile() for output file to avoid test pollution.

The input file uses tempfile() but the output file is hardcoded as "preprocess_output.csv". This could cause issues with parallel test execution or leave artifacts in the working directory if the test fails before cleanup.

🔎 Proposed fix
   input_file <- tempfile(fileext = ".csv")
-  output_file <- "preprocess_output.csv"
+  output_file <- tempfile(fileext = ".csv")

57-60: Use on.exit() for guaranteed cleanup.

If an assertion fails before reaching cleanup, temporary files may persist. Using on.exit() ensures cleanup runs regardless of test outcome.

🔎 Proposed fix
   input_file <- tempfile(fileext = ".csv")
-  output_file <- "preprocess_output.csv"
+  output_file <- tempfile(fileext = ".csv")
+  on.exit({
+    if (file.exists(input_file)) file.remove(input_file)
+    if (file.exists(output_file)) file.remove(output_file)
+  }, add = TRUE)

   # ... test code ...

-  # Cleanup
-  file.remove(input_file)
-  if (file.exists(output_file)) file.remove(output_file)
 })

64-64: Unused mock_reduce variable.

mock_reduce is created but never used. The stub() function on line 66 is what actually replaces reduceBigSpectronaut. This line can be removed.

🔎 Proposed fix
 test_that("bigSpectronauttoMSstatsFormat works correctly", {
-  # Mock reduceBigSpectronaut as its source is not provided
-  mock_reduce <- mock(NULL)
-
   stub(bigSpectronauttoMSstatsFormat, "reduceBigSpectronaut", function(input_file, output_path, ...) {

76-77: Use tempfile() for test files.

Similar to the previous test, hardcoded filenames can cause test pollution. While input_file isn't actually read (due to mocking), using tempfile() for output_file ensures proper isolation.

🔎 Proposed fix
-  input_file <- "dummy_spectro_input.csv"
-  output_file <- "spectro_output.csv"
+  input_file <- "dummy_spectro_input.csv"  # Not actually read due to mock
+  output_file <- tempfile(fileext = ".csv")
+  on.exit({
+    if (file.exists(output_file)) file.remove(output_file)
+    reduce_file <- paste0("reduce_output_", basename(output_file))
+    if (file.exists(reduce_file)) file.remove(reduce_file)
+  }, add = TRUE)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a160afc and 1dfd8f0.

⛔ Files ignored due to path filters (3)
  • tests/testthat/topN_preprocess_output.csv is excluded by !**/*.csv
  • tests/testthat/topN_spectro_output.csv is excluded by !**/*.csv
  • tests/testthat/topN_test_diann_output.csv is excluded by !**/*.csv
📒 Files selected for processing (7)
  • DESCRIPTION
  • NAMESPACE
  • R/clean_DIANN.R
  • R/converters.R
  • man/bigDIANNtoMSstatsFormat.Rd
  • tests/testthat/test-converters.R
  • tests/testthat/test-diann_converter.R
🔇 Additional comments (6)
NAMESPACE (1)

5-5: LGTM!

The new export for bigDIANNtoMSstatsFormat is correctly added and aligns with the new converter functionality.

tests/testthat/test-diann_converter.R (1)

6-109: LGTM! Comprehensive test coverage.

The test suite provides excellent coverage of the DIANN converter functionality:

  • cleanDIANNChunk test: Validates chunk processing, fragment filtering (H2O removal), and column mapping
  • reduceBigDIANN test: Tests file-based chunked processing with multiple proteins and charge extraction
  • bigDIANNtoMSstatsFormat test: End-to-end validation of feature selection logic with arrow backend

The tests correctly exercise key edge cases and validate expected outputs.

tests/testthat/test-converters.R (3)

1-4: LGTM on test setup.

The imports and context setup are appropriate for testing converter functions.


6-28: LGTM!

Good coverage of the annotation merge functionality. The test validates that Run and Intensity are preserved while Condition and BioReplicate are correctly joined.


92-95: Cleanup may miss the intermediate file with tempfile() approach.

If you adopt tempfile() for output_file, the paste0("reduce_output_", output_file) pattern will break because output_file would be a full path like /tmp/RtmpXXX/file123.csv. Consider adjusting how the reduce output path is constructed or use basename().

Also, consider wrapping cleanup in on.exit() as suggested for the previous test.

R/converters.R (1)

170-199: Verify reduceBigDIANN exists and parameter alignment.

The function implementation follows the established pattern from bigSpectronauttoMSstatsFormat. The function reduceBigDIANN is correctly defined in R/clean_DIANN.R and properly invoked.

Looking at MSstatsPreprocessBig signature (R/converters.R:44-54), the parameter order is: input_file, output_file_name, backend, max_feature_count, filter_unique_peptides, aggregate_psms, filter_few_obs, remove_annotation, calculateAnomalyScores, anomalyModelFeatures, connection.

For bigSpectronauttoMSstatsFormat (R/converters.R:147-151):

MSstatsPreprocessBig(
  paste0("reduce_output_", output_file_name),
  output_file_name, backend, max_feature_count,
  aggregate_psms, filter_few_obs, remove_annotation, calculateAnomalyScores, 
  anomalyModelFeatures, connection)

This is missing filter_unique_peptides entirely—a bug in the existing function.

Your new function correctly includes all parameters in the proper order:

MSstatsPreprocessBig(
  paste0("reduce_output_", output_file_name),
  output_file_name, backend, max_feature_count,
  filter_unique_peptides, aggregate_psms, filter_few_obs, 
  remove_annotation, calculateAnomalyScores, 
  anomalyModelFeatures, connection)

Comment on lines +33 to +34
\item{quantificationColumn}{Use 'FragmentQuantCorrected'(default) column for quantified intensities for DIANN 1.8.x.
Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x.}
@coderabbitai coderabbitai bot Jan 5, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n man/bigDIANNtoMSstatsFormat.Rd

Repository: Vitek-Lab/MSstatsBig

Length of output: 3052


🏁 Script executed:

rg -i "fragment.quant|fragmentquant" -B 2 -A 2

Repository: Vitek-Lab/MSstatsBig

Length of output: 5131


🏁 Script executed:

cat -n R/clean_DIANN.R | head -50

Repository: Vitek-Lab/MSstatsBig

Length of output: 2207


🏁 Script executed:

rg "FragmentQuantRaw|Fragment.Quant.Raw" -B 2 -A 2

Repository: Vitek-Lab/MSstatsBig

Length of output: 832


🏁 Script executed:

rg "Fragment.Quant.Raw" -B 2 -A 2

Repository: Vitek-Lab/MSstatsBig

Length of output: 46


🏁 Script executed:

rg "1.9" -B 2 -A 2

Repository: Vitek-Lab/MSstatsBig

Length of output: 50377


🏁 Script executed:

grep -A 10 "read.*delim\|colnames" R/clean_DIANN.R | head -30

Repository: Vitek-Lab/MSstatsBig

Length of output: 1149


🏁 Script executed:

grep -B 5 -A 15 "intersect.*quantificationColumn\|quantificationColumn.*intersect" R/clean_DIANN.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 852


🏁 Script executed:

rg -B 3 -A 3 "names\(|colnames.*=" R/ --max-count=5

Repository: Vitek-Lab/MSstatsBig

Length of output: 3036


🏁 Script executed:

rg "standardizeColnames|Fragment.Quant" R/clean_DIANN.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 46


🏁 Script executed:

sed -n '1,100p' tests/testthat/test-diann_converter.R | grep -A 20 "Fragment"

Repository: Vitek-Lab/MSstatsBig

Length of output: 3986


Correct the column name documentation to match DIANN's actual output format.

The documentation incorrectly specifies column names without dots. DIANN exports columns with dots in their names. The actual column names are:

  • Fragment.Quant.Corrected (for DIANN 1.8.x) - not FragmentQuantCorrected
  • Fragment.Quant.Raw (for DIANN 1.9.x) - not FragmentQuantRaw

Update lines 33-34 to use the correct dotted format, and also fix the default value in reduceBigDIANN (R/clean_DIANN.R, line 3) which currently uses "FragmentQuantCorrected" instead of "Fragment.Quant.Corrected".


@Rudhik1904 (Author)

Done

@coderabbitai coderabbitai bot

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the man page documentation for the quantificationColumn parameter
echo "=== Checking man/bigDIANNtoMSstatsFormat.Rd around lines 33-34 ==="
sed -n '30,40p' man/bigDIANNtoMSstatsFormat.Rd

echo -e "\n=== Checking R/clean_DIANN.R for reduceBigDIANN default parameter ==="
# Check the function signature for reduceBigDIANN
rg -A 5 "^reduceBigDIANN\s*<-\s*function" R/clean_DIANN.R

Length of output: 910


@Rudhik1904, I don't see the changes yet in the repository. The files still contain the incorrect column names:

man/bigDIANNtoMSstatsFormat.Rd (lines 33-34) still shows:

  • 'FragmentQuantCorrected' → should be 'Fragment.Quant.Corrected'
  • 'FragmentQuantRaw' → should be 'Fragment.Quant.Raw'

R/clean_DIANN.R (line 2) still shows:

  • quantificationColumn = "FragmentQuantCorrected" → should be "Fragment.Quant.Corrected"

Could you please commit and push the changes? The column names need to include the dots to match DIANN's actual output format.

Comment on lines +94 to +95
if ("Condition" %in% colnames(input)) msstats_cols <- c(msstats_cols, "Condition")
if ("BioReplicate" %in% colnames(input)) msstats_cols <- c(msstats_cols, "BioReplicate")
@tonywu1999 tonywu1999 (Contributor) commented Jan 8, 2026

Confirmed with Devon: you can remove this code and instead add an additional parameter for users to provide an annotation file (i.e., the table with Run, BioReplicate, and Condition).
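A sketch of how that might look, assuming an annotation CSV with Run, Condition, and BioReplicate columns; the parameter name and the join on Run are illustrative, not the final API:

bigDIANNtoMSstatsFormat <- function(input_file, output_file_name, ...,
                                    annotation_file = NULL) {
  # ... existing reduce/preprocess steps producing msstats_data ...
  if (!is.null(annotation_file)) {
    annotation <- readr::read_csv(annotation_file)  # Run, Condition, BioReplicate
    msstats_data <- dplyr::left_join(msstats_data, annotation, by = "Run")
  }
  msstats_data
}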

@Rudhik1904 (Author)

Removed

Comment on lines +31 to +35
mbr_cols <- if (MBR) {
  c('Lib.Q.Value', 'Lib.PG.Q.Value')
} else {
  c('Global.Q.Value', 'Global.PG.Q.Value')
}
Contributor

Here, we need to add a filter on these columns, keeping only rows whose values fall below a certain threshold:

see reference
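Something along these lines could work, assuming a qvalue_cutoff parameter (the 0.01 default is an assumption) and the mbr_cols selection shown above:

qvalue_cutoff <- 0.01                      # assumed default threshold
for (col in intersect(mbr_cols, colnames(input))) {
  # keep only rows identified at or below the cutoff
  input <- dplyr::filter(input, .data[[col]] <= qvalue_cutoff)
}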

@Rudhik1904 (Author)

Done

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@R/clean_DIANN.R`:
- Around lines 2-3: The default quantificationColumn in reduceBigDIANN is
incorrect ("FragmentQuantCorrected") and should match bigDIANNtoMSstatsFormat's
column name ("Fragment.Quant.Corrected") so the column can be found; update the
default value of the quantificationColumn parameter in reduceBigDIANN to
"Fragment.Quant.Corrected" (or otherwise normalize dotted vs. non-dotted names
inside reduceBigDIANN to accept both), referencing the reduceBigDIANN function
and the bigDIANNtoMSstatsFormat expectation when making the change.
- Around lines 35-58: The code unconditionally uses columns "Fragment.Info" and
the variable quantificationColumn after only using intersect() earlier, so add
guards to skip downstream processing when those columns are missing: check if
quantificationColumn exists in colnames(input) before calling
tidyr::separate_rows on it or converting input[[quantificationColumn]] <-
as.numeric(...), and check for "Fragment.Info" in colnames(input) before
separating rows or any parsing of fragment data; use the existing split_cols or
explicit logical checks (e.g., if (quantificationColumn %in% colnames(input)) {
... } and if ("Fragment.Info" %in% colnames(input)) { ... }) around the
tidyr::separate_rows, fragment processing, and numeric conversion to avoid
crashes when columns are absent.
- Around lines 64-70: ProductCharge currently becomes NA when Fragment.Info
contains "/" but no digits follow because as.integer(stringr::str_extract(...))
yields NA; modify the ProductCharge computation (the dplyr::if_else branch that
uses stringr::str_extract on .data$Fragment.Info) to wrap the extracted value
with dplyr::coalesce (or equivalent) so that if str_extract returns NA it falls
back to "1" before coercion, ensuring the final value is 1L when parsing fails.

In `@R/converters.R`:
- Around lines 184-193: The intermediate path construction using
paste0("reduce_output_", output_file_name) is unsafe when output_file_name
contains directories; update calls that create the reduced intermediate filename
(in reduceBigDIANN -> MSstatsPreprocessBig sequence and the analogous locations
in bigSpectronauttoMSstatsFormat) to build the path with
file.path(dirname(output_file_name), paste0("reduce_output_",
basename(output_file_name))) so the reduce_output_* file is created alongside
the intended output file regardless of directory components.

In `@tests/testthat/test-diann_converter.R`:
- Around lines 78-108: The test uses a static output filename
("test_diann_output.csv") which can cause collisions; change the output_file
assignment to use tempfile(fileext = ".csv") and update all subsequent
references (the call to bigDIANNtoMSstatsFormat, the dplyr::collect on
converted, and the cleanup file.remove calls that use output_file and
paste0("reduce_output_", output_file)) to use that tempfile variable so the test
writes/cleans unique temp files; relevant symbols: output_file,
bigDIANNtoMSstatsFormat, converted, and the cleanup file.remove/paste0 calls.
♻️ Duplicate comments (2)
R/clean_DIANN.R (2)

7-13: Stop on unsupported extensions; avoid treating .xlsx as TSV.

grepl("tsv|xls") matches .xlsx, which is binary. The ";" fallback can silently mis-parse files. Prefer anchored, case-insensitive extension checks and error for unsupported formats.

🐛 Proposed fix
-  if (grepl("csv", input_file)) {
+  if (grepl("\\.csv$", input_file, ignore.case = TRUE)) {
     delim = ","
-  } else if (grepl("tsv|xls", input_file)) {
+  } else if (grepl("\\.(tsv|txt)$", input_file, ignore.case = TRUE)) {
     delim = "\t"
   } else {
-    delim <- ";"
+    stop("Unsupported file extension. Please provide a .csv or .tsv/.txt file.")
   }

60-63: Remove the ineffective FragmentIon regex.

sub('\\^\\.\\*', ...) matches the literal "^.*" and is effectively a no-op. If no transformation is intended, assign the value directly.

🔧 Suggested change
-    FragmentIon = sub('\\^\\.\\*', '', .data$Fragment.Info),
+    FragmentIon = .data$Fragment.Info,

Comment on lines +2 to +3
reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
                           quantificationColumn = "FragmentQuantCorrected",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the reduceBigDIANN function
find . -name "clean_DIANN.R" -o -name "*DIANN*" -type f | head -20

Repository: Vitek-Lab/MSstatsBig

Length of output: 114


🏁 Script executed:

# Search for the reduceBigDIANN function definition
rg -n "reduceBigDIANN\s*<-\s*function" -A 5

Repository: Vitek-Lab/MSstatsBig

Length of output: 504


🏁 Script executed:

# Search for bigDIANNtoMSstatsFormat function and its quantificationColumn default
rg -n "bigDIANNtoMSstatsFormat\s*<-\s*function" -A 10

Repository: Vitek-Lab/MSstatsBig

Length of output: 938


🏁 Script executed:

# Search for any test files mentioning quantificationColumn
fd -e R -path "*/test*" | xargs rg "quantificationColumn|Fragment" 2>/dev/null | head -30

Repository: Vitek-Lab/MSstatsBig

Length of output: 4826


Align quantificationColumn default with bigDIANNtoMSstatsFormat.

The default "FragmentQuantCorrected" differs from bigDIANNtoMSstatsFormat ("Fragment.Quant.Corrected"), and test data confirms DIANN output uses the dotted notation. Calling reduceBigDIANN without override will fail to locate the column.

🔧 Suggested change
-reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
-                           quantificationColumn = "FragmentQuantCorrected",
+reduceBigDIANN <- function(input_file, output_path, MBR = TRUE,
+                           quantificationColumn = "Fragment.Quant.Corrected",

Comment on lines +35 to +58
# 1. Select required columns
base_cols <- c('Protein.Names', 'Stripped.Sequence', 'Modified.Sequence',
               'Precursor.Charge', quantificationColumn, 'Q.Value',
               'Precursor.Mz', 'Fragment.Info', 'Run')

mbr_cols <- if (MBR) {
  c('Lib.Q.Value', 'Lib.PG.Q.Value')
} else {
  c('Global.Q.Value', 'Global.PG.Q.Value')
}

req_cols <- intersect(c(base_cols, mbr_cols), colnames(input))
input <- dplyr::select(input, all_of(req_cols))

# 2. Split concatenated values (un-nest)
split_cols <- intersect(c(quantificationColumn, "Fragment.Info"), colnames(input))
if (length(split_cols) > 0) {
  input <- tidyr::separate_rows(input, all_of(split_cols), sep = ";")
}

# 3. Process fragment information

# Convert Intensity to Numeric from Char strings
input[[quantificationColumn]] <- as.numeric(input[[quantificationColumn]])

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "clean_DIANN.R" -type f 2>/dev/null | head -5

Repository: Vitek-Lab/MSstatsBig

Length of output: 46


🏁 Script executed:

# Get overall structure first
git ls-files | grep -i diann

Repository: Vitek-Lab/MSstatsBig

Length of output: 190


🏁 Script executed:

# Search for the file in common R locations
find . -name "clean_DIANN.R" -o -name "*DIANN*" 2>/dev/null | head -10

Repository: Vitek-Lab/MSstatsBig

Length of output: 114


🏁 Script executed:

wc -l R/clean_DIANN.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 83


🏁 Script executed:

cat -n R/clean_DIANN.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 6174


Guard against missing Fragment.Info or quantificationColumn before unconditional use.

The code treats Fragment.Info and quantificationColumn as optional via intersect() at lines 46–50, but uses them unconditionally at lines 58, 62–68, and 76. If either column is absent from the input, the code will crash. Either enforce these as required columns upfront or conditionally apply the downstream processing when they are missing.

🔧 Example guard
-  req_cols <- intersect(c(base_cols, mbr_cols), colnames(input))
-  input <- dplyr::select(input, all_of(req_cols))
+  required_cols <- c(base_cols, mbr_cols)
+  missing_cols <- setdiff(required_cols, colnames(input))
+  if (length(missing_cols) > 0) {
+    stop("Missing required DIANN columns: ", paste(missing_cols, collapse = ", "))
+  }
+  input <- dplyr::select(input, all_of(required_cols))

Comment on lines +64 to +70
# Extract product charge
ProductCharge = dplyr::if_else(
  grepl("/", .data$Fragment.Info),
  # Extract charge (number right after "/" in string), default to 1 if parsing fails
  as.integer(stringr::str_extract(.data$Fragment.Info, "(?<=/)[0-9]+")),
  1L
)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "clean_DIANN.R" -type f

Repository: Vitek-Lab/MSstatsBig

Length of output: 81


🏁 Script executed:

sed -n '64,70p' ./R/clean_DIANN.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 348


🏁 Script executed:

python3 << 'EOF'
# Let's verify the behavior of stringr::str_extract and as.integer in R
# by examining what happens when the regex pattern doesn't match

# The behavior should be:
# stringr::str_extract("something/", "(?<=/)[0-9]+") -> NA (no digits after /)
# as.integer(NA) -> NA (not 1L)

# This is a R behavior question, let me create a test to understand it better

print("Testing R string extraction and type conversion behavior:")
print("Case 1: Fragment.Info with / but no digits after it")
print("  stringr::str_extract returns: NA (pattern doesn't match)")
print("  as.integer(NA) returns: NA (not 1L)")
print("")
print("Case 2: Fragment.Info with / and digits after it")
print("  stringr::str_extract returns: the digits")
print("  as.integer(digits) returns: integer value")
print("")
print("Case 3: Fragment.Info without /")
print("  Current code returns: 1L (via the else branch of if_else)")
print("")
print("So the bug is in Case 1 - when / is present but no digits follow")
print("The function would return NA instead of 1L")

EOF

Repository: Vitek-Lab/MSstatsBig

Length of output: 150


🏁 Script executed:

grep -r "ProductCharge" ./R --include="*.R" -B 2 -A 2

Repository: Vitek-Lab/MSstatsBig

Length of output: 4903


🏁 Script executed:

sed -n '50,80p' ./R/clean_DIANN.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 1088


Ensure ProductCharge defaults to 1 when parsing fails.

The current code returns NA when a / is present in Fragment.Info but no digits follow (e.g., "y1/"). The regex pattern "(?<=/)[0-9]+" fails to match, causing stringr::str_extract() to return NA, and subsequently as.integer(NA) returns NA instead of the intended default of 1L. Use dplyr::coalesce() to handle this edge case:

🔧 Suggested change
-    ProductCharge = dplyr::if_else(
-      grepl("/", .data$Fragment.Info),
-      # Extract charge (number right after "/" in string), default to 1 if parsing fails
-      as.integer(stringr::str_extract(.data$Fragment.Info, "(?<=/)[0-9]+")),
-      1L
-    )
+    ProductCharge = dplyr::if_else(
+      grepl("/", .data$Fragment.Info),
+      dplyr::coalesce(
+        as.integer(stringr::str_extract(.data$Fragment.Info, "(?<=/)[0-9]+")),
+        1L
+      ),
+      1L
+    )
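A quick check of the edge case and the fix:

stringr::str_extract("y1/", "(?<=/)[0-9]+")                                   # NA: no digits after "/"
as.integer(stringr::str_extract("y1/", "(?<=/)[0-9]+"))                       # NA_integer_
dplyr::coalesce(as.integer(stringr::str_extract("y1/", "(?<=/)[0-9]+")), 1L)  # 1L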

Comment on lines +184 to +193
# Reduce and clean the DIANN report file in chunks
reduceBigDIANN(input_file,
               paste0("reduce_output_", output_file_name),
               MBR,
               quantificationColumn)

# Preprocess the cleaned data (feature selection, etc.)
msstats_data <- MSstatsPreprocessBig(
  paste0("reduce_output_", output_file_name),
  output_file_name, backend, max_feature_count,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

fd "converters.R" --type f

Repository: Vitek-Lab/MSstatsBig

Length of output: 111


🏁 Script executed:

wc -l R/converters.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 82


🏁 Script executed:

cat -n R/converters.R | sed -n '180,195p'

Repository: Vitek-Lab/MSstatsBig

Length of output: 878


🏁 Script executed:

cat -n R/converters.R | sed -n '170,200p'

Repository: Vitek-Lab/MSstatsBig

Length of output: 1649


🏁 Script executed:

cat -n R/converters.R | sed -n '1,30p'

Repository: Vitek-Lab/MSstatsBig

Length of output: 2410


🏁 Script executed:

rg "bigDIANNtoMSstatsFormat" --type r -A 3 -B 3

Repository: Vitek-Lab/MSstatsBig

Length of output: 1488


🏁 Script executed:

cat -n tests/testthat/test-diann_converter.R | grep -A 50 "bigDIANNtoMSstatsFormat works with arrow"

Repository: Vitek-Lab/MSstatsBig

Length of output: 2424


🏁 Script executed:

cat -n tests/testthat/test-diann_converter.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 5551


🏁 Script executed:

rg "output_file_name" R/converters.R -n

Repository: Vitek-Lab/MSstatsBig

Length of output: 958


🏁 Script executed:

cat -n R/converters.R | sed -n '128,155p'

Repository: Vitek-Lab/MSstatsBig

Length of output: 1853


Fix unsafe path construction that breaks when output_file_name contains directories.

The paste0("reduce_output_", output_file_name) calls at lines 186 and 192 produce invalid paths if output_file_name includes directory separators (e.g., "/tmp/out.csv" becomes "reduce_output_/tmp/out.csv", a relative path into a nonexistent directory). Use file.path() with dirname() and basename() to safely place the intermediate file alongside the output.

Note: The same pattern exists in bigSpectronauttoMSstatsFormat at lines 143 and 148 and should be fixed identically.

🐛 Proposed fix
+  reduce_path <- file.path(
+    dirname(output_file_name),
+    paste0("reduce_output_", basename(output_file_name))
+  )
+  reduceBigDIANN(input_file, reduce_path, MBR, quantificationColumn)
-  reduceBigDIANN(input_file, 
-                 paste0("reduce_output_", output_file_name),
-                 MBR,
-                 quantificationColumn)
   
-  msstats_data <- MSstatsPreprocessBig(
-    paste0("reduce_output_", output_file_name),
+  msstats_data <- MSstatsPreprocessBig(
+    reduce_path,
     output_file_name, backend, max_feature_count,
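Comparing the two constructions for a nested output path:

output_file_name <- "/tmp/out.csv"
paste0("reduce_output_", output_file_name)
# "reduce_output_/tmp/out.csv": a relative path into a nonexistent directory
file.path(dirname(output_file_name), paste0("reduce_output_", basename(output_file_name)))
# "/tmp/reduce_output_out.csv": alongside the intended output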

Comment on lines +78 to +108
test_that("bigDIANNtoMSstatsFormat works with arrow backend", {
  input_file <- tempfile(fileext = ".csv")
  output_file <- "test_diann_output.csv"

  # 4 features for one protein. Feature selection should pick the top 2.
  diann_data <- rbind(
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(1000, 1100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y1", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(500, 600), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y2", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(100, 100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y3", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001),
    data.frame(Run = c("r1", "r2"), Protein.Names = "P1", Stripped.Sequence = "PEPTIDE", Modified.Sequence = "PEPTIDE", Precursor.Charge = 2, Fragment.Quant.Corrected = c(2000, 2100), Q.Value = 0.001, Precursor.Mz = 500, Fragment.Info = "y4", Lib.Q.Value = 0.001, Lib.PG.Q.Value = 0.001)
  )
  write.csv(diann_data, input_file, row.names = FALSE)

  converted <- bigDIANNtoMSstatsFormat(
    input_file = input_file,
    output_file_name = output_file,
    backend = "arrow",
    max_feature_count = 2
  )
  result <- dplyr::collect(converted)

  # Avg intensities: y1=1050, y2=550, y3=100, y4=2050.
  # Top 2 features are y4 and y1.
  expect_equal(nrow(result), 4) # 2 features * 2 runs
  expect_true(all(c("y1", "y4") %in% unique(result$FragmentIon)))
  expect_false(any(c("y2", "y3") %in% unique(result$FragmentIon)))

  # Cleanup
  file.remove(input_file)
  if (file.exists(output_file)) file.remove(output_file)
  if (file.exists(paste0("reduce_output_", output_file))) file.remove(paste0("reduce_output_", output_file))

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual test file to confirm the current state
cat -n tests/testthat/test-diann_converter.R | sed -n '70,115p'

Repository: Vitek-Lab/MSstatsBig

Length of output: 2688


🏁 Script executed:

# Check if there's any test configuration for parallel execution
git ls-files | grep -E "(testthat|\.yml|\.yaml|\.json)" | head -20

Repository: Vitek-Lab/MSstatsBig

Length of output: 257


🏁 Script executed:

# Look for similar test output file patterns in other test files
fd '\.R$' tests/testthat/ | xargs grep -l "tempfile\|file.remove" | head -10

Repository: Vitek-Lab/MSstatsBig

Length of output: 134


🏁 Script executed:

# Check for test configuration files (especially for parallel execution)
fd -e yml -e yaml -e dcf | head -20

Repository: Vitek-Lab/MSstatsBig

Length of output: 46


🏁 Script executed:

# Look at other tests to see if they use static filenames too
grep -n "output_file.*<-" tests/testthat/*.R

Repository: Vitek-Lab/MSstatsBig

Length of output: 465


🏁 Script executed:

# Check the git history to see if these leftover files should have been cleaned
git ls-files tests/testthat/*.csv

Repository: Vitek-Lab/MSstatsBig

Length of output: 186


🏁 Script executed:

# Look for how other tests in the project handle temporary output files
grep -B 2 -A 2 "tempfile.*output\|output.*tempfile" tests/testthat/*.R | head -40

Repository: Vitek-Lab/MSstatsBig

Length of output: 803


Use tempfile() instead of static filename to prevent test collisions.

The other tests in this file (lines 7 and 45) correctly use tempfile(fileext = ".csv") for output files. This test should follow the same pattern to avoid potential collisions in parallel test runs or from previous test failures. Evidence of incomplete cleanup is visible in the repository with leftover CSV files (e.g., topN_test_diann_output.csv).

🔧 Suggested change
-  output_file <- "test_diann_output.csv"
+  output_file <- basename(tempfile(fileext = ".csv"))
