Add initial support for SDRF integration
minor changes
📝 Walkthrough

Refactors aggregation and plotting to add SDRF/sample-aware and DIA-aware per-run and per-sample metrics across the common, dia, quantms, mzidentml and plotting modules; adds utilities for identified-rate, charge-state and modification summaries; and bumps the MultiQC upper bound and adds urllib3 to the dependency specs.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ 1 failed (warning), ✅ 2 passed.
Actionable comments posted: 8
🧹 Nitpick comments (9)
pmultiqc/modules/common/plots/general.py (2)
124-124: Use f-string conversion flag instead of explicit `str()` call.

The explicit `str()` call inside the f-string can be replaced with the more Pythonic `!s` conversion flag.

🔎 Proposed fix

```diff
- sample=SampleName(f"Sample {str(sample)}"),
+ sample=SampleName(f"Sample {sample!s}"),
```

Based on static analysis hints (Ruff RUF010).
187-187: Use f-string conversion flag instead of explicit `str()` call.

The explicit `str()` call inside the f-string can be replaced with the `!s` conversion flag, consistent with the suggestion for line 124.

🔎 Proposed fix

```diff
- sample=SampleName(f"Sample {str(sample)}"),
+ sample=SampleName(f"Sample {sample!s}"),
```

Based on static analysis hints (Ruff RUF010).
pmultiqc/modules/common/ms/idxml.py (1)
89-100: Simplify the `_init_labels()` return statement.

The method returns three unused `False` values alongside the `labels` dictionary. Only the `labels` dict is actually used by the caller (line 64). Simplify the return to just `return labels` to reduce unnecessary complexity.

🔎 Proposed simplification

```diff
 def _init_labels(self):
     labels = {
         "spec_e_label": [],
         "xcorr_label": [],
         "hyper_label": [],
         "peps_label": [],
         "consensus_label": [],
         "msgf_label": False,
         "comet_label": False,
         "sage_label": False,
     }
-    return False, False, False, labels
+    return labels
```

And update the caller:

```diff
-_, _, _, labels = self._init_labels()
+labels = self._init_labels()
```

pmultiqc/modules/common/ms_io.py (1)
128-146: LGTM! Efficient batch processing with boolean indexing.

The refactoring uses `info_df["scan"].isin(identified_spectrum_scan_id)` to create a boolean mask, then processes identified and unidentified scans separately. This is more efficient than row-by-row iteration.

Optional: further reduce code duplication

The three loops for each histogram could be extracted into a helper function:

```diff
+    def process_columns(df_subset, charge_hist, intensity_hist, peaks_hist):
+        for charge_state in df_subset["precursor_charge"]:
+            data_add_value(charge_hist, charge_state)
+        for base_peak_inte in df_subset["base_peak_intensity"]:
+            data_add_value(intensity_hist, base_peak_inte)
+        for peak_per_ms2 in df_subset["num_peaks"]:
+            data_add_value(peaks_hist, peak_per_ms2)
+
     identified_scans = info_df["scan"].isin(identified_spectrum_scan_id)
-
-    for charge_state in info_df[identified_scans]["precursor_charge"]:
-        data_add_value(mzml_charge_plot, charge_state)
-    for base_peak_inte in info_df[identified_scans]["base_peak_intensity"]:
-        data_add_value(mzml_peak_distribution_plot, base_peak_inte)
-    for peak_per_ms2 in info_df[identified_scans]["num_peaks"]:
-        data_add_value(mzml_peaks_ms2_plot, peak_per_ms2)
-
-    for charge_state in info_df[~identified_scans]["precursor_charge"]:
-        data_add_value(mzml_charge_plot_1, charge_state)
-    for base_peak_inte in info_df[~identified_scans]["base_peak_intensity"]:
-        data_add_value(mzml_peak_distribution_plot_1, base_peak_inte)
-    for peak_per_ms2 in info_df[~identified_scans]["num_peaks"]:
-        data_add_value(mzml_peaks_ms2_plot_1, peak_per_ms2)
+    process_columns(
+        info_df[identified_scans],
+        mzml_charge_plot, mzml_peak_distribution_plot, mzml_peaks_ms2_plot
+    )
+    process_columns(
+        info_df[~identified_scans],
+        mzml_charge_plot_1, mzml_peak_distribution_plot_1, mzml_peaks_ms2_plot_1
+    )
```

pmultiqc/modules/common/plots/dia.py (2)
9-10: Self-import creates circular reference risk.

Line 9 imports `dia as dia_plots` from the same module (`pmultiqc.modules.common.plots`). This self-reference is used in `calculate_dia_intensity_std` (line 560) to call `dia_plots.can_groupby_for_std`. Since `can_groupby_for_std` is defined in this same file (line 257), you can call it directly without the import.

🔎 Suggested fix

```diff
-from pmultiqc.modules.common.plots import dia as dia_plots
 from pmultiqc.modules.common.common_utils import group_charge
```

And at line 560:

```diff
-    if dia_plots.can_groupby_for_std(df_sub, "Run"):
+    if can_groupby_for_std(df_sub, "Run"):
```
536-582: Missing return statement when grouping fails.

When `can_groupby_for_std` returns `False` (line 580), the function logs a warning but doesn't explicitly return. This results in an implicit `return None`, which is handled by the caller but could be made explicit for clarity.

🔎 Suggested fix

```diff
     else:
         log.warning("No SDRF available; failed to parse experimental groups; SD Intensity not generated.")
+        return {}
```

pmultiqc/modules/common/common_utils.py (2)
272-293: Column conflict handling logs a warning but proceeds anyway.

When both "Modifications" and "modifications" columns exist (line 279), the warning is logged but the code continues using the original "modifications" column. This could silently produce incorrect results if the columns have different data. Consider either raising an error or explicitly documenting which column takes precedence.

🔎 Suggested fix to clarify precedence

```diff
 if "Modifications" in df_copy.columns and "modifications" not in df_copy.columns:
     df_copy = df_copy.rename(columns={"Modifications": "modifications"})
-else:
-    log.warning('Detected both "Modifications" and "modifications" columns.')
+elif "Modifications" in df_copy.columns and "modifications" in df_copy.columns:
+    log.warning('Detected both "Modifications" and "modifications" columns; using "modifications".')
```
540-548: `zip()` without a strict parameter may silently truncate mismatched iterables.

As flagged by static analysis (B905), using `zip()` without `strict=True` on line 544 could silently drop data if the iterables have different lengths. Since both come from the same DataFrame, this is unlikely but worth protecting against.

🔎 Suggested fix

```diff
 mod_plot_dict = dict(
-    zip(mod_group_processed["modifications"], mod_group_processed["percentage"])
+    zip(mod_group_processed["modifications"], mod_group_processed["percentage"], strict=True)
 )
```

pmultiqc/modules/common/dia_utils.py (1)
837-855: `dia_sample_level_modifications` duplicates `sample_level_modifications` from quantms.py.

This function has nearly identical logic to `sample_level_modifications` in `pmultiqc/modules/quantms/quantms.py` (lines 2286-2308). Both merge with SDRF, cast Sample to int, group by Sample, and call `summarize_modifications`. Consider extracting to a shared utility in `common_utils.py`.

Based on the relevant code snippets, `summarize_modifications` and `group_charge` are already duplicated between `common_utils.py` and `mztab.py`. This pattern of duplication should be consolidated.

```bash
#!/bin/bash
# Check for duplicate implementations of sample_level_modifications
rg -n "def.*sample_level_modifications" --type py
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `MultiQC_logo.png` is excluded by `!**/*.png`
- `pmultiqc/images/pmultiqc_logo_dark.png` is excluded by `!**/*.png`
📒 Files selected for processing (17)
- environment.yml
- pmultiqc/main.py
- pmultiqc/modules/common/common_utils.py
- pmultiqc/modules/common/dia_utils.py
- pmultiqc/modules/common/ms/idxml.py
- pmultiqc/modules/common/ms/msinfo.py
- pmultiqc/modules/common/ms_io.py
- pmultiqc/modules/common/mzidentml_utils.py
- pmultiqc/modules/common/plots/dia.py
- pmultiqc/modules/common/plots/general.py
- pmultiqc/modules/common/plots/id.py
- pmultiqc/modules/diann/diann.py
- pmultiqc/modules/mzidentml/mzidentml.py
- pmultiqc/modules/quantms/quantms.py
- pyproject.toml
- recipe/meta.yaml
- requirements.txt
💤 Files with no reviewable changes (1)
- pmultiqc/modules/diann/diann.py
🧰 Additional context used
🧬 Code graph analysis (7)
pmultiqc/modules/common/ms/msinfo.py (1)
pmultiqc/modules/common/ms_io.py (1)
add_ms_values(77-146)
pmultiqc/modules/quantms/quantms.py (1)
pmultiqc/modules/common/file_utils.py (1)
file_prefix(110-117)
pmultiqc/modules/common/ms_io.py (1)
pmultiqc/modules/common/histogram.py (1)
add_value(67-119)
pmultiqc/modules/mzidentml/mzidentml.py (3)
pmultiqc/modules/common/common_utils.py (1)
aggregate_msms_identified_rate (497-538)

pmultiqc/modules/common/plots/id.py (1)

draw_quantms_identification (377-506)

pmultiqc/modules/quantms/sparklines.py (1)
plot(19-56)
pmultiqc/modules/common/plots/id.py (1)
pmultiqc/modules/quantms/sparklines.py (1)
plot(19-56)
pmultiqc/modules/common/dia_utils.py (1)
pmultiqc/modules/common/common_utils.py (2)
cal_num_table_at_sample (451-480)
summarize_modifications (540-548)
pmultiqc/modules/common/plots/dia.py (1)
pmultiqc/modules/common/common_utils.py (1)
group_charge(551-561)
🪛 OSV Scanner (2.3.1)
requirements.txt
[HIGH] 1-1: urllib3 1.26.20: urllib3 streaming API improperly handles highly compressed data
[HIGH] 1-1: urllib3 1.26.20: urllib3 allows an unbounded number of links in the decompression chain
[HIGH] 1-1: urllib3 1.26.20: urllib3 redirects are not disabled when retries are disabled on PoolManager instantiation
🪛 Ruff (0.14.10)
pmultiqc/modules/common/plots/general.py
124-124: Use explicit conversion flag
Replace with conversion flag
(RUF010)
187-187: Use explicit conversion flag
Replace with conversion flag
(RUF010)
pmultiqc/modules/quantms/quantms.py
1105-1105: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
1177-1177: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1183-1183: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1251-1251: Use explicit conversion flag
Replace with conversion flag
(RUF010)
2306-2306: Use explicit conversion flag
Replace with conversion flag
(RUF010)
pmultiqc/modules/common/plots/id.py
415-415: Use explicit conversion flag
Replace with conversion flag
(RUF010)
647-647: Use explicit conversion flag
Replace with conversion flag
(RUF010)
709-709: Use explicit conversion flag
Replace with conversion flag
(RUF010)
pmultiqc/modules/common/common_utils.py
528-528: Use explicit conversion flag
Replace with conversion flag
(RUF010)
544-544: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
558-558: Use explicit conversion flag
Replace with conversion flag
(RUF010)
pmultiqc/modules/common/dia_utils.py
854-854: Use explicit conversion flag
Replace with conversion flag
(RUF010)
pmultiqc/modules/common/plots/dia.py
87-87: Use explicit conversion flag
Replace with conversion flag
(RUF010)
554-554: Use explicit conversion flag
Replace with conversion flag
(RUF010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: test_maxquant
- GitHub Check: test_mzid_mzML
- GitHub Check: test_lfq
- GitHub Check: test_maxquant_dia
- GitHub Check: test_mzid_mgf
- GitHub Check: test_diann
- GitHub Check: test_proteobench
- GitHub Check: test_tmt
- GitHub Check: test_dia
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (23)
pmultiqc/modules/common/plots/general.py (1)
66-67: Good practice: explicit clustering configuration.

Making the clustering behavior explicit is a good practice that improves code clarity and prevents reliance on default behavior that could change between library versions.
pmultiqc/modules/common/ms/idxml.py (1)
108-111: No action needed.

The project is explicitly constrained to `pyopenms<=3.4.0` (in pyproject.toml, requirements.txt, and environment.yml), so the codebase will never reach the pyopenms >= 3.5.0 scenario mentioned in the comment. The current implementation using plain lists is correct for all supported versions. The commented-out line appears to be a developer note about potential future compatibility, not an active issue. If the note causes confusion, it can optionally be removed, but this is not a critical concern.

Likely an incorrect or invalid review comment.
pmultiqc/main.py (1)
48-49: Verify the attribute check and consider adding file existence validation.

Two concerns:

1. Attribute check: using `custom_logo_width` to detect MultiQC v1.33+ dark logo support seems indirect. Consider checking for `hasattr(config, "custom_logo_dark")` instead to directly verify dark logo support.
2. Missing file validation: unlike the main logo (line 42), there's no existence check for `pmultiqc_logo_dark` before setting `config.custom_logo_dark`. If the file doesn't exist, the config will point to a non-existent path.

Consider adding file existence validation:

```python
if hasattr(config, "custom_logo_width"):
    if pmultiqc_logo_dark.exists():
        config.custom_logo_dark = str(pmultiqc_logo_dark)
    else:
        log.warning(f"pmultiqc dark logo file not found at: {pmultiqc_logo_dark}")
    config.custom_logo_width = 118
```

Additionally, please verify whether `custom_logo_width` is the correct attribute to check for v1.33+ support, or if `custom_logo_dark` would be more appropriate.

environment.yml (1)
8-8: LGTM!

The MultiQC version constraint update is consistent with other dependency files in this PR.

pyproject.toml (1)

44-44: LGTM!

The MultiQC version constraint update aligns with the dependency updates in requirements.txt, environment.yml, and recipe/meta.yaml.

recipe/meta.yaml (1)

28-28: LGTM!

The MultiQC runtime constraint update is consistent with the version updates across all other dependency files.
pmultiqc/modules/common/ms/msinfo.py (1)
114-127: LGTM! Performance improvement through batch processing.

The change from per-row iteration to passing the entire group DataFrame to `add_ms_values` is a good performance optimization. This aligns with the refactored `add_ms_values` function in ms_io.py that now accepts and processes DataFrames in batch.

pmultiqc/modules/mzidentml/mzidentml.py (3)
312-312: Ensure table.plot handles the nested data structure correctly.

The code now accesses `self.cal_num_table_data["ms_runs"]` instead of `self.cal_num_table_data` directly. This is consistent with the refactoring in `parse_out_mzid`, but verify that the table rendering works correctly with this structure.

143-147: Verified: `aggregate_msms_identified_rate` correctly handles a None `sdrf_file_df` parameter.

The function implementation in common_utils.py (lines 507-508) explicitly checks `if sdrf_file_df is None:` and returns the per-run identified rate. The call at lines 143-147 in mzidentml.py with `sdrf_file_df=None` will correctly return only the per-run identified rate computed by `cal_msms_identified_rate`.

1213-1266: The nested structure is correctly and consistently used throughout the module.

All consumers of `self.cal_num_table_data` access the nested structure `{"ms_runs": ...}` using bracket notation (line 312) or pass it to `draw_quantms_identification()`, which safely checks for optional keys like `"sdrf_samples"` using `.get()`. No code expects a flat structure. The refactoring maintains consistency.
110-117: LGTM! Good defensive coding with explicit type casting and NaN handling.

The explicit type casting to Int64 and float ensures data consistency, and the `data_add_value` helper properly handles NaN values by converting them to None before passing to the histogram.

119-126: Verify DIA mode processes all spectra correctly.

In DIA mode, the function processes all precursor_charge, base_peak_intensity, and num_peaks values without filtering by identified scans. This appears correct for DIA workflows where all spectra are typically considered.
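The identified/unidentified split used in ms_io.py can be illustrated with a tiny standalone frame (column names follow the review; the data is made up):

```python
import pandas as pd

# Toy stand-in for info_df with the columns named in the review.
info_df = pd.DataFrame({
    "scan": [1, 2, 3, 4],
    "precursor_charge": [2, 3, 2, 4],
})
identified_spectrum_scan_id = {1, 3}

# One boolean mask drives both subsets, replacing row-by-row iteration.
identified_scans = info_df["scan"].isin(identified_spectrum_scan_id)
identified = info_df[identified_scans]
unidentified = info_df[~identified_scans]

print(identified["precursor_charge"].tolist())
print(unidentified["precursor_charge"].tolist())
```

In DIA mode the mask is simply skipped, so all four rows would be processed together.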
pmultiqc/modules/common/mzidentml_utils.py (1)
45-52: Remove unused import and function or clarify intent.
`get_mzid_num_data` is defined but never called anywhere in the codebase. The import statement in `pmultiqc/modules/mzidentml/mzidentml.py` line 20 is also unused. Either remove the function and import or document the reason for keeping unused code.
71-105: SDRF merge and Sample type handling looks correct.

The implementation correctly:

- Checks `sdrf_file_df.empty` before merging
- Drops duplicates on the merge key columns
- Casts Sample to int for consistent sorting
- Generates both Run and Sample views when SDRF is available

One minor improvement: the static analysis hint on line 87 suggests using an explicit conversion flag (`f"Sample {run!s}"` instead of `f"Sample {str(run)}"`), but this is purely stylistic.
269-303: Early return when no data is produced is a good defensive pattern.

The function correctly handles the case when `calculate_dia_intensity_std` returns empty data by returning early (lines 273-274), preventing downstream errors.

pmultiqc/modules/common/common_utils.py (1)
551-561: `group_charge` utility function is well-implemented.

The function correctly:
- Groups by the specified columns
- Converts column types to string for consistent handling
- Adds "Sample " prefix when grouping by Sample for display purposes
pmultiqc/modules/quantms/quantms.py (2)
424-430: MSMS identified rate aggregation correctly handles optional SDRF.

The code properly checks for both `mzml_table` and `identified_msms_spectra` before calling `aggregate_msms_identified_rate`, and passes `self.file_df`, which may be empty if no SDRF is available.

1105-1112: Column subsetting before stand_spectra_ref is a good optimization.

Selecting only needed columns early reduces memory footprint when processing large peptide tables.
pmultiqc/modules/common/plots/id.py (2)
509-532: `rebuild_dict_structure` correctly transforms MC counts to percentages.

The function properly:
- Buckets missed cleavages into 0, 1, and >=2 categories
- Handles division by zero when total is 0
- Returns percentage values for plotting
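A small standalone sketch of that bucketing logic (the function name is hypothetical; the real `rebuild_dict_structure` signature is not shown in the review):

```python
def mc_to_percentages(mc_counts):
    # Bucket missed-cleavage counts into "0", "1" and ">=2", then convert
    # to percentages, guarding against a zero total.
    buckets = {"0": 0, "1": 0, ">=2": 0}
    for mc, n in mc_counts.items():
        buckets[str(mc) if mc < 2 else ">=2"] += n
    total = sum(buckets.values())
    if total == 0:
        return {key: 0.0 for key in buckets}
    return {key: 100.0 * count / total for key, count in buckets.items()}

print(mc_to_percentages({0: 6, 1: 3, 2: 1}))
```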
814-830: Dynamic label detection for modifications plot is consistent with other functions.

The pattern of checking `isinstance(modified_data["plot_data"], list)` to determine labeling is used consistently across multiple drawing functions, providing a uniform user experience.
212-259: `_process_run_data` restructured to return cal_num_table_data.

The refactored function now:

- Builds per-run statistics and data separately
- Computes sample-level aggregation via `cal_num_table_at_sample`
- Structures the return as `{"sdrf_samples": ..., "ms_runs": ...}`

This aligns with the data structure expected by downstream plotting functions.

262-289: `_calculate_run_statistics` now returns a tuple.

The function cleanly separates aggregated statistics (`stat_run`) from raw data needed for sample-level aggregation (`data_per_run`). This is a good separation of concerns.

298-298: Updated path for files without PSM uses the correct key.

The fix correctly writes to `cal_num_table_data["ms_runs"][i]` instead of a top-level key, maintaining consistency with the new data structure.
```python
# Supported starting from MultiQC v1.33
if hasattr(config, "custom_logo_width"):
    config.custom_logo_dark = str(pmultiqc_logo_dark)

if hasattr(config, "custom_logo_width"):
    config.custom_logo_width = 118
```
🛠️ Refactor suggestion | 🟠 Major
Combine duplicate conditional blocks.
Lines 48 and 51 both check the same condition, `if hasattr(config, "custom_logo_width")`. These should be merged into a single conditional block.

🔎 Proposed refactor

```diff
 # Supported starting from MultiQC v1.33
 if hasattr(config, "custom_logo_width"):
     config.custom_logo_dark = str(pmultiqc_logo_dark)
-
-if hasattr(config, "custom_logo_width"):
     config.custom_logo_width = 118
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Supported starting from MultiQC v1.33
if hasattr(config, "custom_logo_width"):
    config.custom_logo_dark = str(pmultiqc_logo_dark)
    config.custom_logo_width = 118
```
🤖 Prompt for AI Agents
In pmultiqc/main.py around lines 47 to 52, there are two separate if-blocks both
checking hasattr(config, "custom_logo_width"); merge them into a single
conditional that checks hasattr(config, "custom_logo_width") once and inside
that block set both config.custom_logo_dark = str(pmultiqc_logo_dark) and
config.custom_logo_width = 118 so both assignments occur only when the attribute
exists.
```python
pep_table = pep_table.merge(
    right=self.file_df[["Sample", "Run"]].drop_duplicates(),
    left_on="stand_spectra_ref",
    right_on="Run"
)

pep_intensity_by_sample = dict()
pep_table["Sample"] = pep_table["Sample"].astype(int)
for name, group in pep_table.groupby("Sample", sort=True):

    if config.kwargs["remove_decoy"]:
        pep_intensity_by_sample[f"Sample {str(name)}"] = group[
            (group["opt_global_cv_MS:1002217_decoy_peptide"] == 0)
        ]["average_intensity"].apply(
            lambda x: np.log2(x) if pd.notnull(x) and x > 0 else 1
        )
    else:
        pep_intensity_by_sample[f"Sample {str(name)}"] = group["average_intensity"].apply(
            lambda x: np.log2(x) if pd.notnull(x) and x > 0 else 1
        )

self.quantms_pep_intensity = [pep_intensity_by_run, pep_intensity_by_sample]
```
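The `apply()` lambda in the excerpt above can be checked in isolation; this is a minimal reproduction with made-up intensities:

```python
import numpy as np
import pandas as pd

def safe_log2(series):
    # Log2-transform intensities; NaN and non-positive values fall back to 1,
    # mirroring the lambda in the excerpt above.
    return series.apply(lambda x: np.log2(x) if pd.notnull(x) and x > 0 else 1)

intensities = pd.Series([8.0, 0.0, np.nan, 4.0])
print(safe_log2(intensities).tolist())
```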
🧩 Analysis chain
🏁 Scripts executed against bigbio/pmultiqc:

```bash
# Check for file_df emptiness checks and broader context
rg -n "file_df" pmultiqc/modules/quantms/quantms.py | head -30

# Look at the function containing lines 1166-1187 to understand validation
sed -n '1100,1200p' pmultiqc/modules/quantms/quantms.py | cat -n

# Search for any validations before the merge operation
rg -B 20 "pep_table.merge" pmultiqc/modules/quantms/quantms.py | head -50

# View context around line 1456 to see the if-block structure
sed -n '1450,1470p' pmultiqc/modules/quantms/quantms.py | cat -n

# Get broader context to understand function structure and block scope
sed -n '1140,1175p' pmultiqc/modules/quantms/quantms.py | cat -n
sed -n '1100,1180p' pmultiqc/modules/quantms/quantms.py | cat -n

# Check the full function definition and its entry conditions
sed -n '1050,1170p' pmultiqc/modules/quantms/quantms.py | cat -n

# Check what happens after the merge and pep_intensity_by_sample assignment
sed -n '1187,1200p' pmultiqc/modules/quantms/quantms.py | cat -n

# Look for where self.quantms_pep_intensity is used to understand impact of empty dict
rg -n "quantms_pep_intensity" pmultiqc/modules/quantms/quantms.py

# Check the context around line 2195 to see how quantms_pep_intensity is used
sed -n '2190,2210p' pmultiqc/modules/quantms/quantms.py | cat -n

# Verify if there's any validation before cal_heat_map_score is called
rg -B 10 "cal_heat_map_score" pmultiqc/modules/quantms/quantms.py | head -30

# Check what pep_table_exists is and how it relates to file_df
rg -B 5 -A 5 "pep_table_exists" pmultiqc/modules/quantms/quantms.py | head -40

# Check how file_df gets populated to understand if it can be empty when pep_table_exists is true
rg -n "self.file_df\s*=" pmultiqc/modules/quantms/quantms.py

# Find where pep_table_exists is set to True
rg -n "pep_table_exists\s*=\s*True" pmultiqc/modules/quantms/quantms.py

# Find where file_df gets populated with data (beyond initialization)
rg -n "file_df.*=" pmultiqc/modules/quantms/quantms.py -A 1 | head -40

# Search for where file_df data is being added/assigned in the class
rg -n "file_df\[|file_df\.loc|file_df\.insert|\.append|\.concat" pmultiqc/modules/quantms/quantms.py | head -20

# Check how pep_table_exists gets set to True and in what condition
sed -n '1340,1360p' pmultiqc/modules/quantms/quantms.py | cat -n

# Look at the __init__ method to see if file_df is populated there
sed -n '140,220p' pmultiqc/modules/quantms/quantms.py | cat -n

# Search for any constructor or initialization code that populates file_df
rg -n "def __init__|file_df.*pd\." pmultiqc/modules/quantms/quantms.py | head -10
```
Add a guard to check whether file_df is empty before the merge operation.

The merge at lines 1166-1170 will proceed even if `self.file_df` is empty (when experimental design/SDRF files are unavailable), resulting in an empty `pep_table` after the merge. This causes `pep_intensity_by_sample` to be an empty dict, while `self.quantms_pep_intensity` becomes a list containing an empty dict. The downstream check at line 2195 (`if self.quantms_pep_intensity:`) doesn't catch this, because non-empty lists remain truthy. Add an `if not self.file_df.empty:` guard before the merge, or handle the case explicitly.
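To see why the guard matters, an inner merge against an empty frame silently drops every row (standalone toy data):

```python
import pandas as pd

pep_table = pd.DataFrame({"stand_spectra_ref": ["run1", "run2"]})
file_df = pd.DataFrame(columns=["Sample", "Run"])  # no SDRF available

merged = pep_table.merge(
    file_df[["Sample", "Run"]].drop_duplicates(),
    left_on="stand_spectra_ref",
    right_on="Run",
)
print(len(merged))  # the merge yields no rows

# A falsy sentinel keeps downstream `if quantms_pep_intensity:` checks honest.
quantms_pep_intensity = [] if file_df.empty else ["per_run", "per_sample"]
print(bool(quantms_pep_intensity))
```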
🧰 Tools
🪛 Ruff (0.14.10)
1177-1177: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1183-1183: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
In pmultiqc/modules/quantms/quantms.py around lines 1166 to 1187, add a guard to
check if self.file_df is empty before performing the merge: if self.file_df is
empty, skip the merge/loop and set self.quantms_pep_intensity to an empty list
(or otherwise a falsy value) so downstream checks like `if
self.quantms_pep_intensity:` behave correctly; otherwise proceed with the merge,
grouping and population of pep_intensity_by_sample as before and assign
self.quantms_pep_intensity = [pep_intensity_by_run, pep_intensity_by_sample].
```python
def cal_charge_state(psm, sdrf_file_df):

    charge_state_df = psm[["filename", "charge"]].copy()

    charge_state_df = charge_state_df.merge(
        right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        left_on="filename",
        right_on="Run"
    )

    charge_state_df["Sample"] = charge_state_df["Sample"].astype(int)

    charge_state_by_run = group_charge(charge_state_df, "filename", "charge")
    charge_state_by_sample = group_charge(charge_state_df, "Sample", "charge")

    return {
        "plot_data": [
            charge_state_by_run.to_dict(orient="index"),
            charge_state_by_sample.to_dict(orient="index"),
        ],
        "cats": sorted(
            set(charge_state_by_run.columns)
            | set(charge_state_by_sample.columns),
            key=int
        )
    }
```
`cal_charge_state` assumes `sdrf_file_df` is always valid.

The function merges `psm` with `sdrf_file_df` without checking whether `sdrf_file_df` is empty. If called with an empty DataFrame, the merge will produce no rows, causing `group_charge` to return empty tables.
🔎 Suggested fix to handle empty SDRF
```diff
 def cal_charge_state(psm, sdrf_file_df):
     charge_state_df = psm[["filename", "charge"]].copy()
+    if sdrf_file_df.empty:
+        charge_state_by_run = group_charge(charge_state_df, "filename", "charge")
+        return {
+            "plot_data": charge_state_by_run.to_dict(orient="index"),
+            "cats": sorted(charge_state_by_run.columns, key=int)
+        }
+
     charge_state_df = charge_state_df.merge(
         right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
         left_on="filename",
         right_on="Run"
     )
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def cal_charge_state(psm, sdrf_file_df):

    charge_state_df = psm[["filename", "charge"]].copy()

    if sdrf_file_df.empty:
        charge_state_by_run = group_charge(charge_state_df, "filename", "charge")
        return {
            "plot_data": charge_state_by_run.to_dict(orient="index"),
            "cats": sorted(charge_state_by_run.columns, key=int)
        }

    charge_state_df = charge_state_df.merge(
        right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        left_on="filename",
        right_on="Run"
    )

    charge_state_df["Sample"] = charge_state_df["Sample"].astype(int)

    charge_state_by_run = group_charge(charge_state_df, "filename", "charge")
    charge_state_by_sample = group_charge(charge_state_df, "Sample", "charge")

    return {
        "plot_data": [
            charge_state_by_run.to_dict(orient="index"),
            charge_state_by_sample.to_dict(orient="index"),
        ],
        "cats": sorted(
            set(charge_state_by_run.columns)
            | set(charge_state_by_sample.columns),
            key=int
        )
    }
```
🤖 Prompt for AI Agents
In pmultiqc/modules/quantms/quantms.py around lines 2258-2283, cal_charge_state
merges psm with sdrf_file_df without checking for an empty SDRF which yields no
rows and empty groupings; add an early check: if sdrf_file_df is None or
sdrf_file_df.empty then return a consistent empty result (e.g.,
{"plot_data":[{},{}], "cats": []}) or alternatively fall back to using
psm["filename"] as the Run/Sample mapping before grouping; implement the chosen
fallback so group_charge always receives a non-empty DataFrame and ensure types
match (Sample as int) before calling group_charge.
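One way the suggested run-only fallback could look in isolation (a sketch; `group_charge` is replaced here by an inline pandas equivalent, and the SDRF path is omitted):

```python
import pandas as pd

def cal_charge_state_fallback(psm, sdrf_file_df):
    # When no SDRF is available, fall back to the per-run view only.
    charge_df = psm[["filename", "charge"]].copy()
    if sdrf_file_df is None or sdrf_file_df.empty:
        by_run = (
            charge_df.groupby("filename")["charge"].value_counts().unstack(fill_value=0)
        )
        return {
            "plot_data": by_run.to_dict(orient="index"),
            "cats": sorted((str(c) for c in by_run.columns), key=int),
        }
    raise NotImplementedError("SDRF path omitted in this sketch")

psm = pd.DataFrame({"filename": ["r1", "r1", "r2"], "charge": [2, 3, 2]})
result = cal_charge_state_fallback(psm, pd.DataFrame())
print(result["cats"])
```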
```python
def sample_level_modifications(df, sdrf_file_df):

    psm = df[
        ["filename", "sequence", "charge", "Modifications"]
    ].copy()

    psm = psm.merge(
        right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        left_on="filename",
        right_on="Run"
    )

    psm["Sample"] = psm["Sample"].astype(int)

    mod_plot = dict()
    for sample, group in psm.groupby("Sample", sort=True):

        mod_plot_dict, _ = summarize_modifications(
            group[["sequence", "charge", "Modifications"]].drop_duplicates()
        )
        mod_plot[f"Sample {str(sample)}"] = mod_plot_dict

    return mod_plot
```
**`sample_level_modifications` similarly lacks an empty-SDRF check.**

Like `cal_charge_state`, this function does not handle the case where `sdrf_file_df` is empty: the merge then produces no rows, and the function returns an empty `mod_plot` dictionary.
🔎 Suggested fix:

```diff
 def sample_level_modifications(df, sdrf_file_df):
+    if sdrf_file_df.empty:
+        return {}
+
     psm = df[
         ["filename", "sequence", "charge", "Modifications"]
     ].copy()
```
🧰 Tools — 🪛 Ruff (0.14.10): 2306-2306: Use explicit conversion flag (RUF010)
🤖 Prompt for AI Agents

> In pmultiqc/modules/quantms/quantms.py around lines 2286 to 2308, `sample_level_modifications` doesn't handle an empty or missing SDRF (`sdrf_file_df`), which causes the merge to yield no rows and an empty `mod_plot`. Add an early guard that checks whether `sdrf_file_df` is None or `sdrf_file_df[["Sample", "Run"]].drop_duplicates()` is empty and immediately return an empty dict; otherwise proceed with the merge and subsequent processing. Also ensure that an empty post-merge result returns an empty dict before casting `Sample` to int, to avoid exceptions.
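The failure mode behind this whole family of comments is easy to reproduce: an inner `merge` against an empty right-hand frame silently drops every row. A minimal sketch of the guard the reviewer asks for (the `merge_with_sdrf` helper is hypothetical, not pmultiqc code):

```python
import pandas as pd

def merge_with_sdrf(psm: pd.DataFrame, sdrf: pd.DataFrame):
    # Guard first: merging against an empty (or missing) SDRF would
    # silently drop every PSM row via the inner join.
    if sdrf is None or sdrf.empty:
        return psm.copy()
    return psm.merge(
        sdrf[["Sample", "Run"]].drop_duplicates(),
        left_on="filename", right_on="Run",
    )

psm = pd.DataFrame({"filename": ["run1.mzML", "run2.mzML"], "charge": [2, 3]})
empty_sdrf = pd.DataFrame(columns=["Sample", "Run"])

# The unguarded merge against an empty SDRF loses all rows:
unguarded = psm.merge(empty_sdrf, left_on="filename", right_on="Run")
guarded = merge_with_sdrf(psm, empty_sdrf)
```

With the guard in place the later `groupby` still iterates over the original rows rather than nothing at all.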
Actionable comments posted: 11

🧹 Nitpick comments (11)

pmultiqc/main.py (1)

54-54: Consider mentioning the dark logo in the log message. The message only references the light logo, but the code also configures a dark logo (when supported). Including both would improve observability.

```diff
- log.info(f"pmultiqc: injected custom logo from local path: {pmultiqc_logo}")
+ log.info(f"pmultiqc: injected custom logo(s) from local path: {pmultiqc_logo}")
```

Or, for more detail:

```diff
  log.info(f"pmultiqc: injected custom logo from local path: {pmultiqc_logo}")
+ if hasattr(config, "custom_logo_dark") and pmultiqc_logo_dark.exists():
+     log.info(f"pmultiqc: injected custom dark logo from local path: {pmultiqc_logo_dark}")
```

pmultiqc/modules/common/common_utils.py (3)

546-552: Add `strict=True` to the `zip()` call for safer iteration. Per static analysis (B905), `zip()` without `strict=True` silently truncates if the iterables have different lengths. Both come from the same DataFrame here, so they should match, but explicit is better.

```diff
 mod_plot_dict = dict(
-    zip(mod_group_processed["modifications"], mod_group_processed["percentage"])
+    zip(mod_group_processed["modifications"], mod_group_processed["percentage"], strict=True)
 )
```

532-532: Use an f-string conversion flag instead of explicit `str()`. Per Ruff RUF010, prefer `{sample!s}` over `{str(sample)}` for cleaner f-strings.

```diff
- sample_key = f"Sample {str(sample)}"
+ sample_key = f"Sample {sample!s}"
```

560-564: Use an f-string conversion flag here as well (RUF010), preferring `{i!s}` over `{str(i)}`. Note that when `group_col == "Sample"` and the index is empty, this list comprehension is still safe.

```diff
 if group_col == "Sample":
     table.index = [
-        f"Sample {str(i)}" for i in table.index
+        f"Sample {i!s}" for i in table.index
     ]
```

pmultiqc/modules/common/plots/dia.py (3)

578-582: Missing explicit return in the else branch. When `can_groupby_for_std` returns False, the function logs a warning but implicitly returns None. This is correct behavior (the caller checks at line 273), but an explicit `return None` improves clarity.

```diff
 else:
     log.warning("No SDRF available; failed to parse experimental groups; SD Intensity not generated.")
+    return None
```

87-90: Use an f-string conversion flag instead of `str()` (RUF010): prefer `{run!s}` over `{str(run)}`.

```diff
 (
-    f"Sample {str(run)}"
+    f"Sample {run!s}"
     if data_type == "Sample"
     else str(run)
 ): group["log_intensity"].dropna().tolist()
```

553-556: Use an f-string conversion flag (RUF010): prefer `{sample!s}` over `{str(sample)}`.

```diff
 plot_data = {
-    f"Sample {str(sample)}": group["log_intensity_std"].dropna().tolist()
+    f"Sample {sample!s}": group["log_intensity_std"].dropna().tolist()
     for sample, group in grouped_std.groupby("Sample")
 }
```

pmultiqc/modules/common/plots/id.py (3)

414-418: Use an f-string conversion flag instead of `str()` (RUF010): prefer `{sample!s}` over `{str(sample)}`.

```diff
 (
-    f"{sample_prefix}{str(sample)}"
+    f"{sample_prefix}{sample!s}"
     if "sdrf_samples" == source
     else str(sample)
 ): {"Count": info[value_key]}
```

650-650: Use an f-string conversion flag (RUF010).

```diff
- sample=SampleName(f"Sample {str(sample)}"),
+ sample=SampleName(f"Sample {sample!s}"),
```

718-718: Use an f-string conversion flag (RUF010), consistent with line 650.

```diff
- sample=SampleName(f"Sample {str(sample)}"),
+ sample=SampleName(f"Sample {sample!s}"),
```

pmultiqc/modules/common/dia_utils.py (1)

854-854: Use an f-string conversion flag instead of `str()` (RUF010): prefer `{sample!s}` over `{str(sample)}`.

```diff
- mod_plot[f"Sample {str(sample)}"] = mod_plot_dict
+ mod_plot[f"Sample {sample!s}"] = mod_plot_dict
```

📜 Review details

Configuration used: defaults
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (9)

- environment.yml
- pmultiqc/main.py
- pmultiqc/modules/common/common_utils.py
- pmultiqc/modules/common/dia_utils.py
- pmultiqc/modules/common/plots/dia.py
- pmultiqc/modules/common/plots/id.py
- pyproject.toml
- recipe/meta.yaml
- requirements.txt

🚧 Files skipped from review as they are similar to previous changes (4)

- environment.yml
- pyproject.toml
- requirements.txt
- recipe/meta.yaml

🧰 Additional context used

🧬 Code graph analysis (2)

- pmultiqc/modules/common/plots/dia.py (1)
  - pmultiqc/modules/common/common_utils.py (1): `group_charge` (555-565)
- pmultiqc/modules/common/plots/id.py (2)
  - pmultiqc/modules/quantms/sparklines.py (1): `plot` (19-56)
  - pmultiqc/modules/common/common_utils.py (1): `condition_split` (55-60)

🪛 Ruff (0.14.10)

- pmultiqc/modules/common/plots/dia.py — 87, 554: Use explicit conversion flag (RUF010)
- pmultiqc/modules/common/common_utils.py — 532, 562: Use explicit conversion flag (RUF010); 548: `zip()` without an explicit `strict=` parameter (B905)
- pmultiqc/modules/common/plots/id.py — 415, 650, 718: Use explicit conversion flag (RUF010)
- pmultiqc/modules/common/dia_utils.py — 854: Use explicit conversion flag (RUF010)

⏰ Context from checks skipped due to timeout of 90000ms; you can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

- GitHub Check: test_maxquant_dia
- GitHub Check: test_maxquant
- GitHub Check: test_dia
- GitHub Check: test_diann
- GitHub Check: test_lfq
- GitHub Check: test_proteobench
- GitHub Check: test_mzid_mzML
- GitHub Check: test_tmt
- GitHub Check: test_mzid_mgf
- GitHub Check: Codacy Static Code Analysis

🔇 Additional comments (5)

pmultiqc/main.py (2)

40-40: LGTM! The dark logo path follows a consistent pattern: its construction mirrors the existing light logo path and uses appropriate `Path` operations.

47-52: The past duplicate-conditional issue has been resolved. The previous review flagged duplicate conditionals checking `custom_logo_width`; the current code correctly checks two different attributes: `custom_logo_dark` (line 48) and `custom_logo_width` (line 51).

pmultiqc/modules/common/plots/id.py (1)

619-622: LGTM — defensive `.get()` access pattern. Good use of `.get()` on `cal_num_table_data` to safely access `sdrf_samples` and `ms_runs`, addressing the pattern from previous review comments.

pmultiqc/modules/common/dia_utils.py (2)

262-289: LGTM — clean refactor of `_calculate_run_statistics`. The function now returns a tuple `(stat_run, data_per_run)` instead of mutating external state: a good separation of concerns that improves testability.

241-246: LGTM — proper structure for `cal_num_table_data`. The structure with `"sdrf_samples"` and `"ms_runs"` keys cleanly separates sample-level and run-level data, aligning well with the SDRF-aware workflow.
```python
if hasattr(config, "custom_logo_dark"):
    config.custom_logo_dark = str(pmultiqc_logo_dark)
```

**Verify the dark logo file exists before use.**

The code checks whether `pmultiqc_logo` exists (line 42) but does not verify `pmultiqc_logo_dark` exists before setting `config.custom_logo_dark`. If the dark logo file is missing, this could cause issues at render time.

🔎 Proposed fix to add an existence check:

```diff
 # Supported starting from MultiQC v1.33
-if hasattr(config, "custom_logo_dark"):
+if hasattr(config, "custom_logo_dark") and pmultiqc_logo_dark.exists():
     config.custom_logo_dark = str(pmultiqc_logo_dark)
```

🤖 Prompt for AI Agents

> In pmultiqc/main.py around lines 48-49, the code sets `config.custom_logo_dark` without verifying that the `pmultiqc_logo_dark` file exists. Check that the file exists (e.g. `Path(pmultiqc_logo_dark).exists()` or `os.path.exists`) before assigning `config.custom_logo_dark = str(pmultiqc_logo_dark)`; if the file does not exist, simply skip setting the attribute (or log a warning) to avoid render-time errors.
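The degrade-gracefully behavior the prompt describes can be sketched with a small helper; `resolve_logos` is hypothetical, not a pmultiqc or MultiQC API:

```python
import tempfile
from pathlib import Path

def resolve_logos(light: Path, dark: Path):
    # Only expose paths that actually exist on disk; a missing dark
    # logo degrades to None instead of failing later at render time.
    return (
        str(light) if light.exists() else None,
        str(dark) if dark.exists() else None,
    )

with tempfile.TemporaryDirectory() as tmp:
    light = Path(tmp) / "logo.png"
    light.write_bytes(b"\x89PNG")       # light logo present
    dark = Path(tmp) / "logo_dark.png"  # dark logo deliberately absent
    light_path, dark_path = resolve_logos(light, dark)
```

A caller would then set each config attribute only when the corresponding value is not None.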
| if "Modifications" in df_copy.columns and "modifications" not in df_copy.columns: | ||
| df_copy = df_copy.rename(columns={"Modifications": "modifications"}) | ||
| else: | ||
| log.warning('Detected both "Modifications" and "modifications" columns.') | ||
|
|
There was a problem hiding this comment.
Incomplete column handling when both columns exist.
When both "Modifications" and "modifications" columns exist, the code logs a warning but then proceeds to use df_copy["modifications"] on line 281, which may not exist if only "Modifications" was present. The else branch doesn't handle the case where only "modifications" exists (which is fine) but also doesn't handle the "both exist" case properly.
🔎 Suggested fix
- if "Modifications" in df_copy.columns and "modifications" not in df_copy.columns:
- df_copy = df_copy.rename(columns={"Modifications": "modifications"})
- else:
- log.warning('Detected both "Modifications" and "modifications" columns.')
+ if "Modifications" in df_copy.columns and "modifications" in df_copy.columns:
+ log.warning('Detected both "Modifications" and "modifications" columns. Using "modifications".')
+ elif "Modifications" in df_copy.columns:
+ df_copy = df_copy.rename(columns={"Modifications": "modifications"})
+ elif "modifications" not in df_copy.columns:
+ raise ValueError("Neither 'Modifications' nor 'modifications' column found in DataFrame.")🤖 Prompt for AI Agents
In pmultiqc/modules/common/common_utils.py around lines 276-280, the current
else branch treats all non-rename cases as "both exist" and only logs a warning,
causing later code to fail if the lowercase column doesn't actually exist;
change the logic to explicitly handle three cases: (1) if both "Modifications"
and "modifications" exist, populate/merge into "modifications" (e.g., fillna
from "Modifications" or prefer non-null values) and then drop the
"Modifications" column while logging a warning; (2) if only "Modifications"
exists, rename it to "modifications" as before; (3) if only "modifications"
exists, do nothing; ensure after this block df_copy always has a "modifications"
column when either original column was present.
| sample_file_df["Sample"] = sample_file_df["Sample"].astype(int) | ||
|
|
There was a problem hiding this comment.
Potential ValueError if Sample column contains non-numeric values.
astype(int) will raise a ValueError if the Sample column contains non-integer-convertible values. Consider adding error handling or validation.
🔎 Suggested fix
sample_file_df = file_df.copy()
- sample_file_df["Sample"] = sample_file_df["Sample"].astype(int)
+ try:
+ sample_file_df["Sample"] = pd.to_numeric(sample_file_df["Sample"], errors="raise").astype(int)
+ except (ValueError, TypeError) as e:
+ log.warning(f"Failed to convert Sample column to int: {e}. Returning empty dict.")
+ return dict()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| sample_file_df["Sample"] = sample_file_df["Sample"].astype(int) | |
| sample_file_df = file_df.copy() | |
| try: | |
| sample_file_df["Sample"] = pd.to_numeric(sample_file_df["Sample"], errors="raise").astype(int) | |
| except (ValueError, TypeError) as e: | |
| log.warning(f"Failed to convert Sample column to int: {e}. Returning empty dict.") | |
| return dict() |
🤖 Prompt for AI Agents
In pmultiqc/modules/common/common_utils.py around lines 457-458, the direct use
of sample_file_df["Sample"].astype(int) can raise a ValueError for non-numeric
values; replace this with a safe conversion using pandas.to_numeric (e.g.,
sample_file_df["Sample"] = pd.to_numeric(sample_file_df["Sample"].str.strip(),
errors="coerce")), then check for resulting NaNs and either raise a clear
ValueError listing offending rows or handle them (drop/fill) as appropriate for
the pipeline, ensuring any error message includes filename/row info for easier
debugging.
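The difference between the two conversion styles the prompt contrasts is worth seeing concretely: `astype(int)` fails fast on the first bad value, while `errors="coerce"` marks bad values as `NaN` so they can be inspected and reported before deciding to drop or fail.

```python
import pandas as pd

samples = pd.Series(["1", "2", "not-a-number"])

# astype(int) raises ValueError on the first non-numeric entry...
try:
    samples.astype(int)
    astype_raised = False
except ValueError:
    astype_raised = True

# ...while errors="coerce" keeps going and turns bad values into NaN,
# letting the caller report the offending rows with context.
coerced = pd.to_numeric(samples, errors="coerce")
bad_rows = samples[coerced.isna()]
```

After the coerce pass, the clean subset can still be cast to int safely.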
```diff
 mod_plot_by_sample = dia_sample_level_modifications(
     df=report_data,
     sdrf_file_df=sdrf_file_df
 )

 # Update quantms_modified with processed data
-quantms_modified["plot_data"] = mod_plot_dict
+quantms_modified["plot_data"] = [mod_plot_by_run, mod_plot_by_sample]
 quantms_modified["cats"] = list(
     sorted(modified_cats, key=lambda x: (x == "Modified (Total)", x))
```

**`dia_sample_level_modifications` called unconditionally may fail with an empty SDRF.**

The function performs a merge with `sdrf_file_df` without checking whether it is empty. If `sdrf_file_df` is empty, the merge produces an empty DataFrame, and subsequent operations may not behave as expected.

🔎 Suggested fix:

```diff
-mod_plot_by_sample = dia_sample_level_modifications(
-    df=report_data,
-    sdrf_file_df=sdrf_file_df
-)
+if sdrf_file_df is not None and not sdrf_file_df.empty:
+    mod_plot_by_sample = dia_sample_level_modifications(
+        df=report_data,
+        sdrf_file_df=sdrf_file_df
+    )
+else:
+    mod_plot_by_sample = {}
 # Update quantms_modified with processed data
-quantms_modified["plot_data"] = [mod_plot_by_run, mod_plot_by_sample]
+if mod_plot_by_sample:
+    quantms_modified["plot_data"] = [mod_plot_by_run, mod_plot_by_sample]
+else:
+    quantms_modified["plot_data"] = mod_plot_by_run
```

🤖 Prompt for AI Agents

> In pmultiqc/modules/common/dia_utils.py around lines 248 to 256, `dia_sample_level_modifications` is called unconditionally with `sdrf_file_df`, which can be empty and cause merges to yield empty DataFrames that break downstream logic. First check whether the SDRF is usable (e.g. `if sdrf_file_df is None or sdrf_file_df.empty`) and only call `dia_sample_level_modifications` when it contains rows; otherwise set `mod_plot_by_sample` to a safe default matching the expected output, and ensure `quantms_modified["plot_data"]` and `quantms_modified["cats"]` are populated consistently so downstream code does not assume non-empty merge results.
```python
def dia_sample_level_modifications(df, sdrf_file_df):

    report_data = df.copy()

    report_data = report_data.merge(
        right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        on="Run"
    )

    report_data["Sample"] = report_data["Sample"].astype(int)

    mod_plot = dict()
    for sample, group in report_data.groupby("Sample", sort=True):

        mod_plot_dict, _ = summarize_modifications(
            group.drop_duplicates()
        )
        mod_plot[f"Sample {str(sample)}"] = mod_plot_dict

    return mod_plot
```

**Missing empty-DataFrame guard in `dia_sample_level_modifications`.**

The function assumes `sdrf_file_df` has data. If called with an empty DataFrame, the merge produces an empty result and the subsequent `groupby` iterates over nothing, returning an empty `mod_plot`. Consider adding an early return.

🔎 Suggested fix:

```diff
 def dia_sample_level_modifications(df, sdrf_file_df):
+    if sdrf_file_df is None or sdrf_file_df.empty:
+        return {}
+
     report_data = df.copy()
     report_data = report_data.merge(
```

🧰 Tools — 🪛 Ruff (0.14.10): 854-854: Use explicit conversion flag (RUF010)

🤖 Prompt for AI Agents

> In pmultiqc/modules/common/dia_utils.py around lines 837 to 856, `dia_sample_level_modifications` assumes `sdrf_file_df` has rows before merging and grouping. Add an early-empty guard: if `sdrf_file_df` is empty (or if the merge leaves `report_data` empty), immediately return an empty dict to avoid downstream processing; implement the check before the merge, add a second check after it, and return `{}` when empty.
```python
if not sdrf_file_df.empty:

    df_sub = df_sub.merge(
        sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        on="Run"
    )
```

**Potential AttributeError if `sdrf_file_df` is None.**

The function signature accepts `sdrf_file_df` but checks `sdrf_file_df.empty` directly. If the caller passes None instead of an empty DataFrame, this raises an AttributeError.

🔎 Suggested fix:

```diff
-if not sdrf_file_df.empty:
+if sdrf_file_df is not None and not sdrf_file_df.empty:
```

🤖 Prompt for AI Agents

> In pmultiqc/modules/common/plots/dia.py around lines 75 to 80, the code checks `sdrf_file_df.empty` directly, which raises an AttributeError if `sdrf_file_df` is None. Change the conditional to explicitly guard against None (e.g. `if sdrf_file_df is not None and not sdrf_file_df.empty`) before calling `.merge`, so the merge only runs when `sdrf_file_df` is a real DataFrame with rows.
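The guard works because `and` short-circuits: `df is not None` is evaluated first, so `.empty` is never touched on None. A tiny helper (illustrative only, not part of pmultiqc) makes the pattern reusable:

```python
import pandas as pd

def has_rows(df) -> bool:
    # Short-circuit evaluation: when df is None, the right-hand
    # `.empty` access never runs, so no AttributeError is raised.
    return df is not None and not df.empty
```

Note that a plain `if not df:` is not a substitute here: pandas deliberately raises on truth-testing a DataFrame, so the explicit `is None` / `.empty` pair is the idiomatic check.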
```python
if sdrf_file_df.empty:

    bar_data = stat_data_by_run.to_dict(orient="index")
    plot_label = ["by Run"]
```

**Same None-safety issue for the `sdrf_file_df.empty` check.**

Consistent with `draw_dia_intensity_dis`, this check should guard against None.

🔎 Suggested fix:

```diff
-if sdrf_file_df.empty:
+if sdrf_file_df is None or sdrf_file_df.empty:
     bar_data = stat_data_by_run.to_dict(orient="index")
```

🤖 Prompt for AI Agents

> In pmultiqc/modules/common/plots/dia.py around lines 210 to 213, the check `if sdrf_file_df.empty:` is unsafe if `sdrf_file_df` can be None; broaden the guard to handle None explicitly (e.g. `if sdrf_file_df is None or sdrf_file_df.empty:`) so the branch executes correctly when `sdrf_file_df` is None or an empty DataFrame, then proceed to build `bar_data` and `plot_label` as before.
```python
if not sdrf_file_df.empty:

    plot_data = {
        condition: group["log_intensity_std"].dropna().tolist()
        for condition, group in grouped_std.groupby("run_condition")
    }
    df_sub = df_sub.merge(
        sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        on="Run"
    )
```

**Guard against None for `sdrf_file_df` in `calculate_dia_intensity_std`.**

Same pattern as the other functions: check for None before accessing `.empty`.

🔎 Suggested fix:

```diff
-if not sdrf_file_df.empty:
+if sdrf_file_df is not None and not sdrf_file_df.empty:
```

🤖 Prompt for AI Agents

> In pmultiqc/modules/common/plots/dia.py around lines 540 to 545, the code checks `sdrf_file_df.empty` without guarding against `sdrf_file_df` being None. Update the condition to first check that `sdrf_file_df` is not None (e.g. `if sdrf_file_df is not None and not sdrf_file_df.empty:`) and keep the existing merge only when that check passes, so the function safely skips the merge when `sdrf_file_df` is None.
| "data_labels": ["by Run", "by Sample"] | ||
| } |
There was a problem hiding this comment.
Hardcoded data_labels may not match actual data structure.
The data_labels is hardcoded to ["by Run", "by Sample"], but if missed_cleavages_data["plot_data"] contains only one dataset (not a list), this will cause a mismatch between labels and data.
🔎 Suggested fix
+ if isinstance(missed_cleavages_data["plot_data"], list):
+ plot_label = ["by Run", "by Sample"]
+ else:
+ plot_label = ["by Run"]
+
draw_config = {
"id": "missed_cleavages",
"cpswitch": False,
"cpswitch_c_active": False,
"title": "Missed Cleavages",
"tt_decimals": 2,
"ylab": "Missed Cleavages [%]",
- "data_labels": ["by Run", "by Sample"]
+ "data_labels": plot_label
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "data_labels": ["by Run", "by Sample"] | |
| } | |
| if isinstance(missed_cleavages_data["plot_data"], list): | |
| plot_label = ["by Run", "by Sample"] | |
| else: | |
| plot_label = ["by Run"] | |
| draw_config = { | |
| "id": "missed_cleavages", | |
| "cpswitch": False, | |
| "cpswitch_c_active": False, | |
| "title": "Missed Cleavages", | |
| "tt_decimals": 2, | |
| "ylab": "Missed Cleavages [%]", | |
| "data_labels": plot_label | |
| } |
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/id.py around lines 176-177, the data_labels
list is hardcoded to ["by Run", "by Sample"] which can mismatch when
missed_cleavages_data["plot_data"] is a single dataset (not a list); update the
code to detect if plot_data is a list and set data_labels dynamically (e.g., if
plot_data is not a list wrap it into a list or generate labels matching the
number of datasets, using sensible defaults or indexed names) so that the length
of data_labels always matches the number of datasets in plot_data.
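The label-per-dataset invariant can be captured in a small helper; `build_data_labels` is a hypothetical name sketching the suggested behavior, not an existing pmultiqc function:

```python
def build_data_labels(plot_data):
    # One label per dataset: a bare dict is a single "by Run" series,
    # while a list is assumed to carry run-level then sample-level data.
    if isinstance(plot_data, list):
        return ["by Run", "by Sample"][: len(plot_data)]
    return ["by Run"]
```

Slicing to `len(plot_data)` keeps the invariant even when the list holds a single dataset, so the plot config never carries more labels than datasets.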
```python
mc_ratio = list()

for k in ["ms_runs", "sdrf_samples"]:
    mc_ratio.append(rebuild_dict_structure(quantms_missed_cleavages[k]))

mc_data = {
    "plot_data": mc_ratio,
    "cats": ["0", "1", ">=2"]
}

draw_msms_missed_cleavages(sub_sections, mc_data, False)
```

**Potential KeyError when accessing `quantms_missed_cleavages` keys.**

The loop iterates over `["ms_runs", "sdrf_samples"]` but doesn't verify these keys exist in `quantms_missed_cleavages`. If `sdrf_samples` is missing, this raises a KeyError.

🔎 Suggested fix:

```diff
-mc_ratio = list()
-
-for k in ["ms_runs", "sdrf_samples"]:
-    mc_ratio.append(rebuild_dict_structure(quantms_missed_cleavages[k]))
+mc_ratio = [
+    rebuild_dict_structure(quantms_missed_cleavages[k])
+    for k in ["ms_runs", "sdrf_samples"]
+    if k in quantms_missed_cleavages and quantms_missed_cleavages[k]
+]
+
+if not mc_ratio:
+    mc_ratio = [rebuild_dict_structure(quantms_missed_cleavages.get("ms_runs", {}))]
```

🤖 Prompt for AI Agents

> In pmultiqc/modules/common/plots/id.py around lines 490 to 500, the loop directly indexes `quantms_missed_cleavages` for the keys "ms_runs" and "sdrf_samples", which raises a KeyError if a key is missing. Access those keys safely (e.g. `dict.get(key, {})` or a `key in quantms_missed_cleavages` check) and ensure `rebuild_dict_structure` receives a valid empty default, so `mc_ratio` always gets a safe value even when a key is absent, before calling `draw_msms_missed_cleavages`.
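The `dict.get`-based pattern the prompt recommends can be sketched as follows; `safe_mc_ratios` is an illustrative helper, not pmultiqc code:

```python
def safe_mc_ratios(quantms_missed_cleavages, rebuild):
    # .get() with a falsy default skips absent or empty sections
    # instead of raising KeyError on a missing "sdrf_samples".
    return [
        rebuild(quantms_missed_cleavages[k])
        for k in ("ms_runs", "sdrf_samples")
        if quantms_missed_cleavages.get(k)
    ]

runs_only = {"ms_runs": {"run1": {"0": 90.0, "1": 10.0}}}
both = {**runs_only, "sdrf_samples": {"Sample 1": {"0": 88.0}}}
```

With only run-level data present, the sample-level entry is simply omitted, which also pairs naturally with the dynamic `data_labels` fix above this comment.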
Pull Request
Description
Brief description of the changes made in this PR.
Type of Change
Summary by CodeRabbit
New Features
Improvements
Updates