First iteration about supporting some SDRF-based plots#478

Merged
ypriverol merged 13 commits into main from dev
Dec 25, 2025
Conversation

@ypriverol (Member) commented Dec 24, 2025

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test addition/update
  • Dependency updates

Summary by CodeRabbit

  • New Features

    • Dark logo support for custom branding.
    • Sample-level aggregation and visualizations added alongside run-level metrics.
    • SDRF-aware DIA plotting for improved sample grouping.
  • Improvements

    • Refined calculations for modifications, charge-state distributions, and identification rates.
    • Better handling of identified vs. unidentified scans and more robust batch processing.
    • Heatmap rendering now disables clustering; updated plot labels/descriptions.
  • Updates

    • MultiQC compatibility bumped to <=1.33; urllib3 added as a dependency.


@coderabbitai bot (Contributor) commented Dec 24, 2025

📝 Walkthrough

Walkthrough

Refactors aggregation and plotting to add SDRF/sample-aware and DIA-aware per-run and per-sample metrics across common, dia, quantms, mzidentml and plotting modules; adds utilities for identified-rate, charge-state and modification summaries; and bumps MultiQC upper bound and adds urllib3 to dependency specs.

Changes

Cohort / File(s) Summary
Environment & Packaging
environment.yml, pyproject.toml, requirements.txt, recipe/meta.yaml
Bumped MultiQC upper bound from <=1.32 to <=1.33; added urllib3>=2.6.1; minor formatting tweaks.
App Config / Branding
pmultiqc/main.py
Introduced local dark logo path and conditional application of custom_logo_dark and custom_logo_width when available.
Common Utilities
pmultiqc/modules/common/common_utils.py
Renamed mod_group_percentage(group) → mod_group_percentage(df); added handling/rename for Modifications vs modifications; added functions cal_num_table_at_sample, cal_msms_identified_rate, aggregate_msms_identified_rate, summarize_modifications, group_charge.
DIA Processing
pmultiqc/modules/common/dia_utils.py
Threaded SDRF/file context through processing and plotting, removed external cal_num_table_data param, changed _process_run_data to return cal_num_table_data with ms_runs/sdrf_samples, added dia_sample_level_modifications, updated signatures and plotting call paths.
MS I/O & MS Info
pmultiqc/modules/common/ms_io.py, pmultiqc/modules/common/ms/msinfo.py
add_ms_values refactored to batch processing, DIA path, identified/unidentified grouping and explicit dtype handling; msinfo now passes whole group DataFrame into add_ms_values (batch) rather than iterating rows.
mzIdentML Utilities & Module
pmultiqc/modules/common/mzidentml_utils.py, pmultiqc/modules/mzidentml/mzidentml.py
get_mzid_num_data now returns only identified_spectra; mzIdentML parsing now uses aggregate_msms_identified_rate to compute msms_identified_rate, builds per-run cal_num_table_data["ms_runs"] and sets identified_msms_spectra accordingly.
Plots — DIA / General / ID
pmultiqc/modules/common/plots/dia.py, .../plots/general.py, .../plots/id.py
DIA plotting functions extended to accept sdrf_file_df and produce Run vs Sample groupings; heatmap clustering disabled (cluster_rows, cluster_cols = False); ID plots refactored to accept msms_identified_rate, dynamic data_labels, renamed plot ids/titles, and added rebuild_dict_structure helper.
DiANN & QuantMS Modules
pmultiqc/modules/diann/diann.py, pmultiqc/modules/quantms/quantms.py
Removed passing of cal_num_table_data/mzml_table downstream in DiANN; QuantMS added cal_charge_state and sample_level_modifications, computes and stores run- and sample-level peptide intensities, charge-state and modification plots as paired structures [run_level, sample_level], and integrates aggregate_msms_identified_rate for identified-rate computation.
IdXML
pmultiqc/modules/common/ms/idxml.py
Added a commented-out instantiation line for PeptideIdentificationList() (no runtime effect).
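A minimal sketch of how the new identified-rate aggregation described above could work. The function names come from the walkthrough; the signature and internals here are hypothetical illustrations, not the actual pmultiqc implementation:

```python
import pandas as pd

def identified_rate_sketch(mzml_table, identified_msms, sdrf_file_df=None):
    # Per-run rate: identified MS/MS spectra / total MS/MS spectra (percent).
    rates = {
        run: 100.0 * identified_msms.get(run, 0) / total
        for run, total in mzml_table.items() if total
    }
    if sdrf_file_df is None:
        # No SDRF available: return per-run rates only (the mzIdentML path).
        return rates
    # With an SDRF, roll runs up to their sample before computing the rate.
    df = pd.DataFrame({
        "Run": list(mzml_table),
        "total": list(mzml_table.values()),
        "identified": [identified_msms.get(r, 0) for r in mzml_table],
    }).merge(sdrf_file_df[["Run", "Sample"]], on="Run")
    per_sample = df.groupby("Sample")[["identified", "total"]].sum()
    return (100.0 * per_sample["identified"] / per_sample["total"]).to_dict()

sdrf = pd.DataFrame({"Run": ["r1", "r2"], "Sample": [1, 1]})
print(identified_rate_sketch({"r1": 100, "r2": 100}, {"r1": 40, "r2": 60}, sdrf))
```

Summing counts per sample before dividing (rather than averaging per-run rates) keeps the sample-level rate weighted by run size.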

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant Parser as File Parser
participant SDRF as SDRF / file_df
participant Aggregator as common_utils
participant Stat as per-run/stat engine
participant Plotter as Plot modules
Note over Parser,SDRF: 1. Parse raw files & SDRF
Parser->>Aggregator: send per-run DataFrames + SDRF
Aggregator->>Stat: compute identified rates (aggregate_msms_identified_rate)
Stat-->>Aggregator: msms_identified_rate, per-run/sample tables
Aggregator->>Aggregator: summarize_modifications(), cal_num_table_at_sample()
Aggregator->>Plotter: emit plot_data (plot_data = [run_level, sample_level]) + msms_identified_rate
Plotter->>Plotter: draw Run and Sample plots (DIA-aware paths)
Plotter->>User: render visuals and tables
Note over Plotter,User: Final output includes paired Run/Sample charts and tables
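The nested table shape the diagram refers to can be illustrated with placeholder values. The keys follow the walkthrough; the metric names and counts are invented:

```python
# Per-run stats under "ms_runs", SDRF-derived aggregates under "sdrf_samples".
cal_num_table_data = {
    "ms_runs": {
        "run_1": {"peptide_num": 1200, "protein_num": 310},
        "run_2": {"peptide_num": 1150, "protein_num": 298},
    },
    "sdrf_samples": {
        "Sample 1": {"peptide_num": 1900, "protein_num": 350},
    },
}

# Plot modules receive paired [run_level, sample_level] data and draw both views.
plot_data = [cal_num_table_data["ms_runs"], cal_num_table_data["sdrf_samples"]]
assert len(plot_data) == 2 and "run_1" in plot_data[0]
```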

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

Review effort 5/5

Suggested reviewers

  • daichengxin

Poem

I nibble through SDRF and runs,
Count mods beneath moonlit suns,
Dark logos gleam, per-sample charts spring—
A hopping rabbit loves the new plotting thing! 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 15.79%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'First iteration about supporting some SDRF-based plots' is directly related to the main changes in the PR, which involve extensive SDRF integration for plot generation and per-sample/run aggregations throughout multiple modules.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev


@coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (9)
pmultiqc/modules/common/plots/general.py (2)

124-124: Use f-string conversion flag instead of explicit str() call.

The explicit str() call inside the f-string can be replaced with the more Pythonic !s conversion flag.

🔎 Proposed fix
-                    sample=SampleName(f"Sample {str(sample)}"),
+                    sample=SampleName(f"Sample {sample!s}"),

Based on static analysis hints (Ruff RUF010).


187-187: Use f-string conversion flag instead of explicit str() call.

The explicit str() call inside the f-string can be replaced with the more Pythonic !s conversion flag, consistent with the suggestion for line 124.

🔎 Proposed fix
-                    sample=SampleName(f"Sample {str(sample)}"),
+                    sample=SampleName(f"Sample {sample!s}"),

Based on static analysis hints (Ruff RUF010).
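For context, the `!s` conversion flag is exactly equivalent to calling `str()` inside the replacement field; a standalone check:

```python
# RUF010: f"{str(x)}" and f"{x!s}" produce identical output;
# the conversion flag is simply the idiomatic spelling.
sample = 42
old = f"Sample {str(sample)}"
new = f"Sample {sample!s}"
assert old == new
print(new)  # Sample 42
```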

pmultiqc/modules/common/ms/idxml.py (1)

89-100: Simplify _init_labels() return statement.

The method returns three unused False values alongside the labels dictionary. Only the labels dict is actually used by the caller (line 64). Simplify the return to just return labels to reduce unnecessary complexity.

🔎 Proposed simplification
 def _init_labels(self):
     labels = {
         "spec_e_label": [],
         "xcorr_label": [],
         "hyper_label": [],
         "peps_label": [],
         "consensus_label": [],
         "msgf_label": False,
         "comet_label": False,
         "sage_label": False,
     }
-    return False, False, False, labels
+    return labels

And update the caller:

-_, _, _, labels = self._init_labels()
+labels = self._init_labels()
pmultiqc/modules/common/ms_io.py (1)

128-146: LGTM! Efficient batch processing with boolean indexing.

The refactoring uses info_df["scan"].isin(identified_spectrum_scan_id) to create a boolean mask, then processes identified and unidentified scans separately. This is more efficient than row-by-row iteration.

Optional: Further reduce code duplication

The three loops for each histogram could be extracted into a helper function:

+    def process_columns(df_subset, charge_hist, intensity_hist, peaks_hist):
+        for charge_state in df_subset["precursor_charge"]:
+            data_add_value(charge_hist, charge_state)
+        for base_peak_inte in df_subset["base_peak_intensity"]:
+            data_add_value(intensity_hist, base_peak_inte)
+        for peak_per_ms2 in df_subset["num_peaks"]:
+            data_add_value(peaks_hist, peak_per_ms2)
+
     identified_scans = info_df["scan"].isin(identified_spectrum_scan_id)
-    
-    for charge_state in info_df[identified_scans]["precursor_charge"]:
-        data_add_value(mzml_charge_plot, charge_state)
-    for base_peak_inte in info_df[identified_scans]["base_peak_intensity"]:
-        data_add_value(mzml_peak_distribution_plot, base_peak_inte)
-    for peak_per_ms2 in info_df[identified_scans]["num_peaks"]:
-        data_add_value(mzml_peaks_ms2_plot, peak_per_ms2)
-
-    for charge_state in info_df[~identified_scans]["precursor_charge"]:
-        data_add_value(mzml_charge_plot_1, charge_state)
-    for base_peak_inte in info_df[~identified_scans]["base_peak_intensity"]:
-        data_add_value(mzml_peak_distribution_plot_1, base_peak_inte)
-    for peak_per_ms2 in info_df[~identified_scans]["num_peaks"]:
-        data_add_value(mzml_peaks_ms2_plot_1, peak_per_ms2)
+    process_columns(
+        info_df[identified_scans],
+        mzml_charge_plot, mzml_peak_distribution_plot, mzml_peaks_ms2_plot
+    )
+    process_columns(
+        info_df[~identified_scans],
+        mzml_charge_plot_1, mzml_peak_distribution_plot_1, mzml_peaks_ms2_plot_1
+    )
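The mask-based split the comment praises can be reproduced in miniature. Column names follow the diff; the data is invented:

```python
import pandas as pd

info_df = pd.DataFrame({
    "scan": [1, 2, 3, 4],
    "precursor_charge": [2, 3, 2, 4],
})
identified_spectrum_scan_id = [1, 3]

# One vectorized membership test instead of row-by-row iteration.
identified_scans = info_df["scan"].isin(identified_spectrum_scan_id)

identified = info_df[identified_scans]     # scans 1 and 3
unidentified = info_df[~identified_scans]  # scans 2 and 4
print(identified["precursor_charge"].tolist())  # [2, 2]
```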
pmultiqc/modules/common/plots/dia.py (2)

9-10: Self-import creates circular reference risk.

Line 9 imports dia as dia_plots from the same module (pmultiqc.modules.common.plots). This self-reference is used in calculate_dia_intensity_std (line 560) to call dia_plots.can_groupby_for_std. Since can_groupby_for_std is defined in this same file (line 257), you can call it directly without the import.

🔎 Suggested fix
-from pmultiqc.modules.common.plots import dia as dia_plots
 from pmultiqc.modules.common.common_utils import group_charge

And at line 560:

-    if dia_plots.can_groupby_for_std(df_sub, "Run"):
+    if can_groupby_for_std(df_sub, "Run"):

536-582: Missing return statement when grouping fails.

When can_groupby_for_std returns False (line 580), the function logs a warning but doesn't explicitly return. This results in an implicit return None, which is handled by the caller but could be made explicit for clarity.

🔎 Suggested fix
     else:
         log.warning("No SDRF available; failed to parse experimental groups; SD Intensity not generated.")
+        return {}
pmultiqc/modules/common/common_utils.py (2)

272-293: Column conflict handling logs warning but proceeds anyway.

When both "Modifications" and "modifications" columns exist (line 279), the warning is logged but the code continues using the original "modifications" column. This could silently produce incorrect results if the columns have different data. Consider either raising an error or explicitly documenting which column takes precedence.

🔎 Suggested fix to clarify precedence
     if "Modifications" in df_copy.columns and "modifications" not in df_copy.columns:
         df_copy = df_copy.rename(columns={"Modifications": "modifications"})
-    else:
-        log.warning('Detected both "Modifications" and "modifications" columns.')
+    elif "Modifications" in df_copy.columns and "modifications" in df_copy.columns:
+        log.warning('Detected both "Modifications" and "modifications" columns; using "modifications".')

540-548: zip() without strict parameter may silently truncate mismatched iterables.

As flagged by static analysis (B905), using zip() without strict=True on line 544 could silently drop data if the iterables have different lengths. Since both come from the same DataFrame, this is unlikely but worth protecting against.

🔎 Suggested fix
     mod_plot_dict = dict(
-        zip(mod_group_processed["modifications"], mod_group_processed["percentage"])
+        zip(mod_group_processed["modifications"], mod_group_processed["percentage"], strict=True)
     )
pmultiqc/modules/common/dia_utils.py (1)

837-855: dia_sample_level_modifications duplicates sample_level_modifications from quantms.py.

This function has nearly identical logic to sample_level_modifications in pmultiqc/modules/quantms/quantms.py (lines 2286-2308). Both merge with SDRF, cast Sample to int, group by Sample, and call summarize_modifications. Consider extracting to a shared utility in common_utils.py.

Based on the relevant code snippets, summarize_modifications and group_charge are already duplicated between common_utils.py and mztab.py. This pattern of duplication should be consolidated.

#!/bin/bash
# Check for duplicate implementations of sample_level_modifications
rg -n "def.*sample_level_modifications" --type py
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40e2353 and d20c0d3.

⛔ Files ignored due to path filters (2)
  • MultiQC_logo.png is excluded by !**/*.png
  • pmultiqc/images/pmultiqc_logo_dark.png is excluded by !**/*.png
📒 Files selected for processing (17)
  • environment.yml
  • pmultiqc/main.py
  • pmultiqc/modules/common/common_utils.py
  • pmultiqc/modules/common/dia_utils.py
  • pmultiqc/modules/common/ms/idxml.py
  • pmultiqc/modules/common/ms/msinfo.py
  • pmultiqc/modules/common/ms_io.py
  • pmultiqc/modules/common/mzidentml_utils.py
  • pmultiqc/modules/common/plots/dia.py
  • pmultiqc/modules/common/plots/general.py
  • pmultiqc/modules/common/plots/id.py
  • pmultiqc/modules/diann/diann.py
  • pmultiqc/modules/mzidentml/mzidentml.py
  • pmultiqc/modules/quantms/quantms.py
  • pyproject.toml
  • recipe/meta.yaml
  • requirements.txt
💤 Files with no reviewable changes (1)
  • pmultiqc/modules/diann/diann.py
🧰 Additional context used
🧬 Code graph analysis (7)
pmultiqc/modules/common/ms/msinfo.py (1)
pmultiqc/modules/common/ms_io.py (1)
  • add_ms_values (77-146)
pmultiqc/modules/quantms/quantms.py (1)
pmultiqc/modules/common/file_utils.py (1)
  • file_prefix (110-117)
pmultiqc/modules/common/ms_io.py (1)
pmultiqc/modules/common/histogram.py (1)
  • add_value (67-119)
pmultiqc/modules/mzidentml/mzidentml.py (3)
pmultiqc/modules/common/common_utils.py (1)
  • aggregate_msms_identified_rate (497-538)
pmultiqc/modules/common/plots/id.py (1)
  • draw_quantms_identification (377-506)
pmultiqc/modules/quantms/sparklines.py (1)
  • plot (19-56)
pmultiqc/modules/common/plots/id.py (1)
pmultiqc/modules/quantms/sparklines.py (1)
  • plot (19-56)
pmultiqc/modules/common/dia_utils.py (1)
pmultiqc/modules/common/common_utils.py (2)
  • cal_num_table_at_sample (451-480)
  • summarize_modifications (540-548)
pmultiqc/modules/common/plots/dia.py (1)
pmultiqc/modules/common/common_utils.py (1)
  • group_charge (551-561)
🪛 OSV Scanner (2.3.1)
requirements.txt

[HIGH] 1-1: urllib3 1.26.20: urllib3 streaming API improperly handles highly compressed data

(GHSA-2xpw-w6gg-jr37)


[HIGH] 1-1: urllib3 1.26.20: urllib3 allows an unbounded number of links in the decompression chain

(GHSA-gm62-xv2j-4w53)


[HIGH] 1-1: urllib3 1.26.20: urllib3 redirects are not disabled when retries are disabled on PoolManager instantiation

(GHSA-pq67-6m6q-mj2v)

🪛 Ruff (0.14.10)
pmultiqc/modules/common/plots/general.py

124-124: Use explicit conversion flag

Replace with conversion flag

(RUF010)


187-187: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/quantms/quantms.py

1105-1105: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)


1177-1177: Use explicit conversion flag

Replace with conversion flag

(RUF010)


1183-1183: Use explicit conversion flag

Replace with conversion flag

(RUF010)


1251-1251: Use explicit conversion flag

Replace with conversion flag

(RUF010)


2306-2306: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/plots/id.py

415-415: Use explicit conversion flag

Replace with conversion flag

(RUF010)


647-647: Use explicit conversion flag

Replace with conversion flag

(RUF010)


709-709: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/common_utils.py

528-528: Use explicit conversion flag

Replace with conversion flag

(RUF010)


544-544: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


558-558: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/dia_utils.py

854-854: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/plots/dia.py

87-87: Use explicit conversion flag

Replace with conversion flag

(RUF010)


554-554: Use explicit conversion flag

Replace with conversion flag

(RUF010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test_maxquant
  • GitHub Check: test_mzid_mzML
  • GitHub Check: test_lfq
  • GitHub Check: test_maxquant_dia
  • GitHub Check: test_mzid_mgf
  • GitHub Check: test_diann
  • GitHub Check: test_proteobench
  • GitHub Check: test_tmt
  • GitHub Check: test_dia
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (23)
pmultiqc/modules/common/plots/general.py (1)

66-67: Good practice: Explicit clustering configuration.

Making the clustering behavior explicit is a good practice that improves code clarity and prevents reliance on default behavior that could change between library versions.

pmultiqc/modules/common/ms/idxml.py (1)

108-111: No action needed. The project is explicitly constrained to pyopenms<=3.4.0 (in pyproject.toml, requirements.txt, and environment.yml), so the codebase will never reach the pyopenms >= 3.5.0 scenario mentioned in the comment. The current implementation using plain lists is correct for all supported versions. The commented-out line appears to be a developer note about potential future compatibility, not an active issue. If the note causes confusion, it can optionally be removed, but this is not a critical concern.

Likely an incorrect or invalid review comment.

pmultiqc/main.py (1)

48-49: Verify the attribute check and consider adding file existence validation.

Two concerns:

  1. Attribute check: Using custom_logo_width to detect MultiQC v1.33+ dark logo support seems indirect. Consider checking for hasattr(config, "custom_logo_dark") instead to directly verify dark logo support.

  2. Missing file validation: Unlike the main logo (line 42), there's no existence check for pmultiqc_logo_dark before setting config.custom_logo_dark. If the file doesn't exist, the config will point to a non-existent path.

Consider adding file existence validation:

if hasattr(config, "custom_logo_width"):
    if pmultiqc_logo_dark.exists():
        config.custom_logo_dark = str(pmultiqc_logo_dark)
    else:
        log.warning(f"pmultiqc dark logo file not found at: {pmultiqc_logo_dark}")
    config.custom_logo_width = 118

Additionally, please verify whether custom_logo_width is the correct attribute to check for v1.33+ support, or if custom_logo_dark would be more appropriate.

environment.yml (1)

8-8: LGTM!

The MultiQC version constraint update is consistent with other dependency files in this PR.

pyproject.toml (1)

44-44: LGTM!

The MultiQC version constraint update aligns with the dependency updates in requirements.txt, environment.yml, and recipe/meta.yaml.

recipe/meta.yaml (1)

28-28: LGTM!

The MultiQC runtime constraint update is consistent with the version updates across all other dependency files.

pmultiqc/modules/common/ms/msinfo.py (1)

114-127: LGTM! Performance improvement through batch processing.

The change from per-row iteration to passing the entire group DataFrame to add_ms_values is a good performance optimization. This aligns with the refactored add_ms_values function in ms_io.py that now accepts and processes DataFrames in batch.

pmultiqc/modules/mzidentml/mzidentml.py (3)

312-312: Ensure table.plot handles the nested data structure correctly.

The code now accesses self.cal_num_table_data["ms_runs"] instead of self.cal_num_table_data directly. This is consistent with the refactoring in parse_out_mzid, but verify that the table rendering works correctly with this structure.


143-147: Verified: aggregate_msms_identified_rate correctly handles None sdrf_file_df parameter.

The function implementation in common_utils.py (lines 507-508) explicitly checks if sdrf_file_df is None: and returns the per-run identified rate. The call at lines 143-147 in mzidentml.py with sdrf_file_df=None will correctly return only the per-run identified rate computed by cal_msms_identified_rate.


1213-1266: The nested structure is correctly and consistently used throughout the module.

All consumers of self.cal_num_table_data access the nested structure {"ms_runs": ...} using bracket notation (line 312) or pass it to draw_quantms_identification() which safely checks for optional keys like "sdrf_samples" using .get(). No code expects a flat structure. The refactoring maintains consistency.

pmultiqc/modules/common/ms_io.py (2)

110-117: LGTM! Good defensive coding with explicit type casting and NaN handling.

The explicit type casting to Int64 and float ensures data consistency, and the data_add_value helper properly handles NaN values by converting them to None before passing to the histogram.


119-126: Verify DIA mode processes all spectra correctly.

In DIA mode, the function processes all precursor_charge, base_peak_intensity, and num_peaks values without filtering by identified scans. This appears correct for DIA workflows where all spectra are typically considered.

pmultiqc/modules/common/mzidentml_utils.py (1)

45-52: Remove unused import and function or clarify intent.

get_mzid_num_data is defined but never called anywhere in the codebase. The import statement in pmultiqc/modules/mzidentml/mzidentml.py line 20 is also unused. Either remove the function and import or document the reason for keeping unused code.

pmultiqc/modules/common/plots/dia.py (2)

71-105: SDRF merge and Sample type handling looks correct.

The implementation correctly:

  1. Checks sdrf_file_df.empty before merging
  2. Drops duplicates on the merge key columns
  3. Casts Sample to int for consistent sorting
  4. Generates both Run and Sample views when SDRF is available

One minor improvement: the static analysis hint on line 87 suggests using an explicit conversion flag (f"Sample {run!s}" instead of f"Sample {str(run)}"), but this is purely stylistic.


269-303: Early return when no data is produced is a good defensive pattern.

The function correctly handles the case when calculate_dia_intensity_std returns empty data by returning early (lines 273-274), preventing downstream errors.

pmultiqc/modules/common/common_utils.py (1)

551-561: group_charge utility function is well-implemented.

The function correctly:

  1. Groups by the specified columns
  2. Converts column types to string for consistent handling
  3. Adds "Sample " prefix when grouping by Sample for display purposes
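A hypothetical sketch matching those three steps; the real `group_charge` signature and internals may differ:

```python
import pandas as pd

def group_charge_sketch(df, group_col):
    # Group by the requested column and count charge states per group.
    # Keys are cast to string; Sample keys get a "Sample " display prefix.
    out = {}
    for name, group in df.groupby(group_col):
        key = f"Sample {name}" if group_col == "Sample" else str(name)
        out[key] = group["charge"].value_counts().to_dict()
    return out

df = pd.DataFrame({"Sample": [1, 1, 2], "charge": [2, 3, 2]})
print(group_charge_sketch(df, "Sample"))
```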
pmultiqc/modules/quantms/quantms.py (2)

424-430: MSMS identified rate aggregation correctly handles optional SDRF.

The code properly checks for both mzml_table and identified_msms_spectra before calling aggregate_msms_identified_rate, and passes self.file_df which may be empty if no SDRF is available.


1105-1112: Column subsetting before stand_spectra_ref is a good optimization.

Selecting only needed columns early reduces memory footprint when processing large peptide tables.

pmultiqc/modules/common/plots/id.py (2)

509-532: rebuild_dict_structure correctly transforms MC counts to percentages.

The function properly:

  1. Buckets missed cleavages into 0, 1, and >=2 categories
  2. Handles division by zero when total is 0
  3. Returns percentage values for plotting
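The bucketing can be sketched as follows; the function name and input shape are illustrative, not the actual `rebuild_dict_structure` signature:

```python
def mc_counts_to_percentages(mc_counts):
    # Bucket missed-cleavage counts into 0, 1, and >=2 categories,
    # then convert to percentages, guarding against an empty total.
    buckets = {"0": 0, "1": 0, ">=2": 0}
    for mc, n in mc_counts.items():
        if mc >= 2:
            buckets[">=2"] += n
        else:
            buckets[str(mc)] += n
    total = sum(buckets.values())
    if total == 0:  # avoid division by zero
        return {k: 0.0 for k in buckets}
    return {k: 100.0 * v / total for k, v in buckets.items()}

print(mc_counts_to_percentages({0: 70, 1: 20, 2: 6, 3: 4}))
# {'0': 70.0, '1': 20.0, '>=2': 10.0}
```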

814-830: Dynamic label detection for modifications plot is consistent with other functions.

The pattern of checking isinstance(modified_data["plot_data"], list) to determine labeling is used consistently across multiple drawing functions, providing a uniform user experience.

pmultiqc/modules/common/dia_utils.py (3)

212-259: _process_run_data restructured to return cal_num_table_data.

The refactored function now:

  1. Builds per-run statistics and data separately
  2. Computes sample-level aggregation via cal_num_table_at_sample
  3. Structures the return as {"sdrf_samples": ..., "ms_runs": ...}

This aligns with the data structure expected by downstream plotting functions.


262-289: _calculate_run_statistics now returns a tuple.

The function cleanly separates aggregated statistics (stat_run) from raw data needed for sample-level aggregation (data_per_run). This is a good separation of concerns.


298-298: Updated path for files without PSM uses correct key.

The fix correctly writes to cal_num_table_data["ms_runs"][i] instead of a top-level key, maintaining consistency with the new data structure.

Comment on lines 47 to 52
    # Supported starting from MultiQC v1.33
    if hasattr(config, "custom_logo_width"):
        config.custom_logo_dark = str(pmultiqc_logo_dark)

    if hasattr(config, "custom_logo_width"):
        config.custom_logo_width = 118

🛠️ Refactor suggestion | 🟠 Major

Combine duplicate conditional blocks.

Lines 48 and 51 both check for the same condition if hasattr(config, "custom_logo_width"). These should be merged into a single conditional block.

🔎 Proposed refactor
     # Supported starting from MultiQC v1.33
     if hasattr(config, "custom_logo_width"):
         config.custom_logo_dark = str(pmultiqc_logo_dark)
-
-    if hasattr(config, "custom_logo_width"):
         config.custom_logo_width = 118
🤖 Prompt for AI Agents
In pmultiqc/main.py around lines 47 to 52, there are two separate if-blocks both
checking hasattr(config, "custom_logo_width"); merge them into a single
conditional that checks hasattr(config, "custom_logo_width") once and inside
that block set both config.custom_logo_dark = str(pmultiqc_logo_dark) and
config.custom_logo_width = 118 so both assignments occur only when the attribute
exists.

Comment on lines +1166 to +1187
pep_table = pep_table.merge(
    right=self.file_df[["Sample", "Run"]].drop_duplicates(),
    left_on="stand_spectra_ref",
    right_on="Run"
)

pep_intensity_by_sample = dict()
pep_table["Sample"] = pep_table["Sample"].astype(int)
for name, group in pep_table.groupby("Sample", sort=True):

    if config.kwargs["remove_decoy"]:
        pep_intensity_by_sample[f"Sample {str(name)}"] = group[
            (group["opt_global_cv_MS:1002217_decoy_peptide"] == 0)
        ]["average_intensity"].apply(
            lambda x: np.log2(x) if pd.notnull(x) and x > 0 else 1
        )
    else:
        pep_intensity_by_sample[f"Sample {str(name)}"] = group["average_intensity"].apply(
            lambda x: np.log2(x) if pd.notnull(x) and x > 0 else 1
        )

self.quantms_pep_intensity = [pep_intensity_by_run, pep_intensity_by_sample]

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check for file_df emptiness checks and broader context
rg -n "file_df" pmultiqc/modules/quantms/quantms.py | head -30

Repository: bigbio/pmultiqc

Length of output: 1329


🏁 Script executed:

# Look at the function containing lines 1166-1187 to understand validation
sed -n '1100,1200p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 5296


🏁 Script executed:

# Search for any validations before the merge operation
rg -B 20 "pep_table.merge" pmultiqc/modules/quantms/quantms.py | head -50

Repository: bigbio/pmultiqc

Length of output: 921


🏁 Script executed:

# View context around line 1456 to see the if-block structure
sed -n '1450,1470p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 983


🏁 Script executed:

# Get broader context to understand function structure and block scope
sed -n '1140,1175p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 1787


🏁 Script executed:

# Get broader context to understand which function contains this code and any guards
sed -n '1100,1180p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 4245


🏁 Script executed:

# Check the full function definition and its entry conditions
sed -n '1050,1170p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 6536


🏁 Script executed:

# Check what happens after the merge and pep_intensity_by_sample assignment
sed -n '1187,1200p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 818


🏁 Script executed:

# Look for where self.quantms_pep_intensity is used to understand impact of empty dict
rg -n "quantms_pep_intensity" pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 330


🏁 Script executed:

# Check the context around line 2195 to see how quantms_pep_intensity is used
sed -n '2190,2210p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 983


🏁 Script executed:

# Verify if there's any validation before cal_heat_map_score is called
rg -B 10 "cal_heat_map_score" pmultiqc/modules/quantms/quantms.py | head -30

Repository: bigbio/pmultiqc

Length of output: 827


🏁 Script executed:

# Check what pep_table_exists is and how it relates to file_df
rg -B 5 -A 5 "pep_table_exists" pmultiqc/modules/quantms/quantms.py | head -40

Repository: bigbio/pmultiqc

Length of output: 1417


🏁 Script executed:

# Check how file_df gets populated to understand if it can be empty when pep_table_exists is true
rg -n "self.file_df\s*=" pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 100


🏁 Script executed:

# Find where pep_table_exists is set to True
rg -n "pep_table_exists\s*=\s*True" pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 104


🏁 Script executed:

# Find where file_df gets populated with data (beyond initialization)
rg -n "file_df.*=" pmultiqc/modules/quantms/quantms.py -A 1 | head -40

Repository: bigbio/pmultiqc

Length of output: 912


🏁 Script executed:

# Search for where file_df data is being added/assigned in the class
rg -n "file_df\[|file_df\.loc|file_df\.insert|\.append|\.concat" pmultiqc/modules/quantms/quantms.py | head -20

Repository: bigbio/pmultiqc

Length of output: 1041


🏁 Script executed:

# Check how pep_table_exists gets set to True and in what condition
sed -n '1340,1360p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 1193


🏁 Script executed:

# Look at the __init__ method to see if file_df is populated there
sed -n '140,220p' pmultiqc/modules/quantms/quantms.py | cat -n

Repository: bigbio/pmultiqc

Length of output: 3567


🏁 Script executed:

# Search for any constructor or initialization code that populates file_df
rg -n "def __init__|file_df.*pd\." pmultiqc/modules/quantms/quantms.py | head -10

Repository: bigbio/pmultiqc

Length of output: 178


Add a guard to check if file_df is empty before the merge operation.

The merge at lines 1166-1170 will proceed even if self.file_df is empty (when experimental design/SDRF files are unavailable), resulting in an empty pep_table after the merge. pep_intensity_by_sample then ends up as an empty dict, yet self.quantms_pep_intensity is still the truthy two-element list [pep_intensity_by_run, {}], so the downstream check at line 2195 (if self.quantms_pep_intensity:) doesn't catch it: non-empty lists are always truthy. Add if not self.file_df.empty: before the merge or handle the case explicitly.

🧰 Tools
🪛 Ruff (0.14.10)

1177-1177: Use explicit conversion flag

Replace with conversion flag

(RUF010)


1183-1183: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🤖 Prompt for AI Agents
In pmultiqc/modules/quantms/quantms.py around lines 1166 to 1187, add a guard to
check if self.file_df is empty before performing the merge: if self.file_df is
empty, skip the merge/loop and set self.quantms_pep_intensity to an empty list
(or otherwise a falsy value) so downstream checks like `if
self.quantms_pep_intensity:` behave correctly; otherwise proceed with the merge,
grouping and population of pep_intensity_by_sample as before and assign
self.quantms_pep_intensity = [pep_intensity_by_run, pep_intensity_by_sample].
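The guard and the truthiness pitfall described above can be sketched as follows. This is a simplified illustration, not the actual quantms.py code: build_pep_intensity and the per-group counting payload are stand-ins, while file_df, the Sample/Run columns, and the two-element return list mirror the structures named in the review.

```python
import pandas as pd

def build_pep_intensity(pep_table: pd.DataFrame, file_df: pd.DataFrame):
    """Simplified sketch of the guarded run/sample aggregation."""
    pep_intensity_by_run = {}      # run-level data would be built here
    pep_intensity_by_sample = {}

    # Guard: only attempt the sample-level merge when an experimental
    # design / SDRF table was actually loaded.
    if not file_df.empty:
        merged = pep_table.merge(
            file_df[["Sample", "Run"]].drop_duplicates(),
            left_on="filename",
            right_on="Run",
        )
        for sample, group in merged.groupby("Sample"):
            pep_intensity_by_sample[f"Sample {sample}"] = len(group)
        return [pep_intensity_by_run, pep_intensity_by_sample]

    # Falsy return, so a downstream `if self.quantms_pep_intensity:`
    # check correctly short-circuits when no design is available.
    return []

# The pitfall the review describes: a list holding empty dicts is truthy.
assert bool([{}, {}]) is True
```

Without the guard, the function would always return the two-element list, and the caller's truthiness check could never detect the missing sample-level data.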

Comment on lines +2258 to +2283
def cal_charge_state(psm, sdrf_file_df):

charge_state_df = psm[["filename", "charge"]].copy()

charge_state_df = charge_state_df.merge(
right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
left_on="filename",
right_on="Run"
)

charge_state_df["Sample"] = charge_state_df["Sample"].astype(int)

charge_state_by_run = group_charge(charge_state_df, "filename", "charge")
charge_state_by_sample = group_charge(charge_state_df, "Sample", "charge")

return {
"plot_data": [
charge_state_by_run.to_dict(orient="index"),
charge_state_by_sample.to_dict(orient="index"),
],
"cats": sorted(
set(charge_state_by_run.columns)
| set(charge_state_by_sample.columns),
key=int
)
}

⚠️ Potential issue | 🟡 Minor

cal_charge_state assumes sdrf_file_df is always valid.

The function merges psm with sdrf_file_df without checking if sdrf_file_df is empty. If called with an empty DataFrame, the merge will produce no rows, causing group_charge to return empty tables.

🔎 Suggested fix to handle empty SDRF
 def cal_charge_state(psm, sdrf_file_df):
 
     charge_state_df = psm[["filename", "charge"]].copy()
 
+    if sdrf_file_df.empty:
+        charge_state_by_run = group_charge(charge_state_df, "filename", "charge")
+        return {
+            "plot_data": charge_state_by_run.to_dict(orient="index"),
+            "cats": sorted(charge_state_by_run.columns, key=int)
+        }
+
     charge_state_df = charge_state_df.merge(
         right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
         left_on="filename",
         right_on="Run"
     )
🤖 Prompt for AI Agents
In pmultiqc/modules/quantms/quantms.py around lines 2258-2283, cal_charge_state
merges psm with sdrf_file_df without checking for an empty SDRF which yields no
rows and empty groupings; add an early check: if sdrf_file_df is None or
sdrf_file_df.empty then return a consistent empty result (e.g.,
{"plot_data":[{},{}], "cats": []}) or alternatively fall back to using
psm["filename"] as the Run/Sample mapping before grouping; implement the chosen
fallback so group_charge always receives a non-empty DataFrame and ensure types
match (Sample as int) before calling group_charge.

Comment on lines +2286 to +2308
def sample_level_modifications(df, sdrf_file_df):

psm = df[
["filename", "sequence", "charge", "Modifications"]
].copy()

psm = psm.merge(
right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
left_on="filename",
right_on="Run"
)

psm["Sample"] = psm["Sample"].astype(int)

mod_plot = dict()
for sample, group in psm.groupby("Sample", sort=True):

mod_plot_dict, _ = summarize_modifications(
group[["sequence", "charge", "Modifications"]].drop_duplicates()
)
mod_plot[f"Sample {str(sample)}"] = mod_plot_dict

return mod_plot

⚠️ Potential issue | 🟡 Minor

sample_level_modifications similarly lacks empty SDRF check.

Like cal_charge_state, this function doesn't handle the case when sdrf_file_df is empty. The merge will produce no rows, resulting in an empty mod_plot dictionary.

🔎 Suggested fix
 def sample_level_modifications(df, sdrf_file_df):
 
+    if sdrf_file_df.empty:
+        return {}
+
     psm = df[
         ["filename", "sequence", "charge", "Modifications"]
     ].copy()
🧰 Tools
🪛 Ruff (0.14.10)

2306-2306: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🤖 Prompt for AI Agents
In pmultiqc/modules/quantms/quantms.py around lines 2286 to 2308,
sample_level_modifications doesn't handle an empty or missing SDRF
(sdrf_file_df) which causes the merge to yield no rows and an empty mod_plot;
add an early guard that checks if sdrf_file_df is None or
sdrf_file_df[["Sample","Run"]].drop_duplicates() is empty and immediately return
an empty dict, and otherwise proceed with the merge and subsequent processing
(also ensure after merge any empty result returns empty dict before casting
Sample to int to avoid exceptions).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Nitpick comments (11)
pmultiqc/main.py (1)

54-54: Consider mentioning dark logo in log message.

The log message only references the light logo, but the code also configures a dark logo (when supported). Including both in the log message would improve observability.

🔎 Proposed enhancement to log message
-    log.info(f"pmultiqc: injected custom logo from local path: {pmultiqc_logo}")
+    log.info(f"pmultiqc: injected custom logo(s) from local path: {pmultiqc_logo}")

Or for more detail:

-    log.info(f"pmultiqc: injected custom logo from local path: {pmultiqc_logo}")
+    log.info(f"pmultiqc: injected custom logo from local path: {pmultiqc_logo}")
+    if hasattr(config, "custom_logo_dark") and pmultiqc_logo_dark.exists():
+        log.info(f"pmultiqc: injected custom dark logo from local path: {pmultiqc_logo_dark}")
pmultiqc/modules/common/common_utils.py (3)

546-552: Add strict=True to zip() call for safer iteration.

Per static analysis (B905), using zip() without strict=True can silently truncate if the iterables have different lengths. Since both come from the same DataFrame, they should match, but explicit is better.

🔎 Suggested fix
     mod_plot_dict = dict(
-        zip(mod_group_processed["modifications"], mod_group_processed["percentage"])
+        zip(mod_group_processed["modifications"], mod_group_processed["percentage"], strict=True)
     )

532-532: Use f-string conversion flag instead of explicit str().

Per Ruff RUF010, prefer {sample!s} over {str(sample)} for cleaner f-strings.

🔎 Suggested fix
-            sample_key = f"Sample {str(sample)}"
+            sample_key = f"Sample {sample!s}"

560-564: Use f-string conversion flag and consider edge case.

Per Ruff RUF010, prefer {i!s} over {str(i)}. Note also that when group_col == "Sample" and the index is empty, the list comprehension simply produces an empty list, so no extra guard is needed.

🔎 Suggested fix
     if group_col == "Sample":
         table.index = [
-            f"Sample {str(i)}" for i in table.index
+            f"Sample {i!s}" for i in table.index
         ]
pmultiqc/modules/common/plots/dia.py (3)

578-582: Missing explicit return in else branch.

When can_groupby_for_std returns False, the function logs a warning but implicitly returns None. This is correct behavior (caller checks at line 273), but an explicit return None improves clarity.

🔎 Suggested fix
     else:
         log.warning("No SDRF available; failed to parse experimental groups; SD Intensity not generated.")
+        return None

87-90: Use f-string conversion flag instead of str().

Per Ruff RUF010, prefer {run!s} over {str(run)}.

🔎 Suggested fix
                 (
-                    f"Sample {str(run)}"
+                    f"Sample {run!s}"
                     if data_type == "Sample"
                     else str(run)
                 ): group["log_intensity"].dropna().tolist()

553-556: Use f-string conversion flag.

Per Ruff RUF010, prefer {sample!s} over {str(sample)}.

🔎 Suggested fix
         plot_data = {
-            f"Sample {str(sample)}": group["log_intensity_std"].dropna().tolist()
+            f"Sample {sample!s}": group["log_intensity_std"].dropna().tolist()
             for sample, group in grouped_std.groupby("Sample")
         }
pmultiqc/modules/common/plots/id.py (3)

414-418: Use f-string conversion flag instead of str().

Per Ruff RUF010, prefer {sample!s} over {str(sample)}.

🔎 Suggested fix
             (
-                f"{sample_prefix}{str(sample)}"
+                f"{sample_prefix}{sample!s}"
                 if "sdrf_samples" == source
                 else str(sample)
             ): {"Count": info[value_key]}

650-650: Use f-string conversion flag.

Per Ruff RUF010, prefer {sample!s} over {str(sample)}.

🔎 Suggested fix
-                        sample=SampleName(f"Sample {str(sample)}"),
+                        sample=SampleName(f"Sample {sample!s}"),

718-718: Use f-string conversion flag.

Per Ruff RUF010, prefer {sample!s} over {str(sample)}.

🔎 Suggested fix
-                        sample=SampleName(f"Sample {str(sample)}"),
+                        sample=SampleName(f"Sample {sample!s}"),
pmultiqc/modules/common/dia_utils.py (1)

854-854: Use f-string conversion flag instead of str().

Per Ruff RUF010, prefer {sample!s} over {str(sample)}.

🔎 Suggested fix
-        mod_plot[f"Sample {str(sample)}"] = mod_plot_dict
+        mod_plot[f"Sample {sample!s}"] = mod_plot_dict
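The RUF010 items above are pure refactors: the !s conversion flag is defined to call str() on the interpolated value, so the rendered output is unchanged. A quick check:

```python
# !s applies str(), !r applies repr(), !a applies ascii().
sample = 7
assert f"Sample {str(sample)}" == f"Sample {sample!s}" == "Sample 7"

value = "α"
assert f"{value!r}" == repr(value)
```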
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d20c0d3 and 5570b9c.

📒 Files selected for processing (9)
  • environment.yml
  • pmultiqc/main.py
  • pmultiqc/modules/common/common_utils.py
  • pmultiqc/modules/common/dia_utils.py
  • pmultiqc/modules/common/plots/dia.py
  • pmultiqc/modules/common/plots/id.py
  • pyproject.toml
  • recipe/meta.yaml
  • requirements.txt
🚧 Files skipped from review as they are similar to previous changes (4)
  • environment.yml
  • pyproject.toml
  • requirements.txt
  • recipe/meta.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
pmultiqc/modules/common/plots/dia.py (1)
pmultiqc/modules/common/common_utils.py (1)
  • group_charge (555-565)
pmultiqc/modules/common/plots/id.py (2)
pmultiqc/modules/quantms/sparklines.py (1)
  • plot (19-56)
pmultiqc/modules/common/common_utils.py (1)
  • condition_split (55-60)
🪛 Ruff (0.14.10)
pmultiqc/modules/common/plots/dia.py

87-87: Use explicit conversion flag

Replace with conversion flag

(RUF010)


554-554: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/common_utils.py

532-532: Use explicit conversion flag

Replace with conversion flag

(RUF010)


548-548: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


562-562: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/plots/id.py

415-415: Use explicit conversion flag

Replace with conversion flag

(RUF010)


650-650: Use explicit conversion flag

Replace with conversion flag

(RUF010)


718-718: Use explicit conversion flag

Replace with conversion flag

(RUF010)

pmultiqc/modules/common/dia_utils.py

854-854: Use explicit conversion flag

Replace with conversion flag

(RUF010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test_maxquant_dia
  • GitHub Check: test_maxquant
  • GitHub Check: test_dia
  • GitHub Check: test_diann
  • GitHub Check: test_lfq
  • GitHub Check: test_proteobench
  • GitHub Check: test_mzid_mzML
  • GitHub Check: test_tmt
  • GitHub Check: test_mzid_mgf
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (5)
pmultiqc/main.py (2)

40-40: LGTM! Dark logo path follows consistent pattern.

The dark logo path construction mirrors the existing light logo pattern and uses appropriate Path operations.


47-52: Past duplicate conditional issue has been resolved.

The previous review flagged duplicate conditionals checking custom_logo_width. The current code now correctly checks two different attributes: custom_logo_dark (line 48) and custom_logo_width (line 51).

pmultiqc/modules/common/plots/id.py (1)

619-622: LGTM - Defensive .get() access pattern.

Good use of .get() on cal_num_table_data to safely access sdrf_samples and ms_runs, addressing the pattern from previous review comments.

pmultiqc/modules/common/dia_utils.py (2)

262-289: LGTM - Clean refactor of _calculate_run_statistics.

The function now returns a tuple (stat_run, data_per_run) instead of mutating external state. This is a good separation of concerns and improves testability.


241-246: LGTM - Proper structure for cal_num_table_data.

The data structure with "sdrf_samples" and "ms_runs" keys provides clear separation between sample-level and run-level data, aligning well with the SDRF-aware workflow.

Comment on lines +48 to +49
if hasattr(config, "custom_logo_dark"):
config.custom_logo_dark = str(pmultiqc_logo_dark)

⚠️ Potential issue | 🟡 Minor

Verify dark logo file existence before use.

The code checks whether pmultiqc_logo exists (line 42) but does not verify pmultiqc_logo_dark exists before setting config.custom_logo_dark. If the dark logo file is missing, this could cause issues at render time.

🔎 Proposed fix to add existence check
     # Supported starting from MultiQC v1.33
-    if hasattr(config, "custom_logo_dark"):
+    if hasattr(config, "custom_logo_dark") and pmultiqc_logo_dark.exists():
         config.custom_logo_dark = str(pmultiqc_logo_dark)
🤖 Prompt for AI Agents
In pmultiqc/main.py around lines 48-49, the code sets config.custom_logo_dark
without verifying the pmultiqc_logo_dark file exists; update this to check the
file exists (e.g., Path(pmultiqc_logo_dark).exists() or os.path.exists) before
assigning config.custom_logo_dark = str(pmultiqc_logo_dark), and if the file
does not exist simply skip setting the attribute (or log a warning) to avoid
render-time errors.

Comment on lines +276 to +280
if "Modifications" in df_copy.columns and "modifications" not in df_copy.columns:
df_copy = df_copy.rename(columns={"Modifications": "modifications"})
else:
log.warning('Detected both "Modifications" and "modifications" columns.')


⚠️ Potential issue | 🟡 Minor

Incomplete column handling when both columns exist.

When both "Modifications" and "modifications" columns exist, the code logs a warning but makes no attempt to reconcile the two columns before line 281 uses df_copy["modifications"]. The else branch also fires when only "modifications" exists (harmless, though the warning is then misleading) and when neither column exists, in which case df_copy["modifications"] on line 281 raises a KeyError.

🔎 Suggested fix
-    if "Modifications" in df_copy.columns and "modifications" not in df_copy.columns:
-        df_copy = df_copy.rename(columns={"Modifications": "modifications"})
-    else:
-        log.warning('Detected both "Modifications" and "modifications" columns.')
+    if "Modifications" in df_copy.columns and "modifications" in df_copy.columns:
+        log.warning('Detected both "Modifications" and "modifications" columns. Using "modifications".')
+    elif "Modifications" in df_copy.columns:
+        df_copy = df_copy.rename(columns={"Modifications": "modifications"})
+    elif "modifications" not in df_copy.columns:
+        raise ValueError("Neither 'Modifications' nor 'modifications' column found in DataFrame.")
🤖 Prompt for AI Agents
In pmultiqc/modules/common/common_utils.py around lines 276-280, the current
else branch treats all non-rename cases as "both exist" and only logs a warning,
causing later code to fail if the lowercase column doesn't actually exist;
change the logic to explicitly handle three cases: (1) if both "Modifications"
and "modifications" exist, populate/merge into "modifications" (e.g., fillna
from "Modifications" or prefer non-null values) and then drop the
"Modifications" column while logging a warning; (2) if only "Modifications"
exists, rename it to "modifications" as before; (3) if only "modifications"
exists, do nothing; ensure after this block df_copy always has a "modifications"
column when either original column was present.

Comment on lines +457 to +458
sample_file_df["Sample"] = sample_file_df["Sample"].astype(int)


⚠️ Potential issue | 🟡 Minor

Potential ValueError if Sample column contains non-numeric values.

astype(int) will raise a ValueError if the Sample column contains non-integer-convertible values. Consider adding error handling or validation.

🔎 Suggested fix
     sample_file_df = file_df.copy()
-    sample_file_df["Sample"] = sample_file_df["Sample"].astype(int)
+    try:
+        sample_file_df["Sample"] = pd.to_numeric(sample_file_df["Sample"], errors="raise").astype(int)
+    except (ValueError, TypeError) as e:
+        log.warning(f"Failed to convert Sample column to int: {e}. Returning empty dict.")
+        return dict()
🤖 Prompt for AI Agents
In pmultiqc/modules/common/common_utils.py around lines 457-458, the direct use
of sample_file_df["Sample"].astype(int) can raise a ValueError for non-numeric
values; replace this with a safe conversion using pandas.to_numeric (e.g.,
sample_file_df["Sample"] = pd.to_numeric(sample_file_df["Sample"].str.strip(),
errors="coerce")), then check for resulting NaNs and either raise a clear
ValueError listing offending rows or handle them (drop/fill) as appropriate for
the pipeline, ensuring any error message includes filename/row info for easier
debugging.

Comment on lines +248 to 256
mod_plot_by_sample = dia_sample_level_modifications(
df=report_data,
sdrf_file_df=sdrf_file_df
)

# Update quantms_modified with processed data
quantms_modified["plot_data"] = mod_plot_dict
quantms_modified["plot_data"] = [mod_plot_by_run, mod_plot_by_sample]
quantms_modified["cats"] = list(
sorted(modified_cats, key=lambda x: (x == "Modified (Total)", x))

⚠️ Potential issue | 🟡 Minor

dia_sample_level_modifications called unconditionally may fail with empty SDRF.

The function dia_sample_level_modifications performs a merge with sdrf_file_df without checking if it's empty. If sdrf_file_df is empty, the merge will produce an empty DataFrame, and subsequent operations may not behave as expected.

🔎 Suggested fix
-    mod_plot_by_sample = dia_sample_level_modifications(
-        df=report_data,
-        sdrf_file_df=sdrf_file_df
-    )
+    if sdrf_file_df is not None and not sdrf_file_df.empty:
+        mod_plot_by_sample = dia_sample_level_modifications(
+            df=report_data,
+            sdrf_file_df=sdrf_file_df
+        )
+    else:
+        mod_plot_by_sample = {}
 
     # Update quantms_modified with processed data
-    quantms_modified["plot_data"] = [mod_plot_by_run, mod_plot_by_sample]
+    if mod_plot_by_sample:
+        quantms_modified["plot_data"] = [mod_plot_by_run, mod_plot_by_sample]
+    else:
+        quantms_modified["plot_data"] = mod_plot_by_run
🤖 Prompt for AI Agents
In pmultiqc/modules/common/dia_utils.py around lines 248 to 256,
dia_sample_level_modifications is called unconditionally with sdrf_file_df which
can be empty and cause merges to yield empty DataFrames and break downstream
logic; modify the code to first check if sdrf_file_df is empty (e.g., if
sdrf_file_df is None or sdrf_file_df.empty) and only call
dia_sample_level_modifications when it contains rows, otherwise set
mod_plot_by_sample to a safe default (empty DataFrame or structure matching
expected output) and ensure quantms_modified["plot_data"] and
quantms_modified["cats"] are populated consistently so downstream code does not
assume non-empty merge results.

Comment on lines +837 to +856
def dia_sample_level_modifications(df, sdrf_file_df):

report_data = df.copy()

report_data = report_data.merge(
right=sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
on="Run"
)

report_data["Sample"] = report_data["Sample"].astype(int)

mod_plot = dict()
for sample, group in report_data.groupby("Sample", sort=True):

mod_plot_dict, _ = summarize_modifications(
group.drop_duplicates()
)
mod_plot[f"Sample {str(sample)}"] = mod_plot_dict

return mod_plot

⚠️ Potential issue | 🟡 Minor

Missing empty DataFrame guard in dia_sample_level_modifications.

The function assumes sdrf_file_df has data. If called with an empty DataFrame, the merge at lines 841-844 will produce an empty result, and the subsequent groupby will iterate over nothing, returning an empty mod_plot. Consider adding an early return.

🔎 Suggested fix
 def dia_sample_level_modifications(df, sdrf_file_df):
 
+    if sdrf_file_df is None or sdrf_file_df.empty:
+        return {}
+
     report_data = df.copy()
 
     report_data = report_data.merge(
🧰 Tools
🪛 Ruff (0.14.10)

854-854: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🤖 Prompt for AI Agents
In pmultiqc/modules/common/dia_utils.py around lines 837 to 856, the function
dia_sample_level_modifications assumes sdrf_file_df has rows and proceeds to
merge and group; add an early-empty guard: if sdrf_file_df is empty (or if the
merge results in an empty report_data) immediately return an empty dict (or the
same type as mod_plot) to avoid downstream processing; implement the check
before the merge (and a second check after the merge) and return {} when empty.

Comment on lines +75 to +80
if not sdrf_file_df.empty:

df_sub = df_sub.merge(
sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
on="Run"
)

⚠️ Potential issue | 🟡 Minor

Potential AttributeError if sdrf_file_df is None.

The function signature accepts sdrf_file_df but checks sdrf_file_df.empty directly. If the caller passes None instead of an empty DataFrame, this will raise AttributeError.

🔎 Suggested fix
-    if not sdrf_file_df.empty:
+    if sdrf_file_df is not None and not sdrf_file_df.empty:
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/dia.py around lines 75 to 80, the code checks
sdrf_file_df.empty directly which will raise AttributeError if sdrf_file_df is
None; change the conditional to explicitly guard against None (e.g. if
sdrf_file_df is not None and not sdrf_file_df.empty) before calling .merge, so
the merge only runs when sdrf_file_df is a real DataFrame with rows.

Comment on lines +210 to +213
if sdrf_file_df.empty:

bar_data = stat_data_by_run.to_dict(orient="index")
plot_label = ["by Run"]

⚠️ Potential issue | 🟡 Minor

Same None-safety issue for sdrf_file_df.empty check.

Consistent with draw_dia_intensity_dis, this check should guard against None.

🔎 Suggested fix
-    if sdrf_file_df.empty:
+    if sdrf_file_df is None or sdrf_file_df.empty:
 
         bar_data = stat_data_by_run.to_dict(orient="index")
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/dia.py around lines 210 to 213, the check "if
sdrf_file_df.empty:" is unsafe if sdrf_file_df can be None; change the guard to
"if sdrf_file_df is None or sdrf_file_df.empty:" (note that "not sdrf_file_df"
would itself raise, since the truth value of a DataFrame is ambiguous) so the
branch executes when sdrf_file_df is None or an empty DataFrame, then proceed
to build bar_data and plot_label as before.

Comment on lines +540 to +545
if not sdrf_file_df.empty:

    plot_data = {
        condition: group["log_intensity_std"].dropna().tolist()
        for condition, group in grouped_std.groupby("run_condition")
    }
    df_sub = df_sub.merge(
        sdrf_file_df[["Sample", "Run"]].drop_duplicates(),
        on="Run"
    )

⚠️ Potential issue | 🟡 Minor

Guard against None for sdrf_file_df in calculate_dia_intensity_std.

Same pattern as other functions - should check for None before accessing .empty.

🔎 Suggested fix
-    if not sdrf_file_df.empty:
+    if sdrf_file_df is not None and not sdrf_file_df.empty:
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/dia.py around lines 540 to 545, the code checks
sdrf_file_df.empty without guarding against sdrf_file_df being None; update the
condition to first check that sdrf_file_df is not None (e.g., if sdrf_file_df is
not None and not sdrf_file_df.empty:) and keep the existing merge only when that
check passes so the function safely skips the merge when sdrf_file_df is None.

Comment on lines +176 to 177
"data_labels": ["by Run", "by Sample"]
}

⚠️ Potential issue | 🟡 Minor

Hardcoded data_labels may not match actual data structure.

The data_labels is hardcoded to ["by Run", "by Sample"], but if missed_cleavages_data["plot_data"] contains only one dataset (not a list), this will cause a mismatch between labels and data.

🔎 Suggested fix
+    if isinstance(missed_cleavages_data["plot_data"], list):
+        plot_label = ["by Run", "by Sample"]
+    else:
+        plot_label = ["by Run"]
+
     draw_config = {
         "id": "missed_cleavages",
         "cpswitch": False,
         "cpswitch_c_active": False,
         "title": "Missed Cleavages",
         "tt_decimals": 2,
         "ylab": "Missed Cleavages [%]",
-        "data_labels": ["by Run", "by Sample"]
+        "data_labels": plot_label
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    if isinstance(missed_cleavages_data["plot_data"], list):
        plot_label = ["by Run", "by Sample"]
    else:
        plot_label = ["by Run"]

    draw_config = {
        "id": "missed_cleavages",
        "cpswitch": False,
        "cpswitch_c_active": False,
        "title": "Missed Cleavages",
        "tt_decimals": 2,
        "ylab": "Missed Cleavages [%]",
        "data_labels": plot_label
    }
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/id.py around lines 176-177, data_labels is
hardcoded to ["by Run", "by Sample"], which mismatches when
missed_cleavages_data["plot_data"] is a single dataset rather than a list;
detect whether plot_data is a list and build data_labels dynamically (wrap a
single dataset in a list, or generate one label per dataset using sensible
defaults or indexed names) so that the length of data_labels always matches
the number of datasets in plot_data.
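
One way to keep labels and datasets in sync is to derive the labels from the data itself; `derive_data_labels` is a hypothetical helper, not part of pmultiqc:

```python
def derive_data_labels(plot_data, base_labels=("by Run", "by Sample")):
    """Return exactly one label per dataset in plot_data."""
    datasets = plot_data if isinstance(plot_data, list) else [plot_data]
    return [
        base_labels[i] if i < len(base_labels) else f"dataset {i + 1}"
        for i in range(len(datasets))
    ]

print(derive_data_labels([{"r1": 1}, {"s1": 2}]))  # ['by Run', 'by Sample']
print(derive_data_labels({"r1": 1}))               # ['by Run']
```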

Comment on lines +490 to 500
mc_ratio = list()

for k in ["ms_runs", "sdrf_samples"]:
    mc_ratio.append(rebuild_dict_structure(quantms_missed_cleavages[k]))

mc_data = {
    "plot_data": mc_ratio,
    "cats": ["0", "1", ">=2"]
}

draw_msms_missed_cleavages(sub_sections, mc_data, False)

⚠️ Potential issue | 🟡 Minor

Potential KeyError when accessing quantms_missed_cleavages keys.

The loop iterates over ["ms_runs", "sdrf_samples"] but doesn't verify these keys exist in quantms_missed_cleavages. If sdrf_samples is missing, this will raise a KeyError.

🔎 Suggested fix
-        mc_ratio = list()
-
-        for k in ["ms_runs", "sdrf_samples"]:
-            mc_ratio.append(rebuild_dict_structure(quantms_missed_cleavages[k]))
+        mc_ratio = [
+            rebuild_dict_structure(quantms_missed_cleavages[k])
+            for k in ["ms_runs", "sdrf_samples"]
+            if k in quantms_missed_cleavages and quantms_missed_cleavages[k]
+        ]
+
+        if not mc_ratio:
+            mc_ratio = [rebuild_dict_structure(quantms_missed_cleavages.get("ms_runs", {}))]
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/id.py around lines 490 to 500, the loop
directly indexes quantms_missed_cleavages for keys "ms_runs" and "sdrf_samples"
which can raise KeyError if a key is missing; change the loop to safely access
those keys (e.g., use dict.get(key, {}) or check key in
quantms_missed_cleavages) and ensure rebuild_dict_structure receives a valid
default (empty structure) so mc_ratio always appends a safe value even when a
key is absent before calling draw_msms_missed_cleavages.
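
The safe-access pattern from this prompt can be sketched as follows; `identity` stands in for rebuild_dict_structure, and the empty-dict fallback shape is an assumption:

```python
def collect_mc_ratio(quantms_missed_cleavages, rebuild):
    """Collect per-level ratios, skipping levels that are absent or empty."""
    ratios = [
        rebuild(quantms_missed_cleavages[k])
        for k in ("ms_runs", "sdrf_samples")
        if quantms_missed_cleavages.get(k)
    ]
    # Fall back to an (empty) run-level entry so callers always get a list
    return ratios or [rebuild(quantms_missed_cleavages.get("ms_runs", {}))]

identity = lambda d: d
print(collect_mc_ratio({"ms_runs": {"r1": [1, 0, 0]}}, identity))
# [{'r1': [1, 0, 0]}] -- no KeyError despite the missing "sdrf_samples" key
```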

@ypriverol ypriverol merged commit 438cdfa into main Dec 25, 2025
20 of 21 checks passed
coderabbitai bot mentioned this pull request on Dec 26, 2025.