FragPipe first implementation#495

Merged
ypriverol merged 14 commits into main from dev
Jan 4, 2026

Conversation

@ypriverol

@ypriverol ypriverol commented Jan 4, 2026

User description

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test addition/update
  • Dependency update

PR Type

Enhancement, Tests, Documentation


Description

  • New FragPipe module implementation with complete data parsing and visualization capabilities for PSM (Peptide Spectrum Match) analysis

  • Parses PSM data including delta mass, charge states, and retention times; generates plots for mass error, charge-state distribution, identification statistics, and retention time analysis

  • Calculates pipeline statistics including protein and peptide counts with modification tracking

  • Refactored common plotting functions to support multiple pipelines: renamed draw_quantms_identification() to draw_identification() and updated parameter names for broader applicability

  • Updated existing modules (QuantMS, DIA-NN, mzIdentML) to use renamed functions and inherit from BasePMultiqcModule for code reuse

  • Refactored core plugin loading system with centralized PLUGIN_MAP configuration and loop-based plugin loading for maintainability

  • Added FragPipe CLI option (--fragpipe-plugin) and TSV file configuration for FragPipe support

  • Comprehensive documentation including README updates, usage examples, and example dataset configurations

  • CI/CD integration with new FragPipe test job that downloads test data from PRIDE repository and validates plugin functionality


Diagram Walkthrough

flowchart LR
  A["FragPipe<br/>Module"] -->|"Parses PSM data"| B["fragpipe.py<br/>Data Processing"]
  A -->|"File I/O"| C["fragpipe_io.py<br/>TSV Reader"]
  B -->|"Uses"| D["Common Plots<br/>Identification"]
  D -->|"Refactored"| E["QuantMS<br/>DIA-NN<br/>mzIdentML"]
  F["CLI Option<br/>--fragpipe-plugin"] -->|"Enables"| A
  G["Core Plugin<br/>System"] -->|"Loads via<br/>PLUGIN_MAP"| A
  H["CI/CD Pipeline"] -->|"Tests"| A
  I["Documentation<br/>& Examples"] -->|"Supports"| A

File Walkthrough

Relevant files
Enhancement
6 files
fragpipe.py
FragPipe module implementation with data parsing and plotting

pmultiqc/modules/fragpipe/fragpipe.py

  • New FragPipe module implementation with data parsing and visualization
    capabilities
  • Parses PSM (Peptide Spectrum Match) data including delta mass, charge
    states, and retention times
  • Generates plots for mass error, charge-state distribution,
    identification statistics, and retention time analysis
  • Calculates pipeline statistics including protein and peptide counts
    with modification tracking
+322/-0 
fragpipe_io.py
FragPipe file I/O and data reading utilities                         

pmultiqc/modules/fragpipe/fragpipe_io.py

  • New I/O module for FragPipe file handling and data reading
  • Implements get_fragpipe_files() to locate and load FragPipe TSV files
  • Implements psm_reader() to parse PSM data with column validation and
    run extraction
  • Defines required columns for PSM data validation
+111/-0 
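The run-extraction step described above can be sketched in isolation. The `rsplit(".", 3)` split assumes FragPipe's `run.scan.scan.charge` spectrum naming, and the helper name below is hypothetical, not a function in `fragpipe_io.py`:

```python
def run_from_spectrum(spectrum: str) -> str:
    # FragPipe "Spectrum" values typically look like "run_01.00123.00123.2"
    # (run.scan.scan.charge). Splitting at most three times from the right
    # strips the scan/charge fields while preserving dots inside the run name.
    return str(spectrum).rsplit(".", 3)[0]

print(run_from_spectrum("run_01.00123.00123.2"))    # run_01
print(run_from_spectrum("sample.A.00045.00045.3"))  # sample.A
```

The same expression appears vectorized in the module as `psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]`.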
__init__.py
FragPipe module package initialization                                     

pmultiqc/modules/fragpipe/__init__.py

  • New package initialization file for FragPipe module
  • Exports FragPipeModule class for public API
+3/-0     
dia.py
Update DIA charge-state and delta mass plot configurations

pmultiqc/modules/common/plots/dia.py

  • Updated charge-state plot configuration ID and title for consistency
  • Added cpswitch_c_active parameter to charge-state plot configuration
  • Removed filtering condition in delta mass calculation to include zero
    values
  • Added HTML validation check for delta mass plot output
+7/-4     
cli.py
Add FragPipe plugin command-line option                                   

pmultiqc/cli.py

  • Added --fragpipe-plugin command-line option to enable FragPipe plugin
+3/-0     
multiqc_report.html
Update MultiQC report with FragPipe support                           

docs/PXD054720_disable_hoverinfo/multiqc_report.html

  • Updated compressed plot data with new report UUID
  • Modified report metadata to include FragPipe in the list of supported
    tools
  • Updated report generation timestamp from 2025-12-30 to 2026-01-04
  • Added FragPipe link reference in the pmultiqc description section
+7/-6     
Refactoring
5 files
id.py
Refactor function names and parameters for multi-pipeline support

pmultiqc/modules/common/plots/id.py

  • Renamed draw_quantms_identification() to draw_identification() for
    broader applicability
  • Renamed draw_quantms_identi_num() to draw_identi_num() for consistency
  • Updated draw_summary_protein_ident_table() parameter from enable_dia
    to use_two_columns
  • Updated draw_num_pep_per_protein() parameter from enable_mzid to
    is_fragpipe_or_mzid
  • Enhanced documentation to support FragPipe and updated help text for
    delta mass and retention time
  • Added FragPipe-specific retention time handling in draw_ids_rt_count()
+16/-9   
quantms.py
Refactor QuantMS module to use base class and updated function names

pmultiqc/modules/quantms/quantms.py

  • Updated QuantMSModule to inherit from BasePMultiqcModule for code
    reuse
  • Simplified initialization by calling parent class constructor
  • Updated function calls to use renamed functions (draw_identification,
    draw_identi_num)
  • Updated parameter names in function calls (use_two_columns instead of
    enable_dia)
  • Fixed code formatting/indentation for mass error calculation
+14/-14 
diann.py
Refactor DIA-NN module to use base class and updated function names

pmultiqc/modules/diann/diann.py

  • Updated DiannModule to inherit from BasePMultiqcModule for code reuse
  • Simplified initialization by calling parent class constructor
  • Updated function calls to use renamed functions (draw_identification,
    draw_identi_num)
  • Updated parameter names in function calls (use_two_columns instead of
    enable_dia)
+9/-9     
mzidentml.py
Update mzIdentML module function call                                       

pmultiqc/modules/mzidentml/mzidentml.py

  • Updated function call from draw_quantms_identification() to
    draw_identification()
+1/-1     
core.py
Refactor core plugin loading system with centralized configuration

pmultiqc/modules/core/core.py

  • Added PLUGIN_MAP dictionary to centralize plugin configuration mapping
  • Refactored plugin loading logic into load_and_run_plugin() method for
    maintainability
  • Replaced multiple if-elif statements with loop-based plugin loading
  • Updated module description to include FragPipe support
  • Fixed heatmap_color_list variable scope by assigning to instance
    variable
+32/-36 
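A minimal sketch of the loop-based loading pattern described above: resolve the plugin class from a central map, parse inputs, and draw plots only when required files were found. The exact shape of PLUGIN_MAP and the constructor arguments in core.py are assumptions for illustration:

```python
import importlib
from types import SimpleNamespace

# Hypothetical shape: CLI flag name -> (module path, class name).
PLUGIN_MAP = {
    "fragpipe_plugin": ("pmultiqc.modules.fragpipe", "FragPipeModule"),
    "diann_plugin": ("pmultiqc.modules.diann", "DiannModule"),
}

def load_and_run_plugin(flag_name, plugin_map=PLUGIN_MAP,
                        importer=importlib.import_module, **kwargs):
    # Replaces per-plugin if/elif blocks: look up the module path and class
    # name, import dynamically, then run the common get_data/draw_plots flow.
    module_path, class_name = plugin_map[flag_name]
    plugin_cls = getattr(importer(module_path), class_name)
    plugin = plugin_cls(**kwargs)
    if plugin.get_data():
        plugin.draw_plots()
        return True
    return False

# Demo with a stand-in plugin (no real pmultiqc import needed):
class _DemoPlugin:
    def get_data(self):
        return True
    def draw_plots(self):
        print("plots drawn")

demo_mod = SimpleNamespace(DemoPlugin=_DemoPlugin)
ok = load_and_run_plugin(
    "demo",
    plugin_map={"demo": ("demo.module", "DemoPlugin")},
    importer=lambda path: demo_mod,
)
print(ok)  # True
```

Injecting the importer keeps the loader testable without importing real plugin packages.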
Formatting
1 file
maxquant_utils.py
Code formatting fix for percentage calculation                     

pmultiqc/modules/maxquant/maxquant_utils.py

  • Fixed code formatting/indentation for percentage calculation in
    peptide per protein analysis
+2/-2     
Configuration changes
3 files
main.py
Add TSV file configuration for FragPipe support                   

pmultiqc/main.py

  • Added pmultiqc/tsv configuration for FragPipe TSV file detection
  • Improved code formatting for configuration update calls with better
    line breaks
+24/-6   
config.json
Add FragPipe example dataset configurations                           

docs/config.json

  • Added two new example configurations for FragPipe (PXD070239 and
    PXD070239_disable_hoverinfo)
  • Configured to download psm.tsv from PRIDE FTP repository
+16/-0   
pyproject.toml
Register FragPipe plugin CLI entry point                                 

pyproject.toml

  • Added fragpipe_plugin entry point for CLI command registration
+1/-0     
Documentation
4 files
update_examples.py
Add FragPipe example report generation                                     

docs/update_examples.py

  • Added FragPipe plugin command generation for example report creation
+3/-0     
README.md
Add FragPipe documentation and usage examples                       

docs/README.md

  • Added FragPipe as supported data source with psm.tsv file requirement
  • Added FragPipe usage example with --fragpipe-plugin command
  • Added FragPipe to command-line options table
  • Added FragPipe example report link to documentation
+11/-0   
multiqc_report.html
Update example report with FragPipe documentation               

docs/PXD054720/multiqc_report.html

  • Updated report UUID and creation timestamp
  • Updated module description to include FragPipe support with link
  • Updated AI report metadata to reflect FragPipe addition
+7/-6     
README.md
Document FragPipe support and usage instructions                 

README.md

  • Added FragPipe to the list of supported data sources with psm.tsv file
    format
  • Included usage instructions for FragPipe plugin with example command
  • Added --fragpipe-plugin to the command-line options table with
    description
+10/-0   
Tests
1 file
python-app.yml
Add FragPipe integration test to CI/CD pipeline                   

.github/workflows/python-app.yml

  • Added new test_fragpipe job to the CI/CD workflow
  • Downloads FragPipe test data from PRIDE repository
  • Runs MultiQC with --fragpipe-plugin flag on test data
  • Uploads generated results as artifacts for validation
+24/-1   
Additional files
18 files
multiqc_report.html +37/-35 
multiqc_report.html +37/-35 
multiqc_report.html +9/-7     
multiqc_report.html +9/-7     
multiqc_report.html +9/-7     
multiqc_report.html +11/-9   
multiqc_report.html +10/-8   
multiqc_report.html +9/-7     
multiqc_report.html +7/-6     
multiqc_report.html +7/-6     
multiqc_report.html +3371/-0
multiqc_report.html +3371/-0
multiqc_report.html +7/-6     
multiqc_report.html +7/-6     
multiqc_report.html +9/-7     
multiqc_report.html +9/-7     
multiqc_report.html +37/-35 
multiqc_report.html +37/-35 

Summary by CodeRabbit

Release Notes

  • New Features

    • Added FragPipe plugin support for analyzing PSM data files with new --fragpipe-plugin CLI option
    • Added FragPipe-specific visualizations for delta mass, charge state, and retention time analysis
    • Implemented dynamic plugin loading system for improved plugin management
  • Documentation

    • Added FragPipe data source documentation and usage examples
  • Improvements

    • Enhanced delta mass plot calculations and charge state handling


@ypriverol ypriverol requested review from Copilot and yueqixuan January 4, 2026 16:43
@coderabbitai
Contributor

coderabbitai bot commented Jan 4, 2026

📝 Walkthrough


This PR adds comprehensive FragPipe support to pmultiqc by introducing a new FragPipe module for PSM data parsing and visualization, refactoring the plugin system to use dynamic loading via PLUGIN_MAP, renaming identification-related API functions for consistency, and updating dependent modules to adopt the new patterns and base class.

Changes

Cohort / File(s) Summary
FragPipe Module Implementation
pmultiqc/modules/fragpipe/__init__.py, pmultiqc/modules/fragpipe/fragpipe.py, pmultiqc/modules/fragpipe/fragpipe_io.py
New complete FragPipe support: FragPipeModule parses PSM data, extracts delta masses, charge states, retention times, and pipeline statistics; fragpipe_io provides file discovery and PSM reading with column validation.
Plugin System Refactoring
pmultiqc/modules/core/core.py, pmultiqc/cli.py, pyproject.toml
Introduces PLUGIN_MAP for dynamic plugin loading, adds --fragpipe-plugin flag, replaces hard-coded plugin blocks with generic loader via load_and_run_plugin().
Identification Plotting API Refactoring
pmultiqc/modules/common/plots/id.py
Renames draw_quantms_identification → draw_identification and draw_quantms_identi_num → draw_identi_num; updates parameter names: enable_dia → use_two_columns, enable_mzid → is_fragpipe_or_mzid; adds FragPipe-specific branches for the retention-time plot.
DIA Module Visualization Updates
pmultiqc/modules/common/plots/dia.py
Simplifies charge-state section naming; adjusts delta mass data selection to include zero values; adds plot_html_check post-processing.
Module Updates with New Base Class & API
pmultiqc/modules/diann/diann.py, pmultiqc/modules/quantms/quantms.py, pmultiqc/modules/mzidentml/mzidentml.py
Updates to inherit from BasePMultiqcModule; calls renamed identification functions; adapts parameter names for consistency across DiannModule, QuantMSModule, and MzIdentML processing.
Documentation & Configuration
README.md, docs/README.md, docs/config.json, docs/update_examples.py, pmultiqc/main.py
Adds FragPipe data source documentation, CLI usage examples; adds two FragPipe project entries to config; adds FragPipe plugin branch in example runner; adds TSV file pattern to pmultiqc config.
Report & Report Metadata
docs/PXD054720/multiqc_report.html, docs/PXD054720_disable_hoverinfo/multiqc_report.html, .github/workflows/python-app.yml
Updates report UUIDs and timestamps; reinstates proteobench path whitespace; adds new test_fragpipe CI job.
Minor Refactoring
pmultiqc/modules/maxquant/maxquant_utils.py
Formatting adjustment to percentage calculation (no semantic change).

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Core as PMultiQC Core
    participant PM as PLUGIN_MAP
    participant Loader as Plugin Loader
    participant Fragment as FragPipe Module
    participant Log as Log Parser

    CLI->>Core: Initialize with --fragpipe-plugin flag
    Core->>Core: Check PLUGIN_MAP
    Core->>PM: Lookup fragpipe_plugin key
    PM-->>Core: Return module path
    Core->>Loader: load_and_run_plugin()
    Loader->>Fragment: Dynamically import & instantiate<br/>FragPipeModule(find_log_files, sub_sections, heatmap_colors)
    Fragment->>Log: find_log_files() discover PSM data
    Log-->>Fragment: Return fragpipe_files dict
    Fragment->>Fragment: parse_psm() extract delta masses,<br/>charges, stats, retention
    Fragment->>Fragment: draw_plots() generate visualizations
    Fragment-->>Core: Return data & plots
    Core->>Core: Add plots to report
    Core-->>CLI: Report generated

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

Review effort 4/5

Suggested reviewers

  • daichengxin
  • yueqixuan

Poem

🐰 A rabbit hops through FragPipe's data stream,
Parsing PSM files and charge-state dreams,
Plugins now dance in dynamic grace,
While renamed APIs find their place! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 3.13%, which is below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'FragPipe first implementation' clearly and specifically describes the main change, adding initial FragPipe plugin support, which is confirmed by the extensive changes across multiple files.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.3)
docs/PXD054720/multiqc_report.html
docs/PXD054720_disable_hoverinfo/multiqc_report.html


@qodo-code-review
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Resource exhaustion

Description: Unbounded pd.read_csv() on attacker-controlled *.tsv inputs can enable resource-exhaustion
(CPU/RAM) denial-of-service by providing extremely large or pathological TSV files (no
size/row limits, dtype constraints, or chunking).
fragpipe_io.py [72-85]

Referred Code
def psm_reader(file_path: str):

    psm_df = pd.read_csv(file_path, sep="\t")

    if "Spectrum" not in psm_df.columns:
        raise ValueError("psm.tsv must contain a 'Spectrum' column")

    required_cols = [c for c in REQUIRED_COLS["psm"] if c in psm_df.columns]

    psm_df["Run"] = psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]
    required_cols.append("Run")

    return psm_df[required_cols].copy()
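One hedged way to address the unbounded read noted above is a size guard before `pd.read_csv()`, optionally combined with its `nrows=`/`chunksize=` parameters. The 500 MB cap and the helper names below are illustrative, not pmultiqc settings:

```python
import os

MAX_TSV_BYTES = 500 * 1024 * 1024  # illustrative cap, not a real pmultiqc limit

def within_size_limit(num_bytes: int, max_bytes: int = MAX_TSV_BYTES) -> bool:
    # Pure size comparison, kept separate so it is easy to test.
    return num_bytes <= max_bytes

def check_tsv_size(file_path: str, max_bytes: int = MAX_TSV_BYTES) -> bool:
    # Refuse pathological inputs before pandas allocates memory for them.
    return within_size_limit(os.path.getsize(file_path), max_bytes)

print(within_size_limit(10, 100))   # True
print(within_size_limit(200, 100))  # False
```

A caller could invoke `check_tsv_size(file_path)` at the top of `psm_reader()` and raise a clear ValueError when it returns False.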
Unpinned CI download

Description: CI downloads external test data via wget from a remote PRIDE URL without pinning a
checksum/signature, creating a supply-chain risk where compromised/altered remote content
could influence the build and execution environment.
python-app.yml [274-277]

Referred Code
- name: Test FragPipe file
  run: |
    wget -nv -P ./fragpipe https://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD070239/psm.tsv
    multiqc --fragpipe-plugin ./fragpipe -o ./results_fragpipe
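To address the unpinned-download concern, the workflow could record a checksum next to the URL and verify it before running multiqc. A Python sketch of the verification step follows; the expected digest for psm.tsv is not known here and would have to be pinned when the test data is first fetched:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_download(data: bytes, expected_sha256: str) -> bool:
    # Fail the CI job on mismatch instead of feeding tampered data to multiqc.
    return sha256_hex(data) == expected_sha256

# SHA-256 of the empty byte string, shown only to illustrate the comparison.
print(verify_download(
    b"",
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
))  # True
```

In the workflow itself, the equivalent one-liner would be a pinned `sha256sum --check` step after the wget.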
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Misspelled identifier: The new variable name categorys is misspelled and reduces readability/self-documentation.

Referred Code
categorys = OrderedDict()
categorys["Frequency"] = {
    "name": "Frequency",
    "description": "number of peptides per proteins",
}

pep_plot.to_dict(percentage=True, cats=categorys)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
None not handled: fragpipe_io.get_fragpipe_files() can return None, but get_data() assumes a dict and
immediately indexes self.fragpipe_files["psm"], which can raise an exception
instead of degrading gracefully.

Referred Code
self.fragpipe_files = fragpipe_io.get_fragpipe_files(self.find_log_files)

if self.fragpipe_files["psm"]:
    (
        self.delta_masses,
        self.charge_states,
        self.pipeline_stats,
        self.retentions
    ) = self.parse_psm(
        fragpipe_files=self.fragpipe_files
    )
else:
    log.warning("Required input not found: psm.tsv")
    return False

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Weak input validation: psm_reader() does not validate delimiter/encoding or enforce required columns beyond
Spectrum, so malformed/unexpected TSV inputs may cause downstream failures rather than
being rejected with actionable validation errors.

Referred Code
def psm_reader(file_path: str):

    psm_df = pd.read_csv(file_path, sep="\t")

    if "Spectrum" not in psm_df.columns:
        raise ValueError("psm.tsv must contain a 'Spectrum' column")

    required_cols = [c for c in REQUIRED_COLS["psm"] if c in psm_df.columns]

    psm_df["Run"] = psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]
    required_cols.append("Run")

    return psm_df[required_cols].copy()

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No user context: Logging is present for file parsing and plot generation but does not include user
identity/context required by the audit trail criteria, which may or may not be applicable
to this CLI/tooling context.

Referred Code
    log.info("Starting data recognition and processing...")

    self.fragpipe_files = fragpipe_io.get_fragpipe_files(self.find_log_files)

    if self.fragpipe_files["psm"]:
        (
            self.delta_masses,
            self.charge_states,
            self.pipeline_stats,
            self.retentions
        ) = self.parse_psm(
            fragpipe_files=self.fragpipe_files
        )
    else:
        log.warning("Required input not found: psm.tsv")
        return False

    return True

def draw_plots(self):



 ... (clipped 2 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Uncaught read errors: psm_reader() performs pd.read_csv() without catching exceptions, which may surface stack
traces and internal details to end users depending on MultiQC's exception handling
behavior.

Referred Code
def psm_reader(file_path: str):

    psm_df = pd.read_csv(file_path, sep="\t")

    if "Spectrum" not in psm_df.columns:
        raise ValueError("psm.tsv must contain a 'Spectrum' column")

    required_cols = [c for c in REQUIRED_COLS["psm"] if c in psm_df.columns]

    psm_df["Run"] = psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]
    required_cols.append("Run")

    return psm_df[required_cols].copy()

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Logs full paths: Debug logging outputs full local file paths (log.debug(...)), which can be sensitive in
some environments and should be reviewed against the project's logging policy.

Referred Code
if any(fragpipe_files.values()):

    for k, v in fragpipe_files.items():
        log.info(f"FragPipe data loaded: {k} ({len(v)} files).")
        log.debug(f"FragPipe data loaded: {k}: {v}")

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Correct broken test data URL

Correct the broken FTP URL for the psm.tsv file in the PXD070239 project
configuration. The current URL points to a non-existent future path; it should
be updated to the correct path in the PRIDE repository.

docs/config.json [167-182]

 {
     "accession": "PXD070239",
     "urls": [
-        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD070239/psm.tsv"
+        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/01/PXD070239/fragpipe_output/psm.tsv"
     ],
     "path": "docs/PXD070239",
     "file_type": ["fragpipe", ""]
 },
 {
     "accession": "PXD070239_disable_hoverinfo",
     "urls": [
-        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD070239/psm.tsv"
+        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/01/PXD070239/fragpipe_output/psm.tsv"
     ],
     "path": "docs/PXD070239_disable_hoverinfo",
     "file_type": ["fragpipe", "disable_hoverinfo"]
 }
Suggestion importance [1-10]: 9 (High)

Why: The suggestion correctly identifies a broken URL for test data, which would cause the CI pipeline and documentation generation to fail, making this a critical fix for the PR's functionality.
Add None check for file loading

Add a check to ensure the result of fragpipe_io.get_fragpipe_files is not None
before attempting to access the ["psm"] key to prevent a potential runtime
error.

pmultiqc/modules/fragpipe/fragpipe.py [52-65]

-self.fragpipe_files = fragpipe_io.get_fragpipe_files(self.find_log_files)
-
-if self.fragpipe_files["psm"]:
-    (
-        self.delta_masses,
-        self.charge_states,
-        self.pipeline_stats,
-        self.retentions
-    ) = self.parse_psm(
-        fragpipe_files=self.fragpipe_files
-    )
-else:
+files = fragpipe_io.get_fragpipe_files(self.find_log_files)
+if not files or not files.get("psm"):
     log.warning("Required input not found: psm.tsv")
     return False
+self.fragpipe_files = files
 
+(
+    self.delta_masses,
+    self.charge_states,
+    self.pipeline_stats,
+    self.retentions
+) = self.parse_psm(
+    fragpipe_files=self.fragpipe_files
+)
+
Suggestion importance [1-10]: 8 (Medium)

Why: This suggestion correctly identifies a potential TypeError if get_fragpipe_files returns None and prevents it by adding a necessary guard clause, which is a critical fix for robust error handling.
Fix dead code by updating file list

Add 'peptide' and 'protein' to the required_files list to activate the existing
logic for handling these file types and remove dead code.

pmultiqc/modules/fragpipe/fragpipe_io.py [35-56]

-required_files = ["psm"]
+required_files = ["psm", "peptide", "protein"]
 req_set = set(required_files)
 
 fragpipe_files = {req: [] for req in required_files}
 
 # FragPipe *tsv Data
 for file_info in find_log_files("pmultiqc/tsv", filecontents=False):
     filename = file_info["fn"]
     full_path = os.path.join(file_info["root"], filename)
 
     for req in req_set:
         if req in filename:
 
             if req == "protein" and "combined_protein" in filename:
                 continue
 
             if req == "peptide" and (
                 "combined_peptide" in filename or "combined_modified_peptide" in filename
             ):
                 continue
 
             fragpipe_files[req].append(full_path)
Suggestion importance [1-10]: 5 (Low)

Why: The suggestion correctly identifies dead code and an inconsistency, as the code includes logic for protein and peptide files but doesn't search for them. Applying this change improves code quality and maintainability for future extensions.
General
Improve performance and memory usage

Refactor the _calculate_statistics function to use nunique() for counting unique
items, improving performance and reducing memory usage compared to the current
len(set(...)) and groupby().agg(list) approach.

pmultiqc/modules/fragpipe/fragpipe.py [277-312]

 def _calculate_statistics(pipeline_stats: list):
 
     df = pd.concat(pipeline_stats, ignore_index=True)
     log.info(f"Number of pipeline result statistics rows in DataFrame: {len(df)}")
 
     summary_data = {
-        "total_proteins": len(set(df["Protein"])),
-        "total_peptides": len(set(df["Peptide"]))
+        "total_proteins": df["Protein"].nunique(),
+        "total_peptides": df["Peptide"].nunique()
     }
 
     stats_by_run = dict()
     for run, group in df.groupby("Run"):
 
         unique_group = group.loc[group["Is Unique"]]
 
         modified_peptides = group.loc[
             group["Modified Peptide"].notna() & (group["Modified Peptide"] != ""),
             "Modified Peptide"
         ]
 
         stats_by_run[run] = {
-            "protein_num": len(set(group["Protein"])),
-            "peptide_num": len(set(group["Peptide"])),
-            "unique_peptide_num": len(set(unique_group["Peptide"])),
+            "protein_num": group["Protein"].nunique(),
+            "peptide_num": group["Peptide"].nunique(),
+            "unique_peptide_num": unique_group["Peptide"].nunique(),
             "modified_peptide_num": modified_peptides.nunique()
         }
     
     statistics_data = {
         "ms_runs": stats_by_run
     }
 
-    protein_pep_map = df.groupby("Protein")["Peptide"].agg(list).to_dict()
     pep_plot = Histogram("number of peptides per proteins", plot_category="frequency")
-    for _, peps in protein_pep_map.items():
-        number = len(set(peps))
+    pep_counts = df.groupby("Protein")["Peptide"].nunique()
+    for number in pep_counts:
         pep_plot.add_value(number)
     ...

Suggestion importance [1-10]: 7 (Medium)

Why: The suggestion provides a significant performance and memory optimization by replacing inefficient pandas operations with more direct and memory-friendly equivalents like nunique(), which is crucial for handling large datasets.
Security
Open external link safely

Add target="_blank" and rel="noopener noreferrer" to the external link for
FragPipe to enhance security and user experience.

docs/PXD054720/multiqc_report.html [1392]

-<a href='https://fragpipe.nesvilab.org/'>FragPipe</a>.
+<a href="https://fragpipe.nesvilab.org/" target="_blank" rel="noopener noreferrer">FragPipe</a>.

Suggestion importance [1-10]: 7 (Medium)

Why: The suggestion correctly identifies a security best practice for external links by adding target="_blank" and rel="noopener noreferrer" to prevent tabnabbing, which is relevant to the newly added <a> tag.
  • More

Copilot AI left a comment

Pull request overview

This PR implements support for FragPipe proteomics analysis results in the pmultiqc tool, adding the capability to generate quality control reports from FragPipe's PSM (peptide-spectrum match) output files.

Key changes:

  • Added FragPipeModule with support for PSM file parsing and visualization
  • Refactored existing modules to use a common base class (BasePMultiqcModule)
  • Renamed several functions for better consistency across modules (e.g., draw_quantms_identification → draw_identification)

Reviewed changes

Copilot reviewed 19 out of 38 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pmultiqc/modules/fragpipe/* New FragPipe module implementation including I/O and visualization logic
pmultiqc/modules/base.py New base class for all pmultiqc modules providing common initialization
pmultiqc/modules/core/core.py Refactored plugin loading to use a mapping structure and consolidated plugin initialization
pmultiqc/cli.py Added --fragpipe-plugin command-line option
pyproject.toml Registered the FragPipe plugin entry point
pmultiqc/main.py Added TSV file pattern configuration for FragPipe file discovery
pmultiqc/modules/common/plots/id.py Renamed functions and parameters for better generalization across modules
pmultiqc/modules/quantms/quantms.py Updated to use BasePMultiqcModule and renamed function calls
pmultiqc/modules/diann/diann.py Updated to use BasePMultiqcModule and renamed function calls
pmultiqc/modules/mzidentml/mzidentml.py Updated to use renamed functions
docs/* Updated documentation and examples, added FragPipe configuration
.github/workflows/python-app.yml Added CI/CD test for FragPipe functionality


Comment on lines +87 to +95
# def check_columns(file_path: str, data_type: str):

# try:
# df_header = pd.read_csv(file_path, sep="\t", nrows=0)
# actual_columns = set(df_header.columns.str.strip())

# missing_columns = [
# col
# for col in REQUIRED_COLS[data_type]
Copilot AI, Jan 4, 2026:

This comment appears to contain commented-out code.
Comment on lines +99 to +104
# if not missing_columns:
# log.info("Check passed: All required columns are present.")
# return True
# else:
# log.info(
# f"Check failed: {file_path}. Missing the following columns: {missing_columns}"
Copilot AI, Jan 4, 2026:

This comment appears to contain commented-out code.
Comment on lines +108 to +110
# except Exception as e:
# log.warning(f"Check failed: Unable to read file: {e}")
# return False
Copilot AI, Jan 4, 2026:

This comment appears to contain commented-out code.

if diann.get_data():
diann.draw_plots()
plugin_loaded = True
Copilot AI, Jan 4, 2026:

Variable plugin_loaded is not used.
Contributor @coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (7)
pmultiqc/modules/maxquant/maxquant_utils.py (1)

119-121: Move DataFrame copy outside the loop to improve performance.

The .copy() call inside the loop creates a full copy of the DataFrame on every iteration, which is inefficient for large datasets. Move the copy before the loop if needed, or remove it if column assignments don't require a separate copy.

+            mq_data = mq_data.copy()
             for col, new_col in zip(intensity_cols, new_intensity_cols):
                 mq_data[new_col] = mq_data[col] / mq_data["mol. weight [kda]"]
-                mq_data = mq_data.copy()
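For illustration, the suggested pattern can be sketched with a toy frame. The column names below are stand-ins mirroring the maxquant_utils code, not real MaxQuant data:

```python
import pandas as pd

# Illustrative data: intensity columns normalized by molecular weight,
# mirroring the maxquant_utils loop. Column names are made up for the sketch.
mq_data = pd.DataFrame(
    {
        "intensity a": [10.0, 20.0],
        "intensity b": [30.0, 40.0],
        "mol. weight [kda]": [2.0, 4.0],
    }
)

intensity_cols = ["intensity a", "intensity b"]
new_intensity_cols = ["norm a", "norm b"]

# Copy once, before the loop; a per-iteration copy re-copies the whole
# frame on every pass for no benefit.
mq_data = mq_data.copy()
for col, new_col in zip(intensity_cols, new_intensity_cols):
    mq_data[new_col] = mq_data[col] / mq_data["mol. weight [kda]"]

print(mq_data["norm a"].tolist())  # [5.0, 5.0]
```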
.github/workflows/python-app.yml (1)

269-269: Update to the latest version of actions/setup-python.

The actions/setup-python@v4 is outdated. The latest stable version is v6.1.0. Consider updating to @v6 across all jobs in the workflow.

pmultiqc/modules/fragpipe/fragpipe_io.py (2)

45-56: Dead code: protein/peptide filtering will never execute.

The filtering conditions on lines 48-54 check for "protein" and "peptide" in req, but required_files only contains "psm" (line 35). This code is unreachable.

Additionally, the substring match if req in filename (line 46) could match unintended files (e.g., my_psm_backup.tsv). Consider using a more precise pattern like checking if the filename ends with psm.tsv.

🔎 Proposed fix
-        for req in req_set:
-            if req in filename:
-
-                if req == "protein" and "combined_protein" in filename:
-                    continue
-
-                if req == "peptide" and (
-                    "combined_peptide" in filename or "combined_modified_peptide" in filename
-                ):
-                    continue
-
-                fragpipe_files[req].append(full_path)
+        for req in req_set:
+            # Match files ending with the expected pattern (e.g., psm.tsv)
+            if filename.endswith(f"{req}.tsv"):
+                fragpipe_files[req].append(full_path)
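The difference between the original substring check and the suggested suffix check can be shown with made-up filenames:

```python
# Illustrative filenames; only the suffix check excludes the backup file.
files = ["sample1/psm.tsv", "my_psm_backup.tsv", "combined_protein.tsv"]

# Original behavior: substring match on the bare filename.
substring_hits = [f for f in files if "psm" in f.rsplit("/", 1)[-1]]

# Suggested behavior: only files whose name ends with psm.tsv.
suffix_hits = [f for f in files if f.rsplit("/", 1)[-1].endswith("psm.tsv")]

print(substring_hits)  # ['sample1/psm.tsv', 'my_psm_backup.tsv']
print(suffix_hits)     # ['sample1/psm.tsv']
```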

87-110: Consider removing or enabling the commented-out helper.

This commented-out check_columns function could be useful for validating file structure before processing. Consider either enabling it in the workflow or removing it to reduce code clutter.
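If the helper were enabled, a self-contained version might look like the following. REQUIRED_COLS here is an illustrative stand-in for the module's real mapping, and the function accepts anything pd.read_csv can read (a path or a file-like object):

```python
import io
import logging

import pandas as pd

log = logging.getLogger(__name__)

# Stand-in for the module's REQUIRED_COLS mapping (illustrative values).
REQUIRED_COLS = {"psm": ["Spectrum", "Peptide"]}

def check_columns(file_or_path, data_type: str) -> bool:
    """Validate that a TSV header contains all required columns."""
    try:
        # nrows=0 reads only the header row, so large files stay cheap.
        df_header = pd.read_csv(file_or_path, sep="\t", nrows=0)
        actual_columns = set(df_header.columns.str.strip())
        missing = [c for c in REQUIRED_COLS[data_type] if c not in actual_columns]
        if not missing:
            log.info("Check passed: all required columns are present.")
            return True
        log.info(f"Check failed: missing columns: {missing}")
        return False
    except Exception as e:
        log.warning(f"Check failed: unable to read file: {e}")
        return False

print(check_columns(io.StringIO("Spectrum\tPeptide\tCharge\n"), "psm"))  # True
```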

pmultiqc/modules/core/core.py (1)

75-101: Redundant flag: plugin_loaded is set but never read.

The return statement on line 98 exits the method immediately after loading the first matching plugin, so line 100 is only reached when no plugin flag matched in the loop. In that case plugin_loaded was never set to True, so the if not plugin_loaded guard is always true when reached.

The logic works correctly, but setting plugin_loaded = True on line 97 is unnecessary since the method returns immediately afterward.

🔎 Suggested simplification
         for flag, (module_name, class_name) in PLUGIN_MAP.items():
 
             if config.kwargs.get(flag, False):
 
                 ModuleClass = get_module(module_name, class_name)
 
                 if "proteobench_plugin" == flag:
                     plugin = ModuleClass(self.find_log_files, None, None)
                 else:
                     plugin = ModuleClass(
                         self.find_log_files,
                         self.sub_sections,
                         self.heatmap_color_list
                     )
 
                 if plugin.get_data():
                     plugin.draw_plots()
 
-                plugin_loaded = True
                 return
 
-        if not plugin_loaded:
-            raise ValueError("No pmultiqc plugin selected; skipping.")
+        raise ValueError("No pmultiqc plugin selected; skipping.")
pmultiqc/modules/fragpipe/fragpipe.py (2)

309-320: Typo: categorys should be categories.

Minor spelling inconsistency with other modules.

🔎 Proposed fix
-    categorys = OrderedDict()
-    categorys["Frequency"] = {
+    categories = OrderedDict()
+    categories["Frequency"] = {
         "name": "Frequency",
         "description": "number of peptides per proteins",
     }
 
-    pep_plot.to_dict(percentage=True, cats=categorys)
+    pep_plot.to_dict(percentage=True, cats=categories)

277-322: Align the histogram title casing with other modules.

_calculate_statistics is a module-level function (good for testability), but the Histogram title "number of peptides per proteins" uses lowercase, which differs from other modules that use "Number of peptides per proteins". Consider aligning for consistency.

🔎 Consistency fix for histogram title
-    pep_plot = Histogram("number of peptides per proteins", plot_category="frequency")
+    pep_plot = Histogram("Number of peptides per proteins", plot_category="frequency")
-        "description": "number of peptides per proteins",
+        "description": "Number of identified peptides per protein.",
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4d7134 and 1859013.

📒 Files selected for processing (38)
  • .github/workflows/python-app.yml
  • README.md
  • docs/DIANN/multiqc_report.html
  • docs/DIANN_disable_hoverinfo/multiqc_report.html
  • docs/LFQ_PXD007683/multiqc_report.html
  • docs/LFQ_PXD007683_disable_hoverinfo/multiqc_report.html
  • docs/MaxDIA/multiqc_report.html
  • docs/MaxDIA_disable_hoverinfo/multiqc_report.html
  • docs/PXD003133/multiqc_report.html
  • docs/PXD003133_disable_hoverinfo/multiqc_report.html
  • docs/PXD053068/multiqc_report.html
  • docs/PXD053068_disable_hoverinfo/multiqc_report.html
  • docs/PXD054720/multiqc_report.html
  • docs/PXD054720_disable_hoverinfo/multiqc_report.html
  • docs/PXD070239/multiqc_report.html
  • docs/PXD070239_disable_hoverinfo/multiqc_report.html
  • docs/ProteoBench/multiqc_report.html
  • docs/ProteoBench_disable_hoverinfo/multiqc_report.html
  • docs/README.md
  • docs/TMT_PXD007683/multiqc_report.html
  • docs/TMT_PXD007683_disable_hoverinfo/multiqc_report.html
  • docs/config.json
  • docs/dia/multiqc_report.html
  • docs/dia_disable_hoverinfo/multiqc_report.html
  • docs/update_examples.py
  • pmultiqc/cli.py
  • pmultiqc/main.py
  • pmultiqc/modules/common/plots/dia.py
  • pmultiqc/modules/common/plots/id.py
  • pmultiqc/modules/core/core.py
  • pmultiqc/modules/diann/diann.py
  • pmultiqc/modules/fragpipe/__init__.py
  • pmultiqc/modules/fragpipe/fragpipe.py
  • pmultiqc/modules/fragpipe/fragpipe_io.py
  • pmultiqc/modules/maxquant/maxquant_utils.py
  • pmultiqc/modules/mzidentml/mzidentml.py
  • pmultiqc/modules/quantms/quantms.py
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (4)
pmultiqc/modules/fragpipe/fragpipe_io.py (1)
pmultiqc/modules/common/logging.py (1)
  • get_logger (31-86)
pmultiqc/modules/diann/diann.py (2)
pmultiqc/modules/base.py (1)
  • BasePMultiqcModule (3-20)
pmultiqc/modules/common/plots/id.py (3)
  • draw_identification (385-517)
  • draw_summary_protein_ident_table (546-616)
  • draw_identi_num (619-837)
pmultiqc/modules/fragpipe/__init__.py (1)
pmultiqc/modules/fragpipe/fragpipe.py (1)
  • FragPipeModule (35-274)
pmultiqc/modules/common/plots/dia.py (1)
pmultiqc/modules/common/plots/general.py (1)
  • plot_html_check (17-22)
🪛 actionlint (1.7.9)
.github/workflows/python-app.yml

269-269: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 markdownlint-cli2 (0.18.1)
docs/README.md

84-84: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)

🪛 Ruff (0.14.10)
pmultiqc/modules/fragpipe/fragpipe_io.py

77-77: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test_fragpipe
  • GitHub Check: test_proteobench
  • GitHub Check: test_mzid_mzML
  • GitHub Check: test_tmt
  • GitHub Check: test_maxquant
  • GitHub Check: test_lfq
  • GitHub Check: test_mzid_mgf
  • GitHub Check: test_maxquant_dia
  • GitHub Check: test_diann
  • GitHub Check: test_dia
  • GitHub Check: Agent
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (43)
pmultiqc/modules/maxquant/maxquant_utils.py (1)

406-408: Readability improvement via explicit parentheses.

The reformatted expression with parentheses clarifies operator precedence and improves readability. The functionality remains unchanged.

pmultiqc/modules/common/plots/dia.py (2)

224-224: LGTM! Simplified charge-state plot configuration.

The changes simplify the charge-state plot labels and configuration by removing "of Per File" from the title and ID. The addition of cpswitch_c_active: False on line 254 explicitly disables CPSwitch active state, which aligns with the simplified view.

Also applies to: 252-255


380-381: LGTM! Consistent HTML validation.

Adding plot_html_check to the delta mass plot ensures consistent HTML processing across all plots in this module. This aligns with the pattern used throughout the file (lines 39, 132, 171, 208, 267, 321, 415, 451, 490, 525, 560).

docs/README.md (1)

166-170: Documentation additions look good.

The FragPipe usage example, CLI option, and example link follow the established patterns and are consistent with other data source documentation.

Also applies to: 192-192, 246-246

docs/update_examples.py (1)

164-165: Implementation follows the established pattern.

The FragPipe plugin branch is correctly implemented and consistent with other plugin handlers.

.github/workflows/python-app.yml (1)

263-283: FragPipe test job implementation looks good.

The test job structure is consistent with other test jobs, correctly downloads the FragPipe psm.tsv file, and runs the appropriate MultiQC command.

docs/config.json (1)

167-182: FragPipe project entries are well-structured.

The two project entries (standard and disable_hoverinfo variants) follow the established pattern and are correctly configured with appropriate URLs, paths, and file types.

docs/PXD054720/multiqc_report.html (2)

31-31: Auto-generated content updates are expected.

The timestamps, reportUuid, and aiReportMetadataBase64 are auto-generated when the report is regenerated. These changes are not meaningful for code review.

Also applies to: 44-44, 51-51, 1215-1215


1391-1392: Verify intentional documentation updates for FragPipe support.

The references to DIA-NN and FragPipe in the HTML documentation appear to be intentional updates documenting new pipeline support. Ensure this HTML file is meant to be a version-controlled example/documentation artifact (rather than an accidentally-committed generated report).

README.md (3)

63-64: Documentation addition for FragPipe support is clear and consistent.

The new data source entry follows the existing pattern used for other proteomics tools, specifying psm.tsv as the FragPipe report file. This aligns well with the broader FragPipe integration in the codebase.


140-144: Usage example for FragPipe mirrors existing plugin patterns.

The example command is consistent with other plugin documentation (quantms, MaxQuant, DIA-NN, etc.) and clearly shows how to invoke the FragPipe plugin with typical arguments.


166-166: The --fragpipe-plugin option is properly registered in pyproject.toml (line 80) and matches the documentation.

The configuration correctly maps fragpipe_plugin = "pmultiqc.cli:fragpipe_plugin", confirming that the README documentation is accurate and consistent with the project setup.

docs/PXD054720_disable_hoverinfo/multiqc_report.html (4)

26-31: LGTM - Auto-generated report artifacts.

The updated base64-encoded plot data and new reportUuid are expected outputs from regenerating the MultiQC report with FragPipe support.


44-51: LGTM - Updated report metadata.

The updated base64-encoded metadata now includes FragPipe tool information, and the creation timestamp reflects the report regeneration.


1215-1215: LGTM - Consistent timestamp.

The generation date in the report footer is consistent with the configCreationDate value.


1391-1392: LGTM - FragPipe added to supported pipelines.

The FragPipe documentation link is properly formatted and consistent with the other tool references in the list.

pmultiqc/modules/fragpipe/fragpipe_io.py (3)

1-8: LGTM!

Imports and logger initialization follow the established pattern used in other modules.


11-17: LGTM!

The required columns definition is clear. The empty peptide list appears to be a placeholder for future expansion.


72-84: LGTM!

The psm_reader function correctly validates the required "Spectrum" column, derives the "Run" column, and filters to available required columns. The static analysis hint (TRY003) about the exception message is a minor style preference and can be safely ignored here.

pmultiqc/modules/common/plots/id.py (7)

228-235: LGTM!

Description and helptext updated to be more generic, now referencing both MaxQuant's evidence.txt and FragPipe's psm.tsv as data sources.


385-391: LGTM!

Function renamed from draw_quantms_identification to draw_identification for broader applicability across different pipeline sources.


546-599: LGTM!

Parameter renamed from enable_dia to use_two_columns which better describes its purpose (controlling table layout). The conditional logic is updated consistently.


619-627: LGTM!

Function renamed from draw_quantms_identi_num to draw_identi_num for consistency with other renames.


829-836: LGTM!

Description and helptext updated to mention FragPipe data source (psm.tsv), making it clear this table supports multiple pipeline types.


949-999: LGTM!

Parameter renamed from enable_mzid to is_fragpipe_or_mzid which accurately reflects its expanded usage for both mzIdentML and FragPipe contexts.


1044-1048: LGTM!

New FragPipe-specific branch added for retention time plots with appropriate description and helptext referencing the psm.tsv source.

pmultiqc/modules/diann/diann.py (4)

8-8: LGTM!

DiannModule now properly inherits from BasePMultiqcModule, following the new plugin architecture pattern. The base class initialization is correctly delegated.

Also applies to: 37-42


18-23: LGTM!

Import statements updated to use the renamed functions (draw_identification, draw_identi_num).


167-181: LGTM!

Function calls updated to use renamed functions and parameters (use_two_columns, draw_identi_num).


202-206: LGTM!

Call to draw_identification (renamed from draw_quantms_identification) is correctly updated.

pmultiqc/modules/fragpipe/__init__.py (1)

1-3: LGTM!

Clean package initializer that re-exports FragPipeModule following the established pattern used by other modules in the codebase.

pmultiqc/cli.py (1)

79-81: LGTM!

The new --fragpipe-plugin CLI option follows the established pattern used by other plugin flags (--diann-plugin, --maxquant-plugin, etc.).

pyproject.toml (1)

80-80: LGTM!

The fragpipe_plugin CLI option is correctly registered in the MultiQC plugin entry points, following the same pattern as other plugin options.

pmultiqc/main.py (1)

89-121: LGTM!

Formatting changes (multi-line wrapping of dict literals) improve readability without changing functionality.

pmultiqc/modules/mzidentml/mzidentml.py (1)

151-157: LGTM! API call updated correctly.

The function call correctly uses the renamed draw_identification API with all required parameters (cal_num_table_data, quantms_missed_cleavages, quantms_modified, msms_identified_rate) matching the updated function signature.

pmultiqc/modules/core/core.py (1)

13-20: Good extensible plugin registration pattern.

The PLUGIN_MAP provides a clean, maintainable way to register plugins with their module and class names. Adding new plugins only requires a single entry here.
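A minimal sketch of the mapping-based selection, with dummy classes standing in for the real module classes so the snippet is self-contained (the real core.py resolves classes by module and class name instead):

```python
# Dummy stand-ins for the real pmultiqc module classes.
class QuantMSModule: ...
class FragPipeModule: ...

# Flag-to-class registry; adding a plugin is a single new entry.
PLUGIN_MAP = {
    "quantms_plugin": QuantMSModule,
    "fragpipe_plugin": FragPipeModule,
}

def load_plugin(kwargs: dict):
    """Return the first module class whose CLI flag is set, mirroring core.py."""
    for flag, module_class in PLUGIN_MAP.items():
        if kwargs.get(flag, False):
            return module_class
    raise ValueError("No pmultiqc plugin selected; skipping.")

print(load_plugin({"fragpipe_plugin": True}).__name__)  # FragPipeModule
```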

pmultiqc/modules/quantms/quantms.py (3)

85-89: LGTM! Clean base class refactor.

The inheritance from BasePMultiqcModule and proper super().__init__() call correctly aligns with the plugin system refactoring pattern used across other modules.


333-348: API updates consistent with the refactor.

The parameter rename from enable_dia to use_two_columns and function rename from draw_quantms_identi_num to draw_identi_num correctly follow the centralized API changes.


458-464: LGTM! Function call updated to match the renamed API.

The call to draw_identification correctly replaces draw_quantms_identification with all required parameters.

pmultiqc/modules/fragpipe/fragpipe.py (4)

38-46: LGTM! Proper initialization pattern.

The module correctly inherits from BasePMultiqcModule and initializes instance variables for data collection.


142-183: LGTM! Robust PSM parsing with proper error handling.

The static method correctly handles missing/empty files with appropriate logging and gracefully skips unreadable data.


186-206: LGTM! Delta mass calculation follows established patterns.

The method correctly concatenates data, handles type conversion with error coercion, drops invalid values, and uses the shared cal_delta_mass_dict utility.
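The flow can be sketched with toy data; the final binning step of cal_delta_mass_dict is approximated here by a simple value count, which is an assumption for illustration:

```python
import pandas as pd

# Toy per-file frames; the real module concatenates one frame per psm.tsv.
frames = [
    pd.DataFrame({"Delta Mass": ["0.001", "bad", "0.001"]}),
    pd.DataFrame({"Delta Mass": ["-0.002"]}),
]

df = pd.concat(frames, ignore_index=True)
# Coerce to numeric, turning unparseable values into NaN, then drop them.
df["Delta Mass"] = pd.to_numeric(df["Delta Mass"], errors="coerce")
df = df.dropna(subset=["Delta Mass"])

# Stand-in for cal_delta_mass_dict: count occurrences per delta value.
delta_counts = {float(k): int(v) for k, v in df["Delta Mass"].value_counts().items()}
print(delta_counts)  # {0.001: 2, -0.002: 1}
```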


105-108: No action required. The draw_identi_num function signature is fully compatible with the fragpipe.py call. The function definition shows that all parameters except sub_sections have default values (enable_exp=False, enable_sdrf=False, is_multi_conditions=False, sample_df=None, file_df=None, cal_num_table_data=None), so passing only sub_sections and cal_num_table_data is safe and intentional.

- `*.mzML` or `*.mgf`: Corresponding spectra files

### 6. [FragPipe](https://fragpipe.nesvilab.org) files:
- `psm.tsv`: FragPipe report file
Contributor @coderabbitai bot:

⚠️ Potential issue | 🟡 Minor

Fix indentation to align with other list items.

The list item is indented with 3 spaces instead of 0, which is inconsistent with other data source sections in the file and flagged by markdownlint.

🔎 Proposed fix
-   - `psm.tsv`: FragPipe report file
+- `psm.tsv`: FragPipe report file


Comment on lines +149 to +153
if "pmultiqc/tsv" not in config.sp:
config.update_dict(
config.sp,
{"pmultiqc/tsv": {"fn": "*.tsv", "num_lines": 0}},
)
Contributor @coderabbitai bot:

⚠️ Potential issue | 🟡 Minor


The broad *.tsv pattern will match files intended for other modules, but filtering ensures correct behavior.

The *.tsv pattern will cause MultiQC to scan all TSV files, including experimental_design.tsv, *sdrf.tsv, and *report.tsv which are already matched by more specific patterns (pmultiqc/exp_design, pmultiqc/sdrf, pmultiqc/diann_report_tsv). However, fragpipe_io.get_fragpipe_files() filters results to only process files containing "psm" in the filename, so only FragPipe-specific TSV files are actually used. This is a working design, though it may cause unnecessary file scanning. If efficiency is a concern, consider restricting the pattern to *psm*.tsv or similar to avoid overlapping with other module patterns.
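A quick check, under the assumption that MultiQC search patterns use shell-style globs, that a narrowed pattern such as *psm*.tsv still matches FragPipe output while skipping other modules' TSVs (filenames below are illustrative):

```python
import fnmatch

# Suggested narrower pattern for the "pmultiqc/tsv" search entry.
pattern = "*psm*.tsv"

candidates = [
    "psm.tsv",                   # FragPipe report
    "experiment_psm.tsv",        # renamed FragPipe report
    "experimental_design.tsv",   # quantms experimental design
    "sample.sdrf.tsv",           # SDRF metadata
]

matched = [f for f in candidates if fnmatch.fnmatch(f, pattern)]
print(matched)  # ['psm.tsv', 'experiment_psm.tsv']
```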


Comment on lines +341 to 342
delta_df = df[["Ms1.Apex.Mz.Delta", "Run"]].copy()
delta_mass = cal_delta_mass_dict(delta_df, "Ms1.Apex.Mz.Delta")
Contributor @coderabbitai bot:

⚠️ Potential issue | 🟡 Minor

Remove the unused "Run" column from delta_df.

The cal_delta_mass_dict function only uses the column specified in its second parameter. Since only "Ms1.Apex.Mz.Delta" is accessed, the "Run" column in delta_df is dead code. Change line 341 to:

delta_df = df[["Ms1.Apex.Mz.Delta"]].copy()

@ypriverol ypriverol merged commit eac5452 into main Jan 4, 2026
27 of 28 checks passed

2 participants