FragPipe first implementation#495

Merged
ypriverol merged 14 commits into main from dev
Jan 4, 2026

Conversation

@ypriverol

@ypriverol ypriverol commented Jan 4, 2026

User description

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test addition/update
  • Dependency update

PR Type

Enhancement, Tests, Documentation


Description

  • New FragPipe module implementation with complete data parsing and visualization capabilities for PSM (Peptide Spectrum Match) analysis

  • Parses PSM data including delta mass, charge states, and retention times; generates plots for mass error, charge-state distribution, identification statistics, and retention time analysis

  • Calculates pipeline statistics including protein and peptide counts with modification tracking

  • Refactored common plotting functions to support multiple pipelines: renamed draw_quantms_identification() to draw_identification() and updated parameter names for broader applicability

  • Updated existing modules (QuantMS, DIA-NN, mzIdentML) to use renamed functions and inherit from BasePMultiqcModule for code reuse

  • Refactored core plugin loading system with centralized PLUGIN_MAP configuration and loop-based plugin loading for maintainability

  • Added FragPipe CLI option (--fragpipe-plugin) and TSV file configuration for FragPipe support

  • Comprehensive documentation including README updates, usage examples, and example dataset configurations

  • CI/CD integration with new FragPipe test job that downloads test data from PRIDE repository and validates plugin functionality


Diagram Walkthrough

flowchart LR
  A["FragPipe<br/>Module"] -->|"Parses PSM data"| B["fragpipe.py<br/>Data Processing"]
  A -->|"File I/O"| C["fragpipe_io.py<br/>TSV Reader"]
  B -->|"Uses"| D["Common Plots<br/>Identification"]
  D -->|"Refactored"| E["QuantMS<br/>DIA-NN<br/>mzIdentML"]
  F["CLI Option<br/>--fragpipe-plugin"] -->|"Enables"| A
  G["Core Plugin<br/>System"] -->|"Loads via<br/>PLUGIN_MAP"| A
  H["CI/CD Pipeline"] -->|"Tests"| A
  I["Documentation<br/>& Examples"] -->|"Supports"| A

File Walkthrough

Relevant files
Enhancement
6 files
fragpipe.py
FragPipe module implementation with data parsing and plotting

pmultiqc/modules/fragpipe/fragpipe.py

  • New FragPipe module implementation with data parsing and visualization
    capabilities
  • Parses PSM (Peptide Spectrum Match) data including delta mass, charge
    states, and retention times
  • Generates plots for mass error, charge-state distribution,
    identification statistics, and retention time analysis
  • Calculates pipeline statistics including protein and peptide counts
    with modification tracking
+322/-0 
fragpipe_io.py
FragPipe file I/O and data reading utilities                         

pmultiqc/modules/fragpipe/fragpipe_io.py

  • New I/O module for FragPipe file handling and data reading
  • Implements get_fragpipe_files() to locate and load FragPipe TSV files
  • Implements psm_reader() to parse PSM data with column validation and
    run extraction
  • Defines required columns for PSM data validation
+111/-0 
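The run-extraction step described above can be sketched in isolation. The `rsplit(".", 3)` split assumes FragPipe's `run.scan.scan.charge` spectrum naming, and the helper name below is hypothetical, not a function in `fragpipe_io.py`:

```python
def run_from_spectrum(spectrum: str) -> str:
    # FragPipe "Spectrum" values typically look like "run_01.00123.00123.2"
    # (run.scan.scan.charge). Splitting at most three times from the right
    # strips the scan/charge fields while preserving dots inside the run name.
    return str(spectrum).rsplit(".", 3)[0]

print(run_from_spectrum("run_01.00123.00123.2"))    # run_01
print(run_from_spectrum("sample.A.00045.00045.3"))  # sample.A
```

The same expression appears vectorized in the module as `psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]`.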
__init__.py
FragPipe module package initialization                                     

pmultiqc/modules/fragpipe/__init__.py

  • New package initialization file for FragPipe module
  • Exports FragPipeModule class for public API
+3/-0     
dia.py
Update DIA charge-state and delta mass plot configurations

pmultiqc/modules/common/plots/dia.py

  • Updated charge-state plot configuration ID and title for consistency
  • Added cpswitch_c_active parameter to charge-state plot configuration
  • Removed filtering condition in delta mass calculation to include zero
    values
  • Added HTML validation check for delta mass plot output
+7/-4     
cli.py
Add FragPipe plugin command-line option                                   

pmultiqc/cli.py

  • Added --fragpipe-plugin command-line option to enable FragPipe plugin
+3/-0     
multiqc_report.html
Update MultiQC report with FragPipe support                           

docs/PXD054720_disable_hoverinfo/multiqc_report.html

  • Updated compressed plot data with new report UUID
  • Modified report metadata to include FragPipe in the list of supported
    tools
  • Updated report generation timestamp from 2025-12-30 to 2026-01-04
  • Added FragPipe link reference in the pmultiqc description section
+7/-6     
Refactoring
5 files
id.py
Refactor function names and parameters for multi-pipeline support

pmultiqc/modules/common/plots/id.py

  • Renamed draw_quantms_identification() to draw_identification() for
    broader applicability
  • Renamed draw_quantms_identi_num() to draw_identi_num() for consistency
  • Updated draw_summary_protein_ident_table() parameter from enable_dia
    to use_two_columns
  • Updated draw_num_pep_per_protein() parameter from enable_mzid to
    is_fragpipe_or_mzid
  • Enhanced documentation to support FragPipe and updated help text for
    delta mass and retention time
  • Added FragPipe-specific retention time handling in draw_ids_rt_count()
+16/-9   
quantms.py
Refactor QuantMS module to use base class and updated function names

pmultiqc/modules/quantms/quantms.py

  • Updated QuantMSModule to inherit from BasePMultiqcModule for code
    reuse
  • Simplified initialization by calling parent class constructor
  • Updated function calls to use renamed functions (draw_identification,
    draw_identi_num)
  • Updated parameter names in function calls (use_two_columns instead of
    enable_dia)
  • Fixed code formatting/indentation for mass error calculation
+14/-14 
diann.py
Refactor DIA-NN module to use base class and updated function names

pmultiqc/modules/diann/diann.py

  • Updated DiannModule to inherit from BasePMultiqcModule for code reuse
  • Simplified initialization by calling parent class constructor
  • Updated function calls to use renamed functions (draw_identification,
    draw_identi_num)
  • Updated parameter names in function calls (use_two_columns instead of
    enable_dia)
+9/-9     
mzidentml.py
Update mzIdentML module function call                                       

pmultiqc/modules/mzidentml/mzidentml.py

  • Updated function call from draw_quantms_identification() to
    draw_identification()
+1/-1     
core.py
Refactor core plugin loading system with centralized configuration

pmultiqc/modules/core/core.py

  • Added PLUGIN_MAP dictionary to centralize plugin configuration mapping
  • Refactored plugin loading logic into load_and_run_plugin() method for
    maintainability
  • Replaced multiple if-elif statements with loop-based plugin loading
  • Updated module description to include FragPipe support
  • Fixed heatmap_color_list variable scope by assigning to instance
    variable
+32/-36 
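A minimal sketch of the loop-based loading pattern described above: resolve the plugin class from a central map, parse inputs, and draw plots only when required files were found. The exact shape of PLUGIN_MAP and the constructor arguments in core.py are assumptions for illustration:

```python
import importlib
from types import SimpleNamespace

# Hypothetical shape: CLI flag name -> (module path, class name).
PLUGIN_MAP = {
    "fragpipe_plugin": ("pmultiqc.modules.fragpipe", "FragPipeModule"),
    "diann_plugin": ("pmultiqc.modules.diann", "DiannModule"),
}

def load_and_run_plugin(flag_name, plugin_map=PLUGIN_MAP,
                        importer=importlib.import_module, **kwargs):
    # Replaces per-plugin if/elif blocks: look up the module path and class
    # name, import dynamically, then run the common get_data/draw_plots flow.
    module_path, class_name = plugin_map[flag_name]
    plugin_cls = getattr(importer(module_path), class_name)
    plugin = plugin_cls(**kwargs)
    if plugin.get_data():
        plugin.draw_plots()
        return True
    return False

# Demo with a stand-in plugin (no real pmultiqc import needed):
class _DemoPlugin:
    def get_data(self):
        return True
    def draw_plots(self):
        print("plots drawn")

demo_mod = SimpleNamespace(DemoPlugin=_DemoPlugin)
ok = load_and_run_plugin(
    "demo",
    plugin_map={"demo": ("demo.module", "DemoPlugin")},
    importer=lambda path: demo_mod,
)
print(ok)  # True
```

Injecting the importer keeps the loader testable without importing real plugin packages.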
Formatting
1 file
maxquant_utils.py
Code formatting fix for percentage calculation                     

pmultiqc/modules/maxquant/maxquant_utils.py

  • Fixed code formatting/indentation for percentage calculation in
    peptide per protein analysis
+2/-2     
Configuration changes
3 files
main.py
Add TSV file configuration for FragPipe support                   

pmultiqc/main.py

  • Added pmultiqc/tsv configuration for FragPipe TSV file detection
  • Improved code formatting for configuration update calls with better
    line breaks
+24/-6   
config.json
Add FragPipe example dataset configurations                           

docs/config.json

  • Added two new example configurations for FragPipe (PXD070239 and
    PXD070239_disable_hoverinfo)
  • Configured to download psm.tsv from PRIDE FTP repository
+16/-0   
pyproject.toml
Register FragPipe plugin CLI entry point                                 

pyproject.toml

  • Added fragpipe_plugin entry point for CLI command registration
+1/-0     
Documentation
4 files
update_examples.py
Add FragPipe example report generation                                     

docs/update_examples.py

  • Added FragPipe plugin command generation for example report creation
+3/-0     
README.md
Add FragPipe documentation and usage examples                       

docs/README.md

  • Added FragPipe as supported data source with psm.tsv file requirement
  • Added FragPipe usage example with --fragpipe-plugin command
  • Added FragPipe to command-line options table
  • Added FragPipe example report link to documentation
+11/-0   
multiqc_report.html
Update example report with FragPipe documentation               

docs/PXD054720/multiqc_report.html

  • Updated report UUID and creation timestamp
  • Updated module description to include FragPipe support with link
  • Updated AI report metadata to reflect FragPipe addition
+7/-6     
README.md
Document FragPipe support and usage instructions                 

README.md

  • Added FragPipe to the list of supported data sources with psm.tsv file
    format
  • Included usage instructions for FragPipe plugin with example command
  • Added --fragpipe-plugin to the command-line options table with
    description
+10/-0   
Tests
1 file
python-app.yml
Add FragPipe integration test to CI/CD pipeline                   

.github/workflows/python-app.yml

  • Added new test_fragpipe job to the CI/CD workflow
  • Downloads FragPipe test data from PRIDE repository
  • Runs MultiQC with --fragpipe-plugin flag on test data
  • Uploads generated results as artifacts for validation
+24/-1   
Additional files
18 files
multiqc_report.html +37/-35 
multiqc_report.html +37/-35 
multiqc_report.html +9/-7     
multiqc_report.html +9/-7     
multiqc_report.html +9/-7     
multiqc_report.html +11/-9   
multiqc_report.html +10/-8   
multiqc_report.html +9/-7     
multiqc_report.html +7/-6     
multiqc_report.html +7/-6     
multiqc_report.html +3371/-0
multiqc_report.html +3371/-0
multiqc_report.html +7/-6     
multiqc_report.html +7/-6     
multiqc_report.html +9/-7     
multiqc_report.html +9/-7     
multiqc_report.html +37/-35 
multiqc_report.html +37/-35 

Summary by CodeRabbit

Release Notes

  • New Features

    • Added FragPipe plugin support for analyzing PSM data files with new --fragpipe-plugin CLI option
    • Added FragPipe-specific visualizations for delta mass, charge state, and retention time analysis
    • Implemented dynamic plugin loading system for improved plugin management
  • Documentation

    • Added FragPipe data source documentation and usage examples
  • Improvements

    • Enhanced delta mass plot calculations and charge state handling


@ypriverol ypriverol requested review from Copilot and yueqixuan January 4, 2026 16:43
@coderabbitai
Contributor

coderabbitai bot commented Jan 4, 2026

📝 Walkthrough


This PR adds comprehensive FragPipe support to pmultiqc by introducing a new FragPipe module for PSM data parsing and visualization, refactoring the plugin system to use dynamic loading via PLUGIN_MAP, renaming identification-related API functions for consistency, and updating dependent modules to adopt the new patterns and base class.

Changes

Cohort / File(s) Summary
FragPipe Module Implementation
pmultiqc/modules/fragpipe/__init__.py, pmultiqc/modules/fragpipe/fragpipe.py, pmultiqc/modules/fragpipe/fragpipe_io.py
New complete FragPipe support: FragPipeModule parses PSM data, extracts delta masses, charge states, retention times, and pipeline statistics; fragpipe_io provides file discovery and PSM reading with column validation.
Plugin System Refactoring
pmultiqc/modules/core/core.py, pmultiqc/cli.py, pyproject.toml
Introduces PLUGIN_MAP for dynamic plugin loading, adds --fragpipe-plugin flag, replaces hard-coded plugin blocks with generic loader via load_and_run_plugin().
Identification Plotting API Refactoring
pmultiqc/modules/common/plots/id.py
Renames draw_quantms_identification → draw_identification and draw_quantms_identi_num → draw_identi_num; updates parameter names: enable_dia → use_two_columns, enable_mzid → is_fragpipe_or_mzid; adds FragPipe-specific branches for the retention-time plot.
DIA Module Visualization Updates
pmultiqc/modules/common/plots/dia.py
Simplifies charge-state section naming; adjusts delta mass data selection to include zero values; adds plot_html_check post-processing.
Module Updates with New Base Class & API
pmultiqc/modules/diann/diann.py, pmultiqc/modules/quantms/quantms.py, pmultiqc/modules/mzidentml/mzidentml.py
Updates to inherit from BasePMultiqcModule; calls renamed identification functions; adapts parameter names for consistency across DiannModule, QuantMSModule, and MzIdentML processing.
Documentation & Configuration
README.md, docs/README.md, docs/config.json, docs/update_examples.py, pmultiqc/main.py
Adds FragPipe data source documentation, CLI usage examples; adds two FragPipe project entries to config; adds FragPipe plugin branch in example runner; adds TSV file pattern to pmultiqc config.
Report & Report Metadata
docs/PXD054720/multiqc_report.html, docs/PXD054720_disable_hoverinfo/multiqc_report.html, .github/workflows/python-app.yml
Updates report UUIDs and timestamps; reinstates proteobench path whitespace; adds new test_fragpipe CI job.
Minor Refactoring
pmultiqc/modules/maxquant/maxquant_utils.py
Formatting adjustment to percentage calculation (no semantic change).

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Core as PMultiQC Core
    participant PM as PLUGIN_MAP
    participant Loader as Plugin Loader
    participant Fragment as FragPipe Module
    participant Log as Log Parser

    CLI->>Core: Initialize with --fragpipe-plugin flag
    Core->>Core: Check PLUGIN_MAP
    Core->>PM: Lookup fragpipe_plugin key
    PM-->>Core: Return module path
    Core->>Loader: load_and_run_plugin()
    Loader->>Fragment: Dynamically import & instantiate<br/>FragPipeModule(find_log_files, sub_sections, heatmap_colors)
    Fragment->>Log: find_log_files() discover PSM data
    Log-->>Fragment: Return fragpipe_files dict
    Fragment->>Fragment: parse_psm() extract delta masses,<br/>charges, stats, retention
    Fragment->>Fragment: draw_plots() generate visualizations
    Fragment-->>Core: Return data & plots
    Core->>Core: Add plots to report
    Core-->>CLI: Report generated

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

Review effort 4/5

Suggested reviewers

  • daichengxin
  • yueqixuan

Poem

🐰 A rabbit hops through FragPipe's data stream,
Parsing PSM files and charge-state dreams,
Plugins now dance in dynamic grace,
While renamed APIs find their place! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 3.13%, which is below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'FragPipe first implementation' clearly and specifically describes the main change, adding initial FragPipe plugin support, which is confirmed by the extensive changes across multiple files.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.3)
docs/PXD054720/multiqc_report.html
docs/PXD054720_disable_hoverinfo/multiqc_report.html


@qodo-code-review
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Resource exhaustion

Description: Unbounded pd.read_csv() on attacker-controlled *.tsv inputs can enable resource-exhaustion
(CPU/RAM) denial-of-service by providing extremely large or pathological TSV files (no
size/row limits, dtype constraints, or chunking).
fragpipe_io.py [72-85]

Referred Code
def psm_reader(file_path: str):

    psm_df = pd.read_csv(file_path, sep="\t")

    if "Spectrum" not in psm_df.columns:
        raise ValueError("psm.tsv must contain a 'Spectrum' column")

    required_cols = [c for c in REQUIRED_COLS["psm"] if c in psm_df.columns]

    psm_df["Run"] = psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]
    required_cols.append("Run")

    return psm_df[required_cols].copy()
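One hedged way to address the unbounded read noted above is a size guard before `pd.read_csv()`, optionally combined with its `nrows=`/`chunksize=` parameters. The 500 MB cap and the helper names below are illustrative, not pmultiqc settings:

```python
import os

MAX_TSV_BYTES = 500 * 1024 * 1024  # illustrative cap, not a real pmultiqc limit

def within_size_limit(num_bytes: int, max_bytes: int = MAX_TSV_BYTES) -> bool:
    # Pure size comparison, kept separate so it is easy to test.
    return num_bytes <= max_bytes

def check_tsv_size(file_path: str, max_bytes: int = MAX_TSV_BYTES) -> bool:
    # Refuse pathological inputs before pandas allocates memory for them.
    return within_size_limit(os.path.getsize(file_path), max_bytes)

print(within_size_limit(10, 100))   # True
print(within_size_limit(200, 100))  # False
```

A caller could invoke `check_tsv_size(file_path)` at the top of `psm_reader()` and raise a clear ValueError when it returns False.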
Unpinned CI download

Description: CI downloads external test data via wget from a remote PRIDE URL without pinning a
checksum/signature, creating a supply-chain risk where compromised/altered remote content
could influence the build and execution environment.
python-app.yml [274-277]

Referred Code
- name: Test FragPipe file
  run: |
    wget -nv -P ./fragpipe https://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD070239/psm.tsv
    multiqc --fragpipe-plugin ./fragpipe -o ./results_fragpipe
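To address the unpinned-download concern, the workflow could record a checksum next to the URL and verify it before running multiqc. A Python sketch of the verification step follows; the expected digest for psm.tsv is not known here and would have to be pinned when the test data is first fetched:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_download(data: bytes, expected_sha256: str) -> bool:
    # Fail the CI job on mismatch instead of feeding tampered data to multiqc.
    return sha256_hex(data) == expected_sha256

# SHA-256 of the empty byte string, shown only to illustrate the comparison.
print(verify_download(
    b"",
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
))  # True
```

In the workflow itself, the equivalent one-liner would be a pinned `sha256sum --check` step after the wget.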
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Misspelled identifier: The new variable name categorys is misspelled and reduces readability/self-documentation.

Referred Code
categorys = OrderedDict()
categorys["Frequency"] = {
    "name": "Frequency",
    "description": "number of peptides per proteins",
}

pep_plot.to_dict(percentage=True, cats=categorys)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
None not handled: fragpipe_io.get_fragpipe_files() can return None, but get_data() assumes a dict and
immediately indexes self.fragpipe_files["psm"], which can raise an exception
instead of degrading gracefully.

Referred Code
self.fragpipe_files = fragpipe_io.get_fragpipe_files(self.find_log_files)

if self.fragpipe_files["psm"]:
    (
        self.delta_masses,
        self.charge_states,
        self.pipeline_stats,
        self.retentions
    ) = self.parse_psm(
        fragpipe_files=self.fragpipe_files
    )
else:
    log.warning("Required input not found: psm.tsv")
    return False

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Weak input validation: psm_reader() does not validate delimiter/encoding or enforce required columns beyond
Spectrum, so malformed/unexpected TSV inputs may cause downstream failures rather than
being rejected with actionable validation errors.

Referred Code
def psm_reader(file_path: str):

    psm_df = pd.read_csv(file_path, sep="\t")

    if "Spectrum" not in psm_df.columns:
        raise ValueError("psm.tsv must contain a 'Spectrum' column")

    required_cols = [c for c in REQUIRED_COLS["psm"] if c in psm_df.columns]

    psm_df["Run"] = psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]
    required_cols.append("Run")

    return psm_df[required_cols].copy()

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No user context: Logging is present for file parsing and plot generation but does not include user
identity/context required by the audit trail criteria, which may or may not be applicable
to this CLI/tooling context.

Referred Code
    log.info("Starting data recognition and processing...")

    self.fragpipe_files = fragpipe_io.get_fragpipe_files(self.find_log_files)

    if self.fragpipe_files["psm"]:
        (
            self.delta_masses,
            self.charge_states,
            self.pipeline_stats,
            self.retentions
        ) = self.parse_psm(
            fragpipe_files=self.fragpipe_files
        )
    else:
        log.warning("Required input not found: psm.tsv")
        return False

    return True

def draw_plots(self):



 ... (clipped 2 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Uncaught read errors: psm_reader() performs pd.read_csv() without catching exceptions, which may surface stack
traces and internal details to end users depending on MultiQC's exception handling
behavior.

Referred Code
def psm_reader(file_path: str):

    psm_df = pd.read_csv(file_path, sep="\t")

    if "Spectrum" not in psm_df.columns:
        raise ValueError("psm.tsv must contain a 'Spectrum' column")

    required_cols = [c for c in REQUIRED_COLS["psm"] if c in psm_df.columns]

    psm_df["Run"] = psm_df["Spectrum"].astype(str).str.rsplit(".", n=3).str[0]
    required_cols.append("Run")

    return psm_df[required_cols].copy()

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Logs full paths: Debug logging outputs full local file paths (log.debug(...)), which can be sensitive in
some environments and should be reviewed against the project's logging policy.

Referred Code
if any(fragpipe_files.values()):

    for k, v in fragpipe_files.items():
        log.info(f"FragPipe data loaded: {k} ({len(v)} files).")
        log.debug(f"FragPipe data loaded: {k}: {v}")

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Correct broken test data URL

Correct the broken FTP URL for the psm.tsv file in the PXD070239 project
configuration. The current URL points to a non-existent future path; it should
be updated to the correct path in the PRIDE repository.

docs/config.json [167-182]

 {
     "accession": "PXD070239",
     "urls": [
-        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD070239/psm.tsv"
+        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/01/PXD070239/fragpipe_output/psm.tsv"
     ],
     "path": "docs/PXD070239",
     "file_type": ["fragpipe", ""]
 },
 {
     "accession": "PXD070239_disable_hoverinfo",
     "urls": [
-        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2025/12/PXD070239/psm.tsv"
+        "https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/01/PXD070239/fragpipe_output/psm.tsv"
     ],
     "path": "docs/PXD070239_disable_hoverinfo",
     "file_type": ["fragpipe", "disable_hoverinfo"]
 }
Suggestion importance [1-10]: 9 (High)

Why: The suggestion correctly identifies a broken URL for test data, which would cause the CI pipeline and documentation generation to fail, making this a critical fix for the PR's functionality.
Add None check for file loading

Add a check to ensure the result of fragpipe_io.get_fragpipe_files is not None
before attempting to access the ["psm"] key to prevent a potential runtime
error.

pmultiqc/modules/fragpipe/fragpipe.py [52-65]

-self.fragpipe_files = fragpipe_io.get_fragpipe_files(self.find_log_files)
-
-if self.fragpipe_files["psm"]:
-    (
-        self.delta_masses,
-        self.charge_states,
-        self.pipeline_stats,
-        self.retentions
-    ) = self.parse_psm(
-        fragpipe_files=self.fragpipe_files
-    )
-else:
+files = fragpipe_io.get_fragpipe_files(self.find_log_files)
+if not files or not files.get("psm"):
     log.warning("Required input not found: psm.tsv")
     return False
+self.fragpipe_files = files
 
+(
+    self.delta_masses,
+    self.charge_states,
+    self.pipeline_stats,
+    self.retentions
+) = self.parse_psm(
+    fragpipe_files=self.fragpipe_files
+)
+
Suggestion importance [1-10]: 8 (Medium)

Why: This suggestion correctly identifies a potential TypeError if get_fragpipe_files returns None and prevents it by adding a necessary guard clause, which is a critical fix for robust error handling.
Fix dead code by updating file list

Add 'peptide' and 'protein' to the required_files list to activate the existing
logic for handling these file types and remove dead code.

pmultiqc/modules/fragpipe/fragpipe_io.py [35-56]

-required_files = ["psm"]
+required_files = ["psm", "peptide", "protein"]
 req_set = set(required_files)
 
 fragpipe_files = {req: [] for req in required_files}
 
 # FragPipe *tsv Data
 for file_info in find_log_files("pmultiqc/tsv", filecontents=False):
     filename = file_info["fn"]
     full_path = os.path.join(file_info["root"], filename)
 
     for req in req_set:
         if req in filename:
 
             if req == "protein" and "combined_protein" in filename:
                 continue
 
             if req == "peptide" and (
                 "combined_peptide" in filename or "combined_modified_peptide" in filename
             ):
                 continue
 
             fragpipe_files[req].append(full_path)
Suggestion importance [1-10]: 5 (Low)

Why: The suggestion correctly identifies dead code and an inconsistency, as the code includes logic for protein and peptide files but doesn't search for them. Applying this change improves code quality and maintainability for future extensions.
General
Improve performance and memory usage

Refactor the _calculate_statistics function to use nunique() for counting unique
items, improving performance and reducing memory usage compared to the current
len(set(...)) and groupby().agg(list) approach.

pmultiqc/modules/fragpipe/fragpipe.py [277-312]

 def _calculate_statistics(pipeline_stats: list):
 
     df = pd.concat(pipeline_stats, ignore_index=True)
     log.info(f"Number of pipeline result statistics rows in DataFrame: {len(df)}")
 
     summary_data = {
-        "total_proteins": len(set(df["Protein"])),
-        "total_peptides": len(set(df["Peptide"]))
+        "total_proteins": df["Protein"].nunique(),
+        "total_peptides": df["Peptide"].nunique()
     }
 
     stats_by_run = dict()
     for run, group in df.groupby("Run"):
 
         unique_group = group.loc[group["Is Unique"]]
 
         modified_peptides = group.loc[
             group["Modified Peptide"].notna() & (group["Modified Peptide"] != ""),
             "Modified Peptide"
         ]
 
         stats_by_run[run] = {
-            "protein_num": len(set(group["Protein"])),
-            "peptide_num": len(set(group["Peptide"])),
-            "unique_peptide_num": len(set(unique_group["Peptide"])),
+            "protein_num": group["Protein"].nunique(),
+            "peptide_num": group["Peptide"].nunique(),
+            "unique_peptide_num": unique_group["Peptide"].nunique(),
             "modified_peptide_num": modified_peptides.nunique()
         }
     
     statistics_data = {
         "ms_runs": stats_by_run
     }
 
-    protein_pep_map = df.groupby("Protein")["Peptide"].agg(list).to_dict()
     pep_plot = Histogram("number of peptides per proteins", plot_category="frequency")
-    for _, peps in protein_pep_map.items():
-        number = len(set(peps))
+    pep_counts = df.groupby("Protein")["Peptide"].nunique()
+    for number in pep_counts:
         pep_plot.add_value(number)
     ...

Suggestion importance [1-10]: 7 (Medium)

Why: The suggestion provides a significant performance and memory optimization by replacing inefficient pandas operations with more direct and memory-friendly equivalents like nunique(), which is crucial for handling large datasets.
Security
Open external link safely

Add target="_blank" and rel="noopener noreferrer" to the external link for
FragPipe to enhance security and user experience.

docs/PXD054720/multiqc_report.html [1392]

-<a href='https://fragpipe.nesvilab.org/'>FragPipe</a>.
+<a href="https://fragpipe.nesvilab.org/" target="_blank" rel="noopener noreferrer">FragPipe</a>.

Suggestion importance [1-10]: 7 (Medium)

Why: The suggestion correctly identifies a security best practice for external links by adding target="_blank" and rel="noopener noreferrer" to prevent tabnabbing, which is relevant to the newly added <a> tag.
  • More

Copilot AI left a comment

Pull request overview

This PR implements support for FragPipe proteomics analysis results in the pmultiqc tool, adding the capability to generate quality control reports from FragPipe's PSM (peptide-spectrum match) output files.

Key changes:

  • Added FragPipeModule with support for PSM file parsing and visualization
  • Refactored existing modules to use a common base class (BasePMultiqcModule)
  • Renamed several functions for better consistency across modules (e.g., draw_quantms_identification → draw_identification)

Reviewed changes

Copilot reviewed 19 out of 38 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pmultiqc/modules/fragpipe/* New FragPipe module implementation including I/O and visualization logic
pmultiqc/modules/base.py New base class for all pmultiqc modules providing common initialization
pmultiqc/modules/core/core.py Refactored plugin loading to use a mapping structure and consolidated plugin initialization
pmultiqc/cli.py Added --fragpipe-plugin command-line option
pyproject.toml Registered the FragPipe plugin entry point
pmultiqc/main.py Added TSV file pattern configuration for FragPipe file discovery
pmultiqc/modules/common/plots/id.py Renamed functions and parameters for better generalization across modules
pmultiqc/modules/quantms/quantms.py Updated to use BasePMultiqcModule and renamed function calls
pmultiqc/modules/diann/diann.py Updated to use BasePMultiqcModule and renamed function calls
pmultiqc/modules/mzidentml/mzidentml.py Updated to use renamed functions
docs/* Updated documentation and examples, added FragPipe configuration
.github/workflows/python-app.yml Added CI/CD test for FragPipe functionality


Comment on lines +87 to +95
# def check_columns(file_path: str, data_type: str):

# try:
# df_header = pd.read_csv(file_path, sep="\t", nrows=0)
# actual_columns = set(df_header.columns.str.strip())

# missing_columns = [
# col
# for col in REQUIRED_COLS[data_type]
Copilot AI, Jan 4, 2026:

This comment appears to contain commented-out code.
Comment on lines +99 to +104
# if not missing_columns:
# log.info("Check passed: All required columns are present.")
# return True
# else:
# log.info(
# f"Check failed: {file_path}. Missing the following columns: {missing_columns}"
Copilot AI, Jan 4, 2026:

This comment appears to contain commented-out code.
Comment on lines +108 to +110
# except Exception as e:
# log.warning(f"Check failed: Unable to read file: {e}")
# return False
Copilot AI, Jan 4, 2026:

This comment appears to contain commented-out code.

if diann.get_data():
diann.draw_plots()
plugin_loaded = True
Copilot AI, Jan 4, 2026:

Variable plugin_loaded is not used.
Contributor @coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (7)
pmultiqc/modules/maxquant/maxquant_utils.py (1)

119-121: Move DataFrame copy outside the loop to improve performance.

The .copy() call inside the loop creates a full copy of the DataFrame on every iteration, which is inefficient for large datasets. Move the copy before the loop if needed, or remove it if column assignments don't require a separate copy.

+            mq_data = mq_data.copy()
             for col, new_col in zip(intensity_cols, new_intensity_cols):
                 mq_data[new_col] = mq_data[col] / mq_data["mol. weight [kda]"]
-                mq_data = mq_data.copy()
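For illustration, the suggested pattern can be sketched with a toy frame. The column names below are stand-ins mirroring the maxquant_utils code, not real MaxQuant data:

```python
import pandas as pd

# Illustrative data: intensity columns normalized by molecular weight,
# mirroring the maxquant_utils loop. Column names are made up for the sketch.
mq_data = pd.DataFrame(
    {
        "intensity a": [10.0, 20.0],
        "intensity b": [30.0, 40.0],
        "mol. weight [kda]": [2.0, 4.0],
    }
)

intensity_cols = ["intensity a", "intensity b"]
new_intensity_cols = ["norm a", "norm b"]

# Copy once, before the loop; a per-iteration copy re-copies the whole
# frame on every pass for no benefit.
mq_data = mq_data.copy()
for col, new_col in zip(intensity_cols, new_intensity_cols):
    mq_data[new_col] = mq_data[col] / mq_data["mol. weight [kda]"]

print(mq_data["norm a"].tolist())  # [5.0, 5.0]
```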
.github/workflows/python-app.yml (1)

269-269: Update to the latest version of actions/setup-python.

The actions/setup-python@v4 is outdated. The latest stable version is v6.1.0. Consider updating to @v6 across all jobs in the workflow.

pmultiqc/modules/fragpipe/fragpipe_io.py (2)

45-56: Dead code: protein/peptide filtering will never execute.

The filtering conditions on lines 48-54 check for "protein" and "peptide" in req, but required_files only contains "psm" (line 35). This code is unreachable.

Additionally, the substring match if req in filename (line 46) could match unintended files (e.g., my_psm_backup.tsv). Consider using a more precise pattern like checking if the filename ends with psm.tsv.

🔎 Proposed fix
-        for req in req_set:
-            if req in filename:
-
-                if req == "protein" and "combined_protein" in filename:
-                    continue
-
-                if req == "peptide" and (
-                    "combined_peptide" in filename or "combined_modified_peptide" in filename
-                ):
-                    continue
-
-                fragpipe_files[req].append(full_path)
+        for req in req_set:
+            # Match files ending with the expected pattern (e.g., psm.tsv)
+            if filename.endswith(f"{req}.tsv"):
+                fragpipe_files[req].append(full_path)
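The difference between the original substring check and the suggested suffix check can be shown with made-up filenames:

```python
# Illustrative filenames; only the suffix check excludes the backup file.
files = ["sample1/psm.tsv", "my_psm_backup.tsv", "combined_protein.tsv"]

# Original behavior: substring match on the bare filename.
substring_hits = [f for f in files if "psm" in f.rsplit("/", 1)[-1]]

# Suggested behavior: only files whose name ends with psm.tsv.
suffix_hits = [f for f in files if f.rsplit("/", 1)[-1].endswith("psm.tsv")]

print(substring_hits)  # ['sample1/psm.tsv', 'my_psm_backup.tsv']
print(suffix_hits)     # ['sample1/psm.tsv']
```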

87-110: Consider removing or enabling the commented-out helper.

This commented-out check_columns function could be useful for validating file structure before processing. Consider either enabling it in the workflow or removing it to reduce code clutter.
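If the helper were enabled, a self-contained version might look like the following. REQUIRED_COLS here is an illustrative stand-in for the module's real mapping, and the function accepts anything pd.read_csv can read (a path or a file-like object):

```python
import io
import logging

import pandas as pd

log = logging.getLogger(__name__)

# Stand-in for the module's REQUIRED_COLS mapping (illustrative values).
REQUIRED_COLS = {"psm": ["Spectrum", "Peptide"]}

def check_columns(file_or_path, data_type: str) -> bool:
    """Validate that a TSV header contains all required columns."""
    try:
        # nrows=0 reads only the header row, so large files stay cheap.
        df_header = pd.read_csv(file_or_path, sep="\t", nrows=0)
        actual_columns = set(df_header.columns.str.strip())
        missing = [c for c in REQUIRED_COLS[data_type] if c not in actual_columns]
        if not missing:
            log.info("Check passed: all required columns are present.")
            return True
        log.info(f"Check failed: missing columns: {missing}")
        return False
    except Exception as e:
        log.warning(f"Check failed: unable to read file: {e}")
        return False

print(check_columns(io.StringIO("Spectrum\tPeptide\tCharge\n"), "psm"))  # True
```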

pmultiqc/modules/core/core.py (1)

75-101: Redundant flag: plugin_loaded is set but never read.

The return statement on line 98 exits the method immediately after loading the first matching plugin, so line 100 is only reached when no plugin flag matched in the loop. In that case plugin_loaded was never set to True, so the if not plugin_loaded guard is always true when reached.

The logic works correctly, but setting plugin_loaded = True on line 97 is unnecessary since the method returns immediately afterward.

🔎 Suggested simplification
         for flag, (module_name, class_name) in PLUGIN_MAP.items():
 
             if config.kwargs.get(flag, False):
 
                 ModuleClass = get_module(module_name, class_name)
 
                 if "proteobench_plugin" == flag:
                     plugin = ModuleClass(self.find_log_files, None, None)
                 else:
                     plugin = ModuleClass(
                         self.find_log_files,
                         self.sub_sections,
                         self.heatmap_color_list
                     )
 
                 if plugin.get_data():
                     plugin.draw_plots()
 
-                plugin_loaded = True
                 return
 
-        if not plugin_loaded:
-            raise ValueError("No pmultiqc plugin selected; skipping.")
+        raise ValueError("No pmultiqc plugin selected; skipping.")
pmultiqc/modules/fragpipe/fragpipe.py (2)

309-320: Typo: categorys should be categories.

Minor spelling inconsistency with other modules.

🔎 Proposed fix
-    categorys = OrderedDict()
-    categorys["Frequency"] = {
+    categories = OrderedDict()
+    categories["Frequency"] = {
         "name": "Frequency",
         "description": "number of peptides per proteins",
     }
 
-    pep_plot.to_dict(percentage=True, cats=categorys)
+    pep_plot.to_dict(percentage=True, cats=categories)

277-322: Align the histogram title casing with other modules.

_calculate_statistics is a module-level function (good for testability), but the Histogram title "number of peptides per proteins" uses lowercase, which differs from other modules that use "Number of peptides per proteins". Consider aligning for consistency.

🔎 Consistency fix for histogram title
-    pep_plot = Histogram("number of peptides per proteins", plot_category="frequency")
+    pep_plot = Histogram("Number of peptides per proteins", plot_category="frequency")
-        "description": "number of peptides per proteins",
+        "description": "Number of identified peptides per protein.",
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4d7134 and 1859013.

📒 Files selected for processing (38)
  • .github/workflows/python-app.yml
  • README.md
  • docs/DIANN/multiqc_report.html
  • docs/DIANN_disable_hoverinfo/multiqc_report.html
  • docs/LFQ_PXD007683/multiqc_report.html
  • docs/LFQ_PXD007683_disable_hoverinfo/multiqc_report.html
  • docs/MaxDIA/multiqc_report.html
  • docs/MaxDIA_disable_hoverinfo/multiqc_report.html
  • docs/PXD003133/multiqc_report.html
  • docs/PXD003133_disable_hoverinfo/multiqc_report.html
  • docs/PXD053068/multiqc_report.html
  • docs/PXD053068_disable_hoverinfo/multiqc_report.html
  • docs/PXD054720/multiqc_report.html
  • docs/PXD054720_disable_hoverinfo/multiqc_report.html
  • docs/PXD070239/multiqc_report.html
  • docs/PXD070239_disable_hoverinfo/multiqc_report.html
  • docs/ProteoBench/multiqc_report.html
  • docs/ProteoBench_disable_hoverinfo/multiqc_report.html
  • docs/README.md
  • docs/TMT_PXD007683/multiqc_report.html
  • docs/TMT_PXD007683_disable_hoverinfo/multiqc_report.html
  • docs/config.json
  • docs/dia/multiqc_report.html
  • docs/dia_disable_hoverinfo/multiqc_report.html
  • docs/update_examples.py
  • pmultiqc/cli.py
  • pmultiqc/main.py
  • pmultiqc/modules/common/plots/dia.py
  • pmultiqc/modules/common/plots/id.py
  • pmultiqc/modules/core/core.py
  • pmultiqc/modules/diann/diann.py
  • pmultiqc/modules/fragpipe/__init__.py
  • pmultiqc/modules/fragpipe/fragpipe.py
  • pmultiqc/modules/fragpipe/fragpipe_io.py
  • pmultiqc/modules/maxquant/maxquant_utils.py
  • pmultiqc/modules/mzidentml/mzidentml.py
  • pmultiqc/modules/quantms/quantms.py
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (4)
pmultiqc/modules/fragpipe/fragpipe_io.py (1)
pmultiqc/modules/common/logging.py (1)
  • get_logger (31-86)
pmultiqc/modules/diann/diann.py (2)
pmultiqc/modules/base.py (1)
  • BasePMultiqcModule (3-20)
pmultiqc/modules/common/plots/id.py (3)
  • draw_identification (385-517)
  • draw_summary_protein_ident_table (546-616)
  • draw_identi_num (619-837)
pmultiqc/modules/fragpipe/__init__.py (1)
pmultiqc/modules/fragpipe/fragpipe.py (1)
  • FragPipeModule (35-274)
pmultiqc/modules/common/plots/dia.py (1)
pmultiqc/modules/common/plots/general.py (1)
  • plot_html_check (17-22)
🪛 actionlint (1.7.9)
.github/workflows/python-app.yml

269-269: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 markdownlint-cli2 (0.18.1)
docs/README.md

84-84: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)

🪛 Ruff (0.14.10)
pmultiqc/modules/fragpipe/fragpipe_io.py

77-77: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test_fragpipe
  • GitHub Check: test_proteobench
  • GitHub Check: test_mzid_mzML
  • GitHub Check: test_tmt
  • GitHub Check: test_maxquant
  • GitHub Check: test_lfq
  • GitHub Check: test_mzid_mgf
  • GitHub Check: test_maxquant_dia
  • GitHub Check: test_diann
  • GitHub Check: test_dia
  • GitHub Check: Agent
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (43)
pmultiqc/modules/maxquant/maxquant_utils.py (1)

406-408: Readability improvement via explicit parentheses.

The reformatted expression with parentheses clarifies operator precedence and improves readability. The functionality remains unchanged.

pmultiqc/modules/common/plots/dia.py (2)

224-224: LGTM! Simplified charge-state plot configuration.

The changes simplify the charge-state plot labels and configuration by removing "of Per File" from the title and ID. The addition of cpswitch_c_active: False on line 254 explicitly disables CPSwitch active state, which aligns with the simplified view.

Also applies to: 252-255


380-381: LGTM! Consistent HTML validation.

Adding plot_html_check to the delta mass plot ensures consistent HTML processing across all plots in this module. This aligns with the pattern used throughout the file (lines 39, 132, 171, 208, 267, 321, 415, 451, 490, 525, 560).

docs/README.md (1)

166-170: Documentation additions look good.

The FragPipe usage example, CLI option, and example link follow the established patterns and are consistent with other data source documentation.

Also applies to: 192-192, 246-246

docs/update_examples.py (1)

164-165: Implementation follows the established pattern.

The FragPipe plugin branch is correctly implemented and consistent with other plugin handlers.

.github/workflows/python-app.yml (1)

263-283: FragPipe test job implementation looks good.

The test job structure is consistent with other test jobs, correctly downloads the FragPipe psm.tsv file, and runs the appropriate MultiQC command.

docs/config.json (1)

167-182: FragPipe project entries are well-structured.

The two project entries (standard and disable_hoverinfo variants) follow the established pattern and are correctly configured with appropriate URLs, paths, and file types.

docs/PXD054720/multiqc_report.html (2)

31-31: Auto-generated content updates are expected.

The timestamps, reportUuid, and aiReportMetadataBase64 are auto-generated when the report is regenerated. These changes are not meaningful for code review.

Also applies to: 44-44, 51-51, 1215-1215


1391-1392: Verify intentional documentation updates for FragPipe support.

The references to DIA-NN and FragPipe in the HTML documentation appear to be intentional updates documenting new pipeline support. Ensure this HTML file is meant to be a version-controlled example/documentation artifact (rather than an accidentally-committed generated report).

README.md (3)

63-64: Documentation addition for FragPipe support is clear and consistent.

The new data source entry follows the existing pattern used for other proteomics tools, specifying psm.tsv as the FragPipe report file. This aligns well with the broader FragPipe integration in the codebase.


140-144: Usage example for FragPipe mirrors existing plugin patterns.

The example command is consistent with other plugin documentation (quantms, MaxQuant, DIA-NN, etc.) and clearly shows how to invoke the FragPipe plugin with typical arguments.


166-166: The --fragpipe-plugin option is properly registered in pyproject.toml (line 80) and matches the documentation.

The configuration correctly maps fragpipe_plugin = "pmultiqc.cli:fragpipe_plugin", confirming that the README documentation is accurate and consistent with the project setup.

docs/PXD054720_disable_hoverinfo/multiqc_report.html (4)

26-31: LGTM - Auto-generated report artifacts.

The updated base64-encoded plot data and new reportUuid are expected outputs from regenerating the MultiQC report with FragPipe support.


44-51: LGTM - Updated report metadata.

The updated base64-encoded metadata now includes FragPipe tool information, and the creation timestamp reflects the report regeneration.


1215-1215: LGTM - Consistent timestamp.

The generation date in the report footer is consistent with the configCreationDate value.


1391-1392: LGTM - FragPipe added to supported pipelines.

The FragPipe documentation link is properly formatted and consistent with the other tool references in the list.

pmultiqc/modules/fragpipe/fragpipe_io.py (3)

1-8: LGTM!

Imports and logger initialization follow the established pattern used in other modules.


11-17: LGTM!

The required columns definition is clear. The empty peptide list appears to be a placeholder for future expansion.


72-84: LGTM!

The psm_reader function correctly validates the required "Spectrum" column, derives the "Run" column, and filters to available required columns. The static analysis hint (TRY003) about the exception message is a minor style preference and can be safely ignored here.

pmultiqc/modules/common/plots/id.py (7)

228-235: LGTM!

Description and helptext updated to be more generic, now referencing both MaxQuant's evidence.txt and FragPipe's psm.tsv as data sources.


385-391: LGTM!

Function renamed from draw_quantms_identification to draw_identification for broader applicability across different pipeline sources.


546-599: LGTM!

Parameter renamed from enable_dia to use_two_columns which better describes its purpose (controlling table layout). The conditional logic is updated consistently.


619-627: LGTM!

Function renamed from draw_quantms_identi_num to draw_identi_num for consistency with other renames.


829-836: LGTM!

Description and helptext updated to mention FragPipe data source (psm.tsv), making it clear this table supports multiple pipeline types.


949-999: LGTM!

Parameter renamed from enable_mzid to is_fragpipe_or_mzid which accurately reflects its expanded usage for both mzIdentML and FragPipe contexts.


1044-1048: LGTM!

New FragPipe-specific branch added for retention time plots with appropriate description and helptext referencing the psm.tsv source.

pmultiqc/modules/diann/diann.py (4)

8-8: LGTM!

DiannModule now properly inherits from BasePMultiqcModule, following the new plugin architecture pattern. The base class initialization is correctly delegated.

Also applies to: 37-42


18-23: LGTM!

Import statements updated to use the renamed functions (draw_identification, draw_identi_num).


167-181: LGTM!

Function calls updated to use renamed functions and parameters (use_two_columns, draw_identi_num).


202-206: LGTM!

Call to draw_identification (renamed from draw_quantms_identification) is correctly updated.

pmultiqc/modules/fragpipe/__init__.py (1)

1-3: LGTM!

Clean package initializer that re-exports FragPipeModule following the established pattern used by other modules in the codebase.

pmultiqc/cli.py (1)

79-81: LGTM!

The new --fragpipe-plugin CLI option follows the established pattern used by other plugin flags (--diann-plugin, --maxquant-plugin, etc.).

pyproject.toml (1)

80-80: LGTM!

The fragpipe_plugin CLI option is correctly registered in the MultiQC plugin entry points, following the same pattern as other plugin options.

pmultiqc/main.py (1)

89-121: LGTM!

Formatting changes (multi-line wrapping of dict literals) improve readability without changing functionality.

pmultiqc/modules/mzidentml/mzidentml.py (1)

151-157: LGTM! API call updated correctly.

The function call correctly uses the renamed draw_identification API with all required parameters (cal_num_table_data, quantms_missed_cleavages, quantms_modified, msms_identified_rate) matching the updated function signature.

pmultiqc/modules/core/core.py (1)

13-20: Good extensible plugin registration pattern.

The PLUGIN_MAP provides a clean, maintainable way to register plugins with their module and class names. Adding new plugins only requires a single entry here.
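A minimal sketch of the mapping-based selection, with dummy classes standing in for the real module classes so the snippet is self-contained (the real core.py resolves classes by module and class name instead):

```python
# Dummy stand-ins for the real pmultiqc module classes.
class QuantMSModule: ...
class FragPipeModule: ...

# Flag-to-class registry; adding a plugin is a single new entry.
PLUGIN_MAP = {
    "quantms_plugin": QuantMSModule,
    "fragpipe_plugin": FragPipeModule,
}

def load_plugin(kwargs: dict):
    """Return the first module class whose CLI flag is set, mirroring core.py."""
    for flag, module_class in PLUGIN_MAP.items():
        if kwargs.get(flag, False):
            return module_class
    raise ValueError("No pmultiqc plugin selected; skipping.")

print(load_plugin({"fragpipe_plugin": True}).__name__)  # FragPipeModule
```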

pmultiqc/modules/quantms/quantms.py (3)

85-89: LGTM! Clean base class refactor.

The inheritance from BasePMultiqcModule and proper super().__init__() call correctly aligns with the plugin system refactoring pattern used across other modules.


333-348: API updates consistent with the refactor.

The parameter rename from enable_dia to use_two_columns and function rename from draw_quantms_identi_num to draw_identi_num correctly follow the centralized API changes.


458-464: LGTM! Function call updated to match the renamed API.

The call to draw_identification correctly replaces draw_quantms_identification with all required parameters.

pmultiqc/modules/fragpipe/fragpipe.py (4)

38-46: LGTM! Proper initialization pattern.

The module correctly inherits from BasePMultiqcModule and initializes instance variables for data collection.


142-183: LGTM! Robust PSM parsing with proper error handling.

The static method correctly handles missing/empty files with appropriate logging and gracefully skips unreadable data.


186-206: LGTM! Delta mass calculation follows established patterns.

The method correctly concatenates data, handles type conversion with error coercion, drops invalid values, and uses the shared cal_delta_mass_dict utility.
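The flow can be sketched with toy data; the final binning step of cal_delta_mass_dict is approximated here by a simple value count, which is an assumption for illustration:

```python
import pandas as pd

# Toy per-file frames; the real module concatenates one frame per psm.tsv.
frames = [
    pd.DataFrame({"Delta Mass": ["0.001", "bad", "0.001"]}),
    pd.DataFrame({"Delta Mass": ["-0.002"]}),
]

df = pd.concat(frames, ignore_index=True)
# Coerce to numeric, turning unparseable values into NaN, then drop them.
df["Delta Mass"] = pd.to_numeric(df["Delta Mass"], errors="coerce")
df = df.dropna(subset=["Delta Mass"])

# Stand-in for cal_delta_mass_dict: count occurrences per delta value.
delta_counts = {float(k): int(v) for k, v in df["Delta Mass"].value_counts().items()}
print(delta_counts)  # {0.001: 2, -0.002: 1}
```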


105-108: No action required. The draw_identi_num function signature is fully compatible with the fragpipe.py call. The function definition shows that all parameters except sub_sections have default values (enable_exp=False, enable_sdrf=False, is_multi_conditions=False, sample_df=None, file_df=None, cal_num_table_data=None), so passing only sub_sections and cal_num_table_data is safe and intentional.

- `*.mzML` or `*.mgf`: Corresponding spectra files

### 6. [FragPipe](https://fragpipe.nesvilab.org) files:
- `psm.tsv`: FragPipe report file
Contributor @coderabbitai bot:

⚠️ Potential issue | 🟡 Minor

Fix indentation to align with other list items.

The list item is indented with 3 spaces instead of 0, which is inconsistent with other data source sections in the file and flagged by markdownlint.

🔎 Proposed fix
-   - `psm.tsv`: FragPipe report file
+- `psm.tsv`: FragPipe report file


Comment on lines +149 to +153
if "pmultiqc/tsv" not in config.sp:
config.update_dict(
config.sp,
{"pmultiqc/tsv": {"fn": "*.tsv", "num_lines": 0}},
)
Contributor @coderabbitai bot:

⚠️ Potential issue | 🟡 Minor


The broad *.tsv pattern will match files intended for other modules, but filtering ensures correct behavior.

The *.tsv pattern will cause MultiQC to scan all TSV files, including experimental_design.tsv, *sdrf.tsv, and *report.tsv which are already matched by more specific patterns (pmultiqc/exp_design, pmultiqc/sdrf, pmultiqc/diann_report_tsv). However, fragpipe_io.get_fragpipe_files() filters results to only process files containing "psm" in the filename, so only FragPipe-specific TSV files are actually used. This is a working design, though it may cause unnecessary file scanning. If efficiency is a concern, consider restricting the pattern to *psm*.tsv or similar to avoid overlapping with other module patterns.
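A quick check, under the assumption that MultiQC search patterns use shell-style globs, that a narrowed pattern such as *psm*.tsv still matches FragPipe output while skipping other modules' TSVs (filenames below are illustrative):

```python
import fnmatch

# Suggested narrower pattern for the "pmultiqc/tsv" search entry.
pattern = "*psm*.tsv"

candidates = [
    "psm.tsv",                   # FragPipe report
    "experiment_psm.tsv",        # renamed FragPipe report
    "experimental_design.tsv",   # quantms experimental design
    "sample.sdrf.tsv",           # SDRF metadata
]

matched = [f for f in candidates if fnmatch.fnmatch(f, pattern)]
print(matched)  # ['psm.tsv', 'experiment_psm.tsv']
```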


Comment on lines +341 to 342
delta_df = df[["Ms1.Apex.Mz.Delta", "Run"]].copy()
delta_mass = cal_delta_mass_dict(delta_df, "Ms1.Apex.Mz.Delta")
Contributor @coderabbitai bot:

⚠️ Potential issue | 🟡 Minor

Remove the unused "Run" column from delta_df.

The cal_delta_mass_dict function only uses the column specified in its second parameter. Since only "Ms1.Apex.Mz.Delta" is accessed, the "Run" column in delta_df is dead code. Change line 341 to:

delta_df = df[["Ms1.Apex.Mz.Delta"]].copy()

@ypriverol ypriverol merged commit eac5452 into main Jan 4, 2026
27 of 28 checks passed

2 participants