Add FragPipe plugin
Auto-update examples
📝 Walkthrough
This PR adds comprehensive FragPipe support to pmultiqc by introducing a new FragPipe module for PSM data parsing and visualization, refactoring the plugin system to use dynamic loading via PLUGIN_MAP, renaming identification-related API functions for consistency, and updating dependent modules to adopt the new patterns and base class.
Sequence Diagram(s)
sequenceDiagram
participant CLI
participant Core as PMultiQC Core
participant PM as PLUGIN_MAP
participant Loader as Plugin Loader
participant Fragment as FragPipe Module
participant Log as Log Parser
CLI->>Core: Initialize with --fragpipe-plugin flag
Core->>Core: Check PLUGIN_MAP
Core->>PM: Lookup fragpipe_plugin key
PM-->>Core: Return module path
Core->>Loader: load_and_run_plugin()
Loader->>Fragment: Dynamically import & instantiate<br/>FragPipeModule(find_log_files, sub_sections, heatmap_colors)
Fragment->>Log: find_log_files() discover PSM data
Log-->>Fragment: Return fragpipe_files dict
Fragment->>Fragment: parse_psm() extract delta masses,<br/>charges, stats, retention
Fragment->>Fragment: draw_plots() generate visualizations
Fragment-->>Core: Return data & plots
Core->>Core: Add plots to report
Core-->>CLI: Report generated
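The lookup-and-load flow in the diagram can be sketched in a few lines. This is a minimal illustration under stated assumptions, not pmultiqc's actual code: the `PLUGIN_MAP` entries below point at standard-library classes as stand-ins for real plugin modules, and `load_plugin_class` is a hypothetical helper name. Only the `(module_name, class_name)` mapping shape and the `get_module` name come from the review of `core.py`.

```python
import importlib

# Hypothetical registry: CLI flag -> (module path, class name).
# Standard-library classes stand in for real plugin modules so the sketch runs anywhere.
PLUGIN_MAP = {
    "ordered_plugin": ("collections", "OrderedDict"),
    "counter_plugin": ("collections", "Counter"),
}


def get_module(module_name, class_name):
    """Import module_name on demand and return its attribute class_name."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_plugin_class(enabled_flags):
    """Return the class for the first enabled flag; raise if no flag is enabled."""
    for flag, (module_name, class_name) in PLUGIN_MAP.items():
        if enabled_flags.get(flag, False):
            return get_module(module_name, class_name)
    # Reached only when the loop finished without a match.
    raise ValueError("No plugin selected.")


plugin_class = load_plugin_class({"counter_plugin": True})
print(plugin_class.__name__)
```

Because the function returns on the first match, the error path needs no separate "was anything loaded" flag: falling out of the loop already means nothing matched.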
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Pull request overview
This PR implements support for FragPipe proteomics analysis results in the pmultiqc tool, adding the capability to generate quality control reports from FragPipe's PSM (peptide-spectrum match) output files.
Key changes:
- Added FragPipeModule with support for PSM file parsing and visualization
- Refactored existing modules to use a common base class (BasePMultiqcModule)
- Renamed several functions for better consistency across modules (e.g., draw_quantms_identification → draw_identification)
Reviewed changes
Copilot reviewed 19 out of 38 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pmultiqc/modules/fragpipe/* | New FragPipe module implementation including I/O and visualization logic |
| pmultiqc/modules/base.py | New base class for all pmultiqc modules providing common initialization |
| pmultiqc/modules/core/core.py | Refactored plugin loading to use a mapping structure and consolidated plugin initialization |
| pmultiqc/cli.py | Added --fragpipe-plugin command-line option |
| pyproject.toml | Registered the FragPipe plugin entry point |
| pmultiqc/main.py | Added TSV file pattern configuration for FragPipe file discovery |
| pmultiqc/modules/common/plots/id.py | Renamed functions and parameters for better generalization across modules |
| pmultiqc/modules/quantms/quantms.py | Updated to use BasePMultiqcModule and renamed function calls |
| pmultiqc/modules/diann/diann.py | Updated to use BasePMultiqcModule and renamed function calls |
| pmultiqc/modules/mzidentml/mzidentml.py | Updated to use renamed functions |
| docs/* | Updated documentation and examples, added FragPipe configuration |
| .github/workflows/python-app.yml | Added CI/CD test for FragPipe functionality |
# def check_columns(file_path: str, data_type: str):

#     try:
#         df_header = pd.read_csv(file_path, sep="\t", nrows=0)
#         actual_columns = set(df_header.columns.str.strip())

#         missing_columns = [
#             col
#             for col in REQUIRED_COLS[data_type]
This comment appears to contain commented-out code.
#         if not missing_columns:
#             log.info("Check passed: All required columns are present.")
#             return True
#         else:
#             log.info(
#                 f"Check failed: {file_path}. Missing the following columns: {missing_columns}"
This comment appears to contain commented-out code.
#     except Exception as e:
#         log.warning(f"Check failed: Unable to read file: {e}")
#         return False
This comment appears to contain commented-out code.
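If the commented-out helper were enabled rather than deleted, a header-only column check could look roughly like the sketch below. It is an illustration, not the module's code: it uses the standard-library `csv` reader instead of pandas, `missing_columns` is a hypothetical name, and the required column list is assumed (only `Spectrum` is named in this review).

```python
import csv
import io

# Hypothetical required columns; only "Spectrum" is confirmed by the review.
REQUIRED_COLS = {"psm": ["Spectrum", "Charge", "Retention"]}


def missing_columns(handle, data_type):
    """Read only the header row of a tab-separated stream and list absent required columns."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    actual = {name.strip() for name in header}
    return [col for col in REQUIRED_COLS[data_type] if col not in actual]


sample = io.StringIO("Spectrum\tCharge\tPeptide\nscan1\t2\tPEPTIDE\n")
print(missing_columns(sample, "psm"))  # ['Retention']
```

Returning the list of missing names, rather than a bare boolean, lets the caller decide whether to log, skip the file, or raise.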
if diann.get_data():
    diann.draw_plots()
    plugin_loaded = True
Variable plugin_loaded is not used.
Actionable comments posted: 3
🧹 Nitpick comments (7)
pmultiqc/modules/maxquant/maxquant_utils.py (1)
119-121: Move DataFrame copy outside the loop to improve performance.
The `.copy()` call inside the loop creates a full copy of the DataFrame on every iteration, which is inefficient for large datasets. Move the copy before the loop if needed, or remove it if column assignments don't require a separate copy.
+ mq_data = mq_data.copy()
  for col, new_col in zip(intensity_cols, new_intensity_cols):
      mq_data[new_col] = mq_data[col] / mq_data["mol. weight [kda]"]
-     mq_data = mq_data.copy()
.github/workflows/python-app.yml (1)
269-269: Update to the latest version of `actions/setup-python`.
`actions/setup-python@v4` is outdated. The latest stable version is `v6.1.0`. Consider updating to `@v6` across all jobs in the workflow.
pmultiqc/modules/fragpipe/fragpipe_io.py (2)
45-56: Dead code: protein/peptide filtering will never execute.
The filtering conditions on lines 48-54 check for `"protein"` and `"peptide"` in `req`, but `required_files` only contains `"psm"` (line 35). This code is unreachable.
Additionally, the substring match `if req in filename` (line 46) could match unintended files (e.g., `my_psm_backup.tsv`). Consider using a more precise pattern like checking if the filename ends with `psm.tsv`.
🔎 Proposed fix
- for req in req_set:
-     if req in filename:
-
-         if req == "protein" and "combined_protein" in filename:
-             continue
-
-         if req == "peptide" and (
-             "combined_peptide" in filename or "combined_modified_peptide" in filename
-         ):
-             continue
-
-         fragpipe_files[req].append(full_path)
+ for req in req_set:
+     # Match files ending with the expected pattern (e.g., psm.tsv)
+     if filename.endswith(f"{req}.tsv"):
+         fragpipe_files[req].append(full_path)
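The difference between the two matching rules is easy to see on a few file names. The sketch below is illustrative only; `matches_substring` and `matches_suffix` are hypothetical helper names, and the paths are made up.

```python
import os


def matches_substring(path, req):
    """Original rule: the key appears anywhere in the file name."""
    return req in os.path.basename(path)


def matches_suffix(path, req):
    """Proposed rule: the file name ends with '<key>.tsv'."""
    return os.path.basename(path).endswith(f"{req}.tsv")


for path in ["run1/psm.tsv", "run1/my_psm_backup.tsv", "run1/combined_protein.tsv"]:
    print(path, matches_substring(path, "psm"), matches_suffix(path, "psm"))
```

Note that the suffix rule still accepts names such as `sample_psm.tsv`; an exact comparison against `psm.tsv` would be stricter if only that name is expected.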
87-110: Consider removing or enabling the commented-out helper.
This commented-out `check_columns` function could be useful for validating file structure before processing. Consider either enabling it in the workflow or removing it to reduce code clutter.
pmultiqc/modules/core/core.py (1)
75-101: Unreachable code: the `if not plugin_loaded` check is dead code.
The `return` statement on line 98 exits the method immediately after loading the first matching plugin, so line 100's condition `if not plugin_loaded` can never be reached when `plugin_loaded` is `True`. The only way to reach line 100 is if no plugin flag matched in the loop, which means the loop completed without setting `plugin_loaded = True`.
The logic works correctly, but setting `plugin_loaded = True` on line 97 is unnecessary since the method returns immediately afterward.
🔎 Suggested simplification
  for flag, (module_name, class_name) in PLUGIN_MAP.items():
      if config.kwargs.get(flag, False):
          ModuleClass = get_module(module_name, class_name)
          if "proteobench_plugin" == flag:
              plugin = ModuleClass(self.find_log_files, None, None)
          else:
              plugin = ModuleClass(
                  self.find_log_files, self.sub_sections, self.heatmap_color_list
              )
          if plugin.get_data():
              plugin.draw_plots()
-             plugin_loaded = True
              return
- if not plugin_loaded:
-     raise ValueError("No pmultiqc plugin selected; skipping.")
+ raise ValueError("No pmultiqc plugin selected; skipping.")
pmultiqc/modules/fragpipe/fragpipe.py (2)
309-320: Typo: `categorys` should be `categories`.
Minor spelling inconsistency with other modules.
🔎 Proposed fix
- categorys = OrderedDict()
- categorys["Frequency"] = {
+ categories = OrderedDict()
+ categories["Frequency"] = {
      "name": "Frequency",
      "description": "number of peptides per proteins",
  }
- pep_plot.to_dict(percentage=True, cats=categorys)
+ pep_plot.to_dict(percentage=True, cats=categories)
277-322: Consider extracting `_calculate_statistics` as a module-level function vs class method.
This is a module-level function (good for testability), but the Histogram title "number of peptides per proteins" uses lowercase, which differs from other modules that use "Number of peptides per proteins". Consider aligning for consistency.
🔎 Consistency fix for histogram title
- pep_plot = Histogram("number of peptides per proteins", plot_category="frequency")
+ pep_plot = Histogram("Number of peptides per proteins", plot_category="frequency")
- "description": "number of peptides per proteins",
+ "description": "Number of identified peptides per protein.",
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (38)
.github/workflows/python-app.yml
README.md
docs/DIANN/multiqc_report.html
docs/DIANN_disable_hoverinfo/multiqc_report.html
docs/LFQ_PXD007683/multiqc_report.html
docs/LFQ_PXD007683_disable_hoverinfo/multiqc_report.html
docs/MaxDIA/multiqc_report.html
docs/MaxDIA_disable_hoverinfo/multiqc_report.html
docs/PXD003133/multiqc_report.html
docs/PXD003133_disable_hoverinfo/multiqc_report.html
docs/PXD053068/multiqc_report.html
docs/PXD053068_disable_hoverinfo/multiqc_report.html
docs/PXD054720/multiqc_report.html
docs/PXD054720_disable_hoverinfo/multiqc_report.html
docs/PXD070239/multiqc_report.html
docs/PXD070239_disable_hoverinfo/multiqc_report.html
docs/ProteoBench/multiqc_report.html
docs/ProteoBench_disable_hoverinfo/multiqc_report.html
docs/README.md
docs/TMT_PXD007683/multiqc_report.html
docs/TMT_PXD007683_disable_hoverinfo/multiqc_report.html
docs/config.json
docs/dia/multiqc_report.html
docs/dia_disable_hoverinfo/multiqc_report.html
docs/update_examples.py
pmultiqc/cli.py
pmultiqc/main.py
pmultiqc/modules/common/plots/dia.py
pmultiqc/modules/common/plots/id.py
pmultiqc/modules/core/core.py
pmultiqc/modules/diann/diann.py
pmultiqc/modules/fragpipe/__init__.py
pmultiqc/modules/fragpipe/fragpipe.py
pmultiqc/modules/fragpipe/fragpipe_io.py
pmultiqc/modules/maxquant/maxquant_utils.py
pmultiqc/modules/mzidentml/mzidentml.py
pmultiqc/modules/quantms/quantms.py
pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (4)
pmultiqc/modules/fragpipe/fragpipe_io.py (1)
pmultiqc/modules/common/logging.py (1)
get_logger(31-86)
pmultiqc/modules/diann/diann.py (2)
pmultiqc/modules/base.py (1)
BasePMultiqcModule(3-20)
pmultiqc/modules/common/plots/id.py (3)
draw_identification(385-517)
draw_summary_protein_ident_table(546-616)
draw_identi_num(619-837)
pmultiqc/modules/fragpipe/__init__.py (1)
pmultiqc/modules/fragpipe/fragpipe.py (1)
FragPipeModule(35-274)
pmultiqc/modules/common/plots/dia.py (1)
pmultiqc/modules/common/plots/general.py (1)
plot_html_check(17-22)
🪛 actionlint (1.7.9)
.github/workflows/python-app.yml
269-269: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 markdownlint-cli2 (0.18.1)
docs/README.md
84-84: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
🪛 Ruff (0.14.10)
pmultiqc/modules/fragpipe/fragpipe_io.py
77-77: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_fragpipe
- GitHub Check: test_proteobench
- GitHub Check: test_mzid_mzML
- GitHub Check: test_tmt
- GitHub Check: test_maxquant
- GitHub Check: test_lfq
- GitHub Check: test_mzid_mgf
- GitHub Check: test_maxquant_dia
- GitHub Check: test_diann
- GitHub Check: test_dia
- GitHub Check: Agent
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (43)
pmultiqc/modules/maxquant/maxquant_utils.py (1)
406-408: Readability improvement via explicit parentheses.
The reformatted expression with parentheses clarifies operator precedence and improves readability. The functionality remains unchanged.
pmultiqc/modules/common/plots/dia.py (2)
224-224: LGTM! Simplified charge-state plot configuration.
The changes simplify the charge-state plot labels and configuration by removing "of Per File" from the title and ID. The addition of `cpswitch_c_active: False` on line 254 explicitly disables CPSwitch active state, which aligns with the simplified view.
Also applies to: 252-255
380-381: LGTM! Consistent HTML validation.
Adding `plot_html_check` to the delta mass plot ensures consistent HTML processing across all plots in this module. This aligns with the pattern used throughout the file (lines 39, 132, 171, 208, 267, 321, 415, 451, 490, 525, 560).
docs/README.md (1)
166-170: Documentation additions look good.
The FragPipe usage example, CLI option, and example link follow the established patterns and are consistent with other data source documentation.
Also applies to: 192-192, 246-246
docs/update_examples.py (1)
164-165: Implementation follows the established pattern.
The FragPipe plugin branch is correctly implemented and consistent with other plugin handlers.
.github/workflows/python-app.yml (1)
263-283: FragPipe test job implementation looks good.
The test job structure is consistent with other test jobs, correctly downloads the FragPipe psm.tsv file, and runs the appropriate MultiQC command.
docs/config.json (1)
167-182: FragPipe project entries are well-structured.
The two project entries (standard and disable_hoverinfo variants) follow the established pattern and are correctly configured with appropriate URLs, paths, and file types.
docs/PXD054720/multiqc_report.html (2)
31-31: Auto-generated content updates are expected.
The timestamps, reportUuid, and aiReportMetadataBase64 are auto-generated when the report is regenerated. These changes are not meaningful for code review.
Also applies to: 44-44, 51-51, 1215-1215
1391-1392: Verify intentional documentation updates for FragPipe support.
The references to DIA-NN and FragPipe in the HTML documentation appear to be intentional updates documenting new pipeline support. Ensure this HTML file is meant to be a version-controlled example/documentation artifact (rather than an accidentally-committed generated report).
README.md (3)
63-64: Documentation addition for FragPipe support is clear and consistent.
The new data source entry follows the existing pattern used for other proteomics tools, specifying `psm.tsv` as the FragPipe report file. This aligns well with the broader FragPipe integration in the codebase.
140-144: Usage example for FragPipe mirrors existing plugin patterns.
The example command is consistent with other plugin documentation (quantms, MaxQuant, DIA-NN, etc.) and clearly shows how to invoke the FragPipe plugin with typical arguments.
166-166: The `--fragpipe-plugin` option is properly registered in `pyproject.toml` (line 80) and matches the documentation.
The configuration correctly maps `fragpipe_plugin = "pmultiqc.cli:fragpipe_plugin"`, confirming that the README documentation is accurate and consistent with the project setup.
docs/PXD054720_disable_hoverinfo/multiqc_report.html (4)
26-31: LGTM - Auto-generated report artifacts.
The updated base64-encoded plot data and new `reportUuid` are expected outputs from regenerating the MultiQC report with FragPipe support.
44-51: LGTM - Updated report metadata.
The updated base64-encoded metadata now includes FragPipe tool information, and the creation timestamp reflects the report regeneration.
1215-1215: LGTM - Consistent timestamp.
The generation date in the report footer is consistent with the `configCreationDate` value.
1391-1392: LGTM - FragPipe added to supported pipelines.
The FragPipe documentation link is properly formatted and consistent with the other tool references in the list.
pmultiqc/modules/fragpipe/fragpipe_io.py (3)
1-8: LGTM!
Imports and logger initialization follow the established pattern used in other modules.
11-17: LGTM!
The required columns definition is clear. The empty `peptide` list appears to be a placeholder for future expansion.
72-84: LGTM!
The `psm_reader` function correctly validates the required "Spectrum" column, derives the "Run" column, and filters to available required columns. The static analysis hint (TRY003) about the exception message is a minor style preference and can be safely ignored here.
pmultiqc/modules/common/plots/id.py (7)
228-235: LGTM!
Description and helptext updated to be more generic, now referencing both MaxQuant's `evidence.txt` and FragPipe's `psm.tsv` as data sources.
385-391: LGTM!
Function renamed from `draw_quantms_identification` to `draw_identification` for broader applicability across different pipeline sources.
546-599: LGTM!
Parameter renamed from `enable_dia` to `use_two_columns`, which better describes its purpose (controlling table layout). The conditional logic is updated consistently.
619-627: LGTM!
Function renamed from `draw_quantms_identi_num` to `draw_identi_num` for consistency with other renames.
829-836: LGTM!
Description and helptext updated to mention FragPipe data source (`psm.tsv`), making it clear this table supports multiple pipeline types.
949-999: LGTM!
Parameter renamed from `enable_mzid` to `is_fragpipe_or_mzid`, which accurately reflects its expanded usage for both mzIdentML and FragPipe contexts.
1044-1048: LGTM!
New FragPipe-specific branch added for retention time plots with appropriate description and helptext referencing the `psm.tsv` source.
pmultiqc/modules/diann/diann.py (4)
8-8: LGTM!
DiannModule now properly inherits from `BasePMultiqcModule`, following the new plugin architecture pattern. The base class initialization is correctly delegated.
Also applies to: 37-42
18-23: LGTM!
Import statements updated to use the renamed functions (`draw_identification`, `draw_identi_num`).
167-181: LGTM!
Function calls updated to use renamed functions and parameters (`use_two_columns`, `draw_identi_num`).
202-206: LGTM!
Call to `draw_identification` (renamed from `draw_quantms_identification`) is correctly updated.
pmultiqc/modules/fragpipe/__init__.py (1)
1-3: LGTM!
Clean package initializer that re-exports `FragPipeModule` following the established pattern used by other modules in the codebase.
pmultiqc/cli.py (1)
79-81: LGTM!
The new `--fragpipe-plugin` CLI option follows the established pattern used by other plugin flags (`--diann-plugin`, `--maxquant-plugin`, etc.).
pyproject.toml (1)
80-80: LGTM!
The `fragpipe_plugin` CLI option is correctly registered in the MultiQC plugin entry points, following the same pattern as other plugin options.
pmultiqc/main.py (1)
89-121: LGTM!
Formatting changes (multi-line wrapping of dict literals) improve readability without changing functionality.
pmultiqc/modules/mzidentml/mzidentml.py (1)
151-157: LGTM! API call updated correctly.
The function call correctly uses the renamed `draw_identification` API with all required parameters (`cal_num_table_data`, `quantms_missed_cleavages`, `quantms_modified`, `msms_identified_rate`) matching the updated function signature.
pmultiqc/modules/core/core.py (1)
13-20: Good extensible plugin registration pattern.
The `PLUGIN_MAP` provides a clean, maintainable way to register plugins with their module and class names. Adding new plugins only requires a single entry here.
pmultiqc/modules/quantms/quantms.py (3)
85-89: LGTM! Clean base class refactor.
The inheritance from `BasePMultiqcModule` and proper `super().__init__()` call correctly aligns with the plugin system refactoring pattern used across other modules.
333-348: API updates consistent with the refactor.
The parameter rename from `enable_dia` to `use_two_columns` and function rename from `draw_quantms_identi_num` to `draw_identi_num` correctly follow the centralized API changes.
458-464: LGTM! Function call updated to match the renamed API.
The call to `draw_identification` correctly replaces `draw_quantms_identification` with all required parameters.
pmultiqc/modules/fragpipe/fragpipe.py (4)
38-46: LGTM! Proper initialization pattern.
The module correctly inherits from `BasePMultiqcModule` and initializes instance variables for data collection.
142-183: LGTM! Robust PSM parsing with proper error handling.
The static method correctly handles missing/empty files with appropriate logging and gracefully skips unreadable data.
186-206: LGTM! Delta mass calculation follows established patterns.
The method correctly concatenates data, handles type conversion with error coercion, drops invalid values, and uses the shared `cal_delta_mass_dict` utility.
105-108: No action required.
The `draw_identi_num` function signature is fully compatible with the fragpipe.py call. The function definition shows that all parameters except `sub_sections` have default values (`enable_exp=False`, `enable_sdrf=False`, `is_multi_conditions=False`, `sample_df=None`, `file_df=None`, `cal_num_table_data=None`), so passing only `sub_sections` and `cal_num_table_data` is safe and intentional.
- `*.mzML` or `*.mgf`: Corresponding spectra files

### 6. [FragPipe](https://fragpipe.nesvilab.org) files:
   - `psm.tsv`: FragPipe report file
Fix indentation to align with other list items.
The list item is indented with 3 spaces instead of 0, which is inconsistent with other data source sections in the file and flagged by markdownlint.
🔎 Proposed fix
-   - `psm.tsv`: FragPipe report file
+- `psm.tsv`: FragPipe report file
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
84-84: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
🤖 Prompt for AI Agents
In docs/README.md around line 84, the list item "`psm.tsv`: FragPipe report
file" is indented with 3 spaces instead of matching the other top-level list
items; remove the leading spaces so the line starts at column 0 to align with
the other list entries and satisfy markdownlint.
if "pmultiqc/tsv" not in config.sp:
    config.update_dict(
        config.sp,
        {"pmultiqc/tsv": {"fn": "*.tsv", "num_lines": 0}},
    )
🧩 Analysis chain
🏁 Script executed:
rg -n "pmultiqc/tsv" --type py
Repository: bigbio/pmultiqc
Length of output: 320
🏁 Script executed:
cat -n pmultiqc/modules/fragpipe/fragpipe_io.py | sed -n '35,60p'
Repository: bigbio/pmultiqc
Length of output: 1043
🏁 Script executed:
cat -n pmultiqc/modules/fragpipe/fragpipe_io.py | sed -n '30,75p'
Repository: bigbio/pmultiqc
Length of output: 1791
🏁 Script executed:
cat -n pmultiqc/main.py | sed -n '80,160p'
Repository: bigbio/pmultiqc
Length of output: 3153
The broad *.tsv pattern will match files intended for other modules, but filtering ensures correct behavior.
The *.tsv pattern will cause MultiQC to scan all TSV files, including experimental_design.tsv, *sdrf.tsv, and *report.tsv which are already matched by more specific patterns (pmultiqc/exp_design, pmultiqc/sdrf, pmultiqc/diann_report_tsv). However, fragpipe_io.get_fragpipe_files() filters results to only process files containing "psm" in the filename, so only FragPipe-specific TSV files are actually used. This is a working design, though it may cause unnecessary file scanning. If efficiency is a concern, consider restricting the pattern to *psm*.tsv or similar to avoid overlapping with other module patterns.
🤖 Prompt for AI Agents
In pmultiqc/main.py around lines 149 to 153, the config entry uses the broad
"*.tsv" pattern which causes MultiQC to scan unrelated TSVs; narrow the glob to
only FragPipe/PSM TSVs (for example "*psm*.tsv" or another pattern that includes
"psm") by updating the value passed to config.update_dict so it no longer uses a
generic "*.tsv" while preserving the existing metadata (num_lines: 0) and key
("pmultiqc/tsv").
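The effect of narrowing the glob can be checked with Python's `fnmatch`, which uses the same shell-style pattern syntax as the `fn` search-pattern key. This is a small sketch with made-up file names, not a test against the real search configuration.

```python
from fnmatch import fnmatch

# Illustrative file names only; the non-FragPipe ones mirror the kinds of
# TSV files mentioned in the comment above.
filenames = [
    "psm.tsv",
    "experimental_design.tsv",
    "sample.sdrf.tsv",
    "diann_report.tsv",
]

broad = [name for name in filenames if fnmatch(name, "*.tsv")]
narrow = [name for name in filenames if fnmatch(name, "*psm*.tsv")]

print(len(broad), narrow)
```

The broad pattern picks up every file in the list, while the narrowed one keeps only the PSM table, so downstream filtering by name has less to discard.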
delta_df = df[["Ms1.Apex.Mz.Delta", "Run"]].copy()
delta_mass = cal_delta_mass_dict(delta_df, "Ms1.Apex.Mz.Delta")
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find cal_delta_mass_dict function definition
rg -n "def cal_delta_mass_dict" --type python
Repository: bigbio/pmultiqc
Length of output: 89
🏁 Script executed:
#!/bin/bash
# Also search for any imports or references to cal_delta_mass_dict
rg -n "cal_delta_mass_dict" --type python -B 2 -A 10
Repository: bigbio/pmultiqc
Length of output: 89
🏁 Script executed:
#!/bin/bash
# Find cal_delta_mass_dict function definition
rg "def cal_delta_mass_dict" -n
Repository: bigbio/pmultiqc
Length of output: 128
🏁 Script executed:
#!/bin/bash
# Find all references to cal_delta_mass_dict
rg "cal_delta_mass_dict" -n -B 2 -A 15
Repository: bigbio/pmultiqc
Length of output: 9219
🏁 Script executed:
#!/bin/bash
# Get the full function definition of cal_delta_mass_dict
sed -n '42,60p' pmultiqc/modules/common/stats.py
Repository: bigbio/pmultiqc
Length of output: 629
Remove the unused "Run" column from delta_df.
The cal_delta_mass_dict function only uses the column specified in its second parameter. Since only "Ms1.Apex.Mz.Delta" is accessed, the "Run" column in delta_df is dead code. Change line 341 to:
delta_df = df[["Ms1.Apex.Mz.Delta"]].copy()
🤖 Prompt for AI Agents
In pmultiqc/modules/common/plots/dia.py around lines 341 to 342, delta_df
currently includes an unused "Run" column which is not used by
cal_delta_mass_dict; update the construction of delta_df to only include the
"Ms1.Apex.Mz.Delta" column (i.e., remove "Run") so the DataFrame copies only the
necessary column and eliminates dead code.
User description
Pull Request
Description
Brief description of the changes made in this PR.
Type of Change
PR Type
Enhancement, Tests, Documentation
Description
- New FragPipe module implementation with complete data parsing and visualization capabilities for PSM (Peptide Spectrum Match) analysis
- Parses PSM data including delta mass, charge states, and retention times; generates plots for mass error, charge-state distribution, identification statistics, and retention time analysis
- Calculates pipeline statistics including protein and peptide counts with modification tracking
- Refactored common plotting functions to support multiple pipelines: renamed `draw_quantms_identification()` to `draw_identification()` and updated parameter names for broader applicability
- Updated existing modules (QuantMS, DIA-NN, mzIdentML) to use renamed functions and inherit from `BasePMultiqcModule` for code reuse
- Refactored core plugin loading system with centralized `PLUGIN_MAP` configuration and loop-based plugin loading for maintainability
- Added FragPipe CLI option (`--fragpipe-plugin`) and TSV file configuration for FragPipe support
- Comprehensive documentation including README updates, usage examples, and example dataset configurations
- CI/CD integration with new FragPipe test job that downloads test data from PRIDE repository and validates plugin functionality
Diagram Walkthrough
File Walkthrough
6 files
fragpipe.py
FragPipe module implementation with data parsing and plotting
pmultiqc/modules/fragpipe/fragpipe.py
capabilities
states, and retention times
identification statistics, and retention time analysis
with modification tracking
fragpipe_io.py
FragPipe file I/O and data reading utilities
pmultiqc/modules/fragpipe/fragpipe_io.py
`get_fragpipe_files()` to locate and load FragPipe TSV files
`psm_reader()` to parse PSM data with column validation and run extraction
__init__.py
FragPipe module package initialization
pmultiqc/modules/fragpipe/__init__.py
`FragPipeModule` class for public API
dia.py
Update DIA charge-state and delta mass plot configurations
pmultiqc/modules/common/plots/dia.py
`cpswitch_c_active` parameter to charge-state plot configuration
values
cli.py
Add FragPipe plugin command-line option
pmultiqc/cli.py
`--fragpipe-plugin` command-line option to enable FragPipe plugin
multiqc_report.html
Update MultiQC report with FragPipe support
docs/PXD054720_disable_hoverinfo/multiqc_report.html
tools
5 files
id.py
Refactor function names and parameters for multi-pipeline support
pmultiqc/modules/common/plots/id.py
`draw_quantms_identification()` to `draw_identification()` for broader applicability
`draw_quantms_identi_num()` to `draw_identi_num()` for consistency
`draw_summary_protein_ident_table()` parameter from `enable_dia` to `use_two_columns`
`draw_num_pep_per_protein()` parameter from `enable_mzid` to `is_fragpipe_or_mzid`
delta mass and retention time
`draw_ids_rt_count()`
quantms.py
Refactor QuantMS module to use base class and updated function names
pmultiqc/modules/quantms/quantms.py
`QuantMSModule` to inherit from `BasePMultiqcModule` for code reuse
`draw_identification`, `draw_identi_num`)
`use_two_columns` instead of `enable_dia`)
diann.py
Refactor DIA-NN module to use base class and updated function names
pmultiqc/modules/diann/diann.py
`DiannModule` to inherit from `BasePMultiqcModule` for code reuse
`draw_identification`, `draw_identi_num`)
`use_two_columns` instead of `enable_dia`)
mzidentml.py
Update mzIdentML module function call
pmultiqc/modules/mzidentml/mzidentml.py
`draw_quantms_identification()` to `draw_identification()`
core.py
Refactor core plugin loading system with centralized configuration
pmultiqc/modules/core/core.py
`PLUGIN_MAP` dictionary to centralize plugin configuration mapping
`load_and_run_plugin()` method for maintainability
`heatmap_color_list` variable scope by assigning to instance variable
1 files
maxquant_utils.py
Code formatting fix for percentage calculation
pmultiqc/modules/maxquant/maxquant_utils.py
peptide per protein analysis
3 files
main.py
Add TSV file configuration for FragPipe support
pmultiqc/main.py
`pmultiqc/tsv` configuration for FragPipe TSV file detection
line breaks
config.json
Add FragPipe example dataset configurations
docs/config.json
PXD070239_disable_hoverinfo)
`psm.tsv` from PRIDE FTP repository
pyproject.toml
Register FragPipe plugin CLI entry point
pyproject.toml
`fragpipe_plugin` entry point for CLI command registration
4 files
update_examples.py
Add FragPipe example report generation
docs/update_examples.py
README.md
Add FragPipe documentation and usage examples
docs/README.md
`psm.tsv` file requirement
`--fragpipe-plugin` command
multiqc_report.html
Update example report with FragPipe documentation
docs/PXD054720/multiqc_report.html
README.md
Document FragPipe support and usage instructions
README.md
`psm.tsv` file
format
`--fragpipe-plugin` to the command-line options table with description
1 files
python-app.yml
Add FragPipe integration test to CI/CD pipeline
.github/workflows/python-app.yml
`test_fragpipe` job to the CI/CD workflow
`--fragpipe-plugin` flag on test data
18 files
Summary by CodeRabbit
Release Notes
New Features
Documentation
Improvements