
Support new QPX format and fix warnings #13

Open

Shen-YuFei wants to merge 8 commits into bigbio:dev from Shen-YuFei:dev


@Shen-YuFei
Contributor

Support new QPX format and fix Bandit security warnings

Summary

Adapt mokume to the latest QPX parquet schema and fix Bandit warnings across the codebase.

Changes

New QPX format support

  • Detect and handle pg_accessions as list<struct> via list_transform(x -> x.accession)
  • Support is_decoy (bool) for optimized decoy filtering in SQLFilterBuilder
  • Support anchor_protein for protein-level grouping in get_low_frequency_peptides
  • Handle unique as bool (previously int) and add detection for the charge and run_file_name columns
  • Add comprehensive compatibility test suite (test_qpx_format_compat.py, 13 tests)
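The normalization described above can be sketched in plain Python. This is a minimal illustration, not mokume's code: it assumes new-format pg_accessions elements arrive as dicts with an accession key (the field that list_transform(x -> x.accession) projects) and legacy elements as plain strings; normalize_pg_accessions and normalize_unique are hypothetical names.

```python
from typing import Any, Optional


def normalize_pg_accessions(value: Optional[list]) -> list:
    """Return pg_accessions as a flat list of accession strings.

    Hypothetical helper mirroring list_transform(pg_accessions, x -> x.accession):
    new-format struct elements (assumed to arrive as dicts with an "accession"
    key) are reduced to that field; legacy string elements pass through.
    """
    if value is None:
        return []
    return [item["accession"] if isinstance(item, dict) else item for item in value]


def normalize_unique(value: Any) -> bool:
    """Coerce the `unique` flag to bool; legacy files stored it as an int."""
    return bool(value)


legacy = ["sp|P02768|ALBU_HUMAN"]
new = [{"accession": "sp|P02768|ALBU_HUMAN"}]
print(normalize_pg_accessions(legacy) == normalize_pg_accessions(new))  # True
print(normalize_unique(1), normalize_unique(True))  # True True
```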

Bandit B608 — SQL injection prevention

  • Refactor SQLFilterBuilder.build_where_clause() to return (clause, params) tuple
  • Replace all f-string/.format() SQL construction with "".join() and parameterized execute(sql, params) calls
  • Convert HTML template in interactive.py from f-string to string.Template
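A minimal sketch of the (clause, params) contract, using Python's built-in sqlite3 module as a stand-in for DuckDB because both accept ? placeholders. MiniFilterBuilder is hypothetical and only approximates the default filters checked in tests/test_peptide_normalize.py; the is_decoy condition assumes the boolean column from the new schema. Assembling the SQL skeleton with "".join() does not by itself make interpolated text safe; the protection comes from passing values through params so the driver binds them.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class MiniFilterBuilder:
    """Hypothetical filter builder that returns a (clause, params) tuple."""

    min_peptide_length: int = 7
    remove_decoys: bool = True

    def build_where_clause(self) -> tuple:
        # Fixed SQL fragments only; every tunable value goes into params.
        conditions = ["intensity > 0", 'LENGTH("sequence") >= ?']
        params = [self.min_peptide_length]
        if self.remove_decoys:
            # Assumes the boolean is_decoy column from the new QPX schema.
            conditions.append("is_decoy = ?")
            params.append(False)
        return " AND ".join(conditions), params


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE features ("sequence" TEXT, intensity REAL, is_decoy INTEGER)')
con.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [("PEPTIDEK", 10.0, 0), ("SHORTK", 5.0, 0), ("DECOYPEPK", 3.0, 1), ("ZEROINTK", 0.0, 0)],
)
clause, params = MiniFilterBuilder().build_where_clause()
sql = "".join(['SELECT "sequence" FROM features WHERE ', clause])
print(con.execute(sql, params).fetchall())  # [('PEPTIDEK',)]
```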

Copilot AI review requested due to automatic review settings March 18, 2026 10:36
@coderabbitai

coderabbitai bot commented Mar 18, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f353513-c323-4e08-9617-ebf506525186



Copilot AI left a comment


Pull request overview

This PR updates mokume’s DuckDB/parquet ingestion and filtering to be compatible with the latest QPX schema (including pg_accessions as list<struct>, is_decoy, anchor_protein, and new column names), while refactoring query construction to address Bandit SQL-injection warnings via parameterized execution. It also adds/updates tests to cover compatibility across legacy and new QPX formats.

Changes:

  • Extend QPX parsing to normalize pg_accessions (list<struct> → list<string>), support anchor_protein, and surface is_decoy.
  • Refactor SQLFilterBuilder.build_where_clause() to return (clause, params) and update call sites to execute parameterized queries.
  • Add a new QPX format compatibility test suite and update existing tests for the new filter builder API.
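Format detection can key off the parquet schema rather than the data. The sketch below picks a SELECT expression for pg_accessions from a column-to-type mapping. The DuckDB type strings it expects ('STRUCT(...)[]' for the new layout, 'VARCHAR[]' for the legacy one), the example schemas, and the helper names are assumptions for illustration, not the PR's implementation.

```python
def accession_select_expr(column_types: dict) -> str:
    """Return a SELECT expression that yields pg_accessions as a list of strings.

    `column_types` maps column names to DuckDB type strings (for example from
    DESCRIBE); a struct list is assumed to be reported as 'STRUCT(...)[]'
    and the legacy layout as 'VARCHAR[]'.
    """
    col_type = column_types.get("pg_accessions", "").strip().upper()
    if col_type.startswith("STRUCT(") and col_type.endswith("[]"):
        # New QPX layout: project the accession field out of every struct.
        return "list_transform(pg_accessions, x -> x.accession) AS pg_accessions"
    # Legacy layout (or column absent): use the column as-is.
    return "pg_accessions"


def optional_new_columns(column_types: dict) -> list:
    """Return which optional new-schema columns this file provides."""
    return [name for name in ("is_decoy", "anchor_protein") if name in column_types]


new_schema = {
    "pg_accessions": "STRUCT(accession VARCHAR)[]",
    "is_decoy": "BOOLEAN",
    "anchor_protein": "VARCHAR",
}
legacy_schema = {"pg_accessions": "VARCHAR[]"}
print(accession_select_expr(new_schema))
print(optional_new_columns(new_schema))     # ['is_decoy', 'anchor_protein']
print(optional_new_columns(legacy_schema))  # []
```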

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Summary per file:

  • mokume/io/feature.py: Adds new/legacy QPX detection and normalization, parameterized filtering, and exposes optional new columns (is_decoy, anchor_protein).
  • mokume/quantification/ratio.py: Updates QPX schema handling and switches to parameterized execution when appending filter clauses.
  • mokume/pipeline/stages.py: Replaces f-string SQL with parameterized execute(sql, params) for filtered parquet view queries.
  • mokume/reports/interactive.py: Replaces HTML f-string generation with string.Template substitution.
  • tests/test_qpx_format_compat.py: Adds a new test suite covering both QPX schemas and deep-compat scenarios.
  • tests/test_peptide_normalize.py: Updates tests to the new (clause, params) API and adjusts assertions.
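string.Template uses $name placeholders, so braces in inline CSS or JavaScript need none of the {{ }} doubling an f-string requires, and the template is no longer an f-string (presumably what triggered the B608 report the PR lists it under). A small sketch with a hypothetical page skeleton; the real template in interactive.py is not shown on this page.

```python
from html import escape
from string import Template

# Hypothetical page skeleton; interactive.py's real template is not shown in the PR.
PAGE = Template(
    """<!DOCTYPE html>
<html>
<head>
<title>$title</title>
<style>body { font-family: sans-serif; margin: 0; }</style>
</head>
<body>
<h1>$title</h1>
<div id="report">$body</div>
</body>
</html>"""
)

page = PAGE.substitute(
    # Template performs no HTML escaping, so untrusted text is escaped explicitly.
    title=escape("Ratio report: A & B"),
    body="<p>Plots would be inserted here.</p>",
)
print(page.splitlines()[3])  # <title>Ratio report: A &amp; B</title>
```

substitute() raises KeyError for a missing placeholder, while safe_substitute() leaves it unchanged; a literal dollar sign is written as $$.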


Comment on lines 133 to 138
    @@ -122,14 +134,17 @@ def __init__(
             safe_path = database_path.replace("'", "''")
             self.parquet_db.execute(
    -            "CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan('{}')".format(safe_path)
    +            "".join(["CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan('", safe_path, "')"])
             )
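Here the path is embedded in the CREATE VIEW text rather than bound as a parameter, so safety rests on escaping it as a SQL string literal by doubling single quotes, which is what safe_path does; the "".join() only changes how the string is assembled. A sketch of that escaping with a round-trip check, using sqlite3 as a stand-in engine (DuckDB string literals use the same '' convention); sql_string_literal is a hypothetical helper.

```python
import sqlite3


def sql_string_literal(value: str) -> str:
    """Quote `value` as a SQL string literal by doubling embedded single quotes."""
    return "".join(["'", value.replace("'", "''"), "'"])


path = "/data/o'brien lab/features.parquet"
con = sqlite3.connect(":memory:")
# Round-trip check: the engine parses the literal back to the original text.
(echoed,) = con.execute("".join(["SELECT ", sql_string_literal(path)])).fetchone()
print(echoed == path)  # True
```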
Comment on lines 430 to +435
         def get_report_from_database(self, samples: list, columns: list = None):
             """Retrieves a standardized report from the database for specified samples."""
             cols = ",".join(columns) if columns is not None else "*"
    -        database = self.parquet_db.sql(
    -            """SELECT {} FROM parquet_db WHERE sample_accession IN {}""".format(
    -                cols, tuple(samples)
    -            )
    -        )
    +        placeholders = ",".join(["?"] * len(samples))
    +        sql = "".join(["SELECT ", cols, " FROM parquet_db WHERE sample_accession IN (", placeholders, ")"])
    +        database = self.parquet_db.execute(sql, samples)
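Binding one ? per sample also removes a latent problem with the old tuple(samples) formatting: Python's tuple repr is not SQL syntax, and a value containing an apostrophe is wrapped in double quotes, which SQL reads as an identifier rather than a string. A sketch with sqlite3 as a stand-in; select_in is hypothetical, and returning an empty result for an empty list is an assumption, since the snippet does not show how empty input is handled.

```python
import sqlite3


def select_in(con, column: str, values: list) -> list:
    """Return parquet_db rows whose `column` matches any of `values`.

    Hypothetical helper: `column` is a trusted identifier; every value is bound.
    """
    if not values:
        # Assumption: an empty filter list selects nothing.
        return []
    placeholders = ",".join(["?"] * len(values))
    sql = "".join(["SELECT * FROM parquet_db WHERE ", column, " IN (", placeholders, ")"])
    return con.execute(sql, values).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parquet_db (sample_accession TEXT, intensity REAL)")
con.executemany(
    "INSERT INTO parquet_db VALUES (?, ?)",
    [("S1", 1.0), ("S2", 2.0), ("O'Hara", 3.0)],
)
print(repr(tuple(["O'Hara"])))  # ("O'Hara",)
print(select_in(con, "sample_accession", ["O'Hara"]))  # [("O'Hara", 3.0)]
```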
Comment on lines 519 to +524
         def get_report_condition_from_database(self, cons: list, columns: list = None) -> pd.DataFrame:
             """Retrieves a standardized report from the database for specified conditions."""
             cols = ",".join(columns) if columns is not None else "*"
    -        database = self.parquet_db.sql(
    -            f"""SELECT {cols} FROM parquet_db WHERE condition IN {tuple(cons)}"""
    -        )
    +        placeholders = ",".join(["?"] * len(cons))
    +        sql = "".join(["SELECT ", cols, " FROM parquet_db WHERE condition IN (", placeholders, ")"])
    +        database = self.parquet_db.execute(sql, cons)
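Placeholders can bind values but not identifiers, so the cols string built from columns is still spliced into the SQL text in both methods. If columns can come from callers, one option is to check names against the known schema and double-quote them. A sketch with sqlite3 standing in for DuckDB; quote_identifier, select_list, and the allowlist are hypothetical.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return "".join(['"', name.replace('"', '""'), '"'])


def select_list(columns, allowed: set) -> str:
    """Build the SELECT list; reject names that are not known columns."""
    if columns is None:
        return "*"
    unknown = [c for c in columns if c not in allowed]
    if unknown:
        raise ValueError("".join(["Unknown columns: ", ", ".join(unknown)]))
    return ",".join(quote_identifier(c) for c in columns)


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE parquet_db ("condition" TEXT, intensity REAL)')
con.executemany("INSERT INTO parquet_db VALUES (?, ?)", [("ctrl", 1.0), ("treated", 2.0)])

known = {"condition", "intensity"}
cons = ["treated"]
sql = "".join([
    "SELECT ", select_list(["intensity"], known),
    ' FROM parquet_db WHERE "condition" IN (', ",".join(["?"] * len(cons)), ")",
])
print(con.execute(sql, cons).fetchall())  # [(2.0,)]
```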
Comment on lines +609 to +611
    +        stat_fn = "median" if (irs_stat or "").lower() == "median" else "avg"
    +        if stat_fn not in _VALID_STAT_FNS:
    +            raise ValueError(stat_fn)
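Because stat_fn is chosen by a two-way conditional, only the constants "median" or "avg" can reach the SQL text; whatever the user passes as irs_stat is compared, never interpolated. The membership check is therefore a defensive guard against future edits. The sketch assumes _VALID_STAT_FNS holds exactly those two names (its definition is not shown), and irs_reference_sql with its table and column names is hypothetical.

```python
from typing import Optional

# Assumed contents; the PR references _VALID_STAT_FNS without showing it.
_VALID_STAT_FNS = frozenset({"median", "avg"})


def irs_stat_fn(irs_stat: Optional[str]) -> str:
    """Map the user option to a SQL aggregate name, as in the snippet above."""
    stat_fn = "median" if (irs_stat or "").lower() == "median" else "avg"
    if stat_fn not in _VALID_STAT_FNS:  # unreachable today; guards future edits
        raise ValueError(stat_fn)
    return stat_fn


def irs_reference_sql(stat_fn: str) -> str:
    """Hypothetical query shape: only the vetted constant is joined into the text."""
    return "".join(["SELECT protein, ", stat_fn, "(intensity) AS ref FROM ref_channels GROUP BY protein"])


print(irs_stat_fn("Median"))                  # median
print(irs_stat_fn(None))                      # avg
print(irs_stat_fn("avg); DROP TABLE x; --"))  # avg
```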
Comment on lines +20 to +32
         def test_default_where_clause(self):
             """Test that default filter builder generates expected WHERE clause."""
             builder = SQLFilterBuilder()
    -        where_clause = builder.build_where_clause()
    +        where_clause, params = builder.build_where_clause()

             # Should include intensity > 0
    -        assert "intensity > 0" in where_clause
    -        # Should include peptide length filter
    -        assert 'LENGTH("sequence") >= 7' in where_clause
    +        if "intensity > 0" not in where_clause:
    +            raise AssertionError("Missing 'intensity > 0' in where_clause")
    +        # Should include peptide length filter (parameterized)
    +        if 'LENGTH("sequence") >= ?' not in where_clause:
    +            raise AssertionError("Missing LENGTH filter in where_clause")
    +        if 7 not in params:
    +            raise AssertionError("Missing 7 in params")
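Replacing bare assert with an explicit raise AssertionError matches Bandit's B101 (assert_used) check, which flags assert because assertions are stripped when Python runs with -O. If the pattern repeats across many tests, a small helper keeps them readable; require and check_default_where_clause are hypothetical names, not part of mokume or pytest. Another common option is to skip B101 for test directories in the Bandit configuration.

```python
def require(condition: bool, message: str) -> None:
    """Raise AssertionError when `condition` is false; unaffected by `python -O`."""
    if not condition:
        raise AssertionError(message)


def check_default_where_clause(where_clause: str, params: list) -> None:
    """Hypothetical restatement of the checks in test_default_where_clause."""
    require("intensity > 0" in where_clause, "Missing 'intensity > 0' in where_clause")
    require('LENGTH("sequence") >= ?' in where_clause, "Missing LENGTH filter in where_clause")
    require(7 in params, "Missing 7 in params")


# Passes silently for a clause shaped like the one the test expects.
check_default_where_clause('intensity > 0 AND LENGTH("sequence") >= ?', [7])
```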
Comment on lines +231 to +246
    +        try:
    +            first_acc = df["pg_accessions"].str[0].fillna("")
    +            result = np.where(
    +                first_acc.str.contains("|", regex=False),
    +                first_acc.str.split("|").str[1],
    +                first_acc,
    +            )
    +            print(f"Parsed protein names: {result}")
    +            parsed_ok = True
    +        except Exception as e:
    +            print(f"FAILED to parse pg_accessions: {e}")
    +            parsed_ok = False
    +
    +        if not parsed_ok:
    +            raise AssertionError("pg_accessions struct parsing failed - needs compatibility fix")
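The test takes the first accession and keeps the second |-separated field, which in UniProt-style strings such as sp|P02768|ALBU_HUMAN is the accession P02768. With the list<struct> layout, .str[0] yields a struct rather than a string, so the string operations that follow are no longer working on text. A pure-Python sketch that handles both layouts; first_protein_name is hypothetical, and it assumes struct elements arrive as dicts with an accession key.

```python
from typing import Optional, Sequence


def first_protein_name(pg_accessions: Optional[Sequence]) -> str:
    """Return the second |-separated field of the first accession, or ''.

    Hypothetical helper; struct elements are assumed to be dicts with an
    "accession" key, legacy elements plain strings.
    """
    if pg_accessions is None or len(pg_accessions) == 0:
        return ""
    first = pg_accessions[0]
    if isinstance(first, dict):
        first = first.get("accession") or ""
    # "sp|P02768|ALBU_HUMAN" -> "P02768"; IDs without "|" pass through unchanged.
    return first.split("|")[1] if "|" in first else first


print(first_protein_name(["sp|P02768|ALBU_HUMAN"]))                 # P02768
print(first_protein_name([{"accession": "sp|P02768|ALBU_HUMAN"}]))  # P02768
```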

Comment on lines 304 to 311
             )
    -        where_clause = filter_builder.build_where_clause()
    +        where_clause, where_params = filter_builder.build_where_clause()
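When the returned clause is appended to a query that already has its own placeholders, the parameter list must follow the order in which the ? markers appear in the final SQL text. A sketch with sqlite3 as a stand-in; the query shape and run_filtered are hypothetical and do not reproduce ratio.py's actual query.

```python
import sqlite3


def run_filtered(con, where_clause: str, where_params: list, run_name: str) -> list:
    """Append a (clause, params) filter to a base query that has its own placeholder.

    Hypothetical query shape; only the parameter-ordering rule is the point.
    """
    sql = "SELECT peptide, intensity FROM features WHERE run = ?"
    params = [run_name]
    if where_clause:
        # Parenthesize so an OR inside the filter cannot escape the AND.
        sql = "".join([sql, " AND (", where_clause, ")"])
        # Values must follow the order in which their ? placeholders appear.
        params = params + list(where_params)
    return con.execute(sql, params).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE features (peptide TEXT, intensity REAL, run TEXT)")
con.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [("PEPTIDEK", 5.0, "r1"), ("PEPTIDEK", 0.5, "r1"), ("OTHERPEPK", 9.0, "r2")],
)
print(run_filtered(con, "intensity > ?", [1.0], "r1"))  # [('PEPTIDEK', 5.0)]
```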


Labels

None yet

Projects

None yet


2 participants