Support new QPX format and fix warnings#13
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip Flake8 can be used to improve the quality of Python code reviews.Flake8 is a Python linter that wraps PyFlakes, pycodestyle and Ned Batchelder's McCabe script. To configure Flake8, add a '.flake8' or 'setup.cfg' file to your project root. See Flake8 Documentation for more details. |
There was a problem hiding this comment.
Pull request overview
This PR updates mokume’s DuckDB/parquet ingestion and filtering to be compatible with the latest QPX schema (including pg_accessions as list<struct>, is_decoy, anchor_protein, and new column names), while refactoring query construction to address Bandit SQL-injection warnings via parameterized execution. It also adds/updates tests to cover compatibility across legacy and new QPX formats.
Changes:
- Extend QPX parsing to normalize
pg_accessions(list<struct>→list<string>), supportanchor_protein, and surfaceis_decoy. - Refactor
SQLFilterBuilder.build_where_clause()to return(clause, params)and update call sites to execute parameterized queries. - Add a new QPX format compatibility test suite and update existing tests for the new filter builder API.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
mokume/io/feature.py |
Adds new/legacy QPX detection and normalization, parameterized filtering, and exposes optional new columns (is_decoy, anchor_protein). |
mokume/quantification/ratio.py |
Updates QPX schema handling and switches to parameterized execution when appending filter clauses. |
mokume/pipeline/stages.py |
Replaces f-string SQL with parameterized execute(sql, params) for filtered parquet view queries. |
mokume/reports/interactive.py |
Replaces HTML f-string generation with string.Template substitution. |
tests/test_qpx_format_compat.py |
Adds a new test suite covering both QPX schemas and deep-compat scenarios. |
tests/test_peptide_normalize.py |
Updates tests to the new (clause, params) API and adjusts assertions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
mokume/io/feature.py
Outdated
| @@ -122,14 +134,17 @@ def __init__( | |||
|
|
|||
| safe_path = database_path.replace("'", "''") | |||
| self.parquet_db.execute( | |||
| "CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan('{}')".format(safe_path) | |||
| "".join(["CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan('", safe_path, "')"]) | |||
| ) | |||
| def get_report_from_database(self, samples: list, columns: list = None): | ||
| """Retrieves a standardized report from the database for specified samples.""" | ||
| cols = ",".join(columns) if columns is not None else "*" | ||
| database = self.parquet_db.sql( | ||
| """SELECT {} FROM parquet_db WHERE sample_accession IN {}""".format( | ||
| cols, tuple(samples) | ||
| ) | ||
| ) | ||
| placeholders = ",".join(["?"] * len(samples)) | ||
| sql = "".join(["SELECT ", cols, " FROM parquet_db WHERE sample_accession IN (", placeholders, ")"]) | ||
| database = self.parquet_db.execute(sql, samples) |
| def get_report_condition_from_database(self, cons: list, columns: list = None) -> pd.DataFrame: | ||
| """Retrieves a standardized report from the database for specified conditions.""" | ||
| cols = ",".join(columns) if columns is not None else "*" | ||
| database = self.parquet_db.sql( | ||
| f"""SELECT {cols} FROM parquet_db WHERE condition IN {tuple(cons)}""" | ||
| ) | ||
| placeholders = ",".join(["?"] * len(cons)) | ||
| sql = "".join(["SELECT ", cols, " FROM parquet_db WHERE condition IN (", placeholders, ")"]) | ||
| database = self.parquet_db.execute(sql, cons) |
mokume/io/feature.py
Outdated
| stat_fn = "median" if (irs_stat or "").lower() == "median" else "avg" | ||
| if stat_fn not in _VALID_STAT_FNS: | ||
| raise ValueError(stat_fn) |
tests/test_peptide_normalize.py
Outdated
| def test_default_where_clause(self): | ||
| """Test that default filter builder generates expected WHERE clause.""" | ||
| builder = SQLFilterBuilder() | ||
| where_clause = builder.build_where_clause() | ||
| where_clause, params = builder.build_where_clause() | ||
|
|
||
| # Should include intensity > 0 | ||
| assert "intensity > 0" in where_clause | ||
| # Should include peptide length filter | ||
| assert 'LENGTH("sequence") >= 7' in where_clause | ||
| if "intensity > 0" not in where_clause: | ||
| raise AssertionError("Missing 'intensity > 0' in where_clause") | ||
| # Should include peptide length filter (parameterized) | ||
| if 'LENGTH("sequence") >= ?' not in where_clause: | ||
| raise AssertionError("Missing LENGTH filter in where_clause") | ||
| if 7 not in params: | ||
| raise AssertionError("Missing 7 in params") |
tests/test_qpx_format_compat.py
Outdated
| try: | ||
| first_acc = df["pg_accessions"].str[0].fillna("") | ||
| result = np.where( | ||
| first_acc.str.contains("|", regex=False), | ||
| first_acc.str.split("|").str[1], | ||
| first_acc, | ||
| ) | ||
| print(f"Parsed protein names: {result}") | ||
| parsed_ok = True | ||
| except Exception as e: | ||
| print(f"FAILED to parse pg_accessions: {e}") | ||
| parsed_ok = False | ||
|
|
||
| if not parsed_ok: | ||
| raise AssertionError("pg_accessions struct parsing failed - needs compatibility fix") | ||
|
|
mokume/quantification/ratio.py
Outdated
| ) | ||
| where_clause = filter_builder.build_where_clause() | ||
| where_clause, where_params = filter_builder.build_where_clause() | ||
|
|
…est asserts, defer is_decoy detection
Support new QPX format and fix Bandit security warnings
Summary
Adapt mokume to the latest QPX parquet schema and fix Bandit warnings across the codebase.
Changes
New QPX format support
pg_accessionsaslist<struct>vialist_transform(x -> x.accession)is_decoy(bool) for optimized decoy filtering inSQLFilterBuilderanchor_proteinfor protein-level grouping inget_low_frequency_peptidesuniqueas bool (previously int),charge/run_file_namecolumn detectiontest_qpx_format_compat.py, 13 tests)Bandit B608 — SQL injection prevention
SQLFilterBuilder.build_where_clause()to return(clause, params)tuple.format()SQL construction with"".join()+ parameterizedexecute(sql, params)interactive.pyfrom f-string tostring.Template