Skip to content

fix(queries): eliminate double-escaping in regex patterns #541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

0xbrayo
Copy link
Member

@0xbrayo 0xbrayo commented Aug 9, 2025

The serialize_classes function was using serde_json serialization which caused regex patterns to be double-escaped (e.g., t.co became t\.co, (noticing markdown also does escaping so viewing the raw text is best) breaking pattern matching in ActivityWatch queries.

Core changes:

  • Rewrite serialize_classes() to build JSON strings manually instead of using serde_json
  • Preserve single-escaped regex patterns for proper matching
  • Only include 'ignore_case' field when true (omit when false)
  • Only include 'regex' field for non-'none' type categories
  • Improve error handling in classes deserialization
  • Add optional fields to ClassSetting struct for better compatibility This fixes the core regex pattern matching issues in ActivityWatch canonical queries that were causing incorrect categorization results.

Important

Fixes double-escaping of regex patterns in ActivityWatch queries by manually constructing JSON strings in serialize_classes() and improves error handling in get_classes_from_server().

  • Behavior:
    • serialize_classes() in queries.rs now manually constructs JSON strings to avoid double-escaping regex patterns.
    • Preserves single-escaped regex patterns for accurate matching.
    • Omits 'ignore_case' field when false and 'regex' field for 'none' type categories.
  • Error Handling:
    • Improved error handling in get_classes_from_server() in classes.rs.
  • Models:
    • Added optional fields id and data to ClassSetting struct in classes.rs for better compatibility.

This description was created by Ellipsis for b705f04. You can customize this summary. It will automatically update as commits are pushed.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to f38457f in 1 minute and 51 seconds. Click for details.
  • Reviewed 208 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. aw-client-rust/src/classes.rs:182
  • Draft comment:
    Good improvement on error handling when deserializing classes settings. Logging the error detail helps troubleshooting.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. aw-client-rust/src/queries.rs:168
  • Draft comment:
    Manual JSON construction in serialize_classes may be fragile if any field contains quotes. Consider reusing the escape_doublequote helper to safely escape fields like the regex pattern and category strings.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment raises a valid security/correctness concern. If category names or regex patterns contain quotes, the manual JSON construction could break. However, looking at the code more carefully, the format!() with "{}" will automatically escape quotes in strings. The comment author may have missed this. Additionally, the function's comment explicitly states this is intentional to avoid double-escaping. I could be wrong about format!() handling quote escaping properly in all cases. There might be edge cases I'm not considering. The code's explicit comment about avoiding double-escaping suggests this was a deliberate design choice. The test cases show it working correctly with complex strings. The comment should be deleted. The current implementation appears intentional and correct, with format!() providing necessary escaping. The comment misunderstands the purpose of the manual JSON construction.
3. aw-client-rust/src/queries.rs:231
  • Draft comment:
    The new string literal that spans two lines does not indent the second line (it now starts with not_afk without preceding whitespace). This may cause an unintended change in the query string formatting. If a leading whitespace was intended, please add it.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment suggests this might be an unintended change, but looking at the full context shows this is actually an intentional formatting change applied consistently throughout the file. The query strings are being built programmatically and the indentation doesn't affect functionality. The author appears to have deliberately chosen to align all continuation lines at column 0. I could be wrong about the intentionality - maybe some of these formatting changes were accidental. However, the consistency of the changes across multiple locations suggests otherwise. The systematic nature of the changes, affecting multiple similar strings in the same way, strongly indicates this was an intentional reformatting rather than a mistake that needs to be fixed. This comment should be deleted because it's pointing out what appears to be an intentional formatting change that was consistently applied across the file.

Workflow ID: wflow_COtcfsVZJSCHbmxI

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@@ -123,6 +123,12 @@ impl AwClient {
.map(|(start, stop)| format!("{}/{}", start, stop))
.collect();

let query_lines: Vec<&str> = query.split('\n').collect();
let payload = json!({
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable 'payload' is computed but never used; consider removing it or using it in the request for clarity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these were for debugging nice catch

}

// Only include ignore_case when true
if category_spec.ignore_case {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new logic omits the 'ignore_case' field when false per PR description, but the test (test_classes_serialization) expects it to be present. Update the test to align with the intended behavior.

@0xbrayo 0xbrayo marked this pull request as draft August 9, 2025 15:52
@0xbrayo
Copy link
Member Author

0xbrayo commented Aug 9, 2025

@ellipsis-dev implement both suggestions

@0xbrayo
Copy link
Member Author

0xbrayo commented Aug 9, 2025

@ellipsis-dev fix the failing test to align with intended behavior

Copy link

ellipsis-dev bot commented Aug 9, 2025

Hi @0xbrayo, I attempted to address the failing test but couldn't locate the specific test that needs adjustment. Could you please provide more details about the failing test, such as its name, location, and the error message it produces? This information will help me align the test with the intended behavior. Thank you!

@0xbrayo
Copy link
Member Author

0xbrayo commented Aug 9, 2025

@ellipsis-dev The new logic omits the 'ignore_case' field when false per PR description, but the test (test_classes_serialization) expects it to be present. Update the test to align with the intended behavior.

Copy link

ellipsis-dev bot commented Aug 9, 2025

Hi @0xbrayo, I attempted to locate the 'test_classes_serialization' test to update it according to the new logic, but I couldn't find it in the codebase. Could you please clarify where this test is located? If it doesn't exist yet, would you like me to create a new test in classes.rs to verify the serialization behavior, particularly the omission of the 'ignore_case' field when it's false? Thank you!

@0xbrayo
Copy link
Member Author

0xbrayo commented Aug 9, 2025

nevermind. I'll just do it.

@0xbrayo 0xbrayo force-pushed the queries-fix branch 2 times, most recently from d23b8e3 to ae9209a Compare August 9, 2025 17:56
Copy link

codecov bot commented Aug 9, 2025

Codecov Report

❌ Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.86%. Comparing base (656f3c9) to head (b705f04).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
aw-client-rust/src/classes.rs 0.00% 5 Missing ⚠️
aw-client-rust/src/queries.rs 94.11% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #541      +/-   ##
==========================================
- Coverage   70.81%   69.86%   -0.95%     
==========================================
  Files          51       54       +3     
  Lines        2916     3166     +250     
==========================================
+ Hits         2065     2212     +147     
- Misses        851      954     +103     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The serialize_classes function was using serde_json serialization which
caused regex patterns to be double-escaped (e.g., 't\.co' became 't\.co'),
breaking pattern matching in ActivityWatch queries.

Core changes:
- Rewrite serialize_classes() to build JSON strings manually instead of using serde_json
- Preserve single-escaped regex patterns for proper matching
- Only include 'ignore_case' field when true (omit when false)
- Only include 'regex' field for non-'none' type categories
- Improve error handling in classes deserialization
- Add optional fields to ClassSetting struct for better compatibility
This fixes the core regex pattern matching issues in ActivityWatch
canonical queries that were causing incorrect categorization results.
@0xbrayo 0xbrayo marked this pull request as ready for review August 12, 2025 08:43
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to b705f04 in 2 minutes and 13 seconds. Click for details.
  • Reviewed 218 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. aw-client-rust/src/classes.rs:18
  • Draft comment:
    Good use of #[serde(default)] on the 'regex' field to handle missing data gracefully.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. aw-client-rust/src/lib.rs:122
  • Draft comment:
    The variable 'query_lines' is computed but never used; consider removing it to avoid unused code warnings.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. aw-client-rust/src/queries.rs:169
  • Draft comment:
    Manual JSON assembly in serialize_classes avoids double-escaping but be cautious—ensure consistent formatting and proper escaping for all edge cases.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 20% <= threshold 50% The comment is suggesting caution regarding JSON assembly, which is a valid concern. However, it does not provide a specific suggestion or point out a specific issue in the code. It is more of a general warning rather than a constructive comment. According to the rules, comments should be specific and actionable, and this one is not. Therefore, it should be removed.
4. aw-client-rust/src/queries.rs:216
  • Draft comment:
    Mixing embedded semicolons in query fragments (e.g. in the 'not_afk' block) with join using ";\n" may lead to redundant semicolons; consider normalizing statement termination.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. aw-client-rust/src/queries.rs:242
  • Draft comment:
    Appending a semicolon in the 'categorize' query string may result in extra semicolons when joining statements; verify that the query syntax tolerates this redundancy.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The semicolon is actually correct here. The query.join(";\n") on line 256 joins multiple statements, each needing a semicolon terminator. Having a semicolon in the format string ensures consistent statement termination. The comment's concern about redundant semicolons is unfounded since the join adds newlines between statements. Could there be edge cases where the query language is sensitive to multiple semicolons that I haven't considered? The code's test cases pass with the semicolon present, and the pattern of semicolon-terminated statements is used consistently throughout the codebase. SQL-like query languages typically handle multiple semicolons gracefully. The comment should be deleted. The semicolon is intentional and correct as part of the consistent statement termination pattern used throughout the query building.
6. aw-client-rust/src/queries.rs:291
  • Draft comment:
    In build_browser_events, the initial string already ends with a semicolon; ensure consistent semicolon usage when these parts are later concatenated.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_mW2ufx5lPCQwOJFw

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant