Skip to content

Conversation

kieran-ryan
Copy link
Member

@kieran-ryan kieran-ryan commented Jan 4, 2025

🤔 What's changed?

  • Updated parser generation code with Python formatting as per style guidelines
  • Invoked no-operation Token.detach method in generated parser code
  • Migrated Python formatter from black to ruff
  • Applied refactorings to generated parser code
    • Reorganised imports - I001 (ruff check gherkin/parser.py --select=I001)
    • Dropped or False logic from conditionals - RET505 (ruff check gherkin/parser.py --select=RET505)
    • Replace series of or statements with any call
    • Drop else statement and nesting proceeding by conditionals that would return from the enclosing function

⚡️ What's your motivation?

  • Ensures entire Python implementation is formatted consistently - building on Apply black code formatter to the python codebase #286
    • Improves code review as pull requests less-susceptible to stylistic changes
    • Simplifies configuration - removes exclusion of incompliant modules
    • Improves readability
      • Bulk of change reduces mix of 4 and 8 space indentation within match_token_at_'x' methods to just 4
        • Was difficult to read and was most-likely a workaround in the generator code to satisfy generating code for two cases: to run inside conditionals (with 4 spaces) and outside conditionals (8 spaces) without having to handle within the generator; a SyntaxError would be raised if standardised to 4 spaces without handling (as per this pull request)
  • Migrating formatting to ruff has a number of advantages - including performance, and enabling standardising to a single toolset for linting and formatting
  • General refactorings to improve code quality and simplify code generation
    • or False is redundant - assume added as a safeguard where the proceeding args are not present (defaults to just ‘False’ in that case) pots generation - however the unit tests protect against this, failing if the args aren’t generated
    • Easier to generate any calls against a comma-separate iterable compared to a series of or statements where it can't be applied on every line
    • superflous else is redundant (superflous-else-return - RET505)
    • Generated parser code currently contains a useless-expression (B018) token.detach - updated code invokes appropriately though this doesn't actually alter behaviour (being a no-operation method) though conforms to intention and other parser implementation

🏷️ What kind of change is this?

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)

♻️ Anything particular you want feedback on?

  • Thoughts whether a CHANGELOG entry is warranted?

  • To evaluate changes are formatted correctly:

    Open a codespace (or locally alternatively):

    Open in GitHub Codespaces

    Regenerate all parser implementations:

    docker build --tag berp-env .
    docker run --rm --interactive --tty --volume ".:/app" berp-env
    make clean-generate
    make generate

    From a separate terminal window, install the pre-commit hook and validate all Python code is formatted (including the generated parser code):

    pip install pre-commit
    pre-commit install
    pre-commit run --all-files ruff-format

    Run Python unit and acceptance tests to validate no regression in the generated parser implementation:

    pip install pytest
    pytest
    cd python/
    make acceptance
  • Have skipped formatting on state_comment string

    • Challenging to generate formatted Python multiline string from C# - consider exclusion pragmatic
  • Intend to update GitHub Actions workflow in a follow up pull request to validate, or fully assess use of pre-commit-ci with this polyglot repo

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

@kieran-ryan kieran-ryan added the 🏦 debt Tech debt label Jan 4, 2025
@kieran-ryan kieran-ryan self-assigned this Jan 4, 2025
@kieran-ryan kieran-ryan force-pushed the debt/format-py-parser branch from c50567f to ab8bc7c Compare January 4, 2025 23:32
@kieran-ryan kieran-ryan marked this pull request as ready for review January 4, 2025 23:37
@kieran-ryan kieran-ryan requested review from youtux and jsa34 January 4, 2025 23:37
@youtux
Copy link
Contributor

youtux commented Jan 5, 2025

I’m 👎 for this change:

self.ast_builder = ast_builder if ast_builder is not None else AstBuilder() -> self.ast_builder = ast_builder or AstBuilder()

the reason is that you should be able to subclass AstParser and override __bool__ to return False, and the above code should still use your object.

- Removes configuring exclusion of any Python modules from formatting
- Ensure entire library is formatted correctly
- Drop redundant `or False` logic from conditionals
- Drop redundant `else` conditionals after `return` within `if` clause
- Skipped formatting on `state_comment` string
  - Challenging to generate formatted Python multiline string from C#
- Faster tooling
- Better handling with linting rule conflicts
- Enables extending to hundreds of linting rules
- Aligns with pytest-bdd (pytest-dev/pytest-bdd#758)
@kieran-ryan kieran-ryan force-pushed the debt/format-py-parser branch from ab8bc7c to 52864a5 Compare January 5, 2025 22:53
@kieran-ryan
Copy link
Member Author

kieran-ryan commented Jan 5, 2025

I’m 👎 for this change:

self.ast_builder = ast_builder if ast_builder is not None else AstBuilder() -> self.ast_builder = ast_builder or AstBuilder()

the reason is that you should be able to subclass AstParser and override __bool__ to return False, and the above code should still use your object.

Makes sense - applied! Had been unsure on the same for a somewhat related reason - being too implicit by assessing "truthiness" rather than the explicit type declarations - which could become a consideration with changes down the line should users begin passing additional values. Accustomed to using in a local context without this consideration. Interesting, thank you.

@mpkorstanje mpkorstanje requested a review from youtux January 12, 2025 19:07
@luke-hill
Copy link
Contributor

Given this is a big refactor how far down the line is it. Is it close to being merge-worthy? @kieran-ryan as we're doing a lot of other big changes in other flavours so would be good to get this in if we can

@mpkorstanje mpkorstanje force-pushed the debt/format-py-parser branch 2 times, most recently from 8ae3c0d to c026fe6 Compare June 30, 2025 10:54
@mpkorstanje mpkorstanje force-pushed the debt/format-py-parser branch from c026fe6 to 370f944 Compare June 30, 2025 11:11
@mpkorstanje mpkorstanje merged commit c1911b0 into main Jun 30, 2025
6 checks passed
@mpkorstanje mpkorstanje deleted the debt/format-py-parser branch June 30, 2025 11:15
@mpkorstanje
Copy link
Contributor

I've fixed the remarks around not exiting from the loop early. The implementation is a bit awkward as python doesn't have a do-while construct, but this should work and seems to satisfy ruff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏦 debt Tech debt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants