Expanded grammar and visitor to handle additional bash constructs encountered in a 3145-script corpus (nixpkgs, kubernetes, pi-hole, void-packages):
- Brace groups, select statements, process substitution with whitespace before closing paren
- Suppress synthetic ANTLR error-recovery token text (e.g. <missing 'fi'>) that was polluting round-trip output
- Arithmetic expressions with bitwise ops, special vars, triple-paren edge cases
- Deeply nested command/process substitutions, backtick nesting
- Here-strings, complex redirections, associative arrays
- Various quoting edge cases (dollar-single-quote, regex escapes)

Added 59 regression tests across 15 existing test classes covering all newly supported constructs. 270 tests total, all passing.
Address review feedback on CommandList and Pipeline operator modeling:
- Replace List<Literal> operators in CommandList with typed enum (Operator.AND, Operator.OR) paired with Space via OperatorEntry
- Replace List<Literal> pipeOperators in Pipeline with typed enum (PipeOp.PIPE, PipeOp.PIPE_AND) paired with Space via PipeEntry
- Add Bash.Background wrapper type for & (postfix statement modifier), following the same pattern as Bash.Redirected
- Restructure grammar: move &&/|| into andOr rule, keep ;/& in listSep
- CommandList now exclusively represents &&/|| chains
- ; absorbed into whitespace prefix (not explicitly modeled)
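The operator modeling described above can be illustrated with a minimal, self-contained sketch. It is not the PR's code: `Operator`, `OperatorEntry`, `Space`, and `CommandList` borrow the names from the commit message, but their fields, the `Command` class, and the `print` method are hypothetical simplifications with no OpenRewrite dependency.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for the types named in the commit message.
public class OperatorModelSketch {

    // Typed operator for && / || chains; each constant carries its source text.
    enum Operator {
        AND("&&"), OR("||");

        final String text;

        Operator(String text) { this.text = text; }
    }

    // Whitespace preceding an element (assumed stand-in for OpenRewrite's Space).
    static final class Space {
        final String whitespace;

        Space(String whitespace) { this.whitespace = whitespace; }
    }

    // An operator paired with the whitespace that appears before it.
    static final class OperatorEntry {
        final Space prefix;
        final Operator operator;

        OperatorEntry(Space prefix, Operator operator) {
            this.prefix = prefix;
            this.operator = operator;
        }
    }

    // A command kept as raw text plus its leading whitespace (assumption).
    static final class Command {
        final Space prefix;
        final String text;

        Command(Space prefix, String text) {
            this.prefix = prefix;
            this.text = text;
        }
    }

    // An &&/|| chain: n commands joined by n - 1 operator entries.
    static final class CommandList {
        final List<Command> commands;
        final List<OperatorEntry> operators;

        CommandList(List<Command> commands, List<OperatorEntry> operators) {
            this.commands = commands;
            this.operators = operators;
        }
    }

    // Prints the chain back to source, emitting every stored prefix verbatim.
    static String print(CommandList list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.commands.size(); i++) {
            if (i > 0) {
                OperatorEntry entry = list.operators.get(i - 1);
                sb.append(entry.prefix.whitespace).append(entry.operator.text);
            }
            Command command = list.commands.get(i);
            sb.append(command.prefix.whitespace).append(command.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CommandList list = new CommandList(
                List.of(new Command(new Space(""), "make"),
                        new Command(new Space(" "), "make test"),
                        new Command(new Space("  "), "echo failed")),
                List.of(new OperatorEntry(new Space(" "), Operator.AND),
                        new OperatorEntry(new Space("\t"), Operator.OR)));
        System.out.println(print(list));
    }
}
```

Compared with a list of literals, an enum lets a recipe compare `entry.operator == Operator.AND` instead of matching on strings, while the paired `Space` still keeps the original whitespace for printing.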
Summary
A new bash parser for OpenRewrite, built from scratch using ANTLR4. The parser produces a lossless syntax tree (LST) that preserves all whitespace, comments, and formatting — enabling round-trip parsing where printed output is byte-identical to the original source.
What's included
- Grammar files (BashLexer.g4, BashParser.g4) covering bash syntax: functions, loops, conditionals, case statements, arrays, arithmetic, pipelines, redirections, here-documents, process substitution, command substitution, variable expansion, quoting, and more
- BashParserVisitor that converts ANTLR parse trees into OpenRewrite's LST model
- BashPrinter for lossless printing back to source
- BashVisitor/BashIsoVisitor for recipe authors to traverse and transform bash scripts

Testing strategy
Unit tests (270 tests, 19 classes): Each test class covers a specific language construct (arithmetic, arrays, case statements, command substitution, conditionals, for loops, functions, if statements, pipelines, process substitution, quoting, redirections, subshells, variable expansion, while loops, etc.). Every test verifies lossless round-trip fidelity — the parsed-then-printed output must be byte-identical to the input.
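The round-trip property these tests assert can be sketched in isolation. The following is a toy illustration only, not the PR's parser or test harness: `RoundTripSketch`, `Word`, `Script`, `parse`, `print`, and `roundTrips` are all hypothetical names, and the "parser" merely splits on whitespace. It shows the general idea that attaching every run of whitespace to the element that follows it (plus one trailing slot at end of file) makes byte-identical printing straightforward to check.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy lossless model: every whitespace run is stored as the prefix of the next word.
public class RoundTripSketch {

    static final class Word {
        final String prefix; // whitespace before the word
        final String text;   // the word itself (comments are treated as plain words here)

        Word(String prefix, String text) {
            this.prefix = prefix;
            this.text = text;
        }
    }

    static final class Script {
        final List<Word> words;
        final String eof; // whitespace after the last word

        Script(List<Word> words, String eof) {
            this.words = words;
            this.eof = eof;
        }
    }

    // Splits the source into words without discarding any character.
    static Script parse(String source) {
        List<Word> words = new ArrayList<>();
        int i = 0;
        int n = source.length();
        while (true) {
            int prefixStart = i;
            while (i < n && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            String prefix = source.substring(prefixStart, i);
            if (i == n) {
                return new Script(words, prefix);
            }
            int wordStart = i;
            while (i < n && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            words.add(new Word(prefix, source.substring(wordStart, i)));
        }
    }

    // Prints prefixes and words in order, then the trailing whitespace.
    static String print(Script script) {
        StringBuilder sb = new StringBuilder();
        for (Word word : script.words) {
            sb.append(word.prefix).append(word.text);
        }
        return sb.append(script.eof).toString();
    }

    // The check a round-trip test makes: printed bytes equal the input bytes.
    static boolean roundTrips(String source) {
        byte[] input = source.getBytes(StandardCharsets.UTF_8);
        byte[] output = print(parse(source)).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(input, output);
    }

    public static void main(String[] args) {
        String source = "if true; then\n\techo hi  # done\nfi\n";
        System.out.println(roundTrips(source));
    }
}
```

Comparing bytes rather than normalized strings is what makes the check strict: a dropped trailing newline or a tab converted to spaces would fail it.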
Corpus validation (3,000+ scripts from 10 open source projects): The parser was validated against a diverse corpus of real-world bash scripts. Every script parses and round-trips successfully. The corpus repos:
Test plan