| 1 | +# HCL2 Parser — CLAUDE.md |
| 2 | + |
| 3 | +## Pipeline |
| 4 | + |
| 5 | +``` |
| 6 | +Forward: HCL2 Text → Lark Parse Tree → LarkElement Tree → Python Dict/JSON |
| 7 | +Reverse: Python Dict/JSON → LarkElement Tree → Lark Tree → HCL2 Text |
| 8 | +``` |
| 9 | + |
| 10 | +## Module Map |
| 11 | + |
| 12 | +| Module | Role | |
| 13 | +|---|---| |
| 14 | +| `hcl2/hcl2.lark` | Lark grammar definition | |
| 15 | +| `hcl2/api.py` | Public API (`load/loads/dump/dumps` + intermediate stages) | |
| 16 | +| `hcl2/parser.py` | Lark parser factory with caching | |
| 17 | +| `hcl2/transformer.py` | Lark parse tree → LarkElement tree | |
| 18 | +| `hcl2/deserializer.py` | Python dict → LarkElement tree | |
| 19 | +| `hcl2/formatter.py` | Whitespace alignment and spacing on LarkElement trees | |
| 20 | +| `hcl2/reconstructor.py` | LarkElement tree → HCL2 text via Lark | |
| 21 | +| `hcl2/builder.py` | Programmatic HCL document construction | |
| 22 | +| `hcl2/utils.py` | `SerializationOptions`, `SerializationContext`, string helpers | |
| 23 | +| `hcl2/const.py` | Constants: `IS_BLOCK`, `COMMENTS_KEY`, `INLINE_COMMENTS_KEY` | |
| 24 | +| `cli/helpers.py` | File/directory/stdin conversion helpers | |
| 25 | +| `cli/hcl_to_json.py` | `hcl2tojson` entry point | |
| 26 | +| `cli/json_to_hcl.py` | `jsontohcl2` entry point | |
| 27 | + |
| 28 | +`hcl2/__main__.py` is a thin wrapper that imports `cli.hcl_to_json:main`. |
| 29 | + |
| 30 | +### Rules (one class per grammar rule) |
| 31 | + |
| 32 | +| File | Domain | |
| 33 | +|---|---| |
| 34 | +| `rules/abstract.py` | `LarkElement`, `LarkRule`, `LarkToken` base classes | |
| 35 | +| `rules/tokens.py` | `StringToken` (cached factory), `StaticStringToken`, punctuation constants | |
| 36 | +| `rules/base.py` | `StartRule`, `BodyRule`, `BlockRule`, `AttributeRule` | |
| 37 | +| `rules/containers.py` | `TupleRule`, `ObjectRule`, `ObjectElemRule`, `ObjectElemKeyRule` | |
| 38 | +| `rules/expressions.py` | `ExprTermRule`, `BinaryOpRule`, `UnaryOpRule`, `ConditionalRule` | |
| 39 | +| `rules/literal_rules.py` | `IntLitRule`, `FloatLitRule`, `IdentifierRule`, `KeywordRule` | |
| 40 | +| `rules/strings.py` | `StringRule`, `InterpolationRule`, `HeredocTemplateRule` | |
| 41 | +| `rules/functions.py` | `FunctionCallRule`, `ArgumentsRule` | |
| 42 | +| `rules/indexing.py` | `GetAttrRule`, `SqbIndexRule`, splat rules | |
| 43 | +| `rules/for_expressions.py` | `ForTupleExprRule`, `ForObjectExprRule`, `ForIntroRule`, `ForCondRule` | |
| 44 | +| `rules/whitespace.py` | `NewLineOrCommentRule`, `InlineCommentMixIn` | |
| 45 | + |
| 46 | +## Public API (`api.py`) |
| 47 | + |
| 48 | +Follows the `json` module convention. All option parameters are keyword-only. |
| 49 | + |
| 50 | +- `load/loads` — HCL2 text → Python dict |
| 51 | +- `dump/dumps` — Python dict → HCL2 text |
| 52 | +- Intermediate stages: `parse/parses`, `parse_to_tree/parses_to_tree`, `transform`, `serialize`, `from_dict`, `from_json`, `reconstruct` |
| 53 | + |
| 54 | +### Option Dataclasses |
| 55 | + |
| 56 | +**`SerializationOptions`** (LarkElement → dict): |
| 57 | +`with_comments`, `with_meta`, `wrap_objects`, `wrap_tuples`, `explicit_blocks`, `preserve_heredocs`, `force_operation_parentheses`, `preserve_scientific_notation` |
| 58 | + |
| 59 | +**`DeserializerOptions`** (dict → LarkElement): |
| 60 | +`heredocs_to_strings`, `strings_to_heredocs`, `object_elements_colon`, `object_elements_trailing_comma` |
| 61 | + |
| 62 | +**`FormatterOptions`** (whitespace/alignment): |
| 63 | +`indent_length`, `open_empty_blocks`, `open_empty_objects`, `open_empty_tuples`, `vertically_align_attributes`, `vertically_align_object_elements` |
| 64 | + |
| 65 | +## CLI |
| 66 | + |
| 67 | +Console scripts defined in `pyproject.toml`. Each uses argparse flags that map directly to the option dataclass fields above. |
| 68 | + |
| 69 | +``` |
| 70 | +hcl2tojson --json-indent 2 --with-meta file.tf |
| 71 | +jsontohcl2 --indent 4 --no-align file.json |
| 72 | +``` |
| 73 | + |
| 74 | +Add new options as `parser.add_argument()` calls in the relevant entry point module. |
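As a rough illustration of how a flag might map onto option fields, the sketch below wires `--indent` and `--no-align` to a stand-in dataclass. `FormatterOptionsSketch`, its defaults, `build_parser`, and `options_from_args` are assumptions made for this example and are not the project's actual code; only the flag names and field names come from the sections above.

```python
import argparse
from dataclasses import dataclass


@dataclass
class FormatterOptionsSketch:
    """Hypothetical stand-in for FormatterOptions; the defaults are assumed."""

    indent_length: int = 2
    vertically_align_attributes: bool = True
    vertically_align_object_elements: bool = True


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags mirror the option fields (illustrative only)."""
    parser = argparse.ArgumentParser(prog="jsontohcl2")
    parser.add_argument("path", help="input JSON file")
    # dest= lets the flag name differ from the dataclass field name.
    parser.add_argument("--indent", type=int, default=2, dest="indent_length")
    # A single negative flag switches off both alignment fields in this sketch.
    parser.add_argument("--no-align", action="store_true")
    return parser


def options_from_args(args: argparse.Namespace) -> FormatterOptionsSketch:
    """Translate parsed CLI arguments into the options object."""
    align = not args.no_align
    return FormatterOptionsSketch(
        indent_length=args.indent_length,
        vertically_align_attributes=align,
        vertically_align_object_elements=align,
    )


args = build_parser().parse_args(["--indent", "4", "--no-align", "file.json"])
options = options_from_args(args)
```

Keeping the translation in one small function means a new flag needs one `add_argument()` call and one extra keyword in the constructor call.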
| 75 | + |
| 76 | +## Hard Rules |
| 77 | + |
| 78 | +These are project-specific constraints that must not be violated: |
| 79 | + |
| 80 | +1. **Always use the LarkElement IR.** Never transform directly from Lark parse tree to Python dict or vice versa. |
| 81 | +1. **Block vs object distinction.** Use `__is_block__` markers (`const.IS_BLOCK`) to preserve semantic intent during round-trips. The deserializer must distinguish blocks from regular objects. |
| 82 | +1. **Bidirectional completeness.** Every serialization path must have a corresponding deserialization path. Test round-trip integrity: in Parse → Serialize → Deserialize → Serialize, the second Serialize must reproduce the output of the first. |
| 83 | +1. **One grammar rule = one `LarkRule` class.** Each class implements `lark_name()`, typed property accessors, `serialize()`, and declares `_children_layout: Tuple[...]` (annotation only, no assignment) to document child structure. |
| 84 | +1. **Token caching.** Use the `StringToken` factory in `rules/tokens.py` — never create token instances directly. |
| 85 | +1. **Interpolation context.** `${...}` generation depends on nesting depth — always pass and respect `SerializationContext`. |
| 86 | +1. **Update both directions.** When adding a language feature, update `transformer.py`, `deserializer.py`, `formatter.py`, and `reconstructor.py`. |
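The "one grammar rule = one `LarkRule` class" shape can be sketched roughly as follows. This is a simplified illustration, not the project's code: `LarkRuleSketch`, `AttributeRuleSketch`, the child ordering, and the argument-free `serialize()` are all assumptions (the real method presumably also receives options and a `SerializationContext`).

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class LarkRuleSketch(ABC):
    """Hypothetical, minimal stand-in for the LarkRule base class."""

    def __init__(self, children: List[Any]):
        self._children = children

    @staticmethod
    @abstractmethod
    def lark_name() -> str:
        """Name of the grammar rule this class represents."""

    @abstractmethod
    def serialize(self) -> Any:
        """Convert this node into plain Python data."""


class AttributeRuleSketch(LarkRuleSketch):
    # Annotation only, no assignment: documents the child structure
    # without creating a class attribute. The layout here is assumed.
    _children_layout: Tuple[str, str, Any]

    @staticmethod
    def lark_name() -> str:
        return "attribute"

    @property
    def name(self) -> str:
        """Typed accessor for the attribute name (first child)."""
        return self._children[0]

    @property
    def value(self) -> Any:
        """Typed accessor for the attribute value (third child, after '=')."""
        return self._children[2]

    def serialize(self) -> Any:
        return {self.name: self.value}


rule = AttributeRuleSketch(["region", "=", "us-east-1"])
serialized = rule.serialize()
```

Because `_children_layout` is only annotated, it appears in `__annotations__` for readers and type checkers but never exists as a runtime attribute.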
| 87 | + |
| 88 | +## Adding a New Language Construct |
| 89 | + |
| 90 | +1. Add grammar rules to `hcl2.lark` |
| 91 | +1. Create rule class(es) in the appropriate `rules/` file |
| 92 | +1. Add transformer method(s) in `transformer.py` |
| 93 | +1. Implement `serialize()` in the rule class |
| 94 | +1. Update `deserializer.py`, `formatter.py` and `reconstructor.py` for round-trip support |
| 95 | + |
| 96 | +## Testing |
| 97 | + |
| 98 | +Framework: `unittest.TestCase` (not pytest). |
| 99 | + |
| 100 | +``` |
| 101 | +python -m unittest discover -s test -p "test_*.py" -v |
| 102 | +``` |
| 103 | + |
| 104 | +**Unit tests** (`test/unit/`): instantiate rule objects directly (no parsing). |
| 105 | + |
| 106 | +- `test/unit/rules/` — one file per rules module |
| 107 | +- `test/unit/cli/` — one file per CLI module |
| 108 | +- `test/unit/test_api.py`, `test_builder.py`, `test_deserializer.py`, `test_formatter.py`, `test_reconstructor.py`, `test_utils.py` |
| 109 | + |
| 110 | +Use concrete stubs when testing ABCs (e.g., `StubExpression(ExpressionRule)`). |
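A minimal sketch of the stub pattern, assuming a simplified abstract base: the `ExpressionRule` below is a hypothetical stand-in defined for this example (not the project's class), and `wrapped()` is an invented helper that only gives the test something concrete to exercise.

```python
import unittest
from abc import ABC, abstractmethod


class ExpressionRule(ABC):
    """Hypothetical stand-in for the project's abstract expression rule."""

    @abstractmethod
    def serialize(self) -> str:
        """Return the expression as text."""

    def wrapped(self) -> str:
        # Invented concrete behaviour under test: wrap the text in ${...}.
        return f"${{{self.serialize()}}}"


class StubExpression(ExpressionRule):
    """Concrete stub that supplies only the abstract method."""

    def __init__(self, text: str):
        self._text = text

    def serialize(self) -> str:
        return self._text


class TestExpressionRule(unittest.TestCase):
    def test_abc_cannot_be_instantiated(self):
        with self.assertRaises(TypeError):
            ExpressionRule()  # abstract method is still unimplemented

    def test_wrapped_uses_stub_serialize(self):
        self.assertEqual(StubExpression("var.x").wrapped(), "${var.x}")
```

The stub keeps the test focused on behaviour inherited from the base class, with no parsing involved.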
| 111 | + |
| 112 | +**Integration tests** (`test/integration/`): full-pipeline tests with golden files. |
| 113 | + |
| 114 | +- `test_round_trip.py` — iterates over all suites in `hcl2_original/`, tests HCL→JSON, JSON→JSON, JSON→HCL, and full round-trip |
| 115 | +- `test_specialized.py` — feature-specific tests with golden files in `specialized/` |
| 116 | + |
| 117 | +Always run the full round-trip test suite after any modification. |
| 118 | + |
| 119 | +## Keeping Docs Current |
| 120 | + |
| 121 | +Update this file when architecture, modules, API surface, or testing conventions change. Also update `README.md` and `docs/usage.md` when changes affect the public API, CLI flags, or option fields. |