Skip to content

Conversation

@coopernetes
Copy link

@coopernetes coopernetes commented Dec 13, 2025

This PR introduces an opt-in experimental writer to the various ratchet
operations that modify files (pin/unpin/update/upgrade) that preserves
original file formatting when comments are mixed or near YAML nodes. The
new flag uses line & column positions to modify ratchet references while
still using the AST traversal to find refs to operate on.

This fixes an issue where block comments above or inline with certain
nodes get reordered unexpectedly from the output of ratchet.

Related: #126

@ccoVeille
Copy link

go-yaml maintainer here.

Feel free to ping me on your repository or open issue on yaml/go-yaml if you are facing some issues

@sethvargo
Copy link
Owner

Hi @coopernetes @ccoVeille - this seems like an extremely large change (nearly 2k of hand-written code that needs to be maintained), and it seems to come with a lot of tradeoffs (e.g. maintaining our own support for managing scalars). What is the primary motivation for these changes?

The existing YAML parser (braydonk/yaml) is archived and no longer maintained.

I see no evidence that https://github.com/braydonk/yaml has been deprecated or archived. While there haven't been commits for 10mo, there's no indication the code is stale and the repo is still active.

@coopernetes
Copy link
Author

coopernetes commented Dec 14, 2025

Hey Seth, I'm so sorry. I am mistaken with the yaml parser dependency. I confused it with the original go-yaml/yaml and not the fork that this project currently uses.

My primary motivation was to address the known issues with format preservation, newline additions that show up from the marshalling and other format changes. The thought was an experimental approach which would use the same AST procedure to find matching refs (based on the various CI tool's structure) but then do the actual replacements using unstructured (textual) methods. This would allow ratchet to make only the changes needed to pin/unpin/upgrade matching refs.

I agree that this is a large amount of new code. It's very possible to implement the same approach without a new dependency. I will revert this and try to keep the change set smaller so that it's easier to review.

The associated issue clearly needs to be closed or updated to be more accurate. Sorry again for that.

If you feel this is not worth doing, feel free to close this PR. I can iterate a bit more on own and present something a bit more succinct.

@coopernetes coopernetes force-pushed the experimental-yaml-parsing branch 2 times, most recently from d40ba1b to ec871b1 Compare December 14, 2025 06:33
@coopernetes coopernetes changed the title feat: Add experimental YAML parser with format-preserving surgical replacements feat: Add experimental writer with format-preserving replacements Dec 14, 2025
@coopernetes coopernetes marked this pull request as ready for review December 14, 2025 18:13
@sethvargo
Copy link
Owner

Hi @coopernetes while there are possibly bugs, @dcreey did a bunch of work to preserve newlines and other formatting. The only remaining issue, as far as I know, is that indentation is always set to 2 spaces.

If you have a specific bug, it'd be useful to open a PR with a failing test (there are already formatting tests you can copy), and we can investigate whether that's a bug or working as intended.

@coopernetes coopernetes force-pushed the experimental-yaml-parsing branch 2 times, most recently from a4c6446 to b26fefa Compare December 14, 2025 19:54
@coopernetes
Copy link
Author

coopernetes commented Dec 14, 2025

Hey @sethvargo I have a better grasp now. I encountered this problem when using ratchet to upgrade & pin two workflows: codeql.yml & ci.yml.

Essentially, the current writer reorders and interleaves comments that exist at specific indents above or around YAML nodes which ratchet modifies. By itself, this isn't a big problem. However, with the indent behaviour as well, it results in some gnarly diffs when adopting ratchet on existing workflows or ones which have not been formatted with a proper YAML formatter. I've added my existing workflow as a test case to demonstrate this:

$ go test -v ./command
...
=== NAME  Test_loadYAMLFiles/github-pr125.yml
    command_test.go:58: round-trip mismatch (-want +got):
          strings.Join({
                ... // 20 identical lines
                "          script: |",
                `            console.log("Hello, world!");`,
        -       "      # - name: Example disabled script",
        -       "      #   run: Turning this off for now",
        +       "      - name: Final step",
        +       "        # - name: Example disabled script",
        +       "",
        +       "        #   run: Turning this off for now",
        +       "",
                "",
        -       "      - name: Final step",
                `        run: echo "This is the final step."`,
                "",
          }, "\n")
=== NAME  Test_loadYAMLFiles/github-codeql-pr125.yml
    command_test.go:58: round-trip mismatch (-want +got):
          (
                """
                ... // 34 identical lines
                          # Prefix the list here with "+" to use these queries and those in the config file.
          
        -                 # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        -                 # queries: security-extended,security-and-quality
        - 
                      # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
                      # If this step fails, then you should remove it and run the build manually (see below)
        + 
                      - name: Autobuild
        -               uses: github/codeql-action/autobuild@v3
        +               # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        +               # queries: security-extended,security-and-quality
        + 
          
        +               uses: github/codeql-action/autobuild@v3
                      # ℹ Command-line programs to run using the OS shell.
        + 
                      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
        + 
          
                      #   If the Autobuild fails above, remove it and uncomment the following three lines.
                      #   modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
        + 
          
                      # - run: |
                      #     echo "Run, Build Application using script"
                      #     ./location_of_script_within_repo/buildscript.sh
        - 
                      - name: Perform CodeQL Analysis
        + 
                        uses: github/codeql-action/analyze@v3
                        with:
                ... // 2 identical lines
                """
          )

In the 2nd workflow (ci.yml), this can actually cause the workflow to break. There was a step that was commented out. Re-ordering that block into a new indent level and then un-commenting in VS Code causes the syntax to no longer be correct (the comment block is indented two spaces too many to the right).

To recreate:

  1. Run ratchet pin testdata/github-pr125.yml.
  2. Uncomment out the "Example disabled script" step after format changes.

Results in a broken workflow:

image

I believe you have mentioned in past issues that this output formatting are acceptable for ratchet and regex/text based approaches used before were error-prone & discouraged from use. I completely understand if this is not a behaviour you wish to integrate. Let me know what you think!

This PR introduces an opt-in experimental writer to the various ratchet
operations that modify files (pin/unpin/update/upgrade) that preserves
original file formatting when comments are mixed or near YAML nodes. The
new flag uses line & column positions to modify ratchet references while
still using the AST traversal to find refs to operate on.

This fixes an issue where block comments above or inline with certain
nodes get reordered unexpectedly from the output of ratchet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants