
Conversation

@kkysen kkysen commented Jan 5, 2026

Since c2rust transpile does not transpile conditionally-compiled code (and we haven't integrated Hayroll yet), LLMs currently (and logically) don't insert comments from conditionally-compiled code that hasn't been transpiled into Rust. In order for the comparison checking to work, direct the LLM to still insert these comments where that Rust would've gone. This allows us to run successfully on most of json-c (those caches are checked in), although there are still a few other remaining bugs.
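
To make this concrete, here is a stylized example (the function, type, and comment are hypothetical, not taken from `json-c`): a comment inside an `#ifdef` block that was compiled out has no corresponding Rust, but the LLM is still asked to place it where that Rust would have appeared, so the comparison checking lines up.

```rust
// Hypothetical example. Suppose the C source was:
//
//     void reset(struct ctx *c) {
//     #ifdef DEBUG
//         /* dump the counters before clearing them */
//         dump_ctx(c);
//     #endif
//         c->count = 0;
//     }
//
// `c2rust transpile` emits no Rust for the `#ifdef DEBUG` body, so the LLM is
// directed to still insert its comment where that Rust would've gone:
use std::os::raw::c_int;

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct ctx {
    pub count: c_int,
}

pub unsafe extern "C" fn reset(c: *mut ctx) {
    // dump the counters before clearing them
    (*c).count = 0 as c_int;
}
```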

@kkysen kkysen requested a review from thedataking January 5, 2026 23:17
kkysen added 3 commits January 5, 2026 15:26
We now extract the Rust code block in the Markdown response.
This allows us to ask for other LLM feedback besides purely the code,
which helps in debugging and reasoning about why the LLM did what it did.
…tional code

Since `c2rust transpile` does not transpile conditionally-compiled code
(and we haven't integrated Hayroll yet),
LLMs currently (and logically) don't insert comments from conditionally-compiled code
that hasn't been transpiled into Rust.
In order for the comparison checking to work, direct the LLM
to still insert these comments where that Rust would've gone.
This allows us to run successfully on most of `json-c`,
although I'm still encountering a few remaining bugs in testing.
Checking this in now lets us see in the PR how the conditional comments are inserted.
@kkysen kkysen force-pushed the kkysen/postprocess-conditional-comments branch from 8e2ab7c to 0cf4791 January 5, 2026 23:26

@thedataking thedataking left a comment

I'm not convinced this is the way to go. At least not long term.

We have the compile commands, so we could tell the LLM what environment the function was compiled with so it can determine what code is guarded. We could use that information to exclude guarded comments from the validation function.
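
A rough sketch of that direction (illustrative only; the function name is made up and this assumes `serde_json`, not anything already in the repo): pull the `-D`/`-U` flags for a source file out of `compile_commands.json`, which could then be fed into the prompt or used to decide which `#if` blocks were active.

```rust
use serde_json::Value;

// Hypothetical helper: collect the -D/-U flags recorded for `source_file`
// in a compile_commands.json document.
fn defines_for(compile_commands_json: &str, source_file: &str) -> Vec<String> {
    let entries: Vec<Value> =
        serde_json::from_str(compile_commands_json).unwrap_or_default();
    entries
        .iter()
        .filter(|entry| entry["file"].as_str() == Some(source_file))
        .flat_map(|entry| {
            // Prefer the pre-split "arguments" form; otherwise fall back to
            // whitespace-splitting "command" (ignores shell quoting).
            let args: Vec<String> = match entry["arguments"].as_array() {
                Some(args) => args
                    .iter()
                    .filter_map(Value::as_str)
                    .map(String::from)
                    .collect(),
                None => entry["command"]
                    .as_str()
                    .unwrap_or("")
                    .split_whitespace()
                    .map(String::from)
                    .collect(),
            };
            args
        })
        .filter(|arg| arg.starts_with("-D") || arg.starts_with("-U"))
        .collect()
}
```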

Are you suggesting this as a short-term fix to make tests pass?
If so, please put in some TODO comments... especially in the LLM prompt, to remind ourselves that this is a workaround that needs addressing.

Edit to unpack my thinking a bit more: I'm working on a PR that adds a pre-processing step that removes gunk before or around a function. I think it is best to have a sequence of very simple LLM calls that are easy to validate. One is the gunk removal (e.g. license comment at top of file), another could be removal of guarded code. If we ask the LLM to do a lot in one go, we will not be able to use some of the simpler and less capable models, and validation gets more involved.

Base automatically changed from kkysen/postprocess-search-dir to master January 5, 2026 23:40
kkysen added a commit that referenced this pull request Jan 5, 2026
…eview` for the default

`flash` is cheaper and faster and seems to still be sufficiently capable.

I did have to adjust the prompt slightly to explicitly say to include
the surrounding Rust Markdown code block.
This isn't actually used right now, but is in #1534, so it seems worth preserving.

@thedataking thedataking left a comment

Please preserve the logic in remove_backticks (I don't care what you name it), but it should not require backticks to be present; just remove them if they are. Sometimes the LLM emits them (with or without the rust part), sometimes not. We don't want to pay for another roundtrip (in time or API costs) if we can just remove backticks when present.
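
For the record, a minimal sketch of that behavior (illustrative only; the name `strip_code_fence` and the use of Rust here are assumptions, not the postprocessor's actual code): strip a surrounding fence if present, otherwise pass the reply through unchanged, so no extra roundtrip is needed either way.

```rust
// Remove a wrapping ``` / ```rust fence from an LLM reply if present;
// return the reply unchanged (modulo trimming) if there is no fence.
fn strip_code_fence(response: &str) -> &str {
    let trimmed = response.trim();
    let Some(rest) = trimmed.strip_prefix("```") else {
        return trimmed; // no fence at all
    };
    // Skip the remainder of the opening fence line (it may carry a "rust" tag).
    let Some((_lang, body)) = rest.split_once('\n') else {
        return trimmed; // opening fence but no body: leave the reply alone
    };
    match body.trim_end().strip_suffix("```") {
        Some(inner) => inner.trim_end(),
        None => trimmed, // unbalanced fence: play it safe
    }
}
```

So a reply wrapped in a ```rust fence gets unwrapped, while a bare reply comes back as-is.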

@kkysen kkysen left a comment

We can discuss further offline, but

> I'm not convinced this is the way to go. At least not long term.

> Are you suggesting this as a short-term fix to make tests pass? If so, please put in some TODO comments... especially in the LLM prompt, to remind ourselves that this is a workaround that needs addressing.

I think this makes sense at least in the short term as the simplest fix to get to the next errors in json-c and start testing real code. I'm less sure about the long term, though. I was thinking that, long term, Hayroll might be the best option, but there are also corner cases to that and integration issues, so I'm not sure. Also, in the intermediate term, these comments can serve as some notice that there's missing conditionally-compiled code (although it's not thorough at that; it only works if the missing code has comments).

> We have the compile commands, so we could tell the LLM what environment the function was compiled with so it can determine what code is guarded. We could use that information to exclude guarded comments from the validation function.

I was thinking about that, but it seems 1) more complex in the short term and 2) likely to become unnecessary if we do run after Hayroll. This might be a good path, but I'm not sure how complex it'll get and how it'll interact with other things (e.g. Hayroll). Would we basically run the preprocessor/clang and get the C comments that way instead of through a lexer?

> Edit to unpack my thinking a bit more: I'm working on a PR that adds a pre-processing step that removes gunk before or around a function. I think it is best to have a sequence of very simple LLM calls that are easy to validate. One is the gunk removal (e.g. license comment at top of file), another could be removal of guarded code. If we ask the LLM to do a lot in one go, we will not be able to use some of the simpler and less capable models, and validation gets more involved.

Hmm, let's discuss.
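
As an aside on the preprocessor question above, one hedged sketch of that route (nothing here exists in the repo; only the `cc -E -C` flag behavior is standard): run the file through the preprocessor with the defines from its compile command, keeping comments, so that only comments from branches that were actually compiled survive, and then collect comments from that output instead of lexing the raw source.

```rust
use std::process::Command;

// Hypothetical: preprocess `source` with the recorded defines, keeping comments.
fn preprocess_keeping_comments(source: &str, defines: &[String]) -> std::io::Result<String> {
    let output = Command::new("cc")
        .arg("-E") // stop after preprocessing
        .arg("-C") // retain comments in the preprocessed output
        .args(defines) // e.g. ["-DHAVE_FOO", "-UDEBUG"] pulled from compile_commands.json
        .arg(source)
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```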

kkysen added a commit that referenced this pull request Jan 6, 2026
…eview` for the default (#1535)

`flash` is cheaper and faster and seems to still be sufficiently
capable.

~~I did have to adjust the prompt slightly to explicitly say to include
the surrounding Rust Markdown code block. This isn't actually used right
now, but it is in #1534, so it seems worth preserving.~~

Edit: The prompt is kept the same now.
@thedataking

Update from the meeting with Galois on 1/7/2026: handling conditional compilation is a general problem, so we should find a general approach that works not just for c2rust-postprocess but elsewhere too. We can denylist problematic functions for now, and once a robust and general approach (probably at the MVIR tools level) is available, we will switch to that.

kkysen added a commit that referenced this pull request Jan 14, 2026
As a general-purpose solution to bypassing functions the postprocessor's
comment transferer (and future transforms, too) can't handle yet, this
adds an exclude list to skip functions in files. There is also
`--ident-filter`, which is a single regex just based on the function
name, but this is more specific and also takes into account the file
name. The file passed to `--exclude-file` is a YAML file containing file
paths (relative to the exclude file's directory) and the function names
within them to skip. This gets us around issues like
conditionally-compiled comments (#1534), the first function in a file
also including license comments and `#include`s, and anything else. I've
run this on `json-c` so far

```sh
(cd c2rust-postprocess/ && uv run postprocess $OLDPWD/tests/integration/tests/json-c/repo/lib.rs --no-update-rust --exclude-file $OLDPWD/tests/integration/tests/json-c/postprocess-exclude.yml)
```

and checked in the LLM cache for it, but setting up the testing
infrastructure to run this automatically is still a WIP (should be the
next PR). Funnily enough, the LLM cache is going to end up behaving
similarly to snapshot tests once it's up and running.
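
For illustration, a hypothetical shape for the file passed to `--exclude-file` (the keys and layout are assumptions based on the description above, not copied from `tests/integration/tests/json-c/postprocess-exclude.yml`):

```yaml
# Paths are relative to this file's directory; each entry lists function names to skip.
repo/some_file.c:
  - some_function_with_ifdefd_comments
  - first_function_that_absorbs_the_license_header
repo/other_file.c:
  - another_problematic_function
```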