@taminomara
Contributor

@taminomara taminomara commented May 22, 2025

No description provided.

@ratmice
Collaborator

ratmice commented May 22, 2025

This looks like a good catch to me. I am scratching my head, though, wondering why rule_ids_map in CTLexerBuilder::build() doesn't seem to have the same sort of issue.

@taminomara
Contributor Author

Hmm, it does. There's something strange going on. Cargo doesn't re-run build.rs unless some files have changed, which is why we didn't see hash maps causing any issues before. But for some reason, in my project, vergen causes build scripts to re-run every time, and that's when I see the results of this nondeterministic behavior. Why vergen acts up in my repo but not here, I don't know...

@taminomara
Contributor Author

Alright, I'm not sure what exactly is happening with rebuilds, but I've added a sort to the lexer generator as well. I hope it'll reduce the number of spurious builds.
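
To illustrate the kind of fix being discussed, here is a minimal sketch of sorting map entries before emitting generated code. It is not the code from this PR: the function name render_token_map, the T_ prefix, and the u32 ID type are assumptions made for the example. The point is only that iterating a HashMap directly gives an unspecified order, while collecting and sorting first makes the output identical across runs.

```rust
use std::collections::HashMap;

/// Hypothetical codegen helper (not grmtools' API): renders one constant per
/// token. The entries are sorted first because `HashMap` iteration order is
/// unspecified and can differ between runs of the same program.
fn render_token_map(tokens: &HashMap<String, u32>) -> String {
    // Collect borrowed entries and sort them by name (then by id).
    let mut entries: Vec<(&String, &u32)> = tokens.iter().collect();
    entries.sort();

    let mut out = String::new();
    for (name, id) in entries {
        // The `T_` prefix is an illustrative naming choice.
        out.push_str(&format!("pub const T_{}: u32 = {};\n", name.to_uppercase(), id));
    }
    out
}
```

Using a BTreeMap in place of the HashMap would give the same ordering guarantee without an explicit sort, at the cost of changing the map type.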

@ratmice
Collaborator

ratmice commented May 22, 2025

I'm not familiar enough with vergen to know what might cause such a loop.
The only thing I can think of is that build.rs generates a timestamp file and then also pulls that same file in via include_str!, so that each run of build.rs triggers a rebuild on the next attempt.

However, I didn't see anything in vergen that indicates how to get it to output such a file, so
that doesn't seem terribly likely.
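
For context on when Cargo decides to re-run a build script: a build script can print cargo:rerun-if-changed=PATH lines, and Cargo then re-runs it only when one of those paths has a newer modification time. Without any such lines, Cargo re-runs the script whenever any file in the package changes. The sketch below only builds those directive lines; the helper name rerun_directives and the example paths are assumptions, and it says nothing about what vergen itself emits.

```rust
/// Hypothetical helper: builds the `cargo:rerun-if-changed` directive lines
/// for a list of input paths. A build.rs would print each line to stdout.
fn rerun_directives(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .map(|p| format!("cargo:rerun-if-changed={p}"))
        .collect()
}
```

In a build.rs, each returned line would be passed to println!. Because the check relies on file modification times, a file system that reports them unreliably can make the script look stale on every build.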

@ratmice
Collaborator

ratmice commented May 22, 2025

Well, I'm happy with all the changes but I'll give some time in case Laurence has any further comments.

@taminomara
Contributor Author

In the end, the culprit turned out to be a mounted file system that didn't handle timestamps correctly (this was a WSL thing). So this PR doesn't actually fix anything, because there was nothing to fix except my environment. Still, it might be good to keep generated files consistent, so feel free to merge or reject this 😸

// Record the time that this version of lrlex was built. If the source code changes and rustc
// forces a recompile, this will change this value, causing anything which depends on this
// build of lrlex to be recompiled too.
let timestamp = env!("VERGEN_BUILD_TIMESTAMP");
Member

Is this environment variable always defined?

Collaborator

@ratmice ratmice May 22, 2025

Yes, you can see it also used in the ct_token_map codegen at the top of the diff context in the first patch in this series. 719da09

I believe it is ensured by:
https://docs.rs/vergen/latest/vergen/struct.BuildBuilder.html#method.build_timestamp
https://github.com/softdevteam/grmtools/blob/master/lrlex/build.rs#L4

along with the compile time caching behavior of the env! macro:
https://www.cs.brandeis.edu/~cs146a/rust/doc-02-21-2015/std/macro.env!.html

So I think this embeds a timestamp, taken from the time when lrlex/build.rs is built, into every token_map.rs,
and this addition syncs things so that the same applies to the main lex output as well!
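
As a side note on the "always defined?" question: env! is resolved at compile time and fails the build if the variable is missing, whereas option_env! yields None in that case. The sketch below uses option_env! purely to show the difference. It is not what this PR does, the helper name build_time_header and the "unknown" fallback are assumptions, and it prints the raw value without the quoting that quote! adds in the PR.

```rust
/// Resolved when this crate is compiled. `option_env!` gives `None` if the
/// variable is unset, where `env!` would produce a compile error instead.
const BUILD_TIMESTAMP: Option<&str> = option_env!("VERGEN_BUILD_TIMESTAMP");

/// Hypothetical helper: formats the header comment written at the top of a
/// generated file. The "unknown" fallback is an illustrative choice.
fn build_time_header(timestamp: Option<&str>) -> String {
    format!("// lrlex build time: {}\n\n", timestamp.unwrap_or("unknown"))
}
```

Calling build_time_header(BUILD_TIMESTAMP) would then produce the header for whatever value was present when the crate was compiled.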

Member

Right, yes, I can see. @ratmice I think you're best placed to judge this PR.

@ratmice
Collaborator

ratmice commented May 22, 2025

@taminomara Even if this doesn't fix your original issue, I feel like the deterministic output is an improvement for reproducible builds.

But we should probably rewrite the commit message to reflect that. I'd suggest something to the
effect of "Sort tokens in CTLexerBuilder codegen for deterministic output". Please squash it down to one commit as well.

let timestamp = env!("VERGEN_BUILD_TIMESTAMP");
write!(outs, "// lrlex build time: {}\n\n", quote!(#timestamp),).ok();
outs.push_str(&syn::parse_str(&unformatted)
.map(|syntax_tree| prettyplease::unparse(&syntax_tree))
Collaborator

I think this patch needs cargo fmt and fixing of cargo clippy errors before it'll pass CI.

@taminomara taminomara changed the title from "Prevent rebuilds due to different order of tokens in ct_token_map" to "Sort tokens in CTLexerBuilder codegen for deterministic output" May 22, 2025
@ratmice ratmice added this pull request to the merge queue May 22, 2025
Merged via the queue into softdevteam:master with commit 6444cf6 May 22, 2025
2 checks passed