Skip to content

feat: Map BUILD to Python (Starlark) for Bazel (fixes #3575)#3576

Open
vorburger wants to merge 4 commits intosharkdp:masterfrom
vorburger:patch-2
Open

feat: Map BUILD to Python (Starlark) for Bazel (fixes #3575)#3576
vorburger wants to merge 4 commits intosharkdp:masterfrom
vorburger:patch-2

Conversation

@vorburger
Copy link
Contributor

@keith-hall like this, for #3575 ?

Full disclosure: I have NOT actually tested this... as in, for #3575 I HAVE tested bat --map-syntax='BUILD:Python' BUILD locally on a pre-built binary, and looking around https://github.com/sharkdp/bat/tree/master/src/syntax_mapping/builtins/common I kinda gathered that's what you meant with your comment in #3575 - but I haven't actually rebuilt bat with this... but it looks about right (I hope).

Copilot AI review requested due to automatic review settings February 1, 2026 12:36
@vorburger
Copy link
Contributor Author

Re. 2 failing checks, let me add a line to CHANGELOG.md (coming up).

But that "[Minimum supported rust version" failure must be unrelated to this PR.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a syntax mapping for Bazel BUILD files to be highlighted as Python (Starlark), addressing issue #3575. The change adds a new configuration file that maps the filename "BUILD" to Python syntax highlighting.

Changes:

  • Added 50-bazel.toml configuration file to map BUILD files to Python syntax

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1,2 @@
[mappings]
"Python" = ["BUILD"]
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will break the existing test 'syntax_detection_first_line' in src/assets.rs:540-541. The test expects files named "build" (lowercase) with a bash shebang to use first-line fallback detection and be identified as "Bourne Again Shell (bash)". However, with this mapping, "build" will match the case-insensitive "BUILD" pattern and be mapped to Python, preventing first-line fallback and causing the test to fail.

Suggested change
"Python" = ["BUILD"]
# Intentionally no mapping for bare "BUILD" to avoid conflicting with
# first-line fallback detection for lowercase "build" files.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer relevant after case sensitive matching was added in 56fe0fa

@@ -0,0 +1,2 @@
[mappings]
"Python" = ["BUILD"]
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test coverage has been added for the new BUILD file mapping. Consider adding a test case to src/assets.rs in the syntax_detection_basic test function, similar to the existing tests for other filename mappings like PKGBUILD and Makefile. For example: assert_eq!(test.syntax_for_file("BUILD"), "Python");

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test in 56fe0fa

@@ -0,0 +1,2 @@
[mappings]
"Python" = ["BUILD"]
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern "BUILD" will match files named "build" (lowercase) due to case-insensitive glob matching (see src/syntax_mapping.rs:22). This overrides the existing mapping in 99-unset-ambiguous-filenames.toml that intentionally maps "build" to MapToUnknown to prevent incorrect syntax detection for NAnt Build Files. This breaks existing behavior.

Since Bazel BUILD files are typically uppercase, consider either:

  1. Finding a way to make this mapping case-sensitive (would require code changes to the make_glob_matcher function)
  2. Documenting this as a known limitation and accepting that lowercase "build" files will now be treated as Python
  3. Not adding this mapping and relying on users to configure it manually via --map-syntax if needed

The issue noted in 99-unset-ambiguous-filenames.toml is that "NAnt Build File" should only match *.build files (with extension), not files named "build" (no extension). This PR's case-insensitive "BUILD" pattern will cause lowercase "build" files to be mapped to Python instead of remaining as MapToUnknown.

Suggested change
"Python" = ["BUILD"]

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 56fe0fa

@vorburger
Copy link
Contributor Author

@keith-hall hm, is Copilot right about the case sensitivity? Given the test failures, I guess this would need more work.

My time is limited and my Rust is too rusty to implement that... BTW see #3577.

Also, would you want to see a test, to accept merging this?

@keith-hall
Copy link
Collaborator

Yep, looks like Copilot is correct about the case-insensitivity, which is indeed why the tests are failing. Hmm, yeah I guess we need to introduce a way to map a filename case sensitively to proceed further. As for tests, a simple test case like Copilot suggested at #3576 (comment) would suffice, if all the existing tests continue to pass unmodified.

@keith-hall
Copy link
Collaborator

@cyqsimon giving you a heads up about these proposed changes to the syntax mapping code, as you are the original author, I would appreciate your thoughts/review when you get chance please 🙂

@cyqsimon
Copy link
Contributor

@cyqsimon giving you a heads up about these proposed changes to the syntax mapping code, as you are the original author, I would appreciate your thoughts/review when you get chance please 🙂

Ack that. Will take a look tomorrow.

My free time is somewhat constrained these couple of days due to minor personal health issues, so my apologies in advance for any slow responses.

Copy link
Contributor

@cyqsimon cyqsimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all code style looks good. In this age of LLM there is not much point to nitpick on any one specific line or two IMO so I won't. As long as it's not egregiously offending and there is test coverage I'm okay with it.

That being said, I do see some architectural problems here:

1. The addition of the case_sensitive_mappings table

It's rigid and lacks consideration for future extensibility. Because fundamentally, case-sensitivity is a property of a particular rule. And we may have other yet unforeseen properties for each rule in the future, which will create a multiplicative number of possible states for each rule.

So for example, imagine if one day we wish to add support for rules that are only active for SSH sessions. Suddenly we will need 4 categories: ssh+case-sensitive, ssh+case-insensitive, local+case-sensitive, local+case-insensitive. I hope you don't intend to create a table for each of those combinations. And what happens when you add a third and a fourth property? This is unsustainable.

Much better I think is to use the approach taken by Cargo.toml's dependency table, where you can either specify a dependency with version only: serde = "1" or with additional properties: serde = { version = "1", optional = true }. Serde docs have a page on how you can do this: Either string or struct.

2. Use of the Matcher type

This issue stems from the first one. It should automatically become irrelevant after you solve issue 1.

Currently, the Matcher type used to serve double-duty:

  1. as the newtype wrapper for a parsed single rule;
  2. as the codegen argument for build_matcher_{fixed,dynamic}.

The problem is, in the current design the "case-sensitivity" info isn't provided during parsing. Instead, it is set arbitrarily during parsing (because the info is yet unknown), then overridden (or not) by upstream callers. This existence of invalid states (no matter how briefly it exists) goes contrary to one of Rust's core philosophies - "make invalid states unrepresentable".

Therefore the options are to either:

  1. Use a distinct type for each of those two duties;
  2. Modify the design so that the "case-sensitivity" info is available during parsing.

And since resolving the aforementioned first problem would automatically give you option 2 for free, that's what I would go with.

3. Not using a new enum to represent case-sensitivity

This is a minor issue compared to the previous two, but I'll put it here for your consideration nonetheless.

In code like this where you name both a property and its inverse (e.g. here, both sensitive and insensitive), it's very easy to accidentally flip the logic in the implementation. This is a human problem and I'm sure everyone has done it. It also makes it more difficult to read - when you're passing true to a function, let's say build_matcher_fixed, are you replying true to "sensitive" or "insensitive"?

This is why using booleans in arguments is considered (by some) bad practise where you can avoid it. I think some official Rust material (Rust book? I can't remember.) explicitly holds this position. Much better would be to use a enum, something like this:

enum Case {
  Sensitive,
  Insensitive,
}

@cyqsimon
Copy link
Contributor

cyqsimon commented Mar 17, 2026

I was a little worried after posting my initial opinion that TOML may not be too happy with mixed-type arrays but it appears that's not a problem since toml 1.0.

@keith-hall
Copy link
Collaborator

Thanks for the excellent feedback @cyqsimon, I really appreciate it. I completely agree with all your points - I will make the necessary changes 👍

@keith-hall keith-hall self-requested a review March 18, 2026 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants