Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 82 additions & 30 deletions .github/workflows/style-bot-action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,15 @@ name: Style Bot Action
on:
workflow_call:
inputs:
pre_commit_script:
style_command_type:
required: false
type: string
description: "Optional script to run before committing changes"
pre_commit_script_name:
description: "Which style command to run (options: 'default' (make style && make quality), 'quality_only', 'style_only')"
default: "default"
pr_number:
required: false
type: string
description: "Custom name for the pre-commit script step"
default: "Custom pre-commit script"
type: number
description: "Pull Request number to process (only used when running manually)"
python_quality_dependencies:
required: true
type: string
Expand All @@ -21,11 +21,6 @@ on:
type: string
description: "Python version to run code formatter"
default: "3.10"
style_command:
required: false
type: string
description: "Command to run for style checks or/and style fixes"
default: "make style && make quality"
secrets:
bot_token:
required: true
Expand All @@ -34,8 +29,10 @@ on:
jobs:
check-permissions:
if: >
contains(github.event.comment.body, '@bot /style') &&
github.event.issue.pull_request != null
(github.event_name == 'issue_comment' &&
contains(github.event.comment.body, '@bot /style') &&
github.event.issue.pull_request != null) ||
github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
outputs:
is_authorized: ${{ steps.check_user_permission.outputs.has_permission }}
Expand All @@ -45,7 +42,14 @@ jobs:
uses: actions/github-script@v6
with:
script: |
const comment_user = context.payload.comment.user.login;
let comment_user;
if (context.eventName === 'workflow_dispatch') {
comment_user = context.actor;
console.log('Workflow triggered manually by:', comment_user);
} else {
comment_user = context.payload.comment.user.login;
console.log('Workflow triggered by comment from:', comment_user);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this only be reserved for users having admin privileges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's already checked here :

const authorized = permission.permission === 'admin';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(both for workflow_dispatch and issue comment)

Copy link
Contributor

@ydshieh ydshieh Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I am not sure about the details, but I kind feel admin is not guaranteed to every maintainer if we are talking about HF repositories.

For example, for transformers, I think admin is not guaranteed to say @molbap @qubvel etc. I am admin because I need to access transformers repository setting page.

It might be a good idea to double check here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my statement above is correct, what I did so far is a lazy approach by a hard coded list of user name .... (but that is not good neither for security as people may leave the team someday)

https://github.com/huggingface/transformers/blob/2c2495cc7b0e3e2942a9310f61548f40a2bc8425/.github/workflows/self-comment-ci.yml#L32

const { data: permission } = await github.rest.repos.getCollaboratorPermissionLevel({
owner: context.repo.owner,
repo: context.repo.repo,
Expand All @@ -65,18 +69,29 @@ jobs:
uses: actions/github-script@v6
with:
script: |
const prNumber = context.payload.issue.number;
const prNumber = context.eventName === 'workflow_dispatch'
? context.payload.inputs.pr_number
: context.payload.issue.number;

// Get PR details
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber
});

// We capture both the branch ref and the "full_name" of the head repo
// so that we can check out the correct repository & branch (including forks).
core.setOutput("prNumber", prNumber);
core.setOutput("headRef", pr.head.ref);
core.setOutput("headRepoFullName", pr.head.repo.full_name);
// Set outputs for use in subsequent steps
core.setOutput('headRepoFullName', pr.head.repo.full_name);
core.setOutput('headRef', pr.head.ref);
core.setOutput('baseRef', pr.base.ref);
core.setOutput('prNumber', prNumber);

console.log('PR Details:', {
number: prNumber,
headRepo: pr.head.repo.full_name,
headRef: pr.head.ref,
baseRef: pr.base.ref
});

- name: Check out PR branch
uses: actions/checkout@v3
Expand All @@ -101,6 +116,33 @@ jobs:
echo "Head Ref: $HEADREF"
echo "Head Repo Full Name: $HEADREPOFULLNAME"

- name: Verify critical files haven't been modified
uses: actions/github-script@v6
with:
script: |
const prNumber = context.eventName === 'workflow_dispatch'
? context.payload.inputs.pr_number
: context.payload.issue.number;
const { data: pr } = await github.rest.pulls.listFiles({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber
});

const modifiedFiles = pr.map(file => file.filename);
console.log("Modified files:", modifiedFiles);

const protectedFiles = ["setup.py", "Makefile"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, why do you check this file in the workflow? I mean Makefile and setup.py are checked because they directly affect command execution so I wonder why check_doc_toc.py is also protected here.

quickly checked other HF repos, it seems that this file is not common in other repos (except for transformers), maybe we can add a parameter additional_protected_files so that you can pass utils/check_doc_toc.py as well. wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_doc_toc.py is a part of the make style && make quality process. So, in the case it was modified, it exposes security risks.

Copy link
Contributor

@Wauplin Wauplin Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of checking for protected files, what do you think of allowing the process to run only if the @bot /style comment has been written by a maintainer/reviewer? Since this bot is mostly there to help reviewers with stuck PRs, I don't think it's that much of a problem if we forbid external contributors to run it (we can always do it for them). And on the contrary it makes things more secure + reduces the workflow complexity by always requiring a pair of eyes "from HF" to validate.

To make the logic simple, I'd suggest adding a maintainers parameter to the GH workflow:

# example for huggingface_hub
jobs:
  style:
    uses: ./.github/workflows/style-bot-action.yml
    with:
      python_quality_dependencies: "[quality]"
      style_command_type: "style_only"
      pr_number: ${{ fromJSON(inputs.pr_number || '0') }}
    secrets:
      bot_token: ${{ secrets.GITHUB_TOKEN }}
    maintainers: hanouticelina wauplin julien-c etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought this was already implemented. In huggingface/diffusers#10931, Dhruv from our team had already implemented it.

But accidentally, if the @bot /style command is triggered somehow we still have the risk. So, I would prefer to have the modification check even if there's complexity.

Copy link
Contributor Author

@hanouticelina hanouticelina Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @Wauplin reminded me that, if we check for protected files, we also have a couple of scripts in utils/ for huggingface_hub that we should check too.
if we keep this step, we can check Makefile, setup.py and the folders utils/ and scripts/.

@glegendre01 what do you think about this?

Copy link
Contributor

@Wauplin Wauplin Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see! Then no need for extra maintainers field. Not sure it makes sense to have the concept of "protected files" then. What I fear with this added layer is that:

  1. if someone makes legit changes to a protected file, we can't use the bot (in huggingface_hub, this can happen for files under https://github.com/huggingface/huggingface_hub/tree/main/utils)
  2. it is easy to forget to add a file to this "protected files" list. Having a manual step (i.e. writing @bot /style) is similar to the "approve and run workflows" button from GH. I feel that not having a "protected files" list shifts the responsibility on the admins who should feel more engaged/involved to really check what's inside the PR before approving it
    1. related to 2., I feel that the "protected files list" adds a broader scope of logic to think of, meaning more weaknesses. Typically someone that find a way to execute custom code when a script is triggered, without modifying this script. Since the range of existing tools and scripts is large, it's harder to account for all potential cases. Whereas requiring an extra check from admins (by letting them know that they have full responsibility, not half-responsibility) makes it much harder to dodge the system. Or at least, as hard as any change in a test file that is also executed by the CI.

No strong opinion overall if you or @glegendre01 really think this is a requirement to improve security, but IMO we should not do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see your point. If the workflow is aimed to be used across numerous repos, then of course having the admins check the file changes first is easier considering the trade-offs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that this was only my 2c, I'm not security expert here. I just find that the filter might lower attention without completely removing all the attack scope. And that requiring CI approval for "make style" is quite similar to requiring CI approval for the tests suite. Both can execute malicious code if inadvertently approved. "half responsibility" is not the best way to phrase it but didn't find a better terminology 😬)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiring CI approval for "make style" is quite similar to requiring CI approval for the tests suite. Both can execute malicious code if inadvertently approved.

agree! 👍

console.log("Protected files:", protectedFiles);

for (const file of protectedFiles) {
if (modifiedFiles.includes(file)) {
core.setFailed(`❌ Error: Protected file '${file}' has been modified in this PR. This is not allowed for security reasons.`);
return;
}
}

console.log("✅ All protected files check passed!");
- name: Set up Python
uses: actions/setup-python@v4
with:
Expand All @@ -113,18 +155,28 @@ jobs:
python -m pip install --upgrade pip
pip install .$python_quality_dependencies

- name: ${{ inputs.pre_commit_script_name }}
env:
pre_commit_script: ${{ inputs.pre_commit_script }}
if: inputs.pre_commit_script != ''
run: |
bash -c "${pre_commit_script}"

- name: Run style command
env:
style_command: ${{ inputs.style_command }}
id: run_style
run: |
bash -c "$style_command"
case "${{ inputs.style_command_type }}" in
"default")
echo "Running default style and quality checks"
make style && make quality
;;
"quality_only")
echo "Running quality checks only"
make quality
;;
"style_only")
echo "Running style checks only"
make style
;;
*)
echo "Invalid style_command_type: ${{ inputs.style_command_type }}"
echo "Valid options are: 'default', 'quality_only', 'style_only'"
exit 1
;;
esac

- name: Commit and push changes
id: commit_and_push
Expand Down
11 changes: 9 additions & 2 deletions .github/workflows/style-bot.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,13 @@ name: Style Bot
on:
issue_comment:
types: [created]

workflow_dispatch:
inputs:
pr_number:
description: 'Pull Request number to run style checks on'
required: false
type: number

permissions:
contents: write
pull-requests: write
Expand All @@ -13,6 +19,7 @@ jobs:
uses: ./.github/workflows/style-bot-action.yml
with:
python_quality_dependencies: "[quality]"
style_command: "make style"
style_command_type: "style_only"
pr_number: ${{ fromJSON(inputs.pr_number || '0') }}
secrets:
bot_token: ${{ secrets.GITHUB_TOKEN }}