Commit 144852f

refactor: benchmarks (#33896)
* refactor: benchmarks

  Based on a discussion with @LysandreJik & @ArthurZucker, the goal of this PR is to improve transformers' benchmark system. This is a WIP, for the moment the infrastructure required to make things work is not ready. Will update the PR description when it is the case.
* feat: add db init in benchmarks CI
* fix: pg_config is missing in runner
* fix: add psql to the runner
* fix: connect info from env vars + PR comments
* refactor: set database as env var
* fix: invalid working directory
* fix: `commit_msg` -> `commit_message`
* fix: git marking checked out repo as unsafe
* feat: add logging
* fix: invalid device
* feat: update grafana dashboard for prod grafana
* feat: add `commit_id` to header table
* feat: commit latest version of dashboard
* feat: move measurements into json field
* feat: remove drop table migration queries
* fix: `torch.arrange` -> `torch.arange`
* fix: add missing `s` to `cache_position` positional argument
* fix: change model
* revert: `cache_positions` -> `cache_position`
* fix: set device for `StaticCache`
* fix: set `StaticCache` dtype
* feat: limit max cache len
* fix script
* raise error on failure!
* not try catch
* try to skip generate compilation
* update
* update docker image!
* update
* update again!@
* update
* updates
* ???
* ??
* use `torch.cuda.synchronize()`
* fix json
* nits
* fix
* fixed!
* f**k
* feat: add TTNT panels
* feat: add try except

---------

Co-authored-by: Arthur Zucker <[email protected]>
1 parent 80bee7b commit 144852f

5 files changed: +2697 -22 lines changed

.github/workflows/benchmark.yml

Lines changed: 51 additions & 22 deletions
@@ -1,43 +1,72 @@
 name: Self-hosted runner (benchmark)
 
 on:
-  schedule:
-    - cron: "17 2 * * *"
-  workflow_call:
+  push:
+    branches: [main]
+  pull_request:
+    types: [ opened, labeled, reopened, synchronize ]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
 
 env:
   HF_HOME: /mnt/cache
-  TF_FORCE_GPU_ALLOW_GROWTH: true
-
 
 jobs:
   benchmark:
     name: Benchmark
-    runs-on:
+    runs-on:
       group: aws-g5-4xlarge-cache
     container:
-      image: huggingface/transformers-all-latest-gpu
-      options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+      image: huggingface/transformers-pytorch-gpu
+      options: --gpus all --privileged --ipc host
     steps:
-      - name: Update clone
-        working-directory: /transformers
+      - name: Get repo
+        if: github.event_name == 'pull_request'
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.head.sha }}
+
+      - name: Get repo
+        if: github.event_name == 'push'
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.sha }}
+
+      - name: Install libpq-dev & psql
         run: |
-          git fetch && git checkout ${{ github.sha }}
+          apt update
+          apt install -y libpq-dev postgresql-client
+
+      - name: Install benchmark script dependencies
+        run: python3 -m pip install -r benchmark/requirements.txt
 
       - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
         working-directory: /transformers
-        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e ".[torch]"
 
-      - name: Benchmark (daily)
-        if: github.event_name == 'schedule'
-        working-directory: /transformers
+      - name: Run database init script
         run: |
-          python3 -m pip install optimum-benchmark>=0.3.0
-          HF_TOKEN=${{ secrets.TRANSFORMERS_BENCHMARK_TOKEN }} python3 benchmark/benchmark.py --repo_id hf-internal-testing/benchmark_results --path_in_repo $(date +'%Y-%m-%d') --config-dir benchmark/config --config-name generation --commit=${{ github.sha }} backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun
+          psql -f benchmark/init_db.sql
+        env:
+          PGDATABASE: metrics
+          PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
+          PGUSER: transformers_benchmarks
+          PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
 
-      - name: Benchmark (merged to main event)
-        if: github.event_name == 'push' && github.ref_name == 'main'
-        working-directory: /transformers
+      - name: Run benchmark
         run: |
-          python3 -m pip install optimum-benchmark>=0.3.0
-          HF_TOKEN=${{ secrets.TRANSFORMERS_BENCHMARK_TOKEN }} python3 benchmark/benchmark.py --repo_id hf-internal-testing/benchmark_results_merge_event --path_in_repo $(date +'%Y-%m-%d') --config-dir benchmark/config --config-name generation --commit=${{ github.sha }} backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun
+          git config --global --add safe.directory /__w/transformers/transformers
+          if [ "$GITHUB_EVENT_NAME" = "pull_request" ]; then
+            commit_id=$(echo "${{ github.event.pull_request.head.sha }}")
+          elif [ "$GITHUB_EVENT_NAME" = "push" ]; then
+            commit_id=$GITHUB_SHA
+          fi
+          commit_msg=$(git show -s --format=%s | cut -c1-70)
+          python3 benchmark/llama.py "${{ github.head_ref || github.ref_name }}" "$commit_id" "$commit_msg"
+        env:
+          HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+          PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
+          PGUSER: transformers_benchmarks
+          PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
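
The "Run benchmark" step picks a commit ID depending on the triggering event and trims the commit subject to 70 characters before handing both to `benchmark/llama.py`. The Python sketch below restates that shell logic for readability only; it is not code from this commit, and the function names `select_commit_id` and `truncate_subject` are hypothetical. One likely reason for the branch on event type: per GitHub's documentation, `GITHUB_SHA` on `pull_request` events points at the PR's merge commit rather than the branch head, so the head SHA has to come from the event payload.

```python
from typing import Optional


def select_commit_id(
    event_name: str,
    pr_head_sha: Optional[str],
    github_sha: Optional[str],
) -> Optional[str]:
    """Mirror the if/elif in the workflow step (hypothetical helper).

    pull_request -> head SHA of the PR branch
    push         -> the pushed commit (GITHUB_SHA)
    anything else -> None, like the shell variable staying unset
    """
    if event_name == "pull_request":
        return pr_head_sha
    if event_name == "push":
        return github_sha
    return None


def truncate_subject(subject: str, limit: int = 70) -> str:
    """Keep at most `limit` characters, roughly what `cut -c1-70` does
    for a single-line ASCII commit subject (hypothetical helper)."""
    return subject[:limit]
```

Because the workflow only triggers on `push` and `pull_request`, the `None` branch should not be reached in practice; it is kept here to make the unset case in the shell version explicit.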

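The new `concurrency` block builds its group name from `github.workflow` and `github.head_ref || github.run_id`. As far as the GitHub Actions documentation describes it, `head_ref` is only populated for pull request events, so the expression falls back to the unique `run_id` on pushes. A minimal sketch of the resulting behaviour, with a hypothetical `concurrency_group` helper and made-up branch names and run IDs:

```python
def concurrency_group(workflow: str, head_ref: str, run_id: str) -> str:
    """Emulate `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`.

    An empty head_ref (the push case) falls through to run_id, the same way
    `||` does in a GitHub Actions expression. Hypothetical helper.
    """
    return f"{workflow}-{head_ref or run_id}"


WORKFLOW = "Self-hosted runner (benchmark)"

# Two runs for the same PR branch share a group, so with
# cancel-in-progress: true the newer run cancels the older one.
pr_run_1 = concurrency_group(WORKFLOW, "my-feature", "1001")
pr_run_2 = concurrency_group(WORKFLOW, "my-feature", "1002")

# Two pushes to main get distinct groups and do not cancel each other.
push_run_1 = concurrency_group(WORKFLOW, "", "2001")
push_run_2 = concurrency_group(WORKFLOW, "", "2002")
```

Under these assumptions, every commit merged to `main` gets its own benchmark run, while superseded PR runs are cancelled.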