Commit 4c76340

Merge pull request #165 from danmcp/unittests
Add facilities for unit and functional tests
2 parents af35854 + 5cff4ab commit 4c76340

23 files changed (+3085, -34 lines changed)

.github/workflows/test.yml

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# SPDX-License-Identifier: Apache-2.0
+
+name: Test
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - "main"
+      - "release-**"
+    paths:
+      - '**.py'
+      - 'pyproject.toml'
+      - 'requirements**.txt'
+      - 'tox.ini'
+      - 'scripts/*.sh' # Used by this workflow
+      - '.github/workflows/test.yml' # This workflow
+  pull_request:
+    branches:
+      - "main"
+      - "release-**"
+    paths:
+      - '**.py'
+      - 'pyproject.toml'
+      - 'requirements**.txt'
+      - 'tox.ini'
+      - 'scripts/*.sh' # Used by this workflow
+      - '.github/workflows/test.yml' # This workflow
+
+env:
+  LC_ALL: en_US.UTF-8
+
+defaults:
+  run:
+    shell: bash
+
+permissions:
+  contents: read
+
+jobs:
+  test:
+    name: "test: ${{ matrix.python }} on ${{ matrix.platform }}"
+    runs-on: "${{ matrix.platform }}"
+    strategy:
+      matrix:
+        python:
+          - "3.10"
+          - "3.11"
+        platform:
+          - "ubuntu-latest"
+        include:
+          - python: "3.11"
+            platform: "macos-latest"
+    steps:
+      - name: "Harden Runner"
+        uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
+        with:
+          egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
+
+      - name: Checkout
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          # https://github.com/actions/checkout/issues/249
+          fetch-depth: 0
+
+      - name: Free disk space
+        if: matrix.platform != 'macos-latest'
+        uses: ./.github/actions/free-disk-space
+
+      - name: Install the expect package
+        if: startsWith(matrix.platform, 'ubuntu')
+        run: |
+          sudo apt-get install -y expect
+
+      - name: Install tools on MacOS
+        if: startsWith(matrix.platform, 'macos')
+        run: |
+          brew install expect coreutils bash
+
+      - name: Setup Python ${{ matrix.python }}
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          python-version: ${{ matrix.python }}
+          cache: pip
+          cache-dependency-path: |
+            **/pyproject.toml
+            **/requirements*.txt
+
+      - name: Remove llama-cpp-python from cache
+        run: |
+          pip cache remove llama_cpp_python
+
+      - name: Cache huggingface
+        uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
+        with:
+          path: ~/.cache/huggingface
+          # config contains DEFAULT_MODEL
+          key: huggingface-${{ hashFiles('src/instructlab/configuration.py') }}
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install tox "tox-gh>=1.2"
+
+      - name: Run unit and functional tests with tox
+        run: |
+          tox
+
+      - name: Remove llama-cpp-python from cache
+        if: always()
+        run: |
+          pip cache remove llama_cpp_python
+
+  test-workflow-complete:
+    needs: ["test"]
+    runs-on: ubuntu-latest
+    steps:
+      - name: Test Workflow Complete
+        run: echo "Test Workflow Complete"
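
The `strategy.matrix` in the workflow above combines two axes with one `include` entry. The Python sketch below is a simplified model of how GitHub Actions expands such a matrix; it is an illustration, not GitHub's implementation, and `expand_matrix` is a hypothetical helper name. It assumes the documented rule that an `include` entry extends an existing combination only when it overwrites none of that combination's original values, and otherwise becomes a new combination.

```python
from itertools import product


def expand_matrix(axes, include=()):
    """Simplified model of GitHub Actions matrix expansion (sketch only)."""
    keys = list(axes)
    # Cartesian product of the declared axes.
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]
    # Snapshot of the original axis values, used for the overwrite check.
    originals = [dict(combo) for combo in combos]
    for entry in include:
        merged = False
        for combo, original in zip(combos, originals):
            # An include entry may extend a combination only if it does not
            # overwrite any of that combination's original axis values.
            if all(original.get(key, value) == value for key, value in entry.items()):
                combo.update(entry)
                merged = True
        if not merged:
            # Otherwise it becomes a new, standalone combination.
            combos.append(dict(entry))
    return combos


jobs = expand_matrix(
    {"python": ["3.10", "3.11"], "platform": ["ubuntu-latest"]},
    include=[{"python": "3.11", "platform": "macos-latest"}],
)
for job in jobs:
    print(f"test: {job['python']} on {job['platform']}")
```

Under this model the workflow yields three jobs: Python 3.10 and 3.11 on `ubuntu-latest`, plus Python 3.11 on `macos-latest`.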

.spellcheck-en-custom.txt

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,8 @@ Backport
 backported
 benchmarking
 codebase
+cli
+dev
 dr
 eval
 gpt
@@ -16,9 +18,11 @@ jsonl
 justfile
 MMLU
 openai
+pre
 SDG
 Tatsu
 tl
 TODO
+tox
 venv
 vllm

Makefile

Lines changed: 6 additions & 0 deletions
@@ -54,3 +54,9 @@ spellcheck-sort: .spellcheck-en-custom.txt ## Sort spellcheck directory
 .PHONY: verify
 verify: check-tox ## Run linting, typing, and formatting checks via tox
 	tox p -e fastlint,mypy,ruff
+
+##@ Development
+
+.PHONY: tests
+tests: check-tox ## Run unit and type checks
+	tox -e py3-unit,mypy

README.md

Lines changed: 83 additions & 8 deletions
@@ -1,6 +1,7 @@
 # eval
 
 ![Lint](https://github.com/instructlab/eval/actions/workflows/lint.yml/badge.svg?branch=main)
+![Tests](https://github.com/instructlab/eval/actions/workflows/test.yml/badge.svg?branch=main)
 ![Build](https://github.com/instructlab/eval/actions/workflows/pypi.yaml/badge.svg?branch=main)
 ![Release](https://img.shields.io/github/v/release/instructlab/eval)
 ![License](https://img.shields.io/github/license/instructlab/eval)
@@ -77,20 +78,32 @@ MMLU Branch is an adaptation of MMLU that is designed to test custom knowledge t
 
 A teacher model is used to generate new multiple choice questions based on the knowledge document included in the taxonomy Git branch. A “task” is then constructed that references the newly generated answer choices. These tasks are then used to score the model’s grasp on new knowledge the same way MMLU works. Generation of these tasks is done as part of the [InstructLab SDG](https://github.com/instructlab/sdg) library.
 
-## MT-Bench / MT-Bench Branch Testing Steps
+## Development
 
 > **⚠️ Note:** Must use Python version 3.10 or later.
 
+### Set up your dev environment
+
+The following tools are required:
+
+- [`git`](https://git-scm.com)
+- [`python`](https://www.python.org) (v3.10 or v3.11)
+- [`pip`](https://pypi.org/project/pip/) (v23.0+)
+- [`bash`](https://www.gnu.org/software/bash/) (v5+, for functional tests)
+
+#### Optional: Use [cloud-instance.sh](https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and set up an instance
+
 ```shell
-# Optional: Use cloud-instance.sh (https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and setup the instance
-scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge
+scripts/infra/cloud-instance.sh ec2 launch -t g6.2xlarge
 scripts/infra/cloud-instance.sh ec2 setup-rh-devenv
 scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers
 scripts/infra/cloud-instance.sh ec2 ssh sudo reboot
 scripts/infra/cloud-instance.sh ec2 ssh
+```
 
+#### Regardless of how you set up your instance
 
-# Regardless of how you setup your instance
+```shell
 git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
 git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
 python3 -m venv venv
@@ -99,6 +112,68 @@ pip install -r requirements.txt
 pip install -r requirements-dev.txt
 pip install -e .
 pip install vllm
+```
+
+### Testing
+
+Before pushing changes to GitHub, run the tests as shown below. They can be run individually, as shown in each subsection,
+or all at once with a single command:
+
+```shell
+tox
+```
+
+#### Unit tests
+
+Unit tests are enforced by the CI system using [`pytest`](https://docs.pytest.org/). When making changes, run these tests before pushing the changes to avoid CI issues.
+
+Running unit tests can be done with:
+
+```shell
+tox -e py3-unit
+```
+
+By default, all tests found within the `tests` directory are run. However, specific unit tests can be run by passing filenames, classes, and/or methods to `pytest` using tox positional arguments. The following example invokes a single test method `test_mt_bench` that is declared in the `tests/test_mt_bench.py` file:
+
+```shell
+tox -e py3-unit -- tests/test_mt_bench.py::test_mt_bench
+```
+
+#### Functional tests
+
+Functional tests are enforced by the CI system. When making changes, run the tests before pushing the changes to avoid CI issues.
+
+Running functional tests can be done with:
+
+```shell
+tox -e py3-functional
+```
+
+#### Coding style
+
+This project follows the Python [`pep8`](https://peps.python.org/pep-0008/) coding style. The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.
+
+We use [pre-commit](https://pre-commit.com/) to enforce coding style using [`black`](https://github.com/psf/black) and [`isort`](https://pycqa.github.io/isort/).
+
+You can invoke formatting with:
+
+```shell
+tox -e ruff
+```
+
+In addition, we use [`pylint`](https://www.pylint.org) to perform static code analysis of the code.
+
+You can invoke the linting with the following command:
+
+```shell
+tox -e lint
+```
+
+### MT-Bench / MT-Bench Branch Example Usage
+
+Launch vllm serving granite-7b-lab
+
+```shell
 python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
 ```
 
@@ -107,8 +182,8 @@ In another shell window
 ```shell
 export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
 # Commands relative to eval directory
-python3 tests/test_gen_answers.py
-python3 tests/test_branch_gen_answers.py
+python3 scripts/test_gen_answers.py
+python3 scripts/test_branch_gen_answers.py
 ```
 
 Example output tree
@@ -139,8 +214,8 @@ eval_output/
 ```
 
 ```shell
-python3 tests/test_judge_answers.py
-python3 tests/test_branch_judge_answers.py
+python3 scripts/test_judge_answers.py
+python3 scripts/test_branch_judge_answers.py
 ```
 
 Example output tree
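
The README changes above describe selecting a single pytest test through tox positional arguments and mention `INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS` as a way to shorten run times. The sketch below shows what a small, self-contained unit test in that style might look like. How the library actually consumes the variable is not visible in this commit view, so `limit_questions` and the file name used afterwards are hypothetical stand-ins, not part of the `instructlab.eval` API.

```python
import os


def limit_questions(questions, env=None):
    """Hypothetical helper: keep only the first N questions when
    INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS is set to a positive integer."""
    env = os.environ if env is None else env
    raw = env.get("INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS")
    if raw is None:
        # Variable unset: evaluate every question.
        return list(questions)
    n = int(raw)
    return list(questions)[:n] if n > 0 else list(questions)


def test_limit_questions():
    questions = [f"q{i}" for i in range(20)]
    # Without the variable, nothing is dropped.
    assert limit_questions(questions, env={}) == questions
    # With the variable set to 10, only the first ten questions remain.
    limited = limit_questions(
        questions, env={"INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS": "10"}
    )
    assert limited == questions[:10]


if __name__ == "__main__":
    test_limit_questions()
    print("ok")
```

If this were saved as a hypothetical `tests/test_limit_questions.py`, the README's selection syntax suggests it could be run on its own with `tox -e py3-unit -- tests/test_limit_questions.py::test_limit_questions`, assuming the `py3-unit` environment forwards positional arguments to `pytest` as the README states.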
