Commit 06cfc03

add llm_tests target and CI

Signed-off-by: Jack Luar <[email protected]>
1 parent 1523731, commit 06cfc03

5 files changed: +56 -1 lines changed

.github/workflows/ci.yaml

Lines changed: 10 additions & 0 deletions

@@ -28,6 +28,16 @@ jobs:
       - name: Build Docker image
         run: |
           make docker
+      - name: Run LLM CI
+        working-directory: evaluation
+        run: |
+          make llm-tests
+      - name: Create commit comment
+        working-directory: evaluation
+        uses: peter-evans/commit-comment@v3
+        with:
+          token: ${{ secrets.GH_PATH }}
+          body-path: llm-tests-output.txt
       - name: Teardown
         if: always()
         run: |

Makefile

Lines changed: 3 additions & 1 deletion

@@ -1,4 +1,6 @@
-FOLDERS=backend frontend
+.PHONY: init init-dev format check
+
+FOLDERS=backend frontend evaluation
 
 init:
 	@for folder in $(FOLDERS); do (cd $$folder && make init && cd ../); done

evaluation/Makefile

Lines changed: 9 additions & 0 deletions

@@ -1,3 +1,5 @@
+.PHONY: init init-dev format check clean
+
 init:
 	@python3 -m venv .venv && \
 	. .venv/bin/activate && \
@@ -16,3 +18,10 @@ format:
 check:
 	@. .venv/bin/activate && \
 	ruff check --fix
+
+clean:
+	@rm -f llm_tests_output.txt
+
+llm-tests: clean
+	@. .venv/bin/activate && \
+	./auto_evaluation/llm_tests.sh > llm_tests_output.txt 2>&1

evaluation/auto_evaluation/llm_tests.sh

Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
+#!/bin/bash -eu
+
+retrievers=(
+    "agent-retriever" \
+    "ensemble" \
+)
+
+echo "==================================="
+echo "==> Dataset: EDA Corpus"
+for retriever in "${retrievers[@]}" ; do
+    echo "==> Running tests for $retriever"
+    python auto_evaluation/eval_main.py \
+        --base_url http://localhost:8000 \
+        --dataset ./auto_evaluation/dataset/EDA_Corpus_100_Question.csv \
+        --retriever $retriever
+    echo "==> Done"
+done
+echo "==================================="
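
The script runs under `bash -eu`, so a failing `eval_main.py` call ends the loop. A retriever that crashed therefore has no `==> Done` line, and any retrievers after it never show up in the log at all. The sketch below summarizes such a log per retriever. It assumes only the `==> Running tests for ...` and `==> Done` markers the script prints, plus Python's standard `Traceback (most recent call last):` header. `summarize_llm_tests`, `not_run` and `RetrieverResult` are hypothetical helpers, not part of the repository. Note that the Makefile writes the log to `llm_tests_output.txt`, while the workflow's `body-path` names `llm-tests-output.txt`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Start-of-section marker printed by llm_tests.sh for each retriever.
RUNNING = re.compile(r"==> Running tests for (\S+)")


@dataclass
class RetrieverResult:
    name: str
    finished: bool = False   # an "==> Done" line followed the start marker
    traceback: bool = False  # a Python traceback appeared in this section


def summarize_llm_tests(log_text: str) -> list[RetrieverResult]:
    """Group an llm_tests.sh log into one result per retriever (hypothetical helper)."""
    results: list[RetrieverResult] = []
    current: RetrieverResult | None = None
    for line in log_text.splitlines():
        match = RUNNING.match(line)
        if match:
            current = RetrieverResult(match.group(1))
            results.append(current)
        elif current is None:
            continue  # banner lines before the first retriever section
        elif line.startswith("==> Done"):
            current.finished = True
        elif line.startswith("Traceback (most recent call last):"):
            current.traceback = True
    return results


def not_run(results: list[RetrieverResult], expected: list[str]) -> list[str]:
    """Names from `expected` with no section in the log, e.g. after an early abort."""
    seen = {r.name for r in results}
    return [name for name in expected if name not in seen]
```

Applied to the committed llm_tests_output.txt, this sketch would yield one `agent-retriever` entry with `finished=False` and `traceback=True`. `not_run(results, ["agent-retriever", "ensemble"])` would return `["ensemble"]`.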

evaluation/llm_tests_output.txt

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+===================================
+==> Dataset: EDA Corpus
+==> Running tests for agent-retriever
+/home/luars/ORAssistant/evaluation/.venv/lib/python3.12/site-packages/deepeval/__init__.py:49: UserWarning: You are using deepeval version 1.4.9, however version 1.5.0 is available. You should consider upgrading via the "pip install --upgrade deepeval" command.
+  warnings.warn(
+Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00, 33.41it/s]
+Traceback (most recent call last):
+  File "/home/luars/ORAssistant/evaluation/auto_evaluation/eval_main.py", line 146, in <module>
+    harness = EvaluationHarness(args.base_url, args.dataset, args.reranker_base_url)
+              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/luars/ORAssistant/evaluation/auto_evaluation/eval_main.py", line 44, in __init__
+    self.qns = preprocess.read_data(self.dataset)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/luars/ORAssistant/evaluation/auto_evaluation/dataset/preprocess.py", line 10, in read_data
+    assert len(header) == 2, "CSV file must have exactly 2 columns"
+AssertionError: CSV file must have exactly 2 columns
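
The run above stops inside `preprocess.read_data` because the dataset CSV's header does not have exactly two columns. A pre-flight check along the same lines could catch this before the evaluation harness starts. This is a minimal sketch that only inspects the first row of the file. `check_two_column_header` is a hypothetical helper, and the real `read_data` may parse the file differently.

```python
from __future__ import annotations

import csv
from pathlib import Path


def check_two_column_header(path: str | Path) -> list[str]:
    """Return the CSV header row, raising ValueError unless it has exactly 2 columns.

    Hypothetical pre-flight helper: it mirrors only the `len(header) == 2`
    assertion seen in the traceback, not the repository's read_data.
    """
    with open(path, newline="", encoding="utf-8") as f:
        # First row of the file, or [] if the file is empty.
        header = next(csv.reader(f), [])
    if len(header) != 2:
        raise ValueError(
            f"{path}: expected 2 columns in header, found {len(header)}: {header!r}"
        )
    return header
```

For example, run from `evaluation/` against the dataset named in llm_tests.sh: `check_two_column_header("./auto_evaluation/dataset/EDA_Corpus_100_Question.csv")`.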
