# ACE

ACE (Active learning for Capability Evaluation) is a novel framework that uses active learning and powerful language models to automate fine-grained evaluation of foundation models. It enables scalable, adaptive testing that uncovers strengths and weaknesses beyond static benchmarks.

## Installing dependencies

The development environment can be set up using
[poetry](https://python-poetry.org/docs/#installation). Hence, make sure it is
…
run:

```bash
python3 -m poetry install --with test
```

### [Optional] Google Cloud Authentication

The capability evaluation logs (from evaluations run with [Inspect](https://inspect.aisi.org.uk/)) are stored in a GCP bucket. Use the following command to log in with your GCP account:
…

### Capability Generation using the scientist LLM

Generates capability names and descriptions in a first step, then, for each capability, generates tasks, solves them, and verifies the solutions.

```bash
python3 src/run_capability_generation.py
```
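The two-step flow described above can be sketched as follows. This is a minimal sketch, not the code in `src/run_capability_generation.py`: the `Task`/`Capability` dataclasses and `generate_capabilities` are hypothetical, and the `propose_capabilities`, `generate_tasks`, `solve`, and `verify` callables are assumed stand-ins for the LLM calls behind each step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Task:
    problem: str
    solution: str
    verified: bool


@dataclass
class Capability:
    name: str
    description: str
    tasks: List[Task] = field(default_factory=list)


def generate_capabilities(
    propose_capabilities: Callable[[int], List[Tuple[str, str]]],
    generate_tasks: Callable[[Capability, int], List[str]],
    solve: Callable[[str], str],
    verify: Callable[[str, str], bool],
    num_capabilities: int,
    tasks_per_capability: int,
) -> List[Capability]:
    # Step 1: propose (name, description) pairs for new capabilities.
    capabilities = [
        Capability(name, description)
        for name, description in propose_capabilities(num_capabilities)
    ]
    # Step 2: for each capability, generate tasks, solve them, and verify each solution.
    for capability in capabilities:
        for problem in generate_tasks(capability, tasks_per_capability):
            solution = solve(problem)
            capability.tasks.append(Task(problem, solution, verify(problem, solution)))
    return capabilities


if __name__ == "__main__":
    # Toy stand-ins for LLM calls: addition problems solved and checked in Python.
    def add(problem: str) -> str:
        return str(sum(int(term) for term in problem.split("+")))

    caps = generate_capabilities(
        propose_capabilities=lambda n: [(f"addition_{i}", f"Add two integers (variant {i})") for i in range(n)],
        generate_tasks=lambda cap, k: [f"{i} + {i + 1}" for i in range(k)],
        solve=add,
        verify=lambda problem, solution: solution == add(problem),
        num_capabilities=2,
        tasks_per_capability=3,
    )
    for cap in caps:
        print(cap.name, sum(t.verified for t in cap.tasks), "/", len(cap.tasks), "verified")
```

Passing each LLM-backed step in as a callable keeps the control flow testable offline; the `__main__` block swaps them for trivial addition stubs.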

### Evaluation of subject LLM on generated capabilities

Evaluates the subject LLM on the generated capabilities, using Inspect, and computes a score for each capability.

```bash
python3 src/run_evaluation.py
```
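As a minimal sketch of per-capability scoring, assuming each task carries a reference answer and the score is the subject LLM's exact-match accuracy (both assumptions; the scorer actually used with Inspect may differ), the hypothetical `score_capabilities` function below maps capability names to scores. `subject_answer` is a hypothetical stand-in for querying the subject LLM.

```python
from typing import Callable, Dict, List, Tuple


def score_capabilities(
    tasks_by_capability: Dict[str, List[Tuple[str, str]]],
    subject_answer: Callable[[str], str],
) -> Dict[str, float]:
    """Return the subject LLM's exact-match accuracy on each capability's tasks.

    Each task is a (problem, reference_answer) pair. Capabilities with no
    tasks are skipped rather than given an arbitrary score.
    """
    scores: Dict[str, float] = {}
    for capability, tasks in tasks_by_capability.items():
        if not tasks:
            continue
        # Compare answers after trimming surrounding whitespace.
        correct = sum(
            subject_answer(problem).strip() == reference.strip()
            for problem, reference in tasks
        )
        scores[capability] = correct / len(tasks)
    return scores


if __name__ == "__main__":
    canned = {"1 + 1": "2", "2 + 2": "5", "2 * 3": "6"}
    tasks = {
        "addition": [("1 + 1", "2"), ("2 + 2", "4")],
        "multiplication": [("2 * 3", "6")],
    }
    print(score_capabilities(tasks, lambda problem: canned[problem]))
    # {'addition': 0.5, 'multiplication': 1.0}
```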

### Capability selection/generation using active learning

Uses each evaluated capability and the subject LLM's score on it to select or generate a new capability.
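The selection rule is not spelled out here, so the following only illustrates one common active-learning heuristic, uncertainty sampling, under stated assumptions: every capability is represented by a numeric embedding (for example, of its description), scores lie in [0, 1], and a candidate's score is predicted as an RBF-kernel-weighted average of the scores observed so far. The candidate whose predicted score is closest to 0.5, where the subject LLM's success is least predictable, is selected. All names (`select_next_capability`, `predict_score`, `length_scale`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rbf_similarity(a: Sequence[float], b: Sequence[float], length_scale: float) -> float:
    """Gaussian (RBF) kernel: 1.0 for identical embeddings, decaying with distance."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale**2))


def predict_score(
    embedding: Sequence[float],
    evaluated_embeddings: List[Sequence[float]],
    evaluated_scores: List[float],
    length_scale: float,
) -> float:
    """Kernel-weighted average of observed scores (Nadaraya-Watson regression)."""
    weights = [rbf_similarity(embedding, e, length_scale) for e in evaluated_embeddings]
    total = sum(weights)
    if total == 0.0:
        return 0.5  # No nearby evidence: treat the outcome as a coin flip.
    return sum(w * s for w, s in zip(weights, evaluated_scores)) / total


def select_next_capability(
    candidates: Dict[str, Sequence[float]],
    evaluated: Dict[str, Sequence[float]],
    scores: Dict[str, float],
    length_scale: float = 1.0,
) -> str:
    """Pick the candidate whose predicted score is closest to 0.5 (uncertainty sampling)."""
    names = list(evaluated)
    embeddings = [evaluated[name] for name in names]
    observed = [scores[name] for name in names]
    return min(
        candidates,
        key=lambda name: abs(
            predict_score(candidates[name], embeddings, observed, length_scale) - 0.5
        ),
    )


if __name__ == "__main__":
    evaluated = {"easy": [0.0, 0.0], "hard": [2.0, 0.0]}
    scores = {"easy": 1.0, "hard": 0.0}
    candidates = {"near_easy": [0.1, 0.0], "between": [1.0, 0.0], "near_hard": [1.9, 0.0]}
    print(select_next_capability(candidates, evaluated, scores))  # between
```

`length_scale` controls how far an observed score "spreads": small values make each prediction depend only on very similar capabilities, while large values average over many of them.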