Commit ec52ee5
committed
chore: allow adding custom parameter scorers
In some cases, find, aggregate, count, explain, deleteMany, etc we need
to grade extra provided arguments depending on the prompt itself.
Sometimes additional parameters are fine and sometimes they are not. For
example: increasing the keys in filter might lead to a different result
hence if any such thing happens, we should grade the accuracy as 0 and
not 0.75. To suppor this use-case, this commit introduces the idea of
a custom scorer that could be plugged in to accuracy scorer to provided
more controlled accuracy grading.
Additionally this commit reverts the default behaviour of handling added
parameters. Earlier we were marking newly added parameters as
hallucinations and hence grading 0.75. But now, after figuring out that
most of our tools don't even expect extra parameters, we are flipping
the switch and instead will now grade 0 when additional parameters are
specified, unless there is a scorer provided to handle the custom
scoring logic.1 parent 7c3061d commit ec52ee5
File tree
12 files changed
+475
-141
lines changed- tests
- accuracy
- sdk
- unit
12 files changed
+475
-141
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
9 | | - | |
10 | | - | |
11 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
12 | 19 | | |
13 | 20 | | |
14 | 21 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
| 5 | + | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
10 | | - | |
11 | | - | |
| 10 | + | |
| 11 | + | |
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
23 | | - | |
| 22 | + | |
| 23 | + | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
13 | 17 | | |
14 | 18 | | |
15 | 19 | | |
| |||
18 | 22 | | |
19 | 23 | | |
20 | 24 | | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
25 | 32 | | |
26 | 33 | | |
27 | 34 | | |
| |||
30 | 37 | | |
31 | 38 | | |
32 | 39 | | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
38 | 48 | | |
39 | 49 | | |
40 | 50 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | | - | |
13 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
14 | 17 | | |
15 | | - | |
| 18 | + | |
| 19 | + | |
16 | 20 | | |
17 | 21 | | |
18 | 22 | | |
| |||
21 | 25 | | |
22 | 26 | | |
23 | 27 | | |
24 | | - | |
25 | | - | |
26 | | - | |
27 | | - | |
28 | | - | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
29 | 35 | | |
30 | | - | |
| 36 | + | |
| 37 | + | |
31 | 38 | | |
32 | 39 | | |
33 | 40 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
13 | 17 | | |
14 | 18 | | |
15 | 19 | | |
| |||
18 | 22 | | |
19 | 23 | | |
20 | 24 | | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
25 | 32 | | |
26 | 33 | | |
27 | 34 | | |
| |||
30 | 37 | | |
31 | 38 | | |
32 | 39 | | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
37 | 48 | | |
38 | 49 | | |
39 | 50 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
2 | 3 | | |
3 | 4 | | |
4 | 5 | | |
| |||
11 | 12 | | |
12 | 13 | | |
13 | 14 | | |
14 | | - | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
19 | | - | |
20 | | - | |
21 | | - | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
22 | 25 | | |
23 | | - | |
24 | | - | |
25 | | - | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
26 | 31 | | |
27 | 32 | | |
28 | 33 | | |
29 | 34 | | |
30 | | - | |
| 35 | + | |
31 | 36 | | |
32 | 37 | | |
33 | 38 | | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
41 | | - | |
42 | | - | |
43 | | - | |
44 | | - | |
45 | | - | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
46 | 53 | | |
47 | | - | |
48 | | - | |
49 | | - | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
50 | 58 | | |
51 | 59 | | |
52 | 60 | | |
53 | 61 | | |
54 | | - | |
| 62 | + | |
55 | 63 | | |
56 | 64 | | |
57 | 65 | | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
65 | | - | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
66 | 76 | | |
67 | | - | |
68 | | - | |
69 | | - | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
70 | 81 | | |
71 | 82 | | |
72 | 83 | | |
| |||
0 commit comments