Commit 1656c0a
updated onechat evaluation suite (#232728)
## Summary
This pull request updates the system prompt and example outputs for the
correctness evaluator.
The main changes include:
1. Moving the `summary` object to the end of each JSON output
2. Adding the `sequence_match` field to each claim analysis, and
ensuring all example outputs follow the revised format
3. Adding all the examples (total 70; current 5) from the onechat
evaluation dataset to the onechat test suite
Closes: elastic/search-team#10824
### Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.
- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.
### Identify risks
Does this PR introduce any risks? For example, consider risks like hard
to test bugs, performance regression, potential of data loss.
Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.
- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...
---------
Co-authored-by: Srdjan Lulic <[email protected]>1 parent 443f21b commit 1656c0a
File tree
2 files changed
+1094
-48
lines changed2 files changed
+1094
-48
lines changedLines changed: 31 additions & 22 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
67 | 67 | | |
68 | 68 | | |
69 | 69 | | |
70 | | - | |
71 | | - | |
72 | | - | |
73 | | - | |
74 | | - | |
75 | 70 | | |
76 | 71 | | |
77 | 72 | | |
| |||
82 | 77 | | |
83 | 78 | | |
84 | 79 | | |
85 | | - | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
86 | 86 | | |
87 | 87 | | |
88 | 88 | | |
| |||
95 | 95 | | |
96 | 96 | | |
97 | 97 | | |
98 | | - | |
99 | | - | |
100 | | - | |
101 | | - | |
102 | 98 | | |
103 | 99 | | |
104 | 100 | | |
105 | 101 | | |
106 | 102 | | |
107 | 103 | | |
| 104 | + | |
108 | 105 | | |
109 | 106 | | |
110 | 107 | | |
| |||
113 | 110 | | |
114 | 111 | | |
115 | 112 | | |
| 113 | + | |
116 | 114 | | |
117 | 115 | | |
118 | 116 | | |
| |||
121 | 119 | | |
122 | 120 | | |
123 | 121 | | |
| 122 | + | |
124 | 123 | | |
125 | 124 | | |
126 | 125 | | |
127 | | - | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
128 | 132 | | |
129 | 133 | | |
130 | 134 | | |
| |||
136 | 140 | | |
137 | 141 | | |
138 | 142 | | |
139 | | - | |
140 | | - | |
141 | | - | |
142 | | - | |
143 | 143 | | |
144 | 144 | | |
145 | 145 | | |
146 | 146 | | |
147 | 147 | | |
148 | 148 | | |
| 149 | + | |
149 | 150 | | |
150 | 151 | | |
151 | 152 | | |
| |||
154 | 155 | | |
155 | 156 | | |
156 | 157 | | |
| 158 | + | |
157 | 159 | | |
158 | 160 | | |
159 | 161 | | |
| |||
162 | 164 | | |
163 | 165 | | |
164 | 166 | | |
| 167 | + | |
165 | 168 | | |
166 | 169 | | |
167 | 170 | | |
| |||
170 | 173 | | |
171 | 174 | | |
172 | 175 | | |
| 176 | + | |
173 | 177 | | |
174 | 178 | | |
175 | 179 | | |
176 | | - | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
177 | 186 | | |
178 | 187 | | |
179 | 188 | | |
| |||
185 | 194 | | |
186 | 195 | | |
187 | 196 | | |
188 | | - | |
189 | | - | |
190 | | - | |
191 | | - | |
192 | | - | |
193 | 197 | | |
194 | 198 | | |
195 | 199 | | |
| |||
227 | 231 | | |
228 | 232 | | |
229 | 233 | | |
230 | | - | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
231 | 240 | | |
232 | 241 | | |
0 commit comments