Commit 9e0c81d

Add multimodal JSONL ground truth
1 parent f2007b2 commit 9e0c81d

66 files changed, +249 -98 lines changed

app/backend/approaches/approach.py

Lines changed: 2 additions & 1 deletion
@@ -364,7 +364,8 @@ def nonewlines(s: str) -> str:
         for doc in results:
             # Get the citation for the source page
             citation = self.get_citation(doc.sourcepage)
-            citations.append(citation)
+            if citation not in citations:
+                citations.append(citation)

             # If semantic captions are used, extract captions; otherwise, use content
             if use_semantic_captions and doc.captions:
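
The guard added in this hunk keeps only the first occurrence of each citation while preserving retrieval order. A minimal standalone sketch of that behavior follows; `dedupe_citations` and the sample citation strings are hypothetical and do not appear in the repository.

```python
def dedupe_citations(raw_citations: list[str]) -> list[str]:
    """Return citations in first-seen order with duplicates removed."""
    citations: list[str] = []
    for citation in raw_citations:
        # Same membership check as in the hunk: append only unseen citations.
        if citation not in citations:
            citations.append(citation)
    return citations


# Hypothetical input: the same page retrieved twice.
example = dedupe_citations(["a.pdf#page=1", "b.pdf#page=2", "a.pdf#page=1"])
```

The list membership check is quadratic in the worst case, which is unlikely to matter for a handful of search results. For larger inputs, `list(dict.fromkeys(raw_citations))` yields the same ordering in linear time.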

app/backend/approaches/prompts/ask_answer_question.prompty

Lines changed: 6 additions & 12 deletions
@@ -14,26 +14,20 @@ system:
 {% if override_prompt %}
 {{ override_prompt }}
 {% else %}
-You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.
-Use 'you' to refer to the individual asking the questions even if they ask with 'I'.
-Answer the following question using only the data provided in the sources below.
-Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response.
-If you cannot answer using the sources below, say you don't know. Use below example to answer.
+Assistant helps the company employees with their questions about internal documents. Be brief in your answers.
+Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below.
+You CANNOT ask clarifying questions to the user, since the user will have no way to reply.
+If the question is not in English, answer in the language used in the question.
+Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].
 {% if image_sources %}
 Each image source has the document file name in the top left corner of the image with coordinates (10,10) pixels with format <filename.ext#page=N>,
 and the image figure name is right-aligned in the top right corner of the image.
 The filename of the actual image is in the top right corner of the image and is in the format <figureN_N.png>.
 Each text source starts in a new line and has the file name followed by colon and the actual information.
 Always include the source document filename for each fact you use in the response in the format: [document_name.ext#page=N].
 If you are referencing an image, add the image filename in the format: [document_name.ext#page=N(image_name.png)].
-Answer the following question using only the data provided in the sources below.
-If you cannot answer using the sources below, say you don't know.
-Return just the answer without any input texts.
 {% endif %}
-Possible citations for current question:
-{% for citation in citations %}
-[{{ citation }}]
-{% endfor %}
+Possible citations for current question: {% for citation in citations %} [{{ citation }}] {% endfor %}
 {{ injected_prompt }}
 {% endif %}

app/backend/approaches/prompts/chat_answer_question.prompty

Lines changed: 9 additions & 11 deletions
@@ -20,22 +20,20 @@ system:
 {% if override_prompt %}
 {{ override_prompt }}
 {% else %}
-Assistant helps the company employees with their healthcare plan questions, and questions about the employee handbook. Be brief in your answers.
-Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
+Assistant helps the company employees with their questions about internal documents. Be brief in your answers.
+Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below.
+If asking a clarifying question to the user would help, ask the question.
 If the question is not in English, answer in the language used in the question.
 Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].
-{% if include_images %}
+{% if image_sources %}
 Each image source has the document file name in the top left corner of the image with coordinates (10,10) pixels with format <filename.ext#page=N>,
 and the image figure name is right-aligned in the top right corner of the image.
 The filename of the actual image is in the top right corner of the image and is in the format <figureN_N.png>.
 Each text source starts in a new line and has the file name followed by colon and the actual information
-Always include the source name from the image or text for each fact you use in the response in the format: [filename]
-Answer the following question using only the data provided in the sources below.
-If asking a clarifying question to the user would help, ask the question.
-Be brief in your answers.
-The text and image source can be the same file name, don't use the image title when citing the image source, only use the file name as mentioned
-If you cannot answer using the sources below, say you don't know. Return just the answer without any input texts.
+Always include the source document filename for each fact you use in the response in the format: [document_name.ext#page=N].
+If you are referencing an image, add the image filename in the format: [document_name.ext#page=N(image_name.png)].
 {% endif %}
+Possible citations for current question: {% for citation in citations %} [{{ citation }}] {% endfor %}
 {{ injected_prompt }}
 {% endif %}

@@ -56,9 +54,9 @@ Make sure the last question ends with ">>".

 user:
 {{ user_query }}
-{% for image_source in image_sources %}
+{% if image_sources is defined %}{% for image_source in image_sources %}
 ![Image]({{image_source}})
-{% endfor %}
+{% endfor %}{% endif %}
 {% if text_sources is defined %}
 Sources:
 {% for text_source in text_sources %}
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+{"question": "How closely do the S&P 500 and NASDAQ move together?",
+"truth": "The S&P 500 and NASDAQ move very closely together, with a correlation coefficient of 0.95, indicating a strong positive relationship between the two indices [Financial Market Analysis Report 2023.pdf#page=7]."
+}
+{"question": "Which commodity—oil, gold, or wheat—was the most stable over the last decade?",
+"truth": "Over the last decade, gold was the most stable commodity compared to oil and wheat. The annual percentage changes for gold mostly stayed within a smaller range, while oil showed significant fluctuations including a large negative change in 2014 and a large positive peak in 2021. Wheat also varied but less than oil and more than gold [Financial Market Analysis Report 2023.pdf#page=6][Financial Market Analysis Report 2023.pdf#page=6(figure6_1.png)]."
+}
+{"question": "Do cryptocurrencies like Bitcoin or Ethereum show stronger ties to stocks or commodities?",
+"truth": "Cryptocurrencies like Bitcoin and Ethereum show stronger ties to stocks than to commodities. The correlation values between Bitcoin and stock indices are 0.3 with the S&P 500 and 0.4 with NASDAQ, while for Ethereum, the correlations are 0.35 with the S&P 500 and 0.45 with NASDAQ. In contrast, the correlations with commodities like Oil are lower (0.2 for Bitcoin and 0.25 for Ethereum), and correlations with Gold are slightly negative (-0.1 for Bitcoin and -0.05 for Ethereum) [Financial Market Analysis Report 2023.pdf#page=7]."
+}
+{"question": "Around what level did the S&P 500 reach its highest point before declining in 2021?",
+"truth": "The S&P 500 reached its highest point just above the 4500 level before declining in 2021 [Financial Market Analysis Report 2023.pdf#page=4][Financial Market Analysis Report 2023.pdf#page=4(figure4_1.png)]."
+}
+{"question": "In which month of 2023 did Bitcoin nearly hit 45,000?",
+"truth": "Bitcoin nearly hit 45,000 in December 2023, as shown by the blue line reaching close to 45,000 on the graph for that month [Financial Market Analysis Report 2023.pdf#page=5(figure5_1.png)]."
+}
+{
+"question": "Which year saw oil prices fall the most, and by roughly how much did they drop?",
+"truth": "The year that saw oil prices fall the most was 2020, with a drop of roughly 20% as shown by the blue bar extending to about -20% on the horizontal bar chart of annual percentage changes for Oil from 2014 to 2022 [Financial Market Analysis Report 2023.pdf#page=6(figure6_1.png)]."
+}
+{"question": "What was the approximate inflation rate in 2022?",
+"truth": "The approximate inflation rate in 2022 was near 3.4% according to the orange line in the inflation data on the graph showing trends from 2018 to 2023 [Financial Market Analysis Report 2023.pdf#page=8(figure8_1.png)]."
+}
+{"question": "By 2028, to what relative value are oil prices projected to move compared to their 2024 baseline of 100?",
+"truth" :"Oil prices are projected to decline to about 90 by 2028, relative to their 2024 baseline of 100. [Financial Market Analysis Report 2023.pdf#page=9(figure9_1.png)]."
+}
+{"question": "What approximate value did the S&P 500 fall to at its lowest point between 2018 and 2022?",
+"truth": "The S&P 500 fell in 2018 to an approximate value of around 2600 at its lowest point between 2018 and 2022, as shown by the graph depicting the 5-Year Trend of the S&P 500 Index [Financial Market Analysis Report 2023.pdf#page=4(figure4_1.png)]."
+}
+{"question": "Around what value did Ethereum finish the year at in 2023?",
+"truth": "Ethereum finished the year 2023 at a value around 2200, as indicated by the orange line on the price fluctuations graph for the last 12 months [Financial Market Analysis Report 2023.pdf#page=5][Financial Market Analysis Report 2023.pdf#page=5(figure5_1.png)][Financial Market Analysis Report 2023.pdf#page=5(figure5_2.png)]."
+}
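
Each `truth` string in these records ends with bracketed citations of the form `[document_name.ext#page=N]` or `[document_name.ext#page=N(image_name.png)]`, matching the format the prompt templates request. A minimal sketch of pulling those citations out of an answer string, for example when checking that an evaluation answer cites the expected page or figure; the `Citation` type, `CITATION_RE`, and `extract_citations` are hypothetical names and are not part of this commit.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    document: str          # e.g. "report.pdf"
    page: int              # value after "#page="
    image: Optional[str]   # e.g. "figure6_1.png", or None for a text citation


# Assumed grammar: [<document>#page=<N>] with an optional (<image>) suffix.
CITATION_RE = re.compile(r"\[([^\[\]#]+)#page=(\d+)(?:\(([^()\]]+)\))?\]")


def extract_citations(text: str) -> list[Citation]:
    """Return every page or image citation found in text, in order."""
    return [
        Citation(document, int(page), image or None)
        for document, page, image in CITATION_RE.findall(text)
    ]


sample = (
    "Gold was the most stable commodity "
    "[Financial Market Analysis Report 2023.pdf#page=6]"
    "[Financial Market Analysis Report 2023.pdf#page=6(figure6_1.png)]."
)
found = extract_citations(sample)
```

Citations without a `#page=` part, such as `[info1.txt]`, are deliberately ignored by this pattern; widening it would be a separate design choice.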

scripts/pretty_print_jsonl.py

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+"""Utility to pretty-format a JSONL (JSON Lines) file.
+
+NOTE: Classic JSONL expects one JSON object per single line. Once we pretty
+print (indent) each object, the result is no longer *strict* JSONL because
+objects will span multiple lines. This script offers a few output modes so
+you can choose what you need:
+
+1. Default (stdout): Pretty prints each record (with indentation) separated
+   by a blank line for readability.
+2. --in-place: Rewrites the source file by replacing each original single-line
+   object with its multi-line, indented representation separated by a blank line.
+3. --output <path>: Writes the pretty output to a new file (recommended if you
+   also want to keep the original valid JSONL file unchanged).
+4. --as-array: Instead of individual objects, emit a single JSON array containing
+   all objects, using indentation (this produces standard JSON, not JSONL).
+
+Examples:
+    python scripts/pretty_print_jsonl.py evals/ground_truth_multimodal.jsonl
+    python scripts/pretty_print_jsonl.py evals/ground_truth_multimodal.jsonl --output evals/ground_truth_multimodal.pretty.jsonl
+    python scripts/pretty_print_jsonl.py evals/ground_truth_multimodal.jsonl --in-place
+    python scripts/pretty_print_jsonl.py evals/ground_truth_multimodal.jsonl --as-array --output evals/ground_truth_multimodal.pretty.json
+
+Safeguards:
+  * Refuses to use --in-place together with --as-array (ambiguous expectations).
+  * Backs up the original file to <filename>.bak before in-place rewrite unless
+    --no-backup is supplied.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+
+def read_jsonl(path: Path):
+    """Yield parsed JSON objects from a JSONL file.
+
+    Skips empty lines. Raises ValueError with context on parse failures.
+    """
+    for idx, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            yield json.loads(stripped)
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Failed to parse JSON on line {idx} of {path}: {e}") from e
+
+
+def write_pretty_individual(objs, indent: int) -> str:
+    """Return a string with each object pretty JSON, separated by a blank line."""
+    parts = [json.dumps(o, indent=indent, ensure_ascii=False) for o in objs]
+    # Add trailing newline for file friendliness
+    return "\n\n".join(parts) + "\n"
+
+
+def write_pretty_array(objs, indent: int) -> str:
+    return json.dumps(list(objs), indent=indent, ensure_ascii=False) + "\n"
+
+
+def parse_args(argv: list[str]) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Pretty-format a JSONL file.")
+    parser.add_argument(
+        "jsonl_file",
+        type=Path,
+        help="Path to the source JSONL file (one JSON object per line).",
+    )
+    parser.add_argument("--indent", type=int, default=2, help="Indent level for json.dumps (default: 2)")
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument(
+        "--in-place",
+        action="store_true",
+        help="Rewrite the original file with pretty-formatted objects (not strict JSONL).",
+    )
+    group.add_argument(
+        "--output",
+        type=Path,
+        help="Path to write output. If omitted and not --in-place, prints to stdout.",
+    )
+    parser.add_argument(
+        "--as-array",
+        action="store_true",
+        help="Emit a single JSON array instead of individual pretty objects.",
+    )
+    parser.add_argument(
+        "--no-backup",
+        action="store_true",
+        help="When using --in-place, do not create a .bak backup file.",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = parse_args(argv or sys.argv[1:])
+
+    if not args.jsonl_file.exists():
+        print(f"Error: File not found: {args.jsonl_file}", file=sys.stderr)
+        return 1
+
+    objs = list(read_jsonl(args.jsonl_file))
+
+    if args.as_array:
+        output_text = write_pretty_array(objs, args.indent)
+    else:
+        output_text = write_pretty_individual(objs, args.indent)
+
+    # Destination logic
+    if args.in_place:
+        if not args.no_backup:
+            backup_path = args.jsonl_file.with_suffix(args.jsonl_file.suffix + ".bak")
+            if not backup_path.exists():
+                backup_path.write_text(args.jsonl_file.read_text(encoding="utf-8"), encoding="utf-8")
+        args.jsonl_file.write_text(output_text, encoding="utf-8")
+        print(f"Rewrote {args.jsonl_file} ({len(objs)} objects).")
+    elif args.output:
+        args.output.parent.mkdir(parents=True, exist_ok=True)
+        args.output.write_text(output_text, encoding="utf-8")
+        print(f"Wrote pretty output to {args.output} ({len(objs)} objects).")
+    else:
+        # stdout
+        sys.stdout.write(output_text)
+    return 0
+
+
+if __name__ == "__main__":  # pragma: no cover
+    raise SystemExit(main())
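
As the module docstring notes, pretty-printed records span several lines, so a line-by-line reader such as `read_jsonl` cannot parse the script's default output. A minimal sketch of reading such multi-line records back with the standard library's `json.JSONDecoder.raw_decode`; `iter_concatenated_json` is a hypothetical helper and is not part of this commit.

```python
import json
from typing import Any, Iterator


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield each JSON value from text holding whitespace-separated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so advance past it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


pretty_text = '{\n  "question": "q1"\n}\n\n{\n  "question": "q2"\n}\n'
records = list(iter_concatenated_json(pretty_text))
```

The same helper also reads strict JSONL, since a newline between single-line objects is just whitespace between values.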

tests/snapshots/test_app/test_ask_prompt_template_concat/client0/result.json

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
 {
 "description": [
 {
-"content": "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.\nUse 'you' to refer to the individual asking the questions even if they ask with 'I'.\nAnswer the following question using only the data provided in the sources below.\nEach source has a name followed by colon and the actual information, always include the source name for each fact you use in the response.\nIf you cannot answer using the sources below, say you don't know. Use below example to answer.\n\nPossible citations for current question:\n\n[Benefit_Options-2.pdf]\n\n Meow like a cat.",
+"content": "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.\nUse 'you' to refer to the individual asking the questions even if they ask with 'I'.\nAnswer the following question using only the data provided in the sources below.\nEach source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].\nIf you cannot answer using the sources below, say you don't know. Use below example to answer.\n\nPossible citations for current question:\n\n[Benefit_Options-2.pdf]\n\n Meow like a cat.",
 "role": "system"
 },
 {

tests/snapshots/test_app/test_ask_prompt_template_concat/client1/result.json

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
 {
 "description": [
 {
-"content": "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.\nUse 'you' to refer to the individual asking the questions even if they ask with 'I'.\nAnswer the following question using only the data provided in the sources below.\nEach source has a name followed by colon and the actual information, always include the source name for each fact you use in the response.\nIf you cannot answer using the sources below, say you don't know. Use below example to answer.\n\nPossible citations for current question:\n\n[Benefit_Options-2.pdf]\n\n Meow like a cat.",
+"content": "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.\nUse 'you' to refer to the individual asking the questions even if they ask with 'I'.\nAnswer the following question using only the data provided in the sources below.\nEach source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].\nIf you cannot answer using the sources below, say you don't know. Use below example to answer.\n\nPossible citations for current question:\n\n[Benefit_Options-2.pdf]\n\n Meow like a cat.",
 "role": "system"
 },
 {

tests/snapshots/test_app/test_ask_rtr_hybrid/client0/result.json

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
 {
 "description": [
 {
-"content": "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions.\nUse 'you' to refer to the individual asking the questions even if they ask with 'I'.\nAnswer the following question using only the data provided in the sources below.\nEach source has a name followed by colon and the actual information, always include the source name for each fact you use in the response.\nIf you cannot answer using the sources below, say you don't know. Use below example to answer.\n\nPossible citations for current question:\n\n[Benefit_Options-2.pdf]",
+"content": "Assistant helps the company employees with their questions about internal documents. Be brief in your answers.\nAnswer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below.\nYou CANNOT ask clarifying questions to the user, since the user will have no way to reply.\nIf the question is not in English, answer in the language used in the question.\nEach source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].\n\nPossible citations for current question: [Benefit_Options-2.pdf]",
 "role": "system"
 },
 {
