Commit df00d51

Merge pull request #106 from ks6088ts-labs/feature/issue-69_promptflow-demo
fork eval-chat-math example
2 parents d16ca7a + 3707542

File tree

- apps/11_promptflow/README.md
- apps/11_promptflow/eval-chat-math/README.md
- apps/11_promptflow/eval-chat-math/aggregate.py
- apps/11_promptflow/eval-chat-math/data.jsonl
- apps/11_promptflow/eval-chat-math/flow.dag.yaml
- apps/11_promptflow/eval-chat-math/line_process.py
- apps/11_promptflow/eval-chat-math/requirements.txt

7 files changed: +182 -3 lines changed

apps/11_promptflow/README.md

Lines changed: 16 additions & 3 deletions
@@ -34,6 +34,7 @@ $ pip install -r requirements.txt
 ## Examples
 
 [Prompt flow > Quick start](https://microsoft.github.io/promptflow/how-to-guides/quick-start.html) provides a quick start guide to Prompt flow.
+Some of the examples are extracted from [github.com/microsoft/promptflow/examples](https://github.com/microsoft/promptflow/tree/main/examples) to guide you through the basic usage of Prompt flow.
 
 ### [chat_minimal](https://github.com/microsoft/promptflow/tree/main/examples/flex-flows/chat-minimal)
 
@@ -203,7 +204,7 @@ $ pf run create \
 $ pf run show-details --name $RUN_NAME
 ```
 
-### chat-math-variant
+### [chat-math-variant](https://github.com/microsoft/promptflow/tree/main/examples/flows/chat/chat-math-variant)
 
 Tuning prompts using `variants` is a powerful feature in Prompt flow. It allows you to test different prompts and see which one works best for your use case.
 
@@ -229,7 +230,19 @@ $ pf run create \
 $ pf run show-details --name $RUN_NAME
 ```
 
+### [eval-chat-math](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-chat-math)
+
+This example shows how to evaluate the answers to math questions by comparing the outputs with the reference answers numerically.
+Details are available in [eval-chat-math/README.md](./eval-chat-math/README.md).
+To see how to operate the flow in VS Code, refer to the video [Build your high quality LLM apps with Prompt flow](https://www.youtube.com/watch?v=gcIe6nk2gA4).
+The video shows how to evaluate the answers to math questions and guides you through tuning the prompts using variants.
+
+<!-- TODO: rag, tracing, deployments -->
+
 ## References
 
-- [Prompt flow > repos](https://github.com/microsoft/promptflow)
-- [Prompt flow > documents](https://microsoft.github.io/promptflow/)
+- [Repository](https://github.com/microsoft/promptflow)
+- [examples](https://github.com/microsoft/promptflow/tree/main/examples)
+- [Documents](https://microsoft.github.io/promptflow/)
+- [How-to Guides](https://microsoft.github.io/promptflow/how-to-guides/index.html)
+- [Tutorials](https://microsoft.github.io/promptflow/tutorials/index.html#)
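
A quick way to try the forked example locally, sketched with hedged assumptions: the flow is expected to live at `apps/11_promptflow/eval-chat-math`, as the README links above suggest, and the commands mirror the ones already used in this README.

```bash
# Smoke-test the evaluation flow with the default inputs from flow.dag.yaml.
# The path is an assumption based on the README's relative links.
cd apps/11_promptflow/eval-chat-math
pip install -r requirements.txt
pf flow test --flow .
```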
apps/11_promptflow/eval-chat-math/README.md

Lines changed: 36 additions & 0 deletions

# Eval chat math

This example shows how to evaluate the answers to math questions by comparing the outputs with the reference answers numerically.

Learn more in the corresponding [tutorial](../../../tutorials/flow-fine-tuning-evaluation/promptflow-quality-improvement.md).

Tools used in this flow:
- `python` tool

## Prerequisites

Install the promptflow SDK and the other dependencies in this folder:

```bash
pip install -r requirements.txt
```

### 1. Test the flow with single-line data

Test the flow or a single node:

```bash
# test with the default input values in flow.dag.yaml
pf flow test --flow .

# test with flow inputs
pf flow test --flow . --inputs groundtruth=123 prediction=123

# test a node with inputs
pf flow test --flow . --node line_process --inputs groundtruth=123 prediction=123
```

### 2. Create a flow run with multi-line data

There are two ways to evaluate a flow like this. The first is to create a run directly from a data file (a sketch of the second follows this listing):

```bash
pf run create --flow . --data ./data.jsonl --stream
```
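
The README above mentions two ways to evaluate but only shows the direct data-file run. The second way is presumably to evaluate the outputs of an existing batch run by mapping its columns onto this flow's inputs. The sketch below is illustrative only: `<chat_run_name>`, `<chat_test_data>.jsonl` and the `answer` column are placeholders, not taken from this commit.

```bash
# Hypothetical: score the outputs of an earlier chat batch run against its reference answers.
# All angle-bracket names and the `answer` columns are placeholders.
pf run create --flow . \
  --data <chat_test_data>.jsonl \
  --run <chat_run_name> \
  --column-mapping groundtruth='${data.answer}' prediction='${run.outputs.answer}' \
  --stream
```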
apps/11_promptflow/eval-chat-math/aggregate.py

Lines changed: 35 additions & 0 deletions

from promptflow.core import log_metric, tool


@tool
def accuracy_aggregate(processed_results: list[int]):
    # Count lines scored -1 (unparsable or wrong) and lines scored 1 (correct).
    num_exception = 0
    num_correct = 0

    for i in range(len(processed_results)):
        if processed_results[i] == -1:
            num_exception += 1
        elif processed_results[i] == 1:
            num_correct += 1

    num_total = len(processed_results)
    accuracy = round(1.0 * num_correct / num_total, 2)
    error_rate = round(1.0 * num_exception / num_total, 2)

    # Surface the aggregated values as run metrics.
    log_metric(key="accuracy", value=accuracy)
    log_metric(key="error_rate", value=error_rate)

    return {
        "num_total": num_total,
        "num_correct": num_correct,
        "num_exception": num_exception,
        "accuracy": accuracy,
        "error_rate": error_rate,
    }


if __name__ == "__main__":
    numbers = [1, 1, 1, 1, 0, -1, -1]
    accuracy = accuracy_aggregate(numbers)
    print("The accuracy is", accuracy)
apps/11_promptflow/eval-chat-math/data.jsonl

Lines changed: 3 additions & 0 deletions

{"groundtruth": "10","prediction": "10"}
{"groundtruth": "253","prediction": "506"}
{"groundtruth": "1/3","prediction": "2/6"}
apps/11_promptflow/eval-chat-math/flow.dag.yaml

Lines changed: 35 additions & 0 deletions

$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  groundtruth:
    type: string
    default: "10"
    is_chat_input: false
  prediction:
    type: string
    default: "10"
    is_chat_input: false
outputs:
  score:
    type: string
    reference: ${line_process.output}
nodes:
- name: line_process
  type: python
  source:
    type: code
    path: line_process.py
  inputs:
    groundtruth: ${inputs.groundtruth}
    prediction: ${inputs.prediction}
  use_variants: false
- name: aggregate
  type: python
  source:
    type: code
    path: aggregate.py
  inputs:
    processed_results: ${line_process.output}
  aggregation: true
  use_variants: false
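
The DAG wires `line_process` into both the `score` output and the `aggregate` node, which runs once over all lines because of `aggregation: true`. A hedged way to exercise the fraction-parsing path and observe the `score` output, reusing the test command already shown in the example's README:

```bash
# "1/3" and "2/6" parse to the same float, so line_process (and therefore score) should return 1.
pf flow test --flow . --inputs groundtruth="1/3" prediction="2/6"
```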
apps/11_promptflow/eval-chat-math/line_process.py

Lines changed: 55 additions & 0 deletions

from promptflow.core import tool


def string_to_number(raw_string: str) -> float:
    """Try to parse the prediction string and the groundtruth string into a float.

    Supports int, float and fraction inputs, and returns None for non-numeric strings
    with the wrong format, e.g. 'the answer is \box{2/3}', '0, 5, or any number greater than 11', '4/7//9'.
    """
    float_number = 0.0
    try:
        float_number = float(raw_string)
    except Exception:
        if "/" in raw_string:
            split_list = raw_string.split("/")
            if len(split_list) == 2:
                numerator, denominator = split_list
                try:
                    float_number = float(numerator) / float(denominator)
                except Exception:
                    return None
            else:
                return None
        else:
            return None
    return float_number


@tool
def line_process(groundtruth: str, prediction: str) -> int:
    pred_float = string_to_number(prediction)
    # Early stop: the prediction could not be parsed into a number.
    if pred_float is None:
        return -1
    gt_float = string_to_number(groundtruth)
    if gt_float is None:
        return -1
    # Both pred_float and gt_float are valid numbers; compare them up to 10 decimal places.
    if round(pred_float, 10) == round(gt_float, 10):
        return 1
    else:
        return -1


if __name__ == "__main__":
    processed_result = line_process("3/5", "6/10")
    print("The processed result is", processed_result)

    processed_result = line_process("1/2", "0.5")
    print("The processed result is", processed_result)

    processed_result = line_process("3", "5")
    print("The processed result is", processed_result)

    processed_result = line_process("2/3", "the answer is \box{2/3}")
    print("The processed result is", processed_result)
apps/11_promptflow/eval-chat-math/requirements.txt

Lines changed: 2 additions & 0 deletions

promptflow
promptflow-tools
