Commit f1cd1b2
[evaluation] refactor: Make RunSubmitterClient default batch client (remove promptflow) (Azure#42243)
* chore: Remove promptflow dependencies
* fix: Filter inputs before calling function in Batch Engine
Added to match behavior in promptflow
see also:
https://github.com/microsoft/promptflow/blob/5e6c183474c0a2575bb416d18201e4f9fd562b2e/src/promptflow-core/promptflow/executor/_script_executor.py#L162
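The filtering step above can be sketched roughly as follows. This is an illustration only, assuming the batch engine drops any row keys that the target function does not declare; `filter_inputs`, `evaluator`, and the sample row are hypothetical names, not the SDK's or promptflow's actual code.

```python
import inspect
from typing import Any, Callable, Dict


def filter_inputs(func: Callable[..., Any], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the inputs that `func` declares as parameters (hypothetical helper)."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any name, so nothing needs to be dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(inputs)
    return {name: value for name, value in inputs.items() if name in params}


def evaluator(query: str, response: str) -> int:
    """Toy evaluator that only knows about `query` and `response`."""
    return len(query) + len(response)


# The row carries an extra key that would otherwise raise a TypeError.
row = {"query": "hi", "response": "hello", "line_number": 3}
filtered = filter_inputs(evaluator, row)
result = evaluator(**filtered)
```

Without the filter, `evaluator(**row)` would fail with an unexpected keyword argument for `line_number`.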
* refactor: Use enumerate instead of manually keeping track of line number
* fix,refactor: Unconditionally inject default column mapping from data -> params
In promptflow's logic for applying a column mapping to data,
it will unconditionally inject a mapping from function
parameter to data of the same name:
https://github.com/microsoft/promptflow/blob/3e297112a2c142caf7c185bcba644d0f66422539/src/promptflow-devkit/promptflow/batch/_batch_inputs_processor.py#L110-L141
This behavior deviated from the existing logic in this SDK,
where generating that mapping was conditional on the user
not providing a column mapping:
https://github.com/Azure/azure-sdk-for-python/blob/f3740540eb5b3d22dc1bccba0eb00b652b124d5f/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_legacy/_batch_engine/_run_submitter.py#L51-L52
This deviation caused one of the parameterized cases of
`test_evaluate_another_questions` to fail. The user-provided
column mapping in that case maps to a parameter that is not
present in the evaluator, and without the default mapping the
evaluator's required parameter was never supplied:
https://github.com/Azure/azure-sdk-for-python/blob/f3740540eb5b3d22dc1bccba0eb00b652b124d5f/sdk/evaluation/azure-ai-evaluation/tests/e2etests/test_evaluate.py#L216
This commit aligns the application of the column mapping in the SDK
more closely to the promptflow implementation
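A minimal sketch of the difference, assuming the default mapping takes the form `${data.<name>}` for each function parameter and that user-provided entries override the defaults. `build_column_mapping` and the sample names are hypothetical and are not taken from either code base.

```python
from typing import Dict, Iterable


def build_column_mapping(
    param_names: Iterable[str], user_mapping: Dict[str, str]
) -> Dict[str, str]:
    """Always start from the defaults, then let user entries override them."""
    # Default: each function parameter reads the data column of the same name.
    mapping = {name: f"${{data.{name}}}" for name in param_names}
    mapping.update(user_mapping)
    return mapping


# The user mapping targets `answer`, which the evaluator does not declare.
evaluator_params = ["query", "response"]
user_mapping = {"answer": "${data.response}"}

mapping = build_column_mapping(evaluator_params, user_mapping)
```

Under the previous conditional logic, a non-empty `user_mapping` would have suppressed the defaults, leaving `query` and `response` unmapped.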
* refactor: Don't shadow `value` variable in apply_column_mapping_to_lines
* feat: Add support for running aggregations in RunSubmitterClient
* tests,fix: Don't log duration as a metric
Breaks a test that checks for strict equality of metrics
* refactor: Rewrite RunSubmitterClient.get_details
* fix: Correct the typing of is_onedp_project
* fix,tests: Don't log tokens as metrics
Removing to match the behavior of the other clients
* fix: Set error message without depending on run storage
Promptflow surfaces exceptions by reading them from their
"Storage" abstraction. That has not been ported to this SDK.
* tests,fix: Fix test_evaluate_invalid_column_mapping
PR 40556 accidentally indented the assertion in
test_evaluate_invalid_column_mapping into the `pytest.raises`
block.
This inadvertently made the test useless, since the `evaluate`
call would always raise an exception which skips over the
assertion as the exception unwinds the stack.
This commit unindents the assertion so that it runs.
Additionally, PR 41919 updated our validation logic to allow
column mapping references of arbitrary length, e.g.
`${target.foo.bar.baz}`.
So this commit also removes the test case that was explicitly
guarding against this: `${target.response.one}`
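The indentation problem can be reproduced in a few lines. This sketch uses the standard library's `unittest` `assertRaises` context manager as a stand-in for `pytest.raises` so that it runs without third-party packages; both skip the remainder of the block once the exception is raised. `evaluate_stub` is a hypothetical placeholder for the real `evaluate` call.

```python
import unittest

case = unittest.TestCase()
reached = []


def evaluate_stub() -> None:
    """Hypothetical stand-in for an `evaluate` call that always raises."""
    raise ValueError("invalid column mapping")


# Broken: the check sits inside the block, after the call that raises.
with case.assertRaises(ValueError):
    evaluate_stub()
    reached.append("inside")  # never runs: the exception skips the rest of the block

# Fixed: the check is dedented, so it runs once the block has captured the exception.
with case.assertRaises(ValueError) as ctx:
    evaluate_stub()
reached.append("outside")
assert "invalid column mapping" in str(ctx.exception)
```

Only the dedented statements execute, which is why the original assertion could never fail.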
* fix,tests: Force PFClient specific tests to use PFClient
* fix,tests: Force CodeClient specific tests to use CodeClient
* fix: Improve the ergonomics for picking which client is used
Except for the CodeClient, you only need to use
at most 1 of `_use_pf_client` and `_use_run_submitter_client`
* feat: Show exception message in run logs
* fix: Force safety evaluation to use CodeClient as originally intended
* fix: Don't wrap EvaluationException in BatchEngineError
* refactor: Refactor BatchConfig
* fix: Make raising on error configurable for runsubmitterclient
* chore: Update changelog
* fix: Uncomment log_path
* chore: Add promptflow to dev-requirements.txt
Some tests have explicit dependencies on the promptflow implementation.
Since we aren't removing the code path yet, allow them to run by installing
promptflow for those tests.
* fix: Initialize error_message
* fix: Get exception instead of batchrunresult
* chore,docs: Clarify changelog
1 parent 1f0488c commit f1cd1b2
File tree
14 files changed (+299, -117 lines changed)
- sdk/evaluation/azure-ai-evaluation
  - azure/ai/evaluation
    - _common
    - _evaluate
      - _batch_run
    - _legacy/_batch_engine
    - _safety_evaluation
  - tests/unittests