Commit 89cdab7
[evaluation] tests: Migrate azure-ai-evaluations tests (Azure#37201)
* chore: Add pf-azure extra as dev dependency
* tests: Copy tests verbatim from Microsoft/promptflow
* tests: Re-sync tests
* chore: Re-sync tests again
* fix: Change imports from "promptflow.evals" to "azure.ai.evaluation"
* tests,refactor: Replace promptflow-recording.{is_live,is_record,...} with az-for-python equivalents
* tests,refactor: Make SanitizedValues enum
* tests: Replace "vcr_recording" with azp equivalent
* tests,refactor: Move test data from `recordings/` -> `test_configs`
* tests, fix: Explicitly set expected caplog level
Some unittests inspect the log messages captured by the caplog
pytest fixture.
Explicitly setting the caplog level resolves test failures
when the `log_level` config value is changed from the default.
* tests,refactor: Remove setup_recording_injection_if_enabled
* style: Run isort
* tests,refactor: Remove RecordStorage
RecordStorage appears to be a caching mechanism to accelerate
recordings that involve flows.
* tests,refactor: Remove variable_recorder fixture
This shadows the implementation used by the azure-sdk-for-python
infrastructure.
* tests,refactor: Make a dev_connections fixture
* style: Run isort + black
* tests,chore: Move tests up a directory
* chore: Remove tests/e2etests/README.md
* ci: Re-enable tests in ci
* chore: Mock dev_connections when not live
* tests,fix: Don't hardcode azure_deployment
* tests,feat: Add support for recording openai requests
* fix: response.status -> response.status_code
* fix: Don't await response.text()
* tests,fix: Redirect traffic from AsyncioRequestsTransport to test proxy
Currently, the azure-sdk-for-python infra:
  * Only patches a single async transport (AsyncioTransport)
  * Only patches the async transport when the test itself is async
This fixture gets requested unconditionally, and patches the
default transport the SDK uses.
Ideally this would be part of the azure-sdk-for-python
test infra.
* tests: Request "recorded_test" for more e2e tests
* test,refactor: Update mock config values
* tests,refactor: Remove redundant mock project_scope
* tests: Add some sanitizers
These were taken from promptflow-recording.
* chore: Add assets.json
* tests,fix: Use a FakeTokenCredential when not live
* ci,fix: Exclude tests from packages so verify_sdist finds py.typed
* tests: Add sanitizers for stainless headers and x-cv
* tests: Add a sanitizer for values from connections.json
* tests: Add return type to get_cred
* chore: Update assets.json
* tests,fix: late import NISTTokenizer
nltk does not bundle all the data it uses in its pip install, and
instead requires the user to download it manually
(`nltk.download`).
nltk raises an error on import of any class that depends on some
external resource to work.
The azure-sdk-for-python team's test proxy uses its own certificate
bundle to enable HTTPS connections to the test proxy, but this
seems to cause `nltk.download` to fail.
Late-importing NISTTokenizer allows tests to run in CI without
immediately crashing on the import.
* fix: Fix broken IndirectAttackEvaluator imports
* fix: Fix broken EvaluationMetrics import
* chore: Update assets.json
* docs: Fix docstring for IndirectAttackEvaluator
* tests,fix: Coerce string enum values to string
Otherwise the string in the dict is the qualified name of the enum
value.
* chore: Update assets.json
* tests: Temp skip tests
* chore: Add a minimum bound to azure-identity dependency
* chore: Bump minimum bound of numpy
1.26.4 fixes a bug that prevented numpy from installing on
python3.12
* chore: Bump nltk lower bound
  * 3.8.1 crashes on python3.12
  * 3.9.0 can't be imported (`import nltk`) without downloading "wordnet"
* ci: Temporarily disable windows tests
* ci: Temp disable python3.12 test
* chore: Bump numpy minimum bound
On Python 3.11, pandas depends on numpy>=1.23.2
* ci: Run pypy39 on ubuntu

1 parent 189b106
File tree
60 files changed, +5402 / -40 lines changed
- sdk/evaluation
- azure-ai-evaluation
- azure/ai/evaluation
- _common
- evaluators
- _xpia
- synthetic/_model_tools
- tests
- e2etests
- custom_evaluators
- data
- test_configs/local
- unittests
- data
- test_evaluators
- apology_dag
- apology_prompty
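The sanitizer bullets in the commit message (the SanitizedValues enum, sanitizers for stainless headers and `x-cv`, and a sanitizer for values from connections.json) can be illustrated with a minimal, stdlib-only sketch. It assumes a connections.json-shaped dict and masks every non-empty string leaf it finds in a recorded body. The enum members, placeholder strings and the helpers `collect_secrets` and `sanitize` are hypothetical, not the commit's code. In the azure-sdk-for-python infrastructure, sanitizers are normally registered with the test proxy (for example through `devtools_testutils` helpers such as `add_general_string_sanitizer`), so treat this only as an illustration of the idea.

```python
from enum import Enum
from typing import Any, Iterable, Iterator


class SanitizedValues(str, Enum):
    # Hypothetical placeholders; the commit's actual members are not shown.
    SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
    SECRET = "Sanitized"


def collect_secrets(connections: Any) -> Iterator[str]:
    """Yield every non-empty string leaf of a connections.json-style structure."""
    stack = [connections]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, str) and node:
            yield node


def sanitize(
    body: str,
    secrets: Iterable[str],
    placeholder: str = SanitizedValues.SECRET.value,
) -> str:
    """Replace each secret in a recorded body with a fixed placeholder."""
    # Longest first, so a secret that contains a shorter one is masked whole.
    for secret in sorted(set(secrets), key=len, reverse=True):
        body = body.replace(secret, placeholder)
    return body
```

Masking every string leaf is deliberately blunt; a real sanitizer would likely restrict itself to secret-bearing keys such as API keys and endpoints.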
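The caplog bullet in the commit message hinges on logger thresholds: if the configured `log_level` is raised (say to WARNING), INFO messages never reach the capture handler and assertions on `caplog` text fail. The sketch below reproduces that with the standard library only. `captured` pins a logger's level for the duration of a block and restores it afterwards, roughly what pytest's `caplog.at_level(level, logger=...)` does. The logger name `example.evaluation` and the function `warn_about_missing_column` are hypothetical.

```python
import logging
from contextlib import contextmanager
from typing import Iterator, List

# Hypothetical logger standing in for the package under test.
logger = logging.getLogger("example.evaluation")


def warn_about_missing_column(column: str) -> None:
    # Hypothetical code under test: reports at INFO level.
    logger.info("Column %r not found, skipping", column)


class ListHandler(logging.Handler):
    """Keep emitted records in memory, roughly what caplog's handler does."""

    def __init__(self) -> None:
        super().__init__(level=logging.NOTSET)
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


@contextmanager
def captured(name: str, level: int) -> Iterator[List[logging.LogRecord]]:
    """Capture records from logger `name`, pinning its level for the block."""
    target = logging.getLogger(name)
    handler = ListHandler()
    old_level = target.level
    target.setLevel(level)
    target.addHandler(handler)
    try:
        yield handler.records
    finally:
        # Restore the previous state so later code sees the configured level.
        target.removeHandler(handler)
        target.setLevel(old_level)
```

Inside an actual pytest test the equivalent is a single call such as `caplog.set_level(logging.INFO)` before exercising the code, which makes the assertion independent of whatever `log_level` the run was configured with.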
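The "Make a dev_connections fixture" and "Mock dev_connections when not live" bullets describe a fixture that reads real connection settings only in live mode. A minimal sketch of the loading logic, assuming a connections.json file keyed by connection name; `MOCK_CONNECTIONS`, its keys and values, and `load_dev_connections` are hypothetical and not taken from the commit.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical placeholder connection used during playback; the real
# fixture's mock values are not shown in the commit.
MOCK_CONNECTIONS: Dict[str, Any] = {
    "azure_openai_model_config": {
        "value": {
            "azure_endpoint": "https://Sanitized.openai.azure.com",
            "api_key": "aoai-api-key",
            "azure_deployment": "aoai-deployment",
        }
    }
}


def load_dev_connections(live: bool, path: Path) -> Dict[str, Any]:
    """Return real connection settings when live, canned ones otherwise."""
    if not live:
        # Playback never contacts the service, so no secrets file is needed.
        # A deep copy keeps one test's mutations from leaking into another.
        return copy.deepcopy(MOCK_CONNECTIONS)
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

In a pytest suite this would typically be wrapped in a fixture that passes `devtools_testutils.is_live()` as the `live` flag, so tests request `dev_connections` without caring which mode they run in.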
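The "Redirect traffic from AsyncioRequestsTransport to test proxy" bullet relies on rewriting each outgoing request so that it goes to the test proxy, which then forwards it (record mode) or answers from a recording (playback mode). The pure function below sketches that rewrite for a URL plus a header dict. The header names follow the convention used by the azure-sdk-for-python test proxy as I recall it, and the default proxy URL is an assumption; check both against `devtools_testutils` and your proxy configuration. `redirect_to_proxy` is a hypothetical helper, not the fixture from the commit, which patches the transport's send path instead.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit, urlunsplit


def redirect_to_proxy(
    url: str,
    headers: Dict[str, str],
    recording_id: str,
    live: bool,
    proxy_url: str = "http://localhost:5000",  # assumed default proxy address
) -> Tuple[str, Dict[str, str]]:
    """Point a request at the test proxy and record where it was really headed."""
    original = urlsplit(url)
    proxy = urlsplit(proxy_url)
    new_headers = dict(headers)  # leave the caller's headers untouched
    # The proxy forwards to (or replays for) this upstream origin.
    new_headers.setdefault(
        "x-recording-upstream-base-uri", f"{original.scheme}://{original.netloc}"
    )
    new_headers["x-recording-id"] = recording_id
    new_headers["x-recording-mode"] = "record" if live else "playback"
    # Keep path, query and fragment; swap only scheme and host:port.
    new_url = urlunsplit(original._replace(scheme=proxy.scheme, netloc=proxy.netloc))
    return new_url, new_headers
```

Keeping the rewrite separate from the transport makes it easy to apply to any transport class, which is the gap the commit message describes: the shared infrastructure only patched AsyncioTransport, and only for async tests.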
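The "Use a FakeTokenCredential when not live" and "Add return type to get_cred" bullets describe swapping in a credential that never touches the network during playback. A standalone sketch, assuming only that callers use `get_token(...)` and read `.token`/`.expires_on`: `AccessToken` here mirrors the two fields of `azure.core.credentials.AccessToken`, `TokenCredentialLike` is a hypothetical structural stand-in for azure-core's `TokenCredential`, and `get_cred` is illustrative. The azure-sdk-for-python test tooling ships its own fake credential, which is what the commit presumably uses.

```python
import time
from typing import Any, NamedTuple, Protocol


class AccessToken(NamedTuple):
    # Same two fields as azure.core.credentials.AccessToken.
    token: str
    expires_on: int


class TokenCredentialLike(Protocol):
    # Structural stand-in for azure.core.credentials.TokenCredential.
    def get_token(self, *scopes: str, **kwargs: Any) -> Any: ...


class FakeTokenCredential:
    """Returns a fixed, never-validated token; no network access."""

    def get_token(self, *scopes: str, **kwargs: Any) -> AccessToken:
        return AccessToken("fake-token", int(time.time()) + 3600)


def get_cred(live: bool) -> TokenCredentialLike:
    """Hypothetical helper: real credential when live, fake one in playback."""
    if live:
        # Imported lazily so playback runs do not need azure-identity installed.
        from azure.identity import DefaultAzureCredential

        return DefaultAzureCredential()
    return FakeTokenCredential()
```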
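The "late import NISTTokenizer" bullet moves an import that can fail at import time into the code path that actually needs it. A minimal sketch of that deferred-import pattern: `lazy_attr` imports a module on first use and caches the attribute, and `nist_tokenize` is a hypothetical call site. The module path `nltk.tokenize.nist` is an assumption to confirm against the installed nltk; a plain `from ... import NISTTokenizer` inside the function body would achieve the same, and the cache only avoids repeating the lookup.

```python
import importlib
from functools import lru_cache
from typing import Any, List


@lru_cache(maxsize=None)
def lazy_attr(module_name: str, attr: str) -> Any:
    """Import `module_name` on first use only and cache the requested attribute."""
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def nist_tokenize(text: str) -> List[str]:
    # Hypothetical call site: nltk is touched only when tokenization runs,
    # so importing this module never triggers nltk's resource checks.
    # The module path below is an assumption; confirm it for your nltk version.
    nist_tokenizer_cls = lazy_attr("nltk.tokenize.nist", "NISTTokenizer")
    return nist_tokenizer_cls().tokenize(text)
```

The trade-off is that a missing resource now surfaces at the first call instead of at import, which is exactly what lets unrelated tests run in CI.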
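The "Coerce string enum values to string" bullet concerns `(str, Enum)` members: `str(member)` yields the qualified name (for example `MetricName.VIOLENCE`), not the value, and on Python 3.11+ f-strings and `format()` do the same. Because such a member still compares equal to its value, the mismatch only appears where the member is converted to text. A minimal sketch, with a hypothetical `MetricName` enum and `coerce_enum_values` helper standing in for the package's own types:

```python
from enum import Enum
from typing import Any, Dict


class MetricName(str, Enum):
    # Hypothetical members standing in for the package's string enums.
    VIOLENCE = "violence"
    SELF_HARM = "self_harm"


def coerce_enum_values(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace enum members with their underlying value, leaving other entries as-is."""
    return {k: (v.value if isinstance(v, Enum) else v) for k, v in row.items()}
```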