Commit 89cdab7

[evaluation] tests: Migrate azure-ai-evaluations tests (Azure#37201)
* chore: Add pf-azure extra as dev dependency
* tests: Copy tests verbatim from Microsoft/promptflow
* tests: Re-sync tests
* chore: Re-sync tests again
* fix: Change imports from "promptflow.evals" to "azure.ai.evaluation"
* tests,refactor: Replace promptflow-recording.{is_live,is_record,...} with az-for-python equivalents
* tests,refactor: Make SanitizedValues enum
* tests: Replace "vcr_recording" with azp equivalent
* tests,refactor: Move test data from `recordings/` -> `test_configs`
* tests,fix: Explicitly set expected caplog level. Some unittests inspect the log messages captured by the caplog pytest fixture; explicitly setting the caplog level resolves test failures when the `log_level` config value is changed from the default.
* tests,refactor: Remove setup_recording_injection_if_enabled
* style: Run isort
* tests,refactor: Remove RecordStorage, which appears to be a caching mechanism to accelerate recordings that involve flows.
* tests,refactor: Remove variable_recorder fixture, which shadows the implementation used by the azure-sdk-for-python infrastructure.
* tests,refactor: Make a dev_connections fixture
* style: Run isort + black
* tests,chore: Move tests up a directory
* chore: Remove tests/e2etests/README.md
* ci: Re-enable tests in ci
* chore: Mock dev_connections when not live
* tests,fix: Don't hardcode azure_deployment
* tests,feat: Add support for recording openai requests
* fix: response.status -> response.status_code
* fix: Don't await response.text()
* tests,fix: Redirect traffic from AsyncioRequestsTransport to the test proxy. Currently, the azure-sdk-for-python infra only patches a single async transport (AsyncioTransport), and only patches the async transport when the test itself is async. This fixture gets requested unconditionally and patches the default transport the SDK uses. Ideally this would be part of the azure-sdk-for-python test infra.
* tests: Request "recorded_test" for more e2e tests
* test,refactor: Update mock config values
* tests,refactor: Remove redundant mock project_scope
* tests: Add some sanitizers, taken from promptflow-recording
* chore: Add assets.json
* tests,fix: Use a FakeTokenCredential when not live
* ci,fix: Exclude tests from packages so verify_sdist finds py.typed
* tests: Add sanitizers for stainless headers and x-cv
* tests: Add a sanitizer for values from connections.json
* tests: Add return type to get_cred
* chore: Update assets.json
* tests,fix: Late-import NISTTokenizer. nltk does not bundle all the data it uses in its pip install, and instead requires that the user install it manually (`nltk.download`). nltk errors on the import of any class that depends on an external resource. The azure-sdk-for-python team's test proxy uses its own certificate bundle to enable HTTPS connections to the test proxy, but this seems to cause `nltk.download` to fail. Late-importing NISTTokenizer allows tests to run in CI without immediately crashing on import.
* fix: Fix broken IndirectAttackEvaluator imports
* fix: Fix broken EvaluationMetrics import
* chore: Update assets.json
* docs: Fix docstring for IndirectAttackEvaluator
* tests,fix: Coerce string enum values to string; otherwise the string in the dict is the qualified name of the enum value.
* chore: Update assets.json
* tests: Temp skip tests
* chore: Add a minimum bound to azure-identity dependency
* chore: Bump minimum bound of numpy; 1.26.4 fixes a bug that prevented numpy from installing on Python 3.12
* chore: Bump nltk lower bound; 3.8.1 crashes on Python 3.12, and 3.9.0 can't be imported (`import nltk`) without downloading "wordnet"
* ci: Temporarily disable windows tests
* ci: Temp disable python3.12 test
* chore: Bump numpy minimum bound; on Python 3.11, pandas depends on numpy>=1.23.2
* ci: Run pypy39 on ubuntu
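Among the fixes above, the string-enum coercion is easy to trip over. A minimal sketch of the problem and the fix; the `Metric` enum here is illustrative, not the SDK's actual enum:

```python
from enum import Enum

class Metric(str, Enum):
    XPIA = "xpia"

# Depending on the Python version, rendering the member with str() can give
# the qualified name "Metric.XPIA" rather than "xpia". Coercing via .value
# is unambiguous before placing the string into a result dict.
result = {"metric": Metric.XPIA.value}
```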
1 parent 189b106 commit 89cdab7

File tree

60 files changed (+5402, -40 lines)

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+    "AssetsRepo": "Azure/azure-sdk-assets",
+    "AssetsRepoPrefixPath": "python",
+    "TagPrefix": "python/evaluation/azure-ai-evaluation",
+    "Tag": "python/evaluation/azure-ai-evaluation_9ac3e64c3e"
+}

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_common/utils.py

Lines changed: 10 additions & 8 deletions
@@ -8,16 +8,8 @@
 
 from typing import List
 
-import nltk
 import numpy as np
 
-try:
-    from nltk.tokenize.nist import NISTTokenizer
-except LookupError:
-    nltk.download("perluniprops")
-    nltk.download("punkt")
-    nltk.download("punkt_tab")
-    from nltk.tokenize.nist import NISTTokenizer
 
 
 def get_harm_severity_level(harm_score: int) -> str:
@@ -45,6 +37,16 @@ def get_harm_severity_level(harm_score: int) -> str:
 def nltk_tokenize(text: str) -> List[str]:
     """Tokenize the input text using the NLTK tokenizer."""
 
+    import nltk
+
+    try:
+        from nltk.tokenize.nist import NISTTokenizer
+    except LookupError:
+        nltk.download("perluniprops")
+        nltk.download("punkt")
+        nltk.download("punkt_tab")
+        from nltk.tokenize.nist import NISTTokenizer
+
     if not text.isascii():
         # Use NISTTokenizer for international tokenization
         tokens = NISTTokenizer().international_tokenize(text)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/evaluators/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@
 from ._relevance import RelevanceEvaluator
 from ._rouge import RougeScoreEvaluator, RougeType
 from ._similarity import SimilarityEvaluator
+from ._xpia import IndirectAttackEvaluator
 
 __all__ = [
     "CoherenceEvaluator",

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/evaluators/_xpia/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from ._xpia import IndirectAttackEvaluator
+from .xpia import IndirectAttackEvaluator
 
 __all__ = [
     "IndirectAttackEvaluator",

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/evaluators/_xpia/xpia.py

Lines changed: 28 additions & 22 deletions
@@ -6,42 +6,48 @@
 
 from promptflow._utils.async_utils import async_run_allowing_running_loop
 
-from azure.ai._common.constants import EvaluationMetrics
+from azure.ai.evaluation._common.constants import EvaluationMetrics
 from azure.ai.evaluation._common.rai_service import evaluate_with_rai_service
 
 logger = logging.getLogger(__name__)
 
 
 class IndirectAttackEvaluator:
-    """
-    Initializes an XPIA (cross domain prompt injected attack) jailbreak evaluator to detect whether cross domain
-    injected attacks are present in your AI system's response.
-    :param project_scope: The scope of the Azure AI project.
-        It contains subscription id, resource group, and project name.
+    """A Cross-Domain Prompt Injection Attack (XPIA) jailbreak evaluator.
+
+    Detect whether cross domain injected attacks are present in your AI system's response.
+
+    :param project_scope: The scope of the Azure AI project. It contains subscription id, resource group, and project
+        name.
     :type project_scope: dict
     :param eval_last_turn: Set to True to evaluate only the most recent exchange in the dialogue,
         focusing on the latest user inquiry and the assistant's corresponding response. Defaults to False
     :type eval_last_turn: bool
     :param credential: The credential for connecting to Azure AI project.
     :type credential: ~azure.core.credentials.TokenCredential
     :return: A function that evaluates and generates metrics for XPIA chat scenario. Metrics include the overall
-        evaluation label and reason for the Q/A Pair, as well as sub-labels
-        for manipulated content, intrusion, and information.
+        evaluation label and reason for the Q/A Pair, as well as sub-labels for manipulated content, intrusion, and
+        information.
     :rtype: Callable
-    **Usage**
-    .. code-block:: python
-        eval_fn = IndirectAttackEvaluator(model_config)
-        result = eval_fn(question="What is the capital of France?", answer="Paris.")
-    **Output format for question-answer pair**
-    .. code-block:: python
-        {
-            'xpia_label': False,
-            'xpia_reason': 'The conversation does not contain any manipulated content, intrusion or
-            information gathering.'
-            'xpia_information_gathering': False,
-            'xpia_intrusion': False
-            'xpia_manipulated_content': False
-        }
+    **Usage**
+
+    .. code-block:: python
+
+        eval_fn = IndirectAttackEvaluator(model_config)
+        result = eval_fn(question="What is the capital of France?", answer="Paris.")
+
+    **Output format for question-answer pair**
+
+    .. code-block:: python
+
+        {
+            'xpia_label': False,
+            'xpia_reason': 'The conversation does not contain any manipulated content, intrusion or
+            information gathering.'
+            'xpia_information_gathering': False,
+            'xpia_intrusion': False
+            'xpia_manipulated_content': False
+        }
     """
 
     def __init__(self, project_scope: dict, eval_last_turn: bool = False, credential=None):

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/synthetic/_model_tools/_proxy_completion_model.py

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@ async def request_api(
 
         if response.status_code != 202:
             raise HttpResponseError(
-                message=f"Received unexpected HTTP status: {response.status} {await response.text()}", response=response
+                message=f"Received unexpected HTTP status: {response.status_code} {response.text()}", response=response
             )
 
         response = response.json()

sdk/evaluation/azure-ai-evaluation/dev_requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ pytest-asyncio
 pytest-cov
 pytest-mock
 pytest-xdist
+-e ../azure-ai-evaluation[pf-azure]

sdk/evaluation/azure-ai-evaluation/setup.py

Lines changed: 4 additions & 4 deletions
@@ -57,7 +57,7 @@
     include_package_data=True,
     packages=find_packages(
         exclude=[
-            "tests",
+            "tests*",
             # Exclude packages that will be covered by PEP420 or nspkg
             "azure",
             "azure.ai",
@@ -69,11 +69,11 @@
         "promptflow-core>=1.15.0",
         "websocket-client>=1.2.0",
         "jsonpath_ng>=1.5.0",
-        "numpy>=1.22",
+        "numpy>=1.23.2",
         "pyjwt>=2.8.0",
-        "azure-identity",
+        "azure-identity>=1.12.0",
         "azure-core>=1.30.2",
-        "nltk>=3.8.1",
+        "nltk>=3.9.1",
         "rouge-score>=0.1.2",
     ],
     extras_require={

sdk/evaluation/azure-ai-evaluation/tests/__init__.py

Whitespace-only changes.
Lines changed: 118 additions & 0 deletions (new file)

"""Implementation of an httpx.Client that forwards traffic to the Azure SDK test-proxy.

.. note::

    This module has side effects!

    Importing this module will replace the default httpx.Client used
    by the openai package with one that can redirect its traffic
    to the Azure SDK test-proxy on demand.

"""

from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Literal, Optional

import httpx
import openai._base_client
from typing_extensions import override


@dataclass
class TestProxyConfig:
    recording_id: str
    """The ID for the ongoing test recording."""

    recording_mode: Literal["playback", "record"]
    """The current recording mode."""

    proxy_url: str
    """The url for the Azure SDK test proxy."""


class TestProxyHttpxClientBase:
    recording_config: Optional[TestProxyConfig] = None

    @classmethod
    def is_recording(cls) -> bool:
        """Whether we are forwarding requests to the test proxy.

        :return: True if forwarding, False otherwise
        :rtype: bool
        """
        return cls.recording_config is not None

    @classmethod
    @contextmanager
    def record_with_proxy(cls, config: TestProxyConfig) -> Iterator[None]:
        """Forward all requests made within the scope of this context manager to the test-proxy.

        :param TestProxyConfig config: The test proxy configuration
        """
        cls.recording_config = config

        yield

        cls.recording_config = None

    @contextmanager
    def _reroute_to_proxy(self, request: httpx.Request) -> Iterator[None]:
        """Temporarily re-route a request to be sent through the test proxy.

        The request is modified in place, but is restored once the context manager exits.

        :param httpx.Request request: The request to update
        :return: None
        :rtype: None
        """
        assert self.is_recording(), f"{self._reroute_to_proxy.__qualname__} should only be called while recording"
        config = self.recording_config
        original_url = request.url

        request_path = original_url.copy_with(scheme="", netloc=b"")
        request.url = httpx.URL(config.proxy_url).join(request_path)

        original_headers = request.headers
        request.headers = request.headers.copy()
        request.headers.setdefault(
            "x-recording-upstream-base-uri", str(httpx.URL(scheme=original_url.scheme, netloc=original_url.netloc))
        )
        request.headers["x-recording-id"] = config.recording_id
        request.headers["x-recording-mode"] = config.recording_mode

        yield

        request.url = original_url
        request.headers = original_headers


class TestProxyHttpxClient(TestProxyHttpxClientBase, openai._base_client.SyncHttpxClientWrapper):
    @override
    def send(self, request: httpx.Request, **kwargs) -> httpx.Response:
        if self.is_recording():
            with self._reroute_to_proxy(request):
                response = super().send(request, **kwargs)

            response.request.url = request.url
            return response
        else:
            return super().send(request, **kwargs)


class TestProxyAsyncHttpxClient(TestProxyHttpxClientBase, openai._base_client.AsyncHttpxClientWrapper):
    @override
    async def send(self, request: httpx.Request, **kwargs) -> httpx.Response:
        if self.is_recording():
            with self._reroute_to_proxy(request):
                response = await super().send(request, **kwargs)

            response.request.url = request.url
            return response
        else:
            return await super().send(request, **kwargs)


# openai._base_client.{Async,Sync}HttpxClientWrapper are the default httpx.Clients instantiated by openai
openai._base_client.SyncHttpxClientWrapper = TestProxyHttpxClient
openai._base_client.AsyncHttpxClientWrapper = TestProxyAsyncHttpxClient
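The `_reroute_to_proxy` flow above (request path joined onto the proxy URL, upstream base preserved in a header) can be sketched with only the standard library. The `reroute` helper below is illustrative, not part of the module:

```python
from urllib.parse import urlsplit, urlunsplit

def reroute(url: str, proxy_url: str, recording_id: str, mode: str):
    """Rewrite `url` to point at the test proxy, keeping the original
    scheme/host in the x-recording-upstream-base-uri header."""
    parts = urlsplit(url)
    proxy = urlsplit(proxy_url)
    # Upstream base URI: scheme + host only, no path or query
    upstream = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    # New URL: the proxy's scheme/host, but the original path and query
    new_url = urlunsplit((proxy.scheme, proxy.netloc, parts.path, parts.query, ""))
    headers = {
        "x-recording-upstream-base-uri": upstream,
        "x-recording-id": recording_id,
        "x-recording-mode": mode,
    }
    return new_url, headers

url, headers = reroute(
    "https://api.openai.com/v1/chat/completions?api-version=1",
    "http://localhost:5000", "abc123", "playback",
)
```

The real client does the same rewrite in place on an `httpx.Request` and restores the original URL and headers when the context manager exits, so the recording only ever sees the upstream URL via the header.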

0 commit comments
