
Commit 94c2b59

nagkumar91 (Nagkumar Arkalgud) authored
Clean up simulator readme (Azure#38450)
* Update task_query_response.prompty remove required keys
* Update task_simulate.prompty
* Update task_query_response.prompty
* Update task_simulate.prompty
* Fix the api_key needed
* Update for release
* Black fix for file
* Add original text in global context
* Update test
* Update the indirect attack simulator
* Black suggested fixes
* Update simulator prompty
* Update adversarial scenario enum to exclude XPIA
* Update changelog
* Black fixes
* Remove duplicate import
* Fix the mypy error
* Mypy please be happy
* Updates to non adv simulator
* accept context from assistant messages, exclude them when using them for conversation
* update changelog
* pylint fixes
* pylint fixes
* remove redundant quotes
* Fix typo
* pylint fix
* Update broken tests
* Include the grounding json in the manifest
* Fix typo
* Come on package
* Release 1.0.0b5
* Notice from Chang
* Remove adv_conv template parameters from the outputs
* Update chanagelog
* Experimental tags on adv scenarios
* Readme fix onbreaking change
* Add the category and both user and assistant context to the response of qr_json_lines
* Update changelog
* Rename _kwargs to _options
* _options as prefix
* update troubleshooting for simulator
* Rename according to suggestions
* Clean up readme
* more links

---------

Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: Nagkumar Arkalgud <[email protected]>
1 parent ca3cd48 commit 94c2b59

File tree

1 file changed: +40 −269 lines changed

sdk/evaluation/azure-ai-evaluation/README.md

Lines changed: 40 additions & 269 deletions
@@ -180,292 +180,68 @@ For more details refer to [Evaluate on a target][evaluate_target]
 ### Simulator


-Simulators allow users to generate synthentic data using their application. Simulator expects the user to have a callback method that invokes
-their AI application.
-
-#### Simulating with a Prompty
-
-```yaml
----
-name: ApplicationPrompty
-description: Simulates an application
-model:
-  api: chat
-  parameters:
-    temperature: 0.0
-    top_p: 1.0
-    presence_penalty: 0
-    frequency_penalty: 0
-    response_format:
-      type: text
-
-inputs:
-  conversation_history:
-    type: dict
-
----
-system:
-You are a helpful assistant and you're helping with the user's query. Keep the conversation engaging and interesting.
-
-Output with a string that continues the conversation, responding to the latest message from the user, given the conversation history:
-{{ conversation_history }}
+Simulators allow users to generate synthetic data using their application. The simulator expects the user to provide a callback method that invokes their AI application; the integration between your AI application and the simulator happens at this callback method. Here's what a sample callback looks like:

-```
-
-Query Response generaing prompty for gpt-4o with `json_schema` support
-Use this file as an override.
-```yaml
----
-name: TaskSimulatorQueryResponseGPT4o
-description: Gets queries and responses from a blob of text
-model:
-  api: chat
-  parameters:
-    temperature: 0.0
-    top_p: 1.0
-    presence_penalty: 0
-    frequency_penalty: 0
-    response_format:
-      type: json_schema
-      json_schema:
-        name: QRJsonSchema
-        schema:
-          type: object
-          properties:
-            items:
-              type: array
-              items:
-                type: object
-                properties:
-                  q:
-                    type: string
-                  r:
-                    type: string
-                required:
-                  - q
-                  - r
-
-inputs:
-  text:
-    type: string
-  num_queries:
-    type: integer
-
-
----
-system:
-You're an AI that helps in preparing a Question/Answer quiz from Text for "Who wants to be a millionaire" tv show
-Both Questions and Answers MUST BE extracted from given Text
-Frame Question in a way so that Answer is RELEVANT SHORT BITE-SIZED info from Text
-RELEVANT info could be: NUMBER, DATE, STATISTIC, MONEY, NAME
-A sentence should contribute multiple QnAs if it has more info in it
-Answer must not be more than 5 words
-Answer must be picked from Text as is
-Question should be as descriptive as possible and must include as much context as possible from Text
-Output must always have the provided number of QnAs
-Output must be in JSON format.
-Output must have {{num_queries}} objects in the format specified below. Any other count is unacceptable.
-Text:
-<|text_start|>
-On January 24, 1984, former Apple CEO Steve Jobs introduced the first Macintosh. In late 2003, Apple had 2.06 percent of the desktop share in the United States.
-Some years later, research firms IDC and Gartner reported that Apple's market share in the U.S. had increased to about 6%.
-<|text_end|>
-Output with 5 QnAs:
-{
-    "qna": [{
-        "q": "When did the former Apple CEO Steve Jobs introduced the first Macintosh?",
-        "r": "January 24, 1984"
-    },
-    {
-        "q": "Who was the former Apple CEO that introduced the first Macintosh on January 24, 1984?",
-        "r": "Steve Jobs"
-    },
-    {
-        "q": "What percent of the desktop share did Apple have in the United States in late 2003?",
-        "r": "2.06 percent"
-    },
-    {
-        "q": "What were the research firms that reported on Apple's market share in the U.S.?",
-        "r": "IDC and Gartner"
-    },
-    {
-        "q": "What was the percentage increase of Apple's market share in the U.S., as reported by research firms IDC and Gartner?",
-        "r": "6%"
-    }]
-}
-Text:
-<|text_start|>
-{{ text }}
-<|text_end|>
-Output with {{ num_queries }} QnAs:
-```
-
-Application code:

 ```python
-import json
-import asyncio
-from typing import Any, Dict, List, Optional
-from azure.ai.evaluation.simulator import Simulator
-from promptflow.client import load_flow
-import os
-import wikipedia
-
-# Set up the model configuration without api_key, using DefaultAzureCredential
-model_config = {
-    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
-    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT"),
-    # not providing key would make the SDK pick up `DefaultAzureCredential`
-    # use "api_key": "<your API key>"
-    "api_version": "2024-08-01-preview" # keep this for gpt-4o
-}
-
-# Use Wikipedia to get some text for the simulation
-wiki_search_term = "Leonardo da Vinci"
-wiki_title = wikipedia.search(wiki_search_term)[0]
-wiki_page = wikipedia.page(wiki_title)
-text = wiki_page.summary[:1000]
-
-def method_to_invoke_application_prompty(query: str, messages_list: List[Dict], context: Optional[Dict]):
-    try:
-        current_dir = os.path.dirname(__file__)
-        prompty_path = os.path.join(current_dir, "application.prompty")
-        _flow = load_flow(
-            source=prompty_path,
-            model=model_config,
-            credential=DefaultAzureCredential()
-        )
-        response = _flow(
-            query=query,
-            context=context,
-            conversation_history=messages_list
-        )
-        return response
-    except Exception as e:
-        print(f"Something went wrong invoking the prompty: {e}")
-        return "something went wrong"
-
 async def callback(
     messages: Dict[str, List[Dict]],
     stream: bool = False,
-    session_state: Any = None, # noqa: ANN401
+    session_state: Any = None,
     context: Optional[Dict[str, Any]] = None,
 ) -> dict:
     messages_list = messages["messages"]
     # Get the last message from the user
     latest_message = messages_list[-1]
     query = latest_message["content"]
     # Call your endpoint or AI application here
-    response = method_to_invoke_application_prompty(query, messages_list, context)
-    # Format the response to follow the OpenAI chat protocol format
+    # response should be a string
+    response = call_to_your_application(query, messages_list, context)
     formatted_response = {
         "content": response,
         "role": "assistant",
         "context": "",
     }
     messages["messages"].append(formatted_response)
     return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
+```

-async def main():
-    simulator = Simulator(model_config=model_config)
-    current_dir = os.path.dirname(__file__)
-    query_response_override_for_latest_gpt_4o = os.path.join(current_dir, "TaskSimulatorQueryResponseGPT4o.prompty")
-    outputs = await simulator(
-        target=callback,
-        text=text,
-        query_response_generating_prompty=query_response_override_for_latest_gpt_4o, # use this only with latest gpt-4o
-        num_queries=2,
-        max_conversation_turns=1,
-        user_persona=[
-            f"I am a student and I want to learn more about {wiki_search_term}",
-            f"I am a teacher and I want to teach my students about {wiki_search_term}"
+The simulator initialization and invocation looks like this:
+```python
+from azure.ai.evaluation.simulator import Simulator
+model_config = {
+    "azure_endpoint": os.environ.get("AZURE_ENDPOINT"),
+    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT_NAME"),
+    "api_version": os.environ.get("AZURE_API_VERSION"),
+}
+custom_simulator = Simulator(model_config=model_config)
+outputs = asyncio.run(custom_simulator(
+    target=callback,
+    conversation_turns=[
+        [
+            "What should I know about the public gardens in the US?",
         ],
-    )
-    print(json.dumps(outputs, indent=2))
-
-if __name__ == "__main__":
-    # Ensure that the following environment variables are set in your environment:
-    # AZURE_OPENAI_ENDPOINT and AZURE_DEPLOYMENT
-    # Example:
-    # os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-endpoint.openai.azure.com/"
-    # os.environ["AZURE_DEPLOYMENT"] = "your-deployment-name"
-    asyncio.run(main())
-    print("done!")
-
+        [
+            "How do I simulate data against LLMs",
+        ],
+    ],
+    max_conversation_turns=2,
+))
+with open("simulator_output.jsonl", "w") as f:
+    for output in outputs:
+        f.write(output.to_eval_qr_json_lines())
 ```

 #### Adversarial Simulator

 ```python
 from azure.ai.evaluation.simulator import AdversarialSimulator, AdversarialScenario
 from azure.identity import DefaultAzureCredential
-from typing import Any, Dict, List, Optional
-import asyncio
-
-
 azure_ai_project = {
     "subscription_id": <subscription_id>,
     "resource_group_name": <resource_group_name>,
     "project_name": <project_name>
 }
-
-async def callback(
-    messages: List[Dict],
-    stream: bool = False,
-    session_state: Any = None,
-    context: Dict[str, Any] = None
-) -> dict:
-    messages_list = messages["messages"]
-    # get last message
-    latest_message = messages_list[-1]
-    query = latest_message["content"]
-    context = None
-    if 'file_content' in messages["template_parameters"]:
-        query += messages["template_parameters"]['file_content']
-    # the next few lines explains how to use the AsyncAzureOpenAI's chat.completions
-    # to respond to the simulator. You should replace it with a call to your model/endpoint/application
-    # make sure you pass the `query` and format the response as we have shown below
-    from openai import AsyncAzureOpenAI
-    oai_client = AsyncAzureOpenAI(
-        api_key=<api_key>,
-        azure_endpoint=<endpoint>,
-        api_version="2023-12-01-preview",
-    )
-    try:
-        response_from_oai_chat_completions = await oai_client.chat.completions.create(messages=[{"content": query, "role": "user"}], model="gpt-4", max_tokens=300)
-    except Exception as e:
-        print(f"Error: {e}")
-        # to continue the conversation, return the messages, else you can fail the adversarial with an exception
-        message = {
-            "content": "Something went wrong. Check the exception e for more details.",
-            "role": "assistant",
-            "context": None,
-        }
-        messages["messages"].append(message)
-        return {
-            "messages": messages["messages"],
-            "stream": stream,
-            "session_state": session_state
-        }
-    response_result = response_from_oai_chat_completions.choices[0].message.content
-    formatted_response = {
-        "content": response_result,
-        "role": "assistant",
-        "context": {},
-    }
-    messages["messages"].append(formatted_response)
-    return {
-        "messages": messages["messages"],
-        "stream": stream,
-        "session_state": session_state,
-        "context": context
-    }
-
-```
-
-#### Adversarial QA
-
-```python
 scenario = AdversarialScenario.ADVERSARIAL_QA
 simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

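To see how the pieces in this hunk fit together, here is a minimal runnable sketch that stitches the new callback and invocation snippets into one script. It is an illustrative sketch, not part of the commit: `echo_application` is a hypothetical stand-in for the README's `call_to_your_application` placeholder, and the environment-variable names follow the snippet above.

```python
import asyncio
import os
from typing import Any, Dict, List, Optional

from azure.ai.evaluation.simulator import Simulator


def echo_application(query: str, messages_list: List[Dict], context: Optional[Dict]) -> str:
    # Hypothetical stand-in for `call_to_your_application`; replace with a real
    # call to your model, endpoint, or application.
    return f"You asked: {query}"


async def callback(
    messages: Dict[str, List[Dict]],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    # Pull the latest user message, answer it, and append the reply in
    # OpenAI chat-protocol format, as the README's sample callback does.
    messages_list = messages["messages"]
    query = messages_list[-1]["content"]
    response = echo_application(query, messages_list, context)
    messages_list.append({"content": response, "role": "assistant", "context": ""})
    return {"messages": messages_list, "stream": stream, "session_state": session_state, "context": context}


model_config = {
    "azure_endpoint": os.environ.get("AZURE_ENDPOINT"),
    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT_NAME"),
    "api_version": os.environ.get("AZURE_API_VERSION"),
}

custom_simulator = Simulator(model_config=model_config)
outputs = asyncio.run(custom_simulator(
    target=callback,
    conversation_turns=[["What should I know about the public gardens in the US?"]],
    max_conversation_turns=2,
))
with open("simulator_output.jsonl", "w") as f:
    for output in outputs:
        f.write(output.to_eval_qr_json_lines())
```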
@@ -480,30 +256,20 @@ outputs = asyncio.run(

 print(outputs.to_eval_qr_json_lines())
 ```
-#### Direct Attack Simulator
-
-```python
-scenario = AdversarialScenario.ADVERSARIAL_QA
-simulator = DirectAttackSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

-outputs = asyncio.run(
-    simulator(
-        scenario=scenario,
-        max_conversation_turns=1,
-        max_simulation_results=2,
-        target=callback
-    )
-)
-
-print(outputs)
-```
+For more details about the simulator, visit the following links:
+- [Adversarial Simulation docs][adversarial_simulation_docs]
+- [Adversarial scenarios][adversarial_simulation_scenarios]
+- [Simulating jailbreak attacks][adversarial_jailbreak]

 ## Examples

 In following section you will find examples of:
 - [Evaluate an application][evaluate_app]
 - [Evaluate different models][evaluate_models]
 - [Custom Evaluators][custom_evaluators]
+- [Adversarial Simulation][adversarial_simulation]
+- [Simulate with conversation starter][simulate_with_conversation_starter]

 More examples can be found [here][evaluate_samples].

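The hunk above trims the Direct Attack Simulator section down to a set of links, keeping only the adversarial QA flow. For orientation, here is a sketch of the complete adversarial run those fragments describe, assembled from the scenario, simulator, and `asyncio.run(...)` pieces shown in the diff; `callback` is the handler defined earlier in the README, and the project values remain placeholders.

```python
import asyncio

from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator
from azure.identity import DefaultAzureCredential

# Placeholder project settings, as in the README; fill in your own values.
azure_ai_project = {
    "subscription_id": "<subscription_id>",
    "resource_group_name": "<resource_group_name>",
    "project_name": "<project_name>",
}

simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    simulator(
        scenario=AdversarialScenario.ADVERSARIAL_QA,
        max_conversation_turns=1,   # turns per simulated conversation
        max_simulation_results=2,   # number of simulated conversations
        target=callback,            # the callback defined earlier in the README
    )
)
print(outputs.to_eval_qr_json_lines())
```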
@@ -571,4 +337,9 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
 [evaluation_metrics]: https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in
 [performance_and_quality_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#performance-and-quality-evaluators
 [risk_and_safety_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#risk-and-safety-evaluators
-[composite_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#composite-evaluators
+[composite_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#composite-evaluators
+[adversarial_simulation_docs]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#generate-adversarial-simulations-for-safety-evaluation
+[adversarial_simulation_scenarios]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#supported-adversarial-simulation-scenarios
+[adversarial_simulation]: https://github.com/Azure-Samples/azureai-samples/tree/main/scenarios/evaluate/simulate_adversarial
+[simulate_with_conversation_starter]: https://github.com/Azure-Samples/azureai-samples/tree/main/scenarios/evaluate/simulate_conversation_starter
+[adversarial_jailbreak]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#simulating-jailbreak-attacks
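Both simulation paths in this commit funnel results through `to_eval_qr_json_lines()`, so the `simulator_output.jsonl` file written above can be post-processed line by line. A small reader sketch follows; the `query`/`response` key names are an assumption suggested by the method name, not something this diff confirms.

```python
import json

# Hypothetical reader for the simulator_output.jsonl written above. The
# "query"/"response" key names are assumed from the method name; verify them
# against your actual output before relying on this.
with open("simulator_output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record.get("query"), "->", record.get("response"))
```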
