
Commit 94c2b59

nagkumar91 (Nagkumar Arkalgud) authored
Clean up simulator readme (Azure#38450)
* Update task_query_response.prompty remove required keys
* Update task_simulate.prompty
* Update task_query_response.prompty
* Update task_simulate.prompty
* Fix the api_key needed
* Update for release
* Black fix for file
* Add original text in global context
* Update test
* Update the indirect attack simulator
* Black suggested fixes
* Update simulator prompty
* Update adversarial scenario enum to exclude XPIA
* Update changelog
* Black fixes
* Remove duplicate import
* Fix the mypy error
* Mypy please be happy
* Updates to non adv simulator
* accept context from assistant messages, exclude them when using them for conversation
* update changelog
* pylint fixes
* pylint fixes
* remove redundant quotes
* Fix typo
* pylint fix
* Update broken tests
* Include the grounding json in the manifest
* Fix typo
* Come on package
* Release 1.0.0b5
* Notice from Chang
* Remove adv_conv template parameters from the outputs
* Update chanagelog
* Experimental tags on adv scenarios
* Readme fix onbreaking change
* Add the category and both user and assistant context to the response of qr_json_lines
* Update changelog
* Rename _kwargs to _options
* _options as prefix
* update troubleshooting for simulator
* Rename according to suggestions
* Clean up readme
* more links

---------

Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: Nagkumar Arkalgud <[email protected]>
1 parent ca3cd48 commit 94c2b59

File tree

1 file changed: +40 −269 lines changed

sdk/evaluation/azure-ai-evaluation/README.md

Lines changed: 40 additions & 269 deletions
@@ -180,292 +180,68 @@ For more details refer to [Evaluate on a target][evaluate_target]
 ### Simulator


-Simulators allow users to generate synthentic data using their application. Simulator expects the user to have a callback method that invokes
-their AI application.
-
-#### Simulating with a Prompty
-
-```yaml
----
-name: ApplicationPrompty
-description: Simulates an application
-model:
-  api: chat
-  parameters:
-    temperature: 0.0
-    top_p: 1.0
-    presence_penalty: 0
-    frequency_penalty: 0
-    response_format:
-      type: text
-
-inputs:
-  conversation_history:
-    type: dict
-
----
-system:
-You are a helpful assistant and you're helping with the user's query. Keep the conversation engaging and interesting.
-
-Output with a string that continues the conversation, responding to the latest message from the user, given the conversation history:
-{{ conversation_history }}
+Simulators allow users to generate synthetic data using their application. The simulator expects the user to provide a callback method that invokes their AI application; the integration between your AI application and the simulator happens at this callback method. Here's what a sample callback looks like:

-```
-
-Query Response generaing prompty for gpt-4o with `json_schema` support
-Use this file as an override.
-```yaml
----
-name: TaskSimulatorQueryResponseGPT4o
-description: Gets queries and responses from a blob of text
-model:
-  api: chat
-  parameters:
-    temperature: 0.0
-    top_p: 1.0
-    presence_penalty: 0
-    frequency_penalty: 0
-    response_format:
-      type: json_schema
-      json_schema:
-        name: QRJsonSchema
-        schema:
-          type: object
-          properties:
-            items:
-              type: array
-              items:
-                type: object
-                properties:
-                  q:
-                    type: string
-                  r:
-                    type: string
-                required:
-                  - q
-                  - r
-
-inputs:
-  text:
-    type: string
-  num_queries:
-    type: integer
-
-
----
-system:
-You're an AI that helps in preparing a Question/Answer quiz from Text for "Who wants to be a millionaire" tv show
-Both Questions and Answers MUST BE extracted from given Text
-Frame Question in a way so that Answer is RELEVANT SHORT BITE-SIZED info from Text
-RELEVANT info could be: NUMBER, DATE, STATISTIC, MONEY, NAME
-A sentence should contribute multiple QnAs if it has more info in it
-Answer must not be more than 5 words
-Answer must be picked from Text as is
-Question should be as descriptive as possible and must include as much context as possible from Text
-Output must always have the provided number of QnAs
-Output must be in JSON format.
-Output must have {{num_queries}} objects in the format specified below. Any other count is unacceptable.
-Text:
-<|text_start|>
-On January 24, 1984, former Apple CEO Steve Jobs introduced the first Macintosh. In late 2003, Apple had 2.06 percent of the desktop share in the United States.
-Some years later, research firms IDC and Gartner reported that Apple's market share in the U.S. had increased to about 6%.
-<|text_end|>
-Output with 5 QnAs:
-{
-    "qna": [{
-        "q": "When did the former Apple CEO Steve Jobs introduced the first Macintosh?",
-        "r": "January 24, 1984"
-    },
-    {
-        "q": "Who was the former Apple CEO that introduced the first Macintosh on January 24, 1984?",
-        "r": "Steve Jobs"
-    },
-    {
-        "q": "What percent of the desktop share did Apple have in the United States in late 2003?",
-        "r": "2.06 percent"
-    },
-    {
-        "q": "What were the research firms that reported on Apple's market share in the U.S.?",
-        "r": "IDC and Gartner"
-    },
-    {
-        "q": "What was the percentage increase of Apple's market share in the U.S., as reported by research firms IDC and Gartner?",
-        "r": "6%"
-    }]
-}
-Text:
-<|text_start|>
-{{ text }}
-<|text_end|>
-Output with {{ num_queries }} QnAs:
-```
-
-Application code:

 ```python
-import json
-import asyncio
-from typing import Any, Dict, List, Optional
-from azure.ai.evaluation.simulator import Simulator
-from promptflow.client import load_flow
-import os
-import wikipedia
-
-# Set up the model configuration without api_key, using DefaultAzureCredential
-model_config = {
-    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
-    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT"),
-    # not providing key would make the SDK pick up `DefaultAzureCredential`
-    # use "api_key": "<your API key>"
-    "api_version": "2024-08-01-preview" # keep this for gpt-4o
-}
-
-# Use Wikipedia to get some text for the simulation
-wiki_search_term = "Leonardo da Vinci"
-wiki_title = wikipedia.search(wiki_search_term)[0]
-wiki_page = wikipedia.page(wiki_title)
-text = wiki_page.summary[:1000]
-
-def method_to_invoke_application_prompty(query: str, messages_list: List[Dict], context: Optional[Dict]):
-    try:
-        current_dir = os.path.dirname(__file__)
-        prompty_path = os.path.join(current_dir, "application.prompty")
-        _flow = load_flow(
-            source=prompty_path,
-            model=model_config,
-            credential=DefaultAzureCredential()
-        )
-        response = _flow(
-            query=query,
-            context=context,
-            conversation_history=messages_list
-        )
-        return response
-    except Exception as e:
-        print(f"Something went wrong invoking the prompty: {e}")
-        return "something went wrong"
-
 async def callback(
     messages: Dict[str, List[Dict]],
     stream: bool = False,
-    session_state: Any = None, # noqa: ANN401
+    session_state: Any = None,
     context: Optional[Dict[str, Any]] = None,
 ) -> dict:
     messages_list = messages["messages"]
     # Get the last message from the user
     latest_message = messages_list[-1]
     query = latest_message["content"]
     # Call your endpoint or AI application here
-    response = method_to_invoke_application_prompty(query, messages_list, context)
-    # Format the response to follow the OpenAI chat protocol format
+    # response should be a string
+    response = call_to_your_application(query, messages_list, context)
     formatted_response = {
         "content": response,
         "role": "assistant",
         "context": "",
     }
     messages["messages"].append(formatted_response)
     return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
+```

-async def main():
-    simulator = Simulator(model_config=model_config)
-    current_dir = os.path.dirname(__file__)
-    query_response_override_for_latest_gpt_4o = os.path.join(current_dir, "TaskSimulatorQueryResponseGPT4o.prompty")
-    outputs = await simulator(
-        target=callback,
-        text=text,
-        query_response_generating_prompty=query_response_override_for_latest_gpt_4o, # use this only with latest gpt-4o
-        num_queries=2,
-        max_conversation_turns=1,
-        user_persona=[
-            f"I am a student and I want to learn more about {wiki_search_term}",
-            f"I am a teacher and I want to teach my students about {wiki_search_term}"
+The simulator initialization and invocation looks like this:
+```python
+from azure.ai.evaluation.simulator import Simulator
+model_config = {
+    "azure_endpoint": os.environ.get("AZURE_ENDPOINT"),
+    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT_NAME"),
+    "api_version": os.environ.get("AZURE_API_VERSION"),
+}
+custom_simulator = Simulator(model_config=model_config)
+outputs = asyncio.run(custom_simulator(
+    target=callback,
+    conversation_turns=[
+        [
+            "What should I know about the public gardens in the US?",
         ],
-    )
-    print(json.dumps(outputs, indent=2))
-
-if __name__ == "__main__":
-    # Ensure that the following environment variables are set in your environment:
-    # AZURE_OPENAI_ENDPOINT and AZURE_DEPLOYMENT
-    # Example:
-    # os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-endpoint.openai.azure.com/"
-    # os.environ["AZURE_DEPLOYMENT"] = "your-deployment-name"
-    asyncio.run(main())
-    print("done!")
-
+        [
+            "How do I simulate data against LLMs",
+        ],
+    ],
+    max_conversation_turns=2,
+))
+with open("simulator_output.jsonl", "w") as f:
+    for output in outputs:
+        f.write(output.to_eval_qr_json_lines())
 ```

 #### Adversarial Simulator

 ```python
 from azure.ai.evaluation.simulator import AdversarialSimulator, AdversarialScenario
 from azure.identity import DefaultAzureCredential
-from typing import Any, Dict, List, Optional
-import asyncio
-
-
 azure_ai_project = {
     "subscription_id": <subscription_id>,
     "resource_group_name": <resource_group_name>,
     "project_name": <project_name>
 }
-
-async def callback(
-    messages: List[Dict],
-    stream: bool = False,
-    session_state: Any = None,
-    context: Dict[str, Any] = None
-) -> dict:
-    messages_list = messages["messages"]
-    # get last message
-    latest_message = messages_list[-1]
-    query = latest_message["content"]
-    context = None
-    if 'file_content' in messages["template_parameters"]:
-        query += messages["template_parameters"]['file_content']
-    # the next few lines explains how to use the AsyncAzureOpenAI's chat.completions
-    # to respond to the simulator. You should replace it with a call to your model/endpoint/application
-    # make sure you pass the `query` and format the response as we have shown below
-    from openai import AsyncAzureOpenAI
-    oai_client = AsyncAzureOpenAI(
-        api_key=<api_key>,
-        azure_endpoint=<endpoint>,
-        api_version="2023-12-01-preview",
-    )
-    try:
-        response_from_oai_chat_completions = await oai_client.chat.completions.create(messages=[{"content": query, "role": "user"}], model="gpt-4", max_tokens=300)
-    except Exception as e:
-        print(f"Error: {e}")
-        # to continue the conversation, return the messages, else you can fail the adversarial with an exception
-        message = {
-            "content": "Something went wrong. Check the exception e for more details.",
-            "role": "assistant",
-            "context": None,
-        }
-        messages["messages"].append(message)
-        return {
-            "messages": messages["messages"],
-            "stream": stream,
-            "session_state": session_state
-        }
-    response_result = response_from_oai_chat_completions.choices[0].message.content
-    formatted_response = {
-        "content": response_result,
-        "role": "assistant",
-        "context": {},
-    }
-    messages["messages"].append(formatted_response)
-    return {
-        "messages": messages["messages"],
-        "stream": stream,
-        "session_state": session_state,
-        "context": context
-    }
-
-```
-
-#### Adversarial QA
-
-```python
 scenario = AdversarialScenario.ADVERSARIAL_QA
 simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

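To see how the pieces in this hunk fit together, here is a minimal runnable sketch that stitches the new callback and invocation snippets into one script. It is an illustrative sketch, not part of the commit: `echo_application` is a hypothetical stand-in for the README's `call_to_your_application` placeholder, and the environment-variable names follow the snippet above.

```python
import asyncio
import os
from typing import Any, Dict, List, Optional

from azure.ai.evaluation.simulator import Simulator


def echo_application(query: str, messages_list: List[Dict], context: Optional[Dict]) -> str:
    # Hypothetical stand-in for `call_to_your_application`; replace with a real
    # call to your model, endpoint, or application.
    return f"You asked: {query}"


async def callback(
    messages: Dict[str, List[Dict]],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    # Pull the latest user message, answer it, and append the reply in
    # OpenAI chat-protocol format, as the README's sample callback does.
    messages_list = messages["messages"]
    query = messages_list[-1]["content"]
    response = echo_application(query, messages_list, context)
    messages_list.append({"content": response, "role": "assistant", "context": ""})
    return {"messages": messages_list, "stream": stream, "session_state": session_state, "context": context}


model_config = {
    "azure_endpoint": os.environ.get("AZURE_ENDPOINT"),
    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT_NAME"),
    "api_version": os.environ.get("AZURE_API_VERSION"),
}

custom_simulator = Simulator(model_config=model_config)
outputs = asyncio.run(custom_simulator(
    target=callback,
    conversation_turns=[["What should I know about the public gardens in the US?"]],
    max_conversation_turns=2,
))
with open("simulator_output.jsonl", "w") as f:
    for output in outputs:
        f.write(output.to_eval_qr_json_lines())
```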
@@ -480,30 +256,20 @@ outputs = asyncio.run(

 print(outputs.to_eval_qr_json_lines())
 ```
-#### Direct Attack Simulator
-
-```python
-scenario = AdversarialScenario.ADVERSARIAL_QA
-simulator = DirectAttackSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

-outputs = asyncio.run(
-    simulator(
-        scenario=scenario,
-        max_conversation_turns=1,
-        max_simulation_results=2,
-        target=callback
-    )
-)
-
-print(outputs)
-```
+For more details about the simulator, visit the following links:
+- [Adversarial Simulation docs][adversarial_simulation_docs]
+- [Adversarial scenarios][adversarial_simulation_scenarios]
+- [Simulating jailbreak attacks][adversarial_jailbreak]

 ## Examples

 In following section you will find examples of:
 - [Evaluate an application][evaluate_app]
 - [Evaluate different models][evaluate_models]
 - [Custom Evaluators][custom_evaluators]
+- [Adversarial Simulation][adversarial_simulation]
+- [Simulate with conversation starter][simulate_with_conversation_starter]

 More examples can be found [here][evaluate_samples].

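The hunk above trims the Direct Attack Simulator section down to a set of links, keeping only the adversarial QA flow. For orientation, here is a sketch of the complete adversarial run those fragments describe, assembled from the scenario, simulator, and `asyncio.run(...)` pieces shown in the diff; `callback` is the handler defined earlier in the README, and the project values remain placeholders.

```python
import asyncio

from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator
from azure.identity import DefaultAzureCredential

# Placeholder project settings, as in the README; fill in your own values.
azure_ai_project = {
    "subscription_id": "<subscription_id>",
    "resource_group_name": "<resource_group_name>",
    "project_name": "<project_name>",
}

simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    simulator(
        scenario=AdversarialScenario.ADVERSARIAL_QA,
        max_conversation_turns=1,   # turns per simulated conversation
        max_simulation_results=2,   # number of simulated conversations
        target=callback,            # the callback defined earlier in the README
    )
)
print(outputs.to_eval_qr_json_lines())
```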
@@ -571,4 +337,9 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
 [evaluation_metrics]: https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in
 [performance_and_quality_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#performance-and-quality-evaluators
 [risk_and_safety_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#risk-and-safety-evaluators
-[composite_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#composite-evaluators
+[composite_evaluators]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk#composite-evaluators
+[adversarial_simulation_docs]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#generate-adversarial-simulations-for-safety-evaluation
+[adversarial_simulation_scenarios]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#supported-adversarial-simulation-scenarios
+[adversarial_simulation]: https://github.com/Azure-Samples/azureai-samples/tree/main/scenarios/evaluate/simulate_adversarial
+[simulate_with_conversation_starter]: https://github.com/Azure-Samples/azureai-samples/tree/main/scenarios/evaluate/simulate_conversation_starter
+[adversarial_jailbreak]: https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data#simulating-jailbreak-attacks
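Both simulation paths in this commit funnel results through `to_eval_qr_json_lines()`, so the `simulator_output.jsonl` file written above can be post-processed line by line. A small reader sketch follows; the `query`/`response` key names are an assumption suggested by the method name, not something this diff confirms.

```python
import json

# Hypothetical reader for the simulator_output.jsonl written above. The
# "query"/"response" key names are assumed from the method name; verify them
# against your actual output before relying on this.
with open("simulator_output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record.get("query"), "->", record.get("response"))
```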
