# PytestAgentsSDK

This project provides a sample test harness for evaluating Copilot Studio agents using [**Pytest**](https://docs.pytest.org/en/stable/) and [**DeepEval**](https://github.com/confident-ai/deepeval). It uses the [Microsoft 365 Agents SDK](https://github.com/microsoft/agents) to communicate with Copilot Studio and focuses on **semantic evaluation** of agent responses using DeepEval’s `GEval` metric.

## Features

- Multi-turn conversation testing against a Copilot Studio agent
- Semantic response evaluation using DeepEval’s `GEval` metric
- Loads test cases from a CSV file
- Custom HTML reporting with detailed metadata (user input, actual and expected output, score, reason)
- Authentication via MSAL, supporting [“Authenticate with Microsoft”](https://learn.microsoft.com/en-us/microsoft-copilot-studio/configuration-end-user-authentication#authenticate-with-microsoft) in Copilot Studio
- Easily extensible with additional metrics and long-term result tracking via DeepEval and Pytest plugins
---

## Setup

### **1. Clone the repository**

```bash
git clone https://github.com/microsoft/CopilotStudioSamples.git
cd CopilotStudioSamples/FunctionalTesting/PytestAgentsSDK
```

### **2. Create and activate a virtual environment**

```bash
python3 -m venv .venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
```

### **3. Install required dependencies**

```bash
pip install -r requirements.txt
```

### **4. Create an app registration**

You will need to register an application in Azure for the SDK to authenticate with Copilot Studio:

- Create a **single-tenant** app registration in Azure
- Under **Authentication → Platform configurations**, click **Add a platform** and select **Mobile and desktop applications**
- Add these redirect URIs:
  - `msal40347a26-35bb-48f3-bdc4-7f4f209aecb1://auth` (MSAL only)
  - `http://localhost`
- Under **API permissions**, click **Add a permission**
  - Choose **APIs my organization uses**, then search for **Power Platform API**
  - Choose **Delegated permissions**, then add `CopilotStudio.Copilots.Invoke`
### **5. Authentication and agent details**

Create a `.env` file (you can copy `.env.template`) and populate it with your MSAL and Copilot Studio agent configuration:

```env
APP_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ENVIRONMENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AGENT_IDENTIFIER=cr26e_dMyAgent  # Schema name, found under Settings > Advanced > Metadata
```
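The harness reads these values at runtime, typically via a loader such as python-dotenv. As a rough illustration of what that loading amounts to, here is a minimal standard-library-only sketch; the function name `load_env` is a hypothetical helper, not the harness's actual code:

```python
# Minimal .env parser (stdlib only). A library such as python-dotenv is
# the usual choice; this sketch only illustrates the KEY=VALUE format.
# Caveat: values containing '#' would be truncated by this naive parser.
from pathlib import Path


def load_env(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks, comments, and inline comments."""
    config = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config
```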
### **6. Configure Azure OpenAI or OpenAI details**

You can use either OpenAI or Azure OpenAI with DeepEval.

#### To configure Azure OpenAI using the DeepEval CLI:

```bash
# Example values: endpoint https://example-resource.openai.azure.com/,
# model name gpt-4o, deployment name "Test Deployment",
# API version 2025-01-01-preview
deepeval set-azure-openai \
  --openai-endpoint=<endpoint> \
  --openai-api-key=<api_key> \
  --openai-model-name=<model_name> \
  --deployment-name=<deployment_name> \
  --openai-api-version=<openai_api_version>
```

> These values will be stored in a local `.deepeval` configuration file.
Alternatively, if you're using OpenAI (not Azure), set the following environment variable:

```bash
export OPENAI_API_KEY=<your-openai-key>
```

### **7. Publish and set agent authentication**

Before running tests, ensure that your Copilot Studio agent is:

- **Published** in the Copilot Studio portal
- Configured to use **[Authenticate with Microsoft](https://learn.microsoft.com/en-us/microsoft-copilot-studio/configuration-end-user-authentication#authenticate-with-microsoft)** under **Settings > Security > Authentication**

### **8. Prepare Test Cases (CSV Input)**

Before running the tests, populate the CSV file at `input/test_cases.csv` with your test cases.

The CSV file must contain two columns:

- `input_text`: The message sent to the Copilot Studio agent
- `expected_output`: The ideal response you'd expect from the agent
#### Example:

```csv
input_text,expected_output
What is the capital of France?,"The capital of France is Paris, which is known for its historical landmarks like the Eiffel Tower and the Louvre Museum."
Who wrote 'Hamlet'?,"William Shakespeare wrote the play 'Hamlet', which is considered one of the greatest works of English literature."
What is the chemical symbol for water?,H2O is the chemical symbol for water.
```

Note that any expected output containing a comma must be wrapped in double quotes so the file parses as exactly two columns.
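Rows in this format can be loaded and fed to pytest parametrization. A minimal sketch using the standard library (the helper name `load_test_cases` is illustrative, not the harness's actual code):

```python
# Illustrative loader for input/test_cases.csv. The column names
# (input_text, expected_output) match the CSV schema described above.
import csv


def load_test_cases(path: str) -> list[tuple[str, str]]:
    """Return (input_text, expected_output) pairs from the CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["input_text"], row["expected_output"])
                for row in csv.DictReader(f)]
```

Each pair could then drive one parametrized test, e.g. `@pytest.mark.parametrize("input_text,expected_output", load_test_cases("input/test_cases.csv"))`.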
---

## Running the Tests

From the `PytestAgentsSDK` directory, run:

```bash
pytest tests/multi_turn_eval_openai.py --html=reports/multi_turn_eval_openai.html --self-contained-html
```

This will:

- Start a conversation with your Copilot Studio agent
- Send test questions and capture responses
- Evaluate the responses using DeepEval
- Generate a self-contained HTML report in the `reports/` folder
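Conceptually, each response passes or fails by comparing an evaluation score against a threshold. The sketch below illustrates only that gate; `difflib.SequenceMatcher` is a crude lexical stand-in used so the example runs offline, whereas the real harness uses DeepEval's LLM-judged `GEval` score:

```python
# Threshold-gate sketch. GEval produces a score in [0, 1] plus a reason;
# here SequenceMatcher supplies a crude lexical similarity as a stand-in
# so the example needs no model access or API key.
from difflib import SequenceMatcher


def passes(actual: str, expected: str, threshold: float = 0.5) -> tuple[bool, float]:
    """Return (pass/fail, score) for one agent response."""
    score = SequenceMatcher(None, actual.lower(), expected.lower()).ratio()
    return score >= threshold, score
```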
---

## Output

The HTML report includes:

- Pass/Fail status based on the semantic threshold
- User message and expected answer
- Actual response from the agent
- DeepEval score
- Explanation for the result
- Conversation ID (for debugging)