---
title: Transparency Note for Auto-Generate Prompt Variants in Prompt Flow
titleSuffix: Azure Machine Learning
description: Transparency Note for Auto-Generate Prompt Variants in Prompt Flow
author: prakharg-msft
ms.author: prakharg
manager: omkarm
ms.service: machine-learning
ms.subservice: prompt-flow
ms.date: 10/20/2023
ms.topic: article
---

# Transparency Note for Auto-Generate Prompt Variants in Prompt Flow

## What is a Transparency Note?

An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.

Microsoft's Transparency Notes are part of a broader effort at Microsoft to put our AI principles into practice. To find out more, see [Microsoft's AI principles](https://www.microsoft.com/ai/responsible-ai).

## The basics of Auto-Generate Prompt Variants in Prompt Flow

### Introduction

Prompt engineering is at the center of building applications with large language models. Microsoft's Prompt Flow offers rich capabilities to interactively edit, bulk test, and evaluate prompts with built-in flows to pick the best prompt. With the Auto-Generate Prompt Variants feature in Prompt Flow, we provide the ability to automatically generate variations of a user's base prompt with the help of large language models, and allow users to test them in Prompt Flow to reach the optimal solution for the user's model and use case.

### Key terms

| **Term** | **Definition** |
| --- | --- |
| Prompt flow | Prompt Flow offers rich capabilities to interactively edit prompts and bulk test them with built-in evaluation flows to pick the best prompt. For more information, see [What is prompt flow](./overview-what-is-prompt-flow.md). |
| Prompt engineering | The practice of crafting and refining input prompts to elicit more desirable responses from a language model, particularly a large language model. |
| Prompt variants | Different versions or modifications of a given input prompt designed to test or achieve varied responses from a large language model. |
| Base prompt | The initial or primary prompt that serves as a starting point for eliciting responses from large language models. In this case, it is provided by the user and is modified to create prompt variants. |
| System prompt | A predefined prompt generated by the system, typically to initiate a task or seek specific information. It is not visible to users but is used internally to generate prompt variants. |

## Capabilities

### System behavior

The Auto-Generate Prompt Variants feature, as part of the Prompt Flow experience, provides the ability to automatically generate and easily assess prompt variations so that you can quickly find the best prompt for your use case. This feature extends Prompt Flow's rich set of capabilities to interactively edit and evaluate prompts, with the goal of simplifying prompt engineering.

When provided with the user's base prompt, the Auto-Generate Prompt Variants feature generates several variations using the generative power of Azure OpenAI models and an internal system prompt. Although Azure OpenAI provides content management filters, we recommend verifying any generated prompts before using them in production scenarios.
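
The internal system prompt and request format used by the feature are not public; the following sketch only illustrates the general pattern described above, in which a hidden system prompt is paired with the user's base prompt in a chat-completion request. All names here (`build_variant_request`, `VARIANT_SYSTEM_PROMPT`) are hypothetical.

```python
# Illustrative sketch only: the real internal system prompt used by
# Auto-Generate Prompt Variants is not public. This stand-in shows how a
# hidden system prompt could instruct the model to rewrite a base prompt.
VARIANT_SYSTEM_PROMPT = (
    "You are a prompt engineer. Rewrite the user's prompt into {n} distinct "
    "variants that preserve its intent. Return them as a numbered list."
)

def build_variant_request(base_prompt: str, n_variants: int = 3) -> dict:
    """Assemble a chat-completion style payload pairing the internal
    system prompt with the user's base prompt."""
    if not 1 <= n_variants <= 5:  # the feature caps generation at five variants
        raise ValueError("n_variants must be between 1 and 5")
    return {
        "messages": [
            {"role": "system",
             "content": VARIANT_SYSTEM_PROMPT.format(n=n_variants)},
            {"role": "user", "content": base_prompt},
        ],
        "temperature": 0.7,  # some randomness encourages distinct variants
    }

request = build_variant_request("Summarize the following article.", n_variants=3)
```

The payload would then be sent to an Azure OpenAI chat-completion deployment; the resulting variants should still be reviewed before production use, as noted above.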

### Use cases

#### Intended uses

Auto-Generate Prompt Variants can be used in the following scenario. The system's intended use is:

**Generate new prompts from a provided base prompt**: The "Generate Variants" feature allows Prompt Flow users to automatically generate variants of their provided base prompt with the help of large language models (LLMs).

#### Considerations when choosing a use case

**Do not use Auto-Generate Prompt Variants for decisions that might have serious adverse impacts.**

Auto-Generate Prompt Variants was not designed or tested to recommend items that require additional considerations related to accuracy, governance, policy, legal, or expert knowledge, as these often exist outside the scope of the usage patterns carried out by regular (non-expert) users. Examples of such use cases include medical diagnostics, banking or financial recommendations, hiring or job placement recommendations, or recommendations related to housing.

## Limitations

In the generation of prompt variants specifically, it is important to understand that while AI systems are incredibly valuable tools, they are **non-deterministic**. This means that perfect **accuracy** (the measure of how well system outputs correspond to correct results) is not possible. A good model will have high accuracy, but it will occasionally produce incorrect output. Failure to understand this limitation can lead to over-reliance on the system and unmerited decisions that can affect stakeholders.

Furthermore, the prompt variants generated with LLMs are returned to the user as is. We encourage you to evaluate and compare these variants to determine the best prompt for a given scenario. There is an **additional concern** here: many of the evaluations offered in the Prompt Flow ecosystem also depend on LLMs, which can compound the uncertainty in judging any given prompt. Manual review is strongly recommended.
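
The manual-review step recommended above can be as simple as collecting human ratings per variant and selecting the highest-scoring one. The scoring scheme below (reviewer ratings averaged per variant) is an assumption for illustration, not a Prompt Flow API.

```python
# A minimal sketch of manual review: average human ratings per variant and
# pick the winner. The rating scale and variant names are assumptions.
from statistics import mean

def pick_best_variant(scores: dict[str, list[float]]) -> str:
    """Return the variant whose review scores have the highest mean."""
    if not scores:
        raise ValueError("no variants to compare")
    return max(scores, key=lambda variant: mean(scores[variant]))

reviews = {
    "variant_0": [3.0, 4.0, 3.5],
    "variant_1": [4.5, 4.0, 5.0],  # highest average, so this one is selected
    "variant_2": [2.0, 3.0, 2.5],
}
print(pick_best_variant(reviews))  # variant_1
```

In practice, such human ratings can complement (rather than replace) LLM-based evaluation flows, mitigating the compounding-uncertainty concern noted above.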

### Technical limitations, operational factors, and ranges

As mentioned previously, the Auto-Generate Prompt Variants feature does not provide a measurement or evaluation of the generated prompt variants. It is strongly recommended that users of this feature evaluate the suggested prompts in the way that best aligns with their specific use case and requirements.

The Auto-Generate Prompt Variants feature is limited to generating a maximum of five variations from a given base prompt. If more are required, additional prompt variants can be generated after modifying the original base prompt.

Auto-Generate Prompt Variants supports only Azure OpenAI models at this time. In addition to limiting users to the models supported by Azure OpenAI, this also limits content to what is acceptable under Azure OpenAI's content management policy. Uses outside of this policy are not supported by the feature.

## System performance

Performance of the Auto-Generate Prompt Variants feature depends on the user's use case in each individual scenario; the feature does not evaluate each prompt or generate metrics itself.

Operating in the Prompt Flow ecosystem, which focuses on prompt engineering, provides a strong story for error handling. Often, retrying the operation resolves an error. One error specific to this feature is response filtering by the Azure OpenAI resource for content or harm detection. This happens when content in the base prompt is determined to violate Azure OpenAI's content management policy. To resolve these errors, update the base prompt in accordance with the guidance at [Azure OpenAI Service content filtering](/azure/ai-services/openai/concepts/content-filter).
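
The distinction above matters for automation: transient errors are worth retrying, but content-filter rejections are only fixed by revising the base prompt. A generic retry wrapper might look like the following sketch; the `ContentFilterError` exception type is a placeholder, not a real Azure OpenAI SDK class.

```python
# Sketch of the "retry the operation" guidance. ContentFilterError is a
# placeholder standing in for a content-management rejection from Azure
# OpenAI; such rejections are re-raised immediately because retrying cannot
# fix them; the base prompt itself must be revised.
import time

class ContentFilterError(Exception):
    """Placeholder for a content-management rejection from Azure OpenAI."""

def generate_with_retry(generate, attempts: int = 3, delay: float = 1.0):
    """Retry transient failures; surface content-filter errors immediately."""
    last_error = None
    for attempt in range(attempts):
        try:
            return generate()
        except ContentFilterError:
            raise  # revising the base prompt is the only fix
        except Exception as err:  # transient failure: back off and retry
            last_error = err
            time.sleep(delay * (attempt + 1))
    raise last_error
```

Here `generate` would be whatever callable issues the variant-generation request; the linear backoff is an arbitrary choice for the sketch.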

### Best practices for improving system performance

To improve performance, there are several parameters that can be modified, depending on your use case and prompt requirements:

- **Model**: The choice of model used with this feature affects performance. As general guidance, the GPT-4 model is more powerful than GPT-3.5 and can therefore be expected to generate more performant prompt variants.
- **Number of Variants**: This parameter specifies how many variants to generate. A larger number of variants produces more prompts and therefore increases the likelihood of finding the best prompt for the use case.
- **Base Prompt**: Because this tool generates variants of the provided base prompt, a strong base prompt sets up the tool to provide maximum value for your case. Review the guidelines at [Prompt engineering techniques with Azure OpenAI](/azure/ai-services/openai/concepts/advanced-prompt-engineering).
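
The three parameters above could be bundled as a small settings object when scripting experiments; this is purely an illustrative sketch, and the class, field names, and defaults are assumptions rather than Prompt Flow configuration.

```python
# Hypothetical bundle of the tuning parameters listed above; not a Prompt
# Flow API. Field names and defaults are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class VariantSettings:
    model: str = "gpt-4"   # more capable models tend to yield stronger variants
    n_variants: int = 3    # up to the feature's maximum of five
    base_prompt: str = ""  # a strong base prompt gives the tool the most to work with

    def __post_init__(self):
        if not 1 <= self.n_variants <= 5:
            raise ValueError("n_variants must be between 1 and 5")

settings = VariantSettings(base_prompt="Classify the sentiment of this review.")
```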

## Evaluation of Auto-Generate Prompt Variants

### Evaluation methods

The Auto-Generate Prompt Variants feature has been tested by the internal development team, targeting fitness for purpose and harm mitigation.

### Evaluation results

Evaluation of harm management showed strong support for the combination of the system prompt and Azure OpenAI content management policies in actively safeguarding responses. Additional opportunities to minimize the risk of harms can be found in the Microsoft documentation: [Azure OpenAI Service abuse monitoring](/azure/ai-services/openai/concepts/abuse-monitoring) and [Azure OpenAI Service content filtering](/azure/ai-services/openai/concepts/content-filter).

Fitness-for-purpose testing supported the quality of generated prompts for creative purposes (poetry) and chat-bot agents. The reader is cautioned against drawing sweeping conclusions given the breadth of possible base prompts and potential use cases. As previously mentioned, use evaluations appropriate to your required use cases and ensure that a human reviewer is part of the process.

## Evaluating and integrating Auto-Generate Prompt Variants for your use

The performance of the Auto-Generate Prompt Variants feature varies depending on the base prompt and the use case in which it is used. The real-world usefulness of the generated prompts depends on a combination of the many elements of the system in which each prompt is used.

To ensure optimal performance in their scenarios, customers should conduct their own evaluations of the solutions they implement using Auto-Generate Prompt Variants. Customers should, generally, follow an evaluation process that:

- Uses internal stakeholders to evaluate any generated prompt.
- Uses internal stakeholders to evaluate the results of any system that uses a generated prompt.
- Incorporates KPIs (key performance indicators) and metrics monitoring when deploying the service to confirm that it meets evaluation targets with the generated prompts.

## Learn more about responsible AI

- [Microsoft AI principles](https://www.microsoft.com/ai/responsible-ai)
- [Microsoft responsible AI resources](https://www.microsoft.com/ai/responsible-ai-resources)
- [Microsoft Azure Learning courses on responsible AI](/training/paths/responsible-ai-business-principles/)

## Learn more about Auto-Generate Prompt Variants

- [What is prompt flow](./overview-what-is-prompt-flow.md)