Merge pull request #281497 from ssalgadodev/patch-129

Stacyrch140 · web-flow · commit 18ce037691a6 · 2024-07-23T12:28:55.000-04:00
Data Generation and Distillation
diff --git a/articles/ai-studio/concepts/concept-model-distillation.md b/articles/ai-studio/concepts/concept-model-distillation.md
@@ -0,0 +1,42 @@
+---
+title: Distillation in AI Studio
+titleSuffix: Azure AI Studio
+description: Learn how to use distillation in Azure AI Studio.
+manager: scottpolly
+ms.service: azure-ai-studio
+ms.topic: how-to
+ms.date: 07/23/2024
+ms.reviewer: vkann
+reviewer: anshirga
+ms.author: ssalgado
+author: ssalgadodev
+ms.custom: references_regions
+---
+
+# Distillation in Azure AI Studio
+
+In this article
+  - [Distillation](#distillation)
+  - [Next steps](#next-steps)
+
+In Azure AI Studio, you can use distillation to efficiently train a smaller student model from a larger teacher model.
+
+## Distillation
+
+In machine learning, distillation is a technique used to transfer knowledge from a large, complex model (often called the “teacher model”) to a smaller, simpler model (the “student model”). This process helps the smaller model achieve similar performance to the larger one while being more efficient in terms of computation and memory usage.
+
+The main steps in knowledge distillation are:
+
+- **Using the teacher model** to generate predictions for the dataset.
+
+- **Training the student model** using these predictions, along with the original dataset, to mimic the teacher model’s behavior.
+
+To see how to perform distillation, use the [sample notebook](https://aka.ms/meta-llama-3.1-distillation). In this notebook, the teacher model is Meta Llama 3.1 405B Instruct and the student model is Meta Llama 3.1 8B Instruct.
+
+The sample notebook uses an advanced prompt during synthetic data generation. The prompt incorporates chain-of-thought (CoT) reasoning, which produces more accurate labels in the synthetic data and, in turn, improves the accuracy of the distilled model.
+
+## Next steps
+- [What is Azure AI Studio?](../what-is-ai-studio.md)
+- [Learn more about deploying Meta Llama models](../how-to/deploy-models-llama.md)
+
+- [Azure AI FAQ article](../faq.yml)
diff --git a/articles/ai-studio/concepts/concept-synthetic-data.md b/articles/ai-studio/concepts/concept-synthetic-data.md
@@ -0,0 +1,40 @@
+---
+title: Synthetic data generation in AI Studio
+titleSuffix: Azure AI Studio
+description: Learn how to generate a synthetic dataset in Azure AI Studio.
+manager: scottpolly
+ms.service: azure-ai-studio
+ms.topic: how-to
+ms.date: 07/23/2024
+ms.reviewer: vkann
+reviewer: anshirga
+ms.author: ssalgado
+author: ssalgadodev
+ms.custom: references_regions
+---
+
+# Synthetic data generation in Azure AI Studio
+
+In this article
+  - [Synthetic data generation](#synthetic-data-generation)
+  - [Next steps](#next-steps)
+
+In Azure AI Studio, you can use synthetic data generation to efficiently produce predictions for your datasets.
+
+## Synthetic data generation
+
+Synthetic data generation involves creating artificial data that mimics the statistical properties of real-world data. This data is generated using algorithms and machine learning techniques, for example through computer simulations or by modeling real-world events.
+
+In machine learning, synthetic data is particularly valuable for several reasons:
+
+**Data Augmentation:** It helps in expanding the size of training datasets, which is crucial for training robust machine learning models. This is especially useful when real-world data is scarce or expensive to obtain.
+
+**Testing and Validation:** It allows for extensive testing and validation of machine learning models under various scenarios without the need for real-world data.
+
+To see how to generate synthetic data, use the [sample notebook](https://aka.ms/meta-llama-3.1-datagen).
+
+## Next steps
+- [What is Azure AI Studio?](../what-is-ai-studio.md)
+- [Learn more about deploying Meta Llama models](../how-to/deploy-models-llama.md)
+
+- [Azure AI FAQ article](../faq.yml)
diff --git a/articles/ai-studio/toc.yml b/articles/ai-studio/toc.yml
@@ -88,6 +88,8 @@ items:
       displayName: endpoint
     - name: Fine-tune models
       href: concepts/fine-tuning-overview.md
+    - name: Distillation
+      href: concepts/concept-model-distillation.md
     - name: Serverless API models
       items:
       - name: Deploy models as serverless API
@@ -142,6 +144,8 @@ items:
       href: how-to/index-add.md
     - name: Build and consume indexes using code
       href: how-to/develop/index-build-consume-sdk.md
       displayName: code,sdk
+    - name: Synthetic data generation
+      href: concepts/concept-synthetic-data.md
   - name: Develop generative AI apps
     items: