Commit 18ce037

Merge pull request #281497 from ssalgadodev/patch-129
Data Generation and Distillation
2 parents 5735acf + 969a089 commit 18ce037

3 files changed, +86 -0 lines changed
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
title: Distillation in AI Studio
titleSuffix: Azure AI Studio
description: Learn how to use distillation in Azure AI Studio.
manager: scottpolly
ms.service: azure-ai-studio
ms.topic: how-to
ms.date: 07/23/2024
ms.reviewer: vkann
reviewer: anshirga
ms.author: ssalgado
author: ssalgadodev
ms.custom: references_regions
---

# Distillation in Azure AI Studio

In this article
- [Distillation](#distillation)
- [Next steps](#next-steps)

In Azure AI Studio, you can use distillation to efficiently train a smaller student model from a larger teacher model.

## Distillation

In machine learning, distillation is a technique used to transfer knowledge from a large, complex model (often called the “teacher model”) to a smaller, simpler model (the “student model”). This process helps the smaller model approach the performance of the larger one while using less computation and memory.

The main steps in knowledge distillation are:

- **Using the teacher model** to generate predictions for the dataset.
- **Training the student model** on these predictions, along with the original dataset, to mimic the teacher model’s behavior.
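The two steps above can be illustrated with a toy sketch. This is not how the sample notebook distills Llama models: it swaps in a tiny logistic-regression setting so it runs with only the Python standard library. `teacher_predict`, `distill`, and the blending weight `alpha` are hypothetical names introduced for illustration. The teacher's probabilities (soft labels) are blended with the original labels, and the student is trained with binary cross-entropy against that blend.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def teacher_predict(x):
    """Hypothetical stand-in teacher: a fixed model assumed to be accurate.

    In the sample notebook the teacher is a large LLM; here it is a tiny
    logistic model so the example runs without any service.
    """
    return sigmoid(3.0 * x[0] - 2.0 * x[1] + 0.5)


def student_predict(w, b, x):
    """Student probability for input x under weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def distill(dataset, alpha=0.7, lr=0.5, epochs=300):
    """Train a logistic-regression student on teacher predictions plus true labels.

    Each target is alpha * teacher_probability + (1 - alpha) * true_label, and the
    student minimizes binary cross-entropy against it with full-batch gradient descent.
    """
    # Step 1: use the teacher model to generate predictions for the dataset.
    soft_labels = [teacher_predict(x) for x, _ in dataset]

    n = len(dataset)
    dim = len(dataset[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        # Step 2: train the student on those predictions and the original labels.
        for (x, y), t in zip(dataset, soft_labels):
            target = alpha * t + (1 - alpha) * y
            err = student_predict(w, b, x) - target  # d(BCE)/d(logit) for a sigmoid output
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: the "original" labels follow the same rule as the teacher,
# so the teacher is perfect by construction in this sketch.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 1 if teacher_predict(x) > 0.5 else 0))

w, b = distill(data)
agreement = sum(
    (student_predict(w, b, x) > 0.5) == (teacher_predict(x) > 0.5) for x, _ in data
) / len(data)
print(f"student/teacher agreement: {agreement:.2%}")
```

Setting `alpha=1.0` trains the student only on the teacher's predictions; lower values give the original labels more weight.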

You can use the [distillation sample notebook](https://aka.ms/meta-llama-3.1-distillation) to see how to perform distillation. In this sample notebook, the teacher model is Meta Llama 3.1 405B Instruct, and the student model is Meta Llama 3.1 8B Instruct.

The sample uses an advanced prompt during synthetic data generation that incorporates chain-of-thought (CoT) reasoning, which produces more accurate data labels in the synthetic data. These more accurate labels further improve the accuracy of the distilled model.
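As a rough sketch of the chain-of-thought labeling idea described above, the following assumes a multiple-choice task and a teacher that ends its reasoning with a line like `Answer: B`. The template, `extract_label`, and the JSONL record shape are hypothetical and may differ from the sample notebook. The idea shown is that the teacher reasons step by step, and only its final answer is kept as the synthetic label. No model is called here; the teacher response is a canned string.

```python
import json
import re
from typing import List, Optional

# Hypothetical chain-of-thought prompt; the sample notebook's real prompt may differ.
COT_TEMPLATE = (
    "Answer the following multiple-choice question.\n"
    "Think through the problem step by step, then write your final answer "
    "on its own line in the form 'Answer: <letter>'.\n\n"
    "Question: {question}\n"
    "Choices:\n{choices}"
)

# Matches text such as "Answer: B"; when several appear, the last one wins.
ANSWER_RE = re.compile(r"Answer:\s*([A-Z])\b")


def build_cot_prompt(question: str, choices: List[str]) -> str:
    """Format the question and lettered choices into the CoT prompt."""
    lettered = "\n".join(f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(choices))
    return COT_TEMPLATE.format(question=question, choices=lettered)


def extract_label(teacher_output: str) -> Optional[str]:
    """Return the teacher's final answer letter, discarding its reasoning."""
    matches = ANSWER_RE.findall(teacher_output)
    return matches[-1] if matches else None


def to_training_record(question: str, choices: List[str], teacher_output: str) -> Optional[str]:
    """Build one JSONL line for student training, or None if no answer was found."""
    label = extract_label(teacher_output)
    if label is None:
        return None  # Drop rows where the teacher did not follow the answer format.
    return json.dumps({"question": question, "choices": choices, "label": label})


# Example with a canned teacher response (hypothetical, not real model output).
prompt = build_cot_prompt("What is 6 * 7?", ["36", "42", "48"])
teacher_output = "6 * 7 means six groups of seven, which is 42.\nAnswer: B"
record = to_training_record("What is 6 * 7?", ["36", "42", "48"], teacher_output)
print(prompt)
print(record)
```

Whether the student's training data should also include the teacher's reasoning text is a design choice this sketch does not settle.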

## Next steps

- [What is Azure AI Studio?](../what-is-ai-studio.md)
- [Learn more about deploying Meta Llama models](../how-to/deploy-models-llama.md)
- [Azure AI FAQ article](../faq.yml)
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
---
title: Synthetic data generation in AI Studio
titleSuffix: Azure AI Studio
description: Learn how to generate synthetic datasets in Azure AI Studio.
manager: scottpolly
ms.service: azure-ai-studio
ms.topic: how-to
ms.date: 07/23/2024
ms.reviewer: vkann
reviewer: anshirga
ms.author: ssalgado
author: ssalgadodev
ms.custom: references_regions
---

# Synthetic data generation in Azure AI Studio

In this article
- [Synthetic data generation](#synthetic-data-generation)
- [Next steps](#next-steps)

In Azure AI Studio, you can use synthetic data generation to efficiently produce predictions for your datasets.

## Synthetic data generation

Synthetic data generation involves creating artificial data that mimics the statistical properties of real-world data. This data is generated using algorithms and machine learning techniques, and it can be produced in various ways, such as through computer simulations or by modeling real-world events.
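To make "mimics the statistical properties" concrete, here is a minimal sketch under simple assumptions: numeric tabular data, with each column modeled as an independent normal distribution. This is a generic statistical illustration, not the approach in the linked sample notebook; `fit_columns` and `generate_synthetic` are hypothetical helpers, and the "real" data is itself simulated.

```python
import random
import statistics


def fit_columns(rows):
    """Estimate the mean and sample standard deviation of each numeric column."""
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*rows)]


def generate_synthetic(params, n, seed=None):
    """Draw n synthetic rows; column j is sampled from N(mean_j, std_j).

    Columns are sampled independently, so correlations between columns in the
    real data are NOT preserved by this simple sketch.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


# Stand-in "real" data: hypothetical height (cm) and weight (kg) measurements.
real_rng = random.Random(1)
real_rows = [[real_rng.gauss(170, 10), real_rng.gauss(70, 5)] for _ in range(1000)]

params = fit_columns(real_rows)
synthetic_rows = generate_synthetic(params, 5000, seed=2)
synthetic_params = fit_columns(synthetic_rows)

for (mu, sd), (smu, ssd) in zip(params, synthetic_params):
    print(f"real mean={mu:.1f} sd={sd:.1f} | synthetic mean={smu:.1f} sd={ssd:.1f}")
```

Richer generators (for example, ones that model joint distributions or use a language model) preserve more structure, at the cost of more complexity.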

In machine learning, synthetic data is particularly valuable for the following reasons:

**Data Augmentation:** It expands the size of training datasets, which helps in training robust machine learning models. This is especially useful when real-world data is scarce or expensive to obtain.

**Testing and Validation:** It allows extensive testing and validation of machine learning models under various scenarios without needing real-world data for every scenario.
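For the data augmentation case, one simple and widely used technique is to add small random noise to existing examples. The sketch below assumes numeric features and that a small perturbation never changes an example's label; `augment_with_jitter` is a hypothetical helper, not part of Azure AI Studio.

```python
import random


def augment_with_jitter(dataset, copies=3, noise=0.05, seed=None):
    """Return the original examples plus `copies` noisy variants of each one.

    Each variant adds Gaussian noise (standard deviation `noise`) to every
    feature and keeps the original label, which assumes small perturbations
    do not change an example's class.
    """
    rng = random.Random(seed)
    augmented = list(dataset)  # keep the real examples first
    for features, label in dataset:
        for _ in range(copies):
            noisy = [f + rng.gauss(0.0, noise) for f in features]
            augmented.append((noisy, label))
    return augmented


# A deliberately tiny "real" dataset of (features, label) pairs.
real = [([0.1, 0.9], 0), ([0.8, 0.2], 1), ([0.2, 0.7], 0), ([0.9, 0.1], 1)]
train = augment_with_jitter(real, copies=3, seed=0)
print(len(real), "real examples ->", len(train), "training examples")
```

The `noise` value controls the trade-off: too small and the copies add little variety, too large and the "same label" assumption stops holding.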

To see how to generate synthetic data, use the [synthetic data generation sample notebook](https://aka.ms/meta-llama-3.1-datagen).

## Next steps

- [What is Azure AI Studio?](../what-is-ai-studio.md)
- [Learn more about deploying Meta Llama models](../how-to/deploy-models-llama.md)
- [Azure AI FAQ article](../faq.yml)

articles/ai-studio/toc.yml

Lines changed: 4 additions & 0 deletions
@@ -88,6 +88,8 @@ items:
 displayName: endpoint
 - name: Fine-tune models
 href: concepts/fine-tuning-overview.md
+- name: Distillation
+href: concepts/concept-model-distillation.md
 - name: Serverless API models
 items:
 - name: Deploy models as serverless API
@@ -142,6 +144,8 @@ items:
 href: how-to/index-add.md
 - name: Build and consume indexes using code
 href: how-to/develop/index-build-consume-sdk.md
+- name: Synthetic Data Generation
+href: concepts/concept-synthetic-data.md
 displayName: code,sdk
 - name: Develop generative AI apps
 items:
