Commit a7f0ed8

Concepts - Fine-tuning language models

1 parent 2618058 commit a7f0ed8
File tree

3 files changed (+55, −5 lines)

3 files changed

+55
-5
lines changed

articles/aks/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -154,6 +154,8 @@
 items:
 - name: Small and large language models
   href: concepts-ai-ml-language-models.md
+- name: Fine-tuning language models
+  href: concepts-fine-tune-language-models.md
 - name: Advanced Container Networking Services
   items:
   - name: Advanced Container Networking Services overview

articles/aks/concepts-ai-ml-language-models.md

Lines changed: 2 additions & 0 deletions
@@ -85,10 +85,12 @@ For more information, see [Deploy an AI model on AKS with the AI toolchain opera
 To learn more about containerized AI and machine learning workloads on AKS, see the following articles:

 * [Use KAITO to forecast energy usage with intelligent apps][forecast-energy-usage]
+* [Concepts - Fine-tuning language models][fine-tune-language-models]
 * [Build and deploy data and machine learning pipelines with Flyte on AKS][flyte-aks]

 <!-- LINKS -->
 [ai-toolchain-operator]: ./ai-toolchain-operator.md
 [forecast-energy-usage]: https://azure.github.io/Cloud-Native/60DaysOfIA/forecasting-energy-usage-with-intelligent-apps-1/
 [flyte-aks]: ./use-flyte.md
 [kaito-repo]: https://github.com/Azure/kaito/tree/main/presets
+[fine-tune-language-models]: ./concepts-fine-tune-language-models.md
articles/aks/concepts-fine-tune-language-models.md

Lines changed: 51 additions & 5 deletions
@@ -1,17 +1,63 @@
 ---
-title: Concepts - Customizing language models for AI and machine learning workflows
+title: Concepts - Fine-tuning language models for AI and machine learning workflows
 description: Learn about how you can customize language models to use in your AI and machine learning workflows on Azure Kubernetes Service (AKS).
 ms.topic: conceptual
-ms.date: 06/24/2024
+ms.date: 07/01/2024
 author: schaffererin
 ms.author: schaffererin
 ---

-# Concepts - Customizing language models for AI and machine learning workflows
+# Concepts - Fine-tuning language models for AI and machine learning workflows

-In this article, you learn about customizing language models, including some common methods and how applying the results to your models can improve the performance of your AI and machine learning workflows on Azure Kubernetes Service (AKS).
+In this article, you learn about fine-tuning [language models][language-models], including some common methods and how applying the results to your models can improve the performance of your AI and machine learning workflows on Azure Kubernetes Service (AKS).

 ## Pre-trained language models

-*Pre-trained language models* offer an accessible way to get started with AI inferencing and are widely used in natural language processing (NLP). Pre-trained language models are trained on large-scale text corpora from the internet using deep neural networks and can be fine-tuned on smaller datasets for specific tasks.
+*Pre-trained language models (PTMs)* offer an accessible way to get started with AI inferencing and are widely used in natural language processing (NLP). PTMs are trained on large-scale text corpora from the internet using deep neural networks and can be fine-tuned on smaller datasets for specific tasks. These models typically consist of billions of parameters, or *weights*, that are learned during the pre-training process.

+PTMs can learn universal language representations that capture the statistical properties of natural language, such as the probability of words or sequences of words occurring in a given context. These representations can be transferred to downstream tasks, such as text classification, named entity recognition, and question answering, by fine-tuning the model on task-specific datasets.
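The transfer step that this paragraph describes can be as small as a few lines of code. The following sketch loads a PTM that was already fine-tuned for a downstream sentiment classification task and runs inference against it; the Hugging Face transformers library and the model checkpoint are illustrative assumptions, not choices the article makes:

```python
# A minimal sketch, assuming the Hugging Face "transformers" library and a
# publicly available checkpoint; neither is prescribed by the article.
from transformers import pipeline

# Load a PTM that was fine-tuned for sentiment classification (a downstream task).
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("Fine-tuning on AKS went smoothly."))
# Example output: [{'label': 'POSITIVE', 'score': 0.99...}]
```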
+### Pros and cons

+The following table lists some pros and cons of using PTMs in your AI and machine learning workflows:

+| Pros | Cons |
+|------|------|
+| • Speeds up development and deployment. <br> • Improves model accuracy and generalization. <br> • Reduces the need for large labeled datasets. | • Requires large computational resources. <br> • Might not be suitable for all tasks or domains. <br> • Might introduce biases or errors in the output. |

+## Fine-tuning methods

+### Parameter efficient fine-tuning

+*Parameter efficient fine-tuning (PEFT)* is a method for fine-tuning PTMs on small datasets with limited computational resources. PEFT uses a combination of techniques, such as data augmentation, regularization, and transfer learning, to improve the performance of the model on specific tasks. PEFT requires minimal compute resources and flexible quantities of data, making it suitable for low-resource settings. This method allows you to retain most of the weights of the original pre-trained model and update only the remaining weights to fit context-specific, labeled data.
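At its core, the weight-retention idea in the paragraph above is selective freezing. The following PyTorch sketch freezes the pre-trained weights and leaves only a small task-specific head trainable; the checkpoint and the `classifier` attribute name are illustrative assumptions:

```python
# A minimal PEFT-style sketch, assuming Hugging Face "transformers" and
# PyTorch; the checkpoint and the "classifier" attribute are illustrative.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Freeze every pre-trained weight so the original model is retained...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the small, task-specific classification head.
for param in model.classifier.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable:,} of {total:,} parameters")
```

Only the unfrozen parameters accumulate gradients, which is what keeps the compute and memory cost low.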
+### Low rank adaptation

+*Low rank adaptation (LoRA)* is a PEFT method commonly used to customize large language models for new tasks. This method tracks changes to model weights and efficiently stores smaller weight matrices that represent only the model's trainable parameters, reducing memory usage and the compute power needed for fine-tuning. LoRA creates fine-tuning results, known as *adapter layers*, that can be temporarily stored and pulled into the model's architecture for new inferencing jobs.
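As a concrete illustration, the Hugging Face peft library implements LoRA roughly as described above. This sketch attaches low-rank adapters to a base model and saves only the small adapter weights; the library, checkpoint, and target module names are assumptions rather than anything the article specifies:

```python
# A minimal LoRA sketch, assuming the Hugging Face "peft" and "transformers"
# libraries; the checkpoint and target module names are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# After training, save just the small adapter layers, not the full model.
model.save_pretrained("opt-350m-lora-adapter")
```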
+*Quantized low rank adaptation (QLoRA)* is an extension of LoRA that further reduces memory usage by quantizing the frozen base model weights while fine-tuning the adapter layers. For more information, see [Making LLMs even more accessible with bitsandbytes, 4-bit quantization, and QLoRA][qlora].
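Continuing the sketch above under the same assumptions, QLoRA-style training loads the frozen base model in 4-bit precision with bitsandbytes and then attaches the LoRA adapters; the specific NF4 settings mirror the blog post linked above:

```python
# A minimal QLoRA-style sketch, assuming "transformers", "peft", and
# "bitsandbytes"; the checkpoint and settings are illustrative.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
)

base = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(base, lora_config)
```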
+## Experiment with fine-tuning language models on AKS

+Kubernetes AI Toolchain Operator (KAITO) is an open-source operator that automates small and large language model deployments in Kubernetes clusters. The KAITO add-on for AKS simplifies onboarding and reduces the time-to-inference for open-source models on your AKS clusters. The add-on automatically provisions right-sized GPU nodes and sets up the associated inference server as an endpoint server for your chosen model.

+In the upcoming open-source KAITO release, you can efficiently fine-tune supported MIT and Apache 2.0 licensed models with the following features:

+* Store your retraining data as a container image in a private container registry.
+* Host the new adapter layer image in a private container registry.
+* Efficiently pull the image for inferencing with adapter layers in new scenarios.

+To learn more about using KAITO with your AKS clusters, see the [KAITO model GitHub repository][kaito-repo].
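For orientation, a KAITO inference workspace is a Kubernetes custom resource that you can create like any other object. The following sketch uses the official Kubernetes Python client; the Workspace schema, preset name, and GPU SKU shown here are assumptions based on the KAITO presets repository, so verify the exact shape for your KAITO version:

```python
# A hedged sketch of creating a KAITO Workspace with the Kubernetes Python
# client. ASSUMPTIONS: the "kaito.sh/v1alpha1" Workspace schema, the
# "falcon-7b" preset, and the GPU SKU are illustrative; verify them against
# the KAITO repository before use.
from kubernetes import client, config

config.load_kube_config()  # use your current AKS kubeconfig context

workspace = {
    "apiVersion": "kaito.sh/v1alpha1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-falcon-7b"},
    "resource": {
        "instanceType": "Standard_NC12s_v3",  # right-sized GPU node SKU
        "labelSelector": {"matchLabels": {"apps": "falcon-7b"}},
    },
    "inference": {"preset": {"name": "falcon-7b"}},
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kaito.sh",
    version="v1alpha1",
    namespace="default",
    plural="workspaces",
    body=workspace,
)
```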
+## Next steps

+To learn more about containerized AI and machine learning workloads on AKS, see the following articles:

+* [Concepts - Small and large language models][language-models]
+* [Build and deploy data and machine learning pipelines with Flyte on AKS][flyte-aks]

+<!-- LINKS -->
+[flyte-aks]: ./use-flyte.md
+[kaito-repo]: https://github.com/Azure/kaito/tree/main/presets
+[language-models]: ./concepts-ai-ml-language-models.md
+[qlora]: https://huggingface.co/blog/4bit-transformers-bitsandbytes
