Commit 900c630

New conceptual doc on language models for AI and machine learning
1 parent 07f8ae5 commit 900c630

2 files changed: +92 −0 lines changed

articles/aks/TOC.yml

Lines changed: 4 additions & 0 deletions

@@ -150,6 +150,10 @@
      href: concepts-network-services.md
    - name: Ingress
      href: concepts-network-ingress.md
+   - name: AI and machine learning
+     items:
+       - name: Small and large language models
+         href: concepts-ai-ml-language-models.md
    - name: Advanced Container Networking Services
      items:
        - name: Advanced Container Networking Services overview
articles/aks/concepts-ai-ml-language-models.md (new file)

Lines changed: 88 additions & 0 deletions
---
title: Concepts - Small and large language models
description: Learn about small and large language models, including when and how you can use them with your Azure Kubernetes Service (AKS) AI and machine learning workloads.
ms.topic: conceptual
ms.date: 06/14/2024
author: schaffererin
ms.author: schaffererin
---
# Concepts - Small and large language models

In this article, you learn about small and large language models, including when to use them and how you can use them with your Azure Kubernetes Service (AKS) AI and machine learning workloads.

## What are language models?

Language models are powerful machine learning models used for natural language processing (NLP) tasks, such as text generation and sentiment analysis. These models represent natural language based on the probability of words or sequences of words occurring in a given context.

*Conventional language models* are used in supervised settings, where they're trained on well-labeled text datasets for specific tasks. *Pretrained language models*, on the other hand, are trained on large-scale text corpora from the internet using deep neural networks, and can be fine-tuned on smaller datasets for specific tasks.

The size of a language model is determined by its number of parameters: the weights that determine how the model processes input data and generates output. Parameters are learned during training by adjusting the weights to minimize the difference between the model's predictions and the actual data. The more parameters a model has, the more complex and expressive it is, but also the more computationally expensive it is to train and run.
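To make the idea of a parameter count concrete, the following sketch counts the weights and biases in a small feedforward network. The layer sizes are hypothetical and chosen only for illustration; real language models add embedding tables, attention blocks, and layer norms on top of this:

```python
def count_parameters(layer_sizes):
    """Count parameters of a feedforward net: each layer has an
    (inputs x outputs) weight matrix plus an output-sized bias vector."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# Hypothetical tiny model: 512-dim input, two 1,024-unit hidden layers, 512-dim output.
print(count_parameters([512, 1024, 1024, 512]))  # → 2099712, about 2.1 million parameters
```

Scaling each layer width by a factor of *k* scales the weight count by roughly *k²*, which is why parameter counts (and training costs) grow so quickly with model size.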
In general, ***small language models*** have *fewer than roughly 10 billion parameters*, while ***large language models*** have *more than 10 billion parameters*. Model families also commonly ship in multiple sizes. For example, GPT-2 has four versions: small (124 million parameters), medium (355 million parameters), large (774 million parameters), and extra-large (1.5 billion parameters).
## When to use small language models

### Advantages

Small language models are a good choice if you want models that are:

* **Faster and cheaper to train and run**: They require less data and compute power.
* **Easier to deploy and maintain**: They have smaller storage and memory footprints.
* **Less prone to *overfitting***, which is when a model learns the noise or specific patterns of the training data and fails to generalize to new data.
* **Interpretable and explainable**: They have fewer parameters and components to understand and analyze.
### Use cases

Small language models are suitable for use cases that involve:

* **Limited data or resources**, where you need a quick and simple solution.
* **Well-defined or narrow tasks**, where you don't need much creativity in the output.
* **High-precision and low-recall tasks**, where you value accuracy and quality over coverage and quantity.
* **Sensitive or regulated tasks**, where you need to ensure the transparency and accountability of the model.
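The precision/recall trade-off mentioned in these use cases can be made concrete with a short sketch. The helper function and toy data below are illustrative, not part of any library:

```python
def precision_recall(relevant, retrieved):
    """relevant: items that should be returned; retrieved: items the model returned."""
    relevant, retrieved = set(relevant), set(retrieved)
    true_positives = len(relevant & retrieved)
    precision = true_positives / len(retrieved)  # how many returned items are correct
    recall = true_positives / len(relevant)      # how many correct items were returned
    return precision, recall

# A high-precision, low-recall model returns few answers, but they're all correct:
p, r = precision_recall(relevant=["a", "b", "c", "d"], retrieved=["a", "b"])
print(p, r)  # → 1.0 0.5
```

A smaller model tuned for a narrow task often lands in this high-precision regime: it covers less ground, but what it returns is reliable.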
The following table lists some popular, high-performance small language models:

| Model family | Model sizes (number of parameters) | Software license |
|--------------|------------------------------------|------------------|
| Microsoft Phi-3 | Phi-3-mini (3.8 billion), Phi-3-small (7 billion), Phi-3-medium (14 billion) | MIT license |
| Meta Llama 3 | Llama 3 8B (8 billion) | Meta license |
| Mistral open-weight models | Mistral 7B (7.3 billion) | Apache 2.0 license |
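As a sketch of what running one of these models on AKS might look like, the following Deployment manifest serves a containerized inference server. The image name, model choice, and resource figures are hypothetical assumptions for illustration, not an official example:

```yaml
# Hypothetical example: serve a small language model on AKS.
# Image name, registry, and resource requests are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slm-inference
  template:
    metadata:
      labels:
        app: slm-inference
    spec:
      containers:
        - name: inference-server
          image: myregistry.azurecr.io/phi-3-mini-server:latest  # hypothetical image
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
            limits:
              nvidia.com/gpu: 1  # optional; small models can also run CPU-only
```

Because small models have modest memory footprints, a single node pool with one GPU (or even CPU-only nodes) can often serve them, which keeps cluster costs down.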
## When to use large language models

### Advantages

Large language models are a good choice if you want models that are:

* **Powerful and expressive**: They can capture more complex and diverse patterns and relationships in the data.
* **General and adaptable**: They can handle a wider range of tasks and domains and transfer knowledge across them.
* **Creative and innovative**: They can generate more original and varied outputs and sometimes discover new knowledge.
* **Robust and consistent**: They can handle noisy or incomplete inputs and avoid common errors and biases.
### Use cases

Large language models are suitable for use cases that involve:

* **Abundant data and resources**, where you have the budget to build and maintain a complex solution.
* **Broad and open-ended tasks**, where you need creativity and diversity in the output.
* **Low-precision and high-recall tasks**, where you value coverage and quantity over accuracy and quality.
* **Challenging or exploratory tasks**, where you want to leverage the model's capacity to learn and adapt.
The following table lists some popular, high-performance large language models:

| Model family | Model sizes (number of parameters) | Software license |
|--------------|------------------------------------|------------------|
| Meta Llama 3 | Llama 3 70B (70 billion) | Meta license |
| Mistral open-weight models | Mixtral 8x22B (141 billion) | Apache 2.0 license |
## X

## Y

## Z

## Next steps

XYZ
