paragraph: This page explains all the concepts related to Managed Inference
tags:
dates:
  validation: 2024-11-29
categories:
  - ai-data
---
## Large Language Models

LLMs are advanced artificial intelligence systems capable of understanding and generating human-like text on various topics.
These models, such as Llama-3, are trained on vast amounts of data to learn the patterns and structures of language, enabling them to generate coherent and contextually relevant responses to queries or prompts.
LLMs have applications in natural language processing, text generation, translation, and other tasks requiring sophisticated language understanding and production.

## Prompt

In the context of generative AI models, a prompt refers to the input provided to the model to generate a desired response.
It typically consists of a sentence, paragraph, or series of keywords or instructions that guide the model in producing text relevant to the given context or task.
The quality and specificity of the prompt greatly influence the generated output, as the model uses it to understand the user's intent and create responses accordingly.
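As an illustration of how specificity shapes a prompt, the sketch below assembles one from a task, optional context, and optional instructions. The `build_prompt` helper and its field names are hypothetical, for illustration only, and are not part of any Managed Inference API.

```python
# Hypothetical helper: assemble a prompt from a task description,
# optional context, and optional output instructions.
def build_prompt(task: str, context: str = "", instructions: str = "") -> str:
    parts = []
    if instructions:
        parts.append(f"Instructions: {instructions}")
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Task: {task}")
    return "\n".join(parts)

# A vague prompt leaves the model free to guess the desired output...
vague = build_prompt("Summarize the text.")

# ...while added context and instructions constrain it.
specific = build_prompt(
    "Summarize the text.",
    context="Managed Inference lets you deploy AI models on dedicated hardware.",
    instructions="Answer in one sentence, in plain English.",
)
```

Both strings would be sent to the model as-is; the second simply gives it more to work with.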

## Quantization

Quantization is a technique used to reduce the precision of numerical values in a model's parameters or activations to improve efficiency and reduce memory footprint during inference. It involves representing floating-point values with fewer bits while minimizing the loss of accuracy.
AI models provided for deployment are named with suffixes that denote their quantization levels, such as `:int8`, `:fp8`, and `:fp16`.

## Retrieval Augmented Generation (RAG)
|