
Commit c650555

Merge pull request #33 from imohitmayank/training_llm
Training LLM article added + Section rename and shuffle
2 parents 8daef07 + bec7795

File tree

11 files changed (+231, -10 lines)
Image files changed: 247 KB, 193 KB, 492 KB, 360 KB

docs/imgs/rl_rlhf_instructgpt.png

76.2 KB

docs/machine_learning/interview_questions.md

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,8 @@

  Temperature allows you to control the trade-off between exploration and exploitation in the model's predictions. It's a hyperparameter that can be adjusted during training or inference to achieve the desired level of certainty in the model's output, depending on the specific requirements of your application.

+ Here is a good online [tool](https://artefact2.github.io/llm-sampling/index.xhtml) to learn about the impact of temperature and other parameters on output generation.
+
  !!! Question ""
      === "Question"

docs/machine_learning/loss_functions.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
  ## Introduction

- - Loss functions are the "ideal objectives" that neural networks (NN) try to optimize. In fact, they are the mathematical personification of what we want to achieve with the NN. As the name suggests, it is a function that takes an input and computes a loss value that determines how far the current model is from the ideal model for that example. In an ideal world, we would expect the loss value to be 0, but in reality it could get very close to 0 and sometimes even be high enough that we terminate training to handle overfitting.
+ - Loss functions are the "ideal objectives" that neural networks (NN) try to optimize. In fact, they are the mathematical personification of what we want to achieve with the NN. As the name suggests, it is a function that takes an input and computes a loss value that determines how far the current model is from the ideal model for that example. In an ideal world, we would expect the loss value to be 0, but in reality it could get very close to 0.
  - We also have cost functions, which are nothing but the aggregation of the loss function over a batch or the complete dataset. The cost function is what we use in practice to optimize the model.

  !!! Hint
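
To make the loss-vs-cost distinction in the hunk above concrete, here is a minimal NumPy sketch (illustrative only, not code from the repo): a per-example cross-entropy loss, and a cost function that aggregates it over a batch by taking the mean.

```python
import numpy as np

def cross_entropy_loss(probs, target):
    """Per-example loss: how far the predicted distribution is from the true label."""
    return -np.log(probs[target] + 1e-12)

def cost(batch_probs, batch_targets):
    """Cost function: aggregation (here, the mean) of per-example losses over a batch."""
    return np.mean([cross_entropy_loss(p, t) for p, t in zip(batch_probs, batch_targets)])

# A confident, correct prediction gives a loss near 0; a less confident one gives a larger loss.
batch_probs = np.array([[0.9, 0.05, 0.05],   # correct and confident
                        [0.2, 0.7, 0.1]])    # correct but less confident
batch_targets = [0, 1]
print(cross_entropy_loss(batch_probs[0], 0))  # ~0.105
print(cost(batch_probs, batch_targets))       # mean of the two per-example losses
```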

docs/machine_learning/model_compression_quant.md

Lines changed: 3 additions & 1 deletion
@@ -483,4 +483,6 @@ Fine-tuning the model can be done very easily using the `llama.cpp` library. Bel
  [8] LLM.int8() - [Blog](https://huggingface.co/blog/hf-bitsandbytes-integration)

- [9] GGUF/GGML - [Official Docs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) | [Blog - Quantize Llama_2 models using GGML](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | [K Quants](https://github.com/ggerganov/llama.cpp/pull/1684)
+ [9] GGUF/GGML - [Official Docs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) | [Blog - Quantize Llama_2 models using GGML](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | [K Quants](https://github.com/ggerganov/llama.cpp/pull/1684)
+
+ [10] [A Visual Guide to Quantization](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization)

docs/natural_language_processing/training_llm.md

Lines changed: 210 additions & 0 deletions
Large diffs are not rendered by default.

docs/reinforcement_learning/rlhf.md

Lines changed: 9 additions & 1 deletion
@@ -56,6 +56,12 @@ Using human feedback in reinforcement learning has several benefits, but also pr
  - Reinforcement learning from human feedback (RLHF) has shown great potential in improving natural language processing (NLP) tasks. In NLP, the use of human feedback can help to capture the nuances of language and better align the agent's behavior with the user's expectations.

+ <figure markdown>
+ ![](../imgs/rl_rlhf_instructgpt.png)
+ <figcaption>PPO model trained with RLHF outperforming the SFT and base models, from OpenAI's InstructGPT paper. Source: [2]</figcaption>
+ </figure>
+
  ### Summarization

  - One of the first examples of utilizing RLHF in NLP was proposed in [1] to improve summarization using human feedback. Summarization aims to generate summaries that capture the most important information from a longer text. In RLHF, human feedback can be used to evaluate the quality of summaries and guide the agent towards more informative and concise summaries. This is quite difficult to capture using metrics like ROUGE, as they miss human preferences.
@@ -86,4 +92,6 @@
  ## References

- [1] [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)
+ [1] [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)
+
+ [2] [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
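
Both referenced papers train a reward model on pairwise human comparisons before running PPO, using a loss of the form -log σ(r_chosen − r_rejected). As a rough illustration of that step, here is a minimal NumPy sketch (the reward values below are made-up toy numbers, not from either paper):

```python
import numpy as np

def reward_model_preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss for training a reward model from human comparisons:
    -log sigmoid(r_chosen - r_rejected). It is small when the model already
    ranks the human-preferred response above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Toy reward scores for two candidate summaries of the same article,
# where human labelers preferred the first one.
print(reward_model_preference_loss(reward_chosen=1.8, reward_rejected=0.3))  # small loss: ranking agrees
print(reward_model_preference_loss(reward_chosen=0.2, reward_rejected=1.5))  # large loss: ranking disagrees
```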

0 commit comments
