
Commit c650555

Merge pull request #33 from imohitmayank/training_llm
Training LLM article added + Section rename and shuffle
2 parents 8daef07 + bec7795

File tree

11 files changed (+231, -10 lines)
Image files changed: 247 KB, 193 KB, 492 KB, 360 KB

docs/imgs/rl_rlhf_instructgpt.png

76.2 KB

docs/machine_learning/interview_questions.md

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,8 @@

  Temperature allows you to control the trade-off between exploration and exploitation in the model's predictions. It's a hyperparameter that can be adjusted during training or inference to achieve the desired level of certainty in the model's output, depending on the specific requirements of your application.

+ Here is a good online [tool](https://artefact2.github.io/llm-sampling/index.xhtml) to learn about the impact of temperature and other parameters on output generation.
+
  !!! Question ""
      === "Question"

docs/machine_learning/loss_functions.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
  ## Introduction

- - Loss functions are the "ideal objectives" that neural networks (NN) try to optimize. In fact, they are the mathematical personification of what we want to achieve with the NN. As the name suggests, it is a function that takes an input and computes a loss value that determines how far the current model is from the ideal model for that example. In an ideal world, we would expect the loss value to be 0, but in reality it could get very close to 0 and sometimes even be high enough that we terminate training to handle overfitting.
+ - Loss functions are the "ideal objectives" that neural networks (NN) try to optimize. In fact, they are the mathematical personification of what we want to achieve with the NN. As the name suggests, it is a function that takes an input and computes a loss value that determines how far the current model is from the ideal model for that example. In an ideal world, we would expect the loss value to be 0, but in reality it could get very close to 0.
  - We also have cost functions, which are nothing but the aggregation of the loss function over a batch or the complete dataset. The cost function is what we use in practice to optimize the model.

  !!! Hint
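
To make the loss-vs-cost distinction in the hunk above concrete, here is a minimal NumPy sketch (illustrative only, not code from the repo): a per-example cross-entropy loss, and a cost function that aggregates it over a batch by taking the mean.

```python
import numpy as np

def cross_entropy_loss(probs, target):
    """Per-example loss: how far the predicted distribution is from the true label."""
    return -np.log(probs[target] + 1e-12)

def cost(batch_probs, batch_targets):
    """Cost function: aggregation (here, the mean) of per-example losses over a batch."""
    return np.mean([cross_entropy_loss(p, t) for p, t in zip(batch_probs, batch_targets)])

# A confident, correct prediction gives a loss near 0; a less confident one gives a larger loss.
batch_probs = np.array([[0.9, 0.05, 0.05],   # correct and confident
                        [0.2, 0.7, 0.1]])    # correct but less confident
batch_targets = [0, 1]
print(cross_entropy_loss(batch_probs[0], 0))  # ~0.105
print(cost(batch_probs, batch_targets))       # mean of the two per-example losses
```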

docs/machine_learning/model_compression_quant.md

Lines changed: 3 additions & 1 deletion
@@ -483,4 +483,6 @@ Fine-tuning the model can be done very easily using the `llama.cpp` library. Bel
  [8] LLM.int8() - [Blog](https://huggingface.co/blog/hf-bitsandbytes-integration)

- [9] GGUF/GGML - [Official Docs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) | [Blog - Quantize Llama_2 models using GGML](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | [K Quants](https://github.com/ggerganov/llama.cpp/pull/1684)
+ [9] GGUF/GGML - [Official Docs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) | [Blog - Quantize Llama_2 models using GGML](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | [K Quants](https://github.com/ggerganov/llama.cpp/pull/1684)
+
+ [10] [A Visual Guide to Quantization](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization)

docs/natural_language_processing/training_llm.md

Lines changed: 210 additions & 0 deletions
Large diffs are not rendered by default.

docs/reinforcement_learning/rlhf.md

Lines changed: 9 additions & 1 deletion
@@ -56,6 +56,12 @@ Using human feedback in reinforcement learning has several benefits, but also pr
  - Reinforcement learning from human feedback (RLHF) has shown great potential in improving natural language processing (NLP) tasks. In NLP, the use of human feedback can help to capture the nuances of language and better align the agent's behavior with the user's expectations.

+ <figure markdown>
+ ![](../imgs/rl_rlhf_instructgpt.png)
+ <figcaption>PPO model trained with RLHF outperforming the SFT and base models, from OpenAI's InstructGPT paper. Source: [2]</figcaption>
+ </figure>
+
  ### Summarization

  - One of the first examples of utilizing RLHF in NLP was proposed in [1] to improve summarization using human feedback. Summarization aims to generate summaries that capture the most important information from a longer text. In RLHF, human feedback can be used to evaluate the quality of summaries and guide the agent towards more informative and concise summaries. This is quite difficult to capture using metrics like ROUGE, as they miss human preferences.
@@ -86,4 +92,6 @@
  ## References

- [1] [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)
+ [1] [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)
+
+ [2] [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
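
Both referenced papers train a reward model on pairwise human comparisons before running PPO, using a loss of the form -log σ(r_chosen − r_rejected). As a rough illustration of that step, here is a minimal NumPy sketch (the reward values below are made-up toy numbers, not from either paper):

```python
import numpy as np

def reward_model_preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss for training a reward model from human comparisons:
    -log sigmoid(r_chosen - r_rejected). It is small when the model already
    ranks the human-preferred response above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Toy reward scores for two candidate summaries of the same article,
# where human labelers preferred the first one.
print(reward_model_preference_loss(reward_chosen=1.8, reward_rejected=0.3))  # small loss: ranking agrees
print(reward_model_preference_loss(reward_chosen=0.2, reward_rejected=1.5))  # large loss: ranking disagrees
```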

0 commit comments
