Commit 572f44d

committed: Address comments, add the file
1 parent 9057ba4 commit 572f44d

File tree: 2 files changed, +12 -3 lines

recipes/quickstart/NotebookLlama/README.md

Lines changed: 12 additions & 3 deletions
@@ -12,17 +12,26 @@ It assumes zero knowledge of LLMs, prompting and audio models, everything is cov
 
 Here is a step-by-step thought (pun intended) for the task:
 
-- Step 1: Pre-process PDF: Use `Llama-3.2-1B` to pre-process the PDF and save it in a `.txt` file.
-- Step 2: Transcript Writer: Use `Llama-3.1-70B` model to write a podcast transcript from the text
-- Step 3: Dramatic Re-Writer: Use `Llama-3.1-8B` model to make the transcript more dramatic
+- Step 1: Pre-process PDF: Use `Llama-3.2-1B-Instruct` to pre-process the PDF and save it in a `.txt` file.
+- Step 2: Transcript Writer: Use `Llama-3.1-70B-Instruct` model to write a podcast transcript from the text
+- Step 3: Dramatic Re-Writer: Use `Llama-3.1-8B-Instruct` model to make the transcript more dramatic
 - Step 4: Text-To-Speech Workflow: Use `parler-tts/parler-tts-mini-v1` and `bark/suno` to generate a conversational podcast
 
+Note 1: In Step 1, we prompt the 1B model not to modify or summarize the text, but strictly to clean up extra characters or garbage characters that might get picked up due to the encoding of the PDF. Please see the prompt in Notebook 1 for more details.
+
+Note 2: For Step 2, you can also use the `Llama-3.1-8B-Instruct` model; we recommend experimenting to see if you notice any differences. The 70B model was used here because it gave slightly more creative podcast transcripts for the tested examples.
+
 ### Detailed steps on running the notebook:
 
 Requirements: GPU server or an API provider for using the 70B, 8B and 1B Llama models.
+For running the 70B model, you will need a GPU with around 140GB of aggregate memory to run inference in bfloat16 precision.
 
 Note: For our GPU Poor friends, you can also use the 8B and lower models for the entire pipeline. There is no strong recommendation; the pipeline below is what worked best in the first few tests. You should try and see what works best for you!
 
+- Before getting started, please make sure to log in using `huggingface-cli` and then launch your Jupyter notebook server, to make sure you are able to download the Llama models.
+
+You'll need your Hugging Face access token, which you can get at your Settings page [here](https://huggingface.co/settings/tokens). Then run `huggingface-cli login` and copy and paste your access token to complete the login, so that the scripts can download Hugging Face models if needed.
+
 - First, please install the requirements from [here]() by running inside the folder:
 
 ```
Binary file not shown.
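
Steps 1 through 3 in the diff above are each a single chat completion against a different instruct model. Below is a rough sketch of that flow, not the notebooks' own code: it assumes the Hugging Face `transformers` chat pipeline, and the prompts and the `extracted.txt` path are illustrative placeholders.

```python
# Sketch of Steps 1-3: three chat completions with the model IDs from the
# README. Prompts and the file path are placeholders, not the notebooks' own.
import torch
from transformers import pipeline

def chat(model_id: str, system_prompt: str, user_text: str) -> str:
    """Run one chat turn against an instruct model and return the reply text."""
    generator = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]
    output = generator(messages, max_new_tokens=2048)
    # With chat-format input, the pipeline returns the whole conversation;
    # the last message is the model's reply.
    return output[0]["generated_text"][-1]["content"]

raw_text = open("extracted.txt").read()  # placeholder path for the PDF text

# Step 1: cleanup only; the real prompt forbids rewriting or summarizing.
cleaned = chat("meta-llama/Llama-3.2-1B-Instruct",
               "Clean up stray or garbage characters. Do not modify or summarize the text.",
               raw_text)

# Step 2: draft a podcast transcript from the cleaned text.
transcript = chat("meta-llama/Llama-3.1-70B-Instruct",
                  "Write a two-speaker podcast transcript from the provided text.",
                  cleaned)

# Step 3: make the transcript more dramatic with the 8B model.
dramatic = chat("meta-llama/Llama-3.1-8B-Instruct",
                "Rewrite this podcast transcript to be more dramatic.",
                transcript)
```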

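Step 4 then hands the finished transcript to the TTS models. Here is a minimal sketch of the `parler-tts/parler-tts-mini-v1` half, following that model's published usage pattern; the speaker description, sample line, and output filename are placeholders, and the `bark/suno` pass has its own API that is not shown.

```python
# Sketch of the parler-tts half of Step 4, per the parler-tts usage docs.
# The description string, sample line, and output filename are placeholders.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained(
    "parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

line = "Welcome to the show! Today we are walking through a paper."
description = "A friendly female speaker with a clear, animated delivery."

# Parler conditions on a text description of the voice plus the line to speak.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_ids = tokenizer(line, return_tensors="pt").input_ids.to(device)

audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
sf.write("speaker1.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```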
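The 140GB figure added to the requirements is just the parameter count times the bfloat16 width: 70B parameters at 2 bytes each is about 140GB for the weights alone, before KV cache and activations. A one-line check:

```python
# Weights-only footprint of a 70B model in bfloat16 (2 bytes per parameter).
print(f"{70e9 * 2 / 1e9:.0f} GB")  # -> 140 GB
```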
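The Hugging Face login described in the diff can also be done from inside the notebook instead of the shell; `huggingface_hub`, which `transformers` already depends on for downloads, exposes the same token flow programmatically:

```python
# Programmatic equivalent of `huggingface-cli login`: prompts for the access
# token from https://huggingface.co/settings/tokens and caches it locally so
# the gated Llama checkpoints can be downloaded.
from huggingface_hub import login

login()
```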