The Large Language Model (LLM) Bootcamp is designed from a real-world perspective and follows the data processing, development, and deployment pipeline paradigm. Attendees walk through the workflow of preprocessing a multi-turn conversational dataset for the summarization task and fine-tuning a SOTA LLM on that dataset using NeMo-Run. Attendees also learn to optimize the fine-tuned model and apply prompt engineering techniques to solve complex real-world tasks. Finally, an AI Assistant customer care use case challenge tests attendees' understanding of the material and solidifies their experience in the Text Generation domain.
To run this tutorial, you will need a laptop, workstation, or DGX machine with an NVIDIA GPU that has at least 40 GB of memory.
- Install the latest Docker or Singularity.
- The labs require a Hugging Face security token. Steps to obtain one can be found in the link here.
We tested and ran all labs on a DGX machine equipped with A100 and H100 GPUs (80 GB).
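Before building the containers, it can help to confirm that the machine meets the GPU memory requirement. The sketch below is one possible check and is not part of the bootcamp material: it asks `nvidia-smi` for each GPU's total memory and lists the GPUs that pass. The function names and the 40,000 MiB threshold are assumptions chosen for illustration, since the reported total can be slightly below the nominal capacity.

```python
import shutil
import subprocess

# Assumption: treat roughly 40,000 MiB as "a 40 GB GPU"; adjust as needed.
MIN_GPU_MEMORY_MIB = 40_000


def parse_memory_mib(nvidia_smi_output):
    """Parse one memory value (MiB) per line; non-numeric entries become 0."""
    values = []
    for line in nvidia_smi_output.splitlines():
        field = line.strip()
        if field:
            values.append(int(field) if field.isdigit() else 0)
    return values


def query_gpu_memory_mib():
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


def gpus_meeting_requirement(memory_mib, minimum=MIN_GPU_MEMORY_MIB):
    """Return the indices of GPUs whose total memory is at least `minimum`."""
    return [index for index, mib in enumerate(memory_mib) if mib >= minimum]


if __name__ == "__main__":
    memory = query_gpu_memory_mib()
    suitable = gpus_meeting_requirement(memory)
    if suitable:
        print(f"GPUs meeting the memory requirement: {suitable}")
    else:
        print("No GPU with enough memory was detected.")
```

An empty result means either that `nvidia-smi` was not found or that no GPU reached the threshold, so it is worth running `nvidia-smi` by hand in that case.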
You can deploy this material using Singularity and Docker containers. Please refer to the respective sections for the instructions.
To run the Labs, build a Docker container by following these steps:
- Open a terminal window and navigate to the directory where the Dockerfile is located (e.g. cd ~/LLM-Bootcamp).
- To build the Docker container, run:
sudo docker build -f Dockerfile --network=host -t <imagename>:<tagnumber> .
for instance:
sudo docker build -f Dockerfile --network=host -t nemodev:v1 .
- To run the built container:
docker run --rm -it --gpus all -p 8888:8888 --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ./nemo_workspace:/workspace nemodev:v1 \
    jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace
Flags:
- --rm will delete the container when finished.
- -it means run in interactive mode.
- --gpus option makes GPUs accessible inside the container.
- -v is used to mount host directories in the container filesystem.
- --network=host will share the host’s network stack to the container.
- -p flag explicitly maps a single port or range of ports.
Open the browser at http://localhost:8888 and click on start_here.ipynb. Go to the table of contents and click on Lab 1: Preprocessing Multi-turn Conversational Dataset.
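If the page does not load, a small script can confirm whether anything is answering on the mapped port before you start debugging the browser or port forwarding. This is a minimal sketch under stated assumptions: `is_reachable` is a hypothetical helper, not part of the labs, and it only reports whether an HTTP server responds at the URL, not whether JupyterLab itself is healthy.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # Bypass any configured proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is still reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    url = "http://localhost:8888"
    status = "reachable" if is_reachable(url) else "not reachable"
    print(f"{url} is {status}")
```

A "not reachable" result usually points at the container not running, the port not being published with -p, or, on a remote machine, a missing port forward.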
Once you have finished the rest of the labs, shut down JupyterLab by selecting File > Shut Down, then stop the container by typing exit or pressing Ctrl+D in the terminal window.
- Build the Labs Singularity container with:
singularity build --fakeroot --sandbox nemodev.simg Singularity
- To run the built container:
singularity run --nv -B nemo_workspace:/workspace nemodev.simg \
    jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace
The -B flag mounts local directories in the container filesystem and ensures changes are stored locally in the project folder. Open JupyterLab in the browser at http://localhost:8888.
You may start working on the labs by clicking the start_here.ipynb notebook.
When you finish these notebooks, shut down JupyterLab by selecting File > Shut Down in the top left corner, then shut down the Singularity container by typing exit or pressing Ctrl+D in the terminal window.
- Your custom preprocessed dataset must be saved in .jsonl format and follow the naming convention training.jsonl, validation.jsonl, and test.jsonl; otherwise you will see an error stating: unable to find the training data ...
- By default, NeMo stores the checkpoint here: NEMO_MODELS_CACHE=/root/.cache/nemo/models. Inference will run successfully at first, but it will fail after the NeMo container is restarted. This is because the cache has been cleared, and an error will pop up stating: Can't find the Llama-3.1-8B model...
To prevent this, please set up the NeMo cache path to store the checkpoint in your local environment as: os.environ["NEMO_MODELS_CACHE"] = "/workspace/model/"
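Both issues above can be caught early with a short pre-flight step in the first notebook cell. The sketch below is only an illustration: `configure_nemo_cache` and `check_dataset_dir` are hypothetical helpers, not NeMo APIs, and the dataset directory you pass in is your own. It assumes the environment variable should be set before any NeMo import, which is the cautious choice.

```python
import json
import os
from pathlib import Path

# File names required by the naming convention described above.
REQUIRED_FILES = ("training.jsonl", "validation.jsonl", "test.jsonl")


def configure_nemo_cache(cache_dir):
    """Point NEMO_MODELS_CACHE at a persistent directory and create it."""
    path = Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    # Assumption: set this before importing NeMo so the setting is picked up.
    os.environ["NEMO_MODELS_CACHE"] = str(path)
    return path


def check_dataset_dir(data_dir):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"missing file: {path}")
            continue
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    json.loads(line)
                except json.JSONDecodeError as exc:
                    problems.append(f"{path}:{line_no}: invalid JSON ({exc.msg})")
                    break  # report only the first bad line per file
    return problems
```

In the container, this would typically be called as configure_nemo_cache("/workspace/model/") followed by check_dataset_dir on the folder that holds your preprocessed files, printing any returned problems before fine-tuning starts. The check only validates file names and that each line parses as JSON; it does not verify the fields NeMo expects.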