This repository contains the example code from our O'Reilly book Natural Language Processing with Transformers:
You can run these notebooks on cloud platforms like Google Colab or on your local machine. Note that most chapters require a GPU to run in a reasonable amount of time, so we recommend using one of the cloud platforms, which come with CUDA pre-installed, or a local machine with an NVIDIA GPU (CUDA) or Apple Silicon (MPS).
To run these notebooks on a cloud platform, just click on one of the badges in the table below:
Nowadays, the GPUs on Colab tend to be K80s (which have limited memory), so we recommend using Kaggle, Gradient, or SageMaker Studio Lab. These platforms tend to provide more performant GPUs like P100s, all for free!
Note: some cloud platforms like Kaggle require you to restart the notebook after installing new packages.
To run the notebooks on your own machine, first clone the repository and navigate to it:
$ git clone https://github.com/nlp-with-transformers/notebooks.git
$ cd notebooks

Next, run the following command to create a conda virtual environment that contains all the libraries needed to run the notebooks:
$ conda env create -f environment.yml

Note: For optimal performance, you'll need one of the following:
- NVIDIA GPU: with CUDA Toolkit support
- Apple Silicon: M1/M2/M3 Macs with MPS (Metal Performance Shaders) support
- CPU only: works, but significantly slower
Apple Silicon Macs (M1/M2/M3) are now fully supported! 🎉
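If you want to check which accelerator PyTorch will pick up on your machine, a minimal sketch looks like this (the `pick_device` helper is just an illustration, not part of the notebooks):

```python
import torch


def pick_device() -> str:
    """Return the best available accelerator: CUDA, MPS, or CPU."""
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU with CUDA Toolkit
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"  # Apple Silicon via Metal Performance Shaders
    return "cpu"  # fallback: everything runs, just slower


device = torch.device(pick_device())
print(f"Using device: {device}")
```

You can then pass `device` to `model.to(device)` and your input tensors to run on whichever accelerator was found.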
Chapter 7 (Question Answering) has a special set of dependencies, so to run that chapter you'll need a separate environment:
$ conda env create -f environment-chapter7.yml

Once you've installed the dependencies, you can activate the conda environment and spin up the notebooks as follows:
$ conda activate book # or conda activate book-chapter7
$ jupyter notebook

If you prefer using pip instead of conda, you can create a virtual environment and install the requirements:
$ python -m venv venv
$ source venv/bin/activate # On Windows: venv\Scripts\activate
$ pip install -r requirements.txt
$ jupyter notebook

Apple Silicon Macs automatically benefit from MPS (Metal Performance Shaders) acceleration when using PyTorch 2.0+. The notebooks will automatically detect and use your Mac's GPU for significantly faster training and inference. No additional setup required!
When trying to clone the notebooks on Kaggle I get a message that I am unable to access the book's Github repository. How can I solve this issue?
This issue is likely due to a missing internet connection. When running your first notebook on Kaggle you need to enable internet access in the settings menu on the right side.
You can enable GPU usage by selecting GPU as Accelerator in the settings menu on the right side.
If you'd like to cite this book, you can use the following BibTeX entry:
@book{tunstall2022natural,
  title={Natural Language Processing with Transformers: Building Language Applications with Hugging Face},
  author={Tunstall, Lewis and von Werra, Leandro and Wolf, Thomas},
  isbn={1098103246},
  url={https://books.google.ch/books?id=7hhyzgEACAAJ},
  year={2022},
  publisher={O'Reilly Media, Incorporated}
}
