This repository contains the example code from our O'Reilly book Natural Language Processing with Transformers:
You can run these notebooks on cloud platforms like Google Colab or on your local machine. Note that most chapters require a GPU to run in a reasonable amount of time, so we recommend using one of the cloud platforms, which come with CUDA pre-installed, or a local machine with an NVIDIA GPU (CUDA) or Apple Silicon (MPS).
To run these notebooks on a cloud platform, just click on one of the badges in the table below:
Nowadays, the GPUs on Colab tend to be K80s (which have limited memory), so we recommend using Kaggle, Gradient, or SageMaker Studio Lab. These platforms tend to provide more performant GPUs like P100s, all for free!
Note: some cloud platforms like Kaggle require you to restart the notebook after installing new packages.
To run the notebooks on your own machine, first clone the repository and navigate to it:
$ git clone https://github.com/nlp-with-transformers/notebooks.git
$ cd notebooks

Next, run the following command to create a conda virtual environment that contains all the libraries needed to run the notebooks:
$ conda env create -f environment.yml

Note: For optimal performance, you'll need one of the following:
- NVIDIA GPU: with CUDA Toolkit support
- Apple Silicon: M1/M2/M3 Macs with MPS (Metal Performance Shaders) support
- CPU only: works, but significantly slower
Apple Silicon Macs (M1/M2/M3) are now fully supported! 🎉
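If you want to check which accelerator PyTorch will pick up on your machine, a minimal sketch looks like this (the `pick_device` helper is just an illustration, not part of the notebooks):

```python
import torch


def pick_device() -> str:
    """Return the best available accelerator: CUDA, MPS, or CPU."""
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU with CUDA Toolkit
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"  # Apple Silicon via Metal Performance Shaders
    return "cpu"  # fallback: everything runs, just slower


device = torch.device(pick_device())
print(f"Using device: {device}")
```

You can then pass `device` to `model.to(device)` and your input tensors to run on whichever accelerator was found.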
Chapter 7 (Question Answering) has a special set of dependencies, so to run that chapter you'll need a separate environment:
$ conda env create -f environment-chapter7.yml

Once you've installed the dependencies, you can activate the conda environment and spin up the notebooks as follows:
$ conda activate book # or conda activate book-chapter7
$ jupyter notebook

If you prefer using pip instead of conda, you can create a virtual environment and install the requirements:
$ python -m venv venv
$ source venv/bin/activate # On Windows: venv\Scripts\activate
$ pip install -r requirements.txt
$ jupyter notebook

Apple Silicon Macs automatically benefit from MPS (Metal Performance Shaders) acceleration when using PyTorch 2.0+. The notebooks will automatically detect and use your Mac's GPU for significantly faster training and inference. No additional setup required!
When trying to clone the notebooks on Kaggle I get a message that I am unable to access the book's Github repository. How can I solve this issue?
This issue is likely due to a missing internet connection. When running your first notebook on Kaggle you need to enable internet access in the settings menu on the right side.
You can enable GPU usage by selecting GPU as Accelerator in the settings menu on the right side.
If you'd like to cite this book, you can use the following BibTeX entry:
@book{tunstall2022natural,
  title={Natural Language Processing with Transformers: Building Language Applications with Hugging Face},
  author={Tunstall, Lewis and von Werra, Leandro and Wolf, Thomas},
  isbn={1098103246},
  url={https://books.google.ch/books?id=7hhyzgEACAAJ},
  year={2022},
  publisher={O'Reilly Media, Incorporated}
}
