A lightweight, Colab‑ready demo that shows how to run the SmolLM‑2 family of models (135 M & 360 M) directly from an ONNX export using 🤗 Transformers, Optimum and ONNX‑Runtime.
| Notebook | What it does | Model(s) |
|---|---|---|
| `SmolLM2_ONNX.ipynb` | Two‑cell notebook with an interactive dropdown that lets you pick which SmolLM‑2 model to download and run. | 135 M or 360 M (selected at runtime) |
| `SmolLM2_360M_ONNX.ipynb` | Single‑cell, ready‑to‑run demo that is hard‑coded to the 360 M model. | 360 M only |
The 360 M demo is also published on Kaggle – see the Kaggle Demo link in the About section.
SmolLM‑2 is a family of small, fast, open‑source language models released by Hugging Face; the ONNX exports used here are published by the ONNX Community on the Hugging Face Hub.
The models have been exported to ONNX so they can be executed with the high‑performance ONNX‑Runtime (CPU or GPU) without needing the full PyTorch stack.
This repository contains ready‑to‑run Colab notebooks that:
- Install the required Python packages (`transformers`, `optimum[onnxruntime]`, `sentencepiece`, …)
- Detect whether a GPU is available and install the appropriate ONNX‑Runtime build (`onnxruntime-gpu` or `onnxruntime`)
- Download the ONNX model files from the Hugging Face Hub (only the needed files)
- Provide a small helper (`generate_text`) that wraps the generation call and offers common parameters (temperature, top‑k, stop token, line‑wrapping, …)
The notebooks are deliberately self‑contained – you can copy‑paste the cells into a fresh Colab runtime and start generating text.
- **Cell 1 – Environment & dependencies**
  - Installs system tools (`git`, `wget`)
  - Upgrades the core Python libraries
  - Installs the correct ONNX‑Runtime build (GPU/CPU)
  - Adds a compatibility shim for newer 🤗 Transformers (≥ 4.36)
- **Cell 2 – Interactive demo**
  - Shows a dropdown UI (`ipywidgets`) with two options:

    | UI label | Hub repo ID |
    |---|---|
    | 135M (SmolLM‑2‑135M) | `onnx-community/SmolLM2-135M-ONNX` |
    | 360M (SmolLM‑2‑360M) | `onnx-community/SmolLM2-360M-ONNX` |

  - After you pick a model, the notebook:
    - Downloads the selected repo (only the required files)
    - Loads the tokenizer and the ONNX model via `ORTModelForCausalLM`
    - Provides a `generate_text` wrapper (temperature, top‑k, max tokens, …)
    - Runs a quick demo prompt (`"Here is the poem I wrote: "`)

**Key point:** you can switch models on the fly without editing any code – just pick a different entry in the dropdown and re‑run the second cell.
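Stripped of the widget plumbing, the model picker is just a label → repo‑ID lookup. The sketch below shows that mapping as a plain dictionary; in the notebook the same options populate an `ipywidgets.Dropdown`, and the exact labels/variable names may differ:

```python
# Sketch of Cell 2's model picker. In the notebook these options feed an
# ipywidgets.Dropdown; here the selection is shown as a dictionary lookup.
MODEL_OPTIONS = {
    "135M (SmolLM-2-135M)": "onnx-community/SmolLM2-135M-ONNX",
    "360M (SmolLM-2-360M)": "onnx-community/SmolLM2-360M-ONNX",
}

choice = "360M (SmolLM-2-360M)"   # in the notebook: the dropdown's label
repo_id = MODEL_OPTIONS[choice]   # in the notebook: the dropdown's value
print(repo_id)                    # -> onnx-community/SmolLM2-360M-ONNX
```

Because only the repo ID changes, re‑running the cell with a different selection simply downloads and loads the other model.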
- Single cell that does everything described above but is hard‑coded to the 360 M model (`repo_id = "onnx-community/SmolLM2-360M-ONNX"`).
- No UI – the notebook is a straightforward “run‑once” demo, ideal for quick testing or for embedding in a tutorial where the model choice is fixed.
The notebook is also hosted on Kaggle (see the link below).
- `SmolLM2_ONNX.ipynb` → Open in Colab → Run the first cell → Choose a model from the dropdown → Run the second cell.
- `SmolLM2_360M_ONNX.ipynb` → Open in Colab → Run the single cell.
Both notebooks expose the following arguments in the `generate_text` call:
| Argument | Default | Description |
|---|---|---|
| `max_new_tokens` | `100` | Maximum number of tokens the model may generate. |
| `temperature` | `0.7` | Sampling temperature (0 = deterministic). |
| `top_k` | `50` | Limit sampling to the top‑k most likely tokens. |
| `stop_token` | `None` | Stop generation when this token appears. |
| `wrap_width` | `80` | Wrap output lines for readability. |
| `**extra_kwargs` | – | Any additional `model.generate` kwargs (e.g. `repetition_penalty`). |
Feel free to experiment – higher temperature → more creative, lower → more deterministic.
The notebooks print the prompt and the generated text, wrapped at the width you set (default 80 characters).
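For reference, a wrapper with this signature could look like the following sketch. It is hypothetical (the notebooks' actual implementation may differ) and assumes a `model`/`tokenizer` pair that has already been loaded; the argument names match the table above:

```python
# Hypothetical sketch of a generate_text helper with the documented arguments.
import textwrap

def generate_text(prompt, model, tokenizer, max_new_tokens=100, temperature=0.7,
                  top_k=50, stop_token=None, wrap_width=80, **extra_kwargs):
    """Generate a completion for `prompt` and return it wrapped for display."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=temperature > 0,    # temperature 0 -> greedy (deterministic)
        temperature=temperature or 1.0,
        top_k=top_k,
        **extra_kwargs,               # e.g. repetition_penalty
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    if stop_token is not None and stop_token in text:
        text = text.split(stop_token)[0]  # truncate at the first stop token
    return "\n".join(textwrap.wrap(text, width=wrap_width)) if wrap_width else text
```

The wrapping step is purely cosmetic; pass `wrap_width=None` (or `0`) to get the raw generated string back.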
The 360 M demo notebook is also available as a Kaggle Kernel for users who prefer Kaggle’s environment:
🔗 Kaggle Demo URL: https://www.kaggle.com/code/harisna/smollm2-360m-onnx
The Kaggle version mirrors SmolLM2_360M_ONNX.ipynb exactly, so you can run it on Kaggle’s free GPU tier or CPU tier with a single click.
- SmolLM‑2 – Small, open‑source language models from the 🤗 Community.
- 🤗 Transformers – for the tokenizer and generation utilities.
- Optimum – for the ONNX‑Runtime integration (`ORTModelForCausalLM`).
- ONNX Community – for providing the exported ONNX models.
- Google Colab – for the free GPU/CPU runtime that makes this demo instantly runnable.
- Kaggle – data science community for the free GPU/CPU runtime.
- GPT-OSS – used for code generation.