A lightweight, Colab‑ready demo that shows how to run the SmolLM‑2 family of models (135 M & 360 M) directly from an ONNX export using 🤗 Transformers, Optimum and ONNX‑Runtime.
| Notebook | What it does | Model(s) |
|---|---|---|
| `SmolLM2_ONNX.ipynb` | Two‑cell notebook with an interactive dropdown that lets you pick which SmolLM‑2 model to download and run. | 135 M or 360 M (selected at runtime) |
| `SmolLM2_360M_ONNX.ipynb` | Single‑cell, ready‑to‑run demo that is hard‑coded to the 360 M model. | 360 M only |
The 360 M demo is also published on Kaggle – see the Kaggle Demo link in the About section.
SmolLM‑2 is a family of small, fast, open‑source language models released by Hugging Face; the ONNX exports used here are published by the ONNX Community on the Hugging Face Hub.
The models have been exported to ONNX so they can be executed with the high‑performance ONNX‑Runtime (CPU or GPU) without needing the full PyTorch stack.
This repository contains ready‑to‑run Colab notebooks that:
- Install the required Python packages (`transformers`, `optimum[onnxruntime]`, `sentencepiece`, …)
- Detect whether a GPU is available and install the appropriate ONNX‑Runtime build (`onnxruntime-gpu` or `onnxruntime`)
- Download the ONNX model files from the Hugging Face Hub (only the needed files)
- Provide a small helper (`generate_text`) that wraps the generation call and offers common parameters (temperature, top‑k, stop token, line‑wrapping, …)
The notebooks are deliberately self‑contained – you can copy‑paste the cells into a fresh Colab runtime and start generating text.
- **Cell 1 – Environment & dependencies**
  - Installs system tools (`git`, `wget`)
  - Upgrades the core Python libraries
  - Installs the correct ONNX‑Runtime build (GPU/CPU)
  - Adds a compatibility shim for newer 🤗 Transformers (≥ 4.36)
- **Cell 2 – Interactive demo**
  - Shows a dropdown UI (`ipywidgets`) with two options:

    | UI label | Hub repo ID |
    |---|---|
    | 135M (SmolLM‑2‑135M) | `onnx-community/SmolLM2-135M-ONNX` |
    | 360M (SmolLM‑2‑360M) | `onnx-community/SmolLM2-360M-ONNX` |

  - After you pick a model, the notebook:
    - Downloads the selected repo (only the required files)
    - Loads the tokenizer and the ONNX model via `ORTModelForCausalLM`
    - Provides a `generate_text` wrapper (temperature, top‑k, max tokens, …)
    - Runs a quick demo prompt (`"Here is the poem I wrote: "`)

**Key point:** you can switch models on the fly without editing any code – just pick a different entry in the dropdown and re‑run the second cell.
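Stripped of the widget plumbing, the model picker is just a label → repo‑ID lookup. The sketch below shows that mapping as a plain dictionary; in the notebook the same options populate an `ipywidgets.Dropdown`, and the exact labels/variable names may differ:

```python
# Sketch of Cell 2's model picker. In the notebook these options feed an
# ipywidgets.Dropdown; here the selection is shown as a dictionary lookup.
MODEL_OPTIONS = {
    "135M (SmolLM-2-135M)": "onnx-community/SmolLM2-135M-ONNX",
    "360M (SmolLM-2-360M)": "onnx-community/SmolLM2-360M-ONNX",
}

choice = "360M (SmolLM-2-360M)"   # in the notebook: the dropdown's label
repo_id = MODEL_OPTIONS[choice]   # in the notebook: the dropdown's value
print(repo_id)                    # -> onnx-community/SmolLM2-360M-ONNX
```

Because only the repo ID changes, re‑running the cell with a different selection simply downloads and loads the other model.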
- Single cell that does everything described above but is hard‑coded to the 360 M model (`repo_id = "onnx-community/SmolLM2-360M-ONNX"`).
- No UI – the notebook is a straightforward “run‑once” demo, ideal for quick testing or for embedding in a tutorial where the model choice is fixed.
The notebook is also hosted on Kaggle (see the link below).
- `SmolLM2_ONNX.ipynb` → Open in Colab → Run the first cell → Choose a model from the dropdown → Run the second cell.
- `SmolLM2_360M_ONNX.ipynb` → Open in Colab → Run the single cell.
Both notebooks expose the following arguments in the `generate_text` call:
| Argument | Default | Description |
|---|---|---|
| `max_new_tokens` | `100` | Maximum number of tokens the model may generate. |
| `temperature` | `0.7` | Sampling temperature (0 = deterministic). |
| `top_k` | `50` | Limit sampling to the top‑k most likely tokens. |
| `stop_token` | `None` | Stop generation when this token appears. |
| `wrap_width` | `80` | Wrap output lines for readability. |
| `**extra_kwargs` | – | Any additional `model.generate` kwargs (e.g. `repetition_penalty`). |
Feel free to experiment – higher temperature → more creative, lower → more deterministic.
The notebooks print the prompt and the generated text, wrapped at the width you set (default 80 characters).
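For reference, a wrapper with this signature could look like the following sketch. It is hypothetical (the notebooks' actual implementation may differ) and assumes a `model`/`tokenizer` pair that has already been loaded; the argument names match the table above:

```python
# Hypothetical sketch of a generate_text helper with the documented arguments.
import textwrap

def generate_text(prompt, model, tokenizer, max_new_tokens=100, temperature=0.7,
                  top_k=50, stop_token=None, wrap_width=80, **extra_kwargs):
    """Generate a completion for `prompt` and return it wrapped for display."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=temperature > 0,    # temperature 0 -> greedy (deterministic)
        temperature=temperature or 1.0,
        top_k=top_k,
        **extra_kwargs,               # e.g. repetition_penalty
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    if stop_token is not None and stop_token in text:
        text = text.split(stop_token)[0]  # truncate at the first stop token
    return "\n".join(textwrap.wrap(text, width=wrap_width)) if wrap_width else text
```

The wrapping step is purely cosmetic; pass `wrap_width=None` (or `0`) to get the raw generated string back.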
The 360 M demo notebook is also available as a Kaggle Kernel for users who prefer Kaggle’s environment:
🔗 Kaggle Demo URL: https://www.kaggle.com/code/harisna/smollm2-360m-onnx
The Kaggle version mirrors SmolLM2_360M_ONNX.ipynb exactly, so you can run it on Kaggle’s free GPU tier or CPU tier with a single click.
- SmolLM‑2 – Small, open‑source language models from the 🤗 Community.
- 🤗 Transformers – for the tokenizer and generation utilities.
- Optimum – for the ONNX‑Runtime integration (`ORTModelForCausalLM`).
- ONNX Community – for providing the exported ONNX models.
- Google Colab – for the free GPU/CPU runtime that makes this demo instantly runnable.
- Kaggle – data science community for the free GPU/CPU runtime.
- GPT-OSS – used for code generation.