Commit cf0855b

Update README.MD
1 parent 8e8cee1 commit cf0855b

1 file changed (+25, -9 lines)

README.md

Lines changed: 25 additions & 9 deletions
@@ -2,7 +2,7 @@
 <img src="https://em-content.zobj.net/source/microsoft-teams/363/crayon_1f58d-fe0f.png" width="120" alt="Crayon Logo"/>
 </p>
 
-<h1 align="center">🖍️ XERV Crayon v4.1.9</h1>
+<h1 align="center">🖍️ XERV Crayon v5.0.1</h1>
 
 <p align="center">
 <strong>The Omni-Backend Tokenizer for Specialized AI</strong>
@@ -208,9 +208,15 @@ print(f"Tokens: {tokens}")
 ## 📦 Installation
 
 ```bash
-git clone https://github.com/Xerv-AI/crayon.git
-cd crayon
-pip install -e .
+pip install xerv-crayon
+```
+
+### Google Colab / Linux Installation
+Since Crayon includes high-performance C++ extensions, it will compile natively on your environment:
+
+```python
+# Run this in a Colab cell
+!pip install xerv-crayon
 ```
 
 ### Build the Extensions
@@ -253,6 +259,7 @@ Crayon now uses a **"God Tier"** multi-backend implementation combining:
 
 | Profile | Size | Optimized For | Sources |
 |:--------|:-----|:--------------|:--------|
+| **`standard`** | 57k | **General English (V5 Default)** | Lite + Top 10k subwords |
 | **`lite`** | 50k | Speed & Mobile | WikiText, RainDrop |
 | **`science`** | 250k | Reasoning (LaTeX, Quantum, Grad Math) | GRAD, Physics-700 |
 | **`code`** | 250k | Syntax (Python, Rust, C++, JS) | CodeParrot, The Stack |
@@ -268,14 +275,23 @@ vocab = CrayonVocab.load_profile("multilingual")
 
 ## ☁️ Verify on Google Colab
 
-Want to test the **CUDA Backend** for free?
 
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Xerv-AI/crayon/blob/main/colab_benchmark.ipynb)
+### ✅ Quick Verify Snippet
 
-1. Open the notebook.
-2. Change Runtime type to **T4 GPU**.
-3. Run the cells to verify `crayon_cuda` compiles and smashes tokens at >100M/sec.
+```python
+from crayon import CrayonVocab
+
+# Initialize with Auto-Backend (AVX2/CUDA/ROCm)
+tokenizer = CrayonVocab(device="auto")
 
+# 1. Test Standard subword-heavy profile
+tokenizer.load_profile("standard")
+print(tokenizer.tokenize("that is a test for the standard profile"))
+
+# 2. Test Code specialized profile
+tokenizer.load_profile("code")
+print(tokenizer.tokenize("def fast_inverse_sqrt(x):"))
+```
 ---
 
 ## 🧪 Testing & Verification
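This commit replaces the editable source install with `pip install xerv-crayon`, while the Quick Verify snippet imports the package as `crayon`. A minimal pre-flight check, assuming only the Python standard library, can confirm both names resolve before that snippet is run. The helper names `installed_version` and `crayon_importable` are hypothetical and not part of Crayon. The commit does not state that the installed distribution reports the same version as the README header (5.0.1); treat that as an assumption.

```python
"""Hypothetical pre-flight check before running the README's Quick Verify snippet.

Assumptions: the PyPI distribution is named "xerv-crayon" (from the install line)
and the import package is named "crayon" (from `from crayon import CrayonVocab`).
"""
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def crayon_importable(module_name: str = "crayon") -> bool:
    """Return True if the top-level package can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    version = installed_version("xerv-crayon")
    if version is None:
        print("xerv-crayon is not installed; run: pip install xerv-crayon")
    else:
        # Assumption: the README header version tracks the PyPI release version.
        print(f"xerv-crayon {version} installed; importable as 'crayon': {crayon_importable()}")
```

Using `find_spec` instead of `import crayon` only locates the package, so the check does not load any of its compiled extensions.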
