
Commit 7bc3d31

Revert "MLX-LM backend" (#50)
1 parent 7ec0c05 commit 7bc3d31

File tree: 12 files changed (+5, -586 lines)

.github/workflows/coverage.yml

Lines changed: 1 addition & 36 deletions
````diff
@@ -32,42 +32,7 @@ jobs:
       - name: Run tests
         run: |
           source venv/bin/activate
-          coverage run --source=genlm/backend -m pytest --benchmark-disable --ignore=tests/test_mlx.py
-          coverage json --omit "*/test*"
-          coverage report --omit "*/test*"
-
-      - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v5
-        with:
-          fail_ci_if_error: false
-          token: ${{ secrets.CODECOV_TOKEN }}
-          files: ./coverage.json
-          slug: genlm/genlm-backend
-
-  test_mlx_coverage:
-    runs-on: macos-14
-
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 1
-
-      - uses: actions/setup-python@v4
-        with:
-          python-version: 3.11.5
-          cache: 'pip'
-
-      - name: Install dependencies
-        run: |
-          python -m venv venv
-          source venv/bin/activate
-          pip install -e .[mlx]
-          pip install -r requirements-dev.txt
-
-      - name: Run MLX tests
-        run: |
-          source venv/bin/activate
-          coverage run --source=genlm/backend -m pytest tests/test_mlx.py --benchmark-disable
+          coverage run --source=genlm/backend -m pytest --benchmark-disable
           coverage json --omit "*/test*"
           coverage report --omit "*/test*"
````
7338

.github/workflows/pytest.yml

Lines changed: 1 addition & 22 deletions
````diff
@@ -29,25 +29,4 @@ jobs:
           source venv/bin/activate
           pip install -e .[test]
           pip install -r requirements-dev.txt
-          python -m pytest tests --ignore=tests/test_mlx.py
-
-  test-mlx:
-    runs-on: macos-14
-
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 1
-
-      - uses: actions/setup-python@v4
-        with:
-          python-version: 3.11.5
-          cache: 'pip'
-
-      - name: Run MLX Tests
-        run: |
-          python -m venv venv
-          source venv/bin/activate
-          pip install -e .[mlx]
-          pip install -r requirements-dev.txt
-          python -m pytest tests/test_mlx.py
+          python -m pytest tests
````

DEVELOPING.md

Lines changed: 0 additions & 5 deletions
````diff
@@ -27,11 +27,6 @@ uv pip install -e ".[docs]"
 uv pip install -r requirements-dev.txt
 ```
 
-To build with MLX support, run:
-```bash
-uv pip install -e ".[mlx]"
-```
-
 ## Testing
 
 When test dependencies are installed, the test suite can be run via:
````

README.md

Lines changed: 0 additions & 8 deletions
````diff
@@ -18,7 +18,6 @@ See our [documentation](https://genlm.github.io/genlm-backend/).
 - Automatic batching of concurrent log-probability requests, enabling efficient large-scale inference without having to write batching logic yourself
 - Byte-level decoding of transformers tokenizers, enabling advanced token-level control
 - Support for arbitrary Hugging Face models (e.g., LLaMA, DeepSeek, etc.) with fast inference and automatic KV caching using vllm
-- NEW: support for MLX-LM library, allowing faster inference on Apple silicon devices.
 
 
 ## ⚡ Quick Start
@@ -29,13 +28,6 @@ This library supports installation via pip:
 pip install genlm-backend
 ```
 
-Or to install with MLX support, run:
-
-```bash
-pip install genlm-backend[mlx]
-```
-
-
 ## 🧪 Example: Autobatched Sequential Importance Sampling with LLMs
 
 This example demonstrates how `genlm-backend` enables concise, scalable probabilistic inference with language models. It implements a Sequential Importance Sampling (SIS) algorithm that makes asynchronous log-probability requests which get automatically batched by the language model.
````
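The SIS example itself sits outside this diff, but the autobatching pattern it relies on can be sketched. A minimal, hypothetical illustration: it assumes `AsyncLM` exposes an async `next_token_logprobs(token_ids)` method and that the `mock` backend (see `genlm/backend/llm/__init__.py` below) resolves the `gpt2` tokenizer; neither is confirmed by this commit.

```python
# Hypothetical sketch of the autobatching pattern the README describes,
# NOT the README's actual SIS example. Assumes an async
# next_token_logprobs(token_ids) method on the returned AsyncLM; the
# "mock" backend keeps the sketch lightweight.
import asyncio

from genlm.backend.llm import load_model_by_name


async def main():
    llm = load_model_by_name("gpt2", backend="mock")
    prefix = [464, 6193, 318]  # illustrative token ids
    # Concurrent requests for three continuations; the backend may fold
    # them into a single batched forward pass instead of running three.
    logprob_vectors = await asyncio.gather(
        *(llm.next_token_logprobs(prefix + [t]) for t in (262, 281, 257))
    )
    print(len(logprob_vectors))  # 3


asyncio.run(main())
```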

benchmark/benchmark_mlx.py

Lines changed: 0 additions & 43 deletions
This file was deleted.

genlm/backend/cache.py

Lines changed: 0 additions & 27 deletions
````diff
@@ -43,33 +43,6 @@ def clear(self):
         self.cache.clear()
 
 
-class OutputMLXCache(OutputCache):
-    """A cache for storing tensor outputs with MLX.
-
-    Since MLX uses unified memory, we don't need to move tensors between CPU and GPU.
-
-    Args:
-        maxsize (int): Maximum number of items to store in the cache
-    """
-
-    def __init__(self, maxsize):
-        super().__init__(maxsize, move_to_cpu=False)
-
-    def __getitem__(self, key):
-        if key in self.cache:
-            value = self.cache.pop(key)
-            self.cache[key] = value
-            return value
-        raise KeyError(key)
-
-    def __setitem__(self, key, value):
-        if len(self.cache) >= self.maxsize:
-            _, old_tensor = self.cache.popitem(last=False)
-            del old_tensor
-
-        self.cache[key] = value
-
-
 class TokenTrie:
     """Class used internally to cache language model results.
````
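For readers skimming the revert: the deleted class is a plain LRU cache over an ordered mapping, minus the CPU offloading that `OutputCache` applies to CUDA tensors. A standalone sketch of that LRU behavior follows; the `OrderedDict`-backed `self.cache` attribute is inferred from the deleted methods, not confirmed by this diff, and the class name is illustrative.

```python
# Standalone sketch of the LRU behavior the deleted OutputMLXCache used.
# Assumption: the OutputCache base class stores entries in an OrderedDict
# attribute named `cache`. Names here are illustrative, not library internals.
from collections import OrderedDict


class LRUCacheSketch:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.cache = OrderedDict()

    def __getitem__(self, key):
        if key in self.cache:
            value = self.cache.pop(key)
            self.cache[key] = value  # re-insert so the key becomes most recently used
            return value
        raise KeyError(key)

    def __setitem__(self, key, value):
        if len(self.cache) >= self.maxsize:
            self.cache.popitem(last=False)  # drop the least recently used entry
        self.cache[key] = value


cache = LRUCacheSketch(maxsize=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]   # touch "a" so "b" becomes least recently used
cache["c"] = 3   # evicts "b"
assert "b" not in cache.cache and "a" in cache.cache
```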

genlm/backend/llm/__init__.py

Lines changed: 0 additions & 4 deletions
````diff
@@ -1,7 +1,6 @@
 from genlm.backend.llm.vllm import AsyncVirtualLM
 from genlm.backend.llm.hf import AsyncTransformer
 from genlm.backend.llm.base import AsyncLM, MockAsyncLM
-from genlm.backend.llm.mlx import AsyncMlxLM
 
 import torch
 
@@ -34,8 +33,6 @@ def load_model_by_name(name, backend=None, llm_opts=None):
         return AsyncTransformer.from_name(name, **llm_opts)
     elif backend == "mock":
         return MockAsyncLM.from_name(name, **llm_opts)
-    elif backend == "mlx":
-        return AsyncMlxLM.from_name(name, **llm_opts)
     else:
         raise ValueError(f"Invalid backend: {backend}")
 
@@ -45,6 +42,5 @@
     "AsyncLM",
     "AsyncVirtualLM",
     "AsyncTransformer",
-    "AsyncMlxLM",
     "MockAsyncLM",
 ]
````
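A short usage sketch of the dispatch after the revert. Only the `"mock"` branch and the `ValueError` fallback are visible in this hunk; `"gpt2"` is an illustrative model name.

```python
from genlm.backend.llm import load_model_by_name

# Confirmed by the hunk: "mock" dispatches to MockAsyncLM.from_name.
mock_llm = load_model_by_name("gpt2", backend="mock")

# The removed "mlx" branch now falls through to the ValueError fallback.
try:
    load_model_by_name("gpt2", backend="mlx")
except ValueError as err:
    print(err)  # Invalid backend: mlx
```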
