Commit 902a68c

Convert to Triton Punica kernels (#658)
1 parent b3944ad commit 902a68c

67 files changed: 4198 additions & 954 deletions

Cargo.lock

Lines changed: 471 additions & 189 deletions

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -216,7 +216,7 @@ COPY --from=eetq-kernels-builder /usr/src/eetq/build/lib.linux-x86_64-cpython-31
 RUN pip install einops --no-cache-dir
 
 # Install flashinfer
-RUN pip install --no-cache-dir flashinfer==0.1.5+cu124torch2.4 -i https://flashinfer.ai/whl/cu124/torch2.4
+RUN pip install --no-cache-dir flashinfer==0.1.6 -i https://flashinfer.ai/whl/cu124/torch2.4
 
 # Install server
 COPY proto proto

clients/python/lorax/client.py

Lines changed: 17 additions & 1 deletion
@@ -1,4 +1,5 @@
 import json
+import logging
 import requests
 from requests.adapters import HTTPAdapter, Retry
 
@@ -20,7 +21,22 @@
 from lorax.errors import parse_error
 import os
 
-LORAX_DEBUG_MODE = os.getenv("LORAD_DEBUG_MODE", None) is not None
+LORAX_DEBUG_MODE = os.getenv("LORAX_DEBUG_MODE", None) is not None
+if LORAX_DEBUG_MODE:
+    # https://stackoverflow.com/a/16630836/1869739
+    # These two lines enable debugging at httplib level (requests->urllib3->http.client)
+    # You will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
+    # The only thing missing will be the response.body which is not logged.
+    import http.client as http_client
+    http_client.HTTPConnection.debuglevel = 1
+
+    # You must initialize logging, otherwise you'll not see debug output.
+    logging.basicConfig()
+    logging.getLogger().setLevel(logging.DEBUG)
+    requests_log = logging.getLogger("requests.packages.urllib3")
+    requests_log.setLevel(logging.DEBUG)
+    requests_log.propagate = True
+
 
 
 class Client:
     """Client to make calls to a LoRAX instance

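The fix above changes only the variable name, so the check is still a presence test: any value of `LORAX_DEBUG_MODE`, including an empty string or `0`, turns debug mode on. The sketch below illustrates that behavior in isolation. The helper name `debug_mode_enabled` is hypothetical and is not part of the LoRAX client.

```python
import os


def debug_mode_enabled(environ=os.environ) -> bool:
    """Hypothetical helper mirroring the patched presence check.

    Equivalent to `os.getenv("LORAX_DEBUG_MODE", None) is not None`:
    the variable only has to exist, its value is never inspected.
    """
    return environ.get("LORAX_DEBUG_MODE") is not None


if __name__ == "__main__":
    # The misspelled name read before this commit no longer has any effect.
    print(debug_mode_enabled({"LORAD_DEBUG_MODE": "1"}))
    # Even "0" counts as enabled, because only presence is tested.
    print(debug_mode_enabled({"LORAX_DEBUG_MODE": "0"}))
```

To turn debug mode off, unset the variable rather than setting it to a falsy-looking value.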
docs/guides/contributing/development_env.md

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -47,12 +47,12 @@ We'll be working out of three different terminals during development, each servi
4747
Install development dependencies:
4848

4949
```shell
50-
DEBIAN_FRONTEND=noninteractive apt install pkg-config rsync tmux rust-gdb git -y
50+
DEBIAN_FRONTEND=noninteractive apt install pkg-config rsync tmux rust-gdb git -y && \
5151
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip && \
5252
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP && \
5353
unzip -o $PROTOC_ZIP -d /usr/local bin/protoc && \
5454
unzip -o $PROTOC_ZIP -d /usr/local 'include/*' && \
55-
rm -f $PROTOC_ZIP
55+
rm -f $PROTOC_ZIP && \
5656
hash -r
5757
```
5858

@@ -71,8 +71,7 @@ tmux new -s server
7171
From within the `tmux` session, move into the LoRAX `server` directory within the repo (assumed to be in `/data/lorax`) and install dependencies:
7272

7373
```shell
74-
cd /data/lorax/server
75-
pip install -e .
74+
cd /data/lorax/server && pip install -e .
7675
make gen-server
7776
```
7877

@@ -95,9 +94,9 @@ tmux new -s router
9594
Now move into the `router` directory within the repo and install dependencies:
9695

9796
```shell
98-
cd /data/lorax/router
99-
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
100-
export PATH=$PATH:$HOME/.cargo/bin
97+
cd /data/lorax/router && \
98+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y && \
99+
export PATH=$PATH:$HOME/.cargo/bin && \
101100
touch ../proto/generate.proto
102101
```
103102

docs/guides/contributing/index.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,22 @@ make export-requirements
2323
```
2424

2525
Never modify `requirements.txt` directly, as it may introduce dependency conflicts.
26+
27+
## Profiling
28+
29+
LoRAX supports the [PyTorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html) to measure performance of LoRAX.
30+
31+
You can enable profiling when launching LoRAX by setting the `LORAX_PROFILER_DIR` environment variable to the directory
32+
you wish to output the Tensorboard traces to.
33+
34+
Once initialized, LoRAX will begin recording traces for every request to the server. Because traces can get very large,
35+
we record only the first 10 prefill requests (plus any decode requests between them), then stop recording and write
36+
out the results. A summary will be printed to stdout when this occurs.
37+
38+
Once you have your traces written to the profiler directory, you can visualize them in Tensorboard using the
39+
[PyTorch Profiler Tensorboard Plugin](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html).
40+
41+
```bash
42+
pip install torch_tb_profiler
43+
tensorboard --logdir=$LORAX_PROFILER_DIR
44+
```
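The recording policy described in the new Profiling section (opt in through `LORAX_PROFILER_DIR`, keep the first 10 prefill requests plus the decode requests between them, then stop) can be sketched in plain Python. This is an illustration of the policy only, not the server's implementation: the names `BoundedTraceRecorder` and `recorder_from_env` are hypothetical, and the choices to stop immediately after the 10th prefill and to treat an empty variable as unset are assumptions.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundedTraceRecorder:
    """Hypothetical helper: records request kinds until a prefill budget is spent."""

    max_prefills: int = 10
    prefills_seen: int = 0
    active: bool = True
    recorded: List[str] = field(default_factory=list)

    def observe(self, kind: str) -> bool:
        """Record one request ("prefill" or "decode"); return True if it was recorded."""
        if not self.active:
            return False
        self.recorded.append(kind)
        if kind == "prefill":
            self.prefills_seen += 1
            if self.prefills_seen >= self.max_prefills:
                # Assumption: recording stops right after the last budgeted prefill.
                self.active = False
        return True


def recorder_from_env(environ=os.environ) -> Optional[BoundedTraceRecorder]:
    """Return a recorder only when LORAX_PROFILER_DIR is set to a non-empty value."""
    if environ.get("LORAX_PROFILER_DIR"):
        return BoundedTraceRecorder()
    return None
```

With alternating prefill and decode requests, this sketch keeps 10 prefills and the 9 decodes that fall between them, then ignores everything that follows.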

launcher/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ clap = { version = "4.1.4", features = ["derive", "env"] }
 ctrlc = { version = "3.2.5", features = ["termination"] }
 nix = "0.26.2"
 openssl = "0.10.66"
+hf-hub = { version = "0.3.0", features = ["tokio"] }
 h2 = "0.3.26"
 rustix = "0.37.25"
 serde = { version = "1.0.152", features = ["derive"] }
