Commit 234aa33

Update README.md
fix the GPU device setting for CLI
1 parent 10e7387 commit 234aa33

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -97,7 +97,7 @@ We currently support inference in the single GPU and batch size 1 setting, which
 
 You can use the following command for launching a CLI interface:
 ```bash
-python -m medusa.inference.cli --model [path of medusa model]
+CUDA_VISIBLE_DEVICES=0 python -m medusa.inference.cli --model [path of medusa model]
 ```
 You can also pass `--load-in-8bit` or `--load-in-4bit` to load the base model in quantized format.
 
@@ -162,4 +162,4 @@ We also provide some illustrative notebooks in `notebooks/` to help you understa
 We welcome community contributions to Medusa. If you have an idea for how to improve it, please open an issue to discuss it with us. When submitting a pull request, please ensure that your changes are well-tested. Please split each major change into a separate pull request. We also have a [Roadmap](ROADMAP.md) summarizing our future plans for Medusa. Don't hesitate to reach out if you are interested in contributing to any of the items on the roadmap.
 
 ## Acknowledgements
-This codebase is influenced by amazing works from the community, including [FastChat](https://github.com/lm-sys/FastChat), [TinyChat](https://github.com/mit-han-lab/llm-awq/tree/main/), [vllm](https://github.com/vllm-project/vllm) and many others.
+This codebase is influenced by amazing works from the community, including [FastChat](https://github.com/lm-sys/FastChat), [TinyChat](https://github.com/mit-han-lab/llm-awq/tree/main/), [vllm](https://github.com/vllm-project/vllm) and many others.
````
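For context on the fix: `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable that restricts which GPUs a process can see, so prefixing the command with `CUDA_VISIBLE_DEVICES=0` pins the CLI to the first GPU, matching the single-GPU setting the README describes. A minimal usage sketch is below; the model path is the same placeholder used in the README, and the second command assumes a machine with at least two GPUs.

```bash
# Run the CLI on the first GPU only; CUDA renumbers it as device 0 inside the process.
CUDA_VISIBLE_DEVICES=0 python -m medusa.inference.cli --model [path of medusa model]

# Same command pinned to the second physical GPU instead (assumes one exists).
CUDA_VISIBLE_DEVICES=1 python -m medusa.inference.cli --model [path of medusa model]

# Optionally load the base model quantized, per the README's note on 8-bit loading.
CUDA_VISIBLE_DEVICES=0 python -m medusa.inference.cli --model [path of medusa model] --load-in-8bit
```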
