-
Simple test script on guidance @ 9629e88, an M3 macOS machine, and Python 3.11:

```python
from datetime import datetime

from guidance import models, gen, user, assistant
from guidance.chat import Phi3MiniChatTemplate

if __name__ == "__main__":
    model = models.Transformers(
        "microsoft/Phi-3.5-mini-instruct",
        chat_template=Phi3MiniChatTemplate,
    )

    with user():
        model += "Hello. How are you?\n"

    response_start = datetime.now()
    with assistant():
        model += gen(name="response", stop="\n")
    response_end = datetime.now()

    print(model["response"])
    print(response_end - response_start)
```

Results after running this three times in a row:

```
$ python test.py
Loading checkpoint shards: 100%|████████████████| 2/2 [00:10<00:00, 5.31s/it]
gpustat is not installed, run `pip install gpustat` to collect GPU stats.
I'm Phi, an AI language model, so I don't have feelings, but I'm fully operational and ready to assist you! How can I help you today?
0:00:17.377674

$ python test.py
Loading checkpoint shards: 100%|████████████████| 2/2 [00:12<00:00, 6.06s/it]
gpustat is not installed, run `pip install gpustat` to collect GPU stats.
I'm Phi, an AI language model, so I don't have feelings, but I'm fully operational and ready to assist you! How can I help you today?
0:00:19.502854

$ python test.py
Loading checkpoint shards: 100%|████████████████| 2/2 [00:13<00:00, 6.64s/it]
gpustat is not installed, run `pip install gpustat` to collect GPU stats.
I'm Phi, an AI language model, so I don't have feelings, but I'm fully operational and ready to assist you! How can I help you today?
0:00:21.474961
```

Is this normal? I don't see any GPU load on my machine, and I assume there is something I can do to speed this up. There is some material in the Transformers docs on GPUs, but I'm not sure what, if anything, applies to my situation. Is it appropriate to add some docs to Guidance on configuring your environment for the best performance? If not, what do you suggest I try?
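As a first diagnostic (a minimal sketch, assuming PyTorch is installed), you can check whether PyTorch sees the Apple-silicon GPU via the MPS backend at all:

```python
import torch

# Both must be True for GPU acceleration on Apple silicon:
#   is_built()     -> this PyTorch build includes MPS support
#   is_available() -> the running machine actually exposes an MPS device
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())
```

If either prints `False`, no amount of configuration on the guidance side will get the model onto the GPU.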
Replies: 2 comments 7 replies
-
Doing some more reading, I believe I need to configure an MPS backend to get GPU acceleration on macOS. And I noticed a [...] (Line 69 in 9629e88). So I installed the [...]:

```python
if __name__ == "__main__":
    model = models.Transformers(
        "microsoft/Phi-3.5-mini-instruct",
        chat_template=Phi3MiniChatTemplate,
        device_map="mps:0",
    )
```

However, whether the value is [...]

I'm not sure if the mention of [...]

How is this ideally supposed to work?
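For what it's worth (a small sketch, assuming PyTorch is installed), `"mps"` and `"mps:0"` parse to the same device type; the suffix is just a device index, so on a single-GPU Mac the two spellings should refer to the same device:

```python
import torch

# torch.device() parses the string without requiring the device to exist.
plain = torch.device("mps")
indexed = torch.device("mps:0")

print(plain.type, plain.index)      # "mps", None (unspecified index)
print(indexed.type, indexed.index)  # "mps", 0
```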
-
Extra parameters to the [...]
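Reading between the lines of the (truncated) reply above: if extra keyword arguments to `models.Transformers()` are forwarded to Hugging Face's `from_pretrained()` (an assumption on my part, not confirmed here), a half-precision MPS configuration might look something like this:

```python
import torch
from guidance import models
from guidance.chat import Phi3MiniChatTemplate

# Hypothetical sketch: device_map and torch_dtype are standard
# transformers from_pretrained() kwargs, assumed here to be passed
# through by guidance's models.Transformers wrapper.
model = models.Transformers(
    "microsoft/Phi-3.5-mini-instruct",
    chat_template=Phi3MiniChatTemplate,
    device_map="mps",           # place the model on the Apple-silicon GPU
    torch_dtype=torch.float16,  # half precision: less memory, usually faster
)
```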
Oh wow! I just copied `torch_dtype` from the Transformers docs and didn't really think about it. Yes, I am seeing similar performance (and GPU usage) if I update my call to `models.Transformers()` as follows: