Oh wow! I just copied torch_dtype from the Transformers docs and didn't really think about it.

Yes, I am seeing similar performance (and GPU usage) if I update my call to models.Transformers() as follows:

from datetime import datetime
from guidance import models, gen, user, assistant
from guidance.chat import Phi3MiniChatTemplate
from accelerate import Accelerator


if __name__ == "__main__":
    accelerator = Accelerator()
    model = models.Transformers(
        "microsoft/Phi-3.5-mini-instruct",
        chat_template=Phi3MiniChatTemplate,
        torch_dtype="auto",
        device_map=accelerator.device,
    )
    with user():
        model += "Hello. How are you?\n"

    # Time how long the assistant's response takes to generate.
    response_start = datetime.now()
    with assistant():
        model += gen("response")
    print(f"Generated in {datetime.now() - response_start}")
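As an aside, torch_dtype="auto" tells Transformers to read the dtype from the checkpoint's config instead of the default float32, which halves memory use and speeds up GPU inference. If you want to see what "auto" resolves to without loading the weights, a quick check with plain Transformers (the bfloat16 value here is what the Phi-3.5 config ships with, so verify against your checkpoint):

from transformers import AutoConfig

# "auto" resolves to whatever dtype the checkpoint's config declares;
# for microsoft/Phi-3.5-mini-instruct that is bfloat16.
config = AutoConfig.from_pretrained("microsoft/Phi-3.5-mini-instruct")
print(config.torch_dtype)  # torch.bfloat16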
