Replies: 3 comments
-
So after digging through the C++ source code, the answer is:

logits = generator.get_output("logits")

However, for some reason, at the first step the token with the maximum logit differs from the token the generator returns. The script below prints both for comparison:

import onnxruntime_genai as og
import numpy as np

prompt = '''<|user|>
Please tell me the time.<|end|>
<|assistant|>'''

model = og.Model("/home/ubuntu/models/Phi-3-mini-4k-instruct-onnx/cuda/cuda-fp16/")
tokenizer = og.Tokenizer(model)
tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.input_ids = tokens
generator = og.Generator(model, params)

i = 0
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Token chosen by the generator at this step
    new_token = generator.get_next_tokens()[0]
    # Token with the largest value in the raw logits output
    logits = generator.get_output("logits").squeeze()
    new_token2 = np.argmax(logits)
    print(new_token, " ", new_token2)
    # Stop after a handful of steps; this is only a comparison
    i += 1
    if i > 10:
        break
print()

And the result:
-
Created an issue for this, as it looks like it needs to be investigated.
-
See #591.
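One possible explanation for the first-step mismatch reported earlier in this thread, offered as an assumption and not as a confirmed diagnosis: if the first call returns logits for every prompt position, with shape (batch, prompt_length, vocab_size), then squeeze() leaves a 2-D array and np.argmax without an axis returns an index into the flattened array, not a token id. The NumPy-only sketch below uses made-up shapes and values to show the difference; last_position_argmax is a hypothetical helper, not part of onnxruntime_genai.

```python
import numpy as np

# Hypothetical shapes for illustration only: batch of 1, a 4-token prompt,
# and a 6-entry vocabulary. Real models have far larger vocabularies.
logits = np.zeros((1, 4, 6), dtype=np.float32)
logits[0, 1, 2] = 9.0  # large value at an earlier prompt position
logits[0, 3, 5] = 3.0  # largest value at the last prompt position

squeezed = logits.squeeze()            # shape (4, 6), not (6,)
flat_index = int(np.argmax(squeezed))  # index into the flattened (4 * 6) array


def last_position_argmax(step_logits: np.ndarray) -> int:
    """Hypothetical helper: argmax over the vocabulary at the last position.

    Works for both (1, seq_len, vocab) and (1, 1, vocab) inputs by
    collapsing every leading axis into a single position axis.
    """
    per_position = step_logits.reshape(-1, step_logits.shape[-1])
    return int(np.argmax(per_position[-1]))


next_token = last_position_argmax(logits)

print(flat_index)  # 8, i.e. row 1 * 6 + column 2, which is not a token id
print(next_token)  # 5
```

If the assumption about the first-step shape holds, comparing get_next_tokens()[0] against the argmax of the last position only would be the like-for-like check. Sampling settings other than greedy search could still make the two values differ.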
-
According to the documentation, generator.get_output() should return the generated logits. In practice, this is the error message I get:

The function expects an input string. However, no matter what I put, the output is array([], dtype=float64).

What is the correct way to use this method?