Qualcomm AI Engine Direct - GA Static Olmo-1b #14065
base: main
Conversation
Summary:
- e2e script for GA Static OLMo-1b
- perf: 16a4w block-quant token rate in KV mode ≈ 63 tokens/sec (SM8750)
- accuracy: PPL on the wikitext dataset ≈ 8.735 (FP) -> 9.945 (HTP)
- add model params file & model weight converter
- add workaround pass for LayerNorm without weight & unit test
- fix LayerNorm op builder & fix LayerNorm quant annotator
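The FP-vs-HTP perplexity comparison above can be reproduced with a small helper. This is a hedged sketch assuming the usual exp(mean negative log-likelihood) definition of perplexity, not the exact evaluation script used in this PR; the vocab size below is only for the sanity check.

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Perplexity = exp(mean negative log-likelihood) over target tokens.

    logits:  [seq_len, vocab_size] model outputs
    targets: [seq_len] ground-truth token ids
    """
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll).item()

# Sanity check: a uniform distribution over V tokens has PPL == V.
vocab = 50304  # illustrative vocab size, not taken from the PR
logits = torch.zeros(8, vocab)            # all-zero logits -> uniform softmax
targets = torch.randint(0, vocab, (8,))
print(round(perplexity(logits, targets)))  # -> 50304
```

In practice the wikitext PPL numbers come from sliding this computation over the dataset's token stream and averaging the NLL across all predicted positions.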
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14065
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure. As of commit c869d51 with merge base 2845fd3, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Hi @cccclai, this PR enables OLMo-1B from the GA list in the static version.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "allenai/OLMo-1B-hf"
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
hf_tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Simply put, the theory of relativity states that"  # prompt inferred from the output below
inputs = hf_tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=1024,
    eos_token_id=hf_tokenizer.eos_token_id,
    do_sample=False,
)
print(hf_tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Output:

```
Simply put, the theory of relativity states that the speed of light is the same in all directions.
The speed of light is the same in all directions.
The speed of light is the same in all directions. The speed of light is the same in all directions.
The speed of light is the same in all directions.
....
```

Due to this repetitive behavior, I've created this as a draft PR.
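Greedy decoding (`do_sample=False`) often loops like this on small base models. One common mitigation is a repetition penalty, which `transformers` exposes as `repetition_penalty=` on `model.generate`. A minimal sketch of the CTRL-style penalty that processor applies (function name here is illustrative):

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated_ids: torch.Tensor,
                             penalty: float = 1.3) -> torch.Tensor:
    """CTRL-style repetition penalty, as in transformers'
    RepetitionPenaltyLogitsProcessor: scores of already-generated tokens are
    pushed down -- positive logits divided by `penalty`, negative multiplied.
    """
    out = logits.clone()
    scores = out[generated_ids]
    out[generated_ids] = torch.where(scores > 0, scores / penalty, scores * penalty)
    return out

logits = torch.tensor([2.0, -1.0, 0.5, 3.0])
seen = torch.tensor([0, 1])  # tokens 0 and 1 were already generated
print(apply_repetition_penalty(logits, seen, penalty=2.0))  # -> [1.0, -2.0, 0.5, 3.0]
```

Whether a penalty (or sampling) is acceptable here depends on how the static runner decodes, so this is only a possible direction, not a fix for the FP model itself.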
I think it is worth enabling it first and improving the accuracy later. If the accuracy isn't great, maybe have the default recipe use 8-bit weights for now, just so users can get reasonable results from it.
Also, did you apply SpinQuant and SeqMSE with this model?
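For context, SpinQuant's core idea can be shown in a few lines: rotating weights by an orthogonal matrix (which can be folded into adjacent layers, leaving the network's function unchanged) spreads outlier channels across dimensions and shrinks per-tensor quantization error. A toy sketch with a random QR-based rotation, not the actual learned/Hadamard rotations SpinQuant uses:

```python
import torch

torch.manual_seed(0)
d = 128
w = torch.randn(512, d)
w[:, 0] *= 25.0  # inject an outlier channel, as seen in LLM weights/activations

# Random orthogonal rotation; enough to demonstrate the effect.
q, _ = torch.linalg.qr(torch.randn(d, d))
w_rot = w @ q

def quant_rmse(x: torch.Tensor, bits: int = 4) -> float:
    """RMSE of symmetric per-tensor round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    deq = torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
    return (deq - x).pow(2).mean().sqrt().item()

print(f"plain 4-bit RMSE:   {quant_rmse(w):.3f}")
print(f"rotated 4-bit RMSE: {quant_rmse(w_rot):.3f}")  # lower: outliers spread out
```

SeqMSE is a different technique (sequentially choosing quantization parameters to minimize layer-output MSE) and is not sketched here.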
cc: @rohan if you have bandwidth on this |
Hi @cccclai, I think the FP model itself already keeps repeating. Could we switch to
I see, does this model work better? https://huggingface.co/allenai/OLMo-2-0425-1B |
We'll try both and see which one is more feasible. I think the instruct version is empirically better than the vanilla one.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
cc: @haowhsu-quic