# Qualcomm AI Engine Direct - Support simple_eval in calibration, perpl… #12958
## Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12958

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures as of commit 7a1a1d3 with merge base 9e00a51. The following jobs have failed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Hi @cccclai, there are some jobs that run less frequently, like https://github.com/pytorch/executorch/blob/b6b7a16df5e7852d976d8c34c8a7e9a1b6f7d005/.github/workflows/periodic.yml and https://github.com/pytorch/executorch/blob/b6b7a16df5e7852d976d8c34c8a7e9a1b6f7d005/.github/workflows/nightly.yml; they won't block CI. Maybe we can use these jobs.
Force-pushed from 4f7d594 to dd3b173 (Compare).
Sure! We will support the CI under these yml files in a future PR. Thanks!
Force-pushed from dd3b173 to 1cd8022 (Compare).
Hi @cccclai, there are some incoming PRs we would like to let you know about. Thank you.
Force-pushed from bb81d56 to f0e16d1 (Compare).
Hi @cccclai, after this PR is merged, we will push another PR that applies some optimizations. With those optimizations, we should be able to get a perplexity (ppl) score of 12 for QNN on device, which aligns with `prepare_pt2e` and `convert_pt2e`.
Force-pushed from 43ce5b2 to 7a1a1d3 (Compare).
Looks great, thank you!
pytorch#12958

### Summary
- Enable perplexity evaluation on device with `llama.py`.
- Evaluate perplexity after QDQ on CPU.
- Enable quantization to use simple_eval as the calibration dataset.
- Enable a UT to check perplexity for Qwen, which should be more reliable than checking the string output.

Will have a follow-up PR to address:
- External CI enablement for Qwen on x86 (if it does not take too long).
- Hide logits scale/offset in the model metadata.

#### Script
`python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s $DEVICE -m SM8750 --prompt "What is 1+1?" --temperature 0 --model_mode kv --max_seq_len 1024 --ptq 16a8w --decoder_model qwen2_5 --eval_perplexity --tasks wikitext`

### Test plan
`python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_qwen2_5 --model SM8650 --build_folder build-android/ --executorch_root . -s $DEVICE`

Author: @shewu-quic, @winskuo-quic
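For context on the metric used in the test plan above: perplexity is conventionally computed as the exponential of the average negative log-likelihood the model assigns to the ground-truth tokens, which is why a numeric threshold check is more robust than exact string matching. This is a minimal illustrative sketch, not the PR's actual implementation (which goes through lm-eval's wikitext task):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_logprobs: natural-log probabilities the model assigned to
    each ground-truth token in the evaluation text.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Sanity check: a model uniformly uncertain over a 4-token vocabulary
# assigns probability 1/4 (log-prob log(0.25)) to every token, so its
# perplexity is exactly the vocabulary size, 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))  # 4.0 (up to floating point)
```

A unit test can then assert `perplexity(...) <= threshold` (e.g. the ppl of roughly 12 mentioned in the conversation) instead of comparing decoded strings, which tolerates benign token-level differences between quantized and reference outputs.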