GitHub - tzujohsu/LLM_speculative_decoding_evaluation: Accelerating Large Language Models inference with SPS and BiLD algorithms

Accelerating LLM inference with SPS and BiLD

Speculative decoding has emerged as a key approach to improving the efficiency of Large Language Models (LLMs). It targets inference latency, which is caused primarily by memory-bound computation. This project explores how speculative decoding can be adapted across different model series and configurations.

This repo aims to implement two algorithms: (1) DeepMind's Speculative Sampling (SpS) algorithm, with auto-regressive target and draft models; (2) the Big Little Decoder (BiLD) algorithm, with fallback and rollback policies.
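As a rough illustration of the SpS verification step, here is a minimal sketch in plain Python. It is not the code in speculative_sampling.py; the function name `speculative_step` and its arguments are hypothetical, and the distributions are passed in as plain lists rather than produced by real models. Each drafted token is accepted with probability min(1, p/q), where p and q are the target and draft probabilities of that token; the first rejected token is resampled from the renormalized residual max(0, p - q), and if every draft is accepted one extra token is sampled from the target.

```python
import random


def speculative_step(draft_tokens, draft_probs, target_probs, rng):
    """One verification pass of speculative sampling (illustrative sketch).

    draft_tokens: K token ids proposed by the draft model.
    draft_probs:  K distributions (lists over the vocab) the drafts were sampled from.
    target_probs: K + 1 target-model distributions for the same positions.
    Returns the tokens emitted in this step (between 1 and K + 1 of them).
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        # Accept the drafted token with probability min(1, p / q).
        if rng.random() < min(1.0, p / q):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q); choices() renormalizes.
        residual = [max(0.0, pt - qt)
                    for pt, qt in zip(target_probs[i], draft_probs[i])]
        out.append(rng.choices(range(len(residual)), weights=residual)[0])
        return out
    # All K drafts accepted: sample one bonus token from the target.
    last = target_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out


# Greedy (one-hot) example: the target agrees on the first draft, not the second.
rng = random.Random(0)
draft = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(speculative_step([1, 2], draft, target, rng))  # [1, 0]
```

With one-hot distributions, as in the temperature 0 setting, the acceptance ratio is either 0 or 1, so the step reduces to keeping the longest prefix on which the draft and target agree and then taking the target's token.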

To run the experiment:

python benchmark.py \
    --target_model_name facebook/opt-6.7b \
    --approx_model_name facebook/opt-125m \
    --temperature 0 \
    --max_tokens 30 \
    --fallback_thres 0.6 \
    --rollback_thres 3
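The `--fallback_thres` and `--rollback_thres` flags presumably map to BiLD's two policies. The sketch below shows one plausible reading and is not taken from BiLD.py: the helper names are hypothetical, and the rollback distance is assumed to be the negative log-probability that the large model assigns to the small model's token, following the general description of BiLD. The repo's actual metric may differ.

```python
import math


def should_fallback(small_probs, fallback_thres):
    """Fallback policy: hand control to the large model when the small
    model's top probability drops below the threshold."""
    return max(small_probs) < fallback_thres


def first_rollback_index(small_tokens, large_probs, rollback_thres):
    """Rollback policy: return the first position where the large model
    disagrees strongly with the small model's token, or None.

    Assumed distance: -log p_large(small_token) at each position.
    """
    for i, tok in enumerate(small_tokens):
        p = large_probs[i][tok]
        distance = math.inf if p <= 0.0 else -math.log(p)
        if distance > rollback_thres:
            return i
    return None


# Thresholds taken from the command above.
print(should_fallback([0.5, 0.3, 0.2], 0.6))  # True
print(first_rollback_index([0, 1], [[0.9, 0.1], [0.99, 0.01]], 3))  # 1
```

In this reading, a lower fallback threshold lets the small model run longer before handing off, and a higher rollback threshold makes the large model more tolerant of the small model's choices.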

This is a project repo for the eecs598:llm course.

Folders and files (36 commits):

__pycache__
t5
.gitignore
BiLD.py
README.md
autoregressive_sampling.py
benchmark.py
combined_data.jsonl
speculative_sampling.py
speedup-plot.png
speedup_plot.ipynb
utils.py
