mcr229 (Contributor) commented Oct 2, 2024

We are pulling in and testing the new 16x4 Kleidi kernels, which give significant performance improvements.

pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5827

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 48fef5e with merge base fbcd332:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Oct 2, 2024
mcr229 requested a review from digantdesai October 2, 2024 21:17
mcr229 (Contributor, Author) commented Oct 3, 2024

We see a significant speedup in prefill performance for Llama models: prompt evaluation goes from about 292 to about 358 tokens/second, roughly 22% higher throughput.

Before:

I 00:00:05.587790 executorch:stats.h:84] 	Prompt Tokens: 64    Generated Tokens: 63
I 00:00:05.587793 executorch:stats.h:90] 	Model Load Time:		3.999000 (seconds)
I 00:00:05.587796 executorch:stats.h:100] 	Total inference time:		1.579000 (seconds)		 Rate: 	39.898670 (tokens/second)
I 00:00:05.587806 executorch:stats.h:108] 		Prompt evaluation:	0.219000 (seconds)		 Rate: 	292.237443 (tokens/second)
I 00:00:05.587809 executorch:stats.h:119] 		Generated 63 tokens:	1.360000 (seconds)		 Rate: 	46.323529 (tokens/second)
I 00:00:05.587812 executorch:stats.h:127] 	Time to first generated token:	0.219000 (seconds)
I 00:00:05.587816 executorch:stats.h:134] 	Sampling time over 127 tokens:	0.014000 (seconds)

After:

I 00:00:05.917623 executorch:stats.h:97] 	Prompt Tokens: 64    Generated Tokens: 63
I 00:00:05.917626 executorch:stats.h:103] 	Model Load Time:		0.000000 (seconds)
I 00:00:05.917628 executorch:stats.h:113] 	Total inference time:		1.326000 (seconds)		 Rate: 	47.511312 (tokens/second)
I 00:00:05.917632 executorch:stats.h:121] 		Prompt evaluation:	0.179000 (seconds)		 Rate: 	357.541899 (tokens/second)
I 00:00:05.917635 executorch:stats.h:132] 		Generated 63 tokens:	1.147000 (seconds)		 Rate: 	54.925894 (tokens/second)
I 00:00:05.917639 executorch:stats.h:140] 	Time to first generated token:	0.179000 (seconds)
I 00:00:05.917641 executorch:stats.h:147] 	Sampling time over 127 tokens:	0.009000 (seconds)
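
For reference, the relative improvement can be worked out directly from the two logs above. This is a minimal sketch in Python with the figures copied from the logs; the `speedup` helper and the dictionary keys are hypothetical names used only for illustration, and each side reflects a single run, so the ratios are indicative rather than a rigorous benchmark.

```python
# Figures copied from the "Before" and "After" logs above.
before = {"prefill_tok_s": 292.237443, "decode_tok_s": 46.323529, "total_s": 1.579}
after = {"prefill_tok_s": 357.541899, "decode_tok_s": 54.925894, "total_s": 1.326}


def speedup(old_rate, new_rate):
    """Return the throughput ratio new/old (hypothetical helper)."""
    return new_rate / old_rate


prefill = speedup(before["prefill_tok_s"], after["prefill_tok_s"])
decode = speedup(before["decode_tok_s"], after["decode_tok_s"])
# Fraction of total inference time saved by the new kernels.
total_time_saved = 1 - after["total_s"] / before["total_s"]

print(f"prefill: {prefill:.2f}x, decode: {decode:.2f}x, total time: -{total_time_saved:.0%}")
# prints "prefill: 1.22x, decode: 1.19x, total time: -16%"
```

Note that the gain is not limited to prefill: the decode rate for the 63 generated tokens also improves by a similar margin in these logs.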

mcr229 closed this Jul 25, 2025
mcr229 deleted the qp8++ branch July 25, 2025 22:43