
Conversation

@vmpuri (Contributor) commented on Oct 25, 2024

TPS and other stats were being reported as NaN due to faulty logic.

I don't see any reason why compile should block these stats from being printed, or why there should be a separate TPS for JIT compilation.
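For illustration, here's a minimal sketch of the kind of guard that keeps these stats from coming out as NaN; the function name and fields are assumptions for this example, not torchchat's actual generate.py code:

```python
# Hypothetical sketch (not torchchat's actual code): guard each division so a
# stat is only reported when its denominator is meaningful, instead of
# emitting NaN and instead of special-casing the compiled first iteration.
def throughput_stats(num_tokens: int, total_sec: float, first_token_sec: float) -> dict:
    stats = {}
    if num_tokens > 0 and total_sec > 0:
        stats["tokens_per_sec"] = num_tokens / total_sec
    if first_token_sec > 0:
        stats["first_token_per_sec"] = 1.0 / first_token_sec
    decode_sec = total_sec - first_token_sec
    if num_tokens > 1 and decode_sec > 0:
        stats["next_tokens_per_sec"] = (num_tokens - 1) / decode_sec
    return stats
```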

python3 torchchat.py generate llama3.2-1b --compile --device cuda
Using device=cuda NVIDIA PG509-210
Loading model...
Time to load model: 1.19 seconds
-----------------------------------------------------------
Hello, my name is Sophia. I'm a huge fan of your work. I've been following your blog for a while now and I just wanted to say that your content is top-notch. I love how you share your passion for history, science, and culture with your audience.

As a young woman in my early twenties, I'm always looking for new and interesting things to read about. Your blog is the perfect place to learn something new and expand my knowledge on a subject that I'm really interested in. I've read a few of your posts on archaeology and I was really impressed with what you had to say about it.

I was born and raised in a small town, and I've always been fascinated by the history of my hometown. I have a lot of amazing memories of visiting the local historical society and attending events and exhibitions. Your blog has made me realize how much I want to learn more about the history of my town and the region. I've been thinking about studying history as a career,
just-in-time compilation time (incl run time): 8.3e+01 seconds
2024-10-24:17:19:37,884 INFO     [generate.py:1171] 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 199 tokens
Time for inference 1: 82.8278 sec total
Time to first token: 1.3995 sec with parallel prefill.

      Total throughput: 2.4146 tokens/sec, 0.4141 s/token
First token throughput: 0.7145 tokens/sec, 1.3995 s/token
 Next token throughput: 2.4439 tokens/sec, 0.4092 s/token
2024-10-24:17:19:37,885 INFO     [generate.py:1182] 
Bandwidth achieved: 7.24 GB/s
2024-10-24:17:19:37,885 INFO     [generate.py:1186] *** This first iteration will include cold start effects for dynamic import, hardware caches, JIT compilation. ***

========================================


      Average tokens/sec (total): 2.41
Average tokens/sec (first token): 0.71
Average tokens/sec (next tokens): 2.44

Memory used: 2.83 GB
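As a sanity check, the printed throughput figures can be reproduced from the raw timings above, assuming the total counts the prefill token plus the 199 decode tokens:

```python
# Reproduce the stats printed above from the raw timings (assumption: total
# throughput counts the first token plus the 199 decode tokens, i.e. 200).
total_sec = 82.8278       # "Time for inference 1"
first_token_sec = 1.3995  # "Time to first token"
decode_tokens = 199       # "Generated 199 tokens"

total_tps = (decode_tokens + 1) / total_sec               # ~2.4146 tokens/sec
first_tps = 1.0 / first_token_sec                         # ~0.7145 tokens/sec
next_tps = decode_tokens / (total_sec - first_token_sec)  # ~2.4439 tokens/sec
print(f"total={total_tps:.4f} first={first_tps:.4f} next={next_tps:.4f}")
```

The bandwidth line is consistent with these figures as well: 7.24 GB/s ÷ 2.4146 tokens/sec ≈ 3.0 GB read per token, on the order of the 2.83 GB of memory reported.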

@pytorch-bot (bot) commented on Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1330

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 52caa9c with merge base 7fe2c86:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vmpuri requested a review from @Jack-Khuu on Oct 25, 2024 00:20
@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Oct 25, 2024
@vmpuri requested a review from @jerryzh168 on Oct 25, 2024 00:21
@vmpuri marked this pull request as ready for review on Oct 25, 2024 00:21
@Jack-Khuu merged commit 77774d2 into main on Oct 25, 2024
52 checks passed