This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Commit bea750d

1 parent a039ab1 commit bea750d

25 files changed (4688 additions & 2 deletions)

llama31-1218/cpu_aoti_4.txt

Lines changed: 655 additions & 0 deletions

llama31-1218/cpu_aoti_8.txt

Lines changed: 232 additions & 0 deletions

llama31-1218/cpu_aoti_b16.txt

Lines changed: 228 additions & 0 deletions

llama31-1218/cpu_aoti_pt2_4.txt

Lines changed: 675 additions & 0 deletions

llama31-1218/cpu_aoti_pt2_8.txt

Lines changed: 228 additions & 0 deletions

llama31-1218/cpu_aoti_pt2_b16.txt

Lines changed: 222 additions & 0 deletions

llama31-1218/cpu_compile_4.txt

Lines changed: 302 additions & 0 deletions

llama31-1218/cpu_compile_8.txt

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 python3 torchchat.py generate llama3.1 --quantize '{"linear:int8": {"groupsize": 0}, "precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256 --compile --num-samples 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 python3 torchchat.py generate llama3.1 --quantize '{"linear:int8": {"groupsize": 0}, "precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256 --compile --num-samples 3
PyTorch version 2.6.0.dev20241218+cu124 available.
Unabled to import torchao experimental quant_api with error: [Errno 2] No such file or directory: '/home/jackkhuu/oss/torchchat/torchao-build/src/ao/torchao/experimental/quant_api.py'
Using device=cpu Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz
Loading model...
Time to load model: 0.10 seconds
Quantizing the model with: {'linear:int8': {'groupsize': 0}, 'precision': {'dtype': 'bfloat16'}, 'executor': {'accelerator': 'cpu'}}
Time to quantize model: 29.60 seconds
-----------------------------------------------------------
Once upon a time, in a small village nestled in the rolling hills of Provence, there lived a young artist named Sophie. Sophie was known throughout the village for her exquisite watercolors of the local flora and fauna. She spent her days painting the vibrant bouquets of wildflowers that bloomed in the nearby fields, and the majestic birds that soared through the sky.

Sophie's studio was a cozy little room above her family's bakery, filled with the sweet scent of freshly baked bread wafting through the air. Her easel stood by the window, where she could paint the ever-changing light of the Provençal sky. Sophie's art was not just a reflection of her love for nature, but also a way for her to connect with the beauty of the world around her.

One day, a wealthy patron from the city, Monsieur LeFleur, arrived in the village in search of a new artist to paint the gardens of his estate. He had heard about Sophie's extraordinary talent and sought her out, eager to commission a series of paintings that would capture the essence of the Provençal landscape.

Sophie was both thrilled and intimidated by the opportunity. She had never painted on such a grand scale before, and the pressure to produce something truly exceptional was daunting.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 1: 176.6127 sec total
Time to first token: 0.6387 sec with parallel prefill.

Total throughput: 1.4495 tokens/sec, 0.6899 s/token
First token throughput: 1.5656 tokens/sec, 0.6387 s/token
Next token throughput: 1.4491 tokens/sec, 0.6901 s/token

Bandwidth achieved: 12.41 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches, JIT compilation. ***
just-in-time compilation time (incl run time): 1.8e+02 seconds

========================================

Once upon a time, there was a young girl named Sophie who lived in a small village surrounded by rolling hills and dense forests. She was a curious and adventurous child, always eager to explore the world around her.
One day, while wandering through the forest, Sophie stumbled upon a hidden path she had never seen before. The path was overgrown with vines and shrubs, and it looked as though it hadn't been used in years. Sophie's curiosity was piqued, and she decided to investigate further.
As she made her way down the path, the trees grew taller and the air grew thick with the scent of wildflowers. Sophie felt as though she was walking through a secret world, one that few people knew existed. The path twisted and turned, leading her deeper into the forest.
Suddenly, Sophie heard the sound of running water. She followed the sound and soon found herself standing at the edge of a beautiful waterfall. The water cascaded down a rocky cliff, creating a misty veil that surrounded her.
Sophie felt a sense of wonder and awe wash over her. She had never seen anything like it before. She couldn't help but feel a sense of magic in the air, as though the waterfall was a portal to a different world.
As she stood there, taking in
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 2: 35.0481 sec total
Time to first token: 0.3972 sec with parallel prefill.

Total throughput: 7.3043 tokens/sec, 0.1369 s/token
First token throughput: 2.5176 tokens/sec, 0.3972 s/token
Next token throughput: 7.3591 tokens/sec, 0.1359 s/token

Bandwidth achieved: 62.52 GB/s

========================================

Once upon a time, in the rolling hills of Somerset, there was a small village nestled among the picturesque countryside. The village was called Langport, and it was famous for its beautiful riverside setting and its rich history.
One day, a young girl named Emily moved to Langport with her family. She was excited to explore her new surroundings and make some new friends. As she wandered through the village, she discovered a quaint little shop with a sign that read "Curious Goods".
Emily's curiosity was piqued, and she pushed open the door to venture inside. The shop was dimly lit, with shelves upon shelves of strange and wondrous objects. There were vintage dolls, antique clocks, and mysterious boxes with strange symbols etched onto their lids.
The shopkeeper, an elderly man with a kind face and twinkling eyes, welcomed Emily to his store. "Ah, a new face in town," he said with a warm smile. "What brings you to Curious Goods?"
Emily's eyes widened as she scanned the shelves. "I'm not sure," she said. "I just saw the sign and thought it looked interesting."
The shopkeeper chuckled. "Ah, curiosity is a wonderful thing. Let me show you some of my favorite treasures."
As Emily browsed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 3: 34.1833 sec total
Time to first token: 0.3417 sec with parallel prefill.

Total throughput: 7.4890 tokens/sec, 0.1335 s/token
First token throughput: 2.9262 tokens/sec, 0.3417 s/token
Next token throughput: 7.5351 tokens/sec, 0.1327 s/token

Bandwidth achieved: 64.10 GB/s

========================================


Warning: Excluding compile in calculations
Average tokens/sec (total): 7.40
Average tokens/sec (first token): 2.72
Average tokens/sec (next tokens): 7.45

Memory used: 0.00 GB
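The per-iteration metrics and the compile-excluded averages in this log can be cross-checked from the raw timings. Below is a minimal Python sketch that assumes formulas inferred from the logged numbers, not taken from torchchat's source: total throughput as (generated tokens + 1) / total time, first-token throughput as 1 / time to first token, and next-token throughput as generated tokens / (total time - time to first token). Under those assumptions it reproduces the logged values to within rounding of the printed timings, and averaging only iterations 2 and 3 reproduces the 7.40 / 2.72 / 7.45 summary.

```python
"""Cross-check the summary metrics of cpu_compile_8.txt from its raw timings.

Assumption: the throughput formulas below were inferred from the numbers in
this log (they reproduce the printed values); they are not torchchat's code.
"""
from statistics import mean

GENERATED_TOKENS = 255  # "Generated 255 tokens" in every iteration

# (total seconds, time-to-first-token seconds) for iterations 1..3
RUNS = [(176.6127, 0.6387), (35.0481, 0.3972), (34.1833, 0.3417)]


def throughputs(total_s, ttft_s, generated=GENERATED_TOKENS):
    """Return (total, first-token, next-token) throughput in tokens/sec."""
    total = (generated + 1) / total_s     # assumed to count the prefill token too
    first = 1.0 / ttft_s                  # prefill yields exactly one token
    nxt = generated / (total_s - ttft_s)  # decode-only rate after the first token
    return total, first, nxt


per_run = [throughputs(t, f) for t, f in RUNS]

# Iteration 1 includes JIT compilation, so only iterations 2 and 3 are averaged,
# matching "Warning: Excluding compile in calculations".
warm = per_run[1:]
avg_total = mean(r[0] for r in warm)
avg_first = mean(r[1] for r in warm)
avg_next = mean(r[2] for r in warm)

for i, (tot, fst, nxt) in enumerate(per_run, start=1):
    print(f"iteration {i}: total {tot:.4f}, first {fst:.4f}, next {nxt:.4f} tokens/sec")
print(f"averages excluding compile: {avg_total:.2f} / {avg_first:.2f} / {avg_next:.2f}")
```

Leaving iteration 1 out matters: its 176.6 s wall time is dominated by compilation, so including it would pull the average far below the steady-state rate.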

llama31-1218/cpu_compile_b16.txt

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 python3 torchchat.py generate llama3.1 --quantize '{"precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256 --compile --num-samples 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 python3 torchchat.py generate llama3.1 --quantize '{"precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256 --compile --num-samples 3
PyTorch version 2.6.0.dev20241218+cu124 available.
Unabled to import torchao experimental quant_api with error: [Errno 2] No such file or directory: '/home/jackkhuu/oss/torchchat/torchao-build/src/ao/torchao/experimental/quant_api.py'
Using device=cpu Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz
Loading model...
Time to load model: 0.11 seconds
Quantizing the model with: {'precision': {'dtype': 'bfloat16'}, 'executor': {'accelerator': 'cpu'}}
Time to quantize model: 0.01 seconds
-----------------------------------------------------------
Once upon a time, in the ancient kingdom of Aethoria, there lived a young warrior princess named Eira. Eira was known throughout the land for her unmatched bravery, strength, and unwavering dedication to justice. Her people adored her, and her enemies trembled at the mere mention of her name.
Eira's journey began when she was just a child, training in the ways of combat and magic under the watchful eye of her wise and powerful mentor, the sorceress Arachne. Arachne had taken Eira under her wing, recognizing the young princess's innate potential and grooming her to one day take the throne.
As Eira grew in power and wisdom, she became increasingly frustrated with the injustices that plagued her kingdom. Corruption, tyranny, and oppression were rampant, and Eira saw it as her duty to put an end to them. She spent countless hours studying the ancient lore and magical texts, seeking a deeper understanding of the world and the forces that shaped it.
One fateful day, a dark and malevolent force began to spread across the land, threatening to destroy everything Eira held dear. A powerful sorcerer-king, named Malakar, had risen to power, using his mastery of dark magic to enslave the people
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 1: 220.2756 sec total
Time to first token: 0.6765 sec with parallel prefill.

Total throughput: 1.1622 tokens/sec, 0.8605 s/token
First token throughput: 1.4782 tokens/sec, 0.6765 s/token
Next token throughput: 1.1612 tokens/sec, 0.8612 s/token

Bandwidth achieved: 18.67 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches, JIT compilation. ***
just-in-time compilation time (incl run time): 2.2e+02 seconds

========================================

Once upon a time, there lived a rabbit named Rosie. Rosie was different from the other rabbits in the forest. While they were content to spend their days nibbling on carrots and lounging in the sun, Rosie had a passion for adventure.
Rosie had heard of a hidden garden deep within the forest, full of the most beautiful and exotic flowers she had ever seen. She had always dreamed of exploring it, but she was afraid of the unknown dangers that lay within.
One day, Rosie decided that she had had enough of being afraid. She packed a small bag with some carrots, a canteen of water, and a map, and set off on her journey to the hidden garden.
As she wandered through the forest, Rosie encountered all sorts of obstacles. She had to navigate through thick underbrush, cross rushing streams, and climb steep hills. But she persevered, driven by her determination to reach the garden.
Finally, after what seemed like hours of walking, Rosie caught sight of a glimpse of green through the trees. She quickened her pace, her heart racing with excitement.
As she pushed through the final curtain of foliage, Rosie gasped in wonder. Before her lay a garden unlike any she had ever seen. The flowers were more vibrant and exotic than she had imagined,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 2: 81.8519 sec total
Time to first token: 0.4276 sec with parallel prefill.

Total throughput: 3.1276 tokens/sec, 0.3197 s/token
First token throughput: 2.3387 tokens/sec, 0.4276 s/token
Next token throughput: 3.1317 tokens/sec, 0.3193 s/token

Bandwidth achieved: 50.23 GB/s

========================================

Once upon a time, in a land far, far away, there was the most magical city in all the land. It was a place where dreams came true and magic was real. The city was called Everwood, and it was a place of wonder and enchantment.
Everwood was a city of towering trees that seemed to stretch up to the sky, their branches tangled and woven together in a way that seemed almost... magical. The trees were covered in leaves that shimmered and sparkled in the sunlight, and the air was filled with the sweet scent of honey and lavender.
In the heart of Everwood, there was a great and ancient tree, the Heartwood. It was said that the Heartwood was the source of all magic in the city, and that it held the secrets of the ancient ones. The Heartwood was a place of great power and wisdom, and many sought to unlock its secrets.
In Everwood, the inhabitants were a magical people, with the ability to communicate with animals and control the elements. They lived in harmony with the natural world, and their city was a reflection of this harmony. Everwood was a place of beauty and wonder, where anything seemed possible.
But, as magical as Everwood was, it was not without its dangers. There were dark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 3: 83.6049 sec total
Time to first token: 0.4247 sec with parallel prefill.

Total throughput: 3.0620 tokens/sec, 0.3266 s/token
First token throughput: 2.3546 tokens/sec, 0.4247 s/token
Next token throughput: 3.0656 tokens/sec, 0.3262 s/token

Bandwidth achieved: 49.18 GB/s

========================================


Warning: Excluding compile in calculations
Average tokens/sec (total): 3.09
Average tokens/sec (first token): 2.35
Average tokens/sec (next tokens): 3.10

Memory used: 0.00 GB
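The two `--compile` logs shown here differ only in the `"linear:int8": {"groupsize": 0}` entry of the `--quantize` config, so they can be compared directly. The rough Python calculation below uses numbers copied from cpu_compile_8.txt and cpu_compile_b16.txt. It divides "Bandwidth achieved" by total throughput to estimate the gigabytes read per token, assuming the bandwidth figure is computed as model size times tokens/sec. That assumption is inferred only from the ratio staying constant across iterations within each log; it is not confirmed against torchchat's source.

```python
"""Compare the int8 and bfloat16 torch.compile CPU runs from this commit.

Assumption (inferred from the logs, not from torchchat's code):
    Bandwidth achieved [GB/s] = bytes read per token [GB] * total tokens/sec
so bandwidth / throughput recovers the per-token read size.
"""

# (total tokens/sec, bandwidth GB/s) for the warm iterations 2 and 3
LOGS = {
    "int8": [(7.3043, 62.52), (7.4890, 64.10)],   # cpu_compile_8.txt
    "bf16": [(3.1276, 50.23), (3.0620, 49.18)],   # cpu_compile_b16.txt
}

# "Average tokens/sec (next tokens)" as printed in each log's summary
NEXT_TOKEN_AVG = {"int8": 7.45, "bf16": 3.10}

implied_gb = {}
for name, runs in LOGS.items():
    # GB/s divided by tokens/s gives GB read per token under the assumption above
    per_run = [bw / tps for tps, bw in runs]
    implied_gb[name] = sum(per_run) / len(per_run)
    print(f"{name}: ~{implied_gb[name]:.2f} GB read per token")

speedup = NEXT_TOKEN_AVG["int8"] / NEXT_TOKEN_AVG["bf16"]
ratio = implied_gb["bf16"] / implied_gb["int8"]
print(f"int8 next-token speedup over bf16: {speedup:.2f}x")
print(f"bf16/int8 bytes-per-token ratio: {ratio:.2f}")
```

With these inputs the implied per-token read is about 8.56 GB for int8 and 16.06 GB for bfloat16 (a ratio near, but below, 2), while next-token throughput improves by about 2.4x on this machine. These are derived figures from two runs, not separate measurements.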
