Commit ffffb49

doc

1 parent d849b3c commit ffffb49

File tree

14 files changed: +1167 -13 lines changed


.gitattributes

Lines changed: 1 addition & 0 deletions

@@ -1,6 +1,7 @@
 binding/android/PicoLLMTestApp/** linguist-detectable=false
 binding/android/PicoLLM/picollm/src/main/java/ai/picovoice/picollm/dialog/** linguist-detectable=false
 binding/ios/PicoLLMAppTest/** linguist-detectable=false
+binding/nodejs/** linguist-detectable=false
 binding/web/cypress/** linguist-detectable=false
 binding/web/scripts/** linguist-detectable=false
 binding/web/src/picollm.ts linguist-detectable=false

README.md

Lines changed: 52 additions & 4 deletions

@@ -15,7 +15,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Linux (x86_64), macOS (arm64, x86_64), and Windows (x86_64)
@@ -29,6 +29,15 @@ models. picoLLM Inference Engine is:
 
 - [picoLLM](#picollm-inference-engine)
   - [Table of Contents](#table-of-contents)
+  - [Showcases](#showcases)
+    - [Raspberry Pi](#raspberry-pi)
+    - [Android](#android)
+    - [iOS](#ios)
+    - [Cross-Browser Local LLM](#cross-browser-local-llm)
+    - [Llama-3-70B-Instruct on GeForce RTX 4090](#llama-3-70b-instruct-on-geforce-rtx-4090)
+    - [Local LLM-Powered Voice Assistant on Raspberry Pi](#local-llm-powered-voice-assistant-on-raspberry-pi)
+    - [Local Llama-3-8B-Instruct Voice Assistant on CPU](#local-llama-3-8b-instruct-voice-assistant-on-cpu)
+  - [Accuracy](#accuracy)
   - [Models](#models)
   - [AccessKey](#accesskey)
   - [Demos](#demos)
@@ -48,6 +57,45 @@ models. picoLLM Inference Engine is:
   - [Releases](#releases)
   - [FAQ](#faq)
 
+## Showcases
+
+### Raspberry Pi
+
+[![Local LLM on Raspberry Pi](https://img.youtube.com/vi/CeKPXZ_8hkI/0.jpg)](https://www.youtube.com/watch?v=CeKPXZ_8hkI)
+
+### Android
+
+[![How to Run a Local LLM on Android](https://img.youtube.com/vi/XeUMkue-5lI/0.jpg)](https://www.youtube.com/watch?v=XeUMkue-5lI)
+
+### iOS
+
+[![How to Run a Local LLM on iOS](https://img.youtube.com/vi/dNK5esdkI0Y/0.jpg)](https://www.youtube.com/watch?v=dNK5esdkI0Y)
+
+### Cross-Browser Local LLM
+
+[Live Demo — Works offline!](https://picovoice.ai/picollm/)
+
+### Llama-3-70B-Instruct on GeForce RTX 4090
+
+[![Llama-3-70B-Instruct on GeForce RTX 4090](https://img.youtube.com/vi/4mcVwbOOIqk/0.jpg)](https://www.youtube.com/watch?v=4mcVwbOOIqk)
+
+### Local LLM-Powered Voice Assistant on Raspberry Pi
+
+[![Local LLM-Powered Voice Assistant on Raspberry Pi](https://img.youtube.com/vi/GEndT3RGRvw/0.jpg)](https://www.youtube.com/watch?v=GEndT3RGRvw)
+
+### Local Llama-3-8B-Instruct Voice Assistant on CPU
+
+[![Local Llama-3-8B-Instruct Voice Assistant on CPU](https://img.youtube.com/vi/uV0GlXDFSPw/0.jpg)](https://www.youtube.com/watch?v=uV0GlXDFSPw)
+
+## Accuracy
+
+picoLLM Compression is a novel large language model (LLM) quantization algorithm developed within Picovoice. Given a task-specific cost function, picoLLM Compression automatically learns the optimal bit allocation strategy across and within an LLM's weights. Existing techniques require a fixed bit allocation scheme, which is subpar.
+
+For example, picoLLM Compression recovers the MMLU score degradation of the widely adopted GPTQ by 91%, 99%, and 100% at 2-, 3-,
+and 4-bit settings. The figure below depicts the MMLU comparison between picoLLM and GPTQ for Llama-3-8B [[1]](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
+
+![picoLLM Compression vs GPTQ MMLU scores when applied to Llama-3-8B](./resources/mmlu-llama-3-8b.svg)
+
 ## Models
 
 picoLLM Inference Engine supports the following open-weight models. The models are on
@@ -126,13 +174,13 @@ picollm-completion-demo --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --
 Replace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file
 downloaded from Picovoice Console, and `${PROMPT}` with a prompt string.
 
-For more information about Node.js demos go to [demo/nodejs](./demo/nodejs).
+For more information about Node.js demos go to [Node.js demo](./demo/nodejs).
 
 ### Android Demos
 
-Using Android Studio, open the [Completion demo](demo/android/Completion/) as an Android project, copy your AccessKey into MainActivity.java, and run the application.
+Using Android Studio, open the [Completion demo](demo/android/Completion) as an Android project, copy your AccessKey into MainActivity.java, and run the application.
 
-To learn about how to use picoLLM in a chat application, try out the [Chat demo](demo/android/Chat/).
+To learn about how to use picoLLM in a chat application, try out the [Chat demo](demo/android/Chat).
 
 For more information about Android demos go to [demo/android](demo/android/README.md).
 
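The Accuracy section added in this commit describes learning a per-group bit allocation from a task-specific cost function instead of using one fixed bit-width. As a toy illustration only — the function names, the exponential error model, and the greedy search below are all assumptions for exposition, not picoLLM Compression's actual (unpublished) method — a budgeted, sensitivity-driven allocation might be sketched like this:

```python
# Toy sketch of non-uniform bit allocation under a total bit budget.
# HYPOTHETICAL: picoLLM Compression's real cost function and search
# procedure are not public; this only illustrates the general idea.

def allocate_bits(sensitivity, total_bits, min_bits=2, max_bits=8):
    """Greedily give extra bits to the most sensitive weight groups.

    sensitivity: per-group cost of quantization error (higher = more sensitive)
    total_bits:  bit budget summed over all groups
    """
    n = len(sensitivity)
    bits = [min_bits] * n                  # start every group at the floor
    remaining = total_bits - min_bits * n

    # Assumed error model: each extra bit halves a group's error,
    # weighted by how sensitive the task cost is to that group.
    def gain(i):
        return sensitivity[i] * 2.0 ** (-bits[i])

    while remaining > 0:
        candidates = [j for j in range(n) if bits[j] < max_bits]
        if not candidates:
            break
        i = max(candidates, key=gain)      # spend the next bit where it helps most
        bits[i] += 1
        remaining -= 1
    return bits

# A sensitive group ends up with more bits than robust ones,
# while the total still meets the budget.
print(allocate_bits([8.0, 1.0, 1.0, 1.0], total_bits=14))
```

The point of the sketch is the contrast the README draws: a fixed scheme would assign every group the same width (here, 3.5 bits on average is impossible uniformly), whereas a cost-driven allocation spends the budget unevenly.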

binding/android/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

binding/ios/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

binding/nodejs/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

binding/python/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

binding/web/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

demo/android/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

demo/ios/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU

demo/nodejs/README.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
 models. picoLLM Inference Engine is:
 
-- Accurate; picoLLM Compression improves GPTQ by up to 98%.
+- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)
 - Private; LLM inference runs 100% locally.
 - Cross-Platform
   - Runs on CPU and GPU
