This repository was archived by the owner on Sep 10, 2025. It is now read-only.
Commit e5cf6e5
Add the min, max to embedding 4bit
#1506
Please refer to it for more context.
Case:
Exporting the 4-bit embedding model failed with the following command:
python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me4.pte
Solution:
Add the min, max bounds to the 4-bit embedding quantization.
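The fix amounts to supplying explicit minimum and maximum integer bounds to the embedding quantizer. As a minimal sketch (a hypothetical helper, not torchchat's actual code), group-wise asymmetric quantization shows where such bounds enter, assuming an unsigned 0..15 range for 4-bit:

```python
def quantize_group(values, bitwidth=4):
    """Quantize one group of floats to `bitwidth`-bit integers.

    Hypothetical sketch: quant_min/quant_max are the kind of explicit
    "min, max" bounds the commit adds for the 4-bit embedding path.
    """
    quant_min, quant_max = 0, (1 << bitwidth) - 1   # 0..15 for 4-bit
    g_min, g_max = min(values), max(values)
    # Per-group scale and zero-point derived from the group's value range.
    scale = (g_max - g_min) / (quant_max - quant_min) or 1e-8  # avoid div by 0
    zero_point = round(quant_min - g_min / scale)
    # Round, shift, and clamp every value into the legal integer range.
    q = [max(quant_min, min(quant_max, round(v / scale) + zero_point))
         for v in values]
    return q, scale, zero_point
```

Without explicit bounds, the final clamp has nothing to pin the quantized values to the 4-bit range, which is consistent with the 4-bit export failing while 8-bit worked.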
Test:
Both the 8-bit and 4-bit embedding models export successfully with the commands below.
Change torchchat/quant_config/mobile.json to:
{
  "embedding": {"bitwidth": 8, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}
python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me8.pte
Then change torchchat/quant_config/mobile.json to:
{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}
python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me4.pte
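The integer range the quantizer needs follows directly from the "bitwidth" field in these configs. A small hypothetical helper (for illustration only, not part of torchchat) that reads a config of this shape and derives the implied unsigned range:

```python
import json

def embedding_quant_range(config_text):
    """Parse a torchchat-style quant config string and return the
    (quant_min, quant_max, groupsize) implied by its "embedding" entry.

    Hypothetical helper: assumes an unsigned range 0..2**bitwidth - 1.
    """
    cfg = json.loads(config_text)["embedding"]
    bitwidth = cfg["bitwidth"]                # 4 or 8 in the configs above
    return 0, (1 << bitwidth) - 1, cfg["groupsize"]
```

For the 4-bit config above this yields the range (0, 15) with groupsize 32; for the 8-bit config, (0, 255) with groupsize 32.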
Signed-off-by: jijie <[email protected]>
Co-authored-by: jijie <[email protected]>
Parent commit: 4251a54
1 file changed: 1 addition, 1 deletion (line 750 replaced).