A compact, quantized chat model file in GGUF format.
- tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf — 1.1B-parameter chat model quantized to Q4_K_M for reduced size and faster inference.
- Download the GGUF file to your model directory.
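  For example, if the file is published on Hugging Face (the repo path below is illustrative, not taken from this README):

  ```sh
  # Fetch the quantized model into ./models (repo path is an example)
  huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
    tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./models
  ```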
- Load it with a compatible runtime, for example llama.cpp or another ggml-based runtime:

  ```sh
  ./main -m ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p "Hello, how are you?"
  ```
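  Flag names vary between llama.cpp versions, and recent builds install the CLI as `llama-cli` rather than `main`. A typical invocation with a few common options (values are illustrative, not prescribed by this README):

  ```sh
  # -n: max tokens to generate, -c: context window size, -t: CPU threads
  ./main -m ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    -p "Hello, how are you?" -n 256 -c 2048 -t 4 --color
  ```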
- The quantized format trades some precision for a smaller file and faster inference, which makes it well suited to lightweight local inference and experimentation.
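  As a rough size check (the bits-per-weight figure is an approximation, not from this README): Q4_K_M averages around 4.85 bits per weight, so 1.1B parameters come to roughly 1.1e9 × 4.85 / 8 ≈ 0.67 GB, versus about 2.2 GB for the same weights in FP16.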
- Ensure your inference tool supports GGUF and the Q4_K_M quantization type.
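  One way to sanity-check the file and its quantization metadata is the `gguf` Python package's dump script (a sketch; the package and tool are assumptions about tooling outside this README):

  ```sh
  # Install the GGUF tooling and print the file's metadata,
  # including each tensor's quantization type
  pip install gguf
  gguf-dump ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
  ```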
- No license or training-data details are included here; check the upstream source for licensing and provenance.
For issues or questions, open an issue on this repository.