A lightweight CLI chat tool that lets you swap between LLMs while keeping a short-term memory of the conversation.
I built this because most LLM APIs are stateless: they don't remember anything between messages unless you manually include the full conversation history in every request. I wanted something lightweight that simulates memory across model calls and lets me switch between different LLMs mid-conversation without losing context. The tool keeps a running memory of recent messages and feeds them into each API call, so the model can respond naturally, like a real conversation.
- Stores all messages in `chat_history.json`.
- Replays the last N turns to whichever model you choose, so each model sees the same context.
- Lets you switch models by typing a letter (A-C).
- Generates responses using Groq's Chat Completions API under the hood.
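The memory mechanism above can be sketched roughly like this. The file name `chat_history.json` comes from the README; the function names and the choice of N are illustrative assumptions, not the project's actual code:

```python
import json
import os

HISTORY_FILE = "chat_history.json"  # persistent store named in the README

def load_history(path=HISTORY_FILE):
    """Return the stored message list, or an empty list on first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return []

def save_history(history, path=HISTORY_FILE):
    """Write the full message list back to disk after each exchange."""
    with open(path, "w") as f:
        json.dump(history, f, indent=2)

def last_n_turns(history, n=5):
    """Keep only the most recent n exchanges (one user + one assistant
    message per turn), which is the context replayed to the model."""
    return history[-2 * n:]
```

Because every model receives the same trimmed `messages` list, switching models mid-conversation does not lose context.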
| Key | Model ID |
|---|---|
| A | gemma2-9b-it |
| B | llama-3.3-70b-versatile |
| C | llama3-8b-8192 |
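The key-to-model switch can be as simple as a lookup table built from the menu above. The `MODELS` dict mirrors the table; `resolve_model` and its fallback behavior are an illustrative sketch, not necessarily how `main.py` does it:

```python
# Model menu from the table above; keys are what the user types at the prompt.
MODELS = {
    "A": "gemma2-9b-it",
    "B": "llama-3.3-70b-versatile",
    "C": "llama3-8b-8192",
}

def resolve_model(key, default="A"):
    """Map a typed letter to a Groq model ID (case-insensitive,
    falling back to the default on unknown input)."""
    return MODELS.get(key.strip().upper(), MODELS[default])

# With a model resolved, the Groq call looks roughly like this
# (requires GROQ_API_KEY in the environment):
#
#   from groq import Groq
#   client = Groq()
#   resp = client.chat.completions.create(
#       model=resolve_model(choice),
#       messages=history,  # the replayed recent turns
#   )
#   print(resp.choices[0].message.content)
```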
```bash
pip install requests groq
export GROQ_API_KEY="your-real-groq-key"
python main.py
```

Made by Yash Thapliyal 2025