This project demonstrates how to pull an LLM with Ollama and serve it as a chat API using FastAPI, all inside a Google Colab environment.
- 🦙 Ollama (for running local LLMs)
- ⚡ FastAPI (for exposing the chat interface as an API)
- 🚀 Uvicorn (ASGI server to run FastAPI)
- 🧪 Google Colab (as the environment to run everything)
- 🧠 LangChain, LangGraph
✅ Pull and run an LLM (like llama2, mistral, or phi) using Ollama
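In a Colab cell, the setup might look like the sketch below. It uses Ollama's official install script and default CLI commands; the model name (`llama2`) and the 5-second startup wait are illustrative choices, not requirements.

```shell
# Install Ollama via its official install script (Linux/Colab shell)
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server in the background and give it a moment to come up
nohup ollama serve > ollama.log 2>&1 &
sleep 5

# Pull the model you want to serve (swap in mistral, phi, etc.)
ollama pull llama2
```

In Colab, prefix each line with `!` (or put the whole block in a `%%bash` cell).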
✅ Expose the model via a simple FastAPI endpoint
✅ Send chat messages and get model-generated responses
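Once the API is up, a client only needs to POST a small JSON body. The field names (`model`, `message`) and the endpoint URL below match the sketch's assumed schema, not a fixed contract.

```python
import json

# Example request body for the chat endpoint
payload = {"model": "llama2", "message": "Hello, who are you?"}
body = json.dumps(payload)
print(body)

# With the server running you would send it like this:
# import requests
# r = requests.post("http://localhost:8000/chat", json=payload)
# print(r.json()["reply"])
```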
✅ Designed to run entirely on Google Colab for quick testing