A .NET Aspire application that imports your entire ChatGPT conversation history and makes it available as RAG (Retrieval-Augmented Generation) memory for any LLM.
Enable users to import their entire ChatGPT conversation history into a format that can be used as RAG memory for any Large Language Model. This allows users to leverage their past interactions with ChatGPT to enhance responses from other LLMs.
A fully local .NET Aspire application consisting of:
- Blazor web frontend — upload UI and chat UI
- ASP.NET Core API — parsing, background processing, RAG pipeline
- MongoDB — stores full conversation data and metadata
- Qdrant — stores embeddings for semantic search
- LLM — config-driven: Ollama, Foundry Local, or Azure OpenAI
Prerequisites:

- .NET 10 SDK
- Docker Desktop (for MongoDB and Qdrant containers)
- An LLM provider (one of):
  - Ollama running locally (default)
  - Foundry Local running locally
  - Azure OpenAI (cloud, requires subscription and API key)
- Node.js (for the Tailwind CSS build during development)
To get started:

1. Clone the repository:

   ```shell
   git clone https://github.com/matt-goldman/MattGPT.git
   cd MattGPT
   ```

2. Install npm dependencies (needed for the CSS build):

   ```shell
   cd src/MattGPT.Web
   npm install
   cd ../..
   ```

3. Configure your LLM provider (see LLM Configuration).

4. Start the application:

   ```shell
   cd src/MattGPT.AppHost
   dotnet run
   ```

   Aspire will start MongoDB, Qdrant, the API service, and the web frontend automatically. The Aspire dashboard URL is printed to the console — open it to monitor all services.

5. Open the web UI. The frontend URL is also printed on startup (e.g. `https://localhost:7xxx`); open it in your browser.
LLM settings are in `src/MattGPT.ApiService/appsettings.json` under the `LLM` section:

```json
{
  "LLM": {
    "Provider": "Ollama",
    "ModelId": "llama3.2",
    "EmbeddingModelId": "nomic-embed-text",
    "Endpoint": "http://localhost:11434"
  }
}
```

| Setting | Description |
|---|---|
| `Provider` | LLM backend: `Ollama`, `FoundryLocal`, or `AzureOpenAI` |
| `ModelId` | Chat model name (e.g. `llama3.2` for Ollama, deployment name for Azure) |
| `EmbeddingModelId` | Embedding model name. Defaults to `ModelId` if omitted |
| `Endpoint` | Base URL of the LLM API |
| `ApiKey` | API key (required for `AzureOpenAI`; optional for `FoundryLocal`) |
Ollama example (default):

```json
{
  "LLM": {
    "Provider": "Ollama",
    "ModelId": "llama3.2",
    "EmbeddingModelId": "nomic-embed-text",
    "Endpoint": "http://localhost:11434"
  }
}
```

Ensure the required models are pulled before starting:

```shell
ollama pull llama3.2
ollama pull nomic-embed-text
```

Foundry Local example:

```json
{
  "LLM": {
    "Provider": "FoundryLocal",
    "ModelId": "phi-3.5-mini",
    "EmbeddingModelId": "phi-3.5-mini",
    "Endpoint": "http://localhost:5273/v1"
  }
}
```

Azure OpenAI example:

```json
{
  "LLM": {
    "Provider": "AzureOpenAI",
    "ModelId": "gpt-4o",
    "EmbeddingModelId": "text-embedding-3-small",
    "Endpoint": "https://YOUR_RESOURCE.openai.azure.com/",
    "ApiKey": "YOUR_API_KEY"
  }
}
```

To export your ChatGPT history:

- In ChatGPT, go to Settings → Data controls → Export data.
- You will receive an email with a download link. Download and extract the ZIP.
- Locate `conversations.json` inside the extracted folder. This is the file to upload.
- The file must be named `conversations.json` (or be any `.json` file) and follow the ChatGPT export schema.
- Maximum file size: 200 MB (a typical full export for a large history is ~148 MB).
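As an illustration of the constraints above, a pre-upload sanity check might look like the following Python sketch. It assumes the export is a top-level JSON array of conversation objects, each carrying a `mapping` of message nodes (the shape ChatGPT exports used at the time of writing); `precheck_export` is a hypothetical helper, not part of MattGPT.

```python
import json
import os

MAX_BYTES = 200 * 1024 * 1024  # the 200 MB upload limit described above

def precheck_export(path):
    """Sanity-check a conversations.json before uploading.

    Assumes the ChatGPT export shape: a JSON array of conversation
    objects, each with a 'mapping' of message nodes. Returns
    (ok, message).
    """
    if os.path.getsize(path) > MAX_BYTES:
        return False, "file exceeds the 200 MB upload limit"
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        return False, "expected a top-level JSON array of conversations"
    missing = sum(1 for c in data if "mapping" not in c)
    if missing:
        return False, f"{missing} entries have no 'mapping' (wrong file?)"
    return True, f"{len(data)} conversation(s) look importable"
```

Running this against the extracted `conversations.json` (rather than the ZIP) catches the most common upload failures early.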
To upload your export:

1. Navigate to the Upload page from the nav bar.
2. Select your `conversations.json` file.
3. Click Upload & Process.
4. The UI shows upload progress, then switches to processing status.
5. Processing runs in the background. The UI polls for progress and shows the number of conversations processed.
6. When complete, a success message is shown.
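The polling in step 5 amounts to a simple loop. An illustrative Python sketch (`get_status` stands in for the API's progress endpoint; the actual route and response shape are not shown here):

```python
import time

def poll_progress(get_status, interval=0.0, max_polls=100):
    """Poll a status source until processing completes.

    `get_status` is assumed to return a dict like
    {"processed": n, "total": m, "done": bool}.
    """
    for _ in range(max_polls):
        status = get_status()
        if status["done"]:
            return status        # final status, shown as the success message
        time.sleep(interval)     # wait before the next poll
    raise TimeoutError("processing did not finish within max_polls polls")
```

In the real UI the interval is non-zero and the processed/total counts drive the progress display.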
The background pipeline performs the following steps automatically:
- Parse — the JSON is parsed into structured conversations.
- Store — each conversation is stored in MongoDB.
- Summarise — each conversation is summarised using the configured LLM.
- Embed — each summary is converted to a vector embedding.
- Index — embeddings are stored in Qdrant for semantic search.
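The staged flow above can be sketched as follows (illustrative Python; `store`, `summarise`, `embed`, and `index` stand in for the MongoDB write, the LLM summarisation call, the embedding call, and the Qdrant upsert; this mirrors the pipeline's states but is not the service's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    id: str
    text: str
    summary: str = ""
    embedding: list = field(default_factory=list)
    state: str = "Parsed"  # Parsed -> Stored -> Summarised -> Embedded -> Indexed

def run_pipeline(conversations, store, summarise, embed, index):
    """Drive each parsed conversation through the remaining stages in order."""
    for conv in conversations:
        store(conv)                           # persist the full conversation
        conv.state = "Stored"
        conv.summary = summarise(conv.text)   # summarise via the configured LLM
        conv.state = "Summarised"
        conv.embedding = embed(conv.summary)  # embed the summary, not the raw text
        conv.state = "Embedded"
        index(conv)                           # upsert the vector for semantic search
        conv.state = "Indexed"
    return conversations
```

Embedding the summary rather than the raw conversation keeps vectors short and focused, which is why the summarise stage runs first.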
To chat with your history:

1. Navigate to the Chat page from the nav bar.
2. Type a question in the input box and press Enter or click Send.
3. The system embeds your query, retrieves the most semantically similar past conversations, and sends them as context to the LLM.
4. The LLM response is displayed in the chat window.
5. Below each response, click "N source(s) used" to expand the list of retrieved conversations that informed the response, including their titles and relevance scores.
6. Continue the conversation — each new message is processed independently with fresh RAG retrieval.
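A single chat turn can be sketched like this (illustrative Python; `embed`, `search`, and `chat` stand in for the embedding model, the Qdrant similarity search, and the chat model; the prompt format the API actually uses is not shown here):

```python
def answer_with_rag(question, embed, search, chat, top_k=5, min_score=0.5):
    """One chat turn: embed the query, retrieve similar past
    conversations, and prepend them as context for the LLM."""
    query_vec = embed(question)
    # Keep only results above the similarity threshold
    hits = [h for h in search(query_vec, top_k) if h["score"] >= min_score]
    context = "\n\n".join(f"[{h['title']}] {h['summary']}" for h in hits)
    prompt = f"Relevant past conversations:\n{context}\n\nQuestion: {question}"
    # hits double as the "N source(s) used" list shown under the response
    return chat(prompt), hits
```

Because each turn calls this independently, every message gets fresh retrieval rather than reusing the previous turn's context.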
To switch LLM providers, update `appsettings.json` in `MattGPT.ApiService` (see LLM Configuration) and restart the API service. No data migration is required — embeddings are already stored in Qdrant.

Note: if you change the embedding model, existing embeddings will be incompatible with new ones. Re-run the embedding pipeline via `POST /conversations/embed` on the API, or re-import your conversations.
The RAG pipeline is controlled by the `RAG` section in `appsettings.json`:

```json
{
  "RAG": {
    "Mode": "Auto",
    "TopK": 5,
    "MinScore": 0.5,
    "AutoTopK": 2,
    "AutoMinScore": 0.65,
    "ToolMaxResults": 5
  }
}
```

| Mode | Behaviour |
|---|---|
| `WithPrompt` | Full automatic RAG injection on every message. No tools registered. Best for models that don't support tool calling (e.g. `llama3.2` 3B). |
| `Auto` (default) | Light auto-RAG (uses `AutoTopK`/`AutoMinScore`) plus a `search_memories` tool the LLM can call for deeper retrieval. Best for tool-capable models (e.g. `llama3.1` 8B+, GPT-4o). |
| `ToolsOnly` | No automatic RAG injection. The LLM must explicitly call the `search_memories` tool to retrieve context. Best for high-capability models where you want minimal context waste. |

| Setting | Description |
|---|---|
| `Mode` | RAG mode (see above). Default: `Auto` |
| `TopK` | Number of conversations retrieved per query in `WithPrompt` mode (default: 5) |
| `MinScore` | Minimum cosine similarity (0.0–1.0) in `WithPrompt` mode (default: 0.5) |
| `AutoTopK` | Conversations retrieved in the light auto-pass of `Auto` mode (default: 2) |
| `AutoMinScore` | Minimum similarity for the `Auto` mode light pass (default: 0.65) |
| `ToolMaxResults` | Maximum results the `search_memories` tool returns per invocation (default: 5) |
Increase `TopK`/`AutoTopK` for richer context. Lower `MinScore`/`AutoMinScore` to include less similar results (which may add noise); raise them to require higher relevance.
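To make the knobs concrete, here is an illustrative Python sketch of how the mode selects retrieval parameters and how the score threshold and result cap filter search hits. Names follow the `appsettings.json` keys; this is a sketch of the documented behaviour, not the service's actual implementation.

```python
def retrieval_params(mode, cfg):
    """Pick parameters for the automatic RAG pass, per the mode table above."""
    if mode == "WithPrompt":
        return cfg["TopK"], cfg["MinScore"]          # full injection every message
    if mode == "Auto":
        return cfg["AutoTopK"], cfg["AutoMinScore"]  # light pass; tool does the rest
    return 0, None                                   # ToolsOnly: no automatic pass

def filter_hits(hits, top_k, min_score):
    """Apply the minimum-similarity threshold, then keep the top_k best hits."""
    if not top_k:
        return []
    kept = [h for h in hits if min_score is None or h["score"] >= min_score]
    return sorted(kept, key=lambda h: h["score"], reverse=True)[:top_k]
```

With the defaults, `Auto` mode's light pass admits at most 2 conversations and only those scoring at least 0.65, which is why lowering `AutoMinScore` surfaces more (but noisier) context.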
- Ensure Docker Desktop is running before starting the Aspire AppHost.
- Check the Aspire dashboard for container error logs.
- Navigate to `/llm/status` on the API service to check connectivity.
- For Ollama, ensure the service is running (`ollama serve`) and the required models are pulled.
- For Foundry Local, ensure the local server is started.
- Check that `Endpoint` in `appsettings.json` matches the actual running address.
- Verify the file is a valid `.json` file (not the full ZIP export — extract it first).
- Files larger than 200 MB are not accepted. Split or trim the export if necessary.
- Check API service logs in the Aspire dashboard for parsing errors.
- Ensure the full pipeline has completed: upload → summarise → embed → index.
- If you imported conversations but have no embeddings, trigger the pipeline manually via the API: `POST /conversations/summarise`, then `POST /conversations/embed`.
- Lower the `MinScore` threshold if results are being filtered out.
- Check that the embedding model is compatible with your LLM provider configuration.
- If you change `EmbeddingModelId`, existing embeddings were generated by a different model and are no longer comparable.
- Re-embed by calling `POST /conversations/embed` (the service re-embeds any conversation in the `Summarised` state).
- If needed, re-import conversations to reset processing state.
MattGPT runs LLM inference locally by default (Ollama). Performance varies dramatically depending on hardware:
- GPU acceleration is strongly recommended. On Windows with a CUDA-capable GPU, the AppHost enables GPU passthrough for the Ollama container via `.WithGPUSupport()`. This makes summarisation, embedding, and chat responses significantly faster.
- CPU-only inference is very slow. On macOS (or any machine without a supported GPU), expect long processing times — especially for the summarisation and embedding pipeline across thousands of conversations. The HTTP client timeout is set to 10 minutes per request to accommodate this, but individual operations may still feel sluggish.
- Model choice matters. Smaller models (e.g. `llama3.2` 3B, `nomic-embed-text`) are much faster than larger ones. If you're experimenting or running on limited hardware, start with the defaults.
- Cloud providers are an alternative. If local performance is unacceptable, switch to `AzureOpenAI` in the LLM configuration to offload inference to the cloud.
You may need to tweak the configuration for your specific hardware. The defaults are tuned for a Windows machine with an NVIDIA GPU.
Planning and issue tracking lives in the docs/ folder — docs/index.md is the system of record. This file-based backlog exists so that AI coding agents (both online and offline) can pick up work autonomously. Completed issues are archived in docs/Done/ with full context of what was built and why.
If you'd like to suggest a feature or report a bug, please open a GitHub Issue. Approved items will be promoted into the docs backlog for implementation.
- Runtime configuration wizard — a guided setup experience so new users can configure the LLM provider and model without editing config files (see issue #14).
- Support for additional vector databases (Pinecone, Weaviate) and LLM endpoints.
- Advanced parsing: sentiment analysis, topic modelling, entity extraction.
- Import of other file types (images, PDFs) shared in conversations.
- Integration with LM Studio, OpenWebUI, and other LLM tools.
- Automatic project reconstruction in other LLMs from imported history.
