A .NET Aspire application that imports your entire ChatGPT conversation history and makes it available as RAG (Retrieval-Augmented Generation) memory for any LLM.
Enable users to import their entire ChatGPT conversation history into a format that can be used as RAG memory for any Large Language Model. This allows users to leverage their past interactions with ChatGPT to enhance responses from other LLMs.
A fully local .NET Aspire application consisting of:
- Blazor web frontend — upload UI and chat UI
- ASP.NET Core API — parsing, background processing, RAG pipeline
- MongoDB — stores full conversation data and metadata
- Qdrant — stores embeddings for semantic search
- LLM — config-driven: Ollama, Foundry Local, or Azure OpenAI
Prerequisites:

- .NET 10 SDK
- Docker Desktop (for MongoDB and Qdrant containers)
- An LLM provider (one of):
  - Ollama running locally (default)
  - Foundry Local running locally
  - Azure OpenAI (cloud, requires subscription and API key)
- Node.js (for the Tailwind CSS build during development)
To get started:

1. Clone the repository:

   ```shell
   git clone https://github.com/matt-goldman/MattGPT.git
   cd MattGPT
   ```

2. Install npm dependencies (needed for the CSS build):

   ```shell
   cd src/MattGPT.Web
   npm install
   cd ../..
   ```

3. Configure your LLM provider (see LLM Configuration).

4. Start the application:

   ```shell
   cd src/MattGPT.AppHost
   dotnet run
   ```

   Aspire will start MongoDB, Qdrant, the API service, and the web frontend automatically. The Aspire dashboard URL is printed to the console — open it to monitor all services.

5. Open the web UI. The frontend URL is also printed on startup (e.g. `https://localhost:7xxx`); open it in your browser.
LLM settings are in `src/MattGPT.ApiService/appsettings.json` under the `LLM` section:

```json
{
  "LLM": {
    "Provider": "Ollama",
    "ModelId": "llama3.2",
    "EmbeddingModelId": "nomic-embed-text",
    "Endpoint": "http://localhost:11434"
  }
}
```

| Setting | Description |
|---|---|
| `Provider` | LLM backend: `Ollama`, `FoundryLocal`, or `AzureOpenAI` |
| `ModelId` | Chat model name (e.g. `llama3.2` for Ollama, deployment name for Azure) |
| `EmbeddingModelId` | Embedding model name. Defaults to `ModelId` if omitted |
| `Endpoint` | Base URL of the LLM API |
| `ApiKey` | API key (required for `AzureOpenAI`; optional for `FoundryLocal`) |
Ollama example (default):

```json
{
  "LLM": {
    "Provider": "Ollama",
    "ModelId": "llama3.2",
    "EmbeddingModelId": "nomic-embed-text",
    "Endpoint": "http://localhost:11434"
  }
}
```

Ensure the required models are pulled before starting:

```shell
ollama pull llama3.2
ollama pull nomic-embed-text
```

Foundry Local example:

```json
{
  "LLM": {
    "Provider": "FoundryLocal",
    "ModelId": "phi-3.5-mini",
    "EmbeddingModelId": "phi-3.5-mini",
    "Endpoint": "http://localhost:5273/v1"
  }
}
```

Azure OpenAI example:

```json
{
  "LLM": {
    "Provider": "AzureOpenAI",
    "ModelId": "gpt-4o",
    "EmbeddingModelId": "text-embedding-3-small",
    "Endpoint": "https://YOUR_RESOURCE.openai.azure.com/",
    "ApiKey": "YOUR_API_KEY"
  }
}
```

To export your ChatGPT history:

- In ChatGPT, go to Settings → Data controls → Export data.
- You will receive an email with a download link. Download and extract the ZIP.
- Locate `conversations.json` inside the extracted folder. This is the file to upload.
- The file must be named `conversations.json` (or be any `.json` file) and follow the ChatGPT export schema.
- Maximum file size: 200 MB (a typical full export for a large history is ~148 MB).
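As an illustration of the constraints above, a pre-upload sanity check might look like the following Python sketch. It assumes the export is a top-level JSON array of conversation objects, each carrying a `mapping` of message nodes (the shape ChatGPT exports used at the time of writing); `precheck_export` is a hypothetical helper, not part of MattGPT.

```python
import json
import os

MAX_BYTES = 200 * 1024 * 1024  # the 200 MB upload limit described above

def precheck_export(path):
    """Sanity-check a conversations.json before uploading.

    Assumes the ChatGPT export shape: a JSON array of conversation
    objects, each with a 'mapping' of message nodes. Returns
    (ok, message).
    """
    if os.path.getsize(path) > MAX_BYTES:
        return False, "file exceeds the 200 MB upload limit"
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        return False, "expected a top-level JSON array of conversations"
    missing = sum(1 for c in data if "mapping" not in c)
    if missing:
        return False, f"{missing} entries have no 'mapping' (wrong file?)"
    return True, f"{len(data)} conversation(s) look importable"
```

Running this against the extracted `conversations.json` (rather than the ZIP) catches the most common upload failures early.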
To upload your export:

1. Navigate to the Upload page from the nav bar.
2. Select your `conversations.json` file.
3. Click Upload & Process.
4. The UI shows upload progress, then switches to processing status.
5. Processing runs in the background. The UI polls for progress and shows the number of conversations processed.
6. When complete, a success message is shown.
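The polling in step 5 amounts to a simple loop. An illustrative Python sketch (`get_status` stands in for the API's progress endpoint; the actual route and response shape are not shown here):

```python
import time

def poll_progress(get_status, interval=0.0, max_polls=100):
    """Poll a status source until processing completes.

    `get_status` is assumed to return a dict like
    {"processed": n, "total": m, "done": bool}.
    """
    for _ in range(max_polls):
        status = get_status()
        if status["done"]:
            return status        # final status, shown as the success message
        time.sleep(interval)     # wait before the next poll
    raise TimeoutError("processing did not finish within max_polls polls")
```

In the real UI the interval is non-zero and the processed/total counts drive the progress display.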
The background pipeline performs the following steps automatically:
- Parse — the JSON is parsed into structured conversations.
- Store — each conversation is stored in MongoDB.
- Summarise — each conversation is summarised using the configured LLM.
- Embed — each summary is converted to a vector embedding.
- Index — embeddings are stored in Qdrant for semantic search.
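The staged flow above can be sketched as follows (illustrative Python; `store`, `summarise`, `embed`, and `index` stand in for the MongoDB write, the LLM summarisation call, the embedding call, and the Qdrant upsert; this mirrors the pipeline's states but is not the service's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    id: str
    text: str
    summary: str = ""
    embedding: list = field(default_factory=list)
    state: str = "Parsed"  # Parsed -> Stored -> Summarised -> Embedded -> Indexed

def run_pipeline(conversations, store, summarise, embed, index):
    """Drive each parsed conversation through the remaining stages in order."""
    for conv in conversations:
        store(conv)                           # persist the full conversation
        conv.state = "Stored"
        conv.summary = summarise(conv.text)   # summarise via the configured LLM
        conv.state = "Summarised"
        conv.embedding = embed(conv.summary)  # embed the summary, not the raw text
        conv.state = "Embedded"
        index(conv)                           # upsert the vector for semantic search
        conv.state = "Indexed"
    return conversations
```

Embedding the summary rather than the raw conversation keeps vectors short and focused, which is why the summarise stage runs first.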
To chat with your history:

1. Navigate to the Chat page from the nav bar.
2. Type a question in the input box and press Enter or click Send.
3. The system embeds your query, retrieves the most semantically similar past conversations, and sends them as context to the LLM.
4. The LLM response is displayed in the chat window.
5. Below each response, click "N source(s) used" to expand the list of retrieved conversations that informed the response, including their titles and relevance scores.
6. Continue the conversation — each new message is processed independently with fresh RAG retrieval.
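A single chat turn can be sketched like this (illustrative Python; `embed`, `search`, and `chat` stand in for the embedding model, the Qdrant similarity search, and the chat model; the prompt format the API actually uses is not shown here):

```python
def answer_with_rag(question, embed, search, chat, top_k=5, min_score=0.5):
    """One chat turn: embed the query, retrieve similar past
    conversations, and prepend them as context for the LLM."""
    query_vec = embed(question)
    # Keep only results above the similarity threshold
    hits = [h for h in search(query_vec, top_k) if h["score"] >= min_score]
    context = "\n\n".join(f"[{h['title']}] {h['summary']}" for h in hits)
    prompt = f"Relevant past conversations:\n{context}\n\nQuestion: {question}"
    # hits double as the "N source(s) used" list shown under the response
    return chat(prompt), hits
```

Because each turn calls this independently, every message gets fresh retrieval rather than reusing the previous turn's context.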
To switch LLM providers, update `appsettings.json` in `MattGPT.ApiService` (see LLM Configuration) and restart the API service. No data migration is required — embeddings are already stored in Qdrant.

Note: if you change the embedding model, existing embeddings will be incompatible with new ones. Re-run the embedding pipeline via `POST /conversations/embed` on the API, or re-import your conversations.
The RAG pipeline is controlled by the `RAG` section in `appsettings.json`:

```json
{
  "RAG": {
    "Mode": "Auto",
    "TopK": 5,
    "MinScore": 0.5,
    "AutoTopK": 2,
    "AutoMinScore": 0.65,
    "ToolMaxResults": 5
  }
}
```

| Mode | Behaviour |
|---|---|
| `WithPrompt` | Full automatic RAG injection on every message. No tools registered. Best for models that don't support tool calling (e.g. `llama3.2` 3B). |
| `Auto` (default) | Light auto-RAG (uses `AutoTopK`/`AutoMinScore`) plus a `search_memories` tool the LLM can call for deeper retrieval. Best for tool-capable models (e.g. `llama3.1` 8B+, GPT-4o). |
| `ToolsOnly` | No automatic RAG injection. The LLM must explicitly call the `search_memories` tool to retrieve context. Best for high-capability models where you want minimal context waste. |

| Setting | Description |
|---|---|
| `Mode` | RAG mode (see above). Default: `Auto` |
| `TopK` | Number of conversations retrieved per query in `WithPrompt` mode (default: 5) |
| `MinScore` | Minimum cosine similarity (0.0–1.0) in `WithPrompt` mode (default: 0.5) |
| `AutoTopK` | Conversations retrieved in the light auto-pass of `Auto` mode (default: 2) |
| `AutoMinScore` | Minimum similarity for the `Auto` mode light pass (default: 0.65) |
| `ToolMaxResults` | Maximum results the `search_memories` tool returns per invocation (default: 5) |
Increase `TopK`/`AutoTopK` for richer context. Lower `MinScore`/`AutoMinScore` to include less similar results (which may add noise); raise them to require higher relevance.
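To make the knobs concrete, here is an illustrative Python sketch of how the mode selects retrieval parameters and how the score threshold and result cap filter search hits. Names follow the `appsettings.json` keys; this is a sketch of the documented behaviour, not the service's actual implementation.

```python
def retrieval_params(mode, cfg):
    """Pick parameters for the automatic RAG pass, per the mode table above."""
    if mode == "WithPrompt":
        return cfg["TopK"], cfg["MinScore"]          # full injection every message
    if mode == "Auto":
        return cfg["AutoTopK"], cfg["AutoMinScore"]  # light pass; tool does the rest
    return 0, None                                   # ToolsOnly: no automatic pass

def filter_hits(hits, top_k, min_score):
    """Apply the minimum-similarity threshold, then keep the top_k best hits."""
    if not top_k:
        return []
    kept = [h for h in hits if min_score is None or h["score"] >= min_score]
    return sorted(kept, key=lambda h: h["score"], reverse=True)[:top_k]
```

With the defaults, `Auto` mode's light pass admits at most 2 conversations and only those scoring at least 0.65, which is why lowering `AutoMinScore` surfaces more (but noisier) context.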
- Ensure Docker Desktop is running before starting the Aspire AppHost.
- Check the Aspire dashboard for container error logs.
- Navigate to `/llm/status` on the API service to check connectivity.
- For Ollama, ensure the service is running (`ollama serve`) and the required models are pulled.
- For Foundry Local, ensure the local server is started.
- Check that `Endpoint` in `appsettings.json` matches the actual running address.
- Verify the file is a valid `.json` file (not the full ZIP export — extract it first).
- Files larger than 200 MB are not accepted. Split or trim the export if necessary.
- Check API service logs in the Aspire dashboard for parsing errors.
- Ensure the full pipeline has completed: upload → summarise → embed → index.
- If you imported conversations but have no embeddings, trigger the pipeline manually via the API: `POST /conversations/summarise`, then `POST /conversations/embed`.
- Lower the `MinScore` threshold if results are being filtered out.
- Check that the embedding model is compatible with your LLM provider configuration.
- If you change `EmbeddingModelId`, existing embeddings were generated by a different model and are no longer comparable.
- Re-embed by calling `POST /conversations/embed` (the service re-embeds any conversation in the `Summarised` state).
- If needed, re-import conversations to reset processing state.
MattGPT runs LLM inference locally by default (Ollama). Performance varies dramatically depending on hardware:
- GPU acceleration is strongly recommended. On Windows with a CUDA-capable GPU, the AppHost enables GPU passthrough for the Ollama container via `.WithGPUSupport()`. This makes summarisation, embedding, and chat responses significantly faster.
- CPU-only inference is very slow. On macOS (or any machine without a supported GPU), expect long processing times — especially for the summarisation and embedding pipeline across thousands of conversations. The HTTP client timeout is set to 10 minutes per request to accommodate this, but individual operations may still feel sluggish.
- Model choice matters. Smaller models (e.g. `llama3.2` 3B, `nomic-embed-text`) are much faster than larger ones. If you're experimenting or running on limited hardware, start with the defaults.
- Cloud providers are an alternative. If local performance is unacceptable, switch to `AzureOpenAI` in the LLM configuration to offload inference to the cloud.
You may need to tweak the configuration for your specific hardware. The defaults are tuned for a Windows machine with an NVIDIA GPU.
Planning and issue tracking lives in the docs/ folder — docs/index.md is the system of record. This file-based backlog exists so that AI coding agents (both online and offline) can pick up work autonomously. Completed issues are archived in docs/Done/ with full context of what was built and why.
If you'd like to suggest a feature or report a bug, please open a GitHub Issue. Approved items will be promoted into the docs backlog for implementation.
- Runtime configuration wizard — a guided setup experience so new users can configure the LLM provider and model without editing config files (see issue #14).
- Support for additional vector databases (Pinecone, Weaviate) and LLM endpoints.
- Advanced parsing: sentiment analysis, topic modelling, entity extraction.
- Import of other file types (images, PDFs) shared in conversations.
- Integration with LM Studio, OpenWebUI, and other LLM tools.
- Automatic project reconstruction in other LLMs from imported history.
