This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ChatMine is a Python application for importing, analyzing, and searching AI chat conversations. It supports imports from Claude AI and ChatGPT, stores conversations in a SQLite database, and provides both CLI tools and a web interface with advanced search capabilities including AI-powered semantic search.
ChatMine supports GPU acceleration for faster embedding generation:
- Sentence Transformers: Automatically uses CUDA when available (via PyTorch)
- FAISS: Currently uses CPU version. For GPU-accelerated FAISS:

      # Install via conda (recommended)
      conda install -c pytorch faiss-gpu
      # Or build from source

- Performance: ~10-15x speedup on embedding generation with GPU
- Fallback: Automatically uses CPU if GPU not available
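
The CUDA-or-CPU fallback described above can be sketched as follows. This is a minimal illustration, not the project's `hardware.py`: the function name `pick_device` and the model name in the trailing comment are assumptions made for the example.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        # Imported lazily so this sketch still runs where PyTorch is absent.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Embeddings would run on: {device}")

# With sentence-transformers installed, the device can be passed explicitly.
# The model name below is an assumption, not taken from this repository:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2", device=device)
```

Passing the device explicitly makes the choice visible in logs, although sentence-transformers also selects CUDA on its own when it is available.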
This project uses Rye for Python package management and development workflow.
Python Version: 3.13.5 (specified in .python-version)
# Setup and sync dependencies
rye sync
# Run all tests and code quality checks
./scripts/test.sh
# Run tests only
rye test
# Run specific test file
rye test src/chatmine/test_cli.py
# Format code
rye fmt
black src
isort src
# Type checking
mypy --strict src
pyright
# Linting
rye lint
# Run database migrations
alembic upgrade head
# Create new migration
alembic revision --autogenerate -m "description"
# Downgrade migration
alembic downgrade -1
# Import data
chatmine import-claude path/to/claude-export.zip
chatmine import-chatgpt path/to/chatgpt-export.zip
# Search operations
chatmine search "python programming"
chatmine search "weather" --limit 5 --context 50
chatmine semantic-search "machine learning" --threshold 0.3 --limit 10
# Database operations
chatmine stats
chatmine recent
# Embeddings and indexing
chatmine generate-embeddings --batch-size 100
chatmine rebuild-index
# Code extraction and analysis
chatmine code-search --language python --code-type function
chatmine code-stats
chatmine export-code --language python -o my_python_code
# Conversation export to markdown files
chatmine export-conversations --preview # See what would be exported
chatmine export-conversations -o my_conversations
chatmine export-conversations --platform claude --date-from 2024-01
# Web interface
chatmine serve  # Starts on port 8000

- cli.py: Click-based command-line interface with all main commands
- models.py: SQLAlchemy models for `Conversation`, `ChatMessage`, and `CodeBlock` with integer primary keys
- database.py: Database setup and session management utilities
- data_export.py: Pydantic models for parsing chat export data from ZIP files
- importers/: Platform-specific importers in `src/chatmine/importers/` (`claude.py`, `chatgpt.py`) with `ClaudeImporter` and `ChatGPTImporter` classes
- code_extractor.py: Advanced code block detection, language identification, and metadata extraction
- conversation_exporter.py: Export conversations to organized markdown files with metadata
- hardware.py: Hardware detection and GPU acceleration utilities
- web.py: FastAPI-based web interface with templating and REST endpoints
- embeddings.py: Core embedding service using sentence-transformers
- embeddings_faiss.py: FAISS-optimized embedding service for fast similarity search
- migrations/: Alembic database migrations in `src/chatmine/migrations/` using SQLite with `chatmine.db`
- Uses SQLite database (`chatmine.db`) with Alembic migrations
- Integer primary keys for conversations, messages, and code blocks (migrated from UUIDs)
- Foreign key relationships: `Conversation` → `ChatMessage` → `CodeBlock`
- Platform-agnostic design with `platform_id` fields storing external UUIDs
- Embedding storage in `ChatMessage.embedding` field for semantic search
- Code block storage with language, type, content, and rich metadata in JSON format
- Unique constraints on `platform_id` + `platform` combinations
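
The schema rules above (integer primary keys, the three-level foreign key chain, and the unique `platform_id` + `platform` pair) can be illustrated with plain `sqlite3`. The project itself defines these tables through SQLAlchemy models and Alembic, so the table names, column names, and the `insert_conversation` helper here are assumptions chosen for the sketch.

```python
import sqlite3

# Hypothetical table and column names; the real ones live in models.py.
SCHEMA = """
CREATE TABLE conversation (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    platform_id TEXT NOT NULL,          -- external UUID from the platform
    title TEXT,
    UNIQUE (platform_id, platform)
);
CREATE TABLE chat_message (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversation(id),
    text TEXT NOT NULL,
    embedding BLOB                      -- filled in later for semantic search
);
CREATE TABLE code_block (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES chat_message(id),
    language TEXT,
    code_type TEXT,
    content TEXT NOT NULL,
    metadata TEXT                       -- JSON-encoded
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)


def insert_conversation(conn, platform, platform_id, title):
    """Insert a conversation; return False when the unique pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO conversation (platform, platform_id, title) VALUES (?, ?, ?)",
                (platform, platform_id, title),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_conversation(conn, "claude", "3f2a-example", "Hello"))  # first import
print(insert_conversation(conn, "claude", "3f2a-example", "Hello"))  # duplicate is rejected
```

Because the constraint covers the pair rather than `platform_id` alone, the same external ID can appear once per platform.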
- Import Process: Chat exports are ZIP files containing platform-specific JSON data
- Parsing: `DataExport` class parses ZIP files using Pydantic models
- Storage: Platform importers convert to SQLAlchemy models and store in database
- Code Extraction: Code blocks are automatically extracted during import using regex patterns
- Deduplication: Duplicate conversations are skipped based on `platform_id`
- Embeddings: Semantic search requires generating embeddings via `generate-embeddings`
- Indexing: FAISS index is built for fast similarity search via `rebuild-index`
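
As an illustration of the regex-based code extraction step, the sketch below pulls fenced code blocks out of a message. The actual patterns in `code_extractor.py` are not shown in this file, so the regex, the `extract_code_blocks` name, and the returned dictionary shape are all assumptions.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Opening fence, optional language tag, then the shortest body up to the next fence.
CODE_BLOCK_RE = re.compile(
    rf"{FENCE}([\w+#-]*)[ \t]*\n(.*?){FENCE}",
    re.DOTALL,
)


def extract_code_blocks(text):
    """Return one {"language", "content"} dict per fenced block in text."""
    blocks = []
    for match in CODE_BLOCK_RE.finditer(text):
        language = match.group(1).lower() or None  # None when no tag was given
        blocks.append({"language": language, "content": match.group(2).rstrip("\n")})
    return blocks


message = (
    f"Try this:\n{FENCE}python\nprint('hi')\n{FENCE}\n"
    f"and in the shell:\n{FENCE}\nls -la\n{FENCE}\n"
)
for block in extract_code_blocks(message):
    print(block)
```

A regex like this handles well-formed fences only; unterminated or nested fences would need extra handling.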
- Text Search: Direct SQLite LIKE queries on message text
- Semantic Search: Uses sentence-transformers model to generate embeddings
- Code Search: Specialized search for code blocks by language, type, and content
- FAISS Integration: Optimized vector similarity search with configurable thresholds
- Web Interface: Provides multiple search modes with result highlighting
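
The role of `--threshold` and `--limit` in semantic search can be sketched in pure Python: score every stored embedding against the query, drop anything below the threshold, and return the best matches. This assumes cosine similarity as the score, which this file does not confirm. The real service delegates the search to FAISS, and the function names and toy three-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, messages, threshold=0.3, limit=10):
    """Return up to `limit` (score, text) pairs scoring at least `threshold`, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in messages]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:limit]


# Toy embeddings standing in for sentence-transformers output.
messages = [
    ("intro to neural networks", [0.9, 0.1, 0.0]),
    ("weekend hiking plans", [0.0, 0.2, 0.9]),
    ("gradient descent tips", [0.8, 0.3, 0.1]),
]
results = semantic_search([1.0, 0.0, 0.0], messages, threshold=0.3, limit=10)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for a handful of messages; an index such as FAISS exists to avoid it at scale.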
- FastAPI Backend: Serves REST endpoints and HTML templates
- Jinja2 Templates: Located in `src/chatmine/templates/` directory with base layout and specialized pages
- Dashboard: Conversation statistics and overview
- Search Pages: Both text and semantic search with result highlighting
- Conversation Browser: Paginated list and detailed conversation views
- Static Assets: CSS and JavaScript files in `src/chatmine/static/`
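
Result highlighting on the search pages could work along the lines of the sketch below, which wraps each case-insensitive match in a `<mark>` tag while HTML-escaping everything else. How `web.py` and the templates actually do this is not described here, so the `highlight` helper and the choice of `<mark>` are assumptions.

```python
import html
import re


def highlight(text, query):
    """Return HTML-safe text with each occurrence of query wrapped in <mark>."""
    if not query:
        return html.escape(text)
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched stretch, then the match itself, separately.
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("Learn Python & more", "python"))
```

Escaping each piece before adding the tags keeps message text from injecting markup, so the result can be rendered as trusted HTML in a template.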
- Uses strict type checking with mypy and pyright
- Code formatting with Black and isort
- Test coverage with pytest and pytest-cov (90% minimum coverage enforced)
- Parallel test execution with `pytest-xdist` and `pytest-sugar` for enhanced output
- All dependencies managed through Rye's `pyproject.toml`
- Python 3.8+ compatibility (3.13.5 specified in `.python-version`)
- Entry point configured as `chatmine = "chatmine.cli:cli"` in pyproject.toml
- Coverage Exclusions: Test files, migrations, cache directories automatically excluded
- Test files use `test_*.py` naming convention and are located alongside source code in `src/chatmine/`
- Coverage requirement: 90% minimum (enforced in pyproject.toml)
- Parallel execution: Automatic parallel testing with `pytest-xdist` (`-n auto`)
- Test Data: Sample exports located in `src/chatmine/temporary_chat_data/` for integration testing
- Run the comprehensive test suite with `./scripts/test.sh` which includes:
  - All unit tests with coverage reporting (`--cov=src/chatmine`)
  - Code formatting (Black, isort)
  - Linting with rye lint
  - Type checking (mypy --strict, pyright)
  - Database migrations (`alembic upgrade head`)
  - CLI integration tests with real import commands and data validation
  - Web interface testing with FastAPI test client
- SQLAlchemy + Alembic: Database ORM and migrations
- Click: CLI framework
- FastAPI + Uvicorn: Web interface
- Pydantic: Data validation and parsing
- sentence-transformers: AI embeddings for semantic search
- FAISS: Fast similarity search indexing (CPU version by default, GPU via conda)
- Rich: Terminal formatting and progress bars
- PyTorch: Backend for sentence-transformers with automatic CUDA support
The comprehensive test script ./scripts/test.sh ensures all quality checks pass:
- repomix: Generates consolidated codebase snapshot
- Dependencies: Syncs and validates with Rye
- Testing: Full test suite with 90% coverage requirement
- Formatting: Black and isort for consistent code style
- Linting: Rye lint for code quality
- Type Checking: Both mypy (strict) and pyright for comprehensive type safety
- Database: Migration validation with Alembic
- Integration: End-to-end CLI testing with real data imports
- chatmine.db: Main SQLite database (auto-created on first run)
- faiss_index.pkl: Serialized FAISS search index
- Sample Data: Test exports in `src/chatmine/temporary_chat_data/` for development
- Export Outputs: `exported_conversations/` and `code-extract/` directories for user exports