CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ChatMine is a Python application for importing, analyzing, and searching AI chat conversations. It supports imports from Claude AI and ChatGPT, stores conversations in a SQLite database, and provides both CLI tools and a web interface with advanced search capabilities including AI-powered semantic search.

GPU Acceleration

ChatMine supports GPU acceleration for faster embedding generation:

  • Sentence Transformers: Automatically uses CUDA when available (via PyTorch)
  • FAISS: Currently uses CPU version. For GPU-accelerated FAISS:
    # Install via conda (recommended)
    conda install -c pytorch faiss-gpu
    # Or build from source
  • Performance: Roughly 10-15x faster embedding generation on a GPU
  • Fallback: Automatically falls back to the CPU when no GPU is available
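
The CUDA-then-CPU fallback described above can be sketched roughly as follows. This is a minimal illustration, not the contents of hardware.py: the function name pick_device is hypothetical, and the sketch assumes that a missing PyTorch install should be treated the same as a missing GPU.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; ChatMine's real detection logic
    may differ.
    """
    try:
        import torch  # optional here: if it is absent we fall back to CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Embedding device: {pick_device()}")
```

The returned string is the kind of value that can be passed as the device argument when constructing a SentenceTransformer model.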

Development Environment

This project uses Rye for Python package management and development workflow.

Python Version: 3.13.5 (specified in .python-version)

Essential Commands

# Setup and sync dependencies
rye sync

# Run all tests and code quality checks
./scripts/test.sh

# Run tests only
rye test

# Run specific test file
rye test src/chatmine/test_cli.py

# Format code
rye fmt
black src
isort src

# Type checking
mypy --strict src
pyright

# Linting
rye lint

Database Operations

# Run database migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"

# Downgrade migration
alembic downgrade -1

CLI Application Commands

# Import data
chatmine import-claude path/to/claude-export.zip
chatmine import-chatgpt path/to/chatgpt-export.zip

# Search operations
chatmine search "python programming"
chatmine search "weather" --limit 5 --context 50
chatmine semantic-search "machine learning" --threshold 0.3 --limit 10

# Database operations
chatmine stats
chatmine recent

# Embeddings and indexing
chatmine generate-embeddings --batch-size 100
chatmine rebuild-index

# Code extraction and analysis
chatmine code-search --language python --code-type function
chatmine code-stats
chatmine export-code --language python -o my_python_code

# Conversation export to markdown files
chatmine export-conversations --preview  # See what would be exported
chatmine export-conversations -o my_conversations
chatmine export-conversations --platform claude --date-from 2024-01

# Web interface
chatmine serve  # Starts on port 8000

Architecture

Core Components

  • cli.py: Click-based command-line interface with all main commands
  • models.py: SQLAlchemy models for Conversation, ChatMessage, and CodeBlock with integer primary keys
  • database.py: Database setup and session management utilities
  • data_export.py: Pydantic models for parsing chat export data from ZIP files
  • importers/: Platform-specific importers in src/chatmine/importers/ (claude.py, chatgpt.py) with ClaudeImporter and ChatGPTImporter classes
  • code_extractor.py: Advanced code block detection, language identification, and metadata extraction
  • conversation_exporter.py: Export conversations to organized markdown files with metadata
  • hardware.py: Hardware detection and GPU acceleration utilities
  • web.py: FastAPI-based web interface with templating and REST endpoints
  • embeddings.py: Core embedding service using sentence-transformers
  • embeddings_faiss.py: FAISS-optimized embedding service for fast similarity search
  • migrations/: Alembic database migrations in src/chatmine/migrations/ using SQLite with chatmine.db

Database Schema

  • Uses SQLite database (chatmine.db) with Alembic migrations
  • Integer primary keys for conversations, messages, and code blocks (migrated from UUIDs)
  • Foreign key relationships: Conversation → ChatMessage → CodeBlock
  • Platform-agnostic design with platform_id fields storing external UUIDs
  • Embedding storage in ChatMessage.embedding field for semantic search
  • Code block storage with language, type, content, and rich metadata in JSON format
  • Unique constraints on platform_id + platform combinations
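
To make the schema points above concrete, here is a small sketch using only the standard-library sqlite3 module. The table names, column names, and the helpers open_db and add_conversation are assumptions chosen for illustration; the real definitions live in models.py and the Alembic migrations and may differ.

```python
import sqlite3

# Illustrative schema only: names are assumptions, not ChatMine's actual DDL.
SCHEMA = """
CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,            -- internal integer key
    platform TEXT NOT NULL,            -- e.g. "claude" or "chatgpt"
    platform_id TEXT NOT NULL,         -- external UUID from the export
    title TEXT,
    UNIQUE (platform, platform_id)
);
CREATE TABLE chat_messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    text TEXT NOT NULL,
    embedding BLOB                     -- filled in later for semantic search
);
CREATE TABLE code_blocks (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES chat_messages(id),
    language TEXT,
    code_type TEXT,
    content TEXT NOT NULL,
    metadata TEXT                      -- JSON-encoded metadata
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database, enable foreign keys, and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_conversation(
    conn: sqlite3.Connection, platform: str, platform_id: str, title: str
) -> bool:
    """Insert a conversation; return False when the unique constraint skips it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO conversations (platform, platform_id, title) "
        "VALUES (?, ?, ?)",
        (platform, platform_id, title),
    )
    return cur.rowcount == 1
```

Keeping the external UUID in platform_id while using an integer id internally means the same external identifier can appear once per platform without colliding, which is what the unique constraint on the pair expresses.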

Data Flow

  1. Import Process: Chat exports are ZIP files containing platform-specific JSON data
  2. Parsing: DataExport class parses ZIP files using Pydantic models
  3. Storage: Platform importers convert to SQLAlchemy models and store in database
  4. Code Extraction: Code blocks are automatically extracted during import using regex patterns
  5. Deduplication: Duplicate conversations are skipped based on platform_id
  6. Embeddings: Semantic search requires generating embeddings via generate-embeddings
  7. Indexing: FAISS index is built for fast similarity search via rebuild-index
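
Step 4 above mentions regex-based code extraction. A minimal sketch of that idea for Markdown-style fenced blocks is shown below. The pattern, the ExtractedBlock type, and the extract_code_blocks function are hypothetical; the actual rules in code_extractor.py are described as more advanced (language identification, metadata) and are not reproduced here.

```python
import re
from typing import NamedTuple

FENCE = "`" * 3  # the three-backtick delimiter, built indirectly for readability

# Opening fence, optional language tag, body, closing fence.
FENCE_RE = re.compile(
    FENCE + r"(?P<lang>[\w+#.-]*)[ \t]*\n(?P<body>.*?)\n?" + FENCE,
    re.DOTALL,
)


class ExtractedBlock(NamedTuple):
    language: str
    content: str


def extract_code_blocks(message_text: str) -> list[ExtractedBlock]:
    """Return every fenced code block in a message, in order of appearance."""
    return [
        ExtractedBlock(m.group("lang").lower() or "unknown", m.group("body"))
        for m in FENCE_RE.finditer(message_text)
    ]
```

Blocks without a language tag are labeled "unknown" in this sketch; a fuller extractor would presumably try to infer the language from the content instead.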

Search Architecture

  • Text Search: Direct SQLite LIKE queries on message text
  • Semantic Search: Uses a sentence-transformers model to generate embeddings
  • Code Search: Specialized search for code blocks by language, type, and content
  • FAISS Integration: Optimized vector similarity search with configurable thresholds
  • Web Interface: Provides multiple search modes with result highlighting
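
The role of the similarity threshold and result limit in semantic search can be illustrated with a brute-force stand-in for FAISS. This sketch assumes the score is cosine similarity and that results below the threshold are dropped; both are assumptions about how --threshold behaves, and the function names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: Sequence[float],
    indexed: Mapping[int, Sequence[float]],
    threshold: float = 0.3,
    limit: int = 10,
) -> list[tuple[int, float]]:
    """Score every stored embedding, keep those at or above the threshold,
    and return the best `limit` matches as (message_id, score) pairs."""
    scored = [(mid, cosine_similarity(query_vec, vec)) for mid, vec in indexed.items()]
    hits = [(mid, score) for mid, score in scored if score >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]
```

A linear scan like this costs one comparison per stored message for every query, which is the work a prebuilt vector index is meant to speed up.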

Web Interface

  • FastAPI Backend: Serves REST endpoints and HTML templates
  • Jinja2 Templates: Located in src/chatmine/templates/ directory with base layout and specialized pages
  • Dashboard: Conversation statistics and overview
  • Search Pages: Both text and semantic search with result highlighting
  • Conversation Browser: Paginated list and detailed conversation views
  • Static Assets: CSS and JavaScript files in src/chatmine/static/

Development Standards

  • Uses strict type checking with mypy and pyright
  • Code formatting with Black and isort
  • Test coverage with pytest and pytest-cov (90% minimum coverage enforced)
  • Parallel test execution with pytest-xdist and pytest-sugar for enhanced output
  • All dependencies managed through Rye's pyproject.toml
  • Python 3.8+ compatibility (3.13.5 specified in .python-version)
  • Entry point configured as chatmine = "chatmine.cli:cli" in pyproject.toml
  • Coverage Exclusions: Test files, migrations, cache directories automatically excluded

Testing

  • Test files use test_*.py naming convention and are located alongside source code in src/chatmine/
  • Coverage requirement: 90% minimum (enforced in pyproject.toml)
  • Parallel execution: Automatic parallel testing with pytest-xdist (-n auto)
  • Test Data: Sample exports located in src/chatmine/temporary_chat_data/ for integration testing
  • Run the comprehensive test suite with ./scripts/test.sh, which includes:
    • All unit tests with coverage reporting (--cov=src/chatmine)
    • Code formatting (Black, isort)
    • Linting with rye lint
    • Type checking (mypy --strict, pyright)
    • Database migrations (alembic upgrade head)
    • CLI integration tests with real import commands and data validation
    • Web interface testing with FastAPI test client

Key Dependencies

  • SQLAlchemy + Alembic: Database ORM and migrations
  • Click: CLI framework
  • FastAPI + Uvicorn: Web interface
  • Pydantic: Data validation and parsing
  • sentence-transformers: AI embeddings for semantic search
  • FAISS: Fast similarity search indexing (CPU version by default, GPU via conda)
  • Rich: Terminal formatting and progress bars
  • PyTorch: Backend for sentence-transformers with automatic CUDA support

Repository Maintenance

Code Quality Pipeline

The comprehensive test script ./scripts/test.sh ensures all quality checks pass:

  1. repomix: Generates consolidated codebase snapshot
  2. Dependencies: Syncs and validates with Rye
  3. Testing: Full test suite with 90% coverage requirement
  4. Formatting: Black and isort for consistent code style
  5. Linting: Rye lint for code quality
  6. Type Checking: Both mypy (strict) and pyright for comprehensive type safety
  7. Database: Migration validation with Alembic
  8. Integration: End-to-end CLI testing with real data imports

Data Files

  • chatmine.db: Main SQLite database (auto-created on first run)
  • faiss_index.pkl: Serialized FAISS search index
  • Sample Data: Test exports in src/chatmine/temporary_chat_data/ for development
  • Export Outputs: exported_conversations/ and code-extract/ directories for user exports