AI-powered crypto analytics: automated data collection, web scraping, and LLM analysis for cryptocurrency projects using Ollama.

jrbgit/crypto-analytics

Crypto Analytics with ML and LLM Integration

A system for collecting, processing, storing, and analyzing cryptocurrency data from multiple sources using machine learning and large language models. It automates the collection of crypto project data, then scrapes and analyzes project websites, social media, and whitepapers to provide insight into each project.

🚀 Features

Data Collection

  • LiveCoinWatch Integration: Automated collection of 52,000+ crypto projects with market data, links, and metadata
  • Multi-source Scraping: Websites, whitepapers, Reddit, Twitter, Telegram, Medium, and YouTube
  • Rate Limiting & Error Handling: Robust API client with retry logic and rate limit management
  • Change Tracking: Historical tracking of all data changes with timestamp and source attribution
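The retry logic mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual client: `RateLimitError`, `call_with_retry`, and the backoff parameters are hypothetical names chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error a client could raise on an HTTP 429 response."""


def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError or ConnectionError with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (RateLimitError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff (1x, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.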

Content Analysis

  • LLM-Powered Analysis: Automated analysis using Ollama (local LLM inference)
  • Website Analysis: Extract technology stack, use cases, competitive advantages
  • Whitepaper Analysis: Parse and analyze project whitepapers (PDF and web formats)
  • Social Media Intelligence: Reddit sentiment, Twitter activity, Telegram engagement
  • Medium & YouTube: Content analysis from project blogs and video channels
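As an illustration of what an LLM analysis step might do, the sketch below sends scraped page text to a local Ollama server through its `/api/generate` endpoint. The function names and the prompt wording are assumptions made for this example; the analyzers in `src/analyzers/` may be structured quite differently.

```python
import json
import urllib.request


def build_generate_request(base_url, model, page_text):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # Hypothetical prompt; the project's real prompts are not shown here.
    prompt = (
        "Summarize the technology stack, use cases, and competitive "
        "advantages described in this crypto project page:\n\n" + page_text
    )
    # stream=False asks Ollama for one JSON object instead of a stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, model, page_text, timeout=120):
    """Send the request and return the model's text reply."""
    request = build_generate_request(base_url, model, page_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With Ollama running locally, `analyze("http://localhost:11434", "llama2", text)` would return the generated summary as a string.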

Data Infrastructure

  • PostgreSQL Database: Production-ready with optimized schema for crypto data
  • Docker Compose Setup: Full infrastructure including PostgreSQL, Redis, and admin tools
  • Migration System: Alembic-based database migrations with rollback support
  • Status Tracking: Comprehensive logging for website, whitepaper, and Reddit scraping

📁 Project Structure

crypto-analytics/
├── src/
│   ├── collectors/        # Data collection modules
│   │   ├── livecoinwatch.py    # LiveCoinWatch API client
│   │   ├── twitter_api.py      # Twitter integration
│   │   └── telegram_api.py     # Telegram channel monitoring
│   ├── scrapers/         # Web scraping modules
│   │   ├── website_scraper.py  # General website scraping
│   │   ├── whitepaper_scraper.py
│   │   ├── reddit_scraper.py
│   │   ├── medium_scraper.py
│   │   └── youtube_scraper.py
│   ├── analyzers/        # LLM analysis modules
│   │   ├── website_analyzer.py
│   │   ├── whitepaper_analyzer.py
│   │   ├── reddit_analyzer.py
│   │   ├── twitter_analyzer.py
│   │   ├── telegram_analyzer.py
│   │   ├── medium_analyzer.py
│   │   └── youtube_analyzer.py
│   ├── pipelines/        # Analysis pipelines
│   │   ├── content_analysis_pipeline.py
│   │   └── website_analysis_pipeline.py
│   ├── models/           # Database models
│   │   ├── database.py         # SQLAlchemy models
│   │   └── init_db.py          # Database initialization
│   ├── services/         # Business logic services
│   │   ├── reddit_status_logger.py
│   │   ├── website_status_logger.py
│   │   └── whitepaper_status_logger.py
│   └── utils/            # Utility modules
│       ├── error_reporter.py
│       ├── logging_config.py
│       └── url_filter.py
├── scripts/              # Utility scripts
│   ├── analysis/         # Analysis runners
│   ├── migration/        # Database migrations
│   ├── dev/              # Development tools
│   └── utils/            # Helper scripts
├── config/               # Configuration files
├── data/                 # Data storage (gitignored)
├── logs/                 # Application logs (gitignored)
├── tests/                # Unit and integration tests
├── docs/                 # Comprehensive documentation
├── migrations/           # Alembic database migrations
└── docker-compose.yml    # Docker infrastructure setup

🛠️ Getting Started

Prerequisites

  • Python 3.10+
  • Docker & Docker Compose (for database)
  • Ollama installed and running locally
  • API Keys for:
    • LiveCoinWatch
    • Twitter API (optional)
    • Reddit API (optional)

Installation

  1. Clone the repository
git clone https://github.com/jrbgit/crypto-analytics.git
cd crypto-analytics
  2. Install dependencies
pip install -r requirements.txt
# Or for development (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"
  3. Set up environment variables
cp config/.env.example config/.env
# Edit config/.env with your API keys
  4. Start the database
docker-compose up -d postgres
# Optional: start admin interfaces
docker-compose --profile admin up -d
  5. Initialize the database
python src/models/init_db.py
  6. Run migrations (if needed)
alembic upgrade head

Quick Start

Collect crypto project data:

python src/collectors/livecoinwatch.py

Run website analysis:

python scripts/analysis/run_website_analysis.py

Run comprehensive analysis:

python scripts/analysis/run_comprehensive_analysis.py

Monitor progress:

python scripts/analysis/monitor_progress.py

🗄️ Database Schema

The system uses PostgreSQL with the following main tables:

  • crypto_projects: Core project data (price, market cap, supply, etc.)
  • project_links: Social media and official links with status tracking
  • project_images: Project logos and icons
  • project_changes: Historical change tracking
  • link_content_analysis: LLM analysis results for websites
  • website_status_log: Website scraping status and error tracking
  • whitepaper_status_log: Whitepaper analysis status
  • reddit_status_log: Reddit scraping status
  • api_usage: API usage tracking and rate limiting

See docs/DATABASE_MIGRATION_GUIDE.md for detailed schema information.
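To make the role of project_changes concrete, here is a minimal sketch of how a change record could be derived by comparing a stored project row with freshly collected data. The function name and the record keys (`field`, `old_value`, `new_value`, `source`, `changed_at`) are hypothetical; the real column names are defined in the project's SQLAlchemy models.

```python
from datetime import datetime, timezone


def diff_project(old, new, source):
    """Return one change record per field whose value differs."""
    changed_at = datetime.now(timezone.utc).isoformat()
    changes = []
    # Sort the union of keys so the output order is deterministic.
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append({
                "field": field,
                "old_value": before,
                "new_value": after,
                "source": source,          # e.g. "livecoinwatch"
                "changed_at": changed_at,  # UTC timestamp, ISO 8601
            })
    return changes
```

Storing one record per changed field, rather than a full snapshot, keeps the history compact while still attributing each change to a source and a time.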

📊 Data Sources

Primary Data

  • LiveCoinWatch: Market data, rankings, supply metrics, project links
    • Rate limit: 10,000 requests/day
    • Coverage: 52,000+ crypto projects
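A daily limit of 10,000 requests works out to one request every 8.64 seconds if calls are spread evenly over 24 hours. The sketch below shows that arithmetic and a simple per-day budget counter; `DailyBudget` and `min_interval` are hypothetical helpers, not part of the project's API client.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_LIMIT = 10_000  # the LiveCoinWatch limit stated above


def min_interval(daily_limit=DAILY_LIMIT):
    """Seconds between requests when spreading them evenly over a day."""
    return SECONDS_PER_DAY / daily_limit


class DailyBudget:
    """Count requests per calendar day and refuse any beyond the limit."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_spend(self, today, cost=1):
        """Return True and record the cost if today's budget allows it."""
        if today != self.day:
            # A new day started: reset the counter.
            self.day, self.used = today, 0
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True
```

A collector could call `try_spend(date.today())` before each request and pause until the next day once it returns `False`.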

Content Sources

  • Project Websites: Technology stack, features, use cases
  • Whitepapers: Technical specifications, tokenomics, roadmaps
  • Reddit: Community sentiment, discussion activity
  • Twitter: Social engagement, announcements
  • Telegram: Community size, activity levels
  • Medium: Project blog posts, updates
  • YouTube: Video content, tutorials, AMAs

🔧 Configuration

Environment Variables

Create a config/.env file with:

# Database
DATABASE_URL=postgresql://crypto_user:password@localhost:5432/crypto_analytics

# LLM Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama2  # or your preferred model

# API Keys
LIVECOINWATCH_API_KEY=your_api_key_here

# Optional: Social Media APIs
TWITTER_API_KEY=your_twitter_key
REDDIT_CLIENT_ID=your_reddit_id
REDDIT_CLIENT_SECRET=your_reddit_secret
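For readers who want to check their config/.env file by hand, the following sketch parses `KEY=VALUE` lines and splits DATABASE_URL into its parts using only the standard library. It is an illustrative assumption about how the file can be read; the project itself may load settings through a dedicated library, and `load_env_file` and `database_target` are hypothetical names.

```python
from urllib.parse import urlsplit


def load_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop a trailing inline comment such as "llama2  # or ...".
            # Caveat: this would also truncate a value containing " #".
            value = value.split(" #", 1)[0].strip()
            values[key.strip()] = value
    return values


def database_target(database_url):
    """Return (host, port, database name) from a PostgreSQL URL."""
    parts = urlsplit(database_url)
    # Fall back to PostgreSQL's default port when the URL omits it.
    return parts.hostname, parts.port or 5432, parts.path.lstrip("/")
```

Calling `database_target(load_env_file("config/.env")["DATABASE_URL"])` gives a quick way to confirm which host, port, and database the application will connect to.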

Docker Services

The docker-compose.yml includes:

  • postgres: Main database (port 5432)
  • redis: Caching layer (port 6379)
  • adminer: Database admin UI (port 8080)
  • pgadmin: PostgreSQL admin (port 5050)
  • postgres_backup: Automated daily backups

🧪 Development

Running Tests

pytest tests/
# With coverage
pytest --cov=src tests/

Code Quality

# Linting
flake8 src/

# Type checking
mypy src/

# Formatting
black src/

Development Scripts

python scripts/dev/lint.py      # Run all linters
python scripts/dev/check_types.py  # Type checking
python scripts/dev/setup.py     # Development setup

📚 Documentation

Detailed documentation is available in the docs/ directory:

Project Overview

  • CryptoAnalyticsWithML_LLM.md: Original project concept and vision
  • project_spec.md: Complete project specification
  • Crypto_Data_Sources.md: Comprehensive list of crypto data sources and APIs

Technical Documentation

  • DATABASE_MIGRATION_GUIDE.md: Database schema and migrations
  • PERFORMANCE_ANALYSIS.md: Performance optimization guide
  • ANALYSIS_REPORT.md: Analysis results and findings

API Integration Guides

  • livecoinwatch_api.md: LiveCoinWatch API documentation
  • REDDIT_API_NOTES.md: Reddit integration guide
  • twitter_integration_guide.md: Twitter API setup
  • YOUTUBE_API_SETUP.md: YouTube OAuth configuration
  • GOOGLE_DRIVE_SUPPORT.md: Google Drive whitepaper extraction

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the test suite and linters
  5. Submit a pull request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • LiveCoinWatch for comprehensive crypto market data
  • Ollama for local LLM inference capabilities
  • The cryptocurrency and open-source communities
