A comprehensive system for collecting, processing, storing, and analyzing cryptocurrency data from multiple sources using machine learning and large language models. It automates the collection of crypto project data, then scrapes and analyzes project websites, social media, and whitepapers to build a detailed profile of each project.
- LiveCoinWatch Integration: Automated collection of 52,000+ crypto projects with market data, links, and metadata
- Multi-source Scraping: Websites, whitepapers, Reddit, Twitter, Telegram, Medium, and YouTube
- Rate Limiting & Error Handling: Robust API client with retry logic and rate limit management
- Change Tracking: Historical tracking of all data changes with timestamp and source attribution
- LLM-Powered Analysis: Automated analysis using Ollama (local LLM inference)
- Website Analysis: Extract technology stack, use cases, competitive advantages
- Whitepaper Analysis: Parse and analyze project whitepapers (PDF and web formats)
- Social Media Intelligence: Reddit sentiment, Twitter activity, Telegram engagement
- Medium & YouTube: Content analysis from project blogs and video channels
- PostgreSQL Database: Production-ready with optimized schema for crypto data
- Docker Compose Setup: Full infrastructure including PostgreSQL, Redis, and admin tools
- Migration System: Alembic-based database migrations with rollback support
- Status Tracking: Comprehensive logging for website, whitepaper, and Reddit scraping
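The retry logic mentioned under Rate Limiting & Error Handling could look roughly like the sketch below. It is not the code in `src/collectors/livecoinwatch.py`: `RateLimitError`, `call_with_retry`, and all parameter names are hypothetical, and exponential backoff with jitter is an assumed strategy chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when an API answers with HTTP 429."""


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying rate-limit and connection errors with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (RateLimitError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Up to 10% random jitter so parallel workers do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A collector would wrap each request in a zero-argument callable, for example `call_with_retry(lambda: fetch_page(offset))`, where `fetch_page` stands in for whatever function performs the HTTP call.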
crypto-analytics/
├── src/
│   ├── collectors/  # Data collection modules
│   │   ├── livecoinwatch.py  # LiveCoinWatch API client
│   │   ├── twitter_api.py  # Twitter integration
│   │   └── telegram_api.py  # Telegram channel monitoring
│   ├── scrapers/  # Web scraping modules
│   │   ├── website_scraper.py  # General website scraping
│   │   ├── whitepaper_scraper.py
│   │   ├── reddit_scraper.py
│   │   ├── medium_scraper.py
│   │   └── youtube_scraper.py
│   ├── analyzers/  # LLM analysis modules
│   │   ├── website_analyzer.py
│   │   ├── whitepaper_analyzer.py
│   │   ├── reddit_analyzer.py
│   │   ├── twitter_analyzer.py
│   │   ├── telegram_analyzer.py
│   │   ├── medium_analyzer.py
│   │   └── youtube_analyzer.py
│   ├── pipelines/  # Analysis pipelines
│   │   ├── content_analysis_pipeline.py
│   │   └── website_analysis_pipeline.py
│   ├── models/  # Database models
│   │   ├── database.py  # SQLAlchemy models
│   │   └── init_db.py  # Database initialization
│   ├── services/  # Business logic services
│   │   ├── reddit_status_logger.py
│   │   ├── website_status_logger.py
│   │   └── whitepaper_status_logger.py
│   └── utils/  # Utility modules
│       ├── error_reporter.py
│       ├── logging_config.py
│       └── url_filter.py
├── scripts/  # Utility scripts
│   ├── analysis/  # Analysis runners
│   ├── migration/  # Database migrations
│   ├── dev/  # Development tools
│   └── utils/  # Helper scripts
├── config/  # Configuration files
├── data/  # Data storage (gitignored)
├── logs/  # Application logs (gitignored)
├── tests/  # Unit and integration tests
├── docs/  # Comprehensive documentation
├── migrations/  # Alembic database migrations
└── docker-compose.yml  # Docker infrastructure setup
- Python 3.10+
- Docker & Docker Compose (for database)
- Ollama installed and running locally
- API Keys for:
  - LiveCoinWatch
  - Twitter API (optional)
  - Reddit API (optional)
- Clone the repository
git clone https://github.com/jrbgit/crypto-analytics.git
cd crypto-analytics
- Install dependencies
pip install -r requirements.txt
# Or for development (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"
- Set up environment variables
cp config/.env.example config/.env
# Edit config/.env with your API keys
- Start the database
docker-compose up -d postgres
# Optional: start admin interfaces
docker-compose --profile admin up -d
- Initialize the database
python src/models/init_db.py
- Run migrations (if needed)
alembic upgrade head

Collect crypto project data:
python src/collectors/livecoinwatch.py
Run website analysis:
python scripts/analysis/run_website_analysis.py
Run comprehensive analysis:
python scripts/analysis/run_comprehensive_analysis.py
Monitor progress:
python scripts/analysis/monitor_progress.py

The system uses PostgreSQL with the following main tables:
- crypto_projects: Core project data (price, market cap, supply, etc.)
- project_links: Social media and official links with status tracking
- project_images: Project logos and icons
- project_changes: Historical change tracking
- link_content_analysis: LLM analysis results for websites
- website_status_log: Website scraping status and error tracking
- whitepaper_status_log: Whitepaper analysis status
- reddit_status_log: Reddit scraping status
- api_usage: API usage tracking and rate limiting
See docs/DATABASE_MIGRATION_GUIDE.md for detailed schema information.
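To illustrate the kind of record `project_changes` is meant to hold, here is a minimal sketch of field-level change detection between two snapshots of a project. The function name and the record keys (`field_name`, `old_value`, `new_value`, `data_source`, `changed_at`) are assumptions made for this example; the actual column names are defined in `src/models/database.py`.

```python
from datetime import datetime, timezone
from typing import Any


def diff_project(
    old: dict[str, Any], new: dict[str, Any], source: str
) -> list[dict[str, Any]]:
    """Return one change record per field whose value differs."""
    changed_at = datetime.now(timezone.utc).isoformat()
    changes = []
    # Walk the union of keys so added and removed fields are caught too
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append(
                {
                    "field_name": field,
                    "old_value": before,
                    "new_value": after,
                    "data_source": source,  # which collector reported it
                    "changed_at": changed_at,  # UTC, ISO 8601
                }
            )
    return changes
```

Each returned dict could then be inserted as a row, giving every change a timestamp and a source.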
- LiveCoinWatch: Market data, rankings, supply metrics, project links
  - Rate limit: 10,000 requests/day
  - Coverage: 52,000+ crypto projects
- Project Websites: Technology stack, features, use cases
- Whitepapers: Technical specifications, tokenomics, roadmaps
- Reddit: Community sentiment, discussion activity
- Twitter: Social engagement, announcements
- Telegram: Community size, activity levels
- Medium: Project blog posts, updates
- YouTube: Video content, tutorials, AMAs
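Staying under the LiveCoinWatch limit of 10,000 requests/day requires some form of request budgeting. The sketch below shows one simple in-memory approach; `DailyQuota` is a hypothetical helper, not the project's `api_usage` implementation, and it assumes the quota resets when the local calendar date changes (the provider's real reset time is not stated here).

```python
from datetime import date
from typing import Callable


class DailyQuota:
    """Count requests against a per-day limit, resetting on a new date."""

    def __init__(
        self, limit: int = 10_000, today: Callable[[], date] = date.today
    ) -> None:
        self.limit = limit
        self._today = today  # injectable clock, handy for tests
        self._day = today()
        self._used = 0

    def _roll_over(self) -> None:
        """Reset the counter if the date has changed since the last call."""
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def try_acquire(self, n: int = 1) -> bool:
        """Reserve n requests; return False if that would exceed the limit."""
        self._roll_over()
        if self._used + n > self.limit:
            return False
        self._used += n
        return True

    @property
    def remaining(self) -> int:
        """Requests still available today."""
        self._roll_over()
        return self.limit - self._used
```

A collector would call `try_acquire()` before each request and pause or exit when it returns `False`. A persistent variant would store the counter in the database so that restarts do not reset it.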
Create a config/.env file with:
# Database
DATABASE_URL=postgresql://crypto_user:password@localhost:5432/crypto_analytics
# LLM Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama2 # or your preferred model
# API Keys
LIVECOINWATCH_API_KEY=your_api_key_here
# Optional: Social Media APIs
TWITTER_API_KEY=your_twitter_key
REDDIT_CLIENT_ID=your_reddit_id
REDDIT_CLIENT_SECRET=your_reddit_secret

The docker-compose.yml includes:
- postgres: Main database (port 5432)
- redis: Caching layer (port 6379)
- adminer: Database admin UI (port 8080)
- pgadmin: PostgreSQL admin (port 5050)
- postgres_backup: Automated daily backups
pytest tests/
# With coverage
pytest --cov=src tests/

# Linting
flake8 src/
# Type checking
mypy src/
# Formatting
black src/

python scripts/dev/lint.py # Run all linters
python scripts/dev/check_types.py # Type checking
python scripts/dev/setup.py # Development setup

Detailed documentation is available in the docs/ directory:
- CryptoAnalyticsWithML_LLM.md: Original project concept and vision
- project_spec.md: Complete project specification
- Crypto_Data_Sources.md: Comprehensive list of crypto data sources and APIs
- DATABASE_MIGRATION_GUIDE.md: Database schema and migrations
- PERFORMANCE_ANALYSIS.md: Performance optimization guide
- ANALYSIS_REPORT.md: Analysis results and findings
- livecoinwatch_api.md: LiveCoinWatch API documentation
- REDDIT_API_NOTES.md: Reddit integration guide
- twitter_integration_guide.md: Twitter API setup
- YOUTUBE_API_SETUP.md: YouTube OAuth configuration
- GOOGLE_DRIVE_SUPPORT.md: Google Drive whitepaper extraction
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite and linters
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Repository: https://github.com/jrbgit/crypto-analytics
- Issues: https://github.com/jrbgit/crypto-analytics/issues
- LiveCoinWatch for comprehensive crypto market data
- Ollama for local LLM inference capabilities
- The cryptocurrency and open-source communities