SkillForge is a fullstack, AI-powered platform for generating, tracking, and personalizing learning roadmaps. It combines a modern Rust/Dioxus web application with robust data collection, enrichment, and processing pipelines in both Rust and Python. The system is designed for extensibility, automation, and high-quality learning resource curation.
- Personalized, AI-generated skill roadmaps with progress tracking
- User authentication (sign up, login, profile management)
- Dashboard with learning analytics and visualizations
- AI-powered roadmap and question generation
- Automated data collection from YouTube and other sources
- LLM-based topic and prerequisite extraction
- Modern, responsive UI (Dioxus + Tailwind CSS)
- Modular, extensible data processing pipeline (Rust & Python)
- app/: Main Rust/Dioxus web application (frontend + backend, fullstack)
- database_url_enricher/: Rust tool for enriching course/resource URLs and storing them in SurrealDB
- data_collection_and_processing/: Python & Rust scripts for collecting, processing, merging, and cleaning educational data
  - yt_data_collector/: Python YouTube scraper with LLM enhancement
  - data_processor/: Rust/Python scripts for transforming and enhancing raw data
  - data_merger_and_cleaner/: Rust/Python scripts for merging, deduplicating, and cleaning datasets
- Rust (latest stable)
- Node.js (for Tailwind CSS, optional)
- Python 3.9+ (for data collection/processing)
- wasm-pack (for web build)
- SurrealDB (local database, optional)
- Dioxus CLI (cargo install dioxus-cli)
- Install Python dependencies:
  cd data_collection_and_processing/yt_data_collector
  pip install -r requirements.txt
- Collect YouTube data:
  - Edit and run integrated_scraper_with_llm.py with your YouTube API key (and an OpenRouter API key for LLM enhancement).
  - Example usage:
    python integrated_scraper_with_llm.py --api-key <YOUTUBE_API_KEY> --openrouter-key <OPENROUTER_API_KEY>
  - The script fetches video metadata and transcripts, and can optionally enhance topics/prerequisites using LLMs.
- Process raw data:
  - Use the Rust or Python scripts in data_processor/ to transform, enhance, and structure the collected data.
  - Example (Rust):
    cd ../data_processor
    cargo run --release
  - The processor will output cleaned and LLM-enhanced JSON datasets.
- Merge and deduplicate datasets:
  - Use the scripts in data_merger_and_cleaner/ to combine multiple processed datasets, remove duplicates, and perform final cleaning.
  - Example (Rust):
    cd ../data_merger_and_cleaner
    cargo run --release
  - The output is a single merged, deduplicated, and cleaned dataset (e.g., final_data.json).
- Build and run the Rust enrichment tool:
  cd database_url_enricher
  cargo build --release
  cargo run --release
  - This tool enriches your course/resource data with URLs and stores them in SurrealDB.
- Build frontend assets (optional):
  cd app
  # If you want to rebuild Tailwind CSS
  # npx tailwindcss -i ./tailwind.css -o ./assets/main.css --watch
- Run the app (recommended):
  - Web (WASM, hot reload):
    dx serve
  - Other targets:
    dx serve --platform desktop
    dx serve --platform web
  - The app will be available at http://localhost:8080 by default.
- Location: data_collection_and_processing/yt_data_collector/
- Language: Python
- Purpose: Scrapes YouTube for educational videos, collects metadata and transcripts, and (optionally) enhances data using LLMs via OpenRouter.
- Key scripts:
  - integrated_scraper_with_llm.py: Main entry point for scraping and LLM enhancement.
  - llm_enhanced_processor.py: LLM-powered topic/prerequisite extraction.
- How to use:
- Install requirements, provide API keys, and run the main script as described above.
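The invocation described above can be wrapped so that the keys come from the environment instead of being typed on the command line. This is only a minimal sketch and is not part of the repository: the environment variable names `YOUTUBE_API_KEY` and `OPENROUTER_API_KEY` and the helper `build_scraper_command` are assumptions made for illustration. Only the script name and the `--api-key` / `--openrouter-key` flags are taken from the usage example in this README, and the sketch assumes the script can be run without `--openrouter-key` when LLM enhancement is not wanted.

```python
import os
import sys


def build_scraper_command(env=None):
    """Build the argument list for integrated_scraper_with_llm.py.

    Hypothetical helper: `env` is any mapping of environment-style
    variables and defaults to the real process environment.
    """
    env = os.environ if env is None else env

    youtube_key = env.get("YOUTUBE_API_KEY")  # assumed variable name
    if not youtube_key:
        raise ValueError("YOUTUBE_API_KEY is not set")

    command = [
        sys.executable,
        "integrated_scraper_with_llm.py",
        "--api-key",
        youtube_key,
    ]

    # LLM enhancement is described as optional, so the OpenRouter flag is
    # only appended when a key is present (assumes the flag may be omitted).
    openrouter_key = env.get("OPENROUTER_API_KEY")  # assumed variable name
    if openrouter_key:
        command += ["--openrouter-key", openrouter_key]
    return command
```

The returned list could then be passed to `subprocess.run(command, check=True)` from inside the yt_data_collector directory.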
- Location: data_collection_and_processing/data_processor/
- Language: Rust (and optionally Python)
- Purpose: Transforms raw scraped data into a structured, LLM-enhanced format suitable for merging and ingestion.
- Key script:
  - src/main.rs: Reads raw CSV/JSON, enhances it with an LLM, and outputs processed JSON.
- How to use:
  - Configure input/output paths and API keys as needed, then run with cargo run --release.
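To make the raw-to-structured step more concrete, here is a small Python sketch of the general shape of such a transformation. It is not the logic in src/main.rs: the column names `title` and `url`, the output fields `topics` and `prerequisites`, and the `enhance` callback standing in for the LLM call are all assumptions chosen for illustration.

```python
import csv
import io
import json


def process_rows(csv_text, enhance=None):
    """Turn raw CSV text into a list of structured records.

    `enhance` is an optional callable standing in for the LLM step: it
    receives a record and returns a dict of extra or replacement fields.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = (row.get("title") or "").strip()
        if not title:
            continue  # skip rows without a usable title
        record = {
            "title": title,
            "url": (row.get("url") or "").strip(),
            "topics": [],
            "prerequisites": [],
        }
        if enhance is not None:
            record.update(enhance(record))
        records.append(record)
    return records


def to_json(records):
    """Serialize processed records as pretty-printed JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)
```

Keeping the enhancement step behind a callback means the cleaning logic can be exercised without an API key, and a real LLM client can be plugged in later.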
- Location: data_collection_and_processing/data_merger_and_cleaner/
- Language: Rust (and optionally Python)
- Purpose: Merges multiple processed datasets, deduplicates entries, and performs final cleaning and validation.
- Key script:
  - src/main.rs: Main merging/cleaning logic.
- How to use:
  - Adjust input/output paths as needed, then run with cargo run --release.
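The merge-and-deduplicate idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's implementation: the record fields `url` and `title`, the choice of the URL as the primary identity, and the helper names `normalize_key` and `merge_datasets` are all hypothetical.

```python
def normalize_key(record):
    """Build a comparison key for a record (field names are assumptions).

    The URL is preferred; it is trimmed but not lowercased, because some
    URLs (for example video IDs) are case-sensitive. Records without a URL
    fall back to a case- and whitespace-insensitive title.
    """
    url = (record.get("url") or "").strip().rstrip("/")
    if url:
        return ("url", url)
    title = " ".join((record.get("title") or "").lower().split())
    return ("title", title)


def merge_datasets(*datasets):
    """Concatenate datasets, keeping the first record seen for each key."""
    seen = set()
    merged = []
    for dataset in datasets:
        for record in dataset:
            key = normalize_key(record)
            if key in seen:
                continue  # duplicate of an earlier record
            seen.add(key)
            merged.append(record)
    return merged
```

Because the first occurrence wins, the order in which datasets are passed decides which copy of a duplicate survives. The merged list could then be written out with `json.dump`, for example to final_data.json.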
- Location: database_url_enricher/
- Language: Rust
- Purpose: Enriches course/resource data with URLs and stores them in SurrealDB for use by the main app.
- How to use:
- Build and run as described above in Database Enrichment.
- Open the app in your browser (default: http://localhost:8080)
- Sign up, log in, and start creating your learning roadmap!
- Use the dashboard to track your progress and explore recommended resources.
skillforge/
├── app/ # Dioxus web app (Rust)
├── database_url_enricher/ # Rust enrichment tool
├── data_collection_and_processing/
│ ├── yt_data_collector/ # Python YouTube scraper + LLM
│ ├── data_processor/ # Rust/Python data processing
│ └── data_merger_and_cleaner/ # Rust/Python data merging/cleaning
├── dataset_test.json # Example/test dataset
├── final_data.json # Final merged dataset
└── README.md # This file
- Build errors: Ensure all dependencies are installed and the correct Rust toolchain is active.
- WASM issues: Make sure wasm-pack and dioxus-cli are installed and up to date.
- Python errors: Check your Python version (3.9+) and that the required packages are installed.
- Database issues: Ensure SurrealDB is running if using server features.
- API key issues: Double-check your YouTube and OpenRouter API keys for data/LLM scripts.
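Several of the checks above can be automated with a short preflight script. The following is a hedged sketch rather than a tool shipped with the project: the executable names (`cargo`, `dx`, `wasm-pack`, `node`, `surreal`) are assumptions based on the tools named in this README, and the split into required and optional tools mirrors the "optional" notes in the prerequisites.

```python
import shutil
import sys

# Executable names are assumptions derived from the tools this README lists.
REQUIRED_TOOLS = ["cargo", "dx", "wasm-pack"]
OPTIONAL_TOOLS = ["node", "surreal"]


def missing_tools(names, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def python_is_supported(version_info=sys.version_info):
    """The data scripts are documented as needing Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


def report():
    """Collect human-readable findings; an empty list means all checks passed."""
    lines = []
    for name in missing_tools(REQUIRED_TOOLS):
        lines.append(f"missing required tool: {name}")
    for name in missing_tools(OPTIONAL_TOOLS):
        lines.append(f"missing optional tool: {name}")
    if not python_is_supported():
        lines.append("Python 3.9+ is required for the data scripts")
    return lines
```

Calling `print("\n".join(report()) or "all checks passed")` gives a quick summary. The script only checks that tools are present, not that their versions are current, and it does not validate API keys.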