SkillForge

SkillForge is a fullstack, AI-powered platform for generating, tracking, and personalizing learning roadmaps. It combines a modern Rust/Dioxus web application with robust data collection, enrichment, and processing pipelines in both Rust and Python. The system is designed for extensibility, automation, and high-quality learning resource curation.

Features

Personalized, AI-generated skill roadmaps with progress tracking
User authentication (sign up, login, profile management)
Dashboard with learning analytics and visualizations
AI-powered roadmap and question generation
Automated data collection from YouTube and other sources
LLM-based topic and prerequisite extraction
Modern, responsive UI (Dioxus + Tailwind CSS)
Modular, extensible data processing pipeline (Rust & Python)

Architecture Overview

app/: Main Rust/Dioxus web application (frontend + backend, fullstack)
database_url_enricher/: Rust tool for enriching course/resource URLs and storing them in SurrealDB
data_collection_and_processing/: Python & Rust scripts for collecting, processing, merging, and cleaning educational data
- yt_data_collector/: Python YouTube scraper with LLM enhancement
- data_processor/: Rust/Python scripts for transforming and enhancing raw data
- data_merger_and_cleaner/: Rust/Python scripts for merging, deduplicating, and cleaning datasets

Setup & Installation

1. Prerequisites

Rust (latest stable)
Node.js (for Tailwind CSS, optional)
Python 3.9+ (for data collection/processing)
wasm-pack (for web build)
SurrealDB (local database, optional)
Dioxus CLI (cargo install dioxus-cli)

2. Data Collection & Processing

a. YouTube Data Collection

Install Python dependencies:

cd data_collection_and_processing/yt_data_collector
pip install -r requirements.txt

Collect YouTube data:
- Edit and run integrated_scraper_with_llm.py with your YouTube API key (and OpenRouter API key for LLM enhancement).
- Example usage:
```
python integrated_scraper_with_llm.py --api-key <YOUTUBE_API_KEY> --openrouter-key <OPENROUTER_API_KEY>
```
- The script will fetch video metadata, transcripts, and optionally enhance topics/prerequisites using LLMs.

b. Data Processing

Process raw data:
- Use the Rust or Python scripts in data_processor/ to transform, enhance, and structure the collected data.
- Example (Rust):
```
cd ../data_processor
cargo run --release
```
- The processor will output cleaned and LLM-enhanced JSON datasets.

c. Data Merging & Cleaning

Merge and deduplicate datasets:
- Use the scripts in data_merger_and_cleaner/ to combine multiple processed datasets, remove duplicates, and perform final cleaning.
- Example (Rust):
```
cd ../data_merger_and_cleaner
cargo run --release
```
- The output will be a unified, high-quality dataset (e.g., final_data.json).

3. Database Enrichment

Build and run the Rust enrichment tool:
```
cd database_url_enricher
cargo build --release
cargo run --release
```
- This tool enriches your course/resource data with URLs and stores them in SurrealDB.

4. Running the Web App

Build frontend assets (optional):

cd app
# If you want to rebuild Tailwind CSS
# npx tailwindcss -i ./tailwind.css -o ./assets/main.css --watch

Run the app (recommended):
- Web (WASM, hot reload):
```
dx serve
```
- Other targets:
```
dx serve --platform desktop
dx serve --platform web
```
- The app will be available at http://localhost:8080 by default.

Helper Modules & Pipelines

YouTube Data Collector

Location: data_collection_and_processing/yt_data_collector/
Language: Python
Purpose: Scrapes YouTube for educational videos, collects metadata and transcripts, and (optionally) enhances data using LLMs via OpenRouter.
Key scripts:
- integrated_scraper_with_llm.py: Main entry point for scraping and LLM enhancement.
- llm_enhanced_processor.py: LLM-powered topic/prerequisite extraction.
How to use:
- Install requirements, provide API keys, and run the main script as described above.

Data Processor

Location: data_collection_and_processing/data_processor/
Language: Rust (and optionally Python)
Purpose: Transforms raw scraped data into a structured, LLM-enhanced format suitable for merging and ingestion.
Key script:
- src/main.rs: Reads raw CSV/JSON, enhances with LLM, outputs processed JSON.
How to use:
- Configure input/output paths and API keys as needed, then run with cargo run --release.

Data Merger & Cleaner

Location: data_collection_and_processing/data_merger_and_cleaner/
Language: Rust (and optionally Python)
Purpose: Merges multiple processed datasets, deduplicates entries, and performs final cleaning and validation.
Key script:
- src/main.rs: Main merging/cleaning logic.
How to use:
- Adjust input/output paths as needed, then run with cargo run --release.

Database URL Enricher

Location: database_url_enricher/
Language: Rust
Purpose: Enriches course/resource data with URLs and stores them in SurrealDB for use by the main app.
How to use:
- Build and run as described above in Database Enrichment.

Usage

Open the app in your browser (default: http://localhost:8080)
Sign up, log in, and start creating your learning roadmap!
Use the dashboard to track your progress and explore recommended resources.

Project Structure

skillforge/
├── app/                        # Dioxus web app (Rust)
├── database_url_enricher/      # Rust enrichment tool
├── data_collection_and_processing/
│   ├── yt_data_collector/      # Python YouTube scraper + LLM
│   ├── data_processor/         # Rust/Python data processing
│   └── data_merger_and_cleaner/ # Rust/Python data merging/cleaning
├── dataset_test.json           # Example/test dataset
├── final_data.json             # Final merged dataset
└── README.md                   # This file

Troubleshooting

Build errors: Ensure all dependencies are installed and correct Rust toolchain is active.
WASM issues: Make sure wasm-pack and dioxus-cli are installed and up to date.
Python errors: Check Python version and required packages.
Database issues: Ensure SurrealDB is running if using server features.
API key issues: Double-check your YouTube and OpenRouter API keys for data/LLM scripts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SkillForge

Table of Contents

Features

Architecture Overview

Setup & Installation

1. Prerequisites

2. Data Collection & Processing

a. YouTube Data Collection

b. Data Processing

c. Data Merging & Cleaning

3. Database Enrichment

4. Running the Web App

Helper Modules & Pipelines

YouTube Data Collector

Data Processor

Data Merger & Cleaner

Database URL Enricher

Usage

Project Structure

Troubleshooting

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
app		app
data_collection_and_processing		data_collection_and_processing
database_url_enricher		database_url_enricher
README.md		README.md
dataset_test.json		dataset_test.json
final_data.json		final_data.json

SeerBlazeJ/skillforge-ai

Folders and files

Latest commit

History

Repository files navigation

SkillForge

Table of Contents

Features

Architecture Overview

Setup & Installation

1. Prerequisites

2. Data Collection & Processing

a. YouTube Data Collection

b. Data Processing

c. Data Merging & Cleaning

3. Database Enrichment

4. Running the Web App

Helper Modules & Pipelines

YouTube Data Collector

Data Processor

Data Merger & Cleaner

Database URL Enricher

Usage

Project Structure

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages