- Visit: https://console.cloud.google.com/
- Sign in with your Google account
- Click the project dropdown at the top of the page
- Click "New Project"
- Name it: `YouTube Semantic Search`
- Click "Create"
- In the left sidebar, click "APIs & Services" → "Library"
- Search for: `YouTube Data API v3`
- Click on it and press "Enable"
- Go to "APIs & Services" → "Credentials"
- Click "Create Credentials" → "API Key"
- Copy your new API key (it looks like: `AIzaSyC...`)

```bash
# Edit the setup file with your actual key
nano setup_api.sh

# Replace YOUR_API_KEY_HERE with your actual key
export YOUTUBE_API_KEY='AIzaSyC...your_actual_key_here...'

# Save and exit (Ctrl+X, Y, Enter)

# Load the API key
source setup_api.sh

# Verify it's set
echo $YOUTUBE_API_KEY
```

```bash
# Collect real YouTube data
python collect_real_data.py
```

- Free Tier: 10,000 units per day
- Each search: ~100 units
- Each video detail: ~1-2 units
- Each comment fetch: ~1 unit
- Each transcript: ~1 unit
- Search API: 300 requests per minute
- Video API: 300 requests per minute
- Comments API: 300 requests per minute
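
To stay under a per-minute limit like the ones above, requests can be spaced out on the client side. The following is a minimal sketch, not the collection script's actual logic: the `Throttle` class and its parameters are hypothetical names introduced for illustration, and the 300 requests per minute default is taken from the list above.

```python
import time


class Throttle:
    """Spaces out calls so at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute=300, clock=time.monotonic, sleep=time.sleep):
        # 300 requests per minute means one request every 0.2 seconds.
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `throttle.wait()` immediately before each API request keeps the request rate at or below the configured limit; the `clock` and `sleep` arguments exist only so the behavior can be tested without real waiting.
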
- 50 queries × 20 videos each = 1,000 videos
- Total API units: ~9,000 (50 searches × ~100 units, plus up to ~4 units per video for details, comments, and transcript)
- Time needed: 15-30 minutes
- Cost: Free (within daily quota)
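
The quota arithmetic for a run like this can be checked with a few lines of Python. This is a rough sketch using the per-operation costs listed above (taking the upper bound where a range is given); it assumes one search request per query and one detail, comment, and transcript fetch per video, which may not match what the script actually does.

```python
# Per-operation costs from the list above (upper bounds where a range is given).
SEARCH_COST = 100
VIDEO_DETAIL_COST = 2
COMMENT_FETCH_COST = 1
TRANSCRIPT_COST = 1
DAILY_QUOTA = 10_000


def estimate_units(num_queries, videos_per_query):
    """Estimate quota units for a collection run (assumes one search call per query)."""
    videos = num_queries * videos_per_query
    per_video = VIDEO_DETAIL_COST + COMMENT_FETCH_COST + TRANSCRIPT_COST
    return num_queries * SEARCH_COST + videos * per_video


units = estimate_units(num_queries=50, videos_per_query=20)
print(f"Estimated units: {units} of {DAILY_QUOTA}")
print("Fits in one day" if units <= DAILY_QUOTA else "Needs more than one day")
```

Changing `num_queries` or `videos_per_query` shows quickly whether a larger run would need to be split across several days.
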
- ✅ Title and description
- ✅ Channel information
- ✅ View count, likes, comments
- ✅ Auto-generated transcripts
- ✅ Top relevant comments
- ✅ Video metadata (duration, category)
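
One way to picture the per-video data listed above is as a single record. The sketch below is illustrative only: the `VideoRecord` class and every field name in it are hypothetical, chosen to mirror the list, and may differ from what `collect_real_data.py` actually writes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoRecord:
    # All field names are hypothetical; they mirror the items in the list above.
    video_id: str
    title: str
    description: str
    channel_title: str
    view_count: int = 0
    like_count: int = 0
    comment_count: int = 0
    transcript: str = ""
    top_comments: List[str] = field(default_factory=list)
    duration: str = ""
    category: str = ""

    def search_text(self) -> str:
        """Join the non-empty text fields that a semantic search model might embed."""
        parts = [self.title, self.description, self.transcript, *self.top_comments]
        return "\n".join(p for p in parts if p)
```
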
- ✅ Positive pairs (query → relevant video)
- ✅ Negative pairs (query → irrelevant video)
- ✅ Relevance scores for evaluation
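
The positive and negative pairs above could be built roughly as follows. This is a sketch under stated assumptions, not the script's confirmed method: it treats every video returned for a query as a positive and samples negatives from videos returned only for other queries. The function name `build_pairs`, the input shape, and the default of two negatives per positive are all hypothetical.

```python
import random


def build_pairs(results_by_query, negatives_per_positive=2, seed=0):
    """Return (query, video_id, label) triples: 1.0 for positive, 0.0 for negative.

    `results_by_query` maps each query string to the list of video IDs returned for it.
    """
    rng = random.Random(seed)
    all_ids = sorted({vid for vids in results_by_query.values() for vid in vids})
    pairs = []
    for query, video_ids in results_by_query.items():
        returned = set(video_ids)
        # Negatives are videos that were never returned for this query.
        candidates = [vid for vid in all_ids if vid not in returned]
        for vid in video_ids:
            pairs.append((query, vid, 1.0))
            k = min(negatives_per_positive, len(candidates))
            for neg in rng.sample(candidates, k):
                pairs.append((query, neg, 0.0))
    return pairs
```

Random negatives from other queries are easy to produce but can be too easy for a model to separate; harder negatives (for example, videos from a related query) are a common refinement.
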
- Check if the key is copied correctly
- Ensure the key is enabled for YouTube Data API v3
- Verify the correct project is selected in the Google Cloud Console
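
Copy-paste mistakes in the key can be caught locally before any request is made. The check below is a heuristic sketch: the function `check_api_key` is a hypothetical helper, and the `AIza` prefix test is an assumption based on the example key shown earlier, not a guarantee about every valid key.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"


def check_api_key(key):
    """Return a list of problems; an empty list means the key looks plausible."""
    if not key:
        return ["YOUTUBE_API_KEY is not set; run `source setup_api.sh` in this shell"]
    problems = []
    if key != key.strip() or any(ch in key for ch in "'\" "):
        problems.append("key contains whitespace or quote characters")
    if PLACEHOLDER in key or "..." in key:
        problems.append("key still contains placeholder text")
    if not key.startswith("AIza"):
        # Assumption: keys look like the example above (AIzaSyC...).
        problems.append("key does not start with 'AIza' like the example above")
    return problems


if __name__ == "__main__":
    for problem in check_api_key(os.environ.get("YOUTUBE_API_KEY", "")):
        print("Problem:", problem)
```

A key that passes this check can still be rejected by the API, for example if YouTube Data API v3 is not enabled for the project.
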
- Wait until tomorrow (quota resets daily)
- Reduce the number of search queries
- Use a paid Google Cloud account
- The script automatically handles rate limiting by adding delays between requests
- If rate-limit errors persist, increase the delays in the code
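
The source does not show how the script implements its delays. One common pattern for handling rate-limit errors is retrying with exponential backoff, sketched below. `RateLimitError` and `call_with_backoff` are hypothetical names; in real code the `except` clause would catch whatever error the API client actually raises when a request is rate limited.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error seen when a request is rate limited."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(); after each rate-limit failure wait base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)
```

Raising `base_delay` is the equivalent of "increasing the delays" here: with the defaults, the waits are 1, 2, 4, and 8 seconds before the final attempt.
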
After successful collection, you'll have:
- ~1,000 videos with rich metadata
- ~3,000 training examples (positive + negative pairs)
- Diverse content across 8 major categories
- Real-world data for robust model training
- ✅ Set up API key
- ✅ Collect real data
- 🚀 Train model on real data
- 🚀 Test semantic search
- 🚀 Deploy and use!
Need help? Check the main README.md or create an issue in the repository.