🔑 YouTube API Setup Guide

Step-by-Step Instructions

1. Go to Google Cloud Console (https://console.cloud.google.com/)

2. Create a New Project

  • Click the project dropdown at the top of the page
  • Click "New Project"
  • Name it: YouTube Semantic Search
  • Click "Create"

3. Enable YouTube Data API v3

  • In the left sidebar, click "APIs & Services" → "Library"
  • Search for: YouTube Data API v3
  • Click on it and press "Enable"

4. Create API Credentials

  • Go to "APIs & Services" → "Credentials"
  • Click "Create Credentials" → "API Key"
  • Copy your new API key (it looks like: AIzaSyC...)

5. Set Up Your API Key

# Edit the setup file with your actual key
nano setup_api.sh

# Inside setup_api.sh, replace YOUR_API_KEY_HERE with your actual key so the line reads:
export YOUTUBE_API_KEY='AIzaSyC...your_actual_key_here...'

# Save and exit (Ctrl+X, Y, Enter)

# Load the API key
source setup_api.sh

# Verify it's set
echo $YOUTUBE_API_KEY
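
The same check can be made from Python before a collection run starts. This is a minimal sketch, assuming the script reads the key from the `YOUTUBE_API_KEY` environment variable exported above. The helper name `load_api_key` is hypothetical and not taken from the repository.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"  # placeholder text mentioned for setup_api.sh


def load_api_key(env=None):
    """Return the YouTube API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("YOUTUBE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("YOUTUBE_API_KEY is not set; run `source setup_api.sh` first")
    if PLACEHOLDER in key:
        raise RuntimeError("YOUTUBE_API_KEY still contains the placeholder value")
    return key
```

Failing early with a descriptive message is usually easier to debug than an "API Key Invalid" response halfway through a run.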

6. Run Data Collection

# Collect real YouTube data
python collect_real_data.py
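
The contents of `collect_real_data.py` are not shown in this guide, so the following is only an illustration of the kind of request such a script might send, not the repository's implementation. It builds a YouTube Data API v3 `search.list` URL for one query. The function name `build_search_url` and the default of 20 results are assumptions chosen to match the figures in this guide.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=20):
    """Build a search.list request URL for up to `max_results` videos."""
    params = {
        "part": "snippet",      # return title, description, channel info
        "q": query,             # free-text search query
        "type": "video",        # skip channels and playlists
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client; the response is JSON with an `items` list of matching videos.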

⚠️ Important Notes

API Quotas

  • Free Tier: 10,000 units per day
  • Each search: ~100 units
  • Each video detail: ~1-2 units
  • Each comment fetch: ~1 unit
  • Each transcript: ~1 unit

Rate Limits

  • Search API: 300 requests per minute
  • Video API: 300 requests per minute
  • Comments API: 300 requests per minute
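
A limit of 300 requests per minute works out to one request every 0.2 seconds. One simple way to stay under it is to enforce a minimum gap between calls. This is a sketch under that assumption; the `Throttle` class is hypothetical, and the `clock` and `sleep` parameters exist only so the behavior can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap of 60 / per_minute seconds between calls."""

    def __init__(self, per_minute=300, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each API request keeps the request rate at or below the configured limit.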

Estimated Collection Time

  • 50 queries × 20 videos each = 1,000 videos
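  • Total API units: roughly 8,000-9,000 (50 searches × ~100 units, plus ~3-4 units per video for details, comments, and transcript)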
  • Time needed: 15-30 minutes
  • Cost: Free (within daily quota)
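
The quota budget for a run can be tallied from the per-call costs listed under API Quotas. This is a minimal sketch that assumes one search call per query (20 results fit in a single page) and one detail, comment, and transcript call per video. The function name and defaults are illustrative.

```python
def estimate_units(queries=50, videos_per_query=20,
                   search_cost=100, per_video_cost=4):
    """Estimate total quota units for one collection run.

    per_video_cost uses the upper end of the listed figures:
    ~2 (details) + ~1 (comments) + ~1 (transcript).
    """
    videos = queries * videos_per_query
    search_units = queries * search_cost     # one search call per query
    video_units = videos * per_video_cost    # detail, comment, transcript calls
    return {
        "videos": videos,
        "search": search_units,
        "per_video": video_units,
        "total": search_units + video_units,
    }
```

Comparing `estimate_units()["total"]` with the 10,000-unit daily quota before a run shows whether the collection fits in one day or needs to be split.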

🚀 What You'll Get

Data Collected Per Video:

  • ✅ Title and description
  • ✅ Channel information
  • ✅ View count, likes, comments
  • ✅ Auto-generated transcripts
  • ✅ Top relevant comments
  • ✅ Video metadata (duration, category)
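
As an illustration of how the metadata and statistics fields above might be stored, the sketch below flattens one item from a `videos.list` response (requested with `part=snippet,statistics,contentDetails`) into a record. The `VideoRecord` class and function names are hypothetical, and transcripts and comments are left out because they come from separate calls.

```python
import re
from dataclasses import dataclass

# Handles hours, minutes, and seconds only; other ISO 8601 forms are rejected.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


@dataclass
class VideoRecord:
    video_id: str
    title: str
    description: str
    channel: str
    category_id: str
    views: int
    likes: int
    comments: int
    duration_seconds: int


def parse_duration(iso):
    """Convert an ISO 8601 duration such as 'PT4M13S' to seconds."""
    match = _DURATION.match(iso)
    if not match:
        raise ValueError(f"unsupported duration: {iso!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def parse_video(item):
    """Flatten one videos.list item; counts arrive as strings and may be absent."""
    snippet = item["snippet"]
    stats = item.get("statistics", {})
    return VideoRecord(
        video_id=item["id"],
        title=snippet["title"],
        description=snippet.get("description", ""),
        channel=snippet.get("channelTitle", ""),
        category_id=snippet.get("categoryId", ""),
        views=int(stats.get("viewCount", 0)),
        likes=int(stats.get("likeCount", 0)),
        comments=int(stats.get("commentCount", 0)),
        duration_seconds=parse_duration(item["contentDetails"]["duration"]),
    )
```

Missing counts default to 0 here because some videos hide their like or comment statistics.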

Training Examples Created:

  • ✅ Positive pairs (query → relevant video)
  • ✅ Negative pairs (query → irrelevant video)
  • ✅ Relevance scores for evaluation
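
This guide does not say how the pairs or relevance scores are produced, so the sketch below shows one assumed scheme with binary labels: a video returned for a query is a positive, and a video returned only for other queries is a negative. The function name `build_pairs` and its parameters are hypothetical.

```python
import random


def build_pairs(results, negatives_per_positive=1, seed=0):
    """Build (query, video_id, label) triples from {query: [video_id, ...]}.

    Label 1 marks a video returned for the query; label 0 marks a video
    sampled from the results of other queries.
    """
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    pairs = []
    for query, video_ids in results.items():
        own = set(video_ids)
        # Candidate negatives: videos seen for other queries but not this one.
        others = sorted(
            {v for q, vs in results.items() if q != query for v in vs} - own
        )
        for vid in video_ids:
            pairs.append((query, vid, 1))
            k = min(negatives_per_positive, len(others))
            for neg in rng.sample(others, k):
                pairs.append((query, neg, 0))
    return pairs
```

Negatives sampled this way are only presumed irrelevant; a video found under another query may still match, so the labels are best treated as noisy.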

🔧 Troubleshooting

"API Key Invalid" Error

  • Check that the key was copied in full, with no extra spaces or quotes
  • Ensure YouTube Data API v3 is enabled for the project the key belongs to
  • Verify that the correct project is selected in Google Cloud Console

"Quota Exceeded" Error

  • Wait for the daily quota reset (midnight Pacific Time)
  • Reduce the number of search queries
  • Request a quota increase for the project (a billing account alone does not raise the YouTube Data API quota)
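
To tell a quota error apart from other failures, the error body can be inspected for its `reason` field. The sketch below assumes the usual Google API error shape, `{"error": {"errors": [{"reason": ...}]}}`, with `quotaExceeded` as the reason. Treat both the shape and the helper names as assumptions to verify against a real response.

```python
import json


def error_reasons(body):
    """Return the `reason` strings found in a JSON error body, if any."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return []  # not JSON at all
    if not isinstance(payload, dict):
        return []
    error = payload.get("error")
    if not isinstance(error, dict):
        return []
    errors = error.get("errors")
    if not isinstance(errors, list):
        return []
    return [e["reason"] for e in errors if isinstance(e, dict) and e.get("reason")]


def is_quota_exceeded(body):
    """True when the error body reports that the daily quota is used up."""
    return "quotaExceeded" in error_reasons(body)
```

A collection script could stop cleanly and save its progress when `is_quota_exceeded` returns true, instead of retrying requests that cannot succeed until the quota resets.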

"Rate Limit Exceeded" Error

  • The script automatically handles this with delays
  • If the error persists, increase the delays in the code
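
The script's own delay logic is not shown in this guide. For reference, a common way to handle rate-limit errors is to retry with exponentially growing delays, sketched below. `RateLimitError` is a hypothetical exception that the request function is assumed to raise on a rate-limit response, and `base_delay` is the value to raise if errors continue.

```python
import time


class RateLimitError(Exception):
    """Hypothetical exception raised by the request function on a rate-limit response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `request()`, doubling the delay after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
```

With the defaults, the waits between the five attempts are 1, 2, 4, and 8 seconds.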

📊 Expected Results

After successful collection, you'll have:

  • ~1,000 videos with rich metadata
  • ~3,000 training examples (positive + negative pairs)
  • Diverse content across 8 major categories
  • Real-world data for robust model training

🎯 Next Steps

  1. ✅ Set up API key
  2. ✅ Collect real data
  3. 🚀 Train model on real data
  4. 🚀 Test semantic search
  5. 🚀 Deploy and use!

Need help? Check the main README.md or create an issue in the repository.