
YouTube Import: Background Worker — Video Processing + Channel Polling #128

@davidortinau

Overview

Implement two BackgroundServices in the Workers project: one to process queued video imports, one to poll monitored channels for new videos.

Crew: Wash (infrastructure/workers)
Depends on: #126 (data model), #127 (AI pipeline)

Worker 1: VideoImportWorker

Processes VideoImport records with Status = Queued.

Pipeline per job:

1. Set status → FetchingTranscript
2. Resolve video metadata (title, language)
3. Fetch available transcripts → pick target language
4. Download transcript text
5. Set status → PolishingTranscript
6. AI cleanup via TranscriptFormattingService.PolishWithAiAsync()
7. Set status → ExtractingVocabulary
8. Extract vocab via VocabularyExtractionService
9. Save LearningResource + VocabularyWords + mappings
10. Set VideoImport.LearningResourceId, status → Complete
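
The pipeline above is a fixed sequence of status transitions. Here is a minimal sketch. It assumes an enum shape for the statuses, a simplified `VideoImport` class, and hypothetical `persist`/`saveLearningResource` delegates, all standing in for the #126 entity and the real services. Each stage persists its status before its work starts, and the job ends at `Complete` with the saved resource id:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Status names come from the issue; declaring them as this enum is an assumption.
public enum VideoImportStatus
{
    Queued, FetchingTranscript, PolishingTranscript, ExtractingVocabulary, Complete, Failed
}

// Simplified stand-in for the VideoImport entity from #126 (hypothetical shape).
public sealed class VideoImport
{
    public VideoImportStatus Status { get; set; } = VideoImportStatus.Queued;
    public int? LearningResourceId { get; set; }
}

public static class ImportPipeline
{
    // Runs the stages in order. Each stage's status is persisted before its work starts,
    // so the UI can show what the worker is currently doing.
    public static async Task RunAsync(
        VideoImport job,
        IReadOnlyList<(VideoImportStatus Status, Func<CancellationToken, Task> Work)> stages,
        Func<CancellationToken, Task<int>> saveLearningResource, // hypothetical: step 9
        Func<VideoImport, CancellationToken, Task> persist,       // hypothetical: e.g. SaveChangesAsync
        CancellationToken ct = default)
    {
        foreach (var (status, work) in stages)
        {
            job.Status = status;
            await persist(job, ct);
            await work(ct);
        }

        // Step 10: link the saved resource and mark the import complete.
        job.LearningResourceId = await saveLearningResource(ct);
        job.Status = VideoImportStatus.Complete;
        await persist(job, ct);
    }
}
```

Failures are intentionally not handled here; they follow the error-handling rules below.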

Error handling:

  • Wrap each step in try/catch
  • On failure: Status = Failed, ErrorMessage = ex.Message
  • Do NOT retry automatically — user triggers retry from UI
  • Log structured errors for Aspire dashboard
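
Here is a minimal sketch of the per-step error rule. It assumes a hypothetical `ImportJob` stand-in and a `logError` callback in place of the worker's `ILogger`. A failing step sets `Failed` and `ErrorMessage` and returns `false`, with no retry. Cancellation caused by host shutdown is rethrown rather than recorded as a failure; that shutdown behavior is an assumption the issue does not specify.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins; the real status enum and entity come from #126.
public enum ImportState { Queued, FetchingTranscript, PolishingTranscript, ExtractingVocabulary, Complete, Failed }

public sealed class ImportJob
{
    public ImportState Status { get; set; } = ImportState.Queued;
    public string? ErrorMessage { get; set; }
}

public static class ImportErrorHandling
{
    // Sets the step's status, runs it, and on failure records Failed + ErrorMessage.
    // Returns false on failure. There is no retry: the job stays Failed until the user retries from the UI.
    public static async Task<bool> TryStepAsync(
        ImportJob job,
        ImportState status,
        Func<CancellationToken, Task> step,
        Action<Exception, ImportState> logError, // in the worker: ILogger.LogError with structured fields
        CancellationToken ct)
    {
        job.Status = status;
        try
        {
            await step(ct);
            return true;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // host shutdown, not an import failure
        }
        catch (Exception ex)
        {
            job.Status = ImportState.Failed;
            job.ErrorMessage = ex.Message;
            logError(ex, status);
            return false;
        }
    }
}
```

A job runner would chain the steps as `if (!await TryStepAsync(...)) return;`, so processing stops at the first failed step.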

Processing rules:

  • Poll for queued imports every 10 seconds
  • Process one import at a time (sequential, not parallel)
  • 500ms delay between YouTube API calls (rate limiting)
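
The processing rules can be sketched as two small pieces. The first is a sequential loop that handles one queued import per iteration and idles when the queue is empty. The second is a gate that spaces YouTube calls at least 500 ms apart. Both type names (`ImportLoop`, `MinIntervalGate`) and the `processNextQueued` delegate are hypothetical; only the 10 s and 500 ms values come from the rules above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ImportLoop
{
    // processNextQueued (hypothetical) handles at most one queued import and returns
    // false when nothing was queued. Awaiting it keeps processing strictly sequential.
    public static async Task RunSequentialAsync(
        Func<CancellationToken, Task<bool>> processNextQueued, TimeSpan idleDelay, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var didWork = await processNextQueued(ct);
            if (!didWork)
                await Task.Delay(idleDelay, ct); // queue empty: wait before polling again (10 s in the worker)
        }
    }
}

// Enforces a minimum gap between the starts of consecutive calls (500 ms for YouTube).
public sealed class MinIntervalGate
{
    private readonly TimeSpan _minInterval;
    private readonly SemaphoreSlim _lock = new(1, 1);
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private TimeSpan? _lastCallStart;

    public MinIntervalGate(TimeSpan minInterval) => _minInterval = minInterval;

    public async Task<T> RunAsync<T>(Func<CancellationToken, Task<T>> call, CancellationToken ct = default)
    {
        await _lock.WaitAsync(ct); // one call at a time, even if the gate is shared
        try
        {
            if (_lastCallStart is TimeSpan last)
            {
                var wait = _minInterval - (_clock.Elapsed - last);
                if (wait > TimeSpan.Zero)
                    await Task.Delay(wait, ct);
            }
            _lastCallStart = _clock.Elapsed;
            return await call(ct);
        }
        finally
        {
            _lock.Release();
        }
    }
}
```

This loop drains the queue back to back and only waits the 10 seconds when nothing is queued. If "every 10 seconds" is meant literally, move the delay out of the `if`. Sharing one gate instance between both workers (for example, registered as a singleton) would also make `ChannelPollingWorker` count against the same rate budget.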

Worker 2: ChannelPollingWorker

Checks monitored channels for new videos and queues imports.

Logic:

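// Assumes injected: IServiceScopeFactory scopeFactory, YoutubeClient youtubeClient, ILogger logger.
// AppDbContext is a placeholder name for the DbContext in SentenceStudio.Shared.
while (!stoppingToken.IsCancellationRequested)
{
    // BackgroundService is a singleton and DbContext is scoped: use a fresh scope per pass.
    using (var scope = scopeFactory.CreateScope())
    {
        var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
        var now = DateTime.UtcNow;

        // Materialize first so no DataReader stays open while the loop issues further queries
        // and SaveChangesAsync on the same context. The interval check runs in memory.
        var dueChannels = (await db.MonitoredChannels
                .Where(c => c.IsEnabled)
                .ToListAsync(stoppingToken))
            .Where(c => c.LastPolledAt == null ||
                        c.LastPolledAt < now.AddHours(-c.PollIntervalHours))
            .ToList();

        foreach (var channel in dueChannels)
        {
            try
            {
                // CollectAsync (YoutubeExplode.Common) buffers at most 10 items from the uploads feed.
                var uploads = await youtubeClient.Channels
                    .GetUploadsAsync(channel.ChannelId, stoppingToken)
                    .CollectAsync(10);

                foreach (var video in uploads)
                {
                    var videoId = video.Id.Value; // VideoId struct -> string column
                    if (!await db.VideoImports.AnyAsync(v => v.VideoId == videoId, stoppingToken))
                    {
                        db.VideoImports.Add(new VideoImport
                        {
                            VideoUrl = video.Url,
                            VideoId = videoId,
                            Title = video.Title,
                            MonitoredChannelId = channel.Id,
                            UserProfileId = channel.UserProfileId,
                            Status = VideoImportStatus.Queued
                        });
                    }
                }

                channel.LastPolledAt = DateTime.UtcNow;
                await db.SaveChangesAsync(stoppingToken);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Log and move on: one failing channel should not stop the worker.
                logger.LogError(ex, "Polling channel {ChannelId} failed", channel.ChannelId);
            }

            await Task.Delay(TimeSpan.FromMilliseconds(500), stoppingToken); // rate limit
        }
    }

    await Task.Delay(TimeSpan.FromMinutes(15), stoppingToken);
}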
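
The due-channel filter and the duplicate check in the loop above are pure logic that can be unit-tested without EF Core or YouTube. Here is a minimal sketch; the `ChannelPollingRules` class and its parameters are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChannelPollingRules
{
    // Due if enabled and either never polled or the poll interval has fully elapsed.
    public static bool IsDue(bool isEnabled, DateTime? lastPolledAtUtc, double pollIntervalHours, DateTime nowUtc) =>
        isEnabled &&
        (lastPolledAtUtc is null || lastPolledAtUtc.Value < nowUtc.AddHours(-pollIntervalHours));

    // Looks only at the newest `take` uploads (as the loop does) and keeps those not yet imported.
    public static IReadOnlyList<string> SelectNewVideoIds(
        IEnumerable<string> latestUploadIds, ISet<string> alreadyImported, int take = 10) =>
        latestUploadIds.Take(take).Where(id => !alreadyImported.Contains(id)).ToList();
}
```

In the worker, `alreadyImported` could be loaded with a single query such as `db.VideoImports.Where(v => ids.Contains(v.VideoId)).Select(v => v.VideoId)` instead of one `AnyAsync` per video. EF Core translates `Contains` on a local collection into an SQL `IN` clause.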

Tasks

  • Add YoutubeExplode package reference to Workers project
  • Add SentenceStudio.Shared project reference to Workers (for DbContext, services)
  • Create VideoImportWorker : BackgroundService
  • Create ChannelPollingWorker : BackgroundService
  • Register both in Program.cs
  • Add required service registrations (YouTubeImportService, TranscriptFormattingService, VocabularyExtractionService, DbContext)
  • Add structured logging at each pipeline step
  • Test with a real YouTube video URL end-to-end

Architecture Reference

See .squad/decisions/inbox/zoe-youtube-import-architecture.md — Sections 4-5

Metadata

Assignees: No one assigned
Labels: go:needs-research (Needs investigation), squad (Squad triage inbox: Lead will assign to a member), squad:wash (Assigned to Wash (Backend Dev))
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
