YouTube Import: Background Worker — Video Processing + Channel Polling #128
Closed
Labels
- go:needs-research: Needs investigation
- squad: Squad triage inbox, Lead will assign to a member
- squad:wash: Assigned to Wash (Backend Dev)
Description
Overview
Implement two BackgroundServices in the Workers project: one to process queued video imports, one to poll monitored channels for new videos.
Crew: Wash (infrastructure/workers)
Depends on: #126 (data model), #127 (AI pipeline)
Worker 1: VideoImportWorker
Processes VideoImport records with Status = Queued.
Pipeline per job:
1. Set status → FetchingTranscript
2. Resolve video metadata (title, language)
3. Fetch available transcripts → pick target language
4. Download transcript text
5. Set status → PolishingTranscript
6. AI cleanup via TranscriptFormattingService.PolishWithAiAsync()
7. Set status → ExtractingVocabulary
8. Extract vocab via VocabularyExtractionService
9. Save LearningResource + VocabularyWords + mappings
10. Set VideoImport.LearningResourceId, status → Complete
Error handling:
- Wrap each step in try/catch
- On failure: Status = Failed, ErrorMessage = ex.Message
- Do NOT retry automatically; the user triggers a retry from the UI
- Log structured errors for Aspire dashboard
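The pipeline and failure rules above can be sketched with nothing but the base class library. This is a minimal illustration, not the project's implementation: the trimmed-down `VideoImport` class, the `ImportPipeline` helper, and the four delegates (`fetchTranscript`, `polish`, `extractAndSave`, `persist`) are hypothetical stand-ins for the real entity, services, and DbContext. For brevity it uses a single try/catch around all steps rather than one per step, and it leaves logging out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum VideoImportStatus
{
    Queued, FetchingTranscript, PolishingTranscript, ExtractingVocabulary, Complete, Failed
}

// Hypothetical, trimmed-down stand-in for the real VideoImport entity.
public sealed class VideoImport
{
    public VideoImportStatus Status { get; set; } = VideoImportStatus.Queued;
    public string? ErrorMessage { get; set; }
    public int? LearningResourceId { get; set; }
}

public static class ImportPipeline
{
    // Each delegate is a placeholder for a real service call (assumption).
    public static async Task RunAsync(
        VideoImport job,
        Func<CancellationToken, Task<string>> fetchTranscript,
        Func<string, CancellationToken, Task<string>> polish,
        Func<string, CancellationToken, Task<int>> extractAndSave,
        Func<VideoImport, CancellationToken, Task> persist,
        CancellationToken ct)
    {
        try
        {
            // Persist each status change so the UI can show progress.
            job.Status = VideoImportStatus.FetchingTranscript;
            await persist(job, ct);
            var raw = await fetchTranscript(ct);

            job.Status = VideoImportStatus.PolishingTranscript;
            await persist(job, ct);
            var polished = await polish(raw, ct);

            job.Status = VideoImportStatus.ExtractingVocabulary;
            await persist(job, ct);
            job.LearningResourceId = await extractAndSave(polished, ct);

            job.Status = VideoImportStatus.Complete;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Host shutdown is not a job failure; let it propagate.
            throw;
        }
        catch (Exception ex)
        {
            // No automatic retry: record the failure and move on.
            job.Status = VideoImportStatus.Failed;
            job.ErrorMessage = ex.Message;
        }

        // Save the terminal state (Complete or Failed).
        await persist(job, ct);
    }
}
```

A job that throws in any step ends with Status = Failed and the exception message stored, and stays that way until something sets it back to Queued. In this sketch a job interrupted by shutdown keeps its intermediate status, so a real worker would need to decide how to recover such rows on startup.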
Processing rules:
- Poll for queued imports every 10 seconds
- Process one import at a time (sequential, not parallel)
- 500ms delay between YouTube API calls (rate limiting)
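One way to express these processing rules is sketched below, again using only the base class library. `MinIntervalThrottle` and `QueueLoop` are hypothetical helper names, not part of the project. The sketch assumes "poll every 10 seconds" means the worker sleeps only when the queue is empty and otherwise moves straight to the next job; a fixed 10-second tick between jobs would be an equally valid reading.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Enforces a minimum gap between consecutive calls (e.g. 500 ms between
// YouTube requests). Not thread-safe; the worker is sequential by design.
public sealed class MinIntervalThrottle
{
    private readonly TimeSpan _minInterval;
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private TimeSpan _nextAllowed = TimeSpan.Zero;

    public MinIntervalThrottle(TimeSpan minInterval) => _minInterval = minInterval;

    public async Task WaitAsync(CancellationToken ct)
    {
        var wait = _nextAllowed - _clock.Elapsed;
        if (wait > TimeSpan.Zero)
        {
            await Task.Delay(wait, ct);
        }
        // The next call may start no earlier than one interval from now.
        _nextAllowed = _clock.Elapsed + _minInterval;
    }
}

public static class QueueLoop
{
    // tryProcessNext is a placeholder: it should pick up one queued import,
    // process it, and return false when nothing was queued.
    public static async Task RunAsync(
        Func<CancellationToken, Task<bool>> tryProcessNext,
        TimeSpan pollInterval,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // One job at a time: the next job starts only after this await.
            bool processed = await tryProcessNext(ct);
            if (!processed)
            {
                // Queue empty: wait before polling again.
                await Task.Delay(pollInterval, ct);
            }
        }
    }
}
```

Inside a worker, each YouTube call would be preceded by `await throttle.WaitAsync(stoppingToken)`, and the loop would be started with `TimeSpan.FromSeconds(10)` as the poll interval. `Task.Delay` throws `OperationCanceledException` when the token is cancelled, which is the usual way for a BackgroundService loop to exit on shutdown.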
Worker 2: ChannelPollingWorker
Checks monitored channels for new videos and queues imports.
Logic:
    while (!stoppingToken.IsCancellationRequested)
    {
        // Load the due channels into a list up front so the query is not
        // still being enumerated while SaveChangesAsync runs on the same context.
        var dueChannels = await db.MonitoredChannels
            .Where(c => c.IsEnabled)
            .Where(c => c.LastPolledAt == null ||
                        c.LastPolledAt < DateTime.UtcNow.AddHours(-c.PollIntervalHours))
            .ToListAsync(stoppingToken);

        foreach (var channel in dueChannels)
        {
            // Take(10) on an async stream needs async LINQ operators
            // (for example the System.Linq.Async package).
            var uploads = youtubeClient.Channels.GetUploadsAsync(channel.ChannelId);
            await foreach (var video in uploads.Take(10))
            {
                // Skip videos that already have an import record.
                if (!await db.VideoImports.AnyAsync(v => v.VideoId == video.Id, stoppingToken))
                {
                    db.VideoImports.Add(new VideoImport
                    {
                        VideoUrl = video.Url,
                        VideoId = video.Id,
                        Title = video.Title,
                        MonitoredChannelId = channel.Id,
                        UserProfileId = channel.UserProfileId,
                        Status = VideoImportStatus.Queued
                    });
                }
            }

            channel.LastPolledAt = DateTime.UtcNow;
            await db.SaveChangesAsync(stoppingToken);
        }

        await Task.Delay(TimeSpan.FromMinutes(15), stoppingToken);
    }
Tasks
- Add YoutubeExplode package reference to Workers project
- Add SentenceStudio.Shared project reference to Workers (for DbContext, services)
- Create VideoImportWorker : BackgroundService
- Create ChannelPollingWorker : BackgroundService
- Register both in Program.cs
- Add required service registrations (YouTubeImportService, TranscriptFormattingService, VocabularyExtractionService, DbContext)
- Add structured logging at each pipeline step
- Test with a real YouTube video URL end-to-end
Architecture Reference
See .squad/decisions/inbox/zoe-youtube-import-architecture.md — Sections 4-5