Commit 792e710

docs(homepage): add technical deep dive section
- Explain encrypted HLS stream handling (segment capture & concat)
- Document persisted GraphQL query workarounds (DOM extraction)
- Cover sequential content locking and the complete command
- Describe session persistence and CDN token authentication
- Add tech stack badges (TypeScript, Playwright, ffmpeg, etc.)
- Update navigation to include 'How It Works' link
1 parent d0c0f0a commit 792e710

1 file changed

+113
-1
lines changed

docs/index.html

Lines changed: 113 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -67,7 +67,7 @@
67 67
<div class="flex items-center gap-6">
68 68
<a href="#features" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Features</a>
69 69
<a href="#platforms" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Platforms</a>
70-
<a href="#usage" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Usage</a>
70+
<a href="#architecture" class="text-slate-400 hover:text-white transition-colors hidden sm:block">How It Works</a>
71 71
<a href="#install" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Install</a>
72 72
<a href="https://github.com/sebastian-software/offcourse" class="flex items-center gap-2 text-slate-400 hover:text-white transition-colors">
73 73
<img src="icon-github.svg" alt="GitHub" class="w-5 h-5 opacity-70">
@@ -474,6 +474,118 @@ <h4 class="font-semibold mb-4 flex items-center gap-2">
474 474
</div>
475 475
</section>
476 476

477+
<!-- Under the Hood Section -->
478+
<section id="architecture" class="py-24 px-6 bg-gradient-to-b from-slate-900/50 to-slate-950">
479+
<div class="max-w-4xl mx-auto">
480+
<h2 class="text-3xl sm:text-4xl font-bold text-center mb-4">Under the Hood</h2>
481+
<p class="text-slate-400 text-center mb-6 max-w-2xl mx-auto">Scraping modern learning platforms isn't trivial. Here's how Offcourse solves real-world challenges.</p>
482+
<p class="text-slate-500 text-center mb-16 max-w-2xl mx-auto text-sm">
483+
Each platform has its own authentication, content locking, and video delivery quirks.
484+
We handle them all so you don't have to.
485+
</p>
486+
487+
<div class="space-y-8">
488+
<!-- Challenge 1: Encrypted HLS -->
489+
<div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30">
490+
<div class="flex items-start gap-4">
491+
<div class="w-10 h-10 bg-brand-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1">
492+
<img src="icon-lock.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(80%) saturate(500%) hue-rotate(330deg);">
493+
</div>
494+
<div>
495+
<h3 class="text-lg font-semibold mb-2">Encrypted HLS Streams</h3>
496+
<p class="text-slate-400 text-sm mb-3">Some platforms serve encrypted HLS playlists that the player decrypts client-side. Standard downloaders fail because the playlist they fetch is still encrypted and cannot be parsed.</p>
497+
<div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30">
498+
<p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p>
499+
<p class="text-slate-300 text-sm">We intercept the actual <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">.ts</code> video segments as the browser plays them, capture their individual auth tokens, download each segment, then concatenate them with <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">ffmpeg</code>.</p>
500+
</div>
501+
</div>
502+
</div>
503+
</div>
504+
505+
<!-- Challenge 2: GraphQL Persisted Queries -->
506+
<div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30">
507+
<div class="flex items-start gap-4">
508+
<div class="w-10 h-10 bg-purple-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1">
509+
<img src="icon-terminal.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(50%) saturate(500%) hue-rotate(240deg);">
510+
</div>
511+
<div>
512+
<h3 class="text-lg font-semibold mb-2">Persisted GraphQL Queries</h3>
513+
<p class="text-slate-400 text-sm mb-3">Modern SPAs often use GraphQL with "persisted queries": the server accepts only pre-registered query hashes and rejects arbitrary queries.</p>
514+
<div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30">
515+
<p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p>
516+
<p class="text-slate-300 text-sm">Instead of reverse-engineering GraphQL, we drive a real browser and extract data from the rendered DOM. This is more robust than replaying API calls and keeps working when query hashes or other API details change.</p>
517+
</div>
518+
</div>
519+
</div>
520+
</div>
521+
522+
<!-- Challenge 3: Sequential Locking -->
523+
<div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30">
524+
<div class="flex items-start gap-4">
525+
<div class="w-10 h-10 bg-emerald-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1">
526+
<img src="icon-check.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(50%) saturate(500%) hue-rotate(110deg);">
527+
</div>
528+
<div>
529+
<h3 class="text-lg font-semibold mb-2">Sequential Content Locking</h3>
530+
<p class="text-slate-400 text-sm mb-3">Some courses lock lessons until previous ones are "completed". You might own the course but can't access lesson 10 until you've watched lessons 1-9.</p>
531+
<div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30">
532+
<p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p>
533+
<p class="text-slate-300 text-sm">The <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">complete</code> command iteratively marks lessons as done, triggering the platform to unlock subsequent content. It loops until no new content appears.</p>
534+
</div>
535+
</div>
536+
</div>
537+
</div>
538+
539+
<!-- Challenge 4: Session Management -->
540+
<div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30">
541+
<div class="flex items-start gap-4">
542+
<div class="w-10 h-10 bg-yellow-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1">
543+
<img src="icon-zap.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(80%) saturate(500%) hue-rotate(10deg);">
544+
</div>
545+
<div>
546+
<h3 class="text-lg font-semibold mb-2">Persistent Session Management</h3>
547+
<p class="text-slate-400 text-sm mb-3">Re-authenticating for every sync is tedious. Sessions need to persist, but the tool must also detect when they have expired.</p>
548+
<div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30">
549+
<p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p>
550+
<p class="text-slate-300 text-sm">We cache browser state (cookies, localStorage) per-platform. On each run, we validate the session and only prompt for login when truly needed. Interactive logins happen in a visible browser window you control.</p>
551+
</div>
552+
</div>
553+
</div>
554+
</div>
555+
556+
<!-- Challenge 5: Token Expiry -->
557+
<div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30">
558+
<div class="flex items-start gap-4">
559+
<div class="w-10 h-10 bg-cyan-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1">
560+
<img src="icon-video.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(50%) saturate(500%) hue-rotate(170deg);">
561+
</div>
562+
<div>
563+
<h3 class="text-lg font-semibold mb-2">CDN Token Authentication</h3>
564+
<p class="text-slate-400 text-sm mb-3">Video CDNs like Bunny require signed URLs with short-lived tokens, API keys, and specific headers. A plain wget or curl request doesn't work.</p>
565+
<div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30">
566+
<p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p>
567+
<p class="text-slate-300 text-sm">We extract all auth artifacts (cookies, tokens, API keys) from the browser session and pass them to <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">ffmpeg</code> and our download handlers via headers.</p>
568+
</div>
569+
</div>
570+
</div>
571+
</div>
572+
</div>
573+
574+
<!-- Tech Stack -->
575+
<div class="mt-16 pt-12 border-t border-slate-700/30">
576+
<h3 class="text-lg font-semibold text-center mb-8">Built With</h3>
577+
<div class="flex flex-wrap justify-center gap-3">
578+
<span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">TypeScript</span>
579+
<span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">Playwright</span>
580+
<span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">Commander.js</span>
581+
<span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">Zod</span>
582+
<span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">ffmpeg</span>
583+
<span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">hls-parser</span>
584+
</div>
585+
</div>
586+
</div>
587+
</section>
588+
477 589
<!-- CTA Section -->
478 590
<section class="py-24 px-6 bg-gradient-to-b from-slate-950 to-slate-900">
479 591
<div class="max-w-4xl mx-auto text-center">
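
The "Encrypted HLS Streams" and "CDN Token Authentication" cards in this diff describe concatenating downloaded `.ts` segments with `ffmpeg` and passing auth artifacts to it via headers. The sketch below shows one way that could look. It assumes ffmpeg's concat demuxer and its `-headers` input option; the commit does not say which ffmpeg mode Offcourse actually uses. All function names here are hypothetical and chosen for illustration only.

```typescript
// Hypothetical helpers: illustrative only, not Offcourse's actual code.

/** Build the list file read by ffmpeg's concat demuxer: one `file '...'` line per segment. */
function buildConcatList(segmentPaths: string[]): string {
  return (
    segmentPaths
      // Inside single quotes, a literal quote is written as '\'' in the list file.
      .map((path) => `file '${path.replace(/'/g, "'\\''")}'`)
      .join("\n") + "\n"
  );
}

/** Arguments for a stream-copy concat of segments that are already on disk. */
function buildConcatArgs(listPath: string, outputPath: string): string[] {
  return ["-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", outputPath];
}

/** Format HTTP headers as CRLF-terminated lines, the form ffmpeg's `-headers` option takes. */
function formatFfmpegHeaders(headers: Record<string, string>): string {
  return Object.entries(headers)
    .map(([name, value]) => `${name}: ${value}\r\n`)
    .join("");
}

/** Arguments for letting ffmpeg fetch a protected URL itself with auth headers attached. */
function buildRemoteArgs(
  url: string,
  headers: Record<string, string>,
  outputPath: string
): string[] {
  // Input options such as -headers must appear before the -i they apply to.
  return ["-headers", formatFfmpegHeaders(headers), "-i", url, "-c", "copy", outputPath];
}
```

Either argument array could then be handed to `spawn("ffmpeg", args)` from `node:child_process`. Passing an array rather than a shell string avoids quoting problems with tokens and cookie values.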

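
The "Sequential Content Locking" card says the `complete` command marks lessons as done and loops until no new content appears. A minimal sketch of that fixed-point loop follows. The `LessonPlatform` interface and its method names are assumptions made for this example; the real command presumably drives a browser page rather than calling an interface like this.

```typescript
// Hypothetical platform adapter; names are illustrative, not Offcourse's API.
interface LessonPlatform {
  /** IDs of the lessons that are currently unlocked. */
  listUnlocked(): Promise<string[]>;
  /** Mark a single lesson as completed on the platform. */
  markComplete(lessonId: string): Promise<void>;
}

/**
 * Repeatedly complete every newly unlocked lesson until a round
 * reveals nothing new. Returns the IDs in the order they were completed.
 */
async function completeUntilStable(
  platform: LessonPlatform,
  maxRounds = 100 // safety cap so a misbehaving platform cannot loop forever
): Promise<string[]> {
  const completed = new Set<string>();
  for (let round = 0; round < maxRounds; round++) {
    const unlocked = await platform.listUnlocked();
    const fresh = unlocked.filter((id) => !completed.has(id));
    if (fresh.length === 0) break; // fixed point: no new content appeared
    for (const id of fresh) {
      await platform.markComplete(id);
      completed.add(id);
    }
  }
  return Array.from(completed);
}
```

The round cap is a design choice of this sketch rather than something the commit describes. It bounds the run if a platform keeps reporting new lesson IDs indefinitely.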