67 | 67 | <div class="flex items-center gap-6"> |
68 | 68 | <a href="#features" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Features</a> |
69 | 69 | <a href="#platforms" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Platforms</a> |
70 | | - <a href="#usage" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Usage</a> |
| 70 | + <a href="#architecture" class="text-slate-400 hover:text-white transition-colors hidden sm:block">How It Works</a> |
71 | 71 | <a href="#install" class="text-slate-400 hover:text-white transition-colors hidden sm:block">Install</a> |
72 | 72 | <a href="https://github.com/sebastian-software/offcourse" class="flex items-center gap-2 text-slate-400 hover:text-white transition-colors"> |
73 | 73 | <img src="icon-github.svg" alt="GitHub" class="w-5 h-5 opacity-70"> |
@@ -474,6 +474,118 @@ <h4 class="font-semibold mb-4 flex items-center gap-2"> |
474 | 474 | </div> |
475 | 475 | </section> |
476 | 476 |
| 477 | + <!-- Under the Hood Section --> |
| 478 | + <section id="architecture" class="py-24 px-6 bg-gradient-to-b from-slate-900/50 to-slate-950"> |
| 479 | + <div class="max-w-4xl mx-auto"> |
| 480 | + <h2 class="text-3xl sm:text-4xl font-bold text-center mb-4">Under the Hood</h2> |
| 481 | + <p class="text-slate-400 text-center mb-6 max-w-2xl mx-auto">Scraping modern learning platforms isn't trivial. Here's how Offcourse solves real-world challenges.</p> |
| 482 | + <p class="text-slate-500 text-center mb-16 max-w-2xl mx-auto text-sm"> |
| 483 | + Each platform has its own authentication, content locking, and video delivery quirks. |
| 484 | + We handle them all so you don't have to. |
| 485 | + </p> |
| 486 | + |
| 487 | + <div class="space-y-8"> |
| 488 | + <!-- Challenge 1: Encrypted HLS --> |
| 489 | + <div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30"> |
| 490 | + <div class="flex items-start gap-4"> |
| 491 | + <div class="w-10 h-10 bg-brand-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1"> |
| 492 | + <img src="icon-lock.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(80%) saturate(500%) hue-rotate(330deg);"> |
| 493 | + </div> |
| 494 | + <div> |
| 495 | + <h3 class="text-lg font-semibold mb-2">Encrypted HLS Streams</h3> |
| 496 | + <p class="text-slate-400 text-sm mb-3">Some platforms serve encrypted HLS playlists that are decrypted client-side. Standard downloaders fail because the playlist they fetch is still encrypted and unreadable.</p> |
| 497 | + <div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30"> |
| 498 | + <p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p> |
| 499 | + <p class="text-slate-300 text-sm">We intercept the actual <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">.ts</code> video segments as the browser plays them, capture their individual auth tokens, download each segment, then concatenate them with <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">ffmpeg</code>.</p> |
| 500 | + </div> |
| 501 | + </div> |
| 502 | + </div> |
| 503 | + </div> |
| 504 | + |
| 505 | + <!-- Challenge 2: GraphQL Persisted Queries --> |
| 506 | + <div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30"> |
| 507 | + <div class="flex items-start gap-4"> |
| 508 | + <div class="w-10 h-10 bg-purple-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1"> |
| 509 | + <img src="icon-terminal.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(50%) saturate(500%) hue-rotate(240deg);"> |
| 510 | + </div> |
| 511 | + <div> |
| 512 | + <h3 class="text-lg font-semibold mb-2">Persisted GraphQL Queries</h3> |
| 513 | + <p class="text-slate-400 text-sm mb-3">Modern SPAs often use GraphQL with "persisted queries": the server accepts only pre-registered query hashes and rejects arbitrary queries.</p> |
| 514 | + <div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30"> |
| 515 | + <p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p> |
| 516 | + <p class="text-slate-300 text-sm">Instead of reverse-engineering the GraphQL API, we drive a real browser and extract data from the rendered DOM. This does not depend on query hashes, so it survives API changes.</p> |
| 517 | + </div> |
| 518 | + </div> |
| 519 | + </div> |
| 520 | + </div> |
| 521 | + |
| 522 | + <!-- Challenge 3: Sequential Locking --> |
| 523 | + <div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30"> |
| 524 | + <div class="flex items-start gap-4"> |
| 525 | + <div class="w-10 h-10 bg-emerald-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1"> |
| 526 | + <img src="icon-check.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(50%) saturate(500%) hue-rotate(110deg);"> |
| 527 | + </div> |
| 528 | + <div> |
| 529 | + <h3 class="text-lg font-semibold mb-2">Sequential Content Locking</h3> |
| 530 | + <p class="text-slate-400 text-sm mb-3">Some courses lock lessons until previous ones are "completed". You might own the course but can't access lesson 10 until you've watched lessons 1-9.</p> |
| 531 | + <div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30"> |
| 532 | + <p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p> |
| 533 | + <p class="text-slate-300 text-sm">The <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">complete</code> command iteratively marks lessons as done, triggering the platform to unlock subsequent content. It repeats until no new lessons unlock.</p> |
| 534 | + </div> |
| 535 | + </div> |
| 536 | + </div> |
| 537 | + </div> |
| 538 | + |
| 539 | + <!-- Challenge 4: Session Management --> |
| 540 | + <div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30"> |
| 541 | + <div class="flex items-start gap-4"> |
| 542 | + <div class="w-10 h-10 bg-yellow-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1"> |
| 543 | + <img src="icon-zap.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(80%) saturate(500%) hue-rotate(10deg);"> |
| 544 | + </div> |
| 545 | + <div> |
| 546 | + <h3 class="text-lg font-semibold mb-2">Persistent Session Management</h3> |
| 547 | + <p class="text-slate-400 text-sm mb-3">Re-authenticating for every sync is tedious. Sessions need to persist, and the tool needs to detect when they have expired.</p> |
| 548 | + <div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30"> |
| 549 | + <p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p> |
| 550 | + <p class="text-slate-300 text-sm">We cache browser state (cookies, localStorage) per platform. On each run, we validate the session and prompt for login only when it is no longer valid. Interactive logins happen in a visible browser window you control.</p> |
| 551 | + </div> |
| 552 | + </div> |
| 553 | + </div> |
| 554 | + </div> |
| 555 | + |
| 556 | + <!-- Challenge 5: Token Expiry --> |
| 557 | + <div class="bg-slate-800/20 rounded-2xl p-6 border border-slate-700/30"> |
| 558 | + <div class="flex items-start gap-4"> |
| 559 | + <div class="w-10 h-10 bg-cyan-500/20 rounded-lg flex items-center justify-center flex-shrink-0 mt-1"> |
| 560 | + <img src="icon-video.svg" alt="" class="w-5 h-5" style="filter: invert(70%) sepia(50%) saturate(500%) hue-rotate(170deg);"> |
| 561 | + </div> |
| 562 | + <div> |
| 563 | + <h3 class="text-lg font-semibold mb-2">CDN Token Authentication</h3> |
| 564 | + <p class="text-slate-400 text-sm mb-3">Video CDNs like Bunny require signed URLs with short-lived tokens, API keys, and specific headers. A plain wget or curl request doesn't work.</p> |
| 565 | + <div class="bg-slate-900/50 rounded-lg p-4 border border-slate-700/30"> |
| 566 | + <p class="text-slate-500 text-xs mb-2 font-mono">Our approach:</p> |
| 567 | + <p class="text-slate-300 text-sm">We extract all auth artifacts (cookies, tokens, API keys) from the browser session and pass them to <code class="bg-slate-800 px-1.5 py-0.5 rounded text-xs">ffmpeg</code> and our download handlers via headers.</p> |
| 568 | + </div> |
| 569 | + </div> |
| 570 | + </div> |
| 571 | + </div> |
| 572 | + </div> |
| 573 | + |
| 574 | + <!-- Tech Stack --> |
| 575 | + <div class="mt-16 pt-12 border-t border-slate-700/30"> |
| 576 | + <h3 class="text-lg font-semibold text-center mb-8">Built With</h3> |
| 577 | + <div class="flex flex-wrap justify-center gap-3"> |
| 578 | + <span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">TypeScript</span> |
| 579 | + <span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">Playwright</span> |
| 580 | + <span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">Commander.js</span> |
| 581 | + <span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">Zod</span> |
| 582 | + <span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">ffmpeg</span> |
| 583 | + <span class="bg-slate-800/50 px-4 py-2 rounded-full text-slate-300 text-sm border border-slate-700/50">hls-parser</span> |
| 584 | + </div> |
| 585 | + </div> |
| 586 | + </div> |
| 587 | + </section> |
| 588 | + |
477 | 589 | <!-- CTA Section --> |
478 | 590 | <section class="py-24 px-6 bg-gradient-to-b from-slate-950 to-slate-900"> |
479 | 591 | <div class="max-w-4xl mx-auto text-center"> |
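The "Sequential Content Locking" card in this diff describes a `complete` command that marks lessons as done and loops until nothing new unlocks. A minimal sketch of that loop follows, assuming a hypothetical `CoursePlatform` interface; `Lesson`, `listLessons`, `markComplete`, `completeAll`, and `fakePlatform` are illustrative names, not taken from the Offcourse source.

```typescript
// Hypothetical shapes: the real Offcourse internals are not shown in this diff.
interface Lesson {
  id: string;
  locked: boolean;
  completed: boolean;
}

interface CoursePlatform {
  listLessons(): Promise<Lesson[]>;
  markComplete(lessonId: string): Promise<void>;
}

// Complete every currently unlocked lesson, re-read the lesson list, and
// repeat until a pass finds nothing left to complete (a fixed point).
// maxPasses guards against a platform that never settles.
async function completeAll(
  platform: CoursePlatform,
  maxPasses = 100,
): Promise<number> {
  let total = 0;
  for (let pass = 0; pass < maxPasses; pass++) {
    const lessons = await platform.listLessons();
    const pending = lessons.filter((l) => !l.locked && !l.completed);
    if (pending.length === 0) break;
    for (const lesson of pending) {
      await platform.markComplete(lesson.id);
      total++;
    }
  }
  return total;
}

// In-memory stand-in for a platform: lesson N unlocks once lesson N-1 is done.
function fakePlatform(count: number): CoursePlatform {
  const done = new Set<number>();
  return {
    async listLessons() {
      return Array.from({ length: count }, (_, i) => ({
        id: String(i),
        locked: i > 0 && !done.has(i - 1),
        completed: done.has(i),
      }));
    },
    async markComplete(id) {
      done.add(Number(id));
    },
  };
}
```

With the stand-in, `completeAll(fakePlatform(10))` needs one pass per lesson because each completion unlocks only the next one. A real platform may unlock several lessons at once, which the same loop handles without changes.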
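The "Encrypted HLS Streams" card says the downloaded `.ts` segments are concatenated with `ffmpeg`, but the diff does not show how. One common way is ffmpeg's concat demuxer, sketched below as an assumption rather than the project's actual invocation; `buildConcatList` and `buildFfmpegArgs` are hypothetical helper names.

```typescript
// Build the contents of an ffmpeg concat-demuxer list file from segment
// paths that are already in playback order. Single quotes inside a path
// are escaped as '\'' so the quoted "file" entry stays valid.
function buildConcatList(segmentPaths: string[]): string {
  return (
    segmentPaths
      .map((p) => `file '${p.replace(/'/g, "'\\''")}'`)
      .join("\n") + "\n"
  );
}

// Arguments for: ffmpeg -f concat -safe 0 -i <list> -c copy <output>
// "-c copy" joins the segments without re-encoding them.
function buildFfmpegArgs(listFile: string, output: string): string[] {
  return ["-f", "concat", "-safe", "0", "-i", listFile, "-c", "copy", output];
}
```

The list text would be written to a temporary file and the arguments passed to a spawned `ffmpeg` process. Keeping both steps as pure functions makes the segment ordering and quoting easy to check without running ffmpeg.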