[FEATURE] Add WebVTT regression test coverage #993
Open
Rahul-2k4 wants to merge 31 commits into CCExtractor:master from Rahul-2k4:feature/webvtt-regression-test
+309
−17
Commits
111e898 Add regression test for WebVTT output format (Rahul-2k4)
253944c Initial plan (Copilot)
c775987 Fix extension format for WebVTT and SRT test outputs (Copilot)
39db9ca fix(webvtt-test): Address code review feedback (Rahul-2k4)
dfd6055 fix(webvtt-test): Align test fixture with actual golden file content (Rahul-2k4)
f493901 chore(deps): bump mypy from 1.5.1 to 1.19.1 (dependabot[bot])
7f47053 fix: Remove remaining blocking wait_for_operation calls (cfsmp3)
cbaf79b feat: Add VM deletion tracking and verification system (cfsmp3)
f723c48 fix: Remove unsupported script_stop parameter from deployment workflow (cfsmp3)
0845d5c fix: Add -L flag to curl to follow HTTP->HTTPS redirects (cfsmp3)
8df33a1 Add Alembic migration for WebVTT regression test (Rahul-2k4)
7f5529a chore(deps): bump xmltodict from 0.13.0 to 1.0.2 (dependabot[bot])
1656414 chore(deps): bump coverage from 7.13.0 to 7.13.1 (dependabot[bot])
8c1e6f8 chore(deps): bump lxml from 5.3.0 to 6.0.2 (dependabot[bot])
e311d16 chore(deps): bump google-api-python-client from 2.154.0 to 2.187.0 (dependabot[bot])
edd10da chore(deps): bump gitpython from 3.1.45 to 3.1.46 (dependabot[bot])
e81e5ae Add real-time sample progress indicator during testing stage (NexionisJake)
7a9bcb9 fix: Replace misleading queue message with accurate VM provisioning info (cfsmp3)
fca9294 test: Update test_data_for_test to match simplified implementation (cfsmp3)
20ce086 fix: Show correct sample progress on initial page load (cfsmp3)
6b29e6d fix: handle log file permission errors gracefully (cfsmp3)
897ad69 fix: add log directory ownership check to pre_deploy.sh (cfsmp3)
775bec3 chore(deps): upgrade SQLAlchemy from 1.4.41 to 2.0.45 (cfsmp3)
2e39c33 fix: restore specific type annotations per review feedback (cfsmp3)
83b686c fix: suppress AsyncMock RuntimeWarnings in Python 3.13+ (cfsmp3)
af2e8c8 fix: use MockStatus object instead of dict in test_webhook_pr_closed (cfsmp3)
f91b205 chore: update gitignore and add CLAUDE.md (Rahul-2k4)
555acfc fix: Update tests to account for WebVTT regression test (id=3) (Rahul-2k4)
8221bed fix: Resolve CodeRabbit review issues (Rahul-2k4)
a24583c fix: Update test_add_test assertion to check ID=4 after WebVTT test (Rahul-2k4)
95d12c9 Merge upstream/master, keep CodeRabbit fixes (Rahul-2k4)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,130 @@ | ||
| # CLAUDE.md | ||
| | ||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
| | ||
| ## Project Overview | ||
| | ||
| CCExtractor Sample Platform - Flask web application for managing regression tests, sample uploads, and CI/CD for the CCExtractor project. Validates PRs by running CCExtractor against sample media files on GCP VMs (Linux/Windows). | ||
| | ||
| ## Tech Stack | ||
| | ||
| - **Backend**: Flask 3.1, SQLAlchemy 1.4, MySQL (SQLite for tests) | ||
| - **Cloud**: GCP Compute Engine (test VMs), Google Cloud Storage (samples) | ||
| - **CI/CD**: GitHub Actions, GitHub API (PyGithub) | ||
| - **Testing**: nose2, Flask-Testing, coverage | ||
| | ||
| ## Commands | ||
| | ||
| ```bash | ||
| # Setup | ||
| virtualenv venv && source venv/bin/activate | ||
| pip install -r requirements.txt | ||
| pip install -r test-requirements.txt | ||
| | ||
| # Run tests | ||
| TESTING=True nose2 | ||
| | ||
| # Linting & type checking | ||
| pycodestyle ./ --config=./.pycodestylerc | ||
| pydocstyle ./ | ||
| mypy . | ||
| isort . --check-only | ||
| | ||
| # Database migrations | ||
| export FLASK_APP=/path/to/run.py | ||
| flask db upgrade # Apply migrations | ||
| flask db migrate # Generate new migration | ||
| | ||
| # Update regression test results | ||
| python manage.py update /path/to/ccextractor | ||
| ``` | ||
| | ||
| ## Architecture | ||
| | ||
| ### Module Structure | ||
| Each module in `mod_*/` follows: `__init__.py`, `controllers.py` (routes), `models.py` (ORM), `forms.py` (WTForms) | ||
| | ||
| | Module | Purpose | | ||
| |--------|---------| | ||
| | `mod_ci` | GitHub webhooks, GCP VM orchestration, test execution | | ||
| | `mod_regression` | Regression test definitions, categories, expected outputs | | ||
| | `mod_test` | Test runs, results, progress tracking | | ||
| | `mod_sample` | Sample file management, tags, extra files | | ||
| | `mod_upload` | HTTP/FTP upload handling | | ||
| | `mod_auth` | User auth, roles (admin/user/contributor/tester) | | ||
| | `mod_customized` | Custom test runs for forks | | ||
| | ||
| ### Key Models & Relationships | ||
| ``` | ||
| Sample (sha hash) -> RegressionTest (command, expected_rc) -> RegressionTestOutput | ||
| | | ||
| Fork (GitHub repo) -> Test (platform, commit) -> TestResult -> TestResultFile | ||
| -> TestProgress (status tracking) | ||
| ``` | ||
| | ||
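The first relationship chain above (Sample -> RegressionTest -> RegressionTestOutput) can be pictured with plain dataclasses. This is only an illustrative sketch, not the project's SQLAlchemy models: the field names are borrowed from the column names used in this PR's migration (`sample_id`, `command`, `expected_rc`, `regression_id`, `correct`, `correct_extension`, `expected_filename`), while the class layout, the `outputs` list, the `sha` placeholder, and the example `id` values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """A media sample identified by its hash (placeholder value below)."""
    id: int
    sha: str


@dataclass
class RegressionTestOutput:
    """One expected output file belonging to a regression test."""
    regression_id: int
    correct: str
    correct_extension: str
    expected_filename: str


@dataclass
class RegressionTest:
    """A command run against one sample, with its expected return code."""
    id: int
    sample_id: int
    command: str
    expected_rc: int
    # Assumed convenience list; the real ORM relationship may differ.
    outputs: List[RegressionTestOutput] = field(default_factory=list)


# Illustrative values only: ids and the sha string are made up.
sample = Sample(id=1, sha="<placeholder-hash>")
webvtt_test = RegressionTest(
    id=3, sample_id=sample.id, command="-out=webvtt", expected_rc=0
)
webvtt_test.outputs.append(
    RegressionTestOutput(
        regression_id=webvtt_test.id,
        correct="WEBVTT\r\n\r\n",
        correct_extension=".webvtt",
        expected_filename="sample1.webvtt",
    )
)
```

Keeping the expected output as a child record of the test mirrors the separate `regression_test_output` table that the migration in this PR writes to.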
| ### CI Flow | ||
| 1. GitHub webhook (`/start-ci`) receives PR/push events | ||
| 2. Waits for GitHub Actions build artifacts | ||
| 3. `gcp_instance()` provisions Linux/Windows VMs | ||
| 4. VMs run CCExtractor, report to `progress_reporter()` | ||
| 5. Results compared against expected outputs | ||
| 6. `comment_pr()` posts results to GitHub | ||
| | ||
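Steps 2 to 6 of the CI flow above can be sketched as a single function that takes each external dependency as a callable. This is a hypothetical outline, not the code in `mod_ci/controllers.py`: `run_ci_flow`, `wait_for_build`, `run_on_vm`, and `post_comment` are names invented for this example, the webhook in step 1 is not modeled, and the comparison in step 5 is reduced to a plain equality check.

```python
from typing import Callable, Dict, List

Outputs = Dict[str, str]  # filename -> file content


def run_ci_flow(
    commit: str,
    platforms: List[str],
    expected: Outputs,
    wait_for_build: Callable[[str], str],
    run_on_vm: Callable[[str, str], Outputs],
    post_comment: Callable[[str], None],
) -> Dict[str, bool]:
    """Walk steps 2-6 for one commit and return a pass/fail flag per platform."""
    artifact = wait_for_build(commit)  # step 2: wait for the build artifact
    verdicts: Dict[str, bool] = {}
    for platform in platforms:
        # steps 3-4: provision a VM for this platform and collect its outputs
        actual = run_on_vm(platform, artifact)
        # step 5: compare against the expected outputs
        verdicts[platform] = actual == expected
    summary = ", ".join(
        f"{name}: {'pass' if ok else 'fail'}" for name, ok in verdicts.items()
    )
    post_comment(summary)  # step 6: report back to the PR
    return verdicts


# Usage with in-memory fakes instead of GitHub and GCP.
comments: List[str] = []
fake_outputs = {
    "linux": {"sample1.webvtt": "WEBVTT\r\n\r\n"},
    "windows": {"sample1.webvtt": ""},
}
result = run_ci_flow(
    commit="abc1234",
    platforms=["linux", "windows"],
    expected={"sample1.webvtt": "WEBVTT\r\n\r\n"},
    wait_for_build=lambda commit: f"build-{commit}",
    run_on_vm=lambda platform, artifact: fake_outputs[platform],
    post_comment=comments.append,
)
```

Passing the dependencies in as callables is one way to exercise the ordering of the steps in a unit test without touching real VMs.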
| ## Critical Files | ||
| | ||
| - `run.py` - Flask app entry, blueprint registration | ||
| - `mod_ci/controllers.py` - CI orchestration (2500+ lines) | ||
| - `mod_regression/models.py` - Test definitions | ||
| - `mod_test/models.py` - Test execution models | ||
| - `database.py` - SQLAlchemy setup, custom types | ||
| - `tests/base.py` - Test fixtures, mock helpers | ||
| | ||
| ## GSoC 2026 Focus Areas (from Carlos) | ||
| | ||
| ### Priority 1: Regression Test Suite | ||
| The main blocker for CCExtractor Rust migration is test coverage. Current needs: | ||
| - Add regression tests for uncovered caption types/containers | ||
| - Import FFmpeg and VLC official video libraries as test samples | ||
| - Systematic sample analysis using ffprobe, mkvnix, CCExtractor output | ||
| - Goal: Trust SP enough that passing tests = safe to merge | ||
| | ||
| ### Priority 2: Sample Platform Improvements | ||
| Low-coverage modules needing work: | ||
| - `mod_upload` (44% coverage) - FTP upload, progress tracking | ||
| - `mod_test` (58% coverage) - diff generation, error scenarios | ||
| - `mod_sample` (61% coverage) - Issue linking, tag management | ||
| | ||
| ### Contribution Strategy | ||
| 1. Start with unit tests for low-coverage modules | ||
| 2. Add integration tests for CI flow | ||
| 3. Help document sample metadata systematically | ||
| 4. Enable confident C code removal by proving test coverage | ||
| | ||
| ## Code Style | ||
| | ||
| - Type hints required (mypy enforced) | ||
| - Docstrings required (pydocstyle enforced) | ||
| - PEP8 (pycodestyle enforced) | ||
| - Imports sorted with isort | ||
| | ||
| ## MCP Setup (GSoC 2026) | ||
| | ||
| **Configured servers** (`~/.claude/settings.json`): | ||
| - `github` – repo/PR/issue management (needs `GITHUB_PERSONAL_ACCESS_TOKEN` env var) | ||
| - `context7` – up-to-date library docs | ||
| - `filesystem` – scoped to `/home/rahul/projects/gsoc` | ||
| | ||
| **Security**: | ||
| - Token stored in `~/.profile`, never committed | ||
| - MCP paths added to `.gitignore` | ||
| - pm2 config at `~/ecosystem.config.js` for auto-restart | ||
| | ||
| **Commands**: | ||
| ```bash | ||
| # Start MCP servers | ||
| pm2 start ~/ecosystem.config.js | ||
| pm2 logs | ||
| | ||
| # Resume Claude session | ||
| claude --resume | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Golden File Provenance | ||
| | ||
| This document tracks the generation details for regression test golden files. | ||
| | ||
| ## sample1.webvtt | ||
| | ||
| | Field | Value | | ||
| |-------|-------| | ||
| | Generated | 2026-01-02 | | ||
| | CCExtractor Version | 0.96.3 | | ||
| | Binary | ccextractorwinfull.exe | | ||
| | Platform | Windows x64 | | ||
| | Source Commit | Release build from windows/x64/Release-Full | | ||
| | Command | `ccextractorwinfull.exe sample1.ts -out=webvtt -o sample1.webvtt` | | ||
| | Input Sample | sample1.ts (no embedded closed captions) | | ||
| | Expected Output | WebVTT header only (WEBVTT + blank line) | | ||
| | ||
| ### Reproduction Steps | ||
| | ||
| ```bash | ||
| ccextractor install/sample_files/sample1.ts -out=webvtt -o install/sample_files/sample1.webvtt | ||
| ``` | ||
| | ||
| ### Notes | ||
| | ||
| - sample1.ts contains no closed caption data, so output is header-only | ||
| - This test validates WebVTT header generation, not full cue formatting | ||
| - For full WebVTT validation, a sample with embedded captions should be added |
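A header-only golden file like the one described above can be checked with a few lines of Python. The sketch below is an assumption-laden helper, not part of the Sample Platform: `normalize_newlines` and `is_header_only_webvtt` are hypothetical names, and the check accepts both CRLF and LF line endings because the golden content may be stored with either.

```python
def normalize_newlines(data: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace("\r\n", "\n").replace("\r", "\n")


def is_header_only_webvtt(data: str) -> bool:
    """Return True when the content is the WEBVTT header plus blank lines only."""
    lines = normalize_newlines(data).split("\n")
    # First line must be exactly the header; everything after it must be empty.
    return lines[0] == "WEBVTT" and all(line == "" for line in lines[1:])


# Both line-ending variants of the header-only file are accepted.
crlf_ok = is_header_only_webvtt("WEBVTT\r\n\r\n")
lf_ok = is_header_only_webvtt("WEBVTT\n\n")
# A file that contains an actual cue is rejected.
with_cue = is_header_only_webvtt("WEBVTT\n\n00:00.000 --> 00:01.000\nHello\n")
```

Normalizing line endings before comparing is a deliberate choice here: the provenance table records a Windows build, so byte-for-byte comparison against a file generated on Linux could differ only in line endings.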
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| WEBVTT | ||
| | ||
112 changes: 112 additions & 0 deletions
migrations/versions/c1a2b3d4e5f6_add_webvtt_regression_test.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| """Add WebVTT regression test | ||
| | ||
| Revision ID: c1a2b3d4e5f6 | ||
| Revises: b3ed927671bd | ||
| Create Date: 2026-01-04 21:05:00.000000 | ||
| | ||
| """ | ||
| import sqlalchemy as sa | ||
| from alembic import op | ||
| from sqlalchemy import text | ||
| | ||
| # revision identifiers, used by Alembic. | ||
| revision = 'c1a2b3d4e5f6' | ||
| down_revision = 'b3ed927671bd' | ||
| branch_labels = None | ||
| depends_on = None | ||
| | ||
| | ||
| def upgrade(): | ||
|     conn = op.get_bind() | ||
| | ||
|     # 1. Insert "Output Formats" category if not exists | ||
|     existing_cat = conn.execute( | ||
|         text("SELECT id FROM category WHERE name = 'Output Formats'") | ||
|     ).fetchone() | ||
| | ||
|     if existing_cat is None: | ||
|         conn.execute( | ||
|             text("INSERT INTO category (name, description) VALUES ('Output Formats', 'Tests for specific output format generation')") | ||
|         ) | ||
|         category_id = conn.execute(text("SELECT id FROM category WHERE name = 'Output Formats'")).fetchone()[0] | ||
|     else: | ||
|         category_id = existing_cat[0] | ||
| | ||
|     # 2. Check if WebVTT regression test already exists | ||
|     existing_test = conn.execute( | ||
|         text("SELECT id FROM regression_test WHERE command = '-out=webvtt' AND sample_id = 1") | ||
|     ).fetchone() | ||
| | ||
|     if existing_test is None: | ||
|         # 3. Insert the WebVTT regression test (sample_id=1 is sample1.ts) | ||
|         conn.execute( | ||
|             text(""" | ||
|                 INSERT INTO regression_test (sample_id, command, input_type, output_type, expected_rc, active, description) | ||
|                 VALUES (1, '-out=webvtt', 'file', 'file', 0, 1, 'Validates WebVTT header generation on empty-caption input') | ||
|             """) | ||
|         ) | ||
|         test_id = conn.execute( | ||
|             text("SELECT id FROM regression_test WHERE command = '-out=webvtt' AND sample_id = 1") | ||
|         ).fetchone()[0] | ||
| | ||
|         # 4. Insert RegressionTestOutput with the golden content | ||
|         conn.execute( | ||
|             text(""" | ||
|                 INSERT INTO regression_test_output (regression_id, correct, correct_extension, expected_filename) | ||
|                 VALUES (:test_id, 'WEBVTT\r\n\r\n', '.webvtt', 'sample1.webvtt') | ||
|             """), | ||
|             {"test_id": test_id} | ||
|         ) | ||
| | ||
|         # 5. Link test to category | ||
|         conn.execute( | ||
|             text(""" | ||
|                 INSERT INTO regression_test_category (regression_id, category_id) | ||
|                 VALUES (:test_id, :cat_id) | ||
|             """), | ||
|             {"test_id": test_id, "cat_id": category_id} | ||
|         ) | ||
| | ||
| | ||
| def downgrade(): | ||
|     conn = op.get_bind() | ||
| | ||
|     # Get the WebVTT test ID | ||
|     test_row = conn.execute( | ||
|         text("SELECT id FROM regression_test WHERE command = '-out=webvtt' AND sample_id = 1") | ||
|     ).fetchone() | ||
| | ||
|     if test_row is not None: | ||
|         test_id = test_row[0] | ||
| | ||
|         # Delete in reverse order of dependencies | ||
|         conn.execute( | ||
|             text("DELETE FROM regression_test_category WHERE regression_id = :test_id"), | ||
|             {"test_id": test_id} | ||
|         ) | ||
|         conn.execute( | ||
|             text("DELETE FROM regression_test_output WHERE regression_id = :test_id"), | ||
|             {"test_id": test_id} | ||
|         ) | ||
|         conn.execute( | ||
|             text("DELETE FROM regression_test WHERE id = :test_id"), | ||
|             {"test_id": test_id} | ||
|         ) | ||
| | ||
|     # Check if "Output Formats" category has any remaining tests | ||
|     cat_row = conn.execute( | ||
|         text("SELECT id FROM category WHERE name = 'Output Formats'") | ||
|     ).fetchone() | ||
| | ||
|     if cat_row is not None: | ||
|         category_id = cat_row[0] | ||
|         remaining = conn.execute( | ||
|             text("SELECT COUNT(*) FROM regression_test_category WHERE category_id = :cat_id"), | ||
|             {"cat_id": category_id} | ||
|         ).fetchone()[0] | ||
| | ||
|         if remaining == 0: | ||
|             conn.execute( | ||
|                 text("DELETE FROM category WHERE id = :cat_id"), | ||
|                 {"cat_id": category_id} | ||
|             ) |