@thomasdhc
Contributor

beep boop [🤖]: Hi @suiyoubi 👋,

we've cherry picked #1451 into  for you! 🚀

Please review and approve this cherry pick at your convenience!

* Remove Internvideo2

Signed-off-by: Ao Tang <aot@nvidia.com>

* more to remove

Signed-off-by: Ao Tang <aot@nvidia.com>

* fix writer

Signed-off-by: Ao Tang <aot@nvidia.com>

* Enhance Clip class to include cosmos_embed1_frames and cosmos_embed1_embedding in total size calculation

Signed-off-by: Ao Tang <aot@nvidia.com>

* remove iv2

Signed-off-by: Ao Tang <aot@nvidia.com>

---------

Signed-off-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@thomasdhc
Contributor Author

/ok to test 796fbcd

@greptile-apps
Contributor

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

Cherry-pick of PR #1451 that completely removes InternVideo2 embedding support from the codebase, standardizing on Cosmos-Embed1 as the sole video embedding model.

Key Changes:

  • Removed all InternVideo2 model implementations, stages, and external dependencies (381+ lines from internvideo2_mm.py, 166+ lines from embedding stages)
  • Cleaned up Clip dataclass by removing InternVideo2-specific fields and fixing duplicate size calculation logic in get_major_size()
  • Updated ClipWriterStage to remove InternVideo2 embedding buffer and I/O logic, improved cosmos-embed1 algorithm check to use .startswith() for variant support
  • Removed InternVideo2 from Docker build (git clone, patching, uv install steps)
  • Cleaned up CI/CD pipeline by removing InternVideo2 caching and installation
  • Updated all tests to remove InternVideo2 test cases and use Cosmos-Embed1 exclusively
  • Comprehensively updated documentation, tutorials, and examples across 20+ files
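To illustrate the `.startswith()` change mentioned in the ClipWriterStage bullet, here is a minimal sketch of how a prefix check differs from an exact-match check. The function names and the variant string `cosmos-embed1-224p` are assumptions made for illustration; they are not taken from the PR diff.

```python
# Hypothetical helpers; the real check lives inside ClipWriterStage
# and may be written differently.

def is_cosmos_embed1_exact(algorithm: str) -> bool:
    # Exact match: only the bare name is accepted.
    return algorithm == "cosmos-embed1"


def is_cosmos_embed1_prefix(algorithm: str) -> bool:
    # Prefix match: the bare name and any suffixed variant are accepted.
    return algorithm.startswith("cosmos-embed1")


variant = "cosmos-embed1-224p"  # assumed variant name, for illustration only
exact_result = is_cosmos_embed1_exact(variant)
prefix_result = is_cosmos_embed1_prefix(variant)
```

With an exact comparison, a suffixed variant name would be rejected, whereas the prefix check accepts it. One trade-off of a prefix check is that it also accepts any unrelated string that happens to begin with the same characters.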

Impact:
The removal is clean and complete with no remaining InternVideo2 references in the codebase. Tests were updated appropriately to reflect the new Cosmos-Embed1-only architecture. The change simplifies the embedding pipeline and reduces external dependencies.

Confidence Score: 5/5

  • This PR is safe to merge with no concerns
  • The cherry-pick cleanly removes InternVideo2 support with thorough changes across code, tests, documentation, and infrastructure. All InternVideo2 references have been eliminated, tests have been updated appropriately, and the improved algorithm check using .startswith() properly supports Cosmos-Embed1 variants. The removal also fixes a bug where size calculation was checking the wrong embedding fields.
  • No files require special attention
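The size calculation fix can be pictured with a small sketch. This is not the actual `Clip` dataclass: the class name `ClipSketch`, the `buffer` field, and the use of `bytes` payloads are simplifying assumptions. Only the field names `cosmos_embed1_frames` and `cosmos_embed1_embedding` and the method name `get_major_size()` come from the PR text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClipSketch:
    """Hypothetical stand-in for the Clip dataclass (illustration only)."""

    buffer: Optional[bytes] = None  # assumed field for the encoded clip
    cosmos_embed1_frames: Optional[bytes] = None
    cosmos_embed1_embedding: Optional[bytes] = None

    def get_major_size(self) -> int:
        # Count each large payload exactly once, skipping unset fields.
        payloads = (
            self.buffer,
            self.cosmos_embed1_frames,
            self.cosmos_embed1_embedding,
        )
        return sum(len(p) for p in payloads if p is not None)


clip = ClipSketch(buffer=b"abcd", cosmos_embed1_embedding=b"xy")
total_size = clip.get_major_size()
```

Iterating over a single tuple of fields makes it harder to count one field twice or to check a field that belongs to a different model, which is the kind of mistake the summary describes.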

Important Files Changed

| Filename | Overview |
|---|---|
| nemo_curator/tasks/video.py | Removed InternVideo2 fields from Clip dataclass and fixed duplicate size calculation |
| nemo_curator/stages/video/io/clip_writer.py | Removed InternVideo2 embedding support, updated cosmos-embed1 algorithm check to use .startswith() |
| docker/Dockerfile | Removed InternVideo2 repository cloning and patching from Docker build |
| .github/workflows/cicd-main.yml | Removed InternVideo2 caching and installation from CI/CD pipeline |
| nemo_curator/models/internvideo2_mm.py | Deleted InternVideo2 model implementation |
| nemo_curator/stages/video/embedding/internvideo2.py | Deleted InternVideo2 embedding stages |
| tests/stages/video/io/test_clip_writer.py | Removed InternVideo2 test cases and updated assertions |
| tutorials/video/getting-started/video_split_clip_example.py | Removed InternVideo2 import and pipeline stages from tutorial |

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant PR as PR #1451
    participant Code as Codebase
    participant Docker as Docker Build
    participant CI as CI/CD Pipeline
    participant Tests as Test Suite
    participant Docs as Documentation

    Note over Dev,Docs: InternVideo2 Removal Process

    Dev->>Code: Remove InternVideo2 model files
    Code-->>Dev: Delete internvideo2_mm.py, internvideo2.py

    Dev->>Code: Remove InternVideo2 stages
    Code-->>Dev: Delete InternVideo2EmbeddingStage, InternVideo2FrameCreationStage

    Dev->>Code: Update Clip dataclass
    Note right of Code: Remove intern_video_2_frames<br/>Remove intern_video_2_embedding<br/>Remove intern_video_2_text_match<br/>Fix duplicate size calculation

    Dev->>Code: Update ClipWriterStage
    Note right of Code: Remove _iv2_embedding_buffer<br/>Remove InternVideo2 embedding logic<br/>Update cosmos-embed1 check to .startswith()

    Dev->>Docker: Remove InternVideo2 from Dockerfile
    Docker-->>Dev: Remove git clone, patch, and uv add steps

    Dev->>CI: Update CI/CD workflow
    CI-->>Dev: Remove InternVideo2 cache and checkout steps

    Dev->>Tests: Update test suite
    Tests-->>Dev: Remove InternVideo2 tests<br/>Update assertions to use Cosmos-Embed1

    Dev->>Docs: Update documentation
    Docs-->>Dev: Remove InternVideo2 references<br/>Update embedding algorithm docs<br/>Update tutorials and examples

    Dev->>PR: Cherry-pick to r1.1.0
    PR->>Code: Apply all changes to release branch

    Note over Code,Tests: Result: Cosmos-Embed1 is now<br/>the only supported embedding model

Contributor

@greptile-apps greptile-apps bot left a comment

8 files reviewed, no comments

@thomasdhc thomasdhc merged commit 1f38a7e into r1.1.0 Feb 4, 2026
49 of 53 checks passed