forked from pelikhan/action-genai-video-issue-analyzer
Enhance video analyzer to detect slide transitions and generate timestamps for videos of slide decks #2
Draft: Copilot wants to merge 6 commits into main from copilot/fix-1
Changes from 3 commits
f8d4135 Initial plan for issue (Copilot)
0200a04 Implement slide deck annotator with configurable script selection (Copilot)
ee8fcf3 Final implementation with documentation and entrypoint fixes (Copilot)
126c4a8 Add Git LFS support for large video files in both analyzers (Copilot)
503263f Document Git LFS support in README (Copilot)
6e12cb6 Refactor slide deck annotator to accept video file path instead of ex… (Copilot)
@@ -0,0 +1,36 @@
name: genai video slide deck annotator
on:
  issues:
    types: [opened, edited]
permissions:
  contents: read
  issues: write
  models: read
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  genai-video-slide-deck-analyze:
    runs-on: ubuntu-latest
    services:
      whisper:
        image: onerahmet/openai-whisper-asr-webservice:latest
        env:
          ASR_MODEL: base
          ASR_ENGINE: openai_whisper
        ports:
          - 9000:9000
        options: >-
          --health-cmd "curl -f http://localhost:9000/docs || exit 1"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
          --health-start-period 20s
    steps:
      - uses: actions/checkout@v4
      - uses: pelikhan/action-genai-video-issue-analyzer@main
        with:
          script: action-video-slide-deck-annotator
          github_issue: ${{ github.event.issue.number }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          instructions: "Analyze the video frames to detect slide transitions in a presentation. Focus on identifying significant visual changes that indicate when slides change, ignore minor changes like cursor movement or highlighting. Generate timestamps with confidence scores for each detected transition."
@@ -0,0 +1,11 @@
#!/bin/sh

# Set the script name from the input parameter, defaulting to action-video-issue-analyzer
export SCRIPT_NAME="${INPUT_SCRIPT:-action-video-issue-analyzer}"

# Set the whisper API base
export WHISPERASR_API_BASE=http://whisper:9000

# Run genaiscript directly with the selected script
cd /genaiscript/action
npx genaiscript run "$SCRIPT_NAME" --github-workspace --pull-request-comment --no-run-trace --no-output-trace
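The `${INPUT_SCRIPT:-action-video-issue-analyzer}` expansion in the entrypoint falls back to the default both when `INPUT_SCRIPT` is unset and when it is set to an empty string. A minimal TypeScript sketch of the same rule follows; the helper name `resolveScriptName` is hypothetical and is not part of this PR.

```typescript
// Hypothetical helper mirroring the shell expansion ${INPUT_SCRIPT:-default}.
// The default applies when the variable is unset OR set to an empty string.
const DEFAULT_SCRIPT = "action-video-issue-analyzer";

function resolveScriptName(env: Record<string, string | undefined>): string {
  const value = env["INPUT_SCRIPT"];
  // `||` rather than `??`, so an empty string also falls back, as `:-` does in sh.
  return value || DEFAULT_SCRIPT;
}
```

With the workflow above, `script: action-video-slide-deck-annotator` would select the new annotator, and omitting the input would keep the original issue analyzer.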
@@ -0,0 +1,190 @@
script({
  title: "Analyzes videos to detect slide transitions and generate timestamps",
  accept: "none",
  parameters: {
    instructions: {
      type: "string",
      description:
        "Custom prompting instructions for slide transition detection.",
      default:
        "Analyze the video frames to detect slide transitions in a presentation. Focus on identifying significant visual changes that indicate when slides change, ignore minor changes like cursor movement or highlighting. Generate timestamps with confidence scores for each detected transition.",
    },
  },
});

const { dbg, output, vars } = env;
const issue = await github.getIssue();
if (!issue)
  throw new Error(
    "No issue found in the context. This action requires an issue to be present.",
  );
const { instructions } = vars as { instructions: string };
if (!instructions)
  throw new Error(
    "No instructions provided. Please provide instructions to process the video.",
  );

// Match attachment links that occupy a whole line; dots are escaped so they match literally
const RX = /^https:\/\/github\.com\/user-attachments\/assets\/.+$/gim;
const assetLinks = Array.from(
  // Guard against an empty issue body and de-duplicate repeated links
  new Set(Array.from((issue.body ?? "").matchAll(RX), (m) => m[0])),
);
if (assetLinks.length === 0)
  cancel("No video assets found in the issue body, nothing to do.");

dbg(`issue: %s`, issue.title);

for (const assetLink of assetLinks) await processAssetLink(assetLink);

async function processAssetLink(assetLink: string) {
  output.heading(3, assetLink);
  dbg(assetLink);
  const downloadUrl = await github.resolveAssetUrl(assetLink);
  const res = await fetch(downloadUrl, { method: "GET" });
  const contentType = res.headers.get("content-type") || "";
  dbg(`download url: %s`, downloadUrl);
  dbg(`headers: %O`, res.headers);
  if (!res.ok)
    throw new Error(
      `Failed to download asset from ${downloadUrl}: ${res.status} ${res.statusText}`,
    );
  if (!/^video\//.test(contentType)) {
    output.p(`Asset is not a video file, skipping`);
    return;
  }

  // save and cache
  const buffer = await res.arrayBuffer();
  dbg(`size`, `${(buffer.byteLength / 1e6) | 0}MB`);
  const filename = await workspace.writeCached(buffer, { scope: "run" });
  dbg(`filename`, filename);

  await processVideo(filename);
}

async function processVideo(filename: string) {
  const transcript = await transcribe(filename, {
    model: "whisperasr:default",
    cache: true,
  });
  if (!transcript) {
    output.error(`no transcript found for video ${filename}.`);
  }

  // Extract frames for slide transition detection
  const frames = await ffmpeg.extractFrames(filename, {
    transcript,
  });

  const { text, error } = await runPrompt(
    (ctx) => {
      ctx.def("TRANSCRIPT", transcript?.srt, { ignoreEmpty: true }); // ignore silent videos
      ctx.defImages(frames, { detail: "high", sliceSample: 80 }); // higher detail for slide detection
      ctx.$`${instructions}

## Analysis Instructions

You are analyzing a video of a slide deck presentation. Your task is to:

1. **Detect Slide Transitions**: Identify when the content significantly changes between frames, indicating a new slide
2. **Filter Noise**: Ignore minor changes like cursor movement, highlighting, or small animations
3. **Generate Timestamps**: Provide accurate timestamps for each transition
4. **Assess Confidence**: Rate your confidence in each detection (0.0 to 1.0)
5. **Create Viewing Segments**: Generate recommended 2-minute viewing segments for each slide

## Output Format

Respond with a valid JSON object in the following format:

\`\`\`json
{
  "video_duration": "HH:MM:SS",
  "slide_transitions": [
    {
      "timestamp": "HH:MM:SS",
      "confidence": 0.95,
      "slide_number": 1,
      "description": "Brief description of the transition"
    }
  ],
  "recommended_segments": [
    {
      "start": "HH:MM:SS",
      "end": "HH:MM:SS",
      "slide": 1,
      "description": "First 2 minutes of slide content"
    }
  ]
}
\`\`\`

## Key Guidelines

- Focus on major visual changes that clearly indicate slide transitions
- Confidence scores should reflect how certain you are about the transition
- Slide numbers should increment sequentially starting from 1
- Recommended segments should be exactly 2 minutes or until the next slide transition
- Use the transcript to help understand content changes when visual changes are ambiguous
- If frames show the same slide content, do not mark as a transition
- Look for changes in slide titles, bullet points, images, charts, or overall layout

Analyze the provided frames and transcript to detect slide transitions.`.role(
        "system",
      );
    },
    {
      systemSafety: true,
      model: "vision",
      responseType: "json",
      label: `analyze slide transitions ${filename}`,
    },
  );

  if (error) {
    output.error(error?.message);
  } else {
    // Parse and validate JSON response
    try {
      const analysisResult = JSON.parse(text);

      // Display results in a formatted way
      output.heading(4, "Slide Transition Analysis Results");
      output.code(JSON.stringify(analysisResult, null, 2), "json");

      // Also provide a summary
      if (
        analysisResult.slide_transitions &&
        analysisResult.slide_transitions.length > 0
      ) {
        output.heading(5, "Summary");
        output.p(
          `Found ${analysisResult.slide_transitions.length} slide transitions in video duration: ${analysisResult.video_duration}`,
        );

        output.heading(5, "Detected Transitions");
        for (const transition of analysisResult.slide_transitions) {
          output.p(
            `**Slide ${transition.slide_number}** at [${transition.timestamp}] (confidence: ${transition.confidence}) - ${transition.description}`,
          );
        }

        if (
          analysisResult.recommended_segments &&
          analysisResult.recommended_segments.length > 0
        ) {
          output.heading(5, "Recommended Viewing Segments");
          for (const segment of analysisResult.recommended_segments) {
            output.p(
              `**Slide ${segment.slide}**: [${segment.start}] - [${segment.end}] - ${segment.description}`,
            );
          }
        }
      } else {
        output.p("No slide transitions detected in this video.");
      }
    } catch (parseError) {
      output.error(`Failed to parse JSON response: ${parseError.message}`);
      output.heading(4, "Raw Response");
      output.appendContent(text);
    }
  }
}
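The prompt asks the model to produce segments that are "exactly 2 minutes or until the next slide transition". That rule is deterministic, so it could also be computed (or cross-checked) in code from the detected transitions. The sketch below is illustrative only: the type and function names (`SlideTransition`, `Segment`, `toSeconds`, `toTimestamp`, `recommendSegments`) are assumptions that mirror the JSON shape in the prompt, not code from this PR.

```typescript
// Illustrative types mirroring the JSON shape requested in the prompt.
interface SlideTransition {
  timestamp: string; // "HH:MM:SS"
  confidence: number;
  slide_number: number;
  description: string;
}
interface Segment {
  start: string;
  end: string;
  slide: number;
}

const SEGMENT_SECONDS = 120; // the 2-minute cap from the prompt

// Convert "HH:MM:SS" to seconds; throws on malformed input.
function toSeconds(ts: string): number {
  const m = /^(\d+):([0-5]\d):([0-5]\d)$/.exec(ts);
  if (!m) throw new Error(`invalid timestamp: ${ts}`);
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
}

// Convert seconds back to zero-padded "HH:MM:SS".
function toTimestamp(total: number): string {
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)}`;
}

// Each segment starts at a transition and ends after 2 minutes,
// at the next transition, or at the end of the video, whichever comes first.
function recommendSegments(
  transitions: SlideTransition[],
  videoDuration: string,
): Segment[] {
  const sorted = transitions
    .slice()
    .sort((a, b) => toSeconds(a.timestamp) - toSeconds(b.timestamp));
  const videoEnd = toSeconds(videoDuration);
  return sorted.map((t, i) => {
    const start = toSeconds(t.timestamp);
    const next =
      i + 1 < sorted.length ? toSeconds(sorted[i + 1].timestamp) : videoEnd;
    const end = Math.min(start + SEGMENT_SECONDS, next);
    return { start: t.timestamp, end: toTimestamp(end), slide: t.slide_number };
  });
}
```

Computing segments this way would leave only the transition detection itself to the model, which may make the output easier to validate.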
The video may be too big to attach to an issue, so we need to be able to fetch it from Git Large File Storage (Git LFS).
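For context on this request: when a repository is checked out without LFS content, an LFS-tracked file appears as a small text pointer instead of the video itself. A minimal sketch of recognizing such a pointer follows, based on the published Git LFS pointer format (a `version` line, then `oid sha256:…` and `size …`). The names `LfsPointer` and `parseLfsPointer` are hypothetical and are not taken from this PR.

```typescript
// Hypothetical helper: detect whether file contents are a Git LFS pointer
// rather than the real binary. Returns undefined for anything else.
interface LfsPointer {
  oid: string; // SHA-256 of the real object, hex encoded
  size: number; // size of the real object in bytes
}

const LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1";

function parseLfsPointer(text: string): LfsPointer | undefined {
  const lines = text.split("\n");
  // The version line must come first in a pointer file.
  if (lines[0] !== LFS_VERSION_LINE) return undefined;
  let oid: string | undefined;
  let size: number | undefined;
  for (const line of lines.slice(1)) {
    const oidMatch = /^oid sha256:([0-9a-f]{64})$/.exec(line);
    if (oidMatch) oid = oidMatch[1];
    const sizeMatch = /^size (\d+)$/.exec(line);
    if (sizeMatch) size = Number(sizeMatch[1]);
  }
  return oid !== undefined && size !== undefined ? { oid, size } : undefined;
}
```

A script that finds a pointer where it expected a video could then report a clear error (or trigger an LFS fetch) instead of passing a few hundred bytes of text to the transcriber.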
We don't need to use issues at all for this flow. Take the file location in the repo as the arg.
Added Git LFS support to handle large video files. Both video analyzers now detect and properly download videos from Git LFS URLs including raw files, releases, and blob URLs, while maintaining backward compatibility with regular GitHub attachments. Added appropriate error handling and size monitoring for large files. Changes committed in 4f8e8a3.
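The LFS detection code itself is not visible in this view of the diff. As a rough illustration of what distinguishing "raw files, releases, and blob URLs" from regular attachments could look like, here is a sketch that classifies a link by the shape of its path. The `classifyGitHubUrl` function and the `GitHubUrlKind` type are assumptions made for this example, not the PR's implementation.

```typescript
// Hypothetical classifier for the URL kinds named in the comment above.
type GitHubUrlKind = "attachment" | "blob" | "raw" | "release" | "unknown";

function classifyGitHubUrl(link: string): GitHubUrlKind {
  let url: URL;
  try {
    url = new URL(link);
  } catch {
    return "unknown"; // not a parseable absolute URL
  }
  if (url.hostname === "raw.githubusercontent.com") return "raw";
  if (url.hostname !== "github.com") return "unknown";

  // Path segments: /<owner>/<repo>/<kind>/...
  const parts = url.pathname.split("/").filter((p) => p.length > 0);
  if (parts[0] === "user-attachments" && parts[1] === "assets")
    return "attachment";
  if (parts[2] === "blob") return "blob";
  if (parts[2] === "raw") return "raw";
  if (parts[2] === "releases" && parts[3] === "download") return "release";
  return "unknown";
}
```

Keeping the `attachment` case separate is what would preserve backward compatibility with the existing `user-attachments/assets` links.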
Updated the slide deck annotator to accept a direct file path parameter instead of extracting videos from GitHub issues. The script now takes `video_file_path` as a parameter and processes the video file directly from the repository location, removing the dependency on issues entirely. Changes committed in 6e12cb6.
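Because `video_file_path` comes from workflow input, it is worth validating before it reaches `transcribe` or `ffmpeg`. The refactored script is not shown in this view, so the following is only a sketch under stated assumptions: the function name `validateVideoPath` and the extension list are hypothetical, and the check is purely lexical (it does not touch the file system).

```typescript
// Assumed allow-list of video extensions; adjust to the formats actually supported.
const VIDEO_EXTENSIONS = [".mp4", ".mov", ".webm", ".mkv"];

// Hypothetical validator: returns a normalized repo-relative path or throws.
function validateVideoPath(videoFilePath: string): string {
  const normalized = videoFilePath.trim().replace(/\\/g, "/");
  if (normalized.length === 0) throw new Error("video_file_path is required");
  // Reject absolute paths (POSIX or Windows drive letters).
  if (/^\//.test(normalized) || /^[A-Za-z]:/.test(normalized))
    throw new Error("video_file_path must be relative to the repository root");
  // Reject any ".." segment so the path cannot leave the repository.
  if (normalized.split("/").indexOf("..") !== -1)
    throw new Error("video_file_path must not contain '..' segments");
  const lower = normalized.toLowerCase();
  const isVideo = VIDEO_EXTENSIONS.some(
    (ext) => lower.slice(-ext.length) === ext,
  );
  if (!isVideo) throw new Error(`unsupported video extension: ${normalized}`);
  return normalized;
}
```

For example, `validateVideoPath("videos/talk.mp4")` returns the path unchanged, while `"../secrets.mp4"` and `"/etc/passwd"` are rejected.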