An application that processes video files to analyze employee work sessions and returns summaries and analytics.
- Upload: user uploads a video via the frontend (drag-and-drop or file select). A `prompt` field may be provided.
- Processing: server validates the file, extracts audio (if present) and keyframes using `ffmpeg`/`ffprobe`.
- Keyframe selection: server chooses a keyframe interval based on video duration (e.g. 10s / 20s / 30s) and extracts frames at that rate.
- Analysis: audio is transcribed (OpenAI transcription models) and frames are analyzed via vision-capable models. A composed internal prompt (including audio/visual context) is sent to the model.
- Summary database: results are saved to MongoDB along with `prompt_used` and `metadata` to enable the summaries page to display the prompt and other details later.
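The duration-based interval selection described above could be sketched as follows. This is illustrative only: the function names and the duration thresholds (5 and 30 minutes) are assumptions, not taken from the codebase, though the 10s/20s/30s intervals match the ones mentioned.

```python
import json
import subprocess

def probe_duration(path: str) -> float:
    """Return a video's duration in seconds via ffprobe's JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    return float(json.loads(out.stdout)["format"]["duration"])

def keyframe_interval(duration: float) -> int:
    """Map total duration to a frame-sampling interval in seconds.

    Thresholds here are illustrative assumptions, not the project's
    actual cutoffs.
    """
    if duration <= 300:    # up to 5 minutes
        return 10
    if duration <= 1800:   # up to 30 minutes
        return 20
    return 30
```

With an interval in hand, frames can then be extracted with something like `ffmpeg -i input.mp4 -vf fps=1/INTERVAL frames/%04d.jpg`.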
```bash
# Navigate to client
cd client
# Install dependencies
npm install
# Start frontend
npm run dev
# Frontend available at http://localhost:3000
```
```bash
# Navigate to server
cd server
# Create & activate venv (macOS / Linux)
python -m venv venv
source venv/bin/activate
# Install server deps
pip install -r requirements.txt
# Copy environment file and set credentials
cp .env.example .env
# Edit .env to set OPENAI_API_KEY, MONGODB_URL, MONGODB_DB_NAME, etc.
# (Optional) If you need to merge WebM fragments:
python merge_webm.py
# Start backend (the project contains a FastAPI app)
python main.py
# Backend available at http://localhost:8000 (API docs at /docs)
```
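A minimal `.env` sketch with the variables named above; the values shown are placeholders, not real defaults from the project:

```bash
# server/.env — placeholder values, replace with your own credentials
OPENAI_API_KEY=your-openai-api-key
MONGODB_URL=mongodb://localhost:27017
MONGODB_DB_NAME=your-database-name
```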
```bash
# Build both services
docker compose build
# Run both services
docker compose up -d
# Check logs
docker compose logs -f
# Stop services
docker compose down
```
Note: ensure `ffmpeg` and `ffprobe` are installed (`ffprobe` ships as part of the ffmpeg package). On macOS, you can install via Homebrew:

```bash
brew install ffmpeg
```