SCRUM-XX: Preprocessing Pipeline for <dataset name>

Dataset

  • Name:
  • Source:
  • Size: <e.g., 300GB, 440k videos>
  • License:

Storage Location

  • Path:
  • Access Method: <download/access command>

Data Split

  • Train/Val/Test Ratio: <e.g., 70/15/15>
  • Counts:
    • Train:
    • Val:
    • Test:
  • Method: <how split was created, including seed/stratification and grouping rules>
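A minimal sketch of one way to make the split reproducible and group-aware, in case it helps when filling in the Method field. It assumes each sample carries a group ID (for example, the source video a clip was cut from) and assigns whole groups to a split by hashing the group ID together with a seed. The names `assign_split`, `split_dataset`, and the `(sample_id, group_id)` tuple layout are hypothetical, not part of any existing pipeline. The sketch does not stratify by label, and hash-based assignment only approximates the target ratios.

```python
import hashlib


def assign_split(group_id, seed=42, ratios=(0.70, 0.15, 0.15)):
    """Deterministically map a group ID to 'train', 'val', or 'test'.

    Hypothetical helper: the same (seed, group_id) pair always yields
    the same split, independent of input order or machine.
    """
    digest = hashlib.sha256(f"{seed}:{group_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    u = int(digest[:8], 16) / 16**8
    train_end = ratios[0]
    val_end = ratios[0] + ratios[1]
    if u < train_end:
        return "train"
    if u < val_end:
        return "val"
    return "test"


def split_dataset(samples, seed=42, ratios=(0.70, 0.15, 0.15)):
    """Split an iterable of (sample_id, group_id) pairs by group.

    All samples that share a group ID land in the same split, which
    avoids leakage between train and evaluation data.
    """
    splits = {"train": [], "val": [], "test": []}
    for sample_id, group_id in samples:
        splits[assign_split(group_id, seed, ratios)].append(sample_id)
    return splits
```

Recording the seed, the ratios, and the grouping key in the Method field is enough for a teammate to regenerate the same split with a helper like this.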

Preprocessing Steps

  1. <step 1>
  2. <step 2>
  3. <step 3>

Code Changes

  • Updated <path/to/file>
  • Added <path/to/file>
  • Modified <path/to/file>

How to Run

# Step 1: Download/locate dataset
<command>

# Step 2: Split dataset
<command>

# Step 3: Run preprocessing
<command>

Verification

  • Ran preprocessing end-to-end
  • Verified split counts match expectations
  • Verified processed data looks correct (sample check)
  • No dataset files committed
  • Documentation updated
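The split-count item in the checklist above can be backed by a small script instead of a manual check. The following is a sketch under stated assumptions: `check_split_counts`, `find_overlap`, and the 2% tolerance are illustrative choices, not an existing project utility.

```python
def check_split_counts(counts, expected_ratios, tolerance=0.02):
    """Return a list of messages for splits whose share is off target.

    counts: dict such as {"train": 700, "val": 150, "test": 150}
    expected_ratios: dict such as {"train": 0.70, "val": 0.15, "test": 0.15}
    tolerance: allowed absolute deviation per split (assumed value)
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples to check")
    problems = []
    for name, expected in expected_ratios.items():
        actual = counts.get(name, 0) / total
        if abs(actual - expected) > tolerance:
            problems.append(f"{name}: expected {expected:.2f}, got {actual:.2f}")
    return problems


def find_overlap(splits):
    """Return the set of sample IDs that appear in more than one split."""
    first_seen = {}
    duplicates = set()
    for name, ids in splits.items():
        for sample_id in ids:
            if first_seen.setdefault(sample_id, name) != name:
                duplicates.add(sample_id)
    return duplicates
```

An empty list from `check_split_counts` and an empty set from `find_overlap` are reasonable conditions for ticking the split-count item; the sample check on processed data still needs a human look.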

Notes

<gotchas, assumptions, and important details for teammates>