Unable to extract feature vectors other than frame by frame #22

@Guilleuz

Describe the bug
Feature extraction is performed frame by frame instead of in clips. Regardless of the values I use for NUM_FRAMES and SAMPLING_RATE, I always get the same number of feature vectors as the video has frames.

To Reproduce
Execute using the following configuration:

TRAIN:
  ENABLE: False
  DATASET: epickitchens
  BATCH_SIZE: 50
  EVAL_PERIOD: 2
  CHECKPOINT_PERIOD: 1
  CHECKPOINT_FILE_PATH: "SlowFast.pyth"
  CHECKPOINT_TYPE: pytorch
  AUTO_RESUME: True
DATA:
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  PATH_TO_DATA_DIR: "path to videos"
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  READ_VID_FILE: False
  IMG_FILE_EXT: ".jpg"
  IN_FPS: 30
  OUT_FPS: 30
  TARGET_FPS: 30
  IMG_FILE_FORMAT: "frame_{:010d}.jpg"
  INPUT_CHANNEL_NUM: [3, 3]
  VID_FILE_EXT: ""
SLOWFAST:
  ALPHA: 8
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 7
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [2, 2]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[], []], [[], []]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  INSTANTIATION: dot_product
BN:
  USE_PRECISE_STATS: True
  NUM_BATCHES_PRECISE: 200
  WEIGHT_DECAY: 0.0
SOLVER:
  BASE_LR: 0.01
  LR_POLICY: steps_with_relative_lrs
  STEPS: [0, 20, 25]
  LRS: [1, 0.1, 0.01]
  MAX_EPOCH: 30
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-4
  WARMUP_EPOCHS: 1.0
  WARMUP_START_LR: 0.001
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 97
  ARCH: slowfast
  MODEL_NAME: SlowFast
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: True
  DATASET: epickitchens
  BATCH_SIZE: 1
  NUM_SPATIAL_CROPS: 1
  CHECKPOINT_FILE_PATH: "SlowFast.pyth"
  CHECKPOINT_TYPE: pytorch
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: "./bsh-features"

Expected behavior
Instead of outputting one feature vector per frame, I should be able to process the video divided into clips and get one feature vector per clip.
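
To make the expectation concrete, here is a minimal sketch of the arithmetic. It is not code from the repository. It assumes non-overlapping clips, where each clip spans NUM_FRAMES * SAMPLING_RATE source frames; the helper names, the `stride` parameter, and the 900-frame video length are hypothetical and only serve the illustration.

```python
def clip_start_indices(total_frames, num_frames, sampling_rate, stride=None):
    """Return the start frame index of every clip in a video.

    A clip covers num_frames * sampling_rate consecutive source frames.
    `stride` (hypothetical) is the distance between clip starts; it defaults
    to the clip span, which gives non-overlapping clips.
    """
    span = num_frames * sampling_rate
    if stride is None:
        stride = span
    if total_frames < span:
        return []  # video too short for even one full clip
    return list(range(0, total_frames - span + 1, stride))


def clip_frame_indices(start, num_frames, sampling_rate):
    """Return the source frame indices sampled for one clip."""
    return [start + i * sampling_rate for i in range(num_frames)]


# Values from the config above: NUM_FRAMES: 32, SAMPLING_RATE: 2.
# 900 frames is an assumed video length (30 s at 30 fps).
starts = clip_start_indices(total_frames=900, num_frames=32, sampling_rate=2)
print(len(starts))  # 14 clips, rather than 900 per-frame vectors
```

With these settings I would expect on the order of 14 feature vectors for such a video, and the count should change when NUM_FRAMES or SAMPLING_RATE changes.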

Desktop:

  • OS: [e.g. Ubuntu 22.04 LTS]
  • Python Version: 3.10.4
  • PyTorch version: 1.12.1+cu113
