Standardize Era0 and Era1 datasets to incorporate into model training by pastorep · Pull Request #13 · orcasound/orca-ml

pastorep · 2025-09-15T18:54:36Z

Previously, our 2a_*.ipynb model training script used only Era1 data to train a model. This change adds Era0 dataset generation and standardizes clip length: clips of length 0 sec < X <= 4 sec are padded to 4 sec, so that the model can be trained on spectrograms of exactly 4 sec of audio.
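As a rough illustration of the clip-length standardization described above, here is a minimal sketch of padding a clip to exactly 4 sec. It is not the code from this PR: the function name `pad_clip`, the use of trailing zero-padding, and the example sample rate are all assumptions made for illustration.

```python
import numpy as np

TARGET_SECONDS = 4.0  # clip length named in the PR description


def pad_clip(samples: np.ndarray, sample_rate: int,
             target_seconds: float = TARGET_SECONDS) -> np.ndarray:
    """Zero-pad a mono clip at the end so it lasts exactly target_seconds.

    Hypothetical helper: trailing zero-padding is an assumption, the PR
    does not say how (or where) the padding is applied.
    """
    target_len = int(round(target_seconds * sample_rate))
    n = samples.shape[0]
    # Only clips with 0 sec < X <= 4 sec are in scope.
    if n == 0 or n > target_len:
        raise ValueError(f"clip has {n} samples, expected 1..{target_len}")
    return np.pad(samples, (0, target_len - n), mode="constant")


# Example: a 3 sec clip at an assumed 20 kHz sample rate becomes 4 sec.
clip = np.ones(3 * 20000, dtype=np.float32)
padded = pad_clip(clip, sample_rate=20000)
print(padded.shape[0] / 20000)  # duration in seconds after padding
```

Padding at the end keeps the original audio aligned to the start of the spectrogram; centering the clip or padding with background noise would be equally plausible choices.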

Dev Note: I haven't fully tested model training with the new data/spectrograms, but I am pushing up this PR so that others can build on what I have in the Microsoft Hackathon 2025.

pastorep added 11 commits February 11, 2025 07:22

Placeholder code for Era0 data generation

06bdc37

Cleaned import dependencies in Era1 data generation; wrote logic to randomly downsample train set

c0263a2

Simple checks of train set

8620867

Train model on rebalanced dataset with random downsampling of negative clips and no Era0 data

9b7a4a4

Begin train set split logic

97b5f8f

Filtering applicable clips into era0/[train|val|test] folders

918c22f

Automate folder cleanup if already exist

891efcd

Updated Era1 data generation to take full audio clip (3s)

59e4f6e

Update before move compute

ac0b9ce

Padding clips to 4sec

d1bd21f

Merge branch 'main' into dev/pastorep/train-with-era0

fbd2f38
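The random downsampling of negative clips mentioned in the commits above could look roughly like the sketch below. This is an assumption-laden illustration, not the notebook's actual logic: the `(clip_id, label)` tuple layout, the 1:1 positive-to-negative target, and the name `downsample_negatives` are all hypothetical.

```python
import random


def downsample_negatives(clips, seed=0):
    """Randomly drop negative clips until they match the positive count.

    `clips` is a list of (clip_id, label) tuples with label 1 = positive
    and 0 = negative. Both the layout and the 1:1 target ratio are
    assumptions for illustration only.
    """
    positives = [c for c in clips if c[1] == 1]
    negatives = [c for c in clips if c[1] == 0]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    keep = min(len(positives), len(negatives))
    balanced = positives + rng.sample(negatives, keep)
    rng.shuffle(balanced)
    return balanced


# Example: 3 positive and 10 negative clips.
train = [(f"pos_{i}", 1) for i in range(3)] + [(f"neg_{i}", 0) for i in range(10)]
balanced = downsample_negatives(train)
print(len(balanced))
```

Seeding the generator makes the downsampled train set reproducible across runs, which helps when comparing models trained with and without Era0 data.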

pastorep wants to merge 11 commits into main from dev/pastorep/train-with-era0