Other training data: humpback whales
Over the winter of 2021-2022, Emily Vierling worked with Val and Scott Veirs as a Beam Reach "extern" (mostly-remote internship during COVID) to describe humpback signals within the open data from Orcasound Lab hydrophones (Haro Strait, WA, USA). Leveraging her previous training with Helena Symonds and Paul Spong of OrcaLab, listening to humpbacks in Johnstone Strait (BC, Canada), Emily developed a new online Haro Humpback dictionary and annotated thousands of signals in Orcasound open data.
Emily first presented the catalogue (catalog) at the DCLDE 2022 workshop in spring 2022. It contains 12 signals that she found to be most common in recordings made primarily in the late fall (presumably of male humpback whales beginning to vocalize prior to leaving the Salish Sea for tropical wintertime habitat in Hawaii and/or Mexico). In addition to the 2022 version that Emily published via WordPress, the catalogue is shared via the signal-catalogue GitHub repo, where we hope new versions of the code can be maintained as a generic tool that the bioacoustic community can use to build online and offline signal catalogues.
The 12 signal types in version 1.0 of this humpback signal dictionary are:
- Whup
- Grunt
- Ascending Moan
- Descending Moan
- Moan
- Upsweep
- Trumpet
- Growl
- Creak
- Buzz
- Shriek
- Chirp
Emily's annotated data includes ~9,000 labels and is based on ~YY hours of audio data from 3 days during October 03-28, 2021. These labeled data are part of Orcasound's AWS open data registry and are freely available under Orcasound's Creative Commons license (CC BY-NC-SA). Please attribute any use of the dictionary and/or labeled data to: "Emily Vierling, 2022, Orcasound" with a link back to orcasound.net.
- License/data sharing agreement: Creative Commons license (CC BY-NC-SA)
- Data owner / source: Orcasound
- Location: Acoustic Sandbox (S3 bucket, part of the AWS open data registry)
- No. files: 7
- File length: 12-180 MB
- Time range: 03 Oct 2021 - 28 Oct 2021
- Dataset size: 993 MB
- Description: Version 1 of Haro Humpback bioacoustic bouts for annotation by Emily Vierling (winter-spring 2022)
- Coordinates: 48.55833, -123.17357 (Orcasound Lab)
- Water depth: 8 meters
- Format: FLAC
- Codec: FLAC
- Channels: 1
- Sample Rate: 44.1 kHz
- Filelist: URI | URL via Quilt
- 211026-133018-OS-humpback-47min-clip.flac (175.3 MB)
- OS_10_03_2021_19_34_00_.flac (160.3 MB)
- OS_10_28_2021_18_54_00_.flac (150.5 MB)
- OS_10_28_2021_1900_HB.flac (12.5 MB)
- OS_10_28_2021_19_24_00_.flac (153.2 MB)
- OS_10_28_2021_19_55_00_.flac (161.2 MB)
- OS_10_28_2021_20_25_00_HB.flac (180.1 MB)
- License/data sharing agreement: Creative Commons license (CC BY-NC-SA)
- Annotator: Emily Vierling
- Method (manual or semi-manual): manual
- Detector (if applicable): N/A
- Filelist: URI | URL via Quilt
- Granularity (call, file, encounter): non-song vocalization
- Resolution (species, ecotype, call type, etc.): species; possibly individual(s) in some cases, depending on sightings data
- Columns (for each column provide description of content and possible values):
- Selection: sequential numbering within the annotation file for each labeled signal
- Begin Time (s): seconds into the recording when annotation bounding box begins
- End Time (s): seconds into the recording when annotation bounding box ends
- Low Freq (Hz): lower frequency bound of annotation bounding box
- High Freq (Hz): upper frequency bound of annotation bounding box
- Call Type: one of the 12 non-song vocalization categories defined for "Haro Humpbacks" and for humpbacks observed by OrcaLab in Johnstone Strait
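The columns listed above can be read with standard tools. The following is a minimal sketch, assuming the annotation files are tab-delimited text with exactly these column headers; the delimiter, the sample rows, and the helper names `load_annotations` and `summarize_annotations` are illustrative assumptions, not part of the published dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; real annotation files may use a different
# delimiter or contain additional columns.
SAMPLE = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tLow Freq (Hz)\tHigh Freq (Hz)\tCall Type\n"
    "1\t12.50\t13.25\t80.0\t400.0\tWhup\n"
    "2\t20.00\t22.00\t150.0\t900.0\tAscending Moan\n"
    "3\t31.00\t31.50\t90.0\t350.0\tWhup\n"
)


def load_annotations(handle):
    """Read annotation rows and convert the numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "selection": int(row["Selection"]),
            "begin_s": float(row["Begin Time (s)"]),
            "end_s": float(row["End Time (s)"]),
            "low_hz": float(row["Low Freq (Hz)"]),
            "high_hz": float(row["High Freq (Hz)"]),
            "call_type": row["Call Type"].strip(),
        })
    return rows


def summarize_annotations(rows):
    """Return (label count, total annotated seconds) keyed by call type."""
    counts = Counter()
    seconds = Counter()
    for r in rows:
        counts[r["call_type"]] += 1
        seconds[r["call_type"]] += r["end_s"] - r["begin_s"]
    return counts, seconds


annotations = load_annotations(io.StringIO(SAMPLE))
counts, seconds = summarize_annotations(annotations)
for call_type, n in counts.most_common():
    print(f"{call_type}: {n} labels, {seconds[call_type]:.2f} s")
```

To use this on a real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="")`, after confirming the delimiter and headers match.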
We are also sharing the training data (URI | URL via Quilt) that Val developed based on Emily's work. It includes fixed-window audio clips and associated spectrograms. Preliminary documentation of his efforts can be found in the signal-annotation GitHub repo.
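To illustrate what a fixed-window clip is, here is a minimal sketch. It is not necessarily the procedure Val used. It centers a window of constant length on each annotation and converts the bounds to sample indices at the 44.1 kHz sample rate listed for the recordings. The 3-second window length and the helper name `fixed_window` are hypothetical choices made for illustration.

```python
SAMPLE_RATE_HZ = 44_100  # sample rate listed for the FLAC recordings


def fixed_window(begin_s, end_s, window_s, recording_s):
    """Return (start, stop) sample indices of a clip centered on an annotation.

    The clip is always window_s seconds long. When it would run past either
    end of the recording, it is shifted rather than shortened.
    """
    if window_s > recording_s:
        raise ValueError("window is longer than the recording")
    center = (begin_s + end_s) / 2.0
    start_s = center - window_s / 2.0
    # Clamp so the whole window stays inside [0, recording_s].
    start_s = min(max(start_s, 0.0), recording_s - window_s)
    start = int(round(start_s * SAMPLE_RATE_HZ))
    return start, start + int(round(window_s * SAMPLE_RATE_HZ))


# Hypothetical annotation from 12.5 s to 13.25 s in a 60 s recording.
start, stop = fixed_window(12.5, 13.25, window_s=3.0, recording_s=60.0)
print(start, stop, (stop - start) / SAMPLE_RATE_HZ)
```

Keeping every clip the same length means the associated spectrograms share one shape, which is convenient when batching inputs for a model. Shifting the window at the recording edges avoids the need for padding.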