Commit a979e0a

sdenton4 authored and copybara-github committed
Avoid collisions in common CSV names, and allow instantiating deployment+recording metadata directly from a dataframe.
PiperOrigin-RevId: 876277128
1 parent 0ad47fd commit a979e0a

6 files changed (+233, -140 lines)

perch_hoplite/agile/README.md

Lines changed: 8 additions & 7 deletions
@@ -45,22 +45,22 @@ directory:
 `int`, and `bytes`.
 * `description`: An optional description of the field.

-2. **`deployments_metadata.csv`**: This file contains metadata for each
+2. **`hoplite_deployments_metadata.csv`**: This file contains metadata for each
 deployment. The first column must be the deployment identifier (which
 corresponds to the directory name if audio files are in
 `deployment/recording.wav`
 format), and subsequent columns should match `field_name`s from
 `metadata_description.csv` where `metadata_level` is `deployment`.

-3. **`recordings_metadata.csv`**: This file contains metadata for each
+3. **`hoplite_recordings_metadata.csv`**: This file contains metadata for each
 recording. The first column must be the recording identifier (e.g.
 `deployment/recording.wav`),
 and subsequent columns should match `field_name`s from
 `metadata_description.csv` where `metadata_level` is `recording`.

 ### Example

-**`metadata_description.csv`**
+**`hoplite_metadata_description.csv`**

 ```csv
 field_name,metadata_level,type,description
@@ -71,15 +71,15 @@ file_id,recording,str,Recording identifier.
 mic_type,recording,str,Microphone type.
 ```

-**`deployments_metadata.csv`**
+**`hoplite_deployments_metadata.csv`**

 ```csv
 deployment_name,habitat,latitude
 DEP01,"forest",47.6
 DEP02,"grassland",45.1
 ```

-**`recordings_metadata.csv`**
+**`hoplite_recordings_metadata.csv`**

 ```csv
 file_id,mic_type
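A minimal sketch of how the three metadata files fit together, using only the standard library. It is not Hoplite's loader: `read_field_types`, `read_level_metadata`, and the `CONVERTERS` table are hypothetical names, the `float` converter is an assumption (the diff only shows `str`, `int`, and `bytes`), and the `habitat`/`latitude` description rows are illustrative because the diff elides them. Rows are keyed by their first column, as the README requires, and only columns declared for the requested `metadata_level` are kept.

```python
import csv
import io

# Hypothetical mapping from the `type` column to a converter. `float` is an
# assumption; the README diff only shows `str`, `int`, and `bytes`.
CONVERTERS = {
    'str': str,
    'int': int,
    'float': float,
    'bytes': str.encode,  # Assumes the CSV text is encoded as UTF-8.
}


def read_field_types(description_csv: str, level: str) -> dict[str, str]:
    """Returns {field_name: type} for rows whose metadata_level is `level`."""
    reader = csv.DictReader(io.StringIO(description_csv))
    return {
        row['field_name']: row['type']
        for row in reader
        if row['metadata_level'] == level
    }


def read_level_metadata(
    level_csv: str, field_types: dict[str, str]
) -> dict[str, dict[str, object]]:
    """Keys each row by its first column and casts the declared fields."""
    rows = list(csv.reader(io.StringIO(level_csv)))
    header, body = rows[0], rows[1:]
    metadata = {}
    for row in body:
        fields = {}
        for name, raw in zip(header[1:], row[1:]):
            if name in field_types:  # Skip columns not declared for this level.
                fields[name] = CONVERTERS[field_types[name]](raw)
        metadata[row[0]] = fields
    return metadata


# Illustrative description rows; the diff elides the real ones.
description_csv = (
    'field_name,metadata_level,type,description\n'
    'habitat,deployment,str,Habitat type.\n'
    'latitude,deployment,float,Deployment latitude.\n'
    'mic_type,recording,str,Microphone type.\n'
)
# Copied from the hoplite_deployments_metadata.csv example above.
deployments_csv = (
    'deployment_name,habitat,latitude\n'
    'DEP01,"forest",47.6\n'
    'DEP02,"grassland",45.1\n'
)
deployment_md = read_level_metadata(
    deployments_csv, read_field_types(description_csv, 'deployment')
)
print(deployment_md['DEP01'])  # {'habitat': 'forest', 'latitude': 47.6}
```

Skipping undeclared columns, rather than failing on them, is a choice of this sketch; a stricter loader could raise instead.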
@@ -98,7 +98,8 @@ and `Recording` objects returned by the database interface (e.g.,

 If you have existing annotations for your audio data, Hoplite can ingest these
 during the embedding process. Annotations should be stored in CSV files named
-`annotations.csv` alongside your audio data. Each `annotations.csv` should
+`hoplite_annotations.csv` alongside your audio data. Each
+`hoplite_annotations.csv` should
 contain columns for `recording` (the filename or file_id of the audio),
 `start_offset_s`, `end_offset_s`, `label`, and `label_type` ('positive',
 'negative', or 'uncertain'). When embeddings are generated, Hoplite will find
@@ -107,7 +108,7 @@ appropriate time windows.

 ### Example

-**`annotations.csv`**
+**`hoplite_annotations.csv`**

 ```csv
 recording,start_offset_s,end_offset_s,label,label_type
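The annotation format described above can be parsed in a few lines. This is a hypothetical sketch, not Hoplite's reader: `Annotation` and `read_annotations` are illustrative names, and the example rows are made up. It validates `label_type` against the three values the README lists and groups rows by `recording`, similar in shape to the `file_id` to `annotation_list` mapping that `add_annotations` iterates over in `embed.py`.

```python
import collections
import csv
import dataclasses
import io

# The three label types listed in the README.
VALID_LABEL_TYPES = frozenset({'positive', 'negative', 'uncertain'})


@dataclasses.dataclass(frozen=True)
class Annotation:
    """One row of a hoplite_annotations.csv file (hypothetical container)."""

    start_offset_s: float
    end_offset_s: float
    label: str
    label_type: str


def read_annotations(csv_text: str) -> dict[str, list[Annotation]]:
    """Parses annotation rows and groups them by the `recording` column."""
    grouped: dict[str, list[Annotation]] = collections.defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        label_type = row['label_type']
        if label_type not in VALID_LABEL_TYPES:
            raise ValueError(f'Unknown label_type: {label_type!r}')
        grouped[row['recording']].append(
            Annotation(
                start_offset_s=float(row['start_offset_s']),
                end_offset_s=float(row['end_offset_s']),
                label=row['label'],
                label_type=label_type,
            )
        )
    return dict(grouped)


# Made-up rows for illustration.
example_csv = (
    'recording,start_offset_s,end_offset_s,label,label_type\n'
    'DEP01/rec1.wav,0.0,5.0,bird_a,positive\n'
    'DEP01/rec1.wav,5.0,10.0,bird_a,negative\n'
    'DEP02/rec7.wav,1.5,3.0,bird_b,uncertain\n'
)
annotations = read_annotations(example_csv)
```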

perch_hoplite/agile/embed.py

Lines changed: 3 additions & 1 deletion
@@ -220,6 +220,7 @@ def _get_or_insert_deployment_id(
     )
     if not deployments:
       md = self.metadata[project_name].get_deployment_metadata(deployment_name)
+      md.pop('deployment', None)
       return self.db.insert_deployment(
           name=deployment_name,
           project=project_name,
@@ -243,6 +244,7 @@ def _get_or_insert_recording_id(
     )
     if not recordings:
       md = self.metadata[dataset_name].get_recording_metadata(filename)
+      md.pop('recording', None)
       return (
           self.db.insert_recording(
               filename=filename,
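Both `embed.py` hunks above remove the identifier key (`deployment` or `recording`) from the metadata dict before the insert call. One plausible reading, given the commit message about building metadata directly from a dataframe, is that each row now carries its identifier column, which would otherwise be stored a second time next to `name=` or `filename=`. The sketch below is hypothetical: `insert_deployment` is a stand-in that accepts extra metadata as keyword arguments, because the truncated hunks do not show how `md` is forwarded.

```python
from typing import Any


def insert_deployment(name: str, project: str, **metadata: Any) -> dict[str, Any]:
    """Hypothetical stand-in for a DB insert; returns the stored row."""
    return {'name': name, 'project': project, **metadata}


def insert_from_row(
    deployment_name: str, project_name: str, row: dict[str, Any]
) -> dict[str, Any]:
    md = dict(row)  # Copy so the caller's row is left untouched.
    # Drop the identifier column; the None default makes this a no-op when
    # the key is missing, as with metadata that never contained it.
    md.pop('deployment', None)
    return insert_deployment(name=deployment_name, project=project_name, **md)


# A row as it might come from a deployments dataframe (illustrative values).
row = {'deployment': 'DEP01', 'habitat': 'forest', 'latitude': 47.6}
stored = insert_from_row('DEP01', 'my_project', row)
print(stored)
# {'name': 'DEP01', 'project': 'my_project', 'habitat': 'forest', 'latitude': 47.6}
```

Using `pop` with a default rather than `del` keeps the call safe for metadata sources that never include the identifier. The copy before popping is a choice of this sketch; the diff calls `pop` on the returned dict directly.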
@@ -296,7 +298,7 @@ def add_annotations(self, target_dataset_name: str | None = None):
       if not agile_md.annotations:
         continue
       for file_id, annotation_list in agile_md.annotations.items():
-        depl_name = os.path.split(file_id)[0]
+        depl_name = file_id.split('/')[0]
         if not depl_name:
           logging.warning(
               'Could not get deployment name from file_id %s, skipping.',
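The last hunk changes how the deployment name is derived from a `file_id`. The two expressions agree for the `deployment/recording.wav` layout the README describes but differ for nested paths and bare filenames, as this small comparison shows (the helper names are hypothetical). `os.path.split` returns everything before the last separator, while `str.split('/')[0]` returns the first component and always splits on `/`, regardless of platform.

```python
import os.path


def deployment_via_os_path(file_id: str) -> str:
    """Old expression: everything before the last path separator."""
    return os.path.split(file_id)[0]


def deployment_via_str_split(file_id: str) -> str:
    """New expression: the first '/'-separated component."""
    return file_id.split('/')[0]


for file_id in ['DEP01/rec.wav', 'DEP01/day1/rec.wav', 'rec.wav']:
    print(
        f'{file_id!r}: old={deployment_via_os_path(file_id)!r}, '
        f'new={deployment_via_str_split(file_id)!r}'
    )
# 'DEP01/rec.wav': old='DEP01', new='DEP01'
# 'DEP01/day1/rec.wav': old='DEP01/day1', new='DEP01'
# 'rec.wav': old='', new='rec.wav'
```

For a `file_id` with no `/`, the new expression returns the whole string rather than `''`, so the `if not depl_name` guard only fires for inputs such as an empty string or one starting with `/`.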
