|
1 | | -# Noise Accessor |
| 1 | +# Power Spectral Density Parquet File Retrieval and Analysis Functionality |
2 | 2 |
|
3 | | -The accessor is the toolkit used for accessing the stored files. This is done by initializing a NoiseAccessor object for a specific hydrophone, and then requesting a time range and optional time and frequency resolution (or granularity). The accessor scans the generated archive files, loads the correct ones, concatenates the data into a single dataframe, and then trims any data outside of the requested range. |
| 3 | +These modules retrieve parquet files of hydrophone power spectral density (PSD) and broadband sound level stored on AWS S3,
| 4 | +and include functionality to analyze that sound data.
4 | 5 |
|
5 | | -Example: |
| 6 | +## partitioned_accessor |
| 7 | + |
| 8 | +The accessor uses the Python Polars library to retrieve partitioned parquet files, with lazy loading for fast on-demand data retrieval.
| 9 | + |
| 10 | +Current partition structure: |
| 11 | +* `psd/hydrophone=###/year=####/month=##/day=##/`
| 12 | +* `broadband/hydrophone=###/year=####/month=##/day=##/`
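The partition layout above can be sketched as a small path builder. This is only an illustration: the function names, the hydrophone id `"001"`, and the zero-padding widths (read off the `#` placeholders) are assumptions, not part of the accessor's API.

```python
import datetime as dt


def partition_prefix(dataset: str, hydrophone_id: str, day: dt.date) -> str:
    """Build the key prefix for one day of data, following the layout above."""
    return (
        f"{dataset}/hydrophone={hydrophone_id}"
        f"/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def partition_prefixes(dataset, hydrophone_id, start, end):
    """List one prefix per calendar day from start to end, inclusive."""
    n_days = (end.date() - start.date()).days
    return [
        partition_prefix(dataset, hydrophone_id, start.date() + dt.timedelta(days=i))
        for i in range(n_days + 1)
    ]


# "001" is a hypothetical hydrophone id used only for illustration.
prefixes = partition_prefixes(
    "psd", "001", dt.datetime(2026, 2, 5), dt.datetime(2026, 2, 6)
)
```

Because each day lives under its own prefix, a reader only needs to touch the prefixes that overlap the requested time range.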
| 13 | + |
| 14 | +### Dependencies |
| 15 | + |
| 16 | +* Requires the AWS CLI on PATH (external install)
| 17 | + |
| 18 | +### Current analytical metrics |
| 19 | + |
| 20 | +* Broadband sound level for a given frequency range |
| 21 | +  * use 500-15000 Hz for the orca communication band
| 22 | +  * use >15000 Hz for the orca echolocation band
| 23 | +* 0.05, 0.25, 0.75, 0.95 broadband quantiles for a given range |
| 24 | +* Quantile vs dB range of broadband
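As a rough illustration of the quantile metric listed above, the sketch below computes the 0.05, 0.25, 0.75 and 0.95 quantiles of a list of broadband levels using only the standard library. The function names and the linear-interpolation rule are assumptions; the accessor's own `get_quantiles` may use a different interpolation.

```python
import math


def quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def broadband_quantiles(levels_db, qs=(0.05, 0.25, 0.75, 0.95)):
    """Map each requested quantile to its broadband level in dB."""
    return {q: quantile(levels_db, q) for q in qs}


# Made-up broadband levels in dB, for illustration only.
levels_db = [100.0, 102.0, 104.0, 106.0, 108.0]
summary = broadband_quantiles(levels_db)
```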
| 25 | + |
| 26 | +### Example |
6 | 27 |
|
7 | 28 | ```python |
8 | | -from src.orcasound_noise.analysis import NoiseAcccessor |
| 29 | +import datetime as dt |
| 30 | +from orcasound_noise.analysis.partitioned_accessor import PartitionedAccessor
| 31 | +from orcasound_noise.utils import Hydrophone |
| 32 | + |
| 33 | +# start and end time for time range of dataset |
| 34 | +start = dt.datetime(2026, 2, 5, 0, 0, 0) |
| 35 | +end = dt.datetime(2026, 2, 6, 0, 0, 0) |
9 | 36 |
|
10 | | -ac = NoiseAcccessor(Hydrophone.ORCASOUND_LAB) |
11 | | -df = ac.create_df(dt.datetime(2023, 2, 1), dt.datetime(2023, 2, 2), delta_t=10, delta_f="3oct") |
12 | | -print(df.shape) # (8638, 26) |
| 37 | +pa_orcalab = PartitionedAccessor(Hydrophone.ORCASOUND_LAB, start, end) |
| 38 | + |
| 39 | +# start and end time of a specific ship passage, or other event of interest |
| 40 | +start_ship = dt.datetime(2026, 2, 5, 12, 30, 0) |
| 41 | +end_ship = dt.datetime(2026, 2, 5, 12, 55, 0) |
| 42 | + |
| 43 | +quantiles = pa_orcalab.get_quantiles(start_ship, end_ship) |
13 | 44 | ``` |
14 | 45 |
|
15 | | -where the parameters `delta_t=10` and `delta_f="3oct"` specify computation of 1/3-octave band levels over 10-second time intervals. |
| 46 | +### Overview of Broadband sound level calculation from PSD |
| 47 | + |
| 48 | +Assume broadband $SPL$ is represented as follows: |
| 49 | + |
| 50 | +$$ |
| 51 | +SPL = 10\log_{10}\frac{p^2(t)}{p^2_{ambient}} \quad \text{or} \quad SPL = 10\log_{10}\frac{V^2(t)}{V^2_{ambient}}
| 52 | +$$ |
| 53 | + |
| 54 | +where: |
| 55 | + |
| 56 | +$p^2(t) = V^2(t)/sensitivity^2$
| 57 | +
| 58 | +$p^2(t)$ has units of pascals squared ($Pa^2$) and is the mean square of the pressure waveform over a given windowing time, $t$
| 59 | +
| 60 | +$V^2(t)$ is the mean square of the voltage waveform generated by the hydrophone
16 | 61 |
|
17 | | -# Usage |
| 62 | +$sensitivity$ has units of $V/Pa$ and characterizes the sensitivity of the hydrophone |
18 | 63 |
|
19 | | -To initialize a NoiseAccessor object, all that is needed a Hydrophone enum instance. This instance contains all needed connection info. |
| 64 | +$p^2_{ambient}$ or $V^2_{ambient}$ is the mean square of the waveform over a period of time that is assumed to reflect the ancient ambient noise of Puget Sound.
20 | 65 |
|
21 | | -## Create a Dataframe |
| 66 | +Since the sound level is a ratio, the sensitivity cancels out and the sound pressure level can be computed directly from the voltage waveform.
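The cancellation can be checked numerically. In this sketch all values are made up: pressure is taken as voltage divided by sensitivity (in $V/Pa$), so mean squares scale with the sensitivity squared, and the level comes out the same from either waveform.

```python
import math


def spl_db(mean_square, mean_square_ambient):
    """Level in dB of a mean-square value relative to the ambient mean square."""
    return 10 * math.log10(mean_square / mean_square_ambient)


sensitivity = 0.01             # hypothetical hydrophone sensitivity, V/Pa
v2, v2_ambient = 4.0e-6, 1.0e-8  # hypothetical mean-square voltages, V^2

# Convert mean-square voltage to mean-square pressure (Pa^2).
p2 = v2 / sensitivity**2
p2_ambient = v2_ambient / sensitivity**2

spl_from_voltage = spl_db(v2, v2_ambient)
spl_from_pressure = spl_db(p2, p2_ambient)
```

Both results agree up to floating-point rounding because the same factor appears in the numerator and the denominator.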
22 | 67 |
|
23 | | -The NoiseAccessor object has a create_df method that can be used to generate dataframes of requested ranges. It needs the following arguments: |
| 68 | +#### PSD to broadband sound level |
24 | 69 |
|
25 | | -- start: datetime object representing start of range |
26 | | -- end: datetime object representing end of range |
27 | | -- delta_t: Int, Time interval to find |
28 | | -- delta_f: Str, Hz frequency to find. Use format '50hz' for linear hz bands or '3oct' for octave bands |
29 | | -- round_timestamps: Bool, default False. Set to True to round timestamps to the delta_t frequency. Good for when grouping by time. |
| 70 | +$$ |
| 71 | + p^2= \sum_{k=f_1}^{f_2} PSD(k) \times \Delta f |
| 72 | +$$ |
30 | 73 |
|
31 | | -Currently, only 1 second 3rd octave files (`delta_t=1, delta_f="3oct"`) are periodically generated and available in AWS: anything else must be manually created and uploaded first using the [NoiseAnalysisPipeline](../pipeline/README.md). |
| 74 | +where $PSD(k)$ has units of $Pa^2/Hz$ and $\Delta f$ is the width of frequency band $k$ in $Hz$.
32 | 75 |
|
33 | | -## delta_f |
| 76 | +Our PSD data is reported in dB re $Pa^2/Hz$, so the values need to be converted back to linear units with:
34 | 77 |
|
35 | | -This argument is a string to allow different frequency banding methods. Note that only frequency bands that have been pre-compiled are available to access. |
| 78 | +$$ |
| 79 | +PSD(k) = p_{ambient}^2 \cdot 10^{PSD(k)_{dB}/10}
| 80 | +$$ |
36 | 81 |
|
37 | | -- To access linear frequency bands, use the "hz" suffix. For example, a "50hz" would return frequency bounds in columns like [0, 50, 100, 150...] |
38 | | -- To access (fractions of) octave bands, use the "oct" suffix. "3oct" will return the 1/3 octave bands, starting with [63, 80, 100, 125, 160...] |
39 | | -- To access broadband noise, use the "broadband" suffix. This returns a single column representing the total noise level across all frequencies sensed by the hydrophone recording system. |
| 82 | +#### $\Delta f$ given 1/12 octave bands |
40 | 83 |
|
41 | | -## round_timestamps |
| 84 | +Take $n = 12$ for 1/12-octave bands and $f_c$ as the center frequency reported in the PSD.
42 | 85 |
|
43 | | -Due to the nature of Orcasound's source data (see the [orcanode repo](https://github.com/orcasound/orcanode)), timestamps can experience some drift in the nanosecond precision. A dataframe may start with 00:00:00.010 but may end with 00:00:00.020 or a larger gap. |
| 86 | +$f_{i,low} = \frac{f_c}{2^{1/(2n)}}$ and $f_{i,high} = f_c \cdot 2^{1/(2n)}$
44 | 87 |
|
45 | | -If you want to do time-based analysis across multiple days, this can cause mis-alignment. To correct, set the _round_timestamps_ argument to true. This will round the timestamps to the delta_t value's precision, dropping nanosecond values. For example, at delta_t=10 and round_timestamps=True, every timestamp will be a multiple of 10 seconds from the minute. |
| 88 | +$\Delta f_i = f_{i,high} - f_{i,low} = f_c \left( 2^{1/(2n)} - \frac{1}{2^{1/(2n)}} \right)$
46 | 89 |
|
47 | | -_*Warning*_ Rounding is only available when delta_t is a divisor of 60. |
| 90 | +Then, for $n = 12$:
48 | 91 |
|
49 | | -# Structure |
| 92 | +$\Delta f_i \approx 0.0578 \cdot f_c$
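Putting the pieces together, a minimal sketch of the PSD-to-broadband calculation might look like the following. The function names, the `ref` parameter (standing in for $p_{ambient}^2$, defaulting to 1), and the choice to select bands by center frequency are all assumptions for illustration, not the accessor's actual implementation.

```python
import math


def band_width(f_c: float, n: int = 12) -> float:
    """Width in Hz of a 1/n-octave band centered at f_c."""
    half_step = 2 ** (1 / (2 * n))
    return f_c * (half_step - 1 / half_step)


def broadband_level_db(centers_hz, psd_db, f_low, f_high, n=12, ref=1.0):
    """Sum PSD * band width over bands whose center lies in [f_low, f_high]."""
    total = 0.0
    for f_c, level_db in zip(centers_hz, psd_db):
        if f_low <= f_c <= f_high:
            psd_linear = ref * 10 ** (level_db / 10)  # dB back to linear units
            total += psd_linear * band_width(f_c, n)
    if total <= 0:
        raise ValueError("no PSD bands fall inside the requested range")
    return 10 * math.log10(total / ref)  # back to dB relative to the same reference


# Band width as a fraction of the center frequency for 1/12-octave bands.
ratio = band_width(1000.0) / 1000.0

# A single band at 1 kHz with a PSD of 0 dB, for illustration only.
level = broadband_level_db([1000.0], [0.0], 500.0, 15000.0)
```

For a single band the broadband level reduces to the PSD level plus $10\log_{10}\Delta f$, which is a quick sanity check on the summation.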