Commit 50466c8

Merge pull request #199 from ehinman/add-continuous
Add continuous endpoint
2 parents 237ed81 + 9631044 commit 50466c8

6 files changed: +211 -42 lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+**12/04/2025:** The `get_continuous()` function was added to the `waterdata` module, which provides access to measurements collected via automated sensors at a high frequency (often 15 minute intervals) at a monitoring location. This is an early version of the continuous endpoint and should be used with caution as the API team improves its performance. In the future, we anticipate the addition of one or more endpoints specifically for handling large data requests, so it may make sense for power users to hold off on heavy development using the new continuous endpoint.
+
 **11/24/2025:** `dataretrieval` is pleased to offer a new module, `waterdata`, which gives users access to USGS's modernized [Water Data APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include daily values, instantaneous values, field measurements (modernized groundwater levels service), time series metadata, and discrete water quality data from the Samples database. Though there will be a period of overlap, the functions within `waterdata` will eventually replace the `nwis` module, which currently provides access to the legacy [NWIS Water Services](https://waterservices.usgs.gov/). More example workflows and functions coming soon. Check `help(waterdata)` for more information.
 
 **09/03/2024:** The groundwater levels service has switched endpoints, and `dataretrieval` was updated accordingly in [`v1.0.10`](https://github.com/DOI-USGS/dataretrieval-python/releases/tag/v1.0.10). Older versions using the discontinued endpoint will return 503 errors for `nwis.get_gwlevels` or the `service='gwlevels'` argument. Visit [Water Data For the Nation](https://waterdata.usgs.gov/blog/wdfn-waterservices-2024/) for more information.

README.md

Lines changed: 26 additions & 14 deletions
@@ -6,14 +6,16 @@
 
 ## Latest Announcements
 
-:mega: **11/24/2025:** `dataretrieval` now features the new `waterdata` module,
+:mega: **12/04/2025:** `dataretrieval` now features the new `waterdata` module,
 which provides access to USGS's modernized [Water Data
 APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include
-daily values, instantaneous values, field measurements, time series metadata,
+daily values, **instantaneous values**, field measurements, time series metadata,
 and discrete water quality data from the Samples database. This new module will
 eventually replace the `nwis` module, which provides access to the legacy [NWIS
 Water Services](https://waterservices.usgs.gov/).
 
+Check out the [NEWS](NEWS.md) file for all updates and announcements.
+
 **Important:** Users of the Water Data APIs are strongly encouraged to obtain an
 API key for higher rate limits and greater access to USGS data. [Register for
 an API key](https://api.waterdata.usgs.gov/signup/) and set it as an
@@ -24,8 +26,6 @@ import os
 os.environ["API_USGS_PAT"] = "your_api_key_here"
 ```
 
-Check out the [NEWS](NEWS.md) file for all updates and announcements.
-
 ## What is dataretrieval?
 
 `dataretrieval` simplifies the process of loading hydrologic data into Python.
@@ -61,9 +61,9 @@ pip install git+https://github.com/DOI-USGS/dataretrieval-python.git
 
 The `waterdata` module provides access to modern USGS Water Data APIs.
 
-The example below retrieves daily streamflow data for a specific monitoring
-location for water year 2025, where a "/" between two dates in the "time"
-input argument indicates a desired date range:
+Some basic usage examples include retrieving daily streamflow data for a
+specific monitoring location, where the `/` in the `time` argument indicates
+the desired range:
 
 ```python
 from dataretrieval import waterdata
@@ -79,8 +79,7 @@ print(f"Retrieved {len(df)} records")
 print(f"Site: {df['monitoring_location_id'].iloc[0]}")
 print(f"Mean discharge: {df['value'].mean():.2f} {df['unit_of_measure'].iloc[0]}")
 ```
-Fetch daily discharge data for multiple sites from a start date to present
-using the following code:
+Retrieving streamflow at multiple locations from October 1, 2024 to the present:
 
 ```python
 df, metadata = waterdata.get_daily(
@@ -91,18 +90,31 @@ df, metadata = waterdata.get_daily(
 
 print(f"Retrieved {len(df)} records")
 ```
-The following example downloads location information for all monitoring
-locations that are categorized as stream sites in the state of Maryland:
+Retrieving location information for all monitoring locations categorized as
+stream sites in the state of Maryland:
 
 ```python
 # Get monitoring location information
-locations, metadata = waterdata.get_monitoring_locations(
+df, metadata = waterdata.get_monitoring_locations(
     state_name='Maryland',
     site_type_code='ST'  # Stream sites
 )
 
-print(f"Found {len(locations)} stream monitoring locations in Maryland")
+print(f"Found {len(df)} stream monitoring locations in Maryland")
 ```
+Finally, retrieving continuous (a.k.a. "instantaneous") data
+for one location. We *strongly advise* breaking up continuous data requests into smaller time periods and collections to avoid timeouts and other issues:
+
+```python
+# Get continuous data for a single monitoring location and water year
+df, metadata = waterdata.get_continuous(
+    monitoring_location_id='USGS-01646500',
+    parameter_code='00065',  # Gage height
+    time='2024-10-01/2025-09-30'
+)
+print(f"Retrieved {len(df)} continuous gage height measurements")
+```
+
 Visit the
 [API Reference](https://doi-usgs.github.io/dataretrieval-python/reference/waterdata.html)
 for more information and examples on available services and input parameters.
@@ -202,13 +214,13 @@ print(f"Found {len(flowlines)} upstream tributaries within 50km")
 
 ### Modern USGS Water Data APIs (Recommended)
 - **Daily values**: Daily statistical summaries (mean, min, max)
+- **Instantaneous values**: High-frequency continuous data
 - **Field measurements**: Discrete measurements from field visits
 - **Monitoring locations**: Site information and metadata
 - **Time series metadata**: Information about available data parameters
 - **Latest daily values**: Most recent daily statistical summary data
 - **Latest instantaneous values**: Most recent high-frequency continuous data
 - **Samples data**: Discrete USGS water quality data
-- **Instantaneous values** (*COMING SOON*): High-frequency continuous data
 
 ### Legacy NWIS Services (Deprecated)
 - **Daily values (dv)**: Legacy daily statistical data
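
The README text in this diff advises breaking continuous requests into smaller time periods. One way to do that is sketched below; it is not part of this commit. The helper name `split_time_range` and the 90-day chunk size are assumptions chosen for illustration. It produces `start/end` strings in the same form as the `time` argument used in the README example. How the service treats date-only interval endpoints is not stated in this diff, so chunk boundaries should be checked for gaps or duplicated records.

```python
from datetime import date, timedelta
from typing import Iterator


def split_time_range(start: date, end: date, days: int = 90) -> Iterator[str]:
    """Yield 'YYYY-MM-DD/YYYY-MM-DD' strings that cover start..end inclusive.

    Hypothetical helper: the name and the default chunk size are
    illustrative assumptions, not part of dataretrieval.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    chunk_start = start
    while chunk_start <= end:
        # Each chunk spans at most `days` calendar days, clipped to `end`.
        chunk_end = min(chunk_start + timedelta(days=days - 1), end)
        yield f"{chunk_start.isoformat()}/{chunk_end.isoformat()}"
        chunk_start = chunk_end + timedelta(days=1)


# Water year 2025, split into chunks of up to 90 days.
intervals = list(split_time_range(date(2024, 10, 1), date(2025, 9, 30)))
for interval in intervals:
    print(interval)
```

Each string could then be passed as `time=` to `waterdata.get_continuous`, with the returned data frames combined afterwards (for example with `pandas.concat`).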

dataretrieval/waterdata/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@
 from .api import (
     _check_profiles,
     get_codes,
+    get_continuous,
     get_daily,
     get_field_measurements,
     get_latest_continuous,
@@ -30,6 +31,7 @@
 
 __all__ = [
     "get_codes",
+    "get_continuous",
     "get_daily",
     "get_field_measurements",
     "get_latest_continuous",

dataretrieval/waterdata/api.py

Lines changed: 165 additions & 0 deletions
@@ -204,6 +204,171 @@ def get_daily(
 
     return get_ogc_data(args, output_id, service)
 
+def get_continuous(
+    monitoring_location_id: Optional[Union[str, List[str]]] = None,
+    parameter_code: Optional[Union[str, List[str]]] = None,
+    statistic_id: Optional[Union[str, List[str]]] = None,
+    properties: Optional[List[str]] = None,
+    time_series_id: Optional[Union[str, List[str]]] = None,
+    continuous_id: Optional[Union[str, List[str]]] = None,
+    approval_status: Optional[Union[str, List[str]]] = None,
+    unit_of_measure: Optional[Union[str, List[str]]] = None,
+    qualifier: Optional[Union[str, List[str]]] = None,
+    value: Optional[Union[str, List[str]]] = None,
+    last_modified: Optional[str] = None,
+    time: Optional[Union[str, List[str]]] = None,
+    limit: Optional[int] = None,
+    convert_type: bool = True,
+) -> Tuple[pd.DataFrame, BaseMetadata]:
+    """
+    Continuous data provide instantaneous water conditions.
+
+    This is an early version of the continuous endpoint that is feature-complete
+    and is being made available for limited use. Geometries are not included
+    with the continuous endpoint. If the "time" input is left blank, the service
+    will return the most recent year of measurements. Users may request no more
+    than three years of data with each function call.
+
+    Continuous data are collected at a high frequency, typically 15-minute
+    intervals. Depending on the specific monitoring location, the data may be
+    transmitted automatically via telemetry and be available on WDFN within
+    minutes of collection, while other times the delivery of data may be delayed
+    if the monitoring location does not have the capacity to automatically
+    transmit data. Continuous data are described by parameter name and
+    parameter code (pcode). These data might also be referred to as
+    "instantaneous values" or "IV".
+
+    Parameters
+    ----------
+    monitoring_location_id : string or list of strings, optional
+        A unique identifier representing a single monitoring location. This
+        corresponds to the id field in the monitoring-locations endpoint.
+        Monitoring location IDs are created by combining the agency code of
+        the agency responsible for the monitoring location (e.g. USGS) with
+        the ID number of the monitoring location (e.g. 02238500), separated
+        by a hyphen (e.g. USGS-02238500).
+    parameter_code : string or list of strings, optional
+        Parameter codes are 5-digit codes used to identify the constituent
+        measured and the units of measure. A complete list of parameter
+        codes and associated groupings can be found at
+        https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
+    statistic_id : string or list of strings, optional
+        A code corresponding to the statistic an observation represents.
+        Continuous data are nearly always associated with statistic id
+        00011. Using a different code (such as 00003 for mean) will
+        typically return no results. A complete list of codes and their
+        descriptions can be found at
+        https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
+    properties : string or list of strings, optional
+        A list of requested columns to be returned from the query.
+        Available options are: geometry, id, time_series_id,
+        monitoring_location_id, parameter_code, statistic_id, time, value,
+        unit_of_measure, approval_status, qualifier, last_modified
+    time_series_id : string or list of strings, optional
+        A unique identifier representing a single time series. This
+        corresponds to the id field in the time-series-metadata endpoint.
+    continuous_id : string or list of strings, optional
+        A universally unique identifier (UUID) representing a single version of
+        a record. It is not stable over time. Every time the record is refreshed
+        in our database (which may happen as part of normal operations and does
+        not imply any change to the data itself) a new ID will be generated. To
+        uniquely identify a single observation over time, compare the time and
+        time_series_id fields; each time series will only have a single
+        observation at a given time.
+    approval_status : string or list of strings, optional
+        Some of the data that you have obtained from this U.S. Geological Survey
+        database may not have received Director's approval. Any such data values
+        are qualified as provisional and are subject to revision. Provisional
+        data are released on the condition that neither the USGS nor the United
+        States Government may be held liable for any damages resulting from its
+        use. This field reflects the approval status of each record, and is either
+        "Approved", meaning processing review has been completed and the data is
+        approved for publication, or "Provisional" and subject to revision. For
+        more information about provisional data, go to:
+        https://waterdata.usgs.gov/provisional-data-statement/.
+    unit_of_measure : string or list of strings, optional
+        A human-readable description of the units of measurement associated
+        with an observation.
+    qualifier : string or list of strings, optional
+        This field indicates any qualifiers associated with an observation, for
+        instance if a sensor may have been impacted by ice or if values were
+        estimated.
+    value : string or list of strings, optional
+        The value of the observation. Values are transmitted as strings in
+        the JSON response format in order to preserve precision.
+    last_modified : string, optional
+        The last time a record was refreshed in our database. This may happen
+        due to regular operational processes and does not necessarily indicate
+        anything about the measurement has changed. You can query this field
+        using date-times or intervals, adhering to RFC 3339, or using ISO 8601
+        duration objects. Intervals may be bounded or half-bounded (double-dots
+        at start or end).
+        Examples:
+
+        * A date-time: "2018-02-12T23:20:50Z"
+        * A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z"
+        * Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z"
+        * Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours
+
+        Only features that have a last_modified that intersects the value of
+        datetime are selected.
+    time : string, optional
+        The date an observation represents. You can query this field using
+        date-times or intervals, adhering to RFC 3339, or using ISO 8601
+        duration objects. Intervals may be bounded or half-bounded (double-dots
+        at start or end). Only features that have a time that intersects the
+        value of datetime are selected. If a feature has multiple temporal
+        properties, it is the decision of the server whether only a single
+        temporal property is used to determine the extent or all relevant
+        temporal properties.
+        Examples:
+
+        * A date-time: "2018-02-12T23:20:50Z"
+        * A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z"
+        * Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z"
+        * Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours
+
+    limit : numeric, optional
+        The optional limit parameter is used to control the subset of the
+        selected features that should be returned in each page. The maximum
+        allowable limit is 10000. It may be beneficial to set this number lower
+        if your internet connection is spotty. The default (None) will set the
+        limit to the maximum allowable limit for the service.
+    convert_type : boolean, optional
+        If True, the function will convert the data to dates and qualifier to
+        string vector
+
+    Returns
+    -------
+    df : ``pandas.DataFrame`` or ``geopandas.GeoDataFrame``
+        Formatted data returned from the API query.
+    md: :obj:`dataretrieval.utils.Metadata`
+        A custom metadata object
+
+    Examples
+    --------
+    .. code::
+
+        >>> # Get instantaneous gage height data from a
+        >>> # single site from a single year
+        >>> df, md = dataretrieval.waterdata.get_continuous(
+        ...     monitoring_location_id="USGS-02238500",
+        ...     parameter_code="00065",
+        ...     time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z",
+        ... )
+    """
+    service = "continuous"
+    output_id = "continuous_id"
+
+    # Build argument dictionary, omitting None values
+    args = {
+        k: v
+        for k, v in locals().items()
+        if k not in {"service", "output_id"} and v is not None
+    }
+
+    return get_ogc_data(args, output_id, service)
+
 
 def get_monitoring_locations(
     monitoring_location_id: Optional[List[str]] = None,
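
The argument-building step at the end of `get_continuous` uses `locals()` to collect only the keyword arguments that were actually supplied. The sketch below isolates that pattern so its behavior is easier to see. `build_args` is a hypothetical stand-in with a reduced parameter list, and it returns the dictionary instead of passing it to `get_ogc_data`.

```python
from typing import Any, Dict, Optional


def build_args(
    monitoring_location_id: Optional[str] = None,
    parameter_code: Optional[str] = None,
    time: Optional[str] = None,
    convert_type: bool = True,
) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the filtering step in the diff."""
    service = "continuous"
    output_id = "continuous_id"

    # locals() is evaluated in the function scope, so at this point it holds
    # the parameters plus the two bookkeeping names defined above. Those two
    # are excluded by name, and any parameter left at None is dropped.
    args = {
        k: v
        for k, v in locals().items()
        if k not in {"service", "output_id"} and v is not None
    }
    return args


args = build_args(monitoring_location_id="USGS-02238500", parameter_code="00065")
print(args)
```

Note that `convert_type` is kept even when it is `False`, because the filter tests `is not None` rather than truthiness. Any other local variable defined before the comprehension would also end up in `args` unless it is excluded by name.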

dataretrieval/waterdata/utils.py

Lines changed: 0 additions & 20 deletions
@@ -773,23 +773,3 @@ def get_ogc_data(
     metadata = BaseMetadata(response)
     return return_list, metadata
 
-
-# def _get_description(service: str):
-#     tags = _get_collection().get("tags", [])
-#     for tag in tags:
-#         if tag.get("name") == service:
-#             return tag.get("description")
-#     return None
-
-# def _get_params(service: str):
-#     url = f"{_base_url()}collections/{service}/schema"
-#     resp = requests.get(url, headers=_default_headers())
-#     resp.raise_for_status()
-#     properties = resp.json().get("properties", {})
-#     return {k: v.get("description") for k, v in properties.items()}
-
-# def _get_collection():
-#     url = f"{_base_url()}openapi?f=json"
-#     resp = requests.get(url, headers=_default_headers())
-#     resp.raise_for_status()
-#     return resp.json()

tests/waterdata_test.py

Lines changed: 16 additions & 8 deletions
@@ -10,6 +10,7 @@
     _check_profiles,
     get_samples,
     get_daily,
+    get_continuous,
     get_monitoring_locations,
     get_latest_continuous,
     get_latest_daily,
@@ -142,7 +143,7 @@ def test_get_daily_properties():
     assert df.parameter_code.unique().tolist() == ["00060"]
 
 def test_get_daily_no_geometry():
-    df, md = get_daily(
+    df, _ = get_daily(
         monitoring_location_id="USGS-05427718",
         parameter_code="00060",
         time="2025-01-01/..",
@@ -152,6 +153,18 @@ def test_get_daily_no_geometry():
     assert df.shape[1] == 11
     assert isinstance(df, DataFrame)
 
+def test_get_continuous():
+    df, _ = get_continuous(
+        monitoring_location_id="USGS-06904500",
+        parameter_code="00065",
+        time="2025-01-01/2025-12-31"
+    )
+    assert isinstance(df, DataFrame)
+    assert "geometry" not in df.columns
+    assert df.shape[1] == 11
+    assert df['time'].dtype == 'datetime64[ns, UTC]'
+    assert "continuous_id" in df.columns
+
 def test_get_monitoring_locations():
     df, md = get_monitoring_locations(
         state_name="Connecticut",
@@ -162,7 +175,7 @@ def test_get_monitoring_locations():
     assert hasattr(md, 'query_time')
 
 def test_get_monitoring_locations_hucs():
-    df, md = get_monitoring_locations(
+    df, _ = get_monitoring_locations(
         hydrologic_unit_code=["010802050102", "010802050103"]
     )
     assert set(df.hydrologic_unit_code.unique().tolist()) == {"010802050102", "010802050103"}
@@ -177,12 +190,7 @@ def test_get_latest_continuous():
     assert df.statistic_id.unique().tolist() == ["00011"]
     assert hasattr(md, 'url')
     assert hasattr(md, 'query_time')
-    try:
-        datetime.datetime.strptime(df['time'].iloc[0], "%Y-%m-%dT%H:%M:%S+00:00")
-        out=True
-    except:
-        out=False
-    assert out
+    assert df['time'].dtype == 'datetime64[ns, UTC]'
 
 def test_get_latest_daily():
     df, md = get_latest_daily(
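
`test_get_continuous` checks that a `continuous_id` column is returned. The `get_continuous` docstring describes that ID as unstable across database refreshes and says to identify an observation by `time_series_id` plus `time`. The sketch below shows one way to deduplicate on that pair, using plain dictionaries. The function name `latest_by_observation` and every record value are made up for illustration and are not real API output. It also assumes the `last_modified` strings share one format and time zone, so that string comparison matches chronological order.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def latest_by_observation(records: Iterable[Record]) -> List[Record]:
    """Keep one record per (time_series_id, time), preferring the newest refresh."""
    best: Dict[Tuple[str, str], Record] = {}
    for rec in records:
        key = (rec["time_series_id"], rec["time"])
        current = best.get(key)
        # Assumption: last_modified values are uniformly formatted ISO strings,
        # so lexicographic order equals chronological order.
        if current is None or rec["last_modified"] > current["last_modified"]:
            best[key] = rec
    return list(best.values())


# Illustrative records only: "a" and "b" are two versions of the same observation.
records = [
    {"continuous_id": "a", "time_series_id": "ts-1",
     "time": "2025-01-01T00:00:00+00:00", "last_modified": "2025-01-02T00:00:00+00:00"},
    {"continuous_id": "b", "time_series_id": "ts-1",
     "time": "2025-01-01T00:00:00+00:00", "last_modified": "2025-02-01T00:00:00+00:00"},
    {"continuous_id": "c", "time_series_id": "ts-1",
     "time": "2025-01-01T00:15:00+00:00", "last_modified": "2025-01-02T00:00:00+00:00"},
]

unique = latest_by_observation(records)
print([rec["continuous_id"] for rec in unique])
```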
