An interactive web application for exploring and analyzing baseflow separation across thousands of USGS stream gages in the contiguous United States. Built on the PyBFS physically-based algorithm and backed by USGS WaterServices, this tool lets researchers and practitioners calibrate models, visualize flow components, detect anomalies, and fetch live data — all from a browser.
git clone git@github.com:BYU-Hydroinformatics/baseflow_analyst.git
cd baseflow_analyst
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py

Then open http://127.0.0.1:5000 in your browser. The app ships with pre-calibrated results for 8,729 USGS gages, so no data pipeline steps are required to start exploring.
- Overview
- Features
- Architecture
- Prerequisites
- Installation
- Data Pipeline
- Usage
- API Reference
- Directory Structure
- Configuration
- Contributing
- License
PyBFS App combines a physically-based baseflow separation model with real-time USGS data access into a single Flask application. The map view displays every active USGS stream gage that has daily discharge records, color-coded by calibration status and watershed behavior. Clicking a gage loads interactive Plotly charts without any page reload, running BFS on-the-fly from pre-calibrated parameters stored on disk.
The backend pipeline consists of four standalone scripts that can be run once to bootstrap the dataset, after which the web app serves everything dynamically.
- PyBFS algorithm — physically-based coupled surface/subsurface reservoir model separating total streamflow into baseflow, surface flow, and direct runoff
- Traditional method comparisons — Chapman, Lyne-Hollick (LH), and Eckhardt filter overlaid on the same chart
- 90% Bayesian credible intervals on baseflow estimates via pybfs.bf_ci
- 90-day baseflow forecast projected from the last observed state, displayed alongside the historical record
- Annual Baseflow Index (BFI) — year-by-year bar chart with long-term mean
- Flow component fractions — BFF / SFF / DRF displayed as a donut summary
- Flow anomaly detection — top extreme low-flow and high-flow events identified by percentile thresholds, ranked by severity, with narrative descriptions
- US Drought Monitor (USDM) — county-level weekly drought severity timeline correlated with detected low-flow anomalies
- National Water Model (NWM) classification — Natural vs. Artificial channel behavior
- GAGES-II reference / non-reference status for each gage
- Live USGS update — fetch the latest full discharge record from USGS WaterServices, recalibrate, and view updated results in the browser without touching local files
- Leaflet-based interactive map with marker clustering
- Filter panel: calibration status, NWM behavior, GAGES-II class, low-flow gages
- Side panel with gage metadata, calibration trigger, and tabbed chart view
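As a rough illustration of the traditional filters listed above, here is a minimal sketch of a single forward pass of the Lyne-Hollick recursive digital filter, together with a Baseflow Index calculation. It is not the implementation in the bundled baseflow library; the function names and the default `alpha=0.925` (a commonly cited value) are assumptions made for this example.

```python
def lyne_hollick(flow, alpha=0.925):
    """Single forward pass of the Lyne-Hollick filter (illustrative only)."""
    if not flow:
        return []
    quick_prev = 0.0        # filtered quickflow at the previous step
    baseflow = [flow[0]]    # assumption: the first day is all baseflow
    for prev_q, q in zip(flow, flow[1:]):
        quick = alpha * quick_prev + 0.5 * (1 + alpha) * (q - prev_q)
        quick = max(quick, 0.0)            # quickflow cannot be negative
        b = min(max(q - quick, 0.0), q)    # keep 0 <= baseflow <= streamflow
        baseflow.append(b)
        quick_prev = q - b
    return baseflow


def baseflow_index(flow, baseflow):
    """BFI = total baseflow volume / total streamflow volume."""
    total = sum(flow)
    return sum(baseflow) / total if total > 0 else float("nan")


storm = [1.0, 1.0, 10.0, 5.0, 2.0, 1.0]
print(baseflow_index(storm, lyne_hollick(storm)))
```

Published applications of this filter usually run several passes (forward, backward, forward); a single pass is shown here only to keep the recursion easy to follow.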
pybfs_app/
├── app.py # Flask application (entry point)
├── usgs_download_all_daily.py # Pipeline step 1: download USGS data
├── detect_low_flow_gages.py # Pipeline step 2: classify low-flow gages
├── calibrate_all_gages.py # Pipeline step 3: calibrate BFS parameters
├── run_bfs_all_gages.py # Pipeline step 4: pre-compute results & plots
├── templates/
│ └── index.html # Single-page map application
├── behavior/
│ ├── NWM_USGS_Natural_Flow.csv
│ ├── NWM_USGS_Artificial_Path.csv
│ └── GAGES-II_ref_non_ref.csv
├── pybfs/ # PyBFS library (submodule)
├── baseflow/ # Baseflow library (submodule)
├── usgs_daily_streamflow/ # Downloaded gage CSVs (generated)
├── usgs_calibration_results/ # Calibrated parameters (generated)
├── usgs_bfs_results/ # Pre-computed plots & CSVs (generated)
├── temp_results/ # Live-update working directory (generated)
└── low_flow_gages.csv # Low-flow classification (generated)
The app reads calibration parameters from usgs_calibration_results/params/ and runs BFS in-process on every chart request. Results are cached in memory for 5 minutes (BFS_CACHE_TTL). The "Update Data" feature runs a full download + calibration cycle in a background thread, writing to temp_results/ so original data is never modified.
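The in-memory caching described above can be pictured with a small time-to-live cache. This is a sketch under assumptions, not the code in app.py: the `TTLCache` class and its `get_or_compute` method are hypothetical names, and only the `BFS_CACHE_TTL` constant and its 300-second default come from this README.

```python
import time

BFS_CACHE_TTL = 300  # seconds, the default documented in this README


class TTLCache:
    """Hypothetical helper: keep a computed value for a fixed number of seconds."""

    def __init__(self, ttl=BFS_CACHE_TTL, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._store = {}        # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]               # still fresh: reuse the cached result
        value = compute()               # expired or missing: recompute
        self._store[key] = (now + self.ttl, value)
        return value


cache = TTLCache()
result = cache.get_or_compute("12167000", lambda: {"bff": 0.641})
print(result)
```

A cache like this lives in one process, so it is not shared between worker processes, and it would need a lock if several threads could compute the same key at once.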
| Requirement | Version |
|---|---|
| Python | ≥ 3.9 |
| pip | ≥ 22 |
Python package dependencies (install via requirements.txt):
flask
pandas
numpy
matplotlib
scipy
statsmodels
numba
The pybfs and baseflow libraries ship as subdirectories under the project root and are added to sys.path automatically by app.py.
# Clone the repository
git clone git@github.com:BYU-Hydroinformatics/baseflow_analyst.git
cd baseflow_analyst
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

Note: numba requires a C compiler. On macOS, install the Xcode Command Line Tools (xcode-select --install). On Linux, ensure gcc is available.
Run these scripts once to build the dataset. Each step is idempotent — re-running skips already-completed work so you can safely resume after interruptions.
Downloads all available daily discharge records for every active USGS stream gage in CONUS (48 states + DC).
python usgs_download_all_daily.py

Output (usgs_daily_streamflow/):
- site_info.csv: gage metadata (coordinates, drainage area, …)
- <site_no>.csv: one CSV per gage with date and streamflow (cfs) columns
- gauges_with_data.csv: list of gages that returned data
- gauges_without_data.csv: list of gages with no available records
This step queries the USGS WaterServices REST API state-by-state with a 1-second delay between requests and a 0.5-second delay per gage download to respect rate limits. Expect several hours for a full CONUS run (~10 000 gages).
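To make the request pattern above concrete, the sketch below builds a Daily Values request URL for one gage and spaces out successive requests. It is an illustration, not the code in usgs_download_all_daily.py: the helper names `build_dv_url` and `throttled` are hypothetical, and the exact query parameters the script sends are an assumption (parameter code 00060 is discharge and statistic code 00003 is the daily mean).

```python
import time
from urllib.parse import urlencode

DV_ENDPOINT = "https://waterservices.usgs.gov/nwis/dv/"


def build_dv_url(site_no, start="1900-01-01", end=None):
    """Build a Daily Values URL for one gage (parameters are assumptions)."""
    params = {
        "format": "json",
        "sites": site_no,
        "parameterCd": "00060",   # discharge in cubic feet per second
        "statCd": "00003",        # daily mean
        "startDT": start,
    }
    if end:
        params["endDT"] = end
    return DV_ENDPOINT + "?" + urlencode(params)


def throttled(items, delay_s, sleep=time.sleep):
    """Yield items, pausing delay_s seconds between consecutive ones."""
    for i, item in enumerate(items):
        if i:
            sleep(delay_s)
        yield item


for site in throttled(["12167000"], delay_s=0.5):
    print(build_dv_url(site))
```

Passing the sleep function as an argument keeps the delay logic testable without actually waiting.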
Classifies gages that exhibit sustained low-flow periods (e.g. regulated rivers, intermittent streams). Uses multiprocessing to analyze all gages in parallel.
python detect_low_flow_gages.py [--pct 10] [--cv 0.15] [--dur 60]

| Flag | Default | Description |
|---|---|---|
| --pct | 10 | Percentile threshold (flow below this is "low") |
| --cv | 0.15 | Max coefficient of variation to consider flow "constant" |
| --dur | 60 | Minimum consecutive days below threshold |
Output: low_flow_gages.csv with columns site_no, has_lowflow, max_lowflow_duration, max_run_cv, threshold_cfs, record_days.
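One plausible reading of how the three flags combine is sketched below: find the longest run of days at or below the percentile threshold, then flag the gage if that run is long enough and its flow is nearly constant. The actual logic in detect_low_flow_gages.py may differ; the function names, the use of `<=` rather than `<`, and the way the criteria are combined are all assumptions. The output keys mirror the CSV columns listed above.

```python
import statistics


def percentile(values, pct):
    """Linearly interpolated percentile (0-100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def classify_low_flow(flow, pct=10, cv_max=0.15, min_days=60):
    """Summarize the longest low-flow run in a daily discharge record (cfs)."""
    flow = list(flow)
    threshold = percentile(flow, pct)
    best, run = [], []
    for q in flow + [float("inf")]:     # sentinel closes a trailing run
        if q <= threshold:
            run.append(q)
            continue
        if len(run) > len(best):
            best = run
        run = []
    mean = statistics.fmean(best) if best else 0.0
    cv = statistics.pstdev(best) / mean if mean > 0 else 0.0
    return {
        "has_lowflow": len(best) >= min_days and cv <= cv_max,
        "max_lowflow_duration": len(best),
        "max_run_cv": cv,
        "threshold_cfs": threshold,
        "record_days": len(flow),
    }


record = [10.0] * 300 + [1.0] * 70 + [10.0] * 300
print(classify_low_flow(record))
```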
Runs the PyBFS calibration routine for every gage that has both streamflow data and a known drainage area. Saves per-site parameter files used by the web app. Skips already-calibrated sites and logs failures for easy resumption.
python calibrate_all_gages.py

Output (usgs_calibration_results/):
- params/params_<site_no>.csv: calibrated basin and groundwater parameters
- bff/bff_<site_no>.csv: baseflow, surface flow, and direct runoff fractions
- all_params.csv: combined parameter table for all sites
- all_bff.csv: combined BFF table
- calibration_failed.csv: sites that could not be calibrated
Calibration takes roughly 60–90 seconds per gage on a modern CPU.
Generates static PNG plots and result CSVs for every calibrated gage. The web app runs BFS on-the-fly, so this step is optional but useful for producing publication-quality figures in bulk.
python run_bfs_all_gages.py

Output per gage (usgs_bfs_results/<site_no>/):
- bfs_results.csv: daily baseflow separation time series
- baseflow_separation.png: observed streamflow vs. flow components
- flow_fractions.png: BFF / SFF / DRF pie chart
- annual_bfi.png: annual Baseflow Index bar chart
- confidence_intervals.png: baseflow with 90% credible interval band
- ci_table.csv / ci_daily.csv: credible interval tables
- forecast.png / forecast.csv: 90-day baseflow forecast
A .done marker file is written per site so the script can be interrupted and resumed safely.
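The marker-file pattern can be sketched as follows. This is a generic illustration, not the code in run_bfs_all_gages.py: `process_sites` and the `run_site` callback are hypothetical names, and only the per-site `.done` file comes from the description above.

```python
from pathlib import Path


def process_sites(site_nos, results_dir, run_site):
    """Run run_site(site_no, site_dir) for every site lacking a .done marker."""
    results_dir = Path(results_dir)
    processed = []
    for site_no in site_nos:
        site_dir = results_dir / site_no
        marker = site_dir / ".done"
        if marker.exists():
            continue                    # finished in an earlier run: skip
        site_dir.mkdir(parents=True, exist_ok=True)
        run_site(site_no, site_dir)     # may raise; then no marker is written
        marker.touch()                  # written last, only after success
        processed.append(site_no)
    return processed
```

Because the marker is written only after the work for a site completes, an interrupted site is simply redone on the next run.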
python app.py

Open http://localhost:5000 in your browser. The app starts by loading all gage metadata, NWM behavior data, and low-flow classifications into memory.
For production deployments, serve with Gunicorn:
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

The map loads all gages as clustered markers. Marker color indicates status:
| Color | Meaning |
|---|---|
| Green | Calibrated — charts available |
| Gray | Pending — calibration not yet run |
Use the filter panel (top-right) to show only:
- Calibrated gages
- Natural-channel gages (NWM classification)
- GAGES-II reference gages
- Gages without low-flow periods
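The same four filters can be applied to the gage list outside the browser. The sketch below assumes records shaped like the gage JSON shown in the API Reference; `filter_gages` is a hypothetical helper, and field values other than "calibrated", "Natural", and "Ref" (which appear in that example) are assumptions.

```python
def filter_gages(gages, calibrated=False, natural=False,
                 reference=False, no_lowflow=False):
    """Return the gages that pass every enabled filter."""
    def keep(g):
        if calibrated and g.get("status") != "calibrated":
            return False
        if natural and g.get("behavior") != "Natural":
            return False
        if reference and g.get("ref_status") != "Ref":
            return False
        if no_lowflow and g.get("has_lowflow"):
            return False
        return True
    return [g for g in gages if keep(g)]


gages = [
    {"site_no": "12167000", "status": "calibrated", "behavior": "Natural",
     "ref_status": "Ref", "has_lowflow": False},
    # Hypothetical second record; these field values are assumptions.
    {"site_no": "00000000", "status": "pending", "behavior": "Artificial",
     "ref_status": "Non-ref", "has_lowflow": True},
]
print([g["site_no"] for g in filter_gages(gages, calibrated=True, natural=True)])
```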
Click any marker to open the side panel:
- Info tab — site name, USGS ID, coordinates, drainage area, NWM behavior, GAGES-II class, calibration status
- Charts tab — interactive Plotly charts loaded on demand:
- Baseflow separation time series with traditional method overlay
- Flow component fractions (BFF / SFF / DRF)
- Annual BFI bar chart
- Confidence interval band
- 90-day forecast
- County drought monitor timeline
- Anomalies tab — ranked extreme flow events with narrative descriptions and drought context
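For readers who want to reproduce the annual BFI chart from the daily series, a minimal sketch follows. It assumes ISO-formatted date strings and parallel lists of observed flow and baseflow, and it defines each year's BFI as that year's baseflow volume divided by its streamflow volume. The function name is hypothetical, and the app's own aggregation may differ.

```python
from collections import defaultdict


def annual_bfi(dates, qob, baseflow):
    """Return {year: BFI}, where BFI = sum(baseflow) / sum(observed flow)."""
    totals = defaultdict(lambda: [0.0, 0.0])    # year -> [baseflow, flow]
    for d, q, b in zip(dates, qob, baseflow):
        year = int(d[:4])                       # "YYYY-MM-DD" -> YYYY
        totals[year][0] += b
        totals[year][1] += q
    return {y: bf / q for y, (bf, q) in sorted(totals.items()) if q > 0}


bfi = annual_bfi(
    ["2000-01-01", "2000-01-02", "2001-01-01"],
    [10.0, 10.0, 4.0],
    [5.0, 7.0, 1.0],
)
print(bfi)
```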
If a gage shows "Pending," click Run Calibration. A progress bar tracks the background process (~90 seconds). Once complete, the charts load automatically.
Click Update Data on any gage to:
- Download the complete historical record fresh from USGS
- Re-calibrate on the updated data
- View charts based on the new calibration (written to temp_results/)
Original calibration files are never overwritten.
All endpoints return JSON unless noted otherwise.
Returns all gages with coordinates, status, NWM behavior, and low-flow flag.
[
{
"site_no": "12167000",
"name": "SAUK RIVER NEAR SAUK CITY, WA",
"lat": 48.47,
"lng": -121.59,
"status": "calibrated",
"behavior": "Natural",
"ref_status": "Ref",
"has_lowflow": false
}
]

Returns detailed metadata and BFF/SFF/DRF fractions for a single gage.
Runs BFS on-the-fly and returns time series data for all interactive charts:
{
"dates": ["2000-01-01", "..."],
"qob": [12.4, "..."],
"baseflow": [8.1, "..."],
"surface_flow": [3.1, "..."],
"direct_runoff": [1.2, "..."],
"ci_lower": [7.5, "..."],
"ci_upper": [8.7, "..."],
"forecast_dates": ["2025-01-01", "..."],
"forecast_baseflow": [7.9, "..."],
"annual_years": [2000, 2001, "..."],
"annual_bfi": [0.652, "..."],
"mean_bfi": 0.641,
"bff": 0.641, "sff": 0.253, "drf": 0.106
}

Large datasets (> 5 000 points) are automatically downsampled for chart performance; total_points reports the full record length.
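The downsampling step could look something like the sketch below, which keeps every n-th point so that no more than 5 000 remain. The stride approach and the function name are assumptions; the app may use a different method, such as one that preserves peaks.

```python
def downsample(series, max_points=5000):
    """Return (sampled_values, total_points) using a fixed stride."""
    n = len(series)
    if n <= max_points:
        return list(series), n
    step = -(-n // max_points)          # ceiling division
    return list(series[::step]), n


values, total_points = downsample(list(range(12000)))
print(len(values), total_points)
```

The same stride has to be applied to the dates and to every flow series so that the arrays stay aligned.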
Triggers background calibration. Returns {"status": "started"} or {"status": "already_done"}.
Server-Sent Events (SSE) stream reporting calibration / update progress:
{"stage": "calibrating", "progress": 10, "message": "Calibrating...", "done": false}

Fetches the latest USGS data and runs a full calibration cycle in temp_results/.
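The progress messages in the SSE stream described above travel as `data:` lines, each terminated by a blank line. The sketch below shows how such an event can be framed and parsed. The helper names are hypothetical; only the JSON fields come from the example message.

```python
import json


def sse_event(stage, progress, message, done=False):
    """Frame one progress update as a Server-Sent Events message."""
    payload = {"stage": stage, "progress": progress,
               "message": message, "done": done}
    return "data: " + json.dumps(payload) + "\n\n"


def parse_sse(chunk):
    """Parse every complete 'data:' event found in a text chunk."""
    events = []
    for block in chunk.split("\n\n"):
        data = [line[5:].lstrip() for line in block.splitlines()
                if line.startswith("data:")]
        if data:
            events.append(json.loads("\n".join(data)))
    return events


stream = (sse_event("calibrating", 10, "Calibrating...")
          + sse_event("complete", 100, "Done", done=True))
for event in parse_sse(stream):
    print(event["progress"], event["message"])
```

A client would typically keep reading until it receives an event whose `done` field is true.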
Returns US Drought Monitor weekly severity percentages (D0–D4) for the gage's county since January 2000, plus the current drought level string.
Detects and ranks extreme low-flow and high-flow events, correlates with drought data, and returns narrative descriptions.
{
"events": [
{
"type": "low_flow",
"start_date": "2015-07-01",
"end_date": "2015-09-28",
"duration": 89,
"min_flow": 0.12,
"threshold": 0.43,
"severity": 58.3,
"drought_context": "This coincided with D3 (Extreme Drought) conditions affecting 74% of the county.",
"narrative": "Extreme low-flow period from 2015-07-01 to 2015-09-28..."
}
]
}

Serves pre-computed static plot PNGs.
pybfs_app/
├── app.py # Flask application
├── usgs_download_all_daily.py # Step 1: download USGS data
├── detect_low_flow_gages.py # Step 2: low-flow classification
├── calibrate_all_gages.py # Step 3: BFS calibration
├── run_bfs_all_gages.py # Step 4: batch BFS + plots
│
├── templates/
│ └── index.html # Map SPA (Leaflet + Plotly)
│
├── behavior/ # Static reference datasets
│ ├── NWM_USGS_Natural_Flow.csv
│ ├── NWM_USGS_Artificial_Path.csv
│ └── GAGES-II_ref_non_ref.csv
│
├── pybfs/ # PyBFS library
├── baseflow/ # Baseflow separation library
│
├── low_flow_gages.csv # Generated by step 2
│
├── usgs_daily_streamflow/ # Generated by step 1
│ ├── site_info.csv
│ ├── gauges_with_data.csv
│ └── <site_no>.csv (one per gage)
│
├── usgs_calibration_results/ # Generated by step 3
│ ├── params/params_<site_no>.csv
│ ├── bff/bff_<site_no>.csv
│ ├── all_params.csv
│ ├── all_bff.csv
│ └── calibration_failed.csv
│
├── usgs_bfs_results/ # Generated by step 4 (optional)
│ └── <site_no>/
│ ├── bfs_results.csv
│ ├── baseflow_separation.png
│ ├── flow_fractions.png
│ ├── annual_bfi.png
│ ├── confidence_intervals.png
│ ├── forecast.png
│ └── ci_table.csv
│
└── temp_results/ # Live-update scratch space
└── <site_no>/
Key constants at the top of app.py:
| Constant | Default | Description |
|---|---|---|
| FORECAST_DAYS | 90 | Number of days projected in the baseflow forecast |
| BFS_CACHE_TTL | 300 | Seconds to cache on-the-fly BFS results in memory |
| CFS_TO_M3_PER_DAY | 2446.58 | Unit conversion factor (cfs → m³/day) |
| SQMI_TO_M2 | 2 589 988.11 | Unit conversion factor (mi² → m²) |
Directory paths (STREAMFLOW_DIR, CALIB_DIR, RESULTS_DIR, TEMP_DIR) are derived from the location of app.py and can be adjusted at the top of the file if you keep data in a different location.
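The two unit conversion constants in the table above follow directly from the exact definitions of the foot and the mile, as the sketch below shows. The `cfs_to_depth_m_per_day` helper is a hypothetical example of combining them to express discharge as a depth over the basin; whether app.py uses them this way is an assumption.

```python
FT_TO_M = 0.3048                 # exact by definition
MI_TO_M = 1609.344               # exact by definition
SECONDS_PER_DAY = 86400

CFS_TO_M3_PER_DAY = FT_TO_M ** 3 * SECONDS_PER_DAY   # ft³/s -> m³/day
SQMI_TO_M2 = MI_TO_M ** 2                             # mi² -> m²


def cfs_to_depth_m_per_day(flow_cfs, area_sqmi):
    """Express discharge as an equivalent water depth over the drainage area."""
    return flow_cfs * CFS_TO_M3_PER_DAY / (area_sqmi * SQMI_TO_M2)


print(round(CFS_TO_M3_PER_DAY, 2), round(SQMI_TO_M2, 2))
print(cfs_to_depth_m_per_day(100.0, 50.0))
```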
- Fork the repository and create a feature branch.
- Follow existing code style (no external formatter required).
- Keep new dependencies minimal — the app intentionally avoids heavy frameworks.
- Open a pull request with a clear description of the change and any relevant screenshots.
For bug reports, please include the USGS site number that triggered the issue, the Python version, and the full traceback.
This project is licensed under the MIT License. See LICENSE for details.
The PyBFS and baseflow libraries ship as subdirectories and carry their own respective licenses.
- PyBFS — BYU Hydroinformatics, physically-based baseflow separation algorithm
- USGS WaterServices — daily streamflow data and site metadata
- US Drought Monitor (USDM) — county-level drought severity data
- FCC Census Bureau Area API — lat/lon to county FIPS conversion
- National Water Model (NWM) — channel behavior classification
- GAGES-II — reference/non-reference gage classification