AI-powered agents for querying and analyzing California Landscape Metrics (CLM) datasets using natural language. Built with Pydantic AI and FastMCP.
This repository provides three Jupyter notebooks demonstrating different capabilities for interacting with CLM datasets:
- Simple Agent - Basic dataset search with conversation history
- Agent with Logfire - Enhanced logging and observability
- Enhanced Agent - Full-featured analysis with maps and visualizations
- π Natural Language Search - Find datasets using plain English queries
- π¬ Conversational Interface - Maintains context across questions
- π Statistical Analysis - Compute zonal statistics for California counties
- πΊοΈ Interactive Maps - Visualize datasets with WMS layers
- π Distribution Charts - Compare data distributions across regions
- π¬ Threshold Analysis - Calculate areas above/below specified thresholds
- π Observability - Optional Logfire integration for debugging
- Python 3.10 or higher
- Jupyter Notebook or JupyterLab
- OpenAI API key (or NRP API key for Qwen3 model)
- Clone the repository:
git clone https://github.com/yourusername/clm-agents-ndp.git
cd clm-agents-ndp- Install dependencies:
pip install -r requirements.txt- Create a
.envfile with your API key:
# For OpenAI
OPENAI_API_KEY=your_openai_api_key
# Or for NRP (optional)
NRP_API_KEY=your_nrp_api_keyOpen simple_clm_agent.ipynb and run all cells. This provides a basic chat interface for dataset search.
Example questions:
- "Find datasets about carbon turnover"
- "What are the units for this dataset?"
- "Search for burn probability datasets"
Open enhanced_clm_agent.ipynb for advanced capabilities including maps and visualizations.
Example questions:
- "What is the average carbon turnover time in Los Angeles?"
- "Show me a map of the annual burn probability dataset"
- "Compare the distribution of carbon turnover between San Diego and Orange county"
- "Which county has the highest mean carbon turnover time?"
For observability and debugging, use simple_clm_agent_with_logfire.ipynb:
- First authenticate with Logfire:
logfire auth- Run the notebook to see detailed logs at https://logfire.pydantic.dev
Basic agent demonstrating core functionality:
- Dataset search using semantic similarity
- Conversation history management
- Simple chat interface
- Support for OpenAI GPT-4o-mini or NRP Qwen3
Best for: Learning the basics, simple queries
Same as the simple agent but with comprehensive logging:
- All API calls logged to Logfire cloud
- Performance metrics tracking
- Error tracking and debugging
- HTTP request monitoring
Best for: Development, debugging, understanding agent behavior
Full-featured agent with advanced capabilities:
- Statistical analysis (mean, median, min, max, std)
- Threshold-based area calculations
- Interactive WMS map visualization
- Distribution charts and histograms
- Multi-county comparisons
- Ranking and filtering
Best for: Production use, complex analysis tasks
User Question
β
ConversationalAgent (manages history)
β
Pydantic AI Agent (processes with LLM)
β
Tools (search_datasets, compute_stats, etc.)
β
FastMCP Client β CLM-MCP Server
β
GeoServer (WCS/WFS) β Datasets
β
Response (with data/maps/charts)
The enhanced agent includes these tools:
search_and_select_dataset- Find relevant datasets by topicget_county_statistics- Compute zonal statisticsget_area_above_threshold- Calculate area above a thresholdget_area_below_threshold- Calculate area below a thresholdshow_map- Display interactive WMS mapget_value_distribution- Get data distribution for charts
- Fast and reliable
- 60 second default timeout
- Requires OpenAI API key
- Open-source model
- 180 second default timeout
- Requires NRP API key
- Set
MODEL = "nrp"in the configuration cell
"Find datasets about carbon turnover"
"What datasets are available for burn probability?"
"What is the average carbon turnover time in Los Angeles?"
"Find the maximum annual burn probability in San Diego county"
"Rank the top 5 counties by mean carbon turnover time"
"What percentage of San Diego County has carbon turnover time above 100 years?"
"Show all counties where at least 30% of area has carbon turnover less than 20 years"
"Show me a map of the carbon turnover dataset"
"Show the data distribution for San Diego and Los Angeles"
"Compare distributions between San Diego, Los Angeles, and Orange county"
User: "What is the average carbon turnover in Los Angeles?"
Agent: [provides answer]
User: "How about San Diego?"
Agent: [understands context and provides San Diego statistics]
The notebooks connect to:
- MCP Server:
https://wenokn.fastmcp.app/mcp - GeoServer:
https://sparcal.sdsc.edu/geoserver
These are configured in the BASE_CONFIG dictionary and can be modified if needed.
- For complex queries, the agent may timeout
- Try breaking down the question into simpler parts
- Switch to OpenAI model if using NRP
- Increase timeout in
conv_agent.ask(question, timeout=300)
- Ensure the dataset has been selected via search first
- Check that WMS URLs are accessible
- Verify internet connection
- Distribution charts require multiple data points
- Ensure you're querying counties that exist in the dataset
- Check that the distribution tool received data
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Pydantic AI
- Uses FastMCP for Model Context Protocol
- California Landscape Metrics data from SDSC
- Powered by OpenAI GPT-4o-mini or NRP Qwen3
For issues or questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Review the example questions in each notebook
If you use this work in your research, please cite:
@software{clm_agents_ndp,
title = {CLM Agents in NDP: AI-powered California Landscape Metrics Analysis},
year = {2024},
license = {MIT}
}