SEC filing research environment powered by the SEC EDGAR database for accessing company filings and financial data, with rubric-based evaluation for structured grading provided by The LLM Data Company.
See docs for the complete environment design workflow.
environment/ - Manages SEC EDGAR and web search integration
- Uses the edgartools Python library to access SEC filing data
- Integrates with Exa API for supplementary web search capabilities
- Exposes HTTP endpoints for research workflows with exponential backoff for rate limiting
server/ - Wraps data in MCP tools
- Provides research tools for agents to access SEC filings, financial data, and web search
- Agents and tasks interact only with these tools
Why separate? So you can edit the tools exposed to the agent, or the tasks themselves, without restarting the environment backend.
- `setup()` - Initialize the environment and reset state.
- `search_company(query: str)` - Search for a company by ticker symbol or name. Returns company information including ticker, name, and CIK.
- `get_filings(ticker?: str, form_type?: str, limit?: int, cutoff_date?: str)` - Get SEC filings. When `ticker` is provided, returns company-specific filings; otherwise, returns global recent filings. Can filter by form type (e.g., "10-K", "10-Q", "8-K"), limit results, and filter by date (YYYY-MM-DD).
- `get_filing_content(filing_url: str)` - Fetch the full text content of a specific SEC filing from its URL.
- `get_financial_data(ticker: str, accession_number: str)` - Extract financial statements and key metrics from a 10-K or 10-Q filing. Returns income statement, balance sheet, cash flow, and other financial data.
- `get_segment_data(ticker: str, accession_number: str)` - Extract segment-level financial data from a 10-K or 10-Q filing for companies with multiple business segments.
- `get_filing_sections(ticker: str, accession_number: str)` - Extract specific sections from a 10-K or 10-Q filing (e.g., Business, Risk Factors, MD&A).
- `web_search(query: str)` - Search the web using the Exa API. Returns titles and URLs of relevant results.
- `web_fetch(url: str)` - Fetch and extract content from a web URL. Returns summary, highlights, and full content.
- `answer(final_answer: str)` - Submit the final research answer.
- `evaluate(rubric: list[dict])` - Evaluate the submitted answer using a structured rubric with weighted requirements.
The evaluate tool uses The LLM Data Company's rubric package to grade answers against structured criteria with autograders.
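For intuition, a weighted rubric reward is typically the weight-normalized sum of satisfied requirements. The sketch below illustrates that aggregation only; it is an assumption about the scoring shape, not the rubric package's actual implementation:

```python
# Hypothetical aggregation for a weighted rubric: reward is the fraction of
# total weight earned by requirements the autograder marked as passed.
# Illustrative only -- not code from the rubric package.
def aggregate(results: list[dict]) -> float:
    """results: list of {"weight": int, "passed": bool}, one per requirement."""
    total = sum(r["weight"] for r in results)
    earned = sum(r["weight"] for r in results if r["passed"])
    return earned / total if total else 0.0
```

Under this scheme, a 15-point requirement that passes and a 5-point requirement that fails would yield a reward of 0.75.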
The environment requires several API keys and configuration:
Required:
- `EDGAR_IDENTITY` - Your identity for SEC EDGAR access (required by SEC regulations). Format: `"Your Name your.email@example.com"`
Optional:
- `EXA_API_KEY` - For web search and content fetching (if using the web_search/web_fetch tools)
- `HUD_API_KEY` - For HUD telemetry and tracing
- `ANTHROPIC_API_KEY` - For the Claude agent (if using Claude)
- `OPENAI_API_KEY` - For rubric evaluation (if using OpenAI-based autograders)
Add these to your `.env` before running `hud eval`:
```shell
export EDGAR_IDENTITY="Your Name your.email@example.com"
export EXA_API_KEY="your-exa-key"              # optional, for web search
export ANTHROPIC_API_KEY="your-anthropic-key"  # only if using an Anthropic model
export OPENAI_API_KEY="your-openai-key"

# Optional
export HUD_API_KEY="your-hud-key"
```

```shell
# Terminal 1 - Environment backend
cd environment
export EDGAR_IDENTITY="Your Name your.email@example.com"
export EXA_API_KEY="your-exa-key"  # optional, for web search
uv run uvicorn server:app --reload
```

```shell
# Terminal 2 - MCP server
cd server
uv run hud dev
```

The environment includes exponential backoff for rate limiting, so API calls will automatically retry on 429 errors.
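The retry behavior works like the sketch below: on a rate-limit error, wait an exponentially increasing delay before retrying. Names here (`RateLimited`, `with_backoff`) are illustrative, not the environment's actual code:

```python
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 (Too Many Requests) response."""

def with_backoff(fn, max_retries=5, base_delay=0.5):
    """Call fn(), retrying on RateLimited with delays of 0.5s, 1s, 2s, ..."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```

A call that is rate-limited twice and then succeeds completes transparently after two short waits.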
In general, we recommend starting work on the environment backend first, then developing the MCP server to expose the right things to the agent.
For complex environments that require many dependencies, we recommend running hud dev in the environment root:
```shell
cd ..
hud dev
```

```shell
# Build first in the global folder with the Dockerfile (creates rubrics:latest)
hud build
```

Your `tasks.json` uses `docker run` to launch the environment:
```json
{
  "prompt": "Analyze Tesla's FY2024 10-K filing...",
  "mcp_config": {
    "local": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "rubrics:latest"]
    }
  },
  "evaluate_tool": {
    "name": "evaluate",
    "arguments": {
      "rubric": [...]
    }
  }
}
```

Note: Export environment variables before running. The Docker container will inherit them from your shell.
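If your setup does not forward shell environment variables into the container automatically, one common alternative (a sketch, not documented behavior) is to pass them explicitly with Docker's `-e` flag. With `-e VAR` and no value, `docker run` copies the variable's current value from your shell:

```json
"local": {
  "command": "docker",
  "args": [
    "run", "--rm", "-i",
    "-e", "EDGAR_IDENTITY",
    "-e", "EXA_API_KEY",
    "-e", "OPENAI_API_KEY",
    "rubrics:latest"
  ]
}
```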
Commands:
```shell
# Build first
hud build

# Test the task locally
export EDGAR_IDENTITY="Your Name your.email@example.com"
export EXA_API_KEY="your-exa-key"  # optional, for web search
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENAI_API_KEY="your-openai-key"
hud eval tasks.json --max-steps 25

# Push the environment for remote running
hud push

# Production RL training
hud rl tasks.json  # Auto-converts docker→remote, builds & pushes if needed
```

Once your environment is ready, you can share it with the community:
```shell
# Build and push your environment (requires Docker Hub login and a HUD API key)
hud build
hud push
```

Create a dataset on HuggingFace with your tasks:
Option A: Upload manually
- Upload your `tasks.json` to HuggingFace
- Make sure it's public to appear on leaderboards
Option B: Use the SDK
```python
from hud.datasets import save_tasks
import json

# Load your tasks
with open("tasks.json") as f:
    tasks = json.load(f)

# Push to HuggingFace
save_tasks(tasks, repo_id="your-org/your-dataset")
```

```shell
# Run Claude on your benchmark
hud eval "your-org/your-dataset" --agent claude

# View results at:
# hud.so/leaderboards/your-org/your-dataset
```

Note: Only public HuggingFace datasets appear as leaderboards!
📚 Learn more: Creating Benchmarks | Leaderboards
```python
# Initialize the environment
setup()

# Agent searches for a company
company_info = search_company("TSLA")
# Returns: [{"ticker": "TSLA", "name": "Tesla Inc", "cik": "1318605"}]

# Agent gets recent filings
filings = get_filings(ticker="TSLA", form_type="10-K", limit=1)
# Returns: [{"filing_date": "2024-01-01", "form_type": "10-K", "accession_number": "...", "filing_url": "..."}]

# Agent extracts financial data
financial_data = get_financial_data(ticker="TSLA", accession_number=filings[0]["accession_number"])
# Returns: {"has_financials": True, "financial_data": {...income statement, balance sheet, etc...}}

# Agent gets specific sections from the filing
sections = get_filing_sections(ticker="TSLA", accession_number=filings[0]["accession_number"])
# Returns: {"sections": {"business": "...", "risk_factors": "...", "mda": "..."}}

# Agent uses web search for additional context
search_results = web_search("Tesla FY2024 revenue analysis")
# Returns: [{"title": "...", "url": "..."}]

# Agent fetches web content
web_content = web_fetch(search_results[0]["url"])
# Returns: "=== SUMMARY ===\n...\n=== KEY HIGHLIGHTS ===\n...\n=== FULL CONTENT ===\n..."

# Agent submits the final answer
answer("Based on Tesla's FY2024 10-K, revenue was $96.8B...")

# Evaluate the answer using a rubric
result = evaluate(rubric=[
    {"requirement": "Correctly states FY2024 revenue", "weight": 15},
    {"requirement": "Provides segment breakdown", "weight": 5},
])
# Returns: {"reward": float, "info": {"report": [...]}, "done": True}
```

- edgartools: Python library for accessing SEC EDGAR data
- fastapi: Web framework for the environment server
- httpx: HTTP client for API calls
- rubric: LLM Data Company's rubric evaluation package
- Exa API: Web search and content extraction (optional, for web_search/web_fetch tools)
- EdgarTools - Python library to access SEC EDGAR
- SEC EDGAR MCP - Rich OSS SEC MCP server