Turn bloated 10-Ks into clean, navigable financial statements (Balance Sheet, Income Statement, Cash Flow) and explore them locally with a professional UI — complete with analytics, charts, and multi-year trend tracking.
Most 10-Ks are massive documents where the core financial statements are buried under pages of narrative. This project:
- Extracts the statements directly from EDGAR's report HTML
- Structures them as clean CSV + JSON with hierarchy data
- Visualizes them in an interactive local web app with analytics
- Pulls filings via data.sec.gov/submissions
- Finds reports via FilingSummary.xml
- Parses tables from EDGAR's statement HTML (R*.htm) with robust handling for:
  - colspan/rowspan, multi-page table stitching, multi-line header merging
  - CSS indentation detection (4 fallback strategies)
  - XBRL scaffolding removal
- Exports CSV + JSON + manifest per company
- Rate limiting with exponential backoff
- Homepage: Company cards with real names/tickers (resolved from SEC API), key financial metrics with YoY% change, mini sparkline charts
- Company Page: Key metrics dashboard, multi-year trend chart (Chart.js), financial ratios (Current Ratio, D/E, margins), filings grouped by fiscal year
- Statement Page: Collapsible hierarchy tree, YoY% change column (color-coded), inline magnitude bars, label search/filter, keyboard navigation
- On-Demand Extraction: Add new companies directly from the UI
- Company Search: Search SEC's full company database by name or ticker
- REST API: JSON endpoints for metrics, trends, ratios, search
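
The YoY% change shown on the homepage and statement page is presumably the usual percentage change between consecutive fiscal years. A minimal sketch under that assumption follows; the name `yoy_change` and the choice to return `None` when the prior value is missing or zero are illustrative and not taken from analytics.py, and the figures are made up.

```python
from typing import Optional


def yoy_change(current: Optional[float], prior: Optional[float]) -> Optional[float]:
    """Return the year-over-year change in percent, or None if undefined."""
    if current is None or prior is None or prior == 0:
        return None
    # Divide by the absolute prior value so the sign reflects the direction
    # of the change even when the prior figure is negative (e.g. a loss).
    return (current - prior) / abs(prior) * 100.0


revenue_by_year = {2022: 200.0, 2023: 250.0, 2024: 225.0}  # made-up figures
years = sorted(revenue_by_year)
for prev, cur in zip(years, years[1:]):
    change = yoy_change(revenue_by_year[cur], revenue_by_year[prev])
    print(cur, f"{change:+.1f}%")
```

Returning `None` instead of raising lets a UI render a blank cell for the oldest year or for rows without a comparable prior value.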
FreeThe10Ks/
  app/
    main.py                 # FastAPI app entry point
    config.py               # Environment settings
    models.py               # Pydantic response models
    routers/
      pages.py              # HTML page routes (Jinja2)
      api.py                # JSON API routes (/api/...)
    services/
      filing_store.py       # Manifest discovery & JSON loading
      company_info.py       # SEC company name/ticker resolution
      analytics.py          # YoY, metrics, ratios computation
      extractor.py          # On-demand extraction wrapper
    templates/
      base.html             # Shared layout
      index.html            # Homepage
      company.html          # Company detail page
      statement.html        # Statement viewer
    static/
      css/styles.css        # Design system
      css/statement.css     # Table styles
      js/                   # Client-side interactivity
  sec_statements.py         # CLI extractor
  edgar_viewer.py           # Legacy viewer (still works standalone)
  requirements.txt
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\Activate.ps1 # Windows PowerShell
pip install -r requirements.txt

macOS/Linux
export SEC_UA="FreeThe10Ks (your_email@example.com)"

Windows PowerShell
$env:SEC_UA = "FreeThe10Ks (your_email@example.com)"

python sec_statements.py --cik 0001045810 --out statements/nvidia
python sec_statements.py --cik 0000320193 --out statements/apple
python sec_statements.py --cik 0000789019 --out statements/microsoft

export EDGAR_OUT_ROOT="statements"
uvicorn app.main:app --reload --port 8000

| Endpoint | Description |
|---|---|
| GET /api/company/search?q=apple | Search SEC companies by name/ticker |
| GET /api/company/{cik} | Get company name, ticker, exchange |
| GET /api/companies | List all loaded companies |
| GET /api/companies/{col}/{cik}/metrics | Key financial metrics from latest filing |
| GET /api/companies/{col}/{cik}/ratios | Financial ratios (current, D/E, margins) |
| GET /api/companies/{col}/{cik}/trends/{metric} | Multi-year trend data for charts |
| POST /api/extract | Trigger on-demand extraction for a CIK |
Available trend metrics: revenue, net_income, total_assets, total_equity, operating_cash_flow
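
To show how the trends endpoint and the metric names fit together, here is a small client sketch using only the standard library. It assumes the app is running locally on port 8000 as started above, and that `{col}` is the collection directory name ("nvidia" below is a guess based on the `--out` examples). `trend_url` and `fetch_trend` are hypothetical helper names, and the shape of the JSON response is not documented here, so it is returned as decoded without further interpretation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Metric names listed in this README.
TREND_METRICS = {
    "revenue", "net_income", "total_assets",
    "total_equity", "operating_cash_flow",
}


def trend_url(base: str, col: str, cik: str, metric: str) -> str:
    """Build the URL for GET /api/companies/{col}/{cik}/trends/{metric}."""
    if metric not in TREND_METRICS:
        raise ValueError(f"unknown trend metric: {metric!r}")
    root = base.rstrip("/")
    return f"{root}/api/companies/{quote(col)}/{quote(cik)}/trends/{metric}"


def fetch_trend(base: str, col: str, cik: str, metric: str):
    """Fetch and decode the JSON body (its structure is not assumed here)."""
    with urlopen(trend_url(base, col, cik, metric), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "nvidia" as the collection name is an assumption, not a documented value.
    print(trend_url("http://localhost:8000", "nvidia", "0001045810", "revenue"))
```

Validating the metric name on the client side avoids a round trip for an obvious typo; the server remains the authority on which metrics exist.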
python sec_statements.py --cik 0001045810 \
  --years 8 \
  --limit 8 \
  --out statements/nvidia \
  --include-amends

| Flag | Default | Description |
|---|---|---|
| --cik | required | Company CIK number |
| --years | 5 | Lookback window (years) |
| --limit | 5 | Max filings to process |
| --out | sec_statements_out | Output directory |
| --user-agent | $SEC_UA | SEC-required contact info |
| --include-amends | false | Include 10-K/A amendments |
| --keep-abstract | false | Keep XBRL scaffolding rows |
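
For scripted or cross-platform runs, the flags in the table can be assembled into an argument list and handed to `subprocess.run`. The sketch below uses only the flags documented above; `build_command` and `run_extractor` are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(cik: str, out: str, years: int = 5, limit: int = 5,
                  include_amends: bool = False,
                  user_agent: Optional[str] = None) -> List[str]:
    """Assemble the argv for sec_statements.py from the documented flags."""
    cmd = [sys.executable, "sec_statements.py",
           "--cik", cik, "--out", out,
           "--years", str(years), "--limit", str(limit)]
    if user_agent:
        # When omitted, the CLI falls back to $SEC_UA per the table above.
        cmd += ["--user-agent", user_agent]
    if include_amends:
        cmd.append("--include-amends")
    return cmd


def run_extractor(cmd: List[str]) -> None:
    """Run the extractor and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command("0001045810", "statements/nvidia",
                        years=8, limit=8, include_amends=True)
    print(" ".join(cmd[1:]))  # show the command without actually running it
```

Passing a list rather than a single shell string sidesteps quoting differences between Bash and PowerShell.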
Bash
for cik in 0001045810 0000320193 0000789019; do
  python sec_statements.py --cik "$cik" --out "statements/$cik" --years 6 --limit 6
done

PowerShell
$env:SEC_UA = "FreeThe10Ks (your_email@example.com)"
@("0001045810", "0000320193", "0000789019") | ForEach-Object {
  python sec_statements.py --cik $_ --out "statements\$_" --years 6 --limit 6
}

The manifest.json file indexes all processed filings with accession numbers, dates, chosen reports, output paths, and any parse errors.
Each statement JSON contains:
- rows: table data (first row = header)
- indent: integer hierarchy levels per row
- indent_mode: "from_html" or "inferred"
- sourceUrl: original EDGAR report URL
- report: metadata about the selected report
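
Since `rows` and `indent` describe the same table, the hierarchy can be rebuilt with a simple stack. The sketch below assumes `indent` has one entry per entry in `rows` (header included) and that the first cell of each row is its label; neither detail is confirmed here. `label_paths` is a hypothetical helper and the example data is made up.

```python
from typing import Dict, List, Tuple


def label_paths(statement: Dict) -> List[str]:
    """Return a 'Parent > Child' path for every data row of a statement."""
    rows = statement["rows"][1:]        # skip the header row
    indents = statement["indent"][1:]   # assumed to align with rows
    stack: List[Tuple[int, str]] = []   # (indent level, label) of open ancestors
    paths = []
    for row, level in zip(rows, indents):
        # Close ancestors that sit at the same depth as this row or deeper.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, row[0]))
        paths.append(" > ".join(label for _, label in stack))
    return paths


example = {  # made-up data in the documented shape
    "rows": [["Label", "2024", "2023"],
             ["Assets", "", ""],
             ["Current assets", "", ""],
             ["Cash", "100", "90"],
             ["Total assets", "500", "450"]],
    "indent": [0, 0, 1, 2, 0],
    "indent_mode": "from_html",
}
for path in label_paths(example):
    print(path)
```

The same stack walk is the kind of logic a collapsible tree view needs: a row's children are the rows that follow it with a greater indent level, up to the next row at the same level or shallower.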
| Variable | Required | Description |
|---|---|---|
| SEC_UA | Yes | SEC User-Agent with contact email |
| EDGAR_OUT_ROOT | No | Root directory for statement exports (default: statements) |
| EDGAR_CACHE_DIR | No | Cache directory for SEC API responses (default: cache) |
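
As an illustration of how these variables and defaults could be read in one place, here is a minimal sketch. It is not the project's config.py; `Settings` and `load_settings` are hypothetical names, and failing fast when SEC_UA is missing is an assumption based on the "Required" column.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    sec_ua: str
    out_root: str
    cache_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    sec_ua = env.get("SEC_UA", "").strip()
    if not sec_ua:
        # Assumption: treat a missing or blank SEC_UA as a hard error.
        raise RuntimeError("SEC_UA must be set, e.g. 'FreeThe10Ks (you@example.com)'")
    return Settings(
        sec_ua=sec_ua,
        out_root=env.get("EDGAR_OUT_ROOT", "statements"),
        cache_dir=env.get("EDGAR_CACHE_DIR", "cache"),
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
settings = load_settings({"SEC_UA": "FreeThe10Ks (your_email@example.com)"})
print(settings.out_root, settings.cache_dir)
```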
| Key | Action |
|---|---|
| ↑ / k | Previous row |
| ↓ / j | Next row |
| Enter / Space | Toggle expand/collapse |
| / | Focus search |
| Escape | Clear focus / blur search |
The original single-file viewer (edgar_viewer.py) still works:
export EDGAR_OUT_ROOT="statements"
uvicorn edgar_viewer:app --reload --port 8000

Check that EDGAR_OUT_ROOT points to a directory containing exports with manifest.json files at:
- ROOT/<CIK>/manifest.json, or
- ROOT/<collection>/<CIK>/manifest.json
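
To check quickly whether a root directory matches either layout, a short script can list the manifests it finds. This is a sketch with a hypothetical helper name (`find_manifests`), not the discovery logic in filing_store.py; it only looks one and two directory levels below the root, as the two paths above describe.

```python
import tempfile
from pathlib import Path
from typing import List


def find_manifests(root: Path) -> List[Path]:
    """Find manifest.json one or two directory levels below root."""
    found = set(root.glob("*/manifest.json"))      # ROOT/<CIK>/manifest.json
    found |= set(root.glob("*/*/manifest.json"))   # ROOT/<collection>/<CIK>/manifest.json
    return sorted(found)


if __name__ == "__main__":
    # Build a throwaway directory tree to demonstrate both layouts.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("0000320193", "nvidia/0001045810"):
            (root / rel).mkdir(parents=True)
            (root / rel / "manifest.json").write_text("{}")
        for path in find_manifests(root):
            print(path.relative_to(root).as_posix())
```

If the list comes back empty for your real EDGAR_OUT_ROOT, the exports are probably nested one level too deep or too shallow.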
A filing may use unusual naming or different report structure. Check errors inside the relevant filing entry in the company's manifest.json.
Set SEC_UA so the app can query the SEC EDGAR API for company metadata. Names are cached after first lookup.