Turn bloated 10‑Ks into clean, navigable financial statements (Balance Sheet, Income Statement, Cash Flow) and browse them locally with a fast UI.
This repo has two parts:
- Extractor (edgar_extract.py): downloads the as-filed statement tables from EDGAR report HTML (R*.htm) and exports CSV + JSON plus a manifest for indexing.
- Viewer (edgar_viewer.py): a local FastAPI app that auto-discovers multiple company exports and lets you browse filings grouped by year, with collapsible line-item hierarchy and search.
Most 10‑Ks are huge documents where the core financial statements are buried under pages of narrative. This project pulls the statements out into a consistent, structured format and gives you a lightweight way to explore them.
- Pulls filings via data.sec.gov/submissions
- Finds the statement reports via FilingSummary.xml
- Parses tables from EDGAR’s statement HTML reports (R*.htm)
- Exports:
  - balance_sheet.csv + balance_sheet.json
  - income_statement.csv + income_statement.json
  - cash_flow.csv + cash_flow.json
  - manifest.json per company (indexes filings + output paths + source URLs)
  - FilingSummary.xml + the referenced R*.htm files used for parsing
- Preserves structure using an indent array (either from HTML indentation or inferred)
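As an illustration of the FilingSummary.xml step, here is a minimal sketch of picking the three statement reports by name. It assumes the usual un-namespaced Report / ShortName / HtmlFileName elements; the keyword table and the function name pick_statement_reports are hypothetical and not taken from edgar_extract.py, whose actual selection rules may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table (an assumption for this sketch only).
STATEMENT_KEYWORDS = {
    "balance_sheet": ("balance sheet", "financial position"),
    "income_statement": ("statements of operations", "statements of income",
                         "income statement"),
    "cash_flow": ("cash flow",),
}


def pick_statement_reports(filing_summary_xml: str) -> dict:
    """Map each statement kind to the first matching R*.htm file name."""
    root = ET.fromstring(filing_summary_xml)
    chosen = {}
    for report in root.iter("Report"):
        short_name = (report.findtext("ShortName") or "").lower()
        html_file = report.findtext("HtmlFileName")
        # Skip reports without an HTML file and "(Parenthetical)" companions.
        if not html_file or "parenthetical" in short_name:
            continue
        for kind, keywords in STATEMENT_KEYWORDS.items():
            if kind not in chosen and any(k in short_name for k in keywords):
                chosen[kind] = html_file
    return chosen
```

Filings with unusual report names would simply be missing from the returned mapping, which is the kind of case worth recording as a per-filing error.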
- Auto-discovers exports from a “super-root” directory
- Homepage: shows every discovered company/CIK + latest filing date
- Company page: filings grouped by report year (from reportDate)
- Statement page:
  - collapsible tree using indentation
  - expand/collapse all
  - fast label filter
  - link back to the original EDGAR report page
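The year grouping on the company page can be sketched as follows. This assumes each filing is a dict whose reportDate is an ISO-style "YYYY-MM-DD" string; the function name group_by_report_year and the "unknown" bucket are illustrative choices, not the viewer's actual code.

```python
from collections import defaultdict


def group_by_report_year(filings: list) -> dict:
    """Group filing dicts by the year prefix of reportDate, newest year first."""
    groups = defaultdict(list)
    for filing in filings:
        report_date = filing.get("reportDate") or ""
        year = report_date[:4] if len(report_date) >= 4 else "unknown"
        groups[year].append(filing)
    # Known years sort descending; the "unknown" bucket always goes last.
    ordered = sorted(groups, key=lambda y: (y != "unknown", y), reverse=True)
    return {year: groups[year] for year in ordered}
```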
Recommended:
.
├── edgar_extract.py
├── edgar_viewer.py
├── requirements.txt
└── statements/
    ├── ui_statements/
    │   └── 0001511737/
    │       ├── manifest.json
    │       └── <accession>/
    │           ├── balance_sheet.json
    │           ├── income_statement.json
    │           └── cash_flow.json
    └── aapl_statements/
        └── 0000320193/
            └── manifest.json
The viewer supports these layouts under EDGAR_OUT_ROOT:
- ROOT/<CIK>/manifest.json
- ROOT/<collection>/<CIK>/manifest.json
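A minimal sketch of how discovery over these two layouts could work, assuming CIK directories are purely numeric names. The function name discover_manifests and the first-match-wins rule for duplicate CIKs are assumptions for illustration, not the viewer's documented behaviour.

```python
from pathlib import Path


def discover_manifests(root: Path) -> dict:
    """Return {CIK: manifest path} for both supported layouts under root."""
    found = {}
    # Depth 1: ROOT/<CIK>/manifest.json; depth 2: ROOT/<collection>/<CIK>/manifest.json
    for pattern in ("*/manifest.json", "*/*/manifest.json"):
        for manifest in sorted(root.glob(pattern)):
            cik = manifest.parent.name
            if cik.isdigit():  # ignore folders that do not look like a CIK
                found.setdefault(cik, manifest)
    return found
```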
Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
Then install the dependencies:
pip install -r requirements.txt
Suggested requirements.txt:
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
SEC expects automated access to identify itself with a descriptive User-Agent including contact info.
You can set it as an environment variable:
Windows (PowerShell)
$env:SEC_UA = "FreeThe10Ks (your_email@example.com)"
macOS/Linux
export SEC_UA="FreeThe10Ks (your_email@example.com)"
Or pass --user-agent on each run.
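For illustration, resolving the User-Agent from these two sources might look like the sketch below. The helper name sec_headers and the rule that --user-agent takes precedence over SEC_UA are assumptions; check edgar_extract.py for the actual behaviour.

```python
import os
from typing import Optional


def sec_headers(cli_user_agent: Optional[str] = None) -> dict:
    """Build request headers, preferring the CLI value over the SEC_UA variable."""
    user_agent = (cli_user_agent or os.environ.get("SEC_UA", "")).strip()
    if not user_agent:
        # Fail early rather than send anonymous requests to SEC servers.
        raise ValueError("Set SEC_UA or pass --user-agent with contact info")
    return {"User-Agent": user_agent}
```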
Basic run:
python edgar_extract.py --cik 0001511737 --out statements/ui_statements
With options (PowerShell line continuation):
python edgar_extract.py --cik 0001511737 `
  --years 8 `
  --limit 8 `
  --out statements/ui_statements `
  --include-amends
Flags:
- --years: lookback window
- --limit: max number of filings to process
- --include-amends: include 10-K/A
- --keep-abstract: keep XBRL scaffolding rows (default is to drop them)
- --min-interval: rate-limit delay between SEC requests
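The idea behind --min-interval can be sketched as a small limiter that spaces out consecutive requests. The class name MinIntervalLimiter and its injectable clock and sleep parameters are hypothetical and exist only to keep the example testable; the extractor's own limiter may be built differently.

```python
import time


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive calls."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling wait() immediately before each HTTP request is enough to keep a single-threaded run under the chosen request rate.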
PowerShell
$env:SEC_UA = "FreeThe10Ks (your_email@example.com)"
$targets = @(
  @{ name = "ui_statements"; cik = "0001511737" },
  @{ name = "aapl_statements"; cik = "0000320193" },
  @{ name = "msft_statements"; cik = "0000789019" }
)
foreach ($t in $targets) {
  python edgar_extract.py --cik $t.cik --out ("statements\" + $t.name) --years 6 --limit 6
}
This produces multiple export folders under statements/, which the viewer will pick up automatically.
EDGAR_OUT_ROOT should point to the directory containing one or more export collections.
Example:
C:\Users\you\Documents\SEC Edgar\statements
├── ui_statements
├── aapl_statements
└── msft_statements
Windows (PowerShell)
$env:EDGAR_OUT_ROOT = "C:\Users\you\Documents\SEC Edgar\statements"
uvicorn edgar_viewer:app --reload --port 8000
macOS/Linux
export EDGAR_OUT_ROOT="/path/to/statements"
uvicorn edgar_viewer:app --reload --port 8000
Then open the viewer in your browser on port 8000 of the local machine.
Each company's manifest.json contains:
- filings (accession, form, filingDate, reportDate)
- chosen reports for BS/IS/CFS (short/long names + EDGAR URLs)
- output paths for CSV/JSON
- any parse errors per filing
Each statement JSON contains:
- rows: table rows (first row is header)
- indent: integer indentation per row for hierarchy
- indent_mode: "from_html" or "inferred"
- sourceUrl: EDGAR report URL used
- report: metadata describing the selected report
The viewer uses indent to build the collapsible tree.
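A minimal sketch of turning rows and indent into a nested tree, assuming indent has one entry per row (header included) and that the first cell of each row is its label. The function name build_tree and the node shape are illustrative, not the viewer's actual data model.

```python
def build_tree(rows: list, indent: list) -> list:
    """Nest data rows under the nearest preceding row with a smaller indent."""
    root = {"label": None, "cells": [], "children": []}
    stack = [(-1, root)]  # (indent level, node) pairs along the current branch
    # Skip index 0: the first row is the header, not a line item.
    for row, level in zip(rows[1:], indent[1:]):
        node = {"label": row[0] if row else "", "cells": row[1:], "children": []}
        # Pop until the top of the stack is a valid parent (strictly shallower).
        while len(stack) > 1 and stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]
```

Each node's children list is what a collapsible UI would hide or show when the parent row is toggled.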
PowerShell uses $env:VAR = "value", not export VAR=value.
Check:
- EDGAR_OUT_ROOT is correct
- Your exports include a manifest.json at either:
  - ROOT/<CIK>/manifest.json
  - ROOT/<collection>/<CIK>/manifest.json
A filing may use unusual report names or a different report structure. Look at the errors recorded inside the relevant filing entry in the company’s manifest.json.
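To scan a manifest for problem filings quickly, something like the sketch below may help. The key names "filings", "accession", and "errors" are assumptions based on the manifest description in this README; adjust them to match the JSON your extractor actually writes.

```python
import json
from pathlib import Path


def filings_with_errors(manifest_path: Path) -> dict:
    """Return {accession: errors} for filings that recorded any parse errors."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = {}
    for filing in manifest.get("filings", []):  # assumed key name
        errors = filing.get("errors") or []     # assumed key name
        if errors:
            problems[filing.get("accession", "?")] = errors
    return problems
```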
- The extractor includes a small rate limiter and retry/backoff for transient errors.
- File writes use safe path resolution to avoid writing outside the chosen output directory.
- The viewer is intentionally server-side only: it serves HTML pages and reads JSON locally.
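The safe path resolution mentioned above can be illustrated with a small helper that refuses any target outside the output directory. This is a sketch of the general technique; the name safe_join is hypothetical and the extractor's own check may differ.

```python
from pathlib import Path


def safe_join(out_dir: Path, *parts: str) -> Path:
    """Join parts under out_dir, rejecting results that escape it."""
    base = out_dir.resolve()
    target = base.joinpath(*parts).resolve()
    # After resolving "..", symlinks, and absolute parts, the target must
    # still be the base directory itself or live somewhere beneath it.
    if target != base and base not in target.parents:
        raise ValueError(f"refusing to write outside {base}: {target}")
    return target
```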



