
FreeThe10Ks — SEC EDGAR 10-K Financial Statement Explorer

Turn bloated 10-Ks into clean, navigable financial statements (Balance Sheet, Income Statement, Cash Flow) and explore them locally with a professional UI — complete with analytics, charts, and multi-year trend tracking.


Why This Exists

Most 10-Ks are massive documents where the core financial statements are buried under pages of narrative. This project:

  1. Extracts the statements directly from EDGAR's report HTML
  2. Structures them as clean CSV + JSON with hierarchy data
  3. Visualizes them in an interactive local web app with analytics

Features

Extractor (sec_statements.py)

  • Pulls filings via data.sec.gov/submissions
  • Finds reports via FilingSummary.xml
  • Parses tables from EDGAR's statement HTML (R*.htm) with robust handling for:
    • colspan/rowspan, multi-page table stitching, multi-line header merging
    • CSS indentation detection (4 fallback strategies)
    • XBRL scaffolding removal
  • Exports CSV + JSON + manifest per company
  • Rate limiting with exponential backoff
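The retry behaviour mentioned in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in sec_statements.py: the function names, the 0.5 s base delay, the 8 s cap, and the 10% jitter are all assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Return one delay per retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 4,
    base: float = 0.5,
    cap: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); after a failure, wait and try again, up to `retries` retries."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except Exception:
            # Hypothetical policy: up to 10% jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    # Final attempt: any exception raised here propagates to the caller.
    return fn()
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting, for example `call_with_backoff(fetch, sleep=lambda s: None)` where `fetch` is whatever callable performs the request.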

Viewer (app/)

  • Homepage: Company cards with real names/tickers (resolved from SEC API), key financial metrics with YoY% change, mini sparkline charts
  • Company Page: Key metrics dashboard, multi-year trend chart (Chart.js), financial ratios (Current Ratio, D/E, margins), filings grouped by fiscal year
  • Statement Page: Collapsible hierarchy tree, YoY% change column (color-coded), inline magnitude bars, label search/filter, keyboard navigation
  • On-Demand Extraction: Add new companies directly from the UI
  • Company Search: Search SEC's full company database by name or ticker
  • REST API: JSON endpoints for metrics, trends, ratios, search
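The YoY% change shown on the homepage and statement page is a standard percent-change calculation. A minimal sketch, assuming values are ordered oldest to newest and that a missing or zero prior-year value yields no percentage; the function names are hypothetical and the actual logic in analytics.py may differ:

```python
from typing import Optional


def yoy_pct(current: Optional[float], previous: Optional[float]) -> Optional[float]:
    """Percent change from previous to current, or None when it is undefined."""
    if current is None or previous is None or previous == 0:
        return None
    # Assumption: divide by abs(previous) so that moving from -100 to -50
    # reads as an improvement (+50%) rather than a decline.
    return (current - previous) / abs(previous) * 100.0


def yoy_series(values: list[Optional[float]]) -> list[Optional[float]]:
    """YoY% for each period; the first period has no prior year, so it is None."""
    result: list[Optional[float]] = [None]
    for prev, cur in zip(values, values[1:]):
        result.append(yoy_pct(cur, prev))
    return result
```

For example, `yoy_series([100, 120, 90])` gives `[None, 20.0, -25.0]`.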

Project Structure

FreeThe10Ks/
  app/
    main.py                  # FastAPI app entry point
    config.py                # Environment settings
    models.py                # Pydantic response models
    routers/
      pages.py               # HTML page routes (Jinja2)
      api.py                 # JSON API routes (/api/...)
    services/
      filing_store.py        # Manifest discovery & JSON loading
      company_info.py        # SEC company name/ticker resolution
      analytics.py           # YoY, metrics, ratios computation
      extractor.py           # On-demand extraction wrapper
    templates/
      base.html              # Shared layout
      index.html             # Homepage
      company.html           # Company detail page
      statement.html         # Statement viewer
    static/
      css/styles.css         # Design system
      css/statement.css      # Table styles
      js/                    # Client-side interactivity
  sec_statements.py          # CLI extractor
  edgar_viewer.py            # Legacy viewer (still works standalone)
  requirements.txt

Quick Start

1. Install

python3 -m venv .venv
source .venv/bin/activate    # macOS/Linux
# .venv\Scripts\Activate.ps1  # Windows PowerShell
pip install -r requirements.txt

2. Set SEC User-Agent (required by SEC)

macOS/Linux

export SEC_UA="FreeThe10Ks (your_email@example.com)"

Windows PowerShell

$env:SEC_UA = "FreeThe10Ks (your_email@example.com)"

3. Extract some companies

python sec_statements.py --cik 0001045810 --out statements/nvidia
python sec_statements.py --cik 0000320193 --out statements/apple
python sec_statements.py --cik 0000789019 --out statements/microsoft

4. Run the viewer

export EDGAR_OUT_ROOT="statements"
uvicorn app.main:app --reload --port 8000

Open http://127.0.0.1:8000/


API Endpoints

Endpoint                                        Description
GET /api/company/search?q=apple                 Search SEC companies by name/ticker
GET /api/company/{cik}                          Get company name, ticker, exchange
GET /api/companies                              List all loaded companies
GET /api/companies/{col}/{cik}/metrics          Key financial metrics from latest filing
GET /api/companies/{col}/{cik}/ratios           Financial ratios (current, D/E, margins)
GET /api/companies/{col}/{cik}/trends/{metric}  Multi-year trend data for charts
POST /api/extract                               Trigger on-demand extraction for a CIK

Available trend metrics: revenue, net_income, total_assets, total_equity, operating_cash_flow
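When calling the trends endpoint from a script, it can help to validate the metric name before sending the request. A small sketch, assuming the viewer runs at http://127.0.0.1:8000 as in Quick Start and that {col} is the collection directory name; `trend_url` is a hypothetical helper, not part of the app:

```python
from urllib.parse import quote

# Metric names as listed above.
TREND_METRICS = {
    "revenue",
    "net_income",
    "total_assets",
    "total_equity",
    "operating_cash_flow",
}


def trend_url(base: str, collection: str, cik: str, metric: str) -> str:
    """Build the trends endpoint URL, rejecting metrics the API does not list."""
    if metric not in TREND_METRICS:
        raise ValueError(f"unknown metric {metric!r}; expected one of {sorted(TREND_METRICS)}")
    col = quote(collection, safe="")
    return f"{base.rstrip('/')}/api/companies/{col}/{quote(cik, safe='')}/trends/{metric}"


url = trend_url("http://127.0.0.1:8000", "statements", "0000320193", "revenue")
print(url)
```

The resulting URL can be fetched with any HTTP client; the shape of the JSON response is not documented here, so inspect it before relying on specific keys.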


Extractor Options

python sec_statements.py --cik 0001045810 \
  --years 8 \
  --limit 8 \
  --out statements/nvidia \
  --include-amends

Flag              Default             Description
--cik             required            Company CIK number
--years           5                   Lookback window (years)
--limit           5                   Max filings to process
--out             sec_statements_out  Output directory
--user-agent      $SEC_UA             SEC-required contact info
--include-amends  false               Include 10-K/A amendments
--keep-abstract   false               Keep XBRL scaffolding rows

Batch Extraction

Bash

for cik in 0001045810 0000320193 0000789019; do
  python sec_statements.py --cik "$cik" --out "statements/$cik" --years 6 --limit 6
done

PowerShell

$env:SEC_UA = "FreeThe10Ks (your_email@example.com)"
@("0001045810", "0000320193", "0000789019") | ForEach-Object {
  python sec_statements.py --cik $_ --out "statements\$_" --years 6 --limit 6
}

Output Format

manifest.json (per CIK)

Indexes all processed filings with accession numbers, dates, chosen reports, output paths, and any parse errors.

*_statement.json

Each statement JSON contains:

  • rows: table data (first row = header)
  • indent: integer hierarchy levels per row
  • indent_mode: "from_html" or "inferred"
  • sourceUrl: original EDGAR report URL
  • report: metadata about the selected report
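To illustrate how rows and indent fit together, here is a rough sketch that renders a statement as indented text. It assumes that indent holds one integer per entry in rows (header included) and that the first cell of each row is the label; check both against a real export. `render_statement` and the sample data are made up for the example.

```python
def render_statement(statement: dict, indent_width: int = 2) -> list[str]:
    """Return one text line per row, indenting each row by its hierarchy level."""
    rows = statement["rows"]
    indents = statement["indent"]
    header, body = rows[0], rows[1:]
    # Assumption: if `indent` covers the header too, skip its first entry.
    body_indents = indents[1:] if len(indents) == len(rows) else indents
    lines = [" | ".join(str(cell) for cell in header)]
    for row, level in zip(body, body_indents):
        prefix = " " * (indent_width * level)
        lines.append(prefix + " | ".join(str(cell) for cell in row))
    return lines


# Hypothetical data shaped like the fields described above.
sample = {
    "rows": [
        ["Label", "2024", "2023"],
        ["Total assets", "300", "250"],
        ["Cash", "100", "80"],
    ],
    "indent": [0, 0, 1],
}

for line in render_statement(sample):
    print(line)
```

To run this against an export, load the file with `json.load` and pass the resulting dict to `render_statement`.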

Environment Variables

Variable         Required  Description
SEC_UA           Yes       SEC User-Agent with contact email
EDGAR_OUT_ROOT   No        Root directory for statement exports (default: statements)
EDGAR_CACHE_DIR  No        Cache directory for SEC API responses (default: cache)

Keyboard Shortcuts (Statement Page)

Key            Action
↑ / k          Previous row
↓ / j          Next row
Enter / Space  Toggle expand/collapse
/              Focus search
Escape         Clear focus / blur search

Legacy Viewer

The original single-file viewer (edgar_viewer.py) still works:

export EDGAR_OUT_ROOT="statements"
uvicorn edgar_viewer:app --reload --port 8000

Troubleshooting

Viewer says "No manifests found"

Check that EDGAR_OUT_ROOT points to a directory containing exports with manifest.json files at:

  • ROOT/<CIK>/manifest.json, or
  • ROOT/<collection>/<CIK>/manifest.json
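A quick way to check what the viewer should be able to see is to search both layouts yourself. This sketch only mirrors the two path patterns listed above; `find_manifests` is a hypothetical helper and the app's own discovery code in filing_store.py may apply further rules.

```python
import os
from pathlib import Path


def find_manifests(root: str) -> list[Path]:
    """Return manifest.json files at ROOT/<CIK>/ and ROOT/<collection>/<CIK>/."""
    base = Path(root)
    one_level = set(base.glob("*/manifest.json"))
    two_level = set(base.glob("*/*/manifest.json"))
    return sorted(one_level | two_level)


if __name__ == "__main__":
    # Falls back to "statements", the default named under Environment Variables.
    out_root = os.environ.get("EDGAR_OUT_ROOT", "statements")
    for path in find_manifests(out_root):
        print(path)
```

If this prints nothing, the extractor output is probably in a different directory than the one EDGAR_OUT_ROOT names, or nested more deeply than two levels.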

Some filings have missing statements

A filing may use unusual report names or a different report structure, in which case the extractor may not find a matching statement. Check the errors recorded inside the relevant filing entry in the company's manifest.json.
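To see which report names a filing actually offers, you can list the entries in its FilingSummary.xml. The sketch below parses a trimmed, hand-written sample; the element names (Report, HtmlFileName, ShortName, MenuCategory) match what EDGAR filing summaries typically contain, but treat them as assumptions and verify against the real file. `list_reports` is a hypothetical helper, not the extractor's own selection logic.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a FilingSummary.xml file.
SAMPLE = """<FilingSummary>
  <MyReports>
    <Report>
      <HtmlFileName>R2.htm</HtmlFileName>
      <ShortName>CONSOLIDATED BALANCE SHEETS</ShortName>
      <MenuCategory>Statements</MenuCategory>
    </Report>
    <Report>
      <HtmlFileName>R9.htm</HtmlFileName>
      <ShortName>Income Taxes</ShortName>
      <MenuCategory>Notes</MenuCategory>
    </Report>
  </MyReports>
</FilingSummary>"""


def list_reports(xml_text: str) -> list[dict]:
    """Return file name, short name, and menu category for each Report element."""
    root = ET.fromstring(xml_text)
    reports = []
    for report in root.iter("Report"):
        reports.append({
            "file": report.findtext("HtmlFileName", default=""),
            "name": report.findtext("ShortName", default=""),
            "category": report.findtext("MenuCategory", default=""),
        })
    return reports


# Keep only the entries filed under the "Statements" menu category.
statements = [r for r in list_reports(SAMPLE) if r["category"] == "Statements"]
print(statements)
```

Comparing the short names listed this way with the names the extractor expects can explain why a statement was skipped.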

Company names not showing

Set SEC_UA so the app can query the SEC EDGAR API for company metadata. Names are cached after first lookup.
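To confirm that SEC_UA is set and well formed before starting the app, you can build the same kind of request by hand. A minimal sketch, assuming the data.sec.gov/submissions endpoint with a zero-padded 10-digit CIK; `submissions_request` is a hypothetical helper and nothing is sent over the network here.

```python
import os
import urllib.request


def submissions_request(cik: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a submissions request carrying the User-Agent."""
    if not user_agent.strip():
        raise ValueError("SEC_UA is empty; set it to a contact string with your email")
    padded = str(int(cik)).zfill(10)  # CIKs are zero-padded to 10 digits in the URL
    url = f"https://data.sec.gov/submissions/CIK{padded}.json"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    ua = os.environ.get("SEC_UA", "")
    if ua.strip():
        print("Would request:", submissions_request("0000320193", ua).full_url)
    else:
        print("SEC_UA is not set")
```

Passing the returned object to `urllib.request.urlopen` performs the actual lookup; keep request volume low, since the SEC applies rate limits.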


License

None