|
1 | 1 | # omphalOS |
2 | 2 |
|
3 | | -omphalOS is a deterministic analysis workbench for trade and technology-transfer oversight. It builds a run directory that contains: inputs, normalized datasets, a warehouse, scored entities, review tables, exports, and a machine-checkable integrity index. |
| 3 | +omphalOS exists for a simple reason: _Analytic conclusions outlive the circumstances that produced them._ |
4 | 4 |
|
5 | | -The repository ships a reference implementation with synthetic data. It is structured to support the same workflow across workstation runs, scheduled runs, and deployed runs, while preserving a stable artifact contract. |
| 5 | +In U.S. government settings concerned with trade, technology, export controls, and enforcement (the settings for which this system was first built in May 2024, then modernized for open release in December 2025), an output that may inform action is expected to remain (i) legible under review, (ii) traceable to its provenance, and (iii) transmissible only under deliberate restraint. |
6 | 6 |
|
7 | | -## Scope |
| 7 | +The purpose, here, is practical: **to make that posture routine.** |
8 | 8 |
|
9 | | -omphalOS covers four tasks: |
| 9 | +This public release is a sanitized reference implementation, and all example data is synthetic. |
10 | 10 |
|
11 | | -1. Ingest: load a trade feed and a registry (lists, watchlists, or reference entities). |
12 | | -2. Normalize: canonicalize fields, enforce schemas, and derive deterministic features. |
13 | | -3. Score and assemble: match trade records to registry entities, compute entity exposure summaries, and write review-ready tables. |
14 | | -4. Package: fingerprint all outputs, emit a run manifest, and (optionally) assemble a release bundle for distribution. |
| 11 | +## What the system asserts |
15 | 12 |
|
16 | | -## Run directory contract |
| 13 | +A run is treated as an evidentiary package: it yields deliverables for a reader and, inseparably, a record adequate to explain what was done, reproduce it when feasible, and detect post-hoc alteration without argument. |
17 | 14 |
|
18 | | -A run is an immutable directory rooted at: |
| 15 | +The claims, then, are intentionally narrow: |
19 | 16 |
|
20 | | -artifacts/runs/<run_id>/ |
| 17 | +1. Integrity: a completed run directory can be checked against its manifest. (To wit, if the fingerprints do not match, the package has changed.) |
| 18 | +2. Comparability: two runs can be compared at the level of declared outputs, so disagreement can be located rather than narrated. |
| 19 | +3. Controlled distribution: a publishability scan surfaces common disclosure hazards before a package leaves its originating context. |
21 | 20 |
|
22 | | -The directory is treated as write-once: outputs are written under stable paths, then indexed and fingerprinted. The manifest contains: |
| 21 | +No stronger guarantee is implied. Correctness remains a matter of method, inputs, and judgment. |
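The integrity claim can be illustrated with a minimal sketch. It assumes a hypothetical manifest layout in which `run_manifest.json` holds an `artifacts` list of `{"path", "sha256"}` entries; the field names and the helper functions `sha256_of` and `check_run` are illustrative assumptions, not the actual omphalOS schema or the implementation behind `omphalos verify`.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_run(run_dir: Path) -> list[str]:
    """Return declared paths that are missing or no longer match the manifest."""
    # Assumed (hypothetical) layout: {"artifacts": [{"path": ..., "sha256": ...}]}
    manifest = json.loads((run_dir / "run_manifest.json").read_text())
    mismatched = []
    for entry in manifest["artifacts"]:
        target = run_dir / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            mismatched.append(entry["path"])
    return mismatched
```

An empty result means every declared artifact still matches its recorded fingerprint. It says nothing about whether the analysis itself was correct, which is consistent with the narrow claims above.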
23 | 22 |
|
24 | | -- metadata: tool version, run_id, timestamps, environment identifiers |
25 | | -- declared artifacts: relative paths, sizes, sha256 |
26 | | -- merkle root of the artifact set |
27 | | -- structured reports: dataset validation, matching statistics, scoring summaries |
28 | | -- release metadata when a bundle is assembled |
| 23 | +## What a reader can expect from the record |
29 | 24 |
|
30 | | -This contract is the unit of comparison and verification. |
| 25 | +A run produces a directory intended to travel as a unit. The directory is structured so a reviewer can answer, from the artifacts alone, the questions that reliably matter once work leaves its originating workspace: |
31 | 26 |
|
32 | | -## Data model |
| 27 | +- What inputs were admitted, and what boundaries were enforced? |
| 28 | +- What rules governed transformations, and where are those rules stated? |
| 29 | +- Which outputs are intended for consumption, which are intermediate, and which require human review? |
| 30 | +- What may be shared, with whom, and with what risk of inadvertent disclosure? |
| 31 | +- When two executions disagree, is the disagreement substantive or procedural? |
33 | 32 |
|
34 | | -The reference warehouse is a SQLite database written to: |
| 33 | +If a package cannot answer these questions, it is incomplete work. |
35 | 34 |
|
36 | | -artifacts/runs/<run_id>/warehouse/warehouse.sqlite |
| 35 | +## Minimal use |
37 | 36 |
|
38 | | -Base tables: |
| 37 | +One may verify the included sample run: |
39 | 38 |
|
40 | | -- trade_feed: one row per shipment |
41 | | -- registry: one row per entity |
42 | | -- entity_matches: one row per shipment-entity candidate match |
43 | | -- entity_scores: one row per entity summary |
| 39 | +```bash |
| 40 | +python -m omphalos verify --run-dir examples/sample_run |
| 41 | +``` |
44 | 42 |
|
45 | | -The maximal pipeline extends trade_feed with exporter_country and importer_country while preserving the legacy country field. |
| 43 | +Execute the synthetic reference pipeline: |
46 | 44 |
|
47 | | -## Warehouse and SQL surfaces |
| 45 | +```bash |
| 46 | +python -m omphalos run --config config/runs/example_run.yaml |
| 47 | +``` |
48 | 48 |
|
49 | | -The repository contains two SQL surfaces: |
| 49 | +Verify a generated run directory: |
50 | 50 |
|
51 | | -1. Warehouse transforms: a dbt project under warehouse/ that defines staging, intermediate, and mart models. It is written to run against SQLite, DuckDB, or Postgres using profiles shipped under warehouse/profiles/. |
52 | | -2. Analyst catalog: a curated query library under sql/ organized by briefing, review, audit, and investigations. Catalog execution records the query text, parameters, and output fingerprints into the run directory. |
| 51 | +```bash |
| 52 | +python -m omphalos verify --run-dir artifacts/runs/<run_id> |
| 53 | +``` |
53 | 54 |
|
54 | | -Both surfaces are designed to be executable and to emit artifacts that the manifest can index. |
| 55 | +Compare two runs for payload-level equivalence: |
55 | 56 |
|
56 | | -## Orchestration and deployment |
| 57 | +```bash |
| 58 | +python -m omphalos certify --run-a artifacts/runs/<runA> --run-b artifacts/runs/<runB> |
| 59 | +``` |
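As a rough illustration of what comparing runs at the level of declared outputs might involve, the sketch below diffs two already-loaded manifests. It assumes the same hypothetical `artifacts` list of `{"path", "sha256"}` entries; `diff_declared` is an illustrative name and does not describe how `omphalos certify` is actually implemented.

```python
def diff_declared(manifest_a: dict, manifest_b: dict) -> dict[str, list[str]]:
    """Locate disagreement between two runs by declared path and fingerprint."""
    # Assumed (hypothetical) layout: {"artifacts": [{"path": ..., "sha256": ...}]}
    a = {entry["path"]: entry["sha256"] for entry in manifest_a["artifacts"]}
    b = {entry["path"]: entry["sha256"] for entry in manifest_b["artifacts"]}
    return {
        # Declared in one run but absent from the other.
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        # Declared in both runs but with different content.
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }
```

Two runs are equivalent in this limited sense when all three lists are empty; any non-empty list points to the specific paths where the runs diverge.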
57 | 60 |
|
58 | | -The repository includes: |
| 61 | +## Distribution |
59 | 62 |
|
60 | | -- scripts/ as the canonical operator interface (run, verify, certify, backfill, release-build, release-verify) |
61 | | -- orchestration/airflow/ with DAGs that call the same runner interfaces |
62 | | -- infra/k8s with base manifests and overlays for scheduled jobs |
63 | | -- infra/terraform with modules and cloud examples for storage, identity, and logging |
64 | | -- spark/scala as an optional scaling path for ingestion and coarse aggregations |
| 63 | +When a run must be transmitted as a single object: |
65 | 64 |
|
66 | | -## Policy |
| 65 | +```bash |
| 66 | +python -m omphalos release build --run-dir artifacts/runs/<run_id> --out artifacts/releases/<run_id>.tar.gz |
| 67 | +python -m omphalos release verify --bundle artifacts/releases/<run_id>.tar.gz |
| 68 | +``` |
67 | 69 |
|
68 | | -policies/opa contains Rego policies that can evaluate: |
| 70 | +Before distributing outputs outside the environment in which they were generated: |
69 | 71 |
|
70 | | -- run manifests and release bundles |
71 | | -- publishability constraints |
72 | | -- infrastructure constraints for Terraform plans and Kubernetes manifests |
| 72 | +```bash |
| 73 | +python -m omphalos publishability scan --path . --out artifacts/reports/publishability.json |
| 74 | +``` |
73 | 75 |
|
74 | | -Policy evaluation produces structured reports under the run directory. |
| 76 | +The scan should be treated as a pre-flight gate: a clean report rules out common disclosure hazards, but it does not constitute a blanket safety determination. |
75 | 77 |
|
76 | | -## User interface |
| 78 | +## Configuration and declared rules |
77 | 79 |
|
78 | | -ui/ provides a local run browser that renders: |
| 80 | +Runs are configured in `config/runs/`. Schemas and rule packs live in `contracts/`. |
79 | 81 |
|
80 | | -- run manifests |
81 | | -- reports and diffs between runs |
82 | | -- review tables and export artifacts |
| 82 | +The governing posture is explicitness. Shapes worth consuming should be declared. Rules worth relying on should be written down. Failures should be inspectable. |
83 | 83 |
|
84 | | -The UI reads from a small API server under src/omphalos/api. |
| 84 | +## Appendix A: run directory layout |
85 | 85 |
|
86 | | -## Independent verifiers |
| 86 | +A typical run directory includes: |
87 | 87 |
|
88 | | -agents/ contains small verifiers that can validate a run directory without importing the Python package: |
| 88 | +- `run_manifest.json` |
| 89 | + Inventory of outputs with integrity fingerprints. |
89 | 90 |
|
90 | | -- agents/go/omphalos-verify |
91 | | -- agents/rust/omphalos-verify |
| 91 | +- `exports/` |
| 92 | + Reader-facing products (tables, narrative, packet-style records). |
92 | 93 |
|
93 | | -## Command line |
| 94 | +- `reports/` |
| 95 | + Structured checks and summaries (quality, determinism comparison, publishability scan, dependency inventory). |
94 | 96 |
|
95 | | -The CLI exposes: |
| 97 | +- `lineage/` |
| 98 | + Append-only event record of execution. |
96 | 99 |
|
97 | | -- omphalos run: reference pipeline on synthetic data |
98 | | -- omphalos verify: recompute fingerprints and validate the manifest |
99 | | -- omphalos compare: compare declared artifacts between runs |
100 | | -- omphalos release: build and verify release bundles |
| 100 | +- `warehouse/` |
| 101 | + Local SQLite artifact used by the reference pipeline. |
101 | 102 |
|
102 | | -Maximal pipelines and additional surfaces are available under src/omphalos/maximal and are invoked through explicit commands and job specs. |
| 103 | +## Appendix B: operating expectations |
103 | 104 |
|
104 | | -## Files kept for provenance |
| 105 | +omphalOS assumes two operating expectations throughout: |
105 | 106 |
|
106 | | -Original repository files are preserved. Where a file is materially upgraded, the prior content is copied into .legacy_snapshots/ with the same relative path before modification. |
| 107 | +Firstly, the run directory is treated as an immutable package once the run completes. Editing outputs “for presentation” after completion is a change in evidence. If edits are required, the disciplined move is to rerun under a revised configuration and allow the record to reflect the revision. |
| 108 | + |
| 109 | +Secondly, comparisons are only as meaningful as the boundaries you enforce. If the run’s inputs depend on ambient state—untracked files, implicit credentials, external services whose responses are not recorded—then replay will converge on approximation rather than identity. The system will still produce a record; it cannot supply missing constraints. |
| 110 | + |
| 111 | +## Documentation |
| 112 | + |
| 113 | +Start with: |
| 114 | + |
| 115 | +- `docs/overview.md` |
| 116 | +- `docs/architecture.md` |
| 117 | +- `docs/artifacts.md` |
| 118 | +- `docs/cli.md` |
| 119 | +- `docs/open_source_readiness.md` |
| 120 | +- `docs/threat_model.md` |
| 121 | + |
| 122 | +## License |
| 123 | + |
| 124 | +Apache-2.0; see `LICENSE` and `NOTICE`. Citation metadata is in `CITATION.cff`. |
| 125 | + |
| 126 | + |
| 127 | +## Local run |
| 128 | + |
| 129 | +```bash |
| 130 | +omphalos run --config config/runs/example_run.yaml |
| 131 | +``` |
| 132 | + |
| 133 | +This produces a run directory under `artifacts/runs/<run_id>/` with `run_manifest.json` binding inputs, artifacts, and exports by hash. |
| 134 | + |
| 135 | +## Local review UI |
| 136 | + |
| 137 | +```bash |
| 138 | +omphalos serve --runs-root artifacts/runs --host 127.0.0.1 --port 8000 |
| 139 | +``` |
| 140 | + |
| 141 | +Open `http://127.0.0.1:8000/`. |
| 142 | + |
| 143 | +The UI is a client of the API exposed at `/api/*` and reads all data from the run directories on disk. |