|
1 | | -omphalOS |
| 1 | +# omphalOS |
2 | 2 |
|
3 | | -Deterministic workbench that ingests a trade feed and an entity registry, resolves entity matches, produces scored outputs, and packages artifacts for verification and release. |
| 3 | +omphalOS is a deterministic analysis workbench for trade and technology-transfer oversight. It builds a run directory that contains inputs, normalized datasets, a warehouse, scored entities, review tables, exports, and a machine-checkable integrity index. |
4 | 4 |
|
5 | | -Directory map |
6 | | -- src/omphalos: Python implementation (CLI, pipeline, rules, artifacts, verification) |
7 | | -- config: run configurations |
8 | | -- warehouse: SQLite schema plus dbt project for modeling |
9 | | -- sql: curated SQL catalog and playbooks targeting a run warehouse |
10 | | -- orchestration: Airflow DAGs and scheduler surfaces |
11 | | -- spark: PySpark and Scala Spark transforms that mirror warehouse rollups |
12 | | -- infra: Terraform and Kubernetes manifests |
13 | | -- policies: OPA policies for publishability and quality gates |
14 | | -- agents: Go and Rust verifiers for run bundles |
15 | | -- ui: run_manifest inspector (React) |
| 5 | +The repository ships a reference implementation with synthetic data. It is structured to support the same workflow across workstation runs, scheduled runs, and deployed runs, while preserving a stable artifact contract. |
16 | 6 |
|
17 | | -<<<<<<< Updated upstream |
18 | | -The purpose, here, is practical: **to make that posture routine.** |
| 7 | +## Scope |
19 | 8 |
|
20 | | -This public release is a sanitized reference implementation, and all example data is synthetic. |
| 9 | +omphalOS covers four tasks: |
21 | 10 |
|
22 | | -## What the system asserts |
| 11 | +1. Ingest: load a trade feed and a registry (lists, watchlists, or reference entities). |
| 12 | +2. Normalize: canonicalize fields, enforce schemas, and derive deterministic features. |
| 13 | +3. Score and assemble: match trade records to registry entities, compute entity exposure summaries, and write review-ready tables. |
| 14 | +4. Package: fingerprint all outputs, emit a run manifest, and (optionally) assemble a release bundle for distribution. |
23 | 15 |
|
24 | | -A run is treated as an evidentiary package: it yields deliverables for a reader and, inseparably, a record adequate to explain what was done, reproduce it when feasible, and detect post-hoc alteration without argument. |
| 16 | +## Run directory contract |
25 | 17 |
|
26 | | -The claims, then, are intentionally narrow: |
| 18 | +A run is an immutable directory rooted at: |
27 | 19 |
|
28 | | -1. Integrity: a completed run directory can be checked against its manifest. (To wit, if the fingerprints do not match, the package has changed.) |
29 | | -2. Comparability: two runs can be compared at the level of declared outputs, so disagreement can be located rather than narrated. |
30 | | -3. Controlled distribution: a publishability scan surfaces common disclosure hazards before a package leaves its originating context. |
| 20 | +artifacts/runs/<run_id>/ |
31 | 21 |
|
32 | | -No stronger guarantee is implied. Correctness remains a matter of method, inputs, and judgment. |
| 22 | +The directory is treated as write-once: outputs are written under stable paths, then indexed and fingerprinted. The manifest contains: |
33 | 23 |
|
34 | | -## What a reader can expect from the record |
| 24 | +- metadata: tool version, run_id, timestamps, environment identifiers |
| 25 | +- declared artifacts: relative paths, sizes, sha256 |
| 26 | +- merkle root of the artifact set |
| 27 | +- structured reports: dataset validation, matching statistics, scoring summaries |
| 28 | +- release metadata when a bundle is assembled |
35 | 29 |
|
36 | | -A run produces a directory intended to travel as a unit. The directory is structured so a reviewer can answer, from the artifacts alone, the questions that reliably matter once work leaves its originating workspace: |
| 30 | +This contract is the unit of comparison and verification. |
37 | 31 |
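The manifest fields listed in the new "Run directory contract" section (relative paths, sizes, sha256, merkle root) can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the manifest keys (`artifacts`, `path`, `size`, `sha256`, `merkle_root`), and the pairwise SHA-256 Merkle scheme with last-node duplication are all assumptions made for illustration, since the diff does not state how the root is computed.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex leaf hashes pairwise into one root (assumed scheme)."""
    if not leaf_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def build_manifest(run_dir: Path, run_id: str) -> dict:
    """Index every file under run_dir by relative path, size, and sha256."""
    artifacts = []
    # Sorting makes the artifact order, and therefore the root, deterministic.
    for path in sorted(p for p in run_dir.rglob("*") if p.is_file()):
        artifacts.append({
            "path": path.relative_to(run_dir).as_posix(),
            "size": path.stat().st_size,
            "sha256": sha256_file(path),
        })
    return {
        "run_id": run_id,  # hypothetical subset of the metadata fields
        "artifacts": artifacts,
        "merkle_root": merkle_root([a["sha256"] for a in artifacts]),
    }
```

Under these assumptions, calling `build_manifest(Path("artifacts/runs/<run_id>"), "<run_id>")` twice on an unchanged directory yields the same dictionary, and changing any file's bytes changes both that file's `sha256` and the root.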
|
38 | | -- What inputs were admitted, and what boundaries were enforced? |
39 | | -- What rules governed transformations, and where are those rules stated? |
40 | | -- Which outputs are intended for consumption, which are intermediate, and which require human review? |
41 | | -- What may be shared, with whom, and with what risk of inadvertent disclosure? |
42 | | -- When two executions disagree, is the disagreement substantive or procedural? |
| 32 | +## Data model |
43 | 33 |
|
44 | | -If a package cannot answer these questions, it is incomplete work. |
| 34 | +The reference warehouse is a SQLite database written to: |
45 | 35 |
|
46 | | -## Minimal use |
| 36 | +artifacts/runs/<run_id>/warehouse/warehouse.sqlite |
47 | 37 |
|
48 | | -From a fresh clone, either install the package: |
| 38 | +Base tables: |
49 | 39 |
|
50 | | -```bash |
51 | | -python -m pip install -e . |
52 | | -``` |
| 40 | +- trade_feed: one row per shipment |
| 41 | +- registry: one row per entity |
| 42 | +- entity_matches: one row per shipment-entity candidate match |
| 43 | +- entity_scores: one row per entity summary |
53 | 44 |
|
54 | | -or run directly from source by setting: |
| 45 | +The maximal pipeline extends trade_feed with exporter_country and importer_country while preserving the legacy country field. |
55 | 46 |
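As a rough illustration of the four base tables named in the new "Data model" section, the sketch below creates them in SQLite and derives `entity_scores` from `entity_matches`. Only the table names and the `country`, `exporter_country`, and `importer_country` fields come from the diff; every other column, the keys, and both function names are hypothetical, and the real schema under warehouse/ may differ.

```python
import sqlite3

# Table names follow the README; columns marked hypothetical are assumptions.
SCHEMA = """
CREATE TABLE trade_feed (
    shipment_id      TEXT PRIMARY KEY,  -- hypothetical key: one row per shipment
    exporter_name    TEXT,              -- hypothetical column
    country          TEXT,              -- legacy country field
    exporter_country TEXT,              -- added by the maximal pipeline
    importer_country TEXT               -- added by the maximal pipeline
);
CREATE TABLE registry (
    entity_id   TEXT PRIMARY KEY,       -- hypothetical key: one row per entity
    entity_name TEXT                    -- hypothetical column
);
CREATE TABLE entity_matches (
    shipment_id TEXT REFERENCES trade_feed(shipment_id),
    entity_id   TEXT REFERENCES registry(entity_id),
    match_score REAL,                   -- hypothetical column
    PRIMARY KEY (shipment_id, entity_id)
);
CREATE TABLE entity_scores (
    entity_id       TEXT PRIMARY KEY REFERENCES registry(entity_id),
    shipment_count  INTEGER,            -- hypothetical summary column
    max_match_score REAL                -- hypothetical summary column
);
"""


def create_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    """Create the four base tables in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def refresh_entity_scores(conn: sqlite3.Connection) -> None:
    """Rebuild entity_scores as one summary row per matched entity."""
    conn.execute("DELETE FROM entity_scores")
    conn.execute(
        """
        INSERT INTO entity_scores (entity_id, shipment_count, max_match_score)
        SELECT entity_id, COUNT(*), MAX(match_score)
        FROM entity_matches
        GROUP BY entity_id
        """
    )
    conn.commit()
```

The summary is computed purely from rows already in the warehouse, which is one way to keep scoring deterministic for a fixed set of inputs.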
|
56 | | -```bash |
57 | | -export PYTHONPATH="$(pwd)/src" |
58 | | -``` |
| 47 | +## Warehouse and SQL surfaces |
59 | 48 |
|
| 49 | +The repository contains two SQL surfaces: |
60 | 50 |
|
| 51 | +1. Warehouse transforms: a dbt project under warehouse/ that defines staging, intermediate, and mart models. It is written to run against SQLite, DuckDB, or Postgres using profiles shipped under warehouse/profiles/. |
| 52 | +2. Analyst catalog: a curated query library under sql/ organized by briefing, review, audit, and investigations. Catalog execution records the query text, parameters, and output fingerprints into the run directory. |
61 | 53 |
|
62 | | -One may verify the included sample run: |
| 54 | +Both surfaces are designed to be executable and to emit artifacts that the manifest can index. |
63 | 55 |
|
64 | | -```bash |
65 | | -python -m omphalos verify --run-dir examples/sample_run |
66 | | -``` |
| 56 | +## Orchestration and deployment |
67 | 57 |
|
68 | | -Execute the synthetic reference pipeline: |
| 58 | +The repository includes: |
69 | 59 |
|
70 | | -```bash |
71 | | -python -m omphalos run --config config/runs/example_run.yaml |
72 | | -``` |
| 60 | +- scripts/ as the canonical operator interface (run, verify, certify, backfill, release-build, release-verify) |
| 61 | +- orchestration/airflow/ with DAGs that call the same runner interfaces |
| 62 | +- infra/k8s with base manifests and overlays for scheduled jobs |
| 63 | +- infra/terraform with modules and cloud examples for storage, identity, and logging |
| 64 | +- spark/scala as an optional scaling path for ingestion and coarse aggregations |
73 | 65 |
|
74 | | -Verify a generated run directory: |
| 66 | +## Policy |
75 | 67 |
|
76 | | -```bash |
77 | | -python -m omphalos verify --run-dir artifacts/runs/<run_id> |
78 | | -``` |
| 68 | +policies/opa contains Rego policies that can evaluate: |
79 | 69 |
|
80 | | -Compare two runs for payload-level equivalence: |
| 70 | +- run manifests and release bundles |
| 71 | +- publishability constraints |
| 72 | +- infrastructure constraints for Terraform plans and Kubernetes manifests |
81 | 73 |
|
82 | | -```bash |
83 | | -python -m omphalos certify --run-a artifacts/runs/<runA> --run-b artifacts/runs/<runB> |
84 | | -``` |
| 74 | +Policy evaluation produces structured reports under the run directory. |
85 | 75 |
|
86 | | -## Optional extras (SQL/dbt, Airflow, Spark) |
| 76 | +## User interface |
87 | 77 |
|
88 | | -The core runtime stays lightweight. Extra surfaces are available as optional dependencies: |
| 78 | +ui/ provides a local run browser that renders: |
89 | 79 |
|
90 | | -```bash |
91 | | -# Development tools |
92 | | -python -m pip install -e ".[dev]" |
| 80 | +- run manifests |
| 81 | +- reports and diffs between runs |
| 82 | +- review tables and export artifacts |
93 | 83 |
|
94 | | -# SQL/dbt surface (DuckDB + Postgres connectivity) |
95 | | -python -m pip install -e ".[warehouse]" |
| 84 | +The UI reads from a small API server under src/omphalos/api. |
96 | 85 |
|
97 | | -# Orchestration surface |
98 | | -python -m pip install -e ".[orchestration]" |
| 86 | +## Independent verifiers |
99 | 87 |
|
100 | | -# Spark surface |
101 | | -python -m pip install -e ".[spark]" |
102 | | -``` |
| 88 | +agents/ contains small verifiers that can validate a run directory without importing the Python package: |
103 | 89 |
|
104 | | -## Distribution |
| 90 | +- agents/go/omphalos-verify |
| 91 | +- agents/rust/omphalos-verify |
105 | 92 |
|
106 | | -When a run must be transmitted as a single object: |
| 93 | +## Command line |
107 | 94 |
|
108 | | -```bash |
109 | | -python -m omphalos release build --run-dir artifacts/runs/<run_id> --out artifacts/releases/<run_id>.tar.gz |
110 | | -python -m omphalos release verify --bundle artifacts/releases/<run_id>.tar.gz |
111 | | -``` |
| 95 | +The CLI exposes: |
112 | 96 |
|
113 | | -Before distributing outputs outside the environment in which they were generated: |
| 97 | +- omphalos run: reference pipeline on synthetic data |
| 98 | +- omphalos verify: recompute fingerprints and validate the manifest |
| 99 | +- omphalos compare: compare declared artifacts between runs |
| 100 | +- omphalos release: build and verify release bundles |
114 | 101 |
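The `verify` and `compare` behaviors described in the CLI list can be sketched as follows. This is an illustration only, not the code behind the `omphalos` commands: the manifest layout (an `artifacts` list of entries with `path` and `sha256`) and the function names `verify_run` and `compare_manifests` are assumptions.

```python
import hashlib
from pathlib import Path


def verify_run(run_dir: Path, manifest: dict) -> list[str]:
    """Recompute each declared fingerprint and report mismatches."""
    problems = []
    for entry in manifest["artifacts"]:
        target = run_dir / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != entry["sha256"]:
            problems.append(f"changed: {entry['path']}")
    return problems


def compare_manifests(manifest_a: dict, manifest_b: dict) -> dict[str, list[str]]:
    """Classify declared artifacts of two runs by path and sha256."""
    left = {a["path"]: a["sha256"] for a in manifest_a["artifacts"]}
    right = {a["path"]: a["sha256"] for a in manifest_b["artifacts"]}
    shared = left.keys() & right.keys()
    return {
        "only_in_a": sorted(left.keys() - right.keys()),
        "only_in_b": sorted(right.keys() - left.keys()),
        "changed": sorted(p for p in shared if left[p] != right[p]),
        "identical": sorted(p for p in shared if left[p] == right[p]),
    }
```

An empty list from `verify_run` means every declared artifact is present with a matching fingerprint. In `compare_manifests`, two runs agree on their declared outputs when `only_in_a`, `only_in_b`, and `changed` are all empty.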
|
115 | | -```bash |
116 | | -python -m omphalos publishability scan --path . --out artifacts/reports/publishability.json |
117 | | -``` |
| 102 | +Maximal pipelines and additional surfaces are available under src/omphalos/maximal and are invoked through explicit commands and job specs. |
118 | 103 |
|
119 | | -The scan ought to be treated as a pre-flight gate: a clean report reduces common failure modes, but it does not constitute a blanket safety determination. |
| 104 | +## Files kept for provenance |
120 | 105 |
|
121 | | -## Configuration and declared rules |
122 | | - |
123 | | -Runs are configured in `config/runs/`. Schemas and rule packs live in `contracts/`. |
124 | | - |
125 | | -The governing posture is explicitness. Shapes worth consuming should be declared. Rules worth relying on should be written down. Failures should be inspectable. |
126 | | - |
127 | | -## Appendix A: run directory layout |
128 | | - |
129 | | -A typical run directory includes: |
130 | | - |
131 | | -- `run_manifest.json` |
132 | | - Inventory of outputs with integrity fingerprints. |
133 | | - |
134 | | -- `exports/` |
135 | | - Reader-facing products (tables, narrative, packet-style records). |
136 | | - |
137 | | -- `reports/` |
138 | | - Structured checks and summaries (quality, determinism comparison, publishability scan, dependency inventory). |
139 | | - |
140 | | -- `lineage/` |
141 | | - Append-only event record of execution. |
142 | | - |
143 | | -- `warehouse/` |
144 | | - Local SQLite artifact used by the reference pipeline. |
145 | | - |
146 | | -## Appendix B: operating expectations |
147 | | - |
148 | | -omphalOS assumes two expectations throughout: |
149 | | - |
150 | | -Firstly, the run directory is treated as an immutable package once the run completes. Editing outputs “for presentation” after completion is a change in evidence. If edits are required, the disciplined move is to rerun under a revised configuration and allow the record to reflect the revision. |
151 | | - |
152 | | -Secondly, comparisons are only as meaningful as the boundaries you enforce. If the run’s inputs depend on ambient state—untracked files, implicit credentials, external services whose responses are not recorded—then replay will converge on approximation rather than identity. The system will still produce a record; it cannot supply missing constraints. |
153 | | - |
154 | | -## Documentation |
155 | | - |
156 | | -I recommend that you start with: |
157 | | - |
158 | | -- `docs/overview.md` |
159 | | -- `docs/architecture.md` |
160 | | -- `docs/artifacts.md` |
161 | | -- `docs/cli.md` |
162 | | -- `docs/open_source_readiness.md` |
163 | | -- `docs/threat_model.md` |
164 | | - |
165 | | -## License |
166 | | - |
167 | | -Apache-2.0; see `LICENSE` and `NOTICE`; citation metadata is in `CITATION.cff`. |
168 | | -======= |
169 | | -Common commands |
170 | | -- omphalos run --config config/runs/example_run.yaml |
171 | | -- omphalos verify --run-dir <run_dir> |
172 | | -- omphalos release build --run-dir <run_dir> --out <bundle.tar.gz> |
173 | | -- omphalos sql run --run-dir <run_dir> --manifest sql/manifests/briefing_pack.yaml |
174 | | ->>>>>>> Stashed changes |
| 106 | +Original repository files are preserved. Where a file is materially upgraded, the prior content is copied into .legacy_snapshots/ with the same relative path before modification. |