
Commit ed3550c

Major refactor: restructure infra, SQL, and code
This commit restructures infrastructure files for Kubernetes and Terraform, consolidates artifact and identity modules, and removes cloud-specific module directories. It refactors and replaces numerous SQL audit and briefing scripts, adds investigation catalog queries, and updates Airflow DAGs and Spark jobs. Documentation and code in Go and Rust agents are updated, with many obsolete files removed to streamline the project.
1 parent: 050c205; commit: ed3550c

6,166 files changed (+94345 -20122 lines changed)


README.md

Lines changed: 101 additions & 64 deletions
@@ -1,106 +1,143 @@
 # omphalOS
 
-omphalOS is a deterministic analysis workbench for trade and technology-transfer oversight. It builds a run directory that contains: inputs, normalized datasets, a warehouse, scored entities, review tables, exports, and a machine-checkable integrity index.
+omphalOS exists for a simple reason: _Analytic conclusions outlive the circumstances that produced them._
 
-The repository ships a reference implementation with synthetic data. It is structured to support the same workflow across workstation runs, scheduled runs, and deployed runs, while preserving a stable artifact contract.
+In U.S. government environments concerned with trade, technology, export controls, and enforcement (the settings for which this system was first built in May 2024, then modernized for open release in December 2025), an output that may inform action is expected to remain (i) legible under review, (ii) traceable to its provenance, and (iii) transmissible only under deliberate restraint.
 
-## Scope
+The purpose, here, is practical: **to make that posture routine.**
 
-omphalOS covers four tasks:
+This public release is a sanitized reference implementation, and all example data is synthetic.
 
-1. Ingest: load a trade feed and a registry (lists, watchlists, or reference entities).
-2. Normalize: canonicalize fields, enforce schemas, and derive deterministic features.
-3. Score and assemble: match trade records to registry entities, compute entity exposure summaries, and write review-ready tables.
-4. Package: fingerprint all outputs, emit a run manifest, and (optionally) assemble a release bundle for distribution.
+## What the system asserts
 
-## Run directory contract
+A run is treated as an evidentiary package: it yields deliverables for a reader and, inseparably, a record adequate to explain what was done, reproduce it when feasible, and detect post-hoc alteration without argument.
 
-A run is an immutable directory rooted at:
+The claims, then, are intentionally narrow:
 
-artifacts/runs/<run_id>/
+1. Integrity: a completed run directory can be checked against its manifest. (If the fingerprints do not match, the package has changed.)
+2. Comparability: two runs can be compared at the level of declared outputs, so disagreement can be located rather than narrated.
+3. Controlled distribution: a publishability scan surfaces common disclosure hazards before a package leaves its originating context.
 
-The directory is treated as write-once: outputs are written under stable paths, then indexed and fingerprinted. The manifest contains:
+No stronger guarantee is implied. Correctness remains a matter of method, inputs, and judgment.
 
-- metadata: tool version, run_id, timestamps, environment identifiers
-- declared artifacts: relative paths, sizes, sha256
-- merkle root of the artifact set
-- structured reports: dataset validation, matching statistics, scoring summaries
-- release metadata when a bundle is assembled
+## What a reader can expect from the record
 
-This contract is the unit of comparison and verification.
+A run produces a directory intended to travel as a unit. The directory is structured so a reviewer can answer, from the artifacts alone, the questions that reliably matter once work leaves its originating workspace:
 
-## Data model
+- What inputs were admitted, and what boundaries were enforced?
+- What rules governed transformations, and where are those rules stated?
+- Which outputs are intended for consumption, which are intermediate, and which require human review?
+- What may be shared, with whom, and with what risk of inadvertent disclosure?
+- When two executions disagree, is the disagreement substantive or procedural?
 
-The reference warehouse is a SQLite database written to:
+If a package cannot answer these questions, it is incomplete work.
 
-artifacts/runs/<run_id>/warehouse/warehouse.sqlite
+## Minimal use
 
-Base tables:
+Verify the included sample run:
 
-- trade_feed: one row per shipment
-- registry: one row per entity
-- entity_matches: one row per shipment-entity candidate match
-- entity_scores: one row per entity summary
+```bash
+python -m omphalos verify --run-dir examples/sample_run
+```
 
-The maximal pipeline extends trade_feed with exporter_country and importer_country while preserving the legacy country field.
+Execute the synthetic reference pipeline:
 
-## Warehouse and SQL surfaces
+```bash
+python -m omphalos run --config config/runs/example_run.yaml
+```
 
-The repository contains two SQL surfaces:
+Verify a generated run directory:
 
-1. Warehouse transforms: a dbt project under warehouse/ that defines staging, intermediate, and mart models. It is written to run against SQLite, DuckDB, or Postgres using profiles shipped under warehouse/profiles/.
-2. Analyst catalog: a curated query library under sql/ organized by briefing, review, audit, and investigations. Catalog execution records the query text, parameters, and output fingerprints into the run directory.
+```bash
+python -m omphalos verify --run-dir artifacts/runs/<run_id>
+```
 
-Both surfaces are designed to be executable and to emit artifacts that the manifest can index.
+Compare two runs for payload-level equivalence:
 
-## Orchestration and deployment
+```bash
+python -m omphalos certify --run-a artifacts/runs/<runA> --run-b artifacts/runs/<runB>
+```
 
-The repository includes:
+## Distribution
 
-- scripts/ as the canonical operator interface (run, verify, certify, backfill, release-build, release-verify)
-- orchestration/airflow/ with DAGs that call the same runner interfaces
-- infra/k8s with base manifests and overlays for scheduled jobs
-- infra/terraform with modules and cloud examples for storage, identity, and logging
-- spark/scala as an optional scaling path for ingestion and coarse aggregations
+When a run must be transmitted as a single object:
 
-## Policy
+```bash
+python -m omphalos release build --run-dir artifacts/runs/<run_id> --out artifacts/releases/<run_id>.tar.gz
+python -m omphalos release verify --bundle artifacts/releases/<run_id>.tar.gz
+```
 
-policies/opa contains Rego policies that can evaluate:
+Before distributing outputs outside the environment in which they were generated:
 
-- run manifests and release bundles
-- publishability constraints
-- infrastructure constraints for Terraform plans and Kubernetes manifests
+```bash
+python -m omphalos publishability scan --path . --out artifacts/reports/publishability.json
+```
 
-Policy evaluation produces structured reports under the run directory.
+Treat the scan as a pre-flight gate: a clean report means the common disclosure hazards it checks for were not detected; it does not constitute a blanket safety determination.
 
-## User interface
+## Configuration and declared rules
 
-ui/ provides a local run browser that renders:
+Runs are configured in `config/runs/`. Schemas and rule packs live in `contracts/`.
 
-- run manifests
-- reports and diffs between runs
-- review tables and export artifacts
+The governing posture is explicitness. Shapes worth consuming should be declared. Rules worth relying on should be written down. Failures should be inspectable.
 
-The UI reads from a small API server under src/omphalos/api.
+## Appendix A: run directory layout
 
-## Independent verifiers
+A typical run directory includes:
 
-agents/ contains small verifiers that can validate a run directory without importing the Python package:
+- `run_manifest.json`
+  Inventory of outputs with integrity fingerprints.
 
-- agents/go/omphalos-verify
-- agents/rust/omphalos-verify
+- `exports/`
+  Reader-facing products (tables, narrative, packet-style records).
 
-## Command line
+- `reports/`
+  Structured checks and summaries (quality, determinism comparison, publishability scan, dependency inventory).
 
-The CLI exposes:
+- `lineage/`
+  Append-only event record of execution.
 
-- omphalos run: reference pipeline on synthetic data
-- omphalos verify: recompute fingerprints and validate the manifest
-- omphalos compare: compare declared artifacts between runs
-- omphalos release: build and verify release bundles
+- `warehouse/`
+  Local SQLite artifact used by the reference pipeline.
 
-Maximal pipelines and additional surfaces are available under src/omphalos/maximal and are invoked through explicit commands and job specs.
+## Appendix B: operating expectations
 
-## Files kept for provenance
+omphalOS assumes two expectations throughout:
 
-Original repository files are preserved. Where a file is materially upgraded, the prior content is copied into .legacy_snapshots/ with the same relative path before modification.
+First, the run directory is treated as an immutable package once the run completes. Editing outputs “for presentation” after completion is a change in evidence. If edits are required, the disciplined move is to rerun under a revised configuration and allow the record to reflect the revision.
+
+Second, comparisons are only as meaningful as the boundaries you enforce. If the run’s inputs depend on ambient state (untracked files, implicit credentials, external services whose responses are not recorded), then replay will converge on approximation rather than identity. The system will still produce a record; it cannot supply missing constraints.
+
+## Documentation
+
+Start with:
+
+- `docs/overview.md`
+- `docs/architecture.md`
+- `docs/artifacts.md`
+- `docs/cli.md`
+- `docs/open_source_readiness.md`
+- `docs/threat_model.md`
+
+## License
+
+Apache-2.0; see `LICENSE` and `NOTICE`. Citation metadata is in `CITATION.cff`.
+
+
+## Local run
+
+```bash
+omphalos run --config config/runs/example_run.yaml
+```
+
+This produces a run directory under `artifacts/runs/<run_id>/` with `run_manifest.json` binding inputs, artifacts, and exports by hash.
+
+## Local review UI
+
+```bash
+omphalos serve --runs-root artifacts/runs --host 127.0.0.1 --port 8000
+```
+
+Open `http://127.0.0.1:8000/`.
+
+The UI is a client of the API exposed at `/api/*` and reads all data from the run directories on disk.
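
The comparability claim in the README (two runs compared at the level of declared outputs) can be illustrated with a short Go sketch. It assumes the `artifacts.files` manifest shape with `path` and `sha256` fields that the Go verifier in this commit reads, and it assumes that control files are left out of a payload-level comparison. The names `parseManifest` and `diffRuns`, and the standalone program itself, are hypothetical; this is not the implementation behind `omphalos certify`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// fileEntry mirrors the path/sha256 pairs read by the Go verifier in this commit.
type fileEntry struct {
	Path   string `json:"path"`
	Sha256 string `json:"sha256"`
}

// manifest keeps only the fields this sketch needs; a real
// run_manifest.json may carry more (assumption).
type manifest struct {
	Artifacts struct {
		Files []fileEntry `json:"files"`
	} `json:"artifacts"`
}

// parseManifest decodes manifest JSON into a path -> sha256 map.
func parseManifest(data []byte) (map[string]string, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(m.Artifacts.Files))
	for _, f := range m.Artifacts.Files {
		out[f.Path] = f.Sha256
	}
	return out, nil
}

// diffRuns returns, in sorted order, every path that is declared on only
// one side or whose fingerprints differ between the two runs.
func diffRuns(a, b map[string]string) []string {
	var diffs []string
	for p, ha := range a {
		if hb, ok := b[p]; !ok || ha != hb {
			diffs = append(diffs, p)
		}
	}
	for p := range b {
		if _, ok := a[p]; !ok {
			diffs = append(diffs, p)
		}
	}
	sort.Strings(diffs)
	return diffs
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: manifest-diff <manifest_a.json> <manifest_b.json>")
		os.Exit(2)
	}
	var runs [2]map[string]string
	for i, p := range os.Args[1:3] {
		data, err := os.ReadFile(p)
		if err == nil {
			runs[i], err = parseManifest(data)
		}
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	diffs := diffRuns(runs[0], runs[1])
	for _, p := range diffs {
		fmt.Println("differs", p)
	}
	if len(diffs) > 0 {
		os.Exit(1)
	}
	fmt.Println("equivalent")
}
```

Because only declared fingerprints are compared, this sketch locates which outputs disagree without reading the payloads; whether a disagreement is substantive or procedural still has to be judged from the artifacts themselves.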

agents/go/omphalos-verify/main.go

Lines changed: 43 additions & 44 deletions
@@ -4,68 +4,67 @@ import (
 	"crypto/sha256"
 	"encoding/hex"
 	"encoding/json"
-	"errors"
-	"flag"
+	"fmt"
 	"io"
 	"os"
 	"path/filepath"
 )
 
-type Artifact struct {
-	Path   string `json:"path"`
-	Size   int64  `json:"size"`
+type ArtifactFile struct {
+	Path   string `json:"path"`
 	Sha256 string `json:"sha256"`
 }
 
 type Manifest struct {
-	RunID             string     `json:"run_id"`
-	ArtifactsRootHash string     `json:"artifacts_root_hash"`
-	Artifacts         []Artifact `json:"artifacts"`
+	Artifacts struct {
+		Files        []ArtifactFile `json:"files"`
+		ControlFiles []ArtifactFile `json:"control_files"`
+	} `json:"artifacts"`
 }
 
-func sha256File(path string) (string, int64, error) {
-	f, err := os.Open(path)
-	if err != nil { return "", 0, err }
+func fileHash(p string) (string, error) {
+	f, err := os.Open(p)
+	if err != nil { return "", err }
 	defer f.Close()
 	h := sha256.New()
-	n, err := io.Copy(h, f)
-	if err != nil { return "", 0, err }
-	return hex.EncodeToString(h.Sum(nil)), n, nil
-}
-
-func readManifest(path string) (*Manifest, error) {
-	b, err := os.ReadFile(path)
-	if err != nil { return nil, err }
-	var m Manifest
-	if err := json.Unmarshal(b, &m); err != nil { return nil, err }
-	return &m, nil
-}
-
-func verifyRun(runDir string) error {
-	m, err := readManifest(filepath.Join(runDir, "manifest.json"))
-	if err != nil { return err }
-	if m.RunID == "" { return errors.New("missing run_id") }
-	if len(m.Artifacts) == 0 { return errors.New("no artifacts declared") }
-	for _, a := range m.Artifacts {
-		p := filepath.Join(runDir, a.Path)
-		gotHash, gotSize, err := sha256File(p)
-		if err != nil { return err }
-		if gotSize != a.Size { return errors.New("size mismatch: " + a.Path) }
-		if gotHash != a.Sha256 { return errors.New("hash mismatch: " + a.Path) }
-	}
-	return nil
+	if _, err := io.Copy(h, f); err != nil { return "", err }
+	return hex.EncodeToString(h.Sum(nil)), nil
 }
 
 func main() {
-	runDir := flag.String("run-dir", "", "run directory containing manifest.json")
-	flag.Parse()
-	if *runDir == "" {
-		os.Stderr.WriteString("missing --run-dir\n")
+	if len(os.Args) < 2 {
+		fmt.Println("usage: omphalos-verify <run_dir>")
 		os.Exit(2)
 	}
-	if err := verifyRun(*runDir); err != nil {
-		os.Stderr.WriteString(err.Error() + "\n")
+	runDir := os.Args[1]
+	mfPath := filepath.Join(runDir, "run_manifest.json")
+	b, err := os.ReadFile(mfPath)
+	if err != nil { panic(err) }
+	var m Manifest
+	if err := json.Unmarshal(b, &m); err != nil { panic(err) }
+
+	files := append([]ArtifactFile{}, m.Artifacts.Files...)
+	files = append(files, m.Artifacts.ControlFiles...)
+	bad := 0
+	for _, af := range files {
+		fp := filepath.Join(runDir, af.Path)
+		got, err := fileHash(fp)
+		if err != nil {
+			fmt.Printf("missing %s\n",
+				af.Path)
+			bad++
+			continue
+		}
+		if got != af.Sha256 {
+			fmt.Printf("mismatch %s\n",
+				af.Path)
+			bad++
+		}
+	}
+	if bad > 0 {
+		fmt.Printf("fail %d\n",
+			bad)
 		os.Exit(1)
 	}
-	os.Stdout.WriteString("ok\n")
+	fmt.Println("pass")
 }
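
To show how the new manifest shape behaves under verification, here is a self-contained Go sketch that builds a throwaway run directory, verifies it, alters one file, and verifies again. It follows the `artifacts.files` / `artifacts.control_files` layout and the `run_manifest.json` filename from the diff above. The function names `verifyRun` and `selfCheck`, the sample file `out.txt`, and the choice to return errors instead of calling `panic` are assumptions made for illustration, not the committed code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

type artifactFile struct {
	Path   string `json:"path"`
	Sha256 string `json:"sha256"`
}

type manifest struct {
	Artifacts struct {
		Files        []artifactFile `json:"files"`
		ControlFiles []artifactFile `json:"control_files"`
	} `json:"artifacts"`
}

// verifyRun returns one message per declared file that cannot be read
// or whose SHA-256 differs from the manifest entry.
func verifyRun(runDir string) ([]string, error) {
	b, err := os.ReadFile(filepath.Join(runDir, "run_manifest.json"))
	if err != nil {
		return nil, fmt.Errorf("read manifest: %w", err)
	}
	var m manifest
	if err := json.Unmarshal(b, &m); err != nil {
		return nil, fmt.Errorf("parse manifest: %w", err)
	}
	files := append([]artifactFile{}, m.Artifacts.Files...)
	files = append(files, m.Artifacts.ControlFiles...)
	var problems []string
	for _, af := range files {
		// Reads the whole file into memory; fine for a sketch, while the
		// committed verifier streams through io.Copy instead.
		data, err := os.ReadFile(filepath.Join(runDir, af.Path))
		if err != nil {
			problems = append(problems, "missing "+af.Path)
			continue
		}
		sum := sha256.Sum256(data)
		if hex.EncodeToString(sum[:]) != af.Sha256 {
			problems = append(problems, "mismatch "+af.Path)
		}
	}
	return problems, nil
}

// selfCheck writes a one-file run directory (hypothetical content),
// verifies it, tampers with the file, and verifies again. It returns the
// number of problems found before and after tampering.
func selfCheck() (before, after int, err error) {
	dir, err := os.MkdirTemp("", "omphalos-run-")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	payload := []byte("hello\n")
	out := filepath.Join(dir, "out.txt")
	if err := os.WriteFile(out, payload, 0o644); err != nil {
		return 0, 0, err
	}
	sum := sha256.Sum256(payload)
	var m manifest
	m.Artifacts.Files = []artifactFile{{Path: "out.txt", Sha256: hex.EncodeToString(sum[:])}}
	mb, err := json.Marshal(m)
	if err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(dir, "run_manifest.json"), mb, 0o644); err != nil {
		return 0, 0, err
	}

	p1, err := verifyRun(dir)
	if err != nil {
		return 0, 0, err
	}
	// Simulate a post-hoc edit of a declared output.
	if err := os.WriteFile(out, []byte("edited\n"), 0o644); err != nil {
		return 0, 0, err
	}
	p2, err := verifyRun(dir)
	if err != nil {
		return 0, 0, err
	}
	return len(p1), len(p2), nil
}

func main() {
	before, after, err := selfCheck()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("problems before tampering: %d, after: %d\n", before, after)
}
```

Returning a list of problems rather than exiting on the first one keeps the behaviour of the committed loop (report every missing or mismatched file) while making the check callable from tests.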

agents/rust/omphalos-verify/Cargo.toml

Lines changed: 1 addition & 2 deletions
@@ -4,7 +4,6 @@ version = "0.1.0"
 edition = "2021"
 
 [dependencies]
+sha2 = "0.10"
 serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"
-sha2 = "0.10"
-hex = "0.4"
