
Commit 31bf30d

feat: add owner intelligence panel and reporting functionality
- Implemented the Owner Intelligence Panel in `gui_owner.py` for user interaction regarding PII lookups.
- Created reporting utilities in `report_owner.py` for exporting owner intelligence data in JSON and CSV formats, and generating PDF summaries.
- Introduced owner intelligence helpers in `owner/__init__.py`, including classification and interface definitions.
- Developed classification logic in `classify.py` to determine ownership types based on associations and PII.
- Established interfaces for owner intelligence in `interface.py`, defining data structures for owner associations, PII, and audit records.
- Implemented signal extraction and confidence scoring in `signals.py`, including functions to analyze evidence and score ownership confidence.
- Added a stub implementation of the Truecaller adapter in `truecaller_adapter.py` for future integration.
- Created unit tests in `test_owner.py` to validate classification, signal extraction, confidence scoring, and adapter behavior.
1 parent 081b327 commit 31bf30d
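
The commit message says `classify.py` assigns ownership types from associations and PII, but that file is not part of the diff shown here. As an illustration only, deterministic classification into the four `ownership_type` values the README lists (`business`, `individual`, `voip`, `unknown`) could be a fixed-order rule chain. `OwnerSignals`, `classify_ownership`, and the rule order are hypothetical, not the project's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OwnerSignals:
    """Hypothetical signal bundle; the real structures live in owner/interface.py (not shown)."""

    voip: bool = False
    business_listing: bool = False
    pii_confirmed: bool = False


def classify_ownership(signals: OwnerSignals) -> str:
    """Map signals to "business" | "individual" | "voip" | "unknown".

    Rules are checked in a fixed order and the first match wins, so the same
    signals always produce the same label. The order itself is an assumption.
    """
    if signals.business_listing:
        return "business"
    if signals.voip:
        return "voip"
    # Only a confirmed (consent-gated) PII match labels a number as an
    # individual's; public evidence alone never does.
    if signals.pii_confirmed:
        return "individual"
    return "unknown"
```

Because the first matching rule wins, a VoIP number that also has a business listing is labeled `business` in this sketch; the real rule order may differ.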


12 files changed (+1706 / -29 lines changed)


.env.example

Lines changed: 7 additions & 0 deletions

````diff
@@ -28,6 +28,13 @@ PHONEINT_CACHE_TTL_SECONDS=3600
 GCS_API_KEY=
 GCS_CX=
 
+# Truecaller (optional; PII-capable)
+# Requires official/commercial Truecaller API credentials and lawful basis + explicit consent.
+ENABLE_TRUECALLER=0
+TRUECALLER_API_KEY=
+
 # Optional: scoring weights override (JSON object)
 # PHONEINT_SCORE_WEIGHTS={"found_in_scam_db":60,"voip":15,"found_in_classifieds":15,"business_listing":-10,"age_of_first_mention_per_year":-2}
 
+# Optional: owner intelligence confidence weights override (JSON object)
+# PHONEINT_OWNER_CONFIDENCE_WEIGHTS={"voip":25,"business_listing":15,"classified_ad":10,"scam_report":5,"pii_confirmed":50,"multiple_sources":10,"evidence_any":5}
````
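
`PHONEINT_OWNER_CONFIDENCE_WEIGHTS` takes a JSON object that overrides the owner confidence weights. A minimal sketch of merging such an override onto defaults and summing the weights of the signals that fired, assuming that the values in the commented example are the defaults, that unknown keys should be rejected, and that the total is clamped to 0..100 (none of which this diff confirms). `load_weights` and `owner_confidence` are hypothetical names, not the API of `signals.py`.

```python
from __future__ import annotations

import json

# Assumed defaults, copied from the commented example in .env.example.
# The real values come from phoneint.owner.signals.default_owner_confidence_weights.
EXAMPLE_WEIGHTS: dict[str, float] = {
    "voip": 25.0,
    "business_listing": 15.0,
    "classified_ad": 10.0,
    "scam_report": 5.0,
    "pii_confirmed": 50.0,
    "multiple_sources": 10.0,
    "evidence_any": 5.0,
}


def load_weights(raw: str | None, defaults: dict[str, float]) -> dict[str, float]:
    """Merge a JSON-object override onto a copy of the defaults; reject unknown keys."""
    weights = dict(defaults)
    if not raw:
        return weights
    override = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(override, dict):
        raise ValueError("weights override must be a JSON object")
    for key, value in override.items():
        if key not in defaults:
            raise ValueError(f"unknown weight: {key!r}")
        weights[key] = float(value)
    return weights


def owner_confidence(
    signals: dict[str, bool], weights: dict[str, float]
) -> tuple[float, dict[str, float]]:
    """Sum the weights of the signals that fired, clamp to 0..100, and return the breakdown."""
    breakdown = {name: weights[name] for name, fired in signals.items() if fired and name in weights}
    score = max(0.0, min(100.0, sum(breakdown.values())))
    return score, breakdown
```

Returning the breakdown alongside the score keeps the result explainable: each contributing signal and its weight can be written straight into the report.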

README.md

Lines changed: 225 additions & 24 deletions

````diff
@@ -1,28 +1,50 @@
 # file: README.md
 # phoneint (Phone Number OSINT)
 
-This tool parses and enriches phone numbers (offline via `phonenumbers`) and can run **pluggable, async** reputation checks (optional) to produce an **auditable** risk score and report.
+`phoneint` parses and enriches phone numbers offline (via `phonenumbers`) and runs optional, pluggable async reputation checks to produce an auditable risk score and report. It is designed to be transparent, explainable, and safe-by-default.
 
 ## Disclaimer (Read First)
 
-**This tool is for lawful, ethical OSINT research only.**
+**This tool is for lawful, ethical OSINT research only.**
 Do not use it to harass, stalk, dox, or violate privacy. Always comply with applicable laws and third-party Terms of Service.
 
+## What This Tool Does (At a Glance)
+
+- Normalizes numbers to E.164 and common formats
+- Enriches with deterministic metadata (carrier when available, region, time zones, number type)
+- Optionally queries public OSINT sources via adapters
+- Produces explainable risk scoring and owner intelligence
+- Exports JSON, CSV, and optional PDF reports
+- Provides a CLI and a minimal non-blocking GUI
+
+## How It Works
+
+1. **Parse + Normalize**: `phonenumbers` parses the input to E.164 and standard formats.
+2. **Deterministic Enrichment**: carrier, time zone, number type, and region are derived offline.
+3. **Adapter Checks (Optional)**: adapters query public sources and return evidence items.
+4. **Scoring**: risk score is calculated from explicit signals with a breakdown.
+5. **Owner Intelligence (Optional)**: evidence is converted into associations and confidence scores.
+6. **Reporting**: a full JSON report is built and can be exported to CSV/PDF.
+
 ## Features
 
 - E.164 parsing + normalization (`phonenumbers`)
-- Deterministic enrichment: carrier (if available), country/region, time zones, number type, ISO code
-- Async adapters (`httpx`): DuckDuckGo Instant Answer (ToS-friendly), Google Custom Search (requires your key), public dataset checks
-- Explainable risk scoring with configurable weights (YAML)
+- Deterministic enrichment: carrier (when available), region/country name, time zones, number type, ISO code
+- Async adapters (`httpx`): DuckDuckGo Instant Answer, Google Custom Search, public dataset checks
+- Explainable risk scoring with configurable weights (YAML/JSON)
+- Owner Intelligence with audit trail for PII-capable adapters
 - Reports: JSON + CSV; optional PDF (extra dependency)
-- Optional SQLite TTL caching for adapter calls
-- CLI and minimal non-blocking GUI skeleton (PySide6 + qasync, optional)
+- Optional SQLite TTL caching
+- CLI and GUI (PySide6 + qasync)
 
 ## Install
 
 Python 3.11+ recommended.
 
 ```bash
+python -m venv .venv
+.\.venv\Scripts\Activate.ps1
+
 pip install -U pip
 pip install .
 ```
````
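
Step 4 of "How It Works" describes a risk score computed from explicit signals with a breakdown. A sketch under stated assumptions: the weights are the commented `PHONEINT_SCORE_WEIGHTS` example from `.env.example`, each contribution is `weight * value` (booleans as 0/1, `age_of_first_mention_per_year` as a year count), and the sum is clamped to 0..100. `risk_score` is a hypothetical name; `phoneint.reputation.score` may work differently.

```python
from __future__ import annotations

# Assumed weights, copied from the commented PHONEINT_SCORE_WEIGHTS example;
# phoneint.reputation.score.default_score_weights may differ.
WEIGHTS: dict[str, float] = {
    "found_in_scam_db": 60.0,
    "voip": 15.0,
    "found_in_classifieds": 15.0,
    "business_listing": -10.0,
    "age_of_first_mention_per_year": -2.0,
}


def risk_score(
    signals: dict[str, float], weights: dict[str, float]
) -> tuple[int, list[tuple[str, float]]]:
    """Weighted sum of signal values, clamped to 0..100, plus one breakdown row per weight.

    Boolean signals are passed as 0/1; "age_of_first_mention_per_year" is a
    number of years. Missing signals count as 0.
    """
    breakdown = [(name, weight * signals.get(name, 0.0)) for name, weight in weights.items()]
    total = sum(contribution for _, contribution in breakdown)
    return round(max(0.0, min(100.0, total))), breakdown
```

In IEEE floating point a negative weight multiplied by a zero value gives `-0.0`, which is one way a breakdown row such as `business_listing: -0.0` can appear in a human summary.
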

````diff
@@ -78,21 +100,138 @@ Launch GUI:
 phoneint serve-gui
 ```
 
-## Example Numbers (Format Examples)
+## GUI Highlights
+
+- **Download report**: choose `json`, `csv`, or `pdf` and click Save.
+- **Owner Intelligence**: consent checkbox gates PII-capable lookups.
+- **Evidence list**: populated as adapters complete.
+- **Non-blocking**: UI remains responsive during async checks.
+
+## Example Numbers
+These example numbers are reserved test ranges, fictional examples, or public-format demonstrations intended for documentation and testing only. Do not use them to query private services or to target real individuals.
+
+### 1. Reserved Test Numbers (RFC / NANP)
+
+These numbers are explicitly reserved for testing and documentation and are never assigned to real users.
+
+```text
++1 202-555-0100
++1 202-555-0101
++1 202-555-0147
++1 202-555-0199
+```
+
+Expected behavior (when running in example/test mode or when marked in harnesses):
+
+- `number_classification`: `reserved_test_number`
+- `example_mode`: `true`
+- `risk_score`: `0`
+- No live OSINT checks performed; adapters should be mocked or skipped.
+
+### 2. Toll-Free Number Examples
+
+Useful for testing toll-free detection, multi-timezone handling, and business vs scam ambiguity.
+
+```text
++1 800-356-9377
++1 888-555-0000
++1 877-555-1212
+```
+
+Expected behavior:
+
+- `line_type`: `toll_free`
+- Timezone coverage may be broad or absent depending on enrichment metadata
+- Neutral or low risk in example mode unless synthetic signals are injected
+
+### 3. Fictional Bangladesh Numbers (Example Mode)
+
+These demonstrate country-specific parsing and carrier detection. Treat as fictional/demo data.
+
+```text
++8801700000000
++8801800000000
++8801900000000
+```
+
+Expected behavior:
+
+- `number_classification`: `fictional_example`
+- `example_mode`: `true`
+- `risk_score`: `0`
+- Carrier may be mocked for demo purposes
+
+### 4. International Fictional Examples
+
+Useful for international formatting, country & timezone extraction, and multi-region validation.
+
+```text
++44 7000 000000 # UK-style fictional number
++61 400 000 000 # Australia-style fictional mobile
++49 151 00000000 # Germany-style fictional mobile
+```
+
+Expected behavior:
+
+- Correct country detection
+- Valid international formatting (E.164 and international display)
+- No real OSINT signals in example mode
+
+### 5. Spam / Risk Logic Demonstration (Mocked)
+
+These are NOT real scam numbers; use them to exercise scoring and signal detection in example/test harnesses.
+
+```text
++1 202-555-0147 # used to simulate scam_db match in tests
++1 800-555-9999 # used to simulate classifieds exposure
+```
+
+Expected behavior (example mode only):
+
+- Deterministic mocked signals (e.g., `found_in_scam_db: true` for the first item)
+- Scoring logic exercised without querying live data
+
+How to use in tests or demos:
+
+- CLI: run with adapters mocked or with `--no-cache` and a local test adapter.
+- GUI: use a test harness that injects `example_mode` or pre-populates adapter results.
+- Always mark runs that use these numbers as `example_mode=true` in logs/reports so audits can distinguish synthetic data from live OSINT.
+
+---
 
-These are example formats. Validate legality and intent before using any real numbers.
 
-- USA: `phoneint lookup +12015550123`
-- UK: `phoneint lookup +441212345678`
-- Bangladesh: `phoneint lookup +88027111234`
-- India: `phoneint lookup +917410410123`
 
 If you have a national-format number without `+CC`, provide a default region:
 
 ```bash
 phoneint lookup 6502530000 --region US
 ```
 
+## Sample Output (Human Summary)
+
+```text
+E.164: +88027111234
+International: +880 2-7111234
+National: 02-7111234
+Region (ISO): BD
+Country code: 880
+
+Carrier:
+Region: Dhaka
+Time zones: Asia/Dhaka
+Type: fixed_line
+ISO country code: BD
+Dialing prefix: 880
+
+Risk score: 0/100
+
+Breakdown:
+- found_in_scam_db: 0.0 (Matched a public scam dataset)
+- voip: 0.0 (libphonenumber classified the number as VOIP)
+- found_in_classifieds: 0.0 (Evidence URL matched a classifieds domain heuristic)
+- business_listing: -0.0 (Evidence URL matched a business listing domain heuristic)
+```
+
 ## Configuration
 
 Create a local `.env` from `.env.example` (do not commit real secrets).
````
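
The example-numbers section expects `+1 202-555-0100` through `+1 202-555-0199` to be tagged `reserved_test_number` with `example_mode` set and a risk score of 0. The NANP reserves 555-0100 through 555-0199 for fictional use, so a small dependency-free check can recognize that range before any adapter runs. `compact` and `example_mode_tags` are hypothetical helpers, and the normalization is deliberately naive (no `phonenumbers` validation).

```python
from __future__ import annotations

import re

# NANP reserves 555-0100 through 555-0199 for fictional use.
# Pattern: +1, a 3-digit area code starting 2-9, then 555-01XX.
_NANP_FICTIONAL = re.compile(r"^\+1[2-9]\d{2}55501\d{2}$")


def compact(raw: str) -> str:
    """Drop spaces, dashes, dots, and parentheses; keep the leading '+'. No validation."""
    return re.sub(r"[\s\-().]", "", raw)


def example_mode_tags(raw: str) -> dict[str, object]:
    """Tag reserved NANP test numbers so a harness can skip live adapters."""
    if _NANP_FICTIONAL.match(compact(raw)):
        return {
            "number_classification": "reserved_test_number",
            "example_mode": True,
            "risk_score": 0,
        }
    return {"example_mode": False}
```

The Bangladesh, UK, Australian, and German demo numbers are not in a reserved range, so this check returns `{"example_mode": False}` for them; a harness would need an explicit allow-list to tag those as `fictional_example`.
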

````diff
@@ -101,6 +240,7 @@ Environment variables:
 
 - `GCS_API_KEY` / `GCS_CX`: required only for the Google Custom Search adapter
 - `PHONEINT_*`: HTTP, cache, logging, default region, scoring weights (JSON)
+- `ENABLE_TRUECALLER` / `TRUECALLER_API_KEY`: required only for PII-capable Truecaller adapter
 
 YAML config (recommended for score weights and non-secret defaults):
 
````
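
The README lists four conditions for running a PII-capable adapter: official credentials, the adapter explicitly enabled, confirmed consent plus a lawful purpose, and an audit-trail entry. A minimal sketch of such a gate, assuming a hypothetical `PiiGate` class and `AuditRecord` shape (the real types are in `interface.py`, which this diff does not show). The field names `enable_truecaller` and `truecaller_api_key` match the settings added in `phoneint/config.py`. Every decision, including a refusal, is appended to the audit trail.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """Hypothetical audit entry; the real shape is defined in owner/interface.py."""

    timestamp: str
    adapter: str
    allowed: bool
    reason: str
    legal_purpose: str | None


@dataclass
class PiiGate:
    enable_truecaller: bool
    truecaller_api_key: str | None
    audit_trail: list[AuditRecord] = field(default_factory=list)

    def check(self, enable_pii: bool, legal_purpose: str | None) -> bool:
        """Return True only if every gate condition holds; always record the decision."""
        if not (self.enable_truecaller and self.truecaller_api_key):
            allowed, reason = False, "adapter not enabled or credentials missing"
        elif not enable_pii:
            allowed, reason = False, "explicit consent not confirmed"
        elif not (legal_purpose and legal_purpose.strip()):
            allowed, reason = False, "no lawful purpose given"
        else:
            allowed, reason = True, "all conditions met"
        self.audit_trail.append(
            AuditRecord(
                timestamp=datetime.now(timezone.utc).isoformat(),
                adapter="truecaller",
                allowed=allowed,
                reason=reason,
                legal_purpose=legal_purpose,
            )
        )
        return allowed
```

In this sketch the CLI's `--enable-pii` and `--legal-purpose` would feed `enable_pii` and `legal_purpose`, and the GUI consent checkbox would supply `enable_pii`.
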

````diff
@@ -132,22 +272,84 @@ Or set `PHONEINT_CONFIG=config.yaml` in `.env`.
 
 No private-service scraping is included or encouraged.
 
-### Google Custom Search — costs & free alternatives
+### Google Custom Search - Costs and Free Alternatives
+
+Setting up a Google Programmable Search Engine (the Search Engine ID / `CX`) is free. The Google Custom Search API (`GCS_API_KEY`) provides a small free quota and then is billed per request (typically per 1,000 requests). Some accounts require billing to access even the free quota.
+
+Free alternatives:
+
+- `duckduckgo`: requires no API key
+- `public`: checks bundled/public datasets
+
+## Owner Intelligence (Ethical Use Only)
+
+`phoneint` includes an Owner Intelligence layer that produces evidence-based owner-related intelligence while remaining privacy-preserving by default.
+
+What it does:
+
+- Infers a coarse `ownership_type` (`business` | `individual` | `voip` | `unknown`) using deterministic rules
+- Extracts auditable signals and associations from public evidence
+- Produces an explainable confidence score with a breakdown
+
+What it does not do:
+
+- It does not claim to identify a private individual by default
+- It does not scrape private services or bypass authentication
+
+### PII-Capable Adapters (Gated)
 
-Setting up a Google Programmable Search Engine (the Search Engine ID / `CX`) is free — you get a CX at no cost. The Google Custom Search API (the API key used by this tool, `GCS_API_KEY`) is not completely free: Google provides a small free quota (varies by account/region) but after that the API is billed per-requests (typically per 1,000 requests). In some cases you may need to enable billing in Google Cloud to activate the API key even to access the free quota.
+PII-capable adapters (e.g., Truecaller) are disabled by default and will only run when:
 
-In short: you can get started for free, but heavy or programmatic use may incur small charges.
+1. You provide official credentials via `.env` (never commit secrets)
+2. You explicitly enable the adapter in config/environment
+3. You explicitly confirm lawful purpose plus explicit consent (`--enable-pii` in CLI, checkbox in GUI)
+4. An audit trail is recorded in the report (`owner_audit_trail`)
 
-If you prefer fully free options, consider these alternatives:
+Enable (example):
 
-- `duckduckgo`: uses DuckDuckGo Instant Answer and requires no API key.
-- `public`: checks bundled/public datasets (ships with `phoneint/data/scam_list.json`) and is free to use.
+- `.env`: set `ENABLE_TRUECALLER=1` and `TRUECALLER_API_KEY=...`
+- CLI: pass `--enable-pii` and `--legal-purpose "..."`
 
-Quick recap:
+### CLI Examples
 
-- **CX (Search Engine ID)**: Free ✅
-- **Custom Search API key usage**: Free tier then paid ⚠️
-- **Free alternatives**: DuckDuckGo adapter, public datasets ✅
+Public evidence only:
+
+```bash
+phoneint lookup +8801712345678 --adapters duckduckgo --output report.json
+```
+
+PII-capable (only if you have official API access and explicit consent):
+
+```bash
+ENABLE_TRUECALLER=true phoneint lookup +8801712345678 --enable-pii --legal-purpose "customer-verification" --output report.json
+```
+
+### GUI Usage
+
+In the GUI, the Owner Intelligence panel includes a consent checkbox:
+"I confirm I have lawful basis and explicit consent to query identity data..."
+
+Default is unchecked. If checked and a PII-capable adapter is enabled, a warning modal is shown and the action is logged to the audit trail.
+
+### Step-by-Step: If You See "PII Adapter Not Enabled"
+
+This dialog means you tried to enable PII-capable lookups but have not configured an official adapter.
+
+1. Close the dialog.
+2. Create a local `.env` file from `.env.example`.
+3. Add your official credentials:
+- `ENABLE_TRUECALLER=1`
+- `TRUECALLER_API_KEY=your_key_here`
+4. Restart the GUI.
+5. Check the consent checkbox and provide a lawful purpose.
+
+If you do not have official credentials or consent, leave PII disabled. Public evidence and deterministic enrichment still work.
+
+## Reports
+
+- JSON: full report, includes owner intelligence and audit trail
+- CSV: evidence + owner associations + audit trail in a single long table
+- PDF: summary pages plus legal disclaimer (requires `reportlab`)
 
 ## Docker
 
````
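
The Reports section says the CSV export puts evidence, owner associations, and the audit trail into a single long table. One way to do that with the standard `csv` module is a fixed set of columns plus a `section` discriminator. Apart from `owner_audit_trail`, which the README names, the report keys and record fields below (`evidence`, `owner_associations`, `title`, `url`, and so on) are assumptions, as is the helper name `report_to_long_csv`.

```python
from __future__ import annotations

import csv
import io

# (CSV "section" label, report key, field used as "key", field used as "value", field used as "source").
# Only "owner_audit_trail" is a key the README names; the rest are assumed.
SECTIONS = [
    ("evidence", "evidence", "title", "url", "source"),
    ("owner_association", "owner_associations", "type", "value", "source"),
    ("audit", "owner_audit_trail", "action", "timestamp", "adapter"),
]


def report_to_long_csv(report: dict) -> str:
    """Flatten several record lists into one long CSV, one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["section", "key", "value", "source"])
    writer.writeheader()
    for section, report_key, key_field, value_field, source_field in SECTIONS:
        for record in report.get(report_key, []):
            writer.writerow(
                {
                    "section": section,
                    "key": record.get(key_field, ""),
                    "value": record.get(value_field, ""),
                    "source": record.get(source_field, ""),
                }
            )
    return buf.getvalue()
```

A long table trades width for a stable schema: new record types only add rows with a new `section` value instead of new columns.
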

````diff
@@ -167,4 +369,3 @@ pytest
 ## License
 
 MIT. See `LICENSE`.
-
````

phoneint/config.py

Lines changed: 10 additions & 0 deletions

````diff
@@ -28,6 +28,7 @@
 from pydantic import ConfigDict as PydanticConfigDict
 
 from phoneint.net.http import HttpClientConfig
+from phoneint.owner.signals import default_owner_confidence_weights
 from phoneint.reputation.score import default_score_weights
 
 
````

````diff
@@ -56,9 +57,14 @@ class PhoneintSettings(BaseModel):
     gcs_api_key: str | None = Field(default=None)
     gcs_cx: str | None = Field(default=None)
     scam_list_path: Path | None = None
+    enable_truecaller: bool = False
+    truecaller_api_key: str | None = None
 
     # Scoring
     score_weights: dict[str, float] = Field(default_factory=default_score_weights)
+    owner_confidence_weights: dict[str, float] = Field(
+        default_factory=default_owner_confidence_weights
+    )
 
     def http_config(self) -> HttpClientConfig:
         return HttpClientConfig(
````

````diff
@@ -87,8 +93,12 @@ def http_config(self) -> HttpClientConfig:
     "PHONEINT_CACHE_PATH": "cache_path",
     "PHONEINT_CACHE_TTL_SECONDS": "cache_ttl_seconds",
     "PHONEINT_SCAM_LIST_PATH": "scam_list_path",
+    "ENABLE_TRUECALLER": "enable_truecaller",
+    "TRUECALLER_API_KEY": "truecaller_api_key",
     # JSON string: {"found_in_scam_db": 70, "voip": 10, ...}
     "PHONEINT_SCORE_WEIGHTS": "score_weights",
+    # JSON string: {"pii_confirmed": 50, "business_listing": 15, ...}
+    "PHONEINT_OWNER_CONFIDENCE_WEIGHTS": "owner_confidence_weights",
 }
 
 
````
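
The settings map above reads `ENABLE_TRUECALLER` as a boolean and the two weight variables as JSON strings, while the README sets the flag both as `ENABLE_TRUECALLER=1` and as `ENABLE_TRUECALLER=true`. The pydantic model may already coerce both spellings; the sketch below only makes the decoding rules explicit, using a hypothetical `settings_from_env` over a subset of the mapping and treating an empty `TRUECALLER_API_KEY=` (as shipped in `.env.example`) as unset.

```python
from __future__ import annotations

import json
from collections.abc import Mapping

# Subset of the env-var -> settings-field mapping added in phoneint/config.py.
ENV_MAP = {
    "ENABLE_TRUECALLER": "enable_truecaller",
    "TRUECALLER_API_KEY": "truecaller_api_key",
    "PHONEINT_SCORE_WEIGHTS": "score_weights",
    "PHONEINT_OWNER_CONFIDENCE_WEIGHTS": "owner_confidence_weights",
}
JSON_FIELDS = {"score_weights", "owner_confidence_weights"}
BOOL_FIELDS = {"enable_truecaller"}
TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def settings_from_env(env: Mapping[str, str]) -> dict[str, object]:
    """Turn set environment variables into settings-field overrides."""
    overrides: dict[str, object] = {}
    for var, field_name in ENV_MAP.items():
        raw = env.get(var)
        if raw is None:
            continue  # variable not set: keep the model default
        raw = raw.strip()
        if field_name in JSON_FIELDS:
            if raw:  # an empty value means "no override"
                overrides[field_name] = json.loads(raw)
        elif field_name in BOOL_FIELDS:
            word = raw.lower()
            if word in TRUE_WORDS:
                overrides[field_name] = True
            elif word in FALSE_WORDS or not word:
                overrides[field_name] = False
            else:
                raise ValueError(f"{var}: expected a boolean, got {raw!r}")
        else:
            overrides[field_name] = raw or None  # "TRUECALLER_API_KEY=" -> None
    return overrides
```

Rejecting unrecognized boolean words, rather than treating them as false, surfaces a typo such as `ENABLE_TRUECALLER=ture` instead of silently leaving the adapter off.
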
