
@pierringshot
Member

This PR adds a robust, offline-first IP CTI scanner with caching, rate/backoff safety, proxy rotation, and polished reporting.

Highlights
- CTI resiliency: token-bucket rate limiter, Retry-After backoff; rotate keys on 429; rotate proxies on 403 (resiliency, not to evade quotas)
- Multi-key support: VT_API_KEYS and ABUSEIPDB_API_KEYS (comma-separated)
- Proxy rotation: PROXY_LIST supports http/https/socks5 (requests[socks])
- Caching: resumable cache saves; dedupe inputs
- Parallel workers: workers=N with per-client rate limits
- Outputs: PDF (with flag placeholders), JSON, CSV
- Streamlit UI: upload IPs, adjust options, download PDF
- Docs: updated README and .env.example with usage and env vars

Key files
- src/core/scanner.py – scanning pipeline + enrichment
- src/cti/virustotal.py – VT client with backoff/rotation
- src/cti/abuseipdb.py – AbuseIPDB client with backoff/rotation
- src/cti/ratelimit.py – token-bucket limiter
- src/cache/cti_cache.py – JSON cache
- src/report/pdf_report.py – PDF with colored bars + flags
- src/ui/streamlit_app.py – lightweight Streamlit UI
- src/cli/main.py – CLI ()
- src/net/proxy.py – proxy rotation helper

Env vars
- VT_API_KEY or VT_API_KEYS
- ABUSEIPDB_API_KEY or ABUSEIPDB_API_KEYS
- PROXY_LIST (optional)
- OFFLINE_IP_BLOCKLIST (optional)

Typical commands
- Offline demo:
- With CTI (safe):
- UI:

Notes
- Respect provider TOS. Proxies are for resiliency; rate limits still apply.
- VirusTotal requires an API key; API-less access is not supported.

After merge, we can iterate on proxy health scoring and per-key daily budgets.
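For readers unfamiliar with the token-bucket limiter named in the highlights, here is a minimal sketch of the technique. It is not the contents of src/cti/ratelimit.py, which are not shown in this conversation: the class name `TokenBucket`, its parameters, and the injectable `clock`/`sleep` hooks are all assumptions made for illustration. Only the `acquire()` call mirrors the reviewed code.

```python
import threading
import time


class TokenBucket:
    """Illustrative token-bucket limiter (hypothetical; not the PR's implementation)."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._tokens = capacity     # start with a full bucket
        self._clock = clock         # injectable for testing
        self._sleep = sleep         # injectable for testing
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self):
        """Take one token if available; never blocks."""
        with self._lock:
            self._refill()
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False

    def acquire(self):
        """Block until one token is available, then take it."""
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Time until the bucket holds one whole token.
                wait = (1 - self._tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)
```

With per-client limits, each CTI client would hold its own bucket and call `acquire()` before every request, so parallel workers share one budget per provider.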

pierringshot and others added 17 commits September 3, 2025 07:26
feat(cli): CTI + UA + reports + tests
…g by IP/signature, sampling, gating by 4xx/UA, optional time-window bucketing
- CTI: scope filtering, max cap, batch flush + pause, strong cache reuse
- Offline: support OFFLINE_IP_BLOCKLIST for local reputations
- Budget: GROQ_TOKENS_BUDGET guard in GroqRotatingClient
- Docs: USAGE updated; AGENTS.md adds scalable plan

Default preserves test behavior; all tests pass.
…rom CTI signals

- New report files: reports/malicious_ai_report.txt|md
- Expose Groq client at module level for monkeypatching
- Add tests for offline blocklist + dummy Groq client
…ing

- Add --verbose with quiet|normal|max (default max)
- Gate major console rules/logs by verbosity
- Use grey50 for unknown severity border
- Add richer progress logs for enrichment and reports
…el-reported token usage when available; fallback to conservative estimate
- Keeps pre-check guard; updates used-count post-call accurately

docs(readme,usage): document --verbose and budget notes

chore(env): align GROQ_MODEL default in .env.example with config
… in PDF and one-click run.sh; enhance UI to use enriched scan
…ders; docs: README + .env.example; deps: requests[socks]

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


Codex Review: Here are some suggestions.


Comment on lines +62 to +108
def fetch(self, ip: str) -> Optional[VTResult]:
    if not self.enabled():
        return None
    url = self.BASE + ip
    key = self.api_keys[self._key_index % max(1, len(self.api_keys))] if self.api_keys else None
    headers = {"x-apikey": key or ""}
    for attempt in range(4):
        try:
            self.ratelimiter.acquire()
            resp = self.session.get(
                url, headers=headers, timeout=self.timeout, proxies=(self.proxies.get() if self.proxies.enabled() else None)
            )
            if resp.status_code == 200:
                return self._parse(resp.json(), ip)
            if resp.status_code == 404:
                return VTResult(
                    ip=ip,
                    malicious=0,
                    suspicious=0,
                    harmless=0,
                    undetected=0,
                    last_analysis_date=None,
                    asn=None,
                    as_owner=None,
                    country=None,
                    link=None,
                )
            if resp.status_code in (429, 500, 502, 503):
                retry_after = resp.headers.get("Retry-After")
                if retry_after:
                    try:
                        sleep_s = float(retry_after)
                    except ValueError:
                        sleep_s = 2 ** attempt
                else:
                    sleep_s = 2 ** attempt
                time.sleep(sleep_s)
                if len(self.api_keys) > 1 and resp.status_code == 429:
                    self._key_index = (self._key_index + 1) % len(self.api_keys)
                continue
            if resp.status_code == 403:
                # Forbidden (possibly IP-level). Rotate proxy if configured and retry with backoff.
                if self.proxies.enabled():
                    self.proxies.rotate()
                if len(self.api_keys) > 1:
                    self._key_index = (self._key_index + 1) % len(self.api_keys)
                time.sleep(2 ** attempt)


[P1] Rotate VirusTotal API key after 429/403 retries

Within VirusTotalClient.fetch, the value of headers is computed once before the retry loop, so when a 429 or 403 is received the code increments _key_index and continues but keeps sending requests with the same API key. This means multi‑key rotation never actually takes effect until the next call, causing every retry of the current lookup to hit the rate‑limited key and often return None/clean even if other keys are available. Consider recomputing the key and headers inside each retry iteration so the rotated key is used immediately when multiple keys are configured.
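One way to apply this suggestion is sketched below. It is a standalone illustration, not a patch to `VirusTotalClient`: the function name `fetch_with_rotation`, the `send` callable (standing in for the HTTP request), and the injectable `sleep` are hypothetical. The point is only that the `x-apikey` header is rebuilt at the top of every attempt, so a key rotated after a 429 or 403 takes effect on the very next retry.

```python
import time
from typing import Callable, List, Optional, Tuple


def fetch_with_rotation(
    api_keys: List[str],
    send: Callable[[dict], Tuple[int, Optional[dict]]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Retry a lookup, recomputing the auth header on each attempt (illustrative)."""
    if not api_keys:
        return None
    key_index = 0
    for attempt in range(max_attempts):
        # Rebuilt inside the loop: a rotated key is used immediately.
        headers = {"x-apikey": api_keys[key_index % len(api_keys)]}
        status, body = send(headers)
        if status == 200:
            return body
        if status in (403, 429):
            sleep(2 ** attempt)  # simple exponential backoff
            if len(api_keys) > 1:
                key_index = (key_index + 1) % len(api_keys)
            continue
        # Any other status is treated as a hard failure in this sketch.
        return None
    return None
```

In the client under review, the equivalent change would be to move the `key = ...` and `headers = ...` assignments from before the `for attempt in range(4)` loop to its first lines.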


pierringshot and others added 2 commits September 10, 2025 22:08
…xport

- Add optional AI summary via GROQ to PDF
- Switch PDF fonts to Helvetica, UTC timestamp
- Remove advanced toggles from UI; keep minimal, clear controls
- Save PDF to data/processed and offer download
- Update README to mention AI PDF summary

2 participants