
[Shadowserver] connector OOM: per-day STIX object accumulation prevents processing for high-volume subscriptions #6058

@fruitcakej

Description


The Shadowserver connector accumulates all STIX objects for all reports of a single day in memory before yielding a bundle. For customers subscribed to many Shadowserver report types, a single day can produce millions of STIX objects, causing the connector to OOM before completing even one day of processing.

The root cause is in connector.py _collect_intelligence(): the per-day processing loop collects all report data via ThreadPoolExecutor, extends a single stix_objects list with every report's transformed output, then calls remove_duplicates() (which creates a second copy of the full list), and only then yields to the bundle sender.

Environment

  1. On-prem 7.260317.0

Reproducible Steps

Steps to create the smallest reproducible scenario:
1. Deploy the Shadowserver connector with a valid API key and secret
2. Subscribe to a large number of Shadowserver report types (or leave SHADOWSERVER_REPORT_TYPES empty to receive all available reports)
3. Set SHADOWSERVER_INITIAL_LOOKBACK to 1 (a single day)
4. Start the connector
5. Observe memory consumption during the first collection cycle
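A minimal environment fragment for the repro above might look like the following. Only SHADOWSERVER_REPORT_TYPES and SHADOWSERVER_INITIAL_LOOKBACK are named in the steps; the key/secret variable names are assumptions, so check the connector's README for the exact names:

```shell
# Hypothetical .env fragment for the repro scenario above.
# SHADOWSERVER_API_KEY / SHADOWSERVER_API_SECRET names are assumed.
SHADOWSERVER_API_KEY=<your-api-key>
SHADOWSERVER_API_SECRET=<your-api-secret>
SHADOWSERVER_REPORT_TYPES=        # empty: receive all available report types
SHADOWSERVER_INITIAL_LOOKBACK=1   # single day
```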

Expected Output

The connector should process and yield STIX bundles incrementally (per-report or in configurable batch sizes) so that memory usage remains bounded regardless of the number of reports or rows per day.

Actual Output

The connector accumulates all STIX objects for all reports of a single day in a single in-memory list before yielding. The processing flow is:

1. For each day, ThreadPoolExecutor(max_workers=8) downloads all reports in parallel
2. Each report's CSV is parsed row by row; ShadowserverStixTransformation generates multiple STIX objects per row (Identity, Artifact with base64-encoded CSV, ObservedData, Notes, IPs, ASNs, hostnames, etc.)
3. All results are .extend()-ed into a single stix_objects list
4. remove_duplicates() creates a second copy of the full list, doubling peak memory
5. Only then does the method yield the bundle to the sender
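The accumulation pattern above can be sketched as follows. This is illustrative only: _process_report and remove_duplicates are simplified stand-ins, not the connector's actual code.

```python
# Simplified sketch of the per-day accumulation pattern; stand-in code,
# not the real Shadowserver connector implementation.
from concurrent.futures import ThreadPoolExecutor

def _process_report(report):
    # stand-in: each report expands into many STIX-like dicts
    # (thousands of rows per report in practice)
    return [{"id": f"{report}-{i}"} for i in range(1000)]

def remove_duplicates(objs):
    # builds a second full list: peak memory roughly doubles here
    seen, out = set(), []
    for o in objs:
        if o["id"] not in seen:
            seen.add(o["id"])
            out.append(o)
    return out

def collect_day(reports):
    stix_objects = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for result in pool.map(_process_report, reports):
            stix_objects.extend(result)  # all of the day's objects held at once
    stix_objects = remove_duplicates(stix_objects)  # second copy of everything
    yield stix_objects  # a single bundle, only after the whole day is done

bundles = list(collect_day([f"report-{n}" for n in range(5)]))
```

The single yield at the end is the crux: nothing reaches the bundle sender until every report of the day has been downloaded, transformed, and deduplicated.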

Additional information

For high-volume subscriptions, this results in OOM before a single day completes processing. The connector restarts and retries the same day, creating an infinite OOM loop.

Observed in a Kubernetes deployment with 16 GiB memory limit on the connector pod:

Memory pattern: Sawtooth, linear climb from ~2 GiB to ~14 GiB over ~80 minutes, then OOM kill and restart
Pod restarts: Three OOM kills visible in a ~90-minute window
CPU: Pegged at ~1 core (Python GIL bound), 4 cores allocated, CPU is not the bottleneck
Outcome: Connector never completes processing a single day's reports

The ThreadPoolExecutor improves download speed but does not reduce peak memory, since all downloaded and transformed data is accumulated before yielding.

Suggested Fix

Option A (minimal change): Yield per-report instead of per-day. After each report is downloaded and transformed, yield its STIX objects as a bundle immediately rather than accumulating into the day-level list. This bounds memory to the size of a single report's output.
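A sketch of Option A, assuming downloads stay parallel but each report's output is flushed as soon as it is ready (process_report is a stand-in for the real download-and-transform step):

```python
# Sketch of Option A: one yielded bundle per report instead of per day.
from concurrent.futures import ThreadPoolExecutor

def process_report(report):
    # stand-in for downloading and transforming one report
    return [{"id": f"{report}-{i}"} for i in range(3)]

def collect_day(reports):
    # Downloads still run in parallel, but each report is yielded the
    # moment it is consumed, so the day-level list never exists.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for objects in pool.map(process_report, reports):
            yield objects

bundles = list(collect_day(["scan_ssl", "scan_http"]))
```

One caveat: executor.map buffers results of completed futures until they are consumed in order, so peak memory is bounded by roughly max_workers reports in flight rather than a single report, which is still a dramatic improvement over the full day.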

Option B (more robust): Implement chunked bundle sending with a configurable batch size (e.g., SHADOWSERVER_BATCH_SIZE). Accumulate STIX objects and yield a bundle every N objects, regardless of report boundaries. This provides predictable memory usage.
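Option B could be sketched as a generator that flushes every N objects; SHADOWSERVER_BATCH_SIZE is the proposed setting here, not an existing one:

```python
# Sketch of Option B: flush a bundle every batch_size objects,
# crossing report boundaries for predictable memory usage.
def chunked_bundles(object_stream, batch_size):
    batch = []
    for obj in object_stream:
        batch.append(obj)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

stream = ({"id": i} for i in range(10))
bundles = list(chunked_bundles(stream, 4))
# bundle sizes: 4, 4, 2
```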

Both options should also address the remove_duplicates() copy: consider in-place deduplication or deduplication per-chunk rather than on the full day's output.
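One way per-chunk deduplication could look, sketched as an assumption rather than a drop-in replacement for remove_duplicates(): keep a running set of seen STIX ids and filter each chunk as it streams through, so the full day's object list is never materialized.

```python
# Sketch of streaming deduplication by STIX id across chunks.
def dedupe_stream(objects, seen):
    # Yields only first occurrences; `seen` persists across chunks, so
    # duplicates are caught day-wide without a second full-list copy.
    for obj in objects:
        if obj["id"] not in seen:
            seen.add(obj["id"])
            yield obj

seen = set()
chunk1 = list(dedupe_stream([{"id": "a"}, {"id": "b"}, {"id": "a"}], seen))
chunk2 = list(dedupe_stream([{"id": "b"}, {"id": "c"}], seen))
```

The seen set still grows with the number of unique ids, but ids are far smaller than full STIX objects, so the footprint stays modest compared to duplicating the whole list.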

Labels: bug, needs triage