
[Shadowserver] connector OOM: per-day STIX object accumulation prevents processing for high-volume subscriptions #6058

@fruitcakej

Description


The Shadowserver connector accumulates all STIX objects for all reports of a single day in memory before yielding a bundle. For customers subscribed to many Shadowserver report types, a single day can produce millions of STIX objects, causing the connector to OOM before completing even one day of processing.

The root cause is in connector.py _collect_intelligence(): the per-day processing loop collects all report data via ThreadPoolExecutor, extends a single stix_objects list with every report's transformed output, then calls remove_duplicates() (which creates a second copy of the full list), and only then yields to the bundle sender.

Environment

  1. On-prem 7.260317.0

Reproducible Steps

Steps to create the smallest reproducible scenario:
1. Deploy the Shadowserver connector with a valid API key and secret
2. Subscribe to a large number of Shadowserver report types (or leave SHADOWSERVER_REPORT_TYPES empty to receive all available reports)
3. Set SHADOWSERVER_INITIAL_LOOKBACK to 1 (a single day)
4. Start the connector
5. Observe memory consumption during the first collection cycle
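A minimal environment fragment for the repro above might look like the following. Only SHADOWSERVER_REPORT_TYPES and SHADOWSERVER_INITIAL_LOOKBACK are named in the steps; the key/secret variable names are assumptions, so check the connector's README for the exact names:

```shell
# Hypothetical .env fragment for the repro scenario above.
# SHADOWSERVER_API_KEY / SHADOWSERVER_API_SECRET names are assumed.
SHADOWSERVER_API_KEY=<your-api-key>
SHADOWSERVER_API_SECRET=<your-api-secret>
SHADOWSERVER_REPORT_TYPES=        # empty: receive all available report types
SHADOWSERVER_INITIAL_LOOKBACK=1   # single day
```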

Expected Output

The connector should process and yield STIX bundles incrementally (per-report or in configurable batch sizes) so that memory usage remains bounded regardless of the number of reports or rows per day.

Actual Output

The connector accumulates all STIX objects for all reports of a single day in a single in-memory list before yielding. The processing flow is:

1. For each day, ThreadPoolExecutor(max_workers=8) downloads all reports in parallel
2. Each report's CSV is parsed row by row; ShadowserverStixTransformation generates multiple STIX objects per row (Identity, Artifact with base64-encoded CSV, ObservedData, Notes, IPs, ASNs, hostnames, etc.)
3. All results are .extend()-ed into a single stix_objects list
4. remove_duplicates() creates a second copy of the full list, doubling peak memory
5. Only then does the method yield the bundle to the sender
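The accumulation pattern above can be sketched as follows. This is illustrative only: _process_report and remove_duplicates are simplified stand-ins, not the connector's actual code.

```python
# Simplified sketch of the per-day accumulation pattern; stand-in code,
# not the real Shadowserver connector implementation.
from concurrent.futures import ThreadPoolExecutor

def _process_report(report):
    # stand-in: each report expands into many STIX-like dicts
    # (thousands of rows per report in practice)
    return [{"id": f"{report}-{i}"} for i in range(1000)]

def remove_duplicates(objs):
    # builds a second full list: peak memory roughly doubles here
    seen, out = set(), []
    for o in objs:
        if o["id"] not in seen:
            seen.add(o["id"])
            out.append(o)
    return out

def collect_day(reports):
    stix_objects = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for result in pool.map(_process_report, reports):
            stix_objects.extend(result)  # all of the day's objects held at once
    stix_objects = remove_duplicates(stix_objects)  # second copy of everything
    yield stix_objects  # a single bundle, only after the whole day is done

bundles = list(collect_day([f"report-{n}" for n in range(5)]))
```

The single yield at the end is the crux: nothing reaches the bundle sender until every report of the day has been downloaded, transformed, and deduplicated.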

Additional information

For high-volume subscriptions, this results in OOM before a single day completes processing. The connector restarts and retries the same day, creating an infinite OOM loop.

Observed in a Kubernetes deployment with 16 GiB memory limit on the connector pod:

Memory pattern: Sawtooth, linear climb from ~2 GiB to ~14 GiB over ~80 minutes, then OOM kill and restart
Pod restarts: Three OOM kills visible in a ~90-minute window
CPU: Pegged at ~1 core (Python GIL bound), 4 cores allocated, CPU is not the bottleneck
Outcome: Connector never completes processing a single day's reports

The ThreadPoolExecutor improves download speed but does not reduce peak memory, since all downloaded and transformed data is accumulated before yielding.

Suggested Fix

Option A (minimal change): Yield per-report instead of per-day. After each report is downloaded and transformed, yield its STIX objects as a bundle immediately rather than accumulating into the day-level list. This bounds memory to the size of a single report's output.
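A sketch of Option A, assuming downloads stay parallel but each report's output is flushed as soon as it is ready (process_report is a stand-in for the real download-and-transform step):

```python
# Sketch of Option A: one yielded bundle per report instead of per day.
from concurrent.futures import ThreadPoolExecutor

def process_report(report):
    # stand-in for downloading and transforming one report
    return [{"id": f"{report}-{i}"} for i in range(3)]

def collect_day(reports):
    # Downloads still run in parallel, but each report is yielded the
    # moment it is consumed, so the day-level list never exists.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for objects in pool.map(process_report, reports):
            yield objects

bundles = list(collect_day(["scan_ssl", "scan_http"]))
```

One caveat: executor.map buffers results of completed futures until they are consumed in order, so peak memory is bounded by roughly max_workers reports in flight rather than a single report, which is still a dramatic improvement over the full day.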

Option B (more robust): Implement chunked bundle sending with a configurable batch size (e.g., SHADOWSERVER_BATCH_SIZE). Accumulate STIX objects and yield a bundle every N objects, regardless of report boundaries. This provides predictable memory usage.
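Option B could be sketched as a generator that flushes every N objects; SHADOWSERVER_BATCH_SIZE is the proposed setting here, not an existing one:

```python
# Sketch of Option B: flush a bundle every batch_size objects,
# crossing report boundaries for predictable memory usage.
def chunked_bundles(object_stream, batch_size):
    batch = []
    for obj in object_stream:
        batch.append(obj)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

stream = ({"id": i} for i in range(10))
bundles = list(chunked_bundles(stream, 4))
# bundle sizes: 4, 4, 2
```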

Both options should also address the remove_duplicates() copy: consider in-place deduplication or deduplication per-chunk rather than on the full day's output.
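One way per-chunk deduplication could look, sketched as an assumption rather than a drop-in replacement for remove_duplicates(): keep a running set of seen STIX ids and filter each chunk as it streams through, so the full day's object list is never materialized.

```python
# Sketch of streaming deduplication by STIX id across chunks.
def dedupe_stream(objects, seen):
    # Yields only first occurrences; `seen` persists across chunks, so
    # duplicates are caught day-wide without a second full-list copy.
    for obj in objects:
        if obj["id"] not in seen:
            seen.add(obj["id"])
            yield obj

seen = set()
chunk1 = list(dedupe_stream([{"id": "a"}, {"id": "b"}, {"id": "a"}], seen))
chunk2 = list(dedupe_stream([{"id": "b"}, {"id": "c"}], seen))
```

The seen set still grows with the number of unique ids, but ids are far smaller than full STIX objects, so the footprint stays modest compared to duplicating the whole list.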

Labels: bug, needs triage