
CyrenThreatIntelligence v3.0.3: Fix duplicate data ingestion #13631

Open
mazamizo21 wants to merge 1 commit into Azure:master from Data443:feature/cyren-v3.0.3-dedup-fix-ms


@mazamizo21
Contributor

Summary

Follow-up fix to PR #13603 (v3.0.2). While v3.0.2 correctly changed the paging type from Offset to PersistentToken, the combination of small page sizes (count=100) and frequent polling (queryWindowInMin=15) still caused significant duplicate data ingestion in production.

Problem

The Cyren IP Reputation feed contains approximately 800 static indicators and the Malware URLs feed approximately 200 indicators. With the v3.0.2 configuration:

  • count=100 required 8+ page requests per poll cycle to fetch all ~800 IP Reputation indicators
  • queryWindowInMin=15 triggered polling every 15 minutes (96 times/day)
  • Observed impact: 304,000 rows ingested in 24 hours with only 198 unique IPs — a 1,535:1 duplicate ratio
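The request arithmetic in the list above can be reproduced in a few lines. This is a minimal sketch: the feed sizes (~800 and ~200) and the 15-minute window come from this PR, the function names are hypothetical, and it assumes each poll cycle walks the whole feed once.

```python
import math

MINUTES_PER_DAY = 24 * 60

def pages_per_poll(indicator_count: int, page_size: int) -> int:
    """Page requests needed to walk the full feed once (ceiling division)."""
    return math.ceil(indicator_count / page_size)

def polls_per_day(query_window_min: int) -> int:
    """Poll cycles per day, assuming one poll per query window."""
    return MINUTES_PER_DAY // query_window_min

if __name__ == "__main__":
    # Feed sizes as stated in the PR description.
    for name, size in (("IP Reputation", 800), ("Malware URLs", 200)):
        pages = pages_per_poll(size, page_size=100)
        polls = polls_per_day(15)
        print(f"{name}: {pages} pages/poll x {polls} polls/day = {pages * polls} requests/day")
```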

Changes

| Parameter | v3.0.2 (Before) | v3.0.3 (After) | Rationale |
|---|---|---|---|
| count | 100 | 1000 | Fetch all indicators in a single page, so no multi-page re-fetching is needed |
| queryWindowInMin | 15 | 360 | Poll every 6 hours; threat intelligence indicators are relatively static |
| pagingType | PersistentToken | PersistentToken | No change; the correct paging type from v3.0.2 is preserved |
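To make the page-size effect concrete, here is a minimal sketch of a generic token-based paging loop run against a simulated feed. FakeFeed, fetch and poll_once are hypothetical names; the sketch does not reproduce the Codeless Connector Platform's PersistentToken behavior or the real Cyren API. It only counts how many page requests a single poll cycle needs for a given count.

```python
from __future__ import annotations

from typing import Optional

class FakeFeed:
    """Hypothetical stand-in for a paged feed API (not the real Cyren endpoint)."""

    def __init__(self, indicators: list[str]):
        self._indicators = indicators
        self.requests = 0  # number of page requests served so far

    def fetch(self, count: int, token: Optional[str] = None) -> tuple[list[str], Optional[str]]:
        """Return one page plus an opaque continuation token (None on the last page)."""
        self.requests += 1
        start = int(token) if token else 0  # the fake encodes an offset inside the token
        page = self._indicators[start:start + count]
        end = start + len(page)
        next_token = str(end) if end < len(self._indicators) else None
        return page, next_token

def poll_once(feed: FakeFeed, count: int) -> list[str]:
    """One poll cycle: keep following continuation tokens until none is returned."""
    rows: list[str] = []
    token: Optional[str] = None
    while True:
        page, token = feed.fetch(count, token)
        rows.extend(page)
        if token is None:
            return rows

if __name__ == "__main__":
    for count in (100, 1000):
        feed = FakeFeed([f"ioc-{i}" for i in range(800)])
        rows = poll_once(feed, count)
        print(f"count={count}: {len(rows)} rows in {feed.requests} requests")
```

With 800 simulated indicators, count=100 needs 8 requests per poll while count=1000 needs one, which is the effect the table's rationale describes.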

Expected Impact

  • ~98.9% reduction in ingested rows (from ~304,000 to ~3,200 rows/day)
  • Before: ~304,000 rows/day → After: ~3,200 rows/day (4 polls × ~800 records)
  • Already validated on a live Sentinel workspace (Cyren-Final-2)
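The before/after estimate can be rechecked from the figures stated in this PR (~800 records per poll, a 360-minute window, 304,000 observed rows/day). A minimal sketch with hypothetical helper names, assuming every poll ingests the same number of records:

```python
MINUTES_PER_DAY = 24 * 60

def rows_per_day(records_per_poll: int, query_window_min: int) -> int:
    """Rows/day if every poll ingests records_per_poll rows."""
    return records_per_poll * (MINUTES_PER_DAY // query_window_min)

def reduction_pct(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before

if __name__ == "__main__":
    after = rows_per_day(800, 360)  # 4 polls x ~800 records
    print(after)
    print(f"{reduction_pct(304_000, after):.1f}%")
```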

Files Changed

| File | Change |
|---|---|
| Cyren_PollerConfig.json | count: 100→1000, queryWindowInMin: 15→360 (both pollers) |
| Package/mainTemplate.json | Same config changes + _solutionVersion: 3.0.2→3.0.3 |
| Package/3.0.3.zip | New package with updated mainTemplate.json + createUiDefinition.json |
| ReleaseNotes.md | Added v3.0.3 entry |
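One way to spot-check that both pollers carry the new values is a schema-agnostic search of the parsed JSON for the relevant keys. This is a sketch under assumptions: it assumes count and queryWindowInMin appear as plain JSON keys in Cyren_PollerConfig.json and Package/mainTemplate.json, and find_key and check_poller_values are hypothetical helpers. A key named count can also appear elsewhere (ARM copy loops use one), so treat the matches as candidates to review, not a pass/fail verdict.

```python
from __future__ import annotations

import json
from typing import Any, Iterator

def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)  # keep searching nested values too
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

def check_poller_values(path: str) -> dict[str, list[Any]]:
    """Collect all count / queryWindowInMin values found in a JSON file."""
    with open(path, encoding="utf-8-sig") as fh:  # tolerate a UTF-8 BOM
        doc = json.load(fh)
    return {k: list(find_key(doc, k)) for k in ("count", "queryWindowInMin")}

# Example (path assumed relative to the solution folder):
# print(check_poller_values("Cyren_PollerConfig.json"))
```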

All previous package versions preserved: 3.0.0.zip, 3.0.1.zip, 3.0.2.zip

Verification

  • Extracted 3.0.3.zip and confirmed all values match source files
  • Live connector patched and validated in production workspace
  • Old zip files verified unchanged (SHA-256 matches upstream)
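The SHA-256 comparison for the older packages can be scripted with the standard library. A minimal sketch; the file path and digest in the commented usage line are placeholders, not values from this PR:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large zips are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def unchanged(local: Path, upstream_hex: str) -> bool:
    """True if the local file's digest equals the upstream digest (case-insensitive)."""
    return sha256_of(local) == upstream_hex.strip().lower()

# Example (placeholder values):
# unchanged(Path("Package/3.0.2.zip"), "<upstream sha256 hex>")
```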

Related

…up to Azure#13603)

Changes in this PR:
- Increased 'count' from 100 to 1000 in both IP Reputation and Malware URLs pollers
  (Cyren IP Rep feed has ~800 indicators, Malware URLs ~200 — all fit in one page)
- Increased 'queryWindowInMin' from 15 to 360 minutes (6 hours)
  (Threat intelligence feeds are relatively static and do not require frequent polling)
- Preserved PersistentToken paging from v3.0.2
- Added 3.0.3.zip package (all previous versions preserved: 3.0.0, 3.0.1, 3.0.2)
- Updated ReleaseNotes.md

Root cause of duplication:
With count=100, the connector made 8+ page requests per poll cycle to fetch all ~800
indicators. Combined with 15-minute polling, this re-ingested the same data 96 times
per day. Observed: 304,000 rows with only 198 unique IPs (1,535:1 duplicate ratio).
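The duplicate ratio quoted above is total rows divided by unique indicators (304,000 / 198 ≈ 1,535). Given an exported list of ingested indicator values (an assumption; the export step is not shown), the same statistic can be computed with a short sketch. duplicate_stats is a hypothetical helper, and the toy IPs come from documentation address ranges.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

def duplicate_stats(indicator_values: Iterable[str]) -> tuple[int, int, float]:
    """Return (total rows, unique indicators, rows per unique indicator)."""
    counts = Counter(indicator_values)
    total = sum(counts.values())
    unique = len(counts)
    ratio = total / unique if unique else 0.0
    return total, unique, ratio

if __name__ == "__main__":
    # Toy data: three copies of one IP and one copy of another.
    rows = ["198.51.100.7"] * 3 + ["203.0.113.9"]
    print(duplicate_stats(rows))  # (4, 2, 2.0)
    # The PR's figures: 304,000 rows over 198 unique IPs.
    print(round(304_000 / 198))   # 1535
```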

Files changed:
- Cyren_PollerConfig.json: count 100→1000, queryWindowInMin 15→360
- Package/mainTemplate.json: Same fixes + version bump to 3.0.3
- Package/3.0.3.zip: Updated package with all changes
- ReleaseNotes.md: Added 3.0.3 entry
mazamizo21 requested review from a team as code owners on February 13, 2026 13:57