CyrenThreatIntelligence v3.0.3: Fix duplicate data ingestion #13631
Open
mazamizo21 wants to merge 1 commit into Azure:master from
…up to Azure#13603)

Changes in this PR:
- Increased 'count' from 100 to 1000 in both the IP Reputation and Malware URLs pollers (the Cyren IP Reputation feed has ~800 indicators and Malware URLs ~200, so each fits in one page)
- Increased 'queryWindowInMin' from 15 to 360 minutes (6 hours), since threat intelligence feeds are relatively static and do not require frequent polling
- Preserved PersistentToken paging from v3.0.2
- Added the 3.0.3.zip package (all previous versions preserved: 3.0.0, 3.0.1, 3.0.2)
- Updated ReleaseNotes.md

Root cause of duplication: with count=100, the connector made 8+ page requests per poll cycle to fetch all ~800 indicators. Combined with 15-minute polling, this re-ingested the same data 96 times per day. Observed: 304,000 rows with only 198 unique IPs (1,535:1 duplicate ratio).

Files changed:
- Cyren_PollerConfig.json: count 100→1000, queryWindowInMin 15→360
- Package/mainTemplate.json: same fixes + version bump to 3.0.3
- Package/3.0.3.zip: updated package with all changes
- ReleaseNotes.md: added 3.0.3 entry
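To illustrate why the page size drives the number of requests per poll, here is a minimal sketch assuming a hypothetical in-memory feed of 800 indicators and a hypothetical `fetch_page` API that returns up to `count` items plus a continuation token. It is a generic token-paging loop, not the Microsoft Sentinel connector's actual PersistentToken implementation.

```python
from typing import Optional

# Hypothetical stand-in for the ~800-indicator Cyren IP Reputation feed.
FEED = [f"indicator-{i}" for i in range(800)]


def fetch_page(count: int, token: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Hypothetical feed API: return up to `count` items and a continuation token.

    The token is None once the last item has been returned.
    """
    start = int(token) if token else 0
    items = FEED[start:start + count]
    end = start + len(items)
    next_token = str(end) if end < len(FEED) else None
    return items, next_token


def poll_once(count: int) -> tuple[list[str], int]:
    """Drain the feed by following tokens; return all items and the request count."""
    items: list[str] = []
    token: Optional[str] = None
    requests = 0
    while True:
        page, token = fetch_page(count, token)
        requests += 1
        items.extend(page)
        if token is None:
            break
    return items, requests


for count in (100, 1000):
    items, requests = poll_once(count)
    print(f"count={count}: {len(items)} indicators in {requests} request(s)")
```

With count=100 the loop needs 8 requests to drain 800 items; with count=1000 a single request returns the whole feed.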
Summary
Follow-up fix to PR #13603 (v3.0.2). While v3.0.2 correctly changed the paging type from Offset to PersistentToken, the combination of a small page size (count=100) and frequent polling (queryWindowInMin=15) still caused significant duplicate data ingestion in production.
Problem
The Cyren IP Reputation feed contains approximately 800 static indicators and the Malware URLs feed approximately 200 indicators. With the v3.0.2 configuration:
- count=100 caused 8+ page requests per poll cycle to fetch all indicators
- queryWindowInMin=15 triggered polling every 15 minutes (96 times/day)
Changes
- count: 100→1000
- queryWindowInMin: 15→360
- pagingType: PersistentToken (unchanged from v3.0.2)
Expected Impact
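A rough estimate of the daily load before and after this change, assuming (as the root-cause analysis does) that every poll re-reads the entire ~800-indicator IP Reputation feed. `daily_load` is a hypothetical helper, and the row counts are arithmetic estimates, not measurements.

```python
import math

MINUTES_PER_DAY = 24 * 60
FEED_SIZE = 800  # approximate IP Reputation feed size stated in this PR


def daily_load(count: int, query_window_min: int) -> dict[str, int]:
    """Polls, pages, requests, and rows per day, assuming each poll re-reads the full feed."""
    polls = MINUTES_PER_DAY // query_window_min
    pages = math.ceil(FEED_SIZE / count)
    return {
        "polls_per_day": polls,
        "pages_per_poll": pages,
        "requests_per_day": polls * pages,
        "rows_per_day": polls * FEED_SIZE,
    }


old = daily_load(count=100, query_window_min=15)    # v3.0.2 settings
new = daily_load(count=1000, query_window_min=360)  # v3.0.3 settings
print("v3.0.2:", old)
print("v3.0.3:", new)
```

Under those assumptions the feed is ingested 4 times per day instead of 96 (a 24× reduction), with one page request per poll instead of eight.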
Files Changed
- Cyren_PollerConfig.json: count 100→1000, queryWindowInMin 15→360 (both pollers)
- Package/mainTemplate.json: same poller changes, plus _solutionVersion 3.0.2→3.0.3
- Package/3.0.3.zip: updated package with all changes
- ReleaseNotes.md: added 3.0.3 entry

All previous package versions preserved: 3.0.0.zip, 3.0.1.zip, 3.0.2.zip
Verification
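A minimal sketch of how a duplicate ratio like the one reported in this PR (304,000 rows / 198 unique IPs) can be computed, assuming the ingested indicator values have already been exported into a Python list. The sample IPs below are hypothetical documentation addresses, and `duplicate_ratio` / `most_duplicated` are illustrative helpers.

```python
from collections import Counter


def duplicate_ratio(indicators: list[str]) -> float:
    """Total rows divided by distinct indicator values (1.0 means no duplicates)."""
    if not indicators:
        return 0.0
    return len(indicators) / len(set(indicators))


def most_duplicated(indicators: list[str], n: int = 3) -> list[tuple[str, int]]:
    """The n indicator values that appear most often, with their row counts."""
    return Counter(indicators).most_common(n)


# Hypothetical sample: two IPs ingested repeatedly, one ingested once.
rows = ["203.0.113.5"] * 4 + ["198.51.100.7"] * 3 + ["192.0.2.1"]
print(duplicate_ratio(rows))      # 8 rows / 3 unique values
print(most_duplicated(rows, 2))

# The figures reported in this PR: 304,000 rows with 198 unique IPs.
print(f"{304_000 / 198:,.0f}:1")  # prints "1,535:1"
```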
Related