Collaborator

@brijesh-elastic brijesh-elastic commented May 30, 2025

Proposed commit message

rapid7_insightvm: Add asset_vulnerability data stream for Cloud Detection and Response (CDR) workflow

This is a breaking change: it deprecates the asset data stream, which uses the HTTPJSON input, and
adds a new asset_vulnerability data stream, which uses the CEL input.

The new data stream, asset_vulnerability, will retrieve a list of assets and the vulnerabilities contained within each.
Additionally, it will enrich the vulnerability details by querying the vulnerability endpoint.
The agent will publish one document per vulnerability per host.

ECS mapping and transforms have also been added to facilitate
the Cloud Native Vulnerability Management (CNVM)[1] workflow.

Enabled agentless deployment for rapid7 insightvm
Upgraded the format_version to 3.3.5
Updated Kibana version constraints to ^8.19.0 || ^9.1.0

[1] https://www.elastic.co/guide/en/security/current/vuln-management-overview.html
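
The "one document per vulnerability per host" behaviour described in the commit message can be pictured with a minimal Python sketch. This is not the integration's CEL program: the `fan_out` helper and the field names (`id`, `host_name`, `vulnerabilities`, `vulnerability_id`) are assumptions chosen for illustration and are not taken from the Rapid7 API schema.

```python
from typing import Any, Dict, Iterator, List


def fan_out(assets: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one document per (asset, vulnerability) pair."""
    for asset in assets:
        # Keep the host-level fields, leaving out the nested vulnerability list.
        host = {k: v for k, v in asset.items() if k != "vulnerabilities"}
        for vuln in asset.get("vulnerabilities", []):
            yield {"asset": host, "vulnerability": vuln}


# Hypothetical input: two assets, one of them without vulnerabilities.
assets = [
    {
        "id": "a1",
        "host_name": "web-01",
        "vulnerabilities": [
            {"vulnerability_id": "v-1"},
            {"vulnerability_id": "v-2"},
        ],
    },
    {"id": "a2", "host_name": "db-01", "vulnerabilities": []},
]
docs = list(fan_out(assets))
```

In this sketch an asset with no vulnerabilities produces no documents; whether the real data stream behaves the same way is not stated in this conversation.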

Note

To Reviewers:

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

  • Clone integrations repo.
  • Install elastic-package locally.
  • Start elastic stack using elastic-package.
  • Move to integrations/packages/rapid7_insightvm directory.
  • Run the following command to run tests.

elastic-package test

Related issues

@brijesh-elastic brijesh-elastic self-assigned this May 30, 2025
@brijesh-elastic brijesh-elastic added enhancement New feature or request Integration:rapid7_insightvm Rapid7 InsightVM Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] Team:Sit-Crest Crest developers on the Security Integrations team [elastic/sit-crest-contractors] labels May 30, 2025
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@kcreddy kcreddy marked this pull request as ready for review June 7, 2025 17:32
@kcreddy kcreddy requested a review from a team as a code owner June 7, 2025 17:32
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@kcreddy kcreddy requested a review from a team June 7, 2025 17:33
tag: set_vulnerability_enumeration
value: CVE
# Remove cloud.* fields populated by beat.
# These fields correspond to EA rather than Tenable hosts and could be misleading.
Contributor

nit: comment mentions Tenable, should be Rapid7 I guess

Contributor

Addressed in 9bd0b7b

template_path: cel.yml.hbs
enabled: false
vars:
- name: interval
Contributor

There is no initial interval as far as I can see. Do we do a full scan periodically? In that case we need to set the @timestamp of the documents to the ingestion time, like we did for the AWS SecurityHub full posture data stream (https://github.com/elastic/integrations/pull/13372/files#diff-2aebbf76c4f4587a0d3701264a03075f4886df078bb04d4b41e40ec329902488R592). This way we won't have the problem of old documents falling outside of the retention period.

Collaborator Author

For Rapid7, it will collect all the vulnerabilities present in all the assets during the first interval. In subsequent intervals, it will collect only the new and remediated vulnerabilities that have been introduced since the last interval.

cc: @kcreddy

Contributor

@kcreddy kcreddy Jun 11, 2025

@maxcold, we are going with incremental load to keep a lower memory footprint, as @brijesh-elastic observed agentless agent restarts in Serverless environments.

But I see your point. We need to match the transform retention with initial_interval; otherwise, even if we ingest 2 years' worth of data, a much lower transform retention means all the documents are deleted from the destination indices. Is this the correct understanding?
If this is the case, we can add initial_interval like with other incremental integrations.

Contributor

Yes, if we go with incremental I think we need to follow the other integrations' approach and have initial_interval. It will at least give users the option to ingest some initial data for the CDR flows. But I'm not sure I understand how do we then have the data from 2023 in the envs if you do incremental updates only?

Contributor

@kcreddy kcreddy Jun 12, 2025

But I'm not sure I understand how do we then have the data from 2023 in the envs if you do incremental updates only?

In the absence of initial_interval, the first fetch ingests all historical data. All fetches after this will be incremental.

@brijesh-elastic is already working on adding an initial_interval option, so that the first fetch only covers the user-defined initial interval instead of ingesting all historical data.
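
To make the interplay between initial_interval and transform retention in this thread concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the helper names, the 90-day retention, and the 30-day initial interval are hypothetical values, not settings taken from the package.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def first_fetch_lower_bound(
    now: datetime, initial_interval: Optional[timedelta]
) -> Optional[datetime]:
    """Lower bound for the first fetch; None means all history is fetched."""
    if initial_interval is None:
        return None
    return now - initial_interval


def survives_retention(doc_time: datetime, now: datetime, retention: timedelta) -> bool:
    """True if a document's timestamp is still inside the retention window."""
    return now - doc_time <= retention


now = datetime(2025, 6, 12, tzinfo=timezone.utc)
retention = timedelta(days=90)  # hypothetical transform retention

# Without initial_interval, a document from 2023 would be ingested and then
# fall outside the retention window straight away.
old_doc = datetime(2023, 6, 12, tzinfo=timezone.utc)
old_doc_kept = survives_retention(old_doc, now, retention)

# With a hypothetical 30-day initial_interval, the oldest document that can
# be fetched on the first run is still inside the window.
bound = first_fetch_lower_bound(now, timedelta(days=30))
bound_kept = survives_retention(bound, now, retention)
```

The sketch only compares timestamps; it does not model how the transform itself applies retention.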

Contributor

@maxcold maxcold left a comment

tested cloud security UIs with the changes, everything works well as far as i can tell

Contributor

@efd6 efd6 left a comment

I think the behaviour is better, but I still have concerns about #14079 (comment).

Contributor

@chemamartinez chemamartinez left a comment

Regarding comments I added, LGTM.

@brijesh-elastic brijesh-elastic requested review from efd6 and kcreddy June 17, 2025 09:10
Contributor

@kcreddy kcreddy left a comment

nit only. LGTM for my comments 👍🏼 . Please wait for @efd6 approval.

Comment on lines +16 to +26
- remove:
    field:
      - organization
      - division
      - team
    ignore_missing: true
    if: ctx.organization instanceof String && ctx.division instanceof String && ctx.team instanceof String
    tag: remove_agentless_tags
    description: >-
      Removes the fields added by Agentless as metadata,
      as they can collide with ECS fields.
Contributor

Can you also add a changelog entry for this change (bugfix) similar to #14172
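
For readers less familiar with ingest pipeline conditions, the `remove` processor quoted above can be approximated by the following Python sketch. It is an analogy rather than the processor's implementation: `remove_agentless_tags` is a hypothetical helper, and the sample documents are invented.

```python
from typing import Any, Dict

AGENTLESS_FIELDS = ("organization", "division", "team")


def remove_agentless_tags(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the three top-level fields only when all of them are plain strings."""
    if all(isinstance(doc.get(name), str) for name in AGENTLESS_FIELDS):
        for name in AGENTLESS_FIELDS:
            # pop with a default mirrors ignore_missing: true
            doc.pop(name, None)
    return doc


# All three fields are strings, so they are treated as Agentless metadata.
tagged = remove_agentless_tags(
    {"organization": "sec", "division": "eng", "team": "siem", "message": "m"}
)

# Here organization is an object (as the ECS organization field set would be),
# so the condition is false and nothing is removed.
ecs_like = remove_agentless_tags(
    {"organization": {"name": "Acme"}, "division": "eng", "team": "siem"}
)
```

The string check is what keeps an object-valued `organization` field from being removed.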

Contributor

@efd6 efd6 left a comment

Nice work. Nit only, then LGTM

?"asset": has(state.?cursor.last_interval_time) ?
optional.none()
:
optional.of(("last_scan_end > " + string((timestamp(interval_time) - duration(state.initial_interval)).format(time_layout.RFC3339)))),
Contributor

Suggested change
- optional.of(("last_scan_end > " + string((timestamp(interval_time) - duration(state.initial_interval)).format(time_layout.RFC3339)))),
+ optional.of(("last_scan_end > " + (timestamp(interval_time) - duration(state.initial_interval)).format(time_layout.RFC3339))),

timestamp.format returns a string.
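
The logic of the snippet under review, and the reason the extra string conversion is redundant, can be mirrored in a short Python sketch. This is an analogy, not the CEL program: `asset_filter` and its parameters are hypothetical names, and the sketch assumes `interval_time` is already in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# strftime layout that matches RFC 3339 for UTC timestamps.
RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def asset_filter(
    interval_time: datetime, initial_interval: timedelta, has_cursor: bool
) -> Optional[str]:
    """Build the asset filter for the first run; return None once a cursor exists."""
    if has_cursor:
        return None
    start = interval_time - initial_interval
    # strftime already returns a str, so wrapping it in str(...) adds nothing,
    # which is the same point the review makes about timestamp.format in CEL.
    return "last_scan_end > " + start.strftime(RFC3339_UTC)


first_run = asset_filter(
    datetime(2025, 6, 17, 9, 10, tzinfo=timezone.utc),
    timedelta(hours=24),
    has_cursor=False,
)
later_run = asset_filter(
    datetime(2025, 6, 18, 9, 10, tzinfo=timezone.utc),
    timedelta(hours=24),
    has_cursor=True,
)
```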

@elasticmachine

💚 Build Succeeded

History

cc @brijesh-elastic

@elastic-sonarqube

Quality Gate failed

Failed conditions
75.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@brijesh-elastic brijesh-elastic merged commit 8d53d83 into elastic:main Jun 19, 2025
6 of 7 checks passed
@elastic-vault-github-plugin-prod

Package rapid7_insightvm - 2.0.0 containing this change is available at https://epr.elastic.co/package/rapid7_insightvm/2.0.0/

Labels

documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. enhancement New feature or request Integration:rapid7_insightvm Rapid7 InsightVM Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] Team:Sit-Crest Crest developers on the Security Integrations team [elastic/sit-crest-contractors]

Projects

None yet

7 participants