
Conversation

@kcreddy
Contributor

@kcreddy kcreddy commented Apr 8, 2025

Proposed commit message

Adds a latest transform for vulnerabilities which allows data
from Qualys VMDR to be displayed in Elastic Security CNVM workflow.

Ref: https://github.com/elastic/integrations/issues/11673

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

Related issues

@kcreddy kcreddy self-assigned this Apr 8, 2025
@kcreddy kcreddy added enhancement New feature or request Integration:qualys_vmdr Qualys VMDR Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Apr 8, 2025
@kcreddy kcreddy marked this pull request as ready for review April 8, 2025 09:57
@kcreddy kcreddy requested a review from a team as a code owner April 8, 2025 09:57
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@kcreddy kcreddy requested a review from a team April 8, 2025 09:58
@kcreddy kcreddy marked this pull request as draft April 8, 2025 15:30
Contributor

I know that having these in separate files reflects the situation in the data stream's defs, but it makes me sad. Is the rationale for this diff-collision avoidance?

Contributor Author

The fields in these files (package.yml, resource.yml, vulnerability.yml) are meant for extended fields that are not in the ECS definitions but are used by the Elastic workflows.
Here, both package.name and package.version are present in ECS: https://www.elastic.co/guide/en/ecs/current/ecs-package.html, but package.fixed_version isn't. Hence it was separated.
It also helps when checking the diff between source fields and destination fields: https://github.com/elastic/integrations/blob/main/packages/qualys_vmdr/data_stream/asset_host_detection/fields/package.yml as they are now identical.
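As an illustration, a minimal entry in such a fields file for a non-ECS extended field might look like the following (the description text here is assumed, not copied from the package):

```yaml
# fields/package.yml — extended field not present in the ECS definitions
- name: package.fixed_version
  type: keyword
  description: Package version in which the vulnerability is fixed.
```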

@kcreddy
Contributor Author

kcreddy commented Apr 9, 2025

The PR will be moved to ready for review once live testing is finished.
cc: @maxcold @alexreal1314

@alexreal1314
Contributor

The PR will be moved to ready for review once live testing is finished. cc: @maxcold @alexreal1314

still waiting for credentials to test it.

move_on_creation: true
latest:
unique_key:
- vulnerability.id
Contributor

This might be causing duplication of the documents (we see duplication in the test env). I remember we had this conversation before, but I need to find out the outcome. If having fields with multiple values as the unique_key is causing duplication, we might need to find another way, e.g. use QID+resource.id+namespace as the key. We need to understand our options here better.

Contributor

@kcreddy @maxcold IMO we should use event.id+resource.id+namespace since we set event.id based on unique_vuln_id in host detection, or use qualys_vmdr.knowledge_base.qid field in knowledge_base ingest pipeline.

Contributor Author

@kcreddy kcreddy Apr 13, 2025

Thanks for the suggestion @alexreal1314.
I have implemented it here: 591a0f0. Removed vulnerability.package.name and vulnerability.package.version as well.
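Assuming the composite key suggested above, the latest section of the transform would look roughly like this (a sketch; the field names are taken from the discussion, and the sort field is assumed to be @timestamp):

```yaml
latest:
  unique_key:
    - event.id               # set from unique_vuln_id in the host detection ingest pipeline
    - resource.id
    - data_stream.namespace
  sort: "@timestamp"         # latest transforms keep the newest document per unique_key
```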

retention_policy:
time:
field: "@timestamp"
max_age: 4h
Contributor

@kcreddy am I correct that the Qualys integration is implemented in a way that it ingests the whole data set every cycle?
If yes, we need to think about the value here. 4h is the default value, but it is also configurable; for example, the InfoSec team has set it to 12h now. This means that after 4h the findings will be gone for the next 8h. Until we implement the connection between the integration params and this value in the transform, setting a higher value might be needed, say 24h as a good middle ground, or maybe even more (native CNVM uses 72h, for example). In this case some old vulnerabilities might linger around in the env for a while, but that's better than having no findings.

Contributor

@clement-fouque clement-fouque Apr 10, 2025

am I correct that the Qualys integration is implemented in a way that it ingests the whole data set every cycle?

Yes, that's correct; it's the default behaviour. However, there is an option to enable an incremental scan.

If yes, we need to think about the value here. 4h is the default value, but it is also configurable; for example, the InfoSec team has set it to 12h now. This means that after 4h the findings will be gone for the next 8h. Until we implement the connection between the integration params and this value in the transform, setting a higher value might be needed, say 24h as a good middle ground, or maybe even more (native CNVM uses 72h, for example). In this case some old vulnerabilities might linger around in the env for a while, but that's better than having no findings.

I agree with your analysis. I would like to add details that will likely influence the value:

  • If Qualys is configured with authenticated scans (either through the Qualys agent or through a network scan), Qualys will detect when a vulnerability is fixed and will change its status to fixed.
  • If Qualys is configured to scan with the Qualys agent, the minimum scanning interval is 4h, and the default is also 4h.

To sum up, we'll see old vulnerabilities lingering around only with unauthenticated network scans. As running unauthenticated scans is not a recommended practice, we should be fine most of the time. EDIT: I think this is inaccurate unless the CNVM functionality filters out fixed or ignored vulnerabilities.

Contributor Author

Thanks for the suggestion @maxcold and @clement-fouque
I updated the retention to 24h here: 591a0f0
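For reference, the retention_policy block after that change looks like this (the 24h value discussed above; otherwise unchanged from the snippet shown earlier):

```yaml
retention_policy:
  time:
    field: "@timestamp"   # documents older than max_age are removed from the destination index
    max_age: 24h          # raised from the 4h default per the discussion above
```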

Contributor

@clement-fouque thanks for sharing more context! We will work toward connecting the data retention param on the transform with the params of the integration, but for now, I think we should proceed with the longer retention period

@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@kcreddy kcreddy marked this pull request as ready for review April 14, 2025 02:36
@kcreddy kcreddy requested review from alexreal1314 and maxcold April 14, 2025 02:36
@maxcold
Contributor

maxcold commented Apr 14, 2025

@kcreddy we probably need to update kibana.version condition https://github.com/elastic/integrations/blob/main/packages/qualys_vmdr/manifest.yml#L12 to the version: "^8.19.0 || ^9.1.0" . @alexreal1314 as far as I can see you didn't backport the changes of supporting multiple CVEs and packages to other branches. We might need to backport at least to 8.x so the changes go out in 8.19 as well.

@kcreddy
Contributor Author

kcreddy commented Apr 14, 2025

@kcreddy we probably need to update kibana.version condition https://github.com/elastic/integrations/blob/main/packages/qualys_vmdr/manifest.yml#L12 to the version: "^8.19.0 || ^9.1.0" . @alexreal1314 as far as I can see you didn't backport the changes of supporting multiple CVEs and packages to other branches. We might need to backport at least to 8.x so the changes go out in 8.19 as well.

@maxcold, Okay. Do you think merging this PR closer to 8.19 or after 8.19 makes sense?
Prior to this, you can test the integration on 8.19 using the fleet API.

@maxcold
Contributor

maxcold commented Apr 14, 2025

@nick-alayil can you provide your input on

Do you think merging this PR closer to 8.19 or after 8.19 makes sense?

This would mean that the Qualys data won't be available in Serverless before the 8.19/9.1 release in June. Are we ok with that? Do we have customers in Serverless we would like to enable with this feature?

@kcreddy
Contributor Author

kcreddy commented Apr 16, 2025

If the changes look solid and testing is good, I think we should merge this right away. We're following a serverless-first approach as an org now anyway, so this aligns with that direction.

However, I agree with the statement: 'We're following a serverless-first approach as an organization now anyway, so this aligns with that direction.' In my opinion, this technical limitation within the integrations ecosystem shouldn't block our serverless-first approach.

Thanks for the input @nick-alayil and @maxcold.
I've updated the min stack version as per your suggestion -> 89b8323. We can proceed merging the PR now. We will be following the integration backporting guidelines to backport any bugs in the meantime.

@kcreddy
Contributor Author

kcreddy commented Apr 16, 2025

The CI error should be fixed by elastic/elasticsearch#126417.

@efd6
Contributor

efd6 commented Apr 17, 2025

Running this locally on v8.19.0-SNAPSHOT, I have minimised this to

---
description: Pipeline for processing Asset Host Detection data.
processors:
  - set:
      field: message
      value: '{"ATTRIBUTE":[{"OTHER_FIELD":""}]}'
  - json:
      field: message
      target_field: json
  - remove:
      tag: remove_json
      field:
        - json.ATTRIBUTE.LAST_ERROR_DATE
      ignore_missing: true
on_failure:
  - append:
      field: error.message
      value: 'Processor {{{_ingest.on_failure_processor_type}}} with tag {{{_ingest.on_failure_processor_tag}}} in pipeline {{{_ingest.pipeline}}} failed with message: {{{_ingest.on_failure_message}}}'

This ceases to fail when ATTRIBUTE is not an array.

Using the dev tools gives an indication why this might be:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "json": {
          "field": "message",
          "target_field": "json"
        }
      },
     {
        "remove": {
          "field": [
            "json.ATTRIBUTE.LAST_ERROR_DATE"
          ],
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "idx",
      "_id": "id",
      "_source": {
        "message": """{"ATTRIBUTE":[{"OTHER_FIELD":""}]}"""
      }
    }
  ]
}

gives

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "[LAST_ERROR_DATE] is not an integer, cannot be used as an index as part of path [json.ATTRIBUTE.LAST_ERROR_DATE]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "[LAST_ERROR_DATE] is not an integer, cannot be used as an index as part of path [json.ATTRIBUTE.LAST_ERROR_DATE]",
        "caused_by": {
          "type": "number_format_exception",
          "reason": "For input string: \"LAST_ERROR_DATE\""
        }
      }
    }
  ]
}

while when the document is {"ATTRIBUTE":{"OTHER_FIELD":""}} there is no error.

If I run the same failing simulate request on v8.17.3, I get no failure.

The image for the v8.19.0-SNAPSHOT is 3c3225efef7e8285bb2c5de5cf0b15fe4747677fc84473a2b3db53fe2391212e, which reports {"@timestamp":"2025-04-17T02:29:50.155Z", "log.level": "INFO", "message":"version[8.19.0-SNAPSHOT], pid[233], build[docker/c070c976010455cf7d9b45b6270cb6709e84c1ae/2025-04-15T17:38:52.824533143Z], OS[Linux/6.10.14-linuxkit/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/24/24+36-3646]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"3c3225efef7e","elasticsearch.cluster.name":"elasticsearch"} which is here and should contain the fix in elastic/elasticsearch#126417. So I have to conclude that the issue here is not the same as the one that was fixed in that PR — this is not completely surprising since the error messages are subtly different.
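Before that fix is available, one pipeline-side way to avoid the array-path resolution is to iterate over the array with a foreach processor and remove the field from each element via _ingest._value (a workaround sketch, not part of this PR):

```
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "json": {
          "field": "message",
          "target_field": "json"
        }
      },
      {
        "foreach": {
          "field": "json.ATTRIBUTE",
          "ignore_missing": true,
          "processor": {
            "remove": {
              "field": "_ingest._value.LAST_ERROR_DATE",
              "ignore_missing": true
            }
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "idx",
      "_id": "id",
      "_source": {
        "message": """{"ATTRIBUTE":[{"OTHER_FIELD":""}]}"""
      }
    }
  ]
}
```

This only applies when ATTRIBUTE is always an array; documents where it is a single object would need the plain dotted-path remove instead.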

@kcreddy
Contributor Author

kcreddy commented Apr 21, 2025

Thanks @efd6 for the analysis. It is indeed a variation of elastic/elasticsearch#126417 but for lists.

@joegallo made the fix in elastic/elasticsearch#127006 and should be available in v8.19.0 stack version.

@kcreddy
Contributor Author

kcreddy commented Apr 21, 2025

/test

@kcreddy
Contributor Author

kcreddy commented Apr 21, 2025

/test

@kcreddy
Contributor Author

kcreddy commented Apr 23, 2025

/test

@elasticmachine

💚 Build Succeeded


cc @kcreddy

@elastic-sonarqube

Quality Gate failed

Failed conditions
5.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@alexreal1314
Contributor

alexreal1314 commented Apr 28, 2025

Hi @kcreddy @maxcold, I've tested the latest changes on my env https://alex-dev91.kb.us-west2.gcp.elastic-cloud.com/ and it seems that after we changed the unique key of the transform there are no duplicates.

This is the query I ran:

POST security_solution-qualys_vmdr.vulnerability_latest-v1/_search
{
  "size": 0,
  "aggs": {
    "duplicate_check": {
      "composite": {
        "size": 10000,
        "sources": [
          { "event_id": { "terms": { "field": "event.id.keyword" } } },
          { "resource_id": { "terms": { "field": "resource.id.keyword" } } },
          { "namespace": { "terms": { "field": "data_stream.namespace.keyword" } } }
        ]
      },
      "aggs": {
        "docs_per_key": {
          "bucket_script": {
            "buckets_path": {
              "docCount": "_count"
            },
            "script": "params.docCount"
          }
        },
        "duplicate_filter": {
          "bucket_selector": {
            "buckets_path": {
              "docCount": "_count"
            },
            "script": "params.docCount > 1"
          }
        }
      }
    }
  }
}

[screenshot: aggregation results showing no duplicate buckets]

So from my side, we are good to go.

Contributor

@maxcold maxcold left a comment

lgtm, thanks for pushing this forward @kcreddy !

@kcreddy kcreddy merged commit c0cf6f2 into elastic:main Apr 28, 2025
5 of 7 checks passed
@elastic-vault-github-plugin-prod

Package qualys_vmdr - 6.6.0 containing this change is available at https://epr.elastic.co/package/qualys_vmdr/6.6.0/


Labels

enhancement New feature or request Integration:qualys_vmdr Qualys VMDR Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Qualys VMDR: Implement transform for enhancing cloud security workflow

7 participants