Conversation

@eyalkoren
Contributor

Adding tests to try to reproduce failures in the ingest pipeline following #121914.

A benchmark that ingests bulks of messages from different sample sources fails specifically on the ingestion of some (apparently random) nginx/Apache access logs.

@gbanasiak found the following example log event causing the failure:

{"@timestamp": "2020-09-02T08:21:51.000Z", "type": "beats", "input": {"type": "log"}, "agent": {"version": "7.3.2", "id": "2e5b12d4-8f16-4bfe-8d91-4a98dd9c7214", "type": "filebeat", "name": "infosec-ci-master-green.c.elastic-ci-prod.internal", "ephemeral_id": "3f2b9397-a5e0-4c51-a814-5cce75962b97", "hostname": "infosec-ci-master-green"}, "service": {"type": "nginx"}, "host": {"os": {"family": "debian", "platform": "ubuntu", "kernel": "5.3.0-1032-gcp", "version": "18.04.3 LTS (Bionic Beaver)", "codename": "bionic", "name": "Ubuntu"}, "architecture": "x86_64", "containerized": false, "id": "bfafdfcc69fc1f239af7a05fe266e68c", "mac": ["42:01:0a:e0:01:ec"], "ip": ["10.224.1.236", "fe80::4001:aff:fee0:1ec"], "hostname": "infosec-ci-master-green", "name": "infosec-ci-master-green.c.elastic-ci-prod.internal"}, "event": {"timezone": "+00:00", "module": "nginx", "dataset": "nginx.access"}, "log": {"file": {"path": "/var/log/nginx/infosec-ci.elastic.co.access.log"}, "offset": 1040193}, "ecs": {"version": "1.0.1"}, "cloud": {"provider": "gcp", "instance": {"id": "7729978443851144062", "name": "infosec-ci-master-green"}, "machine": {"type": "n1-standard-4"}, "project": {"id": "elastic-ci-prod"}, "availability_zone": "us-central1-a"}, "@version": "1", "tags": ["jenkins_master", "nginx", "infra-stats", "\ud83c\udd71\ufe0f", "llama", "\ud83e\udd99", "llama-prod"], "fileset": {"name": "access"}, "message": "28.27.251.216 - dustin03 [03/Jan/2020:21:05:52 +0000] \"GET /computer/api/json HTTP/1.1\" 200 602 \"-\" \"Go-http-client/1.1\"", "data_stream": {"type": "logs", "namespace": "default", "dataset": "nginx.access"}, "rally": {"message_size": 120, "doc_size": 1531}

The error is:

Bulk request failed: [HTTP status: 400, message: [1:2460] failed to parse: data stream timestamp field [@timestamp] is missing]
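
For reference, here is a minimal sketch (not the benchmark's own harness) of the failing bulk ingestion path, assuming a local unsecured cluster, the sample event above saved as `sample_event.json`, and the integration's index template and default pipeline already installed; the data stream name `logs-nginx.access-default` follows from the event's `data_stream` fields:

```python
# Send the sample event through the bulk API into the data stream named by its
# data_stream fields (type=logs, dataset=nginx.access, namespace=default).
import json

import requests

ES_URL = "http://localhost:9200"  # assumption: local unsecured cluster

with open("sample_event.json") as f:
    event = json.load(f)

# Data streams only accept the "create" bulk action.
bulk_body = json.dumps({"create": {}}) + "\n" + json.dumps(event) + "\n"

resp = requests.post(
    f"{ES_URL}/logs-nginx.access-default/_bulk",
    data=bulk_body,
    headers={"Content-Type": "application/x-ndjson"},
)
for item in resp.json()["items"]:
    # On failure this surfaces the same kind of error as above, e.g.
    # "data stream timestamp field [@timestamp] is missing".
    print(item["create"].get("error", "ok"))
```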

This PR contains a test that attempts to reproduce the issue with the same pipeline used in the failing benchmark and the same message that causes the failure.
Passing this test message in isolation through the same pipeline doesn't cause any issue (see the sketch below), so the root cause likely requires more realistic conditions to manifest, such as large bulks and high concurrency.
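
For comparison, a minimal sketch of the isolation check under the same assumptions; the pipeline id below is hypothetical (substitute the pipeline used by the failing benchmark), and the `_simulate` API runs the document through the pipeline without indexing it:

```python
# Run the sample event through the ingest pipeline in isolation via _simulate.
import json

import requests

ES_URL = "http://localhost:9200"  # assumption: local unsecured cluster
PIPELINE_ID = "logs-nginx.access-default-pipeline"  # hypothetical pipeline id

with open("sample_event.json") as f:
    event = json.load(f)

resp = requests.post(
    f"{ES_URL}/_ingest/pipeline/{PIPELINE_ID}/_simulate",
    json={"docs": [{"_source": event}]},
)
# In isolation, @timestamp survives the pipeline and the simulate output
# shows the processed document without errors.
print(json.dumps(resp.json(), indent=2))
```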

@eyalkoren added the >test label on Jul 17, 2025
@eyalkoren closed this on Jul 17, 2025
@eyalkoren deleted the analysis-complex-pipelines branch on July 17, 2025 at 15:53