4 changes: 2 additions & 2 deletions .buildkite/scripts/e2e-pipeline/README.md
@@ -44,11 +44,11 @@ pip install -r .buildkite/scripts/e2e-pipeline/requirements.txt
### Run on local
Run the following command from the repo dir:
```bash
python3 .buildkite/scripts/e2e-pipeline/main.py --skip-setup=true --integrations='apache','nginx'
python3 .buildkite/scripts/e2e-pipeline/main.py --skip-setup --integrations='apache','nginx'
```

This will run entire ELK docker containers.
Remove `--skip-setup` or use `--skip-setup=true` if you are running the script for the first time, where it needs to set up elastic-package and integrations.
Do not use `--skip-setup` if you are running the script for the first time, where it needs to set up elastic-package and integrations.
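
The switch from `--skip-setup=true` to a bare `--skip-setup` matches how a presence-only boolean flag behaves. The sketch below illustrates that behaviour with `argparse`. It is an assumption for illustration only: the diff does not show how `main.py` parses its arguments, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv):
    """Hypothetical parser; the real main.py may use a different CLI library."""
    parser = argparse.ArgumentParser()
    # store_true: the flag is True when present and False when absent.
    # It takes no value, so there is nothing to write after "=".
    parser.add_argument("--skip-setup", action="store_true")
    # The shell turns --integrations='apache','nginx' into apache,nginx,
    # so split the single string on commas.
    parser.add_argument(
        "--integrations",
        type=lambda value: [name for name in value.split(",") if name],
        default=[],
    )
    return parser.parse_args(argv)


args = parse_args(["--skip-setup", "--integrations=apache,nginx"])
print(args.skip_setup, args.integrations)  # True ['apache', 'nginx']
```

Omitting the flag entirely gives `skip_setup == False`, which corresponds to the first-run case where setup must happen.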

## Troubleshooting
- The project retries on some operations to overcome timeout issues, uses [`retry` tool](https://formulae.brew.sh/formula/retry). If you get `retry` undefined error, make sure to install it.
1 change: 1 addition & 0 deletions .buildkite/scripts/e2e-pipeline/bootstrap.py
@@ -38,6 +38,7 @@ def __init__(self, stack_version: str, project_type: str) -> None:
Returns:
Validates and sets stack version, project type and resolves elastic package distro based on running OS (sys.platform)
"""
print(f"Stack version: {stack_version}")
self.stack_version = stack_version
self.__validate_and_set_project_type(project_type)
self.__resolve_distro()
2 changes: 1 addition & 1 deletion .buildkite/scripts/e2e-pipeline/generate-steps.py
@@ -59,7 +59,7 @@ def generate_steps_for_main_branch(versions) -> list:
structure = {
"agents": {
"provider": "gcp",
"machineType": "n2-standard-4",
"machineType": "n2-standard-16",
Collaborator Author

Sometimes ES is not reachable: when I spin up a VM and retry many times, it does not respond. It looks like elastic-package consumes so many resources that ES becomes very slow. I am changing the machine type and reducing the number of packages (keeping only m365_defender, which has more processors) to test this.
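
A slow-to-start ES can also be tolerated on the script side by polling for readiness with a bounded backoff. The following is a minimal sketch, not code from this PR: `wait_until_ready` and its `probe` callable are hypothetical names, and `probe` stands in for whatever health check the pipeline would actually use.

```python
import time


def wait_until_ready(probe, timeout=60.0, initial_delay=0.5, max_delay=8.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True; return the number of attempts.

    Raises TimeoutError when the next wait would exceed the deadline.
    sleep and clock are injectable so the function can be tested quickly.
    """
    deadline = clock() + timeout
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except Exception:
            # Treat a failing probe (e.g. connection refused) as "not ready".
            pass
        if clock() + delay > deadline:
            raise TimeoutError(f"service not ready after {attempts} attempts")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Demo with a fake probe that succeeds on the third call.
answers = iter([False, False, True])
print(wait_until_ready(lambda: next(answers), sleep=lambda _: None))  # 3
```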

"imageProject": "elastic-images-prod",
"image": "family/platform-ingest-logstash-multi-jdk-ubuntu-2204",
"diskSizeGb": 120
11 changes: 3 additions & 8 deletions .buildkite/scripts/e2e-pipeline/main.py
@@ -8,7 +8,7 @@
from plugin_test import PluginTest
import util

INTEGRATION_PACKAGES_TO_TEST = ["apache", "m365_defender", "nginx", "tomcat"]
INTEGRATION_PACKAGES_TO_TEST = ["m365_defender"]
Member

Now that we are using a more capable VM, are we adding these back?

Collaborator Author

I am thinking of dropping them, as they do not add meaningful value to the test cases. m365_defender has more complex conditions and more processors, which I think makes it (or a TI integration) the best candidate for the test. nginx, apache, and tomcat are simple integrations with only a handful of processors.
If you have a strong opinion, I am happy to roll back (or at least add one or two of them).



class BootstrapContextManager:
@@ -41,6 +41,7 @@ def main(skip_setup=False, integrations=[]):
with BootstrapContextManager(skip_setup) as bootstrap:
working_dir = os.getcwd()
test_plugin = PluginTest()

packages = integrations or INTEGRATION_PACKAGES_TO_TEST
for package in packages:
try:
@@ -50,13 +51,7 @@ def main(skip_setup=False, integrations=[]):
print(f"Test failed for {package} with {e}.")
failed_packages.append(package)

container = util.get_logstash_container()

# pretty printing
print(f"Logstash docker container logs..")
ls_container_logs = container.logs().decode('utf-8')
for log_line in ls_container_logs.splitlines():
print(log_line)
util.show_containers_logs(["logstash-", "elasticsearch-", "elastic-agent-"])

if len(failed_packages) > 0:
raise Exception(f"Following packages failed: {failed_packages}")
3 changes: 2 additions & 1 deletion .buildkite/scripts/e2e-pipeline/plugin_test.py
@@ -2,9 +2,9 @@
A class to validate the Integration Plugin with a given integration package
"""
import subprocess
import time
from logstash_stats import LogstashStats


class PluginTest:
logstash_stats_api = LogstashStats()
LAST_PROCESSED_EVENTS = {"in": 0, "out": 0}
@@ -58,4 +58,5 @@ def on(self, package: str) -> None:
for result_line in result.stdout.splitlines(): print(f"{result_line}")

# although there was an error, le's check how LS performed and make sure errors weren't because of Logstash
time.sleep(2) # make sure LS processes the event way to downstream ES
Collaborator Author

This happened mostly when ES was very slow or unresponsive. To make sure events are processed, this leaves a short window before hitting the _node/stats API.
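
An alternative to a fixed sleep is to poll the event counters until the output count catches up with the input count, with a timeout as a safety net. This is only a sketch under assumptions: `wait_for_drain` and `fetch_events` are hypothetical names, and `fetch_events` is assumed to return a dict shaped like `LAST_PROCESSED_EVENTS` (`{"in": ..., "out": ...}`).

```python
import time


def wait_for_drain(fetch_events, timeout=10.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until 'out' has caught up with 'in', or the timeout expires.

    Returns the last reading either way, so the caller can still analyze
    throughput when the pipeline never fully drains (for example, when a
    filter drops events and 'out' legitimately stays below 'in').
    """
    deadline = clock() + timeout
    reading = fetch_events()
    while reading["out"] < reading["in"] and clock() < deadline:
        sleep(interval)
        reading = fetch_events()
    return reading


# Demo with canned readings instead of a real stats API call.
readings = iter([{"in": 5, "out": 2}, {"in": 5, "out": 5}])
print(wait_for_drain(lambda: next(readings), sleep=lambda _: None))
```

The fixed `time.sleep(2)` in the diff is simpler. Polling only helps when the delay varies a lot between runs.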

self.__analyze_logstash_throughput(package, result)
27 changes: 23 additions & 4 deletions .buildkite/scripts/e2e-pipeline/util.py
@@ -18,11 +18,30 @@ def call_url_with_retry(url: str, max_retries: int = 5, delay: int = 1) -> reque
session.mount(schema, HTTPAdapter(max_retries=retries))
return session.get(url)


def get_logstash_container() -> Container:
def show_containers_logs(container_prefixes):
client = docker.from_env()
return client.containers.get("elastic-package-stack-e2e-logstash-1")

containers = client.containers.list(all=True)
print(f"Available container names: {[c.name for c in containers]}")
matching_containers = []
for container in containers:
if any(prefix in container.name for prefix in container_prefixes):
matching_containers.append(container)

if not matching_containers:
prefixes_str = ", ".join(container_prefixes)
print(f"No containers found with prefixes: {prefixes_str}")
return

for container in matching_containers:
# pretty printing with clear separators
separator = "=" * 80
print(f"\n{separator}")
print(f"Container: {container.name}")
print(f"{separator}")
container_logs = container.logs().decode('utf-8')
for log_line in container_logs.splitlines():
print(f" {log_line}")
print(f"{separator}\n")
Collaborator Author

This gives us better visibility into what happened across the entire stack.
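
Note that the selection in `show_containers_logs` uses `prefix in container.name`, which is a substring test rather than a true prefix test. The sketch below isolates that logic without Docker. `select_by_fragment` is a hypothetical helper, and apart from the Logstash name taken from the removed `get_logstash_container`, the container names are assumed by analogy.

```python
def select_by_fragment(names, fragments):
    """Return the names that contain at least one of the given fragments."""
    return [
        name for name in names
        if any(fragment in name for fragment in fragments)
    ]


names = [
    "elastic-package-stack-e2e-logstash-1",       # name from the old helper
    "elastic-package-stack-e2e-elasticsearch-1",  # assumed by analogy
    "elastic-package-stack-e2e-kibana-1",         # assumed by analogy
]

selected = select_by_fragment(names, ["logstash-", "elasticsearch-"])
print(selected)

# A strict prefix test would match nothing here, because every name starts
# with the "elastic-package-stack-e2e-" project prefix.
print([n for n in names if n.startswith(("logstash-", "elasticsearch-"))])
```

The substring test is what makes values such as `"logstash-"` work against the full compose-style names.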


def run_or_raise_error(commands: list, error_message):
result = subprocess.run(commands, universal_newlines=True, stdout=subprocess.PIPE)
@@ -175,7 +175,10 @@ void setupMock() {

try (RestClient restClient = rcb.build()) {
final SSLHandshakeException ex = assertThrows(SSLHandshakeException.class, () -> restClient.performRequest(new Request("GET", "/")));
assertThat(ex.getMessage(), stringContainsInOrder("fatal", "bad_certificate"));
assertThat(ex.getMessage(), allOf(
containsString("fatal"),
anyOf(containsString("bad_certificate"), containsString("certificate_required"))
Collaborator Author

On some hosts (mostly with recent builds), this test fails with the error below, which looks JVM-specific.
I can't reproduce it locally (my JVM always uses the built-in certificates), but according to RFC 8446 the unit test is not sending any certificate, so certificate_required looks like a valid outcome here.

logstash-1       |   Test testBasicConnectivityDisablingVerification() FAILED
logstash-1       |
logstash-1       |   java.lang.AssertionError:
logstash-1       |   Expected: a string containing "fatal", "bad_certificate" in order
logstash-1       |        but: was "(certificate_required) Received fatal alert: certificate_required"
logstash-1       |       at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
logstash-1       |       at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
logstash-1       |       at co.elastic.logstash.filters.elasticintegration.ElasticsearchRestClientWireMockTest$MutualHttps.testBasicConnectivityDisablingVerification(ElasticsearchRestClientWireMockTest.java:178)
logstash-1       |       at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
logstash-1       |       at java.base/java.lang.reflect.Method.invoke(Method.java:580)
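
The relaxed assertion accepts either alert name as long as the message also mentions "fatal". As a rough illustration of that logic only, written in Python rather than as the Hamcrest matcher itself, the hypothetical helper below mirrors the `allOf`/`anyOf` combination. Unlike the old `stringContainsInOrder` check, it does not care about the order of the substrings.

```python
def is_expected_handshake_failure(message):
    """True when the message has 'fatal' and one of the accepted alert names."""
    accepted_alerts = ("bad_certificate", "certificate_required")
    return "fatal" in message and any(
        alert in message for alert in accepted_alerts
    )


# The message from the failing run above now passes.
print(is_expected_handshake_failure(
    "(certificate_required) Received fatal alert: certificate_required"
))  # True
```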

));
}
}
