Add HEC validation mode to omhttp test server harness by rgerhards · Pull Request #132 · rgerhards/rsyslog

rgerhards · 2025-10-01T10:29:05Z

Summary

extend the omhttp test server with Splunk HEC validation, deterministic failure injection, and raw+summary storage for requests
update diag.sh and README to consume the new response layout and expose the hec batch format when collecting batches
add omhttp-profile-hec-splunk.sh and register it in the test suite to exercise accept and reject paths

Testing

python3 -m py_compile tests/omhttp_server.py
manual curl validation of HEC success/failure responses

https://chatgpt.com/codex/tasks/task_e_68dcffca38a883329a0b6bc1854e9e1d

Introduce Splunk HEC validation for the omhttp test server and expose deterministic retry knobs so tests can exercise retry logic. Update diag utilities and documentation for the new storage layout and add a profile script that asserts both acceptance and rejection paths. AI-Agent: Codex

gemini-code-assist · 2025-10-01T10:29:27Z

Summary of Changes

Hello @rgerhards, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the omhttp test server by integrating robust Splunk HEC (HTTP Event Collector) validation and testing mechanisms. It introduces the ability to validate HEC payloads, inject controlled failures for comprehensive error handling tests, and store detailed raw and summarized request data. These improvements provide a more powerful and flexible environment for testing omhttp's behavior with HEC, ensuring reliable data ingestion and error recovery.

Highlights

Splunk HEC Validation: The omhttp test server now includes robust Splunk HEC (HTTP Event Collector) validation capabilities, ensuring payloads conform to expected formats.
Deterministic Failure Injection: New options have been added to the omhttp test server to allow for deterministic failure injection during HEC validation, enabling comprehensive testing of retry and error handling mechanisms.
Enhanced Data Storage: The test server now stores both the raw request bodies and a parsed summary of the data, including extracted msgnum values and HEC-specific event details, providing richer information for test assertions.
diag.sh Update for HEC: The diag.sh script's omhttp_get_data function has been refactored to correctly parse the new structured data from the test server and now supports a dedicated hec batch format.
New HEC Test Case: A new test script, omhttp-profile-hec-splunk.sh, has been added to the test suite to specifically exercise and validate the HEC accept and reject paths of the omhttp test server.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request extends the omhttp test server with Splunk HEC validation capabilities, which is a great addition for testing. The changes are well-structured, including new test scripts and documentation.

My review focuses on improving maintainability and correctness:

In diag.sh, the embedded Python script for parsing is hard to maintain. I've suggested using a heredoc for better readability.
In omhttp_server.py, I've pointed out a couple of areas for improvement: using splitlines() for more robust line ending handling and making the HEC error response format compliant with the Splunk specification.

Overall, these are solid changes that enhance the test suite's capabilities.

gemini-code-assist · 2025-10-01T10:31:38Z

tests/diag.sh

+    python_parse='import json, sys\n'
+    python_parse+='fmt = sys.argv[1] if len(sys.argv) > 1 else ""\n'
+    python_parse+='data = json.load(sys.stdin)\n'
+    python_parse+='try:\n'
+    python_parse+='    string_types = (str, bytes)\n'
+    python_parse+='except NameError:\n'
+    python_parse+='    string_types = (str,)\n'
+    python_parse+='def ensure_list(val):\n'
+    python_parse+='    if isinstance(val, list):\n'
+    python_parse+='        return val\n'
+    python_parse+='    return [val]\n'
+    python_parse+='def extract_from_obj(obj, out):\n'
+    python_parse+='    if isinstance(obj, dict):\n'
+    python_parse+='        if "msgnum" in obj:\n'
+    python_parse+='            out.append(str(obj["msgnum"]))\n'
+    python_parse+='        if "value" in obj:\n'
+    python_parse+='            extract_from_obj(obj["value"], out)\n'
+    python_parse+='        if "streams" in obj and isinstance(obj["streams"], list):\n'
+    python_parse+='            for item in obj["streams"]:\n'
+    python_parse+='                extract_from_obj(item, out)\n'
+    python_parse+='        if "records" in obj and isinstance(obj["records"], list):\n'
+    python_parse+='            for item in obj["records"]:\n'
+    python_parse+='                extract_from_obj(item, out)\n'
+    python_parse+='        if "event" in obj:\n'
+    python_parse+='            event_value = obj["event"]\n'
+    python_parse+='            if isinstance(event_value, string_types):\n'
+    python_parse+='                try:\n'
+    python_parse+='                    event_value = json.loads(event_value)\n'
+    python_parse+='                except (TypeError, ValueError):\n'
+    python_parse+='                    event_value = None\n'
+    python_parse+='            extract_from_obj(event_value, out)\n'
+    python_parse+='    elif isinstance(obj, list):\n'
+    python_parse+='        for item in obj:\n'
+    python_parse+='            extract_from_obj(item, out)\n'
+    python_parse+='def extract_msgnums(raw):\n'
+    python_parse+='    nums = []\n'
+    python_parse+='    try:\n'
+    python_parse+='        parsed = json.loads(raw)\n'
+    python_parse+='    except (TypeError, ValueError):\n'
+    python_parse+='        for line in raw.splitlines():\n'
+    python_parse+='            line = line.strip()\n'
+    python_parse+='            if not line:\n'
+    python_parse+='                continue\n'
+    python_parse+='            try:\n'
+    python_parse+='                parsed_line = json.loads(line)\n'
+    python_parse+='            except (TypeError, ValueError):\n'
+    python_parse+='                continue\n'
+    python_parse+='            extract_from_obj(parsed_line, nums)\n'
+    python_parse+='    else:\n'
+    python_parse+='        extract_from_obj(parsed, nums)\n'
+    python_parse+='    return nums\n'
+    python_parse+='def parse_entry(entry):\n'
+    python_parse+='    if isinstance(entry, dict):\n'
+    python_parse+='        summary = entry.get("summary", {})\n'
+    python_parse+='        if isinstance(summary, dict):\n'
+    python_parse+='            msgnums = summary.get("msgnums")\n'
+    python_parse+='            if msgnums:\n'
+    python_parse+='                return [str(v) for v in ensure_list(msgnums)]\n'
+    python_parse+='            if fmt == "hec" and summary.get("hec_events"):\n'
+    python_parse+='                extracted = []\n'
+    python_parse+='                for ev in summary.get("hec_events") or []:\n'
+    python_parse+='                    extract_from_obj(ev, extracted)\n'
+    python_parse+='                if extracted:\n'
+    python_parse+='                    return extracted\n'
+    python_parse+='        raw = entry.get("raw", "")\n'
+    python_parse+='    else:\n'
+    python_parse+='        raw = entry\n'
+    python_parse+='    return [str(v) for v in extract_msgnums(raw)]\n'
+    python_parse+='out = []\n'
+    python_parse+='for item in data:\n'
+    python_parse+='    out.extend(parse_entry(item))\n'
+    python_parse+='print("\\n".join(out))\n'


The way the Python script is embedded in this shell script by concatenating strings makes it very difficult to read, maintain, and debug. For better readability and maintainability, I suggest using a shell heredoc to define the Python script block.

python_parse=$(cat <<'END_PYTHON' import json, sys fmt = sys.argv[1] if len(sys.argv) > 1 else "" data = json.load(sys.stdin) try: string_types = (str, bytes) except NameError: string_types = (str,) def ensure_list(val): if isinstance(val, list): return val return [val] def extract_from_obj(obj, out): if isinstance(obj, dict): if "msgnum" in obj: out.append(str(obj["msgnum"])) if "value" in obj: extract_from_obj(obj["value"], out) if "streams" in obj and isinstance(obj["streams"], list): for item in obj["streams"]: extract_from_obj(item, out) if "records" in obj and isinstance(obj["records"], list): for item in obj["records"]: extract_from_obj(item, out) if "event" in obj: event_value = obj["event"] if isinstance(event_value, string_types): try: event_value = json.loads(event_value) except (TypeError, ValueError): event_value = None extract_from_obj(event_value, out) elif isinstance(obj, list): for item in obj: extract_from_obj(item, out) def extract_msgnums(raw): nums = [] try: parsed = json.loads(raw) except (TypeError, ValueError): for line in raw.splitlines(): line = line.strip() if not line: continue try: parsed_line = json.loads(line) except (TypeError, ValueError): continue extract_from_obj(parsed_line, nums) else: extract_from_obj(parsed, nums) return nums def parse_entry(entry): if isinstance(entry, dict): summary = entry.get("summary", {}) if isinstance(summary, dict): msgnums = summary.get("msgnums") if msgnums: return [str(v) for v in ensure_list(msgnums)] if fmt == "hec" and summary.get("hec_events"): extracted = [] for ev in summary.get("hec_events") or []: extract_from_obj(ev, extracted) if extracted: return extracted raw = entry.get("raw", "") else: raw = entry return [str(v) for v in extract_msgnums(raw)] out = [] for item in data: out.extend(parse_entry(item)) print("\n".join(out)) END_PYTHON )

gemini-code-assist · 2025-10-01T10:31:38Z

tests/omhttp_server.py

+        lines = [line.strip() for line in raw_data.split('\n') if line.strip()]
+        for line in lines:
+            try:
+                decoded_entries.append(json.loads(line))
+            except (ValueError, TypeError, JSONDecodeError):
+                return False, 'invalid JSON payload for HEC validation', summary


Using raw_data.split('\n') might not handle all line endings correctly (e.g., \r\n). It's more robust to use splitlines(). Also, creating an intermediate list with a list comprehension is less memory-efficient for large payloads than iterating directly. I suggest iterating over the lines and processing them one by one.

Suggested change

lines = [line.strip() for line in raw_data.split('\n') if line.strip()]

for line in lines:

try:

decoded_entries.append(json.loads(line))

except (ValueError, TypeError, JSONDecodeError):

return False, 'invalid JSON payload for HEC validation', summary

for line in raw_data.splitlines():

line = line.strip()

if not line:

continue

try:

decoded_entries.append(json.loads(line))

except (ValueError, TypeError, JSONDecodeError):

return False, 'invalid JSON payload for HEC validation', summary

gemini-code-assist · 2025-10-01T10:31:38Z

tests/omhttp_server.py

+            code = 400
+        if code < 400 or code > 499:
+            code = 400
+        payload = json.dumps({'error': message}).encode('utf8')


The Splunk HEC specification for error responses indicates a JSON body with text and code fields, for example {"text": "Invalid data format", "code": 6}. The current implementation uses an error key, which is not compliant. To better simulate a real HEC endpoint, the response should be adjusted.

Suggested change

payload = json.dumps({'error': message}).encode('utf8')

payload = json.dumps({'text': message, 'code': 1}).encode('utf8')

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

@codex fix this CI failure
@codex address that feedback

chatgpt-codex-connector · 2025-10-01T10:35:30Z

tests/diag.sh

+    python_parse='import json, sys\n'
+    python_parse+='fmt = sys.argv[1] if len(sys.argv) > 1 else ""\n'
+    python_parse+='data = json.load(sys.stdin)\n'


Build Python snippet with literal \n breaks omhttp_get_data

The new omhttp_get_data builds the inline Python program via assignments such as python_parse='import json, sys\n'. Because these strings contain literal \n characters instead of real newlines, invoking $PYTHON -c "$python_parse" produces a syntax error (unexpected character after line continuation character) before any JSON is parsed. Every test that calls omhttp_get_data will now fail. Construct the script using actual newlines (e.g., a here‑document) or keep the previous semicolon style so the generated code is valid.

Useful? React with 👍 / 👎.

rgerhards added the codex label Oct 1, 2025 — with ChatGPT Codex Connector

gemini-code-assist bot reviewed Oct 1, 2025

View reviewed changes

chatgpt-codex-connector bot reviewed Oct 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add HEC validation mode to omhttp test server harness#132

Add HEC validation mode to omhttp test server harness#132
rgerhards wants to merge 1 commit intomainfrom
codex/extend-omhttp_server-with-hec-validation

rgerhards commented Oct 1, 2025

Uh oh!

gemini-code-assist bot commented Oct 1, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Oct 1, 2025

Uh oh!

gemini-code-assist bot Oct 1, 2025

Uh oh!

gemini-code-assist bot Oct 1, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Oct 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	payload = json.dumps({'error': message}).encode('utf8')
	payload = json.dumps({'text': message, 'code': 1}).encode('utf8')

Conversation

rgerhards commented Oct 1, 2025

Summary

Testing

Uh oh!

gemini-code-assist bot commented Oct 1, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 1, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Oct 1, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Oct 1, 2025

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Oct 1, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant