Skip to content

Add HEC validation mode to omhttp test server harness#132

Open
rgerhards wants to merge 1 commit intomainfrom
codex/extend-omhttp_server-with-hec-validation
Open

Add HEC validation mode to omhttp test server harness#132
rgerhards wants to merge 1 commit intomainfrom
codex/extend-omhttp_server-with-hec-validation

Conversation

@rgerhards
Copy link
Owner

Summary

  • extend the omhttp test server with Splunk HEC validation, deterministic failure injection, and raw+summary storage for requests
  • update diag.sh and README to consume the new response layout and expose the hec batch format when collecting batches
  • add omhttp-profile-hec-splunk.sh and register it in the test suite to exercise accept and reject paths

Testing

  • python3 -m py_compile tests/omhttp_server.py
  • manual curl validation of HEC success/failure responses

https://chatgpt.com/codex/tasks/task_e_68dcffca38a883329a0b6bc1854e9e1d

Introduce Splunk HEC validation for the omhttp test server and expose deterministic retry knobs so tests can exercise retry logic. Update diag utilities and documentation for the new storage layout and add a profile script that asserts both acceptance and rejection paths.

AI-Agent: Codex
@gemini-code-assist
Copy link

Summary of Changes

Hello @rgerhards, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the omhttp test server by integrating robust Splunk HEC (HTTP Event Collector) validation and testing mechanisms. It introduces the ability to validate HEC payloads, inject controlled failures for comprehensive error handling tests, and store detailed raw and summarized request data. These improvements provide a more powerful and flexible environment for testing omhttp's behavior with HEC, ensuring reliable data ingestion and error recovery.

Highlights

  • Splunk HEC Validation: The omhttp test server now includes robust Splunk HEC (HTTP Event Collector) validation capabilities, ensuring payloads conform to expected formats.
  • Deterministic Failure Injection: New options have been added to the omhttp test server to allow for deterministic failure injection during HEC validation, enabling comprehensive testing of retry and error handling mechanisms.
  • Enhanced Data Storage: The test server now stores both the raw request bodies and a parsed summary of the data, including extracted msgnum values and HEC-specific event details, providing richer information for test assertions.
  • diag.sh Update for HEC: The diag.sh script's omhttp_get_data function has been refactored to correctly parse the new structured data from the test server and now supports a dedicated hec batch format.
  • New HEC Test Case: A new test script, omhttp-profile-hec-splunk.sh, has been added to the test suite to specifically exercise and validate the HEC accept and reject paths of the omhttp test server.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request extends the omhttp test server with Splunk HEC validation capabilities, which is a great addition for testing. The changes are well-structured, including new test scripts and documentation.

My review focuses on improving maintainability and correctness:

  • In diag.sh, the embedded Python script for parsing is hard to maintain. I've suggested using a heredoc for better readability.
  • In omhttp_server.py, I've pointed out a couple of areas for improvement: using splitlines() for more robust line ending handling and making the HEC error response format compliant with the Splunk specification.

Overall, these are solid changes that enhance the test suite's capabilities.

Comment on lines +2851 to +2922
python_parse='import json, sys\n'
python_parse+='fmt = sys.argv[1] if len(sys.argv) > 1 else ""\n'
python_parse+='data = json.load(sys.stdin)\n'
python_parse+='try:\n'
python_parse+=' string_types = (str, bytes)\n'
python_parse+='except NameError:\n'
python_parse+=' string_types = (str,)\n'
python_parse+='def ensure_list(val):\n'
python_parse+=' if isinstance(val, list):\n'
python_parse+=' return val\n'
python_parse+=' return [val]\n'
python_parse+='def extract_from_obj(obj, out):\n'
python_parse+=' if isinstance(obj, dict):\n'
python_parse+=' if "msgnum" in obj:\n'
python_parse+=' out.append(str(obj["msgnum"]))\n'
python_parse+=' if "value" in obj:\n'
python_parse+=' extract_from_obj(obj["value"], out)\n'
python_parse+=' if "streams" in obj and isinstance(obj["streams"], list):\n'
python_parse+=' for item in obj["streams"]:\n'
python_parse+=' extract_from_obj(item, out)\n'
python_parse+=' if "records" in obj and isinstance(obj["records"], list):\n'
python_parse+=' for item in obj["records"]:\n'
python_parse+=' extract_from_obj(item, out)\n'
python_parse+=' if "event" in obj:\n'
python_parse+=' event_value = obj["event"]\n'
python_parse+=' if isinstance(event_value, string_types):\n'
python_parse+=' try:\n'
python_parse+=' event_value = json.loads(event_value)\n'
python_parse+=' except (TypeError, ValueError):\n'
python_parse+=' event_value = None\n'
python_parse+=' extract_from_obj(event_value, out)\n'
python_parse+=' elif isinstance(obj, list):\n'
python_parse+=' for item in obj:\n'
python_parse+=' extract_from_obj(item, out)\n'
python_parse+='def extract_msgnums(raw):\n'
python_parse+=' nums = []\n'
python_parse+=' try:\n'
python_parse+=' parsed = json.loads(raw)\n'
python_parse+=' except (TypeError, ValueError):\n'
python_parse+=' for line in raw.splitlines():\n'
python_parse+=' line = line.strip()\n'
python_parse+=' if not line:\n'
python_parse+=' continue\n'
python_parse+=' try:\n'
python_parse+=' parsed_line = json.loads(line)\n'
python_parse+=' except (TypeError, ValueError):\n'
python_parse+=' continue\n'
python_parse+=' extract_from_obj(parsed_line, nums)\n'
python_parse+=' else:\n'
python_parse+=' extract_from_obj(parsed, nums)\n'
python_parse+=' return nums\n'
python_parse+='def parse_entry(entry):\n'
python_parse+=' if isinstance(entry, dict):\n'
python_parse+=' summary = entry.get("summary", {})\n'
python_parse+=' if isinstance(summary, dict):\n'
python_parse+=' msgnums = summary.get("msgnums")\n'
python_parse+=' if msgnums:\n'
python_parse+=' return [str(v) for v in ensure_list(msgnums)]\n'
python_parse+=' if fmt == "hec" and summary.get("hec_events"):\n'
python_parse+=' extracted = []\n'
python_parse+=' for ev in summary.get("hec_events") or []:\n'
python_parse+=' extract_from_obj(ev, extracted)\n'
python_parse+=' if extracted:\n'
python_parse+=' return extracted\n'
python_parse+=' raw = entry.get("raw", "")\n'
python_parse+=' else:\n'
python_parse+=' raw = entry\n'
python_parse+=' return [str(v) for v in extract_msgnums(raw)]\n'
python_parse+='out = []\n'
python_parse+='for item in data:\n'
python_parse+=' out.extend(parse_entry(item))\n'
python_parse+='print("\\n".join(out))\n'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The way the Python script is embedded in this shell script by concatenating strings makes it very difficult to read, maintain, and debug. For better readability and maintainability, I suggest using a shell heredoc to define the Python script block.

    python_parse=$(cat <<'END_PYTHON'
import json, sys
fmt = sys.argv[1] if len(sys.argv) > 1 else ""
data = json.load(sys.stdin)
try:
    string_types = (str, bytes)
except NameError:
    string_types = (str,)
def ensure_list(val):
    if isinstance(val, list):
        return val
    return [val]
def extract_from_obj(obj, out):
    if isinstance(obj, dict):
        if "msgnum" in obj:
            out.append(str(obj["msgnum"]))
        if "value" in obj:
            extract_from_obj(obj["value"], out)
        if "streams" in obj and isinstance(obj["streams"], list):
            for item in obj["streams"]:
                extract_from_obj(item, out)
        if "records" in obj and isinstance(obj["records"], list):
            for item in obj["records"]:
                extract_from_obj(item, out)
        if "event" in obj:
            event_value = obj["event"]
            if isinstance(event_value, string_types):
                try:
                    event_value = json.loads(event_value)
                except (TypeError, ValueError):
                    event_value = None
            extract_from_obj(event_value, out)
    elif isinstance(obj, list):
        for item in obj:
            extract_from_obj(item, out)
def extract_msgnums(raw):
    nums = []
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        for line in raw.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                parsed_line = json.loads(line)
            except (TypeError, ValueError):
                continue
            extract_from_obj(parsed_line, nums)
    else:
        extract_from_obj(parsed, nums)
    return nums
def parse_entry(entry):
    if isinstance(entry, dict):
        summary = entry.get("summary", {})
        if isinstance(summary, dict):
            msgnums = summary.get("msgnums")
            if msgnums:
                return [str(v) for v in ensure_list(msgnums)]
            if fmt == "hec" and summary.get("hec_events"):
                extracted = []
                for ev in summary.get("hec_events") or []:
                    extract_from_obj(ev, extracted)
                if extracted:
                    return extracted
        raw = entry.get("raw", "")
    else:
        raw = entry
    return [str(v) for v in extract_msgnums(raw)]
out = []
for item in data:
    out.extend(parse_entry(item))
print("\n".join(out))
END_PYTHON
)

Comment on lines +107 to +112
lines = [line.strip() for line in raw_data.split('\n') if line.strip()]
for line in lines:
try:
decoded_entries.append(json.loads(line))
except (ValueError, TypeError, JSONDecodeError):
return False, 'invalid JSON payload for HEC validation', summary

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using raw_data.split('\n') might not handle all line endings correctly (e.g., \r\n). It's more robust to use splitlines(). Also, creating an intermediate list with a list comprehension is less memory-efficient for large payloads than iterating directly. I suggest iterating over the lines and processing them one by one.

Suggested change
lines = [line.strip() for line in raw_data.split('\n') if line.strip()]
for line in lines:
try:
decoded_entries.append(json.loads(line))
except (ValueError, TypeError, JSONDecodeError):
return False, 'invalid JSON payload for HEC validation', summary
for line in raw_data.splitlines():
line = line.strip()
if not line:
continue
try:
decoded_entries.append(json.loads(line))
except (ValueError, TypeError, JSONDecodeError):
return False, 'invalid JSON payload for HEC validation', summary

code = 400
if code < 400 or code > 499:
code = 400
payload = json.dumps({'error': message}).encode('utf8')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Splunk HEC specification for error responses indicates a JSON body with text and code fields, for example {"text": "Invalid data format", "code": 6}. The current implementation uses an error key, which is not compliant. To better simulate a real HEC endpoint, the response should be adjusted.

Suggested change
payload = json.dumps({'error': message}).encode('utf8')
payload = json.dumps({'text': message, 'code': 1}).encode('utf8')

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

Comment on lines +2851 to +2853
python_parse='import json, sys\n'
python_parse+='fmt = sys.argv[1] if len(sys.argv) > 1 else ""\n'
python_parse+='data = json.load(sys.stdin)\n'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Build Python snippet with literal \n breaks omhttp_get_data

The new omhttp_get_data builds the inline Python program via assignments such as python_parse='import json, sys\n'. Because these strings contain literal \n characters instead of real newlines, invoking $PYTHON -c "$python_parse" produces a syntax error (unexpected character after line continuation character) before any JSON is parsed. Every test that calls omhttp_get_data will now fail. Construct the script using actual newlines (e.g., a here‑document) or keep the previous semicolon style so the generated code is valid.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant