Add HEC validation mode to omhttp test server harness#132
Add HEC validation mode to omhttp test server harness#132
Conversation
Introduce Splunk HEC validation for the omhttp test server and expose deterministic retry knobs so tests can exercise retry logic. Update diag utilities and documentation for the new storage layout and add a profile script that asserts both acceptance and rejection paths. AI-Agent: Codex
Summary of ChangesHello @rgerhards, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request extends the omhttp test server with Splunk HEC validation capabilities, which is a great addition for testing. The changes are well-structured, including new test scripts and documentation.
My review focuses on improving maintainability and correctness:
- In
diag.sh, the embedded Python script for parsing is hard to maintain. I've suggested using a heredoc for better readability. - In
omhttp_server.py, I've pointed out a couple of areas for improvement: usingsplitlines()for more robust line ending handling and making the HEC error response format compliant with the Splunk specification.
Overall, these are solid changes that enhance the test suite's capabilities.
| python_parse='import json, sys\n' | ||
| python_parse+='fmt = sys.argv[1] if len(sys.argv) > 1 else ""\n' | ||
| python_parse+='data = json.load(sys.stdin)\n' | ||
| python_parse+='try:\n' | ||
| python_parse+=' string_types = (str, bytes)\n' | ||
| python_parse+='except NameError:\n' | ||
| python_parse+=' string_types = (str,)\n' | ||
| python_parse+='def ensure_list(val):\n' | ||
| python_parse+=' if isinstance(val, list):\n' | ||
| python_parse+=' return val\n' | ||
| python_parse+=' return [val]\n' | ||
| python_parse+='def extract_from_obj(obj, out):\n' | ||
| python_parse+=' if isinstance(obj, dict):\n' | ||
| python_parse+=' if "msgnum" in obj:\n' | ||
| python_parse+=' out.append(str(obj["msgnum"]))\n' | ||
| python_parse+=' if "value" in obj:\n' | ||
| python_parse+=' extract_from_obj(obj["value"], out)\n' | ||
| python_parse+=' if "streams" in obj and isinstance(obj["streams"], list):\n' | ||
| python_parse+=' for item in obj["streams"]:\n' | ||
| python_parse+=' extract_from_obj(item, out)\n' | ||
| python_parse+=' if "records" in obj and isinstance(obj["records"], list):\n' | ||
| python_parse+=' for item in obj["records"]:\n' | ||
| python_parse+=' extract_from_obj(item, out)\n' | ||
| python_parse+=' if "event" in obj:\n' | ||
| python_parse+=' event_value = obj["event"]\n' | ||
| python_parse+=' if isinstance(event_value, string_types):\n' | ||
| python_parse+=' try:\n' | ||
| python_parse+=' event_value = json.loads(event_value)\n' | ||
| python_parse+=' except (TypeError, ValueError):\n' | ||
| python_parse+=' event_value = None\n' | ||
| python_parse+=' extract_from_obj(event_value, out)\n' | ||
| python_parse+=' elif isinstance(obj, list):\n' | ||
| python_parse+=' for item in obj:\n' | ||
| python_parse+=' extract_from_obj(item, out)\n' | ||
| python_parse+='def extract_msgnums(raw):\n' | ||
| python_parse+=' nums = []\n' | ||
| python_parse+=' try:\n' | ||
| python_parse+=' parsed = json.loads(raw)\n' | ||
| python_parse+=' except (TypeError, ValueError):\n' | ||
| python_parse+=' for line in raw.splitlines():\n' | ||
| python_parse+=' line = line.strip()\n' | ||
| python_parse+=' if not line:\n' | ||
| python_parse+=' continue\n' | ||
| python_parse+=' try:\n' | ||
| python_parse+=' parsed_line = json.loads(line)\n' | ||
| python_parse+=' except (TypeError, ValueError):\n' | ||
| python_parse+=' continue\n' | ||
| python_parse+=' extract_from_obj(parsed_line, nums)\n' | ||
| python_parse+=' else:\n' | ||
| python_parse+=' extract_from_obj(parsed, nums)\n' | ||
| python_parse+=' return nums\n' | ||
| python_parse+='def parse_entry(entry):\n' | ||
| python_parse+=' if isinstance(entry, dict):\n' | ||
| python_parse+=' summary = entry.get("summary", {})\n' | ||
| python_parse+=' if isinstance(summary, dict):\n' | ||
| python_parse+=' msgnums = summary.get("msgnums")\n' | ||
| python_parse+=' if msgnums:\n' | ||
| python_parse+=' return [str(v) for v in ensure_list(msgnums)]\n' | ||
| python_parse+=' if fmt == "hec" and summary.get("hec_events"):\n' | ||
| python_parse+=' extracted = []\n' | ||
| python_parse+=' for ev in summary.get("hec_events") or []:\n' | ||
| python_parse+=' extract_from_obj(ev, extracted)\n' | ||
| python_parse+=' if extracted:\n' | ||
| python_parse+=' return extracted\n' | ||
| python_parse+=' raw = entry.get("raw", "")\n' | ||
| python_parse+=' else:\n' | ||
| python_parse+=' raw = entry\n' | ||
| python_parse+=' return [str(v) for v in extract_msgnums(raw)]\n' | ||
| python_parse+='out = []\n' | ||
| python_parse+='for item in data:\n' | ||
| python_parse+=' out.extend(parse_entry(item))\n' | ||
| python_parse+='print("\\n".join(out))\n' |
There was a problem hiding this comment.
The way the Python script is embedded in this shell script by concatenating strings makes it very difficult to read, maintain, and debug. For better readability and maintainability, I suggest using a shell heredoc to define the Python script block.
python_parse=$(cat <<'END_PYTHON'
import json, sys
fmt = sys.argv[1] if len(sys.argv) > 1 else ""
data = json.load(sys.stdin)
try:
string_types = (str, bytes)
except NameError:
string_types = (str,)
def ensure_list(val):
if isinstance(val, list):
return val
return [val]
def extract_from_obj(obj, out):
if isinstance(obj, dict):
if "msgnum" in obj:
out.append(str(obj["msgnum"]))
if "value" in obj:
extract_from_obj(obj["value"], out)
if "streams" in obj and isinstance(obj["streams"], list):
for item in obj["streams"]:
extract_from_obj(item, out)
if "records" in obj and isinstance(obj["records"], list):
for item in obj["records"]:
extract_from_obj(item, out)
if "event" in obj:
event_value = obj["event"]
if isinstance(event_value, string_types):
try:
event_value = json.loads(event_value)
except (TypeError, ValueError):
event_value = None
extract_from_obj(event_value, out)
elif isinstance(obj, list):
for item in obj:
extract_from_obj(item, out)
def extract_msgnums(raw):
nums = []
try:
parsed = json.loads(raw)
except (TypeError, ValueError):
for line in raw.splitlines():
line = line.strip()
if not line:
continue
try:
parsed_line = json.loads(line)
except (TypeError, ValueError):
continue
extract_from_obj(parsed_line, nums)
else:
extract_from_obj(parsed, nums)
return nums
def parse_entry(entry):
if isinstance(entry, dict):
summary = entry.get("summary", {})
if isinstance(summary, dict):
msgnums = summary.get("msgnums")
if msgnums:
return [str(v) for v in ensure_list(msgnums)]
if fmt == "hec" and summary.get("hec_events"):
extracted = []
for ev in summary.get("hec_events") or []:
extract_from_obj(ev, extracted)
if extracted:
return extracted
raw = entry.get("raw", "")
else:
raw = entry
return [str(v) for v in extract_msgnums(raw)]
out = []
for item in data:
out.extend(parse_entry(item))
print("\n".join(out))
END_PYTHON
)| lines = [line.strip() for line in raw_data.split('\n') if line.strip()] | ||
| for line in lines: | ||
| try: | ||
| decoded_entries.append(json.loads(line)) | ||
| except (ValueError, TypeError, JSONDecodeError): | ||
| return False, 'invalid JSON payload for HEC validation', summary |
There was a problem hiding this comment.
Using raw_data.split('\n') might not handle all line endings correctly (e.g., \r\n). It's more robust to use splitlines(). Also, creating an intermediate list with a list comprehension is less memory-efficient for large payloads than iterating directly. I suggest iterating over the lines and processing them one by one.
| lines = [line.strip() for line in raw_data.split('\n') if line.strip()] | |
| for line in lines: | |
| try: | |
| decoded_entries.append(json.loads(line)) | |
| except (ValueError, TypeError, JSONDecodeError): | |
| return False, 'invalid JSON payload for HEC validation', summary | |
| for line in raw_data.splitlines(): | |
| line = line.strip() | |
| if not line: | |
| continue | |
| try: | |
| decoded_entries.append(json.loads(line)) | |
| except (ValueError, TypeError, JSONDecodeError): | |
| return False, 'invalid JSON payload for HEC validation', summary |
| code = 400 | ||
| if code < 400 or code > 499: | ||
| code = 400 | ||
| payload = json.dumps({'error': message}).encode('utf8') |
There was a problem hiding this comment.
The Splunk HEC specification for error responses indicates a JSON body with text and code fields, for example {"text": "Invalid data format", "code": 6}. The current implementation uses an error key, which is not compliant. To better simulate a real HEC endpoint, the response should be adjusted.
| payload = json.dumps({'error': message}).encode('utf8') | |
| payload = json.dumps({'text': message, 'code': 1}).encode('utf8') |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
| python_parse='import json, sys\n' | ||
| python_parse+='fmt = sys.argv[1] if len(sys.argv) > 1 else ""\n' | ||
| python_parse+='data = json.load(sys.stdin)\n' |
There was a problem hiding this comment.
Build Python snippet with literal \n breaks omhttp_get_data
The new omhttp_get_data builds the inline Python program via assignments such as python_parse='import json, sys\n'. Because these strings contain literal \n characters instead of real newlines, invoking $PYTHON -c "$python_parse" produces a syntax error (unexpected character after line continuation character) before any JSON is parsed. Every test that calls omhttp_get_data will now fail. Construct the script using actual newlines (e.g., a here‑document) or keep the previous semicolon style so the generated code is valid.
Useful? React with 👍 / 👎.
Summary
hecbatch format when collecting batchesTesting
https://chatgpt.com/codex/tasks/task_e_68dcffca38a883329a0b6bc1854e9e1d