Commit 36ab0e7

ekneg54, Pablu23, kaya-david, and mhoff authored
fix spliting elements with one item (#924)
* fix spliting elements with one item
* put changes to ng
* add multiple empty use case
* Add remove_whitespace flag, fix tests, use pytest.param
* Rename, fix documentation test_cases maybe, add check for non Empty instead of only space
* Try and make template account for pytest.param
* Change from item.id is defined, to item.values
* Add pydoc, remove lsp comments, fix pydoc
* release: 18.0.1 (#925)
* Update changelog.md
* Update pre commit and black dependency
* Reformat with black
* refactor: switch to uv with lockfile, decouple PyPI and optimize Docker builds (#914)

Documentation
* doc: update README.md
* doc: update installation docs
* doc: update CHANGELOG.md

Build & Dependencies
* build: replace pip with uv pip
* build: add uv.lock
* chore: cap major versions for production dependencies
* fix: CVE-2025-69223

Docker
* feat: adapt Dockerfile for uv-based installation
* feat: optimize Docker build using uv caching
* feat: simplify Dockerfile
* refactor: small Dockerfile improvements
* fix: remove LOGPREP_VERSION build argument

CI / CD
* feat: adapt CI/CD workflow files
* feat: update actions/checkout to v6

* build: add CI job to verify uv.lock consistency (#927)
* build: add CI job to verify uv.lock consistency
* feat: make mapping of Generic Resolver yaml compliant (#928)
* Add list[tuple[str]] as valid resolve_list type, change test to account for tuple, COMMITED WITH NO-VERIFY
* Add converter func
* Apparently resolve_list is a mapping from str to dict
* Add test to test converter
* update changelog
* Add typevar and rename func
* Rename INPUT_TYPE and add KEY_TYPE, extract to own utils file, with own utils test, add wrapper function and keep merge_dicts relativly clean
* Remove assert isinstance for now, rework converters, update tests
* Add FieldValue as Dict value
* pull up path
* Update tests and add doc
* Update logprep/processor/generic_resolver/rule.py
  Co-authored-by: Michael Hoff <mail@michael-hoff.net>
* fix review comments
* Convert from file load, fix tests, fix Key and Value vartypes
* Upgrade uv.lock for protobuf cve fix
* Update logprep/processor/generic_resolver/rule.py
  Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
* Rename test id, add pydoc to converters
* Split long comment line into multiple
* Change test to expect ValueError not InvalidConfigurationError

---------

Co-authored-by: Michael Hoff <mail@michael-hoff.net>
Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>

* feat: add service account to chart (#931)
* Add service account to chart, no-verify because yamlfmt doesnt like go templating?
* update Changelog
* Make serviceAccount disable able
* Remove spec: as it is not spec
* Update tests, enable service accounts by default
* Remove enabled as it is redundant
* remove unused chart.yaml.j2
* Update charts/logprep/templates/deployment.yaml
  Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
* Update charts/logprep/templates/service-account.yaml
  Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
* Update CHANGELOG.md
  Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
* Update charts/logprep/values.yaml
  Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
* Update default values and tests
* Remove if enabled
* Remove chart.lock

---------

Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>

* Update CHANGELOG
* Add pytest.param improvement, add more rules, enhance pydoc
* Update logprep/processor/string_splitter/rule.py
  Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
* Fix closing qoutes

---------

Co-authored-by: Pablu23 <me@pablu.de>
Co-authored-by: Pablu23 <43807157+Pablu23@users.noreply.github.com>
Co-authored-by: kaya-david <david.kaya@bwi.de>
Co-authored-by: kaya-david <205917832+kaya-david@users.noreply.github.com>
Co-authored-by: Michael Hoff <mail@michael-hoff.net>
Co-authored-by: Michael Hoff <9436725+mhoff@users.noreply.github.com>
1 parent 738d7dc commit 36ab0e7

7 files changed: +347 -65 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -4,10 +4,12 @@
 ### Features
 * add uv as dependency management, including uv.lock
 * allow configuration (and auto-creation) of service accounts in helm chart
+* add new drop_empty flag to allow the `string_splitter` to drop resulting fields that would be empty (e.g. whitespace)
 * generic_resolver now handles all FieldValue types (including None)
 
 ### Improvements
 * simplify Dockerfile and remove docker build support for `LOGPREP_VERSION`
+* pytest.param now works with test_cases document generation
 
 ### Bugfix
 * generic_resolver now follows yaml standard and accepts a list instead of relying on the ordering of a dict

Lines changed: 11 additions & 0 deletions
@@ -1,10 +1,21 @@
 .. -*- mode: rst -*-
 {% for item in data.test_cases %}
 
+{% if item.values %}
+{# handles pytest.param #}
+*{{ item.id }}:*
+
+- rule: :code:`{{ item.values[0] }}`
+- message: :code:`{{ item.values[1] }}`
+- processed: :code:`{{ item.values[2] }}`
+
+{% else %}
+{# handles legacy tuple style #}
 *{{ item[0] }}:*
 
 - rule: :code:`{{ item[1] }}`
 - message: :code:`{{ item[2] }}`
 - processed: :code:`{{ item[3] }}`
+{% endif %}
 
 {% endfor %}
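
The branching in this template can be sketched in plain Python. This is only an illustration under stated assumptions: `ParamStandIn` is a hypothetical stand-in for the objects created by `pytest.param` (assumed, as the template does, to expose `values` and `id`), and `describe_case` is a made-up helper name, not part of logprep.

```python
from typing import Any, NamedTuple, Optional, Sequence


class ParamStandIn(NamedTuple):
    """Hypothetical stand-in for an entry created with pytest.param."""

    values: Sequence[Any]
    marks: Sequence[Any] = ()
    id: Optional[str] = None


def describe_case(item: Any) -> dict:
    """Mirror the template: prefer `values`/`id`, fall back to tuple indexing."""
    values = getattr(item, "values", None)
    if values:
        # pytest.param style: (rule, message, processed) plus a separate id
        return {
            "title": item.id,
            "rule": values[0],
            "message": values[1],
            "processed": values[2],
        }
    # legacy tuple style: (title, rule, message, processed)
    return {
        "title": item[0],
        "rule": item[1],
        "message": item[2],
        "processed": item[3],
    }


legacy_case = ("splits with delimiter", {"filter": "message"}, {"message": "a, b"}, ["a", "b"])
param_case = ParamStandIn(
    values=({"filter": "message"}, {"message": "a, b"}, ["a", "b"]),
    id="splits_with_delimiter",
)
```

A plain tuple has no `values` attribute, so the lookup falls through to the legacy branch, which is the behaviour the `{% if item.values %}` check relies on.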

logprep/ng/processor/string_splitter/processor.py

Lines changed: 15 additions & 4 deletions
@@ -24,8 +24,13 @@
 .. automodule:: logprep.processor.string_splitter.rule
 """
 
+import typing
+
+from typing_extensions import override
+
 from logprep.ng.processor.field_manager.processor import FieldManager
 from logprep.processor.base.exceptions import ProcessingWarning
+from logprep.processor.base.rule import Rule
 from logprep.processor.string_splitter.rule import StringSplitterRule
 from logprep.util.helper import get_dotted_field_value
 
@@ -35,11 +40,17 @@ class StringSplitter(FieldManager):
 
     rule_class = StringSplitterRule
 
-    def _apply_rules(self, event: dict, rule: StringSplitterRule) -> None:
-        source_field = rule.source_fields[0]
+    @override
+    def _apply_rules(self, event: dict, rule: Rule) -> None:
+        _rule = typing.cast(StringSplitterRule, rule)
+
+        source_field = _rule.source_fields[0]
         source_field_content = get_dotted_field_value(event, source_field)
-        self._handle_missing_fields(event, rule, rule.source_fields, [source_field_content])
+        self._handle_missing_fields(event, _rule, _rule.source_fields, [source_field_content])
         if not isinstance(source_field_content, str):
             raise ProcessingWarning(f"source_field '{source_field}' is not a string", rule, event)
-        result = source_field_content.split(rule.delimiter)
+        result = source_field_content.split(_rule.delimiter)
+
+        if _rule.drop_empty:
+            result = [item for item in result if item != "" and not item.isspace()]
         self._write_target_field(event, rule, result)

logprep/processor/string_splitter/processor.py

Lines changed: 3 additions & 0 deletions
@@ -42,4 +42,7 @@ def _apply_rules(self, event: dict, rule: StringSplitterRule):
         if not isinstance(source_field_content, str):
             raise ProcessingWarning(f"source_field '{source_field}' is not a string", rule, event)
         result = source_field_content.split(rule.delimiter)
+
+        if rule.drop_empty:
+            result = [item for item in result if item != "" and not item.isspace()]
         self._write_target_field(event, rule, result)
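
The filtering step added to both processors can be tried in isolation. The following is a minimal sketch, assuming a free-standing helper named `split_and_filter` (a hypothetical name that does not exist in logprep); it reuses the same list comprehension as the diff.

```python
def split_and_filter(value: str, delimiter: str = " ", drop_empty: bool = False) -> list[str]:
    """Split `value` on `delimiter`, optionally dropping empty or whitespace-only items."""
    parts = value.split(delimiter)
    if drop_empty:
        # "".isspace() is False, so the empty string needs its own check
        parts = [item for item in parts if item != "" and not item.isspace()]
    return parts


kept = split_and_filter(",,this,,", delimiter=",")
dropped = split_and_filter(",,this,,", delimiter=",", drop_empty=True)
```

Items are dropped only when they are entirely empty or whitespace. Surrounding whitespace on non-empty items is kept, which matches the test expectation `[" this "]` further down.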

logprep/processor/string_splitter/rule.py

Lines changed: 20 additions & 3 deletions
@@ -41,6 +41,8 @@
 
 """
 
+import typing
+
 from attrs import define, field, validators
 
 from logprep.processor.field_manager.rule import FieldManagerRule
@@ -60,13 +62,28 @@ class Config(FieldManagerRule.Config):
                 validators.min_len(1),
                 validators.max_len(1),
             ],
+            default=[],
         )
         delimiter: str = field(validator=validators.instance_of(str), default=" ")
         """The delimiter for splitting. Defaults to whitespace"""
-        mapping: dict = field(default="", init=False, repr=False, eq=False)
+        mapping: dict = field(default={}, init=False, repr=False, eq=False)
         ignore_missing_fields: bool = field(default=False, init=False, repr=False, eq=False)
+        drop_empty: bool = field(default=False)
+        """If empty list values (as a result of the splitting operation) should be dropped or kept.
+        By this definition, the empty string (no characters) and strings containing only `whitespace <https://docs.python.org/3/library/stdtypes.html#str.isspace>`_ count as 'empty'.
+        The default setting is to keep empty list values."""
+
+    @property
+    def config(self) -> Config:
+        """returns the config as typed StringSplitterRule.Config"""
+        return typing.cast(StringSplitterRule.Config, self._config)
 
     @property
-    def delimiter(self):
+    def delimiter(self) -> str:
         """returns the configured delimiter"""
-        return self._config.delimiter
+        return self.config.delimiter
+
+    @property
+    def drop_empty(self) -> bool:
+        """returns the configured drop_empty flag"""
+        return self.config.drop_empty
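
The typed `config` property follows a common narrowing pattern: the base class stores a loosely typed config, and the subclass re-exposes it through `typing.cast` so that the accessors get precise return types. The sketch below illustrates the pattern with standard-library dataclasses instead of attrs. All class names are hypothetical stand-ins, not logprep's classes.

```python
import typing
from dataclasses import dataclass


@dataclass
class BaseConfig:
    """Hypothetical base config, standing in for FieldManagerRule.Config."""

    source_fields: list


@dataclass
class SplitterConfig(BaseConfig):
    """Hypothetical config carrying the two splitter options from the diff."""

    delimiter: str = " "
    drop_empty: bool = False


class BaseRule:
    def __init__(self, config: BaseConfig) -> None:
        self._config = config


class SplitterRule(BaseRule):
    @property
    def config(self) -> SplitterConfig:
        # cast() does nothing at runtime; it only tells the type checker
        # which concrete config type this rule is known to hold
        return typing.cast(SplitterConfig, self._config)

    @property
    def delimiter(self) -> str:
        return self.config.delimiter

    @property
    def drop_empty(self) -> bool:
        return self.config.drop_empty


rule = SplitterRule(SplitterConfig(source_fields=["message"], delimiter=",", drop_empty=True))
```

Because `cast` performs no runtime check, the pattern is only safe when the rule is always constructed with the matching config type.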

tests/unit/ng/processor/string_splitter/test_string_splitter.py

Lines changed: 147 additions & 29 deletions
@@ -13,52 +13,146 @@
 from tests.unit.ng.processor.base import BaseProcessorTestCase
 
 test_cases = [
-    (
-        "splits without delimeter on whitespace",
+    pytest.param(
         {
             "filter": "message",
-            "string_splitter": {"source_fields": ["message"], "target_field": "result"},
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "drop_empty": True,
+            },
         },
         {"message": "this is the message"},
-        {"message": "this is the message", "result": ["this", "is", "the", "message"]},
+        ["this", "is", "the", "message"],
+        id="splits_without_explicit_set_delimiter_on_whitespace",
     ),
-    (
-        "splits with delimeter",
+    pytest.param(
         {
             "filter": "message",
             "string_splitter": {
                 "source_fields": ["message"],
                 "target_field": "result",
                 "delimiter": ", ",
+                "drop_empty": True,
             },
         },
         {"message": "this, is, the, message"},
-        {"message": "this, is, the, message", "result": ["this", "is", "the", "message"]},
+        ["this", "is", "the", "message"],
+        id="splits_with_delimiter",
     ),
-] # testcase, rule, event, expected
-
-failure_test_cases = [
-    (
-        "splits without delimeter on whitespace",
+    pytest.param(
         {
             "filter": "message",
-            "string_splitter": {"source_fields": ["message"], "target_field": "result"},
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
         },
-        {"message": ["this", "is", "the", "message"]},
-        {"message": ["this", "is", "the", "message"], "tags": ["_string_splitter_failure"]},
-        ".*ProcessingWarning.*",
+        {"message": "this,"},
+        ["this"],
+        id="splits_one_item_with_delimiter",
     ),
-    (
-        "splits without delimeter on whitespace",
+    pytest.param(
         {
             "filter": "message",
-            "string_splitter": {"source_fields": ["message"], "target_field": "message"},
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
         },
-        {"message": "this is the message"},
-        {"message": "this is the message", "tags": ["_string_splitter_failure"]},
-        ".*FieldExistsWarning.*",
+        {"message": ",,this,,"},
+        ["this"],
+        id="splits_one_item_with_multiple_delimiter_and_drop_empty",
+    ),
+    pytest.param(
+        {
+            "filter": "message",
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": False,
+            },
+        },
+        {"message": ",,this,,"},
+        ["", "", "this", "", ""],
+        id="splits_one_item_with_multiple_delimiter_and_no_drop_empty",
+    ),
+    pytest.param(
+        {
+            "filter": "message",
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
+        },
+        {"message": " , ,this, ,"},
+        ["this"],
+        id="splits_one_item_with_multiple_delimiter_and_empty_fields",
+    ),
+    pytest.param(
+        {
+            "filter": "message",
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
+        },
+        {"message": ",, this , , "},
+        [" this "],
+        id="splits_one_item_with_multiple_delimiter_and_whitespace",
+    ),
+    pytest.param(
+        {
+            "filter": "message",
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
+        },
+        {"message": "\n,,this,\t, "},
+        ["this"],
+        id="splits_one_item_with_multiple_delimiter_and_newline",
+    ),
+    pytest.param(
+        {
+            "filter": "message",
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
+        },
+        {"message": ",, this, , "},
+        [" this"],
+        id="splits_one_item_with_multiple_delimiter_and_whitespace_only_in_front",
+    ),
+    pytest.param(
+        {
+            "filter": "message",
+            "string_splitter": {
+                "source_fields": ["message"],
+                "target_field": "result",
+                "delimiter": ",",
+                "drop_empty": True,
+            },
+        },
+        {"message": "hello , world,this, is a very complex,\n , and even multiline, text,,, "},
+        ["hello ", " world", "this", " is a very complex", " and even multiline", " text"],
+        id="splits_one_item_with_multiple_delimiter_and_whitespace_only_in_front",
     ),
-] # testcase, rule, event, expected, error_message
+]
 
 
 class TestStringSplitter(BaseProcessorTestCase):
@@ -67,18 +161,42 @@ class TestStringSplitter(BaseProcessorTestCase):
         "rules": ["tests/testdata/unit/string_splitter/rules"],
     }
 
-    @pytest.mark.parametrize("testcase, rule, event, expected", test_cases)
-    def test_testcases(self, testcase, rule, event, expected): # pylint: disable=unused-argument
+    @pytest.mark.parametrize(["rule", "event", "expected"], test_cases)
+    def test_testcases(self, rule, event, expected):
         self._load_rule(rule)
         event = LogEvent(event, original=b"")
         self.object.process(event)
-        assert event.data == expected, testcase
+        assert event.data["result"] == expected
 
-    @pytest.mark.parametrize("testcase, rule, event, expected, error_message", failure_test_cases)
-    def test_testcases_failure_handling(self, testcase, rule, event, expected, error_message):
+    @pytest.mark.parametrize(
+        ["rule", "event", "expected", "error_message"],
+        [
+            pytest.param(
+                {
+                    "filter": "message",
+                    "string_splitter": {"source_fields": ["message"], "target_field": "result"},
+                },
+                {"message": ["this", "is", "the", "message"]},
+                {"message": ["this", "is", "the", "message"], "tags": ["_string_splitter_failure"]},
+                ".*ProcessingWarning.*",
+                id="splits_without_delimiter_on_whitespace_with_no_string",
+            ),
+            pytest.param(
+                {
+                    "filter": "message",
+                    "string_splitter": {"source_fields": ["message"], "target_field": "message"},
+                },
+                {"message": "this is the message"},
+                {"message": "this is the message", "tags": ["_string_splitter_failure"]},
+                ".*FieldExistsWarning.*",
+                id="splits_without_delimiter_on_whitespace_with_existing_field",
+            ),
+        ],
+    )
+    def test_testcases_failure_handling(self, rule, event, expected, error_message):
         self._load_rule(rule)
         event = LogEvent(event, original=b"")
         result = self.object.process(event)
         assert len(result.warnings) == 1
         assert re.match(error_message, str(result.warnings[0]))
-        assert event.data == expected, testcase
+        assert event.data == expected
