From 3012ae902657d6bea1dc3530c9a534eb158569c2 Mon Sep 17 00:00:00 2001
From: Karen Metts <35154725+karenzone@users.noreply.github.com>
Date: Fri, 6 Jun 2025 19:57:27 -0400
Subject: [PATCH 1/2] Doc: Doc improvements and version bump

---
 CHANGELOG.md                 |  7 +++--
 docs/index.asciidoc          | 59 +++++++++++++++++++-----------------
 logstash-filter-grok.gemspec |  2 +-
 3 files changed, 37 insertions(+), 31 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e6f29fd..9ce1908 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,6 @@
+## 4.4.4
+  - [DOC] Minor doc fixes and version bump to pick up changes in [#186](https://github.com/logstash-plugins/logstash-filter-grok/pull/186) [#187](https://github.com/logstash-plugins/logstash-filter-grok/pull/187)
+
 ## 4.4.3
   - Minor typos in docs examples [#176](https://github.com/logstash-plugins/logstash-filter-grok/pull/176)
 
@@ -9,7 +12,7 @@
 ## 4.4.0
   - Feat: ECS compatibility support [#162](https://github.com/logstash-plugins/logstash-filter-grok/pull/162)
-    
+
     The filter supports using built-in pattern definitions that are fully Elastic Common Schema (ECS) compliant.
 
 ## 4.3.0
@@ -30,7 +33,7 @@
 
 ## 4.0.3
   - Fixed memory leak when run on JRuby 1.x (Logstash 5.x) [#135](https://github.com/logstash-plugins/logstash-filter-grok/issues/135)
-  
+
 ## 4.0.2
   - Fixed resource leak where this plugin might get double initialized during plugin reload, leaking a thread + some objects

diff --git a/docs/index.asciidoc b/docs/index.asciidoc
index 12dccf8..07e3dab 100644
--- a/docs/index.asciidoc
+++ b/docs/index.asciidoc
@@ -24,13 +24,13 @@ Parse arbitrary text and structure it.
 Grok is a great way to parse unstructured log data into something structured and
 queryable.
 
-This tool is perfect for syslog logs, apache and other webserver logs, mysql
+This tool is great for syslog logs, apache and other webserver logs, mysql
 logs, and in general, any log format that is generally written for humans and
 not computer consumption.
 
-Logstash ships with about 120 patterns by default. You can find them here:
-https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. You can add
-your own trivially. (See the `patterns_dir` setting)
+Logstash ships with about 120 https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns[default grok patterns].
+You can also add your own.
+Check out the <<plugins-{type}s-{plugin}-patterns_dir,`patterns_dir` setting>> for more info.
 
 If you need help building patterns to match your logs, try the
 {kibana-ref}/xpack-grokdebugger.html[Grok debugger] in {kib}.
@@ -39,7 +39,7 @@
 The {logstash-ref}/plugins-filters-dissect.html[`dissect`] filter plugin
 is another way to extract unstructured event data into fields using delimiters.
 
-Dissect differs from Grok in that it does not use regular expressions and is faster. 
+Dissect differs from Grok in that it does not use regular expressions and is faster.
 Dissect works well when data is reliably repeated.
 Grok is a better choice when the structure of your text varies from line to line.
@@ -48,12 +48,12 @@
 line is reliably repeated, but the entire line is not.
 The Dissect filter can deconstruct the section of the line that is repeated.
 The Grok filter can process the remaining field values with more regex predictability.
 
-==== Grok Basics
+==== Grok basics
 
 Grok works by combining text patterns into something that matches your logs.
 
-The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`
+The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`.
 
 The `SYNTAX` is the name of the pattern that will match your text.
 For example, `3.44` will be matched by the `NUMBER` pattern and `55.3.244.1` will
@@ -65,8 +65,11 @@ simply `duration`. Further, a string `55.3.244.1` might identify the `client`
 making a request.
 
 For the above example, your grok filter would look something like this:
+
 [source,ruby]
+-----
 %{NUMBER:duration} %{IP:client}
+-----
 
 Optionally you can add a data type conversion to your grok pattern. By default
 all semantics are saved as strings. If you wish to convert a semantic's data type,
@@ -106,14 +109,14 @@ After the grok filter, the event will have a few extra fields in it:
 * `bytes: 15824`
 * `duration: 0.043`
 
-==== Regular Expressions
+==== Regular expressions
 
 Grok sits on top of regular expressions, so any regular expressions are valid
 in grok as well. The regular expression library is Oniguruma, and you can see
 the full supported regexp syntax https://github.com/kkos/oniguruma/blob/master/doc/RE[on the
 Oniguruma site].
 
-==== Custom Patterns
+==== Custom patterns
 
 Sometimes logstash doesn't have a pattern you need. For this, you have
 a few options.
@@ -171,7 +174,7 @@ The `timestamp`, `logsource`, `program`, and `pid` fields come from the
 `SYSLOGBASE` pattern which itself is defined by other patterns.
 
 Another option is to define patterns _inline_ in the filter using `pattern_definitions`.
-This is mostly for convenience and allows user to define a pattern which can be used just in that 
+This is mostly for convenience and allows user to define a pattern which can be used just in that
 filter. This newly defined patterns in `pattern_definitions` will not be available outside of that particular `grok` filter.
 
 [id="plugins-{type}s-{plugin}-ecs"]
@@ -184,7 +187,7 @@ compliant with the schema.
 
 The ECS pattern set has all of the pattern definitions from the legacy set, and
 is a drop-in replacement. Use the <<plugins-{type}s-{plugin}-ecs_compatibility>>
-setting to switch modes. 
+setting to switch modes.
 
 New features and enhancements will be added to the ECS-compliant files.
 The legacy patterns may still receive bug fixes which are backwards compatible.
@@ -219,7 +222,7 @@ filter plugins.
 &nbsp;
 
 [id="plugins-{type}s-{plugin}-break_on_match"]
-===== `break_on_match` 
+===== `break_on_match`
 
 * Value type is <<boolean,boolean>>
 * Default value is `true`
@@ -243,7 +246,7 @@ Controls this plugin's compatibility with the {ecs-ref}[Elastic Common Schema (E
 The value of this setting affects extracted event field names when a composite pattern (such as `HTTPD_COMMONLOG`) is matched.
 
 [id="plugins-{type}s-{plugin}-keep_empty_captures"]
-===== `keep_empty_captures` 
+===== `keep_empty_captures`
 
 * Value type is <<boolean,boolean>>
 * Default value is `false`
@@ -251,7 +254,7 @@
 If `true`, keep empty captures as event fields.
 
 [id="plugins-{type}s-{plugin}-match"]
-===== `match` 
+===== `match`
 
 * Value type is <<hash,hash>>
 * Default value is `{}`
@@ -280,7 +283,7 @@ If you need to match multiple patterns against a single field, the value can be
       }
     }
   }
-  
+
 To perform matches on multiple fields just use multiple entries in the `match` hash:
 
 [source,ruby]
     filter {
      grok {
        match => {
          "speed" => "Speed: %{NUMBER:speed}"
          "duration" => "Duration: %{NUMBER:duration}"
        }
      }
    }
@@ -312,7 +315,7 @@
 
 [id="plugins-{type}s-{plugin}-named_captures_only"]
-===== `named_captures_only` 
+===== `named_captures_only`
 
 * Value type is <<boolean,boolean>>
 * Default value is `true`
@@ -320,7 +323,7 @@ However, if one pattern depends on a field created by a previous pattern, separa
 If `true`, only store named captures from grok.
[id="plugins-{type}s-{plugin}-overwrite"] -===== `overwrite` +===== `overwrite` * Value type is <> * Default value is `[]` @@ -342,7 +345,7 @@ overwrite the `message` field with part of the match like so: In this case, a line like `May 29 16:37:11 sadness logger: hello world` will be parsed and `hello world` will overwrite the original message. -If you are using a field reference in `overwrite`, you must use the field +If you are using a field reference in `overwrite`, you must use the field reference in the pattern. Example: [source,ruby] filter { @@ -354,18 +357,18 @@ reference in the pattern. Example: [id="plugins-{type}s-{plugin}-pattern_definitions"] -===== `pattern_definitions` +===== `pattern_definitions` * Value type is <> * Default value is `{}` -A hash of pattern-name and pattern tuples defining custom patterns to be used by -the current filter. Patterns matching existing names will override the pre-existing -definition. Think of this as inline patterns available just for this definition of +A hash of pattern-name and pattern tuples defining custom patterns to be used by +the current filter. Patterns matching existing names will override the pre-existing +definition. Think of this as inline patterns available just for this definition of grok [id="plugins-{type}s-{plugin}-patterns_dir"] -===== `patterns_dir` +===== `patterns_dir` * Value type is <> * Default value is `[]` @@ -375,7 +378,7 @@ Logstash ships by default with a bunch of patterns, so you don't necessarily need to define this yourself unless you are adding additional patterns. You can point to multiple pattern directories using this setting. Note that Grok will read all files in the directory matching the patterns_files_glob -and assume it's a pattern file (including any tilde backup files). +and assume it's a pattern file (including any tilde backup files). [source,ruby] patterns_dir => ["/opt/logstash/patterns", "/opt/logstash/extra_patterns"] @@ -390,7 +393,7 @@ For example: The patterns are loaded when the pipeline is created. [id="plugins-{type}s-{plugin}-patterns_files_glob"] -===== `patterns_files_glob` +===== `patterns_files_glob` * Value type is <> * Default value is `"*"` @@ -399,7 +402,7 @@ Glob pattern, used to select the pattern files in the directories specified by patterns_dir [id="plugins-{type}s-{plugin}-tag_on_failure"] -===== `tag_on_failure` +===== `tag_on_failure` * Value type is <> * Default value is `["_grokparsefailure"]` @@ -408,7 +411,7 @@ Append values to the `tags` field when there has been no successful match [id="plugins-{type}s-{plugin}-tag_on_timeout"] -===== `tag_on_timeout` +===== `tag_on_timeout` * Value type is <> * Default value is `"_groktimeout"` @@ -424,7 +427,7 @@ Tag to apply if a grok regexp times out. Define target namespace for placing matches. [id="plugins-{type}s-{plugin}-timeout_millis"] -===== `timeout_millis` +===== `timeout_millis` * Value type is <> * Default value is `30000` diff --git a/logstash-filter-grok.gemspec b/logstash-filter-grok.gemspec index 5ccb55f..b4988bf 100644 --- a/logstash-filter-grok.gemspec +++ b/logstash-filter-grok.gemspec @@ -1,6 +1,6 @@ Gem::Specification.new do |s| s.name = 'logstash-filter-grok' - s.version = '4.4.3' + s.version = '4.4.4' s.licenses = ['Apache License (2.0)'] s.summary = "Parses unstructured event data into fields" s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. 

From bbde97a850e7064ac0ee53f251a7d1f860b3b867 Mon Sep 17 00:00:00 2001
From: Karen Metts <35154725+karenzone@users.noreply.github.com>
Date: Fri, 6 Jun 2025 20:14:56 -0400
Subject: [PATCH 2/2] Correct PR number and link

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9ce1908..e480a15 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,5 @@
 ## 4.4.4
-  - [DOC] Minor doc fixes and version bump to pick up changes in [#186](https://github.com/logstash-plugins/logstash-filter-grok/pull/186) [#187](https://github.com/logstash-plugins/logstash-filter-grok/pull/187)
+  - [DOC] Minor doc fixes and version bump to pick up changes in [#186](https://github.com/logstash-plugins/logstash-filter-grok/pull/186) [#197](https://github.com/logstash-plugins/logstash-filter-grok/pull/197)
 
 ## 4.4.3
   - Minor typos in docs examples [#176](https://github.com/logstash-plugins/logstash-filter-grok/pull/176)
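
For readers working through the doc hunks above: the `match`, `overwrite`, and
`pattern_definitions` settings they touch combine in a single filter block.
This is a minimal sketch for orientation only; the `MYGREETING` pattern name and
its pipe-delimited body are hypothetical, not part of the patch or of the
shipped pattern sets:

[source,ruby]
-----
filter {
  grok {
    # Inline pattern, visible only to this grok filter, per the
    # `pattern_definitions` description in the diff above.
    pattern_definitions => { "MYGREETING" => "hello|goodbye" }
    match => { "message" => "%{SYSLOGBASE} %{MYGREETING:greeting} %{GREEDYDATA:message}" }
    # On a successful match, replace the original `message` field with
    # the trailing %{GREEDYDATA:message} capture.
    overwrite => [ "message" ]
  }
}
-----

With this configuration, the docs' sample line `May 29 16:37:11 sadness logger: hello world`
would set `greeting` to `hello` and overwrite `message` with `world`.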
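Similarly, the optional type-conversion syntax mentioned near the
`%{NUMBER:duration} %{IP:client}` hunk can be exercised against the HTTP log
event the page documents (`bytes: 15824`, `duration: 0.043`). A minimal sketch,
assuming the input arrives in the stock `message` field; `int` and `float` are
the only conversions grok supports:

[source,ruby]
-----
filter {
  grok {
    # Append :int or :float to a capture to convert it; captures without
    # a suffix (client, method, request) remain strings.
    match => {
      "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}"
    }
  }
}
-----

Given `55.3.244.1 GET /index.html 15824 0.043`, this stores `bytes` as the
integer `15824` and `duration` as the float `0.043` rather than as strings.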