Commit f879c02

thomheymann, dgieselaar, and kibanamachine authored
[Streams] Consistent Grok pattern generation (#230076)
Resolves elastic/streams-program#314

## Summary

Updates the Grok pattern suggestions feature to use heuristics when constructing Grok patterns in order to improve reliability. Once a Grok pattern has been created, an LLM is used to determine ECS field names.

Introduces a new shared package `@kbn/grok-heuristics` which exposes Grok pattern extraction and grouping utilities for log/message analysis.

## Acceptance criteria

- The "Generate pattern" button should create one Grok processor suggestion showing parse rate and field metrics
- The user can accept the suggestion or dismiss it
- When the suggestion is accepted, any existing patterns and pattern definitions are replaced with the suggested Grok processor
- When multiple log formats are detected in the dataset, the suggested processor should contain a list of fallback patterns, one for each log format. The maximum number of patterns returned is limited to 3.
- For each suggested Grok pattern, the UI should call the selected LLM to suggest field names and merge any fields that have been broken up too granularly
- The Grok pattern should extract fields to the ECS convention for classic streams or the OTel convention for wired streams. Custom timestamps should never be extracted to `@timestamp`, to avoid date format conflicts.
- Fields that span at least 2 Grok components should be extracted into their own pattern definition
- When the LLM returns multiple conflicting pattern definitions, they should be resolved by appending a counter (e.g. `CUSTOM_TIMESTAMP2` for the second pattern definition)

## Screenshot

<img width="1363" height="1086" alt="Screenshot 2025-08-13 at 08 39 48" src="https://github.com/user-attachments/assets/71791309-a2b0-4769-8ffd-d3fa3ea11c74" />

## Evaluations

| Stream             | Before (all docs) | After (all docs) |
|--------------------|-------------------|------------------|
| `logs.android`     | 0%                | 100%             |
| `logs.apache`      | 100%              | 100%             |
| `logs.bgl`         | 0%                | 100%             |
| `logs.hadoop`      | 100%              | 100%             |
| `logs.hdfs`        | 0%                | 100%             |
| `logs.healthapp`   | 100%              | 100%             |
| `logs.hpc`         | 100%              | 100%             |
| `logs.linux`       | 100%              | 100%             |
| `logs.mac`         | 100%              | 100%             |
| `logs.openssh`     | 71.9%             | 100%             |
| `logs.openstack`   | 100%              | 100%             |
| `logs.proxifier`   | 51.0%             | 99.8%            |
| `logs.spark`       | 99.6%             | 99.4%            |
| `logs.thunderbird` | 58.1%             | 95.2%            |
| `logs.windows`     | 100%              | 100%             |
| `logs.zookeeper`   | 100%              | 100%             |

| Metric                           | Before | After |
|----------------------------------|--------|-------|
| Average Parsing Score (samples)  | 75.2%  | 100%  |
| Average Parsing Score (all docs) | 73.8%  | 99.7% |

---------

Co-authored-by: Dario Gieselaar <[email protected]>
Co-authored-by: kibanamachine <[email protected]>
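The counter-based conflict resolution described in the acceptance criteria can be sketched as follows. This is a minimal illustration, not the code from this commit: `SuggestedDefinition` and `resolveDefinitionConflicts` are hypothetical names, and reusing a name when the pattern is identical is an assumption made here for the example.

```typescript
// Hypothetical types and helper for illustration; not taken from @kbn/grok-heuristics.
interface SuggestedDefinition {
  name: string; // definition name proposed by the LLM, e.g. "CUSTOM_TIMESTAMP"
  pattern: string; // Grok expression the name expands to
}

function resolveDefinitionConflicts(
  suggestions: SuggestedDefinition[]
): Record<string, string> {
  const resolved = new Map<string, string>();

  for (const { name, pattern } of suggestions) {
    let candidate = name;
    let counter = 2; // the first conflicting definition becomes NAME2

    // Advance until the name is free or already maps to the same pattern
    // (assumption: identical definitions share one name).
    while (resolved.has(candidate) && resolved.get(candidate) !== pattern) {
      candidate = `${name}${counter}`;
      counter += 1;
    }
    resolved.set(candidate, pattern);
  }

  const result: Record<string, string> = {};
  resolved.forEach((pattern, name) => {
    result[name] = pattern;
  });
  return result;
}

// Two log formats both propose CUSTOM_TIMESTAMP with different expressions.
const definitions = resolveDefinitionConflicts([
  { name: 'CUSTOM_TIMESTAMP', pattern: '%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}' },
  { name: 'CUSTOM_TIMESTAMP', pattern: '%{MONTH} %{MONTHDAY} %{TIME}' },
]);
console.log(definitions);
```

In a real processor the Grok pattern that references a renamed definition would also need to use the new name; that step is omitted from this sketch.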
1 parent 1a31fab commit f879c02

55 files changed, +3661 -1924 lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -889,6 +889,7 @@ x-pack/platform/packages/shared/kbn-entities-schema @elastic/obs-entities
 x-pack/platform/packages/shared/kbn-evals @elastic/appex-ai-infra
 x-pack/platform/packages/shared/kbn-event-stacktrace @elastic/obs-ux-infra_services-team @elastic/obs-ux-logs-team
 x-pack/platform/packages/shared/kbn-failure-store-modal @elastic/kibana-management
+x-pack/platform/packages/shared/kbn-grok-heuristics/package.json @elastic/streams-program-team
 x-pack/platform/packages/shared/kbn-inference-cli @elastic/appex-ai-infra
 x-pack/platform/packages/shared/kbn-inference-endpoint-ui-common @elastic/appex-ai-infra
 x-pack/platform/packages/shared/kbn-inference-prompt-utils @elastic/appex-ai-infra

package.json

Lines changed: 1 addition & 0 deletions
@@ -595,6 +595,7 @@
     "@kbn/graph-plugin": "link:x-pack/platform/plugins/private/graph",
     "@kbn/grid-example-plugin": "link:examples/grid_example",
     "@kbn/grid-layout": "link:src/platform/packages/private/kbn-grid-layout",
+    "@kbn/grok-heuristics": "link:x-pack/platform/packages/shared/kbn-grok-heuristics",
     "@kbn/grok-ui": "link:src/platform/packages/shared/kbn-grok-ui",
     "@kbn/grokdebugger-plugin": "link:x-pack/platform/plugins/private/grokdebugger",
     "@kbn/grouping": "link:src/platform/packages/shared/kbn-grouping",

tsconfig.base.json

Lines changed: 17 additions & 5 deletions
@@ -1110,6 +1110,8 @@
       "@kbn/grid-example-plugin/*": ["examples/grid_example/*"],
       "@kbn/grid-layout": ["src/platform/packages/private/kbn-grid-layout"],
       "@kbn/grid-layout/*": ["src/platform/packages/private/kbn-grid-layout/*"],
+      "@kbn/grok-heuristics": ["x-pack/platform/packages/shared/kbn-grok-heuristics"],
+      "@kbn/grok-heuristics/*": ["x-pack/platform/packages/shared/kbn-grok-heuristics/*"],
       "@kbn/grok-ui": ["src/platform/packages/shared/kbn-grok-ui"],
       "@kbn/grok-ui/*": ["src/platform/packages/shared/kbn-grok-ui/*"],
       "@kbn/grokdebugger-plugin": ["x-pack/platform/plugins/private/grokdebugger"],
@@ -2342,13 +2344,23 @@
       "@kbn/zod-helpers/*": ["src/platform/packages/shared/kbn-zod-helpers/*"],
       // END AUTOMATED PACKAGE LISTING
       // Allows for importing from `kibana` package for the exported types.
-      "@emotion/core": ["typings/@emotion"],
+      "@emotion/core": [
+        "typings/@emotion"
+      ],
       // We need the custom typings "proxy" because our current import-resolver doesn't support "exports" in package.json.
       // We should be able to remove this once we support cjs/esm interop.
-      "@elastic/opentelemetry-node/sdk": ["typings/@elastic/opentelemetry-node/sdk"],
-      "@opentelemetry/semantic-conventions/incubating": ["typings/@opentelemetry/semantic-conventions/incubating"],
-      "@typescript-eslint/parser": ["typings/@typescript-eslint/parser"],
-      "@a2a-js/sdk/server": ["typings/@a2a-js-sdk/server"],
+      "@elastic/opentelemetry-node/sdk": [
+        "typings/@elastic/opentelemetry-node/sdk"
+      ],
+      "@opentelemetry/semantic-conventions/incubating": [
+        "typings/@opentelemetry/semantic-conventions/incubating"
+      ],
+      "@typescript-eslint/parser": [
+        "typings/@typescript-eslint/parser"
+      ],
+      "@a2a-js/sdk/server": [
+        "typings/@a2a-js-sdk/server"
+      ],
     },
     // Support .tsx files and transform JSX into calls to React.createElement
     "jsx": "react",

x-pack/platform/plugins/shared/streams/scripts/evaluate_grok_patterns.ts renamed to x-pack/platform/packages/private/kbn-evals-suite-streams/scripts/evaluate_heuristic_grok_patterns.ts

Lines changed: 59 additions & 70 deletions
@@ -17,20 +17,24 @@
  * Run evaluation script using:
  *
  * ```bash
- * yarn run ts-node --transpile-only x-pack/platform/plugins/shared/streams/scripts/evaluate_grok_patterns.ts
+ * node --require ./src/setup_node_env/ ./x-pack/platform/plugins/shared/streams/scripts/evaluate_heuristic_grok_patterns.ts
  * ```
  */

 import { Client } from '@elastic/elasticsearch';
 import fetch from 'node-fetch';
-import uniq from 'lodash/uniq';
 import { writeFile } from 'fs/promises';
 import { join } from 'path';
 import chalk from 'chalk';
 import yargs from 'yargs/yargs';
 import { flattenObject } from '@kbn/object-utils';
 import { get } from 'lodash';
-import { getLogGroups } from '../server/routes/internal/streams/processing/get_log_groups';
+import {
+  getReviewFields,
+  getGrokProcessor,
+  getGrokPattern,
+  extractGrokPatternDangerouslySlow,
+} from '@kbn/grok-heuristics';

 const ES_URL = 'http://localhost:9200';
 const ES_USER = 'elastic';
@@ -72,11 +76,11 @@ async function getStreams(): Promise<string[]> {
   const data = await fetch(`${KIBANA_URL}/api/streams`, {
     method: 'GET',
     headers: getKibanaAuthHeaders(),
-  }).then((res) => {
+  }).then(async (res) => {
     if (res.ok) {
       return res.json();
     }
-    throw new Error(`HTTP Response Code: ${res.status}`);
+    throw new Error(`HTTP Response (${res.status}): ${await res.text()}`);
   });
   return data.streams.map((s: any) => s.name).filter((name: string) => name.startsWith('logs.'));
 }
@@ -85,29 +89,37 @@ async function getConnectors(): Promise<string[]> {
   const data = await fetch(`${KIBANA_URL}/api/actions/connectors`, {
     method: 'GET',
     headers: getKibanaAuthHeaders(),
-  }).then((res) => {
+  }).then(async (res) => {
     if (res.ok) {
       return res.json();
     }
-    throw new Error(`HTTP Response Code: ${res.status}`);
+    throw new Error(`HTTP Response (${res.status}): ${await res.text()}`);
   });
   return data.map((c: any) => c.id);
 }

-async function getSuggestions(stream: string, connectorId: string, samples: any[]) {
-  const data = await fetch(`${KIBANA_URL}/internal/streams/${stream}/processing/_suggestions`, {
-    method: 'POST',
-    headers: getKibanaAuthHeaders(),
-    body: JSON.stringify({
-      connectorId,
-      field: MESSAGE_FIELD,
-      samples,
-    }),
-  }).then(async (res) => {
+async function getSuggestions(
+  stream: string,
+  connectorId: string,
+  messages: string[],
+  reviewFields: ReturnType<typeof getReviewFields>
+) {
+  const data = await fetch(
+    `${KIBANA_URL}/internal/streams/${stream}/processing/_suggestions/grok`,
+    {
+      method: 'POST',
+      headers: getKibanaAuthHeaders(),
+      body: JSON.stringify({
+        connector_id: connectorId,
+        sample_messages: messages.slice(0, 10),
+        review_fields: reviewFields,
+      }),
+    }
+  ).then(async (res) => {
     if (res.ok) {
       return res.json();
     }
-    throw new Error(`HTTP Response Code: ${res.status}`);
+    throw new Error(`HTTP Response (${res.status}): ${await res.text()}`);
   });
   return data;
 }
@@ -118,24 +130,25 @@ async function simulateGrokProcessor(stream: string, documents: any[], grokProce
     headers: getKibanaAuthHeaders(),
     body: JSON.stringify({
       documents,
-      processing: {
-        steps: [
-          {
-            customIdentifier: 'eval-grok',
-            from: grokProcessor.field,
+      processing: [
+        {
+          id: 'eval-grok',
+          grok: {
+            field: MESSAGE_FIELD,
             patterns: grokProcessor.patterns,
+            pattern_definitions: grokProcessor.pattern_definitions,
             ignore_failure: true,
             ignore_missing: true,
-            where: { always: {} },
+            if: { always: {} },
           },
-        ],
-      },
+        },
+      ],
     }),
-  }).then((res) => {
+  }).then(async (res) => {
     if (res.ok) {
       return res.json();
     }
-    throw new Error(`HTTP Response Code: ${res.status}`);
+    throw new Error(`HTTP Response (${res.status}): ${await res.text()}`);
   });
   return data;
 }
@@ -172,7 +185,7 @@ export async function evaluateGrokSuggestions() {
     );
   }

-  const connector = connectors[connectors.length - 1]; // Use the last connector for evaluation
+  const connector = connectors[0]; // Use the first connector for evaluation

   // 1. Get AI suggestions
   console.log(chalk.bold('Getting suggestions...'));
@@ -181,12 +194,25 @@ export async function evaluateGrokSuggestions() {
     streams.map(async (stream) => {
       const sampleDocs = await fetchDocs(stream, 100);

-      const suggestion = await getSuggestions(stream, connector, sampleDocs);
-      const grokProcessor = suggestion[0]?.grokProcessor;
+      const messages = sampleDocs.reduce<string[]>((acc, sample) => {
+        const value = get(sample, MESSAGE_FIELD);
+        if (typeof value === 'string') {
+          acc.push(value);
+        }
+        return acc;
+      }, []);
+
+      const grokPatternNodes = extractGrokPatternDangerouslySlow(messages);
+      const grokPattern = getGrokPattern(grokPatternNodes);
+      const reviewFields = getReviewFields(grokPatternNodes, 10);
+      console.log(`- ${stream}: ${chalk.dim(grokPattern)}`);
+
+      const suggestion = await getSuggestions(stream, connector, messages, reviewFields);
+      const grokProcessor = getGrokProcessor(grokPatternNodes, suggestion);
       if (!grokProcessor) {
         throw new Error('No grokProcessor returned');
       }
-      console.log(`- ${stream}: ${chalk.dim(grokProcessor.patterns.join(', '))}`);
+      console.log(`- ${stream}: ${chalk.green(grokProcessor.patterns.join(', '))}`);

       return { stream, ...grokProcessor };
     })
@@ -232,7 +258,7 @@ export async function evaluateGrokSuggestions() {
     chalk.bold(`Average Parsing Score (all docs): ${chalk.green(averageParsingScoreAllDocs)}`)
   );

-  return output.reduce((acc, suggestion) => {
+  return output.reduce<Record<string, any>>((acc, suggestion) => {
     acc[suggestion.stream] = {
       patterns: suggestion.patterns,
       pattern_definitions: suggestion.pattern_definitions,
@@ -243,29 +269,6 @@ export async function evaluateGrokSuggestions() {
   }, {});
 }

-export async function evaluateLogGrouping() {
-  const allDocs = await fetchDocs('logs.*', 10_000);
-  const groups = getLogGroups(
-    allDocs.map((doc) => `${get(doc, MESSAGE_FIELD)}|||${get(doc, 'attributes.filepath')}`),
-    1
-  );
-  const output = groups.map((g) => {
-    return {
-      pattern: g.pattern,
-      logs: g.logs.length,
-      streams: uniq(g.logs.map((log) => log.split('|||')[1])),
-    };
-  });
-  output.forEach((g) => {
-    console.log();
-    console.log(chalk.bold(`"${g.pattern}" (${g.logs} logs):`));
-    g.streams.forEach((stream) => {
-      console.log(`- ${chalk.green(stream)}`);
-    });
-  });
-  return output;
-}
-
 async function runGrokSuggestionsEvaluation() {
   await evaluateGrokSuggestions()
     .then((result) => {
@@ -278,20 +281,6 @@ async function runGrokSuggestionsEvaluation() {
     .catch(console.error);
 }

-async function runLogGroupingEvaluation() {
-  console.log();
-  console.log('Starting evaluation of Log Grouping...');
-  await evaluateLogGrouping()
-    .then((result) => {
-      const file = `grouping_results.${Date.now()}.json`;
-      console.log();
-      console.log(`Evaluation complete. Writing results to ${file}`);
-      return writeFile(join(__dirname, file), JSON.stringify(result, null, 2));
-    })
-    .catch(console.error);
-}
-
 yargs(process.argv.slice(2))
   .command('*', 'Evaluate AI suggestions for Grok patterns', runGrokSuggestionsEvaluation)
-  .command('grouping', 'Evaluate log grouping patterns', runLogGroupingEvaluation)
   .parse();
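The message-collection step in the evaluation script (reading the message field from each sample document and keeping only string values) can be sketched without Kibana or lodash as follows. `getPath` and `collectMessages` are hypothetical names, `getPath` is a simplified stand-in for lodash `get`, and the `message` field name in the usage is an assumption because the value of `MESSAGE_FIELD` is not shown in this diff.

```typescript
type SampleDoc = Record<string, unknown>;

// Simplified stand-in for lodash `get`: tries the flat key first
// (e.g. { "body.text": "..." }), then walks the dotted path.
function getPath(doc: unknown, path: string): unknown {
  if (typeof doc === 'object' && doc !== null && path in doc) {
    return (doc as Record<string, unknown>)[path];
  }
  let current: unknown = doc;
  for (const key of path.split('.')) {
    if (typeof current !== 'object' || current === null) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Keep only documents whose field holds a string; skip everything else.
function collectMessages(docs: SampleDoc[], field: string): string[] {
  const messages: string[] = [];
  for (const doc of docs) {
    const value = getPath(doc, field);
    if (typeof value === 'string') {
      messages.push(value);
    }
  }
  return messages;
}

// Usage with a hypothetical field name.
const sampleMessages = collectMessages(
  [{ message: 'sshd[42]: Accepted password' }, { message: 17 }, { other: 'ignored' }],
  'message'
);
console.log(sampleMessages);
```

Filtering on `typeof value === 'string'` mirrors the script: documents with a missing or non-string message are dropped before pattern extraction rather than coerced.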

x-pack/platform/packages/private/kbn-evals-suite-streams/tsconfig.json

Lines changed: 2 additions & 0 deletions
@@ -9,5 +9,7 @@
   "kbn_references": [
     "@kbn/evals",
     "@kbn/sse-utils-client",
+    "@kbn/object-utils",
+    "@kbn/grok-heuristics",
   ]
 }
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# @kbn/grok-heuristics
+
+Utilities and helper functions for extracting GROK patterns from log messages.
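To give a feel for what heuristic Grok pattern extraction means, here is a deliberately simplified toy. It is not the algorithm used by `@kbn/grok-heuristics` (the package's internals are not shown in this diff); `TOKEN_RULES`, `classifyToken`, and `toyGrokPattern` are hypothetical names. It splits a message on whitespace and maps each token to a standard Grok component (`IP`, `INT`, `WORD`, or `NOTSPACE` as a fallback).

```typescript
// Toy illustration only; the real package is assumed to be far more elaborate.
const TOKEN_RULES: Array<[RegExp, string]> = [
  [/^\d{1,3}(\.\d{1,3}){3}$/, 'IP'], // dotted quad, e.g. 10.0.0.1
  [/^[+-]?\d+$/, 'INT'], // signed integer
  [/^[A-Za-z]+$/, 'WORD'], // letters only
];

// First matching rule wins; anything else is treated as a non-whitespace blob.
function classifyToken(token: string): string {
  for (const [regex, component] of TOKEN_RULES) {
    if (regex.test(token)) {
      return component;
    }
  }
  return 'NOTSPACE';
}

// Build a Grok pattern with placeholder field names (field_1, field_2, ...).
function toyGrokPattern(message: string): string {
  return message
    .trim()
    .split(/\s+/)
    .map((token, index) => `%{${classifyToken(token)}:field_${index + 1}}`)
    .join(' ');
}

console.log(toyGrokPattern('GET 200 10.0.0.1 /index.html'));
// %{WORD:field_1} %{INT:field_2} %{IP:field_3} %{NOTSPACE:field_4}
```

The placeholder names stand in for the step where, per the commit description, an LLM is asked to suggest ECS or OTel field names once the pattern structure is fixed.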
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+export { getReviewFields } from './src/review/get_review_fields';
+export { getGrokPattern } from './src/review/get_grok_pattern';
+export { unwrapPatternDefinitions } from './src/review/unwrap_pattern_definitions';
+export { getGrokProcessor, type GrokProcessorResult } from './src/review/get_grok_processor';
+export { mergeGrokProcessors } from './src/review/merge_grok_processors';
+export { groupMessagesByPattern } from './src/group_messages';
+export { extractGrokPatternDangerouslySlow } from './src/tokenization/extract_grok_pattern';
+export { ReviewFieldsPrompt } from './src/review/review_fields_prompt';
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+module.exports = {
+  preset: '@kbn/test/jest_node',
+  rootDir: '../../../../..',
+  roots: ['<rootDir>/x-pack/platform/packages/shared/kbn-grok-heuristics'],
+};
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+  "type": "shared-common",
+  "id": "@kbn/grok-heuristics",
+  "owner": "@elastic/streams-program-team",
+  "group": "platform",
+  "visibility": "shared"
+}
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+  "name": "@kbn/grok-heuristics",
+  "private": true,
+  "version": "1.0.0",
+  "license": "Elastic License 2.0"
+}
