Can someone help me grasp pipelines? #6336
-
So, now that I can tell Vector to use a pipeline, I thought I'd do so. I've been working from the docs, but I'm obviously missing something. My goal is to take postfix messages from the journald output and parse them into fields like the postfix queueid, to, from, etc. I thought that the way to do that was a dispatcher pipeline:

```yaml
dispatcher:
  field: SYSLOG_IDENTIFIER
  rules:
    - value: 'postfix/smtp'
      table_suffix: postfix
      pipeline: process_postfix

transform:
  - fields:
      - SYSLOG_IDENTIFIER
    type: string
    index: fulltext
```

A bunch of attempts later, and I currently have a manually created journald_postfix table that is empty, and in the journald table, the SYSLOG_IDENTIFIER=postfix/smtp rows are all completely empty except for the SYSLOG_IDENTIFIER and greptime_timestamp columns. This is after thinking the _postfix table would be created automatically, creating the table anyway, and trying to manually tell the pipelines to include all the non-"message" columns from the original journald data. Unfortunately, I'm not seeing anything in the greptime logs that gives me any clues, and I haven't found anything in the docs either. I'm not even really sure what to search on at this point... So here are some questions:

- Is a transform actually required? The dispatcher example in the docs does not have a transform, but the pipeline editor requires one. Which is correct?
- How does dispatching work? I thought my config above would look at the SYSLOG_IDENTIFIER column and, if it had a value of 'postfix/smtp', pass the entire log entry on to the process_postfix pipeline. The process_postfix pipeline would then split the message field out into the columns I want, and the data would be stored in a new table with the suffix from the dispatcher config. I also thought that the new table would be created automatically based on the transform config in the process_postfix pipeline, plus the original journald data. (See the sketch at the end of this post for what I imagined process_postfix would contain.)
- Does the dispatcher value field accept wildcards? I originally wanted to capture all logs that come from a postfix SYSLOG_IDENTIFIER, since postfix has several.
- Is there a "drop row" processor/transform? There are some log entries that get repeated all the time. I'd like to detect them and just drop them in my pipelines. Is that possible? I haven't found a decent way yet...
- Is there an easy way to get example logs into the pipeline dashboard? With as many columns as the journald table has, I'd love to be able to just click somewhere and get a copy of a row in the proper format for pipeline testing. As it is, I have only been creating test data out of the message column, since manually turning all 80+ columns into a JSON field is not exactly fun. I did try exporting as CSV, but the file had no header row.
- How can I create a dispatcher based on multiple fields? Is this just not possible? I'm not seeing any way the current syntax could support that.

Ok, that's plenty for one post. Any tips would be appreciated!
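For concreteness, here is roughly what I imagined the process_postfix pipeline would need to contain. This is only a sketch: the dissect pattern is a guess based on a typical postfix/smtp delivery line, and the field names (queueid, to, relay, status, etc.) are my own, not anything I have working.

```yaml
# Sketch of process_postfix. The dissect pattern is a guess for a typical
# "queueid: to=<...>, relay=..., delay=..., delays=..., dsn=..., status=sent (...)"
# line; adjust it to the actual postfix log format.
processors:
  - dissect:
      fields:
        - message
      patterns:
        - '%{queueid}: to=<%{to}>, relay=%{relay}, delay=%{delay}, delays=%{delays}, dsn=%{dsn}, status=%{status} %{status_detail}'
      ignore_missing: true

transform:
  - fields:
      - queueid
      - to
      - relay
      - status
    type: string
  - fields:
      - message
    type: string
    index: fulltext
```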
-
Hi, thanks for reaching out!
To reproduce the case, can you upload the vector config file along with all the pipeline config files? You can pack them into a compressed file.
Also, are you using v0.14 or the main branch?
I suspect the output of Vector may not be a straightforward JSON document; you can use the `console` sink of Vector to view the output data locally. We use the sample data from here, and it works under the minimal config.
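For example, a minimal Vector config for inspecting journald output locally might look like the sketch below. The source and sink names are placeholders; adjust the journald source options for your environment.

```yaml
# Minimal sketch: dump journald events to stdout as JSON so you can see
# exactly what Vector sends to the pipeline. Names are illustrative.
sources:
  journald_in:
    type: journald

sinks:
  debug_out:
    type: console
    inputs: ["journald_in"]
    encoding:
      codec: json
```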
Is a transform actually required?
In v0.14, yes, a transform is still required. In the main branch, we've added support for auto-transform, which allows the pipeline engine to infer the data type of the input data, much like the behaviour …
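For reference, a minimal v0.14 transform section can be quite small, along these lines (the field names here are just the ones from your journald example, not required names):

```yaml
# Minimal transform sketch for v0.14: each column you want stored in the
# table must be declared explicitly. Field names are illustrative.
transform:
  - fields:
      - SYSLOG_IDENTIFIER
      - message
    type: string
```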