Description
As seen in logstash-plugins/logstash-codec-csv#8 and logstash-plugins/logstash-codec-multiline#63, some codecs would benefit from supporting two modes of operation, one for each of the two types of data that our input plugins can provide to codecs.
Our input plugins can provide two types of data for decoding:
- line/document based data, where each data chunk provided by the plugin to the codec is a complete line or a complete document. For example the `file` input or the `http` input.
- stream based data, where each data chunk provided by the plugin can be only part of a line/document, so a complete line/document can span multiple chunks. For example the `stdin` input or the `tcp` input.
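To make the difference concrete, here is a minimal sketch in plain Ruby of what a stream based codec has to do that a line/document based codec does not: buffer partial chunks until a delimiter completes a line. The `ChunkLineBuffer` class and its method names are hypothetical and only for illustration; they are not the Logstash implementation.

```ruby
# Hypothetical helper (not part of Logstash): accumulates stream chunks
# and returns only the lines that have been completed so far.
class ChunkLineBuffer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = +""            # unfrozen, mutable string
  end

  # Append a chunk and return every complete line it finishes.
  # Anything after the last delimiter stays buffered for the next chunk.
  def extract(chunk)
    @buffer << chunk
    parts = @buffer.split(@delimiter, -1) # -1 keeps a trailing empty part
    @buffer = parts.pop || +""            # last part is the incomplete tail
    parts
  end

  # Return whatever is left when the stream ends.
  def flush
    rest = @buffer
    @buffer = +""
    rest.empty? ? [] : [rest]
  end
end

buffer = ChunkLineBuffer.new
first  = buffer.extract('{"a":')            # nothing complete yet
second = buffer.extract(" 1}\n{\"b\"")      # completes the first line
third  = buffer.extract(": 2}\n")           # completes the second line
```

With line/document based data none of this buffering is needed, because every chunk handed to the codec is already a complete unit.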
The way we have been dealing with this situation has been to have two versions of the same codec, for example: `json` and `json_lines`, or `plain` and `line`. Furthermore, to help deal with this confusion, we introduced the `fix_streaming_codecs` method to automagically swap these codecs depending on the input used.
logstash/logstash-core/lib/logstash/inputs/base.rb
Lines 145 to 159 in 196ec20
```ruby
def fix_streaming_codecs
  require "logstash/codecs/plain"
  require "logstash/codecs/line"
  require "logstash/codecs/json"
  require "logstash/codecs/json_lines"
  case @codec.class.name
    when "LogStash::Codecs::Plain"
      @logger.info("Automatically switching from #{@codec.class.config_name} to line codec", :plugin => self.class.config_name)
      @codec = LogStash::Codecs::Line.new("charset" => @codec.charset)
    when "LogStash::Codecs::JSON"
      @logger.info("Automatically switching from #{@codec.class.config_name} to json_lines codec", :plugin => self.class.config_name)
      @codec = LogStash::Codecs::JSONLines.new("charset" => @codec.charset)
  end
end
```
Until we figure out a whole new/better input/codec architecture, my proposal is to iteratively improve the current design:
- introduce a new `input_type` config option in codecs that can support both input types (similar to [WIP] support line delimited data logstash-plugins/logstash-codec-csv#8)
- get rid of these redundant codecs
- figure out a way for inputs to hint codecs about their input style, so that a correct default can be set in the codec depending on the input plugin used.
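As a rough illustration of how the first and third points could fit together, here is a sketch in plain Ruby. Everything in it is an assumption made for discussion: the `SketchJsonCodec` class, the `"document"` and `"stream"` values, and the `input_hint` argument are hypothetical and do not reflect the existing Logstash codec API.

```ruby
require "json"

# Hypothetical sketch: a single JSON codec that handles both input styles
# instead of being split into json and json_lines.
class SketchJsonCodec
  INPUT_TYPES = %w[document stream].freeze

  attr_reader :input_type

  # input_type: explicit user setting (assumed option name from the proposal).
  # input_hint: default suggested by the input plugin (assumed mechanism).
  def initialize(input_type: nil, input_hint: nil, delimiter: "\n")
    @input_type = input_type || input_hint || "document"
    unless INPUT_TYPES.include?(@input_type)
      raise ArgumentError, "unknown input_type: #{@input_type}"
    end
    @delimiter = delimiter
    @buffer = +""
  end

  # Yields one parsed event per complete document.
  def decode(data)
    if @input_type == "document"
      # Each chunk is already a complete document.
      yield JSON.parse(data)
    else
      # Chunks may be partial: buffer until a delimiter completes a line.
      @buffer << data
      parts = @buffer.split(@delimiter, -1)
      @buffer = parts.pop || +""
      parts.each { |line| yield JSON.parse(line) unless line.empty? }
    end
  end
end

# A tcp-like input would hint "stream"; the user did not set input_type.
codec = SketchJsonCodec.new(input_hint: "stream")
events = []
codec.decode('{"a":') { |e| events << e }
codec.decode(" 1}\n") { |e| events << e }
```

In this sketch an explicit `input_type` always wins over the hint from the input, so the hint only changes the default and never overrides what the user configured.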
I am looking for comments/suggestions about this plan, and if we agree on the idea I will detail the steps for each iteration.