[META][Discuss] add support for input_type in "dual modes" codecs #11885

@colinsurprenant

As seen in logstash-plugins/logstash-codec-csv#8 and logstash-plugins/logstash-codec-multiline#63 some codecs would benefit from supporting two modes of operation for the 2 types of data that our input plugins can provide to codecs.

Our input plugins can provide two types of data for decoding:

  • line/document based data where each data chunk provided by the plugin to the codec is a complete data line or complete document. For example the file input or the http input.
  • stream based data where each data chunk provided by the plugin can be only part of a line/document, so that a complete line/document can span multiple chunks. For example the stdin input or the tcp input.
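
To make the difference between the two data types concrete, here is a minimal plain-Ruby sketch of what a codec has to do with stream based data: buffer chunks until a delimiter completes a line. The `ChunkBuffer` class and its `extract`/`flush` methods are hypothetical names used for illustration only; they are not taken from any Logstash codec.

```ruby
# Hypothetical helper (not a Logstash API): reassembles complete lines
# from chunks that may end in the middle of a line.
class ChunkBuffer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = +""
  end

  # Appends a chunk and returns only the lines it completed.
  # Any trailing partial line stays buffered for the next call.
  def extract(chunk)
    @buffer << chunk
    parts = @buffer.split(@delimiter, -1)
    @buffer = parts.pop || +""
    parts
  end

  # Returns whatever partial line is left, e.g. when the stream closes.
  def flush
    rest = @buffer
    @buffer = +""
    rest.empty? ? [] : [rest]
  end
end

# Stream based input (stdin/tcp style): lines are split across chunks.
buffer = ChunkBuffer.new
stream_lines = ["foo\nba", "r\nbaz\n"].flat_map { |chunk| buffer.extract(chunk) }

# Line based input (file/http style): each chunk is already one complete
# line, so no buffering is needed and the chunk can be decoded as-is.
line_chunks = ["foo", "bar", "baz"]
```

With line based input the buffering step is unnecessary, and applying it anyway would hold back data that has no trailing delimiter. That is why a single codec needs to know which kind of input it is attached to.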

The way we have been dealing with this situation has been to have two versions of the same codec, for example: json and json_lines or plain and line. Furthermore, to help deal with this confusion, we introduced the fix_streaming_codecs method to automagically swap these codecs depending on the input used.

def fix_streaming_codecs
  require "logstash/codecs/plain"
  require "logstash/codecs/line"
  require "logstash/codecs/json"
  require "logstash/codecs/json_lines"
  case @codec.class.name
  when "LogStash::Codecs::Plain"
    @logger.info("Automatically switching from #{@codec.class.config_name} to line codec", :plugin => self.class.config_name)
    @codec = LogStash::Codecs::Line.new("charset" => @codec.charset)
  when "LogStash::Codecs::JSON"
    @logger.info("Automatically switching from #{@codec.class.config_name} to json_lines codec", :plugin => self.class.config_name)
    @codec = LogStash::Codecs::JSONLines.new("charset" => @codec.charset)
  end
end

Until we figure out a whole new/better input/codec architecture, my proposal is to iteratively improve the current design with the following steps:

  1. introduce a new input_type config option in codecs that can support both input types (similar to [WIP] support line delimited data logstash-plugins/logstash-codec-csv#8)
  2. get rid of these redundant codecs
  3. figure out a way for inputs to hint codecs about their input style so that a correct default can be set in the codec depending on the input plugin used.
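
As a rough illustration of steps 1 and 3, here is a plain-Ruby sketch with no Logstash dependencies. Everything in it is an assumption made for discussion: the `DualModeCodec` class, the `"line"`/`"stream"` values for `input_type`, the `INPUT_STYLE_HINTS` table and the `codec_for` helper are hypothetical and do not reflect an existing API or the final design.

```ruby
# Hypothetical dual-mode codec: one class, behavior selected by input_type.
class DualModeCodec
  INPUT_TYPES = %w[line stream].freeze

  attr_reader :input_type

  def initialize(input_type: "line", delimiter: "\n")
    unless INPUT_TYPES.include?(input_type)
      raise ArgumentError, "unknown input_type: #{input_type.inspect}"
    end
    @input_type = input_type
    @delimiter = delimiter
    @buffer = +""
  end

  # Yields one complete line/document per block call.
  def decode(data)
    if @input_type == "line"
      # Each chunk is already a complete line/document.
      yield data
    else
      # Chunks may be partial: buffer until a delimiter completes a line.
      @buffer << data
      parts = @buffer.split(@delimiter, -1)
      @buffer = parts.pop || +""
      parts.each { |line| yield line }
    end
  end
end

# Hypothetical hint table (step 3): each input declares its input style so
# the codec can pick a sensible default when the user did not set one.
INPUT_STYLE_HINTS = {
  "file"  => "line",
  "http"  => "line",
  "stdin" => "stream",
  "tcp"   => "stream"
}.freeze

# An explicit user setting wins over the hint from the input.
def codec_for(input_name, explicit_input_type = nil)
  input_type = explicit_input_type || INPUT_STYLE_HINTS.fetch(input_name, "line")
  DualModeCodec.new(input_type: input_type)
end

events = []
tcp_codec = codec_for("tcp")
tcp_codec.decode("foo\nba") { |line| events << line }
tcp_codec.decode("r\n")     { |line| events << line }
```

In this sketch the hint only provides a default, so a user could still force a mode explicitly, which would also replace the implicit swapping done by fix_streaming_codecs.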

I am looking for comments/suggestions about this plan and, if we agree on the idea, I will detail the steps for each iteration.
