codec unresponsive when working on a large file #21

@nikhilo

Description

I'm seeing a problem with logstash-codec-cloudtrail where processing simply hangs, with no error or debug output, whenever the codec encounters a large file.

I tried enabling debug logs for the codec, but nothing is printed:

curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.codecs.cloudtrail" : "DEBUG"}'
  • Logstash version: 5.5.1
  • Codec version: 3.0.4
  • Operating system: Ubuntu 14.04
  • Config file:
    s3 {
      region => 'us-east-1'
      bucket => '<my-org>-logs'
      backup_to_bucket => '<my-org>-logs'
      backup_add_prefix => 'processed/'
      delete => true
      interval => 300
      tags => ['aws-input', 'cloudtrail']
      type => 'cloudtrail'
      codec => 'cloudtrail'
      prefix => 'cloudtrail/'
      sincedb_path => '/opt/logstash/server/sincedb/cloudtrail'
    }

Sample Data:

Here's the list of files we have in the s3 bucket

2018-05-21 05:32:14      21408 20180521T0000Z_oueDeCc9ryuFaNE2.json.gz
2018-05-21 07:07:23      10581 20180521T0130Z_2C9gPDzKtmwp1sO3.json.gz
2018-05-21 07:04:22    5264114 20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz
2018-05-21 07:12:09      13128 20180521T0135Z_b9h4v5QqEkumMZNu.json.gz
2018-05-21 07:08:06      29622 20180521T0135Z_gY3u2wcdDT3DjPY9.json.gz
2018-05-21 07:08:05      42110 20180521T0135Z_uOFgvOohWqh7pCKm.json.gz
2018-05-21 07:07:13      42502 20180521T0140Z_2TX8v5UumEV24fgg.json.gz
2018-05-21 07:17:28      10593 20180521T0140Z_UQVPTdRJ7OGIpeQu.json.gz
2018-05-21 07:09:28    4841248 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz
2018-05-21 07:12:32      58228 20180521T0140Z_j8gNtuBoG91ftY6J.json.gz
2018-05-21 07:13:29      33323 20180521T0140Z_jBjTddHPURNw0wDp.json.gz
2018-05-21 07:17:43      45539 20180521T0145Z_28lYKm6deu5M9fPf.json.gz
2018-05-21 07:17:21      37363 20180521T0145Z_MuvtNRJAgTgjsIjq.json.gz
2018-05-21 07:12:22    5245924 20180521T0145Z_kCpHWvq3Hlua803U.json.gz
2018-05-21 07:22:40      12516 20180521T0145Z_kkJAyDaUNgv2LFLK.json.gz
2018-05-21 07:12:23     109264 20180521T0145Z_zrOp34x50ibxvQNT.json.gz
2018-05-21 07:16:04    5257312 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz
2018-05-21 07:17:25     252268 20180521T0150Z_CIrZORIB3WFCVN9s.json.gz
2018-05-21 07:21:08    3119643 20180521T0150Z_ERpgl6PvHjkY90QB.json.gz

At first, the sincedb was stuck at 01:34, and this file was sitting in /tmp/logstash:
20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz, which is about 5 MB.

There was no processing and no log output beyond that timestamp for over 6 hours, so I stopped Logstash and set the sincedb to 01:37 to skip that file.

After restarting, Logstash got stuck on 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz, which is about 4 MB.

This pattern repeated until I had also skipped 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz from the list above, which is about 5 MB.
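For context, the manual skip works because the s3 input only fetches objects whose last-modified time is newer than the single timestamp stored in the sincedb, so bumping that timestamp past a stuck object makes Logstash ignore it on restart. A minimal sketch of that comparison (illustrative logic only, not the plugin's actual code; timestamps are hypothetical):

```python
from datetime import datetime, timezone

def objects_after_sincedb(objects, sincedb_time):
    # Keep only objects modified strictly after the sincedb timestamp,
    # mirroring how the s3 input decides what still needs processing.
    return [key for key, modified in objects if modified > sincedb_time]

objects = [
    # (key, last-modified) pairs, loosely based on the listing above
    ("20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz", datetime(2018, 5, 21, 1, 35, tzinfo=timezone.utc)),
    ("20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz", datetime(2018, 5, 21, 1, 40, tzinfo=timezone.utc)),
]

# Bumping the sincedb to 01:37 skips the stuck 01:35 object on restart:
bumped = datetime(2018, 5, 21, 1, 37, tzinfo=timezone.utc)
print(objects_after_sincedb(bumped and objects, bumped) if False else objects_after_sincedb(objects, bumped))
# → ['20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz']
```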

Steps to Reproduce:

  • Have the codec parse a file larger than 2 MB
  • The codec hangs

Please note:

  • Other s3 inputs (ELB and CloudFront logs) work fine in the same Logstash instance.
  • Filenames in the example above have been simplified to emphasize timestamps and file sizes.
