Skip to content

apm-agent-python and structlog - mapper_parsing_exception for event.dataset  #1351

@mikesdnielsen

Description

@mikesdnielsen

Describe the bug: When using structlog and with elasticapm and the log processer elasticapm.structlog_processor, we have recently seen Logstash refuse our logger with the following error:

[2021-10-05T12:10:10,746][WARN ][logstash.outputs.elasticsearch][main][a2a92c7c0ddf765b1969e7e8d4a302b6deca976af4c80a2d9706ccdf2486267b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2021.10.05", :routing=>nil}, {"stage_environment"=>"prod", "service.name"=>"flow", "company"=>"<PayerCompany: Hidden Company Name [Parent Company: Hidden Company Name [CRN: DKXXXXXXXX]] [CRN: DKXXXXXXXX]>", "@version"=>"1", "host"=>"167.71.1.240", "sentry"=>"skipped", "timestamp"=>"2021-10-05T12:10:00.483890Z", "logger"=>"account_service.models", "event.dataset"=>"flow", "event"=>"PayerCompany change state request", "level"=>"debug", "port"=>58652, "new"=>"APPROVED", "override"=>false, "@timestamp"=>2021-10-05T12:10:10.462Z, "old"=>"APPROVED", "modline"=>"account_service.models:159"}], :response=>{"index"=>{"_index"=>"logstash-2021.10.05", "_type"=>"_doc", "_id"=>"ST1cUHwBFM723LU2e_JV", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Could not dynamically add mapping for field [event.dataset]. Existing mapping for [event] must be of type object but found [text]."}}}}

We setup our structlog like this in the Django settings file:

processors = [
    structlog.stdlib.filter_by_level,
    structlog.stdlib.add_log_level,
    structlog.stdlib.add_logger_name,
    log_processors.add_module_and_lineno,
    log_processors.normalize_datatypes,
    log_processors.attach_environment,
    log_processors.timestamper,
    structlog_processor,  # this is the processor that seems to cause the error
    SentryJsonProcessor(level=logging.ERROR, tag_keys="__all__"),
    structlog.stdlib.PositionalArgumentsFormatter(),
    structlog.processors.StackInfoRenderer(),
    structlog.processors.format_exc_info,
    structlog.processors.UnicodeDecoder(),
    structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
]


# Structlog
structlog.configure(
    processors=processors,
    context_class=structlog.threadlocal.wrap_dict(dict),
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)

If we remove structlog_processor all loggers are received by Logstash with no problems.

If we write a small customer log processor that strips the event.dataset (e.g. del event_dict["event.dataset"] it also works fine again.

To Reproduce

  1. Setup structlog
  2. Add the elasticapm.structlog_processor to the list of processors for structlog
  3. Send a logger (e.g. logger.info("test")).

Environment (please complete the following information)

  • OS: Debian (slim)
  • Python version: 3.9.7
  • Framework and version [e.g. Django 2.1]: 3.2.8
  • APM Server version: 7.15
  • Agent version: N/A

Additional context

We use python-logstash-async for delivery of the logs to the logstash server.

requirements.txt (excerpts)

elastic-apm==6.5.0
elasticsearch==7.15.0
django-structlog==2.1.3
python-logstash-async==2.3.0
structlog==21.1.0
structlog-sentry==1.4.0

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions