Improve information about _stats index_failures #80802

@GlenRSmith

As it stands, when an exception occurs in InternalEngine.index, it bubbles up and, by way of a postIndex hook, two things happen:

  • if trace logging is enabled for the right class context, a trace log entry is written that includes the root-cause failure
  • the index_failed counter in the relevant InternalIndexingStats is incremented
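
To make the flow above concrete, here is a simplified model of it. This is not Elasticsearch source: the class name `IndexFailureHookSketch`, the method signature, and the use of `java.util.logging` (with `FINEST` standing in for trace) are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Level;
import java.util.logging.Logger;

// Simplified, hypothetical model of the behaviour described above.
// It is NOT the actual Elasticsearch implementation.
public class IndexFailureHookSketch {
    private static final Logger logger =
        Logger.getLogger(IndexFailureHookSketch.class.getName());

    // Stand-in for the index_failed counter kept in the indexing stats.
    private final LongAdder indexFailed = new LongAdder();

    // Hypothetical post-index hook: a null failure means the operation succeeded.
    public void postIndex(String docId, Exception failure) {
        if (failure == null) {
            return;
        }
        // The root cause is only visible when the finest (trace-like) level is on.
        if (logger.isLoggable(Level.FINEST)) {
            logger.log(Level.FINEST, "index-fail [" + docId + "]", failure);
        }
        // The counter is always bumped, but carries no information about the cause.
        indexFailed.increment();
    }

    public long indexFailedCount() {
        return indexFailed.sum();
    }

    public static void main(String[] args) {
        IndexFailureHookSketch hook = new IndexFailureHookSketch();
        hook.postIndex("1", null);
        hook.postIndex("2", new IllegalStateException("version conflict"));
        System.out.println("index_failed = " + hook.indexFailedCount());
    }
}
```

With default logging levels, the only observable effect of the failed operation is the counter change, which is the gap this issue is about.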

This makes it very challenging to investigate the cause of indexing failures. Even the most surgical setting of trace logging (I think it would be org.elasticsearch.index.shard.IndexShard?) in a production environment will, I expect, result in pretty massive amounts of logging.

I've been able to figure out, for example, that an org.elasticsearch.index.engine.VersionConflictEngineException will contribute to this count when, e.g., trying to update a document with a lagging version number, but only by suspecting that to be the case and testing it in isolation.

I'm not really sure how, exactly, I would prefer to see this improved. One approach would be to add granularity to those stats; that seems like a fairly high bar to clear, as it would be disruptive to the client-facing API. Changes to logging seem more palatable in that regard, and the lowest-hanging fruit might be to promote the [1] logger.trace in IndexShard.index to, at a minimum, logger.debug. Another approach would be to add logging at each of the places where a root-cause failure occurs. Of course, that would fan out the changes needed and make it more difficult to be certain that all relevant scenarios are covered.
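
As a rough illustration of what "adding granularity to those stats" could mean, the sketch below keeps a per-exception-type breakdown next to the single total. Everything in it (`FailureBreakdownSketch`, `onIndexFailure`, `breakdown`) is hypothetical and does not correspond to any existing Elasticsearch class or stats field.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: failure counts broken down by exception type.
// None of these names exist in Elasticsearch; they are for illustration only.
public class FailureBreakdownSketch {
    // Equivalent of today's single failure counter.
    private final LongAdder total = new LongAdder();
    // Additional breakdown keyed by the exception's simple class name.
    private final ConcurrentHashMap<String, LongAdder> byType = new ConcurrentHashMap<>();

    public void onIndexFailure(Exception failure) {
        total.increment();
        byType.computeIfAbsent(failure.getClass().getSimpleName(), k -> new LongAdder())
              .increment();
    }

    public long total() {
        return total.sum();
    }

    // Sorted snapshot of the per-type counts, suitable for rendering in a stats response.
    public Map<String, Long> breakdown() {
        Map<String, Long> snapshot = new TreeMap<>();
        byType.forEach((type, count) -> snapshot.put(type, count.sum()));
        return snapshot;
    }

    public static void main(String[] args) {
        FailureBreakdownSketch stats = new FailureBreakdownSketch();
        stats.onIndexFailure(new IllegalStateException("lagging version"));
        stats.onIndexFailure(new IllegalStateException("lagging version"));
        stats.onIndexFailure(new IllegalArgumentException("bad mapping"));
        System.out.println(stats.total() + " " + stats.breakdown());
    }
}
```

Keeping the existing total untouched and exposing the breakdown as an additional field is one way such a change might limit disruption to the client-facing API, though that trade-off would still need to be weighed.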

(I would contend that, regardless of any effort to address the general request I'm making, the individual places where relevant exceptions are raised should generate log entries, arguably at as high as warn level.)

Metadata

    Labels

    • :Distributed Indexing/CRUD: A catch all label for issues around indexing, updating and getting a doc by id. Not search.
    • >enhancement
    • Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
