Bulk index performance decays substantially (and may eventually fail) on large data sets #53

@Trebla7th

Description

NOTE: This was discovered against MySQL, where loading data with an offset gets slower as the offset grows. It may not be an issue with other databases.

When indexing a domain class with more than 1 million records, index performance decays and indexing eventually fails. This is caused by the way data is loaded in ElasticSearchService.doBulkRequest(). The line:

List<Class<?>> results = domainClass.listOrderById([offset: offset, max: max, readOnly: true, sort: 'id', order: "asc"])

does a poor job of loading data as the offset increases.
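To illustrate why the slowdown compounds, here is a rough cost model in plain Groovy. It is only a sketch, and it assumes that the database reads and discards the first `offset` rows of each page before returning `max` rows, which is how MySQL typically handles LIMIT with an offset. The function names and the batch size of 500 are hypothetical and are not taken from the plugin.

```groovy
// Hypothetical cost model: assumes each page query reads and discards
// `offset` rows before returning up to `max` rows.
long rowsReadWithOffsetPaging(long total, long max) {
    long rowsRead = 0
    for (long offset = 0; offset < total; offset += max) {
        long pageSize = Math.min(max, total - offset)
        rowsRead += offset + pageSize   // skipped rows plus returned rows
    }
    return rowsRead
}

// Baseline: every row is read exactly once (for example, when each batch
// is fetched by an explicit list of ids).
long rowsReadOnce(long total) {
    return total
}

long total = 1000000L
long max = 500L   // hypothetical batch size

println "offset paging: ${rowsReadWithOffsetPaging(total, max)} rows read"
println "read once:     ${rowsReadOnce(total)} rows read"
```

Under this model the work per page grows with the offset, so the total work grows roughly with the square of the record count, while a by-id approach stays proportional to the record count.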

Locally, we fixed this by making the following changes (corporate policy prevents me from submitting an actual pull request to this project, but I can suggest the fix... bureaucracy!!!!)

def idResults = domainClass.createCriteria().list {
    projections {
        property 'id'
    }
    order("id", "asc")
}

..snip..

// The loop
idResults?.collate(max)?.eachWithIndex { subList, i ->

    // Other stuff here, then load the actual domains to index like this
    def results = domainClass.createCriteria().list {
        'in'('id', subList)
    }

    // everything else
}
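The batching pattern in the fix above can be tried without a database. The following is a minimal sketch in plain Groovy, with an in-memory map standing in for the table. The map, the record values, and the batch size of 4 are all hypothetical. Only `collate` and `eachWithIndex` mirror the suggested change.

```groovy
// Hypothetical in-memory stand-in for the table: id -> record
Map<Long, String> table = (1L..10L).collectEntries { [(it): "record-${it}".toString()] }

// Step 1: fetch only the ids in ascending order
// (stands in for the 'id' projection query)
List<Long> ids = table.keySet().sort()

int max = 4   // hypothetical batch size

// collate splits the id list into consecutive sub-lists of at most `max` ids
List<List<Long>> batches = ids.collate(max)

// Step 2: load each batch by its explicit ids
// (stands in for the 'in'('id', subList) criteria)
List<Integer> batchSizes = []
batches.eachWithIndex { List<Long> subList, int i ->
    List<String> results = subList.collect { table[it] }
    batchSizes << results.size()
    println "batch ${i}: ids ${subList.first()}..${subList.last()} (${results.size()} records)"
}
```

One trade-off to keep in mind is that this approach holds the full id list in memory for the duration of the run. That is usually small compared with the domain objects themselves, but it is worth checking for very large tables.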

Task List

  • Steps to reproduce provided
  • [N/A] Stacktrace (if present) provided
  • [N/A] Example that reproduces the problem uploaded to Github
  • Full description of the issue provided (see below)

Steps to Reproduce

  1. Create 1 million or more domain objects to be indexed
  2. Start application to index (or trigger an index after startup)
  3. Observe that each subsequent iteration of the bulk loop takes longer than the last

Expected Behaviour

Indexing should continue at a consistent pace regardless of the number of records.

Actual Behaviour

Indexing performance decays linearly: each iteration is slower than the last, until data connections eventually start timing out.

Environment Information

  • Operating System: RHEL, MacOS Mojave
  • GORM Version: 7.0.2.RELEASE
  • Grails Version (if using Grails): 4.0.3
  • JDK Version:
java -version
openjdk version "1.8.0_192"
OpenJDK Runtime Environment (Zulu 8.33.0.1-macosx) (build 1.8.0_192-b01)
OpenJDK 64-Bit Server VM (Zulu 8.33.0.1-macosx) (build 25.192-b01, mixed mode)

Example Application

  • N/A
