
@BatchInsert builds SQL for the entire input list at once → OutOfMemoryError on large inserts #1444

@rougou

Description


Describe the bug
When using @BatchInsert(batchSize = N), Doma currently builds a single PreparedSql object containing all parameter sets for the entire input list, regardless of batchSize.

This causes an OutOfMemoryError when inserting very large lists, because the framework constructs one huge SQL representation in memory before any JDBC batches are executed.

Example stack trace:

java.lang.OutOfMemoryError: Java heap space
    at org.seasar.doma.jdbc.query.AutoBatchInsertQuery.prepareSql(AutoBatchInsertQuery.java:192)
    at org.seasar.doma.jdbc.query.AutoBatchInsertQuery.prepare(AutoBatchInsertQuery.java:102)
    ...

Expected behavior
batchSize should control not only how many rows are sent per JDBC execution, but also how much SQL is prepared in memory at once.

Ideally, Doma would:

  1. Partition the entity list internally (e.g. into chunks of batchSize).
  2. Build PreparedSql for each chunk separately.
  3. Execute and discard that SQL before moving to the next chunk.

This way, memory usage stays bounded and developers do not need to manually partition the list before calling the DAO.

To Reproduce

Entity

@Entity
public class Person {
    @Id
    public Long id;

    public String name;

    public Person(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

DAO

@Dao
public interface PersonDao {

    @BatchInsert(batchSize = 1000)
    int[] insertAll(List<Person> people);
}

Test Case

public class BatchInsertTest {

    private final PersonDao dao = new PersonDaoImpl(/* inject config */);

    @Test
    public void testLargeBatchInsert() {
        // Generate 500k rows
        List<Person> people = new ArrayList<>();
        for (long i = 1; i <= 500_000; i++) {
            people.add(new Person(i, "name-" + i));
        }

        // This causes OutOfMemoryError even though batchSize=1000
        dao.insertAll(people);
    }
}

Environment

  • Doma: 3.6.0
  • IDE: Eclipse
  • Java version: JDK 21
  • Build tool: Maven

Workaround

Currently, the only way to avoid OOM is to partition the input manually:

// Guava's Lists.partition splits the input into fixed-size sub-list views
for (List<Person> chunk : Lists.partition(people, 1000)) {
    dao.insert(chunk); // a separate DAO method annotated with @BatchInsert (no batchSize)
}

This works fine, but it duplicates logic that could be handled inside Doma.
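For completeness, here is a self-contained sketch of the same workaround using only java.util (no Guava). It assumes an additional DAO method insert(List&lt;Person&gt;) annotated with plain @BatchInsert, which is not shown above; the class and constant names are illustrative.

import java.util.List;

public class ChunkedInsert {

    private static final int CHUNK_SIZE = 1_000;

    private final PersonDao dao;

    public ChunkedInsert(PersonDao dao) {
        this.dao = dao;
    }

    // Inserts people in fixed-size chunks so that only one chunk's worth of SQL
    // is prepared at a time. subList returns views, so no chunk copies are made.
    // Assumes PersonDao declares insert(List<Person>) with @BatchInsert (no batchSize).
    public void insertAll(List<Person> people) {
        for (int from = 0; from < people.size(); from += CHUNK_SIZE) {
            int to = Math.min(from + CHUNK_SIZE, people.size());
            dao.insert(people.subList(from, to));
        }
    }
}

With this approach, peak memory is roughly one chunk of prepared SQL plus the original entity list, but every caller has to repeat the chunking logic.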

Proposed solution

Update AutoBatchInsertQuery / BatchInsertCommand so that SQL is prepared lazily per batch:

// Sketch: prepare and execute one chunk at a time instead of the whole list
// (prepareSqlForChunk and jdbc.execute are illustrative names, not existing API)
for (List<E> chunk : partition(entities, batchSize)) {
    PreparedSql sql = prepareSqlForChunk(chunk);
    jdbc.execute(sql);
}

This would:

  • Prevent OutOfMemoryError on large lists.
  • Maintain the same semantics for developers (@BatchInsert(batchSize = N)).
  • Remain backward-compatible.
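As an illustration of the partition(entities, batchSize) step in the sketch above, a minimal helper could look like the following. This is a sketch only; the class and method names are not part of Doma's API, and the real change would live inside AutoBatchInsertQuery / BatchInsertCommand.

import java.util.ArrayList;
import java.util.List;

// Illustrative only, not Doma API: splits a list into successive sub-list views
// of at most chunkSize elements.
final class ListPartition {

    private ListPartition() {}

    static <E> List<List<E>> partition(List<E> source, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        List<List<E>> chunks = new ArrayList<>();
        for (int from = 0; from < source.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, source.size());
            chunks.add(source.subList(from, to)); // views, no element copying
        }
        return chunks;
    }
}

Because subList returns views backed by the original list, the helper itself adds only the overhead of the chunk list; the memory savings come from preparing and discarding the SQL for one chunk at a time.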
