Describe the bug
When using @BatchInsert(batchSize = N), Doma currently builds a single PreparedSql object containing all parameter sets for the entire input list, regardless of batchSize.
This causes an OutOfMemoryError when inserting very large lists, because the framework constructs one huge SQL representation in memory before executing any JDBC batches.
Example stack trace:
java.lang.OutOfMemoryError: Java heap space
at org.seasar.doma.jdbc.query.AutoBatchInsertQuery.prepareSql(AutoBatchInsertQuery.java:192)
at org.seasar.doma.jdbc.query.AutoBatchInsertQuery.prepare(AutoBatchInsertQuery.java:102)
...
Expected behavior
batchSize should not only control how many rows are sent per JDBC execution, but also how much SQL is prepared in memory at once.
Ideally, Doma would:
- Partition the entity list internally (e.g. into chunks of batchSize).
- Build PreparedSql for each chunk separately.
- Execute and discard that SQL before moving to the next chunk.
This way memory usage stays bounded, and developers would not need to partition the list manually before calling the DAO (a minimal sketch of the chunking step is shown below).
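For illustration, the chunking step itself is mechanical. A minimal sketch in plain Java (the helper name partition is made up for this example) that splits a list into consecutive subList views of at most chunkSize elements:

import java.util.ArrayList;
import java.util.List;

final class Chunks {

    // Hypothetical helper: returns consecutive subList views of at most
    // chunkSize elements each; the elements themselves are not copied.
    static <E> List<List<E>> partition(List<E> list, int chunkSize) {
        List<List<E>> chunks = new ArrayList<>();
        for (int from = 0; from < list.size(); from += chunkSize) {
            chunks.add(list.subList(from, Math.min(from + chunkSize, list.size())));
        }
        return chunks;
    }
}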
To Reproduce
Entity
@Entity
public class Person {

    @Id
    public Long id;

    public String name;

    public Person() {
        // Doma requires a non-private no-arg constructor for mutable entities
    }

    public Person(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}
DAO
@Dao
public interface PersonDao {

    @BatchInsert(batchSize = 1000)
    int[] insertAll(List<Person> people);
}
Test Case
public class BatchInsertTest {

    private final PersonDao dao = new PersonDaoImpl(/* inject config */);

    @Test
    public void testLargeBatchInsert() {
        // Generate 500k rows
        List<Person> people = new ArrayList<>();
        for (long i = 1; i <= 500_000; i++) {
            people.add(new Person(i, "name-" + i));
        }
        // This causes OutOfMemoryError even though batchSize=1000
        dao.insertAll(people);
    }
}
Environment
- Doma: 3.6.0
- IDE: Eclipse
- Java version: JDK21
- Build tool: Maven
Workaround
Currently, the only way to avoid OOM is to partition the input manually:
// Guava's Lists.partition returns consecutive sublist views of at most 1000 elements
for (List<Person> chunk : Lists.partition(people, 1000)) {
    dao.insert(chunk); // a plain @BatchInsert method without batchSize
}
This works fine, but it duplicates logic that could be handled inside Doma.
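Until this is handled inside Doma, the partitioning can at least be hidden behind a DAO default method so that call sites stay clean. A sketch (the method name insertAllChunked is made up; insert is assumed to be a plain @BatchInsert method as in the workaround above):

import java.util.List;
import org.seasar.doma.BatchInsert;
import org.seasar.doma.Dao;

@Dao
public interface PersonDao {

    @BatchInsert
    int[] insert(List<Person> people);

    // Hypothetical convenience method: partitions the input, delegates to
    // insert() per chunk, and concatenates the per-chunk update counts.
    default int[] insertAllChunked(List<Person> people, int chunkSize) {
        int[] results = new int[people.size()];
        int offset = 0;
        for (int from = 0; from < people.size(); from += chunkSize) {
            List<Person> chunk = people.subList(from, Math.min(from + chunkSize, people.size()));
            int[] counts = insert(chunk);
            System.arraycopy(counts, 0, results, offset, counts.length);
            offset += counts.length;
        }
        return results;
    }
}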
Proposed solution
Update AutoBatchInsertQuery / BatchInsertCommand so that SQL is prepared lazily per batch:
for (List<E> chunk : partition(entities, batchSize)) {
    PreparedSql sql = prepareSqlForChunk(chunk);
    jdbc.execute(sql);
}
This would:
- Prevent OutOfMemoryError on large lists.
- Maintain the same semantics for developers (@BatchInsert(batchSize = N)).
- Remain backward-compatible.
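For comparison, this is the bounded-memory pattern that plain JDBC batching already allows when executeBatch is called every chunkSize rows. A rough sketch with made-up table and column names, assuming an open Connection; it illustrates the target behavior, not Doma's internals:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcChunkedInsert {

    // Adds rows to the batch and flushes it every chunkSize rows, so the
    // driver never holds more than chunkSize parameter sets at a time.
    static void insertInChunks(Connection con, List<Person> people, int chunkSize) throws SQLException {
        String sql = "insert into person (id, name) values (?, ?)"; // table/column names assumed
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int pending = 0;
            for (Person p : people) {
                ps.setLong(1, p.id);
                ps.setString(2, p.name);
                ps.addBatch();
                if (++pending == chunkSize) {
                    ps.executeBatch(); // send this chunk and discard it
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remainder
            }
        }
    }
}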