Description
When running the changesets CLI command on a large .osm.bz2 changeset dump extracted with osmium, the process crashes with:
java.lang.OutOfMemoryError: Required array size too large
Note that importing the full planet file does not cause this issue.
Steps to reproduce
- Download the changesets planet file
- Prepare a large changeset dump in .osm.bz2 format (in my case, Italy). I have run the following command:
osmium changeset-filter \
--bbox=6.240134062,35.5420306591,19.0018525438,47.2604346174 \
-o changesets-260124-italy.osm.bz2 \
changesets-260124.osm.bz2
This is the file I obtain and with which I can reproduce the error: changesets-260124-italy.osm.bz2 (390 MB).
- Launch a PostGIS container (as per the instructions in the README.md) with:
docker run \
--name "ohsome_planet_changeset_db" \
-e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
-e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
-e POSTGRES_DB=$OHSOME_PLANET_DB \
-p 5432:5432 \
-v postgis_data:/var/lib/postgresql/data \
postgis/postgis:latest
- Try importing into PostGIS with:
java -Xmx52G -jar ohsome-planet-cli/target/ohsome-planet.jar changesets \
--bz2 changesets-260124-italy.osm.bz2 \
--changeset-db "jdbc:postgresql://localhost:5432/$OHSOME_PLANET_DB?user=$OHSOME_PLANET_DB_USER&password=$OHSOME_PLANET_DB_PASSWORD" \
--create-tables \
--overwrite
Actual behavior
The process throws:
java.lang.OutOfMemoryError: Required array size too large
Stack trace (excerpt):
java.lang.OutOfMemoryError: Required array size too large
at java.base/java.io.InputStream.readNBytes(InputStream.java:420)
at java.base/java.io.InputStream.readAllBytes(InputStream.java:349)
at org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress(PBZ2Reader.java:46)
...
(full log)
After the exception is thrown, the process hangs and never terminates.
Expected behavior
The changesets import should succeed or at least fail gracefully with a clear message.
Environment
- ohsome-planet: commit 22721f27fdd4b7120c502549a1fb6f2908d8b02b, tag 1.2.0 (I cloned and checked out the tag, see issue Build of ohsome-planet-cli fails for release 1.2.0 #29)
- osmium:
osmium version 1.16.0
libosmium version 2.20.0
Supported PBF compression types: none zlib lz4
- Java:
$ java -version
openjdk version "21.0.9" 2025-10-21
OpenJDK Runtime Environment (build 21.0.9+10-Ubuntu-124.04)
OpenJDK 64-Bit Server VM (build 21.0.9+10-Ubuntu-124.04, mixed mode, sharing)
- OS: Ubuntu 24.04
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
$ uname -a
Linux [host] 6.8.0-94-generic #96-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 20:36:55 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
- Database:
Postgres (w/ Docker):
# SELECT version();
version
-----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.5 (Debian 17.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
PostGIS:
# SELECT extversion
FROM pg_catalog.pg_extension
WHERE extname='postgis';
extversion
------------
3.5.2
(1 row)
Workaround
With some help from ChatGPT, I found a workaround for this problem: repacking the .bz2 file:
bunzip2 -c changesets-260124-italy.osm.bz2 \
| pbzip2 -b50 -p8 > changesets-260124-italy.repacked.osm.bz2
In this way, I am able to import the changeset extract without issue.
The stack trace shows that the exception originates from java.io.InputStream.readAllBytes() inside org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress, which suggests the implementation reads the entire decompressed content into a single byte array.
This failure happens even with a large heap (-Xmx52G), which indicates the problem is not heap exhaustion but the JDK's maximum array size: Java arrays are indexed by int, so a single byte[] cannot hold much more than ~2 GiB, and readAllBytes() fails as soon as the decompressed data exceeds that limit.
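For illustration, here is a minimal, self-contained sketch (not taken from ohsome-planet) that reproduces the same error message by pushing a little more than 2 GiB through InputStream.readAllBytes(). It needs a few GiB of heap to get that far (e.g. -Xmx4g), which shows the heap size is not the limiting factor:

import java.io.IOException;
import java.io.InputStream;

public class ReadAllBytesLimitDemo {
    public static void main(String[] args) throws IOException {
        // Synthetic stream of ~3 GiB of zero bytes; only the total length matters.
        InputStream threeGiB = new InputStream() {
            private long remaining = 3L * 1024 * 1024 * 1024;

            @Override
            public int read() {
                if (remaining == 0) return -1;
                remaining--;
                return 0;
            }

            @Override
            public int read(byte[] b, int off, int len) {
                if (len == 0) return 0;
                if (remaining == 0) return -1;
                int n = (int) Math.min(len, remaining);
                remaining -= n;
                return n; // buffer contents are left untouched; the demo only needs the length
            }
        };

        // Throws java.lang.OutOfMemoryError: Required array size too large
        // as soon as the accumulated data no longer fits into a single byte[].
        byte[] all = threeGiB.readAllBytes();
        System.out.println(all.length);
    }
}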
If using Apache Commons Compress for bzip2, it may also help to ensure concatenated bzip2 streams are supported (common in “multi-stream” .bz2 files), but the core issue here appears to be readAllBytes() forcing a single huge allocation.
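As a sketch only (this is not the project's actual code, and whether ohsome-planet uses Commons Compress here is my assumption), a reader based on Apache Commons Compress' BZip2CompressorInputStream could stream the decompressed data instead of materializing it, and handle multi-stream files at the same time via the decompressConcatenated flag:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

public class StreamingBz2Example {
    // Returns a streaming view of the decompressed data instead of one byte[],
    // so the caller can parse it incrementally and no >2 GiB allocation is needed.
    static InputStream openBz2(Path file) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        // 'true' = decompressConcatenated: keep reading across multiple
        // back-to-back bzip2 streams (as produced e.g. by pbzip2).
        return new BZip2CompressorInputStream(raw, true);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = openBz2(Path.of(args[0]))) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // a real reader would parse the XML here chunk by chunk
            }
            System.out.println("decompressed bytes: " + total);
        }
    }
}

With a streaming approach along these lines, the size of a single decompressed stream should no longer matter.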
The workaround presumably works because pbzip2 repacks the data into many small, independently compressed bzip2 streams, so each decompressed chunk stays well below the ~2 GB array limit.