
Changeset import to PostGIS changesets fails with OutOfMemoryError: Required array size too large #33

@CristianCantoro

Description


When running the changesets CLI command on a large .osm.bz2 changeset dump extracted with osmium, the process crashes with:

java.lang.OutOfMemoryError: Required array size too large

Note that importing the full planet file does not cause this issue.

Steps to reproduce

  1. Download the changesets planet file
  2. Prepare a large changeset dump in .osm.bz2 format (in my case, Italy). I have run the following command:
osmium changeset-filter \
  --bbox=6.240134062,35.5420306591,19.0018525438,47.2604346174 \
  -o changesets-260124-italy.osm.bz2 \
  changesets-260124.osm.bz2

This is the file I obtain and with which I can reproduce the error: changesets-260124-italy.osm.bz2 (390 MB).

  3. Launch a PostGIS container (as per the instructions in the README.md) with:
docker run \
    --name "ohsome_planet_changeset_db" \
    -e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
    -e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
    -e POSTGRES_DB=$OHSOME_PLANET_DB \
    -p 5432:5432 \
    -v postgis_data:/var/lib/postgresql/data \
    postgis/postgis:latest
  4. Try importing into PostGIS with:
java -Xmx52G -jar ohsome-planet-cli/target/ohsome-planet.jar changesets \
  --bz2 changesets-260124-italy.osm.bz2 \
  --changeset-db "jdbc:postgresql://localhost:5432/$OHSOME_PLANET_DB?user=$OHSOME_PLANET_DB_USER&password=$OHSOME_PLANET_DB_PASSWORD" \
  --create-tables \
  --overwrite

Actual behavior

The process throws:
java.lang.OutOfMemoryError: Required array size too large

Stack trace (excerpt):

java.lang.OutOfMemoryError: Required array size too large
  at java.base/java.io.InputStream.readNBytes(InputStream.java:420)
  at java.base/java.io.InputStream.readAllBytes(InputStream.java:349)
  at org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress(PBZ2Reader.java:46)
  ...

(full log)

After the exception is thrown, the process hangs and never exits.

Expected behavior

The changesets import should succeed or at least fail gracefully with a clear message.

Environment

  • osmium:
$ osmium --version
osmium version 1.16.0
libosmium version 2.20.0
Supported PBF compression types: none zlib lz4
  • Java:
$ java -version
openjdk version "21.0.9" 2025-10-21
OpenJDK Runtime Environment (build 21.0.9+10-Ubuntu-124.04)
OpenJDK 64-Bit Server VM (build 21.0.9+10-Ubuntu-124.04, mixed mode, sharing)
  • OS: Ubuntu 24.04
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:  Ubuntu 24.04.3 LTS
Release:  24.04
Codename: noble
$ uname -a
Linux [host] 6.8.0-94-generic #96-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan  9 20:36:55 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
  • Database:

Postgres (w/ Docker):

# SELECT version();
                                                           version                                                           
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.5 (Debian 17.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

PostGIS:

# SELECT extversion
FROM pg_catalog.pg_extension
WHERE extname='postgis';
 extversion 
------------
 3.5.2
(1 row)

Workaround

Querying ChatGPT, I was able to find a workaround to this problem by repacking the .bz2 file:

bunzip2 -c changesets-260124-italy.osm.bz2 \
  | pbzip2 -b50 -p8 > changesets-260124-italy.repacked.osm.bz2

In this way, I am able to import the changeset extract without issue.

The stack trace shows the exception originates from java.io.InputStream.readAllBytes() inside org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress, suggesting the implementation tries to read the decompressed content into a single byte array.

This failure happens even with a large heap (-Xmx52G), which indicates the problem is not heap exhaustion but the JDK's limit on byte-array size: Java arrays are int-indexed, so readAllBytes() can never return more than roughly 2 GB, and it throws this OutOfMemoryError when the decompressed content would exceed that limit.
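To make the limit concrete, here is a minimal, self-contained sketch (hypothetical, not taken from ohsome-planet) that reproduces the same error against a fake 3 GiB stream; the heap size is irrelevant, because the cap comes from Java's int-indexed arrays:

import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo: a fake stream yielding 3 GiB of zero bytes.
// readAllBytes() delegates to readNBytes(), which accumulates the data and
// throws java.lang.OutOfMemoryError: Required array size too large once the
// result would exceed the ~2 GB array limit, regardless of -Xmx.
// Run with e.g. -Xmx4G to reach the error.
public class ReadAllBytesLimit {
    public static void main(String[] args) throws IOException {
        InputStream huge = new InputStream() {
            long remaining = 3L * 1024 * 1024 * 1024; // pretend 3 GiB of data

            @Override
            public int read() {
                if (remaining <= 0) return -1;
                remaining--;
                return 0;
            }

            @Override
            public int read(byte[] b, int off, int len) {
                if (remaining <= 0) return -1;
                int n = (int) Math.min(len, remaining);
                remaining -= n;
                return n; // buffer contents stay zero; only the length matters here
            }
        };
        byte[] all = huge.readAllBytes(); // throws before ever returning
        System.out.println(all.length);
    }
}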

If using Apache Commons Compress for bzip2, it may also help to ensure concatenated bzip2 streams are supported (common in “multi-stream” .bz2 files), but the core issue here appears to be readAllBytes() forcing a single huge allocation.
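If the reader does use (or could use) Commons Compress, one possible direction, sketched here only as an assumption about the surrounding code, is to keep the decompressed data as a stream and hand it straight to a streaming XML parser, so the full decompressed content never has to fit into a single byte[]. The second constructor argument of BZip2CompressorInputStream enables decoding of concatenated (multi-stream) .bz2 files such as those produced by pbzip2:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

// Sketch only: stream-decompress a (possibly multi-stream) .osm.bz2 changeset
// dump and count <changeset> elements without ever calling readAllBytes().
public class StreamingChangesetCount {
    public static void main(String[] args) throws IOException, XMLStreamException {
        Path input = Path.of(args[0]); // e.g. changesets-260124-italy.osm.bz2
        try (InputStream raw = new BufferedInputStream(Files.newInputStream(input));
             // "true" = keep decompressing across concatenated bzip2 streams
             InputStream bz2 = new BZip2CompressorInputStream(raw, true)) {
            XMLStreamReader xml = XMLInputFactory.newFactory().createXMLStreamReader(bz2);
            long changesets = 0;
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "changeset".equals(xml.getLocalName())) {
                    changesets++;
                }
            }
            System.out.println("changesets: " + changesets);
        }
    }
}

This ignores whatever per-block parallelism PBZ2Reader may be aiming for; it is only meant to show that the error is avoidable by not materializing the decompressed content in one allocation.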

The workaround works because pbzip2 writes a multi-stream .bz2 in which each block is compressed as an independent bzip2 stream; each stream decompresses to far less than 2 GB, so no single decompression step needs an array above the JDK limit.
