Description
When running the changesets CLI command on a large .osm.bz2 changeset dump extracted with osmium, the process crashes with:
java.lang.OutOfMemoryError: Required array size too large
Note that importing the full planet file does not cause this issue.
Steps to reproduce
- Download the changesets planet file
- Prepare a large changeset dump in .osm.bz2 format (in my case, Italy). I have run the following command:
osmium changeset-filter \
--bbox=6.240134062,35.5420306591,19.0018525438,47.2604346174 \
-o changesets-260124-italy.osm.bz2 \
changesets-260124.osm.bz2
This is the file I obtain and with which I can reproduce the error: changesets-260124-italy.osm.bz2 (390 MB).
- Launch a PostGIS container (as per the instructions in the README.md) with:
docker run \
--name "ohsome_planet_changeset_db" \
-e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
-e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
-e POSTGRES_DB=$OHSOME_PLANET_DB \
-p 5432:5432 \
-v postgis_data:/var/lib/postgresql/data \
postgis/postgis:latest
- Try importing into PostGIS with:
java -Xmx52G -jar ohsome-planet-cli/target/ohsome-planet.jar changesets \
--bz2 changesets-260124-italy.osm.bz2 \
--changeset-db "jdbc:postgresql://localhost:5432/$OHSOME_PLANET_DB?user=$OHSOME_PLANET_DB_USER&password=$OHSOME_PLANET_DB_PASSWORD" \
--create-tables \
--overwrite
Actual behavior
The process throws:
java.lang.OutOfMemoryError: Required array size too large
Stack trace (excerpt):
java.lang.OutOfMemoryError: Required array size too large
at java.base/java.io.InputStream.readNBytes(InputStream.java:420)
at java.base/java.io.InputStream.readAllBytes(InputStream.java:349)
at org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress(PBZ2Reader.java:46)
...
(full log)
After the exception is thrown, the process hangs and never terminates.
Expected behavior
The changesets import should succeed or at least fail gracefully with a clear message.
Environment
- ohsome-planet: commit 22721f27fdd4b7120c502549a1fb6f2908d8b02b, tag 1.2.0 (I cloned and checked out the tag, see issue Build of ohsome-planet-cli fails for release 1.2.0 #29)
- osmium:
osmium version 1.16.0
libosmium version 2.20.0
Supported PBF compression types: none zlib lz4
- Java:
$ java -version
openjdk version "21.0.9" 2025-10-21
OpenJDK Runtime Environment (build 21.0.9+10-Ubuntu-124.04)
OpenJDK 64-Bit Server VM (build 21.0.9+10-Ubuntu-124.04, mixed mode, sharing)
- OS: Ubuntu 24.04
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
$ uname -a
Linux [host] 6.8.0-94-generic #96-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 20:36:55 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
- Database:
Postgres (w/ Docker):
# SELECT version();
version
-----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.5 (Debian 17.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
PostGIS:
# SELECT extversion
FROM pg_catalog.pg_extension
WHERE extname='postgis';
extversion
------------
3.5.2
(1 row)
Workaround
With some help from ChatGPT, I found a workaround for this problem: repacking the .bz2 file:
bunzip2 -c changesets-260124-italy.osm.bz2 \
| pbzip2 -b50 -p8 > changesets-260124-italy.repacked.osm.bz2
In this way, I am able to import the changeset extract without issue.
The stack trace shows that the exception originates from java.io.InputStream.readAllBytes() inside org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress, which suggests the implementation reads the entire decompressed content into a single byte array.
This failure happens even with a large heap (-Xmx52G), which indicates the problem is not heap exhaustion but the JDK's maximum array size: Java arrays are indexed by int, so a single byte[] cannot hold much more than ~2 GiB, and readAllBytes() fails as soon as the decompressed data exceeds that limit.
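For illustration, here is a minimal, self-contained sketch (not taken from ohsome-planet) that reproduces the same error message by pushing a little more than 2 GiB through InputStream.readAllBytes(). It needs a few GiB of heap to get that far (e.g. -Xmx4g), which shows the heap size is not the limiting factor:

import java.io.IOException;
import java.io.InputStream;

public class ReadAllBytesLimitDemo {
    public static void main(String[] args) throws IOException {
        // Synthetic stream of ~3 GiB of zero bytes; only the total length matters.
        InputStream threeGiB = new InputStream() {
            private long remaining = 3L * 1024 * 1024 * 1024;

            @Override
            public int read() {
                if (remaining == 0) return -1;
                remaining--;
                return 0;
            }

            @Override
            public int read(byte[] b, int off, int len) {
                if (len == 0) return 0;
                if (remaining == 0) return -1;
                int n = (int) Math.min(len, remaining);
                remaining -= n;
                return n; // buffer contents are left untouched; the demo only needs the length
            }
        };

        // Throws java.lang.OutOfMemoryError: Required array size too large
        // as soon as the accumulated data no longer fits into a single byte[].
        byte[] all = threeGiB.readAllBytes();
        System.out.println(all.length);
    }
}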
If using Apache Commons Compress for bzip2, it may also help to ensure concatenated bzip2 streams are supported (common in “multi-stream” .bz2 files), but the core issue here appears to be readAllBytes() forcing a single huge allocation.
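As a sketch only (this is not the project's actual code, and whether ohsome-planet uses Commons Compress here is my assumption), a reader based on Apache Commons Compress' BZip2CompressorInputStream could stream the decompressed data instead of materializing it, and handle multi-stream files at the same time via the decompressConcatenated flag:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

public class StreamingBz2Example {
    // Returns a streaming view of the decompressed data instead of one byte[],
    // so the caller can parse it incrementally and no >2 GiB allocation is needed.
    static InputStream openBz2(Path file) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        // 'true' = decompressConcatenated: keep reading across multiple
        // back-to-back bzip2 streams (as produced e.g. by pbzip2).
        return new BZip2CompressorInputStream(raw, true);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = openBz2(Path.of(args[0]))) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // a real reader would parse the XML here chunk by chunk
            }
            System.out.println("decompressed bytes: " + total);
        }
    }
}

With a streaming approach along these lines, the size of a single decompressed stream should no longer matter.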
The workaround presumably works because pbzip2 repacks the data into many small, independently compressed bzip2 streams, so each decompressed chunk stays well below the ~2 GB array limit.