
[Bug] Bookkeeper concurrent containers has concurrency issues #4318

@dao-jun


BUG REPORT

Describe the bug

This issue was introduced by #3074.

ConcurrentLongHashMap, ConcurrentLongHashSet, ConcurrentLongLongHashMap, ConcurrentLongLongPairHashMap, ConcurrentOpenHashMap and ConcurrentOpenHashSet
have the same issue as the one described in #4316.

The issue is caused by the rehash method in these classes. Their inner class Section extends StampedLock as a performance optimization.

To optimize read operations, entries are read from the arrays without acquiring the lock, as in the following code (copied from ConcurrentLongHashMap):

        V get(long key, int keyHash) {
            int bucket = keyHash;

            long stamp = tryOptimisticRead();
            boolean acquiredLock = false;

            try {
                while (true) {
                    int capacity = this.capacity;
                    bucket = signSafeMod(bucket, capacity);

                    // First try optimistic locking
                    long storedKey = keys[bucket];
                    V storedValue = values[bucket];
                    // ... (rest of the loop omitted)
                }
                // ... (omitted)
            }
            // ... (omitted)

The problem is that

                    int capacity = this.capacity;
                    bucket = signSafeMod(bucket, capacity);

and

                    // First try optimistic locking
                    long storedKey = keys[bucket];
                    V storedValue = values[bucket];

are not executed atomically.

If the rehash method is triggered in another thread after the bucket has been calculated, and finishes before the entry is read from the array, an ArrayIndexOutOfBoundsException can occur: #3074 introduced shrinking, so the capacity of the new arrays can be smaller than before and the stale bucket index may be out of range.
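The interleaving can be replayed deterministically in a single thread. The sketch below is a simplified, hypothetical stand-in for a Section (the class `RacySection` and the methods `shrinkTo` and `replayRace` are illustrative names, not BookKeeper code), assuming the capacity is a power of two. It computes the bucket against the old capacity, lets a shrinking rehash run, and then indexes the new, smaller array.

```java
import java.util.concurrent.locks.StampedLock;

/** Hypothetical, simplified stand-in for a Section; not BookKeeper code. */
class RacySection extends StampedLock {
    volatile long[] keys = new long[16];
    volatile int capacity = 16;

    static int signSafeMod(long n, int max) {
        return (int) (n & (max - 1)); // assumes max is a power of two
    }

    /** Simulates a rehash that shrinks the array while holding the write lock. */
    void shrinkTo(int newCapacity) {
        long stamp = writeLock();
        try {
            keys = new long[newCapacity];
            capacity = newCapacity;
        } finally {
            unlockWrite(stamp);
        }
    }

    /** Replays the bad interleaving; returns true if the read overflowed the array. */
    static boolean replayRace() {
        RacySection s = new RacySection();
        long stamp = s.tryOptimisticRead();
        // Reader: bucket is computed against the old capacity (16).
        int bucket = signSafeMod(13, s.capacity);
        // "Other thread": the rehash completes and shrinks the array to 4 slots.
        s.shrinkTo(4);
        try {
            // Reader: index 13 is now applied to an array of length 4.
            long storedKey = s.keys[bucket];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exception is thrown before validate(stamp) gets a chance
            // to reject the stale optimistic read.
            return !s.validate(stamp);
        }
    }
}
```

The point of the replay is that the optimistic stamp does not help here: validation would have detected the concurrent write, but the out-of-range array access fails first.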

The issue can be fixed with the same pattern; see https://github.com/apache/bookkeeper/pull/4317/files#diff-1214718241c7ea351d22fa5806ba112c64d64183fe2f091976d6165c4573d05a
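One way to close this kind of window, sketched here as an assumption rather than as a description of what the linked PR does, is to copy the array reference into a local variable once and derive the capacity from that same array. The bucket and the array it indexes then always agree, and `validate(stamp)` only has to decide whether the value read is stale. `SnapshotSection`, `put`, `shrinkTo`, `readSlot` and `replayIsSafe` are hypothetical names; a real map would need to snapshot both its keys and values arrays.

```java
import java.util.concurrent.locks.StampedLock;

/** Hypothetical sketch of a snapshot-based optimistic read; not BookKeeper code. */
class SnapshotSection extends StampedLock {
    private volatile long[] keys = new long[16];

    static int signSafeMod(long n, int max) {
        return (int) (n & (max - 1)); // assumes max is a power of two
    }

    void put(long keyHash, long value) {
        long stamp = writeLock();
        try {
            keys[signSafeMod(keyHash, keys.length)] = value;
        } finally {
            unlockWrite(stamp);
        }
    }

    /** Simulates a shrinking rehash (entries are dropped for brevity). */
    void shrinkTo(int newCapacity) {
        long stamp = writeLock();
        try {
            keys = new long[newCapacity];
        } finally {
            unlockWrite(stamp);
        }
    }

    long readSlot(long keyHash) {
        long stamp = tryOptimisticRead();
        long[] snapshot = this.keys;                         // read the reference once
        int bucket = signSafeMod(keyHash, snapshot.length);  // capacity from the same array
        long stored = snapshot[bucket];                      // always in bounds for snapshot
        if (validate(stamp)) {
            return stored;
        }
        // The optimistic read raced with a writer: retry under the read lock.
        stamp = readLock();
        try {
            long[] current = this.keys;
            return current[signSafeMod(keyHash, current.length)];
        } finally {
            unlockRead(stamp);
        }
    }

    /** Replays the interleaving from the report; true if no overflow and the read is rejected. */
    static boolean replayIsSafe() {
        SnapshotSection s = new SnapshotSection();
        long stamp = s.tryOptimisticRead();
        long[] snapshot = s.keys;
        int bucket = signSafeMod(13, snapshot.length);
        s.shrinkTo(4);                  // rehash completes in between
        long stale = snapshot[bucket];  // still in bounds: indexes the old array
        return !s.validate(stamp);      // the stale value is detected and discarded
    }
}
```

The trade-off in this sketch is that a reader may briefly hold on to the old array, which is harmless because the result is discarded whenever validation fails.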

