Skip to content

fix: Fix null pointer exception on reopen (for some readers)#4412

Open
tlambert03 wants to merge 1 commit intoome:developfrom
tlambert03:fix-null-pointer-on-reopen
Open

fix: Fix null pointer exception on reopen (for some readers)#4412
tlambert03 wants to merge 1 commit intoome:developfrom
tlambert03:fix-null-pointer-on-reopen

Conversation

@tlambert03
Copy link

Hello! my first PR here so please put me in my place if i'm suggesting something stupid :)

Over in https://github.com/imaging-formats/bffile I'm working on a new python wrapper for bio-formats (essentially an improved variant of the one contributed a while back to aicsimageio/bioio). As I was playing with some patterns I hit a java.lang.java.lang.NullPointerException upon calling reopenFile(), after having called close() with fileOnly=True.

After some time with claude I came to this fix, but it touches the actual FormatReader base... I'll give you the full conversation we had here just so you can see my understanding (and naiveté 😄 )

Details

I asked it:

  • DOES FormatReader.close definitely have a bug? (or is it more complicated)\
  • if it DOES have a bug, would fixing it in FormatReader be a rather risky maneuver, given how many things use it? (or is setting in= null pretty safe)\
  • if not, are there any other readers besides MinimalTiffReader that might be impacted, and what SHOULD be the fix?

it said:

  1. Does FormatReader.close definitely have a bug?

YES, there is a clear bug. Here's the evidence:

FormatReader.close(boolean fileOnly) (lines 946-956):

  @Override
  public void close(boolean fileOnly) throws IOException {
    if (in != null) in.close();  // ← Closes the stream
    if (!fileOnly) {
      in = null;                  // ← Only nulls when fileOnly=false
      currentId = null;
      resolution = 0;
      core = null;
    }
  }

The Bug: When close(fileOnly=true) is called, the stream is closed but in is NOT set to null. This leaves in pointing to a closed RandomAccessInputStream
object.

Why This Matters:

  • MinimalTiffReader.initTiffParser() (line 698) checks if (in == null) to decide whether to create a new stream
  • After close(fileOnly=true), this check fails because in != null (it's just closed, not null)
  • The code tries to use the closed stream, causing errors
  1. Would fixing it in FormatReader be risky?

NO, fixing it in FormatReader would be quite safe. Here's why:

I examined all the reopenFile() implementations in the codebase (~186 readers total). They fall into three patterns:

Pattern 1: Don't check, just recreate (TileJPEGReader, ICSReader, etc.):

  public void reopenFile() throws IOException {
    if (in != null) { in.close(); }
    in = new RandomAccessInputStream(currentId);
  }

✅ Would work fine with the fix (harmless to close null, and they recreate regardless)

Pattern 2: Explicitly null first (MinimalTiffReader):

  public void reopenFile() throws IOException {
    in = null;  // ← Explicitly sets to null
    initTiffParser();
  }

✅ Would work fine with the fix (they already expect this behavior)

Pattern 3: Delegate to super (FlowSightReader, OMETiffReader):

public void reopenFile() throws IOException {
    super.reopenFile();  // ← Calls FormatReader.reopenFile()
    // ... additional logic
  }

✅ Would work fine with the fix (they rely on base class behavior)

Conclusion:
Adding in = null after in.close() in FormatReader.close(fileOnly=true) is safe and would not break any existing readers. In fact, it would fix
latent bugs in multiple readers.

  1. Which readers are impacted?

Potentially Affected Readers:

  • MinimalTiffReader (and all subclasses):
    • BaseTiffReader
    • SVSReader
    • OMETiffReader (partially - it has workarounds)
    • ~30+ other TIFF-based readers

Any reader that uses an initialization pattern like:

  protected void initSomething() {
    if (in == null) {
      in = new RandomAccessInputStream(...);
    }
    // use in
  }

Actual Impact in Practice:

  • Most readers don't encounter this because they don't use close(fileOnly=true) followed by reopenFile()
  • The bug only manifests when:
    a. close(fileOnly=true) is called
    b. Then reopenFile() or similar initialization is called
    c. The initialization checks if (in == null) before creating a stream

Summary

  1. YES, FormatReader.close() has a bug when fileOnly=true
  2. NO, fixing it in FormatReader is NOT risky - it's actually the safest approach
  3. Multiple readers are potentially impacted, especially TIFF-based ones
  4. The proper fix is a one-line change in FormatReader.close() to set in = null after closing

The current workaround in MinimalTiffReader (setting in = null in reopenFile()) masks the bug but doesn't fix the root cause. Fixing it in FormatReader would be
the cleanest, safest, and most maintainable solution.

After making this change, all tests continued passing, and reopening SVS also passed. Happy to have it rejected or modified of course 👍

@tlambert03
Copy link
Author

after digging into this a bit more, I'm less confident that this is safe. I guess i'm just unclear on what the API contract is for close(fileOnly=true). So maybe I should have opened this as an issue instead:

What is the semantic meaning of close(fileOnly=True)? What should a user be able to expect from future calls to reopenFile() followed by openBytes()?

(and, feel free to close this if you're certain it's an incorrect fix that needs more discussion)

@melissalinkert
Copy link
Member

Thanks, @tlambert03. On the surface, this doesn't look like a problematic change, but it would be helpful to have a little more information in order to evaluate more carefully. Do you have a minimal example that we can use to reproduce the original NullPointerException?

reopenFile() is mostly used in the context of reloading a memo file (see https://github.com/ome/bioformats/blob/develop/components/formats-bsd/src/loci/formats/Memoizer.java#L712), and the behavior of something like:

ImageReader r = new ImageReader();
r.setId("/path/to/file");
r.close(true);
r.reopenFile();

isn't necessarily widely tested. Can you explain a bit under what circumstances bffile needs to close/re-open the file(s) on an initialized reader?

As a side note, we would need a signed Contributor License Agreement in order to merge this pull request once it passes review (see also https://ome-contributing.readthedocs.io/en/latest/third-party-policy.html#third-party-contributions).

@tlambert03
Copy link
Author

thanks @melissalinkert

here's a minimal reproducer. If you pass in an .svs file like CMU-1.svs you'll see the error on the develop branch:

import loci.formats.ImageReader;

public class ReopenBug {
    public static void main(String[] args) throws Exception {
        ImageReader reader = new ImageReader();
        reader.setId(args[0]);
        reader.openBytes(0, 0, 0, 512, 512);  
        reader.close(true);                 
        reader.reopenFile();                
        reader.openBytes(0, 0, 0, 512, 512);  // NPE here on develop!
        reader.close();
        System.out.println("Success!");
    }
}

Can you explain a bit under what circumstances bffile needs to close/re-open the file(s) on an initialized reader?

sure!
A really nice and common API in python is to present some underlying resource as a virtual object with a common API. dask,zarr,xarray, etc... In the original aicsimageio reader I wrote (now in bioio-bioformats) there is a roughly working proof of principle of that, and I'm picking it up again in bffile. The usage pattern is lovely: end user generally doesn't need to care or think about any of the details, they get presented with an object that they can slice into and all the work is done lazily to get the minimum amount of bytes when needed.

The real challenge there lies in resource management. If we present something like:

zarr_group = open_bio_zarr('some_file.svs')

then it's a bit of a bummer to also have to either demand that the end user only ever use that object within some single context where the underlying reader is open. But on the other hand, we don't want to leak file resources and file handles. And if we completely spin up and tear down the whole thing on every partial read, then it's a performance disaster. So, that's why I wanted to use close(true) -> reopenFile() -> openBytes.

I will say that it's actually working quite nicely for all the other files I tried (ome.tiff, nd2, czi, dv, etc...). It just seems that a handful of the Readers aren't ready to support that. I can also just accept that this is a "risky maneuver", and we can just either raise file-specific exceptions or just refuse to use certain Reader classes with this pattern... but if it can be improved in a safe way, then I thought it worth trying

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants