ENH: Better support for ESO decompression (particularly on Windows) #3443

@kYwzor

This is a follow up to issue #1818 and its corresponding PR #2681.

Issue

Currently, eso.core._unzip_file handles .Z and .gz files by making system calls to gunzip:

def _unzip_file(self, filename: str) -> str:
    """
    Uncompress the provided file with gunzip.
    Note: ``system_tools.gunzip`` does not work with .Z files
    """
    uncompressed_filename = None
    if filename.endswith(('fits.Z', 'fits.gz')):
        uncompressed_filename = filename.rsplit(".", 1)[0]
        if not os.path.exists(uncompressed_filename):
            log.info(f"Uncompressing file {filename}")
            try:
                subprocess.run([self.GUNZIP, filename], check=True)
            except Exception as ex:
                uncompressed_filename = None
                log.error(f"Failed to unzip {filename}: {ex}")
    return uncompressed_filename or filename

While most (though not necessarily all) Linux/macOS users can make this system call, Windows users are unlikely to have gunzip installed, especially system-wide. In other words, the current code does not work for most Windows users. Another downside is that the current code writes a full decompressed copy of the file to storage, which may be unnecessary when we only need to read a small part of the file (not to mention storage space concerns).

Right now, if one attempts to run the following code on a Windows install with no gunzip:

from astroquery.eso import Eso
files = Eso().retrieve_data("NACO.2016-09-16T08:37:27.966", destination="test_folder")

it prints out the error:

UserWarning: Unable to unzip files (gunzip is not available on this system)

No files are made available in the folder, not even the compressed ones.

Possible solution

For .gz files, the alternative is clear: we can simply use Python's gzip module, which will work equally on all OSes. This option also enables us to support decompression on-the-fly, avoiding the full decompressed copy. It should also perform similarly to the system call, possibly better as there is less overhead in principle.
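As a rough illustration of the on-the-fly option, here is a minimal sketch using only the standard library. The helper name read_header_block and the demo file are hypothetical and not part of astroquery; the point is only that gzip.open lets us read a small prefix without writing a decompressed copy to disk.

```python
import gzip
import os
import tempfile


def read_header_block(path: str, nbytes: int = 2880) -> bytes:
    """Hypothetical helper: return the first ``nbytes`` decompressed bytes."""
    # gzip.open decompresses lazily, so only the requested prefix is inflated
    # and nothing is written next to the compressed file.
    with gzip.open(path, "rb") as stream:
        return stream.read(nbytes)


# Demo on a throwaway file (not a real ESO product).
payload = b"SIMPLE  =                    T".ljust(80) + b"\0" * 10000
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fits.gz")
    with gzip.open(demo_path, "wb") as out:
        out.write(payload)
    first_block = read_header_block(demo_path)
    leftover = os.listdir(tmp)  # only the compressed file should be present
```

The same file object could in principle be handed to any reader that accepts a binary stream, which is what makes the full decompressed copy avoidable.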

The .Z files are a bit trickier. We went through a similar issue in Astropy (see astropy/astropy#10714): the gzip module does not support them, so Python has no built-in module for handling .Z files. To solve this, I ended up creating uncompresspy, a pure-Python package for reading .Z files that supports on-the-fly decompression and works on all OSes. This option was recently added to Astropy (see astropy/astropy#17960) and so far it has worked fine with all known files. There are some performance concerns (see astropy/astropy#10714 (comment)), but I believe integrating it in astroquery would be possible and would be an improvement at least for all users who do not have gunzip. A possible concern is that uncompresspy only supports Python 3.10+, while astroquery supports Python 3.9+. I'm not sure how important this is, given that official support for Python 3.9 will be dropped in just a few days and uncompresspy would always be an optional dependency.
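To make the optional-dependency idea concrete, here is a hedged sketch that picks a reader from the file's magic bytes (0x1f 0x8b for gzip, 0x1f 0x9d for Unix compress). The helpers sniff_compression and open_decompressed are hypothetical names, and the call uncompresspy.open(path, "rb") is an assumption that the package exposes a gzip.open-style entry point; it should be checked against the package documentation before use.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"      # gzip (.gz)
COMPRESS_MAGIC = b"\x1f\x9d"  # Unix compress / LZW (.Z)


def sniff_compression(path):
    """Return 'gzip', 'compress', or None based on the first two bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return "gzip"
    if magic == COMPRESS_MAGIC:
        return "compress"
    return None


def open_decompressed(path):
    """Hypothetical helper: open ``path`` for reading decompressed bytes."""
    kind = sniff_compression(path)
    if kind == "gzip":
        return gzip.open(path, "rb")
    if kind == "compress":
        try:
            import uncompresspy  # optional dependency, imported lazily
        except ImportError as exc:
            raise RuntimeError(
                f"{path} is a .Z file; install uncompresspy to read it"
            ) from exc
        # Assumption: uncompresspy.open mirrors gzip.open's call style.
        return uncompresspy.open(path, "rb")
    return open(path, "rb")


# Demo with throwaway files; the .Z file holds header bytes only.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "a.fits.gz")
    z_path = os.path.join(tmp, "b.fits.Z")
    raw_path = os.path.join(tmp, "c.fits")
    with gzip.open(gz_path, "wb") as out:
        out.write(b"hello ESO")
    with open(z_path, "wb") as out:
        out.write(COMPRESS_MAGIC + b"\x90")
    with open(raw_path, "wb") as out:
        out.write(b"SIMPLE")
    kinds = {
        "gz": sniff_compression(gz_path),
        "Z": sniff_compression(z_path),
        "raw": sniff_compression(raw_path),
    }
    with open_decompressed(gz_path) as stream:
        roundtrip = stream.read()
```

Importing uncompresspy inside the branch keeps it optional: users who only ever meet .gz files never need it installed, and users on Python 3.9 are unaffected unless they hit a .Z file.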

More generally, I wonder if the compression handling logic here could be replaced by Astropy's get_readable_fileobj? Does the ESO archive ever store anything in a compressed format that is not .gz or .Z?

If there's interest in this, I could propose a PR, but I'd need some guidance on how you'd want me to approach it (e.g. should we prefer gunzip system calls for those who have it, and fall back to gzip/uncompresspy for those who don't?).
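For discussion, one way the "prefer gunzip, fall back to Python" option could look for .gz files is sketched below. This is only an illustration under assumptions: unzip_gz and its use_system_tool flag are hypothetical names, and the fallback deliberately removes the compressed file to mimic what gunzip does by default.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def unzip_gz(filename, use_system_tool=True):
    """Hypothetical helper: decompress a .gz file next to the original."""
    target = filename.rsplit(".", 1)[0]
    if os.path.exists(target):
        return target
    # shutil.which returns None when gunzip is not on PATH (typical on Windows).
    gunzip = shutil.which("gunzip") if use_system_tool else None
    if gunzip is not None:
        try:
            subprocess.run([gunzip, filename], check=True)
            return target
        except (OSError, subprocess.CalledProcessError):
            pass  # fall through to the pure-Python path
    # Pure-Python fallback: stream the data so memory use stays bounded.
    with gzip.open(filename, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(filename)  # mirror gunzip, which removes the compressed file
    return target


# Demo forcing the fallback path on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.fits.gz")
    with gzip.open(demo, "wb") as out:
        out.write(b"payload")
    result = unzip_gz(demo, use_system_tool=False)
    with open(result, "rb") as fh:
        content = fh.read()
    remaining = sorted(os.listdir(tmp))
```

Whether the system call should be preferred at all is exactly the open question; dropping the first branch would give identical behavior on every OS at the cost of whatever speed advantage gunzip has.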
