CCITT group 4 (Fax4) decoding support#229

Merged
197g merged 9 commits into image-rs:main from stephenjudkins:sdj/fax4-support
Aug 31, 2025
Conversation

@stephenjudkins
Contributor

@stephenjudkins stephenjudkins commented Apr 5, 2024

This adds support for decoding CCITT Group 4 TIFF images by using the fax crate.

The photometric interpretation tag appears to be ignored by every existing decoder I've found, and attempting to interpret it correctly will break some subset of images. Unfortunate, but I suspect it's best to follow the herd here?

@s3bk

s3bk commented Apr 5, 2024

Probably check Tag::PhotometricInterpretation to decide what values are black and white.

@stephenjudkins
Contributor Author

OK. I've done some research and found that

  1. other image processors (ImageMagick, Apple Preview) ignore the photometric interpretation tag and always assume that it's WhiteIsZero
  2. fax4-encoded images I've found in the wild (which are all, unfortunately, confidential images of business checks) have the tag set correctly to WhiteIsZero
  3. ImageMagick creates fax4-encoded TIFFs with the photometric interpretation incorrectly set to BlackIsZero, but that doesn't matter because everything that loads the files assumes WhiteIsZero.

To conform with how everything in the wild behaves, I'd argue we just keep this hardcoded the way it is?
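
For reference, the two conventions discussed above differ only in which bit value means white. A minimal sketch of the mapping from a decoded 1-bit sample to an 8-bit gray value follows; the enum and function names are hypothetical illustrations, not the tiff crate's API.

```rust
/// Hypothetical stand-in for the two bilevel photometric interpretations.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Photometric {
    WhiteIsZero,
    BlackIsZero,
}

/// Map a single 1-bit sample to an 8-bit gray value (0 = black, 255 = white).
pub fn bit_to_gray(bit: bool, photometric: Photometric) -> u8 {
    match (photometric, bit) {
        // A cleared bit is white under WhiteIsZero; a set bit is white under BlackIsZero.
        (Photometric::WhiteIsZero, false) | (Photometric::BlackIsZero, true) => 255,
        // The remaining two combinations are black.
        (Photometric::WhiteIsZero, true) | (Photometric::BlackIsZero, false) => 0,
    }
}
```

Hardcoding one convention amounts to always passing the same `Photometric` value, regardless of what the file's tag says.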

@s3bk

s3bk commented Apr 5, 2024

hahaha. Yea, I think just keep it as is.

@stephenjudkins stephenjudkins marked this pull request as ready for review April 5, 2024 21:45
Contributor

@fintelia fintelia left a comment

Left some comments. I think there might also be a missing check that fax compression is only used for bilevel images?
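
The kind of check meant here could look roughly like the sketch below. `ImageLayout` and `check_fax4_layout` are hypothetical names invented for illustration; the actual decoder stores this information differently.

```rust
/// Hypothetical summary of the tags relevant to the check.
pub struct ImageLayout {
    /// One entry per sample (channel), as in the BitsPerSample tag.
    pub bits_per_sample: Vec<u8>,
    /// Whether the Compression tag selects CCITT Group 4.
    pub uses_fax4: bool,
}

/// Reject Fax4 compression on anything other than a single 1-bit channel.
pub fn check_fax4_layout(layout: &ImageLayout) -> Result<(), String> {
    if !layout.uses_fax4 {
        return Ok(());
    }
    match layout.bits_per_sample.as_slice() {
        [1] => Ok(()),
        other => Err(format!(
            "Fax4 compression requires a single 1-bit sample per pixel, got {:?}",
            other
        )),
    }
}
```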

@fintelia
Contributor

fintelia commented Apr 7, 2024

Ideally, I'd like to see create_reader act more lazily. The best way would be to have incremental decoding, but the fax crate doesn't seem to support that. An intermediate step might be to generate the list of color transitions, and then lazily expand that into the provided output buffer.
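
A rough sketch of that intermediate step: expanding one row's list of color transitions into pixels. It assumes each row starts white and that the list holds sorted x positions at which the color flips; that representation is an assumption for illustration, and `expand_transitions` is a hypothetical helper.

```rust
/// Expand one row's sorted color-change positions into 8-bit pixels
/// (255 = white, 0 = black).
/// Assumption: each row starts white and every listed position flips the color.
pub fn expand_transitions(transitions: &[u16], width: u16, out: &mut [u8]) {
    let width = width as usize;
    assert!(out.len() >= width, "output buffer is shorter than one row");
    let mut color = 255u8;
    let mut start = 0usize;
    for &t in transitions {
        // Clamp to the row width and never move backwards on malformed input.
        let end = (t as usize).min(width).max(start);
        out[start..end].fill(color);
        color = !color; // bitwise NOT toggles between 255 and 0
        start = end;
    }
    // Fill the remainder of the row with the final color.
    out[start..width].fill(color);
}
```

Storing only the transitions keeps the per-row state small; the expansion can then be deferred until the caller actually supplies an output buffer.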

@s3bk

s3bk commented Apr 7, 2024

I don't see a reason why incremental decoding could not be implemented.

It would be pretty easy to write something like

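// Interface sketch only; method bodies omitted.
impl<R: Read> Decoder<R> {
    // Wrap the compressed input stream.
    fn new(reader: R) -> Self;
    // Decode the next row.
    fn advance(&mut self) -> Result<(), io::Error>;
    // Pixels of the most recently decoded row.
    fn pels(&self) -> Pels;
}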

@fintelia
Contributor

fintelia commented Apr 7, 2024

Yeah, I think an interface like that would work. Then it could be wrapped in another object that implements Read by tracking how far into the row has been read and expanding Pels pixel by pixel.
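
A minimal sketch of such a wrapper follows. The `RowSource` trait is a hypothetical stand-in for the row-at-a-time decoder interface proposed above (it is not the fax crate's API), and rows are assumed to arrive already expanded to bytes.

```rust
use std::io::{self, Read};

/// Hypothetical row-at-a-time decoder interface; not the fax crate's API.
pub trait RowSource {
    /// Returns the next expanded row, or `None` once the image is exhausted.
    fn next_row(&mut self) -> io::Result<Option<Vec<u8>>>;
}

/// Convenience impl so that a plain list of rows can act as a source.
impl RowSource for std::vec::IntoIter<Vec<u8>> {
    fn next_row(&mut self) -> io::Result<Option<Vec<u8>>> {
        Ok(self.next())
    }
}

/// Adapts a `RowSource` to `Read` by remembering the position in the current row.
pub struct RowReader<S> {
    source: S,
    row: Vec<u8>,
    pos: usize,
}

impl<S: RowSource> RowReader<S> {
    pub fn new(source: S) -> Self {
        Self { source, row: Vec::new(), pos: 0 }
    }
}

impl<S: RowSource> Read for RowReader<S> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Decode further rows only when the current one is fully consumed.
        while self.pos == self.row.len() {
            match self.source.next_row()? {
                Some(row) => {
                    self.row = row;
                    self.pos = 0;
                }
                None => return Ok(0), // end of image
            }
        }
        let n = buf.len().min(self.row.len() - self.pos);
        buf[..n].copy_from_slice(&self.row[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

With this shape, only one decoded row is held in memory at a time, and a short read simply stops at the row boundary.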

@stephenjudkins
Contributor Author

Appreciate the feedback and all the back-and-forth here. I'm happy to help out as much as I can to push this forward, but I don't want to step on @s3bk's toes if he's working on these changes.

@s3bk

s3bk commented Apr 8, 2024

@stephenjudkins I pushed changes to the fax repo. There are strange differences with the last line again. I need to re-encode my samples with libtiff to check if the error is in the sample data or my code.

But you should be able to use the code to adapt this PR to the next version.

@inzanez

inzanez commented Oct 30, 2024

What is the state of this? Anything one can do?

@s3bk

s3bk commented Oct 30, 2024

I don't know, but from what I remember, the fax tests are all passing now.

@inzanez

inzanez commented Oct 30, 2024

Seems that something with the test_fax4 is still messed up, ...

@stephenjudkins
Contributor Author

Sorry! I dropped this but would like to pick it back up. Can we confirm that the upstream fax crate can decode the Fax4 files that are included?

@stephenjudkins
Contributor Author

OK, I've made some changes, and I've gotta say...I'm putting in a lot of work to support 1-bit samples, both here and in the image crate, when we're just transforming these back into 8-bit pixels in the image crate anyways. I could add in a flag (like in the PNG crate) to optionally expand to 8-bit samples for image crate to use. Then we'd have the 1-bit sample path that I suspect no one would actually ever use? I'd have this done and working already, without any changes to the image crate at all, if I just kept the 8-bit samples to begin with.

If you feel strongly on keeping the 1-bit image option I'll add the flag but there are a lot of edge cases I'm pretty worried about testing. Are you sure that you want this? It's frustrating to write a bunch of code that I suspect will never be used by anyone and might introduce bugs into other code paths.

@inzanez

inzanez commented Oct 31, 2024

@stephenjudkins I assume that refers to images like this, with 1 bit per sample?:
TIFF image data, little-endian, direntries=18, height=3565, bps=1, compression=bi-level group 4, PhotometricInterpretation=WhiteIsZero

If so, that is exactly what I am looking for...

@s3bk

s3bk commented Oct 31, 2024

Files are decoded and encoded correctly, but bit-wise verification fails for some files that have a strange EOF.

@stephenjudkins
Contributor Author

OK. I'm still working on this, and, since this code does not currently support samples smaller than 8 bits, it's introducing some significant and pervasive changes across several code paths that I don't reasonably expect to be able to test. Would you like to help me by finding some other files that could test this? Do you know of any 1-bit tile-based (instead of strip-based) files I could check?

@fintelia
Contributor

fintelia commented Nov 1, 2024

@stephenjudkins I made a fix to the sub 8-bpp decoding in #252. It should now be working.

If you want to make test images, you can use gdal_translate:

gdal_translate -ot Byte -co TILED=YES -co BLOCKXSIZE=16 -co BLOCKYSIZE=16 -co NBITS=1 -co COMPRESS=CCITTFAX4 input.png output.tif

For instance, this is a fax4 compressed version of the test image from that other PR: tiled-gray-i1-fax4.tif.zip (inside a ZIP archive because GitHub doesn't allow uploading TIFF files)

@stephenjudkins
Contributor Author

Alright, I've added support for bit-sized samples as well as an optional flag (a la the PNG crate) to expand these to 8bpp. I've confirmed in the upstream image crate that, using that flag, fax4-encoded images work as expected.
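
To illustrate what expanding bit-sized samples to 8bpp involves, here is a minimal sketch. It assumes MSB-first bit order, rows padded to a byte boundary, and a 1-bit mapping to 255 (the BlackIsZero convention); `expand_bits_to_bytes` is a hypothetical helper and not necessarily how the flag in this PR behaves.

```rust
/// Expand MSB-first packed 1-bit samples into one byte per pixel (0 or 255).
/// Assumption: every row starts on a byte boundary, so a row occupies
/// ceil(width / 8) bytes and any trailing padding bits are ignored.
pub fn expand_bits_to_bytes(packed: &[u8], width: usize, height: usize) -> Vec<u8> {
    if width == 0 || height == 0 {
        return Vec::new();
    }
    let row_bytes = (width + 7) / 8;
    assert!(packed.len() >= row_bytes * height, "packed buffer is too short");
    let mut out = Vec::with_capacity(width * height);
    for row in packed.chunks(row_bytes).take(height) {
        for x in 0..width {
            // Bit 7 of each byte holds the leftmost of its eight pixels.
            let bit = (row[x / 8] >> (7 - x % 8)) & 1;
            out.push(if bit == 1 { 255 } else { 0 });
        }
    }
    out
}
```

The row padding is the main edge case: a width that is not a multiple of 8 leaves unused bits at the end of every row, which must be skipped rather than carried into the next row.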

@stephenjudkins
Contributor Author

See draft PR https://github.com/image-rs/image/pull/2377/files

@fintelia
Contributor

fintelia commented Nov 3, 2024

This adds expand_samples_to_bytes but doesn't actually implement it correctly for non-fax4 compressed images. I'd rather land the fax compression part of this PR now, and then deal with sample expansion afterwards (either in this crate or, IMO preferably, in image directly).

@applecuckoo

fax4-encoded images I've found in the wild (which are all, unfortunately, confidential images of business checks)

just an FYI here, the TIFF versions of NOAA's marine weather charts are all in Group 4 format. Those charts also happen to be all set as WhiteIsZero.

@volodkuzn
Contributor

Hi @stephenjudkins. Do you still have capacity to continue working on this? I was chasing another issue. Apparently, it is solved by this PR.

@197g 197g force-pushed the sdj/fax4-support branch 4 times, most recently from 42b2eb6 to cf0a595 Compare August 30, 2025 22:21
@197g 197g force-pushed the sdj/fax4-support branch from 1c6fee9 to 4a68fea Compare August 30, 2025 23:26
@197g 197g merged commit 7c6c327 into image-rs:main Aug 31, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from Tiff to Gif in glycin-compat Aug 31, 2025
@Shnatsel
Member

I see the PR got reworked and the test image that ultimately made it into the repo has BlackIsZero photometric interpretation, and the tests enforce that it is decoded like that.

This contradicts the research done in #229 (comment) and diverges from other TIFF Fax4 decoders. This also breaks the test image from libtiff, see image-rs/image#1822 (comment)

@Shnatsel
Member

Shnatsel commented Aug 31, 2025

The test image added in this PR has BlackIsZero and is decoded correctly, but the libtiff test image has WhiteIsZero and is decoded incorrectly.

Meanwhile ImageMagick 6.9.11 decodes both correctly, as does GIMP.

I also could not reproduce the reported issue of ImageMagick writing broken files. Here's the output of convert input.png -depth 1 -compress group4 imagemagick_group4.tiff

imagemagick_group4.tiff

It has WhiteIsZero interpretation, and GIMP and ImageMagick both open it correctly, while the tiff crate inverts the colors.

It seems the photometric interpretation needs to be read and honored after all. Either that, or we're the only ones honoring it and THAT is causing bugs.
