Skip to content

New version 1.0.0 including ISOBMFF (HEIF, HEIC, AVIF) support #17

Merged
rNoz merged 20 commits intomainfrom
heic-support
Mar 28, 2025
Merged

New version 1.0.0 including ISOBMFF (HEIF, HEIC, AVIF) support #17
rNoz merged 20 commits intomainfrom
heic-support

Conversation

@rNoz
Copy link
Collaborator

@rNoz rNoz commented Mar 26, 2025

ISOBMFF Support

What do I bring here?

  • Full support for ISOBMFF image formats (box-type): HEIF, HEIC and AVIF.
  • Bumping to 1.0.0 since this library has been working for years really stable, and time to use SemVer properly.
  • Reordering types being checked in the guessing function, focusing on recent trends.

Supporting this image format has been the toughest among all supported, and one of the hardest problems has been finding real examples. Therefore, I needed to study other libraries (like Ruby's fastimage and the official spec and examples) and do my own custom processors, based on what I have found after creating and porting (online services, inspection and local tools) a set of images to these 6 file formats (heif, heic, avif; + sequence types). By the way, I studied libheif library and some of its internals (e.g., to answer some of the mime+brands I have selected).

There are a bunch of tests that are skipped by default, as they require fetching files from the Internet, but they are helpful validators any time I have to update/check ISOBMFF format.

Also, I played a lot with binary files and mock files, to check all type of malformed binary files. The Mocks.ISOBMFFTest file contains plenty of custom binaries that I have been creating manually to explore different features and validate against those, to make sure ExImageInfo continues highlighting its robustness.

Finally, it has been impossible to find a heic-sequence image file and a couple of @unimplemented returns (never failing, will be just nil), so, as more examples appear, we can extend the parser.

All images found (2 repos and local examples) are correctly parsed by ExImageInfo.

These new features solve #12.

Helpers

HexEditor/Viewer to Elixir binaries, being used in my tests with binary mocks.

defmodule HexConverter do
  def convert(hex_string) do
    hex_string
    |> String.split(" ", trim: true)
    |> Enum.chunk_every(4)
    |> Enum.map(fn chunk ->
      chunk
      |> Enum.join("")
      |> then(&("0x" <> &1 <> "::32"))
    end)
    |> Enum.chunk_every(4)
    |> Enum.map_join(",\n", &Enum.join(&1, ", "))
    |> then(&"<<\n#{&1}\n>>")
  end
end
iex> HexConverter.convert("00 00 00 03 AD 76 ... ") |> IO.puts

Thanks to @bu6n for highlighting the HEIF/HEIC/AVIF image support and doing the first steps to get that into ExImageInfo. In the end it has been much more complex to support, so I couldn't reuse your original code.

Remaining tasks:

  • Changelog updates.
  • Generate docs and review them.
  • Publishing new version in hex.
  •  Merge.

end)
end

defp read_ipma_essen_entry(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work/Contributions welcomed: no real example reaching this case (flags? being true).

.formatter.exs Outdated
Comment on lines +7 to +9
| ["test/**/*.{ex,exs}"]
|> Enum.flat_map(&Path.wildcard(&1, match_dot: true))
|> Kernel.--(["test/ex_image_info_test/mocks/isobmff_test.exs"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there was a way to specifically inline this on the document to tell the formatter to leave it alone or provide an exception list.

Copy link
Collaborator Author

@rNoz rNoz Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that was my first thought. AFAIK, not possible.
I did this because I don't want to reformat the binary mocks that I prepared manually.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I saw the big blob of fixture data with helpful comments. In C/C++ I'd usually do // clang-format: off just before the special block of code I hand crafted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to use Code.eval_string, but not sure what I like better...
https://elixirforum.com/t/configure-formatter-to-ignore-some-part-of-code/16081/6

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done a "combination", just extracting the mock fixtures to keep my custom format, but allowing checking/re-styling the test file.

parse_box(rest, state)

<<_::binary-size(size), _rest::binary>> ->
@unimplemented
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work/Contributions welcomed: no real example reaching this case (ispe box without width/height).

@ftype "HEIC"
@brands ["heic", "heix", "heim", "heis"]

@mime_seq "image/heic-sequence"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work/Contributions welcomed: never found a image/heic-sequence real image file.

Extension might be .heics, but it is not always respected.
Use heif-info $image to get real mime+brand. If you find the missing mimetype, please, report it :)

defp aliases do
[
test_wip: ["test --only wip"],
test_all: ["test --include fetch_external"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to validate and test main images parsed, run this.

@warmwaffles
Copy link
Collaborator

Since you are wanting to bump to 1.0, this may be a good time to review the code organization in the library as a whole. (maybe good for another PR)

@rNoz
Copy link
Collaborator Author

rNoz commented Mar 27, 2025

Any proposal?

@warmwaffles
Copy link
Collaborator

Well it has always seemed odd to me to have this:

@spec seems?(binary, format :: atom) :: boolean | nil

Return true | false | nil. Typically predicate methods thing? are only boolean() by default. In our production code, luckily we just use if seems?(foo) to get the job done since both nil and false are falsey.

def seems?(binary, :avif), do: AVIF.seems?(binary)
def seems?(binary, :heic), do: HEIC.seems?(binary)
def seems?(binary, :heif), do: HEIF.seems?(binary)
def seems?(_, _), do: nil
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def seems?(_, _), do: nil

@warmwaffles let's continue here. I agree func? is usually boolean. I used the nil to establish sorta unknown format, but I can get rid of it (and the spec to just boolean).

So, you will get a runtime error if we don't support the format.

Anything else?

Copy link
Collaborator Author

@rNoz rNoz Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, seems?/1 won't be affected, as it returns image_format() | nil. However, won't be consistent either, as it is not boolean. So, maybe seems for the guess function? (not sure if that is even more confusing than what we have.

However, removing seems?(_, _) case is weird if I leave the type(_, _) and info(_, _) cases.

So, basically, I would remove the 3, but now if someone provides an unsupported format (e.g., :mp4), it will raise (previously was just nil).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea and I would also suggest changing the type :: atom() to be the explicit defined type image_format(). This way, dialyzer should be able to aid people consuming it that they are about to use it wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly just tightening up the API a bit is good.

@typedoc "Supported image format types."
@type image_format ::
:jpeg
| :jpg
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking if really removing this alias... as it is not standard and is redundant. Originally, someone wanted to have that alias, and I "pleased" him. However, I don't see a real reason to keep it.

Reviewing the case, dev was doing something really risky in many ways (to_atom(ext)).

However, I see the convenience of trying to check if the binary provided along with a filename-extension (string type), is "enough" valid. However, I see that as a bad pattern, extensions shouldn't be trusted.

And if they want to do something like that, they can just provide the alias on their side.

Thoughts?

Copy link
Collaborator

@warmwaffles warmwaffles Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, these formats do look like the file extension, so it's more a force of habit to short hand it to jpg. Thank you DOS for three letter file extensions. But if the downstream consumer is using dialyzer this should be caught.

One solution is to explicitly take the :jpg and emit a warning that it was used and that it will be removed. The other solution is to just leave it since it is well understood that :jpg and :jpeg are synonymous.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only downside of accepting 2, is that seems?/1 returns only :jpeg, not :jpg.

Another alternative to not do anything, was to remove :jpeg, so seems?/1 accepts/returns the same for a given jpeg file.

I have to say I have started thinking about this with the ISOBMFF tedious tests, as it has many extensions, and found files being heif-sequence but not using .heifs extension, or being an heif file, but using .heic extension, etc. Well, not to say capital letters (.HEIC, .BMP, that I find sometimes as well - that only affecting the above case).

@rNoz
Copy link
Collaborator Author

rNoz commented Mar 27, 2025

All done. I'll leave this PR open another day before publishing/merging, in case anyone has any other suggestion or feedback.

@rNoz rNoz merged commit f9c8315 into main Mar 28, 2025
30 checks passed
@bu6n
Copy link
Collaborator

bu6n commented Mar 29, 2025

I have not have time to react to this before merging, but I'm glad you have taken over on this. Your code reads much better and complete than mine. Congrats for 1.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants