Commit ecf193a

committed
Docs
1 parent d3322ec commit ecf193a

2 files changed

+137
-0
lines changed

docs/reference/internal_design.rst

Lines changed: 2 additions & 0 deletions
@@ -6,3 +6,5 @@ Internal Reference Docs
66

77
open_files
88
limits
9+
tiff_metadata
10+

docs/reference/tiff_metadata.rst

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
1+
Tiff Metadata
2+
=============
3+
4+
Pillow currently reads TIFF metadata in pure Python and writes either
5+
through its own writer (if writing an uncompressed TIFF) or libtiff.
6+
7+
Basic overview
8+
++++++++++++++
9+
10+
TIFF is Tagged Image File Format -- each metadata entry is stored as a
11+
well-known tag number, a type, a quantity, and the data. Many tags are
12+
defined in the spec and implemented in libtiff, but there's also the
13+
possibility to add custom tags that aren't specified in the spec. The
14+
metadata is packed into a structure known as the Image File Directory,
15+
or IFD. We define many of these tags in :py:mod:`~PIL.TiffTags`.
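
As a rough illustration of the tag number, type, quantity, and data layout described above, the following sketch packs and unpacks one 12-byte entry of a classic (non-BigTIFF) IFD using only the standard library. The helper names ``pack_short_entry`` and ``unpack_entry`` are hypothetical and are not part of Pillow; little-endian byte order is assumed.

```python
import struct

# Classic (non-BigTIFF) IFD entry: tag, type, count, value-or-offset.
# "<" selects little-endian ("II") byte order; ">" would be used for "MM" files.
IFD_ENTRY = struct.Struct("<HHII")

TIFF_SHORT = 3  # TIFF type code for a 16-bit unsigned integer


def pack_short_entry(tag, value):
    """Pack one SHORT value; it fits inline in the 4-byte value field.

    In little-endian order, a left-justified 16-bit value followed by two
    zero bytes is byte-identical to the same number packed as 32 bits.
    """
    return IFD_ENTRY.pack(tag, TIFF_SHORT, 1, value)


def unpack_entry(raw):
    """Return (tag, type, count, value_or_offset) from 12 raw bytes."""
    return IFD_ENTRY.unpack(raw)


entry = pack_short_entry(256, 640)  # 256 is the ImageWidth tag
print(len(entry), unpack_entry(entry))
```

Values that do not fit in the 4-byte field are stored elsewhere in the file, and the field then holds their offset; the sketch does not cover that case.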
16+
17+
The TIFF IFD is also used in other file formats (like JPEG) to store
18+
EXIF information or other metadata.
19+
20+
Writing Metadata in Libtiff
21+
+++++++++++++++++++++++++++
22+
23+
There are three categories of metadata:
24+
25+
* Built in, but not special cases
26+
* Special cases, built in
27+
* Custom tags
28+
29+
These categories aren't listed in the docs anywhere; they come from a
30+
dive through libtiff's ``tif_dir.c`` and the various headers.
31+
32+
Metadata is set using a single API call to ``TIFFVSetField(tiff, tag,
33+
*args)``, which takes variable arguments, so the function signature changes
34+
based on the tag passed in. The count and data sizes are defined in
35+
the libtiff directory headers. Many different memory copies are
36+
performed with **no** validation of the input parameters, either
37+
in type or quantity. These lead to a segfault in the best case.
38+
39+
.. Warning::
40+
   This is a security nightmare.
41+
42+
Because of this security nightmare, we're whitelisting and testing
43+
individual tiff tags for writing. The complexity of this simple
44+
interface means that we have to essentially duplicate the logic of the
45+
libtiff interface to put the parameters in the right configuration. We
46+
are whitelisting these tags in :py:data:`~PIL.TiffTags.LIBTIFF_CORE`.
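
The whitelisting idea can be sketched as a check that runs before anything is handed to libtiff. This is only an illustration: ``ALLOWED_TAGS`` and ``check_tag`` are hypothetical names, and the three entries are examples rather than the actual contents of ``LIBTIFF_CORE``.

```python
# Hypothetical allowlist: tag number -> (expected Python type, expected count).
# The entries are illustrative, not the contents of LIBTIFF_CORE.
ALLOWED_TAGS = {
    256: (int, 1),  # ImageWidth
    257: (int, 1),  # ImageLength
    297: (int, 2),  # PageNumber
}


def check_tag(tag, values):
    """Return the values as a tuple, or raise ValueError if the tag is not
    allowed or the values do not match the expected type and count."""
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"tag {tag} is not whitelisted for libtiff writing")
    expected_type, expected_count = ALLOWED_TAGS[tag]
    if len(values) != expected_count:
        raise ValueError(f"tag {tag} expects {expected_count} value(s)")
    for value in values:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"tag {tag} has a value of the wrong type")
    return tuple(values)
```

Rejecting anything that is not explicitly listed keeps unvalidated types and counts away from the variable-argument call.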
47+
48+
49+
Built In
50+
--------
51+
52+
There is a long list (in theory; you have to go through the code to
53+
find them) of built-in items that are regular. These have one of three call
54+
signatures:
55+
56+
* Individual: ``TIFFVSetField(tiff, tag, item)``
57+
* Multiple: ``TIFFVSetField(tiff, tag, ct, items* )``
58+
* Alternate Multiple: ``TIFFVSetField(tiff, tag, items* )``
59+
60+
In libtiff4, the individual integer-like numeric items are passed as
61+
32 bit ints (signed or unsigned as appropriate) even if the actual
62+
item is a short or char. The individual rational and floating point
63+
types are all passed as a double.
64+
65+
The multiple value items are passed as pointers to packed arrays of
66+
the correct type, short for short.
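
The two conventions above (widened scalars, packed arrays) might be modelled in Python as follows. This is a sketch under stated assumptions: ``scalar_c_type`` and ``pack_array`` are hypothetical helpers, only a handful of TIFF types are covered, and native byte order is assumed for the packed buffer.

```python
import struct

# TIFF type name -> struct code for one element of a packed array.
ARRAY_CODES = {"BYTE": "B", "SHORT": "H", "LONG": "I", "FLOAT": "f", "DOUBLE": "d"}


def scalar_c_type(tiff_type, signed=False):
    """Name the C type a single value is widened to in the varargs call."""
    if tiff_type in ("RATIONAL", "FLOAT", "DOUBLE"):
        return "double"
    if tiff_type in ("BYTE", "SHORT", "LONG"):
        return "int32" if signed else "uint32"
    raise ValueError(f"unhandled TIFF type {tiff_type!r}")


def pack_array(tiff_type, values):
    """Pack values into a contiguous buffer of the exact element type.

    "=" selects native byte order with standard sizes and no padding.
    """
    code = ARRAY_CODES[tiff_type]
    return struct.pack(f"={len(values)}{code}", *values)


print(scalar_c_type("SHORT"), len(pack_array("SHORT", [1, 2, 3])))
```

Note the asymmetry: a single SHORT travels as a 32-bit integer, while an array of SHORT stays 16 bits per element.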
67+
68+
UNDONE -- This isn't quite true: The count is only used for items
69+
where ``field_passcount`` is true. Then if ``field_writecount`` is
70+
``TIFF_VARIABLE2``, the count is a ``uint32``, otherwise it is an ``int``.
71+
If ``field_passcount`` is false and ``field_writecount`` is ``TIFF_VARIABLE`` or
72+
``TIFF_VARIABLE2``, then the count is not used and set to 1. If it's
73+
``TIFF_SPP``, then it's set to samplesperpixel. Otherwise, it's set to
74+
the ``field_writecount``.
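
The count rules in the note above can be written out as a small decision function. This is a sketch of the description only, not a port of the libtiff code: ``resolve_count`` and its parameters are hypothetical, and the marker values are the ones I understand ``tiffio.h`` to define, so treat them as an assumption.

```python
# Marker values as assumed from libtiff's tiffio.h.
TIFF_VARIABLE = -1
TIFF_SPP = -2
TIFF_VARIABLE2 = -3


def resolve_count(passcount, writecount, samplesperpixel, passed_count=None):
    """Return (count, count_c_type) following the rules described above.

    count_c_type is None when no count argument is passed to the call.
    """
    if passcount:
        # The caller supplies the count; only its C type depends on the marker.
        c_type = "uint32" if writecount == TIFF_VARIABLE2 else "int"
        return passed_count, c_type
    if writecount in (TIFF_VARIABLE, TIFF_VARIABLE2):
        return 1, None
    if writecount == TIFF_SPP:
        return samplesperpixel, None
    # A fixed count taken from the field definition.
    return writecount, None
```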
75+
76+
77+
Special Cases
78+
-------------
79+
80+
There is a set of special cases in the ``tif_dir.c:_TIFFVSetField``
81+
function where the individual items are pulled tag by tag. These are
82+
mainly items that are specifically used by the tiff decoder. The
83+
individual items all follow the pattern of the built ins above, but
84+
the array based items are special, each in their own way.
85+
86+
* Where there are two shorts passed in, they are passed as separate
87+
  parameters rather than a packed array. (e.g. ``TIFFTAG_PAGENUMBER``)
88+
89+
* ``TIFFTAG_REFERENCEBLACKWHITE`` is just passed as an array of 6
90+
  ``float``; there's no count of items.
91+
92+
* ``TIFFTAG_COLORMAP`` is passed as three pointers to arrays of ``short``
93+
94+
* ``TIFFTAG_TRANSFERFUNCTION`` is passed as 1 or 3 pointers to arrays
95+
  of ``short``
96+
97+
* ``TIFFTAG_INKNAMES`` is passed as a length and a ``char*``.
98+
99+
* ``TIFFTAG_SUBIFD`` is passed as a length and pointer to ``uint32``.
100+
  UNDONE -- is this length in bytes, or in integers, and does this
101+
  change in libtiff5?
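
To make the per-tag differences concrete, here is a sketch of how three of the special cases listed above could be turned into argument tuples. ``marshal_special`` is a hypothetical helper. It assumes the colormap arrives as one flat sequence with all red values first, then green, then blue, as the values are laid out in a TIFF file.

```python
TIFFTAG_PAGENUMBER = 297
TIFFTAG_COLORMAP = 320
TIFFTAG_REFERENCEBLACKWHITE = 532


def marshal_special(tag, values):
    """Return the tuple of arguments that would follow the tag in the call."""
    values = list(values)
    if tag == TIFFTAG_PAGENUMBER:
        if len(values) != 2:
            raise ValueError("PageNumber needs exactly two values")
        # Two separate scalar arguments, not a packed array.
        return tuple(values)
    if tag == TIFFTAG_REFERENCEBLACKWHITE:
        if len(values) != 6:
            raise ValueError("ReferenceBlackWhite needs exactly six values")
        # A single array argument with no accompanying count.
        return ([float(v) for v in values],)
    if tag == TIFFTAG_COLORMAP:
        if not values or len(values) % 3:
            raise ValueError("ColorMap length must be a positive multiple of 3")
        n = len(values) // 3
        # Three separate arrays: red, green, blue.
        return (values[:n], values[n:2 * n], values[2 * n:])
    raise ValueError(f"tag {tag} is not handled by this sketch")
```

For example, ``marshal_special(297, [0, 3])`` yields two scalars, while a six-entry colormap yields three two-element lists.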
102+
103+
Custom Tags
104+
-----------
105+
106+
These are tags that are not defined in libtiff. To use these, we would
107+
need to define the tag for that image by passing in the appropriate
108+
definition.
109+
110+
.. Note::
111+
   Custom tags are currently unimplemented.
112+
113+
114+
Writing Metadata in Python
115+
++++++++++++++++++++++++++
116+
117+
UNDONE -- review/expand this on down.
118+
119+
When writing a TIFF file using Python, the IFD is written using the
120+
code at
121+
:py:meth:`~PIL.TiffImagePlugin.ImageFileDirectory_v2.save`. This uses
122+
the types from the IFD to control the writing. This leads to safe but
123+
possibly out-of-spec writing.
124+
125+
Metadata Storage in Pillow
126+
++++++++++++++++++++++++++
127+
128+
* See TiffImagePlugin
129+
* tags_v2 vs tags
130+
131+
Reading Metadata in TiffImagePlugin
132+
+++++++++++++++++++++++++++++++++++
133+
134+
* Type confusion between file and spec.
135+
