| 1 | +Tiff Metadata |
| 2 | +============= |
| 3 | + |
| 4 | +Pillow currently reads TIFF metadata in pure Python and writes either |
| 5 | +through its own writer (if writing an uncompressed TIFF) or libtiff. |
| 6 | + |
| 7 | +Basic overview |
| 8 | +++++++++++++++ |
| 9 | + |
| 10 | +TIFF is the Tagged Image File Format -- each piece of metadata is stored |
| 11 | +as a well-known tag number, a type, a quantity, and the data. Many tags |
| 12 | +are defined in the spec and implemented in libtiff, but it's also |
| 13 | +possible to add custom tags that aren't specified in the spec. The |
| 14 | +metadata is packed into a structure known as the Image File Directory, |
| 15 | +or IFD. We define many of these tags in :py:mod:`~PIL.TiffTags`. |
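As a rough illustration of the tag / type / quantity / data layout described above, the following sketch unpacks a single 12-byte entry of a classic TIFF IFD with the standard ``struct`` module. The name ``parse_ifd_entry`` and the sample bytes are invented for this example; it is not how Pillow's reader is organised.

```python
import struct
from collections import namedtuple

# One entry in a classic (non-Big) TIFF IFD is 12 bytes:
# tag (2 bytes), type (2 bytes), count (4 bytes), value-or-offset (4 bytes).
IfdEntry = namedtuple("IfdEntry", "tag type count value_or_offset")


def parse_ifd_entry(data, byte_order="<"):
    """Unpack one 12-byte IFD entry; byte_order is '<' (II) or '>' (MM)."""
    if len(data) != 12:
        raise ValueError("an IFD entry is exactly 12 bytes")
    tag, type_code, count = struct.unpack(byte_order + "HHL", data[:8])
    # The last four bytes hold the value itself if it fits, else a file offset.
    return IfdEntry(tag, type_code, count, data[8:])


# Hypothetical sample: ImageWidth (tag 256), type SHORT (3), count 1, value 640.
sample = struct.pack("<HHLHH", 256, 3, 1, 640, 0)
entry = parse_ifd_entry(sample)
```

Here ``entry.value_or_offset`` is left as raw bytes, because whether it holds the data or an offset depends on the type and count.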
| 16 | + |
| 17 | +The TIFF IFD is also used in other file formats (like JPEG) to store |
| 18 | +EXIF information or other metadata. |
| 19 | + |
| 20 | +Writing Metadata in Libtiff |
| 21 | ++++++++++++++++++++++++++++ |
| 22 | + |
| 23 | +There are three categories of metadata: |
| 24 | + |
| 25 | +* Built in, but not special cases |
| 26 | +* Special cases, built in |
| 27 | +* Custom tags |
| 28 | + |
| 29 | +These categories aren't listed in the docs anywhere; they come from a |
| 30 | +dive through libtiff's tif_dir.c and the various headers. |
| 31 | + |
| 32 | +Metadata is set using a single API call to ``TIFFVSetField(tiff, tag, |
| 33 | +*args)``, which takes variable arguments, so the function signature changes |
| 34 | +based on the tag passed in. The count and data sizes are defined in |
| 35 | +the libtiff directory headers. Many different ``memcpy`` calls are |
| 36 | +performed with **no** validation of the input parameters, either |
| 37 | +in type or quantity. These lead to a segfault in the best case. |
| 38 | + |
| 39 | +.. Warning:: |
| 40 | + This is a security nightmare. |
| 41 | + |
| 42 | +Because of this security nightmare, we're whitelisting and testing |
| 43 | +individual TIFF tags for writing. The complexity of this simple |
| 44 | +interface means that we have to essentially duplicate the logic of the |
| 45 | +libtiff interface to put the parameters in the right configuration. We |
| 46 | +are whitelisting these tags in :py:data:`~PIL.TiffTags.LIBTIFF_CORE`. |
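The whitelisting idea can be sketched in a few lines of Python. The table ``ALLOWED_TAGS`` and the function ``check_tag`` below are hypothetical and exist only for this example; the real whitelist is the one named above, and its contents and shape may differ.

```python
# Hypothetical whitelist: tag number -> (expected Python type, expected count).
# These four entries are examples only, not the contents of the real whitelist.
ALLOWED_TAGS = {
    256: (int, 1),    # ImageWidth
    257: (int, 1),    # ImageLength
    282: (float, 1),  # XResolution
    297: (int, 2),    # PageNumber
}


def check_tag(tag, values):
    """Return True only if the tag is whitelisted and the values match its shape."""
    spec = ALLOWED_TAGS.get(tag)
    if spec is None:
        return False  # unknown tags never reach libtiff
    expected_type, expected_count = spec
    if len(values) != expected_count:
        return False  # wrong quantity
    return all(isinstance(v, expected_type) for v in values)
```

Checking both type and quantity before the call matters because, per the text above, libtiff itself validates neither.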
| 47 | + |
| 48 | + |
| 49 | +Built In |
| 50 | +-------- |
| 51 | + |
| 52 | +There is a long list (in theory; you have to go through the code to find |
| 53 | +them) of built in items that are regular. These have one of three call |
| 54 | +signatures: |
| 55 | + |
| 56 | +* Individual: ``TIFFVSetField(tiff, tag, item)`` |
| 57 | +* Multiple: ``TIFFVSetField(tiff, tag, ct, items* )`` |
| 58 | +* Alternate Multiple: ``TIFFVSetField(tiff, tag, items* )`` |
| 59 | + |
| 60 | +In libtiff4, the individual integer-like numeric items are passed as |
| 61 | +32 bit ints (signed or unsigned as appropriate) even if the actual |
| 62 | +item is a short or char. The individual rational and floating point |
| 63 | +types are all passed as a double. |
| 64 | + |
| 65 | +The multiple value items are passed as pointers to packed arrays of |
| 66 | +the correct type, short for short. |
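A minimal sketch of the marshalling rules above, using ``ctypes``. The type names and the helpers ``marshal_scalar`` and ``marshal_array`` are made up for this illustration, and the tables cover only a subset of TIFF types.

```python
import ctypes

# Hypothetical type names used only in this sketch.
# Individual values: integers are widened to 32 bits, real numbers to double.
SCALAR_CTYPES = {
    "byte": ctypes.c_uint32,
    "short": ctypes.c_uint32,
    "long": ctypes.c_uint32,
    "sshort": ctypes.c_int32,
    "slong": ctypes.c_int32,
    "rational": ctypes.c_double,
    "float": ctypes.c_double,
    "double": ctypes.c_double,
}

# Multiple values: packed arrays keep the real element type ("short for short").
ARRAY_CTYPES = {
    "short": ctypes.c_uint16,
    "long": ctypes.c_uint32,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}


def marshal_scalar(type_name, value):
    """Wrap a single value in the widened C type."""
    return SCALAR_CTYPES[type_name](value)


def marshal_array(type_name, values):
    """Pack several values into a contiguous C array of the exact type."""
    array_type = ARRAY_CTYPES[type_name] * len(values)
    return array_type(*values)
```

For instance, ``marshal_array("short", [1, 2, 3])`` occupies six bytes, while ``marshal_scalar("short", 7)`` occupies four.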
| 67 | + |
| 68 | +UNDONE -- This isn't quite true: The count is only used for items |
| 69 | +where ``field_passcount`` is true. Then if ``field_writecount`` == |
| 70 | +``TIFF_VARIABLE2``, the count is a ``uint32``, otherwise it is an int. |
| 71 | +If ``field_passcount`` is false and ``field_writecount`` is ``TIFF_VARIABLE`` or |
| 72 | +``TIFF_VARIABLE2``, then the count is not passed and is set to 1. If it's |
| 73 | +``TIFF_SPP``, then it's set to samplesperpixel. Otherwise, it's set to |
| 74 | +the ``field_writecount``. |
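The count rules in the note above can be restated as a small decision function. This is a Python paraphrase of the prose, not libtiff code; the function name ``write_count`` is invented here, and the sentinel values are assumed to mirror libtiff's headers and should be checked against ``tiffio.h``.

```python
# Sentinel values assumed to match libtiff's tiffio.h; verify before relying on them.
TIFF_VARIABLE = -1
TIFF_SPP = -2
TIFF_VARIABLE2 = -3


def write_count(passcount, writecount, samples_per_pixel, passed_count=None):
    """Return (count, C type of the count argument or None if no count is passed)."""
    if passcount:
        # The caller supplies the count; only its C type varies.
        kind = "uint32" if writecount == TIFF_VARIABLE2 else "int"
        return passed_count, kind
    if writecount in (TIFF_VARIABLE, TIFF_VARIABLE2):
        return 1, None
    if writecount == TIFF_SPP:
        return samples_per_pixel, None
    # A fixed, positive writecount is used directly.
    return writecount, None
```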
| 75 | + |
| 76 | + |
| 77 | +Special Cases |
| 78 | +------------- |
| 79 | + |
| 80 | +There is a set of special cases in the ``tif_dir.c:_TIFFVSetField`` |
| 81 | +function where, tag by tag, the individual items are pulled. These are |
| 82 | +mainly items that are specifically used by the tiff decoder. The |
| 83 | +individual items all follow the pattern of the built ins above, but |
| 84 | +the array based items are special, each in their own way. |
| 85 | + |
| 86 | +* Where there are two shorts passed in, they are passed as separate |
| 87 | + parameters rather than a packed array. (e.g. ``TIFFTAG_PAGENUMBER``) |
| 88 | + |
| 89 | +* ``TIFFTAG_REFERENCEBLACKWHITE`` is just passed as an array of 6 |
| 90 | + ``float``; there's no count of items. |
| 91 | + |
| 92 | +* ``TIFFTAG_COLORMAP`` is passed as three pointers to arrays of ``short`` |
| 93 | + |
| 94 | +* ``TIFFTAG_TRANSFERFUNCTION`` is passed as 1 or 3 pointers to arrays |
| 95 | + of ``short`` |
| 96 | + |
| 97 | +* ``TIFFTAG_INKNAMES`` is passed as a length and a ``char*``. |
| 98 | + |
| 99 | +* ``TIFFTAG_SUBIFD`` is passed as a length and pointer to ``uint32`` |
| 100 | + UNDONE -- is this length in bytes, or in integers, and does this |
| 101 | + change in libtiff5? |
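To make the differences between a few of these special cases concrete, the sketch below builds the arguments that would follow ``tiff, tag`` for three of the tags listed above. The helper ``special_case_args`` is hypothetical, and it assumes the colormap arrives as one flat sequence with all red values first, then green, then blue.

```python
from array import array

# Tag numbers from the TIFF specification.
PAGENUMBER = 297
COLORMAP = 320
REFERENCEBLACKWHITE = 532


def special_case_args(tag, values):
    """Return the positional arguments (after ``tiff, tag``) for a few special tags."""
    if tag == PAGENUMBER:
        if len(values) != 2:
            raise ValueError("PAGENUMBER needs exactly two values")
        # Two separate scalar arguments, not one packed array.
        return [int(values[0]), int(values[1])]
    if tag == REFERENCEBLACKWHITE:
        if len(values) != 6:
            raise ValueError("REFERENCEBLACKWHITE needs exactly six values")
        # A single packed float array with no count argument.
        return [array("f", values)]
    if tag == COLORMAP:
        if len(values) % 3:
            raise ValueError("COLORMAP length must be a multiple of three")
        n = len(values) // 3
        # Three separate arrays of 16-bit values: red, green, blue.
        return [array("H", values[i * n:(i + 1) * n]) for i in range(3)]
    raise NotImplementedError("tag %d is not covered by this sketch" % tag)
```

The length checks stand in for the validation that libtiff does not perform on its own.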
| 102 | + |
| 103 | +Custom Tags |
| 104 | +----------- |
| 105 | + |
| 106 | +These are tags that are not defined in libtiff. To use these, we would |
| 107 | +need to define the tag for that image by passing in the appropriate |
| 108 | +definition. |
| 109 | + |
| 110 | +.. Note:: |
| 111 | + Custom tags are currently unimplemented. |
| 112 | + |
| 113 | + |
| 114 | +Writing Metadata in Python |
| 115 | +++++++++++++++++++++++++++ |
| 116 | + |
| 117 | +UNDONE -- review/expand this on down. |
| 118 | + |
| 119 | +When writing a TIFF file using Python, the IFD is written using the |
| 120 | +code at |
| 121 | +:py:meth:`~PIL.TiffImagePlugin.ImageFileDirectory_v2.save`. This uses |
| 122 | +the types from the IFD to control the writing. This leads to safe but |
| 123 | +possibly out of spec writing. |
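As a rough picture of what "the types control the writing" can mean, here is a standalone sketch that serialises one IFD entry according to its type code. It is not Pillow's implementation; ``pack_ifd_entry`` and the ``TYPE_FORMATS`` table are invented for this example and cover only three TIFF types.

```python
import struct

# TIFF type code -> struct format character (subset: BYTE, SHORT, LONG).
TYPE_FORMATS = {1: "B", 3: "H", 4: "L"}


def pack_ifd_entry(tag, type_code, values, data_offset, byte_order="<"):
    """Return (12-byte entry, out-of-line data bytes) for one tag."""
    fmt = TYPE_FORMATS[type_code]
    # The type alone decides how each value is laid out in bytes.
    data = struct.pack(byte_order + fmt * len(values), *values)
    header = struct.pack(byte_order + "HHL", tag, type_code, len(values))
    if len(data) <= 4:
        # Small values are stored inline, left-justified and zero padded.
        return header + data.ljust(4, b"\0"), b""
    # Larger values live elsewhere in the file; the entry holds their offset.
    return header + struct.pack(byte_order + "L", data_offset), data
```

Because the packing follows whatever type is recorded for the tag, a value that does not fit that type fails in ``struct.pack`` rather than corrupting memory, although a type that differs from the spec's would still be written as given.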
| 124 | + |
| 125 | +Metadata Storage in Pillow |
| 126 | +++++++++++++++++++++++++++ |
| 127 | + |
| 128 | +* See TiffImagePlugin |
| 129 | +* tags_v2 vs tags |
| 130 | + |
| 131 | +Reading Metadata in TiffImagePlugin |
| 132 | ++++++++++++++++++++++++++++++++++++ |
| 133 | + |
| 134 | +* Type confusion between file and spec. |
| 135 | + |