| 1 | +# Layout module overview |
| 2 | + |
| 3 | +Layout is a basic iText module that transforms abstract elements |
| 4 | +(like Paragraph, Table, List) into low-level PDF syntax on actual document pages. |
| 5 | + |
| 6 | +For content presentation, the PDF format accepts only low-level operations, such as "draw a character at a given |
| 7 | +position" or "draw a line from (x1,y1) to (x2, y2)". The layout module mostly consists of the **rendering engine** logic, |
| 8 | +which deals with the placement of various **model elements** on the page: calculating each element's exact position on the page |
| 9 | +and constructing drawing operations in PDF syntax. |
| 10 | + |
| 11 | +Let's look at how elements are rendered. |
| 12 | + |
| 13 | +### Rendering Engine |
| 14 | + |
| 15 | +#### Property Containers & Layout Objects |
| 16 | +We start with the `IPropertyContainer` interface. This interface defines methods to set, get, and |
| 17 | +delete properties. These methods work with generic types. All data used in the elements is stored as a property. |
| 18 | + |
| 19 | +Properties are stored in maps, and you can access them using a special key, which is the property number. |
| 20 | +Why do we store properties in a map instead of using regular fields? First of all, it saves memory, because there are |
| 21 | +many different types of properties, but each element works with only some of them. It also allows us to maintain a |
| 22 | +simple inheritance mechanism: we look for a property in the parent map if we cannot find it in the element map. |
| 23 | +A list of all the properties is in the class `com.itextpdf.layout.property.Property`. Some properties are marked as |
| 24 | +inherited and listed in the `Property#INHERITED_PROPERTIES` array. This means that when we request such a property |
| 25 | +with the method `IPropertyContainer#getProperty` and it is not among the properties of the current renderer, |
| 26 | +it is searched for in the parents recursively. Because properties are kept in a map, the inheritance |
| 27 | +mechanism is the same for every property: there's no need to implement it over and over again or to use reflection. |
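
The map-plus-inheritance idea described above can be illustrated with a small standalone sketch. It is a simplified model and not iText's actual implementation: the class `PropertyMapSketch`, the nested `Node` type, and the keys `FONT_SIZE` and `WIDTH` are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of map-based properties with inheritance (not iText code).
class PropertyMapSketch {
    // Hypothetical property numbers; iText keeps the real ones in Property.
    static final int FONT_SIZE = 1;
    static final int WIDTH = 2;

    // Only keys in this set are looked up in the parent chain.
    static final Set<Integer> INHERITED = Collections.singleton(FONT_SIZE);

    static class Node {
        final Node parent;
        // Each node stores only the properties that were actually set on it.
        final Map<Integer, Object> properties = new HashMap<>();

        Node(Node parent) {
            this.parent = parent;
        }

        void setProperty(int key, Object value) {
            properties.put(key, value);
        }

        @SuppressWarnings("unchecked")
        <T> T getProperty(int key) {
            Object own = properties.get(key);
            if (own != null) {
                return (T) own;
            }
            // One generic fallback serves every inheritable property.
            if (parent != null && INHERITED.contains(key)) {
                return parent.getProperty(key);
            }
            return null;
        }
    }
}
```

In this sketch a child node with no properties of its own resolves `FONT_SIZE` through its parent, while `WIDTH` stays `null` because it is not in the inherited set.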
| 28 | + |
| 29 | +The `IPropertyContainer` interface has two direct sub-interfaces: `IElement` and `IRenderer`. The `IElement` interface is |
| 30 | +implemented by classes such as `Text`, `Paragraph` and `Table`. These are the objects that we add to a document, either |
| 31 | +directly or indirectly (for example, when we add a `String` to a `Paragraph`, a `Text` element automatically |
| 32 | +wraps this string under the hood). The `IRenderer` interface is implemented by classes such as `TextRenderer`, |
| 33 | +`ParagraphRenderer` and `TableRenderer`. These renderers are used internally by iText, but we can subclass them if we |
| 34 | +want to tweak the way an object is rendered. Each renderer borrows the properties of the corresponding model element: |
| 35 | +it first checks if the property is available in the renderer, then in the corresponding model element, and then |
| 36 | +performs the same check for the parent renderer (if the property is inheritable). **If the model element properties |
| 37 | +need to be overridden during layout, set them on the renderer**, because we don't want to pollute the |
| 38 | +model element properties. It is important to separate the element model structure from the logic that performs the actual element |
| 39 | +placement (rendering logic). At the model level, a tree of models is created: an abstraction that represents |
| 40 | +the structure of the elements that will be added to the document. Each of these elements (Paragraph, Image, etc.) can |
| 41 | +be added several times. Rendering contains the logic that fills the PDF with data obtained from the model |
| 42 | +element tree. Several renderers can be created for one model element, and the results of these renderers |
| 43 | +can differ, but each renderer is based on exactly one model element. |
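
The lookup order described above (renderer, then model element, then parent renderer for inheritable properties) can be sketched with plain Java classes. This is an illustrative model only; `RendererLookupSketch`, `Model`, `Renderer`, and the property keys are hypothetical and do not mirror iText's real class bodies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of renderer property resolution (not iText code).
class RendererLookupSketch {
    static final int FONT_SIZE = 1; // hypothetical key, treated as inheritable
    static final int WIDTH = 2;     // hypothetical key, not inheritable
    static final Set<Integer> INHERITED = Collections.singleton(FONT_SIZE);

    // Stand-in for a model element such as Paragraph.
    static class Model {
        final Map<Integer, Object> properties = new HashMap<>();
    }

    static class Renderer {
        final Model model;
        final Renderer parent;
        // Overrides made during layout live here, so the model stays untouched.
        final Map<Integer, Object> properties = new HashMap<>();

        Renderer(Model model, Renderer parent) {
            this.model = model;
            this.parent = parent;
        }

        Object getProperty(int key) {
            if (properties.containsKey(key)) {
                return properties.get(key);        // 1. the renderer itself
            }
            if (model.properties.containsKey(key)) {
                return model.properties.get(key);  // 2. its model element
            }
            if (parent != null && INHERITED.contains(key)) {
                return parent.getProperty(key);    // 3. the parent renderer
            }
            return null;
        }
    }
}
```

Because an override is stored on the renderer, a second renderer created for the same model still sees the model's original value.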
| 44 | + |
| 45 | +Let's consider the renderer mechanism in more detail. |
| 46 | + |
| 47 | +#### Renderers |
| 48 | +Renderers have two main responsibilities: |
| 49 | + |
| 50 | +- `layout()` - Calculating the area and position that the renderer's object takes up on the canvas. `layout()` can work with different input |
| 51 | +parameters and properties, and it is explicitly allowed to call layout several times for the same renderer; the results |
| 52 | +may differ. |
| 53 | +- `draw()` - Creating the appearance and adding it to the canvas. It can be called only once, after layout; the `PdfDocument` |
| 54 | +is changed after `draw()`. |
| 55 | + |
| 56 | +The base class for renderers is `AbstractRenderer`. It contains a basic set of properties and operations that are common |
| 57 | +to all renderers. The next important class is `BlockRenderer`, which is a superclass for the renderers of high-level layout |
| 58 | +objects, such as `DivRenderer`, `ListRenderer` etc. At a lower level there is `LeafRenderer`, which covers renderers |
| 59 | +such as `TextRenderer` and `ImageRenderer`. We also need to mention `LineRenderer`, which is not an independent renderer |
| 60 | +but is only used inside the `ParagraphRenderer`. The main entry point for the layout mechanism is the abstract class `RootRenderer`. |
| 61 | +The methods of this class implement the mechanism for constructing the hierarchy of renderers. It does not have a |
| 62 | +parent renderer, and root renderers such as `CanvasRenderer` and `DocumentRenderer` inherit from it. |
| 63 | +These renderers are created from the `Document` and `Canvas` objects. **You need to understand the difference between the |
| 64 | +_Document_ and _PdfDocument_ objects and also between _Canvas_ and _PdfCanvas_**. `PdfDocument` and `PdfCanvas` work with the PDF |
| 65 | +at a lower level: with PDF pages and internal PDF objects such as arrays, dictionaries, etc. `Document` and `Canvas`, together with |
| 66 | +their corresponding renderers, are the connecting links between the layout mechanism and the output PDF file structure. |
| 67 | +These classes can perform similar operations, but at different levels of abstraction. For example, if you need 15 |
| 68 | +lines of code to add some text with `PdfDocument`, `Document` will do this in a few lines. `DocumentRenderer` directs the |
| 69 | +writing of the layout objects to page content streams and handles the creation of new pages in the document when needed |
| 70 | +for the continuous placement of elements that do not fit on one page. `CanvasRenderer` directs the writing of the layout objects |
| 71 | +to a single arbitrary content stream (e.g. a `PdfFormXObject`, or a page content stream), so it only writes to a single |
| 72 | +area, which means that content that does not fit will not be shown. |
| 73 | + |
| 74 | +The rendering logic is triggered when an element is added to a `RootElement`. Elements added to the document are |
| 75 | +organized as a tree, where each parent element has a list of children. This tree is built in user code, |
| 76 | +as the elements are declared and added to each other, as here: |
| 77 | + |
| 78 | +``` |
| 79 | +Div container = new Div(); |
| 80 | +Paragraph paragraph = new Paragraph("New paragraph."); |
| 81 | +container.add(paragraph); |
| 82 | +``` |
| 83 | + |
| 84 | +Then, using the `IElement#createRendererSubTree` method, these elements are recursively converted, in depth-first order, |
| 85 | +into a tree of renderers. Next, the renderer layout algorithm begins to work. It starts by calling the `layout()` method of |
| 86 | +the renderer for the element that was added directly to the root. |
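
The recursive conversion of a model tree into a renderer tree can be sketched as follows. This is a toy model under stated assumptions: `RendererTreeSketch`, `Element`, and `Renderer` are hypothetical stand-ins, and only the method name `createRendererSubTree` is borrowed from the text above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of building a renderer tree from a model tree (not iText code).
class RendererTreeSketch {
    static class Element {
        final String name;
        final List<Element> children = new ArrayList<>();

        Element(String name) {
            this.name = name;
        }

        Element add(Element child) {
            children.add(child);
            return this;
        }

        // Depth-first: a child's whole subtree is converted before the next sibling.
        Renderer createRendererSubTree() {
            Renderer renderer = new Renderer(this);
            for (Element child : children) {
                renderer.addChild(child.createRendererSubTree());
            }
            return renderer;
        }
    }

    static class Renderer {
        final Element model;
        Renderer parent;
        final List<Renderer> children = new ArrayList<>();

        Renderer(Element model) {
            this.model = model;
        }

        void addChild(Renderer child) {
            child.parent = this;
            children.add(child);
        }
    }
}
```

Calling `createRendererSubTree()` twice on the same element yields two independent renderer trees that share the same model objects, which matches the idea that one model element can back several renderers.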
| 87 | + |
| 88 | +The purpose of the `layout()` method is to determine the free space on the page and fill it with elements. Layout is |
| 89 | +performed first for all children of the element (depth-first traversal). The occupied space of a parent is determined |
| 90 | +by its children plus its own properties. Data is passed from parent elements to children through a `LayoutContext`, which |
| 91 | +stores information about the area, the page number, and more. |
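
How a parent's occupied space follows from its children plus its own properties can be shown with a deliberately small sketch. Everything here is an assumption made for illustration: `BlockLayoutSketch`, `Context`, `Box`, `Leaf`, and `Block` are hypothetical types, padding is the only "own property", children are stacked vertically, and overflow handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of parent-from-children layout (not iText code, no overflow handling).
class BlockLayoutSketch {
    // Stand-in for the data a parent hands to a child: available area and page number.
    static class Context {
        final float availableWidth;
        final float availableHeight;
        final int pageNumber;

        Context(float availableWidth, float availableHeight, int pageNumber) {
            this.availableWidth = availableWidth;
            this.availableHeight = availableHeight;
            this.pageNumber = pageNumber;
        }
    }

    interface Box {
        // Returns the height this box occupies.
        float layout(Context context);
    }

    static class Leaf implements Box {
        final float height;

        Leaf(float height) {
            this.height = height;
        }

        public float layout(Context context) {
            return height;
        }
    }

    static class Block implements Box {
        final float padding;
        final List<Box> children = new ArrayList<>();

        Block(float padding) {
            this.padding = padding;
        }

        public float layout(Context context) {
            float innerWidth = context.availableWidth - 2 * padding;
            float innerHeight = context.availableHeight - 2 * padding;
            float used = 0;
            // Children are laid out first; each one receives the space that is still free.
            for (Box child : children) {
                used += child.layout(new Context(innerWidth, innerHeight - used, context.pageNumber));
            }
            // Own property (padding) plus the children's occupied height.
            return used + 2 * padding;
        }
    }
}
```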
| 92 | + |
| 93 | +When filling the page, two types of areas are used: `LayoutBBox` (the available area that a parent gives to its |
| 94 | +child elements) and `OccupiedArea` (the area taken up by all placed elements, including the occupied areas of child |
| 95 | +renderers). If an element doesn't fit in a given area, it is split into two independent renderers. The first renderer is |
| 96 | +usually called the **split renderer** and the second the **overflow renderer**. The **split renderer** holds the data that fits |
| 97 | +in the available page area; it has been laid out successfully and is ready to be drawn. The **overflow renderer** contains the part |
| 98 | +of the element that is not yet positioned; it is transferred to the next page, and we call `layout()` on it. After all |
| 99 | +actions in `layout()` are finished, it returns a `LayoutResult` describing the placement of the current element on the page. This |
| 100 | +object contains info about whether the current renderer was placed on the page in full (`LayoutResult#FULL`), partially |
| 101 | +(`LayoutResult#PARTIAL`) or not at all (`LayoutResult#NOTHING`), as well as info about the occupied area and the split/overflow |
| 102 | +renderers. |
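
The split/overflow mechanism can be sketched with a toy renderer that holds lines of a fixed height. This is a simplified model, not iText's implementation: `SplitOverflowSketch`, `LinesRenderer`, `Result`, and `paginate` are hypothetical names, and an enum stands in for the `LayoutResult` status values named above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of split and overflow renderers (not iText code).
class SplitOverflowSketch {
    enum Status { FULL, PARTIAL, NOTHING }

    static class Result {
        final Status status;
        final float occupiedHeight;
        final LinesRenderer split;    // the part that fits, ready to be drawn
        final LinesRenderer overflow; // the part that still has to be laid out

        Result(Status status, float occupiedHeight, LinesRenderer split, LinesRenderer overflow) {
            this.status = status;
            this.occupiedHeight = occupiedHeight;
            this.split = split;
            this.overflow = overflow;
        }
    }

    static class LinesRenderer {
        final List<String> lines;
        final float lineHeight;

        LinesRenderer(List<String> lines, float lineHeight) {
            this.lines = lines;
            this.lineHeight = lineHeight;
        }

        Result layout(float availableHeight) {
            int capacity = Math.max(0, (int) (availableHeight / lineHeight));
            int fitting = Math.min(lines.size(), capacity);
            if (fitting == lines.size()) {
                return new Result(Status.FULL, fitting * lineHeight, this, null);
            }
            if (fitting == 0) {
                return new Result(Status.NOTHING, 0f, null, this);
            }
            LinesRenderer split = new LinesRenderer(lines.subList(0, fitting), lineHeight);
            LinesRenderer overflow = new LinesRenderer(lines.subList(fitting, lines.size()), lineHeight);
            return new Result(Status.PARTIAL, fitting * lineHeight, split, overflow);
        }
    }

    // Lays the overflow out on a fresh "page" until everything is placed.
    static List<Integer> paginate(LinesRenderer renderer, float pageHeight) {
        List<Integer> linesPerPage = new ArrayList<>();
        LinesRenderer current = renderer;
        while (current != null) {
            Result result = current.layout(pageHeight);
            if (result.status == Status.NOTHING) {
                throw new IllegalStateException("A single line does not fit on an empty page");
            }
            linesPerPage.add(result.split.lines.size());
            current = result.overflow;
        }
        return linesPerPage;
    }
}
```

The `paginate` loop plays the role that the text assigns to a root renderer: it keeps the split part for the current page and calls `layout` again on the overflow part with a new area.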
| 103 | + |
| 104 | +Briefly, the layout mechanism is shown in the figure: |
| 105 | + |
| 106 | + |
| 107 | + |
| 108 | +The next step is to call the `draw()` method. It uses the layout result from the `layout()` step and generates the PDF syntax |
| 109 | +written to the PDF document: drawing instructions based on the layout result. |
| 110 | + |
| 111 | +Specific renderers produce the following instructions: |
| 112 | +- `TextRenderer`: text instructions to `PdfCanvas` |
| 113 | +- `ImageRenderer`: creating and adding `XObject` |
| 114 | +- `TableRenderer`: borders, etc. |