Commit 09af7a6

Add layout module overview .md file from KB
DEVSIX-7309
1 parent 9761e39 commit 09af7a6

3 files changed: +119 -1 lines changed

README.md

Lines changed: 5 additions & 1 deletion
@@ -4,7 +4,7 @@ The iText 7 Core/Community release contains:
 
 - ```kernel-x.y.z.jar```: low-level functionality
 - ```io-x.y.z.jar```: low-level functionality
-- ```layout-x.y.z.jar```: high-level functionality
+- ```layout-x.y.z.jar```: high-level functionality. For more information see [layout overview][layoutMd].
 - ```forms-x.y.z.jar```: AcroForms
 - ```pdfa-x.y.z.jar```: PDF/A-specific functionality
 - ```pdftest-x.y.z.jar```: test helper classes
@@ -15,6 +15,9 @@ The iText 7 Core/Community release contains:
 - ```styled-xml-parser-x.y.z.jar```: use this if you need support for SVG or html2pdf
 - ```svg-x.y.z.jar```: SVG support
 - ```commons-x.y.z.jar```: commons module
+- ```bouncy-castle-connector-x.y.z.jar```: auxiliary internal module
+- ```bouncy-castle-adapter-x.y.z.jar```: use this to apply BouncyCastle as low-level cryptography library
+- ```bouncy-castle-fips-adapter-x.y.z.jar```: use this to apply BouncyCastle FIPS as low-level cryptography library
 
 The **iText 7 Community** source code is hosted on [Github][github], where you can also [download the latest releases][latest].
 
@@ -48,6 +51,7 @@ Contact [sales] for more info.
 [agpl]: LICENSE.md
 [building]: BUILDING.md
 [contributing]: CONTRIBUTING.md
+[layoutMd]: layout/MODULE_OVERVIEW.md
 [itext]: https://itextpdf.com/
 [github]: https://github.com/itext/itext7
 [latest]: https://github.com/itext/itext7/releases/latest

layout/Layout mechanism.png

34.2 KB

layout/MODULE_OVERVIEW.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# Layout module overview

Layout is a core iText module that transforms abstract elements (such as `Paragraph`, `Table` and `List`) into
low-level PDF syntax on actual document pages.

For content presentation, the PDF format accepts only low-level operations, such as "draw a character at a given
position" or "draw a line from (x1, y1) to (x2, y2)". The layout module consists mostly of **rendering engine** logic,
which places **model elements** on the page: it calculates each element's exact position and constructs the
corresponding drawing operations in PDF syntax.

The sections below describe how elements are rendered.

### Rendering Engine

#### Property Containers & Layout Objects
We start with the `IPropertyContainer` interface, which defines methods to set, get, and delete properties. These
methods are generic. All data used by an element is stored as a property.

Properties are stored in a map and are accessed by a key, which is the property number. Why store properties in a
map instead of in regular fields? First, it saves memory: there are many different properties, but each element uses
only a subset of them. Second, it allows a simple inheritance mechanism: if a property is not found in the element's
own map, it is looked up in the parent's map.
The full list of properties is in the class `com.itextpdf.layout.properties.Property`
(`com.itextpdf.layout.property.Property` in releases before 7.2). Some properties are marked as inherited and listed
in the `Property#INHERITED_PROPERTIES` array. When such a property is requested through
`IPropertyContainer#getProperty` and it is not set on the current renderer, it is searched for recursively in the
parents. Because properties are kept in a map, the inheritance mechanism is the same for every property: there is no
need to implement it over and over again or to use reflection.
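
The map-based storage and the parent lookup described above can be illustrated with a minimal sketch. It is a toy model, not the iText implementation: the class `PropertySketch`, its `Node` type, and the property numbers `FONT_SIZE` and `MARGIN_TOP` are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: these are NOT the iText classes or constants.
class PropertySketch {
    // Hypothetical property numbers used as map keys.
    static final int FONT_SIZE = 1;
    static final int MARGIN_TOP = 2;

    // Only the keys listed here may be resolved through the parent.
    static final Set<Integer> INHERITED = Set.of(FONT_SIZE);

    static class Node {
        private final Node parent;
        private final Map<Integer, Object> properties = new HashMap<>();

        Node(Node parent) {
            this.parent = parent;
        }

        void setProperty(int key, Object value) {
            properties.put(key, value);
        }

        // One generic lookup serves every property: own map first,
        // then the parent chain, but only for inheritable keys.
        @SuppressWarnings("unchecked")
        <T> T getProperty(int key) {
            if (properties.containsKey(key)) {
                return (T) properties.get(key);
            }
            if (parent != null && INHERITED.contains(key)) {
                return parent.getProperty(key);
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Node div = new Node(null);
        div.setProperty(FONT_SIZE, 14f);
        div.setProperty(MARGIN_TOP, 10f);
        Node paragraph = new Node(div);

        Float size = paragraph.getProperty(FONT_SIZE);    // found in the parent
        Float margin = paragraph.getProperty(MARGIN_TOP); // not inheritable
        System.out.println(size);
        System.out.println(margin);
    }
}
```

In this sketch the child stores nothing itself, yet it resolves the inheritable key through its parent, while the non-inheritable key stays unset.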

`IPropertyContainer` has two direct sub-interfaces: `IElement` and `IRenderer`. The `IElement` interface is
implemented by classes such as `Text`, `Paragraph` and `Table`. These are the objects that we add to a document,
either directly or indirectly (for example, when we add a `String` to a `Paragraph`, a `Text` element automatically
wraps that string under the hood). The `IRenderer` interface is implemented by classes such as `TextRenderer`,
`ParagraphRenderer` and `TableRenderer`. These renderers are used internally by iText, but we can subclass them to
tweak the way an object is rendered. Each renderer borrows the properties of its model element: it first checks
whether the property is set on the renderer itself, then on the corresponding model element, and then repeats the
same check for the parent renderer (if the property is inheritable). **If model element properties need to be
overridden during layout, set them on the renderer**, so that the model element's own properties are not polluted.

It is important to separate the element model structure from the logic that performs the actual placement (the
rendering logic). At the model level, a tree of model elements is created: an abstraction that represents the
structure of the elements that will be added to the document. Each of these elements (`Paragraph`, `Image`, etc.) can
be added several times. The rendering level contains the logic that fills the PDF with data obtained from the model
element tree. Several renderers can be created for one model element, and their results may differ, but each renderer
is based on exactly one model element.
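
The lookup order (renderer, then model element, then parent renderer) and the advice to override on the renderer can be sketched as follows. This is a simplified model under stated assumptions: `RendererLookupSketch`, `ModelElement`, `Renderer` and the keys `FONT_COLOR` and `WIDTH` are hypothetical and do not correspond to iText classes or constants.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical classes, not the iText API.
class RendererLookupSketch {
    static final int FONT_COLOR = 1; // hypothetical inheritable property
    static final int WIDTH = 2;      // hypothetical non-inheritable property
    static final Set<Integer> INHERITED = Set.of(FONT_COLOR);

    static class ModelElement {
        final Map<Integer, Object> properties = new HashMap<>();
    }

    static class Renderer {
        final ModelElement model;
        final Renderer parent;
        final Map<Integer, Object> properties = new HashMap<>();

        Renderer(ModelElement model, Renderer parent) {
            this.model = model;
            this.parent = parent;
        }

        Object getProperty(int key) {
            if (properties.containsKey(key)) {
                return properties.get(key);       // 1. the renderer itself
            }
            if (model.properties.containsKey(key)) {
                return model.properties.get(key); // 2. its model element
            }
            if (parent != null && INHERITED.contains(key)) {
                return parent.getProperty(key);   // 3. the parent renderer
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ModelElement paragraph = new ModelElement();
        paragraph.properties.put(WIDTH, 200);

        // Two renderers based on the same model element.
        Renderer first = new Renderer(paragraph, null);
        Renderer second = new Renderer(paragraph, null);
        first.properties.put(WIDTH, 150); // override on the renderer only

        System.out.println(first.getProperty(WIDTH));
        System.out.println(second.getProperty(WIDTH));
        System.out.println(paragraph.properties.get(WIDTH));
    }
}
```

Because the override lives in the renderer's own map, the second renderer and the model element still see the original value.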

Let's look at renderers in more detail.

#### Renderers
Renderers have two main responsibilities:

- `layout()`: calculating the area and position that the renderer's object takes up on the canvas. `layout()` can work
  with different input parameters and properties; calling it several times for the same renderer is explicitly
  allowed, and the results may differ.
- `draw()`: creating the appearance and adding it to the canvas. It can be called only once, after layout, because
  `draw()` changes the `PdfDocument`.
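
The contrast between a repeatable calculation and a one-time side effect can be shown with a small sketch. It is a toy model: `ToyTextRenderer` is a hypothetical class, the "document" is just a list of strings, and the line-count formula is an assumption made for illustration (it expects a positive `charsPerLine`).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical renderer, not an iText class.
class LayoutDrawSketch {
    static class ToyTextRenderer {
        private final String text;
        private final List<String> output; // stands in for the PDF document
        private int lines = -1;            // result of the latest layout()
        private boolean drawn = false;

        ToyTextRenderer(String text, List<String> output) {
            this.text = text;
            this.output = output;
        }

        // Pure calculation: number of lines needed when each line holds
        // charsPerLine characters. Safe to call repeatedly.
        int layout(int charsPerLine) {
            lines = (text.length() + charsPerLine - 1) / charsPerLine;
            return lines;
        }

        // Side effect: appends to the output, so it is allowed only once.
        void draw() {
            if (lines < 0) {
                throw new IllegalStateException("layout() must be called before draw()");
            }
            if (drawn) {
                throw new IllegalStateException("draw() may be called only once");
            }
            output.add(text + " (" + lines + " lines)");
            drawn = true;
        }
    }

    public static void main(String[] args) {
        List<String> document = new ArrayList<>();
        ToyTextRenderer renderer = new ToyTextRenderer("Hello layout", document);
        System.out.println(renderer.layout(4)); // narrow area
        System.out.println(renderer.layout(6)); // wider area, different result
        renderer.draw();
        System.out.println(document);
    }
}
```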

The base class for renderers is `AbstractRenderer`. It contains a basic set of properties and operations common to
all renderers. The next important class is `BlockRenderer`, the superclass for renderers of high-level layout
objects, such as `DivRenderer` and `ListRenderer`. At a lower level is `LeafRenderer`, which covers renderers such as
`TextRenderer` and `ImageRenderer`. `LineRenderer` also deserves a mention: it is not an independent renderer and is
used only inside `ParagraphRenderer`.

The main entry point of the layout mechanism is the abstract class `RootRenderer`, whose methods build the hierarchy
of renderers. It has no parent renderer, and root renderers such as `CanvasRenderer` and `DocumentRenderer` inherit
from it. These renderers are created from the `Canvas` and `Document` objects.

**You need to understand the difference between the _Document_ and _PdfDocument_ objects, and between _Canvas_ and
_PdfCanvas_**. `PdfDocument` and `PdfCanvas` work with the PDF at a lower level: PDF pages and internal PDF objects
such as arrays and dictionaries. `Document` and `Canvas`, together with their renderers, are the links between the
layout mechanism and the structure of the output PDF file. These classes can therefore perform similar operations,
but at different levels of abstraction. For example, adding some text that takes 15 lines of code with `PdfDocument`
takes only a few lines with `Document`.

- `DocumentRenderer` directs the writing of layout objects to page content streams and creates new pages in the
  document when needed, so that elements that do not fit on one page continue on the next.
- `CanvasRenderer` directs the writing of layout objects to a single arbitrary content stream (for example, a
  `PdfFormXObject` or a page content stream). It writes to a single area only, so content that does not fit is not
  shown.
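
The difference between the two root renderers can be sketched with plain numbers. This is a toy model, not iText code: `placeOnPages` and `placeOnSingleArea` are hypothetical helpers, elements are reduced to integer heights, and elements are assumed to be unsplittable to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins for the two kinds of root renderer.
class RootRendererSketch {
    // "Document-like" root: starts a new page whenever the current one is full.
    static List<List<Integer>> placeOnPages(List<Integer> heights, int pageHeight) {
        List<List<Integer>> pages = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        pages.add(current);
        int used = 0;
        for (int height : heights) {
            if (used + height > pageHeight && !current.isEmpty()) {
                current = new ArrayList<>();
                pages.add(current);
                used = 0;
            }
            current.add(height);
            used += height;
        }
        return pages;
    }

    // "Canvas-like" root: one fixed area; content that does not fit is dropped.
    static List<Integer> placeOnSingleArea(List<Integer> heights, int areaHeight) {
        List<Integer> placed = new ArrayList<>();
        int used = 0;
        for (int height : heights) {
            if (used + height > areaHeight) {
                break;
            }
            placed.add(height);
            used += height;
        }
        return placed;
    }

    public static void main(String[] args) {
        List<Integer> heights = List.of(300, 300, 300);
        System.out.println(placeOnPages(heights, 700));      // continues on a second page
        System.out.println(placeOnSingleArea(heights, 700)); // third element is not shown
    }
}
```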

The rendering logic is triggered when an element is added to a `RootElement`. Elements added to the document form a
tree in which each parent element has a list of children. This tree is built by the code that declares and adds the
elements, for example:

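```java
import com.itextpdf.layout.element.Div;
import com.itextpdf.layout.element.Paragraph;

// Nesting elements builds the model tree: the Div becomes the parent of the Paragraph.
Div container = new Div();
Paragraph paragraph = new Paragraph("New paragraph.");
container.add(paragraph);
```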

Then the `IElement#createRendererSubTree` method recursively converts these elements, in depth-first order, into a
tree of renderers. Next, the layout algorithm starts: it calls the `layout()` method of the renderer for the element
that was added directly to the root.
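
The recursive conversion of a model tree into a renderer tree can be sketched as below. This is a toy model that only mirrors the idea: the `Element` and `Renderer` classes are hypothetical, and the method body is an assumption about the general shape of such a conversion, not a copy of the iText implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes that mirror the idea, not iText code.
class RendererTreeSketch {
    static class Element {
        final String name;
        final List<Element> children = new ArrayList<>();

        Element(String name) {
            this.name = name;
        }

        Element add(Element child) {
            children.add(child);
            return this;
        }

        // Creates a renderer for this element, then recurses into each child
        // before moving on to the next sibling (depth-first).
        Renderer createRendererSubTree() {
            Renderer renderer = new Renderer(this);
            for (Element child : children) {
                Renderer childRenderer = child.createRendererSubTree();
                childRenderer.parent = renderer;
                renderer.children.add(childRenderer);
            }
            return renderer;
        }
    }

    static class Renderer {
        final Element model;
        Renderer parent;
        final List<Renderer> children = new ArrayList<>();

        Renderer(Element model) {
            this.model = model;
        }

        // Prints the tree, e.g. "DivRenderer[ParagraphRenderer[TextRenderer]]".
        String describe() {
            StringBuilder sb = new StringBuilder(model.name).append("Renderer");
            if (!children.isEmpty()) {
                sb.append('[');
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) {
                        sb.append(", ");
                    }
                    sb.append(children.get(i).describe());
                }
                sb.append(']');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Element div = new Element("Div")
                .add(new Element("Paragraph").add(new Element("Text")))
                .add(new Element("Image"));
        System.out.println(div.createRendererSubTree().describe());
    }
}
```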

The purpose of the `layout()` method is to determine the free space on the page and fill it with elements. Layout is
performed first for all children of an element (depth-first traversal); the space occupied by a parent is then
determined by its children plus its own properties. Data is passed from parent elements to children in a
`LayoutContext`, which stores information such as the area and the page number.
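
The rule "children first, then the parent's own properties" can be sketched with heights only. This is a toy model: `Box`, `ownHeight` and `padding` are hypothetical names, the `availableHeight` argument stands in for the area that a layout context would carry, and a return value of `-1` is an assumed convention for "does not fit".

```java
import java.util.List;

// Toy model only: hypothetical names; heights are in points.
class OccupiedAreaSketch {
    static class Box {
        final float ownHeight;   // height of own content, used by leaves
        final float padding;     // own property added above and below the children
        final List<Box> children;

        Box(float ownHeight, float padding, List<Box> children) {
            this.ownHeight = ownHeight;
            this.padding = padding;
            this.children = children;
        }

        // Returns the occupied height, or -1 if the box does not fit.
        float layout(float availableHeight) {
            float occupied = 2 * padding + ownHeight;
            for (Box child : children) {
                // Each child receives only the space that is still free.
                float childHeight = child.layout(availableHeight - occupied);
                if (childHeight < 0) {
                    return -1;
                }
                occupied += childHeight;
            }
            return occupied <= availableHeight ? occupied : -1;
        }
    }

    public static void main(String[] args) {
        Box first = new Box(20, 0, List.of());
        Box second = new Box(30, 0, List.of());
        Box parent = new Box(0, 10, List.of(first, second));
        System.out.println(parent.layout(100)); // children plus own padding
        System.out.println(parent.layout(60));  // second child does not fit
    }
}
```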

When a page is filled, two kinds of areas are used: `LayoutBBox` (the available area that a parent gives to its child
elements) and `OccupiedArea` (the area taken up by all placed elements, including the occupied areas of child
renderers). If an element does not fit in the given area, it is split into two independent renderers: the first is
usually called the **split renderer** and the second the **overflow renderer**. The **split renderer** holds the data
that fits in the available page area; it has been laid out successfully and is ready to be drawn. The **overflow
renderer** holds the part of the element that has not been positioned yet; it is transferred to the next page, where
`layout()` is called on it.

After all actions in `layout()` are finished, it returns a `LayoutResult` that describes the placement of the current
element on the page. This object states whether the current renderer was placed on the page in full
(`LayoutResult#FULL`), partially (`LayoutResult#PARTIAL`) or not at all (`LayoutResult#NOTHING`), and it also carries
the occupied area and the split and overflow renderers.
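
The split/overflow outcome can be sketched on a list of text lines. This is a toy model under stated assumptions: `Status`, `Result` and the static `layout` helper are hypothetical and only imitate the three outcomes named above, with every line given the same height.

```java
import java.util.List;

// Toy model only: hypothetical types that imitate the FULL / PARTIAL / NOTHING outcome.
class SplitOverflowSketch {
    enum Status { FULL, PARTIAL, NOTHING }

    static final class Result {
        final Status status;
        final List<String> split;    // lines that fit in the given area
        final List<String> overflow; // lines left for the next area

        Result(Status status, List<String> split, List<String> overflow) {
            this.status = status;
            this.split = split;
            this.overflow = overflow;
        }
    }

    static Result layout(List<String> lines, int lineHeight, int availableHeight) {
        int capacity = Math.max(0, availableHeight / lineHeight);
        int fitting = Math.min(lines.size(), capacity);
        if (fitting == lines.size()) {
            return new Result(Status.FULL, lines, List.of());
        }
        if (fitting == 0) {
            return new Result(Status.NOTHING, List.of(), lines);
        }
        return new Result(Status.PARTIAL,
                lines.subList(0, fitting),
                lines.subList(fitting, lines.size()));
    }

    public static void main(String[] args) {
        List<String> remaining = List.of("a", "b", "c", "d", "e");
        int page = 1;
        // Keep laying out the overflow part on a fresh "page" until nothing is left.
        while (!remaining.isEmpty()) {
            Result result = layout(remaining, 10, 25);
            System.out.println("page " + page + ": " + result.status + " " + result.split);
            if (result.status == Status.NOTHING) {
                break; // the area is too small for even one line
            }
            remaining = result.overflow;
            page++;
        }
    }
}
```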

The layout mechanism is summarized in the figure:

![Layout mechanism](Layout%20mechanism.png)

The next step is to call the `draw()` method. It uses the layout result from the `layout()` step and generates the
PDF syntax that is written to the PDF document: drawing instructions based on the rendering result.

Specific renderers produce the following instructions:
- `TextRenderer`: text instructions written to `PdfCanvas`
- `ImageRenderer`: creating and adding an `XObject`
- `TableRenderer`: borders, etc.
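
To make "text instructions" concrete, the sketch below builds a fragment of PDF content-stream syntax as a plain string. It is an illustration only: `DrawSketch` and its methods are hypothetical, the font resource name `F1` is an assumption, and iText itself emits such operators through `PdfCanvas` rather than by string concatenation.

```java
// Toy model only: builds a content-stream fragment as a string.
class DrawSketch {
    // Escapes the characters that are special inside a PDF literal string.
    static String escape(String text) {
        return text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)");
    }

    // fontName is a hypothetical font resource name such as "F1".
    static String drawText(String text, String fontName, int fontSize, int x, int y) {
        return "BT\n"                                   // begin text object
                + "/" + fontName + " " + fontSize + " Tf\n" // select font and size
                + x + " " + y + " Td\n"                 // move to the position from layout
                + "(" + escape(text) + ") Tj\n"         // show the string
                + "ET\n";                               // end text object
    }

    public static void main(String[] args) {
        System.out.print(drawText("Hi (there)", "F1", 12, 72, 720));
    }
}
```

The position passed to `drawText` is what a layout step would have computed; the draw step only turns it into operators.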
