Add layout module overview .md file from KB

introfog · introfog · commit 09af7a6b3081 · 2023-03-23T07:27:04.000Z
DEVSIX-7309
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@ The iText 7 Core/Community release contains:
 
 - ```kernel-x.y.z.jar```: low-level functionality
 - ```io-x.y.z.jar```:  low-level functionality
-- ```layout-x.y.z.jar```: high-level functionality
+- ```layout-x.y.z.jar```: high-level functionality. For more information, see the [layout overview][layoutMd].
 - ```forms-x.y.z.jar```: AcroForms
 - ```pdfa-x.y.z.jar```: PDF/A-specific functionality
 - ```pdftest-x.y.z.jar```: test helper classes
@@ -15,6 +15,9 @@ The iText 7 Core/Community release contains:
 - ```styled-xml-parser-x.y.z.jar```: use this if you need support for SVG or html2pdf
 - ```svg-x.y.z.jar```: SVG support
 - ```commons-x.y.z.jar```: commons module
+- ```bouncy-castle-connector-x.y.z.jar```: auxiliary internal module
+- ```bouncy-castle-adapter-x.y.z.jar```: use this to apply BouncyCastle as the low-level cryptography library
+- ```bouncy-castle-fips-adapter-x.y.z.jar```: use this to apply BouncyCastle FIPS as the low-level cryptography library
 
 The **iText 7 Community** source code is hosted on [Github][github], where you can also [download the latest releases][latest].
 
@@ -48,6 +51,7 @@ Contact [sales] for more info.
 [agpl]: LICENSE.md
 [building]: BUILDING.md
 [contributing]: CONTRIBUTING.md
+[layoutMd]: layout/MODULE_OVERVIEW.md
 [itext]: https://itextpdf.com/
 [github]: https://github.com/itext/itext7
 [latest]: https://github.com/itext/itext7/releases/latest
diff --git a/layout/Layout mechanism.png b/layout/Layout mechanism.png
diff --git a/layout/MODULE_OVERVIEW.md b/layout/MODULE_OVERVIEW.md
@@ -0,0 +1,114 @@
+# Layout module overview
+
+Layout is a basic iText module that transforms abstract elements (such as `Paragraph`, `Table`, and `List`) into
+low-level PDF syntax on actual document pages.
+
+In terms of content presentation, the PDF format only accepts low-level operations, such as "draw a character at a
+given position" or "draw a line from (x1, y1) to (x2, y2)". The layout module mostly consists of the **rendering
+engine** logic, which places various **model elements** on the page: it calculates each element's exact position and
+constructs the drawing operations in PDF syntax.
+
+Let's look at how elements are rendered.
+
+### Rendering Engine
+
+#### Property Containers & Layout Objects
+We start with the `IPropertyContainer` interface. It defines methods to set, get, and delete properties. These
+methods work with generic types. All data used by the elements is stored as properties.
+
+Properties are stored in maps, and each one is accessed by a special key, which is the property number.
+Why store properties in a map instead of using regular fields? First of all, it saves memory: there are
+many different types of properties, but each element only works with some of them. It also allows a simple
+inheritance mechanism: if a property is not found in the element's map, we look for it in the parent's map.
+All properties are listed in the class `com.itextpdf.layout.properties.Property`. Some properties are marked as
+inherited and placed in the `Property#INHERITED_PROPERTIES` array. This means that when we request such a property
+with `IPropertyContainer#getProperty` and it is not among the properties of the current renderer, it is searched for
+recursively in the parents. Because properties are kept in a map, the inheritance mechanism is the same for every
+property: there's no need to implement it over and over again or to use reflection.
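+
+The lookup described above can be illustrated with a minimal, self-contained sketch. It is not iText code:
+`PropertyNode`, its two keys, and the choice of which key counts as inherited are assumptions made for illustration
+only.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+
+// Hypothetical stand-in for a property container; not an iText class.
+final class PropertyNode {
+    // Illustrative property keys (assumed values, not iText's real numbers).
+    static final int FONT_SIZE = 1;   // treated as inherited in this sketch
+    static final int MARGIN_TOP = 2;  // treated as non-inherited in this sketch
+
+    private final Map<Integer, Object> properties = new HashMap<>();
+    private final PropertyNode parent;
+
+    PropertyNode(PropertyNode parent) {
+        this.parent = parent;
+    }
+
+    private static boolean isInherited(int key) {
+        return key == FONT_SIZE;
+    }
+
+    void setProperty(int key, Object value) {
+        properties.put(key, value);
+    }
+
+    // Own value first; otherwise ask the parent, but only for inherited keys.
+    @SuppressWarnings("unchecked")
+    <T> T getProperty(int key) {
+        if (properties.containsKey(key)) {
+            return (T) properties.get(key);
+        }
+        if (parent != null && isInherited(key)) {
+            return parent.getProperty(key);
+        }
+        return null;
+    }
+}
+```
+
+With a parent that holds both `FONT_SIZE` and `MARGIN_TOP`, a child created with that parent resolves `FONT_SIZE`
+through the parent, while `MARGIN_TOP` resolves to `null` because it is not inherited in this sketch.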
+
+`IPropertyContainer` has two direct sub-interfaces: `IElement` and `IRenderer`. The `IElement` interface is
+implemented by classes such as `Text`, `Paragraph` and `Table`. These are the objects that we add to a document, either
+directly or indirectly (for example, when we add a `String` to a `Paragraph`, a `Text` element automatically wraps
+this string under the hood). The `IRenderer` interface is implemented by classes such as `TextRenderer`,
+`ParagraphRenderer` and `TableRenderer`. These renderers are used internally by iText, but we can subclass them if we
+want to tweak the way an object is rendered. Each renderer borrows the properties of the corresponding model element:
+it first checks whether the property is set on the renderer, then on the corresponding model element, and then
+performs the same check for the parent renderer (if the property is inheritable). **If the model element properties
+need to be overridden during layout, set them on the renderer**, because we don't want to pollute the
+model element properties.
+
+It is important to separate the element model structure from the logic that performs the actual element
+placement (rendering logic). At the model level, a tree of models is created: an abstraction that represents
+the structure of the elements that will be added to the document. Each of these elements (`Paragraph`, `Image`, etc.)
+can be added several times. Rendering contains the basic logic that fills the PDF with data obtained from the model
+element tree. Different renderers can be created for one model element, and these renderers may produce different
+results, but each renderer is based on exactly one model element.
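+
+As a rough illustration of why overrides belong on the renderer, here is a small sketch under stated assumptions.
+`ModelSketch` and `RendererSketch` are hypothetical classes, not iText types, and the parent-renderer step of the
+lookup is left out for brevity.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+
+// Hypothetical stand-in for a model element; not an iText class.
+final class ModelSketch {
+    final Map<Integer, Object> properties = new HashMap<>();
+}
+
+// Hypothetical stand-in for a renderer based on exactly one model element.
+final class RendererSketch {
+    private final Map<Integer, Object> overrides = new HashMap<>();
+    private final ModelSketch model;
+
+    RendererSketch(ModelSketch model) {
+        this.model = model;
+    }
+
+    // Stores the value on the renderer only; the model element stays untouched.
+    void setProperty(int key, Object value) {
+        overrides.put(key, value);
+    }
+
+    // The renderer's own value wins; otherwise fall back to the model element.
+    Object getProperty(int key) {
+        return overrides.containsKey(key) ? overrides.get(key) : model.properties.get(key);
+    }
+}
+```
+
+Two `RendererSketch` instances created for the same `ModelSketch` can then report different values for one key, while
+the model's own map keeps its original value.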
+
+Let's look at the renderer mechanism in more detail.
+
+#### Renderers
+Renderers have two main responsibilities:
+
+- `layout()` - calculating the area and position that the object takes up on the canvas. `layout()` can work with
+different input parameters and properties, and it is explicitly allowed to call it several times for the same
+renderer; the results may differ.
+- `draw()` - creating the appearance and adding it to the canvas. It can be called only once, after layout; the
+`PdfDocument` is changed as a result of `draw()`.
+
+The base class for renderers is `AbstractRenderer`. It contains a basic set of properties and operations that are
+common to all renderers. The next important class is `BlockRenderer`, a superclass for the renderers of high-level
+layout objects, such as `DivRenderer` and `ListRenderer`. At a lower level is `LeafRenderer`, which covers
+renderers such as `TextRenderer` and `ImageRenderer`. We should also mention `LineRenderer`, which is not an
+independent renderer: it is only used inside `ParagraphRenderer`.
+
+The main entry point for the layout mechanism is the abstract class `RootRenderer`. Its methods implement the
+mechanism for constructing the hierarchy of renderers. It has no parent renderer, and root renderers such as
+`CanvasRenderer` and `DocumentRenderer` inherit from it. These renderers are created from the `Document` and `Canvas`
+objects. **You need to understand the difference between the _Document_ and _PdfDocument_ objects, and also between
+_Canvas_ and _PdfCanvas_**. `PdfDocument` and `PdfCanvas` work with the PDF at a lower level: with PDF pages and
+internal PDF objects such as arrays and dictionaries. `Document` and `Canvas`, together with their corresponding
+renderers, are the connecting links between the layout mechanism and the output PDF file structure. These classes can
+perform similar operations, but at different levels of abstraction. For example, where you need 15 lines of code to
+add some text with `PdfDocument`, `Document` does it in a few lines.
+
+- `DocumentRenderer` - directs writing of the layout objects to page content streams and creates new pages in the
+document when needed, so that elements not fitting on one page continue on the next.
+- `CanvasRenderer` - directs writing of the layout objects to a single arbitrary content stream (e.g. a
+`PdfFormXObject`, or a page content stream), so it only writes to a single area, which means that content that does
+not fit will not be shown.
+
+The rendering logic is triggered when an element is added to the `RootElement`. Elements added to the document form
+a tree, where each parent element has a list of children. This tree is built when the code declares and adds
+the elements, as here:
+
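+```java
+import com.itextpdf.layout.element.Div;
+import com.itextpdf.layout.element.Paragraph;
+// Declaring elements builds the model tree: the Div is the parent, the Paragraph is its child.
+Div container = new Div();
+Paragraph paragraph = new Paragraph("New paragraph.");
+container.add(paragraph);
+```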
+
+Then, using the `IElement#createRendererSubTree` method, these elements are recursively converted, in depth-first
+order, into a tree of renderers. Next, the renderer layout algorithm starts. It begins by calling the `layout()`
+method of the renderer for the element that was added directly to the root.
+
+The purpose of the `layout()` method is to determine the free space on the page and fill it with elements. Layout is
+performed first for all children of the element (depth-first traversal). The space occupied by a parent is determined
+by its children plus its own properties. Data is passed from parent elements to children in a `LayoutContext`, which
+stores information such as the area and the page number.
+
+When filling the page, two types of areas are used: `LayoutBBox` (the available area that a parent gives to its
+child elements) and `OccupiedArea` (the area taken up by all placed elements, including the occupied areas of child
+renderers). If an element doesn't fit in the given area, it is split into two independent renderers. The first is
+usually called the **split renderer** and the second the **overflow renderer**. The **split renderer** holds the data
+that fits in the available page area; it has been laid out successfully and is ready to be drawn. The **overflow
+renderer** contains the part of the element that is not yet positioned; it is transferred to the next page, where
+`layout()` is called on it. When `layout()` finishes, it returns a `LayoutResult` describing the placement of the
+current element on the page. This object tells whether the current renderer was placed on the page in full
+(`LayoutResult#FULL`), partially (`LayoutResult#PARTIAL`) or not at all (`LayoutResult#NOTHING`), and also carries the
+occupied area and the split/overflow renderers.
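+
+The split/overflow cycle can be sketched in a few lines. This is a deliberately simplified model, not iText's
+implementation: `SplitSketch`, its `Result` type, and measuring an area as a number of text lines are all assumptions
+chosen to keep the example self-contained.
+
+```java
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+// Hypothetical, simplified model of split/overflow; these are not iText classes.
+final class SplitSketch {
+    enum Status { FULL, PARTIAL, NOTHING }
+
+    // Outcome of one layout attempt: the part that fitted and the part left over.
+    static final class Result {
+        final Status status;
+        final List<String> split;
+        final List<String> overflow;
+
+        Result(Status status, List<String> split, List<String> overflow) {
+            this.status = status;
+            this.split = split;
+            this.overflow = overflow;
+        }
+    }
+
+    // Tries to place the lines into an area that holds at most `capacity` lines.
+    static Result layout(List<String> lines, int capacity) {
+        if (lines.size() <= capacity) {
+            return new Result(Status.FULL, lines, Collections.<String>emptyList());
+        }
+        if (capacity <= 0) {
+            return new Result(Status.NOTHING, Collections.<String>emptyList(), lines);
+        }
+        return new Result(Status.PARTIAL,
+                lines.subList(0, capacity),
+                lines.subList(capacity, lines.size()));
+    }
+
+    // Calls layout again on each overflow part, one new "page" per iteration.
+    static List<List<String>> paginate(List<String> lines, int linesPerPage) {
+        if (linesPerPage <= 0) {
+            throw new IllegalArgumentException("linesPerPage must be positive");
+        }
+        List<List<String>> pages = new ArrayList<>();
+        List<String> remaining = lines;
+        while (!remaining.isEmpty()) {
+            Result result = layout(remaining, linesPerPage);
+            pages.add(result.split);
+            remaining = result.overflow;
+        }
+        return pages;
+    }
+}
+```
+
+In this sketch, five lines paginated at two lines per page produce three pages: the first two layout attempts are
+partial, and the last one places the remaining line in full.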
+
+Briefly, the layout mechanism is shown in the figure:
+
+![Layout mechanism](Layout%20mechanism.png)
+
+The next step is to call the `draw()` method. It uses the layout result from the `layout()` step and generates the
+PDF syntax that is written to the PDF document: drawing instructions based on the rendering result.
+
+Specific renderers produce the following instructions:
+
+- `TextRenderer`: text instructions to `PdfCanvas`
+- `ImageRenderer`: creating and adding an `XObject`
+- `TableRenderer`: borders, etc.