src/architecture/fonts.md
+17 -5 (17 additions & 5 deletions)
@@ -100,8 +100,11 @@ Lets highlite several important architectural details:
100
100
- To avoid data duplication, fonts are loaded only once and managed by a separate thread. That reduces the memory footprint of the program. Another reason is the limitations that a particular platform's typesetting library stack imposes on servo (e.g. thread safety of function calls).
101
101
- A special IPC mechanism is provided that serves as a middleman between the Script and Compositor threads (processes).
102
102
103
+
One additional thing that is important to mention is that currently two approaches
104
+
103
105
## Fonts sequence diagram
104
106
Let's also analyze how servo initializes, because it is important to understand when fonts are loaded in the general pipeline. The sequence diagram below demonstrates how a significant number of servo threads are launched.
107
+
105
108
```mermaid
106
109
sequenceDiagram
107
110
box rgb(255, 173, 173) main thread
@@ -224,7 +227,7 @@ But I don't know whether it will be accepted so I will explain important concept
224
227
225
228
Each `FontGroupFamily` represents a `FontFamily` that may be present on the device as a single font file, or as a set of font files if we consider `segmented fonts`. Each `FontFamily` (`FontGroupFamily`) must have a set of `FontDescriptor`s that allow us to uniquely identify a `FontFace` object within a particular `font file`.
226
229
227
-
That means that `FontGroup::FontDescriptor` represents some abstract `FontFace` that must be present in at least one of `FontFamily` objects. In case we will not be able to find it we will start `installed_font_fallback` procedure;
230
+
That means that `FontGroup::FontDescriptor` represents some abstract `FontFace` that must be present in at least one `FontGroupFamily` object. If we are not able to find it, we start the `installed_font_fallback` procedure.
228
231
229
232
`Language_id` is a new CSS4 feature that allows us to control visual representation more accurately. Let's say we have two `FontFamily` objects within the specified list that both have a `FontFace` satisfying `FontGroup::FontDescriptor`. In that case the old spec asked us to simply pick the first one that satisfies the descriptor. CSS4 allows the user to set up a correspondence between the language of the element and the particular family that we want to use:
230
233
```html
@@ -235,20 +238,29 @@ That means that `FontGroup::FontDescriptor` represents some abstract `FontFace`
On pictures above you can see that body have several conflicting font-families all of them are capable to display particular character. Language allow us to additionaly precisely chose `FontGroupFamily` within `FontGroup`
261
+
In the pictures above you can see that `body` has several conflicting font families, all of which are capable of displaying a particular character. Then we declare a list of elements where the first welcome string uses the body font-family rules, and the second welcome string, declared inside `<span>`, uses an additional language hint for the `font-family` style.
262
+
263
+
So `language_id` allows us to choose a `FontGroupFamily` within a `FontGroup` more precisely. The differences are easy to spot in the Arabic and Chinese language versions.
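
The selection rule described above can be illustrated with a minimal Rust sketch. It is not servo's code: `FamilySketch`, its fields and `pick_family` are hypothetical names introduced only for this example, and codepoint coverage is modelled as plain ranges.

```rust
/// Hypothetical, simplified stand-in for a family inside a `FontGroup`.
#[derive(Debug, Clone, PartialEq)]
pub struct FamilySketch {
    pub name: &'static str,
    /// Language the family was declared for, if any (an assumption of this sketch).
    pub lang: Option<&'static str>,
    /// Inclusive codepoint ranges covered by the faces of the family.
    pub coverage: Vec<(u32, u32)>,
}

impl FamilySketch {
    fn covers(&self, ch: char) -> bool {
        let cp = ch as u32;
        self.coverage.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
    }
}

/// Pick a family for `ch`: prefer a covering family whose language matches
/// the element language, otherwise take the first covering family.
/// `None` means the caller would have to start a fallback procedure.
pub fn pick_family<'a>(
    families: &'a [FamilySketch],
    ch: char,
    element_lang: Option<&str>,
) -> Option<&'a FamilySketch> {
    let mut first_covering = None;
    for family in families.iter().filter(|f| f.covers(ch)) {
        if element_lang.is_some() && family.lang == element_lang {
            return Some(family);
        }
        if first_covering.is_none() {
            first_covering = Some(family);
        }
    }
    first_covering
}
```

Without a language hint the sketch behaves like the old "first family that satisfies the descriptor" rule; with a hint, a later family in the list can win.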
252
264
253
265
So the task of `find_by_codepoint` is:
254
266
1. Traverse all `font files` that represent a particular `FontFamily` on the device, and accumulate all possible `FontFace`s within the family in question. Get the `FontFace`s in the form of a list of `FontDescriptor`s (loaded with the help of OS / third-party font libraries).
In modern browsers inline layout consists form the following steps:
2
+
In modern browsers inline layout consists of the following steps:
3
3
## Layout preparation
4
4
- Divide the `BoxTree` into a set of subtree view objects that are called `Block Formatting Context` (`BFC`) and `Inline Formatting Context` (`IFC`). Please do not confuse this term with `Independent Formatting Context`, which can also be abbreviated as `IFC`.
5
-
- Accumulate all text within `Inline Formatting Context`inside one `infinite string`; Accumulate all elements with visual representation in `Inline Formatting Context` into container of `Inline Items`; During such aggregation html elements may introduce additional codepoints into string. In example some html elements under special conditions may setup independent `BIDI paragraph object` and to adress this fact it is necessary to introduce special `bidi-control symbols`.
5
+
- Accumulate all text within the `Inline Formatting Context` into one `infinite string` (important for properly handling abstractions that span several hypertext markup elements, e.g. bidi directionality). Accumulate all elements with a visual representation in the `Inline Formatting Context` into a container of `Inline Items`. During such aggregation HTML elements may introduce additional codepoints into the string. For example, some HTML elements under special conditions may set up an independent `BIDI paragraph object`, and to address this it is necessary to introduce special `bidi-control symbols`.
6
6
- Then we need to prepare the text for the `shaping` procedure (the result of shaping is the size and position that each symbol will occupy within the string). **OpenType / TrueType specific preparation is described further**: during this process we split the acquired infinite string into segments of consecutive symbols that share a set of properties (same bidi direction, font, language and script), and record such segments as `ranges` within the original `infinite string`. For each generated `text segment` a corresponding item must be introduced in the `Inline Items` container.
7
7
- After that we use a third-party library that extracts the information contained within the font file
8
8
and performs `shaping`
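
The accumulation step from the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not servo's implementation: `InlineNode` and `flatten` are hypothetical names, and the only bidi-control symbols inserted are the Unicode FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069) characters around an isolating element.

```rust
use std::ops::Range;

/// Hypothetical, simplified inline content of one inline formatting context.
pub enum InlineNode {
    /// A text node.
    Text(String),
    /// An element that isolates its content for bidi purposes.
    Isolate(Vec<InlineNode>),
}

const FIRST_STRONG_ISOLATE: char = '\u{2068}';
const POP_DIRECTIONAL_ISOLATE: char = '\u{2069}';

/// Append all text under `nodes` to `out` and record the byte range that
/// each text node occupies inside the accumulated string.
pub fn flatten(nodes: &[InlineNode], out: &mut String, ranges: &mut Vec<Range<usize>>) {
    for node in nodes {
        match node {
            InlineNode::Text(text) => {
                let start = out.len();
                out.push_str(text);
                ranges.push(start..out.len());
            }
            InlineNode::Isolate(children) => {
                // The control characters exist only in the accumulated
                // string, they are not part of any text node.
                out.push(FIRST_STRONG_ISOLATE);
                flatten(children, out, ranges);
                out.push(POP_DIRECTIONAL_ISOLATE);
            }
        }
    }
}
```

Keeping ranges instead of copies lets later stages (segmentation, shaping) refer back to the single accumulated string.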
@@ -15,12 +15,61 @@ and perform `shaping`
15
15
## Detailed Explanation of steps:
16
16
### Inline Items construction
17
17
TODO
18
-
### Text Segmentation
19
-
TODO
20
18
#### BIDI level computation
21
19
TODO
22
20
#### Font Style matching procedure
21
+
Exact details are provided in the servo [fonts module](./fonts.md) document.
22
+
23
+
### Text Segmentation & Shaping
24
+
#### What is text shaping?
25
+
Wikipedia provides the following definition:
26
+
> Text shaping is the process of converting text to glyph indices and positions as part of text rendering. It is complementary to font rendering as part of the text rendering process; font rendering is used to generate the glyphs, and text shaping decides which glyphs to render and where they should be put on the image plane. Unicode is generally used to specify the text to be rendered.
27
+
28
+
Microsoft has the following document that describes [Text layout](https://learn.microsoft.com/en-us/globalization/fonts-layout/text-layout). It has a section devoted to [Text shaping](https://learn.microsoft.com/en-us/globalization/fonts-layout/text-layout#text-shaping). Although a clear and short definition is not provided, this section names four important subtasks that any text shaping engine must solve:
29
+
- [correct processing of ligatures](https://learn.microsoft.com/en-us/globalization/fonts-layout/text-layout#ligatures)
30
+
- [script-specific replacement of characters that depends on context](https://learn.microsoft.com/en-us/globalization/fonts-layout/text-layout#contextual-shaping)
31
+
- [combining special characters (e.g. diacritics and tone marks) into a single visual representation](https://learn.microsoft.com/en-us/globalization/fonts-layout/text-layout#combining-characters)
32
+
- [script-specific (e.g. Devanagari, used for Hindi) reordering of the characters](https://learn.microsoft.com/en-us/globalization/fonts-layout/text-layout#character-reordering)
33
+
34
+
[HarfBuzz definition](https://harfbuzz.github.io/what-is-harfbuzz.html) of text shaping:
35
+
> Text shaping is the process of translating a string of character codes (such as Unicode codepoints) into a properly arranged sequence of glyphs that can be rendered onto a screen or into final output form for inclusion in a document.
36
+
37
+
#### Facts everyone must know about text shaping
38
+
39
+
The first thing everyone must know about text shaping is that different shaping approaches exist. The two that the author is aware of are [SIL Graphite](https://graphite.sil.org/) and the OpenType / TrueType shaping algorithms. A great write-up on both technologies is provided by an [article](https://graphite.sil.org/graphite_aboutOT.html) on the SIL Graphite website.
40
+
I also feel obliged to provide a link to a [great repository](https://github.com/n8willis/opentype-shaping-documents) that contains a lot of documents regarding OpenType / TrueType shaping.
41
+
42
+
The second most important thing everyone must know is that the W3C established the [Web Fonts Working Group](https://www.w3.org/groups/wg/webfonts/). That group created [WOFF fonts](https://www.w3.org/TR/WOFF/) to improve and standardize the loading of fonts from web resources; the format mostly specifies data compression and additional headers for web streaming. If we look at the uncompressed font data we find that it follows a similar structure and shaping model to OpenType fonts. That means that OpenType shaping models currently dominate the web domain. So the rest of the information in the shaping section is devoted to shaping operations specific to [OpenType / TrueType shaping models](https://harfbuzz.github.io/opentype-shaping-models.html).
43
+
44
+
#### OpenType / TrueType shaping model
45
+
It would be unreasonable to copy information about all the different OpenType and TrueType shaping models here, so I will just provide the link to the [awesome repository](https://github.com/n8willis/opentype-shaping-documents) about shaping again.
46
+
47
+
Now let's get to the more practical side. A web engine developer does not need to know every detail of the notorious shaping process, because we work with third-party crates that provide an already written shaping engine. Here servo uses the open-source HarfBuzz shaping engine written in C/C++ (if someone is interested in writing a pure Rust OpenType shaping engine, please consider offering your implementation to the servo authors). The `harfbuzz-sys` crate is used as the FFI between the C/C++ implementation and the Rust language.
48
+
49
+
The HarfBuzz engine has special requirements for its inputs.
50
+
1. We must create special `hb_font_t` and `hb_face_t` structures. In our case the creation of such structures is dictated by CSS styles. Users define the font through the CSS font-family and language, and the face through a combination of font-size, font-weight, font-style, font-stretch, ...
51
+
2. We must properly segment the whole text string accumulated in the IFC into segments of characters that share common properties (more details at [what harfbuzz doesn't do](https://harfbuzz.github.io/what-harfbuzz-doesnt-do.html)). The list of properties is provided below.
52
+
53
+
#### Text segment properties that are shared by all codepoints
54
+
- Bidi direction
55
+
- Language
56
+
- Script
57
+
- ***Font*** (a particular, rigidly defined face within the font)
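
A minimal Rust sketch of segmentation by the properties listed above. It assumes that the properties of each character have already been resolved elsewhere (bidi analysis, script detection, CSS language, font matching); `RunProps`, `Direction` and `segment` are hypothetical names, and real script itemization also has to deal with characters such as spaces that belong to no single script, which this sketch ignores.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Ltr,
    Rtl,
}

/// Hypothetical set of properties that every codepoint of a segment shares.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RunProps {
    pub direction: Direction,
    pub script: &'static str,
    pub language: &'static str,
    /// Identifier of an already selected face (an assumption of this sketch).
    pub font_id: u32,
}

/// Split `text` into maximal byte ranges of consecutive characters whose
/// properties, as reported by `props_of`, are identical.
pub fn segment(
    text: &str,
    props_of: impl Fn(char) -> RunProps,
) -> Vec<(Range<usize>, RunProps)> {
    let mut runs: Vec<(Range<usize>, RunProps)> = Vec::new();
    for (offset, ch) in text.char_indices() {
        let props = props_of(ch);
        let end = offset + ch.len_utf8();
        // Extend the previous run when nothing changed, otherwise start a new one.
        let extended = match runs.last_mut() {
            Some((range, last)) if *last == props => {
                range.end = end;
                true
            }
            _ => false,
        };
        if !extended {
            runs.push((offset..end, props));
        }
    }
    runs
}
```

Each returned range could then be handed to the shaping engine as one segment, together with its direction, script, language and font.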
58
+
59
+
After segmentation we must provide a set of OpenType features to the shaping engine.
60
+
61
+
### Inline items linebreaking
62
+
Unfortunately I don't have enough understanding of the token-based linebreaking algorithm to describe it briefly here.
23
63
TODO
24
-
### Shaping
64
+
65
+
### Inline items BIDI Reordering
66
+
Bidi reordering should be performed exactly as stated in [Unicode® Standard Annex #9 Unicode Bidirectional Algorithm](https://unicode.org/reports/tr9/).
67
+
Currently servo has only a partial implementation of that algorithm.
68
+
The problems are mostly concentrated in the per-line reordering rules.
69
+
Rules [L1](https://unicode.org/reports/tr9/#L1)-[L4](https://unicode.org/reports/tr9/#L4) are not properly implemented. For example, we don't use information about bidi paragraphs at all.
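
To make the per-line rules more concrete, here is a minimal Rust sketch of rule L2 only (reversing sequences by embedding level). It is an illustration, not servo's code: `visual_order` is a hypothetical name, the levels of the items on one line are assumed to be already resolved, and rules L1, L3 and L4 are not covered.

```rust
/// Compute the visual order of the items on one line according to rule L2
/// of UAX #9: from the highest level down to the lowest odd level, reverse
/// every contiguous sequence of items at that level or higher.
/// `levels[i]` is the resolved embedding level of logical item `i`;
/// the result lists logical indices in visual (left to right) order.
pub fn visual_order(levels: &[u8]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..levels.len()).collect();
    let max = levels.iter().copied().max().unwrap_or(0);
    let min_odd = match levels.iter().copied().filter(|l| l % 2 == 1).min() {
        Some(level) => level,
        // No right-to-left content: logical and visual order coincide.
        None => return order,
    };
    for level in (min_odd..=max).rev() {
        let mut i = 0;
        while i < order.len() {
            if levels[order[i]] >= level {
                let start = i;
                while i < order.len() && levels[order[i]] >= level {
                    i += 1;
                }
                order[start..i].reverse();
            } else {
                i += 1;
            }
        }
    }
    order
}
```

For a line whose items have levels `[0, 0, 1, 1, 1, 0]` only the three middle items are reversed; nested levels are handled by the repeated passes from the highest level downwards.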
70
+
71
+
On a conceptual level, an application developer should always use icu4c or the new pure Rust icu crates for such operations.