Commit f034491

fix: rework hyphenation (#3188)
* Add snapshot test for the textkit layout engine

* Fix hyphenation algorithm

  The hyphenation algorithm may change the string (e.g. by removing some characters, namely soft hyphens). Therefore, calculating the glyphs must come after hyphenation, so that the glyphs match the final string.

  Fixes #3018

  This was probably broken in #2600.

* Change default hyphenation algorithm to more flexible paradigm

  The hyphenation algorithm should be able to leave soft hyphens in, to indicate that a hyphen should be placed there if the word breaks there.

* Variable width penalty nodes

  The line breaking algorithm needs to distinguish syllables that end with a soft hyphen from syllables that do not, and only mark a syllable for adding a hyphen in the former case.

* Ensure zero advanceWidth for soft hyphens in both font and pdfkit

  For the line breaking algorithm, soft hyphens should be considered to have a width of zero, since they are never printed directly (they can only lead to an inserted hyphen if at the end of a line). The font package was already doing this correctly, but the pdfkit package considered the soft hyphen to be the same as a normal hyphen with an advanceWidth of 333 in Helvetica. Without this change, in some edge cases pdfkit would break apart lines already broken apart by the line breaking algorithm in textkit.

  Added tests for both packages to make sure they remain compatible in the future.

* Consider end-of-line hyphen width in bestFit algorithm

  In the best fit line breaking algorithm, the width of the hyphen must be taken into account, in case one is to be inserted at the end of the line. This is the most readable change I was able to find to achieve the goal. Maybe the bestFit algorithm could be optimized in the future, along with writing extensive tests for corner cases.

* Soft hyphens in the text should not be rendered as hyphens

  Therefore, we remove all soft hyphens from the attributed string after line breaking is completed, and recalculate the glyphs afterwards. This way, pdfkit never sees the soft hyphens, and does not mistake them for normal hyphens.

* Add another test for the textkit layout engine

  Tests the functionality of custom word splitting functions.

* Add changeset

* Pass builtin hyphenation callback to custom callback

  This allows library users to avoid importing the callback themselves, since most implementations will probably want to call it.
1 parent cc1aff2 commit f034491

17 files changed

+192105
-30
lines changed


.changeset/ninety-dogs-grow.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
---
"@react-pdf/pdfkit": minor
"@react-pdf/textkit": minor
---

Fix and rework the hyphenation algorithm, and allow custom word hyphenation algorithms to specify whether a hyphen should be inserted in case the word is wrapped.

**Caution**: If you have been using a custom hyphenation callback - which hasn't been working properly since at least version 2.0.21 - then you will need to change your implementation to leave a soft hyphen character (`'\u00AD'`) at the end of syllables where you want react-pdf to insert a hyphen when wrapping lines. Syllables without a final soft hyphen character will still be able to break, but will not produce a hyphen character at the end of the line.

This allows you to break correctly on normal hyphens or other special characters in your text. For example, to use the default English-language syllable breaking built into react-pdf, but also break after hyphens naturally occurring in your text (as are often present in hyperlinks), you could use the following hyphenation callback:
```js
import { Font } from '@react-pdf/renderer';

Font.registerHyphenationCallback((word, originalHyphenationCallback) => {
  return originalHyphenationCallback(word).flatMap(w => w.split(/(?<=-)/))
})
```
(`flatMap` requires at least ES2019)
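
To make the effect of that callback more concrete, the following is a minimal sketch that runs the same `flatMap`/`split` step on hand-written syllables. `fakeBuiltinHyphenation` is a hypothetical stand-in for the callback that react-pdf passes in as the second argument; its output is invented for illustration and is not what the real English patterns produce.

```js
const SOFT_HYPHEN = '\u00AD';

// Hypothetical stand-in for the built-in callback (NOT the real react-pdf code).
// It returns syllables, each ending in a soft hyphen where a hyphen may be inserted.
const fakeBuiltinHyphenation = (word) =>
  word === 'well-documented'
    ? ['well-doc' + SOFT_HYPHEN, 'u' + SOFT_HYPHEN, 'ment' + SOFT_HYPHEN, 'ed']
    : [word];

// Same shape as the callback in the changeset: additionally split after literal hyphens.
const hyphenationCallback = (word, originalHyphenationCallback) =>
  originalHyphenationCallback(word).flatMap((w) => w.split(/(?<=-)/));

const parts = hyphenationCallback('well-documented', fakeBuiltinHyphenation);

// 'well-' carries no trailing soft hyphen, so a break after it adds no extra hyphen;
// 'doc', 'u' and 'ment' keep their soft hyphen and would get one if the line breaks there.
console.log(parts.map((p) => p.split(SOFT_HYPHEN).join('<SHY>')));
```

The lookbehind `(?<=-)` splits after the hyphen without consuming it, so the literal hyphen stays at the end of its part.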

packages/font/tests/standard-fonts.test.ts

Lines changed: 9 additions & 0 deletions
@@ -266,4 +266,13 @@ describe('standard fonts', () => {

     expect(font.src).toBe('Helvetica-BoldOblique');
   });
+
+  it('should resolve advanceWidth of soft hyphen to be zero', () => {
+    const SOFT_HYPHEN = '\u00AD';
+    const fontStore = new FontStore();
+
+    const font = fontStore.getFont({ fontFamily: 'Helvetica' });
+
+    expect(font.data.encode(SOFT_HYPHEN)[1][0].advanceWidth).toBe(0);
+  });
 });

packages/layout/tests/text/layoutText.test.ts

Lines changed: 6 additions & 3 deletions
@@ -89,15 +89,18 @@ describe('text layoutText', () => {

   test('should allow hyphenation callback to be overriden', async () => {
     const text = 'reallylongtext';
-    const hyphens = ['really', 'long', 'text'];
+    const hyphens = ['really­', 'long', 'text'];
     const hyphenationCallback = vi.fn().mockReturnValue(hyphens);

     const node = createTextNode(text, {}, { hyphenationCallback });
     const lines = layoutText(node, 50, 100, fontStore);

     expect(lines[0].string).toEqual('really-');
-    expect(lines[1].string).toEqual('long-');
+    expect(lines[1].string).toEqual('long');
     expect(lines[2].string).toEqual('text');
-    expect(hyphenationCallback).toHaveBeenCalledWith('reallylongtext');
+    expect(hyphenationCallback).toHaveBeenCalledWith(
+      'reallylongtext',
+      expect.any(Function),
+    );
   });
 });
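
Note that the new `'really­'` entry in this test ends with an invisible soft hyphen (U+00AD), while `'long'` does not. The expected line strings can be reproduced with a deliberately simplified model, sketched below. It assumes every syllable lands on its own line (as happens in the narrow container used by the test) and is not the textkit implementation; `lineForSyllable` is a hypothetical helper.

```js
const SOFT_HYPHEN = '\u00AD';

// Simplified model (an assumption, not the textkit code): one syllable per line.
const lineForSyllable = (syllable, isLast) => {
  // Soft hyphens are never rendered, so they are stripped from the visible text.
  const text = syllable.split(SOFT_HYPHEN).join('');
  // A visible hyphen is added only where the syllable was marked with a soft hyphen.
  const wantsHyphen = !isLast && syllable.endsWith(SOFT_HYPHEN);
  return wantsHyphen ? text + '-' : text;
};

const syllables = ['really' + SOFT_HYPHEN, 'long', 'text'];
const lines = syllables.map((s, i) =>
  lineForSyllable(s, i === syllables.length - 1),
);

console.log(lines); // [ 'really-', 'long', 'text' ]
```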

packages/pdfkit/package.json

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@
     "url": "http://badassjs.com/"
   },
   "scripts": {
+    "test": "vitest",
     "clear": "rimraf ./lib ./src/font/data/*.json",
     "parse:afm": "node ./src/font/data/compressData.js",
     "build": "npm run clear && npm run parse:afm && rollup -c ",

packages/pdfkit/src/font/afm.js

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ oe .notdef zcaron ydieresis
 space exclamdown cent sterling
 currency yen brokenbar section
 dieresis copyright ordfeminine guillemotleft
-logicalnot hyphen registered macron
+logicalnot softhyphen registered macron
 degree plusminus twosuperior threesuperior
 acute mu paragraph periodcentered
 cedilla onesuperior ordmasculine guillemotright
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+import { describe, expect, it } from 'vitest';
+
+import StandardFont from '../../src/font.js';
+
+describe('standard fonts', () => {
+  it('should resolve advanceWidth of soft hyphen to be zero', () => {
+    const SOFT_HYPHEN = '\u00AD';
+    const font = StandardFont.open({}, 'Helvetica', 'Helvetica', 'foobar');
+
+    expect(font.encode(SOFT_HYPHEN)[1][0].advanceWidth).toBe(0);
+  });
+});

packages/textkit/src/engines/linebreaker/bestFit.ts

Lines changed: 18 additions & 2 deletions
@@ -2,6 +2,16 @@ import { Node } from './types';

 const INFINITY = 10000;

+const skipPastGlueAndPenalty = (nodes: Node[], start: number): Node => {
+  let j = start + 1;
+  for (; j < nodes.length; j++) {
+    if (nodes[j].type !== 'glue' && nodes[j].type !== 'penalty') {
+      break;
+    }
+  }
+  return nodes[j - 1];
+};
+
 const getNextBreakpoint = (
   subnodes: Node[],
   widths: number[],
@@ -37,6 +47,8 @@ const getNextBreakpoint = (
     return 0;
   };

+  let hyphenWidth = 0;
+
   for (let i = 0; i < subnodes.length; i += 1) {
     const node = subnodes[i];

@@ -50,7 +62,11 @@ const getNextBreakpoint = (
       sum.shrink += node.shrink;
     }

-    if (sum.width - sum.shrink > lineLength) {
+    const potentialEndOfLine = skipPastGlueAndPenalty(subnodes, i);
+    hyphenWidth =
+      potentialEndOfLine.type === 'penalty' ? potentialEndOfLine.width : 0;
+
+    if (sum.width - sum.shrink + hyphenWidth > lineLength) {
       if (position === null) {
         let j = i === 0 ? i + 1 : i;

@@ -78,7 +94,7 @@ const getNextBreakpoint = (
     }
   }

-  return sum.width - sum.shrink > lineLength ? position : null;
+  return sum.width - sum.shrink + hyphenWidth > lineLength ? position : null;
 };

 const applyBestFit = (nodes: Node[], widths: number[]): number[] => {
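
The look-ahead added in this file can be tried in isolation. The sketch below copies the logic of `skipPastGlueAndPenalty` without type annotations and applies it to plain-object nodes; the `box`, `glue` and `penalty` factories, the `hyphenWidthAt` and `overflows` helpers, and all widths are assumptions made for this example only, and shrink is ignored for brevity.

```js
// Plain-object stand-ins for Knuth-Plass nodes (shapes assumed for this sketch).
const box = (width) => ({ type: 'box', width });
const glue = (width) => ({ type: 'glue', width });
const penalty = (width) => ({ type: 'penalty', width });

// Same logic as the helper in the diff: return the last node of the run of
// glue/penalty nodes that follows `start` (or nodes[start] if there is none).
const skipPastGlueAndPenalty = (nodes, start) => {
  let j = start + 1;
  for (; j < nodes.length; j++) {
    if (nodes[j].type !== 'glue' && nodes[j].type !== 'penalty') break;
  }
  return nodes[j - 1];
};

// Width that an end-of-line hyphen would add if the line ended after node i.
const hyphenWidthAt = (nodes, i) => {
  const end = skipPastGlueAndPenalty(nodes, i);
  return end.type === 'penalty' ? end.width : 0;
};

// Overflow check in the spirit of the patched condition (shrink left out).
const overflows = (nodes, sumWidth, i, lineLength) =>
  sumWidth + hyphenWidthAt(nodes, i) > lineLength;

const nodes = [box(30), penalty(5), box(20), glue(3), box(25), penalty(0), box(10)];

console.log(hyphenWidthAt(nodes, 0)); // 5: followed by a hyphenating penalty
console.log(hyphenWidthAt(nodes, 2)); // 0: followed by glue
console.log(hyphenWidthAt(nodes, 4)); // 0: followed by a zero-width penalty
console.log(overflows(nodes, 30, 0, 32)); // true: 30 fits in 32, but 30 + 5 does not
```

The last call shows the case the commit message describes: a syllable that fits on its own no longer fits once the hyphen that would follow it is counted.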

packages/textkit/src/engines/linebreaker/index.ts

Lines changed: 36 additions & 6 deletions
@@ -5,8 +5,10 @@ import insertGlyph from '../../attributedString/insertGlyph';
 import advanceWidthBetween from '../../attributedString/advanceWidthBetween';
 import { AttributedString, Attributes, LayoutOptions } from '../../types';
 import { Node } from './types';
+import generateGlyphs from '../../layout/generateGlyphs';

-const HYPHEN = 0x002d;
+const SOFT_HYPHEN = '\u00AD';
+const HYPHEN_CODE_POINT = 0x002d;
 const TOLERANCE_STEPS = 5;
 const TOLERANCE_LIMIT = 50;

@@ -45,23 +47,49 @@ const breakLines = (
       end = prevNode.end;

       line = slice(start, end, attributedString);
-      line = insertGlyph(line.string.length, HYPHEN, line);
+      if (node.width > 0) {
+        // A non-zero-width penalty indicates an additional hyphen should be inserted
+        line = insertGlyph(line.string.length, HYPHEN_CODE_POINT, line);
+      }
     } else {
       end = node.end;
       line = slice(start, end, attributedString);
     }

     start = end;

-    return [...acc, line];
+    return [...acc, removeSoftHyphens(line)];
   }, []);

-  // Last line
-  lines.push(slice(start, attributedString.string.length, attributedString));
+  const lastLine = slice(
+    start,
+    attributedString.string.length,
+    attributedString,
+  );
+  lines.push(removeSoftHyphens(lastLine));

   return lines;
 };

+/**
+ * Remove all soft hyphen characters from the line.
+ * Soft hyphens are not relevant anymore after line breaking, and will only
+ * disrupt the rendering later down the line if left in the text.
+ *
+ * @param line
+ */
+const removeSoftHyphens = (line: AttributedString): AttributedString => {
+  const modifiedLine = {
+    ...line,
+    string: line.string.split(SOFT_HYPHEN).join(''),
+  };
+
+  return {
+    ...modifiedLine,
+    ...generateGlyphs()(modifiedLine),
+  };
+};
+
 /**
  * Return Knuth & Plass nodes based on line and previously calculated syllables
  *
@@ -78,6 +106,7 @@ const getNodes = (
   let start = 0;

   const hyphenWidth = 5;
+  const softHyphen = '\u00ad';

   const { syllables } = attributedString;

@@ -107,7 +136,8 @@ const getNodes = (

       if (syllables[index + 1] && hyphenated) {
         // Add penalty node. Penalty nodes are used to represent hyphenation points.
-        acc.push(knuthPlass.penalty(hyphenWidth, hyphenPenalty, 1));
+        const penaltyWidth = s.endsWith(softHyphen) ? hyphenWidth : 0;
+        acc.push(knuthPlass.penalty(penaltyWidth, hyphenPenalty, 1));
       }
     }
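
The two rules introduced in this file, the variable penalty width and the string-level soft hyphen removal, can be sketched outside of textkit. In the example below, `penaltiesForSyllables` and `stripSoftHyphens` are hypothetical helpers, the node objects are simplified, and the `hyphenated` flag checked by the real `getNodes` is left out; only the value `5` is taken from `hyphenWidth` in the diff.

```js
const SOFT_HYPHEN = '\u00AD';
const HYPHEN_WIDTH = 5; // same value as `hyphenWidth` in the diff

// One penalty between each pair of neighbouring syllables. Its width is non-zero
// only when the preceding syllable ends with a soft hyphen (simplified sketch).
const penaltiesForSyllables = (syllables) =>
  syllables.slice(0, -1).map((s) => ({
    type: 'penalty',
    width: s.endsWith(SOFT_HYPHEN) ? HYPHEN_WIDTH : 0,
  }));

// String part of removeSoftHyphens: drop every soft hyphen from the line text.
const stripSoftHyphens = (str) => str.split(SOFT_HYPHEN).join('');

const syllables = ['hy' + SOFT_HYPHEN, 'phen-', 'ation'];
const penalties = penaltiesForSyllables(syllables);

console.log(penalties.map((p) => p.width)); // [ 5, 0 ]
console.log(stripSoftHyphens(syllables.join(''))); // hyphen-ation
```

A break at the first penalty (width 5) would get an inserted hyphen, while a break after `phen-` (width 0) would not, because the literal hyphen is already part of the text.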

packages/textkit/src/engines/wordHyphenation/index.ts

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ const hyphenator = hyphen(pattern);
  * @returns Word parts
  */
 const splitHyphen = (word: string) => {
-  return word.split(SOFT_HYPHEN);
+  return word.split(new RegExp(`(?<=${SOFT_HYPHEN})`));
 };

 const cache: Record<string, string[]> = {};
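
The difference between the old and the new `split` call is easy to miss, so here is a small comparison on a hand-written input. The word and the position of its soft hyphens are made up for illustration; the real input comes from the hyphenation patterns.

```js
const SOFT_HYPHEN = '\u00AD';
const word = 'hy' + SOFT_HYPHEN + 'phen' + SOFT_HYPHEN + 'ation';

// Old behaviour: the separator is consumed, so the soft hyphens are lost.
const oldParts = word.split(SOFT_HYPHEN);

// New behaviour: the lookbehind matches the empty position after each soft
// hyphen, so every part keeps its trailing soft hyphen.
const newParts = word.split(new RegExp(`(?<=${SOFT_HYPHEN})`));

console.log(oldParts.map((p) => p.length)); // [ 2, 4, 5 ]
console.log(newParts.map((p) => p.length)); // [ 3, 5, 5 ]
console.log(newParts.join('') === word); // true
```

Because the new parts rejoin to the original word, later stages can still tell which syllables were marked for a hyphen.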

packages/textkit/src/layout/index.ts

Lines changed: 1 addition & 1 deletion
@@ -32,8 +32,8 @@ const layoutEngine = (engines: Engines) => {
       resolveYOffset(),
       resolveAttachments(),
       verticalAlignment(),
-      wrapWords(engines, options),
       generateGlyphs(),
+      wrapWords(engines, options),
       bidiMirroring(),
       preprocessRuns(engines),
     );
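
Why swapping two arguments fixes the bug can be illustrated with a toy pipeline. The sketch assumes the steps are combined by a right-to-left `compose`, which is inferred from the commit message ("calculating the glyphs must come after hyphenation") rather than shown in this diff. `toyWrapWords` and `toyGenerateGlyphs` are hypothetical stand-ins that only mimic the relevant side effects.

```js
const SOFT_HYPHEN = '\u00AD';

// Minimal right-to-left compose (assumed to match the helper used by layoutEngine).
const compose = (...fns) => (value) =>
  fns.reduceRight((acc, fn) => fn(acc), value);

// Toy stand-in: a hyphenation step that changes the string (here: drops soft hyphens).
const toyWrapWords = (s) => ({
  ...s,
  string: s.string.split(SOFT_HYPHEN).join(''),
});

// Toy stand-in: one "glyph" per character of the current string.
const toyGenerateGlyphs = (s) => ({ ...s, glyphs: Array.from(s.string) });

const input = { string: 'hy' + SOFT_HYPHEN + 'phen', glyphs: [] };

// Old argument order: wrapWords is listed first, so it runs last.
const oldOrder = compose(toyWrapWords, toyGenerateGlyphs)(input);
// New argument order: generateGlyphs is listed first, so it runs on the final string.
const newOrder = compose(toyGenerateGlyphs, toyWrapWords)(input);

console.log(oldOrder.glyphs.length, oldOrder.string.length); // 7 6 (mismatch)
console.log(newOrder.glyphs.length, newOrder.string.length); // 6 6
```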
