| 1 | +# Hyphenation |
| 2 | + |
| 3 | +Efficient and flexible automatic hyphenation using the Knuth-Liang algorithm. |
| 4 | + |
| 5 | +See full documentation at [https://john-mueller.github.io/Hyphenation](https://john-mueller.github.io/Hyphenation). |
| 6 | + |
| 7 | +## Table of Contents |
| 8 | + |
| 9 | +- [Introduction](#introduction) |
| 10 | +- [Installation](#installation) |
| 11 | +- [Usage](#usage) |
| 12 | + - [Exceptions](#exceptions) |
| 13 | + - [Custom Patterns](#custom-patterns) |
| 14 | + - [Thread Safety](#thread-safety) |
| 15 | + - [HTML/Code](#htmlcode) |
| 16 | +- [Contributing](#contributing) |
| 17 | +- [License](#license) |
| 18 | +- [Credits](#credits) |
| 19 | + |
| 20 | +## Introduction |
| 21 | + |
| 22 | +The primary purpose of this library is to automatically insert soft hyphens into text to improve its layout flow. Consider the following example from [Butterick's Practical Typography](https://practicaltypography.com/justified-text.html): |
| 23 | + |
| 24 | + |
| 25 | + |
| 26 | +As line length decreases, justified text often displays large gaps between words, while left- or right-aligned text displays increasingly uneven margins. By inserting soft hyphens into words (which are only displayed when they fall at the end of a line), the text can be split more naturally across multiple lines, improving the text's aesthetic appearance: |
| 27 | + |
| 28 | + |
| 29 | + |
| 30 | +This proves useful in HTML (since the `hyphens` CSS property is [not universally supported](https://caniuse.com/#search=hyphen)), and will even work in UIKit and SwiftUI text components. |
| 31 | + |
| 32 | +## Installation |
| 33 | + |
| 34 | +Hyphenation is installed via the [Swift Package Manager](https://swift.org/package-manager/). First, add it to your dependencies: |
| 35 | + |
| 36 | +```swift |
| 37 | +let package = Package( |
| 38 | + ... |
| 39 | + dependencies: [ |
| 40 | + .package(url: "https://github.com/john-mueller/Hyphenation.git", from: "0.1.0") |
| 41 | + ], |
| 42 | + ... |
| 43 | +) |
| 44 | +``` |
| 45 | + |
| 46 | +Then import the Hyphenation module wherever it is needed: |
| 47 | + |
| 48 | +```swift |
| 49 | +import Hyphenation |
| 50 | +``` |
| 51 | + |
| 52 | +## Usage |
| 53 | + |
| 54 | +Basic usage just requires creating a `Hyphenator` instance and calling its `hyphenate(text:)` method: |
| 55 | + |
| 56 | +```swift |
| 57 | +let hyphenator = Hyphenator() |
| 58 | +hyphenator.separator = "-" |
| 59 | + |
| 60 | +let text = "This algorithm identifies likely hyphenation points." |
| 61 | +print(hyphenator.hyphenate(text: text)) |
| 62 | +// This al-go-rithm iden-ti-fies like-ly hy-phen-ation points. |
| 63 | +``` |
| 64 | + |
| 65 | +You can also remove the `separator` character from a string by calling `unhyphenate(text:)`: |
| 66 | + |
| 67 | +```swift |
| 68 | +let hyphenatedText = "This al-go-rithm iden-ti-fies like-ly hy-phen-ation points." |
| 69 | +print(hyphenator.unhyphenate(text: hyphenatedText)) |
| 70 | +// This algorithm identifies likely hyphenation points. |
| 71 | +``` |
| 72 | + |
| 73 | +Note that if the original string contained the `separator` character, the `unhyphenate(text:)` method will remove those instances as well, so hyphenating and unhyphenating a string may not recover the original string. |
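The round-trip caveat can be illustrated without the library. The sketch below assumes that unhyphenating amounts to deleting every occurrence of the separator, which matches the description above but is not taken from the library's source. `removeSeparator` is a hypothetical helper, not part of the Hyphenation API.

```swift
import Foundation

// Hypothetical stand-in for `unhyphenate(text:)`:
// deletes every occurrence of the separator.
func removeSeparator(_ separator: String, from text: String) -> String {
    return text.replacingOccurrences(of: separator, with: "")
}

let original = "A well-known algorithm"
// What a hyphenator using "-" as its separator might produce (illustrative only).
let hyphenated = "A well-known al-go-rithm"

let recovered = removeSeparator("-", from: hyphenated)
print(recovered)             // A wellknown algorithm
print(recovered == original) // false
```

The hyphen that was already in "well-known" is removed along with the inserted ones. This mostly matters when a visible separator such as `"-"` is used.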
| 74 | + |
| 75 | +### Exceptions |
| 76 | + |
| 77 | +The algorithm is designed to prioritize preventing incorrect hyphenations over finding every correct one: missing a single hyphenation rarely affects text flow meaningfully, but a bad hyphenation can be quite noticeable. Because the patterns were derived from an English dictionary, the algorithm can make good guesses about many words that do not appear in a dictionary. |
| 78 | + |
| 79 | +```swift |
| 80 | +let hyphenator = Hyphenator() |
| 81 | +hyphenator.separator = "-" |
| 82 | + |
| 83 | +print(hyphenator.hyphenate(text: "swiftlang supercalifragilisticexpialidocious")) |
| 84 | +// swift-lang su-per-cal-ifrag-ilis-tic-ex-pi-ali-do-cious |
| 85 | +``` |
| 86 | + |
| 87 | +Nevertheless, the algorithm may occasionally produce unexpected results for brand names or other unusual words. In these cases, you can manually specify the desired hyphenation using exceptions. |
| 88 | + |
| 89 | +```swift |
| 90 | +print(hyphenator.hyphenate(text: "Microsoft sesquipedalian")) |
| 91 | +// Mi-crosoft sesquipedalian |
| 92 | + |
| 93 | +hyphenator.addCustomExceptions(["Micro-soft", "ses-qui-pe-da-li-an"]) |
| 94 | + |
| 95 | +print(hyphenator.hyphenate(text: "Microsoft sesquipedalian")) |
| 96 | +// Micro-soft ses-qui-pe-da-li-an |
| 97 | +``` |
| 98 | + |
| 99 | +To remove custom exceptions, use the following methods: |
| 100 | + |
| 101 | +```swift |
| 102 | +hyphenator.removeCustomExceptions(["sesquipedalian"]) |
| 103 | +hyphenator.removeAllCustomExceptions() |
| 104 | +``` |
| 105 | + |
| 106 | +### Customizable Properties |
| 107 | + |
| 108 | +There are several properties you can modify to adjust how words are hyphenated. |
| 109 | + |
| 110 | +The `separator` property sets the character inserted at hyphenation points. By default, this is U+00AD (soft hyphen). |
| 111 | + |
| 112 | +The `minLength`, `minLeading`, and `minTrailing` properties adjust where the separator character may be inserted in a word. |
| 113 | + |
| 114 | +- Words shorter than `minLength` will not be hyphenated. |
| 115 | +- Hyphenation will not occur within the first `minLeading` characters. |
| 116 | +- Hyphenation will not occur within the last `minTrailing` characters. |
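How the three limits interact can be sketched with a small, self-contained filter. This is an illustration under assumptions, not the library's implementation. `allowedBreaks` and its parameters are hypothetical. A break offset `i` means the separator goes after the first `i` characters. The limits are read TeX-style: at least `minLeading` characters before a break and at least `minTrailing` after it.

```swift
// Hypothetical helper: filters candidate break offsets for one word.
// An offset `i` means the separator would go after the first `i` characters.
func allowedBreaks(
    in word: String,
    candidates: [Int],
    minLength: Int,
    minLeading: Int,
    minTrailing: Int
) -> [Int] {
    let count = word.count
    // Words shorter than `minLength` are left untouched.
    guard count >= minLength else { return [] }
    return candidates.filter { offset in
        // Keep at least `minLeading` characters before the break
        // and at least `minTrailing` characters after it.
        offset >= minLeading && count - offset >= minTrailing
    }
}

// Candidate offsets for "hyphenation" (hy-phen-ation), plus an artificial one at offset 1.
let breaks = allowedBreaks(
    in: "hyphenation",
    candidates: [1, 2, 6],
    minLength: 5,
    minLeading: 2,
    minTrailing: 3
)
print(breaks) // [2, 6]
```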
| 117 | + |
| 118 | +### Custom Patterns |
| 119 | + |
| 120 | +This library includes American English hyphenation patterns by default, but you can easily initialize a `Hyphenator` instance using patterns for many other languages. The patterns can be passed via `String` or `URL`: |
| 121 | + |
| 122 | +```swift |
| 123 | +let hyphenator1 = Hyphenator(patterns: patternsString, exceptions: exceptionsString) |
| 124 | +let hyphenator2 = Hyphenator(patternFile: patternsURL, exceptions: exceptionsURL) |
| 125 | +``` |
| 126 | + |
| 127 | +Patterns for a wide variety of languages can be found in the [TeX hyphenation repo](https://github.com/hyphenation/tex-hyphen/tree/master/hyph-utf8/tex/generic/hyph-utf8/patterns/txt). *Please check the license under which each set of patterns is released* at the [TeX hyphenation patterns website](http://www.hyphenation.org/tex#languages). This page also lists the proper `minLeading` and `minTrailing` for each language. |
| 128 | + |
| 129 | +When initializing a new `Hyphenator` instance, it is assumed that patterns are separated by newlines or whitespace. |
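As a rough illustration of that input format, the sketch below splits a pattern string on whitespace and parses each Liang-style pattern (letters interleaved with digits, as in the TeX pattern files) into its letters and per-position levels. `ParsedPattern` and `parsePattern` are hypothetical names, the sample patterns are only illustrative, and the library's actual parsing may differ.

```swift
// Hypothetical representation of one parsed pattern.
struct ParsedPattern: Equatable {
    let letters: String
    let levels: [Int] // One more entry than there are letters.
}

func parsePattern(_ pattern: Substring) -> ParsedPattern {
    var letters = ""
    var levels = [0]
    for character in pattern {
        if let digit = character.wholeNumberValue {
            // A digit sets the level of the gap before the next letter.
            levels[levels.count - 1] = digit
        } else {
            letters.append(character)
            levels.append(0)
        }
    }
    return ParsedPattern(letters: letters, levels: levels)
}

// Patterns separated by a newline and by runs of spaces.
let patternsString = "hy3ph\n4m1p  .ach4"
let parsed = patternsString
    .split(whereSeparator: { $0.isWhitespace })
    .map(parsePattern)

for pattern in parsed {
    print(pattern.letters, pattern.levels)
}
// hyph [0, 0, 3, 0, 0]
// mp [4, 1, 0]
// .ach [0, 0, 0, 0, 4]
```

In the Knuth-Liang scheme, the highest level found at each gap across all matching patterns decides the outcome: odd levels permit a break and even levels forbid one.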
| 130 | + |
| 131 | +### Thread Safety |
| 132 | + |
| 133 | +The `Hyphenator` class is thread-safe, so a single instance can hyphenate text on multiple threads simultaneously (although the performance benefit over using separate instances is negligible). |
| 134 | + |
| 135 | +The `copy()` method provides an efficient way to create a new `Hyphenator` instance with the same properties and patterns as an existing instance. |
| 136 | + |
| 137 | +### HTML/Code |
| 138 | + |
| 139 | +You should not apply the `hyphenate(text:)` method directly to strings containing HTML or code, as the code elements may be erroneously hyphenated. A safer approach is to use another tool capable of identifying HTML or code elements and applying hyphenation only to plain text content within the markup. |
| 140 | + |
| 141 | +See [HyphenationPublishPlugin](https://github.com/john-mueller/HyphenationPublishPlugin) for an example of hyphenating HTML using [SwiftSoup](https://github.com/scinfu/SwiftSoup). |
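For illustration only, the idea of touching just the text between tags can be sketched with a naive scanner. `transformTextNodes` is a hypothetical helper, not part of Hyphenation or SwiftSoup. It ignores comments, `<script>` and `<style>` contents, and `>` characters inside attribute values, so a real HTML parser remains the safer choice.

```swift
/// Applies `transform` only to the text found outside of `<...>` tags.
/// Naive sketch: assumes every "<" opens a tag and the next ">" closes it.
func transformTextNodes(in html: String, using transform: (String) -> String) -> String {
    var result = ""
    var buffer = ""      // Plain text collected since the last tag.
    var insideTag = false
    for character in html {
        if character == "<" && !insideTag {
            // Flush the pending text through the transform, then copy the tag verbatim.
            result += transform(buffer)
            buffer = ""
            insideTag = true
            result.append(character)
        } else if character == ">" && insideTag {
            insideTag = false
            result.append(character)
        } else if insideTag {
            result.append(character)
        } else {
            buffer.append(character)
        }
    }
    result += transform(buffer)
    return result
}

let html = "<p class=\"note\">plain text</p>"
// Uppercasing stands in for hyphenation so the effect is visible.
let output = transformTextNodes(in: html, using: { $0.uppercased() })
print(output) // <p class="note">PLAIN TEXT</p>
```

With the library, the closure would presumably call `hyphenate(text:)` on a `Hyphenator` instance instead of uppercasing.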
| 142 | + |
| 143 | +## Contributing |
| 144 | + |
| 145 | +If you encounter an issue using Hyphenation, please let me know by filing an issue or submitting a pull request! |
| 146 | + |
| 147 | +When filing an issue, please do your best to provide reproducible steps and an explanation of the expected behavior. |
| 148 | + |
| 149 | +In the case of a pull request, please take note of the following steps: |
| 150 | + |
| 151 | +1. `swiftlint` should produce no warnings when run in the project directory. This is checked by CI, but I also recommend linting locally if possible (instructions for installation in the [SwiftLint repo](https://github.com/realm/SwiftLint#installation)). |
| 152 | +2. If you have added or renamed test cases, run `make generate-linuxmain` in the project directory. This will ensure all tests are run on both macOS and Linux. |
| 153 | +3. Make sure `make test` results in no errors. This runs the tests in the `HyphenationTests` and `ThreadSafetyTests` targets. |
| 154 | +4. If changing any internal implementations, please run `make bench` both with and without your changes, to check for any speed regressions. This runs the tests in the `PerformanceTests` target. |
| 155 | + |
| 156 | +## License |
| 157 | + |
| 158 | +Hyphenation is provided under the MIT license (see [LICENSE.md](LICENSE.md)). |
| 159 | + |
| 160 | +The English hyphenation patterns are provided under the [original custom license](https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-en-us.tex), and are sourced from the [TeX hyphenation repo](https://github.com/hyphenation/tex-hyphen/tree/master/hyph-utf8/tex/generic/hyph-utf8/patterns/txt). |
| 161 | + |
| 162 | +[The texts](Tests/PerformanceTests/TestHelpers/) used for performance testing are in the public domain, and are attributed in their headers. |
| 163 | + |
| 164 | +The example paragraphs in the Introduction are from [Butterick's Practical Typography](https://practicaltypography.com), and are used with permission. |
| 165 | + |
| 166 | +## Credits |
| 167 | + |
| 168 | +This library was inspired by the pages on [justified text](https://practicaltypography.com/justified-text.html) and [optional hyphens](https://practicaltypography.com/optional-hyphens.html) in [Butterick's Practical Typography](https://practicaltypography.com) and the author's [implementation in Racket](https://github.com/mbutterick/hyphenate). It's worth a read! |