Commit 5bea03d

Add Compatibility Issues section
1 parent 7b79d75 commit 5bea03d

1 file changed

+76
-10
lines changed

proposals/002-text-utf-default.md

Lines changed: 76 additions & 10 deletions
@@ -86,8 +86,8 @@ The very `instance Binary Text` serializes `Text` in UTF-8 encoding.
8686
If we switch the internal representation of `Text` from UTF-16 to UTF-8,
8787
all such conversions would be made redundant and we'll be able to just check that
8888
a `ByteString` is a valid UTF-8 (which is most often the case) and copy it into `Text`.
89-
If in future (see an upcoming "Unifying vector-like types" proposal) `ByteString`
90-
is backed by unpinned memory, we'd be able to eliminate copying entirely.
89+
If in the future `ByteString` switches to being
90+
backed by unpinned memory, we'd be able to eliminate copying entirely.
9191

9292
`Text` is also often used in contexts, which involve mostly ASCII characters.
9393
This often prompts developers to use `ByteString` instead of `Text` to save 2x space
@@ -118,6 +118,8 @@ to make another attempt.
118118

119119
- Performance satisfies targets listed below in "Performance impact" section.
120120

121+
- Compatibility story satisfies targets listed below in the "Compatibility issues" section.
122+
121123
# People
122124

123125
- Performers:
@@ -180,6 +182,70 @@ possible. This candidate should be shared publicly and loudly.
180182

181183
- TBD: There is a straightforward implementation, but this one is left up to Andrew for comment.
182184

185+
**Compatibility issues**
186+
187+
`text` is a very old package, deeply ingrained in the Haskell ecosystem.
188+
A change of internal representation is necessarily a breaking change.
189+
Our strategy to tackle compatibility issues is guided by a desire
190+
to finish this project in a time-bound fashion with realistic expectations
191+
about available resources.
192+
193+
Current `text` HEAD supports GHCs back to GHC 8.0.
194+
At the moment we do not foresee any blockers
195+
to keep compatibility with GHC 8.0 after UTF-8 transition,
196+
and we plan to stick to it even if it causes some overhead in CPP.
197+
However, if we discover that supporting old GHCs causes a significant overhead
198+
(e. g., a dedicated non-trivial code path, emulating missing primop
199+
or working around a bug), we may decide to shrink the compatibility window.
200+
Such a decision would not be taken lightly, but we believe that getting
201+
things done for the bright future should not be hindered by old unsupported baggage.
202+
203+
One suggestion to improve the compatibility story was to keep both UTF-16 and UTF-8
204+
implementations in `text` and switch between them via a Cabal flag. It seems,
205+
however, that such a strategy will put an undue, indefinitely long burden
206+
on `text` maintainers, and bring little benefit to downstream packages, because
207+
they cannot detect build flags of `text` (and thus cannot rely on its internals at all).
208+
209+
Instead, we will mark the new UTF-8 release as `text-2.0` and put out a call for volunteers
210+
to maintain a legacy UTF-16 package. Depending on demand, this could be done either
211+
as a continuation of the `text-1.X` series, or as a separate `text-utf16` package. We'll facilitate such a community project and will work with Hackage Trustees and Stackage
212+
Curators to ensure a timely transition of the ecosystem.
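
As an illustration of why a version bump is easier for downstream packages than a Cabal flag, here is a minimal sketch of how a client could branch on the `text` major version. It assumes only that the UTF-8 release is numbered `2.0`, as stated above; `internalEncoding` is a hypothetical name introduced for this example, and `MIN_VERSION_text` is the version macro that Cabal (and GHC 8.0 or later) generates for exposed packages. No comparable macro exposes the build flags of a dependency.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Hypothetical downstream helper: reports which internal encoding the
-- `text` package visible at build time is expected to use, judging only
-- by its version number.
internalEncoding :: String
#ifdef MIN_VERSION_text
#if MIN_VERSION_text(2,0,0)
-- Assumption taken from the proposal: text-2.0 and later are UTF-8.
internalEncoding = "UTF-8"
#else
internalEncoding = "UTF-16"
#endif
#else
-- The macro is only defined when `text` is an exposed dependency.
internalEncoding = "unknown (text is not a dependency)"
#endif

main :: IO ()
main = putStrLn ("text internal encoding: " ++ internalEncoding)
```

A package that touches `text` internals could use the same guard to select between a UTF-16 and a UTF-8 code path during the transition.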
213+
214+
With regard to API compatibility, we intend to keep signatures of non-`Internal`
215+
modules unchanged, except for `Word16` being replaced by `Word8` where appropriate.
216+
Such a promise unfortunately cannot be made for `Internal` modules,
217+
due to their nature: even while we'll strive to keep as much untouched as possible,
218+
the semantics of internal functions are due to change drastically. This kind of breakage
219+
should not come as a big surprise, because `Internal` modules have a disclaimer about
220+
unstable API.
221+
222+
There are two places where `text` leaks details of its internal representation.
223+
The first is `Data.Text.Array`, which provides access to the underlying byte array.
224+
Not only will its API change from `Word16` to `Word8`, but also the semantics
225+
of the array will switch from UTF-16 to UTF-8. This will cause breakage of several packages
226+
such as `unicode-transforms` and `unicode-collation`. We intend to communicate with the
227+
respective maintainers as early as possible to help with the transition.
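
To make the semantic change concrete, the following self-contained sketch (using only `base`, not the `text` API) compares code-unit offsets under the two encodings. The helper names `utf16Units`, `utf8Units` and `offsetOf` are hypothetical, and the input is assumed to contain only valid Unicode scalar values. The point is that an offset computed for a `Word16` array does not, in general, address the same character in a `Word8` array.

```haskell
module Main (main) where

import Data.Char (ord)

-- Number of 16-bit code units (Word16) one character occupies in UTF-16.
utf16Units :: Char -> Int
utf16Units c
  | ord c < 0x10000 = 1
  | otherwise       = 2  -- encoded as a surrogate pair

-- Number of 8-bit code units (Word8) one character occupies in UTF-8.
utf8Units :: Char -> Int
utf8Units c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Offset, in code units, of the character at character index i.
offsetOf :: (Char -> Int) -> Int -> String -> Int
offsetOf units i = sum . map units . take i

main :: IO ()
main = do
  let s = "na\239ve"  -- "naive" with U+00EF as the third character
  -- Offset of 'v' (character index 3) in each encoding.
  print (offsetOf utf16Units 3 s)
  print (offsetOf utf8Units 3 s)
```

For this input the character `v` sits at code-unit offset 3 in UTF-16 but at offset 4 in UTF-8, which is why code that indexes directly into the underlying array needs to be revisited rather than merely retyped.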
228+
229+
The other is `Data.Text.Foreign`, which is mostly used by the `text-icu` library,
230+
which binds to `libicu` for certain Unicode manipulations. `libicu` provides
231+
helpers to convert C strings
232+
[from UTF8 to UTF16](https://unicode-org.github.io/icu/userguide/strings/utf-8.html).
233+
It is up to the `text-icu` maintainers to modify their bindings. We intend
234+
to reach out to them as soon as we have an MVP.
235+
236+
Since fixing downstream compatibility issues is up to external counterparties,
237+
most of which are unpaid volunteers, we cannot expect them to do it in a limited
238+
time frame. We are devoted to having a smooth migration story and will provide
239+
as much guidance as possible, but to keep our targets time-bound we cannot tie the success
240+
of this project to actions of third parties. We will not wait for everyone to migrate.
241+
242+
To sum up, we plan to:
243+
244+
* Keep `text` compatible with GHCs back to 8.0, unless it imposes an undue cost (more than 50 lines of code per major release).
245+
* Keep signatures of non-`Internal` modules compatible modulo the `Word16`/`Word8` change.
246+
* Provide migration guidance to clients of `Data.Text.{Array,Foreign}`.
247+
* Facilitate a community project to keep a UTF-16-based legacy fork alive, if there is demand for it.
248+
183249
**Performance impact**
184250

185251
A common misunderstanding is that switching to UTF-8 makes everything twice smaller and
@@ -265,20 +331,20 @@ packages that go out of date.
265331

266332
- text-2.0.0.0, which will provide a UTF-8 encoding for Text as a default for all versions going forward.
267333

268-
- A `text-utf16` package, which is a preservation of the current UTF-16 encoded text, for backwards compatibility.
269-
270-
- Updates to the Text Haddocks that reflect the UTF-8 changes
334+
- Updates to the Text Haddocks that reflect the UTF-8 changes.
271335

272336
- Announcements and updates across all Haskell channels covering the following:
273-
- Significant dates and milestones
337+
- Significant dates and milestones.
338+
339+
- Release candidates.
274340

275-
- Expected code impact
341+
- Migration guides.
276342

277-
- Release candidates
343+
- Performance impact analysis.
278344

279-
- Delivery
345+
- Delivery.
280346

281-
- Migration instructions and design documentation
347+
- Design documentation.
282348

283349
# Outcomes
284350
