Description
I am writing a REPL for the Lambda Calculus and I incorporated linenoise-swift to provide line editing functionality. Unfortunately, the Greek letter lambda (λ) is encoded in UTF-8 as two bytes: CE BB. linenoise-swift handles input one byte at a time and tries to split the λ. The same problem occurs for any Unicode code point that takes more than one byte to encode in UTF-8, i.e. everything except 7-bit US ASCII.
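To make the byte-splitting problem concrete, here is a minimal sketch of how a reader could collect a whole UTF-8 sequence before treating it as a character. The helper names `utf8SequenceLength` and `readCharacter` are hypothetical and are not part of linenoise-swift; the closure stands in for whatever the library uses to read a byte from the terminal.

```swift
// Hypothetical helper: length of a UTF-8 sequence, judged from its lead byte.
func utf8SequenceLength(leadByte: UInt8) -> Int? {
    switch leadByte {
    case 0x00...0x7F: return 1   // 7-bit US ASCII
    case 0xC2...0xDF: return 2   // e.g. λ, encoded as CE BB
    case 0xE0...0xEF: return 3
    case 0xF0...0xF4: return 4
    default: return nil          // continuation byte or invalid lead byte
    }
}

// Hypothetical helper: read bytes until one full code point is available.
// `nextByte` is an assumed stand-in for the terminal read.
func readCharacter(nextByte: () -> UInt8?) -> Character? {
    guard let lead = nextByte(),
          let length = utf8SequenceLength(leadByte: lead) else { return nil }
    var bytes = [lead]
    while bytes.count < length {
        // Every following byte must look like 10xxxxxx.
        guard let byte = nextByte(), (byte & 0xC0) == 0x80 else { return nil }
        bytes.append(byte)
    }
    return String(decoding: bytes, as: UTF8.self).first
}

// Usage: feed the bytes of "aλ" through the reader.
var input = Array("aλ".utf8).makeIterator()
let first = readCharacter { input.next() }
let second = readCharacter { input.next() }
```

This only assembles a single code point per call, so a grapheme built from several code points (a composed emoji, for example) would still arrive in pieces.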
How to Reproduce
Run the linenoiseDemo command line app. Type in a few characters and then a λ. The cursor is repositioned at the start of the line and garbage is appended to the end of the line. Here is an example:
Type 'exit' to quit
gdggfdsgdsλ
utput: gdggfdsgdsλ
?
If you are having trouble producing a λ from your keyboard, the problem still manifests if you copy-paste it from the text of this issue.
Further Information
I made an attempt to fix the issue myself. You can see my attempt here. The patch is a lot bigger than you might expect because adding support for multibyte UTF-8 exposes another more subtle bug.
Consider the following code in class EditLine:
    func insertCharacter(_ char: Character) {
        let origLoc = location
        let origEnd = buffer.endIndex
        buffer.insert(char, at: location)
        location = buffer.index(after: location)
        if origLoc == origEnd {
            location = buffer.endIndex
        }
    }
The Apple documentation for insert(_:at:) says:
Calling this method invalidates any existing indices for use with this string
This means that location, origLoc and origEnd are all invalid after the insert. If the inserted character is a single byte, we get away with it. If not, location ends up as a garbage value and causes a process abort when it is next used. I ended up changing the type of buffer to [Character] and the type of location to Int as the easy way out.
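The [Character] and Int approach can be sketched roughly as follows. This is an illustration of the idea, not the code from my patch; the type name `EditBuffer` and its members are made up for the example.

```swift
// Hypothetical sketch: an edit buffer indexed by Int instead of String.Index.
struct EditBuffer {
    private(set) var characters: [Character] = []
    private(set) var location = 0   // cursor position, counted in Characters

    // Integer offsets stay meaningful after the array is mutated,
    // so nothing has to be re-derived from stale indices.
    mutating func insertCharacter(_ char: Character) {
        characters.insert(char, at: location)
        location += 1
    }

    mutating func moveLeft() {
        if location > 0 { location -= 1 }
    }

    var text: String { String(characters) }
}

// Usage: type "abλ", step left once, then insert "x" before the λ.
var line = EditBuffer()
for ch in "abλ" { line.insertCharacter(ch) }
let typed = line.text
line.moveLeft()
line.insertCharacter("x")
```

One caveat with this design: if combining code points arrive as separate Character values, `String(characters)` may merge them into one grapheme, so the array count and the visible character count can disagree.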
NB: I can give you a pull request or a patch if it helps, but it hasn't been extensively tested and probably still breaks with composed characters, e.g. emoji.