Unicode (read: Emoji) support

When auto-completing a line that contains an emoji (or any other multi-byte Unicode character), the suggestions stop working correctly.

For instance, typing `let 💯 = Foo(` does bring up the suggestion box. However, the suggestions are not relevant to the current cursor position.
## Overview

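The problem is that [`FullTextDocument`](https://github.com/Microsoft/vscode-languageserver-node/blob/a9f36d43a789e6fd9c16e5e50fc818eb35d097db/types/src/main.ts#L1192-L1291) does not account for multi-byte Unicode characters: the offsets it returns are indices into a JavaScript string (UTF-16 code units), not byte offsets.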

For example, say we have two Swift source documents:
### ascii.swift

``` swift
struct Foo {
    let bar: Int
}

let x = Foo()
```
### unicode.swift

``` swift
struct Foo {
    let bar: Int
}

let 💯 = Foo()
```

Both source documents have the same number of code points (46), but a different number of bytes when encoded as UTF-8: 46 for `ascii.swift` and 49 for `unicode.swift`, because 💯 (U+1F4AF) takes four bytes instead of one.

Therefore, the byte offset of the closing parenthesis is `45` in `ascii.swift` but `48` in `unicode.swift`.
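To make the three ways of counting concrete, here is a minimal sketch that measures both documents as in-memory strings. The helper name `measure` and the constants `asciiSource` and `unicodeSource` are made up for illustration, and the strings assume `\n` line endings with no trailing newline. It only relies on built-in string handling and `TextEncoder`.

```typescript
// Hypothetical in-memory copies of the two documents ('\n' line endings,
// no trailing newline). These are assumptions for illustration only.
const asciiSource = 'struct Foo {\n    let bar: Int\n}\n\nlet x = Foo()';
const unicodeSource = 'struct Foo {\n    let bar: Int\n}\n\nlet 💯 = Foo()';

// Hypothetical helper: counts the same text in three different units.
function measure(text: string) {
    return {
        // Iterating a string yields one entry per code point.
        codePoints: Array.from(text).length,
        // `length` counts UTF-16 code units, which is also how strings are indexed.
        utf16Units: text.length,
        // TextEncoder always encodes to UTF-8.
        utf8Bytes: new TextEncoder().encode(text).length,
    };
}

console.log(measure(asciiSource));   // { codePoints: 46, utf16Units: 46, utf8Bytes: 46 }
console.log(measure(unicodeSource)); // { codePoints: 46, utf16Units: 47, utf8Bytes: 49 }
```

Note that the emoji counts as one code point, two UTF-16 code units, and four UTF-8 bytes, so none of the three units agree once it appears.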
## Example

Using the two documents above, `ascii.swift` and `unicode.swift`:

``` typescript
import * as fs from 'fs';
import { TextDocument, Position } from 'vscode-languageserver';

const ascii = '/path/to/ascii.swift';
const unicode = '/path/to/unicode.swift';

// Load the text documents
let asciiBuffer: Promise<Buffer> = new Promise((resolve, reject) => {
    fs.readFile(ascii, (err, data) => {
        if (err) { reject(err); }
        else { resolve(data); }
    });
});

let unicodeBuffer: Promise<Buffer> = new Promise((resolve, reject) => {
    fs.readFile(unicode, (err, data) => {
        if (err) { reject(err); }
        else { resolve(data); }
    });
});

// REMEMBER Position is zero indexed!
// https://github.com/Microsoft/vscode-languageserver-node/blob/a9f36d43a789e6fd9c16e5e50fc818eb35d097db/types/src/main.ts#L12
let position = Position.create(4, 12);

let asciiByteOffset = asciiBuffer.then((buffer) => TextDocument.create(ascii, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 45 ✅

let unicodeByteOffset = unicodeBuffer.then((buffer) => TextDocument.create(unicode, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 45 ❌ (the byte offset is 48)
```
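One way to see why the second call still yields 45: JavaScript strings are indexed by UTF-16 code units, and 💯 takes two of them, so column 12 no longer lands on the closing parenthesis. The sketch below uses only built-in string operations (no language-server API), and the variable names are illustrative.

```typescript
// The last line of each document, as standalone strings.
const asciiLine = 'let x = Foo()';
const unicodeLine = 'let 💯 = Foo()';

// In the ASCII line, UTF-16 index 12 is the closing parenthesis.
console.log(asciiLine.charAt(12)); // )

// In the emoji line, UTF-16 index 12 is the opening parenthesis,
// because the emoji occupies indices 4 and 5.
console.log(unicodeLine.charAt(12)); // (

// Indexing by code point does reach the closing parenthesis.
console.log(Array.from(unicodeLine)[12]); // )
```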
## Resolution?

One idea that I've been working towards is creating a new class, `UnicodeTextDocument`, that conforms to the [`TextDocument`](https://github.com/Microsoft/vscode-languageserver-node/blob/a9f36d43a789e6fd9c16e5e50fc818eb35d097db/types/src/main.ts#L1082-L1139) interface. It could serve as a drop-in replacement for `TextDocument` that transparently provides byte offsets, so that you could do:

``` typescript
let asciiByteOffset = asciiBuffer.then((buffer) => new UnicodeTextDocument(ascii, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 45 ✅

let unicodeByteOffset = unicodeBuffer.then((buffer) => new UnicodeTextDocument(unicode, 'swift', 1, buffer.toString('utf8')))
  .then((document) => document.offsetAt(position))
  .then(console.log); // logs 48 ✅
```
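As a rough sketch of what the offset calculation inside such a class might look like, the standalone function below converts a line and column into a UTF-8 byte offset. The name `byteOffsetAt` is hypothetical and not part of `vscode-languageserver`. It assumes `\n` line endings and treats the column as a count of code points, which is the interpretation under which position `(4, 12)` maps to 48 above. A real implementation would also need to decide how to handle `\r\n` and out-of-range positions.

```typescript
// Hypothetical helper, not part of vscode-languageserver.
// Assumptions: '\n' line endings; `character` is a code point column.
function byteOffsetAt(text: string, line: number, character: number): number {
    const encoder = new TextEncoder(); // always encodes to UTF-8
    const lines = text.split('\n');

    // Bytes of every preceding line, plus one byte for each '\n'.
    let offset = 0;
    for (const previous of lines.slice(0, line)) {
        offset += encoder.encode(previous).length + 1;
    }

    // Bytes of the code points that precede the requested column.
    const current = lines[line] || '';
    const prefix = Array.from(current).slice(0, character).join('');
    return offset + encoder.encode(prefix).length;
}

// In-memory copies of the two documents (assumed: no trailing newline).
const asciiText = 'struct Foo {\n    let bar: Int\n}\n\nlet x = Foo()';
const unicodeText = 'struct Foo {\n    let bar: Int\n}\n\nlet 💯 = Foo()';

console.log(byteOffsetAt(asciiText, 4, 12));   // 45
console.log(byteOffsetAt(unicodeText, 4, 12)); // 48
```

Re-encoding the prefix on every call is simple but not cheap, so a class that serves many requests would probably cache per-line byte offsets instead.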

