JavaScript version doesn't handle astral code points correctly #106

@GoogleCodeExporter

What steps will reproduce the problem?

var dmp = new diffMatchPatch();

var str1 = ">>> \ud83d\ude4b <<<";
var str2 = ">>> \ud83d\ude4c <<<";

var diffs = dmp.diff_main(str1, str2);
console.log("diff = " + JSON.stringify(diffs));

What is the expected output? What do you see instead?

Expected: diff = [[0,">>> "],[-1,"🙋"],[1,"🙌"],[0," <<<"]]
Actual: diff = [[0,">>> �"],[-1,"�"],[1,"�"],[0," <<<"]]

Expanded, in case there is a loss of fidelity in this issue posting:

Expected: diff = [[0,">>> "],[-1,"\ud83d\ude4b"],[1,"\ud83d\ude4c"],[0," <<<"]]
Actual: diff = [[0,">>> \ud83d"],[-1,"\ude4b"],[1,"\ude4c"],[0," <<<"]]

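The diff is split *between* the two surrogate code units of the astral code point, so the deletion and the insertion each contain only a lone low surrogate, which is not valid text on its own.
Note that str1 and str2 share a common high surrogate (\ud83d), which the diff treats as part of the common prefix.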
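
As a possible workaround until the library handles this itself, the diff can be post-processed so that no boundary falls inside a surrogate pair. The sketch below assumes both input strings are well-formed UTF-16 and that diffs are the [operation, text] pairs shown above. The names repairSurrogateSplits, isHighSurrogate and isLowSurrogate are hypothetical helpers written for this example, not part of the library's API.

```javascript
// Hypothetical helpers for illustration; not part of diff-match-patch.
function isHighSurrogate(code) {
  return code >= 0xd800 && code <= 0xdbff;
}

function isLowSurrogate(code) {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Returns a new diff list in which a surrogate half that was left on an
// equality is moved into the adjacent deletion AND insertion, so both
// sides of the change contain whole code points. The input is not modified.
function repairSurrogateSplits(diffs) {
  var out = [];
  var del = "";
  var ins = "";

  function flush() {
    if (del) out.push([-1, del]);
    if (ins) out.push([1, ins]);
    del = "";
    ins = "";
  }

  for (var i = 0; i < diffs.length; i++) {
    var op = diffs[i][0];
    var text = diffs[i][1];
    if (op === -1) { del += text; continue; }
    if (op === 1) { ins += text; continue; }

    // Equality starting with a low surrogate right after a change:
    // the pair was cut by the common suffix, so hand the low half back.
    if ((del || ins) && text && isLowSurrogate(text.charCodeAt(0))) {
      del += text.charAt(0);
      ins += text.charAt(0);
      text = text.slice(1);
    }
    if (!text) continue;
    flush();

    // Equality ending with a high surrogate right before a change:
    // the pair was cut by the common prefix, so carry the high half forward.
    var next = diffs[i + 1];
    var tail = "";
    if (next && next[0] !== 0 &&
        isHighSurrogate(text.charCodeAt(text.length - 1))) {
      tail = text.charAt(text.length - 1);
      text = text.slice(0, -1);
    }
    if (text) out.push([0, text]);
    del = tail;
    ins = tail;
  }
  flush();
  return out;
}

// The "Actual" diff from this report, in escaped form.
var splitDiffs = [[0, ">>> \ud83d"], [-1, "\ude4b"], [1, "\ude4c"], [0, " <<<"]];
var repaired = repairSurrogateSplits(splitDiffs);
console.log("diff = " + JSON.stringify(repaired));
// diff = [[0,">>> "],[-1,"🙋"],[1,"🙌"],[0," <<<"]]
```

Moving the shared half into both the deletion and the insertion keeps the diff equivalent to the original, since applying it still turns str1 into str2. The result is not guaranteed to be minimal, and a fix inside the library's prefix and suffix detection would likely be cleaner.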

What version of the product are you using? On what operating system?

1.0.0, on OSX


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 1 May 2015 at 2:59
