Skip to content

ToJson produces invalid json when input has smart quote charactersΒ #122

@twaddell

Description

@twaddell

Describe the bug
When processing a string containing curly quote characters they are converted to straight quotes which produces invalid Json when the ToJson Method is called

To Reproduce

Catalyst.Models.English.Register(); 

Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.English);
var doc = new Document("The \u201cquick\u201d brown fox jumps over the lazy dog", Language.English);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

This outputs the following to the console:

{"Language":"en","Length":45,"Value":"The "quick" brown fox jumps over the lazy dog","TokensData":[[{"Bounds":[0,2],"Tag":"DET"},{"Bounds":[4,4],"Tag":"PUNCT"},{"Bounds":[5,9],"Tag":"ADJ"},{"Bounds":[10,10],"Tag":"PUNCT"},{"Bounds":[12,16],"Tag":"ADJ"},{"Bounds":[18,20],"Tag":"NOUN"},{"Bounds":[22,26],"Tag":"VERB"},{"Bounds":[28,31],"Tag":"ADP"},{"Bounds":[33,35],"Tag":"DET"},{"Bounds":[37,40],"Tag":"ADJ"},{"Bounds":[42,44],"Tag":"NOUN"}]]}

Note the straight quotes around the word quick. This json cannot be parsed.

Expected behavior
The ToJson method should retain the unicode curly quotes in the Value property and produce valid json output.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions