Unicode problem in text parsing #1

@bgrawi

Description

If you look at commit 79f59c2144 on http://commit.guru/repo/jquery, you will see that the person's email contains broken Unicode escapes (sequences like u7352, with the leading backslash missing). We need to figure out where the backslash is lost. It could be anywhere from:

  • The first ingestion
  • Python itself
  • SQLAlchemy parsing/coercing
  • Database collation (content type)

Or on the front end:

  • Node maybe can't handle it?
  • The Waterline ORM is incorrectly parsing it
  • The socket connection might be losing it
  • Angular.js might be losing it
  • The actual HTML page might have the incorrect character encoding.

We need to rule out the CodeRepoAnalyzer -> database path as the cause before I start digging into the front-end side of things.
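One way to check the back end first is to scan the values already stored in the database for escapes that have lost their backslash. Here is a minimal sketch, assuming the rows have already been fetched as Python strings. The name `find_orphaned_escapes` and the sample address are made up for illustration, and the pattern will also flag ordinary text that happens to contain "u" followed by four hex digits.

```python
import re

# "uXXXX" (four hex digits) that is NOT preceded by a backslash.
# This is a heuristic: it can produce false positives on normal text.
ORPHAN_ESCAPE = re.compile(r"(?<!\\)u([0-9a-fA-F]{4})")


def find_orphaned_escapes(text):
    """Return every 'uXXXX' fragment in text that lacks its leading backslash."""
    return [match.group(0) for match in ORPHAN_ESCAPE.finditer(text)]


# Hypothetical stored value, shaped like the symptom described in this issue.
stored_email = "u7352u5b50@example.com"
print(find_orphaned_escapes(stored_email))
```

If the stored values already contain such fragments, the backslash is lost before or at the database, and the front end can be ruled out.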

My guess is that it's lost during the first ingestion of the git log output, but I might be wrong.
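To test that guess in isolation, the git log output could be captured as raw bytes and decoded explicitly, so that nothing between git and Python re-escapes the text. This is only a sketch under assumptions: `decode_git_output` and `read_author_lines` are hypothetical helpers, not the CodeRepoAnalyzer's actual ingestion code, and the format string is just one possible layout.

```python
import subprocess


def decode_git_output(raw):
    """Decode raw git output as UTF-8; invalid bytes raise UnicodeDecodeError."""
    return raw.decode("utf-8")


def read_author_lines(repo_path):
    """Return one 'hash<TAB>author name<TAB>author email' line per commit."""
    result = subprocess.run(
        [
            "git", "-C", repo_path, "log",
            "--encoding=UTF-8",                   # ask git to emit UTF-8
            "--pretty=format:%H%x09%an%x09%ae",   # %x09 is a tab separator
        ],
        check=True,
        stdout=subprocess.PIPE,  # capture bytes, so no implicit locale decoding
    )
    return decode_git_output(result.stdout).split("\n")
```

If the lines returned here contain real non-ASCII characters rather than u7352-style fragments, the ingestion step is a likely suspect; if they are already broken, the problem is upstream of Python.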
