Commit ba518b3

Merge pull request #1 from Mathics3/character-selection-comments
Start indicating why we chose what we chose.
2 parents 6973ab9 + 6597490 commit ba518b3

2 files changed: +22 -1 lines changed

README.rst

Lines changed: 21 additions & 0 deletions
@@ -13,6 +13,27 @@ Uses

 This is used as the scanner inside `Mathics <https://mathics.org>`_ but it can also be used for tokenizing and formatting WL code. In fact we intend to write one.

+Implementation
+==============
+
+mathics_scanner.characters
+--------------------------
+
+This module consists mostly of translation tables between WL and Unicode/ASCII.
+Because of the large size of these tables, it was decided to store them in a
+file and read them from disk at runtime (when the module is imported). Our
+tests showed that storing the tables as JSON and using
+`ujson <https://github.com/ultrajson/ultrajson>`_ to read them is the most
+efficient way to access them. However, this is merely an implementation
+detail and consumers of this library should not rely on this assumption.
+
+For maintainability and efficiency, we decided to store this data in a
+human-readable YAML file (`data/named-characters.yml`) and compile it into
+the JSON tables used internally by the library (`data/characters.json`) for
+faster access at runtime. The conversion of the data is performed by the
+script `admin-tools/compile-translation-tables.py` at each commit to the
+`master` branch via GitHub Actions.
+

 Contributing
 ------------
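
Note: the compile-then-load flow described in the added README text can be illustrated with a minimal sketch. This is not the project's actual `compile-translation-tables.py`. The function names, the table names (`wl-to-unicode`, `unicode-to-wl`), the keys `wl-unicode` and `unicode-equivalent`, and the sample code points are assumptions made for illustration; only `has-unicode-inverse` and `is-letter-like` appear in this commit. The sketch starts from data that has already been parsed from YAML, and it falls back to the standard `json` module when `ujson` is not installed.

```python
import json
import os
import tempfile

try:
    import ujson as fast_json  # optional third-party parser named in the README
except ImportError:
    fast_json = json  # stdlib fallback so the sketch runs without ujson

# Stand-in for the parsed contents of the YAML file. The "wl-unicode" and
# "unicode-equivalent" keys and all code points are illustrative assumptions.
SAMPLE = {
    "Alpha": {
        "wl-unicode": "\u03b1",
        "unicode-equivalent": "\u03b1",
        "has-unicode-inverse": True,
        "is-letter-like": True,
    },
    "Uranus": {
        "wl-unicode": "\uf000",  # placeholder private-use code point
        "unicode-equivalent": "\u2645",
        "has-unicode-inverse": False,
        "is-letter-like": False,
    },
}


def compile_tables(named_characters):
    """Build forward and inverse translation tables from parsed entries."""
    wl_to_unicode = {}
    unicode_to_wl = {}
    for entry in named_characters.values():
        wl = entry.get("wl-unicode")
        uni = entry.get("unicode-equivalent")
        if wl is None or uni is None:
            continue  # no equivalent recorded, so nothing to translate
        wl_to_unicode[wl] = uni
        # Assumed meaning: only add the reverse mapping when the entry
        # says the Unicode character maps back to the WL character.
        if entry.get("has-unicode-inverse", False):
            unicode_to_wl[uni] = wl
    return {"wl-to-unicode": wl_to_unicode, "unicode-to-wl": unicode_to_wl}


def write_tables(tables, path):
    """Write the compiled tables as JSON (the build-time step)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tables, f)


def load_tables(path):
    """Read the compiled tables back (the import-time step)."""
    with open(path, "r", encoding="utf-8") as f:
        return fast_json.load(f)


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "characters.json")
    write_tables(compile_tables(SAMPLE), out_path)
    print(sorted(load_tables(out_path)))
```

Keeping the YAML as the single source of truth and treating the JSON as a build artifact means the parser used at import time can change without touching the data, which is consistent with the README's warning that consumers should not rely on the storage format.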

mathics_scanner/data/named-characters.yml

Lines changed: 1 addition & 1 deletion
@@ -6943,7 +6943,7 @@ Upsilon:
 # looks more like U+26E2 (Astronomical Symbol for Uranus) than the Standard Unicode equivalent
 # seen at https://www.compart.com/en/unicode/U+2645.
 # As with the Earth, we are going off of the name and the code point rather than the
-# visual representation of the symbo.
+# visual representation of the symbol.
 Uranus:
   has-unicode-inverse: false
   is-letter-like: false
