Commit 0f0a76b

Added notes on the implementation of mathics_scanner.characters
1 parent b6d843f commit 0f0a76b

1 file changed, +21 -0 lines changed

README.rst

Lines changed: 21 additions & 0 deletions
@@ -13,6 +13,27 @@ Uses

This is used as the scanner inside `Mathics <https://mathics.org>`_ but it can also be used for tokenizing and formatting WL code. In fact we intend to write one.

Implementation
==============

mathics_scanner.characters
--------------------------

This module consists mostly of translation tables between WL and Unicode/ASCII.
Because these tables are large, we store them in a file and read them from
disk at runtime (when the module is imported). Our tests showed that storing
the tables as JSON and reading them with
`ujson <https://github.com/ultrajson/ultrajson>`_ is the most efficient way to
access them. However, this is merely an implementation detail, and consumers
of this library should not rely on it.

For maintainability and efficiency, we keep this data in a human-readable YAML
file (``data/named-characters.yml``) and compile it into the JSON tables used
internally by the library (``data/characters.json``) for faster access at
runtime. The conversion is performed by the script
``admin-tools/compile-translation-tables.py`` on each commit to the ``master``
branch via GitHub Actions.
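
The YAML-to-JSON workflow described in this section can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
contents of ``admin-tools/compile-translation-tables.py``: the record fields
``wl-unicode`` and ``unicode-equivalent``, the table names ``wl-to-unicode``
and ``unicode-to-wl``, the sample entries, and the helper functions are all
hypothetical. The parsed YAML is stood in for by a plain dictionary so that
the sketch needs only the standard library, and ``ujson`` is used for reading
only when it happens to be installed.

```python
import json

try:
    # Optional faster parser; it exposes a load() compatible with json.load().
    import ujson as json_reader
except ImportError:
    json_reader = json

# Stand-in for the parsed YAML file. Field names and entries are hypothetical.
SAMPLE_NAMED_CHARACTERS = {
    "DifferentialD": {"wl-unicode": "\uf74c", "unicode-equivalent": "\U0001d451"},
    "Alpha": {"wl-unicode": "\u03b1", "unicode-equivalent": "\u03b1"},
    # Made-up entry with no Unicode equivalent; it is skipped when compiling.
    "HypotheticalNoEquivalent": {"wl-unicode": "\uf800"},
}


def compile_tables(named_characters):
    """Flatten per-character records into two lookup tables."""
    wl_to_unicode = {}
    unicode_to_wl = {}
    for record in named_characters.values():
        wl = record.get("wl-unicode")
        uni = record.get("unicode-equivalent")
        if wl is None or uni is None:
            continue  # nothing to translate for this character
        wl_to_unicode[wl] = uni
        unicode_to_wl[uni] = wl
    return {"wl-to-unicode": wl_to_unicode, "unicode-to-wl": unicode_to_wl}


def write_tables(tables, path):
    """Build step: serialize the compiled tables as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tables, f, ensure_ascii=False)


def load_tables(path):
    """Import-time step: read the compiled tables back from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json_reader.load(f)
```

Because callers only see the dictionaries returned by ``load_tables``, the
on-disk format and the parser can change without affecting them, which is why
consumers should not depend on either.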

Contributing
------------
