
Commit 869b668

Merge branch 'master' of github.com:Mathics3/mathics-scanner
2 parents 7e97d13 + d76c53f commit 869b668


5 files changed (+27 -2 lines)


README.rst

Lines changed: 21 additions & 0 deletions
@@ -13,6 +13,27 @@ Uses
 
 This is used as the scanner inside `Mathics <https://mathics.org>`_ but it can also be used for tokenizing and formatting WL code. In fact we intend to write one.
 
+Implementation
+==============
+
+mathics_scanner.characters
+--------------------------
+
+This module consists mostly of translation tables between WL and Unicode/ASCII.
+Because of the large size of these tables, it was decided to store them in a
+file and read them from disk at runtime (when the module is imported). Our
+tests showed that storing the tables as JSON and using
+`ujson <https://github.com/ultrajson/ultrajson>`_ to read them is the most
+efficient way to access them. However, this is merely an implementation
+detail, and consumers of this library should not rely on it.
+
+For maintainability and efficiency, we decided to store this data in a
+human-readable YAML file (``data/named-characters.yml``) and compile it into
+the JSON tables used internally by the library (``data/characters.json``) for
+faster access at runtime. The conversion of the data is performed by the
+script ``admin-tools/compile-translation-tables.py`` on each commit to the
+``master`` branch via GitHub Actions.
+
 
 Contributing
 ------------

mathics_scanner/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@
 from mathics_scanner.version import __version__
 
 from mathics_scanner.characters import (
+    aliased_characters,
+    named_characters,
     replace_unicode_with_wl,
     replace_wl_with_plain_text,
 )

mathics_scanner/data/named-characters.yml

Lines changed: 1 addition & 1 deletion
@@ -6943,7 +6943,7 @@ Upsilon:
 # looks more like U+26E2 (Astronomical Symbol for Uranus) than the Standard Unicode equavalent
 # seen at https://www.compart.com/en/unicode/U+2645.
 # As with the Earth, we are going off of the name and the code point rather than the
-# visual representation of the symbo.
+# visual representation of the symbol.
 Uranus:
   has-unicode-inverse: false
   is-letter-like: false
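The README states that `data/named-characters.yml` is compiled into `data/characters.json` by `admin-tools/compile-translation-tables.py`. The sketch below only illustrates the general shape of such a compilation step and is not that script. It starts from entries already parsed from YAML (for example with PyYAML's `yaml.safe_load`), and it assumes two things not shown in this diff: a `unicode-equivalent` field, and that `has-unicode-inverse: false` (as on `Uranus` above) means the Unicode character should not be translated back to the named character. The output key names are hypothetical.

```python
import json


def compile_tables(entries):
    """Build WL<->Unicode lookup tables from parsed YAML entries.

    Hypothetical entry shape (the "unicode-equivalent" key is an assumption):
        {"Uranus": {"has-unicode-inverse": False,
                    "is-letter-like": False,
                    "unicode-equivalent": <the U+2645 character>}}
    Tables are keyed by character name for simplicity.
    """
    wl_to_unicode = {}
    unicode_to_wl = {}
    letter_like = []
    for name, props in entries.items():
        uni = props.get("unicode-equivalent")
        if uni is not None:
            wl_to_unicode[name] = uni
            # Assumption: has-unicode-inverse says whether the Unicode
            # character may be mapped back to this named character.
            if props.get("has-unicode-inverse", False):
                unicode_to_wl[uni] = name
        if props.get("is-letter-like", False):
            letter_like.append(name)
    return {
        "wl-to-unicode": wl_to_unicode,
        "unicode-to-wl": unicode_to_wl,
        "letter-like": sorted(letter_like),
    }


def to_json(tables):
    """Serialize the tables deterministically (sorted keys, raw Unicode)."""
    return json.dumps(tables, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    sample = {
        "Alpha": {"has-unicode-inverse": True, "is-letter-like": True,
                  "unicode-equivalent": "\u03b1"},
        "Uranus": {"has-unicode-inverse": False, "is-letter-like": False,
                   "unicode-equivalent": "\u2645"},
    }
    print(to_json(compile_tables(sample)))
```

On this sample, `Uranus` appears in the forward table but not in the inverse one, which is what the assumed meaning of `has-unicode-inverse: false` would require.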

mathics_scanner/version.py

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@
 # This file is suitable for sourcing inside POSIX shell as
 # well as importing into Python. That's why there is no
 # space around "=" below.
-__version__="1.0.0.dev" # noqa
+__version__="1.0.0.dev0" # noqa

setup.py

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@
 import sys
 import os.path as osp
 import platform
+import subprocess
 from setuptools import setup, Command, Extension
 
 # Ensure user has the correct Python version
@@ -43,6 +44,7 @@ def get_srcdir():
 def read(*rnames):
     return open(osp.join(get_srcdir(), *rnames)).read()
 
+subprocess.run(["make", "mathics_scanner/data/characters.json"])
 
 # stores __version__ in the current namespace
 exec(compile(open("mathics_scanner/version.py").read(), "mathics_scanner/version.py", "exec"))
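With this change, `setup.py` runs `make mathics_scanner/data/characters.json` so the JSON tables exist before packaging. `subprocess.run` does not raise on a non-zero exit status unless `check=True` is passed, and it raises `FileNotFoundError` if the command is not on the `PATH`. As a result, a failed `make` could go unnoticed. A possible stricter variant follows, sketched under the assumption that failing the setup early is preferable. The helper name `build_data_file` is hypothetical and is not part of this commit.

```python
import shutil
import subprocess


def build_data_file(target, make_cmd="make"):
    """Run `make <target>`, failing loudly instead of continuing silently.

    Hypothetical helper: the commit itself calls subprocess.run(...) directly.
    """
    if shutil.which(make_cmd) is None:
        # Without this check, subprocess.run would raise FileNotFoundError.
        raise RuntimeError(f"{make_cmd!r} not found on PATH; cannot build {target}")
    # check=True raises CalledProcessError on a non-zero exit status;
    # plain subprocess.run(...) would ignore the failure.
    subprocess.run([make_cmd, target], check=True)


if __name__ == "__main__":
    build_data_file("mathics_scanner/data/characters.json")
```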
