Commit 36ec369

Support a wider array of integer literals (#832)
Fixes #769
1 parent 7b85243 commit 36ec369

7 files changed

+249
-72
lines changed

.github/workflows/run-tests.yml

Lines changed: 3 additions & 3 deletions
@@ -102,17 +102,17 @@ jobs:
102102
with:
103103
python-version: ${{ matrix.version }}
104104
- name: Cache dependencies
105-
id: cache-deps
105+
id: min-deps-test-cache-deps
106106
uses: actions/cache@v3
107107
with:
108108
path: |
109109
.tox
110110
~/.cache/pip
111111
~/.cache/pypoetry
112112
~/.local/share/pypoetry
113-
key: mdt-${{ runner.os }}-python-${{ matrix.version }}-poetry-${{ hashFiles('pyproject.toml', 'tox.ini') }}
113+
key: min-deps-test-${{ runner.os }}-python-${{ matrix.version }}-poetry-${{ hashFiles('pyproject.toml', 'tox.ini') }}
114114
- name: Install Poetry
115-
if: steps.cache-deps.outputs.cache-hit != 'true'
115+
if: steps.min-deps-test-cache-deps.outputs.cache-hit != 'true'
116116
run: curl -sSL https://install.python-poetry.org | python3 -
117117
- name: Install Tox
118118
run: |

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1414
* Added support for executing Basilisp namespaces directly via `basilisp run` and by `python -m` (#791)
1515
* Added the `memoize` core fn (#812)
1616
* Added support for `thrown-with-msg?` assertions to `basilisp.test/is` (#831)
17+
* Added support for reading scientific notation literals, octal and hex integer literals, and arbitrary base (2-36) integer literals (#769)
1718

1819
### Changed
1920
* Optimize calls to Python's `operator` module into their corresponding native operators (#754)

docs/reader.rst

Lines changed: 54 additions & 4 deletions
@@ -21,6 +21,8 @@ Numeric Literals
2121

2222
The Basilisp reader reads a wide range of numeric literals.
2323

24+
.. _integer_numbers:
25+
2426
Integers
2527
^^^^^^^^
2628

@@ -35,11 +37,22 @@ Integers
3537
basilisp.user=> (python/type 1N)
3638
<class 'int'>
3739

38-
Integers are represented using numeric ``0-9`` and may be prefixed with any number of negative signs ``-``.
39-
The resulting integer will have the correct sign after resolving all of the supplied ``-`` signs.
40+
Integers are represented using numeric ``0-9`` and may be prefixed with a single negative sign ``-``.
4041
For interoperability support with Clojure, Basilisp integers may also be declared with the ``N`` suffix, like ``1N``.
4142
In Clojure, this syntax signals a ``BigInteger``, but Python's default ``int`` type supports arbitrary precision by default so there is no difference between ``1`` and ``1N`` in Basilisp.
4243

44+
Integer literals may be specified in arbitrary bases between 2 and 36 by using the syntax ``[base]r[value]``.
45+
For example, in base 2 ``2r1001``, base 12 ``12r918a32``, and base 36 ``36r81jdk3kdp``.
46+
Arbitrary base literals do not distinguish between upper and lower case characters, so ``p`` and ``P`` are the same for bases which support ``P`` as a digit.
47+
Arbitrary base literals do not support the ``N`` suffix because ``N`` is a valid digit for some bases.
48+
49+
For common bases such as octal and hex, there is a custom syntax.
50+
Octal literals can be specified with a ``0`` prefix; for example, the octal literal ``0777`` corresponds to the base 10 integer 511.
51+
Hex literals can be specified with a ``0x`` prefix; for example, the hex literal ``0xFACE`` corresponds to the base 10 integer 64206.
52+
Both octal and hex literals support the ``N`` suffix and it is treated the same as with base 10 integers.
53+
54+
.. _floating_point_numbers:
55+
4356
Floating Point
4457
^^^^^^^^^^^^^^
4558

@@ -55,11 +68,30 @@ Floating Point
5568
<class 'decimal.Decimal'>
5669

5770
Floating point values are represented using ``0-9`` and a trailing decimal value, separated by a ``.`` character.
58-
Like integers, floating point values may be prefixed with an arbitrary number of negative signs ``-`` and the final read value will have the correct sign after resolving the negations.
71+
Like integers, floating point values may be prefixed with a single negative sign ``-``.
5972
By default floating point values are represented by Python's ``float`` type, which does **not** support arbitrary precision by default.
6073
Like in Clojure, floating point literals may be specified with a single ``M`` suffix to specify an arbitrary-precision floating point value.
6174
In Basilisp, a floating point number declared with a trailing ``M`` will return Python's `Decimal <https://docs.python.org/3/library/decimal.html>`_ type, which supports arbitrary floating point arithmetic.
6275

76+
.. _scientific_notation:
77+
78+
Scientific Notation
79+
^^^^^^^^^^^^^^^^^^^
80+
81+
::
82+
83+
basilisp.user=> 2e6
84+
2000000
85+
basilisp.user=> 3.14e-1
86+
0.31400000000000006
87+
88+
Basilisp supports scientific notation using the ``e`` syntax common to many programming languages.
89+
The significand (the number to the left of the ``e`` ) may be an integer or floating point and may be prefixed with a single negative sign ``-``.
90+
The exponent (the number to the right of the ``e`` ) must be an integer and may be prefixed with a single negative sign ``-``.
91+
The resulting value will be an integer when the significand is an integer and the exponent is non-negative; otherwise it will be a float.
92+
93+
.. _complex_numbers:
94+
6395
Complex
6496
^^^^^^^
6597

@@ -76,7 +108,24 @@ Complex
76108

77109
Basilisp includes support for complex literals to match the Python VM that hosts it.
78110
Complex literals may be specified as integer or floating point values with a ``J`` suffix.
79-
Like integers and floats, complex values may be prefixed with an arbitrary number of negative signs ``-`` and the final read value will have the correct sign after resolving the negations.
111+
Like integers and floats, complex values may be prefixed with a single negative sign ``-``.
112+
113+
.. _ratios:
114+
115+
Ratios
116+
^^^^^^
117+
118+
::
119+
120+
basilisp.user=> 22/7
121+
22/7
122+
basilisp.user=> -3/8
123+
-3/8
124+
125+
Basilisp includes support for ratios.
126+
Ratios are represented as the division of 2 integers which cannot be reduced to an integer.
127+
As with integers and floats, the numerator of a ratio may be prefixed with a single negative sign ``-`` -- a negative sign may not appear in the denominator.
128+
In Basilisp, ratios are backed by Python's `Fraction <https://docs.python.org/3/library/fractions.html>`_ type, which is highly interoperable with other Python numeric types.
80129

81130
.. _strings:
82131

@@ -101,6 +150,7 @@ String literals are always read with the UTF-8 encoding.
101150
String literals may contain the following escape sequences: ``\\``, ``\a``, ``\b``, ``\f``, ``\n``, ``\r``, ``\t``, ``\v``.
102151
Their meanings match the equivalent escape sequences supported in `Python string literals <https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals>`_\.
103152

153+
.. _byte_strings:
104154
105155
Byte Strings
106156
------------
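
The integer forms documented in this hunk can be checked outside the reader. The following is a minimal sketch, not the reader's actual code path: the four patterns are copied from the `src/basilisp/lang/reader.py` diff in this commit, while `read_int_literal` is a hypothetical helper that raises `ValueError` where the reader would raise a syntax error through its `ReaderContext`. It covers only the integer branches.

```python
import re

# Patterns copied from the diff to src/basilisp/lang/reader.py in this commit.
integer_literal = re.compile(r"(-?(?:\d|[1-9]\d+))N?")
octal_literal = re.compile(r"-?0([0-7]+)N?")
hex_literal = re.compile(r"-?0[Xx]([0-9A-Fa-f]+)N?")
arbitrary_base_literal = re.compile(r"-?(\d{1,2})r([0-9A-Za-z]+)")


def read_int_literal(s: str) -> int:
    """Hypothetical helper: parse one integer literal like the new branches do."""
    neg = s.startswith("-")
    if (m := integer_literal.fullmatch(s)) is not None:
        return int(m.group(1))  # the sign is captured inside group 1 here
    if (m := octal_literal.fullmatch(s)) is not None:
        v = int(m.group(1), base=8)
    elif (m := hex_literal.fullmatch(s)) is not None:
        v = int(m.group(1), base=16)
    elif (m := arbitrary_base_literal.fullmatch(s)) is not None:
        base = int(m.group(1))
        if not 2 <= base <= 36:
            raise ValueError(f"invalid base {base} in {s!r}")
        v = int(m.group(2), base=base)  # raises ValueError on a bad digit
    else:
        raise ValueError(f"invalid number format: {s!r}")
    return -v if neg else v


for text in ("0777", "0xFACE", "2r1001", "-36rZ", "1N"):
    print(text, "->", read_int_literal(text))
```

Because the base 10 pattern only accepts a lone digit or a number starting with `1-9`, a literal with a leading zero such as `0777` does not match it and falls through to the octal pattern.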

src/basilisp/contrib/sphinx/autodoc.py

Lines changed: 4 additions & 2 deletions
@@ -147,7 +147,8 @@ def filter_members(
147147
self, members: List[ObjectMember], want_all: bool
148148
) -> List[Tuple[str, Any, bool]]:
149149
filtered = []
150-
for name, val in members:
150+
for member in members:
151+
name, val = member.__name__, member.object
151152
assert isinstance(val, runtime.Var)
152153
if self.options.exclude_members and name in self.options.exclude_members:
153154
continue
@@ -384,7 +385,8 @@ def filter_members(
384385
self, members: List[ObjectMember], want_all: bool
385386
) -> List[Tuple[str, Any, bool]]:
386387
filtered = []
387-
for name, val in members:
388+
for member in members:
389+
name, val = member.__name__, member.object
388390
assert isinstance(val, runtime.Var)
389391
if val.meta is not None:
390392
if val.meta.val_at(_PRIVATE_KW):

src/basilisp/edn.lpy

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@
8888
basilisp.lang.reader/ns-name-chars)
8989

9090
(def ^:private num-chars
91-
basilisp.lang.reader/num-chars)
91+
#"[0-9]")
9292

9393
(def ^:private unicode-char
9494
basilisp.lang.reader/unicode-char)

src/basilisp/lang/reader.py

Lines changed: 58 additions & 61 deletions
@@ -67,7 +67,15 @@
6767
ns_name_chars = re.compile(r"\w|-|\+|\*|\?|/|\=|\\|!|&|%|>|<|\$|\.")
6868
alphanumeric_chars = re.compile(r"\w")
6969
begin_num_chars = re.compile(r"[0-9\-]")
70-
num_chars = re.compile("[0-9]")
70+
maybe_num_chars = re.compile(r"[0-9A-Za-z/\.]")
71+
integer_literal = re.compile(r"(-?(?:\d|[1-9]\d+))N?")
72+
float_literal = re.compile(r"(-?(?:\d|[1-9]\d+)(?:\.\d*)?)M?")
73+
complex_literal = re.compile(r"-?(\d+(?:\.\d*)?)J")
74+
arbitrary_base_literal = re.compile(r"-?(\d{1,2})r([0-9A-Za-z]+)")
75+
octal_literal = re.compile("-?0([0-7]+)N?")
76+
hex_literal = re.compile("-?0[Xx]([0-9A-Fa-f]+)N?")
77+
ratio_literal = re.compile(r"(-?\d+)/(\d+)")
78+
scientific_notation_literal = re.compile(r"-?(\d+(?:\.\d*)?)[Ee](-?\d+)")
7179
whitespace_chars = re.compile(r"[\s,]")
7280
newline_chars = re.compile("(\r\n|\r|\n)")
7381
fn_macro_args = re.compile("(%)(&|[0-9])?")
@@ -746,18 +754,13 @@ def _read_namespaced_map(ctx: ReaderContext) -> lmap.PersistentMap:
746754
MaybeNumber = Union[complex, decimal.Decimal, float, Fraction, int, MaybeSymbol]
747755

748756

749-
def _read_num( # noqa: C901 # pylint: disable=too-many-statements
757+
def _read_num( # noqa: C901 # pylint: disable=too-many-locals,too-many-statements
750758
ctx: ReaderContext,
751759
) -> MaybeNumber:
752760
"""Return a numeric (complex, Decimal, float, int, Fraction) from the input stream."""
753761
chars: List[str] = []
754762
reader = ctx.reader
755763

756-
is_complex = False
757-
is_decimal = False
758-
is_float = False
759-
is_integer = False
760-
is_ratio = False
761764
while True:
762765
token = reader.peek()
763766
if token == "-":
@@ -774,68 +777,62 @@ def _read_num( # noqa: C901 # pylint: disable=too-many-statements
774777
return _read_sym(ctx)
775778
chars.append(token)
776779
continue
777-
elif token == ".":
778-
if is_float:
779-
raise ctx.syntax_error(
780-
"Found extra '.' in float; expected decimal portion"
781-
)
782-
is_float = True
783-
elif token == "J":
784-
if is_complex:
785-
raise ctx.syntax_error("Found extra 'J' suffix in complex literal")
786-
is_complex = True
787-
elif token == "M":
788-
if is_decimal:
789-
raise ctx.syntax_error("Found extra 'M' suffix in decimal literal")
790-
is_decimal = True
791-
elif token == "N":
792-
if is_integer:
793-
raise ctx.syntax_error("Found extra 'N' suffix in integer literal")
794-
is_integer = True
795-
elif token == "/":
796-
if is_ratio:
797-
raise ctx.syntax_error("Found extra '/' in ratio literal")
798-
is_ratio = True
799-
elif not num_chars.match(token):
780+
elif not maybe_num_chars.match(token):
800781
break
801782
reader.next_token()
802783
chars.append(token)
803784

804-
assert len(chars) > 0, "Must have at least one digit in integer or float"
785+
assert len(chars) > 0, "Must have at least one digit in number"
805786

806787
s = "".join(chars)
807-
if (
808-
sum(
809-
[
810-
is_complex and is_decimal,
811-
is_complex and is_integer,
812-
is_complex and is_ratio,
813-
is_decimal or is_float,
814-
is_integer,
815-
is_ratio,
816-
]
817-
)
818-
> 1
819-
):
820-
raise ctx.syntax_error(f"Invalid number format: {s}")
788+
neg = s.startswith("-")
821789

822-
if is_complex:
823-
imaginary = float(s[:-1]) if is_float else int(s[:-1])
824-
return complex(0, imaginary)
825-
elif is_decimal:
790+
if (match := integer_literal.fullmatch(s)) is not None:
791+
return int(match.group(1))
792+
elif (match := float_literal.fullmatch(s)) is not None:
793+
if s.endswith("M"):
794+
try:
795+
return decimal.Decimal(match.group(1))
796+
except decimal.InvalidOperation: # pragma: no cover
797+
raise ctx.syntax_error(f"Invalid number format: {s}") from None
798+
else:
799+
return float(match.group(1))
800+
elif (match := octal_literal.fullmatch(s)) is not None:
801+
v = int(match.group(1), base=8)
802+
return -v if neg else v
803+
elif (match := hex_literal.fullmatch(s)) is not None:
804+
v = int(match.group(1), base=16)
805+
return -v if neg else v
806+
elif (match := ratio_literal.fullmatch(s)) is not None:
807+
num, denominator = match.groups()
808+
if (numerator := int(num)) == 0:
809+
return 0
826810
try:
827-
return decimal.Decimal(s[:-1])
828-
except decimal.InvalidOperation:
829-
raise ctx.syntax_error(f"Invalid number format: {s}") from None
830-
elif is_float:
831-
return float(s)
832-
elif is_ratio:
833-
assert "/" in s, "Ratio must contain one '/' character"
834-
num, denominator = s.split("/")
835-
return Fraction(numerator=int(num), denominator=int(denominator))
836-
elif is_integer:
837-
return int(s[:-1])
838-
return int(s)
811+
return Fraction(numerator=numerator, denominator=int(denominator))
812+
except ZeroDivisionError as e:
813+
raise ctx.syntax_error(f"Invalid ratio format: {s}") from e
814+
elif (match := scientific_notation_literal.fullmatch(s)) is not None:
815+
sig = float(m) if "." in (m := match.group(1)) else int(m)
816+
exp = int(match.group(2))
817+
res = sig * (10**exp)
818+
return -res if neg else res
819+
elif (match := arbitrary_base_literal.fullmatch(s)) is not None:
820+
base = int(match.group(1))
821+
if not 2 <= base <= 36:
822+
raise ctx.syntax_error(
823+
f"Invalid base {base} for integer literal {s}: must be between 2 and 36"
824+
)
825+
try:
826+
v = int(match.group(2), base=base)
827+
except ValueError as e:
828+
raise ctx.syntax_error(f"Invalid number format: {s}") from e
829+
else:
830+
return -v if neg else v
831+
elif (match := complex_literal.fullmatch(s)) is not None:
832+
imaginary_raw = match.group(1)
833+
imaginary = float(imaginary_raw) if "." in imaginary_raw else int(imaginary_raw)
834+
return complex(0, -imaginary if neg else imaginary)
835+
raise ctx.syntax_error(f"Invalid number format: {s}")
839836

840837

841838
_STR_ESCAPE_CHARS = {
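
The scientific-notation branch above can be exercised in isolation to see which Python type it returns. This is a sketch under stated assumptions: the pattern is copied from this diff, but `read_scientific` is a hypothetical stand-in for that single branch of `_read_num` and raises `ValueError` instead of a reader syntax error.

```python
import re
from typing import Union

# Pattern copied from the diff to src/basilisp/lang/reader.py in this commit.
scientific_notation_literal = re.compile(r"-?(\d+(?:\.\d*)?)[Ee](-?\d+)")


def read_scientific(s: str) -> Union[int, float]:
    """Hypothetical helper mirroring the scientific-notation branch of _read_num."""
    match = scientific_notation_literal.fullmatch(s)
    if match is None:
        raise ValueError(f"invalid number format: {s!r}")
    raw = match.group(1)
    sig = float(raw) if "." in raw else int(raw)
    exp = int(match.group(2))
    res = sig * (10**exp)  # 10**exp is a float whenever exp is negative
    return -res if s.startswith("-") else res


for text in ("2e6", "-2e3", "2e-1", "3.14e-1"):
    value = read_scientific(text)
    print(f"{text!r} -> {value!r} ({type(value).__name__})")
```

The result type depends on both parts of the literal: an integer significand with a non-negative exponent stays an `int`, while a negative exponent makes `10**exp` a float, so a literal such as `2e-1` reads as a float even though its significand is an integer.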
