Commit 718e1f3

EmilyBourne, Farouk-Echaref, yguclu
authored
Improve basic datatypes (pyccel#1756)
Rewrites the datatyping system and fixes pyccel#1729. As described in the issue, the existing superclass `DataType` is replaced with 2 superclasses representing slightly different concepts:
- `PrimitiveType` : Representing the category of datatype (integer/floating point/etc)
- `PyccelType` : Representing the actual type of the object in Python

Subclasses of `PyccelType` generally fall into one of two categories:
- `FixedSizeType`
- `ContainerType`

These types are described in `developer_docs/type_inference.md` (it is recommended to read this file first before reviewing this pull request).

The `precision` is removed from `ast.basic.TypedAstNode` and is stored in the `FixedSizeType`s where it is relevant. This allows us to remove the necessity for negative precision values as defaults, as the types can now be easily differentiated using the different `FixedSizeType`s.

The attribute `_dtype` is removed from `ast.basic.TypedAstNode` instances. It is still accessible via the `dtype` property but is extracted from the class type. This ensures that the two do not deviate from one another.

The NumPy type deduction is significantly improved by two changes. Firstly, an assertion is added to `NumpyNDArrayType` to ensure that the internal datatypes are NumPy types. This uncovered a few bugs where the native Python types were used in our current code, so the wrong type would be returned if an element of one of these objects was returned. Secondly, `np.result_type` is now used to deduce the result type of NumPy functions. This function is implemented by NumPy, which guarantees that it handles all corner cases and matches the expected behaviour of NumPy. Fixes pyccel#1763.

The `CustomDataType` is simplified slightly to remove the unused parameters.

The printing of floats/complexes in Fortran is improved to stop printing decimal digits beyond the precision of the number. Additionally, printing the return value of functions returning multiple results of different types is fixed.

---------

Co-authored-by: Farouk-Echaref <[email protected]>
Co-authored-by: Yaman Güçlü <[email protected]>
1 parent 6c5b537 commit 718e1f3
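
The commit message says that `np.result_type` is now used to deduce the result type of NumPy functions. As a rough illustration of why that helps, the sketch below calls NumPy's own `np.result_type` on a few pairs of types whose promoted type is not obvious. The `pairs` dictionary and its labels are made up for this example and are not taken from the Pyccel sources.

```python
import numpy as np

# Pairs of input types (illustrative choices, not taken from Pyccel's tests).
pairs = {
    "int32 + float32": (np.int32, np.float32),
    "int8 + uint8": (np.int8, np.uint8),
    "int64 + uint64": (np.int64, np.uint64),
    "float32 + complex64": (np.float32, np.complex64),
}

# np.result_type applies NumPy's promotion rules and returns a np.dtype.
promoted = {label: np.result_type(a, b) for label, (a, b) in pairs.items()}

for label, dtype in promoted.items():
    print(f"{label:>20} -> {dtype}")
```

None of these results is simply "the larger of the two inputs", which is the kind of corner case that a hand-written promotion table can easily get wrong.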

55 files changed

+3475
-2769
lines changed

.dict_custom.txt

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,8 @@ Pythran
 numba
 NumPy
 NumPy's
+CuPy
+CuPy's
 BLAS
 LAPACK
 MPI
@@ -108,3 +110,6 @@ subclasses
 oneAPI
 getter
 setter
+bitwise
+datatyping
+datatypes

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -13,15 +13,29 @@ All notable changes to this project will be documented in this file.
 ### Fixed

 - #1720 : Fix Undefined Variable error when the function definition is after the variable declaration.
+- #1763 : Use `np.result_type` to avoid mistakes in non-trivial NumPy type promotion rules.
+- Fix some cases where a Python built-in type is returned in place of a NumPy type.
+- Stop printing numbers with more decimal digits than their precision.
+- Allow printing the result of a function returning multiple objects of different types.

 ### Changed
 - #1720 : functions with the `@inline` decorator are no longer exposed to Python in the shared library.
 - #1720 : Error raised when incompatible arguments are passed to an `inlined` function is now fatal.
 - \[INTERNALS\] `FunctionDef` is annotated when it is called, or at the end of the `CodeBlock` if it is never called.
 - \[INTERNALS\] `InlinedFunctionDef` is only annotated if it is called.
+- \[INTERNALS\] Build `utilities.metaclasses.ArgumentSingleton` on the fly to ensure correct docstrings.
+- \[INTERNALS\] Rewrite datatyping system. See #1722.
+- \[INTERNALS\] Moved precision from `ast.basic.TypedAstNode` to an internal property of `ast.datatypes.FixedSizeNumericType` objects.
+- \[INTERNALS\] Use cached `__add__` method to determine result type of arithmetic operations.
+- \[INTERNALS\] Use cached `__and__` method to determine result type of bitwise comparison operations.

 ### Deprecated

+- \[INTERNALS\] Remove property `ast.basic.TypedAstNode.precision`.
+- \[INTERNALS\] Remove class `ast.datatypes.DataType` (replaced by `ast.datatypes.PrimitiveType` and `ast.datatypes.PyccelType`).
+- \[INTERNALS\] Remove unused properties `prefix` and `alias` from `CustomDataType`.
+- \[INTERNALS\] Remove `ast.basic.TypedAstNode._dtype`. The datatype can still be accessed as it is contained within the class type.
+
 ## \[1.11.2\] - 2024-03-05

 ### Added
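
The changelog entry about no longer printing more decimal digits than a number's precision can be motivated with a small standard-library sketch. It is not Pyccel's Fortran printer. It only shows that the digits beyond roughly the 7th significant digit of a single-precision value carry no information. The helper `to_float32` is a hypothetical name introduced for this example.

```python
import struct
import sys


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


third32 = to_float32(1 / 3)

# Printing with 17 significant digits exposes the rounding noise of float32.
noisy = f"{third32:.17g}"
# Printing with 7 significant digits keeps only the meaningful ones.
clean = f"{third32:.7g}"

print("17 digits:", noisy)
print(" 7 digits:", clean)
# For double precision, the number of reliable decimal digits is reported by:
print("double precision digits:", sys.float_info.dig)
```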

bandit.yml

Lines changed: 2 additions & 0 deletions
@@ -10,3 +10,5 @@ skips:
 - B403 # Ignore warnings about import pickle
 - B301 # Ignore warnings about pickle.load
 - B303 # Ignore warnings about MD2, MD4, MD5, or SHA1 hash functions
+
+exclude_dirs: ['pyccel/utilities/metaclasses.py']

developer_docs/ast_nodes.md

Lines changed: 22 additions & 10 deletions
@@ -10,33 +10,33 @@ The inheritance tree for a Python AST node is often more complicated than direct

 The class `TypedAstNode` is a super class. This class should never be used directly but provides functionalities which are common to certain AST objects. These AST nodes are those which describe objects which take up space in memory in a running program. For example a Variable requires space in memory, as does the result of a function call or an arithmetic operation, however a loop or a module does not require runtime memory to store the concept. Objects which require memory must therefore contain all information necessary to declare them in the generated code. A `TypedAstNode` therefore exposes the following properties:
 - `dtype`
-- `precision`
 - `rank`
 - `shape`
 - `order`
 - `class_type`

 The contents of these types are explained in more detail below.

-When examining the class `TypedAstNode` you may notice that there are two methods for getting each of these properties. Each time, one is a standard method, while the other is a static class method. In general in the code you will always use the normal method. This will return the instance attribute if it is available, otherwise it will return the static class attribute. The static class method is used for type deductions when [parsing type annotations](./type_inference.md). In type annotations we do not generally have an instance of a class, however we can get access to the class itself. For instance let us consider the following type annotation:
+The class `TypedAstNode` also contains static class methods. These static class methods are used for type deductions when [parsing type annotations](./type_inference.md). In type annotations we do not generally have an instance of a class, however we can get access to the class itself. The available class methods are:
+- `static_type`
+- `static_rank`
+- `static_order`
+
+For instance let us consider the following type annotation:
 ```python
 a : int
 ```
-When we visit `int` in the [semantic stage](./semantic_stage.md) the `SemanticParser` will return the class `PythonInt`. This is usually used as a function (e.g to cast a variable), however here we use it to deduce the type. Following the [development conventions](./development_conventions.md#Class-variables-vs.-Instance-variables) any attributes which will remain constant over all instances of a class should be stored in static class attributes. This means that they can be accessed via these static methods. Returning to our example, a call to the function `int` always returns a scalar object with an integer type and default precision. This means that all the properties of a `TypedAstNode` can be defined without having an instance of this class. These properties cannot be defined statically for all nodes (e.g. it would not be possible for `PyccelAdd`), however generally they can be defined for the nodes which can be used in type annotations.
+When we visit `int` in the [semantic stage](./semantic_stage.md) the `SemanticParser` will return the class `PythonInt`. This is usually used as a function (e.g. to cast a variable), however here we use it to deduce the type. Following the [development conventions](./development_conventions.md#Class-variables-vs.-Instance-variables) any attributes which will remain constant over all instances of a class should be stored in static class attributes. This means that they can be accessed via these static methods. Returning to our example, a call to the function `int` always returns a scalar object with the built-in `int` type. This means that all the properties of a `TypedAstNode` can be defined without having an instance of this class. These properties cannot be defined statically for all nodes (e.g. it would not be possible for `PyccelAdd`), however generally they can be defined for the nodes which can be used in type annotations.

 ### Class type

-The class type is the type reported by Python when you call the built-in function `type`. The object stored in this attribute should inherit from `pyccel.ast.datatypes.DataType`.
+The `class_type` property represents the type reported by Python when you call the built-in function `type`. This property should return an object which inherits from `pyccel.ast.datatypes.PyccelType`.

 ### Datatype

-Some types in Python are containers which contain elements of other types. This is the case for NumPy arrays, tuples, lists, etc. In this case, the class type does not provide enough information to write the declaration in the low-level target language. Additionally a data type is required. The data type is the type of an element of the container, as for the class type, the object stored in this attribute should inherit from `pyccel.ast.datatypes.DataType`. If the class type is not a container then the class type and the data type will be the same.
-
-### Precision
-
-The precision indicates the precision of the datatype. This number is related to the number of bytes that the datatype takes up in memory (e.g. `float64` has precision = 8 as it takes up 8 bytes, `complex128` has precision = 8 as it is comprised of two `float64` objects). The precision is equivalent to the `kind` parameter in Fortran.
+Some types in Python are containers which contain elements of other types. This is the case for NumPy arrays, tuples, lists, etc. In this case, the class type does not provide enough information to write the declaration in the low-level target language. Additionally a data type is required. The data type is the type of an element of the container and can be accessed via the `dtype` property. As for the class type, the object returned by this property should inherit from `pyccel.ast.datatypes.PyccelType`. If the class type is not a container then the class type and the data type will be the same, otherwise the class type will inherit from `pyccel.ast.datatypes.ContainerType` and the data type will inherit from `pyccel.ast.datatypes.FixedSizeType`.

-In Python the precision of some types depends on the system where the code is run. This is notably the case for integers which have a precision of 4 on Windows but a precision of 8 on Linux and MacOS. This is the case for native types. In order to differentiate these types from the fixed-precision objects provided by NumPy, the precision -1 is used to denote the default precision.
+A `FixedSizeType` represents a built-in scalar datatype which can be represented in memory (e.g. `int32`, `int64`). It is characterised by a primitive type which describes the category of datatype (integer, floating point, etc) and a precision. The precision is related to the number of bytes that the datatype takes up in memory (e.g. `float64` has precision = 8 as it takes up 8 bytes, `complex128` has precision = 8 as it is comprised of two `float64` objects). The precision is equivalent to the `kind` parameter in Fortran.

 ### Rank

@@ -50,6 +50,18 @@ The shape of an array indicates the number of elements in each dimension of the

 The order indicates how an array is laid out in memory. This can either be row-major (C-style) ordering or column-major (Fortran-style) ordering. For more information about this, please see the [dedicated documentation](./order_docs.md).

+### Static type
+
+The static type is the class type that would be assigned to an object created using an instance of this class as a type annotation.
+
+### Static rank
+
+The static rank is the rank that would be assigned to an object created using an instance of this class as a type annotation.
+
+### Static order
+
+The static order is the order that would be assigned to an object created using an instance of this class as a type annotation.
+
 ## Pyccel Internal Function

 The class `pyccel.ast.internals.PyccelInternalFunction` is a super class. This class should never be used directly but provides functionalities which are common to certain AST objects. These AST nodes are those which describe functions which are supported by Pyccel. For example it is used for functions from the `math` library, the `cmath` library, the `numpy` library, etc. `PyccelInternalFunction` inherits from `TypedAstNode`. The type information for the sub-class describes the type of the result of the function.
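
The `static_type`, `static_rank` and `static_order` class methods described in this file can be pictured with a toy model. The sketch below is not Pyccel code. `TypedNodeSketch`, `IntCastSketch` and `info_from_annotation` are hypothetical names, and the string `"int"` stands in for a real type object. It only shows how type information stored on a class can be read without ever creating an instance, which is the situation when an annotation such as `a : int` is visited.

```python
class TypedNodeSketch:
    """Hypothetical base class: type information lives in class attributes."""
    _static_type = None
    _static_rank = None
    _static_order = None

    @classmethod
    def static_type(cls):
        return cls._static_type

    @classmethod
    def static_rank(cls):
        return cls._static_rank

    @classmethod
    def static_order(cls):
        return cls._static_order


class IntCastSketch(TypedNodeSketch):
    """Hypothetical node for a call to int(...): always a scalar integer."""
    _static_type = "int"   # placeholder for a type object
    _static_rank = 0       # scalars have rank 0
    _static_order = None   # ordering is only meaningful for multi-dimensional arrays


def info_from_annotation(node_class):
    """Collect declaration info from the class itself; no instance is needed."""
    return {
        "class_type": node_class.static_type(),
        "rank": node_class.static_rank(),
        "order": node_class.static_order(),
    }


info = info_from_annotation(IntCastSketch)
print(info)
```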

developer_docs/type_inference.md

Lines changed: 47 additions & 0 deletions
@@ -43,3 +43,50 @@ These objects contain all the information necessary to create a Variable from a
 ### Inferring types from assignments

 When assignments occur in the code types must also be inferred. This allows new variables to be declared implicitly, and also enables us to verify that the types of existing variables do not change. In this case the type inference is done via the AST nodes. Each node contains the logic necessary to deduce its type information from the arguments passed to it. The resulting object, which is found on the right hand side of an assignment can be used to verify or define the type of the object on the left hand side of the assignment.
+
+### Data types in Pyccel
+
+The data types in Pyccel are designed around two super classes:
+- `PrimitiveType` : Representing the category of datatype (integer/floating point/etc)
+- `PyccelType` : Representing the actual type of the object in Python
+
+Subclasses of `PyccelType` generally fall into one of two categories:
+- `FixedSizeType`
+- `ContainerType`
+
+The types can be compared using either the `is` operator or the `==` operator. These operators have slightly different behaviour. All instances of `PyccelType` are singletons so the `is` operator tests if the types are identical. However the `==` operator tests if the types are compatible. For example `PythonNativeFloat() == NumpyFloat64Type()` will return true. This operator should therefore be used when permissive behaviour is required (e.g. when adding elements to a list of `PythonNativeFloat()` we are capable of adding an instance with the type `NumpyFloat64Type()` even if this would not be strictly homogeneous in Python).
+
+#### Fixed Size Type
+A `FixedSizeType` is an object whose size in memory is known and cannot change from one instance to another (e.g. `float64`, `int32`, `void`). In most cases the developer will need the sub-class `FixedSizeNumericType` which refers to the subset of fixed size types which contain numeric values. These objects are characterised by a `primitive_type` describing the category of datatype (integer/floating point/etc) and a `precision`. They additionally implement two magic methods:
+- `__add__` (+)
+- `__and__` (&)
+
+The add operator describes what happens when two numeric types are combined in an arithmetic operator. In almost all cases this is sufficient to describe all resulting datatypes. Special cases (e.g. float for integer division) are handled in the associated operator in `ast.operators` or `ast.bitwise_operators`.
+
+The and operator describes what happens when two numeric types are combined in a bitwise comparison operator. This only applies to integers and booleans.
+
+When using these operators on an unknown number of arguments it can be useful to use `NativeGeneric()` as a starting point for the sum.
+
+#### Container Type
+A `ContainerType` is an object which is comprised of `FixedSizeType` objects (e.g. `ndarray`, `list`, `tuple`, custom class). The sub-class `HomogeneousContainerType` describes containers which contain homogeneous data. These objects are characterised by an `element_type`. The elements of a `HomogeneousContainerType` are instances of `PyccelType`, but they can be either `FixedSizeType`s or `ContainerType`s.
+
+`HomogeneousContainerType`s also contain some utility functions. They implement `primitive_type` and `precision` to get the properties of the internal `FixedSizeType` (even if that type is inside another `HomogeneousContainerType`). They also implement `switch_basic_type` which creates a new `HomogeneousContainerType` which is similar to the current `HomogeneousContainerType`. The only difference is that the `FixedSizeType` is exchanged. This is useful when we want to preserve information about the container but need to change the type. For example, when we divide an integer by another we get a floating point type. When we divide a NumPy array or a CuPy array of integers by an integer (or array of integers) we get a NumPy/CuPy array of floating point numbers (with default Python precision). In order to preserve the container type we therefore call `switch_basic_type`. So for the division in the case of NumPy arrays, we want to change the type from `np.ndarray[int]` to `np.ndarray[float]`. This is done in one line:
+```python
+new_class_type = class_type.switch_basic_type(PythonNativeFloat())
+```
+instead of the multiple lines that would be needed without this function. The advantage of this is seen most clearly when we consider a function acting on a more complex type, e.g. `list[np.ndarray[float]]`. In this case, without the `switch_basic_type` function, the equivalent code would be much longer:
+```python
+new_container_types = []
+old_type = class_type
+while isinstance(old_type, ContainerType):
+    new_container_types.append(type(old_type))
+    old_type = old_type.element_type
+
+new_type = PythonNativeFloat()
+for container in new_container_types[::-1]:
+    new_type = container(new_type)
+```
+
+The `switch_basic_type` function cannot be implemented generally in `PyccelType` as there is no logical interpretation for an inhomogeneous `ContainerType`, however the function is also implemented (as the identity function) for `FixedSizeType`s so `switch_basic_type` can be used without the need for type checks (generally inhomogeneous containers will not be valid arguments to classes which may need to use the `switch_basic_type` function).
+
+In order to access the internal `FixedSizeType`, `PyccelType` also implements a `datatype` property. This makes more sense in the case of a `HomogeneousContainerType` however it is also implemented (as the identity function) for `FixedSizeType`s so the low-level type can be obtained without the need for type checks.
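
The behaviour described in `type_inference.md` (singleton types, `is` versus `==`, `switch_basic_type` and `datatype`) can be mimicked with a small self-contained model. Every class name below ending in `Sketch` is hypothetical and is not part of Pyccel. The compatibility rule used for `==` (same category and precision) and the handling of a scalar in `switch_basic_type` (returning the new type) are assumptions made for the illustration.

```python
class SingletonMeta(type):
    """Return the same instance for the same class and the same argument objects."""
    _instances = {}

    def __call__(cls, *args):
        # Key on object identity so compatible-but-distinct types never share a slot.
        key = (cls, tuple(id(a) for a in args))
        if key not in SingletonMeta._instances:
            SingletonMeta._instances[key] = super().__call__(*args)
        return SingletonMeta._instances[key]


class FixedSizeSketch(metaclass=SingletonMeta):
    category = None
    precision = None

    def __eq__(self, other):
        # Permissive comparison (assumed rule): same category and precision.
        return (isinstance(other, FixedSizeSketch)
                and (self.category, self.precision) == (other.category, other.precision))

    def __hash__(self):
        return hash((self.category, self.precision))

    @property
    def datatype(self):
        return self  # a scalar type is its own low-level type

    def switch_basic_type(self, new_type):
        return new_type  # assumed scalar case: nothing to preserve


class PythonIntSketch(FixedSizeSketch):
    category, precision = "integer", 8


class PythonFloatSketch(FixedSizeSketch):
    category, precision = "float", 8


class NumpyFloat64Sketch(FixedSizeSketch):
    category, precision = "float", 8


class HomogeneousContainerSketch(metaclass=SingletonMeta):
    def __init__(self, element_type):
        self.element_type = element_type

    @property
    def datatype(self):
        return self.element_type.datatype  # recurse down to the scalar type

    def switch_basic_type(self, new_type):
        # Rebuild the same container around the switched element type.
        return type(self)(self.element_type.switch_basic_type(new_type))


class ListSketch(HomogeneousContainerSketch):
    pass


class ArraySketch(HomogeneousContainerSketch):
    pass


nested = ListSketch(ArraySketch(PythonIntSketch()))
switched = nested.switch_basic_type(PythonFloatSketch())

print(PythonFloatSketch() is NumpyFloat64Sketch())   # distinct singletons
print(PythonFloatSketch() == NumpyFloat64Sketch())   # but compatible
print(type(switched).__name__, type(switched.element_type).__name__,
      type(switched.datatype).__name__)
```

Because the recursion lives in the container classes, the caller never needs to unwrap and rebuild the nesting by hand, which is the point made by the longer loop in the documentation.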
