
Commit 1b7e717

[PEP 747] Recognize TypeForm[T] type and values (#9773) (#19596)
_(This PR replaces an earlier draft of the same feature: #18690)_

Feedback from @JukkaL integrated since the last PR, by commit title:

* Apply feedback: Change MAYBE_UNRECOGNIZED_STR_TYPEFORM from unaccompanied note to standalone error
* Apply feedback: Refactor extract save/restore of SemanticAnalyzer state to a new context manager
* Apply feedback: Suppress SyntaxWarnings when parsing strings as types at the _most-targeted_ location
* Apply feedback: Add TypeForm profiling counters to SemanticAnalyzer and the --dump-build-stats option
* Increase efficiency of quick rejection heuristic from 85.8% -> 99.6% in SemanticAnalyzer.try_parse_as_type_expression()
* Apply feedback: Recognize assignment to union of TypeForm with non-TypeForm
* Apply feedback: Alter primitives.pyi fixture rather than tuple.pyi and dict.pyi

Feedback NOT integrated, with rationale:

* ✖️ Add tests related to recursive types
  * Recursive cases are already well-covered by tests related to TypeType (is_type_form=False).
  * I _did_ find an infinite recursion bug affecting garden-variety `Type[...]`, which I can fix in a separate PR.
* ✖️ Define `TypeForm(...)` in value contexts as a regular function like `Callable[[TypeForm[T]], TypeForm[T]]` rather than as a special expression node (TypeFormExpr).
  * The special expression node allows mypy to print out _better error messages_ when a user puts an invalid type expression inside `TypeForm(...)`. See case 4 of testTypeFormExpression in check-typeform.test.

There is one commit unrelated to the core function of this PR that could be split to a separate PR:

* Allow TypeAlias and PlaceholderNode to be stringified/printed

Closes #9773

---

_(Most of the following description is copied from the original PR, **except for the text in bold**)_

Implements the [TypeForm PEP 747](https://peps.python.org/pep-0747/), as an opt-in feature enabled by the CLI flag `--enable-incomplete-feature=TypeForm`.
Implementation approach:

* The `TypeForm[T]` type is represented using the existing `TypeType` class, with an `is_type_form=True` constructor parameter. `Type[C]` continues to be represented using `TypeType`, but with `is_type_form=False` (the default).
* Recognizing a type expression literal such as `int | str` requires parsing an `Expression` as a type expression. Only the SemanticAnalyzer pass has the ability to parse **arbitrary** type expressions **(including stringified annotations)**, using `SemanticAnalyzer.expr_to_analyzed_type()`. **(I've extended the `TypeChecker` pass to parse all kinds of type expressions except stringified annotations, using the new `TypeCheckerAsSemanticAnalyzer` adapter.)**
* Therefore, during the SemanticAnalyzer pass, at certain syntactic locations (namely assignment r-values, callable arguments, and returned expressions), the analyzer tries to parse the `Expression` it is looking at using the new function `try_parse_as_type_expression()`, and stores the result (a `Type`) in the new attribute `{IndexExpr, OpExpr, StrExpr}.as_type`.
* During the later TypeChecker pass, when looking at an `Expression` to determine its type, if the expression is in a type context that expects some kind of `TypeForm[...]` and the expression was successfully parsed as a type expression by the earlier SemanticAnalyzer pass **(or can be parsed as a type expression immediately during the type checker pass)**, the expression is given the type `TypeForm[expr.as_type]` rather than a type inferred by the regular rules for value expressions.
* Key relationships between `TypeForm[T]`, `Type[C]`, and `object` types are defined in the visitors powering `is_subtype`, `join_types`, and `meet_types`.
* The `TypeForm(T)` expression is recognized as a `TypeFormExpr` and has the return type `TypeForm[T]`.
* The new test suite in `check-typeform.test` is a good reference for the expected behavior of operations that interact with `TypeForm` in some way.
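The two-pass approach can be illustrated with a toy model built on Python's standard `ast` module instead of mypy's own node classes. This is a sketch under stated assumptions, not mypy's implementation: `quick_reject` and `to_type` are hypothetical helpers, the resulting "type" is just a normalized string, and the rejection rules are illustrative rather than the heuristic that `try_parse_as_type_expression()` actually uses. The cheap `quick_reject` check runs first so that most value expressions never reach the more expensive full parse.

```python
from __future__ import annotations

import ast

# Toy assumption: only these node kinds can start a type expression.
_CANDIDATE_KINDS = (ast.Name, ast.Attribute, ast.Subscript, ast.BinOp, ast.Constant)


def quick_reject(node: ast.expr) -> bool:
    """Cheap pre-check: True if `node` certainly cannot be a type expression."""
    if not isinstance(node, _CANDIDATE_KINDS):
        return True  # calls, comparisons, lambdas, list displays, ...
    if isinstance(node, ast.BinOp) and not isinstance(node.op, ast.BitOr):
        return True  # `X | Y` is the only binary operator allowed in types
    if isinstance(node, ast.Constant):
        # Only None and strings (stringified annotations) can denote types.
        return not (node.value is None or isinstance(node.value, str))
    return False


def to_type(node: ast.expr) -> str | None:
    """Full, more expensive parse into a normalized string; None on failure."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = to_type(node.value)
        return None if base is None else f"{base}.{node.attr}"
    if isinstance(node, ast.Constant):
        if node.value is None:
            return "None"
        if isinstance(node.value, str):
            # Stringified annotation: parse the string's contents as an expression.
            try:
                inner = ast.parse(node.value, mode="eval").body
            except SyntaxError:
                return None
            return to_type(inner)
        return None
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        left, right = to_type(node.left), to_type(node.right)
        return None if left is None or right is None else f"{left} | {right}"
    if isinstance(node, ast.Subscript):
        base = to_type(node.value)
        elts = node.slice.elts if isinstance(node.slice, ast.Tuple) else [node.slice]
        args = [to_type(e) for e in elts]
        if base is None or None in args:
            return None
        return f"{base}[{', '.join(args)}]"
    return None


def try_parse_as_type_expression(source: str) -> str | None:
    """Return a type for `source` (the toy analogue of `as_type`), or None."""
    node = ast.parse(source, mode="eval").body
    if quick_reject(node):
        return None
    return to_type(node)
```

In this toy, `try_parse_as_type_expression("'str_or_none'")` also succeeds (as the bare name `str_or_none`), which shows why a purely syntactic pass cannot tell an ordinary string from a stringified annotation.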
Controversial parts of this PR, in @davidfstr's opinion:

* Type form literals **containing stringified annotations** are only recognized in certain syntactic locations (and not ALL possible locations). Namely, they are recognized as (1) assignment r-values, (2) callable expression arguments, and (3) returned expressions, but nowhere else. For example, they aren't recognized in expressions like `dict_with_typx_keys[int | str]`. **Attempting to use stringified annotations in other locations will emit a MAYBE_UNRECOGNIZED_STR_TYPEFORM error.**
* The existing `TypeType` class is now used to represent BOTH the `Type[T]` and `TypeForm[T]` types, rather than introducing a distinct subclass of `Type` to represent the `TypeForm[T]` type. This was done to simplify logic that manipulates both `Type[T]` and `TypeForm[T]` values, since they are both manipulated in very similar ways.
* The "normalized" form of `TypeForm[X | Y]`, as returned by `TypeType.make_normalized()`, is just `TypeForm[X | Y]` rather than `TypeForm[X] | TypeForm[Y]`, differing from the normalization behavior of `Type[X | Y]`.
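A deliberately tiny type model makes the normalization difference and the subtype relationships concrete. It is a hypothetical sketch, not mypy's `TypeType` or `is_subtype`: `Inst`, `UnionT`, and the constants are invented here, and the rules encode only the normalization behavior described above plus the PEP 747 relationships assumed for this sketch (`Type[C]` is a subtype of `TypeForm[C]`, `TypeForm` is covariant, and every `TypeForm` is a subtype of `object`).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """A class type such as `int`, with an optional single base class."""

    name: str
    base: Inst | None = None


@dataclass(frozen=True)
class UnionT:
    items: tuple[object, ...]


@dataclass(frozen=True)
class TypeType:
    item: object
    is_type_form: bool = False  # False: Type[item]; True: TypeForm[item]

    @staticmethod
    def make_normalized(item: object, *, is_type_form: bool = False) -> object:
        if isinstance(item, UnionT) and not is_type_form:
            # Type[X | Y] is distributed into Type[X] | Type[Y] ...
            return UnionT(tuple(TypeType(i) for i in item.items))
        # ... while TypeForm[X | Y] stays a single TypeForm.
        return TypeType(item, is_type_form=is_type_form)


OBJECT = Inst("object")
INT = Inst("int")
STR = Inst("str")
BOOL = Inst("bool", base=INT)


def is_subtype(left: object, right: object) -> bool:
    if right == OBJECT:
        return True  # every type, TypeForm[...] included, is a subtype of object
    if isinstance(left, UnionT):
        return all(is_subtype(i, right) for i in left.items)
    if isinstance(right, UnionT):
        return any(is_subtype(left, i) for i in right.items)
    if isinstance(left, Inst) and isinstance(right, Inst):
        cls: Inst | None = left
        while cls is not None:  # walk up the single-inheritance chain
            if cls == right:
                return True
            cls = cls.base
        return False
    if isinstance(left, TypeType) and isinstance(right, TypeType):
        if left.is_type_form and not right.is_type_form:
            return False  # a TypeForm value need not be a class object
        # Type[C] <: TypeForm[C], and both constructors are covariant.
        return is_subtype(left.item, right.item)
    return False
```

Keeping `TypeForm[int | str]` as one `TypeType` in this model means the union stays intact inside the form, whereas `Type[int | str]` becomes a union of two `TypeType` values.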
1 parent 2809328 commit 1b7e717

38 files changed: 1814 additions & 73 deletions

docs/source/error_code_list.rst

Lines changed: 59 additions & 0 deletions
@@ -1286,6 +1286,65 @@ type must be a subtype of the original type::
     def g(x: object) -> TypeIs[str]: # OK
         ...
 
+.. _code-maybe-unrecognized-str-typeform:
+
+String appears in a context which expects a TypeForm [maybe-unrecognized-str-typeform]
+--------------------------------------------------------------------------------------
+
+TypeForm literals may contain string annotations:
+
+.. code-block:: python
+
+    typx1: TypeForm = str | None
+    typx2: TypeForm = 'str | None' # OK
+    typx3: TypeForm = 'str' | None # OK
+
+However, TypeForm literals containing a string annotation can only be recognized
+by mypy in the following locations:
+
+.. code-block:: python
+
+    typx_var: TypeForm = 'str | None' # assignment r-value
+
+    def func(typx_param: TypeForm) -> TypeForm:
+        return 'str | None' # returned expression
+
+    func('str | None') # callable's argument
+
+If you try to use a string annotation in some other location
+which expects a TypeForm, the string value will always be treated as a ``str``
+even if a ``TypeForm`` would be more appropriate, and this error code
+will be generated:
+
+.. code-block:: python
+
+    # Error: TypeForm containing a string annotation cannot be recognized here. Surround with TypeForm(...) to recognize. [maybe-unrecognized-str-typeform]
+    # Error: List item 0 has incompatible type "str"; expected "TypeForm[Any]" [list-item]
+    list_of_typx: list[TypeForm] = ['str | None', float]
+
+Fix the error by surrounding the entire type with ``TypeForm(...)``:
+
+.. code-block:: python
+
+    list_of_typx: list[TypeForm] = [TypeForm('str | None'), float] # OK
+
+Similarly, this error code may be generated when an expression in a location which expects a TypeForm
+contains a string literal that is not a type annotation at all, such as a dictionary key:
+
+.. code-block:: python
+
+    dict_of_typx = {'str_or_none': TypeForm(str | None)}
+    # Error: TypeForm containing a string annotation cannot be recognized here. Surround with TypeForm(...) to recognize. [maybe-unrecognized-str-typeform]
+    list_of_typx: list[TypeForm] = [dict_of_typx['str_or_none']]
+
+Fix the error by adding ``# type: ignore[maybe-unrecognized-str-typeform]``
+to the line with the string literal:
+
+.. code-block:: python
+
+    dict_of_typx = {'str_or_none': TypeForm(str | None)}
+    list_of_typx: list[TypeForm] = [dict_of_typx['str_or_none']] # type: ignore[maybe-unrecognized-str-typeform]
+
 .. _code-misc:
 
 Miscellaneous checks [misc]
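Whether a stringified TypeForm is recognized depends only on where the string sits syntactically. A rough model of that rule, written with Python's standard `ast` module, is sketched below. `recognized_string_positions` is a hypothetical helper, not part of mypy, and it ignores whether the surrounding context actually expects a `TypeForm`.

```python
from __future__ import annotations

import ast


def recognized_string_positions(source: str) -> dict[str, bool]:
    """Map each string literal in `source` to whether it sits in one of the three
    syntactic positions listed in the documentation: assignment r-value, call
    argument, or returned expression."""
    tree = ast.parse(source)
    # ast nodes do not know their parents, so record them up front.
    parents: dict[ast.AST, ast.AST] = {}
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            parents[child] = parent

    results: dict[str, bool] = {}
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        # Climb through `|` so that `'str' | None` is judged as one expression.
        top: ast.AST = node
        while True:
            up = parents.get(top)
            if not (isinstance(up, ast.BinOp) and isinstance(up.op, ast.BitOr)):
                break
            top = up
        parent = parents.get(top)
        results[node.value] = (
            (isinstance(parent, (ast.Assign, ast.AnnAssign)) and parent.value is top)
            or (isinstance(parent, ast.Return) and parent.value is top)
            or (isinstance(parent, ast.Call) and any(arg is top for arg in parent.args))
        )
    return results
```

A string inside a list display or an index expression, such as `['str | None', float]` or `dict_of_typx['str_or_none']`, maps to `False` in this model, matching the cases the documentation flags with `maybe-unrecognized-str-typeform`.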

misc/analyze_typeform_stats.py

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+#!/usr/bin/env python3
+"""
+Analyze TypeForm parsing efficiency from mypy build stats.
+
+Usage:
+    python3 analyze_typeform_stats.py '<mypy_output_with_stats>'
+    python3 -m mypy --dump-build-stats file.py 2>&1 | python3 analyze_typeform_stats.py
+
+Example output:
+    TypeForm Expression Parsing Statistics:
+    ==================================================
+    Total calls to SA.try_parse_as_type_expression: 14,555
+    Quick rejections (no full parse): 14,255
+    Full parses attempted: 300
+     - Successful: 248
+     - Failed: 52
+
+    Efficiency Metrics:
+     - Quick rejection rate: 97.9%
+     - Full parse rate: 2.1%
+     - Full parse success rate: 82.7%
+     - Overall success rate: 1.7%
+
+    Performance Implications:
+     - Expensive failed full parses: 52 (0.4% of all calls)
+
+See also:
+    - mypy/semanal.py: SemanticAnalyzer.try_parse_as_type_expression()
+    - mypy/semanal.py: DEBUG_TYPE_EXPRESSION_FULL_PARSE_FAILURES
+"""
+
+import re
+import sys
+
+
+def analyze_stats(output: str) -> None:
+    """Parse mypy stats output and calculate TypeForm parsing efficiency."""
+
+    # Extract the three counters
+    total_match = re.search(r"type_expression_parse_count:\s*(\d+)", output)
+    success_match = re.search(r"type_expression_full_parse_success_count:\s*(\d+)", output)
+    failure_match = re.search(r"type_expression_full_parse_failure_count:\s*(\d+)", output)
+
+    if not (total_match and success_match and failure_match):
+        print("Error: Could not find all required counters in output")
+        return
+
+    total = int(total_match.group(1))
+    successes = int(success_match.group(1))
+    failures = int(failure_match.group(1))
+
+    full_parses = successes + failures
+
+    print("TypeForm Expression Parsing Statistics:")
+    print("=" * 50)
+    print(f"Total calls to SA.try_parse_as_type_expression: {total:,}")
+    print(f"Quick rejections (no full parse): {total - full_parses:,}")
+    print(f"Full parses attempted: {full_parses:,}")
+    print(f" - Successful: {successes:,}")
+    print(f" - Failed: {failures:,}")
+    if total > 0:
+        print()
+        print("Efficiency Metrics:")
+        print(f" - Quick rejection rate: {((total - full_parses) / total * 100):.1f}%")
+        print(f" - Full parse rate: {(full_parses / total * 100):.1f}%")
+        print(f" - Full parse success rate: {(successes / full_parses * 100):.1f}%")
+        print(f" - Overall success rate: {(successes / total * 100):.1f}%")
+        print()
+        print("Performance Implications:")
+        print(
+            f" - Expensive failed full parses: {failures:,} ({(failures / total * 100):.1f}% of all calls)"
+        )
+
+
+if __name__ == "__main__":
+    if len(sys.argv) == 1:
+        # Read from stdin
+        output = sys.stdin.read()
+    elif len(sys.argv) == 2:
+        # Read from command line argument
+        output = sys.argv[1]
+    else:
+        print("Usage: python3 analyze_typeform_stats.py [mypy_output_with_stats]")
+        print("Examples:")
+        print(
+            " python3 -m mypy --dump-build-stats file.py 2>&1 | python3 analyze_typeform_stats.py"
+        )
+        print(" python3 analyze_typeform_stats.py 'output_string'")
+        sys.exit(1)
+
+    analyze_stats(output)
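As written, `analyze_stats` computes `successes / full_parses` whenever `total > 0`, so a run in which every call was quickly rejected would raise `ZeroDivisionError`. A variant that returns the metrics as data and guards every ratio is sketched below; `typeform_metrics` is a hypothetical name, and it assumes the same `name: value` counter format that `analyze_stats` matches with its regular expressions.

```python
from __future__ import annotations

import re

COUNTERS = (
    "type_expression_parse_count",
    "type_expression_full_parse_success_count",
    "type_expression_full_parse_failure_count",
)


def typeform_metrics(output: str) -> dict[str, float] | None:
    """Return the efficiency percentages, or None if any counter is missing."""
    values: dict[str, int] = {}
    for name in COUNTERS:
        match = re.search(rf"{name}:\s*(\d+)", output)
        if match is None:
            return None
        values[name] = int(match.group(1))

    total = values["type_expression_parse_count"]
    successes = values["type_expression_full_parse_success_count"]
    failures = values["type_expression_full_parse_failure_count"]
    full_parses = successes + failures

    def pct(numerator: int, denominator: int) -> float:
        # Every ratio is guarded: a run may have zero calls or zero full parses.
        return numerator / denominator * 100 if denominator else 0.0

    return {
        "quick_rejection_rate": pct(total - full_parses, total),
        "full_parse_rate": pct(full_parses, total),
        "full_parse_success_rate": pct(successes, full_parses),
        "overall_success_rate": pct(successes, total),
        "failed_full_parse_rate": pct(failures, total),
    }
```

Returning a dictionary instead of printing keeps the arithmetic testable on its own; formatting for display is left to the caller.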

mypy/checker.py

Lines changed: 112 additions & 2 deletions
@@ -155,6 +155,7 @@
 from mypy.scope import Scope
 from mypy.semanal import is_trivial_body, refers_to_fullname, set_callable_name
 from mypy.semanal_enum import ENUM_BASES, ENUM_SPECIAL_PROPS
+from mypy.semanal_shared import SemanticAnalyzerCoreInterface
 from mypy.sharedparse import BINARY_MAGIC_METHODS
 from mypy.state import state
 from mypy.subtypes import (
@@ -346,6 +347,8 @@ class TypeChecker(NodeVisitor[None], TypeCheckerSharedApi):
 
     tscope: Scope
     scope: CheckerScope
+    # Innermost enclosing type
+    type: TypeInfo | None
     # Stack of function return types
     return_types: list[Type]
     # Flags; true for dynamically typed functions
@@ -423,6 +426,7 @@ def __init__(
         self.scope = CheckerScope(tree)
         self.binder = ConditionalTypeBinder(options)
         self.globals = tree.names
+        self.type = None
         self.return_types = []
         self.dynamic_funcs = []
         self.partial_types = []
@@ -2661,7 +2665,11 @@ def visit_class_def(self, defn: ClassDef) -> None:
                 self.fail(message_registry.CANNOT_INHERIT_FROM_FINAL.format(base.name), defn)
         if not can_have_shared_disjoint_base(typ.bases):
             self.fail(message_registry.INCOMPATIBLE_DISJOINT_BASES.format(typ.name), defn)
-        with self.tscope.class_scope(defn.info), self.enter_partial_types(is_class=True):
+        with (
+            self.tscope.class_scope(defn.info),
+            self.enter_partial_types(is_class=True),
+            self.enter_class(defn.info),
+        ):
             old_binder = self.binder
             self.binder = ConditionalTypeBinder(self.options)
             with self.binder.top_frame_context():
@@ -2729,6 +2737,15 @@ def visit_class_def(self, defn: ClassDef) -> None:
             self.check_enum(defn)
         infer_class_variances(defn.info)
 
+    @contextmanager
+    def enter_class(self, type: TypeInfo) -> Iterator[None]:
+        original_type = self.type
+        self.type = type
+        try:
+            yield
+        finally:
+            self.type = original_type
+
     def check_final_deletable(self, typ: TypeInfo) -> None:
         # These checks are only for mypyc. Only perform some checks that are easier
         # to implement here than in mypyc.
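`enter_class` follows the standard save, replace, restore shape, with the restore in a `finally` block so that an exception cannot leave `self.type` pointing at the wrong class. The same shape fits the feedback item about moving SemanticAnalyzer save/restore logic into a context manager. A generic sketch, assuming a hypothetical `AnalyzerState` class whose `current_class` and `errors` fields stand in for whatever state the real analyzer saves:

```python
from __future__ import annotations

from collections.abc import Iterator
from contextlib import contextmanager


class AnalyzerState:
    """Hypothetical stand-in for state that a speculative parse may disturb."""

    def __init__(self) -> None:
        self.current_class: str | None = None
        self.errors: list[str] = []

    @contextmanager
    def enter_class(self, name: str) -> Iterator[None]:
        # Same shape as TypeChecker.enter_class(): save, replace, always restore.
        original = self.current_class
        self.current_class = name
        try:
            yield
        finally:
            self.current_class = original

    @contextmanager
    def speculative(self) -> Iterator[None]:
        # Snapshot errors so that anything reported inside the block is discarded,
        # even if the block exits with an exception.
        saved_errors = list(self.errors)
        try:
            yield
        finally:
            self.errors = saved_errors
```

Because each context manager restores only the state it saved, nested uses (a class inside a class, or a speculative parse inside a class) unwind in the right order.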
@@ -8023,7 +8040,9 @@ def add_any_attribute_to_type(self, typ: Type, name: str) -> Type:
             fallback = typ.fallback.copy_with_extra_attr(name, any_type)
             return typ.copy_modified(fallback=fallback)
         if isinstance(typ, TypeType) and isinstance(typ.item, Instance):
-            return TypeType.make_normalized(self.add_any_attribute_to_type(typ.item, name))
+            return TypeType.make_normalized(
+                self.add_any_attribute_to_type(typ.item, name), is_type_form=typ.is_type_form
+            )
         if isinstance(typ, TypeVarType):
             return typ.copy_modified(
                 upper_bound=self.add_any_attribute_to_type(typ.upper_bound, name),
@@ -8151,6 +8170,97 @@ def visit_global_decl(self, o: GlobalDecl, /) -> None:
         return None
 
 
+class TypeCheckerAsSemanticAnalyzer(SemanticAnalyzerCoreInterface):
+    """
+    Adapts TypeChecker to the SemanticAnalyzerCoreInterface,
+    allowing most type expressions to be parsed during the TypeChecker pass.
+
+    See ExpressionChecker.try_parse_as_type_expression() to understand how this
+    class is used.
+    """
+
+    _chk: TypeChecker
+    _names: dict[str, SymbolTableNode]
+    did_fail: bool
+
+    def __init__(self, chk: TypeChecker, names: dict[str, SymbolTableNode]) -> None:
+        self._chk = chk
+        self._names = names
+        self.did_fail = False
+
+    def lookup_qualified(
+        self, name: str, ctx: Context, suppress_errors: bool = False
+    ) -> SymbolTableNode | None:
+        sym = self._names.get(name)
+        # All names being looked up should have been previously gathered,
+        # even if the related SymbolTableNode does not refer to a valid SymbolNode
+        assert sym is not None, name
+        return sym
+
+    def lookup_fully_qualified(self, fullname: str, /) -> SymbolTableNode:
+        ret = self.lookup_fully_qualified_or_none(fullname)
+        assert ret is not None, fullname
+        return ret
+
+    def lookup_fully_qualified_or_none(self, fullname: str, /) -> SymbolTableNode | None:
+        try:
+            return self._chk.lookup_qualified(fullname)
+        except KeyError:
+            return None
+
+    def fail(
+        self,
+        msg: str,
+        ctx: Context,
+        serious: bool = False,
+        *,
+        blocker: bool = False,
+        code: ErrorCode | None = None,
+    ) -> None:
+        self.did_fail = True
+
+    def note(self, msg: str, ctx: Context, *, code: ErrorCode | None = None) -> None:
+        pass
+
+    def incomplete_feature_enabled(self, feature: str, ctx: Context) -> bool:
+        if feature not in self._chk.options.enable_incomplete_feature:
+            self.fail("__ignored__", ctx)
+            return False
+        return True
+
+    def record_incomplete_ref(self) -> None:
+        pass
+
+    def defer(self, debug_context: Context | None = None, force_progress: bool = False) -> None:
+        pass
+
+    def is_incomplete_namespace(self, fullname: str) -> bool:
+        return False
+
+    @property
+    def final_iteration(self) -> bool:
+        return True
+
+    def is_future_flag_set(self, flag: str) -> bool:
+        return self._chk.tree.is_future_flag_set(flag)
+
+    @property
+    def is_stub_file(self) -> bool:
+        return self._chk.tree.is_stub
+
+    def is_func_scope(self) -> bool:
+        # Return arbitrary value.
+        #
+        # This method is currently only used to decide whether to pair
+        # a fail() message with a note() message or not. Both of those
+        # message types are ignored.
+        return False
+
+    @property
+    def type(self) -> TypeInfo | None:
+        return self._chk.type
+
+
 class CollectArgTypeVarTypes(TypeTraverserVisitor):
     """Collects the non-nested argument types in a set."""
 
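`TypeCheckerAsSemanticAnalyzer` is an adapter: it exposes the interface a type-expression parser expects, delegates lookups to an object from a later pass, and turns error reporting into a `did_fail` flag. A stripped-down sketch of the same shape follows. `NameResolver`, `Checker`, `CheckerAsResolver`, and `resolve_all` are hypothetical names, and the real `SemanticAnalyzerCoreInterface` has many more methods.

```python
from __future__ import annotations

from typing import Protocol


class NameResolver(Protocol):
    """Hypothetical slice of the interface a type-expression parser relies on."""

    def lookup(self, name: str) -> str | None: ...

    def fail(self, msg: str) -> None: ...


class Checker:
    """Hypothetical later-pass object that already knows the names in scope."""

    def __init__(self, names: dict[str, str]) -> None:
        self.names = names


class CheckerAsResolver:
    """Adapts Checker to NameResolver, recording failures instead of reporting them."""

    def __init__(self, chk: Checker) -> None:
        self._chk = chk
        self.did_fail = False

    def lookup(self, name: str) -> str | None:
        # The real adapter asserts that the name was gathered in advance;
        # this toy simply returns None for unknown names.
        return self._chk.names.get(name)

    def fail(self, msg: str) -> None:
        # Swallow the message; only remember that the speculative parse failed.
        self.did_fail = True


def resolve_all(resolver: NameResolver, names: list[str]) -> list[str] | None:
    """Resolve every name, or report a failure and return None."""
    resolved: list[str] = []
    for name in names:
        full = resolver.lookup(name)
        if full is None:
            resolver.fail(f"Name {name!r} is not defined")
            return None
        resolved.append(full)
    return resolved
```

In this sketch the caller checks `did_fail` after `resolve_all` returns to decide whether the result is usable, so a failed speculative parse produces no user-visible error.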