Merged
Changes from 5 commits
4 changes: 2 additions & 2 deletions Lib/test/test_pydoc/test_pydoc.py
@@ -79,7 +79,7 @@ class A(builtins.object)
class B(builtins.object)
| Methods defined here:
|
| __annotate__(...)
| __annotate__(format, /)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
@@ -180,7 +180,7 @@ class A(builtins.object)

class B(builtins.object)
Methods defined here:
__annotate__(...)
__annotate__(format, /)
----------------------------------------------------------------------
Data descriptors defined here:
__dict__
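The pydoc expectations change from `__annotate__(...)` to `__annotate__(format, /)`, which is how pydoc renders a function once `inspect.signature()` can describe it. A minimal sketch, assuming a hand-written function as a stand-in for the compiler-generated `__annotate__` (the name `dotted_name_accepted` is illustrative): it shows the `(format, /)` signature, and that `inspect.Parameter` rejects `.format` because it is not a valid identifier. That rejection is a likely (not confirmed) reason the old output fell back to `(...)`.

```python
import inspect


def annotate(format, /):
    """Hand-written stand-in for a compiler-generated __annotate__ (illustrative only)."""
    return {}


# inspect.signature() can describe this function, so pydoc-style output
# shows the parameter list instead of "(...)".
rendered = f"__annotate__{inspect.signature(annotate)}"
print(rendered)  # __annotate__(format, /)

# The symtable-internal name ".format" is not a valid identifier, so
# inspect.Parameter refuses to build a parameter with that name.
try:
    inspect.Parameter(".format", inspect.Parameter.POSITIONAL_ONLY)
    dotted_name_accepted = True
except ValueError:
    dotted_name_accepted = False

print(dotted_name_accepted)  # False
```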
49 changes: 49 additions & 0 deletions Lib/test/test_type_annotations.py
@@ -1,4 +1,5 @@
import annotationlib
import inspect
import textwrap
import types
import unittest
@@ -380,6 +381,11 @@ class X:
            annotate(None)
        self.assertEqual(annotate(annotationlib.Format.VALUE), {"x": int})

        sig = inspect.signature(annotate)
        self.assertEqual(sig, inspect.Signature([
            inspect.Parameter("format", inspect.Parameter.POSITIONAL_ONLY)
        ]))

    def test_comprehension_in_annotation(self):
        # This crashed in an earlier version of the code
        ns = run_code("x: [y for y in range(10)]")
@@ -400,6 +406,7 @@ def f(x: int) -> int: pass

    def test_name_clash_with_format(self):
        # this test would fail if __annotate__'s parameter was called "format"
        # during symbol table construction
        code = """
        class format: pass

@@ -408,3 +415,45 @@ def f(x: format): pass
        ns = run_code(code)
        f = ns["f"]
        self.assertEqual(f.__annotations__, {"x": ns["format"]})

        code = """
        class Outer:
            class format: pass

            def meth(self, x: format): ...
        """
        ns = run_code(code)
        self.assertEqual(ns["Outer"].meth.__annotations__, {"x": ns["Outer"].format})

        code = """
        def f(format):
            def inner(x: format): pass
            return inner
        res = f("closure var")
        """
        ns = run_code(code)
        self.assertEqual(ns["res"].__annotations__, {"x": "closure var"})

        code = """
        def f(x: format):
            pass
        """
        ns = run_code(code)
        # picks up the format() builtin
        self.assertEqual(ns["f"].__annotations__, {"x": format})

        code = """
        def outer():
            def f(x: format):
                pass
            if False:
                class format: pass
            return f
        f = outer()
        """
        ns = run_code(code)
        with self.assertRaisesRegex(
            NameError,
            "cannot access free variable 'format' where it is not associated with a value in enclosing scope",
        ):
            ns["f"].__annotations__
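The first and last cases in `test_name_clash_with_format` rest on ordinary Python scoping. The sketch below uses hand-written functions rather than compiler-generated `__annotate__` code; `annotate_shadowing`, `annotate_renamed`, `outer`, and `free_var_error` are hypothetical names. A parameter literally named `format` would shadow a module-level `format`, while a differently named parameter lets the lookup reach the global. Separately, a name bound anywhere in an enclosing function (even in dead code) becomes a free variable, so reading it before it is bound raises `NameError` instead of falling back to the builtin.

```python
class format:  # module-level name that shadows the format() builtin
    pass


def annotate_shadowing(format, /):
    # The parameter hides the module-level class, so the dict would
    # contain the argument instead of the class.
    return {"x": format}


def annotate_renamed(_fmt, /):
    # With a different parameter name, "format" resolves to the global class.
    return {"x": format}


def outer():
    def f():
        # "format" is a free variable here because outer() binds it below.
        return format

    if False:
        format = None  # never executes, but makes "format" local to outer()
    return f


try:
    outer()()
    free_var_error = None
except NameError as exc:
    free_var_error = exc
```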
28 changes: 28 additions & 0 deletions Python/codegen.c
@@ -675,6 +675,34 @@ codegen_leave_annotations_scope(compiler *c, location loc,
    ADDOP_I(c, loc, BUILD_MAP, annotations_len);
    ADDOP_IN_SCOPE(c, loc, RETURN_VALUE);
    PyCodeObject *co = _PyCompile_OptimizeAndAssemble(c, 1);

    // We want the parameter to __annotate__ to be named "format" in the
    // signature shown by inspect.signature(), but we need to use a
    // different name (.format) in the symtable so that if the name
    // "format" appears in the annotations, it does not get clobbered
    // by this name.
    // This code is essentially:
    // co->co_localsplusnames = ("format", *co->co_localsplusnames[1:])
    const Py_ssize_t size = PyObject_Size(co->co_localsplusnames);
    if (size == -1) {
        return ERROR;
    }
    PyObject *new_names = PyTuple_New(size);
Contributor

This isn't a critique of your approach, but--I'm surprised you needed to go to all this effort. Why was it necessary to make a new tuple, write the new value for index 0, copy over the other values, and release the reference to the old tuple? I'm assuming the reference count of co_localsplusnames is currently 1; I would have asserted that, then overwritten the first entry. I grant you your approach is more conceptually hygienic, but in practice I assume the quick-and-dirty approach would work equally well.

What am I missing?

Member Author

While it is possible to mutate tuples in C code, it feels riskier. For example, maybe we'll make changes in the future that rely on tuples being immutable.

Contributor

I assure you, this is a long-standing CPython idiom. We've relied on "if there's only one reference to an object, and you own it, you may modify the object however you like" for decades now.

For fun I made a survey of CPython, literally examining every instance of PyTuple_SET_ITEM. (I didn't try the other spellings.) I found a bunch of sites where we do this. In nearly every instance the code is structured as follows:

if there's only one reference to the tuple (which we own)
    modify the tuple in place
else
    create a new tuple
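The pseudocode above is a copy-on-write pattern. Python tuples cannot be mutated, so this sketch models it with a hypothetical `ToyTuple` that carries an explicit reference count; `ToyTuple` and `set_first` are illustrative names, not CPython APIs.

```python
from dataclasses import dataclass


@dataclass
class ToyTuple:
    """Hypothetical model of a C-level tuple: fixed items plus an explicit refcount."""
    items: list
    refcnt: int = 1


def set_first(t: ToyTuple, value) -> ToyTuple:
    """Return a tuple whose item 0 is `value`, consuming the caller's reference to `t`."""
    if t.refcnt == 1:
        # Sole owner: no one else can observe the change, so reuse the object.
        t.items[0] = value
        return t
    # Shared: build a fresh tuple, then drop our reference to the old one.
    fresh = ToyTuple([value, *t.items[1:]])
    t.refcnt -= 1
    return fresh


unique = ToyTuple([".format", "x"])
reused = set_first(unique, "format")

shared = ToyTuple([".format", "x"], refcnt=2)
copied = set_first(shared, "format")
```

The reviewer's suggestion corresponds to the first branch, guarded by an assertion that the count is 1. The merged code always takes the second branch and releases the old tuple with `Py_SETREF`.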

(I'll append the list of such sites at the bottom of this comment.)

Clearly these existing sites are optimizations; instead of destroying the old tuple and creating a fresh one, they're just reusing the existing tuple. They have a harder time of it because generally the tuple has been shown to the interpreter. In our case, we have a freshly compiled code object that hasn't been shown to the interpreter. So there's no chance anyone else has taken any references yet.

If we did change CPython so this was no longer viable, the developer making that change would have to fix all the sites I listed below, which they would probably find the same way I did--looking for all places where people set things in tuples. I don't think modifying the tuple directly would trip up such a future developer.

So, yeah, I really do think it'd be safe to modify the tuple in-place. Just to be totally safe, I'd check the reference count was 1 and raise if it wasn't. (It'd only happen if someone was hacking on compile.c or something, at which point they would deal with it. This would never raise in the wild.)

I don't actually mind you doing it the hard way--we can ship it like this. It just seems needless. We have a longstanding idiom that lets us skip the laborious approach you took. But I'm not gonna fight you about it.


Places where CPython modifies tuples in-place:

compile.c does it a couple times in its internal cache objects. Never exposed to the user (I think).

zip_next in bltinmodule.c, uses _PyObject_IsUniquelyReferenced.

odictiter_iternext in odictobject.c, uses (Py_REFCNT(result) == 1).

enum_next_long in enumobject.c, uses if (Py_REFCNT(result) == 1).

dictiter_iternextitem in dictobject.c, uses _Py_IsOwnedByCurrentThread.

dictreviter_iter_lock_held in dictobject.c, uses Py_REFCNT(result) == 1.

intern_constants in codeobject.c doesn't check ownership; this is in con->consts, and I assume that's internal.

Five places in itertoolsmodule.c (pairwise_next, combinations_next, cwr_next, permutations_next, zip_longest_next) all use Py_REFCNT(result) == 1.

p.s. you should see the if-only-one-reference-modify-the-object shenanigans in the Unicode object!

Member Author

See #127058, where @markshannon proposes deprecating the existing tuple-mutation shenanigans. That strengthens my conviction that we shouldn't introduce a new tuple mutation here.

    if (new_names == NULL) {
        return ERROR;
    }
    PyTuple_SET_ITEM(new_names, 0, Py_NewRef(&_Py_ID(format)));
    for (int i = 1; i < size; i++) {
        PyObject *item = PyTuple_GetItem(co->co_localsplusnames, i);
        if (item == NULL) {
            Py_DECREF(new_names);
            return ERROR;
        }
        Py_INCREF(item);
        PyTuple_SET_ITEM(new_names, i, item);
    }
    Py_SETREF(co->co_localsplusnames, new_names);

    _PyCompile_ExitScope(c);
    if (co == NULL) {
        return ERROR;
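The comment in `codegen.c` summarizes the C loop as `co->co_localsplusnames = ("format", *co->co_localsplusnames[1:])`. A Python-level analogue, assuming CPython 3.8+ (for `CodeType.replace` and positional-only parameters), renames the first local of a hand-written function after compilation. Because CPython bytecode refers to locals by slot index rather than by name, the body keeps working while `inspect.signature()` reports the new name. `annotate` and `_placeholder` are hypothetical stand-ins, and treating `1` as the VALUE format is an assumption noted in the code.

```python
import inspect


def annotate(_placeholder, /):
    # Hypothetical stand-in for a compiler-generated __annotate__; the
    # parameter plays the role of the symtable-only name ".format".
    if _placeholder != 1:  # assumed: 1 corresponds to annotationlib.Format.VALUE
        raise NotImplementedError
    return {"x": int}


code = annotate.__code__
# Same idea as the C loop: a new names tuple whose first entry is "format",
# with every other entry copied unchanged.
annotate.__code__ = code.replace(
    co_varnames=("format", *code.co_varnames[1:])
)

print(inspect.signature(annotate))  # (format, /)
print(annotate(1))  # {'x': <class 'int'>}
```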