PEP 768: Add some clarifications and minor edits (#4284)

pablogsal · web-flow · commit 7f219a0acf1e · 2025-03-04T00:57:52.000Z
diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
@@ -141,8 +141,10 @@ A new structure is added to PyThreadState to support remote debugging:
 
 This structure is appended to ``PyThreadState``, adding only a few fields that
 are **never accessed during normal execution**. The ``debugger_pending_call`` field
-indicates when a debugger has requested execution, while ``debugger_script``
-provides Python code to be executed when the interpreter reaches a safe point.
+indicates when a debugger has requested execution, while ``debugger_script_path``
+provides a filesystem path to a Python source file (.py) that will be executed when
+the interpreter reaches a safe point. The path must point to a Python source file,
+not compiled Python code (.pyc) or any other format.
 
 The value for ``MAX_SCRIPT_PATH_SIZE`` will be a trade-off between binary size
 and how big debugging scripts' paths can be. To limit the memory overhead per
@@ -177,7 +179,7 @@ debugger support:
 These offsets allow debuggers to locate critical debugging control structures in
 the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
 offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
-and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
+and ``debugger_script_path`` offsets are relative to each ``_PyRemoteDebuggerSupport``
 structure, allowing the new structure and its fields to be found regardless of
 where they are in memory. ``debugger_script_path_size`` informs the attaching
 tool of the size of the buffer.
@@ -200,13 +202,19 @@ When a debugger wants to attach to a Python process, it follows these steps:
 
 5. Write control information:
 
-   - Write a filename containing Python code to be executed into the
-     ``debugger_script`` field in ``_PyRemoteDebuggerSupport``.
+   - Most debuggers will pause the process before writing to its memory. This is
+     standard practice for tools like GDB, which use SIGSTOP or ptrace to pause the process.
+     This approach prevents races when writing to process memory. Profilers and other tools
+     that don't wish to stop the process can still use this interface, but they need to
+     handle possible races. This is a normal consideration for profilers.
+
+   - Write a file path to a Python source file (.py) into the
+     ``debugger_script_path`` field in ``_PyRemoteDebuggerSupport``.
    - Set ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport`` to 1
    - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field
 
-Once the interpreter reaches the next safe point, it will execute the script
-provided by the debugger.
+Once the interpreter reaches the next safe point, it will execute the Python code
+contained in the file specified by the debugger.
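
The write order in step 5 can be sketched as follows. This is only an illustration under stated assumptions: the thread state is simulated with a local ``bytearray``, every offset and size is a hypothetical placeholder (a real tool reads them from the debug offsets), the field widths and the UTF-8 path encoding are assumed, and the numeric value used for ``_PY_EVAL_PLEASE_STOP_BIT`` is an assumption as well.

```python
import struct

# All offsets and sizes below are hypothetical placeholders. A real tool reads
# them from the target's debug offsets and writes into the remote process's
# memory with an OS-specific primitive instead of a local bytearray.
EVAL_BREAKER_OFFSET = 0      # eval_breaker within PyThreadState
SUPPORT_OFFSET = 64          # remote_debugger_support within PyThreadState
PENDING_CALL_OFFSET = 0      # debugger_pending_call within the support struct
SCRIPT_PATH_OFFSET = 4       # debugger_script_path within the support struct
SCRIPT_PATH_SIZE = 512       # debugger_script_path_size
PLEASE_STOP_BIT = 1 << 5     # assumed value of _PY_EVAL_PLEASE_STOP_BIT


def request_script(tstate: bytearray, script_path: str) -> None:
    """Write the control information in the order listed in step 5."""
    encoded = script_path.encode("utf-8") + b"\0"
    if len(encoded) > SCRIPT_PATH_SIZE:
        raise ValueError("script path does not fit in the remote buffer")

    # 1. Write the NUL-terminated path into debugger_script_path.
    start = SUPPORT_OFFSET + SCRIPT_PATH_OFFSET
    tstate[start:start + len(encoded)] = encoded
    # 2. Set debugger_pending_call to 1 (assumed to be a 32-bit int).
    struct.pack_into("<i", tstate, SUPPORT_OFFSET + PENDING_CALL_OFFSET, 1)
    # 3. Set the stop bit in eval_breaker, preserving the other bits.
    (breaker,) = struct.unpack_from("<Q", tstate, EVAL_BREAKER_OFFSET)
    struct.pack_into("<Q", tstate, EVAL_BREAKER_OFFSET, breaker | PLEASE_STOP_BIT)
```

The path is written before either flag so that a target which observes ``debugger_pending_call`` set never reads a partially written path. A tool that does not pause the target still has to handle the remaining races itself.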
 
 Interpreter Integration
 -----------------------
@@ -237,7 +245,7 @@ to be audited or disabled if desired by a system's administrator.
     if (tstate->eval_breaker) {
         if (tstate->remote_debugger_support.debugger_pending_call) {
             tstate->remote_debugger_support.debugger_pending_call = 0;
-            const char *path = tstate->remote_debugger_support.debugger_script;
+            const char *path = tstate->remote_debugger_support.debugger_script_path;
             if (*path) {
                 if (0 != PySys_Audit("debugger_script", "%s", path)) {
                     PyErr_Clear();
@@ -273,28 +281,35 @@ arbitrary Python code within the context of a specified Python process:
 
 .. code-block:: python
 
-  def remote_exec(pid: int, code: str, timeout: int = 0) -> None:
+  def remote_exec(pid: int, script: str|bytes|PathLike) -> None:
       """
-      Executes a block of Python code in a given remote Python process.
+      Executes a file containing Python code in a given remote Python process.
+
+      This function returns immediately, and the code will be executed by the
+      target process's main thread at the next available opportunity, similarly
+      to how signals are handled. There is no interface to determine when the
+      code has been executed. The caller is responsible for making sure that
+      the file still exists whenever the remote process tries to read it and that
+      it hasn't been overwritten.
 
       Args:
            pid (int): The process ID of the target Python process.
-           code (str): A string containing the Python code to be executed.
-           timeout (int): An optional timeout for waiting for the remote
-              process to execute the code. If the timeout is exceeded a
-              ``TimeoutError`` will be raised.
+           script (str|bytes|PathLike): The path to a file containing
+               the Python code to be executed.
       """
 
 An example usage of the API would look like:
 
 .. code-block:: python
 
     import sys
+    import uuid
     # Execute a print statement in a remote Python process with PID 12345
+    script = f"/tmp/{uuid.uuid4()}.py"
+    with open(script, "w") as f:
+        f.write("print('Hello from remote execution!')")
     try:
-        sys.remote_exec(12345, "print('Hello from remote execution!')", timeout=3)
-    except TimeoutError:
-        print(f"The remote process took too long to execute the code")
+        sys.remote_exec(12345, script)
     except Exception as e:
         print(f"Failed to execute code: {e}")
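
A slightly more defensive variant of the usage example is sketched below. It assumes the caller prefers ``tempfile.mkstemp`` over a hand-built ``/tmp`` path; the helper names ``write_debug_script`` and ``run_in_process`` are hypothetical and not part of the proposed API.

```python
import os
import sys
import tempfile


def write_debug_script(source: str) -> str:
    """Write `source` to a new, uniquely named .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py", prefix="remote_debug_")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(source)
    return path


def run_in_process(pid: int, source: str) -> str:
    """Request execution of `source` in process `pid`; return the script path.

    On success the file is left in place on purpose: the call returns before
    the target reads the script, so the caller decides when to delete it.
    """
    if not hasattr(sys, "remote_exec"):
        raise RuntimeError("this interpreter does not provide sys.remote_exec")
    path = write_debug_script(source)
    try:
        sys.remote_exec(pid, path)
    except Exception:
        os.unlink(path)  # the request was never made, so the file is not needed
        raise
    return path
```

Note that ``mkstemp`` creates the file readable only by the creating user, so this sketch assumes the target process runs as the same user.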
 
@@ -322,6 +337,36 @@ feature. This way, tools can offer a useful error message explaining why they
 won't work, instead of believing that they have attached and then never having
 their script run.
 
+Multi-threading Considerations
+------------------------------
+
+The overall execution pattern resembles how Python handles signals internally.
+The interpreter guarantees that injected code only runs at safe points, never
+interrupting atomic operations within the interpreter itself. This approach
+ensures that debugging operations cannot corrupt the interpreter state while
+still providing timely execution in most real-world scenarios.
+
+However, debugging code injected through this interface can execute in any
+thread. This behavior differs from how Python handles signals, since
+signal handlers can only run in the main thread. If a debugger wants to inject
+code into every running thread, it must inject it into every ``PyThreadState``.
+If a debugger wants to run code in the first available thread, it needs to
+inject it into every ``PyThreadState``, and that injected code must check
+whether it has already been run by another thread (likely by setting some flag
+in the globals of some module).
+
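The "first available thread" pattern described above might look like the sketch below, which would form the body of the injected script. The flag name and both function names are hypothetical, and the sketch assumes that ``dict.setdefault`` on a module namespace behaves atomically in the target interpreter.

```python
import sys

_FLAG = "_remote_debug_claimed"  # hypothetical flag name


def claim_once(namespace: dict, key: str = _FLAG) -> bool:
    """Return True for exactly one caller per namespace and key."""
    token = object()
    # setdefault stores the token only if the key is absent and returns the
    # value that ends up stored, so only the first caller sees its own token.
    return namespace.setdefault(key, token) is token


def injected_entry_point(payload, namespace=None) -> None:
    """Run `payload` only in the first thread that reaches this point."""
    if namespace is None:
        namespace = vars(sys)  # stands in for "the globals of some module"
    if claim_once(namespace):
        payload()
```

Every thread that executes the script calls ``injected_entry_point``, but only the thread that wins the claim runs the payload. A debugger that wants to inject again later would need to clear or rename the flag.
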
+Note that the Global Interpreter Lock (GIL) continues to govern execution as
+normal when the injected code runs. This means if a target thread is currently
+executing a C extension that holds the GIL continuously, the injected code
+won't be able to run until that operation completes and the GIL becomes
+available. However, the interface introduces no additional GIL contention
+beyond what the injected code itself requires. Importantly, the interface
+remains fully compatible with Python's free-threaded mode.
+
+After injecting code, a debugger may find it useful to send a pre-registered
+signal to the process. The signal can interrupt blocking I/O or sleep states
+that are waiting on external resources, giving the interpreter a safe
+opportunity to run the injected code.
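
The signal follow-up could be sketched as below on POSIX systems. The choice of ``SIGUSR1``, the ``received`` list, and both function names are assumptions made for illustration; the target must have installed the handler beforehand, because the default action for ``SIGUSR1`` terminates the process.

```python
import os
import signal

WAKEUP_SIGNAL = signal.SIGUSR1  # hypothetical choice; POSIX only

received = []  # records delivered signals so the sketch can be observed


def register_wakeup_handler() -> None:
    """Run in the target ahead of time, from its main thread."""
    # A near no-op handler replaces the default action (process termination).
    signal.signal(WAKEUP_SIGNAL, lambda signum, frame: received.append(signum))


def nudge(pid: int) -> None:
    """Run by the debugger after it has written the control information."""
    os.kill(pid, WAKEUP_SIGNAL)
```

The handler does no real work. Its purpose is only to make a blocked system call return so that the interpreter reaches a safe point sooner.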
 
 Backwards Compatibility
 =======================
@@ -454,8 +499,8 @@ Rejected Ideas
 Writing Python code into the buffer
 -----------------------------------
 
-We have chosen to have debuggers write the code to be executed into a file
-whose path is written into a buffer in the remote process. This has been deemed
+We have chosen to have debuggers write the path to a file containing Python code
+into a buffer in the remote process. This has been deemed
 more secure than writing the Python code to be executed itself into a buffer in
 the remote process, because it means that an attacker who has gained arbitrary
 writes in a process but not arbitrary code execution or file system