 ``line``, ``block``, and ``text`` keys are removed in favor of a single ``code`` for context.
 Lexing is up to the kernel.

+.. versionchanged:: 5.2
+
+    Due to a widespread bug in many frontends, ``cursor_pos``
+    in versions prior to 5.2 is ambiguous in the presence of "astral-plane" characters.
+    In 5.2, ``cursor_pos`` **must be** the actual encoding-independent offset in Unicode codepoints.
+    See :ref:`cursor_pos_unicode_note` for more.
+

 Message type: ``complete_reply``::

@@ -1370,12 +1385,48 @@ handlers should set the parent header and publish status busy / idle,
 just like an execute request.


-To Do
+Notes
 =====

-Missing things include:
+.. _cursor_pos_unicode_note:
+
+``cursor_pos`` and Unicode offsets
+----------------------------------
+
+Many frontends, especially those implemented in JavaScript,
+reported ``cursor_pos`` as the interpreter's string index.
+This differs from the Unicode character offset if the interpreter uses UTF-16 (e.g. JavaScript, or Python 2 on macOS),
+which stores "astral-plane" characters such as ``𝐚 (U+1D41A)`` as surrogate pairs.
+Each pair takes up two indices instead of one, so the reported offset
+drifts by one for each astral-plane character before the cursor.
+Not all frontends have this behavior, however,
+and after JSON serialization, information about which encoding was used
+when calculating the offset is lost.
+Assuming ``cursor_pos`` was calculated in UTF-16 could therefore produce a similarly incorrect offset
+for frontends that did the right thing.
+
+For this reason, in protocol versions prior to 5.2, ``cursor_pos``
+is officially ambiguous in the presence of astral-plane Unicode characters.
+Frontends claiming to implement protocol 5.2 **MUST** report ``cursor_pos`` as the encoding-independent Unicode character offset.
+Kernels may choose to expect the UTF-16 offset from requests implementing protocol 5.1 and earlier, in order to behave correctly with the most popular frontends.
+Kernels that do so should be aware that this *introduces* the inverse bug for frontends that were not affected.
+
+Known affected frontends (as of 2017-06):
+
+- Jupyter Notebook < 5.1
+- JupyterLab < 0.24
+- nteract
+- CoCalc
+- Jupyter Console and QtConsole with Python 2 on macOS and Windows
+
+Known *not* affected frontends:
+
+- QtConsole and Jupyter Console with Python 3, or with Python 2 on Linux
+
+.. seealso::
+
+    `Discussion on GitHub <https://github.com/jupyter/jupyter_client/issues/259>`_

-* Important: finish thinking through the payload concept and API.
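
The offset drift described in the added note can be illustrated with a short Python 3 sketch. This is only an illustration under stated assumptions: the helper names (``utf16_offset_to_codepoint_offset``, ``codepoint_offset_to_utf16_offset``, ``normalize_cursor_pos``) are hypothetical and are not part of ``jupyter_client`` or any kernel, and the protocol version is assumed to arrive as a string such as ``"5.1"``. A kernel that chooses to trust UTF-16 offsets from pre-5.2 frontends might convert them roughly like this:

```python
def codepoint_offset_to_utf16_offset(code: str, offset: int) -> int:
    """Return the UTF-16 code-unit index matching a codepoint offset."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no BOM.
    return len(code[:offset].encode("utf-16-le")) // 2


def utf16_offset_to_codepoint_offset(code: str, utf16_offset: int) -> int:
    """Return the codepoint offset matching a UTF-16 code-unit index."""
    encoded = code.encode("utf-16-le")
    # Keep only the code units before the cursor, then count codepoints.
    # An offset that splits a surrogate pair raises UnicodeDecodeError.
    return len(encoded[: utf16_offset * 2].decode("utf-16-le"))


def normalize_cursor_pos(code: str, cursor_pos: int, protocol_version: str) -> int:
    """Hypothetical kernel-side policy: trust 5.2+, assume UTF-16 before."""
    major_minor = tuple(int(part) for part in protocol_version.split(".")[:2])
    if major_minor >= (5, 2):
        # 5.2 frontends must already send the codepoint offset.
        return cursor_pos
    # Assumption: older frontends sent a UTF-16 index. As the note warns,
    # this is wrong for older frontends that were not affected by the bug.
    return utf16_offset_to_codepoint_offset(code, cursor_pos)


if __name__ == "__main__":
    code = "\U0001D41Aab"  # one astral-plane character followed by "ab"
    print(codepoint_offset_to_utf16_offset(code, 3))  # 4
    print(normalize_cursor_pos(code, 4, "5.1"))       # 3
    print(normalize_cursor_pos(code, 3, "5.2"))       # 3
```

For text without astral-plane characters both offsets coincide, so the conversion is a no-op in the common case.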