Commit 51f20a6

Merge pull request #262 from minrk/52-unicode
describe cursor_pos ambiguity and bump protocol to 5.2
2 parents 3e667b1 + 5acecef commit 51f20a6

2 files changed (+56, -5 lines)

docs/messaging.rst

Lines changed: 55 additions & 4 deletions
@@ -21,7 +21,7 @@ Versioning
 
 The Jupyter message specification is versioned independently of the packages
 that use it.
-The current version of the specification is 5.1.
+The current version of the specification is 5.2.
 
 .. note::
     *New in* and *Changed in* messages in this document refer to versions of the
@@ -547,6 +547,14 @@ Message type: ``inspect_request``::
     ``name`` key replaced with ``code`` and ``cursor_pos``,
     moving the lexing responsibility to the kernel.
 
+.. versionchanged:: 5.2
+
+    Due to a widespread bug in many frontends, ``cursor_pos``
+    in versions prior to 5.2 is ambiguous in the presence of "astral-plane" characters.
+    In 5.2, ``cursor_pos`` **must be** the actual encoding-independent offset in Unicode code points.
+    See :ref:`cursor_pos_unicode_note` for more.
+
+
 The reply is a mime-bundle, like a `display_data`_ message,
 which should be a formatted representation of information about the context.
 In the notebook, this is used to show tooltips over function calls, etc.
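The following minimal Python 3 sketch is not part of this commit. It shows what the 5.2 rule means for a frontend that builds an ``inspect_request``. Python 3 ``str`` indexes by code point, so the length of the text before the cursor is already the 5.2 ``cursor_pos``. The helper names ``utf16_length`` and ``make_inspect_request_content`` are hypothetical. ``utf16_length`` reproduces the index that a UTF-16 frontend would have sent, for example one relying on JavaScript's ``String.length``.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, i.e. the index a UTF-16 frontend would report."""
    # UTF-16-LE uses 2 bytes per code unit and, unlike "utf-16", writes no BOM.
    return len(text.encode("utf-16-le")) // 2


def make_inspect_request_content(code: str, cursor_pos: int) -> dict:
    """Assemble inspect_request content with a code point cursor_pos (protocol 5.2)."""
    return {"code": code, "cursor_pos": cursor_pos, "detail_level": 0}


# "\U0001D41A" is the astral-plane character U+1D41A; the cursor is at the end.
code = "\U0001D41A = 1\nprint"

codepoint_pos = len(code)       # 11: what a 5.2 frontend must send
utf16_pos = utf16_length(code)  # 12: what a buggy UTF-16 frontend sent

content = make_inspect_request_content(code, codepoint_pos)
print(content["cursor_pos"], utf16_pos)  # 11 12
```

The two values differ by exactly one because one astral-plane character precedes the cursor. After JSON serialization, the kernel receives only the integer and cannot tell which of the two it got.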
@@ -595,6 +603,13 @@ Message type: ``complete_request``::
     ``line``, ``block``, and ``text`` keys are removed in favor of a single ``code`` for context.
     Lexing is up to the kernel.
 
+.. versionchanged:: 5.2
+
+    Due to a widespread bug in many frontends, ``cursor_pos``
+    in versions prior to 5.2 is ambiguous in the presence of "astral-plane" characters.
+    In 5.2, ``cursor_pos`` **must be** the actual encoding-independent offset in Unicode code points.
+    See :ref:`cursor_pos_unicode_note` for more.
+
 
 Message type: ``complete_reply``::
 
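On the kernel side, Python 3 can slice ``code`` directly with a 5.2 ``cursor_pos``. This sketch uses a hypothetical ``token_before_cursor`` helper built on a deliberately simple ``\w*$`` regular expression. It is an assumed stand-in for a kernel's real lexing and is not how IPython completes. It shows that an offset drifted by one astral-plane character selects the wrong text.

```python
import re

# Trailing run of word characters; a simplified stand-in for real lexing.
_TRAILING_WORD = re.compile(r"\w*$")


def token_before_cursor(code: str, cursor_pos: int) -> str:
    """Return the word fragment ending at cursor_pos, a code point offset."""
    # Python 3 str slicing counts code points, matching the protocol 5.2 definition.
    return _TRAILING_WORD.search(code[:cursor_pos]).group(0)


# Cursor placed right after "pri"; U+1D41A appears earlier on the same line.
code = "t = '\U0001D41A'; pri(t)"
cursor_pos = code.index("(")  # 12 code points

print(token_before_cursor(code, cursor_pos))            # pri
print(repr(token_before_cursor(code, cursor_pos + 1)))  # '' (UTF-16 offset, one too far)
```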
@@ -1370,12 +1385,48 @@ handlers should set the parent header and publish status busy / idle,
 just like an execute request.
 
 
-To Do
+Notes
 =====
 
-Missing things include:
+.. _cursor_pos_unicode_note:
+
+``cursor_pos`` and Unicode offsets
+----------------------------------
+
+Many frontends, especially those implemented in JavaScript,
+reported ``cursor_pos`` as the interpreter's native string index.
+That index differs from the Unicode character offset when the interpreter uses UTF-16 (e.g. JavaScript, or Python 2 on macOS and Windows),
+because UTF-16 stores "astral-plane" characters such as ``𝐚 (U+1D41A)`` as surrogate pairs.
+Each such character takes up two indices instead of one,
+so the offset drifts by one per astral-plane character.
+Not all frontends have this behavior, however,
+and once the request is serialized to JSON, the information about which encoding was used
+to calculate the offset is lost.
+Assuming that ``cursor_pos`` was calculated in UTF-16 would therefore produce a similarly incorrect offset
+for frontends that already did the right thing.
+
+For this reason, in protocol versions prior to 5.2, ``cursor_pos``
+is officially ambiguous in the presence of astral-plane Unicode characters.
+Frontends claiming to implement protocol 5.2 **MUST** report ``cursor_pos`` as the encoding-independent Unicode character (code point) offset.
+Kernels may choose to interpret ``cursor_pos`` as a UTF-16 offset in requests from protocol 5.1 and earlier, in order to behave correctly with the most popular frontends.
+However, doing so *introduces* the inverse bug for frontends that never had this bug.
+
+Known affected frontends (as of 2017-06):
+
+- Jupyter Notebook < 5.1
+- JupyterLab < 0.24
+- nteract
+- CoCalc
+- Jupyter Console and QtConsole with Python 2 on macOS and Windows
+
+Known *not* affected frontends:
+
+- QtConsole and Jupyter Console with Python 3, or with Python 2 on Linux
+
+.. seealso::
+
+    `Discussion on GitHub <https://github.com/jupyter/jupyter_client/issues/259>`_
 
-* Important: finish thinking through the payload concept and API.
 
 .. _ZeroMQ: http://zeromq.org
 .. _nteract: https://nteract.io
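Some kernels may take the option described in the new Notes section and treat ``cursor_pos`` in pre-5.2 requests as a UTF-16 offset. For them, the conversion itself is small. The sketch below illustrates the idea and is not an API of jupyter_client; both function names are hypothetical. It relies on Python 3 ``str`` indexing by code point: each astral-plane character (above U+FFFF) takes one index but two UTF-16 code units.

```python
def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Convert a UTF-16 code unit offset into text to a code point offset.

    An offset that lands between the two halves of a surrogate pair is
    rounded forward to just after that character (a choice of this sketch).
    """
    units = 0
    for index, char in enumerate(text):
        if units >= utf16_offset:
            return index
        # Astral-plane characters (above U+FFFF) are a surrogate pair: 2 units.
        units += 2 if ord(char) > 0xFFFF else 1
    # The offset is at or beyond the end of the text.
    return len(text)


def codepoint_to_utf16_offset(text: str, codepoint_offset: int) -> int:
    """Convert a code point offset into text to a UTF-16 code unit offset."""
    return len(text[:codepoint_offset].encode("utf-16-le")) // 2


code = "\U0001D41A = 1\nprint(\U0001D41A)"  # two astral-plane characters

# Cursor at the end: 14 code points, but 16 UTF-16 code units.
print(codepoint_to_utf16_offset(code, len(code)))  # 16
print(utf16_to_codepoint_offset(code, 16))         # 14
```

Applying this conversion to every pre-5.2 request fixes the popular UTF-16 frontends. As the note warns, however, it shifts the cursor for frontends that already sent code point offsets.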

jupyter_client/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 version_info = (5, 1, 0, 'dev')
 __version__ = '.'.join(map(str, version_info))
 
-protocol_version_info = (5, 1)
+protocol_version_info = (5, 2)
 protocol_version = "%i.%i" % protocol_version_info
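Only 5.2 requests promise code point offsets, so a kernel that wants different behavior for older clients has to compare protocol versions. This minimal sketch repeats the two ``_version.py`` lines changed here. It adds hypothetical helpers (``parse_protocol_version``, ``cursor_pos_is_codepoints``) that read the ``version`` field of the message header, and it assumes dotted integer version strings.

```python
# The two lines below mirror jupyter_client/_version.py after this commit.
protocol_version_info = (5, 2)
protocol_version = "%i.%i" % protocol_version_info


def parse_protocol_version(version: str) -> tuple:
    """Turn a dotted version string such as "5.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def cursor_pos_is_codepoints(header: dict) -> bool:
    """Return True if the request guarantees code point cursor_pos offsets.

    Requests older than 5.2 are ambiguous: cursor_pos may be a code point
    offset or a UTF-16 offset, and the message does not say which.
    """
    return parse_protocol_version(header["version"]) >= (5, 2)


print(protocol_version)                                         # 5.2
print(cursor_pos_is_codepoints({"version": "5.1"}))             # False
print(cursor_pos_is_codepoints({"version": protocol_version}))  # True
```

Comparing integer tuples rather than raw strings matters, because as strings ``"5.10" < "5.2"``.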
