---
title: Jupyter integration with the Language Server Protocol
authors: Nicholas Bollweg (@bollwyvl), Jeremy Tuloup (@jtpio), Michał Krassowski (@krassowski)
issue-number: 67
pr-number: 72
date-started: 2021-06-27
---

# Summary

[jupyter(lab)-lsp](https://github.com/krassowski/jupyterlab-lsp) is a project that brings
language-specific IDE features (such as diagnostics, linting, autocompletion, and refactoring) to the
Jupyter ecosystem by leveraging the established
[Language Server Protocol](https://microsoft.github.io/language-server-protocol/) (LSP); a good
overview is available on the [community knowledge site](https://langserver.org). We would like to
propose its incorporation as an official sub-project of Project Jupyter. We feel this would benefit
Jupyter users through better discoverability of advanced interactive computing features that are
supported by the LSP but otherwise missing from a user's Jupyter experience. While our repository
currently features a working implementation, the proposal is not tied to it (beyond proposing the
migration of the repository to a Jupyter-managed GitHub organization); rather, it aims to guide the
process of formalizing and evolving the way Jupyter integrates with LSP in general.

# Motivation

A common criticism of the Jupyter environment (regardless of the front-end editor) and of the
official Jupyter frontends (in light of the recent, experimental support for feature-rich notebook
editing under development by some of the major IDE developers) is the lack of advanced code
assistance tooling. Proper tooling can improve code quality and the validity of computations, and
increase development speed; we therefore believe that it is a key ingredient of a good computational
notebook environment, which from the beginning has aimed at improving users' workflows.

Providing advanced coding assistance for each language separately is a daunting task, challenging
not only for volunteer-driven projects but also for large companies. Microsoft recognized this
problem and created the Language Server Protocol, with a reference implementation in VSCode(TM).

Many language servers are community-supported and available for free (see the community-maintained
list of [language servers](https://langserver.org/)).

# Guide-level explanation

Much like [Jupyter Kernel Messaging](https://jupyter-client.readthedocs.io/en/stable/messaging.html),
LSP provides a language-agnostic, JSON-compatible description that lets multiple clients integrate
with any number of language implementations. Unlike Kernel Messaging, its focus is on the precise
definition of the many facets of static analysis and code transformation, with nearly four times as
many messages as the Jupyter specification. We will discuss the opportunities and challenges this
complexity brings for users and maintainers of Jupyter Clients, Kernels, and related tools.
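
To make the comparison with Kernel Messaging concrete, the sketch below shows roughly what an LSP message looks like on the wire. The LSP base protocol frames each JSON-RPC 2.0 payload with a `Content-Length` header (counted in bytes), followed by a blank line. The `textDocument/completion` method and its zero-based `line`/`character` position come from the LSP specification. The helper names and the document URI are hypothetical. A real client would also have to handle partial reads and several messages arriving in one buffer; this sketch ignores both.

```python
import json


def encode_lsp_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload using the LSP base-protocol header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_lsp_message(data: bytes) -> dict:
    """Parse one complete framed message (no support for partial reads)."""
    raw_headers, _, body = data.partition(b"\r\n\r\n")
    headers = {}
    for line in raw_headers.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers["content-length"])
    return json.loads(body[:length].decode("utf-8"))


# A completion request; the document URI is a hypothetical example.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.py"},
        "position": {"line": 3, "character": 7},  # both zero-based
    },
}
framed = encode_lsp_message(request)
```

Decoding `framed` gives back the original request dictionary. On the Jupyter side, the same JSON bodies are what a proxy such as the one described below has to relay between the frontend and the language server.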

The key component of the repository,
[@krassowski/jupyterlab-lsp](https://www.npmjs.com/package/@krassowski/jupyterlab-lsp), offers
Jupyter users an expanded subset of the features described by the LSP as an extension to JupyterLab.
These features include refinements of existing Jupyter interactive computing features, such as
completion and introspection, as well as features new to Jupyter, such as linting, reference
linking, and symbol renaming. It is supported by [jupyter-lsp](https://pypi.org/project/jupyter-lsp/),
a Language Server- and Jupyter Client-agnostic extension of the Jupyter Notebook Server (for the
`0.x` line) and Jupyter Server (for the `1.x` line). We will discuss the architecture and the
engineering process of maintaining these components at greater length, drawing on much of the
existing user and developer
[documentation](https://jupyterlab-lsp.readthedocs.io/en/latest/?badge=latest).

# Reference-level explanation

The current implementation of the LSP integration is barely more than a proof of concept. We believe
that a different implementation should be developed to take into account more comprehensive use
cases and the diversity of the Jupyter ecosystem; we have created detailed proposals for improving
and refactoring our code, as explained later.

## Dealing with the complexity of Jupyter notebooks

The interactive, data-driven computing paradigm provides additional convenience features on top of
existing languages, which need to be considered in the design:

- cell and line magics
- transclusions: "foreign" code in the document that uses a different language or scope than the
  rest of the document, often implemented as magics (e.g. the `%%html` magic in IPython)
- polyglot notebooks using cell metadata to define the language
- the concept of cells, including cell outputs and cell metadata (e.g. enabling LSP extensions to
  warn users about empty (unused) cells, out-of-order execution markers, etc., as briefly discussed
  in [#467](https://github.com/krassowski/jupyterlab-lsp/issues/467))
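
As an illustration of how line magics could be made readable to a language server, the sketch below replaces an IPython line magic with the equivalent `get_ipython().run_line_magic(name, args)` call and restores the magic afterwards. IPython itself uses this call form when it transforms line magics. The function names and regular expressions are hypothetical. Cell magics, shell escapes (`!ls`), and transclusions such as `%%html` are deliberately left untouched and would need separate handling, e.g. extracting the cell into a document in another language.

```python
import ast
import re

# "%name args" at the start of a line (but not a "%%cell" magic), keeping indentation.
LINE_MAGIC = re.compile(r"^(?P<indent>\s*)%(?!%)(?P<name>\w+)\s?(?P<args>.*)$")
# The placeholder produced by mask_line(); `call` captures the two literal arguments.
PLACEHOLDER = re.compile(r"^(?P<indent>\s*)get_ipython\(\)\.run_line_magic\((?P<call>.*)\)$")


def mask_line(line: str) -> str:
    """Replace a line magic with an equivalent, statically analyzable Python call."""
    match = LINE_MAGIC.match(line)
    if match is None:
        return line
    return (
        f"{match['indent']}get_ipython().run_line_magic("
        f"{match['name']!r}, {match['args']!r})"
    )


def unmask_line(line: str) -> str:
    """Invert mask_line(), e.g. after a refactoring tool has rewritten the code.

    Raises ValueError if the tool turned the arguments into non-literal expressions.
    """
    match = PLACEHOLDER.match(line)
    if match is None:
        return line
    # literal_eval accepts either quote style, so re-quoted strings still round-trip.
    name, args = ast.literal_eval(f"({match['call']})")
    return f"{match['indent']}%{name}" + (f" {args}" if args else "")


cell = "%matplotlib inline\nimport numpy as np\nfor _ in range(3):\n    %time np.zeros(10)"
masked = "\n".join(mask_line(line) for line in cell.split("\n"))
restored = "\n".join(unmask_line(line) for line in masked.split("\n"))
```

The masked text is plain Python that standard tools can parse, and `restored` is identical to `cell`. This one-to-one property is what a formal substitution grammar would need to guarantee for every supported magic.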

## Current implementation

Currently:

- the notebook cells are concatenated on the frontend into a single temporary ("virtual")
  document, which is then sent to the backend,
  - the mapping between the two coordinate systems (notebook cells and virtual document) is
    performed by the frontend and is based solely on line counts after concatenation,
- as a workaround for some language servers that require the file to actually exist on the
  filesystem (contrary to the LSP spec, but common in some less advanced servers), our backend
  Jupyter server extension creates a temporary file on the filesystem (by default in the
  `.virtual_documents` directory); this is scheduled for deprecation,
- the Jupyter server extension serves as:
  - a transparent proxy between LSP language servers and the frontend, communicating over a
    websocket connection,
  - a manager of language servers, determining whether specific LSP servers are installed and
    starting their processes; the servers are specified (where to look for the server executable,
    which languages/kernels the server should be used for, its display name, etc.) by JSON files
    or by declarative Python classes registered via entry points.
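
The line-count-based mapping described above can be sketched roughly as follows. Cells are joined with blank separator lines, the first virtual line of each cell is recorded, and positions are translated with a binary search over those offsets. This is a simplified illustration, not the actual frontend code. The names are hypothetical, and the choice of two blank separator lines is an assumption (it keeps PEP 8-style checkers from flagging top-level definitions in consecutive cells).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class VirtualDocument:
    text: str
    cell_starts: list[int]  # first virtual line (zero-based) of each cell


def concatenate_cells(cells: list[str], blank_lines: int = 2) -> VirtualDocument:
    """Join cell sources into one document, separated by `blank_lines` empty lines."""
    starts, line = [], 0
    for source in cells:
        starts.append(line)
        line += source.count("\n") + 1 + blank_lines
    separator = "\n" * (blank_lines + 1)
    return VirtualDocument(separator.join(cells), starts)


def to_virtual(doc: VirtualDocument, cell: int, line: int, character: int) -> tuple[int, int]:
    """Cell-relative position -> virtual-document position (columns are unchanged)."""
    return doc.cell_starts[cell] + line, character


def to_cell(doc: VirtualDocument, line: int, character: int) -> tuple[int, int, int]:
    """Virtual-document position -> (cell index, line within cell, character).

    Lines inside a separator are attributed to the preceding cell, past its last line.
    """
    cell = bisect_right(doc.cell_starts, line) - 1
    return cell, line - doc.cell_starts[cell], character


doc = concatenate_cells(["import os", "x = 1\ny = 2", "print(x)"])
```

Here `doc.cell_starts` is `[0, 3, 7]`. The sketch also shows why refactorings that add or remove lines are hard with pure line counting: any change shifts every later offset, so the stored `cell_starts` go stale until the document is rebuilt.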

# Rationale and alternatives

A previous (stale) JEP proposed integrating LSP and adopting the Monaco editor, which would entail
bringing in a heavy dependency and relying heavily on Microsoft's continued development of Monaco;
it was not clear whether Monaco would allow efficient use in a multi-editor notebook setting, and
work on that integration stalled a few years ago. Unlike that previous proposal, we **do not**
propose to adopt any specific implementation, although we bring a working implementation for the
CodeMirror 5 editor, which is already used by two of the official Jupyter front-ends (Jupyter
Notebook and JupyterLab). While the nearly feature-complete CodeMirror 6 has explicitly declared LSP
integration a non-goal, it does provide a number of features that would allow for cleaner
integration of multiple sources of editor annotations, such as named bundles of marks.

Jupyter, which originally drove innovation in the field, is now perceived in some communities as a
driver of bad coding practices due to the lack of such tooling in the official frontends.
Alternative formats to ipynb have been proposed, sometimes motivated solely by better support for
IDE features.

# Prior art

Multiple editors already support the Language Server Protocol, either directly or via extension
points, including VSCode, Atom, Brackets (Adobe), Spyder, Visual Studio, and many more. The list of
clients and their capabilities is described on the community-maintained
[knowledge site](https://langserver.org/) in the "LSP clients" section and on the official
[LSP website](https://microsoft.github.io/language-server-protocol/implementors/tools/).

Multiple notebook interfaces, most of them proprietary, have attempted to integrate language
features such as those provided by LSP, including Google Colab, Datalore, Deepnote, and Polynote;
because implementation details are often not public, it is not clear how many of the existing
solutions employ LSP (or a subset of it) under the hood.

The ongoing integration of the [Debug Adapter Protocol][dap] has demonstrated both the user benefits
and the kernel maintainer costs of "embracing and extending" existing, non-Jupyter protocols rather
than re-implementing them.

# Unresolved questions

The current implementation can be improved by:

1. embedding cell identifiers (and possibly metadata) as comments in the virtual document at the
   start of each cell (in a jupytext-compatible way), to make calculating positions easier and to
   enable refactoring features (e.g. formatting with black) that add or remove lines (which is not
   currently possible),
   - adding metadata might be required to enable polyglot SoS notebooks, see the discussion in
     [#282](https://github.com/krassowski/jupyterlab-lsp/issues/282)
   - one might consider whether it is worth delegating this task to jupytext; this would require
     moving the notebook concatenation logic to the server extension, with the positive side effect
     of exposing it for reuse by other clients, but with two potential downsides: the entire
     notebook would need to be transferred frequently (on each debounced keypress) to the server
     extension (which could be alleviated by sending deltas/diffs; this adds more logic, but since a
     notebook is just JSON it might be feasible to use an existing tool), and the code transforming
     positions between the notebook and the virtual document would exist on both the backend and
     the frontend, as the frontend part cannot easily (if at all) be eliminated; as this option
     looks promising, it will be investigated once the current performance shortcomings are
     resolved.
   - see further discussion in [#467](https://github.com/krassowski/jupyterlab-lsp/issues/467)
2. formalizing a grammar for substituting magics with equivalent or placeholder code (allowing a
   one-to-one mapping of magics to code that standard refactoring tools can understand, and back to
   the magics after the code has been transformed by those tools, for example moved to another
   file), see [#347](https://github.com/krassowski/jupyterlab-lsp/issues/347)
3. abstracting the communication layer between client and server so that different mechanisms can
   be used for this communication, for example:
   - a custom, manually managed websocket between the client and the Jupyter server extension (the
     existing solution),
   - a websocket managed by reusing the kernel comms (acting as a transparent proxy but reducing the
     number of dependencies, since in the context of Jupyter the kernel comms are expected to be
     present anyway), see the proposed implementation in
     [#278](https://github.com/krassowski/jupyterlab-lsp/pull/278)
   - a direct connection to a cloud or self-hosted service providing language intelligence as a
     service, e.g. [sourcegraph](https://about.sourcegraph.com/)
   - (potentially) in-client language servers, such as a JSON Schema-aware language server to assist
     with configuration
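
The first improvement listed above could look roughly like the sketch below. Each cell is preceded by a marker comment in the style of jupytext's percent format (`# %%`) that carries the cell identifier, so cell boundaries can be recovered by scanning for markers instead of counting lines. Storing the identifier in an `id="..."` key on the marker line is an assumption made for illustration; we have not checked whether jupytext round-trips such a key unchanged. The function names are hypothetical. Cell ids are assumed to follow nbformat's restriction to letters, digits, `-` and `_`, so they never need escaping.

```python
import json
import re

# Marker line such as: # %% id="3f2a-b1"
MARKER = re.compile(r'^# %% id=(?P<id>"[A-Za-z0-9_-]+")$')


def build_document(cells: list[tuple[str, str]]) -> str:
    """Emit a marker comment followed by the source of each (cell_id, source) pair."""
    lines = []
    for cell_id, source in cells:
        lines.append(f"# %% id={json.dumps(cell_id)}")
        lines.extend(source.split("\n"))
    return "\n".join(lines)


def split_document(document: str) -> list[tuple[str, str]]:
    """Recover (cell_id, source) pairs, even after a formatter changed line counts.

    Lines before the first marker are ignored.
    """
    cells: list[tuple[str, list[str]]] = []
    for line in document.split("\n"):
        match = MARKER.match(line)
        if match:
            cells.append((json.loads(match["id"]), []))
        elif cells:
            cells[-1][1].append(line)
    return [(cell_id, "\n".join(body)) for cell_id, body in cells]


def locate(document: str, line: int) -> tuple[str, int]:
    """Map a virtual line to (cell_id, line within cell) by scanning back to a marker."""
    lines = document.split("\n")
    if MARKER.match(lines[line]):
        raise ValueError("position is on a marker line, not inside a cell")
    for index in range(line - 1, -1, -1):
        match = MARKER.match(lines[index])
        if match:
            return json.loads(match["id"]), line - index - 1
    raise ValueError("line precedes the first cell marker")


doc = build_document([("a1", "import os"), ("b2", "x = 1\ny = 2")])
# Simulate a formatter inserting a blank line after the import:
formatted = doc.replace("import os", "import os\n")
```

Even though `formatted` has one more line than `doc`, `split_document(formatted)` still attributes every line to the right cell. A cell whose own source contains a matching marker comment would be split incorrectly, so a real implementation would need to escape or reject such lines.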

There are also smaller fires to put out in the current implementation, which we believe do not
warrant further discussion; however, we want to enumerate them to reassure a potentially concerned
reader that these topics are being looked at and are considered a priority due to their immediate
impact on user and/or developer experience:

- reorganizing deeply nested code into a shallower structure of multiple packages, one per feature
  (the current state of the repository, half monorepo and half complex project, is an annoyance to
  maintainers and contributors alike)
- improving the performance of the completer and the overall robustness of the features
- enabling integration with other packages providing completion suggestions
- enabling the use of multiple LSP servers for a single document

# Future possibilities

- Amending the kernel messaging protocol to ask only for runtime completions (e.g. keys in a
  dictionary, columns in a data frame) and kernel-specific completions (e.g. magics), excluding
  completions based on static analysis, to improve the performance of the completer
- Seeding existing linting tools with plugins to support notebook-specific checks (e.g. empty
  cells, out-of-order execution), largely as envisioned by the pioneering
  [JuLynter](https://dew-uff.github.io/julynter/index.html) experiment
  - see also [lintotype]
- Encouraging contributions to existing language servers and offering a platform for the
  development of Jupyter-optimized language servers
- Enabling LSP features in Markdown cells
- Implementing support for the Language Server Index Format (LSIF), a format closely related to LSP
  and defined on its
  [specification page](https://microsoft.github.io/language-server-protocol/specifications/lsif/0.5.0/specification/),
  for even faster IDE features when retrieving immutable (or infrequently changing) information,
  such as the documentation of built-in functions.
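
To make the notebook-specific checks mentioned above concrete, here is a minimal, standalone sketch. It is not a plugin for any existing linter and is much simpler than JuLynter. It flags empty cells, and code cells whose `execution_count` is lower than that of an earlier code cell. The notebook is read as plain nbformat JSON, where `source` may be a string or a list of strings and `execution_count` is `null` for cells that were never run. The function name and the message format are hypothetical.

```python
import json


def lint_notebook(notebook: dict) -> list[str]:
    """Return human-readable warnings about empty cells and out-of-order execution."""
    warnings = []
    highest_count = 0  # highest execution_count seen in earlier code cells
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        if not source.strip():
            warnings.append(f"cell {index}: empty cell")
        count = cell.get("execution_count")
        if cell.get("cell_type") != "code" or count is None:
            continue  # markdown/raw cells and never-executed code cells
        if count < highest_count:
            warnings.append(
                f"cell {index}: executed out of order ({count} after {highest_count})"
            )
        highest_count = max(highest_count, count)
    return warnings


notebook = json.loads("""
{"cells": [
  {"cell_type": "code", "source": "a = 1", "execution_count": 2},
  {"cell_type": "code", "source": ["b = a\\n", "print(b)"], "execution_count": 1},
  {"cell_type": "markdown", "source": ""},
  {"cell_type": "code", "source": "c = b", "execution_count": 3}
]}
""")
for warning in lint_notebook(notebook):
    print(warning)
```

For this notebook, the sketch reports that cell 1 was executed out of order and that cell 2 is empty. Surfacing such warnings through an LSP diagnostics message is what would let them appear alongside regular linter output.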

[lintotype]: https://github.com/deathbeds/lintotype/
[dap]: https://github.com/jupyter/enhancement-proposals/blob/master/jupyter-debugger-protocol/jupyter-debugger-protocol.md