Commit 39a8a00

Steven Silvester authored
Merge pull request #72 from krassowski/lsp
Language server protocol (LSP)
2 parents 1161dd3 + 9d0ff51 commit 39a8a00

1 file changed

+208
-0
lines changed

@@ -0,0 +1,208 @@
---
title: Jupyter integration with the Language Server Protocol
authors: Nicholas Bollweg (@bollwyvl), Jeremy Tuloup (@jtpio), Michał Krassowski (@krassowski)
issue-number: 67
pr-number: 72
date-started: 2021-06-27
---

# Summary

[jupyter(lab)-lsp](https://github.com/krassowski/jupyterlab-lsp) is a project bringing integration
of language-specific IDE features (such as diagnostics, linting, autocompletion, refactoring) to the
Jupyter ecosystem by leveraging the established
[Language Server Protocol](https://microsoft.github.io/language-server-protocol/) (LSP), with a good
overview on the [community knowledge site](https://langserver.org). We would like to propose its
incorporation as an official sub-project of Project Jupyter. We feel this would benefit Jupyter
users through better discoverability of advanced interactive computing features that are supported
by the LSP but otherwise missing from a user's Jupyter experience. While our repository currently
features a working implementation, the proposal is not tied to it (beyond a proposal for migration
of the repository to a Jupyter-managed GitHub organization) but rather aims to guide the process of
formalizing and evolving the way of integrating Jupyter with LSP in general.

# Motivation

A common criticism of the Jupyter environment (regardless of the front-end editor) and of the
official Jupyter frontends (in light of recent, experimental support for feature-rich notebook
editing under development by some of the major IDE developers) is the lack of advanced code
assistance tooling. Proper tooling can improve code quality and the validity of computation, and
increase development speed; we therefore believe that it is a key ingredient of a good
computational notebook environment, which from the beginning has aimed at improving the workflow
of users.

Providing support for advanced coding assistance for each language separately is a daunting task,
challenging not only for volunteer-driven projects, but also for large companies. Microsoft
recognized the problem and created the Language Server Protocol, with a reference implementation
in VSCode(TM).

Many language servers are community supported and available for free (see the community-maintained
list of [language servers](https://langserver.org/)).

# Guide-level explanation

Much like
[Jupyter Kernel Messaging](https://jupyter-client.readthedocs.io/en/stable/messaging.html), LSP
provides a language-agnostic, JSON-compatible description for multiple clients to integrate with any
number of language implementations. Unlike Kernel Messaging, the focus is on precise definition of
the many facets of static analysis and code transformation, with nearly four times the number of
messages of the Jupyter specification. We will discuss the opportunities and challenges of this
complexity for users and maintainers of Jupyter Clients, Kernels, and related tools.

The key component of the repository,
[@krassowski/jupyterlab-lsp](https://www.npmjs.com/package/@krassowski/jupyterlab-lsp), offers
Jupyter users an expanded subset of features described by the LSP as an extension to JupyterLab.
These features include refinements of existing Jupyter interactive computing features, such as
completion and introspection, as well as new Jupyter features such as linting, reference linking,
and symbol renaming. It is supported by [jupyter-lsp](https://pypi.org/project/jupyter-lsp/), a
Language Server- and Jupyter Client-agnostic extension of the Jupyter Notebook Server (for the `0.x`
line) and Jupyter Server (for the `1.x` line). We will discuss the architecture and engineering
process of maintaining these components at greater length, leveraging a good deal of the user and
developer [documentation](https://jupyterlab-lsp.readthedocs.io/en/latest/?badge=latest).
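
To make the comparison with Kernel Messaging concrete, the following minimal sketch frames a single
LSP request the way the LSP base protocol prescribes: a `Content-Length` header, an empty line, and
a JSON-RPC 2.0 body. The helper names (`encode_lsp_message`, `decode_lsp_message`) and the example
URI are hypothetical and are not taken from jupyter-lsp; the decoder assumes it is given exactly one
complete message.

```python
import json


def encode_lsp_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the LSP base-protocol Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_lsp_message(data: bytes) -> dict:
    """Parse one framed message back into a dict (assumes one complete message)."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(body[:length].decode("utf-8"))


completion_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.py"},  # hypothetical URI
        "position": {"line": 0, "character": 7},  # zero-based line and character
    },
}

framed = encode_lsp_message(completion_request)
```

The closest Kernel Messaging counterpart, `complete_request`, carries only `code` and `cursor_pos`,
whereas the LSP request addresses a document by URI and a line/character position.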

# Reference-level explanation

The current implementation of the LSP integration is barely a proof of concept. We believe that a
different implementation should be developed to take the more comprehensive use cases and diversity
of the Jupyter ecosystem into account; we created detailed proposals for improvement and refactoring
of our code, as explained later.

## Dealing with Jupyter notebook complexity

The following features need to be considered in the design:

The interactive, data-driven computing paradigm provides additional convenience features on top of
existing languages:

- cell and line magics
- transclusions: "foreign" code in the document (often implemented as magics) that uses a different
  language or scope than the rest of the document (e.g. `%%html` magic in IPython)
- polyglot notebooks using cell metadata to define language
- the concept of cells, including cell outputs and cell metadata (e.g. enabling LSP extensions to
  warn users about unused empty cells, out-of-order execution markers, etc., as briefly discussed in
  [#467](https://github.com/krassowski/jupyterlab-lsp/issues/467))
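
As an illustration of the transclusion case in the list above, the following sketch detects a
leading IPython cell magic and reports the language of the "foreign" cell body. The
`FOREIGN_MAGICS` table, the `Extracted` type, and `extract_foreign_code` are hypothetical names
chosen for this example; which magics map to which language is an assumption, not something defined
by this proposal.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical mapping from IPython cell magics to the language of the cell body.
FOREIGN_MAGICS = {
    "html": "html",
    "javascript": "javascript",
    "js": "javascript",
    "bash": "bash",
}

# Matches the first line of a cell when it is a cell magic such as "%%html".
CELL_MAGIC = re.compile(r"^%%(?P<name>\w+)[^\n]*\n?")


class Extracted(NamedTuple):
    language: str
    code: str
    magic: Optional[str]


def extract_foreign_code(source: str, host_language: str = "python") -> Extracted:
    """Return the language and code of a cell, unwrapping known foreign cell magics."""
    match = CELL_MAGIC.match(source)
    if match and match.group("name") in FOREIGN_MAGICS:
        name = match.group("name")
        return Extracted(FOREIGN_MAGICS[name], source[match.end():], name)
    # Unknown magics and plain cells stay in the host language in this sketch.
    return Extracted(host_language, source, None)
```

A design that handles transclusions would also need to remember where the foreign body starts, so
that positions reported by the foreign language server can be mapped back into the cell.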

## Current implementation

Currently:

- the notebook cells are concatenated into a single temporary ("virtual") document on the frontend,
  which is then sent to the backend,
- the navigation between coordinate systems is performed by the frontend and is based solely on the
  total number of lines after concatenation,
- as a workaround for some language servers requiring actual presence of the file on the filesystem
  (against the LSP spec, but common in some less advanced servers), our backend Jupyter server
  extension creates a temporary file on the file system (by default in the `.virtual_documents`
  directory); this is scheduled for deprecation,
- the Jupyter server extension serves as:
  - a transparent proxy between LSP language servers and frontend, speaking over a websocket
    connection
  - a manager of language servers, determining whether specific LSP servers are installed and
    starting their processes
- JSON files or declarative Python classes registered via entry points are used to define the
  specification of the LSP servers (where to look for an executable of the LSP server, for which
  languages/kernels a given LSP server should be used, what its display name is, etc.)
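
The concatenation and line-based navigation described in the list above can be sketched as follows.
This is a simplified illustration rather than the code used in the repository: the
`VirtualDocument` class and its method names are hypothetical, cells are joined with a single
newline, and only line numbers (not columns) are mapped.

```python
from bisect import bisect_right
from typing import List, Tuple


class VirtualDocument:
    """Concatenate cell sources and map line numbers between coordinate systems."""

    def __init__(self, cells: List[str]) -> None:
        self.cell_lines = [cell.split("\n") for cell in cells]
        # starts[i] is the virtual-document line at which cell i begins.
        self.starts: List[int] = []
        offset = 0
        for lines in self.cell_lines:
            self.starts.append(offset)
            offset += len(lines)
        self.total_lines = offset
        self.text = "\n".join("\n".join(lines) for lines in self.cell_lines)

    def to_virtual(self, cell: int, line: int) -> int:
        """Map (cell index, line within cell) to a line of the virtual document."""
        return self.starts[cell] + line

    def to_cell(self, virtual_line: int) -> Tuple[int, int]:
        """Map a line of the virtual document back to (cell index, line within cell)."""
        if not 0 <= virtual_line < self.total_lines:
            raise IndexError("line is outside of the virtual document")
        cell = bisect_right(self.starts, virtual_line) - 1
        return cell, virtual_line - self.starts[cell]
```

Because the mapping depends only on line counts, any tool that adds or removes lines in the virtual
document invalidates it, which is the limitation discussed under the unresolved questions.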

# Rationale and alternatives

A previous (stale) JEP proposed to integrate LSP and to adopt the Monaco editor, which would entail
bringing in a heavy dependency and a large reliance on continued development of Monaco by
Microsoft; it was not clear whether Monaco would allow efficient use in a multi-editor notebook
setting, and the work on the integration stalled a few years ago. Unlike that previous proposal, we
**do not** propose to adopt any specific implementation, yet we bring a working implementation for
the CodeMirror 5 editor, which is already in use by two of the official front-ends for Jupyter
(Jupyter Notebook and JupyterLab). While the nearly-feature-complete CodeMirror 6 has specifically
declared LSP integration to be a non-goal, it does provide a number of features which would allow
for cleaner integration of multiple sources of editor annotation, such as named bundles of marks.

Jupyter, which originally drove innovation in the field, is now perceived in some communities as a
driver of bad coding practices due to the lack of an adequate toolset in the official frontends.
Alternative formats to ipynb have been proposed, sometimes with better support for IDE features as
the only motivation.

# Prior art

Multiple editors already support the Language Server Protocol, whether directly or via extension
points, including VSCode, Atom, Brackets (Adobe), Spyder, Visual Studio and many more. The list of
clients and their capabilities is described at the community-maintained
[knowledge site](https://langserver.org/) in the "LSP clients" section and at the official website
of the [LSP protocol](https://microsoft.github.io/language-server-protocol/implementors/tools/).

Multiple proprietary notebook interfaces attempted integration of language features such as those
provided by LSP, including Google Colab, Datalore, Deepnote, and Polynote; due to proprietary
implementation details it is not clear how many of the existing solutions employ LSP (or its subset)
under the hood.

The ongoing integration of the [Debug Adapter Protocol][dap] has demonstrated both the user
benefits and the kernel maintainer costs of "embracing and extending" existing, non-Jupyter
protocols rather than re-implementing them.

# Unresolved questions

The current implementation can be improved by:

1. embedding cell identifiers (and possibly metadata) as comments in the virtual document at a place
   corresponding to the start of each cell (in a jupytext-compatible way), to enable easier
   calculation of positions and implementation of refactoring features (e.g. formatting with black)
   that add or remove lines (which is not currently possible),
   - adding metadata might be required to enable polyglot SOS notebooks, see discussion in
     [#282](https://github.com/krassowski/jupyterlab-lsp/issues/282)
   - one might consider whether it is worth delegating this task to jupytext; this would necessitate
     moving the notebook concatenation logic to the server extension, with a positive side effect of
     exposing it for re-use by other clients, but with the potential downside of needing to
     frequently transfer the entire notebook (on each debounced keypress) to the server extension
     (which could be alleviated if implemented via deltas/diffs; this adds more logic, but given
     that a notebook is just JSON it might be feasible to use an existing tool), and with the
     downside of having the notebook-virtual document position transformation code on both backend
     and frontend, as the frontend part cannot be easily (or at all?) eliminated; as this option
     looks promising, it will be investigated once current performance shortcomings are resolved.
   - see further discussion in [#467](https://github.com/krassowski/jupyterlab-lsp/issues/467)
2. formalizing the grammar of substituting magics with equivalent code or placeholders (which allows
   for one-to-one mapping of magics to code that can be understood by standard refactoring tools and
   back to the magics after the code was transformed by the refactoring tools, for example moved to
   another file), see [#347](https://github.com/krassowski/jupyterlab-lsp/issues/347)
3. abstracting the communication layer between client and server so that different mechanisms can be
   used for such communication, for example:
   - custom, manually managed websocket between the client and jupyter server extension (existing
     solution),
   - websocket managed reusing the kernel comms (acting as a transparent proxy but reducing the
     number of dependencies since in the context of Jupyter the kernel comms are expected to be
     present either way), see the proposed implementation in
     [#278](https://github.com/krassowski/jupyterlab-lsp/pull/278)
   - direct connection to a cloud or self-hosted service providing language intelligence as a
     service, e.g. [sourcegraph](https://about.sourcegraph.com/)
   - (potentially) in-client language servers, such as a JSON Schema-aware language server to assist
     in configuration

There are also smaller fires to put out in the current implementation which we believe do not
warrant further discussion; however, we want to enumerate those to assure a potentially concerned
reader that those topics are being looked at and considered a priority due to the immediate impact
on user and/or developer experience:

- reorganizing deeply nested code into a shallower structure of multiple packages, one per
  feature (with the current state of the repository, half monorepo and half complex project, being
  an annoyance to maintainers and contributors alike)
- improving performance of completer and overall robustness of the features
- enabling integration with other packages providing completion suggestions
- enabling use of multiple LSP servers for a single document
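
Item 1 of the numbered list above (cell identifiers embedded as comments) can be illustrated with
the sketch below. The marker syntax is only loosely modeled on the jupytext percent format and is an
assumption of this example, as are the function names `to_virtual_document` and
`from_virtual_document`; the point is that cell boundaries survive a tool that adds or removes
lines.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical marker comment that opens each cell in the virtual document.
MARKER = re.compile(r'^# %% id="(?P<id>[^"]+)"$')


def to_virtual_document(cells: List[Tuple[str, str]]) -> str:
    """Join (cell_id, source) pairs, prefixing each cell with a marker comment."""
    parts: List[str] = []
    for cell_id, source in cells:
        parts.append(f'# %% id="{cell_id}"')
        parts.append(source)
    return "\n".join(parts)


def from_virtual_document(text: str) -> Dict[str, str]:
    """Split a (possibly reformatted) virtual document back into per-cell sources."""
    cells: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.split("\n"):
        match = MARKER.match(line)
        if match:
            current = match.group("id")
            cells[current] = []
        elif current is not None:
            cells[current].append(line)
    return {cell_id: "\n".join(lines) for cell_id, lines in cells.items()}


notebook_cells = [("a1", "x=1"), ("b2", "def f():\n    return x")]
virtual = to_virtual_document(notebook_cells)
# Simulate a formatter that rewrites one line and inserts two blank lines.
reformatted = virtual.replace("x=1", "x = 1").replace("def f():", "\n\ndef f():")
recovered = from_virtual_document(reformatted)
```

This sketch would misparse a cell whose own source contains a line matching the marker, so a real
design would need an escaping rule or a marker that cannot occur in user code.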

# Future possibilities

- Amending the kernel messaging protocol to ask only for runtime (e.g. keys in a dictionary, columns
  in a data frame) and kernel-specific completions (e.g. magics), excluding static-analysis-based
  completions, to improve the performance of the completer
- Seeding existing linting tools with plugins to support notebook-specific features (empty cells,
  out-of-order execution), largely as envisioned by the pioneering work of the
  [JuLynter](https://dew-uff.github.io/julynter/index.html) experiment
  - also see [lintotype]
- Encouraging contributions to existing language servers and offering a platform for development of
  Jupyter-optimized language servers
- Enabling LSP features in markdown cells
- Implementing support for the related Language Server Index Format (LSIF), a protocol closely
  related to LSP and defined on the
  [specification page](https://microsoft.github.io/language-server-protocol/specifications/lsif/0.5.0/specification/)
  for even faster IDE features for the retrieval of immutable (or infrequently mutable) information,
  such as documentation of built-in functions.

[lintotype]: https://github.com/deathbeds/lintotype/
[dap]:
  https://github.com/jupyter/enhancement-proposals/blob/master/jupyter-debugger-protocol/jupyter-debugger-protocol.md
