Commit 4f8b944

Merge pull request #57 from consideRatio/pr/refresh-readme
Refresh README.md
2 parents 0a3d1ee + 506cea7 commit 4f8b944

1 file changed

+96
-74
lines changed

README.md

Lines changed: 96 additions & 74 deletions
@@ -6,12 +6,19 @@
66
[![Discourse](https://img.shields.io/badge/help_forum-discourse-blue?logo=discourse)](https://discourse.jupyter.org/c/jupyterhub)
77
[![Gitter](https://img.shields.io/badge/social_chat-gitter-blue?logo=gitter)](https://gitter.im/jupyterhub/jupyterhub)
88

9-
`jupyterhub-idle-culler` provides a JupyterHub service to identify and shut down idle or long-running Jupyter Notebook servers.
10-
The exact actions performed are dependent on the used spawner for the Jupyter Notebook server (e.g. the default [LocalProcessSpawner](https://jupyterhub.readthedocs.io/en/stable/api/spawner.html#localprocessspawner>), [kubespawner](https://github.com/jupyterhub/kubespawner), or [dockerspawner](https://github.com/jupyterhub/dockerspawner)).
11-
In addition, if explicitly requested, all users whose Jupyter Notebook servers have been shut down this way are deleted as JupyterHub users from the internal database. This neither affects the authentication method which continues to allow those users to log in nor does it delete persisted user data (e.g. stored in docker volumes for dockerspawner or in persisted volumes for kubespawner).
9+
`jupyterhub-idle-culler` provides a JupyterHub service to identify and stop idle
10+
or long-running Jupyter servers via JupyterHub. It works solely by interacting
11+
with JupyterHub's REST API, and is often configured to run as a JupyterHub
12+
managed service started up by JupyterHub itself.
1213

1314
## Setup
1415

16+
Setup involves three parts:
17+
18+
1. Install the Python package.
19+
2. Configure JupyterHub permissions to work against JupyterHub's REST API.
20+
3. Configure how it's started up, either as a JupyterHub managed service or as a standalone script.
21+
1522
### Installation
1623

1724
```bash
@@ -59,7 +66,7 @@ c.JupyterHub.load_roles = [
5966
### As a hub managed service
6067

6168
In `jupyterhub_config.py`, add the following dictionary for the idle-culler
62-
Service to the `c.JupyterHub.services` list:
69+
service to the `c.JupyterHub.services` list:
6370

6471
```python
6572
c.JupyterHub.services = [
@@ -109,7 +116,7 @@ Then start `jupyterhub-idle-culler` manually.
109116

110117
```bash
111118
export JUPYTERHUB_API_TOKEN=api_token_above...
112-
python3 -m jupyterhub-idle-culler [--timeout=900] [--url=http://localhost:8081/hub/api]
119+
python3 -m jupyterhub_idle_culler [--timeout=900] [--url=http://localhost:8081/hub/api]
113120
```
114121

115122
## Command line flags
@@ -149,101 +156,115 @@ python3 -m jupyterhub-idle-culler [--timeout=900] [--url=http://localhost:8081/h
149156

150157
## Caveats
151158

152-
1. last_activity is not updated with high frequency, so cull timeout should be
153-
greater than the sum of:
159+
1. JupyterHub's `last_activity` data about user servers is not updated with high
160+
frequency, so cull timeout should be greater than the sum of:
154161

155162
- single-user websocket ping interval (default: 30 seconds)
156163
- `JupyterHub.last_activity_interval` (default: 5 minutes)
157164

158-
2. The same `--timeout` and `--max-age` values are used to cull
159-
users and users' servers. If you want a different value for users and servers,
160-
you should add this script to the services list twice, just with different
161-
`name`s, different values, and one with the `--cull-users` option.
165+
2. If you want to use `--cull-users` with a different culling interval for the
166+
user servers and users, you must start two idle culler services. This is
167+
because both are configured via `--timeout` and `--max-age`. To do so,
168+
configure this service to start twice with different configuration, where one
169+
has the `--cull-users` option.
162170

163-
3. By default HTTP requests to the hub timeout after 60 seconds. This can be
164-
changed by setting the `JUPYTERHUB_REQUEST_TIMEOUT` environment variable.
171+
3. By default, `jupyterhub-idle-culler`'s HTTP requests to JupyterHub's REST API
172+
time out after 60 seconds. This can be changed by setting the
173+
`JUPYTERHUB_REQUEST_TIMEOUT` environment variable.
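
Caveats 1 and 2 can be sketched together for `jupyterhub_config.py`. This is a minimal sketch, not the project's own configuration: the `culler_service` helper, the service names, and the timeout values are hypothetical, and each service would still need the permissions granted via `c.JupyterHub.load_roles`, which are not shown here.

```python
import sys

# Defaults quoted in caveat 1, in seconds.
WEBSOCKET_PING_INTERVAL = 30
LAST_ACTIVITY_INTERVAL = 5 * 60
MIN_TIMEOUT = WEBSOCKET_PING_INTERVAL + LAST_ACTIVITY_INTERVAL  # 330 seconds


def culler_service(name, timeout, cull_users=False):
    """Build one entry for c.JupyterHub.services (hypothetical helper)."""
    if timeout <= MIN_TIMEOUT:
        raise ValueError(f"timeout should exceed {MIN_TIMEOUT} seconds")
    command = [
        sys.executable,
        "-m",
        "jupyterhub_idle_culler",
        f"--timeout={timeout}",
    ]
    if cull_users:
        command.append("--cull-users")
    return {"name": name, "command": command}


# Two services with different names and timeouts, one with --cull-users.
services = [
    culler_service("idle-culler-servers", timeout=3600),
    culler_service("idle-culler-users", timeout=7 * 24 * 3600, cull_users=True),
]

# In jupyterhub_config.py this list would be assigned as:
# c.JupyterHub.services = services
```

The check against `MIN_TIMEOUT` only reflects the default intervals; if either interval is configured differently, the lower bound changes accordingly.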
165174

166175
## How it works
167176

168-
jupytehrub-idle-culler lists available users via JupyterHub's [/users][users-api] REST API.
177+
JupyterHub's REST API is used to acquire information about activity. If the
178+
idle culler service decides, based on its configuration, that a server should be
179+
stopped or deleted, it does so via JupyterHub's REST API as well.
180+
181+
### In depth
182+
183+
`jupyterhub-idle-culler` relies on permission to work against JupyterHub's REST
184+
API. This permission is provided via the `JUPYTERHUB_API_TOKEN` environment
185+
variable, which is set automatically for [managed services] started by JupyterHub.
186+
187+
`jupyterhub-idle-culler` lists available users and their servers' reported
188+
`last_activity` via JupyterHub's [`/users`] REST API and makes decisions based on
189+
that. Users' default servers can be stopped via [`/users/{name}/server`], named
190+
servers can be stopped and optionally removed via
191+
[`/users/{name}/servers/{server_name}`], and users can optionally be deleted via
192+
[`/users/{name}`].
169193

170-
[users-api]: https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#path--users
194+
[managed services]: https://jupyterhub.readthedocs.io/en/stable/reference/services.html#launching-a-hub-managed-service
195+
[`/users`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#/default/get_users
196+
[`/users/{name}/server`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#/default/delete_users__name__server
197+
[`/users/{name}/servers/{server_name}`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api/index.html#operation--users--name--servers--server_name--delete
198+
[`/users/{name}`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#/default/delete_users__name_
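
The decision flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the culler's actual implementation: the helper names are hypothetical, the shape of the user records (a `servers` mapping keyed by server name, with `""` for the default server) and the `token` authorization scheme are assumptions, and options such as `--max-age` are ignored. The requests are only built, never sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote
from urllib.request import Request


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2023-01-01T10:00:00.000000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_stop_requests(users, api_url, token, timeout, now):
    """Return DELETE requests for servers idle longer than `timeout` seconds."""
    stop_requests = []
    for user in users:
        name = quote(user["name"], safe="")
        for server_name, server in user.get("servers", {}).items():
            last_activity = server.get("last_activity")
            if last_activity is None:
                continue  # no activity data to base a decision on
            if now - parse_timestamp(last_activity) < timedelta(seconds=timeout):
                continue  # recently active, leave it running
            if server_name:
                # Named server: /users/{name}/servers/{server_name}
                path = f"/users/{name}/servers/{quote(server_name, safe='')}"
            else:
                # Default server: /users/{name}/server
                path = f"/users/{name}/server"
            stop_requests.append(
                Request(
                    api_url + path,
                    method="DELETE",
                    headers={"Authorization": f"token {token}"},
                )
            )
    return stop_requests


# Hypothetical data shaped like a /users listing.
example_users = [
    {"name": "alice", "servers": {"": {"last_activity": "2023-01-01T10:00:00.000000Z"}}},
    {"name": "bob", "servers": {"gpu": {"last_activity": "2023-01-01T11:59:00.000000Z"}}},
    {"name": "carol", "servers": {"gpu": {"last_activity": "2023-01-01T09:00:00.000000Z"}}},
]
example_now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
example_requests = build_stop_requests(
    example_users, "http://localhost:8081/hub/api", "secret", 3600, example_now
)
for request in example_requests:
    print(request.get_method(), request.full_url)
```

Keeping the decision logic separate from the HTTP calls makes it easy to check which servers would be stopped before anything is actually sent to the hub.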
171199

172-
jupyterhub-idle-culler culls user servers using JupyterHub's REST API
173-
([/users/{name}/server](https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#operation--users--name--server-delete)
174-
or
175-
[/users/{name}/servers/{server_name}](https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#operation--users--name--servers--server_name--delete)),
176-
and makes the culling decisions based on its configuration and what JupyterHub
177-
reports about the user servers via its REST API
178-
[(/users)][users-api]
179-
where user servers' `last_activity` is reported back.
200+
JupyterHub's reported `last_activity` for user servers is updated by JupyterHub
201+
at a regular interval in the [`update_last_activity` function] that relies on
202+
two sources of information.
180203

181-
The `last_activity` that JupyterHub reports is the most recent summary of
182-
information updated at a regular interval via the [`update_last_activity`
183-
function](https://github.com/jupyterhub/jupyterhub/blob/1.4.2/jupyterhub/app.py#L2646)
184-
that combines two sources of information.
204+
[`update_last_activity` function]: https://github.com/jupyterhub/jupyterhub/blob/3.1.1/jupyterhub/app.py#L3002
185205

186206
1. **The proxy's routes data**
187207

188-
The `update_last_activity` function will [ask the
189-
proxy](https://jupyterhub.readthedocs.io/en/stable/reference/proxy.html#retrieving-routes)
190-
for the active routes like `/user/user1` and collects associated
191-
`last_activity` data if it is available. This activity represents
192-
successfully proxies network traffic.
208+
The configurable proxy class for JupyterHub is an interface for JupyterHub to
209+
request routing of network traffic to user servers. Through this interface,
210+
JupyterHub can be informed about network activity if the proxy class provides it,
211+
specifically via the [`get_all_routes`] function.
193212

194-
`last_activity` data for routes will be available when using
195-
[configurable-http-proxy](https://github.com/jupyterhub/configurable-http-proxy#readme)
196-
as JupyterHub does by default, but if for example
197-
[traefik-proxy](https://github.com/jupyterhub/traefik-proxy#readme) is used
198-
as it is in the [TLJH distribution](https://tljh.jupyter.org), no such data
199-
will be available.
213+
The [configurable-http-proxy] used in https://z2jh.jupyter.org provides
214+
information about network routes activity, but [traefik-proxy] used in
215+
https://tljh.jupyter.org [currently does not].
216+
217+
[`get_all_routes`]: https://jupyterhub.readthedocs.io/en/stable/reference/proxy.html#retrieving-routes
218+
[configurable-http-proxy]: https://github.com/jupyterhub/configurable-http-proxy#readme
219+
[traefik-proxy]: https://github.com/jupyterhub/traefik-proxy#readme
220+
[currently does not]: https://github.com/jupyterhub/traefik-proxy/issues/151
200221

201222
2. **The user server's activity reports**
202223

203224
The `update_last_activity` function also reads JupyterHub's database that
204225
keeps state about servers' `last_activity`. These database records are updated
205-
whenever a server notifies JupyterHub about activity, as they are
206-
responsible to do.
226+
whenever a server notifies JupyterHub about activity, as they are required to
227+
do.
207228

208-
Servers notify JupyterHub about activity by being started by the
209-
[`jupyterhub-singleuser`](https://github.com/jupyterhub/jupyterhub/blob/1.4.2/setup.py#L115)
210-
script that is made available by installing jupyterhub (or `jupyterhub-base`
211-
on conda-forge).
229+
Before JupyterHub 4, servers notified JupyterHub about activity by being
230+
started by the [`jupyterhub-singleuser`] script made available by installing
231+
`jupyterhub` (or `jupyterhub-singleuser` on conda-forge). With JupyterHub 4+
232+
and jupyter_server 2+, a jupyter_server server extension can be used instead.
212233

213234
The `jupyterhub-singleuser` script launches a modified server application
214235
that keeps JupyterHub updated with the server activity via the
215-
[`notify_activity`](https://github.com/jupyterhub/jupyterhub/blob/1.4.2/jupyterhub/singleuser/mixins.py#L497)
216-
function.
236+
[`notify_activity`] function.
217237

218238
The `notify_activity` function in turn makes use of the server application's
219-
`last_activity` function (see implementation in
220-
[NotebookApp](https://github.com/jupyter/notebook/blob/v6.4.0/notebook/notebookapp.py#L392-L397)
221-
and
222-
[ServerApp](https://github.com/jupyter-server/jupyter_server/blob/v1.9.0/jupyter_server/serverapp.py#L375)
239+
`last_activity` function (see implementation in [NotebookApp] and [ServerApp]
223240
respectively) that combines information from API activity, kernel
224241
activity, kernel shutdown, and terminal activity. This activity also covers
225242
activity of applications like RStudio running via `jupyter-server-proxy`.
226243

244+
[`jupyterhub-singleuser`]: https://github.com/jupyterhub/jupyterhub/blob/3.1.1/setup.py#L112
245+
[`notify_activity`]: https://github.com/jupyterhub/jupyterhub/blob/3.1.1/jupyterhub/singleuser/mixins.py#L532
246+
[notebookapp]: https://github.com/jupyter/notebook/blob/v6.5.2/notebook/notebookapp.py#L391-L396
247+
[serverapp]: https://github.com/jupyter-server/jupyter_server/blob/v1.23.5/jupyter_server/serverapp.py#L446-L451
248+
227249
Here is a summary of what's described so far:
228250

229-
1. jupyterhub-idle-culler culls servers via JupyterHub's REST API.
230-
2. jupyterhub-idle-culler makes decisions based on information retrieved by
231-
JupyterHub REST API.
232-
3. JupyterHub REST API reports information regularly updated by summarizing
233-
information gained by: asking the proxy about routes' activity, and by
234-
retaining activity information reported by the servers.
251+
1. `jupyterhub-idle-culler` collects information and acts entirely through
252+
JupyterHub's REST API.
253+
2. `jupyterhub-idle-culler` makes decisions based on information provided by
254+
JupyterHub, which collects activity reports from the user servers and polls
255+
the proxy class for information about user servers' network activity.
235256

236257
Now, as the server's kernel activity influences the activity that servers will
237258
notify JupyterHub about, the kernel activity in turn influences
238-
jupyterhub-idle-culler. Due to this, it can be relevant to also learn a little
259+
`jupyterhub-idle-culler`. Due to this, it can be relevant to also learn a little
239260
about a mechanism to _cull idle kernels_ as well even though
240-
jupyterhub-idle-culler isn't involved in that.
261+
`jupyterhub-idle-culler` isn't involved in that.
241262

242-
The default kernel manager, the MappingKernelManager, can be configured to cull
243-
idle kernels. Its configuration is documented in
244-
[NotebookApp's](https://jupyter-notebook.readthedocs.io/en/stable/config.html#options)
263+
The default kernel manager, the `MappingKernelManager`, can be configured to
264+
cull idle kernels. Its configuration is documented in
265+
[ServerApp's](https://jupyter-server.readthedocs.io/en/stable/other/full-config.html)
245266
and
246-
[ServerApp's](https://jupyter-server.readthedocs.io/en/latest/full-config.html)
267+
[NotebookApp's](https://jupyter-notebook.readthedocs.io/en/stable/config.html#options)
247268
respective documentation, and here are some relevant kernel culling
248269
configuration options:
249270

@@ -258,24 +279,25 @@ configuration options:
258279
details](https://github.com/jupyterlab/jupyterlab/issues/6893).
259280

260281
Also note that configuration of MappingKernelManager should be made on the
261-
user server itself, for example via a `jupyter_notebook_config.py` file in
282+
user server itself, for example via a `jupyter_server_config.py` file in
262283
`/etc/jupyter` or `/usr/local/etc/jupyter` rather than where JupyterHub is
263284
running.
264285

265-
Finally, note that a Jupyter Notebook server can shut itself down without intervention by jupyterhub-idle-culler if
266-
`NotebookApp.shutdown_no_activity_timeout` is configured.
286+
Finally, note that a Jupyter server can shut itself down without intervention by
287+
`jupyterhub-idle-culler` if `ServerApp.shutdown_no_activity_timeout` is
288+
configured.
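
As an illustration of the two preceding paragraphs, a `jupyter_server_config.py` placed on the user server might look like the fragment below. This is a sketch with illustrative values, not a recommendation: `cull_idle_timeout` and `cull_interval` are assumed here to be the relevant `MappingKernelManager` options, and `get_config()` is only available when Jupyter loads the file, so the fragment is not meant to be run on its own.

```python
# jupyter_server_config.py, on the user server (e.g. in /etc/jupyter),
# not where JupyterHub itself is running.
c = get_config()  # noqa: F821 (provided by Jupyter when loading this file)

# Cull kernels that have been idle for more than one hour (assumed option),
# checking for idle kernels every five minutes (assumed option).
c.MappingKernelManager.cull_idle_timeout = 3600
c.MappingKernelManager.cull_interval = 300

# Let the server shut itself down after two hours without activity,
# independently of jupyterhub-idle-culler.
c.ServerApp.shutdown_no_activity_timeout = 7200
```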
267289

268290
### Caveats
269291

270292
#### Pagination
271293

272-
JupyterHub 2.0 introduces pagination to the [/users][users-api] API endpoint.
273-
This pagination does not guarantee a consistent snapshot
274-
for consecutive requests spread over time,
275-
so it is possible for a highly active hub to occasionally miss culling users crossing page boundaries between requests.
276-
This is expected to be an infrequent occurrence and only result in delaying a server being culled by one cull interval
277-
in realistic scenarios, so of minor consequence in JupyterHub.
294+
JupyterHub 2.0 introduces pagination to the [`/users`] API endpoint. This
295+
pagination does not guarantee a consistent snapshot for consecutive requests
296+
spread over time, so it is possible for a highly active hub to occasionally miss
297+
culling users crossing page boundaries between requests. This is expected to be
298+
an infrequent occurrence and only result in delaying a server being culled by
299+
one cull interval in realistic scenarios, so of minor consequence in JupyterHub.
278300

279-
The issue can be mitigated by requesting a larger page size,
280-
via e.g. `--api-page-size=200`,
281-
but feel free to open an issue if this is causing a problem for you.
301+
The issue can be mitigated by requesting a larger page size, via e.g.
302+
`--api-page-size=200`, but feel free to open an issue if this is causing a
303+
problem for you.
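
The boundary effect can be illustrated with a toy model. This is not JupyterHub's pagination implementation: `fetch_page` is a hypothetical stand-in for one listing request, and the user names are made up. It only shows how a change between two offset-based requests can cause an entry to be skipped, and why a single larger page avoids that.

```python
def fetch_page(users, offset, limit):
    """Toy stand-in for one paginated listing request."""
    return users[offset : offset + limit]


hub_users = ["ada", "bob", "cyd", "dee", "eve", "fay"]

# Small pages: the user list changes between the two requests.
seen = fetch_page(hub_users, offset=0, limit=3)
hub_users.remove("ada")  # e.g. a user is deleted on a busy hub
seen += fetch_page(hub_users, offset=3, limit=3)
missed_small_pages = sorted(set(hub_users) - set(seen))

# One large page: a single request covers everyone, so nothing shifts.
seen_large = fetch_page(hub_users, offset=0, limit=200)
missed_large_page = sorted(set(hub_users) - set(seen_large))

print(missed_small_pages)  # ['dee']
print(missed_large_page)   # []
```

In the toy model the skipped user is simply picked up on the next pass, which matches the expectation that culling is delayed by one cull interval rather than lost.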
