[![Discourse](https://img.shields.io/badge/help_forum-discourse-blue?logo=discourse)](https://discourse.jupyter.org/c/jupyterhub)
[![Gitter](https://img.shields.io/badge/social_chat-gitter-blue?logo=gitter)](https://gitter.im/jupyterhub/jupyterhub)

- `jupyterhub-idle-culler` provides a JupyterHub service to identify and shut down idle or long-running Jupyter Notebook servers.
- The exact actions performed are dependent on the used spawner for the Jupyter Notebook server (e.g. the default [LocalProcessSpawner](https://jupyterhub.readthedocs.io/en/stable/api/spawner.html#localprocessspawner>), [kubespawner](https://github.com/jupyterhub/kubespawner), or [dockerspawner](https://github.com/jupyterhub/dockerspawner)).
- In addition, if explicitly requested, all users whose Jupyter Notebook servers have been shut down this way are deleted as JupyterHub users from the internal database. This neither affects the authentication method which continues to allow those users to log in nor does it delete persisted user data (e.g. stored in docker volumes for dockerspawner or in persisted volumes for kubespawner).
+ `jupyterhub-idle-culler` provides a JupyterHub service to identify and stop idle
+ or long-running Jupyter servers via JupyterHub. It works solely by interacting
+ with JupyterHub's REST API, and is often configured to run as a JupyterHub
+ managed service started up by JupyterHub itself.

## Setup

+ Setup involves three parts:
+
+ 1. Install the Python package.
+ 2. Configure JupyterHub permissions to work against JupyterHub's REST API.
+ 3. Configure how it's started up, either as a JupyterHub managed service or as a standalone script.
+
### Installation

``` bash
@@ -59,7 +66,7 @@ c.JupyterHub.load_roles = [
### As a hub managed service

In `jupyterhub_config.py`, add the following dictionary for the idle-culler
- Service to the `c.JupyterHub.services` list:
+ service to the `c.JupyterHub.services` list:

```python
c.JupyterHub.services = [
@@ -109,7 +116,7 @@ Then start `jupyterhub-idle-culler` manually.

```bash
export JUPYTERHUB_API_TOKEN=api_token_above...
- python3 -m jupyterhub-idle-culler [--timeout=900] [--url=http://localhost:8081/hub/api]
+ python3 -m jupyterhub_idle_culler [--timeout=900] [--url=http://localhost:8081/hub/api]
```

## Command line flags
@@ -149,101 +156,115 @@ python3 -m jupyterhub-idle-culler [--timeout=900] [--url=http://localhost:8081/h

## Caveats

- 1. last_activity is not updated with high frequency, so cull timeout should be
-    greater than the sum of:
+ 1. JupyterHub's `last_activity` data about user servers is not updated with high
+    frequency, so the cull timeout should be greater than the sum of:

   - single-user websocket ping interval (default: 30 seconds)
   - `JupyterHub.last_activity_interval` (default: 5 minutes)

- 2. The same `--timeout` and `--max-age` values are used to cull
-    users and users' servers. If you want a different value for users and servers,
-    you should add this script to the services list twice, just with different
-    `name`s, different values, and one with the `--cull-users` option.
+ 2. If you want to use `--cull-users` with a different culling interval for
+    users than for their servers, you must start two idle culler services,
+    because both intervals are configured via `--timeout` and `--max-age`. To
+    do so, configure this service to start twice with different configuration,
+    where one has the `--cull-users` option.

- 3. By default HTTP requests to the hub timeout after 60 seconds. This can be
-    changed by setting the `JUPYTERHUB_REQUEST_TIMEOUT` environment variable.
+ 3. By default, HTTP requests from `jupyterhub-idle-culler` to JupyterHub's REST
+    API time out after 60 seconds. This can be changed by setting the
+    `JUPYTERHUB_REQUEST_TIMEOUT` environment variable.
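
Caveats 1 and 2 can be sketched in Python. This is a minimal illustration and not part of `jupyterhub-idle-culler`: the helpers `minimum_cull_timeout` and `culler_service`, the service names, and the chosen timeouts are assumptions made for the example. Only the `--timeout` and `--cull-users` flags, the module name, and the default intervals come from the text above.

```python
import sys

# Defaults quoted in caveat 1, in seconds.
WEBSOCKET_PING_INTERVAL = 30
LAST_ACTIVITY_INTERVAL = 5 * 60


def minimum_cull_timeout(ping_interval=WEBSOCKET_PING_INTERVAL,
                         last_activity_interval=LAST_ACTIVITY_INTERVAL):
    """Return the value that --timeout should exceed (caveat 1)."""
    return ping_interval + last_activity_interval


def culler_service(name, timeout, cull_users=False):
    """Build one services entry (hypothetical helper, not a real API)."""
    if timeout <= minimum_cull_timeout():
        raise ValueError(f"timeout of {timeout}s is too short to be reliable")
    command = [sys.executable, "-m", "jupyterhub_idle_culler", f"--timeout={timeout}"]
    if cull_users:
        command.append("--cull-users")
    return {"name": name, "command": command}


# Caveat 2: two services with different timeouts, one using --cull-users.
services = [
    culler_service("idle-culler-servers", timeout=3600),
    culler_service("idle-culler-users", timeout=30 * 24 * 3600, cull_users=True),
]

# In jupyterhub_config.py this list would be assigned with:
# c.JupyterHub.services = services
```

Each service would still need the permissions described in the setup section before it can act against the REST API.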

## How it works

- jupytehrub-idle-culler lists available users via JupyterHub's [/users][users-api] REST API.
+ JupyterHub's REST API is used to acquire information about activity. If the
+ idle culler service decides, based on its configuration, that a server should
+ be stopped or deleted, it also does so via JupyterHub's REST API.
+
+ ### In depth
+
+ `jupyterhub-idle-culler` relies on permission to work against JupyterHub's REST
+ API. This permission is provided via the `JUPYTERHUB_API_TOKEN` environment
+ variable, which is set automatically for [managed services] started by
+ JupyterHub.
+
+ `jupyterhub-idle-culler` lists available users and their servers' reported
+ `last_activity` via JupyterHub's [`/users`] REST API and makes decisions based on
+ that. Users' default servers can be stopped via [`/users/{name}/server`], named
+ servers can be stopped and optionally removed via
+ [`/users/{name}/servers/{server_name}`], and users can optionally be deleted via
+ [`/users/{name}`].

- [users-api]: https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#path--users
+ [managed services]: https://jupyterhub.readthedocs.io/en/stable/reference/services.html#launching-a-hub-managed-service
+ [`/users`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#/default/get_users
+ [`/users/{name}/server`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#/default/delete_users__name__server
+ [`/users/{name}/servers/{server_name}`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api/index.html#operation--users--name--servers--server_name--delete
+ [`/users/{name}`]: https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#/default/delete_users__name_
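
The decision step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the service's actual implementation: the function name `should_cull` and its parameters are hypothetical, and the timestamps stand in for the `last_activity` values that the `/users` API reports.

```python
from datetime import datetime, timedelta, timezone


def should_cull(last_activity, started, now, timeout, max_age=0):
    """Decide whether a server looks idle or too old (simplified sketch)."""
    inactive_seconds = (now - last_activity).total_seconds()
    age_seconds = (now - started).total_seconds()
    if inactive_seconds >= timeout:
        return True
    # A max_age of 0 is taken to mean "no age limit" in this sketch.
    return bool(max_age) and age_seconds >= max_age


now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
started = now - timedelta(hours=5)

# Idle for two hours with a one hour timeout: culled.
idle = should_cull(now - timedelta(hours=2), started, now, timeout=3600)
# Active five minutes ago and no age limit: kept.
active = should_cull(now - timedelta(minutes=5), started, now, timeout=3600)
# Active recently, but running longer than a four hour max age: culled.
too_old = should_cull(now - timedelta(minutes=5), started, now,
                      timeout=3600, max_age=4 * 3600)
print(idle, active, too_old)
```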

- jupyterhub-idle-culler culls user servers using JupyterHub's REST API
- ([/users/{name}/server](https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#operation--users--name--server-delete)
- or
- [/users/{name}/servers/{server_name}](https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#operation--users--name--servers--server_name--delete)),
- and makes the culling decisions based on its configuration and what JupyterHub
- reports about the user servers via its REST API
- [(/users)][users-api]
- where user servers' `last_activity` is reported back.
+ JupyterHub's reported `last_activity` for user servers is updated by JupyterHub
+ at a regular interval in the [`update_last_activity` function] that relies on
+ two sources of information.

- The `last_activity` that JupyterHub reports is the most recent summary of
- information updated at a regular interval via the [`update_last_activity`
- function](https://github.com/jupyterhub/jupyterhub/blob/1.4.2/jupyterhub/app.py#L2646)
- that combines two sources of information.
+ [`update_last_activity` function]: https://github.com/jupyterhub/jupyterhub/blob/3.1.1/jupyterhub/app.py#L3002

1. **The proxy's routes data**

-    The `update_last_activity` function will [ask the
-    proxy](https://jupyterhub.readthedocs.io/en/stable/reference/proxy.html#retrieving-routes)
-    for the active routes like `/user/user1` and collects associated
-    `last_activity` data if it is available. This activity represents
-    successfully proxies network traffic.
+    The configurable proxy class for JupyterHub is an interface for JupyterHub to
+    request routing of network traffic to user servers. Through this interface,
+    JupyterHub can be informed about network activity if the proxy class provides
+    it, specifically via the [`get_all_routes`] function.

-    `last_activity` data for routes will be available when using
-    [configurable-http-proxy](https://github.com/jupyterhub/configurable-http-proxy#readme)
-    as JupyterHub does by default, but if for example
-    [traefik-proxy](https://github.com/jupyterhub/traefik-proxy#readme) is used
-    as it is in the [TLJH distribution](https://tljh.jupyter.org), no such data
-    will be available.
+    The [configurable-http-proxy] used in https://z2jh.jupyter.org provides
+    information about network route activity, but [traefik-proxy] used in
+    https://tljh.jupyter.org [currently does not].
+
+    [`get_all_routes`]: https://jupyterhub.readthedocs.io/en/stable/reference/proxy.html#retrieving-routes
+    [configurable-http-proxy]: https://github.com/jupyterhub/configurable-http-proxy#readme
+    [traefik-proxy]: https://github.com/jupyterhub/traefik-proxy#readme
+    [currently does not]: https://github.com/jupyterhub/traefik-proxy/issues/151

2. **The user server's activity reports**

   The `update_last_activity` function also reads JupyterHub's database that
   keeps state about servers' `last_activity`. These database records are updated
-    whenever a server notifies JupyterHub about activity, as they are
-    responsible to do.
+    whenever a server notifies JupyterHub about activity, as they are required to
+    do.

-    Servers notify JupyterHub about activity by being started by the
-    [`jupyterhub-singleuser`](https://github.com/jupyterhub/jupyterhub/blob/1.4.2/setup.py#L115)
-    script that is made available by installing jupyterhub (or `jupyterhub-base`
-    on conda-forge).
+    Before JupyterHub 4, servers notified JupyterHub about activity by being
+    started by the [`jupyterhub-singleuser`] script made available by installing
+    `jupyterhub` (or `jupyterhub-singleuser` on conda-forge). With JupyterHub 4+
+    and jupyter_server 2+, a jupyter_server server extension can be used instead.

   The `jupyterhub-singleuser` script launches a modified server application
   that keeps JupyterHub updated with the server activity via the
-    [`notify_activity`](https://github.com/jupyterhub/jupyterhub/blob/1.4.2/jupyterhub/singleuser/mixins.py#L497)
-    function.
+    [`notify_activity`] function.

   The `notify_activity` function in turn makes use of the server application's
-    `last_activity` function (see implementation in
-    [NotebookApp](https://github.com/jupyter/notebook/blob/v6.4.0/notebook/notebookapp.py#L392-L397)
-    and
-    [ServerApp](https://github.com/jupyter-server/jupyter_server/blob/v1.9.0/jupyter_server/serverapp.py#L375)
+    `last_activity` function (see implementation in [NotebookApp] and [ServerApp]
   respectively) that combines information from API activity, kernel
   activity, kernel shutdown, and terminal activity. This activity also covers
   activity of applications like RStudio running via `jupyter-server-proxy`.

+    [`jupyterhub-singleuser`]: https://github.com/jupyterhub/jupyterhub/blob/3.1.1/setup.py#L112
+    [`notify_activity`]: https://github.com/jupyterhub/jupyterhub/blob/3.1.1/jupyterhub/singleuser/mixins.py#L532
+    [notebookapp]: https://github.com/jupyter/notebook/blob/v6.5.2/notebook/notebookapp.py#L391-L396
+    [serverapp]: https://github.com/jupyter-server/jupyter_server/blob/v1.23.5/jupyter_server/serverapp.py#L446-L451
+
Here is a summary of what's described so far:

- 1. jupyterhub-idle-culler culls servers via JupyterHub's REST API.
- 2. jupyterhub-idle-culler makes decisions based on information retrieved by
-    JupyterHub REST API.
- 3. JupyterHub REST API reports information regularly updated by summarizing
-    information gained by: asking the proxy about routes' activity, and by
-    retaining activity information reported by the servers.
+ 1. `jupyterhub-idle-culler` collects information and acts entirely through
+    JupyterHub's REST API.
+ 2. `jupyterhub-idle-culler` makes decisions based on information provided by
+    JupyterHub, which collects activity reports from the user servers and polls
+    the proxy class for information about user servers' network activity.

Now, as the server's kernel activity influences the activity that servers will
notify JupyterHub about, the kernel activity in turn influences
- jupyterhub-idle-culler. Due to this, it can be relevant to also learn a little
+ `jupyterhub-idle-culler`. Due to this, it can be relevant to also learn a little
about a mechanism to _cull idle kernels_, even though
- jupyterhub-idle-culler isn't involved in that.
+ `jupyterhub-idle-culler` isn't involved in that.

- The default kernel manager, the MappingKernelManager, can be configured to cull
- idle kernels. Its configuration is documented in
- [NotebookApp's](https://jupyter-notebook.readthedocs.io/en/stable/config.html#options)
+ The default kernel manager, the `MappingKernelManager`, can be configured to
+ cull idle kernels. Its configuration is documented in
+ [ServerApp's](https://jupyter-server.readthedocs.io/en/stable/other/full-config.html)
and
- [ServerApp's](https://jupyter-server.readthedocs.io/en/latest/full-config.html)
+ [NotebookApp's](https://jupyter-notebook.readthedocs.io/en/stable/config.html#options)
respective documentation, and here are some relevant kernel culling
configuration options:

@@ -258,24 +279,25 @@ configuration options:
details](https://github.com/jupyterlab/jupyterlab/issues/6893).

Also note that configuration of MappingKernelManager should be made on the
- user server itself, for example via a `jupyter_notebook_config.py` file in
+ user server itself, for example via a `jupyter_server_config.py` file in
`/etc/jupyter` or `/usr/local/etc/jupyter` rather than where JupyterHub is
running.

- Finally, note that a Jupyter Notebook server can shut itself down without intervention by jupyterhub-idle-culler if
- `NotebookApp.shutdown_no_activity_timeout` is configured.
+ Finally, note that a Jupyter server can shut itself down without intervention by
+ `jupyterhub-idle-culler` if `ServerApp.shutdown_no_activity_timeout` is
+ configured.

### Caveats

#### Pagination

- JupyterHub 2.0 introduces pagination to the [/users][users-api] API endpoint.
- This pagination does not guarantee a consistent snapshot
- for consecutive requests spread over time,
- so it is possible for a highly active hub to occasionally miss culling users crossing page boundaries between requests.
- This is expected to be an infrequent occurrence and only result in delaying a server being culled by one cull interval
- in realistic scenarios, so of minor consequence in JupyterHub.
+ JupyterHub 2.0 introduces pagination to the [`/users`] API endpoint. This
+ pagination does not guarantee a consistent snapshot for consecutive requests
+ spread over time, so it is possible for a highly active hub to occasionally miss
+ culling users crossing page boundaries between requests. This is expected to be
+ an infrequent occurrence that, in realistic scenarios, only delays a server
+ being culled by one cull interval, so it is of minor consequence.

- The issue can be mitigated by requesting a larger page size,
- via e.g. `--api-page-size=200`,
- but feel free to open an issue if this is causing a problem for you.
+ The issue can be mitigated by requesting a larger page size, via e.g.
+ `--api-page-size=200`, but feel free to open an issue if this is causing a
+ problem for you.
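
The page boundary effect described above can be reproduced with a small simulation. This is only a sketch: `fetch_page` is a hypothetical stand-in for one paginated request, and the `offset`/`limit` style of paging is an assumption of the example rather than a description of the service's code.

```python
def fetch_page(users, offset, limit):
    """Stand-in for one paginated request against the current user list."""
    return users[offset:offset + limit]


def list_all(users, limit, delete_after_first_page=None):
    """Page through users, optionally deleting one user between requests."""
    seen, offset = [], 0
    while True:
        page = fetch_page(users, offset, limit)
        if not page:
            return seen
        seen.extend(page)
        offset += limit
        # Simulate the hub changing while the listing is still in progress.
        if delete_after_first_page in users:
            users.remove(delete_after_first_page)


names = ["alice", "bob", "carol", "dave"]

# Small pages: deleting "alice" shifts "carol" across the page boundary.
small_pages = list_all(list(names), limit=2, delete_after_first_page="alice")
# One large page: every user is seen in a single request.
large_pages = list_all(list(names), limit=200, delete_after_first_page="alice")
```

With the small page size, `carol` is skipped in this pass and would only be considered at the next cull interval, which is why a larger page size mitigates the issue.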