* wip kubernetes scheduler
* proof-of-concept
* refactoring
* fixes
* fixes
* fixes
* fixes
* fixes
* fixes
* ci
* ci
* use jobs
* Merge branch 'master' into cluster-k8s
* fixes
* Merge branch 'cluster-k8s' of github.com:scalableminds/webknossos-libs into cluster-k8s
* ttl for job
* poetry locks
* test refactoring
* ci
* ci
* fixes?
* ci
* ci
* fixes
* fixes
* mounts
* fixes for mounts
* changelog
* readme
* readme
* pr feedback + wkcuber integration
* refactor tests
* ci
* test fixes
* test fixes
* ci
* ci
* ci
* fixes?
* ci
* bool
* better job_id, job_index separation
* fixes?
* fixes
* fixes
* reactivate tests
* deduplicate mounts
* fixes
* readme
* ci
* fixes
* using objects instead of dict
* ci
* ci
* ci
* ci
* ci
* Apply suggestions from code review
Co-authored-by: Philipp Otto <[email protected]>
* better mount test
* Merge branch 'master' into cluster-k8s
* comment
cluster_tools/Changelog.md: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ For upgrade instructions, please check the respective *Breaking Changes* section
 ### Breaking Changes

 ### Added
+* Added `KubernetesExecutor` for parallelizing Python scripts on a Kubernetes cluster. [#600](https://github.com/scalableminds/webknossos-libs/pull/600)
cluster_tools/README.md: 25 additions & 2 deletions
@@ -4,7 +4,6 @@

 This package provides python `Executor` classes for distributing tasks on a slurm cluster or via multi processing.

-
 ## Example

 ```python
@@ -24,11 +23,35 @@ if __name__ == '__main__':

 ### Slurm

-The cluster_tools automatically determine the slurm limit for maximum array job size and split up larger job batches into multiple smaller batches.
+The `cluster_tools` automatically determine the slurm limit for maximum array job size and split up larger job batches into multiple smaller batches.
 Also, the slurm limit for the maximum number of jobs which are allowed to be submitted by a user at the same time is honored by looking up the number of currently submitted jobs and only submitting new batches if they fit within the limit.

 If you would like to configure these limits independently, you can do so by setting the `SLURM_MAX_ARRAY_SIZE` and `SLURM_MAX_SUBMIT_JOBS` environment variables.
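The batching behaviour described in the Slurm paragraphs above can be sketched roughly as follows. This is not the `cluster_tools` implementation: the helper names `split_into_batches` and `fits_within_submit_limit` and the fallback value of 1000 are assumptions made for illustration. Only the `SLURM_MAX_ARRAY_SIZE` variable name is taken from the README.

```python
import os
from typing import List, Sequence


def split_into_batches(job_indices: Sequence[int], max_array_size: int) -> List[List[int]]:
    """Split job indices into consecutive batches of at most max_array_size."""
    if max_array_size < 1:
        raise ValueError("max_array_size must be at least 1")
    return [
        list(job_indices[start : start + max_array_size])
        for start in range(0, len(job_indices), max_array_size)
    ]


def fits_within_submit_limit(currently_submitted: int, batch_size: int, max_submit_jobs: int) -> bool:
    """Return True if a new batch can be submitted without exceeding the per-user limit."""
    return currently_submitted + batch_size <= max_submit_jobs


# Hypothetical fallback of 1000; a real limit would come from the Slurm configuration.
max_array_size = int(os.environ.get("SLURM_MAX_ARRAY_SIZE", "1000"))
batches = split_into_batches(range(2500), max_array_size)
print([len(batch) for batch in batches])
```

A submitter built on these helpers would hold back a batch until `fits_within_submit_limit` returns `True` for the current number of queued jobs.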
+| `namespace` | Kubernetes namespace for the resources to be created. Will be created if not existent. | `cluster-tools` |
+| `node_selector` | Which nodes to utilize for the processing. Needs to be a [Kubernetes `nodeSelector` object](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/). | `{"kubernetes.io/hostname": "node001"}` |
+| `image` | The docker image for the containerized jobs to run in. The image needs to have the same version of `cluster_tools` and the code to run installed and in the `PYTHONPATH`. | `scalableminds/voxelytics:latest` |
+| `mounts` | Additional mounts for the containerized jobs. The current working directory and the `.cfut` directory are automatically mounted. | `["/srv", "/data"]` |
+| `cpu` | [CPU requirements](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) for this job. | `4` |
+| `memory` | [Memory requirements](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) for this job. Not required, but highly recommended to avoid congestion. Without resource requirements, all jobs will be run in parallel and RAM will run out soon. | `16G` |
+| `python_executable` | The python executable may differ in the docker image from the one in the current environment. For images based on `FROM python`, it should be `python`. Defaults to `python`. | `python3.8` |
+| `umask` | `umask` for the jobs. | `0002` |
+
+#### Notes
+
+- The jobs are run with the current `uid:gid`.
+- The jobs are removed 7 days after completion (successful or not).
+- The logs are stored in the `.cfut` directory. This is actually redundant, because Kubernetes also stores them.
+- Pods are not restarted upon error.
+- Requires Kubernetes ≥ 1.23.
+- [Kubernetes cluster configuration](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) is expected to be the same as for `kubectl`, i.e. in `~/.kube/config` or similar.
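To make the table and notes above more concrete, here is a minimal sketch of how such settings could map onto a Kubernetes `batch/v1` Job manifest, written as a plain Python dict. It is an assumption-laden illustration, not the code added in this PR: `build_job_manifest`, the container name `worker`, the use of `hostPath` volumes, and `backoffLimit: 0` are all hypothetical choices. The manifest field names themselves (`ttlSecondsAfterFinished`, `restartPolicy`, `nodeSelector`, `securityContext`) are standard Kubernetes fields.

```python
import os
from typing import Any, Dict, List

SEVEN_DAYS_IN_SECONDS = 7 * 24 * 60 * 60


def build_job_manifest(job_name: str, command: List[str], job_resources: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate the README's resource keys into a Job manifest."""
    # Deduplicate mounts while preserving their order.
    mounts = list(dict.fromkeys(job_resources.get("mounts", [])))
    # Assumption: mounts are exposed as hostPath volumes at the same path.
    volumes = [{"name": f"mount-{i}", "hostPath": {"path": path}} for i, path in enumerate(mounts)]
    volume_mounts = [{"name": f"mount-{i}", "mountPath": path} for i, path in enumerate(mounts)]
    requests = {key: str(job_resources[key]) for key in ("cpu", "memory") if key in job_resources}
    pod_spec: Dict[str, Any] = {
        "containers": [
            {
                "name": "worker",
                "image": job_resources["image"],
                "command": command,
                "resources": {"requests": requests},
                "volumeMounts": volume_mounts,
            }
        ],
        # Pods are not restarted upon error.
        "restartPolicy": "Never",
        # Run with the uid:gid of the submitting process (Unix only).
        "securityContext": {"runAsUser": os.getuid(), "runAsGroup": os.getgid()},
        "volumes": volumes,
    }
    if "node_selector" in job_resources:
        pod_spec["nodeSelector"] = job_resources["node_selector"]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name, "namespace": job_resources["namespace"]},
        "spec": {
            # Finished jobs are cleaned up after 7 days.
            "ttlSecondsAfterFinished": SEVEN_DAYS_IN_SECONDS,
            "backoffLimit": 0,
            "template": {"spec": pod_spec},
        },
    }


manifest = build_job_manifest(
    "example-job",
    ["python", "-c", "print('hello')"],
    {
        "namespace": "cluster-tools",
        "image": "scalableminds/voxelytics:latest",
        "mounts": ["/srv", "/data", "/srv"],
        "cpu": 4,
        "memory": "16G",
        "node_selector": {"kubernetes.io/hostname": "node001"},
    },
)
print(manifest["spec"]["ttlSecondsAfterFinished"])  # 604800
```

The example values are the ones shown in the table's last column. Building the manifest as a dict keeps the sketch free of any client library, so it can be inspected or dumped to YAML without a cluster.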