Using GPU-optimized NGC images as base for ML (PyTorch/TensorFlow) Docker images #457

@weiji14

Consolidating some of the discussion @ngam had around using NVIDIA GPU Cloud (NGC) containers as the base image for pytorch-notebook and ml-notebook, and potentially cupy (#322)

Is your feature request related to a problem? Please describe.

For machine learning and data analytics work that relies on NVIDIA Graphics Processing Units (GPUs), there are several driver- and hardware-related optimizations that can speed up processing workflows. Currently, the pytorch-notebook and ml-notebook docker images rely on CUDA libraries from conda-forge, which are less optimized than those available on NGC.

Describe the solution you'd like

Refactor the pytorch-notebook and ml-notebook to be based on NGC containers instead of the current base-image. This might involve flipping the current installation pipeline from Pangeo-first/ML-second (base-notebook -> pangeo-notebook -> ml-notebook) to ML-first/Pangeo-second (ngc -> ml-notebook -> pangeo-notebook). Something that can help with this is a pangeo-notebook metapackage #359
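
To make the proposed flip concrete, here is a minimal sketch of the two build chains as "image -> base image" mappings. The image names are taken from this issue, with `ngc` standing in for whichever NGC container ends up being used. The `build_order` helper is hypothetical and only for illustration; it is not part of the existing CI/CD setup.

```python
# Illustrative only: each image maps to the image it is built FROM.
# "ngc" is a placeholder for an NGC container; no specific tag is implied.
CURRENT_CHAIN = {
    "base-notebook": None,
    "pangeo-notebook": "base-notebook",
    "ml-notebook": "pangeo-notebook",
}

PROPOSED_CHAIN = {
    "ngc": None,
    "ml-notebook": "ngc",
    "pangeo-notebook": "ml-notebook",
}


def build_order(chain, target):
    """Return the images needed to produce `target`, base image first."""
    order = []
    image = target
    while image is not None:
        order.append(image)
        image = chain[image]  # walk up to the parent (base) image
    return list(reversed(order))


print(build_order(CURRENT_CHAIN, "ml-notebook"))
print(build_order(PROPOSED_CHAIN, "ml-notebook"))
print(build_order(PROPOSED_CHAIN, "pangeo-notebook"))
```

Under the proposed chain, ml-notebook no longer sits on top of pangeo-notebook, while pangeo-notebook picks up the NGC base through ml-notebook.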

Describe alternatives you've considered

Spin things off into a different repository (pangeo-gpu-docker-images?), or have a separate build chain (ngc-pytorch-notebook, ngc-ml-notebook) from the current CI/CD infrastructure.

Additional context

One benefit of changing the build order to ML-first/Pangeo-second is that ML folks who don't need all of the heavy Climate/Ocean packages in pangeo-notebook can get a slimmer ml-notebook. For example, if they're deploying a model to some server API, they can base their docker image on ngc-ml-notebook, instead of the current heavy ml-notebook.

The disadvantage is that the refactoring will require some effort, and we need to be careful to ensure it doesn't affect existing JupyterHub deployments.
