added first pass of conda environments page #55
Merged
Changes from 7 commits
10 commits
3ccc2ae
added first pass of conda environments page
RobJY 30679cd
first pass of r packages with renv page and updated conda environment…
RobJY 41fee7e
first pass of python packages with virtual environments page
RobJY 89ff4ad
merged main updates
RobJY 1c2863a
moved files as requested in PR
RobJY 35dd923
lint fixes
RobJY f07457c
more lint fixes
RobJY dc1b9f1
added VPN info to OOD page
RobJY ac65522
moved r packages with renv page as suggested in PR
RobJY 8e53c3d
removed TOCs and update R version in examples as suggested in PR
RobJY
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # Research Project Space (RPS) | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # Data Management |
267 changes: 267 additions & 0 deletions
docs/hpc/06_tools_and_software/02_conda_environments.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,268 @@ | ||
| # Conda Environments (Python, R) | ||
| - [What is Conda?](#what-is-conda) | ||
| - [Advantages/disadvantages of using Conda](#advantagesdisadvantages-of-using-conda) | ||
| - [Initializing Conda](#initializing-conda) | ||
| - [Automatic deletion of your files](#automatic-deletion-of-your-files) | ||
| - [Python](#python) | ||
| - [R](#r) | ||
| - [Reproducibility](#reproducibility) | ||
| - [Use conda env in a batch script](#use-conda-env-in-a-batch-script) | ||
|
|
||
| ## What is Conda? | ||
| Conda provides package, dependency, and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more. | ||
|
|
||
| Please find more information at this link: [https://docs.conda.io/en/latest/](https://docs.conda.io/en/latest/) | ||
|
|
||
| Conda provides a convenient way to install packages that are already compiled, so you don't need to go through a long compilation process. If a package you need is not available, you can install it (and compile it when needed) using pip (Python) or install.packages (R). | ||
|
|
||
| :::note | ||
| Reproducibility: | ||
| One of the ways to ensure the reproducibility of your results is to have an independent conda environment in the directory of each project (one of the options shown below). This will also keep conda environment files away from your /home/$USER directory. | ||
| ::: | ||
|
|
||
| ## Advantages/disadvantages of using Conda | ||
| ### Advantages | ||
|
|
||
| - A lot of pre-compiled packages (fast and easy to install) | ||
| - Note for Python: pip also offers pre-compiled packages (wheels); a list can be found at https://pythonwheels.com. However, Conda has a significantly larger number of pre-compiled packages. | ||
| - Compiled packages use the highly efficient Intel Math Kernel Library (MKL) | ||
|
|
||
| ### Disadvantages | ||
|
|
||
| - Conda does not take advantage of packages already installed in the system (while [virtualenv and venv](./03_python_packages_with_virtual_environments.md) do) | ||
| - As you will see below, you may need to take additional steps to keep track of all installed packages (including those installed by pip and/or install.packages) | ||
|
|
||
| ## Initializing Conda | ||
| Load anaconda module | ||
| ```sh | ||
| module purge | ||
| module load anaconda3/2020.07 | ||
| ``` | ||
|
|
||
| Running `conda init` can create problems with package installation, so we suggest using `source activate` instead of `conda activate`, even though `conda activate` is considered best practice by the Anaconda developers. | ||
|
|
||
| ### Automatic deletion of your files | ||
| This page describes the installation of packages on /scratch. Remember, though, that files stored in the HPC scratch file system are subject to the HPC Scratch old file purging policy: files on the /scratch file system that have not been accessed for 60 or more days will be purged (read more about [Data Management](../03_storage/06_data_management.md)). | ||
|
|
||
| You can therefore consider the following options: | ||
|
|
||
| - Reinstall your packages if some of the files get deleted | ||
| - You can do this manually | ||
| - You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/) | ||
| - Pay for "Research Project Space" - read more [here](../03_storage/05_research_project_space.md) | ||
| - Use Singularity and install packages within a corresponding overlay file - read more [here](../07_containers/03_singularity_with_conda.md) | ||
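
To see which files are getting close to the purge threshold described above, a small check based on last-access time can help. This is a minimal sketch, not an official tool: the function name `list_stale_files` and the default of 50 days are assumptions made for illustration, and it assumes the purge policy is driven by the access time that `find -atime` reports.

```sh
# list_stale_files DIR [DAYS]
# Hypothetical helper: prints regular files under DIR whose last access
# is more than DAYS days old (default 50, leaving a margin before day 60).
list_stale_files() {
    dir=$1
    days=${2:-50}
    # -atime +N matches files last accessed more than N full days ago
    find "$dir" -type f -atime +"$days" -print
}
```

For example, `list_stale_files /scratch/$USER/my_project` would list candidates inside a project directory; an empty result means nothing there has gone unread for longer than the chosen number of days.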
|
|
||
| ## Python | ||
| Load anaconda module | ||
| ```sh | ||
| module purge | ||
| module load anaconda3/2020.07 | ||
| ``` | ||
| :::tip | ||
| Keep your program/project in `/scratch` and create the conda environment using the `-p` parameter. This will keep all the files inside the project's directory, instead of putting them in your `/home/$USER` | ||
| ::: | ||
|
|
||
| ```sh | ||
| conda create -p ./penv python=3 ## environment will be created in project directory | ||
| conda activate ./penv | ||
| ``` | ||
| Also, create a symbolic link so that conda downloads the files for packages to be installed into scratch, not your home directory. | ||
| ```sh | ||
| mkdir -p /home/<NetID>/.conda ## -p: no error if the directory already exists | ||
| mkdir -p /scratch/<NetID>/conda_pkgs | ||
| ln -s /scratch/<NetID>/conda_pkgs /home/<NetID>/.conda/pkgs | ||
| ``` | ||
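
The same steps can be wrapped so that they are safe to run more than once. This is a minimal sketch under assumptions: the function name `link_conda_pkgs` and its two arguments are hypothetical, and it deliberately refuses to touch an existing real `pkgs` directory rather than guessing what to do with it.

```sh
# link_conda_pkgs HOME_DIR SCRATCH_DIR
# Hypothetical helper: points HOME_DIR/.conda/pkgs at SCRATCH_DIR/conda_pkgs.
link_conda_pkgs() {
    home_dir=$1
    scratch_dir=$2
    mkdir -p "$home_dir/.conda" "$scratch_dir/conda_pkgs"
    if [ -e "$home_dir/.conda/pkgs" ] && [ ! -L "$home_dir/.conda/pkgs" ]; then
        # A real directory is already there: leave it for the user to move.
        echo "refusing to replace existing $home_dir/.conda/pkgs" >&2
        return 1
    fi
    # -f replaces an old link; -n treats an existing link as a file
    # instead of creating a new link inside the directory it points to.
    ln -sfn "$scratch_dir/conda_pkgs" "$home_dir/.conda/pkgs"
}
```

It would be called as `link_conda_pkgs /home/<NetID> /scratch/<NetID>`, with `<NetID>` replaced as in the commands above.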
| [Install pre-compiled packages available in conda](https://anaconda.org/anaconda/repo) | ||
| ```sh | ||
| conda install -c anaconda pandas | ||
| ``` | ||
|
|
||
| Other packages may be installed (and compiled when needed) using pip | ||
| ```sh | ||
| pip install <package_name> | ||
| ``` | ||
| :::note | ||
| By default, Python inside a conda environment also picks up packages installed in `~/.local/lib/python<version>` | ||
| ::: | ||
|
|
||
| If you used `pip install --user` to install some packages (without conda or another virtual environment), they will be available in `~/.local/lib/python<version>` | ||
|
|
||
| :::warning | ||
| ***The primary takeaway:*** | ||
|
|
||
| Let's say you have tornado v.6 installed in `~/.local/lib/python<version>`, and tornado v.5 installed by `conda install`. | ||
|
|
||
| When you run `conda activate`, you will have tornado v.6 available, not v.5! | ||
|
|
||
| (this behaviour is the same whether the packages were installed to `~/.local/lib/python<version>` before or after you created your conda environment) | ||
|
|
||
| `pip freeze` will give v.6 | ||
|
|
||
| `conda list` will give v.5 | ||
|
|
||
| ***Solution*** | ||
|
|
||
| To overcome this, run `export PYTHONNOUSERSITE=True` after `conda activate` | ||
| ::: | ||
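
A quick way to confirm that the setting took effect is to ask the interpreter itself. This is a minimal sketch: the function name `user_site_ignored` is hypothetical, and it assumes that `python3` on your `PATH` is the interpreter of the activated environment.

```sh
# Prints 1 when Python will ignore the user site-packages directory
# under ~/.local, and 0 when it will still load packages from there.
# Assumes python3 on PATH is the environment's interpreter.
user_site_ignored() {
    python3 -c 'import sys; print(sys.flags.no_user_site)'
}
```

Calling `user_site_ignored` once before and once after `export PYTHONNOUSERSITE=True` shows whether packages in `~/.local` can still shadow the ones installed by conda.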
|
|
||
| ## R | ||
| Load anaconda module | ||
| ```sh | ||
| module load anaconda3/2020.07 | ||
| ``` | ||
| :::tip | ||
| Keep your program/project in `/scratch` and create conda environment using '-p' parameter. This will keep all the files inside the project's directory, instead of putting them in your `/home/$USER` | ||
| ::: | ||
|
|
||
| ```sh | ||
| conda create -p ./renv r=3.5 ## environment will be created in project directory | ||
|
||
| ## OR | ||
| conda create -c conda-forge -p ./renv r-base=3.6.3 ## environment will be created in project directory | ||
| conda activate ./renv | ||
| ``` | ||
|
|
||
| Install pre-compiled packages available in conda: | ||
|
|
||
| [https://docs.anaconda.com/anaconda/packages/r-language-pkg-docs/](https://docs.anaconda.com/anaconda/packages/r-language-pkg-docs/) | ||
|
|
||
| ```sh | ||
| conda install -c r r-dplyr | ||
| ``` | ||
|
|
||
| Other packages may be installed (and compiled) using install.packages() | ||
| ```r | ||
| install.packages("<package_name>") | ||
| ``` | ||
|
|
||
| ## Reproducibility | ||
| Packages installed only using conda | ||
|
|
||
| Save a list of packages (so you are able to report the environment in a publication, and to restore/reproduce the environment on another machine at any time) | ||
|
|
||
| ```sh | ||
| # save | ||
| conda list --export > requirements.txt | ||
| # restore | ||
| conda create -p ./penv --file requirements.txt | ||
| ``` | ||
| :::note | ||
| This will not list packages installed by `pip` or `install.packages()` | ||
| ::: | ||
|
|
||
| If you installed extra packages using pip (Python) | ||
|
|
||
| In this case you can use | ||
| ```sh | ||
| export PYTHONNOUSERSITE=True ## to ignore packages in ~/.local/lib/python<version> | ||
| # save | ||
| conda list --export > conda_requirements.txt | ||
| pip freeze > pip_requirements.txt | ||
| # restore | ||
| conda create -p ./penv --file conda_requirements.txt | ||
| pip install -r pip_requirements.txt | ||
| ``` | ||
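
After restoring an environment, it can be worth checking that a freshly exported list matches the saved one. This is a minimal sketch using only standard tools: the function name `requirements_diff` is hypothetical, and it compares the files as plain text, so it reports any line that differs, including build strings.

```sh
# requirements_diff OLD NEW
# Hypothetical helper: prints lines that appear in only one of the two files.
# Lines unique to OLD are unindented; lines unique to NEW are tab-indented.
requirements_diff() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # Drop comment lines (conda list --export writes a "# ..." header), then sort.
    grep -v '^#' "$1" | sort > "$old_sorted"
    grep -v '^#' "$2" | sort > "$new_sorted"
    # comm -3 suppresses the lines common to both files
    comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, export the restored environment to a new file and run `requirements_diff conda_requirements.txt conda_requirements_restored.txt` (the second file name is illustrative); no output means the two lists are identical.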
|
|
||
| :::note | ||
| Alternatively, you can use `conda env export > all_requirements.txt`, which will save packages installed by both conda and pip. | ||
| ::: | ||
|
|
||
| However, this may fail if your conda environment is created as a sub-directory of your project's directory (which we recommend) | ||
|
|
||
| Installed extra packages using install.packages? (R) | ||
|
|
||
| Use case: You need packages not available in conda channels, and want to use install.packages. | ||
|
|
||
| The command `conda list --export` does not include packages installed by `install.packages`, so do not use `conda install` at all. For reproducibility in this case, use Conda and renv together, as described below | ||
|
|
||
| Conda + renv: specific version of R and install.packages (R) | ||
|
|
||
| - use conda to install version of R you need | ||
| - do not use 'conda install' at all | ||
| - use renv | ||
| - install all the packages using install.packages | ||
| - use [renv as described here](../09_ood/r_packages_with_renv.md) to keep track of the environment | ||
|
|
||
| In order for conda + renv to work, you need to add the following steps: | ||
|
|
||
| - After you activate conda AND before loading R | ||
| ```sh | ||
| export R_RENV_DEFAULT_LIBPATHS=<path_to_project_directory>/renv/lib/x86_64-conda_cos6-linux-gnu/<version>/ | ||
| ``` | ||
| - Start R and execute | ||
| ```r | ||
| .libPaths(c(.libPaths(), Sys.getenv("R_RENV_SYSTEM_LIBRARY"))) | ||
| ``` | ||
|
|
||
| ## Use conda env in a batch script | ||
| The part of the batch script that calls the command should look like the following (replace `<path_to_env>` with an appropriate value) | ||
|
|
||
| ### Python | ||
|
|
||
| #### Single node | ||
| ```bash | ||
| #!/bin/bash | ||
| #SBATCH --job-name=test | ||
| #SBATCH --nodes=1 | ||
| #SBATCH --cpus-per-task=1 | ||
| #SBATCH --ntasks-per-node=4 | ||
| #SBATCH --mem=8GB | ||
| #SBATCH --time=1:00:00 | ||
| module purge; | ||
| module load anaconda3/2020.07; | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; | ||
| source /share/apps/anaconda3/2020.07/etc/profile.d/conda.sh; | ||
| conda activate ./penv; | ||
| export PATH=./penv/bin:$PATH; | ||
| python python_script.py | ||
| ``` | ||
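
The script above relies on `$SLURM_CPUS_PER_TASK` being set and on the job starting in the project directory so that `./penv` resolves. A more defensive variant can supply fallbacks. This is a minimal sketch: the function name `set_job_defaults` and the variable `ENV_PATH` are hypothetical names introduced here, not part of Slurm or conda.

```sh
# Hypothetical helper for the top of a batch script.
set_job_defaults() {
    # SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested;
    # fall back to 1 thread otherwise.
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    # SLURM_SUBMIT_DIR is the directory sbatch was invoked from;
    # fall back to the current directory when running outside Slurm.
    ENV_PATH="${SLURM_SUBMIT_DIR:-$PWD}/penv"
    export ENV_PATH
}
```

After calling `set_job_defaults`, the environment could be activated with `conda activate "$ENV_PATH"`, which no longer depends on the working directory at the time the command runs.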
|
|
||
| #### Multiple nodes, using MPI | ||
| ```sh | ||
| mpiexec bash -c "module purge; | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; | ||
| module load anaconda3/2020.07; | ||
| source /share/apps/anaconda3/2020.07/etc/profile.d/conda.sh; | ||
| conda activate ./penv; | ||
| export PATH=./penv/bin:$PATH; | ||
| python python_script.py" | ||
| ``` | ||
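
Because the command string above is wrapped in double quotes, `$SLURM_CPUS_PER_TASK` is substituted by the shell that runs `mpiexec`, before the per-rank `bash` starts. The sketch below illustrates that difference with a stand-in variable; `DEMO_THREADS` is a hypothetical name used only for this demonstration.

```sh
# Hypothetical demo variable standing in for SLURM_CPUS_PER_TASK.
DEMO_THREADS=4
export DEMO_THREADS

# Double quotes: the launching shell substitutes the value into the
# command string before the child shell starts.
expanded_by_parent=$(sh -c "DEMO_THREADS=8; echo $DEMO_THREADS")

# Single quotes: the string is passed unchanged and the child shell
# expands the variable itself, after its own assignment.
expanded_by_child=$(sh -c 'DEMO_THREADS=8; echo $DEMO_THREADS')

echo "parent-expanded: $expanded_by_parent"
echo "child-expanded:  $expanded_by_child"
```

For a value that is the same on every rank the two forms give the same result; the distinction matters only if a variable should be evaluated separately in each launched process.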
|
|
||
| ### R (conda packages only) | ||
| ```bash | ||
| #!/bin/bash | ||
| #SBATCH --job-name=test | ||
| #SBATCH --nodes=1 | ||
| #SBATCH --cpus-per-task=1 | ||
| #SBATCH --ntasks-per-node=4 | ||
| #SBATCH --mem=8GB | ||
| #SBATCH --time=1:00:00 | ||
| module purge; | ||
| module load anaconda3/2020.07; | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; | ||
| source /share/apps/anaconda3/2020.07/etc/profile.d/conda.sh; | ||
| conda activate ./renv; | ||
| export PATH=./renv/bin:$PATH; | ||
| Rscript r_script.R | ||
| ``` | ||
|
|
||
| #### Multiple nodes, using MPI | ||
| ```sh | ||
| mpiexec bash -c "module purge; | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; | ||
| module load anaconda3/2020.07; | ||
| source /share/apps/anaconda3/2020.07/etc/profile.d/conda.sh; | ||
| conda activate ./renv; | ||
| export PATH=./renv/bin:$PATH; | ||
| Rscript r_script.R" | ||
| ``` | ||
|
|
||
| ### R (conda with renv combination) | ||
|
|
||
| In this case, when you use sbatch, activate conda in the sbatch script; the R script will pick up the packages installed in renv | ||
| ```sh | ||
| module purge | ||
| module load anaconda3/2020.07 | ||
| source /share/apps/anaconda3/2020.07/etc/profile.d/conda.sh | ||
| conda activate ./renv | ||
| Rscript test.R | ||
| ``` | ||
124 changes: 124 additions & 0 deletions
docs/hpc/06_tools_and_software/03_python_packages_with_virtual_environments.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| # Python Packages with Virtual Environments | ||
|
|
||
| - [Create project directory and load Python module](#create-project-directory-and-load-python-module) | ||
| - [Automatic deletion of your files](#automatic-deletion-of-your-files) | ||
| - [Create virtual environment](#create-virtual-environment) | ||
| - [virtualenv](#virtualenv) | ||
| - [venv](#venv) | ||
| - [Install packages. Keep things reproducible](#install-packages-keep-things-reproducible) | ||
| - [Close an Activated Virtual Environment](#close-an-activated-virtual-environment) | ||
| - [Use with sbatch](#use-with-sbatch) | ||
|
||
|
|
||
| In order to be able to install new Python packages and make your work reproducible, please use virtual environments. | ||
|
|
||
| There is more than one way to create a private environment in Python. | ||
|
|
||
| ## Create project directory and load Python module | ||
| ```sh | ||
| ## Find python version you need | ||
| module avail python | ||
| ## create a directory for your project and cd there | ||
| mkdir /scratch/$USER/my_project | ||
| cd /scratch/$USER/my_project | ||
| ## load python module (different versions available) | ||
| module load python/intel/3.8.6 | ||
| ``` | ||
|
|
||
| ## Automatic deletion of your files | ||
| This page describes the installation of packages on /scratch. One has to remember, though, that files stored in the HPC scratch file system are subject to the HPC Scratch old file purging policy: Files on the /scratch file system that have not been accessed for 60 or more days will be purged (read [more](../03_storage/06_data_management.md)). | ||
|
|
||
| You can therefore consider the following options: | ||
|
|
||
| - Reinstall your packages if some of the files get deleted | ||
| - You can do this manually | ||
| - You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/) | ||
| - Pay for "Research Project Space" - read more [here](../03_storage/05_research_project_space.md) | ||
| - Use Singularity and install packages within a corresponding overlay file - read more [here](../07_containers/03_singularity_with_conda.md) | ||
|
|
||
| ## Create virtual environment | ||
| It is advisable to create a private environment inside the project directory. This boosts reproducibility and does not use space in `/home/$USER` | ||
|
|
||
| ### virtualenv | ||
| [virtualenv](https://virtualenv.pypa.io/en/latest/) is a tool to create isolated Python environments | ||
|
|
||
| Since Python 3.3, a subset of it has been integrated into the standard library under the venv module. | ||
|
|
||
| Note: you may need to install virtualenv first, if it is not yet installed ([instructions](https://virtualenv.pypa.io/en/latest/installation.html)) | ||
|
|
||
| Now create a new virtual environment in the current directory, either: | ||
|
|
||
| - Empty | ||
| - OR | ||
| - inherit all packages from those installed on HPC already (and available in PATH after you load python module) | ||
| ```sh | ||
| ## create a directory for your project and cd there | ||
| mkdir /scratch/$USER/my_project | ||
| cd /scratch/$USER/my_project | ||
|
|
||
| ## Create an EMPTY virtual environment | ||
| virtualenv venv | ||
|
|
||
| ## Create a virtual environment that inherits system packages | ||
| virtualenv venv --system-site-packages | ||
| ``` | ||
|
|
||
| ### venv | ||
| [venv](https://docs.python.org/3/library/venv.html) is a package shipped with Python 3. It provides a subset of the options available in the virtualenv tool ([link](https://virtualenv.pypa.io/en/latest/)). | ||
| ```sh | ||
| python3 -m venv venv | ||
| ``` | ||
|
|
||
| Create a new virtual environment in the current directory, either: | ||
|
|
||
| - Empty | ||
| - OR | ||
| - inherit all packages from those installed on HPC already (and available in PATH after you load python module) | ||
| ```sh | ||
| ## create a directory for your project and cd there | ||
| mkdir /scratch/$USER/my_project | ||
| cd /scratch/$USER/my_project | ||
| ##EMPTY | ||
| ## (use venv command to create environment called "venv") | ||
|
|
||
| python3 -m venv venv | ||
|
|
||
| ## Inheriting all packages | ||
| python3 -m venv venv --system-site-packages | ||
| ``` | ||
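
If you later forget which of the two modes an environment was created with, the `pyvenv.cfg` file in the environment directory records it. This is a minimal sketch: the function name `venv_inherits_system_packages` is hypothetical, and it assumes the environment was created by `venv` or `virtualenv`, both of which write the `include-system-site-packages` key.

```sh
# venv_inherits_system_packages ENV_DIR
# Hypothetical helper: succeeds (exit status 0) when ENV_DIR/pyvenv.cfg
# records "include-system-site-packages = true", and fails otherwise.
venv_inherits_system_packages() {
    grep -Eiq '^include-system-site-packages[[:space:]]*=[[:space:]]*true' "$1/pyvenv.cfg"
}
```

For example, `venv_inherits_system_packages venv && echo "inherits system packages"` prints the message only for an environment created with `--system-site-packages`.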
|
|
||
| ## Install packages. Keep things reproducible | ||
| ```sh | ||
| ## activate | ||
| source venv/bin/activate | ||
| ## install packages | ||
| pip install <package you need> | ||
| ## If package was inherited, but you want to install it in your own env anyway | ||
| pip install <package you need> --ignore-installed | ||
| ## export list of packages (to report together with paper and/or to reproduce environment on another computer) | ||
| pip freeze > requirements.txt | ||
| ## restore | ||
| pip install -r requirements.txt | ||
| ``` | ||
|
|
||
| ## Close an Activated Virtual Environment | ||
| If you have activated a virtual environment, you can exit it with the following command: | ||
| ```sh | ||
| deactivate | ||
| ``` | ||
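
To check whether an environment is currently active, for example before and after `deactivate`, you can inspect the `VIRTUAL_ENV` variable, which the `activate` script sets and `deactivate` removes. This is a minimal sketch; the function name `show_active_venv` is hypothetical.

```sh
# Prints the path of the active virtual environment,
# or a notice when none is active.
show_active_venv() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment active"
    fi
}
```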
|
|
||
| ## Use with sbatch | ||
| When you use this environment in an sbatch script, please use | ||
| ```sh | ||
| module purge; | ||
| source venv/bin/activate; | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; | ||
| python python_script.py | ||
| ``` | ||
|
|
||
| If you use MPI | ||
| ```sh | ||
| mpiexec bash -c "module purge; | ||
| source venv/bin/activate; | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK; | ||
| python python_script.py" | ||
| ``` | ||