Repository containing scaffolding for a Python 3-based data science project.
Simply follow the instructions to create a new project repository from this template.
Project organization is based on ideas from Good Enough Practices for Scientific Computing.
- Put each project in its own directory, which is named after the project.
- Put external scripts or compiled programs in the bin directory.
- Put raw data and metadata in a data directory.
- Put text documents associated with the project in the doc directory.
- Put all Docker related files in the docker directory.
- Install the Conda environment into an env directory.
- Put all notebooks in the notebooks directory.
- Put files generated during cleanup and analysis in a results directory.
- Put project source code in the src directory.
- Name all files to reflect their content or function.
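The layout above can be sketched as a small shell function. This is only an illustration, not part of the template: the function name create_layout and the project name my-project are hypothetical, and the env directory is deliberately left out on the assumption that Conda creates it when the environment is installed.

```shell
#!/usr/bin/env bash
# Illustrative sketch: create the directory skeleton described above.
# "create_layout" and "my-project" are hypothetical names.
set -euo pipefail

create_layout() {
    local project_root="$1"
    local dir
    # env is omitted on purpose: Conda creates it when the environment is installed.
    for dir in bin data doc docker notebooks results src; do
        mkdir -p "${project_root}/${dir}"
    done
}

# Use the first argument as the project directory, or fall back to "my-project".
create_layout "${1:-my-project}"
```

Because mkdir -p is idempotent, running the function against an existing project only adds the directories that are missing.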
After adding any necessary dependencies for your project to the Conda environment.yml file
(or the requirements.txt file), you can create the environment in a sub-directory of your
project directory by running the following command.

ENV_PREFIX=$PWD/env
conda env create --prefix $ENV_PREFIX --file environment.yml --force

Once the new environment has been created you can activate the environment with the following command.

conda activate $ENV_PREFIX

Note that the ENV_PREFIX directory is not under version control as it can always be re-created as
necessary.
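
Since the environment directory is not versioned, a fresh clone will not contain it. A minimal sketch of a check that could run before activation is shown below. It assumes that a Conda environment always contains a conda-meta subdirectory; the function name env_exists is hypothetical and not part of this template.

```shell
#!/usr/bin/env bash
# Illustrative sketch: report whether a Conda environment exists at a prefix.
# Assumption: Conda environments contain a "conda-meta" subdirectory.
# "env_exists" is a hypothetical helper name.

env_exists() {
    [ -d "$1/conda-meta" ]
}

# Default to the env sub-directory of the current directory.
ENV_PREFIX="${ENV_PREFIX:-$PWD/env}"

if env_exists "$ENV_PREFIX"; then
    echo "Conda environment found at $ENV_PREFIX"
else
    echo "No Conda environment at $ENV_PREFIX; create it before activating"
fi
```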
If you wish to use any JupyterLab extensions included in the environment.yml and requirements.txt
files then you need to activate the environment and rebuild the JupyterLab application using the
following commands to source the postBuild script.
conda activate $ENV_PREFIX # optional if environment already active
source postBuild

For your convenience these commands have been combined in a shell script ./bin/create-conda-env.sh.
Running the shell script will create the Conda environment, activate the Conda environment, and build
JupyterLab with any additional extensions. The script should be run from the project root directory as
follows.
./bin/create-conda-env.sh

The explicit dependencies for the project are listed in the environment.yml file. To see
the full list of packages installed into the environment run the following command.
conda list --prefix $ENV_PREFIX

If you add (remove) dependencies to (from) the environment.yml file or the requirements.txt file
after the environment has already been created, then you can re-create the environment with the
following command.
conda env create --prefix $ENV_PREFIX --file environment.yml --force

If you have added any JupyterLab extensions or made any other changes to the postBuild script, then you
should re-create the entire Conda environment by re-running the bin/create-conda-env.sh script as follows.
./bin/create-conda-env.sh

In order to build Docker images for your project and run containers you will need to install Docker and Docker Compose.
Detailed instructions for using Docker to build an image and launch containers can be found in
the docker/README.md.