
Getting Started with EasyData Environments

Amy Wooding edited this page Dec 31, 2022 · 25 revisions

Let's say that you want to use EasyData to manage your environments reproducibly. This tutorial outlines how to get up and running managing your environments this way. Here's what we'll cover:

  1. Install requirements
    • Installing Anaconda or Miniconda
    • Creating an easydata conda environment
  2. Create your repo (also called a project) based on an EasyData project template
    • Create a project using an EasyData cookiecutter template
    • Initialize the project as a git repo
  3. Explore the default settings
    • Explore the default conda environment
    • Explore the default paths
  4. Customize your conda environment
    • Updating the environment
    • Checking your updates back into the project repo
    • Deleting and recreating the environment
  5. Customize your local settings (things you shouldn't check in to a repo)
    • Customize your local paths configuration
    • Customize your environment variables
    • Customizing your local config to include credentials

1. Install Requirements

This is a setup step that you only need to do once, although you may occasionally need to update your requirements.

  1. Install Anaconda: if you don't already have Anaconda or Miniconda, install one of them by following the instructions for your platform (macOS/Windows/Linux)
  2. Open a terminal window
  3. Install the remaining requirements:
conda create -n easydata python=3 cookiecutter
conda activate easydata
pip install ruamel.yaml

We've created a conda environment named easydata that houses the requirements we'll use to create EasyData projects. Once this environment exists as created above, we won't need to create it again.
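Since the environment only needs to be created once, it can be handy to check whether it already exists before running the install commands again. The following is a minimal sketch, not part of EasyData itself: the helper name env_listed is hypothetical, and it assumes that conda env list prints each environment's name as the first column of its line.

```shell
# Hypothetical helper: succeed if an environment named $1 appears
# as the first column of `conda env list` output read from stdin.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    if conda env list | env_listed easydata; then
        echo "easydata environment already exists"
    else
        echo "easydata environment not found; create it with the commands above"
    fi
else
    echo "conda not found on PATH; install Anaconda or Miniconda first"
fi
```

The match is on the whole first column, so an environment called easydata-old would not be mistaken for easydata.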

2. Create your EasyData repo

The best time to use an EasyData template is when you first create your project/repo. We will assume that you are starting your project from scratch.

Note: We recommend using EasyData to create every project you work with so that each project has at least one conda environment of its own. Sharing one conda environment across several repos causes many problems, so whatever you do, please don't use a monolithic environment to rule them all. For more on this idea see Tip #2 of Kjell's talk on building reproducible workflows.

Create a project using an EasyData cookiecutter template

  1. Open a terminal window
  2. Activate the easydata environment created above: conda activate easydata
  3. Navigate to the location where you'd like your project to be located (without creating the project directory; that happens automagically in the next step). For example, if I want my project to be located in /home/my-repo-name I would navigate to /home in this step.
  4. Create your project. Run cookiecutter https://github.com/hackalog/easydata and fill in the prompts. Note that the repo name that you enter will be the name of the directory that your project lives in.

We've now created a project filled with the EasyData template files in my-repo-name.
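Because cookiecutter creates the project directory for us, it helps to confirm beforehand that the parent directory exists and that nothing with the project's name is already there. Here is a small pre-flight sketch under those assumptions. The function name ready_for_project is hypothetical, and the values /home and my-repo-name are just the example from the steps above.

```shell
# Hypothetical helper: succeed only if the parent directory $1 exists
# and it does not already contain an entry named $2.
ready_for_project() {
    parent="$1"
    repo_name="$2"
    [ -d "$parent" ] && [ ! -e "$parent/$repo_name" ]
}

# Example values taken from the text; substitute your own.
parent_dir="/home"
repo_name="my-repo-name"

if ready_for_project "$parent_dir" "$repo_name"; then
    echo "OK: run 'cookiecutter https://github.com/hackalog/easydata' from $parent_dir"
else
    echo "Check $parent_dir: it must exist and must not already contain $repo_name"
fi
```

The sketch only reports what it finds; it never creates or removes anything, so it is safe to run before step 4.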

Initialize the project as a git repo

We'd like to use git to keep track of changes that are made to our project. Now is the best time to initialize the git repo.

  1. Navigate into the project: cd <my-repo-name>, using the repo name you entered at the prompts in the previous step
  2. Initialize the repo:
git init
git add .
git commit -m "initial import"
git branch easydata   # tag for future easydata upgrades
  3. The git branch easydata command above creates an easydata branch at this initial commit, which tags it for future EasyData updates
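To confirm that the repo was initialized as described, we can check that the easydata branch exists. This is a minimal sketch rather than an EasyData command: the helper name has_branch is hypothetical, and it relies only on git rev-parse, which exits with a non-zero status when the ref is missing.

```shell
# Hypothetical helper: succeed if the git repository in directory $1
# has a local branch named $2.
has_branch() {
    git -C "$1" rev-parse --verify --quiet "refs/heads/$2" >/dev/null 2>&1
}

# Run from inside the project directory.
if has_branch . easydata; then
    echo "easydata branch is present"
else
    echo "no easydata branch here; run 'git branch easydata' after the initial commit"
fi
```

If the current directory is not a git repo at all, the check simply reports that the branch is missing.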

3. Explore the default settings

4. Customize your conda environment

5. Customize your local settings
