Welcome to the Advanced Python Workshop! This 20-hour course is designed to help researchers move beyond single-script, single-data-file analyses and adopt best practices in version control, collaboration, data management, modular coding, workflow orchestration, and environment management. If you’re eager to make your research code more robust, reproducible, and scalable, you’re in the right place!
Before the workshop, please ensure you have installed the following tools:
- Git
- Conda (via Anaconda, Miniconda, or Miniforge)
- Visual Studio Code (VSCode)
- GIN CLI Client
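
If you would like to confirm the installations before the first session, a short check like the one below may help. It is only a sketch: the executable names (`git`, `conda`, `code`, `gin`) are assumptions about how each tool is exposed on your PATH, and the helper `find_missing_tools` is a hypothetical name used for illustration. A tool that is installed but not on your PATH (for example, VSCode without its shell command enabled) will be reported as missing.

```python
import shutil

# Executable names are assumptions; adjust them if your installation differs.
REQUIRED_TOOLS = {
    "Git": "git",
    "Conda": "conda",
    "Visual Studio Code": "code",
    "GIN CLI Client": "gin",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the display names of tools whose executable is not on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All workshop tools were found on PATH.")
```

Run it from the same terminal you plan to use during the workshop, since PATH can differ between shells.
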
Date | Topic | Short Description |
---|---|---|
Feb 24 (9–12:30) | Git, GitHub, Conda, VSCode, & READMEs | An introduction to reproducibility concepts, environment management, and collaborative coding practices. Learn to manage your code and dependencies via Git, GitHub, Conda, and VSCode. |
Mar 3 (9–12:30) | Functions, Modules, & Testing | Dive into writing reusable functions, structuring larger projects into modules, and using Pytest to ensure code reliability. |
Mar 10 (9–12:30) | Dependency Inversion | Implement advanced design patterns for testability and modularity, making your codebase easier to extend and maintain. |
Mar 24 (9–12:30) | Scientific Data File Storage with HDF5 | Explore the JSON, YAML, NumPy, and HDF5 file formats for efficient, large-scale scientific data management, and learn how to integrate them into your Python workflows. |
Mar 31 (9–12:30) | Workflow Management with Snakemake and Papermill | Orchestrate multi-step pipelines, manage complex data analysis workflows, and ensure reproducibility using Snakemake and Papermill. |
Apr 7 (9–12:30) | Data Packaging with XArray and GIN | Make complex data easy to work with and easy to access. |
Note: An optional “Joker” session is tentatively planned for April 9 (9–12:30). Content will be determined based on class progress and participant feedback.