
HPC Research Toolkit

Modular research toolkit for deep learning, computer vision, data preprocessing and analysis, 3D visualization, and associated workflows on HPC systems. Features shared utilities for analysis, logging, I/O, and visualization, plus applications for depth processing, multi-view stereo analysis, and scene rendering. Includes SLURM-based automations and configuration-driven execution for reproducible HPC workflows.

Overview

Modular research codebase optimized for HPC environments. Combines reusable shared utilities with specialized application modules to support scalable, automated, and reproducible workflows in deep learning, computer vision, and 3D processing under job scheduler management.

Repository Structure
code/
├── research_utils/          # Shared utilities package
│   ├── src/research_utils/
│   │   ├── core/           
│   │   ├── io/             
│   │   ├── logging/         
│   │   ├── ops/             
│   │   └── viz/             
│   └── pyproject.toml       
│
├── apps/                     # Application-specific code
│   ├── analysis/             
│   │   ├── depth_overlay/           
│   │   │   ├── src/                 
│   │   │   ├── scripts/              
│   │   │   ├── configs/               
│   │   │   ├── output/                
│   │   │   ├── logs/                  
│   │   │   └── pyproject.toml
│   │   └── depth_overlay_compare/   
│   │       ├── src/                 
│   │       ├── scripts/              
│   │       ├── configs/
│   │       ├── output/            
│   │       ├── logs/                
│   │       └── pyproject.toml
│   │
│   └── rendering/           
│       └── viser/
│           ├── dl3dv/       
│           │   ├── drafts/            
│           │   ├── logs/              
│           │   ├── render_scene.py
│           │   ├── render_scene_v2.py
│           │   └── pyproject.toml
│           └── wai-vis/     # 3D rendering
│               ├── modules/           
│               ├── main.py
│               ├── pyproject.toml
│               └── requirements.txt
│
└── scripts/                 # HPC workflow and utility scripts
    ├── python/              
    ├── slurm/              
    │   └── da3/            
    └── archive/             

Components

research_utils/

Reusable Python package with core utilities for research workflows:

  • Core (core/): Package-level infrastructure shared by the other modules
  • I/O (io/): File input/output operations
  • Logging (logging/): Advanced logging handlers with color support and JSON logging
  • Operations (ops/): Geometry and mathematical operations
  • Visualization (viz/): Overlay and plotting utilities

Installation:

cd research_utils
pip install -e .

apps/

Application-specific modules that depend on research_utils. Each application includes a pyproject.toml file for dependency management and can be installed in editable mode:

Analysis Tools

  • depth_overlay/: Visualizes depth maps generated by deep learning models via RGB-depth overlays.
  • depth_overlay_compare/: Side-by-side comparison of depth maps extracted from multiple models (e.g., Depth-Anything, MoGe, MVSAnywhere) overlaid on the same RGB images to evaluate model performance.

Rendering Tools

  • dl3dv/: Scene rendering scripts for 3D visualization
  • wai-vis/: A real-time, interactive web-based 3D visualization tool for WAI/Nerfstudio datasets.
    • This tool allows researchers to inspect preprocessing results, specifically RGB images coupled with depth maps. It projects 2D RGB-D data into an interactive 3D point cloud environment, complete with camera frustums, accessible directly via a web browser.
    • See: WAI-Viser

scripts/

Top-level scripts directory containing automated HPC workflows and utility tools:

python/

Utilities for local development and common data-processing and research tasks, intended for rapid validation of deep learning outputs and dataset integrity before scaling to full HPC pipelines.
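As an illustration of this kind of pre-flight validation, the sketch below checks that every RGB image in a directory has a matching depth map (and vice versa) before a job is submitted. The function name and file extensions are illustrative, not part of the actual scripts:

```python
# Hypothetical dataset-integrity check; names and extensions are illustrative.
from pathlib import Path

def find_unpaired(data_dir, rgb_ext=".jpg", depth_ext=".exr"):
    """Return stems with an RGB image but no depth map, and vice versa."""
    data_dir = Path(data_dir)
    rgb = {p.stem for p in data_dir.glob(f"*{rgb_ext}")}
    depth = {p.stem for p in data_dir.glob(f"*{depth_ext}")}
    return sorted(rgb - depth), sorted(depth - rgb)
```

Running such a check locally takes seconds and avoids wasting scheduler allocations on incomplete datasets.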

slurm/

Automations for distributed HPC execution. These manage the end-to-end lifecycle of model inference, multi-view stereo analysis, large-scale data engineering, and preprocessing workflows, including resource allocation, environment staging, and structured error handling.

Modularity

This codebase follows a modular architecture that promotes code reuse and maintainability:

Shared Utilities Package (research_utils)

The research_utils package serves as a centralized library of reusable components that all applications can import and use. This design pattern provides several benefits:

  • Code Reuse: Common functionality (logging, I/O, visualization) is implemented once and shared across all applications
  • Consistency: All applications use the same logging system, configuration loading, and utility functions
  • Maintainability: Bug fixes and improvements to shared utilities benefit all applications automatically
  • Separation of Concerns: Application-specific logic is isolated from general-purpose utilities

Example Usage:

from research_utils import plot_overlay, print_args, setup_logging, load_config

# All apps can use these shared utilities
setup_logging("configs/default_logging.json")
config = load_config("configs/default.json")
plot_overlay(rgb_path="...", depth_path="...", save_dir="...")

Application Modules (apps/)

Each application in the apps/ directory is:

  • Self-contained: Has its own configuration files, scripts, and output directories
  • Independent: Can be run separately without affecting other applications
  • Configurable: Uses JSON configuration files for easy customization
  • CLI-based: Provides command-line interfaces for HPC job submission

Usage Patterns

Configuration-Driven Execution

Applications are designed to be configuration-driven, making them ideal for HPC workflows where parameters may vary between job runs:

  1. Default Configurations: Each app includes default JSON configs in configs/
  2. Custom Configurations: Override defaults via command-line arguments
  3. Auto-Discovery: The load_config() utility automatically searches for config files relative to the calling script

Example:

# Use default config
python -m scripts.run_depth_overlay --rgb_path /path/to/rgb.jpg --depth_path /path/to/depth.exr

# Use custom config
python -m scripts.run_depth_overlay \
    --config_path configs/custom.json \
    --log_config configs/logging_verbose.json \
    --rgb_path /path/to/rgb.jpg \
    --depth_path /path/to/depth.exr
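The auto-discovery behavior described in step 3 might be sketched as follows; this is a plausible reconstruction, not the actual research_utils implementation:

```python
# Hypothetical sketch of load_config() auto-discovery; the real function
# ships with research_utils and may differ.
import json
from pathlib import Path

def load_config(config_path, script_file=None):
    """Resolve config_path relative to the calling script, then the CWD, then as given."""
    candidates = []
    if script_file is not None:
        candidates.append(Path(script_file).resolve().parent / config_path)  # next to caller
    candidates.append(Path.cwd() / config_path)  # current working directory
    candidates.append(Path(config_path))         # absolute or already-resolved path
    for candidate in candidates:
        if candidate.is_file():
            return json.loads(candidate.read_text())
    raise FileNotFoundError(f"No config file found for {config_path!r}")
```

This lets the same relative path (e.g. configs/default.json) work whether a script is launched from the repository root or from inside an application directory.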

Command-Line Interface Pattern

All applications follow a consistent CLI pattern:

  • Required arguments: Essential inputs (e.g., file paths)
  • Optional arguments: Configurable parameters with sensible defaults
  • Config integration: Arguments can override or complement JSON configs
  • Logging setup: Logging is initialized early via --log_config argument
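The four conventions above can be sketched with argparse; the argument names mirror the earlier examples, while the function name is illustrative:

```python
# Illustrative sketch of the shared CLI pattern; not the actual application code.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Depth overlay (sketch)")
    # Required arguments: essential inputs
    parser.add_argument("--rgb_path", required=True, help="Path to input RGB image")
    parser.add_argument("--depth_path", required=True, help="Path to input depth map")
    # Optional arguments with sensible defaults; values can override JSON configs
    parser.add_argument("--config_path", default="configs/default.json")
    # Logging config is parsed here so logging can be initialized early
    parser.add_argument("--log_config", default="configs/default_logging.json")
    return parser
```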

HPC Job Submission

Applications are designed to be submitted directly or as HPC jobs.
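A minimal SLURM submission script for such an application might look like the following; the partition-agnostic resource requests, environment paths, and variables are placeholders, not values taken from this repository:

```shell
#!/bin/bash
#SBATCH --job-name=depth_overlay
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=logs/%x_%j.out

# Activate the per-workflow virtual environment (placeholder path)
source .venvs/depth_overlay/bin/activate

# RGB_PATH and DEPTH_PATH are expected to be exported by the submitting workflow
python -m scripts.run_depth_overlay \
    --config_path configs/default.json \
    --log_config configs/default_logging.json \
    --rgb_path "$RGB_PATH" \
    --depth_path "$DEPTH_PATH"
```

Because the CLI is identical in both modes, the same invocation can be run interactively on a login node for debugging before being wrapped in sbatch.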

Logging System

Multi-formatter system optimized for interactive and batch use:

Features

  1. Multiple Formatters:

    • JSON Formatter: Structured logging for programmatic analysis and HPC job monitoring
    • Color Formatters: Human-readable console output with syntax highlighting
    • Readable Formatters: Plain text formats for log files
  2. Structured Logging: Support for custom extra fields that are automatically captured:

    logger.info("Processing file", extra={"path": "/data/file.csv", "request_id": "abc123"})
  3. Configurable via JSON: Logging behavior is configured through JSON files, allowing different log levels and handlers per application

  4. Multiple Handlers: Simultaneous output to:

    • Console (stdout): Colored, human-readable output for interactive use
    • File: Persistent logs with rotation support
    • JSONL files: Structured logs for post-processing
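To make the structured-logging behavior concrete, here is a minimal JSON formatter along these lines. It is a hypothetical sketch; the actual implementation is research_utils.logging.handlers.JSONFormatter and may capture additional fields:

```python
# Hypothetical sketch of a JSON formatter that also emits custom `extra` fields.
import json
import logging

# Attribute names present on every standard LogRecord; anything else came from `extra`
RESERVED = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message"}

class JSONFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "level": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        # Merge in custom extra fields, e.g. path or request_id
        payload.update({k: v for k, v in record.__dict__.items() if k not in RESERVED})
        return json.dumps(payload)
```

With this formatter, the `logger.info("Processing file", extra={...})` call above produces one self-describing JSON object per line, ready for JSONL post-processing.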

Logging Configuration

Logging is configured via JSON files (e.g., configs/default_logging.json):

{
    "version": 1,
    "formatters": {
        "json": {
            "()": "research_utils.logging.handlers.JSONFormatter"
        },
        "readable": {
            "()": "research_utils.logging.handlers.ReadableFormatter"
        },
        "readable_color_stdout": {
            "()": "research_utils.logging.handlers.ReadableColorFormatter"
        }
    },
    "handlers": {
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "readable",
            "filename": "logs/app.log"
        },
        "stdout": {
            "class": "logging.StreamHandler",
            "formatter": "readable_color_stdout",
            "level": "DEBUG"
        }
    }
}

Usage in Applications

Applications initialize logging early in their execution:

from research_utils import setup_logging
import logging

# setup logging from config file
setup_logging("configs/default_logging.json")
logger = logging.getLogger(__name__)

# use structured logging
logger.info("Processing started", extra={"input_args": vars(args)})
logger.error("File not found", extra={"path": file_path})
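Under the hood, setup_logging plausibly amounts to loading the JSON file and delegating to the standard library's dictConfig; this is a sketch of that idea, not the actual research_utils code:

```python
# Plausible sketch of setup_logging(); the real implementation ships with research_utils.
import json
import logging.config
from pathlib import Path

def setup_logging(config_path):
    """Load a JSON logging config and hand it to the stdlib dictConfig machinery."""
    config = json.loads(Path(config_path).read_text())
    Path("logs").mkdir(exist_ok=True)  # ensure file handlers have somewhere to write
    logging.config.dictConfig(config)
```

Keeping the entire logging setup in a version-controlled JSON file is what makes the per-application log levels and handlers reproducible across job runs.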

Benefits for HPC Workflows

  • Post-Processing: JSON logs can be parsed and analyzed after job completion
  • Debugging: Detailed logs with function names, line numbers, and timestamps
  • Monitoring: Structured logs enable automated job monitoring and error detection
  • Reproducibility: Log configurations are version-controlled alongside code
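As a sketch of the post-processing workflow, the snippet below scans a JSONL log for error records after a job completes. The record fields assumed here ("level", "message") follow the JSON formatter described above; the function name is illustrative:

```python
# Sketch of post-processing a JSONL job log: collect all ERROR records.
import json
from pathlib import Path

def collect_errors(jsonl_path):
    """Return the parsed records whose level is ERROR."""
    errors = []
    for line in Path(jsonl_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("level") == "ERROR":
            errors.append(record)
    return errors
```

The same pattern extends naturally to aggregating timings or per-file failures across an array job's log directory.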

HPC Workflow Integration

This codebase is designed to integrate with HPC job schedulers (e.g., SLURM, PBS). When running on HPC systems:

  1. Environment Setup: Use virtual environments (.venvs/ directory) to manage dependencies per job or workflow
  2. Resource Management: Configure job scripts to request appropriate compute resources (CPU, GPU, memory)
  3. Data I/O: Ensure data paths are accessible from compute nodes (shared filesystems, network storage)
  4. Logging: Structured logging outputs (JSON, JSONL) are designed for post-processing and analysis of HPC job outputs

Dependencies

Core Dependencies (research_utils)

  • Deep Learning: PyTorch, TorchVision
  • Core & Math: NumPy, OpenCV, Matplotlib, OpenEXR, imageio, Pillow
  • 3D Processing & Geometry: trimesh, scipy, shapely, manifold3d, rtree
  • 3D Formats & Parsing: lxml, jsonschema, pycollada, xxhash

Application Dependencies

See individual pyproject.toml and requirements.txt files in each application directory for specific dependencies.

All applications depend on research-utils, which must be installed first.

Usage

Setting Up the Environment

  1. Install the shared utilities package:
cd research_utils
pip install -e .
  2. Install application-specific dependencies. Each app can be installed in editable mode using its pyproject.toml:
# Install a specific application
cd apps/analysis/depth_overlay
pip install -e .

# Or install multiple apps
cd apps/rendering/viser/wai-vis
pip install -e .

Running Applications

Each application typically includes:

  • Configuration files in configs/
  • Scripts in scripts/
  • Output directories for results
  • Logging directories for job outputs

Refer to individual application directories for specific usage instructions.

Development

  • Python 3.8+
  • Virtual environments are strongly recommended to avoid dependency version conflicts
  • Follow the existing structure when adding new applications or utilities

Notes

  • This repository is part of a larger HPC workflow system
  • Configuration files use JSON format