3 changes: 2 additions & 1 deletion .gitignore
@@ -41,4 +41,5 @@ data/
*.patch
*.diff
/docs/build
/site
/site
.quarto
48 changes: 48 additions & 0 deletions docs/README.md
@@ -0,0 +1,48 @@
# DS4CG Job Analytics Documentation

This directory contains the documentation for the DS4CG Job Analytics project.

## Overview
The documentation covers the data pipeline, analysis scripts, reporting tools, and usage instructions for the DS4CG Job Analytics platform. It is intended for users, contributors, and administrators who want to understand or extend the analytics and reporting capabilities.

## How to Build and View the Documentation

The documentation site is built with [MkDocs](https://www.mkdocs.org/); [Quarto](https://quarto.org/) renders the interactive reports and notebooks.

### MkDocs
- To serve the documentation locally:
```sh
mkdocs serve
```
This will start a local server (usually at http://127.0.0.1:8000/) where you can browse the docs.

- To build the static site:
```sh
mkdocs build
```
The output will be in the `site/` directory.

### Quarto
- Quarto is used for rendering interactive reports and notebooks (e.g., `.qmd` files).
- To render a Quarto report:
```sh
quarto render path/to/report.qmd
```

## Structure
- `index.md`: Main landing page for the documentation site.
- `about.md`: Project background and team information.
- `preprocess.md`: Data preprocessing details.
- `analysis/`, `visualization/`, `mvp_scripts/`: Subsections for specific topics and scripts.
- `notebooks/`: Example notebooks and interactive analysis.

## Requirements
- Python 3.10+
- MkDocs (`pip install mkdocs`)
- Quarto (see https://quarto.org/docs/get-started/ for installation)

## Contributing
Contributions to the documentation are welcome! Edit or add Markdown files in this directory and submit a pull request.

---
For more details, see the main project README or contact the maintainers.
6 changes: 6 additions & 0 deletions docs/analysis/frequency_analysis.md
@@ -0,0 +1,6 @@
---
title: Frequency Analysis
---

<!-- TODO (Ayush): Update when frequency analysis is merged-->
<!-- ::: src.analysis.frequency_analysis -->
106 changes: 106 additions & 0 deletions docs/contact.md
@@ -0,0 +1,106 @@
# Contact and Support

If you encounter issues or need help using the DS4CG Unity Job Analytics project, here are the best ways to get support.

## GitHub Issues

For technical problems, bug reports, or feature requests, please create a GitHub issue:

**🐛 Bug Reports**
- Provide a clear description of the problem
- Include steps to reproduce the issue
- Share error messages and stack traces
- Mention your environment (Python version, OS, etc.)

**💡 Feature Requests**
- Describe the desired functionality
- Explain the use case and benefits
- Suggest possible implementation approaches

**📚 Documentation Issues**
- Point out unclear or missing documentation
- Suggest improvements or additions
- Request examples for specific use cases

[**Create a GitHub Issue →**](https://github.com/your-org/ds4cg-job-analytics/issues)

## Response Time

The development team will review and respond to GitHub issues periodically. Please allow:
- **Critical bugs**: 1-2 business days
- **General issues**: 3-5 business days
- **Feature requests**: 1-2 weeks
- **Documentation updates**: 1 week

## Community Guidelines

When seeking help, please:

✅ **Do:**

- Search existing issues first
- Provide minimal reproducible examples
- Use clear, descriptive titles
- Be respectful and patient
- Share relevant context and details

❌ **Don't:**

- Post duplicate issues
- Share sensitive data or credentials
- Expect immediate responses
- Use issues for general questions about Slurm or Unity

## Unity Slack

For urgent questions related to Unity cluster operations or data access, you can reach out via the Unity Slack workspace. However, for project-specific issues, GitHub issues are preferred.

## Contributing

Interested in contributing to the project? We welcome:

- **Code contributions**: Bug fixes, new features, optimizations
- **Documentation**: Improvements, examples, tutorials
- **Testing**: Additional test cases, bug reports
- **Feedback**: User experience insights, suggestions

See our contributing guidelines in the repository for detailed information about:

- Development setup
- Code style requirements
- Pull request process
- Testing procedures

## Academic Collaboration

This project is part of the Data Science for the Common Good (DS4CG) program. For academic collaborations or research partnerships, consider reaching out through:

- **DS4CG Program**: [DS4CG Website](https://ds.cs.umass.edu/programs/ds4cg)
- **Unity HPC Team**: For cluster-related inquiries

## Project Maintainers

- **Project Lead**: Christopher Odoom
- **Contributors**: DS4CG Summer 2025 Internship Team

## Additional Resources

Before reaching out for support, please check:

1. **[FAQ](faq.md)** - Common questions and solutions
2. **[Getting Started](getting-started.md)** - Setup and basic usage
3. **[Demo](demo.md)** - Working examples and code samples
4. **Jupyter Notebooks** - Interactive examples in the `notebooks/` directory
5. **API Documentation** - Detailed function/class documentation

## Reporting Security Issues

If you discover a security vulnerability, please **do not** create a public GitHub issue. Instead:

1. Contact the project maintainers directly
2. Provide a detailed description of the vulnerability
3. Allow time for the issue to be addressed before public disclosure

---

**Remember**: The team volunteers their time to maintain this project. Clear, detailed, and respectful communication helps everyone get the help they need more efficiently. Thank you for using the DS4CG Unity Job Analytics project!
164 changes: 164 additions & 0 deletions docs/data-and-metrics.md
@@ -0,0 +1,164 @@
# Data and Efficiency Metrics

This page documents the structure of the preprocessed job data and the efficiency metrics available in the DS4CG Unity Job Analytics project.

## Data Structure

The project works with job data from the Unity cluster's Slurm scheduler. After preprocessing, the data contains the following key attributes:

### Job Identification
- **JobID** – Unique identifier for each job.
- **ArrayID** – Array job identifier (`-1` for non-array jobs).
- **User** – Username of the job submitter.
- **Account** – Account/group associated with the job.

### Time Attributes
- **StartTime** – When the job started execution (datetime).
- **SubmitTime** – When the job was submitted (datetime).
- **Elapsed** – Total runtime duration (timedelta).
- **TimeLimit** – Maximum allowed runtime (timedelta).

### Resource Allocation
- **GPUs** – Number of GPUs allocated.
- **GPUType** – Type of GPU allocated (e.g., `"v100"`, `"a100"`, or `NA` for CPU-only jobs).
- **Nodes** – Number of nodes allocated.
- **CPUs** – Number of CPU cores allocated.
- **ReqMem** – Requested memory.

### Job Status
- **Status** – Final job status (`"COMPLETED"`, `"FAILED"`, `"CANCELLED"`, etc.).
- **ExitCode** – Job exit code.
- **QOS** – Quality of Service level.
- **Partition** – Cluster partition used.

### Resource Usage
- **CPUTime** – Total CPU time used.
- **CPUTimeRAW** – Raw CPU time measurement.

### Constraints and Configuration
- **Constraints** – Hardware constraints specified.
- **Interactive** – Whether the job was interactive (`"interactive"` or `"non-interactive"`).
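
As an illustration, a preprocessed job table with this schema could be loaded and sanity-checked as below. The parquet path and the loading step are assumptions for the sketch, not the project's actual API; the column list comes directly from the attributes above.

```python
import pandas as pd

# Hypothetical path: the actual location/format of the preprocessed data may differ.
jobs = pd.read_parquet("data/preprocessed_jobs.parquet")

# Columns expected from the attribute list above.
expected = [
    "JobID", "ArrayID", "User", "Account",
    "StartTime", "SubmitTime", "Elapsed", "TimeLimit",
    "GPUs", "GPUType", "Nodes", "CPUs", "ReqMem",
    "Status", "ExitCode", "QOS", "Partition",
    "CPUTime", "CPUTimeRAW", "Constraints", "Interactive",
]
missing = set(expected) - set(jobs.columns)
assert not missing, f"missing columns: {missing}"

# After preprocessing, the time attributes should carry parsed dtypes:
# StartTime/SubmitTime as datetime64, Elapsed/TimeLimit as timedelta64.
print(jobs.dtypes[["StartTime", "SubmitTime", "Elapsed", "TimeLimit"]])
```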

---

## Efficiency and Resource Metrics

### GPU and VRAM Metrics

- **GPU Count** (`gpu_count`)
Number of GPUs allocated to the job.

- **Job Hours** (`job_hours`)
$$
\text{job\_hours} = \frac{\text{Elapsed (seconds)}}{3600} \times \text{gpu\_count}
$$

- **VRAM Constraint** (`vram_constraint`)
VRAM requested via constraints, in GiB. Defaults are applied if not explicitly requested.

- **Partition Constraint** (`partition_constraint`)
VRAM derived from selecting a GPU partition, in GiB.

- **Requested VRAM** (`requested_vram`)
$$
\text{requested\_vram} =
\begin{cases}
\text{partition\_constraint}, & \text{if available} \\
\text{vram\_constraint}, & \text{otherwise}
\end{cases}
$$

- **Used VRAM** (`used_vram_gib`)
Sum of peak VRAM used on all allocated GPUs (GiB).

- **Approximate Allocated VRAM** (`allocated_vram`)
Estimated VRAM based on GPU model(s) and job node allocation.

- **Total VRAM-Hours** (`vram_hours`)
$$
\text{vram\_hours} = \text{allocated\_vram} \times \text{job\_hours}
$$

- **Allocated VRAM Efficiency** (`alloc_vram_efficiency`)
$$
\text{alloc\_vram\_efficiency} = \frac{\text{used\_vram\_gib}}{\text{allocated\_vram}}
$$

- **VRAM Constraint Efficiency** (`vram_constraint_efficiency`)
$$
\text{vram\_constraint\_efficiency} =
\frac{\text{used\_vram\_gib}}{\text{vram\_constraint}}
$$

- **Allocated VRAM Efficiency Score** (`alloc_vram_efficiency_score`)
$$
\text{alloc\_vram\_efficiency\_score} =
\ln(\text{alloc\_vram\_efficiency}) \times \text{vram\_hours}
$$
Because $\ln$ of an efficiency below 1 is negative, long jobs with low VRAM efficiency receive strongly negative scores (see the sketch after the CPU memory metrics below).

- **VRAM Constraint Efficiency Score** (`vram_constraint_efficiency_score`)
$$
\text{vram\_constraint\_efficiency\_score} =
\ln(\text{vram\_constraint\_efficiency}) \times \text{vram\_hours}
$$

### CPU Memory Metrics
- **Used CPU Memory** (`used_cpu_mem_gib`) – Peak CPU RAM usage in GiB.
- **Allocated CPU Memory** (`allocated_cpu_mem_gib`) – Requested CPU RAM in GiB.
- **CPU Memory Efficiency** (`cpu_mem_efficiency`)
$$
\text{cpu\_mem\_efficiency} = \frac{\text{used\_cpu\_mem\_gib}}{\text{allocated\_cpu\_mem\_gib}}
$$
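
A minimal pandas sketch of the per-job calculations defined above. The derived column names follow this page's metric identifiers, but the helper itself, and the assumption that the input frame already carries the raw columns (`Elapsed`, `gpu_count`, the VRAM columns, and the CPU memory columns), are illustrative rather than the project's implementation.

```python
import numpy as np
import pandas as pd

def add_job_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the per-job efficiency metrics defined on this page (sketch)."""
    out = df.copy()

    # job_hours = elapsed seconds / 3600, scaled by the number of GPUs.
    out["job_hours"] = out["Elapsed"].dt.total_seconds() / 3600 * out["gpu_count"]

    # requested_vram: the partition-derived VRAM takes precedence when present.
    out["requested_vram"] = out["partition_constraint"].fillna(out["vram_constraint"])

    out["vram_hours"] = out["allocated_vram"] * out["job_hours"]
    out["alloc_vram_efficiency"] = out["used_vram_gib"] / out["allocated_vram"]
    out["vram_constraint_efficiency"] = out["used_vram_gib"] / out["vram_constraint"]

    # ln of an efficiency below 1 is negative, so long, inefficient jobs
    # accumulate strongly negative scores.
    out["alloc_vram_efficiency_score"] = (
        np.log(out["alloc_vram_efficiency"]) * out["vram_hours"]
    )
    out["vram_constraint_efficiency_score"] = (
        np.log(out["vram_constraint_efficiency"]) * out["vram_hours"]
    )

    out["cpu_mem_efficiency"] = out["used_cpu_mem_gib"] / out["allocated_cpu_mem_gib"]
    return out
```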

---

## User-Level Metrics

- **Job Count** (`job_count`) – Number of jobs submitted by the user.
- **Total Job Hours** (`user_job_hours`) – Sum of job hours for all jobs of the user.
- **Average Allocated VRAM Efficiency Score** (`avg_alloc_vram_efficiency_score`).
- **Average VRAM Constraint Efficiency Score** (`avg_vram_constraint_efficiency_score`).

- **Weighted Average Allocated VRAM Efficiency**
$$
\text{expected\_value\_alloc\_vram\_efficiency} =
\frac{\sum (\text{alloc\_vram\_efficiency} \times \text{vram\_hours})}
{\sum \text{vram\_hours}}
$$

- **Weighted Average VRAM Constraint Efficiency**
$$
\text{expected\_value\_vram\_constraint\_efficiency} =
\frac{\sum (\text{vram\_constraint\_efficiency} \times \text{vram\_hours})}
{\sum \text{vram\_hours}}
$$

- **Weighted Average GPU Count**
$$
\text{expected\_value\_gpu\_count} =
\frac{\sum (\text{gpu\_count} \times \text{vram\_hours})}
{\sum \text{vram\_hours}}
$$

- **Total VRAM-Hours** – Sum of `allocated_vram` × `job_hours` across all of the user's jobs.
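
As a sketch of the aggregation (reusing the per-job frame from the previous example; the helper name is illustrative, not the project's API), the VRAM-hour-weighted averages above reduce to a groupby over `User`:

```python
import pandas as pd

def user_level_metrics(jobs: pd.DataFrame) -> pd.DataFrame:
    """One row of user-level metrics per user (illustrative sketch)."""
    def agg(g: pd.DataFrame) -> pd.Series:
        w = g["vram_hours"].sum()
        return pd.Series({
            "job_count": len(g),
            "user_job_hours": g["job_hours"].sum(),
            "avg_alloc_vram_efficiency_score":
                g["alloc_vram_efficiency_score"].mean(),
            "avg_vram_constraint_efficiency_score":
                g["vram_constraint_efficiency_score"].mean(),
            # VRAM-hour-weighted expected values.
            "expected_value_alloc_vram_efficiency":
                (g["alloc_vram_efficiency"] * g["vram_hours"]).sum() / w,
            "expected_value_vram_constraint_efficiency":
                (g["vram_constraint_efficiency"] * g["vram_hours"]).sum() / w,
            "expected_value_gpu_count":
                (g["gpu_count"] * g["vram_hours"]).sum() / w,
            "vram_hours": w,
        })
    return jobs.groupby("User").apply(agg)
```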

---

## Group-Level Metrics

For a group of users (e.g., a PI group):

- **Job Count** – Total number of jobs across the group.
- **PI Group Job Hours** (`pi_acc_job_hours`).
- **PI Group VRAM Hours** (`pi_ac_vram_hours`).
- **User Count**.
- Group averages and weighted averages of the efficiency metrics, computed with the same formulas as the user-level metrics above (see the sketch below).
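
The group-level rollup follows the same pattern, keyed by `Account` instead of `User`. A sketch, where only the identifiers named above are taken from this page:

```python
import pandas as pd

def group_level_metrics(jobs: pd.DataFrame) -> pd.DataFrame:
    """One row of group-level metrics per account/PI group (illustrative)."""
    def agg(g: pd.DataFrame) -> pd.Series:
        return pd.Series({
            "job_count": len(g),
            "pi_acc_job_hours": g["job_hours"].sum(),
            "pi_ac_vram_hours": g["vram_hours"].sum(),
            "user_count": g["User"].nunique(),
        })
    return jobs.groupby("Account").apply(agg)
```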

---

## Efficiency Categories

Efficiency values are bucketed into four bands:

- **High**: > 70%
- **Medium**: 30–70%
- **Low**: 10–30%
- **Very Low**: < 10%
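
A sketch of how an efficiency ratio could be mapped to these bands. The handling of values exactly at 10%, 30%, and 70% is an assumption; this page does not pin down the boundary behavior.

```python
import pandas as pd

def efficiency_category(eff: pd.Series) -> pd.Series:
    """Bucket efficiency ratios (0-1 scale) into the four bands above."""
    bins = [0.0, 0.10, 0.30, 0.70, float("inf")]
    labels = ["Very Low", "Low", "Medium", "High"]
    # right=False makes each interval closed on the left, e.g. [0.30, 0.70).
    return pd.cut(eff, bins=bins, labels=labels, right=False)
```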