Commit 4f8e266

Merge pull request #24 from UnityHPC/feature/documentation-updates
documentation updates
2 parents 4e45a93 + dac93a2 commit 4f8e266

17 files changed: +1808 −32 lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -41,4 +41,5 @@ data/
 *.patch
 *.diff
 /docs/build
-/site
+/site
+.quarto

docs/README.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# DS4CG Job Analytics Documentation

This directory contains the documentation for the DS4CG Job Analytics project.

## Overview

The documentation provides detailed information about the data pipeline, analysis scripts, reporting tools, and usage instructions for the DS4CG Job Analytics platform. It is intended for users, contributors, and administrators who want to understand or extend the analytics and reporting capabilities.

## How to Build and View the Documentation

The documentation site is built with [MkDocs](https://www.mkdocs.org/); [Quarto](https://quarto.org/) is used for interactive reports and notebooks.

### MkDocs

- To serve the documentation locally:

  ```sh
  mkdocs serve
  ```

  This will start a local server (usually at http://127.0.0.1:8000/) where you can browse the docs.

- To build the static site:

  ```sh
  mkdocs build
  ```

  The output will be in the `site/` directory.

### Quarto

- Quarto is used for rendering interactive reports and notebooks (e.g., `.qmd` files).
- To render a Quarto report:

  ```sh
  quarto render path/to/report.qmd
  ```

## Structure

- `index.md`: Main landing page for the documentation site.
- `about.md`: Project background and team information.
- `preprocess.md`: Data preprocessing details.
- `analysis/`, `visualization/`, `mvp_scripts/`: Subsections for specific topics and scripts.
- `notebooks/`: Example notebooks and interactive analysis.

## Requirements

- Python 3.10+
- MkDocs (`pip install mkdocs`)
- Quarto (see https://quarto.org/docs/get-started/ for installation)

## Contributing

Contributions to the documentation are welcome! Edit or add Markdown files in this directory and submit a pull request.

---

For more details, see the main project README or contact the maintainers.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
title: Frequency Analysis
---

<!-- TODO (Ayush): Update when frequency analysis is merged -->
<!-- ::: src.analysis.frequency_analysis -->

docs/contact.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# Contact and Support

If you encounter issues or need help using the DS4CG Unity Job Analytics project, here are the best ways to get support.

## GitHub Issues

For technical problems, bug reports, or feature requests, please create a GitHub issue:

**🐛 Bug Reports**

- Provide a clear description of the problem
- Include steps to reproduce the issue
- Share error messages and stack traces
- Mention your environment (Python version, OS, etc.)

**💡 Feature Requests**

- Describe the desired functionality
- Explain the use case and benefits
- Suggest possible implementation approaches

**📚 Documentation Issues**

- Point out unclear or missing documentation
- Suggest improvements or additions
- Request examples for specific use cases

[**Create a GitHub Issue →**](https://github.com/your-org/ds4cg-job-analytics/issues)

## Response Time

The development team will review and respond to GitHub issues periodically. Please allow:

- **Critical bugs**: 1-2 business days
- **General issues**: 3-5 business days
- **Feature requests**: 1-2 weeks
- **Documentation updates**: 1 week

## Community Guidelines

When seeking help, please:

**Do:**

- Search existing issues first
- Provide minimal reproducible examples
- Use clear, descriptive titles
- Be respectful and patient
- Share relevant context and details

**Don't:**

- Post duplicate issues
- Share sensitive data or credentials
- Expect immediate responses
- Use issues for general questions about Slurm or Unity

## Unity Slack

For urgent questions related to Unity cluster operations or data access, you can reach out via the Unity Slack workspace. However, for project-specific issues, GitHub issues are preferred.

## Contributing

Interested in contributing to the project? We welcome:

- **Code contributions**: Bug fixes, new features, optimizations
- **Documentation**: Improvements, examples, tutorials
- **Testing**: Additional test cases, bug reports
- **Feedback**: User experience insights, suggestions

See our contributing guidelines in the repository for detailed information about:

- Development setup
- Code style requirements
- Pull request process
- Testing procedures

## Academic Collaboration

This project is part of the Data Science for the Common Good (DS4CG) program. For academic collaborations or research partnerships, consider reaching out through:

- **DS4CG Program**: [DS4CG Website](https://ds.cs.umass.edu/programs/ds4cg)
- **Unity HPC Team**: For cluster-related inquiries

## Project Maintainers

- **Project Lead**: Christopher Odoom
- **Contributors**: DS4CG Summer 2025 Internship Team

## Additional Resources

Before reaching out for support, please check:

1. **[FAQ](faq.md)** - Common questions and solutions
2. **[Getting Started](getting-started.md)** - Setup and basic usage
3. **[Demo](demo.md)** - Working examples and code samples
4. **Jupyter Notebooks** - Interactive examples in the `notebooks/` directory
5. **API Documentation** - Detailed function/class documentation

## Reporting Security Issues

If you discover a security vulnerability, please **do not** create a public GitHub issue. Instead:

1. Contact the project maintainers directly
2. Provide a detailed description of the vulnerability
3. Allow time for the issue to be addressed before public disclosure

---

**Remember**: The team volunteers their time to maintain this project. Clear, detailed, and respectful communication helps everyone get the help they need more efficiently. Thank you for using the DS4CG Unity Job Analytics project!

docs/data-and-metrics.md

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
# Data and Efficiency Metrics

This page provides comprehensive documentation about the data structure and efficiency metrics available in the DS4CG Unity Job Analytics project.

## Data Structure

The project works with job data from the Unity cluster's Slurm scheduler. After preprocessing, the data contains the following key attributes:

### Job Identification

- **JobID** – Unique identifier for each job.
- **ArrayID** – Array job identifier (`-1` for non-array jobs).
- **User** – Username of the job submitter.
- **Account** – Account/group associated with the job.

### Time Attributes

- **StartTime** – When the job started execution (datetime).
- **SubmitTime** – When the job was submitted (datetime).
- **Elapsed** – Total runtime duration (timedelta).
- **TimeLimit** – Maximum allowed runtime (timedelta).

### Resource Allocation

- **GPUs** – Number of GPUs allocated.
- **GPUType** – Type of GPU allocated (e.g., `"v100"`, `"a100"`, or `NA` for CPU-only jobs).
- **Nodes** – Number of nodes allocated.
- **CPUs** – Number of CPU cores allocated.
- **ReqMem** – Requested memory.

### Job Status

- **Status** – Final job status (`"COMPLETED"`, `"FAILED"`, `"CANCELLED"`, etc.).
- **ExitCode** – Job exit code.
- **QOS** – Quality of Service level.
- **Partition** – Cluster partition used.

### Resource Usage

- **CPUTime** – Total CPU time used.
- **CPUTimeRAW** – Raw CPU time measurement.

### Constraints and Configuration

- **Constraints** – Hardware constraints specified.
- **Interactive** – Whether the job was interactive (`"interactive"` or `"non-interactive"`).
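
For orientation, a preprocessed jobs table with these attributes might be loaded and type-checked along the following lines. This is a sketch, not the project's actual loader: the path `data/preprocessed_jobs.parquet` and the Parquet format are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical path/format; the pipeline may store its preprocessed output differently.
jobs = pd.read_parquet("data/preprocessed_jobs.parquet")

# The time attributes documented above are expected as datetimes/timedeltas.
jobs["StartTime"] = pd.to_datetime(jobs["StartTime"])
jobs["SubmitTime"] = pd.to_datetime(jobs["SubmitTime"])
jobs["Elapsed"] = pd.to_timedelta(jobs["Elapsed"])
jobs["TimeLimit"] = pd.to_timedelta(jobs["TimeLimit"])

# Quick sanity check on the identification, allocation, and status columns.
print(jobs[["JobID", "User", "GPUs", "GPUType", "Nodes", "CPUs", "Status"]].head())
```
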
---

## Efficiency and Resource Metrics

### GPU and VRAM Metrics

- **GPU Count** (`gpu_count`)
  Number of GPUs allocated to the job.

- **Job Hours** (`job_hours`)
  $$
  \text{job\_hours} = \frac{\text{Elapsed (seconds)}}{3600} \times \text{gpu\_count}
  $$

- **VRAM Constraint** (`vram_constraint`)
  VRAM requested via constraints, in GiB. Defaults are applied if not explicitly requested.

- **Partition Constraint** (`partition_constraint`)
  VRAM derived from selecting a GPU partition, in GiB.

- **Requested VRAM** (`requested_vram`)
  $$
  \text{requested\_vram} =
  \begin{cases}
  \text{partition\_constraint}, & \text{if available} \\
  \text{vram\_constraint}, & \text{otherwise}
  \end{cases}
  $$

- **Used VRAM** (`used_vram_gib`)
  Sum of peak VRAM used on all allocated GPUs (GiB).

- **Approximate Allocated VRAM** (`allocated_vram`)
  Estimated VRAM based on GPU model(s) and job node allocation.

- **Total VRAM-Hours** (`vram_hours`)
  $$
  \text{vram\_hours} = \text{allocated\_vram} \times \text{job\_hours}
  $$

- **Allocated VRAM Efficiency** (`alloc_vram_efficiency`)
  $$
  \text{alloc\_vram\_efficiency} = \frac{\text{used\_vram\_gib}}{\text{allocated\_vram}}
  $$

- **VRAM Constraint Efficiency** (`vram_constraint_efficiency`)
  $$
  \text{vram\_constraint\_efficiency} =
  \frac{\text{used\_vram\_gib}}{\text{vram\_constraint}}
  $$

- **Allocated VRAM Efficiency Score** (`alloc_vram_efficiency_score`)
  $$
  \text{alloc\_vram\_efficiency\_score} =
  \ln(\text{alloc\_vram\_efficiency}) \times \text{vram\_hours}
  $$
  Penalizes long jobs with low VRAM efficiency.

- **VRAM Constraint Efficiency Score** (`vram_constraint_efficiency_score`)
  $$
  \text{vram\_constraint\_efficiency\_score} =
  \ln(\text{vram\_constraint\_efficiency}) \times \text{vram\_hours}
  $$
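
Taken together, these job-level definitions translate almost directly into a few pandas column operations. The sketch below assumes a DataFrame with the column names used above (`Elapsed`, `gpu_count`, `allocated_vram`, `used_vram_gib`, `vram_constraint`, `partition_constraint`); it illustrates the formulas rather than reproducing the project's actual implementation.

```python
import numpy as np
import pandas as pd


def add_vram_metrics(jobs: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the job-level VRAM metrics defined above (assumed column names)."""
    out = jobs.copy()

    # job_hours = elapsed hours multiplied by the number of GPUs allocated
    out["job_hours"] = out["Elapsed"].dt.total_seconds() / 3600 * out["gpu_count"]

    # requested_vram: partition-derived VRAM if available, otherwise the constraint value
    out["requested_vram"] = out["partition_constraint"].fillna(out["vram_constraint"])

    # vram_hours and the two efficiency ratios
    out["vram_hours"] = out["allocated_vram"] * out["job_hours"]
    out["alloc_vram_efficiency"] = out["used_vram_gib"] / out["allocated_vram"]
    out["vram_constraint_efficiency"] = out["used_vram_gib"] / out["vram_constraint"]

    # Log-scaled scores: long jobs with low efficiency get large negative scores
    out["alloc_vram_efficiency_score"] = np.log(out["alloc_vram_efficiency"]) * out["vram_hours"]
    out["vram_constraint_efficiency_score"] = (
        np.log(out["vram_constraint_efficiency"]) * out["vram_hours"]
    )
    return out
```

Here `fillna` plays the role of the piecewise definition of `requested_vram`, falling back to `vram_constraint` only where no partition-derived value is present.
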
### CPU Memory Metrics

- **Used CPU Memory** (`used_cpu_mem_gib`) – Peak CPU RAM usage in GiB.
- **Allocated CPU Memory** (`allocated_cpu_mem_gib`) – Requested CPU RAM in GiB.
- **CPU Memory Efficiency** (`cpu_mem_efficiency`)
  $$
  \text{cpu\_mem\_efficiency} = \frac{\text{used\_cpu\_mem\_gib}}{\text{allocated\_cpu\_mem\_gib}}
  $$
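
The CPU memory ratio follows the same pattern; with the same assumed column names, it is a single element-wise division:

```python
import pandas as pd


def add_cpu_mem_efficiency(jobs: pd.DataFrame) -> pd.DataFrame:
    """Sketch: peak CPU RAM used divided by CPU RAM allocated, both in GiB."""
    out = jobs.copy()
    out["cpu_mem_efficiency"] = out["used_cpu_mem_gib"] / out["allocated_cpu_mem_gib"]
    return out
```
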
---

## User-Level Metrics

- **Job Count** (`job_count`) – Number of jobs submitted by the user.
- **Total Job Hours** (`user_job_hours`) – Sum of job hours for all jobs of the user.
- **Average Allocated VRAM Efficiency Score** (`avg_alloc_vram_efficiency_score`).
- **Average VRAM Constraint Efficiency Score** (`avg_vram_constraint_efficiency_score`).

- **Weighted Average Allocated VRAM Efficiency**
  $$
  \text{expected\_value\_alloc\_vram\_efficiency} =
  \frac{\sum (\text{alloc\_vram\_efficiency} \times \text{vram\_hours})}
  {\sum \text{vram\_hours}}
  $$

- **Weighted Average VRAM Constraint Efficiency**
  $$
  \text{expected\_value\_vram\_constraint\_efficiency} =
  \frac{\sum (\text{vram\_constraint\_efficiency} \times \text{vram\_hours})}
  {\sum \text{vram\_hours}}
  $$

- **Weighted Average GPU Count**
  $$
  \text{expected\_value\_gpu\_count} =
  \frac{\sum (\text{gpu\_count} \times \text{vram\_hours})}
  {\sum \text{vram\_hours}}
  $$

- **Total VRAM-Hours** – Sum of `allocated_vram` × `job_hours` across all jobs of the user.
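
A per-user aggregation of these quantities could look roughly like the sketch below, which groups the job-level table from the previous sketches by `User` and weights the efficiencies by `vram_hours`. Column names and the aggregation layout are assumptions for illustration, not the project's actual code.

```python
import pandas as pd


def user_level_metrics(jobs: pd.DataFrame) -> pd.DataFrame:
    """Sketch of per-user aggregates, with VRAM-hour-weighted efficiency averages."""

    def weighted_mean(group: pd.DataFrame, col: str) -> float:
        # Weighted mean of `col`, using vram_hours as the weights.
        return (group[col] * group["vram_hours"]).sum() / group["vram_hours"].sum()

    def summarize(group: pd.DataFrame) -> pd.Series:
        return pd.Series(
            {
                "job_count": len(group),
                "user_job_hours": group["job_hours"].sum(),
                "user_vram_hours": group["vram_hours"].sum(),
                "avg_alloc_vram_efficiency_score": group["alloc_vram_efficiency_score"].mean(),
                "avg_vram_constraint_efficiency_score": group[
                    "vram_constraint_efficiency_score"
                ].mean(),
                "expected_value_alloc_vram_efficiency": weighted_mean(
                    group, "alloc_vram_efficiency"
                ),
                "expected_value_vram_constraint_efficiency": weighted_mean(
                    group, "vram_constraint_efficiency"
                ),
                "expected_value_gpu_count": weighted_mean(group, "gpu_count"),
            }
        )

    return jobs.groupby("User").apply(summarize)
```
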
---

## Group-Level Metrics

For a group of users (e.g., a PI group):

- **Job Count** – Total number of jobs across the group.
- **PI Group Job Hours** (`pi_acc_job_hours`).
- **PI Group VRAM Hours** (`pi_ac_vram_hours`).
- **User Count**.
- Group averages and weighted averages of the efficiency metrics, using the same formulas as above aggregated over the group.

---

## Efficiency Categories

- **High**: > 70%
- **Medium**: 30–70%
- **Low**: 10–30%
- **Very Low**: < 10%
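
These thresholds can be applied to any of the efficiency ratios above. A binning sketch using `pandas.cut`, assuming the ratios are expressed as fractions between 0 and 1 and treating the boundaries as left-inclusive (an assumption; the project's convention may differ):

```python
import pandas as pd


def categorize_efficiency(efficiency: pd.Series) -> pd.Series:
    """Sketch: map efficiency fractions (0-1) to the categories listed above."""
    return pd.cut(
        efficiency,
        bins=[-float("inf"), 0.10, 0.30, 0.70, float("inf")],
        labels=["Very Low", "Low", "Medium", "High"],
        right=False,  # boundary handling is assumed; adjust to match the project's convention
    )
```

For example, an allocated VRAM efficiency of 0.55 falls in the "Medium" band, while 0.05 is "Very Low".
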
