Commit fa61ac0

add README
1 parent d87be16 commit fa61ac0

3 files changed: +157 -100 lines changed

README.md

Lines changed: 155 additions & 98 deletions
@@ -2,108 +2,170 @@

 An interactive visualization tool for [TaskVine](https://github.com/cooperative-computing-lab/cctools), a task scheduler for large workflows to run efficiently on HPC clusters. This tool helps you analyze task execution patterns, file transfers, resource utilization, storage consumption, and other key metrics.

-## Quick Install
+## Installation
+
+### For Users (Recommended)
+
+Install directly from PyPI:
+
+```bash
+pip install taskvine-report-tool
+```
+
+After installation, you can use the commands `vine_parse` and `vine_report` directly from anywhere.
+
+### For Developers
+
+If you want to contribute to development or modify the source code:

 ```bash
 git clone https://github.com/cooperative-computing-lab/taskvine-report-tool.git
 cd taskvine-report-tool
-pip install .
+pip install -e .
 ```

 ## Usage Guide

 The tool provides two main commands:

-- 🔍 `vine_parse` - Parse TaskVine logs
+- 🔍 `vine_parse` - Parse TaskVine logs and generate analysis data
 - 🌐 `vine_report` - Start web visualization server

-Follow these steps to use the visualization tool:
+### Command Reference

-### 1. Prepare Log Files
+#### `vine_parse` - Parse TaskVine Logs

-After running your TaskVine workflow, you'll find the logs in a directory named with a timestamp (e.g., `2025-05-20T110437`) or your specified workflow name. The default structure looks like this:
+**Required Parameters:**
+- `--templates`: List of log directory names/patterns (required)

-```
-workflow_name/
-└── vine-logs/
-    ├── debug
-    ├── performance
-    ├── taskgraph
-    ├── transactions
-    └── workflow.json
-```
+**Optional Parameters:**
+- `--logs-dir`: Base directory containing log folders (default: current directory)

-To use these logs with the visualization tool:
+**Usage Examples:**

-1. Copy the entire workflow directory to a logs directory:
 ```bash
-mkdir -p logs
-cp -r /path/to/workflow_name logs/
-```
+# Basic usage - parse specific log directories (--templates is required)
+vine_parse --templates experiment1 experiment2

-2. Parse the logs and generate visualization data:
-```bash
-vine_parse logs/your_workflow_name
-```
+# Use glob patterns to match multiple directories
+vine_parse --templates exp* test* checkpoint_*

-Or parse all log collections at once:
-```bash
-vine_parse --all --logs-dir logs
+# Specify a different logs directory
+vine_parse --logs-dir /path/to/logs --templates experiment1
+
+# Parse directories matching patterns in a specific directory
+vine_parse --logs-dir /home/user/logs --templates workflow_* test_*
 ```

-3. Start the visualization server:
+**Default Behavior:**
+- If no `--logs-dir` is specified, uses current working directory
+- The `--templates` parameter is **required** - the command will fail without it
+- Patterns support shell glob expansion (*, ?, [])
+- Automatically filters out directories that don't contain `vine-logs` subdirectory
+
+#### `vine_report` - Start Web Server
+
+**All Parameters are Optional:**
+
+- `--logs-dir`: Directory containing log folders (default: current directory)
+- `--port`: Port number for the web server (default: 9122)
+- `--host`: Host address to bind to (default: 0.0.0.0)
+
+**Usage Examples:**
+
 ```bash
+# Basic usage - start server with all defaults
 vine_report
-```

-4. View the report in your browser at `http://localhost:9122`
+# Specify custom port and logs directory
+vine_report --port 8080 --logs-dir /path/to/logs

-Note: In the web interface, you'll only see log collections that have been successfully processed by `vine_parse`. You can process multiple log collections at once:
+# Bind to specific host (restrict access)
+vine_report --host 127.0.0.1 --port 9122

-```bash
-vine_parse logs/log1 logs/log2 logs/log3
+# Allow remote access (default behavior)
+vine_report --host 0.0.0.0 --port 9122
 ```

-### 2. Command Reference
+**Default Behavior:**
+- Uses current working directory as logs directory
+- Starts server on port 9122
+- Binds to all interfaces (0.0.0.0) allowing remote access
+- Displays all available IP addresses where the server can be accessed

-#### `vine_parse` - Parse TaskVine Logs
+## Quick Start
+
+Follow these steps to use the visualization tool:
+
+### 1. Navigate to Your Log Directory
+
+After running your TaskVine workflow, the logs are automatically saved in the `vine-run-info` directory within your workflow's working directory. Navigate to this directory:

 ```bash
-# Basic usage - specify individual log directories
-vine_parse experiment1 experiment2
+cd your_workflow_directory/vine-run-info
+```
+
+You'll see a structure like this containing your experiment runs:

-# Process all log directories in current directory
-vine_parse --all
+```
+vine-run-info/
+├── experiment1/
+│   └── vine-logs/
+│       ├── debug
+│       ├── performance
+│       ├── taskgraph
+│       ├── transactions
+│       └── workflow.json
+├── experiment2/
+│   └── vine-logs/
+└── test_run/
+    └── vine-logs/
+```

-# Process all log directories in a specific directory
-vine_parse --all --logs-dir /path/to/logs
+### 2. Parse and Visualize

-# Specify custom logs directory for individual directories
-vine_parse --logs-dir /path/to/logs experiment1
+From within the `vine-run-info` directory:

-# Get help
-vine_parse --help
+1. Parse specific experiments (**--templates is required**):
+```bash
+vine_parse --templates experiment1 experiment2
 ```

-#### `vine_report` - Start Web Server
+Or parse all experiments matching a pattern:
+```bash
+vine_parse --templates exp* test_*
+```

+2. Start the visualization server:
 ```bash
-# Basic usage
 vine_report
+```

-# Specify custom port and logs directory
-vine_report --port 8080 --logs-dir /path/to/logs
+3. View the report in your browser at `http://localhost:9122`

-# Allow remote access
-vine_report --host 0.0.0.0 --port 9122
+Note: In the web interface, you'll only see log collections that have been successfully processed by `vine_parse`.
+
+### 2. Working with Different Log Directories
+
+If your logs are in a different location, you can specify the base directory containing your log folders using `--logs-dir`:

-# Get help
-vine_report --help
+```bash
+# If your logs are in a custom location:
+# /home/user/custom_logs/
+# ├── experiment1/vine-logs/
+# ├── experiment2/vine-logs/
+# └── test_run/vine-logs/
+
+# Parse specific experiments from custom location
+vine_parse --logs-dir /home/user/custom_logs --templates experiment1 experiment2
+
+# Parse all experiments matching pattern from custom location
+vine_parse --logs-dir /home/user/custom_logs --templates exp* test*
 ```

-### 3. Alternative: Configure TaskVine Log Location
+### 3. Customizing TaskVine Log Location

-Instead of manually copying logs, you can configure TaskVine to generate logs directly in the correct location. When creating your TaskVine manager, set these parameters:
+By default, TaskVine creates a `vine-run-info` directory in your working directory. You can customize this location when creating your TaskVine manager:

 ```python
 manager = vine.Manager(
@@ -124,62 +186,57 @@ This will automatically create the correct directory structure:

 After your workflow completes, simply:
 1. Navigate to your analysis directory: `cd ~/my_analysis_directory`
-2. Parse the logs: `vine_parse your_workflow_name`
+2. Parse the logs: `vine_parse --templates your_workflow_name`
 3. Start the server: `vine_report`
 4. View at `http://localhost:9122`

-### 4. Multiple Log Collections
-
-You can have multiple log collections. For example:
+### 4. Generated Data Structure

+After parsing, each experiment will have multiple generated directories:
 ```
-logs/
-├── experiment1/
-│   └── vine-logs/
-├── large_workflow/
-│   └── vine-logs/
-└── test_run/
-    └── vine-logs/
-```
-
-Parse all of them at once:
-```bash
-vine_parse experiment1 large_workflow test_run
+vine-run-info/
+└── experiment1/
+    ├── vine-logs/ # Original log files
+    │   ├── debug
+    │   ├── performance
+    │   ├── taskgraph
+    │   ├── transactions
+    │   └── workflow.json
+    ├── pkl-files/ # Raw parsed data (generated by vine_parse)
+    │   ├── manager.pkl # Manager information
+    │   ├── workers.pkl # Worker statistics
+    │   ├── tasks.pkl # Task execution details
+    │   ├── files.pkl # File transfer information
+    │   └── subgraphs.pkl # Task dependency graphs
+    ├── csv-files/ # Visualization-ready data (generated from pkl-files)
+    │   ├── task_concurrency.csv
+    │   ├── worker_lifetime.csv
+    │   ├── file_transfers.csv
+    │   └── ... # Various CSV files for different charts
+    └── svg-files/ # Cached graph visualizations
+        ├── task_subgraphs_1.svg
+        ├── task_dependencies_graph.svg
+        └── ... # Cached SVG files for complex graphs
 ```

-Or use the --all option:
-```bash
-vine_parse --all
-```
+**Directory Breakdown:**

-### 5. Complete Workflow Example
+- **`pkl-files/`**: Contains the raw parsed data extracted directly from log files. These are Python pickle files containing structured data about workers, tasks, files, and other workflow components. This is the primary output of `vine_parse`.

-```bash
-# 1. Parse your logs
-vine_parse --logs-dir ~/my_logs experiment1 experiment2
+- **`csv-files/`**: Contains visualization-ready data files generated from the pkl-files. The web frontend uses these CSV files as the data source for all charts and graphs. Each CSV file corresponds to a specific visualization module.

-# 2. Start the web server
-vine_report --logs-dir ~/my_logs --port 9122
+- **`svg-files/`**: Contains cached SVG files for complex graph visualizations (such as task dependency graphs and subgraphs). Since building these graphs is computationally expensive and time-consuming, we cache the generated SVG files to avoid rebuilding them on subsequent loads.

-# 3. Open browser to http://localhost:9122
-```
+**For Developers:**

-### 6. Generated Data Structure
+If you want to work with the raw data programmatically, you can load the pkl files into memory using the `restore_pkl_files()` function. The data structures are defined in the following files:
+- `data_parser.py` - Main data parsing logic and file restoration
+- `task.py` - Task data structure and methods
+- `worker.py` - Worker data structure and methods
+- `file.py` - File data structure and methods
+- `manager.py` - Manager data structure and methods

-After parsing, each log collection will have a `pkl-files` directory:
-```
-logs/
-└── experiment1/
-    ├── vine-logs/
-    │   ├── debug
-    │   └── transactions
-    └── pkl-files/ # Generated by vine_parse
-        ├── manager.pkl # Manager information
-        ├── workers.pkl # Worker statistics
-        ├── tasks.pkl # Task execution details
-        ├── files.pkl # File transfer information
-        └── subgraphs.pkl # Task dependency graphs
-```
+This allows you to build custom visualizations based on the original parsed data. You can also customize the CSV generation logic by editing the `generate_csv_files()` function to create your own visualization-ready data formats.

 ## Important Notes

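The updated README says that `--templates` takes directory names or glob patterns, resolves them under `--logs-dir`, and drops directories that have no `vine-logs` subdirectory. A minimal sketch of that selection rule follows. It is not the tool's actual code: the function name `resolve_templates` and its return shape are hypothetical and only illustrate the behavior described.

```python
from pathlib import Path


def resolve_templates(logs_dir, templates):
    """Return sorted names of run directories under logs_dir that match any
    pattern in templates and contain a vine-logs subdirectory.

    Hypothetical helper for illustration; not part of taskvine-report-tool.
    """
    base = Path(logs_dir)
    matched = set()
    for pattern in templates:
        # Path.glob handles both literal names and *, ?, [] patterns.
        for candidate in base.glob(pattern):
            # Keep only directories that actually hold TaskVine logs.
            if candidate.is_dir() and (candidate / "vine-logs").is_dir():
                matched.add(candidate.name)
    return sorted(matched)
```

An unquoted pattern such as `exp*` is expanded by the shell against the current directory before the command sees it. If patterns are meant to be matched against a different `--logs-dir`, quoting them (for example `'exp*'`) avoids that; the README does not say which behavior the tool expects.
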
taskvine_report/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 Visualization and analysis tool for TaskVine execution logs.
 """

-__version__ = "2025.5.0"
+__version__ = "2025.6.0"
 __author__ = "Collaborative Computing Lab (CCL), University of Notre Dame"
 __email__ = "[email protected]"

taskvine_report/src/data_parser.py

Lines changed: 1 addition & 1 deletion
@@ -987,7 +987,7 @@ def postprocess_debug(self):
         time_end = time.time()
         print(f"Postprocessing debug took {round(time_end - time_start, 4)} seconds")

-    def restore_debug(self):
+    def restore_pkl_files(self):
         time_start = time.time()
         try:
             time_start = time.time()
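
This hunk renames the method to `restore_pkl_files`, which the README points developers to, but the diff shows only its signature `(self)` and not how to construct its owner. As a rough sketch, the files under `pkl-files/` can be read with the standard `pickle` module instead. The helper name `load_pkl_files` is hypothetical, and the sketch assumes each `.pkl` file holds a single pickled object. Unpickling the tool's own classes presumably requires `taskvine_report` to be importable.

```python
import pickle
from pathlib import Path


def load_pkl_files(run_dir):
    """Load every *.pkl file under <run_dir>/pkl-files into a dict keyed by
    file stem (for example "tasks" or "workers").

    Hypothetical helper; assumes one pickled object per file.
    """
    pkl_dir = Path(run_dir) / "pkl-files"
    data = {}
    for path in sorted(pkl_dir.glob("*.pkl")):
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with path.open("rb") as f:
            data[path.stem] = pickle.load(f)
    return data
```

For example, `load_pkl_files("vine-run-info/experiment1")` would return a dict with one entry per file that `vine_parse` generated for that run.
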
