
Commit 11722f2

Merge pull request #404 from yinhaofeng/visualDL
visualDL
2 parents 50ea3df + eb621e7 commit 11722f2


6 files changed: +151 -16 lines


doc/visualization.md

Lines changed: 70 additions & 0 deletions
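# Introduction to the Visualization Feature

PaddleRec supports visualizing the training process with VisualDL, the visual analysis tool of the PaddlePaddle ecosystem, so that you can see clearly and intuitively how your model is training.

## Dependencies

The visualization feature is built on VisualDL, so you need to install VisualDL before enabling it. Install it with:

```bash
python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
```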
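
The notes at the end of this document say that enabling `use_visual` without VisualDL installed raises an error. It can therefore help to confirm that the package is importable before you turn the option on. The following is a minimal sketch that uses only the Python standard library. The helper name `module_available` is hypothetical and is not part of PaddleRec.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported."""
    # find_spec returns None when no installed module matches the name.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("visualdl"):
        print("VisualDL found: runner.use_visual can be set to True.")
    else:
        print("VisualDL not found: install it before setting runner.use_visual to True.")
```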
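
## Enabling visualization

1. In the model's yaml configuration file, add the parameter `use_visual` under `runner` and set it to `True`. The parameter is a bool and defaults to `False`. When VisualDL is installed, setting it to `True` turns on training visualization.
2. Choose which metrics and variables are collected:
   - In dynamic-graph mode, use the model's `dygraph_model.py`. The metrics or variables you want printed are output through the `metrics_list` and `print_dict` return values of the `train_forward` function.
   - In static-graph mode, use the model's `static_model.py`. The metrics you want printed are output through the `fetch_dict` return value of the `net` function.

   The visualization feature collects these metrics automatically and stores them in a `visualDL_log` directory. The tools changed in this commit write to `visualDL_log/train` and `visualDL_log/infer`.
3. Train the model as usual.
4. Start the VisualDL dashboard using either of the following two methods: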

Method 1: start the VisualDL dashboard from the command line. The command format is:

```bash
visualdl --logdir <dir_1, dir_2, ... , dir_n> --model <model_file> --host <host> --port <port> --cache-timeout <cache_timeout> --language <language> --public-path <public_path> --api-only
```
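
Parameters:

| Parameter | Description |
| --------------- | ------------------------------------------------------------ |
| --logdir | Log directory. You can specify multiple directories. VisualDL recursively searches each specified directory and its subdirectories and visualizes all experiment results. |
| --model | Path to a model file (a file, not a directory). VisualDL visualizes the model file at this path. PaddlePaddle, ONNX, Keras, Core ML, Caffe and other model formats are supported. For details, see [supported model types for Graph](./docs/components/README.md#%E5%8A%9F%E8%83%BD%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E-2). |
| --host | IP address to bind, default `127.0.0.1`. To make the dashboard reachable from machines other than this one, set it to `0.0.0.0` or your public IP address. |
| --port | Port, default `8040`. |
| --cache-timeout | Backend cache duration, default 20 seconds. Within this period, repeated frontend requests for the same URL are answered from the cache. |
| --language | Dashboard language, either `en` or `zh`. Defaults to the browser's language. |
| --public-path | URL path of the dashboard, default `/app`. The dashboard is then served at `http://<host>:<port>/app`. |
| --api-only | Serve only the API. If this flag is set, VisualDL serves no web pages, only the API at `http://<host>:<port>/<public_path>/api`. If `public_path` is not set, the API is at `http://<host>:<port>/api`. |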
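
If you launch the dashboard from another script, you can assemble the flags above into an argument list and pass it to `subprocess`. The sketch below is hypothetical: `build_visualdl_command` is not a VisualDL or PaddleRec API. It handles a single log directory and only the flags listed in the table, and it only builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_visualdl_command(
    logdir: str,
    host: str = "127.0.0.1",
    port: int = 8040,
    cache_timeout: Optional[int] = None,
    language: Optional[str] = None,
    public_path: Optional[str] = None,
    api_only: bool = False,
) -> List[str]:
    """Build the argv list for the `visualdl` command-line launcher."""
    cmd = ["visualdl", "--logdir", logdir, "--host", host, "--port", str(port)]
    if cache_timeout is not None:
        cmd += ["--cache-timeout", str(cache_timeout)]
    if language is not None:
        # The table above lists 'en' and 'zh' as accepted values.
        cmd += ["--language", language]
    if public_path is not None:
        cmd += ["--public-path", public_path]
    if api_only:
        # --api-only is a switch: its presence alone enables API-only mode.
        cmd.append("--api-only")
    return cmd


if __name__ == "__main__":
    # Bind to all interfaces so other machines can reach the dashboard.
    cmd = build_visualdl_command("./visualDL_log", host="0.0.0.0")
    print(shlex.join(cmd))
    # To actually start the server: subprocess.run(cmd, check=True)
```

Passing `host="0.0.0.0"` corresponds to the `--host` row above: it makes the dashboard reachable from other machines.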

Method 2: start the VisualDL dashboard from a Python script. The interface is:

```python
visualdl.server.app.run(logdir,
                        model="path/to/model",
                        host="127.0.0.1",
                        port=8080,
                        cache_timeout=20,
                        language=None,
                        public_path=None,
                        api_only=False,
                        open_browser=False)
```
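
Note: apart from `logdir`, all parameters are optional. Pass them as keyword arguments, that is, by parameter name.

The interface parameters are:

| Parameter | Type | Description |
| ------------- | ------------------------------------------------ | ------------------------------------------------------------ |
| logdir | string or list[string_1, string_2, ... , string_n] | Path(s) of the log files. VisualDL recursively searches these paths for log files and visualizes them. You can give one or more paths. The logs in each path and its subdirectories are shown as separate logs in the dashboard. |
| model | string | Path to a model file (a file, not a directory). VisualDL visualizes the model file at this path. PaddlePaddle, ONNX, Keras, Core ML, Caffe and other model formats are supported. For details, see [supported model types for Graph](./docs/components/README.md#%E5%8A%9F%E8%83%BD%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E-2). |
| host | string | IP address to bind, default `127.0.0.1`. To make the dashboard reachable from machines other than this one, set it to `0.0.0.0` or your public IP address. |
| port | int | Server port, default `8040`. |
| cache_timeout | int | Backend cache duration, default 20 seconds. Within this period, repeated frontend requests for the same URL are answered from the cache. |
| language | string | Dashboard language, either `en` or `zh`. Defaults to the browser's language. |
| public_path | string | URL path of the dashboard, default `/app`. The dashboard is then served at `http://<host>:<port>/app`. |
| api_only | boolean | Serve only the API. If set, VisualDL serves no web pages, only the API at `http://<host>:<port>/<public_path>/api`. If `public_path` is not set, the API is at `http://<host>:<port>/api`. |
| open_browser | boolean | Whether to open a browser. If `True`, a browser opens the VisualDL dashboard after startup. Ignored when `api_only` is set. |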

5. After starting the VisualDL dashboard with either method, open it in a browser to view the visualized logs.
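
The `host`, `port`, `public_path` and `api_only` rows in the parameter tables determine which address to open in the browser, or to call from an API client. The following minimal sketch derives those addresses. It assumes `public_path` is given with a leading slash, like the default `/app`. `dashboard_urls` is a hypothetical helper, not part of VisualDL.

```python
from typing import Optional, Tuple


def dashboard_urls(host: str = "127.0.0.1", port: int = 8040,
                   public_path: Optional[str] = None) -> Tuple[str, str]:
    """Return (page_url, api_url) following the parameter tables above."""
    base = "http://{}:{}".format(host, port)
    if public_path is None:
        # Without public_path the page is served at /app and the API at /api.
        return base + "/app", base + "/api"
    # Normalize to exactly one leading slash and no trailing slash.
    path = "/" + public_path.strip("/")
    return base + path, base + path + "/api"


if __name__ == "__main__":
    page, api = dashboard_urls()
    print("Dashboard:", page)
    print("API (when api_only is set):", api)
```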
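
## Notes

1. The visualization feature relies on VisualDL. Install the latest version of VisualDL before enabling `use_visual` in the yaml file; otherwise an error is raised.
2. Visualization is not yet supported for static-graph training that uses the `dataset` reader mode (`QueueDataset`).
3. Currently the visualization feature only produces line charts. More types of visualization will be added in later releases.
4. If you have questions about this feature, you are welcome to discuss them in our user groups: QQ group 861717190, WeChat assistant ID paddlerec2020.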

doc/yaml.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -20,6 +20,7 @@
 | epochs | int | >= 1 || Number of epochs to train in the train phase |
 | print_interval | int | >= 1 || Interval, in batches, at which training metrics are printed |
 | use_auc | bool | True/False || Reset the AUC metric at the start of each epoch |
+| use_visual | bool | True/False || Enable visualization of model training; requires VisualDL to be installed |
 
 
 ## hyper_parameters variables
```
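
The tools changed in this commit read the new option with `config.get("runner.use_visual", False)`. A missing key therefore falls back to `False`, and visualization stays off. The dotted key suggests that the loaded yaml is exposed as flat `section.key` entries. The following minimal sketch shows that lookup under this assumption. `flatten_config` is a hypothetical helper, not PaddleRec's `load_yaml`.

```python
from typing import Any, Dict


def flatten_config(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested yaml sections into 'section.key' entries."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Recurse into sub-sections such as `runner` or `hyper_parameters`.
            flat.update(flatten_config(value, full_key + "."))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    # Equivalent of a yaml file whose runner section sets use_visual: True.
    config = flatten_config({"runner": {"epochs": 4, "use_visual": True}})
    print(config.get("runner.use_visual", False))  # True
    print(config.get("runner.use_auc", False))     # False: key absent, default used
```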

tools/infer.py

Lines changed: 21 additions & 3 deletions
```diff
@@ -64,6 +64,7 @@ def main(args):
 config["config_abs_dir"] = args.abs_dir
 # tools.vars
 use_gpu = config.get("runner.use_gpu", True)
+use_visual = config.get("runner.use_visual", False)
 test_data_dir = config.get("runner.test_data_dir", None)
 print_interval = config.get("runner.print_interval", None)
 model_load_path = config.get("runner.infer_load_path", "model_output")
@@ -72,15 +73,20 @@ def main(args):
 
 logger.info("**************common.configs**********")
 logger.info(
-"use_gpu: {}, test_data_dir: {}, start_epoch: {}, end_epoch: {}, print_interval: {}, model_load_path: {}".
-format(use_gpu, test_data_dir, start_epoch, end_epoch, print_interval,
-model_load_path))
+"use_gpu: {}, use_visual: {}, test_data_dir: {}, start_epoch: {}, end_epoch: {}, print_interval: {}, model_load_path: {}".
+format(use_gpu, use_visual, test_data_dir, start_epoch, end_epoch,
+print_interval, model_load_path))
 logger.info("**************common.configs**********")
 
 place = paddle.set_device('gpu' if use_gpu else 'cpu')
 
 dy_model = dy_model_class.create_model(config)
 
+# Create a log_visual object and store the data in the path
+if use_visual:
+from visualdl import LogWriter
+log_visual = LogWriter(args.abs_dir + "/visualDL_log/infer")
+
 # to do : add optimizer function
 #optimizer = dy_model_class.create_optimizer(dy_model, config)
 
@@ -92,6 +98,7 @@ def main(args):
 interval_begin = time.time()
 
 metric_list, metric_list_name = dy_model_class.create_metrics()
+step_num = 0
 
 for epoch_id in range(start_epoch, end_epoch):
 logger.info("load model epoch {}".format(epoch_id))
@@ -110,18 +117,29 @@ def main(args):
 for var_name, var in tensor_print_dict.items():
 tensor_print_str += (
 "{}:".format(var_name) + str(var.numpy()) + ",")
+if use_visual:
+log_visual.add_scalar(
+tag="infer/" + var_name,
+step=step_num,
+value=var.numpy())
 metric_str = ""
 for metric_id in range(len(metric_list_name)):
 metric_str += (
 metric_list_name[metric_id] +
 ": {:.6f},".format(metric_list[metric_id].accumulate())
 )
+if use_visual:
+log_visual.add_scalar(
+tag="infer/" + metric_list_name[metric_id],
+step=step_num,
+value=metric_list[metric_id].accumulate())
 logger.info("epoch: {}, batch_id: {}, ".format(
 epoch_id, batch_id) + metric_str + tensor_print_str +
 " speed: {:.2f} ins/s".format(
 print_interval * batch_size / (time.time(
 ) - interval_begin)))
 interval_begin = time.time()
+step_num = step_num + 1
 
 metric_str = ""
 for metric_id in range(len(metric_list_name)):
```

tools/static_infer.py

Lines changed: 16 additions & 4 deletions
```diff
@@ -24,7 +24,6 @@
 
 from utils.utils_single import load_yaml, load_static_model_class, get_abs_model, create_data_loader, reset_auc
 from utils.save_load import save_static_model, load_static_model
-
 import time
 import argparse
 
@@ -59,6 +58,7 @@ def main(args):
 
 use_gpu = config.get("runner.use_gpu", True)
 use_auc = config.get("runner.use_auc", False)
+use_visual = config.get("runner.use_visual", False)
 auc_num = config.get("runner.auc_num", 1)
 test_data_dir = config.get("runner.test_data_dir", None)
 print_interval = config.get("runner.print_interval", None)
@@ -69,9 +69,9 @@ def main(args):
 os.environ["CPU_NUM"] = str(config.get("runner.thread_num", 1))
 logger.info("**************common.configs**********")
 logger.info(
-"use_gpu: {}, test_data_dir: {}, start_epoch: {}, end_epoch: {}, print_interval: {}, model_load_path: {}".
-format(use_gpu, test_data_dir, start_epoch, end_epoch, print_interval,
-model_load_path))
+"use_gpu: {}, use_visual: {}, test_data_dir: {}, start_epoch: {}, end_epoch: {}, print_interval: {}, model_load_path: {}".
+format(use_gpu, use_visual, test_data_dir, start_epoch, end_epoch,
+print_interval, model_load_path))
 logger.info("**************common.configs**********")
 
 place = paddle.set_device('gpu' if use_gpu else 'cpu')
@@ -82,6 +82,12 @@ def main(args):
 test_dataloader = create_data_loader(
 config=config, place=place, mode="test")
 
+# Create a log_visual object and store the data in the path
+if use_visual:
+from visualdl import LogWriter
+log_visual = LogWriter(args.abs_dir + "/visualDL_log/infer")
+step_num = 0
+
 for epoch_id in range(start_epoch, end_epoch):
 logger.info("load model epoch {}".format(epoch_id))
 model_path = os.path.join(model_load_path, str(epoch_id))
@@ -104,12 +110,18 @@ def main(args):
 for var_idx, var_name in enumerate(fetch_vars):
 metric_str += "{}: {}, ".format(
 var_name, fetch_batch_var[var_idx][0])
+if use_visual:
+log_visual.add_scalar(
+tag="infer/" + var_name,
+step=step_num,
+value=fetch_batch_var[var_idx][0])
 logger.info("epoch: {}, batch_id: {}, ".format(
 epoch_id, batch_id) + metric_str + "speed: {:.2f} ins/s".
 format(print_interval * batch_size / (time.time(
 ) - interval_begin)))
 interval_begin = time.time()
 reader_start = time.time()
+step_num = step_num + 1
 
 metric_str = ""
 for var_idx, var_name in enumerate(fetch_vars):
```

tools/static_trainer.py

Lines changed: 22 additions & 7 deletions
```diff
@@ -62,6 +62,7 @@ def main(args):
 
 use_gpu = config.get("runner.use_gpu", True)
 use_auc = config.get("runner.use_auc", False)
+use_visual = config.get("runner.use_visual", False)
 auc_num = config.get("runner.auc_num", 1)
 train_data_dir = config.get("runner.train_data_dir", None)
 epochs = config.get("runner.epochs", None)
@@ -73,8 +74,8 @@ def main(args):
 os.environ["CPU_NUM"] = str(config.get("runner.thread_num", 1))
 logger.info("**************common.configs**********")
 logger.info(
-"use_gpu: {}, train_data_dir: {}, epochs: {}, print_interval: {}, model_save_path: {}".
-format(use_gpu, train_data_dir, epochs, print_interval,
+"use_gpu: {}, use_visual: {}, train_data_dir: {}, epochs: {}, print_interval: {}, model_save_path: {}".
+format(use_gpu, use_visual, train_data_dir, epochs, print_interval,
 model_save_path))
 logger.info("**************common.configs**********")
 
@@ -85,6 +86,14 @@ def main(args):
 
 last_epoch_id = config.get("last_epoch", -1)
 
+# Create a log_visual object and store the data in the path
+if use_visual:
+from visualdl import LogWriter
+log_visual = LogWriter(args.abs_dir + "/visualDL_log/train")
+else:
+log_visual = None
+step_num = 0
+
 if reader_type == 'QueueDataset':
 dataset, file_list = get_reader(input_data, config)
 elif reader_type == 'DataLoader':
@@ -96,9 +105,9 @@ def main(args):
 if use_auc:
 reset_auc(auc_num)
 if reader_type == 'DataLoader':
-fetch_batch_var = dataloader_train(epoch_id, train_dataloader,
-input_data_names, fetch_vars,
-exe, config)
+fetch_batch_var, step_num = dataloader_train(
+epoch_id, train_dataloader, input_data_names, fetch_vars, exe,
+config, use_visual, log_visual, step_num)
 metric_str = ""
 for var_idx, var_name in enumerate(fetch_vars):
 metric_str += "{}: {}, ".format(var_name,
@@ -139,7 +148,7 @@ def dataset_train(epoch_id, dataset, fetch_vars, exe, config):
 
 
 def dataloader_train(epoch_id, train_dataloader, input_data_names, fetch_vars,
-exe, config):
+exe, config, use_visual, log_visual, step_num):
 print_interval = config.get("runner.print_interval", None)
 batch_size = config.get("runner.train_batch_size", None)
 interval_begin = time.time()
@@ -162,6 +171,11 @@ def dataloader_train(epoch_id, train_dataloader, input_data_names, fetch_vars,
 for var_idx, var_name in enumerate(fetch_vars):
 metric_str += "{}: {}, ".format(var_name,
 fetch_batch_var[var_idx])
+if use_visual:
+log_visual.add_scalar(
+tag="train/" + var_name,
+step=step_num,
+value=fetch_batch_var[var_idx])
 logger.info(
 "epoch: {}, batch_id: {}, ".format(epoch_id,
 batch_id) + metric_str +
@@ -174,7 +188,8 @@ def dataloader_train(epoch_id, train_dataloader, input_data_names, fetch_vars,
 train_run_cost = 0.0
 total_samples = 0
 reader_start = time.time()
-return fetch_batch_var
+step_num = step_num + 1
+return fetch_batch_var, step_num
 
 
 if __name__ == "__main__":
```
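
In `static_trainer.py`, `dataloader_train` now receives `step_num` and returns the updated value. This appears to keep the step passed to `add_scalar` counting across epochs instead of restarting at 0 each epoch, with scalars tagged under a `train/` prefix. The sketch below isolates only that bookkeeping. It assumes that the counter advances once per batch and that scalars are logged every `print_interval` batches, because indentation in the diff above is not recoverable. `FakeWriter` is a hypothetical in-memory stand-in for `visualdl.LogWriter`, and `train_epoch` is illustrative, not PaddleRec code.

```python
from typing import Dict, List, Optional, Tuple


class FakeWriter:
    """Hypothetical stand-in for visualdl.LogWriter: records add_scalar calls."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, int, float]] = []

    def add_scalar(self, tag: str, step: int, value: float) -> None:
        self.records.append((tag, step, value))


def train_epoch(batch_metrics: List[Dict[str, float]], print_interval: int,
                use_visual: bool, log_visual: Optional[FakeWriter],
                step_num: int) -> int:
    """Log metrics every `print_interval` batches; return the updated global step."""
    for batch_id, metrics in enumerate(batch_metrics):
        if batch_id % print_interval == 0 and use_visual:
            for var_name, value in metrics.items():
                log_visual.add_scalar(tag="train/" + var_name, step=step_num, value=value)
        # One step per batch; the caller carries the value into the next epoch.
        step_num = step_num + 1
    return step_num


if __name__ == "__main__":
    writer = FakeWriter()
    step_num = 0
    for epoch_id in range(2):
        batches = [{"loss": 1.0 / (epoch_id * 4 + i + 1)} for i in range(4)]
        step_num = train_epoch(batches, 2, True, writer, step_num)
    print([step for _, step, _ in writer.records])  # [0, 2, 4, 6]
```

Because the step is threaded through the return value, the second epoch logs at steps 4 and 6 rather than repeating 0 and 2, so the two epochs do not overlap on the chart's x-axis.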
