
Questions about GPU cluster training logs #674

@qubingxin


The training set contains about 310K samples and the test set about 24K samples. Training runs on 2 nodes. The log from one of the nodes is:
I1130 17:48:22.723963 32583 TrainerInternal.cpp:180] Pass=2 Batch=309 samples=158208 AvgCost=0.134166 Eval: classification_error_evaluator=0.0493528
I1130 18:01:22.744444 32583 Tester.cpp:101] Test samples=51200 cost=0.34299 Eval: classification_error_evaluator=0.160273

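  1. The training samples value (samples=158208) is just over 150K. Can this be read as the 310K total divided across the 2 nodes?
  2. The Test samples value in the log (samples=51200) is over 50K, more than twice the size of the original test set. Why is there such a difference in sample counts?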
