Replies: 2 comments
-
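Yes, the difference you noticed between forward_train and forward_test in CPPDHead does exist, and it is meaningful. It shows up mainly in two places:
char_vis_node_query = paddle.concat([char_node_embed, visual_feats], 1) (forward_train). Explanation:
The benefit of this approach is that, during training, it "guides" the visual features to learn the character distribution or the contextual information of the character nodes, which helps training converge.
char_vis_node_query = visual_feats (forward_test). Explanation:
diag_mask = paddle.eye(pos_node_feats1.shape[1]).unsqueeze(0).tile([...]) is used in edge-relation reasoning to mask out self-connections (the diagonal). This is a common practice during training that keeps the model from trivially learning an identity mapping.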
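The snippet only shows how diag_mask is built, not how it is applied, and an identity mask can be used in two opposite ways. Multiplied in as (1 - mask), it removes self-connections, which is the reading given above. Multiplied in directly and summed over the last axis, it keeps only the diagonal. This NumPy sketch shows both readings so the actual usage in CPPDHead can be compared against them. scores is a made-up tensor standing in for pos_node_feats1, and B and N are arbitrary.

```python
import numpy as np

B, N = 2, 4  # hypothetical batch size and number of position nodes
# Made-up (B, N, N) scores standing in for pos_node_feats1.
scores = np.arange(B * N * N, dtype=float).reshape(B, N, N)

# NumPy version of eye(N).unsqueeze(0).tile([B, 1, 1]): a (B, N, N) stack of identities.
diag_mask = np.tile(np.eye(N)[None, :, :], (B, 1, 1))

# Reading 1: multiply by (1 - mask) to zero the self-connections on the diagonal.
no_self = scores * (1.0 - diag_mask)

# Reading 2: multiply by the mask and sum the last axis to keep only the diagonal.
self_only = (scores * diag_mask).sum(-1)  # shape (B, N)

print(diag_mask.shape)     # (2, 4, 4)
print(self_only.tolist())  # [[0.0, 5.0, 10.0, 15.0], [16.0, 21.0, 26.0, 31.0]]
```

Reading 2 is equivalent to taking the diagonal directly (np.diagonal(scores, axis1=1, axis2=2)), so if CPPDHead applies the mask that way, it selects self-connections rather than removing them.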
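Why are the inference results still good? This is because:
This is a common asymmetric train/test design: it uses more prior information to improve training while keeping the inference-time input simple and inference efficient. Reference practice: similar approaches exist in both NLP and OCR, for example:
If you want to verify the effect further, you can run an ablation test (for example, adding the training-time design to forward_test) and check whether accuracy changes. There is currently no documentation or Issue that explains this logic specifically, but you could open a discussion on PaddleOCR. Designs like this are usually described in the paper or in code comments.
Response generated by 🤖 feifei-bot | chatgpt-4o-latest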
-
I checked again.
-
I checked the code and noticed a difference between forward_train and forward_test in class CPPDHead.
In forward_train:
```python
# Prepend the character node embeddings to the visual tokens along the sequence axis.
char_vis_node_query = paddle.concat([char_node_embed, visual_feats], 1)
char_vis_node_query = char_decoder_layer(
    char_vis_node_query, char_vis_node_query[:, counting_char_num:, :]
)
# Keep only the visual part of the output.
char_vis_feats = char_vis_node_query[:, counting_char_num:, :]
pos_node_feats = self.edge_decoder(pos_node_query, char_vis_feats, char_vis_feats)
```
In forward_test:
```python
# Visual tokens only; no character node embeddings are prepended.
char_vis_node_query = visual_feats
char_vis_node_query = char_decoder_layer(char_vis_node_query, char_vis_node_query)
char_vis_feats = char_vis_node_query
pos_node_feats = self.edge_decoder(pos_node_query, char_vis_feats, char_vis_feats)
```
In forward_train, char_vis_node_query comes from attention between concat([char_node_embed, visual_feats], 1) and visual_feats, but in forward_test it comes from attention between visual_feats and visual_feats. This could make char_vis_feats = char_vis_node_query[:, counting_char_num:, :] in forward_train differ from char_vis_feats = char_vis_node_query in forward_test.
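Whether the two paths really diverge depends on how char_decoder_layer is built, which the snippets above do not show. This NumPy sketch assumes, without checking against the PaddleOCR source, that it is a pre-norm cross-attention block: each query token attends only to the key/value sequence (not to other query tokens), and the norms and MLP act on each token separately. Under that assumption, the visual rows of the output depend only on themselves and on the key/value, never on the prepended char_node_embed rows, so char_vis_feats would come out the same in both paths. ToyCrossAttnLayer, the three-layer loop and all sizes are hypothetical, and NumPy replaces Paddle so the sketch runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    # Per-token normalization over the feature axis (no learned affine terms).
    mean = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)


class ToyCrossAttnLayer:
    """Hypothetical stand-in for char_decoder_layer: pre-norm cross-attention in which
    each query token attends only to kv (never to other query tokens), then a per-token MLP."""

    def __init__(self, dim, rng):
        s = 1.0 / np.sqrt(dim)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0.0, s, (dim, dim)) for _ in range(4))
        self.w1 = rng.normal(0.0, s, (dim, 4 * dim))
        self.w2 = rng.normal(0.0, s / 2, (4 * dim, dim))

    def __call__(self, q, kv):
        qn, kvn = layer_norm(q), layer_norm(kv)
        scores = (qn @ self.wq) @ (kvn @ self.wk).transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        x = q + softmax(scores) @ (kvn @ self.wv) @ self.wo
        return x + np.maximum(layer_norm(x) @ self.w1, 0.0) @ self.w2


B, C, L, D = 2, 5, 7, 16  # batch, counting_char_num, visual tokens, feature dim (arbitrary)
layers = [ToyCrossAttnLayer(D, rng) for _ in range(3)]  # assumed stack of decoder layers
visual_feats = rng.normal(size=(B, L, D))
char_node_embed = rng.normal(size=(B, C, D))

# forward_train style: prepend char nodes; key/value is the visual slice of the query.
q = np.concatenate([char_node_embed, visual_feats], axis=1)
for layer in layers:
    q = layer(q, q[:, C:, :])
train_vis = q[:, C:, :]

# forward_test style: visual tokens only.
q = visual_feats
for layer in layers:
    q = layer(q, q)
test_vis = q

# Contrast: if key/value also contained the char nodes, the visual rows would change.
q = np.concatenate([char_node_embed, visual_feats], axis=1)
for layer in layers:
    q = layer(q, q)
mixed_vis = q[:, C:, :]

print(np.allclose(train_vis, test_vis))  # True
print(np.allclose(mixed_vis, test_vis))  # False
```

The third run is the contrast case: once the visual rows can also attend to the character nodes, the two paths no longer match. If the real layer mixes query tokens in that way, the difference is real, and the ablation suggested earlier in the thread is the way to measure its effect on accuracy.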
One more thing: in forward_train I see that this is used:
```python
diag_mask = (
    paddle.eye(pos_node_feats1.shape[1])
    .unsqueeze(0)
    .tile([pos_node_feats1.shape[0], 1, 1])
)
```
but forward_test does not use it.
Did you test this and compare the behavior of the train and test processes?
If the results are good, can you explain why they are still good despite this difference?