group by v2 #680

Open
Honglei-Qiu wants to merge 10 commits into PaddlePaddle:develop from Honglei-Qiu:group_v2

@Honglei-Qiu
Contributor

PR Category

Feature Enhancement

Description

Add new group grouping rules.

@paddle-bot

paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!

Collaborator

@Xreki left a comment

Please count how many groups each grouping method produces.
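
One way to produce the per-method group count requested above is sketched here. It is only an illustration: the record layout (pairs of method name and group id) and the method names `same_op_seq` and `same_shapes` are hypothetical and are not taken from this PR.

```python
from collections import Counter


def count_groups_per_method(group_records):
    """Count the distinct groups produced by each grouping method.

    group_records: iterable of (method_name, group_id) pairs, one pair per
    group member. Both fields are hypothetical names for illustration.
    """
    # Collapse repeated memberships so each group is counted once.
    distinct_groups = {(method, group_id) for method, group_id in group_records}
    return Counter(method for method, _ in distinct_groups)


# Hypothetical records: one row per (method, group) membership.
records = [
    ("same_op_seq", "g1"),
    ("same_op_seq", "g1"),  # second member of the same group
    ("same_op_seq", "g2"),
    ("same_shapes", "g3"),
]
group_counts = count_groups_per_method(records)
for method, count in sorted(group_counts.items()):
    print(f"{method}: {count} groups")
```

Counting distinct `(method, group_id)` pairs rather than rows keeps large groups from inflating the statistic.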

WHERE s.deleted = 0
AND s.sample_type != 'full_graph'
) sub
WHERE sub.rn = 1
Collaborator

What is the purpose of this condition?

Contributor Author

Deduplication, I think: it prevents the same sample from being selected more than once.


def get_v2_group_members(candidates: list[CandidateGraph], num_dtypes: int):
    # Index candidates by op_seq
    by_op_seq = defaultdict(list)
Collaborator

Please improve all of the variable names.

Contributor Author

done

b.input_shapes_bucket_id,
b.input_dtypes_bucket_id,
s.graph_hash,
ROW_NUMBER() OVER (
Collaborator

graph_hash is no longer needed, is it? And what does ROW_NUMBER do here?

Contributor Author

Within each (op_seq, shapes, dtypes) partition, rows are numbered in order of creation time, and only rn = 1 (the earliest row) is kept. This deduplicates within a bucket: a single bucket may contain several samples, and only one representative is kept.
That said, the code has changed a lot since then.
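
The keep-the-earliest-row-per-bucket pattern described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the PR's actual query: the table name `sample_bucket`, the columns `sample_uid` and `created_at`, and the sample rows are assumptions, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table schema; the PR joins separate sample/bucket tables.
conn.execute(
    """
    CREATE TABLE sample_bucket (
        sample_uid TEXT,
        op_seq_bucket_id TEXT,
        input_shapes_bucket_id TEXT,
        input_dtypes_bucket_id TEXT,
        created_at INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO sample_bucket VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", "op_a", "shape_0", "dtype_0", 30),
        ("s2", "op_a", "shape_0", "dtype_0", 10),  # earliest in its bucket
        ("s3", "op_a", "shape_1", "dtype_0", 20),  # alone in its bucket
    ],
)

# Number rows inside each (op_seq, shapes, dtypes) bucket by creation time,
# then keep only the first row of every bucket.
representatives = conn.execute(
    """
    SELECT sample_uid
    FROM (
        SELECT
            sample_uid,
            ROW_NUMBER() OVER (
                PARTITION BY op_seq_bucket_id,
                             input_shapes_bucket_id,
                             input_dtypes_bucket_id
                ORDER BY created_at ASC, sample_uid ASC
            ) AS rn
        FROM sample_bucket
    ) sub
    WHERE sub.rn = 1
    ORDER BY sample_uid
    """
).fetchall()
print(representatives)  # [('s2',), ('s3',)]
```

The extra `sample_uid` tie-breaker in `ORDER BY` is a choice made for this sketch so that the selected representative stays deterministic when two rows share a creation time.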

"""

# Index candidates by op_seq
by_op_seq = defaultdict(list)
Collaborator

A variable name like by_op_seq is too abstract.

Contributor Author

candidates_by_op_seq; I'll polish it a bit.
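
For illustration, the indexing step with the more descriptive name might look like the sketch below. `CandidateGraph` here is a hypothetical stand-in with only two fields (the PR's real class is not shown in this conversation), and `index_candidates_by_op_seq` is a helper name invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateGraph:
    # Hypothetical stand-in: only op_seq_bucket_id appears in the PR snippet.
    sample_uid: str
    op_seq_bucket_id: str


def index_candidates_by_op_seq(candidates):
    """Map each op_seq bucket id to the candidates that fall into it."""
    candidates_by_op_seq = defaultdict(list)
    for candidate in candidates:
        candidates_by_op_seq[candidate.op_seq_bucket_id].append(candidate)
    return candidates_by_op_seq


candidates = [
    CandidateGraph("s1", "op_a"),
    CandidateGraph("s2", "op_b"),
    CandidateGraph("s3", "op_a"),
]
candidates_by_op_seq = index_candidates_by_op_seq(candidates)
for op_seq_bucket_id, members in sorted(candidates_by_op_seq.items()):
    print(op_seq_bucket_id, [member.sample_uid for member in members])
```

Spelling out `candidate` instead of `c` and naming the mapping after both its values and its key makes the loop readable without the accompanying comment.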

for c in candidates:
    by_op_seq[c.op_seq_bucket_id].append(c)

rule3_selected_uids = set()
Collaborator

Don't name variables in the `rulex_` style (e.g. `rule3_selected_uids`).

Contributor Author

done
