How was POS_WEIGHT in utils.py (QA folder of Chatbot_Retrieval_model) obtained? #6

https://github.com/charlesXu86/Chatbot_Retrieval/blob/5249957f61392a93a296b00e440aac04d0c52992/Chatbot_Retrieval_model/QA/utils.py#L21-L30

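How are these part-of-speech (POS) weights obtained? Is weighting words by part of speech a common practice? At the company where I am interning I have also seen the part of speech of every word in a sentence being used, but I do not know exactly how that is done.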

https://github.com/charlesXu86/Chatbot_Retrieval/blob/5249957f61392a93a296b00e440aac04d0c52992/Chatbot_Retrieval_model/QA/utils.py#L107-L122
Does the similarity computation below work better than a combination of Jaccard, BM25, and embedding cosine similarity?




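	POS_WEIGHT = {
	    "Ag": 1,    # adjective morpheme
	    "a": 0.5,   # adjective
	    "ad": 0.5,  # adjective used adverbially
	    "an": 1,    # adjective used as a noun
	    "b": 1,     # distinguishing word
	    "c": 0.2,   # conjunction
	    "dg": 0.5,  # adverb morpheme
	    "d": 0.5,   # adverb
	    "e": 0.5,   # interjection
	    # ... (the excerpt ends here; the remaining entries are not shown)
	}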

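	elif method == 'vec' and embedding:
	    # word vectors + part-of-speech weights
	    sim_weight = 0
	    total_weight = 0
	    for word, pos in a:
	        # skip words that are missing from the embedding vocabulary
	        if word not in embedding.index2word:
	            continue

	        # weight of the current word's POS tag (default 1 for unlisted tags)
	        cur_weight = pos_weight.get(pos, 1)

	        # best match for this word among the words of b
	        max_word_sim = max(embedding.similarity(bword, word)
	                           for bword in b)
	        sim_weight += cur_weight * max_word_sim

	        total_weight += cur_weight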
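
To make the question concrete, here is a minimal self-contained sketch of how I read the excerpt above. Several things in it are my own assumptions rather than the repository's code: the toy vectors and the shortened weight table are made up, `cosine` stands in for `embedding.similarity`, `b` is taken to be a plain list of words, and the final normalization `sim_weight / total_weight` is a guess at what happens after the excerpt ends.

```python
import math

# Hypothetical toy data, for illustration only.
TOY_VECTORS = {
    "good": (1.0, 0.0),
    "great": (0.8, 0.6),
    "and": (0.0, 1.0),
}
TOY_POS_WEIGHT = {"a": 0.5, "c": 0.2}  # shortened stand-in for POS_WEIGHT


def cosine(u, v):
    """Cosine similarity of two vectors (stand-in for embedding.similarity)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def weighted_vec_similarity(a, b, vectors, pos_weight):
    """a: list of (word, pos) pairs; b: list of words (assumed input shapes)."""
    # Only words of b that have a vector can be compared.
    b_known = [w for w in b if w in vectors]
    if not b_known:
        return 0.0

    sim_weight = 0.0
    total_weight = 0.0
    for word, pos in a:
        if word not in vectors:
            continue  # skip out-of-vocabulary words, as in the excerpt
        cur_weight = pos_weight.get(pos, 1)  # unlisted tags default to 1
        max_word_sim = max(cosine(vectors[bword], vectors[word])
                           for bword in b_known)
        sim_weight += cur_weight * max_word_sim
        total_weight += cur_weight

    # Assumed normalization: POS-weighted average of the best matches.
    return sim_weight / total_weight if total_weight > 0 else 0.0


score = weighted_vec_similarity(
    [("good", "a"), ("and", "c")], ["great"], TOY_VECTORS, TOY_POS_WEIGHT
)
print(round(score, 4))
```

Under this reading, a conjunction (weight 0.2) contributes less to the score than an adjective (weight 0.5), so the weights only rescale how much each word's best match counts. Where the specific values come from is exactly what I am asking about.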
