How was POS_WEIGHT in utils.py (in the QA folder of Chatbot_Retrieval_model) obtained? #6

@Youarerare

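POS_WEIGHT = {
    "Ag": 1,    # adjective morpheme
    "a": 0.5,   # adjective
    "ad": 0.5,  # adjective used adverbially
    "an": 1,    # adjective used as a noun
    "b": 1,     # distinguishing word
    "c": 0.2,   # conjunction
    "dg": 0.5,  # adverb morpheme
    "d": 0.5,   # adverb
    "e": 0.5,   # interjection
    # ... (the excerpt quoted here stops at this entry)
}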

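How were these part-of-speech (POS) weights obtained? Is weighting words by their POS tag a common practice? At the company where I am interning, I have also seen the POS tag of every word in a sentence being used, but I do not know how exactly it is done.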

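elif method == 'vec' and embedding:
    # Word vectors + POS weights
    sim_weight = 0
    total_weight = 0
    for word, pos in a:
        # Skip words that are missing from the embedding vocabulary
        if word not in embedding.index2word:
            continue
        # POS tags without an entry in pos_weight default to weight 1
        cur_weight = pos_weight.get(pos, 1)
        # Best match for this word among the words of b
        max_word_sim = max(embedding.similarity(bword, word)
                           for bword in b)
        sim_weight += cur_weight * max_word_sim
        total_weight += cur_weight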
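
To make the fragment above easier to experiment with, here is a minimal, self-contained sketch of the same idea. It is not the repository's code: the toy vectors, the helper `cosine`, the function name `weighted_similarity`, and the final normalization by `total_weight` are all assumptions added for illustration, and `b` is assumed to be a plain list of words.

```python
import math

# Subset of the POS weights quoted in this issue; tags that are missing
# (for example "n") fall back to weight 1, as in pos_weight.get(pos, 1).
POS_WEIGHT = {"a": 0.5, "c": 0.2, "d": 0.5}

# Hypothetical toy embedding (word -> vector). A real system would load
# pretrained word vectors instead.
TOY_VECTORS = {
    "cheap": [1.0, 0.0],
    "phone": [0.0, 1.0],
    "mobile": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def weighted_similarity(a, b, vectors, pos_weight=POS_WEIGHT):
    """a: list of (word, pos) pairs; b: list of words (assumed structure)."""
    # Only compare against words of b that have a vector.
    b_known = [w for w in b if w in vectors]
    if not b_known:
        return 0.0
    sim_weight = 0.0
    total_weight = 0.0
    for word, pos in a:
        if word not in vectors:
            continue
        cur_weight = pos_weight.get(pos, 1)
        max_word_sim = max(cosine(vectors[word], vectors[bw]) for bw in b_known)
        sim_weight += cur_weight * max_word_sim
        total_weight += cur_weight
    # Assumed final step: weighted average of the per-word best matches.
    return sim_weight / total_weight if total_weight else 0.0


score = weighted_similarity([("phone", "n"), ("cheap", "a")], ["phone"], TOY_VECTORS)
print(score)
```

In this toy example the noun matches perfectly with weight 1 and the adjective does not match at all with weight 0.5, so the low adjective weight pulls the score down less than an unweighted average would.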

Does this way of computing similarity (word vectors plus POS weights) work better than combining Jaccard, BM25, and embedding cosine similarity?
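
For reference, the two lexical baselines named in the question can be sketched as follows. This is only an illustration of what would be compared, assuming pre-tokenized input; the function names and the `k1` and `b` defaults are assumptions, and the BM25 variant shown is one common Okapi formulation rather than anything taken from the repository.

```python
import math
from collections import Counter


def jaccard(a_tokens, b_tokens):
    """Jaccard similarity of two token lists, treated as sets."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [["cheap", "phone"], ["red", "apple"]]
print(jaccard(["cheap", "phone"], ["phone", "case"]))
print(bm25_scores(["phone"], docs))
```

Note that BM25 scores are unbounded while Jaccard and cosine similarity are bounded, so combining them usually requires some normalization or a learned weighting. Which approach works better depends on the data and would need to be measured on a labeled set.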
