Description
Chatbot_Retrieval/Chatbot_Retrieval_model/QA/utils.py
Lines 21 to 30 in 5249957
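```python
POS_WEIGHT = {
    "Ag": 1,    # adjectival morpheme
    "a": 0.5,   # adjective
    "ad": 0.5,  # adverbial adjective (adjective used as an adverb)
    "an": 1,    # nominal adjective (adjective used as a noun)
    "b": 1,     # distinguishing word (non-predicate adjective)
    "c": 0.2,   # conjunction
    "dg": 0.5,  # adverbial morpheme
    "d": 0.5,   # adverb
    "e": 0.5,   # interjection
    # ... (remaining tags not shown in this excerpt)
}
```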
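How were these part-of-speech weights obtained? Is weighting words by part of speech a common practice? At the company where I'm interning, I've also seen people use the part of speech of every word in a sentence, but I don't know exactly how they do it.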
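The excerpt does not say how the values in `POS_WEIGHT` were chosen. One data-driven alternative, offered here as an assumption and not as the repository's method, is to average the IDF of the words that carry each tag over a POS-tagged corpus (for example, the output of a tagger such as `jieba.posseg`), so tags dominated by frequent function words such as conjunctions end up with low weights. The function name `pos_weights_from_idf` and the smoothed IDF form (the one scikit-learn uses with `smooth_idf=True`) are illustrative choices.

```python
import math
from collections import defaultdict


def pos_weights_from_idf(tagged_docs):
    """Hypothetical helper: weight each POS tag by the mean IDF of its words.

    tagged_docs: list of documents, each a list of (word, pos) pairs.
    Returns {pos: weight}, scaled so the largest weight is 1.0.
    """
    n_docs = len(tagged_docs)

    # Document frequency: in how many documents each word appears.
    doc_freq = defaultdict(int)
    for doc in tagged_docs:
        for word in {w for w, _ in doc}:
            doc_freq[word] += 1

    # Smoothed IDF: rarer words get larger values.
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}

    # Group the IDF of every token occurrence by its POS tag.
    per_pos = defaultdict(list)
    for doc in tagged_docs:
        for word, pos in doc:
            per_pos[pos].append(idf[word])

    mean_idf = {pos: sum(vals) / len(vals) for pos, vals in per_pos.items()}
    top = max(mean_idf.values())
    return {pos: round(val / top, 2) for pos, val in mean_idf.items()}


docs = [
    [("我", "r"), ("和", "c"), ("猫", "n")],
    [("我", "r"), ("和", "c"), ("狗", "n")],
    [("你", "r"), ("和", "c"), ("鱼", "n")],
]
print(pos_weights_from_idf(docs))  # the conjunction "c" gets the lowest weight
```

The result depends entirely on the corpus; a hand-written table like `POS_WEIGHT` expresses a fixed preference for some word classes over others directly, without needing one.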
Chatbot_Retrieval/Chatbot_Retrieval_model/QA/utils.py
Lines 107 to 122 in 5249957
```python
elif method == 'vec' and embedding:
    # word vectors + POS weights
    sim_weight = 0
    total_weight = 0
    for word, pos in a:
        # index2word is the gensim < 4.0 attribute (renamed index_to_key in 4.x)
        if word not in embedding.index2word:
            continue
        cur_weight = pos_weight.get(pos, 1)
        # best cosine similarity between this word and any word in b
        max_word_sim = max(embedding.similarity(bword, word)
                           for bword in b)
        sim_weight += cur_weight * max_word_sim
        total_weight += cur_weight
    # ... (rest of the branch not shown in this excerpt)
```
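To make the idea in this excerpt testable without gensim, here is a self-contained sketch of the same scoring: each word in `a` is matched to its most similar word in `b`, and those best-match similarities are averaged with POS weights. Assumptions: a plain `dict` of word vectors stands in for the gensim model, out-of-vocabulary words in `b` are skipped (the excerpt does not show how it handles them), and the final `sim_weight / total_weight` normalization is a guess, since the excerpt stops before the return.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pos_weighted_similarity(a, b, vectors, pos_weight):
    """Hypothetical helper mirroring the 'vec' branch above.

    a: list of (word, pos) pairs; b: list of words;
    vectors: dict word -> list of floats; pos_weight: dict tag -> weight.
    """
    b_vecs = [vectors[w] for w in b if w in vectors]
    if not b_vecs:
        return 0.0
    sim_weight = 0.0
    total_weight = 0.0
    for word, pos in a:
        if word not in vectors:
            continue  # skip out-of-vocabulary words, as the excerpt does
        cur_weight = pos_weight.get(pos, 1)  # unknown tags default to 1
        max_word_sim = max(cosine(vectors[word], v) for v in b_vecs)
        sim_weight += cur_weight * max_word_sim
        total_weight += cur_weight
    # Assumed normalization: POS-weighted mean of best-match similarities.
    return sim_weight / total_weight if total_weight else 0.0
```

Because the score is a weighted mean, a low weight such as 0.2 for conjunctions keeps a poorly matched function word from dragging the sentence similarity down much.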
Is this way of computing similarity better than combining Jaccard, BM25, and embedding cosine similarity?
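Which approach works better can't be settled from the code alone; it depends on the data and is normally decided by evaluating both on labeled question pairs. For comparison, here is a minimal sketch of the alternative the question mentions: a weighted linear combination of Jaccard overlap, max-normalized BM25, and a precomputed embedding cosine score per candidate. The function names, the weights `(0.3, 0.3, 0.4)`, and the BM25 parameters `k1=1.5`, `b=0.75` are illustrative assumptions; the BM25 IDF uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the query tokens."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0
    df = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for w in query:
            if w not in tf:
                continue
            idf = math.log((n - df[w] + 0.5) / (df[w] + 0.5) + 1)
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def hybrid_scores(query, docs, cosine_sims, weights=(0.3, 0.3, 0.4)):
    """Hypothetical linear blend; cosine_sims holds one embedding score per doc."""
    bm = bm25_scores(query, docs)
    top = max(bm) or 1.0  # scale BM25 into [0, 1] so the three scores are comparable
    return [
        weights[0] * jaccard(query, d) + weights[1] * (s / top) + weights[2] * c
        for d, s, c in zip(docs, bm, cosine_sims)
    ]
```

The POS-weighted method captures near-synonyms that exact-match scores miss, while Jaccard and BM25 reward literal keyword overlap; a blend like this tries to get both, at the cost of weights that have to be tuned.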