Open
Labels
bug: Something isn't working
Description
from rank_bm25 import BM25Okapi
corpus = [
"Hello there good man!",
"It is quite windy in London",
"How is the weather today?"
]
tokenized_corpus = [doc.split(" ") for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)
query = "windy London"
tokenized_query = query.split(" ")
doc_scores = bm25.get_scores(tokenized_query)
print(doc_scores)
# Output: [0. 0.93729472 0. ]
But with the third document commented out:
from rank_bm25 import BM25Okapi
corpus = [
"Hello there good man!",
"It is quite windy in London",
# "How is the weather today?"
]
tokenized_corpus = [doc.split(" ") for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)
query = "windy London"
tokenized_query = query.split(" ")
doc_scores = bm25.get_scores(tokenized_query)
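print(doc_scores)
# Output: [0. 0.]
The only difference between the two runs is the number of documents in the corpus. The second result looks incorrect, since the second document contains both query terms, but I don't know why it scores 0.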
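One possible explanation, sketched here without the library: if BM25Okapi uses the classic Okapi IDF, log(N - n + 0.5) - log(n + 0.5), with defaults k1=1.5 and b=0.75 (assumptions on my part, though they do reproduce the 0.93729472 above), then a term found in exactly one of two documents gets an IDF of log(1.5) - log(1.5) = 0, which zeroes the whole score. The helper names below are hypothetical and are not part of rank_bm25.

```python
import math

def okapi_idf(corpus_size, doc_freq):
    # Assumed Okapi IDF: log(N - n + 0.5) - log(n + 0.5)
    return math.log(corpus_size - doc_freq + 0.5) - math.log(doc_freq + 0.5)

def okapi_term_score(idf, tf, doc_len, avg_doc_len, k1=1.5, b=0.75):
    # Per-term BM25 contribution; k1 and b are assumed defaults
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

# "It is quite windy in London" has 6 tokens; the average document
# length is 5 in both corpora: (4 + 6 + 5) / 3 and (4 + 6) / 2.
doc_len, avg_doc_len = 6, 5.0

# Three documents: "windy" and "London" each occur in 1 of 3 documents.
idf3 = okapi_idf(3, 1)
score3 = 2 * okapi_term_score(idf3, 1, doc_len, avg_doc_len)

# Two documents: each query term occurs in 1 of 2 documents, so IDF is 0.
idf2 = okapi_idf(2, 1)
score2 = 2 * okapi_term_score(idf2, 1, doc_len, avg_doc_len)

print(idf3, score3)
print(idf2, score2)
```

If that assumption holds, the zero scores come from the IDF formula on a two-document corpus rather than from the tokenization or the query.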