Description
Checked other resources
- I added a very descriptive title to this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
Example of code in colab:
!pip install -q faiss-cpu
!pip install -q langchain-huggingface
!pip install langchain-community -q
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_community.vectorstores.faiss import DistanceStrategy
from langchain_huggingface import HuggingFaceEmbeddings
import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
mn = 'basel/ATTACK-BERT'
embed_wrapper = HuggingFaceEmbeddings(model_name=mn,
model_kwargs={'device': device})
# sents = base_df['sentence'].tolist()
sents = ['Sidewinder has used JavaScript to drop and execute malware loaders.',
'Sidewinder has used PowerShell to drop and execute malware loaders.',
'It includes a module on internet threats designed to help end users learn how to identify and protect themselves from various types of phishing attacks.',
'It includes a module on Internet threats designed to help end users learn how to identify and protect themselves from various types of phishing attacks.',
'regexp_url (accessed Apr. 25, 2023).',
'regexp_url (accessed Apr. 28, 2023).',
'It’s unclear whether Victim 1 was impacted by Trigona.',
'It’s unclear whether Victim 2 was impacted by Trigona.',
'Use a reputed anti-virus and internet security software package on your connected devices, including PC, laptop, and mobile.',
'Use a reputed anti-virus and Internet security software package on your connected devices, including PC, laptop, and mobile.']
N = 100
db = FAISS.from_texts(sents[:N], embed_wrapper,
                      distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT, normalize_L2=True)
# NOTE: `query` is not defined in this report; set it to the query string before running.
docs_scores = db.similarity_search_with_relevance_scores(query, k=4,
                                                         score_threshold=0)
for doc, score in docs_scores:
    print(doc.page_content, doc.metadata, score)
Error Message and Stack Trace (if applicable)
No response
Description
There is a problem with choosing score_threshold in similarity_search_with_relevance_scores when the FAISS store uses distance_strategy == DistanceStrategy.MAX_INNER_PRODUCT.
- Why? Two functions in the call chain use score_threshold for opposite purposes. First,
similarity_search_with_score_by_vector
computes the scalar (inner product) similarity and keeps the docs whose similarity is at least score_threshold. The piece of code:
if score_threshold is not None:
    cmp = (
        operator.ge
        if self.distance_strategy
        in (DistanceStrategy.MAX_INNER_PRODUCT, DistanceStrategy.JACCARD)
        else operator.le
    )
    docs = [
        (doc, similarity)
        for doc, similarity in docs
        if cmp(similarity, score_threshold)
    ]
Then relevance_score_fn converts the similarity into a distance-like score (1.0 - similarity if the similarity is > 0), and the result is filtered again, keeping only the elements whose score is at least score_threshold:
docs_and_similarities = [
    (doc, similarity)
    for doc, similarity in docs_and_similarities
    if similarity >= score_threshold
]
If you have a doc with similarity 0.8 and score_threshold 0.6, it passes the first step, but it is then dropped in the second step because 0.2 (1 - 0.8) is less than 0.6.
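To make the interaction concrete, here is a minimal pure-Python sketch of the two filtering steps as described above. It does not call LangChain; the function names relevance_from_inner_product and two_stage_filter are hypothetical, and the conversion follows the description in this report (1.0 - similarity for positive values) rather than a verified copy of the library code.

```python
import operator


def relevance_from_inner_product(similarity: float) -> float:
    """Conversion as described in this report (an assumption, not library code)."""
    if similarity > 0.0:
        return 1.0 - similarity
    return -1.0 * similarity


def two_stage_filter(similarities, score_threshold):
    """Apply both threshold checks in sequence to raw inner-product similarities."""
    # Step 1: for MAX_INNER_PRODUCT the comparison is operator.ge on the raw similarity.
    stage1 = [s for s in similarities if operator.ge(s, score_threshold)]
    # Step 2: convert to a relevance score, then keep scores >= the same threshold.
    scored = [(s, relevance_from_inner_product(s)) for s in stage1]
    return [(s, r) for s, r in scored if r >= score_threshold]


# The example from the text: similarity 0.8 passes step 1 but fails step 2.
print(two_stage_filter([0.8], 0.6))  # → []

# With threshold 0.4, only similarities between 0.4 and 0.6 can survive both steps.
print(two_stage_filter([0.8, 0.5, 0.3], 0.4))  # → [(0.5, 0.5)]
```

Under these assumptions, a positive threshold t only lets through docs whose similarity lies in the range [t, 1 - t], which is empty once t exceeds 0.5.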
My example code outputs:
Sidewinder has used PowerShell to drop and execute malware loaders. {} 0.34206724
Sidewinder has used JavaScript to drop and execute malware loaders. {} 0.38181686
Use a reputed anti-virus and Internet security software package on your connected devices, including PC, laptop, and mobile. {} 0.83190054
Use a reputed anti-virus and internet security software package on your connected devices, including PC, laptop, and mobile. {} 0.83190054
But if you change score_threshold to 0.4, the similar docs (the first two) are dropped.
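The arithmetic behind that observation can be checked with the scores printed above. This is only a sketch: keep_at_or_above is a hypothetical helper that mirrors the final `>=` filter quoted earlier, and the implied similarities assume each score was produced as 1.0 - similarity from a positive inner product.

```python
# Relevance scores copied from the output reported above.
reported_scores = [0.34206724, 0.38181686, 0.83190054, 0.83190054]


def keep_at_or_above(scores, score_threshold):
    """Final filter as quoted in the report: keep scores >= score_threshold."""
    return [s for s in scores if s >= score_threshold]


# Assumption: each score came from 1.0 - similarity (positive inner product).
implied_similarities = [round(1.0 - s, 3) for s in reported_scores]

print(keep_at_or_above(reported_scores, 0.0))  # all four scores survive
print(keep_at_or_above(reported_scores, 0.4))  # only the two 0.83190054 scores survive
print(implied_similarities)  # the Sidewinder sentences have the highest implied similarity
```

If the assumption holds, the two Sidewinder sentences are the closest matches to the query, yet they receive the lowest relevance scores and are the first to be removed as the threshold rises.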
- Another question: why is there a warning when you set normalize_L2 to True? It seems to be a good way to turn the scalar product into cosine similarity.
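The reasoning behind that second question can be sketched in plain Python: after L2 normalization, the inner product of two vectors equals their cosine similarity. The helper names and the toy vectors below are illustrative only and are not taken from LangChain or FAISS.

```python
import math


def dot(a, b):
    """Inner (scalar) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed directly from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [1.0, 2.0]

inner_product_of_normalized = dot(l2_normalize(a), l2_normalize(b))

# The two values agree up to floating-point rounding.
print(math.isclose(inner_product_of_normalized, cosine(a, b)))  # → True
```

This is why normalizing with normalize_L2=True and searching by inner product is expected to rank documents by cosine similarity, with values bounded to [-1, 1].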
System Info
System Information
OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Sun Mar 30 16:01:29 UTC 2025
Python Version: 3.11.13 (main, Jun 4 2025, 08:57:29) [GCC 11.4.0]
Package Information
langchain_core: 0.3.68
langchain: 0.3.26
langchain_community: 0.3.27
langsmith: 0.4.4
langchain_huggingface: 0.3.0
langchain_text_splitters: 0.3.8
Optional packages not installed
langserve
Other Dependencies
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.28.1
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
huggingface-hub>=0.30.2: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
langchain-core<1.0.0,>=0.3.65: Installed. No version info available.
langchain-core<1.0.0,>=0.3.66: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.26: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith>=0.1.125: Installed. No version info available.
langsmith>=0.1.17: Installed. No version info available.
langsmith>=0.3.45: Installed. No version info available.
numpy>=1.26.2;: Installed. No version info available.
numpy>=2.1.0;: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: Installed. No version info available.
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: Installed. No version info available.
orjson: 3.10.18
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.11.7
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic>=2.7.4: Installed. No version info available.
pytest: 8.3.5
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: 13.9.4
sentence-transformers>=2.6.0;: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tokenizers>=0.19.1: Installed. No version info available.
transformers>=4.39.0;: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0