Commit 30f7eaa

1 parent 2da5d13 commit 30f7eaa

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
{
  "schema_version": "1.4.0",
  "id": "GHSA-c67j-w6g6-q2cm",
  "modified": "2025-12-23T18:46:13Z",
  "published": "2025-12-23T18:46:13Z",
  "aliases": [
    "CVE-2025-68664"
  ],
  "summary": "LangChain serialization injection vulnerability enables secret extraction in dumps/loads APIs",
  "details": "## Summary\n\nA serialization injection vulnerability exists in LangChain's `dumps()` and `dumpd()` functions. The functions do not escape dictionaries with `'lc'` keys when serializing free-form dictionaries. The `'lc'` key is used internally by LangChain to mark serialized objects. When user-controlled data contains this key structure, it is treated as a legitimate LangChain object during deserialization rather than plain user data.\n\n### Attack surface\n\nThe core vulnerability was in `dumps()` and `dumpd()`: these functions failed to escape user-controlled dictionaries containing `'lc'` keys. When this unescaped data was later deserialized via `load()` or `loads()`, the injected structures were treated as legitimate LangChain objects rather than plain user data.\n\nThis escaping bug enabled several attack vectors:\n\n1. **Injection via user data**: Malicious LangChain object structures could be injected through user-controlled fields like `metadata`, `additional_kwargs`, or `response_metadata`\n2. **Class instantiation within trusted namespaces**: Injected manifests could instantiate any `Serializable` subclass, but only within the pre-approved trusted namespaces (`langchain_core`, `langchain`, `langchain_community`). This includes classes with side effects in `__init__` (network calls, file operations, etc.). Note that namespace validation was already enforced before this patch, so arbitrary classes outside these trusted namespaces could not be instantiated.\n\n### Security hardening\n\nThis patch fixes the escaping bug in `dumps()` and `dumpd()` and introduces new restrictive defaults in `load()` and `loads()`: allowlist enforcement via `allowed_objects=\"core\"` (restricted to [serialization mappings](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/load/mapping.py)), `secrets_from_env` changed from `True` to `False`, and default Jinja2 template blocking via `init_validator`. These are breaking changes for some use cases.\n\n## Who is affected?\n\nApplications are vulnerable if they:\n\n1. **Use `astream_events(version=\"v1\")`** — The v1 implementation internally uses vulnerable serialization. Note: `astream_events(version=\"v2\")` is not vulnerable.\n2. **Use `Runnable.astream_log()`** — This method internally uses vulnerable serialization for streaming outputs.\n3. **Call `dumps()` or `dumpd()` on untrusted data, then deserialize with `load()` or `loads()`** — Trusting your own serialization output makes you vulnerable if user-controlled data (e.g., from LLM responses, metadata fields, or user inputs) contains `'lc'` key structures.\n4. **Deserialize untrusted data with `load()` or `loads()`** — Directly deserializing untrusted data that may contain injected `'lc'` structures.\n5. **Use `RunnableWithMessageHistory`** — Internal serialization in message history handling.\n6. **Use `InMemoryVectorStore.load()`** to deserialize untrusted documents.\n7. Load untrusted generations from cache using **`langchain-community` caches**.\n8. Load untrusted manifests from the LangChain Hub via **`hub.pull`**.\n9. Use **`StringRunEvaluatorChain`** on untrusted runs.\n10. Use **`create_lc_store`** or **`create_kv_docstore`** with untrusted documents.\n11. Use **`MultiVectorRetriever`** with byte stores containing untrusted documents.\n12. Use **`LangSmithRunChatLoader`** with runs containing untrusted messages.\n\nThe most common attack vector is through **LLM response fields** like `additional_kwargs` or `response_metadata`, which can be controlled via prompt injection and then serialized/deserialized in streaming operations.\n\n## Impact\n\nAttackers who control serialized data can extract environment variable secrets by injecting `{\"lc\": 1, \"type\": \"secret\", \"id\": [\"ENV_VAR\"]}` to load environment variables during deserialization (when `secrets_from_env=True`, which was the old default). They can also instantiate classes with controlled parameters by injecting constructor structures to instantiate any class within trusted namespaces with attacker-controlled parameters, potentially triggering side effects such as network calls or file operations.\n\nKey severity factors:\n\n- Affects the serialization path - applications trusting their own serialization output are vulnerable\n- Enables secret extraction when combined with `secrets_from_env=True` (the old default)\n- LLM responses in `additional_kwargs` can be controlled via prompt injection\n\n## Exploit example\n\n```python\nfrom langchain_core.load import dumps, load\nimport os\n\n# Attacker injects secret structure into user-controlled data\nattacker_dict = {\n \"user_data\": {\n \"lc\": 1,\n \"type\": \"secret\",\n \"id\": [\"OPENAI_API_KEY\"]\n }\n}\n\nserialized = dumps(attacker_dict) # Bug: does NOT escape the 'lc' key\n\nos.environ[\"OPENAI_API_KEY\"] = \"sk-secret-key-12345\"\ndeserialized = load(serialized, secrets_from_env=True)\n\nprint(deserialized[\"user_data\"]) # \"sk-secret-key-12345\" - SECRET LEAKED!\n\n```\n\n## Security hardening changes (breaking changes)\n\nThis patch introduces three breaking changes to `load()` and `loads()`:\n\n1. **New `allowed_objects` parameter** (defaults to `'core'`): Enforces allowlist of classes that can be deserialized. The `'all'` option corresponds to the list of objects [specified in `mappings.py`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/load/mapping.py) while the `'core'` option limits to objects within `langchain_core`. We recommend that users explicitly specify which objects they want to allow for serialization/deserialization.\n2. **`secrets_from_env` default changed from `True` to `False`**: Disables automatic secret loading from environment\n3. **New `init_validator` parameter** (defaults to `default_init_validator`): Blocks Jinja2 templates by default\n\n## Migration guide\n\n### No changes needed for most users\n\nIf you're deserializing standard LangChain types (messages, documents, prompts, trusted partner integrations like `ChatOpenAI`, `ChatAnthropic`, etc.), your code will work without changes:\n\n```python\nfrom langchain_core.load import load\n\n# Uses default allowlist from serialization mappings\nobj = load(serialized_data)\n\n```\n\n### For custom classes\n\nIf you're deserializing custom classes not in the serialization mappings, add them to the allowlist:\n\n```python\nfrom langchain_core.load import load\nfrom my_package import MyCustomClass\n\n# Specify the classes you need\nobj = load(serialized_data, allowed_objects=[MyCustomClass])\n```\n\n### For Jinja2 templates\n\nJinja2 templates are now blocked by default because they can execute arbitrary code. If you need Jinja2 templates, pass `init_validator=None`:\n\n```python\nfrom langchain_core.load import load\nfrom langchain_core.prompts import PromptTemplate\n\nobj = load(\n serialized_data,\n allowed_objects=[PromptTemplate],\n init_validator=None\n)\n\n```\n\n> [!WARNING]\n> Only disable `init_validator` if you trust the serialized data. Jinja2 templates can execute arbitrary Python code.\n\n### For secrets from environment\n\n`secrets_from_env` now defaults to `False`. If you need to load secrets from environment variables:\n\n```python\nfrom langchain_core.load import load\n\nobj = load(serialized_data, secrets_from_env=True)\n```\n\n\n## Credits\n\n* Dumps bug was reported by @yardenporat\n* Changes for security hardening due to findings from @0xn3va and @VladimirEliTokarev",
  "severity": [
    {
      "type": "CVSS_V3",
      "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:L/A:N"
    }
  ],
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "langchain-core"
      },
      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "1.0.0"
            },
            {
              "fixed": "1.2.5"
            }
          ]
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "langchain-core"
      },
      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.3.81"
            }
          ]
        }
      ]
    }
  ],
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/security/advisories/GHSA-c67j-w6g6-q2cm"
    },
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/pull/34455"
    },
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/pull/34458"
    },
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/commit/5ec0fa69de31bbe3d76e4cf9cd65a6accb8466c8"
    },
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/commit/d9ec4c5cc78960abd37da79b0250f5642e6f0ce6"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/langchain-ai/langchain"
    },
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/releases/tag/langchain-core%3D%3D0.3.81"
    },
    {
      "type": "WEB",
      "url": "https://github.com/langchain-ai/langchain/releases/tag/langchain-core%3D%3D1.2.5"
    }
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-502"
    ],
    "severity": "CRITICAL",
    "github_reviewed": true,
    "github_reviewed_at": "2025-12-23T18:46:13Z",
    "nvd_published_at": null
  }
}
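
Under the OSV schema used above, a version is affected when it is at or above an `introduced` event and below the matching `fixed` event in an ECOSYSTEM range (PEP 440 ordering for PyPI). Here is a minimal sketch of evaluating this advisory's `affected` ranges against an installed `langchain-core`; it assumes the JSON above is saved locally as `advisory.json` (a hypothetical filename) and that the `packaging` library is available:

```python
import json
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version  # PEP 440 comparison, as PyPI requires


def in_affected_range(installed: Version, osv_range: dict) -> bool:
    """True if `installed` >= an 'introduced' event and < the 'fixed' event."""
    introduced = fixed = None
    for event in osv_range.get("events", []):
        if "introduced" in event:
            introduced = Version(event["introduced"])
        elif "fixed" in event:
            fixed = Version(event["fixed"])
    if introduced is None:
        return False
    return installed >= introduced and (fixed is None or installed < fixed)


with open("advisory.json") as f:  # hypothetical local copy of this record
    advisory = json.load(f)

try:
    installed = Version(version("langchain-core"))
except PackageNotFoundError:
    installed = None

if installed is not None:
    vulnerable = any(
        in_affected_range(installed, r)
        for entry in advisory["affected"]
        if entry["package"]["name"] == "langchain-core"
        for r in entry["ranges"]
        if r.get("type") == "ECOSYSTEM"
    )
    status = "affected" if vulnerable else "not affected"
    print(f"langchain-core {installed} is {status} by GHSA-c67j-w6g6-q2cm")
```

Per the ranges above, for example, 1.1.0 reports affected (introduced 1.0.0, fixed 1.2.5), while 0.3.81 and 1.2.5 report not affected.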
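The advisory's `details` identify injected payloads by their `'lc'` marker key (e.g. `{"lc": 1, "type": "secret", "id": ["ENV_VAR"]}`). For deployments that cannot upgrade immediately, one rough stop-gap is to reject untrusted payloads containing such structures before they reach `load()`/`loads()`. This is a best-effort sketch based only on the shape described in the advisory, not an official LangChain mitigation:

```python
from typing import Any


def contains_lc_marker(value: Any) -> bool:
    """Recursively detect dicts shaped like serialized LangChain objects.

    Matches the injection shape described in the advisory: a dict carrying
    both an 'lc' key and a 'type' key (e.g. type 'secret' or 'constructor').
    """
    if isinstance(value, dict):
        if "lc" in value and "type" in value:
            return True
        return any(contains_lc_marker(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_lc_marker(v) for v in value)
    return False


# Example: the secret-extraction payload from the advisory is detected.
untrusted = {"user_data": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}
if contains_lc_marker(untrusted):
    raise ValueError("refusing to (de)serialize data containing 'lc' structures")
```

Note this can false-positive on legitimate data that happens to contain both keys; upgrading to a fixed release (0.3.81 / 1.2.5) remains the real fix.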
