Commit ac6711b

LangChain Python Instructions (#303)

* feat(instructions): Add comprehensive LangChain development instructions for Python
* feat(instructions): Revise LangChain Python instructions for chat models and vector stores
* feat(instructions): Enhance LangChain Python documentation with detailed Runnable interface and chat model usage examples
* feat(instructions): Add LangChain Python instructions to README
* fix(instructions): Standardize description quotes in LangChain Python instructions
* refactor(instructions): Streamline chat models section and remove redundant overview content
1 parent 38d3ab3 commit ac6711b

File tree

2 files changed: +230 −0 lines changed

README.instructions.md

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ Team and project-specific instructions to enhance GitHub Copilot's behavior for
| [Joyride User Scripts Project Assistant](instructions/joyride-user-project.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fjoyride-user-project.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fjoyride-user-project.instructions.md) | Expert assistance for Joyride User Script projects - REPL-driven ClojureScript and user space automation of VS Code |
| [Joyride Workspace Automation Assistant](instructions/joyride-workspace-automation.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fjoyride-workspace-automation.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fjoyride-workspace-automation.instructions.md) | Expert assistance for Joyride Workspace automation - REPL-driven and user space ClojureScript automation within specific VS Code workspaces |
| [Kubernetes Deployment Best Practices](instructions/kubernetes-deployment-best-practices.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fkubernetes-deployment-best-practices.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fkubernetes-deployment-best-practices.instructions.md) | Comprehensive best practices for deploying and managing applications on Kubernetes. Covers Pods, Deployments, Services, Ingress, ConfigMaps, Secrets, health checks, resource limits, scaling, and security contexts. |
| [LangChain Python Instructions](instructions/langchain-python.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Flangchain-python.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Flangchain-python.instructions.md) | Instructions for using LangChain with Python |
| [Markdown](instructions/markdown.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fmarkdown.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fmarkdown.instructions.md) | Documentation and content creation standards |
| [Memory Bank](instructions/memory-bank.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fmemory-bank.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fmemory-bank.instructions.md) | Bank specific coding standards and best practices |
| [Microsoft 365 Declarative Agents Development Guidelines](instructions/declarative-agents-microsoft365.instructions.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fdeclarative-agents-microsoft365.instructions.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/instructions?url=vscode-insiders%3Achat-instructions%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Finstructions%2Fdeclarative-agents-microsoft365.instructions.md) | Comprehensive development guidelines for Microsoft 365 Copilot declarative agents with schema v1.5, TypeSpec integration, and Microsoft 365 Agents Toolkit workflows |
instructions/langchain-python.instructions.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
---
description: 'Instructions for using LangChain with Python'
applyTo: '**/*.py'
---

# LangChain Python Instructions

These instructions guide GitHub Copilot in generating code and documentation for LangChain applications in Python. Focus on LangChain-specific patterns, APIs, and best practices.
## Runnable Interface (LangChain-specific)

LangChain's `Runnable` interface is the foundation for composing and executing chains, chat models, output parsers, retrievers, and LangGraph graphs. It provides a unified API for invoking, batching, streaming, inspecting, and composing components.

**Key LangChain-specific features:**

- All major LangChain components (chat models, output parsers, retrievers, graphs) implement the Runnable interface.
- Supports synchronous (`invoke`, `batch`, `stream`) and asynchronous (`ainvoke`, `abatch`, `astream`) execution.
- Batching (`batch`, `batch_as_completed`) is optimized for parallel API calls; set `max_concurrency` in `RunnableConfig` to control parallelism.
- Streaming APIs (`stream`, `astream`, `astream_events`) yield outputs as they are produced, which is critical for responsive LLM apps.
- Input/output types are component-specific (e.g., chat models accept messages, retrievers accept strings, output parsers accept model outputs).
- Inspect schemas with `get_input_schema`, `get_output_schema`, and their JSON Schema variants for validation and OpenAPI generation.
- Use `with_types` to override inferred input/output types for complex LCEL chains.
- Compose Runnables declaratively with LCEL: `chain = prompt | chat_model | output_parser` (see the sketch after the best-practices list below).
- `RunnableConfig` (tags, metadata, callbacks, concurrency) propagates automatically in Python 3.11+; propagate it manually in async code on Python 3.9/3.10.
- Create custom runnables with `RunnableLambda` (simple transforms) or `RunnableGenerator` (streaming transforms); avoid subclassing `Runnable` directly.
- Configure runtime attributes and alternatives with `configurable_fields` and `configurable_alternatives` for dynamic chains and LangServe deployments.
**LangChain best practices:**

- Use batching for parallel API calls to LLMs or retrievers; set `max_concurrency` to avoid rate limits.
- Prefer streaming APIs for chat UIs and long outputs.
- Always validate input/output schemas for custom chains and deployed endpoints.
- Use tags and metadata in `RunnableConfig` for tracing in LangSmith and debugging complex chains.
- For custom logic, wrap functions with `RunnableLambda` or `RunnableGenerator` instead of subclassing.
- For advanced configuration, expose fields and alternatives via `configurable_fields` and `configurable_alternatives`.
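A minimal LCEL sketch showing composition, batching, and streaming. It assumes `langchain-openai` is installed and `OPENAI_API_KEY` is set; the prompt text and model name are illustrative:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Declarative LCEL composition: each component is a Runnable.
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chat_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # example model name
chain = prompt | chat_model | StrOutputParser()

# invoke: single input -> single output.
print(chain.invoke({"text": "LangChain composes LLM components."}))

# batch: parallel calls; cap concurrency to respect provider rate limits.
summaries = chain.batch(
    [{"text": "Doc one."}, {"text": "Doc two."}],
    config={"max_concurrency": 2},
)

# stream: yield output chunks as they are produced.
for chunk in chain.stream({"text": "Streaming keeps UIs responsive."}):
    print(chunk, end="")
```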
## Chat model usage

- Use LangChain's chat model integrations for conversational AI:
  - Import from `langchain.chat_models` or a provider package such as `langchain_openai` (e.g., `ChatOpenAI`).
  - Compose messages using `SystemMessage`, `HumanMessage`, and `AIMessage`.
  - For tool calling, use the `bind_tools(tools)` method.
  - For structured outputs, use `with_structured_output(schema)`.

Example:
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage  # current home of message types

# temperature=0 for more deterministic answers.
chat = ChatOpenAI(model="gpt-4", temperature=0)
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is LangChain?"),
]
response = chat.invoke(messages)
print(response.content)
```
- Compose messages as a list of `SystemMessage`, `HumanMessage`, and optionally `AIMessage` objects.
- For RAG, combine chat models with retrievers/vector stores for context injection.
- Use `streaming=True` (or the `stream` method) for real-time token streaming where supported.
- Use `bind_tools(tools)` for function/tool calling (OpenAI, Anthropic, etc.).
- For structured outputs, prefer `with_structured_output(schema)`, as sketched below; some OpenAI models also accept a JSON response format (consult the provider docs).
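A sketch of `with_structured_output`, assuming `langchain-openai` and Pydantic v2; the `Answer` schema is hypothetical:

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Answer(BaseModel):
    """Hypothetical schema for a structured reply."""
    summary: str = Field(description="One-sentence summary")
    confidence: float = Field(description="Self-reported confidence, 0-1")

chat = ChatOpenAI(model="gpt-4o", temperature=0)
structured_chat = chat.with_structured_output(Answer)

result = structured_chat.invoke("Summarize what LangChain is.")
print(result.summary, result.confidence)  # result is an Answer instance
```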
Best practices:

- Always validate model outputs before using them in downstream tasks.
- Prefer explicit message types for clarity and reliability.
- For Copilot, provide clear, actionable prompts and document expected outputs.
## Architecture

- LLM client factory: centralize provider configs (API keys), timeouts, retries, and telemetry. Provide a single place to switch providers or client settings.
- Prompt templates: store templates under `prompts/` and load them via a safe helper. Keep templates small and testable.
- Chains vs. agents: prefer chains for deterministic pipelines (RAG, summarization); use agents when you need planning or dynamic tool selection.
- Tools: implement typed adapter interfaces for tools; validate inputs and outputs strictly.
- Memory: default to stateless design. When memory is needed, store minimal context and document retention/erasure policies.
- Retrievers: build retrieval + rerank pipelines. Keep the vectorstore schema stable (id, text, metadata).
### Patterns

- Callbacks & tracing: use LangChain callbacks and integrate with LangSmith or your tracing system to capture the request/response lifecycle.
- Separation of concerns: keep prompt construction, LLM wiring, and business logic separate to simplify testing and reduce accidental prompt changes.
## Embeddings & vectorstores

- Use consistent chunking and metadata fields (source, page, chunk_index).
- Cache embeddings to avoid repeated cost for unchanged documents (see the sketch below).
- Local/dev: Chroma or FAISS. Production: managed vector DBs (Pinecone, Qdrant, Milvus, Weaviate) depending on scale and SLAs.
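One way to cache embeddings is LangChain's `CacheBackedEmbeddings`; a sketch assuming `langchain` and `langchain-openai` are installed (the cache path is an example):

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings()
store = LocalFileStore("./embedding_cache/")  # example path

# Re-embedding an unchanged document becomes a cache hit, not an API call.
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying, store, namespace=underlying.model
)
vectors = cached_embedder.embed_documents(["chunk one", "chunk two"])
```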
## Vector stores (LangChain-specific)

- Use LangChain's vectorstore integrations for semantic search, retrieval-augmented generation (RAG), and document similarity workflows.
- Always initialize vectorstores with a supported embedding model (e.g., `OpenAIEmbeddings`, `HuggingFaceEmbeddings`).
- Prefer official integrations (e.g., Chroma, FAISS, Pinecone, Qdrant, Weaviate) for production; use `InMemoryVectorStore` for tests and demos.
- Store documents as LangChain `Document` objects with `page_content` and `metadata`.
- Use `add_documents(documents, ids=...)` to add or update documents. Always provide unique IDs for upserts.
- Use `delete(ids=...)` to remove documents by ID.
- Use `similarity_search(query, k=4, filter={...})` to retrieve the top-k similar documents. Use metadata filters for scoped search.
- For RAG, connect your vectorstore to a retriever and chain it with an LLM (see LangChain's retriever and RAG docs, and the sketch at the end of this section).
- For advanced search, use vectorstore-specific options: Pinecone supports hybrid search and metadata filtering; Chroma supports filtering and custom distance metrics.
- Always validate the vectorstore integration and API version in your environment; breaking changes are common between LangChain releases.
- Example (`InMemoryVectorStore`):
```python
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

embedding_model = OpenAIEmbeddings()
vector_store = InMemoryVectorStore(embedding=embedding_model)

# Provide stable IDs so repeated adds act as upserts.
documents = [Document(page_content="LangChain content", metadata={"source": "doc1"})]
vector_store.add_documents(documents=documents, ids=["doc1"])

results = vector_store.similarity_search("What is RAG?", k=2)
for doc in results:
    print(doc.page_content, doc.metadata)
```
- For production, prefer persistent vectorstores (Chroma, Pinecone, Qdrant, Weaviate) and configure authentication, scaling, and backup per the provider docs.
- Reference: https://python.langchain.com/docs/integrations/vectorstores/
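A minimal RAG sketch building on the `vector_store` from the example above (prompt wording and model name are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Expose the vector store as a Runnable retriever.
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join retrieved Document objects into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
print(rag_chain.invoke("What is RAG?"))
```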
## Prompt engineering & governance

- Store canonical prompts under `prompts/` and reference them by filename from code.
- Write unit tests that assert required placeholders exist and that rendered prompts fit expected patterns (length, variables present); a sketch follows this list.
- Maintain a CHANGELOG for prompt and schema changes that affect behavior.
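A sketch of such a test; the template file and placeholder names are hypothetical:

```python
from pathlib import Path

def test_summary_prompt_placeholders():
    # prompts/summary.txt is a hypothetical template file.
    template = Path("prompts/summary.txt").read_text()
    for placeholder in ("{text}", "{audience}"):
        assert placeholder in template, f"missing {placeholder}"
    assert len(template) < 2000  # keep templates small and testable
```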
## Chat models

LangChain offers a consistent interface for chat models, with additional features for monitoring, debugging, and optimization.

### Integrations

Integrations are either:

1. Official: packaged `langchain-<provider>` integrations maintained by the LangChain team or the provider.
2. Community: contributed integrations (in `langchain-community`).

Chat models typically follow a naming convention with a `Chat` prefix (e.g., `ChatOpenAI`, `ChatAnthropic`, `ChatOllama`). Models without the `Chat` prefix (or with an `LLM` suffix) often implement the older string-in/string-out interface and are less suitable for modern chat workflows.
### Interface

Chat models implement `BaseChatModel` and support the Runnable interface: streaming, async, batching, and more. Many operations accept and return LangChain messages (with roles such as `system`, `user`, and `assistant`). See the `BaseChatModel` API reference for details.

Key methods include:

- `invoke(messages, ...)`: send a list of messages and receive a response.
- `stream(messages, ...)`: stream partial outputs as tokens arrive.
- `batch(inputs, ...)`: batch multiple requests.
- `bind_tools(tools)`: attach tool adapters for tool calling.
- `with_structured_output(schema)`: helper to request structured responses.
### Inputs and outputs

- LangChain supports its own message format and OpenAI's message format; pick one and use it consistently across your codebase.
- Messages include a `role` and `content` blocks; content can include structured or multimodal payloads where supported.
### Standard parameters

Commonly supported parameters (provider-dependent):

- `model`: model identifier (e.g., `gpt-4o`, `gpt-3.5-turbo`).
- `temperature`: randomness control (0.0 deterministic to 1.0 creative).
- `timeout`: seconds to wait before canceling.
- `max_tokens`: response token limit.
- `stop`: stop sequences.
- `max_retries`: retry attempts for network/limit failures.
- `api_key`, `base_url`: provider auth and endpoint configuration.
- `rate_limiter`: optional `BaseRateLimiter` to space requests and avoid provider quota errors.

> Note: Not all parameters are implemented by every provider. Always consult the provider integration docs.
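A sketch combining several of these parameters with LangChain's `InMemoryRateLimiter` (all values are illustrative, and not every provider supports every parameter):

```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

# Allow roughly one request per second, checking availability every 100 ms.
limiter = InMemoryRateLimiter(requests_per_second=1, check_every_n_seconds=0.1)

chat = ChatOpenAI(
    model="gpt-4o",
    temperature=0.2,
    timeout=30,           # seconds before canceling
    max_tokens=512,       # cap response length
    max_retries=2,        # retry transient failures
    rate_limiter=limiter, # space out calls to avoid quota errors
)
```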
### Tool calling

Chat models can call tools (APIs, DBs, system adapters). Use LangChain's tool-calling APIs to:

- Register tools with strict input/output typing.
- Observe and log tool call requests and results.
- Validate tool outputs before passing them back to the model or executing side effects.

See the tool-calling guide in the LangChain docs for examples and safe patterns.
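A minimal tool-calling sketch with the `@tool` decorator; the `get_weather` tool is hypothetical:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""  # docstring becomes the tool description
    return f"Sunny in {city}"  # stub; a real tool would call an API

chat = ChatOpenAI(model="gpt-4o").bind_tools([get_weather])
response = chat.invoke("What's the weather in Paris?")

# The model returns tool call requests; validate them before executing.
for call in response.tool_calls:
    print(call["name"], call["args"])
```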
### Structured outputs

Use `with_structured_output` or schema-enforced methods to request JSON or typed outputs from the model. Structured outputs are essential for reliable extraction and downstream processing (parsers, DB writes, analytics).
### Multimodality

Some models support multimodal inputs (images, audio). Check provider docs for supported input types and limitations. Multimodal outputs are rare; treat them as experimental and validate rigorously.
### Context window

Models have a finite context window measured in tokens. When designing conversational flows:

- Keep messages concise and prioritize important context.
- Trim old context (summarize or archive) outside the model when it exceeds the window; see the sketch after this list.
- Use a retriever + RAG pattern to surface relevant long-form context instead of pasting large documents into the chat.
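LangChain ships a `trim_messages` helper for this; a sketch (the history and token budget are illustrative):

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, trim_messages
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-4o-mini")
history = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="First question..."),
    AIMessage(content="First answer..."),
    HumanMessage(content="Latest question"),
]

# Keep the most recent messages that fit the budget; count tokens with the model itself.
trimmed = trim_messages(
    history,
    max_tokens=1000,
    strategy="last",
    token_counter=chat,
    include_system=True,
)
response = chat.invoke(trimmed)
```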
## Advanced topics

### Rate-limiting

- Use `rate_limiter` when initializing chat models to space out calls.
- Implement retry with exponential backoff, and consider fallback models or degraded modes when throttled.
### Caching

- Exact-input caching for conversations is often ineffective. Consider semantic caching (embedding-based) for repeated meaning-level queries.
- Semantic caching introduces a dependency on embeddings and is not universally suitable.
- Cache only where it reduces cost and meets correctness requirements (e.g., FAQ bots).
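A sketch of semantic caching with `RedisSemanticCache` from `langchain-community`; it assumes a running Redis instance, and the URL is an example:

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import RedisSemanticCache
from langchain_openai import OpenAIEmbeddings

# Queries whose embeddings are close enough reuse the cached response.
set_llm_cache(
    RedisSemanticCache(
        redis_url="redis://localhost:6379",  # example URL
        embedding=OpenAIEmbeddings(),
    )
)
```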
## Best practices

- Use type hints and dataclasses for public APIs.
- Validate inputs before calling LLMs or tools.
- Load secrets from secret managers; never log secrets or unredacted model outputs.
- Deterministic tests: mock LLM and embedding calls (a sketch follows this list).
- Cache embeddings and frequent retrieval results.
- Observability: log request_id, model name, latency, and sanitized token counts.
- Implement exponential backoff and idempotency for external calls.
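For deterministic tests, LangChain's fake models (in `langchain_core.language_models`) can stand in for real providers; a sketch:

```python
from langchain_core.language_models import FakeListChatModel

def test_chain_returns_canned_answer():
    # Responses are returned in order, with no network calls.
    fake_chat = FakeListChatModel(responses=["LangChain is a framework."])
    result = fake_chat.invoke("What is LangChain?")
    assert result.content == "LangChain is a framework."
```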
## Security & privacy

- Treat model outputs as untrusted. Sanitize before executing generated code or system commands.
- Validate any user-supplied URLs and inputs to avoid SSRF and injection attacks.
- Document data retention and add an API to erase user data on request.
- Limit stored PII and encrypt sensitive fields at rest.
