---
description: 'Instructions for using LangChain with Python'
applyTo: "**/*.py"
---

# LangChain Python Instructions

These instructions guide GitHub Copilot in generating code and documentation for LangChain applications in Python. Focus on LangChain-specific patterns, APIs, and best practices.

## Runnable Interface (LangChain-specific)

LangChain's `Runnable` interface is the foundation for composing and executing chains, chat models, output parsers, retrievers, and LangGraph graphs. It provides a unified API for invoking, batching, streaming, inspecting, and composing components.

**Key LangChain-specific features:**

- All major LangChain components (chat models, output parsers, retrievers, graphs) implement the Runnable interface.
- Supports synchronous (`invoke`, `batch`, `stream`) and asynchronous (`ainvoke`, `abatch`, `astream`) execution.
- Batching (`batch`, `batch_as_completed`) is optimized for parallel API calls; set `max_concurrency` in `RunnableConfig` to control parallelism.
- Streaming APIs (`stream`, `astream`, `astream_events`) yield outputs as they are produced, which is critical for responsive LLM apps.
- Input/output types are component-specific (e.g., chat models accept messages, retrievers accept strings, output parsers accept model outputs).
- Inspect schemas with `get_input_schema`, `get_output_schema`, and their JSON Schema variants for validation and OpenAPI generation.
- Use `with_types` to override inferred input/output types for complex LCEL chains.
- Compose Runnables declaratively with LCEL: `chain = prompt | chat_model | output_parser`.
- Propagate `RunnableConfig` (tags, metadata, callbacks, concurrency) automatically in Python 3.11+; propagate it manually in async code on Python 3.9/3.10.
- Create custom runnables with `RunnableLambda` (simple transforms) or `RunnableGenerator` (streaming transforms); avoid subclassing directly.
- Configure runtime attributes and alternatives with `configurable_fields` and `configurable_alternatives` for dynamic chains and LangServe deployments.

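A minimal sketch of composing and running an LCEL chain with `invoke`, `batch`, and `stream`. It assumes `langchain-openai` is installed and `OPENAI_API_KEY` is set; the model name and prompt text are illustrative.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prompt -> chat model -> parser, composed declaratively with LCEL.
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chat_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is illustrative
chain = prompt | chat_model | StrOutputParser()

# Single call.
summary = chain.invoke({"text": "LangChain composes LLM components via the Runnable interface."})

# Parallel calls; max_concurrency caps concurrent requests to respect rate limits.
summaries = chain.batch(
    [{"text": "Runnables support batching."}, {"text": "Runnables support streaming."}],
    config={"max_concurrency": 5},
)

# Stream output chunks as they are produced.
for chunk in chain.stream({"text": "Streaming keeps chat UIs responsive."}):
    print(chunk, end="", flush=True)
```
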
**LangChain best practices:**

- Use batching for parallel API calls to LLMs or retrievers; set `max_concurrency` to avoid rate limits.
- Prefer streaming APIs for chat UIs and long outputs.
- Always validate input/output schemas for custom chains and deployed endpoints.
- Use tags and metadata in `RunnableConfig` for tracing in LangSmith and debugging complex chains.
- For custom logic, wrap functions with `RunnableLambda` or `RunnableGenerator` instead of subclassing.
- For advanced configuration, expose fields and alternatives via `configurable_fields` and `configurable_alternatives`.

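A hedged sketch of the custom-Runnable and configuration practices above; the function, tag, and field names are illustrative.

```python
from langchain_core.runnables import ConfigurableField, RunnableLambda
from langchain_openai import ChatOpenAI

# Wrap a plain function instead of subclassing Runnable.
normalize = RunnableLambda(lambda text: text.strip().lower())

# Expose temperature as a runtime-configurable field.
chat_model = ChatOpenAI(model="gpt-4o-mini", temperature=0).configurable_fields(
    temperature=ConfigurableField(id="temperature", name="LLM temperature"),
)

chain = normalize | chat_model

# Tags and metadata flow through RunnableConfig and show up in LangSmith traces.
response = chain.invoke(
    "  What is LCEL?  ",
    config={
        "tags": ["docs-demo"],
        "metadata": {"feature": "runnable-config"},
        "configurable": {"temperature": 0.2},
    },
)
print(response.content)
```
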

## Chat model usage (LangChain-specific)

Use LangChain's chat model integrations for conversational AI:

- Import from `langchain.chat_models` or `langchain_openai` (e.g., `ChatOpenAI`).
- Compose messages using `SystemMessage`, `HumanMessage`, `AIMessage`.
- For tool calling, use the `bind_tools(tools)` method.
- For structured outputs, use `with_structured_output(schema)`.

Example:
```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-4", temperature=0)
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is LangChain?")
]
response = chat.invoke(messages)
print(response.content)
```

- Compose messages as a list of `SystemMessage`, `HumanMessage`, and optionally `AIMessage` objects.
- For RAG, combine chat models with retrievers/vectorstores for context injection.
- Use `.stream()`/`.astream()` (or `streaming=True` where supported) for real-time token streaming.
- Use the `tools` argument for function/tool calling (OpenAI, Anthropic, etc.).
- Prefer `with_structured_output(schema)` for structured outputs; OpenAI models also support JSON mode via `response_format`.

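A small streaming sketch that builds on the chat setup above; the model name is illustrative.

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is illustrative
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Explain RAG in two sentences."),
]

# Stream partial chunks instead of waiting for the full completion.
for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)
```
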
Best practices:

- Always validate model outputs before using them in downstream tasks.
- Prefer explicit message types for clarity and reliability.
- For Copilot, provide clear, actionable prompts and document expected outputs.

## Architecture guidance

- LLM client factory: centralize provider configs (API keys), timeouts, retries, and telemetry. Provide a single place to switch providers or client settings.
- Prompt templates: store templates under `prompts/` and load via a safe helper. Keep templates small and testable.
- Chains vs. Agents: prefer Chains for deterministic pipelines (RAG, summarization). Use Agents when you require planning or dynamic tool selection.
- Tools: implement typed adapter interfaces for tools; validate inputs and outputs strictly.
- Memory: default to stateless design. When memory is needed, store minimal context and document retention/erasure policies.
- Retrievers: build retrieval + rerank pipelines. Keep the vectorstore schema stable (id, text, metadata).

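A hedged sketch of the client-factory and prompt-loading ideas above; the `prompts/` layout, environment variable, and helper names are hypothetical.

```python
import os
from pathlib import Path

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

PROMPT_DIR = Path("prompts")  # hypothetical location for canonical prompt files


def build_chat_model(model: str = "gpt-4o-mini", temperature: float = 0.0) -> ChatOpenAI:
    """Single place to configure provider, timeouts, and retries."""
    return ChatOpenAI(
        model=model,
        temperature=temperature,
        timeout=30,
        max_retries=2,
        api_key=os.environ["OPENAI_API_KEY"],  # assumes the key is set in the environment
    )


def load_prompt(name: str) -> ChatPromptTemplate:
    """Load a template file from prompts/ so it stays small and testable."""
    template = (PROMPT_DIR / f"{name}.txt").read_text(encoding="utf-8")
    return ChatPromptTemplate.from_template(template)


# Usage: chain = load_prompt("summarize") | build_chat_model()
```
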
### Patterns

- Callbacks & tracing: use LangChain callbacks and integrate with LangSmith or your tracing system to capture request/response lifecycle.
- Separation of concerns: keep prompt construction, LLM wiring, and business logic separate to simplify testing and reduce accidental prompt changes.

## Embeddings & vectorstores

- Use consistent chunking and metadata fields (source, page, chunk_index).
- Cache embeddings to avoid repeated cost for unchanged documents.
- Local/dev: Chroma or FAISS. Production: managed vector DBs (Pinecone, Qdrant, Milvus, Weaviate) depending on scale and SLAs.

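A hedged sketch of caching embeddings for unchanged documents with `CacheBackedEmbeddings`; the cache path, namespace, and embedding model name are illustrative.

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings(model="text-embedding-3-small")  # model name is illustrative
store = LocalFileStore("./embedding_cache")  # illustrative cache location

# Re-embedding an unchanged text becomes a cache hit instead of an API call.
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying, store, namespace=underlying.model
)

vectors = cached_embeddings.embed_documents(["LangChain caches these vectors."])
```
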
## Vector stores (LangChain-specific)

- Use LangChain's vectorstore integrations for semantic search, retrieval-augmented generation (RAG), and document similarity workflows.
- Always initialize vectorstores with a supported embedding model (e.g., OpenAIEmbeddings, HuggingFaceEmbeddings).
- Prefer official integrations (e.g., Chroma, FAISS, Pinecone, Qdrant, Weaviate) for production; use InMemoryVectorStore for tests and demos.
- Store documents as LangChain `Document` objects with `page_content` and `metadata`.
- Use `add_documents(documents, ids=...)` to add/update documents. Always provide unique IDs for upserts.
- Use `delete(ids=...)` to remove documents by ID.
- Use `similarity_search(query, k=4, filter={...})` to retrieve top-k similar documents. Use metadata filters for scoped search.
- For RAG, connect your vectorstore to a retriever and chain with an LLM (see LangChain Retriever and RAGChain docs).
- For advanced search, use vectorstore-specific options: Pinecone supports hybrid search and metadata filtering; Chroma supports filtering and custom distance metrics.
- Always validate the vectorstore integration and API version in your environment; breaking changes are common between LangChain releases.
- Example (InMemoryVectorStore):

```python
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

embedding_model = OpenAIEmbeddings()
vector_store = InMemoryVectorStore(embedding=embedding_model)

documents = [Document(page_content="LangChain content", metadata={"source": "doc1"})]
vector_store.add_documents(documents=documents, ids=["doc1"])

results = vector_store.similarity_search("What is RAG?", k=2)
for doc in results:
    print(doc.page_content, doc.metadata)
```

- For production, prefer persistent vectorstores (Chroma, Pinecone, Qdrant, Weaviate) and configure authentication, scaling, and backup as per provider docs.
- Reference: https://python.langchain.com/docs/integrations/vectorstores/

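Building on the in-memory example above, a hedged sketch of exposing the vector store as a retriever and chaining it with an LLM for RAG; the prompt and model name are illustrative.

```python
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
vector_store.add_documents(
    [Document(page_content="LangChain retrievers feed context into RAG chains.", metadata={"source": "doc1"})],
    ids=["doc1"],
)

# Expose the vector store as a retriever returning the top-2 results.
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Retrieve, stuff the context into the prompt, and generate an answer.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini", temperature=0)
    | StrOutputParser()
)
print(rag_chain.invoke("What do retrievers do in RAG?"))
```
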
## Prompt engineering & governance

- Store canonical prompts under `prompts/` and reference them by filename from code.
- Write unit tests that assert required placeholders exist and that rendered prompts fit expected patterns (length, variables present).
- Maintain a CHANGELOG for prompt and schema changes that affect behavior.

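A hedged sketch of such a prompt unit test; the file path, placeholder name, and length limit are hypothetical.

```python
from pathlib import Path

from langchain_core.prompts import ChatPromptTemplate


def test_summarize_prompt_has_required_placeholders():
    # Hypothetical canonical prompt stored under prompts/.
    template = Path("prompts/summarize.txt").read_text(encoding="utf-8")
    prompt = ChatPromptTemplate.from_template(template)

    # The template must expose exactly the variables the chain supplies.
    assert set(prompt.input_variables) == {"text"}

    rendered = prompt.format(text="example input")
    assert len(rendered) < 2000  # guard against accidental prompt bloat
```
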
## Chat models

LangChain offers a consistent interface for chat models with additional features for monitoring, debugging, and optimization.

### Integrations

Integrations are either:

1. Official: packaged `langchain-<provider>` integrations maintained by the LangChain team or provider.
2. Community: contributed integrations (in `langchain-community`).

Chat models typically follow a naming convention with a `Chat` prefix (e.g., `ChatOpenAI`, `ChatAnthropic`, `ChatOllama`). Models without the `Chat` prefix (or with an `LLM` suffix) often implement the older string-in/string-out interface and are less preferred for modern chat workflows.

### Interface

Chat models implement `BaseChatModel` and support the Runnable interface: streaming, async, batching, and more. Many operations accept and return LangChain `messages` (roles like `system`, `user`, `assistant`). See the BaseChatModel API reference for details.

Key methods include:

- `invoke(messages, ...)` — send a list of messages and receive a response.
- `stream(messages, ...)` — stream partial outputs as tokens arrive.
- `batch(inputs, ...)` — batch multiple requests.
- `bind_tools(tools)` — attach tool adapters for tool calling.
- `with_structured_output(schema)` — helper to request structured responses.

### Inputs and outputs

- LangChain supports its own message format and OpenAI's message format; pick one consistently in your codebase.
- Messages include a `role` and `content` blocks; content can include structured or multimodal payloads where supported.

### Standard parameters

Commonly supported parameters (provider-dependent):

- `model`: model identifier (e.g., `gpt-4o`, `gpt-3.5-turbo`).
- `temperature`: randomness control (0.0 is most deterministic, 1.0 most creative).
- `timeout`: seconds to wait before canceling.
- `max_tokens`: response token limit.
- `stop`: stop sequences.
- `max_retries`: retry attempts for network/limit failures.
- `api_key`, `base_url`: provider auth and endpoint configuration.
- `rate_limiter`: optional `BaseRateLimiter` to space requests and avoid provider quota errors.

> Note: Not all parameters are implemented by every provider. Always consult the provider integration docs.

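A hedged configuration sketch using these parameters with `ChatOpenAI`; the values and model name are illustrative, and not every provider accepts every argument.

```python
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(
    model="gpt-4o",           # model identifier
    temperature=0.2,          # low randomness for predictable answers
    timeout=30,               # seconds before the request is cancelled
    max_tokens=512,           # cap on response length
    stop=["\nObservation:"],  # illustrative stop sequence
    max_retries=2,            # retry transient network/rate-limit failures
)
```
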
|
### Tool calling

Chat models can call tools (APIs, DBs, system adapters). Use LangChain's tool-calling APIs to:

- Register tools with strict input/output typing.
- Observe and log tool call requests and results.
- Validate tool outputs before passing them back to the model or executing side effects.

See the tool-calling guide in the LangChain docs for examples and safe patterns.

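A hedged tool-calling sketch using the `@tool` decorator and `bind_tools`; the tool body is a stand-in and the model name is illustrative.

```python
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def get_order_status(order_id: str) -> str:
    """Look up the status of an order by its ID."""
    # Stand-in for a real, validated backend call.
    return f"Order {order_id} is shipped."


chat = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chat_with_tools = chat.bind_tools([get_order_status])

ai_message = chat_with_tools.invoke([HumanMessage(content="Where is order 42?")])

# Inspect and validate requested tool calls before executing any side effects.
for call in ai_message.tool_calls:
    print(call["name"], call["args"])
```
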
### Structured outputs

Use `with_structured_output` or schema-enforced methods to request JSON or typed outputs from the model. Structured outputs are essential for reliable extraction and downstream processing (parsers, DB writes, analytics).

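A hedged sketch of `with_structured_output` with a Pydantic schema; the schema and model name are illustrative.

```python
from pydantic import BaseModel, Field

from langchain_openai import ChatOpenAI


class Person(BaseModel):
    """Facts extracted about a person."""

    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")


chat = ChatOpenAI(model="gpt-4o-mini", temperature=0)
extractor = chat.with_structured_output(Person)

person = extractor.invoke("Ada Lovelace was 36 when she died.")
print(person.name, person.age)  # typed access, no manual JSON parsing
```
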
### Multimodality

Some models support multimodal inputs (images, audio). Check provider docs for supported input types and limitations. Multimodal outputs are rare — treat them as experimental and validate rigorously.

### Context window

Models have a finite context window measured in tokens. When designing conversational flows:

- Keep messages concise and prioritize important context.
- Trim old context (summarize or archive) outside the model when it exceeds the window.
- Use a retriever + RAG pattern to surface relevant long-form context instead of pasting large documents into the chat.

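A hedged sketch of trimming chat history to fit the context window with `trim_messages`; the token budget and messages are illustrative.

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, trim_messages
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-4o-mini", temperature=0)

history = [
    SystemMessage(content="You are a concise assistant."),
    HumanMessage(content="Tell me about LangChain."),
    AIMessage(content="LangChain composes LLM components."),
    HumanMessage(content="And what is LCEL?"),
]

# Keep the most recent messages that fit the budget, preserving the system message.
trimmed = trim_messages(
    history,
    max_tokens=200,      # illustrative budget
    strategy="last",
    token_counter=chat,  # count tokens with the target model's tokenizer
    include_system=True,
)

response = chat.invoke(trimmed)
print(response.content)
```
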
## Advanced topics

### Rate-limiting

- Use `rate_limiter` when initializing chat models to space calls.
- Implement retry with exponential backoff and consider fallback models or degraded modes when throttled.

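A hedged sketch of spacing requests with the built-in `InMemoryRateLimiter`; the rates are illustrative.

```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

# Allow roughly one request every two seconds, with a small burst bucket.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.5,
    check_every_n_seconds=0.1,
    max_bucket_size=5,
)

chat = ChatOpenAI(model="gpt-4o-mini", temperature=0, rate_limiter=rate_limiter)
print(chat.invoke("Ping").content)
```
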
### Caching

- Exact-input caching for conversations is often ineffective. Consider semantic caching (embedding-based) for repeated meaning-level queries.
- Semantic caching introduces a dependency on embeddings and is not universally suitable.
- Cache only where it reduces cost and meets correctness requirements (e.g., FAQ bots).

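A hedged sketch of enabling an exact-match LLM cache where it meets correctness requirements (e.g., FAQ-style repeated prompts); semantic caching would swap in a provider-specific cache class instead.

```python
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI

# Identical prompts are served from the cache instead of the provider.
set_llm_cache(InMemoryCache())

chat = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chat.invoke("What is LangChain?")  # first call hits the API
chat.invoke("What is LangChain?")  # repeated call is answered from the cache
```
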
## Best practices

- Use type hints and dataclasses for public APIs.
- Validate inputs before calling LLMs or tools.
- Load secrets from secret managers; never log secrets or unredacted model outputs.
- Deterministic tests: mock LLMs and embedding calls.
- Cache embeddings and frequent retrieval results.
- Observability: log request_id, model name, latency, and sanitized token counts.
- Implement exponential backoff and idempotency for external calls.

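A hedged sketch of a deterministic test that mocks the LLM; it assumes the fake chat model shipped with `langchain_core` is available in your installed version.

```python
from langchain_core.language_models import FakeListChatModel
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


def test_chain_answers_deterministically():
    # Scripted responses stand in for a real provider call.
    fake_llm = FakeListChatModel(responses=["LangChain composes LLM apps."])
    chain = ChatPromptTemplate.from_template("Q: {question}") | fake_llm | StrOutputParser()

    assert chain.invoke({"question": "What is LangChain?"}) == "LangChain composes LLM apps."
```
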
## Security & privacy

- Treat model outputs as untrusted. Sanitize before executing generated code or system commands.
- Validate any user-supplied URLs and inputs to avoid SSRF and injection attacks.
- Document data retention and add an API to erase user data on request.
- Limit stored PII and encrypt sensitive fields at rest.