# LangSmith Demo

A comprehensive demonstration of LangSmith's capabilities for LLM observability, prompt engineering, testing, and monitoring. This project showcases how to build, trace, evaluate, and improve an AI application that generates museum labels for artworks using Wikipedia as a knowledge source.
This demo implements a Museum Description Generator that:
- Retrieves context about artworks from Wikipedia
- Uses LLMs to generate scholarly, public-facing museum labels
- Demonstrates LangSmith's full evaluation lifecycle
- Includes both simple function-based and LangGraph-based implementations
The project includes two main implementations:

**Simple function-based pipeline** (`langsmith_demo.ipynb`):

- Uses `@traceable` decorators for observability
- Implements a Wikipedia retrieval → LLM generation pipeline
- Demonstrates manual tracing and metadata

**LangGraph-based pipeline** (`graph.py`):

- Uses LangGraph for workflow orchestration
- Automatic tracing with LangSmith integration
- Includes SQLite checkpointing for persistence
- Configurable via `langgraph.json`
## Project Structure

```
langsmith-demo/
├── README.md              # This file
├── langsmith_demo.ipynb   # Main demo notebook
├── graph.py               # LangGraph implementation
├── langgraph.json         # LangGraph configuration
├── images/                # Demo images and diagrams
│   ├── museum-app.jpg
│   ├── museum-app.pdf
│   └── evaluation_lifecycle.png
├── ls-academy/            # Virtual environment
└── readMe.ME              # Original placeholder file
```
## Prerequisites

- Python 3.8+
- LangSmith account and API key
- OpenAI API key
## Setup

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd langsmith-demo
   ```

2. Set up environment variables:

   ```bash
   export LANGSMITH_API_KEY="your-langsmith-api-key"
   export LANGCHAIN_API_KEY="your-langchain-api-key"  # Same as LangSmith
   export LANGCHAIN_TRACING_V2="true"
   export LANGCHAIN_PROJECT="langsmith-demo"
   ```

3. Install dependencies:

   ```bash
   pip install langsmith langchain langchain-openai langchain-community langgraph wikipedia python-dotenv jupyter
   ```

4. Run the demo:

   ```bash
   # Option 1: Jupyter Notebook
   jupyter notebook langsmith_demo.ipynb

   # Option 2: LangGraph Studio
   langgraph dev
   ```
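As an alternative to exporting variables in the shell, the keys can live in a `.env` file loaded via `python-dotenv` (included in the install step above). The sketch below shows what gets loaded; the key values are placeholders, and the hand-rolled parser only stands in for `dotenv`'s real behavior:

```python
# Sketch: the variables a .env file for this demo would carry.
# The parser below is a tiny stand-in for python-dotenv, shown
# only to make the loading step concrete; values are placeholders.
import os

ENV_EXAMPLE = """\
LANGSMITH_API_KEY=your-langsmith-api-key
LANGCHAIN_API_KEY=your-langchain-api-key
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=langsmith-demo
"""

def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs

config = parse_env(ENV_EXAMPLE)
os.environ.update(config)  # tracing libs read these at import time
```

In the real project you would simply call `load_dotenv()` from `python-dotenv` at the top of the notebook rather than parsing by hand.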
## Features

### Observability & Tracing

- Function Tracing: Using `@traceable` decorators
- LangGraph Integration: Automatic tracing for graph-based workflows
- Metadata & Filtering: Rich metadata for run organization
- Run Types: Different run types (retriever, llm, chain)
### Prompt Engineering

- Prompt Hub Integration: Pulling prompts from LangSmith's prompt hub
- Template Management: Versioned prompt templates
- Prompt Canvas: Visual prompt development
### Evaluation & Testing

- Dataset Creation: Golden dataset with reference outputs
- Custom Evaluators: Rule-based evaluation functions
- LLM-as-Judge: Automated evaluation using LLMs
- Annotation Queues: Human feedback integration
- Online Evaluations: Real-time evaluation capabilities
### Monitoring & Automation

- Automations: Automated workflows and webhooks
- Dashboards: Prebuilt and custom monitoring dashboards
## Evaluation Lifecycle

The project demonstrates the complete evaluation lifecycle:
- Data Collection: Creating golden datasets with reference outputs
- Evaluation Setup: Configuring custom and LLM-based evaluators
- Experiment Running: Comparing different prompts and configurations
- Analysis: Reviewing results and identifying improvements
- Iteration: Refining prompts and models based on feedback
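To make step 1 concrete, a golden dataset is just paired inputs and reference outputs. The artworks and reference labels below are illustrative placeholders (not the project's actual dataset), and the upload via `langsmith.Client` is left in a comment since it needs a live API key:

```python
# Sketch: the shape of a golden dataset for the museum-label task.
# Inputs are artwork names; outputs are reference labels a reviewer
# would accept. Both examples are illustrative placeholders.
golden_examples = [
    {
        "inputs": {"artwork": "The Birth of Venus"},
        "outputs": {"label": "Botticelli's tempera masterpiece depicting "
                             "the goddess Venus arriving at the shore."},
    },
    {
        "inputs": {"artwork": "The Starry Night"},
        "outputs": {"label": "Van Gogh's swirling nocturnal landscape, "
                             "painted from his asylum window in 1889."},
    },
]

# With a LangSmith API key configured, this could be uploaded roughly as:
#   from langsmith import Client
#   client = Client()
#   dataset = client.create_dataset("museum-labels-golden")
#   client.create_examples(
#       inputs=[ex["inputs"] for ex in golden_examples],
#       outputs=[ex["outputs"] for ex in golden_examples],
#       dataset_id=dataset.id,
#   )
```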
## Usage

```python
from langsmith import traceable

@traceable(run_type="chain")
def museum_description_generator(artwork_name: str) -> str:
    # Retrieve Wikipedia context
    docs = retrieve_wikipedia(artwork_name)

    # Generate museum label
    messages = build_messages(artwork_name, docs)
    response = call_openai(messages)
    return response.choices[0].message.content

# Generate a label for "The Birth of Venus"
description = museum_description_generator("The Birth of Venus")
print(description)
```

```python
# Using the LangGraph implementation
from graph import graph

result = graph.invoke({
    "question": "The Birth of Venus",
    "messages": []
})

museum_label = result["messages"][-1].content
print(museum_label)
```

The project includes comprehensive evaluation examples:
- Concise Evaluation: Ensures labels are appropriately sized
- Quality Checks: Avoids overused words like "beautiful" or "amazing"
- Reference Comparison: Compares against golden dataset
- A/B Testing: Compares different prompt versions
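The rule-based checks above can be sketched as plain functions that return the score-dict shape LangSmith evaluators produce. The banned-word list and length bounds here are illustrative assumptions, not the project's exact rules:

```python
# Sketch of two rule-based evaluators returning {"key", "score"} dicts,
# the shape LangSmith's evaluation tooling consumes. The word list and
# length bounds are assumptions for illustration.
BANNED_WORDS = {"beautiful", "amazing", "stunning"}

def concise_evaluator(outputs: dict) -> dict:
    """Pass (score 1) if the label is between 20 and 120 words."""
    word_count = len(outputs["label"].split())
    return {"key": "concise", "score": int(20 <= word_count <= 120)}

def banned_words_evaluator(outputs: dict) -> dict:
    """Fail (score 0) if the label leans on overused adjectives."""
    words = {w.strip(".,!").lower() for w in outputs["label"].split()}
    return {"key": "no_banned_words", "score": int(not (words & BANNED_WORDS))}

label = {"label": "An amazing tempera panel by Botticelli."}
print(concise_evaluator(label))       # too short -> score 0
print(banned_words_evaluator(label))  # contains "amazing" -> score 0
```

Functions like these can be passed to LangSmith's evaluation runs alongside LLM-as-judge evaluators for the subjective criteria.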
## Configuration

The `langgraph.json` configuration:

```json
{
  "graphs": {
    "museum": {
      "entrypoint": "graph.py:graph",
      "title": "Museum Label Graph",
      "description": "Wikipedia → Prompt → LLM"
    }
  },
  "server": {
    "port": 2024
  }
}
```

Defaults:

- Default Model: `gpt-4o-mini`
- Temperature: `0.2` (for consistent outputs)
- Max Wikipedia Docs: `2` (for focused context)
This demo covers the key LangSmith concepts needed to trace, evaluate, and monitor an LLM application.
## Contributing

This is a demonstration project. Feel free to:
- Experiment with different prompts
- Add new evaluation metrics
- Try different LLM providers
- Extend the museum label functionality
## License

This project is for educational and demonstration purposes.
For questions about LangSmith, refer to the official LangSmith documentation.
Note: This demo requires valid API keys for LangSmith and OpenAI to function properly. Make sure to set up your environment variables before running the examples.