diff --git a/docs/book/user-guide/.gitbook/assets/agent-evaluation-overview.png b/docs/book/user-guide/.gitbook/assets/agent-evaluation-overview.png new file mode 100644 index 00000000000..1c196977e77 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/agent-evaluation-overview.png differ diff --git a/docs/book/user-guide/.gitbook/assets/agent-monitoring-dashboard.png b/docs/book/user-guide/.gitbook/assets/agent-monitoring-dashboard.png new file mode 100644 index 00000000000..f258e2c321b Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/agent-monitoring-dashboard.png differ diff --git a/docs/book/user-guide/.gitbook/assets/agent-orchestration.png b/docs/book/user-guide/.gitbook/assets/agent-orchestration.png new file mode 100644 index 00000000000..300d50a3185 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/agent-orchestration.png differ diff --git a/docs/book/user-guide/.gitbook/assets/agent-pipeline-pattern.png b/docs/book/user-guide/.gitbook/assets/agent-pipeline-pattern.png new file mode 100644 index 00000000000..e02f1a1d498 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/agent-pipeline-pattern.png differ diff --git a/docs/book/user-guide/.gitbook/assets/agent-scaling-challenge.png b/docs/book/user-guide/.gitbook/assets/agent-scaling-challenge.png new file mode 100644 index 00000000000..3724d8e56a2 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/agent-scaling-challenge.png differ diff --git a/docs/book/user-guide/.gitbook/assets/batch-optimization.png b/docs/book/user-guide/.gitbook/assets/batch-optimization.png new file mode 100644 index 00000000000..929f4bf55d8 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/batch-optimization.png differ diff --git a/docs/book/user-guide/.gitbook/assets/batch-processing-flow.png b/docs/book/user-guide/.gitbook/assets/batch-processing-flow.png new file mode 100644 index 00000000000..8af1cf64bfd Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/batch-processing-flow.png differ diff --git a/docs/book/user-guide/.gitbook/assets/batch-processing-scale.png b/docs/book/user-guide/.gitbook/assets/batch-processing-scale.png new file mode 100644 index 00000000000..12fe6482ebc Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/batch-processing-scale.png differ diff --git a/docs/book/user-guide/.gitbook/assets/direct-serving-pattern.png b/docs/book/user-guide/.gitbook/assets/direct-serving-pattern.png new file mode 100644 index 00000000000..ea3141794dd Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/direct-serving-pattern.png differ diff --git a/docs/book/user-guide/.gitbook/assets/error-handling-flow.png b/docs/book/user-guide/.gitbook/assets/error-handling-flow.png new file mode 100644 index 00000000000..c45fcd8585c Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/error-handling-flow.png differ diff --git a/docs/book/user-guide/.gitbook/assets/evaluation-metrics.png b/docs/book/user-guide/.gitbook/assets/evaluation-metrics.png new file mode 100644 index 00000000000..d9130405551 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/evaluation-metrics.png differ diff --git a/docs/book/user-guide/.gitbook/assets/framework-landscape.png b/docs/book/user-guide/.gitbook/assets/framework-landscape.png new file mode 100644 index 00000000000..00dd3448d80 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/framework-landscape.png differ diff --git 
a/docs/book/user-guide/.gitbook/assets/horizontal-scaling-agents.png b/docs/book/user-guide/.gitbook/assets/horizontal-scaling-agents.png new file mode 100644 index 00000000000..8fb1b41cf37 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/horizontal-scaling-agents.png differ diff --git a/docs/book/user-guide/.gitbook/assets/hybrid-architecture-pattern.png b/docs/book/user-guide/.gitbook/assets/hybrid-architecture-pattern.png new file mode 100644 index 00000000000..06692972ae8 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/hybrid-architecture-pattern.png differ diff --git a/docs/book/user-guide/.gitbook/assets/orchestration-benefits.png b/docs/book/user-guide/.gitbook/assets/orchestration-benefits.png new file mode 100644 index 00000000000..ff9078f6fd4 Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/orchestration-benefits.png differ diff --git a/docs/book/user-guide/.gitbook/assets/production-architecture-spectrum.png b/docs/book/user-guide/.gitbook/assets/production-architecture-spectrum.png new file mode 100644 index 00000000000..279cca7544f Binary files /dev/null and b/docs/book/user-guide/.gitbook/assets/production-architecture-spectrum.png differ diff --git a/docs/book/user-guide/agent-guide/README.md b/docs/book/user-guide/agent-guide/README.md new file mode 100644 index 00000000000..6a35758f310 --- /dev/null +++ b/docs/book/user-guide/agent-guide/README.md @@ -0,0 +1,50 @@ +--- +description: Build production-ready AI agent workflows with ZenML orchestration. +icon: robot-face +--- + +# Agent guide + +Transform your agent development from experimental scripts into systematic, production-ready workflows using ZenML as your agent development platform. + +

*ZenML provides a complete agent development workflow with evaluation, versioning, and deployment capabilities.*
+ +## Why pipelines for agents (for ML/AI engineers) + +Agents evolve quickly: prompts change, tools are added/removed, and behavior shifts in production. A pipeline-first approach gives you reproducibility, lineage, and safe iteration: you version prompts and configs, deploy with confidence, and evaluate continuously on real traces. + +ZenML lets you apply the same rigor you use for classical ML to agents—regardless of whether you use an agent framework or direct LLM calls. The benefit is speed with control: faster iteration loops, clear governance, and a shared operating model across teams. + +Common pitfalls of ad‑hoc agent work today: +- Experiment across notebooks and frameworks without shared structure +- Eyeball outputs instead of evaluating systematically +- Deploy without lineage, then hope production matches development +- Struggle to improve based on real usage and missing traces + +**ZenML changes this** by applying the same systematic rigor you use for traditional ML to agent development. + +Looking for a runnable starting point? See the end-to-end minimal example in [`examples/minimal_agent_production`](https://github.com/zenml-io/zenml/tree/main/examples/minimal_agent_production). + +## This Guide's Journey + +We'll take you through the complete agent development workflow: + +1. **[Development & Experimentation](agent-fundamentals.md)** - Set up systematic agent development with ZenML, covering framework integration, tool setup, and experiment tracking +2. **[Production & Deployment](agent-deployment.md)** - Deploy your agents with observability and monitoring while maintaining ZenML integration +3. **[Evaluation & Improvement](agent-evaluation.md)** - Use production data to systematically evaluate and improve agents using proven LLMOps patterns + +Each chapter builds on the previous one, creating a complete workflow from initial experimentation through deployment to data-driven improvement. + +## What You'll Learn + +**After Chapter 1**: How to wrap any agent (framework or custom) in ZenML pipelines for systematic development +**After Chapter 2**: How to deploy agents in production while maintaining configuration lineage and observability +**After Chapter 3**: How to use production traces to systematically evaluate and improve agents over time + +## Prerequisites + +- Python 3.9+ environment with ZenML installed +- Familiarity with [ZenML fundamentals](../starter-guide/) +- Experience with [LLM evaluation patterns](../llmops-guide/evaluation/) (helpful but not required) + +Ready? Let's start with [development fundamentals](agent-fundamentals.md) to set up your systematic agent development workflow. \ No newline at end of file diff --git a/docs/book/user-guide/agent-guide/agent-deployment.md b/docs/book/user-guide/agent-guide/agent-deployment.md new file mode 100644 index 00000000000..a34a69d8d6c --- /dev/null +++ b/docs/book/user-guide/agent-guide/agent-deployment.md @@ -0,0 +1,319 @@ +--- +description: Deploy your winning agent configuration with observability and continuous improvement. +--- + +# Production & deployment + +You've systematically developed agents ([Chapter 1](agent-fundamentals.md)) and now it's time to deploy these agents in production while maintaining the same systematic approach for monitoring and improvement. 
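Before wiring up a server, promote the configuration you want to ship. The deployment code in this chapter resolves the agent by looking up whichever model version carries the `production` stage, so the winning version from Chapter 1 needs that stage set first. A minimal sketch, mirroring the client calls used later in this chapter (the model name and version number are assumptions from this guide):

```python
from zenml.client import Client

# Promote the winning model version from Chapter 1 to the "production" stage.
# "customer_support_agent" and the version number are illustrative assumptions.
client = Client()
model = client.get_model("customer_support_agent")
winning_version = model.get_model_version("3")  # the version your experiments favored
winning_version.set_stage("production")
```

Once the stage is set, the server below resolves this version on every request without code changes.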
+ +## Deploying Your Winning Configuration + +From Chapter 1, you now have systematically developed agent configurations: + +```python +# From Chapter 1 - you have systematically developed configurations +agent_configs = complete_agent_development_pipeline(config, test_queries) +# Result: Tracked experiments with different approaches +``` + +Now deploy this configuration while maintaining ZenML integration. + +## Deploying Your Agent + +Here's how to deploy your systematically developed agent from Chapter 1 while maintaining ZenML integration: + +```python +# production_server.py +from fastapi import FastAPI +from zenml.client import Client + +# Load your agent implementation from Chapter 1 +from agent_implementation import ( + run_direct_llm_agent, + run_framework_agent, + run_custom_agent +) + +app = FastAPI() + +def get_production_setup(): + """Load agent implementation and configuration from ZenML.""" + + client = Client() + model = client.get_model("customer_support_agent") + production_version = model.get_model_version("production") + + # Load artifacts stored in Chapter 1 + config = production_version.load_artifact("agent_configuration") + prompts = production_version.load_artifact("agent_prompts") + + # Choose which agent implementation to use + agent_type = config.get("agent_type", "direct") + if agent_type == "direct": + agent_func = run_direct_llm_agent + elif agent_type == "framework": + agent_func = run_framework_agent + else: + agent_func = run_custom_agent + + return agent_func, config, prompts, production_version.version + +@app.post("/chat") +async def chat_endpoint(query: str): + """Production endpoint running your systematically developed agent.""" + + try: + agent_func, config, prompts, version = get_production_setup() + + # Run your actual agent code with ZenML-managed configuration + response = agent_func( + query=query, + config=config, + prompts=prompts + ) + + # Log interaction for evaluation + # You can use langfuse, langsmith or any tool you prefer + log_production_interaction({ + "query": query, + "response": response, + "agent_version": version, + "timestamp": datetime.now() + }) + + return {"response": response} + + except Exception as e: + return {"error": "Agent temporarily unavailable"} + +@app.get("/health") +async def health_check(): + """Validate deployment and ZenML connectivity.""" + + try: + _, config, _, version = get_production_setup() + return { + "status": "healthy", + "agent_version": version, + "agent_type": config.get("agent_type", "direct") + } + except Exception as e: + return {"status": "unhealthy", "error": str(e)} +``` + +Deploy this however you prefer - locally, in containers, or on cloud platforms: + +```bash +# Local development +python production_server.py + +# Docker deployment +docker build -t my-agent . && docker run -p 8000:8000 my-agent + +# Cloud deployment (AWS ECS, Google Cloud Run, etc.) +# Use your organization's deployment pipeline +``` + +### Configuration Management with ZenML Artifacts + +As shown above, loading configuration artifacts is a common pattern—and you can apply the same approach to other components such as prompts, tool definitions, or any agent resource. By storing each as a ZenML artifact, you ensure reproducibility and complete lineage for every part of your agent pipeline. 
+ +```python +from zenml.client import Client +from zenml import step, pipeline +from typing_extensions import Annotated + +@step +def store_agent_prompts() -> Annotated[Dict[str, str], "agent_prompts"]: + """Store agent prompts as versioned artifacts.""" + + prompts = { + "system_prompt": """You are a helpful customer support assistant. + Always be polite and try to resolve customer issues efficiently. + If you cannot help, escalate to a human agent.""", + + "tool_selection_prompt": """Given the customer query, determine which tools to use. + Available tools: search_knowledge_base, create_ticket, transfer_to_human.""", + + "summary_prompt": """Summarize this customer interaction in 2-3 sentences. + Include the issue, resolution, and customer satisfaction.""" + } + + return prompts + +@step +def store_agent_config() -> Annotated[Dict[str, Any], "agent_configuration"]: + """Store complete agent configuration.""" + + config = { + "model": "gpt-4", + "temperature": 0.1, + "max_tokens": 1000, + "timeout": 30, + "retry_attempts": 3, + "tools_enabled": ["knowledge_search", "ticket_creation"], + "fallback_enabled": True + } + + return config + +@pipeline +def agent_configuration_pipeline() -> None: + """Create versioned agent configuration artifacts.""" + + # Store prompts and config as separate artifacts + prompts = store_agent_prompts() + config = store_agent_config() + + # These are automatically versioned and tracked by ZenML + +# Run this pipeline when you update prompts/config +agent_configuration_pipeline() +``` + +### Loading Configuration in Production + +```python +from zenml.client import Client + +def get_production_agent_config(): + """Load production configuration with full ZenML lineage.""" + + client = Client() + + # Get the production model version + model = client.get_model("customer_support_agent") + production_version = model.get_model_version("production") + + # Load individual artifacts + prompts = production_version.load_artifact("agent_prompts") + config = production_version.load_artifact("agent_configuration") + + return { + "prompts": prompts, + "config": config, + "version": production_version.version, + "deployed_at": production_version.created, + "lineage": production_version.metadata + } + +@app.post("/chat") +async def production_endpoint(query: str): + """Production endpoint using ZenML-managed prompts and config.""" + + # Load current production configuration + production_setup = get_production_agent_config() + prompts = production_setup["prompts"] + config = production_setup["config"] + + # Use versioned prompts in your agent + response = openai_client.chat.completions.create( + model=config["model"], + temperature=config["temperature"], + max_tokens=config["max_tokens"], + messages=[ + {"role": "system", "content": prompts["system_prompt"]}, + {"role": "user", "content": query} + ] + ) + + # Log which configuration version was used + return { + "response": response.choices[0].message.content, + "config_version": production_setup["version"], + "model_used": config["model"] + } +``` + +### Prompt Versioning and A/B Testing + +```python +@pipeline +def prompt_ab_test_pipeline() -> None: + """Create A/B test versions of prompts.""" + + # Version A - Current prompt + @step + def create_prompt_variant_a() -> Annotated[str, "system_prompt_v1"]: + return """You are a helpful customer support assistant. 
+ Always be polite and try to resolve customer issues efficiently.""" + + # Version B - More detailed prompt + @step + def create_prompt_variant_b() -> Annotated[str, "system_prompt_v2"]: + return """You are an expert customer support assistant with deep product knowledge. + Begin each response by acknowledging the customer's concern. + Provide step-by-step solutions when possible. + Always ask if there's anything else you can help with.""" + + prompt_a = create_prompt_variant_a() + prompt_b = create_prompt_variant_b() + +def deploy_prompt_variant(variant: str, traffic_split: float = 0.5): + """Deploy specific prompt variant with traffic splitting.""" + + client = Client() + + if variant == "A": + artifact_name = "system_prompt_v1" + else: + artifact_name = "system_prompt_v2" + + # Get the specific prompt version + artifact = client.get_artifact_version(artifact_name, version="latest") + + # Update production deployment with traffic split + update_production_prompt(artifact.load(), traffic_split) +``` + +### Configuration Rollback and Updates + +```python +def rollback_to_previous_config(): + """Rollback to previous configuration version.""" + + client = Client() + model = client.get_model("customer_support_agent") + + # Get previous production version + versions = model.list_model_versions() + previous_version = versions[1] # Second most recent + + # Rollback + previous_version.set_stage("production") + + print(f"Rolled back to version {previous_version.version}") + restart_production_deployment() + +def update_production_config(new_version: str): + """Promote new version to production.""" + + client = Client() + model = client.get_model("customer_support_agent") + + # Promote new version + new_model_version = model.get_model_version(new_version) + new_model_version.set_stage("production") + + # Validate configuration before deployment + config = new_model_version.load_artifact("agent_configuration") + prompts = new_model_version.load_artifact("agent_prompts") + + if validate_configuration(config, prompts): + restart_production_deployment() + print(f"Successfully deployed version {new_version}") + else: + print("Configuration validation failed, aborting deployment") +``` + +## Next Steps + +Your agents are now deployed in production with proper observability and monitoring. The traces and logs you're collecting will be essential for Chapter 3, where we'll use this production data to systematically evaluate and improve your agents. + +Key data being collected: +- **User interactions** and agent responses +- **Performance metrics** (response times, success rates) +- **Error patterns** and failure modes +- **Configuration lineage** connecting to your ZenML development experiments + +In [Chapter 3](agent-evaluation.md), you'll learn how to systematically analyze this production data to identify improvement opportunities and create a continuous feedback loop back to your ZenML development process. \ No newline at end of file diff --git a/docs/book/user-guide/agent-guide/agent-evaluation.md b/docs/book/user-guide/agent-guide/agent-evaluation.md new file mode 100644 index 00000000000..96ae2d62e4c --- /dev/null +++ b/docs/book/user-guide/agent-guide/agent-evaluation.md @@ -0,0 +1,225 @@ +--- +description: Evaluate and compare your ZenML agent experiments using proven LLM evaluation patterns. 
+--- + +# Evaluation & improvement + +Now that you have systematic agent development ([Chapter 1](agent-fundamentals.md)) and production deployment ([Chapter 2](agent-deployment.md)) set up, let's learn how to use production data to evaluate and improve your agents systematically. Instead of eyeballing outputs, you'll apply the same rigorous evaluation patterns used in LLM development. + +

*Systematic evaluation transforms guesswork into data-driven agent improvement.*
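The evaluation steps in this chapter all consume a dictionary with a `results` list, where each record carries at least a `query` and a `response`, plus `tools_used` when available. A minimal sketch of the `convert_traces_to_evaluation_format` helper referenced in the pipeline below, assuming traces shaped like the interaction logs written by the Chapter 2 server:

```python
from typing import Any, Dict, List


def convert_traces_to_evaluation_format(
    traces: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Turn Chapter 2 interaction logs into the record format the evaluation steps expect."""
    results = []
    for trace in traces:
        # Each trace is assumed to carry the fields logged in Chapter 2:
        # "query", "response", "agent_version", "timestamp".
        record = {"query": trace["query"], "response": trace["response"]}
        if "tools_used" in trace:
            # Keep tool-call metadata when the agent logged it, so tool metrics work too.
            record["tools_used"] = trace["tools_used"]
        results.append(record)
    return {"results": results}
```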
+ +## Building on Your Development Setup + +From Chapter 1 you have development results, and from Chapter 2 you have production traces. The question now is: which configuration performs best, and should you promote or iterate? + +## Leveraging LLM Evaluation Patterns + +Production agent evaluation builds directly on ZenML's [proven LLM evaluation patterns](../llmops-guide/evaluation/) using real production data. The key insight: **agents are LLMs with enhanced capabilities**, so the same evaluation principles apply. + +### Quality Evaluation + +Apply the same patterns as [LLM generation evaluation](../llmops-guide/evaluation/generation.md) to production agent data: + +```python +@step +def evaluate_agent_responses( + agent_results: Dict[str, Any] +) -> Annotated[Dict[str, float], "quality_metrics"]: + """Evaluate response quality using LLM-as-judge patterns.""" + + evaluator = LLMEvaluator(model="gpt-4") # Same as LLMOps guide + + responses = [r["response"] for r in agent_results["results"]] + queries = [r["query"] for r in agent_results["results"]] + + # Apply standard evaluation metrics + accuracy_scores = evaluator.evaluate_batch( + responses=responses, + queries=queries, + metric="accuracy" + ) + + helpfulness_scores = evaluator.evaluate_batch( + responses=responses, + queries=queries, + metric="helpfulness" + ) + + return { + "accuracy": np.mean(accuracy_scores), + "helpfulness": np.mean(helpfulness_scores), + "response_count": len(responses) + } +``` + +### Tool Usage Evaluation + +For production agents that use tools, extend [retrieval evaluation patterns](../llmops-guide/evaluation/retrieval.md): + +```python +@step +def evaluate_tool_usage( + agent_results: Dict[str, Any] +) -> Annotated[Dict[str, float], "tool_metrics"]: + """Evaluate how well agents select and use tools.""" + + results_with_tools = [r for r in agent_results["results"] + if "tools_used" in r] + + if not results_with_tools: + return {"tool_selection_accuracy": 0.0, "tool_usage_rate": 0.0} + + # Tool selection accuracy (like retrieval precision) + correct_tool_usage = sum(1 for r in results_with_tools + if validate_tool_usage(r)) + + return { + "tool_selection_accuracy": correct_tool_usage / len(results_with_tools), + "tool_usage_rate": len(results_with_tools) / len(agent_results["results"]) + } +``` + +## Complete Evaluation Pipeline + +Now systematically evaluate your production agent performance: + +```python +@pipeline +def agent_comparison_pipeline() -> Annotated[Dict[str, Any], "comparison_report"]: + """Compare different agent configurations systematically.""" + + # Load the same test dataset for fair comparison + test_queries = load_evaluation_dataset() + + # Get production data from Chapter 2 deployment + production_traces = collect_production_traces() + production_data = convert_traces_to_evaluation_format(production_traces) + + # Run current production agent on test dataset + current_results = evaluate_production_agent_on_dataset(test_queries) + + # Evaluate production performance + production_quality = evaluate_agent_responses(production_data) + current_quality = evaluate_agent_responses(current_results) + + production_tools = evaluate_tool_usage(production_data) + current_tools = evaluate_tool_usage(current_results) + + # Compare production vs current performance + comparison = { + "production_traces": {**production_quality, **production_tools}, + "current_evaluation": {**current_quality, **current_tools} + } + + # Determine if retraining is needed + improvement_needed = 
determine_improvement_needs(comparison) + + return { + "comparison": comparison, + "improvement_needed": improvement_needed, + "production_dataset_size": len(production_traces), + "evaluation_dataset_size": len(test_queries) + } +``` + +## Using Your Existing LLM Infrastructure + +The power of this approach: **your existing LLM evaluation setup works directly with production agent data**. + +```python +@step +def reuse_llm_evaluation_infrastructure( + agent_results: Dict[str, Any] +) -> Annotated[Dict[str, Any], "comprehensive_metrics"]: + """Reuse existing LLM evaluation datasets and metrics.""" + + # Combine production traces with existing evaluation datasets + production_dataset = convert_traces_to_dataset(agent_results["production_traces"]) + existing_dataset = load_evaluation_dataset("customer_support_v1") # From LLM work + combined_dataset = combine_datasets([production_dataset, existing_dataset]) + + # Apply your existing evaluation metrics to production data + quality_scores = evaluate_with_existing_metrics( + agent_results["results"], + combined_dataset + ) + + # Use your existing LLM evaluators on production traces + semantic_similarity = calculate_semantic_similarity( + agent_results["results"], + combined_dataset + ) + + return { + "quality_scores": quality_scores, + "semantic_similarity": semantic_similarity, + "production_data_included": True + } +``` + +## Making Data-Driven Decisions + +With production traces, you have real performance data: + +```python +@step +def select_winning_configuration( + comparison_report: Dict[str, Any] +) -> Annotated[Dict[str, Any], "selection_decision"]: + """Make data-driven agent selection.""" + + comparison = comparison_report["comparison"] + + # Define success criteria + quality_threshold = 0.8 + tool_accuracy_threshold = 0.7 + + scores = {} + for config_name, metrics in comparison.items(): + quality_score = metrics.get("accuracy", 0) * 0.6 + metrics.get("helpfulness", 0) * 0.4 + tool_score = metrics.get("tool_selection_accuracy", 1.0) # Default to 1.0 if no tools + + combined_score = quality_score * 0.7 + tool_score * 0.3 + scores[config_name] = combined_score + + winner = max(scores, key=scores.get) + + return { + "winner": winner, + "scores": scores, + "meets_threshold": scores[winner] > quality_threshold, + "recommendation": "deploy" if scores[winner] > quality_threshold else "iterate" + } +``` + +## Integration with ZenML Tracking + +Every evaluation run is automatically tracked with: + +- **Evaluation datasets** and versions +- **Metric calculations** and thresholds +- **Comparison results** across configurations +- **Winner selection** and reasoning +- **Complete lineage** from development to evaluation + +Check your ZenML dashboard to see all evaluations, compare runs, and track improvements over time. + +## Best Practices from LLMOps + +Following [proven LLM evaluation principles](../llmops-guide/evaluation/evaluation-in-practice.md): + +1. **Use consistent datasets** - Same test set for fair comparison +2. **Multiple metrics** - Don't rely on single scores +3. **Statistical significance** - Test with sufficient data +4. **Automated evaluation** - Run with every experiment +5. **Version everything** - Track datasets, metrics, and thresholds + +## What's Next? + +You now have systematic evaluation using production data. You can confidently say "Our production agent maintains 85% accuracy on real user queries, with 15% improvement opportunity identified." 
+ +This completes the full agent development workflow: systematic development (Chapter 1), production deployment (Chapter 2), and data-driven evaluation and improvement (Chapter 3). + +{% hint style="success" %} +**Key Achievement**: You've created a complete feedback loop from production traces back to systematic agent improvement using proven evaluation patterns. +{% endhint %} \ No newline at end of file diff --git a/docs/book/user-guide/agent-guide/agent-fundamentals.md b/docs/book/user-guide/agent-guide/agent-fundamentals.md new file mode 100644 index 00000000000..4891a67a67c --- /dev/null +++ b/docs/book/user-guide/agent-guide/agent-fundamentals.md @@ -0,0 +1,234 @@ +--- +description: Set up systematic agent development with ZenML experiment tracking and framework integration. +--- + +# Development & experimentation + +This chapter shows you how to transform experimental agent code into systematic, trackable development using ZenML. You'll learn to wrap any agent implementation in ZenML pipelines for proper experiment tracking and version management. + +

*ZenML transforms agent development from experimental scripts into systematic, trackable workflows.*
+ +## A note on agent orchestrators vs ZenML + +Some of you might already be familiar with agent orchestration frameworks like [LangGraph](https://www.langchain.com/langgraph) or the [OpenAI SDK](https://openai.github.io/openai-agents-python/). The good news is that these frameworks integrate seamlessly with ZenML pipelines: you can easily embed agent workflows as steps within your pipeline, allowing you to orchestrate, track, and version your agent experiments alongside the rest of your ML workflow. This means you get the benefits of both worlds—leveraging powerful agent frameworks while maintaining systematic experiment tracking and reproducibility through ZenML. + +Having said that, you don't need agent frameworks. Many successful production systems use direct LLM API calls. ZenML works with any approach - frameworks, custom code, or direct API calls. The key is systematic development, not the underlying implementation. + +{% hint style="info" %} +**Quick Start**: If you want to see working examples of ZenML with agent frameworks first, check our [framework integrations example](https://github.com/zenml-io/zenml/tree/main/examples/agent_framework_integrations) with 11+ ready-to-run examples. +{% endhint %} + +## From Scripts to Systematic Development + +Most agent development starts like this: + +```python +# Experimental notebook approach +agent_v1 = create_langgraph_agent(prompt_template_v1) +agent_v2 = create_custom_agent(prompt_template_v2) + +response_v1 = agent_v1.invoke("Analyze this customer feedback...") +response_v2 = agent_v2.run("Analyze this customer feedback...") + +# Manual comparison by eyeballing outputs +print("V1:", response_v1) +print("V2:", response_v2) +# Which is better? Hard to tell... +``` + +ZenML makes this systematic: + +```python +@pipeline +def agent_experiment_pipeline( + agent_config: Dict[str, Any] +) -> Annotated[Dict[str, Any], "experiment_results"]: + """Track agent experiments systematically.""" + + # ZenML automatically versions: + # - Agent configurations and prompts + # - Test datasets and responses + # - Performance metrics + + test_queries = load_test_dataset() + agent_responses = run_agent_experiment(agent_config, test_queries) + + return agent_responses + +# Run multiple experiments - all tracked automatically +config_v1 = {"framework": "langchain", "prompt": "template_v1"} +config_v2 = {"framework": "custom", "prompt": "template_v2"} + +results_v1 = agent_experiment_pipeline(config_v1) +results_v2 = agent_experiment_pipeline(config_v2) + +# ZenML Dashboard shows all experiments, configurations, and results +``` + +## Framework-Agnostic Development + +ZenML works with any agent implementation. Here is the canonical pattern; choose one of the three approaches and link to the examples below. + +### Pattern 1: Direct LLM calls (canonical inline example) +```python +@step +def run_direct_llm_agent(query: str) -> Annotated[str, "llm_response"]: + """Simple, effective - no framework needed.""" + + system_prompt = """You are a customer support assistant. 
+ Analyze the query and provide helpful responses.""" + + response = openai_client.chat.completions.create( + model="gpt-5", + messages=[ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": query} + ] + ) + + return response.choices[0].message.content +``` + +### Pattern 2: Agent Frameworks +```python +@step +def run_framework_agent(query: str) -> Annotated[str, "framework_response"]: + """Use any framework - LangChain, CrewAI, etc.""" + + # Framework choice doesn't matter to ZenML + agent = create_langgraph_agent() # or CrewAI, AutoGen, etc. + response = agent.invoke(query) + + return str(response) +``` + +### Pattern 3: Custom agent logic +```python +@step +def run_custom_agent(query: str) -> Annotated[Dict[str, Any], "custom_response"]: + """Your own agent implementation.""" + + # Multi-step custom logic + context = retrieve_relevant_context(query) + analysis = analyze_query_intent(query) + response = generate_contextual_response(query, context, analysis) + + return { + "response": response, + "context_used": context, + "intent": analysis + } +``` + +## Tool Integration & Configuration + +Modern agents need tools. ZenML helps you manage tool configurations: + +```python +@step +def setup_agent_tools() -> Annotated[Dict[str, Any], "tool_config"]: + """Configure agent capabilities.""" + + return { + "tools": [ + {"name": "web_search", "api_key": "search_key"}, + {"name": "database_query", "connection": "postgresql://..."}, + {"name": "send_email", "smtp_config": {...}} + ], + "mcp_servers": [ + {"name": "filesystem", "path": "/allowed/files"}, + {"name": "github", "repo": "company/repo"} + ] + } + +@step +def run_agent_with_tools( + query: str, + tool_config: Dict[str, Any] +) -> Annotated[Dict[str, Any], "agent_results"]: + """Run agent with configured tools.""" + + # Initialize tools from config + tools = initialize_tools(tool_config["tools"]) + + # Any framework can use these tools + agent = create_agent_with_tools(tools) + result = agent.run(query) + + # Track which tools were actually used + tool_usage = extract_tool_usage(result) + + return { + "response": result.response, + "tools_used": tool_usage, + "config_version": tool_config.get("version") + } +``` + +## Complete Development Pipeline + +Here's how everything comes together: + +```python +@pipeline +def complete_agent_development_pipeline( + agent_type: str, + test_queries: List[str] +) -> Annotated[Dict[str, Any], "development_results"]: + """Complete agent development workflow.""" + + # 1. Setup tools and configuration + tool_config = setup_agent_tools() + + # 2. Run agent experiments + results = [] + for query in test_queries: + if agent_type == "direct": + response = run_direct_llm_agent(query) + elif agent_type == "framework": + response = run_framework_agent(query) + else: + response = run_custom_agent(query) + + results.append({"query": query, "response": response}) + + # 3. 
Track everything for next chapter (evaluation) + return { + "agent_type": agent_type, + "tool_config": tool_config, + "results": results, + "timestamp": datetime.now() + } +``` + +## What ZenML Tracks Automatically + +Every time you run an experiment, ZenML captures: + +- **Agent Configuration**: Framework, prompts, model parameters +- **Tool Setup**: Available tools, MCP servers, API configurations +- **Input Data**: Test queries and datasets used +- **Outputs**: Agent responses and metadata +- **Environment**: Dependencies, versions, runtime environment +- **Lineage**: Complete chain from config to results + +## Ready-to-Use Examples + +We provide working examples for 11+ frameworks. Each follows the same ZenML pattern: + +- [AutoGen](https://github.com/zenml-io/zenml/tree/main/examples/agent_framework_integrations/autogen) - Multi-agent conversations +- [LangGraph](https://github.com/zenml-io/zenml/tree/main/examples/agent_framework_integrations/langgraph) - Graph-based agents +- [CrewAI](https://github.com/zenml-io/zenml/tree/main/examples/agent_framework_integrations/crewai) - Role-based crews +- [OpenAI SDK](https://github.com/zenml-io/zenml/tree/main/examples/agent_framework_integrations/openai_agents_sdk) - Official OpenAI agents +- [And 7 more...](https://github.com/zenml-io/zenml/tree/main/examples/agent_framework_integrations) + +## Getting Started + +1. **Pick your approach**: Direct LLM calls, existing framework, or custom logic +2. **Wrap in ZenML pipeline**: Use the patterns shown above +3. **Run experiments**: ZenML tracks everything automatically +4. **Check the dashboard**: See all your experiments and configurations + +You now have systematic agent development with full experiment tracking. Next, we'll learn how to [deploy these agents in production](agent-deployment.md) while maintaining ZenML integration. + +
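Checking the dashboard is the quickest way to browse experiments, but everything ZenML tracks is also reachable from code. A minimal sketch using the ZenML client, assuming the pipeline and step names from the examples above:

```python
from zenml.client import Client

# Fetch the most recent run of the experiment pipeline defined earlier.
client = Client()
run = client.get_pipeline("agent_experiment_pipeline").last_run
print(run.name, run.status)

# Load the tracked results produced by the run_agent_experiment step.
results = run.steps["run_agent_experiment"].output.load()
print(results)
```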

*ZenML provides systematic tracking for all your agent development experiments.*
\ No newline at end of file diff --git a/docs/book/user-guide/toc.md b/docs/book/user-guide/toc.md index 6ac98c81360..8420d61b8f8 100644 --- a/docs/book/user-guide/toc.md +++ b/docs/book/user-guide/toc.md @@ -16,6 +16,10 @@ * [Configure a code repository](production-guide/connect-code-repository.md) * [Set up CI/CD](production-guide/ci-cd.md) * [An end-to-end project](production-guide/end-to-end.md) +* [Agent guide](agent-guide/README.md) + * [Development & experimentation](agent-guide/agent-fundamentals.md) + * [Production & deployment](agent-guide/agent-deployment.md) + * [Evaluation & improvement](agent-guide/agent-evaluation.md) * [LLMOps guide](llmops-guide/README.md) * [RAG with ZenML](llmops-guide/rag-with-zenml/README.md) * [RAG in 85 lines of code](llmops-guide/rag-with-zenml/rag-85-loc.md) diff --git a/examples/agent_framework_integrations/README.md b/examples/agent_framework_integrations/README.md index ced53c7e856..e82897de779 100644 --- a/examples/agent_framework_integrations/README.md +++ b/examples/agent_framework_integrations/README.md @@ -7,7 +7,7 @@

Agent Frameworks Integration Examples

- Production-ready agent orchestration with ZenML + Systematic agent development with ZenML
Features · @@ -32,9 +32,9 @@

-# 🤖 Agent Frameworks + ZenML +# 🤖 ZenML Agent Development Platform -This collection demonstrates how to integrate popular agent frameworks with ZenML for production-grade AI agent orchestration. Each example follows consistent patterns and best practices, making it easy to adapt any framework for your specific use case. +This collection demonstrates how to use ZenML as your **agent development platform** - applying the same systematic development practices you use for traditional ML to agent development. Each example shows how any agent framework can benefit from ZenML's experiment tracking, evaluation workflows, and deployment capabilities. ## 🚀 Quick Start @@ -121,17 +121,6 @@ def agent_pipeline() -> str: - **LlamaIndex**: Function agents with async capabilities - **OpenAI Agents SDK**: Structured execution with OpenAI -## 🔄 Implementation Notes - -### Production vs. Demos -**These examples demonstrate single-query execution for simplicity.** In production, ZenML's value comes from: -- **Batch processing**: Process hundreds/thousands of queries overnight -- **Agent evaluation**: Compare different frameworks on test datasets -- **Data pipelines**: Use agents to process document collections -- **A/B testing**: Systematic comparison of agent configurations - -For real-time serving, use FastAPI/Flask directly. Use ZenML for the operational layer. - ### Async Frameworks Some frameworks require async handling within ZenML steps: - **LlamaIndex**: `asyncio.run(agent.run(query))` @@ -163,35 +152,26 @@ Each framework returns different response types: - 🐛 [Report issues](https://github.com/zenml-io/zenml/issues) - 💡 [Request features](https://zenml.io/discussion) -## 🌟 About ZenML - -ZenML is an extensible, open-source MLOps framework for creating production-ready ML pipelines. These agent framework integrations showcase ZenML's flexibility in orchestrating AI workflows beyond traditional ML use cases. - -**Why ZenML for Agent Orchestration?** -- 🔄 **Reproducible workflows**: Version and track agent executions -- 📊 **Artifact management**: Store and version agent inputs/outputs -- 🎯 **Production ready**: Built-in monitoring, logging, and error handling -- 🔧 **Tool agnostic**: Works with any agent framework -- ☁️ **Cloud native**: Deploy anywhere with consistent behavior - ## 📖 Learn More | Resource | Description | |----------|-------------| +| 🤖 **[Agent Guide]** | Complete guide to agent development with ZenML | +| 📊 **[LLM Evaluation]** | Proven evaluation patterns for agents and LLMs | | 🧘 **[ZenML 101]** | New to ZenML? Start here! 
| | ⚛ **[Core Concepts]** | Understand ZenML fundamentals | -| 🤖 **[LLMOps Guide]** | Complete guide to LLMOps with ZenML | | 📓 **[Documentation]** | Full ZenML documentation | | 📒 **[API Reference]** | Detailed API documentation | | ⚽ **[Examples]** | More ZenML examples | -[ZenML 101]: https://docs.zenml.io/user-guides/starter-guide +[Agent Guide]: https://docs.zenml.io/user-guide/agent-guide +[LLM Evaluation]: https://docs.zenml.io/user-guide/llmops-guide/evaluation +[ZenML 101]: https://docs.zenml.io/user-guide/starter-guide [Core Concepts]: https://docs.zenml.io/getting-started/core-concepts -[LLMOps Guide]: https://docs.zenml.io/user-guides/llmops-guide [Documentation]: https://docs.zenml.io/ -[SDK Reference]: https://sdkdocs.zenml.io/ +[API Reference]: https://sdkdocs.zenml.io/ [Examples]: https://github.com/zenml-io/zenml/tree/main/examples --- -*This collection demonstrates the power and flexibility of ZenML for orchestrating diverse agent frameworks in production environments.* \ No newline at end of file +*This collection demonstrates how ZenML transforms agent development into a systematic, trackable workflow - applying the same rigor you use for traditional ML to agent development.*