Evaluating Chatbot Responses

yangm2 edited this page Feb 16, 2026 · 2 revisions

Terminology & Concepts

term	definition
dataset	tbd
evaluation	tbd
evaluator	tbd
experiment	tbd
llm-as-a-judge	tbd
single-turn conversation	tbd
multi-turn conversation	tbd
trajectory evaluation	tbd
simulated user	tbd

Technology

tech	description
langchain	tbd
langsmith	tbd

Experimental Flow

See EVALUATION.md in the repo for instructions on setting up and running experiments with LangSmith.
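
As a rough illustration of how the terms above fit together, here is a minimal, library-free sketch of an experiment: a target is run over a dataset and each response is scored by an evaluator. It does not call LangSmith and is not the repo's implementation. All names in it (`Example`, `exact_match`, `run_experiment`, `toy_chatbot`) and the sample questions are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    """One dataset row: a user question and a reference answer (hypothetical schema)."""
    question: str
    reference: str


def exact_match(output: str, reference: str) -> float:
    """Evaluator: 1.0 if the response equals the reference, ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def run_experiment(
    target: Callable[[str], str],
    dataset: list[Example],
    evaluator: Callable[[str, str], float],
) -> dict:
    """Run the target on every example, score each response, and report the mean score."""
    rows = []
    for example in dataset:
        output = target(example.question)
        rows.append(
            {
                "question": example.question,
                "output": output,
                "score": evaluator(output, example.reference),
            }
        )
    mean_score = sum(row["score"] for row in rows) / len(rows) if rows else 0.0
    return {"rows": rows, "mean_score": mean_score}


def toy_chatbot(question: str) -> str:
    """Hypothetical stand-in for the real chatbot: answers from a canned lookup."""
    canned = {"What is 2 + 2?": "4"}
    return canned.get(question, "I don't know")


# Made-up single-turn examples, for illustration only.
dataset = [
    Example("What is 2 + 2?", "4"),
    Example("What is the capital of France?", "Paris"),
]

result = run_experiment(toy_chatbot, dataset, exact_match)
print(result["mean_score"])
```

In a real setup the canned lookup would be replaced by the chatbot under test, and the exact-match check could be swapped for a different evaluator, such as an LLM-as-a-judge.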

Understanding and Navigating Experimental Results in LangSmith

TBD