diff --git a/content/en/llm_observability/experiments/_index.md b/content/en/llm_observability/experiments/_index.md
index 5bad706fa7e..b463aecc76f 100644
--- a/content/en/llm_observability/experiments/_index.md
+++ b/content/en/llm_observability/experiments/_index.md
@@ -576,6 +576,8 @@ List all projects, sorted by creation date. The most recently created projects a
 | ---- | ---- | --- |
 | `filter[id]` | string | The ID of a project to search for. |
 | `filter[name]` | string | The name of a project to search for. |
+| `filter[is_deleted]` | boolean | Filter for deleted projects. |
+| `include[user_data]` | boolean | Include user data in the response. |
 | `page[cursor]` | string | List results with a cursor provided in the previous query. |
 | `page[limit]` | int | Limits the number of results. |
 
@@ -605,6 +607,7 @@ Create a project. If there is an existing project with the same name, the API re
 
 | Field | Type | Description |
 | ---- | ---- | ---- |
+| `ml_app` | string | ML app identifier. |
 | `name` (_required_) | string | Unique project name. |
 | `description` | string | Project description. |
 
@@ -628,6 +631,7 @@ Partially update a project object. Specify the fields to update in the payload.
 
 | Field | Type | Description |
 | ---- | ---- | ---- |
+| `ml_app` | string | ML app identifier. |
 | `name` | string | Unique project name. |
 | `description` | string | Project description. |
 
@@ -636,10 +640,13 @@ Partially update a project object. Specify the fields to update in the payload.
 | Field | Type | Description |
 | ---- | ---- | ---- |
 | `id` | UUID | Unique project ID. Set at the top level `id` field within the [Data](#object-data) object. |
+| `ml_app` | string | ML app identifier. |
 | `name` | string | Unique project name. |
 | `description` | string | Project description. |
 | `created_at` | timestamp | Timestamp representing when the resource was created. |
 | `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
+| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
+| `author` | object | User who created the project. |
 
 {{% /collapse-content %}}
 
@@ -651,7 +658,7 @@ Delete one or more projects.
 
 | Field | Type | Description |
 | ---- | ---- | ---- |
-| `project_ids` (_required_) | []UUID | List of project IDs to delete. |
+| `project_ids` (_required_) | array of strings | List of project IDs to delete. |
 
 **Response**
 
@@ -673,6 +680,8 @@ List all datasets, sorted by creation date. The most recently-created datasets a
 | ---- | ---- | --- |
 | `filter[name]` | string | The name of a dataset to search for. |
 | `filter[id]` | string | The ID of a dataset to search for. |
+| `filter[is_deleted]` | boolean | Filter for deleted datasets. |
+| `include[user_data]` | boolean | Include user data in the response. |
 | `page[cursor]` | string | List results with a cursor provided in the previous query. |
 | `page[limit]` | int | Limits the number of results. |
 
@@ -687,12 +696,16 @@ List all datasets, sorted by creation date. The most recently-created datasets a
 | Field | Type | Description |
 | ---- | ---- | ---- |
 | `id` | string | Unique dataset ID. Set at the top level `id` field within the [Data](#object-data) object. |
+| `project_id` | string | Unique project ID. |
 | `name` | string | Unique dataset name. |
 | `description` | string | Dataset description. |
 | `metadata` | json | Arbitrary key-value metadata associated with the dataset. |
 | `current_version` | int | The current version number of the dataset. Versions start at 0 and increment when records are added or modified. |
+| `dataset_type` | string | Type of dataset. |
 | `created_at` | timestamp | Timestamp representing when the resource was created. |
 | `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
+| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
+| `author` | object | User who created the dataset. |
 
 {{% /collapse-content %}}
 
@@ -707,6 +720,7 @@ Create a dataset. If there is an existing dataset with the same name, the API re
 | `name` (_required_) | string | Unique dataset name. |
 | `description` | string | Dataset description. |
 | `metadata` | json | Arbitrary key-value metadata associated with the dataset. |
+| `dataset_type` | string | Type of dataset. |
 
 **Response**
 
@@ -751,11 +765,18 @@ List all dataset records, sorted by creation date. The most recently-created rec
 | ---- | ---- | ---- |
 | `id` | string | Unique record ID. |
 | `dataset_id` | string | Unique dataset ID. |
+| `span_id` | string | Associated span ID. |
+| `trace_id` | string | Associated trace ID. |
 | `input` | any (string, number, Boolean, object, array) | Data that serves as the starting point for an experiment. |
 | `expected_output` | any (string, number, Boolean, object, array) | Expected output. |
 | `metadata` | json | Arbitrary key-value metadata associated with the record. |
 | `created_at` | timestamp | Timestamp representing when the resource was created. |
 | `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
+| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
+| `ttl` | string | Time-to-live for the record. |
+| `version` | int | Record version number. |
+| `author` | object | User who created the record. |
+| `_dd` | object | Internal Datadog attributes including content preview metadata. |
 
 {{% /collapse-content %}}
 
@@ -769,12 +790,14 @@ Appends records for a given dataset.
 | ---- | ---- | --- |
 | `deduplicate` | bool | If `true`, deduplicates appended records. Defaults to `true`. |
 | `records` (_required_) | [][RecordReq](#object-recordreq) | List of records to create. |
+| `create_new_version` | bool | If `true`, creates a new dataset version. |
 
 #### Object: RecordReq
 
 | Field | Type | Description |
 | ---- | ---- | ---- |
-| `input` (_required_) | any (string, number, Boolean, object, array) | Data that serves as the starting point for an experiment. |
+| `id` | string | Optional record ID. |
+| `input` | any (string, number, Boolean, object, array) | Data that serves as the starting point for an experiment. |
 | `expected_output` | any (string, number, Boolean, object, array) | Expected output. |
 | `metadata` | json | Arbitrary key-value metadata associated with the record. |
 
@@ -902,12 +925,19 @@ List all experiments, sorted by creation date. The most recently-created experim
 | `id` | UUID | Unique experiment ID. Set at the top level `id` field within the [Data](#object-data) object. |
 | `project_id` | string | Unique project ID. |
 | `dataset_id` | string | Unique dataset ID. |
+| `dataset_version` | int | Dataset version number. |
+| `dataset_name` | string | Dataset name. |
+| `experiment` | string | Experiment identifier. |
 | `name` | string | Unique experiment name. |
 | `description` | string | Experiment description. |
 | `metadata` | json | Arbitrary key-value metadata associated with the experiment. |
+| `aggregate_data` | json | Aggregated experiment data. |
+| `run_count` | int | Number of experiment runs. |
 | `config` | json | Configuration used when creating the experiment. |
 | `created_at` | timestamp | Timestamp representing when the resource was created. |
 | `updated_at` | timestamp | Timestamp representing when the resource was last updated. |
+| `deleted_at` | timestamp | Timestamp representing when the resource was deleted (soft delete). |
+| `author` | object | User who created the experiment. |
 
 {{% /collapse-content %}}
 
@@ -927,6 +957,7 @@ Create an experiment. If there is an existing experiment with the same name, the
 | `ensure_unique` | bool | If `true`, Datadog generates a new experiment with a unique name in the case of a conflict. Default is `true`. |
 | `metadata` | json | Arbitrary key-value metadata associated with the experiment. |
 | `config` | json | Configuration used when creating the experiment. |
+| `run_count` | int | Number of runs for the experiment. |
 
 **Response**
 
@@ -954,6 +985,8 @@ Partially update an experiment object. Specify the fields to update in the paylo
 | ---- | ---- | ---- |
 | `name` | string | Unique experiment name. |
 | `description` | string | Experiment description. |
+| `dataset_id` | string | Unique dataset ID. |
+| `metadata` | json | Arbitrary key-value metadata associated with the experiment. |
 
 **Response**
 
@@ -1004,8 +1037,10 @@ Push events (spans and metrics) for an experiment.
 | ---- | ---- | ---- |
 | `trace_id` | string | Trace ID. |
 | `span_id` | string | Span ID. |
+| `parent_id` | string | Parent span ID. |
 | `project_id` | string | Project ID. |
 | `dataset_id` | string | Dataset ID. |
+| `dataset_record_id` | string | Dataset record ID associated with this span. |
 | `name` | string | Span name (for example, task name). |
 | `start_ns` | number | Span start time in nanoseconds. |
 | `duration` | number | Span duration in nanoseconds. |
@@ -1015,19 +1050,25 @@ Push events (spans and metrics) for an experiment.
 | `meta.output` | json | Output payload associated with the span. |
 | `meta.expected_output` | json | Expected output for the span. |
 | `meta.error` | object | Error details: `message`, `stack`, `type`. |
+| `meta.span` | object | Span-specific metadata (for example, `kind`). |
+| `meta.metadata` | json | Arbitrary key-value metadata. |
 
 #### Object: Metric
 
 | Field | Type | Description |
 | ---- | ---- | ---- |
+| `id` | string | Metric ID (internally generated UUID). |
 | `span_id` | string | Associated span ID. |
-| `metric_type` | string | Metric type. One of: `score`, `categorical`. |
+| `metric_type` | string | Metric type. One of: `score`, `categorical`, `boolean`. |
 | `timestamp_ms` | number | UNIX timestamp in milliseconds. |
 | `label` | string | Metric label (evaluator name). |
 | `score_value` | number | Score value (when `metric_type` is `score`). |
 | `categorical_value` | string | Categorical value (when `metric_type` is `categorical`). |
+| `boolean_value` | boolean | Boolean value (when `metric_type` is `boolean`). |
+| `metric_source` | string | Source of the metric (for example, `custom`, `summary`). |
+| `eval_metric_type` | string | Type of evaluation metric. |
 | `metadata` | json | Arbitrary key-value metadata associated with the metric. |
-| `error.message` | string | Optional error message for the metric. |
+| `error` | object | Error details: `message`, `stack`, `type`. |
 
 **Response**
 
diff --git a/content/en/llm_observability/instrumentation/api.md b/content/en/llm_observability/instrumentation/api.md
index 63cf4aaeaa1..2fe8d00ae49 100644
--- a/content/en/llm_observability/instrumentation/api.md
+++ b/content/en/llm_observability/instrumentation/api.md
@@ -153,6 +153,8 @@ If the request is successful, the API responds with a 202 network code and an em
 | messages| [Message](#message) | List of messages. This should only be used for LLM spans. |
 | documents| [Document](#document) | List of documents. This should only be used as the output for retrieval spans |
 | prompt | [Prompt](#prompt) | Structured prompt metadata that includes the template and variables used for the LLM input. This should only be used for input IO on LLM spans. |
+| embedding | []float | Embedding vector representation. **Only valid for embedding spans.** |
+| parameters | object | Parameters used for the LLM request or response. **Only valid for LLM spans.** |
 
 **Note**: When only `input.messages` is set for an LLM span, Datadog infers `input.value` from `input.messages` and uses the following inference logic:
 
@@ -166,6 +168,8 @@ If the request is successful, the API responds with a 202 network code and an em
 |----------------------|--------|--------------------------|
 | content [*required*] | string | The body of the message. |
 | role | string | The role of the entity. |
+| tool_calls | []object | List of tool calls made by the LLM. |
+| tool_results | []object | List of tool results returned to the LLM. |
 
 #### Document
 | Field | Type | Description |
@@ -184,6 +188,7 @@ If the request is successful, the API responds with a 202 network code and an em
 | Field | Type | Description |
 |----------------------|--------|--------------------------|
 | id | string | Logical identifier for this prompt template. Should be unique per `ml_app`. |
+| name | string | Human-readable name for the prompt. |
 | version | string | Version tag for the prompt (for example, "1.0.0"). If not provided, LLM Observability automatically generates a version by computing a hash of the template content. |
 | template | string | Single string template form. Use placeholder syntax (like `{{variable_name}}`) to embed variables. This should not be set with `chat_template`. |
 | chat_template | [[Message]](#message) | Multi-message template form. Use placeholder syntax (like `{{variable_name}}`) to embed variables in message content. This should not be set with `template`. |
@@ -219,10 +224,14 @@ If the request is successful, the API responds with a 202 network code and an em
 | Field | Type | Description |
 |-------------|-------------------|--------------|
 | kind [*required*] | string | The [span kind][2]: `"agent"`, `"workflow"`, `"llm"`, `"tool"`, `"task"`, `"embedding"`, or `"retrieval"`. |
+| model_name | string | The name of the model used. **Only valid for LLM spans.** |
+| model_provider | string | The provider of the model. **Only valid for LLM spans.** |
+| model_version | string | The version of the model. **Only valid for LLM spans.** |
+| embedding_for_prompt_idx | integer | Index of the prompt for which the embedding is generated. **Only valid for embedding spans.** |
 | error | [Error](#error) | Error information on the span. |
 | input | [IO](#io) | The span's input information. |
 | output | [IO](#io) | The span's output information. |
-| metadata | Dict[key (string), value] where the value is a float, bool, or string | Data about the span that is not input or output related. Use the following metadata keys for LLM spans: `temperature`, `max_tokens`, `model_name`, and `model_provider`. |
+| metadata | Dict[key (string), value] where the value is a float, bool, or string | Data about the span that is not input or output related. Use the following metadata keys for LLM spans: `temperature`, `max_tokens`. |
 
 #### Metrics
 | Field | Type | Description |
@@ -251,6 +260,9 @@ If the request is successful, the API responds with a 202 network code and an em
 | duration [*required*] | float64 | The span's duration in nanoseconds. |
 | meta [*required*] | [Meta](#meta) | The core content relative to the span. |
 | status | string | Error status (`"ok"` or `"error"`). Defaults to `"ok"`. |
+| service | string | The service name associated with the span. |
+| ml_app | string | The ML application name. Overrides the top-level `ml_app` field. |
+| ml_app_version | string | The ML application version. |
 | apm_trace_id | string | The ID of the associated APM trace. Defaults to match the `trace_id` field. |
 | metrics | [Metrics](#metrics) | Datadog metrics to collect. |
 | session_id | string | The span's `session_id`. Overrides the top-level `session_id` field. |
@@ -266,6 +278,7 @@ If the request is successful, the API responds with a 202 network code and an em
 | Field | Type | Description |
 |----------|---------------------|--------------|
 | ml_app [*required*] | string | The name of your LLM application. See [Application naming guidelines](#application-naming-guidelines). |
+| ml_app_version | string | The version of your LLM application. |
 | spans [*required*] | [[Span](#span)] | A list of spans. |
 | tags | [[Tag](#tag)] | A list of top-level tags to apply to each span. |
 | session_id | string | The session the list of spans belongs to. Can be overridden or set on individual spans as well. |
@@ -464,8 +477,11 @@ Evaluations must be joined to a unique span. You can identify the target span us
 |--------------------------------------------------------------------|---------------------|--------------------------------------------------------------------------------------------------------|
 | ID | string | Evaluation metric UUID (generated upon submission). |
 | join_on [*required*] | [[JoinOn](#joinon)] | How the evaluation is joined to a span. |
+| trace_id | string | The trace ID of the span associated with this evaluation. |
+| span_id | string | The span ID of the span associated with this evaluation. |
 | timestamp_ms [*required*] | int64 | A UTC UNIX timestamp in milliseconds representing the time the request was sent. |
 | ml_app [*required*] | string | The name of your LLM application. See [Application naming guidelines](#application-naming-guidelines). |
+| ml_app_version | string | The version of your LLM application. |
 | metric_type [*required*] | string | The type of evaluation: `"categorical"`, `"score"`, or `"boolean"`. |
 | label [*required*] | string | The unique name or label for the provided evaluation . |
 | categorical_value [*required if the metric_type is "categorical"*] | string | A string representing the category that the evaluation belongs to. |
@@ -474,6 +490,7 @@ Evaluations must be joined to a unique span. You can identify the target span us
 | assessment | string | An assessment of this evaluation. Accepted values are `pass` and `fail`. |
 | reasoning | string | A text explanation of the evaluation result. |
 | tags | [[Tag](#tag)] | A list of tags to apply to this particular evaluation metric. |
+| metadata | object | Arbitrary key-value metadata associated with the evaluation. |
 
 #### JoinOn
 
@@ -486,15 +503,15 @@ Evaluations must be joined to a unique span. You can identify the target span us
 
 | Field | Type | Description |
 |------------|-----------------|--------------|
-| span_id | string | The span ID of the span that this evaluation is associated with. |
-| trace_id | string | The trace ID of the span that this evaluation is associated with. |
+| span_id [*required*] | string | The span ID of the span that this evaluation is associated with. |
+| trace_id [*required*] | string | The trace ID of the span that this evaluation is associated with. |
 
 #### TagContext
 
 | Field | Type | Description |
 |------------|-----------------|--------------|
-| key | string | The tag key name. This must be the same key used when setting the tag on the span. |
-| value | string | The tag value. This value must match exactly one span with the specified tag key/value pair. |
+| key [*required*] | string | The tag key name. This must be the same key used when setting the tag on the span. |
+| value [*required*] | string | The tag value. This value must match exactly one span with the specified tag key/value pair. |
 
 #### EvalMetricsRequestData
 
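
The hunks above mark `span_id`/`trace_id` and `key`/`value` as required in the span and tag join contexts, and add a `boolean` metric type with its own `boolean_value` field. As a rough illustration of what those constraints mean for a client, here is a minimal Python sketch of a pre-submission check. The helper names (`missing_fields`, `check_span_context`, `check_tag_context`, `check_metric`) are hypothetical and not part of any Datadog library, and the checks only mirror the documented field tables; they are not the API's actual server-side validation.

```python
from typing import Any, List, Mapping, Sequence

# Maps each metric_type named in the diff to the field that carries its value.
VALUE_FIELD_BY_TYPE = {
    "score": "score_value",
    "categorical": "categorical_value",
    "boolean": "boolean_value",
}


def missing_fields(obj: Mapping[str, Any], required: Sequence[str]) -> List[str]:
    """Return the required keys that are absent (or None) in obj."""
    return [key for key in required if obj.get(key) is None]


def check_span_context(ctx: Mapping[str, Any]) -> List[str]:
    # Hypothetical helper: both IDs are marked [*required*] in the diff.
    return missing_fields(ctx, ("span_id", "trace_id"))


def check_tag_context(ctx: Mapping[str, Any]) -> List[str]:
    # Hypothetical helper: key and value are marked [*required*] in the diff.
    return missing_fields(ctx, ("key", "value"))


def check_metric(metric: Mapping[str, Any]) -> List[str]:
    """Check that the value field matching metric_type is present."""
    metric_type = metric.get("metric_type")
    field = VALUE_FIELD_BY_TYPE.get(metric_type)
    if field is None:
        return [f"unknown metric_type: {metric_type!r}"]
    # Compare against None so that a boolean_value of False still counts as set.
    if metric.get(field) is None:
        return [f"{field} is required when metric_type is {metric_type!r}"]
    return []
```

Each helper returns a list of problems rather than raising, so a caller can collect every issue in a payload before deciding whether to send it.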