|
| 1 | +# OpenTelemetry |
| 2 | + |
| 3 | +- Title: OpenTelemetry |
| 4 | +- Status: Accepted |
| 5 | +- Minimum Server Version: N/A |
| 6 | + |
| 7 | +______________________________________________________________________ |
| 8 | + |
| 9 | +## Abstract |
| 10 | + |
| 11 | +This specification defines requirements for drivers' OpenTelemetry integration and behavior. Drivers will trace database |
| 12 | +commands and driver operations with a pre-defined set of attributes when OpenTelemetry is enabled and configured in an |
| 13 | +application. |
| 14 | + |
| 15 | +## META |
| 16 | + |
| 17 | +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and |
| 18 | +"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). |
| 19 | + |
| 20 | +## Specification |
| 21 | + |
| 22 | +### Terms |
| 23 | + |
| 24 | +**Host Application** |
| 25 | + |
| 26 | +An application that uses the MongoDB driver. |
| 27 | + |
| 28 | +**Span** |
| 29 | + |
| 30 | +A Span represents a single operation within a trace. Spans can be nested to form a trace tree. Each trace contains a |
| 31 | +root span, which typically describes the entire operation, and, optionally, one or more sub-spans for its sub-operations. |
| 32 | + |
| 33 | +Spans encapsulate: |
| 34 | + |
| 35 | +- The span name |
| 36 | +- An immutable SpanContext that uniquely identifies the Span |
| 37 | +- A parent span in the form of a Span, SpanContext, or null |
| 38 | +- A SpanKind |
| 39 | +- A start timestamp |
| 40 | +- An end timestamp |
| 41 | +- Attributes |
| 42 | +- A list of Links to other Spans |
| 43 | +- A list of timestamped Events |
| 44 | +- A Status |
| 45 | + |
| 46 | +**Tracer** |
| 47 | + |
| 48 | +A Tracer is responsible for creating spans, and using a tracer is the only way to create a span. A Tracer is not |
| 49 | +responsible for configuration; that is the responsibility of the TracerProvider. |
| 50 | + |
| 51 | +**OpenTelemetry API and SDK** |
| 52 | + |
| 53 | +OpenTelemetry offers two components for implementing instrumentation: the API and the SDK. The OpenTelemetry API |
| 54 | +provides all the necessary types and method signatures; if no OpenTelemetry SDK is available at runtime, API methods |
| 55 | +are no-ops. The OpenTelemetry SDK is the actual implementation of the API; when it is available, API methods do real work. |
| 56 | + |
| 57 | +### Implementation Requirements |
| 58 | + |
| 59 | +Drivers MAY add a dependency on the corresponding OpenTelemetry API. This is the recommended way to implement |
| 60 | +OpenTelemetry in libraries. Alternatively, drivers can implement OpenTelemetry support using any suitable tools within |
| 61 | +the driver's ecosystem. Drivers MUST NOT add a dependency on the OpenTelemetry SDK. |
| 62 | + |
| 63 | +#### Enabling and Disabling OpenTelemetry |
| 64 | + |
| 65 | +OpenTelemetry SHOULD be disabled by default. |
| 66 | + |
| 67 | +Drivers SHOULD support configuring OpenTelemetry on multiple levels. |
| 68 | + |
| 69 | +- **MongoClient Level**: Drivers SHOULD provide a configuration option in `MongoClient`'s Configuration/Settings that |
| 70 | + enables or disables tracing for operations and commands executed with this client. This option MUST override |
| 71 | + settings at the broader levels (Driver and Host Application). |
| 72 | +- **Driver Level**: Drivers SHOULD provide a global setting that enables or disables OpenTelemetry for all `MongoClient` |
| 73 | + instances (excluding those that explicitly override the setting). This configuration can be implemented with an |
| 74 | + environment variable `OTEL_#{LANG}_INSTRUMENTATION_MONGODB_ENABLED`. Drivers MAY provide other means to globally |
| 75 | + disable OpenTelemetry that are more suitable for their language ecosystem. This option MUST override the |
| 76 | + Host Application Level setting. |
| 77 | +- **Host Application Level**: If the host application enables OpenTelemetry for all available instrumentations (as |
| 78 | + Ruby's OpenTelemetry SDK allows, for example) and the driver can detect this, OpenTelemetry SHOULD be enabled in the driver. |
| 79 | + |
| 80 | +Drivers MUST NOT enable tracing based on detecting whether the OpenTelemetry SDK library is available. |
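As a non-normative illustration, the Python sketch below resolves the effective setting across the three levels. The variable name `OTEL_PYTHON_INSTRUMENTATION_MONGODB_ENABLED` (the `#{LANG}` pattern filled in with `PYTHON`), the accepted true/false spellings, and the `host_app_enables_all` flag are assumptions made for illustration; they are not defined by this specification.

```python
import os
from typing import Mapping, Optional

# Hypothetical name: the OTEL_#{LANG}_... pattern from this spec with LANG = PYTHON.
ENABLED_ENV_VAR = "OTEL_PYTHON_INSTRUMENTATION_MONGODB_ENABLED"

# Assumed spellings; a real driver would follow its ecosystem's conventions.
_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def _parse_flag(value: Optional[str]) -> Optional[bool]:
    """Return True/False for a recognized value, None if unset or unrecognized."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized in _TRUTHY:
        return True
    if normalized in _FALSY:
        return False
    return None


def tracing_enabled(
    client_option: Optional[bool],
    host_app_enables_all: bool = False,
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Resolve the effective setting; narrower levels override broader ones."""
    # MongoClient level: an explicit client option always wins.
    if client_option is not None:
        return client_option
    # Driver level: the global environment variable, when set to a recognized value.
    driver_level = _parse_flag(environ.get(ENABLED_ENV_VAR))
    if driver_level is not None:
        return driver_level
    # Host application level: an "enable all instrumentations" request the driver can
    # detect. Nothing here probes for the SDK, and tracing stays off by default.
    return host_app_enables_all
```

With this ordering, a client-level `False` disables tracing even when the environment variable or the host application enables it.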
| 81 | + |
| 82 | +#### Tracer Attributes |
| 83 | + |
| 84 | +If a driver creates a Tracer using the OpenTelemetry API, it MUST use the following attributes: |
| 85 | + |
| 86 | +- `name`: A string that identifies the driver. It can be the name of a driver's component (e.g., "mongo", "PyMongo") or |
| 87 | + a package name (e.g., "com.mongo.Driver"). Drivers SHOULD select a name that is idiomatic for their language and |
| 88 | + ecosystem. Drivers SHOULD follow the Instrumentation Scope guidance. |
| 89 | +- `version`: The version of the driver. |
| 90 | + |
| 91 | +#### Instrumenting Driver Operations |
| 92 | + |
| 93 | +When a user calls the driver's public API, the driver MUST create a span for every driver operation. Drivers MUST start |
| 94 | +the span as early as possible so that the span’s duration reflects all work performed by the driver, such as server |
| 95 | +selection and serialization/deserialization. |
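As a rough illustration, the operation span can wrap every step of the operation, starting before server selection and ending after the reply is deserialized. The sketch uses a hypothetical `ToySpan` class in place of a real OpenTelemetry span, and the recorded step names are placeholders for a driver's internal work.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ToySpan:
    """Hypothetical stand-in for an OpenTelemetry span (not the real API)."""

    name: str
    start_ns: int
    end_ns: Optional[int] = None
    steps: List[str] = field(default_factory=list)


@contextmanager
def operation_span(name: str) -> Iterator[ToySpan]:
    # Start timing before any driver work, so server selection is included.
    span = ToySpan(name=name, start_ns=time.monotonic_ns())
    try:
        yield span
    finally:
        # End only after the reply has been deserialized (or an error was raised).
        span.end_ns = time.monotonic_ns()


def find_documents(database: str, collection: str) -> ToySpan:
    """Hypothetical public API method; each step stands in for real driver work."""
    with operation_span(f"find {database}.{collection}") as span:
        span.steps.append("server selection")
        span.steps.append("serialize command")
        span.steps.append("send command and receive reply")
        span.steps.append("deserialize reply")
    return span
```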
| 96 | + |
| 97 | +##### `withTransaction` |
| 98 | + |
| 99 | +The `withTransaction` operation is a special case because it may include other operations that are executed within the |
| 100 | +scope of `withTransaction`. Spans for operations executed inside the callback SHOULD be nested under the |
| 101 | +`withTransaction` span. |
| 102 | + |
| 103 | +##### Span Name |
| 104 | + |
| 105 | +The span name SHOULD be: |
| 106 | + |
| 107 | +- `driver_operation_name db.collection_name` if the operation is executed on a collection (e.g., |
| 108 | + `findOneAndDelete warehouse.users`). |
| 109 | +- `db.driver_operation_name` if there is no specific collection for the operation (e.g., `warehouse.runCommand`). |
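The two naming patterns can be expressed as a small helper. The function name and parameters are hypothetical; the same string could also populate `db.operation.summary`, which the attribute table defines as equivalent to the span name.

```python
from typing import Optional


def operation_span_name(operation: str, database: str, collection: Optional[str] = None) -> str:
    """Return "<operation> <db>.<collection>" or, without a collection, "<db>.<operation>"."""
    if collection is not None:
        return f"{operation} {database}.{collection}"
    return f"{database}.{operation}"
```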
| 110 | + |
| 111 | +##### Span Kind |
| 112 | + |
| 113 | +Span kind MUST be "client". |
| 114 | + |
| 115 | +##### Span Attributes |
| 116 | + |
| 117 | +Spans SHOULD have the following attributes: |
| 118 | + |
| 119 | +| Attribute | Type | Description | Requirement Level | |
| 120 | +| :--------------------- | :------- | :------------------------------------------------------------------------- | :-------------------- | |
| 121 | +| `db.system` | `string` | MUST be 'mongodb' | Required | |
| 122 | +| `db.namespace` | `string` | The database name | Required if available | |
| 123 | +| `db.collection.name` | `string` | The collection being accessed within the database stated in `db.namespace` | Required if available | |
| 124 | +| `db.operation.name` | `string` | The name of the driver operation being executed | Required | |
| 125 | +| `db.operation.summary` | `string` | Equivalent to span name | Required | |
| 126 | +| `db.mongodb.cursor_id` | `int64` | The cursor ID, if a cursor is created or used in the operation | Required if available | |
| 127 | + |
| 128 | +Not all attributes are available at the moment of span creation. Drivers therefore need to add some attributes later, |
| 129 | +which requires the operation span to remain accessible throughout the complete operation lifecycle. |
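A sketch of an attribute holder that is created together with the span and kept for the whole operation, so that values known only later (such as the cursor ID from the server reply) can still be attached. The class and method names are hypothetical, and a plain dictionary stands in for a real span's attributes.

```python
from typing import Dict, Optional, Union

AttributeValue = Union[str, int]


class OperationSpanAttributes:
    """Hypothetical holder kept alive for the whole operation lifecycle."""

    def __init__(self, operation: str, database: str, collection: Optional[str] = None) -> None:
        # Attributes already known when the span starts.
        self.attributes: Dict[str, AttributeValue] = {
            "db.system": "mongodb",
            "db.namespace": database,
            "db.operation.name": operation,
        }
        if collection is not None:
            self.attributes["db.collection.name"] = collection
            self.attributes["db.operation.summary"] = f"{operation} {database}.{collection}"
        else:
            self.attributes["db.operation.summary"] = f"{database}.{operation}"

    def set_cursor_id(self, cursor_id: int) -> None:
        # Only known after the server replies, so it is added late.
        self.attributes["db.mongodb.cursor_id"] = cursor_id
```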
| 130 | + |
| 131 | +##### Exceptions |
| 132 | + |
| 133 | +If the driver operation fails with an exception, drivers MUST record the exception on the current operation span. When |
| 134 | +recording an exception, drivers SHOULD add the following attributes to the span whenever the corresponding value is |
| 135 | +available: |
| 136 | + |
| 137 | +- `exception.message` |
| 138 | +- `exception.type` |
| 139 | +- `exception.stacktrace` |
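For a Python driver, the three attributes could be derived from the exception object as sketched below. The helper name is hypothetical, and using the class's `__qualname__` for `exception.type` is an assumption (a driver might prefer a fully qualified name). With the OpenTelemetry API, `Span.record_exception` would normally do this work.

```python
import traceback
from typing import Dict


def exception_attributes(exc: BaseException) -> Dict[str, str]:
    """Collect exception.* attributes, skipping any whose content is unavailable."""
    attrs: Dict[str, str] = {"exception.type": type(exc).__qualname__}
    message = str(exc)
    if message:
        attrs["exception.message"] = message
    if exc.__traceback__ is not None:
        # Only exceptions that were actually raised carry a traceback.
        attrs["exception.stacktrace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return attrs
```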
| 140 | + |
| 141 | +#### Instrumenting Server Commands |
| 142 | + |
| 143 | +Drivers MUST create a span for every server command sent to the server as a result of a public API call, except for |
| 144 | +sensitive commands, as listed in the Command Logging and Monitoring specification. |
| 145 | + |
| 146 | +Spans for commands MUST be nested under the span for the corresponding driver operation. If the command is |
| 147 | +retried, the driver MUST create a separate span for each retry. |
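The sketch below skips sensitive commands and otherwise creates one command span per attempt, each nested under the operation span. The `SENSITIVE_COMMANDS` set is an assumption based on the Command Logging and Monitoring specification and should be checked against that document; the case-insensitive comparison, `ToySpan`, and the helper names are hypothetical choices for illustration, and span naming is simplified to the bare command name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Assumption: mirrors the sensitive commands named in the Command Logging and
# Monitoring specification (compared case-insensitively here).
SENSITIVE_COMMANDS = {
    "authenticate", "saslstart", "saslcontinue", "getnonce", "createuser",
    "updateuser", "copydbgetnonce", "copydbsaslstart", "copydb",
}


def is_sensitive(command: Dict[str, Any]) -> bool:
    name = next(iter(command)).lower()  # the first key is the command name
    if name in SENSITIVE_COMMANDS:
        return True
    # hello and legacy hello count as sensitive only with speculative authentication.
    return name in {"hello", "ismaster"} and "speculativeAuthenticate" in command


@dataclass
class ToySpan:
    """Hypothetical stand-in for an OpenTelemetry span."""

    name: str
    parent: Optional["ToySpan"] = None


def command_spans_for_attempts(op_span: ToySpan, command: Dict[str, Any], attempts: int) -> List[ToySpan]:
    """Return one nested command span per attempt, or none for sensitive commands."""
    if is_sensitive(command):
        return []
    name = next(iter(command))
    return [ToySpan(name=name, parent=op_span) for _ in range(attempts)]
```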
| 148 | + |
| 149 | +##### Span Name |
| 150 | + |
| 151 | +The span name SHOULD be: |
| 152 | + |
| 153 | +- `server_command db.collection_name` if the command is executed on a collection (e.g., |
| 154 | + `findAndModify warehouse.users`). |
| 155 | +- `db.server_command` if there is no specific collection for the command. |
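A sketch of deriving the command span name from the command document, whose first key is the command name. Treating a string value of that first key as the collection name is a heuristic assumed here: not every string-valued command names a collection (for example, `renameCollection` takes a full namespace). `getMore`, whose first value is a cursor ID, reads the collection from its `collection` field. The function names are hypothetical.

```python
from typing import Any, Mapping, Optional


def command_collection(command: Mapping[str, Any]) -> Optional[str]:
    """Heuristic: a string value of the first key is taken as the collection name."""
    name = next(iter(command))  # the first key of a command document is the command name
    if name == "getMore":
        # getMore's first value is a cursor ID; the collection is a separate field.
        value = command.get("collection")
    else:
        value = command[name]
    return value if isinstance(value, str) else None


def command_span_name(command: Mapping[str, Any], database: str) -> str:
    name = next(iter(command))
    collection = command_collection(command)
    if collection is not None:
        return f"{name} {database}.{collection}"
    return f"{database}.{name}"
```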
| 156 | + |
| 157 | +##### Span Kind |
| 158 | + |
| 159 | +Span kind MUST be "client". |
| 160 | + |
| 161 | +##### Span Attributes |
| 162 | + |
| 163 | +Spans SHOULD have the following attributes: |
| 164 | + |
| 165 | +| Attribute | Type | Description | Requirement Level | |
| 166 | +| :-------------------------------- | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------------------------- | |
| 167 | +| `db.system` | `string` | MUST be 'mongodb' | Required | |
| 168 | +| `db.namespace` | `string` | The database name | Required if available | |
| 169 | +| `db.collection.name` | `string` | The collection being accessed within the database stated in `db.namespace` | Required if available | |
| 170 | +| `db.command.name` | `string` | The name of the server command being executed | Required | |
| 171 | +| `db.response.status_code` (\*) | `string` | MongoDB error code represented as a string. This attribute should be added only if an error happens. | Required if an error happens | |
| 172 | +| `error.type` (\*) | `string` | Describes a class of error the operation ended with. This attribute should be added only if an error happens. Examples: `timeout`, `java.net.UnknownHostException`, `server_certificate_invalid`, `500`. | Required if an error happens | |
| 173 | +| `server.port` | `int64` | Server port number | Required | |
| 174 | +| `server.address` | `string` | Name of the database host, or IP address if name is not known | Required | |
| 175 | +| `network.transport` | `string` | MUST be 'tcp' or 'unix' depending on the protocol | Required | |
| 176 | +| `db.query.summary` | `string` | Equivalent to span name | Required | |
| 177 | +| `db.mongodb.server_connection_id` | `int64` | Server connection id | Required if available | |
| 178 | +| `db.mongodb.driver_connection_id` | `int64` | Local connection id | Required if available | |
| 179 | +| `db.query.text` (\*\*) | `string` | Database command that was sent to the server. Content should be equivalent to the `command` field of the `CommandStartedEvent` defined in the command monitoring specification. | Conditional | |
| 180 | +| `db.mongodb.cursor_id` (\*\*\*) | `int64` | The cursor ID, if a cursor is created or used by the command | Required if available | |
| 181 | + |
| 182 | +(\*) The `db.response.status_code` and `error.type` attributes should be added only if the command was not successful. |
| 183 | +The content of `error.type` is language-specific; each driver decides what best describes the error. |
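A sketch of assembling the command span attributes, with the error attributes added only on failure. The function name and parameters are hypothetical, a plain dictionary stands in for span attributes, and the example `error.type` string in the usage is arbitrary, since its content is left to each driver.

```python
from typing import Dict, Optional, Union

AttributeValue = Union[str, int]


def command_span_attributes(
    command_name: str,
    database: str,
    collection: Optional[str],
    server_address: str,
    server_port: int,
    transport: str = "tcp",
    server_connection_id: Optional[int] = None,
    driver_connection_id: Optional[int] = None,
    error_code: Optional[int] = None,
    error_type: Optional[str] = None,
) -> Dict[str, AttributeValue]:
    """Build command span attributes (hypothetical helper; plain dict, not the OTel API)."""
    if transport not in ("tcp", "unix"):
        raise ValueError("network.transport must be 'tcp' or 'unix'")
    attrs: Dict[str, AttributeValue] = {
        "db.system": "mongodb",
        "db.namespace": database,
        "db.command.name": command_name,
        "server.address": server_address,
        "server.port": server_port,
        "network.transport": transport,
    }
    if collection is not None:
        attrs["db.collection.name"] = collection
        attrs["db.query.summary"] = f"{command_name} {database}.{collection}"
    else:
        attrs["db.query.summary"] = f"{database}.{command_name}"
    # "Required if available": omit connection IDs the driver does not know.
    if server_connection_id is not None:
        attrs["db.mongodb.server_connection_id"] = server_connection_id
    if driver_connection_id is not None:
        attrs["db.mongodb.driver_connection_id"] = driver_connection_id
    # Error attributes are added only when the command failed.
    if error_code is not None:
        attrs["db.response.status_code"] = str(error_code)  # a string, per the table
    if error_type is not None:
        attrs["error.type"] = error_type
    return attrs
```

For example, a failed insert that returned MongoDB error code 11000 would carry `db.response.status_code` set to the string `"11000"`.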
| 184 | + |
| 185 | +(\*\*) `db.query.text` contains the full database command that was executed, serialized to Extended JSON. Drivers MUST |
| 186 | +NOT add this attribute by default, and MUST provide a toggle to enable it. This toggle can be implemented with an |
| 187 | +environment variable `OTEL_#{LANG}_INSTRUMENTATION_MONGODB_QUERY_TEXT_MAX_LENGTH` set to a positive integer value; |
| 188 | +when it is set, the attribute is added and truncated to that length (similar to the Logging specification). |
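A minimal sketch of the opt-in and truncation, assuming a Python driver. The variable name fills `#{LANG}` with `PYTHON`; `json.dumps` stands in for a real Extended JSON serializer (such as PyMongo's `bson.json_util`), and appending `...` after truncation is an assumption meant to mirror the Logging specification.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical name: the OTEL_#{LANG}_... pattern from this spec with LANG = PYTHON.
MAX_LENGTH_ENV_VAR = "OTEL_PYTHON_INSTRUMENTATION_MONGODB_QUERY_TEXT_MAX_LENGTH"


def query_text_attribute(
    command: Dict[str, Any], environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the db.query.text value, or None when the attribute is disabled."""
    raw = environ.get(MAX_LENGTH_ENV_VAR)
    try:
        max_length = int(raw) if raw is not None else 0
    except ValueError:
        max_length = 0
    if max_length <= 0:
        return None  # off by default; only a positive integer enables it
    # json.dumps is a stand-in for a real Extended JSON serializer.
    text = json.dumps(command, default=str)
    if len(text) > max_length:
        text = text[:max_length] + "..."  # assumed to mirror the Logging spec's ellipsis
    return text
```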
| 189 | + |
| 190 | +(\*\*\*) If the command returns or uses a cursor, the `db.mongodb.cursor_id` attribute SHOULD be added. |
| 191 | + |
| 192 | +##### Exception Handling |
| 193 | + |
| 194 | +If an exception was thrown, it MUST be recorded in accordance with the OpenTelemetry specification for exceptions. |
| 195 | + |
| 196 | +## Motivation for Change |
| 197 | + |
| 198 | +A common complaint from our support team is that they don't know how to easily get debugging information from drivers. |
| 199 | +Some drivers provide debug logging, but others do not. For drivers that do provide it, the log messages produced and the |
| 200 | +mechanisms for enabling debug logging are inconsistent. |
| 201 | + |
| 202 | +Although users can implement their own debug logging support via existing driver events (SDAM, APM, etc.), this requires |
| 203 | +code changes. It is often difficult to quickly implement and deploy such changes in production at the time they are |
| 204 | +needed, and to remove the changes afterward. Additionally, there are useful scenarios to log that do not correspond to |
| 205 | +existing events. Standardizing on debug log messages that drivers produce and how to enable/configure logging will |
| 206 | +provide TSEs, CEs, and MongoDB users an easier way to get debugging information out of our drivers, facilitate support |
| 207 | +of drivers for our internal teams, and improve our documentation around troubleshooting. |
| 208 | + |
| 209 | +## Test Plan |
| 210 | + |
| 211 | +TODO |
| 212 | + |
| 213 | +## Backwards Compatibility |
| 214 | + |
| 215 | +Introduction of OpenTelemetry in new driver versions should not significantly affect existing applications that do not |
| 216 | +enable OpenTelemetry. However, since the no-op tracing operation may introduce some performance degradation (though it |
| 217 | +should be negligible), customers should be informed of this feature and how to disable it completely. |
| 218 | + |
| 219 | +If a driver is used in an application that has OpenTelemetry enabled, customers will see traces from the driver in their |
| 220 | +OpenTelemetry backends. This may be unexpected and may cause negative effects in some cases (e.g., the OpenTelemetry |
| 221 | +backend may not have enough capacity to process new traces). Customers should be informed of this feature and how to |
| 222 | +disable it completely. |
| 223 | + |
| 224 | +## Security Implication |
| 225 | + |
| 226 | +Drivers MUST take care to avoid exposing sensitive information (e.g., authentication credentials) in traces. |