Description
For simple database queries (such as SELECT * from foo where bar="baz"), we'd like to capture the following attributes on spans when possible:
- db.operation.name = SELECT
- db.collection.name = foo (aka db.sql.table and other similar system-specific attributes)
- db.collection.namespace = mydb (aka db.name, along with db.instance.id and similar)
- db.query.text = SELECT * from foo where bar=? (aka db.statement)
- db.query.parameter.bar = baz
(attribute names are being discussed and are not final).
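A minimal sketch of how an instrumentation might assemble these span attributes for the simple case. The helpers `templatize` and `simple_query_attributes` are hypothetical, the regex-based literal replacement is a naive assumption (real instrumentations need a proper SQL tokenizer), and the attribute names are the non-final ones listed above. With OpenTelemetry Python, the resulting dict could be passed to `span.set_attributes(...)`.

```python
import re

# Naive literal matcher (assumption): single- or double-quoted strings and
# bare numbers. A real instrumentation would use a proper SQL tokenizer.
_LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"|\b\d+(?:\.\d+)?\b")


def templatize(query: str) -> str:
    """Replace literal values in the query with '?' placeholders."""
    return _LITERAL.sub("?", query)


def simple_query_attributes(query: str, operation: str, collection: str,
                            namespace: str, params: dict) -> dict:
    """Span attributes for a single-operation, single-table query.

    Attribute names follow the proposal in this issue and are not final.
    """
    attrs = {
        "db.operation.name": operation,
        "db.collection.name": collection,
        "db.collection.namespace": namespace,
        "db.query.text": templatize(query),
    }
    # One attribute per captured parameter, e.g. db.query.parameter.bar
    for name, value in params.items():
        attrs[f"db.query.parameter.{name}"] = value
    return attrs


attrs = simple_query_attributes('SELECT * from foo where bar="baz"',
                                "SELECT", "foo", "mydb", {"bar": "baz"})
print(attrs["db.query.text"])  # SELECT * from foo where bar=?
```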
This simple case is supported by the current version of the DB semantic conventions. It's also a common one in the NoSQL world if we exclude bulk operations (or non-homogeneous batch operations).
Attributes other than the db.query.* ones have reasonable cardinality and can be used on both traces and metrics.
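One way to apply this split, sketched under the assumption that metric attributes are derived from the span attributes by a simple prefix filter; `metric_attributes` is a hypothetical helper, not part of any OpenTelemetry API.

```python
def metric_attributes(span_attributes: dict) -> dict:
    """Drop high-cardinality db.query.* attributes before recording a metric."""
    return {key: value for key, value in span_attributes.items()
            if not key.startswith("db.query.")}


span_attrs = {
    "db.operation.name": "SELECT",
    "db.collection.name": "foo",
    "db.collection.namespace": "mydb",
    "db.query.text": "SELECT * from foo where bar=?",
    "db.query.parameter.bar": "baz",
}
print(metric_attributes(span_attrs))
# {'db.operation.name': 'SELECT', 'db.collection.name': 'foo', 'db.collection.namespace': 'mydb'}
```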
More complicated queries involve multiple operations, tables, or even databases. E.g. in SELECT * from foo JOIN bar ON baz we have two operations (SELECT and JOIN), two tables (foo and bar), and just one database.
- db.query.text and db.query.parameter.* are still relevant for such queries and make sense on spans (though cardinality is still a problem for metrics)
- Operation and collection names become problematic to record
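To illustrate why a single name stops being enough, here is a deliberately naive keyword scanner. The helper `extract_names`, the keyword set, and the table regex are all assumptions for illustration only (it would, for example, miscount keywords inside string literals); it shows that the JOIN example yields two operations and two tables that a single-valued db.operation.name or db.collection.name cannot hold.

```python
import re

# Hypothetical keyword list; real SQL needs a real parser.
OPERATION_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE", "JOIN"}
_TABLE = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)",
                    re.IGNORECASE)


def extract_names(query: str) -> tuple[list[str], list[str]]:
    """Return (operations, tables) in order of first appearance."""
    words = re.findall(r"[A-Za-z_]+", query)
    operations = [w.upper() for w in words if w.upper() in OPERATION_KEYWORDS]
    tables = _TABLE.findall(query)
    # dict.fromkeys de-duplicates while preserving order
    return list(dict.fromkeys(operations)), list(dict.fromkeys(tables))


print(extract_names("SELECT * from foo JOIN bar ON baz"))
# (['SELECT', 'JOIN'], ['foo', 'bar'])
```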
DB WG is considering multiple options:
Option 1: always capture db.operation.names, db.collection.names, db.collection.namespaces as arrays
Pros: consistent understandable model
Cons:
- array attribute on metrics
- simple case (one operation) becomes hard to use
- hard to query and not quite useful on spans (if we already capture templatized query text)
- the combined cardinality of operation names, collection names, and database names is the same as that of db.query.text
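A sketch of Option 1, assuming operations, collections, and namespaces have already been extracted from the query; `option1_attributes` is a hypothetical helper. The usage line shows the "simple case becomes hard to use" con: a single SELECT is now reported as a one-element array.

```python
from typing import Sequence


def option1_attributes(operations: Sequence[str], collections: Sequence[str],
                       namespaces: Sequence[str]) -> dict:
    """Option 1: always record array attributes, even for a single value."""
    return {
        "db.operation.names": list(operations),
        "db.collection.names": list(collections),
        "db.collection.namespaces": list(namespaces),
    }


print(option1_attributes(["SELECT"], ["foo"], ["mydb"]))
# {'db.operation.names': ['SELECT'], 'db.collection.names': ['foo'], 'db.collection.namespaces': ['mydb']}
```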
Option 2: capture both db.operation.name and db.operation.names (same for collections and namespaces).
The array attributes are only captured when more than one operation is performed.
In this case we can entertain different options for db.operation.name: it may contain the first operation or the operations joined into a single string, or it may not be reported at all.
Pros:
- simple case is easy
- complex case is slightly better (e.g. when db.operation.name contains a joined list such as SELECT JOIN)
Cons:
- array attribute on metrics
- hard to query and not quite useful on spans (if we already capture templatized query text)
- the combined cardinality of operation names, collection names, and database names is the same as that of db.query.text
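A sketch of Option 2 that covers the three variants for the scalar attribute discussed above ("first", "joined", "omit"). The helper `option2_attributes`, the `mode` parameter, and the choice of a space as the join separator are assumptions for illustration; inputs are assumed to be already extracted.

```python
from typing import Literal, Sequence

ScalarMode = Literal["first", "joined", "omit"]


def option2_attributes(operations: Sequence[str], collections: Sequence[str],
                       namespaces: Sequence[str],
                       mode: ScalarMode = "joined") -> dict:
    """Option 2: scalar for one value; array (plus optional scalar) for several."""
    attrs: dict = {}

    def add(scalar_key: str, array_key: str, values: Sequence[str]) -> None:
        values = list(values)
        if not values:
            return
        if len(values) == 1:
            attrs[scalar_key] = values[0]
            return
        # The array attribute is only captured when there is more than one value.
        attrs[array_key] = values
        if mode == "first":
            attrs[scalar_key] = values[0]
        elif mode == "joined":
            attrs[scalar_key] = " ".join(values)
        # mode == "omit": only the array attribute is recorded

    add("db.operation.name", "db.operation.names", operations)
    add("db.collection.name", "db.collection.names", collections)
    add("db.collection.namespace", "db.collection.namespaces", namespaces)
    return attrs


print(option2_attributes(["SELECT", "JOIN"], ["foo", "bar"], ["mydb"]))
```

The simple case stays flat (`{"db.operation.name": "SELECT", ...}`), while the JOIN example gets both `db.operation.names = ["SELECT", "JOIN"]` and, in "joined" mode, `db.operation.name = "SELECT JOIN"`.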
Option 3: don't capture multiple operation names, collection names, namespaces
Pros: simple case is easy
Cons: nothing distinguishes different operations in the complex case
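A sketch of Option 3, assuming single-valued attributes are recorded only when there is exactly one value; `option3_attributes` is hypothetical. It demonstrates the con: two unrelated complex queries end up with identical (empty) operation and collection attributes.

```python
from typing import Sequence


def option3_attributes(operations: Sequence[str],
                       collections: Sequence[str]) -> dict:
    """Option 3: record single-valued attributes only when unambiguous."""
    attrs = {}
    if len(operations) == 1:
        attrs["db.operation.name"] = operations[0]
    if len(collections) == 1:
        attrs["db.collection.name"] = collections[0]
    return attrs


# Two different complex queries end up with identical (empty) attributes.
join_query = option3_attributes(["SELECT", "JOIN"], ["foo", "bar"])
insert_select = option3_attributes(["INSERT", "SELECT"], ["baz", "foo"])
print(join_query == insert_select)  # True
```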
There could be other options, including opting into collecting templatized query strings on metrics, but none of them is perfect.
Still, we'd like to provide a default experience which could be improved by users providing query nicknames (see #521 for the context).
Additional context
- a similar problem exists for bulk operations: DB Convention does not cover batch/multi/envelope operations #712
- in the case of bulk operations it gets even worse, since we'd need two-dimensional arrays
- if the protocol does not support array attributes, they "SHOULD be represented as JSON-encoded strings", i.e. {[\"SELECT\", \"JOIN\"]}, making them even harder to use. A space- (or comma-) delimited list would be nicer ([SELECT, JOIN])
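A small comparison of the encodings mentioned above, using Python's standard `json` module for the JSON form. The helper `encode_array` and its `style` names are hypothetical. Note that a delimited list is ambiguous if a value itself contains the delimiter, and only the JSON form can represent the two-dimensional arrays bulk operations would need.

```python
import json
from typing import Sequence


def encode_array(values: Sequence[str], style: str = "json") -> str:
    """Encode an array attribute as a string for protocols without array support."""
    if style == "json":
        return json.dumps(list(values))
    if style == "space":
        return " ".join(values)
    if style == "comma":
        return ", ".join(values)
    raise ValueError(f"unknown style: {style}")


for style in ("json", "space", "comma"):
    print(style, encode_array(["SELECT", "JOIN"], style))
# json ["SELECT", "JOIN"]
# space SELECT JOIN
# comma SELECT, JOIN
```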