Skip to content

How to record multi-operation/table/dbs operations on DB metrics #805

@lmolkova

Description

@lmolkova

For a simple database queries (such as SELECT * from foo where bar="baz"), we'd like to capture the following attributes when possible (on spans):

  • db.operation.name = SELECT
  • db.collection.name = foo (aka db.sql.table and other similar system-specific attributes)
  • db.collection.namespace = mydb (aka db.name along with db.instance.id and similar)
  • db.query.text = SELECT * from foo where bar=? (aka db.statement)
  • db.query.parameter.bar=baz

(attribute names are being discussed and are not final).

This simple case is supported by current version of DB semantic conventions. It's also a common one in the NoSQL world if we exclude bulk operations (or non-homogeneous batch operations).
Attributes (except db.query.* ones) have reasonable cardinality and can be used on traces and metrics.


More complicated queries involve multiple operations, tables, or even databases. E.g. in SELECT * from foo JOIN bar ON baz
we have two operations (SELECT and JOIN), two tables (foo and ``bar`), and just one database.

  • db.query.text and db.query.parameter.* for such queries are still relevant and make sense on spans (still cardinality is a problem for metrics)
  • Operation and collection names become problematic to record

DB WG is considering multiple options:


Option 1: always capture db.operation.names, db.collection.names, db.collection.namespaces as arrays

Pros: consistent understandable model

Cons:

  • array attribute on metrics
  • simple case (one operation) becomes hard to use
  • hard to query and not quite useful on spans (if we already capture templatized query text)
  • the combined cardinality of operation name, collection names, database names is the same as in the db.query.text

Option 2: capture both db.operation.name and db.operation.names (same for collections and namespaces).

The array attributes are only captured when more than one operation is performed.
In this case we can entertain different options for db.operation.name - it may contain the first operation, operations joined as string, or shouldn't be reported at all.

Pros:

  • simple case is easy
  • complex case is slightly better (e.g. when db.operation.name contains joined list SELECT JOIN)

Cons:

  • array attribute on metrics
  • hard to query and not quite useful on spans (if we already capture templatized query text)
  • the combined cardinality of operation name, collection names, database names is the same as in the db.query.text

Option 3: don't capture multiple operation names, collection names, namespaces

Pros: simple case is easy
Cons: nothing distinguishes different operations in the complex case

There could be other options including opting into collecting templatized query string on metrics, but none of those is perfect.
Still, we'd like to provide a default experience which could be improved with users providing query nick-names (see #521 for the context).


Additional context

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions