This file documents detailed technical changes, internal refactorings, and development notes. For user-facing highlights, see CHANGELOG.md.
- Implemented `PipesPlugin` to monitor Snowpipe status and validation
  - Uses the `SYSTEM$PIPE_STATUS` function for real-time pipe monitoring
  - Uses the `VALIDATE_PIPE_LOAD` function for validation checks
  - Delivers telemetry as logs, metrics, and events
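`SYSTEM$PIPE_STATUS` returns a JSON payload; a minimal sketch of turning such a payload into a health signal (the field names `executionState` and `pendingFileCount` are real Snowflake output fields, but the helper name and the health criterion are illustrative, not the plugin's actual code):

```python
import json

def summarize_pipe_status(raw: str) -> dict:
    """Reduce a SYSTEM$PIPE_STATUS JSON payload to a small health summary.

    `raw` is the JSON string returned by
    SELECT SYSTEM$PIPE_STATUS('<db>.<schema>.<pipe>').
    """
    status = json.loads(raw)
    return {
        "healthy": status.get("executionState") == "RUNNING",
        "pending_files": int(status.get("pendingFileCount", 0)),
    }

# Example payload shaped like Snowflake's output:
sample = '{"executionState": "RUNNING", "pendingFileCount": 3}'
print(summarize_pipe_status(sample))
```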
- Implemented `StreamsPlugin` to monitor Snowflake Streams
  - Tracks stream staleness using `SHOW STREAMS` output
  - Monitors pending changes and stream health
  - Reports stale streams as warning events
- Implemented `StagePlugin` to monitor staged data
  - Tracks internal and external stages
  - Monitors `COPY INTO` activities from the `QUERY_HISTORY` and `COPY_HISTORY` views
  - Reports on staged file sizes, counts, and load patterns
- Implemented `DataLineagePlugin` combining static and dynamic lineage
  - Static lineage from the `OBJECT_DEPENDENCIES` view (DDL-based relationships)
  - Dynamic lineage from the `ACCESS_HISTORY` view (runtime data flow)
  - Column-level lineage tracking with direct and indirect dependencies
  - Lineage graphs delivered as structured events
- Issue: When a customer account had `EVENT_TABLE = snowflake.telemetry.events` (the Snowflake-managed shared event table), `SETUP_EVENT_TABLE()` listed it in `a_no_custom_event_t` — the "not a real custom table" array — and took the `IF` branch, creating DSOA's own `DTAGENT_DB.STATUS.EVENT_LOG` table and ignoring the Snowflake-managed table entirely.
- Root cause: `'snowflake.telemetry.events'` was excluded from the view-creation path because the original `ELSE` branch attempted `GRANT SELECT ON TABLE snowflake.telemetry.events TO ROLE DTAGENT_VIEWER`, which Snowflake rejects — privileges cannot be granted on Snowflake-managed objects.
- Fix: Two-part change in `src/dtagent/plugins/event_log.sql/init/009_event_log_init.sql`:
  - Removed `'snowflake.telemetry.events'` from `a_no_custom_event_t` so it falls through to the `ELSE` branch
  - Wrapped the `GRANT SELECT` in a `BEGIN / EXCEPTION WHEN OTHER THEN SYSTEM$LOG_WARN()` block — attempts the grant and logs warnings, ignoring failures for any read-only or Snowflake-managed table; more robust than a string comparison
- Behaviour after fix: When `EVENT_TABLE = snowflake.telemetry.events`, DSOA creates `DTAGENT_DB.STATUS.EVENT_LOG` as a view over it, exactly as for any other pre-existing customer event table. All three `event_log` SQL views continue to query `DTAGENT_DB.STATUS.EVENT_LOG` unchanged — no Python changes needed.
- Motivation: Lookback windows were hardcoded across SQL views in every plugin that uses `F_LAST_PROCESSED_TS`. This could not be tuned per deployment without modifying SQL files.
- Approach: Replace each literal with `CONFIG.F_GET_CONFIG_VALUE('plugins.<plugin>.lookback_hours', <default>)` and add `lookback_hours` to each plugin's config YAML — consistent with how `retention_hours` is already handled in `P_CLEANUP_EVENT_LOG`.
- Pattern: `timeadd(hour, -1*F_GET_CONFIG_VALUE('plugins.<plugin>.lookback_hours', <N>), current_timestamp)` — the `-1*` multiplier converts the positive config value to a negative offset.
- Note: The `F_LAST_PROCESSED_TS` guard in each view's `GREATEST(...)` clause ensures normal incremental runs are unaffected; `lookback_hours` only bounds the fallback window when no prior timestamp exists.
- Files changed (SQL views + config YAMLs):
| Plugin | SQL view(s) | Default |
|---|---|---|
| `event_log` | `051_v_event_log.sql`, `051_v_event_log_metrics_instrumented.sql`, `051_v_event_log_spans_instrumented.sql` | 24h |
| `login_history` | `061_v_login_history.sql`, `061_v_sessions.sql` | 24h |
| `warehouse_usage` | `070_v_warehouse_event_history.sql`, `071_v_warehouse_load_history.sql`, `072_v_warehouse_metering_history.sql` | 24h |
| `tasks` | `061_v_serverless_tasks.sql` → `lookback_hours` (4h); `063_v_task_versions.sql` → `lookback_hours_versions` (720h = 1 month) | separate keys, original defaults preserved |
| `event_usage` | `051_v_event_usage.sql` | 6h |
| `data_schemas` | `051_v_data_schemas.sql` | 4h |
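The interplay between `F_LAST_PROCESSED_TS` and the configurable fallback window can be sketched in Python (names and structure are illustrative; the real logic lives in each view's `GREATEST(...)` clause):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def effective_lower_bound(last_processed: Optional[datetime],
                          lookback_hours: int,
                          now: Optional[datetime] = None) -> datetime:
    """Python analogue of the SQL pattern:
    GREATEST(F_LAST_PROCESSED_TS(...),
             timeadd(hour, -1 * F_GET_CONFIG_VALUE(...), current_timestamp))
    """
    now = now or datetime.now(timezone.utc)
    fallback = now - timedelta(hours=lookback_hours)  # -1 * lookback as a negative offset
    if last_processed is None:
        return fallback  # first run: bounded only by lookback_hours
    return max(last_processed, fallback)  # recent incremental runs are unaffected

now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
recent = datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc)
print(effective_lower_bound(recent, 24, now))  # the recent timestamp wins
print(effective_lower_bound(None, 24, now))    # falls back to now - 24h
```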
- Issue: `P_GRANT_MONITOR_DYNAMIC_TABLES()` always granted `MONITOR` at database level, even when the `include` pattern specified a particular schema (e.g. `PROD_DB.ANALYTICS.%`). This caused the procedure to over-grant: a user expecting grants only on `PROD_DB.ANALYTICS` received grants on all schemas in `PROD_DB`.
- Root cause: The CTE extracted only `split_part(value, '.', 0)` (the database part) and the schema part was never inspected.
- Fix: Three-pass approach in `032_p_grant_monitor_dynamic_tables.sql`:
  - Database pass — schema part `split_part(value, '.', 2) = '%'` → `GRANT … IN DATABASE`.
  - Schema pass — `split_part(value, '.', 2) != '%'` and `split_part(value, '.', 3) = '%'` → `GRANT … IN SCHEMA db.schema`.
  - Table pass — `split_part(value, '.', 2) != '%'` and `split_part(value, '.', 3) != '%'` → `GRANT … ON DYNAMIC TABLE db.schema.table` (no `FUTURE` grant — not supported by Snowflake at individual table level).
- Grant matrix:

  | Include pattern | Grant level |
  |---|---|
  | `%.%.%` | All databases |
  | `PROD_DB.%.%` | Database `PROD_DB` |
  | `PROD_DB.ANALYTICS.%` | Schema `PROD_DB.ANALYTICS` |
  | `PROD_DB.ANALYTICS.ORDERS_DT` | Table `PROD_DB.ANALYTICS.ORDERS_DT` |
- Files changed: `032_p_grant_monitor_dynamic_tables.sql`, `bom.yml`, `config.md`
- Tests added: `test/bash/test_grant_monitor_dynamic_tables.bats` — structural content checks covering both grant paths
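The three-pass classification above can be sketched as a pure-Python classifier (illustrative only; the real implementation is SQL in `032_p_grant_monitor_dynamic_tables.sql`):

```python
def grant_level(include_pattern: str) -> str:
    """Classify a db.schema.table include pattern into the grant level used."""
    db, schema, table = include_pattern.split(".")
    if schema == "%":
        # schema wildcard -> database-level grant (or every database for %.%.%)
        return "ALL DATABASES" if db == "%" else f"DATABASE {db}"
    if table == "%":
        # concrete schema, table wildcard -> schema-level grant
        return f"SCHEMA {db}.{schema}"
    # fully qualified -> grant on the individual dynamic table
    return f"DYNAMIC TABLE {db}.{schema}.{table}"

for p in ["%.%.%", "PROD_DB.%.%", "PROD_DB.ANALYTICS.%", "PROD_DB.ANALYTICS.ORDERS_DT"]:
    print(p, "->", grant_level(p))
```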
- Issue: OTel log `observed_timestamp` field was sent in milliseconds
- Root cause: The OTLP spec requires nanoseconds for `observed_timestamp`, but the code was converting to milliseconds
- Fix: Modified `process_timestamps_for_telemetry()` to return `observed_timestamp_ns` in nanoseconds
- Impact: Logs now comply with the OTLP spec
- Note: The Dynatrace OTLP Logs API still requires milliseconds for the `timestamp` field (a deviation from the spec)
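The two units differ by a factor of 10^6, which makes mixed-unit bugs easy to detect by magnitude. A defensive normalizer sometimes used for this (an illustrative heuristic, not the project's actual code; thresholds assume post-2001 dates so the unit ranges never overlap):

```python
def to_nanoseconds(value: int) -> int:
    """Normalize an epoch timestamp of unknown unit to nanoseconds."""
    if value < 10**12:       # epoch seconds (e.g. 1_700_000_000)
        return value * 10**9
    if value < 10**15:       # epoch milliseconds
        return value * 10**6
    if value < 10**18:       # epoch microseconds
        return value * 10**3
    return value             # already nanoseconds

print(to_nanoseconds(1_700_000_000))      # seconds in -> nanoseconds out
print(to_nanoseconds(1_700_000_000_000))  # milliseconds in -> same instant
```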
- Issue: `HAS_DB_DELETED` flag incorrectly reported for deleted shared databases in the `TMP_SHARES` view
- Root cause: Logic error in the SQL view predicate
- Fix: Corrected SQL logic in the `shares.sql` view definition
- Impact: Accurate reporting of deleted shared database status
- Issue: Database name filtering logic failed to correctly identify `DTAGENT_DB` references
- Root cause: String matching logic didn't account for fully qualified names
- Fix: Updated filtering logic in the self-monitoring plugin
- Impact: Self-monitoring logs now correctly exclude internal agent operations
- Motivation: Eliminate wasteful ns→ms→ns conversions and clarify API requirements
- Approach: Unified timestamp handling with smart unit detection
- Implementation:
  - All SQL views produce nanoseconds via `extract(epoch_nanosecond ...)`
  - Conversion to the appropriate unit occurs only at the API boundary
  - `validate_timestamp()` works internally in nanoseconds to preserve precision
  - Added `return_unit` parameter ("ms" or "ns") for explicit output control
  - Added `skip_range_validation` parameter for `observed_timestamp` (no time range check)
  - Created `process_timestamps_for_telemetry()` utility for the standard timestamp processing pattern
- Changes to `validate_timestamp()`:
  - Works internally in nanoseconds throughout validation logic
  - Converts to the requested unit only at the end
  - Raises `ValueError` if `return_unit` not in `["ms", "ns"]`
  - Added `skip_range_validation` for `observed_timestamp` (preserves the original value without range checks)
- Changes to `process_timestamps_for_telemetry()`:
  - New utility function implementing the standard pattern for logs and events
  - Extracts `timestamp` and `observed_timestamp` from the data dict
  - Falls back to the `timestamp` value when `observed_timestamp` is not provided
  - Validates `timestamp` with range checking (returns milliseconds)
  - Validates `observed_timestamp` without range checking (returns nanoseconds)
  - Returns a `(timestamp_ms, observed_timestamp_ns)` tuple
  - Hardcoded units: always milliseconds for `timestamp`, nanoseconds for `observed_timestamp`
- Removed obsolete functions:
  - `get_timestamp_in_ms()` — replaced by `validate_timestamp(value, return_unit="ms")`
  - `validate_timestamp_ms()` — replaced by `validate_timestamp(value, return_unit="ms")`
- Added new functions:
  - `get_timestamp()` — returns nanoseconds from SQL query results
- API Documentation:
  - Added comprehensive documentation links in all telemetry classes
  - Documented Dynatrace OTLP Logs API deviation (milliseconds for the `timestamp` field)
  - Documented OTLP standard requirements (nanoseconds for most timestamp fields)
- Fallback Logic:
  - `observed_timestamp` now correctly falls back to the `timestamp` value when not provided
  - Only the `event_log` plugin provides explicit `observed_timestamp` values
  - All other plugins rely on the fallback mechanism
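A condensed sketch of the pattern described above (simplified; the real `validate_timestamp()` also performs actual range validation and richer error handling):

```python
def validate_timestamp(value_ns: int, return_unit: str = "ns",
                       skip_range_validation: bool = False) -> int:
    """Work internally in nanoseconds; convert only on return."""
    if return_unit not in ("ms", "ns"):
        raise ValueError(f"unsupported return_unit: {return_unit}")
    if not skip_range_validation:
        pass  # the real implementation checks the value against a plausible time range here
    return value_ns // 1_000_000 if return_unit == "ms" else value_ns

def process_timestamps_for_telemetry(data: dict) -> tuple:
    """Return (timestamp_ms, observed_timestamp_ns) from a telemetry row."""
    ts_ns = data["timestamp"]
    observed_ns = data.get("observed_timestamp", ts_ns)  # fallback to timestamp
    return (
        validate_timestamp(ts_ns, return_unit="ms"),  # range-checked, milliseconds
        validate_timestamp(observed_ns, return_unit="ns",
                           skip_range_validation=True),  # preserved, nanoseconds
    )

print(process_timestamps_for_telemetry({"timestamp": 1_700_000_000_000_000_000}))
```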
- Change: All `scripts/dev/` scripts now auto-activate `.venv/`
- Implementation: Added `source .venv/bin/activate` to script preambles
- Impact: Eliminates common "wrong Python" errors during development
- Change: Updated `.github/copilot-instructions.md` with autogenerated file documentation
- Coverage:
  - Documentation files: `docs/PLUGINS.md`, `docs/SEMANTICS.md`, `docs/APPENDIX.md`
  - Build artifacts: `build/_dtagent.py`, `build/_send_telemetry.py`, `build/_semantics.py`, `build/_version.py`, `build/_metric_semantics.txt`
- Guidance: Never edit autogenerated files manually; edit the source files and regenerate
- Change: Enhanced budget data collection using `SYSTEM$SHOW_BUDGETS_IN_ACCOUNT()`
- Previous: Manual query construction
- New: Leverages the Snowflake system function for comprehensive budget data
- Impact: More accurate and complete budget information
- Issue: `STATUS.UPDATE_PROCESSED_QUERIES` was called regardless of whether the OTLP trace flush succeeded, meaning queries could be silently lost on export failures without being retried on the next cycle.
- Root cause: `_process_span_rows` in `src/dtagent/plugins/__init__.py` called `UPDATE_PROCESSED_QUERIES` unconditionally after `flush_traces()`.
- Fix: Captured the boolean return value of `flush_traces()` into `flush_succeeded` and gated the `UPDATE_PROCESSED_QUERIES` call behind `if report_status and flush_succeeded`.
- Impact: Queries whose spans fail to export are re-queued on the next agent run, ensuring at-least-once delivery semantics for span telemetry.
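The gating pattern, in outline (simplified; the callables stand in for the real `flush_traces()` and `UPDATE_PROCESSED_QUERIES` calls):

```python
def finalize_span_batch(flush_traces, update_processed_queries,
                        report_status: bool = True) -> bool:
    """Advance the processed-queries high-water mark only if the flush succeeded."""
    flush_succeeded = flush_traces()  # True only when the OTLP export worked
    if report_status and flush_succeeded:
        update_processed_queries()    # safe: spans reached the backend
    return flush_succeeded            # on False, rows are re-queued next run

calls = []
ok = finalize_span_batch(lambda: True, lambda: calls.append("updated"))
failed = finalize_span_batch(lambda: False, lambda: calls.append("updated"))
print(ok, failed, calls)  # True False ['updated']
```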
- Issue: `V_EVENT_LOG` used a hardcoded `timeadd(hour, -24, current_timestamp)` lower bound, preventing operators from adjusting the lookback window without editing SQL.
- Fix:
  - `src/dtagent/plugins/event_log.sql/051_v_event_log.sql`: replaced the literal with `CONFIG.F_GET_CONFIG_VALUE('plugins.event_log.lookback_hours', 24)::int`.
  - `src/dtagent/plugins/event_log.config/event_log-config.yml`: added `lookback_hours: 24` (default preserves prior behaviour).
- Impact: Operators can increase the window for initial deployments or decrease it for high-volume environments without any SQL change.
- Goal: Confirm that nested stored procedure call chains are correctly represented as OTel parent-child spans.
- Validation approach:
  - `P_REFRESH_RECENT_QUERIES` sets `IS_ROOT=TRUE` for top-level calls (no `parent_query_id`) and `IS_PARENT=TRUE` for any query that has at least one child in the same batch. Leaf queries have `IS_ROOT=FALSE, IS_PARENT=FALSE`.
  - `_process_span_rows` in `src/dtagent/plugins/__init__.py` iterates only `IS_ROOT=TRUE` rows as top-level spans; child spans are fetched recursively via `Spans._get_sub_rows` using `PARENT_QUERY_ID`.
  - `ExistingIdGenerator` in `src/dtagent/otel/spans.py` propagates the root's `_TRACE_ID` and `_SPAN_ID` down the hierarchy so every sub-span shares the correct trace context.
- New test fixture: `test/test_data/query_history_nested_sp.ndjson` — a 3-row synthetic SP chain: outer SP (root) → inner SP (mid) → leaf SELECT.
- New test file: `test/plugins/test_query_history_span_hierarchy.py`
  - `test_span_hierarchy`: integration test verifying 3 entries processed, 3 spans, 3 logs, 27 metrics across all `disabled_telemetry` combinations.
  - `test_is_root_only_processes_top_level`: unit test confirming only 1 root row and 2 non-root rows in the fixture.
  - `test_is_parent_flags_intermediate_nodes`: unit test asserting correct `IS_ROOT`/`IS_PARENT`/`PARENT_QUERY_ID` values for each level of the hierarchy.
- Impact: Span hierarchies for stored procedure chains are confirmed correct and regression-protected.
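The root-first traversal described above can be approximated in a few lines (illustrative; dict keys follow the view columns, and `emit_span` stands in for the real span-creation machinery):

```python
def emit_tree(rows, emit_span, parent_query_id=None, depth=0):
    """Emit root spans first, then recurse into children via PARENT_QUERY_ID."""
    if parent_query_id is None:
        level = [r for r in rows if r["IS_ROOT"]]  # top-level spans only
    else:
        level = [r for r in rows if r.get("PARENT_QUERY_ID") == parent_query_id]
    for row in level:
        emit_span(row["QUERY_ID"], depth)
        emit_tree(rows, emit_span, row["QUERY_ID"], depth + 1)

# Shape mirrors the 3-row nested-SP fixture: outer SP -> inner SP -> leaf SELECT
rows = [
    {"QUERY_ID": "outer", "IS_ROOT": True,  "PARENT_QUERY_ID": None},
    {"QUERY_ID": "inner", "IS_ROOT": False, "PARENT_QUERY_ID": "outer"},
    {"QUERY_ID": "leaf",  "IS_ROOT": False, "PARENT_QUERY_ID": "inner"},
]
out = []
emit_tree(rows, lambda qid, d: out.append((qid, d)))
print(out)  # [('outer', 0), ('inner', 1), ('leaf', 2)]
```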
- Change: Refactored tests to use synthetic JSON fixtures
- Previous: Live Dynatrace API calls for validation
- New: Input/output validation against golden JSON files
- Impact: Faster, more reliable, deterministic tests
- Change: Expanded `event_log.config/config.md` from a minimal 5-line note to a full configuration reference
- Content added:
  - Configuration options table covering all 7 plugin settings with types, defaults, and descriptions
  - Cost optimization guidance section explaining the cost impact of `LOOKBACK_HOURS`, `MAX_ENTRIES`, `RETENTION_HOURS`, and `SCHEDULE`
  - Key guidance: `retention_hours` should be `>= lookback_hours` to prevent cleanup from removing events before they are processed
- Files changed:
  - `src/dtagent/plugins/event_log.config/config.md` — full configuration reference + cost guidance
  - `src/dtagent/plugins/event_log.config/readme.md` — updated to mention the configurable lookback window
- Issue: `_process_span_rows()` in `src/dtagent/plugins/__init__.py` called `_report_execution()` with `current_timestamp()` (a Snowflake lazy column expression) instead of the actual last-row timestamp.
- Root cause: When `STATUS.LOG_PROCESSED_MEASUREMENTS` stored this value, it received the string `'Column[current_timestamp]'` rather than a real timestamp. On the next run, `F_LAST_PROCESSED_TS` would return a malformed value, causing the `GREATEST(...)` guard in each SQL view to use the fallback lookback window — potentially re-processing spans already sent.
- Fix: Added a `last_processed_timestamp` variable tracking `row_dict.get("TIMESTAMP", last_processed_timestamp)` within the row iteration loop, mirroring the identical pattern used by `_log_entries()`. Passed `str(last_processed_timestamp)` to `_report_execution()` instead of `current_timestamp()`.
- Side effect removed: Dropped the now-unused `from snowflake.snowpark.functions import current_timestamp` import — pylint flagged it as unused after the fix.
- Impact: Spans and traces will no longer be re-processed after an agent restart. The `F_LAST_PROCESSED_TS('event_log_spans')` guard now advances correctly after each run.
- Affects: The `event_log` plugin (`_process_span_entries`) and any future plugin using `_process_span_rows` with `log_completion=True`.
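The fix's tracking pattern, in outline (simplified; `report_execution` stands in for the real `_report_execution()` call):

```python
def process_rows(rows, report_execution):
    """Track the last real row timestamp instead of a lazy current_timestamp() column."""
    last_processed_timestamp = None
    for row_dict in rows:
        # keep the previous value when a row lacks a TIMESTAMP key
        last_processed_timestamp = row_dict.get("TIMESTAMP", last_processed_timestamp)
    report_execution(str(last_processed_timestamp))
    return last_processed_timestamp

seen = []
process_rows(
    [{"TIMESTAMP": "2025-01-01 10:00:00"}, {"OTHER": 1}, {"TIMESTAMP": "2025-01-01 11:00:00"}],
    seen.append,
)
print(seen)  # ['2025-01-01 11:00:00']
```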
Detailed technical changes for prior versions can be added here as needed.
- This file is not auto-generated. Manual maintenance required.
- Focus on technical implementation details, root causes, and internal changes.
- For user-facing release notes, see CHANGELOG.md.
- Entries should help future developers understand decisions and troubleshoot issues.