Feature: Add support for Data Profiling Scan #1392
Base branch: main
dbt/adapters/bigquery/impl.py

        return self.connections.dry_run(sql)

    # If the label `dataplex-dp-published-*` is not assigned, we cannot view the results of the Data Profile Scan from BigQuery
    def _update_labels_with_data_profile_scan_labels(
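To illustrate what a label-merging helper like the one under review might do, here is a minimal sketch. It is not the PR's implementation. The function name `merge_profile_scan_labels` is hypothetical, and the three concrete `dataplex-dp-published-*` keys are an assumption (the comment above only gives the `dataplex-dp-published-*` pattern). The sketch also attaches the `managed_by` label discussed in the review thread, assuming the value `dbt`.

```python
from typing import Dict, Optional

# ASSUMPTION: concrete label keys matching the `dataplex-dp-published-*`
# pattern mentioned in the review comment; verify against the Dataplex docs.
PUBLISHED_LABEL_KEYS = (
    "dataplex-dp-published-project",
    "dataplex-dp-published-location",
    "dataplex-dp-published-scan",
)


def merge_profile_scan_labels(
    labels: Optional[Dict[str, str]],
    project: str,
    location: str,
    scan_id: str,
    managed_by: str = "dbt",  # ASSUMPTION: value used for the managed_by label
) -> Dict[str, str]:
    """Hypothetical helper: return a new label dict that includes the
    (assumed) publishing labels plus a managed_by label.

    The input dict is copied, never mutated.
    """
    merged = dict(labels or {})
    # Pair each assumed publishing key with its value, in order.
    merged.update(zip(PUBLISHED_LABEL_KEYS, (project, location, scan_id)))
    merged["managed_by"] = managed_by
    return merged


if __name__ == "__main__":
    print(merge_profile_scan_labels({"team": "analytics"}, "my-project", "us-central1", "orders-profile"))
```

Returning a copy rather than updating the table's labels in place keeps any user-defined labels (such as `team` above) intact and makes the helper easy to test.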
Data Profile Scans are sometimes created for purposes other than dbt, so when updating or deleting them programmatically with the CLI or SDK, it is important to be able to tell whether a scan was created by dbt. The scan_id could be used for this, but I added the `managed_by` label because labels are structured and therefore easier to work with.
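As a rough illustration of why a structured label helps, the sketch below filters a list of scans down to the ones marked as managed by dbt. Scans are represented here as plain dicts with `name` and `labels` keys; that shape, the function name `scans_managed_by`, and the label value `dbt` are all assumptions for illustration, not the Dataplex SDK's types. With a real client, the same predicate would be applied to whatever scan objects the list call returns.

```python
from typing import Iterable, List, Mapping


def scans_managed_by(scans: Iterable[Mapping], manager: str = "dbt") -> List[str]:
    """Hypothetical helper: return the names of scans whose `managed_by`
    label equals `manager`.

    Scans without labels (missing key or None) are treated as unmanaged.
    """
    return [
        scan["name"]
        for scan in scans
        if (scan.get("labels") or {}).get("managed_by") == manager
    ]


if __name__ == "__main__":
    example = [
        {"name": "orders-profile", "labels": {"managed_by": "dbt"}},
        {"name": "adhoc-profile", "labels": {"owner": "analyst"}},
        {"name": "legacy-profile"},
    ]
    print(scans_managed_by(example))
```

Matching on a label value avoids having to encode ownership in the scan_id naming scheme and parse it back out.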
@mikealfare @colin-rogers-dbt Could you review this pull request? Many users seem to be waiting for this feature to be added to dbt-bigquery. I can also open a pull request to update the documentation for BigQuery configurations, so please let me know if that would help 👍. If so, please also let me know whether you need it before this pull request is merged or whether after the merge is fine.
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.
resolves dbt-labs/dbt-adapters#543
Problem
Dataplex data profiling lets you identify common statistical characteristics of the columns in your BigQuery tables. This information helps you understand and analyze your data more effectively.
If you manage your tables with dbt, it is natural to want to configure Data Profile Scans in a YAML file. If data profiling could be configured within dbt after a table is created, dbt users would find the data profiling feature much easier to use.
Solution
I created this pull request to add support for Data Profiling Scan. If you add the following to `dbt_project.yml` and then run `dbt run`, the Data Profile Scan settings are configured automatically.
You can also specify Data Profile Scan settings in individual model files rather than in `dbt_project.yml`.
Checklist
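Because settings can come from either `dbt_project.yml` or an individual model file, the adapter has to resolve one effective configuration per model. The sketch below shows one plausible resolution rule: a shallow merge in which model-level keys override project-level keys. This is an assumption; the description does not state how dbt merges this particular config, and the keys `enabled` and `sampling_percent` as well as the function name `resolve_profile_scan_config` are hypothetical.

```python
from typing import Any, Dict, Optional


def resolve_profile_scan_config(
    project_cfg: Dict[str, Any], model_cfg: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Hypothetical resolution of Data Profile Scan settings.

    ASSUMPTION: model-level keys replace project-level keys one-for-one
    (shallow merge), and a hypothetical `enabled` key gates the scan.
    Returns None when profiling is not enabled for the model.
    """
    resolved = {**project_cfg, **model_cfg}  # model file wins on conflicts
    if not resolved.get("enabled", False):
        return None
    return resolved


if __name__ == "__main__":
    project_defaults = {"enabled": True, "sampling_percent": 10}
    model_override = {"sampling_percent": 50}
    print(resolve_profile_scan_config(project_defaults, model_override))
```

A shallow merge keeps the rule easy to reason about: a model that sets a key fully replaces the project default for that key, and a model can opt out by overriding the (hypothetical) `enabled` flag.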