feat(ingest/bigquery): improve profiler to support multiple partition columns and support external table profiling #12825

Open
acrylJonny wants to merge 173 commits into master from bq-multi-partition-profiling

@acrylJonny
Collaborator

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@datahub-cyborg datahub-cyborg bot added the needs-review label (Label for PRs that need review from a maintainer.) Mar 10, 2025
…profiling

# Conflicts:
#	metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
@datahub-cyborg datahub-cyborg bot added the needs-review label (Label for PRs that need review from a maintainer.) and removed the pending-submitter-response label (Issue/request has been reviewed but requires a response from the submitter) Jan 13, 2026
try:
# Query for actual values of this column using the date filters
discover_query = f"""
SELECT DISTINCT `{col_name}` as col_value, COUNT(*) as row_count
Contributor

Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.

Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security
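
One way to act on the remediation above, sketched under assumptions: BigQuery query parameters cannot stand in for identifiers such as column names, so a column name interpolated into the query has to be validated before it is wrapped in backticks. The helper name `quote_column` and the allow-list pattern are hypothetical and are not taken from this PR; the pattern is deliberately stricter than what BigQuery itself accepts for column names.

```python
import re

# Hypothetical, conservative allow-list: ASCII letters, digits and underscores,
# not starting with a digit. BigQuery permits more than this; the point is to
# reject backticks, quotes, whitespace and comment markers outright.
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_column(col_name: str) -> str:
    """Return col_name wrapped in backticks, or raise if it looks unsafe."""
    if not _COLUMN_RE.fullmatch(col_name):
        raise ValueError(f"Refusing to interpolate column name: {col_name!r}")
    return f"`{col_name}`"


# Usage: only the validated, quoted identifier reaches the f-string.
select_list = f"SELECT DISTINCT {quote_column('event_date')} AS col_value"
```

A name containing a backtick, such as ``x` FROM t; --``, fails the pattern and raises `ValueError` instead of ending up in the query text.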

# Non-partitioned table - apply row limit or safety limit
if self.config.profiling.profiling_row_limit > 0:
row_limit = max(1, int(self.config.profiling.profiling_row_limit))
custom_sql = f"SELECT * FROM {safe_table_ref} LIMIT {row_limit}"
Contributor

Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.

Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security
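
For this location, a minimal sketch of what "verified or sanitized" could look like, assuming the table reference is assembled from separate project, dataset and table parts. The helpers `quote_table_ref` and `build_limited_select` and the allow-list pattern are hypothetical and do not describe how `safe_table_ref` is built in this PR. The `int()` coercion mirrors the quoted snippet: a value that is not an integer raises before any SQL is formed.

```python
import re

# Hypothetical, conservative allow-list for each part of project.dataset.table.
# Real BigQuery table names allow more characters; this sketch rejects anything
# that could close the backtick-quoted reference.
_PART_RE = re.compile(r"[A-Za-z0-9_-]+")


def quote_table_ref(project: str, dataset: str, table: str) -> str:
    """Validate each part, then return a backtick-quoted table reference."""
    for part in (project, dataset, table):
        if not _PART_RE.fullmatch(part):
            raise ValueError(f"Unsafe table reference part: {part!r}")
    return f"`{project}.{dataset}.{table}`"


def build_limited_select(project: str, dataset: str, table: str, row_limit) -> str:
    """Build a row-limited SELECT from validated parts and an integer limit."""
    limit = max(1, int(row_limit))  # int() raises ValueError on non-numeric input
    return f"SELECT * FROM {quote_table_ref(project, dataset, table)} LIMIT {limit}"
```

With these checks in place, the only dynamic pieces of the query are an integer and three strings that match the allow-list, which is the property the scanner cannot infer from the f-string alone.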

Labels

ingestion: PR or Issue related to the ingestion of metadata
needs-review: Label for PRs that need review from a maintainer.

2 participants