feat(csharp/src/Drivers/Databricks): Add CI for E2E Databricks tests #3115
Status: Open
alexguo-db wants to merge 8 commits into apache:main from alexguo-db:alex-guo_data/alexguo-add-databricks-e2e-ci
Commits (8, all by alexguo-db):
- c64ea70 feat(csharp/src/Drivers/Databricks): Add CI for E2E Databricks tests
- 6785dc1 Debug
- 5901ee0 Update
- a66afa3 Remove debug
- 01997af Lint
- 48821e7 Add databricks-e2e environment
- 02e15c3 Add masks and add collaborators
- d8b945a Address comments
File: .asf.yaml

@@ -19,6 +19,9 @@ github:
   description: "Database connectivity API standard and libraries for Apache Arrow"
   homepage: https://arrow.apache.org/adbc/
   collaborators:
+    - alexguo-db
+    - eric-wang-1990
+    - jadewang-db
     - krlmlr
     - nbenn
   enabled_merge_buttons:
@@ -33,6 +36,18 @@ github:
     - database
   protected_branches:
     main: {}
+  environments:
+    databricks-e2e:
+      wait_timer: 0
+      required_reviewers:
+        - id: alexguo-db
+          type: User
+        - id: eric-wang-1990
+          type: User
+        - id: jadewang-db
+          type: User
+      deployment_branch_policy:
+        protected_branches: true

 notifications:
   commits: [email protected]
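For context, the `environments` block above defines a GitHub deployment environment whose runs pause until one of the listed required reviewers approves. A workflow job opts into that protection by naming the environment; a minimal sketch (the job and step here are illustrative, not part of this PR):

```yaml
jobs:
  e2e:
    runs-on: ubuntu-latest
    environment: databricks-e2e  # run is held until a required reviewer approves
    steps:
      - run: echo "runs only after approval"
```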
File: .github/workflows/csharp_databricks_e2e.yml (new file, 147 lines)

@@ -0,0 +1,147 @@

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: C# Databricks E2E Tests

on:
  push:
    branches: [main]
    paths:
      - '.github/workflows/csharp_databricks_e2e.yml'
      - 'ci/scripts/csharp_databricks_e2e.sh'
      - 'csharp/src/Apache.Arrow.Adbc/**'
      - 'csharp/src/Client/**'
      - 'csharp/src/Drivers/Apache/Hive2/**'
      - 'csharp/src/Drivers/Apache/Spark/**'
      - 'csharp/src/Drivers/Databricks/**'
      - 'csharp/test/Drivers/Databricks/**'
  pull_request_target:
    paths:
      - '.github/workflows/csharp_databricks_e2e.yml'
      - 'ci/scripts/csharp_databricks_e2e.sh'
      - 'csharp/src/Apache.Arrow.Adbc/**'
      - 'csharp/src/Client/**'
      - 'csharp/src/Drivers/Apache/Hive2/**'
      - 'csharp/src/Drivers/Apache/Spark/**'
      - 'csharp/src/Drivers/Databricks/**'
      - 'csharp/test/Drivers/Databricks/**'

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

permissions:
  contents: read
  id-token: write # Required for OIDC token exchange

defaults:
  run:
    # 'bash' will expand to -eo pipefail
    shell: bash

jobs:
  csharp-databricks-e2e:
    name: "C# ${{ matrix.os }} ${{ matrix.dotnet }}"
    runs-on: ${{ matrix.os }}
    environment: databricks-e2e
    if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
    timeout-minutes: 15
    strategy:
      fail-fast: false
      matrix:
        dotnet: ['8.0.x']
        os: [ubuntu-latest, windows-2022, macos-13, macos-latest]
    steps:
      - name: Install C#
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: ${{ matrix.dotnet }}
      - name: Checkout ADBC
        uses: actions/checkout@v5
        with:
          ref: ${{ github.event.pull_request.head.sha || github.sha }}
          fetch-depth: 0
          submodules: recursive
      - name: Build
        shell: bash
        run: ci/scripts/csharp_build.sh $(pwd)
      - name: Set up Databricks testing
        shell: bash
        env:
          DATABRICKS_WORKSPACE_URL: 'adb-6436897454825492.12.azuredatabricks.net'
          DATABRICKS_WAREHOUSE_PATH: '/sql/1.0/warehouses/2f03dd43e35e2aa0'
          DATABRICKS_SP_CLIENT_ID: '8335020c-9ba9-4821-92bb-0e8657759cda'
        run: |
          # Set up cross-platform variables
          if [[ "$RUNNER_OS" == "Windows" ]]; then
            DATABRICKS_DIR="$USERPROFILE/.databricks"
            DATABRICKS_CONFIG_FILE="$USERPROFILE/.databricks/connection.json"
          else
            DATABRICKS_DIR="$HOME/.databricks"
            DATABRICKS_CONFIG_FILE="$HOME/.databricks/connection.json"
          fi

          # Get GitHub OIDC token
          GITHUB_TOKEN=$(curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
            "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=https://github.com/apache" | jq -r '.value')

          if [ "$GITHUB_TOKEN" = "null" ] || [ -z "$GITHUB_TOKEN" ]; then
            echo "Failed to get GitHub OIDC token"
            exit 1
          fi

          # Mask the GitHub OIDC token
          echo "::add-mask::$GITHUB_TOKEN"

          # Exchange OIDC token for Databricks OAuth token
          OAUTH_RESPONSE=$(curl -X POST "https://$DATABRICKS_WORKSPACE_URL/oidc/v1/token" \
            -H "Content-Type: application/x-www-form-urlencoded" \
            -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
            -d "client_id=$DATABRICKS_SP_CLIENT_ID" \
            -d "subject_token=$GITHUB_TOKEN" \
            -d "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \
            -d "scope=sql")

          DATABRICKS_TOKEN=$(echo "$OAUTH_RESPONSE" | jq -r '.access_token')
Inline review comment (Member) on the line above: You might want to use …
          if [ "$DATABRICKS_TOKEN" = "null" ] || [ -z "$DATABRICKS_TOKEN" ]; then
            echo "Failed to get Databricks access token. Response:"
            echo "$OAUTH_RESPONSE"
            exit 1
          fi

          # Mask the Databricks access token
          echo "::add-mask::$DATABRICKS_TOKEN"

          # Create Databricks configuration file
          mkdir -p "$DATABRICKS_DIR"
          cat > "$DATABRICKS_CONFIG_FILE" << EOF
          {
            "hostName": "$DATABRICKS_WORKSPACE_URL",
            "port": "443",
            "path": "$DATABRICKS_WAREHOUSE_PATH",
            "auth_type": "oauth",
            "access_token": "$DATABRICKS_TOKEN"
          }
          EOF

          echo "DATABRICKS_TEST_CONFIG_FILE=$DATABRICKS_CONFIG_FILE" >> $GITHUB_ENV

          echo "Databricks configuration created successfully at $DATABRICKS_CONFIG_FILE"
      - name: Test Databricks
        shell: bash
        run: ci/scripts/csharp_test_databricks_e2e.sh $(pwd)
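Two shell idioms in the setup step above are worth calling out: the "null"/empty guard (because `jq -r` prints the literal string null when a key is missing from the JSON response) and the unquoted-heredoc write (so that shell variables expand into the generated JSON). A standalone sketch of both patterns follows; the function name, token values, and workspace URL are purely illustrative, and the sketch deliberately avoids any jq or network dependency:

```shell
# Guard pattern: `jq -r '.access_token'` prints the literal string "null"
# when the key is absent, so both "null" and the empty string must be rejected.
check_token() {
  if [ "$1" = "null" ] || [ -z "$1" ]; then
    echo "invalid"
  else
    echo "ok"
  fi
}

check_token "null"      # missing key        -> invalid
check_token ""          # empty response     -> invalid
check_token "dapi-xyz"  # illustrative token -> ok

# Heredoc pattern: an unquoted EOF delimiter lets $VARS expand into the
# file; a quoted delimiter ('EOF') would write the variable names literally.
CONFIG_DIR=$(mktemp -d)
CONFIG_FILE="$CONFIG_DIR/connection.json"
WORKSPACE_URL="example.azuredatabricks.net"  # placeholder, not the PR's workspace

cat > "$CONFIG_FILE" << EOF
{
  "hostName": "$WORKSPACE_URL",
  "port": "443"
}
EOF

cat "$CONFIG_FILE"
```

The same quoting rule explains why the workflow's heredoc correctly embeds the freshly exchanged `$DATABRICKS_TOKEN` rather than the literal variable name.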
New file: shell script (27 lines)

@@ -0,0 +1,27 @@

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -ex

source_dir=${1}/csharp/test/Drivers/Databricks

pushd ${source_dir}
# Include all E2E tests once the tests are all passing
dotnet test --filter "FullyQualifiedName~CloudFetchE2ETest"
popd
Review discussion:

Does this mean that we may run ci/scripts/csharp_build.sh in a forked repository with the pull_request_target context (which has write access to apache/arrow-adbc)? I think that's not acceptable based on the ASF GitHub Actions policy: https://infra.apache.org/github-actions-policy.html

Can we run this on the fork, not on apache/arrow-adbc, by removing branches: [main] from on.push?
Hmm, I suppose the point is that the manual approval for the environment protects against this. But Infra may not have intended for environments to be used this way.
I believe @zeroshade checked with ASF Infra about this use case. Can you confirm with ASF Infra that we can bypass the triggers policy if we have an environment with required reviewers? Otherwise, I don't see how other Apache repos could implement this click-approval pattern.

@kou Sorry, I'm not following why excluding it from being run on the upstream repo would improve security.
If we run this job on apache/arrow-adbc, evil pull requests can get a GITHUB_TOKEN that has id-token: write permission for apache/arrow-adbc, and they may abuse it.

If we run this job on a fork repository, evil developers can only get a GITHUB_TOKEN for their own fork. That's not a problem, because they already hold the permissions those tokens grant; they can't gain additional permissions on apache/arrow-adbc.
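As background for this thread, the two triggers differ in which workflow definition runs and what the job's token can do. A rough summary in workflow-syntax form (simplified; see GitHub's documentation for the full semantics):

```yaml
on:
  # pull_request: runs the workflow file from the PR branch; for PRs from
  # forks, the GITHUB_TOKEN is read-only and secrets are withheld by default.
  pull_request:

  # pull_request_target: runs the workflow file from the BASE repository,
  # with that repository's secrets and a write-capable token, which is why
  # untrusted PR code must never be executed in this context unchecked.
  pull_request_target:
```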
Within our own repo we found that our GITHUB_TOKEN is only valid for 5 minutes, so the exchanged Databricks token is also 5 minutes long, and any test case running past that limit will fail due to invalidation.

We will probably need to supply a Databricks PAT instead, which bypasses the GITHUB_TOKEN entirely. Would that address the concern here?
@kou That won't work for us: the Databricks service principal is set up to only allow OIDC token exchange when the GitHub OIDC token originates from the main repo.

If we allowed the service principal to accept any GitHub OIDC token, then anybody could create a fork and run malicious queries against the Databricks workspace.

The alternative is to use a Databricks personal access token secret, but that runs into the same problem: we can only store it on the main repo, so only people with main branch permissions could copy the branch to main and run the E2E tests.

How is this click-approval pattern implemented elsewhere?
Can we avoid using pull_request_target by this? If so, it addresses the concern.
Can we accept only trusted fork repositories, not arbitrary forks, on the Databricks side?

I haven't seen such an implementation in apache/* repositories... apache/airflow-publish may have an implementation of it...?
BTW, can Databricks provide a local test tool, like MinIO for AWS S3, Azurite (https://github.com/Azure/Azurite) for Azure Storage, and storage-testbench (https://github.com/googleapis/storage-testbench) for Google Cloud Storage?