15 changes: 15 additions & 0 deletions .asf.yaml
@@ -19,6 +19,9 @@ github:
   description: "Database connectivity API standard and libraries for Apache Arrow"
   homepage: https://arrow.apache.org/adbc/
   collaborators:
+    - alexguo-db
+    - eric-wang-1990
+    - jadewang-db
     - krlmlr
     - nbenn
   enabled_merge_buttons:
@@ -33,6 +36,18 @@ github:
     - database
   protected_branches:
     main: {}
+  environments:
+    databricks-e2e:
+      wait_timer: 0
+      required_reviewers:
+        - id: alexguo-db
+          type: User
+        - id: eric-wang-1990
+          type: User
+        - id: jadewang-db
+          type: User
+      deployment_branch_policy:
+        protected_branches: true
 
 notifications:
   commits: [email protected]
147 changes: 147 additions & 0 deletions .github/workflows/csharp_databricks_e2e.yml
@@ -0,0 +1,147 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: C# Databricks E2E Tests
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - '.github/workflows/csharp_databricks_e2e.yml'
+      - 'ci/scripts/csharp_databricks_e2e.sh'
+      - 'csharp/src/Apache.Arrow.Adbc/**'
+      - 'csharp/src/Client/**'
+      - 'csharp/src/Drivers/Apache/Hive2/**'
+      - 'csharp/src/Drivers/Apache/Spark/**'
+      - 'csharp/src/Drivers/Databricks/**'
+      - 'csharp/test/Drivers/Databricks/**'
+  pull_request_target:
+    paths:
+      - '.github/workflows/csharp_databricks_e2e.yml'
+      - 'ci/scripts/csharp_databricks_e2e.sh'
+      - 'csharp/src/Apache.Arrow.Adbc/**'
+      - 'csharp/src/Client/**'
+      - 'csharp/src/Drivers/Apache/Hive2/**'
+      - 'csharp/src/Drivers/Apache/Spark/**'
+      - 'csharp/src/Drivers/Databricks/**'
+      - 'csharp/test/Drivers/Databricks/**'
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  id-token: write # Required for OIDC token exchange
+
+defaults:
+  run:
+    # 'bash' will expand to -eo pipefail
+    shell: bash
+
+jobs:
+  csharp-databricks-e2e:
+    name: "C# ${{ matrix.os }} ${{ matrix.dotnet }}"
+    runs-on: ${{ matrix.os }}
+    environment: databricks-e2e
+    if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
+    timeout-minutes: 15
+    strategy:
+      fail-fast: false
+      matrix:
+        dotnet: ['8.0.x']
+        os: [ubuntu-latest, windows-2022, macos-13, macos-latest]
+    steps:
+      - name: Install C#
+        uses: actions/setup-dotnet@v4
+        with:
+          dotnet-version: ${{ matrix.dotnet }}
+      - name: Checkout ADBC
+        uses: actions/checkout@v5
+        with:
+          ref: ${{ github.event.pull_request.head.sha || github.sha }}
+          fetch-depth: 0
+          submodules: recursive
+      - name: Build
+        shell: bash
+        run: ci/scripts/csharp_build.sh $(pwd)
Comment on lines +76 to +81

Member:

Does this mean that we may run ci/scripts/csharp_build.sh in a forked repository with the pull_request_target context (which has write access to apache/arrow-adbc)?

I don't think that's acceptable under the ASF GitHub Actions policy: https://infra.apache.org/github-actions-policy.html

> Triggers
>
> You MUST NOT use pull_request_target as a trigger on ANY action that exports ANY confidential credentials or tokens such as GITHUB_TOKEN or NPM_TOKEN.

Can we run this on forks rather than on apache/arrow-adbc by removing branches: [main] from on.push?
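
A minimal sketch of that suggestion (illustrative only, not part of this PR; path filters abbreviated): with no branches filter, push fires on any branch, so the workflow could run in contributors' forks, where GITHUB_TOKEN carries only fork-scoped permissions.

    # Illustrative sketch only, not part of this PR: dropping `branches: [main]`
    # lets the push event fire on any branch, including branches in forks.
    on:
      push:
        paths: # abbreviated; the PR lists the full set of path filters
          - '.github/workflows/csharp_databricks_e2e.yml'
          - 'ci/scripts/csharp_databricks_e2e.sh'
          - 'csharp/src/Drivers/Databricks/**'
          - 'csharp/test/Drivers/Databricks/**'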

Member:

Hmm, I suppose the point is that the manual approval for the environment protects against this. But Infra may not have intended for environments to be used this way.

@alexguo-db (Contributor Author) commented on Aug 22, 2025:

I believe @zeroshade checked with ASF Infra about this use case. Can you confirm with ASF Infra that we can bypass the triggers policy if we have an environment with required reviewers? Otherwise, I don't see how other Apache repos can implement this click-approval pattern.

> Can we run this on forks rather than on apache/arrow-adbc by removing branches: [main] from on.push?

@kou Sorry, I'm not following why excluding it from being run on the upstream repo would improve security.

Member:

If we run this job on apache/arrow-adbc, a malicious pull request can obtain a GITHUB_TOKEN that has id-token: write permission for apache/arrow-adbc and abuse it.

If we run this job on fork repositories, malicious developers can only obtain a GITHUB_TOKEN for their own forks. That's not a problem, because they already hold the permissions those tokens grant; they can't gain any additional permissions on apache/arrow-adbc.

Contributor:

Within our own repo we found that the GITHUB_TOKEN is only valid for 5 minutes, so the exchanged Databricks token is also only 5 minutes long, and any test case running past that limit fails because the token is invalidated.
We will probably need to supply a Databricks PAT instead, which bypasses the GITHUB_TOKEN entirely; would that address the concern here?
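
One minimal way to observe that lifetime, sketched under the assumption that jq and base64 are available and that $GITHUB_TOKEN holds the OIDC token fetched in the workflow (illustrative only):

    # Illustrative only: decode the JWT payload and report seconds until expiry.
    payload=$(echo "$GITHUB_TOKEN" | cut -d '.' -f 2 | tr '_-' '/+')
    # Restore the base64 padding that the JWT encoding strips
    case $(( ${#payload} % 4 )) in
      2) payload="${payload}==" ;;
      3) payload="${payload}=" ;;
    esac
    exp=$(echo "$payload" | base64 -d | jq -r '.exp')
    echo "OIDC token expires in $(( exp - $(date +%s) )) seconds"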

@alexguo-db (Contributor Author):

@kou That won't work for us; the Databricks service principal is set up to allow OIDC token exchange only when the GitHub OIDC token originates from the main repo.

> If we run this job on fork repositories, malicious developers can only obtain a GITHUB_TOKEN for their own forks.

If we allowed the service principal to accept any GitHub OIDC token, then anybody could create a fork and run malicious queries against the Databricks workspace.

The alternative is to use a Databricks personal access token secret, but that runs into the same problem: we can only store it on the main repo (so only people with main-branch permissions could copy a branch to main and run the E2E tests).

How is this click-approval pattern implemented elsewhere?
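
The restriction described here is typically expressed as a workload-identity federation policy that pins the issuer, audience, and subject claims of the incoming token. A rough sketch under assumed field names (the exact Databricks policy schema may differ):

    # Hypothetical sketch; field names are indicative, not the exact
    # Databricks federation-policy schema. The subject pins the exchange to
    # OIDC tokens minted for the main repo's databricks-e2e environment.
    POLICY='{
      "issuer": "https://token.actions.githubusercontent.com",
      "audiences": ["https://github.com/apache"],
      "subject_claim": "sub",
      "subject": "repo:apache/arrow-adbc:environment:databricks-e2e"
    }'
    echo "$POLICY" | jq .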

Member:

> We will probably need to supply a Databricks PAT instead, which bypasses the GITHUB_TOKEN entirely; would that address the concern here?

Can we avoid using pull_request_target with that approach? If so, it addresses the concern.

Member:

> If we allowed the service principal to accept any GitHub OIDC token, then anybody could create a fork and run malicious queries against the Databricks workspace.

Can we accept only trusted fork repositories, rather than arbitrary ones, on the Databricks side?

> How is this click-approval pattern implemented elsewhere?

I haven't seen an implementation of it in apache/* repositories... apache/airflow-publish may have one...?

Member:

BTW, could Databricks provide a local test tool, like MinIO for AWS S3, Azurite (https://github.com/Azure/Azurite) for Azure Storage, or Storage Testbench (https://github.com/googleapis/storage-testbench) for Google Cloud Storage?

+      - name: Set up Databricks testing
+        shell: bash
+        env:
+          DATABRICKS_WORKSPACE_URL: 'adb-6436897454825492.12.azuredatabricks.net'
+          DATABRICKS_WAREHOUSE_PATH: '/sql/1.0/warehouses/2f03dd43e35e2aa0'
+          DATABRICKS_SP_CLIENT_ID: '8335020c-9ba9-4821-92bb-0e8657759cda'
+        run: |
+          # Set up cross-platform variables
+          if [[ "$RUNNER_OS" == "Windows" ]]; then
+            DATABRICKS_DIR="$USERPROFILE/.databricks"
+            DATABRICKS_CONFIG_FILE="$USERPROFILE/.databricks/connection.json"
+          else
+            DATABRICKS_DIR="$HOME/.databricks"
+            DATABRICKS_CONFIG_FILE="$HOME/.databricks/connection.json"
+          fi
+
+          # Get GitHub OIDC token
+          GITHUB_TOKEN=$(curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
+            "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=https://github.com/apache" | jq -r '.value')
+
+          if [ "$GITHUB_TOKEN" = "null" ] || [ -z "$GITHUB_TOKEN" ]; then
+            echo "Failed to get GitHub OIDC token"
+            exit 1
+          fi
+
+          # Mask the GitHub OIDC token
+          echo "::add-mask::$GITHUB_TOKEN"
+
+          # Exchange OIDC token for Databricks OAuth token
+          OAUTH_RESPONSE=$(curl -X POST "https://$DATABRICKS_WORKSPACE_URL/oidc/v1/token" \
+            -H "Content-Type: application/x-www-form-urlencoded" \
+            -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
+            -d "client_id=$DATABRICKS_SP_CLIENT_ID" \
+            -d "subject_token=$GITHUB_TOKEN" \
+            -d "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \
+            -d "scope=sql")
+
+          DATABRICKS_TOKEN=$(echo "$OAUTH_RESPONSE" | jq -r '.access_token')
Member:

You might want to use ::add-mask:: in some areas here to ensure that the access tokens don't show up in the logs.


if [ "$DATABRICKS_TOKEN" = "null" ] || [ -z "$DATABRICKS_TOKEN" ]; then
echo "Failed to get Databricks access token. Response:"
echo "$OAUTH_RESPONSE"
exit 1
fi

# Mask the Databricks access token
echo "::add-mask::$DATABRICKS_TOKEN"

# Create Databricks configuration file
mkdir -p "$DATABRICKS_DIR"
cat > "$DATABRICKS_CONFIG_FILE" << EOF
{
"hostName": "$DATABRICKS_WORKSPACE_URL",
"port": "443",
"path": "$DATABRICKS_WAREHOUSE_PATH",
"auth_type": "oauth",
"access_token": "$DATABRICKS_TOKEN"
}
EOF

echo "DATABRICKS_TEST_CONFIG_FILE=$DATABRICKS_CONFIG_FILE" >> $GITHUB_ENV

echo "Databricks configuration created successfully at $DATABRICKS_CONFIG_FILE"
- name: Test Databricks
shell: bash
run: ci/scripts/csharp_test_databricks_e2e.sh $(pwd)
27 changes: 27 additions & 0 deletions ci/scripts/csharp_test_databricks_e2e.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+
+source_dir=${1}/csharp/test/Drivers/Databricks
+
+pushd ${source_dir}
+# Include all E2E tests once the tests are all passing
+dotnet test --filter "FullyQualifiedName~CloudFetchE2ETest"
+popd
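
A hypothetical local invocation, assuming a connection.json already exists at the path the workflow writes, and that the tests read DATABRICKS_TEST_CONFIG_FILE as the workflow's GITHUB_ENV export suggests:

    # Hypothetical local run, from the repository root
    export DATABRICKS_TEST_CONFIG_FILE="$HOME/.databricks/connection.json"
    ci/scripts/csharp_test_databricks_e2e.sh "$(pwd)"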