Commit efbee5a

Tom-Newton, kou, pitrou, and shefali singh authored and committed
GH-29847: [C++] Build with Azure SDK for C++ (apache#36835)
### Rationale for this change

We want to use the Azure SDK for C++ to read from and write to Azure Blob Storage. This is a prerequisite for building an `AzureFileSystem`.

### What changes are included in this PR?

Builds the relevant parts of the Azure SDK as a CMake external project. Adds a couple of simple tests that just assert that the Azure SDK is working, and a couple of lines in `AzureFileSystem` that initialise the blob storage client to ensure the build works correctly in all environments.

I started with the build setup from apache#12914 but made a few changes.

1. Although it's atypical for this project, we chose to switch from CMake's `ExternalProject` to `FetchContent`. `FetchContent` is recommended by the Azure docs: https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems, including automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there.
2. Only build one version of the Azure SDK for C++, because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other, which created race conditions over which version each component came from.
3. We are using `azure-core_1.10.2`, which is a very recent version. There are a couple of important reasons for this:
   1. [an important managed identity fix](Azure/azure-sdk-for-cpp#4723)
   2. [fixed support for curl versions < 7.71.0](Azure/azure-sdk-for-cpp#4792)

There will be follow-up PRs to enable Azure in the manylinux builds. We need to update `vcpkg` first so we can get a version of the Azure SDK which contains [an important managed identity fix](Azure/azure-sdk-for-cpp#4723).

### Are these changes tested?

Yes. There is a simple test that just runs the Azure client against Azurite. Additionally, just initialising the client in `AzureFileSystem` goes a long way towards ensuring the build is working.

### Are there any user-facing changes?

No

* Closes: apache#29847

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: shefali singh <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
1 parent a2a451e commit efbee5a


9 files changed: +174 -6 lines changed


ci/docker/ubuntu-20.04-cpp.dockerfile

Lines changed: 2 additions & 0 deletions
@@ -99,6 +99,7 @@ RUN apt-get update -y -q && \
     libssl-dev \
     libthrift-dev \
     libutf8proc-dev \
+    libxml2-dev \
     libzstd-dev \
     make \
     ninja-build \
@@ -172,6 +173,7 @@ ENV absl_SOURCE=BUNDLED \
     ARROW_WITH_ZSTD=ON \
     ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-${llvm}/bin/llvm-symbolizer \
     AWSSDK_SOURCE=BUNDLED \
+    Azure_SOURCE=BUNDLED \
     google_cloud_cpp_storage_SOURCE=BUNDLED \
     gRPC_SOURCE=BUNDLED \
     GTest_SOURCE=BUNDLED \

ci/docker/ubuntu-22.04-cpp.dockerfile

Lines changed: 2 additions & 0 deletions
@@ -98,6 +98,7 @@ RUN apt-get update -y -q && \
     libssl-dev \
     libthrift-dev \
     libutf8proc-dev \
+    libxml2-dev \
     libzstd-dev \
     make \
     ninja-build \
@@ -196,6 +197,7 @@ ENV absl_SOURCE=BUNDLED \
     ARROW_WITH_ZSTD=ON \
     ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-${llvm}/bin/llvm-symbolizer \
     AWSSDK_SOURCE=BUNDLED \
+    Azure_SOURCE=BUNDLED \
     google_cloud_cpp_storage_SOURCE=BUNDLED \
     GTest_SOURCE=BUNDLED \
     ORC_SOURCE=BUNDLED \

ci/scripts/cpp_build.sh

Lines changed: 1 addition & 0 deletions
@@ -152,6 +152,7 @@ cmake \
   -DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB:-OFF} \
   -DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-OFF} \
   -DAWSSDK_SOURCE=${AWSSDK_SOURCE:-} \
+  -DAzure_SOURCE=${Azure_SOURCE:-} \
   -Dbenchmark_SOURCE=${benchmark_SOURCE:-} \
   -DBOOST_SOURCE=${BOOST_SOURCE:-} \
   -DBrotli_SOURCE=${Brotli_SOURCE:-} \
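
The new `-DAzure_SOURCE=${Azure_SOURCE:-}` line relies on the shell's default-value expansion: when `Azure_SOURCE` is unset the flag is passed with an empty value, which should leave the choice to Arrow's default dependency resolution, while the Dockerfiles above set it to `BUNDLED`. The sketch below only illustrates that expansion; `azure_source_flag` is a hypothetical helper name and is not part of `cpp_build.sh`.

```shell
#!/bin/sh
# Hypothetical helper (not in cpp_build.sh): print the CMake flag that would
# be passed for the current value of Azure_SOURCE.
azure_source_flag() {
  # ${Azure_SOURCE:-} expands to the empty string when the variable is unset
  # or empty, so the expansion is also safe in scripts that use `set -u`.
  printf '%s\n' "-DAzure_SOURCE=${Azure_SOURCE:-}"
}

# Case 1: the variable is not set at all.
unset Azure_SOURCE
flag_default=$(azure_source_flag)

# Case 2: the variable is set the way the CI Dockerfiles set it.
Azure_SOURCE=BUNDLED
flag_bundled=$(azure_source_flag)

echo "unset:   ${flag_default}"
echo "bundled: ${flag_bundled}"
```

The same pattern is used for every other `*_SOURCE` variable in the script, so a CI image opts in to a bundled build simply by exporting the variable.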

cpp/CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -818,6 +818,11 @@ if(ARROW_WITH_OPENTELEMETRY)
   list(APPEND ARROW_STATIC_INSTALL_INTERFACE_LIBS CURL::libcurl)
 endif()
 
+if(ARROW_WITH_AZURE_SDK)
+  list(APPEND ARROW_SHARED_LINK_LIBS ${AZURE_SDK_LINK_LIBRARIES})
+  list(APPEND ARROW_STATIC_LINK_LIBS ${AZURE_SDK_LINK_LIBRARIES})
+endif()
+
 if(ARROW_WITH_UTF8PROC)
   list(APPEND ARROW_SHARED_LINK_LIBS utf8proc::utf8proc)
   list(APPEND ARROW_STATIC_LINK_LIBS utf8proc::utf8proc)

cpp/cmake_modules/FindAzure.cmake

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+if(Azure_FOUND)
+  return()
+endif()
+
+set(find_package_args)
+list(APPEND find_package_args CONFIG)
+if(Azure_FIND_QUIETLY)
+  list(APPEND find_package_args QUIET)
+endif()
+
+if(Azure_FIND_REQUIRED)
+  list(APPEND find_package_args REQUIRED)
+endif()
+
+find_package(azure-core-cpp ${find_package_args})
+find_package(azure-identity-cpp ${find_package_args})
+find_package(azure-storage-blobs-cpp ${find_package_args})
+find_package(azure-storage-common-cpp ${find_package_args})
+find_package(azure-storage-files-datalake-cpp ${find_package_args})
+
+find_package_handle_standard_args(
+  Azure
+  REQUIRED_VARS azure-core-cpp_FOUND
+                azure-identity-cpp_FOUND
+                azure-storage-blobs-cpp_FOUND
+                azure-storage-common-cpp_FOUND
+                azure-storage-files-datalake-cpp_FOUND
+  VERSION_VAR azure-core-cpp_VERSION)

cpp/cmake_modules/ThirdpartyToolchain.cmake

Lines changed: 68 additions & 0 deletions
@@ -49,6 +49,7 @@ set(ARROW_RE2_LINKAGE
 set(ARROW_THIRDPARTY_DEPENDENCIES
     absl
     AWSSDK
+    Azure
     benchmark
     Boost
     Brotli
@@ -162,6 +163,8 @@ macro(build_dependency DEPENDENCY_NAME)
     build_absl()
   elseif("${DEPENDENCY_NAME}" STREQUAL "AWSSDK")
     build_awssdk()
+  elseif("${DEPENDENCY_NAME}" STREQUAL "Azure")
+    build_azure_sdk()
   elseif("${DEPENDENCY_NAME}" STREQUAL "benchmark")
     build_benchmark()
   elseif("${DEPENDENCY_NAME}" STREQUAL "Boost")
@@ -389,6 +392,10 @@ if(ARROW_GCS)
   set(ARROW_WITH_ZLIB ON)
 endif()
 
+if(ARROW_AZURE)
+  set(ARROW_WITH_AZURE_SDK ON)
+endif()
+
 if(ARROW_JSON)
   set(ARROW_WITH_RAPIDJSON ON)
 endif()
@@ -569,6 +576,14 @@ else()
            "${THIRDPARTY_MIRROR_URL}/aws-sdk-cpp-${ARROW_AWSSDK_BUILD_VERSION}.tar.gz")
 endif()
 
+if(DEFINED ENV{ARROW_AZURE_SDK_URL})
+  set(ARROW_AZURE_SDK_URL "$ENV{ARROW_AZURE_SDK_URL}")
+else()
+  set_urls(ARROW_AZURE_SDK_URL
+           "https://github.com/Azure/azure-sdk-for-cpp/archive/${ARROW_AZURE_SDK_BUILD_VERSION}.tar.gz"
+  )
+endif()
+
 if(DEFINED ENV{ARROW_BOOST_URL})
   set(BOOST_SOURCE_URL "$ENV{ARROW_BOOST_URL}")
 else()
@@ -981,6 +996,8 @@ else()
   set(MAKE_BUILD_ARGS "-j${NPROC}")
 endif()
 
+include(FetchContent)
+
 # ----------------------------------------------------------------------
 # Find pthreads
 
@@ -1388,6 +1405,7 @@ endif()
 set(ARROW_OPENSSL_REQUIRED_VERSION "1.0.2")
 set(ARROW_USE_OPENSSL OFF)
 if(PARQUET_REQUIRE_ENCRYPTION
+   OR ARROW_AZURE
    OR ARROW_FLIGHT
    OR ARROW_GANDIVA
    OR ARROW_GCS
@@ -5095,6 +5113,56 @@ if(ARROW_S3)
   endif()
 endif()
 
+# ----------------------------------------------------------------------
+# Azure SDK for C++
+
+function(build_azure_sdk)
+  message(STATUS "Building Azure SDK for C++ from source")
+  fetchcontent_declare(azure_sdk
+                       URL ${ARROW_AZURE_SDK_URL}
+                       URL_HASH "SHA256=${ARROW_AZURE_SDK_BUILD_SHA256_CHECKSUM}")
+  set(BUILD_PERFORMANCE_TESTS FALSE)
+  set(BUILD_SAMPLES FALSE)
+  set(BUILD_TESTING FALSE)
+  set(BUILD_WINDOWS_UWP TRUE)
+  set(CMAKE_EXPORT_NO_PACKAGE_REGISTRY TRUE)
+  set(DISABLE_AZURE_CORE_OPENTELEMETRY TRUE)
+  set(ENV{AZURE_SDK_DISABLE_AUTO_VCPKG} TRUE)
+  set(WARNINGS_AS_ERRORS FALSE)
+  # TODO: Configure flags in a better way. FetchContent builds inherit
+  # global flags but we want to disable -Werror for Azure SDK for C++ builds.
+  if(MSVC)
+    string(REPLACE "/WX" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REPLACE "/WX" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+  else()
+    string(REPLACE "-Werror" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REPLACE "-Werror" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+  endif()
+  fetchcontent_makeavailable(azure_sdk)
+  set(AZURE_SDK_VENDORED
+      TRUE
+      PARENT_SCOPE)
+  list(APPEND
+       ARROW_BUNDLED_STATIC_LIBS
+       Azure::azure-core
+       Azure::azure-identity
+       Azure::azure-storage-blobs
+       Azure::azure-storage-common
+       Azure::azure-storage-files-datalake)
+  set(ARROW_BUNDLED_STATIC_LIBS
+      ${ARROW_BUNDLED_STATIC_LIBS}
+      PARENT_SCOPE)
+endfunction()
+
+if(ARROW_WITH_AZURE_SDK)
+  resolve_dependency(Azure REQUIRED_VERSION 1.10.2)
+  set(AZURE_SDK_LINK_LIBRARIES
+      Azure::azure-storage-files-datalake
+      Azure::azure-storage-common
+      Azure::azure-storage-blobs
+      Azure::azure-identity
+      Azure::azure-core)
+endif()
 # ----------------------------------------------------------------------
 # ucx - communication framework for modern, high-bandwidth and low-latency networks
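
The `ARROW_AZURE_SDK_URL` block in this file follows the same override pattern as the other bundled dependencies: an environment variable, if defined, replaces the download URL; otherwise the URL is derived from the pinned version. The sketch below restates that selection in shell purely for illustration. `resolve_azure_sdk_url` and the local path are hypothetical, and the sketch tests a plain shell variable where CMake's `DEFINED ENV{...}` tests the process environment.

```shell
#!/bin/sh
# Version string copied from cpp/thirdparty/versions.txt in this commit.
ARROW_AZURE_SDK_BUILD_VERSION="azure-core_1.10.2"

# Hypothetical helper: an ARROW_AZURE_SDK_URL that is defined (even if empty)
# wins; otherwise fall back to the GitHub archive URL for the pinned version.
resolve_azure_sdk_url() {
  if [ -n "${ARROW_AZURE_SDK_URL+set}" ]; then
    printf '%s\n' "${ARROW_AZURE_SDK_URL}"
  else
    printf '%s\n' "https://github.com/Azure/azure-sdk-for-cpp/archive/${ARROW_AZURE_SDK_BUILD_VERSION}.tar.gz"
  fi
}

# Without an override, the URL is built from the version.
unset ARROW_AZURE_SDK_URL
default_url=$(resolve_azure_sdk_url)

# With an override, for example a pre-downloaded archive (hypothetical path).
ARROW_AZURE_SDK_URL="/tmp/azure-sdk-for-cpp.tar.gz"
override_url=$(resolve_azure_sdk_url)
unset ARROW_AZURE_SDK_URL

echo "default:  ${default_url}"
echo "override: ${override_url}"
```

An override like this is mainly useful for offline or mirrored builds; the `URL_HASH` passed to `fetchcontent_declare` still has to match whatever archive the URL points to.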

cpp/src/arrow/filesystem/azurefs.cc

Lines changed: 9 additions & 0 deletions
@@ -17,6 +17,9 @@
 
 #include "arrow/filesystem/azurefs.h"
 
+#include <azure/identity/default_azure_credential.hpp>
+#include <azure/storage/blobs.hpp>
+
 #include "arrow/result.h"
 #include "arrow/util/checked_cast.h"
 
@@ -47,6 +50,12 @@ class AzureFileSystem::Impl {
       : io_context_(io_context), options_(std::move(options)) {}
 
   Status Init() {
+    // TODO: GH-18014 Delete this once we have a proper implementation. This just
+    // initializes a pointless Azure blob service client with a fake endpoint to ensure
+    // the build will fail if the Azure SDK build is broken.
+    auto default_credential = std::make_shared<Azure::Identity::DefaultAzureCredential>();
+    auto service_client = Azure::Storage::Blobs::BlobServiceClient(
+        "http://fake-blob-storage-endpoint", default_credential);
     if (options_.backend == AzureBackend::Azurite) {
       // gen1Client_->GetAccountInfo().Value.IsHierarchicalNamespaceEnabled
       // throws error in azurite

cpp/src/arrow/filesystem/azurefs_test.cc

Lines changed: 39 additions & 6 deletions
@@ -45,6 +45,12 @@
 #include "arrow/testing/gtest_util.h"
 #include "arrow/testing/util.h"
 
+#include <azure/identity/client_secret_credential.hpp>
+#include <azure/identity/default_azure_credential.hpp>
+#include <azure/identity/managed_identity_credential.hpp>
+#include <azure/storage/blobs.hpp>
+#include <azure/storage/common/storage_credential.hpp>
+
 namespace arrow {
 using internal::TemporaryDir;
 namespace fs {
@@ -105,15 +111,42 @@ AzuriteEnv* GetAzuriteEnv() {
   return ::arrow::internal::checked_cast<AzuriteEnv*>(azurite_env);
 }
 
-// Placeholder tests for file structure
+// Placeholder tests
 // TODO: GH-18014 Remove once a proper test is added
-TEST(AzureFileSystem, InitialiseAzurite) {
+TEST(AzureFileSystem, UploadThenDownload) {
+  const std::string container_name = "sample-container";
+  const std::string blob_name = "sample-blob.txt";
+  const std::string blob_content = "Hello Azure!";
+
   const std::string& account_name = GetAzuriteEnv()->account_name();
   const std::string& account_key = GetAzuriteEnv()->account_key();
-  EXPECT_EQ(account_name, "devstoreaccount1");
-  EXPECT_EQ(account_key,
-            "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/"
-            "K1SZFPTOtr/KBHBeksoGMGw==");
+
+  auto credential = std::make_shared<Azure::Storage::StorageSharedKeyCredential>(
+      account_name, account_key);
+
+  auto service_client = Azure::Storage::Blobs::BlobServiceClient(
+      std::string("http://127.0.0.1:10000/") + account_name, credential);
+  auto container_client = service_client.GetBlobContainerClient(container_name);
+  container_client.CreateIfNotExists();
+  auto blob_client = container_client.GetBlockBlobClient(blob_name);
+
+  std::vector<uint8_t> buffer(blob_content.begin(), blob_content.end());
+  blob_client.UploadFrom(buffer.data(), buffer.size());
+
+  std::vector<uint8_t> downloaded_content(blob_content.size());
+  blob_client.DownloadTo(downloaded_content.data(), downloaded_content.size());
+
+  EXPECT_EQ(std::string(downloaded_content.begin(), downloaded_content.end()),
+            blob_content);
+}
+
+TEST(AzureFileSystem, InitializeCredentials) {
+  auto default_credential = std::make_shared<Azure::Identity::DefaultAzureCredential>();
+  auto managed_identity_credential =
+      std::make_shared<Azure::Identity::ManagedIdentityCredential>();
+  auto service_principal_credential =
+      std::make_shared<Azure::Identity::ClientSecretCredential>("tenant_id", "client_id",
+                                                                "client_secret");
 }
 
 TEST(AzureFileSystem, OptionsCompare) {

cpp/thirdparty/versions.txt

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,9 @@ ARROW_AWS_LC_BUILD_VERSION=v1.3.0
 ARROW_AWS_LC_BUILD_SHA256_CHECKSUM=ae96a3567161552744fc0cae8b4d68ed88b1ec0f3d3c98700070115356da5a37
 ARROW_AWSSDK_BUILD_VERSION=1.10.55
 ARROW_AWSSDK_BUILD_SHA256_CHECKSUM=2d552fb1a84bef4a9b65e34aa7031851ed2aef5319e02cc6e4cb735c48aa30de
+# Despite the confusing version name this is still the whole Azure SDK for C++ including core, keyvault, storage-common, etc.
+ARROW_AZURE_SDK_BUILD_VERSION=azure-core_1.10.2
+ARROW_AZURE_SDK_BUILD_SHA256_CHECKSUM=36557dae87de4cdd257d9b441d9a7f043290eae6666fb1065e0fa486ae3e58a0
 ARROW_BOOST_BUILD_VERSION=1.81.0
 ARROW_BOOST_BUILD_SHA256_CHECKSUM=9e0ffae35528c35f90468997bc8d99500bf179cbae355415a89a600c38e13574
 ARROW_BROTLI_BUILD_VERSION=v1.0.9
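
The `ARROW_AZURE_SDK_BUILD_SHA256_CHECKSUM` entry is what `URL_HASH` in `build_azure_sdk()` checks the downloaded archive against. When bumping the version, the new checksum can be computed locally in much the same way. The following is a minimal sketch, assuming the coreutils `sha256sum` tool is available; `verify_sha256` is a hypothetical helper, and a small temporary file stands in for the real tarball so that nothing is downloaded.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the SHA-256 digest of the file ($1)
# equals the expected lowercase hex digest ($2).
verify_sha256() {
  vs_actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$vs_actual" = "$2" ]
}

# Stand-in for a downloaded archive: a temporary file holding the bytes "abc".
demo_file=$(mktemp)
printf 'abc' > "$demo_file"

# Well-known SHA-256 test vector for "abc".
abc_sha256="ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
# Checksum pinned in versions.txt for the real Azure SDK archive; it must
# not match the demo file.
azure_sha256="36557dae87de4cdd257d9b441d9a7f043290eae6666fb1065e0fa486ae3e58a0"

if verify_sha256 "$demo_file" "$abc_sha256"; then match_result=ok; else match_result=mismatch; fi
if verify_sha256 "$demo_file" "$azure_sha256"; then wrong_result=ok; else wrong_result=mismatch; fi

rm -f "$demo_file"
echo "expected digest: ${match_result}"
echo "pinned digest:   ${wrong_result}"
```

For a real version bump, the same helper would be pointed at the archive downloaded from the URL built from `ARROW_AZURE_SDK_BUILD_VERSION`.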
