-
Notifications
You must be signed in to change notification settings - Fork 67
Add iceberg_arrow library #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me, but if we're going to require CMake 3.25, is there anything that would simplify dependency management? Arrow has been discussing requiring this version since there are apparently new features that greatly simplify things like this.
|
e.g. it seems we could integrate find_package and FetchContent more naturally and not rely on the pile of code Arrow uses: https://stackoverflow.com/questions/73621403/cmake-integrating-fetchcontent-with-find-package |
That would certainly be preferrable. The "Arrow way" of handling this has been a huge time sink. |
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some minor CI before starting to push more code. I've created:
#7
to have at least a build validation.
I agree with David and Antoine that the idea of using a newer CMake is to try and simplify all the Thirdparty dependency management
|
That makes sense! Let me spend some time to simplify it. |
3311817 to
9f7e725
Compare
|
Well, it seems that I cannot make the Windows CI happy. It complains unresolved Arrow symbols during linking. I think I need a Windows PC to debug it. |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can continue refining the CMake setup over time. If possible it would be good to use modern features since Arrow's setup (while well tested by now) can be rather brittle
|
@zhjwpku Yes, I am already working on simplifying it by integrating |
Sure, thanks. Take your time :) |
- add libiceberg_arrow - use single config file with three separate targets to support components - add fetchcontent support
api/iceberg/visibility.h
Outdated
| #if defined(_WIN32) || defined(__CYGWIN__) | ||
| # ifdef ICEBERG_STATIC | ||
| # define ICEBERG_EXPORT | ||
| # else | ||
| # ifdef ICEBERG_EXPORTING | ||
| # define ICEBERG_EXPORT __declspec(dllexport) | ||
| # else | ||
| # define ICEBERG_EXPORT __declspec(dllimport) | ||
| # endif | ||
| # endif | ||
| #else | ||
| # define ICEBERG_EXPORT | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to prepare this for each shared library. (We need ICEBERG_CORE_EXPORT/ICEBERG_PUFFIN_EXPORT/ICEBERG_ARROW_EXPORT.)
Because libiceberg_arrow.dll will use exported symbols in libiceberg_core.dll. In this case, libiceberg_arrow.dll uses _declspec(dllexport) for libiceberg_arrow.dll symbols and _declspec(dllimport) for libiceberg_core.dll symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. What if we define ICEBERG_EXPORTING to all of libiceberg_core.dll, libiceberg_arrow.dll and libiceberg_puffin.dll? I think we can simply keep the same visibility to all libraries for now because the core library provides the API and core logic while puffin and arrow provide concrete implementations of the core library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it causes link errors. If libiceberg_arrow.dll uses a libiceberg_core.dll symbol, libiceberg_arrow.dll uses __declspec(dllexport)-ed libiceberg_core.dll symbols...
Can we add a simple function to libiceberg_core.dll and use it from libiceberg_arrow.dll?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try it on my Windows PC tonight. Thanks!
EDIT: I have added a new function DemoTable::name() in demo_table.h and called it in demo_arrow.cc. It does not have linking error. @kou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is relevant to how we manage the public headers from different iceberg libraries. As per previous discussion, we use a separate api folder holds all public header files from all libraries (i.e. core, puffin, arrow, etc.). IIUC, the current approach makes it difficult to manage the visibility of the header file across libraries. I'd propose to remove the current api folder and let each library manage its own header files. WDYT? @kou @raulcd @zhjwpku @gaborkaszab @lidavidm @pitrou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having separate directories per library is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a new function
DemoTable::name()in demo_table.h and called it in demo_arrow.cc. It does not have linking error.
Hmm. It may have a real world problem because libiceberg_arrow.lib is always used with libiceberg_core.lib...
FYI: If we use the same ICEBERG_EXPORTING for all libraries, libiceberg_arrow.lib also exports symbols in libiceberg_core.dll but libiceberg_arrow.dll doesn't have them:
$ nm iceberg_arrow.lib | grep Demo
0000000000000000 T ??0DemoTable@iceberg@@QEAA@AEBV01@@Z
0000000000000000 I __imp_??0DemoTable@iceberg@@QEAA@AEBV01@@Z
0000000000000000 T ??0DemoTable@iceberg@@QEAA@XZ
0000000000000000 I __imp_??0DemoTable@iceberg@@QEAA@XZ
0000000000000000 T ??1DemoTable@iceberg@@UEAA@XZ
0000000000000000 I __imp_??1DemoTable@iceberg@@UEAA@XZ
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 T ??4DemoTable@iceberg@@QEAAAEAV01@AEBV01@@Z
0000000000000000 I __imp_??4DemoTable@iceberg@@QEAAAEAV01@AEBV01@@Z
0000000000000000 I __imp_??_7DemoTable@iceberg@@6B@
0000000000000000 T ?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 I __imp_?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 T ?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ
0000000000000000 I __imp_?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZIf we use separated _EXPORTING macro for each library, DemoTable symbols aren't exported by libiceberg_arrow.dll:
$ nm iceberg_arrow.lib | grep Demo
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 T ?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 I __imp_?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 T ?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ
0000000000000000 I __imp_?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose to remove the current
apifolder and let each library manage its own header files.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the api folder and move the header files to each library instead.
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
cmake_modules/BuildUtils.cmake
Outdated
| string(TOUPPER ${LIB_NAME} LIB_NAME_UPPER) | ||
| if(BUILD_SHARED) | ||
| generate_export_header(${LIB_NAME}_shared BASE_NAME ${LIB_NAME_UPPER}) | ||
| target_compile_definitions(${LIB_NAME}_shared PUBLIC ${LIB_NAME}_shared_EXPORTS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
It seems that this is defined automatically: https://cmake.org/cmake/help/latest/prop_tgt/DEFINE_SYMBOL.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I added this to fix a linking error. I can try to remove this again to confirm.
Update: It failed at linking again: https://github.com/apache/iceberg-cpp/actions/runs/12591221855/job/35094013478?pr=6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/iceberg-cpp/actions/runs/12591221855/job/35094013478#step:3:422
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: virtual __cdecl iceberg::Puffin::~Puffin(void)" (__imp_??1Puffin@iceberg@@UEAA@XZ) referenced in function "public: virtual __cdecl iceberg::puffin::DemoPuffin::~DemoPuffin(void)" (??1DemoPuffin@puffin@iceberg@@UEAA@XZ) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl iceberg::Puffin::Puffin(void)" (__imp_??0Puffin@iceberg@@QEAA@XZ) referenced in function "public: __cdecl iceberg::puffin::DemoPuffin::DemoPuffin(void)" (??0DemoPuffin@puffin@iceberg@@QEAA@XZ) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl iceberg::Puffin::Puffin(class iceberg::Puffin const &)" (__imp_??0Puffin@iceberg@@QEAA@AEBV01@@Z) referenced in function "public: __cdecl iceberg::puffin::DemoPuffin::DemoPuffin(class iceberg::puffin::DemoPuffin const &)" (??0DemoPuffin@puffin@iceberg@@QEAA@AEBV012@@Z) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: class iceberg::Puffin & __cdecl iceberg::Puffin::operator=(class iceberg::Puffin const &)" (__imp_??4Puffin@iceberg@@QEAAAEAV01@AEBV01@@Z) referenced in function "public: class iceberg::puffin::DemoPuffin & __cdecl iceberg::puffin::DemoPuffin::operator=(class iceberg::puffin::DemoPuffin const &)" (??4DemoPuffin@puffin@iceberg@@QEAAAEAV012@AEBV012@@Z) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
OK. It's an implementation problem. We should remove this line. (Or we should use PRIVATE not PUBLIC.)
iceberg::Puffin should exist in libiceberg_core.dll not libiceberg_puffin.dll. Can we define iceberg::Puffin in libiceberg_core.dll something like the following?
diff --git a/src/iceberg/demo_table.cc b/src/iceberg/demo_table.cc
index 9e46d7a..d960f1b 100644
--- a/src/iceberg/demo_table.cc
+++ b/src/iceberg/demo_table.cc
@@ -18,6 +18,7 @@
*/
#include "iceberg/demo_table.h"
+#include "iceberg/puffin.h"
namespace iceberg {
BTW, is it OK that iceberg::Puffin exists in libiceberg_core.dll not libiceberg_puffin.dll?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to use PRIVATE and it failed: https://github.com/apache/iceberg-cpp/actions/runs/12584082126/job/35072964038
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/iceberg-cpp/actions/runs/12584082126/job/35072964038#step:3:410
LINK : fatal error LNK1104: cannot open file '..\Debug\iceberg.lib' [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\arrow\iceberg_arrow_shared.vcxproj]
Building Custom Rule D:/a/iceberg-cpp/iceberg-cpp/src/iceberg/CMakeLists.txt
demo_table.cc
iceberg_static.vcxproj -> D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\Debug\iceberg_static.lib
Building Custom Rule D:/a/iceberg-cpp/iceberg-cpp/src/iceberg/arrow/CMakeLists.txt
demo_arrow.cc
iceberg_arrow_static.vcxproj -> D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\arrow\Debug\iceberg_arrow_static.lib
Building Custom Rule D:/a/iceberg-cpp/iceberg-cpp/src/iceberg/puffin/CMakeLists.txt
demo_puffin.cc
LINK : fatal error LNK1104: cannot open file '..\Debug\iceberg.lib' [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
I think that it's a different problem. Could you share the URL of the commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining the context.
How about discussing it as a follow-up task if it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can merge this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wgtmac I think let's go with the approach that is the most convenient at this point just to get going. We can re-evaluate later on once we actually do the implementation of Puffin. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Let's keep the puffin library as is and revisit its design when you start the implementation.
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Co-authored-by: Sutou Kouhei <[email protected]>
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thanks @Xuanwo! |
libiceberg_arrowto leverage arrow-backed implementations (e.g. filesystem and parquet).libiceberg_coretolibiceberg.libiceberg_arrowandlibiceberg_puffinlink tolibiceberg.apidirectory and let each library manage its own header files.