Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Mar 28, 2025

No description provided.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 28, 2025

@lidavidm @zhjwpku PTAL :)

#ifndef ARROW_C_STREAM_INTERFACE
# define ARROW_C_STREAM_INTERFACE

struct ArrowArrayStream {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention [0] in the \file description?

I have a feeling that this is not related to PR $titile, so maybe another PR for this?

[0] https://arrow.apache.org/docs/format/CStreamInterface.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to add this, otherwise compiling files with headers from nanoarrow will complain missing definition of ArrowArrayStream.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then the patch LGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed ArrowArrayStream. I think internally we can just include nanoarrow.h so we don't have to add ArrowArrayStream and confuse the downstream.

ASSERT_EQ(imported_schema->num_fields(), 1);

auto field = imported_schema->field(0);
CheckArrowField(*field, ::arrow::Type::STRUCT, kStructFieldName, /*nullable=*/false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically you should ASSERT_NO_FATAL_FAILURE(CheckArrowField(...))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I forgot to add them.

test/test_util.h Outdated

#pragma once

#define EXPECT_OK(value) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you can write a GMock matcher using one of the macros GMock provides, then you can do something like ASSERT_THAT(Foo(), IsOk()). A macro might be simpler, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! Added matchers.h for this.

std::string_view name = "", int32_t field_id = -1) {
ArrowBuffer metadata_buffer;
NANOARROW_RETURN_NOT_OK(ArrowMetadataBuilderInit(&metadata_buffer, nullptr));
if (field_id > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, where in the Iceberg spec does it say that field IDs must be nonzero? (I don't even see a bit width defined for field ID in the spec...)

optional might be a more idiomatic way to write the type but this works (if we can find a reference for the field ID type)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! In practice, parquet-cpp rejects negative field_id: https://github.com/apache/arrow/blob/618ef501a21375abfaeee19e393eb64dee83ef0d/cpp/src/parquet/arrow/schema.cc#L248-L278

@Fokko @rdblue Has this been discussed before? Should we restrict field_id to be non-negative from the spec?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, should field ID 0 be valid then though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to use std::optional<int32_t> for now.

Copy link
Collaborator

@zhjwpku zhjwpku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wgtmac
Copy link
Member Author

wgtmac commented Mar 31, 2025

Thanks @lidavidm @zhjwpku! I believe I have addressed all the comments now.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 31, 2025

@Fokko @Xuanwo Could you help review and merge it? Thanks!

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank @wgtmac for working on this. I have some dumb questions over this PR.

constexpr const char* kArrowExtensionMetadata = "ARROW:extension:metadata";

// Convert an Iceberg type to Arrow schema. Return value is Nanoarrow error code.
ArrowErrorCode ConvertToArrowSchema(const Type& type, ArrowSchema* schema, bool optional,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConvertToArrowSchema and ToArrowSchema are a bit confusing to me. Do we need to make the API naming more clear or split them in different namespaces?


By the way, do we have a policy for our API args' order?

input_field_a, out, input_field_b, input_field_c seems not good to me and hard to follow.

Is it widely adopted in cpp community?

Copy link
Member Author

@wgtmac wgtmac Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Actually they are in different namespaces (the former one is in the anonymous namespace). I have renamed them to ToArrowSchema to be consistent.

W.r.t. the order of arguments, those with default values must be placed at the end. Since they are all internal functions, I have just removed any default argument and put the out param to the end.

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, make sense to me

@Xuanwo Xuanwo merged commit faf9cc8 into apache:main Mar 31, 2025
6 checks passed
@wgtmac
Copy link
Member Author

wgtmac commented Mar 31, 2025

Thank you @Xuanwo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants