Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Apr 8, 2025

No description provided.

@wgtmac wgtmac force-pushed the add_table_metadata branch 2 times, most recently from 5871171 to 7dcadb2 Compare April 8, 2025 08:00
@wgtmac wgtmac force-pushed the add_table_metadata branch from 7dcadb2 to b76dc8e Compare April 8, 2025 08:19
@wgtmac wgtmac force-pushed the add_table_metadata branch from b76dc8e to f8ea501 Compare April 8, 2025 08:21
///
/// TODO(wgtmac): Implement Equals and ToString once SortOrder and Snapshot are
/// implemented.
struct ICEBERG_EXPORT TableMetadata {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to mark some fields as optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we need but I'm still unclear which one should change. I will postpone this decision until we implement the json serialization from/to different table format versions.

Comment on lines 114 to 115
/// whether or not to track the creation and updates to rows in the table
bool row_lineage_enabled = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Row lineage will always be enabled from V3 onwards: apache/iceberg#12593 So I think we can remove this 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads-up! I found that the Java impl still has this field which looks weird to me.

Copy link
Collaborator

@zhjwpku zhjwpku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

#include <format>
#include <string>

#include "iceberg/statistics_file.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why this include is here and not in table_metadata.h where we seem to require the following?

  std::vector<std::shared_ptr<struct StatisticsFile>> statistics;
  /// A list of partition statistics
  std::vector<std::shared_ptr<struct PartitionStatisticsFile>> partition_statistics;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to use forward declaration as much as possible. The implementation is not yet complete due to missing the concrete implementation of Snapshot and other classes so it looks weird that iceberg/statistics_file.h is included but not used at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but as those would still be part of the table metadata API wouldn't it be better to be part of the header file? I am learning C++, I just thought it would still be better there even when we add the implementation part but I might be wrong :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a common practice to use forward declaration to speed up compilation if the implementation detail is not required in the header file, though I suspect that modern compilers are smart enough to optimize this.

@Fokko Fokko merged commit 2819128 into apache:main Apr 8, 2025
6 checks passed
@Fokko
Copy link
Contributor

Fokko commented Apr 8, 2025

Thanks @wgtmac for working on this, and thanks @zhjwpku for the review 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants