Skip to content

Conversation

@dongxiao1198
Copy link
Contributor

@dongxiao1198 dongxiao1198 commented Aug 14, 2025

1 add v1v2v3 writer definition
2 add v1v2v3 metadata wrapper

@dongxiao1198 dongxiao1198 marked this pull request as ready for review August 15, 2025 02:16
case 1:
return std::make_unique<ManifestListWriterV1>(snapshot_id, parent_snapshot_id,
std::move(writer), std::move(schema));
case 2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sequence_number is required for V2 manifest list, so here needs check validation for v2?

@wgtmac
Copy link
Member

wgtmac commented Aug 19, 2025

What about this? ManifestListWriter does not need to be an abstract class. Only ManifestFileAdapter should have subclasses from v1 ~ v4 to deal with different schemas and value-filling logics. @dongxiao1198 @HeartLinked

// Implemented by different versions with different schemas to
// append a list of `ManifestFile`s to an `ArrowArray`.
class ManifestFileAdapter {
 public:
  virtual Status StartAppending() const = 0;
  virtual Status Append(const ManifestFile& file) const = 0;
  virtual Result<ArrowArray> FinishAppending() const = 0;
  int64_t size() const { return size_; }

 private:
  std::shared_ptr<Schema> manifest_list_schema_;
  ArrowSchema schema_;  // converted from manifest_list_schema_
  ArrowArray array_;    // array to append `ManifestFile`s
  int64_t size_;
};

class ICEBERG_EXPORT ManifestListWriter {
 public:
  static Result<std::unique_ptr<ManifestListWriter>> Make(
      std::string_view manifest_list_location, std::shared_ptr<FileIO> file_io,
      const std::unordered_map<std::string, std::string>& meta);

  Status AddAll(const std::vector<ManifestFile>& files) const {
    for (const auto& file : files) {
      Add(file);
    }
    return {};
  }

  Status Add(const ManifestFile& file) const {
    if (adapter_->size() >= kBatchSize) {
      ICEBERG_ASSIGN_OR_RAISE(auto array, adapter_->FinishAppending());
      ICEBERG_RETURN_UNEXPECTED(writer_->Write(array));
      ICEBERG_RETURN_UNEXPECTED(adapter_->StartAppending());
    }
    return adapter_->Append(file);
  }

  Status Close() const;

 private:
  static constexpr int64_t kBatchSize = 1024;
  std::unique_ptr<Writer> writer_;
  std::unique_ptr<ManifestFileAdapter> adapter_;
};

@dongxiao1198
Copy link
Contributor Author

What about this? ManifestListWriter does not need to be an abstract class. Only ManifestFileAdapter should have subclasses from v1 ~ v4 to deal with different schemas and value-filling logics. @dongxiao1198 @HeartLinked

// Implemented by different versions with different schemas to
// append a list of `ManifestFile`s to an `ArrowArray`.
class ManifestFileAdapter {
 public:
  virtual Status StartAppending() const = 0;
  virtual Status Append(const ManifestFile& file) const = 0;
  virtual Result<ArrowArray> FinishAppending() const = 0;
  int64_t size() const { return size_; }

 private:
  std::shared_ptr<Schema> manifest_list_schema_;
  ArrowSchema schema_;  // converted from manifest_list_schema_
  ArrowArray array_;    // array to append `ManifestFile`s
  int64_t size_;
};

class ICEBERG_EXPORT ManifestListWriter {
 public:
  static Result<std::unique_ptr<ManifestListWriter>> Make(
      std::string_view manifest_list_location, std::shared_ptr<FileIO> file_io,
      const std::unordered_map<std::string, std::string>& meta);

  Status AddAll(const std::vector<ManifestFile>& files) const {
    for (const auto& file : files) {
      Add(file);
    }
    return {};
  }

  Status Add(const ManifestFile& file) const {
    if (adapter_->size() >= kBatchSize) {
      ICEBERG_ASSIGN_OR_RAISE(auto array, adapter_->FinishAppending());
      ICEBERG_RETURN_UNEXPECTED(writer_->Write(array));
      ICEBERG_RETURN_UNEXPECTED(adapter_->StartAppending());
    }
    return adapter_->Append(file);
  }

  Status Close() const;

 private:
  static constexpr int64_t kBatchSize = 1024;
  std::unique_ptr<Writer> writer_;
  std::unique_ptr<ManifestFileAdapter> adapter_;
};

The latest patch add a new header for this adapter at src/iceberg/metadata_adapter.h.
Split writer maker function into 4 independent builder function.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Left some nits.

@zeroshade zeroshade merged commit 7404f14 into apache:main Sep 3, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants