-
Notifications
You must be signed in to change notification settings - Fork 70
feat: implement initial MemoryCatalog functionality with namespace and table support #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gty404
commented
Apr 17, 2025
- Added NamespaceContainer class to manage namespace hierarchy and table metadata
- Implemented MemoryCatalog methods: ListTables, TableExists, DropTable, RegisterTable
2774dc9 to
e47caa0
Compare
src/iceberg/catalog/memory_catalog.h
Outdated
| * the children of the root are returned. | ||
| * @return A vector of child namespace names. | ||
| */ | ||
| std::vector<std::string> ListChildrenNamespaces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return std::vector<Namespace>? Does it return next level or all levels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is returning the namespace of the next level, returning std::string instead of Namespace, I think this makes it easier to get the name.
|
I know that you already took the time and effort to write this, but I did want to put this out here. In PyIceberg we've removed the MemoryCatalog from the codebase, mostly because the SQLCatalog (JDBCCatalog in Java) was proven to be much better. The SQLCatalog also allows running in-memory using a SQLite in-memory instance, but also provides the option to easily keep the state after the process terminates. Next to that, it also provides locking out of the box. |
|
@yingcai-cy @wgtmac @lidavidm @zhjwpku Could you help me review it again? |
zhjwpku
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two minor review comments, other than that, LGTM.
src/iceberg/catalog/memory_catalog.h
Outdated
| std::unique_ptr<iceberg::TableBuilder> BuildTable(const TableIdentifier& identifier, | ||
| const Schema& schema) const override; | ||
|
|
||
| private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: what about using pimpl idiom to hide the implementation detail like InMemoryNamespace and reduce compile time?
class ICEBERG_EXPORT InMemoryCatalog : public Catalog {
public:
...
private:
std::unique_ptr<class InMemoryCatalogImpl> impl_;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about placing InMemoryCatalogImpl and InMemoryNamespace in another header file? Their unit tests are also easy to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why unit tests are required to have access to InMemoryCatalogImpl? Isn't the Catalog interface enough to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't InMemoryCatalogImpl and InMemoryNamespace require corresponding unit tests? The interfaces provided by these two may not be completely covered by the InMemoryCatalog test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's reasonable.
src/iceberg/catalog/memory_catalog.h
Outdated
| * the children of the root are returned. | ||
| * \return A vector of child namespace names. | ||
| */ | ||
| Result<std::vector<std::string>> ListChildrenNamespaces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use std::span
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned Namespace list is a temporary variable.
src/iceberg/catalog/memory_catalog.h
Outdated
| * tables, and child namespaces. This structure enables a tree-like representation | ||
| * of nested namespaces. | ||
| */ | ||
| class ICEBERG_EXPORT InMemoryNamespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was simply mirroring the Java impl when I added the Catalog without notice of SupportsNamespace: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/catalog/SupportsNamespaces.java. Does it make sense to add all interfaces from SupportsNamespace to our Catalog API? iceberg-python did the same thing: https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/__init__.py#L588 This aligns with the rest catalog api.
@Fokko Do you have an opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @wgtmac I think it makes sense to add all of them to the Catalog API. Keep in mind that some implementations (Rest, SQL) support nested namespaces, and some don't (Hive, Glue etc)
|
|
||
| namespace iceberg { | ||
|
|
||
| class ICEBERG_EXPORT InMemoryCatalog : public Catalog { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some comments to describe its purpose. Especially to clarify that it is not supposed for production use.
| * tables, and child namespaces. This structure enables a tree-like representation | ||
| * of nested namespaces. | ||
| */ | ||
| class ICEBERG_EXPORT InMemoryNamespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in #80 (comment), we need to move some of these functions (which are included in the rest catalog spec) as a part of the public Catalog API.
| * \param[in] namespace_ident The namespace to check. | ||
| * \return Status indicating success or failure. | ||
| */ | ||
| Status NamespaceExists(const Namespace& namespace_ident) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Result<bool>
| * the children of the root are returned. | ||
| * \return A vector of child namespace names. | ||
| */ | ||
| Result<std::vector<std::string>> ListChildrenNamespaces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If parent can be optional, is it better to rename it to ListNamespaces?
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating it! I've left some minor comments.
src/iceberg/catalog.h
Outdated
| /// \return Status::OK if updated successfully; | ||
| /// ErrorKind::kNoSuchNamespace if the namespace does not exist; | ||
| /// ErrorKind::kNotSupported if the operation is not supported | ||
| virtual Status SetNamespaceProperties( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about merging SetNamespaceProperties and RemoveNamespaceProperties into a single UpdateNamespaceProperties(updates, removals)? This is a single call in the rest catalog spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am referring to the implementation in java implementations, which has two interfaces. Should we follow the REST catalog spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko WDYT?
| /** | ||
| * @brief An in-memory implementation of the Iceberg Catalog interface. | ||
| * | ||
| * This catalog stores all metadata purely in memory, with no persistence to disk | ||
| * or external systems. It is primarily intended for unit tests, prototyping, or | ||
| * demonstration purposes. | ||
| * | ||
| * @note This class is **not** suitable for production use. | ||
| * All data will be lost when the process exits. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** | |
| * @brief An in-memory implementation of the Iceberg Catalog interface. | |
| * | |
| * This catalog stores all metadata purely in memory, with no persistence to disk | |
| * or external systems. It is primarily intended for unit tests, prototyping, or | |
| * demonstration purposes. | |
| * | |
| * @note This class is **not** suitable for production use. | |
| * All data will be lost when the process exits. | |
| */ | |
| /// \brief An in-memory implementation of the Iceberg Catalog interface. | |
| /// | |
| /// This catalog stores all metadata purely in memory, with no persistence to disk | |
| /// or external systems. It is primarily intended for unit tests, prototyping, or | |
| /// demonstration purposes. | |
| /// | |
| /// \note This class is **not** suitable for production use. | |
| /// All data will be lost when the process exits. |
|
Yes, looks good @gty404 thanks! |