This repository was archived by the owner on Oct 10, 2025. It is now read-only.
Branch updated from 4e36d27 to e2e77a5.
Benchmark Result
Master commit hash:
ray6080 approved these changes on Aug 21, 2024.
// Bind physical create node table info
auto pkDefinition = getDefinition(propertyDefinitions, extraInfo.pkName);
std::vector<PropertyDefinition> physicalPropertyDefinitions;
physicalPropertyDefinitions.push_back(pkDefinition.copy());
Contributor: Do you need to copy here? `getDefinition` already returns a copy.
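A minimal sketch of the reviewer's point, assuming (as the comment states) that `getDefinition` returns its result by value. The returned object is then already independent of `propertyDefinitions`, so it can be moved into the vector instead of being copied a second time. `PropertyDefinition`, `getDefinition`, and the helper `buildPhysicalDefinitions` below are simplified, hypothetical stand-ins, not Kuzu's actual classes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for Kuzu's PropertyDefinition.
struct PropertyDefinition {
    std::string name;
    std::string type;
    // Explicit copy, mirroring the .copy() call in the diff.
    PropertyDefinition copy() const { return *this; }
};

// Hypothetical stand-in: returns the matching definition BY VALUE,
// so the caller already owns an independent copy.
PropertyDefinition getDefinition(const std::vector<PropertyDefinition>& defs,
                                 const std::string& name) {
    for (const auto& def : defs) {
        if (def.name == name) {
            return def.copy();
        }
    }
    return PropertyDefinition{};  // not found: empty definition (sketch only)
}

// Hypothetical helper wrapping the lines under review.
std::vector<PropertyDefinition> buildPhysicalDefinitions(
    const std::vector<PropertyDefinition>& propertyDefinitions, const std::string& pkName) {
    auto pkDefinition = getDefinition(propertyDefinitions, pkName);
    std::vector<PropertyDefinition> physicalPropertyDefinitions;
    // pkDefinition is already a copy, so move it rather than calling .copy() again.
    physicalPropertyDefinitions.push_back(std::move(pkDefinition));
    return physicalPropertyDefinitions;
}
```

The move version also compiles if the real class is move-only and exposes copying only through an explicit `copy()`.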
static std::vector<std::string> getPropertyNames(const std::vector<TableCatalogEntry*>& entries) {
std::vector<std::string> result;
std::unordered_set<std::string> propertyNamesSet;
auto distinctVector = DistinctVector<std::string>();
Contributor: The `DistinctVector` usage seems unnecessarily complicated here. Why not just use a set and copy it to a vector at the end?
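A minimal sketch of the reviewer's suggestion: collect names into a set to drop duplicates, then copy the set into a vector once at the end. `TableEntry` is a hypothetical, simplified stand-in for `TableCatalogEntry` that only carries property names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for TableCatalogEntry.
struct TableEntry {
    std::vector<std::string> propertyNames;
};

// Deduplicate with a set, then copy to a vector in one step.
// Note: std::set yields names in sorted order, not first-seen order.
std::vector<std::string> getPropertyNames(const std::vector<const TableEntry*>& entries) {
    std::set<std::string> names;
    for (const auto* entry : entries) {
        names.insert(entry->propertyNames.begin(), entry->propertyNames.end());
    }
    return std::vector<std::string>(names.begin(), names.end());
}
```

If first-seen order matters (the thread does not say whether `DistinctVector` preserves it), an equally simple variant keeps an `std::unordered_set` of seen names and pushes a name into the result vector only when `insert(...).second` is true.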
newSubgraph.addQueryNode(nodePos);
auto plan = std::make_unique<LogicalPlan>();
appendScanNodeTable(node->getInternalID(), node->getTableIDs(), {}, *plan);
appendScanNodeTable(node->getInternalID(), {}, node->getEntries(), *plan);
Contributor: Better to add an inline comment explaining the `{}` argument.
auto [boundNode, nbrNode] = getBoundAndNbrNodes(*rel, direction);
const auto extendDirection = getExtendDirection(*rel, *boundNode);
appendScanNodeTable(boundNode->getInternalID(), boundNode->getTableIDs(), {}, *plan);
appendScanNodeTable(boundNode->getInternalID(), {}, boundNode->getEntries(), *plan);
Contributor: Ditto on adding an inline comment to the `{}` argument.
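One common way to document a bare `{}` at a call site is a `/*paramName=*/` comment, a convention that clang-tidy's `bugprone-argument-comment` check can verify against the declared parameter names. The sketch below assumes the empty argument is the list of properties to scan; the simplified `LogicalPlan`, the signature, and the parameter names are hypothetical and only loosely modeled on the call in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified plan: records how many properties each scan reads.
struct LogicalPlan {
    std::vector<std::size_t> scannedPropertyCounts;
};

// Hypothetical signature; parameter names are assumptions for illustration.
void appendScanNodeTable(const std::string& nodeID, const std::vector<std::string>& properties,
                         const std::vector<std::string>& entries, LogicalPlan& plan) {
    (void)nodeID;   // unused in this sketch
    (void)entries;  // unused in this sketch
    plan.scannedPropertyCounts.push_back(properties.size());
}

void planScanOfNodeIDOnly(LogicalPlan& plan) {
    // The inline comment states what the empty braces mean at the call site.
    appendScanNodeTable("a._id", /*properties=*/{}, /*entries=*/{"Comment"}, plan);
}
```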
Branch updated from 3a97fc6 to b386c41.
Description
This is our initial PR to support directly executing Cypher on relational databases. It contains logic for node tables only. The major changes include:
Nested catalog entry.
We add a new catalog entry type, `ExternalNodeTable`, which has a nested entry structure. At the parent level, it maintains the logical view of properties, which aligns with the columns in the relational table. At the child level, it maintains another catalog entry that contains only the primary key property; this child entry aligns with our physical storage. In the current design, we still need to materialize the primary key and use it as the join condition when reading a property that does not exist in storage.
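A minimal sketch of the nested entry described above: a parent entry holding the logical view (all external columns) and a child entry holding only the primary key, which is what gets stored locally. All type, field, and function names here, including the `_physical` suffix, are hypothetical and do not reflect the actual `ExternalNodeTable` implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified property description.
struct Property {
    std::string name;
    std::string type;
};

// Child entry: mirrors physical storage (primary key only).
struct PhysicalNodeTableEntry {
    std::string name;
    std::vector<Property> properties;
};

// Parent entry: logical view matching the external table's columns.
struct ExternalNodeTableEntry {
    std::string name;
    std::string pkName;
    std::vector<Property> properties;  // all external columns
    std::unique_ptr<PhysicalNodeTableEntry> physicalEntry;

    // True only for properties the child entry stores locally; any other
    // property must be read from the external table via a PK join.
    bool isStoredLocally(const std::string& propertyName) const {
        for (const auto& p : physicalEntry->properties) {
            if (p.name == propertyName) return true;
        }
        return false;
    }
};

ExternalNodeTableEntry makeExternalEntry(const std::string& name, const std::string& pkName,
                                         std::vector<Property> columns) {
    ExternalNodeTableEntry entry;
    entry.name = name;
    entry.pkName = pkName;
    auto child = std::make_unique<PhysicalNodeTableEntry>();
    child->name = name + "_physical";  // hypothetical naming scheme
    for (const auto& column : columns) {
        if (column.name == pkName) child->properties.push_back(column);  // keep the PK only
    }
    entry.properties = std::move(columns);
    entry.physicalEntry = std::move(child);
    return entry;
}
```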
Scan external table
When we run `MATCH (a:label)`, where `label` points to an external relational table, we need to scan the external relational table and the primary key column materialized in our storage, and then perform a join on the primary key.
Some sanity-check benchmark numbers:
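A minimal sketch of the scan-then-join step, written as an in-memory hash join: build a hash table over the primary key column materialized locally, then probe it with the rows scanned from the external table. The row types are hypothetical, and which side the actual implementation builds on is not stated in the PR; this only illustrates joining on the primary key.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row shapes for illustration.
struct ExternalRow {      // scanned from the external relational table
    int64_t pk;
    std::string content;  // a property that exists only externally
};
struct LocalPKRow {       // primary key column materialized in local storage
    uint64_t nodeOffset;  // local internal node ID
    int64_t pk;
};
struct JoinedRow {
    uint64_t nodeOffset;
    int64_t pk;
    std::string content;
};

// Inner hash join on the primary key: build on the local PK column,
// probe with external rows; external rows without a local match are dropped.
std::vector<JoinedRow> joinOnPrimaryKey(const std::vector<LocalPKRow>& local,
                                        const std::vector<ExternalRow>& external) {
    std::unordered_map<int64_t, uint64_t> pkToOffset;
    pkToOffset.reserve(local.size());
    for (const auto& row : local) {
        pkToOffset.emplace(row.pk, row.nodeOffset);
    }
    std::vector<JoinedRow> result;
    for (const auto& row : external) {
        auto it = pkToOffset.find(row.pk);
        if (it != pkToOffset.end()) {
            result.push_back({it->second, row.pk, row.content});
        }
    }
    return result;
}
```

The output follows the external scan order, since the external rows drive the probe loop.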
Setup
LDBC10 `Comment` table stored in a DuckDB database, 8 threads.
DuckDB native scanning: 0.3s.
Kuzu scanning DuckDB: 2s.
Scanning external database: copy primary key (6s) + join (3s) = 9s.
Being slower than DuckDB is expected, since we first need to materialize DuckDB's result and then re-scan it. The major overhead is our scan of DuckDB's result; @acquamarin should check whether we can optimize this further.
Another bottleneck is copying the primary key. I'm fairly confident I can bring that time down to ~2s with some optimization.
Fixes # (issue)
Contributor agreement