Skip to content

fix: Fix overflow for vertex/edge insersion#8

Open
zhanglei1949 wants to merge 23 commits intomainfrom
fix-overflow
Open

fix: Fix overflow for vertex/edge insersion#8
zhanglei1949 wants to merge 23 commits intomainfrom
fix-overflow

Conversation

@zhanglei1949
Copy link
Collaborator

@zhanglei1949 zhanglei1949 commented Mar 9, 2026

Fix #7

Related Greptile view at https://github.com/GraphScope/neug/pull/1725

Greptile Summary

This PR fixes overflow issues during vertex and edge insertion by introducing explicit capacity tracking with a new calculate_new_capacity() growth function (growth.h), persisted statistics files for edge tables, and EnsureCapacity methods on VertexTable, EdgeTable, and PropertyGraph. It also fixes the TypedColumn<string_view> write-cursor persistence bug (.pos file) and removes the compaction-triggered buffer shrink that was destroying reserved capacity.

Key changes:

  • New growth.h provides overflow-safe capacity growth (2× for vertices, ~20% for edges)
  • EdgeTable gains capacity_ atomic, EnsureCapacity(), and a statistics file written on Dump() / read on Open() to survive restarts
  • VertexTable::Open() signature simplified (removed unused build_empty_graph param); eager keys_->resize() removed from LFIndexer::open()
  • TypedColumn<string_view>::pos_ now persisted to a .pos file instead of being re-derived from data_size() after compaction
  • mmap_array::compact() no longer shrinks the data buffer reservation

Issues found:

  • In AddEdge (update_transaction.cc), the WAL entry is serialized and op_num_ incremented before EnsureCapacity is called. If the capacity check subsequently fails, the WAL contains a spurious uncommitted edge that was never applied, breaking WAL replay integrity.
  • The new table_cap != capacity_ integrity check in EdgeTable::Open/OpenInMemory/OpenWithHugepages breaks backward compatibility: upgrading an existing database (without statistics files) throws an exception for any non-empty edge table because capacity_ is back-filled to 0 when no statistics file is found.
  • growth.h is missing #include <limits> for std::numeric_limits.
  • insert_edges_bundled_typed_impl calls edge_data.reserve(edge_data.size()) (reserves 0) instead of edge_data.reserve(property_vec.size()).
  • read_file opens files with "r" (text mode) instead of "rb" — a portability concern for binary data.

Confidence Score: 2/5

  • Not safe to merge: WAL ordering bug in AddEdge can corrupt replay, and the missing statistics file check breaks opening any pre-existing database after upgrade.
  • Two logic-level bugs block safe merging: (1) the redo-log/op_num_ increment in AddEdge happens before EnsureCapacity, meaning a failed capacity check leaves a phantom WAL entry that will be replayed on restart; (2) the strict capacity-file integrity check in EdgeTable::Open has no fallback for the upgrade case (no statistics file, non-empty table), making existing databases irrecoverable. The rest of the changes — growth formula, column pos_ persistence, compaction fix, and id_indexer cleanup — look correct.
  • src/transaction/update_transaction.cc (WAL ordering), src/storages/graph/edge_table.cc (backward-compat integrity check)

Important Files Changed

Filename Overview
src/transaction/update_transaction.cc AddEdge serializes the redo log entry and increments op_num_ before calling EnsureCapacity — if capacity check fails, the WAL is left with a spurious uncommitted entry that could corrupt replay.
src/storages/graph/edge_table.cc New statistics-file integrity check breaks backward compatibility: databases opened after upgrading from the old code have no statistics file, causing capacity_=0 while table_cap>0, which always throws. Also contains a reserve(0) no-op in insert_edges_bundled_typed_impl.
include/neug/utils/growth.h New file introducing calculate_new_capacity with overflow guards. Missing #include <limits> for std::numeric_limits; works today only via transitive inclusion.
src/utils/file_utils.cc read_file/write_file and statistic helpers moved here. write_file now throws on error (addressing previous review). fopen uses "r" (text mode) instead of "rb" for binary data — portability concern.
src/storages/graph/property_graph.cc Dump() now correctly uses edge_table.Size() (not Capacity()) for growth calculations, addressing the previous unbounded-growth issue. EnsureCapacity overloads correctly gate on old_cap before resizing.
src/storages/graph/vertex_table.cc EnsureCapacity and Size() methods added cleanly; Open() signature simplified; Reserve() now correctly guards on existing capacity before resizing.
include/neug/utils/id_indexer.h Removed eager keys_->resize() calls on every open(), deferring capacity management to the explicit EnsureCapacity path — correct approach.
include/neug/utils/property/column.h pos_ (string write cursor) is now persisted to a .pos file rather than derived from data_size(), fixing the compaction-overwrites-reservation bug. init_pos helper correctly handles the missing-file case.
include/neug/utils/mmap_array.h compact() no longer calls data_.resize() to shrink the buffer, preserving reserved capacity after compaction. The old data_size log message is fixed to report limit_offset.

Sequence Diagram

sequenceDiagram
    participant Client
    participant UpdateTransaction
    participant PropertyGraph
    participant VertexTable
    participant EdgeTable
    participant WAL

    Client->>UpdateTransaction: AddVertex(label, oid, props)
    UpdateTransaction->>PropertyGraph: EnsureCapacity(label)
    PropertyGraph->>VertexTable: EnsureCapacity(capacity)
    VertexTable-->>PropertyGraph: new_cap
    PropertyGraph-->>UpdateTransaction: Status::OK
    UpdateTransaction->>WAL: InsertVertexRedo::Serialize(...)
    UpdateTransaction->>PropertyGraph: AddVertex(...)
    PropertyGraph-->>UpdateTransaction: vid

    Client->>UpdateTransaction: AddEdge(src, dst, edge, props)
    Note over UpdateTransaction,WAL: ⚠️ BUG: Serialize happens BEFORE EnsureCapacity
    UpdateTransaction->>WAL: InsertEdgeRedo::Serialize(...)
    UpdateTransaction->>UpdateTransaction: op_num_ += 1
    UpdateTransaction->>PropertyGraph: EnsureCapacity(src, dst, edge)
    alt EnsureCapacity fails
        PropertyGraph-->>UpdateTransaction: Status::ERR
        UpdateTransaction-->>Client: false (WAL already written!)
    else EnsureCapacity succeeds
        PropertyGraph->>EdgeTable: EnsureCapacity(capacity)
        EdgeTable-->>PropertyGraph: OK
        UpdateTransaction->>PropertyGraph: AddEdge(...)
        PropertyGraph-->>UpdateTransaction: oe_offset
    end

    Client->>PropertyGraph: Dump()
    PropertyGraph->>VertexTable: EnsureCapacity(calculate_new_capacity(size))
    PropertyGraph->>EdgeTable: EnsureCapacity(calculate_new_capacity(Size()))
    EdgeTable->>EdgeTable: table_->resize(capacity)
    PropertyGraph->>VertexTable: Dump()
    PropertyGraph->>EdgeTable: Dump()
    EdgeTable->>EdgeTable: write_statistic_file(Capacity(), Size())
Loading

Comments Outside Diff (4)

  1. src/transaction/update_transaction.cc, line 738-746 (link)

    Redo log serialized before capacity check — inconsistent transaction state on failure

    InsertEdgeRedo::Serialize and op_num_ += 1 are executed on lines 738–741 before EnsureCapacity is checked on line 742. If EnsureCapacity returns an error (e.g., the edge label triplet doesn't exist), the function returns false but the redo log (arc_) already contains a serialized edge entry and op_num_ has been incremented. This leaves the transaction's redo log in an inconsistent state and could cause incorrect replay during crash recovery.

    This is directly inconsistent with the pattern used in the AddVertex function in this same file (lines 683–691), which correctly calls EnsureCapacity before calling InsertVertexRedo::Serialize and incrementing op_num_.

    The fix is to move the EnsureCapacity call (and early return on failure) before the serialize/increment:

  2. src/transaction/update_transaction.cc, line 738-746 (link)

    Redo log serialized before capacity check — transaction log corruption on failure

    InsertEdgeRedo::Serialize (line 738) and op_num_ += 1 (line 741) execute before EnsureCapacity is checked. If EnsureCapacity returns an error and the function returns false, the redo log entry has already been appended and op_num_ incremented, but the edge was never actually inserted. This leaves the transaction log in an inconsistent state and could cause incorrect replay during recovery.

    This is directly inconsistent with the fixed AddVertex path in the same file (lines 683–692), where serialization is correctly placed after the capacity check:

    // AddVertex (correct ordering):
    auto status = graph_.EnsureCapacity(label);
    if (!status.ok()) { return false; }
    InsertVertexRedo::Serialize(arc_, label, oid, props);
    op_num_ += 1;
    status = graph_.AddVertex(...);

    The AddEdge block should follow the same pattern:

    auto status = graph_.EnsureCapacity(src_label, dst_label, edge_label);
    if (!status.ok()) {
      LOG(ERROR) << "Failed to ensure space before insert edge: "
                 << status.ToString();
      return false;
    }
    InsertEdgeRedo::Serialize(arc_, src_label, GetVertexId(src_label, src_lid),
                              dst_label, GetVertexId(dst_label, dst_lid),
                              edge_label, properties);
    op_num_ += 1;
    auto oe_offset = graph_.AddEdge(...);
  3. src/transaction/update_transaction.cc, line 738-746 (link)

    Redo log serialized before capacity check

    InsertEdgeRedo::Serialize (line 738) and op_num_ += 1 (line 741) are called before EnsureCapacity (line 742). If EnsureCapacity fails and the function returns false, the WAL already contains the edge-insert entry and op_num_ has been incremented — but the edge was never actually written to the graph.

    On WAL replay after a crash at this point, the replayed edge insertion would fail again (or worse, succeed spuriously with uninitialised capacity), corrupting the graph state.

    The AddVertex path (line 683) does this correctly: it calls EnsureCapacity first, and only then serializes the redo entry. AddEdge should follow the same ordering:

    // Correct order — mirror AddVertex:
    auto status = graph_.EnsureCapacity(src_label, dst_label, edge_label);
    if (!status.ok()) {
      LOG(ERROR) << "Failed to ensure space before insert edge: " << status.ToString();
      return false;
    }
    InsertEdgeRedo::Serialize(arc_, src_label, GetVertexId(src_label, src_lid),
                              dst_label, GetVertexId(dst_label, dst_lid),
                              edge_label, properties);
    op_num_ += 1;
  4. src/storages/graph/edge_table.cc, line 263-264 (link)

    reserve reserves 0 instead of the intended size

    edge_data is freshly default-constructed, so edge_data.size() is 0. The call edge_data.reserve(edge_data.size()) is a no-op and the intent was clearly to pre-allocate property_vec.size() slots to avoid repeated reallocations during the loop below.

Last reviewed commit: f6501a8

Greptile also left 3 inline comments on this PR.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets the 4096 insert failure (issue #7) by introducing proactive capacity growth for vertex/edge storage structures so that sequential CREATE statements can continue past the default reserved space.

Changes:

  • Add EnsureCapacity APIs for vertex/edge storage and call them before inserts to avoid “reserved space exhausted” failures.
  • Introduce edge-table capacity tracking for unbundled edges and expose capacity/size helpers across storage layers.
  • Add/extend Python binding tests for high-volume vertex/edge insertions; adjust some logging/error handling paths.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tools/python_bind/tests/test_tp_service.py Adds TP service stress tests for inserting many vertices/edges; also updates execute() call style to pass access_mode.
tools/python_bind/tests/test_db_query.py Adds embedded/local test for inserting many edges and validating count.
src/utils/property/column.cc Fixes missing return in TypedEmptyColumn<T>::get_view.
src/transaction/update_transaction.cc Calls graph_.EnsureCapacity(...) before vertex/edge inserts.
src/storages/graph/vertex_table.cc Adds VertexTable::EnsureCapacity helper.
src/storages/graph/property_graph.cc Adds PropertyGraph::EnsureCapacity for vertices/edges and improves vertex add error message.
src/storages/graph/graph_interface.cc Calls EnsureCapacity in AP update interface before vertex/edge inserts.
src/storages/graph/edge_table.cc Tracks/resizes unbundled edge property-table capacity and adds EdgeTable::EnsureCapacity.
src/storages/csr/mutable_csr.cc Adds CSR capacity() API implementation.
src/storages/csr/immutable_csr.cc Adds CSR capacity() API implementation.
src/server/neug_db_session.cc Adds exception handling around transaction commit and returns structured internal errors.
src/compiler/planner/gopt_planner.cc Downgrades compilePlan query logging from INFO to VLOG(1).
include/neug/storages/graph/vertex_table.h Declares EnsureCapacity and adds Size() accessor.
include/neug/storages/graph/property_graph.h Declares EnsureCapacity APIs (vertex and edge).
include/neug/storages/graph/edge_table.h Declares EnsureCapacity, adds size/capacity helpers and capacity tracking member.
include/neug/storages/csr/csr_base.h Adds pure virtual capacity() API for CSRs.
include/neug/storages/csr/mutable_csr.h Declares capacity() overrides for mutable CSRs.
include/neug/storages/csr/immutable_csr.h Declares capacity() overrides for immutable CSRs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949 zhanglei1949 force-pushed the fix-overflow branch 3 times, most recently from db68d39 to ac3638f Compare March 9, 2026 09:42
@zhanglei1949 zhanglei1949 requested review from liulx20 and lnfjpt March 10, 2026 02:23
@zhanglei1949
Copy link
Collaborator Author

@greptile

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

add capacity api

Committed-by: xiaolei.zl from Dev container

fix CI

Committed-by: xiaolei.zl from Dev container

remove tests for TP

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

fixing

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

fix

Update src/storages/graph/edge_table.cc

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

fixes

fix

use explicit capacity calculation for EnsureCapacity
@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

zhanglei1949 and others added 3 commits March 11, 2026 15:03
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@zhanglei1949
Copy link
Collaborator Author

@greptile

zhanglei1949 and others added 3 commits March 11, 2026 18:03
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Louyk14 pushed a commit that referenced this pull request Mar 12, 2026
- Add `.clang-format`.
- Remove constraint of singleton `GraphDB`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CREATE vertex fails

2 participants