

@ljeub-pometry
Collaborator

What changes were proposed in this pull request?

Progress towards the new version of the underlying storage

Why are the changes needed?

Does this PR introduce any user-facing change? If yes is this documented?

How was this patch tested?

Are there any further changes required?

fabianmurariu and others added 30 commits November 28, 2025 11:51
* attempt at fixing the graph writing logic for Windows

* fmt

* It should never be possible for a graph to be de-cached while it is being mutated. If we allow that to happen, reinserting it doesn't fix anything, as the graph might already have been re-cached and the state would be inconsistent.

* fmt

* initial implementation of dirty path tracking

* more validation

* need to clean up dirty paths during validation

* Refactor writing to disk such that it writes with the new folder structure and refactor the validation logic (compiles but does not work yet)

* fix writing to empty graph folder

* move a lot more of the logic to GraphFolder

* fix zip encoding/decoding

* all the tests compile

* all features compile

* move the python benchmarks so they don't always run

* make secondary_index the last argument so it doesn't become annoying

* improved error messages

* need to get the graph before creating the new folder or the new path gets cleaned up again!

* don't create arbitrarily deep paths when writing graphs

* better error messages

* overwrite is handled internally, no need to call delete

* fix doc strings

* no more deserialize for Prop

* load graph metadata from the parquet file instead

* tidy up a lot of warnings

* fmt

* tidy up and add more validation for relative paths

* need to bring back Prop deserialisation for the WAL

* this reserve causes a race condition as it re-checks the count and is probably not really helpful anyway

* flat serialisation of Prop for working with arrow

* simplify serialisation for SerdeArrowArray

* more refactoring of the graph path handling

* avoid writing metadata without writing graph

* is_reserved should not be true for files

* materialize_at only works with persistent storage

* need to write the metadata in decode_at variants

* load metadata correctly when storage is enabled and use cached graph when available

* fix send_graph

* fix new_graph in graphql

* improve error messages

* always use / in zip paths

* fmt

* rearrange tests to ignore indexing/vectors

* look at .raph file instead

* tweak seed to get the old result as order of operations changed

* hopefully more robust fast_rp test

* chore: apply tidy-public auto-fixes

* update graph to new format

* no unzipping needed anymore

* some minor cleanup

* track io failure location

* fix storage check

* make sure all dirty paths are cleaned up properly

* fix drop handling for storage-enabled graphs

* use send_graph instead of save to also test the storage

* remove stray println!

* don't panic on errors

* delete should remove the graph from cache

* make sure load preserves the id type

* fmt

* try to fix the graphql benchmark workflow

* use send_graph to test storage

* add invalidation to make sure cached graphs are dropped before we delete them

* rename disk_storage_enabled to disk_storage_path

* rename inner metadata file to .meta and address remaining review comments

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
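The "always use / in zip paths" change above reflects a real portability issue: the ZIP format specification (PKWARE APPNOTE) requires entry names to use forward slashes, while `std::path::Path` on Windows joins components with `\`. The following is a minimal sketch under stated assumptions, not Raphtory's actual code. It assumes entries are built from paths relative to the graph folder root, and the helper `zip_entry_name` is hypothetical. Rejecting `..`, absolute paths and drive prefixes is an illustrative choice in the spirit of the "more validation for relative paths" commit.

```rust
use std::path::{Component, Path};

/// Hypothetical helper (not Raphtory's API): turn a relative path into a
/// ZIP entry name, always joining components with '/' regardless of the
/// host OS separator.
fn zip_entry_name(path: &Path) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    for component in path.components() {
        match component {
            // Ordinary path segments; give up on names that are not valid UTF-8.
            Component::Normal(part) => parts.push(part.to_str()?),
            // A leading "." adds nothing to the entry name.
            Component::CurDir => {}
            // Assumption: reject "..", root and drive prefixes so that an
            // entry can never point outside the archive root.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(parts.join("/"))
}
```

Because the path is split with `Path::components`, the same code produces `graph/nodes/part-0.parquet` whether the input was built on Windows or on Unix.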
* redo the node distribution and refactor bulk loading

# Conflicts:
#	db4-graph/src/lib.rs
#	db4-storage/src/pages/layer_counter.rs
#	db4-storage/src/pages/node_store.rs
#	raphtory-api/src/core/entities/properties/prop/prop_array.rs
#	raphtory-api/src/core/entities/properties/prop/prop_enum.rs
#	raphtory-core/src/storage/mod.rs
#	raphtory-storage/src/graph/graph.rs
#	raphtory/src/db/api/view/graph.rs
#	raphtory/src/db/graph/views/deletion_graph.rs
#	raphtory/src/errors.rs
#	raphtory/src/io/arrow/df_loaders.rs
#	raphtory/src/io/parquet_loaders.rs
#	raphtory/src/search/mod.rs
#	raphtory/src/serialise/graph_folder.rs
#	raphtory/src/serialise/parquet/mod.rs
#	raphtory/src/serialise/serialise.rs
#	raphtory/src/vectors/db.rs

* rebased and passing some tests

* fmt

* fix compilation issue

* fix the raphtory benchmarks

* add log statements for stress test

* rename graph1 -> graph0

* fix graphql-bench

* fixes for build failing

* fix the python test breaking

* fix various test failures in python

* update the result handling so that errors don't go missing with async_graphql 7.1.0

* fix node_props node_type resolution

* add COMPUTE_POOL with an increased stack size in rayon.rs (graphql)

* chore: apply tidy-public auto-fixes

* fix the graph path

* fix handling of relative paths

* fmt

* remove the empty .raph file

* chore: apply tidy-public auto-fixes

* various fixes to keep order in graphql tests and fix lifetimes

* various compilation fixes for graphql

* add an additional NodeState impl

* add repr for 3-element tuples in repr.rs

* chore: apply tidy-public auto-fixes

* fix deadlock in dict_mapper.rs

* add more read_recursive and an extra check in layer_col

* bump pyo3 to 0.27 and arrow to 57.2

* various fixes for PyO3 0.27

# Conflicts:
#	raphtory/src/python/graph/io/pandas_loaders.rs

* stuff is compiling

* more fixes for pyo3 0.27

* enable Str as a type for prop_type

* fmt

* update Rust to 1.89

* change error message for casting in python

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: Lucas Jeub <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* flush needs to return a result

* don't panic when constructing arrays

* implement explicit flush for all the segment types

* add flush support to addition ops

* add flush support in python

* re-enable empty maps in tests

* add support for creating a graph with a config

* make inner map size depend on property parameter

* tweak the trait bounds so that we don't have to leak all the internal config for the disk storage

* tidy up warnings

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Moved assert_valid_graph to test_utils.rs and added proptest for dumping/loading from parquet (roundtrip).

* Validate both graphs in parquet encode/decode round trip. Dedup temporal props obtained from fixture.

* Fixed nodes being expected by validation even if they have no temporal properties

* Removed sorting of histories obtained from the graph. They should already be sorted.

* chore: apply tidy-public auto-fixes

* remove sorts for graph output (should already be sorted) and dedup for fixture (history should no longer deduplicate)

* use multiset comparison for temporal properties to avoid ambiguities in case of duplicate timestamps

* fmt

* old node histories still have the dedup!

* update nonsensical test (it should not have the same property repeated in a single update)

* fix hash for Prop

* use python-inspired implementation for hashing map Prop to avoid allocations

---------

Co-authored-by: Fabian Murariu <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lucas Jeub <[email protected]>
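The "python-inspired implementation for hashing map Prop" commit above points at a general technique: hash a map without collecting and sorting its entries by hashing each entry on its own and combining the results with a commutative operation (XOR). CPython uses this approach for `frozenset`. The following is a minimal sketch of that technique, not the code in this PR. The function names `shuffle_bits` and `map_hash` are hypothetical, a plain `HashMap` stands in for Raphtory's map `Prop`, and the mixing constants are the ones used by CPython's frozenset hash, included here for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Scramble one entry hash before XOR-combining (as CPython does for
/// frozensets), so that entries with similar hashes don't cancel out.
fn shuffle_bits(h: u64) -> u64 {
    ((h ^ 89_869_747) ^ (h << 16)).wrapping_mul(3_644_798_167)
}

/// Hypothetical helper: order-independent, allocation-free hash of a map.
fn map_hash<K: Hash, V: Hash>(map: &HashMap<K, V>) -> u64 {
    let mut acc: u64 = 0;
    for (k, v) in map {
        // Hash each (key, value) pair with a fresh hasher; DefaultHasher::new()
        // instances all start from the same state, so this is deterministic.
        let mut entry_hasher = DefaultHasher::new();
        k.hash(&mut entry_hasher);
        v.hash(&mut entry_hasher);
        // XOR is commutative, so the map's iteration order does not matter.
        acc ^= shuffle_bits(entry_hasher.finish());
    }
    // Mix in the length and disperse the bits, again following frozenset.
    acc ^= (map.len() as u64 + 1).wrapping_mul(1_927_868_237);
    acc ^= (acc >> 11) ^ (acc >> 25);
    acc.wrapping_mul(69_069).wrapping_add(907_133_923)
}
```

Inside a `Hash` impl for a map-valued property, one would then feed the combined value to the outer hasher, e.g. `state.write_u64(map_hash(map))`, so that equal maps hash equally regardless of insertion order.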

5 participants