-
Notifications
You must be signed in to change notification settings - Fork 441
Rewrite Schema in Rust #9084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Rewrite Schema in Rust #9084
Conversation
This reverts commit ed80851e1e90c40e9c4a44cc5008347112deb9d5.
I'm not convinced this is worth doing on a large scale: - .get_fully_qualified is much more efficient, so it should be used when possible, - it is also much less convenient, - it can be mostly be used only in places that are not hot paths of our compiler
This reverts commit 5e4ad5adda3ee557abc9b5ea699679d6d3fa2218.
This reverts commit 0474076.
This reverts commit 74416ae.
8131cf8 to
1b20f7e
Compare
1b20f7e to
c50afa3
Compare
|
Latest benchmarks: ... are great! This impl is only ~5% slower than current python one. The major improvement was enabling release build (lol) and a significant one was also using There are still some optimizations left that I can implement, so I plan to get this to run faster than the python impl on master. Although they require more work, which might take some time. |
|
One concerning data point is still the |
c50afa3 to
929f702
Compare
929f702 to
a4adf9f
Compare
This PR implements the Schema data structure in Rust.
It:
Schemaclass,RustSchemathat proxies method calls to the PyO3 class,FlatSchemawithRustSchema.Challenges:
edb.schema.Schemaabstract class has a pretty large interface. In Minimize FlatSchema interface #9016, I've tried to push as many abstract methods from schema implementations up into Schema itself.schema_restore-d later by getter methods.I've managed to get it so far that it compiles std lib, bootstraps and works on all queries that I've tried. Let's see the test suite.
Plan:
static PyOnceLock,ObjectList,ObjectSet,ObjectDictandObjectIndexare currently copied on each access. This means that if we want to, for example, lookup a Pointer of an ObjectType, we doget_pointers(), which copies all pointers from schema into a newObjectIndexinstance, just to pick a singleUuidout of it. This is so wrong that it feels unethical and immoral. To improve this, I want to store these values in schema in anRc, so that when we retrieve a field value, we just clone theRcand not the value itself. This means that eacn of these object containers would need its PyO3 wrapper class. A bit of work, but huge potential speed-up.Current benchmark: sometimes this is 3x slower than master