Rewrite Schema in Rust #9084

aljazerzen · 2025-10-08T17:47:05Z

This PR implements the Schema data structure in Rust.

It:

- implements the data structure in the gel-schema crate in the gel-rust repo,
- adds a PyO3 wrapper crate that provides a Schema class,
- adds a Python class RustSchema that proxies method calls to the PyO3 class,
- replaces usages of FlatSchema with RustSchema.
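
To make the proxying layer concrete, here is a minimal sketch of what such a wrapper could look like. `FakeNativeSchema` is a hypothetical pure-Python stand-in for the PyO3 class, and the signatures of `add` and `get_by_id` are assumptions made for illustration (only the method names appear in this PR). The sketch also assumes that `add` returns a new schema instead of mutating in place.

```python
import uuid


class FakeNativeSchema:
    """Hypothetical stand-in for the Schema class exposed by the PyO3 crate."""

    def __init__(self, objects=None):
        self._objects = dict(objects or {})

    def add(self, obj_id, data):
        # Assumption: adding returns a new schema and leaves the old one intact.
        new_objects = dict(self._objects)
        new_objects[obj_id] = data
        return FakeNativeSchema(new_objects)

    def get_by_id(self, obj_id):
        return self._objects.get(obj_id)


class RustSchemaSketch:
    """Python-side proxy: every method forwards to the native object."""

    def __init__(self, inner=None):
        self._inner = inner if inner is not None else FakeNativeSchema()

    def add(self, obj_id, data):
        # Re-wrap the native result so callers only ever see the proxy type.
        return RustSchemaSketch(self._inner.add(obj_id, data))

    def get_by_id(self, obj_id):
        return self._inner.get_by_id(obj_id)
```

The point of the wrapper is that the rest of the compiler keeps talking to a plain Python class, while the storage behind it can be swapped out.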

Challenges:

- the edb.schema.Schema abstract class has a pretty large interface. In Minimize FlatSchema interface #9016, I've tried to push as many abstract methods as possible from the schema implementations up into Schema itself,
- data that we put in the schema can be of many different types (Python primitives, uuid, None, ObjectSet, ObjectList, Expressions, Version, ...), each of which needs a Rust-native representation,
- all data we put in the schema is stored there in the so-called "reduced representation" (a tuple of data that can be pickled). This means that the PyO3 class must produce precisely this repr, so that getter methods can schema_restore it later,
- conversion between the "Python reduced repr" and the "Rust repr" is currently slow. That's because:
  - I'm importing classes just to check whether a value is an instance of them,
  - some data needs to be copied (strings, uuids, lists, ...).
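
For readers unfamiliar with the "reduced representation" idea, the following is a rough sketch of a reduce/restore round trip. It is not the actual format used by the schema: the tags, the `ObjectSetSketch` class, and the `reduce_value` / `restore_value` helpers are all hypothetical and only show why both sides must agree on the exact tuple shape.

```python
import pickle
import uuid


class ObjectSetSketch:
    """Hypothetical container of object ids (a stand-in for ObjectSet)."""

    def __init__(self, ids):
        self.ids = frozenset(ids)

    def __eq__(self, other):
        return isinstance(other, ObjectSetSketch) and self.ids == other.ids

    def __hash__(self):
        return hash(self.ids)


def reduce_value(value):
    """Turn a value into a tagged tuple made only of picklable primitives."""
    if isinstance(value, ObjectSetSketch):
        # Sort the ids so that the reduced form is deterministic.
        return ("ObjectSet", tuple(sorted(str(i) for i in value.ids)))
    if isinstance(value, uuid.UUID):
        return ("uuid", str(value))
    return ("primitive", value)


def restore_value(reduced):
    """Inverse of reduce_value: rebuild the rich value from its tuple."""
    tag, payload = reduced
    if tag == "ObjectSet":
        return ObjectSetSketch(uuid.UUID(s) for s in payload)
    if tag == "uuid":
        return uuid.UUID(payload)
    return payload
```

Every value crossing the boundary pays for an `isinstance` chain and a copy of its payload, which is the kind of overhead described in the last bullet above.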

I've gotten it to the point where it compiles the standard library, bootstraps, and works on all the queries I've tried. Let's see how the test suite fares.

Plan:

- Cache imports using a static PyOnceLock.
- Reducing values is no longer needed, because serialization is now done with serde and bincode. This means that we should remove the "reduced repr" and make the PyO3 class consume and produce the "normal" Python repr of values. I suspect that this will provide a huge speed-up.
- ObjectList, ObjectSet, ObjectDict and ObjectIndex are currently copied on each access. This means that if we want to, for example, look up a Pointer of an ObjectType, we call get_pointers(), which copies all pointers from the schema into a new ObjectIndex instance, just to pick a single Uuid out of it. This is so wrong that it feels unethical and immoral. To improve this, I want to store these values in the schema in an Rc, so that when we retrieve a field value, we just clone the Rc and not the value itself. This means that each of these object containers would need its own PyO3 wrapper class. A bit of work, but a huge potential speed-up.
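
The difference between copying a container on each access and handing out a shared handle can be illustrated with a loose Python analogy. This is only a sketch: the actual plan is a Rust `Rc` behind a PyO3 wrapper class, and `FieldStoreSketch` with its two getters is a hypothetical name invented here.

```python
from types import MappingProxyType


class FieldStoreSketch:
    """Hypothetical store holding the pointers of a single object type."""

    def __init__(self, pointers):
        self._pointers = dict(pointers)
        # Read-only view over the same dict; creating it copies nothing.
        self._shared = MappingProxyType(self._pointers)

    def get_pointers_copy(self):
        # Behavior described above: build a fresh container on every access.
        return dict(self._pointers)

    def get_pointers_shared(self):
        # Planned behavior: return a cheap handle to the data already stored.
        return self._shared
```

With the copying getter, looking up one pointer costs time proportional to the number of pointers; with the shared handle, the lookup cost does not depend on how many pointers are stored.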

Current benchmark: in some cases this is 3x slower than master.

I'm not convinced this is worth doing on a large scale:
- .get_fully_qualified is much more efficient, so it should be used when possible,
- it is also much less convenient,
- it can mostly be used only in places that are not hot paths of our compiler.

aljazerzen · 2025-10-22T15:48:33Z

Latest benchmarks:

... are great! This impl is only ~5% slower than the current Python one.

The major improvement came from enabling the release build (lol), and a significant one also came from using im::HashMap instead of im::OrdMap. I thought I needed ordering in the maps, but apparently I don't. Great.

There are still some optimizations left to implement, so I plan to get this running faster than the Python impl on master. They require more work, though, which might take some time.

aljazerzen · 2025-10-22T15:49:34Z

One concerning data point is still the compile_migration_01 benchmark; I have to investigate that.

aljazerzen added 30 commits September 12, 2025 13:20

Remove disallow_module

c5844eb

Revert "Remove disallow_module"

e8b39c8

This reverts commit ed80851e1e90c40e9c4a44cc5008347112deb9d5.

Remove support for __current__

0094631

Replace get_operators by get_by_shortname

090e1ce

Replace get_functions with Schema.get_by_shortname

766d033

Simplify _get_global into get_by_global_name

7c3659b

Optimization: _raw_schema_restore

ccf0762

Fix s_schema.lookup

c0995b1

Replace _get with s_schema.lookup

f546564

Pull get_casts* methods from FlatSchema to Schema

b7cc00e

Tweak get() signature

1cb64cb

Pull add into Schema

6485105

Pull get_obj_data_raw into Schema

385c4c5

Pull get_by_id into Schema

8072269

Revert "Pull get_by_id into Schema"

93f801f

This reverts commit 5e4ad5adda3ee557abc9b5ea699679d6d3fa2218.

minor fix

329e091

fix expr.py

0474076

fix

ff5710b

add a few type annos

831a533

Pull discard into Schema

265a83e

Rename get_by_fully_qualified to get_by_name

2ec32cb

Pull has_module and has_migration to Schema

e7b8ae6

Pull get_objects into Schema

692834e

Pull get_modules and get_last_migration to Schema

9ba2c47

Revert "fix expr.py"

1fc1c6e

This reverts commit 0474076.

Revert "Replace a few .get with .get_fully_qualified"

82da28e

This reverts commit 74416ae.

Reorder a few Schema methods, add doc string

add039a

Schema.fetch

95663eb

.

a18ca33

aljazerzen force-pushed the schema-rust branch from 8131cf8 to 1b20f7e Compare October 13, 2025 16:48

aljazerzen added 5 commits October 14, 2025 19:36

fix

1cb43ff

fix

24ebcd9

Merge remote-tracking branch 'origin/master' into refactor-schema

3e1dcb3

Push add_raw down into FlatSchema and RustSchema

e47a67b

remove _obj_ from schema names

fef3d7d

aljazerzen force-pushed the schema-rust branch from 1b20f7e to c50afa3 Compare October 22, 2025 15:45

aljazerzen added 8 commits October 23, 2025 14:47

revert naming schema back to get

49273a0

rust schema

841c058

cached imports

2fbd8b6

rc object containers

67510a0

Push add_raw down into FlatSchema and RustSchema

4b4afc8

Remove reducing

dbeeccc

Remove schema restoring

d65db15

remove FlatSchema and abc.Reducible

17c1394

aljazerzen force-pushed the schema-rust branch from c50afa3 to 929f702 Compare October 23, 2025 13:23

aljazerzen added 6 commits October 24, 2025 13:28

replace im with im-rc

58e1f50

rename a few methods

6b2d743

rc and release build

408bd09

more rc

1c96f3b

fix renaming (again)

15e1372

a few fixes

c76b80d

Base automatically changed from refactor-schema to master October 27, 2025 06:51

aljazerzen added 4 commits October 27, 2025 12:54

RustSchemaError

1b99381

clippy

14858fb

span

4cb4064

replace Rc with Arc

a4adf9f

aljazerzen force-pushed the schema-rust branch from 929f702 to a4adf9f Compare November 3, 2025 18:11
