@vigna

@vigna vigna commented Nov 27, 2025

This PR adds Phast support for serialization and memory mapping with ε-serde, and inspection via mem_dbg. The two features are named epserde and mem_dbg.

@beling
Owner

beling commented Dec 22, 2025

Sorry, I've only just found the time to review this PR.

My remarks and questions:

  1. There is a hard-coded mem_dbg path in ph/Cargo.toml.
  2. Can maligned be an optional dependency?
  3. Is _marker (in phast/function.rs) required by epserde or mem_dbg? Can we make _marker exist only when the feature is enabled?
  4. I have the same question about the extra type parameters, like L0, L, etc. (in function2.rs). Maybe it is a good idea to define the whole struct (for example Function2) twice: once with these extra parameters (when epserde or mem_dbg is enabled) and once without them?
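
To make the idea in point 4 concrete, here is a minimal sketch of a struct defined twice behind a Cargo feature. Only the feature name `epserde` and the struct name `Function2` come from this thread; the `seeds` field, the `L` bound, and the methods are hypothetical stand-ins and not the actual code in function2.rs.

```rust
/// Without the feature: the plain struct, owning its data.
#[cfg(not(feature = "epserde"))]
pub struct Function2 {
    seeds: Vec<u8>, // hypothetical field
}

/// With the feature: the storage becomes a type parameter, so it can be
/// an owned Vec or a borrowed slice (for example, memory-mapped data).
#[cfg(feature = "epserde")]
pub struct Function2<L = Vec<u8>> {
    seeds: L, // hypothetical field
}

// Every impl block has to be duplicated and gated as well.
#[cfg(not(feature = "epserde"))]
impl Function2 {
    pub fn new(seeds: Vec<u8>) -> Self {
        Self { seeds }
    }
    pub fn seed(&self, i: usize) -> u8 {
        self.seeds[i]
    }
}

#[cfg(feature = "epserde")]
impl<L: AsRef<[u8]>> Function2<L> {
    pub fn new(seeds: L) -> Self {
        Self { seeds }
    }
    pub fn seed(&self, i: usize) -> u8 {
        self.seeds.as_ref()[i]
    }
}

fn main() {
    // The same call site compiles under either configuration.
    let f = Function2::new(vec![7u8, 8, 9]);
    println!("{}", f.seed(1));
}
```

The call site stays the same in both configurations, but each definition and each impl block exists twice, which is the maintenance cost to weigh.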

Since I am not a user of mem_dbg and epserde myself, I am wondering how all this should be done so that I can somehow maintain it.

@vigna
Author

vigna commented Dec 22, 2025

  1. My mistake.
  2. It already is; I just have to update to the latest version.
  3. I'll check that.
  4. Yes, that's actually a possibility. Let me explore that.

@vigna
Author

vigna commented Jan 4, 2026

I completely removed maligned (it has also been removed from all our crates).

As for mem_dbg: it was essential for me to understand where the structure actually stores significant data. But if you don't think it might be useful for you for debugging/inspection, I can simply remove it.

It is possible to gate two versions of all involved structures. That, however, would mean having 5 alternate versions, which starts to be a significant amount of code. In the non-ε-serde version, _marker would disappear (it is necessary because that type parameter is now a parameter of another type parameter, and as such the compiler does not consider it used).
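
The reason a marker field is needed can be sketched as follows. This is a simplified illustration, not the code in phast/function.rs: the `SeedSize` trait, `Bits8`, and the field names are hypothetical. Once the storage type becomes a type parameter, the other parameter appears only in a bound, and rustc rejects the struct with error E0392 ("parameter is never used") unless a `PhantomData` field mentions it.

```rust
use std::marker::PhantomData;

/// Hypothetical trait describing how seeds are encoded.
trait SeedSize {
    type Elem: Copy + Into<u64>;
}

/// Hypothetical encoding: one byte per seed.
struct Bits8;

impl SeedSize for Bits8 {
    type Elem = u8;
}

/// `Seeds` is the storage backend (an owned Vec or a borrowed slice).
/// `SS` occurs only in the bound on `Seeds`, never in a field type, so
/// removing `_marker` makes this struct fail to compile with E0392.
struct Function<SS: SeedSize, Seeds: AsRef<[SS::Elem]>> {
    seeds: Seeds,
    _marker: PhantomData<SS>,
}

impl<SS: SeedSize, Seeds: AsRef<[SS::Elem]>> Function<SS, Seeds> {
    fn new(seeds: Seeds) -> Self {
        Self { seeds, _marker: PhantomData }
    }

    /// Returns the i-th seed widened to u64.
    fn seed(&self, i: usize) -> u64 {
        self.seeds.as_ref()[i].into()
    }
}

fn main() {
    // Owned storage.
    let owned = Function::<Bits8, _>::new(vec![3u8, 1, 4]);
    // Borrowed storage over the same kind of data.
    let data = [3u8, 1, 4];
    let borrowed = Function::<Bits8, &[u8]>::new(&data[..]);
    println!("{} {}", owned.seed(2), borrowed.seed(0));
}
```

`PhantomData` is a zero-sized type, so the marker adds no bytes to the struct; this matches the remark that the extra parameters bring no structural change.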

I understand that the addition of the parameters looks invasive (but note that there is no structural change). On the other hand, I think you need to have some efficient serialization mechanism, or your code can only be used for benchmarking. We developed ε-serde because we build this kind of map for dozens of billions of items, and we need to store them and memory-map them, or load them into memory at high speed.

We're presently using PTHash, and we would be happy to switch to Phast2+, but we need instant memory-mapping (or at least efficient serialization). I think others would find efficient serialization and memory-mapping useful.

@martinkirch

Hello @vigna! I'm one of the others interested in serialization. But I'm surprised your branch didn't move after your last two comments. Could you push it without the hard-coded mem_dbg path and without maligned? Thanks!

@vigna
Author

vigna commented Jan 13, 2026

I was expecting an answer from @beling, but if there's interest I'll do it today.

@vigna
Author

vigna commented Jan 13, 2026

Oh wait. The path should already be ok. There's something wrong...

@vigna
Author

vigna commented Jan 13, 2026

Ok, there was a missing push.

@martinkirch

Thanks! My tests on your branch are very positive towards using Phast+. The ability to store/load the MPH unlocks my current work on distributing large-scale archive containers. Looking forward to this PR 🤞

@vigna
Author

vigna commented Jan 27, 2026

Yeah, it's a great data structure. We are now in the hands of @beling 😂.

3 participants