Skip to content

Commit 7abd808

Browse files
authored
Turbopack: use the same serialization method for lookup as for storing (#84765)
### What? This makes sure to use the same method for serializing CachedTaskType for storing and lookup. Before that we instantiated the serialization twice and that caused some weird behavior on linux. Before this fix I was seeing that serialization of CachedTaskType was not byte to byte identical between storing and lookup. This caused tasks to not be found, which will lead to very weird effects: stale tasks, cell not found, etc. I was expecting that pot serialization is deterministic, but turns out it isn't fully deterministic. It has some logic to deduplicate Symbols (like enum variant names). But it deduplicates these `&'static str`s based on the pointer address of the static string. Turns out that Rust doesn't guarantee that static strings are deduplicated in all cases. I assume that we are somehow running into that problem when having two variants of the serialization. They ended up serializing task types differently (one didn't have a symbol deduplicated in the serialized form. But that's more a wild guess. It did fix the bug, but I have a hard time understanding why this is happening. On long term, I think we should change pot to avoid using pointer addresses and do a real comparision instead. > why does the `inline(never)` fix it? I'm not super sure, but here is my theory. The rust compiler on linux is very aggressive in inlining the whole stuff, up to the point where it inlines the Serialize implementation of CachedTaskType and RawVc into the call side (`turbo-tasks-backend` crate). It can't inline the Serialize implementation that's behind the `arg` `MagicAny` stuff. So we are duplicating the RawVc Serialize logic into two crates. This way we end up with two `&'static str` of `TaskCell` for the enum variant. This string is not deduplicated during linking for whatever reason (It's not guaranteed). So in the end the Symbol is not deduplicated in the serialized form of the CachedTaskType. The stuff above might happen or not. It's in the fate of the compiler. And we also don't rely on that the Symbol is deduplicated. But we rely on a deterministic serialized version. The problem is that the lookup path and the store path have two separate calls to `serialize`, so it might end up inlining into two different places. So that might end up being aggressive in one place and not so aggressive in the other place. So in the end it might end up deduplicating in the lookup path and not deduplicating in the store path, which would cause different serialized versions in the two paths. To avoid that, I changed to code to use the `serialize_task_type` function in both places. And the `inline(never)` ensures that it won't be duplicated by inlining. Don't ask me how long it took to figure that out...
1 parent 04c7cef commit 7abd808

File tree

1 file changed

+30
-10
lines changed

1 file changed

+30
-10
lines changed

turbopack/crates/turbo-tasks-backend/src/kv_backing_storage.rs

Lines changed: 30 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -357,8 +357,12 @@ impl<T: KeyValueDatabase + Send + Sync + 'static> BackingStorageSealed
357357

358358
let mut task_type_bytes = Vec::new();
359359
for (task_type, task_id) in updates {
360+
serialize_task_type(
361+
&task_type,
362+
&mut task_type_bytes,
363+
Some(task_id),
364+
)?;
360365
let task_id: u32 = *task_id;
361-
serialize_task_type(&task_type, &mut task_type_bytes, task_id)?;
362366

363367
batch
364368
.put(
@@ -441,8 +445,8 @@ impl<T: KeyValueDatabase + Send + Sync + 'static> BackingStorageSealed
441445
.entered();
442446
let mut task_type_bytes = Vec::new();
443447
for (task_type, task_id) in task_cache_updates.into_iter().flatten() {
448+
serialize_task_type(&task_type, &mut task_type_bytes, Some(task_id))?;
444449
let task_id = *task_id;
445-
serialize_task_type(&task_type, &mut task_type_bytes, task_id)?;
446450

447451
batch
448452
.put(
@@ -499,8 +503,10 @@ impl<T: KeyValueDatabase + Send + Sync + 'static> BackingStorageSealed
499503
tx: &D::ReadTransaction<'_>,
500504
task_type: &CachedTaskType,
501505
) -> Result<Option<TaskId>> {
502-
let task_type = POT_CONFIG.serialize(task_type)?;
503-
let Some(bytes) = database.get(tx, KeySpace::ForwardTaskCache, &task_type)? else {
506+
let mut task_type_bytes = Vec::new();
507+
serialize_task_type(task_type, &mut task_type_bytes, None)?;
508+
let Some(bytes) = database.get(tx, KeySpace::ForwardTaskCache, &task_type_bytes)?
509+
else {
504510
return Ok(None);
505511
};
506512
let bytes = bytes.borrow().try_into()?;
@@ -645,23 +651,37 @@ where
645651
Ok(())
646652
}
647653

654+
// DO NOT REMOVE THE `inline(never)` ATTRIBUTE!
655+
// `pot` uses the pointer address of `&'static str` to deduplicate Symbols.
656+
// If this function is inlined into multiple different callsites it might inline the Serialize
657+
// implementation too, which can pull a `&'static str` from another crate into this crate.
658+
// Since string deduplication between crates is not guaranteed, it can lead to behavior changes due
659+
// to the pointer addresses. This can lead to lookup path and store path creating different
660+
// serialization of the same task type, which breaks task cache lookups.
661+
#[inline(never)]
648662
fn serialize_task_type(
649-
task_type: &Arc<CachedTaskType>,
663+
task_type: &CachedTaskType,
650664
mut task_type_bytes: &mut Vec<u8>,
651-
task_id: u32,
665+
task_id: Option<TaskId>,
652666
) -> Result<()> {
653667
task_type_bytes.clear();
654668
POT_CONFIG
655-
.serialize_into(&**task_type, &mut task_type_bytes)
656-
.with_context(|| anyhow!("Unable to serialize task {task_id} cache key {task_type:?}"))?;
669+
.serialize_into(task_type, &mut task_type_bytes)
670+
.with_context(|| {
671+
if let Some(task_id) = task_id {
672+
anyhow!("Unable to serialize task {task_id} cache key {task_type:?}")
673+
} else {
674+
anyhow!("Unable to serialize task cache key {task_type:?}")
675+
}
676+
})?;
657677
#[cfg(feature = "verify_serialization")]
658678
{
659679
let deserialize: Result<CachedTaskType, _> = serde_path_to_error::deserialize(
660680
&mut pot_de_symbol_list().deserializer_for_slice(&*task_type_bytes)?,
661681
);
662682
if let Err(err) = deserialize {
663-
println!("Task type would not be deserializable {task_id}: {err:?}\n{task_type:#?}");
664-
panic!("Task type would not be deserializable {task_id}: {err:?}");
683+
println!("Task type would not be deserializable {task_id:?}: {err:?}\n{task_type:#?}");
684+
panic!("Task type would not be deserializable {task_id:?}: {err:?}");
665685
}
666686
}
667687
Ok(())

0 commit comments

Comments
 (0)