Fix prompt-cache round-trip support for ArraysCache, MambaCache, and CacheList #155

ronaldmannak wants to merge 6 commits into ml-explore:main

Conversation
Libraries/MLXLMCommon/KVCache.swift (Outdated)
```swift
    return result
}
set {
    // Handled by restoreFromMetaState
```
Is this OK that it is a NOP? I don't see any calls to set in the code, but other KVCache types do allow set. Should this be a fatalError() instead -- is it a bug to call the setter on CacheList?
@davidkoski The set is required to conform to the KVCache protocol. The same NOP pattern exists in CacheList.metaState (line 1220) as well, btw. I agree a silent NOP could bite us in the future and I've replaced both with assertionFailure() so they trigger in debug builds but are stripped in release. Happy to switch to fatalError() if you'd prefer a hard crash in all builds.
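Since the NOP-versus-assertion question above is an API-design point, here is a minimal sketch of the pattern under discussion. `KVCacheLike` and `CompositeCache` are illustrative stand-ins, not the actual MLXLMCommon types, and the state is simplified to `[Int]` in place of `[MLXArray]`:

```swift
// Sketch: a composite cache whose state setter is intentionally unsupported.
// Restoration happens through a separate restoreFromMetaState-style path,
// so calling the setter indicates a programming error.
protocol KVCacheLike {
    var state: [Int] { get set }  // stand-in for [MLXArray]
}

final class CompositeCache: KVCacheLike {
    private var children: [[Int]] = [[1, 2], [3]]

    var state: [Int] {
        get { children.flatMap { $0 } }
        set {
            // Trap in debug builds; stripped (silent NOP) in release builds.
            assertionFailure("CompositeCache.state setter is unsupported; use the meta-state restore path")
        }
    }
}

let cache = CompositeCache()
print(cache.state)  // [1, 2, 3]
```

This mirrors the trade-off discussed above: `assertionFailure()` surfaces misuse during development, while release builds keep the silent-NOP behavior; `fatalError()` would trap unconditionally.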
davidkoski left a comment
This looks good, thank you. See what you think of my question -- not a blocker for merge.
Has a few conflicts from #158 now
cbb44d0 to ced9cf0
@davidkoski done. should be good to go
Proposed changes
This change keeps the existing public `savePromptCache`/`loadPromptCache` API intact and extends cache reconstruction so composite and array-backed caches can be restored correctly from persisted prompt caches.

What changed

- `ArraysCache` now serializes enough metadata in `metaState` to preserve `leftPadding`
- `MambaCache` now round-trips through the same array-cache restore path while preserving its concrete type
- `CacheList` now serializes its child cache structure in `metaState` and reconstructs children recursively on load

Tests
Added round-trip coverage for:

- `ArraysCache`
- `ArraysCache` with `leftPadding`
- `MambaCache` type preservation
- `CacheList` with KV-only children
- `CacheList` with hybrid children (`MambaCache` + `KVCacheSimple`)

Notes
This change does not touch `LRUPromptCache`, direct-generation integration, or the batching behavior from #150 (Add continuous batching support for concurrent LLM inference).

Checklist
- [x] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes
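The recursive `CacheList` reconstruction described under "What changed" can be sketched roughly as follows. All type names here (`PromptCacheLike`, `SimpleCache`, `MambaLikeCache`, `ListCache`) are hypothetical stand-ins for illustration, not the real MLXLMCommon API; the idea is that the meta-state records a type tag per child so the loader can rebuild each child's concrete type:

```swift
// Sketch: serialize a composite cache's child structure as type tags,
// then reconstruct concrete child types from those tags on load.
protocol PromptCacheLike {
    static var typeTag: String { get }
    var metaState: [String] { get }
}

final class SimpleCache: PromptCacheLike {
    static let typeTag = "simple"
    var metaState: [String] { [Self.typeTag] }
}

final class MambaLikeCache: PromptCacheLike {
    static let typeTag = "mamba"
    var metaState: [String] { [Self.typeTag] }
}

final class ListCache: PromptCacheLike {
    static let typeTag = "list"
    var children: [PromptCacheLike]
    init(children: [PromptCacheLike]) { self.children = children }
    // Meta-state: own tag, child count, then each child's meta-state.
    var metaState: [String] {
        [Self.typeTag, String(children.count)] + children.flatMap { $0.metaState }
    }
}

// Rebuild children from their serialized type tags.
func restore(from meta: [String]) -> [PromptCacheLike] {
    meta.compactMap { tag in
        switch tag {
        case SimpleCache.typeTag: return SimpleCache()
        case MambaLikeCache.typeTag: return MambaLikeCache()
        default: return nil  // structural entries are skipped in this sketch
        }
    }
}

let list = ListCache(children: [MambaLikeCache(), SimpleCache()])
let meta = list.metaState
print(meta)                    // ["list", "2", "mamba", "simple"]
let restored = restore(from: Array(meta.dropFirst(2)))
print(type(of: restored[0]))   // MambaLikeCache
```

This is also why the hybrid-children test case above matters: restoring by tag is what lets a `MambaCache` child come back as its concrete type rather than collapsing to a generic KV cache.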