Introduce index serialization mode option to support serializing graph in memory #124
@rchitale7 please add benchmarking results for the change
Serializing the faiss index in memory instead of writing to disk impacts the performance of the overall remote build flow in two places: the 'write index' step and the 'upload to s3' step.
All other components of the remote build flow should remain the same. Serializing in memory also impacts CPU memory usage: we must now maintain an additional copy of the serialized graph in memory, instead of keeping it on disk. GPU memory usage should be unaffected.

To benchmark this serialization change, I used the ms-marco-384, cohere-10M-768-IP, and open-ai-1536 datasets that can be found in benchmarks.yml. I compared the performance of serializing the index in memory vs. on disk for the 'write index' and 'upload to s3' steps, and compared the peak CPU memory usage for memory vs. disk. I also validated that serializing in memory does not impact recall, and that the performance of all other aspects of the remote index build flow remained the same. I've added the benchmarking scripts I used to this PR, and this README explains how I (and anyone else) can use these scripts.

Here is a table comparing the performance and memory impacts of serializing in memory vs. on disk:
As expected, the peak CPU memory is roughly [dataset size] MB larger when the index is serialized in memory vs. stored on disk, because the additional copy of the faiss graph contains all of the vectors. There does not appear to be a performance benefit to serializing in memory for the marco and open-ai datasets. However, for the much larger cohere dataset, we get a small boost when serializing in memory: the 'write index' step is 57 seconds faster, while uploading to s3 is 23 seconds slower, for a net improvement of roughly 30 seconds. More investigation needs to be done to improve the s3 upload times when serializing in memory. One reason may be that we need to tune the multipart upload settings for
I realized we can further optimize memory usage by freeing the vectors from memory after the GPU->CPU conversion. Here is the updated memory graph:

I've updated the code to free the vectors after conversion, but before serialization. This gives us a memory impact similar to writing to disk. Here is the updated table comparing the performance and memory impacts of serializing in memory vs. on disk:
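The ordering described above can be sketched as follows. `convert_gpu_to_cpu` and `serialize` are hypothetical stand-ins (the real flow uses faiss's GPU-to-CPU conversion and index writer); the point is only that the raw vectors are dropped after conversion and before serialization, so fewer full copies of the vector data are alive at peak.

```python
import gc


def convert_gpu_to_cpu(vectors):
    # Stand-in for the GPU->CPU index conversion: the CPU index ends up
    # holding its own copy of the vector data.
    return {"index_vectors": bytes(vectors)}


def serialize(cpu_index):
    # Stand-in for streaming the CPU index into an in-memory buffer.
    return cpu_index["index_vectors"]


def build_and_serialize(vectors):
    cpu_index = convert_gpu_to_cpu(vectors)
    # Free the raw vectors *before* serialization, so the peak holds the CPU
    # index plus the serialized blob, rather than the raw vectors as well.
    del vectors
    gc.collect()
    return serialize(cpu_index)


blob = build_and_serialize(bytearray(b"example-vectors"))
```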
I've also updated the comment on the GH issue with this new graph: #97 (comment)

Description
This change introduces the option of serializing the `faiss` graph in memory, instead of writing it to disk. For background on why this can be helpful, see the issue description here: #97. For the reasoning behind the solution I took, please see the solution here: #97 (comment). TL;DR: I did not end up using `faiss.serialize_index`, due to its inefficient memory usage. Instead, I used `faiss.PyCallbackIOWriter` to stream directly to a bytes buffer.

Note that writing the graph to disk is still the default option. The consumer of this library needs to specify the
`index_storage_mode` via `IndexBuildParameters` as `IndexStorageMode.MEMORY` to serialize in memory.

Testing
I tested this change manually and verified it works. Will need to do some more benchmarking as a follow up.
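The in-memory path described above relies on faiss's callback-based writer, where serialized bytes are pushed straight into a buffer via something like `faiss.write_index(index, faiss.PyCallbackIOWriter(buffer.write))`. The sketch below models that pattern with a hypothetical stand-in serializer (`serialize_index_to` and its chunk values are invented for illustration), so it runs without faiss installed.

```python
import io


def serialize_index_to(write):
    # Hypothetical stand-in for faiss.write_index(index, writer): the
    # serializer emits the index in chunks through the write callback
    # instead of writing to a file on disk.
    for chunk in (b"IxHN", b"\x00" * 16, b"vector-data"):
        write(chunk)


def serialize_in_memory() -> bytes:
    buffer = io.BytesIO()
    # With faiss this step would be:
    #   writer = faiss.PyCallbackIOWriter(buffer.write)
    #   faiss.write_index(index, writer)
    # Each chunk lands directly in the buffer, avoiding the extra full copy
    # that faiss.serialize_index would make.
    serialize_index_to(buffer.write)
    return buffer.getvalue()


blob = serialize_in_memory()
```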
Issues Resolved
Resolves #97.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.