On a huge vector index (190k), adding one vector makes the next vectorNeighbors call slow as hell (2-9 minutes) #3679

@ExtReMLapin

Hello,

Long story short: in our RAG app, data ingestion can happen while a user is using the application.

We have an embeddings node type. It uses inheritance and a few other things, but that doesn't matter here.

We have a vector index on it, and we use vectorNeighbors to search for the closest matching chunk answering the user's question.

The first time we ask a question, it takes a long time (2-9 minutes); subsequent questions are answered quickly.

If we add more data to the database, the next question again takes around 2-9 minutes, even if the addition is really small.
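To make the pattern above measurable, here is a minimal timing sketch. It assumes the index is rebuilt lazily on the first query after a write, which is what the logs suggest but is not confirmed. `FakeLazyIndex` is a hypothetical stand-in, not ArcadeDB code; in a real test, `insert_one` and `query` would wrap the actual insert and the actual vectorNeighbors call.

```python
import time
from typing import Callable, List


class FakeLazyIndex:
    """Hypothetical stand-in for a lazily rebuilt vector index (NOT ArcadeDB code)."""

    def __init__(self, rebuild_seconds: float) -> None:
        self.rebuild_seconds = rebuild_seconds
        self.dirty = True  # a fresh index has no graph yet

    def insert(self) -> None:
        # Assumption: any write, however small, invalidates the whole graph.
        self.dirty = True

    def query(self) -> None:
        if self.dirty:
            time.sleep(self.rebuild_seconds)  # simulated full graph rebuild
            self.dirty = False


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def timings_after_insert(
    insert_one: Callable[[], object],
    query: Callable[[], object],
    repeats: int = 3,
) -> List[float]:
    """Insert once, then time `repeats` consecutive queries."""
    insert_one()
    return [time_call(query) for _ in range(repeats)]


if __name__ == "__main__":
    index = FakeLazyIndex(rebuild_seconds=0.2)
    # Expect the first timing to dominate and the rest to be near zero.
    print(timings_after_insert(index.insert, index.query))
```

If the real index behaves like the stand-in, the first timing after each insert should be far larger than the following ones, regardless of how small the insert was.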

Logs:

2026-03-18 15:27:51.386 INFO  [LSMVectorIndex] <ArcadeDB_0> Graph build validating: 18803/18803 (vector accesses=0, heap=575,4/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:27:51.393 INFO  [LSMVectorIndex] <ArcadeDB_0> Building graph with 18803 vectors using property 'vector' (cache enabled: size=100000)
2026-03-18 15:27:51.396 INFO  [LSMVectorIndex] <ArcadeDB_0> Building JVector graph index with 18803 vectors for index: CHUNK_EMBEDDING_0_32963829291000
2026-03-18 15:27:56.466 INFO  [LSMVectorIndex] Graph build building: 17306/18803 (vector accesses=17322, heap=395,8/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:01.515 INFO  [LSMVectorIndex] Graph build building: 17333/18803 (vector accesses=17349, heap=400,6/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:06.562 INFO  [LSMVectorIndex] Graph build building: 17530/18803 (vector accesses=17546, heap=330,2/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:11.587 INFO  [LSMVectorIndex] Graph build building: 17627/18803 (vector accesses=17643, heap=579,2/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:32.464 INFO  [LSMVectorIndex] Graph build building: 18363/18803 (vector accesses=18379, heap=634,2/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:37.498 INFO  [LSMVectorIndex] Graph build building: 18463/18803 (vector accesses=18479, heap=535,7/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:42.531 INFO  [LSMVectorIndex] Graph build building: 18708/18803 (vector accesses=18723, heap=420,5/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:46.354 INFO  [LSMVectorIndex] Graph build building: 18803/18803 (vector accesses=18813, heap=525,2/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:51.986 INFO  [LSMVectorIndex] <ArcadeDB_0> JVector graph index built successfully
2026-03-18 15:28:51.986 INFO  [LSMVectorIndex] <ArcadeDB_0> Graph build persisting: 0/18803 (vector accesses=0, heap=472,9/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:51.987 INFO  [LSMVectorIndexGraphFile] <ArcadeDB_0> Starting graph write (sequential) with chunking: 18803 nodes, 50MB chunk size
2026-03-18 15:28:51.988 INFO  [LSMVectorIndexGraphFile] <ArcadeDB_0> Writing graph WITHOUT inline vectors - topology only (vectors fetched from documents on-demand)
2026-03-18 15:28:52.057 INFO  [LSMVectorIndexGraphFile] <ArcadeDB_0> Graph written to pages (sequential): 18803 nodes, 4964596 bytes, 18 pages (topology only, vectors in documents)
2026-03-18 15:28:52.057 INFO  [LSMVectorIndex] <ArcadeDB_0> Graph build persisting: 18803/18803 (vector accesses=0, heap=524,9/8120,0MB, offheap=0,8MB, files=4,8MB [idx=0,3, graph=4,5, pq=0,0, compacted=0,0])
2026-03-18 15:28:52.058 INFO  [LSMVectorIndex] <ArcadeDB_0> Built graph for index: CHUNK_EMBEDDING_0_32963829291000
2026-03-18 15:28:52.068 INFO  [LSMVectorIndex] <ArcadeDB_0> GraphSearcher returned 100 nodes, graphSize=18803, vectorsSize=18803, ordinalToVectorIdLength=18803
2026-03-18 15:28:52.068 INFO  [LSMVectorIndex] <ArcadeDB_0> Vector search returned 100 results (skipped: 0 out of bounds, 0 deleted/null)
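To put a number on the rebuild, the elapsed time can be read from the log timestamps. The sketch below is only a helper for reading logs shaped like the excerpt above: it assumes each line starts with a 23-character `YYYY-MM-DD HH:MM:SS.mmm` timestamp and that the "Building graph with" and "Built graph for index" messages mark the start and end of a rebuild. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # e.g. "2026-03-18 15:27:51.393"


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of one log line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def rebuild_seconds(lines: Iterable[str]) -> float:
    """Seconds between the first 'Building graph with' and the last 'Built graph for index'."""
    start = end = None
    for line in lines:
        if "Building graph with" in line and start is None:
            start = parse_ts(line)
        elif "Built graph for index" in line:
            end = parse_ts(line)
    if start is None or end is None:
        raise ValueError("no complete graph build found in the given lines")
    return (end - start).total_seconds()


# Two lines copied from the excerpt above.
sample = [
    "2026-03-18 15:27:51.393 INFO  [LSMVectorIndex] <ArcadeDB_0> Building graph with 18803 vectors using property 'vector' (cache enabled: size=100000)",
    "2026-03-18 15:28:52.058 INFO  [LSMVectorIndex] <ArcadeDB_0> Built graph for index: CHUNK_EMBEDDING_0_32963829291000",
]

if __name__ == "__main__":
    print(rebuild_seconds(sample))
```

On the two sample lines this gives roughly one minute for the 18803-vector build shown in this excerpt.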

My two coworkers are working on a reproducible example as I write this post.