
Conversation

@Mukesh-P
Contributor

Issue link: #144

Description
Fixes "Index required but not found for 'type' field" error on cloud-hosted Qdrant instances by adding proper keyword indexing for the 'type' payload field during collection creation.

Type of Change
Bug fix (non-breaking change which fixes an issue)

Related Issue(s)
Fixes #144: [HELP WANTED] Bug: Qdrant semantic search fails due to missing index for "type" field

Changes Made
- Added a `payload_schema` parameter to `QdrantVectorConfiguration` for payload index support
- Updated collection creation to include a keyword index for the 'type' field
- Removed manual post-creation index creation (now handled at collection init)

Testing
Test Commands:
uv run pytest tests/semantic_search/ -v
uv run pytest tests/ -v --tb=short
All tests pass: 63 passed, 18 skipped

Checklist
- [x] My code follows the code style of this project
- [x] Unit tests pass locally
- [x] New and existing functionality works
- [x] No breaking changes

Additional Context
This resolves the Qdrant cloud filtering issue by ensuring collections are created with proper keyword indexes from the start.

@raphael-intugle
Collaborator

Nice work. It's connecting to Qdrant Cloud now, but it looks like there is another issue while uploading the vector points to Qdrant Cloud:

WriteTimeout: The write operation timed out

During handling of the above exception, another exception occurred:

ResponseHandlingException                 Traceback (most recent call last)
Cell In[5], line 2
      1 # Perform a semantic search
----> 2 search_results = sm.search("reason for hospital visit")
      4 # View the search results
      5 search_results

File ~/intugle/data-tools/src/intugle/semantic_model.py:356, in SemanticModel.search(self, query)
    339 """
    340 Performs a semantic search against the knowledge base.
    341 
   (...)    353     >>> results = sm.search("Find all tables related to patient claims.")
    354 """
    355 if not self._semantic_search_initialized:
--> 356     self.initialize_semantic_search()
    358 try:
    359     search_client = SemanticSearch()

File ~/intugle/data-tools/src/intugle/semantic_model.py:322, in SemanticModel.initialize_semantic_search(self)
    320 except Exception as e:
    321     log.warning(f"Could not initialize semantic search: {e}")
--> 322     raise e

File ~/intugle/data-tools/src/intugle/semantic_model.py:317, in SemanticModel.initialize_semantic_search(self)
    315 print("Initializing semantic search...")
    316 search_client = SemanticSearch()
--> 317 search_client.initialize()
    318 self._semantic_search_initialized = True
    319 print("Semantic search initialized.")

File ~/intugle/data-tools/src/intugle/semantic_search.py:181, in SemanticSearch.initialize(self)
    170 def initialize(self):
    171     """
    172     Index columns into the vector database (sync wrapper).
    173 
   (...)    179     >>> ss.initialize()
    180     """
--> 181     return _run_async_in_sync(self._async_initialize())

File ~/intugle/data-tools/src/intugle/semantic_search.py:43, in _run_async_in_sync(coro)
     40     thread.join()
     42     if exc:
---> 43         raise exc
     44     return result
     45 else:

File ~/intugle/data-tools/src/intugle/semantic_search.py:34, in _run_async_in_sync.<locals>.thread_target()
     32 nonlocal result, exc
     33 try:
---> 34     result = asyncio.run(coro)
     35 except Exception as e:
     36     exc = e

File /usr/lib/python3.12/asyncio/runners.py:194, in run(main, debug, loop_factory)
    190     raise RuntimeError(
    191         "asyncio.run() cannot be called from a running event loop")
    193 with Runner(debug=debug, loop_factory=loop_factory) as runner:
--> 194     return runner.run(main)

File /usr/lib/python3.12/asyncio/runners.py:118, in Runner.run(self, coro, context)
    116 self._interrupt_count = 0
    117 try:
--> 118     return self._loop.run_until_complete(task)
    119 except exceptions.CancelledError:
    120     if self._interrupt_count > 0:

File /usr/lib/python3.12/asyncio/base_events.py:687, in BaseEventLoop.run_until_complete(self, future)
    684 if not future.done():
    685     raise RuntimeError('Event loop stopped before Future completed.')
--> 687 return future.result()

File ~/intugle/data-tools/src/intugle/semantic_search.py:168, in SemanticSearch._async_initialize(self)
    166 column_details = self.get_column_details()
    167 column_details = pd.DataFrame.from_records(column_details)
--> 168 await semantic_search_crud.initialize(column_details)

File ~/intugle/data-tools/src/intugle/core/semantic_search/crud.py:226, in SemanticSearchCRUD.initialize(self, column_details)
    222 content = pd.concat(content, axis=0).reset_index(drop=True)
    224 points = await self.vectorize(content)
--> 226 vdb.bulk_insert(points)

File ~/intugle/data-tools/src/intugle/core/vector_store/qdrant.py:143, in AsyncQdrantService.bulk_insert(self, points)
    141 except Exception as e:
    142     log.error(f"Couldn't bulk insert data: {e}")
--> 143     raise e

File ~/intugle/data-tools/src/intugle/core/vector_store/qdrant.py:138, in AsyncQdrantService.bulk_insert(self, points)
    136 def bulk_insert(self, points: models.PointStruct | List[models.PointStruct]):
    137     try:
--> 138         result = self.upload_point(points)
    139         log.debug(f"Upload Status: {result}")
    140         return result

File ~/intugle/data-tools/src/intugle/core/vector_store/qdrant.py:157, in AsyncQdrantService.upload_point(self, points)
    155 except Exception as e:
    156     log.error(f"Coulnd't uploading points: {e}")
--> 157     raise e

File ~/intugle/data-tools/src/intugle/core/vector_store/qdrant.py:147, in AsyncQdrantService.upload_point(self, points)
    145 def upload_point(self, points):
    146     try:
--> 147         self.client.upload_points(
    148             collection_name=self.collection_name,
    149             points=points,
    150             parallel=1,  # number of vectors points to insert parallely,
    151             max_retries=5,
    152         )
    153         log.debug(f"batch uploaded: {len(points)}")
    154         return True

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/async_qdrant_client.py:1801, in AsyncQdrantClient.upload_points(self, collection_name, points, batch_size, parallel, method, max_retries, wait, shard_key_selector, update_filter, **kwargs)
   1797     if requires_inference:
   1798         points = self._embed_models_strict(
   1799             points, parallel=parallel, batch_size=self.local_inference_batch_size
   1800         )
-> 1801 return self._client.upload_points(
   1802     collection_name=collection_name,
   1803     points=points,
   1804     batch_size=batch_size,
   1805     parallel=parallel,
   1806     method=method,
   1807     max_retries=max_retries,
   1808     wait=wait,
   1809     shard_key_selector=shard_key_selector,
   1810     update_filter=update_filter,
   1811 )

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/async_qdrant_remote.py:1991, in AsyncQdrantRemote.upload_points(self, collection_name, points, batch_size, parallel, method, max_retries, wait, shard_key_selector, update_filter, **kwargs)
   1975 def upload_points(
   1976     self,
   1977     collection_name: str,
   (...)   1986     **kwargs: Any,
   1987 ) -> None:
   1988     batches_iterator = self._updater_class.iterate_records_batches(
   1989         records=points, batch_size=batch_size
   1990     )
-> 1991     self._upload_collection(
   1992         batches_iterator=batches_iterator,
   1993         collection_name=collection_name,
   1994         max_retries=max_retries,
   1995         parallel=parallel,
   1996         method=method,
   1997         wait=wait,
   1998         shard_key_selector=shard_key_selector,
   1999         update_filter=update_filter,
   2000     )

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/async_qdrant_remote.py:1968, in AsyncQdrantRemote._upload_collection(self, batches_iterator, collection_name, max_retries, parallel, method, wait, shard_key_selector, update_filter)
   1966 if parallel == 1:
   1967     updater = self._updater_class.start(**updater_kwargs)
-> 1968     for _ in updater.process(batches_iterator):
   1969         pass
   1970 else:

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/uploader/rest_uploader.py:110, in RestBatchUploader.process(self, items)
    108 def process(self, items: Iterable[Any]) -> Iterable[bool]:
    109     for batch in items:
--> 110         yield upload_batch(
    111             self.openapi_client,
    112             self.collection_name,
    113             batch,
    114             shard_key_selector=self._shard_key_selector,
    115             max_retries=self.max_retries,
    116             update_filter=self._update_filter,
    117             wait=self._wait,
    118         )

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/uploader/rest_uploader.py:68, in upload_batch(openapi_client, collection_name, batch, max_retries, shard_key_selector, update_filter, wait)
     61         show_warning(
     62             message=f"Batch upload failed {attempt + 1} times. Retrying...",
     63             category=UserWarning,
     64             stacklevel=7,
     65         )
     67         if attempt == max_retries - 1:
---> 68             raise e
     70         attempt += 1
     71 return True

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/uploader/rest_uploader.py:44, in upload_batch(openapi_client, collection_name, batch, max_retries, shard_key_selector, update_filter, wait)
     42 while attempt < max_retries:
     43     try:
---> 44         openapi_client.points_api.upsert_points(
     45             collection_name=collection_name,
     46             point_insert_operations=rest.PointsList(  # type: ignore[attr-defined]
     47                 points=points, shard_key=shard_key_selector, update_filter=update_filter
     48             ),
     49             wait=wait,
     50         )
     51         break
     52     except ResourceExhaustedResponse as ex:

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/http/api/points_api.py:994, in SyncPointsApi.upsert_points(self, collection_name, wait, ordering, point_insert_operations)
    984 def upsert_points(
    985     self,
    986     collection_name: str,
   (...)    989     point_insert_operations: m.PointInsertOperations = None,
    990 ) -> m.InlineResponse2005:
    991     """
    992     Perform insert + updates on points. If point with given ID already exists - it will be overwritten.
    993     """
--> 994     return self._build_for_upsert_points(
    995         collection_name=collection_name,
    996         wait=wait,
    997         ordering=ordering,
    998         point_insert_operations=point_insert_operations,
    999     )

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/http/api/points_api.py:515, in _PointsApi._build_for_upsert_points(self, collection_name, wait, ordering, point_insert_operations)
    513 if "Content-Type" not in headers:
    514     headers["Content-Type"] = "application/json"
--> 515 return self.api_client.request(
    516     type_=m.InlineResponse2005,
    517     method="PUT",
    518     url="/collections/{collection_name}/points",
    519     headers=headers if headers else None,
    520     path_params=path_params,
    521     params=query_params,
    522     content=body,
    523 )

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/http/api_client.py:95, in ApiClient.request(self, type_, method, url, path_params, **kwargs)
     93     kwargs["timeout"] = int(kwargs["params"]["timeout"])
     94 request = self._client.build_request(method, url, **kwargs)
---> 95 return self.send(request, type_)

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/http/api_client.py:112, in ApiClient.send(self, request, type_)
    111 def send(self, request: Request, type_: Type[T]) -> T:
--> 112     response = self.middleware(request, self.send_inner)
    114     if response.status_code == 429:
    115         retry_after_s = response.headers.get("Retry-After", None)

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/http/api_client.py:250, in BaseMiddleware.__call__(self, request, call_next)
    249 def __call__(self, request: Request, call_next: Send) -> Response:
--> 250     return call_next(request)

File ~/intugle/data-tools/.venv/lib/python3.12/site-packages/qdrant_client/http/api_client.py:136, in ApiClient.send_inner(self, request)
    134     response = self._client.send(request)
    135 except Exception as e:
--> 136     raise ResponseHandlingException(e)
    137 return response

ResponseHandlingException: The write operation timed out

Would you be able to investigate?

@Mukesh-P
Contributor Author

@raphael-intugle Can you try now? It should work.

@raphael-intugle
Collaborator

Great find @Mukesh-P!
That was the correct direction. It seems the batch size needs to be tuned to the user's cluster configuration; for me it worked when I set the batch size to 5.

So the ideal way to handle this is to let the user set it as an environment variable. Would you be able to make that change?

Basically, settings.py should define a new variable called QDRANT_INSERT_BATCH_SIZE with a default value. The user can override it from the environment, and SemanticSearchCRUD should fall back to it whenever no value is passed in to the function.

Feel free to get in touch with me on Discord if you need more clarity!

@Mukesh-P
Contributor Author

Mukesh-P commented Dec 31, 2025

@raphael-intugle Can you check now? I hope it works.



Development

Successfully merging this pull request may close these issues.

[HELP WANTED] Bug: Qdrant semantic search fails due to missing index for "type" field
