Description
I'm working on a webserver that lets users upload large numbers of documents, and I've run into a bottleneck. The JS server creates async requests to the Qdrant docker container; most of them work, but Qdrant seems to get overwhelmed when the client keeps sending file upload requests. I've also noticed that this seems to lock up other system resources like MongoDB connections. Our server is able to process and store the files just fine; the source of the error is strictly the Qdrant .upsert(...) call:
TypeError: fetch failed
at node:internal/deps/undici/undici:12500:13
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async fetchJson (/home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/openapi-typescript-fetch/dist/cjs/fetcher.js:135:22)
at async /home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/js-client-rest/dist/cjs/api-client.js:46:26
at async handler (/home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/openapi-typescript-fetch/dist/cjs/fetcher.js:156:16)
at async /home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/js-client-rest/dist/cjs/api-client.js:32:24
at async handler (/home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/openapi-typescript-fetch/dist/cjs/fetcher.js:156:16)
at async fetchUrl (/home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/openapi-typescript-fetch/dist/cjs/fetcher.js:162:22)
at async Object.fun [as upsertPoints] (/home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/openapi-typescript-fetch/dist/cjs/fetcher.js:168:20)
at async QdrantClient.upsert (/home/torrin/Repos/Equator/ai-app/node_modules/@qdrant/js-client-rest/dist/cjs/qdrant-client.js:553:26)
at (backend/module/ai/Vector.js:132:22)
at (backend/module/ai/Vector.js:178:12)
at (backend/module/util/Upload.js:44:21)
at (backend/module/account/Login.js:75:13) {
[cause]: ConnectTimeoutError: Connect Timeout Error
at onConnectTimeout (/home/torrin/Repos/Equator/ai-app/node_modules/undici/lib/core/connect.js:186:24)
at /home/torrin/Repos/Equator/ai-app/node_modules/undici/lib/core/connect.js:133:46
at Immediate._onImmediate (/home/torrin/Repos/Equator/ai-app/node_modules/undici/lib/core/connect.js:174:9)
at process.processImmediate (node:internal/timers:478:21) {
code: 'UND_ERR_CONNECT_TIMEOUT'
}
}

The code I'm using looks something like this:
const create = async ({ text, fileID, user }) => {
  text = text.replace(/\s+/g, ' ').trim();
  if (!text) throw new Error('File does not contain valid text.');

  const { embeddings, chunks } = await getTextEmbeddings({ text, user });
  const points = embeddings.map((embedding, i) => ({
    id: uuidv4(),
    vector: embedding,
    payload: {
      file_id: fileID,
      chunk_id: i,
      text: chunks[i],
      creator: user
    }
  }));

  const batchPoints = (points, maxSize) => {
    const batches = [];
    let currentBatch = [];
    let currentSize = 0;
    for (const point of points) {
      const pointSize = Buffer.byteLength(JSON.stringify(point), 'utf8'); // Estimate size
      if (currentSize + pointSize > maxSize && currentBatch.length > 0) {
        batches.push(currentBatch);
        currentBatch = [];
        currentSize = 0;
      }
      currentBatch.push(point);
      currentSize += pointSize;
    }
    if (currentBatch.length > 0) batches.push(currentBatch);
    return batches;
  };

  const batches = batchPoints(points, MAX_PAYLOAD_SIZE);
  try {
    for (const batch of batches) {
      await qdrantClient.upsert(COLLECTION_NAME, { points: batch });
    }
  } catch (e) {
    console.error(e);
  }
};

Everything up until the upsert call works as expected, chunking and embedding the incoming text. The error happens once a large number of requests to Qdrant are in flight, which seems odd, since it's not a crazy number of requests imo: 38 files with a combined size of 300 MB.
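For reference, the pieces not shown in that snippet (client, collection name, batch size limit) are set up elsewhere in the module, roughly like this; the exact names and values below are approximations, with MAX_PAYLOAD_SIZE chosen to stay under the 32 MB service.max_request_size_mb in the config further down:

```js
const { QdrantClient } = require('@qdrant/js-client-rest');
const { v4: uuidv4 } = require('uuid');

// Approximate values - each batch is kept below Qdrant's
// service.max_request_size_mb limit (32 MB in the config below).
const COLLECTION_NAME = 'documents';
const MAX_PAYLOAD_SIZE = 30 * 1024 * 1024;

// Single shared client pointing at the docker container's REST port.
const qdrantClient = new QdrantClient({ url: 'http://localhost:6333' });
```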
If this is a fundamental limit with Qdrant then I'm a bit concerned and might have to resort to a queue system; my hope is that there is something very wrong with my setup instead.
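To be concrete about what I mean by a queue: the simplest version would be an in-process pool that caps how many upsert batches are in flight at once, roughly like this (sketch only; the helper name and concurrency value are made up):

```js
// Run batches through a small pool of workers so only a few upserts
// hit Qdrant at any one time, instead of one request per batch at once.
const upsertWithLimit = async (batches, concurrency = 4) => {
  let next = 0;
  const worker = async () => {
    while (next < batches.length) {
      const batch = batches[next++]; // synchronous pick, so no race between workers
      await qdrantClient.upsert(COLLECTION_NAME, { points: batch });
    }
  };
  await Promise.all(Array.from({ length: concurrency }, worker));
};
```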
Here is the config file; I copied it mostly from the docs:
log_level: INFO

storage:
  # Path to store all the data
  storage_path: /qdrant/storage

  # Where to store snapshots
  snapshots_path: /qdrant/storage/snapshots

  snapshots_config:
    # "local" or "s3" - where to store snapshots
    snapshots_storage: local

  # Where to store temporary files
  # If null, temporary snapshot are stored in: storage/snapshots_temp/
  temp_path: null

  # If true - point's payload will not be stored in memory.
  # It will be read from the disk every time it is requested.
  on_disk_payload: true

  # Maximum number of concurrent updates to shard replicas
  update_concurrency: null

  # Write-ahead-log related configuration
  wal:
    wal_capacity_mb: 32
    wal_segments_ahead: 0

  node_type: "Normal"

  performance:
    max_search_threads: 0
    max_optimization_threads: 0
    optimizer_cpu_budget: 0
    update_rate_limit: null

  optimizers:
    deleted_threshold: 0.2
    vacuum_min_vector_number: 1000
    default_segment_number: 0
    max_segment_size_kb: null
    memmap_threshold_kb: 1000
    indexing_threshold_kb: 20000
    flush_interval_sec: 5
    max_optimization_threads: null

  hnsw_index:
    m: 16
    ef_construct: 100
    full_scan_threshold_kb: 10000
    max_indexing_threads: 0
    on_disk: true

service:
  max_request_size_mb: 32
  max_workers: 0
  host: 0.0.0.0
  http_port: 6333
  grpc_port: 6334
  enable_cors: true
  enable_tls: false

telemetry_disabled: false
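For completeness, the container itself is started the standard way from the docs, more or less docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/config.yaml:/qdrant/config/production.yaml qdrant/qdrant (paths here are placeholders, not my exact setup).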