Is there support for a faster serialization format than JSON for embeddings? #630
Replies: 3 comments
-
Hi @honeyspoon, currently there are no other binary formats supported. However, to limit the duration of the vector conversion as described, we are leveraging the […]. I did a quick check and it seems that:

```python
# Round-trip a 1000-float payload ITER times with each library and time it.
import json
import random
import sys
import time

import msgpack
import orjson
import ormsgpack

random.seed(1)
test = {"embeddings": [random.random() for _ in range(1000)]}
ITER = 10000

print("JSON")
start = time.perf_counter()
for i in range(ITER):
    r1 = json.dumps(test)
    ret = json.loads(r1)
print(sys.getsizeof(r1))
print(time.perf_counter() - start)

print("ORJSON")
start = time.perf_counter()
for i in range(ITER):
    r1 = orjson.dumps(test)
    ret = orjson.loads(r1)
print(sys.getsizeof(r1))
print(time.perf_counter() - start)

print("MSGPACK")
start = time.perf_counter()
for i in range(ITER):
    r1 = msgpack.packb(test)
    ret = msgpack.unpackb(r1)
print(sys.getsizeof(r1))
print(time.perf_counter() - start)

print("ORMSGPACK")
start = time.perf_counter()
for i in range(ITER):
    r1 = ormsgpack.packb(test)
    ret = ormsgpack.unpackb(r1)  # was msgpack.unpackb; keep the round trip within ormsgpack
print(sys.getsizeof(r1))
print(time.perf_counter() - start)
```

Results:
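A note not from the original reply: besides timing, payload size differs a lot between decimal text and binary. A minimal sketch comparing the same 1000-float vector as JSON text, raw float32 bytes, and base64-wrapped float32, assuming numpy is available:

```python
# Illustrative sketch, not part of the benchmark above: rough payload sizes
# for the same 1000-float vector in text vs. binary form (assumes numpy).
import base64
import json
import random

import numpy as np

random.seed(1)
vec = [random.random() for _ in range(1000)]

as_json = json.dumps(vec)                              # decimal text, roughly 18-20 bytes per float
as_f32 = np.asarray(vec, dtype=np.float32).tobytes()   # 4 bytes per float (lossy vs. float64)
as_b64 = base64.b64encode(as_f32)                      # binary made JSON-safe, ~4/3 the raw size

print(len(as_json), len(as_f32), len(as_b64))
```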
-
I decided to just build my own server on top of the engine.
-
Yes, there is the OpenAI encoding format, which is supported if you are e.g. using oak.
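For context (not stated in the reply above): the OpenAI embeddings API accepts `encoding_format="base64"`, in which case each embedding comes back as a base64 string of little-endian float32 values rather than a JSON array of floats. A minimal decoding sketch, assuming that convention and numpy:

```python
# Sketch only: decode an embedding returned as base64-encoded little-endian
# float32, the convention the OpenAI embeddings API uses for
# encoding_format="base64". Assumes numpy.
import base64

import numpy as np

def decode_base64_embedding(b64: str) -> np.ndarray:
    raw = base64.b64decode(b64)
    return np.frombuffer(raw, dtype="<f4")  # little-endian float32

# Hypothetical usage with an OpenAI-compatible client:
# resp = client.embeddings.create(model="...", input=["..."], encoding_format="base64")
# vec = decode_base64_embedding(resp.data[0].embedding)
```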
-
I'm trying to compute embeddings as fast as possible, and I would like to get rid of the last serialization step to JSON.
Converting a 1k-float vector into a decimal string doesn't seem that efficient.
Is there support for other binary formats like msgpack or Arrow?
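For illustration (not something this thread says is supported): a minimal sketch of round-tripping such a vector through the Arrow IPC stream format, assuming pyarrow is installed:

```python
# Illustrative sketch, assuming pyarrow: round-trip a 1k-float embedding
# through the Arrow IPC stream format instead of JSON text.
import random

import pyarrow as pa

vec = [random.random() for _ in range(1000)]
batch = pa.record_batch([pa.array(vec, type=pa.float32())], names=["embedding"])

# Serialize to a binary IPC stream.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
payload = sink.getvalue()

# Deserialize back to floats.
reader = pa.ipc.open_stream(payload)
roundtripped = reader.read_next_batch().column(0).to_pylist()
```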