formally support int8 and uint8 within langchain and 2 distance metrics #561

Sorry for the overlapping posts, but I thought it prudent to initiate a separate issue if this is going to be worked on:

Here's a summary:

1) Both langchain and sentence transformers allow a model to be run in float32, bfloat16, and float16.
2) The Sentence Transformers ```encode``` method only outputs float32 (and certain quantizations), so the embeddings themselves end up in float32 regardless of the precision the model runs in.
3) ```tiledb.py```, langchain's integration with the ```tiledb``` library, automatically converts embeddings to ```float32``` here:

```np.array([np.array(embedding).astype(np.float32)]).astype(np.float32),```

4) [Relatively recent versions](https://github.com/UKPLab/sentence-transformers/releases/tag/v2.6.0) of ```sentence-transformers``` support ```int8```, ```uint8```, ```binary```, and ```ubinary``` embeddings. [Pull request here](https://github.com/UKPLab/sentence-transformers/pull/2549)

5) ```TileDB``` [seems to support](https://docs.tiledb.com/spark/supported-datatypes) ```int8``` and ```uint8``` but not the other two.
6) Again, langchain's integration of ```tiledb``` within ```tiledb.py``` doesn't distinguish and converts everything to ```float32```.
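
To make items 3 and 6 concrete, here is a minimal sketch that applies the quoted cast to a small ```int8``` array. The array is a made-up stand-in for a quantized embedding, not real model output; only NumPy is used.

```python
import numpy as np

# Made-up stand-in for an int8 embedding (not real model output).
embedding = np.array([-128, -5, 0, 42, 127], dtype=np.int8)

# The cast quoted in item 3, applied to that embedding.
stored = np.array([np.array(embedding).astype(np.float32)]).astype(np.float32)

print(stored.dtype)                      # float32
print(stored.shape)                      # (1, 5)
print(embedding.nbytes, stored.nbytes)   # 5 20
```

The values survive the cast unchanged, but the ```int8``` dtype is lost and each element takes four bytes instead of one.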

Does that accurately summarize the current state of affairs? Is it possible to at least modify ```tiledb.py``` to formally support ```int8``` and ```uint8```, if not the other two? I noticed that [@nikolaos](https://github.com/NikolaosPapailiou) did the initial integration in November 2023. Is he still around at the company by chance? lol.
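
For illustration only, this is roughly the kind of change I have in mind. ```to_query_array``` and ```PRESERVED_DTYPES``` are hypothetical names, not anything that exists in langchain or TileDB today, and the sketch assumes the index itself was created with a matching dtype, which I have not verified.

```python
import numpy as np

# Hypothetical allow-list: dtypes kept as-is instead of being cast to float32.
PRESERVED_DTYPES = (np.dtype(np.int8), np.dtype(np.uint8))


def to_query_array(embedding, default_dtype=np.float32):
    """Return a (1, dim) array, keeping int8/uint8 inputs in their own dtype."""
    arr = np.asarray(embedding)
    # Anything outside the allow-list falls back to the current float32 behavior.
    target = arr.dtype if arr.dtype in PRESERVED_DTYPES else np.dtype(default_dtype)
    return arr.astype(target, copy=False).reshape(1, -1)


print(to_query_array(np.array([1, -2, 3], dtype=np.int8)).dtype)  # int8
print(to_query_array([0.1, 0.2, 0.3]).dtype)                      # float32
```

Float inputs would behave exactly as they do now; only arrays that already arrive as ```int8``` or ```uint8``` would skip the cast.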

For cross-reference, here is the related "issue" where I realized this: https://github.com/TileDB-Inc/TileDB-Py/issues/2130#issuecomment-2573687441
