
Commit 27b6472

together.ai embedder (#290)
1 parent e1ca480 commit 27b6472


4 files changed: 25 additions, 17 deletions


api-reference/how-to/embedding.mdx

Lines changed: 13 additions & 9 deletions
@@ -45,41 +45,45 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 
 - The provider ID `aws-bedrock` for [Amazon Bedrock](https://aws.amazon.com/bedrock/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/bedrock/).
 - `huggingface` for [Hugging Face](https://huggingface.co/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/).
+- `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
+- `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).
 - `openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
+- `togetherai` for [Together.ai](https://www.together.ai/). [Learn more](https://docs.together.ai/docs/embedding-models).
 - `vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
 - `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
-- `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
-- `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).
-
+
 2. Run the following command to install the required Python package for the embedding provider:
 
 - For `aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`.
 - For `huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`.
+- For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
+- For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.
 - For `openai`, run `pip install "unstructured-ingest[openai]"`.
+- For `togetherai`, run `pip install "unstructured-ingest[togetherai]"`.
 - For `vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`.
 - For `voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`.
-- For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
-- For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.
 
 3. For the following embedding providers, you can choose the model that you want to use. If you do choose a model, note the model's name:
 
 - `aws-bedrock`. [Choose a model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). No default model is provided. [Learn more about the supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html).
 - `huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
+- `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
+- `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
 - `openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
+- `togetherai`. [Choose a model](https://docs.together.ai/docs/embedding-models), or use the default model `togethercomputer/m2-bert-80M-8k-retrieval`.
 - `vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
 - `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
-- `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
-- `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
 
 4. Note the special settings to connect to the provider:
 
 - For `aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
 - For `huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation.
+- For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
+- For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).
 - For `openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key).
+- For `togetherai`, you'll need a together.ai API key value. [Get a together.ai API key](https://docs.together.ai/reference/authentication-1).
 - For `vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth).
 - For `voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).
-- For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
-- For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).
 
 5. Now, apply all of this information as follows, and then run your command or code:
 
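Step 4 of this hunk amounts to a per-provider list of required connection settings, which can be checked before a run. The following is a minimal Python sketch, not part of unstructured-ingest: the `missing_settings` helper and the setting names (`api_key`, `aws_region`, `credentials_json_path`, and so on) are hypothetical, and the sketch encodes only the requirements stated in step 4.

```python
from typing import Mapping

# Hypothetical setting names, chosen for illustration only. They mirror the
# requirements in step 4 but are NOT unstructured-ingest parameter names.
REQUIRED_SETTINGS: dict[str, tuple[str, ...]] = {
    "aws-bedrock": ("aws_access_key_id", "aws_secret_access_key", "aws_region"),
    "huggingface": (),  # an API key is needed only for gated or private models
    "mixedbread-ai": ("api_key",),
    "octoai": ("api_key",),  # Octo AI calls this value an API token
    "openai": ("api_key",),
    "togetherai": ("api_key",),
    "vertexai": ("credentials_json_path",),
    "voyageai": ("api_key",),
}


def missing_settings(provider: str, settings: Mapping[str, str]) -> list[str]:
    """Return the required setting names that are absent or empty for a provider."""
    if provider not in REQUIRED_SETTINGS:
        raise ValueError(f"unknown embedding provider: {provider!r}")
    # Preserve the declared order so error messages are predictable.
    return [name for name in REQUIRED_SETTINGS[provider] if not settings.get(name)]
```

For example, `missing_settings("togetherai", {})` returns `["api_key"]`, while `missing_settings("huggingface", {})` returns an empty list because a key is only needed for gated models.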
api-reference/ingest/ingest-dependencies.mdx

Lines changed: 1 addition & 0 deletions
@@ -98,6 +98,7 @@ To add support for available embedding libraries, run the following:
 | `pip install "unstructured-ingest[embed-voyageai]"` | Voyage AI |
 | `pip install "unstructured-ingest[embed-mixedbreadai]"` | Mixedbread |
 | `pip install "unstructured-ingest[openai]"` | OpenAI |
+| `pip install "unstructured-ingest[togetherai]"` | together.ai |
 
 For details about the specific dependencies that are installed, see:
 
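The dependency table pairs each embedding provider with a pip extra. As a hypothetical illustration (the `PIP_EXTRAS` mapping and `install_command` helper are not part of unstructured-ingest), the provider-to-extra pairs listed in step 2 of `embedding.mdx` in this commit can generate the matching install command:

```python
# Provider ID -> pip extra, copied from step 2 of embedding.mdx in this commit.
# Note that a few extras (bedrock, openai, togetherai) lack the "embed-" prefix.
PIP_EXTRAS: dict[str, str] = {
    "aws-bedrock": "bedrock",
    "huggingface": "embed-huggingface",
    "mixedbread-ai": "embed-mixedbreadai",
    "octoai": "embed-octoai",
    "openai": "openai",
    "togetherai": "togetherai",
    "vertexai": "embed-vertexai",
    "voyageai": "embed-voyageai",
}


def install_command(provider: str) -> str:
    """Build the pip command that installs the extra for one embedding provider."""
    try:
        extra = PIP_EXTRAS[provider]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {provider!r}") from None
    # Quote the requirement so shells such as zsh do not expand the brackets.
    return f'pip install "unstructured-ingest[{extra}]"'
```

Keeping the mapping in one place makes irregular extra names, such as `bedrock` versus `embed-voyageai`, easy to spot.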
open-source/core-functionality/embedding.mdx

Lines changed: 1 addition & 0 deletions
@@ -72,5 +72,6 @@ For information about how to use Python scripts to call various embedding provid
 - [Hugging Face](https://huggingface.co/blog/getting-started-with-embeddings)
 - [OctoAI](https://octo.ai/blog/introducing-octoais-embedding-api-to-power-your-rag-needs/)
 - [OpenAI](https://platform.openai.com/docs/guides/embeddings)
+- [together.ai](https://docs.together.ai/docs/embeddings-overview)
 - [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings)
 - [Voyage AI](https://docs.voyageai.com/docs/embeddings)
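The together.ai link added here points to its embeddings documentation. The following is a rough Python sketch of a direct call, using only the standard library. It assumes together.ai's OpenAI-compatible endpoint at `https://api.together.xyz/v1/embeddings`, a JSON body with `model` and `input` fields, and bearer-token authentication. Check the linked documentation before relying on any of these details. The `build_embedding_request` helper is hypothetical. It only prepares the request and never sends it.

```python
import json
import urllib.request

# Assumed endpoint (OpenAI-compatible REST API); verify against together.ai docs.
TOGETHER_EMBEDDINGS_URL = "https://api.together.xyz/v1/embeddings"
# Default model named in this commit's documentation.
DEFAULT_MODEL = "togethercomputer/m2-bert-80M-8k-retrieval"


def build_embedding_request(
    api_key: str, texts: list[str], model: str = DEFAULT_MODEL
) -> urllib.request.Request:
    """Prepare, but do not send, a POST request asking together.ai to embed texts."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        TOGETHER_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, you could pass the request to `urllib.request.urlopen` and parse the JSON response. The exact response shape is not covered here, so consult the together.ai documentation.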

snippets/ingest-configuration-shared/embedding-configuration.mdx

Lines changed: 10 additions & 8 deletions
@@ -10,7 +10,7 @@ A common embedding configuration is a critical component that allows for dynamic
 
 * <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`aws_secret_access_key`: The AWS secret access key to be used for AWS-based embedders, such as Amazon Bedrock.
 
-* <Icon icon="v"/><Icon icon="2"/>&nbsp;&nbsp;`embedding_provider`: The embedding provider to use while doing embedding. Available values include `openai`, `huggingface`, `aws-bedrock`, `vertexai`, `voyageai`, and `octoai`.
+* <Icon icon="v"/><Icon icon="2"/>&nbsp;&nbsp;`embedding_provider`: The embedding provider to use while doing embedding. Available values include `aws-bedrock`, `huggingface`, `octoai`, `openai`, `togetherai`, `vertexai`, and `voyageai`.
 
 * <Icon icon="v"/><Icon icon="2"/>&nbsp;&nbsp;`embedding_api_key`: The API key to use, if one is required to generate the embeddings through an API service, such as OpenAI.
 
@@ -24,21 +24,23 @@ A common embedding configuration is a critical component that allows for dynamic
 
 * <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`model_name`: The specific model to use for the embedding provider, if necessary.
 
-* <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`provider`: The embedding provider to use while doing embedding. Available values include `openai`, `huggingface`, `aws-bedrock`, `vertexai`, `voyageai`, and `octoai`.
+* <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`provider`: The embedding provider to use while doing embedding. Available values include `aws-bedrock`, `huggingface`, `octoai`, `openai`, `togetherai`, `vertexai`, and `voyageai`.
 
 
 <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;The default `model_name` values unless otherwise specified are:
 
-* `openai`: `text-embedding-ada-002`
+* `aws-bedrock`: None
 
 * `huggingface`: `sentence-transformers/all-MiniLM-L6-v2`
 
-* `aws-bedrock`: None
+* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`
 
-* `vertexai`: `textembedding-gecko@001`
+* `octoai`: `thenlper/gte-large`
 
-* `voyageai`: None
+* `openai`: `text-embedding-ada-002`
 
-* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`
+* `togetherai`: `togethercomputer/m2-bert-80M-8k-retrieval`
+
+* `vertexai`: `textembedding-gecko@001`
 
-* `octoai`: `thenlper/gte-large`
+* `voyageai`: None
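The default `model_name` list works as a lookup table. The following is a minimal Python sketch built around a hypothetical `resolve_model_name` helper, which is not the unstructured-ingest implementation. The helper uses an explicitly supplied model name when one is given and otherwise falls back to the defaults listed in this hunk. For `aws-bedrock` and `voyageai`, which have no default, it returns `None` and leaves the decision to the caller.

```python
from typing import Optional

# Default model names as listed in embedding-configuration.mdx in this commit.
DEFAULT_MODEL_NAMES: dict[str, Optional[str]] = {
    "aws-bedrock": None,
    "huggingface": "sentence-transformers/all-MiniLM-L6-v2",
    "mixedbread-ai": "mixedbread-ai/mxbai-embed-large-v1",
    "octoai": "thenlper/gte-large",
    "openai": "text-embedding-ada-002",
    "togetherai": "togethercomputer/m2-bert-80M-8k-retrieval",
    "vertexai": "textembedding-gecko@001",
    "voyageai": None,
}


def resolve_model_name(provider: str, model_name: Optional[str] = None) -> Optional[str]:
    """Return the explicit model name if given, else the provider's default (may be None)."""
    if provider not in DEFAULT_MODEL_NAMES:
        raise ValueError(f"unknown embedding provider: {provider!r}")
    # An empty string is treated the same as "not specified".
    return model_name if model_name else DEFAULT_MODEL_NAMES[provider]
```

For example, `resolve_model_name("togetherai")` returns `"togethercomputer/m2-bert-80M-8k-retrieval"`, while `resolve_model_name("voyageai")` returns `None`, signalling that a model must be chosen explicitly.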
