together.ai embedder (#290)

Paul-Cornell · web-flow · commit 27b64724ee86 · 2024-10-21T16:29:04.000-07:00
diff --git a/api-reference/how-to/embedding.mdx b/api-reference/how-to/embedding.mdx
@@ -45,41 +45,45 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 
    - The provider ID `aws-bedrock` for [Amazon Bedrock](https://aws.amazon.com/bedrock/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/bedrock/).
    - `huggingface` for [Hugging Face](https://huggingface.co/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/).
+   - `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
+   - `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).
    - `openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
+   - `togetherai` for [together.ai](https://www.together.ai/). [Learn more](https://docs.together.ai/docs/embedding-models).
    - `vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
    - `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
-   - `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
-   - `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).
-
+   
 2. Run the following command to install the required Python package for the embedding provider:
 
    - For `aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`.
    - For `huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`.
+   - For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
+   - For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.
    - For `openai`, run `pip install "unstructured-ingest[openai]"`.
+   - For `togetherai`, run `pip install "unstructured-ingest[togetherai]"`.
    - For `vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`.
    - For `voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`.
-   - For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
-   - For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.
 
 3. For the following embedding providers, you can choose the model that you want to use. If you do choose a model, note the model's name:
 
    - `aws-bedrock`. [Choose a model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). No default model is provided. [Learn more about the supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html).
    - `huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
+   - `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
+   - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
    - `openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
+   - `togetherai`. [Choose a model](https://docs.together.ai/docs/embedding-models), or use the default model `togethercomputer/m2-bert-80M-8k-retrieval`.
    - `vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
    - `voyageai`.  [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
-   - `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
-   - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
 
 4. Note the special settings to connect to the provider:
 
    - For `aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
    - For `huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation. 
+   - For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
+   - For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).
    - For `openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key).
+   - For `togetherai`, you'll need a together.ai API key value. [Get a together.ai API key](https://docs.together.ai/reference/authentication-1).
    - For `vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth).
    - For `voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).
-   - For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
-   - For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).
 
 5. Now, apply all of this information as follows, and then run your command or code:
 
diff --git a/api-reference/ingest/ingest-dependencies.mdx b/api-reference/ingest/ingest-dependencies.mdx
@@ -98,6 +98,7 @@ To add support for available embedding libraries, run the following:
 | `pip install "unstructured-ingest[embed-voyageai]"` | Voyage AI |
 | `pip install "unstructured-ingest[embed-mixedbreadai]"` | Mixedbread |
 | `pip install "unstructured-ingest[openai]"` | OpenAI |
+| `pip install "unstructured-ingest[togetherai]"` | together.ai |
 
 For details about the specific dependencies that are installed, see:
 
diff --git a/open-source/core-functionality/embedding.mdx b/open-source/core-functionality/embedding.mdx
@@ -72,5 +72,6 @@ For information about how to use Python scripts to call various embedding provid
 - [Hugging Face](https://huggingface.co/blog/getting-started-with-embeddings)
 - [OctoAI](https://octo.ai/blog/introducing-octoais-embedding-api-to-power-your-rag-needs/)
 - [OpenAI](https://platform.openai.com/docs/guides/embeddings)
+- [together.ai](https://docs.together.ai/docs/embeddings-overview)
 - [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings)
 - [Voyage AI](https://docs.voyageai.com/docs/embeddings)
diff --git a/snippets/ingest-configuration-shared/embedding-configuration.mdx b/snippets/ingest-configuration-shared/embedding-configuration.mdx
@@ -10,7 +10,7 @@ A common embedding configuration is a critical component that allows for dynamic
 
 *   <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`aws_secret_access_key`: The AWS secret access key to be used for AWS-based embedders, such as Amazon Bedrock.
 
-*   <Icon icon="v"/><Icon icon="2"/>&nbsp;&nbsp;`embedding_provider`: The embedding provider to use while doing embedding. Available values include `openai`, `huggingface`, `aws-bedrock`, `vertexai`, `voyageai`, and `octoai`.
+*   <Icon icon="v"/><Icon icon="2"/>&nbsp;&nbsp;`embedding_provider`: The embedding provider to use while doing embedding. Available values include `aws-bedrock`, `huggingface`, `octoai`, `openai`, `togetherai`, `vertexai`, and `voyageai`.
     
 *   <Icon icon="v"/><Icon icon="2"/>&nbsp;&nbsp;`embedding_api_key`: The API key to use, if one is required to generate the embeddings through an API service, such as OpenAI.
 
@@ -24,21 +24,23 @@ A common embedding configuration is a critical component that allows for dynamic
 
 *   <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`model_name`: The specific model to use for the embedding provider, if necessary.
 
-*   <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`provider`: The embedding provider to use while doing embedding. Available values include `openai`, `huggingface`, `aws-bedrock`, `vertexai`, `voyageai`, and `octoai`.
+*   <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;`provider`: The embedding provider to use while doing embedding. Available values include `aws-bedrock`, `huggingface`, `octoai`, `openai`, `togetherai`, `vertexai`, and `voyageai`.
 
 
 <Icon icon="v"/><Icon icon="1"/>&nbsp;&nbsp;The default `model_name` values unless otherwise specified are:
 
-* `openai`: `text-embedding-ada-002`
+* `aws-bedrock`: None
 
 * `huggingface`: `sentence-transformers/all-MiniLM-L6-v2`
 
-* `aws-bedrock`: None
+* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`
 
-* `vertexai`: `textembedding-gecko@001`
+* `octoai`: `thenlper/gte-large`
 
-* `voyageai`: None
+* `openai`: `text-embedding-ada-002`
 
-* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`
+* `togetherai`: `togethercomputer/m2-bert-80M-8k-retrieval`
+
+* `vertexai`: `textembedding-gecko@001`
 
-* `octoai`: `thenlper/gte-large`
+* `voyageai`: None
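
Reviewer note: the provider IDs, pip extras, and default `model_name` values touched by this change can be cross-checked with a small lookup table. The following is a minimal sketch, not part of the commit and not an `unstructured-ingest` API. The names `PROVIDERS`, `install_command`, and `resolve_model` are hypothetical, and the values are transcribed from the lists in the diff above.

```python
from typing import Optional

# Hypothetical table: provider ID -> (pip extra, default model name).
# Values are copied from the documentation lists in this diff.
# None means the docs state that no default model is provided.
PROVIDERS = {
    "aws-bedrock": ("bedrock", None),
    "huggingface": ("embed-huggingface", "sentence-transformers/all-MiniLM-L6-v2"),
    "mixedbread-ai": ("embed-mixedbreadai", "mixedbread-ai/mxbai-embed-large-v1"),
    "octoai": ("embed-octoai", "thenlper/gte-large"),
    "openai": ("openai", "text-embedding-ada-002"),
    "togetherai": ("togetherai", "togethercomputer/m2-bert-80M-8k-retrieval"),
    "vertexai": ("embed-vertexai", "textembedding-gecko@001"),
    "voyageai": ("embed-voyageai", None),
}


def install_command(provider: str) -> str:
    """Build the pip command that the docs list for the given provider ID."""
    extra, _ = PROVIDERS[provider]
    return f'pip install "unstructured-ingest[{extra}]"'


def resolve_model(provider: str, model_name: Optional[str] = None) -> str:
    """Return the explicit model name, or fall back to the documented default."""
    _, default = PROVIDERS[provider]
    chosen = model_name or default
    if chosen is None:
        # Providers such as aws-bedrock and voyageai have no documented default.
        raise ValueError(f"{provider} has no default model; pass model_name")
    return chosen


if __name__ == "__main__":
    print(install_command("togetherai"))
    print(resolve_model("togetherai"))
```

Keeping the extras and defaults in one table makes it easier to spot a provider that appears in one list but not another, which is the kind of drift this commit's alphabetical reordering is meant to reduce.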