Commit e5a9a0a

docs(ifr): add documentation for moshi (#3924)
1 parent bd9f871 commit e5a9a0a

File tree

2 files changed: +173, -0 lines changed

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
---
meta:
  title: Understanding the Moshika-0.1-8b model
  description: Deploy your own secure Moshika-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Moshika-0.1-8b model
  paragraph: This page provides information on the Moshika-0.1-8b model
tags:
dates:
  validation: 2024-10-30
  posted: 2024-10-30
categories:
  - ai-data
---

## Model overview

| Attribute            | Details                                        |
|----------------------|------------------------------------------------|
| Provider             | [Kyutai](https://github.com/kyutai-labs/moshi) |
| Compatible Instances | L4, H100 (FP8, BF16)                           |
| Context size         | 4096 tokens                                    |

## Model names

```bash
kyutai/moshika-0.1-8b:bf16
kyutai/moshika-0.1-8b:fp8
```

## Compatible Instances

| Instance type | Max context length |
|---------------|--------------------|
| L4            | 4096 (FP8, BF16)   |
| H100          | 4096 (FP8, BF16)   |

## Model introduction

Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshika is the variant of Moshi with a female voice in English.

## Why is it useful?

Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model.
It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more.
In particular, this model:
- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models.
- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications.

## How to use it

To perform inference tasks with your Moshi deployment at Scaleway, a WebSocket API for real-time dialogue is exposed at the following endpoint:

```bash
wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
```

### Testing the WebSocket endpoint

To test the endpoint, use the following command:

```bash
curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

<Message type="tip">
  If headers are not supported (e.g., in a browser), authentication can be done using the `token` query parameter, which should be set to your IAM API key.
</Message>

The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

### Interacting with the model

We provide code samples in several programming languages (Python, Rust, TypeScript) for interacting with the model through the WebSocket API, as well as a simple web interface.
These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
This repository contains instructions on how to run the code samples and interact with the model.
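
As a minimal illustration (not part of the official client examples), the sketch below opens an authenticated WebSocket connection to the deployment endpoint from Python, using the `token` query parameter described above. The deployment UUID and API key values are placeholders, and the audio exchange protocol itself is covered by the samples in the repository linked above.

```python
# Minimal connection sketch. Assumes the third-party `websockets` package
# is installed (pip install websockets). UUID and API key are placeholders.
import asyncio
import websockets

DEPLOYMENT_UUID = "your-deployment-uuid"  # placeholder
IAM_API_KEY = "your-iam-api-key"          # placeholder

async def main():
    # Authenticate with the `token` query parameter, as described in the tip above.
    uri = f"wss://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/api/chat?token={IAM_API_KEY}"
    async with websockets.connect(uri) as ws:
        # If the handshake succeeds, the connection has been upgraded.
        # Audio frames are then exchanged over this socket; see the
        # client examples repository for the full protocol.
        print("WebSocket connection established")

asyncio.run(main())
```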
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
meta:
  title: Understanding the Moshiko-0.1-8b model
  description: Deploy your own secure Moshiko-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Moshiko-0.1-8b model
  paragraph: This page provides information on the Moshiko-0.1-8b model
tags:
dates:
  validation: 2024-10-30
categories:
  - ai-data
---

## Model overview

| Attribute            | Details                                        |
|----------------------|------------------------------------------------|
| Provider             | [Kyutai](https://github.com/kyutai-labs/moshi) |
| Compatible Instances | L4, H100 (FP8, BF16)                           |
| Context size         | 4096 tokens                                    |

## Model names

```bash
kyutai/moshiko-0.1-8b:bf16
kyutai/moshiko-0.1-8b:fp8
```

## Compatible Instances

| Instance type | Max context length |
|---------------|--------------------|
| L4            | 4096 (FP8, BF16)   |
| H100          | 4096 (FP8, BF16)   |

## Model introduction

Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshiko is the variant of Moshi with a male voice in English.

## Why is it useful?

Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model.
It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more.
In particular, this model:
- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models.
- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications.

## How to use it

To perform inference tasks with your Moshi deployment at Scaleway, a WebSocket API for real-time dialogue is exposed at the following endpoint:

```bash
wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
```

### Testing the WebSocket endpoint

To test the endpoint, use the following command:

```bash
curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

<Message type="tip">
  If headers are not supported (e.g., in a browser), authentication can be done using the `token` query parameter, which should be set to your IAM API key.
</Message>

The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

### Interacting with the model

We provide code samples in several programming languages (Python, Rust, TypeScript) for interacting with the model through the WebSocket API, as well as a simple web interface.
These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
This repository contains instructions on how to run the code samples and interact with the model.
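
As a minimal illustration (not part of the official client examples), the sketch below opens an authenticated WebSocket connection to the deployment endpoint from Python, using the `token` query parameter described above. The deployment UUID and API key values are placeholders, and the audio exchange protocol itself is covered by the samples in the repository linked above.

```python
# Minimal connection sketch. Assumes the third-party `websockets` package
# is installed (pip install websockets). UUID and API key are placeholders.
import asyncio
import websockets

DEPLOYMENT_UUID = "your-deployment-uuid"  # placeholder
IAM_API_KEY = "your-iam-api-key"          # placeholder

async def main():
    # Authenticate with the `token` query parameter, as described in the tip above.
    uri = f"wss://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/api/chat?token={IAM_API_KEY}"
    async with websockets.connect(uri) as ws:
        # If the handshake succeeds, the connection has been upgraded.
        # Audio frames are then exchanged over this socket; see the
        # client examples repository for the full protocol.
        print("WebSocket connection established")

asyncio.run(main())
```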
