|
| 1 | +--- |
| 2 | +meta: |
| 3 | + title: Understanding the Moshika-0.1-8b model |
| 4 | + description: Deploy your own secure Moshika-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed. |
| 5 | +content: |
| 6 | + h1: Understanding the Moshika-0.1-8b model |
| 7 | + paragraph: This page provides information on the Moshika-0.1-8b model |
| 8 | +tags: |
| 9 | +dates: |
| 10 | + validation: 2024-10-30 |
| 11 | + posted: 2024-10-30 |
| 12 | +categories: |
| 13 | + - ai-data |
| 14 | +--- |
| 15 | + |
| 16 | +## Model overview |
| 17 | + |
| 18 | +| Attribute | Details | |
| 19 | +|-----------------|------------------------------------| |
| 20 | +| Provider | [Kyutai](https://github.com/kyutai-labs/moshi) | |
| 21 | +| Compatible Instances | L4, H100 (FP8, BF16) | |
| 22 | +| Context size | 4096 tokens | |
| 23 | + |
| 24 | +## Model names |
| 25 | + |
| 26 | +```bash |
| 27 | +kyutai/moshika-0.1-8b:bf16 |
| 28 | +kyutai/moshika-0.1-8b:fp8 |
| 29 | +``` |
| 30 | + |
| 31 | +## Compatible Instances |
| 32 | + |
| 33 | +| Instance type | Max context length | |
| 34 | +| ------------- |-------------| |
| 35 | +| L4 | 4096 (FP8, BF16) | |
| 36 | +| H100 | 4096 (FP8, BF16) | |
| 37 | + |
| 38 | +## Model introduction |
| 39 | + |
| 40 | +Kyutai's Moshi is a speech-text foundation model for real-time dialogue. |
| 41 | +Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity. |
| 42 | +While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model. |
| 43 | +Moshika is the variant of Moshi with a female voice in English. |
| 44 | + |
| 45 | +## Why is it useful? |
| 46 | + |
| 47 | +Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model. |
| 48 | +It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more. |
| 49 | +In particular, this model: |
| 50 | +- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models. |
| 51 | +- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications. |
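The figures quoted above imply a few derived quantities. The short sketch below works them out using only the numbers stated in this section; the variable names are illustrative.

```python
# Derived quantities from the figures quoted above (illustrative names).
SAMPLE_RATE_HZ = 24_000   # input audio sample rate (24 kHz)
FRAME_RATE_HZ = 12.5      # rate of the compressed representation
BITRATE_BPS = 1_100       # bandwidth of 1.1 kbps

samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # audio samples covered by one frame
frame_duration_ms = 1_000 / FRAME_RATE_HZ           # duration of one frame in milliseconds
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ        # bits available per frame

print(samples_per_frame, frame_duration_ms, bits_per_frame)  # 1920.0 80.0 88.0
```

Each frame covers 80 ms of audio, so the 160 ms theoretical latency is numerically equal to two frame durations.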
| 52 | + |
| 53 | +## How to use it |
| 54 | + |
| 55 | +To perform inference with your Moshi model deployed at Scaleway, connect to the WebSocket API exposed for real-time dialogue at the following endpoint: |
| 56 | + |
| 57 | +```bash |
| 58 | +wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat |
| 59 | +``` |
| 60 | + |
| 61 | +### Testing the WebSocket endpoint |
| 62 | + |
| 63 | +To test the endpoint, use the following command: |
| 64 | + |
| 65 | +```bash |
| 66 | +curl -i --http1.1 \ |
| 67 | +-H "Authorization: Bearer <IAM API key>" \ |
| 68 | +-H "Connection: Upgrade" \ |
| 69 | +-H "Upgrade: websocket" \ |
| 70 | +-H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \ |
| 71 | +-H "Sec-WebSocket-Version: 13" \ |
| 72 | +--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat" |
| 73 | +``` |
| 74 | + |
| 75 | +Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. |
| 76 | + |
| 77 | +<Message type="tip"> |
| 78 | + If headers are not supported (e.g., in a browser), you can authenticate by setting the `token` query parameter to your IAM API key. |
| 79 | +</Message> |
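As a minimal sketch of query-parameter authentication, the helper below builds the endpoint URL with the API key URL-encoded into `token`. The function name `build_chat_url` is hypothetical and not part of any Scaleway SDK; the host pattern and path are the ones given on this page.

```python
from urllib.parse import urlencode


def build_chat_url(deployment_uuid: str, api_key: str) -> str:
    """Return the WebSocket URL with the IAM API key passed as `token`."""
    # Host pattern and path are taken from the endpoint shown on this page.
    base = f"wss://{deployment_uuid}.ifr.fr-par.scaleway.com/api/chat"
    # urlencode escapes any characters that are not safe in a query string.
    return f"{base}?{urlencode({'token': api_key})}"


# Replace both placeholders with your own values before connecting.
print(build_chat_url("<Deployment UUID>", "<IAM API key>"))
```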
| 80 | + |
| 81 | +The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection. |
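Under the WebSocket protocol (RFC 6455), a `101` response also carries a `Sec-WebSocket-Accept` header derived from the `Sec-WebSocket-Key` the client sent. The sketch below shows how a client could generate a fresh random key and compute the accept value it should expect back. The function names are illustrative, and WebSocket client libraries normally perform this check for you.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_websocket_key() -> str:
    """Generate a Sec-WebSocket-Key: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server should return for `key`."""
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


key = new_websocket_key()
print("Sec-WebSocket-Key:", key)
print("Expected Sec-WebSocket-Accept:", expected_accept(key))
```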
| 82 | + |
| 83 | +### Interacting with the model |
| 84 | + |
| 85 | +We provide code samples in various programming languages (Python, Rust, TypeScript) to interact with the model using the WebSocket API, as well as a simple web interface. |
| 86 | +These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples). |
| 87 | +This repository contains instructions on how to run the code samples and interact with the model. |