Commit e5a9a0a

docs(ifr): add documentation for moshi (#3924)
1 parent bd9f871 commit e5a9a0a

File tree

2 files changed: +173, -0 lines changed

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
---
meta:
  title: Understanding the Moshika-0.1-8b model
  description: Deploy your own secure Moshika-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Moshika-0.1-8b model
  paragraph: This page provides information on the Moshika-0.1-8b model
tags:
dates:
  validation: 2024-10-30
  posted: 2024-10-30
categories:
  - ai-data
---

## Model overview

| Attribute            | Details                                        |
|----------------------|------------------------------------------------|
| Provider             | [Kyutai](https://github.com/kyutai-labs/moshi) |
| Compatible Instances | L4, H100 (FP8, BF16)                           |
| Context size         | 4096 tokens                                    |

## Model names

```bash
kyutai/moshika-0.1-8b:bf16
kyutai/moshika-0.1-8b:fp8
```

## Compatible Instances

| Instance type | Max context length |
|---------------|--------------------|
| L4            | 4096 (FP8, BF16)   |
| H100          | 4096 (FP8, BF16)   |

## Model introduction

Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshika is the variant of Moshi with a female voice in English.

## Why is it useful?

Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model.
It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more.
In particular, this model:
- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models.
- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications.

## How to use it

To perform inference tasks with your Moshi deployment at Scaleway, a WebSocket API for real-time dialogue is exposed at the following endpoint:

```bash
wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
```

### Testing the WebSocket endpoint

To test the endpoint, use the following command:

```bash
curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

<Message type="tip">
  If headers are not supported (e.g., in a browser), authentication can be done using the `token` query parameter, which should be set to your IAM API key.
</Message>

The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

### Interacting with the model

We provide code samples in several programming languages (Python, Rust, TypeScript) for interacting with the model through the WebSocket API, as well as a simple web interface.
These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
This repository contains instructions on how to run the code samples and interact with the model.
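
As a minimal illustration (not part of the official client examples), the sketch below opens an authenticated WebSocket connection to the deployment endpoint from Python, using the `token` query parameter described above. The deployment UUID and API key values are placeholders, and the audio exchange protocol itself is covered by the samples in the repository linked above.

```python
# Minimal connection sketch. Assumes the third-party `websockets` package
# is installed (pip install websockets). UUID and API key are placeholders.
import asyncio
import websockets

DEPLOYMENT_UUID = "your-deployment-uuid"  # placeholder
IAM_API_KEY = "your-iam-api-key"          # placeholder

async def main():
    # Authenticate with the `token` query parameter, as described in the tip above.
    uri = f"wss://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/api/chat?token={IAM_API_KEY}"
    async with websockets.connect(uri) as ws:
        # If the handshake succeeds, the connection has been upgraded.
        # Audio frames are then exchanged over this socket; see the
        # client examples repository for the full protocol.
        print("WebSocket connection established")

asyncio.run(main())
```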
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
meta:
  title: Understanding the Moshiko-0.1-8b model
  description: Deploy your own secure Moshiko-0.1-8b model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Moshiko-0.1-8b model
  paragraph: This page provides information on the Moshiko-0.1-8b model
tags:
dates:
  validation: 2024-10-30
categories:
  - ai-data
---

## Model overview

| Attribute            | Details                                        |
|----------------------|------------------------------------------------|
| Provider             | [Kyutai](https://github.com/kyutai-labs/moshi) |
| Compatible Instances | L4, H100 (FP8, BF16)                           |
| Context size         | 4096 tokens                                    |

## Model names

```bash
kyutai/moshiko-0.1-8b:bf16
kyutai/moshiko-0.1-8b:fp8
```

## Compatible Instances

| Instance type | Max context length |
|---------------|--------------------|
| L4            | 4096 (FP8, BF16)   |
| H100          | 4096 (FP8, BF16)   |

## Model introduction

Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshiko is the variant of Moshi with a male voice in English.

## Why is it useful?

Moshi offers seamless real-time dialogue capabilities, enabling users to engage in natural conversations with the model.
It allows the modeling of arbitrary conversational dynamics, including overlapping speech, interruptions, interjections, and more.
In particular, this model:
- Processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, performing better than existing non-streaming models.
- Achieves a theoretical latency of 160 ms, with a practical latency of 200 ms, making it suitable for real-time applications.

## How to use it

To perform inference tasks with your Moshi deployment at Scaleway, a WebSocket API for real-time dialogue is exposed at the following endpoint:

```bash
wss://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat
```

### Testing the WebSocket endpoint

To test the endpoint, use the following command:

```bash
curl -i --http1.1 \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/api/chat"
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

<Message type="tip">
  If headers are not supported (e.g., in a browser), authentication can be done using the `token` query parameter, which should be set to your IAM API key.
</Message>

The server should respond with a `101 Switching Protocols` status code, indicating that the connection has been successfully upgraded to a WebSocket connection.

### Interacting with the model

We provide code samples in several programming languages (Python, Rust, TypeScript) for interacting with the model through the WebSocket API, as well as a simple web interface.
These code samples can be found in our [GitHub repository](https://github.com/scaleway/moshi-client-examples).
This repository contains instructions on how to run the code samples and interact with the model.
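
As a minimal illustration (not part of the official client examples), the sketch below opens an authenticated WebSocket connection to the deployment endpoint from Python, using the `token` query parameter described above. The deployment UUID and API key values are placeholders, and the audio exchange protocol itself is covered by the samples in the repository linked above.

```python
# Minimal connection sketch. Assumes the third-party `websockets` package
# is installed (pip install websockets). UUID and API key are placeholders.
import asyncio
import websockets

DEPLOYMENT_UUID = "your-deployment-uuid"  # placeholder
IAM_API_KEY = "your-iam-api-key"          # placeholder

async def main():
    # Authenticate with the `token` query parameter, as described in the tip above.
    uri = f"wss://{DEPLOYMENT_UUID}.ifr.fr-par.scaleway.com/api/chat?token={IAM_API_KEY}"
    async with websockets.connect(uri) as ws:
        # If the handshake succeeds, the connection has been upgraded.
        # Audio frames are then exchanged over this socket; see the
        # client examples repository for the full protocol.
        print("WebSocket connection established")

asyncio.run(main())
```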
