3 | 3 | <head> |
4 | 4 | <meta charset="UTF-8"> |
5 | 5 | <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
6 | | - <title>Speech Synthesis System Demo</title> |
| 6 | + <title>MiniCPM-o 4.0</title> |
7 | 7 |
8 | 8 | <!-- External CSS --> |
9 | 9 | <link href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700" rel="stylesheet"> |
344 | 344 | <!-- Main Header Section --> |
345 | 345 | <div class="main-header"> |
346 | 346 | <h1>MiniCPM-o 4.0</h1> |
347 | | - <p class="subtitle">A Family of High-Quality Versatile Speech Generation Models</p> |
348 | | - <p class="author-info"><em>Your Team Name</em></p> |
349 | | - <p class="author-info">Your Organization</p> |
| 347 | + <!-- <p class="subtitle">End-to-end, Customizable-Speaker, Stable, and Natural Voice Chat</p> --> |
| 348 | + <p class="author-info"><em>MiniCPM-o Team</em></p> |
| 349 | + <!-- <p class="author-info">OpenBMB</p> --> |
350 | 350 | <div class="links"> |
351 | | - <a href="#" target="_blank">[Paper]</a> |
352 | | - <a href="#" target="_blank">[Code]</a> |
353 | | - <a href="#" target="_blank">[Dataset]</a> |
354 | | - <a href="#" target="_blank">[Demo]</a> |
| 351 | + <a href="https://github.com/OpenBMB/MiniCPM-o" target="_blank">[Github]</a> |
| 352 | + <a href="https://huggingface.co/openbmb/MiniCPM-o-4" target="_blank">[HuggingFace]</a> |
| 353 | + <!-- <a href="#" target="_blank">[Dataset]</a> --> |
| 354 | + <!-- <a href="#" target="_blank">[Demo]</a> --> |
355 | 355 | </div> |
356 | 356 | </div> |
357 | 357 |
358 | 358 | <!-- Abstract Section --> |
359 | | - <div class="abstract-section"> |
| 359 | + <!-- <div class="abstract-section"> |
360 | 360 | <h2>Abstract</h2> |
| 361 | + MiniCPM-o is the latest series of end-side multimodal LLMs (MLLMs) upgraded from MiniCPM-V. The models can now take images, video, text, and audio as inputs and provide high-quality text and speech outputs in an end-to-end fashion. MiniCPM-o 4.0 is the latest and most capable model in the MiniCPM-o series. With a total of 4B parameters, this end-to-end model achieves comparable performance to GPT-4o-202405 in vision, speech, and multimodal live streaming, making it one of the most versatile and performant models in the open-source community. For the new voice mode, MiniCPM-o 4.0 supports bilingual real-time speech conversation with customizable voices, and also allows for end-to-end voice cloning, role play, etc. Compared to MiniCPM-o-2.6, we enhanced the stability and naturalness of speech conversation through architectural improvements and improved data pipelines. It also advances MiniCPM-V-2.6's visual capabilities, such as strong OCR, trustworthy behavior, multilingual support, and video understanding. |
361 | 362 | <div class="content-placeholder"> |
362 | 363 | Insert your technical report abstract here. This section should contain a comprehensive overview |
363 | 364 | of your speech synthesis system, its key innovations, and main contributions. |
364 | 365 | </div> |
365 | | - </div> |
| 366 | + </div> --> |
366 | 367 |
367 | 368 | <!-- System Overview Section --> |
368 | 369 | <div class="overview-section"> |