FunAudioLLM · ivanmilevtues · Jun 28, 2025
diff --git a/.codeboarding/Application_Layer.md b/.codeboarding/Application_Layer.md
@@ -0,0 +1,137 @@
+```mermaid
+
+graph LR
+
+    cosyvoice_bin["cosyvoice.bin"]
+
+    cosyvoice_cli_cosyvoice["cosyvoice.cli.cosyvoice"]
+
+    cosyvoice_cli_model["cosyvoice.cli.model"]
+
+    cosyvoice_cli_frontend["cosyvoice.cli.frontend"]
+
+    cosyvoice_bin -- "initiates" --> cosyvoice_cli_cosyvoice
+
+    cosyvoice_bin -- "initiates" --> cosyvoice_cli_model
+
+    cosyvoice_bin -- "utilizes" --> cosyvoice_utils
+
+    cosyvoice_cli_cosyvoice -- "composes" --> cosyvoice_cli_frontend
+
+    cosyvoice_cli_cosyvoice -- "composes" --> cosyvoice_cli_model
+
+    cosyvoice_cli_cosyvoice -- "utilizes" --> cosyvoice_utils_file_utils
+
+    cosyvoice_cli_model -- "composes" --> cosyvoice_llm
+
+    cosyvoice_cli_model -- "composes" --> cosyvoice_flow
+
+    cosyvoice_cli_model -- "composes" --> cosyvoice_hifigan
+
+    cosyvoice_cli_model -- "utilizes" --> cosyvoice_utils_common
+
+    cosyvoice_cli_frontend -- "utilizes" --> cosyvoice_utils_file_utils
+
+    cosyvoice_cli_frontend -- "utilizes" --> cosyvoice_utils_frontend_utils
+
+```
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)
+
+
+
+## Component Details
+
+
+
+The Application Layer is the primary interface between the CosyVoice project and its users or external systems. It parses command-line arguments, orchestrates the execution flow for tasks such as model inference, training, and model export (JIT, ONNX), and performs the high-level system setup. As the central coordinator, it calls into the other core components to fulfill each user request.
+
+
+
+### cosyvoice.bin
+
+This component consists of the executable scripts that users run directly to perform tasks such as model inference, training, or export. They are the primary entry points to the CosyVoice system: each script parses its command-line arguments and starts the corresponding high-level operation.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/inference.py" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.inference`</a>
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/train.py" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.train`</a>
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/export_jit.py" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.export_jit`</a>
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/export_onnx.py" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.export_onnx`</a>
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/average_model.py" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.average_model`</a>
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/train_dpo.py" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.train_dpo`</a>
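+
+The scripts above follow the common parse-then-dispatch pattern for command-line entry points. The sketch below illustrates only that pattern: the flags (`--model_dir`, `--mode`, `--gpu`) and the handler functions are hypothetical and do not reproduce the real arguments of any `cosyvoice.bin` script.
+
+```python
+import argparse
+from typing import Optional, Sequence
+
+
+def get_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
+    # Hypothetical flags for illustration; each real script defines its own.
+    parser = argparse.ArgumentParser(description="toy CosyVoice-style entry point")
+    parser.add_argument("--model_dir", required=True, help="directory holding model files")
+    parser.add_argument("--mode", choices=["inference", "export"], default="inference")
+    parser.add_argument("--gpu", type=int, default=-1, help="GPU index; -1 means CPU")
+    return parser.parse_args(argv)
+
+
+def run_inference(args: argparse.Namespace) -> str:
+    device = "cpu" if args.gpu < 0 else f"cuda:{args.gpu}"
+    return f"inference with {args.model_dir} on {device}"
+
+
+def run_export(args: argparse.Namespace) -> str:
+    return f"export of {args.model_dir}"
+
+
+def main(argv: Optional[Sequence[str]] = None) -> str:
+    args = get_args(argv)
+    # Dispatch table: one handler per high-level operation.
+    handlers = {"inference": run_inference, "export": run_export}
+    return handlers[args.mode](args)
+```
+
+A script would normally call `main()` under an `if __name__ == "__main__":` guard. Keeping the dispatch table in one dictionary makes adding another operation a one-line change.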
+
+
+
+
+
+### cosyvoice.cli.cosyvoice
+
+This module provides the high-level Python API for interacting with the CosyVoice models. It acts as an orchestrator, managing model loading, frontend processing, and dispatching inference requests to the underlying model logic. It encapsulates different inference modes (SFT, zero-shot, cross-lingual, instruct, VC).
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/cosyvoice.py" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.cosyvoice`</a>
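+
+As a rough illustration of the orchestrator role, the sketch below wires a frontend and a model together and exposes one method per inference mode. `ToyFrontEnd`, `ToyModel`, `ToyCosyVoice`, and all method signatures are hypothetical stand-ins, not the module's actual API, and only two of the listed modes are shown.
+
+```python
+from typing import Dict, Iterator
+
+
+class ToyFrontEnd:
+    """Hypothetical stand-in for the frontend: turns text into model inputs."""
+
+    def frontend_sft(self, text: str, spk_id: str) -> Dict[str, object]:
+        return {"text": text.split(), "spk": spk_id}
+
+    def frontend_zero_shot(self, text: str, prompt_text: str) -> Dict[str, object]:
+        return {"text": text.split(), "prompt_text": prompt_text.split()}
+
+
+class ToyModel:
+    """Hypothetical stand-in for the model: yields one output chunk per word."""
+
+    def tts(self, **model_input) -> Iterator[Dict[str, str]]:
+        for word in model_input["text"]:
+            yield {"tts_speech": f"<audio:{word}>"}
+
+
+class ToyCosyVoice:
+    """Orchestrator sketch: frontend processing first, then dispatch to the model."""
+
+    def __init__(self) -> None:
+        self.frontend = ToyFrontEnd()
+        self.model = ToyModel()
+
+    def inference_sft(self, text: str, spk_id: str) -> Iterator[Dict[str, str]]:
+        model_input = self.frontend.frontend_sft(text, spk_id)
+        yield from self.model.tts(**model_input)
+
+    def inference_zero_shot(self, text: str, prompt_text: str) -> Iterator[Dict[str, str]]:
+        model_input = self.frontend.frontend_zero_shot(text, prompt_text)
+        yield from self.model.tts(**model_input)
+```
+
+For example, `list(ToyCosyVoice().inference_sft("hello world", "spk1"))` yields one chunk per word. Returning a generator lets callers consume audio chunk by chunk, which is one way streaming output can be exposed.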
+
+
+
+
+
+### cosyvoice.cli.model
+
+This component contains the core CosyVoiceModel and CosyVoice2Model classes. These models encapsulate the neural network architecture (LLM, Flow, Vocoder) and orchestrate their interaction to perform Text-to-Speech (TTS) and Voice Conversion (VC). They manage model loading, JIT/TensorRT optimization, and the multi-stage speech generation process, including streaming.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/model.py" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.model`</a>
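+
+The multi-stage generation process can be pictured as LLM (text tokens to speech tokens), Flow (speech tokens to acoustic features), and Vocoder (features to waveform). The sketch below uses toy integer and float stand-ins for each stage and shows how streaming can emit audio as soon as a fixed number of speech tokens is ready. It omits the context caching and chunk-boundary handling a real streaming implementation would need, and none of its function names come from `cosyvoice.cli.model`.
+
+```python
+from typing import Iterator, List
+
+
+def llm_stage(text_tokens: List[int]) -> Iterator[int]:
+    """Toy LLM stand-in: emits two speech tokens per text token, one at a time."""
+    for tok in text_tokens:
+        yield tok * 2
+        yield tok * 2 + 1
+
+
+def flow_stage(speech_tokens: List[int]) -> List[float]:
+    """Toy flow stand-in: maps each speech token to one feature frame value."""
+    return [t / 10 for t in speech_tokens]
+
+
+def vocoder_stage(mel: List[float]) -> List[float]:
+    """Toy vocoder stand-in: upsamples each frame to two waveform samples."""
+    return [m for m in mel for _ in range(2)]
+
+
+def synthesize_streaming(text_tokens: List[int], chunk_size: int = 4) -> Iterator[List[float]]:
+    """Run LLM -> flow -> vocoder, emitting audio whenever chunk_size tokens are ready."""
+    buffer: List[int] = []
+    for token in llm_stage(text_tokens):
+        buffer.append(token)
+        if len(buffer) == chunk_size:
+            yield vocoder_stage(flow_stage(buffer))
+            buffer = []
+    if buffer:  # flush the final, possibly shorter, chunk
+        yield vocoder_stage(flow_stage(buffer))
+```
+
+With `chunk_size=4`, three text tokens produce six speech tokens, so the caller receives two audio chunks (eight samples, then four) instead of waiting for the whole utterance.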
+
+
+
+
+
+### cosyvoice.cli.frontend
+
+This module is responsible for all pre-processing of input data (text and speech) before it is fed into the core CosyVoiceModel. This includes text tokenization, speech tokenization, speaker embedding extraction, and text normalization, ensuring that the models receive properly formatted inputs.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/frontend.py" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.frontend`</a>
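+
+The pre-processing steps above can be sketched as a single function that normalizes text, maps it to token IDs, and attaches a speaker embedding. Everything here is a hypothetical placeholder: the word-level vocabulary lookup stands in for a real tokenizer, and the two-number "embedding" (mean and peak amplitude) stands in for a neural speaker encoder.
+
+```python
+from typing import Dict, List
+
+
+def normalize(text: str) -> str:
+    # Hypothetical minimal normalization: trim and collapse whitespace.
+    return " ".join(text.split())
+
+
+def tokenize(text: str, vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
+    # Hypothetical word-level lookup; unknown words map to unk_id.
+    return [vocab.get(word, unk_id) for word in text.lower().split()]
+
+
+def speaker_embedding(samples: List[float]) -> List[float]:
+    # Placeholder: mean and peak amplitude instead of a neural speaker encoder.
+    mean = sum(samples) / len(samples)
+    peak = max(abs(s) for s in samples)
+    return [mean, peak]
+
+
+def prepare_inputs(text: str, prompt_speech: List[float],
+                   vocab: Dict[str, int]) -> Dict[str, object]:
+    tokens = tokenize(normalize(text), vocab)
+    return {
+        "text": tokens,
+        "text_len": len(tokens),
+        "embedding": speaker_embedding(prompt_speech),
+    }
+```
+
+Bundling every model input into one dictionary keeps the interface between the frontend and the model narrow: the model only needs to know the key names.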
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
diff --git a/.codeboarding/Data_Preparation_Management.md b/.codeboarding/Data_Preparation_Management.md
@@ -0,0 +1,217 @@
+```mermaid
+
+graph LR
+
+    CosyVoiceFrontEnd["CosyVoiceFrontEnd"]
+
+    Text_Normalization_Utilities["Text Normalization Utilities"]
+
+    Tokenizer["Tokenizer"]
+
+    Dataset_Factory["Dataset Factory"]
+
+    DataList_Manager["DataList Manager"]
+
+    Distributed_Data_Sampler["Distributed Data Sampler"]
+
+    Data_Pipeline_Processor["Data Pipeline Processor"]
+
+    Standard_Data_Processing_Functions["Standard Data Processing Functions"]
+
+    DPO_Data_Processing_Functions["DPO Data Processing Functions"]
+
+    CosyVoiceFrontEnd -- "uses" --> Tokenizer
+
+    CosyVoiceFrontEnd -- "uses" --> Text_Normalization_Utilities
+
+    Dataset_Factory -- "creates and uses" --> DataList_Manager
+
+    Dataset_Factory -- "creates and uses" --> Data_Pipeline_Processor
+
+    DataList_Manager -- "uses" --> Distributed_Data_Sampler
+
+    Data_Pipeline_Processor -- "uses" --> Standard_Data_Processing_Functions
+
+    Data_Pipeline_Processor -- "uses" --> DPO_Data_Processing_Functions
+
+```
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)
+
+
+
+## Component Details
+
+
+
+This subsystem is responsible for all aspects of data handling within the `CosyVoice` project, from raw input to model-ready tensors. It encompasses text normalization, tokenization, speech feature extraction, dataset loading, and efficient batching for both training and inference, ensuring data is correctly formatted and accessible for the models.
+
+
+
+### CosyVoiceFrontEnd
+
+This is the primary interface for converting raw input data (text and speech) into a format suitable for CosyVoice models. It orchestrates text normalization, tokenization, speech feature extraction, and speaker embedding extraction, so that all incoming data is standardized before it reaches the models.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/frontend.py#L38-L214" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.frontend.CosyVoiceFrontEnd` (38:214)</a>
+
+
+
+
+
+### Text Normalization Utilities
+
+This component is a collection of helper functions for text processing and normalization. The `CosyVoiceFrontEnd` component uses them to produce consistent, clean text input, handling language-specific details such as Chinese and English normalization, punctuation handling, and number spelling.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/utils/frontend_utils.py" target="_blank" rel="noopener noreferrer">`cosyvoice.utils.frontend_utils`</a>
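+
+As an illustration of the kind of rules such utilities contain, the sketch below maps full-width punctuation to ASCII, collapses whitespace, and spells out standalone numbers from 0 to 999 in English. The function names and rule set are hypothetical and far simpler than a production normalizer; longer numbers are deliberately left untouched.
+
+```python
+import re
+
+ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
+        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
+        "sixteen", "seventeen", "eighteen", "nineteen"]
+TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
+
+# Map full-width (CJK) punctuation to ASCII equivalents.
+PUNCT_MAP = str.maketrans({"，": ",", "。": ".", "！": "!", "？": "?", "：": ":", "；": ";"})
+
+
+def spell_number(n: int) -> str:
+    """Spell an integer in [0, 999] as English words."""
+    if n < 20:
+        return ONES[n]
+    if n < 100:
+        tens, rest = divmod(n, 10)
+        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
+    hundreds, rest = divmod(n, 100)
+    words = ONES[hundreds] + " hundred"
+    return words if rest == 0 else words + " " + spell_number(rest)
+
+
+def normalize_text(text: str) -> str:
+    text = text.translate(PUNCT_MAP)
+    text = re.sub(r"\s+", " ", text).strip()
+    # Spell out standalone numbers of up to three digits; longer ones stay as-is.
+    return re.sub(r"\b\d{1,3}\b", lambda m: spell_number(int(m.group())), text)
+```
+
+For example, `normalize_text("I have  42 apples，ok")` returns `"I have forty-two apples,ok"`, while `"year 2025"` is returned unchanged.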
+
+
+
+
+
+### Tokenizer
+
+This component converts raw text into numerical tokens. Tokenization is the step that turns human-readable text into the integer IDs that the CosyVoice models consume, so every textual input passes through it.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/tokenizer/tokenizer.py" target="_blank" rel="noopener noreferrer">`cosyvoice.tokenizer.tokenizer.Tokenizer`</a>
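+
+The text-to-IDs contract can be illustrated with a toy byte-level tokenizer. It is hypothetical and much simpler than a learned subword vocabulary (for example BPE), but it shows the two directions a tokenizer must support: `encode` from text to integer IDs and `decode` back to text, with one extra ID reserved for an end-of-text marker.
+
+```python
+from typing import List
+
+
+class ByteTokenizer:
+    """Toy byte-level tokenizer: each UTF-8 byte is one token id (0-255)."""
+
+    def __init__(self) -> None:
+        # Reserve the first id after the byte range for an end-of-text marker.
+        self.eos_id = 256
+
+    def encode(self, text: str, add_eos: bool = False) -> List[int]:
+        ids = list(text.encode("utf-8"))
+        if add_eos:
+            ids.append(self.eos_id)
+        return ids
+
+    def decode(self, ids: List[int]) -> str:
+        # Drop special ids, then turn the remaining bytes back into text.
+        return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")
+```
+
+`ByteTokenizer().encode("hi", add_eos=True)` returns `[104, 105, 256]`. Chinese characters take three UTF-8 bytes each, and therefore three IDs each, which is one reason real systems prefer learned subword vocabularies.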
+
+
+
+
+
+### Dataset Factory
+
+This factory function constructs the overall dataset pipeline. It initializes the `DataList Manager` and then applies the configured processing functions one after another, each wrapped in a `Data Pipeline Processor`, to create a complete data loading and processing flow for both training and inference. It also handles the initial loading of data lists and applies mode-specific configuration.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L125-L163" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.Dataset` (125:163)</a>
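+
+The factory's job of building a source and then layering processing steps on top of it can be sketched with plain generators. `build_dataset`, the two example stages, and the `src`/`length` keys are hypothetical. In this sketch each stage takes an iterator plus a `mode` keyword and returns a new iterator, and shuffling is applied only in training mode.
+
+```python
+import random
+from typing import Callable, Dict, Iterable, Iterator, List
+
+Stage = Callable[..., Iterator[Dict]]
+
+
+def build_dataset(data_list: List[str], pipeline: List[Stage], mode: str = "train",
+                  shuffle: bool = True, seed: int = 0) -> Iterator[Dict]:
+    """Hypothetical factory: a list source followed by each pipeline stage in order."""
+    entries = list(data_list)
+    if shuffle and mode == "train":
+        random.Random(seed).shuffle(entries)
+    data: Iterator[Dict] = ({"src": src} for src in entries)
+    for stage in pipeline:
+        # Each stage wraps the iterator produced by the previous one.
+        data = stage(data, mode=mode)
+    return data
+
+
+def add_length(data: Iterable[Dict], mode: str = "train") -> Iterator[Dict]:
+    for sample in data:
+        yield {**sample, "length": len(sample["src"])}
+
+
+def drop_long(data: Iterable[Dict], mode: str = "train", max_len: int = 5) -> Iterator[Dict]:
+    for sample in data:
+        if sample["length"] <= max_len:
+            yield sample
+```
+
+With `mode="inference"`, building the dataset from `["utt1", "utt22", "utt333"]` with `[add_length, drop_long]` keeps `utt1` and `utt22` in their original order and drops `utt333`, whose length of 6 exceeds the default `max_len` of 5.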
+
+
+
+
+
+### DataList Manager
+
+Manages the list of individual data samples, providing an organized structure for accessing data. It integrates with the `Distributed Data Sampler` to ensure proper data distribution across different processes in a distributed training setup.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L107-L122" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.DataList` (107:122)</a>
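+
+One way to structure this relationship is for the data list to own the entries and delegate the choice and order of indices to a sampler object. The sketch below assumes a simple single-process sampler (`ShuffleSampler`) and a hypothetical `ToyDataList`; a distributed sampler could plug into the same `sample` method.
+
+```python
+import random
+from typing import Dict, Iterable, Iterator, List, Optional
+
+
+class ShuffleSampler:
+    """Hypothetical sampler: optionally shuffles indices with an epoch-based seed."""
+
+    def __init__(self, shuffle: bool = True) -> None:
+        self.shuffle = shuffle
+        self.epoch = 0
+
+    def sample(self, indices: Iterable[int]) -> List[int]:
+        order = list(indices)
+        if self.shuffle:
+            random.Random(self.epoch).shuffle(order)
+        return order
+
+
+class ToyDataList:
+    """Hypothetical data-list manager that delegates index selection to a sampler."""
+
+    def __init__(self, entries: List[str], sampler: Optional[ShuffleSampler] = None) -> None:
+        self.entries = entries
+        self.sampler = sampler or ShuffleSampler()
+
+    def set_epoch(self, epoch: int) -> None:
+        # A new epoch gives a new, but reproducible, shuffle order.
+        self.sampler.epoch = epoch
+
+    def __iter__(self) -> Iterator[Dict[str, str]]:
+        for idx in self.sampler.sample(range(len(self.entries))):
+            yield {"src": self.entries[idx]}
+```
+
+Seeding the shuffle with the epoch number makes each epoch's order deterministic, which matters when several processes must agree on the same order before splitting it.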
+
+
+
+
+
+### Distributed Data Sampler
+
+Handles data sampling in distributed training so that each process receives a distinct, evenly sized share of the data without duplication. It contains the logic for splitting data among ranks and among the workers within each rank.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L51-L104" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.DistributedSampler` (51:104)</a>
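+
+The rank and worker split can be sketched as two strided slices over a shared, epoch-seeded shuffle. The function below is a hypothetical illustration with explicit `rank`, `world_size`, `worker_id`, and `num_workers` arguments; a real sampler would typically read these values from the distributed runtime and the data-loader worker information instead.
+
+```python
+import random
+from typing import List
+
+
+def partition_indices(num_items: int, rank: int, world_size: int,
+                      worker_id: int, num_workers: int,
+                      epoch: int = 0, shuffle: bool = True) -> List[int]:
+    """Return the indices that one (rank, worker) pair should read.
+
+    Every process shuffles with the same epoch-based seed, so all ranks agree
+    on the order; strided slicing then gives each rank, and each worker within
+    a rank, a disjoint subset.
+    """
+    indices = list(range(num_items))
+    if shuffle:
+        random.Random(epoch).shuffle(indices)
+    indices = indices[rank::world_size]      # split across processes (ranks)
+    return indices[worker_id::num_workers]   # split across workers within a rank
+```
+
+Because every process uses the same seed for a given epoch, the (rank, worker) shards are disjoint and together cover every index exactly once; shard sizes differ by at most one.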
+
+
+
+
+
+### Data Pipeline Processor
+
+This component provides the mechanism for chaining data processing functions. Each instance is an iterable dataset that applies one processing function to the output of its upstream source; stacking several instances yields a sequence of transformations, which makes complex, customizable data preparation pipelines possible.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L26-L48" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.Processor` (26:48)</a>
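+
+A minimal version of such a chaining processor is sketched below in plain Python. The class layout and stage names are illustrative assumptions; a PyTorch data pipeline would more likely subclass `torch.utils.data.IterableDataset` so the result can be passed to a `DataLoader`.
+
+```python
+from typing import Any, Callable, Iterable, Iterator
+
+
+class Processor:
+    """Iterable that lazily applies f to the iterator of an upstream source."""
+
+    def __init__(self, source: Iterable, f: Callable[..., Iterator], *args: Any, **kw: Any) -> None:
+        self.source = source
+        self.f = f
+        self.args = args
+        self.kw = kw
+
+    def __iter__(self) -> Iterator:
+        # Re-iterating the processor re-runs the whole upstream chain.
+        return self.f(iter(self.source), *self.args, **self.kw)
+
+    def apply(self, f: Callable[..., Iterator], *args: Any, **kw: Any) -> "Processor":
+        # Chain another stage on top of this one.
+        return Processor(self, f, *args, **kw)
+
+
+def double(data: Iterator[int]) -> Iterator[int]:
+    for x in data:
+        yield x * 2
+
+
+def keep_above(data: Iterator[int], threshold: int) -> Iterator[int]:
+    for x in data:
+        if x > threshold:
+            yield x
+```
+
+`list(Processor([1, 2, 3, 4], double).apply(keep_above, 4))` returns `[6, 8]`. Because `__iter__` rebuilds the chain from the source each time, the same pipeline can be iterated again in the next epoch.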
+
+
+
+
+
+### Standard Data Processing Functions
+
+A collection of functions that perform various standard data transformations, such as opening parquet files, filtering samples based on length, resampling audio, truncating audio, and computing acoustic features (fbank, f0). These functions are designed to be used as individual steps within the `Data Pipeline Processor`.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/processor.py" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.processor`</a>
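+
+Each of these functions can be written as a generator that consumes samples and yields transformed or filtered samples, which is what lets them be chained inside the pipeline. The two stages below (duration filtering and truncation) are hypothetical sketches; the `speech` and `sample_rate` keys and the plain-list audio representation are assumptions made to keep the example dependency-free.
+
+```python
+from typing import Dict, Iterable, Iterator
+
+
+def filter_by_duration(data: Iterable[Dict], min_sec: float = 0.5,
+                       max_sec: float = 30.0) -> Iterator[Dict]:
+    """Drop samples whose audio is shorter than min_sec or longer than max_sec."""
+    for sample in data:
+        duration = len(sample["speech"]) / sample["sample_rate"]
+        if min_sec <= duration <= max_sec:
+            yield sample
+
+
+def truncate(data: Iterable[Dict], max_samples: int) -> Iterator[Dict]:
+    """Cut each waveform to at most max_samples samples."""
+    for sample in data:
+        yield {**sample, "speech": sample["speech"][:max_samples]}
+```
+
+Stages compose by nesting, for example `truncate(filter_by_duration(samples), max_samples=16000)`, and nothing is computed until the outermost generator is iterated.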
+
+
+
+
+
+### DPO Data Processing Functions
+
+A collection of functions similar to the standard data processing functions, but specifically tailored for Direct Preference Optimization (DPO) training. These functions handle DPO-specific data loading, filtering, and feature extraction requirements, ensuring data is correctly prepared for this specialized training regime.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/processor_dpo.py" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.processor_dpo`</a>
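+
+A typical DPO-specific requirement is that each training sample carries both a preferred and a rejected output. The filter below is a hypothetical sketch of such a stage; the `speech_token` and `reject_speech_token` keys and the length limit are assumptions, not the module's actual field names.
+
+```python
+from typing import Dict, Iterable, Iterator
+
+
+def filter_dpo_pairs(data: Iterable[Dict], max_tokens: int = 200) -> Iterator[Dict]:
+    """Keep samples that have both a preferred and a rejected token sequence.
+
+    DPO needs a (chosen, rejected) pair per prompt, so samples missing either
+    side, or with an over-long side, are skipped.
+    """
+    for sample in data:
+        chosen = sample.get("speech_token")
+        rejected = sample.get("reject_speech_token")
+        if not chosen or not rejected:
+            continue
+        if len(chosen) > max_tokens or len(rejected) > max_tokens:
+            continue
+        yield sample
+```
+
+Filtering at this stage keeps incomplete pairs out of the batches entirely, so later collation can assume every sample has both sides.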
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)