137 changes: 137 additions & 0 deletions .codeboarding/Application_Layer.md
@@ -0,0 +1,137 @@
```mermaid

graph LR

cosyvoice_bin["cosyvoice.bin"]

cosyvoice_cli_cosyvoice["cosyvoice.cli.cosyvoice"]

cosyvoice_cli_model["cosyvoice.cli.model"]

cosyvoice_cli_frontend["cosyvoice.cli.frontend"]

cosyvoice_bin -- "initiates" --> cosyvoice_cli_cosyvoice

cosyvoice_bin -- "initiates" --> cosyvoice_cli_model

cosyvoice_bin -- "utilizes" --> cosyvoice_utils

cosyvoice_cli_cosyvoice -- "composes" --> cosyvoice_cli_frontend

cosyvoice_cli_cosyvoice -- "composes" --> cosyvoice_cli_model

cosyvoice_cli_cosyvoice -- "utilizes" --> cosyvoice_utils_file_utils

cosyvoice_cli_model -- "composes" --> cosyvoice_llm

cosyvoice_cli_model -- "composes" --> cosyvoice_flow

cosyvoice_cli_model -- "composes" --> cosyvoice_hifigan

cosyvoice_cli_model -- "utilizes" --> cosyvoice_utils_common

cosyvoice_cli_frontend -- "utilizes" --> cosyvoice_utils_file_utils

cosyvoice_cli_frontend -- "utilizes" --> cosyvoice_utils_frontend_utils

```

[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])



## Component Details



The Application Layer is the primary interface to the CosyVoice project for users and external systems. It parses command-line arguments, orchestrates the execution flow for model inference, training, and export (JIT, ONNX), and handles the high-level setup of the system. As the central coordinator, it initiates interactions with the other core components to fulfill user requests.



### cosyvoice.bin

This component comprises the direct executable scripts that users run to perform specific tasks like model inference, training, or export. They serve as the primary entry points for the CosyVoice system, parsing command-line arguments and initiating the respective high-level operations.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/inference.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.inference` (1:9999)</a>

- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/train.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.train` (1:9999)</a>

- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/export_jit.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.export_jit` (1:9999)</a>

- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/export_onnx.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.export_onnx` (1:9999)</a>

- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/average_model.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.average_model` (1:9999)</a>

- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/bin/train_dpo.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.bin.train_dpo` (1:9999)</a>
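The entry-point pattern described above (parse arguments, then hand off to a high-level operation) can be sketched roughly as follows. This is a minimal illustration, not the actual CosyVoice scripts: the flag names `--task`, `--model_dir`, and `--format`, and the functions `run_inference` and `run_export`, are hypothetical.

```python
import argparse
from typing import Callable, Dict, Optional, Sequence


def get_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments (all flag names here are illustrative)."""
    parser = argparse.ArgumentParser(description="toy entry-point script")
    parser.add_argument("--task", required=True, choices=["inference", "export"])
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--format", default="jit", choices=["jit", "onnx"])
    return parser.parse_args(argv)


def run_inference(args: argparse.Namespace) -> str:
    # Hypothetical stand-in for the real inference routine.
    return f"running inference with {args.model_dir}"


def run_export(args: argparse.Namespace) -> str:
    # Hypothetical stand-in for the real export routine.
    return f"exporting {args.model_dir} as {args.format}"


# Map each task name to the function that carries it out.
TASKS: Dict[str, Callable[[argparse.Namespace], str]] = {
    "inference": run_inference,
    "export": run_export,
}


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = get_args(argv)
    return TASKS[args.task](args)
```

Calling `main(["--task", "export", "--model_dir", "m", "--format", "onnx"])` dispatches to `run_export`. The real project ships one script per task rather than a single dispatcher; the sketch only shows the parse-then-dispatch shape.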





### cosyvoice.cli.cosyvoice

This module provides the high-level Python API for interacting with the CosyVoice models. It acts as an orchestrator: it manages model loading and frontend processing, and dispatches inference requests to the underlying model logic. It encapsulates the different inference modes (SFT, zero-shot, cross-lingual, instruct, and voice conversion).





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/cosyvoice.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.cosyvoice` (1:9999)</a>
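The orchestration role described above can be illustrated with a small facade that composes a frontend and a model and dispatches by mode. Everything in this sketch is hypothetical (`ToyFrontend`, `ToyModel`, `ToyOrchestrator`, and the `synthesize` method are invented for illustration) and it does not reproduce the real `cosyvoice.cli.cosyvoice` API.

```python
from typing import Dict, Iterator, List


class ToyFrontend:
    """Stand-in for the frontend: turns raw text into model inputs."""

    def prepare(self, text: str, mode: str, prompt: str = "") -> Dict[str, object]:
        model_input: Dict[str, object] = {"tokens": text.lower().split(), "mode": mode}
        if mode == "zero_shot":
            # Zero-shot style modes also carry a prompt alongside the text.
            model_input["prompt_tokens"] = prompt.lower().split()
        return model_input


class ToyModel:
    """Stand-in for the model: yields one fake audio chunk per token."""

    def tts(self, model_input: Dict[str, object]) -> Iterator[str]:
        for token in model_input["tokens"]:
            yield f"<audio:{token}>"


class ToyOrchestrator:
    """Composes a frontend and a model and exposes one entry point per request."""

    SUPPORTED_MODES = ("sft", "zero_shot")

    def __init__(self) -> None:
        self.frontend = ToyFrontend()
        self.model = ToyModel()

    def synthesize(self, text: str, mode: str = "sft", prompt: str = "") -> List[str]:
        if mode not in self.SUPPORTED_MODES:
            raise ValueError(f"unsupported mode: {mode}")
        model_input = self.frontend.prepare(text, mode, prompt)
        return list(self.model.tts(model_input))
```

The point of the design is that callers only see the orchestrator; the frontend and the model can change independently behind it.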





### cosyvoice.cli.model

This component contains the core CosyVoiceModel and CosyVoice2Model classes. These models encapsulate the neural network architecture (LLM, Flow, Vocoder) and orchestrate their interaction to perform Text-to-Speech (TTS) and Voice Conversion (VC). They manage model loading, JIT/TensorRT optimization, and the multi-stage speech generation process, including streaming.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/model.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.model` (1:9999)</a>
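The multi-stage, streaming generation process described above can be pictured as three chained stages, with audio emitted as soon as enough intermediate tokens are buffered. The sketch below is a toy under stated assumptions: `toy_llm`, `toy_flow`, `toy_vocoder`, and `chunk_size` are hypothetical stand-ins that operate on plain numbers, not the real networks or their interfaces.

```python
from typing import Iterator, List


def toy_llm(text_tokens: List[int]) -> Iterator[int]:
    """Stage 1 stand-in: emit two 'speech tokens' per text token."""
    for tok in text_tokens:
        yield tok * 2
        yield tok * 2 + 1


def toy_flow(speech_tokens: List[int]) -> List[float]:
    """Stage 2 stand-in: map speech tokens to fake mel frames."""
    return [t / 10.0 for t in speech_tokens]


def toy_vocoder(mel: List[float]) -> List[float]:
    """Stage 3 stand-in: map mel frames to fake waveform samples."""
    return [-m for m in mel]


def stream_tts(text_tokens: List[int], chunk_size: int = 3) -> Iterator[List[float]]:
    """Yield waveform chunks as soon as `chunk_size` speech tokens are available."""
    buffer: List[int] = []
    for speech_token in toy_llm(text_tokens):
        buffer.append(speech_token)
        if len(buffer) == chunk_size:
            yield toy_vocoder(toy_flow(buffer))
            buffer = []
    if buffer:
        # Flush whatever is left once the first stage has finished.
        yield toy_vocoder(toy_flow(buffer))
```

Because `stream_tts` is a generator, a caller can start consuming the first chunk before the first stage has produced all of its tokens, which is the essence of streaming synthesis.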





### cosyvoice.cli.frontend

This module is responsible for all pre-processing of input data (text and speech) before it is fed into the core CosyVoiceModel. This includes text tokenization, speech tokenization, speaker embedding extraction, and text normalization, ensuring that the models receive properly formatted inputs.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/frontend.py#L1-L9999" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.frontend` (1:9999)</a>









### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
217 changes: 217 additions & 0 deletions .codeboarding/Data_Preparation_Management.md
@@ -0,0 +1,217 @@
```mermaid

graph LR

CosyVoiceFrontEnd["CosyVoiceFrontEnd"]

Text_Normalization_Utilities["Text Normalization Utilities"]

Tokenizer["Tokenizer"]

Dataset_Factory["Dataset Factory"]

DataList_Manager["DataList Manager"]

Distributed_Data_Sampler["Distributed Data Sampler"]

Data_Pipeline_Processor["Data Pipeline Processor"]

Standard_Data_Processing_Functions["Standard Data Processing Functions"]

DPO_Data_Processing_Functions["DPO Data Processing Functions"]

CosyVoiceFrontEnd -- "uses" --> Tokenizer

CosyVoiceFrontEnd -- "uses" --> Text_Normalization_Utilities

Dataset_Factory -- "creates and uses" --> DataList_Manager

Dataset_Factory -- "creates and uses" --> Data_Pipeline_Processor

DataList_Manager -- "uses" --> Distributed_Data_Sampler

Data_Pipeline_Processor -- "uses" --> Standard_Data_Processing_Functions

Data_Pipeline_Processor -- "uses" --> DPO_Data_Processing_Functions

```

[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])



## Component Details



This subsystem is responsible for all aspects of data handling within the `CosyVoice` project, from raw input to model-ready tensors. It encompasses text normalization, tokenization, speech feature extraction, dataset loading, and efficient batching for both training and inference, ensuring data is correctly formatted and accessible for the models.



### CosyVoiceFrontEnd

This is the primary interface for converting raw input data (text and speech) into a format suitable for CosyVoice models. It orchestrates text normalization, tokenization, speech feature extraction, and speaker embedding extraction, so that all incoming data is standardized before it reaches the models.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/cli/frontend.py#L38-L214" target="_blank" rel="noopener noreferrer">`cosyvoice.cli.frontend.CosyVoiceFrontEnd` (38:214)</a>





### Text Normalization Utilities

This component provides a collection of helper functions specifically designed for text processing and normalization tasks. These utilities are leveraged by the `CosyVoiceFrontEnd` component to ensure consistent and clean text input, handling various linguistic nuances (e.g., Chinese and English normalization, punctuation handling, number spelling).





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/utils/frontend_utils.py#L1-L1" target="_blank" rel="noopener noreferrer">`cosyvoice.utils.frontend_utils` (1:1)</a>
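As a rough illustration of the kind of helper described above, the sketch below spells out digits, collapses whitespace, and ensures terminal punctuation. It is a simplified English-only example with hypothetical function names (`spell_out_digits`, `normalize`); it is not the logic in `cosyvoice.utils.frontend_utils`.

```python
import re

_DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def spell_out_digits(text: str) -> str:
    """Replace each ASCII digit with its English word, padded by spaces."""
    return re.sub(r"[0-9]", lambda m: f" {_DIGIT_WORDS[int(m.group())]} ", text)


def normalize(text: str) -> str:
    """Spell out digits, collapse whitespace, and ensure terminal punctuation."""
    text = spell_out_digits(text)
    text = re.sub(r"\s+", " ", text).strip()
    if text and text[-1] not in ".!?":
        text += "."
    return text
```

For example, `normalize("Room  42 is open")` gives `"Room four two is open."`. Real normalizers read numbers as quantities ("forty-two") and handle language-specific punctuation; digit-by-digit spelling is used here only to keep the example short.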





### Tokenizer

This component converts raw text into numerical tokens, the form in which the CosyVoice models consume textual input.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/tokenizer/tokenizer.py#L1-L1" target="_blank" rel="noopener noreferrer">`cosyvoice.tokenizer.tokenizer.Tokenizer` (1:1)</a>
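The text-to-token mapping can be illustrated with a toy whitespace tokenizer. This is only a sketch of the idea: `ToyTokenizer` is hypothetical, and the project's actual tokenizer is considerably more sophisticated than a word-level vocabulary lookup.

```python
from typing import Dict, Iterable, List


class ToyTokenizer:
    """Word-level tokenizer with a reserved id (0) for unknown words."""

    def __init__(self, corpus: Iterable[str]) -> None:
        words = sorted({word for line in corpus for word in line.split()})
        self.unk_id = 0
        self.token_to_id: Dict[str, int] = {w: i + 1 for i, w in enumerate(words)}
        self.id_to_token: Dict[int, str] = {i: w for w, i in self.token_to_id.items()}

    def encode(self, text: str) -> List[int]:
        return [self.token_to_id.get(word, self.unk_id) for word in text.split()]

    def decode(self, ids: Iterable[int]) -> str:
        return " ".join(self.id_to_token.get(i, "<unk>") for i in ids)
```

With a vocabulary built from `["hello world", "hello there"]`, `encode("hello world again")` returns `[1, 3, 0]`, where `0` marks the out-of-vocabulary word.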





### Dataset Factory

This is a factory function responsible for constructing the overall dataset pipeline. It initializes the `DataList Manager` and chains various `Data Pipeline Processor` functions to create a complete data loading and processing flow for both training and inference. It handles the initial loading of data lists and applies mode-specific configurations.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L125-L163" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.Dataset` (125:163)</a>
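A factory of this kind can be sketched as a function that wraps a data list and then chains processing steps, with the mode deciding which steps apply. The names below (`build_dataset`, `add_length`, `shuffle`) and the rule "shuffle only in train mode" are assumptions made for illustration, not the behaviour of `cosyvoice.dataset.dataset.Dataset`.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, List

Sample = Dict[str, object]
Step = Callable[[Iterable[Sample]], Iterator[Sample]]


def add_length(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Example step: annotate each sample with the length of its text."""
    for sample in samples:
        yield {**sample, "length": len(str(sample["text"]))}


def shuffle(samples: Iterable[Sample], seed: int = 0) -> Iterator[Sample]:
    """Example step: shuffle the whole list deterministically."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    yield from items


def build_dataset(data_list: List[Sample], steps: List[Step],
                  mode: str = "train") -> Iterator[Sample]:
    """Chain `steps` over `data_list`; shuffling is applied only in train mode."""
    stream: Iterable[Sample] = iter(data_list)
    if mode == "train":
        stream = shuffle(stream)
    for step in steps:
        stream = step(stream)
    return iter(stream)
```

Keeping each step as a generator means nothing is computed until the consumer iterates, so the same factory serves both training and inference without materializing the dataset up front (the toy `shuffle` step is the exception, since it needs the full list).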





### DataList Manager

Manages the list of individual data samples, providing an organized structure for accessing data. It integrates with the `Distributed Data Sampler` to ensure proper data distribution across different processes in a distributed training setup.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L107-L122" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.DataList` (107:122)</a>





### Distributed Data Sampler

Ensures efficient and correct data sampling in distributed training environments, preventing data duplication and ensuring even distribution across processes. It handles the logic for splitting data among different ranks and workers.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L51-L104" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.DistributedSampler` (51:104)</a>
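One common way to split data among ranks and workers, sketched here as an assumption rather than as the project's exact logic, is strided slicing: each rank takes every `world_size`-th item, and each data-loading worker within that rank takes every `num_workers`-th item of the rank's share. The function name `shard_for_worker` is hypothetical.

```python
from typing import List, Sequence


def shard_for_worker(items: Sequence[int], rank: int, world_size: int,
                     worker_id: int, num_workers: int) -> List[int]:
    """Return the items owned by one (rank, worker) pair using strided slices."""
    per_rank = items[rank::world_size]            # split across processes
    return list(per_rank[worker_id::num_workers])  # split across workers
```

With 8 items, 2 ranks, and 2 workers per rank, rank 1 owns `[1, 3, 5, 7]` and its worker 0 owns `[1, 5]`. Every item lands in exactly one shard, which is what prevents duplication across processes.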





### Data Pipeline Processor

This component provides a flexible mechanism for chaining multiple data processing functions. It acts as an iterable dataset that applies a sequence of transformations to the data, enabling complex and customizable data preparation pipelines.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/dataset.py#L26-L48" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.dataset.Processor` (26:48)</a>
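The chaining mechanism can be sketched as a small iterable wrapper that applies one function to its source and can itself be wrapped again. `ToyProcessor` and its `apply` method are hypothetical names used for illustration, as are the example steps `scale` and `keep_below`.

```python
from typing import Any, Callable, Iterable, Iterator


class ToyProcessor:
    """Iterable that lazily applies `fn(source_iterator, **kwargs)` on each pass."""

    def __init__(self, source: Iterable[Any],
                 fn: Callable[..., Iterator[Any]], **kwargs: Any) -> None:
        self.source = source
        self.fn = fn
        self.kwargs = kwargs

    def __iter__(self) -> Iterator[Any]:
        return iter(self.fn(iter(self.source), **self.kwargs))

    def apply(self, fn: Callable[..., Iterator[Any]], **kwargs: Any) -> "ToyProcessor":
        # Wrapping `self` keeps earlier steps intact and appends a new one.
        return ToyProcessor(self, fn, **kwargs)


def scale(items: Iterator[int], factor: int) -> Iterator[int]:
    for item in items:
        yield item * factor


def keep_below(items: Iterator[int], limit: int) -> Iterator[int]:
    for item in items:
        if item < limit:
            yield item


pipeline = ToyProcessor([1, 2, 3, 4], scale, factor=10).apply(keep_below, limit=35)
```

Iterating `pipeline` yields `10, 20, 30`. Because each pass builds fresh generators from the source, the pipeline can be iterated more than once, for example once per epoch.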





### Standard Data Processing Functions

A collection of functions that perform various standard data transformations, such as opening parquet files, filtering samples based on length, resampling audio, truncating audio, and computing acoustic features (fbank, f0). These functions are designed to be used as individual steps within the `Data Pipeline Processor`.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/processor.py#L1-L1" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.processor` (1:1)</a>
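Steps of this kind are naturally written as generators that take a stream of samples and yield a transformed stream. The sketch below shows a length filter and a truncation step; the sample layout (a dict with `utt` and `audio` keys) and the function names are assumptions for illustration, not the signatures in `cosyvoice.dataset.processor`.

```python
from typing import Dict, Iterable, Iterator, List

Sample = Dict[str, object]


def filter_by_length(samples: Iterable[Sample], min_len: int = 2,
                     max_len: int = 6) -> Iterator[Sample]:
    """Drop samples whose audio is shorter than min_len or longer than max_len."""
    for sample in samples:
        if min_len <= len(sample["audio"]) <= max_len:
            yield sample


def truncate(samples: Iterable[Sample], max_len: int = 4) -> Iterator[Sample]:
    """Cut each sample's audio down to at most max_len values."""
    for sample in samples:
        yield {**sample, "audio": sample["audio"][:max_len]}


raw: List[Sample] = [
    {"utt": "a", "audio": [0.1]},
    {"utt": "b", "audio": [0.1] * 5},
    {"utt": "c", "audio": [0.1] * 9},
]
kept = list(truncate(filter_by_length(raw)))
```

Here only utterance `b` survives the filter, and its audio is cut from 5 values to 4. Each function is independent of the others, so steps can be reordered or dropped per configuration.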





### DPO Data Processing Functions

A collection of functions analogous to the standard data processing functions but tailored to Direct Preference Optimization (DPO) training. They handle the DPO-specific data loading, filtering, and feature extraction requirements.





**Related Classes/Methods**:



- <a href="https://github.com/FunAudioLLM/CosyVoice/blob/master/cosyvoice/dataset/processor_dpo.py#L1-L1" target="_blank" rel="noopener noreferrer">`cosyvoice.dataset.processor_dpo` (1:1)</a>









### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)