diff --git a/.codeboarding/Application_Layer.md b/.codeboarding/Application_Layer.md
new file mode 100644
index 00000000..ab0c7b6a
--- /dev/null
+++ b/.codeboarding/Application_Layer.md
@@ -0,0 +1,137 @@
+```mermaid
+
+graph LR
+
+ cosyvoice_bin["cosyvoice.bin"]
+
+ cosyvoice_cli_cosyvoice["cosyvoice.cli.cosyvoice"]
+
+ cosyvoice_cli_model["cosyvoice.cli.model"]
+
+ cosyvoice_cli_frontend["cosyvoice.cli.frontend"]
+
+ cosyvoice_bin -- "initiates" --> cosyvoice_cli_cosyvoice
+
+ cosyvoice_bin -- "initiates" --> cosyvoice_cli_model
+
+ cosyvoice_bin -- "utilizes" --> cosyvoice_utils
+
+ cosyvoice_cli_cosyvoice -- "composes" --> cosyvoice_cli_frontend
+
+ cosyvoice_cli_cosyvoice -- "composes" --> cosyvoice_cli_model
+
+ cosyvoice_cli_cosyvoice -- "utilizes" --> cosyvoice_utils_file_utils
+
+ cosyvoice_cli_model -- "composes" --> cosyvoice_llm
+
+ cosyvoice_cli_model -- "composes" --> cosyvoice_flow
+
+ cosyvoice_cli_model -- "composes" --> cosyvoice_hifigan
+
+ cosyvoice_cli_model -- "utilizes" --> cosyvoice_utils_common
+
+ cosyvoice_cli_frontend -- "utilizes" --> cosyvoice_utils_file_utils
+
+ cosyvoice_cli_frontend -- "utilizes" --> cosyvoice_utils_frontend_utils
+
+```
+
+
+
+
+## Component Details
+
+
+
+The Application Layer serves as the primary interface for users and external systems within the CosyVoice project. It is responsible for handling command-line arguments, orchestrating the overall execution flow for tasks such as model inference, training, and model export (JIT, ONNX), and managing the high-level setup of the system. It acts as the central coordinator, initiating interactions with other core components to fulfill user requests.
+
+
+
+### cosyvoice.bin
+
+This component comprises the executable scripts that users run to perform specific tasks such as model inference, training (including DPO training), checkpoint averaging, and model export. They are the primary entry points to the CosyVoice system: each script parses its command-line arguments and then initiates the corresponding high-level operation.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.bin.inference` (1:9999)
+
+- `cosyvoice.bin.train` (1:9999)
+
+- `cosyvoice.bin.export_jit` (1:9999)
+
+- `cosyvoice.bin.export_onnx` (1:9999)
+
+- `cosyvoice.bin.average_model` (1:9999)
+
+- `cosyvoice.bin.train_dpo` (1:9999)
+
+
+
+
+
+### cosyvoice.cli.cosyvoice
+
+This module provides the high-level Python API for interacting with the CosyVoice models. It acts as an orchestrator, managing model loading, frontend processing, and dispatching inference requests to the underlying model logic. It encapsulates different inference modes (SFT, zero-shot, cross-lingual, instruct, VC).
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.cli.cosyvoice` (1:9999)
+
+
+
+
+
+### cosyvoice.cli.model
+
+This component contains the core CosyVoiceModel and CosyVoice2Model classes. These models encapsulate the neural network architecture (LLM, Flow, Vocoder) and orchestrate their interaction to perform Text-to-Speech (TTS) and Voice Conversion (VC). They manage model loading, JIT/TensorRT optimization, and the multi-stage speech generation process, including streaming.
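+
+The multi-stage flow with streaming can be pictured with the minimal sketch below. It is not CosyVoice code: `fake_llm`, `fake_flow`, `fake_vocoder`, `stream_tts`, and the `hop` parameter are hypothetical stand-ins that only model the data flow (text -> speech tokens -> mel frames -> waveform samples), emitting an audio chunk every `hop` speech tokens instead of waiting for the whole utterance.
+
+```python
+from typing import Iterator, List
+
+
+# Hypothetical stand-ins for the three neural stages (illustration only).
+def fake_llm(text: str) -> Iterator[int]:
+    """Yield one 'speech token' per input character."""
+    for ch in text:
+        yield ord(ch) % 100
+
+
+def fake_flow(tokens: List[int]) -> List[float]:
+    """Turn speech tokens into 'mel frames' (two frames per token)."""
+    return [t / 100.0 for t in tokens for _ in range(2)]
+
+
+def fake_vocoder(mel: List[float]) -> List[float]:
+    """Turn mel frames into 'waveform samples' (four samples per frame)."""
+    return [m for m in mel for _ in range(4)]
+
+
+def stream_tts(text: str, hop: int = 4) -> Iterator[List[float]]:
+    """Run LLM -> flow -> vocoder, yielding an audio chunk every `hop` tokens."""
+    buffer: List[int] = []
+    for token in fake_llm(text):
+        buffer.append(token)
+        if len(buffer) == hop:
+            yield fake_vocoder(fake_flow(buffer))
+            buffer = []
+    if buffer:  # flush the final, shorter chunk
+        yield fake_vocoder(fake_flow(buffer))
+
+
+chunks = list(stream_tts("hello world", hop=4))
+print([len(c) for c in chunks])  # [32, 32, 24]
+```
+
+A production streaming path typically also carries overlap or cached state between chunks to avoid audible seams at chunk boundaries; the sketch omits that to keep the staging visible.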
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.cli.model` (1:9999)
+
+
+
+
+
+### cosyvoice.cli.frontend
+
+This module is responsible for all pre-processing of input data (text and speech) before it is fed into the core CosyVoiceModel. This includes text tokenization, speech tokenization, speaker embedding extraction, and text normalization, ensuring that the models receive properly formatted inputs.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.cli.frontend` (1:9999)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/Data_Preparation_Management.md b/.codeboarding/Data_Preparation_Management.md
new file mode 100644
index 00000000..50a9bbd5
--- /dev/null
+++ b/.codeboarding/Data_Preparation_Management.md
@@ -0,0 +1,217 @@
+```mermaid
+
+graph LR
+
+ CosyVoiceFrontEnd["CosyVoiceFrontEnd"]
+
+ Text_Normalization_Utilities["Text Normalization Utilities"]
+
+ Tokenizer["Tokenizer"]
+
+ Dataset_Factory["Dataset Factory"]
+
+ DataList_Manager["DataList Manager"]
+
+ Distributed_Data_Sampler["Distributed Data Sampler"]
+
+ Data_Pipeline_Processor["Data Pipeline Processor"]
+
+ Standard_Data_Processing_Functions["Standard Data Processing Functions"]
+
+ DPO_Data_Processing_Functions["DPO Data Processing Functions"]
+
+ CosyVoiceFrontEnd -- "uses" --> Tokenizer
+
+ CosyVoiceFrontEnd -- "uses" --> Text_Normalization_Utilities
+
+ Dataset_Factory -- "creates and uses" --> DataList_Manager
+
+ Dataset_Factory -- "creates and uses" --> Data_Pipeline_Processor
+
+ DataList_Manager -- "uses" --> Distributed_Data_Sampler
+
+ Data_Pipeline_Processor -- "uses" --> Standard_Data_Processing_Functions
+
+ Data_Pipeline_Processor -- "uses" --> DPO_Data_Processing_Functions
+
+```
+
+
+
+
+## Component Details
+
+
+
+This subsystem is responsible for all aspects of data handling within the `CosyVoice` project, from raw input to model-ready tensors. It encompasses text normalization, tokenization, speech feature extraction, dataset loading, and efficient batching for both training and inference, ensuring data is correctly formatted and accessible for the models.
+
+
+
+### CosyVoiceFrontEnd
+
+This is the primary interface for preparing raw input data (text and speech) into a format suitable for CosyVoice models. It orchestrates text normalization, tokenization, speech feature extraction, and speaker embedding extraction. It acts as a crucial data pre-processing layer, ensuring that all incoming data is standardized before being fed into the models.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.cli.frontend.CosyVoiceFrontEnd` (38:214)
+
+
+
+
+
+### Text Normalization Utilities
+
+This component provides a collection of helper functions specifically designed for text processing and normalization tasks. These utilities are leveraged by the `CosyVoiceFrontEnd` component to ensure consistent and clean text input, handling various linguistic nuances (e.g., Chinese and English normalization, punctuation handling, number spelling).
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.frontend_utils` (1:1)
+
+
+
+
+
+### Tokenizer
+
+This component converts raw text into sequences of numerical token IDs. Tokenization turns human-readable text into the form the CosyVoice models consume, so every text input passes through it before reaching a model.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.tokenizer.tokenizer.Tokenizer` (1:1)
+
+
+
+
+
+### Dataset Factory
+
+This factory function constructs the overall dataset pipeline. It loads the data list into a `DataList Manager` and chains a series of `Data Pipeline Processor` stages to build a complete data loading and processing flow, applying mode-specific settings for training versus inference.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.dataset.Dataset` (125:163)
+
+
+
+
+
+### DataList Manager
+
+Manages the list of individual data samples, providing an organized structure for accessing data. It integrates with the `Distributed Data Sampler` to ensure proper data distribution across different processes in a distributed training setup.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.dataset.DataList` (107:122)
+
+
+
+
+
+### Distributed Data Sampler
+
+Ensures correct and efficient data sampling in distributed training. It splits the data list across ranks (processes) and, within each rank, across data-loader workers, so that samples are not duplicated and the load stays evenly distributed.
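+
+The rank-then-worker split can be sketched as follows. `shard_indices` is a hypothetical helper, not the `DistributedSampler` API; in a real PyTorch job, `rank` and `world_size` would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`, and the worker values from `torch.utils.data.get_worker_info()`.
+
+```python
+import random
+from typing import List
+
+
+def shard_indices(num_items: int, rank: int, world_size: int,
+                  worker_id: int, num_workers: int,
+                  epoch: int = 0, shuffle: bool = True) -> List[int]:
+    """Return the sample indices that one data-loader worker on one rank reads."""
+    indices = list(range(num_items))
+    if shuffle:
+        # Seeding with the epoch gives every rank the same permutation,
+        # so the strided slices below stay disjoint across ranks.
+        random.Random(epoch).shuffle(indices)
+    indices = indices[rank::world_size]      # split across processes
+    return indices[worker_id::num_workers]   # split across workers
+
+
+print(shard_indices(10, rank=1, world_size=2, worker_id=0, num_workers=2, shuffle=False))
+# [1, 5, 9]
+```
+
+Because every (rank, worker) pair takes a disjoint stride of one shared permutation, the shards together cover each index exactly once per epoch.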
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.dataset.DistributedSampler` (51:104)
+
+
+
+
+
+### Data Pipeline Processor
+
+This component provides a flexible mechanism for chaining data processing functions. Each `Processor` is an iterable dataset that wraps a source iterable and a single processing function; nesting one `Processor` inside another applies the functions in sequence, enabling complex, customizable data preparation pipelines.
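+
+A minimal sketch of this chaining pattern follows. It uses a plain Python class rather than PyTorch's `IterableDataset` so it runs standalone; the `apply` method and the `tokenize` and `filter_long` stages are hypothetical illustrations, not the CosyVoice API.
+
+```python
+from typing import Any, Callable, Iterable, Iterator
+
+
+class Processor:
+    """Wrap a source iterable and one generator function, applied lazily."""
+
+    def __init__(self, source: Iterable, f: Callable[..., Iterator], *args: Any, **kw: Any):
+        self.source, self.f, self.args, self.kw = source, f, args, kw
+
+    def __iter__(self) -> Iterator:
+        # Re-iterating the source on every pass lets the pipeline be replayed each epoch.
+        return self.f(iter(self.source), *self.args, **self.kw)
+
+    def apply(self, f: Callable[..., Iterator], *args: Any, **kw: Any) -> "Processor":
+        # Chaining wraps the current Processor as the source of a new one.
+        return Processor(self, f, *args, **kw)
+
+
+def tokenize(data: Iterator[dict]) -> Iterator[dict]:
+    for sample in data:
+        yield {**sample, "tokens": sample["text"].split()}
+
+
+def filter_long(data: Iterator[dict], max_tokens: int) -> Iterator[dict]:
+    for sample in data:
+        if len(sample["tokens"]) <= max_tokens:
+            yield sample
+
+
+raw = [{"text": "a b"}, {"text": "a b c d e"}, {"text": "c"}]
+pipeline = Processor(raw, tokenize).apply(filter_long, max_tokens=3)
+print([s["text"] for s in pipeline])  # ['a b', 'c']
+```
+
+Because each stage is a generator, samples stream through the whole chain one at a time and nothing is materialized up front.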
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.dataset.Processor` (26:48)
+
+
+
+
+
+### Standard Data Processing Functions
+
+A collection of functions that perform various standard data transformations, such as opening parquet files, filtering samples based on length, resampling audio, truncating audio, and computing acoustic features (fbank, f0). These functions are designed to be used as individual steps within the `Data Pipeline Processor`.
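+
+As an illustration of how such steps compose, the sketch below pairs a length filter with frame-budgeted ("dynamic") batching, where a batch grows until its padded size, the longest sample times the batch size, would exceed a budget. The function names, the `num_frames` field, and the exact budget rule are assumptions for this example; the functions in `cosyvoice.dataset.processor` may differ.
+
+```python
+from typing import Dict, Iterator, List
+
+
+def filter_by_length(data: Iterator[Dict], min_len: int, max_len: int) -> Iterator[Dict]:
+    """Drop samples whose length in frames falls outside [min_len, max_len]."""
+    for sample in data:
+        if min_len <= sample["num_frames"] <= max_len:
+            yield sample
+
+
+def dynamic_batch(data: Iterator[Dict], max_frames_in_batch: int) -> Iterator[List[Dict]]:
+    """Yield batches whose padded size (longest sample * batch size) fits the budget."""
+    batch: List[Dict] = []
+    longest = 0
+    for sample in data:
+        new_longest = max(longest, sample["num_frames"])
+        if batch and new_longest * (len(batch) + 1) > max_frames_in_batch:
+            yield batch  # adding this sample would overflow the budget
+            batch, new_longest = [], sample["num_frames"]
+        batch.append(sample)
+        longest = new_longest
+    if batch:
+        yield batch
+
+
+samples = [{"num_frames": n} for n in [100, 300, 250, 5000, 400, 50]]
+batches = dynamic_batch(filter_by_length(iter(samples), 10, 1000), max_frames_in_batch=800)
+print([[s["num_frames"] for s in b] for b in batches])  # [[100, 300], [250, 400], [50]]
+```
+
+Bounding the padded size rather than the sample count keeps memory use roughly constant when utterance lengths vary widely.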
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.processor` (1:1)
+
+
+
+
+
+### DPO Data Processing Functions
+
+A collection of functions similar to the standard data processing functions, but specifically tailored for Direct Preference Optimization (DPO) training. These functions handle DPO-specific data loading, filtering, and feature extraction requirements, ensuring data is correctly prepared for this specialized training regime.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.processor_dpo` (1:1)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/General_Utilities.md b/.codeboarding/General_Utilities.md
new file mode 100644
index 00000000..effea715
--- /dev/null
+++ b/.codeboarding/General_Utilities.md
@@ -0,0 +1,291 @@
+```mermaid
+
+graph LR
+
+ File_Utilities["File Utilities"]
+
+ Common_Utilities["Common Utilities"]
+
+ Masking_Utilities["Masking Utilities"]
+
+ Learning_Rate_Schedulers["Learning Rate Schedulers"]
+
+ Loss_Functions["Loss Functions"]
+
+ Class_Utilities["Class Utilities"]
+
+ Frontend_Utilities["Frontend Utilities"]
+
+ Training_Utilities["Training Utilities"]
+
+ File_Utilities -- "provides loaded data to" --> Data_Consumers
+
+ Common_Utilities -- "offers functions to" --> Data_Generation_Manipulation
+
+ Common_Utilities -- "offers functions to" --> Model_Components
+
+ Masking_Utilities -- "provides functions to" --> Model_Architecture_Components
+
+ Learning_Rate_Schedulers -- "provides adjustments to" --> Training_Utilities
+
+ Loss_Functions -- "are consumed by" --> Training_Executors
+
+ Class_Utilities -- "is used by" --> cosyvoice_cli_cosyvoice
+
+ Class_Utilities -- "is used by" --> cosyvoice_transformer
+
+ Frontend_Utilities -- "prepares text inputs for" --> Model_Input_Preparation
+
+```
+
+
+
+
+## Component Details
+
+
+
+This overview describes the general-purpose utility components of the CosyVoice project and how they relate to the rest of the system.
+
+
+
+### File Utilities
+
+Manages various file input/output operations, including reading structured data (lists, JSON), loading audio files, and handling model conversion and export to different formats (e.g., ONNX to TensorRT, CosyVoice2 to VLLM). It ensures efficient loading and saving of configuration, data, and model artifacts.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `read_lists` (1:1)
+
+- `read_json_lists` (1:1)
+
+- `load_wav` (1:1)
+
+- `convert_onnx_to_trt` (1:1)
+
+- `export_cosyvoice2_vllm` (1:1)
+
+
+
+
+
+### Common Utilities
+
+Contains general-purpose helper functions and core utilities: sampling strategies for token generation (nucleus, random, and repetition-aware sampling), tensor padding, accuracy calculation, and random seed management, along with helpers for converting masks to attention biases and wrapping TensorRT execution contexts. These methods support data manipulation, generation, and other common operations throughout the project, especially within model architectures and training.
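+
+Of the sampling strategies, nucleus (top-p) sampling is the easiest to show in isolation. The pure-Python sketch below keeps the most probable tokens until their cumulative probability reaches `top_p` (capped at `top_k` tokens) and samples among them. The `nucleus_sample` name and its defaults are illustrative assumptions; the project's `nucleus_sampling` operates on PyTorch tensors and its exact cut-off rule may differ.
+
+```python
+import random
+from typing import List, Optional
+
+
+def nucleus_sample(probs: List[float], top_p: float = 0.8, top_k: int = 25,
+                   rng: Optional[random.Random] = None) -> int:
+    """Sample a token id from the top-p nucleus of `probs`, capped at `top_k` ids."""
+    rng = rng or random.Random()
+    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
+    kept: List[int] = []
+    cumulative = 0.0
+    for i in order:
+        if cumulative >= top_p or len(kept) >= top_k:
+            break
+        kept.append(i)
+        cumulative += probs[i]
+    # random.choices normalizes the kept weights, i.e. renormalizes the nucleus.
+    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
+
+
+probs = [0.5, 0.3, 0.15, 0.05]
+rng = random.Random(0)
+draws = {nucleus_sample(probs, top_p=0.7, rng=rng) for _ in range(100)}
+print(sorted(draws))  # a subset of [0, 1]: ids 2 and 3 fall outside the nucleus
+```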
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `pad_list` (1:1)
+
+- `th_accuracy` (1:1)
+
+- `ras_sampling` (1:1)
+
+- `nucleus_sampling` (1:1)
+
+- `random_sampling` (1:1)
+
+- `set_all_random_seed` (1:1)
+
+- `mask_to_bias` (1:1)
+
+- `TrtContextWrapper` (1:1)
+
+
+
+
+
+### Masking Utilities
+
+Provides functionalities for creating and applying masks, which are critical in sequence models (especially transformer architectures) to control information flow, handle variable-length inputs, and implement attention mechanisms.
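+
+The two most common masks can be sketched with plain lists. The sketch assumes the WeNet-style conventions these helpers are usually modeled on: `True` marks padding in a pad mask, and `True` marks an allowed position in a causal (subsequent) mask. The real functions return PyTorch tensors, and the conventions should be checked against `cosyvoice.utils.mask` before relying on them.
+
+```python
+from typing import List
+
+
+def make_pad_mask(lengths: List[int], max_len: int = 0) -> List[List[bool]]:
+    """True where a position lies beyond its sequence length (padding)."""
+    max_len = max_len if max_len > 0 else max(lengths)
+    return [[t >= n for t in range(max_len)] for n in lengths]
+
+
+def make_non_pad_mask(lengths: List[int], max_len: int = 0) -> List[List[bool]]:
+    """Complement of make_pad_mask: True on real (non-padded) positions."""
+    return [[not p for p in row] for row in make_pad_mask(lengths, max_len)]
+
+
+def subsequent_mask(size: int) -> List[List[bool]]:
+    """Causal mask: query position i may attend to key positions j <= i."""
+    return [[j <= i for j in range(size)] for i in range(size)]
+
+
+print(make_pad_mask([3, 1]))  # [[False, False, False], [False, True, True]]
+print(subsequent_mask(2))     # [[True, False], [True, True]]
+```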
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `make_pad_mask` (1:1)
+
+- `make_non_pad_mask` (1:1)
+
+- `add_optional_chunk_mask` (1:1)
+
+- `subsequent_mask` (1:1)
+
+
+
+
+
+### Learning Rate Schedulers
+
+Manages and adjusts the learning rate during the training of machine learning models. It implements various learning rate policies (e.g., Warmup, Annealing, Cosine, Noam) to optimize the training process and ensure stable convergence.
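+
+As one concrete policy, the sketch below implements a normalized form of the Noam warmup rule: the learning rate rises linearly for `warmup_steps`, peaks at `base_lr` when `step == warmup_steps`, and then decays with the inverse square root of the step. Whether CosyVoice's schedulers use exactly this formula is an assumption, and `warmup_lr` is a hypothetical helper rather than a scheduler class.
+
+```python
+def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
+    """Linear warmup for `warmup_steps`, then decay proportional to step ** -0.5."""
+    step = max(step, 1)  # guard against 0 ** -0.5 at the very first call
+    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
+
+
+for s in (250, 1000, 4000):
+    print(s, round(warmup_lr(s, base_lr=1e-3, warmup_steps=1000), 6))
+# prints 0.00025, 0.001 and 0.0005 for steps 250, 1000 and 4000
+```
+
+The two terms inside `min` cross exactly at `warmup_steps`, which is why the schedule is continuous at the switch from warmup to decay.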
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `WarmupScheduler` (1:1)
+
+- `NoamScheduler` (1:1)
+
+- `CosineScheduler` (1:1)
+
+- `AnnealingScheduler` (1:1)
+
+
+
+
+
+### Loss Functions
+
+Defines various loss functions used to quantify the error of a model during training. This includes standard loss functions (`losses.py`) and those specific to Direct Preference Optimization (DPO) (`losses_dpo.py`), which are crucial for guiding model optimization.
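+
+For the DPO case, the standard objective for one preference pair is `-log sigmoid(beta * margin)`, where the margin measures how much more the policy raises the log-probability of the preferred (chosen) sample than of the rejected one, each relative to a frozen reference model. The sketch below computes that scalar with the standard library; `dpo_loss` is a hypothetical helper, and whether `DPOLoss` adds extra terms (for example, label smoothing) is not covered here.
+
+```python
+import math
+
+
+def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
+             ref_chosen_logp: float, ref_rejected_logp: float,
+             beta: float = 0.1) -> float:
+    """-log(sigmoid(beta * margin)) for one chosen/rejected pair."""
+    margin = ((policy_chosen_logp - ref_chosen_logp)
+              - (policy_rejected_logp - ref_rejected_logp))
+    z = beta * margin
+    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
+    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
+
+
+print(dpo_loss(-1.0, -1.0, -1.0, -1.0))  # about 0.6931 (log 2): no preference learned yet
+```
+
+The loss falls below `log 2` as soon as the policy favors the chosen sample more than the reference does.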
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `L1Loss` (1:1)
+
+- `MSELoss` (1:1)
+
+- `LogitsLoss` (1:1)
+
+- `DPOLoss` (1:1)
+
+
+
+
+
+### Class Utilities
+
+Provides utilities for dynamic class lookup and instantiation. This is particularly useful in modular architectures where components such as transformer layers, activations, and embeddings are selected by name in a configuration file and resolved to classes at load time.
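+
+One common way to implement this kind of configuration-driven class lookup is a registry dictionary that maps configuration strings to classes. The sketch below assumes that pattern; the `ACTIVATION_CLASSES` table, the placeholder classes, and `build_activation` are hypothetical, and the project's helpers may resolve classes differently (for example, from module paths).
+
+```python
+from typing import Dict, Type
+
+
+class ReLU:
+    """Placeholder activation class."""
+
+
+class Swish:
+    """Placeholder activation class."""
+
+
+# Hypothetical registry: strings that may appear in a config file -> classes.
+ACTIVATION_CLASSES: Dict[str, Type] = {"relu": ReLU, "swish": Swish}
+
+
+def build_activation(name: str):
+    """Instantiate the activation class registered under `name`."""
+    try:
+        cls = ACTIVATION_CLASSES[name]
+    except KeyError:
+        raise ValueError(f"unknown activation {name!r}; expected one of {sorted(ACTIVATION_CLASSES)}") from None
+    return cls()
+
+
+print(type(build_activation("swish")).__name__)  # Swish
+```
+
+Failing loudly with the list of valid names makes typos in configuration files easy to diagnose.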
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `get_class_from_path` (1:1)
+
+- `get_all_classes_from_path` (1:1)
+
+- `get_all_classes_from_module` (1:1)
+
+
+
+
+
+### Frontend Utilities
+
+Focuses on preprocessing input data, specifically text, by splitting it into manageable segments. This is a common initial step in text-to-speech or natural language processing pipelines before data is fed into the main model.
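+
+Segmenting can be sketched as splitting on sentence-final punctuation (ASCII and full-width) and then greedily merging neighbouring pieces up to a character budget, so the model receives neither isolated fragments nor overly long inputs. The `split_into_segments` name, the punctuation set, and the `max_chars` budget are assumptions for illustration, not the behavior of `split_text_into_sentences`.
+
+```python
+import re
+from typing import List
+
+# A piece is (optional text) + sentence-final punctuation + trailing spaces,
+# or trailing text that has no closing punctuation.
+_PIECE = re.compile(r"[^。！？!?.]*[。！？!?.]+\s*|[^。！？!?.]+")
+
+
+def split_into_segments(text: str, max_chars: int = 40) -> List[str]:
+    """Split text at sentence punctuation, merging short pieces up to max_chars."""
+    segments: List[str] = []
+    for piece in _PIECE.findall(text):
+        if segments and len(segments[-1]) + len(piece) <= max_chars:
+            segments[-1] += piece  # concatenation preserves the original spacing
+        else:
+            segments.append(piece)
+    return [s.strip() for s in segments if s.strip()]
+
+
+print(split_into_segments("Hello there. How are you? Fine.", max_chars=20))
+# ['Hello there.', 'How are you? Fine.']
+```
+
+Concatenating pieces as-is, instead of joining them with a space, keeps Chinese text free of spurious spaces while leaving English spacing intact.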
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `split_text_into_sentences` (1:1)
+
+
+
+
+
+### Training Utilities
+
+Contains core helper functions and logic for the machine learning model training loop. This includes functionalities related to data handling within the training context, gradient accumulation, and interactions with learning rate schedulers. It provides the backbone for both standard and DPO training processes.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `AverageMeter` (1:1)
+
+- `set_grad_enabled` (1:1)
+
+- `get_grad_norm` (1:1)
+
+- `get_logger` (1:1)
+
+- `get_model_info` (1:1)
+
+- `save_checkpoint` (1:1)
+
+- `load_checkpoint` (1:1)
+
+- `get_latest_checkpoint` (1:1)
+
+- `get_model_state_dict` (1:1)
+
+- `get_optimizer_state_dict` (1:1)
+
+- `get_scheduler_state_dict` (1:1)
+
+- `get_scaler_state_dict` (1:1)
+
+- `get_model_state_dict_with_prefix` (1:1)
+
+- `get_optimizer_state_dict_with_prefix` (1:1)
+
+- `get_scheduler_state_dict_with_prefix` (1:1)
+
+- `get_scaler_state_dict_with_prefix` (1:1)
+
+- `get_model_state_dict_without_prefix` (1:1)
+
+- `get_optimizer_state_dict_without_prefix` (1:1)
+
+- `get_scheduler_state_dict_without_prefix` (1:1)
+
+- `get_scaler_state_dict_without_prefix` (1:1)
+
+- `get_model_state_dict_with_prefix_and_without_prefix` (1:1)
+
+- `get_optimizer_state_dict_with_prefix_and_without_prefix` (1:1)
+
+- `get_scheduler_state_dict_with_prefix_and_without_prefix` (1:1)
+
+- `get_scaler_state_dict_with_prefix_and_without_prefix` (1:1)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/Training_Management.md b/.codeboarding/Training_Management.md
new file mode 100644
index 00000000..0c1ebc15
--- /dev/null
+++ b/.codeboarding/Training_Management.md
@@ -0,0 +1,165 @@
+```mermaid
+
+graph LR
+
+ Training_Loop_Executor["Training Loop Executor"]
+
+ Training_Environment_Manager["Training Environment Manager"]
+
+ Core_Training_Operations["Core Training Operations"]
+
+ Loss_Function_Provider["Loss Function Provider"]
+
+ Learning_Rate_Scheduler["Learning Rate Scheduler"]
+
+ Data_Loader["Data Loader"]
+
+ Training_Loop_Executor -- "orchestrates" --> Core_Training_Operations
+
+ Training_Loop_Executor -- "consumes data from" --> Data_Loader
+
+ Training_Environment_Manager -- "initializes" --> Training_Loop_Executor
+
+ Training_Environment_Manager -- "configures" --> Learning_Rate_Scheduler
+
+ Training_Environment_Manager -- "prepares" --> Data_Loader
+
+ Core_Training_Operations -- "utilizes" --> Loss_Function_Provider
+
+ Core_Training_Operations -- "applies" --> Learning_Rate_Scheduler
+
+```
+
+
+
+
+## Component Details
+
+
+
+The Training Management subsystem is the core orchestrator of the model training process. It encompasses everything from setting up the distributed training environment to executing the training loop, managing model updates, and handling logging and checkpointing. It is designed to be flexible, supporting both standard and Direct Preference Optimization (DPO) training paradigms.
+
+
+
+### Training Loop Executor
+
+This component is the central control unit for the entire training process. It orchestrates the iteration over epochs and data batches, manages the training and cross-validation cycles, and coordinates the forward and backward passes. It includes specialized logic for both standard and DPO training, handling aspects like gradient accumulation and distributed training synchronization.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.executor` (0:0)
+
+- `cosyvoice.utils.executor_dpo` (0:0)
+
+
+
+
+
+### Training Environment Manager
+
+This component is responsible for setting up the foundational environment required for training. This includes initializing distributed training (e.g., PyTorch DDP or DeepSpeed), configuring and loading datasets, preparing optimizers, and setting up learning rate schedulers. It ensures that the training infrastructure is correctly configured before the training loop begins.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.train_utils` (0:0)
+
+- `cosyvoice.utils.train_utils_dpo` (0:0)
+
+
+
+
+
+### Core Training Operations
+
+This component encapsulates the fundamental, per-step operations within the training loop. It handles the forward pass (model inference and loss computation), the backward pass (gradient calculation), the update of model parameters based on gradients, and the management of gradient clipping. It also includes utilities for logging training metrics and saving model checkpoints.
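+
+The per-step logic (forward pass, backward pass, gradient accumulation, norm clipping, parameter update) can be illustrated without a deep-learning framework. The sketch below fits `y = w*x + b` by gradient descent in pure Python; `train`, `accum_grad`, and `max_grad_norm` are hypothetical names for this example. A real loop would typically rely on PyTorch autograd, `torch.nn.utils.clip_grad_norm_`, and an optimizer's `step()` instead of the hand-written gradients here.
+
+```python
+from typing import List, Tuple
+
+
+def train(batches: List[List[Tuple[float, float]]], lr: float = 0.1,
+          accum_grad: int = 2, max_grad_norm: float = 5.0,
+          epochs: int = 300) -> Tuple[float, float]:
+    """Fit y = w*x + b with mean squared error, using gradient accumulation and clipping."""
+    w, b = 0.0, 0.0
+    grad_w, grad_b = 0.0, 0.0  # gradients accumulated since the last update
+    step = 0
+    for _ in range(epochs):
+        for batch in batches:
+            n = len(batch)
+            for x, y in batch:
+                err = (w * x + b) - y                      # forward pass
+                grad_w += 2 * err * x / (n * accum_grad)   # backward pass; dividing by
+                grad_b += 2 * err / (n * accum_grad)       # accum_grad averages over batches
+            step += 1
+            if step % accum_grad == 0:
+                norm = (grad_w ** 2 + grad_b ** 2) ** 0.5
+                if norm > max_grad_norm:                   # clip the global gradient norm
+                    grad_w *= max_grad_norm / norm
+                    grad_b *= max_grad_norm / norm
+                w -= lr * grad_w                           # SGD parameter update
+                b -= lr * grad_b
+                grad_w, grad_b = 0.0, 0.0
+    return w, b
+
+
+# Two mini-batches drawn from y = 2x + 1; each update sees both (accum_grad=2).
+batches = [[(0.0, 1.0), (0.5, 2.0)], [(1.0, 3.0), (1.5, 4.0)]]
+w, b = train(batches)
+print(round(w, 2), round(b, 2))  # 2.0 1.0
+```
+
+Accumulating over `accum_grad` mini-batches before each update gives the same gradient as one larger batch, which is how a training loop can emulate large batches on limited GPU memory.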
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.train_utils` (0:0)
+
+- `cosyvoice.utils.train_utils_dpo` (0:0)
+
+
+
+
+
+### Loss Function Provider
+
+This component defines and computes the various loss functions used to quantify the error between the model's predictions and the actual targets. It includes both general-purpose loss functions and specialized implementations for Direct Preference Optimization (DPO), providing the essential feedback signal for model learning.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.losses` (0:0)
+
+- `cosyvoice.utils.losses_dpo` (0:0)
+
+
+
+
+
+### Learning Rate Scheduler
+
+This component implements strategies for dynamically adjusting the optimizer's learning rate throughout training. It includes scheduling policies such as warmup, Noam annealing, and constant rates, which are crucial for stable convergence and good model performance.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.scheduler` (0:0)
+
+
+
+
+
+### Data Loader
+
+This component is responsible for efficiently loading, preprocessing, and batching the training and validation data. It provides an iterable interface to the datasets, ensuring that data is fed to the model in an optimized manner for training.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.dataset.dataset` (0:0)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md
new file mode 100644
index 00000000..3d910783
--- /dev/null
+++ b/.codeboarding/on_boarding.md
@@ -0,0 +1,247 @@
+```mermaid
+
+graph LR
+
+ Application_Layer["Application Layer"]
+
+ Core_TTS_Models["Core TTS Models"]
+
+ Data_Preparation_Management["Data Preparation & Management"]
+
+ Training_Management["Training Management"]
+
+ Shared_Neural_Network_Components["Shared Neural Network Components"]
+
+ General_Utilities["General Utilities"]
+
+ Application_Layer -- "initiates calls to" --> Core_TTS_Models
+
+ Application_Layer -- "initiates calls to" --> Training_Management
+
+ Application_Layer -- "utilizes" --> General_Utilities
+
+ Core_TTS_Models -- "utilizes" --> Shared_Neural_Network_Components
+
+ Core_TTS_Models -- "receives input from" --> Data_Preparation_Management
+
+ Core_TTS_Models -- "provides output to" --> Application_Layer
+
+ Data_Preparation_Management -- "provides input to" --> Core_TTS_Models
+
+ Data_Preparation_Management -- "provides data to" --> Training_Management
+
+ Data_Preparation_Management -- "relies on" --> General_Utilities
+
+ Training_Management -- "interacts with" --> Core_TTS_Models
+
+ Training_Management -- "receives data from" --> Data_Preparation_Management
+
+ Training_Management -- "utilizes" --> General_Utilities
+
+ Shared_Neural_Network_Components -- "provides building blocks for" --> Core_TTS_Models
+
+ General_Utilities -- "supports" --> Application_Layer
+
+ General_Utilities -- "supports" --> Core_TTS_Models
+
+ General_Utilities -- "supports" --> Data_Preparation_Management
+
+ General_Utilities -- "supports" --> Training_Management
+
+ click Application_Layer href "https://github.com/FunAudioLLM/CosyVoice/blob/main/.codeboarding/Application_Layer.md" "Details"
+
+ click Data_Preparation_Management href "https://github.com/FunAudioLLM/CosyVoice/blob/main/.codeboarding/Data_Preparation_Management.md" "Details"
+
+ click Training_Management href "https://github.com/FunAudioLLM/CosyVoice/blob/main/.codeboarding/Training_Management.md" "Details"
+
+ click General_Utilities href "https://github.com/FunAudioLLM/CosyVoice/blob/main/.codeboarding/General_Utilities.md" "Details"
+
+```
+
+
+
+
+## Component Details
+
+
+
+The `CosyVoice` architecture is logically decomposed into six components, each with distinct responsibilities, so that the Text-to-Speech (TTS) synthesis system stays modular, scalable, and maintainable.
+
+
+
+### Application Layer
+
+This component serves as the primary interface for users and external systems. It handles command-line arguments, orchestrates the overall execution flow for tasks such as model inference, training, and model export (JIT, ONNX), and manages the high-level setup of the system. It acts as the central coordinator, initiating interactions with other core components.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.bin.inference` (1:1)
+
+- `cosyvoice.bin.train` (1:1)
+
+- `cosyvoice.bin.export_jit` (1:1)
+
+- `cosyvoice.cli.cosyvoice` (1:1)
+
+- `cosyvoice.cli.model` (1:1)
+
+
+
+
+
+### Core TTS Models
+
+This is the heart of the text-to-speech system and covers the entire speech synthesis pipeline. It integrates and orchestrates three models: the language model (LLM), which predicts discrete speech tokens from text; the flow-matching acoustic model, which converts those speech tokens into mel spectrograms conditioned on speaker information; and the HiFi-GAN vocoder, which renders the mel spectrograms as high-fidelity audio. It also handles model loading and the core `tts` functionality.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.cli.model` (1:1)
+
+- `cosyvoice.llm.llm` (1:1)
+
+- `cosyvoice.llm.llm_dpo` (1:1)
+
+- `cosyvoice.flow.decoder` (1:1)
+
+- `cosyvoice.flow.flow` (1:1)
+
+- `cosyvoice.flow.flow_matching` (1:1)
+
+- `cosyvoice.hifigan.generator` (1:1)
+
+- `cosyvoice.hifigan.discriminator` (1:1)
+
+- `cosyvoice.hifigan.hifigan` (1:1)
+
+
+
+
+
+### Data Preparation & Management
+
+Responsible for all aspects of data handling, from raw input to model-ready tensors. This includes text normalization, tokenization, speech feature extraction, dataset loading, and efficient batching for both training and inference. It ensures data is correctly formatted and accessible for the models.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.cli.frontend` (1:1)
+
+- `cosyvoice.dataset.dataset` (1:1)
+
+- `cosyvoice.dataset.processor` (1:1)
+
+- `cosyvoice.dataset.processor_dpo` (1:1)
+
+- `cosyvoice.tokenizer.tokenizer` (1:1)
+
+
+
+
+
+### Training Management
+
+Orchestrates the entire training process, including setting up distributed training environments, managing training loops, performing forward and backward passes, updating model parameters, handling logging, and saving model checkpoints. It supports both standard and DPO (Direct Preference Optimization) training.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.executor` (1:1)
+
+- `cosyvoice.utils.executor_dpo` (1:1)
+
+- `cosyvoice.utils.train_utils` (1:1)
+
+- `cosyvoice.utils.train_utils_dpo` (1:1)
+
+
+
+
+
+### Shared Neural Network Components
+
+This component provides fundamental, reusable building blocks for neural network architectures, particularly those based on transformers and conformers. It encapsulates common layers, attention mechanisms, and embeddings that are utilized by various models within the `Core TTS Models` component.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.transformer.encoder` (1:1)
+
+- `cosyvoice.transformer.decoder` (1:1)
+
+- `cosyvoice.transformer.attention` (1:1)
+
+- `cosyvoice.transformer.embedding` (1:1)
+
+- `cosyvoice.transformer.positionwise_feed_forward` (1:1)
+
+- `cosyvoice.transformer.subsampling` (1:1)
+
+
+
+
+
+### General Utilities
+
+A collection of miscellaneous helper functions and common utilities used across the entire project. This includes mathematical operations, file I/O, masking utilities, learning rate schedulers, and various loss functions. It serves as a foundational support layer for other components.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `cosyvoice.utils.common` (1:1)
+
+- `cosyvoice.utils.file_utils` (1:1)
+
+- `cosyvoice.utils.mask` (1:1)
+
+- `cosyvoice.utils.scheduler` (1:1)
+
+- `cosyvoice.utils.losses` (1:1)
+
+- `cosyvoice.utils.losses_dpo` (1:1)
+
+- `cosyvoice.utils.class_utils` (1:1)
+
+- `cosyvoice.utils.frontend_utils` (1:1)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file