
Conversation

simoneangarano

This pull request adds support for the Multi-Head Latent Attention (MLA) block from DeepSeekV2 to the litgpt codebase.

Changes

  • Configuration: Added a latent_attention: Optional[bool] = False parameter to the model configuration to enable the MLA block.
  • MLA module: Implemented the MLA module as a separate component in the litgpt codebase (a rough, self-contained sketch of the idea follows this list).
  • KVCacheCompressed: Added the KVCacheCompressed class to store the key-value cache for the MLA block.
  • Model: Modified the GPT class to use the MLA block as an alternative attention component, selected via the configuration parameter latent_attention.
  • Training: Updated the training script to support the MLA block and added a new configuration file, config_hub/pretrain/cfg.yaml, for training with it.
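
For context, here is a minimal, self-contained sketch of the MLA idea described above. The class and argument names (MultiHeadLatentAttention, kv_latent_dim, latent_cache) are illustrative only and are not the code in this branch: keys and values are down-projected into a small latent, and only that latent is cached, which is the role a KVCacheCompressed-style cache plays.

    from typing import Optional, Tuple

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadLatentAttention(nn.Module):
        """Sketch of MLA: keys/values are compressed into a low-rank latent,
        and only the latent is cached instead of the full K/V tensors."""

        def __init__(self, n_embd: int, n_head: int, kv_latent_dim: int) -> None:
            super().__init__()
            self.n_head = n_head
            self.head_dim = n_embd // n_head
            self.q_proj = nn.Linear(n_embd, n_embd, bias=False)
            # Down-project hidden states into a small latent that replaces the K/V cache.
            self.kv_down = nn.Linear(n_embd, kv_latent_dim, bias=False)
            # Up-project the cached latent back to full keys/values at attention time.
            self.k_up = nn.Linear(kv_latent_dim, n_embd, bias=False)
            self.v_up = nn.Linear(kv_latent_dim, n_embd, bias=False)
            self.out_proj = nn.Linear(n_embd, n_embd, bias=False)

        def forward(
            self, x: torch.Tensor, latent_cache: Optional[torch.Tensor] = None
        ) -> Tuple[torch.Tensor, torch.Tensor]:
            B, T, C = x.shape
            latent = self.kv_down(x)                      # (B, T, kv_latent_dim)
            if latent_cache is not None:                  # extend the compressed cache
                latent = torch.cat([latent_cache, latent], dim=1)
            q = self.q_proj(x).view(B, T, self.n_head, self.head_dim).transpose(1, 2)
            k = self.k_up(latent).view(B, -1, self.n_head, self.head_dim).transpose(1, 2)
            v = self.v_up(latent).view(B, -1, self.n_head, self.head_dim).transpose(1, 2)
            y = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
            y = y.transpose(1, 2).reshape(B, T, C)
            return self.out_proj(y), latent               # latent doubles as the new cache

    # Example: prefill a prompt, then decode one token reusing the compressed cache.
    attn = MultiHeadLatentAttention(n_embd=64, n_head=4, kv_latent_dim=16)
    out, cache = attn(torch.randn(2, 8, 64))
    nxt, cache = attn(torch.randn(2, 1, 64), cache)       # cache shape: (2, 9, 16)

The memory saving comes from caching the (B, T, kv_latent_dim) latent rather than per-head keys and values; DeepSeekV2 additionally uses a decoupled path for rotary embeddings, which this sketch omits.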

Usage

  • Configuration: Set the latent_attention parameter to True in the configuration file to enable the MLA block.
  • Training: Run the training script with the updated configuration file.
    litgpt pretrain --config config_hub/pretrain/cfg.yaml
  • Inference: Use the trained model for inference as follows (a Python API sketch follows this list):
    litgpt generate out/pretrain/mla/final/
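
As a rough illustration (assumed usage, not verified on this branch), the same checkpoint directory can also be loaded through litgpt's Python API:

    from litgpt import LLM

    # Assumes the checkpoint produced by the pretraining run above; whether the
    # MLA block is used is read from the saved config (latent_attention).
    llm = LLM.load("out/pretrain/mla/final/")
    print(llm.generate("Multi-Head Latent Attention reduces the KV cache by"))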

@Borda added the enhancement (New feature or request) label Mar 12, 2025
@Borda
Collaborator

Borda commented Mar 20, 2025

@simoneangarano, mind checking the failing tests? :)

@ysjprojects
Collaborator

@Borda @simoneangarano I can look into the failing tests if that's okay with everyone.

@Borda
Collaborator

Borda commented Jun 4, 2025

> @simoneangarano I can look into the failing tests if that's okay with everyone.

@ysjprojects that would be great!

@Borda marked this pull request as draft June 20, 2025 05:11