Releases: erfanzar/EasyDeL

EasyDeL v0.2.0.2

03 Jan 12:19
afdcca6


Highlights

  • Unified attention support end-to-end, including new cache structures and eSurge compatibility.
  • Faster GPU/TPU inference via backend-aware attention selection, KV-cache update optimizations, and smarter compilation/batching defaults.

Added

  • Unified attention mechanism across attention layers and generation/scheduler paths.
  • New cache types for unified attention, including UnifiedAttentionCache and related config/view helpers.
  • HybridCache support and expanded unified-attention cache integration.

Performance & Behavior Changes

  • GPU inference now prefers unified_attention and TPU inference prefers ragged_page_attention_v3, with warnings when a suboptimal mechanism is selected.
  • KV-cache updates are optimized for GPU latency (vectorized scatter approach; improved memory donation behavior).
  • eSurge compilation is capped to the scheduler’s actual per-step token budget to reduce startup time for long-context models.
  • runner.compile() now accepts max_num_batched_tokens for fine-grained compilation control.
  • GPU/TPU-aware auto-defaults for max_num_batched_tokens (at least 2048 tokens/step on GPU, at least 8192 tokens/step on TPU).
  • Performance tuning updates (numexpr threading configuration, JAX PGLE enablement, and XLA GPU flag fixes).

Evaluation

  • eSurge lm-eval adapter improvements: exact teacher-forced log-likelihood scoring, rolling-window perplexity, per-request stop sequences, improved greedy_until, and more robust tokenization/chat-template fallbacks.
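Rolling-window perplexity slides a bounded context over a long sequence so every token is scored exactly once. The windowing step can be sketched as follows (helper name, stride handling, and return shape are illustrative, not the adapter's actual API):

```python
def rolling_windows(tokens: list[int], max_len: int, stride: int) -> list[tuple[list[int], int]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    Returns (window, n_new) pairs: the last n_new tokens of each window
    are the ones newly scored; earlier tokens are carried-over context.
    """
    out = []
    scored = 0
    while scored < len(tokens):
        end = min(scored + stride, len(tokens))
        start = max(0, end - max_len)
        out.append((tokens[start:end], end - scored))
        scored = end
    return out
```

Perplexity is then the exponential of the mean negative log-likelihood over all newly scored tokens.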

Fixes

  • Dtype conversion adjustments in the bridge for more consistent behavior.
  • Linting fixes in tests and the Xerxes model.

Dependency Updates

  • Upgrade ejkernel to v0.0.50.

Merged PRs

  • #249: Unified attention mechanism + caching structures.
  • #250: Bridge dtype conversion updates.
  • #251: eSurge Speedup v1.

What's Changed

  • feat: Add unified attention mechanism and related caching structures by @erfanzar in #249
  • modify dtype conversion in bridge by @erfanzar in #250
  • eSurge Speedup v1 (esurge/speedup-v1) by @erfanzar in #251

Full Changelog: v0.2.0.1...v0.2.0.2

EasyDeL 0.2.0.1

31 Dec 13:49


This patch release only updates ejkernel to reduce noisy import-time diagnostics on Python 3.11.

What changed

  • Updated ejkernel to 0.0.47.
    • Fixes false-positive "Signature mismatch" warnings during import-time kernel registry validation (e.g. ragged_decode_attention).
    • Improves type-annotation normalization so equivalent JAX array spellings don’t trip signature checks under postponed annotations (Python 3.11).

Compatibility

  • No EasyDeL API changes.

Release 0.2.0

30 Dec 22:49


EasyDeL v0.2.0 (Production-ready)

EasyDeL v0.2.0 is a production-ready release focused on framework-wide stability and performance. As always, validate on your target hardware/workloads when upgrading.

Highlights

New high-performance inference engine: eSurge

eSurge is a new inference stack focused on efficient serving with better scheduling, caching, and observability.

  • Dynamic batching + scheduling: batches requests to improve throughput, with scheduling controls (e.g., token budgets, prioritization) to better handle mixed workloads.
  • KV cache + prefix caching: introduces page-based cache management and prefix caching to reduce recomputation and improve memory utilization for long/repeated prompts.
  • Streaming + async support: supports streaming token output and async execution patterns for interactive serving.
  • OpenAI-compatible API server: includes an OpenAI-style server module so you can integrate with existing clients more easily.
  • Function calling + tool parsing: adds structured tool/function parsing support (including OpenAI, Qwen3 XML, and xLAM-style formats).
  • Metrics and monitoring: adds built-in metrics collection with Prometheus/Grafana-friendly monitoring components.
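Page-based KV-cache management and prefix caching work together: the cache is carved into fixed-size pages, a request holds a list of page indices, and requests that share a prompt prefix can point at the same leading pages instead of recomputing them. A toy allocator illustrating the idea (class name, page size, and the eviction-free free-list are all illustrative, not eSurge's API):

```python
class PageTable:
    """Toy fixed-size-page KV-cache allocator with prefix sharing."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free = list(range(num_pages))
        # maps a full token prefix (rounded to a page boundary) -> page id
        self.prefix_index: dict[tuple, int] = {}

    def allocate(self, token_ids: list[int]) -> list[int]:
        pages = []
        for i in range(0, len(token_ids), self.page_size):
            full = i + self.page_size <= len(token_ids)
            key = tuple(token_ids[: i + self.page_size])
            if full and key in self.prefix_index:
                pages.append(self.prefix_index[key])  # reuse shared prefix page
                continue
            page = self.free.pop()  # grab a fresh page (no eviction in this toy)
            if full:
                self.prefix_index[key] = page
            pages.append(page)
        return pages
```

Two prompts sharing their first page_size tokens end up with the same first page id, so only the divergent suffix consumes new cache memory.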

Unified ops, attention backends, and caching architecture

This release consolidates core execution pieces so models share a more consistent and optimized path.

  • Pluggable attention backends: moves toward a unified attention/ops layer with multiple backends (e.g., flash/ring/vanilla/SDPA/blocksparse and more), making it easier to swap implementations and tune performance.
  • More consistent dispatch/execution: introduces an executor-style framework so kernels/ops are selected and run in a more uniform way across model families.
  • Stabilized caching across model types: strengthens caching for transformer + hybrid/recurrent-style models (including SSM variants) with a cleaner metadata/management abstraction.
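An executor-style dispatch layer typically amounts to a registry from mechanism name to kernel, resolved uniformly at call time. Schematically (all names here are illustrative, not EasyDeL's internals):

```python
from typing import Callable

_KERNELS: dict[str, Callable] = {}

def register_kernel(name: str):
    """Decorator that records a kernel under a mechanism name."""
    def deco(fn: Callable) -> Callable:
        _KERNELS[name] = fn
        return fn
    return deco

@register_kernel("vanilla")
def vanilla_attention(*args):
    return ("vanilla", args)

@register_kernel("flash")
def flash_attention(*args):
    return ("flash", args)

def run_attention(mechanism: str, *args):
    """Uniform entry point: every model family goes through the same lookup."""
    if mechanism not in _KERNELS:
        raise ValueError(f"unknown attention mechanism: {mechanism!r}")
    return _KERNELS[mechanism](*args)
```

Swapping implementations then becomes a matter of registering a new kernel rather than touching each model family.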

Expanded model coverage (LLMs, VLMs, and SSM/Hybrid)

v0.2.0 significantly increases the number of supported architectures and improves multimodal foundations.

  • SSM/Hybrid support: adds/extends native support for Mamba/Mamba2-style models and related recurrent/hybrid caching paths.
  • New and improved model families: adds support for multiple newer families (including Qwen3 variants, Olmo3, Exaone4, SmolLM3, Seed-OSS, Kimi variants, GLM-4V) and improves compatibility across existing ones.
  • Multimodal improvements: expands the vision-language pipeline and model support (e.g., Gemma3/LLaVA/Pixtral/Qwen2-VL and related infrastructure).

Training & data pipeline upgrades

This release expands post-training and RLHF-style capabilities and improves dataset handling.

  • New RL trainers: adds PPO and additional group-based RL trainers (e.g., GSPO, GFPO) alongside broader trainer/config stabilization.
  • Prompt + dataset transforms: improves/extends transforms for chat templates, dataset mixtures, and sequence packing to make training pipelines more consistent.
  • Distributed training improvements: improves Ray-based training utilities and sharding ergonomics.
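Sequence packing concatenates short examples into fixed-length rows so a batch carries less padding. A greedy first-fit sketch of the idea (not EasyDeL's actual transform):

```python
def pack_sequences(seqs: list[list[int]], max_len: int) -> list[list[int]]:
    """Greedy first-fit packing: append each sequence to the first row
    with enough room, else start a new row. Sequences longer than
    max_len are truncated."""
    rows: list[list[int]] = []
    for seq in seqs:
        seq = seq[:max_len]
        for row in rows:
            if len(row) + len(seq) <= max_len:
                row.extend(seq)
                break
        else:
            rows.append(list(seq))
    return rows
```

A real transform would also emit per-row segment ids so attention cannot cross example boundaries.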

Tooling, docs, CI, and developer experience

  • Hugging Face utilities: adds batch scripts for downloading and converting HF models, plus improved/streaming checkpoint conversion workflows.
  • Docs and tests: reorganizes API docs and expands test coverage across model architectures and core utilities.
  • CI and packaging workflow: introduces lint/lockfile workflows and modernizes project tooling.

Breaking changes / Migration notes

  • Notable runtime requirement change: Python is now >=3.11.
  • Packaging workflow changed: project moved from Poetry-style configuration to pyproject.toml (PEP 621) + uv/uv.lock.
  • Inference API migration: older inference modules like easydel.inference.vsurge / easydel.inference.vinference are removed; migrate to easydel.inference.esurge.
  • Trainer/config rename: max_sequence_length was renamed to max_length in trainer-related configs.
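For the trainer-config rename, a small shim can migrate old dictionaries on load; this helper is hypothetical, not part of EasyDeL:

```python
def migrate_trainer_config(cfg: dict) -> dict:
    """Rename the old max_sequence_length key to max_length.

    An explicit max_length wins if both keys are present.
    """
    cfg = dict(cfg)  # don't mutate the caller's dict
    if "max_sequence_length" in cfg:
        old = cfg.pop("max_sequence_length")
        cfg.setdefault("max_length", old)
    return cfg
```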

Install

  • pip install -U easydel==0.2.0
  • Extras (optional): pip install -U "easydel[gpu]" / "easydel[tpu]" / "easydel[lm_eval]"

What's Changed

  • feat(examples): Add comprehensive suite of documented training and evaluation examples by @erfanzar in #203
  • Add Tenstorrent backend support by @vzeljkovicTT in #208
  • Add GIDD architecture by @dvruette in #207
  • Faster MoE Models by @erfanzar in #211
  • feat(docker): enhance Docker setup with multi-hardware support and pre-commit hooks by @erfanzar in #213
  • chore: update eformer dependency to >=0.0.54 by @erfanzar in #214
  • feat: add sequential initialization and comprehensive model documentation by @erfanzar in #218
  • Fix: Complete mesh update implementation for AttentionMetadata by @vzeljkovicTT in #216
  • docs: improve RayDistributedTrainer documentation by @erfanzar in #219
  • refactor: clean up eSurge runner and optimize data handling by @erfanzar in #220
  • refactor: relocate eLMConfig to infra module and enhance data management by @erfanzar in #221
  • refactor: optimize model parallelization and gradient checkpointing by @erfanzar in #222
  • feat: upgrade to JAX 0.7.2 and eformer 0.0.72 by @erfanzar in #223
  • Fix Flash attention bias head count for tensor parallel training. by @XMaster96 in #224
  • feat: Add comprehensive JAX type hinting and documentation by @erfanzar in #225
  • docs: enhance eLargeModel API documentation by @erfanzar in #226
  • Integrate ejkernel library and modernize EasyDeL architecture by @erfanzar in #233
  • Refactor MoE system: modularize, unify execution modes, and add configurability by @erfanzar in #236
  • refactor(esurge): extract delta text computation into helper method by @erfanzar in #237
  • fix: prevent token loss in eSurge streaming under concurrency by @Azure99 in #235
  • 6 new state-of-the-art trainers, significantly enhancing the eSurge inference engine with enterprise-grade authentication and performance optimizations by @erfanzar in #243
  • feat: add SeedOss model by @Azure99 in #238
  • feat(data): add EasyData pipeline with trainer transform integration by @erfanzar in #245
  • feat(trainers): add GSPO and GFPO trainers by @erfanzar in #246
  • Pre release/v0.2.0 by @erfanzar in #247

Full Changelog: 0.1.4...0.2.0

EasyDeL v0.1.4 Release – Inference Engine Enhancements and Evaluation Support

02 May 13:04


This release brings major updates to the EasyDeL inference stack (vSurge), including new attention mechanisms, streamlined sampling options, and integrated evaluation capabilities for large language model serving.

🚀 Key Highlights

🧠 Inference Engine Drivers: vDriver & oDriver

EasyDeL’s vSurge inference engine now offers two core driver modes:

  • oDriver (Optimized Driver):

    • Supports continuous batching, auto-scheduling, and auto-insertion.
    • Introduces Paged Attention (attn_mechanism=AttentionMechanisms.PAGED_ATTENTION) — an attention mechanism that manages the KV cache using fixed-size memory pages.
    • Benefits: Improves memory utilization, reduces fragmentation, and supports variable sequence lengths and large batch sizes efficiently.
    • Recommended for high-throughput, latency-sensitive inference workloads.
    • ⚠️ Currently supported only on TPU
  • vDriver (Vanilla Driver):

    • Also supports continuous batching, auto-scheduling, and auto-insertion.
    • Provides a more traditional inference path, compatible with standard attention, Flash Attention, SDPA, and other mechanisms.
    • Useful for debugging or when advanced KV management (e.g., paged attention) is not needed.

Choose between oDriver and vDriver based on your hardware and performance needs. For TPU inference, oDriver with Paged Attention is the preferred option.

🎯 Sampling Parameter Update

  • The top_k parameter has been removed from the core inference engines.

  • If you're currently using top_k sampling, migrate to alternatives such as:

    • top_p (nucleus sampling)
    • Temperature-based sampling
  • This update simplifies and modernizes the sampling configuration.
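If you previously relied on top_k, nucleus sampling instead keeps the smallest set of highest-probability tokens whose cumulative mass reaches top_p. A NumPy sketch of the filtering step (illustrative, not the engine's implementation):

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    """Zero out tokens outside the nucleus and renormalize."""
    order = np.argsort(probs)[::-1]        # highest probability first
    cum = np.cumsum(probs[order])
    # keep tokens until cumulative mass reaches top_p (always keep the first)
    cutoff = np.searchsorted(cum, top_p) + 1
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()
```

Sampling then draws from the renormalized distribution, optionally after temperature scaling of the logits.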


📊 LM Evaluation Integration (lm-eval)

vSurge now integrates with the lm-evaluation-harness, enabling standardized benchmarking of any EasyDeL-served model.

  • A new adapter class, EasyDeLLM, is available in easydel.inference.evals.
  • Example script: easydel/scripts/eval.py demonstrates how to run evaluations.

⚙️ Configuration Improvements

  • The attn_mechanism parameter is now explicitly configurable when initializing inference drivers (create_odriver or create_vdriver).
  • Use this setting to select attention strategies such as PAGED_ATTENTION or FLASH_ATTENTION depending on your deployment needs.

🧹 General Enhancements

  • Refactoring of the vSurge module improves code structure, cache management, and maintainability.
  • Internal improvements lay the groundwork for future extensibility and better platform support.

Please review your existing configurations when upgrading — particularly for sampling options and the new attn_mechanism setting — to take full advantage of these improvements.

🔗 Full Changelog: v0.1.3...v0.1.4

New Model Support & Inference/Training Improvements

07 Apr 19:54


EasyDeL v0.1.3 Release Notes

We're excited to announce the release of EasyDeL version 0.1.3, bringing significant model support expansions and improved inference capabilities.

🚀 New Model Support

Llama 4 Integration

Added comprehensive support for Llama 4 with multiple task types:

  • Causal Language Modeling (CLM)
  • Text-Image-to-Text processing
  • Sequence Classification

Qwen 3 Integration

Added support for Qwen 3 with:

  • Causal Language Modeling (CLM)
  • Sequence Classification

Qwen 3 MoE Integration

Added support for Qwen 3 Mixture of Experts with:

  • Causal Language Modeling (CLM)
  • Sequence Classification

🚀 Improved Inference Capabilities

Enhanced vInference Engine

  • Added support for dynamic sampling parameters:
    • Top-k sampling
    • Top-p (nucleus) sampling
    • Temperature control
    • And more configurable generation parameters

Improved vInference API Server

  • Enhanced API server functionality
  • Better error handling
  • Improved response formatting
  • More robust request processing

Updated vWhisperInference

  • Significantly improved Whisper inference serving engine
  • Better performance and accuracy
  • Enhanced streaming capabilities
  • Improved resource utilization

📚 Documentation and Examples

  • Complete overhaul of documentation structure
  • Added comprehensive examples for all new model integrations
  • Improved code documentation and clarity
  • Added detailed usage guides for new features
  • Enhanced API documentation

🛠️ Code Quality

  • Reformatted all model implementations for better clarity
  • Improved code organization and structure
  • Enhanced type hints and documentation
  • Better error messages and debugging information

📦 Installation

pip install easydel==0.1.3 -U

We encourage users to upgrade to this version to take advantage of the new model support and improved inference capabilities. For detailed documentation and examples, please visit our documentation site.

Small Bug Fixes

27 Mar 17:11


We're excited to announce EasyDeL v0.1.2, bringing improved compatibility and key bug fixes! 🎉

🔄 What's New?

  • JAX 0.5.3 Support

    • EasyDeL now fully supports JAX 0.5.3, and implicit auto-registration of PyTrees works smoothly.
  • Fixed Auto PyTree Issues 🌳

    • Resolved issues with implicit PyTree structures, ensuring better compatibility with JAX transformations.
  • Fixed Auto CLI from Dataclass Bugs 🛠️

    • Automatic CLI generation from dataclass now works correctly without unexpected failures.

📦 Upgrade Now

pip install --upgrade easydel

As always, please feel free to open any issues or talk about them if you encounter any problems. 🚀

Happy coding!

EasyDeL version 0.1.1: Vision-Language Models (VLMs) & Vision Models Update

18 Mar 18:06


EasyDeL Release Notes: Vision-Language Models (VLMs) & Vision Models Update (Pre-Train, Finetune and Inference)

First JAX Library to Support VLMs!

EasyDeL is now the first library to support Vision-Language Models (VLMs) in JAX, bringing cutting-edge multimodal AI capabilities to the ecosystem. This update significantly expands our vision model offerings while optimizing performance and usability across the board.

New Models Added

We’ve added support for the following vision and multimodal models, unlocking new capabilities in computer vision and vision-language tasks:

  • Aya Vision – A high-performance vision model with strong generalization capabilities
  • Cohere2 – Enhanced visual reasoning and feature extraction (LLM/VLM via AyaVision)
  • LLaVA – A Vision-Language model for image-grounded understanding
  • SigLip – State-of-the-art self-supervised learning for visual representations

Architecture & Performance Enhancements

We've made several improvements to streamline the framework and improve efficiency:

  • Unified Configuration Handling: Refactored configuration methods to ensure consistency across all modules, reducing redundant code and making customization easier.
  • Lazy Imports for Faster Startup: Implemented lazy loading of dependencies, significantly reducing initialization time and improving integration flexibility.
  • Extended VLM Support: Expanded Vision-Language Model (VLM) support throughout vinference core and API server, enabling seamless inference and integration.

Technical Maintenance & Cleanup

  • Removed deprecated code to enhance maintainability and keep the codebase clean.
  • Improved internal documentation and structured error handling for more robust deployments.

To use EasyDeL for Vision-Language inference with models like LLaVA, follow this setup:

🔧 Installation

Ensure you have EasyDeL installed:

pip install "easydel[all]==0.1.1"

🚀 Running a Vision-Language Model

Here’s a minimal script to load and serve a VLM using EasyDeL:

import easydel as ed
import jax
from jax import numpy as jnp
from transformers import AutoProcessor

def main():
    sharding_axis_dims = (1, 1, -1, 1)  # DP, FSDP, TP, SP
    prefill_length = 8192 - 1024
    max_new_tokens = 1024
    max_length = max_new_tokens + prefill_length
    pretrained_model_name_or_path = "llava-hf/llava-1.5-7b-hf"

    dtype = jnp.bfloat16
    param_dtype = jnp.bfloat16
    partition_axis = ed.PartitionAxis()

    processor = AutoProcessor.from_pretrained(pretrained_model_name_or_path)
    processor.padding_side = "left"

    model = ed.AutoEasyDeLModelForImageTextToText.from_pretrained(
        pretrained_model_name_or_path,
        auto_shard_model=True,
        sharding_axis_dims=sharding_axis_dims,
        config_kwargs=ed.EasyDeLBaseConfigDict(
            freq_max_position_embeddings=max_length,
            mask_max_position_embeddings=max_length,
            kv_cache_quantization_method=ed.EasyDeLQuantizationMethods.NONE,
            gradient_checkpointing=ed.EasyDeLGradientCheckPointers.NONE,
            attn_dtype=param_dtype,
            attn_mechanism=ed.AttentionMechanisms.AUTO,
        ),
        quantization_method=ed.EasyDeLQuantizationMethods.NONE,
        param_dtype=param_dtype,
        dtype=dtype,
        partition_axis=partition_axis,
        precision=jax.lax.Precision.DEFAULT,
    )

    inference = ed.vInference(
        model=model,
        processor_class=processor,
        generation_config=ed.vInferenceConfig(
            max_new_tokens=max_new_tokens,
            temperature=0.8,
            do_sample=True,
            top_p=0.95,
            top_k=10,
            eos_token_id=model.generation_config.eos_token_id,
            streaming_chunks=32,
            num_return_sequences=1,
        ),
        inference_name="easydelvlm",
    )

    inference.precompile(
        ed.vInferencePreCompileConfig(
            batch_size=1,
            prefill_length=prefill_length,
            vision_included=True,
            vision_batch_size=1,
            vision_channels=3,
            vision_height=336,
            vision_width=336,
        )
    )
    
    ed.vInferenceApiServer(inference, max_workers=10).fire()

if __name__ == "__main__":
    main()

🌍 Example API Request

Once your model is running, you can query it using an OpenAI-compatible API format:

{
  "model": "easydelvlm",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image",
          "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
        },
        {
          "type": "text",
          "text": "Describe this image in detail."
        }
      ]
    }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "max_tokens": 16
}

This release marks a major milestone in EasyDeL's evolution, ensuring that JAX users have access to state-of-the-art vision and multimodal AI models with optimized performance. 🚀

EasyDeL v0.1.0 - No Trade-Off – Unleashing Uncompromised Performance & Modular Magic

11 Mar 00:45


We’re pleased to introduce EasyDeL v0.1.0—a significant update that improves our framework’s performance, modularity, and integration capabilities. This release brings several important changes and enhancements, ensuring a smoother and more flexible experience for model training and inference.

Introduction

EasyDeL v0.1.0 marks a solid step forward in our journey. With a renewed focus on modularity, distributed training, and improved API integrations, this release aims to offer better support for your research and development needs without overpromising. We continue to work hard on making EasyDeL a dependable tool for deep learning and machine learning tasks.

New Core Components

NNX Flax API Integration

  • What’s New:
    We have replaced the previous Linen-based implementation with the NNX Flax API.
  • Benefits:
    • More efficient computation graphs and cleaner API design.
    • Enhanced flexibility for customization and future extensions.

vInference Engine & vInferenceAPIServer

  • vInference Engine:
    A new component designed to deliver reliable model inference with low latency.
  • vInferenceAPIServer:
    Provides an OpenAI-compatible interface to make model deployment straightforward.
  • Key Points:
    • Better integration with production environments.
    • Improved logging and monitoring features.

Distributed Training and Scalability

Support for Ray and MultiSlice

  • Enhanced Distribution:
    EasyDeL now supports distribution with Ray and MultiSlice, making it easier to scale training workloads across multiple nodes or GPUs.
  • Impact:
    • More efficient resource utilization.
    • Reduced training times for larger models in distributed settings.

Expanded Trainer Suite

New and Enhanced Trainers

  • GRPO Trainers:
    Introduced to help manage more advanced training scenarios.
  • Reward Model Trainers:
    Added support for reinforcement learning and preference-based training.
  • Bug Fixes:
    Important fixes have been applied to ORPO and DPO trainers to improve overall stability and reliability.
  • Overall Improvements:
    Enhanced logging, improved error handling, and more configurable options have been integrated to make the training process more predictable and user-friendly.

Attention Mechanism and Performance Enhancements

Bug Fixes and Optimizations

  • Attention Mechanisms:
    Resolved issues in Flash Attention (GPU/TPU) and Splash Attention (TPU) to ensure smoother operations.
  • Performance:
    Fine-tuned kernel launch times, memory management, and synchronization across devices for a modest but valuable performance boost.
  • Dynamic Quantization:
    Continued improvements in support for various quantization methods (NF4, A8BIT, A8Q, A4Q) offer a better balance between model size and inference speed.

Extended Model Support

New and Updated Models

  • DeepSeekV3:
    We’ve added support for DeepSeekV3, keeping up with emerging model architectures.
  • General Model Expansion:
    Additional new models have been integrated, ensuring that EasyDeL remains compatible with a wider range of model types.

Modularity and Hackability

A More Modular Codebase

  • Improved Structure:
    The codebase has been refactored into clearer, well-organized modules and functions, making it easier for developers to navigate and customize.
  • Customization:
    Whether modifying trainer behavior or integrating new models, the enhanced modular design allows changes without impacting overall system stability.
  • Community Focus:
    We encourage developers and researchers to explore and extend the framework in ways that best suit their projects.

Additional Improvements & Bug Fixes

  • Documentation Updates:
    In-line documentation and external resources have been refreshed to reflect these changes.
  • Stability Enhancements:
    A number of bug fixes across trainers, attention mechanisms, and hardware-specific operations lead to a more reliable framework.
  • Developer Experience:
    Enhanced error messages and detailed logging have been implemented to simplify troubleshooting and further development.
  • API Consistency:
    Internal APIs have been standardized and better documented for smoother integration with external tools.

Looking Ahead

EasyDeL v0.1.0 sets a strong foundation for future improvements. Upcoming updates will continue to expand support for distributed training, integrate additional models, and further refine the user and developer experience.

Full Changelog: 0.0.80...0.1.0

EasyDeL version 0.0.80

04 Dec 15:05


EasyDeL 0.0.80 brings enhanced flexibility, expanded model support, and improved performance with the introduction of vInference and optimized GPU/TPU integration. This version offers a significant speed and performance boost, with benchmarks showing improvements of over 4.9%, making EasyDeL more dynamic and easier to work with.

New Features:

  • Platform and Backend Flexibility: Users can now specify the platform (e.g., TRITON) and backend (e.g., GPU) to optimize their workflows.
  • Expanded Model Support: We have added support for new models including olmo2, qwen2_moe, mamba2, and others, enhancing the tool's versatility.
  • Enhanced Trainers: Trainers are now more customizable and hackable, providing greater flexibility for project-specific needs.
  • New Trainer Types: Introduced sequence-to-sequence trainers and sequence classification trainers to support a wider range of training tasks.
  • vInference Engine: A robust inference engine for LLMs with Long-Term Support (LTS), ensuring stability and reliability.
  • vInferenceApiServer: A backend for the inference engine that is fully compatible with OpenAI APIs, facilitating easy integration.
  • Optimized GPU Integration: Leverages custom, direct TRITON calls for improved GPU performance, speeding up processing times.
  • Dynamic Quantization Support: Added support for quantization types NF4, A8BIT, A8Q, and A4Q, enabling efficiency and scalability.

Performance Improvements:

  • EasyDeL 0.0.80 has been optimized for speed and performance, with benchmarks showing improvements of over 4.9% compared to previous versions.
  • The tool is now more dynamic and easier to work with, enhancing the overall user experience.

This release is a significant step forward in making EasyDeL a more powerful and flexible tool for machine learning tasks. We look forward to your feedback and continued support.

Documentation:

Comprehensive documentation is available at https://easydel.readthedocs.io/en/latest/

Example Usage:

Load any of the 40+ available models with EasyDeL:

 
import jax
from jax import lax
from jax import numpy as jnp

import easydel as ed

sharding_axis_dims = (1, 1, 1, -1)  # sequence sharding for better inference and training
max_length = 2**15
pretrained_model_name_or_path = "AnyEasyModel"
dtype = jnp.float16
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path,
    input_shape=(len(jax.devices()), max_length),
    auto_shard_params=True,
    sharding_axis_dims=sharding_axis_dims,
    config_kwargs=ed.EasyDeLBaseConfigDict(
        use_scan_mlp=False,
        attn_dtype=jnp.float16,
        freq_max_position_embeddings=max_length,
        mask_max_position_embeddings=max_length,
        attn_mechanism=ed.AttentionMechanisms.VANILLA,
        kv_cache_quantization_method=ed.EasyDeLQuantizationMethods.A8BIT,
        use_sharded_kv_caching=False,
        gradient_checkpointing=ed.EasyDeLGradientCheckPointers.NONE,
    ),
    quantization_method=ed.EasyDeLQuantizationMethods.NF4,
    quantization_block_size=256,
    platform=ed.EasyDeLPlatforms.TRITON,
    partition_axis=ed.PartitionAxis(),
    param_dtype=dtype,
    dtype=dtype,
    precision=lax.Precision("fastest"),
)


Note

This might be the last release of EasyDeL that incorporates HF/Flax modules. In future versions, EasyDeL will transition to its own base modules and may adopt Equinox or Flax NNX, provided that NNX meets sufficient performance standards. Users are encouraged to provide feedback on this direction.

EasyDeL version 0.0.69

04 Jul 15:21


This release brings significant scalability improvements, new models, bug fixes, and usability enhancements to EasyDeL.

Highlights:

  • Multi-host GPU Training: EasyDeL now scales seamlessly across multiple GPUs and hosts for demanding training workloads.
  • New Models: Expand your NLP arsenal with the addition of Gemma2, OLMo, and Aya models.
  • Improved KV Cache Quantization: Enjoy a substantial accuracy boost with enhanced KV cache quantization, achieving +21% accuracy compared to the previous version.
  • Simplified Model Management: Load and save pretrained models effortlessly using the new model.from_pretrained and model.save_pretrained methods.
  • Enhanced Generation Pipeline: The GenerationPipeLine now supports streaming token generation, ideal for real-time applications.
  • Introducing the ApiEngine: Leverage the power of the new ApiEngine and engine_client for seamless integration with your applications.
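Streaming generation is typically exposed as a Python generator that yields decoded chunks as tokens become available. A toy illustration of the pattern (not the GenerationPipeLine API; the one-byte-per-token "tokenizer" is purely for demonstration):

```python
def stream_tokens(token_ids, decode, chunk_size=4):
    """Yield decoded text incrementally, chunk_size tokens at a time,
    so callers can render output before generation finishes."""
    for i in range(0, len(token_ids), chunk_size):
        yield decode(token_ids[i : i + chunk_size])

# toy "tokenizer": one byte per token id
pieces = list(stream_tokens(list(b"hello world"), lambda ids: bytes(ids).decode()))
```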

Other Changes:

  • Fixed GPU Flash Attention bugs for increased stability.
  • Updated required jax version to >=0.4.28 for optimal performance. Versions 0.4.29 or higher are recommended if available.
  • Streamlined the structure import process and resolved multi-host training issues.

Upgrade:

To upgrade to EasyDeL v0.0.69, use the following command:

pip install --upgrade easydel==0.0.69