
Releases: erfanzar/eformer

eFormer 0.0.90

07 Jan 12:48

This release note summarizes changes since the last GitHub release (0.0.42), up to 0.0.90.

Highlights

  • Major serialization upgrade with a high-level Checkpointer, AsyncManager improvements, and structured checkpoints.
  • Expanded quantization stack with NF4 improvements, 1-bit support, a dedicated config object, and stronger tests.
  • Mesh creation and sharding utilities modernized, including a switch to JAX-driven dynamic mesh creation.
  • Executor/Ray tooling enhanced with better error handling, new helpers, and device management updates.
  • API documentation reorganized into a hierarchical layout with a dedicated generation script.

Breaking and Migration Notes

  • Legacy callib and several old ops subpackages were removed during the codebase restructure. Update imports to the current module layout (notably under eformer/ops/quantization and consolidated packages).
  • Python 3.10 support was dropped and JAX was updated (to 0.7.x); verify your runtime compatibility.
  • API docs paths and module names changed to a nested structure; update any custom doc links.

Serialization and Checkpointing

  • Added a high-level Checkpointer with policy-based management and improved lifecycle handling.
  • Introduced structured checkpoint support and fsspec path utilities.
  • Added distributed checkpointing and sharded checkpoint helpers.
  • Implemented CPU offloading paths and SafeTensors support.
  • Improved AsyncManager behavior, version management, and checkpoint safety.
  • Added and refined GCS support with unified index handling.
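To give a feel for what policy-based checkpoint management involves, here is a minimal stand-alone sketch of a keep-last-N retention policy. The class and method names below are hypothetical illustrations, not eformer's actual Checkpointer API:

```python
# Hypothetical sketch of policy-based checkpoint retention (NOT eformer's API):
# a policy decides which saved steps are stale, and the checkpointer applies it.
from dataclasses import dataclass, field


@dataclass
class KeepLastPolicy:
    max_to_keep: int = 3

    def stale(self, steps):
        """Return the checkpoint steps to delete, oldest first."""
        ordered = sorted(steps)
        return ordered[:-self.max_to_keep] if len(ordered) > self.max_to_keep else []


@dataclass
class Checkpointer:
    policy: KeepLastPolicy = field(default_factory=KeepLastPolicy)
    _steps: list = field(default_factory=list)

    def save(self, step):
        self._steps.append(step)
        for old in self.policy.stale(self._steps):
            self._steps.remove(old)  # a real checkpointer would delete files here
        return sorted(self._steps)
```

Saving steps 1 through 5 with `max_to_keep=3` leaves only steps 3, 4, and 5 on disk, which is the lifecycle behavior a retention policy encapsulates.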

Quantization and Implicit Arrays

  • Added a quantization configuration object and updated training examples.
  • Extended NF4 kernel support and fixed issues across implicit array implementations.
  • Added 1-bit quantization support and improved 8-bit/NF4 behavior.
  • Expanded quantization tests and runtime coverage for sharded use cases.
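For readers unfamiliar with codebook quantization, the core NF4 idea is storing a 4-bit index into a fixed 16-entry lookup table instead of a full float. The toy example below uses a uniform stand-in codebook, not the real NF4 table, and is purely illustrative:

```python
# Illustrative codebook ("NF4-style") quantization: each value is replaced by
# the index of the nearest entry in a fixed 16-entry codebook (4 bits/value).
# This uniform codebook is a stand-in; the real NF4 table is non-uniform.
CODEBOOK = [i / 7.5 - 1.0 for i in range(16)]  # 16 levels spanning [-1, 1]


def quantize(values):
    """Map each float to the index of its nearest codebook entry."""
    return [min(range(16), key=lambda i: abs(CODEBOOK[i] - v)) for v in values]


def dequantize(indices):
    """Recover approximate float values from the 4-bit indices."""
    return [CODEBOOK[i] for i in indices]
```

Round-tripping through `quantize`/`dequantize` bounds the error by half a codebook step; the real NF4 codebook instead spaces its entries to match a normal weight distribution.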

Mesh, Sharding, and Partitioning

  • Switched dynamic mesh creation to jax.make_mesh, replacing manual topology calculations.
  • Added CPU mesh utilities and barrier synchronization helpers.
  • Improved partition constraints and error handling for sharding utilities.
  • Added new tests for mesh creation utilities.

Executor, Ray, and TPU Tooling

  • Added a device_remote decorator and expanded executor helper utilities.
  • Improved execute_multislice error handling and pool management (including DeviceHostActor support).
  • Refined Ray executor health checks, blocking behavior, and resource manager logic.
  • Updated TPU patching and Ray integration utilities.

Optimizers

  • Refactored the optimizer stack into clearer builders, factories, and tx utilities.
  • Improved WhiteKron and Mars implementations with new tests.
  • Updated optimizer documentation to match the new layout.
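The builder/factory split described above can be sketched with a minimal registry pattern. The names and config fields below are hypothetical illustrations of the pattern, not eformer's actual modules:

```python
# Hypothetical optimizer factory sketch (NOT eformer's actual API): builders
# register themselves under a name, and the factory dispatches by that name.
_REGISTRY = {}


def register(name):
    def decorator(builder):
        _REGISTRY[name] = builder
        return builder
    return decorator


@register("adamw")
def build_adamw(lr=1e-3, weight_decay=0.01):
    # A real builder would return an optax GradientTransformation.
    return {"name": "adamw", "lr": lr, "weight_decay": weight_decay}


def create_optimizer(name, **kwargs):
    try:
        builder = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown optimizer {name!r}; known: {sorted(_REGISTRY)}")
    return builder(**kwargs)
```

Keeping builders separate from the factory means new optimizers (WhiteKron, Mars, and so on) can be added without touching the dispatch logic.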

Paths and Storage

  • Fixed GCSPath glob/iterdir edge cases and improved error handling.
  • Improved LocalPath mkdir handling and path safety checks.

Docs, Tooling, and Tests

  • API docs reorganized into hierarchical packages with per-package indices.
  • Added format_and_generate_docs.py for formatting and documentation generation.
  • Added tests for mesh creation, aparser, paths, mixed precision utilities, and serialization helpers.
  • Expanded optimizer, quantization, cluster, and logging tests.

Versioning and Packaging

  • Version bumps from 0.0.43 through 0.0.90 with incremental fixes and dependency updates.
  • Added optional dev dependencies (including pytest) and updated requirements for serialization backends.

0.0.42

18 Jul 15:35

Release Notes (0.0.42, 0.0.41, 0.0.40, ...)

Major Features & Enhancements

  • Distributed Execution & Cluster Management

    • Added RayExecutor for executing remote functions with support for multi-slice and resumable executions.
    • Implemented Ray TPU/GPU/CPU Cluster Setup utilities.
    • Enhanced TPU patcher and cluster utility functions, including dynamic patching and improved command-line configuration.
  • Sharding & Partitioning

    • Introduced new dynamic sharding axes and enhanced partition manager functionality.
    • Added flexible sharding strategies: Data Parallelism (DP), Fully Sharded Data Parallel (FSDP), Tensor Parallelism (TP), Expert Parallelism (EP), Sequence Parallelism (SP).
    • Improved partition axis handling, including helper functions and dataclass-based refactors.
  • PyTree & Serialization

    • Added FrozenPyTree and improved PyTree module for better JAX compatibility.
    • Enhanced serialization capabilities for JAX PyTree-compatible dataclasses.
    • Improved error handling and docstrings in state management.
  • Optimized Operations & Quantization

    • Improved Triton call logging and error handling for more consistent output.
    • Enhanced quantization functions and support for float8, float16, bfloat16, and dynamic loss scaling.
    • Added support for 8-bit and NF4 quantization for efficient model deployment.
  • Documentation & Usability

    • Updated and expanded documentation, including project structure, key features, and API references.
    • Improved README and Sphinx documentation structure.
    • Added license headers and improved code readability and maintainability.
  • General Refactoring & Maintenance

    • Refactored codebase for improved clarity, maintainability, and Python 3.10+ compatibility.
    • Updated dependencies and switched from poetry to uv for build management.
    • Removed deprecated and obsolete modules, streamlined imports, and improved module exports.
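As a conceptual illustration of the sharding strategies listed above (not eformer's actual partition logic), each strategy amounts to choosing which array dimension maps to which mesh axis, with `None` meaning "replicated along that dimension":

```python
# Conceptual sketch of strategy -> per-dimension sharding (illustrative only;
# a real implementation would return jax.sharding.PartitionSpec objects).
def spec_for(strategy, ndim):
    """Return a per-dimension sharding tuple for a rank-`ndim` array."""
    if strategy == "dp":    # data parallel: shard only the batch dimension
        return ("dp",) + (None,) * (ndim - 1)
    if strategy == "fsdp":  # fully sharded: shard parameters along dim 0 too
        return ("fsdp",) + (None,) * (ndim - 1)
    if strategy == "tp":    # tensor parallel: shard the last (feature) dim
        return (None,) * (ndim - 1) + ("tp",)
    if strategy == "sp":    # sequence parallel: shard the sequence dimension
        return (None, "sp") + (None,) * (ndim - 2)
    raise ValueError(f"unknown strategy {strategy!r}")
```

In practice these tuples combine (e.g. FSDP over one mesh axis and TP over another), which is what the partition manager coordinates.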

Notable Fixes

  • Fixed issues with mesh creation for multi-slice environments.
  • Enhanced error handling for Ray command execution in TPU patcher.
  • Fixed Python 3.10 compatibility issues.
  • Improved logging and validation in sharding and partitioning utilities.

eformer (EasyDel Former)

04 Feb 10:06

eformer (EasyDel Former) is a utility library designed to simplify and enhance the development of machine learning models using JAX. It provides a collection of tools for sharding, custom PyTrees, quantization, mixed precision training, and optimized operations, making it easier to build and scale models efficiently.

  • Mixed Precision Training (mpric): Advanced mixed precision utilities supporting float8, float16, and bfloat16 with dynamic loss scaling.
  • Sharding Utilities (escale): Tools for efficient sharding and distributed computation in JAX.
  • Custom PyTrees (jaximus): Enhanced utilities for creating custom PyTrees and ArrayValue objects, updated from Equinox.
  • Custom Calling (callib): A tool for custom function calls and direct integration with Triton kernels in JAX.
  • Optimizer Factory: A flexible factory for creating and configuring optimizers like AdamW, Adafactor, Lion, and RMSProp.
  • Custom Operations and Kernels:
    • Flash Attention 2 for GPUs/TPUs (via Triton and Pallas).
    • 8-bit and NF4 quantization for efficient model deployment.
    • Many others to be added.
  • Quantization Support: Tools for 8-bit and NF4 quantization, enabling memory-efficient model deployment.
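The dynamic loss scaling mentioned for mpric can be sketched as a stand-alone scaler (an illustrative simplification, not mpric's actual API): the loss is multiplied by a scale to keep small fp16 gradients from underflowing, the scale is halved whenever non-finite gradients appear, and it grows back after a streak of clean steps.

```python
# Illustrative dynamic loss scaler (NOT mpric's API): back off on overflow,
# grow again after `growth_interval` consecutive finite-gradient steps.
class DynamicLossScaler:
    def __init__(self, scale=2.0 ** 15, growth_interval=2000, factor=2.0):
        self.scale = scale
        self.factor = factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads_finite: bool):
        if grads_finite:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.factor                    # grow after a clean streak
                self._good_steps = 0
        else:
            self.scale = max(self.scale / self.factor, 1.0)  # halve on overflow
            self._good_steps = 0
        return self.scale
```

A training step would compute gradients of `loss * scaler.scale`, check them for infs/NaNs, call `update(...)`, and skip the parameter update when an overflow was detected.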