Releases: erfanzar/eformer
eFormer 0.0.90
This release note summarizes changes since the last GitHub release (0.0.42), up to 0.0.90.
Highlights
- Major serialization upgrade with a high-level Checkpointer, AsyncManager improvements, and structured checkpoints.
- Expanded quantization stack with NF4 improvements, 1-bit support, a dedicated config object, and stronger tests.
- Mesh creation and sharding utilities modernized, including a switch to JAX-driven dynamic mesh creation.
- Executor/Ray tooling enhanced with better error handling, new helpers, and device management updates.
- API documentation reorganized into a hierarchical layout with a dedicated generation script.
Breaking and Migration Notes
- Legacy `callib` and several old `ops` subpackages were removed during the codebase restructure. Update imports to the current module layout (notably under `eformer/ops/quantization` and the consolidated packages).
- Python 3.10 support was dropped and JAX was updated to 0.7.x; verify your runtime compatibility.
- API docs paths and module names changed to a nested structure; update any custom doc links.
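Because this release drops Python 3.10 and moves to JAX 0.7.x, a quick check before upgrading can catch an incompatible environment. The following is a minimal sketch and not part of eformer: the helper names are hypothetical, and the minimum versions (Python 3.11, JAX 0.7.0) are assumptions inferred from the note above, so treat the package metadata as the authoritative source.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '0.7.1' (or '0.7.1.dev20250101') into a tuple of leading ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_runtime(python_version, jax_version, min_python=(3, 11), min_jax=(0, 7, 0)):
    """Return a list of human-readable problems; an empty list means OK.

    min_python and min_jax are assumptions, not values taken from eformer.
    """
    problems = []
    if tuple(python_version[:2]) < min_python:
        found = f"{python_version[0]}.{python_version[1]}"
        problems.append(f"Python {found} is below {min_python[0]}.{min_python[1]}")
    if jax_version is None:
        problems.append("jax is not installed")
    elif parse_version(jax_version) < min_jax:
        problems.append(f"jax {jax_version} is older than " + ".".join(map(str, min_jax)))
    return problems


def installed_jax_version():
    """Look up the installed jax version without importing jax."""
    try:
        return metadata.version("jax")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for problem in check_runtime(sys.version_info, installed_jax_version()):
        print("incompatible:", problem)
```

Running the script prints nothing when the environment satisfies the assumed minimums.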
Serialization and Checkpointing
- Added a high-level Checkpointer with policy-based management and improved lifecycle handling.
- Introduced structured checkpoint support and fsspec path utilities.
- Added distributed checkpointing and sharded checkpoint helpers.
- Implemented CPU offloading paths and SafeTensors support.
- Improved AsyncManager behavior, version management, and checkpoint safety.
- Added and refined GCS support with unified index handling.
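Policy-based checkpoint management usually comes down to deciding which saved steps to keep and which to delete. The sketch below illustrates one such rule in plain Python. It is not the eformer Checkpointer API, and the names `RetentionPolicy`, `keep_last`, and `keep_every` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: keep the newest `keep_last` steps, plus every
    step that is a multiple of `keep_every` (0 disables that rule)."""

    keep_last: int = 3
    keep_every: int = 0

    def split(self, steps):
        """Partition saved step numbers into (kept, deleted), both sorted."""
        ordered = sorted(set(steps))
        newest = set(ordered[-self.keep_last:]) if self.keep_last > 0 else set()
        kept, deleted = [], []
        for step in ordered:
            periodic = self.keep_every > 0 and step % self.keep_every == 0
            (kept if step in newest or periodic else deleted).append(step)
        return kept, deleted


policy = RetentionPolicy(keep_last=2, keep_every=1000)
kept, deleted = policy.split([250, 500, 750, 1000, 1250, 1500])
print("keep:", kept)       # steps to retain
print("delete:", deleted)  # steps that are safe to remove
```

Keeping the decision separate from the actual file deletion makes a rule like this easy to test without touching storage.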
Quantization and Implicit Arrays
- Added a quantization configuration object and updated training examples.
- Extended NF4 kernel support and fixed issues across implicit array implementations.
- Added 1-bit quantization support and improved 8-bit/NF4 behavior.
- Expanded quantization tests and runtime coverage for sharded use cases.
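For readers unfamiliar with NF4, the idea is to scale each block of weights by its absolute maximum and then store, for each weight, the 4-bit index of the nearest entry in a fixed 16-value codebook. The NumPy sketch below illustrates that scheme only; it is not eformer's kernel or implicit-array implementation. The function names and the block size of 64 are assumptions, and the codebook holds the NF4 levels as published with QLoRA, rounded to four decimals here.

```python
import numpy as np

# NF4 codebook (16 levels), rounded to four decimals for readability.
NF4_LEVELS = np.array([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
], dtype=np.float32)


def nf4_quantize(weights, block_size=64):
    """Return (codes, scales) for a 1-D array whose size divides by block_size."""
    blocks = np.asarray(weights, dtype=np.float32).reshape(-1, block_size)
    # One absmax scale per block; guard against all-zero blocks.
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales = np.where(scales == 0, 1.0, scales).astype(np.float32)
    normalized = blocks / scales  # now in [-1, 1]
    # Nearest codebook entry for every element; indices fit in 4 bits.
    codes = np.abs(normalized[..., None] - NF4_LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), scales


def nf4_dequantize(codes, scales):
    """Look codes up in the codebook and undo the per-block scaling."""
    return (NF4_LEVELS[codes] * scales).reshape(-1)


rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
codes, scales = nf4_quantize(w)
w_hat = nf4_dequantize(codes, scales)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

For clarity the sketch stores one code per byte; a practical implementation would pack two 4-bit codes into each byte.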
Mesh, Sharding, and Partitioning
- Switched dynamic mesh creation to `jax.make_mesh` instead of manual topology calculations.
- Added CPU mesh utilities and barrier synchronization helpers.
- Improved partition constraints and error handling for sharding utilities.
- Added new tests for mesh creation utilities.
Executor, Ray, and TPU Tooling
- Added a `device_remote` decorator and expanded executor helper utilities.
- Improved execute_multislice error handling and pool management (including DeviceHostActor support).
- Refined Ray executor health checks, blocking behavior, and resource manager logic.
- Updated TPU patching and Ray integration utilities.
Optimizers
- Refactored the optimizer stack into clearer builders, factories, and tx utilities.
- Improved WhiteKron and Mars implementations with new tests.
- Updated optimizer documentation to match the new layout.
Paths and Storage
- Fixed GCSPath glob/iterdir edge cases and improved error handling.
- Improved LocalPath mkdir handling and path safety checks.
Docs, Tooling, and Tests
- API docs reorganized into hierarchical packages with per-package indices.
- Added `format_and_generate_docs.py` for formatting and documentation generation.
- Added tests for mesh creation, aparser, paths, mixed precision utilities, and serialization helpers.
- Expanded optimizer, quantization, cluster, and logging tests.
Versioning and Packaging
- Version bumps from 0.0.43 through 0.0.90 with incremental fixes and dependency updates.
- Added optional dev dependencies (including pytest) and updated requirements for serialization backends.
0.0.42
Release Notes (0.0.42, 0.0.41, 0.0.40, ...)
Major Features & Enhancements
Distributed Execution & Cluster Management
- Added RayExecutor for executing remote functions with support for multi-slice and resumable executions.
- Implemented Ray TPU/GPU/CPU Cluster Setup utilities.
- Enhanced TPU patcher and cluster utility functions, including dynamic patching and improved command-line configuration.
Sharding & Partitioning
- Introduced new dynamic sharding axes and enhanced partition manager functionality.
- Added flexible sharding strategies: Data Parallelism (DP), Fully Sharded Data Parallel (FSDP), Tensor Parallelism (TP), Expert Parallelism (EP), Sequence Parallelism (SP).
- Improved partition axis handling, including helper functions and dataclass-based refactors.
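To make the strategy names concrete, the sketch below computes the per-device shard shape that results from mapping array dimensions onto named mesh axes, which is the core idea behind DP, FSDP, TP, and SP layouts. It is plain Python rather than eformer or JAX code. The mesh sizes, axis names, and `shard_shape` helper are hypothetical, and the spec tuples only mimic the way a JAX `PartitionSpec` names the mesh axes (or none) for each dimension.

```python
def shard_shape(global_shape, spec, mesh):
    """Per-device shape when dimension i is split over mesh axis spec[i].

    `spec[i]` is a mesh axis name, a tuple of names, or None (replicated).
    `spec` is assumed to have one entry per array dimension.
    """
    local = []
    for size, axes in zip(global_shape, spec):
        if axes is None:
            axes = ()
        elif isinstance(axes, str):
            axes = (axes,)
        ways = 1
        for name in axes:
            ways *= mesh[name]
        if size % ways:
            raise ValueError(f"dimension {size} is not divisible by {ways}")
        local.append(size // ways)
    return tuple(local)


# Hypothetical 8-device mesh: 2-way data, 2-way FSDP, 2-way tensor parallel.
mesh = {"dp": 2, "fsdp": 2, "tp": 2, "sp": 1}

# Activations (batch, sequence, hidden): batch over dp+fsdp, sequence over sp.
activations = shard_shape((32, 2048, 4096), (("dp", "fsdp"), "sp", None), mesh)
# Weight (hidden, ffn): rows over fsdp, columns over tp.
weight = shard_shape((4096, 16384), ("fsdp", "tp"), mesh)
print(activations, weight)
```

Under these assumed sizes each device holds a quarter of the batch and a quarter of the weight matrix, which shows how the strategies combine on a single mesh instead of excluding one another.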
PyTree & Serialization
- Added `FrozenPyTree` and improved the PyTree module for better JAX compatibility.
- Enhanced serialization capabilities for JAX PyTree-compatible dataclasses.
- Improved error handling and docstrings in state management.
Optimized Operations & Quantization
- Improved Triton call logging and error handling for more consistent output.
- Enhanced quantization functions and support for float8, float16, bfloat16, and dynamic loss scaling.
- Added support for 8-bit and NF4 quantization for efficient model deployment.
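Dynamic loss scaling, listed among the enhancements above, follows a simple rule: multiply the loss by a scale factor before backpropagation, shrink the factor and skip the step whenever gradients overflow, and grow the factor again after a run of clean steps. The sketch below shows that rule in plain Python. It is not the eformer mixed-precision API, and the class name and default constants are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class DynamicLossScale:
    """Hypothetical helper illustrating the usual dynamic loss scaling rule."""

    scale: float = 2.0 ** 15
    growth_interval: int = 2000  # clean steps required before growing
    factor: float = 2.0
    min_scale: float = 1.0
    _clean_steps: int = 0

    def update(self, grads):
        """Adjust the scale; return True if this step's update should be applied."""
        finite = all(math.isfinite(g) for g in grads)
        if not finite:
            # Overflow: shrink the scale and tell the caller to skip the step.
            self.scale = max(self.scale / self.factor, self.min_scale)
            self._clean_steps = 0
            return False
        self._clean_steps += 1
        if self._clean_steps >= self.growth_interval:
            self.scale *= self.factor
            self._clean_steps = 0
        return True


scaler = DynamicLossScale(scale=1024.0, growth_interval=2)
print(scaler.update([0.1, float("inf")]), scaler.scale)  # overflow step
print(scaler.update([0.1, 0.2]), scaler.scale)
print(scaler.update([0.3, 0.1]), scaler.scale)           # second clean step
```

In a real training loop the gradients would be divided by the current scale before the optimizer update; the sketch covers only the bookkeeping for the scale itself.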
Documentation & Usability
- Updated and expanded documentation, including project structure, key features, and API references.
- Improved README and Sphinx documentation structure.
- Added license headers and improved code readability and maintainability.
General Refactoring & Maintenance
- Refactored codebase for improved clarity, maintainability, and Python 3.10+ compatibility.
- Updated dependencies and switched from `poetry` to `uv` for build management.
- Removed deprecated and obsolete modules, streamlined imports, and improved module exports.
Notable Fixes
- Fixed issues with mesh creation for multi-slice environments.
- Enhanced error handling for Ray command execution in TPU patcher.
- Fixed Python 3.10 compatibility issues.
- Improved logging and validation in sharding and partitioning utilities.
eformer (EasyDel Former)
eformer (EasyDel Former) is a utility library designed to simplify and enhance the development of machine learning models using JAX. It provides a collection of tools for sharding, custom PyTrees, quantization, mixed precision training, and optimized operations, making it easier to build and scale models efficiently.
- Mixed Precision Training (`mpric`): Advanced mixed precision utilities supporting float8, float16, and bfloat16 with dynamic loss scaling.
- Sharding Utilities (`escale`): Tools for efficient sharding and distributed computation in JAX.
- Custom PyTrees (`jaximus`): Enhanced utilities for creating custom PyTrees and `ArrayValue` objects, updated from Equinox.
- Custom Calling (`callib`): A tool for custom function calls and direct integration with Triton kernels in JAX.
- Optimizer Factory: A flexible factory for creating and configuring optimizers like AdamW, Adafactor, Lion, and RMSProp.
- Custom Operations and Kernels:
  - Flash Attention 2 for GPUs/TPUs (via Triton and Pallas).
  - 8-bit and NF4 quantization for efficient model deployment.
  - Many others to be added.
- Quantization Support: Tools for 8-bit and NF4 quantization, enabling memory-efficient model deployment.