# RecML: High-Performance Recommender Library

## Vision

RecML is envisioned as a high-performance, large-scale deep learning recommender
system library optimized for Cloud TPUs. It aims to provide researchers and
practitioners with state-of-the-art reference implementations, tools, and
best-practice guidelines for building and deploying recommender systems.

The key goals of RecML are:

* **Performance & Scalability:** Leverage Cloud TPUs (including SparseCore
  acceleration) to deliver exceptional performance for training and serving
  massive models with large embeddings on datasets with millions or billions
  of items and users. RecML can additionally target Cloud GPUs.
* **State-of-the-Art Models:** Provide production-ready, easy-to-understand
  reference implementations of popular and cutting-edge models, with a strong
  focus on LLM-based recommenders.
* **Ease of Use:** Offer a user-friendly API, intuitive abstractions, and
  comprehensive documentation and examples for rapid prototyping and
  deployment.
* **Flexibility:** Primarily built with Keras and JAX, but designed with
  potential future expansion to other frameworks such as PyTorch/XLA.
* **Open Source:** Foster community collaboration and provide components to
  help users get started with advanced recommender workloads on Google Cloud.

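To make the scale concrete, a back-of-the-envelope calculation (illustrative
numbers, not taken from RecML) shows why embedding tables at this scale
motivate SparseCore acceleration and sharding:

```python
# Rough memory footprint of a single dense embedding table.
# All numbers below are illustrative assumptions, not RecML defaults.
num_items = 100_000_000   # 100M-item catalog
dim = 128                 # embedding dimension
bytes_per_float = 4       # float32

table_gb = num_items * dim * bytes_per_float / 1e9
print(f"{table_gb:.1f} GB")  # 51.2 GB for just one table
```

A single such table already exceeds the on-device memory of many accelerators,
which is why large-scale recommenders shard embedding tables across chips
rather than replicating them.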
## Features

* **High Performance:** Optimized for Cloud TPU (SparseCore) training and
  inference.
* **Scalable Architecture:** Designed for massive datasets and models with
  large embedding tables. Includes support for efficient data loading
  (tf.data, potentially Grain) and sharding/SPMD.
* **State-of-the-Art Model Implementations:** Reference implementations for
  various recommendation tasks (ranking, retrieval, sequential).
* **Reusable Building Blocks:**
    * Common recommendation layers (e.g., DCN, BERT4Rec).
    * Specialized embedding APIs (e.g., the JAX Embedding API for SparseCore).
    * Standardized metrics (e.g., AUC, Accuracy, NDCG@K, MRR, Recall@K).
    * Common loss functions.
* **Unified Trainer:** A high-level trainer abstraction capable of targeting
  different hardware (TPU/GPU) and frameworks. Includes customizable training
  and evaluation loops.
* **End-to-End Support:** Covers everything from data pipelines to training,
  evaluation, checkpointing, metrics logging (e.g., to BigQuery), and model
  export/serving considerations.

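The standardized ranking metrics mentioned above can be sketched
framework-agnostically. The NumPy version below is illustrative only (function
names and shapes are our assumptions, not RecML's actual API):

```python
import numpy as np

def recall_at_k(scores, true_items, k):
    """Fraction of queries whose true item appears among the top-k scores.

    scores: (num_queries, num_items) relevance scores.
    true_items: (num_queries,) index of the single relevant item per query.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == true_items[:, None]).any(axis=1)
    return hits.mean()

def mrr(scores, true_items):
    """Mean reciprocal rank of the true item (ranks are 1-based)."""
    order = np.argsort(-scores, axis=1)
    ranks = np.argmax(order == true_items[:, None], axis=1) + 1
    return (1.0 / ranks).mean()

scores = np.array([[0.1, 0.9, 0.3],   # true item 1 is ranked 1st
                   [0.8, 0.2, 0.5]])  # true item 1 is ranked 3rd
true_items = np.array([1, 1])
print(recall_at_k(scores, true_items, k=2))  # 0.5
print(mrr(scores, true_items))               # (1/1 + 1/3) / 2 ≈ 0.667
```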
## Models Included

This library aims to house implementations for a variety of recommender models,
including:

* **SASRec:** Self-Attentive Sequential Recommendation.
* **BERT4Rec:** Bidirectional Encoder Representations from Transformer for
  Sequential Recommendation.
* **Mamba4Rec:** Efficient Sequential Recommendation with Selective State
  Space Models.
* **HSTU:** Hierarchical Sequential Transduction Units for Generative
  Recommendations.
* **DLRM v2:** Deep Learning Recommendation Model.

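The sequential models above (SASRec, BERT4Rec, Mamba4Rec, HSTU) differ in
their sequence encoders but commonly train with a next-item softmax
cross-entropy objective over the item catalog. A minimal NumPy sketch of that
objective (illustrative, not RecML code):

```python
import numpy as np

def next_item_loss(logits, targets):
    """Softmax cross-entropy over the item vocabulary.

    logits: (batch, num_items) unnormalized scores from a sequence encoder.
    targets: (batch,) index of the true next item for each sequence.
    """
    # Log-softmax with max-subtraction for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))        # 4 sequences, 10-item catalog
targets = rng.integers(0, 10, size=4)    # next item for each sequence
print(next_item_loss(logits, targets))
```

With uniform (all-zero) logits over a 10-item catalog, the loss is exactly
log(10), a useful sanity check when wiring up a new model.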
## Roadmap / Future Work

* Expand reference model implementations (retrieval, uplift, foundation user
  model).
* Add support for optimized configurations and lower-precision training
  (bfloat16, fp16).
* Improve support for Cloud GPU training and inference.
* Enhance sharding and quantization support.
* Improve integration with Keras (and Keras Recommenders) and potentially
  PyTorch/XLA.
* Develop comprehensive model serving examples and integrations.
* Refine data loading pipelines (e.g., Grain support).
* Add more common layers, losses, and metrics.

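The lower-precision roadmap item touches a classic pitfall: small optimizer
updates can vanish entirely when weights are stored in half precision, which
is why mixed-precision schemes typically keep float32 master weights. A tiny
NumPy illustration (fp16 stands in for bf16 here, since NumPy has no native
bfloat16; bf16 has even fewer mantissa bits, so the effect is stronger):

```python
import numpy as np

# A small update is swallowed by rounding at fp16 precision...
w16 = np.float16(1.0)
print(w16 + np.float16(1e-4) == w16)  # True: the update is lost

# ...but survives in fp32, motivating fp32 master weights.
w32 = np.float32(1.0)
print(w32 + np.float32(1e-4) == w32)  # False: the update is applied
```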
## Responsible Use

As with any machine learning model, potential risks exist. A model's
performance and behavior depend heavily on its training data, which may
contain biases that are reflected in the recommendations. Developers should
carefully evaluate the model's fairness and potential limitations in their
specific application context.

## License

RecML is released under the Apache 2.0 License. Please see the `LICENSE` file
for full details.