@@ -6,7 +6,7 @@


 <p align="center">
-🤗 <a href="https://huggingface.co/collections/ibm-fms/bamba-674f1388b9bbc98b413c7bab"> Bamba on Hugging Face</a> | <a href="https://github.com/foundation-model-stack/bamba/blob/main/blog/bamba-9b-release.md"> Bamba Blog</a>
+🤗 <a href="https://huggingface.co/collections/ibm-fms/bamba-674f1388b9bbc98b413c7bab"> Bamba on Hugging Face</a> | <a href="https://huggingface.co/blog/bamba"> Bamba Blog</a>
 <br>


@@ -44,11 +44,11 @@ pip install git+https://github.com/huggingface/transformers.git
 | Bamba | 9B (9.78B) | 32 | 4096 | 32 | Yes | 8 | 4096 | False |

 ### Checkpoints
-We have published our model checkpoints here: [Bamba Models](https://huggingface.co/collections/ibm-fms/bamba-674f1388b9bbc98b413c7bab)
+You can find links to our model checkpoints here: [Bamba Models](https://huggingface.co/collections/ibm-fms/bamba-674f1388b9bbc98b413c7bab)

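As a minimal sketch (not part of the commit above), a checkpoint from that collection can be cross-checked against the model table in this hunk's context. The field names below assume the standard `transformers` config conventions (`hidden_size`, `num_hidden_layers`, and so on) rather than a documented Bamba-specific API:

```python
# Sketch: inspect a Bamba checkpoint's config and cross-check the model table.
# Field names assume standard transformers conventions; treat them as
# assumptions, not the documented Bamba API.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-fms/Bamba-9B")
print(config.num_hidden_layers)    # 32 layers per the table above
print(config.hidden_size)          # 4096 hidden dimension
print(config.num_attention_heads)  # 32 attention heads
print(config.num_key_value_heads)  # 8 KV heads (GQA enabled)
print(config.tie_word_embeddings)  # False (untied embeddings)
```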
 ## Inference

-You can utilize our newly contributed HF integration to run inference on our Bamba models:
+You can use the following command to perform text generation using one of our checkpoints provided above:

 ```python
 python text_generation.py --model_path ibm-fms/Bamba-9B --tokenizer_path ibm-fms/Bamba-9B --prompt "The largest living mammal on Earth is " --max_new_tokens 128
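For readers who prefer the `transformers` API over the helper script, the invocation above roughly corresponds to the following sketch (not part of the commit). It needs a `transformers` build with Bamba support, per the `pip install` line in the hunk header; the prompt and `max_new_tokens=128` mirror the example, while the bfloat16 dtype is an assumption rather than the script's documented default:

```python
# Sketch: the text_generation.py invocation above, expressed directly against
# the transformers integration. Requires a transformers version with Bamba
# support (see the pip install line in the hunk header above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-fms/Bamba-9B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# bfloat16 is an assumption to keep a 9B model within memory, not a verified default.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)

inputs = tokenizer("The largest living mammal on Earth is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```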
@@ -278,17 +278,17 @@ cd /path/to/llama.cpp |

 ## Contributors

-* **Data collection and curation**: We acknowledge and thank AllenAI team for making a high quality open source dataset Dolma as well as Hugging Face data team for making FineWeb-edu and Cosmopedia available. These are tremendous contributions and enable us to create the model today.
+* **Data collection and curation**: We acknowledge and thank the AllenAI team for making the high-quality open-source dataset Dolma available, as well as the Hugging Face data team for making FineWeb-edu and Cosmopedia available. These are tremendous contributions which enabled us to create the model.
 * **Data preprocessing**: We thank IBM's internal data preprocessing team, specifically Tuan Hoang Trong, Syed Zawad, Jay Gala, and Ryan Gordon for helping tokenize the data at scale. The code for tokenization is available [here](https://github.com/IBM/data-prep-kit).
 * **Model architecture**: The model architecture design was jointly done by Princeton, CMU, IBM, and UIUC and involved the following folks: Tri Dao (Princeton), Albert Gu (CMU), Linsong Chu (IBM), Davis Wertheimer (IBM), Minjia Zhang (UIUC), Mudhakar Srivatsa (IBM), and Raghu Ganti (IBM).
 * **Model training**: Model training was performed primarily by the IBM team using the Mamba2 kernels and layer implementation from Tri Dao and Albert Gu. The following folks from IBM were primarily involved: Linsong Chu, Divya Kumari, Davis Wertheimer, Raghu Ganti, and Dakshi Agrawal.
 * **Model tuning**: Tuning of the model was enabled and verified in [TRL](https://github.com/huggingface/trl) by the IBM team, involving Sukriti Sharma and Anh Uong.
-* **Model inference**: Model inference in `transformers`, `vLLM`, and `llama.cpp` builds on the kernels written by Princeton and CMU. The IBM team is working with the community to enable it in various ecosystems, the team includes Fabian Lim, Antoni viros i Martin, Adnan Hoque, Jamie Yang, Nelson Nimura Gomez, Joshua Rosenkranz, Nick Hill, and Gabe Goodhart.
+* **Model inference**: Model inference in `transformers`, `vLLM`, and `llama.cpp` builds on the kernels written by Princeton and CMU. The IBM team is working with the community to enable it in various ecosystems. The team includes Fabian Lim, Antoni viros i Martin, Adnan Hoque, Jamie Yang, Nelson Nimura Gonzalez, Joshua Rosenkranz, Nick Hill, and Gabe Goodhart.
 * **Quantization**: Quantization is led by the IBM team \- Naigang Wang and Charlie Liu.
-* **Evaluations**: Evaluations were led by the IBM team with long context evaluations being performed by UIUC, involving: Ofir Arviv (IBM), Yotam Perlitz (IBM), Michal Shmueli-Scheuer (IBM), Haoechen Shen (UIUC), and Minjia Zhang (UIUC).
+* **Evaluations**: Evaluations are led by a team at IBM, with long-context evaluations being performed by UIUC, involving the following folks: Yotam Perlitz, Ofir Arviv, Michal Shmueli-Scheuer (IBM), Haoechen Shen, and Minjia Zhang (UIUC).

-Finally, we would like to thank our leadership for their support in this effort \- Priya Nagpurkar, David Cox, Sriram Raghavan, Aya Soffer, and Mukesh Khare.
+Finally, we would like to thank our leadership for their support in this effort \- Priya Nagpurkar, David Cox, Sriram Raghavan, Aya Soffer, Ruchir Puri, and Mukesh Khare.

-We would also like to thank the community, in particular Pablo Montalvo-Leroux and Vaibhav Srivastav from Hugging Face who provided valuable feedback to this blog and the PRs into transformers. Further, we would like to thank Tyler Michael Smith from Neural Magic, who is shepherding the integration with vLLM.
+We would also like to thank the community, in particular Pablo Montalvo-Leroux, Aritra Roy Gosthipaty, and Vaibhav Srivastav from Hugging Face, and Stas Bekman from Contextual AI, who provided valuable feedback on this blog and on the PRs into transformers. Further, we would like to thank Tyler Michael Smith from Neural Magic, who is shepherding the integration with vLLM.

-A huge shoutout to Meta PyTorch, AllenAI, and Hugging Face teams for their contributions to the open initative, FSDP allowed us to smoothly train this model and the data from Dolma and Fineweb/Cosmopedia made this model today!
+A huge shoutout to the Meta PyTorch, AllenAI, and Hugging Face teams for their contributions to the open initiative; PyTorch FSDP allowed us to smoothly train this model, and the data from Dolma and FineWeb/Cosmopedia made this model possible!