
Commit 5a590b1

New case studies (#1361)
* Add files via upload
* Create writer-case-study.md
* Update writer-case-study.md
* Update _blog.yml
* Create snorkel-case-study.md
* Update _blog.yml
* Create mantis-case-study.md
* Create genomicsengland-case-study.md
* Create databricks-case-study.md
* Add files via upload
* Update mantis-case-study.md
* Update snorkel-case-study.md
* Update _blog.yml
* Update writer-case-study.md
* Update mantis-case-study.md
* Update mantis-case-study.md
* Update mantis-case-study.md
* Update mantis-case-study.md
* Update databricks-case-study.md
* Update _blog.yml
* Update _blog.yml
* Update _blog.yml
* Update _blog.yml
* Delete mantis.png
* Update snorkel-case-study.md
* Delete genomicsengland-case-study.md
* Delete genomics.png
1 parent 2bb6ef3 commit 5a590b1

File tree

9 files changed: +282 -0 lines changed


_blog.yml

Lines changed: 34 additions & 0 deletions
@@ -1792,6 +1792,15 @@
   - fine-tuning
   - community
   - dreambooth
+
+- local: mantis-case-study
+  title: "Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too"
+  author: mattupson
+  guest: true
+  thumbnail: /blog/assets/78_ml_director_insights/mantis1.png
+  date: February 15, 2023
+  tags:
+  - case-studies

- local: blip-2
  title: "Zero-shot image-to-text generation with BLIP-2"
@@ -1982,6 +1991,14 @@
   - rl
   - rlhf
   - nlp
+
+- local: snorkel-case-study
+  title: "Snorkel AI x Hugging Face: unlock foundation models for enterprises"
+  author: VioletteLepercq
+  thumbnail: /blog/assets/78_ml_director_insights/snorkel.png
+  date: April 6, 2023
+  tags:
+  - case-studies

- local: owkin-substra
  title: "Creating Privacy Preserving AI with Substra"
@@ -2034,6 +2051,15 @@
   - partnerships
   - community

+- local: databricks-case-study
+  title: "Databricks ❤️ Hugging Face: up to 40% faster training and tuning of Large Language Models"
+  author: alighodsi
+  guest: true
+  thumbnail: /blog/assets/78_ml_director_insights/databricks.png
+  date: April 26, 2023
+  tags:
+  - case-studies
+
- local: tf_tpu
  title: "Training a language model with 🤗 Transformers using TensorFlow and TPUs"
  author: rocketknight1
@@ -2426,6 +2452,14 @@
   - cv
   - hardware

+- local: writer-case-study
+  title: "Leveraging Hugging Face for complex generative AI use cases"
+  author: jeffboudier
+  thumbnail: /blog/assets/78_ml_director_insights/writer.png
+  date: July 1, 2023
+  tags:
+  - case-studies
+
- local: text-to-webapp
  title: "Making a web app generator with open ML models"
  author: jbilcke-hf
4 binary image files added (491 KB, 303 KB, 222 KB, 168 KB); image content not shown.

databricks-case-study.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
---
title: "Databricks ❤️ Hugging Face: up to 40% faster training and tuning of Large Language Models"
thumbnail: /blog/assets/78_ml_director_insights/databricks.png
authors:
- author: alighodsi
  guest: true
- author: maddiedawson
  guest: true
---

<h1>Databricks ❤️ Hugging Face: up to 40% faster training and tuning of Large Language Models</h1>

<!-- {blog_metadata} -->
<!-- {authors} -->

Generative AI has been taking the world by storm. As the data and AI company, we have been on this journey with the release of the open source large language model [Dolly](https://huggingface.co/databricks/dolly-v2-12b), as well as the internally crowdsourced dataset licensed for research and commercial use that we used to fine-tune it, [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k). Both the model and dataset are available on Hugging Face. We’ve learned a lot throughout this process, and today we’re excited to announce our first of many official commits to the Hugging Face codebase that allows users to easily create a Hugging Face Dataset from an Apache Spark™ dataframe.

#### “It's been great to see Databricks release models and datasets to the community, and now we see them extending that work with direct open source commitment to Hugging Face. Spark is one of the most efficient engines for working with data at scale, and it's great to see that users can now benefit from that technology to more effectively fine tune models from Hugging Face.”
— Clem Delangue, Hugging Face CEO

## Hugging Face gets first-class Spark support

Over the past few weeks, we’ve gotten many requests from users asking for an easier way to load their Spark dataframe into a Hugging Face dataset that can be utilized for model training or tuning. Prior to today’s release, to get data from a Spark dataframe into a Hugging Face dataset, users had to write the data out to Parquet files and then point the Hugging Face dataset at those files to reload them. For example:

```python
from datasets import load_dataset

# Write the Spark dataframes out to Parquet files on DBFS
train.write.parquet(train_dbfs_path, mode="overwrite")
test.write.parquet(test_dbfs_path, mode="overwrite")

# Reload the Parquet files as a Hugging Face dataset
train_test = load_dataset("parquet", data_files={"train": f"/dbfs{train_dbfs_path}/*.parquet", "test": f"/dbfs{test_dbfs_path}/*.parquet"})

# 16GB dataset == ~22 minutes
```

Not only was this cumbersome, but it also meant that data had to be written to disk and then read in again. On top of that, the data would get rematerialized once loaded back into the dataset, which eats up more resources and, therefore, more time and cost. Using this method, we saw that a relatively small (16GB) dataset took about 22 minutes to go from Spark dataframe to Parquet, and then back into the Hugging Face dataset.

With the latest Hugging Face release, we make it much easier for users to accomplish the same task by calling the new `from_spark` function in Datasets:

```python
from datasets import Dataset

# df is a Spark dataframe (or a Delta table loaded into a dataframe)
dataset = Dataset.from_spark(df)

# 16GB dataset == ~12 minutes
```

This allows users to use Spark to efficiently load and transform data for training or fine-tuning a model, then easily map their Spark dataframe into a Hugging Face dataset for super simple integration into their training pipelines. This combines the cost savings and speed of Spark with optimizations like memory mapping and smart caching from Hugging Face datasets. These improvements cut the processing time for our example 16GB dataset by more than 40%, from 22 minutes down to only 12 minutes.
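
To make the training-pipeline integration concrete, here is a minimal sketch (not part of the original announcement) of how a `from_spark` dataset could feed a standard transformers fine-tuning run; the model checkpoint and the `text`/`label` column names are illustrative assumptions.

```python
# Minimal sketch: Spark dataframe -> Hugging Face dataset -> transformers Trainer.
# Assumptions: `df` is the Spark dataframe from the example above, with "text" and
# "label" columns, and "bert-base-uncased" stands in for whatever model you tune.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = Dataset.from_spark(df)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./finetune-output", num_train_epochs=1),
    train_dataset=tokenized,
)
trainer.train()
```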

## Why does this matter?

As we transition to this new AI paradigm, organizations will need to use their extremely valuable data to augment their AI models if they want to get the best performance within their specific domain. This will almost certainly require work in the form of data transformations, and doing this efficiently over large datasets is something Spark was designed to do. Integrating Spark with Hugging Face gives you the cost-effectiveness and performance of Spark while retaining the pipeline integration that Hugging Face provides.
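
As an illustration (again, not from the original post), a hypothetical flow might do the heavy filtering and cleaning in Spark and only then hand the result to `datasets`; the `raw` dataframe and its columns below are assumptions.

```python
# Hypothetical example: do large-scale data prep in Spark, then convert once.
from datasets import Dataset
from pyspark.sql import functions as F

# `raw` is assumed to be a large Spark dataframe of domain text with a "label" column
clean = (
    raw.filter(F.length("text") > 100)             # drop very short documents
       .withColumn("text", F.trim(F.col("text")))  # basic cleanup
       .select("text", "label")
)

dataset = Dataset.from_spark(clean)  # no intermediate Parquet round-trip
```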

## Continued Open-Source Support

We see this release as a new avenue to further contribute to the open source community, something that we believe Hugging Face does extremely well, as it has become the de facto repository for open source models and datasets. This is only the first of many contributions. We already have plans to add streaming support through Spark to make dataset loading even faster.

In order to become the best platform for users to jump into the world of AI, we’re working hard to provide the best tools to successfully train, tune, and deploy models. Not only will we continue contributing to Hugging Face, but we’ve also started releasing improvements to our other open source projects. A recent [MLflow](https://www.databricks.com/blog/2023/04/18/introducing-mlflow-23-enhanced-native-llm-support-and-new-features.html) release added support for the transformers library, OpenAI integration, and LangChain support. We also announced [AI Functions](https://www.databricks.com/blog/2023/04/18/introducing-ai-functions-integrating-large-language-models-databricks-sql.html) within Databricks SQL that lets users easily integrate OpenAI (or their own deployed models in the future) into their queries. To top it all off, we also released a [PyTorch distributor](https://www.databricks.com/blog/2023/04/20/pytorch-databricks-introducing-spark-pytorch-distributor.html) for Spark to simplify distributed PyTorch training on Databricks.

_This article was originally published on April 26, 2023 in the [Databricks blog](https://www.databricks.com/blog/contributing-spark-loader-for-hugging-face-datasets)._

mantis-case-study.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
---
title: "Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too"
thumbnail: /blog/assets/78_ml_director_insights/mantis1.png
authors:
- user: mattupson
  guest: true
---

<h1>Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too</h1>

<!-- {blog_metadata} -->
<!-- {authors} -->

Hugging Face recently launched [Inference Endpoints](https://huggingface.co/inference-endpoints), which, as they put it, "solves transformers in production". Inference Endpoints is a managed service that allows you to:

- Deploy (almost) any model on the Hugging Face Hub
- To any cloud (AWS and Azure, with GCP on the way)
- On a range of instance types (including GPU)

We’re switching some of our Machine Learning (ML) models that do inference on a CPU to this new service. This blog is about why, and why you might also want to consider it.

## What were we doing?

The models that we have switched over to Inference Endpoints were previously managed internally and were running on AWS [Elastic Container Service](https://aws.amazon.com/ecs/) (ECS) backed by [AWS Fargate](https://aws.amazon.com/fargate/). This gives you a serverless cluster that can run container-based tasks. Our process was as follows:

- Train the model on a GPU instance (provisioned by [CML](https://cml.dev/), trained with [transformers](https://huggingface.co/docs/transformers/main/))
- Upload to the [Hugging Face Hub](https://huggingface.co/models)
- Build an API to serve the model ([FastAPI](https://fastapi.tiangolo.com/))
- Wrap the API in a container ([Docker](https://www.docker.com/))
- Upload the container to AWS [Elastic Container Registry](https://aws.amazon.com/ecr/) (ECR)
- Deploy the model to the ECS cluster

Now, you can reasonably argue that ECS was not the best approach to serving ML models, but it has served us up until now, and it also allowed ML models to sit alongside other container-based services, which reduced cognitive load.
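
For a sense of the per-model boilerplate this involved, here is a hedged sketch (illustrative only, not our actual code) of the kind of FastAPI wrapper each model needed; the model id and request schema are made up.

```python
# main.py: illustrative sketch of a per-model API wrapper we used to maintain on ECS.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("text-classification", model="mantis/some-model")  # hypothetical model id

class Request(BaseModel):
    text: str

@app.post("/predict")
def predict(request: Request):
    # Run the model and return the prediction as JSON
    return classifier(request.text)
```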

## What do we do now?

With Inference Endpoints, our flow looks like this:

- Train the model on a GPU instance (provisioned by [CML](https://cml.dev/), trained with [transformers](https://huggingface.co/docs/transformers/main/))
- Upload to the [Hugging Face Hub](https://huggingface.co/models)
- Deploy using Hugging Face Inference Endpoints

So this is significantly easier. We could also use another managed service such as [SageMaker](https://aws.amazon.com/es/sagemaker/), [Seldon](https://www.seldon.io/), or [BentoML](https://www.bentoml.com/), but since we are already uploading our models to the Hugging Face Hub to act as a model registry, and we’re pretty invested in Hugging Face’s other tools (like transformers and [AutoTrain](https://huggingface.co/autotrain)), using Inference Endpoints makes a lot of sense for us.
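
Once deployed, the endpoint is just an authenticated HTTPS URL that our application calls. As a hedged illustration (the URL, token, and payload below are placeholders, not our production values), consuming it looks roughly like this:

```python
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder URL
HF_TOKEN = "hf_..."  # placeholder: a token with access to the endpoint

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json={"inputs": "Why we're switching to Hugging Face Inference Endpoints"},
)
print(response.json())  # e.g. [{"label": "...", "score": ...}] for a text classification model
```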

## What about Latency and Stability?

Before switching to Inference Endpoints, we tested different CPU endpoint types using [ab](https://httpd.apache.org/docs/2.4/programs/ab.html).

For ECS we didn’t test as extensively, but we know that a large container had a latency of about 200ms from an instance in the same region. The tests we ran for Inference Endpoints were based on a text classification model fine-tuned on [RoBERTa](https://huggingface.co/roberta-base) with the following test parameters:

- Requester region: eu-east-1
- Requester instance size: t3-medium
- Inference endpoint region: eu-east-1
- Endpoint replicas: 1
- Concurrent connections: 1
- Requests: 1,000 (1,000 requests in 1–2 minutes, even from a single connection, would represent very heavy use for this particular application)

The following table shows latency (ms ± standard deviation, and time to complete the test in seconds) for four Intel Ice Lake-equipped CPU endpoints.

```
size   | vCPU (cores) | Memory (GB) | ECS (ms) | 🤗 (ms)
----------------------------------------------------------------------
small  | 1            | 2           | _        | ~ 296
medium | 2            | 4           | _        | 156 ± 51 (158s)
large  | 4            | 8           | ~ 200    | 80 ± 30 (80s)
xlarge | 8            | 16          | _        | 43 ± 31 (43s)
```

What we see from these results is pretty encouraging. The application that will consume these endpoints serves requests in real time, so we need latency to be as low as possible. We can see that the vanilla Hugging Face container was more than twice as fast as our bespoke container running on ECS; the slowest response we received from the large Inference Endpoint was just 108ms.

## What about the cost?

So how much does this all cost? The table below shows a price comparison (per month) between what we were doing previously (ECS + Fargate) and Inference Endpoints.

```
size   | vCPU | Memory (GB) | ECS      | 🤗       | % diff
----------------------------------------------------------------------
small  | 1    | 2           | $ 33.18  | $ 43.80  | 24%
medium | 2    | 4           | $ 60.38  | $ 87.61  | 31%
large  | 4    | 8           | $ 114.78 | $ 175.22 | 34%
xlarge | 8    | 16          | $ 223.59 | $ 350.44 | 50%
```

We can say a couple of things about this. Firstly, we want a managed solution for deployment; we don’t have a dedicated MLOps team (yet), so we’re looking for a solution that helps us minimize the time we spend on deploying models, even if it costs a little more than handling the deployments ourselves.

Inference Endpoints are more expensive than what we were doing before, with an increased cost of between 24% and 50%. At the scale we’re currently operating, this additional cost, a difference of ~$60 a month for a large CPU instance, is nothing compared to the time and cognitive load we are saving by not having to worry about APIs and containers. If we were deploying hundreds of ML microservices we would probably want to think again, but that is probably true of many approaches to hosting.

## Some notes and caveats

- You can find pricing for Inference Endpoints [here](https://huggingface.co/pricing#endpoints), but a different number is displayed when you deploy a new endpoint from the [GUI](https://ui.endpoints.huggingface.co/new). I’ve used the latter, which is higher.
- The values that I present in the table for ECS + Fargate are an underestimate, but probably not by much. I extracted them from the [Fargate pricing page](https://aws.amazon.com/fargate/pricing/) and they include just the cost of hosting the instance. I’m not including data ingress/egress (probably the biggest thing is downloading the model from the Hugging Face Hub), nor have I included the costs related to ECR.

## Other considerations

### Deployment Options

Currently you can deploy an Inference Endpoint from the [GUI](https://ui.endpoints.huggingface.co/new) or using a [RESTful API](https://huggingface.co/docs/inference-endpoints/api_reference). You can also make use of our command line tool [hugie](https://github.com/MantisAI/hfie) (which will be the subject of a future blog) to launch Inference Endpoints in one line of code by passing a configuration. It’s really this simple:

```bash
hugie endpoint create example/development.json
```

For me, what’s lacking is a [custom Terraform provider](https://www.hashicorp.com/blog/writing-custom-terraform-providers). It’s all well and good deploying an Inference Endpoint from a [GitHub Action](https://github.com/features/actions) using hugie, as we do, but it would be better if we could use the awesome state machine that is Terraform to keep track of these. I’m pretty sure that someone (if not Hugging Face) will write one soon enough; if not, we will.

### Hosting multiple models on a single endpoint

Philipp Schmid posted a really nice blog about how to write a custom [Endpoint Handler](https://www.philschmid.de/multi-model-inference-endpoints) class to allow you to host multiple models on a single endpoint, potentially saving you quite a bit of money. His blog was about GPU inference, and the only real limitation is how many models you can fit into the GPU memory. I assume this will also work for CPU instances, though I’ve not tried it yet.
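
As a rough sketch of the idea (my own assumptions, not Philipp’s exact code): a `handler.py` in the model repository exposes an `EndpointHandler` class that loads several pipelines at startup and routes each request based on a field in the payload. The repository layout and the `"model"` request field below are hypothetical.

```python
# handler.py: hedged sketch of a multi-model Endpoint Handler.
# Assumes the endpoint's repository contains two model subfolders,
# "sentiment/" and "ner/", and that requests include a "model" field.
from transformers import pipeline

class EndpointHandler:
    def __init__(self, path: str = ""):
        # Load every model once, when the endpoint starts up
        self.pipelines = {
            "sentiment": pipeline("text-classification", model=f"{path}/sentiment"),
            "ner": pipeline("token-classification", model=f"{path}/ner"),
        }

    def __call__(self, data: dict):
        inputs = data["inputs"]
        model_name = data.get("model", "sentiment")  # default to sentiment analysis
        return self.pipelines[model_name](inputs)
```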

## To conclude…

We find Hugging Face Inference Endpoints to be a very simple and convenient way to deploy transformer (and [sklearn](https://huggingface.co/scikit-learn)) models into an endpoint so they can be consumed by an application. Whilst they cost a little more than the ECS approach we were using before, it’s well worth it because it saves us time thinking about deployment and lets us concentrate on what we want to do: building NLP solutions for our clients to help solve their problems.

_This article was originally published on February 15, 2023 on [Medium](https://medium.com/mantisnlp/why-were-switching-to-hugging-face-inference-endpoints-and-maybe-you-should-too-829371dcd330)._

0 commit comments
