Commit 86e68bf

Add support for private/gated model access (Closes #198) (#202)
* Allow user to specify HF token as an environment variable
* Add documentation for how to make authorized requests
* Improve docs
1 parent 00c0e29 commit 86e68bf

4 files changed: +96 -28 lines changed

docs/source/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@
   - local: tutorials/node-audio-processing
     title: Server-side Audio Processing in Node.js
   title: Tutorials
+- sections:
+  - local: guides/private
+    title: Accessing Private/Gated Models
+  title: Developer Guides
 - sections:
   - local: api/transformers
     title: Index

docs/source/guides/private.mdx

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+
+# Accessing Private/Gated Models
+
+<Tip>
+
+Due to the possibility of leaking access tokens to users of your website or web application, we only support accessing private/gated models from server-side environments (e.g., Node.js) that have access to the process' environment variables.
+
+</Tip>
+
+## Step 1: Generating a User Access Token
+
+[User Access Tokens](https://huggingface.co/docs/hub/security-tokens) are the preferred way to authenticate an application to Hugging Face services.
+
+To generate an access token, navigate to the [Access Tokens tab](https://huggingface.co/settings/tokens) in your settings and click on the **New token** button. Choose a name for your token and click **Generate a token** (we recommend keeping the "Role" as read-only). You can then click the **Copy** button next to your newly-created token to copy it to your clipboard.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/new-token.png"/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/new-token-dark.png"/>
+</div>
+
+To delete or refresh User Access Tokens, you can click the **Manage** button.
+
+
+## Step 2: Using the access token in Transformers.js
+
+Transformers.js will attach an Authorization header to requests made to the Hugging Face Hub when the `HF_ACCESS_TOKEN` environment variable is set and visible to the process.
+
+One way to do this is to call your program with the environment variable set. For example, let's say you have a file called `llama.js` with the following code:
+
+```js
+import { AutoTokenizer } from '@xenova/transformers';
+
+// Load tokenizer for a gated repository.
+const tokenizer = await AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf');
+
+// Encode text.
+const text = 'Hello world!';
+const encoded = tokenizer.encode(text);
+console.log(encoded);
+```
+
+You can then use the following command to set the `HF_ACCESS_TOKEN` environment variable and run the file:
+
+```bash
+HF_ACCESS_TOKEN=hf_... node tests/llama.js
+```
+
+(remember to replace `hf_...` with your actual access token).
+
+If done correctly, you should see the following output:
+
+```bash
+[ 1, 15043, 3186, 29991 ]
+```
+
+
+Alternatively, you can set the environment variable directly in your code:
+```js
+// Set access token (NB: Keep this private!)
+process.env.HF_ACCESS_TOKEN = 'hf_...';
+
+// ... rest of your code
+```
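
The guide above relies on `HF_ACCESS_TOKEN` being visible to the Node.js process. As a minimal sketch (not part of this commit or of Transformers.js), a script could check for the variable before requesting a gated repository, so that a missing token surfaces as a clear error instead of a failed Hub request. The helper name `requireAccessToken`, its error message, and the `hf_example` placeholder are all hypothetical.

```javascript
// Hypothetical helper, not part of Transformers.js: return the access token
// from the given environment object, or throw if it is missing or blank.
function requireAccessToken(env = process.env) {
    const token = env.HF_ACCESS_TOKEN;
    if (typeof token !== 'string' || token.trim() === '') {
        throw new Error('HF_ACCESS_TOKEN is not set in the environment.');
    }
    return token;
}

// Demonstration with explicit environment objects ('hf_example' is a placeholder).
console.log(requireAccessToken({ HF_ACCESS_TOKEN: 'hf_example' })); // prints "hf_example"

try {
    requireAccessToken({});
} catch (err) {
    console.log(err.message); // prints "HF_ACCESS_TOKEN is not set in the environment."
}
```

In a real script, calling `requireAccessToken()` with no argument before `AutoTokenizer.from_pretrained(...)` would perform the same check against `process.env`.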

docs/source/index.mdx

Lines changed: 3 additions & 2 deletions
@@ -17,10 +17,11 @@
 
 ## Contents
 
-The documentation is organized into 3 sections:
+The documentation is organized into 4 sections:
 1. **GET STARTED** provides a quick tour of the library and installation instructions to get up and running.
 2. **TUTORIALS** are a great place to start if you're a beginner! We also include sample applications for you to play around with!
-3. **API REFERENCE** describes all classes and functions, as well as their available parameters and types.
+3. **HOW-TO GUIDES** show you how to use the library to achieve a specific goal.
+4. **API REFERENCE** describes all classes and functions, as well as their available parameters and types.
 
 ## Supported tasks/models

src/utils/hub.js

Lines changed: 26 additions & 26 deletions
@@ -31,21 +31,6 @@ if (!globalThis.ReadableStream) {
  * NOTE: This setting is ignored for local requests.
  */
 
-class Headers extends Object {
-    constructor(...args) {
-        super();
-        Object.assign(this, args);
-    }
-
-    get(key) {
-        return this[key];
-    }
-
-    clone() {
-        return new Headers(this);
-    }
-}
-
 class FileResponse {
     /**
      * Mapping from file extensions to MIME types.
@@ -75,7 +60,7 @@ class FileResponse {
             this.statusText = 'OK';
 
             let stats = fs.statSync(filePath);
-            this.headers['content-length'] = stats.size;
+            this.headers.set('content-length', stats.size.toString());
 
             this.updateContentType();
 
@@ -103,7 +88,7 @@ class FileResponse {
     updateContentType() {
         // Set content-type header based on file extension
         const extension = this.filePath.toString().split('.').pop().toLowerCase();
-        this.headers['content-type'] = this._CONTENT_TYPE_MAP[extension] ?? 'application/octet-stream';
+        this.headers.set('content-type', this._CONTENT_TYPE_MAP[extension] ?? 'application/octet-stream');
     }
 
     /**
@@ -115,7 +100,7 @@ class FileResponse {
         response.exists = this.exists;
         response.status = this.status;
         response.statusText = this.statusText;
-        response.headers = this.headers.clone();
+        response.headers = new Headers(this.headers);
         return response;
     }
 
@@ -138,7 +123,7 @@ class FileResponse {
      */
     async blob() {
         const data = await fs.promises.readFile(this.filePath);
-        return new Blob([data], { type: this.headers['content-type'] });
+        return new Blob([data], { type: this.headers.get('content-type') });
     }
 
     /**
@@ -167,16 +152,20 @@ class FileResponse {
 /**
  * Determines whether the given string is a valid HTTP or HTTPS URL.
  * @param {string|URL} string The string to test for validity as an HTTP or HTTPS URL.
+ * @param {string[]} [validHosts=null] A list of valid hostnames. If specified, the URL's hostname must be in this list.
  * @returns {boolean} True if the string is a valid HTTP or HTTPS URL, false otherwise.
  */
-function isValidHttpUrl(string) {
+function isValidHttpUrl(string, validHosts = null) {
     // https://stackoverflow.com/a/43467144
     let url;
     try {
         url = new URL(string);
     } catch (_) {
         return false;
     }
+    if (validHosts && !validHosts.includes(url.hostname)) {
+        return false;
+    }
     return url.protocol === "http:" || url.protocol === "https:";
 }

@@ -194,13 +183,25 @@ export async function getFile(urlOrPath) {
     } else if (typeof process !== 'undefined' && process?.release?.name === 'node') {
         const IS_CI = !!process.env?.TESTING_REMOTELY;
         const version = env.version;
-        return fetch(urlOrPath, {
-            headers: {
-                'User-Agent': `transformers.js/${version}; is_ci/${IS_CI};`
+
+        const headers = new Headers();
+        headers.set('User-Agent', `transformers.js/${version}; is_ci/${IS_CI};`);
+
+        // Check whether we are making a request to the Hugging Face Hub.
+        const isHFURL = isValidHttpUrl(urlOrPath, ['huggingface.co', 'hf.co']);
+        if (isHFURL) {
+            // If an access token is present in the environment variables,
+            // we add it to the request headers.
+            const token = process.env?.HF_ACCESS_TOKEN;
+            if (token) {
+                headers.set('Authorization', `Bearer ${token}`);
             }
-        });
+        }
+        return fetch(urlOrPath, { headers });
     } else {
         // Running in a browser-environment, so we use default headers
+        // NOTE: We do not allow passing authorization headers in the browser,
+        // since this would require exposing the token to the client.
         return fetch(urlOrPath);
     }
 }
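
To make the host check in `getFile` easier to reason about, here is a standalone sketch of the same decision: attach a Bearer token only when the URL's hostname is exactly one of the listed Hub hosts. It copies `isValidHttpUrl` from the hunk above; `buildHubHeaders`, `HUB_HOSTS`, the example URLs, and the `hf_example` token are hypothetical names chosen for illustration. It assumes Node.js 18 or later, where `Headers` and `URL` are globals.

```javascript
// Sketch only: mirrors the logic this commit adds to getFile(), extracted
// so it can run without Transformers.js. Names below are hypothetical.
const HUB_HOSTS = ['huggingface.co', 'hf.co'];

// Same check as in the commit: a valid http(s) URL whose hostname, if a
// list is given, must appear in that list exactly.
function isValidHttpUrl(string, validHosts = null) {
    let url;
    try {
        url = new URL(string);
    } catch (_) {
        return false;
    }
    if (validHosts && !validHosts.includes(url.hostname)) {
        return false;
    }
    return url.protocol === 'http:' || url.protocol === 'https:';
}

// Build request headers, adding Authorization only for Hub URLs.
function buildHubHeaders(urlOrPath, token) {
    const headers = new Headers();
    if (token && isValidHttpUrl(urlOrPath, HUB_HOSTS)) {
        headers.set('Authorization', `Bearer ${token}`);
    }
    return headers;
}

const exampleToken = 'hf_example'; // placeholder, not a real token

// Hub host: token is attached.
console.log(buildHubHeaders('https://huggingface.co/some/path', exampleToken).has('Authorization')); // true
// Other host: token is withheld.
console.log(buildHubHeaders('https://example.com/some/path', exampleToken).has('Authorization')); // false
// Subdomain: withheld as well, because includes() compares the full hostname.
console.log(buildHubHeaders('https://sub.huggingface.co/some/path', exampleToken).has('Authorization')); // false
// Local path: not a URL at all, so no token.
console.log(buildHubHeaders('./models/config.json', exampleToken).has('Authorization')); // false
```

The exact-match comparison means the token is never sent to hosts outside the allowlist, at the cost of also skipping any subdomain that is not listed explicitly.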
@@ -409,11 +410,10 @@ export async function getModelFile(path_or_repo_id, filename, fatal = true, opti
     if (response === undefined) {
         // Caching not available, or file is not cached, so we perform the request
 
-        let isURL = isValidHttpUrl(requestURL);
-
         if (env.allowLocalModels) {
             // Accessing local models is enabled, so we try to get the file locally.
             // If request is a valid HTTP URL, we skip the local file check. Otherwise, we try to get the file locally.
+            const isURL = isValidHttpUrl(requestURL);
             if (!isURL) {
                 try {
                     response = await getFile(localPath);
