Commit 84a9b7f

add file upload guidance to fine-tune docs (#231)
1 parent d6cf14c commit 84a9b7f

2 files changed, +23 -19 lines changed

clients/python/llmengine/fine_tuning.py

Lines changed: 9 additions & 9 deletions
@@ -107,14 +107,14 @@ def create(
     writer.writerows(data)
 ```
 
-Currently, data needs to be uploaded to a publicly accessible web URL so that it can be read
-for fine-tuning. Publicly accessible HTTP and HTTPS URLs are currently supported.
-Support for privately sharing data with the LLM Engine API is coming shortly. For quick
-iteration, you can look into tools like Pastebin or GitHub Gists to quickly host your CSV
-files in a public manner. An example Github Gist can be found
-[here](https://gist.github.com/tigss/7cec73251a37de72756a3b15eace9965). To use the gist,
-you can use the URL given when you click the “Raw” button
-([URL](https://gist.githubusercontent.com/tigss/7cec73251a37de72756a3b15eace9965/raw/85d9742890e1e6b0c06468507292893b820c13c9/llm_sample_data.csv)).
+Currently, data needs to be uploaded to either a publicly accessible web URL or to LLM Engine's
+private file server so that it can be read for fine-tuning. Publicly accessible HTTP and HTTPS
+URLs are currently supported.
+
+To privately share data with the LLM Engine API, use LLM Engine's [File.upload](../../api/python_client/#llmengine.File.upload)
+API. You can upload data from a local file to LLM Engine's private file server and then use the
+returned file ID to reference your data in the FineTune API. The file ID is generally in the
+form of `file-<random_string>`, e.g. "file-7DLVeLdN2Ty4M2m".
 
 Example code for fine-tuning:
 === "Fine-tuning in Python"
@@ -123,7 +123,7 @@ def create(
 
 response = FineTune.create(
     model="llama-2-7b",
-    training_file="https://my-bucket.s3.us-west-2.amazonaws.com/path/to/training-file.csv",
+    training_file="file-7DLVeLdN2Ty4M2m",
 )
 
 print(response.json())
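
The updated docstring lets `training_file` be either a public HTTP/HTTPS URL or a file ID returned by `File.upload`. A minimal sketch of a client-side check that tells the two apart before calling `FineTune.create` could look like this. The helper name `classify_training_file` and the regular expression are assumptions made for illustration: the docs only say the ID is "generally" of the form `file-<random_string>`, so the pattern is a heuristic, not a documented rule.

```python
import re
from urllib.parse import urlparse

# Heuristic: the docs describe IDs as "generally" file-<random_string>.
# The allowed character set below is an assumption, not a documented rule.
_FILE_ID_PATTERN = re.compile(r"^file-[A-Za-z0-9]+$")


def classify_training_file(reference: str) -> str:
    """Return "file_id" or "url" for a training_file value (hypothetical helper)."""
    if _FILE_ID_PATTERN.match(reference):
        return "file_id"
    parsed = urlparse(reference)
    # Only HTTP and HTTPS URLs are described as supported.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    raise ValueError(f"Not an HTTP(S) URL or file ID: {reference!r}")


print(classify_training_file("file-7DLVeLdN2Ty4M2m"))
print(classify_training_file(
    "https://my-bucket.s3.us-west-2.amazonaws.com/path/to/training-file.csv"
))
```

Under this sketch a value such as `s3://my-bucket/path/to/training-file.csv` raises `ValueError`, because its scheme is neither `http` nor `https`.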

docs/guides/fine_tuning.md

Lines changed: 14 additions & 10 deletions
@@ -103,14 +103,18 @@ with open('customer_service_data.csv', 'w', newline='') as file:
 
 ## Making your data accessible to LLM Engine
 
-Currently, data needs to be uploaded to a publicly accessible web URL so that it can be read
-for fine-tuning. Publicly accessible HTTP and HTTPS URLs are currently supported.
-Support for privately sharing data with the LLM Engine API is coming shortly. For quick
-iteration, you can look into tools like Pastebin or GitHub Gists to quickly host your CSV
-files in a public manner. An example Github Gist can be found
-[here](https://gist.github.com/tigss/7cec73251a37de72756a3b15eace9965). To use the gist,
-you can use the URL given when you click the “Raw” button
-([URL](https://gist.githubusercontent.com/tigss/7cec73251a37de72756a3b15eace9965/raw/85d9742890e1e6b0c06468507292893b820c13c9/llm_sample_data.csv)).
+Currently, data needs to be uploaded to either a publicly accessible web URL or to LLM Engine's private file server so that it can be read for fine-tuning. Publicly accessible HTTP and HTTPS URLs are currently supported.
+
+To privately share data with the LLM Engine API, use LLM Engine's [File.upload](../../api/python_client/#llmengine.File.upload) API. You can upload data from a local file to LLM Engine's private file server and then use the returned file ID to reference your data in the FineTune API. The file ID is generally in the form of `file-<random_string>`, e.g. "file-7DLVeLdN2Ty4M2m".
+
+=== "Upload to LLM Engine's private file server"
+
+```python
+from llmengine import File
+
+response = File.upload(open("customer_service_data.csv", "r"))
+print(response.json())
+```
 
 ## Launching the fine-tune
 
@@ -137,8 +141,8 @@ from llmengine import FineTune
 
 response = FineTune.create(
     model="llama-2-7b",
-    training_file="s3://my-bucket/path/to/training-file.csv",
-    validation_file="s3://my-bucket/path/to/validation-file.csv",
+    training_file="file-7DLVeLdN2Ty4M2m",
+    validation_file="file-ezSRtpgKQyItI26",
 )
 
 print(response.json())
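
The guide changes in this commit describe a two-step flow: upload local CSV files with `File.upload`, then pass the returned IDs to `FineTune.create`. The following is a sketch of that flow, not the project's own example. The function name `upload_and_fine_tune` is hypothetical, the `llmengine` import is deferred so the module loads without the client installed, and reading the ID from `response.id` is an assumption about the response object, since the diff only shows `response.json()`.

```python
from typing import Optional


def upload_and_fine_tune(
    training_path: str,
    validation_path: Optional[str] = None,
    model: str = "llama-2-7b",
):
    """Upload local CSVs, then start a fine-tune that references them by file ID."""
    # Deferred import: needs the llmengine client installed and configured.
    from llmengine import File, FineTune

    def upload(path: str) -> str:
        # Close the file handle once the upload call returns.
        with open(path, "r") as f:
            response = File.upload(f)
        # Assumption: the upload response exposes the file ID as `.id`.
        return response.id

    kwargs = {"model": model, "training_file": upload(training_path)}
    if validation_path is not None:
        kwargs["validation_file"] = upload(validation_path)
    return FineTune.create(**kwargs)
```

A call such as `upload_and_fine_tune("customer_service_data.csv")` would then return the `FineTune.create` response. Opening the file in a `with` block, rather than passing a bare `open(...)` result, ensures the handle is closed even if the upload fails.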
