Commit aecfe12

Configure Hugging Face cache directories for dataset preparation
1 parent df2de46 commit aecfe12

2 files changed: 13 additions & 2 deletions

llm-finetuning/configs/generate_code_dataset.yaml

Lines changed: 6 additions & 1 deletion
@@ -2,6 +2,11 @@
 settings:
   docker:
     requirements: requirements.txt
+    apt_packages:
+      - git
+    environment:
+      HF_HOME: "/tmp/huggingface"
+      HF_HUB_CACHE: "/tmp/huggingface"
 
 # pipeline configuration
 parameters:
@@ -11,4 +16,4 @@ steps:
   mirror_repositories:
     parameters:
       repositories:
-        - zenml
+        - zenml
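
The two environment variables set in the config above are related: as far as the Hugging Face Hub documentation describes, `HF_HUB_CACHE` falls back to `$HF_HOME/hub` when unset, and `HF_HOME` falls back to `~/.cache/huggingface` (or `$XDG_CACHE_HOME/huggingface`). The sketch below mirrors that fallback chain in plain Python to show where the caches would land. `resolve_hf_cache_dirs` is a hypothetical helper written for illustration; it is not part of this commit or of `huggingface_hub`.

```python
import os
from pathlib import Path
from typing import Mapping, Tuple


def resolve_hf_cache_dirs(env: Mapping[str, str]) -> Tuple[Path, Path]:
    """Return (HF_HOME, HF_HUB_CACHE) as they would resolve from `env`.

    Hypothetical helper: it mirrors the documented defaults and does not
    call into huggingface_hub.
    """
    xdg_cache = env.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    hf_home = Path(
        env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    ).expanduser()
    # When HF_HUB_CACHE is unset, the hub cache lives in a "hub" subfolder.
    hub_cache = Path(env.get("HF_HUB_CACHE", str(hf_home / "hub"))).expanduser()
    return hf_home, hub_cache


# With the values from the config, both resolve to the same directory.
home, hub = resolve_hf_cache_dirs(
    {"HF_HOME": "/tmp/huggingface", "HF_HUB_CACHE": "/tmp/huggingface"}
)
print(home, hub)
```

Setting only `HF_HOME` would place hub downloads under `/tmp/huggingface/hub`; setting both to the same path, as this commit does, puts them directly in `/tmp/huggingface`.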

llm-finetuning/steps/prepare_dataset.py

Lines changed: 7 additions & 1 deletion
@@ -6,8 +6,14 @@
 """
 
 import os
-from typing import Dict
+from pathlib import Path
+
+# Set cache directories before importing HF libraries
+os.environ["HF_HOME"] = "/tmp/huggingface"
+os.environ["HF_HUB_CACHE"] = "/tmp/huggingface"
+os.makedirs("/tmp/huggingface", exist_ok=True)
 
+from typing import Dict
 import pandas as pd
 from datasets import Dataset
 from huggingface_hub import HfApi
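
The ordering in this change matters because the Hugging Face libraries appear to read their cache-related environment variables when they are first imported, so values set afterwards may be ignored. A minimal sketch of the same pattern with the path factored into one place is shown below. `configure_hf_cache` and the `PIPELINE_HF_CACHE` override variable are hypothetical names introduced for illustration; the commit itself hardcodes `/tmp/huggingface`.

```python
import os
import tempfile
from pathlib import Path


def configure_hf_cache(cache_dir: str) -> Path:
    """Point HF_HOME and HF_HUB_CACHE at `cache_dir` and create it."""
    path = Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(path)
    os.environ["HF_HUB_CACHE"] = str(path)
    return path


# PIPELINE_HF_CACHE is a hypothetical override; the fallback is a
# "huggingface" folder under the system temp directory.
cache_root = configure_hf_cache(
    os.environ.get(
        "PIPELINE_HF_CACHE",
        os.path.join(tempfile.gettempdir(), "huggingface"),
    )
)

# Import Hugging Face libraries only after this point, for example:
# from datasets import Dataset
# from huggingface_hub import HfApi
```

Keeping the directory in a single argument avoids repeating the literal path three times, and using the system temp directory keeps the sketch portable to hosts where `/tmp` does not exist.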
