Why is GPU memory not released after the pipeline is finished? #13117
Hi, I am using the trf model in a pipeline. It runs fine with the first batch but fails to load the second batch because VRAM is not released.

How to reproduce the behaviour:

```python
import numpy as np
import pandas as pd
import spacy
from tqdm import tqdm


def spacy_tokenise(df: pd.DataFrame, batch_size: int):
    from thinc.api import set_gpu_allocator, require_gpu

    # manage GPU VRAM: route spaCy's allocations through PyTorch's allocator
    set_gpu_allocator("pytorch")
    # use GPU 0
    # spacy.require_gpu()
    require_gpu(0)
    # check whether spaCy is using the GPU
    print("spaCy is using GPU: ", spacy.prefer_gpu())
    # load the transformer pipeline
    model = spacy.load("en_core_web_trf")
    docs = model.pipe(df.TEXT, batch_size=batch_size)
    res = []
    for doc in tqdm(docs, total=len(df.TEXT), desc="spaCy pipeline"):
        for sent in doc.sents:
            lst_token = [word.text for word in sent]
            lst_pos = [word.pos_ for word in sent]
            lst_lemma = [word.lemma_ for word in sent]
            lst_ner_token = [ent.text for ent in sent.ents]
            lst_ner_label = [ent.label_ for ent in sent.ents]
            if len(lst_ner_token) == 0:
                lst_ner_token = np.nan
                lst_ner_label = np.nan
            res.append(
                {
                    "token": lst_token,
                    "pos": lst_pos,
                    "lemma": lst_lemma,
                    "ner_token": lst_ner_token,
                    "ner_label": lst_ner_label,
                }
            )
    res = pd.DataFrame(res)
    return res
```
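To make the reported behaviour concrete, here is a minimal sketch of the kind of driver loop that would trigger it. The chunking into `df_batches` is an assumption about how the batches are fed in, and `df` is assumed to be the full DataFrame:

```python
import torch

# hypothetical driver: call spacy_tokenise() once per DataFrame chunk and
# print PyTorch's reserved VRAM after each call
df_batches = [df.iloc[i:i + 10_000] for i in range(0, len(df), 10_000)]
for i, batch in enumerate(df_batches):
    res = spacy_tokenise(batch, batch_size=64)
    print(f"chunk {i}: reserved VRAM = {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```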
Replies: 1 comment 4 replies
PyTorch uses a caching memory allocator, so it doesn't immediately release freed memory to the operating system. Instead, it reuses previously freed memory when a new tensor needs to be allocated. What this means in practice is that VRAM usage will appear to only grow during a given PyTorch session. You could call the `torch.cuda.empty_cache()` function to release the cached memory, but this generally isn't advisable unless memory fragmentation becomes an issue.
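For illustration, a minimal sketch (assuming a CUDA device is available) of the difference between memory held by live tensors and memory merely cached by the allocator:

```python
import torch

# "allocated" counts memory held by live tensors; "reserved" counts what the
# caching allocator keeps from the driver -- the figure nvidia-smi reports
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")

# hand cached-but-unused blocks back to the driver; rarely needed in practice
torch.cuda.empty_cache()
print(f"reserved after empty_cache(): {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```

If the reserved figure drops after `empty_cache()` while the allocated figure stays flat, that gap was the cached memory described above, not a leak.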