Why is GPU memory not released after the pipeline is finished? #13117
Hi, I am using the trf model in a pipeline. It runs fine with the first batch but fails to load the second batch because VRAM is not released.

How to reproduce the behaviour:

```python
import numpy as np
import pandas as pd
import spacy
from tqdm import tqdm


def spacy_tokenise(df: pd.DataFrame, batch_size: int):
    from thinc.api import set_gpu_allocator, require_gpu

    # manage GPU VRAM: route spaCy's allocations through PyTorch's allocator
    set_gpu_allocator("pytorch")
    # use GPU 0
    # spacy.require_gpu()
    require_gpu(0)
    # check whether spaCy is using the GPU
    print("spaCy is using GPU: ", spacy.prefer_gpu())
    # load the transformer pipeline
    model = spacy.load("en_core_web_trf")
    docs = model.pipe(df.TEXT, batch_size=batch_size)
    res = []
    for doc in tqdm(docs, total=len(df.TEXT), desc="spaCy pipeline"):
        for sent in doc.sents:
            lst_token = [word.text for word in sent]
            lst_pos = [word.pos_ for word in sent]
            lst_lemma = [word.lemma_ for word in sent]
            lst_ner_token = [ent.text for ent in sent.ents]
            lst_ner_label = [ent.label_ for ent in sent.ents]
            if len(lst_ner_token) == 0:
                lst_ner_token = np.nan
                lst_ner_label = np.nan
            res.append(
                {
                    "token": lst_token,
                    "pos": lst_pos,
                    "lemma": lst_lemma,
                    "ner_token": lst_ner_token,
                    "ner_label": lst_ner_label,
                }
            )
    res = pd.DataFrame(res)
    return res
```
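To make the reported behaviour concrete, here is a minimal sketch of the kind of driver loop that would trigger it. The chunking into `df_batches` is an assumption about how the batches are fed in, and `df` is assumed to be the full DataFrame:

```python
import torch

# hypothetical driver: call spacy_tokenise() once per DataFrame chunk and
# print PyTorch's reserved VRAM after each call
df_batches = [df.iloc[i:i + 10_000] for i in range(0, len(df), 10_000)]
for i, batch in enumerate(df_batches):
    res = spacy_tokenise(batch, batch_size=64)
    print(f"chunk {i}: reserved VRAM = {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```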
Replies: 1 comment 4 replies
PyTorch uses a caching memory allocator, so it doesn't immediately release freed memory to the operating system. Instead, it reuses previously freed memory when a new tensor needs to be allocated. What this means in practice is that VRAM usage will appear to only grow during a given PyTorch session. You could call the `torch.cuda.empty_cache()` function to release the cached memory, but this generally isn't advisable unless memory fragmentation becomes an issue.
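For illustration, a minimal sketch (assuming a CUDA device is available) of the difference between memory held by live tensors and memory merely cached by the allocator:

```python
import torch

# "allocated" counts memory held by live tensors; "reserved" counts what the
# caching allocator keeps from the driver -- the figure nvidia-smi reports
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")

# hand cached-but-unused blocks back to the driver; rarely needed in practice
torch.cuda.empty_cache()
print(f"reserved after empty_cache(): {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```

If the reserved figure drops after `empty_cache()` while the allocated figure stays flat, that gap was the cached memory described above, not a leak.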