Stop copying whole image before cropping #1308
Conversation
|
@nicoloesch could you please comment on the general design? I can then fix formatting etc. |
nicoloesch
left a comment
I really like the changes, @StijnvWijn, and this is exactly how I would have done it. Some changes are purely stylistic; others require some feedback from @fepegar about formatting and docstring description to make it as clear as possible.
Most of my comments are suggestions rather than requirements.
|
I fixed it, but the apply_transform function now has some less elegant code to create a new subject and copy all of its attributes; it uses both the dict and the class attributes, but I added some comments, so hopefully it is clear why.
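Roughly, the idea is the following (illustrative sketch only, not the exact code in this PR; the helper name and the cropped_images argument are made up):

```python
import copy


def make_cropped_subject(subject, cropped_images):
    # Hypothetical sketch: build a new subject of the same class so that
    # Subject subclasses are preserved, copy everything except the images,
    # and attach only the already-cropped images instead of the full ones.
    new_subject = subject.__class__.__new__(subject.__class__)
    # Copy instance attributes stored outside the dict part of the subject.
    new_subject.__dict__.update(copy.deepcopy(subject.__dict__))
    # Copy the dict entries (metadata, history, ...) except the images.
    for key, value in subject.items():
        if key not in cropped_images:
            new_subject[key] = copy.deepcopy(value)
    # The cropped images replace the originals, so the full-size tensors
    # are not duplicated.
    new_subject.update(cropped_images)
    return new_subject
```

The point is that only the cropped tensors end up in the new subject, so nothing should keep the full-size arrays alive once the original subject goes out of scope.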
|
@nicoloesch Do you have any comments on the new copy mechanism? |
nicoloesch
left a comment
All my comments have been addressed! Thank you @StijnvWijn
|
@allcontributors please add @nicoloesch for design, maintenance, question, review |
|
I've put up a pull request to add @nicoloesch! 🎉 |
|
@allcontributors please add @StijnvWijn for code |
|
I've put up a pull request to add @StijnvWijn! 🎉 |
|
Thank you both for your contribution! |
|
Unfortunately, these changes are giving memory trouble. The script below shows that, from v0.20.10, references to the uncropped image are still somewhere and memory just grows and grows.

```python
import gc
import os

import psutil

num_subjects = 1
process = psutil.Process(os.getpid())


def print_referrers_recursively(obj, depth=2):
    """Recursively print referrers of an object."""
    if depth == 0:
        return
    referrers = gc.get_referrers(obj)
    print(f"Object {id(obj)} has {len(referrers)} referrers")
    for ref in referrers:
        print(f"Referrer type and ID: {type(ref)}, {id(ref)}")
        if isinstance(ref, dict):
            print(f"  Dict keys: {list(ref.keys())[:10]}")
        elif isinstance(ref, (list, tuple)):
            print(f"  Sequence length: {len(ref)}")
        elif isinstance(ref, type):
            print(f"  Class: {ref.__name__}")
        else:
            print(f"  Other referrer: {type(ref)} (repr suppressed)")
        print()
        # Recursively check referrers of the referrer
        print_referrers_recursively(ref, depth=depth - 1)
    print()


def print_memory_usage():
    """Print the current memory usage of the process."""
    mem = process.memory_info().rss / 1024**2  # Convert bytes to MB
    print(f"\nMemory usage: {mem:.2f} MB")
    import torch

    for obj in gc.get_objects():
        is_tensor = torch.is_tensor(obj)
        if is_tensor:
            if obj.shape[-1] == 16:
                continue
            print("-" * 80)
            tensor = obj if is_tensor else obj.data
            tensor_id = id(tensor)
            print(
                f"Type: {type(tensor)}, "
                f"Shape: {tuple(tensor.shape)}, "
                f"Dtype: {tensor.dtype}, "
                f"ID: {tensor_id}"
            )
            print_referrers_recursively(obj)
            print("-" * 80)
            import sys

            sys.exit(0)


print("\nChecking memory usage before importing...")
print_memory_usage()

print("\nImporting...")
import torchio as tio

print_memory_usage()

print("\nCreating subjects list...")
subjects = [tio.datasets.Colin27(2008) for _ in range(num_subjects)]
transform = tio.CropOrPad(16)
print_memory_usage()

print("\nInstantiating SubjectsDataset...")
dataset = tio.SubjectsDataset(subjects, transform=transform)
print_memory_usage()

print("\nLoading...")
for subject in dataset:
    print_memory_usage()
```
|
I'm going to submit a hotfix reverting this in a new version, but it's a very nice feature so I hope it can be fixed :) |
|
Hmm that is very sad. I am not sure where the issue would occur, because all attributes seem to be deepcopied from the original input. Do you have any idea where to start? |
I know, and I'm sorry I had to revert this. I'm sure we'll be able to find a solution! I'm happy to refactor the design if needed.
I would start with the snippet I shared above. |
|
Thanks for your quick response! I am not sure I can reproduce your issue. In your snippet, the memory usage does increase slightly between the instantiation of the dataset and the creation of the cropped patches, but IMO this is expected, because we make a copy of the subject with all the old attributes and put a copy of the cropped area of the original image into the subject. Also, if I run the subject generation loop multiple times, I see some minor fluctuations in memory usage, but they go both up and down between loops, so this is not the behaviour I would expect with a memory leak. I am running it on a server with quite a bit of RAM, just like I described in my original issue, so that might also have an impact on my results.
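To put rough numbers on the "expected" part, one can compare the size of a full Colin27 volume with that of a single 16³ patch (illustrative snippet, not part of my original test):

```python
import torchio as tio

subject = tio.datasets.Colin27(2008)
image = subject['t1']

# Bytes held by the full volume versus a single-channel 16x16x16 patch.
full_bytes = image.data.numel() * image.data.element_size()
patch_bytes = 16**3 * image.data.element_size()
print(f'Full volume: {full_bytes / 1024**2:.1f} MB')
print(f'16^3 patch:  {patch_bytes / 1024:.1f} KB')
```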
|
I would not mind working on this, but I currently can't test whether my implementation works. Shall I open a new issue or PR to discuss this, or continue it here? |
|
I may also be able to assist, but I currently have very limited time as I am in the final stages of submitting my PhD. I may have some time this week and will start with the snippet you provided, @fepegar, to see whether I can reproduce the memory leak! |
|
Ah awesome, that is great news for you. Good luck! It is not too urgent as we are currently just using an older version of tio, so you don't need to worry too much about it :) |
Which TorchIO version and OS are you using? |
|
Thanks for trying. When I ran it, my memory was increasing much faster (e.g. 4 GB after a few iterations). I'll try to replicate and share my results here.
On Mon, 4 Aug 2025 at 08:06, StijnvWijn wrote:
Thanks for your response!
So my environment has the following:

```
Platform: Linux-5.15.0-143-generic-x86_64-with-glibc2.35
TorchIO: 0.20.3
PyTorch: 2.2.0+cpu
SimpleITK: 2.5.0 (ITK 5.4)
NumPy: 1.26.4
Python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
```
And if I run the following example (adapted from your script):
```python
import gc
import os

import psutil

num_subjects = 1
process = psutil.Process(os.getpid())


def print_referrers_recursively(obj, depth=2):
    """Recursively print referrers of an object."""
    if depth == 0:
        return
    referrers = gc.get_referrers(obj)
    print(f"Object {id(obj)}, {type(obj)} has {len(referrers)} referrers")
    for ref in referrers:
        print(f"Referrer type and ID: {type(ref)}, {id(ref)}")
        if isinstance(ref, dict):
            print(f"  Dict keys: {list(ref.keys())[:10]}")
        elif isinstance(ref, (list, tuple)):
            print(f"  Sequence length: {len(ref)}")
        elif isinstance(ref, type):
            print(f"  Class: {ref.__name__}")
        else:
            print(f"  Other referrer: {type(ref)} (repr suppressed)")
        print()
        # Recursively check referrers of the referrer
        print_referrers_recursively(ref, depth=depth - 1)
    print()


def print_memory_usage():
    """Print the current memory usage of the process."""
    mem = process.memory_info().rss / 1024**2  # Convert bytes to MB
    print(f"\nMemory usage: {mem:.2f} MB")
    import torch

    for obj in gc.get_objects():
        is_tensor = torch.is_tensor(obj)
        if is_tensor:
            if obj.shape[-1] == 16:
                continue
            print("-" * 80)
            tensor = obj if is_tensor else obj.data
            tensor_id = id(tensor)
            print(
                f"Type: {type(tensor)}, "
                f"Shape: {tuple(tensor.shape)}, "
                f"Dtype: {tensor.dtype}, "
                f"ID: {tensor_id}"
            )
            print_referrers_recursively(obj)
            print("-" * 80)
            # import sys
            # sys.exit(0)


print("\nChecking memory usage before importing...")
print_memory_usage()

print("\nImporting...")
import torchio as tio

print_memory_usage()

print("\nCreating subjects list...")
subjects = [tio.datasets.Colin27(2008) for _ in range(num_subjects)]
transform = tio.CropOrPad(16)
print_memory_usage()

print("\nInstantiating SubjectsDataset...")
dataset = tio.SubjectsDataset(subjects, transform=transform)
print_memory_usage()

print("\nLoading...")
for _ in range(5):
    print("testing memory usage")
    for subject in dataset:
        print_memory_usage()

for subject in dataset:
    print_memory_usage()
    print(f"Subject ID: {id(subject)}")
    print(f"Subject keys: {list(subject.keys())}")
    for key, value in subject.items():
        print(f"  {key}: {type(value)}, Shape: {value.shape if hasattr(value, 'shape') else 'N/A'}")
    print_referrers_recursively(subject['t1'], depth=3)
```
With the following tio versions, I get:

Torchio 0.20.3 (No memory leak expected):

```
Checking memory usage before importing...
Memory usage: 12.03 MB
Importing...
Memory usage: 325.64 MB
Creating subjects list...
Memory usage: 326.93 MB
Instantiating SubjectsDataset...
Memory usage: 326.93 MB
Loading...
testing memory usage
Memory usage: 332.82 MB
testing memory usage
Memory usage: 334.54 MB
testing memory usage
Memory usage: 334.16 MB
testing memory usage
Memory usage: 335.05 MB
testing memory usage
Memory usage: 335.17 MB
Memory usage: 334.27 MB
Subject ID: 139742629258896
Subject keys: ['t1', 't2', 'pd', 'cls']
t1: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
t2: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
pd: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
cls: <class 'torchio.data.image.LabelMap'>, Shape: (1, 16, 16, 16)
Object 139742629251600, <class 'torchio.data.image.ScalarImage'> has 2 referrers
Referrer type and ID: <class 'torchio.data.subject.Subject'>, 139742629258896
Dict keys: ['t1', 't2', 'pd', 'cls']
Object 139742629258896, <class 'torchio.data.subject.Subject'> has 2 referrers
Referrer type and ID: <class 'list'>, 139742629315328
Sequence length: 2
Object 139742629315328, <class 'list'> has 2 referrers
Referrer type and ID: <class 'list_iterator'>, 139742629209584
Other referrer: <class 'list_iterator'> (repr suppressed)
Referrer type and ID: <class 'list'>, 139742629318080
Sequence length: 2
Referrer type and ID: <class 'dict'>, 139746139912960
Dict keys: ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'gc']
Object 139746139912960, <class 'dict'> has 5 referrers
Referrer type and ID: <class 'list'>, 139742629318080
Sequence length: 2
Referrer type and ID: <class 'module'>, 139746139916304
Other referrer: <class 'module'> (repr suppressed)
Referrer type and ID: <class 'function'>, 139746139775488
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'function'>, 139746137540544
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'function'>, 139746138048768
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'dict'>, 139742637074816
Dict keys: ['t1', 't2', 'pd', 'cls', 'applied_transforms']
Object 139742637074816, <class 'dict'> has 2 referrers
Referrer type and ID: <class 'torchio.data.subject.Subject'>, 139742629258896
Dict keys: ['t1', 't2', 'pd', 'cls']
Object 139742629258896, <class 'torchio.data.subject.Subject'> has 3 referrers
Referrer type and ID: <class 'list'>, 139742629315328
Sequence length: 2
Referrer type and ID: <class 'list'>, 139744021918784
Sequence length: 2
Referrer type and ID: <class 'dict'>, 139746139912960
Dict keys: ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'gc']
Referrer type and ID: <class 'list'>, 139742629315328
Sequence length: 2
Object 139742629315328, <class 'list'> has 2 referrers
Referrer type and ID: <class 'list_iterator'>, 139742629209584
Other referrer: <class 'list_iterator'> (repr suppressed)
Referrer type and ID: <class 'list'>, 139744021918784
Sequence length: 2
```

0.20.10 (Version with this MR):

```
Checking memory usage before importing...
Memory usage: 12.03 MB
/home/stijn/venvs/rndeep_dev3.11/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:359: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(
Importing...
Memory usage: 325.46 MB
/home/stijn/venvs/rndeep_dev3.11/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:359: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(
Creating subjects list...
Memory usage: 326.75 MB
Instantiating SubjectsDataset...
Memory usage: 326.75 MB
Loading...
testing memory usage
Memory usage: 334.41 MB
testing memory usage
Memory usage: 334.92 MB
testing memory usage
Memory usage: 335.52 MB
testing memory usage
Memory usage: 335.66 MB
testing memory usage
Memory usage: 336.83 MB
Memory usage: 337.18 MB
Subject ID: 140106867906544
Subject keys: ['t1', 't2', 'pd', 'cls']
t1: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
t2: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
pd: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
cls: <class 'torchio.data.image.LabelMap'>, Shape: (1, 16, 16, 16)
Object 140106867905392, <class 'torchio.data.image.ScalarImage'> has 2 referrers
Referrer type and ID: <class 'torchio.datasets.mni.colin.Colin27'>, 140106867906544
Dict keys: ['t1', 't2', 'pd', 'cls']
Object 140106867906544, <class 'torchio.datasets.mni.colin.Colin27'> has 2 referrers
Referrer type and ID: <class 'list'>, 140106867953856
Sequence length: 2
Object 140106867953856, <class 'list'> has 2 referrers
Referrer type and ID: <class 'list_iterator'>, 140106867762048
Other referrer: <class 'list_iterator'> (repr suppressed)
Referrer type and ID: <class 'list'>, 140106867953280
Sequence length: 2
Referrer type and ID: <class 'dict'>, 140110378056448
Dict keys: ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'gc']
Object 140110378056448, <class 'dict'> has 5 referrers
Referrer type and ID: <class 'list'>, 140106867953280
Sequence length: 2
Referrer type and ID: <class 'module'>, 140110378059792
Other referrer: <class 'module'> (repr suppressed)
Referrer type and ID: <class 'function'>, 140110377918976
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'function'>, 140110375700416
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'function'>, 140110376208640
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'dict'>, 140108265044224
Dict keys: ['version', 'name', 'url_dir', 'filename', 'url', 'applied_transforms', 't1', 't2', 'pd', 'cls']
Object 140108265044224, <class 'dict'> has 2 referrers
Referrer type and ID: <class 'torchio.datasets.mni.colin.Colin27'>, 140106867906544
Dict keys: ['t1', 't2', 'pd', 'cls']
Object 140106867906544, <class 'torchio.datasets.mni.colin.Colin27'> has 3 referrers
Referrer type and ID: <class 'list'>, 140106867953856
Sequence length: 2
Referrer type and ID: <class 'list'>, 140106875282112
Sequence length: 2
Referrer type and ID: <class 'dict'>, 140110378056448
Dict keys: ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'gc']
Referrer type and ID: <class 'list'>, 140106867953856
Sequence length: 2
Object 140106867953856, <class 'list'> has 2 referrers
Referrer type and ID: <class 'list_iterator'>, 140106867762048
Other referrer: <class 'list_iterator'> (repr suppressed)
Referrer type and ID: <class 'list'>, 140106875282112
Sequence length: 2
```

0.20.17 (Version with hotfix):

```
Checking memory usage before importing...
Memory usage: 12.12 MB
Importing...
Memory usage: 325.56 MB
Creating subjects list...
Memory usage: 326.85 MB
Instantiating SubjectsDataset...
Memory usage: 326.85 MB
Loading...
testing memory usage
Memory usage: 334.45 MB
testing memory usage
Memory usage: 334.60 MB
testing memory usage
Memory usage: 335.69 MB
testing memory usage
Memory usage: 335.34 MB
testing memory usage
Memory usage: 335.01 MB
Memory usage: 335.30 MB
Subject ID: 140460681742384
Subject keys: ['t1', 't2', 'pd', 'cls']
t1: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
t2: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
pd: <class 'torchio.data.image.ScalarImage'>, Shape: (1, 16, 16, 16)
cls: <class 'torchio.data.image.LabelMap'>, Shape: (1, 16, 16, 16)
Object 140460681742672, <class 'torchio.data.image.ScalarImage'> has 2 referrers
Referrer type and ID: <class 'torchio.datasets.mni.colin.Colin27'>, 140460681742384
Dict keys: ['t1', 't2', 'pd', 'cls']
Object 140460681742384, <class 'torchio.datasets.mni.colin.Colin27'> has 2 referrers
Referrer type and ID: <class 'list'>, 140462379464896
Sequence length: 2
Object 140462379464896, <class 'list'> has 2 referrers
Referrer type and ID: <class 'list_iterator'>, 140460681555888
Other referrer: <class 'list_iterator'> (repr suppressed)
Referrer type and ID: <class 'list'>, 140460703722048
Sequence length: 2
Referrer type and ID: <class 'dict'>, 140464191896384
Dict keys: ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'gc']
Object 140464191896384, <class 'dict'> has 5 referrers
Referrer type and ID: <class 'list'>, 140460703722048
Sequence length: 2
Referrer type and ID: <class 'module'>, 140464191899664
Other referrer: <class 'module'> (repr suppressed)
Referrer type and ID: <class 'function'>, 140464191758848
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'function'>, 140464189540288
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'function'>, 140464190048512
Other referrer: <class 'function'> (repr suppressed)
Referrer type and ID: <class 'dict'>, 140460703776704
Dict keys: ['version', 'name', 'url_dir', 'filename', 'url', 't1', 't2', 'pd', 'cls', 'applied_transforms']
Object 140460703776704, <class 'dict'> has 2 referrers
Referrer type and ID: <class 'torchio.datasets.mni.colin.Colin27'>, 140460681742384
Dict keys: ['t1', 't2', 'pd', 'cls']
Object 140460681742384, <class 'torchio.datasets.mni.colin.Colin27'> has 3 referrers
Referrer type and ID: <class 'list'>, 140462379464896
Sequence length: 2
Referrer type and ID: <class 'list'>, 140460703722048
Sequence length: 2
Referrer type and ID: <class 'dict'>, 140464191896384
Dict keys: ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'gc']
Referrer type and ID: <class 'list'>, 140462379464896
Sequence length: 2
Object 140462379464896, <class 'list'> has 2 referrers
Referrer type and ID: <class 'list_iterator'>, 140460681555888
Other referrer: <class 'list_iterator'> (repr suppressed)
Referrer type and ID: <class 'list'>, 140460703722048
Sequence length: 2
```

This time, it does look like there is an increase in memory consumption. I'm not sure what changed, but this does mean I can investigate!
|
|
Sorry, my script was not very helpful in the state I shared it. I've checked out the change; here is an updated version of the script:

```python
import gc
import os

import psutil

num_subjects = 10
print_object_info = False
print_referrers = False
process = psutil.Process(os.getpid())


def print_referrers_recursively(obj, depth=2):
    """Recursively print referrers of an object."""
    if depth == 0:
        return
    referrers = gc.get_referrers(obj)
    print(f'Object {id(obj)} has {len(referrers)} referrers')
    for ref in referrers:
        print(f'Referrer type and ID: {type(ref)}, {id(ref)}')
        if isinstance(ref, dict):
            print(f'  Dict keys: {list(ref.keys())[:10]}')
        elif isinstance(ref, (list, tuple)):
            print(f'  Sequence length: {len(ref)}')
        elif isinstance(ref, type):
            print(f'  Class: {ref.__name__}')
        else:
            print(f'  Other referrer: {type(ref)} (repr suppressed)')
        print()
        # Recursively check referrers of the referrer
        print_referrers_recursively(ref, depth=depth - 1)
    print()


def print_memory_usage():
    """Print the current memory usage of the process."""
    mem = process.memory_info().rss / 1024**2  # Convert bytes to MB
    print(f'\nMemory usage: {mem:.2f} MB')
    if not print_object_info or not print_referrers:
        return
    import torch

    for obj in gc.get_objects():
        is_tensor = torch.is_tensor(obj)
        if is_tensor:
            if obj.shape[-1] == 16:
                continue
            print('-' * 80)
            tensor = obj if is_tensor else obj.data
            tensor_id = id(tensor)
            if print_object_info:
                print(
                    f'Type: {type(tensor)}, '
                    f'Shape: {tuple(tensor.shape)}, '
                    f'Dtype: {tensor.dtype}, '
                    f'ID: {tensor_id}'
                )
            if print_referrers:
                print_referrers_recursively(obj)
            print('-' * 80)


print('\nChecking memory usage before importing...')
print_memory_usage()

print('\nImporting...')
import torchio as tio

print_memory_usage()

print('\nCreating subjects list...')
subjects = [tio.datasets.Colin27(2008) for _ in range(num_subjects)]
transform = tio.CropOrPad(16)
print_memory_usage()

print('\nInstantiating SubjectsDataset...')
dataset = tio.SubjectsDataset(subjects, transform=transform)
print_memory_usage()

print('\nLoading...')
for subject in dataset:
    print_memory_usage()
```
|
Thanks for your quick response! I'm sorry, apparently it does not show up on the server, but if I run it on my laptop, I can replicate your issue. I see that the PyTorch versions are different, so maybe it's related to that? There were some changes related to the default collation that made us stay with an older PyTorch version, but I will check.
TorchIO 0.20.8:
TorchIO 0.20.10:
TorchIO 0.20.16:
|
|
Interesting that it's different! But we definitely should check the PyTorch versions. However, I'm not sure this would be about the collating function, because note that in the script I'm not using a data loader. |
|
I have found an issue that might be related: if I create a new tio.Image inside a loop, it seems not to be garbage collected. I'm not sure why yet, but it also causes a linear increase in memory consumption when I run a loop like the sketch at the end of this comment. It does not happen if the new_image is just a tensor, an ordinary dict like {'tensor': new_tensor}, or a much simpler class. So I asked Gemini and it found the issue: it turns out that it is related to the warnings module.
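A minimal loop of the kind I mean looks roughly like this (reconstructed for illustration; the tensor shape and iteration count are arbitrary):

```python
# Illustrative sketch, not the exact snippet from my tests: create a fresh
# tio.ScalarImage on every iteration and watch the resident memory of the
# process grow instead of staying flat.
import os

import psutil
import torch
import torchio as tio

process = psutil.Process(os.getpid())

for i in range(200):
    new_tensor = torch.rand(1, 181, 217, 181)
    new_image = tio.ScalarImage(tensor=new_tensor)
    if i % 20 == 0:
        mem = process.memory_info().rss / 1024**2
        print(f'Iteration {i}: {mem:.2f} MB')
```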
|
|
Wow! Great investigation! That cheeky `warnings` module! |
|
Awesome! No problem, it also took me quite some time to find the issue. I did not even know this was how the warnings module worked, haha. |
Fixes #1305.
Description
Fixes a slowdown while sampling patches that are much smaller than the image in TorchIO >= 0.20.4.
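For context, a typical patch-sampling setup in which the slowdown shows up might look like the following (the sampler choice, image size, and patch size are only illustrative):

```python
import torch
import torchio as tio

# One subject with a volume much larger than the patches sampled from it.
subject = tio.Subject(
    t1=tio.ScalarImage(tensor=torch.rand(1, 256, 256, 256)),
)
sampler = tio.UniformSampler(patch_size=16)

# As described in the PR title, each sampled patch previously copied the
# whole subject, including the full-size image, before cropping; with this
# change, only the cropped region is copied.
for patch in sampler(subject, num_patches=4):
    print(patch['t1'].data.shape)  # torch.Size([1, 16, 16, 16])
```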
Checklist
Read the CONTRIBUTING docs and have a developer setup ready