Description
Describe the bug
When copying one array's contents to another (e.g. buf2[:] = buf1), the operation takes a long time and creates a new array in the process, unnecessarily increasing memory consumption.
ulab version: 6.8.0-2D-c
To Reproduce
Run the following (DMA part only works on RP2, tested with Pico 2):
from ulab import numpy as np
import time
import rp2
import gc
# Set buffer size in bytes
buffer_size = 100_000
# Log free memory
print("mem_free:", gc.mem_free())
# Create first buffer
t0 = time.ticks_us()
buf1 = np.zeros((buffer_size), dtype=np.uint8)
t1 = time.ticks_us()
print("create buf1:", t1 - t0, "microseconds")
# Log free memory
print("mem_free:", gc.mem_free())
# Create second buffer
t0 = time.ticks_us()
buf2 = np.zeros((buffer_size), dtype=np.uint8)
t1 = time.ticks_us()
print("create buf2:", t1 - t0, "microseconds")
# Log free memory
print("mem_free:", gc.mem_free())
# Create a DMA controller and configure it
dma = rp2.DMA()
bytes_per_transfer = 4 # 1, 2, or 4 bytes per transfer
dma_ctrl = dma.pack_ctrl(
    # 0 = 1 byte, 1 = 2 bytes, 2 = 4 bytes
    size = {1:0, 2:1, 4:2}[bytes_per_transfer],
    inc_write = True,
    inc_read = True
)
dma.config(
    read = buf1,
    write = buf2,
    count = buffer_size // bytes_per_transfer,
    ctrl = dma_ctrl
)
# Fill buf1 with data
t0 = time.ticks_us()
buf1[:] = 1
t1 = time.ticks_us()
print("filling buf1:", t1 - t0, "microseconds")
# Log free memory
print("mem_free:", gc.mem_free())
# Copy buf1 to buf2 using standard assignment
t0 = time.ticks_us()
buf2[:] = buf1
t1 = time.ticks_us()
print("copy buf1 to buf2:", t1 - t0, "microseconds")
# Log free memory
print("mem_free:", gc.mem_free())
# Copy buf1 to buf2 using DMA
t0 = time.ticks_us()
_ = dma.active(True)
while dma.active():
    pass
t1 = time.ticks_us()
print("copy buf1 to buf2 with dma:", t1 - t0, "microseconds")
# Log free memory
print("mem_free:", gc.mem_free())
Expected behavior
buf2[:] = buf1 should take about as long as filling one of the buffers, and should not create a new array in the process.
With a Pico 2, the following is printed:
mem_free: 490016
create buf1: 1395 microseconds
mem_free: 389920
create buf2: 1576 microseconds
mem_free: 289824
filling buf1: 23131 microseconds
mem_free: 289408
copy buf1 to buf2: 47145 microseconds
mem_free: 189216
copy buf1 to buf2 with dma: 193 microseconds
mem_free: 189216
With a Pico 2 and buffer_size = 100_000, filling one buffer takes ~23 ms, whereas copying takes ~47 ms and allocates a whole extra array (~100 kB) for some reason. By contrast, the DMA can do the memory copy in just 0.2 ms (about 240x faster!).
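For reference, the ratios quoted above can be recomputed from the printed output. This is only arithmetic on the figures already listed; the variable names are made up for illustration.

```python
# Figures copied from the Pico 2 output above
buffer_size = 100_000        # bytes per buffer
fill_us = 23_131             # "filling buf1"
copy_us = 47_145             # "copy buf1 to buf2"
dma_us = 193                 # "copy buf1 to buf2 with dma"
free_before_copy = 289_408   # mem_free just before buf2[:] = buf1
free_after_copy = 189_216    # mem_free just after buf2[:] = buf1

copy_vs_fill = copy_us / fill_us
copy_vs_dma = copy_us / dma_us
allocated = free_before_copy - free_after_copy

print("copy / fill ratio: %.2f" % copy_vs_fill)
print("copy / DMA ratio: %.0f" % copy_vs_dma)
print("bytes allocated during copy:", allocated)
print("copy cost per byte (ns): %.0f" % (copy_us * 1000 / buffer_size))
```

The drop in free memory across the copy is almost exactly one buffer's worth, which is what suggests a temporary array is being created.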
Additional context
I understand that the copy with the CPU will take longer than the DMA, but why does it take so much longer than simply filling the array, and why is a whole new array being allocated? The project I'm working on uses large arrays, and it's pretty wasteful of memory for whole new arrays to be created for a simple copy operation. Is that intended behavior? Is there a more efficient solution? I see copyto() isn't implemented; perhaps that should be a feature request? I'm just trying to better understand the problem before asking for that.
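One workaround that might be worth trying is copying through memoryview objects instead of the array's own slice assignment. The sketch below is untested on the Pico 2 and rests on two assumptions: that ulab arrays expose the buffer protocol (the DMA example above suggests they do), and that the MicroPython build supports memoryview slice assignment. copy_in_place is a hypothetical helper name, and bytearray objects stand in for the ulab arrays so the snippet runs anywhere. Whether it actually avoids the extra allocation with ulab would still need to be measured with gc.mem_free().

```python
def copy_in_place(dst, src):
    """Copy src into dst through memoryviews (hypothetical helper).

    Assumes both objects expose the buffer protocol with the same
    item format and total length.
    """
    dst_view = memoryview(dst)
    src_view = memoryview(src)
    if len(dst_view) != len(src_view):
        raise ValueError("buffers must have the same length")
    # Slice assignment on a memoryview writes into dst's existing storage
    dst_view[:] = src_view

# Stand-ins for buf1 and buf2 (assumption: dense uint8 ulab arrays behave alike)
src = bytearray([1]) * 1000
dst = bytearray(1000)
copy_in_place(dst, src)
print("buffers equal after copy:", dst == src)
```

On the device, the call would presumably be copy_in_place(buf2, buf1), timed and checked for memory use in the same way as the other steps in the script above.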