To what extent should I be trying to optimize memory usage? #9778
-
Depending on how you count, it's actually a triple copy -- at one instant there will be:
The second one is temporary and nothing holds a reference to it, but will not be freed until a collection runs. (That point is a bit subtle -- at a high level there really isn't a difference between "available for free" and "actually free" because (see below) an allocation attempt will implicitly collect if necessary, but practically there is a difference when it comes down to heap fragmentation and peak memory usage).
As long as there still exists a reference to But you can control this manually -- Normally I would suggest that you can use
Manually calling gc.collect is only useful to help force the gc to run at a time where it will have the most benefit in terms of freeing up contiguous memory (i.e. you should explicitly call gc.collect when there are the fewest "live" objects). Calling it manually will never help with the discovery of "freeable" regions. The thing that does happen automatically is that if an allocation fails, it implicitly does a gc.collect then re-tries the allocation.
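As a rough illustration of "call gc.collect when there are the fewest live objects", here is a minimal sketch. The names `handle` and `serve` are hypothetical and not from this thread. The idea is only that the collection runs after a request has been fully handled, once its temporaries are unreachable.

```python
import gc

def handle(req_buffer):
    # The list from split() and the decoded line are temporaries;
    # only the short method string is returned to the caller.
    lines = req_buffer.split(b"\r\n")
    first = lines[0].decode("utf-8")
    return first.split(" ", 2)[0]

def serve(requests):
    methods = []
    for req_buffer in requests:
        methods.append(handle(req_buffer))
        # handle() has returned, so its temporaries are unreachable.
        # Collecting here, at a quiet point, helps keep free memory
        # contiguous before the next request arrives.
        gc.collect()
    return methods

print(serve([b"GET / HTTP/1.1\r\n\r\n", b"PUT /gpio/2 HTTP/1.1\r\n\r\n"]))
```

Whether this helps in practice depends on the allocation pattern of the rest of the program. The allocator will still collect on its own whenever an allocation fails.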
What happens here is that split creates a list of strings. Each string is referenced by the list. You could in theory remove the elements from the list (e.g. by assigning None to the list elements), but that wouldn't help solve the fact that you needed to allocate the full set to start with. This is just an unfortunate inherent limitation of (FWIW, this is why MicroPython has an
I gave a talk at the Melbourne MicroPython Meetup a couple of years ago that covers some of these concepts: https://www.youtube.com/watch?v=H_xq8IYjh2w It's a bit informal, and I would love to do a more polished version. It's hard to explain the impact of memory management on a program by giving a bunch of specific rules and examples, so instead the idea was to cover a bit of "how does the GC work at a fundamental level" so that the higher-level behaviours can be inferred from this.
-
I want to start this by saying I'm not looking for a fix at this point, I'm just sharing some findings I thought were interesting.
I tried optimizing my code a bit by swapping this line:
req_buffer_lines = req_buffer.decode('utf8').split('\r\n')
for this:
req_buffer_lines = req_buffer.split(b'\r\n')
and then decoding each line as I examine it. I figured that by breaking the buffer into lines first, I would be decoding shorter strings, and that by reusing the variable that temporarily holds the decoded string I would reduce overall memory usage.
To test, I created two functions: one with the old way of converting the entire buffer to a string and then splitting it into lines, and another with the new way of splitting the buffer into lines and decoding a single line at a time. I printed free memory at various points. Here's what I got...
Using the original method of converting the entire buffer to a string and then splitting into lines:
Alternative method of splitting buffer into lines and then converting individual lines to strings:
It seems I've actually made things worse.
The code I used to test is shown below. All I did between the two runs was to change the function call from
I'm not sure what I've proved here, if anything. It was just unexpected and I wanted to share in case anyone else is interested.
import gc
gc.collect()
def parse_query_string(query_string):
    """
    Split a URL's query string into individual key/value pairs
    (ex: 'pet=Panda&color=Red' becomes {'pet': 'Panda', 'color': 'Red'})
    Args:
        query_string (string): the query string portion of a URL (without the leading ? delimiter)
    Returns:
        dictionary: key/value pairs
    """
    query = {}
    query_params = query_string.split('&')
    for param in query_params:
        if '=' not in param:  # A key with no value, like: 'red' instead of 'color=red'
            key = param
            query[key] = ''
        else:
            key, value = param.split('=', 1)  # Split on the first '=' only, so a value may itself contain '='
            query[key] = value
    return query

def parse_http_request(req_buffer):
    """
    Given a raw HTTP request, return a dictionary with individual elements broken out
    Args:
        req_buffer (bytes): the unprocessed HTTP request sent from the client
    Raises:
        exception: when the request buffer is empty
    Returns:
        dictionary: key/value pairs including, but not limited to method, path, query, headers, body, etc.
        or None type if parsing fails
    """
    assert (req_buffer != b''), 'Empty request buffer.'
    req = {}
    req_buffer_lines = req_buffer.decode('utf8').split('\r\n')
    req['method'], target, req['http_version'] = req_buffer_lines[0].split(' ', 2)  # Example: GET /route/path HTTP/1.1
    if '?' not in target:
        req['path'] = target
    else:  # target can have a query component, so /route/path could be something like /route/path?state=on&timeout=30
        req['path'], query_string = target.split('?', 1)
        req['query'] = parse_query_string(query_string)
    req['headers'] = {}
    for i in range(1, len(req_buffer_lines) - 1):
        if req_buffer_lines[i] == '':  # Blank line signifies the end of headers.
            break
        else:
            name, value = req_buffer_lines[i].split(':', 1)
            req['headers'][name.strip()] = value.strip()
    req['body'] = req_buffer_lines[len(req_buffer_lines) - 1]  # Last line is the body (or blank if no body.)
    return req

def parse_http_request2(req_buffer):
    """
    Given a raw HTTP request, return a dictionary with individual elements broken out
    Args:
        req_buffer (bytes): the unprocessed HTTP request sent from the client
    Raises:
        exception: when the request buffer is empty
    Returns:
        dictionary: key/value pairs including, but not limited to method, path, query, headers, body, etc.
        or None type if parsing fails
    """
    assert (req_buffer != b''), 'Empty request buffer.'
    req = {}
    req_buffer_lines = req_buffer.split(b'\r\n')
    line = req_buffer_lines[0].decode('utf-8')
    req['method'], target, req['http_version'] = line.split(' ', 2)  # Example: GET /route/path HTTP/1.1
    if '?' not in target:
        req['path'] = target
    else:  # target can have a query component, so /route/path could be something like /route/path?state=on&timeout=30
        req['path'], query_string = target.split('?', 1)
        req['query'] = parse_query_string(query_string)
    req['headers'] = {}
    for i in range(1, len(req_buffer_lines) - 1):
        line = req_buffer_lines[i].decode('utf-8')
        if line == '':  # Blank line signifies the end of headers.
            break
        else:
            name, value = line.split(':', 1)
            req['headers'][name.strip()] = value.strip()
    req['body'] = req_buffer_lines[len(req_buffer_lines) - 1].decode('utf-8')  # Last line is the body (or blank if no body.)
    return req

print(f'{gc.mem_free()} RAM at start')
req_buffer = b'PUT /gpio/2?foo=bar&bar=baz HTTP/1.1\r\nContent-Type: text/plain\r\n\r\noff\r\n'
print(f'{gc.mem_free()} RAM after buffer allocation')
req = parse_http_request2(req_buffer)
print(f'{gc.mem_free()} RAM after request parse')
gc.collect()
print(f'{gc.mem_free()} RAM after garbage collection')
print(req)
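When comparing two variants like this, it can help to separate memory that is still referenced from garbage that simply has not been collected yet. The helper below is a hypothetical sketch (the name `retained_bytes` is not from the thread): it collects before and after the call, so the difference in `gc.mem_free()` reflects only what is still live afterwards, including the returned result. It assumes MicroPython's `gc.mem_free()` and returns `None` for the byte count on interpreters that lack it.

```python
import gc

def retained_bytes(fn, *args):
    """Call fn(*args) and return (result, bytes_still_held).

    bytes_still_held is None when gc.mem_free() is unavailable
    (it is a MicroPython extension, not part of CPython's gc module).
    """
    mem_free = getattr(gc, "mem_free", None)
    if mem_free is None:
        return fn(*args), None
    gc.collect()              # start from a clean heap
    before = mem_free()
    result = fn(*args)
    gc.collect()              # drop temporaries created by fn
    return result, before - mem_free()

req_buffer = b'PUT /gpio/2?foo=bar&bar=baz HTTP/1.1\r\nContent-Type: text/plain\r\n\r\noff\r\n'
lines, held = retained_bytes(lambda b: b.split(b'\r\n'), req_buffer)
print(len(lines), 'lines,', held, 'bytes retained')
```

This measures retained memory only. Peak usage during the call, which is what matters when the heap is nearly full or fragmented, is not captured by a before/after comparison.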
-
Could it be that you need something like
i.e. a string split generator that returns the line substring memory views (without any new allocation)? I found this function here and adapted it a bit. Also see this discussion. Then it seems we have readfrom_mem, writeto_mem in different circumstances like i2c, but not for string decoding.
-
@rkompass Unfortunately this (i.e. slicing a string or bytes) does cause an allocation. You can do this with memoryview though, in which case the only allocation is for each memoryview instance (16 bytes) but not the underlying string data.
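A minimal sketch of the memoryview approach described above, assuming the buffer is a `bytes` object with a `find` method that accepts a start offset. The generator name `iter_lines` is hypothetical and this is not the function posted earlier in the thread. Each yielded item is a memoryview slice, so the line data itself is not copied; decoding a line to `str` still allocates that string.

```python
def iter_lines(buf, sep=b"\r\n"):
    """Yield each line of buf as a memoryview slice (line data is not copied)."""
    mv = memoryview(buf)
    start = 0
    while True:
        end = buf.find(sep, start)
        if end < 0:
            yield mv[start:]      # remainder after the last separator
            return
        yield mv[start:end]
        start = end + len(sep)

req_buffer = b'PUT /gpio/2 HTTP/1.1\r\nContent-Type: text/plain\r\n\r\noff\r\n'
for line in iter_lines(req_buffer):
    text = str(line, 'utf-8')     # decode straight from the view, one line at a time
    print(repr(text))
```

The generator yields the same pieces that `split` would return, but only one view and one decoded string need to be live at a time, provided the caller does not keep references to earlier lines.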
-
Thank you @jimmo.
and I get:
the last line indicates that
-
I have a TCP socket that receives a client request as req_buffer. I then pass it to a method that splits the buffer into an array of lines, like this:
After this split, I never use the original req_buffer again. I then go through req_buffer_lines line by line to parse data into a dictionary.
My concern is this... Am I creating a double copy of the data with the split and using extra memory on an already constrained system? Or is MicroPython smart enough to see that req_buffer is never used again and will free up the space it was occupying? Do I need to force this with a gc.collect() after the split, or does it happen automatically?
When parsing the lines in req_buffer_lines, should I be deleting them when I'm done so they can be garbage collected? Or am I overthinking this and all these details are taken care of as well?
Here's the entire method: