To what extent should I be trying to optimize memory usage? #9778

DavesCodeMusings · 2022-10-28T21:51:41Z


I have a tcp socket that receives a client request as req_buffer

I then pass it to a method that splits the buffer into an array of lines, like this:

req_buffer_lines = req_buffer.decode('utf8').split('\r\n')

After this split, I never use the original req_buffer again.

I then go through req_buffer_lines line by line to parse data into a dictionary.

My concern is this... Am I creating a double copy of the data with the split and using extra memory on an already constrained system? Or, is MicroPython smart enough to see that req_buffer is never used again and will free up the space it was occupying? Do I need to force this with a gc.collect() after the split, or does it happen automatically?

When parsing the lines in req_buffer_lines, should I be deleting them when I'm done so they can be garbage collected? Or am I overthinking this and all these details are taken care of as well?

Here's the entire method:

    @staticmethod
    async def parse_http_request(req_buffer):
        """
        Given a raw HTTP request, return a dictionary with individual elements broken out

        Args:
            req_buffer (bytes): the unprocessed HTTP request sent from the client

        Raises:
            exception: when the request buffer is empty

        Returns:
            dictionary: key/value pairs including, but not limited to method, path, query, headers, body, etc.
                or None type if parsing fails
        """
        assert (req_buffer != b''), 'Empty request buffer.'

        req = {}
        req_buffer_lines = req_buffer.decode('utf8').split('\r\n')
        req['method'], target, req['http_version'] = req_buffer_lines[0].split(' ', 2)  # Example: GET /route/path HTTP/1.1
        if (not '?' in target):
            req['path'] = target
        else:  # target can have a query component, so /route/path could be something like /route/path?state=on&timeout=30
            req['path'], query_string = target.split('?', 1)
            req['query'] = Thimble.parse_query_string(query_string)

        req['headers'] = {}
        for i in range(1, len(req_buffer_lines) - 1):
            if (req_buffer_lines[i] == ''):  # Blank line signifies the end of headers.
                break
            else:
                name, value = req_buffer_lines[i].split(':', 1)
                req['headers'][name.strip()] = value.strip()
                
        req['body'] = req_buffer_lines[len(req_buffer_lines) - 1]  # Last line is the body (or blank if no body.)
        
        return req

jimmo · 2022-10-29T00:47:17Z

Maintainer

Am I creating a double copy of the data with the split

Depending on how you count, it's actually a triple copy -- at one instant there will be:

the original req_buffer bytes
the decoded str
the split list of strs

The second one is temporary and nothing holds a reference to it, but it will not be freed until a collection runs.

(That point is a bit subtle -- at a high level there really isn't a difference between "available for free" and "actually free" because (see below) an allocation attempt will implicitly collect if necessary, but practically there is a difference when it comes down to heap fragmentation and peak memory usage).

Or, is MicroPython smart enough to see that req_buffer is never used again and will free up the space it was occupying?

As long as there still exists a reference to req_buffer, the GC cannot reclaim it. There's no analysis of "this variable is not used later in this function" (I'm not sure that you could do this in a way that wouldn't subtly break programs -- the rule is that scope is what is important, not usage).

But you can control this manually -- Normally I would suggest that you can use del req_buffer to remove the reference held by the local variable, however in this case req_buffer is an argument, so the caller still holds a reference to it.
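To illustrate the point about arguments, here is a minimal sketch (the function name `parse_lines` and the sample request are made up for the example). A `del` inside the function only removes the function's own name for the buffer; the object stays alive until the caller drops its reference too.

```python
def parse_lines(buf):
    # Hypothetical helper, only for illustration.
    lines = buf.decode('utf8').split('\r\n')
    # This removes the local name only. The caller's variable still
    # refers to the same bytes object, so it cannot be reclaimed yet.
    del buf
    return lines

req_buffer = b'GET / HTTP/1.1\r\n\r\n'
lines = parse_lines(req_buffer)

# Still valid here: the caller's reference kept the bytes alive.
still_there = len(req_buffer)

# Dropping the caller's reference is what makes the buffer collectable
# (it is actually reclaimed whenever the next collection runs).
del req_buffer
print(lines, still_there)
```

So if the goal is to release the raw buffer early, the `del` has to happen in the code that owns the last reference, not in the parser.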

Do I need to force this with a gc.collect() after the split, or does it happen automatically?

Manually calling gc.collect is only useful to help force the gc to run at a time where it will have the most benefit in terms of freeing up contiguous memory (i.e. you should explicitly call gc.collect when there are the fewest "live" objects). Calling it manually will never help with the discovery of "freeable" regions.

The thing that does happen automatically is that if an allocation fails, it implicitly does a gc.collect then re-tries the allocation.
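As a rough sketch of "collect when the fewest objects are live" (the `handle_request` and `serve` functions are invented for the example, not part of any library), a server loop could collect between requests, after the per-request temporaries have become unreachable:

```python
import gc

def handle_request(raw):
    # Hypothetical handler: parsing creates short-lived temporaries
    # (the decoded str and the list of lines).
    lines = raw.decode('utf8').split('\r\n')
    return lines[0]

def serve(requests):
    results = []
    for raw in requests:
        results.append(handle_request(raw))
        # Quiet point: the temporaries from this request are no longer
        # referenced, so collecting here reclaims them before the next
        # request starts allocating.
        gc.collect()
    return results

first_lines = serve([b'GET /a HTTP/1.1\r\n\r\n', b'PUT /b HTTP/1.1\r\n\r\n'])
print(first_lines)
```

Leaving the explicit `gc.collect()` out would still work, since a failed allocation triggers a collection anyway; the explicit call only chooses when it happens.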

When parsing the lines in req_buffer_lines, should I be deleting them when I'm done so they can be garbage collected?

What happens here is that split creates a list of strings. Each string is referenced by the list. You could in theory remove the elements from the list (e.g. by assigning None to the list elements), but that wouldn't help solve the fact that you needed to allocate the full set to start with. This is just an unfortunate inherent limitation of split().

(FWIW, this is why MicroPython has an ilistdir() as well as listdir(). listdir works like split, whereas ilistdir makes an iterator that returns a single element at a time. I'm not sure why listdir got the special treatment, but it would be useful to have something like this for split (but impossible to do in a CPython-compatible way).)

I gave a talk at the Melbourne MicroPython Meetup a couple of years ago that covers some of these concepts... https://www.youtube.com/watch?v=H_xq8IYjh2w It's a bit informal, and I would love to do a more polished version. It's hard to explain the impact of memory management on a program by giving a bunch of specific rules and examples, so instead the idea was to cover a bit of "how does the GC work at a fundamental level" so that the higher-level behaviours can be inferred from this.

1 reply

DavesCodeMusings Oct 29, 2022
Author

Thank you for that explanation. It really helps me visualize what's going on. I'll definitely check out the video link. And, I see I have some room for improvement in the way I handle the data in the buffer.

DavesCodeMusings · 2022-10-29T14:57:11Z

Author

I want to start this by saying I'm not looking for a fix at this point, I'm just sharing some findings I thought were interesting.

I tried optimizing my code a bit by swapping this line:

req_buffer_lines = req_buffer.decode('utf8').split('\r\n')

for this:

req_buffer_lines = req_buffer.split(b'\r\n')

and then decoding each line as I examine it.

I figured that by breaking the buffer into lines first, I would be decoding shorter strings, and that by reusing the variable that temporarily holds the decoded string I would reduce overall memory usage.

To test, I created two functions: one with the old way of converting the entire buffer to string then splitting into lines, and another with the new way of splitting the buffer into lines and decoding a single line at a time to string.

I printed free memory at various points. Here's what I got...

Using the original method of converting entire buffer to string and then splitting into lines:

33904 RAM at start
33824 RAM after buffer allocation
32800 RAM after request parse
33520 RAM after garbage collection
{'path': '/gpio/2', 'headers': {'Content-Type': 'text/plain'}, 'method': 'PUT', 'query': {'bar': 'baz', 'foo': 'bar'}, 'http_version': 'HTTP/1.1', 'body': ''}

Alternative method of splitting buffer into lines and then converting individual lines to strings:

33904 RAM at start
33824 RAM after buffer allocation
32688 RAM after request parse
33488 RAM after garbage collection
{'path': '/gpio/2', 'headers': {'Content-Type': 'text/plain'}, 'method': 'PUT', 'query': {'bar': 'baz', 'foo': 'bar'}, 'http_version': 'HTTP/1.1', 'body': ''}

It seems I've actually made things worse.
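For anyone who wants the differences spelled out, this small sketch just does the subtraction on the figures printed above (the labels and variable names are mine; the numbers are copied from the two runs). Treat the results as approximate, since the print calls themselves may allocate a little.

```python
# Figures copied from the two runs above (bytes reported by gc.mem_free()).
runs = {
    'decode then split': {'before': 33824, 'after_parse': 32800, 'after_gc': 33520},
    'split then decode': {'before': 33824, 'after_parse': 32688, 'after_gc': 33488},
}

summary = {}
for name, r in runs.items():
    allocated = r['before'] - r['after_parse']  # everything allocated while parsing
    retained = r['before'] - r['after_gc']      # still live after gc.collect()
    reclaimed = r['after_gc'] - r['after_parse']  # garbage freed by gc.collect()
    summary[name] = (allocated, retained, reclaimed)
    print(name, '- allocated:', allocated, 'retained:', retained, 'reclaimed:', reclaimed)
```

By these numbers the second version allocated 112 bytes more during the parse and kept 32 bytes more afterwards.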

The code I used to test is shown below. All I did between the two runs was to change the function call from parse_http_request to parse_http_request2.

I'm not sure what I've proved here, if anything. It was just unexpected and I wanted to share in case anyone else is interested.

import gc
gc.collect()

def parse_query_string(query_string):
    """
    Split a URL's query string into individual key/value pairs
    (ex: 'pet=Panda&color=Red' becomes {'pet': 'Panda', 'color': 'Red'})
    Args:
        query_string (string): the query string portion of a URL (without the leading ? delimiter)
        
    Returns:
        dictionary: key/value pairs
    """
    query = {}
    query_params = query_string.split('&')
    for param in query_params:
        if (not '=' in param):  # A key with no value, like: 'red' instead of 'color=red'
            key=param
            query[key] = ''
        else:
            key, value = param.split('=', 1)  # Split on the first '=' only, so a value may itself contain '='
            query[key] = value
                
    return query


def parse_http_request(req_buffer):
    """
    Given a raw HTTP request, return a dictionary with individual elements broken out
     
    Args:
        req_buffer (bytes): the unprocessed HTTP request sent from the client

    Raises:
        exception: when the request buffer is empty

    Returns:
        dictionary: key/value pairs including, but not limited to method, path, query, headers, body, etc.
            or None type if parsing fails
    """
    assert (req_buffer != b''), 'Empty request buffer.'

    req = {}
    req_buffer_lines = req_buffer.decode('utf8').split('\r\n')
    req['method'], target, req['http_version'] = req_buffer_lines[0].split(' ', 2)  # Example: GET /route/path HTTP/1.1
    if (not '?' in target):
        req['path'] = target
    else:  # target can have a query component, so /route/path could be something like /route/path?state=on&timeout=30
        req['path'], query_string = target.split('?', 1)
        req['query'] = parse_query_string(query_string)

    req['headers'] = {}
    for i in range(1, len(req_buffer_lines) - 1):
        if (req_buffer_lines[i] == ''):  # Blank line signifies the end of headers.
            break
        else:
            name, value = req_buffer_lines[i].split(':', 1)
            req['headers'][name.strip()] = value.strip()
                
    req['body'] = req_buffer_lines[len(req_buffer_lines) - 1]  # Last line is the body (or blank if no body.)
        
    return req


def parse_http_request2(req_buffer):
    """
    Given a raw HTTP request, return a dictionary with individual elements broken out
     
    Args:
        req_buffer (bytes): the unprocessed HTTP request sent from the client

    Raises:
        exception: when the request buffer is empty

    Returns:
        dictionary: key/value pairs including, but not limited to method, path, query, headers, body, etc.
            or None type if parsing fails
    """
    assert (req_buffer != b''), 'Empty request buffer.'

    req = {}
    req_buffer_lines = req_buffer.split(b'\r\n')
    line = req_buffer_lines[0].decode('utf-8')
    req['method'], target, req['http_version'] = line.split(' ', 2)  # Example: GET /route/path HTTP/1.1
    if (not '?' in target):
        req['path'] = target
    else:  # target can have a query component, so /route/path could be something like /route/path?state=on&timeout=30
        req['path'], query_string = target.split('?', 1)
        req['query'] = parse_query_string(query_string)

    req['headers'] = {}
    for i in range(1, len(req_buffer_lines) - 1):
        line = req_buffer_lines[i].decode('utf-8')
        if (line == ''):  # Blank line signifies the end of headers.
            break
        else:
            name, value = line.split(':', 1)
            req['headers'][name.strip()] = value.strip()
                
    req['body'] = req_buffer_lines[len(req_buffer_lines) - 1].decode('utf-8')  # Last line is the body (or blank if no body.)
        
    return req


print(f'{gc.mem_free()} RAM at start')
req_buffer = b'PUT /gpio/2?foo=bar&bar=baz HTTP/1.1\r\nContent-Type: text/plain\r\n\r\noff\r\n'
print(f'{gc.mem_free()} RAM after buffer allocation')
req = parse_http_request2(req_buffer)
print(f'{gc.mem_free()} RAM after request parse')
gc.collect()
print(f'{gc.mem_free()} RAM after garbage collection')
print(req)

2 replies

jimmo Oct 31, 2022
Maintainer

The simple explanation is that in your new way (decode after splitting), despite creating the same total copies of string data you end up creating more total string objects. A string object is 16 bytes. That said, your peak "live" usage will be lower (two-thirds), despite having a higher peak "allocated" amount.

(There's more to this, e.g. not all strings are string objects and they can be interned strings instead, but in this case I think this explains what's going on).

DavesCodeMusings Oct 31, 2022
Author

After watching the video you linked, I began to suspect the 16-byte minimum allocation or fragmentation from creating the line variable inside a loop might be the cause.

In the end, I decided to take a third approach and do this:

req_buffer_string = req_buffer.decode('utf8')
req_buffer_lines = req_buffer_string.split('\r\n')
del req_buffer_string
gc.collect()

and then examine req_buffer_lines to parse the request. From the print(gc.mem_free()) I did along the way, it seems to be the best of the three approaches. Though I'll probably remove the forced gc.collect() and let it happen automatically as needed.

This has been a fascinating journey into the depths of MicroPython and I appreciate your assistance in understanding it.

rkompass · 2022-10-31T17:59:43Z


Could it be that you need something like isplit_rn() in the following code example

foo0 = "This is\r\na\r\nvery very\r\nlong\r\nmulti-line\r\nstring."
foo1 = "This is\r\na\r\nvery very\r\nlong\r\nmulti-line\r\nstring.\r\n"

def isplit_rn(s):
    p = -1
    l = len(s)
    while True:
        n = s.find('\n', p+1)
        if n < 0:
            if p+1 < l:
                yield s[p+1:]
            break
        yield s[p+1:n-1]
        p = n

for s in isplit_rn(foo0):
    print(len(s), s)
for s in isplit_rn(foo1):
    print(len(s), s)

print(foo0.splitlines())
print(foo1.splitlines())

i.e. a string split generator that returns the line substring memory views (without any new allocation)? I found this function here and adapted it a bit. Also see this discussion.
I assume the above function is rather slow, but it should have a low memory footprint (which perhaps you want to test).
Unfortunately we have no generators in viper, otherwise this code could be made to run much faster.
In ordinary Python, re.finditer() seems to be the method of choice, but it is not available in uPy.
Apparently there is no other equivalent method available in memory-aware MicroPython? The reasons might be Python compatibility, where isplit was not accepted as a new string function, and finditer probably being awfully complex?

Then it seems we have readfrom_mem, writeto_mem in different circumstances like i2c, but not for string decoding.

0 replies

jimmo · 2022-11-01T01:38:34Z

Maintainer

i.e. a string split generator that returns the line substring memory views (without any new allocation)?

@rkompass Unfortunately this (i.e. slicing a string or bytes) does cause an allocation. You can do this with memoryview though, in which case the only allocation is for each memoryview instance (16 bytes) but not the underlying string data.
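A minimal sketch of what that could look like for the request buffer in this thread (the generator name `iter_lines_mv` is made up for the example). It yields memoryview slices of the original bytes, so the line data itself is not copied; each yielded view is still a small object of its own.

```python
def iter_lines_mv(buf):
    # Yield each '\r\n'-terminated line of buf as a memoryview slice.
    # The views share buf's memory; only the view objects are allocated.
    mv = memoryview(buf)
    start = 0
    while True:
        end = buf.find(b'\r\n', start)
        if end < 0:
            # Trailing data without a final '\r\n', if any.
            if start < len(buf):
                yield mv[start:]
            return
        yield mv[start:end]
        start = end + 2  # skip over the '\r\n' separator

request = b'PUT /x HTTP/1.1\r\nA: b\r\n\r\noff\r\n'
for line in iter_lines_mv(request):
    # bytes(line) copies just this one line, only when it is needed.
    print(bytes(line))
```

Note that the views keep the whole original buffer alive for as long as any of them is referenced, which is the flip side of not copying.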

1 reply

jimmo Nov 1, 2022
Maintainer

You could imagine that in theory string (or bytes, but not bytearray) slicing could avoid the copy/allocation, however in some scenarios you genuinely do want it to be a copy (i.e. taking a small substring of a large string should not keep the original string pinned in RAM).

rkompass · 2022-11-01T09:35:13Z


Thank you @jimmo.
Just to clarify this for myself I tried:

import gc, random

s = 4

gc.collect(); print(gc.mem_free(), end=': ')

for i in range(1000):
    s = s + i

print(gc.mem_free())

# -----------------------------------------
gc.collect()

b = bytearray((random.getrandbits(8) for _ in range(5000)))
s = b[0:4]
a = 0
e = 4
s = b[a:e]

gc.collect(); print(gc.mem_free(), end=': ')

for i in range(1000):
    a = e+1
    e = a+4
    s = b[a:e]

print(gc.mem_free())

# -----------------------------------------
c = memoryview(b)
s = c[0:4]
a = 0
e = 4
s = c[a:e]

gc.collect(); print(gc.mem_free(), end=': ')

for i in range(1000):
    a = e+1
    e = a+4
    s = c[a:e]

print(gc.mem_free())

and I get:

187344: 187328
182256: 134272
182224: 150224

the last line indicates that s = c[a:e] consumes 32 bytes in each iteration, 16 for the right side memory view production and 16 for the left side assignment?

2 replies

rkompass Nov 1, 2022

And yet another question:
If I iterate through a long array / bytearray / string and use every part only once until the end, is there a way to reassign say the last half of the array to a new variable without allocating the contents again, and freeing the first half (making it available for gc)?

jimmo Nov 1, 2022
Maintainer

@rkompass

the last line indicates that s = c[a:e] consumes 32 bytes in each iteration, 16 for the right side memory view production and 16 for the left side assignment?

Not quite... there's no "left side assignment" -- assignment is just assigning the reference, no copies or allocations there.

It's 32 bytes each because to make a memoryview slice, first you have to allocate a slice object (16 bytes), and then the actual memoryview (another 16 bytes). There is definitely some scope for optimisation here, it's really unfortunate to have to allocate the slice to then immediately use it. (Lots of details here... difficult to do in a CPython compatible way, if someone feels adventurous it would be great to make a PEP to extend CPython's memory view in a way to support zero-allocation use cases and then we could implement that).

And just to confirm -- the middle line is exactly what you'd expect too -- 182256-134272 == 1000*(16+16+16)-16 (16 for the slice, 16 for the bytearray, and 16 (rounded up from 4) bytes for the copied data, then -16 for the net difference in live objects).
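The same bookkeeping, written out as a quick check against the printed figures (variable names are mine; the 16-byte unit is the object/block size used in the explanation above):

```python
BLOCK = 16  # allocation unit, as used in the explanation above

# bytearray slicing, per iteration: slice object + new bytearray object
# + copied data (4 bytes rounded up to one block). One fewer block in the
# net total accounts for the difference in live objects.
bytearray_before, bytearray_after = 182256, 134272
expected_bytearray = 1000 * (BLOCK + BLOCK + BLOCK) - BLOCK

# memoryview slicing, per iteration: slice object + memoryview object,
# with no copy of the underlying data.
mv_before, mv_after = 182224, 150224
expected_mv = 1000 * (BLOCK + BLOCK)

print(bytearray_before - bytearray_after, expected_bytearray)
print(mv_before - mv_after, expected_mv)
```

Both pairs match: 47984 bytes for the bytearray loop and 32000 bytes for the memoryview loop.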

If I iterate through a long array / bytearray / string and use every part only once until the end, is there a way to reassign say the last half of the array to a new variable without allocating the contents again, and freeing the first half (making it available for gc)?

No, I don't think so.

The heap does support shrinking allocations, so in theory we could make this work in conjunction with the memmove that the following code does:

b = bytearray(1024)
b[:] = b[512:]

The last line doesn't cause any allocation (except for the two slice objects), it just copies the second half of b down to the first half. However even though len(b) == 512, the underlying buffer is still the full 1024 bytes. I think we could probably put a gc_realloc there and reclaim the second half.
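A small sketch that checks only the Python-visible result of that assignment (the fill pattern and the `tail` copy are just there so the outcome can be verified; they are not part of the technique). Whether the underlying buffer shrinks is an implementation detail and is not observable from this code.

```python
b = bytearray(1024)
for i in range(1024):
    b[i] = i & 0xFF          # recognisable pattern: 0..255 repeated

tail = bytes(b[512:])        # extra copy, kept only to verify the result
b[:] = b[512:]               # replace the whole content with its second half

print(len(b), bytes(b) == tail)
```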
