The power of coroutines in statistics #10147
rkompass started this conversation in Show and tell
Replies: 3 comments
-
That is a nice use of coroutines/generators. Another approach is to use closures:

def amin():
    s = None

    def inner(x=None):
        nonlocal s
        if x is not None:
            s = x if s is None else min(s, x)
        return s

    return inner

Demo:

mymin = amin()
for x in range(10, 2, -1):
    mymin(x)  # Returns running minimum if required
print(f"minimum is {mymin()}")
-
It occurred to me later that the use of closures can be generalised. This will work with any function of two variables:

def do_func(func):
    s = None

    def inner(x=None):
        nonlocal s
        if x is not None:
            s = x if s is None else func(s, x)
        return s

    return inner

Demo:

def demo():
    def add(x, y):
        return x + y

    mymin = do_func(min)  # Create a function for each operation
    mymax = do_func(max)
    mysum = do_func(add)
    for x in range(10, 2, -1):
        mymin(x)  # Returns running minimum
        mymax(x)
        mysum(x)
    print(f"minimum is {mymin()} max is {mymax()} sum is {mysum()}")  # Final values
-
Everything in one function:

def min_max_sum(values):
    iterator = iter(values)
    _min = _max = _sum = next(iterator)
    for value in iterator:
        _min = min(_min, value)
        _max = max(_max, value)
        _sum += value
    return _min, _max, _sum

def demo():
    result = min_max_sum(range(10, 2, -1))
    print("minimum is {} max is {} sum is {}".format(*result))

You can also easily convert it to a generator:

def min_max_sum_gen(values):
    iterator = iter(values)
    _min = _max = _sum = next(iterator)
    yield _min, _max, _sum
    for value in iterator:
        _min = min(_min, value)
        _max = max(_max, value)
        _sum += value
        yield _min, _max, _sum

def demo2():
    for row in min_max_sum_gen(range(10, 2, -1)):
        print("minimum is {} max is {} sum is {}".format(*row))
-
Hello, I want to share an approach I found when studying "Clean Code in Python" by Mariano Anaya.
Basically in his book on page 253 he suggests doing statistics with a stream of data retrieved from a file (CSV format).
As we want to compute the minimum, maximum and average, he suggests tripling the stream with itertools.tee and letting min(), max() and sum() each consume their own copy.
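A minimal sketch of such a tee-based processing function, assuming a stream of numeric values (the function name and the way the average is formed are illustrative, not the book's exact code):

import itertools

def process(stream):
    # Three independent iterators over the same stream; each reducing
    # step consumes its own copy in full, so tee has to buffer whatever
    # the slower iterators have not yet seen.
    s_min, s_max, s_sum = itertools.tee(stream, 3)
    total = count = 0
    for value in s_sum:        # sum and count in one pass, for the average
        total += value
        count += 1
    return min(s_min), max(s_max), total / count

Called on a generator over the parsed CSV values, this returns minimum, maximum and average in a single call.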
This looks elegant, but behind the curtain it is not: tee stores the whole sequence and returns independent iterators over it, which are then consumed by min(), max() and sum(). As a consequence, on my Pico only a data file of 1000 lines can be processed; 2000 lines already lead to a memory error.

The idea: why not program min, max and sum the other way round? We stuff data into them and get a result when desired.
This is achieved with coroutines:
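A sketch of what such an accumulator coroutine can look like, written as a generator-based coroutine that is primed with next() and fed with send(); the name running and the send(None) query convention are illustrative choices, not necessarily the original snippet:

def running(func):
    # Generator-based coroutine holding a running result:
    # send(x) folds x into the result with func and yields the new result,
    # send(None) yields the current result without changing it.
    result = None
    while True:
        value = yield result
        if value is not None:
            result = value if result is None else func(result, value)

For example:

rmin = running(min)
next(rmin)               # prime the coroutine up to its first yield
rmin.send(7)
rmin.send(3)
print(rmin.send(None))   # -> 3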
Now the processing function looks like:
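Again only a sketch, using the running coroutine from above with illustrative names: the processing function keeps nothing but the current value and three running results, so memory use stays flat however long the stream is.

def process(stream):
    rmin, rmax, rsum = running(min), running(max), running(lambda a, b: a + b)
    for coro in (rmin, rmax, rsum):
        next(coro)                        # prime each coroutine
    count = 0
    for value in stream:
        rmin.send(value)
        rmax.send(value)
        rsum.send(value)
        count += 1
    # send(None) only reads the current results
    return rmin.send(None), rmax.send(None), rsum.send(None) / count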
and I can process very long (arbitrarily long?) datasets. Speed is roughly the same as before, probably because of the slowness of tee in the original version. It takes 8 s to process 10000 data points this way.
If you like this approach, here is a coroutine for mean and standard deviation:
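A sketch of one way to write such a coroutine, using Welford's online algorithm; the use of the sample standard deviation and the send(None) query convention are assumptions, not necessarily the original snippet:

import math

def mean_std():
    # send(x) updates the running statistics,
    # send(None) yields the current (mean, standard deviation) unchanged.
    n = 0
    mean = 0.0
    m2 = 0.0                        # sum of squared deviations from the mean
    result = (None, None)
    while True:
        value = yield result
        if value is not None:
            n += 1
            delta = value - mean
            mean += delta / n
            m2 += delta * (value - mean)
            std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
            result = (mean, std)

Usage:

ms = mean_std()
next(ms)                            # prime
for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    ms.send(x)
print(ms.send(None))                # mean 5.0, sample std about 2.14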