Average iter #1075

@michaelciraci

Overview

To calculate an average (mean) in Rust, normally something like this is used:

let a = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let avg = a.iter().sum::<f64>() / a.len() as f64;

That's fine for short vectors/iterators. However, as the input gets longer, floating-point rounding error accumulates in the running sum. For example:

let a = vec![1.96; 1024];
let avg = a.iter().sum::<f64>() / a.len() as f64;
assert_eq!(avg, 1.96); // panic!
// assertion `left == right` failed
//    left: 1.9600000000000248
//   right: 1.96

An iterative (running) mean avoids this by updating the average one element at a time instead of summing the entire iterator: http://www.heikohoffmann.de/htmlthesis/node134.html
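
For reference, the update rule behind the iterative mean can be derived by writing the mean of the first n elements in terms of the mean of the first n - 1. Here x_n is the n-th element and \mu_n the mean after n elements (notation chosen for this sketch):

```latex
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i
      = \frac{(n-1)\,\mu_{n-1} + x_n}{n}
      = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n},
\qquad \mu_1 = x_1
```

The intermediate values stay at the magnitude of the data instead of growing with n, which is why the rounding error behaves better than with a plain sum.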

Additionally, averaging an iterator that has been filtered or otherwise adapted is less trivial, because its length is not known up front and `len()` is not available.
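
As an illustration of the filtering case, here is a minimal sketch of what the sum-based approach looks like once a `filter` is involved. The function name `mean_of_non_negative` and the predicate are made up for this example; the point is only that the count has to be carried alongside the sum.

```rust
/// Hypothetical helper: sum-based mean of the finite, non-negative values in `values`.
/// Returns `None` when nothing passes the filter.
fn mean_of_non_negative(values: &[f64]) -> Option<f64> {
    // `Filter` does not implement `ExactSizeIterator`, so the count is
    // accumulated together with the sum in a single pass.
    let (sum, count) = values
        .iter()
        .copied()
        .filter(|x| x.is_finite() && *x >= 0.0)
        .fold((0.0_f64, 0_usize), |(sum, count), x| (sum + x, count + 1));

    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

With an `average()` adapter like the one proposed in this issue, the same computation would reduce to the filter chain followed by `.average()`.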

Other Languages

Both C# and Kotlin have this built into their standard library:

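  • C#: `Average` works on sequences of the built-in numeric types and returns `float` or `double` (f32 or f64 in Rust terms), plus `decimal` for `decimal` input
  • Kotlin: `average()` works on collections of the built-in numeric types and always returns `Double` (f64)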

Example Rust Implementation

pub trait AverageExt<T>: Iterator<Item = T> {
    fn average(self) -> Option<f64>;
}

impl<T, I> AverageExt<T> for I
where
    I: Iterator<Item = T>,
    T: Into<f64>,
{
    fn average(self) -> Option<f64> {
        let mut iter = self;
        // An empty iterator has no mean, so return `None` early.
        let mut avg: f64 = iter.next()?.into();
        let mut count: usize = 1;

        for num in iter {
            count += 1;
            // Running-mean update: move `avg` toward the new value by 1/count of the gap.
            let x: f64 = num.into();
            avg += (x - avg) / count as f64;
        }

        Some(avg)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn f() {
        let a = vec![1.0, 2.0, 3.0, 4.0, 5.0];
        let b = a.into_iter().average().unwrap();
        assert_eq!(b, 3.0);

        // Example 1 where the sum-based average does not work
        let a = vec![1e10 + 2.5; 1024 * 1024];
        let s1 = a.iter().sum::<f64>() / a.len() as f64;
        let s2 = a.into_iter().average().unwrap();
        assert_eq!(s2, 1e10 + 2.5);
        assert_ne!(s1, s2);
        // `assert_eq!(s1, s2)` panics here with:
        // assertion `left == right` failed
        //   left: 10000000002.214748
        //  right: 10000000002.5

        // Example 2 where the sum-based average does not work
        let a = vec![1.96; 1024];
        let s1 = a.iter().sum::<f64>() / a.len() as f64;
        let s2 = a.into_iter().average().unwrap();
        assert_eq!(s2, 1.96);
        assert_ne!(s1, s2);
        // `assert_eq!(s1, s2)` panics here with:
        // assertion `left == right` failed
        //    left: 1.9600000000000248
        //   right: 1.96

        // Implementation above is generic over input type
        let a: Vec<i32> = vec![1, 2, 3, 4, 5];
        let b = a.into_iter().average().unwrap();
        assert_eq!(b, 3.0);
    }
}

The only related issue I could find is #1030, but that one seems much more specific to statistics. A basic average method was also considered general enough for other languages to include in their standard libraries.

I can open a pull request if there is interest in this method. If we go forward with it, one open question is whether to always return f64 or to make the return type generic over f32 and f64.
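
To make the generic option concrete, a rough sketch could look like the following. `FloatMean` and `average_as` are hypothetical names invented for this illustration, not an existing API, and a free function is used instead of an extension trait only to keep the example short.

```rust
use std::ops::{Add, Div, Sub};

/// Hypothetical helper trait (not an existing API): the float types the
/// average may be returned as.
pub trait FloatMean:
    Copy + Add<Output = Self> + Sub<Output = Self> + Div<Output = Self>
{
    /// Converts the element count into the float type.
    fn from_count(count: usize) -> Self;
}

impl FloatMean for f32 {
    fn from_count(count: usize) -> Self {
        count as f32
    }
}

impl FloatMean for f64 {
    fn from_count(count: usize) -> Self {
        count as f64
    }
}

/// Iterative mean with a caller-chosen return type `F` (f32 or f64).
/// Returns `None` for an empty input.
pub fn average_as<F, T, I>(iter: I) -> Option<F>
where
    F: FloatMean,
    T: Into<F>,
    I: IntoIterator<Item = T>,
{
    let mut iter = iter.into_iter();
    let mut avg: F = iter.next()?.into();
    let mut count: usize = 1;

    for item in iter {
        count += 1;
        let x: F = item.into();
        // Same running-mean update as in the f64-only version.
        avg = avg + (x - avg) / F::from_count(count);
    }

    Some(avg)
}
```

A call would then name the output type explicitly, e.g. `average_as::<f32, _, _>(vec![1.0_f32, 2.0, 3.0])`. One consequence of this shape is that `i32` input can only be averaged as f64, since `i32` has no lossless `Into<f32>` conversion.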
