MATH 314 Homework 03

  1. Welford's online algorithm for computing the mean and variance/standard deviation work with just one data point at a time, as if the data were streaming in and you didn't have the memory or ability to store past data.

    I prefer to write the math a bit differently than the Wikipedia page linked above, but to the same effect. Initialize N=0,  m=0,  v=0N = 0, \; m = 0, \; v = 0. Update these variables with a new observation xx

    N+=1w=1/Nd=xmm+=dwv+=vw+d2w(1w)\def\pluseq{\mathrel{+}=}\begin{align*}N & \pluseq 1\\w &= 1 / N\\d &= x - m\\m & \pluseq d * w\\v & \pluseq -v * w + d ^ 2 * w * (1 - w)\end{align*}

    Write a Python class called OnlineMeanStd, which implements the following API:

    om = OnlineMeanStd()
    om.update(1)
    om.update(2)
    om.update(3)
    om.mean()
    om.std()
    om.count()
    om.size()
    

    Note that the mathematics above track the biased variance, where as I'm asking for unbiased standard deviation ss. The class method std() needs to make the appropriate conversion:

    s=vom.count()/(om.count() - 1)s = \sqrt{v * \text{om.count()} / (\text{om.count() - 1})}

    The constructor should accept a size argument, which defaults to 1, that sets the number of means and standard deviations to be tracked.