MATH 351 Homework 04

Welford's online algorithm for computing the mean and variance work with just one data point at a time, as if the data were streaming in and you didn't have the memory or ability to store the past data.

I prefer to write the math down a bit differently, but to the same effect. Initialize N=0,  m=0,  v=0N = 0, \; m = 0, \; v = 0. Update these variables with a new observation xx

N+=1w=1/Nd=xmm+=dwv+=vw+d2w(1w)\def\pluseq{\mathrel{+}=}\begin{align*}N & \pluseq 1\\w &= 1 / N\\d &= x - m\\m & \pluseq d * w\\v & \pluseq -v * w + d ^ 2 * w * (1 - w)\end{align*}
  1. Write a Python class called OnlineMoments, which implements the following API:

    om = OnlineMoments()
    om.update(1)
    om.update(2)
    om.update(3)
    om.mean()
    om.var()

    Ensure your class passess the following tests:

    om = OnlineMoments()
    om.update(1)
    om.update(1)
    om.update(1)
    assert om.mean() == 1
    assert om.var() == 0
    import numpy as np
    om = OnlineMoments()
    om.update(1)
    om.update(2)
    om.update(3)
    assert om.mean() == 2
    assert np.isclose(om.var(), np.var([1, 2, 3]))
    

  2. Why did we need to use the Numpy function isclose? Read the Python webpage Floating Point Arithmetic: Issues and Limitations most of which is not specific to Python.

  3. Read the subsections how does floating point work? and a variance calculation gone wrong on the incredibly informative webpage Examples of floating point problems by Julia Evans.

  4. Write one more test of your class OnlineMoments using the more challenging example provided on Julia's webpage in the subsection Examples of floating point problems. Hint: you'll have to write a for-loop to make your class methods fit Julia's example.