MATH 385 Week 11 Worksheet
Welford's online algorithm for computing the mean and variance (and therefore the standard deviation, if desired) works with just one data point at a time, as if the data were streaming in and you didn't have the memory or ability to store past data.
I prefer to write the math a bit differently than the Wikipedia page linked above, but to the same effect. Initialize . Update these variables with a new observation
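For reference, one standard way to write Welford's update is sketched below. The symbols n (count) and m (running mean) are assumptions introduced here; v is taken to be the biased variance that om.var_biased() returns. The worksheet's own notation may differ.

```latex
\begin{aligned}
\text{initialize: } & n = 0,\quad m = 0,\quad v = 0 \\
\text{update with } x\text{: } & n \leftarrow n + 1 \\
& \delta = x - m \\
& m \leftarrow m + \delta / n \\
& v \leftarrow v + \frac{\delta\,(x - m) - v}{n} \quad (\text{using the updated } m) \\
\text{unbiased variance: } & \frac{n}{n-1}\, v \quad (n > 1)
\end{aligned}
```

The v update follows from the usual sum-of-squares recursion: the sum of squared deviations grows by \delta (x - m) at each step, and v is that sum divided by n.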
Write a Python class called OnlineMeanVar, which implements the following API:

    om = OnlineMeanVar()
    om.update(1)
    om.update(2)
    om.update(3)
    om.mean()
    om.var()
    om.var_biased()
    om.count() # the number of times update() has been called
    om.size()
The method om.var_biased() should return the value of v. The method om.var() should return
The constructor should accept a size argument, which defaults to 1, that sets the number of means and variances to be tracked. If size is bigger than 1, then the argument x of om.update(x) should be a 1-dimensional array of length size. Further, if size is bigger than 1, then om.mean(), om.var(), and om.var_biased() should all return 1-dimensional arrays of length size.
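A minimal sketch of a class with this API is shown below, as one possible shape rather than the intended solution. Several details are assumptions: the private attribute names, returning plain floats when size is 1, returning NaN from var() before two observations have arrived, and computing var() as v times n / (n - 1).

```python
import numpy as np


class OnlineMeanVar:
    """Track running means and variances one observation at a time."""

    def __init__(self, size=1):
        self._size = size
        self._n = 0                   # number of update() calls so far
        self._m = np.zeros(size)      # running mean(s)
        self._v = np.zeros(size)      # running biased variance(s)

    def update(self, x):
        # Accept a scalar (size == 1) or a 1-D array of length size.
        x = np.asarray(x, dtype=float).reshape(self._size)
        self._n += 1
        delta = x - self._m           # deviation from the old mean
        self._m = self._m + delta / self._n
        # delta * (x - new mean) is the increment to the sum of squared deviations.
        self._v = self._v + (delta * (x - self._m) - self._v) / self._n

    def _out(self, a):
        # Assumption: scalars when size == 1, 1-D arrays otherwise.
        return float(a[0]) if self._size == 1 else a.copy()

    def mean(self):
        return self._out(self._m)

    def var_biased(self):
        return self._out(self._v)

    def var(self):
        # Assumption: the unbiased estimate is undefined for n < 2, so return NaN.
        if self._n < 2:
            return self._out(np.full(self._size, np.nan))
        return self._out(self._v * self._n / (self._n - 1))

    def count(self):
        return self._n

    def size(self):
        return self._size
```

With the three updates from the API listing (1, 2, 3), this sketch gives a mean of 2, a biased variance of 2/3, and an unbiased variance of 1, matching numpy's var with ddof=0 and ddof=1.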
Use the class above to construct a convergence path plot of paths for each of two different calculations of the variance, var() and var_biased(). Thus, your convergence path plot should have total paths. Each path should consist of data points.
- You can either initialize a dataframe with the appropriate columns and then use plotnine, or use matplotlib, like I did in our class notes.
- Generate fake data from whichever distribution you want.
- Proper design of the plot above means you should never need nested Python loops to answer this question.
- Color the paths based on which variance calculation is used.
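The hints above can be sketched with matplotlib as follows. This is only an illustration under stated assumptions: the path count (20), the points per path (500), and the standard normal data are placeholder choices, and the compact OnlineMeanVar here is a stand-in so the block runs on its own. The key idea is to let size equal the number of paths, so a single loop over time steps advances every path at once.

```python
import numpy as np


class OnlineMeanVar:
    """Compact stand-in for the worksheet's class (assumed implementation)."""

    def __init__(self, size=1):
        self.n, self.m, self.v = 0, np.zeros(size), np.zeros(size)

    def update(self, x):
        self.n += 1
        delta = x - self.m
        self.m = self.m + delta / self.n
        self.v = self.v + (delta * (x - self.m) - self.v) / self.n

    def var_biased(self):
        return self.v.copy()

    def var(self):
        # Undefined for a single observation, so report NaN there.
        if self.n < 2:
            return np.full_like(self.v, np.nan)
        return self.v * self.n / (self.n - 1)


def variance_paths(n_paths=20, n_points=500, seed=0):
    """Return (var_paths, biased_paths), each of shape (n_points, n_paths)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(loc=0.0, scale=1.0, size=(n_points, n_paths))
    om = OnlineMeanVar(size=n_paths)
    var_paths = np.empty((n_points, n_paths))
    biased_paths = np.empty((n_points, n_paths))
    # One loop over time steps; all paths advance together as a vector.
    for t, row in enumerate(data):
        om.update(row)
        var_paths[t] = om.var()
        biased_paths[t] = om.var_biased()
    return var_paths, biased_paths


def plot_paths(var_paths, biased_paths):
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    steps = np.arange(1, var_paths.shape[0] + 1)
    fig, ax = plt.subplots()
    # A 2-D y array draws one line per column, so no loop over paths is needed.
    unbiased_lines = ax.plot(steps, var_paths, color="tab:blue", alpha=0.4)
    biased_lines = ax.plot(steps, biased_paths, color="tab:orange", alpha=0.4)
    unbiased_lines[0].set_label("var()")
    biased_lines[0].set_label("var_biased()")
    ax.axhline(1.0, color="black", linestyle="--")  # true variance of N(0, 1)
    ax.set_xlabel("number of observations")
    ax.set_ylabel("variance estimate")
    ax.legend()
    return fig, ax


var_paths, biased_paths = variance_paths()
```

To draw the figure, call fig, ax = plot_paths(var_paths, biased_paths) and then save or show it, for example with fig.savefig("paths.png"). Labeling only the first line of each group keeps the legend to two entries, one per variance calculation.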
Do you notice any differences in the convergence paths between the two estimates of the variance? If so, what do you notice?