Notes of various kinds, for various things. Some of these might be helpful for some students in some classes. Some might help non-students. Most help me in some way.
Big ups to Nathan Tsoi for his advice on getting Jupyter set up behind an nginx reverse proxy at a specific path.
Jay Rotella, Ph.D. has some nice resources for teaching WILD 502 - Analysis of Population & Habitat Data. I should check more of it out one day.
Cosma Rohilla Shalizi explains this one perfectly himself: "This is a draft textbook on data analysis methods, intended for a one-semester course for advance undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression." Advanced Data Analysis from an Elementary Point of View. Some class materials associated with this book are found here.
Colin Rundel and Mine Çetinkaya-Rundel developed the R package ghclass for managing a class on GitHub.
Sebastian Nowozin has a great blog post on a streaming or online computation of log-sum-exp, namely $\log \sum_{n=1}^N \exp x_n$. There are well-known tricks to calculate log-sum-exp while avoiding underflow and overflow issues, but to perform that same trick and calculate it online is really novel.
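function logsumexp_stream(xs)
    T = float(eltype(xs))  # element type of the stream, promoted to floating point
    alpha = typemin(T)     # running maximum, starts at -Inf
    r = zero(T)            # running sum of exp(x - alpha)
    for x in xs
        if x <= alpha
            # x does not exceed the running maximum: add its scaled term
            r += exp(x - alpha)
        else
            # new maximum: rescale the accumulated sum, then add exp(x - x) = 1
            r *= exp(alpha - x)
            r += one(T)
            alpha = x
        end
    end
    return log(r) + alpha
end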
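For comparison, here is a minimal sketch of the usual batch version of the trick, which shifts by the maximum and therefore needs all of the data up front. The name logsumexp_batch is my own and does not come from Nowozin's post.

```julia
# Batch log-sum-exp via the max-shift trick (helper name is illustrative).
function logsumexp_batch(xs)
    m = maximum(xs)                  # largest element; requires a full pass over the data
    isfinite(m) || return float(m)   # -Inf, +Inf, or NaN maximum: return it as is
    # Every shifted term exp(x - m) lies in [0, 1], so the sum cannot overflow.
    return m + log(mapreduce(x -> exp(x - m), +, xs))
end

# Stays finite, whereas log(exp(1000.0) + exp(1000.0)) overflows to Inf.
logsumexp_batch([1000.0, 1000.0])
```

The batch version makes two passes over the data (one for the maximum, one for the sum), which is exactly what the streaming version avoids.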
For some fun with tail probabilities as related to t-distributions with various degrees of freedom, see Statistical Odds and Ends posts How heavy-tailed is the t distribution? and How heavy-tailed is the t distribution? (Part 2), and John D. Cook's post 50 sigma events for t distributions.
In an attempt to help students relate to probabilities of various magnitudes, here's a work-in-progress list of (hopefully) relatable probabilities. I'm aiming for at least one number of each magnitude. That is one probability $a \times 10^{-b}$ for $b \in \{1, \ldots, 79\}$ and any $a$.
Danielle Navarro's book Learning statistics with R has an associated GitHub page, on which one can find all the datasets used. Very helpful.
In my notebooks about models and extrapolation, I developed some simulated data representing the relationship between age and height of US children. The simulated data are based on percentiles found on the CDC webpage: Data Table of Stature-for-age Charts. I fit a cubic polynomial in age through the percentiles, added some noise, and then used these noisy data as my dataset.
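A rough sketch of that recipe, for my own future reference. The ages and heights below are made-up placeholders and are NOT the CDC percentiles; the noise level, sample size, and age range are likewise assumptions chosen only for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(1)  # arbitrary seed, only for reproducibility

# Placeholder values for illustration only; these are NOT the CDC percentiles.
age    = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0]            # years
height = [87.0, 109.0, 128.0, 144.0, 163.0, 174.0, 176.0]   # cm

# Design matrix with columns 1, age, age^2, age^3.
A = [a^p for a in age, p in 0:3]

# Least-squares coefficients of the cubic polynomial.
beta = A \ height

# Evaluate the fitted cubic: beta[1] + beta[2]*a + beta[3]*a^2 + beta[4]*a^3.
cubic(a) = evalpoly(a, beta)

# Simulate noisy observations around the fitted curve.
sigma = 3.0                        # assumed noise standard deviation, in cm
sim_age = 2 .+ 18 .* rand(200)     # ages drawn uniformly on [2, 20]
sim_height = cubic.(sim_age) .+ sigma .* randn(200)
```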
I dream of an online homework system for math and statistics (and more?). The calculator insect could be a helpful component of this.
Pavel Sountsov and Matt D. Hoffman, authors of Focusing on Difficult Directions for Learning HMC Trajectory Lengths, made available an implementation of SNAPER-HMC.
Larry Leemis, professor in the Department of Mathematics at Purdue University, made a Univariate Distribution Relationships webpage. Here is a corresponding paper.
ProbOnto is a website that maps distributions to other distributions through some transformations of the parameters. It makes for an excellent reference of probability distributions and common transformations thereof.
How to Collect, Store, and Plant Acorns is a short bulletin from the California Oak Foundation that describes what it says.
Felix Riedel has a nice blog post on speeding up pairwise distances of observations stored in matrices: Pairwise distances in R.
Wilson Ye Chen, co-author of Optimal Thinning of MCMC Output, has a relatively simple and easy-to-follow implementation of the methods found within the paper, stein.thinning.
A nice blog post about quasi-random sequences by Martin Roberts: The Unreasonable Effectiveness of Quasirandom Sequences
Modern Dive is a nice intro to stats OER that uses R and much of the tidyverse. Though they use too much of their own package for my taste, it's still a good book.
While browsing the Graz University of Technology 9-week course on "Bayesian Probability Theory", I came across the book Bayesian Probability Theory by Wolfgang von der Linden, Volker Dose, and Udo von Toussaint. Probably has some good examples in the physical sciences for teaching.
Johnny Chen's scripts to bootstrap a Julia kernel in Google Colab: colab-julia-bootstrap
mathchi.io claims to deliver TikZ code from a drawing. Half an effort gives some support to this claim.
The official TidyTuesday GitHub repository. A wealth of datasets.
Ivan hosts the blog Good news, everyone!. I should probably read the posts on Breaking sticks, or estimation of probability distributions using the Dirichlet process and Bayesian inference of the net promoter score via multilevel regression with poststratification.
Professor David E. Joyce of Clark University hosts course materials for Math 217/Econ 360, Probability and Statistics
An old, but potentially full of gems, teaching website called Chance.
The Graz University of Technology posted a fun-looking 9-week course on Bayesian Probability Theory.
The site Data Analysis in the Geosciences is a great resource for teaching statistics to non-stats majors and specifically geosciences-based majors. The first page on this site that I came across was Regression.
The Palaeontological Association hosts a column named PalaeoMaths 101, which is a great resource for teaching statistics to non-stats majors and specifically anthropology-based majors. Find helpful and probably more fun datasets here. I wish more academic disciplines did this sort of thing. The first one of these I came across was PalaeoMath: Part 2 - Regression II.
I should probably order this book one day, Bayesian Filtering and Smoothing.
The blog Neuroscience, Stats, and Coding by Jonas Kristoffer Lindeløv contains a great tutorial, Common statistical tests are linear models (or: how to teach stats).
Martha K. Smith, retired as Professor of Mathematics from the University of Texas at Austin, hosts the site Common Mistakes In Using Statistics, which has a nice summary of probability, What Is Probability?
The online book Statistical Thinking for the 21st Century could be good. Need to spend some time looking through it.
The site 3blue1brown looks promising. Check out their lessons on probability.
I'm not entirely sure who puts out the blog Lojic Technologies Blog, presumably Lojic Technologies, but there are some good posts on this site, particularly An intuition for the Fourier Transform. And to be fair, this post is just referencing the video found on 3blue1brown.
The online book Interpretable Machine Learning looks promising, but I haven't given it much time yet.
RegexOne is a great hands-on introduction to the world of regular expressions.
MIT's course The Missing Semester of Your CS Education looks like it will be a great resource for future data science courses.
The website SQLBolt is a great resource for beginners looking to dive into the world of SQL.
Mine Çetinkaya-Rundel is a superstar of the R world. Thankfully she also hosts her teaching materials on her website.
Kenneth Tay hosts a blog, Statistical Odds and Ends, which covers a wide range of topics, many of which are explained quite well.
Brian Keng, research director at BorealisAI, hosts a blog named Bounded Rationality.
Grant R. McDermott, an Assistant Prof. at the University of Oregon, who is interested in environmental economics & data science, hosts a blog. The post on Marginal effects and interaction terms introduced me to some R syntax within the function lm() that I'd never seen before.
The website Better Explained has some low-level explanations that might be helpful for simplifying various statistical or mathematical topics. The author Kalid Azad has some good intentions. I first found the page An Intuitive (and Short) Explanation of Bayes’ Theorem.
I'm not sure what the general site is all about, here, but the page Bayes' Rule might have some good explanations on it.
LessWrong is a place to 1. develop and train rationality, and 2. apply one’s rationality to real-world problems. The post The Best Textbooks on Every Subject is, if nothing else, a nice resource of some highly recommended (by who, exactly?) books.
NIST Digital Library of Mathematical Functions is a great resource, which I should definitely use more often.
Introduction to Probability for Data Science by Stanley H. Chan seems to be quite popular and has associated code in MATLAB, Python, Julia, and R.
Feature Engineering and Selection: A Practical Approach for Predictive Models is a free online book (hard copies for purchase), which focuses on good predictive modeling.
Some papers I should read, in no particular order.
Yuling Yao writes about statistics on his blog. The Bayesian ideas are often delightfully new and creative.
Thomas Lumley writes about statistics on his blog.
John D. Cook often posts about mathematics and sometimes statistics on his blog. The math can get pretty deep.
Frank Harrell makes a blog out of his website. His quick analyses are often quite insightful and come along with great experience in applied statistics.
The site R-bloggers aggregates some other info that I occasionally find helpful/fun/informative.
Some students might be interested in the blog Exploring Baseball Data with R.
Megan Higgs hosts the blog Critical Inference. I should probably make my students read her re-post How typical is a median person?, originally posted on Statisticians React to the News.
Christian Robert hosts a blog that I like to read.
Andreas Kröpelin has a nice blog post about Pluto.jl and using it to make presentations.
Paul Tol hosts some great notes on color-blind friendly colors within graphics. Here are some schemes that will likely be helpful later on.
bright:
high-contrast:
vibrant:
muted:
medium-contrast:
light:
There's a live cam on the Sacramento National Wildlife Refuge.
Murray Logan, maybe this person, developed the site Show us your R's for helping ecologists with statistics and R. The discussion on Species, Richness and Diversity is a nice little survey.
Applied Time Series Analysis for Fisheries and Environmental Sciences and the R package atsar
David Anderson and Kenneth Burnham's AIC Myths and Misunderstandings
Rob J. Hyndman's Facts and fallacies of the AIC
An interactive site from Brown University about statistics: Seeing Theory
Charles Frye's blog
STATS 1201 at Columbia University by Wenda Zhou in the summer of 2017
Probability and Bayesian Modeling by Jim Albert and Jingchen Hu
Aleph 0's YouTube video where he details some resources to self-teach pure mathematics.
Look, syntax-highlighted code and math.
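function sum(x)
    s = zero(eltype(x))  # accumulator with the same element type as x
    for n in eachindex(x)
        s += x[n]
    end
    return s  # without this, the function returns nothing (the value of the for loop)
end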
Using the definition $\mathbb{E}[g(X)] = \int g(x) dP$, take $g(x) = 1_A(x)$, so that we can define probability from an expectation: $\mathbb{E}[1_A(X)] = \mathbb{P}[A]$.
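A small Monte Carlo sketch of the same idea: averaging the indicator over draws of $X$ estimates $\mathbb{P}[A]$. The choices $X \sim N(0, 1)$ and $A = \{x > 1\}$, along with the sample size and seed, are my own assumptions for illustration.

```julia
using Random

Random.seed!(1)  # arbitrary seed, only for reproducibility

# Indicator of the event A = {x > 1}; returns true or false.
indicator_A(x) = x > 1

# Draws of X ~ N(0, 1).
xs = randn(1_000_000)

# The sample mean of 1_A(X) estimates E[1_A(X)] = P[A].
p_hat = count(indicator_A, xs) / length(xs)
```

For this event the exact value is $1 - \Phi(1) \approx 0.1587$, so p_hat should land close to that.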
The Probability Distribution Explorer is a pretty good reference site for commonly used probability distributions. Scroll down, most distributions are interactive.
Aki Vehtari is the source of so many modern Bayesian tools. Check out his page on Model assesment, selection, and inference after selection and specifically Cross Validation FAQ.
Adrian Price-Whelan has some blog posts I should read. I found him from his MCMC model The Joker, with code available at this repository.
Jake VanderPlas and his blog often have some cool stuff. For instance, check out the post Frequentism and Bayesianism: A Practical Introduction, which contains a nice explanation about how so-called "non-informative" priors are often not non-informative after all.
Yves Atchadé is a researcher whom I greatly admire, specifically with regard to Markov Chain Monte Carlo methods.
Jenny Bryan is such an R superstar that she switched from a tenured Associate Professor of Statistics position to an adjunct professor position and a software engineer at RStudio. In fall 2018, my MATH 385 Introduction to Data Science course found an issue with the tidyverse package readxl, and Jenny fixed it!
Arnaud Doucet is a researcher whom I greatly admire, specifically with regard to his work on Sequential Monte Carlo and Markov Chain Monte Carlo methods.
Danielle Navarro is an amazing educator and researcher. She's produced some great educational materials, e.g. Learning Statistics with R and Data Science with R, and I also really like her detailed thinking about science, e.g. The case for formal methodology in scientific reform.
Jeffrey Rosenthal runs probability.ca, which is a great resource for Jeffrey's own research and for researchers working in similar areas. The pages on Jeffrey's books, research, and fellow probabilists, in particular, are excellent.
Olle Häggström's book, Anthropic Bias sounds super interesting.
Cosma Shalizi, from whom I derived the idea for such notes. He calls his "notebooks".
A Google map highlighting some birding locations at which I've had success.
I wouldn't call myself a good knitter, but I occasionally knit small rectangles and use them as sponges. Davina's webpage Sheep and Stitch has been helpful.
A Geometric Interpretation of the Metropolis-Hastings Algorithm by L. J. Billera and P. Diaconis.
What Do We Know about the Metropolis Algorithm by P. Diaconis and L. Saloff-Coste.
The Metropolis-Hastings algorithm by C. P. Robert.
Stephen Rhodes has a blog post titled Logit Model for RJags, which I used to test and validate my implementation of Stan's HMC algorithm written in Julia. Code for recreating this simulated data and fitting the model in RJags is in this GitHub repository.