MATH 385 Week 05 Worksheet
- abs
- size
- sort
- sqrt
- sum
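As a quick refresher, the sketch below exercises the five functions listed above on a small made-up array. It assumes the list refers to the NumPy functions np.abs, np.size, np.sort, np.sqrt, and np.sum, and that NumPy is imported as np; the array values are invented for illustration only.

```python
import numpy as np

# A small made-up array, used only for illustration.
x = np.array([4.0, -9.0, 1.0, 16.0])

abs_x = np.abs(x)        # element-wise absolute value
n = np.size(x)           # number of elements in the array
sorted_x = np.sort(x)    # returns a sorted copy; x itself is left unchanged
roots = np.sqrt(abs_x)   # element-wise square root of the non-negative values
total = np.sum(x)        # sum of all elements

print(abs_x, n, sorted_x, roots, total)
```

Note that np.sort returns a new array rather than sorting in place, which matters for the "sorted copy" wording used in the exercises.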
- Write a Python function mean(x) which accepts one NumPy array as the only argument, calculates, and returns the mean of the array.
- Write a Python function std(x) which accepts one NumPy array as the only argument, calculates, and returns the standard deviation of the array, where the mean is calculated with mean(x).
- Write a Python function median(x) which accepts one NumPy array as the only argument, calculates, and returns the median of the array. A simple version of the median can be calculated as follows. If x has an even number of values, return the mean of the two numbers in the middle of a sorted copy of x. If x has an odd number of values, return the number in the middle of a sorted copy of x.
- Write a Python function mad(x) which accepts one NumPy array as the only argument, calculates, and returns the median absolute deviation of the array. The median absolute deviation of an array can be calculated as follows. Calculate and store the median of x. Return the median of the absolute value of the difference between x and its median.
- Write a Python function dot(x, y) which accepts two NumPy arrays, calculates, and returns the following calculation.
  $$\sum_{i} x_i y_i$$
- Write a Python function norm(x) which accepts one NumPy array, calculates, and returns the following calculation.
  $$\sqrt{\sum_{i} x_i^2}$$
- Write a Python function cosine_similarity(x, y) which accepts two NumPy arrays, calculates, and returns the following calculation.
  $$\frac{\mathrm{dot}(x, y)}{\mathrm{norm}(x) \, \mathrm{norm}(y)}$$
  This is an increasingly popular function in modern large language models. Here's a relatively simple read about this calculation, albeit written in JavaScript: "How does cosine similarity work?"

Uniform sampling from a stream of numbers, where you don't have the capacity (memory or otherwise) to store more than one of the numbers at a time, is a clever trick.
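One way to see why a single stored value can be enough: the short derivation below assumes the rule commonly used for this trick (reservoir sampling with a reservoir of size one), namely that the $i$-th value in the stream replaces the stored value with probability $1/i$. The symbols $n$ (the number of values seen so far) and $j$ (the position of one particular value) are introduced here only for illustration. Under that assumption, the $j$-th value is the stored one after $n$ values if it was stored at step $j$ and never replaced afterwards:

$$
P(\text{stored value is } x_j)
  = \frac{1}{j} \prod_{i=j+1}^{n} \left(1 - \frac{1}{i}\right)
  = \frac{1}{j} \cdot \frac{j}{j+1} \cdot \frac{j+1}{j+2} \cdots \frac{n-1}{n}
  = \frac{1}{n}.
$$

Every value seen so far is therefore equally likely to be the stored one, which is what uniform sampling requires.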
Write a Python class called OnlineUniformSampler, which implements the following API:

    ou = OnlineUniformSampler(95928)
    ou.update(1)
    ou.update(2)
    ou.update(3)
    ou.sample()
    ou.count()

The method sample() should return one of the values passed to the possibly many calls of update(), chosen uniformly at random.

The method update() should implement an algorithm based on the following description. For the first value passed to update(), store it. For the $i$-th call to update(x_i), replace the stored value with the passed in value x_i with probability $1/i$. Not all calls to update() will successfully replace the stored value. To replace a stored value u with the $i$-th value x with probability p, use code like

    rng = np.random.default_rng(seed)
    if rng.uniform() <= p:
        u = x

where the variable rng should be re-used within this class.

- Use Pandas to read the CSV dataset about penguins at the following URL
  https://raw.githubusercontent.com/roualdes/data/master/penguins.csv
  to create a DataFrame of these data.
- Print the top 6 rows of this DataFrame.
- Print the bottom 8 rows of this DataFrame.
- How many columns are in this DataFrame? How many rows? Instead of writing out numbers, show me the answers to these questions with Python code.
- What type is your DataFrame? What type is each of the columns? What are the types of the elements of each column? Instead of writing out the types, show me the answers to these questions with Python code.
- What is type object all about?
- Do your NumPy functions above work on Pandas Series? Why?
- Calculate the median bill_length_mm of all penguins whose bill_length_mm is within plus/minus one standard deviation of the mean bill length.
- Calculate the cosine similarity between bill_length_mm and bill_depth_mm.
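The Pandas steps above can be sketched roughly as follows. So that it runs without network access, this sketch reads a tiny made-up CSV from a string instead of the real URL; the column names bill_length_mm and bill_depth_mm come from the worksheet, while the species column and every number are invented for illustration. It also uses Pandas' built-in mean(), std(), and median() rather than the hand-written functions from earlier, so treat it as a cross-check, not as the intended solution.

```python
import io

import pandas as pd

# Made-up stand-in for the penguins CSV so the sketch runs offline.
# With network access, pd.read_csv() also accepts a URL string directly.
csv_text = """species,bill_length_mm,bill_depth_mm
Adelie,38.0,18.5
Adelie,42.0,17.0
Gentoo,45.0,14.0
Gentoo,47.0,15.5
Chinstrap,48.0,18.0
Chinstrap,56.0,19.0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))            # first rows; the worksheet asks for 6
print(df.tail(2))            # last rows; the worksheet asks for 8

n_rows, n_cols = df.shape    # shape is a (rows, columns) tuple
print(n_rows, n_cols)

print(type(df))              # the type of the DataFrame itself
print(df.dtypes)             # one dtype per column

bill = df["bill_length_mm"]  # a single column is a Series
m = bill.mean()
s = bill.std()               # Pandas defaults to ddof=1 (sample standard deviation)

# Keep only the values within one standard deviation of the mean.
within = bill[(bill >= m - s) & (bill <= m + s)]
print(within.median())
```

The real dataset may contain missing values, which the Pandas methods used here skip by default; hand-written NumPy functions would need to handle them explicitly.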