MATH 385 Week 07 Worksheet
Please submit one Jupyter notebook (worksheet07_solutions.ipynb)
by 11:59pm Pacific time on Friday, October 11, in Week 07 to
your Week 07 GitHub repository.
- Penguins
- Use Pandas to read the CSV dataset about penguins at the following URL
https://raw.githubusercontent.com/roualdes/data/master/penguins.csv
to create a DataFrame of these data.
- For each island, calculate the mean, standard deviation, and 95% confidence interval, where lower bound = mean - 1.96 * standard deviation and upper bound = mean + 1.96 * standard deviation.
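One way the per-island summary could be organized is sketched below. The worksheet does not say which measurement to summarize, so the column name body_mass_g is an assumption, as is the helper name island_summary; only the island column and the bound formulas come from the worksheet. The toy values are made up so the sketch runs without network access.

```python
import pandas as pd

# URL from the worksheet; reading it requires network access.
PENGUINS_URL = "https://raw.githubusercontent.com/roualdes/data/master/penguins.csv"


def island_summary(df, column):
    """Per-island mean, standard deviation, and the worksheet's 95% bounds.

    `column` is whichever numeric measurement is being summarized
    (hypothetical helper; the worksheet does not name one).
    """
    summary = df.groupby("island")[column].agg(["mean", "std"])
    # Bounds exactly as the worksheet defines them: mean -/+ 1.96 * std.
    summary["lower"] = summary["mean"] - 1.96 * summary["std"]
    summary["upper"] = summary["mean"] + 1.96 * summary["std"]
    return summary


# Tiny made-up frame so the sketch runs offline; these are not real data.
toy = pd.DataFrame({
    "island": ["A", "A", "A", "B", "B", "B"],
    "body_mass_g": [3000.0, 3200.0, 3400.0, 4000.0, 4500.0, 5000.0],
})
toy_summary = island_summary(toy, "body_mass_g")
print(toy_summary)

# With the real data (assuming the columns are named "island" and "body_mass_g"):
# penguins = pd.read_csv(PENGUINS_URL)
# print(island_summary(penguins, "body_mass_g"))
```

Grouping once and aggregating with agg keeps the mean and standard deviation aligned by island, so the two bound columns can be computed with plain column arithmetic.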
- Cricket
- Use Pandas to read the following URL to create a DataFrame:
https://raw.githubusercontent.com/roualdes/data/refs/heads/master/test_cricket.csv
- Replace any dashes "-" in the column Runs with the string "0". Then convert the column to have dtype "Int64".
- Use one regular expression to create two new columns, start_year and end_year, based on the column Span. Make sure these new columns both have dtype "Int64".
- Create a new column named years_played, which uses the columns start_year and end_year to determine the number of years played for each player. This requires more, albeit not much more, than simple subtraction.
- Create a new column named runs_per_year by dividing Runs by years_played.
- Create a sub-DataFrame which shows the top 10 players ranked by runs_per_year.
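The Cricket steps can be sketched end to end as follows. This is one possible approach, not the intended solution. It assumes that Span holds text of the form "YYYY-YYYY" and that a missing Runs value is a lone dash. It also reads "more than simple subtraction" as an inclusive count (end minus start plus one), which is an interpretation. The Player column, the helper name add_runs_per_year, and the toy rows are made up for illustration.

```python
import pandas as pd

# URL from the worksheet; reading it requires network access.
CRICKET_URL = "https://raw.githubusercontent.com/roualdes/data/refs/heads/master/test_cricket.csv"


def add_runs_per_year(df):
    """Return a copy of df with cleaned Runs and the derived year columns."""
    out = df.copy()

    # Cells that are exactly "-" become "0"; then parse and cast to nullable Int64.
    out["Runs"] = pd.to_numeric(out["Runs"].replace("-", "0")).astype("Int64")

    # One regular expression with two named groups yields both year columns.
    years = out["Span"].str.extract(r"(?P<start_year>\d{4})-(?P<end_year>\d{4})")
    for name in ["start_year", "end_year"]:
        out[name] = pd.to_numeric(years[name]).astype("Int64")

    # Assumption: count both endpoint years, so a one-year span gives 1, not 0.
    out["years_played"] = out["end_year"] - out["start_year"] + 1
    out["runs_per_year"] = out["Runs"] / out["years_played"]
    return out


# Made-up rows so the sketch runs offline; these are not real players.
toy = pd.DataFrame({
    "Player": ["P1", "P2", "P3"],
    "Span": ["2000-2009", "2010-2010", "1995-1998"],
    "Runs": ["5000", "-", "1200"],
})
result = add_runs_per_year(toy)

# Top 10 by runs_per_year (the toy frame has only three rows).
top = result.sort_values("runs_per_year", ascending=False).head(10)
print(top[["Player", "years_played", "runs_per_year"]])

# With the real data:
# cricket = pd.read_csv(CRICKET_URL)
# top10 = add_runs_per_year(cricket).sort_values("runs_per_year", ascending=False).head(10)
```

Counting both endpoint years also avoids dividing by zero for a player whose span starts and ends in the same year.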
- temp
- Use Pandas to read the following URL to create a DataFrame: temperatures.csv. You can read about the dataset here.
- Create two new columns, month and year, from the date column.
- Calculate the mean (across years) monthly high temperature for both Hilo and Death Valley. In the end, you should have for each city 12 numbers, one for each month.
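A minimal sketch of the date handling and the monthly averages follows. Only the date column is named in the worksheet. The long layout (one row per city and date) and the column names city and temp_high are assumptions, so adjust them to whatever temperatures.csv actually contains. The helper name monthly_mean_high and the toy values are made up.

```python
import pandas as pd


def monthly_mean_high(df, city_col="city", high_col="temp_high"):
    """Add month and year columns, then average the highs by city and month.

    city_col and high_col are hypothetical column names.
    """
    out = df.copy()
    dates = pd.to_datetime(out["date"])
    out["month"] = dates.dt.month
    out["year"] = dates.dt.year
    # Pooling every year within a (city, month) group gives the across-years mean.
    means = out.groupby([city_col, "month"])[high_col].mean()
    return out, means


# Made-up readings so the sketch runs offline; these are not real temperatures.
toy = pd.DataFrame({
    "date": ["2000-01-15", "2001-01-15", "2000-07-15", "2001-07-15"] * 2,
    "city": ["Hilo"] * 4 + ["Death Valley"] * 4,
    "temp_high": [78.0, 80.0, 82.0, 84.0, 65.0, 67.0, 115.0, 117.0],
})
with_parts, means = monthly_mean_high(toy)
print(means)
```

On the full dataset, means.loc["Hilo"] and means.loc["Death Valley"] would each hold one value per month, provided the city labels match those spellings.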