MATH 385 Week 07 Worksheet

  1. Penguins
    1. Use Pandas to read the CSV dataset about penguins at the following URL and create a DataFrame of these data: https://raw.githubusercontent.com/roualdes/data/master/penguins.csv
    2. For each island, calculate the mean, the standard deviation, and a 95% confidence interval, where lower bound = mean - 1.96 * standard deviation and upper bound = mean + 1.96 * standard deviation.
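
The per-island summary above can be sketched on a small made-up table. This is only an illustration, not a worked solution: the worksheet does not name the variable to summarize, so the column `body_mass_g`, the island labels, and all values below are assumptions. The interval uses the worksheet's formula exactly as stated.

```python
import pandas as pd

# Toy stand-in for the penguins data. Column names and values are
# assumptions made for illustration only.
toy = pd.DataFrame({
    "island": ["A", "A", "A", "B", "B", "B"],
    "body_mass_g": [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
})

# One row per island, with the mean and (sample) standard deviation.
summary = toy.groupby("island")["body_mass_g"].agg(["mean", "std"])

# Interval bounds as defined in the worksheet: mean -/+ 1.96 * standard deviation.
summary["lower"] = summary["mean"] - 1.96 * summary["std"]
summary["upper"] = summary["mean"] + 1.96 * summary["std"]

print(summary)
```

Note that `std` in Pandas defaults to the sample standard deviation (`ddof=1`).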
  2. Cricket
    1. Use Pandas to read the following URL and create a DataFrame: https://raw.githubusercontent.com/roualdes/data/refs/heads/master/test_cricket.csv
    2. Replace any dashes "-" in the column Runs with the string "0". Then convert the column to have dtype "Int64".
    3. Use one regular expression to create two new columns start_year and end_year based on the column Span. Make sure these new columns both have dtype "Int64".
    4. Create a new column named years_played, which uses the columns start_year and end_year to determine the number of years played by each player. This requires more, albeit not much more, than simple subtraction.
    5. Create a new column named runs_per_year, by dividing Runs by years_played.
    6. Create a sub-DataFrame which shows the top 10 players ranked by runs_per_year.
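
The cleaning steps above can be sketched on a toy table. The column `Player`, the rows, and the assumption that Span looks like "2000-2009" are all made up for illustration. Counting both the first and last year (end - start + 1) is one possible reading of "more than simple subtraction", not necessarily the intended one.

```python
import pandas as pd

# Toy stand-in for the cricket data. The Player column, the values,
# and the "YYYY-YYYY" Span format are assumptions.
toy = pd.DataFrame({
    "Player": ["P1", "P2", "P3"],
    "Span": ["2000-2009", "2010-2010", "1995-1999"],
    "Runs": ["1000", "-", "250"],
})

# Replace dashes with "0", then convert to the nullable integer dtype.
toy["Runs"] = pd.to_numeric(toy["Runs"].replace("-", "0")).astype("Int64")

# One regular expression with two named groups yields two columns.
pattern = r"(?P<start_year>\d{4})-(?P<end_year>\d{4})"
years = toy["Span"].str.extract(pattern).apply(pd.to_numeric).astype("Int64")
toy = toy.join(years)

# Assumed interpretation: a span of 2010-2010 counts as one year played.
toy["years_played"] = toy["end_year"] - toy["start_year"] + 1
toy["runs_per_year"] = toy["Runs"] / toy["years_played"]

# Top players by runs_per_year (at most 10 rows).
top = toy.sort_values("runs_per_year", ascending=False).head(10)
print(top)
```

Named groups in the pattern become the column names of the extracted DataFrame, which avoids a separate renaming step.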
  3. Temperatures
    1. Use Pandas to read the following URL and create a DataFrame: temperatures.csv. You can read about the dataset here
    2. Create two new columns month and year from the date column.
    3. Calculate the mean (across years) monthly high temperature for both Hilo and Death Valley. In the end, you should have for each city 12 numbers, one for each month.
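
The date handling and monthly averaging above can be sketched as follows. The layout of temperatures.csv is not shown in this worksheet, so the long format with columns `date`, `city`, and `high`, along with every value below, is an assumption. The real file may need reshaping first.

```python
import pandas as pd

# Toy stand-in for the temperature data. The column names, the long
# layout, and the values are assumptions made for illustration.
toy = pd.DataFrame({
    "date": ["2020-01-01", "2021-01-01", "2020-07-01",
             "2020-01-01", "2021-01-01", "2020-07-01"],
    "city": ["Hilo", "Hilo", "Hilo",
             "Death Valley", "Death Valley", "Death Valley"],
    "high": [80.0, 82.0, 84.0, 66.0, 68.0, 116.0],
})

# Parse the dates, then pull out month and year with the .dt accessor.
toy["date"] = pd.to_datetime(toy["date"])
toy["month"] = toy["date"].dt.month
toy["year"] = toy["date"].dt.year

# Average the highs across years within each (city, month) pair.
monthly = toy.groupby(["city", "month"])["high"].mean()
print(monthly)
```

With a full dataset, `monthly` would hold 12 values per city, one for each month.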