Syllabus CSCI/MATH 385 Introduction to Data Science
Section 01: MWF @ 10 - 10.50am in Holt 185
Contact Information
Edward A. Roualdes (call me Edward)
Email: eroualdes@csuchico.edu
Office hours: Monday 11 - 11.50am and Tuesday 1 - 1.50pm in Holt 204, and
Wednesday 12 - 12.50pm in Holt 204 Chico State Innovation Lab
(on the second floor of Meriam Library), or email me and we'll
find a time that works for us both.
Course Description
Data Science is the science of learning from data in order to
gain useful predictions and insights. The course provides an
overview of the wide area of data science, with a particular
focus on the tools required to store, clean, manipulate,
visualize, model, and ultimately extract information from
various sources of data. Topics include the analytics life
cycle, data integration and modeling in R/Python, relational
databases and SQL, text processing and sentiment analysis, and
data visualization. Emphasis is placed on reproducible research,
code sharing, version control, and communicating results to a
non-technical audience. 3 hours
Student Learning Objectives / Goals
- improve Python programming skills
- learn the beginnings of some of the many tools of data science
- get started with Git, GitHub, Jupyter Lab/Google Colab
Resources
Content will be primarily developed in class. The recommended references for more advanced coverage of the material presented are
all of which are freely available online.
Additional Requirements
- Access to a computer will be essential to master the material of this course. If you don’t have immediate and consistent access to a laptop, please speak to me as soon as possible.
- We will learn to code in Python. If your machine does not have Python installed on it already, I strongly discourage you from using Anaconda (despite what the internet tells you) to install Python. Here's a reasonable webpage for installing Python on modern Windows machines.
- You will need a text-editor of some kind, but I encourage you to learn Emacs, Vim, or VS Code.
- If you do not already have a working Python environment (which is different from your operating system's Python environment), you should set one up using venv.
- You should also install the Python package pip, because we will use pip to install numerous other Python packages throughout the semester. If you followed my advice and used venv to create a Python virtual environment, then you shouldn't need to do anything here, as pip is automatically installed in venv-based virtual environments.
- You should also install git, as all assignments will be submitted with this tool.
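To confirm that the Python you are running belongs to a venv-based virtual environment rather than your operating system's Python, a minimal sketch like the following may help. It assumes an environment created with something like `python -m venv .venv` and then activated; the directory name `.venv` and the function name `in_virtual_environment` are only illustrative choices, not course requirements.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment's directory, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


# Report which interpreter is running and whether it lives in a virtual environment.
print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

If the last line prints `False` after you activated your environment, your shell is most likely still finding the operating system's Python first.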
Content Delivery
Lectures are in person at the times listed above. No recordings will be available. As Gil Scott-Heron says, the revolution will not be televised; this class will be live.
All course materials will be posted to my website: roualdes.us/math385.
Course Communication
The absolute best place to ask a question is during lecture. I understand, though, that not all students feel comfortable asking questions publicly.
If you prefer more private and in person communication, come to office hours.
If you prefer written and identifiable communication, email me at eroualdes@csuchico.edu. If your questions become too complex for email, as judged by me, I reserve the right to ask you to come visit my office to receive your answers in person.
If you prefer written and anonymous communication, I have created an anonymous Google form named ask. Access is granted only to your @mail.csuchico.edu account. If you intend to ask a question anonymously, please remember that this form is anonymous. The implications of this anonymity are greater than you might at first think; take a minute to think through how you want me to address you specifically, if I don’t know who you are. Further, there might be some questions I deem to not deserve a response. If you intend to give me feedback, please give constructive and respectful feedback. If at any point this form goes poorly, as judged by me, I reserve the right to take it down.
If for any reason I need to address everyone in the course, I will send an email to your student email account, e.g. you@csuchico.edu.
Course Grading
Your final grade for this course will be given according to the +/- grading system, based on the following percentages and scale: 90 to 100, A; 80 to <90, B; 70 to <80, C; 60 to <70, D; <60, F.
Component | Percentage |
---|---|
Worksheets | 50% |
Reading assignments | 20% |
Final Project | 30% |
Grades will be posted to a shared (between me and each of you, individually and exclusively) Google Sheets file. Access is granted only to your @csuchico.edu account.
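If you want to estimate where you stand, the weights and scale above can be combined as in the following rough sketch. The function names and the example scores are made up for illustration, and the exact cutoffs for + and - modifiers are not specified in this syllabus, so only the base letter is computed.

```python
# Weights from the Course Grading table, as fractions of the course grade.
WEIGHTS = {"worksheets": 0.50, "reading_assignments": 0.20, "final_project": 0.30}


def course_percentage(scores):
    """Combine component percentages (0-100) into a weighted course percentage."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def base_letter(percentage):
    """Map a course percentage to its base letter, ignoring +/- modifiers."""
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    if percentage >= 60:
        return "D"
    return "F"


# Hypothetical component scores, for illustration only.
example = {"worksheets": 85, "reading_assignments": 90, "final_project": 80}
total = course_percentage(example)
print(f"{total:.1f}% -> {base_letter(total)}")  # prints: 84.5% -> B
```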
Worksheets
Worksheets, in which you showcase the programming skills you've learned and developed in class, are (mostly) due every other week. Worksheets coincide with the topics of the course, detailed in the Course Outline below. There will be one worksheet per topic.
All worksheets will be submitted via Git to a shared repository, between you and me exclusively.
Worksheets will be part at home and part in class. This is part of the reason that access to a laptop is essential to this course. Worksheets will be submitted as IPython notebooks (.ipynb) created from Jupyter Lab.
You can re-submit a worksheet that was previously submitted on time, after it has been graded, for up to 50% of your missed points back. Think of this as an attempt to correct some of your less-than-correct solutions. As an example, if you earned 80% on a worksheet, you can re-submit this worksheet with updated answers for a maximum of 10 percentage points added to your original score. Thus, you could obtain 90% on a worksheet for which you originally earned 80%.
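The re-submission arithmetic above can be sketched in a few lines. The function name `max_score_after_resubmission` is hypothetical, and the sketch assumes scores are expressed as percentages between 0 and 100.

```python
def max_score_after_resubmission(original, recovery_rate=0.5):
    """Best possible worksheet score (0-100) after a re-submission.

    Up to `recovery_rate` (50% by default) of the missed points can be earned back.
    """
    missed = 100 - original
    return original + recovery_rate * missed


# The example from the text: an original 80% can become at most 90%.
print(max_score_after_resubmission(80))  # prints: 90.0
```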
Reading Assignments
There will be 4 Reading Assignments throughout the semester. The dates are listed in the course outline below and on the Course (Google) Calendar. For each Reading Assignment you are to read and write about the assigned reading(s) before coming to class in the week the assignment is due.
The Reading Assignment component of your grade is worth 20% of your course grade. Grades for each Reading Assignment are based both on your write-up and your participation in the in-class discussion. You are required to say something relevant (not just "I agree with Edward") at least two times throughout the semester. That means you have 4 opportunities to contribute two thoughts.
Before the in-class discussion you are to submit a Markdown file that comments on and discusses that week's reading. I'll supply you with prompts to help you develop something to write about. You don't have to use my prompts. You have to write at least 400 words.
Although the use of artificial intelligence tools is not discouraged in this class, you are expected to write your own words for all Reading Assignments. As such, I don't want summaries of the articles. I want your opinions, thoughts, concerns, questions, and/or comments about the topics in the articles.
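One rough way to check a write-up against the 400-word minimum before submitting is sketched below. It assumes words are simply whitespace-separated tokens, so Markdown markup such as `#` or `-` is counted too; the function names are hypothetical, and this is not necessarily how submissions will be counted.

```python
from pathlib import Path

MIN_WORDS = 400  # minimum length stated for each Reading Assignment write-up


def count_words(text):
    """Count whitespace-separated tokens in a string."""
    return len(text.split())


def meets_minimum(text, minimum=MIN_WORDS):
    """Return True when the text has at least `minimum` words."""
    return count_words(text) >= minimum


def file_meets_minimum(path, minimum=MIN_WORDS):
    """Read a Markdown file and check its word count against the minimum."""
    return meets_minimum(Path(path).read_text(encoding="utf-8"), minimum)
```

For example, `file_meets_minimum("reading01.md")` would check a file with the (made-up) name `reading01.md` in the current directory.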
Final Project
The final project is meant for you to showcase one or more data science related topics that you are interested in. Your final project could group together 3 Worksheet topics into one data analysis using a data set we did not cover in class, or it could be on a data science related topic that we did not cover in class.
You will give a 10-minute presentation at the end of the semester about your final project. As you will give a presentation on your final project, your project should include some sort of visible and sharable output beyond just a Python script.
Some ideas for final projects are
- combine at least 3 Worksheet topics into a single analysis of a data set we didn't consider in class;
- make a personal website, with at least three webpages, using GitHub pages;
- perform a data analysis using new (not covered in our class) data science related tools and a data set we didn't consider in class;
- write and host a tutorial of a new (not covered in our class) data science related topic;
- learn and host an introduction to a new (not covered in our class) data science related piece of software.
You must consult me (Edward) about all final project ideas before you begin. Although final project presentations will be given in the last days of the semester, there is not necessarily any reason you can't get started right away.
Tests
There are zero tests.
Make-up Policy
Worksheets can be submitted late for a maximum of 50% credit. You can submit a late worksheet up until the last day of the regular semester, Friday, December 13 at 11:59pm.
Reading Assignments can be submitted late for a maximum of 50% credit. There is no way to make up participation in the in-class discussion.
Diversity Policy
Respect: Students in this class are encouraged to speak up and participate during class meetings. Because the class will represent a diversity of individual beliefs, backgrounds, and experiences, every member of this class must show respect for every other member of this class (this includes me).
Academic Integrity Policy
Students are permitted and encouraged to collaborate on all assignments. However, each student must turn in their own work. Further, it is the expressed expectation of this instructor that all students demonstrate integrity and individual responsibility in all actions related to this course. Unethical behavior of any kind is unacceptable and will be prosecuted vigorously. Any sign of cheating in any way on any course assignment will be addressed directly, according to University standards. If you do not understand what plagiarism is, or what cheating entails, you must seek information regarding this matter from the current University Catalog and from me. The consequences of plagiarism begin with a failing grade on the work, and possibly a failing grade in the course, depending upon University action. More information is found on the Student Rights And Responsibilities campus webpage.
The use of artificial intelligence tools is not discouraged. At times, ChatGPT or Gemini may help you through some otherwise challenging coding or writing problems. But submitting Worksheets and Reading Assignments exclusively based on such software is considered unacceptable and dishonest. In the end, there's nothing I can do to stop this behavior, other than warn you that future employers can quickly tell the difference between people who know the tools and ideas we'll develop in this course and those who don't. Please do use AI as a tool to help you be more efficient. Please don't become a data scientist/programmer who can't work without AI.
Disability Support
If you have any disability related needs, please contact Disability Support Service (Colusa Hall 898-5959 or campus information 898-INFO for directions) on campus to obtain the appropriate documentation. Afterwards, email me to identify your needs within the first two weeks of class so that any necessary arrangements can be made.
Confidentiality and Mandatory Reporting
As an instructor, one of my responsibilities is to help create a safe learning environment on our campus. I am required to share information regarding sexual misconduct with the University. Students may speak to someone confidentially by contacting the Counseling and Wellness Center (898-6345) or Safe Place (898-3030). Information on campus reporting obligations and other Title IX related resources is available at www.csuchico.edu/title-ix.
Course Outline
- Week 01: introduction to Python
  - Python
    - variables: int, float, strings, booleans
    - data structures: lists, dicts, tuples, sets
    - control flow: if, elif, else, for, while
    - functions: positional, keyword, and default arguments
    - classes
  - Week 01 Worksheet due
  - reference: Python Numpy Tutorial
  - Python
- Week 02: introduction to Numpy
  - Numpy
    - arrays
    - array indexing
    - array math
    - array broadcasting; SIMD is different
  - reference: Python Numpy Tutorial
  - Numpy
- Week 03: text-based data formats and Pandas
  - tabular data: CSV, TSV, JSON
  - structured data: JSON
  - unstructured data
  - Pandas
    - pandas stuff
  - Week 03 Worksheet due
- Week 04: More Pandas
- Week 05: Matplotlib
  - matplotlib stuff
  - Week 05 Worksheet due
- Week 06: Plotnine
  - more plotting stuff
  - Week 06 Reading Assignment due
- Week 07: Scipy and some Mathematics and Statistics
  - distributions and their plots
  - generating random data
  - optimization
  - Week 07 Worksheet due
- Week 08: Scipy and Mathematics and Statistics continued
- Week 09: SQL
- Week 10: Linear Regression
- Week 11: Linear Regression Continued
- Week 12: Logistic Regression
- Week 13: Logistic Regression continued
- Week 15: Work on Projects
- Week 16: Work on Projects / Start Final Project Presentations
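As a small taste of the Week 01 topics listed in the outline above, the following sketch touches variables, data structures, control flow, functions with positional, keyword, and default arguments, and classes. All names and values are made up for illustration and are not taken from the course materials.

```python
# Variables: int, float, str, bool
week = 1
score = 87.5
topic = "Python"
submitted = True

# Data structures: list, dict, tuple, set
scores = [80, 92.5, 71]
best_by_topic = {"Python": 92.5, "Numpy": 80}
point = (3, 4)
unique_topics = {"Python", "Numpy", "Python"}  # the duplicate collapses


# A function with one positional argument and two default (keyword) arguments
def describe(values, label="scores", digits=1):
    mean = sum(values) / len(values)
    return f"{label}: mean={round(mean, digits)}"


# Control flow: if / elif / else
def classify(value):
    if value >= 90:
        return "high"
    elif value >= 70:
        return "medium"
    else:
        return "low"


# Control flow: for and while loops
labels = []
for s in scores:
    labels.append(classify(s))

countdown = 3
while countdown > 0:
    countdown -= 1


# A minimal class with state and a method
class Counter:
    def __init__(self, start=0):
        self.count = start

    def increment(self, step=1):
        self.count += step
        return self.count


print(describe(scores))                         # positional argument only
print(describe(scores, label="quiz", digits=2))  # keyword arguments
print(labels, len(unique_topics))
```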