Syllabus Math 385-01 Introduction to Data Science

Section 01 Holt Hall 257 TuTh 11 - 12.15pm

Contact Information

Edward A. Roualdes (call me Edward)

Email:

Office hours: Holt 204 on MW 10 - 10.50am and Th 2 - 3pm 3 - 4pm

Course Description

Data Science is the science of learning from data in order to gain useful predictions and insights. The course provides an overview of the wide area of data science, with a particular focus on the tools required to store, clean, manipulate, visualize, model, and ultimately extract information from various sources of data. Topics include the analytics life cycle, data integration and modeling in R/Python, relational databases and SQL, text processing and sentiment analysis, and data visualization. Emphasis is placed on reproducible research, code sharing, version control, and communicating results to a non-technical audience. 3 hours discussion.

Student Learning Objectives / Goals

  • Learn basic and common command line utilities
  • Understand text-based data formats
  • Create summary statistics and plots
  • Understand the connection between statistics and data
  • Explore intermediate statistical models

Textbook

There is no assigned or required book for this course. On the Overview page of our class Python package, ds385 (see the References section), I keep a running list of the references from which I draw content for this course.

Additional Requirements

  • Access to a computer will be essential to master the material of this course. If you don’t have immediate and consistent access to a laptop, please speak to me as soon as possible.

  • We will learn to code in Python using the IDE JupyterLab Desktop, both of which are free software. If you do not have a working Python environment (which is distinct from your operating system’s Python environment), you should use JupyterLab Desktop. A Python environment should be installed automatically when you install JupyterLab Desktop.

  • You must register for a free account on GitHub. If you do not already have an account, I encourage you to pick a username that you would not be embarrassed to show to a future boss.
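
To check which Python environment you are actually running, a minimal sketch like the following can be pasted into a notebook cell. The function name describe_python_environment is hypothetical, chosen only for illustration; it relies solely on Python's standard sys module. Note that the in_venv check only recognizes venv-style virtual environments, so it may report False inside other kinds of environments (for example, conda environments).

```python
import sys


def describe_python_environment():
    """Return basic facts about the interpreter running this code."""
    return {
        # Path to the Python interpreter that is executing this code
        "executable": sys.executable,
        # Version string of the interpreter (first token of sys.version)
        "version": sys.version.split()[0],
        # Root directory of the active environment
        "prefix": sys.prefix,
        # True inside a venv-style virtual environment;
        # other environment types (e.g. conda) are not detected by this check
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for name, value in describe_python_environment().items():
        print(f"{name}: {value}")
```

If the executable path points somewhere inside your operating system's own directories rather than inside the environment you meant to use, you are probably running your OS's Python.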

Content Delivery

Lectures are in person at the times listed above. No recordings will be available. As Gil Scott-Heron says, the revolution will not be televised; this class will be live.

All course materials will be posted to my website: roualdes.us/math385.

Course Communication

The absolute best place to ask a question is during lecture. I understand, though, that not all students feel comfortable asking questions publicly.

If you prefer more private, in-person communication, come to office hours.

If you prefer written and identifiable communication, email me. If your questions become too complex for email, as judged by me, I reserve the right to ask you to come visit my office to receive your answers in person.

If you prefer written and anonymous communication, I have created an anonymous Google form named ask. If you intend to ask a question anonymously, please remember that this form is anonymous. The implications of this anonymity are greater than you might at first think; take a minute to think through how you want me to address you specifically, if I don’t know who you are. Further, there might be some questions I deem to not deserve a response. If you intend to give me feedback, please give constructive and respectful feedback. If at any point this form goes poorly, as judged by me, I reserve the right to take it down.

If for any reason I need to address everyone in the course, I will send an email to your student email account, e.g., you@mail.csuchico.edu.

Course Grading

Your final grade for this course will be given according to the \(+/-\) grading system, based on the following percentages and scale, where \(p\) is your overall percentage: \(90 \le p \le 100\), A; \(80 \le p < 90\), B; \(70 \le p < 80\), C; \(60 \le p < 70\), D; \(p < 60\), F.

Component Percentage
Assignments 60%
Quizzes 10%
Project 20%
Project presentation 10%
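
As a rough illustration of how the component percentages combine, here is a minimal sketch in Python. The function names and the example scores are hypothetical, and the sketch assumes each component is scored on a 0 to 100 scale. It maps only to base letters, since the cutoffs for plus and minus grades are not listed in this syllabus.

```python
# Component weights, in percent, from the table above; they sum to 100.
WEIGHTS = {
    "Assignments": 60,
    "Quizzes": 10,
    "Project": 20,
    "Project presentation": 10,
}


def final_percentage(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def base_letter(percentage):
    """Map a percentage to a base letter using the scale above (no +/-)."""
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    if percentage >= 60:
        return "D"
    return "F"


# Hypothetical scores, for illustration only.
example_scores = {
    "Assignments": 85,
    "Quizzes": 90,
    "Project": 80,
    "Project presentation": 100,
}
total = final_percentage(example_scores)
print(round(total, 2), base_letter(total))
```

With these made-up scores, the weighted total is (60 × 85 + 10 × 90 + 20 × 80 + 10 × 100) / 100 = 86, which falls in the B range.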

Assignments

All assignments must be created using Jupyter notebooks. Each assignment will have a private GitHub repository that only you and I have access to. All assignments must be submitted using git to the appropriate GitHub repository.

Project

In my humble opinion, success in the world of data science has a lot to do with an individual’s own initiative. Those who are eager and excited to learn, build, and create things will generally do better than those who don’t and/or won’t help themselves. The project for this course is designed to showcase your own initiative. On the other hand, I fully understand that this particular semester might simply not (’cause life) work for you to engage this specific course in such a way.

For your DS385 project, please pick a topic and explore it further on your own. Create a deliverable to submit by the end of the semester, and prepare and deliver a presentation on it, using RISE, during the last two weeks of the semester before finals week. You are required to run your project idea by me. You can begin work on the project before you run the idea by me, but this is a risky bet if I deem your project unworthy of our time.

Here are a few project ideas, but you are more than welcome to think of your own.

  • Create a Python package; the package must have a purpose and be more than just a skeleton.

  • Research a topic we did not discuss in class:

  • Research in more depth a topic we did discuss in class: regularization, nonlinear regression (including but not limited to neural networks), …

  • Explore a programming language: Rust, Julia, …

  • Investigate multi-core processing for data analysis: dask, …

  • Read a book on a data science, statistics, machine learning, or artificial intelligence topic of your choice

  • Explore a piece of software: TensorFlow, PyTorch, …

  • Write a tool in C/C++/Cython code and interface it with Python

  • Explore GPU programming in Python

Project Summary

Pick a topic. Run the topic by me. Deliverable due by the end of the semester to our shared Google Drive folder. Presentations given during the last two weeks of the semester before finals week. Presentations must be at least 10 minutes long and no more than 20.

Make-Up Policy

Homework assignments can be submitted late for a maximum of 70% credit. You can submit a homework assignment late up until the next test, but not after.

Diversity Policy

Respect: Students in this class are encouraged to speak up and participate during class meetings. Because the class will represent a diversity of individual beliefs, backgrounds, and experiences, every member of this class must show respect for every other member of this class (this includes me).

Academic Integrity Policy

Students are permitted and encouraged to collaborate on all assignments other than tests. However, each student must turn in their own work. Further, it is the expressed expectation of this instructor that all students demonstrate integrity and individual responsibility in all actions related to this course. Unethical behavior of any kind is unacceptable and will be prosecuted vigorously. Any sign of cheating in any way on any course assignment will be addressed directly, according to University standards. If you do not understand what plagiarism is, or what cheating entails, you must seek information regarding this matter from the current University Catalog and from me. The consequences of plagiarism begin with a failing grade on the work, and possibly a failing grade in the course, depending upon University action. More information is found on the Student Conduct, Rights, and Responsibilities campus webpage.

Disability Support

If you have any disability-related needs, please contact Disability Support Service (Colusa Hall 898-5959 or campus information 898-INFO for directions) on campus to obtain the appropriate documentation. Afterwards, email me to identify your needs within the first two weeks of class so that any necessary arrangements can be made.

Confidentiality and Mandatory Reporting

As an instructor, one of my responsibilities is to help create a safe learning environment on our campus. I am required to share information regarding sexual misconduct with the University. Students may speak to someone confidentially by contacting the Counseling and Wellness Center (898-6345) or Safe Place (898-3030). Information on campus reporting obligations and other Title IX-related resources is available here: www.csuchico.edu/title-ix.

Course Outline

Please see the Notebooks page of our class Python package, ds385.