Foundations of Data Science: Prediction and Machine Learning

Course Description

Instructors:  Ani Adhikari, John DeNero, David Wagner
School:  BerkeleyX

One of the principal responsibilities of a data scientist is to make reliable predictions based on data. When the amount of data available is enormous, it helps if some of the analysis can be automated. Machine learning is a way of identifying patterns in data and using them to automatically make predictions or decisions in the future. In this data science course, you will learn the basic concepts and elements of machine learning.

The two main methods of machine learning you will learn are regression and classification. Regression is used when you seek to predict a numerical quantity. Classification is used when you seek to choose which category to assign (e.g., given information about a financial transaction, predict whether it is fraudulent or legitimate).

For regression, we will teach you how to measure the correlation between two variables and compute a best-fit line for making predictions when the underlying relationship is linear. We will also teach you how to quantify the uncertainty in your prediction using the bootstrap method. These techniques will be illustrated with a wide range of examples. For classification, you will learn the k-nearest neighbor classification algorithm, learn how to measure the effectiveness of your classifier, and learn how to apply it to real-world tasks.
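
To give a feel for the regression material, here is a minimal sketch in plain Python of the three ideas named above: the correlation between two variables, the best-fit line, and a bootstrap interval for a prediction. The data values, the function names, and the choice of 1,000 resamples are all assumptions made for illustration. The course itself may use different tools and conventions.

```python
import random
from statistics import mean, pstdev


def correlation(xs, ys):
    """Correlation r: the average product of x and y in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


def best_fit_line(xs, ys):
    """Slope and intercept of the least-squares line through the points."""
    slope = correlation(xs, ys) * pstdev(ys) / pstdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


def bootstrap_prediction_interval(xs, ys, new_x, reps=1000, level=0.95, seed=0):
    """Percentile interval for the line's prediction at new_x.

    Each repetition resamples the (x, y) pairs with replacement,
    refits the line, and records the prediction at new_x.
    """
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip degenerate resamples where a line cannot be fitted.
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        slope, intercept = best_fit_line(bx, by)
        predictions.append(slope * new_x + intercept)
    predictions.sort()
    low_index = int((1 - level) / 2 * len(predictions))
    high_index = int((1 + level) / 2 * len(predictions)) - 1
    return predictions[low_index], predictions[high_index]


# Hypothetical data: y is roughly 2x + 1 with a little noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 19.0, 20.9]

r = correlation(xs, ys)
slope, intercept = best_fit_line(xs, ys)
lo, hi = bootstrap_prediction_interval(xs, ys, new_x=12)

print("correlation:", round(r, 4))
print("best-fit line: y =", round(slope, 3), "* x +", round(intercept, 3))
print("approximate 95% interval for the prediction at x = 12:", (lo, hi))
```

The width of the interval reflects how much the fitted line would change if the sample had come out differently. This is the kind of uncertainty the bootstrap is meant to quantify.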

The course will highlight the assumptions underlying the techniques, and will provide ways to assess whether those assumptions are good. It will also point out pitfalls that lead to overly optimistic or inaccurate predictions.
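
As one illustration of classification and of an overly optimistic result, here is a minimal sketch of a k-nearest neighbor classifier in plain Python. It is scored both on the data it was built from and on held-out data. The two-feature "transaction" data, the labels, the 150/50 split, and the function names are all hypothetical. The description above does not say which pitfalls the course covers, so scoring a classifier on its own training set is only an assumed example.

```python
import math
import random
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, point, k):
    """Label chosen by majority vote among the k training rows nearest to point.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda row: distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, rows, k):
    """Fraction of rows whose label the classifier predicts correctly."""
    correct = sum(knn_classify(train, features, k) == label
                  for features, label in rows)
    return correct / len(rows)


# Hypothetical data: two overlapping clusters of two-feature points.
rng = random.Random(0)


def make_rows(n, center, label):
    return [((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label)
            for _ in range(n)]


rows = make_rows(100, 0.0, "legitimate") + make_rows(100, 1.5, "fraudulent")
rng.shuffle(rows)
train, test = rows[:150], rows[150:]

# With k = 1, every training point is its own nearest neighbor,
# so accuracy measured on the training set is perfect by construction.
train_acc = accuracy(train, train, k=1)
# Accuracy on points the classifier has not seen is the more honest measure.
test_acc = accuracy(train, test, k=1)

print("accuracy on training data:", train_acc)
print("accuracy on held-out data:", test_acc)
```

Because the clusters overlap, the held-out accuracy will generally be lower than the training accuracy. Holding out a test set is a common way to measure how well a classifier is likely to do on new data.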