Course-related Materials

Canvas.

Setting up your system for this course 

This course uses Python with Jupyter notebooks and some of the most commonly used packages for data science and machine learning. The Anaconda Python distribution, which is designed for the needs of data science users, is the recommended way of installing the required packages. In particular, we recommend Miniconda, a barebones version of Anaconda. For this course you will need the following packages: NumPy, pandas, Matplotlib, scikit-learn, and TensorFlow. Our coding environment will be the Jupyter notebook. As an alternative, you can use Google Colab, which lets you run Jupyter notebooks in a browser without installing any software; you will need a Google account for that.
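Once the packages are installed, a quick way to confirm that your environment can find them is a check like the one below. This is a minimal sketch, not an official course script: the helper name `check_packages` and the `REQUIRED` mapping are made up for illustration, and the check only tests whether each module can be located, not whether its version is suitable.

```python
from importlib.util import find_spec

# Import names for the packages listed above (illustrative mapping).
# Note that scikit-learn is imported as `sklearn`, which differs from
# the name used to install it.
REQUIRED = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
}


def check_packages(packages):
    """Map each display name to True if its module can be located, else False."""
    # find_spec looks the module up without importing it, so the check is fast
    # and does not fail when a package is absent.
    return {name: find_spec(module) is not None for name, module in packages.items()}


# Print a short report; any package marked MISSING still needs to be installed.
for name, found in check_packages(REQUIRED).items():
    print(f"{name:<14}{'found' if found else 'MISSING'}")
```

You can run this in a Jupyter notebook cell or as a plain script. On Google Colab most of these packages are typically preinstalled, so the check is mainly useful for a local installation.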

Python and Jupyter 

Tools 

The primary tools we will use are:

  • scikit-learn. The most widely used Python machine learning package.
  • NumPy. Containers for vectors and matrices and linear algebra operations.
  • Matplotlib. Data plotting and visualization.

Additional tools

The Data Science Handbook 

The Python Data Science Handbook by Jake VanderPlas covers NumPy, pandas, Matplotlib and scikit-learn.

NumPy 

Statistics 

  • Free statistics book that includes Python code examples: Think Stats 2e.

Machine learning 

Math