Course-related Materials
Setting up your system for this course
This course uses Python in conjunction with Jupyter notebooks and some of the most commonly used packages for data science and machine learning. The Anaconda Python distribution, which is designed for the needs of data science users, is the recommended way of installing the required packages. In particular, we recommend Miniconda, which is a bare-bones version of Anaconda. For this course you will need the following packages: NumPy, pandas, Matplotlib, scikit-learn, and TensorFlow. Our coding environment will be the Jupyter notebook. As an alternative, you can use Google Colab, which allows you to run Jupyter notebooks through a browser without installing any software. You will need a Google account for that.
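Once the packages are installed, a quick check like the following can confirm that Python is able to find them. This is only a minimal sketch: the function name `missing_packages` and the `REQUIRED` table are made up for illustration, and the check only tests that each package can be located, not that it is a particular version.

```python
import importlib.util

# Display name -> import name for the packages listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages Python cannot locate.

    Hypothetical helper for illustration; it looks each module up
    without importing it.
    """
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

You can paste this into a notebook cell or save it as a script and run it from the environment you plan to use for the course. Any package it reports as missing needs to be installed in that environment.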
Python and Jupyter
- The Python website.
- The Jupyter notebook. Jupyter notebooks are a platform that combines code and text. This will be our primary coding environment.
- If you need to improve your Python skills I recommend the following free Python book: How to Think Like a Computer Scientist: Learning with Python 3. This book is designed for beginner programmers, but should still be very helpful if you already have some programming experience.
Tools
The primary tools we will use are:
- scikit-learn. The most widely used Python machine learning package.
- NumPy. Containers for vectors and matrices and linear algebra operations.
- Matplotlib. Data plotting and visualization.
Additional tools
- Keras/TensorFlow. Neural networks and deep learning.
- Pandas. Data structures and analysis tools.
The Data Science Handbook
The following book covers NumPy, pandas, Matplotlib, and scikit-learn:
- The Python Data Science Handbook by Jake VanderPlas.
NumPy
- From Python to NumPy by Nicolas P. Rougier.
Statistics
- Free statistics book that includes Python code examples: Think Stats 2e.
Machine learning
- Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido. Many code examples from scikit-learn.
- Machine Learning Yearning by Andrew Ng.
Math
- Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.