“Python Data Science Handbook” by Jake VanderPlas

Summary & Discussion

Jake VanderPlas wrote The Python Data Science Handbook to teach Python users the essential tools for performing data-intensive and computational science. More directly, it teaches the Python “data science stack,” the libraries of which are enumerated below, in order to empower users to effectively store, manipulate, and gain insight from data.

Similar to Data Analysis with Python, this is not a book primarily about data analysis methodology. It’s not even necessarily about programming itself. The Python Data Science Handbook is about the current state of data-oriented libraries.

From the preface to the book: “Each chapter of this book focuses on a particular package or tool that contributes a fundamental piece of the Python Data Science story…”

Referring to the libraries it discusses, “…these five are currently fundamental to much of the work being done in the Python data science space, and I expect they will remain important even as the ecosystem continues growing around them.”


I will be relying on this reference in generating the applied, Python library-specific elements of my Data Science notes. Jake VanderPlas has graciously made the text of the book freely available on his GitHub in the form of Jupyter notebooks. I expect I will be reproducing and commenting on large parts of it in my notes as well.

As part of my ongoing attempt to understand the knowledge base a Data Scientist requires, I’ve distilled the book’s contents below.

Major Contents

IPython: Beyond Normal Python

Introduction to NumPy

Data Manipulation with Pandas

Visualization with Matplotlib

Machine Learning


Content for this article is taken from:

The Python Data Science Handbook by Jake VanderPlas (O’Reilly). Copyright 2016 Jake VanderPlas, 978-1-491-91205-8. Get it here.