Summary & Discussion
This book differs from the other sources in my Notes Sources section: it is primarily about Python and only secondarily about Data Science.
From page 1: “My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While ‘data analysis’ is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology.”
Lastly, as the subtitle indicates, the book concentrates on the “data wrangling” part of the data analysis process.
I will rely heavily on this great reference when writing the applied, Python-related parts of my own library of Data Science notes. Wes McKinney has made all the data and source code for the book freely available online, and I expect to reproduce and comment on large parts of it in my notes as well.
As part of my ongoing attempt to understand the knowledge base a Data Scientist requires, I’ve distilled the book’s contents below.
List of Essential Python Libraries
- NumPy
- pandas
- matplotlib
- IPython and Jupyter
- SciPy
- scikit-learn
- statsmodels
Content in Order of Appearance
- Python Basics
- Jupyter Notebooks
- Data Structures, Functions, Files
- NumPy Basics
- pandas Basics
- Series, DataFrames
- Computing Descriptive Statistics
- Reading and Writing Data
- Data Cleaning and Preparation
- Data Wrangling
- Join, Combine, Reshape
- Line plots, bar plots
- Histograms, scatter plots, categorical data
- Time Series
- Advanced pandas
- Intro to Modeling Libraries
- statsmodels, scikit-learn
- Data Analysis Examples
- Advanced NumPy
- More on the IPython System
Content for this article is taken from:
Python for Data Analysis by Wes McKinney (O’Reilly). Copyright 2017 Wes McKinney, 978-1-491-95766-0.