The Python Scientific Stack

The Python scientific stack includes the following libraries, among others.

  • Pandas[link] - data loading, cleaning, and quick exploration
  • NumPy[link] - (n-dimensional) array manipulation
  • Statsmodels[link] - statistical modeling
  • Scikit-learn[link] - (general purpose) machine learning
  • Keras[link] - deep learning models
  • XGBoost[link] - gradient boosted trees
  • Matplotlib[link] - visualization
  • Seaborn[link] - statistical visualization
  • Bokeh[link] - interactive visualization
  • Jupyter[link] - notebook for prototyping
  • IPython[link] - enhanced interactive shell for quicker prototyping
  • Dask[link] - distributed and out-of-core data manipulation
  • (Mini)conda[link] - package and environment management
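
A quick way to see which of these libraries are available in the current environment is to look up their import names. The following is only a minimal sketch: `STACK` and `installed_libraries` are names I made up for illustration, and the import names are the usual ones as far as I know (scikit-learn is imported as `sklearn`, and I am assuming Jupyter can be detected via `jupyter_core`). Conda is left out because it is a tool rather than a library you import.

```python
from importlib.util import find_spec

# Display name -> top-level import name (the two differ for some packages).
# The Jupyter entry is an assumption: it checks for the jupyter_core package.
STACK = {
    "pandas": "pandas",
    "NumPy": "numpy",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "Keras": "keras",
    "XGBoost": "xgboost",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Bokeh": "bokeh",
    "Jupyter": "jupyter_core",
    "IPython": "IPython",
    "Dask": "dask",
}


def installed_libraries(stack=STACK):
    """Map each display name to True/False without importing the package."""
    return {name: find_spec(module) is not None for name, module in stack.items()}


if __name__ == "__main__":
    for name, found in installed_libraries().items():
        print(f"{name:<14}{'installed' if found else 'missing'}")
```

Using `find_spec` instead of a plain `import` keeps the check fast, since heavy packages are located but never loaded.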

The logos for these libraries (and others) appear in the following image, extracted from Jake VanderPlas’s PyCon 2017 presentation, “The Unexpected Effectiveness of Python in Science.”

[Image: logos of the Python scientific stack libraries]

I found his presentation on YouTube, here. It would probably be worth taking notes on in the future.


The source for this note is a Quora answer, written by Yassine Alouini, linked here. It also includes an image extracted from a presentation by Jake VanderPlas.