Describes a basic data architecture that can be used to manage your personal finances on (free!) Trailhead orgs. As of 5/20/21, this project is a work in progress.

  1. Data Architecture

Supervised Learning

Trains and optimizes several supervised machine learning models on the Census Income Data Set.

Details the criteria by which an appropriate model was selected. Iteratively tunes that final model to optimize performance using sklearn’s GridSearchCV. Visualizes changes in performance with changes in model complexity. Demonstrates the data exploration, preprocessing and visualization workflow necessary to prepare the data prior to training the model. Examines the relative importance of the features in the original dataset.

  1. Explore and Preprocess Dataset
  2. Determine an Appropriate Model for the Data
  3. Train Models and Determine Feature Importance
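As a rough illustration of the tuning step described above, the sketch below runs sklearn's GridSearchCV over a small parameter grid and then reads feature importances from the best estimator. The synthetic dataset, the choice of a decision tree, and the grid values are all assumptions made for the example; they stand in for the preprocessed Census Income data and whichever model the notebooks actually select.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed Census Income features (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: max_depth is a simple proxy for model complexity.
param_grid = {"max_depth": [2, 4, 8]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# The refit best estimator is used for held-out evaluation.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)

# Mean cross-validated score per grid point, useful for a complexity plot.
cv_scores = dict(zip(param_grid["max_depth"], search.cv_results_["mean_test_score"]))

# Relative importance of each input feature according to the tuned tree.
importances = best_model.feature_importances_

print(search.best_params_, round(test_accuracy, 3))
```

Plotting `cv_scores` against `max_depth` gives one way to visualize how performance changes with model complexity.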

Deep Learning

Demonstrates training a PyTorch deep learning model that reached 78.4% accuracy on an image-labeling test. Then converts both the training and inference functions of the model into standalone Python command-line applications.

  1. Iteratively Train and Test the Classifier in a Notebook
  2. Convert the Network Training Function to a Command-Line Application
  3. Convert the Network Prediction Function to a Command-Line Application
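One common way to turn a notebook training function into a command-line application is to wrap it with argparse. The sketch below shows only that wrapper; the argument names (`data_dir`, `--epochs`, `--learning-rate`, `--gpu`) and their defaults are hypothetical, and the actual PyTorch training call is left as a placeholder.

```python
import argparse


def build_parser():
    """Build the argument parser for a hypothetical training script."""
    parser = argparse.ArgumentParser(description="Train an image classifier (sketch).")
    parser.add_argument("data_dir", help="directory containing the training images")
    parser.add_argument("--epochs", type=int, default=5, help="number of training epochs")
    parser.add_argument(
        "--learning-rate", type=float, default=0.001, help="optimizer learning rate"
    )
    parser.add_argument("--gpu", action="store_true", help="use a GPU if one is available")
    return parser


def main(argv=None):
    """Parse arguments and hand them to the training routine."""
    args = build_parser().parse_args(argv)
    # Placeholder: the notebook's training function would be called here.
    print(f"Would train on {args.data_dir} for {args.epochs} epochs "
          f"(lr={args.learning_rate}, gpu={args.gpu})")
    return args


# Example invocation with an explicit argument list instead of sys.argv.
args = main(["images/", "--epochs", "3", "--gpu"])
```

In a real script, `main()` would be called with no arguments under an `if __name__ == "__main__":` guard so that the values come from the shell.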

Unsupervised Learning

Demonstrates the use of principal component analysis and clustering on a large dataset to better understand the characteristics of a company’s customer base.

  1. Load and Preprocess the Data
  2. Transform Features and Perform Principal Component Analysis and Cluster Analysis
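The transform, reduce, and cluster sequence outlined above might look roughly like the following with scikit-learn. The data here are synthetic (two well-separated groups of made-up "customers"), and the choice of two components and two clusters is an assumption for the example rather than a result from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for customer features: two groups with different means.
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 6))
group_b = rng.normal(loc=5.0, scale=1.0, size=(100, 6))
X = np.vstack([group_a, group_b])

# Scale features so that no single feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Cluster in the reduced space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print(pca.explained_variance_ratio_)
print(np.bincount(labels))
```

On real data, the number of components and clusters would normally be chosen by inspecting the explained variance and a clustering score across several candidate values.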


Cleans and consolidates multiple similar datasets, mapping the data to common categorical variables. Then visualizes the result as several stacked bar plots built according to visualization best practices using Matplotlib. The raw data are the Kaggle Machine Learning and Data Science survey results for 2017, 2018, and 2019.

  1. Parse, Combine, and Consolidate Kaggle ML and DS Survey Results
  2. Summarize and Visualize Kaggle ML and DS Survey Results
  3. Resulting Visual
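The consolidation step, mapping differently worded answers from each year onto shared categories, can be sketched with the standard library alone. The answer strings and the label map below are invented for illustration and are not taken from the actual Kaggle surveys.

```python
from collections import Counter

# Hypothetical raw answers; the real survey wording differs from year to year.
raw_responses = {
    2017: ["Data Scientist", "Software Developer/Software Engineer", "Data Scientist"],
    2018: ["Data Scientist", "Software Engineer", "Student"],
    2019: ["Data Scientist", "Software Engineer", "Student", "Student"],
}

# Hypothetical map from year-specific wording to one common category label.
COMMON_LABELS = {
    "Software Developer/Software Engineer": "Software Engineer",
}


def consolidate(responses_by_year, label_map):
    """Count answers per year after mapping them onto the common labels."""
    return {
        year: Counter(label_map.get(answer, answer) for answer in answers)
        for year, answers in responses_by_year.items()
    }


def shares(counts_by_year):
    """Convert per-year counts to fractions, the heights of stacked bar segments."""
    result = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        result[year] = {label: n / total for label, n in counts.items()}
    return result


counts = consolidate(raw_responses, COMMON_LABELS)
fractions = shares(counts)
print(fractions[2019])
```

Each year's fractions sum to one, so they can be drawn directly as the segments of a single stacked bar.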

Demonstrates creation of a best-practices visual using Matplotlib and real-world weather data.

Demonstrates one method to dynamically color a plot based upon an input parameter.
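One possible approach, not necessarily the one used in the notebook, is to compute a color for each bar by comparing its value with a threshold parameter and then pass the resulting list to Matplotlib. The function names, colors, and threshold rule below are assumptions made for this sketch.

```python
def bar_colors(values, threshold, above="tab:red", below="tab:blue", equal="lightgray"):
    """Return one color per value depending on how it compares with threshold."""
    colors = []
    for value in values:
        if value > threshold:
            colors.append(above)
        elif value < threshold:
            colors.append(below)
        else:
            colors.append(equal)
    return colors


def plot_bars(labels, values, threshold):
    """Draw a bar chart whose bar colors depend on the threshold parameter."""
    # Imported here so the color logic can be reused without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, values, color=bar_colors(values, threshold))
    ax.axhline(threshold, color="black", linewidth=1)
    return fig, ax


colors = bar_colors([1, 5, 3], threshold=3)
print(colors)
```

Calling `plot_bars` again with a different `threshold` redraws the chart with updated colors, which is the dynamic part of the idea.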

Demonstrates manipulation and reshaping of real-world data, followed by creation of basic visuals to support discussion of insights in a master's thesis.

Binary Classifier

Outlines creation of a predictive model used to determine whether credit card applicants should be approved or rejected based on various creditworthiness metrics.

  1. Engineering a Metric
  2. Quantifying Binary Classifier Value
  3. Information Gain
  4. Selecting an Optimum
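To make the information gain and optimum-selection steps concrete, the sketch below computes the entropy-based information gain of a binary split and picks the threshold on a metric that maximizes it. The applicant scores and labels are made up, and the notebooks may define or apply these quantities differently.

```python
import math


def entropy(positives, negatives):
    """Shannon entropy (in bits) of a group with the given class counts."""
    total = positives + negatives
    h = 0.0
    for count in (positives, negatives):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children.

    parent and each child are (positives, negatives) count pairs.
    """
    total = sum(parent)
    weighted = sum((sum(child) / total) * entropy(*child) for child in children)
    return entropy(*parent) - weighted


def best_threshold(scores, labels):
    """Return the (threshold, gain) pair whose split 'score >= threshold' gains most."""
    parent = (sum(labels), len(labels) - sum(labels))
    best = (None, -1.0)
    for threshold in sorted(set(scores)):
        hi = [lab for s, lab in zip(scores, labels) if s >= threshold]
        lo = [lab for s, lab in zip(scores, labels) if s < threshold]
        children = [(sum(hi), len(hi) - sum(hi)), (sum(lo), len(lo) - sum(lo))]
        gain = information_gain(parent, children)
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical creditworthiness scores; label 1 means the applicant repaid.
scores = [300, 450, 500, 620, 700, 780]
labels = [0, 0, 0, 1, 1, 1]
threshold, gain = best_threshold(scores, labels)
print(threshold, gain)
```

With this toy data, a cutoff at 620 separates the two classes perfectly, so the gain equals the full parent entropy of one bit.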