Projects


Supervised Learning

Trains and optimizes several supervised machine learning models on the Census Income Data Set.

Details the criteria used to select an appropriate model, then iteratively tunes the selected model with GridSearchCV to optimize its performance. Visualizes how performance changes with model complexity. Demonstrates the data exploration, preprocessing, and visualization workflow needed to prepare the data before training, and examines the relative importance of the features in the original dataset.

  1. Explore and Preprocess Dataset
  2. Determine an Appropriate Model for the Data
  3. Train Models and Determine Feature Importance

Visualization

Cleans and consolidates multiple similar datasets by mapping their data to common categorical variables, then visualizes the result as several stacked bar plots built with Matplotlib according to visualization best practices. The raw data are the Kaggle Machine Learning and Data Science Survey results for 2017, 2018, and 2019.

  1. Parse, Combine, and Consolidate Kaggle ML and DS Survey Results
  2. Summarize and Visualize Kaggle ML and DS Survey Results
  3. Resulting Visual
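A minimal sketch of the consolidate-then-stack idea, assuming each survey year words the same answer options slightly differently. The sample answers, the `COMMON_CATEGORIES` mapping, and the helper names are hypothetical and do not reproduce the actual Kaggle survey wording.

```python
from collections import Counter

# Hypothetical sample answers: the same options are worded slightly
# differently in each survey year (straight vs. curly apostrophes).
raw_responses = {
    2017: ["Master's degree", "Bachelor's degree", "Doctoral degree", "Master's degree"],
    2018: ["Master\u2019s degree", "Bachelor\u2019s degree", "Master\u2019s degree", "I prefer not to answer"],
    2019: ["Master\u2019s degree", "Doctoral degree", "Bachelor\u2019s degree", "Bachelor\u2019s degree"],
}

# Hypothetical mapping from normalized wording to one shared category set.
COMMON_CATEGORIES = {
    "bachelor's degree": "Bachelor's",
    "master's degree": "Master's",
    "doctoral degree": "Doctorate",
}
ORDER = ["Bachelor's", "Master's", "Doctorate", "Other"]


def normalize(answer):
    """Lower-case and unify apostrophes so equivalent answers compare equal."""
    return answer.strip().lower().replace("\u2019", "'")


def consolidate(responses):
    """Return {year: {category: percent of that year's respondents}}."""
    shares = {}
    for year, answers in responses.items():
        counts = Counter(COMMON_CATEGORIES.get(normalize(a), "Other") for a in answers)
        total = sum(counts.values())
        shares[year] = {cat: 100 * counts[cat] / total for cat in ORDER}
    return shares


def stack_bottoms(shares):
    """Cumulative offsets: where each category's segment starts in each year's bar."""
    bottoms = {}
    for year, pct in shares.items():
        running = 0.0
        bottoms[year] = {}
        for cat in ORDER:
            bottoms[year][cat] = running
            running += pct[cat]
    return bottoms


shares = consolidate(raw_responses)
bottoms = stack_bottoms(shares)
print(shares[2018])
```

With Matplotlib, each category would then be drawn as one call to `plt.bar(years, heights, bottom=offsets)`, where `heights` and `offsets` are read from `shares` and `bottoms` for that category across years.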

Demonstrates creation of a best-practices visual using Matplotlib and real-world weather data.

Demonstrates one method to dynamically color a plot based upon an input parameter.
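One way such dynamic coloring can work, sketched under assumptions: each bar's color is derived from how far its value lies from a user-supplied threshold, shading from blue (below) through white (at the threshold) to red (above). The `bar_colors` helper, the `spread` parameter, and the color scheme are hypothetical illustrations, not necessarily the method used in the project.

```python
def lerp(start, end, t):
    """Linear interpolation between two RGB tuples, t in [0, 1]."""
    return tuple(s + (e - s) * t for s, e in zip(start, end))


BLUE = (0.0, 0.0, 1.0)
WHITE = (1.0, 1.0, 1.0)
RED = (1.0, 0.0, 0.0)


def bar_colors(values, threshold, spread):
    """Color each value by its distance from the threshold parameter.

    Values below the threshold shade toward blue, values above it toward red,
    and a value equal to it is white; the color saturates at distance `spread`.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    colors = []
    for v in values:
        t = max(-1.0, min(1.0, (v - threshold) / spread))
        colors.append(lerp(WHITE, BLUE, -t) if t < 0 else lerp(WHITE, RED, t))
    return colors


print(bar_colors([10, 15, 20, 30], threshold=20, spread=10))
```

The returned RGB tuples can be passed directly as `color=` to `matplotlib.pyplot.bar`; recomputing them whenever the threshold changes (for example, in an event handler) makes the coloring respond to the input. Matplotlib's colormaps combined with `matplotlib.colors.Normalize` offer a more general version of the same mapping.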

Demonstrates manipulating and reshaping real-world data, then creating basic visuals to support discussion of insights in a master's thesis.

Data Prep for Machine Learning

Outlines and demonstrates the steps required to clean and prepare data to be used to train machine learning models.
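Typical preparation steps (imputing missing values, scaling numeric columns, and one-hot encoding categorical columns) can be sketched in plain Python. The records, column names, and helper functions below are hypothetical and only illustrate the general workflow; the project's exact steps may differ.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"age": 39, "workclass": "Private"},
    {"age": None, "workclass": "Self-emp"},
    {"age": 50, "workclass": None},
    {"age": 28, "workclass": "Private"},
]


def impute(rows, column, fill):
    """Replace missing (None) entries in one column with a fill value."""
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def min_max_scale(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when the column is constant
    return [{**r, column: (r[column] - lo) / span} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = int(r[column] == cat)
        encoded.append(new_row)
    return encoded


# 1. Fill missing numbers with the median of the observed values.
age_median = median(r["age"] for r in raw_rows if r["age"] is not None)
rows = impute(raw_rows, "age", age_median)
# 2. Give missing categories an explicit label instead of dropping the row.
rows = impute(rows, "workclass", "Unknown")
# 3. Put numeric features on a common scale.
rows = min_max_scale(rows, "age")
# 4. Encode categories as numeric indicators that models can consume.
prepared = one_hot(rows, "workclass")
print(prepared[0])
```

In pandas, the same steps map onto `DataFrame.fillna` for imputation and `pandas.get_dummies` for one-hot encoding.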

Binary Classifier

Outlines the creation of a predictive model that determines whether credit card applicants should be approved or rejected based on various creditworthiness metrics.

  1. Engineering a Metric
  2. Quantifying Binary Classifier Value
  3. Information Gain
  4. Selecting an Optimum
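The information gain, classifier value, and optimum-selection steps listed above can be illustrated with a small sketch. It assumes a hypothetical cost-benefit setup (approving a good applicant earns a fixed benefit, approving a bad one costs a fixed loss, rejecting is worth zero); the scores, labels, and dollar values are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, feature):
    """Reduction in label entropy after splitting on the values of one feature."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def expected_value(scores, labels, threshold, benefit_good, cost_bad):
    """Average value per applicant when approving every score >= threshold.

    Hypothetical cost-benefit setup: approving a good applicant (label 1) earns
    benefit_good, approving a bad one (label 0) loses cost_bad, rejecting is worth 0.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        if score >= threshold:
            total += benefit_good if label == 1 else -cost_bad
    return total / len(scores)


def best_threshold(scores, labels, benefit_good, cost_bad):
    """Pick the observed score that maximizes expected value as the cutoff."""
    return max(
        sorted(set(scores)),
        key=lambda t: expected_value(scores, labels, t, benefit_good, cost_bad),
    )


# Hypothetical applicants: model scores and true outcomes (1 = good credit outcome).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
print(best_threshold(scores, labels, benefit_good=100, cost_bad=300))
```

Because a bad approval costs more than a good approval earns in this setup, the chosen cutoff is stricter than simply maximizing accuracy would suggest; changing the assumed costs moves the optimum.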

Hypothesis Testing

Cleans, compiles, and tests a dataset to evaluate the hypothesis that home values in university towns are less affected by recessions.
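A minimal sketch of the comparison step, assuming each town is summarized by a price ratio (price before the recession divided by price at the recession's low point), so a lower ratio means a smaller decline. The ratio definition, the `welch_t` helper, and the sample values are hypothetical, and Welch's unequal-variance t-test is an assumed choice since the source does not name the variant.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical price ratios (pre-recession price / price at the recession low).
university_towns = [1.02, 1.05, 0.99, 1.04, 1.01]
other_towns = [1.08, 1.11, 1.06, 1.09, 1.12, 1.07]
t_stat, dof = welch_t(university_towns, other_towns)
print(round(t_stat, 3), round(dof, 1))
```

Under this ratio definition, a negative t means university towns had the smaller average decline. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.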

Demonstrates basic Matplotlib visuals and a statistical t-test using a simple dataset that illustrates the Stroop effect.
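In a typical Stroop experiment, each participant is timed on both congruent and incongruent word lists, so the two samples are paired. Assuming that design, a paired t statistic can be sketched as below; the `paired_t` helper and the reaction times are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for the differences second - first."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical reaction times in seconds for the same three participants.
congruent = [10.0, 12.0, 14.0]
incongruent = [12.0, 15.0, 16.0]
t_stat, dof = paired_t(congruent, incongruent)
print(round(t_stat, 3), dof)
```

A large positive t indicates that incongruent lists take longer, which is the Stroop effect; `scipy.stats.ttest_rel(incongruent, congruent)` computes the same statistic along with a p-value.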