Describes a basic data architecture that can be used to manage your personal finances on (free!) Trailhead orgs. As of 5/20/21, this project is a work in progress.
Trains and optimizes several supervised machine learning models on the Census Income Data Set.
Details the criteria by which an appropriate model was selected. Iteratively tunes that final model to optimize performance using sklearn’s GridSearchCV. Visualizes changes in performance with changes in model complexity. Demonstrates the data exploration, preprocessing and visualization workflow necessary to prepare the data prior to training the model. Examines the relative importance of the features in the original dataset.
- Explore and Preprocess Dataset
- Determine an Appropriate Model for the Data
- Train Models and Determine Feature Importance
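As a rough illustration of the tuning step described above, the sketch below runs `GridSearchCV` over a small parameter grid. The synthetic data, the decision tree estimator, and the grid values are assumptions made for the example; they stand in for the preprocessed census features and the model the project actually selected.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed census features (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: max_depth acts as the model-complexity knob.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# Best combination found, and accuracy of the refit model on held-out data.
best_params = search.best_params_
test_score = search.score(X_test, y_test)

# One mean cross-validation score per grid combination; plotting these
# against max_depth shows how performance changes with complexity.
mean_cv_scores = search.cv_results_["mean_test_score"]
print(best_params, round(test_score, 3))
```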
Demonstrates training a PyTorch deep learning model that achieved 78.4% accuracy on an image-labeling test. Then converts both the training and inference functions of the model to run as standalone Python command-line applications.
- Iteratively Train and Test the Classifier in a Notebook
- Convert the Network Training Function to a Command-Line Application
- Convert the Network Prediction Function to a Command-Line Application
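Converting a notebook function to a command-line application mostly comes down to exposing its parameters as arguments. The sketch below uses the standard-library `argparse` module. The argument names and defaults are hypothetical, and the training call itself is left as a placeholder rather than a reproduction of the project's code.

```python
import argparse


def build_parser():
    """Define the command-line interface for a hypothetical training script."""
    parser = argparse.ArgumentParser(
        description="Train an image classifier (illustrative sketch)."
    )
    parser.add_argument("data_dir", nargs="?", default="data",
                        help="directory containing the training images")
    parser.add_argument("--epochs", type=int, default=5,
                        help="number of passes over the training set")
    parser.add_argument("--learning-rate", type=float, default=0.001,
                        help="optimizer step size")
    parser.add_argument("--gpu", action="store_true",
                        help="train on a GPU if one is available")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: the notebook's training function would be called here
    # with the parsed values instead of hard-coded notebook variables.
    print(f"Would train on {args.data_dir} for {args.epochs} epochs "
          f"(lr={args.learning_rate}, gpu={args.gpu})")
    return args


if __name__ == "__main__":
    main()
```

Saved under a name such as `train.py` (an assumed filename), this could be invoked as `python train.py flowers --epochs 3 --gpu`.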
Demonstrates utilization of principal component analysis and clustering on a large dataset to better understand the characteristics of a company’s customer base.
- Load and Preprocess the Data
- Transform Features and Perform Principal Component Analysis and Cluster Analysis
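The transform, reduce, and cluster sequence named above can be sketched with scikit-learn as follows. The two synthetic customer groups, the choice of two principal components, and the use of k-means with two clusters are all assumptions for illustration; the project's real data and settings will differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: two well-separated groups of "customers",
# each described by six numeric features (assumption).
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 6))
group_b = rng.normal(loc=4.0, scale=1.0, size=(100, 6))
X = np.vstack([group_a, group_b])

# Scale features so no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Cluster in the reduced space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print(pca.explained_variance_ratio_)
print(np.bincount(labels))
```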
Cleans and consolidates multiple, similar datasets, mapping the data to common categorical variables. Then, visualizes the result as several stacked barplots built according to visualization best practices using Matplotlib. The raw data utilized are the Kaggle Machine Learning and Data Science survey results for 2017, 2018, and 2019.
- Parse, Combine, and Consolidate Kaggle ML and DS Survey Results
- Summarize and Visualize Kaggle ML and DS Survey Results
- Resulting Visual
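Mapping year-specific answers to common categories can be as simple as a lookup table applied before counting. The sketch below is one way to do it; the label variants and the `consolidate` helper are hypothetical and are not taken from the survey files.

```python
from collections import Counter

# Hypothetical variants of the same answer across survey years,
# each mapped to one common category.
COMMON_LABELS = {
    "Master's degree": "Master's",
    "Masters": "Master's",
    "Doctoral degree": "Doctorate",
    "PhD": "Doctorate",
}


def consolidate(responses_by_year):
    """Count responses per year after mapping them to common categories.

    Answers without a mapping are grouped under "Other".
    """
    counts = {}
    for year, responses in responses_by_year.items():
        counts[year] = Counter(
            COMMON_LABELS.get(answer, "Other") for answer in responses
        )
    return counts


result = consolidate({
    2017: ["Masters", "PhD"],
    2018: ["Master's degree", "Unknown"],
})
print(result)
```

The per-year counts produced this way are the kind of input a stacked bar plot needs: one bar per year, one segment per common category.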
Demonstrates creation of a best-practices visual using Matplotlib and real-world weather data.
Demonstrates one method to dynamically color a plot based upon an input parameter.
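One possible approach to parameter-driven coloring is to compute a color per bar from a threshold value and pass the resulting list to Matplotlib. The function names, color choices, and threshold logic below are assumptions for illustration, not the method used in the linked notebook.

```python
def bar_colors(values, threshold,
               above="firebrick", below="steelblue", equal="lightgray"):
    """Return one color name per value, depending on its relation to threshold."""
    colors = []
    for value in values:
        if value > threshold:
            colors.append(above)
        elif value < threshold:
            colors.append(below)
        else:
            colors.append(equal)
    return colors


def plot_bars(labels, values, threshold):
    """Draw a bar chart whose bar colors depend on the threshold parameter."""
    # Imported here so bar_colors can be used without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, values, color=bar_colors(values, threshold))
    ax.axhline(threshold, color="black", linewidth=1)
    return fig


print(bar_colors([1, 5, 3], threshold=3))
```

Calling `plot_bars` again with a different `threshold` recolors the same data without any other change.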
Demonstrates manipulation and reshaping of real-world data, followed by creation of basic visuals to facilitate discussion of insights in a master's thesis.
Outlines creation of a predictive model that determines whether credit card applicants should be approved or rejected based on various creditworthiness metrics.