Data Scientist/Analyst & Product Management Specialist with an education in data science and economic analytics and a demonstrated history of working in the information technology and services industry. Pursuing opportunities in product management and data analytics roles to begin a career focused on uncovering the stories told within data through exploratory data analysis, data manipulation, and machine learning.
One of the most recent debates in Major League Baseball is focused on why more home runs were hit in the 2017 season than in any other season in the league's history. There were 6,105 home runs hit in 2017, more than at the peak of the steroid era, and everyone wants to know why. One specific area of interest is the most important item in the game: the baseball itself. During the 2017 season there were numerous complaints from major league pitchers that the ball felt different, and the result was a record-breaking year for home runs.
Thanks to Baseball Savant and its Statcast technology, we can begin to understand what is happening by analyzing the data tracked on every pitch that was hit for a home run. After a high-level review of the data and some outside research, I found a specific stat known as Home Run Exit Velocity, which Baseball Savant defines as the speed at which the baseball leaves the bat after being hit. Home Run Exit Velocity will be the target variable for my investigation into what is causing the increase in home runs.
Using machine learning techniques, I will build a production-level regression model to draw a conclusion on which features of a home run are most influential to the home run exit velocity. The data used to train this model will come from three different sources.
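As a rough sketch of that approach (the file name and feature columns below are hypothetical placeholders, not the actual Statcast fields), a tree-based regressor makes it easy to rank feature influence:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical file and columns standing in for the Statcast home run data
df = pd.read_csv("home_runs.csv")
X = df[["launch_angle", "pitch_speed", "hit_distance"]]
y = df["exit_velocity"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Rank the features by their influence on predicted exit velocity
for name, score in sorted(zip(X.columns, model.feature_importances_),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```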
In this assignment I was asked to analyze the total number of bets made in each of the first five months of the 2018 horse racing season. The instructions indicated that in the fourth month of the season a $35,000 marketing campaign was launched to increase the number of bets being placed, and I was tasked with evaluating the success of the campaign. After exploratory analysis, I used machine learning techniques to build a model that predicts the total number of bets placed within each wager, which allowed me to identify the features in the dataset most influential to the number of bets being placed. Using both my analysis and my model, I then provided an evaluation of the marketing campaign along with a recommendation to the marketing team on what to consider for future campaigns.
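A minimal sketch of one way to quantify the campaign's effect, assuming a wager-level dataset with hypothetical column names, is to add a campaign-month indicator to a regression and inspect its coefficient:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical file and columns standing in for the wager-level dataset
df = pd.read_csv("bets.csv")
df["campaign"] = (df["month"] >= 4).astype(int)  # campaign launched in month 4

X = df[["month", "campaign", "wager_amount"]]
y = df["total_bets"]

model = LinearRegression().fit(X, y)
# The coefficient on the campaign indicator estimates the lift in bets per wager
print(dict(zip(X.columns, model.coef_)))
```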
In order to prevent further outbreaks of West Nile Virus (WNV) in the city of Chicago, the Department of Public Health has been collecting historical data on the weather, mosquito traps, and the areas of the city where pesticides were sprayed. Using this data, my team will investigate the factors contributing to the mosquito population and build models to predict which mosquito traps have the highest probability of capturing mosquitoes with WNV. From this analysis, my team will recommend the optimal locations and times of the year to spray mosquito pesticides to decrease the mosquito population and ultimately decrease WNV outbreaks.
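A trap-level probability model might look roughly like the sketch below; the file name and feature columns are assumptions, not the actual Chicago dataset fields:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical file and columns standing in for the Chicago trap data
df = pd.read_csv("wnv_traps.csv")
X = df[["avg_temp", "precipitation", "latitude", "longitude", "week_of_year"]]
y = df["wnv_present"]  # 1 if the trap caught a WNV-positive mosquito

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Estimated probability that each held-out trap contains WNV-positive mosquitoes
probs = model.predict_proba(X_test)[:, 1]
print(probs[:5])
```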
Applied Natural Language Processing (NLP) techniques to analyze the text data within posts from both the SpaceX and NASA subreddits. With the public Reddit API, I scraped both of these subreddits to obtain the text data needed to perform the necessary NLP techniques. Using CountVectorizer and TfidfVectorizer, I identified the vocabulary words and phrases that held the most influence in each subreddit's posts. Armed with this information, I applied classification modeling techniques to predict whether a post originated from the SpaceX or NASA subreddit.
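A condensed sketch of that pipeline, with a couple of made-up posts standing in for the scraped Reddit data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up posts and labels standing in for the scraped Reddit data
posts = ["Falcon 9 landed on the drone ship", "New images from the Hubble telescope"]
labels = ["spacex", "nasa"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(posts, labels)
print(pipe.predict(["Starship static fire test"]))
```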
Explored a dataset containing a variety of housing features in order to create a production regression model that could predict the sale price of homes in Ames, Iowa. Used feature selection techniques (VarianceThreshold, SelectKBest) to ensure the model was making predictions based on the features most influential to a home's sale price. Applied additional model tuning techniques (GridSearchCV) to ensure the best hyperparameters were chosen for predicting the overall sale price.
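A sketch of how those pieces chain together in scikit-learn, using synthetic data in place of the Ames dataset:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the Ames housing features and sale prices
X, y = make_regression(n_samples=200, n_features=60, noise=10, random_state=42)

pipe = Pipeline([
    ("variance", VarianceThreshold()),                # drop zero-variance features
    ("kbest", SelectKBest(score_func=f_regression)),  # keep the k strongest features
    ("ridge", Ridge()),                               # regularized linear model
])

params = {
    "kbest__k": [10, 25, 50],
    "ridge__alpha": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, params, cv=5)
search.fit(X, y)
print(search.best_params_)
```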
This blog post will provide a Home Run scouting report for Bryce Harper using the data from my capstone project.
This blog post will walk through the process of setting up Flask.
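For reference, a minimal Flask app looks like this:

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, Flask!"

if __name__ == "__main__":
    app.run(debug=True)  # serves on http://127.0.0.1:5000 by default
```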
This blog post will define ensemble modeling and the methods used within these modeling techniques.
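As a quick illustration, one common ensemble method is a voting classifier, which combines the predictions of several base models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)  # synthetic stand-in data

# Voting ensemble: several base models each vote on the predicted class
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=5)),
    ("forest", RandomForestClassifier(n_estimators=100)),
])
print(cross_val_score(ensemble, X, y, cv=5).mean())
```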
This blog post will walk through the process of creating the perfect multiple bar chart within Matplotlib.
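The core trick is offsetting each group's bars by a fraction of the bar width; a small example with made-up numbers:

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["2015", "2016", "2017"]  # hypothetical example data
group_a = [10, 24, 31]
group_b = [12, 20, 28]

x = np.arange(len(labels))  # one slot per label
width = 0.35                # width of each bar

fig, ax = plt.subplots()
ax.bar(x - width / 2, group_a, width, label="Group A")
ax.bar(x + width / 2, group_b, width, label="Group B")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.show()
```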
This blog post will walk through the process of pickling an object for future use within Python.
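A minimal example of pickling and unpickling an object:

```python
import pickle

model = {"name": "example", "coefs": [0.5, 1.2]}  # any Python object works

# Serialize the object to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load it back in a later session
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored)
```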
This blog post will walk through the process of instantiating and scoring a model using Python with scikit-learn.
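For reference, instantiating, fitting, and scoring a model in scikit-learn looks like this (using a built-in toy dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # built-in toy dataset as a stand-in
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)  # instantiate the model
model.fit(X_train, y_train)                # fit on the training split
print(model.score(X_test, y_test))         # score: mean accuracy on held-out data
```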
I grew up a basketball player and have been a huge sports fan for my entire life. I appreciate all sports, but some of my favorites are basketball, football, golf, hockey, and baseball.
I love the snow and traveling up to the mountains for some snowboarding. When I'm otherwise outdoors, I love to hike various areas of Southern California.
When I find myself inside, I am practicing my Python, reading about my favorite Data Scientists, and researching a new Data Science problem to solve.