Current State of Data Science Education
The buzz surrounding the analytics/data science space has been intense in recent years. Well-known organizations like Glassdoor named “Data Scientist” the “Best Job in America” in each of 2016, 2017, and 2018. Prognostication about the number of data scientists we need has reached a fever pitch, with some sources predicting 400,000 new analytics jobs between 2015 and 2020 (Northeastern University, “Breaking Into Analytics”). As a point of comparison, the Bureau of Labor Statistics predicts approximately 300,000 new jobs for Software Developers between 2016 and 2026, and software development is itself one of the fastest-growing occupations in the nation. Note that the larger analytics prediction covers half the time span of the software developer prediction.
Whether or not the rosiest predictions are reasonable, there is broad consensus that data science (sometimes referred to as “Analytics”) is a high-growth field. As with many other tech jobs, the huge demand for this new profession has resulted in a proliferation of relevant learning material freely available online. The university system, by contrast, has been slow to respond to the sudden demand for the multifaceted data science skillset; it takes lots of tedious committee meetings to develop formal educational curricula.
I gave a short presentation on this and related subjects to the Business School faculty at my local state university (Cal State University Bakersfield) in February 2018. See my slide deck.
The huge abundance of information about data science, plus the lack of available formal education, creates a unique problem for would-be learners of this new science. There appears to be so much to learn, and so many places from which to learn it, that it becomes difficult to know where to begin. This is the motivation for the notes section of my website. It exists to help promote learning, for the people finding my work, and also for me. They say the best way to learn is to teach, and I find that going through the process of clearly articulating something for others makes me understand it more completely. So, my notes are an ongoing exercise in articulating things for my future reference and for yours.
I began studying data science in early 2018, following completion of my MBA degree. I discuss my reasoning for this career transition in another post. It’s been a very fun, if daunting, ride so far. The deeper I get into the field, the more ignorant I realize I am.
I discuss some of the means by which I’ve learned in the sections below. There is more detail, including hourly time commitment breakdowns, in my sources section.
From December 2017 until May 2018 I worked toward the Data Analyst Nanodegree from a company called Udacity. Udacity is an online, tech-focused education platform founded by Sebastian Thrun, among others. Their nanodegrees are credentials that users earn by completing coursework and projects.
Having completed that program, I believe it can be more effective than traditional education for self-motivated learners. But it is unlike traditional computer science education in a few ways. Specifically, it is project-focused, outside the university system, and totally online.
Because it is project-focused, you internalize concepts well, but only in sufficient depth to solve your immediate problem. Shallow understanding is often sufficient in technology, but I think there is value in theoretical rigor and depth of understanding. For that, you can’t beat university education.
Because it is outside the university system, it is highly responsive to tech industry needs. The platform’s content evolves much more rapidly than university curricula.
Because it is totally online, it is relatively low-cost and flexible, but it offers little in the way of spoon-feeding and hand-holding: you* must do the work. I consider that an advantage; technical employment requires an ability to research and solve your own problems.
* The online nature also makes it possible to cheat, despite Udacity’s best efforts. Udacity encourages posting your projects in publicly available GitHub repos to serve as a project portfolio, which means people can and do steal your code. One of the most popular pages on this site is the project page for a particularly difficult Udacity project. Why? People get stuck and google for answers. This is a problem without an easy solution, unfortunately.
For more thoughts on Udacity see my other posts here and here.
In May of 2018, around the time I completed the nanodegree, I began applying to jobs and immediately recognized how crucial SQL is to an analyst’s job. It is a very fundamental thing to understand. Udacity had taught a decent amount of PostgreSQL, but it’s difficult to truly incorporate SQL into creative, project-focused coursework. I was interested in practicing with SQL more.
Research led me to a Coursera course called Managing Big Data with MySQL. It is part of the Excel to MySQL: Analytic Techniques for Business specialization on Coursera. It was economical at $50/month, so I dove in. I was blown away. The depth of instruction was unbelievable. The notes linked here are all a product of taking that initial class.
Long story short, I decided to pursue the entire Duke University specialization. Since then I’ve also completed the first of five courses towards a Google specialization called Machine Learning with TensorFlow on Google Cloud Platform. I plan to complete it after the Duke specialization. It is similarly amazing content.
The content is more in depth and focused than Udacity (mind you, Udacity manages an almost impossible breadth with their nanodegree). The specialization is like a midway point between the shallow Udacity coursework and formal University instruction, but at a pace of your choosing.
The structure is also different than Udacity in other ways. Most importantly, Coursera is a platform for organizations, primarily schools, to post their own structured training courses. Udacity, on the other hand, creates almost all of their own content. There is one exception, and that is GA Tech’s Online Master’s in Computer Science.
GA Tech OMSCS
I’ve discussed this program in more depth here, and there are insightful articles about the program all over the internet. I have yet to begin it, but I have read amazing things, and I’m very excited to pursue a program with the rigor and pedigree of GA Tech. I think it will be very challenging, but worthwhile.
I’ve also supplemented my data science studies with other books and references, including:
- “Learn Python the Hard Way”, by Zed Shaw
- “Data Science from Scratch”, by Joel Grus
- “Python for Data Analysis”, by Wes McKinney
- “A Whirlwind Tour of Python”, by Jake VanderPlas
- “The Python Data Science Handbook”, by Jake VanderPlas
These are the early days of my education in this field, and as I’ve elaborated elsewhere, I’m excited that my data science journey will be long and ongoing. Much of it will be documented on this site, so feel free to check in once in a while.