Starting Point for a New Body of Knowledge

A New Field

As I have mentioned elsewhere, the newness of the data science field means that the notion of what a data scientist knows and does is still developing. Universities, which have historically formalized and delineated the various fields of study, have not had sufficient time to develop curricula for this new discipline. There are only a handful of analytics programs available at the time of this writing.

As a comparison to illustrate this point, my undergraduate degree is in mechanical engineering. As a discipline, mechanical engineering is very old, having formally emerged during the Industrial Revolution in the 18th century. Our understanding of the fundamental principles of physics upon which the field was constructed is thousands of years older than that. Rome, for example, may not have had “mechanical engineers” (or civil engineers), but it obviously had people capable of designing aqueduct systems.

On the other hand, it is difficult to imagine the work of today’s data scientists taking place in the absence of a computer. We so quickly forget that electronic calculators have only existed since the late 1950s, and then they looked like this. Without electronic computing, the work of today’s data scientists would devolve into endless abacus manipulation or chalkboard scratching. One would quickly lose sight of the forest (the standard deviation of 10,000 samples) for the trees (did I carry the one?).

Data scientists will be successful to the degree that they skillfully work with and understand the forest. Today’s computers iterate over large numbers of trees just fine. Continuing with the metaphor, the forests are growing increasingly dynamic and complex, as are the requisite skills to manage them. But we have to start somewhere.

A Point of Consensus

If there is anything that today’s analytics professionals have rallied around, it is probably the following Venn diagram. Originally created by Drew Conway around 2010, it has become the stuff of legend.


Given its broad acceptance in the community, this Venn diagram is the basis for the organization of my data science notes.

Drew Conway’s Nomenclature    Data Science Notes Category
Hacking Skills                Computation Tools
Math & Statistics Knowledge   Math & Stats
Substantive Expertise         Purpose & Process

For more on this subject, see Drew Conway’s (far more authoritative) discussion here.