The Coursera course on which these notes are based identifies three main types of data science roles, listed under three different job titles. These are:
- business analyst,
- data analyst, and
- data scientist.
Professor Daniel Egger says people who complete this Coursera specialization will be well prepared for an entry-level position working as a business analyst or business data analyst. Data scientists require additional skills and work experience, but this specialization will provide a good idea of what those next steps and skills are. Egger says that, aside from a few people who come to industry with PhDs in statistics or computer science, most data scientists started as business data analysts a few years earlier and learned the additional skills they needed on the job.
The title business analyst is used as a label for a very broad category of jobs. This section will describe the areas where most of these jobs overlap. There are seven key areas of competency.
- Ability to identify the most important and relevant business metrics for a given business. This is specific knowledge of a given industry sector (or “vertical market”). Obtaining this specialized knowledge may require specific research. An example that will be covered is the real estate and financial services market.
- Ability to apply appropriate models to analyze those metrics. Most of the models a business analyst is expected to know can be run in Excel. Models are defined as ways to represent complex real-world phenomena using a simplified mathematical form.
- Ability to quantify the effectiveness of the models used. As a specific example, the area under the ROC curve can be used to compare the performance of any two binary classifiers.
- Ability to interview customers, internal or external, to define project requirements. A large part of a business analyst’s work product takes the form of reports. One example is a customer requirements document, which translates customer desires into product features and services that the company can then work to deliver.
- Basic Excel skills, including identification of patterns and trends in business data, as well as the ability to make forecasts, organize financial information, and display conclusions in charts and graphs. Slightly more advanced Excel skills may also be required, such as the ability to import and manage large data sets, develop and run models, and run optimizations using Solver.
- Presentation skills. The most effective presentations are clear, concise, and persuasive.
- Ability to use data visualizations to make conclusions intuitive to a non-technical audience.
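The competency on quantifying model effectiveness mentions area under the ROC curve (AUC). As a minimal sketch of how two binary classifiers might be compared this way, the following computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The labels, the scores, and the names `model_a` and `model_b` are made up for illustration and do not come from the course.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties = 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = event occurred) and scores from two models.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3]
model_b = [0.6, 0.5, 0.4, 0.7, 0.8, 0.2, 0.9, 0.3]

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
print(f"model A: {auc_a:.2f}, model B: {auc_b:.2f}")
```

On this toy data, model A ranks every positive above every negative, so it scores higher than model B. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking.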
A data analyst is generally similar to a business analyst, except that a few additional skills are required. Generally, data analyst positions are more senior and higher paying than business analyst positions. There are two key differences:
- A data analyst is expected to be able to think flexibly about how a company’s data could be combined and analyzed in new ways to gain a better understanding of the business. Contrast this with a business analyst, who is generally given a specific problem to analyze. In order to meet this requirement, a data analyst typically needs to be able to pull together information from various data sources within the organization. Stated more succinctly, a data analyst should be able to determine which questions to ask, in addition to identifying the data needed to answer them.
- SQL. In order to combine data in the ways described in the first point, data analysts generally need to know how to use SQL to query databases. Not only does this make the data analyst more autonomous and self-sufficient, but it also lets them create entirely new datasets to analyze.
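To illustrate the kind of SQL query the second point has in mind, here is a small sketch that joins two tables to create a new dataset (revenue by region). It uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the rows are all hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical tables that might live
# in separate systems within an organization.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 2, 250.0), (12, 3, 50.0), (13, 1, 25.0);
""")

# Combine the two sources: join orders to customers, then aggregate by region.
query = """
SELECT c.region, COUNT(o.order_id) AS n_orders, SUM(o.amount) AS revenue
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.region
ORDER BY c.region;
"""
revenue_by_region = conn.execute(query).fetchall()
conn.close()

for region, n_orders, revenue in revenue_by_region:
    print(region, n_orders, revenue)
```

Neither table alone can answer "which region generates the most revenue"; the join produces a dataset that can.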
Data scientist is a rapidly evolving, new leadership role. At its core, it is interdisciplinary and its scope is constantly changing. Data scientists, in the most general sense, are people who understand how to improve business processes using big data. In keeping with this goal, they can translate a company’s business goals and needs into database architecture. Conversely, they can also translate the needs and concerns of technical individuals like engineers into terms that non-technical management can understand. Data scientists are expected to work from a big-picture perspective, in addition to having detailed technical knowledge.
In terms of specific skills, data scientists need all the skills of a data analyst plus the following:
- Advanced modeling tools, including at least one of R, MATLAB, or SAS.
- Advanced statistical methods. Data scientists generally have completed at least one calculus-level probability and statistical inference course.
- Bayesian learning and probabilistic models, as well as machine learning algorithms for predictive analytics. Real-time decision systems generally rely on machine learning methods, many of which build on Bayesian assumptions and methods.
- More advanced relational database skills. This would include populating and optimizing SQL databases.
- Distributed and unstructured data skills using tools like Hadoop, MapReduce, Hive, Pig, and Spark.
- Basic knowledge of methods for natural-language processing for sentiment analysis and other applications.
- Experience with massively scalable cloud-based data hosting and processing. Examples of these services are Amazon Web Services (AWS), Microsoft Azure, and Google Compute Engine.