Not Just a Title: How to Identify a Data Scientist
Update 2018: This post has been updated to reflect our most recent criteria from our 2018 data science salary study, released in May 2018.
After our much-debated blog post, 4 Ways to Spot a Fake Data Scientist, many readers were curious to know what criteria Burtch Works uses to identify data scientists, since the title itself is not always an indicator. The following is adapted from our recently released data science salary study, which goes into more detail about the academic background, skills, and day-to-day job responsibilities that we look for when identifying data scientists. The full report, with complete compensation data and demographic information as well as a comparison of data science salaries with predictive analytics salaries, is available to download.
Data scientists apply sophisticated quantitative and computer science skills to both structure and analyze massive stores or continuous streams of unstructured data, with the intent to derive insights and prescribe action. The depth and breadth of their coding skills distinguish them from other predictive analytics professionals and allow them to exploit data regardless of its source, size, or format. Using one or more general-purpose programming languages and data infrastructures, data scientists can tackle problems that are made very difficult by the size and disorganization of the data.
To identify data scientists for our recruiting efforts and Burtch Works Studies, we use the following criteria:
1. Educational Background – Data scientists typically have an advanced degree, such as a Master's or PhD, in a quantitative discipline such as Computer Science, Physics, Engineering, Applied Mathematics, Statistics, Economics, or Operations Research. Newer educational options, including data science degree programs, MOOCs (massive open online courses), and bootcamps, continue to take hold in the quantitative community. Some professionals from related careers or fields of study have successfully pivoted into entry-level data science roles through premier bootcamps and mid-career Master's programs.
2. Skills – Data scientists have expert knowledge of statistical and machine learning methods using tools such as Python and R, with predictive analytics still at the core of the discipline. They are usually proficient with relational databases and SQL, Big Data infrastructures like Hadoop and Spark, related tools like Pig and Hive, and, frequently, AWS. Data scientists may use languages such as Python, Java, and Scala (among others) to write programs that wrangle and manage data, automate analysis, and, at times, build these functions into production-level code for SaaS companies. Many also use other methods to derive useful information from data, including pattern recognition with deep learning techniques and tools such as TensorFlow, signal processing, and visualization.
3. Dataset Size – Data scientists typically work with datasets measured in gigabytes or more, usually too large to fit in local memory, and may work with continuously streaming data.
4. Job Responsibilities – Although they may specialize in a specific area, data scientists are equipped to work on every stage of the analytics life cycle, which includes:
- Data Acquisition – This may involve scraping data, interfacing with APIs, querying relational and non-relational databases, building ETL pipelines, or defining strategy around which data to pursue.
- Data Cleaning/Transformation – This may involve parsing and aggregating messy, incomplete, and unstructured data sources to produce datasets that can be used in analytics and/or predictive modeling.
- Analytics – This involves statistical and machine learning-based modeling in order to understand, describe, or predict patterns in the data.
- Prescribing Actions – This involves interpreting analytical results through the lens of business priorities, and using data-driven insights to inform strategy. Strong technical chops alone do not make an exceptional data scientist, so when recruiting we look for a combination of technical and non-technical skills.
- Programming/Automation – In many cases, data scientists are also responsible for creating libraries and utilities to operationalize or simplify various stages of this process. Often, they will contribute production-level code for a firm’s data products.
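As a rough illustration of the Data Cleaning/Transformation stage, and of why dataset size shapes how that work is done, here is a minimal Python sketch. It assumes a hypothetical CSV feed of `customer_id,amount` records; the field names, the `clean_and_aggregate` function, and the cleaning rules are illustrative assumptions, not part of our criteria. Rows are read one at a time, so only the per-customer aggregates are held in memory, and malformed or incomplete rows are skipped instead of halting the pipeline.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def clean_and_aggregate(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Stream messy CSV lines of (customer_id, amount); return {id: (count, total)}.

    Hypothetical example: the input format and cleaning rules are assumptions.
    Lines are consumed one by one, so the raw data never has to fit in memory.
    """
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)

    for row in csv.reader(lines):
        # Incomplete or garbled rows (wrong number of fields) are skipped.
        if len(row) != 2:
            continue

        # Normalize the key: trim whitespace and ignore letter case.
        customer_id = row[0].strip().lower()
        # Strip currency symbols and thousands separators from the amount.
        raw_amount = row[1].strip().replace("$", "").replace(",", "")
        if not customer_id or not raw_amount:
            continue  # missing value

        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # unparseable amount (this also drops a header row)

        counts[customer_id] += 1
        totals[customer_id] += amount

    return {cid: (counts[cid], totals[cid]) for cid in counts}


if __name__ == "__main__":
    sample = [
        "customer_id,amount",          # header: skipped because "amount" is not a number
        "A17, $12.50",                 # stray space and currency symbol
        "a17,7.5",                     # same customer, different case
        "B02,",                        # missing amount
        "B02,oops",                    # unparseable amount
        '"C9","$1,200.00"',            # quoted field with a thousands separator
        "broken line without comma",   # wrong number of fields
    ]
    print(clean_and_aggregate(sample))
```

Because the function accepts any iterable of lines, the same code could be pointed at an open file handle (for example, `open("transactions.csv", newline="")`, a hypothetical path) rather than loading the whole file at once. At larger scales, the same parse, filter, and aggregate steps would more likely be expressed in a distributed engine such as Spark.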
Note: Professionals whose jobs are described as predictive analytics, analytics management, business intelligence, or operations research are not classified as data scientists under our definition, because they either do not work with exceptionally large datasets or do not work with unstructured data. In the specific case of operations researchers, their function is to optimize well-described processes rather than to derive predictive insights and prescribe action for more nebulous problems like customer behavior. Predictive analytics professionals (who primarily work with structured data) were the subject of their own study, The Burtch Works Study: Salaries of Predictive Analytics Professionals, released in September 2017.
There's been a lot of conversation around this developing field, and as the tools continue to evolve, our criteria have evolved with each new study as well. Whether you're a data scientist, an analytics professional, a programmer, or a data engineer, it's important to keep learning as new tools enter the market and to keep up with new technology. I'm sure there are some bleeding-edge tools that we've missed, so be sure to leave your thoughts in the comments below.