Editor’s Note, 2016: Even a year after publication, this blog post continues to generate a fair amount of traffic and discussion, with many asking what our complete criteria are for discerning who is a data scientist. We posted the complete criteria that we use for our data science salary studies, which you can find here, if you’re interested.

Linda Burtch, Managing Director at Burtch Works | 30+ years’ experience in quantitative recruiting

In June 2013, with the data science hype picking up steam, I took to my blog to write Data Scientists… or Data Wannabes? – a post bemoaning the plague of professionals who had begun changing their titles to Data Scientist without any of the necessary qualifications. At the time, the Data Scientist title was still loosely defined, which resulted in confusion in the market, obfuscation in resumes, and exaggeration of skills.

Two years later, and the trend has gotten even worse. With the media inflating Data Scientists’ already-high salaries (I’ve yet to see a newly-graduated data scientist making $300,000+, as has been reported), data scientists have captured the imagination of job seekers thinking that they can write Hadoop on their resume and get a 50% raise.

I’m here to tell you that from all of my conversations with data scientists and “data scientists” I’ve discovered four telltale signs that a professional is not a true data scientist:

 

1. Lack of a highly quantitative advanced degree – It’s incredibly rare for someone without an advanced quantitative degree to have the technical skills necessary to be a data scientist. In our data science salary report we found that 88% of data scientists have at least a Master’s degree, and 48% have a Ph.D. The areas of study may vary, but the vast majority are very rigorous quantitative, technical, or scientific programs, including Math, Statistics, Computer Science, Engineering, Economics, and Operations Research. 2017 Update – Although it is becoming slightly more common for data scientists to have a quantitative Bachelor’s or Master’s layered with a top-tier bootcamp instead of a PhD, without a strong foundation in a technically rigorous program, it is very difficult to master all of the statistical and computer science concepts and skills necessary to be hired as a data scientist.

2. No concrete examples of experience with unstructured data or statistical analysis – Lists of tools such as Hadoop, Python, and AWS need to be accompanied by projects that show those skills being put to good use. If a professional cannot provide clear examples of their experience with unstructured data, or mentions data science projects but keeps their involvement very vague, then they are probably not a data scientist. If their specific role in or impact on a Big Data project is unclear, that is cause for concern. 2017 Update – This is also true for statistical analysis. If a professional has experience organizing and structuring large data sets, but little-to-no experience with statistical concepts and analytics, then they are likely a Data Engineer, not a Data Scientist.

3. Purely academic or research background – Now, this is not to say that someone with a stellar academic or research background won’t make a great corporate data scientist, but a key component to being a data scientist in a corporate setting is business acumen. Understanding how findings affect business goals and delivering actionable insights to leaders is critical to a data scientist’s success. Many research academics have exceptional data skills, but without strong business savvy they are not data scientists… yet.

4. List of basic business skills – If I see a list of tools on a “data scientist” resume like Omniture, Google Analytics, SPSS, Excel, or any other Microsoft Office tool, you can be sure that I will take a harder look at whether or not this professional makes the grade. These skills are basic business qualifications that, by themselves, are insufficient for most data science jobs, and are not indicative of a true data scientist.

 

For more about what skills employers are looking for when hiring data scientists, click here. If you’re entry-level and looking to get into an analytics and data science career, check out these tips for landing your first job.

Here at Burtch Works we are hard at work getting ready for the release of our 2015 Burtch Works Study: Salaries of Data Scientists, which will have complete salary and demographic information for hundreds of data scientists, and will be completed in April. It will be interesting to see how salaries have changed since last year’s report, and whether the field has gotten more diverse. To stay updated on its release you can follow us on LinkedIn or follow @BurtchWorks on Twitter.

Learn more about the latest salaries and hiring market insights for the 2017 data science hiring market in our 10-minute Burtch Works Study recap video below, or watch the full-length version on YouTube!

28 Responses to “4 Ways to Spot a Fake Data Scientist”

  1. James

    I’m going to say up front that I’m relatively young and have only been a data scientist for 5 months (I’m an Engineering Scientist / Operations Research graduate).

    While this post makes some good points I think that the first point is over-emphasized both in industry and the post. I have worked with a few Masters and Ph.D. qualified data scientists and one thing I have come to realise is that while the academic credentials can certainly be treated as an indicator of ability, there is nothing to suggest that the best undergrads couldn’t run circles around Masters students and probably most of the Ph.D.’s.

    I’m not bagging on Ph.D.s at all, but I don’t believe there is much to gain for top undergrads doing a Masters (apart from a perceived ability increase).

    I’m happy to be proven wrong on this. What are other people’s observations?

    • Jamie

      Mmm… Apart from the fact that the PhD study time (four years in my case) was spent painstakingly applying theoretical approaches I learned as part of my undergrad and MSc to real-world examples sponsored by an industry partner (not model examples chosen because the available resources and data fit the technique being taught), followed by authoring those results in such a way as to prove their use in my field, being knocked back by peer reviewers on the basis of scientific rigour, reworking my analyses before finally being published, and then being rigorously grilled on the subject by a board of experts.

      While I’m sure it’s entirely plausible that some graduates are more adept than some docs (because some people are better than others), in my field – Computational Biology – there’s absolutely no way that someone with an undergrad degree could ‘run rings’ around ‘most’ post-docs, as they simply wouldn’t have had the time to gain enough knowledge of their subject matter as well as the required mathematical and computational skill to do the right analysis in the right way. Rather, at the university where I did my PhD, I was paid to help teach MSc and undergrads in practical sessions.

      I would also agree that business acumen is extremely valuable, and that it could not be learned in academia alone. What I would also infer from my professional experience is that the longer you work, the more you understand the limits of your knowledge. Excellent graduates are often brimming with confidence in their ability – their enthusiasm is a big part of their appeal as employees – but take care not to dismiss others on the basis that they aren’t as good as you at the list of things you understand, because while that list may seem fairly complete to you, someone with more experience may know where the gaps in your knowledge lie.

  2. JX

    I am a bit surprised that you mixed computer science with quantitative sciences

    • Paul

      Most people would say that machine learning is a subfield of computer science. And machine learning is the quantitative part of data science.

  3. George Danner

    It’s a shame that the media have contributed to this confusion by equating data science with the use of R as applied to customer data. That’s not data science!

    To me data science is all about business problem solving, using models and methods. Problem Solving skills are easy to test for in the interview process by asking a candidate to walk through a *problem* from beginning hypothesis to ending solution. If the candidate can tell a story with data as punctuation, that’s a data scientist. Everyone else is a poser.

  4. Igor Kleiner

    “nullius in verba” – the motto of scientists

    We should not fall victim to the authority fallacy.

    Your position is simply speculation without scientific proof; it is no more than a simple opinion.
    That is worthless.

  5. Bijo Samuel

    I agree with your points 1 and 2 but not 3 and 4. You are basically implying that a person can only be a “Data Scientist” if they are specifically focused on “business” and industry. I disagree. Firstly, the word “business” is vague: what exactly do you mean by it, and what particular business skills are you referring to? We live in an age of specialization; it’s unrealistic and even undesirable that a person should be an expert in everything. A data scientist is a person who has the expertise to wrangle data and extract insights from it; that is their scope. In industry\“business” you would typically work alongside experts in a particular field, and if you don’t have that then you need to request it.

    This person (Data Scientist\Analyst) can come from any background, one could be an academic who spent the last 5 years doing various quantitative analysis in a biology department. Another could be a physicist or a biologist. Irrespective of their area of focus, I think the important thing is their general skill-set that can be transferred and applied to other areas. I think the focus needs to be on what this general skill-set should constitute, for me personally, at the bare minimum, the person should have a solid understanding of probability and statistics. But probability and statistics are not sufficient, in addition, the person should have a reasonable mathematical maturity, they don’t necessarily have to have mathematics degrees or phds, but a firm grasp of the critical undergraduate math which is Calculus (Single and multiple variable), Differential equations and Linear Algebra. In addition, it wouldn’t hurt to have knowledge of new machine learning developments, operations research, Combinatorics etc.

    In addition to the conceptual knowledge, they need to have experience applying it using one or more tools and a portfolio of actual projects (irrespective of field of application) showing how they have successfully derived “valuable” insights from data (extra points for deriving counter-intuitive insights). In addition they need to be able to wrangle with large amounts of unstructured dirty data. I would give each applicant a scenario where they are given a large amount of unstructured dirty data and question their approach to the project.

    Rather than business skills, I would emphasize that they have good communication skills (verbal and written) and, most importantly, the ability to communicate with non-academics\non-technical people. In addition, the ability to sell is critical no matter what you do: you need to be able to sell yourself, your ideas, and your insights, and be responsible for the “buy in”. You can’t just put something out there and hope someone will buy it.

    Most importantly, I think the person needs a hacker-like attitude that is not limited in a dogmatic manner to only certain methodologies\systems or tools. I find most statisticians to be very much the opposite of this hacker spirit.

    • Carlos Perez

      I’m sorry, “strong business savvy”?

      What a bogus requirement. You can’t demand a requirement that is extremely vague. You may have domain experts in your team for a specific industry, but I wouldn’t call this ‘business savvy’.

      I speak to folks with MBAs, but they have absolutely no clue about the software industry. Does your concept of “business savvy” even translate to all kinds of different industries?

      • sbosowski

        Business savvy does not imply an MBA. We get into more detail about the softer skills necessary for success as a data scientist in this blog post, specifically:

        To be a data scientist you’ll need a solid understanding of the industry you’re working in, and know what business problems your company is trying to solve. In terms of data science, being able to discern which problems are important to solve for the business is critical, in addition to identifying new ways the business should be leveraging its data.

  6. Dennis Crow, Ph.D., PMP

    I would certainly agree with the comments posted. In the social sciences and public administration, “business” acumen is very important when it comes to the fundamentals of data. Logic and semantics are more important to getting across the importance of data. It is important to understand what you’re measuring and why. In the federal government, after 20 some years in domestic policy agencies (not scientific ones) I haven’t seen advanced statistics used where they are much needed. Knowing R or Hadoop will not provide that kind of knowledge or experience. With those kind of salaries, it’s tempting to leap into this, but with the proviso that one needs to know why any of that is applicable. Graph databases and SPARQL are probably more appropriate to administrative data. Etc. I’ll keep watching for comments and always more on this subject.

  7. Gene Leynes

    This is pretty funny coming from a recruiting firm that only did SAS recruiting in 2012 and said that they had no experience with people using R.

  8. Eric Zambrana

    I’m surprised there hasn’t been an effort to apply different levels of expertise to data scientists. Something like what they have for project management skills like green belt versus a black belt. The black belt would be the full fledged data scientist and so on. I would say that you will rarely see data scientists with a PhD in stats, strong data mining skills and sharp business acumen. Two out of the three tools should be acceptable and the two must haves should be business acumen and data mining skills.

  9. Jade Cook, PhD

    First of all, “data scientist” has nothing to do with science. It’s just an ordinary IT job that everyone with an advanced degree in a quantitative discipline can do. The reasons you’ve seen so many people applying for “data scientist” positions are two-fold: (1) this job has become more popular because of data expansion; (2) data “science” for application has a very flat learning curve and can be learned in a couple of months, simply because it’s not about creation; it’s *mostly* about applications of existing algorithms and requires very little innovation. What’s really hard still remains hard, but very few data scientists actually master it. The word “scientist” is simply being abused here. I’ve spoken to data scientists who have not even heard of Leo Breiman or Jerome Friedman, while they talk about trees and boosting all the time. All they talk about is their own little experiences with the methods invented by the real creators, which they treasure very much but which are, unfortunately, extremely easy to acquire.

    As long as a candidate satisfies (1) & (2) I would say she or he already deserves a second look. Just ask yourself a question: do you really think the “data scientists” you’ve hired are super capable or super smart? Not at all. Can they do their work well? Most certainly. So data scientist is just an IT job everyone can learn quickly; stop being ridiculous about it.

  10. kuz

    I am a budding data analyst after completing a second master’s in Management Science (OR). After reading the points raised above by the author of the article, I can infer that the author expresses her own perspective of who a data scientist is. This does not necessarily reflect an absolute truth or reality, and in some ways her definitions are contestable. The author tends to underestimate the competence and capabilities of those who are of “purely academic or research background”. She emphasises the centrality of business knowledge to the role of a data scientist. I don’t agree with that perspective. The data scientist role should not be confused with that of a business analyst or business development people. Though these roles have points where they overlap, they are different. That a data scientist must have business acumen before he/she can actually be considered a true data scientist implies a narrow view of who a true data scientist is. While studying for my master’s degree, I saw companies and businesses bring projects comprising large data sets to academics/faculties (management scientists and operations researchers) to help solve complex business issues. Most of these academics do not have a vast amount of business acumen, but they could analyse datasets and explain insights and information to managers of these firms. Oftentimes, companies have acted on insights provided by these academics and these have resulted in improved performance. By the way, what is business acumen? Are we turning data scientists into business strategists? I can infer that unless one has several years of experience working in industry as a data scientist and dealing with projects, one is not considered a TRUE data scientist.

  11. A S

    This is an example of how perspectives can completely change the underlying meaning of things. Data science has nothing to do with ‘business’. Not everything is even business. Data science is being applied to many non-commercial projects which are targeted at things like saving lives, and not targeted towards business. It is absolutely strange that someone writing about data science seems to be presenting a perspective that is an unacceptable diversion from the true idea of data science.

    To summarise: Current application of data science to commercial settings is still quite limited. Progress of data science has been primarily through solutions to very domain-specific and niche problems. It is important to disseminate correct information regarding a topic, instead of throwing out perspectives derived from experience in very limited settings.

    Thank you.

  12. HarveyZow

    If someone can work with the problem and can add value, why describe them as fake? Data scientist is just a name. Why force people to comply with your definition? You seem concerned to exclude people. If you had posed this as your view of important knowledge or skills, that would be quite different.

  13. Bahman R

    The problem with companies who are looking for data scientists is that they want everything altogether, and cheap. They have always been like that. They are looking for bright minds with PhD-level quantitative skills and knowledge, yet they want them to have experience with unstructured data (as you put it) or with the market or business. Well, I say good luck with that. Those features rarely come together. Scientists, mathematicians, physicists, even many engineers are not familiar with unstructured data or the market or business; they have spent most of their time on a huge amount of complicated knowledge, and actually that’s why they are creative! On the other hand, many people, especially computer engineers or IT professionals, don’t know much mathematics. I’ve met computer science PhDs who don’t know what a matrix is! I suggest to you, and to the many people who are in hiring, that you value minds instead of techniques.

    • Paul

      It is very weird, mate, that a CS PhD doesn’t know what a matrix is. I have 100 PhD friends; all of them know, and they are experts in various areas of Computer Science.

  14. Wilman

    Hello, whoever wrote the article is not well qualified. I say this as a person with years of both industry and academic experience. Particularly point 3 is rubbish. I can vouch that academicians are much better than the industry counterparts in terms of depth of knowledge in any area of data science and as such they will be much better data scientists. I have interacted with many industry working persons (HSBC, Barclays etc.) and found that they aren’t even good at feature selection.

    Just to prove what I said is true I urge all the readers to think about R and Python. R and Scientific python are groomed and many novel packages are developed by academicians. Today one could see that SAS (an industry standard) is miles behind R. Almost all major R packages involving different data mining tasks are from academicians in the form of publications. (particularly in Journal of Statistical Software). So, it looks funny when the person writing this article isn’t aware that data science has roots in academia and not in industry. Isn’t it damn wrong to say the academicians contributing to packages to tasks aren’t that suitable for data science? By the way dear author how many R packages have you contributed till date to public domain so that I can appreciate and accept your qualification to pass such a comment in point 3?

    Please accept my apologies dear author of the article if I sounded rude. Also, please mail me for any clarifications. I will try my best to answer any and all your questions.

  15. Fuen

    I really don’t think having a keen business insight is fundamental to being an awesome data scientist. And a CS PhD not knowing what a matrix is? That is just borderline ridiculous and feels like a made-up lie. I don’t think it is possible for someone to get an undergraduate degree without taking a math course on matrices/vectors.

  16. A Real Data-Scientist

    This article is crap and meant to soothe people with undergrad and/or masters degrees titled as “Data-Scientists”.

    The term “Scientist” is awarded to some chosen people, which means something. A scientist (whether a “data-scientist” or any other) cares about answering “Why” before “How”. A data-scientist is more into understanding the very intrinsic nature of data. His/her analytical mind constantly tries to find underlying patterns in data.

    Ph.D. means Doctor/Doctorate in Philosophy. It is awarded to those who have demonstrated/proven and have accepted abilities (by well-known scientific community) in a particular field. These abilities are demonstrated not just once, and not just in one way…these are proven over and over in different ways. Some examples to prove these abilities are:

    (1) First getting into a reputable Ph.D. program at a competitive university. Just getting there is a big challenge. You need a track record of good undergraduate and/or masters degree, references, etc.
    (2) Winning funding from competitive sources, such as, NSF, NASA, DoD, DoE, etc.
    (3) Ability to teach undergraduates by performing TA duties.
    (4) Being a research fellow/assistant for someone who has a lot more knowledge and experience than you.
    (5) Win funding to attend conferences and to present your research-findings before scientific community.
    (6) Win funding for scientific workshops, camps, etc.
    (7) Pass advanced level courses in your disciplines.
    (8) Pass qualifying/comprehensive exams in your areas of research.
    (9) Be known to current scientific community AND to know current scientific community.
    (10) Develop some meaningful scientific methods and discoveries.
    (11) Publish your findings in reputable conferences and/or journals (not some crappy, low level conferences/journals).
    (12) Convince scientific and/or federal organizations to give you research grant (which is very big achievement as you are evaluated by your peers anonymously and they trust your abilities by providing your funding coming from tax-payers money).
    (13) Graduating with a Ph.D. It is estimated that only 64% of Ph.D. students actually graduate with a Ph.D. degree in the USA. Most drop out as they are unable to complete the above steps. They complete only a few, but not all, so they fail. Also, there are only 1% PhDs in the USA. So earning a Ph.D. from a reputable institution under the supervision of a real scientist is a big achievement. It changes your title from Mr./Ms. to Dr. Does that mean something to undergraduates/Masters?

    I can continue defining characteristics of a “Real Scientist”. Giving someone a title of “Data Scientist” without him/her having any of the solid track record listed above is a joke and an abuse to the scientific community. The above-mentioned scales are standard in most universities. It means that not everyone has the ability to sustain the pressure of carrying out research and prove himself/herself. This scale filters out those who do not deserve to be a scientist. It could be due to personal problems, or perhaps nature did not give them the ability to cross that line which separates a “Real scientist” from a “Non-real scientist”.

    It is said that when you can not give someone monetary promotion, give them title (recognition) award so that they calm down and feel better about themselves. Most IT companies now-a-days award you “Data-Scientist” position even if you have an undergraduate/masters degree.

    Being able to perform some basic statistical analysis, writing regression/classification/clustering model in R, Python, etc. does not make you a data-scientist. These are just tools. Everyday new tools appear in market. Nothing special about them. You can for sure call yourself a “Data-Analyst”, but trust me, if you meet an actual scientist, s/he will be humored to hear you calling yourself a “Data-Scientist” with an undergraduate/masters degree.

    Finally, a real data-scientist does not give a damn about business as stated in this article. For a data-scientist, data is just a mixture of numbers, characters, sentences, etc. A data-scientist is interested only in finding hidden patterns in data. It is similar to what a patient is to a medical doctor: just a subject who has some known/unknown symptoms. The doctor’s job is to treat those symptoms whether that patient is the president or a criminal.

    • Christopher E. Devito

      absolutely true… a phd in computer science or any other natural science is very difficult to attain. much more complicated than landing a data scientist role.. it really is a bad job description ..

