
For many quantitative business professionals, the debate around analytics tools has been known to rival the enthusiasm of political conversations at Thanksgiving dinner. Over the past two years we’ve sent out an immensely popular SAS vs. R survey to our network of quantitative professionals, generating over 1,000 (often very passionate) responses each year. But, last year, we received enough write-in requests to include Python in our survey that we decided to evolve with the times!

To keep things simple, we only asked one question – Which do you prefer to use: SAS, R, or Python?

[Figure: Overall tool preference (donut chart)]
[Figure: Tool preference over time]

Our results show that support for open source tools has been steadily climbing over the past three years, with 61.3% of respondents this year choosing R or Python, and 38.6% choosing SAS. As per usual, there were a few professionals who responded with neither/both, or with write-ins for other tools. Who knows, perhaps our survey will have to evolve again next year? We’ll see!

 

Eager to learn more about these numbers? So were we! After tallying the initial results we couldn’t wait to dig into our survey data from the past three years and see what we could discover.

 

[Figure: Tool preference by region, 2016]
In this year’s results, it’s easy to see that the largest proportion of Python supporters are on the coasts, with the West Coast edging more towards open source options R and Python than any other region. R support is highest in the Mountain region, and SAS support continues to be highest in the Midwest and Southeast.

 

[Figure: Tool preference by industry, 2016]

Similar to last year, Tech/Telecom most heavily favors open source options R and Python, which is not too surprising given the prevalence of West Coast supporters. As in 2014 and 2015, SAS continues to have a stronghold in the Retail/CPG and Financial Services industries.

[Figure: Tool preference by education, 2016]

Professionals with a Ph.D. are more likely to prefer R than SAS, while Bachelor’s holders are more likely to prefer SAS than R. Support for Python varies slightly, but is strongest amongst professionals with a Ph.D.

[Figure: Tool preference by years of experience, 2016]

Support for open source tools R and Python is the greatest amongst professionals with 0-5 years’ experience, while support for SAS is greatest amongst professionals with 16+ years’ experience. As more universities have moved towards teaching open source tools like R and Python in their programs, we’ve seen more support for these options from junior professionals.

[Figure: Tool preference, data scientists vs. other predictive analytics professionals (donut charts)]

Python holds the majority of support amongst data scientists, with SAS holding only 3% amongst this group. This is likely due to SAS’s limitations when building the custom tools that many data scientists use to manage unstructured data. Amongst other predictive analytics professionals, SAS and R support is relatively even. For more information about how we define “data scientist”, check out this blog.

[Figure: Tool preference, data scientists vs. other predictive analytics professionals, change since 2015]

Here you can see how tool preferences changed for data scientists and other predictive analytics professionals since our 2015 survey.

A Few Lively Comments

With so many participants in our flash surveys, we always receive quite a few entertaining observations that we enjoy sharing with you. Here are just some of the enjoyable responses we collected:

  • SAS 😊😊😊 All the way
  • R is a nightmare.
  • I’ve turned my team into a Python shop last year; we have no regrets
  • R. Then Python.  I never use SAS anymore 🙂
  • SAS is so intuitive to use for data analysis
  • SAS should be very concerned, as if they are a red giant transitioning to a white dwarf.
  • SAS, R, *and* Python!
  • I know 7 dozen PhDs in Germany that would say Python
  • SAS ….. probably because I’m over 50.      ; )
  • R – Anytime, Anywhere, Anything
  • Python is the future
  • 1st SAS, 2nd R, 999th  Python 🙂

If you’re interested in comparing any of these results to our “deeper dive” from last year, you can check out this blog post. We also presented more detail on these results in a 15-minute webinar, and you can find the recording on our YouTube channel. We always find the evolution of tools and methodologies in predictive analytics to be pretty interesting, so we look forward to this every year. What did you think of the results? Did anything surprise you? Let us know in the comments!

23 Responses to “SAS, R, or Python Survey 2016: Which Tool Do Analytics Pros Prefer?”

  1. Chris Weekes

    R: fewer installation steps. Python: more automation functionality. SAS: highest data storage ability. The code frameworks are similar. They can complement each other. None is better, just different.

  2. John Hogue

    As the industry matures and switches from ad hoc to operational work, Python will continue to take market share from R and SAS. It will take share from R because of its formal code style guidelines, ease of picking up, and scalability (PySpark, Dask). Both R and Python will eat away at SAS as licensing fees and student access continue to divert newcomers.

  3. Raghu Reddy

    The nice thing about Python is, it is one language that can do everything. It is a fantastic sticky glue. For that reason alone, Python is by far the most fun of the three.

    The nice thing about R is, it comes with many readily available packages for a lot of quantitative routines. Beyond that core area, R has no appeal as a general purpose language. (People like Norman Matloff will of course argue endlessly because it is free publicity to sell some more books).

    Now what is the nicest thing I can say about SAS? Well it ain’t dead yet.

    The geographical distribution chart is the first of its kind I have seen. It is not surprising that SAS is losing faster in the bi-coastal markets.

  4. Parfait Gasana

    Qualitatively, these powerful tools are somewhat different and, like a great handyperson, one should keep a handful in their toolkit. Yet I tend to see many experts, such as StackOverflow high-rep users, GitHub coders, and industry/academic users, heavily skilled in only one type. Both companies and analytics employees should strive to be immersed in each tool, ready to switch per the task of the job and nature of work scope. Below are my takes on the three (advance apologies for any biased opinions):

    SAS
    Pros: Industry dominance in mission-critical applications and current operations; great big data handler where storage uses hard disk as opposed to memory; powerful integrated procs ecosystem where many modeling types are available; everything contained and readily built without multiple third-party packaging; excellent integration with relational databases (Oracle, MySQL, SQL Server); great online tutorials/forums and 1-800 support services;
    ————–
    Cons: base language tends to be a software-restrictive domain where syntax is not easily translatable or inferred from other programming languages; the dataset serves as the main core data structure, where scalars and arrays mostly run under a data step or macro; perpetual, yearly, high-cost licensing that becomes prohibitive for individuals or small to medium-sized companies;

    R
    Pros: Arguably a general purpose, scripting language capable of nuanced needs; all data structures derive from the vector type (i.e., no scalar type but a vector of one element) so many operations can be vectorized where processes run at the lower level C language including the apply family (lapply, mapply, sapply), ifelse(), aggregate(), outer(), do.call(), etc. for efficient runtimes; large, popular, active community with the immense CRAN-R network with close to 8,000 dedicated packages with very popular dplyr and data.table modules; excellent data visualization tools especially with ggplot2 library; excellent integration with relational databases (Oracle, MySQL, SQL Server);
    ————–
    Cons: High learning curve for beginners especially from non-vector type languages where nested for loops and if/then logic sequences are regularly used; being free and open-source, no 1-800 support for package installation on various CPU/server environments; reliance on library authors to debug packages and update versions;

    Python
    Pros: True general purpose, scripting, object-oriented language with a growing analytics emphasis through third-party modules (numpy, scipy, pandas); data structures and data types are fluid, with strings, numbers, lists, tuples, sets, dictionaries, series, matrices, arrays, and dataframes easily interchanged, nested, upsized, and downsized for ease of data manipulation; capable of nuanced tasks such as XML/JSON parsing, I/O flat-file and binary file manipulation, class type support; command line with arguments and multi-threading of child processes; integration with C++, .NET, and Java and with web frameworks such as its popular Django; excellent integration with relational databases (Oracle, MySQL, SQL Server), even document/NoSQL types (Mongo) and big data (Hadoop); and easy package installation with pip;
    ————–
    Cons: Strongly typed language that forces a structured layout, particularly indentation and line breaks, though this helps produce human-readable code; terse syntax (as opposed to Java-esque verbosity) can be tough to read in some situations; analytics aspect still a growing area and not fully realized for modeling purposes or integrated in industry (i.e., academic, finance, healthcare); being free and open-source, no 1-800 support for package installation on various CPU/server environments; reliance on library authors to debug packages and update versions;
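
As a rough illustration of the data-structure fluidity and JSON parsing listed under the Python pros, here is a minimal sketch using only the standard library. The records and field names are made up for the example and are not survey data.

```python
import json

# Hypothetical input, invented purely for illustration.
raw = '[{"name": "a", "count": 3}, {"name": "b", "count": 2}, {"name": "a", "count": 1}]'

records = json.loads(raw)                    # JSON text -> list of dicts
names = [rec["name"] for rec in records]     # list of strings
unique_names = sorted(set(names))            # list -> set -> sorted list
pairs = tuple((rec["name"], rec["count"]) for rec in records)  # tuple of tuples

# Aggregate counts per name into a plain dictionary.
totals = {}
for name, count in pairs:
    totals[name] = totals.get(name, 0) + count

print(unique_names, totals)
```

The same values move between lists, sets, tuples, and dictionaries without any explicit schema, which is the kind of interchange the comment describes.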

    Complements
    Also, do note the three can even interact with one another: SAS’ IML matrix module to interface with R; R’s foreign and Python’s pandas packages to read and write SAS/Stata/SPSS datasets; and Python’s rpy2 package to interface with R. Each also has built-in command line shells to call external scripts or batch jobs with arguments.
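
The last point, calling an external script with arguments, can be sketched in Python roughly as follows. This is only an illustration: the child process here is another Python interpreter running a one-line script, chosen so the example runs anywhere; calling an R or SAS batch job would mean swapping in that tool's own command line, which is not shown.

```python
import subprocess
import sys

# One-line child script: sums the integer arguments it receives.
child_code = "import sys; print(sum(int(a) for a in sys.argv[1:]))"

# Launch the child with three command-line arguments and capture its output.
result = subprocess.run(
    [sys.executable, "-c", child_code, "2", "3", "5"],
    capture_output=True,
    text=True,
    check=True,  # raise if the child exits with a non-zero status
)

total = int(result.stdout.strip())
print(total)
```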

  5. Robert Young

    One aspect to note, R (I believe solely) can be run in-process with industrial strength RDBMS, DB2 excepted (IBM bought SPSS, and hasn’t yet bitten the bullet). This allows seamless execution of stat/graphic analysis in database applications, existing and new. A lot of leverage there.

  6. Juan

    None of them.
    We need something new, able to work transparently with large datasets that don’t fit in memory (even distributed) and perform statistical analysis easily (including mixed effects and Bayesian analysis with large datasets).
    And the ability to do it with an ETL interface.
    ff or bigmemory packages are not the solution, they are just patches.
    Maybe Spark in 10 years?

    • Michael Gilbert

      Your point is well taken, but a lot of this depends on what the goal and purpose of the model is: is it for statistical inference, or predictive accuracy? Remember that larger is not always better. If you’re looking for the needle in the haystack and have petabytes of data, then sure. R’s local memory issues are a constraint, and as you mentioned ff and bigmemory are just patches, not a solution. Spark is getting closer, but more on the ML side of the house. You can always do an ODBC connection and sample directly in from SQL. I’ve been wanting to check out BayesiaLab, but haven’t had the chance. For R, there are also BUGS-like packages such as R2WinBUGS and bayesm.
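
The idea of sampling directly from SQL rather than loading a full table can be sketched as below. This is a minimal illustration under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for an ODBC connection, and the `observations` table and `sample_rows` helper are hypothetical names invented for the example.

```python
import sqlite3


def sample_rows(conn, table, n):
    """Return n randomly chosen rows from a table.

    The table name is interpolated directly, so it must come from
    trusted code, never from user input.
    """
    query = f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT ?"
    return conn.execute(query, (n,)).fetchall()


# Build a small in-memory table to sample from (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO observations (value) VALUES (?)",
    [(i * 0.5,) for i in range(1000)],
)

sample = sample_rows(conn, "observations", 50)
print(len(sample))
```

Note that `ORDER BY RANDOM()` still scans the whole table, so on very large tables a database's own sampling feature (for example, `TABLESAMPLE` where it is supported) may be the better choice.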

      Michael

  7. David Webb

    This review is based on a sampling of these languages. A sampling is what it takes to select one or more languages for a use-case, and a positive showing during this sampling may be all that it takes for language selection and subsequent preference.

    Python – Object orientation makes it easy to jump in from other languages. Multiple versions supported on the same host allow software that requires different versions. For example, on CentOS 6.x, yum is written in Python 2.6.6. However, Python 2.7.10 and 2.7.11 or greater may be required for certain data science toolsets. Anaconda and other development environments make Python fairly easy to develop in and port to other platforms. Python’s support for Spark and its compilability are other great advantages. I miss the inline debugging development environments that I’ve used since the late 1980s. The console just isn’t as easy to work with.

    R – Great language for mathematics. If R had been available when I was in college, I would have skipped Perl entirely. The functional programming style takes a bit of getting used to, but only a bit. It doesn’t share the multi-versions on the same platform capability that Python offers, so pick your version carefully. Newer versions may not work with some packages and older versions may not work with others. RStudio and Jupyter are nice development environments. R is lacking in compilability though. I suspect that this will limit its use in packaged software.

    SAS – In the early 1980s, my mother used to bring home reams of paper that had been printed on one side. I recently took a class that showcased SAS. Guess what? I made paper airplanes out of SAS printouts when I was a kid. The language shows its age in two ways. First, it’s mature and has a large suite of data analysis and visualization tools that can be immediately used. Second, its page-screen mainframe history is clearly visible in the way code needs to be written. SAS is one of the oldest 3G programming languages that I’ve worked with. I think the company needs to enable native support for a more modern, open-source language like Scala, Python, or R within their development suite.

    Scala – Not mentioned in this survey, Scala is a language for the serious programmer-data-scientist. Unfortunately, it’s a bit of a challenge to learn the object-oriented/functional nature of the language all at once. Native support for Spark and compilability into jars are big positives. Eclipse and IntelliJ editors are nice, and IntelliJ offers inline debugging with variable watches, which far surpasses scripting consoles for testing code, but consoles are included if that’s what you’re used to.

  8. Carmen Gallagher

    Just as well you can code in any of the above plus a whole lot more with the new SAS Viya platform then. Now it’s the best of both worlds

  9. Hector Alvaro Rojas

    Hi Carmen!

    What’s up?

    You got me! All of them are my favorite combination. I like all of them and I learn how to use them better each time that I can.

    They do not need to compete. They are the best! No question about it!

    Anyway, I think you forgot to include Tableau. You can interact with all of them, including Tableau. Scripts in HTML or Java can help a lot in the interaction. That’s what I do!

    You can separate them as free or not free access platforms. In this case, R-project, Python and Tableau Public are the winners.

    Regards!

  10. Nilmadhab Mandal

    I would be very interested to know how many of the projects are running in ‘production’ mode or have been deployed to support important decisions. A view of the size of the organizations where they have been deployed would also have yielded interesting insights. I hope the next survey can include a question on deployments and their usage (how many users leverage the projects).

  11. Frank

    What happened in 2016? Python from 0% to 20% in this investigation.
    By the way, I think it would be better to offer “other tools” as a response choice.

  12. Wasifur

    If you look to the future, distributed machine learning is the key to success. SAS is on top with its new in-memory distributed processing platform, Viya. Similarly, you can get the same capability using R on the Azure ML platform. Python with PySpark can run in distributed mode on Spark, but its execution is much slower than Spark MLlib. So in the end we need a single platform where any type of user, whether business analyst, data scientist, or statistician, can work. SAS Viya scores here, as it provides both a point-and-click and a programming interface.

  13. Noor

    Hello All,

    I would like to say thanks in advance for your responses.
    I completed my master’s in computer science in 2011. From the beginning I was studying and working part time because of financial issues. I was a teacher, but my dream was always to become an IT professional and take a professional course; when your pockets are empty, though, your dreams are shattered.
    I am now 31 years old and worry that I am too old to fulfill my dream. But today, after a lot of thought, I decided to see whether I can still do something. I have forgotten almost everything about programming, so I am effectively a newcomer.

    In short, I am now thinking of studying SAS on my own and trying my luck. I need to know whether I can still change my profession at this age from teaching to SAS analytics, and what I would need to study, or whether it is too late to change my career. Please advise. I am good at studying and can learn on my own.

    Waiting desperately for your responses.

    Thanks and regards,
    Noor

    • sbosowski

      Hi Noor,

      It’s never too late! Especially if you already have completed a Master’s in Computer Science, you might just need to take a few MOOCs to re-familiarize yourself with core concepts and brush up on the latest tools. A lot of employers are looking for professionals with computer science and statistics experience, and now there are so many online resources to learn these skills.

      We’re wishing you the best of luck on your career journey!

      Stacy

  14. Steve

    Well written topic and also nice comments.

    However, the reality in “my” working environments as of March 2017 is:

    80% Python
    15% R
    5% Java and Scala
    0% SAS

    I use Python for almost all tasks. I use R to compensate for the lack of some essential packages in Python; however, I access these packages via rpy2. My other strong case for using R is graphics (ggplot2), since I abominate Python plotting libraries (personal choice).

    It is my firm belief that Clojure will emerge as the language for everything in the future. Some companies (like yieldbot) have their internal libraries for machine learning written in Clojure. Golang is also very good, but as of now it has third-rate and dirty libraries for analytics. Hence, Golang is at least 2 years away from serious production tasks.

    Clojure, like Python, is a glue language. If we take 10 different application areas, Clojure has a presence alongside Python in all of them (say .NET – CLR/ Mobile – ClojureScript and React.js/DSL/Data science – Incanter and Spark interface, etc.). However, its performance is better than Python’s.

    Scala has no life beyond data science and hence its adoption is slowing down. No one likes complexity. Only companies which heavily invested without forethought still lobby for Scala programmers. I have read many articles from people who started new data science companies and exclusively use Python. They prefer improving the performance of slow Python code (PyPy/other approaches) as opposed to writing anything in Java/Scala.

    For me, Clojure looks promising in being one size fits all. I plan to use it for the next 5 years no matter what hurdles come in. I suggest you all use Scala/R/Python but not Clojure. This way I can eliminate you in competition and win.
    :-))


