Rise of the Data Engineer: Tools, Skills, and the Future of Big Data

Posted October 11, 2021

Data engineering has never been more important or relevant than it is today. In the past, many data engineering responsibilities were assigned to data scientists, but most data scientists are not experts at building the data infrastructure and pipelines their work depends on. Because of this, we’ve found that many data scientists prefer to exclude data engineering requirements from their job responsibilities altogether.

This post, an excerpt from our newly released 2021 salary report for the data engineering field (which can be downloaded for free here), shares more about how the field has developed over the past few decades and some of the criteria that Burtch Works uses to define data engineers.

The Evolution of Data Engineering

Many might say that data engineering as a profession has been around for several decades, ever since relational databases came to market, led by major Original Equipment Manufacturers (OEMs), starting in the late 1970s. These products included Oracle, followed in the 1980s by IBM DB2 and Microsoft SQL Server. However, the reality is that data engineering has evolved immensely since those early years with the onset of Big Data, digital transformation, and more sophisticated data science practices like machine learning and artificial intelligence.

Data volume, variety, and velocity are now much greater than they used to be, which has led data engineering professionals away from traditional ETL tools and toward developing and adopting new tools and processes to handle the data revolution. These modern tools and responsibilities now span cloud computing, data infrastructure, data warehousing, data mining, data modeling, data crunching, metadata management, data testing, and governance, among others.

Defining Data Engineers

So how does Burtch Works define data engineers? We define data engineers as professionals who design and build systems for collecting, storing, and analyzing data at scale. They are also typically responsible for building data pipelines to bring together information from different source systems.
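
To make "bringing together information from different source systems" concrete, here is a minimal sketch in Python. It is only an illustration: the two sources (a CSV export and a JSON payload), the field names, and the function names are all hypothetical and do not come from the salary report. A production pipeline would read from real systems and handle far more edge cases.

```python
import csv
import io
import json

# Hypothetical source system 1: a customer export in CSV form.
CRM_CSV = """customer_id,name
1,Ada
2,Grace
"""

# Hypothetical source system 2: an orders service that emits JSON.
ORDERS_JSON = (
    '[{"customer_id": 1, "amount": 40.0},'
    ' {"customer_id": 1, "amount": 10.5},'
    ' {"customer_id": 2, "amount": 7.25}]'
)


def extract_customers(raw_csv):
    """Parse the CSV export into a dict keyed by integer customer id."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return {int(row["customer_id"]): row["name"] for row in reader}


def extract_orders(raw_json):
    """Parse the JSON payload into a list of order dicts."""
    return json.loads(raw_json)


def combine(customers, orders):
    """Join both sources on customer_id and total the order amounts."""
    totals = {cid: 0.0 for cid in customers}
    for order in orders:
        cid = order["customer_id"]
        if cid in totals:  # skip orders with no matching customer
            totals[cid] += order["amount"]
    return [
        {"customer_id": cid, "name": customers[cid], "total_spent": totals[cid]}
        for cid in sorted(customers)
    ]


report = combine(extract_customers(CRM_CSV), extract_orders(ORDERS_JSON))
for row in report:
    print(row)
```

Each source gets its own extract function, and the join lives in a single place, so adding a third source would not disturb the existing ones.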

Data Engineer Education Profile

Data engineers typically hold a Bachelor’s or Master’s degree in Computer Science, Information Systems, or Computer Engineering. In the sample of data engineering professionals from our 2021 salary report, the most common degree was a Master’s (62% of the sample), followed by a Bachelor’s (32%), while PhDs were rare (5%). For more about how this compares to data scientists, check out this post.

Data Engineer Tool Usage

Data engineering is a field with many tools, and it’s not uncommon to see a very extensive tools section on a resume or job description. No single tool makes someone a data engineer, so most data engineers have broad experience across many tools, including many of the examples listed below:

Programming: Python, PySpark, Scala, Java, SQL, Shell Scripting, or occasionally C++
Cloud Computing: AWS (Redshift, EMR, EC2, Lambda, S3, etc.), Azure, or GCP (BigQuery)
Relational Databases: SQL Server, Oracle, MySQL, Teradata
NoSQL Databases: Cassandra, MongoDB, Neo4j
Continuous integration/continuous deployment (CI/CD): Docker, Jenkins, Kubernetes
Big Data technologies: Hadoop, HDFS, Hive, MapReduce, Spark, HBase
Reporting: Tableau, Power BI, and Looker
Other: Databricks, Airflow, Git, JavaScript, HTML, Linux

Typical Data Engineer Skills & Job Responsibilities

Data engineers often have a wide range of skills and work alongside data scientists to prepare data for analysis and put data products into production. For more about how data engineer vs. data scientist skills compare, see this post. Below are some examples of typical data engineer skills and responsibilities that we see:

Building data pipelines and ETL or ELT
Experience with complex distributed computing
Ability to work with structured and unstructured data
Deployment of data science models
Experience with data science applications
Experience with continuous integration working with Docker and Kubernetes
Build and scale large batch data pipelines and real-time ETL pipelines
Gather business requirements and implement data processes
Design and support data lakes and data marts
Work with data scientists to deploy machine learning models
Troubleshoot models in a production environment to ensure accuracy
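
The first item in the list above, "ETL or ELT", names two orderings of the same steps: ETL transforms data before loading it, while ELT loads the raw data first and transforms it inside the database. The sketch below is a hedged illustration of that difference using Python's built-in sqlite3 module as a stand-in for a data warehouse. The sample rows, table names, and function names are assumptions made for the example, not anything described in the report.

```python
import sqlite3

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    ("2021-10-01", " Widget ", "3"),
    ("2021-10-01", "gadget", "2"),
    ("2021-10-02", "WIDGET", "5"),
]


def run_etl(conn, rows):
    """ETL: clean the rows in Python first, then load the finished result."""
    cleaned = [(day, product.strip().lower(), int(qty)) for day, product, qty in rows]
    conn.execute("CREATE TABLE sales_etl (day TEXT, product TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load the rows untouched, then transform them with SQL in the database."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, product TEXT, qty TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT day, lower(trim(product)) AS product, CAST(qty AS INTEGER) AS qty
        FROM raw_sales
        """
    )


def totals(conn, table):
    """Total quantity per product; the table name is fixed by this script, not user input."""
    query = f"SELECT product, SUM(qty) FROM {table} GROUP BY product ORDER BY product"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_EVENTS)
run_elt(conn, RAW_EVENTS)
etl_totals = totals(conn, "sales_etl")
elt_totals = totals(conn, "sales_elt")
print(etl_totals)
print(elt_totals)
```

Both paths end with the same cleaned table. The practical difference is where the transformation runs and whether the raw data stays available in the database for later reprocessing.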

Typical Job Titles

There are a variety of job titles and specializations within data engineering, and we’ve also seen a rise in hybrid roles that lean further towards machine learning or DevOps. Below are just a few examples of data engineering job titles. To learn more about specializations like BI Engineers, Computer Vision Engineers, or Data Architects, you can read this post.

Data Engineer
Big Data Engineer
Data Science Engineer
Cloud Engineer
Cloud Data Engineer
Principal Data Engineer
Manager/Director, Data Engineering
Head of Data Engineering/Architecture

Looking into the future, as we continue to see more hiring and investment allocated to building data teams, the demand for data engineering is poised for significant growth and continued innovation. There is a lot to learn about this growing field, so our hope is that this post can be a good foundational resource to learn more about who these professionals are and what they do.

Interested in our salary research on data engineers and data scientists? Our salary reports are free to download.