Rise of the Data Engineer: Tools, Skills, and the Future of Big Data
Data engineering has never been more important or relevant than it is today. In the past, many data engineering responsibilities were assigned to data scientists, but most data scientists are not experts at building the data infrastructure and pipelines that their work depends on. Because of this, we’ve found that many data scientists prefer to exclude data engineering requirements from their job responsibilities altogether.
This post, an excerpt from our newly released 2021 salary report for the data engineering field (which can be downloaded for free here), shares more about how this field has developed over the past few decades and some of the criteria that Burtch Works uses to define data engineers.
The Evolution of Data Engineering
Many might say that data engineering as a profession has been around for several decades, since relational databases such as Oracle, IBM DB2, and later Microsoft SQL Server came to market from major vendors beginning in the late 1970s. However, the reality is that data engineering has evolved immensely since those early years with the onset of Big Data, digital transformation, and more sophisticated data science practices like machine learning and artificial intelligence.
Data volume, variety, and velocity are now much greater than they used to be, which has led data engineering professionals away from traditional ETL tools and toward developing and adopting new tools and processes to handle the data revolution. These modern tools and responsibilities now support cloud computing, data infrastructure, data warehousing, data mining, data modeling, data crunching, metadata management, data testing, and governance, among others.
Defining Data Engineers
So how does Burtch Works define data engineers? We define data engineers as professionals who design and build systems for collecting, storing, and analyzing data at scale. They are also typically responsible for building data pipelines to bring together information from different source systems.
Data Engineer Education Profile:
Data engineers typically hold a Bachelor’s or Master’s degree in Computer Science, Information Systems, or Computer Engineering. In the sample of data engineering professionals from our 2021 salary report, the most common degree was a Master’s (62% of the sample), followed by a Bachelor’s (32%); PhDs were rare (5%). For more about how this compares to data scientists, check out this post.
Data Engineer Tool Usage
Data engineering is a field with many tools, and it’s not uncommon to see a very extensive tool section on a resume or job description. No single tool makes someone a data engineer, so most data engineers have broad experience across many tools, including many of the examples listed below:
- Programming: Python, PySpark, Scala, Java, SQL, Shell Scripting, or occasionally C++
- Cloud Computing: AWS (Redshift, EMR, EC2, Lambda, S3, etc.), Azure, or GCP (BigQuery)
- Relational Databases: SQL Server, Oracle, MySQL, Teradata
- NoSQL Databases: Cassandra, MongoDB, Neo4j
- Continuous integration/continuous deployment (CI/CD) and containerization: Jenkins, Docker, Kubernetes
- Big Data technologies: Hadoop, HDFS, Hive, MapReduce, Spark, HBase
- Reporting: Tableau, Power BI, and Looker
Typical Data Engineer Skills & Job Responsibilities
Data engineers often have a wide range of skills and work alongside data scientists to prepare data for analysis and put data products into production. For more about how data engineer and data scientist skills compare, see this post. Below are some examples of typical data engineer skills and responsibilities that we see:
- Building data pipelines and ETL or ELT processes
- Experience with complex distributed computing
- Ability to work with structured and unstructured data
- Deployment of data science models
- Experience with data science applications
- Experience with continuous integration working with Docker and Kubernetes
- Build and scale large batch data pipelines and real-time ETL pipelines
- Gather business requirements and implement data processes
- Design and support data lakes and data marts
- Work with data scientists to deploy machine learning models
- Troubleshoot models in a production environment to ensure accuracy
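To make the pipeline and ETL items above more concrete, here is a minimal sketch in Python of what a small batch ETL job can look like. It is an illustration only, not drawn from the salary report: the two sample sources, the customer_totals table, and all function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for two separate source systems.
CUSTOMERS_CSV = """customer_id,name
1,Ada
2,Grace
"""

ORDERS_JSON = (
    '[{"customer_id": 1, "amount": 20.0},'
    ' {"customer_id": 1, "amount": 5.5},'
    ' {"customer_id": 2, "amount": 12.0}]'
)


def extract():
    """Extract: read raw records from each source."""
    customers = list(csv.DictReader(io.StringIO(CUSTOMERS_CSV)))
    orders = json.loads(ORDERS_JSON)
    return customers, orders


def transform(customers, orders):
    """Transform: total the order amounts per customer and attach names."""
    totals = {}
    for order in orders:
        cid = int(order["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
    return [
        (int(c["customer_id"]), c["name"], totals.get(int(c["customer_id"]), 0.0))
        for c in customers
    ]


def load(rows, conn):
    """Load: write the combined rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, total_amount REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?, ?)", rows
    )
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order: extract, transform, load."""
    customers, orders = extract()
    load(transform(customers, orders), conn)


# In-memory SQLite is an assumption made so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
run_pipeline(conn)
result = conn.execute(
    "SELECT name, total_amount FROM customer_totals ORDER BY customer_id"
).fetchall()
print(result)
```

In an ELT variant of the same job, the raw customer and order records would be loaded into the target system first, and the aggregation would then run there, typically as SQL.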
Typical Job Titles
There are a variety of job titles and specializations within data engineering, and we’ve also seen a rise in hybrid roles that may lean further towards machine learning or DevOps. Below are just a few examples of data engineering job titles, but to learn more about specializations like BI Engineers, Computer Vision Engineers, or Data Architects, you can read this post.
- Data Engineer
- Big Data Engineer
- Data Science Engineer
- Cloud Engineer
- Cloud Data Engineer
- Principal Data Engineer
- Manager/Director, Data Engineering
- Head of Data Engineering/Architecture
Looking into the future, as we continue to see more hiring and investment allocated to building data teams, the demand for data engineering is poised for significant growth and continued innovation. There is a lot to learn about this growing field, so our hope is that this post can be a good foundational resource to learn more about who these professionals are and what they do.