Data Science & Engineering

The Yin and Yang of Data Science and Data Engineering

“Data is the new oil. It’s valuable, but if unrefined it cannot really be used.” -Clive Humby

I recently became very interested in Data Science and Data Engineering, and in how the two compare and complement each other. I initially assumed Data Engineering was a subset of Data Science, but after extensive research I found out just how much the two fields differ.

In this article, I hope to discuss the differences between Data Science and Data Engineering, and how they complement each other.

Data

To fully understand the relationship between Data Science and Data Engineering, you have to understand the one thing that links them both: Data.

Data is a word that has become commonplace in today’s society, with so many reports of data leaks, the inappropriate collection of data by big tech companies, and so on.

Data is information that is collected and stored in a format that can be processed by a computer. It can be in various forms such as numbers, text, images, and videos, and it can be collected, stored, and analyzed to extract insights and inform decisions.

Now, why do so many companies want data, and what’s so special about it?

Data is important to companies because it allows them to make informed decisions about their operations and strategies. By analyzing data, companies can gain insights into the behaviour of their users, and those insights can then be used to make their products more efficient and useful for users.

Data scientists and engineers are the people responsible for collecting the data, making it useful, analysing it, extracting insights and trends from it, and passing the findings on to management to enable informed decision making. Now let’s see how they differ.

Data Science

The role of data scientist was dubbed “The Sexiest Job of the 21st Century” by the Harvard Business Review, and its claim to the title is arguably legitimate.

Data Science is the process of using scientific methods, algorithms, and systems to analyse and extract value from data.

In other words, the data scientist is the individual responsible for gaining insights from data and making abstract mathematical models from the data in order to enable prediction.

Now let us look at the data engineer.

Data Engineering

Data Engineering is the process of designing, constructing and maintaining the pipelines and infrastructure that collect, store, process and analyze data.

The Data Engineer is the individual responsible for ensuring that the data Data Scientists need to analyse and gain insights from is available, accurate, and in the right format.

Data is infuriatingly complex and disordered when it is collected, so for Data Scientists to efficiently gain insights from it, the data first needs to be pre-processed. Once insights have been gained, Data Scientists formulate an abstract mathematical model from the data, commonly known as a Machine Learning model. This abstraction then needs to be post-processed in order to be deployed and integrated into the product. The pre-processing and the post-processing for deployment are performed by data engineers.

An analogy to describe the relationship between the Data Scientist and the Data Engineer

Imagine you placed a bet with a friend on the outcome of a football game, but you wanted to cut out the luck factor that is ever so present in uninformed guesses and be as sure as possible that the team of your choice wins the game and you win the bet.

A data engineer would collect data on the two teams involved in the bet, with data points such as the number of games won, the possession rate per game, and the results of previous clashes between the two teams. They would then create an ETL (extract, transform, load) pipeline in which the data is collected, cleaned and stored for the data scientist.
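To make the ETL idea a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the team names, the numbers, the column names, and the `team_stats` table are all made up, and a real pipeline would read from actual sources and be far more involved.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing,
# a missing value and a duplicate row. All values are invented.
RAW_CSV = """team,games_won,possession_rate
 Lions ,12,0.58
tigers,9,
Lions,12,0.58
Bears,7,0.47
"""


def extract(raw_text):
    """Extract: read the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: tidy team names, drop incomplete and duplicate rows, cast types."""
    cleaned = []
    seen = set()
    for row in rows:
        team = row["team"].strip().title()
        if not row["games_won"] or not row["possession_rate"]:
            continue  # skip rows with missing values
        if team in seen:
            continue  # skip teams we have already kept
        seen.add(team)
        cleaned.append((team, int(row["games_won"]), float(row["possession_rate"])))
    return cleaned


def load(records, conn):
    """Load: store the cleaned records in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_stats "
        "(team TEXT PRIMARY KEY, games_won INTEGER, possession_rate REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO team_stats VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract, transform and load, then return the stored rows."""
    load(transform(extract(raw_text)), conn)
    query = "SELECT team, games_won, possession_rate FROM team_stats ORDER BY team"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for stored_row in run_pipeline(RAW_CSV, connection):
        print(stored_row)
    connection.close()
```

The three functions mirror the three letters of ETL, so the data scientist only ever sees the cleaned rows in the table and never the messy export.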

The Data Scientist would then perform something called Predictive Analysis using Machine Learning. This means the data scientist would feed the data prepared by the data engineer into an algorithm that generates a mathematical abstraction called a Machine Learning model. The Machine Learning model then predicts the team expected to win the game, and just like that your guess becomes less of a guess and more of a data-informed decision.
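As a rough illustration of what "feeding data into an algorithm" can look like, the sketch below trains a tiny logistic regression model in plain Python. Logistic regression is simply one algorithm I picked for the example, not the only or necessarily the best choice, and the match history, feature choices, and function names are all hypothetical.

```python
import math

# Hypothetical history of matches between team A and team B.
# Features: (difference in win rate, difference in possession rate), A minus B.
# Label: 1 if team A won the match, 0 otherwise. All values are invented.
MATCHES = [
    ((0.30, 0.10), 1),
    ((0.20, 0.05), 1),
    ((0.10, 0.08), 1),
    ((-0.10, -0.02), 0),
    ((-0.25, -0.12), 0),
    ((-0.15, 0.03), 0),
]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Weighted sum of the features plus the bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(matches, learning_rate=0.5, epochs=2000):
    """Fit logistic regression with stochastic gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in matches:
            error = sigmoid(score(weights, bias, features)) - label
            # Nudge each parameter against the gradient of the log loss.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict_win_probability(model, features):
    """Return the model's estimated probability that team A wins."""
    weights, bias = model
    return sigmoid(score(weights, bias, features))


if __name__ == "__main__":
    model = train(MATCHES)
    upcoming_match = (0.25, 0.08)  # hypothetical features for the game you bet on
    print(f"Estimated chance team A wins: {predict_win_probability(model, upcoming_match):.2f}")
```

The learned weights and bias are the "mathematical abstraction": once trained, the model turns the stats of an upcoming game into a probability instead of a hunch.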

Summary

As you can hopefully gather from the descriptions of Data Scientists and Data Engineers above, a Data Scientist is similar to a star football player, and the Data Engineer is like the very talented coach who keeps the player fit and provides the tactics to win a game.

Written on January 18, 2023