TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial…

Managing Data as a Data Engineer — Part 3: Key Principles & Lessons Learnt

In the last two articles, we looked at how users view data and how data changes over time. In Part 3 of the Managing Data as a Data Engineer series, I am going to share some of the key principles and lessons learnt while managing data as a data engineer.

Don’t Be Blind

To manage the ever-changing scope of data, one of the first things to do is to stop working in the dark. Data changes over time for various reasons, and these changes may break the system in different ways. It is important to be able to monitor the data flowing through the system and make sure that the system is working as intended.

To avoid working blind, we first define a set of metrics to measure in the data system. Examples include warehouse health metrics such as CPU load, memory load, the number of connections to the warehouse, query queue times, and storage capacity. Data quality metrics such as the ‘freshness’ of the data, the number of NULL values in a column, and the number of duplicated rows are also very useful indicators of whether a data system is healthy.
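
As a rough illustration of the data quality metrics mentioned above, here is a minimal Python sketch that computes freshness, a NULL count, and a duplicate-row count over rows held in memory. The function name `data_quality_metrics`, the column names, and the definition of freshness as "seconds since the newest load timestamp" are assumptions made for this example. In practice these numbers would more likely come from queries against the warehouse itself.

```python
from collections import Counter
from datetime import datetime, timezone


def data_quality_metrics(rows, null_column, timestamp_column, now=None):
    """Compute three simple health metrics for a list of row dicts.

    Hypothetical helper for illustration only.
    """
    now = now or datetime.now(timezone.utc)

    # Number of rows where the chosen column is NULL (None in Python).
    null_count = sum(1 for row in rows if row.get(null_column) is None)

    # A row counts as a duplicate when an identical row was already seen.
    signatures = Counter(
        tuple(sorted(row.items(), key=lambda item: item[0])) for row in rows
    )
    duplicate_rows = sum(count - 1 for count in signatures.values())

    # Freshness here means: seconds elapsed since the newest timestamp.
    timestamps = [
        row[timestamp_column]
        for row in rows
        if row.get(timestamp_column) is not None
    ]
    freshness_seconds = (
        (now - max(timestamps)).total_seconds() if timestamps else None
    )

    return {
        "null_count": null_count,
        "duplicate_rows": duplicate_rows,
        "freshness_seconds": freshness_seconds,
    }
```

Calling `data_quality_metrics(rows, "email", "loaded_at")` on a batch of rows returns a small dictionary that can be logged or stored on every pipeline run, which gives a history to compare against later.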

Using the metrics we identified, we can then set baselines that define what ‘healthy’ means. Monitoring can then be operationalised by building alert systems that automatically notify stakeholders when a metric deviates from its baseline. This allows problems to get immediate attention so that remediation can start quickly.
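
To make the baseline-and-alert idea concrete, the sketch below compares measured metrics against maximum allowed values and passes a message to a notifier for each breach. The names `find_breaches` and `send_alerts`, the metric names, and the thresholds are all hypothetical. The `notify` argument stands in for whatever channel a team actually uses, such as email or chat.

```python
def find_breaches(metrics, baselines):
    """Return one message per metric that exceeds its baseline maximum.

    Hypothetical helper: `baselines` maps a metric name to the largest
    value still considered healthy.
    """
    breaches = []
    for name, maximum in baselines.items():
        value = metrics.get(name)
        if value is None:
            # A missing measurement is itself worth flagging.
            breaches.append(f"{name}: no value reported")
        elif value > maximum:
            breaches.append(f"{name}: {value} exceeds baseline {maximum}")
    return breaches


def send_alerts(breaches, notify=print):
    """Send each breach through `notify` and return how many were sent."""
    for message in breaches:
        notify(f"[data-health alert] {message}")
    return len(breaches)


if __name__ == "__main__":
    # Example values only; real baselines come from observing the system.
    measured = {"freshness_seconds": 7200.0, "null_count": 2, "duplicate_rows": 0}
    healthy = {"freshness_seconds": 3600, "null_count": 5, "duplicate_rows": 0}
    send_alerts(find_breaches(measured, healthy))
```

Keeping the check separate from the notification makes it easy to swap the delivery channel without touching the baseline logic.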

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Written by Loh Meng Xin

Data engineer. Growing and learning as data grows. :)
