Managing Data as a Data Engineer — Part 3: Key Principles & Lessons Learnt

Published in TDS Archive

8 min read · Feb 11, 2021

In the last two articles, we looked at how users view data and how data changes over time. In Part 3 of the Managing Data as a Data Engineer series, I share some of the key principles and lessons learnt from managing data as a data engineer.

Don’t Be Blind

To manage the ever-changing scope of data, one of the first things to do is to avoid working in the dark. Data changes over time for various reasons, and these changes may break the system in different ways. It is important to monitor how data flows through the system and confirm that each flow is working as intended.

To avoid being blind, we first define a set of metrics to measure in the data system. Examples include warehouse health metrics such as CPU load, memory load, number of connections to the warehouse, query queue times, and storage capacity. In addition, other metrics such as the ‘freshness’ of the data, the number of NULL values in a column, and the number of duplicated rows are very useful indicators of whether a data system is healthy.
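As a rough illustration of the data-level metrics mentioned above, here is a minimal sketch in plain Python. It assumes the table has already been fetched as a list of dictionaries. The function names, the `email` and `loaded_at` columns, and the sample rows are all hypothetical; in practice these checks would more likely run as SQL queries against the warehouse.

```python
from collections import Counter
from datetime import datetime, timezone


def null_count(rows, column):
    """Count rows where `column` is missing or None (treated as NULL here)."""
    return sum(1 for row in rows if row.get(column) is None)


def duplicate_row_count(rows):
    """Count rows that exactly repeat another row (extra copies only)."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def freshness_seconds(rows, timestamp_column, now):
    """Seconds elapsed between `now` and the newest timestamp in the table."""
    newest = max(row[timestamp_column] for row in rows)
    return (now - newest).total_seconds()


# Hypothetical sample data: one NULL-heavy row loaded twice.
utc = timezone.utc
rows = [
    {"id": 1, "email": "a@example.com", "loaded_at": datetime(2021, 2, 11, 8, 0, tzinfo=utc)},
    {"id": 2, "email": None, "loaded_at": datetime(2021, 2, 11, 9, 0, tzinfo=utc)},
    {"id": 2, "email": None, "loaded_at": datetime(2021, 2, 11, 9, 0, tzinfo=utc)},
]
now = datetime(2021, 2, 11, 10, 0, tzinfo=utc)

metrics = {
    "null_count": null_count(rows, "email"),
    "duplicate_rows": duplicate_row_count(rows),
    "freshness_seconds": freshness_seconds(rows, "loaded_at", now),
}
print(metrics)
```

Passing `now` in explicitly, rather than calling the clock inside the function, keeps the freshness check deterministic and easy to test.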

Using the metrics we identified, we can then set baselines that define what ‘healthy’ means. Monitoring can then be operationalised by building alert systems that automatically notify stakeholders when a metric deviates from its baseline, so that problems get attention and remediation quickly.
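The baseline-and-alert idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each baseline is a single upper threshold, and notification is a pluggable callback (here `print`). The `Baseline` class, the metric names, and the threshold values are hypothetical, and a real setup would route alerts to email, chat, or a paging tool.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    metric: str          # name of the metric being monitored
    max_healthy: float   # values above this are considered unhealthy


def check_metrics(observed, baselines):
    """Return one alert message per metric that is missing or over baseline."""
    alerts = []
    for baseline in baselines:
        value = observed.get(baseline.metric)
        if value is None:
            # A metric that stops reporting is itself worth an alert.
            alerts.append(f"{baseline.metric}: no measurement received")
        elif value > baseline.max_healthy:
            alerts.append(
                f"{baseline.metric}: {value} exceeds baseline {baseline.max_healthy}"
            )
    return alerts


def notify(alerts, send=print):
    """Deliver each alert through `send` (a stand-in for a real channel)."""
    for message in alerts:
        send(message)


# Hypothetical baselines and one round of observed measurements.
baselines = [
    Baseline("freshness_seconds", 3600),
    Baseline("null_count", 10),
    Baseline("duplicate_rows", 0),
]
observed = {"freshness_seconds": 7200, "null_count": 0}

alerts = check_metrics(observed, baselines)
notify(alerts)
```

Treating a missing measurement as an alert is a deliberate choice in this sketch: a silent monitor is just another way of working in the dark.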

Written by Loh Meng Xin