What Is Data Lineage and Why Should You Care About It?

Data Lineage is Like a Family Tree

First things first: what is data lineage? In simple terms, it’s the flow of data from its origin to its final destination. To put it in context, a parallel can be drawn with a family tree. A family tree records family relationships: where you come from and who your ancestors were. Very valuable information can be drawn from one’s family lineage. Besides increasing knowledge of one’s origins, contributing to genealogy, and revealing birth and death rates, it is remarkably helpful for identifying medical history. Medical history is a secondary use of family lineage, but one with immense benefits.

Data Lineage: The Complexity Of Information

Similarly, there is a wealth of information stored in data lineage, but at times it can be difficult to find. Tracing data sources is an arduous task. Large organizations built their systems many years ago, and in their quest to keep up with technology they rapidly acquired more and more data sources. All these disparate data sources somehow interact with each other and bind the systems together, yet understanding this complex data maze and getting a simple visual of the flow remains a dream. The cliché “a picture is worth a thousand words” is appropriate in this context. By finding the data lineage and flows within an organization’s data set, businesses would be able to:

  1. Analyze how the data flows through the systems
  2. Protect exposed information in the chain, thereby reducing risk for the organization
  3. Understand the effects and impacts of changes
  4. Streamline information flows to move towards simpler, more effective management
  5. Equip the business with knowledge about how data is consumed
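
To make the idea of analyzing data flows concrete, here is a minimal sketch of lineage modeled as a directed graph and traced back to its origins. The system names (crm_db, warehouse, risk_report, and so on) and the functions build_upstream and find_origins are hypothetical, invented purely for illustration; real lineage metadata would normally come from a catalog or ETL tool.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, target) means data flows from source to target.
EDGES = [
    ("crm_db", "customer_staging"),
    ("billing_db", "invoice_staging"),
    ("customer_staging", "warehouse"),
    ("invoice_staging", "warehouse"),
    ("warehouse", "risk_report"),
]


def build_upstream(edges):
    """Map each data set to the set of data sets that feed it directly."""
    upstream = defaultdict(set)
    for source, target in edges:
        upstream[target].add(source)
    return upstream


def find_origins(node, upstream):
    """Walk the lineage backwards and return the root sources that feed `node`."""
    origins = set()
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = upstream.get(current, set())
        # A data set with no parents is a point of entry (unless it is the start node).
        if not parents and current != node:
            origins.add(current)
        stack.extend(parents)
    return origins


if __name__ == "__main__":
    print(sorted(find_origins("risk_report", build_upstream(EDGES))))
    # prints ['billing_db', 'crm_db']
```

In this sketch, asking for the origins of risk_report walks back through the warehouse and staging layers to the two source databases, which is the "where did this come from" question that lineage is meant to answer.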

BCBS 239: FinServ, Data Lineage & Regulatory Compliance

Big banks now have to comply with BCBS 239, a standard issued by the Basel Committee on Banking Supervision. One of the main requirements for fulfilling BCBS 239 is to understand the origin of data. Because data lineage focuses on identifying the point of entry of a given data set, it helps organizations inform regulators of where their data originates. BCBS 239 is not just a deadline: the principles it sets forth are a key requirement for banks to ensure they have the right data to manage risk, and to give industry regulators the transparency they need to monitor systemic risk across global markets.

The points listed above are just a few of the benefits of understanding data lineage and flow. There are additional benefits for the IT team when it comes to data governance. Data flow is of particular interest to an IT team for change management and impact analysis. Once the data landscape of an organization is known, including the relationships between data sets, the effects of a change can be traced and analyzed. This can serve as a mechanism to prevent unwanted changes to production systems, thus keeping them stable.
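
As a rough illustration of impact analysis, the sketch below follows lineage edges forward from a changed data set to list everything downstream of it. The edge list and the downstream_impact function are hypothetical examples, not the output or API of any particular lineage tool.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means data flows from source to target.
LINEAGE = [
    ("crm_db", "customer_staging"),
    ("customer_staging", "warehouse"),
    ("billing_db", "invoice_staging"),
    ("invoice_staging", "warehouse"),
    ("warehouse", "risk_report"),
    ("warehouse", "marketing_dashboard"),
]


def downstream_impact(changed, edges):
    """Return every data set reachable from `changed`, in breadth-first order."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    impacted = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    print(downstream_impact("crm_db", LINEAGE))
    # prints ['customer_staging', 'warehouse', 'risk_report', 'marketing_dashboard']
```

Under these assumptions, a proposed change to crm_db would be flagged as touching the staging table, the warehouse, and both reports, so their owners could review the change before it reaches production.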

I hope this helps answer the question of what data lineage is.