The biggest problem with end-to-end data lineage is that not everybody actually understands why they might need it.
The traditional approach to Data Management, expressed in concepts such as the Data Management Body of Knowledge (DMBOK) from DAMA International or the DGI Framework from the Data Governance Institute, explores individual areas of data management separately: Security, Architecture, Data Warehouses, Metadata, Data Quality, etc. Brian Brewer in his article “Data Lineage Holy Grail” is
“…still amazed that most folks still get stuck on chasing the elusive lineage holy grail.”
He also offers a simple scenario in which he wants to open a certain changing data element and drill into the “gory details” in an ETL tool. Well, I suppose that’s possible, but not on a large scale and not if you are on a budget. Performing manual impact analyses is actually very costly.
It’s Not Who You Are but Where You Came From
The traditional approach no longer meets contemporary demands for data use and processing. For the intensive use of data in decision making (known as Data-Driven Decision Making), knowing how information was created and what its quality and origin are may have greater value than the information itself. This calls for a consistent view across all areas, one that places more emphasis on the dynamics of data processing than on structures.
The Enterprise Information Flow (EIF) as a doctrine (read more about it in this article or in this white paper) is the Data Management approach that combines the principles of Metadata Management, Security Management, Data Quality, and Data Architecture, and expands these functions a little.
End-to-end data lineage is at its heart, closely followed by data source quality assessment and user satisfaction surveys. The aim is to discover how each part of the information flow changes information quality, depending on the technology used, the departments involved, and the transformation algorithms.
The Real Impact of Knowing
Describing the impact of a transformation on data attributes is a subtle problem. For example, a transformation as simple as the aggregation of values can dramatically change the security sensitivity of information. It can remove sensitive details and produce publicly presentable data, but in other cases it may have quite the opposite effect.
The sales results of any individual store or branch during a certain period, for instance, may have little value for a competitor, but aggregated numbers for the whole company may be much more telling and thus sensitive. Another good example of EIF use is tracing the exact data lineage through Master Data Management (MDM) systems, which unify data quality, data format, and all structural specifics. To identify the sources of data for information in MDM, we need additional source-specific data attributes, a function unsupported by standard data lineage solutions.
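To make the aggregation point concrete, here is a minimal Python sketch of a lineage graph in which every step carries its own rule for how it changes sensitivity. Everything in it is an assumption made for illustration: the `Node` class, the three-level sensitivity scale, the dataset names, and the levels assigned to them are hypothetical and do not come from any particular lineage tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sensitivity scale, for illustration only.
PUBLIC, INTERNAL, CONFIDENTIAL = 0, 1, 2


@dataclass
class Node:
    """One dataset in a lineage graph (hypothetical model)."""
    name: str
    inputs: List["Node"] = field(default_factory=list)
    base: int = PUBLIC  # sensitivity of a source dataset
    # How this step changes sensitivity; by default it keeps the highest input level.
    rule: Callable[[List[int]], int] = max

    def sensitivity(self) -> int:
        # Sources report their own level; derived data applies the step's rule.
        if not self.inputs:
            return self.base
        return self.rule([n.sensitivity() for n in self.inputs])

    def sources(self) -> List[str]:
        # Walk upstream and collect the original sources, without duplicates.
        if not self.inputs:
            return [self.name]
        found: List[str] = []
        for n in self.inputs:
            for s in n.sources():
                if s not in found:
                    found.append(s)
        return found


# Per-store sales: assumed to be of little value to a competitor.
store_a = Node("store_a_sales", base=INTERNAL)
store_b = Node("store_b_sales", base=INTERNAL)

# Company-wide aggregation: assumed to RAISE sensitivity.
company_total = Node(
    "company_total",
    inputs=[store_a, store_b],
    rule=lambda levels: CONFIDENTIAL,
)

# The opposite case: aggregation that strips detail and LOWERS sensitivity.
customer_orders = Node("customer_orders", base=CONFIDENTIAL)
regional_counts = Node(
    "regional_counts",
    inputs=[customer_orders],
    rule=lambda levels: PUBLIC,
)

print(company_total.sources(), company_total.sensitivity())
print(regional_counts.sources(), regional_counts.sensitivity())
```

The design choice worth noting is that the rule belongs to the transformation, not to the data: the same upstream trace that answers "where did this come from" also answers "how did its sensitivity change along the way".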
Written with help of Ondrej Zyka and Ivo Mouka. The picture is from Indiana Jones and the Last Crusade, copyrighted by Paramount Pictures.