The Benefits of Automating Data Lineage in the Initial Phases of a Data Governance Initiative

MANTA Business
September 30, 2020
Zosia Szczech

In these times of constant digital transformation, a data governance framework is no longer just nice to have. It’s an absolute must for any company that wants to stay on top of its data and turn it into a corporate asset that drives its processes and lets it pivot quickly to accommodate changes in the industry or in ever-changing compliance rules. While many organizations recognize the importance of building a data strategy, they tend to wait until a later stage of the initiative to collect data lineage and overlook the benefits of automating it in the initial phase. If you don’t want to go around in circles or experience setbacks in your data governance initiative, read this article by Nicola Askham, The Data Governance Coach, and learn how automating data lineage can save you time and frustration when designing your governance framework.

Data lineage has always been a key data governance deliverable, but until recently most people waited until a later stage of their data governance initiative to capture it, and often did so only if there was a regulatory requirement.

This is now changing, and more organisations understand the value of automating data lineage in the initial phases of their initiative. It’s not surprising. If you don’t have data lineage, your organisation will waste a lot of time and effort trying to find data, because there is no documentation of where data is and how it flows through your organisation.

You will also waste a lot of time whenever you try to resolve data quality issues. If you don’t know where the data came from, you have to research it for every issue you try to fix.

I’ve seen many occasions where small changes have had seriously negative impacts downstream. Some organisations I’ve worked with have been so scared of repeating such errors that they have refused small changes to systems because they don’t know where that data goes. One organisation’s systems landscape was so complex that they didn’t feel a small change warranted the effort it would take to understand its impact, so the decision was made not to make the change at all! This left the business users frustrated with a broken process.

Contrast that with the benefits of having data lineage documented. Change management is a lot quicker, because your impact analysis can start from the lineage: it shows you which downstream systems and reports a change is likely to affect, so you can focus your analysis there.
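To make the idea of lineage-driven impact analysis concrete, here is a minimal sketch. It assumes lineage is available as a simple mapping from each dataset to the datasets it feeds; the dataset names and the `downstream_impact` helper are hypothetical and not taken from any particular lineage tool.

```python
from collections import deque

# Hypothetical lineage: each key feeds the datasets listed as its value.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["reports.churn", "reports.revenue"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["reports.revenue"],
}


def downstream_impact(lineage, changed):
    """Return every dataset reachable downstream of `changed`, sorted by name."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in lineage.get(node, []):
            if target not in seen:  # visit each dataset only once
                seen.add(target)
                queue.append(target)
    return sorted(seen)


if __name__ == "__main__":
    # Which datasets should be checked before changing the CRM customer table?
    for name in downstream_impact(LINEAGE, "crm.customers"):
        print(name)
```

With a list like this in hand, the question "where does this data go?" has an answer before the change is approved, rather than after something breaks.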

If you are putting a data governance framework in place, you can’t put controls and data quality reports on every single piece of data throughout your organisation. But data lineage will help you identify the areas where your data is most at risk of something going wrong, enabling you to put appropriate checks, controls and data quality reports in place.

Having data lineage also allows you to speed up data discovery. So many organisations have vast quantities of data that would be valuable to them, if only they knew it existed. Finally, as I mentioned at the start of this article, for many industries there is a regulatory requirement to have data lineage in place.

It’s clear that having data lineage has lots of benefits, but all too often it is captured and documented manually.

Whether you capture data lineage automatically or manually, you will achieve the benefits mentioned above, but a manual approach requires considerable effort. When I first started capturing data lineage, I tried starting at the beginning, where data first comes into the organisation, and following it as it flowed. However, this approach fails because a lot of people who produce or capture data have absolutely no idea where it goes. So, when doing manual data lineage, I now start at the end and work backwards. I focus on the most critical data outputs, but I still have to go from person to person asking them where they got the data, what they do to it, how many sources they combine to create it, and so on.

This can be a laborious and painful process, and the result is only as good as the knowledge of the people you’re speaking to. Then there is the challenge of keeping manual data lineage up to date; many organisations resort to annual reviews of the documentation, with varying degrees of success.
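The "start at the end and work backwards" approach is the same walk a tool can do for you once the flows are recorded. The sketch below assumes lineage is held as hypothetical (source, target) pairs; the names and the `trace_back` helper are illustrative only.

```python
from collections import defaultdict

# Hypothetical (source, target) pairs describing how data flows.
EDGES = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "reports.revenue"),
    ("erp.orders", "warehouse.fact_orders"),
    ("warehouse.fact_orders", "reports.revenue"),
]


def trace_back(edges, output):
    """Walk upstream from `output`; return (all upstream datasets, original sources)."""
    parents = defaultdict(list)
    for source, target in edges:
        parents[target].append(source)

    upstream, stack = set(), [output]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in upstream:
                upstream.add(parent)
                stack.append(parent)

    # Original sources are upstream datasets that nothing else feeds.
    origins = sorted(n for n in upstream if not parents.get(n))
    return sorted(upstream), origins


if __name__ == "__main__":
    everything, origins = trace_back(EDGES, "reports.revenue")
    print("Upstream of reports.revenue:", everything)
    print("Original sources:", origins)
```

Each step in this loop corresponds to one of the person-to-person conversations in the manual approach, which is why the manual version takes so long.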

When designing and implementing a data governance framework I take the following approach:

  • engage senior stakeholders
  • design the data governance framework
  • work out the deliverables in scope
  • implement the data governance framework

The third stage is when you would consider whether you want to capture data lineage and, if so, at what point in your implementation plan. Organisations that capture data lineage manually often leave it until quite late in the plan, because you have to start delivering some benefits before people are willing to take on the significant effort of documenting data lineage by hand.

This causes a conundrum. There are huge benefits to be had from capturing data lineage, but because it’s so laborious, many organisations feel they need to focus on other deliverables first, so that business users see some benefits, stay engaged and are willing to put in the effort needed to create manual data lineage at a later stage.

I get frustrated that this valuable deliverable is often left until late in a data governance initiative. However, automating data lineage can change that. If you’re able to use a tool to discover and document your data lineage, you gain a number of advantages:

  • It is a much faster, easier process, taking a lot less time and input from your business users and IT team.
  • The data lineage will be more accurate because what you capture is actually what is happening to the data, not what people believe is happening.
  • Instead of doing data lineage reactively towards the end of your initiative, you can use the data lineage to quickly identify your critical data and therefore help focus your initiative on the most important areas.

A final benefit is that automated data lineage enables you to provide different views to different users. With manual data lineage there are lots of debates over whether you need a detailed technical lineage or something at a higher level that is more business-user friendly. With a data lineage tool you can easily have both, with each user viewing the level that is most appropriate for them.
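As a rough illustration of how one set of captured lineage can serve both audiences, the sketch below rolls detailed column-level flows up into table-level or system-level views. The "system.table.column" naming and the `roll_up` helper are assumptions made for this example, not a description of how any specific tool works.

```python
# Hypothetical column-level lineage as (source, target) pairs,
# with names in "system.table.column" form.
COLUMN_EDGES = [
    ("crm.customers.email", "warehouse.dim_customer.email"),
    ("crm.customers.name", "warehouse.dim_customer.full_name"),
    ("warehouse.dim_customer.email", "reports.churn.contact"),
]


def roll_up(edges, depth):
    """Collapse edges to the first `depth` name parts (1 = system, 2 = table)."""

    def shorten(name):
        return ".".join(name.split(".")[:depth])

    summary = {(shorten(source), shorten(target)) for source, target in edges}
    # Drop self-loops created when both ends collapse to the same name.
    return sorted(edge for edge in summary if edge[0] != edge[1])


if __name__ == "__main__":
    print("Business view (systems):", roll_up(COLUMN_EDGES, 1))
    print("Analyst view (tables):", roll_up(COLUMN_EDGES, 2))
    print("Technical view (columns):", roll_up(COLUMN_EDGES, 3))
```

The detailed lineage is captured once, and the higher-level views are derived from it, so there is no need to choose between them up front.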

I hope I’ve convinced you that there are considerable benefits to having data lineage, and that by automating it you can tackle it earlier in your data governance initiative and get more value from this key deliverable.