Most of us know the fairy tale about Hansel and Gretel. You remember: the two children who are banished from their home by their stepmother and get lost in the woods. They are lured into a gingerbread house belonging to an evil witch who threatens to eat them until Gretel finally… Oops, sorry. Getting carried away with the story. But here’s a question: what does this children’s bedtime story have to do with historical lineage?
One of the most memorable details of the story is the trail that Hansel and Gretel leave behind. When they venture into the woods for the second time, they scatter breadcrumbs along their path so that they can find their way back home. In the story, our protagonists end up lost because the animals eat the breadcrumbs, leaving them with no path to follow.
Once Upon a Time… Data Journeys
In many businesses, this is how lineage reporting feels today. Data flows through the organization, processed by different teams, with different types of code and systems, across generations of processes, some new and some old, and through many applications, like a stream flowing through an ancient forest. Walking through these woods is like wandering through your investments in SQL stored procedures, COBOL, ETL, reporting tools, cloud-based buckets, and predictive models!
Where the breadcrumb analogy meets current-day reality is in historical lineage. Today’s lineage that reflects the “current state” of your solutions might be beautifully represented, but what about lineage for yesterday? …or last week? How about last quarter or a specific date from last year? We have all had the experience where an analyst with their hair on fire comes running into the office and needs to know “Where did we get our revenue results from Q3 of the last fiscal year?!”
Wouldn’t it be nice to simply point to an active and available time slice of your lineage and check the lineage for that particular report on the desired date? As time goes by, even lineage that is efficiently collected and reported can go stale. Parts and pieces go missing if the solution cannot report on lineage as of dates in the past. Manually configured lineage is almost impossible to re-create: the developer of your ETL might have moved on, sources may have been archived, or entire applications may have been deprecated. Providing historical lineage requires a strategy, as well as a solution that can deliver detailed flow information for any object as of a given date and time, matching the moment when the systems were originally scanned and the lineage was collected.
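To make the idea of a time slice concrete, here is a minimal Python sketch of an "as of" lookup. It assumes each scan is stored with its timestamp and a set of (source, target) edges; the `snapshots` list, the `lineage_as_of` helper, and all object names are hypothetical illustrations, not the API of any particular lineage tool.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical scan history, sorted by scan time.
# Each entry is (scan time, set of (source, target) lineage edges).
snapshots = [
    (datetime(2023, 7, 1), {("crm.orders", "dw.revenue")}),
    (datetime(2023, 10, 1), {("crm.orders", "dw.revenue"),
                             ("erp.invoices", "dw.revenue")}),
    (datetime(2024, 1, 1), {("erp.invoices", "dw.revenue")}),
]


def lineage_as_of(snapshots, when):
    """Return the edges from the latest scan taken at or before `when`."""
    times = [scan_time for scan_time, _ in snapshots]
    i = bisect_right(times, when)
    if i == 0:
        raise LookupError(f"no lineage scan on or before {when:%Y-%m-%d}")
    return snapshots[i - 1][1]


# Which sources fed the revenue table at the end of Q3 2023?
q3_edges = lineage_as_of(snapshots, datetime(2023, 9, 30))
print(sorted(q3_edges))
```

The lookup can only answer for moments that are covered by a stored scan, which is why the scan history has to be retained rather than overwritten.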
Don’t Close the Door to the Past, Learn from It
Equally important in the discussion about historical lineage is the ability to compare the lineage for today (or some other date) with lineage for a date in the past. We know the lineage may be different, but how different?
- Are there additional (or fewer) sources and targets?
- Have SQL statements changed?
- Are the individual column mappings or calculations the same?
- Can we locate the source of a problem that occurred several months ago and trace exactly what happened and in what piece of code?
These questions can easily and quickly be answered by comparing lineage from two different time slices.
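A comparison of two time slices can be sketched in a few lines of Python. This example assumes each slice is a dictionary that maps a (source column, target column) pair to the transformation text recorded for it; the `compare_lineage` function and the sample data are hypothetical and only illustrate the idea.

```python
def compare_lineage(old, new):
    """Compare two lineage slices.

    Each slice maps a (source, target) edge to its transformation text.
    Returns the edges that were added, removed, or kept with a changed
    transformation.
    """
    old_edges, new_edges = set(old), set(new)
    return {
        "added": sorted(new_edges - old_edges),
        "removed": sorted(old_edges - new_edges),
        "changed": sorted(
            edge for edge in old_edges & new_edges if old[edge] != new[edge]
        ),
    }


# Hypothetical slices for the same report on two dates.
last_quarter = {
    ("crm.orders.amount", "dw.revenue.total"): "SUM(amount)",
    ("crm.orders.region", "dw.revenue.region"): "UPPER(region)",
}
today = {
    ("crm.orders.amount", "dw.revenue.total"): "SUM(amount - discount)",
    ("erp.invoices.amount", "dw.revenue.total"): "SUM(amount)",
}

diff = compare_lineage(last_quarter, today)
for kind, edges in diff.items():
    print(kind, edges)
```

Here "added" and "removed" answer the question about sources and targets, while "changed" flags mappings whose calculation differs between the two dates.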
What else can we do with lineage and historical comparisons? How about monitoring the increase in total items that we are governing across the environment? Or reviewing trends in code changes, or the number of touchpoints that access a particular element or table in the data lake? What about viewing the distribution of recognized personal data to external systems and then being able to document the changes in those flows? With this ability, historical lineage will allow you to illustrate progress made towards controlling where your data is flowing (or illustrate a growing lack of such control!).
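As a rough illustration of trend reporting, the sketch below counts how many distinct downstream consumers read from one table in each stored slice. The `history` data and the `touchpoint_trend` helper are hypothetical and assume that each slice is a list of (source, target) edges keyed by scan date.

```python
from datetime import date

# Hypothetical lineage history: scan date -> list of (source, target) edges.
history = {
    date(2023, 1, 1): [("lake.customers", "report.churn")],
    date(2023, 4, 1): [("lake.customers", "report.churn"),
                       ("lake.customers", "model.ltv")],
    date(2023, 7, 1): [("lake.customers", "report.churn"),
                       ("lake.customers", "model.ltv"),
                       ("lake.customers", "export.partner_feed")],
}


def touchpoint_trend(history, table):
    """Count distinct consumers of `table` in each slice, oldest first."""
    return [
        (day, len({target for source, target in edges if source == table}))
        for day, edges in sorted(history.items())
    ]


for day, count in touchpoint_trend(history, "lake.customers"):
    print(day.isoformat(), count)
```

A rising count for a table that holds personal data, especially toward external targets, is the kind of change such a report is meant to surface.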
Your Past Data Journeys + Today’s Data Journeys = Happily Ever After!
The story of Hansel and Gretel has a happy ending—they both escape peril and successfully return home. Establish a plan for delivering historical lineage so that your information consumers also have successful journeys and have fun traveling through time.
Enhance Data Pipeline Reliability with MANTA’s Time Slicing Feature
To help you establish a plan for successful historical lineage delivery, we’ve engineered the revisions feature, which lets you see how your lineage looked in the past and compare it to its current state.
Every time you run a scan in MANTA, the results are saved in the repository. You can select an older revision to visualize the assets within the selected graphs and see how they looked at that moment in time. You can also compare two different time slices and see how the lineage has developed.
Want to know more about MANTA’s time slicing feature? Read our whitepaper and learn how this revolutionary feature will help you:
- Easily monitor, review, and control changes in data flows
- Identify what could have caused discrepancies or breakage in the flow
- Illustrate the progress made towards controlling where your data is flowing (or a growing lack of such control!)
- Implement DataOps and achieve data pipeline observability without allocating additional resources
- Meet regulatory compliance requirements (CPRA, CDPA, CPA, GDPR, and more) and review how personal and sensitive data flowed in the past
- And more
This article was written by MANTA’s SVP of Products, Ernie Ostic. You can also find it on his LinkedIn Pulse. Do you need a hand with understanding current and past data journeys? Drop us a line at email@example.com, or schedule a demo, and let us show you how MANTA can help.