MANTA’s very own Ernie Ostic shares his insights on the unmapped sections of your data pipeline and how you can proactively identify and resolve these data blind spots.
What is a data blind spot?
Blind spots are caused by complexity in data environments. Today’s IT landscapes are packed with directly and indirectly connected applications, microservices, infrastructure, and dependencies. It is impossible to have visibility over it all. You have data blind spots.
This complexity is like driving blind through a traffic jam: streets that are packed with cars and have slowed you to a crawl or forced you to stop altogether. And that slowed-to-a-crawl pace makes it more difficult to deploy new features and services, roll out updates, resolve problems that are hurting user experiences, or identify the root causes of those problems.
Just as you are unaware of blind spots when driving until you have an accident (or a near accident), you don’t realize data blind spots are there until you have an incident.
Blind spots can hide anything, from bits and pieces of your data pipeline to entire sections. They create a lack of visibility into information that is inaccessible, unknown, undocumented, or simply buried under too much complexity.
This is especially true for analytics, which is most at risk for blind spot obscurity for many reasons. Among these is the fact that analytics is the farthest downstream – in most organizations, analytic data has the longest journey to its reporting or analytic destination – with the largest number of actions occurring against it. Complexity and visibility are like a game of telephone: the longer the chain of people playing, the more distorted the original becomes by the time it reaches the end.
Analytics also exaggerates concerns and problems. This is where data is aggregated, teased apart, and changed so that it’s easier to understand. As you move downstream in your data pipeline, the number of independent, branching paths that data can travel multiplies, and the individual routes become harder to see. That means more roadways.
These roadways are built out of all kinds of code and processing logic. They’re built of code embedded in databases, purchased applications, big data and Hadoop environments, ETL and ELT tooling, even your independent employee-owned spreadsheets! They include code that performs translations, string manipulation, date conversions, Boolean tests, rejects, audits, and more – changing your data along the way and delivering it to an unknown number of places.
But all of these additional roadways mean more complexity and, as a result, more blind spots and more traffic jams in an already congested data environment.
Data lineage solutions to resolve your data blind spots
You need a solution that provides clarity, one that will draw attention to your blind spots, remove the layers of obscurity and complexity, and provide visibility into all your systems.
Data lineage can deliver that clarity by proactively identifying and filling in your data blind spots. Here’s what to look for:
Simplify consumption. Data lineage is useless if you can’t understand it. Too often, lineage diagrams look like spaghetti, bewildering the researcher instead of providing insights. Lineage should be filterable and color-coded, with support for zoom-in and zoom-out. It should let you drill for more detail or step back for less, and it should be easy on the eyes, making it clear what you missed, where, and why. If your data blind spots are like dark corridors, then your lineage solution should be like a flashlight.
Highlight calculations. That data lineage flashlight needs to expose the details of your data’s path and look into every nook and cranny for low-level calculations. Some users don’t need or even want such details. But if those details are ever needed, they must be immediately accessible without first requiring a hundred mouse clicks.
Deliver faster and ensure reliability. Lineage reports yield insights in many directions. Often, application teams need to gain insight into data blind spots when looking downstream. Code maintenance and especially application migration efforts demand the ability to determine the impact of our changes, or to properly scope a new project, such as ground-to-cloud migration. Our teams need to be able to see exactly what is being used throughout the pipeline, to determine if it’s worthy of being modernized or migrated, and to help prevent disasters caused by unknowingly breaking a critical report or other downstream process.
Enable lineage history. How often has an analyst come into the office in a panic, screaming, “Where did we get these numbers for the fourth quarter of last year!?” Reconciling values for upper management is a big part of lineage forensics. Make sure your lineage solution can dive into the past and pull up lineage as it was defined for that particular point in time (last week, month, quarter, year, etc.). You must also ensure that you can make comparisons between today’s and yesterday’s lineage reports, to decipher potential problems or sort out historical code changes.
Increase trust in data and the results of its analysis. Consider your data scientists and analysts, each of whom deserves to be able to trust the data they’re crunching. And consider the executives who are using analytics results and predictive models to make crucial decisions that impact the whole business. Their trust in the data and its results is directly connected to their confidence in using those results to guide the enterprise. Without lineage visibility, trust in data is difficult or impossible to obtain.
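To make two of the capabilities above more concrete (downstream impact analysis and lineage history), here is a minimal Python sketch. It assumes lineage can be represented as a simple mapping from each upstream asset to the assets that read from it. The asset names, the two snapshots, and the helper functions `downstream_impact` and `diff_lineage` are all hypothetical and invented for illustration; they are not taken from any particular lineage product.

```python
from collections import deque

# Hypothetical lineage snapshots (illustrative names only).
# Each key is an upstream asset; each value is the set of assets fed by it.
LINEAGE_LAST_QUARTER = {
    "crm.customers": {"etl.clean_customers"},
    "etl.clean_customers": {"dw.dim_customer"},
    "dw.dim_customer": {"report.q4_revenue"},
}

LINEAGE_TODAY = {
    "crm.customers": {"etl.clean_customers"},
    "etl.clean_customers": {"dw.dim_customer", "ml.churn_features"},
    "dw.dim_customer": {"report.q4_revenue", "report.churn_dashboard"},
}


def downstream_impact(lineage, start):
    """Return every asset reachable downstream of `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # ignore the start node if a cycle leads back to it
    return seen


def edges(lineage):
    """Flatten a lineage mapping into a set of (source, target) pairs."""
    return {(src, dst) for src, dsts in lineage.items() for dst in dsts}


def diff_lineage(old, new):
    """Compare two snapshots and report which data flows appeared or vanished."""
    old_edges, new_edges = edges(old), edges(new)
    return {"added": new_edges - old_edges, "removed": old_edges - new_edges}


if __name__ == "__main__":
    # What could break if we change the customer cleaning job today?
    print(sorted(downstream_impact(LINEAGE_TODAY, "etl.clean_customers")))

    # What changed between last quarter's lineage and today's?
    changes = diff_lineage(LINEAGE_LAST_QUARTER, LINEAGE_TODAY)
    print(sorted(changes["added"]))
    print(sorted(changes["removed"]))
```

A real lineage tool would derive these graphs automatically by scanning code and metadata, and would typically track lineage at the column level. The sketch only shows that impact analysis amounts to a graph traversal, and that a point-in-time comparison amounts to a set difference over stored snapshots.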
Everyone along the data pipeline needs to be able to see what is happening in the data. Blind spots in that visibility mean that not only will you miss problems that end up hurting the user experience, but you’ll also be unable to identify and resolve the root causes of those problems, so they will keep recurring.
A proper, automated data lineage solution with all the capabilities listed above can improve productivity in operations and development, supporting smarter, faster, and more effective decision-making. It makes compliance easier by giving regulators new clarity into your transformation and data flow logic. Most importantly, a proper data lineage solution can identify and shine a light on your blind spots, fill them in, and arm you with new insights to prevent these problems from happening again.