What is data lineage in a data lake?
To understand data lineage in a data lake, let’s first define what a data lake is. A data lake is a centralized repository that lets you store all your structured and unstructured data at any scale. You can store your data as-is, without structuring it first, and run different types of analytics on it, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.
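Within a data lake, data can be moved, transformed, and consumed by other systems. Each of these activities adds to the data’s lineage: the record of where the data came from, how it was changed, and where it went. MANTA’s automated data lineage platform is able to visualize this lineage.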
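To make the idea concrete, lineage can be thought of as a directed graph in which each move or transformation adds an edge from a source dataset to a target dataset. The following is a minimal Python sketch of that idea only; it is not MANTA’s API or implementation, and the `LineageGraph` class, its methods, and the dataset paths are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each target dataset to the (source, operation) pairs that feed it.
        self._upstream = defaultdict(set)

    def record(self, source, target, operation):
        """Record that `target` was produced from `source` by `operation`."""
        self._upstream[target].add((source, operation))

    def trace_upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for source, _operation in self._upstream.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Hypothetical activity inside a data lake: raw files are converted,
# joined into a curated table, and read by a dashboard.
graph = LineageGraph()
graph.record("raw/orders.json", "staging/orders.parquet", "convert")
graph.record("raw/customers.csv", "staging/customers.parquet", "convert")
graph.record("staging/orders.parquet", "curated/sales_by_customer", "join")
graph.record("staging/customers.parquet", "curated/sales_by_customer", "join")
graph.record("curated/sales_by_customer", "dashboard/revenue", "read")

# Everything the dashboard ultimately depends on.
for name in sorted(graph.trace_upstream("dashboard/revenue")):
    print(name)
```

Tracing upstream from the dashboard returns the curated table, both staging tables, and both raw files, which is the kind of question a lineage view is meant to answer. An automated platform would derive these edges by analyzing the systems themselves rather than relying on manual `record` calls.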