Data provenance is a term we often hear in connection with big data solutions and various other corporate data projects. But how do you achieve data provenance, and how is it related to data lineage? Let’s take a look at the MANTA x Data Provenance breakdown.
You may have heard the term data provenance before. It is more closely related to data lineage than you might guess. To be precise, it can be seen either as a subset of data lineage or as a supplement to it within a company’s data governance strategy.
Simply put, data provenance examines the data’s point of origin. It gives business users a high-level view of the system so they can roughly see where their data comes from. Data provenance can be provided by a simple custom table and a few charts. Because it is easy to obtain, it is often used to map the origin of data sets in big data solutions.
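To make the "simple custom table" idea concrete, here is a minimal sketch of what such a provenance table might look like. The record layout, data set names, and source systems are all hypothetical and chosen only for illustration; the point is that each entry records *where* a data set comes from, not *how* it got there.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical record layout: one row per data set, naming only its origin.
@dataclass(frozen=True)
class ProvenanceRecord:
    dataset: str        # data set as the business user knows it
    source_system: str  # system the data originally comes from
    owner: str          # team responsible for the source system


# Illustrative entries; all names are made up.
PROVENANCE_TABLE: List[ProvenanceRecord] = [
    ProvenanceRecord("customer_master", "CRM", "Sales Ops"),
    ProvenanceRecord("web_clickstream", "Web analytics", "Marketing"),
    ProvenanceRecord("invoices", "ERP", "Finance"),
]


def origin_of(dataset: str) -> Optional[str]:
    """Return the source system recorded for a data set, or None if unknown."""
    for record in PROVENANCE_TABLE:
        if record.dataset == dataset:
            return record.source_system
    return None


def datasets_by_source() -> Dict[str, List[str]]:
    """Group data sets by source system, e.g. to feed a simple chart."""
    groups: Dict[str, List[str]] = {}
    for record in PROVENANCE_TABLE:
        groups.setdefault(record.source_system, []).append(record.dataset)
    return groups


print(origin_of("invoices"))  # ERP
```

A lookup like this answers "where does this come from?" quickly, but it says nothing about the steps in between, which is exactly the gap data lineage fills.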
Data lineage, on the other hand, traces the complete journey of data through all its transformations, from its point of origin to any current observation point or end report within the system. It is based on reading technical metadata and therefore tracks data flows down to the lowest level: the actual scripts and statements.
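For contrast, the following sketch models lineage as a graph in which every edge carries the transformation (script or statement) that moves data from one object to the next. The object names, scripts, and graph layout are hypothetical, and this is only an illustration of the concept, not how MANTA represents lineage internally. Tracing upstream from a report yields every full path back to a point of origin.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lineage graph derived from technical metadata:
# each target maps to the (source, transformation) pairs that feed it.
# Assumes the graph is acyclic.
LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "revenue_report": [("dwh.fact_sales", "report query: SUM(amount) GROUP BY region")],
    "dwh.fact_sales": [
        ("stage.orders", "load_sales.sql: INSERT INTO dwh.fact_sales"),
        ("stage.fx_rates", "load_sales.sql: JOIN stage.fx_rates"),
    ],
    "stage.orders": [("erp.orders", "etl_job_orders: copy")],
    "stage.fx_rates": [("treasury.fx_rates", "etl_job_fx: copy")],
}


def trace_upstream(node: str, path: Optional[List[str]] = None) -> List[List[str]]:
    """Return every path from `node` back to a point of origin.

    A point of origin is a node with no recorded inputs. Each path
    alternates data objects and the transformations connecting them.
    """
    path = (path or []) + [node]
    inputs = LINEAGE.get(node, [])
    if not inputs:
        return [path]  # reached a point of origin
    paths: List[List[str]] = []
    for source, transformation in inputs:
        paths.extend(trace_upstream(source, path + [transformation]))
    return paths


for p in trace_upstream("revenue_report"):
    print(" <- ".join(p))
```

A provenance table would simply say that the revenue report comes from the ERP and treasury systems; the lineage paths also show each intermediate table and the statement that populated it.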
So, whether you need to quickly find where certain data originates for your data provenance project, or you need complete end-to-end data lineage for more complex reports, regulatory compliance, data migration, and so on, MANTA has you covered.
Want to find out more about the holy grail of data lineage automation, a.k.a. MANTA? Check out our website or drop us a line at firstname.lastname@example.org. We don’t bite. 🙂