Engineer’s Notes

Data Lineage from… Microsoft Excel?!?

Yes, you heard correctly. Excel is one of the new connectors we have introduced in MANTA 3.25. The reason is that many of our customers still keep disturbing amounts of data in Excel, which used to be difficult to include in their data lineage. Now it is possible.

MANTA now tracks data lineage in Excel, being able to read multiple sheets and slides. After connecting to MS Excel, MANTA can process XLSM and XLSX files as well as look up Excel objects such as graphs and pivot tables. All the mapped objects are then connected to their source objects by analyzing queries with MANTA’s database connectors.

We also have a detection algorithm with which MANTA can automatically reveal aggregated lists and tables that have formula relations among each other. MANTA is able to track data lineage from database tables and CSV files through tables and graphs among multiple Excel workbooks and push the whole picture into MANTA’s native visualization as well as into third-party solutions.
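As a rough illustration of what "formula relations" between sheets can look like, the sketch below collects sheet-to-sheet dependencies from formula text. It is not MANTA’s detection algorithm: the `sheet_dependencies` function, the `formulas` input shape, and the simplified reference pattern are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Matches sheet-qualified references such as Sales!A1 or 'Q1 Data'!B2:C10.
# Assumption: only this simple form is handled; named ranges, structured
# table references, and links to other workbooks are ignored.
SHEET_REF = re.compile(
    r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_.]*))"
    r"!\$?[A-Z]{1,3}\$?\d+(?::\$?[A-Z]{1,3}\$?\d+)?"
)


def sheet_dependencies(formulas):
    """Map each sheet to the set of other sheets its formulas read from.

    `formulas` is a hypothetical input: {(sheet, cell): formula_text}.
    """
    deps = defaultdict(set)
    for (sheet, _cell), formula in formulas.items():
        for quoted, bare in SHEET_REF.findall(formula):
            source = quoted or bare
            if source != sheet:  # ignore references within the same sheet
                deps[sheet].add(source)
    return dict(deps)


formulas = {
    ("Summary", "B2"): "=SUM(Sales!B2:B100)",
    ("Summary", "B3"): "=AVERAGE('Q1 Data'!C2:C50)+Sales!D1",
    ("Sales", "D1"): "=B2*1.2",
}
print(sheet_dependencies(formulas))
```

Each entry in the result is one lineage edge at sheet granularity; a real scanner would also have to resolve ranges to tables and follow references across workbooks.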

Here you can take a look at what Data Lineage with Excel looks like:

Visit our tech hub to learn more.

Interested in a MANTA Demo? Contact us at manta@getmanta.com or schedule a call with our bot! 

Meet Us at the MIT CDOIQ Symposium!

We often tell you about the technical conferences and meet-ups we visit and how we present our solution there. But this time, we are showing up at a completely different kind of event. It’s the Chief Data Officer and Information Quality (CDOIQ) Symposium hosted annually by the Massachusetts Institute of Technology (MIT).

What’s This Event About?

According to MIT, this symposium is one of the key events for sharing and exchanging cutting-edge ideas, content, and discussions. Because data is critical to every organization, the symposium focuses on how to manage and lead this resource in the 21st century. Since data is a big part of MANTA’s own mission, we will be talking our hearts out.

What Do We Want to Share?

We are looking forward to sharing our own take on the future of data with you, and that will, as usual, revolve around data lineage. But this time, we will let you take a peek behind the curtains at MANTA’s very own research. It’s no secret that we closely cooperate with universities, that our CEO Tomas Kratky has been a teaching associate at the Czech Technical University in Prague for many years, and that our amazing developers are mostly selected from students who did their Master’s or Bachelor’s thesis research with MANTA’s Engineering team. So what exactly do our current research projects include?

Database Language Research

MANTA as a company always tries to support all our customers’ needs by adding more and more scanners and connectors for the technologies that they use in their data environments. MANTA as a research institution likes exploring new database languages and examining ways to automate their reading and processing for the purpose of creating data lineage. An exciting challenge for us at the moment is Java. Java, as well as the growing family of database languages that we support for our customers, produces a significant amount of metadata that needs to be stored in MANTA in order to perform various analyses on the data lineage graph. Due to the nature of data lineage visualization, MANTA has been storing its own program data in a graph database for quite some time now—which means we have been continuously doing research in this field as well.

Graph Databases

A graph database is a database that uses graph structures (nodes, edges, and properties) to represent and store data and to answer semantic queries. The key concept is the graph itself: data items are nodes, and the relationships between them are edges. Because relationships link stored data together directly, related items can in many cases be retrieved with one operation. Graph databases treat the relationships between data as a priority. Querying relationships is fast because they are stored persistently in the database itself rather than computed at query time. Relationships can also be visualized intuitively, which makes graph databases useful for heavily interconnected data, like data lineage.
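To make the node, edge, and property vocabulary concrete, here is a minimal in-memory sketch of a lineage graph. It is only an illustration under our own assumptions, not how MANTA or any particular graph database stores data; the `LineageGraph` class and all node names are hypothetical.

```python
from collections import deque


class LineageGraph:
    """Toy property graph: nodes and edges both carry a dict of properties."""

    def __init__(self):
        self.nodes = {}     # node id -> properties
        self.incoming = {}  # target id -> list of (source id, edge properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, target, **props):
        self.incoming.setdefault(target, []).append((source, props))

    def upstream(self, node_id):
        """Return every node that feeds node_id, directly or indirectly."""
        seen, queue = set(), deque([node_id])
        while queue:
            current = queue.popleft()
            for source, _props in self.incoming.get(current, []):
                if source not in seen:
                    seen.add(source)
                    queue.append(source)
        return seen


g = LineageGraph()
g.add_node("crm.customers", kind="table")
g.add_node("contacts.csv", kind="file")
g.add_node("Workbook.xlsx/Names", kind="sheet")
g.add_node("Workbook.xlsx/Chart", kind="chart")
g.add_edge("crm.customers", "Workbook.xlsx/Names", via="query")
g.add_edge("contacts.csv", "Workbook.xlsx/Names", via="import")
g.add_edge("Workbook.xlsx/Names", "Workbook.xlsx/Chart", via="formula")

print(sorted(g.upstream("Workbook.xlsx/Chart")))
```

Because edges are stored alongside the nodes, finding everything upstream of a chart is a plain traversal rather than a chain of joins, which is the property the paragraph above describes.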

In a future where network structures are frequently used (social networks, maps, car sharing, other sharing apps, etc.), graph databases might just be the most efficient and fastest way to document data. MANTA uses graph databases to store data lineage graphs, which is a pretty innovative way to use these databases.

But our research aims to go further. Finding more efficient ways to store data, as well as quickly finding dependencies between various versions of data lineage graphs, can greatly boost the overall speed of creating data lineage and make our software significantly faster than all the other data lineage solutions out there.
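One simple way to picture a comparison between two versions of a lineage graph is a set difference over their edges. The sketch below assumes each version is just a list of (source, target) pairs; the `diff_lineage` function and the column names are hypothetical, and the approach described in our research may differ considerably.

```python
def diff_lineage(old_edges, new_edges):
    """Compare two lineage versions given as iterables of (source, target)."""
    old, new = set(old_edges), set(new_edges)
    return {
        "added": sorted(new - old),      # edges only in the newer version
        "removed": sorted(old - new),    # edges only in the older version
        "unchanged": sorted(old & new),  # edges present in both
    }


v1 = [("crm.first_name", "dwh.full_name"), ("crm.last_name", "dwh.full_name")]
v2 = [("crm.first_name", "dwh.full_name"), ("contacts.name", "dwh.full_name")]

changes = diff_lineage(v1, v2)
print(changes["added"])
print(changes["removed"])
```

A comparison like this touches every edge, so its cost grows with the size of the graph; doing better than that on large graphs is where storage layout starts to matter.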

If we can find faster and more efficient ways to store data in graph databases, we might be ensuring the painless and efficient creation of end-to-end data lineage in the data world of tomorrow. And that is a challenge we will gladly take on today.

Want to join us at the MIT CDOIQ Symposium from July 31 to August 2 and listen to our talk about the future of data lineage? With the code MANTA25, you’ll get a 25% discount on tickets. And that’s worth it. 

Anything else you want to tell us? We are here for you at manta@getmanta.com

MANTA 3.25: Microsoft Excel, IBM Cognos, SAP PowerDesigner, and more!

It has been a long time since we swore our 3-new-supported-technologies-every-release oath, and we have kept it with every software release since. This version of MANTA is no different. In Release 3.25, we bring you support for Microsoft Excel (yes, Excel!), IBM Cognos, and SAP (Sybase) PowerDesigner. Read more in our regular blog post.

No time to read? Watch a video or dip into the details below:

Data Lineage for Excel? From Excel? With Excel? YES

No, this is not a drill. We have just released support for Microsoft Office’s Excel. The reason for this, which may seem like a weird flex to some, is that many of our customers still keep disturbing amounts of data in Excel, which used to be difficult to include in their data lineage.

MANTA now tracks data lineage in Excel, being able to read multiple sheets, pivot tables, slides, and graphs. We also have a detection algorithm with which MANTA can automatically reveal aggregated lists and tables that have formula relations among each other. MANTA is able to track data lineage from database tables and CSV files through tables and graphs among multiple Excel workbooks and push the whole picture into MANTA’s native visualization, as well as into third-party solutions.

Reporting Tools Are Our Friends…

The next connector that MANTA added to its technology portfolio is Cognos, a reporting tool from IBM. MANTA is able to create complete end-to-end data lineage from the database data sources, analytical models, and reports in Cognos Analytics by scanning:

• Reports (including queries)
• Interactive reports
• Framework Manager models
• Database connections

MANTA then pushes them into its own data visualization as well as into third-party solutions. (Visit our Tech Hub to learn more.)

Modeling Tools Too!

The third connector we have added is the modeling tool SAP (Sybase) PowerDesigner, our second after adding ER/Studio in Release 3.24. MANTA can scan PowerDesigner and automatically pull physical, logical, and conceptual models that can then be added to your data lineage to create end-to-end logical data lineage.

Behind the Scenes

Besides broadening MANTA’s Tech Hub, we have also made some improvements to existing supported technologies. The most work was probably done on SSAS, where we have added support for the newer version (level 1200) of the tabular models.

We also detect a connection between SSAS and SSRS from our Microsoft family; both of these are currently fresh alpha versions. MANTA is also able to export more and more technologies into third-party solutions, including transformation logic.

And, as of MANTA 3.25, ETL tools (SSIS, ODI, Talend) and reporting tools (SSAS, SSRS) from our Tech Hub are now exported into IBM IGC, Informatica EDC, and other third-party solutions.

Anything you want to ask us? Go ahead and write to us at manta@getmanta.com or chat with our friendly MANTA Bot. 

MANTA 3.24: New scanners for ODI, SSRS, ER/Studio and More!

March 27, 2019

Here in Prague, where MANTA’s engineering office is located, the snow has melted and sunny spring has arrived. As the first baby otters are born, we are delivering a new little baby of our own: MANTA 3.24. Read about it in our blog post below or check out the two-minute video where Jan Ulrych summarizes all the changes and updates.

What’s new this time? After finalizing our Microsoft SSRS connector, we have added two more new connectors. The first one is a scanner for Oracle Data Integrator (ODI); the second is for ER/Studio, which expands our influence in the realm of data modelling tools so that we can now create logical lineage automatically, making data lineage from MANTA more accessible for users who aren’t database tech pros.

However, the biggest success in this release is the direct integration with Collibra via API. We have been partnering with Collibra on the development of this synchronization API for quite some time now. So, we are pleased to inform you that we can now introduce the final version.

How is it different from the old integration?

  1. Direct integration. We are so integrated that we are basically part of Collibra DGC. This makes your work with Collibra and MANTA so much faster and easier.
  2. Automatic metadata update. Collibra can fully use this MANTA feature now.
  3. Table synchronization. We are the first ones on the planet able to update your Collibra DGC with your database 1:1, meaning you can now get rid of non-existent tables in DGC and make room for new ones.
  4. All in one. We are able to export all MANTA data lineage to Collibra, including the newly supported Microsoft SSRS.
  5. Logical Lineage. Since we support metadata extraction from E/R models and mappings between physical and logical layers, we can provide this information to Collibra so it can provide logical data lineage.
  6. Installation. It is just so much easier now.

Besides the hot stuff mentioned above, MANTA 3.24 finally offers transformation logic in Teradata and a long-awaited experimental Java version. We are currently doing closed beta testing with some of our customers, and from the next software release onward, we will be doing open testing.

Interested? Got questions? We are here for you. Throw a message into our trusty mailbox at manta@getmanta.com. We will reply!

MANTA x Record Level Lineage: Why we don’t have it

You may or may not have heard about record level lineage. This is a topic that our customers ask about quite frequently, so our vice president of development, Lukas Hermann, decided to write an article where he answers some of the FAQs. Continue reading to find out more about record level lineage and why we don’t have it.

What is record level lineage?

Record level lineage is an approach to data lineage that is similar to data tagging. The idea behind data tagging is that each piece of data that is being moved or transformed is tagged/labeled by a transformation engine, which then tracks that label all along its way from start to finish. This approach seems great, but it only works well when a transformation engine controls the data’s every move. Some good examples are controlled environments like Cloudera or Dremio that focus only on the origin of one specific record.

Record level lineage vs. column level lineage

A feature that MANTA does have, that in a way is similar to record level lineage, is column level lineage. What exactly is the difference? Let’s look at an example.

Let’s say you have the column full name in your table. In this table, the full name is created by combining the first name and the last name. Imagine that in the full name column you have names like John Snow and Jack Snow. Now, let’s say that the name John Snow came to this table from your own CRM database, but Jack Snow came from a contact database acquired from a third party.

Record level lineage is able to tell you exactly that John Snow came from CRM and Jack Snow came from your contact database. Column level lineage, like in MANTA, is able to tell you that the column full name consists of data from these two databases—your CRM database and your contact database.
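The difference can be sketched in a few lines. Everything here is hypothetical and made up for the example: the row tags, the mapping tuples, and both function names. The point is only that the record-level lookup needs the rows themselves, while the column-level lookup needs nothing but metadata.

```python
# Hypothetical record-level view: every row carries a tag naming its origin.
tagged_rows = [
    {"full_name": "John Snow", "source": "CRM"},
    {"full_name": "Jack Snow", "source": "contact database"},
]

# Hypothetical column-level view: (database, source column, target column).
# It contains metadata only and no row values at all.
column_mappings = [
    ("CRM", "first_name", "full_name"),
    ("CRM", "last_name", "full_name"),
    ("contact database", "first_name", "full_name"),
    ("contact database", "last_name", "full_name"),
]


def record_level_origin(rows, full_name):
    """Return the origin of one specific record (needs access to the data)."""
    for row in rows:
        if row["full_name"] == full_name:
            return row["source"]
    return None


def column_level_origins(mappings, target_column):
    """Return every database feeding a column (needs only the metadata)."""
    return sorted({db for db, _src, target in mappings if target == target_column})


print(record_level_origin(tagged_rows, "Jack Snow"))
print(column_level_origins(column_mappings, "full_name"))
```

The first call answers "where did this record come from?", the second answers "which databases feed this column?", which is the question column level lineage is built for.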

Why we don’t have it

The reason why MANTA does not have record level lineage is that MANTA doesn’t “see” your data; it doesn’t even “see” that you have a John Snow and a Jack Snow in your full name column. It only reads your metadata. That is why MANTA only sees that the table contains data from these databases, and which databases they are.

Now, you might be thinking that the overall idea of the record level lineage approach might not be so bad after all. But keep in mind that if anything happens outside the transformation engine’s walls, the lineage is broken. It is also important to realize that the lineage is only there if the transformation logic has been executed. Think about all the exceptions and rules that apply only once every couple of years. You will not see them in your lineage until they are executed, which is not exactly healthy for your data governance, especially if some of those pieces are critical to your organization.

Also, tags are formed by assigning additional metadata to the records. If you lose this metadata, you will never be able to form the lineage again. And without actually running the transformation engine, you don’t know how the given record was put together, and therefore don’t know the lineage behind it.

In conclusion

If MANTA wanted to have record level lineage, it would have to start reading your data instead of your metadata, and it would have to have much more information about your environment. This would make the entire process of getting data lineage far more complicated and time-consuming.

We can safely say that we are not planning on having record level lineage as a feature any time soon. On the other hand, we plan on putting more effort into understanding your data transformations. The fact that MANTA only reads your metadata and is only interested in your data transformations, not your actual data, is the reason why MANTA can be automated so well and get data lineage so fast.

And what about Conditional Lineage? 

MANTA also has conditional lineage as a feature, so you might assume that we do look into the actual data when creating conditional lineage. Well, not quite. We only use the data that is specifically mentioned in the scripts. You can learn more about conditional lineage in the article: How to Handle Impact Analyses in Complex DWHs with Predicates.

So what MANTA does give you is a list of the exact databases that supply data to the given column in your table. For compliance with regulations such as GDPR and with financial or banking regulations, that is completely sufficient. And typically, no more than a few databases supply each column. So the question is: if you really need to know the specific database for each record in your table that badly, and you can have the databases narrowed down to a few for each column in a couple of hours, wouldn’t it be more efficient to just check those few databases manually for the specific record yourself?

Do you have any development-related questions for Lukas, or would you like to learn more about how MANTA can solve a specific issue in your company? Don’t hesitate to contact us at manta@getmanta.com.
