Use Cases & Case Studies

A Tech Support Love Letter: Be Our Valentine!

Valentine’s Day is heading our way, and we know who our Valentine will be – but do you? We’ve made you this card so you can see how our customers get nothing but love and affection from MANTA’s support and development teams. It’s a love letter in numbers – enjoy!

With love, MANTA.

[Infographic: A Love Letter in Numbers]

MANTA 3.20: SSIS is in The Building, & More…

Well, well, well, another year is coming to an end! MANTA 3.20 is our last release of 2017, and we have made sure it is one to remember. Read this breakdown and you will see just how powerful a finish it really is.

Microsoft SSIS

It has all been building up to this moment for quite some time. And here it is: MANTA now officially supports Microsoft SSIS, version 2012 and newer. But we probably know what you are thinking: SSIS is an ETL tool, and MANTA doesn’t support ETL tools, so what’s going on?

SSIS is an ETL tool, but it has plenty of SQL woven into it, like your granny’s holiday sweater. Since MANTA gets along so well with SQL, it comes as no surprise that it can effectively parse the code in an SSIS environment and provide you with the good old data lineage you’ve been waiting for.

Export

It’s always nice to get some of those foreign goods – and that’s what TopQuadrant users will think from now on! MANTA 3.20 fully automates the export of physical metadata to TopQuadrant TopBraid Enterprise Data Governance 5.4. Now you can have automatic, high-quality data lineage in EDG in a matter of clicks!

What’s up?

That’s kind of what we tend to ask ourselves when we open our IBM InfoSphere Information Governance Catalogs… so we added automatic reporting of the differences between the current and the last revision. Did someone add a new relation? Did someone delete a file, changing the course of data lineage? Now you will know – MANTA will tell you exactly what has changed since it last exported data to IGC.

And just in case…

Is there anything you need help with? Do you have any questions about MANTA 3.20? Then don’t hesitate to write to us at manta@getmanta.com. We are here for you. We always have been.

Happy New Year!

MANTA 3.19: Dynamic SQL Support, Fully-Automated Collibra, Service API & More…

The new MANTA 3.19 is here and it will leave you with exactly the same great feeling as the first sip of pumpkin spice latte on a rainy fall afternoon.* 

In this version, we took a close look at our integration with Collibra and fully automated the whole process. Previously, integrating MANTA with Data Governance Center required an initial setup tailored to each customer. Now it’s all part of the product, automatically ready to connect to your DGC!

This next new feature is a big deal for our partners who work with systems that MANTA doesn’t support, usually ETL tools. These tools can contain SQL that our partners need to parse in order to understand their customers’ BI environments. With MANTA’s new “MANTA Service API”, our partners can now connect MANTA to their own solutions, make it crunch all the code in the customer databases that they can’t read themselves, and then pull back all the information to provide their customers with detailed and accurate data lineage.

So with the new MANTA Service API and the public API we introduced in our last release, you can now use MANTA’s SQL-analyzing superpower ANYWHERE. You’re welcome.
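
To make the idea concrete, here is a minimal sketch of what such an integration could look like. The endpoint, payload shape, and response fields below are hypothetical placeholders, not MANTA’s documented API:

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape - placeholders for illustration.
SERVICE_URL = "https://manta.example.com/service-api/analyze"

def analyze_sql(statements, dialect="oracle"):
    """Hand a batch of SQL to the analysis service and get lineage back."""
    payload = json.dumps({"dialect": dialect, "statements": statements})
    request = urllib.request.Request(
        SERVICE_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # e.g. a list of column-level edges between source and target tables
        return json.load(response)

lineage = analyze_sql(
    ["INSERT INTO dm.orders SELECT id, amount FROM stage.orders"]
)
```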

We boosted all the analysis processes as well, especially the DB2 connector. And when you export to IBM InfoSphere Information Governance Catalog, you can now see the SQL source code right in the window.

MANTA does static code analysis, and one of its handicaps has been dynamic SQL. In 3.19, we have made steps toward speeding up the analysis of dynamic SQL. MANTA is able to recognize and read your dynamic SQL patterns, although some specification is needed from time to time.
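
To illustrate what that means, here is a toy sketch; the pattern and hint format below are invented for illustration, not MANTA’s actual configuration:

```python
# A dynamic statement builds its SQL as a string at runtime, so a static
# analyzer sees only the pattern, never the final text.
PATTERN = "INSERT INTO {target} SELECT id, amount FROM stage_orders"

# A user-supplied specification listing the values the placeholder can
# take lets the analyzer expand the pattern into parseable statements.
TARGET_HINTS = ["dm_orders_2016", "dm_orders_2017"]

for target in TARGET_HINTS:
    print(PATTERN.format(target=target))  # each expansion parses statically
```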

Last but not least, there have been a few improvements affecting Informatica PowerCenter integrations. For example, MANTA can now easily read which database PowerCenter is connected to, which significantly decreases the amount of manual work required in the initial setup, saving many hours of valuable time on MANTA x Informatica integrations.

*We have not verified this claim; it’s just based on my personal experience. Please, don’t sue us.

Also, if you have any questions, just let us know!

Manta Goes Public with Its API!

Nowadays, every app, tool and solution needs to be connected to everything else. And MANTA is ready to join the club. 

You Asked for It

Here at MANTA HQ, we’ve been practically buried under customer requests to add various integration possibilities for Manta Flow. You asked for it! As of version 3.18, MANTA has a public REST API. This new feature, together with multi-level data lineage, gives users the option to use MANTA with all kinds of technologies.

Through the public API, you can connect MANTA to any custom tool or app and let that tool work with MANTA’s data. How exactly? Take a look at this example:

Let’s say you have your own quality monitoring tool that monitors the critical elements of data lineage for you. You could let MANTA export an Excel file and then manually go through all the values, find out what their sources are, and look for changes by hand. But now, thanks to the public API, you can do all of this automatically using your own tool!

Put an End to Boring Manual Reports

The tool can call MANTA’s API, automatically pull out all the critical elements of data lineage, and report any changes found. Now you can automatically monitor all the changes that occur to your data during a given time period, saving you and your company hours of manual labor spent moving data from MANTA into your own tool.
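
As a sketch of that workflow (the endpoint and response format are invented for illustration; the real REST API will differ), the monitoring job boils down to pulling the current sources of each critical element and diffing them against the previous run:

```python
import json
import urllib.request

API_URL = "https://manta.example.com/api/lineage"  # hypothetical endpoint

def fetch_sources(element):
    """Ask the lineage API which source tables feed a critical element."""
    with urllib.request.urlopen(f"{API_URL}?element={element}") as response:
        return set(json.load(response)["sources"])

def report_changes(element, snapshot):
    """Diff the current sources against the previous run and report."""
    current = fetch_sources(element)
    previous = set(snapshot.get(element, []))
    added, removed = current - previous, previous - current
    if added or removed:
        print(f"{element}: added {sorted(added)}, removed {sorted(removed)}")
    snapshot[element] = sorted(current)  # becomes the next baseline

snapshot = {}  # in practice, persisted between scheduled runs
report_changes("report.monthly_revenue", snapshot)
```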

And there are many, many other ways you can use our new API!

To learn more about the capabilities of our solution, try a live demo, ask for a trial, or drop us a line at manta@getmanta.com.

How Manta Flow Now Works with IBM Information Governance Catalog

November 22, 2016

We are expanding our support for third-party metadata managers to help our customers get the most out of their existing data governance solutions.

Our key product, Manta Flow, already complements Informatica Metadata Manager very well. With the addition of IBM InfoSphere Information Governance Catalog, we are able to deliver the same level of highly specialized code crunching to folks who use IBM’s tools as well. And how does that work? Well, it’s simple:

1. Manta Flow crunches programming code based on our supported technologies (Teradata, Oracle, Microsoft SQL Server, and others).

2. After connecting to IGC, Manta Flow will create a new metamodel and perfectly integrate with the existing structure within IGC.

3. The customer can browse IGC just as they are used to – it will simply have much more accurate data lineage ready to use.

Seamless integration into IGC is the key to success. We’ve created a short video to explain a little bit more how Manta Flow is integrated into Information Governance Catalog. And what’s inside?

1. A Brief Explanation of a New Metamodel in IGC: 0:10
2. How It Works with Queries: 1:40
3. Integration of Data Lineage Visualization: 2:13

A friendly suggestion: run the video in fullscreen.

And if you are not ready for IGC, stay tuned – we will soon present our newest video about our oldest love, IMM. In the meantime, read the introductory article right here.

Any thoughts? Comments? Or do you simply want to try it out for yourself? Just let us know at manta@mantatools.com.

Agile BI Development in 2016: Where Are We?

Agile development was meant to be the cure for everything. It’s 2016 and Tomas Kratky asks the question: where are we?

BI departments everywhere are under pressure to deliver high-quality results and deliver them fast. At the same time, the typical BI environment is becoming more and more complex. Today we use many new technologies – not just standard relational databases with SQL interfaces, but also NoSQL databases, Hadoop, and languages like Python and Java for data manipulation.

Another issue we have is a false perception of the work that needs to be done when a business user requests some data. Most business users think that preparing the data is only a tiny part of the work and that the majority of the work is about analyzing the data and later communicating the results. Actually, it’s more like this:

[Figure: the real distribution of BI work – communication and analysis form only the small tip; the bulk of the work is data preparation]

See? The reality is completely different. The communication and analysis of data is that tiny part at the top and the majority of the work is about data preparation. Being a BI guy is simply a tough job these days.

This whole situation has led to an ugly result – businesses are not happy with their data warehouses. We have all probably heard plenty of complaints about DWHs being costly, slow, rigid, or inflexible. But the reality is that DWHs are large, critical systems with many different stakeholders and requirements that change from day to day. In a similar field, application software development, we had the same issues with delivery, and there, agile processes proved to be a good solution. So our goal is to be inspired and learn how agile can be used in BI.

The Answer: Agile?

One very important note – agility is a really broad term, and today I am only going to speak about agile software development, which means two things from the perspective of a BI development team:

1. How to deliver new features and meet new requirements much faster

2. How to quickly change the direction of development

Could the right answer be agile development? It might be. Everything written in the Agile Manifesto makes sense, but what’s missing are implementation guidelines. And so the Manifesto was, a little later, enriched with the so-called agile principles. As agile became very popular, we started to believe that it was a cure for everything. This survey from 2009 clearly demonstrates how popular agile was:

[Figure: results of a 2009 survey of agile adoption, broken down by methodology]

Source: Forrester/Dr. Dobb’s Global Developer Technographic, 2009

And it also shows a few of the many existing agile methodologies. According to some surveys from 2015, agile is currently being used by more than 80% or even 90% of development teams.

Semantic Gap

Later on, we realized that agile is not the ultimate cure. Tom Gilb, in his famous 2010 article “Value-Driven Development Principles and Values”, went a bit deeper. After a thorough study of the failures, mistakes, and successes since the very beginning of the software industry, one thing became clear to him – there is something called a semantic gap between business users and engineers, and this gap causes a lot of trouble. Gilb hit the nail on the head with one important statement: “Rapidly iterating in wrong directions is not progress.” Therefore, requirements need to be treated very carefully as well.

But even with the semantic gap issue, agile can still be very useful. Over the last ten years, the agile community has come up with several agile practices. They are simple-to-explain practices that anyone can start using to improve their software processes. And this is something you should definitely pay attention to. Here you can see agile practices sorted by popularity:

[Figure: agile practices sorted by popularity]

If you have ever heard about agile, these will probably come as no surprise to you. The typical mistake made by many early adopters of agile was simply being too rigid; I would call it “fanatical”. It was everything or nothing. But things do not work that way.

It’s Your Fault If You Fail

Each and every practice should be considered a recommendation, not a rule. It is your responsibility to decide whether it works for you or not. Each company and each team are different, and if the system metaphor practice has no value for your team, just ignore it, like we do. Are you unable to get constant feedback from business users? OK then, just do your best to get as much feedback as you can.

On the other hand, we’ve been doing agile for a long time, and we’ve learned that some practices (marked in red) are more important than others and significantly influence our ability to be really fast and flexible.

[Figure: the same agile practices, with the most important ones marked in red]

There are basically two groups of practices. The first group is about responsibility. A product owner is someone on your side who is able to make decisions about requirements and user needs – prioritize them, evaluate them, and verify them. It can be someone from the business side, but this job is very time-consuming, so more often the product owner will be the person on your BI team who knows the most about the business. Without such a person, your ability to make quick decisions will be very limited. Making a burndown list is a very simple practice that forces you to clearly define priorities and to select the features and tasks with the highest priority for the next release. And because your releases tend to be more frequent with agile, you can only ever pick a very limited number of tasks, which makes clear priorities vital.

The second group of critical practices is about automation. If your iterations are short, if you integrate the work of all team members on a daily basis and also want to test it to detect errors and correct them as early as possible, and if you need to deliver often, you will find yourself and your team in a big hurry without enough time to handle everything manually. So automation is your best friend. Your goal is to analyze everything you do and replace all manual, time-consuming activities with automated alternatives.

What Tools To Use?

Typical tools you can use include:

1. Modern Version Control Systems

A typical use case involves Git, SVN, or Team Foundation Server storing all the pieces of your code, tracking versions and changes, merging different branches of code, etc. What you are not allowed to do is use shared file systems for that. Unfortunately, this is still quite a common practice among BI folks. Also, be careful about using BI tools that do not support easy, standard versioning. Do not forget that even if you draw pictures, models, or workflows and do not write any SQL, you are still coding.

So a good BI tool stores every piece of information in text-based files – for example, XML. That means you can make them part of a code base managed by Git, for example. A bad BI tool stores everything in binary, proprietary files, which cannot be managed effectively by any versioning system. Some tools support a kind of internal versioning, but that is still a big pain for you as a developer, and it leads to fragmented version control.

2. Continuous Integration Tools

You’ll also need tools like Maven and Jenkins, or PowerShell and TeamCity, to build and deploy your BI packages rapidly and automatically.

3. Tools for Automated Code Analysis and Testing

I recommend using frameworks like DbFit at least to write automated functional tests, and also using a tool for static code analysis to enforce your company standards, best practices, and code conventions (Manta Checker is really good at that). And do not forget – you cannot refactor your code very often without proper testing automation.
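
As a minimal sketch of what an automated functional test looks like – written here in plain Python with an in-memory SQLite database standing in for the warehouse, whereas DbFit expresses the same idea as wiki-style test tables:

```python
import sqlite3
import unittest

class DailyRevenueTest(unittest.TestCase):
    """Functional test of a toy aggregation, with SQLite standing in
    for the real warehouse."""

    def setUp(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (day TEXT, amount REAL)")
        self.db.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [("2016-01-01", 10.0), ("2016-01-01", 5.0), ("2016-01-02", 7.0)],
        )

    def test_daily_totals(self):
        rows = self.db.execute(
            "SELECT day, SUM(amount) FROM orders GROUP BY day ORDER BY day"
        ).fetchall()
        self.assertEqual(rows, [("2016-01-01", 15.0), ("2016-01-02", 7.0)])

if __name__ == "__main__":
    unittest.main()
```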

4. Smart Documentation Tools

In the end, you can’t work on the parts of your system you do not understand. The best combination of tools you can get is something like a wiki to capture basic design ideas, plus a smart documentation tool able to generate detailed documentation in an automated way whenever needed. Today there are many very good IDEs able to generate mainly control-flow and dependency diagrams. But we are BI guys, and there is one thing that is extremely useful for us – it is called data lineage, or, if you like, data flow.

Simply put, it’s a diagram showing you how data flows and is transformed in your DWH. You need data lineage to perform impact analyses and what-if analyses, as well as to refactor your code and existing data structures. There are almost no solutions on the market that are able to show you data lineage from your custom code (except our Manta Flow, of course).
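
Under the hood, data lineage is simply a directed graph, and an impact analysis is a reachability question on that graph. A minimal sketch with invented table names:

```python
from collections import deque

# A toy lineage graph (invented names): edges point from source to target.
LINEAGE = {
    "stg.orders": ["dwh.orders"],
    "dwh.orders": ["dm.revenue", "dm.orders_report"],
    "dm.revenue": ["report.monthly_revenue"],
}

def impact(start):
    """Everything downstream of `start`, i.e. what a change would affect."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for target in LINEAGE.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

print(impact("stg.orders"))
# -> {'dwh.orders', 'dm.revenue', 'dm.orders_report', 'report.monthly_revenue'}
```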

And that’s it. Of course, there are other, more advanced practices to support your agility, but this basic stuff is, I believe, something that can be implemented quickly from the perspective of both processes and tools. I definitely suggest starting with a smaller, more experienced team, implementing the most important practices, playing around a little, and measuring the results of different approaches. I guarantee that you and your team will experience significant improvements in speed and flexibility very soon.

Do you have any questions or comments? Send them directly to Tomas Kratky at manta@mantatools.com! Also, do not forget to follow us on Twitter and LinkedIn.

How to Handle Impact Analyses in Complex DWHs with Predicates

“How to get full data lineage in complex BI environments and perform reliable impact analyses?” Predicates (with the help of Manta Flow!) might be the answer. 

During our pilots and deployments, we often find data warehouse environments that use very general physical models, including several big tables like PARTY, BALANCE, ORDER, and others. These tables contain data obtained from various source systems, and a lot of data marts and reports are built on top of them. Such tables make impact analysis difficult because the data lineage of almost every report goes through them to all the sources, making the result worthless.

Impact Analyses Do Not Have to Be THIS BIG 

Let’s take a look at an example to understand exactly what happens. The PARTY table contains all the individuals and companies that are somehow related to the organization. Thus, one table can hold records for clients, employees, suppliers, and the organization’s branch network. Each type of entity is identified by a unique attribute or by the source system from which the data was obtained – for example, clients are managed in a different system than employees.

Now, let’s assume we have two reports based on data from the PARTY table – a report EMPL_REPORT that displays information about employees and another report BRANCH_REPORT that displays information about the branch network. If we use the standard data lineage analysis, we can get this picture:

[Figure: standard data lineage – both reports trace back through the PARTY table to all four source tables]

Although only data from the EMPLOYEE source table is relevant to the report EMPL_REPORT, the impact analysis from that report also includes the CLIENT, BRANCH, and SUPPLIER source tables because of the PARTY table. The problem is the same for the report BRANCH_REPORT. In the other direction, the impact analysis from the EMPLOYEE source table includes both EMPL_REPORT and BRANCH_REPORT, which is confusing.
In a real environment, there are dozens of source systems and hundreds of reports, which makes the standard data lineage analysis worthless.

The Advanced Data Lineage Analysis 

Fortunately, there is a solution. When data is inserted into the PARTY table from different source systems, there is often a column like PARTY.source_system_id where the identification of the source system is stored as a constant. Similarly, when a report is created that consumes data only from specific source systems, there is a condition in the statement filtering data based on the PARTY.source_system_id column. Thus, it is possible to automatically analyze both the insertion and selection to/from the PARTY table and create predicates such as PARTY.source_system_id = 20 that are then stored together with data lineage in the metadata repository. Therefore, it is possible to include them in the computation during the impact analysis.
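
The statements that give rise to such predicates typically follow a simple pattern – a hypothetical, simplified example (invented column names, shown as SQL strings):

```python
# Loading PARTY from one source system: the source identifier is a
# constant, so the lineage edge EMPLOYEE -> PARTY carries the predicate
# PARTY.source_system_id = 20.
LOAD_EMPLOYEES = """
INSERT INTO PARTY (party_id, name, source_system_id)
SELECT empl_id, empl_name, 20 FROM EMPLOYEE
"""

# A report reading PARTY filters on the same column, so the edge
# PARTY -> EMPL_REPORT also carries the predicate source_system_id = 20.
BUILD_EMPL_REPORT = """
INSERT INTO EMPL_REPORT (name)
SELECT name FROM PARTY WHERE source_system_id = 20
"""
```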

Thanks to that, if we perform an impact analysis from the report EMPL_REPORT, the predicate PARTY.source_system_id = 20 is gathered before we reach the PARTY table. When the analysis continues towards the source tables, the predicate for each path is compared to what has already been gathered. So when the path to the source table CLIENT with the predicate PARTY.source_system_id = 10 is tested, the result is that both predicates cannot hold at once, so data for this report cannot come from that source table. Conversely, when the path to the source table EMPLOYEE with the predicate PARTY.source_system_id = 20 is tested, the result is that data for this report can come from that source table, so it is included in the result of the impact analysis. We get similar results if we perform an impact analysis for BRANCH_REPORT, or from sources like the EMPLOYEE table.
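
Here is a minimal sketch of that pruning logic, using an invented representation in which every lineage edge carries an optional equality predicate and a path is dropped as soon as its accumulated predicates contradict each other:

```python
# Each edge is (source, target, predicate); a predicate is a
# (column, value) pair or None. Edges point in the direction of data flow.
EDGES = [
    ("CLIENT", "PARTY", ("PARTY.source_system_id", 10)),
    ("EMPLOYEE", "PARTY", ("PARTY.source_system_id", 20)),
    ("PARTY", "EMPL_REPORT", ("PARTY.source_system_id", 20)),
]

def contradicts(gathered, predicate):
    """Two equality predicates on the same column with different
    values cannot hold at once."""
    if predicate is None:
        return False
    column, value = predicate
    return any(c == column and v != value for c, v in gathered)

def sources_of(report):
    """Walk upstream from a report, pruning contradictory paths."""
    found = set()

    def walk(node, gathered):
        for source, target, predicate in EDGES:
            if target != node or contradicts(gathered, predicate):
                continue  # wrong edge, or predicates rule this path out
            found.add(source)
            walk(source, gathered | ({predicate} if predicate else set()))

    walk(report, set())
    return found

print(sources_of("EMPL_REPORT"))  # {'PARTY', 'EMPLOYEE'} - CLIENT is pruned
```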

The result of the advanced data lineage analysis can look like this (in reality, if we perform the impact analysis from the EMPL_REPORT, we will only see the EMPLOYEE and PARTY tables):

[Figure: predicate-based data lineage – EMPL_REPORT traces back only to EMPLOYEE, and BRANCH_REPORT only to BRANCH]

Of course, the situation can be far more complex. For example, data from the PARTY table can first be pre-computed for several source systems, and then several reports can be created on top of that, each for a specific source system, as in this picture:

[Figure: a more complex case – data from PARTY is pre-computed for several source systems, with reports built on top for specific source systems]

This, too, can be handled and, as you may have expected, it is also part of the Manta Flow product analysis.

If you have any questions or comments, feel free to contact Lukas at manta@mantatools.com. You can also try these predicate-based impact analyses in our free trial – just request one.


How To Inspect Raw Data Lineage With Manta Flow

Risk departments have a lot of complex SQL queries in their data warehouses and data marts. But sometimes it’s really difficult to find the right level of detail. Manta Flow can help.

“When we present Manta Flow to potential customers, most of them are happy that we can reduce very complex SQL statements to a few simple rectangles connected by arrows,” explains Lukas Hermann, our Director of Engineering. “They need to be able to quickly understand which source tables their SQL queries read, which target tables they fill, which columns are involved in computing a particular column, and how.”

The Usual

For example, let’s look at just two ordinary insert statements moving data from a staging area to a data mart and on to a report:

[Figure: the raw SQL code of the two insert statements]

It could take you quite a while to analyze which columns are involved in the computation. But with Manta Flow it is really easy to see, including all the statements involved:

[Figure: the simplified data lineage visualization of the two statements in Manta Flow]

This is perfectly sufficient for all business analytics in data warehouse environments. All the unnecessary details like exactly how data is computed, filtered, aggregated, or ordered are hidden. And if you want to go deeper, Manta Flow can easily show the SQL code of the statements where you find the full detail.

The Raw

However, some analysts (particularly from the aforementioned risk departments) say that their SQL statements are really huge, including many subselects, complex expressions, etc., so the jump between the clear picture and the SQL code is too big. They would like to see all the computation steps in a similarly simplified format, and they ask whether Manta Flow can handle it – whether it has all the information necessary to show them.

The answer is that Manta Flow has the most detailed information possible about each part of a statement, but so as not to disturb you with what are, in most cases, useless details, it filters the information to the most useful level of detail. If you want to see everything, including expressions, conditions, aggregations, etc., it is possible to configure the filtering or turn it off completely. Manta Flow is able to show you the unfiltered information while still keeping you oriented within your own systems.

[Figure: the unfiltered, fully detailed data lineage view]

See? It is possible to show the SQL code for the precise position of each part of the statement.

If you’d like to try something like this yourself, just let us know. Also, do not forget to follow us on Twitter.

Manta Flow Makes Metadata Shine

Customer: a major bank
Problem: A complex BI environment with thousands of BTEQ scripts, 10 000+ views, and hundreds of ETL transformations. Management with a very progressive metadata strategy, but lacking appropriate tools.
Solution: Manta Flow was applied to Teradata, analyzing all tables, views and scripts, and providing missing metadata to Informatica Metadata Manager.
Result: A tightly controlled BI environment with complete data flow documentation, comprehensive impact analyses, and unparalleled visibility of metadata.

Manta Flow Makes Impact Analyses Doable

Customer: a major bank
Problem: The customer has 50 000+ lines of Oracle SQL and PL/SQL code in one of their environments. Impact analyses could not be completed on time and contained many gaps and errors. It was impossible to fully document the workings of the customer’s Oracle platform.
Solution: Manta Flow has been implemented for regular and one-off analyses of the entire code base. Data flows are analyzed and visualized for each change.
Result: The customer now has a complete data flow map of the environment. Impact analyses are quick to perform and reliable. The visualization produced by Manta Flow is used to train new employees and contractors.
