Use Cases & Case Studies

MANTA 3.19: Dynamic SQL Support, Fully-Automated Collibra, Service API & More…

The new MANTA 3.19 is here and it will leave you with exactly the same great feeling as the first sip of a pumpkin spice latte on a rainy fall afternoon.*

In this version, we took a close look at our integration with Collibra and fully automated the whole process. Before, when we integrated MANTA with Data Governance Center, it required an initial setup that was tailored to fit each customer. But now it’s all part of the product, automatically ready to connect to your DGC!

This next new feature is a big deal for our partners who work with systems that MANTA doesn’t support, usually ETL tools. These tools can contain SQL that our partners need to parse in order to understand their customers’ BI environments. With MANTA’s new “MANTA Service API”, our partners can now connect MANTA to their own solutions, let it crunch all the code in the customer databases that they can’t read themselves, and then pull back all the information to provide their customers with detailed and accurate data lineage.

So with the new “MANTA Service API” and the Public API we introduced in our last release, you can now use MANTA’s SQL-analyzing superpower ANYWHERE. You’re welcome.

We sped up all the analysis processes as well, especially the DB2 connector. And now, when you export to IBM InfoSphere Information Governance Catalog, you can see the SQL source code right in the window.

MANTA does static code analysis, and dynamic SQL used to be one of its handicaps. In 3.19, we have made steps toward speeding up the analysis of dynamic SQL. MANTA is able to recognize and read your dynamic SQL patterns, although some manual specification is still needed from time to time.
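
To make “dynamic SQL pattern” more concrete, here is a small, purely illustrative PL/SQL sketch (the procedure and table names are our own, not taken from any customer code or from MANTA itself). The INSERT target is assembled at run time, which is exactly what purely static analysis cannot resolve on its own; once the naming pattern is specified, the lineage can be computed:

CREATE OR REPLACE PROCEDURE load_monthly(p_month IN VARCHAR2) AS
  v_sql VARCHAR2(4000);
BEGIN
  -- The target table name is glued together at run time, so it is
  -- unknown to a static analyzer until the sales_<month> pattern
  -- is described to it.
  v_sql := 'INSERT INTO sales_' || p_month
        || ' (customer_id, amount)'
        || ' SELECT customer_id, amount FROM stage_sales';
  EXECUTE IMMEDIATE v_sql;
END;
/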

Last but not least, there have been a few improvements affecting Informatica PowerCenter integrations. For example, MANTA can now easily read which database PowerCenter is connected to, which significantly decreases the amount of manual work required in the initial setup, saving many hours of valuable time on MANTA x Informatica integrations.

*We have not verified this claim; it’s just based on my personal experience. Please, don’t sue us.

Also, if you have any questions, just let us know!

Manta Goes Public with Its API!

Nowadays, every app, tool and solution needs to be connected to everything else. And MANTA is ready to join the club. 

You Asked for It

Here at MANTA HQ, we’ve been buried in customer requests to add various integration possibilities to Manta Flow. You asked for it! As of version 3.18, MANTA has a public REST API. This new feature, together with multi-level data lineage, gives users the option to use MANTA with all kinds of technologies.

Through the public API, you can connect MANTA to any custom tool or app and let that tool work with MANTA’s data. How exactly? Take a look at this example:

Let’s say you have your own quality monitoring tool that monitors critical elements of data lineage for you. You could let MANTA export an Excel file and then manually go through all the values, find out what their sources are, and manually look for changes. But now, thanks to the public API, you can do all this automatically using your own tool!

Put an End to Boring Manual Reports

The tool can call MANTA’s API, automatically pull out all the critical elements of data lineage, and report the changes found. Now, you can automatically monitor all changes that occur to your data during a given time period, saving you and your company hours of manual labor spent pouring data from MANTA into your own tool.

And there are many, many other ways you can use our new API!

To learn more about the capabilities of our solution, try a live demo, ask for a trial, or drop us a line at manta@getmanta.com.

How Manta Flow Now Works with IBM Information Governance Catalog

November 22, 2016

We are expanding our support for 3rd party metadata managers – to help our customers get the most out of their existing data governance solutions.

Our key product, Manta Flow, already complements Informatica Metadata Manager very well. With the addition of IBM InfoSphere Information Governance Catalog, we are able to deliver the same level of highly specialized code crunching to folks who use IBM’s tools as well. And how does that work? Well, it’s simple:

1. Manta Flow crunches programming code based on our supported technologies (Teradata, Oracle, Microsoft SQL Server, and others).

2. After connecting to IGC, Manta Flow will create a new metamodel and perfectly integrate with the existing structure within IGC.

3. The customer can browse IGC as he or she is used to – it’s just going to have way more accurate data lineage ready to use.

Seamless integration into IGC is the key to success. We’ve created a short video to explain a little bit more how Manta Flow is integrated into Information Governance Catalog. And what’s inside?

1. A Brief Explanation of a New Metamodel in IGC: 0:10
2. How It Works with Queries: 1:40
3. Integration of Data Lineage Visualization: 2:13

A friendly suggestion: run the video in fullscreen.

And if you are not ready for IGC, stay tuned; we will soon present our newest video about our oldest love – IMM. In the meantime, read the introductory article right here.

Any thoughts? Comments? Or do you simply want to try it out for yourself? Just let us know at manta@mantatools.com or use the form on the right.

Agile BI Development in 2016: Where Are We?

Agile development was meant to be the cure for everything. It’s 2016 and Tomas Kratky asks the question: where are we?

BI departments everywhere are under pressure to deliver high-quality results and deliver them fast. At the same time, the typical BI environment is becoming more and more complex. Today we use many new technologies – not just standard relational databases with SQL interfaces, but also NoSQL databases, Hadoop, and languages like Python or Java for data manipulation.

Another issue we have is a false perception of the work that needs to be done when a business user requests some data. Most business users think that preparing the data is only a tiny part of the work and that the majority of the work is about analyzing the data and later communicating the results. Actually, it’s more like this:

[Figure: the real distribution of BI work – communication and analysis form the tiny tip, data preparation the bulk]

See? The reality is completely different. The communication and analysis of data is that tiny part at the top and the majority of the work is about data preparation. Being a BI guy is simply a tough job these days.

This whole situation has led to an ugly result – businesses are not happy with their data warehouses. We all have probably heard a lot of complaints about DWHs being costly, slow, rigid, or inflexible. But the reality is that DWHs are large critical systems, and there are many, many different stakeholders and requirements which change from day to day. In another similar field, application software development, we had the same issues with delivery, and in those cases, agile processes were good solutions. So our goal is to be inspired and learn how agile can be used in BI.

The Answer: Agile?

One very important note – agility is a really broad term, and today I am only going to speak about agile software development, which means two things from the perspective of a BI development team:

1. How to deliver new features and meet new requirements much faster

2. How to quickly change the direction of development

Could the right answer be agile development? It might be. Everything written in the Agile Manifesto makes sense, but what’s missing are implementation guidelines. And so the Manifesto was, a little bit later, enriched with the so-called agile principles. As agile became very popular, we started to believe that agile was a cure for everything. This survey from 2009 clearly demonstrates how popular agile was:

[Figure: survey of agile adoption across development teams, listing several agile methodologies]

Source: Forrester/Dr. Dobb’s Global Developer Technographic, 2009

And it also shows a few of the many existing agile methodologies. According to some surveys from 2015, agile is currently being used by more than 80% or even 90% of development teams.

Semantic Gap

Later on, we realized that agile is not the ultimate cure. Tom Gilb, in his famous article “Value-Driven Development Principles and Values” written in 2010, went a bit deeper. After conducting a thorough study of the failures, mistakes, and also successes since the very beginning of the software industry, one thing became clear – there is something called a semantic gap between business users and engineers, and this gap causes a lot of trouble. Tom Gilb hit the nail on the head by saying one important thing: “Rapidly iterating in wrong directions is not progress.” Therefore, requirements need to be treated very carefully as well.

But even with the semantic gap issue, agile can still be very useful. Over the last ten years, the agile community has come up with several agile practices. They are simple, easy-to-explain things that anyone can start doing to improve his or her software processes. And this is something you should definitely pay attention to. Here you can see agile practices sorted by popularity:

[Figure: agile practices sorted by popularity]

If you have ever heard about agile, none of these will surprise you. The typical mistake made by many early adopters of agile was simply being too rigid; I would call it “fanatic”. It was everything or nothing. But things do not work that way.

It’s Your Fault If You Fail

Each and every practice should be considered a recommendation, not a rule. It is your responsibility to decide whether it works for you or not. Each company and each team is different, and if the system metaphor practice has no value for your team, just ignore it like we do. Are you unable to get constant feedback from business users? OK, then just do your best to get as much feedback as you can.

On the other hand, we’ve been doing agile for a long time, and we’ve learned that some practices (marked in red) are more important than others and significantly influence our ability to be really fast and flexible.

[Figure: agile practices with the most important ones marked in red]

There are basically two groups of practices. The first group is about responsibility. A product owner is someone on your side who is able to make decisions about requirements and user needs, prioritize them, evaluate them, and verify them. It can be someone from the business side, but this job is very time consuming, so more often the product owner will be the person on your BI team who knows the most about the business. Without such a person, your ability to make quick decisions will be very limited. Keeping a burndown list is a very simple practice that forces you to clearly define priorities and to select the features and tasks with the highest priority for the next release. And because your releases tend to be more frequent with agile, you can always pick only a very limited number of tasks, which makes clear priorities vital.

The second group of critical practices is about automation. If your iterations are short, if you integrate the work of all team members on a daily basis and also want to test it to detect errors and correct them as early as possible, and if you need to deliver often, you will find yourself and your team in a big hurry without enough time to handle everything manually. So automation is your best friend. Your goal is to analyze everything you do and replace all manual, time-consuming activities with automated alternatives.

What Tools To Use?

Typical tools you can use include:

1. Modern Version Control Systems

A typical setup involves Git, SVN, or Team Foundation Server storing all the pieces of your code, tracking versions and changes, merging different branches of code, etc. What you are not allowed to do is use shared file systems for that. Unfortunately, it is still quite a common practice among BI folks. Also, be careful about using BI tools that do not support easy, standard versioning. Do not forget that even if you draw pictures, models, or workflows and do not write any SQL, you are still coding.

So a good BI tool stores every piece of information in text-based files – for example, XML. That means you can make them part of a code base managed by Git, for example. A bad BI tool stores everything in binary, proprietary files, which can’t be managed effectively by any versioning system. Some tools support a kind of internal versioning, but those are still a big pain for you as a developer, and they lead to fragmented version control.

2. Continuous Integration Tools

You’ll also need tools like Maven and Jenkins, or PowerShell and TeamCity, to rapidly and automatically build and deploy your BI packages.

3. Tools for Automated Code Analysis and Testing

I recommend using frameworks like DbFit, at least to write automated functional tests, and also using a tool for static code analysis to enforce your company standards, best practices, and code conventions (Manta Checker is really good at that). And do not forget – you can’t refactor your code very often without proper testing automation.

4. Smart Documentation Tools

In the end, you can’t work on the parts of your system you do not understand. The best combination of tools you can get is something like a wiki to capture basic design ideas and a smart documentation tool able to generate detailed documentation in an automated way whenever needed. Today there are many very good IDEs able to generate mainly control-flow and dependency diagrams. But we are BI guys, and there is one thing that is extremely useful for us – it is called data lineage, or you can call it data flow.

Simply put, it’s a diagram showing you how data flows and is transformed in your DWH. You need data lineage to perform impact analyses and what-if analyses as well as to refactor your code and existing data structures. There are almost no solutions on the market which are able to show you data lineage from your custom code (except our Manta Flow, of course).

And that’s it. Of course, there are other, more advanced practices to support your agility, but this basic stuff is, I believe, something that can be implemented quickly from the perspective of both processes and tools. I definitely suggest starting with a smaller, more experienced team, implementing the most important practices, playing around a little bit, and measuring the results of different approaches. I guarantee that you and your team will experience significant improvements in speed and flexibility very soon.

Do you have any questions or comments? Send them directly to Tomas Kratky at manta@mantatools.com! Also, do not forget to follow us on Twitter and LinkedIn.

How to Handle Impact Analyses in Complex DWHs with Predicates

“How to get full data lineage in complex BI environments and perform reliable impact analyses?” Predicates (with the help of Manta Flow!) might be the answer. 

During our pilots and deployments, we often find data warehouse environments that use very general physical models, including several big tables like PARTY, BALANCE, ORDER, and others. These tables contain data obtained from various source systems, and there are a lot of data marts and reports built on top of them. These tables make things difficult during impact analysis because the data lineage of almost every report goes through them to all the sources, making the result worthless.

Impact Analyses Do Not Have to Be THIS BIG 

Let’s take a look at an example to understand exactly what happens. The table PARTY contains all the individuals and companies that are somehow related to the organization. Thus, in one table, it is possible to have records for clients, employees, suppliers, and the organization’s branch network. Each type of entity is identified by a unique attribute or by the source system from which the data is obtained – for example, clients are managed in a different system than employees.
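
As a purely illustrative sketch (the column names are hypothetical, not from any customer model), such a table might look like this, with source_system_id recording where each row originated:

CREATE TABLE PARTY (
  party_id         INTEGER NOT NULL,
  party_name       VARCHAR(200),
  party_type       VARCHAR(20),       -- client / employee / supplier / branch
  source_system_id INTEGER NOT NULL   -- e.g. 10 = client system, 20 = HR system
);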

Now, let’s assume we have two reports based on data from the PARTY table – a report EMPL_REPORT that displays information about employees and another report BRANCH_REPORT that displays information about the branch network. If we use the standard data lineage analysis, we can get this picture:

[Figure: standard data lineage – both reports trace through PARTY to all four source tables]

Although only data from the EMPLOYEE source table is relevant for the report EMPL_REPORT, the impact analysis from that report also includes the CLIENT, BRANCH, and SUPPLIER source tables due to the PARTY table. The problem is the same for the report BRANCH_REPORT. From the other direction, the impact analysis from the EMPLOYEE source table includes both EMPL_REPORT and BRANCH_REPORT, which is confusing.

In a real environment, there are dozens of source systems and hundreds of reports, which makes the standard data lineage analysis worthless.

The Advanced Data Lineage Analysis 

Fortunately, there is a solution. When data is inserted into the PARTY table from different source systems, there is often a column like PARTY.source_system_id where the identification of the source system is stored as a constant. Similarly, when a report consumes data only from specific source systems, there is a condition in the statement filtering data based on the PARTY.source_system_id column. Thus, it is possible to automatically analyze both the insertion into and the selection from the PARTY table and create predicates such as PARTY.source_system_id = 20 that are then stored together with the data lineage in the metadata repository. Therefore, it is possible to include them in the computation during the impact analysis.
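
A minimal sketch of both sides of this pattern, reusing the hypothetical PARTY columns from above:

-- Loading side: the HR system inserts its rows with a constant, so the
-- flow EMPLOYEE -> PARTY carries the predicate PARTY.source_system_id = 20.
INSERT INTO PARTY (party_id, party_name, party_type, source_system_id)
SELECT emp_id, emp_name, 'employee', 20
FROM EMPLOYEE;

-- Consuming side: the report reads only matching rows, so the flow
-- PARTY -> EMPL_REPORT carries the same predicate.
INSERT INTO EMPL_REPORT (party_id, party_name)
SELECT party_id, party_name
FROM PARTY
WHERE source_system_id = 20;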

Thanks to that, if we perform an impact analysis from the report EMPL_REPORT, the predicate PARTY.source_system_id = 20 is gathered before the table PARTY. When the analysis continues towards the source tables, the predicate for each path is selected and compared to what has already been gathered. Therefore, when the path to the source table CLIENT with the predicate PARTY.source_system_id = 10 is tested, the result is that both predicates cannot hold at once, so data for this report cannot come from this source table. Conversely, when the path to the source table EMPLOYEE with the predicate PARTY.source_system_id = 20 is tested, the result is that data for this report can come from this source table, so it is included in the result of the impact analysis. We get similar results if we perform an impact analysis for BRANCH_REPORT, and also from sources like the EMPLOYEE table.

The result of the advanced data lineage analysis can look like this (in reality, if we perform the impact analysis from the EMPL_REPORT, we will only see the EMPLOYEE and PARTY tables):

[Figure: advanced data lineage – EMPL_REPORT traces only through PARTY to EMPLOYEE]

Surely, the situation can be far more complex. For example, data from the PARTY table can first be pre-computed for several source systems, and then several reports can be created on top of it, each for only a specific source system, like in this picture:

[Figure: a pre-computed intermediate table feeding several reports, each limited to a specific source system]
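
Sticking with our illustrative names, the pre-computation might look like this: an intermediate table (here a hypothetical PARTY_SUMMARY) keeps data for several source systems, and each report narrows it down further, so the predicates along a path are simply combined:

-- Pre-compute data for two source systems at once; this flow carries
-- the predicate PARTY.source_system_id IN (10, 20).
INSERT INTO PARTY_SUMMARY (party_id, party_name, source_system_id)
SELECT party_id, party_name, source_system_id
FROM PARTY
WHERE source_system_id IN (10, 20);

-- Each report then narrows the summary to a single source system,
-- adding its own predicate on top of the inherited one.
INSERT INTO EMPL_REPORT (party_id, party_name)
SELECT party_id, party_name
FROM PARTY_SUMMARY
WHERE source_system_id = 20;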

This is also something that can be handled and, as you may have expected, even this is part of Manta Flow’s analysis.

If you have any questions or comments, feel free to contact Lukas at manta@mantatools.com. You can try these predicate-based impact analyses in our free trial – just request it using the form on the right. 

 

How To Inspect Raw Data Lineage With Manta Flow

Risk departments have a lot of complex SQL queries in their data warehouses and data marts. But sometimes it’s really difficult to find the right level of detail. Manta Flow can help.

“When we present Manta Flow to potential customers, most of them are happy that we can reduce very complex SQL statements to a few simple rectangles connected by arrows,” explains Lukas Hermann, our Director of Engineering. “They need to be able to quickly understand what source tables their SQL queries read, what target tables they fill, what columns are involved in computing a particular column, and how.”

The Usual

For example, let’s look at just two ordinary insert statements moving data from a stage to a datamart and to a report:

[Figure: the raw SQL code of the two insert statements]
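
The original screenshot is not reproduced here, but in spirit the statements look something like this (a heavily simplified sketch with hypothetical names; the real statements are far longer, with subselects, CASE expressions, and aggregations):

-- Stage to data mart.
INSERT INTO dm_customer (customer_id, total_amount)
SELECT s.customer_id, SUM(s.amount)
FROM stage_transactions s
GROUP BY s.customer_id;

-- Data mart to report.
INSERT INTO rep_top_customers (customer_id, total_amount)
SELECT customer_id, total_amount
FROM dm_customer
WHERE total_amount > 10000;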

It could take you quite a while to analyze which columns are involved in the computation. But with Manta Flow it is really easy to see, including all the statements involved:

[Figure: Manta Flow’s simplified visualization of the data flows]

This is perfectly sufficient for most business analysts in data warehouse environments. All the unnecessary details, like exactly how data is computed, filtered, aggregated, or ordered, are hidden. And if you want to go deeper, Manta Flow can easily show the SQL code of the statements, where you can find the full detail.

The Raw

However, some analysts (particularly from the aforementioned risk departments) say that their SQL statements are really huge, including many subselects, complex expressions, etc., so the jump between the clear picture and the SQL code is too big. Therefore, they would like to see all the computation steps in a similarly simplified format, and they ask if Manta Flow can handle it – if it has all the information necessary to show it.

The answer is that Manta Flow has the most detailed information possible about each part of the statement, but so as not to disturb you with what are, in most cases, useless details, it filters the information down to the most useful level of detail. If you want to see everything, including expressions, conditions, aggregations, etc., it is possible to configure the filtering or turn it off completely. Manta Flow is able to show you the unfiltered information while still keeping you oriented within your own systems.

[Figure: the unfiltered visualization showing every computation step]

See? It’s possible to show the SQL code for each part of the statement in its precise position.

If you’d like to try something like that yourself, just let us know in the form on the right. Also, do not forget to follow us on Twitter.

Manta Flow Makes Metadata Shine

Customer: a major bank
Problem: A complex BI environment with thousands of BTEQ scripts, 10 000+ views, and hundreds of ETL transformations. Management had a very progressive metadata strategy but lacked the appropriate tools.
Solution: Manta Flow was applied to Teradata, analyzing all tables, views and scripts, and providing missing metadata to Informatica Metadata Manager.
Result: A tightly controlled BI environment with complete data flow documentation, comprehensive impact analyses, and unparalleled visibility of metadata.

Manta Flow Makes Impact Analyses Doable

Customer: a major bank
Problem: The customer has 50 000+ lines of Oracle SQL and PL/SQL code in one of their environments. Impact analyses could not be completed in time and contained many gaps and errors. It was impossible to fully document the workings of the customer’s Oracle platform.
Solution: Manta Flow has been implemented for regular and one-off analyses of the entire code base. Data flows are analyzed and visualized for each change.
Result: The customer now has a complete data flow map of the environment. Impact analyses are quick to perform and reliable. The visualization produced by Manta Flow is used to train new employees and contractors.

How Manta Flow Helped Informatica Metadata Manager With Custom Code

IMM is one of our prime areas of integration. Let’s take a look at one use case that shows how we make things happen.

Many of our customers solve their data lineage issues with Informatica Metadata Manager. In one of our most recent cases, the customer had a special request – the ability to see data flows passing through SQL Server Reporting Services. IMM can connect to SSRS easily, but it is not designed to fully analyze complex SQL overrides and other SQL code in databases like Teradata, SQL Server, or Oracle (read more about custom code in BI here). That’s the specialty of Manta Flow. Let’s take a look at how it works inside IMM with Manta Flow connected to it:

[Figure: data flows in IMM with the gaps filled by Manta Flow]

At this point, Manta Flow came to the rescue! In this implementation, our solution filled the gaps in the data flows and helped the customer get the full picture. The customer’s DWH specialist was then able to perform an impact analysis in a fraction of the original time. The same solution is also available for SQL Server Analysis Services and IBM Cognos, by the way.

Sounds cool, huh? Try Manta Flow using our online demo and drop us a line at manta@getmanta.com or by using the form on the right.

Manta Flow + SAP PowerDesigner: A New Metamodel Mapping Platform

In a previous article, Lukas Hermann explained how PowerDesigner and Manta Flow can work together. Now, our Senior Developer Jiri Tousek will explain all the technical details. 

Basically, we’ve created a general procedure for importing PowerDesigner’s metadata to Manta Flow for various purposes, e.g. advanced impact analyses. It is a simple, two-step process:

Step 1: Get Metadata from PowerDesigner

First, we need to export/access the PDM file and extract its metadata as an XML file (our tool also supports batch processing of multiple files). The metadata is easily mapped to physical database objects as long as the physical name (the “code” field) is filled in correctly and the correct model object type is used. PowerDesigner’s own API is responsible for two-way compatibility between different versions of PD; we use the API because PDM file formats have changed a lot between PD versions. Each model is extracted with both standard and extended attributes in all packages, tables, views, and columns.

Step 2: Mapping Platform

Our mapping platform supports metadata in any general XML file. There’s only one requirement: the XML has to contain the physical data model names and the object types of the database objects being mapped. Our configuration is quite flexible, though. It’s XPath-based and can support pretty much any metadata scheme. It goes without saying that you can choose which metadata to extract from the input, and we also support translation of metadata attribute names (e.g., from technical identifiers to human-readable attribute labels).

Benefits

So, this is a short summary of how we export metadata from PowerDesigner and import it to Manta Flow. And how can you benefit from this in your daily work? Well, there are a bunch of ways, but this triplet comes to mind right now:

1) In Manta Flow, you can see data flows together with those attributes.

2) Pretty much everything written in this article.

3) You can easily create an impact analysis regarding those attributes. This is probably the most interesting one, so let’s illustrate it with a use case – a security impact analysis:

Imagine that you need to investigate how your sensitive data propagates through your system. And obviously, you need to compare the real security status with the rules, guidelines, and procedures of your organization. That’s not something you can just wave off, and this solution will make the work a lot easier. In order to perform this analysis, you need to define sensitivity levels for (some of) your database objects. By importing them from PowerDesigner, you can avoid manually setting up administration rights for sensitivity levels. Using the policies you already have in place for PD will save you time, as you avoid bureaucratic hassles.

Jiri Tousek would love to hear your feedback on this – let him know what you think at manta@mantatools.com and follow us on Twitter.
