Use Cases & Case Studies

How Manta Flow Now Works with IBM Information Governance Catalog

November 22, 2016

We are expanding our support for 3rd party metadata managers – to help our customers get the most out of their existing data governance solutions.

Our key product, Manta Flow, already complements Informatica Metadata Manager very well. With the addition of IBM InfoSphere Information Governance Catalog, we are able to deliver the same level of highly specialized code crunching to folks who use IBM’s tools as well. And how does that work? Well, it’s simple:

1. Manta Flow crunches programming code from our supported technologies (Teradata, Oracle, Microsoft SQL Server, and others).

2. After connecting to IGC, Manta Flow will create a new metamodel and perfectly integrate with the existing structure within IGC.

3. The customer can browse IGC as he or she is used to – it’s just going to have way more accurate data lineage ready to use.

Seamless integration into IGC is the key to success. We’ve created a short video to explain a little bit more how Manta Flow is integrated into Information Governance Catalog. And what’s inside?

1. A Brief Explanation of a New Metamodel in IGC: 0:10
2. How It Works with Queries: 1:40
3. Integration of Data Lineage Visualization: 2:13

A friendly suggestion: run the video in fullscreen.

And if you are not ready for IGC, stay tuned – we will soon present our newest video about our oldest love, IMM. In the meantime, read the introductory article right here.

Any thoughts? Comments? Or do you simply want to try it out for yourself? Just let us know at manta@mantatools.com or use the form on the right.

Agile BI Development in 2016: Where Are We?

Agile development was meant to be the cure for everything. It’s 2016 and Tomas Kratky asks the question: where are we?

BI departments everywhere are under pressure to deliver high quality results and deliver them fast. At the same time, the typical BI environment is becoming more and more complex. Today we use many new technologies, not just standard relational databases with SQL interfaces, but for example NoSQL databases, Hadoop, and also languages like Python or Java for data manipulation.

Another issue we have is a false perception of the work that needs to be done when a business user requests some data. Most business users think that preparing the data is only a tiny part of the work and that the majority of the work is about analyzing the data and later communicating the results. Actually, it’s more like this:

[Figure: breakdown of BI effort – data preparation makes up most of the work, while analysis and communication are only a small part at the top]

See? The reality is completely different. The communication and analysis of data is that tiny part at the top and the majority of the work is about data preparation. Being a BI guy is simply a tough job these days.

This whole situation has led to an ugly result – businesses are not happy with their data warehouses. We all have probably heard a lot of complaints about DWHs being costly, slow, rigid, or inflexible. But the reality is that DWHs are large critical systems, and there are many, many different stakeholders and requirements which change from day to day. In another similar field, application software development, we had the same issues with delivery, and in those cases, agile processes were good solutions. So our goal is to be inspired and learn how agile can be used in BI.

The Answer: Agile?

One very important note – agility is a really broad term, and today I am only going to speak about agile software development, which means two things from the perspective of a BI development team:

1. How to deliver new features and meet new requirements much faster

2. How to quickly change the direction of development

Could the right answer be agile development? It might be. Everything written in the Agile Manifesto makes sense, but what’s missing are implementation guidelines. And so this Manifesto was, a little bit later, enriched with so-called agile principles. As agile became very popular, we started to believe that agile was a cure for everything. This is a survey from 2009 which clearly demonstrates how popular agile was:

[Figure: 2009 survey of agile methodology adoption among development teams]

Source: Forrester/Dr. Dobb’s Global Developer Technographic, 2009

And it also shows a few of the many existing agile methodologies. According to some surveys from 2015, agile is currently being used by more than 80% or even 90% of development teams.

Semantic Gap

Later on, we realized that agile is not an ultimate cure. Tom Gilb, in his famous article “Value-Driven Development Principles and Values” written in 2010, went a bit deeper. After he conducted a thorough study of the failures, mistakes, and also successes since the very beginning of the software industry, one thing became clear – there is something called a semantic gap between business users and engineers, and this gap causes a lot of trouble. Tom Gilb hit the nail on the head by saying one important thing: “Rapidly iterating in wrong directions is not progress.” Therefore, the requirements need to be treated very carefully as well.

But even with the semantic gap issue, agile can still be very useful. Over the last ten years, the agile community has come up with several agile practices. They are simple-to-explain things that anyone can start doing to improve their software processes. And this is something you should definitely pay attention to. Here you can see agile practices sorted by popularity:

[Figure: agile practices sorted by popularity]

If you have ever heard about agile, these probably come as no surprise to you. The typical mistake made by many early adopters of agile was simply being too rigid; I would call it “fanatical”. It was all or nothing. But things do not work that way.

It’s Your Fault If You Fail

Each and every practice should be considered a recommendation, not a rule. It is your responsibility to decide whether it works for you or not. Each company and each team are different, and if the system metaphor practice has no value for your team, just ignore it like we do. Are you unable to get constant feedback from business users? OK, then. Just do your best to get as much feedback as you need.

On the other hand, we’ve been doing agile for a long time, and we’ve learned that some practices (marked in red) are more important than others and significantly influence our ability to be really fast and flexible.

[Figure: agile practices with the most critical ones marked in red]

There are basically two groups of practices. The first group is about responsibility. A product owner is someone on your side who is able to make decisions about requirements and user needs, prioritize them, evaluate them, and verify them. It can be someone from the business side, but this job is very time-consuming, so more often the product owner will be the person on your BI team who knows the most about the business. Without such a person, your ability to make quick decisions will be very limited. Maintaining a burndown list is a very simple practice which forces you to clearly define priorities and to select the features and tasks with the highest priority for the next release. And because your releases tend to be more frequent with agile, you can always pick only a very limited number of tasks, which makes clear priorities vital.

The second group of critical practices is about automation. If your iterations are short, if you integrate the work of all team members on a daily basis and also want to test it to detect errors and correct them as early as possible, and if you need to deliver often, you will find yourself and your team in a big hurry without enough time to handle everything manually. So automation is your best friend. Your goal is to analyze everything you do and replace all manual, time-consuming activities with automated alternatives.

What Tools To Use?

Typical tools you can use include:

1. Modern Version Control Systems

A typical use case involves Git, SVN, or Team Foundation Server storing all the pieces of your code, tracking versions and changes, merging different branches of code, etc. What you are not allowed to do is use shared file systems for that. Unfortunately, it is still quite a common practice among BI folks. Also, be careful about using BI tools which do not support easy, standard versioning. Do not forget that even if you draw pictures, models, or workflows and do not write any SQL, you are still coding.

So a good BI tool stores every piece of information in text-based files – for example, XML. That means you can make them part of a code base managed by, for example, Git. A bad BI tool stores everything in binary, proprietary files, which can’t be managed effectively by any versioning system. Some tools support a kind of internal versioning, but those are still a big pain for you as a developer, and they lead to fragmented version control.

2. Continuous Integration Tools

You’ll also need tools like Maven and Jenkins or PowerShell and TeamCity to do rapid and automated build and deploy of your BI packages.

3. Tools for Automated Code Analysis and Testing

I recommend using frameworks like DbFit at least to write automated functional tests, and also using a tool for static code analysis to enforce your company standards, best practices, and code conventions (Manta Checker is really good at that). And do not forget – you can’t refactor your code very often without proper testing automation.
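To make the testing idea more concrete, here is a minimal sketch of the kind of check an automated functional test can run after every load. The table and column names are purely illustrative, not taken from any real project:

    -- Illustrative functional test: fail the build if any customer loaded
    -- into the stage today is missing from the target table.
    SELECT s.customer_id
    FROM stage_customer s
    LEFT JOIN dwh_customer t ON t.customer_id = s.customer_id
    WHERE s.load_date = CURRENT_DATE
      AND t.customer_id IS NULL;

A continuous integration tool can execute queries like this after each deployment and mark the build as broken whenever they return rows.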

4. Smart Documentation Tools

In the end, you can’t work on the parts of your system you do not understand. The best combination of tools you can get is something like a wiki to capture basic design ideas and a smart documentation tool able to generate detailed documentation in an automated way whenever needed. Today there are many very good IDEs that are able to generate mainly control-flow and dependency diagrams. But we are BI guys, and there is one thing that is extremely useful for us – it is called data lineage, or you can call it data flow.

Simply put, it’s a diagram showing you how data flows and is transformed in your DWH. You need data lineage to perform impact analyses and what-if analyses as well as to refactor your code and existing data structures. There are almost no solutions on the market which are able to show you data lineage from your custom code (except our Manta Flow, of course).

And that’s it. Of course, there are some other, more advanced practices to support your agility, but this basic stuff is, I believe, something which can be implemented quickly from the perspective of both processes and tools. I definitely suggest starting with a smaller, more experienced team, implementing the most important practices, playing around a little bit, and measuring the results of different approaches. I guarantee that you and your team will experience significant improvements in speed and flexibility very soon.

Do you have any questions or comments? Send them directly to Tomas Kratky at manta@mantatools.com! Also, do not forget to follow us on Twitter and LinkedIn.

How to Handle Impact Analyses in Complex DWHs with Predicates

“How to get full data lineage in complex BI environments and perform reliable impact analyses?” Predicates (with the help of Manta Flow!) might be the answer. 

During our pilots and deployments, we often find data warehouse environments that use very general physical models, including several big tables like PARTY, BALANCE, ORDER, and others. These tables contain data obtained from various source systems, and there are a lot of data marts and reports built on top of them. These tables make things difficult during impact analysis because data lineage from almost every report goes through them to all the sources, making the result worthless.

Impact Analyses Do Not Have to Be THIS BIG 

Let’s take a look at an example to understand exactly what happens. The table PARTY contains all individuals and companies that are somehow related to the organization. Thus, in one table, it is possible to have records for clients, employees, suppliers, and the organization’s own branch network. Each type of entity is identified by a unique attribute or by the source system from which the data is obtained – for example, clients are managed in a different system than employees.

Now, let’s assume we have two reports based on data from the PARTY table – a report EMPL_REPORT that displays information about employees and another report BRANCH_REPORT that displays information about the branch network. If we use the standard data lineage analysis, we can get this picture:

[Figure: standard data lineage – both reports linked through PARTY to all four source tables]

Although only data from the EMPLOYEE source table is relevant for the report EMPL_REPORT, the impact analysis from that report also includes the CLIENT, BRANCH and SUPPLIER source tables due to the PARTY table. The problem is the same for the report BRANCH_REPORT. From the other side, the impact analysis from the EMPLOYEE source table includes both the EMPL_REPORT and BRANCH_REPORT which is confusing.
In a real environment, there are dozens of source systems and hundreds of reports, which makes the standard data lineage analysis worthless.

The Advanced Data Lineage Analysis 

Fortunately, there is a solution. When data is inserted into the PARTY table from different source systems, there is often a column like PARTY.source_system_id where the identification of the source system is stored as a constant. Similarly, when a report is created that consumes data only from specific source systems, there is a condition in the statement filtering data based on the PARTY.source_system_id column. Thus, it is possible to automatically analyze both the insertion and selection to/from the PARTY table and create predicates such as PARTY.source_system_id = 20 that are then stored together with data lineage in the metadata repository. Therefore, it is possible to include them in the computation during the impact analysis.
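As a sketch of where these predicates come from (the column names other than source_system_id are invented for illustration; only the tables and the source system values come from the example above), the load and the report query might look like this:

    -- Loading employees into the general PARTY table; the constant 20 identifies
    -- the source system, so the derived predicate is PARTY.source_system_id = 20.
    INSERT INTO PARTY (party_id, party_name, source_system_id)
    SELECT emp_id, emp_name, 20
    FROM EMPLOYEE;

    -- The employee report's query (simplified here as an INSERT into a report
    -- table) filters on the same column, so its derived predicate is also
    -- PARTY.source_system_id = 20. During impact analysis only the path from
    -- EMPLOYEE satisfies it, while a path from CLIENT (source_system_id = 10)
    -- is contradictory and gets excluded.
    INSERT INTO EMPL_REPORT (party_id, party_name)
    SELECT party_id, party_name
    FROM PARTY
    WHERE source_system_id = 20;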

Thanks to that, if we perform an impact analysis from the report EMPL_REPORT, the predicate PARTY.source_system_id = 20 is gathered before the table PARTY. When the analysis continues towards the source tables, the predicate for each path is selected and compared to what has already been gathered. Therefore, when the path to the source table CLIENT with the predicate PARTY.source_system_id = 10 is tested, the result is that both predicates cannot hold at once, so data for this report cannot come from this source table. Conversely, when the path to the source table EMPLOYEE with the predicate PARTY.source_system_id = 20 is tested, the result is that data for this report can come from this source table, so it is included in the result of the impact analysis. We can get similar results if we perform an impact analysis for the BRANCH_REPORT and also from sources like the EMPLOYEE table.

The result of the advanced data lineage analysis can look like this (in reality, if we perform the impact analysis from the EMPL_REPORT, we will only see the EMPLOYEE and PARTY tables):

[Figure: advanced data lineage with predicates – each report linked only to its relevant source tables]

Surely, the situation can be far more complex. For example, the data from the PARTY table can be pre-computed for more source systems first, and then several reports can be created on top of them for only a specific source system, like in this picture:

[Figure: data from PARTY pre-computed for several source systems, with reports built on top for specific source systems only]

This is also something that can be handled and, as you may have expected, it is covered by Manta Flow’s analysis as well.

If you have any questions or comments, feel free to contact Lukas at manta@mantatools.com. You can try these predicate-based impact analyses in our free trial – just request it using the form on the right. 

 

How To Inspect Raw Data Lineage With Manta Flow

Risk departments have a lot of complex SQL queries in their data warehouses and data marts. But sometimes it’s really difficult to find the right level of detail. Manta Flow can help.

“When we present Manta Flow to potential customers, most of them are happy that we can reduce very complex SQL statements to a few simple rectangles connected by arrows”, explains Lukas Hermann, our Director of Engineering. “They need to be able to quickly understand what source tables their SQL queries read, what target tables they fill, what columns are involved in computing a particular column and how.”

The Usual

For example, let’s look at just two ordinary insert statements moving data from a stage to a datamart and to a report:

[Figure: the raw SQL code of the two insert statements]
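The original code is shown only as an image, but a hypothetical pair of statements of this kind (all table and column names invented for illustration) could look like this:

    -- Stage to data mart: aggregate order amounts per customer and region.
    INSERT INTO dm_sales (customer_id, region, total_amount)
    SELECT c.customer_id, c.region, SUM(o.amount)
    FROM stage_customer c
    JOIN stage_order o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region;

    -- Data mart to report: roll the totals up to the region level.
    INSERT INTO rpt_sales_by_region (region, total_amount)
    SELECT region, SUM(total_amount)
    FROM dm_sales
    GROUP BY region;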

It could take you quite a while to analyze which columns are involved in the computation. But with Manta Flow it is really easy to see, including all the statements involved:

[Figure: simplified data lineage of the two statements as visualized in Manta Flow]

This is perfectly sufficient for most business analysts in data warehouse environments. All the unnecessary details – exactly how the data is computed, filtered, aggregated, or ordered – are hidden. And if you want to go deeper, Manta Flow can easily show the SQL code of the statements, where you will find the full detail.

The Raw

However, some analysts (particularly from the aforementioned risk departments) say that their SQL statements are really huge, including many subselects, complex expressions, etc., so the jump between the clear picture and the SQL code is too big. Therefore, they would like to see all the computation steps in a similarly simplified format, and they ask whether Manta Flow can handle that – whether it has all the information necessary to show it.

The answer is that Manta Flow has the most detailed information possible about each part of the statement, but so as not to disturb you with what are in most cases useless details, it filters the information to the best level of detail. If you want to see everything including expressions, conditions, aggregations, etc., it’s possible to configure or completely turn off the filtering. Manta Flow is able to show you unfiltered information, but still keep you in the loop and oriented within your own systems.

[Figure: unfiltered data lineage showing all computation steps, expressions, conditions, and aggregations]

See? Manta Flow can also show the SQL code at the precise position of each part of the statement displayed.

If you’d like to try something like that yourself, just let us know in the form on the right. Also, do not forget to follow us on Twitter.

Manta Flow Makes Metadata Shine

Customer: a major bank
Problem: A complex BI environment with thousands of BTEQ scripts, 10 000+ views, and hundreds of ETL transformations. Management with a very progressive metadata strategy, but lacking appropriate tools.
Solution: Manta Flow was applied to Teradata, analyzing all tables, views and scripts, and providing missing metadata to Informatica Metadata Manager.
Result: A tightly controlled BI environment with complete data flow documentation, comprehensive impact analyses, and unparalleled visibility of metadata.

Manta Flow Makes Impact Analyses Doable

Customer: a major bank
Problem: The customer has 50,000+ lines of Oracle SQL and PL/SQL code in one of their environments. Impact analyses could not be completed in time and contained many gaps and errors. It was impossible to fully document the workings of the customer’s Oracle platform.
Solution: Manta Flow has been implemented for regular and one-off analyses of the entire code base. Data flows are analyzed and visualized for each change.
Result: The customer now has a complete data flow map of the environment. Impact analyses are quick to perform and reliable. The visualization produced by Manta Flow is used to train new employees and contractors.

How Manta Flow Helped Informatica Metadata Manager With Custom Code

IMM is one of our prime areas of integration. Let’s take a look at one use case that shows how we make things happen.

Many of our customers solve their data lineage issues with Informatica Metadata Manager. In one of our most recent cases, the customer had a special request – the ability to see data flows passing through SQL Server Reporting Services. IMM can connect to SSRS easily, but it is not designed to fully analyze complex SQL overrides and other SQL code in databases like Teradata, SQL Server, or Oracle (read more about custom code in BI here). That’s the speciality of Manta Flow. Let’s take a look at how it works inside IMM with Manta Flow connected to it:

[Figure: data flows inside IMM with Manta Flow connected]

At this point, Manta Flow came to the rescue! In this implementation, our solution filled the gaps in the data flows and helped the customer get a full picture. The customer’s DWH specialist was then able to perform an impact analysis in a fraction of the original time. The same solution is also available for SQL Server Analysis Services and IBM Cognos, by the way.

Sounds cool, huh? Try Manta Flow using our online demo and drop us a line at manta@getmanta.com or by using the form on the right.

Manta Flow + SAP PowerDesigner: A New Metamodel Mapping Platform

In a previous article, Lukas Hermann explained how PowerDesigner and Manta Flow can work together. Now, our Senior Developer Jiri Tousek will explain all the technical details. 

Basically, we’ve created a general procedure for importing PowerDesigner’s metadata to Manta Flow for various purposes, e.g. advanced impact analyses. It is a simple, two-step process:

Step 1: Get Metadata from PowerDesigner

First, we need to export/access the PDM file and extract its metadata as an XML file (our tool also supports batch processing of multiple files). The metadata is easily mapped to physical database objects as long as the physical name (the “code” field) is filled in correctly and the correct model object type is used. PowerDesigner’s own API takes care of two-way compatibility between different versions of PD. We use the API because PDM file formats have changed a lot between PD versions. Each model is extracted with both standard and extended attributes in all packages, tables, views, and columns.

Step 2: Mapping Platform

Our mapping platform supports metadata in any general XML file. There’s only one requirement: the XML has to contain physical data model names and the object types of the database objects being mapped. Our configuration is quite flexible though. It’s XPath-based and can support pretty much any metadata scheme. It goes without saying that you can choose what metadata to extract from the input and that we also support translation of the metadata attribute names (e.g. from technical identifiers to human-readable attribute labels).

Benefits

So, this is a short summary of how we export metadata from PowerDesigner and import it to Manta Flow. And how can you benefit from this in your daily work? Well, there are a bunch of ways, but this triplet comes to mind right now:

1) In Manta Flow, you can see data flows together with their attributes.

2) Pretty much everything written in this article.

3) You can easily create an impact analysis regarding those attributes. This is probably the most interesting one, so let’s illustrate it with a use case – security impact analysis:

Imagine that you need to investigate how your sensitive data propagates through your system. And obviously, you need to compare the real security status with the rules, guidelines, and procedures for your organization. That’s not something you can just wave off, and this solution will make your work doing this a lot easier. In order to perform this analysis, you need to define sensitivity levels for (some of) your database objects. By importing those from PowerDesigner, you can avoid manually setting up administration rights for sensitivity levels. Using the policies you already have in place for PD will save you time as you avoid bureaucratic hassles.

Jiri Tousek would love to hear your feedback on this – let him know what you think at manta@mantatools.com and follow us on Twitter.

MANTA and Informatica Metadata Manager: Best Friends Forever?

December 4, 2014

Have you heard about Informatica Metadata Manager? Of course you have. Let’s see how it works with Manta Tools.

Informatica Metadata Manager and Manta Flow (our tool for data lineage analysis) are basically complementary solutions with slightly different focuses. One cannot replace the other, and they work best together with IMM as a platform that Manta Flow complements. IMM is a rich metadata management platform with many important features:

  • business glossary
  • rich library of metadata connectors
  • data lineage visualization
  • different role-appropriate metadata views
  • advanced search and browse of metadata catalog
  • impact analysis
  • integration with Informatica Data Quality

And how does Manta Flow complement IMM? In which areas can a system based on Informatica’s solution benefit from the connection?

Manta Flow brings additional value to IMM in several areas. Manta Flow connects to Informatica Metadata Manager, taking advantage of the IMM plugin API, and enriches the metadata model of IMM with data lineage from the following:

  • database* scripts
  • database* stored procedures
  • SQL overrides in Informatica PowerCenter
  • SQL overrides in reporting**
  • Informatica PowerCenter indirect files (lists of flat files)
  • Informatica PowerCenter parameter files, automatically (IMM parses parameter files but needs to be manually configured in order to load them or refresh after changes)

* Currently Oracle and Teradata are supported, MS SQL support is scheduled for 2015, and further platforms are planned. While IMM has a rudimentary capability to parse SQL, in real life situations (as witnessed in production with our customers) we find it is still not sufficient for impact analyses or documenting target lineage.

** Currently IBM Cognos is supported and we are open to suggestions for other platforms.

 

[Figure: Visualization of an IFPC workflow in the Manta Flow online demo.]

 

Manta Flow also analyzes indirect data flows – conditional dependencies like filters in IFPC or WHERE clauses in SQL. Although indirect data flows cannot currently be visualized in IMM, Manta Flow provides additional tabular output (CSV by default) to be used in impact analyses and lineage discovery.
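A small illustrative example of such an indirect dependency in SQL (hypothetical table and column names):

    -- No value from order_status is copied into the target, but the WHERE clause
    -- decides which rows end up in daily_revenue, so order_status would be
    -- reported as an indirect (conditional) dependency of the target columns.
    INSERT INTO daily_revenue (order_date, revenue)
    SELECT order_date, SUM(amount)
    FROM orders
    WHERE order_status = 'COMPLETED'
    GROUP BY order_date;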

And Manta Flow does not only analyze indirect data flows – it analyzes them with much more than lineage in mind. What Manta Flow does is a complete semantic analysis of the platform code (the platform being a database, ETL tool, or reporting tool), and each node in the Manta metadata model provides useful attributes for more advanced analyses. This is used by Manta Checker to detect certain hard-to-diagnose issues along the full length of the data flows.

Informatica Metadata Manager is a powerful solution, but it has its limitations, as does any other tool. We want to help you ensure that the data lineage you are working with is complete and uninterrupted, and that you have the right combination of tools to fit your particular environment. Manta Flow is our addition to this toolset, and our customers get great results with Manta – especially when combining it with Informatica Metadata Manager.

Do you want to know more about the connection between MANTA and Informatica Metadata Manager? Just let us know via email or the contact box on the right. Also, do not forget to follow us on Twitter.

Case Study: Manta Tools and Its Two Years with Vodafone

September 24, 2014

Manta Tools, developed by Profinit, has been saving human and financial resources at Vodafone CZ for the last two years.

Vodafone uses Manta in its data warehouse for analysis, optimization and control of data flows. From our tool pack, Manta Checker and Manta Flow were deployed.

A tool for quality assurance and coding standards

Manta Checker deals with old sins – parts of the code in the data warehouse (Teradata scripts and Informatica workflows) were supplied by a huge number of external developers over the years, at various levels of quality and in different styles. Further development of the system was highly complicated. Manta Checker verified every snippet of code even before it was inserted into the system and performed automatic repairs where possible. Automated code reviews helped improve quality, speed up development, and save a lot of resources. Otherwise, Vodafone would have had to spend money on long manual code reviews.

[Figure: code review process with Manta Checker]

A solution for impact analyses and data flow monitoring

Manta Flow visualizes metadata in the data warehouse. Vodafone’s new BI solution was implemented with its own metadata manager, but it was really difficult to effectively monitor data flows. Manta Flow was used as one of the components and provided complete control of data flows in the whole system. The analytic team now has a much simpler job when creating impact analyses for new parts of the system.

Manta Tools facilitates Vodafone’s BI management, simplifies quality assessment and methodology control, and saves time on manual code reviews.
