Tomáš Krátký

MANTA 2019: We wish health to all of you and your data as well!

December 31, 2018 by Tomas Kratky

Well, that’s a wrap! 2018 is behind us, so as you sit in your office chairs in the first few work days of 2019, read our last blog post about 2018 written by our CEO, Tomas Kratky.

For most people around the world, especially those in areas with Christian roots, the time around Christmas and New Year is somehow special. It starts with Christmas parties. There are so many of them everywhere that you can literally spend the whole first half of December going from one to the next. Later in December, no matter how busy our year is, we usually tend to slow down a bit and think about spending more time with family and friends. Even workaholics consider taking some time off. This magic is what I love so much about Christmastime.

The end of the year is also a good time to look back and thank the people who have supported us on our journeys. At Manta, we would like to thank, from the bottom of our hearts, all our customers, partners, and supporters. 2018 was an amazing year – we experienced triple-digit growth in customers, employees, and revenue. We also completed our first investment round last summer and welcomed two VCs on board. With their support, we are continuously improving our operations and internal processes to prepare ourselves for the coming years of growth.

I could fill pages with all the great things we achieved in 2018, but what would be the point in doing that? The most important thing is to say how proud we are to serve such great companies and how it is even better to see and enjoy the excitement that individual users and data professionals experience as a result of the lineage automation capabilities Manta brings them. In those moments, we feel like all the hard work and sleepless nights are worth it.

The end of the year is also a time flooded with different Top X predictions. Sometimes it is so boring to read again and again how technology X will disrupt the whole industry next year. This year, we are all talking about AI and machine learning. It is funny to see every technology out there suddenly “powered by AI & ML”, “bringing AI & ML to everyday life”, etc. But you know what – things are really changing, and the AI & ML buzzwords are getting more and more real. Which is great. But as with big data a couple of years back, we have to govern new initiatives properly and make them part of our data governance programs from day 1; otherwise, we will end up with a big mess (as we did with data swamps). With no trust in the data you use for AI and ML, how can you trust the outcomes and results of your intelligent algorithms? Remember – mess in, mess out!

The end of the year is also the time for New Year’s resolutions and wishes. At Manta, we have no special resolution – just to keep on doing what we have done every year since we started (aligned with our mission to provide our customers with fully automated and complete navigation through their entire data and application landscape): to push the boundaries of lineage automation and understand a bit more each time, always maintain a can-do attitude, and continue to become slightly better versions of ourselves.

And my wish for 2019? As my grandma always says: “I wish you good health, my boy, and you will take care of the rest!” And she is right. Health is critical, not only for people but for data too. With bad data, we very much limit our ability to use it to its full potential. But the same way we sometimes harm ourselves (drinking, smoking, no sleep, no fitness, etc.), we can also do a lot of harm to our data. So my wish for 2019 is the best possible health to all of you and your data as well!

Happy New Year! Šťastný Nový Rok! Frohes Neues Jahr! Feliz Año Nuevo!

Different Approaches To Data Lineage

September 10, 2018 by Tomas Kratky

I feel it is important to talk about different approaches to Data Lineage that are used by data governance vendors today. Because when you talk about metadata, you very often think about simple things – tables, columns, reports. But data lineage is more about logic.

It is more about programming code in any form. It can be an SQL script, a PL/SQL stored procedure, a Java program, or complex macros in your Excel sheet. It can literally be anything that allows you to somehow move your data from one place to another, transform it, or modify it. So, what are your options, and how can you understand that logic?

Option 1) Ignore it! (aka Data Similarity Lineage)

No, I am not crazy! There are products building lineage information without actually touching your code. They read metadata about tables, columns, reports, etc. They profile data in your tables too. And then they use all that information to create lineage based on similarities.

Tables and columns with similar names, or columns with very similar data values – those are examples of such similarities. And if you find a lot of them between two columns, you link the columns together in your data lineage diagram. And to make it even cooler, vendors usually call it AI (another buzzword I hate very much). There is one great thing about this approach – if you watch only the data and not the algorithms, you do not care about technologies, and it is no big deal if a customer uses Teradata, Oracle, or MongoDB with Java on top of it.
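
To make the idea concrete, here is a minimal sketch of similarity-based linking in Python. It is not any vendor’s actual algorithm, just the general principle of fuzzy name matching combined with value overlap, with all column names, weights, and thresholds invented for illustration.

```python
# A toy "data similarity lineage" guesser: link two columns when their
# names and sampled values look alike. All names/weights are made up.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Fuzzy ratio between two column names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(values_a: set, values_b: set) -> float:
    """Jaccard overlap between two sampled value sets."""
    if not values_a or not values_b:
        return 0.0
    return len(values_a & values_b) / len(values_a | values_b)

def guess_links(source_cols: dict, target_cols: dict, threshold: float = 0.75):
    """Each argument maps column name -> set of sampled values.
    Returns (source, target, score) triples; remember, it is only a guess."""
    links = []
    for src, vals_src in source_cols.items():
        for tgt, vals_tgt in target_cols.items():
            score = 0.5 * name_similarity(src, tgt) + 0.5 * value_overlap(vals_src, vals_tgt)
            if score >= threshold:
                links.append((src, tgt, round(score, 2)))
    return links

staging = {"cust_email": {"a@x.com", "b@y.com"}}
warehouse = {"customer_email": {"a@x.com", "b@y.com"}, "order_id": {"1", "2"}}
print(guess_links(staging, warehouse))  # [('cust_email', 'customer_email', 0.92)]
```

Notice that every link is inferred, never proven; two unrelated columns with overlapping values get linked just as happily, which is exactly the accuracy problem described below.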

But on the other hand, this approach is not very accurate, the performance impact can be significant (you work with data), and data privacy is at risk (you work with data). There are also a lot of details missing (like transformation logic, for example, which is very often requested by customers), and the lineage is limited to the database world, ignoring the application part of your environment.

Option 2) Do the “business” lineage manually

This approach usually starts from the top by mapping and documenting the knowledge in people’s heads. Talking to application owners, data stewards, and data integration specialists should give you fair but often contradictory information about data movements in your organization. And if you miss asking someone you simply don’t know about, a piece of the flow is missing! This often results in a dangerous situation where you have lineage but are unable to use it in real scenarios – not only can you not trust your data, you cannot trust the lineage either.

Option 3) Do the technical lineage manually

I will get straight to the point here – trying to analyze the technical flows manually is simply destined to fail. With the volume of code you have, its complexity, and the rate of change, there’s no way to keep up with it. When you start considering the complexity of the code, and especially the need to reverse engineer the existing code, this becomes extremely time consuming. Sooner or later, such manually managed lineage falls out of sync with the actual data transfers within the environment, and you end up with the feeling of having lineage that you cannot actually trust.

Now that we know that automation is key, let’s take a look at some less labor-intensive and error-prone approaches.

Option 4) Trace it! (aka Data Tagging Lineage)

Do you know the story of Theseus and the Minotaur? The Minotaur lives in a labyrinth, and so does Ariadne, who is in charge of the labyrinth. Ariadne gave Theseus a ball of thread to help him navigate the labyrinth by retracing his path. And this approach is a little bit similar.

The whole idea is that each piece of data that is being moved or transformed is tagged/labeled by a transformation engine, which then tracks that label the whole way from start to finish. It is like Theseus’ thread. This approach looks great, but it works well only as long as the transformation engine controls every movement of data. A good example is a controlled environment like Cloudera.
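
As a hedged illustration only (this is not how Cloudera or any specific engine implements it), the following Python toy shows the principle: every record carries a set of tags, and each transformation run by the engine propagates and extends them.

```python
# Toy "data tagging lineage": records carry tags; every transform executed
# by the engine merges the input tags into its outputs. Names are made up.
from dataclasses import dataclass, field

@dataclass
class Record:
    value: object
    tags: set = field(default_factory=set)  # every source/step seen so far

def run_transform(records, fn, step_name):
    """Apply fn inside the engine, so the tag trail is extended."""
    return [Record(fn(r.value), r.tags | {step_name}) for r in records]

raw = [Record(100, {"orders.amount"}), Record(250, {"orders.amount"})]
taxed = run_transform(raw, lambda v: v * 1.2, "add_vat")
for r in taxed:
    print(r.value, sorted(r.tags))
# 120.0 ['add_vat', 'orders.amount']
# 300.0 ['add_vat', 'orders.amount']
```

The catch shows immediately: if a computation happens outside run_transform(), no tag is ever attached, which is exactly the blind spot described next.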

But if anything happens outside the engine’s walls, the lineage is broken. It is also important to realize that lineage is there only if the transformation logic is actually executed. But think about all the exceptions or rules that apply only once every couple of years. You will not see them in your lineage till they are executed, which is not exactly healthy for your data governance – especially if some of those pieces are critical to your organization.

Option 5) Control it! (aka Self Lineage)

The whole idea here is that you have an all-in-one environment that gives you everything you need – you can define your logic there, track lineage, manage master data and metadata easily, etc. There are several tools like that, especially with the new big data / data lake hype. If you have a software product of this kind, everything happens under its control – every data movement, every change to the data. And so, it is easy for such a tool to track the lineage.
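
A tiny, hypothetical sketch (no real product’s API) of why lineage is trivial inside such a walled garden: every operation goes through the tool’s own interface, so recording a lineage edge is just a side effect.

```python
# Toy "self lineage": an all-in-one engine records a lineage edge for every
# dataset it derives. Class and dataset names are invented for illustration.
class AllInOneEngine:
    def __init__(self):
        self.datasets = {}
        self.lineage = []  # (source, target, operation) edges

    def load(self, name, rows):
        self.datasets[name] = rows

    def derive(self, source, target, fn, operation="transform"):
        """Build `target` from `source`; the lineage edge comes for free."""
        self.datasets[target] = [fn(row) for row in self.datasets[source]]
        self.lineage.append((source, target, operation))

engine = AllInOneEngine()
engine.load("stg_orders", [{"amount": 100}, {"amount": 250}])
engine.derive("stg_orders", "dwh_orders", lambda r: {"amount": r["amount"] * 1.2})
print(engine.lineage)  # [('stg_orders', 'dwh_orders', 'transform')]
# A dataset built by a script *outside* the engine never shows up here.
```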

But there is the very same issue as in the previous case with data tagging. Everything that happens outside the controlled environment is invisible. This matters especially for long-term manageability: over time, as new needs appear and new tools are acquired to address them, gaps in the lineage start to appear.

Option 6) Decode it! (aka Decoded Lineage)

Ok, so now we know that logic can be ignored, traced thanks to tags, and controlled. But all those approaches fall short in most real-life scenarios. Why? Simply because the world is complex, heterogeneous, wild, and most importantly – it is constantly evolving.

But there is still one other way – to read all the logic, to understand it, and to reverse engineer it. It literally means understanding every programming language used in your organization for data transformations and movements. And by programming language I mean really everything, including the graphic or XML-based languages used by ETL tools or reports. And that is the challenging part. It is not easy to develop sufficient support for even one language, and in most cases you need tens of them to cover the basics of your environment.

Another challenging issue is when the code is dynamic, which means that you build your expressions on the fly based on program inputs, data in tables, environment variables, etc. But there are ways to handle such situations. On the other hand, this approach is the most accurate and complete, as every single piece of logic is processed. It also guarantees the most detailed lineage of all.
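
To show what “decoding” means in practice, here is a deliberately tiny Python sketch that derives column-level lineage from one simple INSERT ... SELECT statement. A real decoded-lineage solution needs full parsers for every dialect and language in your environment; this regex handles only the happy path in the example, and all table and column names in it are invented.

```python
# Toy "decoded lineage": statically parse a simple INSERT ... SELECT and
# emit which expression feeds which target column. Not a real SQL parser.
import re

def decode_insert_select(sql: str):
    match = re.search(
        r"insert\s+into\s+(\w+)\s*\(([^)]*)\)\s*select\s+(.+?)\s+from\s+(\w+)",
        sql, re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("unsupported statement - a real parser is needed")
    target, cols, exprs, source = match.groups()
    targets = [c.strip() for c in cols.split(",")]
    sources = [e.strip() for e in exprs.split(",")]  # breaks on commas in exprs
    # Positional mapping: the n-th expression feeds the n-th target column.
    return [(source, expr, f"{target}.{tgt}") for expr, tgt in zip(sources, targets)]

sql = """INSERT INTO dwh_customer (email, full_name)
         SELECT lower(email), first_name || ' ' || last_name
         FROM stg_customer"""
for src_table, expression, target_col in decode_insert_select(sql):
    print(f"{src_table}: {expression} -> {target_col}")
# stg_customer: lower(email) -> dwh_customer.email
# stg_customer: first_name || ' ' || last_name -> dwh_customer.full_name
```

Unlike the previous approaches, nothing has to execute: the lineage, including the transformation expression itself, falls out of the code alone.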

An earlier version of this article was published on Tomas Kratky’s LinkedIn Pulse.

Is Guessing Good Enough for Your GDPR Project?

I will tell you one thing — I am tired of the GDPR buzz. Don’t get me wrong, I value privacy and data protection very much, but I hate the way almost every vendor uses it to sell their goods and services, so much that the original idea is almost lost.

It is similar to BCBS or other past data-oriented regulations. Consulting companies, legal firms, data governance/security/metadata vendors — we are all the same: buy our “thing” and you will be OK, or at least safer with us! Every second book out there tells us that every change is an opportunity to improve, to evolve. So, what is the improvement here with GDPR? If I look around, I see a lot of legal work being done, adding tons of words (in very small letters, as always) to already long Terms & Conditions. And you know what? I don’t think there is any real improvement in it.

But things are not always so bad. There is also a lot of good stuff going on with one important goal—to better understand and govern data and its lifecycle in a company. And there is one challenging but critical part I want to discuss today—Data Lineage. That means how data is moved around in your organization. You must understand that a customer’s email address or their credit card number is not just in your CRM but is spread all over your company in tens or even hundreds of systems — your ERP, data warehouse, reporting, new data lake with analytics, customer portal, numerous Excel sheets and even external systems. The path of the data you collect can be very complex, and if you think about all possible ways you can move and transform data in your company, one thing should be clear — your data lineage has to be automated as much as possible.

Different Approaches to Data Lineage

That being said, I feel it is important to talk about different approaches to data lineage that are used by data governance vendors today. Because when you talk about metadata, you very often think about simple things — tables, columns, reports. But data lineage is more about logic — programming code in any form. It can be an SQL script, PL/SQL stored procedure, Java program or complex macro in your Excel sheet. It can literally be anything that somehow moves your data from one place to another, transforms it, modifies it. So, what are your options for understanding that logic?

This article is based on a presentation by Jan Ulrych at the DGIQ 2018 Conference.

Option 1: Ignore it! (aka data similarity lineage)

No, I am not crazy! There are products building lineage information without actually touching your code. They read metadata about tables, columns, reports, etc. They profile data in your tables too. And then they use all that information to create lineage based on similarities. Tables, columns with similar names and columns with very similar data values are examples of such similarities. And if you find a lot of them between two columns, you link them together in your data lineage diagram. And to make it even more cool, vendors usually call it AI (another buzzword I really hate). There is one great thing about this approach — if you watch data only, and not algorithms, you do not worry about technologies and it is no big deal if the customer uses Teradata, Oracle or MongoDB with Java on top of it. But on the other hand, this approach is not very accurate, performance impact can be significant (you work with data) and data privacy is at risk (you work with data). There are also a lot of details missing (like transformation logic for example, which is very often requested by customers) and lineage is limited to the database world, ignoring the application part of your environment.

Option 2: Do the “business” lineage manually

This approach usually starts from the top by mapping and documenting the knowledge in people’s heads. Talking to application owners, data stewards and data integration specialists should give you fair but often contradictory information about the movement of data in your organization. And if you miss talking to someone you simply don’t know about, a piece of the flow is missing! This often results in the dangerous situation where you have lineage but are unable to use it for real case scenarios — not only can you not trust your data, you cannot trust the lineage either.

Jan Ulrych presenting at DGIQ 2018.

Option 3: Do the technical lineage manually

I will get straight to the point here — trying to analyze technical flows manually is simply destined to fail. With the volume of code you have, the complexity of it and the rate of change, there’s no way to keep up with it. When you start considering the complexity of the code and especially the need to reverse engineer the existing code, this becomes extremely time consuming and sooner or later such manually managed lineage will fall out of sync with the actual data transfers within the environment and you will end up with the feeling that you have lineage you cannot actually trust.

Now that we know that automation is key, let’s take a look at some less labor-intensive and error-prone approaches.

Option 4: Trace it! (aka data tagging lineage)

Do you know the story of Theseus and the Minotaur? The Minotaur lives in a labyrinth and so does Ariadne who is in charge of the labyrinth. Ariadne gives Theseus a ball of thread to help him navigate the labyrinth by being able to retrace his path.

And this approach is a bit similar. The whole idea is that each piece of data that is being moved or transformed is tagged/labeled by a transformation engine which then tracks that label the whole way from start to finish. It is like Theseus. This approach looks great, but it only works well as long as the transformation engine controls the data’s every movement. A good example is a controlled environment like Cloudera. If anything happens outside its walls, the lineage is broken. It is also important to realize that the lineage is only there if the transformation logic is executed. But think about all the exceptions and rules that apply only once every couple of years. You will not see them in your lineage till they are executed. That is not exactly healthy for your data governance, especially if some of those pieces are critical to your organization.

Option 5: Control it! (aka self-lineage)

The whole idea here is that you have an all-in-one environment that gives you everything you need — you can define your logic there, track lineage, manage master data and metadata easily, etc. There are several tools like this, especially with the new big data / data lake hype. If you have a software product of this kind, everything happens under its control — every data movement, every change in data. And so, it is easy for such a tool to track lineage. But here you have the very same issue as in the previous case with data tagging. Everything that happens outside the controlled environment is invisible, especially when you consider long-term manageability. Over time, as new needs appear and new tools are acquired to address them, gaps in the lineage start to appear.

Option 6: Decode it! (aka decoded lineage)

Ok, so now we know that logic can be ignored, traced with tags and controlled. But all those approaches fall short in most real-life scenarios. Why? Simply because the world is complex, heterogeneous, wild and most importantly — it is constantly evolving. But there is still another way — to read all the logic, to understand it and to reverse engineer it. That literally means to understand every programming language used in your organization for data transformations and movements. And by programming language I mean really everything, including graphic and XML based languages used by ETL tools and reports. And that is the challenging part. It is not easy to develop sufficient support for one language, let alone the tens of them you need in most cases to cover the basics of your environment. Another challenging issue is when the code is dynamic, which means that you build your expressions on the fly based on program inputs, data in tables, environmental variables, etc. But there are ways to handle such situations. On the other hand, this approach is the most accurate and complete as every single piece of logic is processed. It also guarantees the most detailed lineage of all.
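
The dynamic-code problem deserves a tiny sketch of its own. The following Python toy (with the table names and the region mapping invented for illustration) shows why it is hard: the statement a decoder needs to analyze does not even exist until variables are substituted at run time.

```python
# Toy illustration of dynamic code: the SQL only exists after run-time
# substitution, so a static decoder must reason about the possible values.
TABLE_BY_REGION = {"EU": "customers_eu", "US": "customers_us"}

def build_query(region: str) -> str:
    """The source table is decided at run time, not in the source code."""
    return f"INSERT INTO dwh_customers SELECT * FROM {TABLE_BY_REGION[region]}"

# A decoded-lineage tool has to enumerate the possible inputs to see every
# potential flow, not just the one that happened to execute today.
for region in TABLE_BY_REGION:
    print(build_query(region))
# INSERT INTO dwh_customers SELECT * FROM customers_eu
# INSERT INTO dwh_customers SELECT * FROM customers_us
```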

And that’s it. This was not meant to be a scientific article, but I wanted to show you the pros and cons of several popular data lineage approaches. Which leads me back to my first GDPR paragraph. I see enterprises investing a lot of money in data governance solutions with insufficient data lineage capabilities, offering tricks like data similarity, data tagging and even self-lineage. But that is just guesswork, nothing more. Guesswork with a lot of issues and manual labor to correct the lineage.

So, I am asking you once again — is guessing good enough for your GDPR project?

This article was also published on Tomas Kratky’s LinkedIn Pulse.

The Year of MANTA and Why We’ve Published Our Pricing Online

December 31, 2017 by Tomas Kratky

We’ve seen a massive surge in the world of data lineage over the last year. More buzz, more leads, more customers for us and (from what I’ve heard) for other metadata players as well. It might come as a bit of a disruption, but we’ve decided to do something which is very common in other industries, but not in ours. We’ve published our pricing online. Why?

The Year We’ve Come Through

2017 is coming to an end, and so it is the right time to take a look back. It was a very hot year for metadata and data governance, partially thanks to the new GDPR regulation, but there are more reasons behind it – more and more enterprises have come to understand that the only way to build an efficient data-driven company is through proper data governance. In 2016, data itself got a lot of attention – how big it is or can potentially be, and how to manage its large volume, velocity, and variety.

In 2017, we all started to realize that it is not just about data, but also a lot about data algorithms – the way your data is gathered, transferred, merged, processed, and moved around your company. Thanks to GDPR, internal discussions have been initiated about how and where sensitive/protected data elements are used, and suddenly, it turns out that we are flooded not just with data but with data algorithms too, and it is impossible to handle it all without automation.

That has drawn even more attention to MANTA and its unique data lineage automation capabilities. Our website basically exploded – our audience doubled and the use of our live demo nearly tripled. We have on-boarded several amazing new customers from all around the world, and we delivered four major releases this year, with plenty of new features in all of them including Public & Service APIs and new technologies (SSIS, IBM Netezza, IBM DB2, and Impala, to name a few). Simply said, 2017 was a fantastic year and more is coming in 2018!

And even though this year was yet another giant step for MANTA, we decided to do one more thing that will shake things up. We’ve done something that’s pretty common in all the other industries except ours.

Yes, we’ve published our pricing online for everyone to see.

And why?

MANTA is taking the lead in transparency and openness

Sometimes there are good reasons for hiding the price of your product or service. And it is common practice in the enterprise software industry. But does that really make sense? Let’s take a look at the usual reasons, then:

1) You might be legally bound to hide the price when dealing with a government or its suppliers. Yes, national security is a serious issue, and there might be some limitations put on companies that deal with it. But that applies only to individual deals and is hardly a reason to hide the price.

2) You want to participate in tenders with secret bids. Yes, that also makes sense – especially when you are dealing with clients that focus only on the price. You would not want to lose just because your bid is a few thousand higher, would you? Perhaps not, but this is not our case – MANTA is a very unique software product with clear, easy-to-see value for its users. The price has to be reasonable, but it is rarely the way to win anyone’s business.

3) You want to keep everybody in the dark. Yes, some do want that. But frankly, it’s a rather dishonest strategy. It’s foolish to expect that customers do not know other players on the market and their prices. It’s even more foolish to try to control the market by spreading rumors and making deals in the shadows.

When you are confident of your product and what it stands for, you are also confident of its price. There’s no reason to follow the “industry standard” by not disclosing the enterprise IT product prices. So dive into our pricing right here and if there’s something that needs clarification, just take a look at our pricing glossary right below it.

Thank you for your support this year and see you in 2018!

Yours,

Tomas

A Metadata Map Story: How We Got Lost When Looking for a Meeting Room

September 1, 2017 by Tomas Kratky

You may think that I have gone crazy after reading the title above or hope that our blog is finally becoming a much funnier place. But no, I am not crazy and this is not a funny story. [LONG READ]

It is, surprisingly, a metadata story. A few months ago, when visiting one of our most important and hottest prospects, we arrived at the building (a super large finance company with a huge office), signed in and passed through security, called our main contact there, shook hands with him, and entered their private office space with thousands of work desks and chairs, plus many restrooms, kitchens, paintings, and also meeting rooms.

The Ghost of Blueberry Past

A very important meeting was ahead of us, with the main business sponsor who had significant influence over the MANTA purchasing process. Our main agenda was to discuss business cases involving metadata and the role of Manta Flow. So we followed our guide and I asked where we were going. “The blueberry meeting room”, he replied. We stopped several times, checking our current position on the map and trying to figure out where to go next. (It is a really super large office space.) After 10 minutes, we finally got very close, at least according to the map. Our meeting room should have been, as we read it on the map, straight and to the left. But it was not! We ran all over the place, looking around every corner, checking the name printed on every meeting room door, but nothing. We were lost.

Fortunately, there was a big group of people working in the area, so we asked those closest to us. Several guys stood up and started to chat with us about where that room could be. Some of them started to search for the room for us. And luckily, there was one smart and knowledgeable woman who actually knew the blueberry meeting room very well and directed us to it. In 20 seconds, we were there with the business sponsor, although we were a few minutes late. Uffff.

That’s a Suggestive Question, Sir!

Our gal runs a big group of business and BI analysts who work with data every single day – they do impact and what-if analyses for the initial phase of every data-related project in the organization. They also do plenty of ad-hoc analyses whenever something goes wrong. You know, to answer those tricky management questions like:

“How did it happen that we didn’t approve this great guy for a loan five months ago?”

or

“Tell me if there is any way a user with limited access can see any reports or run any ad-hoc queries on sensitive and protected data that should be invisible to her?”

And I knew that they had very bad documentation of the environment – non-existent or obsolete (which is even worse), as many organizations out there do – most of it in Excel sheets that were manually created for compliance reasons and uploaded to the SharePoint portal. And luckily for us, they had recently started a data governance project with only one goal – to implement Informatica Metadata Manager and build a business glossary and an information catalog with a data lineage solution in it. It seemed to be a perfect time for us, with our unique ability to populate IMM with detailed metadata extracted from various types of programming code (Oracle, Teradata, and Microsoft SQL in this particular environment).

Just Be Honest with Yourself: Your Map Is Bad

So I started my pitch about the importance of metadata for every organization, how critical it is to cover the environment end-to-end, and also the serious limitations IMM has regarding programming code, which is widely used there to move and transform data and to implement business logic. But things went wrong. Our business sponsor was very reluctant to believe the story, being pretty OK with what they have now as a metadata portal. (Tell me, how can anyone call SharePoint with several manually created and rarely updated Excel sheets a metadata portal? I don’t understand!) She asked us repeatedly to show her precisely how we could increase their efficiency. And she was not satisfied with my answers based on our results with other clients. I was lost for the second time that day.

And as I desperately tried to convince her, I told her the story of how we got lost and mixed it with our favorite “metadata like a map, programming code like a tricky road” comparison. “It is great that you even have a map,” I told her. “This map helped us quickly get very close to the room and saved us a lot of time. But even when we were only 40 meters from our target, we spent another 10 minutes – the very same amount of time needed to walk all the way from the front desk to that place – looking for our room. Only because your great map was not good enough for the last complex and chaotic 5% of our trip. And what is even worse, others had to help us, so we wasted not only our time but also theirs. So this missing piece of the map multiplied our effort and decreased our efficiency. And now think about what happens if 40% to 50% of your metadata map is missing – the portion of logic hidden here inside various kinds of programming code, invisible to IMM. Do you really want to ignore it? Or do you really want to track it and maintain it manually?”

And that was it! We got her. The rest of our meeting was much nicer and smoother. Later, when we left, I realized once again how important a good story is in our business. And understandability, urgency and relevance for the customer are what make any story a great one.

And what happened next? We haven’t won anything yet – it is still an open lead – but now nobody has doubts about MANTA. They are struggling with IMM a little bit, so we are waiting and trying to assist them as much as possible, even with technologies that are not ours. Because in the end, it does not matter if we load our metadata into IMM or any other solution out there. As long as there is any programming code there, we are needed.

This article was originally published on Tomas Kratky’s LinkedIn Pulse.
