Custom code in data warehouses tends to be a kind of mystery. Let’s answer the four most frequent questions about it.
Let’s focus on the existence of custom code in Business Intelligence. During the almost 15 years of my career thus far, I’ve seen a lot of it. I’ve spent the biggest part of my professional life as a developer, architect, and consultant. And I’ve written, maintained, and just seen a lot of source code out there. It is simply everywhere.
Accept it or not, it is a fact.
And it doesn’t matter if you handle Business Intelligence from Europe, Asia, or the U.S. or if you are working in finance, telco, pharma, or retail. Just in the last two years our Manta Tools team has had a lot of opportunities to work with huge companies in both the U.S. and Europe. And you can see the very same picture again and again: a large BI environment, several different technologies, thousands of databases, and millions or even tens of millions of lines of custom code. That is about as many lines of source code as you would find in, for example, a core banking system, and sometimes even more.
1. Is code necessary?
Big question! There are a lot of people out there who pretend there is no custom code at all in BI, or at least that there should be none. But is that really a smart idea? We need code to express what we want the computer to do for us. Yes, you can work on several levels of abstraction. Writing instructions in assembler is definitely not the same as coding them in PL/SQL or BTEQ (SQL is a very powerful and abstract language). And there are also other techniques to get rid of coding, at least partially.
To put it very simply (sorry for not being precise): you can model your problem using diagrams, for example, and let a specialized program convert your models into the final code (SQL, etc.). Welcome to the world of model-driven development (MDD). Or you can limit yourself to a very specific set of ways of doing things, create a so-called domain-specific language, and generate code using things like templates. IBM did a very good job with the IBM AS/400 in this area. And today there are many tools, like WhereScape, that try to do the same for BI.
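To make the template idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how WhereScape or any other tool actually works: the `MAPPINGS` list, the table and column names, and the `generate_load_sql` helper are all hypothetical. The "domain-specific language" is reduced to a small list of mappings, and a template turns each mapping into a SQL load statement.

```python
from string import Template

# Hypothetical mapping specification: a tiny "domain-specific" description
# of which staging table loads into which warehouse table.
MAPPINGS = [
    {
        "target": "dw.customer",
        "source": "stg.customer",
        "columns": ["customer_id", "name", "segment"],
    },
    {
        "target": "dw.account",
        "source": "stg.account",
        "columns": ["account_id", "customer_id", "balance"],
    },
]

# One fixed template for every load; this is the "limited set of ways
# of doing things" that makes generation possible.
LOAD_TEMPLATE = Template(
    "INSERT INTO $target ($cols)\n"
    "SELECT $cols\n"
    "FROM $source;"
)


def generate_load_sql(mapping):
    """Render one mapping into a plain INSERT ... SELECT statement."""
    cols = ", ".join(mapping["columns"])
    return LOAD_TEMPLATE.substitute(
        target=mapping["target"],
        source=mapping["source"],
        cols=cols,
    )


if __name__ == "__main__":
    for mapping in MAPPINGS:
        print(generate_load_sql(mapping))
        print()
```

This works nicely as long as every load is a straight copy. The moment one table needs a join, a deduplication rule, or a special calculation, the template has to grow a new option or somebody has to write the SQL by hand.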
BUT things are not so easy! Any honest person with real experience will tell you that modeling is great fun, but if you need to express something a little bit more complicated, writing code is much more effective. Also, limiting yourself is not going to work in all situations because complex things will force you to break your rules very quickly. And it doesn’t matter if you are writing a Java application or working in BI.
2. Is code BAD?
So we need code to tell our computer what to do, how to prepare reports, how to calculate customer profitability, how to merge several different records for the same customer into just one, and the list goes on with other examples of what else is needed every day in the BI world. Taking code into account is necessary, and the existing techniques for replacing code with something else (modeling, templates, …) don’t work well for more complicated business logic.
You should ask yourself why so many people hate code. My answer after so many years of experience is very simple: managing code is usually a nightmare. It is true in the world of Java and .NET, with so many great tools to help, and it is even more true in the world of BI. SQL is a declarative language, and even its procedural extensions such as PL/SQL lack many of the advanced capabilities of object-oriented languages, which makes the code really hard to manage long term. This is one of the many reasons why we started Manta Tools: to help with this issue, make the lives of BI workers much easier, and spare the wallets of BI managers.
Simply put, code is not bad. It is just hard, because it expresses very complex things like the business logic in the BI of a huge company. You just need the right solution to be more productive. I’ve seen many so-called architects forcing developers not to write code. Typically those architects had no real experience in development, so they had no clue how ineffective, painful, and messy “writing” code using models can be, both in the short term and in the long term. Don’t get me wrong: being in control of the code and the whole BI environment is necessary, but if you need to drive in a nail, use a hammer to do it. And take care of your hammer the best you can.
3. Why don’t big vendors support code in their data management suites?
That’s not an easy question to answer. You need to understand that information management is a broad discipline. You have to put together different types of metadata from within the entire organization, you need to support all different layers of abstraction from business terms to physical structures, you need to deal with static and dynamic aspects of information management, etc. And dealing with code is SO HARD. We have been doing it for two years now in Manta Tools, and we also invested a lot of energy into solving this problem in my previous company several years ago.
Customers have never pushed big vendors to solve this issue. In my experience, that is mostly because data management projects are implemented just to comply with existing regulations. Implementing an information management initiative to satisfy real business needs is very different from doing as little as possible to comply with the law. Fortunately the situation is changing very quickly. Customers are now more demanding, and we’ve gotten our first requests from big vendors to help them finish information management projects by providing them with metadata from custom code.
4. Is code a legacy problem?
Come on! Be serious. Even modern ways of doing BI use custom code. And with all the modern big data platforms and analytics, there is more custom code than ever before. What about Hadoop, MapReduce, iPaaS tools merging application and data integration, etc.? And this trend will continue! As I’ve said: code was, is, and will be necessary in BI. There is no good way to get rid of it, and, frankly, with solutions like Manta Tools there is no reason to even try.