Who needs a Data Warehouse Administrative Dashboard?

by Nicolae Guse 11. December 2011 22:31

Having a perfectly running Data Warehouse is not an easy job. It's a mix of hardware and software challenges to address, spiced with a lot of data verification needed to make sure that the system is not only up and running, but also delivers the expected level of performance and data quality.

Many of the hardware and software challenges are related to keeping the system operational no matter what. Let's think about some of the scenarios a Data Warehouse has to handle on a daily basis, and why it is so hard to have it running perfectly:

1.       Because the whole system is usually very complex and integrates data from multiple sources. If one of these data sources, which are spread over a wide geographical area, has a problem, you will have a problem at Data Warehouse level. And there are a lot of things that can go wrong here, from power and hardware failures and networking issues up to software problems and crashed billing processes. None of them are under the control of the BI Department, but any of them will have an impact on the Data Warehouse when they are used as active Data Sources.

2.       Going back to the Data Warehouse, it can also be affected by the same problems which affect the source applications. And this is why the whole Data Warehouse system must be built with redundancy in mind:

a.       Implement the use of Network Aliases for each main component of the Data Warehouse. These aliases will point to either the Main or the Secondary server, and will allow the applications based on them to continue operating even if one of the servers goes down. You should define such aliases for each main component of the Data Warehouse:

·         The servers which are hosting the Data Warehouse databases

·         The ETL servers

·         The Presentation Layer servers

All processes should take place on the Main server for each component. A Secondary server should always wait on standby, synchronized with the Main one by using features like Mirroring, Replication and Log Shipping. If something goes wrong with the Main server and it cannot be contacted anymore by a Witness server, then the Network Alias will automatically be re-routed to the Secondary one for the affected component.

3.       Hardware resources are slowly overwhelmed by increasing volumes of data. You should be able both to predict and to trace the evolution of such trends before they turn into a problem. And this is where monitoring and verifications come into play.

4.       And the list can continue....
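Points 2 and 3 above can be sketched in a few lines of Python. This is purely illustrative: the server names, the reachability check and the growth figures are assumptions for the example, not a real alias or monitoring API.

```python
# Point 2: the Network Alias follows the Main server while it responds,
# and is re-routed to the Secondary one otherwise (the Witness's job).
def resolve_alias(main, secondary, is_reachable):
    """Return the server the Network Alias should point to."""
    return main if is_reachable(main) else secondary

# Point 3: a linear projection of how many days remain before a volume
# is overwhelmed by the growing data, so the trend is caught early.
def days_until_full(capacity_gb, used_gb, daily_growth_gb):
    """Days left before the volume fills up, or None if there is no growth."""
    if daily_growth_gb <= 0:
        return None
    return (capacity_gb - used_gb) / daily_growth_gb
```

An administrative dashboard would run such checks on a schedule and raise an alert well before the projected date is reached.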



Data Warehouse Best Practices

Implementing a Generic Import Solution

by Nicolae Guse 4. December 2011 18:16

Building a Generic Import Solution in order to feed your Data Warehouse is an essential step towards providing a solid foundation on which current and future Business Intelligence projects are built. Such a component will join other generic components like the Business Rule Engine and the Partitioning Mechanism, and will allow the Development phase of a Business Intelligence project to focus on the specificities of each project, rather than spending massive amounts of time re-inventing the wheel.

Developing a generic component will always take more time than implementing a quick-fix approach, but, in the end, such a component offers a much higher Return on Investment by:

·         Saving a lot of  time in future developments

·         Streamlining the whole Data Warehouse architecture

The only catch of this approach based on Generic Components is that it involves a Data Warehouse Architecture effort which is outside the scope of a regular Data Warehouse business project. Such a project is always business driven, and users will never ask for things like a Partitioning Mechanism, for example. That means it's the direct responsibility of the Business Intelligence department to take such initiatives, and to get the support of upper management in order to build the technical foundation which will form the future backbone of the Data Warehouse. Which means that, in the short and mid-term, you won't be able to deliver as many business projects as the business owners would like, which can make the Business Intelligence department somewhat unpopular.

But I’m pretty sure that having a solid technical foundation will greatly benefit all Data Warehouse projects in the mid and long term. As long as you are able to explain to the business users your intentions and the upcoming advantages, I think you'll have a win here. And this is necessary because the Data Warehouse Architecture scope stretches far beyond a regular Data Warehouse business project, which means that, in the end, a Data Warehouse is more than the sum of all its Data Marts.
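As a rough sketch of what such a generic component could look like, here is a minimal metadata-driven loader in Python. All source names, staging tables and keys below are invented for the example; the point is that a new source is added by adding metadata, not by writing a new import job.

```python
# Each data source is described by metadata instead of a dedicated import job.
SOURCES = [
    {"name": "billing", "table": "STG_BILLING", "key": "invoice_id"},
    {"name": "crm",     "table": "STG_CRM",     "key": "customer_id"},
]

def build_import_plan(sources):
    """Derive a uniform list of import steps from the metadata alone."""
    return [f"load {s['name']} into {s['table']} keyed on {s['key']}"
            for s in sources]
```

Onboarding a third source would mean appending one more metadata row, while the import logic itself stays untouched.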




Real Time Data Warehouse - Is it really possible and do we really need it?

by Nicolae Guse 27. November 2011 15:49

Real Time Data Warehouse it's one of the concepts that's been bugging me from quite some time. Partially because I keep hearing it from various software vendors, that are praising the performance of their Database, ETL or Presentation Layer tool. And partially because I’m really wondering if, in the real and complex IT infrastructure of a company, is it really possible. And also if it's really necessary.

Before going forward with this concept, I think it's necessary to understand what real time and critical mean. I'll take as an example the Electronic Stability Program (ESP), implemented as an active security system for car safety. Where active means it will help you prevent a crash. The ESP uses sensors coupled to each wheel, the steering wheel and a gyroscope in order to determine if the car is moving in the direction the driver intends. If it detects that the car is skidding out of control, it applies individual braking on each wheel and reduces the engine power, in order to bring the car back to the trajectory the driver intended. In this way, the crash is prevented. And this is where you need real time decisions, because anything less would mean a crash, which would result in a life and death scenario. And it works so well because it uses just a small amount of data, collected in real time, by using components designed only for this purpose.




At the crossroads: MOLAP vs HOLAP vs ROLAP - Part 2

by Nicolae Guse 20. November 2011 23:01

I've been doing some research lately regarding MOLAP, ROLAP and HOLAP storage modes under Microsoft Analysis Services, and I've found two very interesting articles on Microsoft's site that I highly recommend:

Now, regarding the 3 major storage modes, let's go into detail about the scenarios in which each of them is best suited, based on their implementation in Microsoft Analysis Services. Most of the characteristics listed here are generic and match most implementations from various vendors, while others, like the management of ROLAP aggregations, are missing from other vendors' products:



  1. Very good query performance by pre-aggregating data into the multi-dimensional structure during cube processing.
  2. Lower I/O due to the use of data compression when data is transferred from the RDBMS to the MOLAP storage.
  3. The Analysis Services Engine can use the Partition Slice to pre-determine, during the execution plan phase, which cube partitions need to be queried in order to satisfy the user query.
  4. High independence from the relational database which holds the fact and dimension tables. Since the relational database is only touched during cube processing, the entire fact table can be re-created using non-logged operations like SELECT INTO, in order to speed up ETL processes, without any impact on the user queries run against the cube.
  5. The cube partitioning feature works excellently in tandem with the SQL Server table partitioning feature, which allows designing of high performance and elegant ETL solutions. Gone are the days when you had to create a FACT_SALES table for each day; now you can have a single FACT_SALES table, with partitions for each day.
  6. Incremental load can limit the performance problems that occur during a full process of a large cube, by processing just the data which changed.
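Point 3 above, partition elimination based on the Partition Slice, can be illustrated with a small Python sketch. The partition names and date ranges are invented for the example; the idea is that the engine only queries the partitions whose slice overlaps the range requested by the user.

```python
# Each cube partition records the range of MonthId values it contains (its slice).
partitions = {
    "Sales_2011_10": (20111001, 20111031),
    "Sales_2011_11": (20111101, 20111130),
    "Sales_2011_12": (20111201, 20111231),
}

def partitions_to_query(query_from, query_to):
    """Return only the partitions whose slice overlaps the query range."""
    return sorted(name for name, (lo, hi) in partitions.items()
                  if lo <= query_to and query_from <= hi)
```

A query covering a single week in November would touch one partition out of three, which is exactly where the query performance gain comes from.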




At the crossroads: MOLAP vs HOLAP vs ROLAP - Part 1

by Nicolae Guse 13. November 2011 22:55

Before going for a deep dive into the MOLAP vs HOLAP vs ROLAP discussion, you should be aware that not all BI platforms have been created equal. And, when talking about a BI platform, I mean both the Backend and the Frontend applications. There are very few BI platforms which allow you to choose the storage mode implemented, since most of them are designed around a single storage mode.

In fact, if we think of the major OLAP vendors, we have the following situation:

1. Microsoft Analysis Services - supports all major storage modes: MOLAP, HOLAP, ROLAP

2. SAP Business Objects - by default supports only ROLAP mode, implemented in the BO Universe Designer. Using the Voyager tool, you could connect, at least in theory, to OLAP cubes, both from Microsoft and from Oracle Hyperion. In practice, I've tried a few times and miserably failed to connect to a Microsoft Analysis Services 2005 64-bit version from BO XI 3.1. And also in practice, even if you connect to an existing cube, you can only access it in read-only mode, which means that no modifications performed in the universe can be reflected back on the cube.

3. IBM Cognos TM1 / Powerplay - MOLAP

4. Microstrategy - ROLAP

5. Oracle BI Enterprise Edition - ROLAP

6. Oracle Hyperion Essbase - MOLAP

The reality is that, when deciding to implement a new BI tool within the company, or to replace an existing one, the focus is rarely on the Backend Server and much more often on the Frontend application, because, in the end, this is what users see. Recent studies in the BI domain have shown consistently that users want tools which are intuitive and very easy to use. Also, consistently, the majority of BI users use parameterizable Standard Reports most of the time, and rarely use ad-hoc reporting. This demonstrates that usability is often, together with the price, one of the strongest arguments for choosing a specific BI solution.




Choosing the right ETL

by Nicolae Guse 6. November 2011 19:51

Choosing your ETL solution is one of the most important decisions that you'll make when you're starting a new Data Warehouse. Why is it so important? Because it's like the horse and carriage example that I've been talking about in the Data Warehouse Dictionary article. Where the ETL is the horse and the Data Warehouse is the carriage.

By following this logic, you'll have to answer yourself: What kind of horse do I want? Well, that's an easy answer: The best one..... But what does this actually mean?

1. The Fairytale Kind: the fastest horse, which can run like the wind, no matter the distance and no matter how heavy the carriage is. And yes, of course, it's as cheap as you can afford, typically a gift from the Fairy. And now you know why it's called the Fairytale Kind.

2. The Real Life Kind: here you'll see that you have many more options. But wait, isn't having more options a good thing? I have a feeling that we are going to find out:

· The Pure Racer Kind: is the fastest runner you're going to find. It will integrate your data in the blink of an eye... Well, a bit more, but you get the picture. The problem with this type of horse is that it is a specialized one. It will get you there very fast, but it will work only in a handful of scenarios. Which typically means it's more likely to carry a light chariot than a heavy carriage. And, let's not forget, racing always sounds expensive...

· The Mule Kind: but wait, this isn't quite a horse, right? You've guessed it, it's halfway between a donkey and a horse. And why would you want to go there? Well, first of all, because mules are very good at carrying heavy loads over long distances and steep slopes. It's definitely the slowest of all, and also the most stubborn one when you need to add something new to the carriage, but, at the end of the day, it will get you there. It's also typically the cheapest of all.

· The Workhorse Kind: is typically the one you'll want for your carriage. I'd say it's the best compromise between the Pure Racer and the Mule. It's neither the fastest nor able to carry the heaviest load, but it will get you there in a fair amount of time, with a pretty heavy load. Also, it's the most versatile of all, and it can be adapted fairly quickly to tow very different loads. And versatility is definitely what you'll want in a Data Warehouse. If you can afford the price, of course.



Implementing a Business Rule Engine

by Nicolae Guse 30. October 2011 23:26

While working on any Data Warehouse related project, one of the things you're sure to bounce into are Business Rules. Data Warehouses are built for Decision Support, and those decisions are based on indicators calculated from a multitude of Business Rules, provided, not surprisingly, by the business people from within your company.

Based on this logic, Business Rules are very important, and therefore it is also very important how they are actually implemented at Data Warehouse level. Here we have the following likely scenario:

1.       Your company's business evolves and the business people realize that they need to define a new set of business rules which will provide them a better level of granularity and help them make better decisions. Then follow a number of never-ending meetings, but, in the end, everyone agrees on them and now it's time to apply them at Data Warehouse level. And yes, they want these rules applied yesterday, because, you know the drill, it's business critical.

2.       But wait, they cannot apply them by themselves; they need the BI Department to do it, since they are the ones who created and are maintaining the Company's Data Warehouse. So the business people contact the BI Department, which will dispatch a number of BI functional consultants who will analyze their business requirements and business rules and come up with a Functional BI specification. Which does take some time.
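The core idea of a Business Rule Engine, to which this scenario is leading, is that rules are stored as data and interpreted generically, so a new business rule becomes a new row rather than a new development cycle. A minimal Python sketch, with rule definitions invented for the example:

```python
# Business rules stored as data: field, comparison, threshold, resulting tag.
RULES = [
    {"field": "amount", "op": ">",  "value": 10000, "tag": "HIGH_VALUE"},
    {"field": "amount", "op": "<=", "value": 10000, "tag": "LOW_VALUE"},
]

OPS = {">": lambda a, b: a > b, "<=": lambda a, b: a <= b}

def classify(record, rules=RULES):
    """Return the tag of the first rule the record satisfies, if any."""
    for rule in rules:
        if OPS[rule["op"]](record[rule["field"]], rule["value"]):
            return rule["tag"]
    return None
```

In a real engine the RULES table would live in the database and be maintained through a screen the business people can use themselves, which is the whole point.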




The Name of the Game - Table Partitioning - Part 2

by Nicolae Guse 23. October 2011 19:17

Now that we've cleared up the main concepts associated with Table Partitioning in Part 1 of the article, let's make a list of the prerequisites necessary to apply the Partitioning Mechanism:

·         Partitioning Function - It contains the range of values which will be used for partitioning

·         Partitioning Schema - created based on the Partitioning Function, and used for creating partitioned tables. And obviously you can create multiple tables using the same partitioning schema

·         The table is created based on the Partitioning Schema:

CREATE TABLE FACT_SALES
(               PartitionId int NOT NULL,
                Store varchar(50),
                MonthId datetime,
                Amount money
) ON PartitioningSchema (PartitionId)


And now, let’s talk a bit about the logic implemented by the Partitioning Mechanism:
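To make the Partitioning Function concept concrete, here is a small Python sketch (not T-SQL) of how boundary values route a row to a partition number, mimicking SQL Server's RANGE RIGHT semantics; the boundary values are invented for the example.

```python
import bisect

# A RANGE RIGHT partition function: N boundaries define N + 1 partitions,
# and a value equal to a boundary falls into the partition on its right.
boundaries = [20111001, 20111101, 20111201]  # illustrative MonthId boundaries

def partition_id(value):
    """Return the 1-based partition number a value is routed to."""
    return bisect.bisect_right(boundaries, value) + 1
```

This is exactly the lookup the database engine performs when it decides which partition a newly inserted row belongs to, or which partitions a query needs to scan.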




The Name of the Game - Table Partitioning - Part 1

by Nicolae Guse 16. October 2011 22:12


First things first: a pretty good presentation of the problems to be addressed by Partitioning can be found in the article To Have or Not to Have a Physical data mart. You should read it, since it will come in quite handy during this article.

As I’ve said in the previous article, one of the main challenges for a Data Warehouse which has a ROLAP Presentation Layer on top is to be able to refresh the data while users are running their reports, while also maintaining data consistency.

Starting with SQL Server 2005, Microsoft introduced the Table Partitioning feature, on which we will be focusing during this article. Other database engines have implemented this functionality as well, so I'm pretty sure that the concepts laid out here are pretty generic.

So what's all this about table partitioning? Is it a magical cure or a royal pain in the ...? Well, as always, I'll say it's a little bit of both.




To Have or Not to Have a Physical data mart

by Nicolae Guse 16. October 2011 21:51

Some time ago I was involved in a quite heated debate with a BO Universe Designer regarding the presence of a Physical data mart. This really gave me the idea for this article.

First of all, let's compile a list of arguments against having a Physical data mart, when the Presentation Layer is using ROLAP technology:

  1. It adds an additional layer on top of the Data Warehouse, which slows down the development process when you need to add a few new details at Presentation Layer level.
    1. Normally, as long as you don't have a Physical data mart, you end up with a growing snowflake design, where you can add all the tables you need, from all Data Warehouse layers, and, after performing the necessary joins, get the details you want very fast. This approach removes the dependency, for every small development, between the Data Warehouse development team and the Reporting team, and increases the speed with which the user request is solved.
  2. It creates unnecessary data duplication by de-normalizing the data from the Actual Data Warehouse, mostly in order to generate the flat dimension tables required for development.
  3. It moves the design of the Presentation Layer away from the Conformed Dimension principle, in which dimensions are shared between Data Marts, by focusing more on the creation of de-normalized dimension tables which are designed around each Data Mart's specific needs. By contrast, the same REF table can be used within different Data Marts, since each one encapsulates an atomic business concept, as a result of the Normalization process.

Well, then we're in a pickle, since there are pretty strong arguments here. Except that they are not the full picture of the problem. Which really made me think: what are the arguments in favour of a Physical data mart?






The ideas you'll find on this site represent the opinion of their owner. They are not about, nor endorsed by, any specific company. This site is a place for sharing ideas and trying to find together best practices for Data Warehouse and Business Intelligence projects.

Some of the technical approaches will be praised, some of them will be criticized, based on the concepts presented by their owners. You shouldn't feel offended if some of the ideas which either you or your company consider best practices are criticized here. You should also realize that what some people consider best practices on a particular system and business scenario can be worst practices on another. If you don't agree with a particular idea, feel free to comment on it in a civilized manner.

Only by confronting different ideas and experiences will we find a better way of doing Data Warehouse and Business Intelligence projects.