Basically, in any data warehousing project, idea is to is to publish the right data in the right format (the format that the user is expecting or will be excited to see). However, we miss many aspects by just focusing on smaller part of the problems. This article is touching some of the common realities in Data Warehousing, which we must not ignore.
1) Understandability. The final presentation layer to the end user must be immediately understandable, simple, recognizable, and intuitive. I also consider this as an implicit requirement. Users are not going to specify all these as part of the requirements.
2) Speed. Slow retrieval of query result is annoying and end-user doesn’t really care about your complex/complicated design. They pay for the simple solution, which can feed them correct result in real quick time. If due to some reason you are not able to deliver a performance efficient solution then make sure that end-user know that you are working on improving the performance, within a given time frame.
3) Costs. There are different kind of implicit and explicit costs associated with Data Warehouse projects.
1. Implementation Costs :- This is again divided into two types of cost.
1. Back-room cost of Extraction, Cleaning, Conforming and Delivering the data. (Around 70% of implementation cost). It further depends on following aspects:
1. Do you have to start your design from clean slate every time? This shows lack of reusable components.
2. Does your design depend too much on the complexity of source data?
2. Front-room cost of end-user queries and reports. (Around 30% of implementation cost).
2. Hardware & Software Costs :- Choose the suitable hardware & Software based on requirements.
1. You must choose a hardware which is scalabale in future.
2. The selected softwares must give you an ease to be able to develop a user-friendly Data Warehouse, which can be delivered quickly and any change/enhancement will be taken care nicely.
3. There will be different kinds of operational systems, Database engines, end-user query and analysis tools associated in the whole system and they must be able to integrate with each other.
3. Daily Administrative Costs :- Daily costs are corresponding to the task that need to be performed to be able to take advantage of the developed Data Warehouse. It includes
1. Daily upload of changed/new facts and dimension tables.
2. Productions of end-user reports.
4. Cost of Surprises :- It should be as minimum as possible. It is easier to tackle the known data. However, you never know what kind of new data (dimensions & facts) will come and what to do with them. Having said that
1. You should not consider late arriving facts, dimensions and correction of existing data as a big surprise. In fact, you must expect that these things will keep happening.
2. However, you can not say that your system can not take care of big surprises like receiving a new dimension attributes, new dimension tables, new granularity for the fact tables and new fact table itself.
1. All these big surprises require Database changes.
2. You must make sure that the existing user’s are not impacted because of these changes.
3. We need the most atomic data in our data warehouse in order to accomodate the big and small surprises. It also helps in answering the most precise questions and drilling down to the most operational perspective.
4. Now the new chanllenge is that Data Warehouse must connect the seamless data to historical data to be able to answer based on most recent events.
4) Data Relevance. Data must be relevant. It should be exactly what user wanted. For this to happen the Data Warehouse designer must live in two world — The end-user who keep stating (possibly changes too) requirements and IT world where warehouse needs to be implemented.
5) Prevention of inappropriate centralization. The data warehouse must be decentralized community of data marts , tied together that enables them to work together very effectively.
1. In fact, it is really hard to imagine that you will get all the resource & informations needed to be able to create a centralized Data Warehouse systems in all the cases. Of course, whereever, it makes more sense to have a centralized system, we have to design that.
6) Avoid Undue Delay. You must try to plan and design in such a way that product is deployed within given time.