Thursday, July 4, 2013

Facts and Dimensions

Now that we got to know that a Data Warehouse is the most important Cog in the BI (Business Intelligence) wheel, it is time to get our hands dirty and scan deeper into the Data Warehouse(DWH).

We shall begin with the core concept that drives the functionality and architecture of a Data Warehouse.
 
FACTS & DIMENSIONS

Data warehouses are built using dimensional data models which consist of Fact and Dimension tables”.

In simpler words, Facts and Dimensions are the two major categories into which we divide our entities(tables) in a DWH.

The architecture and efficiency of the DWH depends on how we identify/categorize/design the Facts and Dimensions.

Before getting into any further technical explanation of what facts and dimensions actually are,  let us look at the following statement:
 
“It is 10122 runs!!!

Though we know that 10122 runs is something related to cricket, we have no clue what the above statement represents.

However, there is a whole new meaning when we say that “10122 are the total runs scored by Sunil Gavaskar in Test cricket”.


Let’s relate this: “10122 runs” is a raw FACT, and it has no meaning until we mention the DIMENSIONS which are the Player-Name(Gavaskar) and the Format(Test cricket).

Here is another example that we encounter in an enterprise:

“Quickie-Mart registered Total Sales of 2.2 million on 12th-Apr-2011”.

Based on the earlier scenario, we can make out that, here - “Total-Sales” is a Fact whereas “Quickie-Mart”(Organization Dimension) and “12th-Apr-2011”(Date dimension)  are the dimensions.

We can segregate the above information by adding further dimensions.

Eg:

·         Quickie-Mart registered Total-Sales of $1.3 million on Product A on 12th-Apr-2011 ---(adding a product Dimension)

·         Quickie-Mart registered Total-Sales of $0.5 million of Product A in the Asia-Pacific region on 12th-Apr-2011 ---(adding a Sales-Geography Dimension)

After the above examples we can deduce the following statements:

Fact Tables consist of the measurements, metrics or facts of a business process.

Dimension Tables provide a structured information to quantify (give a meaning to) those facts.

Addition Info:

·         Fact tables form the Centre of Data Warehouse and are surrounded by Dimensions. Each fact table would have a Foreign-key that would have a corresponding Primary key in the dimension tables.

·         Fact table keeps on growing in size whereas the growth of Dimensions is generally slower(There are number of Sales transactions in a day but cannot imagine product category to grow at that pace).

·         Since the fact tables keep on growing in size very rapidly, it is desirable that the data redundancy is reduced to the max, hence the fact tables are normalized to the maximum extent, however it may not be the same with Dimension tables.

 In the coming posts we shall look into other aspects of DWH with more examples which shall serve in getting more insights into Facts and Dimensions.

1 comment:

  1. Mr. Reddy, I'm working on this scenario:

    A publishing company produces scientific books on various subjects. The books are written by authors who specialize in one particular subject. The manager in the company is interested to build data warehouse implementation by executing a data mart in their holding company. The manager said that he want to have analysis of the sales for his entire company to support their strategic marketing plan.
    He wants to know how many books they have sell in their company for specific author, at period of time such as by day, month, quarter and year. He also wants to have the figure of sales quantity, amount and the cost for particular book, subject and period.

    I have found 4 dimensions, which are author, book, time and subject. My question is how can I design a cube of the DW with 4 dimension? does cubes allow us to use more than 3 dimensions?

    ReplyDelete