We shall begin with the core concept that drives the
functionality and architecture of a Data Warehouse.
“Data warehouses are built using dimensional data
models which consist of Fact and Dimension tables”.
In simpler words, Facts and Dimensions are the two
major categories into which we divide our entities(tables) in a DWH.
The architecture and efficiency of the DWH depends on
how we identify/categorize/design the Facts and Dimensions.
Before getting into any further technical explanation
of what facts and dimensions actually are, let us look at the following
statement:
“It is 10122
runs!!!”
Though we know that 10122 runs is something related to cricket, we have
no clue what the above statement represents.
However, there is a whole new meaning when we say that “10122 are the
total runs scored by Sunil Gavaskar in Test cricket”.
Let’s relate this: “10122 runs” is a raw FACT, and it has no
meaning until we mention the DIMENSIONS which are the
Player-Name(Gavaskar) and the Format(Test cricket).
Here is another example that we encounter in an
enterprise:
“Quickie-Mart registered Total
Sales of 2.2 million on 12th-Apr-2011”.
Based on the earlier scenario, we can make out that,
here - “Total-Sales” is a Fact whereas “Quickie-Mart”(Organization Dimension) and “12th-Apr-2011”(Date dimension)
are the dimensions.
We can segregate the above information by adding
further dimensions.
Eg:
·
Quickie-Mart
registered Total-Sales of $1.3 million on Product A on 12th-Apr-2011 ---(adding a product Dimension)
·
Quickie-Mart
registered Total-Sales of $0.5 million of Product A in the Asia-Pacific
region on 12th-Apr-2011 ---(adding a Sales-Geography Dimension)
After the above examples we can deduce the following
statements:
Fact Tables consist of the measurements, metrics or facts of a business
process.
Dimension Tables provide a structured information to quantify (give a meaning to) those facts.
Addition Info:
·
Fact tables form
the Centre of Data Warehouse and are surrounded by Dimensions. Each fact table
would have a Foreign-key that would have a corresponding Primary key in the
dimension tables.
·
Fact table keeps
on growing in size whereas the growth of Dimensions is generally slower(There are number of Sales transactions in a day but
cannot imagine product category to grow at that pace).
·
Since the fact tables
keep on growing in size very rapidly, it is desirable that the data redundancy
is reduced to the max, hence the fact tables are normalized to the maximum
extent, however it may not be the same with Dimension tables.
Mr. Reddy, I'm working on this scenario:
ReplyDeleteA publishing company produces scientific books on various subjects. The books are written by authors who specialize in one particular subject. The manager in the company is interested to build data warehouse implementation by executing a data mart in their holding company. The manager said that he want to have analysis of the sales for his entire company to support their strategic marketing plan.
He wants to know how many books they have sell in their company for specific author, at period of time such as by day, month, quarter and year. He also wants to have the figure of sales quantity, amount and the cost for particular book, subject and period.
I have found 4 dimensions, which are author, book, time and subject. My question is how can I design a cube of the DW with 4 dimension? does cubes allow us to use more than 3 dimensions?