Sunday, July 8, 2012

Reset Table Identity

DBCC CHECKIDENT can reset the identity value of a table. For example, suppose YourTable has 25 rows with 25 as the last identity value, and we want the next record to have an identity other than 26.

OR

If you delete all the rows from a table and reload it, the identity value continues from the previous value, i.e. if YourTable had 25 rows and you delete and reload them, the identity column values for the 25 reinserted records will be 26 - 50. (TRUNCATE TABLE, unlike DELETE, resets the identity back to its original seed.)

Run the following T-SQL script to reset or reseed the identity value of the table:
DBCC CHECKIDENT ('dbo.TableName', RESEED, 1); -- the next insert on a non-empty table gets 2

If the table has to start with an identity of 1 on the next insert, it should be reseeded to 0, because the next row receives the reseed value plus the increment (an empty, never-used or truncated table uses the seed itself). If the identity is reseeded below values that already exist in the table and the identity column carries a PRIMARY KEY or UNIQUE constraint, the duplicate values generated will violate that constraint and raise an error.
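A minimal end-to-end sketch of the behaviour described above (the table and column names are illustrative only):

-- Hypothetical table used only for illustration
CREATE TABLE dbo.YourTable (ID INT IDENTITY(1,1) PRIMARY KEY, Name VARCHAR(50));
INSERT INTO dbo.YourTable (Name) VALUES ('A'), ('B'), ('C'); -- IDs 1, 2, 3

DELETE FROM dbo.YourTable;                    -- DELETE does not reset the identity
DBCC CHECKIDENT ('dbo.YourTable', NORESEED);  -- reports the current identity value: 3

DBCC CHECKIDENT ('dbo.YourTable', RESEED, 0); -- reseed so the next insert gets 0 + 1 = 1
INSERT INTO dbo.YourTable (Name) VALUES ('D'); -- ID = 1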

Wednesday, July 4, 2012

Stored Procedure to raise a message

CREATE PROCEDURE [dbo].[usp_Raise_Message]
    @strMessage NVARCHAR(MAX) = NULL -- Message to display
AS
BEGIN
    SET NOCOUNT ON;
    SET @strMessage = CONVERT(NVARCHAR(100), GETDATE()) + ' : ' + ISNULL(@strMessage, 'No Message Raised');
    -- Severity 0 with NOWAIT flushes the message to the client immediately,
    -- which is useful for progress reporting inside long-running batches
    RAISERROR(@strMessage, 0, 1) WITH NOWAIT;
END
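A quick usage example, assuming the procedure above has been created:

EXEC dbo.usp_Raise_Message @strMessage = 'Step 1 completed';
EXEC dbo.usp_Raise_Message; -- prints '<timestamp> : No Message Raised'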

Tuesday, June 26, 2012

Query to find the Database Name and File Size/Space Info

SELECT
    DB_NAME() AS [DATABASENAME],
    a.fileid AS [FILEID],
    a.name AS [LOGICALFILENAME],
    a.filename AS [PHYSICALFILENAME],
    CONVERT(DECIMAL(12,2), ROUND(a.size / 128.000, 2)) AS [FILESIZEINMB],
    CONVERT(DECIMAL(12,2), ROUND(FILEPROPERTY(a.name, 'SpaceUsed') / 128.000, 2)) AS [SPACEUSEDINMB],
    CONVERT(DECIMAL(12,2), ROUND((a.size - FILEPROPERTY(a.name, 'SpaceUsed')) / 128.000, 2)) AS [FREESPACEINMB]
FROM dbo.sysfiles a -- size is in 8 KB pages, hence the division by 128 to get MB
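Note that dbo.sysfiles is a SQL Server 2000-era compatibility view. On SQL Server 2005 and later, the same per-database information is available from the sys.database_files catalog view; a rough equivalent:

SELECT file_id, name AS [LOGICALFILENAME], physical_name,
    CONVERT(DECIMAL(12,2), size / 128.000) AS [FILESIZEINMB],
    CONVERT(DECIMAL(12,2), FILEPROPERTY(name, 'SpaceUsed') / 128.000) AS [SPACEUSEDINMB]
FROM sys.database_files;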

Sunday, June 24, 2012

An Overview of Performance Tuning in SSIS - Buffer Usage and Execution Trees

When you architect data integration solutions, your design decisions not only determine how successfully your solution meets functional requirements, but also how well your solution meets performance requirements. To make the right performance design decisions, you need to understand the performance architecture of your data integration tool and, just as importantly, the techniques that enable you to maximize the tool’s utilization of system resources such as memory and CPU.
SSIS provides a variety of optimization opportunities to help you maximize resource utilization while successfully meeting the needs of your specific data integration scenario.
Before you can take advantage of specific tuning techniques in SSIS packages, it is important to familiarize yourself with the SSIS architecture. This architecture consists of two major components: the run-time engine and the data flow engine.

The run-time engine

The run-time engine is a highly parallel control flow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks. For the most part, the performance of the run-time engine is most heavily influenced by conditions external to SSIS, such as network bandwidth, and interaction with external systems such as database servers, FTP servers, or email servers. When SSIS runs an Execute SQL Task, for example, it sends a call to the target database and then waits for a response from the database server before it continues. In this scenario, the performance of the Execute SQL Task is more dependent on the performance of the query execution than on the SSIS run-time engine.

The data flow engine

When you use SSIS for data integration, in addition to the run-time engine, you use the data flow engine that manages the data pipeline. The data flow engine is invoked by a special task in SSIS called the Data Flow task. When the Data Flow task executes, the SSIS data flow engine extracts data from one or more data sources, performs any necessary transformations on the extracted data, and then delivers that data to one or more destinations.
With data integration solutions, you will likely spend a large part of your performance tuning time optimizing the data flow engine. Like the run-time engine, the data flow engine is influenced by external conditions; however, within SSIS, there are a variety of settings that you can manipulate to tune the data flow engine’s performance based on the requirements of the data integration operations.

Buffer Usage

Behind the scenes, the data flow engine uses a buffer-oriented architecture to efficiently load and manipulate datasets in memory. The benefit of this in-memory processing is that you do not need to physically copy and stage data at each step of the data integration. Rather, the data flow engine manipulates data as it is transferred from source to destination.
As this data flows through the pipeline, SSIS attempts to reuse data from prior buffers as much as possible when additional operations are performed. How buffers are used and reused depends on the type of transformations that you use in a pipeline.
Row Transformations - Row transformations either manipulate data or create new fields using the data that is available in that row. Examples of SSIS components that perform row transformations include Derived Column, Data Conversion, Multicast, and Lookup. While these components might create new columns, row transformations do not create any additional records. Because each output row has a 1:1 relationship with an input row, row transformations are also known as synchronous transformations. Row transformations have the advantage of reusing existing buffers and do not require data to be copied to a new buffer to complete the transformation.
Partially blocking transformations - Partially blocking transformations are often used to combine datasets. They tend to have multiple data inputs. As a result, their output may have the same, greater, or fewer records than the total number of input records. Since the number of input records will likely not match the number of output records, these transformations are also called asynchronous transformations. Examples of partially blocking transformation components available in SSIS include Merge, Merge Join, and Union All. With partially blocking transformations, the output of the transformation is copied into a new buffer and a new thread may be introduced into the data flow.
Blocking transformations - Blocking transformations must read and process all input records before creating any output records. Of all of the transformation types, these transformations perform the most work and can have the greatest impact on available resources. Example components in SSIS include Aggregate and Sort. Like partially blocking transformations, blocking transformations are also considered to be asynchronous. Similarly, when a blocking transformation is encountered in the data flow, a new buffer is created for its output and a new thread is introduced into the data flow.
Transformations are not the only components that can be categorized as synchronous or asynchronous. Sources are a special type of asynchronous component. For example, an RDBMS source component creates two types of buffers: one for the Success output and one for the Error output. By contrast, destinations are a special type of synchronous component. You will see the interactions of source and destination components when you examine the Execution Trees of a package.

Execution Trees

Execution trees demonstrate how your package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created, such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation; however, it is important to note that each new tree may also give you an additional worker thread.
Examine the execution trees in the example depicted in Figure 1 and Table 1, where two Employee datasets are combined and then aggregated for loading into a common destination table.
In Execution Tree 2, SSIS reads data from the Employee OLE DB Source into the pipeline, a Derived Column transformation adds another column, and SSIS passes data to the Union All transformation. All of the operations in this execution tree use the same buffer; data is not copied again once it is read into the OLE DB Source Output.
In Execution Tree 3, SSIS creates a buffer to hold error records from the asynchronous Employee OLE DB Source before loading them into a destination error table.
In Execution Tree 4, SSIS reads data from the Employee Flat File Source and passes it to the Union All. These two operations use the same buffer.
In Execution Tree 5, a buffer is created to hold errors from the asynchronous Employee Flat File Source before loading them into a destination error table.
In Execution Tree 0, the Partially Blocking Union All transformation is executed and a new buffer is created to store the combined data and the aggregate is calculated.
In Execution Tree 1, after the Fully Blocking Aggregate transformation is completed, the output from the Aggregate operation is copied into a new buffer and data is loaded into the OLE DB Destination.
This example demonstrates how execution trees can help you understand buffer usage in a common SSIS package. This example also highlights how Partially Blocking transformations like Union All and Fully Blocking transformations like Aggregate create new buffers and threads whereas Row Transformations like Derived Column do not.
Execution trees are enormously valuable in understanding buffer usage. You can display execution trees for your own packages by turning on package logging, enabling logging for the Data Flow task, and then selecting the Pipeline Execution Tree event. Note that you will not see the execution trees until you execute the package. When you do execute the package, the execution trees appear in the Log Events window in Business Intelligence (BI) Development Studio.

Writers: Elizabeth Vitt, Intellimentum and Hitachi Corporation
Contributors: Donald Farmer, Microsoft Corporation; Ashvini Sharma, Microsoft Corporation; Stacia Misner, Hitachi Consulting

Wednesday, June 20, 2012

Filtered Index

A Filtered Index is a new feature in SQL Server 2008 that indexes only a portion of the rows in a table, which improves query performance and reduces index maintenance and storage costs compared to full-table indexes.

If we have a table with thousands of records and our queries are only concerned with very few of those rows, we should use a Filtered Index.

When we see an index created with a filter, i.e. a WHERE clause, it is a FILTERED INDEX.

Consider these points while creating Filtered Index:

- They can be created only as Nonclustered Indexes
- They cannot be created on views
- They cannot be full-text Indexes


Example:

CREATE NONCLUSTERED INDEX NCI_Dept
ON HR.Emp(EmpID)
WHERE Title = 'Manager';
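A query whose predicate matches the index filter can be satisfied from the much smaller filtered index, for example (using the HR.Emp table from the index definition above):

SELECT EmpID
FROM HR.Emp
WHERE Title = 'Manager';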

Have a look at the Execution Plan and compare the performance before and after the Filtered Index!!

An Introduction to Data Compression in SQL Server 2008 - By Brad M. McGehee

Hi everyone, today I came across this wonderful article by Brad M. McGehee, Director of DBA Education, Red Gate Software, and I would suggest going through this post for an insight into DATA COMPRESSION.

There is one thing every DBA knows with certainty, and that is that databases grow with time. MDFs grow, backups grow, and it never stops. The more data we have, the more work SQL Server has to perform in order to deal with it all; whether it’s executing a query on a table with 10 million rows, or backing up a 5 TB database. Whether we like it or not, we are fighting a losing battle, and DBAs can’t reverse the information explosion. Or can we?

While we can’t stop growth, SQL Server 2008 (Enterprise Edition only) gives us some new tools to help us better deal with all this data, and that is the promise of compression. Given the right circumstances, DBAs can use data compression to reduce the size of our MDFs, and backup compression can help us reduce the amount of space our backups take. Not only does compression reduce physical file sizes, it reduces disk I/O, which can greatly enhance the performance of many database applications, along with database backups.
When we discuss SQL Server compression, we need to think of it two different ways. First, there is data compression, which includes row-level and page-level compression that occurs within the MDF files of our databases. Second, there is backup compression, which occurs only when data is backed up. While both of these are forms of compression, they are architected differently. Because of this, it is important to treat them separately.
Data Compression comes in two different forms:
  • Row-level Data Compression: Row-level data compression is essentially turning fixed length data types into variable length data types, freeing up empty space. It also has the ability to ignore zero and null values, saving additional space. In turn, more rows can fit into a single data page.
  • Page-level Data Compression: Page-level data compression starts with row-level data compression, then adds two additional compression features: prefix and dictionary compression. We will take a look at what this means a little later in this chapter. As you can imagine, page-level compression offers increased data compression over row-level compression alone.
Backup Compression comes in a single form:
  • Backup Compression: Backup compression does not use row-level or page-level data compression. Instead, backup compression occurs only at the time of a backup, and it uses its own proprietary compression technique. Backup compression can be used when using, or not using, data compression, although using backup compression on a database that is already compressed using data compression may not offer additional benefits.
In the next section, we will take a high-level overview of data compression, and then we will drill down into the detail of the different types of compression available with SQL Server 2008.

Data Compression Overview

Data compression has been around for years. For example, who hasn’t zipped a file at some point in their career? While compression isn’t a new technology, it’s new to SQL Server. Unlike zip compression, SQL Server’s data compression does not automatically compress an entire database; instead, data compression can only be used for these database objects:
  • A table stored as a heap
  • A table stored as a clustered index
  • A non-clustered index
  • An indexed view
  • Partitioned tables and indexes
In other words, as the DBA, you must evaluate each of the above objects in your database, decide if you want to compress it, then decide whether you want to compress it using either row-level or page-level compression. Once you have completed this evaluation, you must turn on compression for that object. There is no single switch you can flip that will turn compression on or off for all the objects listed above, although you could write a Transact-SQL script to accomplish this task.
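For example, such a script might enable compression object by object with statements along these lines (the object names here are hypothetical):

-- Row-level compression on a table (heap or clustered index)
ALTER TABLE dbo.SalesOrderDetail REBUILD WITH (DATA_COMPRESSION = ROW);
-- Page-level compression on one specific non-clustered index
ALTER INDEX IX_SalesOrderDetail_ProductID ON dbo.SalesOrderDetail
REBUILD WITH (DATA_COMPRESSION = PAGE);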
Fortunately, other than turning compression on or off for the above objects, you don’t have to do anything else to enable compression. You don’t have to re-architect your database or your application, as data compression is entirely handled under the covers by the SQL Server Storage Engine. When data is passed to the Storage Engine, it is compressed and stored in the designated compressed format (on disk and in the Buffer Cache). When the Storage Engine passes the information to another component of SQL Server, then the Storage Engine has to uncompress it. In other words, every time data has to be passed to or from the Storage Engine, it has to be compressed or uncompressed. While this does take extra CPU overhead to accomplish, in many cases, the amount of disk I/O saved by compression more than makes up for the CPU costs, boosting the overall performance of SQL Server.
Here’s a simplified example. Let’s say that we want to update a row in a table, and that the row we want to update is currently stored on disk in a table that is using row-level data compression. When we execute the UPDATE statement, the Relational Engine (Query Processor) parses, compiles, and optimizes the UPDATE statement, ready to execute it. Before the statement can be executed, the Relational Engine needs the row of data that is currently stored on disk in the compressed format, so the Relational Engine requests the data by asking the Storage Engine to go get it. The Storage Engine (with the help of the SQLOS) goes and gets the compressed data from disk and brings it into the Buffer Cache, where the data continues to remain in its compressed format.
Once the data is in the Buffer Cache, the row is handed off to the Relational Engine from the Storage Engine. During this pass off, the compressed row is uncompressed and given to the Relational Engine to UPDATE. Once the row has been updated, it is then passed back to the Storage Engine, where it is again compressed and stored in the Buffer Cache. At some point, the row will be flushed to disk, where it is stored in its compressed format.
Data compression offers many benefits. Besides the obvious one of reducing the amount of physical disk space required to store data—and the disk I/O needed to write and read it—it also reduces the amount of Buffer Cache memory needed to store data in the Buffer Cache. This in turn allows more data to be stored in the Buffer Cache, reducing the need for SQL Server to access the disk to get data, as the data is now more likely to be in memory than disk, further reducing disk I/O.
Just as data compression offers benefits, it also has some disadvantages. Compression uses up additional CPU cycles. If your server has plenty to spare, then you have no problem. But if your server is already experiencing a CPU bottleneck, then perhaps compression is better left turned off.

Row-Level Data Compression

The simplest method of data compression, row-level compression, works by:
  • Reducing the amount of metadata used to store a row.
  • Storing fixed length numeric data types as if they were variable-length data types. For example, if you store the value 1 in a bigint data type, storage will only take 1 byte, not the 8 bytes that the bigint data type normally takes.
  • Storing CHAR data types as variable-length data types. For example, if you have a CHAR (100) data type, and only store 10 characters in it, blank characters are not stored, thus reducing the space needed to store the data.
  • Not storing NULL or 0 values
Row-level data compression offers less compression than page-level data compression, but it also incurs less overhead, reducing the amount of CPU resources required to implement it.

Page-Level Data Compression

Page-level data compression offers greater compression, but at the expense of greater CPU utilization. It works using these techniques:
  • It starts out by using row-level data compression to get as many rows as it can on a single page.
  • Next, prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference that is stored in the compression information (CI) structure that immediately follows the page header of a data page.
  • And last, dictionary compression is used. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI. One of the major differences between prefix and dictionary compression is that prefix compression is restricted to one column, while dictionary compression works anywhere on a data page.
The amount of compression provided by page-level data compression is highly dependent on the data stored in a table or index. If a lot of the data repeats itself, then compression is more efficient. If the data is more random, then little benefit can be gained using page-level compression.

Data Compression Demo

Data compression can be performed using either SQL Server Management Studio (SSMS) or by using Transact-SQL. For this demo, we will take a look at how you can compress a table that uses a clustered index, using SSMS.

Let’s say that we want to compress the Sales.Customer table (which has a clustered index) in the AdventureWorks database. The first step is to right-click on the table in SSMS, select “Storage,” and then select “Manage Compression.”
Figure 1: SSMS can be used to manage compression.

This brings up the Data Compression Wizard, displayed below.
Figure 2: The Data Compression Wizard, or Transact-SQL commands, can be used to manage data compression.

After clicking “Next,” the wizard displays the following screen, which allows you not only to select the compression type, but also to calculate how much space you will save once compression has been turned on.
Figure 3: Use this screen to select the compression type, and to calculate how much space will be saved.

First, let’s see how much space we will save if we use row-level compression on this table. To find out, click on the drop-down box below “Compression Type,” select “Row,” and then click “Calculate.”
Figure 4: For this table, row-level compression doesn’t offer much compression.

After clicking “Calculate,” the wizard runs and calculates how much space is currently being used, and how much space would be used after row-level compression. As we can see, very little space will be saved, about 1.6%.
Now, let’s see how much compression savings page-level compression offers us for this particular table. Again, I go to the drop-down menu under “Compression Type,” select “Page,” then press “Calculate.”
Figure 5: Page-level compression is higher than row-level compression.

After pressing “Calculate,” we see that compression has improved greatly, now saving about 20% space. At this point, if you should decide to turn on page-level compression for this table, click on the “Next” button.
Figure 6: The wizard allows you several options in which to turn on compression.

At the above screen, you can choose to perform the compression now (not a good idea during busy production times because the initial compression process can be very CPU and disk I/O intensive), schedule it to run later, or just to script the Transact-SQL code so you can run it at your convenience.

Once you have compressed this table (a clustered index), keep in mind that any non-clustered indexes on the table are not automatically compressed for you. Remember, compression is applied on a per-object basis. If you want to compress the non-clustered indexes for this table, you will have to compress each one, one at a time.

While this wizard helps you to see how much compression either method offers, it does not suggest which compression method should be used, nor does it recommend whether compression should be used at all for this object. As the DBA, it will be your job to evaluate each compressible object to determine if it should have compression enabled or not. In other words, you must decide if the benefits of compression outweigh the negatives.
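The wizard's space estimates can also be obtained directly in Transact-SQL through the sp_estimate_data_compression_savings system procedure, for example:

EXEC sp_estimate_data_compression_savings
    @schema_name = 'Sales',
    @object_name = 'Customer',
    @index_id = NULL,           -- NULL = all indexes on the object
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE'; -- or 'ROW' or 'NONE'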

Backup Compression

For years, there have been third-party programs that allow you to compress and speed up SQL Server backups. In most regards, the backup compression included with the Enterprise Edition of SQL Server is very plain vanilla. In fact, if you are already using a third-party backup program, I would suggest you continue using it, because SQL Server 2008 backup compression offers fewer features. Indeed, the only option SQL Server 2008 backup compression offers you is to turn it on or off.
SQL Server 2008 backup compression, like the third-party add-ons, compresses backups, which not only saves backup space, but it can substantially reduce backup times. Unlike data compression, there is very little downside to using backup compression, other than the additional CPU resources required to perform the compression (or decompression during a restore). Assuming that you perform backups during slower times of the day, the additional CPU resources used will not be noticeable.
The time and space savings offered by backup compression depends on the data in your database. If you are heavily using data compression in your databases, or are using Transparent Data Encryption, then using backup compression probably won’t offer you many benefits, as already compressed data, or encrypted data, is not very compressible.
Let’s take a brief look at how you turn on SQL Server 2008 backup compression. While our example will use SSMS, you can use Transact-SQL to perform the same task. To back up AdventureWorks, right-click on the database in SSMS, select “Tasks,” and then select “Back Up,” and the backup dialog box appears.
Figure 7: As with any backup, you can use the backup dialog box to make your selections.

Once you have selected your backup options, next click on “Options,” and the following screen appears.
Figure 8: Backup compression options are limited.
At the top of Figure 8 are the standard backup options, while at the bottom of the screen you see the options for backup compression. Notice that you have only three choices.
The first option, “Use the default server settings” tells the backup to use the server’s default backup compression setting. In SQL Server 2008, there is a new sp_configure option called “backup compression default.” By default, it is set to have backup compression off. If you want, you can set this option so that backup compression is turned on by default. So if you choose the “Use the default server settings” option above, then whatever option is set for the “backup compression default” will be used for the backup.
The “Compress Backup” option turns backup compression on, and the “Do not compress backup” option turns it off. Both of these options override the “backup compression default” server setting, whatever it happens to be.
Once you have chosen your backup compression method, you proceed with the backup just like any other SQL Server backup.
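The same choices can be expressed in Transact-SQL; a brief sketch (the backup path is illustrative):

-- Optionally make compression the server-wide default
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
-- Or override the default for an individual backup
BACKUP DATABASE AdventureWorks
TO DISK = N'C:\Backups\AdventureWorks.bak'
WITH COMPRESSION, INIT;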
If you need to restore a compressed backup, you don’t have to do anything special; it will uncompress itself automatically. Although you can only create compressed backups using the Enterprise Edition of SQL Server 2008, you can restore a compressed backup to any edition of SQL Server 2008. On the other hand, you cannot restore a compressed SQL Server 2008 backup to any previous version of SQL Server.

Summary

In this article, we have learned about the two forms of data compression, and about backup compression. While data compression might seem like a seductive new feature of SQL Server, I highly recommend that it is only used by experienced DBAs. While it offers lots of promise for increased performance, it can just as easily cause performance problems if misused. Backup compression, on the other hand, can be used by DBAs of all skill levels.

Tuesday, June 19, 2012

A Brief Note on VIEWS

A view is nothing but an image table, or virtual table, created on a base table. A view can be created by taking all values from the base table or by taking only selected values from the base table.

Note: If we perform any modifications in the base table, those modifications are automatically reflected in the view, and vice versa.

Syntax:

CREATE VIEW VIEWNAME [WITH ENCRYPTION]
AS SELECT * FROM TABLENAME [WHERE CONDITION]
[WITH CHECK OPTION]


E.g.:
CREATE VIEW V1 AS SELECT * FROM EMP
INSERT INTO V1 VALUES (45, 'ARJUN', 10000, 10)


The above insert statement inserts the values into base table EMP as well as into view v1.

E.g.:

CREATE VIEW V2 AS SELECT * FROM EMP WHERE DEPTNO=10
INSERT INTO V2 VALUES (55, 'NAG', 10000, 10)


The above insert statement inserts the values into base table EMP as well as into view v2.

INSERT INTO V2 VALUES (65, 'NAGARJUN', 20000, 20)


The above insert statement inserts the values into the base table EMP only, not into view V2, because according to the definition of V2 the supplied values are invalid (DEPTNO=20 does not satisfy the view's WHERE clause). It means that values invalid for the view can still be inserted into the base table EMP.

To avoid such operations we need to create the view with 'WITH CHECK OPTION'.
E.g.: CREATE VIEW V3 AS SELECT * FROM EMP WHERE DEPTNO=10
WITH CHECK OPTION


INSERT INTO V3 VALUES (75,’ARUN’,25000,20)

The above insert statement fails: because V3 was created WITH CHECK OPTION, a row that does not satisfy the view's WHERE condition (DEPTNO=20) is rejected, so nothing is inserted into the base table EMP or the view V3.

SP_HELPTEXT: This Stored Procedure is used to display the definition of a specific view.

Syntax: SP_HELPTEXT VIEWNAME
E.g.: SP_HELPTEXT V1
Output: CREATE VIEW V1 AS SELECT * FROM EMP


If we want to hide the definition of a view, use 'WITH ENCRYPTION'.

WITH ENCRYPTION: Once we create a view with 'WITH ENCRYPTION', we cannot find the definition of that view using SP_HELPTEXT.

E.g.:

CREATE VIEW V4 WITH ENCRYPTION
AS SELECT * FROM EMP WHERE DEPTNO=20

SP_HELPTEXT V4

Output: The text for object 'V4' is encrypted.

To open the definition of view V4, we have to follow the approach below:

1. Replace CREATE with ALTER
2. Remove the WITH ENCRYPTION keyword
3. Select the query and press F5
E.g.:

ALTER VIEW V4
AS SELECT * FROM EMP WHERE DEPTNO =20

Creating views based on other views:
SQL Server enables users to create views based on other views. We can nest views in this way up to 32 levels.


Syntax:


CREATE VIEW VIEWNAME [WITH ENCRYPTION]
AS SELECT * FROM VIEWNAME [WHERE CONDITION]
[WITH CHECK OPTION]

E.g.: CREATE VIEW V5 AS SELECT * FROM V1 WHERE DEPTNO=10

Syntax to drop the views:
DROP VIEW VIEWNAME [, ...n]
E.g.:  DROP VIEW V1,V2,V3,V4,V5

Wednesday, February 15, 2012

SQL Interview Questions

What is RDBMS?

Relational Database Management Systems (RDBMS) are database management systems that maintain data records and indices in tables.

What is Normalization?

Database normalization is a data design and organization process applied to data structures, based on rules that help build relational databases. In relational database design, the process of organizing data to minimize redundancy is called normalization. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

What is De-normalization?

De-normalization is the process of attempting to optimize the performance of a database by adding redundant data. It is sometimes necessary because current DBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.

How is the ACID Property Related to Databases?

ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for while evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved.
Atomicity is an all-or-none proposition.
Consistency guarantees that a transaction never leaves your database in a half-finished state.
Isolation keeps transactions separated from each other until they are finished.
Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
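A minimal T-SQL sketch of atomicity, assuming a hypothetical Accounts table; either both updates commit or neither does:

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; -- undo the half-finished transfer
END CATCH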

What are the Different Normalization Forms?

1NF: Eliminate Repeating Groups
Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain.
2NF: Eliminate Redundant Data
If an attribute depends on only part of a multi-valued key, then remove it to a separate table.
3NF: Eliminate Columns Not Dependent On Key
If attributes do not contribute to a description of the key, then remove them to a separate table. All attributes must be directly dependent on the primary key.
BCNF: Boyce-Codd Normal Form
If there are non-trivial dependencies between candidate key attributes, then separate them out into distinct tables.
4NF: Isolate Independent Multiple Relationships
No table may contain two or more 1:n or n:m relationships that are not directly related.
5NF: Isolate Semantically Related Multiple Relationships
There may be practical constraints on information that justify separating logically related many-to-many relationships.
ONF: Optimal Normal Form
A model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
DKNF: Domain-Key Normal Form
A model free from all modification anomalies is said to be in DKNF.
Remember, these normalization guidelines are cumulative. For a database to be in 3NF, it must first fulfill all the criteria of a 2NF and 1NF database.

What is a Stored Procedure?

A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database.
e.g. sp_helpdb, sp_renamedb, sp_depends etc.

What is a Trigger?

A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed; the DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers can be considered to be similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-driven and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the procedure while triggers are implicitly executed. In addition, triggers can also execute stored procedures.
Nested Trigger: A trigger can also contain INSERT, UPDATE and DELETE logic within itself; so when the trigger is fired because of data modification, it can also cause another data modification, thereby firing another trigger. A trigger that contains data modification logic within itself is called a nested trigger.

What are the Different Types of Triggers?

There are two types of Triggers.
1) DML Trigger
There are two types of DML Triggers
a. Instead of Trigger: Instead of Triggers are fired in place of the triggering action such as an insert, update, or delete.
b. After Trigger: After triggers execute following the triggering action, such as an insert, update, or delete.
2) DDL Trigger
This type of trigger is fired against Drop Table, Create Table, Alter Table or Login events. DDL Triggers are always After Triggers.
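A minimal sketch of a DML After Trigger, assuming hypothetical dbo.Emp (EmpID, Sal) and dbo.EmpAudit tables:

CREATE TRIGGER trg_Emp_SalaryAudit ON dbo.Emp
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- 'inserted' and 'deleted' are the pseudo-tables exposed to DML triggers
    INSERT INTO dbo.EmpAudit (EmpID, OldSal, NewSal, ChangedAt)
    SELECT d.EmpID, d.Sal, i.Sal, GETDATE()
    FROM deleted d
    JOIN inserted i ON i.EmpID = d.EmpID
    WHERE i.Sal <> d.Sal;
END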

DWBI Interview Questions (Ref: blog.sqlauthority.com)

1. What is Data Warehousing?
It is a repository of integrated information, available for queries and analysis. A data warehouse is the main repository of an organization’s historical data; it contains the data for management’s decision support system.
2. What is Business Intelligence (BI)?
Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information and sometimes to the information itself. The purpose of BI is to support better business decision making.
3. What is the Difference between Data Warehousing and Business Intelligence?
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management and backup/recovery planning. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions. Typically, the term 'business intelligence' is used to encompass OLAP, data visualization, data mining and query/reporting tools.
4. What is a Dimension Table?
A dimension table is a collection of hierarchies, categories, and logic that users can traverse through hierarchical nodes.
5. What is Dimensional Modeling?
The dimensional data model involves two types of tables and is different from the third normal form. It uses a fact table, which contains the measurements of the business, and dimension tables, which contain the context (the dimensions) of those measurements.
6. What is a Fact Table?
A fact table contains the measurements of a business process and the foreign keys to the dimension tables.
7. What are the Fundamental Stages of Data Warehousing?
There are four different fundamental stages of Data Warehousing.
Offline Operational Databases: Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system’s performance.
Offline Data Warehouse: Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems, and the data is stored in an integrated reporting-oriented data structure
Real Time Data Warehouse: Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking)
Integrated Data Warehouse: Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

8. What are the Different Methods of Loading Dimension tables?
There are two different ways to load data in dimension tables.
Conventional (Slow): All the constraints and keys are validated against the data before it is loaded; this way data integrity is maintained.
Direct (Fast): All the constraints and keys are disabled before the data is loaded. Once data is loaded, it is validated against all the constraints and keys. If data is found invalid or dirty, it is not included in index, and all future processes on this data are skipped.
9. What is Data Mining?
Data Mining is the process of analyzing data from different perspectives and summarizing it into useful information.
10. What is OLTP?
OLTP is an abbreviation of On-Line Transaction Processing. This system is an application that modifies data at the very instant it is received, and it has a large number of concurrent users.
11. What is OLAP?
OLAP is abbreviation of Online Analytical Processing. This system is an application that collects, manages, processes and presents multidimensional data for analysis and management purposes.
12. What is the Difference between OLTP and OLAP?

Data Source

OLTP: Operational data is from original data source of the data.
OLAP: Consolidation data is from various sources.

Process Goal

OLTP: Snapshot of business processes which do fundamental business tasks.
OLAP: Multi-dimensional views of business activities of planning and decision making.

Queries and Process Scripts

OLTP: Simple quick running queries ran by users.
OLAP: Complex long running queries by system to update the aggregated data.

Database Design

OLTP: Normalized small database. Speed will be not an issue because of a small database, and normalization will not degrade performance. This adopts the entity relationship (ER) model and an application-oriented database design.
OLAP: De-normalized large database. Speed is an issue because of a large database and de-normalizing will improve performance as there will be less tables to scan while performing tasks. This adopts star, snowflake or fact constellation mode of subject-oriented database design.

Back up and System Administration

OLTP: Regular Database backup and system administration can do the job.
OLAP: Reloading the OLTP data is considered as a good backup option.

13. What is ODS?
ODS is the abbreviation of Operational Data Store - a database structure that is a repository for near real-time operational data rather than long-term trend data.
14. What is ER Diagram?
Entity Relationship (ER) Diagrams are a major data modeling tool that helps organize the data in your project into entities and define the relationships between the entities. This process enables the analyst to produce a good database structure so that the data can be stored and retrieved in the most efficient manner.
An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between entities in a database. It is a type of diagram used in data modeling for relational databases. These diagrams show the structure of each table and the links between tables.
15. What is ETL?
ETL is abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place, and it doesn’t really matter that that data is in different forms or formats. The data can come from any source. ETL is powerful enough to handle such data disparities. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data – using rules or lookup tables, or creating combinations with other data – to convert it to the desired state. Finally, the load function is used to write the resulting data to a target database.
16. What is VLDB?
VLDB is abbreviation of Very Large Database. For instance, a one-terabyte database can be considered as a VLDB. Typically, these are decision support systems or transaction processing applications serving a large number of users.
17. Is the OLTP Database Design Optimal for a Data Warehouse?
No. OLTP database tables are normalized, and this adds additional time for queries to return results. Additionally, the OLTP database is small; it does not contain data from a long period (many years), which needs to be analyzed. An OLTP system is basically an ER model and not a dimensional model. If a complex query is executed on an OLTP system, it may lead to heavy overhead on the OLTP server and affect the normal business processes.
18. What are Lookup Tables?
A lookup table is a table used against the target table based upon the primary key of the target; it updates the target by allowing through only modified (new or updated) records, based on the lookup condition.
19. What are Aggregate Tables?
An aggregate table contains the summary of existing warehouse data, grouped to certain levels of dimensions. It is always easier to retrieve data from an aggregate table than from the original table, which has millions of records. Aggregate tables reduce the load on the database server, improve query performance, and return results quickly.
20. What is Real-Time Data-Warehousing?
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.
21. What are Conformed Dimensions?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly.
22. How do you Load the Time Dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
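A minimal loader sketch, assuming a hypothetical dbo.DimDate table (SQL Server 2008+ for the DATE type):

DECLARE @d DATE = '2000-01-01';
WHILE @d <= '2009-12-31'
BEGIN
    INSERT INTO dbo.DimDate (DateKey, FullDate, [Year], [Month], [Day])
    VALUES (CONVERT(INT, CONVERT(CHAR(8), @d, 112)), -- yyyymmdd surrogate key
            @d, YEAR(@d), MONTH(@d), DAY(@d));
    SET @d = DATEADD(DAY, 1, @d);
END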

23. What is a Level of Granularity of a Fact Table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. Level of granularity implies the detail you are willing to put for each transactional fact.
24. What are Non-Additive Facts?
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. However, they are not considered as useless. If there are changes in dimensions, the same facts can be useful.
25. What is a Factless Facts Table?
A fact table that does not contain numeric fact columns is called a factless facts table.
26. What are Slowly Changing Dimensions (SCD)?
SCD is the abbreviation of slowly changing dimensions. SCD applies to cases where the attribute for a record varies over time. There are three different types of SCD.
  1. SCD1: The new record replaces the original record. Only one record exists in database – current data.
  2. SCD2: A new record is added into the customer dimension table. Two records exist in the database – current data and previous history data.
  3. SCD3: The original data is modified to include new data. One record exists in database – new information is attached with old information in same row.
27. What is Hybrid Slowly Changing Dimension?
Hybrid SCDs are combination of both SCD 1 and SCD 2. It may happen that in a table, some columns are important and we need to track changes for them, i.e. capture the historical data for them, whereas in some columns even if the data changes, we do not care.
28. What is MDS?
Master Data Services helps enterprises standardize the data people rely on to make critical business decisions. With Master Data Services, IT organizations can centrally manage critical data assets company wide and across diverse systems, enable more people to securely manage master data directly, and ensure the integrity of information over time.
29. What is a Data Mart?
A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data.
30. What is a Surrogate Key?
A surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it should be unique for each row in the table. It is useful because the natural primary key can change, which makes updates more difficult. Surrogate keys are always integer or numeric.
31. What is Junk Dimension?
A number of very small dimensions may get lumped together to form a single dimension, i.e. a junk dimension; the attributes are not closely related. Grouping random flags and text attributes into a dimension and moving them to a separate sub-dimension is known as a junk dimension.
32. What is Degenerate Dimension Table?
If a table contains values, which are neither dimension nor measures, then it is called a degenerate dimension table.
33. Why is Data Modeling Important?
Data modeling is probably the most labor intensive and time consuming part of the development process. The goal of the data model is to make sure that the all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end users.
In computer science, data modeling is the process of creating a data model by applying a data model theory to create a data model instance. A data model theory is a formal data model description. In data modeling, we are structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure.
34. What is Difference between ER Modeling and Dimensional Modeling?
ER modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
35. What is a BUS Schema?
A BUS schema consists of a master suite of conformed dimensions and standardized definitions of facts.
36. What is a Star Schema?
Star schema is a way of organizing the tables such that results can be retrieved from the database quickly in the warehouse environment.
37. What is a Snowflake Schema?
In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.
38. What are the Differences between the Star and Snowflake Schemas?
Star schema: A single fact table with N dimensions; all dimensions are linked directly to the fact table. This schema is de-normalized and results in simple joins, less complex queries, and faster results.
Snowflake schema: Any dimension with extended dimensions is known as a snowflake schema; dimensions may be interlinked or may have one-to-many relationships with other tables. This schema is normalized and results in complex joins, leading to very complex queries (as well as slower results).