Saturday, March 13, 2010

Sparse Columns and Column Sets in SQL 2008

A common scenario which I have come across many times in my projects is the case where we need to design a table to store the attributes related to an entity especially when the number of attributes are large and there will be only very few attributes existing for an entity at any one time. The usual method in such cases is to go for EAV approach where we use a generic table with columns attributename and attributevalue and held the attributes themselves along with their values as rows inside the table. The issue with this approach is that every time we want data to be retrived we require some kind of cross tabbed query to get the values in rows as columns.

In SQL 2008, We do have a new feature called Sparse columns which can be used in scenarios like above. The Sparse columns provide an optimized way of storing null values.They're identified by specifying keyword SPARSE while declaring in table using CREATE TABLE or ALTER TABLE.The sparse columns takes no storage space for storing the NULL values.

Consider the following example

CREATE TABLE Product
(
ProductID int IDENTITY PRIMARY KEY,
PRoductName varchar(100),
ProductAttrib1 int SPARSE NULL,
ProductAttrib2 varchar(50) SPARSE NULL,
ProductAttrib3 int SPARSE NULL,
ProductAttrib4 int SPARSE NULL,
ProductAttrib5 datetime SPARSE NULL
)


In the above table definition we have specified all the ProductAttrib columns as of type sparse. We can do DML manipulations on sparse columns in two ways. One is same as way we do for other columns like

INSERT INTO Product(PRoductName,ProductAttrib1,ProductAttrib4)
VALUES('Product1',30,75)


Another way to do DML operations like INSERT/UPDATE is by means of column set

COLUMN SETS
--------------
A column set is XML type column which can created on the table and which combines all the sparse columns of a table into a structured output. Its like a calculated column and it gives the data in all the sparse columns in structured xml format. We can INSERT/UPDATE values of several sparse columns together by modifying value for single column set created.
As an example lets see the effect of creating a column set in our earlier table


CREATE TABLE Product
(
ProductID int IDENTITY PRIMARY KEY,
PRoductName varchar(100),
ProductAttrib1 int SPARSE NULL,
ProductAttrib2 varchar(50) SPARSE NULL,
ProductAttrib3 int SPARSE NULL,
ProductAttrib4 int SPARSE NULL,
ProductAttrib5 datetime SPARSE NULL,
AttributeConsolidated xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
)

Now lets insert some values


INSERT INTO Product (PRoductName,AttributeConsolidated)
VALUES ('Product1','30752010-02-25T17:30:00'),
('Product2','220115112'),
('Product3','TestVal1232010-03-06T22:45:14'),
('Product4','67kwjfw2010-02-11T06:33:00')


Now lets see the inserted data



As you see in the above it retrives the value as XML in column set and shows the non null values of attributes instead of retrieving individual columns themselves. However, if you still want the values to be returned as individual column you can use query like below

SELECT ProductName,ProductAttrib1 ,
ProductAttrib2,
ProductAttrib3,
ProductAttrib4,
ProductAttrib5
FROM Product

Now see the result


Similarly for update operation we can either do update of all sparse columns together by modifying the associated column set as

Now see the output


Notice how value of ProductAttrib2 has become NULL. As we're modifying value through columnset it will assume NULL value for all the unspecified sparse columns and update them. So make sure you specify values for all the sparse columns for which you dont want to leave NULLAlternatively, we could specify column directly in UPDATE in which case it will only modify specified column alone

UPDATE Product
SET ProductAttrib2='kwjfw'
WHERE ProductID=4

See the output below.It didn't affect other columns at all