Friday, October 24, 2014

IIf function T-SQL vs SSRS Behavioral Difference

This blog explains the behavioral difference of the IIf function between Reporting Services and Transact-SQL.
The IIF function was a new addition in SQL Server 2012 which provides a simplified way to implement conditional logic. Though a similar function has been available in SSRS from 2005 onwards, there is a small difference in functionality between the IIf function in SSRS and the one in T-SQL.
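For context, here is a minimal T-SQL sketch of the function (the values are made up for illustration). On the T-SQL side, IIF is simply shorthand for a two-branch CASE expression:

declare @Qty int = 5

SELECT IIF(@Qty > 10,'High','Low') AS UsingIIF, -- SQL Server 2012+ shorthand
CASE WHEN @Qty > 10 THEN 'High' ELSE 'Low' END AS EquivalentCase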

I've explained this difference in the below TechNet wiki article using a few scenarios:

http://social.technet.microsoft.com/wiki/contents/articles/28000.behavioral-difference-of-iif-function-in-t-sql-compared-to-ssrs.aspx

Feel free to go through it and let me know your valuable feedback.

Saturday, October 11, 2014

Know Your DMVs : sys.dm_exec_procedure_stats


Context 

The impetus behind this post is a technical conversation I had with a fellow team member a couple of days back. He was doing performance analysis of a few procedures in a data warehouse and was using the DMV sys.dm_exec_procedure_stats for the purpose. He had some concerns over the results he was getting and came to discuss them with me.
His major issues were the following:
  1. The names of the procedures returned by the DMV are not always correct
  2. When trying to join to the sys.procedures view to get the definition, some of the rows go missing from the output.
He shared the query he was using and having issues with, shown below:

select OBJECT_NAME(s.object_id) AS ProcName,
last_execution_time,
execution_count,
last_elapsed_time,
last_execution_time,
last_logical_reads,
last_physical_reads,
last_worker_time
from sys.dm_exec_procedure_stats s

Illustration

I thought of analyzing this myself, so I set up a sample database with a few procedures and executed them. Then I ran the above query to capture their stats and got the below output.
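A minimal repro sketch (database and procedure names here are hypothetical) could look like this:

-- create a couple of simple procedures and execute them so their plans land in the cache
CREATE PROCEDURE dbo.usp_SampleProc1 AS SELECT 1 AS Col;
GO
CREATE PROCEDURE dbo.usp_SampleProc2 AS SELECT GETDATE() AS Col;
GO
EXEC dbo.usp_SampleProc1;
EXEC dbo.usp_SampleProc2;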


As specified earlier, I also got NULL values for a few objects.
I tried adding a join to sys.procedures and got the following result,
which was the same as what my colleague described.

Solution

I realized there was something fundamentally wrong in how we were using the query.
I analysed the DMV carefully and understood the issue.
Just do a SELECT * on the DMV and you'll see the below result,
which clearly indicates the following:
  • The DMV is server scoped: it returns the details of all cached procedures from all databases within the server instance, and the database_id column indicates the database each procedure belongs to. Joining it with database-scoped catalog views like sys.procedures will therefore only return matches for objects that belong to the database from which the query is executed. Similarly, functions like OBJECT_NAME resolve the id against the current database scope by default, so if another object in the executing database happens to have the same id, its name will be returned instead of the actual object's name.
  • As shown by the above result, the DMV also returns details of extended stored procedures, so if we want to see details of regular procedures alone we need to add a filter on the type or type_desc column.
Based on these observations I tweaked the query as below and executed it:

select OBJECT_NAME(s.object_id) AS ProcName,
st.text AS ProcDefinition,
last_execution_time,
execution_count,
last_elapsed_time,
last_logical_reads,
last_physical_reads,
last_worker_time
from sys.dm_exec_procedure_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS st
WHERE database_id = DB_ID() -- only procedures belonging to the current database
AND type = 'P' -- regular stored procedures only (filters out extended stored procedures)
This causes the query to retrieve details only for the cached procedures which are in the current database scope, so you will get the correct object names listed. It also causes extended procedures to get filtered out, as they will have NULL values in the database_id column.
You will get the result as below

If you want the full list of procedures for the instance, you need a slight tweak to the above query: pass database_id as the second argument to the OBJECT_NAME function so that it resolves the name from the correct database scope, as shown below.
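A sketch of what that tweaked query could look like (same column list as above, with the database name added for readability):

select OBJECT_NAME(s.object_id, s.database_id) AS ProcName, -- resolve the name in the owning database
DB_NAME(s.database_id) AS DatabaseName,
st.text AS ProcDefinition,
last_execution_time,
execution_count,
last_elapsed_time,
last_logical_reads,
last_physical_reads,
last_worker_time
from sys.dm_exec_procedure_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS st
WHERE s.type = 'P' -- regular stored procedures only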

More Info

Now if you analyze this DMV carefully you will understand that it is a very useful DMV which gives details on the performance statistics of cached procedures, like execution count, execution time, worker time, logical/physical reads etc. One thing to note is that the result contains the details of a procedure only for as long as its plan stays cached. Remove the procedure's plan from the cache and it will get excluded from the result, as shown.
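One way to try this out yourself is to evict a single procedure's plan using its plan_handle (a sketch, using the hypothetical procedure name from earlier; clearing plans affects the whole instance, so do this only on a test server):

-- find the plan_handle of the procedure's cached plan and free just that plan
DECLARE @plan_handle varbinary(64);

SELECT @plan_handle = plan_handle
FROM sys.dm_exec_procedure_stats
WHERE database_id = DB_ID()
AND object_id = OBJECT_ID('dbo.usp_SampleProc1'); -- hypothetical procedure name

DBCC FREEPROCCACHE (@plan_handle);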


As you can see from the above, the procedure got excluded from the result.
Hope this article helps you understand more about the DMV sys.dm_exec_procedure_stats and its usage, and helps you avoid issues like the above caused by its incorrect usage.






Friday, September 26, 2014

T-SQL Tips: Multifaceted MERGE Statement

The MERGE statement was a very useful addition introduced in SQL Server 2008. It comes in handy in lots of day-to-day scenarios like normal UPSERT (UPDATE + INSERT) operations, slowly changing dimension processing etc.
This blog explains two not-so-common scenarios where the MERGE statement comes in handy.

1. Generating an "Incremental" Data Population Script
In a typical project there is a need to create data scripts for populating generic values like configurations, master data entries, audit/control table entries etc. Over the course of the project the data scripts keep growing with new values. Normally this is handled by adding new INSERT scripts each time, so at the end of the project, when doing the final migration to production, we end up with a set of INSERT statements which may even span multiple files.
This can be replaced by a single MERGE statement to which we keep adding values; executing it in the database applies only the incremental changes.
The statement would look like below:

MERGE INTO Table  AS d
USING (VALUES('Value1','Value2','Value3')) AS s (Col1, Col2, Col3)
ON s.Col1 = d.Col1
WHEN MATCHED
AND (
s.Col2 <> d.Col2
OR s.Col3 <> d.Col3
)
THEN 
UPDATE SET Col2 = s.Col2, Col3 = s.Col3
WHEN NOT MATCHED
THEN 
INSERT (Col1,Col2,Col3)
VALUES(s.Col1,s.Col2,s.Col3);
Illustration
Consider the code below

declare @t table
(
id int identity(1,1),
col1 varchar(50),
col2 varchar(50),
col3 int
)

MERGE @t AS d
USING (VALUES('Value1','Cat1',100),
('Value2','Cat1',150),
('Value1','Cat2',300),
('Value2','Cat2',225),
('Value3','Cat2',430),
('Value1','Cat3',520)
) AS s(col1,col2,col3)
ON s.col1 = d.col1 AND s.col2 = d.col2 -- (col1, col2) together form the key here, since col1 values repeat across categories
WHEN MATCHED
AND (
COALESCE(s.col2,'') <> COALESCE(d.col2,'')
OR COALESCE(s.col3,'') <> COALESCE(d.col3,'')
)
THEN 
UPDATE 
SET col2 = s.col2,col3= s.col3
WHEN NOT MATCHED BY TARGET
THEN
INSERT (col1,col2,col3)
VALUES(s.col1,s.col2,s.col3);

SELECT * FROM @t

Now suppose you want to add three more values; you can just extend them within the same MERGE statement as below:

declare @t table
(
id int identity(1,1),
col1 varchar(50),
col2 varchar(50),
col3 int
)

MERGE @t AS d
USING (VALUES('Value1','Cat1',100),
('Value2','Cat1',150),
('Value1','Cat2',300),
('Value2','Cat2',225),
('Value3','Cat2',430),
('Value1','Cat3',550),
('Value2','Cat3',610),
('Value1','Cat4',735),
('Value2','Cat4',821)
) AS s(col1,col2,col3)
ON s.col1 = d.col1 AND s.col2 = d.col2 -- (col1, col2) together form the key here, since col1 values repeat across categories
WHEN MATCHED
AND (
COALESCE(s.col2,'') <> COALESCE(d.col2,'')
OR COALESCE(s.col3,'') <> COALESCE(d.col3,'')
)
THEN 
UPDATE 
SET col2 = s.col2,col3= s.col3
WHEN NOT MATCHED BY TARGET
THEN
INSERT (col1,col2,col3)
VALUES(s.col1,s.col2,s.col3);


SELECT * FROM @t

I've marked the changed rows in green. I've added three new rows and changed the value of an existing row. Executing this, you can see that only the incremental changes, i.e. the new rows plus the modified ones, get applied to the table. You can keep extending the script like this, and at the end you'll have a single MERGE statement in a single file with all the values you need to migrate to your production environment, rather than a set of small incremental INSERT scripts.

2. Populating Master Child Tables

Another major application of the MERGE statement is in cases where you want to populate master-child tables with external data. I had previously blogged about a similar scenario here:

http://visakhm.blogspot.in/2010/04/using-xml-to-batch-load-master-child.html

Now let's see how the MERGE statement can be used to achieve the same requirement. MERGE provides a much more compact way of doing this without the need for any intermediate temporary table.

The XML looks like the one in the below file:
https://drive.google.com/file/d/0B4ZDNhljf8tQbHdfZUFteXNRcDg/edit?usp=sharing
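The code further below assumes an XML variable @x loaded with the contents of that file. A minimal hypothetical sample of the same shape (values invented purely for illustration) could be declared as:

declare @x xml =
'<Orders>
  <Order>
    <OrderDesc>Sample order 1</OrderDesc>
    <OrderDate>2014-09-01T10:00:00</OrderDate>
    <CustomerDesc>Customer A</CustomerDesc>
    <ShippedDate>2014-09-03T00:00:00</ShippedDate>
    <WebOrder>1</WebOrder>
    <OrderItem>
      <ItemDesc>Item 1</ItemDesc>
      <Qty>2</Qty>
      <ItemPrice>10.50</ItemPrice>
    </OrderItem>
    <OrderItem>
      <ItemDesc>Item 2</ItemDesc>
      <Qty>1</Qty>
      <ItemPrice>25.00</ItemPrice>
    </OrderItem>
  </Order>
</Orders>'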

The tables can be defined as below:
declare @order table
(
OrderID int  IDENTITY(100000,1),
OrderDesc varchar(1000),
OrderDate datetime,
CustomerDesc varchar(1000),
ShippedDate datetime,
WebOrder bit
)
declare @OrderItems table
(
OrderItemID int IDENTITY(1000,1),
OrderID int,
ItemDesc varchar(100),
Qty int,
ItemPrice decimal(15,2)
)

For illustration I've used table variables, but in the actual case these would be permanent tables.
Now let's see how MERGE can be applied in the above scenario.

Illustration
The code will look like below


INSERT INTO @OrderItems
(
OrderID,
ItemDesc,
Qty,
ItemPrice
)
SELECT *
FROM
(
MERGE INTO @order AS d
USING (SELECT t.u.value('OrderDesc[1]','varchar(100)') AS OrderDesc,
       t.u.value('OrderDate[1]','datetime') AS OrderDate,
       t.u.value('CustomerDesc[1]','varchar(1000)') AS CustomerDesc,
       t.u.value('ShippedDate[1]','datetime') AS ShippedDate,
       t.u.value('WebOrder[1]','bit') AS WebOrder,
       m.n.value('ItemDesc[1]','varchar(100)') AS ItemDesc,
       m.n.value('Qty[1]','int') AS Qty,
       m.n.value('ItemPrice[1]','decimal(15,2)') AS ItemPrice  
       FROM @x.nodes('/Orders/Order')t(u)
       CROSS APPLY u.nodes('OrderItem')m(n)) AS s
 ON d.OrderDesc = s.OrderDesc
 AND d.OrderDate = s.OrderDate
 WHEN NOT MATCHED
 THEN
 INSERT ( 
 OrderDesc ,
OrderDate ,
CustomerDesc ,
ShippedDate ,
WebOrder
)
VALUES
(
s.OrderDesc ,
s.OrderDate ,
s.CustomerDesc ,
s.ShippedDate ,
s.WebOrder
)
OUTPUT inserted.OrderID,
s.ItemDesc,
s.Qty,
s.ItemPrice
)t (OrderID,
ItemDesc,
Qty,
ItemPrice);

Now check the output and you can see the below



As seen above, the data gets populated into the master and the child table while preserving referential integrity. This is made possible through the use of the OUTPUT clause with MERGE. When used with MERGE, the OUTPUT clause exposes the columns of the source table/query in addition to the INSERTED and DELETED table columns, which enables us to capture the generated id values from the parent table as well as the required column values for the child table from the source query within the same statement. This is then fed to a nested INSERT to populate the child table. This is a scenario where MERGE comes in really handy.
Hope you enjoyed this article on two not-so-common practical scenarios where MERGE can be applied. As always, feel free to respond with your clarifications/comments.
Will be back with a new T-SQL tip soon!

Wednesday, September 17, 2014

T-SQL Tips: Fun with date FORMATing

FORMAT was a new function introduced in SQL Server 2012 which can be used to format date and numeric values into a variety of different formats and styles.
This blog discusses some of the cool date formatting tricks that can be done using the FORMAT function.

1. Display the date value based on a language/locale culture

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'d','en-GB') AS BritishFormat,FORMAT(@dt,'D','en-US') AS UsFormat,
FORMAT(@dt,'D','zh-cn') AS ChineseFormat,FORMAT(@dt,'D','ta-IN') AS TamilIndiaFormat,
FORMAT(@dt,'D','ml-IN') AS MalayalamIndiaFormat



2. Display date as per any of the custom formats

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'dd/MM/yyyy hh:mm:ss tt') AS DateWithTime12h,
FORMAT(@dt,'dd MMM yyyy HH:mm:ss') AS DateWithTime24h,
FORMAT(@dt,'HH:mm:ss.fff') AS TimeAloneWithMillisec,
FORMAT(@dt,'yyyy-MM-dd HH:mm:ss.fff zz') AS DateTimeWithOffset,
FORMAT(@dt,'dddd dd MMMM yyyy gg') AS DayNameWithDateEra


3. Display component parts of a date value based on custom formats

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'yyyy\MM') AS Period,
FORMAT(@dt,'hh:mm:ss tt') AS Time12h,
FORMAT(@dt,'HH:mm:ss') AS Time24h,
FORMAT(@dt,'dddd dd\t\h MMMM') AS DayMonthDate,
FORMAT(@dt,'HH \h mm \min ss \sec') AS TimeElapsedSinceDayStart,
FORMAT(@dt,'yyyy-MM-ddTHH:mm:sszzz') AS XMLDateFormat,
FORMAT(@dt,'yyyy-MM-ddTHH:mm:ssZ') AS XMLDateFormatUTC


From the above we can see that FORMAT is a very useful function which provides the ability to generate flexible presentation formats from a given date value.

Saturday, September 13, 2014

Troubleshooting SQLServer CDC Service for Oracle Capturing Only Changed Column Values for Updates Issue

Recently, for one of my data warehouse projects, we had a scenario which required getting data from an Oracle server. Since this Oracle server was a high-end OLTP system with high-frequency DML operations, the impact on it had to be kept to a minimum.
For this purpose we made use of the CDC Service for Oracle to replicate the changes from the Oracle server to our SQL Server data warehouse. This worked fine for us until we faced an issue with data modifications happening at the Oracle end.
We could see that CDC captured only the changed columns for UPDATE operations happening on the Oracle server, leaving all the other columns in the table as NULL values. This threw off our logic for capturing the changed information from the Oracle delta data.
What we didn't understand was what made this issue suddenly crop up, as this had been working well at the time of implementation. I did some analysis, found the reason, and fixed the issue. I'm sharing it via this blog to benefit others who may get stuck with similar issues.
The key requirement for CDC to work is that the Oracle server should be set up in ARCHIVELOG mode with supplemental logging enabled. This is because CDC makes use of the redo logs to capture the change data and replicate it to the SQL Server database. So my analysis centred around the supplemental logging settings on the Oracle server.
Through the below link
http://cglendenningoracle.blogspot.in/2009/09/what-is-oracle-supplemental-logging.html
I learned that there are two levels of supplemental logging, and for CDC to get enough information from the redo logs and capture it outside the Oracle instance, database-level identification key logging has to be enabled. The current settings can be read from the V$DATABASE view in Oracle, which has a column corresponding to each of the available options (a sample check query is shown after the list), like
SUPPLEMENTAL_LOG_DATA_FK,
SUPPLEMENTAL_LOG_DATA_PK, 
SUPPLEMENTAL_LOG_DATA_ALL etc. 
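A quick way to check these flags on the Oracle side (a sketch, assuming you have SELECT access to V$DATABASE):

SELECT supplemental_log_data_min,
       supplemental_log_data_pk,
       supplemental_log_data_fk,
       supplemental_log_data_all
FROM   v$database;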
I checked the values and found that SUPPLEMENTAL_LOG_DATA_ALL was set to NO. So I tried setting it to YES using the below statement:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
I made some changes in the source tables to check the outcome and found that CDC was able to pick up data in the other columns too from then on, which confirmed that the issue had been fixed by tweaking the setting. Apparently what had happened was that the archive log had become full on the Oracle server, and after that was fixed the supplemental logging settings had somehow got reset to their default values. Setting them back to the original state fixed the issue. So whenever you face similar issues with CDC not being able to get the required data from the redo logs, make sure you check the V$DATABASE view for the supplemental logging settings for the various types of columns, and fix them with the ALTER DATABASE command in case they got reset.