Saturday, October 11, 2014

Know Your DMVs: sys.dm_exec_procedure_stats


Context 

The impetus behind this post is a technical conversation I had with a fellow team member a couple of days back. He was doing performance analysis of a few procedures in a data warehouse and was using the DMV sys.dm_exec_procedure_stats for the purpose. He had some concerns over the results he was getting and came to discuss them with me.
His major issues were the following
  1. The names of the procedures returned by the DMV were not always correct
  2. When joining to the sys.procedures view to get the procedure definition, some rows went missing from the output
He shared the query below, which he was using and having issues with


select OBJECT_NAME(s.object_id) AS ProcName,
last_execution_time,
execution_count,
last_elapsed_time,
last_logical_reads,
last_physical_reads,
last_worker_time
from sys.dm_exec_procedure_stats s

Illustration

I decided to analyze this myself, so I set up a sample database with a few procedures and executed them. Then I ran the above query to capture their stats and got the output below


As my colleague had described, I also got NULL values for a few of the objects.
I then tried adding a join to sys.procedures and got the following result,
which matched what my colleague had reported

Solution

I realized there was something fundamentally wrong in the way we were using the query.
I analyzed the DMV carefully and understood the issue.
Just do a SELECT * on the DMV (a trimmed-down sketch of such a query follows the bullet points below) and you'll see a result
which clearly indicates the following
  • The DMV is server-scoped: it returns the details of all cached procedures from every database on the instance, regardless of the database context you run it from. The database_id column indicates the database to which each procedure belongs. As such, joining to catalog views like sys.procedures will only return matches for objects that belong to the database from which the query is executed. Likewise, functions like OBJECT_NAME resolve object IDs against the current database by default, so if another object in the executing database happens to have the same ID, its name will be returned instead of the actual object's name.
  • As the result also shows, the DMV includes extended stored procedures as well, so if we want details of ordinary procedures alone we need to add a filter on the type or type_desc column.
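For reference, here is that trimmed-down inspection query as a sketch (pick whichever columns you need):

SELECT DB_NAME(database_id) AS DatabaseName,
       database_id,
       object_id,
       type,
       type_desc,
       cached_time,
       execution_count
FROM sys.dm_exec_procedure_stats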
Based on these observations I tweaked the query as below and executed it

select OBJECT_NAME(s.object_id) AS ProcName,
st.text AS ProcDefinition,
s.last_execution_time,
s.execution_count,
s.last_elapsed_time,
s.last_logical_reads,
s.last_physical_reads,
s.last_worker_time
from sys.dm_exec_procedure_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS st
WHERE s.database_id = DB_ID()
AND s.type = 'P'
This causes the query to retrieve details only for the cached procedures that belong to the current database, so the correct object names are listed. It also filters out extended stored procedures, as they have NULL values in the database_id column.
You will get a result like the one below

If you want the full list of procedures across the whole instance, you need a slight tweak to the above query: pass database_id as the second argument to OBJECT_NAME so that the name is resolved against the correct database.
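A sketch of that variant (reconstructed, since the original screenshot isn't reproduced here):

select OBJECT_NAME(s.object_id, s.database_id) AS ProcName,
DB_NAME(s.database_id) AS DatabaseName,
st.text AS ProcDefinition,
s.last_execution_time,
s.execution_count,
s.last_elapsed_time,
s.last_logical_reads,
s.last_physical_reads,
s.last_worker_time
from sys.dm_exec_procedure_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS st
WHERE s.type = 'P'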

More Info

If you analyze this DMV further, you will see that it provides very useful performance statistics for cached procedures: execution count, execution time, worker time, logical/physical reads and so on. One thing to note is that the result contains details of a procedure only while its plan remains in the cache. Remove the procedure's plan from the cache and it is excluded from the result, as shown.
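One way to try this yourself is to evict a single procedure's plan from the cache, as in the sketch below (dbo.usp_SampleProc is a hypothetical name; substitute your own procedure):

DECLARE @plan_handle varbinary(64)

SELECT @plan_handle = ps.plan_handle
FROM sys.dm_exec_procedure_stats ps
WHERE ps.database_id = DB_ID()
AND ps.object_id = OBJECT_ID('dbo.usp_SampleProc') -- hypothetical procedure name

DBCC FREEPROCCACHE (@plan_handle) -- removes only this plan from the cache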


As you can see above, the procedure was excluded from the result.
I hope this article helps you understand the DMV sys.dm_exec_procedure_stats and its usage, and helps you avoid issues like the ones above caused by using it incorrectly.


Friday, September 26, 2014

T-SQL Tips: Multifaceted MERGE Statement

The MERGE statement was a very useful addition introduced in SQL Server 2008. It comes in handy in many day-to-day scenarios like UPSERT (UPDATE + INSERT) operations, slowly changing dimension processing, etc.
This blog explains two not-so-common scenarios where the MERGE statement comes in handy.

1. Generating an "Incremental" Data Population Script
In a typical project there is a need to create data scripts for populating generic values like configuration settings, master data entries, audit/control table entries, etc. Over the course of the project these data scripts keep growing as new values are added. Normally this is handled by adding new INSERT scripts each time, so at the end of the project the final migration to production consists of a set of INSERT statements that may even span multiple files.
This can be replaced by a single MERGE statement to which we keep adding values; each time it is executed against the database it applies only the incremental changes.
The statement would look like the one below

MERGE INTO Table  AS d
USING (VALUES('Value1','Value2','Value3')) AS s (Col1, Col2, Col3)
ON s.Col1 = d.Col1
WHEN MATCHED
AND (
s.Col2 <> d.Col2
OR s.Col3 <> d.Col3
)
THEN 
UPDATE SET Col2 = s.Col2, Col3 = s.Col3
WHEN NOT MATCHED
THEN 
INSERT (Col1,Col2,Col3)
VALUES(s.Col1,s.Col2,s.Col3);
Illustration
Consider the code below

declare @t table
(
id int identity(1,1),
col1 varchar(50),
col2 varchar(50),
col3 int
)

MERGE @t AS d
USING (VALUES('Value1','Cat1',100),
('Value2','Cat1',150),
('Value1','Cat2',300),
('Value2','Cat2',225),
('Value3','Cat2',430),
('Value1','Cat3',520)
) AS s(col1,col2,col3)
ON s.col1 = d.col1
WHEN MATCHED
AND (
COALESCE(s.col2,'') <> COALESCE(d.col2,'')
OR COALESCE(s.col3,'') <> COALESCE(d.col3,'')
)
THEN 
UPDATE 
SET col2 = s.col2,col3= s.col3
WHEN NOT MATCHED BY TARGET
THEN
INSERT (col1,col2,col3)
VALUES(s.col1,s.col2,s.col3);

SELECT * FROM @t

Now suppose you want to add three more rows: you can simply extend the VALUES list within the same MERGE statement, as below

declare @t table
(
id int identity(1,1),
col1 varchar(50),
col2 varchar(50),
col3 int
)

MERGE @t AS d
USING (VALUES('Value1','Cat1',100),
('Value2','Cat1',150),
('Value1','Cat2',300),
('Value2','Cat2',225),
('Value3','Cat2',430),
('Value1','Cat3',550),
('Value2','Cat3',610),
('Value1','Cat4',735),
('Value2','Cat4',821)
) AS s(col1,col2,col3)
ON s.col1 = d.col1
WHEN MATCHED
AND (
COALESCE(s.col2,'') <> COALESCE(d.col2,'')
OR COALESCE(s.col3,'') <> COALESCE(d.col3,'')
)
THEN 
UPDATE 
SET col2 = s.col2,col3= s.col3
WHEN NOT MATCHED BY TARGET
THEN
INSERT (col1,col2,col3)
VALUES(s.col1,s.col2,s.col3);


SELECT * FROM @t

I've marked the changed rows in green: three new rows were added and the value of one existing row was changed. Executing this, you can see that only the incremental changes, i.e. the new rows plus the modified ones, get applied to the table. You can keep extending the script like this, and at the end you'll have a single MERGE statement in a single file with all the values you need to migrate to your production environment, rather than a set of small incremental INSERT scripts.

2. Populating Master Child Tables

Another major application of the MERGE statement is when you want to populate master-child tables from some external data. I had previously blogged about a similar scenario here

http://visakhm.blogspot.in/2010/04/using-xml-to-batch-load-master-child.html

Now let's see how the MERGE statement can be used to achieve the same requirement. MERGE provides a much more compact way of doing this without the need for any intermediate temporary table.

The XML looks like the one in the file below
https://drive.google.com/file/d/0B4ZDNhljf8tQbHdfZUFteXNRcDg/edit?usp=sharing
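The file itself isn't reproduced here, but judging from the element names used in the query further below, loading it into an xml variable would look roughly like this (the values are purely illustrative):

DECLARE @x xml = N'
<Orders>
  <Order>
    <OrderDesc>Sample order</OrderDesc>
    <OrderDate>2014-09-01T10:30:00</OrderDate>
    <CustomerDesc>Sample customer</CustomerDesc>
    <ShippedDate>2014-09-03T00:00:00</ShippedDate>
    <WebOrder>1</WebOrder>
    <OrderItem>
      <ItemDesc>Item A</ItemDesc>
      <Qty>2</Qty>
      <ItemPrice>10.50</ItemPrice>
    </OrderItem>
    <OrderItem>
      <ItemDesc>Item B</ItemDesc>
      <Qty>1</Qty>
      <ItemPrice>25.00</ItemPrice>
    </OrderItem>
  </Order>
</Orders>'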

The tables can be defined as below
declare @order table
(
OrderID int  IDENTITY(100000,1),
OrderDesc varchar(1000),
OrderDate datetime,
CustomerDesc varchar(1000),
ShippedDate datetime,
WebOrder bit
)
declare @OrderItems table
(
OrderItemID int IDENTITY(1000,1),
OrderID int,
ItemDesc varchar(100),
Qty int,
ItemPrice decimal(15,2)
)

For illustration I've used table variables, but in the actual case these would be permanent tables.
Now let's see how MERGE can be applied in the above scenario.

Illustration
The code will look like the below. Note that @x is an xml variable assumed to already hold the order data from the linked file (its structure is sketched above).


INSERT INTO @OrderItems
(
OrderID,
ItemDesc,
Qty,
ItemPrice
)
SELECT *
FROM
(
MERGE INTO @order AS d
USING (SELECT t.u.value('OrderDesc[1]','varchar(100)') AS OrderDesc,
       t.u.value('OrderDate[1]','datetime') AS OrderDate,
       t.u.value('CustomerDesc[1]','varchar(1000)') AS CustomerDesc,
       t.u.value('ShippedDate[1]','datetime') AS ShippedDate,
       t.u.value('WebOrder[1]','bit') AS WebOrder,
       m.n.value('ItemDesc[1]','varchar(100)') AS ItemDesc,
       m.n.value('Qty[1]','int') AS Qty,
       m.n.value('ItemPrice[1]','decimal(15,2)') AS ItemPrice  
       FROM @x.nodes('/Orders/Order')t(u)
       CROSS APPLY u.nodes('OrderItem')m(n)) AS s
 ON d.OrderDesc = s.OrderDesc
 AND d.OrderDate = s.OrderDate
 WHEN NOT MATCHED
 THEN
 INSERT ( 
 OrderDesc ,
OrderDate ,
CustomerDesc ,
ShippedDate ,
WebOrder
)
VALUES
(
s.OrderDesc ,
s.OrderDate ,
s.CustomerDesc ,
s.ShippedDate ,
s.WebOrder
)
OUTPUT inserted.OrderID,
s.ItemDesc,
s.Qty,
s.ItemPrice
)t (OrderID,
ItemDesc,
Qty,
ItemPrice);

Now check the output and you can see the below



As seen above, the data gets populated into both the master and the child table while preserving referential integrity. This is made possible through the use of the OUTPUT clause with MERGE. When used with MERGE, the OUTPUT clause exposes the columns of the source table/query in addition to the INSERTED and DELETED columns, which lets us capture the generated ID values from the parent table as well as all the required column values for the child table from the source query within the same statement. The OUTPUT result is then consumed by a nested INSERT to populate the child table. This is a scenario where MERGE comes in really handy.
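For contrast, a plain INSERT ... OUTPUT cannot reference columns of the source query, which is why MERGE is used here even though every row ends up being inserted. The sketch below (with illustrative table and column names) shows the pattern in isolation:

DECLARE @parent TABLE (OrderID int IDENTITY(1,1), OrderDesc varchar(1000))
DECLARE @src TABLE (OrderDesc varchar(1000), ItemDesc varchar(100))

INSERT INTO @src VALUES ('Order A', 'Item 1'), ('Order B', 'Item 2')

MERGE INTO @parent AS d
USING @src AS s
ON 1 = 0 -- never matches, so every source row gets inserted
WHEN NOT MATCHED THEN
INSERT (OrderDesc) VALUES (s.OrderDesc)
OUTPUT inserted.OrderID, s.ItemDesc; -- the source column is available here, unlike with INSERT ... OUTPUT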
I hope you enjoyed this article on two not-so-common practical scenarios where MERGE can be applied. As always, feel free to get back with your clarifications/comments.
I'll be back with a new T-SQL tip soon!

Wednesday, September 17, 2014

T-SQL Tips: Fun with date FORMATing

FORMAT is a new function introduced in SQL Server 2012 that can be used to format date and numeric values into a variety of different formats and styles.
This blog discusses some cool date formatting tricks that can be done using the FORMAT function.

1. Display the date value based on a language/locale culture

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'d','en-GB') AS BritishFormat,FORMAT(@dt,'D','en-US') AS UsFormat,
FORMAT(@dt,'D','zh-cn') AS ChineseFormat,FORMAT(@dt,'D','ta-IN') AS TamilIndiaFormat,
FORMAT(@dt,'D','ml-IN') AS MalayalamIndiaFormat



2. Display date as per any of the custom formats

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'dd/MM/yyyy hh:mm:ss tt') AS DateWithTime12h,
FORMAT(@dt,'dd MMM yyyy HH:mm:ss') AS DateWithTime24h,
FORMAT(@dt,'HH:mm:ss.fff') AS TimeAloneWithMillisec,
FORMAT(@dt,'yyyy-MM-dd HH:mm:ss.fff zz') AS DateTimeWithOffset,
FORMAT(@dt,'dddd dd MMMM yyyy gg') AS DayNameWithDateEra


3. Display component parts of a date value based on custom formats

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'yyyy\MM') AS Period,
FORMAT(@dt,'hh:mm:ss tt') AS Time12h,
FORMAT(@dt,'HH:mm:ss') AS Time24h,
FORMAT(@dt,'dddd dd\t\h MMMM') AS DayMonthDate,
FORMAT(@dt,'HH \h mm \min ss \sec') AS TimeElapsedSinceDayStart,
FORMAT(@dt,'yyyy-MM-ddTHH:mm:sszzz') AS XMLDateFormat,
FORMAT(@dt,'yyyy-MM-ddTHH:mm:ssZ') AS XMLDateFormatUTC


From the above we can see that FORMAT is a very useful function that provides a flexible way of generating presentation formats from a given date value.
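One more thing worth noting is that the culture argument also works with custom format strings, so day and month names follow the requested locale. A small sketch:

declare @dt datetime = GETDATE()

SELECT FORMAT(@dt,'dddd dd MMMM yyyy','fr-FR') AS FrenchLongDate,
FORMAT(@dt,'dddd dd MMMM yyyy','de-DE') AS GermanLongDate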

Saturday, September 13, 2014

Troubleshooting SQLServer CDC Service for Oracle Capturing Only Changed Column Values for Updates Issue

Recently, one of my data warehouse projects had a scenario that required getting data from an Oracle server. Since this Oracle server was a high-end OLTP system with high-frequency DML operations, the impact on it had to be kept to a minimum.
For this purpose we made use of the CDC Service for Oracle to replicate the changes from the Oracle server to our SQL Server data warehouse. This worked fine for us until we hit an issue with data modifications happening at the Oracle end.
We could see that CDC captured only the changed columns for UPDATE operations on the Oracle server, leaving all the other columns in the table as NULL. This threw off our logic for capturing the changed information from the Oracle delta data.
What we didn't understand was what made this issue suddenly crop up, as everything had worked well at the time of implementation. I did some analysis and found the reason, which helped us fix the issue. I'm sharing it via this blog for the benefit of others who may get stuck with similar issues.
The key requirement for CDC to work is that the Oracle server must be set up in ARCHIVELOG mode with supplemental logging enabled. This is because CDC makes use of the redo logs to capture the change data and replicate it to the SQL Server database. So my analysis focused on the supplemental logging settings on the Oracle server.
Through the below link
http://cglendenningoracle.blogspot.in/2009/09/what-is-oracle-supplemental-logging.html
I learned that there are two levels of logging and that, for CDC to get enough information from the redo logs and capture it outside the Oracle instance, database-level identification key logging has to be enabled. This information can be obtained from the V$DATABASE view in Oracle, which has a column for each of the available options, such as
SUPPLEMENTAL_LOG_DATA_FK,
SUPPLEMENTAL_LOG_DATA_PK,
SUPPLEMENTAL_LOG_DATA_ALL, etc.
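A quick way to check these flags from the Oracle side is a query like the sketch below (assuming you have access to V$DATABASE):

SELECT log_mode,
       supplemental_log_data_min,
       supplemental_log_data_pk,
       supplemental_log_data_all
FROM   v$database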
I checked the values and found that SUPPLEMENTAL_LOG_DATA_ALL was set to NO. So I set it back to YES using the statement below

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
I made some changes in the source tables to check the outcome and found that CDC was able to pick up the data in the other columns from then on, which confirmed that the issue had been fixed by this setting. Apparently what had happened was that the archive log had filled up on the Oracle server, and after that was fixed the supplemental logging settings somehow got reset to their default values. Setting them back to the original state fixed the issue. So whenever you face similar issues with CDC not getting the required data from the redo logs, make sure you check the V$DATABASE view for the supplemental logging settings for the various column types and, in case they got reset, fix them using the ALTER DATABASE command.

Monday, August 25, 2014

T-SQL Tips: Be Cautious With Your Delete Queries

This blog covers some critical things you need to be aware of while implementing delete logic for your project.
The impetus for writing it is a recent experience we had due to a small oversight in a piece of delete code implemented by a fellow team member. By the time we realized the mistake, a considerable amount of damage was already done and we had to do a good amount of work to put everything back in place.
Now onto the details.
There are a few things that need to be taken care of when you're developing a DELETE procedure to clean up data in your project. The main checks can be summed up as below.

1. Make sure the logic is correct
This is the most basic thing you need to take care of :). Do a point-by-point comparison of the DELETE query logic against the actual requirement and make sure the query covers all the conditions stated in it.

2. Test with sample data
Make sure you have a sandbox with sample data available in which you can test the delete logic and confirm that it works exactly as per your requirement.
In cases where you don't have such a setup available, i.e. you have just a single environment to work with (which is extremely dangerous IMO!), you can follow the approach below.
Wrap the script within a dummy transaction, as below

BEGIN TRAN TestDelete

-- your DELETE statement goes here

SELECT * FROM Table

--COMMIT TRAN TestDelete
--ROLLBACK TRAN TestDelete

Check the output from the table after the delete operation; if it matches your expectation, uncomment and execute the COMMIT TRAN statement, otherwise uncomment and execute the ROLLBACK TRAN statement.
This works in cases where you are applying a manual delete script.
For automated jobs the above is not possible, so you can take a different approach.
You can make use of the OUTPUT clause to capture the contents of the DELETED table into a staging table and check it after the DELETE. If it doesn't match your expectation you can ROLLBACK the transaction, otherwise you can COMMIT it.
The stub will look like below

BEGIN TRAN TestDelete

DELETE t
OUTPUT DELETED.* INTO OtherTable
FROM Table t
...

-- compare the captured rows against your expectation; a simple row-count check is
-- sketched here, with @ExpectedRowCount standing in for whatever value or logic applies
IF (SELECT COUNT(*) FROM OtherTable) <> @ExpectedRowCount
 ROLLBACK TRAN TestDelete
ELSE
 COMMIT TRAN TestDelete


3. Always fully qualify the object names used in DELETE statements
This is the most critical step, and the one that gave us the most trouble. Make sure you use appropriate aliases and fully qualify the columns; otherwise you may end up flushing out the entire contents of a table, as shown below. This is what happened to us, and we had to spend a good amount of time reworking and putting the lost data back in place.
See the illustration below to understand the issue.

declare @t table
(
id int,
val varchar(50)
)

insert @t
values (1,'Test'),
(2,'jkgj'),
(1,'grwreg'),
(1,'sdvad'),
(1,'svdb')


declare @t1 table
(
Col1 int,
col2 int
)
insert @t1
values (12,123),(2,234),(5,213)

select * from @t
select * from @t1

delete 
from @t
where id in (select id from @t1)


select * from @t
select * from @t1

If you look at the code above carefully, the subquery references a column that does not exist in @t1. Little did we realize what this would do.
Checking the output, you can see that it deletes all the records from the outer table @t: the unqualified id resolves to the outer table, so the condition becomes trivially true.
This behavior is by design and is explained in the KB article below

http://support.microsoft.com/kb/298674/en-us

Rewriting the query as below, you can see that it now complains about the wrong column

delete 
from @t t
where t.id in (select t1.id from @t1 t1)

This avoids any possibility of the table getting emptied, as happened above.
So the bottom line is: always fully qualify your objects.
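Another safeguard, shown here purely as a sketch, is to express the membership check as a correlated EXISTS with every column qualified; written this way the wrong column again surfaces as a compile-time error instead of silently matching the outer table:

delete t
from @t t
where exists (select 1
              from @t1 t1
              where t1.id = t.id) -- fails with an "Invalid column name" error, since @t1 has no id column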
Hopefully this gives you some insight into the things to be taken care of while developing and implementing delete queries in the future, and helps you avoid any unintended deletions.