Tuesday, January 26, 2016

T-SQL Tips: Aggregate Functions in CASE Expressions

Impetus

The impetus for this blog comes from a recent discussion that happened in one of the forums of which I'm a member.
A fellow member asked about an error he was getting while using an aggregate function inside a CASE expression. This post explains the reason for the error and also gives you an alternate method you can use to avoid it.

Scenario

The sample code for the issue looks like the below

If Object_ID('tempdb..#Test','U') Is Not Null Drop Table #Test;
Create Table #Test(ID Int Identity(1,1), Qty Numeric(15,2));

Insert Into #Test (Qty)
Values (100.25),
       (94.34);

Declare @Factor Int = 0;

-- The CASE expression guards against the zero divisor, yet the query still fails
Select Case When @Factor = 0 Then 0 Else Sum(Qty/@Factor) End
From   #Test;

If you execute the above code you will see that you get the below error in SQL Server Management Studio.

Msg 8134, Level 16, State 1, Line 11
Divide by zero error encountered.

If you think about it you would expect the statement to work correctly, as you're already checking for the error condition inside the CASE expression and returning a default value for it. So why does it error out?

Explanation

The reason for the above behavior is well documented in Books Online.

The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied. In some situations, an expression is evaluated before a CASE statement receives the results of the expression as its input. Errors in evaluating these expressions are possible. Aggregate expressions that appear in WHEN arguments to a CASE statement are evaluated first, then provided to the CASE statement

As you can see, it clearly states that aggregate expressions appearing in a CASE expression are evaluated first, which is what causes the divide by zero error seen in the above case.
The full documentation is available under the CASE topic in Books Online.


Solution


If you want to avoid the error then you need to go for an approach like the below

If Object_ID('tempdb..#Test','U') Is Not Null Drop Table #Test;
Create Table #Test
(
ID Int Identity(1,1),
Qty Numeric(15,2)
);

Insert Into #Test (Qty)
Values (100.25),
       (94.34);

Declare @Factor Int = 0;

-- NULLIF converts the zero divisor to NULL, so the division returns NULL
-- instead of raising an error; COALESCE maps the NULL result back to 0
Select Coalesce(Sum(Qty/Nullif(@Factor,0)), 0)
From   #Test;

which makes sure the 0 in the denominator is converted to NULL before the actual division happens; the division then yields NULL instead of an error, and COALESCE maps that NULL back to the default value of 0.
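
To see the mechanics in isolation, here is a minimal sketch of how NULLIF and COALESCE cooperate for the divide by zero case:

Declare @Factor Int = 0;

-- NULLIF returns NULL when its two arguments are equal
Select Nullif(@Factor, 0);                         -- NULL
-- Dividing by NULL yields NULL rather than raising an error
Select 100.25 / Nullif(@Factor, 0);                -- NULL
-- COALESCE then substitutes the default for the NULL result
Select Coalesce(100.25 / Nullif(@Factor, 0), 0);   -- 0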


Conclusion

As you can see from the above illustration, a CASE WHEN expression evaluates aggregations first, before applying the conditional logic. So in such cases we need to go for an alternative approach, like NULLIF with COALESCE, to avoid raising error conditions as illustrated above.
Thanks to Geri Reshef for bringing this up as a discussion in the forums.

Wednesday, January 20, 2016

T-SQL Tips: Representing NULL as Values in XML


This blog explains the various methods that can be applied for representing NULL values while generating an XML document using the T-SQL FOR XML clause. Based on your actual requirement you can opt for any of the methods to get the result in your desired format.

Scenario

Consider the below table data

CREATE TABLE OrderDetails
(
OrderID int IDENTITY(1,1),
OrderDesc varchar(1000),
OrderDate datetime,
OrderStatus varchar(100),
ShippedDate datetime,
WebOrder bit
)

INSERT OrderDetails (OrderDesc,OrderDate,OrderStatus,ShippedDate,WebOrder)
VALUES ('Order 1','20150812','Delivered','20150825',NULL),
('Order 2','20151120','Delivered','20151123',NULL),
('Order 3','20151203','Delivered','20151215',1),
('Order 4','20151217','Delivered','20151222',NULL),
('Order 5','20160112','Processing',NULL,NULL),
('Order 6','20160114','Processing',NULL,NULL)

The above table shows details of orders. For simplicity I've included only a subset of columns and 6 indicative rows.
If you analyse the table you can see that there are columns which represent optional attributes, like ShippedDate and WebOrder. They may hold NULL values, indicating that the order has not yet been shipped or that it is not a web-based order.
Now if we try to convert the table data to XML using FOR XML PATH we get the below

SELECT *
FROM OrderDetails
FOR XML PATH('Order'),ROOT('Orders') 

The output will be as shown
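
Abridged to the first and last orders, the generated document comes out along these lines (dates are serialized in ISO format):

<Orders>
  <Order>
    <OrderID>1</OrderID>
    <OrderDesc>Order 1</OrderDesc>
    <OrderDate>2015-08-12T00:00:00</OrderDate>
    <OrderStatus>Delivered</OrderStatus>
    <ShippedDate>2015-08-25T00:00:00</ShippedDate>
  </Order>
  ...
  <Order>
    <OrderID>6</OrderID>
    <OrderDesc>Order 6</OrderDesc>
    <OrderDate>2016-01-14T00:00:00</OrderDate>
    <OrderStatus>Processing</OrderStatus>
  </Order>
</Orders>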



If you check the XML you can see that it simply omits the elements for columns holding NULL values in the table, for example WebOrder in the first two nodes and ShippedDate in the last two nodes. This is the default way in which the XML gets generated.
If you want NULL values to be represented in the XML you need to follow one of the below approaches

1. Using XSINIL directive

This is the most common method of representing the absence of a value in XML.
The query can be written as


SELECT *
FROM OrderDetails
FOR XML PATH('Order'),ROOT('Orders'),ELEMENTS XSINIL

This causes NULL values to be marked with the attribute xsi:nil="true" in the XML.
The resultant XML would be as below
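
One shipped and one unshipped order, abridged, would come out along these lines (note the xsi namespace declaration that XSINIL adds on the root element):

<Orders xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Order>
    <OrderID>1</OrderID>
    <OrderDesc>Order 1</OrderDesc>
    <OrderDate>2015-08-12T00:00:00</OrderDate>
    <OrderStatus>Delivered</OrderStatus>
    <ShippedDate>2015-08-25T00:00:00</ShippedDate>
    <WebOrder xsi:nil="true" />
  </Order>
  ...
  <Order>
    <OrderID>5</OrderID>
    <OrderDesc>Order 5</OrderDesc>
    <OrderDate>2016-01-12T00:00:00</OrderDate>
    <OrderStatus>Processing</OrderStatus>
    <ShippedDate xsi:nil="true" />
    <WebOrder xsi:nil="true" />
  </Order>
</Orders>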



As you see, it represents a NULL value by means of the attribute xsi:nil="true".

2. Using COALESCE for converting NULL values to defaults

In this case we use the COALESCE function to convert the NULL values to a default blank value. Note that the date and bit columns have to be cast to a character type first, since the blank default is a string.
The code will look like this

SELECT OrderDesc,
       OrderDate,
       OrderStatus,
       COALESCE(CAST(ShippedDate AS varchar(100)),'') AS ShippedDate,
       COALESCE(CAST(WebOrder AS varchar(100)),'') AS WebOrder
FROM OrderDetails
FOR XML PATH('Order'),ROOT('Orders')

The result would be as below
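
With the blank default every optional column now always produces an element, just with empty content; for an unshipped order the node would come out like this sketch:

<Order>
  <OrderDesc>Order 5</OrderDesc>
  <OrderDate>2016-01-12T00:00:00</OrderDate>
  <OrderStatus>Processing</OrderStatus>
  <ShippedDate></ShippedDate>
  <WebOrder></WebOrder>
</Order>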

As you see from the above, the missing data comes through as blank nodes in this case. This is another way by which you can represent a missing value in XML.

3. Using subquery based on FOR XML

Now let's see one more way of representing the missing values. This came as a requirement from a client who insisted on representing missing values using self-closing nodes (like <ShippedDate/>).
This can also be achieved as per below


SELECT OrderDesc,OrderDate,OrderStatus,
(SELECT ShippedDate AS [*] FOR XML PATH('ShippedDate'),TYPE),
(SELECT WebOrder  AS [*] FOR XML PATH('WebOrder'),TYPE)
FROM OrderDetails
FOR XML PATH('Order'),ROOT('Orders')


Now if you check the result you can see the below
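
For an unshipped, non-web order the output would look like the below sketch: the element is always generated because the subquery always returns a row, and a NULL value leaves it self-closed.

<Order>
  <OrderDesc>Order 5</OrderDesc>
  <OrderDate>2016-01-12T00:00:00</OrderDate>
  <OrderStatus>Processing</OrderStatus>
  <ShippedDate/>
  <WebOrder/>
</Order>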

As you see from the above, it represents missing values by means of self-closing nodes, which was exactly what the client requirement asked for.

Conclusion

From the above methods we can see that there are multiple ways by which NULL values can be represented within an XML document.
The choice of a particular method depends on your exact requirement.
Hope the above post will guide you in adopting the method that fetches the desired result for your specific scenario.

Monday, November 9, 2015

SSIS Tips: Performing Cross Join Using Merge Join Transform in SSIS

Question

Can we perform a cross join (cartesian product) operation using the Merge Join transform in SSIS?

Scenario

The above question was asked in one of the social forums recently, and I was able to respond to it.
I'm sharing the solution here with an illustration for the benefit of the wider audience.

Illustration

Scenario 1

Consider the case where we have two tables, say a Calendar table and a Student table, residing in two servers, say server1 and server2. The requirement is to perform a cross join between the tables.

For this requirement we can use the below sample data
In server 1

SELECT * INTO DateTable
FROM dbo.CalendarTable('20150101','20150601',0,0)f

The UDF CalendarTable can be found here
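
If you don't have that UDF handy, a minimal stand-in can be used instead. This is just a sketch, which assumes the function returns one row per date in the requested range, exposed through a Date column (the only column the later steps rely on).

-- Hypothetical replacement for the CalendarTable UDF call:
-- generate one row per date from 20150101 to 20150601 using a recursive CTE
;WITH Dates AS
(
SELECT CAST('20150101' AS date) AS [Date]
UNION ALL
SELECT DATEADD(DAY,1,[Date]) FROM Dates WHERE [Date] < '20150601'
)
SELECT [Date] INTO DateTable
FROM Dates
OPTION (MAXRECURSION 0);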


In server 2 create this table

CREATE TABLE Students
(
StudentID int,
StudentName varchar(100),
RollNo int
)

INSERT Students
VALUES
(101,'Ravi',1000),
(102,'Sathesh',1001),
(103,'Mary',1002),
(105,'Mohamad',1003),
(107,'Tulsi',1004),
(109,'Maria',1005),
(111,'Keerthi',1006)

Once these tables are set up we can go ahead and create an SSIS package as below

The source statements used in the OLE DB Source tasks are as below. We need to use the SQL command access mode for this.

Student Data
-------------------
SELECT *,1 AS Dummy
FROM dbo.Students

Date Data
-------------
SELECT *,1 AS Dummy
FROM dbo.DateTable
WHERE Date < '20150110'

I've just included some indicative dates for the illustration. In the actual case you can include dates as per your requirement.
Once the sources are set up we add a Merge Join transform to the package and configure it as below


Now link the output of this Merge Join to a Recordset Destination. You may also add a data viewer for previewing the output.

Now try executing the package and you can see the below



As you see from the above, the result is the cartesian product of the records from the two tables, which indicates that the Merge Join performed a cross join operation.

You can also check the record count to confirm it is a cartesian join (m * n rows).
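
To see why the dummy column trick works, here is the same idea sketched in T-SQL (assuming, just for the sketch, that both tables sit in a single database, which is the very restriction the SSIS package works around): an inner join on a constant key matches every row on one side to every row on the other, which is exactly cross join semantics.

-- An inner join on a constant dummy key degenerates into a cartesian product
SELECT s.StudentID, s.StudentName, s.RollNo, d.[Date]
FROM  (SELECT *, 1 AS Dummy FROM dbo.Students)  AS s
INNER JOIN
      (SELECT *, 1 AS Dummy FROM dbo.DateTable) AS d
ON    s.Dummy = d.Dummy;   -- same result as CROSS JOIN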


Scenario 2

Now let's see the case where one set of data comes from a database table and the other set comes from a flat file.
In this case the package would look like this


If you compare this to the earlier workflow you can see the below differences:
1. One of the OLE DB sources is replaced by a Flat File source, as one of our sources is now a flat file.
2. We require two additional tasks in the workflow for the flat file. One is a Derived Column transform to add the dummy column to the pipeline.
3. The other is a Sort transform to add a default sort order. This is a prerequisite for the Merge Join transform, which requires both of its inputs to be sorted.

Execute the package and you can see the below



As you see from the above, the output clearly shows that the Merge Join transform performed a cross join operation.

Summary

As seen from the above illustration, you can very easily perform a cross join using a Merge Join transform in SSIS by utilizing a dummy column created on the fly inside the package.
 

Tuesday, November 3, 2015

T-SQL Tips: Execute Individual Packages in Project Deployment Model using Dtexec

Question

Can we execute individual packages included in an SSIS 2012 project configured in the project deployment model using the dtexec utility?

Scenario

The above question was the main topic of discussion in a recent casual conversation with my fellow colleagues.
Both of the below responses came up during the discussion:

1. The package can only be executed by referring to the ispac (project) file, since the specified deployment model is the project deployment model, which implies the project itself is the basic unit of deployment.
2. The package should still be executable individually by referring to the corresponding dtsx file.

I thought of trying it myself to see which one (or both) of the above responses is true. I'm sharing the illustration and result through this blog for the benefit of the wider audience.

Illustration

For the sake of this illustration I've created a very simple package in a project configured with the default project deployment model.

The package inserts a row into a table in the database, receiving the value through a parameter. There is a package-scoped parameter declared for the purpose, as below.

The table is setup as below

CREATE TABLE [dbo].[TestISPop](
[Val] [varchar](100) NULL
)

Now let's try executing the package once and check the result.


As you see, the package populates the table with the value that is passed in through the parameter.

Now let's try executing the package through dtexec.

Refer to the documentation of dtexec (see the references at the end of this post).


You can see that the package can be executed using the /FILE switch, with parameter values passed through the /SET switch.
Now let's apply this in a T-SQL query using xp_cmdshell (dtexec being a command line tool) and see the result.

The statement would look like

EXEC xp_cmdshell 'dtexec /FILE "\DtexecTest.dtsx"  /SET \Package.Variables[$Package::PopValue];2'

The execution result is as below



Now if you check the table you can see the value being populated


So it is implied that we can execute the package directly through dtexec using the /FILE switch, passing package parameters through the /SET \Package.Variables[$Package::ParameterName];Value switch.

In addition to this, we can also execute the package by specifying the ispac file.
Let's see how we can write the query for this.

EXEC xp_cmdshell 'dtexec /Project "\bin\Development\Integration Services Project1.ispac" /Package "DtexecTest.dtsx" /SET \Package.Variables[$Package::PopValue];5'

Execution result as shown


Now check the table again and you will see the below.


As you see, it successfully populates the value. So we can execute the package either by referring to the dtsx file directly using the /FILE switch or by specifying the ispac file using the /Project switch in dtexec.
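
In general form, the two invocation styles look like the below; the paths, project, package and parameter names here are placeholders to be replaced with your own.

-- Pattern 1: execute the dtsx file directly using the /FILE switch
EXEC xp_cmdshell 'dtexec /FILE "<FolderPath>\<PackageName>.dtsx" /SET \Package.Variables[$Package::<ParameterName>];<Value>'

-- Pattern 2: execute the package out of the project (ispac) file using /Project and /Package
EXEC xp_cmdshell 'dtexec /Project "<FolderPath>\<ProjectName>.ispac" /Package "<PackageName>.dtsx" /SET \Package.Variables[$Package::<ParameterName>];<Value>'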

Package with a Project Parameter

Now let's see what the behavior would be if the package had a project parameter.
For this we first modify the package parameter to make it a project parameter.
Once that's done we can use a statement as below

EXEC xp_cmdshell 'dtexec /FILE "C:\Users\stu\Documents\Visual Studio 2010\Projects\Integration Services Project1\Integration Services Project1\DtexecTest.dtsx"  /SET \Package.Variables[$Project::PopValue];6'

But we can see that we will get an error like this




This shows that dtexec is unable to resolve the reference to the project parameter while executing directly from the dtsx file, which is understandable and as expected.
Now let's try executing the package from the ispac file and see.

EXEC xp_cmdshell 'dtexec /Project "C:\Users\stu\Documents\Visual Studio 2010\Projects\Integration Services Project1\Integration Services Project1\bin\Development\Integration Services Project1.ispac" /Package "DtexecTest.dtsx" /SET \Package.Variables[$Project::PopValue];12'

And you can see that it works fine and populates the table with the value, as shown by the below result.

So we can conclude that in the case of a package referencing a project parameter, we can only execute it by referring to the ispac (project) file and not by directly referring to the individual dtsx file.

Summary

As per the illustrations above we can summarize as below

1. Package with parameters defined at the package scope - can be executed by referring either to the ispac file using the /Project switch or to the dtsx file using the /FILE switch in dtexec.
2. Package with parameters defined at the project scope - can only be executed by referring to the ispac file using the /Project switch.

References

Sample package

dtexec

SSIS 2012 syntaxes

xp_cmdshell

Thursday, October 15, 2015

T-SQL Tips: Solving FORMAT function woes for time datatype

Impetus

The purpose of this blog is to clarify the confusion regarding the behavior of the FORMAT function with the time datatype in SQL Server.

Scenario

It all started with a discussion that happened in one of the groups.
A person had posted a question asking about the reason for the weird behavior of the FORMAT function when applied to various date and time related datatypes in SQL 2012.
The illustration is given below.


declare @t time ='20151015 13:40:20'
SELECT FORMAT(@t,'HH:mm') AS timefieldformatted

declare @dt datetime ='20151015 13:40:20'
SELECT FORMAT(@dt,'HH:mm')  AS datetimefieldformatted

declare @dt2 datetime2 ='20151015 13:40:20'
SELECT FORMAT(@dt2,'HH:mm')  AS datetime2fieldformatted

declare @dto datetimeoffset ='20151015 13:40:20'
SELECT FORMAT(@dto,'HH:mm')  AS datetimeoffsetfieldformatted

The output is as shown below




As you see from the above, FORMAT was able to apply the passed specifiers and produce the requested format in all cases except the time datatype, where it simply returns NULL. So what is so special about the time datatype that causes this weird behavior from the FORMAT function? Let's investigate.

Reason

To understand the reason, let's first refer to the documentation of the FORMAT function as laid out on MSDN.


The below explanation is taken from that documentation.

FORMAT relies on the presence of the .NET Framework Common Language Runtime (CLR).

The following table lists the acceptable data types for the value argument together with their .NET Framework mapping equivalent types.
..

If you look at that table you can see that the time datatype in SQL Server is mapped to the .NET TimeSpan datatype.
Now if you check the TimeSpan datatype documentation for .NET,


You can see this

A TimeSpan object represents a time interval (duration of time or elapsed time) that is measured as a positive or negative number of days, hours, minutes, seconds, and fractions of a second. The TimeSpan structure can also be used to represent the time of day, but only if the time is unrelated to a particular date.

Now if we check the standard format strings for the TimeSpan datatype, we find that it supports only c, g and G as specifiers.


The way to specify custom format specifiers is discussed in the TimeSpan custom format strings documentation.


So, as per the above, we need to use queries like the below to get the required result. Note that within a TimeSpan custom format string, literal characters such as the colon must be escaped with a backslash.



declare @t1 time ='20151015 13:40:20'
SELECT FORMAT(@t1,'c') AS timefieldformatted      -- 'c': the constant (invariant) format

declare @t2 time ='20151015 13:40:20'
SELECT FORMAT(@t2,'g') AS timefieldformatted      -- 'g': the general short format

declare @t3 time ='20151015 13:40:20'
SELECT FORMAT(@t3,'hh\:mm') AS timefieldformatted -- custom format; note the escaped colon


The output will be as below
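
Assuming the sample value 13:40:20 from the declarations above, the results should come out along these lines:

timefieldformatted
------------------
13:40:20          -- with 'c'
13:40:20          -- with 'g'
13:40             -- with 'hh\:mm'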




Summary

As seen from the above illustrations, we should use specifiers such as c, g or hh\:mm for getting time datatype values in formats like hh:mm, because the implementation of the FORMAT function maps the time datatype to the .NET TimeSpan datatype.
 
Thanks to Erland, who gave the solution in the forum discussion and cleared up the doubts for all of us. Hopefully this blog will help improve clarity on the behavior of the FORMAT function for a wider audience.