Monday, November 9, 2015

SSIS Tips: Performing Cross Join Using Merge Join Transform in SSIS

Question

Can we perform cross join (cartesian product) operation using Merge Join transform in SSIS?

Scenario

The above question was asked in one of the social forums recently. I was able to give the response to this.
I'm sharing the solution here with an illustration for the benefit of the wider audience

Illustration

Scenario 1

Consider the case where we have two tables say Calendar table and Student table in two servers say server1 and server2. The requirement is to perform cross join between the tables.

For this requirement we can use the below sample data
In server 1

SELECT * INTO DateTable
FROM dbo.CalendarTable('20150101','20150601',0,0)f

The UDF CalendarTable can be found here


In server 2 create this table

CREATE TABLE Students
(
StudentID int,
StudentName varchar(100),
RollNo int
)

INSERT Students
VALUES
(101,'Ravi',1000),
(102,'Sathesh',1001),
(103,'Mary',1002),
(105,'Mohamad',1003),
(107,'Tulsi',1004),
(109,'Maria',1005),
(111,'Keerthi',1006)

Once these tables are setup we can go ahead and create a SSIS package as below

The source statements used in OLEDB Source tasks are as below. We need to use SQL command mode for this.

Student Data
-------------------
SELECT *,1 AS Dummy
FROM dbo.Students

Date Data
-------------
SELECT *,1 AS Dummy
FROM dbo.DateTable
WHERE Date < '20150110'

I've just included some indicative dates for the illustration. In the actual case you can include dates as per your requirement
 Once these  tables are setup we will add a merge join transform to the package and configure it as below


Now link the output of this merge join to a recordset destination. You may also add a data viewer for previewing the output.

Now try executing the package and you can see the below



As you see from the above what you get as result would be the cartesian product of the records from both the tables which indicates that merge join performed a cross join operation.

You can also refer the count of records and can confirm its a cartesian join (m * n)


Scenario 2

Now lets see the case where one set of data comes from a database table and other set coming from a flat file. 
In this case the package would look like this


If you compare this to the earlier work flow you can see the below differences
1. One of the OLEDB source will be replaced by a flat file source as one of our sources is a flat file
2. We would require two additional tasks in the work flow for the flat file. One would be a derived column task to add the dummy column to the pipeline
3. Second task would be a sort transform to add a default sorting. This is a prerequisite for the merge join transform which requires both its inputs to be in sorted format.

Execute the package and you can see the below



As you see from the above the output clearly shows that merge join transform performs a cross join operation.

Summary

As seen from the above illustration you can very easily perform cross join by using a Merge Join transform in SSIS by utilizing a dummy column created on the fly inside the package
 

References


No comments:

Post a Comment