Monday, November 9, 2015

SSIS Tips: Performing Cross Join Using Merge Join Transform in SSIS


Can we perform cross join (cartesian product) operation using Merge Join transform in SSIS?


The above question was asked in one of the social forums recently. I was able to give the response to this.
I'm sharing the solution here with an illustration for the benefit of the wider audience


Scenario 1

Consider the case where we have two tables say Calendar table and Student table in two servers say server1 and server2. The requirement is to perform cross join between the tables.

For this requirement we can use the below sample data
In server 1

FROM dbo.CalendarTable('20150101','20150601',0,0)f

The UDF CalendarTable can be found here

In server 2 create this table

StudentID int,
StudentName varchar(100),
RollNo int

INSERT Students

Once these tables are setup we can go ahead and create a SSIS package as below

The source statements used in OLEDB Source tasks are as below. We need to use SQL command mode for this.

Student Data
SELECT *,1 AS Dummy
FROM dbo.Students

Date Data
SELECT *,1 AS Dummy
FROM dbo.DateTable
WHERE Date < '20150110'

I've just included some indicative dates for the illustration. In the actual case you can include dates as per your requirement
 Once these  tables are setup we will add a merge join transform to the package and configure it as below

Now link the output of this merge join to a recordset destination. You may also add a data viewer for previewing the output.

Now try executing the package and you can see the below

As you see from the above what you get as result would be the cartesian product of the records from both the tables which indicates that merge join performed a cross join operation.

You can also refer the count of records and can confirm its a cartesian join (m * n)

Scenario 2

Now lets see the case where one set of data comes from a database table and other set coming from a flat file. 
In this case the package would look like this

If you compare this to the earlier work flow you can see the below differences
1. One of the OLEDB source will be replaced by a flat file source as one of our sources is a flat file
2. We would require two additional tasks in the work flow for the flat file. One would be a derived column task to add the dummy column to the pipeline
3. Second task would be a sort transform to add a default sorting. This is a prerequisite for the merge join transform which requires both its inputs to be in sorted format.

Execute the package and you can see the below

As you see from the above the output clearly shows that merge join transform performs a cross join operation.


As seen from the above illustration you can very easily perform cross join by using a Merge Join transform in SSIS by utilizing a dummy column created on the fly inside the package