Question
Can we perform cross join (cartesian product) operation using Merge Join transform in SSIS?
Scenario
The above question was asked in one of the social forums recently. I was able to give the response to this.
I'm sharing the solution here with an illustration for the benefit of the wider audience
I'm sharing the solution here with an illustration for the benefit of the wider audience
Illustration
Scenario 1
Consider the case where we have two tables say Calendar table and Student table in two servers say server1 and server2. The requirement is to perform cross join between the tables.
For this requirement we can use the below sample data
In server 1
SELECT * INTO DateTable
FROM dbo.CalendarTable('20150101','20150601',0,0)f
The UDF CalendarTable can be found here
In server 2 create this table
CREATE TABLE Students
(
StudentID int,
StudentName varchar(100),
RollNo int
)
INSERT Students
VALUES
(101,'Ravi',1000),
(102,'Sathesh',1001),
(103,'Mary',1002),
(105,'Mohamad',1003),
(107,'Tulsi',1004),
(109,'Maria',1005),
(111,'Keerthi',1006)
Once these tables are setup we can go ahead and create a SSIS package as below
The source statements used in OLEDB Source tasks are as below. We need to use SQL command mode for this.
Student Data
-------------------
SELECT *,1 AS Dummy
FROM dbo.Students
Date Data
-------------
SELECT *,1 AS Dummy
FROM dbo.DateTable
WHERE Date < '20150110'
I've just included some indicative dates for the illustration. In the actual case you can include dates as per your requirement
Once these tables are setup we will add a merge join transform to the package and configure it as below
Now link the output of this merge join to a recordset destination. You may also add a data viewer for previewing the output.
Now try executing the package and you can see the below
As you see from the above what you get as result would be the cartesian product of the records from both the tables which indicates that merge join performed a cross join operation.
You can also refer the count of records and can confirm its a cartesian join (m * n)
You can also refer the count of records and can confirm its a cartesian join (m * n)
Scenario 2
Now lets see the case where one set of data comes from a database table and other set coming from a flat file.
In this case the package would look like this
If you compare this to the earlier work flow you can see the below differences
1. One of the OLEDB source will be replaced by a flat file source as one of our sources is a flat file
2. We would require two additional tasks in the work flow for the flat file. One would be a derived column task to add the dummy column to the pipeline
3. Second task would be a sort transform to add a default sorting. This is a prerequisite for the merge join transform which requires both its inputs to be in sorted format.
Execute the package and you can see the below
As you see from the above the output clearly shows that merge join transform performs a cross join operation.
Summary
As seen from the above illustration you can very easily perform cross join by using a Merge Join transform in SSIS by utilizing a dummy column created on the fly inside the package
No comments:
Post a Comment