Monday, May 26, 2014

String Comparison in SSIS Lookup Task SSIS vs T-SQL Engine

Recently a colleague of mine came to me with an issue he was facing. He was trying to do mapping of data from a flat file source against a reference table to get the conversion codes corresponding to the incoming data values. He was using the lookup task and was finding that lots of data were going through the unmatched output in spite of them having matching detail in the reference table.
I analyzed his package and tweaked few options and found out that this was happening only with the default settings of lookup task which is using full cache mode. When I tried to use partial or no cache option it could finding out the matching details. So I understood its the difference in the way comparison works for these two cases.
I analyzed further and understood the reason. Apparently in full cache mode the comparison happens inside SSIS engine itself and so it does a case sensitive comparison. In contrast, during partial and no cache modes comparison takes place in T-SQL engine and so it will be case insensitive by default unless you've changed default setting to use a case sensitive collation instead.
So in the above scenario if you still want the comparison to work fine in the above case you need to do a small tweak to unify the casing of values before doing the comparison. This can be done by using a derived column task after the flat file source and using expression to unify the casing of values like UPPER(ColumnName). Then in the query used inside lookup task use UPPER(ColumnName) to make result in upper case. Once you do that comparison will work as intended by ignoring the actual case of the values.
See the below illustration to see it in action
The source file looks like below
And the reference table has the below data
The attempt is to lookup against the table based on the value of ProductName field. Its quite evident from the above that the file has matching values existing in the reference table. Now lets see what happens when we execute package using the full cache option. The package execution looks like below

As you see from the above all rows from source where passed through no match lookup indicating that lookup was a failure. This is expected behavior as inside the file the values exist in camel casing whereas in table its in upper case. Since we used full cache option the comparison took place in SSIS engine which is why it did a case sensitive search and lookup failed to find the matches.
Change the cache mode to partial or no cache mode and on running the package execution result looks like below

The above screenshot clearly suggests the lookup matched all the incoming data indicating that it did a case insensitive search as the comparison was done in SQL Server.
The workaround to obtain the same functionality using full cache mode is as below

The difference here is the addition of derived column transform before the lookup to convert the input values to upper case to match those present in the reference table. This would cause the lookup to match values even when comparison takes place inside SSIS as the casing is now the same between file and table data.
This is something to take care of while working with the lookup task in SSIS.