Monday, June 23, 2014

SSIS Tips: Handling Embedded Text Qualifiers in SSIS Flat File Source

Recently in one of my projects there was a case where I had to transfer the data from flat file to SQLServer table. The file was generated by an automated process at the client side and was FTPed to our landing location. I created a simple package to do the data transfer with a single data flow task which looks like below.
On executing the package it broke with the below error
I checked the source file and it resembled the below structure (actual data replaced by sample for illustration)
On careful analysis I understood the issue. The reason was due to the presence of embedded text qualifier characters within the data ie Description value of "Descr3 "Descr4" desc5". Now question was how to handle this to do the data transfer without any issues. In my case it was not possible to fix this at source as the file generation step was automated. This post deals with a workaround which can be applied to get around the above issue.
I added a new data flow task before the current data flow. The purpose of this data flow task was to change the text qualifier character (in this case ") to some other character (here | character) so as not to get confused with the embedded data characters. The data flow will have a structure as below


What this does is to read the entire row contents of the file as a single column (just don't select anything for the column delimiter) and the replace the text qualifier character from " to | and save it as a new file.
The connection manager would be configured as below

Notice the row and column delimiter settings used
Now for the derived column task use an expression as below

SUBSTRING(REPLACE(REPLACE("," + DataColumn + ",","\",","|,"),",\"",",|"),2,LEN("," + DataColumn + ",") - 2)

What this does is to replace the occurrences of ," or ", pattern with ,| or |, which will replace text qualifier character to | from ". A comma (,) character is added to the beginning and end of string for making the pattern consistent and after the replacement  we ignore the first and last characters we added using the SUBSTRING function
Once this is saved to new file we will get the modified file structure as this
.
Now in our main data flow point to the above saved file and configure the settings as below


Now link this to an OLEDB destination to populate your table and on executing you'll get the below data in the table

which clearly shows that the file data was properly transferred to the table.
This is a good workaround you can apply to deal with embedded qualifier characters coming in your data and get your data successfully transferred.
The package and sample file can be accessed from the below link
Package
https://drive.google.com/file/d/0B4ZDNhljf8tQQVRmTkJ4U01BLTA/edit?usp=sharing
Sample file
https://docs.google.com/spreadsheets/d/1HPIaz1VSR9BXfZo27KWxvq2qv12pzLNDRQHRTuCwx0U/edit?usp=sharing
Please revert if you want more clarification on the above and I will be happy to help. Also let me know any other comments you may have. Happy coding!