Last week I had blogged about how to handle embedded text qualifiers coming in flat file source data in the below blog.
http://visakhm.blogspot.in/2014/06/ssis-tips-handling-embedded-text.html
One of the feedback I got was from a ETL developer who asked whether its possible to handle files with inconsistent text qualifiers using flat file source in SSIS. He had also sent a sample file which I've adapted here for illustrating the solution I had provided for him.
The file looks like below
As you see the file has inconsistent text qualifier ie for 3rd column it has combination "| as text qualifier and for other columns its | alone. It also has " characters coming inside data as well (see row 3).
In order handle this the basic approach would be same as what we discussed previously. We will add a separate data flow task to handle the inconsistent text qualifiers and make them consistent throughout.
The first data flow looks like below
The flat file source here reads the entire row contents of the file as a single column. This is achieved by specifying the row delimiter as {CR}{LF} and no column delimiter specified. The derived column task will make text qualifier consistent by doing pattern replace using REPLACE function. The expression used would be this
REPLACE(REPLACE(DataColumn,",\"|",",|"),"|\",","|,")
This is then saved into a new flat file temporarily which would have structure as below
http://visakhm.blogspot.in/2014/06/ssis-tips-handling-embedded-text.html
One of the feedback I got was from a ETL developer who asked whether its possible to handle files with inconsistent text qualifiers using flat file source in SSIS. He had also sent a sample file which I've adapted here for illustrating the solution I had provided for him.
The file looks like below
As you see the file has inconsistent text qualifier ie for 3rd column it has combination "| as text qualifier and for other columns its | alone. It also has " characters coming inside data as well (see row 3).
In order handle this the basic approach would be same as what we discussed previously. We will add a separate data flow task to handle the inconsistent text qualifiers and make them consistent throughout.
The first data flow looks like below
The flat file source here reads the entire row contents of the file as a single column. This is achieved by specifying the row delimiter as {CR}{LF} and no column delimiter specified. The derived column task will make text qualifier consistent by doing pattern replace using REPLACE function. The expression used would be this
REPLACE(REPLACE(DataColumn,",\"|",",|"),"|\",","|,")
This is then saved into a new flat file temporarily which would have structure as below
You can see from the above that text qualifier is now consistent (|).
Now you can use a new data flow task with above file as the source, choose delimiter and text qualifier as , and | and it will work fine populating the data to your destination table.
This is an approach you can apply to extract data out of flat file with inconsistent text qualifiers and get it loaded into your database table using SSIS.
The sample file and package for this can be found in the below link.
sample file: https://docs.google.com/document/d/1Cqn_QFQsPMLBAY-Hr7VP8YmBZMPijv13GSEK8afkcxQ/edit?usp=sharing
Replace the connectionstrings with your actual values and you will be able to execute and test the package for yourself.
Feel free to revert for any further clarification. Happy coding!
No comments:
Post a Comment