Monday, May 19, 2014

SSIS Tips: Conditional Processing of Files Based on FileTypes

One of the common requirements that we come across in any data warehousing project is the ability to process different types of files to extract data from them onto our data warehouse.
For this purpose SSIS provides a ForEach loop container with a file enumerator that helps us to iterate and process the files. There was a question on one of the forums recently whether its possible for us to select multiple file types inside the same ForEachLoop container. Though multiple file types are not supported by the Files property of ForEach loop this requirement can be handled by a workaround as explained in this blog.

For illustration purpose consider the scenario where we have files send from a client to a shared landing folder location. The requirement was to pick only the text and csv files from this and move them onto couple of other folders to do further processing. Now lets see how we can process both the file types using a single ForEachLoop with file enumerator in SSIS.

The landing folder looks like below

The package for the requirement would look like below
The tasks used are as follows

1. ForEachLoop container - This is used for iterating through the files in the folder. The properties are set as below

Since we've to select multiple file types we set the Files property as *.*. This will make sure it iterates through all the available files in the folder. We select Fully qualified as the option to make sure we get filename with full path and extension. This is then required by the subsequent tasks.
2. Script Task - The script task is used inside the loop to identify the file types. We make use of .NET script to get the file extension from the fully qualified filename and store it in a variable. The logic is as below

Public Sub Main()
'
' Add your code here
        '
        Dts.Variables("FileType").Value = Right(Dts.Variables("FileName").Value, Len(Dts.Variables("FileName").Value) - InStr(Dts.Variables("FileName").Value, "."))
Dts.TaskResult = ScriptResults.Success
End Sub

FileName variable contains the fully qualified filename from the loop which will be passed as a ReadOnly variable. The FileType variable is passed as a ReadWrite variable to extract and store the file type (extension) from the file name.
3. File System Tasks - Two file system tasks are used for processing (in this case moving files to separate folder) the different file types available ie CSV and TXT. The File System Task will look like below

The fully qualified filename we get from the ForEachLoop is used as the source connection. Destination connection points to the folder where we need to move the corresponding files to.
We make use of expression based precedence constraint to make sure only the files with relevant file types are sent to the corresponding File System Tasks. The expressions make use of FileType variable to which we extract the file type information within the script task.
The expression used will be as below and we set constraint to OnSuccess

UPPER(@FileType) = = "CSV" for CSV files
UPPER(@FileType) = = "TXT" for TXT files

On executing the package, you can see the ForEach loop filtering only the required file types and file will be moved to the corresponding folders by the corresponding file system task.
This approach can be used to do the conditional processing of files based on their file types existing on a common folder using a single ForEach loop container in SSIS.

No comments:

Post a Comment