Sunday, May 27, 2012

Package to Implement daily processing of files from a remote folder

Of late, there were couple of instances where I was asked by fellow developers for solution which provides mechanism of daily processing of files coming in a remote folder. This is a typical scenario that we come across in any data processing systems. I'm posting here the solution I suggested to them so that it would benefit anybody else who comes across a similar scenario and has to devise a solution for that.
Now the scenario, As a part of daily file processing, the daily feed of data would be coming as a csv or flat file in a remote location. The file name will have details of date in which feed came and processing system should have capability to filter only files containing current days data feeds.
As per scenario I've devised a solution. The package looks like below

Lets see the package layout and understand how it provides required functionality. The package has a ForEachLoop container which makes use of file enumerator. This task enables package to iterate through the various files which is present in our shared folder. Now lets see the property settings for ForEachLoop container

The names of files found by loop container will be stored each time in a variable created which is mapped inside ForEachLoop container. As you see we've multiple ways of capturing filename. I've selected the fully qualified option which returns the entire file path.On each iteration, the full path of file found would be stored in the variable.
This is followed by script task which validates whether the file found is valid or not. The logic is used as below.
The logic used simply checks for the pattern of current date within current filename and populates boolean variable based on result. This boolean variable determines whether file should be further processed or not.
Following this we will have the data flow task to do actual data transfer from the file to DB table.The data flow task will be linked by means of expression based precedence constraint where we check for status of boolean variable. This will ensure only valid files will get processed.
The data flow task will be simple with flat file source and OLEDB destination to connect to destination SQL Server DB.

The flat file source connection string will be based on an expression as shown below to make it dynamic as source file changes each time.

Now lets see how this package executes. The source folder will be like this

As you see only two out of three files have current date pattern coming inside their name. Now when we execute the package and check the table lets see the following result

As you see only the files with current date in it will be processed with data transferred to table.
Hope you will be able to make use of above logic in similar situations as I did.

No comments:

Post a Comment