Thursday, April 20, 2017

SQL 2017 CTP Released!

SQL Server 2017 CTP 2.0 is now out for public preview.
SQL Server 2017 has some exciting features on board, such as resumable online index rebuild and adaptive query processing.
Great improvements can also be seen in the performance of data warehouse workloads in Linux environments.

More details on the release here

https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/sql-server-2017-community-technology-preview-2-0-now-available/

Thursday, April 13, 2017

SSIS Tips: Case Study - Extracting ZenDesk data using SSIS

Impetus

It's been a long time since I wrote a blog.
The main reason is that I've been super busy with some exciting projects of late.
One of my recent projects had an interesting requirement which I thought would be worth sharing for the benefit of a wider audience.

Scenario

One of my recent BI engagements, with an e-commerce customer, had a use case which required customer satisfaction metrics to be captured and analyzed. The metrics were to be derived from a variety of data including support tickets, delivery information, feedback and notification emails. The client was using ZenDesk as the ticketing application. All of the required information was tracked by means of tickets within ZenDesk, so the attempt was to integrate the ZenDesk data using SSIS and use it for analysis along with the other available data.

Illustration

ZenDesk offers API access to its data. The approach was to use SSIS to access the required ZenDesk data through API calls.
For the purpose of this blog, let's look at a simple scenario of how SSIS can be used to call the ZenDesk API to extract ticket-related data.
The details regarding the ZenDesk APIs can be found in the ZenDesk developer documentation.


Refer to the Search section, which explains the various API endpoints for searching ZenDesk data. We will be utilizing this for searching and fetching ticket information using our required filters. The search API is flexible in that it accepts a set of query string parameters which can be utilized to customize the search; some example usages are given in the Search endpoint documentation.


So for our case, since we require ticket information, the full URL would look like the below

https://{subdomain}.zendesk.com/api/v2/search.json?sort_by=created_at&sort_order=asc&query=created>{date} type:ticket

Here {subdomain} represents the subdomain used while registering with ZenDesk (usually the company name) and {date} is the date relative to which you want to search. The sort_by parameter is used to get the search results in chronological order.
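For example, with a hypothetical subdomain acme and a search date of 2017-01-01, the call would be

https://acme.zendesk.com/api/v2/search.json?sort_by=created_at&sort_order=asc&query=created>2017-01-01 type:ticket

which returns all tickets created after 1 January 2017, oldest first (in the actual request the space before type:ticket gets URL-encoded as %20).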
The response for the above API call will look like this

{
  "results": [
    {
      "url": "https://.zendesk.com/api/v2/tickets/36046.json",
      "id": 36046,
      "external_id": null,
      "via": {
        "channel": "email",
        "source": {
          "from": {
            "address": "xyz2@gmail.com",
            "name": "XYZ 2"
          },
          "to": {
            "name": "abc.com",
            "address": "customercare@abc.com"
          },
          "rel": null
        }
      }
    },
    ...
  ],
  "facets": null,
  "next_page": null,
  "previous_page": null,
  "count": 72
}

As you can see, it gives a JSON response with the above structure.
One more thing to note here is that the API returns a maximum of 100 results per response. In case the search result exceeds 100 items, the response gets auto-paged at every 100 results, and the URL for the next page is sent along with the response (refer to the next_page/previous_page elements in the above JSON).
So in cases where we have more results, we need to capture the next_page value each time and send a new request to the server using that URL. This continues until the last page, at which point the next_page value will be null.
Considering all these points, a simple package for extracting ZenDesk data will look like the below


Now we can analyze the various tasks inside the package to understand their purpose.
For Loop: the loop ensures we iterate through requests until we reach the last page of the search result. It is a simple For loop based on a counter which is initialized to 1 and is reset inside the loop once the results are exhausted, to break out of the iteration.
The Expression Tasks set the iterative values for the request URL, i.e. the URL to fetch the next page of results, and also reset the URL and counter values.
The Script Task downloads the search results in JSON format into a text file using the API call. It utilizes an HTTP connection which uses the ZenDesk API URL as its source, set through a variable expression to pass dynamic values like the date.
The variable expression in our case looks like the below

"https://{domain}.zendesk.com/api/v2/search.json?sort_by=created_at&sort_order=asc&query=created>" + (DT_WSTR, 30) (DT_DBDATE) @[User::Date] + " type:ticket"

The script task code will look like this

' Acquire the underlying connection from the package's HTTP connection manager
Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)

' Wrap it in an HttpClientConnection to expose the download methods
Dim connection As New HttpClientConnection(nativeObject)

' Save the response from the connection manager to the local path specified
' (filename should hold the full path of the target text file)
Dim filename As String = ""
connection.DownloadFile(filename, True)

The Execute SQL Task uploads the JSON data from the file into a table in the SQL Server database. Once the data is transferred, it uses JSON parsing logic to get the individual elements out.
Parsing JSON can be done using native SQL functions like OPENJSON and JSON_VALUE if you're on SQL 2016, as explained in my earlier blogs.


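As a minimal sketch of the SQL 2016 approach (assuming the raw response has been loaded into the JSONData staging table used below, with the JSON column typed as nvarchar(max); the ticket column list is illustrative):

DECLARE @NextURL varchar(1000)

-- Pick the paging URL from the top level of the response
SELECT @NextURL = JSON_VALUE(j.JSON, '$.next_page')
FROM JSONData j

-- Shred the results array into one row per ticket
SELECT t.id, t.url, t.channel
FROM JSONData j
CROSS APPLY OPENJSON(j.JSON, '$.results')
WITH
(
id int '$.id',
url varchar(500) '$.url',
channel varchar(50) '$.via.channel'
) t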
If you're on an earlier version of SQL, you need to use a UDF for parsing the JSON, like the dbo.parseJSON function used below.


The SP code will look like the below in that case

-- Load the downloaded file into the staging table
INSERT JSONData(JSON)
SELECT *
FROM OPENROWSET(BULK '', SINGLE_BLOB) AS t

IF OBJECT_ID('tempdb..#Temp') IS NOT NULL DROP TABLE #Temp

-- Shred the JSON into name/value pairs using the UDF
SELECT f.*
INTO #Temp
FROM JSONData j
CROSS APPLY dbo.parseJSON(j.JSON) f

-- Capture the paging URL (@NextURL is an OUTPUT parameter of the procedure)
SELECT @NextURL = StringValue
FROM #Temp
WHERE NAME = 'next_page'

Once you parse it, you get the values of the various elements within the JSON, which can be used for extracting the required information.
The next_page value is captured and returned through an OUTPUT parameter back to the package. Using this we can determine whether to iterate again for the next page of results or break out of the loop.
Once you have parsed the data out to a table, you can use it for your further manipulations like aggregation, as shown below.
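For instance, once the ticket attributes have been pivoted into a relational table (a hypothetical Tickets table with id and created_at columns is assumed here), a simple aggregation could look like this:

-- Number of tickets raised per day
SELECT CAST(created_at AS date) AS TicketDate, COUNT(*) AS TicketCount
FROM Tickets
GROUP BY CAST(created_at AS date)
ORDER BY TicketDate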

Conclusion

As seen from the above illustration, you can use a simple Script Task utilizing an HTTP connection in an SSIS package to call ZenDesk APIs and extract data through them. This data can then be used for manipulations within your applications and utilized in BI reports, dashboards etc.

Friday, October 21, 2016

WinSCP Tips: Passing Credentials With Special Characters

Scenario

Recently an issue was faced in one of our projects with a file processing system that we had designed. The system processed third party files which had to be downloaded from an FTP location.
We designed an SSIS package to do the FTP download and file processing and used WinSCP for the purpose. The logic used was as per one of my earlier articles.


On executing the package, the Execute Process Task which was added to do the FTP failed. As SSIS doesn't give much information on the error, I put a breakpoint on the pre-execute event of the Execute Process Task and captured the full statement to be executed by WinSCP.
I launched a command prompt window, tried executing the statement generated by SSIS, and got the below



As seen from the above screenshot, I got an access denied error. I checked the credentials again and tried them in the WinSCP client installed on my machine and, to my surprise, they worked fine, allowing me to connect to the FTP location. From this I could infer that the problem was with the way SSIS or the WinSCP command line was handling the credentials.
I did some detailed analysis of the command used by WinSCP for the FTP operation, which was as below

open ftp://username:password@Servername:Port

And I managed to find the issue. The problem in this case was the password, which contained special characters like +. These break the URL in the case of WinSCP.

Solution

The solution in the above case was to encode the special characters before they are passed into the URL. If you've done .NET programming before, you may know of EscapeDataString, a static method on the System.Uri class. This function returns the encoded value for whatever special character is passed to it; for example, Uri.EscapeDataString("+") returns "%2B".
Based on this we can develop a solution such as the below for handling the above scenario.
Create a function which parses the passed URL string and looks for special characters within it. Each character found is passed to the EscapeDataString function, which returns the corresponding encoded value.
The string parsing can be done using the Regex.Match function to find the special characters.
So the overall package would look like this

The Script Task receives the FTP/SFTP URL as a parameter and then parses it, encoding the special characters, as below

public void Main()
{
    // Encode the URL passed in through the package variable
    var str = EncodeUrl(Dts.Variables["Url"].Value.ToString());
    Dts.Variables["EncodedUrl"].Value = str;
    Dts.TaskResult = (int)ScriptResults.Success;
}

private static string EncodeUrl(string URLInput)
{
    // The set of special characters that need escaping.
    // Note: in practice apply this to the credential portion; run over a
    // full URL it would also encode the :// and @ separators themselves.
    const string pattern = "[$&+,/:;=?@]";
    var match = Regex.Match(URLInput, pattern);
    while (match.Success)
    {
        // Replace every occurrence of the matched character with its
        // encoded form (e.g. "+" becomes "%2B"), then search again
        URLInput = URLInput.Replace(match.Value, Uri.EscapeDataString(match.Value));
        match = Regex.Match(URLInput, pattern);
    }
    return URLInput;
}

Once the encoded URL is returned by the above function, you can store it in a variable and set expression properties on your Execute Process Task to use the returned value.
In the above code you need to add a using directive for the System.Text.RegularExpressions namespace before using the Regex.Match function. Uri.EscapeDataString belongs to the System namespace, which is included by default inside the Script Task.
In the above case the function replaces the character + with its encoded value %2B, which works well with WinSCP.

Conclusion

As discussed above, special characters in credentials have to be encoded before generating the URL for FTP in applications like WinSCP.
The above illustration gives an easy way of doing it within the SSIS package using a simple .NET function invoked from a simple Script Task.
Hope this helps solve similar issues caused by the presence of special characters in URLs and in query string parameter values.

Friday, August 26, 2016

SSRS Tips : Fixing Format Issues in MHTML Rendered SSRS Reports

Impetus

The impetus for writing this blog came from a recent issue I faced while setting up an email-based subscription for one of my MIS reports for enterprise-wide distribution. The report content had to be embedded within the email in HTML format, so I had set the rendering option to MHTML for the subscription. When the report was delivered through the subscription, it could be noticed that none of the formatting or cosmetic settings on the original report were applied.
This article explains how you can overcome this issue and preserve the native format settings applied to the report while rendering it through automated subscriptions in MHTML format.


Illustration

Consider the below report created in SSRS


The requirement was to schedule automated delivery of the above report through email to a list of users. The report had to be embedded within the email body rather than sent as a separate attachment.
The best way to implement this requirement is to set up an email-based subscription for the report and configure the render format as MHTML. This ensures that the report is embedded within the body of the email.
As per the above approach, the subscription is configured for the report as below


When the subscription got triggered the email obtained was as below

As you see from the above, the rendered report was stripped of its format settings when rendered in MHTML format.
This causes difficulty in interpreting the report, especially when the amount of data shown in the report is large. The question was how to make sure the format remains intact while rendering through the subscription as above.
I was able to arrive at the solution after some analysis. The solution is to use expression mode to set the format properties for the report cells.
As an example, see the below illustration
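For instance (the property names and values here are purely illustrative), instead of leaving a static value in a cell's BackgroundColor or Color property, open the property's expression editor and supply the same value as an expression:

BackgroundColor:  ="LightGrey"   (instead of the static value LightGrey)
Color:            ="DarkBlue"    (instead of the static value DarkBlue)

The rendered value is identical; the only difference is that it now comes from an expression.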


Once you set this as an expression and set up the subscription, you can see the rendered report as below


As you see from the above, the report gets rendered with all the format settings correctly applied. All the property values have been set using expressions in the report designer, and the MHTML format preserves the values while rendering through the subscription.

Conclusion

As seen from the illustration above, the format properties of the report can be preserved intact by using expressions while rendering in MHTML format.

Tuesday, August 2, 2016

What's New in SQL 2016 - Temporal Tables

Introduction

Most of us are familiar with CDC tables in SQL Server. CDC, or Change Data Capture, is used to track and capture the DML changes happening in a table and makes use of log entries for capturing the information. SQL 2016 provides a similar kind of functionality for capturing the history of data in a table, including memory-optimized tables. This historical data capturing feature is referred to as Temporal Tables.

Memory-Optimized Tables were introduced in SQL Server 2014 and provide an efficient way to store and manipulate data using natively compiled procedures. They offer two levels of durability: SCHEMA_ONLY and SCHEMA_AND_DATA. SCHEMA_ONLY only ensures the schema is preserved in case of a server restart, whereas in the latter case both the schema and the data are persisted, as with a normal (disk-based) table.
For memory-optimized tables, temporal tables are only supported where the durability option is set to SCHEMA_AND_DATA, so the data is persisted to disk. Once we create a temporal table, it preserves the history and can easily serve data for point-in-time analysis.
A temporal table makes use of a history table internally to track the history of the data changes happening in the main table. The main table has two datetime2 fields, referred to as period columns. These are used to determine the validity of each record, i.e. the period for which the record is (or was) valid, enabling effective point-in-time analysis.

Illustration

Consider the below case where we have a database to which we are going to add a memory optimized table and make it a temporal table.

To create a memory-optimized table we first need a filegroup in the database that supports memory-optimized data. This can be done using a series of ALTER DATABASE statements as shown below (the assumption is that SQL2016LocalTest is an existing database on your SQL Server instance)

ALTER DATABASE [SQL2016LocalTest] ADD FILEGROUP mem_opt CONTAINS MEMORY_OPTIMIZED_DATA   
ALTER DATABASE [SQL2016LocalTest] ADD FILE (name='mem_opt1', filename='') TO FILEGROUP mem_opt   
ALTER DATABASE [SQL2016LocalTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON  
GO  


Once the filegroup is set up as above, we can create the memory-optimized table on it using a CREATE TABLE statement as below

CREATE TABLE dbo.HotelBooking
(
BookingID int NOT NULL PRIMARY KEY NONCLUSTERED,
RoomNo int,
CustomerName varchar(100),
BookingDate datetime,
ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA,
SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.BookingHistory));


Once the above statement is executed, we have a temporal table set up with an internal history table which is used for tracking the history of data from the main table.
You can check this by expanding the Tables node under the corresponding database to see the tables as per the below



As you see, the table is shown with a small clock icon indicating that it is a temporal table, and the history table is listed under it. The history table replicates the schema of the main table and is created with a clustered index.
Now let's insert some data into this table. To evaluate the history capture, it's best to perform the data manipulation operations over a period of time.

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1001,
101,
'Peter',
'20150923'
)

UPDATE dbo.HotelBooking 
SET CustomerName = 'Ramu' 
WHERE RoomNo = 101

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1002,
115,
'John',
'20160314'

)

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1005,
212,
'Mary',
'20160416'

)

UPDATE  dbo.HotelBooking
SET BookingDate = '20160420'
WHERE BookingID = 1005

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1011,
307,
'Khader',
'20160617'

)

UPDATE dbo.HotelBooking 
SET CustomerName = 'Michelle' 
WHERE RoomNo = 101

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1015,
115,
'Mohsin',
'20160622'

)

DELETE FROM dbo.HotelBooking
WHERE BookingID = 1002

Once you've done the above data additions/modifications, you can check the main and history tables and you will see the below data snapshot




I performed the data modifications over a period of time (around 2 days) to show the effect on the history table.
If you analyze the above data you can see how the progression of the data gets captured in the history.
Every time a new insert happens, it is captured in the main table HotelBooking with ValidFrom set to the corresponding system time and ValidTo set to the max date value (9999-12-31 23:59:59.9999999).
Similarly, every update operation is carried out by means of two internal operations: a delete of the old entry followed by an insertion of the new entry. This can be seen from the resultset above, where you will have an entry in both tables (see the records with BookingID 1001 and 1005 in the above case). The history table holds the entry with the values prior to the modification, with ValidFrom as the actual ValidFrom value from the main entry and ValidTo as the corresponding system time. The main table entry holds the modified values, with ValidFrom as the corresponding system time and ValidTo as the max date value.
In the case of delete operations, the record is removed from the main table and an entry is added to the history table with ValidFrom as the original ValidFrom and ValidTo as the corresponding system time.
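Schematically, the effect for BookingID 1001 looks like the below (the timestamps are illustrative placeholders, not the actual values from my run):

dbo.HotelBooking (current)
BookingID  CustomerName  ValidFrom         ValidTo
1001       Michelle      2016-07-25 05:16  9999-12-31 23:59:59.9999999

dbo.BookingHistory
BookingID  CustomerName  ValidFrom         ValidTo
1001       Peter         2016-07-23 10:02  2016-07-24 10:33
1001       Ramu          2016-07-24 10:33  2016-07-25 05:16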
Now that we have an idea of what happens in the background and how data gets captured in the two tables, let's look at the query constructs associated with temporal tables which help us query temporal data for time-based analysis.

AS OF time

The AS OF clause gives a snapshot of the temporal table data at an instant in time, i.e. the data that was valid at that instant.

Let's try it on our sample table; the illustration is below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME AS OF '2016-07-25 05:15'



Now see the last resultset and compare it with the original table and history entries; you will notice that there are only 3 entries in the resultset.
If you check the ValidFrom and ValidTo values of the resultset entries, it's pretty evident that those were the entries valid as of the specified snapshot time (i.e. ValidFrom <= 2016-07-25 05:15 < ValidTo).
The ignored records were those which either expired before the snapshot time (ex: 1001 Peter) or became valid only after the snapshot time (ex: 1015, 1011).
So effectively what AS OF does is apply query logic as below

SELECT * FROM dbo.HotelBooking WHERE ValidFrom <= '2016-07-25 05:15' AND ValidTo > '2016-07-25 05:15'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE ValidFrom <= '2016-07-25 05:15' AND ValidTo > '2016-07-25 05:15'

Execute this and you will get the same resultset as returned by AS OF

Graphically this can be depicted using the three scenarios as shown below


FROM time1 TO time2

The FROM time1 TO time2 clause returns all records whose validity period overlaps the interval from time1 to time2.
The illustration for this is shown below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME  FROM '2016-07-24 10:34' TO '2016-07-25 05:16:55.4510553'



The resultset here consists of all records whose interval from ValidFrom to ValidTo overlaps the interval specified by time1 and time2. One thing to note is that the record with BookingID 1011 was not included, as it became valid only at the end boundary value, and FROM TO only considers records which become valid strictly before the end boundary; i.e. the equivalent query can be written as

SELECT * FROM dbo.HotelBooking WHERE   ValidFrom < '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE    ValidFrom < '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'

The corresponding scenarios can be represented graphically as below


BETWEEN time1 AND time2


The BETWEEN clause works quite similarly to FROM, returning the records whose interval from ValidFrom to ValidTo overlaps the interval specified by time1 to time2. The only difference between BETWEEN and FROM is that BETWEEN also includes records which became valid exactly at the end boundary value (time2), unlike FROM.
So the illustration for BETWEEN looks like the below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME  BETWEEN '2016-07-24 10:34' AND '2016-07-25 05:16:55.4510553'




Check the result and you can see that it is very similar to the resultset for FROM, the only addition being the record with BookingID 1011, which became valid at exactly the end time of the interval.
So the equivalent query in this case can be given as

SELECT * FROM dbo.HotelBooking WHERE   ValidFrom <= '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE    ValidFrom <= '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
ORDER BY ValidFrom

Corresponding graphical representation is shown below



CONTAINED IN (time1,time2)

The CONTAINED IN clause returns a resultset where the record's validity falls entirely within the interval from time1 to time2, i.e. both ValidFrom and ValidTo lie within the interval.

Illustration is given below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME  CONTAINED IN ( '2016-07-24 10:32','2016-07-25 05:18:35.3157322') ORDER BY ValidFrom


Analysis of the resultset reveals that only those records whose validity falls fully within the interval from time1 to time2 are returned by CONTAINED IN.

The equivalent query is as below

SELECT * FROM dbo.HotelBooking WHERE ValidFrom >= '2016-07-24 10:32' AND ValidTo <= '2016-07-25 05:18:35.3157322'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE ValidFrom >= '2016-07-24 10:32' AND ValidTo <= '2016-07-25 05:18:35.3157322' ORDER BY ValidFrom

Graphically this can be represented as below


ALL


The ALL clause simply combines the results from both the main and the history tables into one resultset.

Illustration below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME ALL   ORDER BY ValidFrom



The resultset includes all records from both the tables

The equivalent query is as below

SELECT * FROM dbo.HotelBooking 

UNION ALL

SELECT * FROM dbo.BookingHistory  ORDER BY ValidFrom

See the corresponding graphical representation below

Conclusion

The above illustrations give a clear idea of how the temporal query clauses can be used to generate time-based results from a temporal table.
This shows that temporal tables provide a much more convenient way of capturing history and using it for effective time-based analysis.
Hope this article was informative. Feel free to revert with any comments you may have.