Thursday, April 13, 2017

SSIS Tips: Case Study - Extracting ZenDesk data using SSIS

Impetus

 Its been a long time since I wrote a blog.
The main reason was that I've been super busy with some exciting projects of late.
One of my recent projects had an interesting requirement which I thought would be worth sharing for the benefit of the wider audience.

Scenario

One of my recent BI engagements with a e-commerce customer had a use case which required customer satisfaction metrics to be captured and analyzed. The metrics were to be derived from variety of data including support tickets, delivery, feedback and notification emails etc. The client was using ZenDesk as the ticketing application. All of the above required information was tracked by means of tickets within ZenDesk so the attempt was to integrate ZenDesk data using ETL tool SSIS and use the data for analysis along with the other available data.

Illustration

ZenDesk offers API access to its data. The attempt was to use SSIS and access the required ZenDesk data using API calls. 
For the purpose of this blog lets see a simple scenario of how SSIS can be used to call ZenDesk API for extracting tickets related data.
The details regarding ZenDesk APIs can be found here


Refer to the Search section which explains on various API endpoint for searching ZenDesk data. We will be utilizing this for searching and fetching the ticket information using our required filters. The search API is flexible due to the fact that it allows a set of query string parameters which can be utlized to customize our search. Some example usages are given in the below link 


So for our case since we require ticket information the full URL would look like below

https://{subdomain}.zendesk.com/api/v2/search.json?sort_by=created_at&sort_order=asc&query=created> type:ticket

Here subdomain represents the subdomain used while registering with ZenDesk (mostly the company name) and date value the date relative to which you want to do the search. sort_by parameter is used to get the search result in chronological order.
The response for the above API call will look like this

{
"results":[
{
"url":"https://.zendesk.com/api/v2/tickets/36046.json",
"id":36046,
"external_id":null,
"via":{
"channel":"email",
"source":{
"from":{
"address":"xyz2@gmail.com",
"name":"XYZ 2"
},
"to":{
"name":"abc.com",
"address":"customercare@abc.com"
},
"rel":null
}
},
...],
"facets":null,
"next_page":null,
"previous_page":null,
"count":72
}

As you see it gives a JSON response having the above structure 
One more thing to note here is that API returns only a maximum of 100 results in a response. In case search result exceeds 100 items the response gets auto-paged at every 100 results. The url for the next page will be sent along with the response (refer to the next_page/previous_page elements in the above JSON)
So the cases where we have more results we need to capture next_page value each time and sent a new request to the server using that url. This will continue until the last page in which case the next_page value will be null.
Considering all these points a simple package for extracting ZenDesk data will look like below


Now we can analyse the various tasks inside the package to understand their purpose
For Loop: The loop is included to ensure we iterate through requests until we reach the last page of our search result. It will be a simple For loop based on a counter which will be initialized to 1 and reset inside the loop when the results are completed to break from the iteration.
The Expression Tasks are used to set the iterative values for request URL i.e. URL to fetch next page of results as well as for the reset of the URL and counter values
The script task does the download of search results in JSON format into a text file using API call. It utilizes a HTTP connection which will use ZenDesk API URL as the source through a variable expression to pass dynamic values like date.
The variable expression in our case looks like below

"https://{domain}.zendesk.com/api/v2/search.json?sort_by=created_at&sort_order=asc&query=created >" + (DT_WSTR, 30)  (DT_DBDATE) @[User::Date] + " type:ticket"

The script task code will look like this

Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)

        
        Dim connection As New HttpClientConnection(nativeObject)


        ' Download the file #1
        ' Save the file from the connection manager to the local path specified
        Dim filename As String = ""
        connection.DownloadFile(filename, True)

The Execute SQL Task uploads the JSON data from the file to a table in SQL Server database. Once data is transferred it uses a JSON parsing logic to get the individual elements out.
Parsing JSON can be done using native SQL functions like OPENJSON, JSON_VALUE etc if you're on SQL 2016 . This is already explained in my earlier blogs below


If you're on an earlier version of SQL, you need to use a UDF for parsing JSON like one below


The SP code will look like below in that case

INSERT JSONData(JSON)
SELECT *
FROM OPENROWSET(BULK '',SINGLE_BLOB) AS t


IF OBJECT_ID('tempdb..#Temp') IS NOT NULL DROP TABLE #Temp

SELECT f.* 
INTO #Temp 
FROM JSONData j
CROSS APPLY dbo.parseJSON(j.JSON)f


SELECT @NextURL = StringValue
FROM #Temp
WHERE NAME = 'next_page'

Once you parse it you will get values of various elements within the JSON. This can be used for extracting the required information
The next_url value will be captured and returned through an OUTPUT parameter back to the package. Using this we can determine whether to iterate again for the next page of results or break the loop.
Once you parse the data out to a table you can then use it for your further manipulations like aggregation etc.

Conclusion

As seen from the above illustration you can use a simple script task utilizing a HTTP connection in SSIS package to call and extract data through ZenDesk APIs. This data can then be used for manipulations within your applications and can be utilized in BI reports, dashboards etc.