Thursday, November 14, 2013

T-SQL tips: Pattern Search and Replace within XML using Xpath Functions

T-SQL provides good support for generating and manipulating XML data from SQL 2005 onwards. Using the functions like query(), nodes() etc we can shred the data within XML document. Similarly to build a XML out of relational data we can use FOR XML construct.
Recently we had a requirement to search for a pattern within XML document element and then replace it with another pattern. The XML was imported from another system which had lots of spurious duplicate entries. There was a reference table maintained with cleansed results for these values using DQS (Data Quality Services) project. The attempt was to replace the spurious patterns with values from the table. We have modify() method available in T-SQL which can be used to replace node value. But issue with it is that it will replace the entire node value and not a pattern within it. So we needed a workaround to achieve what we wanted. I'm explaining the method we used here with an illustration to benefit others who have similar requirements.
The table containing XML document looks like this

declare @t table
x xml

With the xml document values are as follows 

And the cleansed reference table looks like below

declare @Cleanse table
SourceString varchar(1000),
CleansedString varchar(1000)

insert @Cleanse
values('1 Maix Street','1 Main Street'),
('2 St Albrrts Street','2 St Alberts Street'),
('393 Crossaway','393 Crossway'),

The  requirement was to use the cleansed results to correct the wrong ADDRESS node values.
The modify() functions allows only entire value to get replaced 
So the solution is to use a workaround like this.

SET x.modify('replace value of (/CONTACT[1]/ADDRESS/text())[1] with sql:column("CleansedCol")')
FROM (SELECT x,REPLACE(x.query('/CONTACT/ADDRESS').value('.','varchar(100)'),SourceString,CleansedString) AS CleansedCol
 FROM @t t
JOIN  @Cleanse c
ON x.exist('/CONTACT/ADDRESS[contains(.,sql:column("SourceString"))]')=1

If you check the query it shreds the ADDRESS node value using value function and applies REPLACE() over it to replace the occurrences of wrong string with the cleansed results using the reference table. This is then used to replace the full value within the ADDRESS node using modify() method to get the desired result.
Now if you select xml data from table you can see the below

indicating that the typos in addresses were cleansed as per reference table entries.
This explains how you can apply xpath functions and achieve find and replace pattern functionality within XML documents in T-SQL.