Thursday, January 17, 2013

Excel issues in handling intermixed data

I was doing some adhoc data importing from Excel to SQL server for some analysis and I encountered a strange issue. The excel was not returning data for some of the fields correctly. The excel data looks like below (for simplicity I've given just a representation).

I was using OPENROWSET for doing the data import. The statement looked like below



SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\Documents and Settings\Administrator\My Documents\Excel\IMEXTest.xls;HDR=Yes;', [Data$])


This was the result i got


As you see from the above, it was losing data for one of the columns. The reason was the initial set of rows having NULL values for the column. By default, excel determines the datatype by scanning first few rows and that why it was not able to fetch following data items correctly.
I played with different options and was finally able to get the solution. The solution was to add IMEX=1 argument to connection string as follows.


SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\Documents and Settings\Administrator\My Documents\Excel\IMEXTest.xls;HDR=Yes;IMEX=1', [Data$])

This would enable Excel to read intermixed datatypes and interpret data accordingly. The corresponding output would be as below

As you see from above adding argument IMEX=1 causes Excel to read the missing data correctly.