Excel and Sql, or the joys of data massages

A problem we encounter too often is extracting data the business stores in an excel spreadsheet. The way some people deal with it is way too often to copy and paste all this in T-SQL code and have long integration scripts.

This causes two major problems. The first one is that you rely on your database to integrate data in your application from other systems. The vast majority of products have validation logic and application integrity checks into your application code, and rarely in the database schema. Not only that, but you suddenly expose as a public and exposed system what should remain an implementation detail of your service. In the end, you just made changes to your database structures an order of magnitude more difficult, which now decreases the maintainability of your code.

And don't even get me started on testability of T-SQL integration scripts, even with the new VS for database professionals unit testing. It just doesn't add up.

That said, the not to do is often the was done and conflicts with the needs to be done by tomorrow. Often enough a project cannot (or doesn't want to) afford a proper integration between their systems, as these projects come in expensive. Often, good enough is what the business wants, and the business controls the budget.

And in these cases, you often have to deal with data from Excel and Access. Few people actually know that you can query these files from within your T-SQL code, by using the OPENROWSET command, which gives you whichever table you want to select, through the table name or through a query.

In this instance I've just spent a good hour trying to understand why my excel spreadsheet opened through the OpenRecordset would return me a column full of null values when the spreadsheet itself had the correct values.

As usual, the answer is hidden in the knowledge base, KB194124. Excel tries to guess the datatype of a column, and gives you null if it fails. Rule of thumb, never trust anything that tries being smart.

Solution is to switch excel to import mode, by changing the connection string you use to add IMEX=1:

SELECT * INTO #xl FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=file.xls;IMEX=1;', ['Festival List$'])

And I'm back in business!

[UPDATE: Added some commentary. Also changed the text around a bit, sometimes I realize my English gets fuzzy when I don't proof read before posting. Sorry chaps.]

Technorati Tags: , ,

Ads

Comment