Tuesday, March 19, 2013

SQL - Parsing date from text / varchar field

I had a fun little SQL task to do today.  We have a large field of text that contains some comments and next to each comment is a date.  We wanted to find the most recent comment and sort the records by the date that is next to that comment in the text.  Luckily, we build that text field with the most recent dates at the top, so in this case, I just needed to parse out the first date that I could find.

I can find the year easily enough by searching for "/201" (actually that will only work for 7 more years, oops).  The main complication in my case is that the dates don't use leading zeroes, so they aren't all the same length.  Some are 2/2/2013 (8 characters) and some are 11/12/2013 (10 characters).  So I couldn't just find the "/201" and go back 5 characters.  And I didn't want to just search for the first slash, just in case there was a slash used for a different reason.  So I looked for "/201", then went back 5 characters and found the first slash before that, then went back 2 characters and built a substring from there.  Again the month really could be one or two characters, so in many cases my substring now had a leading character.  In my case, it was usually a space or newline character, so then I stripped newlines using REPLACE.

The last complication is that I wanted to sort by this date, and alphabetical sort won't work because of the lack of leading zeroes.  I had to do a CONVERT() to a datetime in order to sort correctly.  That will fail though if it finds some improperly formatted date (which we have in the text) so I had to use a CASE statement with an IsDate check.  Here's the final query:

SELECT
CASE 
    WHEN IsDate(REPLACE(REPLACE(REPLACE(SUBSTRING(myTable.Comments, CHARINDEX('/', myTable.Comments, CHARINDEX('/201', myTable.Comments) - 5)-2, 10), CHAR(10), ''), CHAR(13), ''), CHAR(9), '')) = 1 
    THEN CONVERT(datetime, REPLACE(REPLACE(REPLACE(SUBSTRING(myTable.Comments, CHARINDEX('/', myTable.Comments, CHARINDEX('/201', myTable.Comments) - 5)-2, 10), CHAR(10), ''), CHAR(13), ''), CHAR(9), '')) 
    ELSE NULL 
END As LastCommentDate
FROM myTable

ISSUES:
1) Not sure how fast this is, it seems like it may be slow
2) Once I find the start point, I just make a substring of length 10.  That works for me for our data, but more safe/accurate would be to do a difference between the start index and the index of the "201" to know for sure how long the month and day part is.