CURRENT PROJECTS
loading
CATEGORIES AND POSTS
loading
overset
DEVELOPMENT LOG FOR JIM PALMER
Posted 04/11/2008 in mssql


Another absolute gem that a coworker of mine, Scott Fitzpatrick, found as part of our continuing database optimization was how to compare character fields fast if doing a "loose" comparison. Say we wanted to find a row where the field was O'BRIEN but we wanted to match on OBRIEN. We have previously been using a home-made MSSQL function to strip all non-alphanumeric characters from character fields in the database and compare against a pre-stripped string as provided by our program - i.e.
... WHERE database.dbo.RemoveNonAlphaNumeric(table.fieldName) like '%OBRIEN%'

If the table has a considerable amount of rows - that roughly means the MSSQL execution plan would involve running this on all the rows in the table prior to comparison which is a nightmare from an optimization standpoint. Thus comes the optimization by using the % wildcard character between every character doing normal string comparison with the LIKE clause:
... WHERE table.fieldName LIKE '%O%B%R%I%E%N%'
This can be programmatically parsed by the program building the query in the firstplace and dramatically reduce the overhead when it comes to the MSSQL execution plan. In the end we see that
... WHERE table.fieldName LIKE '%O%B%R%I%E%N%' AND database.dbo.RemoveNonAlphaNumeric(table.fieldName) like '%OBRIEN%'
is dramatically faster and requires only a little extra work on the programmer's side to accomplish the task of injecting the wildcard % between each character in the comparison string.

Here's an example of this pre-parsing of the comparison string in coldfusion:
<cfset parseFirstName = Trim(arguments.first_name)>
<cfloop index="charFInd" from="#Len(parseFirstName) - 1#" to="1" step="-1">
	<cfset parseFirstName = Insert("%", parseFirstName, charFInd)>
</cfloop>


See more on the non-regex MSSQL non-alphanumeric and non-numeric character field stripping functions: http://www.overset.com/2008/04/11/mssql-functions-to-strip-non-alphanumeric-and-non-numeric-characters-in-character-fields/
comments
loading
new comment
NAME
EMAIL ME ON UPDATES
EMAIL (hidden)
URL
MESSAGE TAGS ALLOWED: <code> <a> <pre class="code [tab4|tabX|inline|bash]"> <br>
PREVIEW COMMENT
TURING TEST
gravatar