CURRENT PROJECTS
loading
CATEGORIES AND POSTS
loading
overset
DEVELOPMENT LOG FOR JIM PALMER
Posted 09/01/2008 in javascript


Most Javascript implementations have great sort implementations utilizing a fast sort algorithm but they all lack the ability to perform a "natural sort". That is, sorting an array of dates, software version numbers, etc. and getting the "natural" a.k.a. "expected" ordering on the results.

UPDATE 2/25/2012
New 0.7 version with more fixes and support for case-insensitive sorting which has been asked for by many. The default sort is case-sensitive and will sort ['a', 'B'] to ['B', 'a']. This sort functionality was modeled after PHP's natsort() functionality which is case-sensitive as well. This new version just adds a flag to tell naturalSort() to sort insensitively:
>>> naturalSort.insensitive = true;
>>> ['a', 'B'].sort(naturalSort);
["a", "B"]
As opposed to the default case-sensitive sorting:
>>> ['a', 'B'].sort(naturalSort);
["B", "a"]
I would like to come up with a more elegant way to flag case sensitivity - I was thinking of curry/closure/etc. idea to something of the degree [1,2].sort(naturalSort(true)) to sort case-insensitively, but more on that later.

UPDATE 3/5/2011
New 0.6 version with a more robust Date detection and sorting. Recent changes in Chrome Beta's Date.parse() function made it possible for much more loose parsing of strings into dates. For instance, Date.parse('1') in Chrome Beta returns a valid UTC time value while in other browsers it returns a NaN. Earlier versions of my naturalSort() relied on Date.parse() to detect dates in any/all comparisons which was a bad assumption on my part because of how loose the ECMA spec is on the details of what Date.parse() is able to parse into a date. Quoting the ECMA-262 spec in section 15.9.4.2, "The function first attempts to parse the format of the String according to the rules called out in Date Time String Format (15.9.1.15). The function first attempts to parse the format of the String according to the rules called out in Date Time String Format (15.9.1.15). If the String does not conform to that format the function may fall back to any implementation-specific heuristics or implementation-specific date formats." it is safe to assume each implementation will attempt to parse an ISO8601 date string and then fall back on basically anything else - which is the case in the new Chrome Beta. I updated a new Date regexp to aid in detection of dates instead of assuming Date.parse() will be able to perfectly detect them. The regexp loosely attempts to find ISO8601, RFC1123, .toString(), .toUTCString(), .toLocaleDateString(), etc date formats which should work in the majority of date sorting use-cases.

SOLUTION
Here's a very simple "Natural Sort" implementation that shares the same core "chunking" concept as displayed in many implementations that appears to have been first adopted by Dave Koelle.

I wanted to build a function that split each string to compare for sorting into an array in the appropriate order based off blocks of strings and blocks of numbers similar to how the String.split() functionality works. This is the core of the "chunking" so that you can do the sort comparison off portions of the string and rely on the browser's built in < and > operators. Another goal was NOT to rely on the String.charCodeAt() and compare ASCII character values on a per-character basis. This seemed like complete overkill considering once each "chunk" can be properly sorted off the browser's built-in comparison operators properly.

The approach I used was to treat everything as a string, delimitate the numeric and alpha portions of each string by the null char(0) and then split the string into an array off the null char(0) delimiter. This null char(0) will almost never appear in a string - and if it did we'd not really be able to do a comparison operation off of it.

Here's the simple naturalSort function to be used to point the Array.sort() method to:
/*
 * Natural Sort algorithm for Javascript - Version 0.7 - Released under MIT license
 * Author: Jim Palmer (based on chunking idea from Dave Koelle)
 */
 function naturalSort (a, b) {
    var re = /(^-?[0-9]+(\.?[0-9]*)[df]?e?[0-9]?$|^0x[0-9a-f]+$|[0-9]+)/gi,
        sre = /(^[ ]*|[ ]*$)/g,
        dre = /(^([\w ]+,?[\w ]+)?[\w ]+,?[\w ]+\d+:\d+(:\d+)?[\w ]?|^\d{1,4}[\/\-]\d{1,4}[\/\-]\d{1,4}|^\w+, \w+ \d+, \d{4})/,
        hre = /^0x[0-9a-f]+$/i,
        ore = /^0/,
        i = function(s) { return naturalSort.insensitive && (''+s).toLowerCase() || ''+s },
        // convert all to strings strip whitespace
        x = i(a).replace(sre, '') || '',
        y = i(b).replace(sre, '') || '',
        // chunk/tokenize
        xN = x.replace(re, '\0$1\0').replace(/\0$/,'').replace(/^\0/,'').split('\0'),
        yN = y.replace(re, '\0$1\0').replace(/\0$/,'').replace(/^\0/,'').split('\0'),
        // numeric, hex or date detection
        xD = parseInt(x.match(hre)) || (xN.length != 1 && x.match(dre) && Date.parse(x)),
        yD = parseInt(y.match(hre)) || xD && y.match(dre) && Date.parse(y) || null,
        oFxNcL, oFyNcL;
    // first try and sort Hex codes or Dates
    if (yD)
        if ( xD < yD ) return -1;
        else if ( xD > yD ) return 1;
    // natural sorting through split numeric strings and default strings
    for(var cLoc=0, numS=Math.max(xN.length, yN.length); cLoc < numS; cLoc++) {
        // find floats not starting with '0', string or 0 if not defined (Clint Priest)
        oFxNcL = !(xN[cLoc] || '').match(ore) && parseFloat(xN[cLoc]) || xN[cLoc] || 0;
        oFyNcL = !(yN[cLoc] || '').match(ore) && parseFloat(yN[cLoc]) || yN[cLoc] || 0;
        // handle numeric vs string comparison - number < string - (Kyle Adams)
        if (isNaN(oFxNcL) !== isNaN(oFyNcL)) { return (isNaN(oFxNcL)) ? 1 : -1; }
        // rely on string comparison if different types - i.e. '02' < 2 != '02' < '2'
        else if (typeof oFxNcL !== typeof oFyNcL) {
            oFxNcL += '';
            oFyNcL += '';
        }
        if (oFxNcL < oFyNcL) return -1;
        if (oFxNcL > oFyNcL) return 1;
    }
    return 0;
}

CODE.GOOGLE PROJECT
DATE SORTING SUPPORT
Finally added the ability to sort off Date objects based off the .getTime() unix epoch timestamp comparisons. This will only work against valuation of fields that in their entirety can instantiate a valid Date object via the Date constructor. This means that any valid input string that can create a date via the Date constructor, i.e. new Date('10/10/2005') or new Date('Tue Sep 09 2008 20:32:28 GMT-0700 (Pacific Daylight Time)'), can work. The value cannot contain any other character that might throw the Date constructor off, i.e. a '10/10/2005 original' will not work.

UNICODE SUPPORT
This sort algorithm works against UNICODE characters/scripts as well because it relies on the browser's built in string comparison operators. I have a working demo of a this in action sorting a column in my http://www.overset.com/2008/08/30/animated-sortable-datagrid-jquery-plugin-jtps/ post.

SPEED
Upon initial testing this function appears to be 60% faster than Brian Huisman's Javascript implementation of Dave Koelle's Alphanum algorithm when sorting against string fields of around 20 characters on average and 500 entries in the array. It appears to be the same speed when sorting against purely numeric integer-based array values at around 500 entries.

BROWSER SUPPORT
FF3, IE8, Chrome, Safari

UNIT TESTS
unit-tests.html - Unit tests were built with QUnit and offer more examples and supported sortable types:

EXAMPLES
Sorted arrays using this functionality:
// Simple numerics
>>> ['10',9,2,'1','4'].sort(naturalSort)
['1',2,'4',9,'10']

// Floats
>>> ['10.0401',10.022,10.042,'10.021999'].sort(naturalSort)
['10.021999',10.022,'10.0401',10.042]

// Float & decimal notation
>>> ['10.04f','10.039F','10.038d','10.037D'].sort(naturalSort)
['10.037D','10.038d','10.039F','10.04f']

// Scientific notation
>>> ['1.528535047e5','1.528535047e7','1.528535047e3'].sort(naturalSort)
['1.528535047e3','1.528535047e5','1.528535047e7']

// IP addresses
>>> ['192.168.0.100','192.168.0.1','192.168.1.1'].sort(naturalSort)
['192.168.0.1','192.168.0.100','192.168.1.1']

// Filenames
>>> ['car.mov','01alpha.sgi','001alpha.sgi','my.string_41299.tif'].sort(naturalSort)
['001alpha.sgi','01alpha.sgi','car.mov','my.string_41299.tif'

// Dates
>>> ['10/12/2008','10/11/2008','10/11/2007','10/12/2007'].sort(naturalSort)
['10/11/2007', '10/12/2007', '10/11/2008', '10/12/2008']

// Money
>>> ['$10002.00','$10001.02','$10001.01'].sort(naturalSort)
['$10001.01','$10001.02','$10002.00']

// Movie Titles
>>> ['1 Title - The Big Lebowski','1 Title - Gattaca','1 Title - Last Picture Show'].sort(naturalSort)
['1 Title - Gattaca','1 Title - Last Picture Show','1 Title - The Big Lebowski']

// By default - case-sensitive sorting
>>> ['a', 'B'].sort(naturalSort);
['B', 'a']

// To enable case-insensitive sorting
>>> naturalSort.insensitive = true;
>>> ['a', 'B'].sort(naturalSort);
['a', 'B']

comments
loading
new comment
NAME
EMAIL ME ON UPDATES
EMAIL (hidden)
URL
MESSAGE TAGS ALLOWED: <code> <a> <pre class="code [tab4|tabX|inline|bash]"> <br>
PREVIEW COMMENT
TURING TEST
gravatar