Blair collides with Brainlessness

by acha11 27. August 2004 03:09

I just had a fun conversation with Ronny about how to set up an object network where children are stored in hashtables keyed on the child's unique id so that when the child's unique id is changed the parent's hashtable is easily updateable.

the funnest part is where we were speculating about the quality of hashing on string. turns out that the default string GetHashCode is based on sequentially XOR-ing each character of the string to produce a 32-bit value.

plus, this guy ran into a problem where hashing 350,000 words from an english dictionary ran into 64 collisions. The example he gave was that "blair" and "brainlessness" both give you a hash code of 175803953 [try it: Console.WriteLine( "{0} = {1}", "blair".GetHashCode(), "brainlessness".GetHashCode() );]

what i'm proud of is that i understood why:
blair
brainlessness

subtract the letters b, l, a, i and r from brainlessness, and you end up with "nessness". xor is commutative, so the two "blairs" will come up with the same hashcode; the difference will be the "nessness". notice that "nessness" repeats "ness" twice. what happens when we xor a number with itself? all the bits are turned off - we get zero. therefore, "ness" xor "ness" will be zero. what happens when we xor "blair" with zero? no bits are changed. therefore, f("blair") = f("brainlessness"), which is a nice little bit of political commentary...

to stop blair colliding with brainlessness, you could change GetHashCode to add 1 to the hashcode every time it steps one character forward in the string; this would mean that "an" would not collide with "na", for example, which seems like a nice feature for a hashing algorithm to have.

Tags:

Comments are closed

Powered by BlogEngine.NET 1.4.5.0
Theme by Mads Kristensen

RecentComments

Comment RSS