Jeff Thompson | a28eed8 | 2013-08-22 16:21:10 -0700 | [diff] [blame^] | 1 | |
| 2 | [/ Copyright 2011 Daniel James. |
| 3 | / Distributed under the Boost Software License, Version 1.0. (See accompanying |
| 4 | / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ] |
| 5 | |
| 6 | [section:rationale Rationale] |
| 7 | |
| 8 | The rationale can be found in the original design |
| 9 | [footnote issue 6.18 of the __issues__ (page 63)]. |
| 10 | |
| 11 | [heading Quality of the hash function] |
| 12 | |
| 13 | Many hash functions strive to have little correlation between the input |
| 14 | and output values. They attempt to uniformally distribute the output |
| 15 | values for very similar inputs. This hash function makes no such |
| 16 | attempt. In fact, for integers, the result of the hash function is often |
| 17 | just the input value. So similar but different input values will often |
| 18 | result in similar but different output values. |
| 19 | This means that it is not appropriate as a general hash function. For |
| 20 | example, a hash table may discard bits from the hash function resulting |
| 21 | in likely collisions, or might have poor collision resolution when hash |
| 22 | values are clustered together. In such cases this hash function will |
| 23 | preform poorly. |
| 24 | |
| 25 | But the standard has no such requirement for the hash function, |
| 26 | it just requires that the hashes of two different values are unlikely |
| 27 | to collide. Containers or algorithms |
| 28 | designed to work with the standard hash function will have to be |
| 29 | implemented to work well when the hash function's output is correlated |
| 30 | to its input. Since they are paying that cost a higher quality hash function |
| 31 | would be wasteful. |
| 32 | |
| 33 | For other use cases, if you do need a higher quality hash function, |
| 34 | then neither the standard hash function or `boost::hash` are appropriate. |
| 35 | There are several options |
| 36 | available. One is to use a second hash on the output of this hash |
| 37 | function, such as [@http://www.concentric.net/~ttwang/tech/inthash.htm |
| 38 | Thomas Wang's hash function]. This this may not work as |
| 39 | well as a hash algorithm tailored for the input. |
| 40 | |
| 41 | For strings there are several fast, high quality hash functions |
| 42 | available (for example [@http://code.google.com/p/smhasher/ MurmurHash3] |
| 43 | and [@http://code.google.com/p/cityhash/ Google's CityHash]), |
| 44 | although they tend to be more machine specific. |
| 45 | These may also be appropriate for hashing a binary representation of |
| 46 | your data - providing that all equal values have an equal |
| 47 | representation, which is not always the case (e.g. for floating point |
| 48 | values). |
| 49 | |
| 50 | [endsect] |