Swift Hashable
Feb 13, 2017 · 5 minute readSwift
I already covered making a custom Swift type Equatable
and Comparable
which allows us to test if an Array
contains an instance of our type or to sort the array amongst other things. What if we want to store our type in a Set
or Dictionary
?
Update 2 April 2018: Starting with Swift 4.1 you can have the compiler automatically synthesize Hashable
conformance for your type. See How To Get Equatable And Hashable For Free.
Making Our Custom Type Hashable
To use a type in a Set
or Dictionary
it has to be Hashable
which means providing a hashValue
property. Your type must also be Equatable
which I covered in Swift Equatable and Comparable. To recap our Swift Country
type looks like this:
struct Country {
let name: String
let capital: String
var visited: Bool
}
extension Country: Equatable {
static func == (lhs: Country, rhs: Country) -> Bool {
return lhs.name == rhs.name &&
lhs.capital == rhs.capital &&
lhs.visited == rhs.visited
}
}
The hashValue
property is an integer that must be the same for any two instances of the type that are equal. Swift standard library types such as String
, Integer
, Bool
are all Hashable
.
// hash values can vary
let hello = "hello"
let world = "world"
hello.hashValue // 4799432177974197528
"\(hello) \(world)".hashValue // 3658945855109305670
"hello world".hashValue // 3658945855109305670
Our custom type has two String
values and a Bool
. A quick way to create our hash is to XOR (^) the hashValue
for each of these standard types. Extending our type to make it Hashable
:
extension Country: Hashable {
var hashValue: Int {
return name.hashValue ^ capital.hashValue ^ visited.hashValue
}
}
We can now use Country
in a Set
or Dictionary
:
let destinations: Set = [belgium,canada,brazil]
let europe: Set = [belgium,france,uk]
destinations.intersection(europe) // belgium
let counts = [uk: 1000, france: 2000]
counts[uk] // 1000
Notes:
Do not assume that two instances of a type with the same hash value are equal. Depending on how we compute the hash value we can get collisions where two different instances share the same hash value. The
Hashable
protocol only needs the reverse - two equal instances have the same hash value.The hash value of a Bool from the Swift Standard Library happens to be either 0 (false) or 1 (true) - but don’t depend on those values.
There is no guarantee that a hash value will be the same for different executions of your code.
Down the Hash Function Rabbit Hole
Warning, hash functions are a fascinating topic to read about if you have time to spare. Without getting too lost I want to mention some limitations of the hash I used above and some better approaches.
Our initial try is not bad. It works fine for the intended data set of countries. It is not so great for other data sets. What makes a good hash function? Knuth tells us that a good hash function should be fast and minimize collisions.
Look what happens if we had countries named with their capital cities:
let london = Country(name: "London", capital: "London", visited: false)
let paris = Country(name: "Paris", capital: "Paris", visited: false)
london.hashValue // 0
paris.hashValue // 0
Both items have a hash value of zero and we get a collision. Don’t ever assume that two items with the same hash value are equal. This happens because we XOR the individual hash values but XOR’ing a value with itself gives zero (A ^ A = 0). A collision would also happen if we had two countries with reversed name and capital values:
let canada = Country(name: "Canada", capital: "Ottawa", visited: false)
let ottawa = Country(name: "Ottawa", capital: "Canada", visited: false)
canada.hashValue // 3695199242423112
ottawa.hashValue // 3695199242423112
This is because XOR is commutative which is a fancy way to say that (A ^ B) = (B ^ A). This is not ideal but it is worth remembering that collisions are not a showstopper. With real data collisions can happen and the Set
and Dictionary
types have to handle them, it is just less efficient. So this works even though both items have the same hash:
let counts = [canada: 1000, ottawa: 2000]
counts[canada] // 1000
A Better Way To Combine Hashes
Can we create a better hash for our Country
that avoids these simple collisions? One way is to use a different hash function for the two strings. A quick search
leads to some hash functions that work well on strings. Here are the djb2 and sdbm hash functions as computed properties of String
:
extension String {
var djb2hash: Int {
let unicodeScalars = self.unicodeScalars.map { $0.value }
return unicodeScalars.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}
var sdbmhash: Int {
let unicodeScalars = self.unicodeScalars.map { $0.value }
return unicodeScalars.reduce(0) {
Int($1) &+ ($0 << 6) &+ ($0 << 16) - $0
}
}
}
I’ll use the first of these hash functions on the name:
extension Country: Hashable {
var hashValue: Int {
return name.djb2hash ^ capital.hashValue ^ visited.hashValue
}
}
This avoids the collisions with matching name and capital (actual hash values can vary):
let london = Country(name: "London", capital: "London", visited: false)
let paris = Country(name: "Paris", capital: "Paris", visited: false)
london.hashValue // 4792642925948815646
paris.hashValue // 4799464424543103873
let canada = Country(name: "Canada", capital: "Ottawa", visited: false)
let ottawa = Country(name: "Ottawa", capital: "Canada", visited: false)
canada.hashValue // 4792300300145562762
ottawa.hashValue // 4795361053083927978
As I was writing this post I came across a more general approach in the Sourcery library by Krzysztof Zabłocki. This uses a Swift version of the boost hash combine function to combine an array of hash values. For example we could use it this way:
var hashValue: Int {
return combineHashes([name.hashValue,capital.hashValue,visited.hashValue])
}
There are many other approaches we could try but i think for now that is more than I ever wanted to know about hashing.