Swift Hashable

I already covered making a custom Swift type Equatable and Comparable which allows us to test if an Array contains an instance of our type or to sort the array amongst other things. What if we want to store our type in a Set or Dictionary?

Making Our Custom Type Hashable

To use a type in a Set or Dictionary it has to be Hashable which means providing a hashValue property. Your type must also be Equatable which I covered in Swift Equatable and Comparable. To recap our Swift Country type looks like this:

struct Country {
  let name: String
  let capital: String
  var visited: Bool
}

extension Country: Equatable {
  static func == (lhs: Country, rhs: Country) -> Bool {
    return lhs.name == rhs.name &&
      lhs.capital == rhs.capital &&
      lhs.visited == rhs.visited
  }
}

The hashValue property is an integer that must be the same for any two instances of the type that are equal. Swift standard library types such as String, Integer, Bool are all Hashable.

// hash values can vary
let hello  = "hello"
let world = "world"
hello.hashValue                // 4799432177974197528
"\(hello) \(world)".hashValue  // 3658945855109305670
"hello world".hashValue        // 3658945855109305670

Our custom type has two String values and a Bool. A quick way to create our hash is to XOR (^) the hashValue for each of these standard types. Extending our type to make it Hashable:

extension Country: Hashable {
  var hashValue: Int {
    return name.hashValue ^ capital.hashValue ^ visited.hashValue
  }
}

We can now use Country in a Set or Dictionary:

let destinations: Set = [belgium,canada,brazil]
let europe: Set = [belgium,france,uk]
destinations.intersection(europe)  // belgium

let counts = [uk: 1000, france: 2000]
counts[uk]  // 1000

Notes:

Down the Hash Function Rabbit Hole

Warning, hash functions are a fascinating topic to read about if you have time to spare. Without getting too lost I want to mention some limitations of the hash I used above and some better approaches.

Our initial try is not bad. It works fine for the intended data set of countries. It is not so great for other data sets. What makes a good hash function? Knuth tells us that a good hash function should be fast and minimize collisions.

Look what happens if we had countries named with their capital cities:

let london = Country(name: "London", capital: "London", visited: false)
let paris = Country(name: "Paris", capital: "Paris", visited: false)
london.hashValue  // 0
paris.hashValue   // 0

Both items have a hash value of zero and we get a collision. Don’t ever assume that two items with the same hash value are equal. This happens because we XOR the individual hash values but XOR’ing a value with itself gives zero (A ^ A = 0). A collision would also happen if we had two countries with reversed name and capital values:

let canada = Country(name: "Canada", capital: "Ottawa", visited: false)
let ottawa = Country(name: "Ottawa", capital: "Canada", visited: false)
canada.hashValue  // 3695199242423112
ottawa.hashValue  // 3695199242423112

This is because XOR is commutative which is a fancy way to say that (A ^ B) = (B ^ A). This is not ideal but it is worth remembering that collisions are not a showstopper. With real data collisions can happen and the Set and Dictionary types have to handle them, it is just less efficient. So this works even though both items have the same hash:

let counts = [canada: 1000, ottawa: 2000]
counts[canada]  // 1000

A Better Way To Combine Hashes

Can we create a better hash for our Country that avoids these simple collisions? One way is to use a different hash function for the two strings. A quick search leads to some hash functions that work well on strings. Here are the djb2 and sdbm hash functions as computed properties of String:

extension String {
  var djb2hash: Int {
    let unicodeScalars = self.unicodeScalars.map { $0.value }
    return unicodeScalars.reduce(5381) {
      ($0 << 5) &+ $0 &+ Int($1)
    }
  }

  var sdbmhash: Int {
    let unicodeScalars = self.unicodeScalars.map { $0.value }
    return unicodeScalars.reduce(0) {
      Int($1) &+ ($0 << 6) &+ ($0 << 16) - $0
    }
  }
}

I’ll use the first of these hash functions on the name:

extension Country: Hashable {
  var hashValue: Int {
    return name.djb2hash ^ capital.hashValue ^ visited.hashValue
  }
}

This avoids the collisions with matching name and capital (actual hash values can vary):

let london = Country(name: "London", capital: "London", visited: false)
let paris = Country(name: "Paris", capital: "Paris", visited: false)
london.hashValue  // 4792642925948815646
paris.hashValue   // 4799464424543103873

let canada = Country(name: "Canada", capital: "Ottawa", visited: false)
let ottawa = Country(name: "Ottawa", capital: "Canada", visited: false)
canada.hashValue  // 4792300300145562762
ottawa.hashValue  // 4795361053083927978

As I was writing this post I came across a more general approach in the Sourcery library by Krzysztof Zabłocki. This uses a Swift version of the boost hash combine function to combine an array of hash values. For example we could use it this way:

var hashValue: Int {
    return combineHashes([name.hashValue,capital.hashValue,visited.hashValue])
} 

There are many other approaches we could try but i think for now that is more than I ever wanted to know about hashing.

Further Reading

Never miss a post!

iOS Size Classes Cheat Sheet

Subscribe and get my free iOS Size Classes Cheat Sheet

Unsubscribe at any time.
No time to watch WWDC videos?

Sign up to get my iOS posts direct to your inbox and I will send you a free PDF of my iOS Size Classes Cheat Sheet.

Unsubscribe at any time.
Archives Categories
comments powered by Disqus