Use Your Loaf

Swift Whole Module Optimization

I like it when improvements to the tools give us performance gains with little or no work on our part. In this post I cover a Swift compiler option that promises just that.

By default, Xcode compiles source files individually, which limits the scope of the optimizer. The Whole Module Optimization option, new in Xcode 7, removes this limit for Swift code, allowing the optimizer to analyze all the source files in a module together.

Enabling Whole Module Optimization

Xcode 7 has an extra optimization level in the Swift Compiler build settings to turn on whole module optimization:

Swift Compiler Optimization

Apple recommends enabling it only for your release builds, as it can increase compilation times.

A Simple Test

I don’t have a lot of Swift code to try this out on so I will use a simple generic function that returns the greater of x and y.

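// Returns the greater of x and y. Both parameters are unlabeled so the
// function is called as myMax(a, b).
func myMax<T: Comparable>(_ x: T, _ y: T) -> T {
  return x > y ? x : y
}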

Note: The Swift Standard Library already contains max and min functions that you should use - this is just an example.

To test the impact of compiler optimization I use the following code to call the function with two random integers for a number of iterations:

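import Foundation

// Returns the elapsed time in seconds for the given number of calls to myMax.
private func timeForFunction(iterations: Int) -> CFAbsoluteTime {
  let startTime = CFAbsoluteTimeGetCurrent()
  for _ in 1...iterations {
    let randomValue1 = Int(arc4random_uniform(100))
    let randomValue2 = Int(arc4random_uniform(100))
    // Discard the result explicitly; only the time taken by the call matters.
    _ = myMax(randomValue1, randomValue2)
  }
  let endTime = CFAbsoluteTimeGetCurrent()
  return endTime - startTime
}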

I then ran the test for 100,000 iterations on an old 5th generation iPod Touch at each compiler optimization level, once with the generic function defined locally (in the same Swift source file as the calling function) and once with it defined externally (in a separate Swift source file):

Optimization Level: None [-Onone]

  • external definition: 0.128 seconds
  • local definition: 0.119 seconds

Optimization Level: Fast [-O]

  • external definition: 0.099 seconds
  • local definition: 0.088 seconds

Optimization Level: Fast, WMO [-O -whole-module-optimization]

  • external definition: 0.088 seconds
  • local definition: 0.088 seconds

If you prefer, here is a graph of the same results:

WMO Results

It should not be a surprise that the test with no optimization is the slowest. What is interesting is that, without whole module optimization, performance differs depending on whether the generic function is defined locally or externally.

For the None and Fast optimization levels the compiler works on a single source file at a time. When the definition of the generic function is in the same source file as the calling code, the compiler knows we are calling the function with integers and optimizes away the unnecessary object handling code. This optimization, known as Generic Specialization, only works when the compiler can see the function definition, which is why the external definition runs more slowly.
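
To make the idea of specialization more concrete, here is a rough sketch of what the optimizer is conceptually doing. The function myMaxInt is a hypothetical name I made up for illustration; the compiler generates its own internal version and you never write this by hand. The snippet uses the modern unlabeled-parameter syntax and assumes it is compiled as a single top-level file.

```swift
// Generic version: without specialization the compiler has to emit code
// that works for any Comparable type, so the comparison is resolved
// indirectly at run time.
func myMax<T: Comparable>(_ x: T, _ y: T) -> T {
  return x > y ? x : y
}

// Hand-written illustration of what a specialization for Int is roughly
// equivalent to. The name myMaxInt is hypothetical and used only here.
func myMaxInt(_ x: Int, _ y: Int) -> Int {
  return x > y ? x : y
}

let genericResult = myMax(3, 7)         // generic call, may be specialized
let specializedResult = myMaxInt(3, 7)  // direct integer comparison
print(genericResult == specializedResult) // prints "true"
```

Both calls return the same value. The only difference is how much work the generated code has to do to get there, which is what the timings above are measuring.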

Using Whole Module Optimization allows the compiler to look at all the source files in a module. This makes compilation slower but allows it to optimize generic functions even when they are in separate source files. You can see this in the final test run where the execution time is now the same for both the local and external function definitions.

In summary, if you don’t mind the extra compilation time, try turning on Whole Module Optimization for your release builds.

Further Reading

For a more detailed explanation of why this works see the WWDC session: