Getting Started with Swift Regex

Swift 5.7 brings a whole new way of working with regular expressions. The documentation has plenty of room for improvement, so here are my notes to help you get started.

Note: I’m assuming some prior knowledge of regular expressions.

Swift Regex Literals

You can create a Swift regex from raw text, the same way you would with NSRegularExpression:

// Match and capture one or more digits
let pattern = #"(\d+)"#
let regex = try Regex(pattern)

This creates a regular expression of type Regex<AnyRegexOutput>, but it has some of the same disadvantages as NSRegularExpression:

  • We need a raw string literal (or doubled backslashes) to protect the backslashes in the pattern.
  • There’s no compile time syntax checking of the pattern. We only find errors at run time when the try operation throws.
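To see the second point in practice, here is a small sketch that hands a deliberately malformed pattern (an unclosed group) to the throwing initializer. The helper name isValidPattern is my own, used only for illustration; it is not part of the Swift API.

```swift
// Returns true if the pattern compiles to a Regex, false if the initializer throws.
// `isValidPattern` is a hypothetical helper name for this example.
func isValidPattern(_ pattern: String) -> Bool {
  do {
    _ = try Regex(pattern)
    return true
  } catch {
    // The syntax error only surfaces here, at run time
    print("Invalid pattern: \(error)")
    return false
  }
}

let good = isValidPattern(#"(\d+)"#)  // true
let bad = isValidPattern(#"(\d+"#)    // false: missing closing parenthesis
```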

When we know the pattern at compile time, we can create the regex using a regex literal and get compile-time syntax checking (and Xcode syntax highlighting):

let regex = /(\w+)\s+(\d+)/

An example of using this regex to extract a name and a number from an input string:

let input = "Tom 123 xyz"
if let result = input.firstMatch(of: regex) {
  print(result.0)  // Tom 123
  print(result.1)  // Tom
  print(result.2)  // 123
}

Note that the result is a regex match whose output is a tuple of matching substrings. The substring at index 0 is everything that the regex matched, the first captured substring is at index 1, and so on. You can also destructure the output tuple into named constants:

if let match = input.firstMatch(of: regex) {
  let (matched, name, count) = match.output
}

We can also name the captures in the pattern:

let regex = /(?<name>\w+)\s+(?<count>\d+)/
if let match = input.firstMatch(of: regex) {
  print(match.name)   // Tom
  print(match.count)  // 123
}

This can be easier to read using the extended regex literal format, which lets you add whitespace and spread the pattern over several lines. We can rewrite the previous pattern as follows:

let regex = #/
  (?<name> \w+) \s+
  (?<count> \d+)
/#
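As a quick check that the multi-line form behaves like the single-line one, here is a sketch that applies it to the same input as before. The named captures are still available as properties on the match.

```swift
// Whitespace and line breaks inside a multi-line #/ ... /# literal are ignored
let regex = #/
  (?<name> \w+) \s+
  (?<count> \d+)
/#

let input = "Tom 123 xyz"
if let match = input.firstMatch(of: regex) {
  print(match.name)   // Tom
  print(match.count)  // 123
}
```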

Different Ways To Match

There are a few different ways to apply a regex to an input string:

input.firstMatch(of: regex)
input.wholeMatch(of: regex)
input.prefixMatch(of: regex)

We’ve already seen an example of firstMatch. It returns the first substring that matches the regex:

let input = "123 456 def"
if let match = input.firstMatch(of: /(\d+)/) {
  print(match.0)  // 123
  print(match.1)  // 123
}
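firstMatch stops at the first hit. If you want every match, the standard library also has matches(of:), which I haven’t covered above. A minimal sketch, with the literal written using extended delimiters:

```swift
let input = "123 456 def"

// #/\d+/# is the same regex as /\d+/, just with extended delimiters
let numbers = input.matches(of: #/\d+/#).map { String($0.output) }
print(numbers)  // ["123", "456"]
```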

Compare this to wholeMatch where our regex pattern must match the whole input string:

let input = "abc 456 def"
if let match = input.wholeMatch(of: /\w+\s+(\d+)\s+\w+/) {
  print(match.0)  // abc 456 def
  print(match.1)  // 456
}

Prefix match guarantees that the input string starts with the pattern. So the following will fail to match:

let input = "abc456def"
let match = input.prefixMatch(of: /(\d+)/)  // nil

When you just want to test for the prefix and don’t care about capturing the values:

let worddigit = input.starts(with: /\w+\d+/)  // true
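To make the difference between the three methods concrete, here is a small side-by-side sketch that applies one regex to two inputs. The variable names are mine.

```swift
// #/\d+/# is the same regex as /\d+/, just with extended delimiters
let digits = #/\d+/#

let text = "123 abc"
let first = text.firstMatch(of: digits)?.output    // "123"
let whole = text.wholeMatch(of: digits)?.output    // nil, " abc" is left over
let prefix = text.prefixMatch(of: digits)?.output  // "123"

let other = "abc 123"
let first2 = other.firstMatch(of: digits)?.output    // "123"
let prefix2 = other.prefixMatch(of: digits)?.output  // nil, input starts with "abc"
```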

Replacing, Trimming and Splitting

Using a regex to replace, trim or split a string:

let line = "Tom   1234"
let line1 = line.replacing(/\s+/, with: ",")  // Tom,1234
let line2 = line.trimmingPrefix(/\w+\s+/)   // 1234
let fields = line.split(separator: /\s+/)   // ["Tom","1234"]
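When the replacement depends on what was matched, replacing also accepts a closure that receives the match. A small sketch that swaps the two fields (the swapped name is mine):

```swift
let line = "Tom   1234"

// The closure gets the match, so the captures are available as .1 and .2
let swapped = line.replacing(#/(\w+)\s+(\d+)/#) { match -> String in
  "\(match.2),\(match.1)"
}
print(swapped)  // 1234,Tom
```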

Regex Builder

The regex builder DSL provides a more verbose but hopefully more structured and readable way to build a regex. Rewriting the earlier example using a regex builder (remember to import RegexBuilder):

import RegexBuilder

let regex = Regex {
  Capture {
    OneOrMore(.word)
  }

  OneOrMore(.whitespace)

  Capture {
    OneOrMore(.digit)
  }
}

let input = "Tom 123 xyz"
if let match = input.firstMatch(of: regex) {
  let name = match.1    // Tom
  let count = match.2   // 123
}

The traditional regex syntax places captured values in parentheses. With regex builder you use Capture blocks. We also have more verbose, but more readable, components for quantities and alternatives such as One, OneOrMore, ZeroOrMore, Optionally, and ChoiceOf. Some examples:

let colorRegex = Regex {
  Capture {
    ChoiceOf {
      "red"
      "green"
      "blue"
    }
  }
  ":"
  One(.whitespace)

  Capture(OneOrMore(.digit))
  
  ZeroOrMore(.whitespace)
  Optionally(OneOrMore(.hexDigit))
}

Using this regex:

let colors = [
  "red: 255 FF",
  "green: 0",
  "blue: 128 80"
]

for color in colors {
  if let match = color.wholeMatch(of: colorRegex) {
    print(match.1, match.2)
  }
}

// red 255
// green 0
// blue 128

Transforming Capture Types

You can improve the type safety of captured values by adding a transform block to the capture. This allows you to transform the generic capture output to a known type. For example, to transform the captured digits to an (optional) integer:

let regex = Regex {
  Capture {
    OneOrMore(.digit)
  } transform: {
    Int($0)  // Int?
  }
}

To avoid the optional type, use a TryCapture block. Its transform still returns an optional, but the captured value is non-optional:

let regex = Regex {
  TryCapture {
    OneOrMore(.digit)
  } transform: {
    Int($0)  // Int
  }
}
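Because the capture is now an Int, you can do arithmetic on it directly without converting from a substring. A minimal sketch with a made-up input string:

```swift
import RegexBuilder

let regex = Regex {
  TryCapture {
    OneOrMore(.digit)
  } transform: {
    Int($0)
  }
}

// "Room 101" is just an illustrative input
if let match = "Room 101".firstMatch(of: regex) {
  let next = match.1 + 1  // match.1 is an Int, not a Substring
  print(next)  // 102
}
```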

If the transform returns nil, the match fails at that point and the regex backtracks to try an alternative path. Another example using a custom type where the initializer can fail:

enum RGBColor: String {
  case red
  case blue
  case green
}

let regex = Regex {
  TryCapture {
    ChoiceOf {
      "red"
      "green"
      "blue"
    }
  } transform: {
    RGBColor(rawValue: String($0))
  }
}
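In the regex above the transform can never fail, because ChoiceOf only matches valid raw values. To see a failing transform, here is a variation of my own that captures any word and lets the RGBColor initializer decide whether it is acceptable:

```swift
import RegexBuilder

enum RGBColor: String {
  case red
  case green
  case blue
}

// Variation for illustration: accept any word, then let the enum validate it
let colorWord = Regex {
  TryCapture {
    OneOrMore(.word)
  } transform: {
    RGBColor(rawValue: String($0))
  }
}

let green = "green".wholeMatch(of: colorWord)?.1    // RGBColor.green
let purple = "purple".wholeMatch(of: colorWord)?.1  // nil, no such case
```

With wholeMatch the second input fails outright, since no shorter run of letters can both satisfy the transform and cover the whole string.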

Reluctant Matching

The various quantity components, such as OneOrMore, Optionally, etc., all match as many occurrences as possible by default (known as eager matching). You can also make them match as few occurrences as possible (known as reluctant matching).

For example, a regex that captures one or more digits after anything else using eager matching:

let regex = Regex {
  OneOrMore(.any, .eager)
  Capture(OneOrMore(.digit))
}

let line = "hello world 99 ----> 42"
if let match = line.wholeMatch(of: regex) {
  let count = match.1  // 2
}

The OneOrMore component eats everything up to and including the digit “4”, leaving the capture block with the last digit “2”. Compare that with a reluctant match which captures “42”:

let regex = Regex {
  OneOrMore(.any, .reluctant)
  Capture(OneOrMore(.digit))
}

let line = "hello world 99 ----> 42"
if let match = line.wholeMatch(of: regex) {
  let count = match.1  // 42
}

If you want to change the default behaviour for all components, add the modifier to the Regex:

Regex {
  // components go here
}
.repetitionBehavior(.reluctant)
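As a sketch of how that looks on a complete regex, here is the earlier example with the behaviour set once on the Regex instead of on each component. Note that the capture’s OneOrMore becomes reluctant as well; wholeMatch still forces it to take both digits.

```swift
import RegexBuilder

let regex = Regex {
  OneOrMore(.any)             // no explicit behaviour, so it follows the modifier
  Capture(OneOrMore(.digit))
}
.repetitionBehavior(.reluctant)

let line = "hello world 99 ----> 42"
if let match = line.wholeMatch(of: regex) {
  print(match.1)  // 42
}
```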

Using Foundation Parsers With Regex Builder

You can use the Foundation date, number, currency and URL parsers with regex builder. Don’t try to write your own regex to match dates. An example of matching a currency amount:

import Foundation
import RegexBuilder

let input = "Item 1 £100.25"

let regex = Regex {
  "Item"
  OneOrMore(.whitespace)
  Capture(OneOrMore(.digit))

  OneOrMore(.whitespace)

  Capture(.localizedCurrency(code: "GBP", locale: Locale(identifier: "en_GB")))
}

if let result = input.wholeMatch(of: regex) {
  print(result.1)  // 1
  print(result.2)  // 100.25
}
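The date parser works the same way. A minimal sketch, assuming a US-style numeric date and the GMT time zone (both are my choices for illustration, as is the input string); the capture comes back as a Foundation Date:

```swift
import Foundation
import RegexBuilder

let input = "Due 03/05/2022"

let regex = Regex {
  "Due"
  OneOrMore(.whitespace)

  // Locale and time zone are assumptions for this example
  Capture(.date(.numeric,
                locale: Locale(identifier: "en_US"),
                timeZone: TimeZone(identifier: "GMT")!))
}

if let result = input.wholeMatch(of: regex) {
  let dueDate: Date = result.1  // already a Date, no manual parsing
  print(dueDate)
}
```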

Learn More

I’ve only scratched the surface but hopefully it’s enough to get started. I recommend watching the following WWDC video to dig deeper: