How to percent encode a URL String

When interacting with a web service you often need to percent encode reserved characters in a URL or in form data. For example, the percent encoding of the “&” character is “%26”. Working out which characters should be percent encoded in which part of a URL is not easy. The best sources seem to be RFC 3986 and the W3C HTML5 recommendation. So for fun and education I created a Swift String extension (and, for comparison, an Objective-C category) for both.

Last updated: Jun 12, 2020

Encoding Query Strings for RFC 3986

Section 2.3 of RFC 3986 lists the unreserved characters, which you should not percent encode because they have no special meaning in a URL:

ALPHA / DIGIT / “-” / “.” / “_” / “~”

Section 3.4 also explains that, since a query will often itself include a URL, it is preferable not to percent encode the slash ("/") and question mark ("?"). This is also the approach taken by the popular iOS HTTP networking library Alamofire, which gives me confidence.

So to encode a query in a way that is compatible with RFC 3986 we can percent encode every character except the unreserved set above plus the slash and question mark. This is simple if we first construct the set of allowed characters and then use addingPercentEncoding(withAllowedCharacters:) to encode the rest.
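To see why it is worth building the allowed set by hand, here is a minimal sketch comparing it with Foundation's predefined CharacterSet.urlQueryAllowed. That set describes what is legal anywhere in a query, so it leaves “&” and “=” alone, which is not what you want when encoding a single key or value. The variable names are illustrative only.

```swift
import Foundation

let value = "one&two =three"

// Foundation's predefined query set treats "&" and "=" as legal query
// characters, so only the space is percent encoded.
let builtIn = value.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
print(builtIn ?? "nil")  // one&two%20=three

// A hand-built set: alphanumerics plus the RFC 3986 unreserved punctuation,
// with "/" and "?" left unencoded as section 3.4 permits.
var allowed = CharacterSet.alphanumerics
allowed.insert(charactersIn: "-._~/?")
let strict = value.addingPercentEncoding(withAllowedCharacters: allowed)
print(strict ?? "nil")   // one%26two%20%3Dthree
```

With the hand-built set the delimiters inside the value can no longer be mistaken for the separators between query parameters.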

Note: If you are still using stringByAddingPercentEscapesUsingEncoding or CFURLCreateStringByAddingPercentEscapes, be aware that Apple deprecated both in iOS 9.

Swift

First the Swift String extension:

import Foundation

extension String {
  func stringByAddingPercentEncodingForRFC3986() -> String? {
    // Unreserved punctuation from RFC 3986 section 2.3, plus "/" and "?",
    // which section 3.4 allows to stay unencoded in a query.
    let unreserved = "-._~/?"
    var allowed = CharacterSet.alphanumerics
    allowed.insert(charactersIn: unreserved)
    return addingPercentEncoding(withAllowedCharacters: allowed)
  }
}

Objective-C

We can do something similar with Objective-C using a category on NSString:

@implementation NSString (URLEncoding)
- (nullable NSString *)stringByAddingPercentEncodingForRFC3986 {
  NSString *unreserved = @"-._~/?";
  NSMutableCharacterSet *allowed = [NSMutableCharacterSet
                                    alphanumericCharacterSet];
  [allowed addCharactersInString:unreserved];
  return [self
          stringByAddingPercentEncodingWithAllowedCharacters:
          allowed];
}
@end

Example usage:

// Swift
let query = "one&two =three"
let encoded = query.stringByAddingPercentEncodingForRFC3986()
// "one%26two%20%3Dthree"

// Objective-C
NSString *query = @"one&two =three";
NSString *encoded = [query stringByAddingPercentEncodingForRFC3986];
// "one%26two%20%3Dthree"
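In practice you usually encode each key and value separately and then join them. The following is only a sketch of that idea: encodeQueryComponent and makeQuery are hypothetical helper names of my own, and falling back to the raw text when encoding fails is an assumption made to keep the example short.

```swift
import Foundation

// Hypothetical helper: percent encodes one key or value using the same
// allowed set as the RFC 3986 extension (alphanumerics plus "-._~/?").
func encodeQueryComponent(_ text: String) -> String {
  var allowed = CharacterSet.alphanumerics
  allowed.insert(charactersIn: "-._~/?")
  // Assumption for this sketch: fall back to the raw text if encoding fails.
  return text.addingPercentEncoding(withAllowedCharacters: allowed) ?? text
}

// Hypothetical helper: joins the encoded pairs with "=" and "&".
// An array of tuples keeps the parameter order predictable.
func makeQuery(_ pairs: [(String, String)]) -> String {
  return pairs
    .map { encodeQueryComponent($0.0) + "=" + encodeQueryComponent($0.1) }
    .joined(separator: "&")
}

let joined = makeQuery([("q", "one&two =three"), ("next", "/search?page=2")])
print(joined)
// q=one%26two%20%3Dthree&next=/search?page%3D2
```

Note how the slash and question mark in the second value stay readable while its “=” is still encoded.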

Encoding for x-www-form-urlencoded

The W3C HTML5 recommendation for encoding form data is similar to, but ever so slightly different from, RFC 3986. Section 4.10.22.5 gives us the characters not to percent encode:

ALPHA / DIGIT / “*” / “-” / “.” / “_”

You should also replace the space (“ ”) character with a “+” (0x2B). Note the differences with RFC 3986 as described in this Stack Overflow answer. The tilde (“~”) is now percent encoded but the asterisk (“*”) is not. The recommendation nicely sums up the situation:

This form data set encoding is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices.
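A small sketch to make the tilde and asterisk difference concrete. It encodes the same string with both sets of unreserved characters (leaving out the extra “/” and “?” used for queries earlier). The sample string and variable names are made up for illustration.

```swift
import Foundation

let sample = "a~b*c"

// RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
var rfc3986 = CharacterSet.alphanumerics
rfc3986.insert(charactersIn: "-._~")

// HTML5 form data: ALPHA / DIGIT / "*" / "-" / "." / "_"
var formData = CharacterSet.alphanumerics
formData.insert(charactersIn: "*-._")

let asRFC3986 = sample.addingPercentEncoding(withAllowedCharacters: rfc3986)
let asFormData = sample.addingPercentEncoding(withAllowedCharacters: formData)
print(asRFC3986 ?? "nil")   // a~b%2Ac
print(asFormData ?? "nil")  // a%7Eb*c
```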

Swift

Adding a second function to our String extension:

public func stringByAddingPercentEncodingForFormData(plusForSpace: Bool = false) -> String? {
  let unreserved = "*-._"
  var allowed = CharacterSet.alphanumerics
  allowed.insert(charactersIn: unreserved)

  // Leave spaces unencoded for now so they can be swapped for "+" below.
  // A literal "+" is not in the allowed set, so it is still encoded as "%2B".
  if plusForSpace {
    allowed.insert(charactersIn: " ")
  }

  var encoded = addingPercentEncoding(withAllowedCharacters: allowed)
  if plusForSpace {
    encoded = encoded?.replacingOccurrences(of: " ", with: "+")
  }
  return encoded
}

Note that since many web services do not seem to care, I made encoding spaces as “+” optional. The default is to percent encode them as “%20”.
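One thing worth checking is what happens to a literal plus sign in the input. The sketch below uses a hypothetical free function, formEncode, that mirrors the logic of the form data method so that it can run on its own. Because “+” is never in the allowed set it is always encoded as “%2B”, so replacing the remaining spaces with “+” is unambiguous.

```swift
import Foundation

// Hypothetical free-function version of the form data encoder, written
// out here only so the sketch is self-contained.
func formEncode(_ text: String, plusForSpace: Bool = false) -> String? {
  var allowed = CharacterSet.alphanumerics
  allowed.insert(charactersIn: "*-._")
  if plusForSpace {
    // Keep spaces unencoded so they can be replaced with "+" afterwards.
    allowed.insert(charactersIn: " ")
  }
  let encoded = text.addingPercentEncoding(withAllowedCharacters: allowed)
  return plusForSpace ? encoded?.replacingOccurrences(of: " ", with: "+") : encoded
}

let sum = "1+1 = 2"
print(formEncode(sum) ?? "nil")                      // 1%2B1%20%3D%202
print(formEncode(sum, plusForSpace: true) ?? "nil")  // 1%2B1+%3D+2
```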

Objective-C

The Objective-C method, where the parameter is required because Objective-C has no default argument values:

- (nullable NSString *)stringByAddingPercentEncodingForFormData:(BOOL)plusForSpace {
  NSString *unreserved = @"*-._";
  NSMutableCharacterSet *allowed = [NSMutableCharacterSet
                                    alphanumericCharacterSet];
  [allowed addCharactersInString:unreserved];
  if (plusForSpace) {
    [allowed addCharactersInString:@" "];
  }

  NSString *encoded = [self stringByAddingPercentEncodingWithAllowedCharacters:allowed];
  if (plusForSpace) {
    encoded = [encoded stringByReplacingOccurrencesOfString:@" " 
                       withString:@"+"];
  }
  return encoded;
}

Example usage:

// Swift
let query = "one two"
let space = query.stringByAddingPercentEncodingForFormData()
// "one%20two"

let plus = query.stringByAddingPercentEncodingForFormData(plusForSpace: true)
// "one+two"

// Objective-C
NSString *query = @"one two";
NSString *encodedQuery = [query stringByAddingPercentEncodingForFormData:YES];
// "one+two"

Source Code

You can find the Swift source code and some unit tests in the Encode project in my GitHub Code Examples repository. The Objective-C category and tests are in the TwitterSearch project. As always, feedback and improvements are welcome.

Further Reading