Parsing an RSS Feed using NSXMLParser

This is the second of a two-part post on reading and parsing a remote RSS feed. The first post covered retrieving the feed data over the network. This part looks at how to parse the resulting XML data to extract the individual posts.

Structure of an RSS feed

Before we get into the detail it is worth taking a second to look at the structure of an RSS feed. A typical feed, with the most common elements, looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Use Your Loaf</title>
    <link>/blog/</link>
    <item>
      <title>Reading an RSS Feed</title>
      <pubDate>Thu, 14 Oct 2010 21:09:30 +0000</pubDate>
      <link>/blog/reading-an-rss-feed.html</link>
      <guid>538327:6179246:9187069</guid>
      <description><![CDATA[...post goes here...]]></description>
    </item>
    <item>
    ...
    </item>
  </channel>
</rss>

These are more or less the fields that I want to extract from the feed. The initial fields, such as title and link, describe the channel. They are followed by a sequence of items, each containing a title, a publication date, a link to the original post, a guid that uniquely identifies the item within the feed, and finally the description, which contains the actual post data.

The Post Model

I am going to modify the previously created Post model class so that the ivar names match the names of the corresponding XML elements. The reason for this will become clear when we look at the code for parsing the XML. Our Post model interface now looks as follows:

@interface Post : NSObject {
    BOOL isRead;
    NSString *title;
    NSString *description;
    NSString *guid;
}

I have also changed the table view controller code to use these modified field names but I will omit that code here.

The Channel Model

As well as a Post class I will also create a Channel class to contain the RSS feed elements such as the channel title and link. I could store these items directly in the feed class, but keeping them in a self-contained class actually makes the parsing code easier. The interface for the Channel class is as follows:

@interface Channel : NSObject {
    NSString *title;
    NSString *link;
    NSString *description;
}

The Feed Model

There will be some additional items we need to add to the feed model once we get into the XML parsing code but for now I will add a reference to a channel model and a mutable array to collect the posts that we decode from the RSS feed:

@interface Feed : NSObject {
    NSURL *feedURL;
    ASIHTTPRequest *feedRequest;
  
    Channel *feedChannel;
    NSMutableArray *feedPosts;
}

Event Driven Parsing with NSXMLParser

Both Cocoa for Mac OS X and Cocoa Touch for iOS devices provide a class, named NSXMLParser, that takes care of all of the hard work required to parse XML data. The basic approach is to initialise an NSXMLParser object with the XML stream to decode and then implement a number of delegate methods defined by the NSXMLParserDelegate protocol.

There are delegate methods for when the NSXMLParser encounters the start and end of a document, the start of a tag (<channel>, <title>, <item>), the end of a tag (</channel>, </title>, </item>) and character data. Any attributes on a tag are passed to the start-of-tag method as a dictionary. As the NSXMLParser object identifies each element in the XML stream it calls the appropriate delegate method to allow something useful to be done with each piece of data.
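To see the order in which these callbacks arrive, here is a minimal sketch of a delegate that does nothing but record each event. The EventLogger class and the LoggedEventsForXML function are names made up for this illustration, and the code assumes manual retain/release like the rest of this post.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper class for illustration: records parser events in order.
@interface EventLogger : NSObject <NSXMLParserDelegate> {
    NSMutableArray *events;
}
@property (nonatomic, readonly) NSMutableArray *events;
@end

@implementation EventLogger
@synthesize events;

- (id)init {
    if ((self = [super init])) {
        events = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)dealloc {
    [events release];
    [super dealloc];
}

// Called for every opening tag.
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    [events addObject:[NSString stringWithFormat:@"start %@", elementName]];
}

// Called one or more times for the text inside an element.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [events addObject:[NSString stringWithFormat:@"chars %@", string]];
}

// Called for every closing tag.
- (void)parser:(NSXMLParser *)parser
 didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName {
    [events addObject:[NSString stringWithFormat:@"end %@", elementName]];
}
@end

// Parses an XML string and returns the recorded events (autoreleased).
static NSArray *LoggedEventsForXML(NSString *xml) {
    EventLogger *logger = [[EventLogger alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    [parser setDelegate:logger];   // the parser does not retain its delegate
    [parser parse];
    NSArray *result = [[logger.events copy] autorelease];
    [parser release];
    [logger release];
    return result;
}
```

Calling LoggedEventsForXML(@"<a><b>x</b></a>") gives a list that begins with "start a" and finishes with "end a", with the events for the nested <b> element in between.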

The basic approach we will take is to map the higher level objects in the RSS feed such as the channel and item to one of our model objects (a channel or post object). Each time we encounter an opening tag for one of these objects (<channel>,<item>) we will allocate a new object. The elements of the object will then be populated as we encounter each of the items contained within the object.

To track which object we are currently constructing we need an instance variable in the feed object that refers to the current element. We also need a temporary instance variable to collect the content of an element, as the parser may invoke our delegate multiple times for the same element. So our revised Feed class now looks as follows:

@interface Feed : NSObject { 
    NSURL *feedURL;
    ASIHTTPRequest *feedRequest;
  
    Channel *feedChannel;
    NSMutableArray *feedPosts;
  
    id currentElement;
    NSMutableString *currentElementData;
}

The array to hold the Post objects can be allocated when we initialise the Feed object:

-(id)initWithURL:(NSURL *)sourceURL { 
    if (self = [super init]) {
    
        self.feedURL = sourceURL;
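        // Assuming feedPosts is a retaining property, release our own
        // reference once the setter has retained the array.
        NSMutableArray *posts = [[NSMutableArray alloc] init];
        self.feedPosts = posts;
        [posts release];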

    }
  
    return self;
}

Now to get things started we need to revisit the point in the last blog post where we successfully retrieved an RSS feed over the network. Since we are using ASIHTTPRequest to handle the network request, the delegate method of interest is requestFinished:. To start the parsing of the retrieved data we need to create an instance of NSXMLParser, set ourselves as the delegate and then tell it to start parsing the data:

- (void)requestFinished:(ASIHTTPRequest *)request {  
    NSData *responseData = [request responseData];
  
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:responseData];
    [parser setDelegate:self];
  
    if ([parser parse]) {

        [[NSNotificationCenter defaultCenter] 
          postNotificationName:kFeederReloadCompletedNotification
          object:nil];
    
    }
  
    [parser release];
}

This is fairly straightforward: once we have an NSXMLParser object we set the delegate and then call the parse instance method. If we get a successful result we send a notification to any observing class to let them know we have updated the feed.

To actually receive delegate callbacks we need to ensure our Feed class implements the NSXMLParserDelegate protocol:

@interface Feed : NSObject <NSXMLParserDelegate> {
    ...
    ...
}

The first delegate method that we need to implement is for when the parser encounters a new element. But first we will define some string constants for the various XML elements we are interested in decoding:

static NSString * const kChannelElementName = @"channel";
static NSString * const kItemElementName = @"item";

Now the delegate method:

- (void)parser:(NSXMLParser *)parser
               didStartElement:(NSString *)elementName
               namespaceURI:(NSString *)namespaceURI
               qualifiedName:(NSString *)qName
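               attributes:(NSDictionary *)attributeDict {
  
    // Start each element with an empty buffer so that whitespace found
    // between tags is not prepended to the next element's text.
    [currentElementData release];
    self.currentElementData = nil;
  
    if ([elementName isEqualToString:kChannelElementName]) {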
    
        Channel *channel = [[Channel alloc] init];
        self.feedChannel = channel;
        self.currentElement = channel;
        [channel release];
        return;
    
    }
  
    if ([elementName isEqualToString:kItemElementName]) {
    
        Post *post = [[Post alloc] init];
        [feedPosts addObject:post];
        self.currentElement = post;
        [post release];
        return;
    
    }
}

The didStartElement method has a number of parameters but we are really only interested in the element name. If we have just found a <channel> tag we allocate a Channel object and store it in the Feed object. Likewise if we find an <item> tag we allocate a Post object and add it to the end of our posts array in the Feed object. In both cases we set our currentElement reference to the newly created object.

Any other element is treated as a field of the current object, and its text is collected in the currentElementData string buffer. The next delegate method that we will implement is foundCharacters, which is called each time some content data is encountered:

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {  
    if (currentElementData == nil) {
        self.currentElementData = [[NSMutableString alloc] init];
    }
  
    [currentElementData appendString:string];     
}

Each time this delegate method is called we check to see if we have our currentElementData buffer allocated and if not we create it. As previously mentioned this method can be called multiple times as a single element is processed so we append the string data to the buffer each time it is called.
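Text that contains an entity is a handy way to exercise this, since the parser commonly reports it in several pieces (that chunking is an observation rather than something to rely on). The sketch below simply joins whatever pieces arrive. TextCollector and CollectedTextForXML are hypothetical names used only for this example, and the code again assumes manual retain/release.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate for illustration: joins every chunk of character
// data it is given into a single string.
@interface TextCollector : NSObject <NSXMLParserDelegate> {
    NSMutableString *text;
}
@property (nonatomic, readonly) NSMutableString *text;
@end

@implementation TextCollector
@synthesize text;

- (id)init {
    if ((self = [super init])) {
        text = [[NSMutableString alloc] init];
    }
    return self;
}

- (void)dealloc {
    [text release];
    [super dealloc];
}

// May be called several times for one element, so always append.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [text appendString:string];
}
@end

// Returns all character data in the XML string, joined in document order.
static NSString *CollectedTextForXML(NSString *xml) {
    TextCollector *collector = [[TextCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    [parser setDelegate:collector];
    [parser parse];
    // Take an immutable, autoreleased copy before releasing the collector.
    NSString *result = [NSString stringWithString:collector.text];
    [parser release];
    [collector release];
    return result;
}
```

However the parser chooses to split the text, CollectedTextForXML(@"<title>Fish &amp; Chips</title>") returns the complete string "Fish & Chips", which is why the delegate appends rather than assigns.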

Finally we need the delegate method for when we reach the end of an element:

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
               namespaceURI:(NSString *)namespaceURI
               qualifiedName:(NSString *)qName {
  
    SEL selectorName = NSSelectorFromString(elementName);
    if ([currentElement respondsToSelector:selectorName]) {
      
        [currentElement setValue:currentElementData forKey:elementName];
        
    }
  
    [currentElementData release];
    self.currentElementData = nil;
}

Here we make use of the fact that we named our object ivars with the XML element names. So rather than testing each element name and manually deciding which ivar needs to be set, we can use some Key-Value Coding magic. First we create a selector from the XML elementName, then we test whether the current element we are processing (a channel or post object) responds to the selector. If it does, we use the data that has been collected in our currentElementData string buffer to set the value for the ivar whose key is the same as the elementName.

Using Key-Value Coding means that we do not have to hardcode which fields we want to collect for each of our model classes. If I later decide that I want to decode an extra field for the Post object I only need to add that field to the Post class. The XML parsing code remains the same.

The last thing we do in this delegate method is clear our string buffer by releasing and then setting it to nil to guard against over releasing.
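One thing to keep in mind with this technique is that every NSObject responds to selectors such as hash or class, so a feed containing an element with one of those names would pass the respondsToSelector: test, and setValue:forKey: would then raise an exception for the undefined key. A slightly more defensive variant, sketched here as an alternative rather than as the code used in the app, checks for the setter instead. SampleItem and ApplyElement are hypothetical names, and the sketch assumes the model classes expose their fields as settable properties.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model class for illustration only.
@interface SampleItem : NSObject {
    NSString *title;
}
@property (nonatomic, copy) NSString *title;
@end

@implementation SampleItem
@synthesize title;

- (void)dealloc {
    [title release];
    [super dealloc];
}
@end

// Stores value on target through Key-Value Coding if target has a setter
// matching the element name. Returns NO when the element is ignored.
static BOOL ApplyElement(id target, NSString *elementName, NSString *value) {
    if ([elementName length] == 0) {
        return NO;
    }
    // Build the setter name, for example "setTitle:" from "title".
    NSString *first = [[elementName substringToIndex:1] uppercaseString];
    NSString *rest = [elementName substringFromIndex:1];
    SEL setter = NSSelectorFromString(
        [NSString stringWithFormat:@"set%@%@:", first, rest]);
    if (![target respondsToSelector:setter]) {
        return NO;
    }
    [target setValue:value forKey:elementName];
    return YES;
}
```

With a SampleItem instance, ApplyElement(item, @"title", @"Hello") stores the title and returns YES, while ApplyElement(item, @"hash", @"x") returns NO and leaves the object alone.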

There is one more delegate method that we should implement to handle parsing errors. When the NSXMLParser object encounters an error it stops processing the XML stream and sends the parser:parseErrorOccurred: message to its delegate. I am not going to do anything sensible with the error message in this example, but it would generally be a good idea to inform our controller of the error situation:

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
  
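    // NSInteger values are cast to long so the format specifiers are
    // correct on both 32-bit and 64-bit builds.
    NSString *info = [NSString stringWithFormat:
                      @"Error %ld, Description: %@, Line: %ld, Column: %ld",
                      (long)[parseError code],
                      [parseError localizedDescription],
                      (long)[parser lineNumber],
                      (long)[parser columnNumber]];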
  
    NSLog(@"RSS Feed Parse Error: %@", info);
}
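One possible way to inform the controller is to post a second notification, mirroring the one sent after a successful reload. This is only a sketch: the notification name kFeederParseFailedNotification, the userInfo key and the PostParseFailure function are all invented for this example and are not part of the app.

```objc
#import <Foundation/Foundation.h>

// Hypothetical notification name and userInfo key, invented for this sketch.
static NSString * const kFeederParseFailedNotification = @"FeederParseFailed";
static NSString * const kFeederParseErrorKey = @"error";

// Wraps the parse error in a notification that a controller could observe.
static void PostParseFailure(id sender, NSError *parseError) {
    NSDictionary *info = nil;
    if (parseError != nil) {
        info = [NSDictionary dictionaryWithObject:parseError
                                           forKey:kFeederParseErrorKey];
    }
    [[NSNotificationCenter defaultCenter]
        postNotificationName:kFeederParseFailedNotification
                      object:sender
                    userInfo:info];
}
```

The delegate method could then call PostParseFailure(self, parseError) after logging, and a controller observing the notification could read the NSError from the userInfo dictionary to decide what to show the user.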

To finish up the changes to the Feed class we need to make one minor change to the refresh method that is called by our controller each time it wants to update the feed. To ensure we do not store old objects in the feed we clear out the array holding the posts before we initiate the new network request to retrieve the feed:

- (void)refresh { 
    self.feedRequest = [ASIHTTPRequest requestWithURL:feedURL];
    [feedPosts removeAllObjects];
    [feedRequest setDelegate:self];
    [feedRequest startAsynchronous];
}

Updating the Table View Controller

To finish up we need to update the table view controller to interact with our new enhanced feed class. We already have a method named feedChanged that is called when the controller receives a notification from our feed object indicating the feed has been successfully reloaded. We now need to modify that method to actually use the posts we have extracted from the RSS feed:

- (void)feedChanged:(NSNotification *)notification {  
    BOOL newPost = NO;
    NSMutableArray *feedPosts = [feed feedPosts];
    for (Post *feedPost in feedPosts) {
    
        if (![self postExists:feedPost]) {
            newPost = YES;
            [posts addObject:feedPost];
        }
    }
  
    if (newPost) {
    
        [self.tableView reloadData];
        [self updateViewTitle];
    }
}

This method works its way through the posts stored in the feed object and, if they do not already exist in the store of posts that our view controller knows about, adds them. Then if we have at least one new post we reload the table data and update our view title. The helper method postExists: is defined as follows:

- (BOOL)postExists:(Post *)newPost {  
    NSString *guid = [newPost guid];
  
    for (Post *post in self.posts) {
    
        if ([post.guid isEqualToString:guid]) {
            return YES;
        }
    }
  
    return NO;
}

This simply iterates through our store of posts comparing the unique guid string for each post to determine if we already have this post. This is almost certainly not the best approach, especially since any posts that we have previously deleted will reappear in the view. A better approach would be to have our Feed object store the publication date of the most recent post it has seen and when we refresh the feed only return more recent posts. Since this is already a long post I will save that for another time but hopefully you get the idea.
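As a rough sketch of that idea, the pubDate text can be turned into an NSDate with an NSDateFormatter and compared against the date of the newest post seen so far. This assumes a pubDate field is added to the Post class so that the parser stores the raw string (the interface shown earlier does not have one). DateFromRSSDateString and ItemsPublishedAfter are hypothetical names, and the format string matches the date style in the sample feed above; real feeds may vary.

```objc
#import <Foundation/Foundation.h>

// Converts a date such as "Thu, 14 Oct 2010 21:09:30 +0000" to an NSDate.
// Returns nil if the string is missing or does not match the format.
static NSDate *DateFromRSSDateString(NSString *string) {
    if (string == nil) {
        return nil;
    }
    NSString *trimmed = [string stringByTrimmingCharactersInSet:
        [NSCharacterSet whitespaceAndNewlineCharacterSet]];

    // A fixed POSIX locale keeps day and month names in English whatever
    // the user's own locale is.
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    NSLocale *posix = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
    [formatter setLocale:posix];
    [formatter setDateFormat:@"EEE, dd MMM yyyy HH:mm:ss Z"];

    NSDate *date = [formatter dateFromString:trimmed];

    [posix release];
    [formatter release];
    return date;
}

// Returns the items whose pubDate parses to a date later than cutoff.
// Items may be any objects that answer valueForKey:@"pubDate".
// A nil cutoff means nothing has been seen yet, so every dated item counts.
static NSArray *ItemsPublishedAfter(NSArray *items, NSDate *cutoff) {
    NSMutableArray *newer = [NSMutableArray array];
    for (id item in items) {
        NSDate *published = DateFromRSSDateString([item valueForKey:@"pubDate"]);
        if (published == nil) {
            continue;   // skip items without a usable date
        }
        if (cutoff == nil ||
            [published compare:cutoff] == NSOrderedDescending) {
            [newer addObject:item];
        }
    }
    return newer;
}
```

Creating a formatter for every item is wasteful, so a real implementation would probably keep one around, but it keeps the sketch short. The Feed object would also need to remember the latest date it has handed out between refreshes.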

Wrapping Up

Hopefully this post has shown how easy it is to parse XML data using the NSXMLParser class. The example app is still a very poor RSS reader, and not just because of the horrible user interface, but it is on the way to becoming useful. A good topic to explore in a future post would be how to keep the posts in a persistent store, such as an SQLite database or Core Data, so that they can be read offline. Using Core Data to store the posts would also be a better choice than keeping an array of posts in memory.