Checking RSS Feeds for New Posts

I have previously posted some example iPhone Objective-C code for reading an RSS feed and then for parsing the XML content of the feed using NSXMLParser. The one topic that I have not covered so far is how to determine the new posts in an RSS feed. A common expectation for RSS readers is that when the user refreshes the feed, any new posts are presented in a clearly identifiable way so that previously read posts do not have to be seen again. I had assumed this was a trivial matter of checking the publication date and/or the GUID string for each post. However, once I started to look into it I realised that it was not so straightforward…

Detecting duplicate RSS posts

The GUID element of an RSS feed is intended as a globally unique identifier for each item in the feed. Unfortunately, as with the pubDate element, it is optional, so you can never be sure that the feed you are reading will include a GUID in each entry. A good overview of the various strategies that feed readers use to detect a new post can be found here. The approach that I will adopt is to use the GUID if it exists and, if not, fall back to the link element. Note that this will fail for feeds such as this one from Apple that use the link to reference a generic web page rather than the individual post.

For some more excellent recommendations on how to parse RSS feeds it is also worth reading the documentation for the Universal Feed Parser by Mark Pilgrim. This is the feed parser used by the open source Planet feed aggregator. In particular the section on HTML sanitization is worth understanding since many of the fields can potentially contain malicious scripting.

Reviewing the Model

To recap for those who have not been following along, my trivial RSS reader currently has the following model classes:

  • Channel: contains details such as the feed title, link and description extracted from the <channel> element of an RSS feed.
  • Post: contains the details of individual posts in a feed such as the title, description, guid. Also records when a post has been read by the user.
  • Feed: a container class to model and process a single RSS feed. A feed object has a single channel and a collection of posts retrieved from the feed. A feed object is initialised with the URL of the required feed and updated by calling the -refresh instance method.

As currently implemented, the Feed class makes no attempt to determine which posts have previously been retrieved. This means that the view controller has to implement this logic. I would like to move this functionality into the model and also implement our strategy for determining new RSS posts. To do this I am going to add a new class to our model which will store the key attributes of a post used when testing if we have previously retrieved it. When finished, the model will consist of the Channel, Post and Feed classes plus the new index entry class.

Adding a Feed Index to the Model

The first thing to do is modify the Feed class to include a feed index. The feed index will have a single entry for every item we find in the feed. To store this index I will add an NSMutableArray to the Feed class as follows:

@interface Feed : NSObject <NSXMLParserDelegate> {
  NSURL *feedURL;
  ASIHTTPRequest *feedRequest;
  
  Channel *feedChannel;
  NSMutableArray *feedPosts;
  NSMutableArray *feedIndex;
  
  id currentElement;
  NSMutableString *currentElementData;
}

The feedIndex instance variable will contain an array of IndexEntry objects which are defined as follows:

@interface IndexEntry : NSObject {
    BOOL exists;
    NSString *guid;
    NSString *link;
}

The implementation of IndexEntry is trivial so I will omit it here; you can find the full details in the Xcode project download. Before taking a look at the changes to the implementation of the Feed class, there is one final change to the model, which is to add the link field to the Post class:

@interface Post : NSObject {
    BOOL isRead;
    NSString *title;
    NSString *description;
    NSString *guid;
    NSString *link;
}

The nice thing about our XML parsing code is that we do not need to do anything else to our code to implement the link attribute. Defining it in the model is sufficient for it to be populated anytime we find a <link> element in an RSS feed entry.
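
To illustrate why no extra parsing code is needed, here is a minimal, self-contained sketch of the key-value coding check that the parser relies on. SamplePost and ApplyElement are hypothetical names used only for this illustration, and the property attributes are assumptions rather than the project's actual declarations.

```objc
#import <Foundation/Foundation.h>

#ifndef __has_feature
#define __has_feature(x) 0   // compilers without __has_feature: assume no ARC
#endif

// Hypothetical stand-in for the Post model class.
@interface SamplePost : NSObject {
    NSString *title;
    NSString *link;
}
@property (nonatomic, copy) NSString *title;
@property (nonatomic, copy) NSString *link;
@end

@implementation SamplePost
@synthesize title, link;

#if !__has_feature(objc_arc)
// Only needed under manual reference counting, as used in this project.
- (void)dealloc {
    [title release];
    [link release];
    [super dealloc];
}
#endif
@end

// Hypothetical helper: store value under elementName only if the target
// exposes a getter with that name. Returns YES when the value was stored.
static BOOL ApplyElement(id target, NSString *elementName, NSString *value) {
    SEL selectorName = NSSelectorFromString(elementName);
    if (![target respondsToSelector:selectorName]) {
        return NO;   // e.g. <comments>: no matching property, so it is skipped
    }
    [target setValue:value forKey:elementName];
    return YES;
}
```

With a check like this in place, declaring a link property on the model is all that is required for <link> elements to be captured, while elements with no matching property are simply skipped.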

Implementing the Feed Index

When we initialise a new Feed object we now also need to initialise the array that will hold our feed index (and release it when we dealloc a Feed object):

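- (id)initWithURL:(NSURL *)sourceURL {  
    if ((self = [super init])) {
        self.feedURL = sourceURL;
        // Autoreleased arrays: the setters are assumed to retain, so
        // alloc/init here would leave an extra retain (a leak).
        self.feedPosts = [NSMutableArray array];
        self.feedIndex = [NSMutableArray array];
    }
    return self;
}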

The basic approach the feed parsing code will take is that each time a post is extracted from the feed we will check the feedIndex to see if this is an old post. If it is an existing post we will not bother storing the post. If however this is a new post we will add it to the feedPosts array and update our feedIndex with the post details (guid and link).

To make things easier I have created some helper methods to check and manage the post index as follows:

  • checkExists: search the index to see if a post already exists in the index. This method is what will implement our RSS post duplicate detection strategy.
  • updateIndex: add a post to the index. This method updates the index with the key attributes of a post which are currently just the guid and link elements.
  • resetIndex: this method is called each time we retrieve an RSS feed to reset the exists flag for all posts in the index. As each post is found the exists flag is set to YES indicating that the post has been found in the feed.
  • purgeIndex: this method is used after retrieving a feed to remove old entries in the index that are no longer contained in the feed.
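
The way these four helpers fit together over successive refreshes can be sketched in a single function. This is only an illustration under some simplifying assumptions: posts are represented by their GUID strings alone, NSMutableDictionary objects stand in for IndexEntry, and RefreshIndex is a hypothetical name that does not appear in the project.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: runs one refresh cycle against `index` and returns
// the GUIDs in `fetchedGuids` that were not already in the index.
static NSArray *RefreshIndex(NSMutableArray *index, NSArray *fetchedGuids) {
    NSMutableArray *newGuids = [NSMutableArray array];

    // resetIndex: assume nothing is in the feed until we see it again.
    for (NSMutableDictionary *entry in index) {
        [entry setObject:[NSNumber numberWithBool:NO] forKey:@"exists"];
    }

    for (NSString *guid in fetchedGuids) {
        // checkExists: look for an entry with a matching guid.
        NSPredicate *match = [NSPredicate predicateWithFormat:@"guid == %@", guid];
        NSArray *hits = [index filteredArrayUsingPredicate:match];
        if ([hits count] > 0) {
            NSMutableDictionary *entry = [hits objectAtIndex:0];
            [entry setObject:[NSNumber numberWithBool:YES] forKey:@"exists"];
        } else {
            // updateIndex: remember the new post.
            NSMutableDictionary *entry = [NSMutableDictionary dictionary];
            [entry setObject:guid forKey:@"guid"];
            [entry setObject:[NSNumber numberWithBool:YES] forKey:@"exists"];
            [index addObject:entry];
            [newGuids addObject:guid];
        }
    }

    // purgeIndex: drop entries that have fallen out of the feed.
    NSPredicate *stillInFeed = [NSPredicate predicateWithFormat:@"exists == YES"];
    [index filterUsingPredicate:stillInFeed];
    return newGuids;
}
```

If the first refresh sees posts a and b, and the second sees b and c, the second call reports only c as new and the entry for a is purged. One consequence of purging is that a post which drops out of the feed and later reappears would be treated as new again.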

checkExists

The code for the checkExists method is shown below. It takes a single argument, which is the current Post object:

- (BOOL)checkExists:(Post *)post {
    NSString *key;
    NSString *value;
  
    if (post.guid) {
        key = @"guid";
        value = post.guid;
    } else if (post.link) {
        key = @"link";
        value = post.link;
    } else {
        return NO;
    }
  
    NSPredicate *predicate = [NSPredicate predicateWithFormat:
                                          @"%K == %@", key, value];
    NSUInteger index = [feedIndex indexOfObjectPassingTest:
                       ^(id obj, NSUInteger idx, BOOL *stop) {
                         return [predicate evaluateWithObject:obj];
                       }];
  
    if (index != NSNotFound) {
    
        IndexEntry *entry = [feedIndex objectAtIndex:index];
        entry.exists = YES;
        return YES;
    }
  
    return NO;
}

The Post object is checked and if it contains a GUID element we make use of it; otherwise we attempt to fall back to the link element. If the post contains neither a GUID nor a link we give up and return NO to indicate that the post does not exist in the index, which means such a post will be treated as new on every refresh. To search the index we make use of the indexOfObjectPassingTest: method of NSArray, passing it a block that evaluates an NSPredicate testing for a matching GUID or link. I covered this way of searching arrays with NSPredicate and blocks in a previous post. If we get a match we update the feed index entry by setting the exists flag to YES and return YES to indicate that the post exists in the index.
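
The search itself can be tried in isolation. The following is a minimal sketch in which plain dictionaries stand in for IndexEntry objects and IndexOfEntry is a hypothetical helper name; the %K placeholder substitutes the key path, so the same predicate format serves for both the GUID and the link lookup.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the position of the first entry whose
// value for `key` equals `value`, or NSNotFound if there is no match.
static NSUInteger IndexOfEntry(NSArray *index, NSString *key, NSString *value) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K == %@",
                                                              key, value];
    return [index indexOfObjectPassingTest:
            ^BOOL(id obj, NSUInteger idx, BOOL *stop) {
                // Entries lacking the key evaluate to nil and never match.
                return [predicate evaluateWithObject:obj];
            }];
}
```

For example, given an index whose second entry has only a link, searching on the link key finds it at position 1, while a search on a GUID that is not present returns NSNotFound.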

updateIndex

The updateIndex method is responsible for adding an entry to the feed index. It takes a single argument which is the post to be added:

- (void)updateIndex:(Post *)post {
    IndexEntry *entry = [[IndexEntry alloc] init];
    entry.exists = YES;
    entry.guid = post.guid;
    entry.link = post.link;
    [feedIndex addObject:entry];
    [entry release];
}

This code is self-explanatory. The main advantage of maintaining a separate feed index is that we are only storing a limited number of Post attributes (just the GUID and link) rather than the whole post. This helps keep our memory requirements under control.

resetIndex

The resetIndex method is trivial. It iterates through all entries in the index, setting the exists flag to NO:

- (void)resetIndex {     
    for (IndexEntry *entry in self.feedIndex) {
        entry.exists = NO;
    }
}

purgeIndex

The purgeIndex method is responsible for cleaning old entries from the index that no longer exist in the feed. It does this by removing all entries where the exists flag is set to NO. This is another example of filtering arrays with predicates:

- (void)purgeIndex {   
    NSPredicate *predicate = [NSPredicate predicateWithFormat:
                                          @"exists == YES"];
    [feedIndex filterUsingPredicate:predicate];
}

The predicate tests for the exists flag set to YES. The filterUsingPredicate: method of NSMutableArray then removes all items from the array which do not match the predicate.

Refreshing the Feed

The whole process is kicked off when the refresh method is called on a Feed object. The refresh method removes all entries from the array of posts and also resets the feed index. It then initiates a request to retrieve and parse the feed:

- (void)refresh {
    [feedPosts removeAllObjects];
    [self resetIndex];

    self.feedRequest = [ASIHTTPRequest requestWithURL:feedURL];
    [feedRequest setDelegate:self];
    [feedRequest startAsynchronous];
}

For the details on how the ASIHTTPRequest works you can refer back to the previous post on Reading an RSS Feed. Once the feed data has been received we get a callback to our delegate method requestFinished which is unchanged except that after successfully parsing the feed we purge the index using the helper method we saw previously:

- (void)requestFinished:(ASIHTTPRequest *)request {  
    NSData *responseData = [request responseData];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:responseData];
    [parser setDelegate:self];
  
    if ([parser parse]) {

        [self purgeIndex];
        [[NSNotificationCenter defaultCenter] 
            postNotificationName:kFeederReloadCompletedNotification
            object:nil];
    }
  
    [parser release];
}

Parsing the Posts

To finish up the changes to the Feed implementation we need to adjust the way we parse and store the posts. Previously as we found each post it was added immediately to the array of posts. Now we only want to store the post if it does not exist in the index. To do that we first need to retrieve all of the attributes of the post. The NSXMLParser delegate methods that need changing are didStartElement: and didEndElement:. I will omit the didStartElement method since the only change is to remove the line that stores the post. The didEndElement method now looks like this:

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
               namespaceURI:(NSString *)namespaceURI
               qualifiedName:(NSString *)qName {
  
  if ([elementName isEqualToString:kItemElementName]) {
    if (![self checkExists:currentElement]) {
      [feedPosts addObject:currentElement];
      [self updateIndex:currentElement];
    }

    self.currentElement = nil;
    return;
  }

  SEL selectorName = NSSelectorFromString(elementName);
  if ([currentElement respondsToSelector:selectorName]) {
    NSCharacterSet *charSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *value = [currentElementData stringByTrimmingCharactersInSet:charSet];
    [currentElement setValue:value forKey:elementName];
  }

  [currentElementData release];
  self.currentElementData = nil;
}

As the NSXMLParser reaches the </item> element it calls didEndElement allowing us to check our feed index using the checkExists helper method we defined earlier. If the post does not exist in the index we store the whole post into our feed posts array and we also update the index.

One other minor change when processing other elements in a feed such as the post title and description is to strip whitespace, tab and newline characters from the beginning and end of the field contents. This ensures that values like the GUID and link we store in the index do not contain any extra characters.
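
The effect of the trimming step is easy to check on its own. In this small sketch TrimmedValue is a hypothetical wrapper around the same two Foundation calls used in didEndElement, and the sample strings are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper: strips spaces, tabs and newlines from both ends
// of the string, leaving any whitespace inside the value untouched.
static NSString *TrimmedValue(NSString *raw) {
    NSCharacterSet *charSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return [raw stringByTrimmingCharactersInSet:charSet];
}
```

This matters for duplicate detection because the index comparison is an exact string match: a link read with a trailing newline on one refresh and without it on the next would otherwise be treated as two different posts.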

Updating the Feed View Controller

With all of the model changes completed we can finally update the view controller to make use of the new feed functionality. The first change is to the last row in the table, which currently allows the user to refresh the view. I am going to change this so that when there are unread posts it allows the user to mark all posts as read. Once all posts are read it will show “Get more items…” to allow new posts to be fetched. The changes are in the table view data source method tableView:cellForRowAtIndexPath:

- (UITableViewCell *)tableView:(UITableView *)tableView 
                     cellForRowAtIndexPath:(NSIndexPath *)indexPath {
    static NSString *postCellId = @"postCell";
    static NSString *moreCellId = @"moreCell";
    UITableViewCell *cell = nil;
  
    NSUInteger row = [indexPath row];
    NSUInteger count = [posts count];
  
    if (row == count) {
    
        cell = [tableView dequeueReusableCellWithIdentifier:moreCellId];
        if (cell == nil) {
            cell = [[[UITableViewCell alloc] 
                      initWithStyle:UITableViewCellStyleDefault 
                      reuseIdentifier:moreCellId] autorelease];
        }
    
        if ([self countUnreadPosts]) {
            cell.textLabel.text = @"Mark all as read...";
        } else {
            cell.textLabel.text = @"Get more items...";
        }
        cell.textLabel.textColor = [UIColor blueColor];
        cell.textLabel.font = [UIFont boldSystemFontOfSize:16];
    
    
    } else {
        ...
        ...
    }
  
    return cell;
}

To implement this changed UI behaviour we also need to change the table view delegate method didSelectRowAtIndexPath:

- (void)tableView:(UITableView *)tableView 
        didSelectRowAtIndexPath:(NSIndexPath *)indexPath {
  
  NSUInteger row = [indexPath row];
  NSUInteger count = [posts count];

  if (row == count) {   
    if ([self countUnreadPosts]) {
      [self markAllRead];
    } else {
      [self getMoreItems];
      [self.tableView deselectRowAtIndexPath:indexPath animated:YES];
    }

  } else {
    ...
    ...
  }
}

So when the user selects the last row in the table we check if we have unread posts and, if so, call the local method markAllRead. Otherwise, if there are no unread posts, we call getMoreItems to refresh the feed. These two methods are shown below:

- (void)markAllRead {
  for (Post *post in self.posts) {
    post.isRead = YES;
  }

  [self updateViewTitle];
  [self.tableView reloadData];
}

- (void)getMoreItems {
  NSPredicate *predicate = [NSPredicate predicateWithFormat:@"isRead == NO"];
  [posts filterUsingPredicate:predicate];
  [self.tableView reloadData];
  [feed refresh];
}

Note the use of an NSPredicate to remove the read posts from the array of posts stored in the feed view controller. To trigger the retrieval of new posts we call the refresh method on our feed object. When the refresh completes, the resulting notification calls back to the feedChanged: method, which now looks like this:

- (void)feedChanged:(NSNotification *)notification {
  
  NSMutableArray *feedPosts = [feed feedPosts];
  for (Post *feedPost in feedPosts) {
    [posts addObject:feedPost];
  }

  [self.tableView reloadData];
  [self updateViewTitle];
}

This method now just copies the new posts from the feed object into our view controller and then refreshes the table view and the view title. Our feed object takes care of ensuring we only get to see new posts.

The Post View

To finish up I have made one minor improvement to the detailed post view to allow the full post to be viewed in Safari. This makes use of the link attribute which we now extract from the RSS feed for each post. A standard system action button is added to the navigation bar in the viewDidLoad method of the PostViewController and wired up to call the method openLink when touched by the user:

- (void)viewDidLoad {    
  [super viewDidLoad];

  UIBarButtonItem *openButton = [[UIBarButtonItem alloc]
      initWithBarButtonSystemItem:UIBarButtonSystemItemAction
      target:self
      action:@selector(openLink)];
  self.navigationItem.rightBarButtonItem = openButton;
  [openButton release];

  NSString *postTitle = [NSString stringWithFormat:@"<H1>%@</H1>",
                         post.title];

  NSString *html = [postTitle stringByAppendingString:post.description];
  [postBody loadHTMLString:html baseURL:nil]; 
}

The openLink method uses the shared UIApplication object to open the URL of the post link (assuming it is defined):

- (void)openLink {
  if (post.link) {
    NSURL *url = [NSURL URLWithString:post.link];
    [[UIApplication sharedApplication] openURL:url];
  }
}
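
Since the link comes straight from the feed, it may be worth checking it before handing it to the system. The following is a hedged sketch rather than part of the project: BrowsableURLFromLink is a hypothetical helper, and restricting links to the http and https schemes is my own assumption about what a feed reader would want to open.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an NSURL only when the link parses and
// uses the http or https scheme; otherwise returns nil.
static NSURL *BrowsableURLFromLink(NSString *link) {
    if ([link length] == 0) {
        return nil;   // covers both nil and empty strings
    }
    NSURL *url = [NSURL URLWithString:link];
    NSString *scheme = [[url scheme] lowercaseString];
    if ([scheme isEqualToString:@"http"] || [scheme isEqualToString:@"https"]) {
        return url;
    }
    return nil;       // unparseable link or an unexpected scheme
}
```

The openLink method could then call openURL: only when this helper returns a non-nil URL.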

Wrapping Up

Another long post, so congratulations to anybody who made it all the way to the end. The basic functionality of an RSS reader is now just about complete, though there is of course plenty of room for improvement, not least since we only read a single hard-coded feed at the moment. One other change that I may look at in a future post is how to convert the project to make use of Core Data to store the post objects rather than relying on keeping them all in memory.