
Entries in core data (4)

Thursday
Jan 19, 2012

Core Data Queries Using Expressions

Core Data can have a steep learning curve for anybody new to either the Mac or iOS platforms. One of the key points for me was understanding that Core Data is not a relational database but a persistent object store with many features to manage the life-cycle of an object. I think some of the confusion comes from the fact that Core Data can use SQLite as the underlying object store, but that is an implementation detail that can lead you astray if you are not careful.

So if Core Data is not a relational database, how do you do those things that would be easy if you could just use an SQL query? A Core Data fetch request with a combination of predicates and sort descriptors is a very flexible mechanism that covers many of the most common queries you might need for retrieving objects. However, when you are more interested in querying for specific values, such as the minimum or maximum value of an attribute, an alternative approach using expressions can be easier and more efficient.

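To illustrate the code snippets in this post I will assume a very simple Core Data model with a single entity to represent a task in a todo list. The entity, named Task, has the attributes complete, createdAt, note and title, all of which appear in the SQL debug output later in the post.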

Retrieving the minimum value of an attribute

Each task object contains an NSDate attribute which indicates when the task was first created:

@property (nonatomic, retain) NSDate * createdAt;

Suppose I want to find out what the oldest creation date is for all of the tasks. A first approach might be to use a fetch request to retrieve the first task after sorting all of the tasks by the creation date (with an ascending sort order):

NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"Task"
                                          inManagedObjectContext:self.managedObjectContext];
[fetchRequest setEntity:entity];
[fetchRequest setFetchLimit:1];

// Sort ascending so that the oldest task comes first
NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"createdAt"
                                                               ascending:YES];
[fetchRequest setSortDescriptors:[NSArray arrayWithObject:sortDescriptor]];
[sortDescriptor release];

NSError *error = nil;
NSArray *fetchResults = [self.managedObjectContext executeFetchRequest:fetchRequest
                                                                 error:&error];
[fetchRequest release];

// With a fetch limit of 1 the array holds at most one task
Task *oldest = [fetchResults lastObject];
NSLog(@"oldest = %@", oldest.createdAt);

Note that the fetch limit (setFetchLimit) is set to 1 as we only want the first object in the sorted list of all tasks. This is not a bad solution, but as we will see we can do much better. To understand what Core Data is doing under the covers it can be very useful to turn on some debugging. In particular we can get Core Data to show us the underlying SQL queries it is using by setting the argument -com.apple.CoreData.SQLDebug to 1 on application launch. With Xcode 4 the launch arguments are set in the scheme editor (Product > Edit Scheme, then the Arguments tab of the Run action), under Arguments Passed On Launch.

With the debugging enabled we can see the query that Core Data is using and the fetch execution time which will be useful when comparing the performance of different approaches:

2012-01-19 20:31:37.864 ToDoSync[3455:fb03] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZCOMPLETE, t0.ZCREATEDAT, t0.ZNOTE, t0.ZTITLE FROM ZTASK t0 ORDER BY t0.ZCREATEDAT LIMIT 1

2012-01-19 20:31:37.875 ToDoSync[3455:fb03] CoreData: annotation: sql connection fetch time: 0.0742s

2012-01-19 20:31:37.875 ToDoSync[3455:fb03] CoreData: annotation: total fetch execution time: 0.0784s for 1 rows.


The debugging output shows us that Core Data is performing a select to retrieve all of the attributes of the task object, ordered by creation date with a query limit of 1. The total fetch execution time in this case was 0.0784s based on a database containing 5,000 tasks running on a fourth generation iPod touch test device. Note that any time you are looking to optimise performance it is good advice to run the code on an actual device. Running on the iOS Simulator will give you misleadingly fast results because the host computer is so much more powerful.

Restricting The Properties to Fetch

Before looking at the use of expressions there is one minor optimisation that we could consider applying to the previous fetch request. Since we are interested only in the creation date we can modify the fetch request to make it only retrieve that one property:

[fetchRequest setResultType:NSDictionaryResultType];
[fetchRequest setPropertiesToFetch:[NSArray arrayWithObject:@"createdAt"]];

By default the result type of a fetch request is NSManagedObjectResultType which as the name implies means we will get managed objects back from the fetch. To specify that we want one or more properties of an object you first need to make the fetch request return a dictionary by setting the result type to NSDictionaryResultType and then you set the properties to fetch by passing it an array containing the names of the properties you want back. In this case we just want the “createdAt” property. Now when we execute the fetch request we get back an array containing a single dictionary (since we set a fetch limit of 1) which contains the single property which in this case is an NSDate:

NSDate *oldest = [[fetchResults lastObject] valueForKey:@"createdAt"];

Looking at the SQL debug you can see that the select statement now only retrieves the single attribute:

2012-01-19 20:59:26.484 ToDoSync[18564:707] CoreData: sql: SELECT t0.ZCREATEDAT FROM ZTASK t0 ORDER BY t0.ZCREATEDAT LIMIT 1

2012-01-19 20:59:26.535 ToDoSync[18564:707] CoreData: annotation: sql connection fetch time: 0.0505s

2012-01-19 20:59:26.539 ToDoSync[18564:707] CoreData: annotation: total fetch execution time: 0.0545s for 1 rows.


There is a performance improvement in that this fetch executes in around 0.05s compared to almost 0.08s for the previous example. However, this may often turn out to be a false optimisation if, shortly after determining the earliest creation date, we find we want to retrieve the actual task with this creation date. In that case it is generally better to just retrieve the full task as in the original query so that Core Data already has it cached ready for when we need it.

Using an Expression

A better way to solve this type of query is to create an expression with the function that we want to perform. Unfortunately a little more code is required, though we start, as in the previous example, by constructing a fetch request that will return a dictionary result:

NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"Task"
                                          inManagedObjectContext:self.managedObjectContext];
[fetchRequest setEntity:entity];
[fetchRequest setResultType:NSDictionaryResultType];

We then create an expression which specifies the function we want to use and the key-path of the property we want to apply it to. So for our example where we want the minimum of the createdAt property:

NSExpression *keyPathExpression = [NSExpression expressionForKeyPath:@"createdAt"];
NSExpression *earliestExpression = [NSExpression expressionForFunction:@"min:"
                                                             arguments:[NSArray arrayWithObject:keyPathExpression]];

There is a wide range of functions that we could apply, including average:, sum:, min:, max:, median: and sqrt:. For the full list check the documentation for the NSExpression class, and bear in mind that a given persistent store may not support every function in a fetch request. Unfortunately that is not all we need to do, as we must also create an expression description to specify the result type we are expecting from the fetch request:

NSExpressionDescription *earliestExpressionDescription = [[NSExpressionDescription alloc] init];
[earliestExpressionDescription setName:@"earliestDate"];
[earliestExpressionDescription setExpression:earliestExpression];
[earliestExpressionDescription setExpressionResultType:NSDateAttributeType];

The key point is that we need to set a name for the expression which we will use when retrieving the result - remember that we have already specified that the fetch request should give us back a dictionary containing the result. The name of the expression will be our key into that dictionary. We also need to specify that we expect the result type of the expression to be an NSDate object. Finally we can set the properties to fetch using our expression description and execute the fetch request:

[fetchRequest setPropertiesToFetch:[NSArray arrayWithObject:earliestExpressionDescription]];

NSError *error = nil;
NSArray *fetchResults = [self.managedObjectContext executeFetchRequest:fetchRequest
                                                                 error:&error];

The NSArray we get back as the fetch result should contain a single NSDictionary object which contains the NSDate object stored using the expression description name as the key:

NSDate *oldest = [[fetchResults lastObject] valueForKey:@"earliestDate"];

Finally just for completeness and assuming you are not using ARC we should release a few things:

[earliestExpressionDescription release];

[fetchRequest release];

This is a lot more code than the original solution but the SQL debug log shows some interesting results:

2012-01-19 21:47:39.292 ToDoSync[18639:707] CoreData: sql: SELECT min( t0.ZCREATEDAT) FROM ZTASK t0

2012-01-19 21:47:39.304 ToDoSync[18639:707] CoreData: annotation: sql connection fetch time: 0.0121s

2012-01-19 21:47:39.308 ToDoSync[18639:707] CoreData: annotation: total fetch execution time: 0.0162s for 1 rows.


I find it somewhat amusing that the more code we write the smaller the underlying Core Data SQLite query gets :-) This fetch request, executed on the same device and dataset as before, takes 0.0162s, which is considerably faster than the original query, which took over 0.07s. The reason is clear if you look at the SQL query being used: Core Data is asking SQLite to perform the min function directly on the createdAt column, so the database only has to return a single computed value instead of ordering the 5,000 tasks and returning a row.

Where expressions really start to become effective is when you need to perform multiple calculations on the same dataset. So suppose that we want to calculate both the earliest and the latest creation dates. All we need to do is construct a second expression:

NSExpression *latestExpression = [NSExpression expressionForFunction:@"max:"
                                                           arguments:[NSArray arrayWithObject:keyPathExpression]];
NSExpressionDescription *latestExpressionDescription = [[NSExpressionDescription alloc] init];
[latestExpressionDescription setName:@"latestDate"];
[latestExpressionDescription setExpression:latestExpression];
[latestExpressionDescription setExpressionResultType:NSDateAttributeType];

This time we are using the max: function and we have named our expression “latestDate”. Note that we do not need to use a separate fetch request for each of these expressions. We can set our properties to fetch to include both expressions and execute a single fetch request:

[fetchRequest setPropertiesToFetch:[NSArray arrayWithObjects:earliestExpressionDescription,
                                                             latestExpressionDescription, nil]];

Now when we execute the fetch request we get back a dictionary containing two entries representing both the earliestDate and latestDate results:

NSError *error = nil;
NSArray *fetchResults = [self.managedObjectContext executeFetchRequest:fetchRequest
                                                                 error:&error];

NSDate *oldest = [[fetchResults lastObject] valueForKey:@"earliestDate"];
NSDate *latest = [[fetchResults lastObject] valueForKey:@"latestDate"];

As the SQL debug shows us, this query to calculate both the earliest and latest dates executes almost as fast as the fetch for just the earliest date (0.0220s against 0.0162s) and is still more than three times faster than the original approach of sorting the property with Core Data (0.0784s):

2012-01-19 22:06:33.153 ToDoSync[18681:707] CoreData: sql: SELECT min( t0.ZCREATEDAT), max( t0.ZCREATEDAT) FROM ZTASK t0

2012-01-19 22:06:33.170 ToDoSync[18681:707] CoreData: annotation: sql connection fetch time: 0.0174s

2012-01-19 22:06:33.175 ToDoSync[18681:707] CoreData: annotation: total fetch execution time: 0.0220s for 1 rows.
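The expression boilerplate used in this post is the same for every aggregate, so it can be wrapped in a small helper. The following is only a sketch under a few assumptions: the function name FetchAggregates and its parameters are made up for this example and are not part of Core Data, and it is written for ARC with the newer literal and subscript syntax rather than the manual retain/release style used in the snippets above.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper (not a Core Data API): runs a single fetch request that
// evaluates several aggregate functions over one key path.
// functionsByName maps a result key to an NSExpression function name,
// for example @{ @"earliestDate" : @"min:", @"latestDate" : @"max:" }.
static NSDictionary *FetchAggregates(NSManagedObjectContext *context,
                                     NSString *entityName,
                                     NSString *keyPath,
                                     NSAttributeType resultType,
                                     NSDictionary *functionsByName,
                                     NSPredicate *predicate,
                                     NSError **error)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:entityName];
    request.resultType = NSDictionaryResultType;
    request.predicate = predicate; // nil means "consider every object"

    NSExpression *keyPathExpression = [NSExpression expressionForKeyPath:keyPath];
    NSMutableArray *descriptions = [NSMutableArray array];
    for (NSString *name in functionsByName) {
        // One expression description per requested aggregate
        NSExpressionDescription *description = [[NSExpressionDescription alloc] init];
        description.name = name;
        description.expression =
            [NSExpression expressionForFunction:functionsByName[name]
                                      arguments:@[keyPathExpression]];
        description.expressionResultType = resultType;
        [descriptions addObject:description];
    }
    request.propertiesToFetch = descriptions;

    // The result array should hold one dictionary keyed by the names above;
    // if the fetch fails, results is nil and so is the return value.
    NSArray *results = [context executeFetchRequest:request error:error];
    return [results lastObject];
}
```

With this in place the earliest and latest dates could be fetched with one call, passing @"Task", @"createdAt", NSDateAttributeType and the two function names, and then read back with valueForKey: as before. The predicate parameter is there so the same query could be restricted, for example to unfinished tasks with a format such as @"complete == NO"; that assumes complete is a Boolean attribute, which the post does not confirm.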

Wrapping Up

Using NSExpression is perhaps not the most intuitive way to perform complex queries and calculations on Core Data sets. However, I think it is worth spending some time mastering expressions as the performance improvements can be significant, especially when you need to frequently repeat a calculation with a large data set.

Tuesday
Mar 23, 2010

Using categories with core data

When I first started using Objective-C on the Mac, language features such as categories and protocols were a mystery to me. You can go a long way without having to know anything about them, but a small effort to grasp the key concepts can make a huge difference. Here is one example of how categories can make working with core data less of a pain.

The Xcode data modeling tool makes it pretty easy to create a core data model and then with one click generate the Objective-C class files for each entity in the model. The only disadvantage with this approach is that it can be a pain to manage if you need to add anything to the model class files. Each time you modify your core data model and regenerate all of the class files you need to merge your additions back into the new files.

Of course, there is a better way….

Objective-C categories allow you to add new methods to an existing class without having to change the source code of the original class. In fact categories allow you to do this even if you do not have access to the original source. (This is not the same as subclassing in that you cannot add new instance variables to the class.)
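As a small illustration of that last point, here is a sketch of a category on NSString, a class whose source we certainly do not have. The category name ExampleInitial and the method ex_initialLetter are invented for this example; the prefix is just a precaution against clashing with a method of the same name elsewhere.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category: adds one method to NSString without subclassing it.
@interface NSString (ExampleInitial)
- (NSString *)ex_initialLetter;
@end

@implementation NSString (ExampleInitial)

- (NSString *)ex_initialLetter {
    if ([self length] == 0) {
        return @""; // nothing to extract from an empty string
    }
    // Take the whole first composed character so that a letter and any
    // combining accent that follows it are not split apart.
    NSRange first = [self rangeOfComposedCharacterSequenceAtIndex:0];
    return [[self substringWithRange:first] uppercaseString];
}

@end
```

Any file that imports the category header can then call the method on an ordinary string, so [@"apple" ex_initialLetter] returns @"A".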

The ability to split the implementation of a class across several source files is pretty useful when adding methods to a core data model class. We can keep the original generated files unmodified and place the additional methods in a separate file. Anytime the core data model changes we can regenerate all of the class files without worrying about overwriting our manual modifications.

Suppose I have a core data class named “Item” which, to keep things simple, has a single string property. The contents of the core data generated header file “Item.h” will be fairly minimal:

#import <CoreData/CoreData.h>

@interface Item : NSManagedObject
{
}
@property (nonatomic, retain) NSString *name;
@end

The actual implementation file “Item.m” is also pretty simple:

#import "Item.h"

@implementation Item
@dynamic name;
@end

Now suppose that I want to add a method to Item that will return the initial, uppercase letter of the name property. (A common enough requirement if I am displaying items in a sorted iPhone table view). By convention the filename used for the category is the name of the original class followed by a “+” followed by the category name. So in this example I could create a file named “Item+Index.h” as follows:

#import "Item.h"

@interface Item (Index)
- (NSString *)initialLetter;
@end

The implementation file “Item+Index.m” would look like this:

#import "Item+Index.h"

@implementation Item (Index)

- (NSString *)initialLetter {
    // Return the first character of the name converted to uppercase.
    // The property is @dynamic with no backing ivar, so use self.name.
    // Note that substringToIndex: raises an exception if name is empty.
    return [[self.name substringToIndex:1] uppercaseString];
}

@end

That is all there is to it. Anytime I want to use the added methods I import “Item+Index.h” instead of “Item.h” and I am done.

Monday
Mar 15, 2010

NSFetchedResultsController and sort performance

I really liked the NSFetchedResultsController when it was introduced in iPhone OS 3.0. It has made it much easier to implement a core data backed table view and removes the need to write a lot of code. It also seems to do a pretty good job of keeping the memory footprint to a minimum.

However, one thing I have been struggling with is performance when using a grouped table. To be more precise, the issue is not really with the fetched results controller but with the sort descriptor when using a case-insensitive comparison. The basic setup is as follows:

    // Create a fetch request
    NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
    NSEntityDescription *entity = [NSEntityDescription entityForName:@"Record" 
                                                       inManagedObjectContext:moc];
    [fetchRequest setEntity:entity];
	
    // Create a sort descriptor for the request
    NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"value"
                                         ascending:YES
                                         selector:@selector(localizedCaseInsensitiveCompare:)];
    [fetchRequest setSortDescriptors:[NSArray arrayWithObject:sortDescriptor]];
	
    // Now create the fetched results controller
    NSFetchedResultsController *frc = [[NSFetchedResultsController alloc]
                                                 initWithFetchRequest:fetchRequest
                                                 managedObjectContext:sharedMoc
                                                 sectionNameKeyPath:@"valueForSectionTitle"
                                                 cacheName:@"cache"];

    [fetchRequest release];
    [sortDescriptor release];

The model named “Record” has a string attribute named “value” that is used to order the contents of the table. A method defined for the Record model, “valueForSectionTitle”, returns the value used for generating the section title for each record. It looks something like this (assuming the first character of each string fits in a single unichar):

- (NSString *)valueForSectionTitle {
    // Return the first character of the value
    // converted to uppercase
    return [[self.value substringToIndex:1] uppercaseString];
}

This approach works but even with small datasets of around 1,000 records there is a noticeable lag in the user interface when the table view loads. Turning on core data SQL debugging shows what is happening:

2010-03-15 14:12:28.633 CorePerf[2367:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT,
t0.ZVALUE FROM ZRECORD t0 ORDER BY t0.ZVALUE COLLATE NSCollateLocaleSensitiveNoCase
2010-03-15 14:12:29.654 CorePerf[2367:207] CoreData: annotation: sql connection fetch
time: 0.8731s
2010-03-15 14:12:29.662 CorePerf[2367:207] CoreData: annotation: total fetch execution
time: 1.0311s for 1500 rows.

So for 1500 records it takes over a second for the fetch request to execute. To avoid this runtime delay it makes sense to compute the section index title (in this case the uppercase initial letter of the string) ahead of time and avoid the case-insensitive comparison. Adding a new attribute to the core data model to hold this initial letter simplifies the sort descriptor, removing the need for a sort comparison selector.

    NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"indexValue"
                                                                 ascending:YES];
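The post does not show how the new attribute gets populated. One possible way, sketched here purely as an assumption, is a category method that sets both attributes together whenever the value changes. The minimal Record declaration, the category name Index and the method updateValue: are all invented for this example; only the attribute names value and indexValue come from the post.

```objc
#import <CoreData/CoreData.h>

// Minimal stand-in for the generated Record class (assumed shape).
@interface Record : NSManagedObject
@property (nonatomic, retain) NSString *value;
@property (nonatomic, retain) NSString *indexValue;
@end

@implementation Record
@dynamic value;
@dynamic indexValue;
@end

// Hypothetical category that keeps indexValue in step with value.
@interface Record (Index)
- (void)updateValue:(NSString *)newValue;
@end

@implementation Record (Index)

- (void)updateValue:(NSString *)newValue {
    self.value = newValue;
    // Precompute the uppercase initial once, at write time, so that the
    // fetch can sort on it with the default comparison.
    self.indexValue = ([newValue length] > 0)
        ? [[newValue substringToIndex:1] uppercaseString]
        : @"";
}

@end
```

The trade-off is that every write to value has to go through updateValue: (or an equivalent hook), otherwise the stored initial letter can drift out of date.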

The impact on the fetch request is dramatic:

2010-03-15 14:21:25.580 CorePerf[2397:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT,
t0.ZVALUE, t0.ZINDEXVALUE FROM ZRECORD t0 ORDER BY t0.ZINDEXVALUE
2010-03-15 14:21:25.925 CorePerf[2397:207] CoreData: annotation: sql connection fetch
time: 0.2101s
2010-03-15 14:21:25.934 CorePerf[2397:207] CoreData: annotation: total fetch execution
time: 0.3537s for 1327 rows.

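The connection fetch time drops from 0.8731s to 0.2101s, which is roughly a 76% improvement ((0.8731 - 0.2101) / 0.8731), and the total fetch execution time drops from 1.0311s to 0.3537s. The two runs returned slightly different row counts (1500 and 1327), so the comparison is only approximate. When I get some more time I will experiment with some different data set sizes, but for now this seems to suggest a clear conclusion: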

Arrange your data so that you can use the default sort comparison selector when creating core data fetch requests.

Thursday
Mar 11, 2010

Debugging core data on the iPhone

It is great that Apple shipped the core data framework with iPhone OS 3.0 but a lot of the performance and debugging tools are still missing from Xcode and Instruments. This makes trying to fine tune a slow running core data operation extra difficult on the iPhone OS.

One tip I did find buried in the Core Data Programming Guide is to turn on SQL tracing. To do this you need to pass the following argument to the application:

    -com.apple.CoreData.SQLDebug 1

To be clear, this needs to be an argument passed to the application and not an environment variable. To set the argument, find the application under Executables in the Xcode Groups & Files window and right click to get at the Info screen. Click the Arguments tab in the dialog window and insert the argument into the top window (Arguments to be passed on launch).

Running the application will now log the SQL requests used by core data. This peek behind the curtain to see what core data is up to is extremely useful. Understanding the SQL queries that core data is using is half the battle in optimising performance.

For example, here is the output from the CoreDataBooks iPhone sample app:

2010-03-11 19:34:19.672 CoreDataBooks[3854:207] CoreData: sql: pragma cache_size=1000
2010-03-11 19:34:19.673 CoreDataBooks[3854:207] CoreData: sql: SELECT Z_VERSION, Z_UUID,
Z_PLIST FROM Z_METADATA
2010-03-11 19:34:19.763 CoreDataBooks[3854:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT,
t0.ZAUTHOR, t0.ZTITLE, t0.ZCOPYRIGHT FROM ZBOOK t0 ORDER BY t0.ZAUTHOR, t0.ZTITLE
2010-03-11 19:34:19.764 CoreDataBooks[3854:207] CoreData: annotation: sql connection fetch
time: 0.0009s
2010-03-11 19:34:19.765 CoreDataBooks[3854:207] CoreData: annotation: total fetch execution
time: 0.0020s for 11 rows.

It might not be as good as having the full set of core data performance instruments but at least you can see the select statement and the execution time for the fetch.