[Dev] A short explanation of Collections

John Anderson john at osafoundation.org
Mon Aug 29 10:33:56 PDT 2005


As the dust settles on the recent Collections and Sets work, I decided 
to write up a short description of what every Chandler developer should 
know about Collections. The idea of a query that automatically updates a 
list of items, and notifies subscribers of changes, has been central to 
Chandler from the beginning.  Our design and implementation have evolved 
many times, influenced by what we have learned through experience. 
Although some of what I describe here might change slightly, I think the 
basic ideas will remain unchanged.

The new Collections are a replacement for repository.query.Query, which 
was used by ItemCollections. In the old ItemCollection world that most 
of you are probably familiar with, an ItemCollection was made up of a 
query that specified a set of items, modified by adding in a list of 
inclusion items and removing a list of exclusion items. The final 
results were cached in a ref collection that was usually accessed like 
an array. We ran into a number of problems using ItemCollections.  For 
example, when one ItemCollection (e.g. the "All" item collection) fed 
its results into a new filtered ItemCollection (e.g. the subset of 
calendar events), there were problems propagating changes and 
notifications. Also, we learned that the majority of ItemCollections in 
Chandler were simply ordered lists of items, and the notion of order in 
ItemCollections was not always maintained.

In the new Collections world we have a number of different types of 
Collections:

KindCollection: all the items of a particular kind.

ListCollection: an explicit list of items.

FilteredCollection: all items in another source Collection that match a 
Python expression. You must manually specify a list of attributes which 
Items must have to be considered for filtering by the expression. In the 
future we may limit what Python code FilteredCollections may use.

UnionCollection: the union of two or more source Collections.

IntersectionCollection: the intersection of two or more source Collections.

DifferenceCollection: the difference between two source Collections.

InclusionExclusionCollection: a collection similar to our old 
ItemCollection that implements some convenience methods to access 
inclusions, exclusions, and the source Collection, and methods to add and 
remove items. The InclusionExclusionCollection is made up of a union 
collection, a difference collection, two list collections and a source 
collection as follows:

InclusionExclusionCollection  = ((source - exclusions) + inclusions).
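
One consequence of this formula is that inclusions are applied last, so 
an item that appears in both lists ends up in the result. A minimal 
sketch of that, using plain Python sets as stand-ins (these are not 
Chandler's Collection classes, and the item names are made up):

```python
# Plain Python sets standing in for Collections; not Chandler's API.
source = {"note1", "note2", "event1"}
inclusions = {"task1", "note2"}
exclusions = {"note2", "event1"}

# (source - exclusions) + inclusions, reading "+" as set union.
result = (source - exclusions) | inclusions

# note2 is excluded and then re-added, because the union is applied last.
print(sorted(result))
```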

To illustrate the power of Collections consider the new "All" Collection:

allCollection = ((((Notes - (Events filtered by (isGenerated = True))) - 
Trash) - allExclusions) + allInclusions)

allCollection is an InclusionExclusionCollection. Notes and Events are 
KindCollections. allInclusions, allExclusions and Trash are ListCollections.

There isn't any code necessary to exclude generated events or items in 
the trash from the "All" Collection, which simplifies the design. It's 
also easy to update the rules for what is contained in the "All" 
Collection without having to update a bunch of code. So if you find 
yourself writing a bunch of code to make sure items end up in the right 
Collections in the sidebar or elsewhere, you could probably avoid it 
completely by setting up the right Collections to start with.

You can subscribe to a collection by adding the item to be notified to 
the collection's subscribers attribute. By default, the method 
"onCollectionEvent" is called on subscribed items; however, you 
can specify a different method name in the collectionEventHandler 
attribute of the item that is notified.
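
The dispatch rule can be sketched roughly as follows. The names 
subscribers, onCollectionEvent and collectionEventHandler come from the 
description above; everything else, including the notify method and the 
(op, collection, item) handler arguments, is an assumption made for the 
example and not the actual signature.

```python
class Collection:
    """Toy collection that only knows how to notify its subscribers."""
    def __init__(self):
        self.subscribers = []

    def notify(self, op, item):
        # Hypothetical helper: look up the handler name on each subscriber,
        # falling back to the default "onCollectionEvent".
        for subscriber in self.subscribers:
            name = getattr(subscriber, "collectionEventHandler",
                           "onCollectionEvent")
            getattr(subscriber, name)(op, self, item)

class DefaultSubscriber:
    def __init__(self):
        self.seen = []
    def onCollectionEvent(self, op, collection, item):
        self.seen.append((op, item))

class CustomSubscriber:
    collectionEventHandler = "onSidebarChanged"  # overrides the default name
    def __init__(self):
        self.seen = []
    def onSidebarChanged(self, op, collection, item):
        self.seen.append((op, item))

collection = Collection()
default, custom = DefaultSubscriber(), CustomSubscriber()
collection.subscribers.extend([default, custom])
collection.notify("add", "note1")
print(default.seen, custom.seen)
```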

Collections are not dependent on Blocks, but Blocks are the main users of 
Collections.

That finishes the overview.  For those who want more detail on the 
implementation, read on.

Collections are Items that provide a thin wrapper on repository Set 
attribute values, where most of the work actually takes place. We need 
this wrapper for a few reasons.  First, it's difficult to manage lots of 
references to an attribute, which is why Blocks, ContentItems, etc. are 
not attributes. Second, the Item implements the support for 
notifications. Finally, Set attributes require arguments that refer to 
other Sets in order to create them.  These arguments aren't known when 
the Collection Item is created.  This creates an awkward need to delay 
creation of the Set attribute.  The Item provides Python magic to handle 
this awkward delayed creation. A further limitation of Sets is that they 
are immutable, which means that changing a node in a Collection tree is 
not supported.  It may be possible to add more Python magic to the Item 
that destroys and re-creates the correct Sets when one node changes.

These disadvantages imposed by making Sets an attribute made some of us 
think that making Sets an Item would have been a better choice.  The 
counter argument was that we would face the same limitations even if 
Sets were Items.  There might also be situations where using Sets as 
attributes would have an advantage, even though they are used that way today.

Collections have the same kind of index that ItemCollections had. If you 
never index into a Collection, it won't have an index.  If you index into 
it, you'll get an index.  The index you get is determined by an 
attribute on Collection.  By default you'll get an ordered index, where 
the order is the same as the iteration order of the Collection.  If the 
index attribute is the name of an attribute, you'll get an index sorted 
by that attribute.
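
A rough sketch of that behavior, under stated assumptions: the class 
below is a toy, the attribute name indexName is hypothetical, and unlike 
the real thing this index is not kept up to date when the contents 
change. It only shows the index being built on first use and the choice 
between iteration order and sorting by an attribute.

```python
from types import SimpleNamespace

class IndexedCollection:
    """Toy collection whose index is built the first time it is indexed."""
    def __init__(self, items, indexName=None):
        self.items = list(items)
        self.indexName = indexName  # hypothetical name for the index attribute
        self._index = None          # no index until someone indexes into us

    def __iter__(self):
        return iter(self.items)

    def __getitem__(self, position):
        if self._index is None:
            if self.indexName is None:
                # Default: same order as iteration.
                self._index = list(self)
            else:
                # Otherwise: sorted by the named attribute.
                self._index = sorted(
                    self, key=lambda item: getattr(item, self.indexName))
        return self._index[position]

carol = SimpleNamespace(displayName="Carol")
alice = SimpleNamespace(displayName="Alice")
bob = SimpleNamespace(displayName="Bob")

ordered = IndexedCollection([carol, alice, bob])
had_index_before = ordered._index is not None
first_ordered = ordered[0].displayName

by_name = IndexedCollection([carol, alice, bob], indexName="displayName")
first_sorted = by_name[0].displayName
print(had_index_before, first_ordered, first_sorted)
```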

Unlike ItemCollections, Collections (except for ListCollections) don't 
cache their results.

Most Collections are used as contents for Blocks.  As in the past, when 
the Block is rendered it subscribes to notifications, and when it's 
unrendered it unsubscribes from notifications.  This is a simple 
optimization to minimize the number of notifications, since only blocks 
that are visible on the screen need to be notified to update themselves.

KindSets and FilteredSets maintain their indexes by using repository 
monitors.  We use that same mechanism to notify subscribers.  
Notifications for Items entering and leaving Collections are synchronous. 
This doesn't work for changes to attributes on Items in other views, so 
instead we use an asynchronous notification.  In order to get these 
notifications it's necessary to poll for them.  Each time OnIdle is 
called we do a repository update and poll for these notifications. Each 
time a notification is received, the block that gets the notification is 
added to a list of dirty blocks.  At the end of OnIdle, the list of 
dirty blocks is updated on the screen and removed from the list of dirty 
blocks. This has the benefit of accumulating all of the changes to data 
fairly quickly, and only redrawing the affected part of the screen when 
there's nothing left to do.
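
The dirty-block bookkeeping can be sketched like this. It is a toy 
model: IdleUpdater, post and on_idle are names invented for the example, 
a plain list stands in for the repository's pending notifications, and 
no real drawing happens. It shows several notifications for one block 
collapsing into a single redraw per idle pass.

```python
class Block:
    """Toy block that just counts how often it is redrawn."""
    def __init__(self, name):
        self.name = name
        self.redraw_count = 0
    def redraw(self):
        self.redraw_count += 1

class IdleUpdater:
    """Hypothetical helper modelling the poll-then-redraw idle pass."""
    def __init__(self):
        self.pending = []       # notifications waiting to be polled
        self.dirty_blocks = []  # blocks that need a redraw

    def post(self, block, change):
        # Stand-in for an asynchronous notification arriving.
        self.pending.append((block, change))

    def on_idle(self):
        # Poll: every notified block is marked dirty at most once.
        while self.pending:
            block, _change = self.pending.pop(0)
            if block not in self.dirty_blocks:
                self.dirty_blocks.append(block)
        # End of idle: redraw each dirty block once, then clear the list.
        for block in self.dirty_blocks:
            block.redraw()
        self.dirty_blocks.clear()

summary, detail = Block("summary"), Block("detail")
updater = IdleUpdater()
updater.post(summary, "title changed")
updater.post(summary, "date changed")
updater.post(detail, "title changed")
updater.on_idle()
print(summary.redraw_count, detail.redraw_count)
```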

Finally, we plan to implement nestable "Freeze/Thaw" methods to 
temporarily ignore and enable notifications, which will further improve 
performance.
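
Since this isn't implemented yet, the following is only one way nesting 
could work, not a description of the eventual design. The sketch assumes 
a simple depth counter and assumes that notifications sent while frozen 
are dropped rather than queued; the class and method names are made up.

```python
class NotifyingCollection:
    """Toy collection with nestable freeze/thaw around notifications."""
    def __init__(self):
        self.subscribers = []   # plain callables in this sketch
        self._freeze_depth = 0

    def freeze(self):
        self._freeze_depth += 1

    def thaw(self):
        if self._freeze_depth == 0:
            raise RuntimeError("thaw() without a matching freeze()")
        self._freeze_depth -= 1

    def notify(self, event):
        # Notifications are ignored until every freeze() has been thawed.
        if self._freeze_depth == 0:
            for callback in self.subscribers:
                callback(event)

received = []
collection = NotifyingCollection()
collection.subscribers.append(received.append)

collection.freeze()
collection.freeze()      # nested freeze
collection.notify("a")   # ignored
collection.thaw()
collection.notify("b")   # still frozen by the outer freeze
collection.thaw()
collection.notify("c")   # delivered
print(received)
```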



