[Design] Chandler Agent Framework
Andy Hertzfeld
Thu, 05 Dec 2002 22:46:05 -0800
I've recently been working on a new set of ideas that would provide Chandler with an extensible mechanism and really cool UI for automatically responding to various events and situations designated by the user, as well as orchestrating asynchronous, multi-step tasks. It's an "agent" framework based on a metaphor of people in occupational roles.

I wrote a paper about it, which you can read at http://differnet.com/Agent_Framework.html (it will eventually be on the OSAF site, too), which is also included below.

It's new and different, so we really can use your help in refining and evaluating it; let us know what you think.

-- Andy

----------------

Chandler Agent Framework
Andy Hertzfeld
12/5/02, Version .3

Motivation

Chandler's user interface is based on a landscape of views, arranged in a set of modules, each with an associated URL. Each view presents a collection of items specified by a query from a (possibly remote) data repository, displayed in a spreadsheet-like table or in a view-specific, flexible fashion. While that's fine for browsing and manipulating data, it doesn't provide an intuitive way to express continuous activity or initiative.

We'd like Chandler to be able to respond automatically to various situations as they arise, as instructed by the user in a simple, understandable fashion. Chandler should be capable of orchestrating complex, multi-step, asynchronous tasks in the background, like booking reservations for a business trip or arranging a conference call between multiple parties. Non-programming users should be able to specify actions that are performed in response to certain events while programmers should be able to define new types of events and actions, besides the ones built into Chandler.

This paper proposes that we accomplish the above by building an agent framework into Chandler.
Agents are plug-in modules, containing Python scripts and other resources like images, that perform tasks on behalf of the user, either when explicitly requested or when the agent notices particular conditions arising. Agents have a compelling visual representation based on an anthropomorphic "occupation" metaphor (postman, janitor, secretary, courier, sleuth, etc.), in which we express as much of their internal state as we can visually (they look busy, alert, bored, sleeping, impatient, pleased, etc.). They can be instructed by the user without any programming, simply by associating predefined conditions with predefined actions.

So the Chandler user interface might present two different kinds of plug-in modules: views (actually sets of related views, currently called "packages") and agents. The user will think of views as places in the landscape of the program, sort of like web pages, and agents as active characters who can visit the places and accomplish tasks on their behalf.

I think this provides us with a flexible framework that will allow Chandler to address a wide variety of applications while giving users simple yet powerful ways to extend their will into the software and perform routine tasks automatically.

The rest of this paper discusses aspects of a potential agent framework for Chandler without attempting a detailed design. This is still a new, raw set of unproven ideas, with no prototyping or proof of concept work attempted yet, so we should expect it to evolve and mutate rapidly as we proceed with development.

The Agent Metaphor

A good user interface should find a way to offer lots of power and flexibility behind a simple façade, some organizing principle that aligns complex options behind concepts the user already understands. The metaphor of using people in occupational roles to represent clusters of related functionality, generating and responding to events, appears to be rich, deep and fun to work on (it almost seems to start designing itself).
It enables the user to apply their lifetime of experience in the real world to understanding how to access the functionality of the program.

Most users have more trouble manipulating abstract ideas than concrete objects. The realization of the agent metaphor should be highly visual if it's going to be effective for a broad range of users, since that makes it more concrete and real to them. We can use the graphical representation of an agent both to distinguish it from other agents and to convey the current details of its internal state and portray ongoing activity.

Another strength of the agent metaphor is that it is inherently modular, since agents are autonomous and distinct in the user's perception. It's easy to envision adding more members with different skills to your virtual staff.

Modular Framework

Chandler is based on a modular architecture, where plug-in modules containing Python scripts can add new functionality on the fly, without restarting the application. The modules typically inherit rich behavior from base classes in the Chandler core, so they can be relatively small themselves. The agent framework will leverage the modular architecture to allow agents to be developed and distributed independently from the core application, and to be added and removed smoothly at runtime.

A standard plug-in module in Chandler is called a "package". Packages usually contain a set of related views, as well as the resources required for the views to render themselves. Packages can even contain preference information, schemas, or additional data to augment the repository, so they can serve as the distribution format for Chandler applications.

Packages will often contain one or more agents, sometimes as part of a set of related views. For example, a postman agent may be part of the email package. Other packages may contain agents that are independent from any particular view. The user can also create new agents by cloning an existing agent and then modifying it.
There are many other aspects of the framework that have to be carefully thought through. For example, agents can probably register with the framework, declaring themselves suitable for a set of tasks, such as being called when the user issues a specific command like 'delete'. Agents should also contain a set of cryptographic credentials identifying them and allowing them to have specific privileges. Some more of the framework issues are discussed below, but the detailed design is beyond the scope of this document.

Agent Visualization

Chandler tries to portray as much of the agent's internal state as it can visually. Agents will typically be around 80 pixels square, but they will also be rendered both smaller and larger as circumstances permit. They will use text, graphics and animation, and in rare cases sound, to portray their current status.

An agent's visual appearance will be based on a user-selectable set of base images supplemented by agent-specific images overlaid onto the base image. Agents will display prominent tokens of their role, like hats or badges, to make their roles clearer and to help them be more easily distinguished from each other. Their graphical appearance is dynamic, changing to reflect their internal state and their progress toward fulfilling their assigned tasks. We might want to render features with a vector-based format like Flash, which would potentially allow a wider range of expressions than bitmaps.

An agent will express its current level of satisfaction at carrying out its assigned tasks via its appearance, by appearing happy, sad, neutral, frustrated, etc. It will portray its activity level by appearing disabled, sleeping, alert, busy, bored, excited, etc. An agent will illustrate the completion state of the tasks at hand, and show the current step of a multi-step process. They will also be able to show magnitudes like none, one, some or many.
Agents use animation to show when they are communicating and if they are currently sending, receiving or waiting for data.

The user will have some control of the appearance and style of their agents. The system will supply a variety of base appearances, from which the user can choose, and third parties will be able to add even more choices.

When you hover the mouse over an agent, a pop-up textbox appears (or possibly a speech balloon), containing even more information about the agent and its current status. When you click on an agent, a dialog box appears to allow you to interact with it in detail.

Agent User Interface

Agents are integrated into the user interface in a number of different ways. There will be an optional "agent bar" stretching horizontally across the top of a Chandler window, below the other toolbars, that will display the agents relevant to the current view or activities, with the most relevant ones toward the left-hand side. The contents and order of the agents in the bar might change as you navigate between views, since some agents are specific to certain views or packages. There is also an Agents view available, which is a place that allows you to see all of your agents and manage them, as well as creating new ones and deleting agents that are no longer useful. Finally, agents can pop up in dialogs as necessary when conditions arise (see below).

An agent continuously reflects activity and progress via its appearance, but when the user lingers the cursor over it, it displays more detailed information in a pop-up text box or speech balloon. When the user clicks on an agent, it pops up a dialog box for a more detailed interaction. Items from the view may also be dragged and dropped on an agent, which will respond according to its scripts.

At their most basic, agents offer users a number of commands that they can perform immediately, when the user selects one from a list, or at a specified time.
Sometimes the commands require additional parameters, obtained by interrogating the user in a step by step, wizard-like manner. Some commands will complete quickly, but others may execute indefinitely in the background, while the agent offers progress feedback visually.

Agents also contain a set of event/action pairs, some of which are enabled by default. If a pair is enabled, the agent will execute the specified action when it receives the associated event. The user can choose to enable or disable the pairs, or to create new event/action pairs by selecting an event and action from lists.

One of the main functions of an agent is to notify the user when notable events occur, so they will need to get the user's attention with varying degrees of urgency. They will have multiple ways of engaging the user's attention, including altering their appearance, animating and changing colors, making sounds or popping up dialogs as appropriate.

Agent Runtime

Agents are Chandler modules, which contain an XML file and possibly some other resources including Python scripts that usually inherit from a base class defined by the agent framework, but might also inherit from other agent modules as well.

The Agent Runtime provides methods controlling the life-cycle of agents, which are usually accessed through inheritance. These include methods for creation, destruction, notification, cancellation, etc. There are also methods allowing agents to register with the framework to associate themselves with particular activities.

There might be some sort of sandbox-like restricted execution environment for the agent scripts if they're not fully trusted (see below), but otherwise an agent has access to the full resources of the client's Python environment, including performing database operations. There will be a facility for agents to ask the runtime for a description of the services that are available, including version numbers, so they can adapt themselves to multiple versions of the runtime.
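As a rough illustration of the event/action pairs and base-class inheritance described above, here is a minimal Python sketch. Nothing in it is part of Chandler: the Agent class, its add_pair, set_enabled and notify methods, the "mail-arrived" event name and the Postman subclass are all hypothetical names invented for this example, and a real framework would probably look quite different.

```python
class Agent:
    """Hypothetical base class; the paper does not define this API."""

    def __init__(self, name):
        self.name = name
        # Maps an event name to a list of [action, enabled] pairs.
        self._pairs = {}

    def add_pair(self, event, action, enabled=True):
        """Associate an action with an event (an event/action pair)."""
        self._pairs.setdefault(event, []).append([action, enabled])

    def set_enabled(self, event, action, enabled):
        """Enable or disable an existing pair without removing it."""
        for pair in self._pairs.get(event, []):
            if pair[0] is action:
                pair[1] = enabled

    def notify(self, event, item=None):
        """Run every enabled action paired with the event."""
        results = []
        for action, enabled in self._pairs.get(event, []):
            if enabled:
                results.append(action(self, item))
        return results


def file_in_inbox(agent, item):
    """Hypothetical predefined action."""
    return f"{agent.name} filed {item!r} in Inbox"


class Postman(Agent):
    """An agent that inherits its behavior from the base class."""

    def __init__(self):
        super().__init__("Postman")
        self.add_pair("mail-arrived", file_in_inbox)


postman = Postman()
delivered = postman.notify("mail-arrived", "hello.eml")

# The user disables the pair; the same event is now ignored.
postman.set_enabled("mail-arrived", file_in_inbox, False)
ignored = postman.notify("mail-arrived", "spam.eml")
```

The point of keeping pairs as data rather than code is that a user interface could list them and toggle them without the user writing any Python.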
One of the main functions of an agent is to generate and respond to events. Agents may call the agent runtime to define new types of events. They also invoke the runtime to signal events as necessary to notify other interested parties when the appropriate conditions arise. Agents can also subscribe to events generated by the system, or by other agents, to be notified each time the specified event occurs, without having to poll for it. Events are queued by the runtime, and fed to the relevant agents asynchronously from the main thread of execution, so responses can be lengthy without interfering with the main task the user is focusing on in the foreground.

Agent Communications

Agents should be able to communicate with other agents, either running in the same application instance, or between different application instances on a local area network, or even across the Internet, in order for them to negotiate on behalf of the users they represent. They need to be able to address one another globally, without requiring intermediating servers or fixed IP addresses. An interesting approach is to have them communicate using Jabber, so they can exchange XML-based structured instant messages in order to accomplish a variety of tasks.

Chandler may employ dozens of agents, so it wouldn't be prudent to require a unique Jabber ID for each agent. Instead, they can all use the user's Jabber ID and rely on the Jabber resource ID or an extension block to designate the intended agent.

Since Jabber works by exchanging XML fragments, agents can carry out structured conversations using application-defined XML tags to accomplish their tasks. We can leverage the machinery for XML namespaces to provide a way for agents to declare the XML vocabularies that they support, and a way to query if a particular agent supports a particular vocabulary.
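To make the addressing and namespace ideas above a little more concrete, here is a small sketch that uses only Python's standard xml.etree.ElementTree module rather than any Jabber library. It assumes the agent is designated by the resource part of the user's Jabber ID (one of the two options mentioned above). The namespace URI, the example.org addresses, the "propose" tag and the helper functions are all hypothetical and chosen just for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-defined vocabulary, identified by a namespace URI.
MEETING_NS = "http://example.org/chandler/meeting"


def agent_jid(user_jid, agent_name):
    """Address an agent as the user's Jabber ID plus a resource part."""
    return f"{user_jid}/{agent_name}"


def split_jid(full_jid):
    """Split 'user@host/resource' into ('user@host', 'resource')."""
    bare, _, resource = full_jid.partition("/")
    return bare, resource


def build_message(sender, recipient, payload_tag, fields):
    """Build a message element carrying one namespaced payload."""
    msg = ET.Element("message", {"from": sender, "to": recipient})
    payload = ET.SubElement(msg, f"{{{MEETING_NS}}}{payload_tag}")
    for key, value in fields.items():
        ET.SubElement(payload, f"{{{MEETING_NS}}}{key}").text = value
    return msg


def vocabulary_of(msg):
    """Return the namespace URI of the first payload element."""
    # ElementTree stores qualified tags as '{namespace}localname'.
    return msg[0].tag[1:].partition("}")[0]


sender = agent_jid("alice@example.org", "secretary")
recipient = agent_jid("bob@example.org", "secretary")
msg = build_message(sender, recipient, "propose",
                    {"when": "2002-12-09T10:00", "topic": "design review"})

# Serialize as it might travel over the wire, then parse it on receipt.
wire = ET.tostring(msg, encoding="unicode")
received = ET.fromstring(wire)
bare_jid, target_agent = split_jid(received.get("to"))
supported = vocabulary_of(received) == MEETING_NS
```

A receiving runtime could use the resource part to pick the target agent and the payload namespace to decide whether that agent understands the conversation at all.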
There should also be a way to discover the Jabber IDs of agents offering a particular service that you're interested in, and what protocols those agents support. The Chandler agent runtime will allow you to query what agents are registered with a particular Chandler instance, but there should also be a way to query across all known instances for agents that fulfill a particular requirement. Also, we could leverage Jabber chat rooms to create meeting places for agents to congregate.

There also needs to be a mechanism for agents to converse when one of them is off-line. Chandler will probably have a general mechanism for store-and-forward communications that agent communications could utilize; one possibility is using email for store-and-forward agent conversations in a fashion analogous to the instant messaging approach under discussion, where text/XML MIME parts would contain the XML fragments.

Agent Log

Since agents can perform actions in the background without the explicit consent or awareness of the user, it's important to provide a way for the user to find out everything that's taken place. So, all actions performed by agents are logged for the user's inspection. Each agent has an individual log of the actions it has taken, and there's also a global log available in the Agents view where you can see everything that's happened. The logs will also be useful for debugging agents; there will probably be a verbose logging preference, with more details logged to facilitate agent debugging.

Agent Programming

Agents contain actions defined by scripts that can be executed at the user's request, or when the agent receives a specified event. There is a simple user interface for enabling or disabling predefined event/action associations, and creating new ones by associating an event with an action. There will also be a way for power users to create new actions by combining existing actions and possibly using conditionals and iteration.
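One way to picture "combining existing actions" with conditionals and iteration is as ordinary function composition. The sketch below is only an assumption about how that might look: the sequence, when and for_each combinators, the tag action and the dictionary-shaped items are all invented for this example and are not part of any Chandler design.

```python
def sequence(*actions):
    """Combine actions so each one receives the previous one's result."""
    def combined(item):
        for action in actions:
            item = action(item)
        return item
    return combined


def when(predicate, action):
    """Conditional: apply the action only if the predicate holds."""
    def conditional(item):
        return action(item) if predicate(item) else item
    return conditional


def for_each(action):
    """Iteration: apply the action to every item in a collection."""
    def iterate(items):
        return [action(item) for item in items]
    return iterate


def tag(label):
    """Hypothetical predefined action: return a copy with a tag added."""
    def action(item):
        return {**item, "tags": item.get("tags", []) + [label]}
    return action


def from_boss(item):
    """Hypothetical predefined condition."""
    return item.get("sender") == "boss@example.org"


# A new action built purely from existing pieces.
triage = for_each(sequence(tag("seen"), when(from_boss, tag("urgent"))))

result = triage([
    {"sender": "boss@example.org"},
    {"sender": "friend@example.org"},
])
```

Because every combinator returns a plain action, a composed action could be offered in the same lists as the built-in ones.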
Eventually, it would be nice to have a simple, visual programming language with conditionals and iteration, possibly based on a subset of Python, but probably not in our first release. Programmers will be able to add new types of events and actions to an agent, or create entirely new types of agents, by writing methods in Python.

Agent Reputation

Since agents may be independently developed plug-in modules, there will eventually be a wide variety of agents available from a wide variety of sources. Some agents might be unruly, or even malicious, either by accident or intent. We should provide a reputation mechanism for rating agents and sharing your ratings with other users.

Chandler's UI should provide an easy way to rate an agent by picking attributes from a list, as well as capturing free-form text feedback. The ratings will be kept in the user's repository and also conveyed to a server that associates the ratings with agents via their MD5 digest. Chandler can query the server to determine the average ratings associated with an agent and display them to the user. We might also want to include a peer-to-peer way of querying ratings to obviate the need for a server.

Security Issues

Since Chandler agents are independently distributed plug-in modules, capable of performing arbitrary operations on a data repository and sending the results across the network, there are lots of security issues to consider. Agents should carry a set of credentials, authenticated by digital signatures, declaring the owner of the agent and granting it specific privileges.

One possibility is using a relatively fine-grained capability-based security architecture where agents must possess specific capabilities to perform specific operations; the relevant capabilities would be passed as parameters to the restricted methods.
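The idea of passing capabilities as parameters to restricted methods can be sketched in a few lines of Python. This is an illustration only: the Capability and Repository classes and their methods are hypothetical, and plain Python objects like these are not actually tamper-proof, so a real design would need support from a restricted runtime or signed credentials.

```python
class Capability:
    """Hypothetical token granting the right to perform one operation."""

    def __init__(self, operation):
        self.operation = operation


class Repository:
    """Hypothetical data store whose methods demand a capability."""

    def __init__(self):
        self._items = {}
        # Only capabilities minted by this repository are honored.
        self._issued = set()

    def grant(self, operation):
        """Mint a capability for one operation and remember issuing it."""
        cap = Capability(operation)
        self._issued.add(cap)
        return cap

    def _check(self, cap, operation):
        if cap not in self._issued or cap.operation != operation:
            raise PermissionError(f"missing capability for {operation!r}")

    def read(self, cap, key):
        self._check(cap, "read")
        return self._items.get(key)

    def write(self, cap, key, value):
        self._check(cap, "write")
        self._items[key] = value


repo = Repository()
read_cap = repo.grant("read")
write_cap = repo.grant("write")

repo.write(write_cap, "note", "hello")
value = repo.read(read_cap, "note")

# An agent holding only the read capability cannot write.
try:
    repo.write(read_cap, "note", "overwritten")
    denied = False
except PermissionError:
    denied = True

# A capability the repository never issued is rejected as well.
try:
    repo.write(Capability("write"), "note", "forged")
    forged_denied = False
except PermissionError:
    forged_denied = True
```

An agent that is handed only read_cap has no way, within this scheme, to obtain write access unless some other party passes it a write capability.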
Another possibility is restricting access to methods or primitives at the Python language or library level, executing the agents in a restricted runtime influenced by the agent's credentials. The reputation mechanism mentioned above will also help guard against rogue agents.

It's too early in the design cycle to commit to specific security mechanisms, since we're still not sure how agents will be used, but it's clear that we'll have to pay close attention to security issues as development proceeds.

Sample Agents

It's easy to think of lots of potential agents; it will be tricky to decide the best set of them to include in the initial implementation. Here are some brief descriptions of potential agents to consider implementing, but I'm sure the list will be revised as our thinking evolves:

Postman - collect and classify email
Secretary - arrange meetings between multiple parties
File Clerk - file items and enforce filing policies
Janitor - system housekeeping tasks, warn about resource limitations
Sleuth - search for items, both locally and remotely
Switchboard Operator - arrange conference calls or IM meetings
Social Director - manage personal calendar, arrange recreational activities
Secret Agent - enforce security policies
Travel Agent - book airline and hotel reservations for a business trip

Limitations

At this stage, there's a bit of a temptation to go overboard with the agent design. We will need to experiment in order to determine the strengths and weaknesses of the approach, and to decide what functionality should be addressed by what set of agents, and what should be left to a more traditional approach. I'm not sure where the design will end up, but I'm sure it will be fun to find out.