[Chandler-dev] Natural language parsing in Chandler
Ted Leung
twl at osafoundation.org
Mon Jun 12 15:40:07 PDT 2006
On Jun 9, 2006, at 6:29 PM, Jeffrey Harris wrote:
> We discussed three major areas where language parsing seems helpful,
> listed below in order of priority. I'd like to get the list's feedback
> on important details for parsing architecture.
>
> 1. Parsing of a field as it's edited
> ====================================
>
> The dominant fields of interest are time, date, and duration. These
> have a clear time-related context, so Bear's existing work on date
> parsing using regular expressions seems likely to give useful results
> here; very little English grammar would need to be interpreted.
>
> In this area, ideally parsing would (quickly!) return a list of
> weighted possible matches, so an auto-complete drop-down could give
> useful feedback and acceleration options.
>
> Examples:
>
> 't' ->
> tomorrow
> Tuesday
> Thursday
>
> 'next w' ->
> next week
> next Wednesday
>
> 'sat at 5' ->
> Saturday 5PM
> Saturday 5AM
>
> Perhaps even the fuzzier:
>
> 'tmrw' ->
> tomorrow
Another possible use case here is to copy and paste text from (say)
an e-mail into the field and have it be recognized.
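To make the "list of weighted possible matches" idea concrete, here is a
minimal sketch of the as-you-type examples quoted above. The vocabulary,
the weights, and the names CANDIDATES, is_subsequence and complete are
all made up for illustration; this is not Bear's date parsing code, and
the subsequence fallback is just one cheap way to catch 'tmrw'.

```python
# Hypothetical vocabulary with made-up prior weights (not from Chandler).
CANDIDATES = {
    "tomorrow": 0.9,
    "Tuesday": 0.6,
    "Thursday": 0.5,
    "next week": 0.7,
    "next Wednesday": 0.6,
}


def is_subsequence(short, text):
    """True if the characters of `short` appear in `text` in the same order."""
    remaining = iter(text)
    return all(ch in remaining for ch in short)


def complete(typed, candidates=CANDIDATES):
    """Return (phrase, weight) pairs for the typed text, best match first."""
    typed = typed.strip().lower()
    if not typed:
        return []
    # First choice: plain prefix matches, at full weight.
    matches = [(phrase, weight) for phrase, weight in candidates.items()
               if phrase.lower().startswith(typed)]
    if not matches:
        # Fuzzy fallback: abbreviations such as 'tmrw', at half weight.
        matches = [(phrase, weight * 0.5) for phrase, weight in candidates.items()
                   if is_subsequence(typed, phrase.lower())]
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(matches, key=lambda m: (-m[1], m[0]))


for typed in ("t", "next w", "tmrw"):
    print(typed, "->", [phrase for phrase, _ in complete(typed)])
```

A real version would presumably get its weights from context (today's
date, the field being edited) rather than from a fixed table.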
>
> 2. Parsing of arguments at a Chandler command line
> ==================================================
>
> A Chandler mini-command line doesn't yet exist, but it's planned to
> make quick data entry painless.
>
> Examples:
>
> /event dinner with Tom at Millenium at 7
> /task give Alicia her book back
>
> Here there's somewhat less context than in area 1. The arguments after
> a command could include information for a variety of fields. Thus,
> handling English grammar becomes important; regular expressions seem
> unlikely to reliably parse such examples.
>
> Fortunately, toolkits like http://nltk.sourceforge.net/ may be able to
> help. They may be dramatically slower than regular expressions, so
> determining relative processing costs should be part of Darshana's
> work this summer.
The whole command line approach seems wrong to me. Agenda and
Apple's Newton, both of which ran on significantly less capable
hardware than we have today, were able to do recognition of people,
dates, etc. without the need for a semantic hint such as /event or
/task. It might be that the first cut of recognition needs those
hints to be implementable, but I think that a longer-range goal
should be to reduce the need for these kinds of hints.
>
> 3. Parsing of emails and instant messages
> =========================================
>
> If a user receives an email with the sentence:
>
> 'Please join us at Asha December 10 at 7PM', it would be great to
> offer an option to intelligently stamp the email as an event, with
> location set to Asha and start time set appropriately, with year
> inferred from the date the email was sent.
>
> Fields might be populated automatically, or perhaps there would be UI
> to view how fields might be populated.
>
This feeds directly into what I said above about the command line.
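On the "year inferred from the date the email was sent" part of the
quoted example, one plausible rule is to pick the first occurrence of
the month and day on or after the sent date. That rule and the function
name infer_event_time are my assumptions, not anything already decided;
a rough sketch:

```python
from datetime import datetime


def infer_event_time(month, day, hour, sent):
    """Guess the year for a month/day mentioned in an email sent at `sent`.

    Assumed rule: use the sent year unless that would put the event
    before the day the email was sent, in which case use the next year.
    (A bare "February 29" in a non-leap year is not handled here.)
    """
    candidate = datetime(sent.year, month, day, hour)
    if candidate.date() < sent.date():
        candidate = candidate.replace(year=sent.year + 1)
    return candidate


# "Please join us at Asha December 10 at 7PM", sent in late November.
sent = datetime(2006, 11, 28, 9, 30)
print(infer_event_time(12, 10, 19, sent))
```

An invitation mentioning a date that has just passed would be guessed
as next year under this rule, which is one reason a UI for reviewing
the populated fields seems worth having.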
>
> First steps
> ===========
>
> Area 1) (in-field parsing) would be useful now. Area 2) (arguments to
> commands) would be useful in the mid-term, before 1.0 ships. Area 3)
> (parsing of incoming streams) might be useful at any time, but speed
> and UI issues would need to be thought through, so it's not a priority
> to make this happen before 1.0 ships.
>
> When Darshana arrives in the office, her starting point will be to work
> with the design team on specific area 1) examples to parse, and
> experiment with Bear's date parsing code to see if it can be made to
> solve those examples.
I wouldn't feel bound by the NLP toolkits. Also, we ought to be
considering how to query the domain model in order to help
disambiguate parts of the recognition steps.
>
> Architecture
> ============
>
> We're consciously focusing on US English parsing, but we should create
> a framework that allows different "parsing resources" to be used based
> on locale.
>
> While different parsing resources (associated with different fields)
> may have radically different implementations, it seems useful to come
> up with a common API for them. A first cut at requirements:
>
> - A parsing resource should have a method to take a unicode string and
> return a list of possible interpretations for the fields it
> understands, with associated confidences.
You might also provide a set of hints as an argument to the parsing
resource (recognizer). That would allow the application to prime the
recognizer and say "the result should be a time", for the case where
the recognizer is invoked inside a detail view (or other "typed" field).
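To illustrate what I mean by hints, here is a rough sketch of a
recognizer that takes a string plus an optional hints dictionary and
returns interpretations with confidences. Everything in it is
hypothetical: the Interpretation and HourRecognizer names, the
"expected" hint key, the "startTime" field name, and the confidence
numbers.

```python
import re
from dataclasses import dataclass
from datetime import time


@dataclass
class Interpretation:
    field: str         # which field this value would populate
    value: object      # the parsed value
    confidence: float  # 0.0 (wild guess) to 1.0 (certain)


class HourRecognizer:
    """Toy recognizer that only understands an hour such as '5' or '7pm'."""

    _pattern = re.compile(r"^\s*(\d{1,2})\s*(am|pm)?\s*$", re.IGNORECASE)

    def parse(self, text, hints=None):
        hints = hints or {}
        match = self._pattern.match(text)
        if not match:
            return []
        hour = int(match.group(1))
        suffix = (match.group(2) or "").lower()
        if not 1 <= hour <= 12:
            return []
        if suffix:
            # Explicit am/pm: a single, confident interpretation.
            hour24 = hour % 12 + (12 if suffix == "pm" else 0)
            return [Interpretation("startTime", time(hour24), 0.9)]
        # A bare number is ambiguous. The hint tells us the caller is a
        # time field, so we can be far more confident it is a time at all.
        base = 0.8 if hints.get("expected") == "time" else 0.3
        return [
            Interpretation("startTime", time(hour % 12 + 12), base),    # PM
            Interpretation("startTime", time(hour % 12), base * 0.5),   # AM
        ]


recognizer = HourRecognizer()
for result in recognizer.parse("5", hints={"expected": "time"}):
    print(result)
```

The same call without hints returns the same two values at lower
confidence, which is what you would want when the text comes from a
free-form source rather than a typed field.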
>
> - Parsing resources should be registered so the most appropriate
> parser can be used for a particular combination of understood fields
> (e.g. start time, duration, location, person).
Are you imagining something like "given an expected kind, give me the
set of all recognizers that might be applicable, in priority order"?
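In code, the kind of lookup I'm asking about might look roughly like
this. RecognizerRegistry, register and for_kind are names I made up for
the sketch, the priorities are arbitrary, and plain strings stand in
for real recognizer objects.

```python
class RecognizerRegistry:
    """Maps kinds (e.g. "time", "person") to recognizers, by priority."""

    def __init__(self):
        self._entries = []  # list of (priority, kinds, recognizer)

    def register(self, recognizer, kinds, priority=0):
        """Record that `recognizer` understands each kind in `kinds`."""
        self._entries.append((priority, frozenset(kinds), recognizer))

    def for_kind(self, kind):
        """Return all recognizers applicable to `kind`, highest priority first."""
        hits = [(priority, recognizer)
                for priority, kinds, recognizer in self._entries
                if kind in kinds]
        # list.sort is stable, so equal priorities keep registration order.
        hits.sort(key=lambda hit: -hit[0])
        return [recognizer for _, recognizer in hits]


registry = RecognizerRegistry()
registry.register("regex-date", ["date", "time"], priority=10)
registry.register("general-nlp", ["date", "time", "location", "person"], priority=1)
registry.register("contacts-lookup", ["person"], priority=5)

print(registry.for_kind("time"))
print(registry.for_kind("person"))
```

Handling a combination of fields, as in the quoted requirement, would
need something more than this single-kind lookup.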
Ted