[Dev] Index issues
Reid Ellis
rae at osafoundation.org
Wed Jan 25 09:25:28 PST 2006
Most apps that deal with this have a "default encoding" preference
which the user can set to whatever they like, since they might know
what encoding most of their email is in. I assume that Chandler's
locale is derived from the OS's locale?
Reid
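
A minimal sketch of that kind of fallback, assuming a hypothetical
decode_body() helper (not Chandler code) and a user-set default that is
tried after whatever charset the message declares:

```python
import locale


def decode_body(raw, declared=None, user_default=None):
    """Decode message bytes, trying the most specific encoding first.

    decode_body, declared and user_default are illustrative names only.
    Returns a (text, encoding_used) tuple.
    """
    # Order: charset from the message, the user's preference, the OS default.
    candidates = [declared, user_default, locale.getpreferredencoding(False)]
    for name in candidates:
        if not name:
            continue
        try:
            return raw.decode(name), name
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that do not fit it: try the next.
            continue
    # Last resort: latin-1 maps every byte value, so this cannot fail.
    return raw.decode("latin-1"), "latin-1"
```

The OS default comes third here only because it is the least specific
of the three; whether Chandler should order them that way is an open
question.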
On Tue Jan 24 2006, at 21:21, Andi Vajda wrote:
> On Tue, 24 Jan 2006, Brian Kirsch wrote:
>> Andi,
>> What do you recommend doing in the case where a locale is not known
>> for the text?
>>
>> Email is a great example: in most cases no language (locale)
>> headers are supplied.
>
> When no locale is supplied, the encoding supplied could be used as a
> clue by a set of heuristics that help 'guess' a locale. In the case
> of email, for example, the domain of the sender may also provide a
> clue.
> That guess may be better than nothing but not by much...
>
> A good guess at this is important for full text indexing.
>
> When sorting email addresses, however, I'd think that the Chandler
> user's locale would prevail over the potential locale of the data
> being sorted.
>
> Andi..
>
>>
>> -Brian
>>
>> Brian Kirsch - Email Framework Engineer
>> Open Source Applications Foundation
>> 543 Howard St. 5th Floor
>> San Francisco, CA 94105
>> (415) 946-3056
>> http://www.osafoundation.org
>>
>>
>>
>> Andi Vajda wrote:
>>> On Tue, 24 Jan 2006, Brian Kirsch wrote:
>>>> One issue to remember: if we are sorting on the name of the user,
>>>> e.g. Brian Kirsch <bkirsch at osafoundation.org>, then the sort
>>>> order will need to be localized with PyICU.
>>> Last year, I added a new index class called StringIndex. It
>>> understands locale and uses PyICU's collator support for
>>> comparing strings.
>>> Similarly, I realized recently that for full text indexing's
>>> sake, LOBs (at least, if not all attributes) should also have a
>>> locale aspect so that when full text indexing (and queries) are
>>> run, an analyzer that is appropriate for the language of the
>>> locale is used to break up the text (or queries) into tokens.
>>> Andi..
>>
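
The guessing heuristics Andi describes (encoding first, then the
sender's domain, then the user's own locale) could look roughly like
the sketch below. The guess_locale() name and both lookup tables are
hypothetical and far from complete, and as noted in the thread, the
result is only slightly better than nothing:

```python
# Hypothetical hint tables; real data would need to be much more complete.
CHARSET_HINTS = {
    "iso-2022-jp": "ja_JP",
    "shift_jis": "ja_JP",
    "euc-kr": "ko_KR",
    "koi8-r": "ru_RU",
    "big5": "zh_TW",
    "gb2312": "zh_CN",
}
DOMAIN_HINTS = {"de": "de_DE", "fr": "fr_FR", "jp": "ja_JP", "ru": "ru_RU"}


def guess_locale(charset=None, sender=None, user_locale="en_US"):
    """Return a best-guess locale name for a message (illustrative only)."""
    # 1. Some encodings are used almost exclusively for one language.
    if charset:
        hint = CHARSET_HINTS.get(charset.strip().lower())
        if hint:
            return hint
    # 2. A country-code top-level domain in the sender address is a weak clue.
    if sender and "@" in sender:
        domain = sender.rsplit("@", 1)[1]
        tld = domain.rsplit(".", 1)[-1].strip("> ").lower()
        hint = DOMAIN_HINTS.get(tld)
        if hint:
            return hint
    # 3. Otherwise fall back to the user's own locale.
    return user_locale
```

For example, guess_locale("utf-8", "hans@example.de") yields "de_DE",
while a utf-8 message from a .org address falls through to the user's
locale.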