[Dev] Index issues

Reid Ellis rae at osafoundation.org
Wed Jan 25 09:25:28 PST 2006


Most apps that deal with this have a "default encoding" preference  
which the user can set to whatever they like, since they might know  
what encoding most of their email is in. I assume that Chandler's  
locale is derived from the OS's locale?
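
As a rough sketch of what I mean by a "default encoding" preference (not Chandler code; `decode_message`, `declared` and `user_default` are names made up for illustration, and falling back to the OS locale's encoding is my assumption):

```python
import locale


def decode_message(raw, declared=None, user_default=None):
    """Decode raw bytes, trying candidate encodings in order.

    Order: the charset declared by the message (if any), the user's
    "default encoding" preference, the OS locale's preferred encoding,
    then UTF-8. Returns (text, encoding_used).
    """
    candidates = [
        declared,
        user_default,
        locale.getpreferredencoding(False),  # assumption: OS locale as fallback
        "utf-8",
    ]
    for enc in candidates:
        if not enc:
            continue
        try:
            return raw.decode(enc), enc
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or bytes invalid in it: try the next one.
            continue
    # Last resort: never fail, mark undecodable bytes instead.
    return raw.decode("utf-8", errors="replace"), "utf-8"


text, used = decode_message(b"caf\xe9", declared="utf-8", user_default="latin-1")
```

Here the declared charset is wrong for the bytes, so the user's preference ends up being the one that is used.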

Reid

On Tue Jan 24 2006, at 21:21, Andi Vajda wrote:
> On Tue, 24 Jan 2006, Brian Kirsch wrote:
>> Andi,
>> What do you recommend doing in the case where a locale is not known  
>> for the text?
>>
>> Email is a great example: in most cases no language (locale)  
>> headers are supplied.
>
> When no locale is supplied, the encoding that was supplied could be  
> used as a clue by a set of heuristics that help 'guess' a locale. In  
> the case of email, for example, the domain of the sender may also  
> provide a clue.
> That guess may be better than nothing, but not by much...
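
A minimal sketch of such a heuristic, assuming two small hand-made lookup tables (`CHARSET_HINTS`, `TLD_HINTS`) and a hypothetical `guess_locale` helper; none of this is Chandler code, and the tables are illustrative only:

```python
from email.utils import parseaddr

# Illustrative hints only: charsets that are used almost exclusively
# for a single language.
CHARSET_HINTS = {
    "iso-2022-jp": "ja",
    "shift_jis": "ja",
    "euc-jp": "ja",
    "euc-kr": "ko",
    "koi8-r": "ru",
}

# Illustrative hints only: country-code domains mapped to a language.
TLD_HINTS = {"jp": "ja", "kr": "ko", "ru": "ru", "de": "de", "fr": "fr"}


def guess_locale(charset=None, sender=None, default="en"):
    """Guess a language code from the charset, then the sender's domain.

    Falls back to `default` (e.g. the user's own locale) when neither
    clue says anything useful.
    """
    if charset:
        hint = CHARSET_HINTS.get(charset.strip().lower())
        if hint:
            return hint
    if sender:
        _, address = parseaddr(sender)
        if "@" in address:
            tld = address.rsplit("@", 1)[1].rsplit(".", 1)[-1].lower()
            hint = TLD_HINTS.get(tld)
            if hint:
                return hint
    return default


guess = guess_locale("utf-8", "Someone <someone@example.de>")
```

A generic charset such as UTF-8 carries no language information, so in that call only the sender's domain contributes to the guess.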
>
> A good guess at this is important for full text indexing.
>
> When sorting email addresses, however, I'd think that the Chandler  
> user's locale would prevail over the potential locale of the data  
> being sorted.
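
For illustration, sorting under the user's locale with a PyICU collator might look roughly like this. `make_sort_key` is a hypothetical helper, not the StringIndex implementation, and the `str.casefold` fallback used when PyICU is not installed is an assumption that is not locale-aware:

```python
def make_sort_key(locale_name):
    """Return a key function for sorting strings under `locale_name`."""
    try:
        from icu import Collator, Locale  # PyICU, optional dependency
    except ImportError:
        # Assumption: without PyICU, fall back to a case-insensitive,
        # locale-unaware ordering.
        return str.casefold
    collator = Collator.createInstance(Locale(locale_name))
    # getSortKey returns bytes that compare in collation order.
    return collator.getSortKey


names = ["Zoe", "andi", "Brian"]
# The user's locale is passed in, regardless of where the data came from.
ordered = sorted(names, key=make_sort_key("en_US"))
```

A plain `sorted(names)` would put the capitalized names first, since it compares code points rather than applying collation rules.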
>
> Andi..
>
>>
>> -Brian
>>
>> Brian Kirsch - Email Framework Engineer
>> Open Source Applications Foundation
>> 543 Howard St. 5th Floor
>> San Francisco, CA 94105
>> (415) 946-3056
>> http://www.osafoundation.org
>>
>>
>>
>> Andi Vajda wrote:
>>> On Tue, 24 Jan 2006, Brian Kirsch wrote:
>>>> One issue to remember: if we are sorting on the name of the user,  
>>>> e.g. Brian Kirsch <bkirsch at osafoundation.org>, then the sort  
>>>> order will need to be localized with PyICU.
>>> Last year, I added a new index class called StringIndex. It  
>>> understands locale and uses PyICU's collator support for  
>>> comparing strings.
>>> Similarly, I realized recently that for full text indexing's  
>>> sake, LOBs (at least, if not all attributes) should also have a  
>>> locale aspect so that when full text indexing (and queries) are  
>>> run, an analyzer that is appropriate for the language of the  
>>> locale is used to break up the text (or queries) into tokens.
>>> Andi..
>>


