[Dev] Index issues
vajda at osafoundation.org
Wed Jan 25 09:43:25 PST 2006
On Wed, 25 Jan 2006, Reid Ellis wrote:
> Most apps that deal with this have a "default encoding" preference which the
> user can set to whatever they like, since they might know what encoding most
> of their email is in. I assume that Chandler's locale is derived from the
> OS's locale?
Yes, Chandler's default locale is derived from the OS or can be set with the
--locale command line flag. Defaulting in such a way works best when 'most'
data is from the same locale and text in the same language.
A US user working with personal data and international projects can go a long
way with en_US. On the other hand, a Spanish user may have a mix of
Spanish data (for example, personal email) and English data (for example,
email from open source project mailing lists), and having some way of
specifying the language of the text being worked with can help in
indexing and searching it better later.
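
[Editorial sketch] The fallback order described above (an explicit --locale
flag, then the OS locale, with an optional per-text language taking priority
for indexing) might look roughly like the following. This is a minimal
illustration and not Chandler's actual startup code: the function names, the
"en_US" last-resort fallback, and the per-text override parameter are all
assumptions.

```python
import argparse
import locale


def resolve_default_locale(argv, fallback="en_US"):
    """Return the --locale value if given, else the OS locale, else fallback.

    `fallback` is an assumption for this sketch, not documented behavior.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--locale", default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.locale:
        return args.locale
    try:
        # getlocale() returns (language_code, encoding); either may be None.
        os_language, _encoding = locale.getlocale()
    except ValueError:
        # Some platforms report locale strings Python cannot parse.
        os_language = None
    return os_language or fallback


def locale_for_text(text_locale, default_locale):
    """Prefer a language specified for the text itself over the default."""
    return text_locale or default_locale


if __name__ == "__main__":
    default = resolve_default_locale(["--locale", "es_ES"])
    # English mailing-list mail read by a user whose default is Spanish:
    print(locale_for_text("en_US", default))
    # Personal mail with no language specified falls back to the default:
    print(locale_for_text(None, default))
```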
> On Tue Jan 24 2006, at 21:21, Andi Vajda wrote:
>> On Tue, 24 Jan 2006, Brian Kirsch wrote:
>>> What do you recommend doing in the case where a locale is not known for the
>>> Email is a great example, in most cases no language (locale) headers are
>> When no locale is supplied, the encoding supplied could be used as a clue,
>> with a set of heuristics helping to 'guess' a locale. In the case of
>> email, for example, the domain of the sender may also provide a clue.
>> That guess may be better than nothing but not by much...
>> A good guess at this is important for full text indexing.
>> When sorting email addresses, however, I'd think that the Chandler user's
>> locale would prevail over the potential locale of the data being sorted.
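
[Editorial sketch] The heuristics mentioned above (using the declared encoding
and the sender's domain as clues) could be prototyped along these lines. The
lookup tables, the function name, and the order in which the clues are tried
are hypothetical and chosen only for illustration; as the message says, such a
guess may not be much better than nothing.

```python
from email.utils import parseaddr

# Hypothetical hint tables; a real mapping would be larger and more nuanced.
CHARSET_HINTS = {
    "iso-2022-jp": "ja_JP",
    "shift_jis": "ja_JP",
    "euc-kr": "ko_KR",
    "big5": "zh_TW",
    "koi8-r": "ru_RU",
}
TLD_HINTS = {
    "es": "es_ES",
    "fr": "fr_FR",
    "de": "de_DE",
    "jp": "ja_JP",
}


def guess_locale(charset, sender, default_locale):
    """Guess a locale from the charset, then the sender's domain.

    Falls back to default_locale when neither clue is informative, which is
    the common case for encodings such as utf-8 and domains such as .org.
    """
    if charset:
        hint = CHARSET_HINTS.get(charset.strip().lower())
        if hint:
            return hint
    _name, address = parseaddr(sender or "")
    if "@" in address:
        domain = address.rsplit("@", 1)[1]
        tld = domain.rsplit(".", 1)[-1].lower()
        hint = TLD_HINTS.get(tld)
        if hint:
            return hint
    return default_locale


if __name__ == "__main__":
    print(guess_locale("ISO-2022-JP", "someone@example.com", "en_US"))
    print(guess_locale("utf-8", "Ana <ana@example.es>", "en_US"))
    print(guess_locale(None, "dev@example.org", "en_US"))
```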
>>> Brian Kirsch - Email Framework Engineer
>>> Open Source Applications Foundation
>>> 543 Howard St. 5th Floor
>>> San Francisco, CA 94105
>>> (415) 946-3056
>>> Andi Vajda wrote:
>>>> On Tue, 24 Jan 2006, Brian Kirsch wrote:
>>>>> One issue to remember, if we are sorting on the name of the user i.e.
>>>>> Brian Kirsch <bkirsch at osafoundation.org> then the sort order will need
>>>>> to be localized with PyICU.
>>>> Last year, I added a new index class called StringIndex. It understands
>>>> locale and uses PyICU's collator support for comparing strings.
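
[Editorial sketch] Locale-aware comparison with PyICU's collator, as described
above, can be illustrated as follows. This is not the StringIndex
implementation; the helper names are hypothetical, and the case-folding
fallback is an assumption added only so that the sketch still runs where PyICU
is not installed.

```python
try:
    from icu import Collator, Locale  # PyICU, if available
except ImportError:
    Collator = None


def sort_key_for(locale_name):
    """Return a key function that orders strings for the given locale."""
    if Collator is not None:
        collator = Collator.createInstance(Locale(locale_name))
        return collator.getSortKey
    # Assumption: without PyICU, approximate with case-insensitive ordering.
    return str.casefold


def sort_names(names, locale_name="en_US"):
    """Sort display names using the user's locale rather than code points."""
    return sorted(names, key=sort_key_for(locale_name))


if __name__ == "__main__":
    # A plain sorted() would put "Zoe" before "adam" because of code points.
    print(sort_names(["Zoe", "adam", "Brian"]))
```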
>>>> Similarly, I realized recently that for full text indexing's sake, LOBs
>>>> (at least, if not all attributes) should also have a locale aspect so
>>>> that when full text indexing (and queries) are run, an analyzer that is
>>>> appropriate for the language of the locale is used to break up the text
>>>> (or queries) into tokens.
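
[Editorial sketch] The idea of choosing an analyzer by the locale attached to
the text, and applying the same analyzer to both indexed text and queries, can
be shown with a toy tokenizer. The stop-word lists and function name are
hypothetical, and this stands in for a real full text analyzer rather than
reproducing one.

```python
import re

# Hypothetical, deliberately tiny stop-word lists keyed by language code.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "to"},
    "es": {"el", "la", "los", "las", "y", "de"},
}


def analyze(text, locale_name):
    """Break text into lowercase tokens, dropping the locale's stop words."""
    language = locale_name.split("_", 1)[0].lower()
    stop_words = STOP_WORDS.get(language, set())
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    # The same function is applied at index time and at query time.
    print(analyze("The index of the repository", "en_US"))
    print(analyze("el correo de la lista", "es_ES"))
```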