[Dev] Re: new schema type Python syntax

Brian Kirsch bkirsch at osafoundation.org
Thu Jul 28 14:52:00 PDT 2005




Phillip J. Eby wrote:

> At 10:59 AM 7/28/2005 -1000, Brian Kirsch wrote:
>
>> Phillip J. Eby wrote:
>>
>>> At 10:06 AM 7/28/2005 -1000, Brian Kirsch wrote:
>>>
>>>> I am not sure. Using the Text alias is nice since it makes things 
>>>> simple. I am open to feedback / suggestions on this one.
>>>
>>>
>>>
>>> The parcel issues should go away soon; John has now joined the 
>>> "let's use Python instead of parcel.xml" movement, and I don't think 
>>> there are many other people against it.  When parcel.xml goes away 
>>> then the file/line issue goes away too, since _("text") can set the 
>>> parcel (by looking at the __name__ of the module it is called from).
>>
>>
>> Great. That makes things much easier.
>>
>>
>>> Also, it incidentally means that all localizable strings will be 
>>> extractable from the Python source.
>>
>>
>> Can you go in to more detail on this?
>>
>> What would the advantage be of extracting all LocalizableStrings from 
>> Python source since we still want to leverage the repository for 
>> LocalizableString storage.
>
>
> Well, since all the strings will *be* in the source, it would be the 
> obvious place to extract them from, since we'd need to for 
> runtime-only messages anyway.  So extracting them all from source 
> guarantees none will be missed, and avoids any redundancy.
>
> Whether you then *put* the strings you extracted into the repository 
> is entirely orthogonal.  I'm just suggesting that extracting them all 
> direct from the source is a convenient way to get them for whatever 
> other processing you want to do, at least once we're not using 
> parcel.xml any more.
>
> It's fairly easy to extract the strings using the Python 
> "tokenize.generate_tokens(input_file.readline)" function.  Just loop 
> over the tokens looking for '_' followed by '(', one or more 
> string/unicode tokens, and ')'.  Part of the data that's produced by 
> the function is line/column start+end info for each token, so that's 
> about everything you need.
>

Ok, I buy that. What's nice about that too is that if all LocalizableStrings 
are wrapped in _() then it is easy to tell in a schema definition whether 
the assignment is a UString or a LocalizableString, i.e.:

Schema.One(Schema.Text, initialValue = u"This is a UString")

Schema.One(Schema.Text, initialValue = _(u"This is a Localizable String"))
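
For reference, the token loop Phillip describes might look roughly like 
the sketch below. It is only an illustration under a few assumptions: the 
function name extract_messages and the sample text are made up here, only 
the plain form _("...") is recognized, and it is written for a current 
Python 3 interpreter. It uses the two schema lines above as input, so only 
the _()-wrapped string should be picked up.

```python
import ast
import io
import tokenize

# Token types that only describe layout; skipping them lets a call span lines.
LAYOUT = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
          tokenize.INDENT, tokenize.DEDENT)


def extract_messages(source):
    """Return (line, text) pairs for every _("...") call found in source."""
    readline = io.StringIO(source).readline
    tokens = [tok for tok in tokenize.generate_tokens(readline)
              if tok.type not in LAYOUT]
    found = []
    i = 0
    while i < len(tokens) - 2:
        tok = tokens[i]
        is_call = (tok.type == tokenize.NAME and tok.string == "_"
                   and tokens[i + 1].type == tokenize.OP
                   and tokens[i + 1].string == "(")
        if not is_call:
            i += 1
            continue
        # Collect the adjacent string literals that follow the "(".
        j = i + 2
        parts = []
        while j < len(tokens) and tokens[j].type == tokenize.STRING:
            parts.append(ast.literal_eval(tokens[j].string))
            j += 1
        # Accept only _( "..." ) with nothing but string literals inside.
        if parts and j < len(tokens) and tokens[j].string == ")":
            found.append((tok.start[0], "".join(parts)))
        i = j
    return found


sample = '''\
Schema.One(Schema.Text, initialValue = u"This is a UString")
Schema.One(Schema.Text, initialValue = _(u"This is a Localizable String"))
'''

for line, text in extract_messages(sample):
    print(line, text)
```

Each token also carries its start and end (row, column) positions, so the 
file/line information for a message comes along for free.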

Given all the changes to the parcel / gettext lookup strategy I think it 
is worth reinvestigating how a message would be registered globally such 
that any parcel could access it. For example "Can not connect to Server" 
is generic and should exist as a root message which any parcel can utilize.

Since _() uses the module (aka parcel) path to find where the 
translation file is located, how would a parcel reference a root-level 
message? The two ideas I can think of off the top of my head are:

1. Create all definitions at the parcel root and provide Python variable 
references to the translations. This is basically what is currently 
proposed in the i18n spec.

So the root level of the parcel hierarchy would have a definition such as:

CANNOT_CONNECT_TO_SERVER = _(u"Unable to connect to server")


A parcel would import the root messages, i.e.:

import messages

alert(messages.CANNOT_CONNECT_TO_SERVER)


2. Add an additional argument to _() that adds the message to the global 
lookup space. I can then define global messages from any parcel.

In my parcel I would have something like this (the keyword is spelled 
isGlobal because global is a reserved word in Python):

CANNOT_CONNECT_TO_SERVER = _(u"Unable to connect to server", isGlobal=True)


Each approach has advantages and disadvantages.

The first approach has the advantage of only defining the translation 
once. This protects parcel developers from typos and case mismatches: 
"Unable to connect" and "unable to connect" are treated as two distinct 
keys by gettext. The first approach has the disadvantage that global 
definitions can only be made at the parcel root. This is a bit 
inflexible.


The second approach has the advantage that any parcel can define globals. 
The disadvantage is that the case and typo issues mentioned above come 
into play.

Of course, the whole reason for having a _() creation shortcut was to 
spare developers the burden of having to define simple info and 
error messages in parcel.xml.

If parcel.xml is going away then do we even need _()? It is simple to 
define a LocalizableString in Python.

From schema:
cantConnect = Schema.One(Schema.Text,
                         initialValue=u"Unable to connect to server")

Instance creation:
dnsError = LocalizableString(u"No host found matching that name")


The LocalizableString is the one doing the translation lookup in its 
__unicode__ method.

def __unicode__(self):
    return I18nManager.translate(self.itsPath, self.defaultText)
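
A self-contained sketch of that idea follows. The I18nManager here is a 
hypothetical stand-in backed by a plain dict, the default itsPath value 
and the "[fr]" translation are invented for the example, and the __str__ 
alias is only there so the sketch also runs on Python 3, which has no 
__unicode__ hook.

```python
class I18nManager(object):
    """Hypothetical stand-in: maps (path, default text) to a translation."""
    translations = {}

    @classmethod
    def translate(cls, path, defaultText):
        # Fall back to the default text when no translation is registered.
        return cls.translations.get((path, defaultText), defaultText)


class LocalizableString(object):
    def __init__(self, defaultText, itsPath="//parcels/example"):
        self.defaultText = defaultText
        self.itsPath = itsPath  # assumed to be a simple string path here

    def __unicode__(self):
        # The lookup happens at conversion time, not at creation time.
        return I18nManager.translate(self.itsPath, self.defaultText)

    # Python 3 calls __str__ where Python 2 called __unicode__.
    __str__ = __unicode__


dnsError = LocalizableString(u"No host found matching that name")
print(str(dnsError))  # no translation registered yet: default text

key = (dnsError.itsPath, dnsError.defaultText)
I18nManager.translations[key] = u"[fr] No host found matching that name"
print(str(dnsError))  # same object, now resolved through the manager
```

Because the lookup is deferred until the string is rendered, the same 
instance picks up a translation that is registered after it was created.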


Thoughts?



