[pyicu-dev] Subclassing Transliterators

Andi Vajda vajda at osafoundation.org
Mon Mar 15 12:11:41 PDT 2010


On Sat, 13 Mar 2010, Christoph Burgmer wrote:

> On Thursday, 11 March 2010, Andi Vajda wrote:
> [...]
>> If you succeed at this, let me know and I'll include your code if you
>> can donate it. Otherwise, I'll take a look at this sometime.
>
> I pushed my first changes to Github [1]. It is now possible to subclass
> Transliterator and implement handleTransliterate(). Get back to me if you
> prefer a diff.
>
> Here is an example:
>
> >>> import PyICU
> >>> class VowelSubst(PyICU.Transliterator):
> ...     def __init__(self, char='i'):
> ...         PyICU.Transliterator.__init__(self, 'My_ID')
> ...         self.char = char
> ...     def handleTransliterate(self, text, pos, incremental):
> ...         for i in range(pos.start, pos.limit):
> ...             if text[i] in u"aeiouüöä":
> ...                 text[i:i+1] = self.char
> ...         pos.start = pos.limit
> ...
> >>> m = VowelSubst()
> >>> m.transliterate(u"Drei Chinesen mit dem Kontrabass")
> u'Drii Chinisin mit dim Kintribiss'

Excellent. I integrated your changes into svn rev 108 of PyICU.
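
For readers without PyICU at hand, the loop in the quoted example can be
approximated in plain Python. This is only a sketch: Position and
vowel_subst are hypothetical names, a list of characters stands in for the
mutable UnicodeString, and the limit adjustment for replacements of a
different length is an assumption about what a subclass would need to do,
not something taken from the PyICU sources.

```python
# Pure-Python stand-in; Position and vowel_subst are hypothetical names,
# not part of PyICU.
class Position:
    """Mimics the start/limit fields used in the quoted example."""

    def __init__(self, start, limit):
        self.start = start
        self.limit = limit


def vowel_subst(text, pos, char="i"):
    """Replace each vowel in text[pos.start:pos.limit] with char, in place.

    text is a list of characters standing in for a mutable UnicodeString.
    """
    i = pos.start
    while i < pos.limit:
        if text[i] in "aeiouüöä":
            text[i:i + 1] = list(char)
            # A replacement of a different length shifts the end of the range.
            pos.limit += len(char) - 1
            i += len(char)
        else:
            i += 1
    # Mark the whole range as processed, as the quoted example does.
    pos.start = pos.limit


buf = list("Drei Chinesen mit dem Kontrabass")
pos = Position(0, len(buf))
vowel_subst(buf, pos)
result = "".join(buf)  # 'Drii Chinisin mit dim Kintribiss'
```

With a single-character replacement this behaves like the quoted loop; the
limit bookkeeping only matters when char is longer or shorter than one
character.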

> Filters are currently not supported - another class needs to be exposed. Also,
> a method for registering custom classes is still missing.

UnicodeFilter objects are supported. I added support for that a couple of 
weeks ago when I added support for UnicodeSet. I added the UnicodeFilter 
method variants to the Transliterator wrapper implementation.

> I'd like to add Exception handling to handleTransliterate() but am unclear
> how to do this in a clean way. Currently Exceptions raised are not transported
> through the C layer.

I don't know what the proper way to return an exception to a PyICU user is
either. Currently - and this is not so good - I clear the error and make the
handleTransliterate method do nothing, so it fails completely silently.
This has to be rethought a bit.
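
One possible direction, sketched here in pure Python rather than in the
actual wrapper code, is to remember the exception raised inside the callback
and raise it again once control is back on the Python side. All names below
(c_layer_call, ExceptionRelay, transliterate, broken) are hypothetical;
c_layer_call merely simulates a native layer that clears errors and is not
how PyICU is implemented.

```python
# Hypothetical sketch: c_layer_call stands in for native code that cannot
# propagate Python exceptions; none of these names are PyICU APIs.
def c_layer_call(callback, text):
    """Simulate a C layer that swallows any exception from the callback."""
    try:
        return callback(text)
    except Exception:
        return text  # the error is cleared and the text is left unchanged


class ExceptionRelay:
    """Wrap a callback, remember its exception, and re-raise it later."""

    def __init__(self, func):
        self.func = func
        self.error = None

    def __call__(self, text):
        try:
            return self.func(text)
        except Exception as exc:
            self.error = exc  # stash it for the Python-side caller
            raise  # the simulated C layer still swallows this

    def reraise(self):
        """Raise the stored exception, if any, once back in Python."""
        if self.error is not None:
            error, self.error = self.error, None
            raise error


def transliterate(callback, text):
    """Run callback through the simulated C layer, surfacing its errors."""
    relay = ExceptionRelay(callback)
    result = c_layer_call(relay, text)
    relay.reraise()
    return result


def broken(text):
    """A callback that always fails, used to exercise the relay."""
    raise ValueError("cannot handle %r" % text)
```

Here transliterate(str.upper, "abc") returns the transformed text, while
transliterate(broken, "abc") raises the ValueError instead of failing
silently.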

> I'd be happy if you could go through my changes and double check. My C skills
> are kind of rusty.

I took your code one chunk at a time and put it into transliterator.h and 
transliterator.cpp. Overall, I didn't change much of it.

> On a side note, deleting a character from a UnicodeString makes Python quit
> with a memory access error. This could maybe be handled gracefully.

There was a bug in the method handling the assignment of one character into 
a UnicodeString. I fixed it.

I also added your example into test_Transliterator.py

Thank you very much for your contribution!

Andi..

