[pyicu-dev] Subclassing Transliterators
Andi Vajda
vajda at osafoundation.org
Mon Mar 15 12:11:41 PDT 2010
On Sat, 13 Mar 2010, Christoph Burgmer wrote:
> On Thursday, 11 March 2010, Andi Vajda wrote:
> [...]
>> If you succeed at this, let me know and I'll include your code if you
>> can donate it. Otherwise, I'll take a look at this sometime.
>
> I pushed my first changes to GitHub [1]. It is now possible to subclass
> Transliterator and implement handleTransliterate(). Get back to me if you
> prefer a diff.
>
> Here is an example:
>
> >>> import PyICU
> >>> class VowelSubst(PyICU.Transliterator):
> ...     def __init__(self, char='i'):
> ...         PyICU.Transliterator.__init__(self, 'My_ID')
> ...         self.char = char
> ...     def handleTransliterate(self, text, pos, incremental):
> ...         for i in range(pos.start, pos.limit):
> ...             if text[i] in u"aeiouüöä":
> ...                 text[i:i+1] = self.char
> ...         pos.start = pos.limit
> ...
> >>> m = VowelSubst()
> >>> m.transliterate(u"Drei Chinesen mit dem Kontrabass")
> u'Drii Chinisin mit dim Kintribiss'
Excellent. I integrated your changes into svn rev 108 of PyICU.
> Filters are currently not supported; another class needs to be exposed. Also,
> the method for registering one's own classes is still missing.
UnicodeFilter objects are supported. I added support for that a couple of
weeks ago when I added support for UnicodeSet. I added the UnicodeFilter
method variants to the Transliterator wrapper implementation.
> I'd like to add Exception handling to handleTransliterate() but am unclear
> how to do this in a clean way. Currently Exceptions raised are not transported
> through the C layer.
I don't know the proper way to return an exception to a PyICU user
either. Currently, and this is not so good, I clear the error and make the
handleTransliterate method do nothing, so that it fails completely silently.
This has to be rethought a bit.
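One possible way around the silent failure, sketched in plain Python and not
how PyICU handles it: record the exception inside the callback, let the C layer
return normally, then raise the recorded exception once control is back in
Python. All names below (swallowing_c_layer, ErrorCapturing, run_with_errors)
are hypothetical, and swallowing_c_layer only stands in for a C++ caller that
clears the error.

```python
def swallowing_c_layer(callback, *args):
    """Stand-in (assumption) for a C++ caller that clears any Python error."""
    try:
        callback(*args)
    except Exception:
        pass  # the error is dropped, so the caller never sees it


class ErrorCapturing(object):
    """Hypothetical wrapper that records an exception raised in a callback."""

    def __init__(self, func):
        self.func = func
        self.error = None

    def __call__(self, *args):
        try:
            return self.func(*args)
        except Exception as exc:
            # Remember the error instead of letting the C layer discard it.
            self.error = exc

    def reraise(self):
        """Raise the recorded exception, if any, back in Python code."""
        if self.error is not None:
            error, self.error = self.error, None
            raise error


def run_with_errors(func, *args):
    """Call func through the swallowing layer, then surface its error."""
    guarded = ErrorCapturing(func)
    swallowing_c_layer(guarded, *args)
    guarded.reraise()


def failing_callback(text):
    raise ValueError("boom")


try:
    run_with_errors(failing_callback, u"abc")
    outcome = "no error"
except ValueError as exc:
    outcome = "raised: %s" % exc
```

With this pattern the caller sees the ValueError even though the intermediate
layer discarded it; whether something similar fits the PyICU wrapper is an open
question.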
> I'd be happy if you could go through my changes and double check. My C skills
> are kind of rusty.
I took your code one chunk at a time and put it into transliterator.h and
transliterator.cpp. Overall, I didn't change much of it.
> On a side note, deleting a character from a UnicodeString makes Python quit
> with a memory access error. This could maybe be handled gracefully.
There was a bug in the method handling the assignment of one character into
a UnicodeString. I fixed it.
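For illustration, deleting characters through slice assignment can be sketched
with a plain Python list standing in for the UnicodeString. Position and
delete_vowels are hypothetical names, and moving the limit left after each
deletion is an assumption about how a subclass would keep its range consistent,
not something taken from the PyICU sources.

```python
class Position(object):
    """Hypothetical stand-in for the position object given to handleTransliterate."""

    def __init__(self, start, limit):
        self.start = start
        self.limit = limit


def delete_vowels(text, pos):
    """Delete vowels from a mutable sequence between pos.start and pos.limit."""
    i = pos.start
    while i < pos.limit:
        if text[i] in u"aeiou":
            text[i:i + 1] = []  # assigning an empty slice removes one item
            pos.limit -= 1      # the range shrank, so the limit moves left
        else:
            i += 1              # only advance when nothing was removed
    pos.start = pos.limit       # mark the whole range as processed


chars = list(u"Kontrabass")     # a list stands in for the mutable UnicodeString
pos = Position(0, len(chars))
delete_vowels(chars, pos)
result = u"".join(chars)
```

A while loop is used instead of range() because the length of the text changes
while it is being scanned.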
I also added your example into test_Transliterator.py
Thank you very much for your contribution!
Andi..