[pyicu-dev] Bug in Python 4-byte to ICU UChar?

Andi Vajda vajda at osafoundation.org
Wed Nov 30 12:16:07 PST 2005


On Wed, 30 Nov 2005, Jim Fulton wrote:

> Andi Vajda wrote:
>> 
> ...
>> Indeed, this is a bug. Is u_strFromUTF32() compatible with Python's 4 byte 
>> unicode ? If so the fix should be simple. If not, what are the differences 
>> and how are they bridged ?
>
> You're asking me? :)
>
> I'm as confident that 4-byte Python unicode is compatible with UChar32
> as I am that 2-byte Python unicode is compatible with UChar, which is to
> say about 90%. ;)

With that 90% assumption in mind, I made the change you suggested. Since I'm 
not near a Python installation built with 4-byte Unicode (my Mac's is 2-byte), 
could you please try out the attached patch?

Thanks!

Andi..
-------------- next part --------------
Index: common.cpp
===================================================================
--- common.cpp	(revision 44)
+++ common.cpp	(working copy)
@@ -25,7 +25,9 @@
 #include <stdarg.h>
 #include <datetime.h>
 
+#include <unicode/ustring.h>
 
+
 typedef struct {
     UConverterCallbackReason reason;
     char chars[8];
@@ -133,16 +135,16 @@
     else
     {
         int len = string->length();
-	Py_UNICODE *pchars = new Py_UNICODE[len];
+        Py_UNICODE *pchars = new Py_UNICODE[len];
         const UChar *chars = string->getBuffer();
 
         for (int i = 0; i < len; i++)
             pchars[i] = chars[i];
         
-	PyObject *u = PyUnicode_FromUnicode((const Py_UNICODE *) pchars, len);
-	delete pchars;
+        PyObject *u = PyUnicode_FromUnicode((const Py_UNICODE *) pchars, len);
+        delete[] pchars;
 
-	return u;
+        return u;
     }
 }
 
@@ -220,15 +222,23 @@
                          (int32_t) PyUnicode_GET_SIZE(object));
         else
         {
-            int len = PyUnicode_GET_SIZE(object);
+            int32_t len = (int32_t) PyUnicode_GET_SIZE(object);
             Py_UNICODE *pchars = PyUnicode_AS_UNICODE(object);
-	    UChar *chars = new UChar[len];
+            UChar *chars = new UChar[len * 3];
+            UErrorCode status = U_ZERO_ERROR;
+            int32_t dstLen;
 
-            for (int i = 0; i < len; i++)
-                chars[i] = pchars[i];
+            u_strFromUTF32(chars, len * 3, &dstLen,
+                           (const UChar32 *) pchars, len, &status);
 
-            string.setTo((const UChar *) chars, (int32_t) len);
-	    delete chars;
+            if (U_FAILURE(status))
+            {
+                delete[] chars;
+                throw ICUException(status);
+            }
+
+            string.setTo((const UChar *) chars, (int32_t) dstLen);
+            delete[] chars;
         }
     }
     else if (PyString_CheckExact(object))
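
For readers without ICU at hand, the conversion that the patch delegates to
u_strFromUTF32() can be sketched in plain C++. This is only an illustration
of the UTF-32 to UTF-16 mapping, not ICU's implementation: the helper name
utf32ToUtf16 is hypothetical, and rejecting surrogates and out-of-range values
with an exception is a choice made for this sketch. It also shows why a
destination of two code units per input code point is always sufficient, so
the len * 3 buffer in the patch is larger than strictly needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of PyICU or ICU): converts a sequence of
// UTF-32 code points into UTF-16 code units.
std::u16string utf32ToUtf16(const std::u32string &src)
{
    std::u16string dst;
    // Worst case: every code point lies above U+FFFF and needs two units.
    dst.reserve(src.size() * 2);

    for (char32_t cp : src)
    {
        // Surrogate values are not valid code points on their own.
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate value in UTF-32 input");
        // Unicode ends at U+10FFFF.
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point beyond U+10FFFF");

        if (cp < 0x10000)
        {
            // Basic Multilingual Plane: one code unit, value unchanged.
            dst.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Supplementary plane: split the 20-bit offset into a
            // high (lead) and low (trail) surrogate.
            char32_t v = cp - 0x10000;
            dst.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            dst.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }

    return dst;
}
```

The loop removed by the patch (chars[i] = pchars[i]) corresponds to keeping
only the first branch, which silently truncates any code point above U+FFFF.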

