[pyicu-dev] Bug in Python 4-byte to ICU UChar?
Andi Vajda
vajda at osafoundation.org
Wed Nov 30 12:16:07 PST 2005
On Wed, 30 Nov 2005, Jim Fulton wrote:
> Andi Vajda wrote:
>>
> ...
>> Indeed, this is a bug. Is u_strFromUTF32() compatible with Python's 4 byte
>> unicode ? If so the fix should be simple. If not, what are the differences
>> and how are they bridged ?
>
> You're asking me? :)
>
> I'm as confident that 4-byte Python unicode is compatible with UChar32
> as I am that 2-byte Python unicode is compatible with UChar, which is to
> say about 90%. ;)
With that 90% assumption in mind, I made the change you suggested. Since I'm
not near a 4-byte unicode Python installation (my Mac's is 2-byte), could you
please try out the attached patch?
Thanks !
Andi..
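
For readers without ICU at hand, here is a rough sketch of the kind of conversion the patch delegates to u_strFromUTF32(): each 4-byte code point becomes either one UTF-16 unit or a surrogate pair, which is why a plain element-by-element copy into a UChar array truncates anything above U+FFFF. The helper name utf32_to_utf16 is hypothetical and is not part of PyICU or ICU, and rejecting surrogates and out-of-range values is a choice made for this sketch. The worst case is two units per code point, so the patch's `len * 3` buffer is a safe over-allocation.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not an ICU or PyICU function): encode UTF-32 code
// points as UTF-16 code units, splitting supplementary code points into
// surrogate pairs.
std::u16string utf32_to_utf16(const std::vector<char32_t> &src)
{
    std::u16string dst;
    dst.reserve(src.size() * 2);  // worst case: every code point needs a pair

    for (char32_t c : src)
    {
        // Assumption of this sketch: treat lone surrogates and values
        // beyond U+10FFFF as errors.
        if ((c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
            throw std::invalid_argument("invalid UTF-32 code point");

        if (c < 0x10000)
        {
            // Basic Multilingual Plane: a single 16-bit unit is enough.
            dst.push_back(static_cast<char16_t>(c));
        }
        else
        {
            // Supplementary plane: 20 payload bits split into high and low
            // surrogates of 10 bits each.
            c -= 0x10000;
            dst.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));
            dst.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF)));
        }
    }
    return dst;
}
```

For example, passing the two code points U+0041 and U+1D11E yields three UTF-16 units: 0x0041, 0xD834, 0xDD1E.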
-------------- next part --------------
Index: common.cpp
===================================================================
--- common.cpp (revision 44)
+++ common.cpp (working copy)
@@ -25,7 +25,9 @@
#include <stdarg.h>
#include <datetime.h>
+#include <unicode/ustring.h>
+
typedef struct {
UConverterCallbackReason reason;
char chars[8];
@@ -133,16 +135,16 @@
else
{
int len = string->length();
- Py_UNICODE *pchars = new Py_UNICODE[len];
+ Py_UNICODE *pchars = new Py_UNICODE[len];
const UChar *chars = string->getBuffer();
for (int i = 0; i < len; i++)
pchars[i] = chars[i];
- PyObject *u = PyUnicode_FromUnicode((const Py_UNICODE *) pchars, len);
- delete pchars;
+ PyObject *u = PyUnicode_FromUnicode((const Py_UNICODE *) pchars, len);
+ delete[] pchars;
- return u;
+ return u;
}
}
@@ -220,15 +222,23 @@
(int32_t) PyUnicode_GET_SIZE(object));
else
{
- int len = PyUnicode_GET_SIZE(object);
+ int32_t len = (int32_t) PyUnicode_GET_SIZE(object);
Py_UNICODE *pchars = PyUnicode_AS_UNICODE(object);
- UChar *chars = new UChar[len];
+ UChar *chars = new UChar[len * 3];
+ UErrorCode status = U_ZERO_ERROR;
+ int32_t dstLen;
- for (int i = 0; i < len; i++)
- chars[i] = pchars[i];
+ u_strFromUTF32(chars, len * 3, &dstLen,
+ (const UChar32 *) pchars, len, &status);
- string.setTo((const UChar *) chars, (int32_t) len);
- delete chars;
+ if (U_FAILURE(status))
+ {
+ delete[] chars;  // array form matches new UChar[]
+ throw ICUException(status);
+ }
+
+ string.setTo((const UChar *) chars, (int32_t) dstLen);
+ delete[] chars;  // array form matches new UChar[]
}
}
else if (PyString_CheckExact(object))