[cosmo-dev] More Unicode
Travis Vachon
travis at osafoundation.org
Mon Dec 3 11:13:41 PST 2007
I just did a little more testing with the desktop client and noticed
that entering characters at U+10000 and above and syncing with a local
server or hub produces unexpected behavior, so I suspect step (2) I
propose below will be more complicated than I expected.
Randy, how does all this tie in with what you said about MySQL 5
support for these characters? Is there simply no way to store them?
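For context on the storage question: UTF-8 needs at most three bytes for any character in U+0000-U+FFFF, but always four bytes for characters at U+10000 and above. If the database column uses a character set limited to three bytes per character (my understanding is that MySQL 5's `utf8` works this way, but that is an assumption to confirm), those characters have nowhere to go. A quick Python check; the helper names are made up for illustration:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode a single character in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_four_bytes(text: str) -> bool:
    """True if any character lies outside the BMP (U+10000 and above)."""
    return any(ord(ch) > 0xFFFF for ch in text)


# U+00E9 (e with acute), U+4E2D (CJK), U+1D11E (MUSICAL SYMBOL G CLEF)
for ch in ["\u00e9", "\u4e2d", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} bytes")
```

This prints 2, 3 and 4 bytes respectively; only the last one would overflow a three-byte-per-character column.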
-Travis
On Dec 3, 2007, at 10:57 AM, Travis Vachon wrote:
>
> On Dec 2, 2007, at 8:36 AM, Brian Moseley wrote:
>
>> On Nov 30, 2007 2:56 PM, Travis Vachon <travis at osafoundation.org>
>> wrote:
>>
>>> In any case, my gut feel is that it would be easier to fix the
>>> client for Unicode support than to change the server to blacklist
>>> characters.
>>
>> how would you do that?
>
> We can modify our UTF-8 encoding logic to detect UTF-16 surrogate
> pairs and encode each pair as a single character (one four-byte UTF-8
> sequence) instead of encoding each half of the pair as a separate
> character.
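A minimal Python sketch of that idea, assuming the client's text can be treated as a sequence of UTF-16 code units (the function below is hypothetical, not the client's existing encoding code). It combines a high/low surrogate pair into one code point before encoding, and compares the result with naively encoding each half on its own:

```python
from typing import Sequence


def utf16_units_to_utf8(units: Sequence[int]) -> bytes:
    """Encode UTF-16 code units as UTF-8, joining surrogate pairs.

    A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    (0xDC00-0xDFFF) is combined into a single code point >= U+10000,
    which UTF-8 encodes as one 4-byte sequence.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            # Reassemble the supplementary code point from the two 10-bit halves.
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # A lone surrogate is not a valid character; refuse to encode it.
            raise ValueError(f"unpaired surrogate 0x{unit:04X} at index {i}")
        else:
            code_point = unit
            i += 1
        out += chr(code_point).encode("utf-8")
    return bytes(out)


# "A" followed by U+1D11E (MUSICAL SYMBOL G CLEF), stored as a surrogate pair.
units = [0x0041, 0xD834, 0xDD1E]

# Correct: the pair becomes one 4-byte sequence.
print(utf16_units_to_utf8(units).hex(" "))  # 41 f0 9d 84 9e

# Naive: each surrogate is encoded separately as 3 bytes (not valid UTF-8).
print("".join(map(chr, units)).encode("utf-8", "surrogatepass").hex(" "))  # 41 ed a0 b4 ed b4 9e
```

The naive output is the six-byte, CESU-8-style form, which a strict UTF-8 decoder rejects; that is one plausible way per-character encoding ends up mangling these characters.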
>
> To really be complete, however, we'd need to do this for all output
> (that is, all text in server requests) generated on the client. This
> would probably be a pretty significant amount of work, so on second
> thought I think it might be worth looking at just how much work we'd
> need on the server.
>
>>
>>
>> re validation - we only have to validate the syntax of usernames when
>> writing them into the database. we don't have to validate them when
>> they are used in queries. the only code change we'd have to make is
>> adding a regex to the ui validators and to the User model. explaining
>> this restriction to users would probably be more complicated.
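A sketch of such a regex, in Python for illustration (the pattern and function name are hypothetical; the actual UI validators and User model would need the equivalent in their own regex dialect). It only checks the BMP restriction discussed here, not any other username rules:

```python
import re

# Any code point outside the Basic Multilingual Plane (U+10000-U+10FFFF).
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def is_bmp_only(username: str) -> bool:
    """Return True if every character of the username is in U+0000-U+FFFF."""
    return NON_BMP.search(username) is None


print(is_bmp_only("travis"))             # True
print(is_bmp_only("caf\u00e9"))          # True
print(is_bmp_only("g-clef-\U0001D11E"))  # False
```

Matching the disallowed range rather than enumerating the allowed one keeps the pattern short. In a UTF-16 environment such as a JavaScript validator, the equivalent check is usually written against the surrogate range `[\uD800-\uDFFF]` instead.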
>
> Yeah, explaining the restriction to users and enforcing it on the
> client side are required by both of our proposals.
>
> It's important to note that this problem isn't limited to usernames.
> Currently, characters at or above U+10000 work fine in the desktop
> client and sync to the server, but opening this data in the web UI
> and saving it corrupts it. Even if we decide to limit usernames to
> U+0000-U+FFFF (the Basic Multilingual Plane, or BMP), I think we'll
> still need to fix this bug, which will require all of the difficult
> work we'd need to do to support U+10000 and above in usernames.
>
>
> All that said, I'm leaning toward the following:
>
> 1) Restrict usernames to U+0000-U+FFFF on the server side, and add
> client-side logic to explain and enforce the restriction.
> 2) File a 1.0 bug to handle characters at U+10000 and above in data
> correctly.
>
>
> -Travis
> _______________________________________________
> cosmo-dev mailing list
> cosmo-dev at lists.osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/cosmo-dev