[cosmo-dev] More Unicode

Travis Vachon travis at osafoundation.org
Mon Dec 3 11:13:41 PST 2007


I just did a little more testing with the desktop client and noticed
that entering characters at U+10000 and above and syncing with a local
server or hub produces unexpected behavior, so I suspect step (2) that
I propose below will be more complicated than I expected.

Randy, how does all this tie into what you said about MySQL 5 support
for these characters? Is there simply no way to store them?

-Travis

On Dec 3, 2007, at 10:57 AM, Travis Vachon wrote:

>
> On Dec 2, 2007, at 8:36 AM, Brian Moseley wrote:
>
>> On Nov 30, 2007 2:56 PM, Travis Vachon <travis at osafoundation.org>  
>> wrote:
>>
>>> In any case, my gut feel is that it would be easier to fix the
>>> client for Unicode support than to change the server to blacklist
>>> characters.
>>
>> how would you do that?
>
> We can modify our UTF-8 encoding logic to detect UTF-16 surrogate
> pairs and translate each pair into a single character, instead of
> encoding each half of the pair as a separate character.
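For illustration, here is a minimal Python sketch of the surrogate-pair approach described above. The client's actual encoding code is not shown in this thread, and the function names are hypothetical. A character at or above U+10000 is represented in UTF-16 as a high surrogate (U+D800-U+DBFF) followed by a low surrogate (U+DC00-U+DFFF). Combining the pair into one code point yields a single 4-byte UTF-8 sequence. Encoding each half separately yields two 3-byte sequences, which are not valid UTF-8.

```python
def encode_code_point(cp):
    """Encode a single Unicode code point as UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def naive_per_unit_utf8(units):
    """The buggy behavior: encode every UTF-16 code unit on its own."""
    return b"".join(encode_code_point(u) for u in units)


def utf16_units_to_utf8(units):
    """Encode UTF-16 code units as UTF-8, combining surrogate pairs first."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate at index %d" % i)
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate at index %d" % i)
        else:
            cp = u
            i += 1
        out += encode_code_point(cp)
    return bytes(out)


if __name__ == "__main__":
    units = [0xD800, 0xDC00]  # U+10000 as a UTF-16 surrogate pair
    print(naive_per_unit_utf8(units).hex())  # eda080edb080 (6 bytes, invalid)
    print(utf16_units_to_utf8(units).hex())  # f0908080 (4 bytes, valid)
```

Rejecting unpaired surrogates, rather than passing them through, keeps the output valid UTF-8. Whether the client should raise or substitute U+FFFD is a separate design decision.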
>
> To really be complete, however, we'd need to do this for all output  
> (that is, all text in server requests) generated on the client. This  
> would probably be a pretty significant amount of work, so on second  
> thought I think it might be worth looking at just how much work we'd  
> need on the server.
>
>>
>>
>> re validation - we only have to validate the syntax of usernames when
>> writing them into the database. we don't have to validate them when
>> they are used in queries. the only code change we'd have to make is
>> adding a regex to the ui validators and to the User model. explaining
>> this restriction to users would probably be more complicated.
>
> Yeah, explaining the restriction to users and enforcing it on the
> client side are required by both of our proposals.
>
> It's important to note that this problem isn't limited to usernames.
> Currently, characters with code points at or above U+10000 work fine
> in the desktop client and sync to the server. Opening this data in
> the web UI and saving it, however, corrupts it. I think that even if
> we decide to limit usernames to U+0000-U+FFFF (the BMP), we'll still
> need to fix this bug, which will require all of the difficult work
> that we'd need to do to support U+10000 and above in usernames.
>
>
> All that said, I think I'm leaning toward favoring the following:
>
> 1) Restrict usernames to U+0000-U+FFFF on the server side, and add
> client-side logic to explain and enforce the restriction.
> 2) Create a 1.0 bug to handle characters at U+10000 and above in data
> correctly.
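Step (1) could be sketched as follows. This assumes a Python 3 validator. The thread does not say where or in what language the check would live, and the names below are hypothetical. Python 3 strings are sequences of code points, so a single character class covering U+10000-U+10FFFF finds anything outside the BMP. In JavaScript regular expressions without the `u` flag, matching works on UTF-16 code units, so an equivalent web UI check would look for surrogates (U+D800-U+DFFF) instead.

```python
import re

# Any code point outside the Basic Multilingual Plane (U+10000-U+10FFFF).
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def is_bmp_only(text):
    """Return True if every character in text is in U+0000-U+FFFF."""
    return NON_BMP.search(text) is None


def validate_username(username):
    """Hypothetical server-side check for the proposed username restriction."""
    if not is_bmp_only(username):
        raise ValueError(
            "usernames may only contain characters in the range U+0000-U+FFFF")
    return username
```

With this check, `validate_username("r\u00e9my")` returns the name unchanged, and `validate_username("user\U0001F600")` raises `ValueError`.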
>
>
> -Travis
> _______________________________________________
> cosmo-dev mailing list
> cosmo-dev at lists.osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/cosmo-dev


