Reported in version: HG 1.2
Reported for operating system, platform: Windows (All), x86
Comments on the original bug report:
On 2009-12-08 23:05:50 +0000, John Popplewell wrote:
Originally reported by AKFoerster on the mailing list.
Error decoding UTF8 Russian text to UTF-16LE on Windows, but specifically on platforms without iconv support (the default on Windows).
Valid UTF8 characters are flagged as being overlong and then substituted by the UNKNOWN_UNICODE character.
After studying the testiconv.c example program, reading the RFCs, and putting some printf statements in SDL_iconv.c, I traced the problem to a test for 'Maximum overlong sequences', specifically 4.2.1, which is carried out by the following code:
Here is the 2-byte encoding of a character in range 00000080 - 000007FF
110xxxxx 10xxxxxx
The line in question is supposed to be checking for an overlong sequence, which would be at or below
11000001 10111111
and which should be represented as a single byte.
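To make the overlong condition concrete, here is a minimal sketch in C. It is not taken from SDL_iconv.c; the helper names `decode_utf8_2byte` and `is_overlong_2byte` are hypothetical and exist only for illustration. It decodes the payload bits of a 2-byte sequence and treats any result below 0x80 as overlong.

```c
#include <stdint.h>

/* Hypothetical helper (not SDL code): decode a 2-byte UTF-8 sequence
 * of the form 110xxxxx 10xxxxxx into its code point. No validation
 * of the lead or continuation byte is performed here. */
static uint32_t decode_utf8_2byte(uint8_t lead, uint8_t cont)
{
    return ((uint32_t)(lead & 0x1F) << 6) | (uint32_t)(cont & 0x3F);
}

/* Hypothetical helper: a 2-byte sequence is overlong when the decoded
 * value is below 0x80, because that value fits in a single byte. */
static int is_overlong_2byte(uint8_t lead, uint8_t cont)
{
    return decode_utf8_2byte(lead, cont) < 0x80;
}
```

Under these assumptions, the bytes 0xC1 0xBF decode to 0x7F and count as overlong, while 0xD0 0x9F (the Cyrillic letter U+041F) decodes to 0x41F and does not.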
But the mask value (0xCE) is wrong: it does not check the topmost of the five payload bits (bit 4):
11000001 value
11001110 mask (incorrect)
   ^
and should be (0xDE):
11000001 value
11011110 mask (correct)
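The effect of the two masks can be compared with a small standalone sketch. This is not the SDL source: the function name `lead_flagged_overlong` is hypothetical, and it assumes the check compares the masked lead byte against 0xC0, which is what the bit patterns above suggest.

```c
#include <stdint.h>

/* Hypothetical stand-in for the lead-byte test discussed above.
 * Assumption: the check has the form (lead & mask) == 0xC0.
 * Returns 1 if a 2-byte lead byte would be flagged as overlong. */
static int lead_flagged_overlong(uint8_t lead, uint8_t mask)
{
    return (lead & mask) == 0xC0;
}
```

With the mask 0xCE, bit 4 is ignored, so the lead bytes 0xD0 and 0xD1 are flagged along with 0xC0 and 0xC1. Those two lead bytes cover U+0400 to U+047F, which includes the Russian alphabet, and that would be consistent with the reported failure. With the mask 0xDE, only 0xC0 and 0xC1 are flagged.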
I can supply a test program and/or a patch if required,
best regards,
John Popplewell
On 2009-12-09 00:46:10 +0000, Sam Lantinga wrote:
Sure, a test program and patch would be awesome, if you have time.
Thanks for your time! :)
On 2009-12-09 02:48:47 +0000, John Popplewell wrote:
Created attachment 455
Test program showing the problem
On 2009-12-09 02:50:05 +0000, John Popplewell wrote:
Created attachment 456
Proposed patch
On 2009-12-10 23:44:31 +0000, John Popplewell wrote:
My additional comments got dropped on the floor. The test program takes a UTF8 string, converts it to UTF-16LE (on Windows), shows the resulting 16-bit character codes, then converts it back to UTF8. I used PuTTY and OpenSSH to get a UTF8 shell on Windows :-)
Here is the output using SDL-1.2.svn (no iconv support):
This bug report was migrated from our old Bugzilla tracker.
These attachments are available in the static archive:
On 2009-12-11 00:04:46 +0000, Sam Lantinga wrote:
On 2009-12-11 00:09:42 +0000, John Popplewell wrote: