Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in src/stdlib/SDL_iconv.c #535

Closed
SDLBugzilla opened this issue Feb 10, 2021 · 0 comments
Closed

Bug in src/stdlib/SDL_iconv.c #535

SDLBugzilla opened this issue Feb 10, 2021 · 0 comments

Comments

@SDLBugzilla
Copy link
Collaborator

SDLBugzilla commented Feb 10, 2021

This bug report was migrated from our old Bugzilla tracker.

These attachments are available in the static archive:

Reported in version: HG 1.2
Reported for operating system, platform: Windows (All), x86

Comments on the original bug report:

On 2009-12-08 23:05:50 +0000, John Popplewell wrote:

Originally reported by AKFoerster on the mailing list.

Error decoding UTF8 Russian text to UTF-16LE on Windows, but specifically on platforms without iconv support (the default on Windows).

Valid UTF8 characters are flagged as being overlong and then substituted by the UNKNOWN_UNICODE character.

After studying the testiconv.c example program, reading the RFCs and putting some printf statements in SDL_iconv.c the problem is in a test for 'Maximum overlong sequences', specifically 4.2.1, which is carried out by the following code:

      } else if ( p[0] >= 0xC0 ) {
        if ( (p[0] & 0xE0) != 0xC0 ) {
          /* Skip illegal sequences
            return SDL_ICONV_EILSEQ;
          */
          ch = UNKNOWN_UNICODE;
        } else {
          if ( (p[0] & 0xCE) == 0xC0 ) {    <<<<<<<< here
            overlong = SDL_TRUE;
          }
          ch = (Uint32)(p[0] & 0x1F);
          left = 1;
        }
      } else {

Here is the 2-byte encoding of a character in range 00000080 - 000007FF
110xxxxx 10xxxxxx

The line in question is supposed to be checking for an overlong sequence which would be less than
11000001 10111111

which should be represented as a single byte.

BUT, the mask value (0xCE) is wrong, it isn't checking the top-most bit:
11000001 value
11001110 mask (incorrect)
^
and should be (0xDE):
11000001 value
11011110 mask (correct)

making the above code:

      } else if ( p[0] >= 0xC0 ) {
        if ( (p[0] & 0xE0) != 0xC0 ) {
          /* Skip illegal sequences
            return SDL_ICONV_EILSEQ;
          */
          ch = UNKNOWN_UNICODE;
        } else {
          if ( (p[0] & 0xDE) == 0xC0 ) {    <<<<<<<< here
            overlong = SDL_TRUE;
          }
          ch = (Uint32)(p[0] & 0x1F);
          left = 1;
        }
      } else {

I can supply a test program and/or a patch if required,

best regards,
John Popplewell

On 2009-12-09 00:46:10 +0000, Sam Lantinga wrote:

Sure, a test program and patch would be awesome, if you have time.

Thanks for your time! :)

On 2009-12-09 02:48:47 +0000, John Popplewell wrote:

Created attachment 455
Test program showing the problem

On 2009-12-09 02:50:05 +0000, John Popplewell wrote:

Created attachment 456
Proposed patch

On 2009-12-10 23:44:31 +0000, John Popplewell wrote:

My additional comments got dropped on the floor. The test program takes a UTF8 string, converts it to UTF-16LE (on Windows), shows the resulting 16-bit character codes, then converts it back to UTF8. I used PuTTY and OpenSSH to get a UTF8 shell on Windows :-)

Here is the output using SDL-1.2.svn (no iconv support):

C:\tpdev\test-junk>utf8test.exe
Before: 'Здpaвcтвyй миp'
UFFFD UFFFD U0070 U0061 UFFFD U0063 UFFFD UFFFD U0079 UFFFD U0020 UFFFD UFFFD U0070
After : '��pa�c��y� ��p'

C:\tpdev\test-junk>

Here is the output using SDL-1.2.svn (with iconv support):

C:\tpdev\test-junk>utf8test.exe
Before: 'Здpaвcтвyй миp'
U0417 U0434 U0070 U0061 U0432 U0063 U0442 U0432 U0079 U0439 U0020 U043C U0438 U0070
After : 'Здpaвcтвyй миp'

C:\tpdev\test-junk>

Here is the output using patched SDL-1.2.svn (no iconv support):
C:\tpdev\test-junk>utf8test.exe
Before: 'Здpaвcтвyй миp'
U0417 U0434 U0070 U0061 U0432 U0063 U0442 U0432 U0079 U0439 U0020 U043C U0438 U0070
After : 'Здpaвcтвyй миp'

C:\tpdev\test-junk>

The output of 'testiconv.exe' (no iconv support) is unchanged with the patch.

Hope that helps.

On 2009-12-11 00:04:46 +0000, Sam Lantinga wrote:

Great problem description and patch! I've applied your change to subversion, thanks!

On 2009-12-11 00:09:42 +0000, John Popplewell wrote:

Here's a link to some binaries, source-code and a build script:

http://johnnypops.demon.co.uk/files/utf8test-20091211.zip

Rename one of the SDL-svn*.dll files to SDL.dll to test the various permutations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant