Bug 1938 - Buffer overflows in the Windows IME code
Summary: Buffer overflows in the Windows IME code
Status: RESOLVED FIXED
Alias: None
Product: SDL
Classification: Unclassified
Component: events
Version: HG 2.0
Hardware: x86 Windows (All)
Importance: P1 major
Assignee: Sam Lantinga
QA Contact: Sam Lantinga
URL:
Keywords: target-2.0.0
Depends on:
Blocks:
 
Reported: 2013-07-03 23:49 UTC by norfanin
Modified: 2013-07-13 02:46 UTC

See Also:


Attachments
Buffer overflow fixes for the Windows IME code. (3.03 KB, patch)
2013-07-03 23:49 UTC, norfanin

Description norfanin 2013-07-03 23:49:50 UTC
Created attachment 1205
Buffer overflow fixes for the Windows IME code.

There are a few potential buffer overflows in the Windows IME code located in the SDL_windowskeyboard.c file. [1] They mainly happen because the code passes the number of bytes instead of the number of characters to the wide-character string functions wcslcpy and wcslcat. In another place, the code assumes that the composition cursor position can never go beyond the size of the composition string buffer.
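
To illustrate the byte-versus-character mix-up described above, here is a minimal sketch in portable C. The helper copy_wide is a hypothetical stand-in with wcslcpy-like semantics (it is not SDL's implementation), and ARRAY_COUNT is an assumed name for an element-count macro.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of elements in a fixed-size array (not its size in bytes). */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for wcslcpy: dst_count is the capacity of dst in
 * wide characters, including the terminating null. Returns wcslen(src). */
static size_t copy_wide(wchar_t *dst, const wchar_t *src, size_t dst_count)
{
    size_t src_len = wcslen(src);

    assert(dst != NULL);
    if (dst_count > 0) {
        /* Copy at most dst_count - 1 characters, then terminate. */
        size_t n = (src_len < dst_count - 1) ? src_len : dst_count - 1;
        wmemcpy(dst, src, n);
        dst[n] = L'\0';
    }
    return src_len;
}
```

Calling copy_wide(buf, src, ARRAY_COUNT(buf)) keeps the write inside buf. Passing sizeof(buf) instead would claim a capacity that is sizeof(wchar_t) times too large, which is the class of bug the patch addresses.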

Some of these overflows and overruns can occur with the Japanese IME on Vista and the Simplified Chinese IME on XP. I don't actually speak those languages and it's my first time using the IMEs, so I probably pushed them to a limit where nobody would still be composing proper words. They don't cause any immediate access violation, although the possibility of trashing the SDL_VideoData structure is never good.

I've attached a patch that fixes those I found, but because I'm very new to the code it may be worthwhile if someone else also has a look over the code.

I'll go over the changes in my patch and explain what, why and how.

In the function IME_GetReadingString, there is a wcslcpy to copy the reading string from the IMC memory to the SDL reading string buffer. [2] This assumes that the length of the reading string never exceeds the SDL buffer size. I guess an overflow is possible here, although I wasn't able to get a long enough reading string in my tests; the patch adds a simple check anyway.

In the function IME_GetCompositionString, the first line calls ImmGetCompositionStringW to get the composition string. [3] The Microsoft documentation states that the fourth argument is the destination buffer size in bytes (even for Unicode strings), and the code correctly passes the value of sizeof. However, at the end of IME_GetCompositionString, the string is terminated by setting the element at index 'length' to 0, where 'length' is the number of bytes written by ImmGetCompositionStringW divided by 2. If it managed to write 64 bytes, the code sets element 32 to 0, which is one past the end of the 32-element buffer and would be the beginning of the reading string if the alignment places it there. My patch adds a subtraction to the fourth argument, essentially making it always pass 62 instead.
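
The off-by-one on the terminator can be sketched as follows. This is only an illustration: fetch_bytes is a hypothetical stand-in for an API that takes a size in bytes and does not terminate the string (its truncating behaviour is an assumption, not a claim about ImmGetCompositionStringW), and wide_unit models a 2-byte WCHAR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t wide_unit; /* models a 2-byte Windows WCHAR */

/* Hypothetical stand-in: writes at most dst_bytes bytes of src into dst,
 * never writes a terminator, and returns the number of bytes written. */
static size_t fetch_bytes(void *dst, size_t dst_bytes,
                          const wide_unit *src, size_t src_units)
{
    size_t want = src_units * sizeof(wide_unit);
    size_t n = (want < dst_bytes) ? want : dst_bytes;

    n -= n % sizeof(wide_unit); /* only copy whole units */
    memcpy(dst, src, n);
    return n;
}

/* Fetch into buf (capacity buf_units elements) and terminate it. One
 * element is held back so that buf[length] is always inside the buffer.
 * Returns the string length in elements. */
static size_t fetch_terminated(wide_unit *buf, size_t buf_units,
                               const wide_unit *src, size_t src_units)
{
    size_t bytes, length;

    assert(buf != NULL && buf_units > 0);
    bytes = fetch_bytes(buf, (buf_units - 1) * sizeof(wide_unit),
                        src, src_units);
    length = bytes / sizeof(wide_unit);
    buf[length] = 0; /* length <= buf_units - 1, so this is in range */
    return length;
}
```

With a 32-element buffer the stand-in is offered 62 bytes, so the terminator lands at index 31 at the latest rather than at index 32.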

In the same function, the code assumes that the composition cursor position doesn't go beyond the buffer size. [4] My patch adds a simple range check before the buffer is indexed at that position.
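
A range check of this kind might look like the sketch below. The function name and signature are hypothetical and are not taken from the patch; the point is only that the index is validated before the buffer is read.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if cursor is a valid index into buf (capacity buf_count
 * elements) and the element there equals wanted, otherwise 0. The bounds
 * test comes first, so buf is never read out of range. */
static int cursor_points_at(const wchar_t *buf, size_t buf_count,
                            size_t cursor, wchar_t wanted)
{
    assert(buf != NULL);
    return cursor < buf_count && buf[cursor] == wanted;
}
```

Because && short-circuits, an out-of-range cursor reported by the IME simply yields 0 instead of reading past the buffer.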

In the function IME_SendEditingEvent, the size for the wide-character string functions is passed in bytes instead of characters. [5] Oddly, the current code subtracts 'len' from the size in one function call. This results in truncation in certain situations as the third argument is the number of characters available in the destination buffer. If I'm understanding it correctly, this is supposed to copy x characters of the composition buffer, then concatenate the whole reading string buffer, and then the rest of the composition buffer (where x is the composition cursor position). I don't see how a truncation of the rest would be helpful here. Perhaps this is just an error? My patch removes the subtraction.
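
The intended splice (composition up to the cursor, then the reading string, then the rest of the composition) can be sketched with character counts throughout. The helpers cat_wide and splice_reading are hypothetical names written for this illustration, with cat_wide mimicking wcslcat semantics; this is not the SDL code.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for wcslcat: dst_count is the TOTAL capacity of
 * dst in wide characters. Assumes dst is already null-terminated. */
static void cat_wide(wchar_t *dst, const wchar_t *src, size_t dst_count)
{
    size_t used = wcslen(dst);
    size_t room, n;

    if (used + 1 >= dst_count) {
        return; /* no room left besides the terminator */
    }
    room = dst_count - 1 - used;
    n = wcslen(src);
    if (n > room) {
        n = room;
    }
    wmemcpy(dst + used, src, n);
    dst[used + n] = L'\0';
}

/* Build: composition[0..cursor) + reading + composition[cursor..). */
static void splice_reading(wchar_t *out, size_t out_count,
                           const wchar_t *composition,
                           const wchar_t *reading, size_t cursor)
{
    size_t comp_len = wcslen(composition);
    size_t head;

    assert(out != NULL && out_count > 0);
    if (cursor > comp_len) {
        cursor = comp_len; /* clamp a cursor beyond the string */
    }
    head = (cursor < out_count - 1) ? cursor : out_count - 1;
    wmemcpy(out, composition, head);
    out[head] = L'\0';
    cat_wide(out, reading, out_count);
    cat_wide(out, composition + cursor, out_count);
}
```

Every call passes the full capacity of the destination in characters, so truncation only happens when the destination really is full.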

In the function UIElementSink_UpdateUIElement, a byte count instead of a character count is again passed to a wcslcpy call. [6]

And that's all.

I was wondering if someone knows whether the Unicode strings handled here are exclusively UCS-2 or whether they could be UTF-16 and contain surrogates. I haven't looked at the UTF-8 conversion functions yet, but I thought I'd raise the concern. It also seems a bit harsh to just truncate at 32 bytes of encoded UTF-8 for the event structure, though that is probably something you guys have already discussed and I should look for it on the mailing list or something.
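
For reference, telling plain UCS-2 apart from UTF-16 with surrogates comes down to checking for code units in the range 0xD800 to 0xDFFF. The sketch below uses hypothetical helper names and uint16_t to model a 2-byte WCHAR; it makes no claim about what the IME actually delivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Returns 1 if any of the n code units is a surrogate, meaning the text
 * cannot be treated as one-unit-per-character UCS-2. */
static int has_surrogates(const uint16_t *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (is_high_surrogate(s[i]) || is_low_surrogate(s[i])) {
            return 1;
        }
    }
    return 0;
}

/* Combine a valid high/low surrogate pair into one code point. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                       | (uint32_t)(lo - 0xDC00));
}
```

If surrogates can occur, any truncation or cursor arithmetic done in code units would also need to avoid splitting a pair.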

Thanks for the help!


[1]: http://hg.libsdl.org/SDL/file/675c85d46f30/src/video/windows/SDL_windowskeyboard.c
[2]: http://hg.libsdl.org/SDL/file/675c85d46f30/src/video/windows/SDL_windowskeyboard.c#l469
[3]: http://hg.libsdl.org/SDL/file/675c85d46f30/src/video/windows/SDL_windowskeyboard.c#l674
[4]: http://hg.libsdl.org/SDL/file/675c85d46f30/src/video/windows/SDL_windowskeyboard.c#l682
[5]: http://hg.libsdl.org/SDL/file/675c85d46f30/src/video/windows/SDL_windowskeyboard.c#l706
[6]: http://hg.libsdl.org/SDL/file/675c85d46f30/src/video/windows/SDL_windowskeyboard.c#l1055
Comment 1 Ryan C. Gordon 2013-07-12 18:52:39 UTC
(Sorry if you get a lot of copies of this email, we're touching dozens of bug reports right now.)

Tagging a bunch of bugs as target-2.0.0, Priority 1.

This means we're in the final stretch for an official SDL 2.0.0 release! These are the bugs we really want to fix before shipping if humanly possible.

That being said, we don't promise to fix them because of this tag, we just want to make sure we don't forget to deal with them before we bless a final 2.0.0 release, and generally be organized about what we're aiming to ship.

Hopefully you'll hear more about this bug soon. If you have more information (including "this got fixed at some point, nevermind"), we would love to have you come add more information to the bug report when you have a moment.

Thanks!
--ryan.
Comment 2 Sam Lantinga 2013-07-13 02:46:22 UTC
Your patch looks good, thanks!
http://hg.libsdl.org/SDL/rev/f204394628db

I believe the characters are always UCS-2, but if you find out otherwise, please let me know.