Bug 1876

Summary: SDL_TEXTINPUT only returns '?' (0x3F) in event.text.text with Khmer language input
Product: SDL
Reporter: Andreas Ertelt <bugzilla-sdl>
Component: events
Assignee: Ryan C. Gordon <icculus>
Status: ASSIGNED
QA Contact: Sam Lantinga <slouken>
Severity: normal
Priority: P2
Version: HG 2.0
Hardware: x86_64
OS: Windows 8
See Also: https://bugzilla.libsdl.org/show_bug.cgi?id=2406
https://bugzilla.libsdl.org/show_bug.cgi?id=2287
Attachments: proper unicode input support for windows

Description Andreas Ertelt 2013-05-29 17:44:37 UTC
Compared to Japanese and Chinese input, which both work flawlessly, Khmer input is broken: for almost every keystroke, except some punctuation (e.g. "." and ",", which are typed with Shift as a modifier), SDL_TEXTINPUT's event.text.text contains the same 1-byte string: 63 (0x3F), which is a "?".

Copy-and-pasting Khmer text from websites into the application, however, works perfectly. Only the keyboard events fail to yield usable data, which leads me to believe that information is lost somewhere along the way.

I also feel I should receive SDL_TEXTEDITING events, since at least some characters of the language are composed.

Furthermore, whether or not I call SDL_StartTextInput() does not seem to make any difference, regardless of the language used. Shouldn't it be required in order to process IME input in the first place?

For what it's worth, I am currently using mingw-w64 with gcc-4.8 and SDL2 version 7240.

To reproduce, install the Khmer keyboard layout in Windows 8 and inspect the SDL_TEXTINPUT data in event.text.text while pressing random keys or using the Touch Keyboard (virtual keyboard).
I suspect that Windows 7 and Vista have the same issue, but I could not test them.
Comment 1 Andreas Ertelt 2013-06-20 16:03:01 UTC
Doing the same under Linux yields the expected result, at least for SDL_TEXTINPUT: event.text.text usually contains a 3-byte string holding a renderable character.

However, with the Khmer keyboard layout, typing "b" (which produces "ប") followed by "e" should result in a single glyph, "បេ". This cannot be derived from the SDL events, because SDL_TEXTEDITING events are not generated for this language on either Linux or Windows.
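A quick way to see what the events actually carry is to decode event.text.text as UTF-8. The sketch below is plain C with no SDL dependency; utf8_decode() and split_code_points() are hypothetical helpers, and the decoder deliberately skips overlong and surrogate validation. Under this decoding, "ប" is U+1794 (KHMER LETTER BA, bytes E1 9E 94), and "បេ" is two code points, U+1794 followed by U+17C1 (KHMER VOWEL SIGN E). Khmer text is stored in logical order, so drawing the vowel sign before the consonant is normally left to the text shaping engine rather than to the input layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if c is a UTF-8 continuation byte (10xxxxxx). */
static int is_cont(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Decodes one UTF-8 sequence at s into *cp and returns its length in bytes,
 * or 0 if the lead byte or a continuation byte is invalid. Overlong forms and
 * surrogate code points are not rejected: this is a sketch, not a validator.
 * Short-circuit evaluation stops at a NUL, so it never reads past the end. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && is_cont(s[1])) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && is_cont(s[1]) && is_cont(s[2])) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
            | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && is_cont(s[1]) && is_cont(s[2]) && is_cont(s[3])) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Splits a NUL-terminated UTF-8 string (for example event.text.text from an
 * SDL_TEXTINPUT event) into at most max code points stored in out. Returns
 * the number of code points, or -1 on malformed input or if out is too small. */
static int split_code_points(const char *text, uint32_t *out, int max)
{
    const unsigned char *p = (const unsigned char *)text;
    int n = 0;
    while (*p != '\0') {
        uint32_t cp;
        size_t len = utf8_decode(p, &cp);
        if (len == 0 || n == max)
            return -1;
        out[n++] = cp;
        p += len;
    }
    return n;
}
```

On a Windows build affected by this bug, calling split_code_points(event.text.text, cps, 8) with a uint32_t cps[8] would return 1 with cps[0] == 0x3F.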
Comment 2 Sam Lantinga 2013-06-25 01:24:35 UTC
For what it's worth, on Mac OS X, pressing those keys generates the individual glyphs "ប" and "េ", and the font layout system uses negative kerning to place the second glyph in front of the first.

TextEdit actually composes them into a single glyph, but I'm not sure how it does that. Certainly they arrive as two separate characters through the text input APIs SDL uses.

I know this doesn't help on Windows and Linux, but I figured I'd add it as another data point. :)
Comment 3 Andreas Ertelt 2013-08-07 13:12:32 UTC
I have investigated this issue a little further and found that the WM_CHAR message, handled in SDL_windowsevents.c, is responsible for the data in SDL_TEXTINPUT. This message, and supposedly WM_UNICHAR as well, cannot handle certain languages, including Khmer and Hindi. Unfortunately, there is hardly any information available about this.

A possible workaround would be to handle the WM_KEYDOWN message together with ToUnicode(). In that case, the WM_KEYDOWN handler should return a nonzero value so that no WM_CHAR message is generated. WM_CHAR messages originating from external applications should still be handled the same way as before.
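ToUnicode() writes UTF-16 code units into a WCHAR buffer, while SDL_TEXTINPUT carries UTF-8, so a WM_KEYDOWN-based approach needs a conversion step. Below is a minimal sketch of that step only, assuming uint16_t as a stand-in for WCHAR so it compiles without <windows.h>; utf16_to_utf8() is a hypothetical helper, not an SDL or Win32 function. Surrogate pairs are combined, since characters outside the Basic Multilingual Plane need two code units.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts n UTF-16 code units (uint16_t stands in for the Windows WCHAR type)
 * into a NUL-terminated UTF-8 string in out, which holds outsize bytes.
 * Returns the number of bytes written, excluding the terminator, or -1 if the
 * input contains an unpaired surrogate or out is too small. */
static int utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t outsize)
{
    size_t i = 0, o = 0;
    if (outsize == 0)
        return -1;
    while (i < n) {
        uint32_t cp = in[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i == n || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1; /* Unpaired low surrogate. */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + len >= outsize) /* Keep one byte for the terminator. */
            return -1;
        if (len == 1) {
            out[o++] = (char)cp;
        } else if (len == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

In a real handler, the input would be the first n units of the buffer passed to ToUnicode(), with n taken from its positive return value.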
Comment 4 Andreas Ertelt 2013-08-16 17:47:44 UTC
Created attachment 1291
proper unicode input support for windows

The issue comes down to this line on MSDN:
"TranslateMessage produces WM_CHAR messages only for keys that are mapped to ASCII characters by the keyboard driver."

"WM_KEYDOWN and WM_KEYUP combinations produce a WM_CHAR or WM_DEADCHAR message. WM_SYSKEYDOWN and WM_SYSKEYUP combinations produce a WM_SYSCHAR or WM_SYSDEADCHAR message."
Except for WM_CHAR, none of these messages are used in SDL. Hence TranslateMessage should be dropped entirely, and the equivalent handling should be done in the WM_KEYDOWN handler.
Currently TranslateMessage is called for every message, even in cases where it must not be called (e.g. "An application should not call TranslateMessage if the TranslateAccelerator function returns a nonzero value.").

WM_CHAR message handling should remain for external processes posting these messages - additionally, WM_UNICHAR should be added.

I made a patch for src/video/windows/SDL_windowsevents.c that seems to work fine. It doesn't solve the "missing" composition for Khmer, but at least input for languages that cannot be mapped to ASCII characters (and for which IME is not used) will now work on Windows.
Comment 5 Sam Lantinga 2013-08-16 18:02:59 UTC
Andreas, can you see if this patch is a good idea?  It seems like it might break other things, but I don't know.
Comment 6 Andreas Ertelt 2013-08-16 18:24:59 UTC
I don't see how it could break anything - the only thing that was effectively removed is the automatic generation of a WM_CHAR message in response to a WM_KEYDOWN event, which so far delivered insufficient data for some languages.

And this functionality is now incorporated within WM_KEYDOWN. WM_CHAR will still be supported the same way if someone chooses to send those messages directly for some reason.

In my project the patched version works fine and as far as I can tell behaves exactly the same (except for better Unicode support).

As mentioned earlier, the only other events that TranslateMessage might have generated are not even used by SDL.
Comment 7 Sam Lantinga 2013-08-16 18:38:30 UTC
Fixed, thanks!
http://hg.libsdl.org/SDL/rev/cc775832d501
Comment 8 Andreas Ertelt 2013-08-16 19:06:00 UTC
Since you changed "static __inline__" to "static SDL_FORCE_INLINE", it expands to "static static __inline__" with MinGW gcc, which breaks compilation. So I assume the "static" on line 259 of SDL_windowsevents.c should now be removed.
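The failure is a plain macro-expansion clash that can be reproduced without SDL. MY_FORCE_INLINE below is a hypothetical macro modeled on the idea that SDL_FORCE_INLINE already contains "static" on some compilers (SDL's actual definition lives in its headers and varies by compiler and version); twice() is just a placeholder function.

```c
/* Hypothetical force-inline macro that, on GCC-style compilers, already
 * includes the "static" storage-class specifier. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_FORCE_INLINE __attribute__((always_inline)) static __inline__
#elif defined(_MSC_VER)
#define MY_FORCE_INLINE static __forceinline
#else
#define MY_FORCE_INLINE static inline
#endif

/* Broken: on GCC this expands to two "static" keywords in one declaration,
 * which the compiler rejects as a duplicate storage-class specifier.
 *
 *     static MY_FORCE_INLINE int twice(int x) { return 2 * x; }
 */

/* Fixed: drop the explicit "static" and let the macro supply it. */
MY_FORCE_INLINE int twice(int x)
{
    return 2 * x;
}
```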
Comment 9 Andreas Schiffler 2013-08-16 19:15:22 UTC
The patch "as is" should probably not be applied: to handle dead keys, one needs to call ToUnicode() twice (see [1] and [2]).

The correct implementation seems to be in WM_KEYDOWN using GetKeyboardState(), MapVirtualKeyW() and ToUnicode() (see [3]).

References:
[1] http://blogs.msdn.com/b/michkap/archive/2006/09/10/748775.aspx
[2] http://stackoverflow.com/questions/1964614/toascii-tounicode-in-a-keyboard-hook-destroys-dead-keys
[3] http://blogs.msdn.com/b/michkap/archive/2005/01/19/355870.aspx
Comment 10 Andreas Ertelt 2013-08-16 20:20:57 UTC
At least on Win8, this implementation has no issues handling dead keys: "^" followed by "e" results in "ê", and "^" followed by another "^" results in a single "^". In both cases only one SDL_TEXTINPUT event is generated, containing the one character expected.
The articles you linked all revolve around keyboard hooks, which receive a KBDLLHOOKSTRUCT. The whole issue they have with dead keys is that they want to receive them when pressed, and NOT when combined with an appropriate letter (or singly when doubled), which is how WM_CHAR delivers them as input.
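The dead-key behavior described above can be modeled as a tiny state machine. This is a hypothetical, heavily simplified model of a single dead key (circumflex) with a hard-coded composition table; real Windows keyboard layouts define their own tables, and emitting the accent and the key separately when no composition exists is only this model's choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for one dead key (circumflex). */
typedef struct {
    int pending; /* nonzero after the dead "^" key was pressed */
} DeadKeyState;

/* Feeds one keystroke (given as a code point) and writes 0, 1 or 2 resulting
 * code points to out. Returns how many were written. */
static int dead_key_feed(DeadKeyState *st, uint32_t key, uint32_t out[2])
{
    /* Simplified composition table: base letter -> precomposed character. */
    static const struct { uint32_t base, composed; } table[] = {
        {'a', 0x00E2}, {'e', 0x00EA}, {'i', 0x00EE}, {'o', 0x00F4}, {'u', 0x00FB},
    };
    if (!st->pending) {
        if (key == '^') {
            st->pending = 1; /* swallow the dead key and wait for the next one */
            return 0;
        }
        out[0] = key;
        return 1;
    }
    st->pending = 0;
    if (key == '^') { /* "^" twice yields a single "^" */
        out[0] = '^';
        return 1;
    }
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].base == key) {
            out[0] = table[i].composed;
            return 1;
        }
    }
    out[0] = '^'; /* no composition in this model: emit both characters */
    out[1] = key;
    return 2;
}
```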

In our case there is no reason to call ToUnicode() more than once.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms646320%28v=vs.85%29.aspx

And MapVirtualKey (the UNICODE preprocessor definition should take care of selecting the W variant) is not necessary, because WM_KEYDOWN already provides a correct scan code.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms646280%28v=vs.85%29.aspx
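For reference, the lParam of WM_KEYDOWN packs the repeat count into bits 0-15, the scan code into bits 16-23, the extended-key flag into bit 24 and the previous key state into bit 30; the scan code is what a handler would pass to ToUnicode()'s wScanCode parameter. The sketch takes lParam as a uint32_t so it builds without <windows.h>; KeyDownInfo and decode_keydown_lparam() are hypothetical names.

```c
#include <stdint.h>

/* Fields packed into the lParam of a WM_KEYDOWN message. */
typedef struct {
    unsigned repeat_count; /* bits 0-15 */
    unsigned scan_code;    /* bits 16-23 */
    int extended;          /* bit 24: extended key, e.g. arrow keys */
    int was_down;          /* bit 30: key was already down (auto-repeat) */
} KeyDownInfo;

/* On Windows the argument would be (uint32_t)lParam. */
static KeyDownInfo decode_keydown_lparam(uint32_t lparam)
{
    KeyDownInfo k;
    k.repeat_count = lparam & 0xFFFFu;
    k.scan_code = (lparam >> 16) & 0xFFu;
    k.extended = (int)((lparam >> 24) & 1u);
    k.was_down = (int)((lparam >> 30) & 1u);
    return k;
}
```

For example, 0x00120001 decodes to scan code 0x12 (the E key in scan code set 1) on a first press, and 0x414D0001 to scan code 0x4D with the extended and previously-down bits set, as for an auto-repeating Right Arrow.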
Comment 11 Sam Lantinga 2015-06-25 18:59:48 UTC
The fix to this bug caused bug 2834, so we're rolling this change back for the 2.0.4 release.
Comment 12 Andreas Ertelt 2015-06-27 14:07:31 UTC
Hello Sam and everyone involved - this post should help everyone understand what was going on at the time, as well as the problems the solution to bug 2834 causes and how they may be solved.
Unfortunately, I have since stopped using SDL, so my future involvement in this will likely be very limited.


Summary of this issue
=====================

When I filed this bug, I wasn't familiar with the UNICODE definition, which causes SDL to initialize everything with the W variants of the Windows API functions. When this definition is supplied, this problem does not occur.

However, if one uses the ASCII variant for any reason (even if only for lack of knowledge about how SDL uses the two), one cannot receive characters outside the ANSI code page (such as Khmer) through either regular or emulated Windows keyboard events; only the question mark character arrives. This can, and in my opinion should, be worked around. There is no reason to prevent people from receiving proper event content if they can, or is there? After all, SDL explicitly states that the data will be UTF-8.
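The question marks are what one would expect once text has to pass through a single-byte code page: every character the code page cannot represent is replaced by a default character. The sketch below models only that effect, not Windows' actual conversion code; ISO-8859-1 stands in for the system's ANSI code page, and DEFAULT_CHAR and narrow_to_latin1() are hypothetical, with the default chosen to match the 0x3F reported in this bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed replacement character for unmappable input. */
#define DEFAULT_CHAR '?'

/* Narrows n code points into n single bytes using ISO-8859-1 as the
 * "code page" (U+0000..U+00FF map to themselves). Anything else becomes
 * DEFAULT_CHAR. Returns how many characters were replaced. */
static size_t narrow_to_latin1(const uint32_t *cps, size_t n, unsigned char *out)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] <= 0xFF) {
            out[i] = (unsigned char)cps[i];
        } else {
            out[i] = (unsigned char)DEFAULT_CHAR;
            replaced++;
        }
    }
    return replaced;
}
```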


Attempted solution
==================
The patch I supplied handled exactly this case by processing keyboard events manually the way Windows does, unfortunately without paying enough attention to the dead keys issue.
Because I worked with foreign languages a lot at the time, I got the impression that SDL_TEXTINPUT was supposed to deliver the individual keystrokes at all times, which you then had to alter through subsequent SDL_TEXTEDITING events for composition. This is crucial for languages like Japanese or Chinese*, where characters can change multiple times depending on the keystroke sequence.
In addition, I added WM_UNICHAR handling, which allows applications to send Unicode text to ASCII applications (Windows automatically switches from WM_CHAR to WM_UNICHAR messages when a Unicode window sends messages to an ASCII window, provided the ASCII application responded correctly when Windows probed for that capability).

* unrelated but maybe of interest: SDL_SetTextInputRect() has (or at least had at the time) no working implementation making Chinese text input almost impossible; especially since the order of glyphs in the candidate list is dynamic.
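The capability probe mentioned above can be sketched without <windows.h>: a WM_UNICHAR sender may first pass UNICODE_NOCHAR (0xFFFF) in wParam, and a window that supports the message answers TRUE; otherwise wParam holds one UTF-32 code point. This return convention is taken from the WM_UNICHAR documentation and should be double-checked there. post_text() and the two counters are hypothetical stand-ins for queuing an SDL_TEXTINPUT event; a real handler would UTF-8 encode the code point first.

```c
#include <stdint.h>

#ifndef UNICODE_NOCHAR
#define UNICODE_NOCHAR 0xFFFFu /* value used by WinUser.h */
#endif

/* Hypothetical sink standing in for whatever queues an SDL_TEXTINPUT event. */
static uint32_t last_text_cp;
static int text_events;

static void post_text(uint32_t cp)
{
    last_text_cp = cp;
    text_events++;
}

/* Sketch of a WM_UNICHAR handler. Returns the LRESULT the window procedure
 * would return: TRUE for the capability probe, zero otherwise. */
static long handle_unichar(uint32_t wparam)
{
    if (wparam == UNICODE_NOCHAR)
        return 1; /* yes, this window understands WM_UNICHAR */
    if (wparam > 0x10FFFF || (wparam >= 0xD800 && wparam <= 0xDFFF))
        return 0; /* not a valid Unicode scalar value: ignore it */
    post_text(wparam);
    return 0;
}
```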


New problems and suggestions
============================
The fix for bug 2834 will again prevent ASCII applications from receiving Unicode characters through actual keyboard input; externally created Unicode messages will continue to be processed correctly.

SDL's documentation currently doesn't say anything about the handling of dead keys, which leads me to believe it should follow the current system configuration.
On Windows this would require support for WM_DEADCHAR and probably also WM_SYSDEADCHAR messages.
A setting to configure how dead keys are handled (system setting, or explicitly on/off) might be nice as well.

Or you could just roll everything back and add a note to the wiki cautioning everyone that they MUST use -DUNICODE, because otherwise some foreign languages will produce question marks instead of the actual characters.

This patch initially also caused problems when SDL was integrated into an existing Qt environment (bug #2406), which Valve uses a lot. This might also be relevant when touching this code again.
Comment 13 Sam Lantinga 2015-06-30 05:12:47 UTC
Great, thanks for the additional info!