| Summary: | Unicode keyboard input on Windows NT | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Alex Volkov <codepro> |
| Component: | events | Assignee: | Sam Lantinga <slouken> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | CC: | john |
| Version: | 1.2.9 | ||
| Hardware: | x86 | ||
| OS: | Windows (All) | ||
| Attachments: |
NT Unicode input patch
ToUnicode() patch for Win9x/ME/2K/XP
Improved ToUnicode() patch
ToUnicode() patch - bug fix for VC6 |
||
|
Description
Alex Volkov
2006-01-10 13:18:02 UTC
This would be nice to have... In these cases, you usually end up having to use LoadLibrary() on system DLLs to see if the Unicode entry points exist, and fall back to ASCII behaviour when they don't... otherwise apps using SDL.dll will refuse to start up on Win95 or whatever. --ryan.

Quite so. I'll write up a patch, hopefully soon.

Created attachment 20 [details]
NT Unicode input patch
Here is the patch. Successfully tested on Win2k and Win95, but there is no reason why this would not work on other NT or 9x versions. It should not interfere with the current WinCE code; however, Unicode input *might* work on WinCE, if specifically enabled -- I do not know if WinCE supports the ToUnicode() API, and the MSDN docs are very silent about it.
It turns out the ToUnicode() API *is* present in Win9x versions of user32.dll, but is essentially a no-op, so the LoadLibrary+GetProcAddress trick would not work. Instead, we have to check which platform we are running on with GetVersionEx().
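For readers following along, the platform check described above could look roughly like the sketch below. This is not the code from the attached patch: `running_on_nt()` and `query_platform_is_nt()` are hypothetical helper names, the result is cached on the assumption that the platform never changes while the process runs, and the non-Windows stub is there only so the snippet compiles elsewhere.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>

/* Ask Windows which platform family we are on (NT-based or Win9x). */
static int query_platform_is_nt(void)
{
    OSVERSIONINFOA info;
    ZeroMemory(&info, sizeof(info));
    info.dwOSVersionInfoSize = sizeof(info);
    if (!GetVersionExA(&info)) {
        return 0; /* on failure, assume the conservative non-Unicode path */
    }
    return info.dwPlatformId == VER_PLATFORM_WIN32_NT;
}
#else
/* Stub for non-Windows builds, where there is no ToUnicode() to choose. */
static int query_platform_is_nt(void)
{
    return 0;
}
#endif

/* Hypothetical helper: returns 1 on NT-based Windows, 0 otherwise.
 * The answer is cached so the OS is queried only once. */
static int running_on_nt(void)
{
    static int cached = -1;
    if (cached < 0) {
        cached = query_platform_is_nt() ? 1 : 0;
    }
    assert(cached == 0 || cached == 1);
    return cached;
}
```

Checking the platform ID rather than probing for the entry point matters here because, as noted, the Win9x export exists but does nothing useful.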
An alternate patch was just posted to the mailing list: http://www.devolution.com/pipermail/sdl/2006-January/072030.html

Alex, can you comment on which direction to go from here? Maybe take parts of each patch, or favor one completely? --ryan.

Hi, the main improvements that could be made to my patch would be to make the test for the platform a one-off, and to handle WM_INPUTLANGCHANGE so that the calls to GetKeyboardLayout() and GetLocaleInfo() could be minimized. However, I don't know whether this matters much, given the rate of WM_KEYDOWN events.

I've not been able to test this on Win95 or WinME, only 98, 2K and XP. I *could* wheel out an old Win95 CD and a spare HD if necessary. I've only ever seen one ME machine, although I might be able to arrange testing there. It's also worth pointing out that I'm testing with a UK and US keyboard only, although I do have a UK keyboard with sticky labels on that pretends to be a German keyboard!

My main (test) Win98 machine (Western locale) has all the international support (code-pages, fonts etc.) installed, so I've not tried on a typical, minimalist install. Unless it crashes, I don't see a problem with requiring appropriate international support for input to work correctly. This will probably only affect development/test machines as international users will almost certainly have all the right stuff installed for their keyboard to work in the first place, regards, John.

There are advantages and disadvantages in both patches. Neither patch will work 100% correctly if the translation to Unicode returns more than 1 Unicode char (like with non-spacing accent chars), though I have yet to see a keyboard that produces those. And this is no more broken than the equivalent X11 code. John's patch has a single copy of SDL_ToUnicode, so it's easier to maintain; however, I really would not want to make all those calls (GetVersionEx, etc.) on every keypress and release.
I cannot testify to John's Left/Right Shift key detection, but I am not very fond of using hardcoded scancodes for that, as scancodes are keyboard-specific. However, in John's defense, Win9x will most likely not even run on exotic hardware that has really weird scancodes, though nothing is guaranteed.

As for Unicode input on Win9x in general, it's a great idea using MultiByteToWideChar() to translate the codepage chars to Unicode, but this *will* break any current Win9x SDL app that is already relying on codepage chars (and not Unicode). Unlike WinNT, Win9x does not really support Unicode, so the input is done by replacing the 0x80-0xff range with a locale-specific codepage, and some SDL apps may already be abusing this. Myself, I do not care about Win9x all that much, and if those apps break, it's the app vendor's fault, and all the more reason for ppl to switch to NT, but from a maintainer's perspective, this may be better deferred to SDL 1.3 to keep the ABI intact.

(In reply to comment #6)
> <snip!>
> John's patch has a single copy of SDL_ToUnicode, so it's easier to maintain,
> however, I really would not want to make all those calls (GetVersionEx, etc.)
> on every keypress and release.

That can be fixed. I'll put another patch together.

> I cannot testify to John's Left/Right Shift key detection, but I am not very
> fond of using hardcoded scancodes for that, as scancodes are keyboard-specific.
> However, in John's defense, Win9x will most likely not even run on exotic
> hardware that has really weird scancodes, though nothing is guaranteed.

The scancodes for the shift-keys appear to be the same on all keyboards I've tried, and if they *are* different it will just act like it does now. Support for an arbitrary number of valid scancodes could be added I suppose.

> As for Unicode input on Win9x in general, it's a great idea using
> MultiByteToWideChar() to translate the codepage chars to Unicode, but this
> *will* break any current Win9x SDL app that is already relying on codepage
> chars (and not Unicode). Unlike WinNT, Win9x does not really support Unicode,
> so the input is done by replacing the 0x80-0xff range with a locale-specific
> codepage, and some SDL apps may already be abusing this. Myself, I do not care
> about Win9x all that much, and if those apps break, it's the app vendor's
> fault, and all the more reason for ppl to switch to NT, but from a maintainer's
> perspective, this may be better deferred to SDL 1.3 to keep the ABI intact.

I don't think this is accurate. My tests suggest that the same problem applies to NT systems with the current version - they also currently receive character codes in "the 0x80-0xff range with a locale-specific codepage" - so the application will break on Win9x *and* NT. I've not found an application that does anything like this, but it is difficult to believe that it hasn't happened :-)

Regarding Unicode support in Win9x, once you've got Unicode characters from SDL it would be possible for the application to link to the MS provided 'unicows.dll' if they need transparent Unicode handling, best regards, John.

Created attachment 21 [details]
ToUnicode() patch for Win9x/ME/2K/XP
This modified patch removes the windib left/right-shift key stuff and minimizes the call overhead when handling WM_KEYDOWN events.
Whilst testing the windib driver I discovered that the left and right shift keys aren't independent, e.g. hold down the left-shift key, then toggle the right-shift key - nothing. This isn't how the directx and x11 drivers behave, so I dumped those changes.
Created attachment 22 [details]
Improved ToUnicode() patch
Use of a function pointer makes the code simpler and more run-time efficient.
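A rough illustration of the function-pointer idea mentioned above, for anyone reading the thread without the attachment open. This is only a sketch and not the patch itself: all names here (`translate_fn`, `translate_nt`, `translate_9x`, `init_translate`, `translate_key`) are made up, and the two translator bodies are placeholders standing in for the real NT and Win9x paths.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by both translators: turn a key into one 16-bit
 * character and return the number of characters written (0 or 1). */
typedef int (*translate_fn)(unsigned vkey, unsigned short *out);

/* Placeholder for the NT path (the real one would call ToUnicode()). */
static int translate_nt(unsigned vkey, unsigned short *out)
{
    *out = (unsigned short)vkey;
    return 1;
}

/* Placeholder for the Win9x path (the real one would convert a
 * code-page byte with MultiByteToWideChar()). */
static int translate_9x(unsigned vkey, unsigned short *out)
{
    *out = (unsigned short)(vkey & 0xFF);
    return 1;
}

/* Chosen once at start-up, then used for every key event. */
static translate_fn current_translate = NULL;

static void init_translate(int is_nt)
{
    current_translate = is_nt ? translate_nt : translate_9x;
}

/* Per-keypress entry point: no platform test, just an indirect call. */
static int translate_key(unsigned vkey, unsigned short *out)
{
    assert(current_translate != NULL);
    return current_translate(vkey, out);
}
```

The point of the design is that the platform decision is paid for once in `init_translate()`, so the key-event handler stays free of version checks.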
Thought some explanation of the GetCodePage() function and the use of MultiByteToWideChar() was in order. MS examples of MultiByteToWideChar() usage use the flag CP_ACP or the value returned by GetACP(). This works (translates an 8-bit code-page relative character into a 16-bit Unicode character) as long as the keyboard mapping matches the code-page of your system. This is probably the case for a lot of users, but not for developers or users who work with multiple languages.

For example, my UK Win98 systems have a system code-page identifier of 1252, which means that MultiByteToWideChar(CP_ACP,...) will work for other countries with the same code-page e.g. German, Spanish etc. If I set the keyboard mapping to Polish (1250) or Greek (1253) I get rubbish. I wondered how 'notepad' did it and discovered that 'notepad' *stops you changing* to a keyboard mapping that doesn't share the same code-page as the system! This is how I discovered the WM_INPUTLANGCHANGE / WM_INPUTLANGCHANGEREQUEST messages. However, 'Wordpad' lets you change the keyboard mapping, and handles all the characters fine - so it can be done!

The call to GetLocaleInfo() using an LCID made from the language identifier returned by GetKeyboardLayout() is just the most direct route I found for getting a code-page identifier that changes with the keyboard mapping; there may be a single function call that does this... Anyway, sorry for rambling and hope this helps, cheers, John.

Patch looks good, John, except I think the dx5 part won't compile with Visual C 6.
VC6 cannot do C99 var declarations interspersed with code, so you should put 'BYTE keystate[256];' back where it was, or simply change vkey above it to UINT vkey = MapVirtualKey(scancode, 1);

(In reply to comment #7)
> > As for Unicode input on Win9x in general, it's a great idea using
> > MultiByteToWideChar() to translate the codepage chars to Unicode, but this
> > *will* break any current Win9x SDL app that is already relying on codepage
> > chars (and not Unicode). Unlike WinNT, Win9x does not really support Unicode,
> I don't think this is accurate. My tests suggest that the same problem applies
> to NT systems with the current version - they also currently receive character
> codes in "the 0x80-0xff range with a locale-specific codepage" - so the
> application will break on Win9x *and* NT.

On my Win2k with a Russian keyboard, all the cyrillic keys get translated to '?' by the ToAscii() function, which is what SDL is currently using. Perhaps you were testing with ToAsciiEx()? On Win95, however, ToAscii() translates the cyrillic keys to the 0xa1-0xff range (codepage 1251, I think).

Created attachment 23 [details]
ToUnicode() patch - bug fix for VC6

(In reply to comment #11)
> Patch looks good, John, except I think the dx5 part won't compile with Visual C
> 6. VC6 cannot do C99 var declarations interspersed with code, so you should put
> 'BYTE keystate[256];' back where it was, or simply change vkey above it to
> UINT vkey = MapVirtualKey(scancode, 1);

Oops! Thanks for prompting me to try VC6, it didn't like my pointer-to-function declaration either. I've been using MSYS/MinGW ... cheers, John.

(In reply to comment #12)
> (In reply to comment #7)
> > > As for Unicode input on Win9x in general, it's a great idea using
> > > MultiByteToWideChar() to translate the codepage chars to Unicode, but this
> > > *will* break any current Win9x SDL app that is already relying on codepage
> > > chars (and not Unicode).
> > > Unlike WinNT, Win9x does not really support Unicode,
> > I don't think this is accurate. My tests suggest that the same problem applies
> > to NT systems with the current version - they also currently receive character
> > codes in "the 0x80-0xff range with a locale-specific codepage" - so the
> > application will break on Win9x *and* NT.
>
> On my Win2k with a Russian keyboard, all the cyrillic keys get translated to
> '?' by the ToAscii() function, which is what SDL is currently using. Perhaps
> you were testing with ToAsciiEx()?

That's possible, I *was* flailing around for a while :-)

> On Win95, however, ToAscii() translates the cyrillic keys to the 0xa1-0xff
> range (codepage 1251, I think).

I stand corrected. Interesting. I didn't test with Russian. With Polish, ToAscii() maps the barred-L character to ASCII L and the o-acute is mapped to 0xF3 which is o-acute in code-page 1252 (my default). I get the same effect as you with Russian though.

This is quite good news: doesn't it mean that no application can use a work-round to handle international characters when running on Windows? I've had a look at some SDL-based projects and haven't (yet) found any that would be affected by this fix. Typically:

- no use of unicode field, often with a GUI keyboard for name entry
- ignore characters >= 128 (Tux Paint used to)
- Works internally with Unicode, so it will just start working the same as on X11 (PyGame)

Advice on the SDL documentation Wiki contains this code fragment:

    char ch;
    if ( (keysym.unicode & 0xFF80) == 0 ) {
        ch = keysym.unicode & 0x7F;
    } else {
        printf("An International Character.\n");
    }

which I believe will still work. Do you think it's worth trying to identify affected applications? best regards, John.

(In reply to comment #14)
> This is quite good news: doesn't it mean that no application can use a
> work-round to handle international characters when running on Windows?

Technically, yes.
No Windows SDL application can abuse the unicode field on Win9x *and* WinNT right now. And since the unicode field gets non-1252 codepage chars *only* on Win9x, it should be relatively safe. I seriously doubt anyone is writing and maintaining apps that run only on Win9x right now. And for languages in the codepage 1252 the behavior *will not* change with this patch (one of the beauties of UCS/Unicode). Also the code fragment from the wiki that you mentioned will still stand.

> Do you think it's worth trying to identify affected applications?

All things considered -- I do not think so. As codepage 1252 remains unchanged, if there are any others, they are abusing the Win9x behavior and should correct their code. I found this issue with SDL myself while trying to add support for Russian input in a game. Let's just assume that I was the first, since I did not find any other bug reports re this ;-)

Reassigning this bug to Sam for final deliberation. Sam, please note that the commit of Bug #47 makes the latest version of this patch apply with offset warnings, but it otherwise applies cleanly, still. Alex, John, thank you for all the discussion and cooperation on this bug...the collaboration is really exciting to see! --ryan.

Thanks guys! This patch is now in CVS.

Setting Sam as "QA Contact" on all bugs (even resolved ones) so he'll definitely be in the loop to any further discussion here about SDL. --ryan. |
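As a footnote to the code-page discussion in this thread, here is a small portable sketch of why code page 1252 input looks unchanged after a conversion to Unicode while code page 1251 (Cyrillic) input does not. `cp_byte_to_unicode()` and `is_plain_ascii()` are hypothetical helpers written for this illustration; the first is a deliberately tiny stand-in for what MultiByteToWideChar() does on Windows and covers only the ranges named in its comment.

```c
#include <assert.h>

/* Hypothetical helper: map one byte produced under a Windows code page to a
 * Unicode code point. Only ranges with a simple layout are covered:
 *   - ASCII (0x00-0x7F) is shared by both code pages;
 *   - code page 1252, 0xA0-0xFF, matches Unicode U+00A0-U+00FF;
 *   - code page 1251, 0xC0-0xFF, is the Cyrillic block U+0410-U+044F.
 * Anything else returns 0, meaning "not handled by this sketch". */
static unsigned cp_byte_to_unicode(unsigned codepage, unsigned char byte)
{
    assert(codepage == 1251 || codepage == 1252);
    if (byte < 0x80) return byte;
    if (codepage == 1252 && byte >= 0xA0) return byte;
    if (codepage == 1251 && byte >= 0xC0) return 0x0410u + (byte - 0xC0u);
    return 0;
}

/* The same test as the wiki fragment quoted in the thread. */
static int is_plain_ascii(unsigned short unicode)
{
    return (unicode & 0xFF80) == 0;
}
```

Under these assumptions the byte 0xF3 stays 0x00F3 (o-acute) for code page 1252 but becomes 0x0443 for code page 1251, and the wiki-style check classifies both as international characters either way.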