| Summary: | Deadlock on startup in hid_enumerate | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Jimb Esser <wasteland> |
| Component: | joystick | Assignee: | Sam Lantinga <slouken> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | ||
| Version: | HG 2.1 | ||
| Hardware: | x86 | ||
| OS: | Windows 10 | ||
| Attachments: | Ignore problematic Corsair device | ||
The original user never responded to request for more information, but I've got another user now with the exact same deadlock - also stuck calling HidD_GetSerialNumberString() on a Corsair device. I've attempted to contact this user for more information, but Steam doesn't provide a great venue for that, so might not get any response ^_^. I'm a little curious on whether this code has made it to the SDL used by the non-beta Steam client yet? It seems these particular users would be unable to start Steam at all, although perhaps this deadlock only happens when a parent process/other process already opened the problematic device. We haven't seen this reported in Steam. If you remove the call to HidD_GetSerialNumberString() in hid_enumerate(), does that fix the problem? Also, please try this with the latest SDL snapshot. I don't think anything related to this was fixed, but I'd like to make sure: http://www.libsdl.org/tmp/SDL-2.0.zip Thanks! The two users who got this never responded to requests for more information, unfortunately. From my logs, one appears to have got it once, and immediately uninstalled, never ran it a second time. Other user ran the game successfully once later, so it might be the hang is intermittent (or he ran on a different computer). I'll try removing that call and pushing a new version, but the frequency of getting crash reports of this issue seems to be about once per couple months or less, so it'll be hard to say whether it fixes it. As for Steam, as far as I know, it does not do any deadlock detection in its crash reporting, so if I reproduce this artificially (put in a if (rand) while (1); there), I click the Steam icon, nothing seems to happen, so I click it again, and a second copy launches, I think, so if this is intermittent, I'm not sure how many people would even notice, I guess. Hmm, true. Okay, well please let me know if you get a reproducible case! After updating to the latest SDL code as requested, and disabling the call to HidD_GetSerialNumberString(), it now hangs in the next call, HidD_GetManufacturerString(). I'll try to reach out to the user who got this most recently on Steam to see if he can get me extra information (unlikely he'll respond to the friend request though), but for now I'm going to blacklist this particular device in hid_enumerate, and hopefully that'll avoid this issue for my users, but I'm worried this might still be a general (if very rare) issue. If you give me the user's Steam name, I can reach out to them. Blacklisting this device seems reasonable, and I'm pretty sure Steam would run into the same issue, since it uses the same code. Can you attach a patch with that change? I've emailed you with the Steam IDs of the users that had this, according to my logs. One thought - any chance this is a "hang" due to user intervention - e.g. the driver doing a pop-up saying "Game.exe wants to access your Corsair Gaming Keyboard, but Steam.exe already has it open" or something like that? I detect a deadlock by having a background thread that sleeps and wakes up once a second or so, and if 30 consecutive sleeps see no change in the main thread, I send a crash report - so if something was "legitimately" hanging the main thread for some user interaction (though I can't imagine why that'd particularly be when querying the manufacturer or product string...), it could, theoretically, be firing inaccurately. Would be great to actually talk to one of the users =). I've made a simple patch here: https://github.com/Jimbly/SDL/commit/1e90e83d7a239b34267d82a728f79bd7ff8a1791 Maybe we should look at DevicePath and not even call open_device on it to be safe, but it seems like the hang didn't happen until later, so opening and then closing after getting the attributes is probably safe. Also, slightly suspicious, but perhaps just indicative of the ubiquitousness, in every crash report, the previously enumerated device was a 360 controller (but maybe that's just due to ordering by vendor ID or something, dunno). In *theory* this could be somehow related to me opening the 360 controllers with RawInput elsewhere in the code (not in main SDL yet), but it seems super suspicious that this would then only hang when querying the Corsair devices after it, so this doesn't seem likely, but just wanted to mention it. Created attachment 3873 [details]
Ignore problematic Corsair device
Well, blacklisting that device seems to have fixed the issue in Splody. Now the most recent user is hanging with a new call stack: hid.dll!_HidD_GetProductString() gameoverlayrenderer.dll!5f413d24() dinput8.dll!_fGetProductStringFromDevice() dinput8.dll!CHid_GetProperty() dinput8.dll!CHid_GetDeviceInfo() dinput8.dll!_CDIDev_GetDeviceInfoW() dinput8.dll!_CDIDEnum_Next() dinput8.dll!CDIObj_EnumDevicesW() gameoverlayrenderer.dll!5f3f3a51() Game.exe!SDL_DINPUT_JoystickDetect(JoyStick_DeviceData * * pContext) Definitely affecting Steam! It is intermittent, as this user is getting into the game occasionally, not hanging every time. Patch added, thanks! https://hg.libsdl.org/SDL/rev/9c848db04e09 From your last call stack it looks like we need the SDL DINPUT driver to blacklist this device as well... Maybe... though we're not doing any querying of product name / serial number or anything in the DInput path. If you've got debug symbols for gameoverlayrenderer.dll, I can send you a minidump to see exactly where it's hanging and if it's the same device, or if something else is going wrong. It seems odd that this device would even be showing up under DirectInput, as it does not appear to be a joystick (though Usage was "VR Controls", so, maybe it is, though I can't find any Corsair consumer device besides a keyboard this might be). Do you think the code it's hanging in now is SDL code or is that other, unrelated code, and I should report this to Valve through other channels? (In reply to Jimb Esser from comment #13) > Maybe... though we're not doing any querying of product name / serial number > or anything in the DInput path. If you've got debug symbols for > gameoverlayrenderer.dll, I can send you a minidump to see exactly where it's > hanging and if it's the same device, or if something else is going wrong. > It seems odd that this device would even be showing up under DirectInput, as > it does not appear to be a joystick (though Usage was "VR Controls", so, > maybe it is, though I can't find any Corsair consumer device besides a > keyboard this might be). > > Do you think the code it's hanging in now is SDL code or is that other, > unrelated code, and I should report this to Valve through other channels? That code is SDL code calling into DirectInput. The only reason Steam is showing up in the stack trace is that it's wrapping the enumeration calls, but the calls are happening from DirectInput. Now that I think about it, the calls are happening from within CDIObj_EnumDevicesW(), which is in Microsoft's DirectInput code, so there may actually be nothing we can do about it. I think the underlying issue is a multi-threaded access to the device from multiple applications, so maybe skipping it in the HID enumeration in SDL (and Steam) may take care of the issue. I'll merge these changes over into Steam for the next beta client update and we'll see if that helps. Is there any way for you to ask your customer to opt into the Steam beta client? (In reply to Sam Lantinga from comment #14) > > I'll merge these changes over into Steam for the next beta client update and > we'll see if that helps. Is there any way for you to ask your customer to > opt into the Steam beta client? Not directly, as they declined my Steam friend invite (if there's anyone you can bug about better tools on Steam for developers contacting customers about support issues... ^_^). I can hack in an account-specific MOTD they'll see if they launch the game again, though whether they'll read it (or understand English) is unknown. (In reply to Sam Lantinga from comment #14) > > That code is SDL code calling into DirectInput. Ah, yeah, that makes sense, although it looks like it's not DirectInput calling the (potentially buggy) HID call, but Steam's hook that is doing so (perhaps because something is mapping to a HID device under the hood?). Unless Steam is just hooking an HID call, then there is, potentially, not much we can do about it (although the hook could blacklist the device or something). Hard for me to tell exactly what's going on without full debug symbols though. Steam just hooks _HidD_GetProductString(), it doesn't call it directly. Yeah, we'll see what happens when the Steam beta update goes live. Is that the same customer that you sent me info about previously? I can reach out to him directly if needed. (In reply to Sam Lantinga from comment #16) > Steam just hooks _HidD_GetProductString(), it doesn't call it directly. > > Yeah, we'll see what happens when the Steam beta update goes live. Is that > the same customer that you sent me info about previously? I can reach out to > him directly if needed. Yeah, same one (steam ID ending in 7175). |
I pushed a version of my game to Steam using the latest SDL a month or so ago, the first time using the 2.0.9 codebase for me, and I have had a user unable to launch the game due to a hang/deadlock in hid_enumerate. This has only happened to one user, so it is likely related to specific hardware or buggy drivers on his system, or could be a fluke. However, I only see hundreds of users in a month, so this might be indicative of a larger issue. Full callstack where the game hung was: ntdll.dll!_NtDeviceIoControlFile@40() KERNELBASE.dll!DeviceIoControl() kernel32.dll!DeviceIoControlImplementation() hid.dll!DeviceIoControlHelper() hid.dll!_HidD_GetSerialNumberString@12() hid_enumerate() Line 475 at src\hidapi\windows\hid.c(475) HIDAPI_UpdateDeviceList() Line 911 at src\joystick\hidapi\sdl_hidapijoystick.c(911) HIDAPI_JoystickDetect() Line 967 at src\joystick\hidapi\sdl_hidapijoystick.c(967) SDL_JoystickUpdate_REAL() Line 1089 at src\joystick\sdl_joystick.c(1089) SDL_PumpEvents_REAL() Line 687 at src\events\sdl_events.c(687) It was querying a device which appears to have these IDs: VendorID 0x1B1C (Corsair) ProductID 0x1B3D VersionNumber 0x3006 path \\?\hid#vid_1b1c&pid_1b3d&mi_00&col04#7&1836ff08&0&0003#{4d1e55b2-f16f-11cf-88cb-001111000030} caps {Usage=3 (VR Controls) UsagePage=65474} The immediately previously queried device was "Controller (XBOX 360 For Windows)". I've reached out to the user for additional information and troubleshooting, but no response yet, will update this with any additional info I receive. I am also running with the patch for bug 4477 (https://bugzilla.libsdl.org/show_bug.cgi?id=4477) which adds RawInput support, however this appears to be strictly in the HIDAPI startup code so is probably unrelated.