We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 4585 - Deadlock on startup in hid_enumerate
Summary: Deadlock on startup in hid_enumerate
Status: RESOLVED FIXED
Alias: None
Product: SDL
Classification: Unclassified
Component: joystick (show other bugs)
Version: HG 2.1
Hardware: x86 Windows 10
: P2 normal
Assignee: Sam Lantinga
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-08 16:05 UTC by Jimb Esser
Modified: 2019-07-15 04:17 UTC (History)
0 users

See Also:


Attachments
Ignore problematic Corsair device (1.34 KB, patch)
2019-07-09 02:24 UTC, Jimb Esser
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jimb Esser 2019-04-08 16:05:52 UTC
I pushed a version of my game to Steam using the latest SDL a month or so ago, the first time using the 2.0.9 codebase for me, and I have had a user unable to launch the game due to a hang/deadlock in hid_enumerate.  This has only happened to one user, so it is likely related to specific hardware or buggy drivers on his system, or could be a fluke.  However, I only see hundreds of users in a month, so this might be indicative of a larger issue.

Full callstack where the game hung was:
ntdll.dll!_NtDeviceIoControlFile@40()
KERNELBASE.dll!DeviceIoControl()
kernel32.dll!DeviceIoControlImplementation()
hid.dll!DeviceIoControlHelper()
hid.dll!_HidD_GetSerialNumberString@12()
hid_enumerate() Line 475
	at src\hidapi\windows\hid.c(475)
HIDAPI_UpdateDeviceList() Line 911
	at src\joystick\hidapi\sdl_hidapijoystick.c(911)
HIDAPI_JoystickDetect() Line 967
	at src\joystick\hidapi\sdl_hidapijoystick.c(967)
SDL_JoystickUpdate_REAL() Line 1089
	at src\joystick\sdl_joystick.c(1089)
SDL_PumpEvents_REAL() Line 687
	at src\events\sdl_events.c(687)

It was querying a device which appears to have these IDs:
  VendorID	0x1B1C (Corsair)
  ProductID	0x1B3D
  VersionNumber	0x3006
  path \\?\hid#vid_1b1c&pid_1b3d&mi_00&col04#7&1836ff08&0&0003#{4d1e55b2-f16f-11cf-88cb-001111000030}
  caps	{Usage=3 (VR Controls) UsagePage=65474}

The immediately previously queried device was "Controller (XBOX 360 For Windows)".

I've reached out to the user for additional information and troubleshooting, but no response yet, will update this with any additional info I receive.

I am also running with the patch for bug 4477 (https://bugzilla.libsdl.org/show_bug.cgi?id=4477) which adds RawInput support, however this appears to be strictly in the HIDAPI startup code so is probably unrelated.
Comment 1 Jimb Esser 2019-05-04 01:22:58 UTC
The original user never responded to request for more information, but I've got another user now with the exact same deadlock - also stuck calling HidD_GetSerialNumberString() on a Corsair device.  I've attempted to contact this user for more information, but Steam doesn't provide a great venue for that, so might not get any response ^_^.

I'm a little curious on whether this code has made it to the SDL used by the non-beta Steam client yet?  It seems these particular users would be unable to start Steam at all, although perhaps this deadlock only happens when a parent process/other process already opened the problematic device.
Comment 2 Sam Lantinga 2019-06-12 03:08:12 UTC
We haven't seen this reported in Steam.

If you remove the call to HidD_GetSerialNumberString() in hid_enumerate(), does that fix the problem?
Comment 3 Sam Lantinga 2019-06-12 03:08:51 UTC
Also, please try this with the latest SDL snapshot. I don't think anything related to this was fixed, but I'd like to make sure:
http://www.libsdl.org/tmp/SDL-2.0.zip

Thanks!
Comment 4 Jimb Esser 2019-06-12 03:19:04 UTC
The two users who got this never responded to requests for more information, unfortunately.  From my logs, one appears to have got it once, and immediately uninstalled, never ran it a second time.  Other user ran the game successfully once later, so it might be the hang is intermittent (or he ran on a different computer).  I'll try removing that call and pushing a new version, but the frequency of getting crash reports of this issue seems to be about once per couple months or less, so it'll be hard to say whether it fixes it.

As for Steam, as far as I know, it does not do any deadlock detection in its crash reporting, so if I reproduce this artificially (put in a if (rand) while (1); there), I click the Steam icon, nothing seems to happen, so I click it again, and a second copy launches, I think, so if this is intermittent, I'm not sure how many people would even notice, I guess.
Comment 5 Sam Lantinga 2019-06-12 14:43:44 UTC
Hmm, true. Okay, well please let me know if you get a reproducible case!
Comment 6 Jimb Esser 2019-07-08 12:16:01 UTC
After updating to the latest SDL code as requested, and disabling the call to HidD_GetSerialNumberString(), it now hangs in the next call, HidD_GetManufacturerString().

I'll try to reach out to the user who got this most recently on Steam to see if he can get me extra information (unlikely he'll respond to the friend request though), but for now I'm going to blacklist this particular device in hid_enumerate, and hopefully that'll avoid this issue for my users, but I'm worried this might still be a general (if very rare) issue.
Comment 7 Sam Lantinga 2019-07-08 18:10:44 UTC
If you give me the user's Steam name, I can reach out to them.

Blacklisting this device seems reasonable, and I'm pretty sure Steam would run into the same issue, since it uses the same code.

Can you attach a patch with that change?
Comment 8 Jimb Esser 2019-07-09 02:20:56 UTC
I've emailed you with the Steam IDs of the users that had this, according to  my logs.

One thought - any chance this is a "hang" due to user intervention - e.g. the driver doing a pop-up saying "Game.exe wants to access your Corsair Gaming Keyboard, but Steam.exe already has it open" or something like that?

I detect a deadlock by having a background thread that sleeps and wakes up once a second or so, and if 30 consecutive sleeps see no change in the main thread, I send a crash report - so if something was "legitimately" hanging the main thread for some user interaction (though I can't imagine why that'd particularly be when querying the manufacturer or product string...), it could, theoretically, be firing inaccurately.  Would be great to actually talk to one of the users =).

I've made a simple patch here: https://github.com/Jimbly/SDL/commit/1e90e83d7a239b34267d82a728f79bd7ff8a1791 

Maybe we should look at DevicePath and not even call open_device on it to be safe, but it seems like the hang didn't happen until later, so opening and then closing after getting the attributes is probably safe.

Also, slightly suspicious, but perhaps just indicative of the ubiquitousness, in every crash report, the previously enumerated device was a 360 controller (but maybe that's just due to ordering by vendor ID or something, dunno).  In *theory* this could be somehow related to me opening the 360 controllers with RawInput elsewhere in the code (not in main SDL yet), but it seems super suspicious that this would then only hang when querying the Corsair devices after it, so this doesn't seem likely, but just wanted to mention it.
Comment 9 Jimb Esser 2019-07-09 02:24:20 UTC
Created attachment 3873 [details]
Ignore problematic Corsair device
Comment 10 Jimb Esser 2019-07-13 14:36:27 UTC
Well, blacklisting that device seems to have fixed the issue in Splody.  Now the most recent user is hanging with a new call stack:

hid.dll!_HidD_GetProductString()
gameoverlayrenderer.dll!5f413d24()
dinput8.dll!_fGetProductStringFromDevice()
dinput8.dll!CHid_GetProperty()
dinput8.dll!CHid_GetDeviceInfo()
dinput8.dll!_CDIDev_GetDeviceInfoW()
dinput8.dll!_CDIDEnum_Next()
dinput8.dll!CDIObj_EnumDevicesW()
gameoverlayrenderer.dll!5f3f3a51()
Game.exe!SDL_DINPUT_JoystickDetect(JoyStick_DeviceData * * pContext)


Definitely affecting Steam!  It is intermittent, as this user is getting into the game occasionally, not hanging every time.
Comment 11 Sam Lantinga 2019-07-14 23:49:12 UTC
Patch added, thanks!
https://hg.libsdl.org/SDL/rev/9c848db04e09
Comment 12 Sam Lantinga 2019-07-14 23:50:41 UTC
From your last call stack it looks like we need the SDL DINPUT driver to blacklist this device as well...
Comment 13 Jimb Esser 2019-07-14 23:59:42 UTC
Maybe... though we're not doing any querying of product name / serial number or anything in the DInput path.  If you've got debug symbols for gameoverlayrenderer.dll, I can send you a minidump to see exactly where it's hanging and if it's the same device, or if something else is going wrong.  It seems odd that this device would even be showing up under DirectInput, as it does not appear to be a joystick (though Usage was "VR Controls", so, maybe it is, though I can't find any Corsair consumer device besides a keyboard this might be).

Do you think the code it's hanging in now is SDL code or is that other, unrelated code, and I should report this to Valve through other channels?
Comment 14 Sam Lantinga 2019-07-15 00:05:14 UTC
(In reply to Jimb Esser from comment #13)
> Maybe... though we're not doing any querying of product name / serial number
> or anything in the DInput path.  If you've got debug symbols for
> gameoverlayrenderer.dll, I can send you a minidump to see exactly where it's
> hanging and if it's the same device, or if something else is going wrong. 
> It seems odd that this device would even be showing up under DirectInput, as
> it does not appear to be a joystick (though Usage was "VR Controls", so,
> maybe it is, though I can't find any Corsair consumer device besides a
> keyboard this might be).
> 
> Do you think the code it's hanging in now is SDL code or is that other,
> unrelated code, and I should report this to Valve through other channels?

That code is SDL code calling into DirectInput. The only reason Steam is showing up in the stack trace is that it's wrapping the enumeration calls, but the calls are happening from DirectInput. Now that I think about it, the calls are happening from within CDIObj_EnumDevicesW(), which is in Microsoft's DirectInput code, so there may actually be nothing we can do about it. 

I think the underlying issue is a multi-threaded access to the device from multiple applications, so maybe skipping it in the HID enumeration in SDL (and Steam) may take care of the issue.

I'll merge these changes over into Steam for the next beta client update and we'll see if that helps. Is there any way for you to ask your customer to opt into the Steam beta client?
Comment 15 Jimb Esser 2019-07-15 01:32:45 UTC
(In reply to Sam Lantinga from comment #14)
> 
> I'll merge these changes over into Steam for the next beta client update and
> we'll see if that helps. Is there any way for you to ask your customer to
> opt into the Steam beta client?

Not directly, as they declined my Steam friend invite (if there's anyone you can bug about better tools on Steam for developers contacting customers about support issues... ^_^).  I can hack in an account-specific MOTD they'll see if they launch the game again, though whether they'll read it (or understand English) is unknown.

(In reply to Sam Lantinga from comment #14)
> 
> That code is SDL code calling into DirectInput.

Ah, yeah, that makes sense, although it looks like it's not DirectInput calling the (potentially buggy) HID call, but Steam's hook that is doing so (perhaps because something is mapping to a HID device under the hood?).  Unless Steam is just hooking an HID call, then there is, potentially, not much we can do about it (although the hook could blacklist the device or something).  Hard for me to tell exactly what's going on without full debug symbols though.
Comment 16 Sam Lantinga 2019-07-15 03:30:59 UTC
Steam just hooks _HidD_GetProductString(), it doesn't call it directly.

Yeah, we'll see what happens when the Steam beta update goes live. Is that the same customer that you sent me info about previously? I can reach out to him directly if needed.
Comment 17 Jimb Esser 2019-07-15 04:17:19 UTC
(In reply to Sam Lantinga from comment #16)
> Steam just hooks _HidD_GetProductString(), it doesn't call it directly.
> 
> Yeah, we'll see what happens when the Steam beta update goes live. Is that
> the same customer that you sent me info about previously? I can reach out to
> him directly if needed.

Yeah, same one (steam ID ending in 7175).