Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDL audio crashes on multicore Windows XP machines #104

Closed
SDLBugzilla opened this issue Feb 10, 2021 · 0 comments
Closed

SDL audio crashes on multicore Windows XP machines #104

SDLBugzilla opened this issue Feb 10, 2021 · 0 comments
Labels
abandoned Bug has been abandoned for various reasons

Comments

@SDLBugzilla
Copy link
Collaborator

This bug report was migrated from our old Bugzilla tracker.

Reported in version: HG 2.0
Reported for operating system, platform: Windows (XP), x86

Comments on the original bug report:

On 2008-09-22 17:05:15 +0000, James Haley wrote:

(I was asked to submit this bug several months ago but didn't get around to it at that time. I have done more investigation in the meanwhile and believe I may have a possible fix.)

Programs making use of SDL audio functionality, particularly through SDL_mixer, suffer random crashes while writing into the audio buffer on Windows XP machines with multicore processors. This manifests as a read or write access violation at some random-appearing address inside the middle of the audio buffer (it is specifically not a result of any buffer underflow or overflow error, or of any other problem in the user program). This problem occurs regardless of the audio driver selected (both DirectSound and windib suffer equally).

Programs known to suffer this problem unpatched included Chocolate Doom, PrBoom, PrBoom-Plus, and my own program, the Eternity Engine. A workaround was recently found by Andrey Budko of the PrBoom-Plus team which involves calling the SetProcessAffinityMask Win32 API function with a valid mask value that the user must provide in the configuration file (a far less than ideal solution obviously). Restricting the program to a single core stops all crashes, revealing that the problem must be a thread contention issue which only occurs when the SDL audio thread is running as an actual hardware thread.

The only code point I can identify along the call stack between SDL_UpdateAudio and the application's SDL_mixer postmix callback routine that involves use of multithreaded data is the call to the audiospec callback rountine, which is placed between mutex acquisition and release via SDL_mutexP and SDL_mutexV.

However, SDL_UpdateAudio neglects to check for an error return value from SDL_mutexP. This means that if a problem occurred in attempting to acquire the mutex, the callback will be issued anyway. Therefore it would strengthen SDL's resistance to problems with multithreading if it checked for the error return value of -1 at this point in the code and made no calls to the audiospec callback unless the mutex is properly owned by the current thread.

It is also worth noting that the SDL_mutexP function does not check for and has no method for reporting a return value of WAIT_ABANDONED from the Win32 API function WaitForSingleObject. In this context, it would mean that the other thread utilizing the mutex has crashed. If this were to happen, the usual course of action would be to exit the program in a clean and graceful manner. However, I am not personally aware of a way that this check could be added without causing possible compatibility issues for users of the SDL multithreading API under Windows, nor am I aware if this return value plays a role in the reported problem. It is a possibility to keep in mind, however.

This issue affects both the 1.2 and 1.3 branches, as the code in question has not appreciably changed between the two.

On 2009-02-16 21:14:35 +0000, Sam Lantinga wrote:

Thanks for the info. I don't suppose you have a good test case? :)

On 2009-12-15 22:33:05 +0000, Ryan C. Gordon wrote:

Bumping priority on a set of bugs.

--ryan.

On 2010-01-07 17:22:52 +0000, James Haley wrote:

This is a follow-up to verify that this bug is still occurring in SDL 1.2.14 with the latest SDL_mixer version under Windows 7 on a quad core machine. I recently built this machine and so this is the first opportunity I have had to capture this crash myself - all previous data was obtained from end users.

The crash as I was able to capture it occurred in ntdll.dll at 0x773f8c39 as called from winmm.dll - evidently this bug is related to MIDI and not digital sound, and evidence from other users (Simon Howard of Chocolate Doom and Andrey Budko of PrBoom-Plus) now agrees with this. The code accessed a NULL pointer as a structure, and thus crashed trying to write to 0x00000014.

I believe there are possible race conditions in the Win32 native MIDI code relating to the callback which is passed to MCI and is used to fill the buffer with MIDI events. This callback is dispatched asynchronously from deep inside the winmm system each time it wants more events, so any data it touches needs protection from concurrent modification.

On 2010-01-07 17:37:30 +0000, Simon Howard wrote:

Just to clarify, because this bug seems to be poorly named; it doesn't seem to be a Windows-only thing. Under Linux I get lockups on multicore machines if the thread affinity workaround is not used.

On 2010-01-17 09:14:10 +0000, Ryan C. Gordon wrote:

(In reply to comment # 4)

Just to clarify, because this bug seems to be poorly named; it doesn't seem to
be a Windows-only thing. Under Linux I get lockups on multicore machines if
the thread affinity workaround is not used.

Are you using native MIDI on Linux, or the Timidity codepath?

--ryan.

On 2010-01-17 09:16:07 +0000, Simon Howard wrote:

Timidity.

On 2010-01-17 09:57:56 +0000, Ryan C. Gordon wrote:

(In reply to comment # 6)

Timidity.

Intriguing!

Can someone get me a stacktrace? Windows, Linux, doesn't matter. It would be useful to know exactly where it crashes, and who called the crashing function.

Thanks,
--ryan.

On 2010-01-17 10:19:40 +0000, Simon Howard wrote:

I've tried in the past to attach a debugger to get a stack trace, but each time, I was unable to get a meaningful trace - the stack just seemed to be junk. I'm not sure why.

In the Linux case, the program just hard locks-up and stops responding to any input. I'm assuming that the Windows situation is the same.

Anyway, right now I'm currently running Chocolate Doom on my EeePC with the affinity workaround disabled. The CPU is an Intel Atom with hyperthreading, and I managed to reproduce the bug on here before. I've set up the level so that two monsters are constantly shooting me (ie. playing sound effects), so hopefully this should trigger it. It has been a while since I last looked into this bug and reproduced it, so it's worth my re-confirming that it's still a problem.

On 2010-01-19 08:26:13 +0000, James Haley wrote:

I have a stack dump, but since the Windows crash occurs inside one of the winmm threads that is related to the MIDI event pump callback, there's very little information to be obtained from it, thanks to Windows' closed-source nature. It looks like this:

773f8c39 < IP on offending thread is here
773df871
773f8c64
773f8b48
7432f651
(etc for dozens of lines)

The first 4 call addresses are inside ntdll.dll - the majority of the remainder are in the winmm (aka MCI) system. The crash was an access violation at 0x00000014, and the instruction involved was moving the contents of an address to a register after having added 0x14 to the address - quite likely a structure field offset, but the structure pointer was unexpectedly NULL.

All the addresses I could see on threads other than the app's main thread seemed to be inside system DLLs. It is notable that my application itself was stopped in the middle of the rendering process for the current frame, and was not performing any action to manipulate or change the state of the music playback.

If you think it will do any good, I can try to get a better record of the crash, including the full state of all threads. I just have to wait for it to happen again after disabling the process affinity mask.

On 2011-01-07 16:09:32 +0000, Nathaniel J Fries wrote:

It's been almost a year since the last update.

Is this bug still an issue? If so, could you point me to source code of a project known to have these issues, and example files known to have this occur?

Also, what are you using to trace the exception on Windows? MSVC debugger, GCD, SetUnhandledExceptionFilter(), Structured Exception Handling, or some method I don't know of?

Do Windows and Linux crashes seem to be totally related (IE, the same exact files and circumstances cause a crash)?

On 2011-01-10 06:58:50 +0000, James Haley wrote:

(In reply to comment # 10)

It's been almost a year since the last update.

Is this bug still an issue? If so, could you point me to source code of a
project known to have these issues, and example files known to have this occur?

Also, what are you using to trace the exception on Windows? MSVC debugger, GCD,
SetUnhandledExceptionFilter(), Structured Exception Handling, or some method I
don't know of?

Do Windows and Linux crashes seem to be totally related (IE, the same exact
files and circumstances cause a crash)?

To the best of our knowledge the Windows crash is unresolved. Simon and Andrey Budko of the PrBoom-Plus project no longer believe that the Linux crash was related, and have stated that it seems to be resolved.

The source code of my project is at http://mancubus.net/svn/hosted/eternity/trunk - you need one or more commercial or shareware DOOM IWAD files in order to run this program, however. They can be placed either in the program's working directory or into the individual corresponding folder for that game under the "base" folder.

The source code for Chocolate Doom, another SDL-based DOOM port, can be found on Sourceforge - its project homepage is the first hit for a Google search.

The exception which I caught 3 years ago was caught in Visual Studio 6. Our development has moved to VC++ 2008 since then (the program can also be compiled with a CMake system for Linux, but you probably won't find the bug there).

To disable the process affinity mask workaround, you must edit base/system.cfg and set the affinity mask setting to 0 (it defaults to 1 due to the fact this problem destabilizes the program if left uncorrected).

On 2013-05-21 00:53:37 +0000, Sam Lantinga wrote:

Ryan, can you take a look at this for the 2.0 release?

On 2013-07-12 22:16:03 +0000, Ryan C. Gordon wrote:

(Sorry if you get a lot of copies of this email, we're touching dozens of bug reports right now.)

Tagging a bunch of bugs as target-2.0.0, Priority 2.

This means we're in the final stretch for an official SDL 2.0.0 release! These are the bugs we really want to fix before shipping if humanly possible.

That being said, we don't promise to fix them because of this tag, we just want to make sure we don't forget to deal with them before we bless a final 2.0.0 release, and generally be organized about what we're aiming to ship.

Hopefully you'll hear more about this bug soon. If you have more information (including "this got fixed at some point, nevermind"), we would love to have you come add more information to the bug report when you have a moment.

Thanks!
--ryan.

On 2018-08-06 21:20:18 +0000, Ryan C. Gordon wrote:

Hello, and sorry if you're getting dozens of copies of this message by email.

We are closing out bugs that appear to be abandoned in some form. This can happen for lots of reasons: we couldn't reproduce it, conversation faded out, the bug was noted as fixed in a comment but we forgot to mark it resolved, the report is good but the fix is impractical, we fixed it a long time ago without realizing there was an associated report, etc.

Individually, any of these bugs might have a better resolution (such as WONTFIX or WORKSFORME or INVALID) but we've added a new resolution of ABANDONED to make this easily searchable and make it clear that it's not necessarily unreasonable to revive a given bug report.

So if this bug is still a going concern and you feel it should still be open: please feel free to reopen it! But unless you respond, we'd like to consider these bugs closed, as many of them are several years old and overwhelming our ability to prioritize recent issues.

(please note that hundred of bug reports were sorted through here, so we apologize for any human error. Just reopen the bug in that case!)

Thanks,
--ryan.

@SDLBugzilla SDLBugzilla added abandoned Bug has been abandoned for various reasons bug labels Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abandoned Bug has been abandoned for various reasons
Projects
None yet
Development

No branches or pull requests

1 participant