We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 4771

Summary: MinGW and VisualC builds of SDL2 are not binary-compatible
Product: SDL Reporter: Ethan Lee <flibitijibibo>
Component: threadAssignee: Sam Lantinga <slouken>
Status: RESOLVED FIXED QA Contact: Sam Lantinga <slouken>
Severity: major    
Priority: P2 CC: icculus, jacek, sezeroz
Version: HG 2.0Keywords: target-2.0.12
Hardware: All   
OS: Windows (All)   

Description Ethan Lee 2019-08-25 20:45:52 UTC
Consider the following scenario:

1. A program built with MinGW is linked to SDL2
2. The program ships with the official SDL2 x86 binary, since it's dependency-free
3. Run program, program tries to create a thread via SDL
4. Crash on corrupted stack!

The reason for this is a pretty serious incompatibility in SDL_thread.h:

https://hg.libsdl.org/SDL/file/7693573f862d/include/SDL_thread.h#l72

The official binary is built with HAVE_LIBC=0, while most MinGW setups will have HAVE_LIBC=1 in their SDL_config.h. To the developer this is completely invisible, as everything is macro'd around to work with the "official" SDL thread function, but if any mixing/matching happens, corruption will occur.

This was pointed out in bug 3844, but this report is specifically to point out the API incompatibility.

I'd be okay with something like an `SDL_CreateThreadWithStackSizeWIN32` to allow for calling the function from any Windows configuration, but in the long term this will probably need to be taken care of more aggressively, as the crash is _extremely_ hard to diagnose without extensive knowledge of SDL's inner workings beforehand.
Comment 1 Sam Lantinga 2019-08-26 17:10:34 UTC
Have you tested that the proposed patch in bug 3844 fixes this?
Comment 2 Ethan Lee 2019-08-26 17:32:02 UTC
I did the equivalent in the headers provided by the MinGW package (read: set HAVE_LIBC=0 by force) and it does indeed work for building against the VC binaries with the MinGW compiler, so the FNA binaries are now built with that workaround. Looking at the Windows SDL_systhread.c though, I'm not 100% sure it would clean up the differences.

The problem is that the public-facing headers have some critical defines that totally reconfigure how the binary is built. So while forcing HAVE_LIBC=0 works on the application side, it just shifts the problem to making our binaries incompatible with the MinGW SDL binaries. My guess is the workaround fixed VS compatibility, but shipping with a MinGW binary would explode in the same way.
Comment 3 Ozkan Sezer 2019-08-26 17:47:02 UTC
(In reply to Ethan Lee from comment #2)
> [...]  So while forcing
> HAVE_LIBC=0 works on the application side, it just shifts the problem to
> making our binaries incompatible with the MinGW SDL binaries. My guess is
> the workaround fixed VS compatibility, but shipping with a MinGW binary
> would explode in the same way.

If the patch from bug 3844 were applied, the issue will be with
the older versions of mingw-built SDL dlls, but I think the new
ones would be uniform even if the api is changed. I suggest that
patch be applied for 2.1 whenever that happens.
Comment 4 Ethan Lee 2019-08-26 17:49:44 UTC
Yeah, this will probably have to roll with 2.1. The only question is, does this then go back and break why that special path exists in the first place? If the docs are still accurate it sounds like we have to pass the _beginthread and _endthread functions for the VC setup to keep working. Someone with better knowledge of this special path will have to chime in, as I'm pretty clueless from here...
Comment 5 Ozkan Sezer 2019-08-26 17:55:52 UTC
(In reply to Ethan Lee from comment #4)
> Someone with better knowledge of this special path will have to chime in,
> as I'm pretty clueless from here...

The same concerns & questions asked in bug #3844 :)
Comment 6 Ethan Lee 2019-08-26 18:02:45 UTC
>The same concerns & questions asked in bug #3844 :)

Yep, it's all fun and games until someone gets an ifdef in their headers. >_<

FWIW, I would opt for the VC path simply because the MinGW path is perfectly capable of sending the begin/end functions, while the opposite may not be quite as easy. It's a cruel breakage, but since the VC binaries are _technically_ the official binary... I dunno. MinGW users would have to voice their opinions on the subject. I'll forward this to at least one company I know of that cares about this.
Comment 7 Jacek Caban 2019-08-26 18:53:09 UTC
The way it's done for MSVC is rather questionable... I would suggest to use CreateThread() and ExitThread() directly inside SDL2. This would make _beginthread() and _endthread() not needed and SDL_CreateThread could be exposed for MSVC the same way as it's exposed for other platforms, without additional hacks.
Comment 8 Sam Lantinga 2019-08-27 17:49:12 UTC
_beginthread() and _endthread() are needed for properly initializing and destroying thread specific resources in various versions of the Windows C runtime. This isn't important for SDL internal threads, but any application threads that call runtime functions need to have the appropriate runtime support functions called at the beginning and end of their thread routine.
Comment 9 Jacek Caban 2019-08-27 18:06:28 UTC
Windows C runtime is capable of initializing the data on demand and freeing it based using its DllMain(DLL_THREAD_DETACH) (it's a bit more complicated for statically linked crt, but its has mechanisms for that).

See how initialization is implemented in Wine:
https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/msvcrt/thread.c#l38
and here is freeing:
https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/msvcrt/main.c#l151

Note that if it didn't do that, crt would be useless for anyone using CreateThread(), which is extremely common in Windows.
Comment 10 Ryan C. Gordon 2019-09-20 20:47:38 UTC
We're changing how we do SDL release versions; now releases will be even numbers (2.0.10, 2.0.12, etc), and as soon as we tag a release, we'll move the internal version number to an odd number (2.0.12 ships, we tag the latest in revision control as 2.0.13 immediately, which will become 2.0.14 on release, etc).

As such, I'm moving the bugs tagged with target-2.0.11 to target 2.0.12. Sorry if you get a lot of email from this change!

Thanks,
--ryan.
Comment 11 Ryan C. Gordon 2019-09-20 20:48:41 UTC
We're changing how we do SDL release versions; now releases will be even numbers (2.0.10, 2.0.12, etc), and as soon as we tag a release, we'll move the internal version number to an odd number (2.0.12 ships, we tag the latest in revision control as 2.0.13 immediately, which will become 2.0.14 on release, etc).

As such, I'm moving the bugs tagged with target-2.0.11 to target 2.0.12. Sorry if you get a lot of email from this change!

Thanks,
--ryan.
Comment 12 Ryan C. Gordon 2019-12-04 16:25:20 UTC
We may have to punt on this for 2.0.12 (and possibly wait until we break ABI in 2.1, too), but I guess I should ask:

Is there any reason we can't just remove the HAVE_LIBC check and always require a beginthread/endthread function on Windows, and then the macro magic does something more like...

    #if !HAVE_LIBC
    #define SDL_beginthread _beginthreadex
    #else
    #define SDL_beginthread NULL
    #endif


...then SDL_CreateThread() becomes a single function declaration that has no #ifdefs beyond _WIN32, the calling app at runtime uses its own beginthread implementation, and we make the internal implementation smart enough to not call beginthread if the function pointer is NULL.

(SDL_CreateThread() is still #ifdef'd between Windows and everyone-else, at least until 2.1, where we can probably make the conditionals vanish entirely.)

--ryan.
Comment 13 Ryan C. Gordon 2019-12-04 16:26:40 UTC
(In reply to Ryan C. Gordon from comment #12)
> Is there any reason we can't just remove the HAVE_LIBC check and always
> require a beginthread/endthread function on Windows, and then the macro
> magic does something more like...

(This would solve Bug #3844 too, I think.)

--ryan.
Comment 14 Ethan Lee 2019-12-04 16:33:23 UTC
(In reply to Ryan C. Gordon from comment #12)
> We may have to punt on this for 2.0.12 (and possibly wait until we break ABI
> in 2.1, too), but I guess I should ask:
> 
> Is there any reason we can't just remove the HAVE_LIBC check and always
> require a beginthread/endthread function on Windows, and then the macro
> magic does something more like...
> 
>     #if !HAVE_LIBC
>     #define SDL_beginthread _beginthreadex
>     #else
>     #define SDL_beginthread NULL
>     #endif
> 
> 
> ...then SDL_CreateThread() becomes a single function declaration that has no
> #ifdefs beyond _WIN32, the calling app at runtime uses its own beginthread
> implementation, and we make the internal implementation smart enough to not
> call beginthread if the function pointer is NULL.
> 
> (SDL_CreateThread() is still #ifdef'd between Windows and everyone-else, at
> least until 2.1, where we can probably make the conditionals vanish
> entirely.)
> 
> --ryan.

That would at least fix the corruption, yes. It would still be a problem if a MinGW build used the official SDL2.dll and it didn't have beginthread available, which it sounds like is the real root issue. That said, if we can make it so there's a MASSIVE error message when the application's SDL_beginthread is NULL and SDL2.dll doesn't have beginthread, that would be enough for people to identify the bug and fix on their own.
Comment 15 Ryan C. Gordon 2019-12-04 18:42:19 UTC
(In reply to Ethan Lee from comment #14)
> That would at least fix the corruption, yes. It would still be a problem if
> a MinGW build used the official SDL2.dll and it didn't have beginthread
> available, which it sounds like is the real root issue.

Fwiw, they would need to get an official future 2.0.12 build and its headers with this solution, but they're already in a different but equally dangerous ABI situation until they upgrade those headers.

At least on Windows, we're generally in the situation where the affected project can fetch SDL at their whim and not from, say, the whims of whatever is preinstalled on an end-user's system.

--ryan.
Comment 16 Sam Lantinga 2020-02-09 22:59:32 UTC
(In reply to Ryan C. Gordon from comment #12)
> We may have to punt on this for 2.0.12 (and possibly wait until we break ABI
> in 2.1, too), but I guess I should ask:
> 
> Is there any reason we can't just remove the HAVE_LIBC check and always
> require a beginthread/endthread function on Windows, and then the macro
> magic does something more like...
> 
>     #if !HAVE_LIBC
>     #define SDL_beginthread _beginthreadex
>     #else
>     #define SDL_beginthread NULL
>     #endif
> 
> 
> ...then SDL_CreateThread() becomes a single function declaration that has no
> #ifdefs beyond _WIN32, the calling app at runtime uses its own beginthread
> implementation, and we make the internal implementation smart enough to not
> call beginthread if the function pointer is NULL.
> 
> (SDL_CreateThread() is still #ifdef'd between Windows and everyone-else, at
> least until 2.1, where we can probably make the conditionals vanish
> entirely.)
> 
> --ryan.

This seems like the right approach. Can we do this in a way that doesn't break the ABI for 2.0? By ABI, I mean the officially released binaries for SDL. If someone is compiling it themselves using Visual Studio, I don't mind them recompiling with a new release.
Comment 17 Sam Lantinga 2020-02-11 16:27:27 UTC
Fixed!
https://hg.libsdl.org/SDL/rev/607750a641d7
https://hg.libsdl.org/SDL/rev/eb6023298d0e

This makes sure that all builds of SDL are compatible with the official release binaries and the default Visual Studio binaries.