We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 5378

Summary: [PATCH] Add SDL_mutex implementation using SRW Locks (choose at runtime)
Product: SDL Reporter: Joel Linn <jl>
Component: threadAssignee: Sam Lantinga <slouken>
Status: RESOLVED FIXED QA Contact: Sam Lantinga <slouken>
Severity: normal    
Priority: P2 CC: sezeroz
Version: 2.0.13Keywords: target-2.0.16
Hardware: All   
OS: Windows (All)   
Attachments: Patch 01 v1
Add SDL_mutex implementation using Windows Slim Reader/Writer
Benchmark results
Add SDL_mutex implementation using Windows SRW Locks v3

Description Joel Linn 2020-12-03 22:09:05 UTC
I want to replace the current Windows mutex implementation (uses Critial Sections) with a new one using Slim Reader/Writer (SRW) Locks. They are better in almost every aspect and considered best practice because:

- sizeof(void*)<=8 compared to 40 bytes for CRITICAL_SECTION. (Although to be re-entry sizeof(SDL_mutex) would be 8+4+4=16 bytes.) Improves cache locality.
- Less kernel context switches due to user-space implementation inside ntdll with atomic compare-exchange and light sleep feature (on win 10).

A hacky implementation shows significant speed-ups in some cases (e.g. testatomic.c in "Mode: Mutex")

SRW Locks have been introduced in Windows Vista, hence my question what to do about older Windows Versions? I see three paths:

1. Completely ditch the CriticalSection code. Cleanest but most disruptive.
2. Keep the old code and select between the two with compile time defines. Would require anyone with older Windows Versions to build a custom version of SDL2.
3. Load the functions dynamically and fall back to the old code if the APIs aren't available. Lot's of function pointers and branching required but built binaries are 100% backwards compatible.

What is the current status of Win Vista/XP compatibility anyways? I'm ok with putting the extra effort into supporting both APIs dynamically (3) - but only if it is a requirement of course.
Comment 1 Sam Lantinga 2020-12-05 11:40:34 UTC
SDL mutexes are required to provide recursive lock semantics, so SRW locks aren't appropriate here. Let me know if I'm missing anything?

To answer your question, Windows XP is still supported, so I'd want this to be runtime selectable.
Comment 2 Joel Linn 2020-12-05 19:15:02 UTC
Yes SRW Locks are not recursive. SDL needs to track recursion count and current owner of the lock itself (like the current pthreads implementation does). That's what I meant by "(Although to be re-entry sizeof(SDL_mutex) would be 8+4+4=16 bytes.)". All statements (about the benefits) I did incorporated that fact already though.

I have since implemented runtime selection, will test on WinXP and submit a Patch later.
Comment 3 Joel Linn 2020-12-05 22:52:42 UTC
Created attachment 4555 [details]
Patch 01 v1

I explicitly tested on a XP VM. (Sidenote: It seems CS performance has degraded on Win10)

There is some things I'd like to discuss:

1. The implementation is chosen when the first mutex is created using `SDL_CreateMutex`. When two threads try to do that simultaneously the jump table might get corrupted (-> crash). I think this is unlikely to happen though (??)
2. I added a hint to force the use of Critical Sections. Afaik the hint system doesn't use any mutexes. Still feels weird tough to invoke something like that from mutex code. I reckon this can be left out (why would a user want to change it?) but it's handy for doing benchmarks.
Comment 4 Sam Lantinga 2020-12-07 17:00:29 UTC
In general this looks good, we can add it for testing after the 2.0.14 release.

BTW, you're missing a call to FreeLibrary() in there.
Comment 5 Sam Lantinga 2020-12-07 17:01:56 UTC
Out of curiosity, does the additional work to support recursive mutexes change the performance gains?

Can you add your testing methodology and testing results between the two implementations?
Comment 6 Ozkan Sezer 2020-12-07 17:16:47 UTC
I suggest the following on top of your patch for compatibility:

diff --git a/src/thread/windows/SDL_sysmutex.c b/src/thread/windows/SDL_sysmutex.c
--- a/src/thread/windows/SDL_sysmutex.c
+++ b/src/thread/windows/SDL_sysmutex.c
@@ -57,10 +57,16 @@
 /**
  * Implementation based on Slim Reader/Writer (SRW) Locks for Win 7 and newer.
  */
+#ifndef SRWLOCK_INIT
+#define SRWLOCK_INIT {0}
+typedef struct _SRWLOCK {
+  PVOID Ptr;
+} SRWLOCK, *PSRWLOCK;
+#endif
 
-typedef VOID(WINAPI *pfnReleaseSRWLockExclusive)(_Inout_ PSRWLOCK);
-typedef VOID(WINAPI *pfnAcquireSRWLockExclusive)(_Inout_ PSRWLOCK);
-typedef BOOLEAN(WINAPI *pfnTryAcquireSRWLockExclusive)(_Inout_ PSRWLOCK);
+typedef VOID(WINAPI *pfnReleaseSRWLockExclusive)(PSRWLOCK);
+typedef VOID(WINAPI *pfnAcquireSRWLockExclusive)(PSRWLOCK);
+typedef BOOLEAN(WINAPI *pfnTryAcquireSRWLockExclusive)(PSRWLOCK);
 static pfnReleaseSRWLockExclusive pReleaseSRWLockExclusive = NULL;
 static pfnAcquireSRWLockExclusive pAcquireSRWLockExclusive = NULL;
 static pfnTryAcquireSRWLockExclusive pTryAcquireSRWLockExclusive = NULL;
Comment 7 Joel Linn 2020-12-07 17:35:29 UTC
(In reply to Sam Lantinga from comment #4)
> In general this looks good, we can add it for testing after the 2.0.14
> release.
> 
> BTW, you're missing a call to FreeLibrary() in there.

I do indeed miss that call. That would only be legal when all mutexes are destroyed (needs a global counter of mutexes) or in `atexit()`. In other places (e.g. `src/locale/windows/SDL_syslocale.c`) `kernel32.dll` is also not freed. It's difficult to find out when. `__attribute__((con/de-structor))` is sadly not supported by all compilers.
Comment 8 Joel Linn 2020-12-07 17:44:29 UTC
(In reply to Ozkan Sezer from comment #6)
> I suggest the following on top of your patch for compatibility:
> 
> diff --git a/src/thread/windows/SDL_sysmutex.c
> b/src/thread/windows/SDL_sysmutex.c
> --- a/src/thread/windows/SDL_sysmutex.c
> +++ b/src/thread/windows/SDL_sysmutex.c
> @@ -57,10 +57,16 @@
>  /**
>   * Implementation based on Slim Reader/Writer (SRW) Locks for Win 7 and
> newer.
>   */
> +#ifndef SRWLOCK_INIT
> +#define SRWLOCK_INIT {0}
> +typedef struct _SRWLOCK {
> +  PVOID Ptr;
> +} SRWLOCK, *PSRWLOCK;
> +#endif
>  
> -typedef VOID(WINAPI *pfnReleaseSRWLockExclusive)(_Inout_ PSRWLOCK);
> -typedef VOID(WINAPI *pfnAcquireSRWLockExclusive)(_Inout_ PSRWLOCK);
> -typedef BOOLEAN(WINAPI *pfnTryAcquireSRWLockExclusive)(_Inout_ PSRWLOCK);
> +typedef VOID(WINAPI *pfnReleaseSRWLockExclusive)(PSRWLOCK);
> +typedef VOID(WINAPI *pfnAcquireSRWLockExclusive)(PSRWLOCK);
> +typedef BOOLEAN(WINAPI *pfnTryAcquireSRWLockExclusive)(PSRWLOCK);
>  static pfnReleaseSRWLockExclusive pReleaseSRWLockExclusive = NULL;
>  static pfnAcquireSRWLockExclusive pAcquireSRWLockExclusive = NULL;
>  static pfnTryAcquireSRWLockExclusive pTryAcquireSRWLockExclusive = NULL;

The structs are defined unconditionally in the Win SDK v6 and newer. Is it still supported to build with older (XP era) toolchains? If so, I will include them otherwise I'd like to avoid the duplication.
Comment 9 Ozkan Sezer 2020-12-07 18:48:26 UTC
(In reply to Joel Linn from comment #8)
> The structs are defined unconditionally in the Win SDK v6 and newer. Is it
> still supported to build with older (XP era) toolchains? If so, I will
> include them otherwise I'd like to avoid the duplication.

IIRC, VS2017 has an option to build against XP SDK?

In my case, the failure was with my (very) old MinGW-w64 setup
which is fixed by that patch.
Comment 10 Joel Linn 2020-12-10 18:57:41 UTC
Created attachment 4571 [details]
Add SDL_mutex implementation using Windows Slim Reader/Writer
Comment 11 Joel Linn 2020-12-10 19:06:55 UTC
Created attachment 4572 [details]
Benchmark results

I benchmarked the different implementations using a modified version of the atomics test reduced to `RunFIFOTest(SDL_FALSE)`.

I also did a local build where I removed all code that is needed for mutex recursion.

I attached a histogram with the test durations.
Comment 12 Ozkan Sezer 2020-12-10 19:17:38 UTC
(In reply to Joel Linn from comment #10)
> Created attachment 4571 [details]
> Add SDL_mutex implementation using Windows Slim Reader/Writer

Compiles OK with my mingw-w64 toolchain.

Question:  Why use LoadLibraryW() for kernel32.dll and not
GetModuleHandleA()?  Do we not always link to kernel32.dll?
Comment 13 Ozkan Sezer 2020-12-10 19:32:58 UTC
(In reply to Ozkan Sezer from comment #12)
> Question:  Why use LoadLibraryW() for kernel32.dll and not
> GetModuleHandleA()?  Do we not always link to kernel32.dll?

OK, windows SDL_syslocale.c and SDL_systhread.c also do LoadLibraryW(),
maybe replacable by GetModuleHandleA(), but I take my criticism back.
Comment 14 Joel Linn 2020-12-10 21:57:59 UTC
(In reply to Ozkan Sezer from comment #13)
> (In reply to Ozkan Sezer from comment #12)
> > Question:  Why use LoadLibraryW() for kernel32.dll and not
> > GetModuleHandleA()?  Do we not always link to kernel32.dll?
> 
> OK, windows SDL_syslocale.c and SDL_systhread.c also do LoadLibraryW(),
> maybe replacable by GetModuleHandleA(), but I take my criticism back.

Yes, good catch!
Tbh, I copied that load library code from `SDL_syslocale.c` to match the syntax/styling of the rest of the project.

Thinking about it, the correct way seems to be calling `GetModuleHandleExW()` with the `GET_MODULE_HANDLE_EX_FLAG_PIN` flag, wouldn't it?
Comment 15 Sam Lantinga 2020-12-10 23:50:35 UTC
Great, we'll put this in after the 2.0.14 release.

Thanks!
Comment 16 Ozkan Sezer 2020-12-11 00:54:39 UTC
(In reply to Joel Linn from comment #14)
> (In reply to Ozkan Sezer from comment #13)
> > (In reply to Ozkan Sezer from comment #12)
> > > Question:  Why use LoadLibraryW() for kernel32.dll and not
> > > GetModuleHandleA()?  Do we not always link to kernel32.dll?
> > 
> > OK, windows SDL_syslocale.c and SDL_systhread.c also do LoadLibraryW(),
> > maybe replacable by GetModuleHandleA(), but I take my criticism back.
> 
> Yes, good catch!
> Tbh, I copied that load library code from `SDL_syslocale.c` to match the
> syntax/styling of the rest of the project.

Created bug #5390 to replace LoadLibraryW with GetModuleHandleW
in SDL_syslocale.c and SDL_systhread.c.
Comment 17 Joel Linn 2020-12-13 11:05:51 UTC
Created attachment 4576 [details]
Add SDL_mutex implementation using Windows SRW Locks v3
Comment 18 Sam Lantinga 2020-12-23 21:34:09 UTC
This is added, thanks!
https://hg.libsdl.org/SDL/rev/1a464a374e6f