We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 5389

Summary: [PATCH] Semaphore: Add Atomics+WaitOnAddress Windows implementation
Product: SDL Reporter: Joel Linn <jl>
Component: threadAssignee: Sam Lantinga <slouken>
Status: RESOLVED FIXED QA Contact: Sam Lantinga <slouken>
Severity: normal    
Priority: P2 Keywords: target-2.0.16
Version: 2.0.13   
Hardware: All   
OS: Windows 10   
Attachments: Benchmark results of Condition Variables implementation (not this patchset)
Benchmark results of Atomics+WaitOnAddr implementation (this patchset)
0001 Semaphore test: Put test into separate function.
0002 Semaphore test: Add overhead tests.
0003 Atomic test: Fix use after free
0004 Add SDL_sem implementation using Atomics and WaitOnAddress API.
Add SDL_sem implementation using Atomics and WaitOnAddress API v2
0004 Add SDL_sem impl using Atomics and WaitOnAddress API v3

Description Joel Linn 2020-12-10 09:54:06 UTC
Before this, the SDL_sem implementation on Windows uses the `CreateSemaphore()` API. The returned handles however are (inter-process) Kernel Objects and require a context switch on every interaction.

I propose an implementation based on Atomics and the `WaitOnAddress` API. (I made an implementation using Windows Condition Variables as well but it didn't perform as well) Though I am not the first one to think about this: https://devblogs.microsoft.com/oldnewthing/20170612-00/?p=96375

I expanded the semaphore test case to measure the overhead of a hot semaphore and attached the results. `testsem` was run 40k times, changing the implementation back and forth between every run. Blue always is the current implementation using Kernel Objects. Mind the y axis is logarithmic so a gauss distribution will look like an inverted parabola.
Comment 1 Joel Linn 2020-12-10 09:55:30 UTC
Created attachment 4565 [details]
Benchmark results of Condition Variables implementation (not this patchset)
Comment 2 Joel Linn 2020-12-10 09:56:44 UTC
Created attachment 4566 [details]
Benchmark results of Atomics+WaitOnAddr implementation (this patchset)
Comment 3 Joel Linn 2020-12-10 09:57:59 UTC
Created attachment 4567 [details]
0001 Semaphore test: Put test into separate function.
Comment 4 Joel Linn 2020-12-10 09:59:09 UTC
Created attachment 4568 [details]
0002 Semaphore test: Add overhead tests.
Comment 5 Joel Linn 2020-12-10 09:59:38 UTC
Created attachment 4569 [details]
0003 Atomic test: Fix use after free
Comment 6 Joel Linn 2020-12-10 10:00:22 UTC
Created attachment 4570 [details]
0004 Add SDL_sem implementation using Atomics and WaitOnAddress API.
Comment 7 Sam Lantinga 2020-12-13 03:23:16 UTC
Thanks, we'll add this after 2.0.14 release.
Comment 8 Joel Linn 2020-12-13 12:37:37 UTC
Created attachment 4577 [details]
Add SDL_sem implementation using Atomics and WaitOnAddress API v2

v2: - Fix mixed int/LONG types
    - Reorder definitions
    - Add missing include
Comment 9 Joel Linn 2020-12-17 20:45:20 UTC
Created attachment 4584 [details]
0004 Add SDL_sem impl using Atomics and WaitOnAddress API v3

v3: - Use `GetModuleHandle()` to load the API Set

I wasn't sure before what the correct way to dyn load an API Set is. It is not documented by MS but I found a discussion about it on the MS C++ STL issue tracker. https://github.com/microsoft/STL/pull/593#issuecomment-655799859

Two things need to be satisfied to be able to legally load an API Set dynamically using `GetModuleHandle()`:

- It needs to be legal to call the function on an API Set in the first place. It's nowhere documented explicitly but a senior from MS confirmed it's legal and they do it as well in their STL implementation (see github link, they discussed it in length).
- The module that implements the API set (currently KernelBase.dll but that is an implementation detail of W10 Desktop and may change.) needs to be loaded already. In our case this is guaranteed because we already link to other functions of that API Set statically (like WaitForSingleObject, see https://docs.microsoft.com/en-us/uwp/win32-and-com/win32-apis#apis-from-api-ms-win-core-synch-l1-2-0dll for a list).