We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 4151 - Intel CPUs should use PAUSE instruction in SDL_AtomicLock?
Summary: Intel CPUs should use PAUSE instruction in SDL_AtomicLock?
Status: RESOLVED FIXED
Alias: None
Product: SDL
Classification: Unclassified
Component: atomic (show other bugs)
Version: HG 2.0
Hardware: x86 Other
: P2 normal
Assignee: Ryan C. Gordon
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-27 05:28 UTC by Ryan C. Gordon
Modified: 2018-06-25 20:09 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan C. Gordon 2018-04-27 05:28:50 UTC
This is literally the first I've heard of this instruction, so I don't know if this is a good idea, but it seems like maybe we should insert one of these in the loop in SDL_AtomicLock()...

https://c9x.me/x86/html/file_module_x86_id_232.html

...PAUSE uses the same bits as REP NOP, so any x86 CPU that doesn't understand PAUSE will carry on as before (spinning as fast as possible with a NOP it passes through in the middle).

--ryan.
Comment 1 Ryan C. Gordon 2018-06-24 16:43:23 UTC
Stumbled upon this and thought it was clever (although if it's clever in _practice_, I don't know)...

https://github.com/owent-utils/c-cpp/blob/f93c27edcef87059e7198c6e25a2cfc1e6ecced6/include/Lock/SpinLock.h#L137-L145

The loop to acquire a spinlock increments a counter each time, and more dramatically yields the CPU as iterations increase:

- Does nothing for extremely short loops
- Uses the PAUSE instruction for short loops
- Uses sched_yield() for longer loops
- Uses usleep() for really long loops.

Right now, SDL_AtomicLock does:

    while (!SDL_AtomicTryLock(lock)) {
        SDL_Delay(0);
    }

Which is probably not ideal (at a minimum: SDL_Delay(0) doesn't do the same thing on every platform, although this code probably thinks it means "surrender the rest of the current timeslice").

--ryan.
Comment 2 Ryan C. Gordon 2018-06-25 20:09:38 UTC
(In reply to Ryan C. Gordon from comment #1)
> - Does nothing for extremely short loops
> - Uses the PAUSE instruction for short loops
> - Uses sched_yield() for longer loops
> - Uses usleep() for really long loops.

So I tried an equivalent of this, and (at least on this Mac) it's a dramatic negative change for two threads that are constantly competing for the lock. One thread will eventually end up sleeping, and the other will always get the lock from under it. The losing thread might spin 30,000+ times before it wins.

The existing code tends to be pretty fair about which thread wins; they largely alternate (although SDL_Delay(0) is not a good plan here, as other platforms are probably behaving differently).

So in https://hg.libsdl.org/SDL/rev/2d871b0d5c34, it does the x86 PAUSE instruction for the first 32 iterations and then drops back to the usual SDL_Delay(0). This is good enough for now, but maybe locking down SDL_Delay(0) to mean "surrender the current timeslice" at some point is a good idea.

--ryan.