Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intel CPUs should use PAUSE instruction in SDL_AtomicLock? #2884

Closed
SDLBugzilla opened this issue Feb 11, 2021 · 0 comments
Closed

Intel CPUs should use PAUSE instruction in SDL_AtomicLock? #2884

SDLBugzilla opened this issue Feb 11, 2021 · 0 comments

Comments

@SDLBugzilla
Copy link
Collaborator

This bug report was migrated from our old Bugzilla tracker.

Reported in version: HG 2.0
Reported for operating system, platform: Other, x86

Comments on the original bug report:

On 2018-04-27 05:28:50 +0000, Ryan C. Gordon wrote:

This is literally the first I've heard of this instruction, so I don't know if this is a good idea, but it seems like maybe we should insert one of these in the loop in SDL_AtomicLock()...

https://c9x.me/x86/html/file_module_x86_id_232.html

...PAUSE uses the same bits as REP NOP, so any x86 CPU that doesn't understand PAUSE will carry on as before (spinning as fast as possible with a NOP it passes through in the middle).

--ryan.

On 2018-06-24 16:43:23 +0000, Ryan C. Gordon wrote:

Stumbled upon this and thought it was clever (although if it's clever in practice, I don't know)...

https://github.com/owent-utils/c-cpp/blob/f93c27edcef87059e7198c6e25a2cfc1e6ecced6/include/Lock/SpinLock.h#L137-L145

The loop to acquire a spinlock increments a counter each time, and more dramatically yields the CPU as iterations increase:

  • Does nothing for extremely short loops
  • Uses the PAUSE instruction for short loops
  • Uses sched_yield() for longer loops
  • Uses usleep() for really long loops.

Right now, SDL_AtomicLock does:

while (!SDL_AtomicTryLock(lock)) {
    SDL_Delay(0);
}

Which is probably not ideal (at a minimum: SDL_Delay(0) doesn't do the same thing on every platform, although this code probably thinks it means "surrender the rest of the current timeslice").

--ryan.

On 2018-06-25 20:09:38 +0000, Ryan C. Gordon wrote:

(In reply to Ryan C. Gordon from comment # 1)

  • Does nothing for extremely short loops
  • Uses the PAUSE instruction for short loops
  • Uses sched_yield() for longer loops
  • Uses usleep() for really long loops.

So I tried an equivalent of this, and (at least on this Mac) it's a dramatic negative change for two threads that are constantly competing for the lock. One thread will eventually end up sleeping, and the other will always get the lock from under it. The losing thread might spin 30,000+ times before it wins.

The existing code tends to be pretty fair about which thread wins; they largely alternate (although SDL_Delay(0) is not a good plan here, as other platforms are probably behaving differently).

So in https://hg.libsdl.org/SDL/rev/2d871b0d5c34, it does the x86 PAUSE instruction for the first 32 iterations and then drops back to the usual SDL_Delay(0). This is good enough for now, but maybe locking down SDL_Delay(0) to mean "surrender the current timeslice" at some point is a good idea.

--ryan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant