| Summary: | Intel CPUs should use PAUSE instruction in SDL_AtomicLock? | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Ryan C. Gordon <icculus> |
| Component: | atomic | Assignee: | Ryan C. Gordon <icculus> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | ||
| Version: | HG 2.0 | ||
| Hardware: | x86 | ||
| OS: | Other | ||
|
Description
Ryan C. Gordon
2018-04-27 05:28:50 UTC
Stumbled upon this and thought it was clever (although if it's clever in _practice_, I don't know)... https://github.com/owent-utils/c-cpp/blob/f93c27edcef87059e7198c6e25a2cfc1e6ecced6/include/Lock/SpinLock.h#L137-L145 The loop to acquire a spinlock increments a counter each time, and more dramatically yields the CPU as iterations increase: - Does nothing for extremely short loops - Uses the PAUSE instruction for short loops - Uses sched_yield() for longer loops - Uses usleep() for really long loops. Right now, SDL_AtomicLock does: while (!SDL_AtomicTryLock(lock)) { SDL_Delay(0); } Which is probably not ideal (at a minimum: SDL_Delay(0) doesn't do the same thing on every platform, although this code probably thinks it means "surrender the rest of the current timeslice"). --ryan. (In reply to Ryan C. Gordon from comment #1) > - Does nothing for extremely short loops > - Uses the PAUSE instruction for short loops > - Uses sched_yield() for longer loops > - Uses usleep() for really long loops. So I tried an equivalent of this, and (at least on this Mac) it's a dramatic negative change for two threads that are constantly competing for the lock. One thread will eventually end up sleeping, and the other will always get the lock from under it. The losing thread might spin 30,000+ times before it wins. The existing code tends to be pretty fair about which thread wins; they largely alternate (although SDL_Delay(0) is not a good plan here, as other platforms are probably behaving differently). So in https://hg.libsdl.org/SDL/rev/2d871b0d5c34, it does the x86 PAUSE instruction for the first 32 iterations and then drops back to the usual SDL_Delay(0). This is good enough for now, but maybe locking down SDL_Delay(0) to mean "surrender the current timeslice" at some point is a good idea. --ryan. |