| Summary: | incomplete pthread-based lock support should be removed | ||
|---|---|---|---|
| Product: | SDL | Reporter: | binarycrusader |
| Component: | atomic | Assignee: | Sam Lantinga <slouken> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | major | ||
| Priority: | P2 | CC: | icculus |
| Version: | 2.0.0 | ||
| Hardware: | x86_64 | ||
| OS: | Solaris | ||
| Attachments: |
Use pthread implementation for SDL_Atomic*Lock
Fix SDL pthread-based atomics support Fix SDL pthread-based atomics support removes incomplete pthread-based lock support |
||
I've verified this is broken on Linux as well if SDL_spinlock.c is modified to use the pthread implementation. Created attachment 1716 [details]
Use pthread implementation for SDL_Atomic*Lock
Patch to use only pthread implementation of atomic locks for testing
This appears to be because pthread_spin_init() and pthread_spin_destroy() are not used with the locks as required. Nice find! Do you have a tested fix? I haven't tried to fix it yet. My guess is we'll need some sort of lock init and destroy functions to be performant but I haven't finished investigating. So I have a working fix I've tested on Linux, I'll test on Solaris next, and if that works, I'll attach the patch. Created attachment 1718 [details]
Fix SDL pthread-based atomics support
The attached patch is a somewhat significant rework of the atomics support; it adds two new API methods (SDL_AtomicLockInit, SDL_AtomicLockDestroy) and changes the typedef of SDL_SpinLock when using pthread-based spinlocks.
I do not claim to be an atomics or threading expert, but all of the tests work as expected after the attached patch on Solaris.
It also worked at one point on Linux, but that was only because Linux essentially does nothing for pthread_spin_init() (well, it sets the lock pointer to 0...). This should be tested on Linux again, which I'll do as soon as I can.
I'd appreciate it a threading / atomics expert or two looked over these changes.
I'll file a future bug to add real atomics support for Solaris soon.
Note that on Solaris, pthread_spinlock_t is actually: typedef pthread_mutex_t pthread_spinlock_t; ...so you can't assume that a spinlock is just a volatile int. Created attachment 1720 [details]
Fix SDL pthread-based atomics support
This version of the patch has been tested on Linux; the only thing I changed was a C99-style int declaration in a for loop.
This should be good to go.
Unfortunately the spinlock semantics are currently that you can declare one initialized to zero and immediately use it. That's the whole point behind using them for protecting static data. In fact if you look, you now have a race condition where multiple threads can initialize the spinlock for the TLS system. This requires more thought... The idea that you can declare a spinlock initialized to zero is not going to work with pthreads. At least, not as a simole variable. The posix docs are pretty clear about this. I assume the race condition you refer to is the already initialized check I have for the tls_lock? To be clear, pthread spinlocks are not explicitly ints, so you can't set them to any value really. They're an opaque data type. Most high performance lock mechanisms I'm aware of have some sort of init and destroy, so I don't see how that can be avoided either. We absolutely can't add Init calls for spinlocks, because there are programs in the wild that use spinlocks without explicit initialization that would blow up. Here's my idea: - We delete the pthread spinlock code. It's useless anyhow: on most places we'd use it we have GCC or Clang, which both use the real atomic codepaths, on the other places, we can't guarantee correct initialization. - We get the real atomics working on Sun Studio (what prevents that, anyhow?). - As the fallback for not having real atomics, we keep the SDL_ATOMIC_DISABLED section, which currently fakes it with an SDL_mutex, and just put a dummy lock/unlock at the start of SDL_Init(), which (mostly) mitigates race conditions by (probably) initializing it from a single thread before anything else is likely to. --ryan. (In reply to Ryan C. Gordon from comment #13) > We absolutely can't add Init calls for spinlocks, because there are programs > in the wild that use spinlocks without explicit initialization that would > blow up. > > Here's my idea: > > - We delete the pthread spinlock code. It's useless anyhow: on most places > we'd use it we have GCC or Clang, which both use the real atomic codepaths, > on the other places, we can't guarantee correct initialization. > > - We get the real atomics working on Sun Studio (what prevents that, > anyhow?). Nothing specifically as far as I know, I just haven't taken the time to figure out what's required yet. All I was trying to do was actually fix the pthread-based lock mechanism to work correctly. To be clear, the existing pthread-based lock code didn't work on Linux either -- so it it's not a problem specific to Solaris. It was just easier to find the problem there since atomics don't work out of the box with the Studio compiler :-) > - As the fallback for not having real atomics, we keep the > SDL_ATOMIC_DISABLED section, which currently fakes it with an SDL_mutex, and > just put a dummy lock/unlock at the start of SDL_Init(), which (mostly) > mitigates race conditions by (probably) initializing it from a single thread > before anything else is likely to. Ryan, your conclusions basically match my thinking about it this morning. I apologize if I came off as unreasonable, I just couldn't reconcile the conflicting goals of having a portable locking mechanism that doesn't have an Init/Destroy-style API. There's just no way you can rely on OS interfaces for locks without some sort of init and destroy mechanism. This isn't intended as criticism, but this seems like a fairly large oversight in the original design of the SDL locking mechanism. I'll work on getting atomics support for Solaris Studio and open a separate bug for that so this bug can remain as a matter of record. I'd suggest this bug be retitled as "pthread-based lock support should be removed" :-) I'm happy to provide the patch to rip out the pthread-based stuff entirely, unless one of you want the joy of deleting code all to yourself... Created attachment 1723 [details]
removes incomplete pthread-based lock support
pthread spinlock support has been removed, thanks! https://hg.libsdl.org/SDL/rev/13b0df1793e8 (In reply to binarycrusader from comment #14) > Ryan, your conclusions basically match my thinking about it this morning. I > apologize if I came off as unreasonable, I just couldn't reconcile the > conflicting goals of having a portable locking mechanism that doesn't have > an Init/Destroy-style API. No need to apologize, you didn't sound unreasonable at all...we're both working to make a better piece of software here, of course. :) --ryan. |
Since changeset 358696c354a8, SDL 2.0 has been broken on Solaris when compiling with the Solaris Studio compiler (which uses the pthread implementation of SDL_AtomicLock). Notably, it gets stuck at the MemoryBarrierRelease in SDL_GetErrBuf: 6585 # 218 6586 if (!tls_errbuf && !tls_being_created) { 6587 SDL_AtomicLock_REAL ( & tls_lock ); 6588 if (!tls_errbuf) { 6589 SDL_TLSID slot; 6590 tls_being_created = SDL_TRUE; 6591 slot = SDL_TLSCreate_REAL ( ); 6592 tls_being_created = SDL_FALSE; 6593 { SDL_SpinLock _tmp = 0 ; SDL_AtomicLock_REAL ( & _tmp ) ; SDL_AtomicUnlock_REAL ( & _tmp ) ; }; ^^^ loops forever above 6594 tls_errbuf = slot; 6595 } 6596 SDL_AtomicUnlock_REAL ( & tls_lock ); 6597 } Running: testthread (process id 28926) ^Cdbx: warning: Interrupt ignored but forwarded to child. signal INT (Interrupt) in __nanosleep at 0xfe52a875 0xfe52a875: __nanosleep+0x0015: jae __nanosleep+0x23 [ 0xfe52a883, .+0xe ] Current function is SDL_Delay_REAL 204 was_error = nanosleep(&tv, &elapsed); (dbx) where [1] __nanosleep(0xfeffe848, 0xfeffe850, 0xfe75a5ac, 0xfe5169d8), at 0xfe52a875 [2] nanosleep(0xfeffe848, 0xfeffe850), at 0xfe516a3b =>[3] SDL_Delay_REAL(ms = 0), line 204 in "SDL_systimer.c" [4] SDL_AtomicLock_REAL(lock = 0xfeffe88c), line 104 in "SDL_spinlock.c" [5] SDL_GetErrBuf(), line 225 in "SDL_thread.c" [6] SDL_ClearError_REAL(), line 216 in "SDL_error.c" [7] SDL_InitSubSystem_REAL(flags = 0), line 116 in "SDL.c" [8] SDL_Init_REAL(flags = 0), line 244 in "SDL.c" [9] SDL_Init(a = 0), line 89 in "SDL_dynapi_procs.h" [10] main(argc = 1, argv = 0xfeffe948), line 65 in "testthread.c" As far as I can tell, this is because pthread_spin_trylock() always returns EBUSY for this particular lock; since it works in other places, I'm suspicious. Different Solaris Studio compiler versions seem to make no difference. I'll be checking the pthread version on Linux next.