We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 2618

Summary: incomplete pthread-based lock support should be removed
Product: SDL Reporter: binarycrusader
Component: atomicAssignee: Sam Lantinga <slouken>
Status: RESOLVED FIXED QA Contact: Sam Lantinga <slouken>
Severity: major    
Priority: P2 CC: icculus
Version: 2.0.0   
Hardware: x86_64   
OS: Solaris   
Attachments: Use pthread implementation for SDL_Atomic*Lock
Fix SDL pthread-based atomics support
Fix SDL pthread-based atomics support
removes incomplete pthread-based lock support

Description binarycrusader 2014-06-30 04:41:12 UTC
Since changeset 358696c354a8, SDL 2.0 has been broken on Solaris when compiling with the Solaris Studio compiler (which uses the pthread implementation of SDL_AtomicLock).

Notably, it gets stuck at the MemoryBarrierRelease in SDL_GetErrBuf:

6585 # 218
6586     if (!tls_errbuf && !tls_being_created) {
6587          SDL_AtomicLock_REAL ( & tls_lock );
6588         if (!tls_errbuf) {
6589             SDL_TLSID slot;
6590             tls_being_created = SDL_TRUE;
6591             slot =  SDL_TLSCreate_REAL ( );
6592             tls_being_created = SDL_FALSE;
6593              { SDL_SpinLock _tmp = 0 ; SDL_AtomicLock_REAL ( & _tmp ) ; SDL_AtomicUnlock_REAL ( & _tmp ) ; };
^^^ loops forever above
6594             tls_errbuf = slot;
6595         }
6596          SDL_AtomicUnlock_REAL ( & tls_lock );
6597     }


Running: testthread 
(process id 28926)
^Cdbx: warning: Interrupt ignored but forwarded to child.
signal INT (Interrupt) in __nanosleep at 0xfe52a875
0xfe52a875: __nanosleep+0x0015:	jae      __nanosleep+0x23	[ 0xfe52a883, .+0xe ]
Current function is SDL_Delay_REAL
  204           was_error = nanosleep(&tv, &elapsed);
(dbx) where
  [1] __nanosleep(0xfeffe848, 0xfeffe850, 0xfe75a5ac, 0xfe5169d8), at 0xfe52a875 
  [2] nanosleep(0xfeffe848, 0xfeffe850), at 0xfe516a3b 
=>[3] SDL_Delay_REAL(ms = 0), line 204 in "SDL_systimer.c"
  [4] SDL_AtomicLock_REAL(lock = 0xfeffe88c), line 104 in "SDL_spinlock.c"
  [5] SDL_GetErrBuf(), line 225 in "SDL_thread.c"
  [6] SDL_ClearError_REAL(), line 216 in "SDL_error.c"
  [7] SDL_InitSubSystem_REAL(flags = 0), line 116 in "SDL.c"
  [8] SDL_Init_REAL(flags = 0), line 244 in "SDL.c"
  [9] SDL_Init(a = 0), line 89 in "SDL_dynapi_procs.h"
  [10] main(argc = 1, argv = 0xfeffe948), line 65 in "testthread.c"

As far as I can tell, this is because pthread_spin_trylock() always returns EBUSY for this particular lock; since it works in other places, I'm suspicious.

Different Solaris Studio compiler versions seem to make no difference.

I'll be checking the pthread version on Linux next.
Comment 1 binarycrusader 2014-06-30 04:56:25 UTC
I've verified this is broken on Linux as well if SDL_spinlock.c is modified to use the pthread implementation.
Comment 2 binarycrusader 2014-06-30 04:57:32 UTC
Created attachment 1716 [details]
Use pthread implementation for SDL_Atomic*Lock

Patch to use only pthread implementation of atomic locks for testing
Comment 3 binarycrusader 2014-06-30 05:05:30 UTC
This appears to be because pthread_spin_init() and pthread_spin_destroy() are not used with the locks as required.
Comment 4 Sam Lantinga 2014-06-30 05:29:25 UTC
Nice find! Do you have a tested fix?
Comment 5 binarycrusader 2014-06-30 05:50:20 UTC
I haven't tried to fix it yet.  My guess is we'll need some sort of lock init and destroy functions to be performant but I haven't finished investigating.
Comment 6 binarycrusader 2014-06-30 18:31:37 UTC
So I have a working fix I've tested on Linux, I'll test on Solaris next, and if that works, I'll attach the patch.
Comment 7 binarycrusader 2014-06-30 20:11:27 UTC
Created attachment 1718 [details]
Fix SDL pthread-based atomics support

The attached patch is a somewhat significant rework of the atomics support; it adds two new API methods (SDL_AtomicLockInit, SDL_AtomicLockDestroy) and changes the typedef of SDL_SpinLock when using pthread-based spinlocks.

I do not claim to be an atomics or threading expert, but all of the tests work as expected after the attached patch on Solaris.

It also worked at one point on Linux, but that was only because Linux essentially does nothing for pthread_spin_init() (well, it sets the lock pointer to 0...).  This should be tested on Linux again, which I'll do as soon as I can.

I'd appreciate it a threading / atomics expert or two looked over these changes.

I'll file a future bug to add real atomics support for Solaris soon.
Comment 8 binarycrusader 2014-06-30 20:13:27 UTC
Note that on Solaris, pthread_spinlock_t is actually:

typedef	pthread_mutex_t	pthread_spinlock_t;

...so you can't assume that a spinlock is just a volatile int.
Comment 9 binarycrusader 2014-06-30 20:32:22 UTC
Created attachment 1720 [details]
Fix SDL pthread-based atomics support

This version of the patch has been tested on Linux; the only thing I changed was a C99-style int declaration in a for loop.

This should be good to go.
Comment 10 Sam Lantinga 2014-07-01 08:34:54 UTC
Unfortunately the spinlock semantics are currently that you can declare one initialized to zero and immediately use it. That's the whole point behind using them for protecting static data.

In fact if you look, you now have a race condition where multiple threads can initialize the spinlock for the TLS system.

This requires more thought...
Comment 11 binarycrusader 2014-07-01 13:42:39 UTC
The idea that you can declare a spinlock initialized to zero is not going to work with pthreads.  At least, not as a simole variable.  The posix docs are pretty clear about this.

I assume the race condition you refer to is the already initialized check I have for the tls_lock?
Comment 12 binarycrusader 2014-07-01 14:02:47 UTC
To be clear, pthread spinlocks are not explicitly ints, so you can't set them to any value really.  They're an opaque data type.  Most high performance lock mechanisms I'm aware of have some sort of init and destroy, so I don't see how that can be avoided either.
Comment 13 Ryan C. Gordon 2014-07-01 15:49:36 UTC
We absolutely can't add Init calls for spinlocks, because there are programs in the wild that use spinlocks without explicit initialization that would blow up.

Here's my idea:

- We delete the pthread spinlock code. It's useless anyhow: on most places we'd use it we have GCC or Clang, which both use the real atomic codepaths, on the other places, we can't guarantee correct initialization.

- We get the real atomics working on Sun Studio (what prevents that, anyhow?).

- As the fallback for not having real atomics, we keep the SDL_ATOMIC_DISABLED section, which currently fakes it with an SDL_mutex, and just put a dummy lock/unlock at the start of SDL_Init(), which (mostly) mitigates race conditions by (probably) initializing it from a single thread before anything else is likely to.

--ryan.
Comment 14 binarycrusader 2014-07-01 17:04:13 UTC
(In reply to Ryan C. Gordon from comment #13)
> We absolutely can't add Init calls for spinlocks, because there are programs
> in the wild that use spinlocks without explicit initialization that would
> blow up.
> 
> Here's my idea:
> 
> - We delete the pthread spinlock code. It's useless anyhow: on most places
> we'd use it we have GCC or Clang, which both use the real atomic codepaths,
> on the other places, we can't guarantee correct initialization.
> 
> - We get the real atomics working on Sun Studio (what prevents that,
> anyhow?).

Nothing specifically as far as I know, I just haven't taken the time to figure out what's required yet.  All I was trying to do was actually fix the pthread-based lock mechanism to work correctly.

To be clear, the existing pthread-based lock code didn't work on Linux either -- so it it's not a problem specific to Solaris.  It was just easier to find the problem there since atomics don't work out of the box with the Studio compiler :-)

> - As the fallback for not having real atomics, we keep the
> SDL_ATOMIC_DISABLED section, which currently fakes it with an SDL_mutex, and
> just put a dummy lock/unlock at the start of SDL_Init(), which (mostly)
> mitigates race conditions by (probably) initializing it from a single thread
> before anything else is likely to.

Ryan, your conclusions basically match my thinking about it this morning.  I apologize if I came off as unreasonable, I just couldn't reconcile the conflicting goals of having a portable locking mechanism that doesn't have an Init/Destroy-style API.

There's just no way you can rely on OS interfaces for locks without some sort of init and destroy mechanism.  This isn't intended as criticism, but this seems like a fairly large oversight in the original design of the SDL locking mechanism.

I'll work on getting atomics support for Solaris Studio and open a separate bug for that so this bug can remain as a matter of record.

I'd suggest this bug be retitled as "pthread-based lock support should be removed" :-)
Comment 15 binarycrusader 2014-07-01 17:05:51 UTC
I'm happy to provide the patch to rip out the pthread-based stuff entirely, unless one of you want the joy of deleting code all to yourself...
Comment 16 binarycrusader 2014-07-02 01:15:10 UTC
Created attachment 1723 [details]
removes incomplete pthread-based lock support
Comment 17 Sam Lantinga 2014-07-08 04:28:19 UTC
pthread spinlock support has been removed, thanks!
https://hg.libsdl.org/SDL/rev/13b0df1793e8
Comment 18 Ryan C. Gordon 2014-07-08 16:35:43 UTC
(In reply to binarycrusader from comment #14)
> Ryan, your conclusions basically match my thinking about it this morning.  I
> apologize if I came off as unreasonable, I just couldn't reconcile the
> conflicting goals of having a portable locking mechanism that doesn't have
> an Init/Destroy-style API.

No need to apologize, you didn't sound unreasonable at all...we're both working to make a better piece of software here, of course.  :)

--ryan.