We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 4142 - Concurrency issues in Android backend
Summary: Concurrency issues in Android backend
Status: WAITING
Alias: None
Product: SDL
Classification: Unclassified
Component: video (show other bugs)
Version: 2.0.8
Hardware: All Android (All)
: P2 major
Assignee: Sam Lantinga
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-18 13:07 UTC by Andreas Falkenhahn
Modified: 2019-06-28 14:40 UTC (History)
3 users (show)

See Also:


Attachments
test-case (5.78 KB, text/x-csrc)
2019-01-03 13:33 UTC, Sylvain
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Falkenhahn 2018-04-18 13:07:28 UTC
AFAICS the Android backend has several concurrency issues because of the fact that SDL_Main() is run on its own thread whereas  window callbacks like onNativeSurfaceChanged() and onNativeSurfaceDestroyed() run on the UI thread. Both threads can access "Android_Window" at the same time and there doesn't seem to be any synchronization mechanism. 

For example, consider the following code in onNativeSurfaceDestroyed() in SDL_android.c. This code is running on the UI thread:

    if (Android_Window == NULL || Android_Window->driverdata == NULL ) {
        return;
    }

    _this =  SDL_GetVideoDevice();
    data = (SDL_WindowData *) Android_Window->driverdata;

It could happen that "Android_Window" is set to NULL by the thread that is running SDL_Main() right after the code above that is running on the UI thread has passed the NULL pointer check. When trying to access "Android_Window->driverdata" then, the code will crash because "Android_Window" is now NULL. So I think this code needs some synchronization with the thread that is running SDL_Main().

Also, onNativeSurfaceDestroyed() has this code:

    if (data->egl_surface != EGL_NO_SURFACE) {
        SDL_EGL_MakeCurrent(_this, NULL, NULL);
        SDL_EGL_DestroySurface(_this, data->egl_surface);
        data->egl_surface = EGL_NO_SURFACE;
    }

Once again, there is no synchronization and the SDL thread could be trying to access "data->egl_surface" while the UI thread is trying to destroy it which could lead to a crash.

These are just two issues I saw. There are probably more because of the multithreaded nature of the Android backend.

I've also posted about this here but I haven't received any reply:
https://discourse.libsdl.org/t/potential-serious-threading-issue-in-android-backend/24135

Since I consider this to be a serious issue I've decided to file a ticket for this so that it isn't forgotten.
Comment 1 Sam Lantinga 2018-04-18 17:54:39 UTC
Yep, these are definitely issues. I don't have an Android environment set up right now, so do you want to submit a tested patch that addresses these?
Comment 2 Andreas Falkenhahn 2018-04-19 19:38:48 UTC
Unfortunately, I currently don't have enough time to work on a fix but maybe in a few weeks.
Comment 3 Sylvain 2019-01-03 11:14:24 UTC
I am preparing a fix for this issue because this can clearly occur!
Comment 4 Sylvain 2019-01-03 13:22:49 UTC
I think this is fixed by: https://hg.libsdl.org/SDL/rev/93771c30420b

Maybe there are corners cases remaining ...

Previous commits are no-op, just for simplification
https://hg.libsdl.org/SDL/rev/219a154f54cb
https://hg.libsdl.org/SDL/rev/6053ab61996a
Comment 5 Sylvain 2019-01-03 13:33:27 UTC
Created attachment 3558 [details]
test-case

Here's a testcase,

Create-destroy window/renderer multiple times, adding also some timing delays.


To use it, you need some modification:


1/ You can deactivate semaphores by commenting out its creation in 
src/core/android/SDL_android.c
(eg: Android_ActivitySem = SDL_CreateSemaphore(1); )
It will remain NULL, and be ignored by SDL_SemWait()/SDL_SemPost()



2/ You can add a delay in the java side 
In 'src/video/android/SDL_androidtouch.c' file
In 'Android_OnTouch()' function


     SDL_Log("Wait <...");
     SDL_Delay(500);
     Android_GetWindowCoordinates(window, 1, 1, &window_x, &window_y);
     SDL_Log(" ...> Done (window is %d x %d)", window_x, window_y);


test 1:
It shows concurrency issues :
- the window size gets invalid sometimes. Because the window is created/destroyed during the delay.
(well, this is the purpose of this test-case ...)

test 2:
At the end, a crash is reproducible (and it seems to be the normal case).
If you do quickly: 
touch screen several time (add some lag in the main loop)
press back to quit
touch screen several time -> Crash
(in SetMouseFocus() .. WINDOWENTER_EVENT, because SDL_VideoDevice "_this" became null)
Comment 6 Sylvain 2019-01-03 19:22:39 UTC
Update: use SDL_Mutex instead of SDL_Semaphore (which were used as mutex)

  https://hg.libsdl.org/SDL/rev/c2e223dc5da9
Comment 7 Sylvain 2019-01-03 22:29:43 UTC
Another concurrency issue: eglSwapBuffers reports an error of EGL_BAD_SURFACE

https://hg.libsdl.org/SDL/rev/e6130a4cca7a


Occurs when application goes to background:
- Java activity is destroying SurfaceView holder and "egl_surface" (in onNativeSurfaceDestroyed())
- While native thread is in Android_GLES_SwapWindow(), prepared to call SDL_EGL_SwapBuffers()

The error is "call to eglSwapBuffers failed, reporting an error of EGL_BAD_SURFACE"

It can be reproduced easily by adding a SDL_Delay(100) at the beginning of SDL_EGL_SwapBuffers(),
and putting the application into background
Comment 8 Andreas Falkenhahn 2019-01-09 14:27:23 UTC
What about this code in android_egl_context_backup()?

    /* Keep a copy of the EGL Context so we can try to restore it when we resume */
    SDL_WindowData *data = (SDL_WindowData *) window->driverdata;
    data->egl_context = SDL_GL_GetCurrentContext();
    /* We need to do this so the EGLSurface can be freed */
    SDL_GL_MakeCurrent(window, NULL);

It seems to be called when onPause() is called but not on the UI thread but on the SDL thread. That's why it could happen that onSurfaceDestroyed() gets to be called before android_egl_context_backup(). Then android_egl_context_backup() won't be able to backup the context because it is NULL already (it is set to NULL in onNativeSurfaceDestroyed). So maybe the context should be backed up in onNativeSurfaceDestroyed(), somewhere before here:

        if (data->egl_surface != EGL_NO_SURFACE) {
            SDL_EGL_MakeCurrent(_this, NULL, NULL);
            SDL_EGL_DestroySurface(_this, data->egl_surface);
            data->egl_surface = EGL_NO_SURFACE;
        }

But I'm not sure...
Comment 9 Sylvain 2019-01-09 14:53:11 UTC
Yes android_egl_context_backup() is called from the SDL thread. Just before the main loop in the user application blocks.


Yes, when going to backgroung for instance, usually you have
android_egl_context_backup() called in the SDL thread.
then
onNativeSurfaceDestroyed() called from the UI thread.

If you add a SDL_Delay just at the begining of Android_PumpEvents(),
it becomes slower you can have the inverse. 1/ onNativeSurfaceDestroyed, then 2/ android_egl_context_backup().



But, it doesn't seem to be an issue .. I see no error/nor black screen.
I see no SDL_RENDER_DEVICE_RESET neither in _restore().
Since android_egl_context_backup() is doing "data->egl_context = SDL_GL_GetCurrentContext();" 

Isn't this ok ?
Comment 10 Andreas Falkenhahn 2019-01-09 15:18:33 UTC
Does it restore correctly in that case?
Comment 11 Sylvain 2019-01-09 15:44:29 UTC
I think so:

.. I see no error/nor black screen.
I see no SDL_RENDER_DEVICE_RESET neither in _restore().

Or how should I check ?


I tried with an app. 
And also with testgles.c
Comment 12 Andreas Falkenhahn 2019-01-09 16:04:29 UTC
Maybe it's harmless then. I just thought it's a little odd to backup the context on the SDL thread instead of in onNativeSurfaceDestroyed() which looks like the natural place to do this. It also would eliminate one more switch between the UI and SDL thread which I think would be quite desirable because it reduces the multithreading complexity even more.
Comment 13 Sylvain 2019-01-09 16:21:08 UTC
- backup/restore use thread local storage SDL_TLS Set/Get
so maybe it has to run on the SDL Thread ?! 


Also: testegl.c is an interesting testcase. User manipulates its context.
for this testcase, the backup() has to be done exactly before waiting the semaphore.
(before "if (SDL_SemWait(Android_ResumeSem) == 0)" );

https://hg.libsdl.org/SDL/rev/6acc7df4a175
https://bugzilla.libsdl.org/show_bug.cgi?id=2134


- onNativeSurfaceDestroyed: is the place to invalidate the egl_surface, where we have to destroy it. 

... hence, the synchronisation between UI Thread and SDL Thread.
Comment 14 Ellie 2019-04-05 02:56:18 UTC
This may be totally unrelated, but the code reads a little as if SDL_ANDROID_BLOCK_ON_PAUSE set to 0, the context won't be backed up and later restored at all. Does that mean SDL_ANDROID_BLOCK_ON_PAUSE=0 will currently inevitably risk corruption, even if the user code/app is taking super extra care to never accesses the GL context on its own in between SDL_APP_WILLENTERBACKGROUND and SDL_DIDENTERFOREGROUND?

Sorry if that isn't the case, it was just something that occurred me reading the patches and changes linked so far in the discussion
Comment 15 Sylvain 2019-04-05 07:28:34 UTC
Thanks for asking, I've less tested the non-blocking mode

https://hg.libsdl.org/SDL/file/d311fcb8066a/src/video/android/SDL_androidevents.c#l143

It seems it also calls:
  line 158  android_egl_context_restore(Android_Window);
  line 171  android_egl_context_backup(Android_Window);

Indeed, there is no implicit acknowledge like making sure the app has consumed WILL_ENTER_BG before backing-up the context.
Comment 16 Ellie 2019-04-10 14:50:26 UTC
> Indeed, there is no implicit acknowledge like making sure the app has consumed WILL_ENTER_BG before backing-up the context.

I just tested this, I have a couple of moments where my app will hang a second or two (loading screen!) and won't poll events immediately, and when I tab out exactly while that happens I'm almost GUARANTEED to get texture corruption once I tab back in. So this definitely seems to be an issue in practice
Comment 17 Sylvain 2019-04-11 06:46:52 UTC
I also see some glitch. but this is a little different (but 50% reproduce able on my side, portrait App).
Each time I got to background/foreground, when the app reappears, the last/bottom line of pixel of the screen has noisy pixels. It's really quick and not disturbing, but it's there.

I thought this was also the lack of hand-shaking for the onPause(). So I just add a SDL_Delay(500) at the end of "src/core/android/SDL_android.c:onPuase()" but that doesn't change anything. It should have done the trick.

Maybe you could try also a SDL_Delay() there for you to see what happens.

Anyway: if you are stuck a few seconds, you don't want this kind of acknowledgement. You'll get an ANR report.
And, this seems to me more dangerous than solving anything.

I would have added a small SDL_Delay() as a *soft/loose* hand-shaking. but that doesn't change anything.
Comment 18 Ellie 2019-04-11 09:32:11 UTC
I just checked, ANR seems to be 5 seconds. That is pretty long.

In any case, I think SDL2 should at least wait a little to give the app a chance to possibly consume the event before it backups the context, and I think that should be a longer time than like just 100ms IF that is possible per android APIs. (Of course if the app consumes it quickly it should also back it up quickly)
Comment 19 Ellie 2019-04-11 09:38:10 UTC
For what it's worth, regarding the trigger:

If I back out at any other point other then these hangs I don't see this corruption. (at least so far never have) And these hangs are never GFX related, it's not an asset loading screen: I'm working on a sort of document editor and the hang is e.g. during complex serialization for saving, where it doesn't touch any SDL_Texture or similar at all - so it shouldn't be related to any specific type of drawing op

And for the sort of corruption:

Also what I'm seeing is more than just a few noisy pixels, what happens is that a huge bunch of the SDL_Textures just goes pitch black and others go fully transparent/unrenderable (they disappear), and from that point on there is a chance that with any future tab out/back in the entire viewport will freeze with the old image and no longer render and update. So this is more than just a small cosmetic thing

As for SDL_Delay testing:

That might be worth a try, although what I'll probably do first is try to reduce the time between the SDL_PollEvent a little since I should do that anyway. I would assume that might make it less likely already
Comment 20 Sylvain 2019-04-11 09:43:41 UTC
Maybe this is different issue ...


SDL sends events: 
- will_enter_bg
- did_enter_bg

Once you receive will_enter_bg, you're warned. 
And on your next call of SDL_PollEvent, the egl context is actually backed-up and you read did_enter_bg event.
This seems ok to me.

this is how it's done in block_on_pause mode.

on non-blocking, you need this patch from bug 4578
so that it remains in state "inPausing" until you fetch the did_enter_bg event.
Comment 21 Ellie 2019-04-11 09:46:25 UTC
Yeah I run with non-blocking, and I don't use the patch right now. (just the tip with no patches) So that would be the problem then? I do serialize on SDL_WILLENTERBACKGROUND and not did enter background, so regarding that I should be okay

Sorry for muddling this ticket with a potentially unrelated issue :O
Comment 22 Sylvain 2019-04-11 10:10:33 UTC
Maybe, the patch will wait for did_enter_bg to be removed to backup the context
Comment 23 superfury 2019-06-19 20:06:12 UTC
This might be the case as well with current SDL commits. I notice that, since enabling my app to run in the background, the graphics context or something related doesn't get restored most of the time putting the app back in the foreground(SDL_APP_DIDENTEREDFOREGROUND event). My app, when entering the foreground that way, will start rendering again, forcing a redraw of screen content from the backbuffer it uses. Said forced redraw also forces a new SDL context etc. to be createn(SDL_CreateWindow etc.), but destroying the old context etc. that was already there.
Comment 24 Sylvain 2019-06-20 08:31:30 UTC
I don't really fully understand your message :)

Do you use the latest source code ?

As soon as your read WILL_ENTER_FG you should stop rendering.
As soon as your read DID_ENTER_BG you can resume rendering.


I re-tried quickly with
SDL_SetHint(SDL_HINT_ANDROID_BLOCK_ON_PAUSE, "0");
and faking with some SDL_Delay()
It seems to work ...

You can add some log when you stop/start rendering.
And when the backup/restore are done. (SDL_androidevents.c)
Comment 25 superfury 2019-06-20 09:07:00 UTC
My app keeps a set of flags that handle foreground/background operations, focus information etc. as a single bitmap(1 bit for each), see https://bitbucket.org/superfury/commonemuframework/src/35b198714b17c0ab7aa3cc80e2238d62b299f1e6/emu/io/input.c look for the didenterforeground/didenterbackground. It updates the global flags, which are used by the main loop to determine what to do with SDL input/output/looping control(making the main routine stop working, depending on settings to fully halt, partly halt(keep recording/playback of sound or just stop rendering). Everything is determined bg the haswindowactive flags.

For the GPU side, look at https://bitbucket.org/superfury/commonemuframework/src/35b198714b17c0ab7aa3cc80e2238d62b299f1e6/emu/gpu/gpu.c (CPU_updateVideo, which (re)creates the SDL rendering surface when starting/resuming/changing the display), https://bitbucket.org/superfury/commonemuframework/src/35b198714b17c0ab7aa3cc80e2238d62b299f1e6/emu/gpu/gpu_renderer.c (for SDL surface flipping).

Audio rendering keeps running, but may be muted(both input and/or output, depending on settings), using the same flags. Everything but the video aspects is user-configurable using a settings option(configurable from a settings menu or configuration text file).
Comment 26 superfury 2019-06-20 09:11:07 UTC
Of course, all other surfaces that are not the main rendering surface(that's flipped) keep updating unconditionally at a up to 60Hz rate. It's just the rendering to the screen(the SDL_flip equivalent) that's affected, together with audio and surface creation of the window surface.

Also, the SDL(2) layer used:
https://bitbucket.org/superfury/commonemuframework/src/35b198714b17c0ab7aa3cc80e2238d62b299f1e6/emu/gpu/gpu_sdl.c
Comment 27 Sylvain 2019-06-20 10:54:24 UTC
Just to make sure, when you're pushed to foreground, you don't need to re-create the window, nor the renderer.

What issues are  you seeing, and also what is in the android log cat ?
Did you try disabling the BATCHING ? 
Use latest sources ?
Comment 28 Ellie 2019-06-20 13:50:30 UTC
Things I can share from my experience: 1. re-creating the renderer when resuming not only will not help but actually break things, 2. you can't not only not flip while in background, but you can also neither create nor destroy textures, nor do anything else with them (no color mod, alpha mod changes, no pixel access, ...), 3. Surface access/creation/changes should be fine as long as you're not updating the window surface or converting any surfaces to textures
Comment 29 superfury 2019-06-20 17:59:35 UTC
I'm using the SDL2 commit from 20190619 at 06:43:54.

I've just modified the update function to renew the renderer surface(release it and recall the allocation functions(updateWindow)) after being restored to foreground.

Or shouldn't updateWindow be called again?
Comment 30 superfury 2019-06-20 18:01:48 UTC
But the pixels in the surface are being updated in the background, just not flipped(rendered using the SDL compatibility method (the flip method in my app's gpu_sdl.c).
Comment 31 Sylvain 2019-06-20 18:09:26 UTC
Again: when you're pushed to foreground:
- you don't need to re-create the window, nor the renderer.
- you don't need to modify any of your SDL_Surface.
- you don't need to modify any of your SDL_Texture, unless you got one of the event SDL_RENDER_TARGETS_RESET or SDL_RENDER_DEVICE_RESET.


BTW, looking at your code:

case SDL_WINDOWEVENT_FOCUS_LOST:
				lock(LOCK_INPUT);
				hasinputfocus = 0; //Lost input focus!
				unlock(LOCK_INPUT);
				#ifdef ANDROID
				goto didenterbackground; //Simulate background!
				#endif

You don't need anymore to do a didenterbackground when receiving FOCUS_LOST.
Now, Android can work in multi-window (split view), so you can have two apps shown at the same time. And when your app is not focus, it can still run and update the screen.
Comment 32 superfury 2019-06-20 18:32:53 UTC
OK. I'll look at that one later.

For now, I've added a bit more code to the didenterforeground event. It now will set needvideoupdate and GPU.forceredraw to 1, enforcing a full redraw across all used video layers(from bottom to top layer, all the way to the renderer's layer).

I've also forced a renderer(updateWindow) refresh on entering foreground(on Android), which will release the old SDL_Surface of the renderer by setting it to NULL before calling updateWindow.
UpdateWindow in turn will do a full SDL reallocation(SDL_destroytexture, SDL_destroyRenderer, free the surface wrapper(used for the app only, leaving the actual SDL_Surface untouched), SDL_setWindowSize, SDL_SetWindowFullscreen, SDL_setWindowIcon, SDL_CreateRenderer and finally SDL_RenderSetLogicalSize and SDL_CreateRGBSurface for a new rendering surface).

I now notice that the app correctly returns from being backgrounded every single time, instead of at random/sporadically.

So something seems to have been fixed?
Comment 33 superfury 2019-06-20 19:52:45 UTC
(In reply to Sylvain from comment #31)
> Again: when you're pushed to foreground:
> - you don't need to re-create the window, nor the renderer.
> - you don't need to modify any of your SDL_Surface.
> - you don't need to modify any of your SDL_Texture, unless you got one of
> the event SDL_RENDER_TARGETS_RESET or SDL_RENDER_DEVICE_RESET.
> 
> 
> BTW, looking at your code:
> 
> case SDL_WINDOWEVENT_FOCUS_LOST:
> 				lock(LOCK_INPUT);
> 				hasinputfocus = 0; //Lost input focus!
> 				unlock(LOCK_INPUT);
> 				#ifdef ANDROID
> 				goto didenterbackground; //Simulate background!
> 				#endif
> 
> You don't need anymore to do a didenterbackground when receiving FOCUS_LOST.
> Now, Android can work in multi-window (split view), so you can have two apps
> shown at the same time. And when your app is not focus, it can still run and
> update the screen.

I tried using it with my Android device, but Android is giving me a message saying that it "Can't use this app in multi-window view", as that message states. The icon for multiple windows is also not visible(the two squares that enable said view). Perhaps it needs some kind of permission to use that? Anyway, this way, that multi-window view issue simply solved itself?
Comment 34 superfury 2019-06-20 20:02:45 UTC
One thing about the resuming from being backgrounded, since it determines the maximum resolution each time it resumes, I saw one instance where the fullscreen view wasn't completely fullscreen? It always takes the first display resolution item it gets the first entry of SDL_ListModes(NULL,SDL_FULLSCREEN|SDL_HWSURFACE) for SDL or SDL_getCurrentDisplayMode(0,&displayMode)==0.

But that's only done on boot, so why wouldn't it properly use fullscreen? It looked like it's used some kind of border filling display mode?
Comment 35 superfury 2019-06-20 20:12:07 UTC
Also, you say that responding to input is fine when in multi-window mode? But when you minimize your app to start closing it, switch to another app or switch to multi-window mode, doesn't the app receive a SDL_EVENT_DIDENTERBACKGROUND, making it forbidden to render or give output? So the app will inhibit rendering in multi-window mode, while accepting input anyways?

Also, since my app runs at fullscreen at a fixed autodetect resolution(the (screen) resolution during it's startup), it won't display fine with only half a screen? Probably only the left half would be rendered? Or does the reported first resolution of the list change during such a mode?
Comment 36 Ellie 2019-06-20 20:19:50 UTC
It sounds like these questions might be better directed at the forum, but for what it's worth: multi-window means multi-window as on desktop, you see multiple windows at once. (E.g. as in Android 7+'s side by side, or Samsung Dex, or Android 10's upcoming experimental desktop mode) If you use latest SDL hg, you won't get a background event in side by side if you lose focus and your app stays visible. In that scenario, while you're still no longer the main currently active foreground/focused app, you can still render & get input

> Or does the reported first resolution of the list change during such a mode?

You get a window resize same as when orientation changes, so your app needs to be able to handle that properly
Comment 37 superfury 2019-06-20 20:39:27 UTC
With SDL_WINDOWEVENT_SIZE_CHANGED, it simply does the same as resuming the app(SDL_EVENT_DIDENTERFOREGROUND), but clears window_xres and window_yres, which causes the renderer to call the earlier mentioned SDL_GetCurrentDisplayMode(0,&displayMode)==0, which will cause it to update the display resolution according to the displaymode.w and displaymode.h accordingly. Will that event happen again when minimizing the app to the Android task manager and resuming it in fullscreen(single-window) mode? Otherwise it will keep said resolution when resuming in singls-window mode, which would give strange output.
Comment 38 Sylvain 2019-06-28 14:40:36 UTC
one more concurrency issue: https://hg.libsdl.org/SDL/rev/dd9169424181