We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 2703 - Android null pointer derefernce in Android_JNI_GetNativeWindow
Summary: Android null pointer derefernce in Android_JNI_GetNativeWindow
Status: RESOLVED INVALID
Alias: None
Product: SDL
Classification: Unclassified
Component: video (show other bugs)
Version: 2.0.3
Hardware: ARM Android (All)
: P2 major
Assignee: Gabriel Jacobo
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-23 23:08 UTC by John McFarlane
Modified: 2014-09-30 15:53 UTC (History)
0 users

See Also:


Attachments
Filtered logcat excerpt broken up with description of repro steps (1.25 KB, text/plain)
2014-08-24 16:58 UTC, John McFarlane
Details
AndroidManifest.xml (1.30 KB, text/xml)
2014-09-17 15:28 UTC, John McFarlane
Details

Note You need to log in before you can comment on or make changes to this bug.
Description John McFarlane 2014-08-23 23:08:59 UTC
The following line of Android_JNI_GetNativeWindow:

s = (*env)->CallStaticObjectMethod(env, mActivityClass, midGetNativeSurface);

can return a NULL pointer. In the following line:

    anw = ANativeWindow_fromSurface(env, s);

The NULL pointer is passed into a function which causes a crash with stack including the following:

I/DEBUG   (20891): backtrace:
I/DEBUG   (20891):     #00  pc 00054be4  /system/lib/libdvm.so
I/DEBUG   (20891):     #01  pc 000672ef  /system/lib/libandroid_runtime.so (android::android_view_Surface_getSurface(_JNIEnv*, _jobject*)+30)
I/DEBUG   (20891):     #02  pc 0006732f  /system/lib/libandroid_runtime.so (android::android_view_Surface_getNativeWindow(_JNIEnv*, _jobject*)+6)
I/DEBUG   (20891):     #03  pc 00009375  /system/lib/libandroid.so (ANativeWindow_fromSurface+10)
I/DEBUG   (20891):     #04  pc 00072368  /data/app-lib/com.jsam.crag-2/libSDL2.so (Android_JNI_GetNativeWindow+128)
I/DEBUG   (20891):     #05  pc 0014f5c4  /data/app-lib/com.jsam.crag-2/libSDL2.so (Android_CreateWindow+344)

Repro:
Unfortunately I don't have a reliable repro. I need to repeatedly press the device's screen-off button and re-activate the app until some unknown condition is met. However, with the suggested fix (below) it is possible for the app to bail gracefully.

Suggested fix:
Note that the calling function, Java_org_libsdl_app_SDLActivity_onNativeSurfaceChanged, also assumes non-NULL return from Android_JNI_GetNativeWindow. Thus the full suggested fix is:

diff -r 9887ee47e143 src/core/android/SDL_android.c
--- a/src/core/android/SDL_android.c	Sat Aug 23 11:00:16 2014 -0700
+++ b/src/core/android/SDL_android.c	Sat Aug 23 16:03:31 2014 -0700
@@ -223,7 +223,12 @@
             ANativeWindow_release(data->native_window);
         }
         data->native_window = Android_JNI_GetNativeWindow();
-        data->egl_surface = SDL_EGL_CreateSurface(_this, (NativeWindowType) data->native_window);
+        if (! data->native_window) {
+            data->egl_surface = NULL;
+        }
+        else {
+            data->egl_surface = SDL_EGL_CreateSurface(_this, (NativeWindowType) data->native_window);
+        }
     }
     
     /* GL Context handling is done in the event loop because this function is run from the Java thread */
@@ -454,6 +459,10 @@
     JNIEnv *env = Android_JNI_GetEnv();
 
     s = (*env)->CallStaticObjectMethod(env, mActivityClass, midGetNativeSurface);
+    if (! s) {
+        return NULL;
+    }
+    
     anw = ANativeWindow_fromSurface(env, s);
     (*env)->DeleteLocalRef(env, s);
Comment 1 Sam Lantinga 2014-08-24 07:07:15 UTC
Can you double check whether this is active in the current source snapshot?
http://www.libsdl.org/tmp/SDL-2.0.zip

Thanks!
Comment 2 John McFarlane 2014-08-24 08:28:15 UTC
It happens in the snapshot and it happens at head of default branch of http://hg.libsdl.org/SDL. (Wasn't entirely clear which version to put down in report. HG 2.10 perhaps?) However, this is only after the patch from 2679 is applied. Otherwise, a crash occurs earlier and more frequently.

A couple of points:

Firstly, my suggested fix only postpones problems as SDLActivity assumes the surface exists in a number of places.

Secondly, this is specifically one of a number of problems which occur when the power button is pressed to deactivate - then reactivate - the phone and when there is screen lock set up. On phones where reactivating takes you straight back to the previous app, behavior is as with normal suspend/resume behavior and all is well.

It seems that a problematic combination of callbacks reaches the activity when a screen lock is enabled. On deactivate I see:
- onPause, nativePause, onDestroy (sends SDL_QUIT), surfaceDestroyed, onCreate, onResume, onPause, surfaceCreated, surfaceChanged.

This basically shuts down the app and then restarts it (SDL_main called) - all before the phone's screen is turned back on! When the phone is unlocked, the app receives another compliment of shutdown and (usually) startup messages. Main test device is an HTC One and whether this is due to some shonky HTC customization or whether this happens on stock Android, I'm not sure. Certainly I'm receiving reports that this repro causes high failure rates on Galaxy G5s. I aim to debug on one soon.

I have already dealt with a number of bugs caused by the fact that when SDL_main is re-run, global static state is not reset so I think I can rule that out. I wonder whether a better overall strategy would be for SDLActivity to react to destroy/create by merely passing suspend/resume events to the event queue. It's questionable whether quit/terminate events are appropriate on mobile devices.

Thanks!
Comment 3 Sam Lantinga 2014-08-24 09:31:54 UTC
Okay, thanks for the info. Gabriel, can you take a look at this for the 2.0.4 release?
Comment 4 John McFarlane 2014-08-24 16:58:23 UTC
Created attachment 1848 [details]
Filtered logcat excerpt broken up with description of repro steps

I've attached a log which shows the events as they're received by SDLActivity (with 2679 patch applied). From this you can see the double dose of deactivation and activation the app goes through as the phone screen is turned off and on again when a swipe or lock screen is enabled. In this run, nothing untoward happens but I think this sequence generally begs for race conditions to occur in SDLActivity.

In this case, app is running on an (ancient) HTC Nexus One running 2.3.X but the same thing happens on current hardware and OS combo.
Comment 5 Gabriel Jacobo 2014-09-16 11:49:16 UTC
Are you running a custom SDLActivity.java? Is it up to date with tip? The global state should be reset in OnCreate and OnDestroy (see the SDLActivity.initialize()  call), so if you are having issues with that make sure you have the latest version in place.

What does your AndroidManifest.xml look like? Do you have "orientation" in android:configChanges? Build and target API levels? Do you get the OnDestroy events in both phone orientations when you push the power button or unlock, or just in landscape?
Comment 6 John McFarlane 2014-09-16 15:49:45 UTC
This appears to be resolved by the suggest fix from Gabriel Jacobo in 2679. I'll wait for it show up in repo before closing these bugs unless otherwise directed. Thanks again, John.
Comment 7 John McFarlane 2014-09-17 05:23:38 UTC
Actually, I think that that patch merely reduces the frequency. I've added a switch which sends a quit message to the app very soon after it's started up and this seems to cause the context crash more reliably. I'm going to send a stable release build to someone with a Galaxy S5 who is very good at crashing my app to see if the patch helps in that case.
Comment 8 John McFarlane 2014-09-17 15:28:52 UTC
Created attachment 1871 [details]
AndroidManifest.xml
Comment 9 John McFarlane 2014-09-17 15:34:59 UTC
AndroidManifest.xml attached, build/target version are 14/10. The orientation is fixed. OnDestroy gets called plenty. (The attached log show SDL log output.) I've done what I can to prevent initialized global variables from causing a problem when main is re-entered but haven't managed to get Valgrind running on my device yet.
Comment 10 John McFarlane 2014-09-30 15:53:27 UTC
I don't have the time to delve into this issue and assuming it's possible for the client to corrupt memory used by EGL/GLES, it's just as likely to be an obscure bug caused by my app. Additionally, I spoke with the former engineer colleague who was reporting a much more frequent crash with this repro and it turns out he was not longer seeing the issue. Thanks for your time.
John