| Summary: | SDL_SetVideoMode crashes because of unaligned MOVAPS instruction | ||
|---|---|---|---|
| Product: | SDL | Reporter: | t.grundner |
| Component: | video | Assignee: | Sam Lantinga <slouken> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | major | ||
| Priority: | P2 | CC: | t.grundner |
| Version: | HG 2.0 | ||
| Hardware: | x86 | ||
| OS: | Windows 7 | ||
I did a bit of googling to find more about this problem and came across the following link: http://stackoverflow.com/questions/841433/gcc-attribute-alignedx-explanation There is some code showing off how alignment of values on the stack fails. I tried it and actually it worked. I guess things have changed since gcc 4.1.2. I disassembled the code and learned that the compiler is capable of aligning stack values by using a simple AND on the ESP register. So I took a deeper look at the SDL_FillRect4SSE function but could not find any AND alignments of the stack. I played a bit with the source code and as I changed the following line from DECLARE_ALIGNED(Uint32, cccc[4], 16); \ to DECLARE_ALIGNED(Uint32, cccc[4], 32); \ the AND appeared: 0x68178df0 <SDL_FillRect+0>: push %ebp 0x68178df1 <SDL_FillRect+1>: mov %esp,%ebp 0x68178df3 <SDL_FillRect+3>: and $0xffffffe0,%esp 0x68178df6 <SDL_FillRect+6>: push %edi 0x68178df7 <SDL_FillRect+7>: push %esi 0x68178df8 <SDL_FillRect+8>: push %ebx 0x68178df9 <SDL_FillRect+9>: sub $0xe4,%esp 0x68178dff <SDL_FillRect+15>: mov 0x8(%ebp),%esi 0x68178e02 <SDL_FillRect+18>: mov 0xc(%ebp),%eax 0x68178e05 <SDL_FillRect+21>: mov 0x10(%ebp),%ebx => 0x68178e08 <SDL_FillRect+24>: test %esi,%esi On the one hand I am quite happy that my app stopped crashing but on the other hand I am still wondering what is going on. Is GCC making some kind of assumption that the stack is always 16 byte aligned on my system? I figured out what is going on. GCC 4.5.2 assumes the stack is 16 byte aligned by default. Therefore there are no AND alignment corrections necessary if we wish to align a stack variable to a 16 byte boundary. That is bad if your OS ABI is not 16 byte aligned. Windows 32 bit stacks are 4 byte aligned. This results in the above mentioned SIGSEGV. This is also no problem if I compile both SDL.dll and my app with MingW because MinGW/GCC inserts a
andl $-16, %esp
instruction right in the beginning of the main function. So at least the stack of the thread calling the main function is 16 byte aligned. But as soon as I start to use the SDL.dll from an application not compiled by MinGW there is no ANDL safing my app.
However there is a GCC option that can change the default stack alignment:
-mpreferred-stack-boundary=num
Setting num=2 assumes a the stack is aligned to a 4 byte boundary. This results in GCC inserting the necessary
andl $-16, %esp
into SDL_FillRect. Rebuilding SDL with
./configure "CFLAGS=-mpreferred-stack-boundary=2 -g -O3"
solved the problem.
IMHO this should also be a problem on Solaris.
The following links contain further information:
http://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options
http://www.agner.org/optimize/calling_conventions.pdf
Wow, good work! Can you try out the latest snapshot? I think I've added a fix: http://hg.libsdl.org/SDL/rev/66f48d6bf735 Thanks for the "Wow". I am glad I could help. Actually it was fun doing some low level assembler stuff again. The latest snapshot fixed the problem. Thanks a lot!!! Great, thanks! |
Two days ago I got myself a fresh HG1.3 copy and compiled it on my windows 7 box using the latest mingw/msys package.(20110802, gcc-4.5.2). Straightaway I wrote a little SDL test app that opens a 640x480x32 window. So far so good. I then added a function to my test app called SetPixel. This caused the app to crash with SIGSEGV. At first I thought something is wrong with my SetPixel function but the app crashed before this function got called at all. According to the stacktrace it crashed within the SDL_SetVideoMode call. I started up GDB and stepped through SDL_SetVideoMode and found the application crashes within SDL_FillRect4SSE when MOVAPS is executed to move four fill color values to one of the MMX 128 bit registers. Here is the instruction used: movaps -0x38(%ebp),%xmm0 In my case EBP is pointing to 0x28FDD0. This means MOVAPS is trying to move 16 bytes starting from 0x28FD98. This is not possible because for MOVAPS to work the memory address has to be a multiple of 16. The result is the before mentioned segmentation fault. So I took a look at the source code and found the macro which creates the MOVAPS instruction. It is called SSE_BEGIN and is part of SDL_fillrect.c: #ifdef _MSC_VER #define SSE_BEGIN \ __m128 c128; \ c128.m128_u32[0] = color; \ c128.m128_u32[1] = color; \ c128.m128_u32[2] = color; \ c128.m128_u32[3] = color; #else #define SSE_BEGIN \ DECLARE_ALIGNED(Uint32, cccc[4], 16); \ cccc[0] = color; \ cccc[1] = color; \ cccc[2] = color; \ cccc[3] = color; \ __m128 c128 = *(__m128 *)cccc; #endif Within SSE_BEGIN there is another macro called DECLARE_ALIGNED which ensures the four color values cccc[4] are 16 byte aligned: #if defined(__GNUC__) #define DECLARE_ALIGNED(t,v,a) t __attribute__((aligned(a))) v #elif defined(_MSC_VER) #define DECLARE_ALIGNED(t,v,a) __declspec(align(a)) t v #else #define DECLARE_ALIGNED(t,v,a) t v #endif I spent some time wondering why this is not working and came to the following conclusion. I think it crashes because cccc[4] is allocated on the stack. I cannot think of a way how the compiler could ensure alignment for such values as it cannot know where EBP and ESP are pointing to at runtime. It will without a doubt work for global or static variables but then again those are not MT safe. IMHO a quick solution is to force the use of MOVUPS instead of MOVAPS. If you don't mind I could try to build a patch.