| Summary: | MMX YUV renderer crash | ||
|---|---|---|---|
| Product: | SDL | Reporter: | felix <felix.von.s> |
| Component: | render | Assignee: | Ryan C. Gordon <icculus> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | CC: | awils, bluescreen_avenger, egor-pvl, icculus, sezeroz, sylvain.becker |
| Version: | HG 2.0 | ||
| Hardware: | x86 | ||
| OS: | Linux | ||
| Attachments: | Patch (requires recent-ish GCC) | ||
Good catch! Do you have a tested fix? Created attachment 2787 [details]
Patch (requires recent-ish GCC)
Tested? Not really, but something like this ought to work, at least if you don't mind dropping support for older GCCs (they should fail to build). In that case you'll need to update the #if __GNUC__ > 2 checks as well, but I left that part out.
Comment on attachment 2787 [details]
Patch (requires recent-ish GCC)
I've been running SDL2 with this patch for a while, and I don't get any more crashes. It's just a question of which compilers you care to support at this point.
I accepted your patch. Failing to compile inline assembly on older compilers is much better than crashing with newer compilers. https://hg.libsdl.org/SDL/rev/2ee7d2fa299b Thanks! Which version(s) of gcc results in a miscompile? The merged fix most certainly breaks older gcc, and clang as well (see Ryan's commits 5ba02f3c5a8b and 43c7baa53681). Can you guys not add some ifdefs to the original code instead? Bug #3715 reports that even gcc-6.3 fails building. Which gcc version really resulted in a miscompile before? The miscompiled binary is the stock Debian sid package, version 2.0.5+dfsg1-3; apparently it is built with gcc 6.4.0. The patch compiles fine with gcc 7.1.0. Apparently I underestimated how young that GCC change is... There are some preprocessor guards already around this inline-assembly renderer; those guards could be bumped to check if __GNUC__ >= 7. I didn't include that change in my patch to make it easier to review, and I figured I'd leave the decision to the upstream how to handle this. I gotta leave them with _something_ to do ;) Other possible solutions include making sure the inline-assembly renderer is compiled with -fno-omit-frame-pointer (this may hurt performance a little) or removing it entirely, instead judiciously decorating the pure-C renderer with __restrict__ and/or __attribute__((target_clones)) and then crossing fingers that the compiler will manage to vectorise the code on its own. While it's more work, I'm inclined to think that the better solution would be to drop this code and write an SSE version with compiler intrinsics. Surely we don't _need_ either MMX or inline assembly at this point? I'd also be interested in seeing if the C fallback is measurably slower on modern compilers than the MMX code at this point, in which case I'd recommend we just delete it and leave a FIXME to revisit with SSE later. --ryan. Agreed. Interestingly, that code compiles with GCC 5.4.0 here... ? $ $CC --version i686-w64-mingw32-gcc (GCC) 5.4.0 Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. (In reply to Sam Lantinga from comment #10) > Interestingly, that code compiles with GCC 5.4.0 here... ? > i686-w64-mingw32-gcc (GCC) 5.4.0 Different ABI requirements on Windows? --ryan. Does anyone have a program that calls this MMX code? Some way for me to profile this? --ryan. Comment #11: Exactly: %ebx is used when generating position-independent code on x86-32 Linux, which until recently made it 'fixed' in GCC and unavailable to the register allocator (and not even possible to be declared clobbered in inline-assembly snippets). Comment #12: ffplay with an MPEG-1 video file should do it (not all codecs will trigger it!). At least that's the way I've been hitting this bug on my Pentium III laptop with an R100-based card, for which the latest supported OpenGL version is 1.3... ...because of which I do beg you not to remove acceleration support for MMX-only CPUs: with such old hardware, I'm going to need all the acceleration I can get. (I'm kinda stuck with it for now, and I expect this may last for a while.) Will you at least look at how __restrict__ + __attribute__((target_clones)) + -ftree-vectorize behave before reaching for SSE intrinsics?
> ...because of which I do beg you not to remove acceleration support for
> MMX-only CPUs: with such old hardware, I'm going to need all the
> acceleration I can get. (I'm kinda stuck with it for now, and I expect this
> may last for a while.)
I wouldn't object to MMX intrinsics, I just want to get rid of the assembly code.
--ryan.
*** Bug 3715 has been marked as a duplicate of this bug. *** (Sorry if you get a lot of copies of this email, we're touching dozens of bug reports right now.) Tagging a bunch of bugs as target-2.0.6. This means we're in the final stretch for an official SDL 2.0.6 release! These are the bugs we really want to fix before shipping if humanly possible. That being said, we don't promise to fix them because of this tag, we just want to make sure we don't forget to deal with them before we bless a final 2.0.6 release, and generally be organized about what we're aiming to ship. After some debate, we might just remove this tag again and deal with it for a later release. Hopefully you'll hear more about this bug soon. If you have more information (including "this got fixed at some point, nevermind"), we would love to have you come add more information to the bug report when you have a moment. Thanks! --ryan. We'll leave MMX assembly disabled for the 2.0.6 release. *** Bug 3479 has been marked as a duplicate of this bug. *** I did some testing on godbolt.org, and it seems that this code can be made to compile under GCC as old as 4.4.1. The catch is, whether compilation succeeds is quite sensitive to compiler options: if the source file is compiled with -fPIC, -fomit-frame-pointer must be enabled as well. Otherwise the compiler runs out of registers and compilation fails. The same caveat applies to clang.
Decorating those two functions with __attribute__((optimize("omit-frame-pointer"))) should fix this on GCC; clang doesn't support this attribute, unfortunately. It's a bit of a hack anyway.
By the way, I think bug #3707 might be a duplicate as well.
*** Bug 3707 has been marked as a duplicate of this bug. *** > Will you at least look at how __restrict__ + __attribute__((target_clones)) > + -ftree-vectorize behave before reaching for SSE intrinsics? YUV/RGB conversion have been implemented in SDL_ConvertPixels (bug 3857) and they are somehow optimized with gcc vectorize. It you are using the mmx version (ColorRGBDitherYV12MMX1X, not the 565 ...), you could try to tell how the C version runs on you side. A quick patch to do that is to recompile SDL2 with the change in file "render/SDL_yuv_sw.c". Just reimplement the function as below: int SDL_SW_CopyYUVToRGB(SDL_SW_YUVTexture * swdata, const SDL_Rect * srcrect, Uint32 target_format, int w, int h, void *pixels, int pitch) { return SDL_ConvertPixels(swdata->w, swdata->h, swdata->format, swdata->planes[0], swdata->pitches[0], target_format, pixels, pitch); } Also, you need to make sure that gcc is compile with -O3. Or/and you can add: -ftree-loop-vectorize -fopt-info-vec-optimized -fopt-info-optimized opt-info tells if loops are vectorized or not. You should see 16 loops optimized at the end of the file. And also you can try with: -march=native Closing because the code YUV ColorRGBDitherYV12MMX1X and so on, has been removed/updated |
The functions in src/render/SDL_yuv_mmx.c contain the following inline assembly snippet: /* tap dance to workaround the inability to use %%ebx at will... */ /* move one thing to the stack... */ "pushl $0\n" /* save a slot on the stack. */ "pushl %%ebx\n" /* save %%ebx. */ "movl %0, %%ebx\n" /* put the thing in ebx. */ "movl %%ebx,4(%%esp)\n" /* put the thing in the stack slot. */ "popl %%ebx\n" /* get back %%ebx (the PIC register). */ Here's how it ended up in a binary on my old laptop: 0xb5c17dbd <ColorRGBDitherYV12MMX1X+93>: push $0x0 0xb5c17dbf <ColorRGBDitherYV12MMX1X+95>: push %ebx 0xb5c17dc0 <ColorRGBDitherYV12MMX1X+96>: mov 0xc(%esp),%ebx 0xb5c17dc4 <ColorRGBDitherYV12MMX1X+100>: mov %ebx,0x4(%esp) 0xb5c17dc8 <ColorRGBDitherYV12MMX1X+104>: pop %ebx Apparently the compiler, oblivious to the fact that the assembly snippet manipulates the %esp register, decided to refer to the operand via that same register instead of via %ebp (I believe -fomit-frame-pointer enables this). This causes %ebx to be loaded with the wrong value, which later leads to a null pointer dereference. Recent GCC can use the %ebx register normally: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47602#c16>. There is even an explicit constraint "b" for allocating it.