| Summary: | asm code in video/SDL_stretch.c | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Ozkan Sezer <sezeroz> |
| Component: | video | Assignee: | Sam Lantinga <slouken> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | CC: | sylvain.becker |
| Version: | HG 2.0 | ||
| Hardware: | x86 | ||
| OS: | All | ||
| Attachments: |
test-case
test image test log test-case (fixed) i686 test log updated patch updated patch - v3 updated patch -- v3 test-case test-case test log 64 bits new SDL_stretch.c |
||
Sylvain: your commit https://hg.libsdl.org/SDL/rev/a304d0bbe458 enabled the stretch asm code for compilers other than gcc. See the patch in comment #0. Hey Ozkan, I re-enabled as I tested with gcc 4.6, gcc 6 and gcc 10.2, and clang with HAVE_MPROTECT compiling with -m32 and using multilib package on 64bits not sure about others OS .. checking with a basic test-case (it was disabled here: https://hg.libsdl.org/SDL/rev/8ae607392409) Can you post your testcase? Created attachment 4711 [details]
test-case
test-case
Created attachment 4712 [details]
test image
test image
Created attachment 4713 [details]
test log
test log
I've double-checked and, it work with USE_ASM + MPROTECT (on i386 linux/multilib), but If I manually un-define HAVE_MPROTECT, then it crashes ! So, disable asm path if mprotect isn't available https://hg.libsdl.org/SDL/rev/9eb1c95cd4d6 Created attachment 4714 [details]
test-case (fixed)
Fixed testcase (fixes type of crc local var)
Created attachment 4715 [details]
i686 test log
Ran this on my i686 CentOS-6.10 (Intel Core2 Duo E8400 @ 3.00GHz)
using gcc4.4.7: Every line in the logs stay the same, except for
delta time of course:
Without USE_ASM_STRETCH (because you disable it for gcc < 4.6),
delta time is usually around 420 ms
With USE_ASM_STRETCH enabled, delta time is around ~90 ms.
My gcc is gcc4.4.7 of RedHat. Maybe the issue with gcc4.4.1 is
fixed later in newer versions?
Sam?
Made a quick win32 build using gcc-4.5.4, seems to work: Every crc is the same with asm disabled/enabled, delta-time is reduced to from ~151 ms to ~51 ms. Created attachment 4716 [details]
updated patch
Built using VS2017 for i686 with the attached updated patch applied: CRCs differ againsts MinGW-built dll. However, with asm disabled or enabled, CRCS stay the same for MSVC-built dlls. Delta time reduces to ~51 ms from ~115 ms. OS/2 builds using Watcom without asm matches all the CRCs from the linux and windows gcc builds. But the asm-enabled builds crash (possibly for lack of any align directive in C code) ?? I won't go after that one and leave watcom alone w/o asm.. Created attachment 4717 [details]
updated patch - v3
- adds MSVC __declspec(align(x)) support,
- disables asm if PAGE_ALIGNED no macro is defined,
- still disables asm for gcc < 4.6, need more info,
- drops Watcom support.
Created attachment 4718 [details]
updated patch -- v3
Correct version of patch attached this time:
- adds MSVC __declspec(align(x)) support,
- disables asm if PAGE_ALIGNED no macro is defined,
- still disables asm for gcc < 4.6, need more info,
- drops Watcom support.
Created attachment 4719 [details]
test-case
I have update the test case so that it allows try with a 24 bpp image
Created attachment 4720 [details]
test-case
Updated again with your modif !
Created attachment 4721 [details]
test log 64 bits
my test log on 64 bits
Created attachment 4722 [details]
new SDL_stretch.c
I propose this new version for SDL_stretch.c
that drops mprotect and asm
Code is similar to the StretchLinear, but the steps computation are kept similar to the nearest.
so that:
- it's pixel perfect with nearest
- as fast as asm I think
- no asm, nor mprotect
- benefit for all archicture
Ozkan, I tweaked your patch and applied it: https://hg.libsdl.org/SDL/rev/f32946bbe1e6 Sylvain, can you confirm the performance of your code relative to what I just checked in? on i386 ( gcc-multilib) I see the current head version with asm/mprotect gives: 60 ms for 100 SDL_Stretch() without asm: 149 ms (gcc 10) or 130ms (gcc 4.6) whereas my C version gives 73 ms. on x84, there is no asm, it gives 215 ms. whereas my c version is 60 ms ... with clang, with gcc: 75 ms (In reply to Sylvain from comment #23) > on i386 ( gcc-multilib) > > I see the current head version with asm/mprotect gives: > 60 ms for 100 SDL_Stretch() > without asm: 149 ms (gcc 10) or 130ms (gcc 4.6) > > whereas my C version gives 73 ms. Your new C version should be good: 60 <-> 73 ms should be negligible, no? All right! This code is in, thanks Sylvain! https://hg.libsdl.org/SDL/rev/e39af0d27669 Sylvain: Applied this: https://hg.libsdl.org/SDL/rev/0645e1a9812e Please confirm that it's correct. About new code: *(Uint16*)dst = *(Uint16*)s_00_01 in scale_mat_nearest_2() won't cause any bus errors because of possible unaligned access, won't it? (In reply to Ozkan Sezer from comment #28) > About new code: *(Uint16*)dst = *(Uint16*)s_00_01 > in scale_mat_nearest_2() won't cause any bus errors > because of possible unaligned access, won't it? Same question for scale_mat_nearest_4() . Indeed, I think https://hg.libsdl.org/SDL/rev/0645e1a9812e is correct, otherwise no code return, if no sse/neon I don't think there would be bus error, because it always move by 'bpp' bytes, so it remains with the same alignment. (like the original copy_row of any image copying) now, I hesitate between keeping the code same as the bilinear one, or just simplify it to the most basic step/increment & loop. because, it would just apply to more scale loop I guess.. some simplification: https://hg.libsdl.org/SDL/rev/1010371c31c4 Changing the other function in bug #5510 |
Below is a test patch for SDL_stretch.c, which adds OS-specific memory calls (win32, os/2: where mprotect() isn't available) for the asm code. It's compile-tested only, not run-tested (at least not yet). Questions: - Has the asm code path ever been tested with any newer gcc than 4.4.1, i.e. is the failure reason known? - Asm code is disabled not just for gcc but for other compilers, too: has any failure been observed for them too? ---- diff --git a/src/video/SDL_stretch.c b/src/video/SDL_stretch.c --- a/src/video/SDL_stretch.c +++ b/src/video/SDL_stretch.c @@ -43,7 +43,14 @@ #ifdef USE_ASM_STRETCH -#ifdef HAVE_MPROTECT +#ifdef __WIN32__ +#define WIN32_LEAN_AND_MEAN +#include <windows.h> +#elif defined (__OS2__) +#define INCL_DOSMEMMGR +#define INCL_DOSERRORS +#include <os2.h> +#elif defined(HAVE_MPROTECT) #include <sys/types.h> #include <sys/mman.h> #endif @@ -77,6 +84,9 @@ generate_rowbytes(int src_w, int dst_w, int status; } last; +#ifdef __WIN32__ + DWORD oldprot; +#endif int i; int pos, inc; unsigned char *eip, *fence; @@ -104,8 +114,16 @@ generate_rowbytes(int src_w, int dst_w, default: return SDL_SetError("ASM stretch of %d bytes isn't supported", bpp); } -#ifdef HAVE_MPROTECT /* Make the code writeable */ +#ifdef __WIN32__ + if (!VirtualProtect(copy_row, sizeof(copy_row), PAGE_READWRITE, &oldprot)) { + return SDL_SetError("Couldn't make copy buffer writeable"); + } +#elif defined (__OS2__) + if (DosSetMem(copy_row, sizeof(copy_row), PAG_READ | PAG_WRITE) != NO_ERROR) { + return SDL_SetError("Couldn't make copy buffer writeable"); + } +#elif defined(HAVE_MPROTECT) if (mprotect(copy_row, sizeof(copy_row), PROT_READ | PROT_WRITE) < 0) { return SDL_SetError("Couldn't make copy buffer writeable"); } @@ -136,8 +154,16 @@ generate_rowbytes(int src_w, int dst_w, } *eip++ = RETURN; -#ifdef HAVE_MPROTECT /* Make the code executable but not writeable */ +#ifdef __WIN32__ + if (!VirtualProtect(copy_row, sizeof(copy_row), PAGE_EXECUTE_READ, &oldprot)) { + return SDL_SetError("Couldn't make copy buffer executable"); + } +#elif defined (__OS2__) + if (DosSetMem(copy_row, sizeof(copy_row), PAG_READ | PAG_EXECUTE) != NO_ERROR) { + return SDL_SetError("Couldn't make copy buffer executable"); + } +#elif defined(HAVE_MPROTECT) if (mprotect(copy_row, sizeof(copy_row), PROT_READ | PROT_EXEC) < 0) { return SDL_SetError("Couldn't make copy buffer executable"); }