ARM assembly to address performance of blit and fill routines #777
Comments
I think I can see what's happened. In SDL 2.0, the way There is no equivalent commit in SDL 1.2. My ARM assembly optimisations - both the SIMD and NEON versions - faithfully copied the SDL 1.2 behaviour (this was, after all, my primary target at the time). It looks like nobody, including myself, noticed the difference between the two branches.

First conclusion: I think the code can safely be re-enabled on SDL 1.2.

Having looked more closely at the SDL 2.0 code, I think the new handling of alpha is actually incorrect, according to its own specification. It's desirable that every colour component is treated identically (not least because it makes SIMD processing easier). Look at the least-significant byte of
to
If you substitute
By contrast, we can rearrange
to
These are not equivalent, except (almost) when

Some worked examples:

source 0x0f0f0f0f, destination 0x00000000, SDL1.2 result 0x00000000, SDL2.0 result 0x0f000000
source 0x0f0f0f0f, destination 0xffffffff, SDL1.2 result 0xfff0f0f0, SDL2.0 result 0xfef0f0f0

Ideally, I'd have said that we should be aiming for every colour component to be treated the same, which would result in 0x00000000 and 0xf0f0f0f0 respectively for these examples. (There's also an argument that some rounding should be done rather than simple truncation of the 16-bit intermediate product, but I won't go into that now.)

If I were to re-work the SDL2.0 assembly, would it be acceptable to treat all components the same in this way? I wouldn't have to revisit it yet again if the equation is changed a year or two down the line. I can change
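The worked examples above can be reproduced with a short C sketch. The formulas here are inferred from the quoted results, not copied from the SDL sources, and all function names are hypothetical. It assumes ARGB8888 pixels with alpha in the top byte, a truncating shift by 8, destination alpha left untouched in the SDL 1.2 style, and destination alpha computed as `sA + dA * (255 - sA) >> 8` in the SDL 2.0 style.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel. Equal to d + (((s - d) * a) >> 8) with a
   flooring shift, but written so every intermediate stays non-negative
   (d * 256 >= d * a because a <= 255). */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    return (d * 256u + s * a - d * a) >> 8;
}

/* Blend the three colour channels (low 24 bits) with source alpha a. */
static uint32_t blend_rgb(uint32_t src, uint32_t dst, uint32_t a)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t s = (src >> shift) & 0xffu;
        uint32_t d = (dst >> shift) & 0xffu;
        out |= blend_channel(s, d, a) << shift;
    }
    return out;
}

/* Assumed SDL 1.2-style behaviour: destination alpha is preserved. */
static uint32_t blend_sdl12_style(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    return (dst & 0xff000000u) | blend_rgb(src, dst, a);
}

/* Assumed SDL 2.0-style behaviour: dA = sA + (dA * (255 - sA) >> 8). */
static uint32_t blend_sdl20_style(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    uint32_t da = a + (((dst >> 24) * (255u - a)) >> 8);
    return (da << 24) | blend_rgb(src, dst, a);
}

/* Proposed uniform treatment: alpha goes through the colour formula. */
static uint32_t blend_uniform(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    return (blend_channel(a, dst >> 24, a) << 24) | blend_rgb(src, dst, a);
}

/* Checks the figures quoted in the comment above. */
static void check_worked_examples(void)
{
    assert(blend_sdl12_style(0x0f0f0f0fu, 0x00000000u) == 0x00000000u);
    assert(blend_sdl20_style(0x0f0f0f0fu, 0x00000000u) == 0x0f000000u);
    assert(blend_sdl12_style(0x0f0f0f0fu, 0xffffffffu) == 0xfff0f0f0u);
    assert(blend_sdl20_style(0x0f0f0f0fu, 0xffffffffu) == 0xfef0f0f0u);
    assert(blend_uniform(0x0f0f0f0fu, 0x00000000u) == 0x00000000u);
    assert(blend_uniform(0x0f0f0f0fu, 0xffffffffu) == 0xf0f0f0f0u);
}
```

Calling `check_worked_examples()` from a test harness exercises all six quoted values. The unsigned form of `blend_channel` is used only to avoid right-shifting a negative value, which is implementation-defined in C.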
Created a new issue from your latest comment in the main SDL repo as libsdl-org/SDL#4484
Re-enabled ARM SIMD and NEON asm blitters as of cb261c8
This bug report was migrated from our old Bugzilla tracker.
These attachments are available in the static archive:
ARM: Create configure option --enable-arm-simd to govern assembly optimizations (0001-ARM-Create-configure-option-enable-arm-simd-to-gover.patch, text/plain, 2018-11-07 15:29:25 +0000, 5215 bytes)
ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (0002-ARM-SIMD-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2018-11-07 15:31:22 +0000, 49642 bytes)
ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (0003-ARM-SIMD-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2018-11-07 15:31:58 +0000, 9005 bytes)
SDL_blit: use a named enum for required hardware features bits in dispatch tables (0004-SDL_blit-use-a-named-enum-for-required-hardware-feat.patch, text/plain, 2018-11-07 15:32:27 +0000, 6636 bytes)
ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (0005-ARM-SIMD-assembly-optimization-for-BGR-to-RGB-32bpp-.patch, text/plain, 2018-11-07 15:33:01 +0000, 4351 bytes)
ARM: assembly optimization for SDL_FillRect (0006-ARM-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2018-11-07 15:33:28 +0000, 4612 bytes)
ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (0007-ARM-SIMD-optimization-for-4-4-4-4-to-8-8-8-8-normal-.patch, text/plain, 2018-11-07 15:34:05 +0000, 5286 bytes)
ARM: Create configure option --enable-arm-neon to govern assembly optimizations (0008-ARM-Create-configure-option-enable-arm-neon-to-gover.patch, text/plain, 2018-11-07 15:34:34 +0000, 5117 bytes)
ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (0009-ARM-NEON-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2018-11-07 15:35:09 +0000, 48862 bytes)
ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (0010-ARM-NEON-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2018-11-07 15:35:38 +0000, 5831 bytes)
ARM: NEON assembly optimization for SDL_FillRect (0011-ARM-NEON-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2018-11-07 15:36:10 +0000, 6088 bytes)
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (0001-ARM-Create-configure-option-enable-arm-simd-to-gover.patch, text/plain, 2018-11-09 02:13:05 +0000, 5870 bytes)
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (0002-ARM-SIMD-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2018-11-09 02:13:51 +0000, 49674 bytes)
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (0003-ARM-SIMD-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2018-11-09 02:14:27 +0000, 9007 bytes)
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (0004-SDL_blit-use-a-named-enum-for-required-hardware-feat.patch, text/plain, 2018-11-09 02:15:13 +0000, 4597 bytes)
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (0005-ARM-SIMD-assembly-optimization-for-BGR-to-RGB-32bpp-.patch, text/plain, 2018-11-09 02:17:22 +0000, 4232 bytes)
[SDL2] ARM: assembly optimization for SDL_FillRect (0006-ARM-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2018-11-09 02:18:06 +0000, 4623 bytes)
ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (0007-ARM-SIMD-optimization-for-4-4-4-4-to-8-8-8-8-normal-.patch, text/plain, 2018-11-09 02:18:36 +0000, 5254 bytes)
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (0008-ARM-Create-configure-option-enable-arm-neon-to-gover.patch, text/plain, 2018-11-09 02:19:13 +0000, 2611 bytes)
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (0009-ARM-NEON-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2018-11-09 02:19:49 +0000, 48895 bytes)
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (0010-ARM-NEON-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2018-11-09 02:21:32 +0000, 5873 bytes)
[SDL2] ARM: NEON assembly optimization for SDL_FillRect (0011-ARM-NEON-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2018-11-09 02:22:01 +0000, 6121 bytes)
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (0001-ARM-Create-configure-option-enable-arm-simd-to-gover.patch, text/plain, 2019-02-28 19:54:21 +0000, 5201 bytes)
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (0002-ARM-SIMD-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2019-02-28 19:55:11 +0000, 49034 bytes)
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (0003-ARM-SIMD-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2019-02-28 19:56:56 +0000, 8986 bytes)
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (0004-SDL_blit-use-a-named-enum-for-required-hardware-feat.patch, text/plain, 2019-02-28 19:58:53 +0000, 6757 bytes)
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (0005-ARM-SIMD-assembly-optimization-for-BGR-to-RGB-32bpp-.patch, text/plain, 2019-02-28 19:59:47 +0000, 4283 bytes)
[SDL1.2] ARM: assembly optimization for SDL_FillRect (0006-ARM-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2019-02-28 20:00:56 +0000, 4602 bytes)
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (0007-ARM-SIMD-optimization-for-4-4-4-4-to-8-8-8-8-normal-.patch, text/plain, 2019-02-28 20:02:20 +0000, 5296 bytes)
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (0008-ARM-Create-configure-option-enable-arm-neon-to-gover.patch, text/plain, 2019-02-28 20:05:02 +0000, 5117 bytes)
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (0009-ARM-NEON-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2019-02-28 20:05:37 +0000, 49758 bytes)
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (0010-ARM-NEON-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2019-02-28 20:06:10 +0000, 5820 bytes)
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect (0011-ARM-NEON-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2019-02-28 20:06:44 +0000, 6079 bytes)
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (0001-ARM-Create-configure-option-enable-arm-simd-to-gover.patch, text/plain, 2019-02-28 20:09:20 +0000, 5810 bytes)
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (0002-ARM-SIMD-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2019-02-28 20:12:24 +0000, 49033 bytes)
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (0003-ARM-SIMD-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2019-02-28 20:12:49 +0000, 8982 bytes)
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (0004-SDL_blit-use-a-named-enum-for-required-hardware-feat.patch, text/plain, 2019-02-28 20:13:19 +0000, 4721 bytes)
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (0005-ARM-SIMD-assembly-optimization-for-BGR-to-RGB-32bpp-.patch, text/plain, 2019-02-28 20:13:57 +0000, 4231 bytes)
[SDL2] ARM: assembly optimization for SDL_FillRect (0006-ARM-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2019-02-28 20:14:53 +0000, 4623 bytes)
[SDL2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (0007-ARM-SIMD-optimization-for-4-4-4-4-to-8-8-8-8-normal-.patch, text/plain, 2019-02-28 20:15:41 +0000, 5254 bytes)
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (0008-ARM-Create-configure-option-enable-arm-neon-to-gover.patch, text/plain, 2019-02-28 20:16:29 +0000, 2611 bytes)
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (0009-ARM-NEON-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2019-02-28 20:18:21 +0000, 49800 bytes)
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (0010-ARM-NEON-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2019-02-28 20:18:54 +0000, 5848 bytes)
[SDL2] ARM: NEON assembly optimization for SDL_FillRect (0011-ARM-NEON-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2019-02-28 20:19:28 +0000, 6121 bytes)
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (0001-ARM-Create-configure-option-enable-arm-simd-to-gover.patch, text/plain, 2019-10-21 17:35:08 +0000, 5810 bytes)
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (0002-ARM-SIMD-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2019-10-21 17:36:40 +0000, 49033 bytes)
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (0003-ARM-SIMD-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2019-10-21 17:37:09 +0000, 8982 bytes)
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (0004-SDL_blit-use-a-named-enum-for-required-hardware-feat.patch, text/plain, 2019-10-21 17:37:47 +0000, 4721 bytes)
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (0005-ARM-SIMD-assembly-optimization-for-BGR-to-RGB-32bpp-.patch, text/plain, 2019-10-21 17:38:24 +0000, 4231 bytes)
[SDL1.2] ARM: assembly optimization for SDL_FillRect (0006-ARM-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2019-10-21 17:38:51 +0000, 4623 bytes)
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (0007-ARM-SIMD-optimization-for-4-4-4-4-to-8-8-8-8-normal-.patch, text/plain, 2019-10-21 17:39:19 +0000, 5254 bytes)
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (0008-ARM-Create-configure-option-enable-arm-neon-to-gover.patch, text/plain, 2019-10-21 17:39:49 +0000, 2611 bytes)
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (0009-ARM-NEON-assembly-optimization-for-function-BlitRGBt.patch, text/plain, 2019-10-21 17:40:20 +0000, 49800 bytes)
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (0010-ARM-NEON-assembly-optimization-for-function-BlitARGB.patch, text/plain, 2019-10-21 17:40:41 +0000, 5848 bytes)
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect (0011-ARM-NEON-assembly-optimization-for-SDL_FillRect.patch, text/plain, 2019-10-21 17:41:12 +0000, 6121 bytes)
Reported in version: HG 1.2
Reported for operating system, platform: Linux, ARM
Comments on the original bug report:
On 2018-11-07 15:29:25 +0000, Ben Avison wrote:
On 2018-11-07 15:31:22 +0000, Ben Avison wrote:
On 2018-11-07 15:31:58 +0000, Ben Avison wrote:
On 2018-11-07 15:32:27 +0000, Ben Avison wrote:
On 2018-11-07 15:33:01 +0000, Ben Avison wrote:
On 2018-11-07 15:33:28 +0000, Ben Avison wrote:
On 2018-11-07 15:34:05 +0000, Ben Avison wrote:
On 2018-11-07 15:34:34 +0000, Ben Avison wrote:
On 2018-11-07 15:35:09 +0000, Ben Avison wrote:
On 2018-11-07 15:35:38 +0000, Ben Avison wrote:
On 2018-11-07 15:36:10 +0000, Ben Avison wrote:
On 2018-11-07 23:42:46 +0000, Sam Lantinga wrote:
On 2018-11-08 05:20:43 +0000, Ozkan Sezer wrote:
On 2018-11-09 02:07:00 +0000, Ben Avison wrote:
On 2018-11-09 02:09:04 +0000, Ben Avison wrote:
On 2018-11-09 02:13:05 +0000, Ben Avison wrote:
On 2018-11-09 02:13:51 +0000, Ben Avison wrote:
On 2018-11-09 02:14:27 +0000, Ben Avison wrote:
On 2018-11-09 02:15:13 +0000, Ben Avison wrote:
On 2018-11-09 02:17:22 +0000, Ben Avison wrote:
On 2018-11-09 02:18:06 +0000, Ben Avison wrote:
On 2018-11-09 02:18:36 +0000, Ben Avison wrote:
On 2018-11-09 02:19:13 +0000, Ben Avison wrote:
On 2018-11-09 02:19:49 +0000, Ben Avison wrote:
On 2018-11-09 02:21:32 +0000, Ben Avison wrote:
On 2018-11-09 02:22:01 +0000, Ben Avison wrote:
On 2019-01-23 13:53:25 +0000, Sylvain wrote:
On 2019-01-23 14:14:52 +0000, Ben Avison wrote:
On 2019-01-25 10:09:31 +0000, Sam Lantinga wrote:
On 2019-02-06 13:38:43 +0000, Ben Avison wrote:
On 2019-02-28 19:52:35 +0000, Ben Avison wrote:
On 2019-02-28 19:54:21 +0000, Ben Avison wrote:
On 2019-02-28 19:55:11 +0000, Ben Avison wrote:
On 2019-02-28 19:56:56 +0000, Ben Avison wrote:
On 2019-02-28 19:58:53 +0000, Ben Avison wrote:
On 2019-02-28 19:59:47 +0000, Ben Avison wrote:
On 2019-02-28 20:00:56 +0000, Ben Avison wrote:
On 2019-02-28 20:02:20 +0000, Ben Avison wrote:
On 2019-02-28 20:05:02 +0000, Ben Avison wrote:
On 2019-02-28 20:05:37 +0000, Ben Avison wrote:
On 2019-02-28 20:06:10 +0000, Ben Avison wrote:
On 2019-02-28 20:06:44 +0000, Ben Avison wrote:
On 2019-02-28 20:09:20 +0000, Ben Avison wrote:
On 2019-02-28 20:12:24 +0000, Ben Avison wrote:
On 2019-02-28 20:12:49 +0000, Ben Avison wrote:
On 2019-02-28 20:13:19 +0000, Ben Avison wrote:
On 2019-02-28 20:13:57 +0000, Ben Avison wrote:
On 2019-02-28 20:14:53 +0000, Ben Avison wrote:
On 2019-02-28 20:15:41 +0000, Ben Avison wrote:
On 2019-02-28 20:16:29 +0000, Ben Avison wrote:
On 2019-02-28 20:18:21 +0000, Ben Avison wrote:
On 2019-02-28 20:18:54 +0000, Ben Avison wrote:
On 2019-02-28 20:19:28 +0000, Ben Avison wrote:
On 2019-07-30 17:49:37 +0000, Ryan C. Gordon wrote:
On 2019-08-19 17:13:56 +0000, Ben Avison wrote:
On 2019-09-07 10:25:50 +0000, Ozkan Sezer wrote:
On 2019-09-10 14:41:17 +0000, Ben Avison wrote:
On 2019-09-10 16:37:26 +0000, Ozkan Sezer wrote:
On 2019-09-10 18:35:55 +0000, Sam Lantinga wrote:
On 2019-09-10 19:21:44 +0000, Ben Avison wrote:
On 2019-09-20 20:47:39 +0000, Ryan C. Gordon wrote:
On 2019-09-20 20:48:39 +0000, Ryan C. Gordon wrote:
On 2019-09-24 10:31:33 +0000, Sylvain wrote:
On 2019-09-24 13:33:58 +0000, Ben Avison wrote:
On 2019-10-18 04:55:37 +0000, Ryan C. Gordon wrote:
On 2019-10-19 20:03:15 +0000, Sam Lantinga wrote:
On 2019-10-21 17:35:08 +0000, Ben Avison wrote:
On 2019-10-21 17:36:40 +0000, Ben Avison wrote:
On 2019-10-21 17:37:09 +0000, Ben Avison wrote:
On 2019-10-21 17:37:47 +0000, Ben Avison wrote:
On 2019-10-21 17:38:24 +0000, Ben Avison wrote:
On 2019-10-21 17:38:51 +0000, Ben Avison wrote:
On 2019-10-21 17:39:19 +0000, Ben Avison wrote:
On 2019-10-21 17:39:49 +0000, Ben Avison wrote:
On 2019-10-21 17:40:20 +0000, Ben Avison wrote:
On 2019-10-21 17:40:41 +0000, Ben Avison wrote:
On 2019-10-21 17:41:12 +0000, Ben Avison wrote:
On 2019-10-21 17:44:57 +0000, Ben Avison wrote:
On 2019-10-21 17:45:24 +0000, Ben Avison wrote:
On 2019-10-21 17:45:55 +0000, Ben Avison wrote:
On 2019-10-21 17:46:21 +0000, Ben Avison wrote:
On 2019-10-21 17:47:00 +0000, Ben Avison wrote:
On 2019-10-21 17:47:27 +0000, Ben Avison wrote:
On 2019-10-21 17:47:57 +0000, Ben Avison wrote:
On 2019-10-21 17:48:27 +0000, Ben Avison wrote:
On 2019-10-21 17:48:57 +0000, Ben Avison wrote:
On 2019-10-21 17:49:24 +0000, Ben Avison wrote:
On 2019-10-21 17:49:50 +0000, Ben Avison wrote:
On 2019-10-21 17:53:19 +0000, Ben Avison wrote:
On 2019-10-25 01:44:09 +0000, Ryan C. Gordon wrote:
On 2019-10-25 03:24:51 +0000, Ryan C. Gordon wrote:
On 2019-10-25 07:12:26 +0000, Ozkan Sezer wrote:
On 2019-10-25 14:10:07 +0000, Sylvain wrote:
On 2019-10-27 13:56:54 +0000, Sylvain wrote:
On 2019-10-29 15:16:42 +0000, Sylvain wrote:
On 2019-10-30 15:44:41 +0000, Ben Avison wrote:
On 2019-10-30 15:45:32 +0000, Ben Avison wrote:
On 2019-10-30 15:46:25 +0000, Ben Avison wrote:
On 2019-10-30 15:46:59 +0000, Ben Avison wrote:
On 2019-10-30 15:47:32 +0000, Ben Avison wrote:
On 2019-10-30 15:48:08 +0000, Ben Avison wrote:
On 2019-10-30 15:48:36 +0000, Ben Avison wrote:
On 2019-10-30 15:49:09 +0000, Ben Avison wrote:
On 2019-10-30 15:49:45 +0000, Ben Avison wrote:
On 2019-10-30 15:50:14 +0000, Ben Avison wrote:
On 2019-10-30 15:50:46 +0000, Ben Avison wrote:
On 2019-10-30 16:07:35 +0000, Ben Avison wrote:
On 2019-10-31 11:05:15 +0000, Ozkan Sezer wrote:
On 2019-10-31 11:29:34 +0000, Ben Avison wrote:
On 2019-10-31 11:53:03 +0000, Ozkan Sezer wrote:
On 2020-02-27 17:32:57 +0000, Sam Lantinga wrote:
On 2020-05-07 23:28:12 +0000, Brad Smith wrote:
On 2020-05-12 16:36:30 +0000, Dan Lawrence wrote:
On 2020-06-08 20:30:42 +0000, Sam James wrote:
On 2020-06-09 00:12:04 +0000, Sam Lantinga wrote:
On 2020-06-09 19:34:37 +0000, Ben Avison wrote:
On 2020-06-10 18:30:03 +0000, Sam James wrote:
On 2020-06-27 01:56:59 +0000, Ryan C. Gordon wrote:
On 2020-06-27 03:40:00 +0000, Ryan C. Gordon wrote:
On 2020-06-27 08:47:53 +0000, Ozkan Sezer wrote: