We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 4365 - ARM assembly to address performance of blit and fill routines
Summary: ARM assembly to address performance of blit and fill routines
Status: ASSIGNED
Alias: None
Product: SDL
Classification: Unclassified
Component: video (show other bugs)
Version: HG 1.2
Hardware: ARM Linux
: P2 enhancement
Assignee: Ryan C. Gordon
QA Contact: Sam Lantinga
URL:
Keywords: target-2.0.16
Depends on:
Blocks:
 
Reported: 2018-11-07 15:29 UTC by Ben Avison
Modified: 2020-06-27 08:47 UTC (History)
7 users (show)

See Also:


Attachments
ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.09 KB, patch)
2018-11-07 15:29 UTC, Ben Avison
Details | Diff
ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (48.48 KB, patch)
2018-11-07 15:31 UTC, Ben Avison
Details | Diff
ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.79 KB, patch)
2018-11-07 15:31 UTC, Ben Avison
Details | Diff
SDL_blit: use a named enum for required hardware features bits in dispatch tables (6.48 KB, patch)
2018-11-07 15:32 UTC, Ben Avison
Details | Diff
ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.25 KB, patch)
2018-11-07 15:33 UTC, Ben Avison
Details | Diff
ARM: assembly optimization for SDL_FillRect (4.50 KB, patch)
2018-11-07 15:33 UTC, Ben Avison
Details | Diff
ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.16 KB, patch)
2018-11-07 15:34 UTC, Ben Avison
Details | Diff
ARM: Create configure option --enable-arm-neon to govern assembly optimizations (5.00 KB, patch)
2018-11-07 15:34 UTC, Ben Avison
Details | Diff
ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (47.72 KB, patch)
2018-11-07 15:35 UTC, Ben Avison
Details | Diff
ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.69 KB, patch)
2018-11-07 15:35 UTC, Ben Avison
Details | Diff
ARM: NEON assembly optimization for SDL_FillRect (5.95 KB, patch)
2018-11-07 15:36 UTC, Ben Avison
Details | Diff
Benchmark utility (20.50 KB, text/x-csrc)
2018-11-09 02:09 UTC, Ben Avison
Details
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.73 KB, patch)
2018-11-09 02:13 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (48.51 KB, patch)
2018-11-09 02:13 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.80 KB, patch)
2018-11-09 02:14 UTC, Ben Avison
Details | Diff
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (4.49 KB, patch)
2018-11-09 02:15 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.13 KB, patch)
2018-11-09 02:17 UTC, Ben Avison
Details | Diff
[SDL2] ARM: assembly optimization for SDL_FillRect (4.51 KB, patch)
2018-11-09 02:18 UTC, Ben Avison
Details | Diff
ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.13 KB, patch)
2018-11-09 02:18 UTC, Ben Avison
Details | Diff
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (2.55 KB, patch)
2018-11-09 02:19 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (47.75 KB, patch)
2018-11-09 02:19 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.74 KB, patch)
2018-11-09 02:21 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for SDL_FillRect (5.98 KB, patch)
2018-11-09 02:22 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.08 KB, patch)
2019-02-28 19:54 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (47.88 KB, patch)
2019-02-28 19:55 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.78 KB, patch)
2019-02-28 19:56 UTC, Ben Avison
Details | Diff
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (6.60 KB, patch)
2019-02-28 19:58 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.18 KB, patch)
2019-02-28 19:59 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: assembly optimization for SDL_FillRect (4.49 KB, patch)
2019-02-28 20:00 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.17 KB, patch)
2019-02-28 20:02 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (5.00 KB, patch)
2019-02-28 20:05 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (48.59 KB, patch)
2019-02-28 20:05 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.68 KB, patch)
2019-02-28 20:06 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect (5.94 KB, patch)
2019-02-28 20:06 UTC, Ben Avison
Details | Diff
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.67 KB, patch)
2019-02-28 20:09 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (47.88 KB, patch)
2019-02-28 20:12 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.77 KB, patch)
2019-02-28 20:12 UTC, Ben Avison
Details | Diff
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (4.61 KB, patch)
2019-02-28 20:13 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.13 KB, patch)
2019-02-28 20:13 UTC, Ben Avison
Details | Diff
[SDL2] ARM: assembly optimization for SDL_FillRect (4.51 KB, patch)
2019-02-28 20:14 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.13 KB, patch)
2019-02-28 20:15 UTC, Ben Avison
Details | Diff
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (2.55 KB, patch)
2019-02-28 20:16 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (48.63 KB, patch)
2019-02-28 20:18 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.71 KB, patch)
2019-02-28 20:18 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for SDL_FillRect (5.98 KB, patch)
2019-02-28 20:19 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.67 KB, patch)
2019-10-21 17:35 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (47.88 KB, patch)
2019-10-21 17:36 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.77 KB, patch)
2019-10-21 17:37 UTC, Ben Avison
Details | Diff
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (4.61 KB, patch)
2019-10-21 17:37 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.13 KB, patch)
2019-10-21 17:38 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: assembly optimization for SDL_FillRect (4.51 KB, patch)
2019-10-21 17:38 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.13 KB, patch)
2019-10-21 17:39 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (2.55 KB, patch)
2019-10-21 17:39 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (48.63 KB, patch)
2019-10-21 17:40 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.71 KB, patch)
2019-10-21 17:40 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect (5.98 KB, patch)
2019-10-21 17:41 UTC, Ben Avison
Details | Diff
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.67 KB, patch)
2019-10-21 17:44 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (47.88 KB, patch)
2019-10-21 17:45 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.77 KB, patch)
2019-10-21 17:45 UTC, Ben Avison
Details | Diff
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (4.61 KB, patch)
2019-10-21 17:46 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.13 KB, patch)
2019-10-21 17:47 UTC, Ben Avison
Details | Diff
[SDL2] ARM: assembly optimization for SDL_FillRect (4.51 KB, patch)
2019-10-21 17:47 UTC, Ben Avison
Details | Diff
[SDL2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.13 KB, patch)
2019-10-21 17:47 UTC, Ben Avison
Details | Diff
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (2.55 KB, patch)
2019-10-21 17:48 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (48.63 KB, patch)
2019-10-21 17:48 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.71 KB, patch)
2019-10-21 17:49 UTC, Ben Avison
Details | Diff
[SDL2] ARM: NEON assembly optimization for SDL_FillRect (5.98 KB, patch)
2019-10-21 17:49 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations (5.99 KB, patch)
2019-10-30 15:44 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha (47.88 KB, patch)
2019-10-30 15:45 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha (8.77 KB, patch)
2019-10-30 15:46 UTC, Ben Avison
Details | Diff
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables (6.60 KB, patch)
2019-10-30 15:46 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits (4.18 KB, patch)
2019-10-30 15:47 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: assembly optimization for SDL_FillRect (4.49 KB, patch)
2019-10-30 15:48 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits (5.17 KB, patch)
2019-10-30 15:48 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations (5.00 KB, patch)
2019-10-30 15:49 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha (47.46 KB, patch)
2019-10-30 15:49 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha (5.68 KB, patch)
2019-10-30 15:50 UTC, Ben Avison
Details | Diff
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect (5.94 KB, patch)
2019-10-30 15:50 UTC, Ben Avison
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Avison 2018-11-07 15:29:25 UTC
Created attachment 3452 [details]
ARM: Create configure option --enable-arm-simd to govern assembly optimizations

In tests on the Raspberry Pi, various applications suffer poor graphics performance, traceable to SDL routines. Presented here are a series of patches to SDL 1.2 which provide assembly versions of these functions, under control from the configure script.

The Raspberry Pi is available with a range of ARM architectures (ARMv6 for the Pi 1 and 0, ARMv7/v8 with NEON for the Pi 2 and 3), and two different implementations give best results on different versions. A single binary can contain both implementations and choose between them by detecting the CPU at runtime.
Comment 1 Ben Avison 2018-11-07 15:31:22 UTC
Created attachment 3453 [details]
ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 2 Ben Avison 2018-11-07 15:31:58 UTC
Created attachment 3454 [details]
ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 3 Ben Avison 2018-11-07 15:32:27 UTC
Created attachment 3455 [details]
SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 4 Ben Avison 2018-11-07 15:33:01 UTC
Created attachment 3456 [details]
ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 5 Ben Avison 2018-11-07 15:33:28 UTC
Created attachment 3457 [details]
ARM: assembly optimization for SDL_FillRect
Comment 6 Ben Avison 2018-11-07 15:34:05 UTC
Created attachment 3458 [details]
ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 7 Ben Avison 2018-11-07 15:34:34 UTC
Created attachment 3459 [details]
ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 8 Ben Avison 2018-11-07 15:35:09 UTC
Created attachment 3460 [details]
ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 9 Ben Avison 2018-11-07 15:35:38 UTC
Created attachment 3461 [details]
ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 10 Ben Avison 2018-11-07 15:36:10 UTC
Created attachment 3462 [details]
ARM: NEON assembly optimization for SDL_FillRect
Comment 11 Sam Lantinga 2018-11-07 23:42:46 UTC
Do you have equivalent patches for SDL 2.0, along with performance testing results?

Thanks!
Comment 12 Ozkan Sezer 2018-11-08 05:20:43 UTC
Should we merge the existing patches to SDL-1.2 if we get performance
test results?
Comment 13 Ben Avison 2018-11-09 02:07:00 UTC
If you want an idea of the difference this makes in a real-world application, running this Pygame application on a Pi 3:

http://sphinx.mythic-beasts.com/~eupton/myriapod.tgz

* does 9.6 fps with the existing C implementations
* does 22.6 fps with the ARMv6-tuned optimizations
* does 27.3 fps with the NEON optimizations


For more formal benchmarking, I wrote the attached benchmarker utility. It's based on the benchmarker that's included with Pixman (from where I borrowed the assembly framework used in most of the opimizations). It measures performance in cases where the image data is in the L1 cache, vs in the L2 cache, vs in main memory.

Figures represent the rate of writing to the output image, in megabytes/second. Assembly speedups are quoted relative to the original C versions.  It's worth bearing in mind that in cases that do alpha blending, these benchmarks only measure the case where all source pixels are semi-transparent, which is not necessarily very representative of real-world use.


For the Pi 1 (ARMv6, ARM1176JZFS):

FillRect
                  L1            L2        Memory
C                889           462           337
ARMv6 assembly  3518 (x3.96)  1406 (x3.04)  1314 (x3.90)


BlitRGBtoRGB
                  L1            L2        Memory
C                 74.9          47.8         38.6
ARMv6 assembly    91.5 (x1.22)  81.2 (x1.69) 79.3 (x2.06)


BlitARGBto565
                  L1            L2        Memory
C                 38.0          25.1          21.7
ARMv6 assembly    48.7 (x1.28)  42.5 (x1.69)  43.2 (x1.99)


Blit_BGR888_RGB888
                  L1            L2        Memory
C                 29.2          24.3          24.4
ARMv6 assembly   528 (x18.1)   295 (x12.1)   279 (x11.5)


Blit_RGB444_RGB888
                  L1            L2        Memory
C                 31.0          27.0          27.3
ARMv6 assembly   328 (x10.6)   190 (x7.04)   246 (x9.02)


For the Pi 3 (ARMv8 with NEON, Cortex-A53):

FillRect
                  L1            L2        Memory
C               2120          2100          1091
ARMv6 assembly  4704 (x2.22)  4591 (x2.19)  1098 (x1.01)
NEON assembly  15020 (x7.08) 10662 (x5.08)  1093 (x1.00)

BlitRGBtoRGB
                  L1            L2        Memory
C                236           233           205
ARMv6 assembly   242 (x1.02)   241 (x1.04)   238 (x1.16)
NEON assembly   1081 (x4.58)  1015 (x4.36)   571 (x2.79)

BlitARGBto565
                  L1            L2        Memory
C                111           110           101
ARMv6 assembly   126 (x1.14)   126 (x1.14)   125 (x1.23)
NEON assembly    555 (x4.99)   535 (x4.84)   440 (x4.34)

Blit_BGR888_RGB888
                  L1            L2        Memory
C                101           101            97.5
ARMv6 assembly  1510 (x14.9)  1470 (x14.6)   886 (x9.09)

Blit_RGB444_RGB888
                  L1            L2        Memory
C                102           101            98.9
ARMv6 assembly   667 (x6.56)   668 (x6.59)   660 (x6.68)


Regarding the 2.0 branch, I've had a go at porting it across, so I shall shortly be attaching a second patch series. It builds for me, but it's untested and unbenchmarked.

I see that the 2.0 branch already has some ARM NEON detection logic, though it's used for audio rather than video. It didn't feature an equivalent of the --enable-arm-neon configure switch I created for the 1.2 branch, so I've migrated that across. At present, it only controls whether my video NEON optimisations are built - perhaps you'd like it to control the existing audio ones also?
Comment 14 Ben Avison 2018-11-09 02:09:04 UTC
Created attachment 3468 [details]
Benchmark utility
Comment 15 Ben Avison 2018-11-09 02:13:05 UTC
Created attachment 3469 [details]
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations
Comment 16 Ben Avison 2018-11-09 02:13:51 UTC
Created attachment 3470 [details]
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 17 Ben Avison 2018-11-09 02:14:27 UTC
Created attachment 3471 [details]
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 18 Ben Avison 2018-11-09 02:15:13 UTC
Created attachment 3472 [details]
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 19 Ben Avison 2018-11-09 02:17:22 UTC
Created attachment 3473 [details]
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 20 Ben Avison 2018-11-09 02:18:06 UTC
Created attachment 3474 [details]
[SDL2] ARM: assembly optimization for SDL_FillRect
Comment 21 Ben Avison 2018-11-09 02:18:36 UTC
Created attachment 3475 [details]
ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 22 Ben Avison 2018-11-09 02:19:13 UTC
Created attachment 3476 [details]
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 23 Ben Avison 2018-11-09 02:19:49 UTC
Created attachment 3477 [details]
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 24 Ben Avison 2018-11-09 02:21:32 UTC
Created attachment 3478 [details]
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 25 Ben Avison 2018-11-09 02:22:01 UTC
Created attachment 3479 [details]
[SDL2] ARM: NEON assembly optimization for SDL_FillRect
Comment 26 Sylvain 2019-01-23 13:53:25 UTC
Hi, 
Just wondering. I heard NEON assembly had differences between armeabi-v7a and arm64. Is this true ? And if so, which arch those neon routines are for ?
Comment 27 Ben Avison 2019-01-23 14:14:52 UTC
Yes, 32-bit and 64-bit ARM assembly are quite different in many ways, enough that it's generally impractical to write either pure assembly or inline assembly in C/C++ to target both. Some code sharing can be achieved using compiler intrinsics, but in my experience it's not possible to achieve the best results using intrinsics, at least with current compilers.

All these optimizations are written for 32-bit ARM instruction sets (and should work when compiled as either ARM or Thumb).
Comment 28 Sam Lantinga 2019-01-25 10:09:31 UTC
Thanks for creating these!

It looks like you're the original author of the pixman code used here. Some companies are leery of mixed licenses, would it be possible to re-license the assembly under the Zlib license?

The patches generally look good, here are a few comments:
* CPU_HAS_NEON should not be changed, this breaks ABI compatibility
* Casting an enum to unsigned int can be dangerous, some compilers use smaller datatypes (e.g. char) for small enumerations. You should declare features as the type of variable you're scanning into here:
SDL_sscanf(override, "%u", (unsigned int *) &features);
Comment 29 Ben Avison 2019-02-06 13:38:43 UTC
For some reason, bugzilla has only just let me know about Sam's comment!

> It looks like you're the original author of the pixman code used here. Some
> companies are leery of mixed licenses, would it be possible to re-license the
> assembly under the Zlib license?

Well, I wrote a lot of the pixman code. I'm pretty sure I can relicense that.

However, pixman-arm-neon-asm.h and the top part of pixman-arm-neon-asm.S (down to line 77) were written by Siarhei Siamashka and say they're (c) Nokia. It appears he left their employ in 2012, so I don't know how practical it would be to change the license on those. pixman-arm-asm.h is pretty trivial, but its sole macro definition was actually also originally written by Siarhei during his time at Nokia (pixman commit 3ef20333 - the Mozilla copyright refers to the C source code of which this is a disassembly, but none of that remains in pixman-arm-asm.h).

The pixman code is under a variant of the MIT license, which is similar in spirit to the zlib license. It looks to me as though main additional action it places on the user to *do* (as opposed to not do) is that it requires the copyright message and license header be included in documentation, whereas the zlib license only requests ackonwledgment in the documentation. How much of a deal-breaker is it if I can't get permission to relicense from Nokia?

> * CPU_HAS_NEON should not be changed, this breaks ABI compatibility

It's only on the SDL2 branch that I changed value of existing defines - I think it was because I wrote the SDL1.2 patches first and I wanted to keep the ordering consistent between the two branches, but you're right, ABI compatibility trumps that consideration.

> * Casting an enum to unsigned int can be dangerous, some compilers use smaller datatypes (e.g. char) for small enumerations. You should declare features as the type of variable you're scanning into here:
> SDL_sscanf(override, "%u", (unsigned int *) &features);

Good catch. I suppose I could revert to using unsigned int and use #defines for the values instead, but I generally prefer enums where possible because it lets debuggers present the symbolic names. Would something like this be acceptable?

--- a/src/video/SDL_blit_N.c
+++ b/src/video/SDL_blit_N.c
@@ -856,12 +856,13 @@ static enum blit_features
 GetBlitFeatures(void)
 {
     static enum blit_features features = -1;
-    if (features == -1) {
+    if (features == (enum blit_features) -1) {
         /* Provide an override for testing .. */
         char *override = SDL_getenv("SDL_ALTIVEC_BLIT_FEATURES");
         if (override) {
-            features = 0;
-            SDL_sscanf(override, "%u", (unsigned int *) &features);
+            unsigned int features_as_uint = 0;
+            SDL_sscanf(override, "%u", &features_as_uint);
+            features = (enum blit_features) features_as_uint;
         } else {
             features = (0
                         /* Feature 1 is has-MMX */

One more question: can I post a changes like this (or changes of license headers) here as patches to be applied at the end after the existing ones, or do you need me to rework the old ones to incorporate review comments? It'll be a bit tedious to re-upload so many of them, and it won't be immediately obvious which patches are outdated, so perhaps it would be best to open a new "bug" in that case?
Comment 30 Ben Avison 2019-02-28 19:52:35 UTC
In an attempt to keep this from going stale, I have rolled in the above changes and rebased my patch series to the tips of the 1.2 and 2.0 branches respectively (nearly 4 months of development have introduced some conflicts). I have relicensed what I can to the zlib license, but I have yet to establish who at Nokia would have the rights to relicense the remaining MIT-licensed code.
Comment 31 Ben Avison 2019-02-28 19:54:21 UTC
Created attachment 3661 [details]
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations
Comment 32 Ben Avison 2019-02-28 19:55:11 UTC
Created attachment 3662 [details]
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 33 Ben Avison 2019-02-28 19:56:56 UTC
Created attachment 3663 [details]
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 34 Ben Avison 2019-02-28 19:58:53 UTC
Created attachment 3664 [details]
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 35 Ben Avison 2019-02-28 19:59:47 UTC
Created attachment 3665 [details]
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 36 Ben Avison 2019-02-28 20:00:56 UTC
Created attachment 3666 [details]
[SDL1.2] ARM: assembly optimization for SDL_FillRect
Comment 37 Ben Avison 2019-02-28 20:02:20 UTC
Created attachment 3667 [details]
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 38 Ben Avison 2019-02-28 20:05:02 UTC
Created attachment 3668 [details]
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 39 Ben Avison 2019-02-28 20:05:37 UTC
Created attachment 3669 [details]
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 40 Ben Avison 2019-02-28 20:06:10 UTC
Created attachment 3670 [details]
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 41 Ben Avison 2019-02-28 20:06:44 UTC
Created attachment 3671 [details]
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect
Comment 42 Ben Avison 2019-02-28 20:09:20 UTC
Created attachment 3672 [details]
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations
Comment 43 Ben Avison 2019-02-28 20:12:24 UTC
Created attachment 3673 [details]
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 44 Ben Avison 2019-02-28 20:12:49 UTC
Created attachment 3674 [details]
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 45 Ben Avison 2019-02-28 20:13:19 UTC
Created attachment 3675 [details]
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 46 Ben Avison 2019-02-28 20:13:57 UTC
Created attachment 3676 [details]
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 47 Ben Avison 2019-02-28 20:14:53 UTC
Created attachment 3677 [details]
[SDL2] ARM: assembly optimization for SDL_FillRect
Comment 48 Ben Avison 2019-02-28 20:15:41 UTC
Created attachment 3678 [details]
[SDL2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 49 Ben Avison 2019-02-28 20:16:29 UTC
Created attachment 3679 [details]
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 50 Ben Avison 2019-02-28 20:18:21 UTC
Created attachment 3680 [details]
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 51 Ben Avison 2019-02-28 20:18:54 UTC
Created attachment 3681 [details]
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 52 Ben Avison 2019-02-28 20:19:28 UTC
Created attachment 3682 [details]
[SDL2] ARM: NEON assembly optimization for SDL_FillRect
Comment 53 Ryan C. Gordon 2019-07-30 17:49:37 UTC
(Sorry if you get several emails like this, we're marking a bunch of bugs.)

We're hoping to ship SDL 2.0.11 on a much shorter timeframe than we have historically done releases, so I'm starting to tag bugs we hope to have closed in this release cycle.

Note that this tag means we just intend to scrutinize this bug for the 2.0.11 release: we may fix it, reject it, or even push it back to a later release for now, but this helps give us both a goal and a wishlist for the next release.

If this bug has been quiet for a few months and you have new information (such as, "this is definitely still broken" or "this got fixed at some point"), please feel free to retest and/or add more notes to the bug.

--ryan.
Comment 54 Ben Avison 2019-08-19 17:13:56 UTC
I've finally managed to get a response from Siarhei, thee author of pixman-arm-neon-asm.h and the top part of pixman-arm-neon-asm.S (down to line 77) about the potential of changing the license on that code. It doesn't sound too promising:

> Unfortunately I can't help much. I don't know who even owns this
> copyright nowadays after Microsoft bought Nokia's smartphone business.
> 
> But if I remember correctly, Nokia's lawyers did not particularly
> like the zlib license because its no-liability disclaimer was
> kinda weak. Looks like the following stackexchange question
> discusses this particular problem:
>    https://opensource.stackexchange.com/questions/4329/is-zlib-license-no-waranty-no-liability-clause-enough-for-us-and-eu-law
> 
> For example, the zlib license only mentions "authors", while the MIT
> license mentions "AUTHORS OR COPYRIGHT HOLDERS".

Given that this is very specialist code, it's going to be hard to find someone able to do a clean-room implementation that could be under the zlib license. Is that really the only path ahead?
Comment 55 Ozkan Sezer 2019-09-07 10:25:50 UTC
Are there any licensing issues with the SDL-1.2 versions of the patches?
Comment 56 Ben Avison 2019-09-10 14:41:17 UTC
Unfortunately, the same MIT-licensed code is present in both the SDL1.2 and SDL2.0 branches.

However, if you're able to apply just the ARMv6-compatible patches, I have been able to relicense all of those to zlib. However, as the benchmarks show, you'll still be missing most of the speed boost if you do that (apart from on 1st-generation Raspberry Pis or Pi Zeros, both of which have an ARMv6 CPU).
Comment 57 Ozkan Sezer 2019-09-10 16:37:26 UTC
(In reply to bavison from comment #56)
> Unfortunately, the same MIT-licensed code is present
> in both the SDL1.2 and SDL2.0 branches.

Unlike SDL-2.x, SDL-1.2 is LGPL, so your patches _might_
be OK in SDL-1.2. (Not a lawyer, hence I'm asking.)
Comment 58 Sam Lantinga 2019-09-10 18:35:55 UTC
Let's go ahead and apply the patches for 2.0.11, with the MIT code clearly delineated, and afterward see if we can do a clean-room implementation, or disable compiling that code by default.

Companies that accept the zlib license also generally accept the MIT license, so worst case we can just identify the code for lawyers to decide whether they want to accept it.
Comment 59 Ben Avison 2019-09-10 19:21:44 UTC
OK. Do you need me to split pixman-arm-neon-asm.S into two so that each half only has one license header, or can you handle that when you merge it?

FWIW, I had a few thoughts on clean-rooming, in case it comes to that:

The main bit of code that would need a clean-room reimplementation is the NEON version of the macro `generate_composite_function`. I've had so much exposure to it with my work on Pixman that I'm not really in a position to do it myself. However, I wrote a roughly analogous macro for ARMv6 which might be a useful guide, as I was able to relicense that; the main difference is that ARMv6 has to use its main register bank for pixel data manipulation as well as for holding pointers, counters etc, so everything is much tighter for space.

Basically, what these macros do is handle all the actions that are common to different blit routines - looping over the rectangle, loading and storing data (whenever possible, with cacheline alignment at the destination image), deinterleaving and reinterleaving colour components of 32bpp pixels, and issuing prefetch instructions.

The prefetch part of the functionality is the part that's most likely to get into legal trouble, since it has a very distinct and not necessarily obvious algorithm. However, software prefetch's greatest benefit was for relatively low-end ARMv7 processors like Cortex-A8, A9 or A7 (and pre-NEON ones too, but I'm discounting them because I'm talking about NEON code here). It's less of an issue with the latest CPUs, so the easiest approach might just to be to abandon any effort to do software prefetch.

One thing in our favour is that `generate_composite_function` is way more general than is needed for the two SDL functions I used it for - for example, they both only have one source image, and the destination image is 32bpp in both cases.
Comment 60 Ryan C. Gordon 2019-09-20 20:47:39 UTC
We're changing how we do SDL release versions; now releases will be even numbers (2.0.10, 2.0.12, etc), and as soon as we tag a release, we'll move the internal version number to an odd number (2.0.12 ships, we tag the latest in revision control as 2.0.13 immediately, which will become 2.0.14 on release, etc).

As such, I'm moving the bugs tagged with target-2.0.11 to target 2.0.12. Sorry if you get a lot of email from this change!

Thanks,
--ryan.
Comment 61 Ryan C. Gordon 2019-09-20 20:48:39 UTC
We're changing how we do SDL release versions; now releases will be even numbers (2.0.10, 2.0.12, etc), and as soon as we tag a release, we'll move the internal version number to an odd number (2.0.12 ships, we tag the latest in revision control as 2.0.13 immediately, which will become 2.0.14 on release, etc).

As such, I'm moving the bugs tagged with target-2.0.11 to target 2.0.12. Sorry if you get a lot of email from this change!

Thanks,
--ryan.
Comment 62 Sylvain 2019-09-24 10:31:33 UTC
Maybe we could pixman directly instead ? Would it be ok with the license?

I mean modifying the software renderer to use optionnally pixman transformation routines.
Comment 63 Ben Avison 2019-09-24 13:33:58 UTC
Unfortunately, the per-pixel calculations differ in Pixman, meaning the results would be incompatible with what existing SDL clients expect if we were to attempt to link with a Pixman library binary.

Specifically, Pixman consistently always uses "premultiplied alpha" colour components where maximum saturation of any given colour channel is achieved when its value matches the corresponding pixel of the alpha channel. With SDL, maximum saturation of of a colour channel is represented by 0xFF (or correspondingly less when fewer than 8 bits are used to store it) even when the alpha channel has a lower value.

The other difference I recall is that SDL blend functions leave the alpha channel of the destination image unchanged, whereas Pixman would blend in an appropriate fraction of the source image's alpha channel, in the same manner as the RGB channels. This cannot be changed without breaking compatibility with existing SDL clients, as I discovered during implementation.

The code I borrowed from Pixman forms a common wrapper around its many rectangle processing optimisations, and ensures that loads, stores and colour component packing and unpacking are performed as efficiently as possible. It's probably at least as important to the speedups as is the SIMD maths that is performed by the code I wrote specifically to implement SDL's blending rules.

I have to say that I think it's highly unlikely that Pixman would accept SDL-compatible blit routines into their library: they're very strict about the algorithms they use. This is why the only option I was left with was to copy their macro definitions across into SDL.
Comment 64 Ryan C. Gordon 2019-10-18 04:55:37 UTC
I just want to make sure: is the license incompatibility a non-starter for us here? We need to decide if we're going to close this bug or work on merging these patches in.

We should assume no one is ever going to rewrite that one file; it's a massive pile of assembly code.

Sam, do you want to revisit your previous decision on the matter, include it as-is, or punt this down the road to 2.0.13?

--ryan.
Comment 65 Sam Lantinga 2019-10-19 20:03:15 UTC
Yeah, I think it's still okay to include if we mark it appropriately with the license, and ifdef the code so it's easy to exclude if somebody wants.

Ben, can you split pixman-arm-neon-asm.S based on license, the way you described?

Thanks for all your work on this!
Comment 66 Ben Avison 2019-10-21 17:35:08 UTC
Created attachment 3995 [details]
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations
Comment 67 Ben Avison 2019-10-21 17:36:40 UTC
Created attachment 3996 [details]
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 68 Ben Avison 2019-10-21 17:37:09 UTC
Created attachment 3997 [details]
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 69 Ben Avison 2019-10-21 17:37:47 UTC
Created attachment 3998 [details]
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 70 Ben Avison 2019-10-21 17:38:24 UTC
Created attachment 3999 [details]
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 71 Ben Avison 2019-10-21 17:38:51 UTC
Created attachment 4000 [details]
[SDL1.2] ARM: assembly optimization for SDL_FillRect
Comment 72 Ben Avison 2019-10-21 17:39:19 UTC
Created attachment 4001 [details]
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 73 Ben Avison 2019-10-21 17:39:49 UTC
Created attachment 4002 [details]
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 74 Ben Avison 2019-10-21 17:40:20 UTC
Created attachment 4003 [details]
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 75 Ben Avison 2019-10-21 17:40:41 UTC
Created attachment 4004 [details]
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 76 Ben Avison 2019-10-21 17:41:12 UTC
Created attachment 4005 [details]
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect
Comment 77 Ben Avison 2019-10-21 17:44:57 UTC
Created attachment 4006 [details]
[SDL2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations
Comment 78 Ben Avison 2019-10-21 17:45:24 UTC
Created attachment 4007 [details]
[SDL2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 79 Ben Avison 2019-10-21 17:45:55 UTC
Created attachment 4008 [details]
[SDL2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 80 Ben Avison 2019-10-21 17:46:21 UTC
Created attachment 4009 [details]
[SDL2] SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 81 Ben Avison 2019-10-21 17:47:00 UTC
Created attachment 4010 [details]
[SDL2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 82 Ben Avison 2019-10-21 17:47:27 UTC
Created attachment 4011 [details]
[SDL2] ARM: assembly optimization for SDL_FillRect
Comment 83 Ben Avison 2019-10-21 17:47:57 UTC
Created attachment 4012 [details]
[SDL2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 84 Ben Avison 2019-10-21 17:48:27 UTC
Created attachment 4013 [details]
[SDL2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 85 Ben Avison 2019-10-21 17:48:57 UTC
Created attachment 4014 [details]
[SDL2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 86 Ben Avison 2019-10-21 17:49:24 UTC
Created attachment 4015 [details]
[SDL2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 87 Ben Avison 2019-10-21 17:49:50 UTC
Created attachment 4016 [details]
[SDL2] ARM: NEON assembly optimization for SDL_FillRect
Comment 88 Ben Avison 2019-10-21 17:53:19 UTC
There you go - achieved by shuffling the offending lines from pixman-arm-neon-asm.S into pixman-arm-neon-asm.h, which was already 100% MIT-licensed. I've also rebased everything again so that it should all apply cleanly.
Comment 89 Ryan C. Gordon 2019-10-25 01:44:09 UTC
(In reply to Ben Avison from comment #88)
> There you go - achieved by shuffling the offending lines from
> pixman-arm-neon-asm.S into pixman-arm-neon-asm.h, which was already 100%
> MIT-licensed. I've also rebased everything again so that it should all apply
> cleanly.

Thanks!

These are all in revision control now:

https://hg.libsdl.org/SDL/rev/de3493dacdaf
https://hg.libsdl.org/SDL/rev/924cc078b2a0
https://hg.libsdl.org/SDL/rev/f84bb6a0ac86
https://hg.libsdl.org/SDL/rev/5b1f186e43f4
https://hg.libsdl.org/SDL/rev/0259428bf4cf
https://hg.libsdl.org/SDL/rev/bd6e73345614
https://hg.libsdl.org/SDL/rev/eab5b002d2ab
https://hg.libsdl.org/SDL/rev/60b1290fe381
https://hg.libsdl.org/SDL/rev/f48e492e4ba7
https://hg.libsdl.org/SDL/rev/1d8fafac75cc
https://hg.libsdl.org/SDL/rev/094ee379b862


Plus one from me to warn about MIT-license:

https://hg.libsdl.org/SDL/rev/398670c70a02

I'll look at CMake support real fast and then I think this bug is done for SDL2.

Ozkan: you want me to deal with SDL 1.2, or do you want to do it?

--ryan.
Comment 90 Ryan C. Gordon 2019-10-25 03:24:51 UTC
CMake is wired up in https://hg.libsdl.org/SDL/rev/9bf7f9ba2bf4

All that's left is SDL 1.2, so I'm taking the target-2.0.12 keyword off this bug now.

--ryan.
Comment 91 Ozkan Sezer 2019-10-25 07:12:26 UTC
> Ozkan: you want me to deal with SDL 1.2, or do you want to do it?

The new SDL-1.2 versions of the patches do not apply. (The older
ones still apply though.)  If Ben re-diffs them, I can deal with
SDL-1.2.
Comment 92 Sylvain 2019-10-25 14:10:07 UTC
Not sure why, but SDL_cpuinfo refuses to compile on the android buildbot,
(See https://buildbot.libsdl.org/waterfall)

Maybe a matter of updating the NDK, (it compiles on newer I think)


In file included from ./src/cpuinfo/SDL_cpuinfo.c:335:
/android-ndk-r13b/platforms/android-16/arch-arm/usr/include/elf.h:50:5: error: expected identifier
    AT_HWCAP,
    ^
./src/cpuinfo/SDL_cpuinfo.c:68:18: note: expanded from macro 'AT_HWCAP'
#define AT_HWCAP 16
                 ^
1 error generated.
make: *** [build/android/obj/local/armeabi-v7a/objs/SDL2/src/cpuinfo/SDL_cpuinfo.o] Error 1
Comment 93 Sylvain 2019-10-27 13:56:54 UTC
Fixed it in https://hg.libsdl.org/SDL/rev/8974226b4075

__linux__ is defined when compiling android-ndk, so it includes <elf.h> 
which try to create an enum like:

#if !defined(AT_NULL)
/* these definitions are missing from the BSD sources */
enum {
    AT_NULL = 0,
    ...
    AT_PLATFORM,
    AT_HWCAP,
    AT_CLKTCK,

    AT_SECURE = 23
};
#endif

(This is not the case in newer ndk because elf.h is different)
Comment 94 Sylvain 2019-10-29 15:16:42 UTC
I think also there is an issue in SDL_FillRects for arm simd/neon, it needs to set the fill_function.  Not tested ...
https://hg.libsdl.org/SDL/rev/38141e83d0fa
Comment 95 Ben Avison 2019-10-30 15:44:41 UTC
Created attachment 4026 [details]
[SDL1.2] ARM: Create configure option --enable-arm-simd to govern assembly optimizations
Comment 96 Ben Avison 2019-10-30 15:45:32 UTC
Created attachment 4027 [details]
[SDL1.2] ARM: SIMD assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 97 Ben Avison 2019-10-30 15:46:25 UTC
Created attachment 4028 [details]
[SDL1.2] ARM: SIMD assembly optimization for function BlitARGBto565PixelAlpha
Comment 98 Ben Avison 2019-10-30 15:46:59 UTC
Created attachment 4029 [details]
[SDL1.2] SDL_blit: use a named enum for required hardware features bits in dispatch tables
Comment 99 Ben Avison 2019-10-30 15:47:32 UTC
Created attachment 4030 [details]
[SDL1.2] ARM: SIMD assembly optimization for BGR-to-RGB 32bpp normal blits
Comment 100 Ben Avison 2019-10-30 15:48:08 UTC
Created attachment 4031 [details]
[SDL1.2] ARM: assembly optimization for SDL_FillRect
Comment 101 Ben Avison 2019-10-30 15:48:36 UTC
Created attachment 4032 [details]
[SDL1.2] ARM: SIMD optimization for 4:4:4:4 to 8:8:8:8 normal blits
Comment 102 Ben Avison 2019-10-30 15:49:09 UTC
Created attachment 4033 [details]
[SDL1.2] ARM: Create configure option --enable-arm-neon to govern assembly optimizations
Comment 103 Ben Avison 2019-10-30 15:49:45 UTC
Created attachment 4034 [details]
[SDL1.2] ARM: NEON assembly optimization for function BlitRGBtoRGBPixelAlpha
Comment 104 Ben Avison 2019-10-30 15:50:14 UTC
Created attachment 4035 [details]
[SDL1.2] ARM: NEON assembly optimization for function BlitARGBto565PixelAlpha
Comment 105 Ben Avison 2019-10-30 15:50:46 UTC
Created attachment 4036 [details]
[SDL1.2] ARM: NEON assembly optimization for SDL_FillRect
Comment 106 Ben Avison 2019-10-30 16:07:35 UTC
@Ozkan I can't see any changes on the SDL1.2 branch that should have interacted with my changes since I last rebased my patch series. However, it's quite possible I fluffed it somehow while generating them, so I've had another go.

I also have a copy of the changes on a branch on my mirror at https://github.com/bavison/SDL/commits/arm-blit-1.2 if it's any easier to get the patches from there. It's certainly quite laborious for me to upload them to Bugzilla!

I haven't included a commit to reflect the changes to `configure` or `test/configure` that happen when I run autogen.sh, partly because I don't have the same version of Autoconf installed so there are loads of unrelated changes that drown out the ones relating to the new command line options. Is that the problem?

I don't *think* any of the other corrections that have been made on the SDL2 branch need to be backported to SDL1.2, with the possible exception of the Android build fix, but I could be wrong!
Comment 107 Ozkan Sezer 2019-10-31 11:05:15 UTC
(In reply to Ben Avison from comment #106)
> @Ozkan I can't see any changes on the SDL1.2 branch that should have
> interacted with my changes since I last rebased my patch series. However,
> it's quite possible I fluffed it somehow while generating them, so I've had
> another go.
[...]
> I don't *think* any of the other corrections that have been made on the SDL2
> branch need to be backported to SDL1.2, with the possible exception of the
> Android build fix, but I could be wrong!

The first patch had mistakenly removed CheckAltivecApple() from
configure.in.  Corrected locally and applied all of the SDL-1.2
patches.  Thanks.  Will look at the Android build fix, soon.

http://hg.libsdl.org/SDL/rev/3d6dc20a0974
http://hg.libsdl.org/SDL/rev/db77604e083d
http://hg.libsdl.org/SDL/rev/78d5c2b67346
http://hg.libsdl.org/SDL/rev/84aa936d90e4
http://hg.libsdl.org/SDL/rev/eb0ed9be8a68
http://hg.libsdl.org/SDL/rev/d5d5686855d3
http://hg.libsdl.org/SDL/rev/8a6c0f0319d4
http://hg.libsdl.org/SDL/rev/4f88e197acad
http://hg.libsdl.org/SDL/rev/0ae1ddca5e85
http://hg.libsdl.org/SDL/rev/3705e81df6ff
http://hg.libsdl.org/SDL/rev/c8562ecca3c9

Regenerated configure:
http://hg.libsdl.org/SDL/rev/4ceb979e228a

Changelog update:
http://hg.libsdl.org/SDL/rev/9ed6ca274a7e
Comment 108 Ben Avison 2019-10-31 11:29:34 UTC
Thanks everyone for helping to get this over the line!
Comment 109 Ozkan Sezer 2019-10-31 11:53:03 UTC
OK. Closing this as resolved/fixed.
Comment 110 Sam Lantinga 2020-02-27 17:32:57 UTC
I temporarily disabled the NEON optimizations. ABGR -> ARGB blit yields an empty surface, and this needs to be looked at before they're re-enabled.
https://hg.libsdl.org/SDL/rev/3d49344c6988
Comment 111 Brad Smith 2020-05-07 23:28:12 UTC
The code added also does not build with Clang's assembler.

cc -O2 -pipe -I/usr/local/include -DUSING_GENERATED_CONFIG_H -Iinclude -I/usr/obj/ports/sdl2-2.0.12/SDL2-2.0.12/include -idirafter /usr/obj/ports/sdl2-2.0.12/SDL2-2.0.12/src/video/khronos -Wall -fno-strict-aliasing -fvisibility=hidden -Wdeclaration-after-statement -Werror=declaration-after-statement -I/usr/X11R6/include -DHAVE_USBHID_H -DUSBHID_UCR_DATA -DUSBHID_NEW -D_REENTRANT -MMD -MT build/pixman-arm-simd-asm.lo -c /usr/obj/ports/sdl2-2.0.12/SDL2-2.0.12/src/video/arm/pixman-arm-simd-asm.S -fPIC -DPIC -o build/.libs/pixman-arm-simd-asm.o
<instantiation>:1:1: error: unknown directive
.func fname
^
/usr/obj/ports/sdl2-2.0.12/SDL2-2.0.12/src/video/arm/pixman-arm-simd-asm.h:599:5: note: while in macro instantiation
    pixman_asm_function fname
    ^
<instantiation>:3:1: note: while in macro instantiation
...
<instantiation>:345:5: error: unknown directive
    .endfunc
    ^
<instantiation>:3:1: note: while in macro instantiation

I see FFmpeg tests the assembler with a check like..

     check_as as_func ".func test
                       .endfunc"

#if HAVE_AS_FUNC
#   define FUNC
#else
#   define FUNC @
#endif

FUNC    .func   EXTERN_ASM\name
FUNC    .endfunc
Comment 112 Dan Lawrence 2020-05-12 16:36:30 UTC
Not sure if this is the right place to address this or if I should open a new bug, but it seems as though the --enable-arm-simd optimisations are also producing incorrect results on my raspberry Pi 4.

If I use the default configuration of SDL 2.0.12 (or SDL 2.0.9) (which has ARM SIMD enabled) then alpha blitting functions incorrectly in the alpha channel. 

Essentially it just always copies the destination surface's alpha over to the final surface rather than doing the proper:

    dstA = srcA + (dstA * (1-srcA))

Calculation. Essentially right now we are getting:

    dstA = dstA

----

If you disable the simd extensions with:

    ../configure --disable-arm-simd

Then this error disappears and the Pi4 works the same as the other platforms.

I originally noticed this bug while using pygame and have an issue tracking the problem on the pygame GitHub here:

https://github.com/pygame/pygame/issues/1722


Here is a C++ reproduction case:

#define SDL_MAIN_HANDLED
#include "SDL/SDL.h"
#include <stdio.h>
#include <iostream>
#include <iomanip>


int main(int argc, char* argv[]) {

    SDL_Window *window;
    SDL_SetMainReady();
    SDL_Init(SDL_INIT_VIDEO);

    window = SDL_CreateWindow(
        "Pi4 Alpha blending bug",                  
        SDL_WINDOWPOS_UNDEFINED,           
        SDL_WINDOWPOS_UNDEFINED,           
        640,                              
        480,                               
        0          // flags 
    );

    if (window == NULL) {
        printf("Could not create window: %s\n", SDL_GetError());
        return 1;
    }
    int width = 256;
    int height = 256;
    Uint32 rmask, gmask, bmask, amask;

#if SDL_BYTEORDER == SDL_BIG_ENDIAN
    rmask = 0xff000000;
    gmask = 0x00ff0000;
    bmask = 0x0000ff00;
    amask = 0x000000ff;
#else
    rmask = 0x000000ff;
    gmask = 0x0000ff00;
    bmask = 0x00ff0000;
    amask = 0xff000000;
#endif

    SDL_Surface* src = SDL_CreateRGBSurface(0, width, height, 32,
                            rmask, gmask, bmask, amask);
    SDL_FillRect(src, NULL, SDL_MapRGBA(src->format, 255, 255, 255, 255));

    SDL_Surface* dst = SDL_CreateRGBSurface(0, width, height, 32,
                            rmask, gmask, bmask, amask);
    SDL_FillRect(dst, NULL, SDL_MapRGBA(src->format, 0, 0, 0, 128));

    SDL_BlitSurface(src, NULL, dst, NULL);

    int x = 128;
    int y = 128; 
    Uint8 rgba[4] = {0, 0, 0, 0};
    SDL_PixelFormat *format = dst->format;

    Uint8 * pixels = (Uint8 *)dst->pixels;
    Uint32 color = *((Uint32 *)(pixels + y * dst->pitch) + x);
    SDL_GetRGBA(color, format, rgba, rgba + 1, rgba + 2, rgba + 3);

    std::cout << SDL_GetPixelFormatName(format->format) << std::endl;
    std::cout << std::hex << color << std::endl;

    // Close and destroy the window
    SDL_DestroyWindow(window);

    // Clean up
    SDL_Quit();
    return 0;
}


Output with default configuration on Windows:

    SDL_PIXELFORMAT_ABGR8888
    ffffffff

Output with --disable-arm-simd:

    SDL_PIXELFORMAT_ABGR8888
    ffffffff

Output with default SDL2 configuration:

    SDL_PIXELFORMAT_ABGR8888
    80ffffff


Hope that is enough information. I'll keep tracking the bug on the pygame GitHub as well.

Cheers!
Comment 113 Sam James 2020-06-08 20:30:42 UTC
I encountered some test failures with pygame recently, and after a hint from one of their developers, found this bug. All of the failures were resolved on ARMv7 (Raspberry Pi 2) by building libsdl2 with --disable-arm-simd.

References:
* https://github.com/pygame/pygame/issues/1933 (the test failure bug 
* https://github.com/pygame/pygame/issues/1722 (another related bug, seemingly caused by this problem)
Comment 114 Sam Lantinga 2020-06-09 00:12:04 UTC
With the correctness problem and failure to compile on clang, I'm inclined to disable this for 2.0.14.

What do you think, Ryan?
Comment 115 Ben Avison 2020-06-09 19:34:37 UTC
Sorry I haven't been able to find time to investigate this recently, folks. Can anyone confirm which branches are affected?
Comment 116 Sam James 2020-06-10 18:30:03 UTC
(In reply to Ben Avison from comment #115)
> Sorry I haven't been able to find time to investigate this recently, folks.
> Can anyone confirm which branches are affected?

No worries! I've reproduced this on 2.0.12. I can try others if needed?
Comment 117 Ryan C. Gordon 2020-06-27 01:56:59 UTC
(In reply to Sam Lantinga from comment #114)
> With the correctness problem and failure to compile on clang, I'm inclined
> to disable this for 2.0.14.
> 
> What do you think, Ryan?

Yeah, let's disable this and bump to target-2.0.16 for now.

--ryan.
Comment 118 Ryan C. Gordon 2020-06-27 03:40:00 UTC
> Yeah, let's disable this and bump to target-2.0.16 for now.

Disabled in https://hg.libsdl.org/SDL/rev/d86a8168fac9

--ryan.
Comment 119 Ozkan Sezer 2020-06-27 08:47:53 UTC
I have no way of testing SDL-1.2 with arm myself, so I disabled it in
the SDL-1.2 branch too by default for now, to be safe:
https://hg.libsdl.org/SDL/rev/05db7c08aa56