Bug 1989

Summary: arm optimized memset4 and memcpy4
Product: SDL
Reporter: Vittorio Giovara <vitto.giova>
Component: *don't know*
Assignee: Ryan C. Gordon <icculus>
Status: WAITING ---
QA Contact: Sam Lantinga <slouken>
Severity: enhancement    
Priority: P2    
Version: HG 2.0   
Hardware: ARM   
OS: Other   
Attachments:
* checking GNUCC for __builtin function
* arm optimized SDL_memset4
* arm optimized SDL_memcpy4
* original non-inlined memset4 version

Description Vittorio Giovara 2013-07-25 16:53:43 UTC
A long time ago some code was posted here to improve the default implementation of SDL_memset4 for ARM (http://sdl.5483.n7.nabble.com/ARM-optimised-memset-td29728.html).
I originally ported that code for the ancient GAS present in Xcode and then lost track of it when converting it to inline assembly.

When I found that code again a few days ago, I tried inlining it once more but failed, so I looked for alternatives and found this nice code, http://lists.uclibc.org/pipermail/uclibc/2003-September/027817.html, which works well with SDL. I have a patch ready (attached). There is no explicit license attached, so I think the original author should be contacted about this use (and maybe improve it?).

Just as a backup, I've included the original patch, which still needs to be inlined, in case anyone would prefer that...

Then I noticed that memcpy4 could also be optimized, and added it even though SDL doesn't use it internally. Maybe I placed the assembly in the wrong place; if so, tell me and I'll move it.


What also puzzled me a bit is that for SDL_memcpy there is a check for using __builtin_memcpy... so what is really preventing us from applying the same check to the other functions? I've added a third patch that adds this check for other functions too...

Besides alignment, what is preventing using the __builtin equivalent functions for memset?
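
To make the question concrete, here is a rough sketch of what such a guard might look like for memset. It is not the contents of the attached patch. The names example_memset, EXAMPLE_HAVE_BUILTIN_MEMSET and EXAMPLE_MIN_GNUC are hypothetical, and the GCC version threshold is a placeholder assumption, not a verified minimum.

```c
#include <stddef.h>
#include <string.h>

/* __has_builtin exists in clang and in newer GCC releases; define a
   fallback so the test below is still valid on compilers without it. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Placeholder threshold (assumption): the first GCC major version that
   we are willing to trust for __builtin_memset. Not a verified value. */
#define EXAMPLE_MIN_GNUC 4

#if __has_builtin(__builtin_memset)
#define EXAMPLE_HAVE_BUILTIN_MEMSET 1
#elif defined(__GNUC__) && (__GNUC__ >= EXAMPLE_MIN_GNUC)
#define EXAMPLE_HAVE_BUILTIN_MEMSET 1
#else
#define EXAMPLE_HAVE_BUILTIN_MEMSET 0
#endif

/* Hypothetical wrapper: use the builtin when the guard allows it,
   otherwise fall back to the C library memset. */
static void *example_memset(void *dst, int c, size_t len)
{
#if EXAMPLE_HAVE_BUILTIN_MEMSET
    return __builtin_memset(dst, c, len);
#else
    return memset(dst, c, len);
#endif
}
```

Either branch behaves like a plain byte-wise memset, so the guard only changes which implementation the compiler is allowed to pick.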
Comment 1 Vittorio Giovara 2013-07-25 16:55:20 UTC
Created attachment 1255 [details]
checking GNUCC for __builtin function
Comment 2 Vittorio Giovara 2013-07-25 16:55:52 UTC
Created attachment 1256 [details]
arm optimized SDL_memset4
Comment 3 Vittorio Giovara 2013-07-25 16:56:16 UTC
Created attachment 1257 [details]
arm optimized SDL_memcpy4
Comment 4 Vittorio Giovara 2013-07-25 16:57:19 UTC
Created attachment 1258 [details]
original non-inlined memset4 version
Comment 5 Sam Lantinga 2013-07-27 04:16:01 UTC
Thanks for the patches.  A few comments:

* Do you know what version of gcc provides each of those builtins?  I believe gcc 2.95 doesn't, for example, and I think that is still used for the Haiku build.
* I didn't check your assembly, but your SDL_memset4() is a 32-bit set, not an 8-bit set, correct?
* SDL_memcpy4() has been removed, since it isn't faster than modern memcpy functions.
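
For reference, the 32-bit versus 8-bit distinction in the second point can be sketched in portable C. This is only an illustration of the expected semantics, not the assembly from the attachments; example_memset4 is a hypothetical name, and the count is assumed to be in 32-bit words rather than bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference version: store the full 32-bit value `val`
   into `dwords` consecutive 32-bit slots starting at `dst`.
   A plain memset() would instead replicate a single byte. */
static void example_memset4(uint32_t *dst, uint32_t val, size_t dwords)
{
    size_t i;
    for (i = 0; i < dwords; i++) {
        dst[i] = val;
    }
}
```

With a value such as 0x11223344, every word of the destination ends up holding 0x11223344, which an 8-bit set cannot produce because all four bytes would be identical.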