
Bug 1989 - arm optimized memset4 and memcpy4
Summary: arm optimized memset4 and memcpy4
Status: WAITING
Alias: None
Product: SDL
Classification: Unclassified
Component: *don't know*
Version: HG 2.0
Hardware: ARM Other
Importance: P2 enhancement
Assignee: Ryan C. Gordon
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-25 16:53 UTC by Vittorio Giovara
Modified: 2013-07-27 04:16 UTC
CC List: 0 users

See Also:


Attachments
checking GNUCC for __builtin function (1.19 KB, patch)
2013-07-25 16:55 UTC, Vittorio Giovara
arm optimized SDL_memset4 (2.93 KB, patch)
2013-07-25 16:55 UTC, Vittorio Giovara
arm optimized SDL_memcpy4 (2.49 KB, patch)
2013-07-25 16:56 UTC, Vittorio Giovara
original non-inlined memset4 version (3.98 KB, patch)
2013-07-25 16:57 UTC, Vittorio Giovara

Description Vittorio Giovara 2013-07-25 16:53:43 UTC
A long time ago, some code was posted here to improve the default implementation of SDL_memset4 for ARM (http://sdl.5483.n7.nabble.com/ARM-optimised-memset-td29728.html).
I originally ported that code for the ancient GAS present in Xcode and then lost track of it when converting it to inline assembly.

When I found that code again a few days ago, I tried inlining it once more but failed, so I looked for alternatives and found this nice code, http://lists.uclibc.org/pipermail/uclibc/2003-September/027817.html, which works well with SDL. I have a patch ready (attached). There is no explicit license attached to that code, so I think the original author should be contacted about this use (and maybe improve it?).

Just as a backup, I've included the original patch, which still needs to be inlined, in case anyone would prefer that.

Then I noticed that memcpy4 could also be optimized, and added it even though SDL doesn't use it internally. Maybe I placed the assembly in the wrong place; if so, tell me and I'll move it.


What also puzzled me a bit is that SDL_memcpy has a check for using __builtin_memcpy, so what is really preventing us from applying the same check to the other functions? I've added a third patch that adds this check for other functions too.

Besides alignment, what is preventing using the __builtin equivalent functions for memset?
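
For reference, the kind of check being discussed could look roughly like the sketch below. The `EXAMPLE_` macro names and the `__GNUC__ >= 4` threshold are assumptions made for illustration only; they are not taken from the attached patch or from SDL, and the actual minimum GCC version that provides each builtin would still need to be confirmed.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: GCC 4 or newer provides both builtins. The threshold is
   illustrative; older compilers fall through to the libc functions. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#define EXAMPLE_memset(dst, c, len)   __builtin_memset((dst), (c), (len))
#define EXAMPLE_memcpy(dst, src, len) __builtin_memcpy((dst), (src), (len))
#else
#define EXAMPLE_memset(dst, c, len)   memset((dst), (c), (len))
#define EXAMPLE_memcpy(dst, src, len) memcpy((dst), (src), (len))
#endif
```

Either branch gives the same semantics as the standard functions, so callers should not need to care which one was selected at compile time.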
Comment 1 Vittorio Giovara 2013-07-25 16:55:20 UTC
Created attachment 1255
checking GNUCC for __builtin function
Comment 2 Vittorio Giovara 2013-07-25 16:55:52 UTC
Created attachment 1256
arm optimized SDL_memset4
Comment 3 Vittorio Giovara 2013-07-25 16:56:16 UTC
Created attachment 1257
arm optimized SDL_memcpy4
Comment 4 Vittorio Giovara 2013-07-25 16:57:19 UTC
Created attachment 1258
original non-inlined memset4 version
Comment 5 Sam Lantinga 2013-07-27 04:16:01 UTC
Thanks for the patches.  A few comments:

* Do you know what version of gcc provides each of those builtins?  I believe gcc 2.95 doesn't, for example, and I think that is still used for the Haiku build.
* I didn't check your assembly, but your SDL_memset4() is a 32-bit set, not an 8-bit set, correct?
* SDL_memcpy4() has been removed, since it isn't faster than modern memcpy functions.
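
To make the 32-bit versus 8-bit distinction in the second point concrete, here is a minimal portable sketch of what a 32-bit set is expected to do. The function name `example_memset4` is hypothetical and this is not the code from the attached patches; it only illustrates the semantics an optimized ARM version would need to match, assuming `dst` is suitably aligned for 32-bit stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference implementation: writes `dwords` copies of the
   full 32-bit value `val`. Contrast with memset(), which repeats a
   single byte. Assumes `dst` is 4-byte aligned. */
static void example_memset4(void *dst, uint32_t val, size_t dwords)
{
    uint32_t *p = (uint32_t *)dst;
    while (dwords--) {
        *p++ = val;
    }
}
```

With this definition, the third argument counts 32-bit words rather than bytes, so filling an array of N `uint32_t` pixels takes a count of N, not `N * 4`.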