Bug 1755 - Benchmark inline asm in SDL_stdinc.h and maybe remove it
Summary: Benchmark inline asm in SDL_stdinc.h and maybe remove it
Status: RESOLVED FIXED
Alias: None
Product: SDL
Classification: Unclassified
Component: *don't know*
Version: HG 2.0
Hardware: x86 Other
Importance: P2 minor
Assignee: Ryan C. Gordon
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-15 01:43 UTC by Ryan C. Gordon
Modified: 2013-07-09 12:43 UTC
CC List: 0 users

See Also:


Attachments
Test program... (871 bytes, text/plain)
2013-07-08 22:33 UTC, Ryan C. Gordon

Description Ryan C. Gordon 2013-03-15 01:43:11 UTC
There's some memcpy/memset code in include/SDL_stdinc.h ... see if it actually beats what modern platforms are shipping, and remove it if it's not worth keeping.

--ryan.
Comment 1 Ryan C. Gordon 2013-07-08 22:33:21 UTC
Created attachment 1214
Test program...

OK, here's what I've got. Attached is a simple test program. It just does SDL_memset4() on a stack buffer, then does the same with memset(). The theory is that the buffer stays in the L2 cache, so we're testing the algorithm instead of memory latency.

Here's what I get testing SDL_memset4() on a Retina MacBook Pro, 32-bit process (times in milliseconds):

asm average: 336.600000
sys average: 228.200000

So system memset() beats SDL_memset4() here. Oh, hey, what happens on x86-64 when we don't get the inline asm?

asm average: 860.200000
sys average: 231.600000

(gcc -O3 in all these cases. "clang -O3" on amd64 does ~654 milliseconds. clang for x86 is about equivalent to gcc.)

--ryan.
Comment 2 Ryan C. Gordon 2013-07-08 22:50:30 UTC
On Linux x86-64, this yields:

asm average: 569.200000
sys average: 289.600000


On Linux x86:

asm average: 311.600000
sys average: 290.200000

--ryan.
Comment 3 Ryan C. Gordon 2013-07-08 23:10:30 UTC
On Windows, Visual Studio 2010 with /Ox optimizations appears to realize this memset() is useless and makes it a no-op, making the memset() test take zero milliseconds.  :)

SDL_memset4() never uses the asm code on Visual Studio (just the C fallback), so I say we dump this code. As we're moving into x86-64 and ARM processors as the main targets, we never get the asm code anyhow, and even the asm code can't keep up with the system's C runtime versions on any platform.

--ryan.
Comment 4 Ryan C. Gordon 2013-07-08 23:25:00 UTC
Removed in hg changeset 898992405fa7.

--ryan.
Comment 5 Sam Lantinga 2013-07-09 10:14:47 UTC
Backed out commit 898992405fa7: memset() does a byte fill and SDL_memset4() does a uint32 fill, so this change breaks SDL_FillRect().
Comment 6 Ryan C. Gordon 2013-07-09 12:43:19 UTC
Closing this as FIXED because the rest of the inline ASM got removed, and SDL_memset4 needs to stay (whoops!).

--ryan.