| Summary: | Benchmark inline asm in SDL_stdinc.h and maybe remove it | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Ryan C. Gordon <icculus> |
| Component: | *don't know* | Assignee: | Ryan C. Gordon <icculus> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | minor | ||
| Priority: | P2 | ||
| Version: | HG 2.0 | ||
| Hardware: | x86 | ||
| OS: | Other | ||
| Attachments: | Test program... | ||
|
Description
Ryan C. Gordon
2013-03-15 01:43:11 UTC
Created attachment 1214
Test program...
Ok, here's what I've got. Attached is a simple test program. It just does SDL_memset4() on a stack buffer, then does it with memset()...the theory is that we'll get into the L2 cache and then be testing the algorithm instead of memory latency.
Here's what I get testing SDL_memset4() on a Retina MacBook Pro, 32-bit process:
asm average: 336.600000
sys average: 228.200000
So system memset() beats SDL_memset4() here. Oh, hey, what happens on x86-64 when we don't get the inline asm?
asm average: 860.200000
sys average: 231.600000
(gcc -O3 in all these cases. "clang -O3" on amd64 does ~654 milliseconds. clang for x86 is about equivalent to gcc.)
--ryan.
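The attachment is not reproduced in this report. As a rough sketch of the kind of comparison described above, assuming a 16 KiB stack buffer, CPU time from `clock()`, and a hypothetical `fill32()` standing in for a plain C 32-bit fill (none of these names or sizes come from the attached program):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

enum { BUF_DWORDS = 4096 }; /* 16 KiB: assumed size, small enough to stay in cache */

/* Hypothetical stand-in for a plain C 32-bit fill; not SDL's actual code. */
static void fill32(uint32_t *dst, uint32_t val, size_t dwords)
{
    while (dwords--) {
        *dst++ = val;
    }
}

/* Time `iterations` 32-bit fills of a stack buffer, in milliseconds of CPU time. */
static double time_fill32_ms(unsigned iterations)
{
    uint32_t buf[BUF_DWORDS];
    volatile uint32_t sink = 0; /* reading the buffer back keeps the fill from being optimized away */
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++) {
        fill32(buf, (uint32_t)i, BUF_DWORDS);
        sink += buf[i % BUF_DWORDS];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Same loop, but using the C runtime's memset() on the same number of bytes. */
static double time_memset_ms(unsigned iterations)
{
    uint32_t buf[BUF_DWORDS];
    volatile uint32_t sink = 0;
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++) {
        memset(buf, (int)(i & 0xFF), sizeof buf);
        sink += buf[i % BUF_DWORDS];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Calling both functions with the same iteration count, repeating a few times, and averaging gives figures comparable in shape to the "asm average" and "sys average" lines, though the absolute numbers depend on the machine, compiler, and flags.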
On Linux x86-64, this yields:
asm average: 569.200000
sys average: 289.600000
On Linux x86:
asm average: 311.600000
sys average: 290.200000
--ryan.

On Windows, Visual Studio 2010 with /Ox optimizations appears to realize this memset() is useless and makes it a no-op, making the memset() test take zero milliseconds. :)
SDL_memset4() never uses the asm code on Visual Studio (just the C fallback), so I say we dump this code. As we're moving into x86-64 and ARM processors as the main targets, we never get the asm code anyhow, and even the asm code can't keep up with the system's C runtime versions on any platform.
--ryan.

Removed in hg changeset 898992405fa7.
--ryan.

Backed out commit 898992405fa7 because memset() does a byte fill and SDL_memset4() does a uint32 fill and this change breaks SDL_FillRect().

Closing this as FIXED because the rest of the inline ASM got removed, and SDL_memset4 needs to stay (whoops!).
--ryan.
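To illustrate why the two fills are not interchangeable: memset() converts its value argument to `unsigned char` and repeats that single byte, while a uint32 fill stores the whole 32-bit value in each element. A minimal sketch, where `fill_u32()` and `memset_matches_fill_u32()` are hypothetical helpers written for this example and not SDL functions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit fill: writes the full value into every element. */
static void fill_u32(uint32_t *dst, uint32_t val, size_t dwords)
{
    while (dwords--) {
        *dst++ = val;
    }
}

/* Returns 1 if memset() produces the same bytes as the 32-bit fill
 * for this value, 0 otherwise. */
static int memset_matches_fill_u32(uint32_t color)
{
    uint32_t a[4];
    uint32_t b[4];

    fill_u32(a, color, 4);
    /* memset() only uses the low 8 bits of its value argument. */
    memset(b, (int)(color & 0xFF), sizeof b);

    return memcmp(a, b, sizeof a) == 0;
}
```

The two agree only when all four bytes of the value are identical (for example 0x00000000 or 0xFFFFFFFF). For a value such as 0xFF336699, memset() yields 0x99999999 in every element, which is the sort of mismatch that would corrupt a pixel fill.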