| Summary: | SDL does not handle CPU/GFX endian difference | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Gunnar von Boehn <gunnar> |
| Component: | video | Assignee: | Ryan C. Gordon <icculus> |
| Status: | RESOLVED WONTFIX | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | ||
| Version: | don't know | ||
| Hardware: | Other | ||
| OS: | Linux | ||
|
Description
Gunnar von Boehn
2006-04-18 03:26:49 UTC
Patches and/or hardware are welcome. :)

Sam,

Okay, I'll look into the SDL source and try to send you a patch. ;)

If you want to test it yourself, a good software emulator for testing endian issues is the Amiga emulator UAE or WinUAE. AmigaOS (running on big-endian 68k or big-endian PPC) supports GFX cards in either big- or little-endian layout. The Windows emulator WinUAE by default uses the Windows GFX card in little-endian layout.
http://www.winuae.net/

If you are seriously interested in developing SDL on big-endian hardware, you could probably get a free PPC development machine from Genesi. Genesi produces PPC-based computers and regularly supports open-source developers with free hardware. They currently run a developer program for their new EFIKA boards.
http://projects.ppczone.org/projects.php?program=EFIKA

I take part in this developer program myself, and the reason I have been using SDL for the past two months is that I started an open-source Shoot-Em-Construction-Kit for the Efika using SDL.
http://www.greyhound-data.com/gunnar/games/

Cheers
Gunnar

I have the same problem on the Atari platform, of course :-). So, like you, I just throw away the low-order bits. However, there is no easy solution. If you add a flag to the screen surface telling what its endianness is (same as the CPU, or reversed), then applications would have to be patched to use it. SDL's internal blit functions could also be patched to reverse data endianness when blitting to the screen (maybe by reading the previously mentioned flag), but applications would still have to be patched to take care of it.

For software surfaces the solution is very simple.
Software surfaces live in main memory.
We keep them in the same endian format as the CPU,
which is what all applications expect anyway.
We ensure that the graphics look right on every
GFX card by doing the endian conversion in SDL_Flip():
when copying the software surface to the GFX card, we simply endian-convert it on the fly.
With a good copy routine this conversion costs nothing extra.
Below you will find a copy routine that is as fast as the normal memcpy
used by SDL and does the endian conversion for free as well.
Problem solved :-)
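The convert-on-flip idea can be sketched in portable C. This is only an illustration under assumptions: it assumes a 16-bit pixel format, and the name `flip_swapped16` and its parameters are hypothetical, not SDL API. A real implementation would sit inside SDL's video driver and would need a variant per pixel size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of SDL: copy a 16-bit software surface
 * (stored in CPU byte order) to a framebuffer that uses the opposite
 * byte order. Pitches are in bytes and may exceed width * 2.
 */
void flip_swapped16(const uint8_t *src, size_t src_pitch,
                    uint8_t *dst, size_t dst_pitch,
                    size_t width, size_t height)
{
    assert(src_pitch >= width * 2 && dst_pitch >= width * 2);

    for (size_t y = 0; y < height; ++y) {
        const uint8_t *s = src + y * src_pitch;
        uint8_t *d = dst + y * dst_pitch;

        /* Swap the two bytes of every 16-bit pixel while copying. */
        for (size_t x = 0; x < width; ++x) {
            d[2 * x]     = s[2 * x + 1];
            d[2 * x + 1] = s[2 * x];
        }
    }
}
```

The sketch works byte by byte so that it has no alignment requirements; it shows the data flow only and makes no claim about speed.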
This type of endian conversion is important for computers
whose CPU runs in a different endianness than the GFX card.
Typical examples of computers that need this are the Apple Mac, Pegasos and AmigaONE, all using PowerPC, as well as older 68k computers.
I ran some benchmarks on various GFX cards and
wrote two memcopy routines for PowerPC.
The routines are written in PowerPC assembler for optimal performance.
The first one is an optimized 1:1 memcopy that uses the 64-bit floating-point registers. The floating-point registers enforce 64-bit writes to the GFX card.
Using floating-point registers is faster than using integer registers because
PowerPC computers usually do NOT burst to PCI and AGP memory.
Because of this, an SDL_Flip() on a software surface is quite slow on PowerPC.
A memcopy that uses floating-point registers enforces 64-bit accesses (resulting in mini bursts of two 32-bit transfers to PCI). This copy routine is up to two times faster than the memcpy used by SDL; the result depends on the GFX card used. I tested it with a number of different GFX cards, and it was always as fast as or faster than the memcpy SDL uses now.
This memcopy is loop-unrolled to copy 64 bytes per iteration.
I tested several unroll sizes (16/32/64/128 bytes);
unrolling to 64 bytes gave the highest performance.
Fast PowerPC MemCopy:
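void memcopy_asmF64(Uint32* source_pointer,Uint32* destination_pointer,int size_in_byte){
/* r3 = source, r4 = destination, r5 = size in bytes (PowerPC argument registers) */
asm(
" srawi. 0,5,6 \n" /* loop count = size / 64, since each iteration copies 64 bytes */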
" mtctr 0 \n"
" bclr 4,1 \n"
" \n"
" subi 3,3,8 \n"
" subi 4,4,8 \n"
".loopf64: \n"
" lfdu 0,8(3) \n"
" lfdu 1,8(3) \n"
" lfdu 2,8(3) \n"
" lfdu 3,8(3) \n"
" lfdu 4,8(3) \n"
" lfdu 5,8(3) \n"
" lfdu 6,8(3) \n"
" lfdu 7,8(3) \n"
" \n"
" stfdu 0,8(4) \n"
" stfdu 1,8(4) \n"
" stfdu 2,8(4) \n"
" stfdu 3,8(4) \n"
" stfdu 4,8(4) \n"
" stfdu 5,8(4) \n"
" stfdu 6,8(4) \n"
" stfdu 7,8(4) \n"
" \n"
" bdnz .loopf64 \n"
" blr \n");
}
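For readers without a PowerPC toolchain, here is a rough portable C sketch of the same copy pattern: 64-byte blocks moved as eight 64-bit words, loads first and stores second. It is not a drop-in replacement. The name `copy64_blocks` is hypothetical, the compiler is free to split or reorder the accesses, so none of the PCI write behaviour described above is guaranteed, and unlike the assembler routine it also copies the leftover tail bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference version; argument order follows the asm routine. */
void copy64_blocks(const void *src, void *dst, size_t size_in_bytes)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    size_t blocks = size_in_bytes / 64;

    assert(src != NULL && dst != NULL);

    while (blocks--) {
        uint64_t w[8];
        memcpy(w, s, sizeof w);  /* read one 64-byte block  */
        memcpy(d, w, sizeof w);  /* write one 64-byte block */
        s += 64;
        d += 64;
    }

    /* Remaining 0..63 bytes, which the assembler routine does not handle. */
    memcpy(d, s, size_in_bytes % 64);
}
```

Because the assembler loop only moves whole 64-byte blocks, a caller of the original routine would need to handle any remainder separately.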
Second memcopy routine doing "free" endian conversion:
The routine uses integer registers for copying and
is loop-unrolled to copy 32 bytes per iteration.
On G3 and G4 it is as fast as the memcpy currently used by SDL.
void memcopy_asmX32(Uint32* source,Uint32* destination,int size){
/* r3 = source, r4 = destination, r5 = size in bytes (PowerPC argument registers) */
asm(
" srawi. 0,5,5 \n" /* loop count = size / 32, since each iteration copies 32 bytes */
" mtctr 0 \n"
" bclr 4,1 \n"
" li 5,0 \n"
".loop32x: \n"
" lwbrx 0,3,5 \n" /* byte-reversed 32-bit load */
" addi 3,3,4 \n"
" lwbrx 9,3,5 \n"
" addi 3,3,4 \n"
" lwbrx 11,3,5 \n"
" addi 3,3,4 \n"
" lwbrx 10,3,5 \n"
" addi 3,3,4 \n"
" lwbrx 6,3,5 \n"
" addi 3,3,4 \n"
" lwbrx 7,3,5 \n"
" addi 3,3,4 \n"
" rlwimi 0,0,16,0,31 \n" /* rotate the word by 16 bits */
" lwbrx 8,3,5 \n"
" addi 3,3,4 \n"
" rlwimi 9,9,16,0,31 \n"
" lwbrx 12,3,5 \n"
" addi 3,3,4 \n"
" rlwimi 11,11,16,0,31 \n"
" rlwimi 10,10,16,0,31 \n"
" stw 0,0(4) \n"
" stw 9,4(4) \n"
" rlwimi 6,6,16,0,31 \n"
" rlwimi 7,7,16,0,31 \n"
" stw 11,8(4) \n"
" stw 10,12(4) \n"
" rlwimi 8,8,16,0,31 \n"
" rlwimi 12,12,16,0,31 \n"
" stw 6,16(4) \n"
" stw 7,20(4) \n"
" stw 8,24(4) \n"
" stw 12,28(4) \n"
" addi 4,4,32 \n"
" bdnz .loop32x \n"
" blr \n");
}
This routine is very useful for GFX cards like the Voodoo family,
which only support little-endian layout.
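The per-word transformation in the second routine (`lwbrx` followed by a 16-bit rotate) can be checked in plain C. The sketch below uses hypothetical helper names and shows that the combination swaps the two bytes inside each 16-bit half of a word, which suggests the routine targets 16-bit pixel formats; that reading is inferred from the instructions, not stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all four bytes of a word, the effect of lwbrx on a big-endian CPU. */
uint32_t byte_reverse32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Rotate by 16 bits, the effect of "rlwimi r,r,16,0,31". */
uint32_t rotate16(uint32_t w)
{
    return (w << 16) | (w >> 16);
}

/* Combined effect: swap the bytes inside each 16-bit half. */
uint32_t swap_halfword_bytes(uint32_t w)
{
    uint32_t direct = ((w & 0x00FF00FFu) << 8) | ((w & 0xFF00FF00u) >> 8);

    /* Both formulations must agree for every input. */
    assert(rotate16(byte_reverse32(w)) == direct);
    return direct;
}
```

For example, the word `0x11223344` becomes `0x22114433`: each pair of bytes is swapped while the two 16-bit pixels keep their positions.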
I hope that these routines are helpful.
Cheers
Gunnar
Is there an actual SDL patch I can apply? Is there some way to decide when hardware has a different byte order than the CPU, or some #ifdef that the posted memcpy should be used with, etc.?

Thanks,
--ryan.

Closing as WONTFIX, as it's code we're not really prepared to write or support.

--ryan. |