
Bug 200 - SDL does not handle CPU/GFX endian difference
Status: RESOLVED WONTFIX
Alias: None
Product: SDL
Classification: Unclassified
Component: video
Version: don't know
Hardware: Other Linux
Importance: P2 normal
Assignee: Ryan C. Gordon
QA Contact: Sam Lantinga
Reported: 2006-04-18 03:26 UTC by Gunnar von Boehn
Modified: 2007-07-03 03:06 UTC

Description Gunnar von Boehn 2006-04-18 03:26:49 UTC
Currently SDL always assumes that the video memory
has the same endianness as the CPU.

This is not always valid!

There are systems which have a big endian CPU
and use video cards with a little endian memory layout.

A 16-bit color mode on such a system will not be 565 but 3.5.5.3.
Because the bytes are swapped by the different endianness,
the video mode is in fact:
High byte:
Green 3 low bits
Blue  5 bits
Low byte:
Red   5 bits
Green 3 high bits

SDL recognises this mode as 553 (Blue5/Red5/Green3)
and so only uses the 3 high-order bits of green.
Images loaded with SDL on such a screen don't use the available 65000 colors but only 8000 colors.

The Gmask of such a screen is 57351 (decimal), i.e. 0xE007.
When converting an 8-bit green value to such a screen,
SDL currently shifts the green value to the right by format->Gloss.
SDL does  GREEN = GREEN >> 5
but correct would be this:
GREEN = ((GREEN >> 5) | (GREEN << 11)) & Gmask

The handling of 555 screens on such systems is faulty in a similar way.
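
The mask arithmetic above can be checked with a small C sketch. It assumes an ordinary RGB565 pixel whose two bytes end up swapped in video memory. The helper names (swap16, pack_rgb565, green_swapped) are made up for illustration and are not part of SDL.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit pixel. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Pack 8-bit channels into a CPU-order RGB565 pixel. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Green field of the byte-swapped pixel, computed directly from the
 * 8-bit value: the 3 high bits land in bits 2..0, the next 3 bits in
 * bits 15..13. The mask 0xE007 is the byte-swapped 565 Gmask. */
static uint16_t green_swapped(uint8_t g)
{
    uint32_t v = g;
    return (uint16_t)(((v >> 5) | (v << 11)) & 0xE007u);
}
```

Swapping the three standard 565 masks gives the layout described in the report: red 0xF800 becomes 0x00F8, green 0x07E0 becomes 0xE007 and blue 0x001F becomes 0x1F00.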


I hope this report is helpful to you.

Cheers
Gunnar
Comment 1 Sam Lantinga 2006-04-18 11:04:35 UTC
Patches and/or hardware are welcome. :)
Comment 2 Gunnar von Boehn 2006-04-19 03:42:50 UTC
Sam,

Okay, I'll look into the SDL source and try to send you a patch. ;)


If you want to test it yourself, a good software emulator for testing endian issues is the Amiga emulator UAE or WinUAE. AmigaOS (on big-endian 68k or big-endian PPC) supports GFX cards running with either a big-endian or a little-endian layout. The Windows emulator WinUAE by default uses the Windows GFX card in little-endian layout.
http://www.winuae.net/

If you are seriously interested in developing SDL on big-endian hardware, you could probably get a free PPC development machine from Genesi.
Genesi produce PPC-based computers and regularly
support open source developers with free hardware.

Currently they run a developer program for their new EFIKA boards.
http://projects.ppczone.org/projects.php?program=EFIKA

I take part in this developer program myself, and the reason I have been using SDL for two months is that I started an open source Shoot-Em-Construction-Kit for the Efika using SDL. http://www.greyhound-data.com/gunnar/games/


Cheers
Gunnar
Comment 3 Patrice Mandin 2006-04-24 09:39:49 UTC
I have the same problem on the Atari platform, of course :-). So, like you, I just throw away the low-order bits.

However there is no easy solution. If you add a flag to the screen surface telling what the endianness is (same as cpu, or reversed), then applications would have to be patched to use it.

SDL's internal blit functions can also be patched to reverse data endianness when blitting to the screen (maybe reading the previously mentioned flag). But applications would have to be patched to take care of it.
Comment 4 Gunnar von Boehn 2006-05-03 05:39:54 UTC
For software surfaces the solution is very simple.

SW surfaces live in main memory.
We keep them in the same endian format as the CPU.
That is what all applications expect anyway.

We ensure that the graphics look right on every
GFX card by doing the endian conversion in SDL_Flip().
When copying the SW surface to the GFX card we just endian-convert it on the fly.
With a good copy routine we can do this conversion for free.
Below you will find a copy routine that is as fast as the normal memcpy
used by SDL, so the endian conversion comes at no extra cost.

Problem solved :-)
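
As a minimal portable sketch of that idea, assuming 16-bit pixels and a framebuffer whose byte order is the opposite of the CPU's, the copy could look like the following. The function name copy_swap16 is hypothetical and not an SDL API, and this plain C version says nothing about the performance of the hand-written routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 16-bit pixels from a CPU-order software surface to a
 * framebuffer with the opposite byte order, swapping the two bytes of
 * each pixel on the way. */
static void copy_swap16(uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        uint16_t p = src[i];
        dst[i] = (uint16_t)((p << 8) | (p >> 8));
    }
}
```

The application keeps drawing into the software surface in CPU order; only the final copy to the card changes.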

This type of endian conversion is important for computers
whose CPU runs in a different endianness than the GFX card.
Typical examples of computers needing this are Apple Mac, Pegasos, AmigaONE, all using PowerPC - and older 68k computers.

I did some benchmarks on various GFX-Cards and
wrote two memcopy routines for PowerPC.
The routines are written in PowerPC assembler for optimal performance.
This first one is an optimized 1:1 memcopy which uses the 64-bit floating point registers. The floating point registers enforce 64-bit writes to the GFX card.
Using floating point registers is faster than using integer registers, as
PowerPC computers usually do NOT burst to PCI and AGP memory.
Because of this, an SDL_Flip() on a SW surface is quite slow on PowerPC.

A memcopy that uses floats enforces 64-bit access (resulting in mini bursts of 2 times 32 bits to PCI). This copy routine is up to two times faster than the memcpy used by SDL. The result depends on the GFX card used. I tested it with a number of different GFX cards; it was always as fast as or faster than the memcpy SDL uses now.

This memcopy is loop-unrolled to copy 64 bytes per iteration.
I tested several unroll sizes (16/32/64/128 bytes);
unrolling to 64 bytes gave the highest performance.


Fast PowerPC MemCopy:

void memcopy_asmF64(Uint32* source_pointer,Uint32* destination_pointer,int size_in_byte){

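asm(
/* r3 = source, r4 = destination, r5 = size in bytes */
"        srawi. 0,5,6   \n"  /* r0 = size / 64: one iteration copies 64 bytes */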
"        mtctr 0        \n"
"        bclr 4,1       \n"
"                       \n"
"        subi 3,3,8     \n"
"        subi 4,4,8     \n"
".loopf64:              \n"
"        lfdu 0,8(3)    \n"
"        lfdu 1,8(3)    \n"
"        lfdu 2,8(3)    \n"
"        lfdu 3,8(3)    \n"
"        lfdu 4,8(3)    \n"
"        lfdu 5,8(3)    \n"
"        lfdu 6,8(3)    \n"
"        lfdu 7,8(3)    \n"
"                       \n"
"        stfdu 0,8(4)   \n"
"        stfdu 1,8(4)   \n"
"        stfdu 2,8(4)   \n"
"        stfdu 3,8(4)   \n"
"        stfdu 4,8(4)   \n"
"        stfdu 5,8(4)   \n"
"        stfdu 6,8(4)   \n"
"        stfdu 7,8(4)   \n"
"                       \n"
"        bdnz .loopf64  \n"
"        blr            \n");

}




Second memcopy routine doing "free" endian conversion:
The routine uses integer registers for copying and
is loop-unrolled to copy 32 bytes per iteration.
It is as fast as the memcpy currently used by SDL on G3 and G4.

void memcopy_asmX32(Uint32* source,Uint32* destination,int size){

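asm(
/* r3 = source, r4 = destination, r5 = size (assumed to be in bytes) */
"        srawi. 0,5,5       \n"  /* r0 = size / 32: one iteration copies 32 bytes */
"        mtctr  0           \n"
"        bclr   4,1         \n"
"        li     5,0         \n"
".loop32x:                  \n"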
"        lwbrx 0,3,5        \n"
"        addi  3,3,4        \n"
"        lwbrx 9,3,5        \n"
"        addi  3,3,4        \n"
"        lwbrx 11,3,5       \n"
"        addi  3,3,4        \n"
"        lwbrx 10,3,5       \n"                             
"        addi  3,3,4        \n"
"        lwbrx 6,3,5        \n"
"        addi  3,3,4        \n"
"        lwbrx 7,3,5        \n"
"        addi  3,3,4        \n"
"        rlwimi 0,0,16,0,31 \n"
"        lwbrx 8,3,5       \n"
"        addi  3,3,4        \n"
"        rlwimi 9,9,16,0,31 \n"
"        lwbrx 12,3,5       \n"
"        addi 3,3,4;        \n"
"        rlwimi 11,11,16,0,31 \n"
"        rlwimi 10,10,16,0,31 \n"                             
"        stw 0,0(4)         \n"
"        stw 9,4(4)         \n"
"        rlwimi 6,6,16,0,31 \n"
"        rlwimi 7,7,16,0,31 \n"
"        stw 11,8(4)        \n"
"        stw 10,12(4)       \n"
"        rlwimi 8,8,16,0,31 \n"
"        rlwimi 12,12,16,0,31 \n"
"        stw 6,16(4)        \n"
"        stw 7,20(4)        \n"
"        stw 8,24(4)        \n"
"        stw 12,28(4)       \n"
"        addi 4,4,32        \n"
"        bdnz .loop32x       \n"
"        blr                \n");
}

This routine is very useful for GFX cards like the Voodoo family,
which only support a little-endian layout.
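
For readers who do not know PowerPC assembler, the per-word transformation in the second routine can be sketched in portable C. lwbrx loads a word with all four bytes reversed, and rlwimi with a full mask (0 to 31) and a shift of 16 amounts to a rotate by 16 bits. The function names below are illustrative only.

```c
#include <stdint.h>

/* Reverse all four bytes of a word (what lwbrx does on load). */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Rotate a word left by 16 bits (the rlwimi step with a full mask). */
static uint32_t rotl16(uint32_t w)
{
    return (w << 16) | (w >> 16);
}

/* Net effect on one word holding two 16-bit pixels: the bytes inside
 * each pixel are swapped, while the pixels keep their order. */
static uint32_t swap_pixel_pair(uint32_t w)
{
    return rotl16(bswap32(w));
}
```

The two steps together swap the bytes within each 16-bit half of the word, so the routine converts two 16-bit pixels per load.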


I hope that these routines are helpful.

Cheers
Gunnar
Comment 5 Ryan C. Gordon 2007-02-12 04:07:22 UTC
Is there an actual SDL patch I can apply? Is there some way to decide when hardware has a different byteorder than the CPU, or some #ifdef that the posted memcpy should be used with, etc?

Thanks,
--ryan.

Comment 6 Ryan C. Gordon 2007-07-03 03:06:29 UTC
Closing as WONTFIX, as it's code we're not really prepared to write or support.

--ryan.