We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 15

Summary: x86 asm blitters, win32/MSVC
Product: SDL Reporter: Ryan C. Gordon <icculus>
Component: videoAssignee: Sam Lantinga <slouken>
Status: RESOLVED FIXED QA Contact: Sam Lantinga <slouken>
Severity: normal    
Priority: P2 CC: codepro, max
Version: HG 1.2   
Hardware: x86   
OS: Windows (All)   
URL: http://www.bpvn.com/patches/SDL/mmx-alpha-blit2.zip
Attachments: Code from Rene Dudfield
AltiVec revert correction

Description Ryan C. Gordon 2006-01-03 11:59:35 UTC
From: "Alex Volkov" <avcp-sdlmail@usa.net>
To: "'A list for developers using the SDL library. \(includesSDL-announce\)'"
<sdl@libsdl.org>
Date: Thu, 15 Sep 2005 05:55:33 -0400
Subject: [SDL] [patch] x86 asm blitters, win32/MSVC?

Here are the patches as promised:
        [...snip...obsoleted in next comment...]

Inside you will find:

SDL_blit_A.mmx-speed.patch.txt --
        Speed improvements and a bugfix for the current GCC inline mmx
        asm code:
        - Changed some ops and removed some resulting useless ones.
        - Added some instruction parallelism (some gain)
        The resulting speed on my Xeon improved upto 35% depending on
        the function (measured in fps).
        - Fixed a bug where BlitRGBtoRGBSurfaceAlphaMMX() was
        setting the alpha component on the destination surfaces (to
        opaque-alpha) even when the surface had none.
        (The patch itself is not exactly clean because I changed all
        indentation to tabs -- it was very inconsistent and used
        tabs+spaces.)

SDL_blit_A.mmx-msvc.patch.txt --
        MSVC mmx intrinsics version of the same GCC asm code.
        MSVC compiler tries to parallelize the code and to avoid
        register stalls, but does not always do a very good job.
        Per-surface blending MSVC functions run quite a bit faster
        than their pure-asm counterparts (upto 55% faster for 16bit
        ones), but the per-pixel blending runs somewhat slower than asm.
        You will need to define USE_ASMBLIT to build these blitters,
        see SDL.dsp below.

SDL_blit_A.mmx-both.patch.txt --
        Both above patches rolled into one. Though both will apply
        independently and in tandem, this is just for convenience.

SDL.dsp --
        Updated MSVC v5/6 project file
        - Hermes nasm code compilation added
        - New build config "Release ASMBLIT" with USE_ASMBLIT defined

Regards,
Alex.
Comment 1 Ryan C. Gordon 2006-01-03 12:00:04 UTC
From: "Alex Volkov" <avcp-sdlmail@usa.net>
To: "'A list for developers using the SDL library. \(includesSDL-announce\)'"
<sdl@libsdl.org>
Date: Thu, 22 Sep 2005 15:34:30 -0400
Subject: [SDL] [patch] x86 asm blitters, gcc/MSVC/mmx

New version of MMX patch:
        http://www.bpvn.com/patches/SDL/mmx-alpha-blit2.zip

- Overall speed improvements for the current GCC inline mmx     asm code
(more speed than previous patch for 15/16bpp blenders; a couple
previous mistaken optimizations reverted)

- Bugfix for BlitRGBtoRGBSurfaceAlphaMMX() setting the alpha component
on the destination surfaces (to opaque-alpha) even when the surface
has none

- BlitRGBtoRGBSurfaceAlphaMMX and BlitRGBtoRGBPixelAlphaMMX (and all
variants) can now also handle formats other than (A)RGB8888. Formats
like RGBA8888 and some quite exotic ones are allowed -- like
RAGB8888, or actually anything having channels aligned on 8bit
boundary and full 8bit alpha (for per-pixel alpha blending).
The performance cost of this change is virtually 0 for per-surface
alpha blending (no extra ops inside the loop) and a single non-MMX
op inside the loop for per-pixel blending. In testing, the per-pixel
alpha blending takes a ~2% performance hit, but it still runs much
faster than the current code in CVS. If necessary, a separate function
with this functionality can be made.

- MSVC mmx intrinsics version of the same GCC asm code.
You will need to define USE_ASMBLIT to build these blitters (or use
the SDL.dsp from the previous patch).
This code requires Processor Pack for VC6.

Please let me know if you find anything broken. General comments are welcome
too!
Alex.

Comment 2 Alex Volkov 2006-01-23 12:47:34 UTC
>From: Sam Lantinga
>Sent: Saturday, January 21, 2006 4:56 PM
>Subject: Re: [SDL] [patch] x86 asm blitters, gcc/MSVC/mmx
>
>Hey Alex, I see several different versions of your blit patches.
>Can you visit the SDL bugzilla and let me know if this bug has
>the latest information and versions of your patches?

The ones in http://www.bpvn.com/patches/SDL/mmx-alpha-blit2.zip are the latest, but they probably will not apply cleanly anymore, as there were some little conflicting changes in SDL_blit_A.c.
This code is still in my SDL codebase, and I've been maintaining it and merging stuff in, so I can generate a new diff anytime. However, as I have not touched the code itself for months, it would be a pain to make separate several patches like I did before. So please let me know (here, if you like, I am on CC) in what form you would like these. (I usually react faster to bugzilla emails than SDL maillist ones.)

I've been thinking since I made the original patch, and now I think that SDL_blit_A.c is getting badly cluttered. IMHO, we should separate the MMX code versions into own .c files, like it is done with audio/SDL_mixer_MMX and SDL_mixer_MMX_VC.
Comment 3 Sam Lantinga 2006-01-24 01:39:37 UTC
Alex, can you take a look at this and see if it's helpful?

Date: Fri, 18 Apr 2003 11:00:04 +1000
From: Rene Dudfield <illumen@yahoo.com>
Subject: [SDL] some faster alpha blitting code.

Hi,

a while a go I tried to get alpha blitting of per pixel images going faster.

Some c improvements were made, and some asm mmx/ prefetching stuff was
done as well.  The asm uses gcc inline.  It's attached to this email.



I don't really have time to add in the detection of mmx/prefetching
instructions the way sdl does it.  I'd appreciate some help doing it The
SDL Way.  Please :)  Pretty please with honey on top(you don't want to
be eating too much sugar).



The other things left to do are to optionally use SSE prefetching
instead of 3dnow prefetching.  Also using femms on k6 processors instead
of emms would be nice.  Also need someone to translate it into msvc
inline asm.

If no one helps me with integrating the asm then just doing the c
optimization can speed things up by quite a bit.  The major speed boost
from the asm came from the prefetching, not the mmx.  Inlining the
function gave an ok speed boost too(1-2 fps).

The function to look for is BlitRGBtoRGBPixelAlpha.


These are FPS of two of my games I tested with, and then a comparison
between the new version and old.  Other apps I tried this with had
similar increases in speed.  To test use a game which uses 32 bit per
pixel alpha surfaces.  These results are from my duron 850 with gcc
3.2.3.  Haven't tried it on anything else, so any bug reports/benchmarks
are welcomed.


With no optimizations.
                        bleten          | dohfighters
average score: 21.7745967561   | 37.2473964489


Final results with all optimizations:
                     bleten          | dohfighters
lowest score: 34.3333557719  |48.1210294823
highest score: 34.4127871498  |50.6342458682

50.6342458682 / 100 * 37.2473964489 == 18.85 % increase in fps.
34.4127871498 / 100 * 21.7745967561 == 7.493 % increase in fps.


Thanks to Matt Taylor the ever helpful asm guru from c.l.a.x86 for help
fixing some bugs.
Comment 4 Sam Lantinga 2006-01-24 01:41:38 UTC
Created attachment 32 [details]
Code from Rene Dudfield
Comment 5 Alex Volkov 2006-01-24 02:47:45 UTC
The per-pixel alpha blit code Rene proposed is already in CVS under BlitRGBtoRGBPixelAlphaMMX3DNOW() name, sans the leading "emms" (which is useless) and the alpha==127 special case (which probably incurs a heavy branch penalty anyway).
Comment 6 Sam Lantinga 2006-01-24 03:16:44 UTC
(In reply to comment #5)
> The per-pixel alpha blit code Rene proposed is already in CVS under
> BlitRGBtoRGBPixelAlphaMMX3DNOW() name, sans the leading "emms" (which is
> useless) and the alpha==127 special case (which probably incurs a heavy branch
> penalty anyway).

Great, thanks for looking into this. :)
Comment 7 Ryan C. Gordon 2006-01-27 11:23:03 UTC
Setting Sam as "QA Contact" on all bugs (even resolved ones) so he'll definitely be in the loop to any further discussion here about SDL.

--ryan.

Comment 8 Rene Dudfield 2006-02-03 18:17:46 UTC
Hello,

When I tested the alpha==127 special case, and it was very benificial.  Often images are set to 50% alpha, so it seems to be a good special case to use.  I tested with an amd duron 850... so other cpus might be different.  I think because the code is very repetitive there are not many branches to follow some cpus can do both at once, and it is a free special case.  Although I could be remembering wrong(from 3 years ago), or it could be only on very few cpus.

ps. When can these new mmx/sse blitters be put in cvs already? ;)


Have fun!
Rene Dudfield.
Comment 9 Alex Volkov 2006-02-03 21:05:48 UTC
(In reply to comment #8)
> When I tested the alpha==127 special case, and it was very benificial.  Often
> images are set to 50% alpha, so it seems to be a good special case to use.

You are talking about the per-surface alpha=127 special case. The case I was refering to is in the per-pixel alpha code.
Comment 10 Sam Lantinga 2006-03-15 10:39:36 UTC
I've applied the patch to CVS.
Can you check it out and make sure that I applied it correctly and everything still works?

Thanks!
Comment 11 Alex Volkov 2006-03-15 17:33:24 UTC
Created attachment 81 [details]
AltiVec revert correction

Looks good and works as expected, thanks! Except a couple of AltiVec blocks in SDL_CalculateAlphaBlit() got reverted to earlier versions during merge.
This patch should take care of those.
Comment 12 Alex Volkov 2006-03-24 05:59:55 UTC
Sorry for reopening, but I think the issue of the reverted code (above) might be getting lost.
Comment 13 Sam Lantinga 2006-04-12 10:19:25 UTC
This patch is in CVS, thanks!