| Summary: | x86 asm blitters, win32/MSVC | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Ryan C. Gordon <icculus> |
| Component: | video | Assignee: | Sam Lantinga <slouken> |
| Status: | RESOLVED FIXED | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | CC: | codepro, max |
| Version: | HG 1.2 | ||
| Hardware: | x86 | ||
| OS: | Windows (All) | ||
| URL: | http://www.bpvn.com/patches/SDL/mmx-alpha-blit2.zip | ||
| Attachments: |
Code from Rene Dudfield
AltiVec revert correction |
||
|
Description
Ryan C. Gordon
2006-01-03 11:59:35 UTC
From: "Alex Volkov" <avcp-sdlmail@usa.net> To: "'A list for developers using the SDL library. \(includesSDL-announce\)'" <sdl@libsdl.org> Date: Thu, 22 Sep 2005 15:34:30 -0400 Subject: [SDL] [patch] x86 asm blitters, gcc/MSVC/mmx New version of MMX patch: http://www.bpvn.com/patches/SDL/mmx-alpha-blit2.zip - Overall speed improvements for the current GCC inline mmx asm code (more speed than previous patch for 15/16bpp blenders; a couple previous mistaken optimizations reverted) - Bugfix for BlitRGBtoRGBSurfaceAlphaMMX() setting the alpha component on the destination surfaces (to opaque-alpha) even when the surface has none - BlitRGBtoRGBSurfaceAlphaMMX and BlitRGBtoRGBPixelAlphaMMX (and all variants) can now also handle formats other than (A)RGB8888. Formats like RGBA8888 and some quite exotic ones are allowed -- like RAGB8888, or actually anything having channels aligned on 8bit boundary and full 8bit alpha (for per-pixel alpha blending). The performance cost of this change is virtually 0 for per-surface alpha blending (no extra ops inside the loop) and a single non-MMX op inside the loop for per-pixel blending. In testing, the per-pixel alpha blending takes a ~2% performance hit, but it still runs much faster than the current code in CVS. If necessary, a separate function with this functionality can be made. - MSVC mmx intrinsics version of the same GCC asm code. You will need to define USE_ASMBLIT to build these blitters (or use the SDL.dsp from the previous patch). This code requires Processor Pack for VC6. Please let me know if you find anything broken. General comments are welcome too! Alex. >From: Sam Lantinga >Sent: Saturday, January 21, 2006 4:56 PM >Subject: Re: [SDL] [patch] x86 asm blitters, gcc/MSVC/mmx > >Hey Alex, I see several different versions of your blit patches. >Can you visit the SDL bugzilla and let me know if this bug has >the latest information and versions of your patches? The ones in http://www.bpvn.com/patches/SDL/mmx-alpha-blit2.zip are the latest, but they probably will not apply cleanly anymore, as there were some little conflicting changes in SDL_blit_A.c. This code is still in my SDL codebase, and I've been maintaining it and merging stuff in, so I can generate a new diff anytime. However, as I have not touched the code itself for months, it would be a pain to make separate several patches like I did before. So please let me know (here, if you like, I am on CC) in what form you would like these. (I usually react faster to bugzilla emails than SDL maillist ones.) I've been thinking since I made the original patch, and now I think that SDL_blit_A.c is getting badly cluttered. IMHO, we should separate the MMX code versions into own .c files, like it is done with audio/SDL_mixer_MMX and SDL_mixer_MMX_VC. Alex, can you take a look at this and see if it's helpful? Date: Fri, 18 Apr 2003 11:00:04 +1000 From: Rene Dudfield <illumen@yahoo.com> Subject: [SDL] some faster alpha blitting code. Hi, a while a go I tried to get alpha blitting of per pixel images going faster. Some c improvements were made, and some asm mmx/ prefetching stuff was done as well. The asm uses gcc inline. It's attached to this email. I don't really have time to add in the detection of mmx/prefetching instructions the way sdl does it. I'd appreciate some help doing it The SDL Way. Please :) Pretty please with honey on top(you don't want to be eating too much sugar). The other things left to do are to optionally use SSE prefetching instead of 3dnow prefetching. Also using femms on k6 processors instead of emms would be nice. Also need someone to translate it into msvc inline asm. If no one helps me with integrating the asm then just doing the c optimization can speed things up by quite a bit. The major speed boost from the asm came from the prefetching, not the mmx. Inlining the function gave an ok speed boost too(1-2 fps). The function to look for is BlitRGBtoRGBPixelAlpha. These are FPS of two of my games I tested with, and then a comparison between the new version and old. Other apps I tried this with had similar increases in speed. To test use a game which uses 32 bit per pixel alpha surfaces. These results are from my duron 850 with gcc 3.2.3. Haven't tried it on anything else, so any bug reports/benchmarks are welcomed. With no optimizations. bleten | dohfighters average score: 21.7745967561 | 37.2473964489 Final results with all optimizations: bleten | dohfighters lowest score: 34.3333557719 |48.1210294823 highest score: 34.4127871498 |50.6342458682 50.6342458682 / 100 * 37.2473964489 == 18.85 % increase in fps. 34.4127871498 / 100 * 21.7745967561 == 7.493 % increase in fps. Thanks to Matt Taylor the ever helpful asm guru from c.l.a.x86 for help fixing some bugs. Created attachment 32 [details]
Code from Rene Dudfield
The per-pixel alpha blit code Rene proposed is already in CVS under BlitRGBtoRGBPixelAlphaMMX3DNOW() name, sans the leading "emms" (which is useless) and the alpha==127 special case (which probably incurs a heavy branch penalty anyway). (In reply to comment #5) > The per-pixel alpha blit code Rene proposed is already in CVS under > BlitRGBtoRGBPixelAlphaMMX3DNOW() name, sans the leading "emms" (which is > useless) and the alpha==127 special case (which probably incurs a heavy branch > penalty anyway). Great, thanks for looking into this. :) Setting Sam as "QA Contact" on all bugs (even resolved ones) so he'll definitely be in the loop to any further discussion here about SDL. --ryan. Hello, When I tested the alpha==127 special case, and it was very benificial. Often images are set to 50% alpha, so it seems to be a good special case to use. I tested with an amd duron 850... so other cpus might be different. I think because the code is very repetitive there are not many branches to follow some cpus can do both at once, and it is a free special case. Although I could be remembering wrong(from 3 years ago), or it could be only on very few cpus. ps. When can these new mmx/sse blitters be put in cvs already? ;) Have fun! Rene Dudfield. (In reply to comment #8) > When I tested the alpha==127 special case, and it was very benificial. Often > images are set to 50% alpha, so it seems to be a good special case to use. You are talking about the per-surface alpha=127 special case. The case I was refering to is in the per-pixel alpha code. I've applied the patch to CVS. Can you check it out and make sure that I applied it correctly and everything still works? Thanks! Created attachment 81 [details]
AltiVec revert correction
Looks good and works as expected, thanks! Except a couple of AltiVec blocks in SDL_CalculateAlphaBlit() got reverted to earlier versions during merge.
This patch should take care of those.
Sorry for reopening, but I think the issue of the reverted code (above) might be getting lost. This patch is in CVS, thanks! |