Bug 470 - Cocoa video backend speed regression
Summary: Cocoa video backend speed regression
Status: RESOLVED WORKSFORME
Alias: None
Product: SDL
Classification: Unclassified
Component: video
Version: HG 2.0
Hardware: PowerPC Mac OS X (All)
Importance: P2 normal
Assignee: Sam Lantinga
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-25 18:14 UTC by Max Horn
Modified: 2011-01-19 22:45 UTC

Description Max Horn 2007-07-25 18:14:39 UTC
(the following is based on a recent mail of mine on SDL-devel)

The Cocoa (formerly Quartz) backend in SDL 1.3 is very nice, but for me it shows some serious speed regressions.

Testcase: I recently recompiled ScummVM using the latest SDL 1.3 SVN on my PowerBook G4 1.5 GHz, Mac OS X 10.4, with the built-in Radeon 9700 XT graphics.

The result was that instead of the usual 10-20% CPU usage, it suddenly went up to a whopping 100%. Ouch! With Shark I quickly determined that the majority of that time was spent inside a CoreGraphics / OpenGL internal function called glgProcessPixels().


Some googling quickly revealed the following insightful pointers:

<http://developer.apple.com/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_performance/chapter_13_section_4.html> 

<http://developer.apple.com/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_performance/chapter_13_section_2.html#//apple_ref/doc/uid/TP40001987-CH213-SW23>

To sum those docs up: If you use GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV, OpenGL on Mac OS X is efficient. If you use GL_BGRA with GL_UNSIGNED_SHORT_1_5_5_5_REV, it should still be acceptable. Everything else may (and does!) cause you major PAIN PAIN PAIN.


So what happened here is this: ScummVM requests a 16 bpp hardware screen. SDL apparently allocates such a screen surface and uses it for texture uploads whenever a screen redraw is requested. This then causes costly conversions to take place, resulting in each texture upload taking an insane amount of CPU power.
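To give a feel for the per-pixel work such a conversion implies, here is a minimal sketch in C that expands 16 bpp RGB565 pixels to packed 0xAARRGGBB values, which is the layout that GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV describes. This is only an illustration: the function names are hypothetical, and it is not the code that Apple's OpenGL implementation or SDL actually runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one RGB565 pixel to a packed 32-bit 0xAARRGGBB value (opaque alpha).
 * Hypothetical helper for illustration; not part of SDL or OpenGL. */
uint32_t rgb565_to_argb8888(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1Fu;   /* 5 bits of red   */
    uint32_t g = (p >> 5)  & 0x3Fu;   /* 6 bits of green */
    uint32_t b = p & 0x1Fu;           /* 5 bits of blue  */

    /* Replicate the high bits into the low bits so full intensity maps to 255. */
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

/* Convert a row of n pixels; a conversion like this has to touch every pixel
 * of every upload when the source format is not the one the driver prefers. */
void convert_row_565_to_argb8888(const uint16_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = rgb565_to_argb8888(src[i]);
}
```

Doing this once per pixel on every redraw is cheap in isolation, but it adds up quickly for a full-screen texture on a 1.5 GHz G4.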

I found two quick hacks to speed this up: One was changing ScummVM to request a 15 bpp surface (the docs above state that 1_5_5_5 mode should be "faster" than most other stuff). This made the CPU usage go down to 50%. The other trick I tried was to modify SDL_compat.c, line 487, where the SDL_VideoTexture is allocated -- I just forced it to always use SDL_PIXELFORMAT_RGB888. That resulted in the same speedup.


But even with these hacks, things are still 2-3 times slower than with 1.2. The reason is that it still spends lots of time (albeit a lot less) in glgProcessPixels(). Specifically, in a function with "GLGConverter_BGRA8_RGBA8" in its name. So that sounds as if SDL uses the "wrong" byte order (in the eyes of the Apple OpenGL implementation, that is). In particular, Apple wants BGRA8, and my guess would be that SDL supplies RGB8.
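Judging purely by its name, a converter like that presumably reorders the colour channels of every pixel. A minimal sketch of such a swizzle in C, assuming 4 bytes per pixel, could look like the following; the function name is hypothetical and this is a guess at the kind of work involved, not Apple's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the red and blue bytes of n 4-byte pixels in place, turning
 * R,G,B,A byte order into B,G,R,A (or back again).
 * Hypothetical helper for illustration only. */
void swap_red_blue(uint8_t *pixels, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t *px = pixels + 4 * i;
        uint8_t tmp = px[0];
        px[0] = px[2];   /* byte 0 and byte 2 trade places   */
        px[2] = tmp;     /* green and alpha stay where they are */
    }
}
```

If the surface were already stored in the byte order the driver wants, this pass would not be needed at all.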

So, my hope would be that SDL_compat.c could be changed in such a way that (at least under Mac OS X) the screen surface is *always* allocated in the "optimal" format, because apparently SDL does a *much* better job at performing those texture conversions. My quick attempts to specify SDL_PIXELFORMAT_BGRA8888 or SDL_PIXELFORMAT_ARGB8888 proved fruitless. However, my knowledge of OpenGL in general and the new SDL Cocoa OpenGL code in particular is rather limited (although I am willing to learn), so this might just have been caused by my incompetence :-).

Things are of course made even more complicated by big endian vs. little endian differences. See also <http://developer.apple.com/documentation/MacOSX/Conceptual/universal_binary/universal_binary_tips/chapter_5_section_25.html>
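
The endianness point can be illustrated with a small C sketch (helper names are hypothetical). A pixel packed as 0xAARRGGBB in a 32-bit integer, which is what GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV describes, sits in memory as B,G,R,A on a little-endian machine but as A,R,G,B on a big-endian one such as PowerPC, so a format defined by byte order and a format defined by packed integer only coincide on one of the two.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* Copy a packed 32-bit pixel into out[] in the order its bytes occupy memory. */
void pixel_memory_bytes(uint32_t pixel, uint8_t out[4])
{
    memcpy(out, &pixel, 4);
}
```

For the pixel 0x11223344 (A=0x11, R=0x22, G=0x33, B=0x44), pixel_memory_bytes() yields 44 33 22 11 on little-endian hosts and 11 22 33 44 on big-endian hosts.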
Comment 1 Sam Lantinga 2009-12-15 18:13:45 UTC
I added a 'testfill' program to the test directory and ran a bunch of tests on Snow Leopard.  In each case, about 25% of the time was spent in SDL's fill rect routine, and about 50% of the time was spent in memcpy() called from glTexSubImage2D_Exec(), which is about what I would expect.

It's possible that the OpenGL texture performance has been greatly improved in Snow Leopard.  Can you see if performance has improved on your configuration?
Comment 2 Sam Lantinga 2011-01-19 22:45:32 UTC
No response in 3 years, so I'm closing this bug.