
Bug 5181 - Blit from SDL_PIXELFORMAT_RGB555 to SDL_PIXELFORMAT_ARGB1555 is slow
Status: NEW
Alias: None
Product: SDL
Classification: Unclassified
Component: video
Version: 2.0.9
Hardware: x86 All
Importance: P2 normal
Assignee: Sam Lantinga
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-10 10:43 UTC by bugmenot_0
Modified: 2020-06-10 10:43 UTC

See Also:


Description bugmenot_0 2020-06-10 10:43:15 UTC
Recently, another user reported to me that a blit from an RGB555 renderer texture to an ARGB1555 framebuffer (the window format in the video backend) is very slow in the software renderer.

I believe this is a missing optimization in the blitter in the video subsystem.

Unfortunately, I wasn't provided any more precise measurements.
They eventually fixed the performance issue in their application by using the same format on both sides (which is obviously faster).

However, looking at the formats, this shouldn't be a bottleneck, because the formats are largely the same:

```
    SDL_PIXELFORMAT_RGB555 =
        SDL_DEFINE_PIXELFORMAT(SDL_PIXELTYPE_PACKED16, SDL_PACKEDORDER_XRGB,
                               SDL_PACKEDLAYOUT_1555, 15, 2),
[...]
    SDL_PIXELFORMAT_ARGB1555 =
        SDL_DEFINE_PIXELFORMAT(SDL_PIXELTYPE_PACKED16, SDL_PACKEDORDER_ARGB,
                               SDL_PACKEDLAYOUT_1555, 16, 2),
```

So this should still be a fast copy: the blitter only has to set or clear a single bit per pixel during the copy.
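To illustrate how little work the conversion needs, here is a minimal sketch of such a copy loop. It is not SDL's blitter code: the function names are hypothetical, and it assumes that the unused bit of RGB555 is the top bit (bit 15, where ARGB1555 keeps its alpha bit) and that pixels coming from a format without alpha should end up opaque.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of SDL: copies one row of XRGB1555
 * pixels to ARGB1555 and forces the top bit (assumed alpha) to 1. */
static void xrgb1555_to_argb1555_row(const uint16_t *src, uint16_t *dst,
                                     size_t width)
{
    size_t x;
    for (x = 0; x < width; ++x) {
        /* The 15 color bits are identical in both formats. */
        dst[x] = (uint16_t)(src[x] | 0x8000u);
    }
}

/* Hypothetical helper: converts a whole pixel buffer row by row.
 * Pitches are in bytes; rows are assumed to be 2-byte aligned. */
static void xrgb1555_to_argb1555(const void *src, size_t src_pitch,
                                 void *dst, size_t dst_pitch,
                                 size_t width, size_t height)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    size_t y;
    for (y = 0; y < height; ++y) {
        xrgb1555_to_argb1555_row((const uint16_t *)(const void *)(s + y * src_pitch),
                                 (uint16_t *)(void *)(d + y * dst_pitch),
                                 width);
    }
}
```

A loop like this touches each pixel once and does one OR per pixel, so it should cost roughly the same as a plain copy of the same buffer.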

The test machine was a Pentium 3 (which only has MMX and SSE).
I believe their tests were done without compiler optimizations.
Our toolchain's `memcpy` implementation is a simple loop which copies individual bytes, so I'd expect trivial conversions like this to perform similarly, too.

I know that SDL 2.0.10 added a bunch of blitter optimizations, but I don't think this conversion is optimized in any form. I assume 2.0.12 performs similarly.