Bug 5184 - sdlgenblit.pl / SDL_blit_auto.c does expensive tasks in the inner pixel-processing loop
Status: WAITING
Alias: None
Product: SDL
Classification: Unclassified
Component: video
Version: 2.0.12
Hardware: All All
Importance: P2 normal
Assignee: Sam Lantinga
QA Contact: Sam Lantinga
Reported: 2020-06-10 11:04 UTC by bugmenot_0
Modified: 2020-06-10 15:49 UTC

Description bugmenot_0 2020-06-10 11:04:22 UTC
The code generator in sdlgenblit.pl places many of the conditional operations inside the inner loop that processes pixels.

This is counterintuitive: it produces readable code, but potentially slow code.
I feel the main purpose of the generator is to produce fast code. The generated output doesn't have to be concise, and redundancy in it is to be expected.
The only code quality that matters is that of the generator script itself.

As such, it would probably be better to avoid constructs like this:

```
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        /* flags never changes inside the loops, yet it is tested per pixel */
        switch (flags) {
        case SDL_COPY_BLEND: process_pixel_blend(); break;
        case SDL_COPY_ADD: process_pixel_add(); break;
        case SDL_COPY_MOD: process_pixel_mod(); break;
        case SDL_COPY_MUL: process_pixel_mul(); break;
        }
    }
}
```

Instead, the code generator should probably generate this:

```
/* flags is tested once; each case owns its own loop nest */
switch (flags) {
case SDL_COPY_BLEND:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_blend();
        }
    }
    break;
case SDL_COPY_ADD:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_add();
        }
    }
    break;
case SDL_COPY_MOD:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_mod();
        }
    }
    break;
case SDL_COPY_MUL:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_mul();
        }
    }
    break;
}
```

This would avoid relying on the optimizing compiler to hoist the loop-invariant switch out of the loops, and it would make the intent more obvious, which might also help optimizers.
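
For illustration only, here is a minimal C sketch of the hoisted form, using a macro to stamp out one loop nest per mode so the duplication does not have to be written by hand. Everything in it is an assumption made for the example: the `blit_mode` enum stands in for the `SDL_COPY_*` flags, the `px_*` helpers are toy single-channel operations rather than SDL's actual blend formulas, and `BLIT_LOOP` and `blit_hoisted` are hypothetical names that do not appear in sdlgenblit.pl or SDL_blit_auto.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDL_COPY_* flags (illustrative only). */
enum blit_mode { MODE_BLEND, MODE_ADD, MODE_MOD, MODE_MUL };

/* Clamp an unsigned value to the 0..255 range. */
static inline uint8_t clamp255(unsigned v)
{
    return (uint8_t)(v > 255u ? 255u : v);
}

/* Toy single-channel pixel ops, chosen only so the four modes differ.
 * These are NOT SDL's real blend equations. */
static inline uint8_t px_blend(uint8_t s, uint8_t d) { return (uint8_t)(((unsigned)s + d) / 2u); }
static inline uint8_t px_add(uint8_t s, uint8_t d)   { return clamp255((unsigned)s + d); }
static inline uint8_t px_mod(uint8_t s, uint8_t d)   { return (uint8_t)(((unsigned)s * d) / 255u); }
static inline uint8_t px_mul(uint8_t s, uint8_t d)   { return clamp255(((unsigned)s * d) / 255u + d / 2u); }

/* Expands to a complete loop nest that applies OP to every pixel.
 * Relies on x, y, width, height, src and dst being in scope. */
#define BLIT_LOOP(OP)                                              \
    for (y = 0; y < height; y++) {                                 \
        for (x = 0; x < width; x++) {                              \
            size_t i = (size_t)y * (size_t)width + (size_t)x;      \
            dst[i] = OP(src[i], dst[i]);                           \
        }                                                          \
    }

/* The mode is tested once; each case gets its own branch-free loop nest. */
void blit_hoisted(enum blit_mode mode, const uint8_t *src, uint8_t *dst,
                  int width, int height)
{
    int x, y;
    assert(width >= 0 && height >= 0);
    switch (mode) {
    case MODE_BLEND: BLIT_LOOP(px_blend); break;
    case MODE_ADD:   BLIT_LOOP(px_add);   break;
    case MODE_MOD:   BLIT_LOOP(px_mod);   break;
    case MODE_MUL:   BLIT_LOOP(px_mul);   break;
    }
}
```

A call such as `blit_hoisted(MODE_ADD, src, dst, width, height)` then executes exactly one loop nest with no per-pixel dispatch. A generator script could presumably emit the expanded loops directly instead of using a macro; the macro is used here only to keep the sketch short.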
Comment 1 Sam Lantinga 2020-06-10 15:49:06 UTC
Patches are welcome!

Please include optimized benchmark timing results for your changes.
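
As a rough starting point for such timings, the sketch below puts both loop shapes behind a common function-pointer type and measures CPU time with the standard `clock()` call. It is not based on SDL's code: `bench_mode`, the `op_*` helpers, `blit_switch_inside`, `blit_switch_outside`, and `time_blit` are all hypothetical names, the pixel operations are toy single-channel arithmetic, and only two modes are shown to keep it short. Note that optimizing compilers can sometimes hoist a loop-invariant switch themselves (loop unswitching), so the two variants may well time the same at higher optimization levels; that is exactly what a measurement would need to establish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for two of the SDL_COPY_* flags. */
enum bench_mode { BENCH_ADD, BENCH_MOD };

/* Toy single-channel ops, not SDL's real blend math. */
static inline uint8_t op_add(uint8_t s, uint8_t d)
{
    unsigned v = (unsigned)s + d;
    return (uint8_t)(v > 255u ? 255u : v);
}
static inline uint8_t op_mod(uint8_t s, uint8_t d)
{
    return (uint8_t)(((unsigned)s * d) / 255u);
}

/* Variant A: the mode is re-tested for every pixel. */
void blit_switch_inside(enum bench_mode mode, const uint8_t *src,
                        uint8_t *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        switch (mode) {
        case BENCH_ADD: dst[i] = op_add(src[i], dst[i]); break;
        case BENCH_MOD: dst[i] = op_mod(src[i], dst[i]); break;
        }
    }
}

/* Variant B: the mode is tested once; each case owns its loop. */
void blit_switch_outside(enum bench_mode mode, const uint8_t *src,
                         uint8_t *dst, size_t n)
{
    size_t i;
    switch (mode) {
    case BENCH_ADD:
        for (i = 0; i < n; i++) dst[i] = op_add(src[i], dst[i]);
        break;
    case BENCH_MOD:
        for (i = 0; i < n; i++) dst[i] = op_mod(src[i], dst[i]);
        break;
    }
}

typedef void (*blit_fn)(enum bench_mode, const uint8_t *, uint8_t *, size_t);

/* Returns the CPU seconds spent calling fn `reps` times over n pixels.
 * dst is overwritten on every repetition, so its contents drift, but the
 * amount of work per repetition stays the same. */
double time_blit(blit_fn fn, enum bench_mode mode, const uint8_t *src,
                 uint8_t *dst, size_t n, int reps)
{
    clock_t start, end;
    int r;
    assert(reps > 0);
    start = clock();
    for (r = 0; r < reps; r++) {
        fn(mode, src, dst, n);
    }
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To compare, one would call `time_blit` once with `blit_switch_inside` and once with `blit_switch_outside` on the same large buffers, using the same optimization flags as the shipped build, and repeat the measurement several times to average out noise.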