Bug 5184

Summary: sdlgenblit.pl / SDL_blit_auto.c does expensive tasks in the inner pixel-processing loop
Product: SDL
Reporter: bugmenot_0 <kajema2739>
Component: video
Assignee: Sam Lantinga <slouken>
Status: WAITING ---
QA Contact: Sam Lantinga <slouken>
Severity: normal
Priority: P2
Version: 2.0.12
Hardware: All
OS: All

Description bugmenot_0 2020-06-10 11:04:22 UTC
The code generator in sdlgenblit.pl places many of its conditional operations inside the inner loop that processes pixels, so they are evaluated again for every pixel.

This is counterintuitive. It produces readable code, but potentially slow code.
I feel like the main purpose of the generator is to produce fast code; the generated code doesn't have to be concise, and redundancy is to be expected.
The only code quality that matters is that of the generator script itself.

As such, it would probably be better to avoid constructs like this:

```
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        switch (flags) {
        case SDL_COPY_BLEND: process_pixel_blend(); break;
        case SDL_COPY_ADD: process_pixel_add(); break;
        case SDL_COPY_MOD: process_pixel_mod(); break;
        case SDL_COPY_MUL: process_pixel_mul(); break;
        }
    }
}
```

Instead, the code generator should probably generate this:

```
switch (flags) {
case SDL_COPY_BLEND:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_blend();
        }
    }
    break;
case SDL_COPY_ADD:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_add();
        }
    }
    break;
case SDL_COPY_MOD:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_mod();
        }
    }
    break;
case SDL_COPY_MUL:
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            process_pixel_mul();
        }
    }
    break;
}
```

This would avoid relying on the optimizing compiler to hoist the switch out of the loops, and it would make the intent more obvious, which might also help optimizers.
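
To make the two shapes concrete, here is a minimal self-contained sketch in C. It is not the code that sdlgenblit.pl emits: the `blend_mode` enum, the `px_*` helpers, the single-channel 8-bit pixels, and the `BLIT_LOOP` macro are all hypothetical stand-ins chosen for illustration. It only shows that both shapes compute the same result, with the switch evaluated per pixel in one and once per blit in the other.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the SDL_COPY_* flags; not the real SDL values. */
enum blend_mode { MODE_BLEND, MODE_ADD, MODE_MOD, MODE_MUL };

/* Placeholder per-pixel operations on one 8-bit channel (not SDL's formulas). */
static uint8_t px_blend(uint8_t s, uint8_t d) { return (uint8_t)((s + d) / 2); }
static uint8_t px_add(uint8_t s, uint8_t d) { unsigned r = (unsigned)s + d; return (uint8_t)(r > 255 ? 255 : r); }
static uint8_t px_mod(uint8_t s, uint8_t d) { return (uint8_t)(((unsigned)s * d) / 255); }
static uint8_t px_mul(uint8_t s, uint8_t d) { unsigned r = ((unsigned)s * d) / 255 + d / 2u; return (uint8_t)(r > 255 ? 255 : r); }

/* Shape 1: the switch is evaluated once per pixel. */
void blit_switch_inside(enum blend_mode mode, const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            size_t i = (size_t)y * (size_t)width + (size_t)x;
            switch (mode) {
            case MODE_BLEND: dst[i] = px_blend(src[i], dst[i]); break;
            case MODE_ADD:   dst[i] = px_add(src[i], dst[i]);   break;
            case MODE_MOD:   dst[i] = px_mod(src[i], dst[i]);   break;
            case MODE_MUL:   dst[i] = px_mul(src[i], dst[i]);   break;
            }
        }
    }
}

/* Expands to one full pixel loop specialized for a single operation. */
#define BLIT_LOOP(op)                                              \
    for (int y = 0; y < height; y++) {                             \
        for (int x = 0; x < width; x++) {                          \
            size_t i = (size_t)y * (size_t)width + (size_t)x;      \
            dst[i] = op(src[i], dst[i]);                           \
        }                                                          \
    }

/* Shape 2: the switch is evaluated once per blit; each case owns its loops. */
void blit_switch_outside(enum blend_mode mode, const uint8_t *src, uint8_t *dst, int width, int height)
{
    switch (mode) {
    case MODE_BLEND: BLIT_LOOP(px_blend); break;
    case MODE_ADD:   BLIT_LOOP(px_add);   break;
    case MODE_MOD:   BLIT_LOOP(px_mod);   break;
    case MODE_MUL:   BLIT_LOOP(px_mul);   break;
    }
}

/* Returns 1 if both shapes produce identical output for every mode, else 0. */
int blit_variants_agree(void)
{
    enum { W = 8, H = 4 };
    uint8_t src[W * H], a[W * H], b[W * H];
    for (int mode = MODE_BLEND; mode <= MODE_MUL; mode++) {
        for (int i = 0; i < W * H; i++) {
            src[i] = (uint8_t)(i * 7);
            a[i] = b[i] = (uint8_t)(255 - i * 3);
        }
        blit_switch_inside((enum blend_mode)mode, src, a, W, H);
        blit_switch_outside((enum blend_mode)mode, src, b, W, H);
        if (memcmp(a, b, sizeof a) != 0) {
            return 0;
        }
    }
    return 1;
}
```

In the generator, the role of `BLIT_LOOP` would presumably be played by the Perl script emitting one loop body per case, so the redundancy lives in the generated file rather than in hand-written source.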
Comment 1 Sam Lantinga 2020-06-10 15:49:06 UTC
Patches are welcome!

Please include optimized benchmark timing results for your changes.
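
One possible way to collect such timings is sketched below. It is an assumption about how a benchmark could be structured, not an existing SDL tool: `time_blit_seconds`, the `blit_fn` callback type, and `count_call` are hypothetical names, and `clock()` measures processor time, which may or may not be the metric wanted here. The harness should be compiled with the same optimization level as a release build of SDL (for example `-O2`), since the request is for optimized results.

```c
#include <time.h>

/* Hypothetical callback type: one call performs one complete blit. */
typedef void (*blit_fn)(void *userdata);

/* Runs fn `iterations` times and returns the processor time used, in seconds. */
double time_blit_seconds(blit_fn fn, void *userdata, int iterations)
{
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        fn(userdata);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Trivial example workload: counts how many times it was invoked. */
void count_call(void *userdata)
{
    int *calls = (int *)userdata;
    (*calls)++;
}
```

A real measurement would pass a callback that performs the blit before and after the change, use enough iterations to get well above the resolution of `clock()`, and report both timings side by side.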