Bug 4441

Summary: Basic SDL C (DX) program gives worse performance than C# MonoGame for Windows (DX)
Product: SDL
Component: render
Version: 2.0.9
Hardware: x86
OS: Windows 10
Status: ASSIGNED
Severity: minor
Priority: P2
Reporter: Valentyn <valkaenator>
Assignee: Ryan C. Gordon <icculus>
QA Contact: Sam Lantinga <slouken>
CC: amaranth72, icculus

Description Valentyn 2019-01-04 14:43:56 UTC
I have created a basic SDL program which calls SDL_RenderClear(), then runs a for loop of 50k SDL_RenderCopy() calls, and when the loop ends calls SDL_RenderPresent(). The SDL_Renderer is created with the accelerated, vsync and render target texture flags. This is a release build of course, and I get 60 FPS all the time.
On my machine this creates 20-22% CPU load and 55-60% GPU load.


With MonoGame for Windows I get the same FPS. The program calls SpriteBatch.Begin(), runs a for loop of 50k SpriteBatch.Draw() calls, and calls SpriteBatch.End() when the loop is over. This gives me 5-7% CPU load and 45-50% GPU load.

I also tried just doing FPS counting without vsync and rendering nothing else.
The SDL C program gives 5-6k FPS with 20% CPU load and 80% GPU load, while MonoGame for Windows gives 7-8k FPS with 15% CPU load and 50% GPU load.

Do you guys have any idea how this is possible?
Comment 1 Valentyn 2019-01-04 14:59:35 UTC
I went on and ran tests rendering 300k sprites without vsync. It seems like SDL has problems utilizing my GPU resources and the CPU is becoming a bottleneck.
This test gives me 20-22 FPS with 29% CPU load and 70-75% GPU load.

MonoGame gives me 27-28 FPS, uses 17% CPU and uses 99-100% GPU.
Comment 2 Ryan C. Gordon 2019-01-04 18:45:42 UTC
We redesigned the renderer implementation after SDL 2.0.9 shipped. It's considerably faster than 2.0.9's implementation, but I haven't benchmarked it against MonoGame.

If you would like to try the latest SDL in revision control, I would be interested to hear your results!

--ryan.
Comment 3 Valentyn 2019-01-05 01:57:37 UTC
Hello Ryan, I already did. There are several issues and only a little improvement.

Issues:
1) Nothing changes when I set SDL_HINT_RENDER_BATCHING to "0" or "1". The behavior is exactly the same; it seems to always use render batching. Memory usage goes up from 30MB to 130MB and CPU load goes down from 20-22% to 14-16%. GPU load stays the same. So I see that there's some caching going on here.
I actually tried rendering 1.000.000 sprites and my memory usage went to 2400MB, while CPU load stayed at 19% and GPU load at 50% and I had 4 FPS. This seems broken.
2) Removing SDL_RENDERER_PRESENTVSYNC doesn't work as it should. GPU load still stays at 60%, and SDL gets less than half of MonoGame's FPS when rendering 50k sprites: 80 FPS vs 170 FPS, with MonoGame utilizing my GPU up to 100%.

I link SDL statically and build it from source each time. I also have a few code changes that redefine some things for my own memory tracking, which seems to be a bit broken after the upgrade, because I can see allocations of only 180MB while there's 2400MB being allocated when rendering 1k sprites. I didn't try loading SDL dynamically in a test project, but I don't think that could be an issue, right?
Comment 4 Ryan C. Gordon 2019-01-05 05:21:35 UTC
(In reply to Valentyn from comment #3)
> I actually tried rendering 1.000.000 sprites and my memory usage went to
> 2400MB, while CPU load stayed at 19% and GPU load at 50% and I had 4 FPS.
> This seems broken.

Right now it batches everything until it has a reason to flush. I'll put an upper limit on it, so it flushes as it goes for really pathological cases like this. This should solve this problem (and possibly others).

--ryan.
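
The "upper limit" idea in comment 4 can be illustrated with a minimal sketch in plain C. It is not SDL's actual implementation: the names BatchQueue, batch_push, batch_flush and simulate_frame, and the cap of 4096 commands, are all hypothetical and chosen only for illustration. The sketch counts queued commands rather than storing them, to show that a cap keeps the peak queue size bounded no matter how many sprites are submitted in one frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap; the thread does not say which limit SDL will use. */
#define MAX_QUEUED_COMMANDS 4096

typedef struct {
    size_t queued;    /* commands currently waiting in the batch */
    size_t peak;      /* largest value 'queued' ever reached */
    size_t flushes;   /* number of times the batch was submitted */
    size_t submitted; /* total commands handed to the (imaginary) GPU */
} BatchQueue;

/* Submit everything that is queued and empty the batch. */
static void batch_flush(BatchQueue *q)
{
    assert(q != NULL);
    if (q->queued == 0) {
        return; /* nothing to submit */
    }
    q->submitted += q->queued;
    q->queued = 0;
    q->flushes++;
}

/* Queue one command; flush early once the cap is reached. */
static void batch_push(BatchQueue *q)
{
    assert(q != NULL);
    q->queued++;
    if (q->queued > q->peak) {
        q->peak = q->queued;
    }
    if (q->queued >= MAX_QUEUED_COMMANDS) {
        batch_flush(q);
    }
}

/* Simulate one frame: queue 'sprites' commands, then flush at "present". */
static BatchQueue simulate_frame(size_t sprites)
{
    BatchQueue q = {0, 0, 0, 0};
    for (size_t i = 0; i < sprites; i++) {
        batch_push(&q);
    }
    batch_flush(&q); /* the final flush a present would trigger */
    return q;
}
```

With this cap, simulate_frame(1000000) never holds more than 4096 commands at once, whereas an unbounded queue would hold all 1,000,000 until the end of the frame.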
Comment 5 Alex Szpakowski 2019-01-07 03:31:15 UTC
Right now in the latest source code SDL does vertex batching but not much draw call batching (e.g. every SDL_RenderCopy call causes a draw call, even if no state changed between consecutive RenderCopy calls.)

If we want to improve SDL_Render's performance further I think batching draw calls would probably give the biggest perf gains, although it might require some restructuring or tradeoffs around the implementations of per-draw transformations.
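
To make the distinction in comment 5 concrete, here is a small sketch in plain C that only counts draw calls. It is an assumption-laden illustration, not SDL code: CopyCommand, texture_id and both counting functions are hypothetical names, and a single integer stands in for all the render state that would have to match before two copies could share a draw call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queued copy; texture_id stands in for the full render state. */
typedef struct {
    int texture_id;
} CopyCommand;

/* One draw call per queued copy, as comment 5 describes the current behavior. */
static size_t draw_calls_unbatched(const CopyCommand *cmds, size_t n)
{
    (void)cmds; /* the state is irrelevant when nothing is merged */
    return n;
}

/* Merge consecutive copies that share the same state into one draw call. */
static size_t draw_calls_batched(const CopyCommand *cmds, size_t n)
{
    size_t calls = 0;
    assert(cmds != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* A new draw call starts only when the state changes. */
        if (i == 0 || cmds[i].texture_id != cmds[i - 1].texture_id) {
            calls++;
        }
    }
    return calls;
}
```

For a benchmark like the one in this report, where the same texture is presumably copied 50k times in a row, this kind of merging would collapse the whole loop into a single draw call in the sketch. Per-draw transformations, which comment 5 mentions as the complication, are not modeled here.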
Comment 6 Valentyn 2019-01-08 08:21:40 UTC
(In reply to Alex Szpakowski from comment #5)
> Right now in the latest source code SDL does vertex batching but not much
> draw call batching (e.g. every SDL_RenderCopy call causes a draw call, even
> if no state changed between consecutive RenderCopy calls.)
> 
> If we want to improve SDL_Render's performance further I think batching draw
> calls would probably give the biggest perf gains, although it might require
> some restructuring or tradeoffs around the implementations of per-draw
> transformations.

I think it would be nice to have the same metrics that MonoGame has, because SDL offers its own rendering API just like MonoGame. Take a look here: http://www.monogame.net/docs/html/index.html
This shouldn't be hard to implement and might help with benchmarking against MonoGame and other similar rendering APIs.

Right now, if people are doing something serious with SDL, it comes down to writing your own rendering, which is probably best done with https://github.com/bkaradzic/bgfx, and that is lower level than the SDL 2D rendering API. I'd love to use the SDL 2D rendering API, but it makes no sense if MonoGame performs better, and MonoGame is C# too.
Comment 7 Valentyn 2019-01-08 08:24:28 UTC
Can't get a proper link to MG documentation for some reason.
You can find their metrics under Class Library Reference -> Microsoft.Xna.Framework.Graphics -> GraphicsMetrics
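
As a rough idea of what per-frame metrics could look like on the C side, here is a minimal sketch. Everything in it is hypothetical: RenderMetrics and the metrics_* functions are not part of SDL, and the counters are only loosely inspired by the kind of data a metrics class such as GraphicsMetrics reports, not a copy of its fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-frame counters; none of these names exist in SDL. */
typedef struct {
    size_t clear_count;  /* clears issued this frame */
    size_t sprite_count; /* copy/sprite requests made by the application */
    size_t draw_count;   /* draw calls actually submitted to the GPU */
} RenderMetrics;

/* Zero all counters, e.g. right after presenting a frame. */
static void metrics_reset(RenderMetrics *m)
{
    assert(m != NULL);
    m->clear_count = 0;
    m->sprite_count = 0;
    m->draw_count = 0;
}

static void metrics_on_clear(RenderMetrics *m)  { assert(m != NULL); m->clear_count++; }
static void metrics_on_sprite(RenderMetrics *m) { assert(m != NULL); m->sprite_count++; }
static void metrics_on_draw(RenderMetrics *m)   { assert(m != NULL); m->draw_count++; }

/* Average sprites per draw call; 0 when no draw call was issued. */
static double metrics_sprites_per_draw(const RenderMetrics *m)
{
    assert(m != NULL);
    if (m->draw_count == 0) {
        return 0.0;
    }
    return (double)m->sprite_count / (double)m->draw_count;
}
```

A renderer that bumped these counters at the matching points would let the same numbers be compared between frameworks, for example sprites per draw call for the 50k-sprite test.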