Bug 5312 - kmsdrm render reorders events?
Summary: kmsdrm render reorders events?
Status: RESOLVED ENDOFLIFE
Alias: None
Product: SDL
Classification: Unclassified
Component: render
Version: 2.0.12
Hardware: x86_64 Linux
Importance: P2 normal
Assignee: Manuel Alfayate Corchete
QA Contact: Sam Lantinga
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-09 20:52 UTC by Stas Sergeev
Modified: 2020-10-15 17:42 UTC
Description Stas Sergeev 2020-10-09 20:52:23 UTC
Hi.

I tried my program that worked perfectly
with SDL's X backend, under KMS console,
and found that it has lots of rendering
artefacts.
For example it can draw the next frame and
then draw previous frame again.
After some debugging I've found out that
glitches disappear if I call SDL_RenderPresent()
unconditionally, whereas normally I call
SDL_RenderPresent() only if the renderer
was previously updated with SDL_RenderCopy()
call.
I wasn't able to find any flaw in my code,
and the fact that it works perfectly with SDL's
X driver makes me think the problem
is in the kmsdrm driver of SDL.

Can it be that it somehow "batches" the
SDL_RenderCopy() calls, but SDL_RenderPresent()
calls go as-is, so the ordering is not
preserved? That would explain why sometimes
I see the old frames: perhaps SDL_RenderCopy()
was delayed, and subsequent SDL_RenderPresent()
switched the buffers before the new one was
ready?

Sorry for such a non-definitive report.
Comment 1 Stas Sergeev 2020-10-10 12:40:12 UTC
In fact, the situation described happens with
the software renderer.
Now I tried to create a hw renderer, and the
situation is much worse, because then, if I
call SDL_RenderPresent() periodically (which
helped in case of the software renderer), I get
a continuous flipping between the current and
prev frame. So in case of a hw renderer, I have
found no work-around.
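
The flipping described in this comment is what plain two-buffer page flipping would produce when a present happens without a fresh draw. The following is a minimal sketch of that idea in plain C, not SDL code: SwapChain, draw_frame() and present() are hypothetical names, and each "buffer" only stores the number of the frame last drawn into it.

```c
#include <assert.h>

/* Hypothetical model of a two-buffer swap chain; this is not SDL code. */
typedef struct {
    int buffers[2]; /* each "buffer" stores the number of the frame drawn into it */
    int back;       /* index of the buffer the application currently draws into */
} SwapChain;

/* Draw frame number `frame` into the back buffer. */
void draw_frame(SwapChain *sc, int frame)
{
    assert(sc->back == 0 || sc->back == 1);
    sc->buffers[sc->back] = frame;
}

/* Flip the buffers: the back buffer becomes visible and the old front
 * buffer becomes the new back buffer. Returns the frame now on screen. */
int present(SwapChain *sc)
{
    int shown = sc->buffers[sc->back];
    sc->back = 1 - sc->back;
    return shown;
}
```

In this model, drawing frame 1, presenting, drawing frame 2, presenting, and then presenting twice more without drawing shows frames 1, 2, 1, 2: every extra present brings back the older buffer. Whether a real driver behaves this way depends on the backend, so treat it only as an illustration of the symptom.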
Comment 2 Manuel Alfayate Corchete 2020-10-10 13:11:02 UTC
@Stas Sergeev

Thanks for this report. I am determined to fix this, but I don't have much time for it.
I don't understand the problem very well: when you make an SDL_RenderPresent() call, the KMSDRM backend ends up calling KMSDRM_GLES_SwapWindowFenced(), which simply requests a pageflip:

https://hg.libsdl.org/SDL/file/7a3c36e0f598/src/video/kmsdrm/SDL_kmsdrmopengles.c#l110

How could this reorder events? What events? I just don't get it, any help to understand the problem is really welcome. Such a problem can't be allowed in the backend.
Comment 3 Manuel Alfayate Corchete 2020-10-10 13:15:35 UTC
@Stas Sergeev: One thing you can do is export the environment variable SDL_VIDEO_DOUBLE_BUFFER=1 (or simply run "SDL_VIDEO_DOUBLE_BUFFER=1 <your_program>") and tell us if it's still happening.
Exporting SDL_VIDEO_DOUBLE_BUFFER env variable causes KMSDRM_GLES_SwapWindowDoubleBuffered() to be called instead of KMSDRM_GLES_SwapWindowFenced() on pageflip. Since KMSDRM_GLES_SwapWindowFenced() is more complicated, we could at least know if KMSDRM_GLES_SwapWindowFenced() is the problem.
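
If changing the launch command is inconvenient, the variable can also be set from inside the program. This is a minimal sketch that assumes a POSIX system and assumes SDL reads the variable from the process environment when the video backend is set up, so the call would have to happen before SDL_Init(). The helper name set_env_before_sdl_init() is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* exposes setenv() on strict compilers */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: sets an environment variable for this process.
 * Returns 0 on success and -1 on failure, as setenv() does. */
int set_env_before_sdl_init(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    /* The last argument (1) overwrites any value that is already set. */
    return setenv(name, value, 1);
}
```

Calling set_env_before_sdl_init("SDL_VIDEO_DOUBLE_BUFFER", "1") at the very start of main() should then be equivalent to exporting the variable in the shell, under the assumptions above.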
Comment 4 Stas Sergeev 2020-10-10 13:33:06 UTC
An additional problem here is that when I switch
too much between the SDL/drm app and X to test what
you say, I eventually get a full DRI hangup,
and need to reboot the machine. :)
But this of course is not an SDL fault, so I
tested
SDL_VIDEO_DOUBLE_BUFFER=1 <my_program>
and found no observable differences with the
SW renderer.

> How could this reorder events? What events?

Any SDL_RenderPresent() is preceded with 1 or
more SDL_RenderClear()/SDL_RenderCopy() calls.
My guess is that these calls are "batched", so
SDL_RenderPresent() can flip to the not-yet-ready
frame. Which is the only explanation I can think
of, when seeing the switch to an old frame!

Under normal circumstances I call SDL_RenderPresent()
only after an SDL_RenderClear()/SDL_RenderCopy()
sequence. But to work around the problem, I
tried to call SDL_RenderPresent() unconditionally,
periodically. And for SW renderer this indeed
worked around the problem. But such work-around
breaks HW renderer even in X. Which leads to a
question: shouldn't SDL_RenderPresent() check
if renderer was updated, and not switch to the
"outdated" content, but return an error instead?
Comment 5 Manuel Alfayate Corchete 2020-10-10 14:11:08 UTC
@Stas Sergeev:
What I understand is that you want to do several SDL_RenderClear()/SDL_RenderCopy() calls, and then ONE SDL_RenderPresent() call and that should work but it does not, right?
Can you give me a small code example where I can see it fail so I can understand better? The smaller the example, the better. Something that tries to simply render a triangle, or a pixel.. something very simple that I can see that fails in KMSDRM and succeeds in X11.
Comment 6 Stas Sergeev 2020-10-10 14:55:45 UTC
> What I understand is that you want to do several
> SDL_RenderClear()/SDL_RenderCopy()

I tried serializing them, so that SDL_RenderPresent()
is called after every SDL_RenderClear()/SDL_RenderCopy()
pair. No change. So I think render operations get
delayed, except for SDL_RenderPresent(), which goes
immediately.

> and that should work but it does not, right?

It works, but I have artefacts. Like old frames
sometimes.

> Can you give me a small code example where I can see it fail

There is no fail, just artefacts. :)
Also the pipeline is not all that small:
SDL_LockSurface()
SDL_UpdateTexture() (multiple times)
SDL_UnlockSurface()
SDL_RenderClear()
SDL_RenderCopy()
SDL_RenderPresent()

IIRC the SDL sources have many tests.
Are there any tests that do something similar to
the above pipeline? If so, I can take such a
test as a start, and see if I can "break" it
for you (or fix my code from it).
Comment 7 Stas Sergeev 2020-10-10 15:09:52 UTC
diff -r 83c96b1d973c test/testoverlay2.c
--- a/test/testoverlay2.c       Fri Oct 09 04:28:00 2020 +0300
+++ b/test/testoverlay2.c       Sat Oct 10 18:06:27 2020 +0300
@@ -341,7 +341,7 @@
         quit(4);
     }
 
-    renderer = SDL_CreateRenderer(window, -1, 0);
+    renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_SOFTWARE);
     if (!renderer) {
         SDL_LogError(SDL_LOG_CATEGORY_APPLICATION, "Couldn't set create renderer: %s\n", SDL_GetError());
         SDL_free(RawMooseData);


With this change you can see the problem.
The motion is sometimes jerky.
I have to admit this is not the best test
to see the problem, but it's still visible.

Or, if you run testoverlay2 without the
above patch, then you will probably get
the HW rendering problem, which is quite
different, but I have it too. Namely,
after some time the motion will become
very slow and you will see a black bar
floating vertically around the screen.
I haven't mentioned that problem in this
report because it may be specific to my
amdgpu driver. See if you can reproduce it.
Comment 8 Manuel Alfayate Corchete 2020-10-10 19:20:04 UTC
@Stas Sergeev:
I have tried the testoverlay2 test while looking closely at the angry moose for several minutes, without being able to notice anything different in KMSDRM from what I see in X11. And I tried really hard.
I have to say that I have a good eye for these things and live in perpetual obsession with perfect screen refresh with no "micro-stalls" or stutters: I can tell a single lost frame in a refresh sequence, but for the life of me I can't see anything in testoverlay2 with SDL_RENDERER_SOFTWARE or without it.
Admittedly the animation doesn't run at 60fps (it has few animation frames), but I can't see anything strange besides that: of course it shows the same frames in X11 as it shows in KMSDRM.
If you have anything else that I can reproduce, I will investigate as much as needed as my time allows, so I will leave this open and I will be waiting for more input, but as things are just now, I can't imagine what the problem could be and I can't even see it.
Comment 9 Manuel Alfayate Corchete 2020-10-10 19:25:11 UTC
I should add that I have amdgpu here (according to lspci: 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] (rev e2)).
The amdgpu driver stinks. It's being fixed by the awesome MESA devs in #dri-devel, or so it was this summer when I fried them with questions about the atomic drm interface. But currently I have some workarounds specifically put in place so that kmsdrm does not blackscreen on quit, etc. Other than that (and that is working too), it's working fine here, as well as on VC4 (Raspberry Pi) and Intel.

Maybe there's another problem in your system? Can you please try on a different computer even if it has amdgpu too?
Comment 10 Manuel Alfayate Corchete 2020-10-10 19:33:36 UTC
More information: since KMS/DRM are mostly a kernel thing (well, there's libDRM too), here are some relevant version numbers about what my system is running:

manuel@hp15db0:~/src/SDL-RDY/test$ uname -r
5.4.0-48-generic

manuel@hp15db0:~/src/SDL-RDY/test$ apt list mesa-common-dev
Listing... Done
mesa-common-dev/focal-updates,now 20.0.8-0ubuntu1~20.04.1 amd64 [installed]

manuel@hp15db0:~/src/SDL-RDY/test$ apt list libdrm2
Listing... Done
libdrm2/focal,now 2.4.101-2 amd64 [installed]

Also, anybody who is able to see this problem is welcome to post.
Comment 11 Stas Sergeev 2020-10-10 21:21:37 UTC
You applied the patch I showed
in comment #7, didn't you?
Comment 12 Manuel Alfayate Corchete 2020-10-11 00:47:15 UTC
(In reply to Stas Sergeev from comment #11)
> You applied the patch I showed
> in comment #7, didn't you?

Yes, I did. As I said, I tried testoverlay2 with and without the patch (which simply forces the creation of a software renderer).
Comment 13 Stas Sergeev 2020-10-11 14:27:46 UTC
OK, thanks.
I'll try to collect the info from other machines,
and will either post it here, or close this bug if
no info can be collected (not that I have many other
PCs around).

In the meantime, of course it would be good if people
here also patched the test as in comment #7 to
see how it goes.
Comment 14 Sam Lantinga 2020-10-12 16:15:37 UTC
Hi Stas, we do batch render calls and execute them all at once before the present. You should assume that the back buffer is completely randomized at the end of the present, and you need to write the entire screen between the time of the last present and the current frame's present. This isn't true on all drivers, but needs to be done for correct rendering in all cases.

For situations where you only need to redraw small portions of the screen as things change, people often draw to a target texture and then copy that to the screen each frame. Keep in mind that in some cases you can lose the contents of the target texture (D3D reset, alt-tab, etc.) and will occasionally have to redraw the entire thing.

Does that help?
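
The rules in this comment can be illustrated with a toy model. The sketch below is plain C and not SDL's implementation: Renderer, render_clear(), render_copy() and render_present() are hypothetical names, the "screen" is a single row of 8 bytes, and the 0xEE fill stands in for "assume the back buffer is randomized after a present". Snapshotting the texture at queue time is a simplification of this model.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_W 8      /* tiny one-row "screen" to keep the model readable */
#define MAX_COMMANDS 16

/* Hypothetical model types; none of these names come from SDL. */
typedef enum { CMD_CLEAR, CMD_COPY } CommandType;

typedef struct {
    CommandType type;
    unsigned char pixels[SCREEN_W]; /* snapshot of the source for CMD_COPY */
} Command;

typedef struct {
    unsigned char buffers[2][SCREEN_W];
    int back;                       /* index of the buffer being drawn into */
    Command queue[MAX_COMMANDS];
    int queued;
} Renderer;

/* Queue a clear; nothing is drawn yet. */
void render_clear(Renderer *r)
{
    assert(r->queued < MAX_COMMANDS);
    r->queue[r->queued].type = CMD_CLEAR;
    r->queued++;
}

/* Queue a full-screen copy of `texture`; nothing is drawn yet. */
void render_copy(Renderer *r, const unsigned char *texture)
{
    assert(r->queued < MAX_COMMANDS);
    r->queue[r->queued].type = CMD_COPY;
    memcpy(r->queue[r->queued].pixels, texture, SCREEN_W);
    r->queued++;
}

/* Run all queued commands in order on the back buffer, then flip.
 * Returns the buffer that is now on screen. */
const unsigned char *render_present(Renderer *r)
{
    unsigned char *target = r->buffers[r->back];
    int i;

    for (i = 0; i < r->queued; i++) {
        if (r->queue[i].type == CMD_CLEAR) {
            memset(target, 0, SCREEN_W);
        } else {
            memcpy(target, r->queue[i].pixels, SCREEN_W);
        }
    }
    r->queued = 0;
    r->back = 1 - r->back;
    /* Model "undefined contents": scribble over the new back buffer. */
    memset(r->buffers[r->back], 0xEE, SCREEN_W);
    return target;
}
```

In this model an application that keeps its picture in a persistent texture array, changes a few bytes per frame, and then calls render_clear(), render_copy() and render_present() always sees the full texture on screen. A render_present() with an empty queue shows the 0xEE filler instead, which is why the whole screen has to be redrawn before every present.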
Comment 15 Stas Sergeev 2020-10-12 16:51:35 UTC
Yes, this is exactly what I do: update the
texture by parts, and then copy it to the renderer
in its entirety. Then do RenderPresent().
The problem is that I can see the glitches even
with your testoverlay2 (see comment #7 to see
how I switch it to the software renderer).
So the bug is definitely not on my side.
But maybe something is wrong with my amdgpu
driver, so testing by others would be
good.
Comment 16 Stas Sergeev 2020-10-15 17:00:58 UTC
OK, so this is specific to 2.0.12.
I don't see it with the current hg code.
Sorry for the noise!
Comment 17 Manuel Alfayate Corchete 2020-10-15 17:42:44 UTC
Ok, don't worry :)