Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help needed implementing VULKAN support on the KMSDRM backend #3879

Closed
SDLBugzilla opened this issue Feb 11, 2021 · 0 comments
Closed

Help needed implementing VULKAN support on the KMSDRM backend #3879

SDLBugzilla opened this issue Feb 11, 2021 · 0 comments

Comments

@SDLBugzilla
Copy link
Collaborator

SDLBugzilla commented Feb 11, 2021

This bug report was migrated from our old Bugzilla tracker.

Reported in version: 2.0.13
Reported for operating system, platform: Linux, All

Comments on the original bug report:

On 2020-11-12 12:53:27 +0000, Manuel Alfayate Corchete wrote:

Hi there,

Recently Vulkan has started working x-less on the Raspberry Pi, thanks to recent MESA advancements. Before that, it was working for quite a while on Intel graphics, amdgpu, etc

So, it's time for the KMSDRM backend to get Vulkan support, or maybe to do a new specific backend that would simply:

-Give the SDL2+Vulkan programs the active extensions list (SDL_GetInstanceExtensions)
-Create the vulkan surface specific to non-X Vulkan (SDL_Vulkan_CreateSurface() which in turn calls vkCreateDisplayPlaneSurfaceKHR).

In either case, only those functionality is required from SDL2, but I am failing to implement it.

I have the VK_KHR_Display extension active, and as such I can retrieve the pointer to vkCreateDisplayPlaneSurfaceKHR(), using vkGetInstanceProcAddr() on the program's instance. That proves that the extension is, indeed, active.

However, when I try to get the display count using vkGetPhysicalDeviceDisplayPropertiesKHR(), I get 0 displays. Any idea on what am I doing wrong?

Here's where I successfully retrieve the vkCreateDisplayPlaneSurfaceKHR() pointer:
https://github.com/vanfanel/SDL/blob/11e4911657edc495e196b9f262b0b523fe9ad8e0/src/video/kmsdrm/SDL_kmsdrmvulkan.c#L179

And here is where I try to retrieve the display count that results in 0 displays:
https://github.com/vanfanel/SDL/blob/11e4911657edc495e196b9f262b0b523fe9ad8e0/src/video/kmsdrm/SDL_kmsdrmvulkan.c#L276

I am trying this on AMDGPU and intel graphics with the same result.
Any Vulkan experts in the room?

On 2020-11-18 17:23:11 +0000, wrote:

Try vkGetPhysicalDeviceDisplayProperties2KHR and see if it helps.

On 2020-11-18 17:35:04 +0000, wrote:

Addendum: You will need to add yourself to the render group too, otherwise the Vulkan device will not show up.

On 2020-11-19 07:49:49 +0000, wrote:

Yet another note: Try selecting the second GPU first before selecting the first GPU.

On 2020-11-24 18:41:27 +0000, wrote:

After a bit of a research, I finally tracked down the issue. You have to drop master privileges from the KMSDRM file descriptor to allow the Mesa library to set itself as master which is required for Xorg-less Vulkan to work, since Linux does not allow more than one DRM card file descriptor to be master at the same time.

Try stealing the DRM master file descriptor by poking the /proc/self/fd/ directory and looking for symlinks to the DRM master device, ignoring the KMSDRM file descriptor. It seems to be only available on Linux, so making KMSDRM Vulkan exclusive to Linux will be your only option for now. You will need to restore master privileges when the application closes the KMSDRM window so that you can clean up KMSDRM states.

Try with the SDL 2.0.12 legacy KMSDRM backend if you end up with problems with Atomic DRM backend; the Mesa Vulkan WSI code exclusively uses legacy DRM for its Xorg-less Vulkan rendering.

On 2020-11-24 18:53:43 +0000, wrote:

By stealing I meant "check if the file descriptor references the DRM device node and is the master file descriptor and use the same value as that master file descriptor". Sorry for my English.

On 2020-11-24 22:12:31 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

Thanks a lot!

Don't you think it would be better to do a new backend instead of reusing the KMSDRM one? From what you say, it would be better to prevent KMSDRM from even initializing, in order to let Vulkan take the file descriptor for itself.

On 2020-11-25 04:42:23 +0000, wrote:

Going down the route of creating the new backend purely for Xorg-less Vulkan isn't something that would simply work; many SDL2 applications ignore whatever backend SDL2 is using and rely upon several assumptions they may have made when creating the SDL2 window. You would also need a Vulkan instance to do anything useful and the application may not provide it at all. It would be really complicated. You would need to complicate the graphical cursor code further for example.

On 2020-11-25 05:23:13 +0000, wrote:

Here's a hint: Drop master privileges from KMSDRM before calling vkEnumeratePhysicalDevices, and then if you can't create the surface, find the file descriptor that references the DRM primary node device file and is master and close it, assuming it is not the KMSDRM one and is instead the one by Mesa library.

On 2020-11-29 14:54:02 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

Thanks again.
I do this to check the contents in /proc/self/fd

struct dirent *res;
DIR *folder;

if (folder) {
while ((res = readdir(folder))) {
printf("---%s\n", res->d_name);
}
closedir(folder);
}

I can easily confirm that the KMSDRM FD number (3 in this case) is on that list.

However, what do you mean by this?
"check if the file descriptor references the DRM device node and is the master file descriptor and use the same value as that master file descriptor"

I think you believe that I must confirm that the KMSDRM FD is on the list of FDs in /proc/self/fd, right?

But then, how do I take it from KMSDRM? By closing it and opening it again? In my head that makes no sense: we are still in the KMSDRM backend, I would get the same FD if I reopen it again, wouldn't I?

Sorry if I sound a bit obtuse, trying my best here, really.

On 2020-11-29 16:20:30 +0000, wrote:

Opening the same file again will only yield another file descriptor, as per UNIX rules.

A alternate solution is to save the Vulkan file descriptor and then whenever the backend is invoked from the application, temporarily make the KMSDRM file descriptor master and restore the master to Vulkan file descriptor at the end of the function or if any error occurs.

Note that hardware cursor is impossible when Vulkan is running because the Vulkan swapchain function will disable hardware cursor whenever it is called.

On 2020-11-29 16:21:28 +0000, wrote:

Check the contents of /proc/self/fd and /dev/fd too.

On 2020-11-29 16:41:51 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com
What Vulkan file descriptor? I only open an FD from KMSDRM, but not from Vulkan.

And... What do you mean by "make file descriptor master"?
How does one make a file descriptor master? Master of what?

On 2020-11-29 16:47:10 +0000, Manuel Alfayate Corchete wrote:

(In reply to Manuel Alfayate Corchete from comment # 12)

@wahil1976@outlook.com
What Vulkan file descriptor? I only open an FD from KMSDRM, but not from
Vulkan.

And... What do you mean by "make file descriptor master"?
How does one make a file descriptor master? Master of what?

I have been reading about this. Ok, there are ioctls for SET_MASTER and DROP_MASTER.

But still, I don't see it clear. Should I do the following?
-Issue IOCTL DROP_MASTER on the KMSDRM FD before calling vkGetPhysicalDeviceDisplayPropertiesKHR()
-Restore it before destroying the SDL/KMSDRM window.

Is that right??

But then: Why should I iterate on /proc/self/fd? I already have the KMSDRM FD.

On 2020-11-29 17:02:00 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

Ok! I am doing:

ioctl(viddata->drm_fd, DRM_IOCTL_DROP_MASTER, 0);

...and it DOES drop MASTER on the DRM FD, and I can indeed recover the number of displays via vkCreateDisplayPlaneSurfaceKHR().

But there's another small problem: the DRM_IOCTL_DROP_MASTER ioctl requires root permissions... At least until Linux kernel 5.8, where it is not required anymore:

https://stackoverflow.com/questions/29708596/drmdropmaster-requires-root-privileges

On 2020-11-29 17:27:17 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com Currently, testvulkan.c seems to be failing with:

ERROR: vkQueuePresentKHR(): VK_ERROR_SURFACE_LOST_KHR

Which seems to mean that the surface is no longer available (¿?)

It's strange because this call DOES succeed:

result = vkCreateDisplayPlaneSurfaceKHR(instance, &createInfo,
NULL, surface);

It returns VK_SUCCESS.

So, any idea on what's taking the surface away?

On 2020-11-29 18:09:36 +0000, wrote:

A look at the Mesa WSI source code indicates that surface will be lost if:

  1. drmModeGetResources fails in the library.
  2. The DRM connector cannot be setup again.
  3. The Vulkan file descriptor is no longer valid.

There's a merge request from Keith Packard in the Mesa Gitlab repository that makes everything done through the atomic DRM API if it is supported but it hasn't been merged yet. Another one by Jonathan Marek does the same thing but removes the legacy DRM API support entirely, but it hasn't been merged either.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6173
[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4176

The former is more likely to work since it doesn't involve the hardware cursor crap and does not involved setting up the connector again. The latter I am not sure since I didn't look closely at the code. Keith Packard was also working on a Vulkan extension named VK_KEITHP_kms_display that would prove very useful here (since you can tell the library to use the KMSDRM file descriptor instead of opening its own) but there was zero progress on that since then.

I am not sure what is the solution here currently.

On 2020-11-29 18:53:12 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com Ohh.. I thought it would be easier. We are almost there!
If you have any idea, please tell me so I can try...

On 2020-11-29 21:29:15 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com I have updated the code for SDL_kmsdrmvulkan:

https://github.com/vanfanel/SDL/blob/master/src/video/kmsdrm/SDL_kmsdrmvulkan.c

Where do you think that KMSDRM is stealing the Vulkan file descriptor?
I have disabled the creation of the dumb_buffer, the KMSDRM_CreateSurfaces() call in KMSDRM_CreateWindow()... but still, Vulkan is losing it's FD.

On 2020-11-30 17:15:16 +0000, wrote:

Try releasing the connector and CRTCs that KMSDRM claimed to let Vulkan claim it for itself.

On 2020-12-02 16:55:11 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

I have tried freeing the KMSDRM CRTC and connector as soon as I enter Vulkan_LoadLibrary(), but I am still receiving:

ERROR: vkQueuePresentKHR(): VK_ERROR_SURFACE_LOST_KHR

So, I went and simply disabled the code for CRTC, connector and primary plane selection in KMSDRM_VideoInit(). You can see it here:

https://github.com/vanfanel/SDL/blob/48aab9caec742496051ce938e6ae0efe978807d7/src/video/kmsdrm/SDL_kmsdrmvideo.c#L1158

As you will see, KMSDRM_VideoInit() does VERY little now: it does not take any planes, CRTCs or connectors, to let Vulkan work with these instead.

What can be failing now? Really, KMSDRM is not taking anything now, yet the same problem happens.

What's curious is that if I don't let KMSDRM open drm_fd, vkGetPhysicalDeviceDisplayPropertiesKHR() fails to get the display_count. Isn't Vulkan supposed to open his own FD itself?

On 2020-12-02 17:04:53 +0000, wrote:

Did you disable the GBM stuff?

On 2020-12-02 17:40:45 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

Yes, you can see here that the GBM device creation is disabled:

https://github.com/vanfanel/SDL/blob/48aab9caec742496051ce938e6ae0efe978807d7/src/video/kmsdrm/SDL_kmsdrmvideo.c#L1225

On 2020-12-02 18:25:10 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

What really puzzles me is that I need to open fd with:
viddata->drm_fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);

And then drop master on it with:
ret = ioctl(viddata->drm_fd, DRM_IOCTL_DROP_MASTER, 0);

...Or else, vkGetPhysicalDeviceDisplayPropertiesKHR() fails to get the display_count.
The STRANGRE part is that viddata->drm_fd is never passed to any Vulkan function, at least explicitly. This is very strange.

On 2020-12-02 18:26:45 +0000, Manuel Alfayate Corchete wrote:

What I mean is that Vulkan should work without me opening the DRM FD, because Vulkan should be opening it's own FD internally.

On 2020-12-02 18:51:11 +0000, wrote:

You will need to work out a solution with Mesa library developers. Looks like I am unable to provide any help beyond this at the moment.

On 2020-12-02 19:12:52 +0000, Manuel Alfayate Corchete wrote:

(In reply to wahil1976 from comment # 25)

You will need to work out a solution with Mesa library developers. Looks
like I am unable to provide any help beyond this at the moment.

Ok, I will try my luck in #dri-devel...

On 2020-12-03 04:59:58 +0000, wrote:

So, any solutions yet?

On 2020-12-03 10:23:07 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

No, nothing yet. Do you know any Vulkan expert I could ask?

On 2020-12-03 11:17:45 +0000, wrote:

Not that I know off.

Did you try running testvulkan as root though?

On 2020-12-03 15:14:52 +0000, Manuel Alfayate Corchete wrote:

(In reply to wahil1976 from comment # 29)

Not that I know off.

Did you try running testvulkan as root though?

Yes. Anyway, I am taking some clues from vkcube. Maybe I can make it work by following what vkcube does more closely.

On 2020-12-03 16:25:22 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

I think I found the problem (but have yet to get a solution...).

In testvulkan.c, there is findPhysicalDevice(), which opens the physical device (gpu FD under the hood, I understand) and stores it in vulkanContext.physicalDevice

In KMSDRM_Vulkan_CreateSurface(), I need to find the physical device again because: -It's needed by vkGetPhysicalDeviceDisplayPropertiesKHR() which I use to get the displays of the phsysical device.
-It's needed by vkGetPhysicalDeviceDisplayPlanePropertiesKHR() which I use to get the planes of the physical device.

So I guess the problem comes from getting the physical device from two different places: testvulkan.c and KMSDRM_Vulkan_CreateSurface(). That would explain the lost Vulkan FD.

But now I need to find a solution, which seems no easy since testvulkan.c doesn't pass the physical device to SDL, at least explicitly. Is vulkanContext somehow accessible from SDL? If it's not, maybe this is as far as I can go without changes on how SDL works.

On 2020-12-03 18:00:34 +0000, wrote:

I think you will need to build the entire Mesa library (including the Vulkan drivers) with debug symbols and use a debugger (preferably GDB) to put a breakpoint at each and every line of VK_ERROR_SURFACE_LOST_KHR return statement in wsi_common_display.c and then run the program. Do the debugger task from a external device that can SSH to the computer where you want to run the testvulkan program. You will need to switch to a terminal on that target computer for the program to run. Once the breakpoint hits at one of those return statements it will finally pinpoint the problem, which is better than guessing in the dark.

I can't right now because for some reason GDB can't find the debug symbols in either libvulkan_intel.so or libvulkan_radeon.so no matter what I do.

On 2020-12-03 19:32:24 +0000, Manuel Alfayate Corchete wrote:

(In reply to wahil1976 from comment # 32)

I think you will need to build the entire Mesa library (including the Vulkan
drivers) with debug symbols and use a debugger (preferably GDB) to put a
breakpoint at each and every line of VK_ERROR_SURFACE_LOST_KHR return
statement in wsi_common_display.c and then run the program. Do the debugger
task from a external device that can SSH to the computer where you want to
run the testvulkan program. You will need to switch to a terminal on that
target computer for the program to run. Once the breakpoint hits at one of
those return statements it will finally pinpoint the problem, which is
better than guessing in the dark.

I can't right now because for some reason GDB can't find the debug symbols
in either libvulkan_intel.so or libvulkan_radeon.so no matter what I do.

I have done debugging exactly like that (GDB over SSH) multiple times. Raspberry Pi having Vulkan support now would help immensely if I have to come to that!
But, for a start, I believe accessing the physical device from both testvulkan.c and SDL is plain wrong and has to be corrected. Maybe I am wrong and it's fine, but VK_ERROR_SURFACE_LOST_KHR could be caused by that.

On 2020-12-04 17:20:32 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com
Couldn't get it out of my head, so finally setup a MESA debug build.
I set breakpoints at every VK_ERROR_SURFACE_LOST_KHR return in wsi_common_display.c as you suggested, and I found that it's returned exactly here:

Breakpoint 8, _wsi_display_queue_next (drv_chain=0x5555666160) at ../src/vulkan/wsi/wsi_common_display.c:1742

Which is this block in wsi_common_display.c:

 if (ret != -EACCES) {
     connector->active = false;
     image->state = WSI_IMAGE_IDLE;
     return VK_ERROR_SURFACE_LOST_KHR;
 }    

Where the evaluated "ret" comes from here:

    /* XXX allow setting of position */
     ret = drmModeSetCrtc(wsi->fd, connector->crtc_id,
                          image->fb_id, 0, 0,
                          &connector->id, 1,
                          &connector->current_drm_mode);

...Where drmModeSetCrtc() returns -28, which according to:
(gdb) p strerror(errno)

means...
$2 = 0x7ff7e4ef10 "No space left on device"

So.. what does that mean in this context? All drmModeSetCrtc() is an ioctl on the FD under the hood, but never had to deal with that errno on a graphics fd.

On 2020-12-04 17:33:12 +0000, wrote:

Try grepping the drivers/gpu/drm/ folder inside the Linux kernel source with pattern -ENOSPC to find for any clues of what could be causing it to return -28 (-ENOSPC).

On 2020-12-04 17:55:40 +0000, Manuel Alfayate Corchete wrote:

wahil1976@outlook.com
There are countless places where that's returned.
On the VC4 driver, which is small, it's only being returned on these two places, which makes no sense here:

    /* The absolute limit is 2Gbyte/sec, but let's take a margin to let
     * the system work when other blocks are accessing the memory.
     */
    if (load_state->membus_load > SZ_1G + SZ_512M)
            return -ENOSPC;

    /* HVS clock is supposed to run @ 250Mhz, let's take a margin and
     * consider the maximum number of cycles is 240M.
     */
    if (load_state->hvs_load > 240000000ULL)
            return -ENOSPC;

On 2020-12-04 18:09:51 +0000, Manuel Alfayate Corchete wrote:

@wahil1976@outlook.com

In the end, I went back to my old ways and instead actitated the kernel drm debug logging, and looked through dmesg, which told me this:

[11353.733278] [drm:drm_framebuffer_check_src_coords [drm]] Invalid source coordinates 1920.000000x1080.000000+0.000000+0.000000 (fb 640x480)

It seems I am somehow trying to scan a 1920x1080 image on a 640x480 fb.
Doesn't seem to be related to physical device after all, heh.

On 2020-12-04 18:16:22 +0000, wrote:

That explains why I am seeing GZDoom (it's a DOOM source port if you didn't know) report "Resolution: 640x480" at startup with OpenGL renderer on pre-Vulkan KMSDRM.

Find some way to make sure the KMSDRM framebuffer is always created at the highest resolution available. That should fix the Vulkan "surface lost" problem.

On 2020-12-06 07:31:21 +0000, wrote:

This code should do the trick. Use it after you get the number of display modes.
dispModeProps = calloc(sizeof(VkDisplayModePropertiesKHR),mode_count);
vkGetDisplayModePropertiesKHR(gpu, display, &mode_count,dispModeProps);
for (int i = 0; i < mode_count; i++)
{
if (dispModeProps[i].parameters.visibleRegion.width == dispdata->mode.hdisplay
&& dispModeProps[i].parameters.visibleRegion.height == dispdata->mode.vdisplay)
{
mode_props = dispModeProps[i];
break;
}
}
free(dispModeProps);
if (mode_props.parameters.visibleRegion.width == 0
|| mode_props.parameters.visibleRegion.height == 0)
{
SDL_SetError("Vulkan did not return any display modes matching the current desktop mode");
return SDL_FALSE;
}

On 2020-12-06 11:16:14 +0000, Manuel Alfayate Corchete wrote:

(In reply to wahil1976 from comment # 39)

This code should do the trick. Use it after you get the number of display
modes.
dispModeProps = calloc(sizeof(VkDisplayModePropertiesKHR),mode_count);
vkGetDisplayModePropertiesKHR(gpu, display, &mode_count,dispModeProps);
for (int i = 0; i < mode_count; i++)
{
if (dispModeProps[i].parameters.visibleRegion.width ==
dispdata->mode.hdisplay
&& dispModeProps[i].parameters.visibleRegion.height ==
dispdata->mode.vdisplay)
{
mode_props = dispModeProps[i];
break;
}
}
free(dispModeProps);
if (mode_props.parameters.visibleRegion.width == 0
|| mode_props.parameters.visibleRegion.height == 0)
{
SDL_SetError("Vulkan did not return any display modes matching the
current desktop mode");
return SDL_FALSE;
}

I had figured that out already, but thanks (dispModeProps in your code must be initialized because VkDisplayModePropertiesKHR structs seem to come with garbage values at least here).

I am now trying to figure out how to display, let's say, a 320x240 window (or any other window sizes with no matching video modes) without an intermediate buffer.

Thing is:
-vkCreateDisplayPlaneSurfaceKHR() accepts displayMode and imageExtent parameters.
-displayMode has the physical video mode to use.
-imageExtent, as far as I understand, should let me specify the size of the image to scanout, which should be independent from displayMode.

...But I get VK_ERROR_SURFACE_LOST_KHR again if I try an imageExtent smaller than the displayMode.parameters.visibleRegion.
Any ideas on this one?

On 2020-12-06 16:44:30 +0000, wrote:

That's strange. But I would guess the smaller the ImageExtent values the larger the image and the larger the ImageExtent values the smaller the image.

On 2021-01-15 21:49:19 +0000, Manuel Alfayate Corchete wrote:

This can be closed, Vulkan support is in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant