| Summary: | SDL needs an aligned malloc... | ||
|---|---|---|---|
| Product: | SDL | Reporter: | Ryan C. Gordon <icculus> |
| Component: | *don't know* | Assignee: | Ryan C. Gordon <icculus> |
| Status: | REOPENED --- | QA Contact: | Sam Lantinga <slouken> |
| Severity: | normal | ||
| Priority: | P2 | CC: | flibitijibibo |
| Version: | HG 2.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Attachments: | patch that adds SIMD allocator. | ||
Created attachment 3246
patch that adds SIMD allocator.
I decided to narrow down the focus to just what SIMD instruction sets need instead of a general aligned allocator.
This will give you an allocated pointer that's aligned for whatever SIMD instructions your system supports (so if you have SSE, it'll align to 16 bytes, but if you have AVX2, it'll align to 32, etc.), and make sure it's padded out so you can safely overflow the buffer for a final SIMD instruction. Say you're processing floating point audio samples, and you have 2 samples at the end but your SSE code wants to read a vector of 4: you can do so without segfaulting, even though the extra 2 elements are garbage. Otherwise, scalar code has to be written for the last two elements to prevent overflow.
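To make the padding idea concrete, here is a minimal sketch in plain C. It is not the code from the attached patch: `simd_padded_size` and `scale_samples` are hypothetical names, and the inner loop only imitates a 4-wide SSE operation so the example stays portable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an SDL API): round len up to the next multiple
   of alignment, which must be a power of two (16 for SSE, 32 for AVX2).
   No overflow check is done for lengths close to SIZE_MAX. */
static size_t simd_padded_size(size_t len, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (len + alignment - 1) & ~(alignment - 1);
}

/* Scale `count` floats in place, always touching groups of four the way a
   16-byte SSE loop would. The buffer must hold at least
   simd_padded_size(count * sizeof (float), 16) bytes; the elements past
   `count` are padding whose values do not matter to the caller. */
static void scale_samples(float *buf, size_t count, float gain)
{
    size_t i, lane;
    for (i = 0; i < count; i += 4) {
        for (lane = 0; lane < 4; lane++) {
            buf[i + lane] *= gain;  /* may write into the padding */
        }
    }
}
```

With 2 samples and a 16-byte vector, the padded size is 16 bytes (4 floats), so the loop runs one full group and no scalar tail is needed.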
The specific need is for various data in SDL, such as the buffers used by surface blitting and audio mixers, that could be processed with SIMD, except that we can't guarantee SDL_malloc() will give us an aligned block. As this can be useful for outside code, I've made it a public API in this patch.
--ryan.
Posted a patch with the rough idea (but not necessarily the right API name or header, etc) for Sam to consider. Let me know if this is worth getting into revision control in some form!
--ryan.

I'd actually prefer the more general SDL_malloc_aligned() and SDL_free_aligned(), so you can allocate with the alignment you need for the instructions you're using. If you want to add a SDL_GetSIMDAlignment(), I wouldn't be opposed, but I don't think it's necessary. If you're working with SSE code you'll align for that, AVX, you'll align for that, etc. If you don't know the alignment you need for your instruction set... you're in trouble! :)

> If you want to add a SDL_GetSIMDAlignment(), I wouldn't be opposed, but I
> don't think it's necessary. If you're working with SSE code you'll align for
> that, AVX, you'll align for that, etc. If you don't know the alignment you
> need for your instruction set... you're in trouble! :)
True, but my use case is more like: SDL's audio subsystem wants to hand a buffer to the app's audio callback, and doesn't know what it wants to do with the memory, other than it being a likely candidate for SIMD processing.
I think I'll rework this to be an internal API for now, for my limited uses inside the library.
--ryan.
I went with this for now, which can be used for allocating audio buffers, surfaces, etc.
https://hg.libsdl.org/SDL/rev/987c5dc71309
We can worry about a public API later. Or not. :)
--ryan.

Can we add a public API for aligned malloc and just call that from the internal SIMD functions?

(In reply to Sam Lantinga from comment #6)
> Can we add a public API for aligned malloc and just call that from the
> internal SIMD functions?
We can (and I'll reopen the bug and leave it assigned to me), but partially my implementation is cheating here, because who cares if we waste a few bytes in a SDL_malloc() call to pad to a 16 byte alignment? This method is less desirable if we want to align to a memory page, since you could be wasting ~4 kilobytes of memory each time. To make that efficient, we'd probably need something fancier, or use posix_memalign() or whatnot, which wouldn't use the app's implementation from SDL_SetMemoryFunctions.
--ryan.

Is this resolved by https://hg.libsdl.org/SDL/rev/a2dc7ba484fd ?
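
For the page-alignment case raised in the reply above, a rough sketch of the posix_memalign() route might look like the following. It assumes a POSIX system; `alloc_page_aligned` is a hypothetical name, not an SDL function, and as noted in the thread this path bypasses any allocator the app installed with SDL_SetMemoryFunctions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* request posix_memalign() on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate len bytes starting on a page boundary.
   Returns NULL on failure. The result is released with plain free(). */
static void *alloc_page_aligned(size_t len)
{
    void *ptr = NULL;
    long page = sysconf(_SC_PAGESIZE);  /* page size in bytes, -1 on error */

    if (page <= 0) {
        return NULL;
    }
    /* posix_memalign() returns 0 on success and an error number otherwise. */
    if (posix_memalign(&ptr, (size_t) page, len) != 0) {
        return NULL;
    }
    assert(((uintptr_t) ptr % (uintptr_t) page) == 0);
    return ptr;
}
```

Because the C library does the aligning itself, the caller does not have to request a whole extra page just to be able to round the pointer up by hand.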
SDL_malloc(), or rather malloc() in general, doesn't promise any alignment of the allocated buffer beyond what the platform's fundamental types require. It would be nice if we had something like SDL_memalign() that promises alignment, so we know that allocated buffers are definitely primed to be used with SIMD instructions, or sit at the start of a memory page, etc.

posix_memalign() is the usual way to do this on Unix systems, and it works with free(). MSVC's runtime offers _aligned_malloc(), but you have to use it with _aligned_free(). If we can make the internal implementation of SDL_malloc() return a pointer that can be used with SDL_free(), then we can wrap posix_memalign() the same way as we wrap malloc, otherwise we'll probably just want something that unconditionally sits on top of SDL_malloc, like this:

    void *SDL_malloc_aligned(const size_t len, const size_t alignment)
    {
        Uint8 *retval = NULL;
        /* Over-allocate: room for the data, the alignment slack, and the original pointer. */
        Uint8 *ptr = (Uint8 *) SDL_malloc(len + alignment + sizeof (void *));
        if (ptr) {
            void **storeptr;
            retval = ptr + sizeof (void *);
            /* Bump up to the next multiple of alignment (a full step if already aligned). */
            retval += alignment - (((size_t) retval) % alignment);
            /* Stash the real allocation just below the pointer we hand out. */
            storeptr = (void **) retval;
            storeptr--;
            *storeptr = ptr;
        }
        return retval;
    }

    void SDL_free_aligned(void *ptr)
    {
        if (ptr) {
            /* Recover the pointer SDL_malloc() originally returned. */
            void **realptr = (void **) ptr;
            realptr--;
            SDL_free(*realptr);
        }
    }

...note that this works, but could be wasting huge amounts of memory if you are trying to align to a memory page (over 4 kilobytes if the malloc call gave you a page-aligned pointer in the first place)...but if we limit alignment to what a system needs for SIMD data (16 bytes), then this isn't a big deal. Something to consider.
--ryan.
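
For anyone who wants to experiment with the over-allocation trick outside SDL, here is a self-contained variant. It is a sketch under a few assumptions, not the proposed SDL code: plain malloc()/free() stand in for SDL_malloc()/SDL_free(), the `sketch_` names are made up, the alignment must be a power of two, and the rounding uses a bitmask so an already-aligned pointer is not pushed forward by a full alignment step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for SDL_malloc_aligned(), built on malloc(). */
static void *sketch_malloc_aligned(size_t len, size_t alignment)
{
    unsigned char *raw;
    uintptr_t aligned;
    size_t overhead;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (alignment < sizeof (void *)) {
        alignment = sizeof (void *);  /* keeps the stashed pointer aligned */
    }
    overhead = (alignment - 1) + sizeof (void *);
    if (len > SIZE_MAX - overhead) {
        return NULL;                  /* size computation would overflow */
    }
    raw = (unsigned char *) malloc(len + overhead);
    if (raw == NULL) {
        return NULL;
    }
    /* Skip past the pointer slot, then round up to the alignment. */
    aligned = ((uintptr_t) raw + sizeof (void *) + (alignment - 1))
              & ~((uintptr_t) alignment - 1);
    ((void **) aligned)[-1] = raw;    /* remember the real allocation */
    return (void *) aligned;
}

/* Hypothetical stand-in for SDL_free_aligned(). */
static void sketch_free_aligned(void *ptr)
{
    if (ptr != NULL) {
        free(((void **) ptr)[-1]);
    }
}
```

The worst-case overhead here is `alignment - 1 + sizeof (void *)` bytes per allocation, which is negligible at 16 or 32 bytes but still roughly a page when aligning to 4096.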