Bug 4150

Summary: SDL needs an aligned malloc...
Product: SDL
Reporter: Ryan C. Gordon <icculus>
Component: *don't know*
Assignee: Ryan C. Gordon <icculus>
Status: REOPENED
QA Contact: Sam Lantinga <slouken>
Severity: normal
Priority: P2
CC: flibitijibibo
Version: HG 2.0
Hardware: All
OS: All
Attachments: patch that adds SIMD allocator.

Description Ryan C. Gordon 2018-04-25 22:08:14 UTC
SDL_malloc(), or rather malloc() in general, doesn't promise alignment of the allocated buffer. It would be nice if we had something like SDL_memalign() that promises alignment, so we know that allocated buffers are definitely primed to be used with SIMD instructions, or sit at the start of a memory page, etc.

posix_memalign() is the usual way to do this on Unix systems, and it works with free(). MSVC's runtime offers _aligned_malloc(), but you have to use it with _aligned_free().
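
A minimal sketch of the platform split described above, assuming hypothetical wrapper names (aligned_alloc_sketch and aligned_free_sketch are not SDL functions). It only illustrates why the two families can't share one free routine; it does not go through SDL_malloc() or any app-supplied allocator.

```c
/* Request the POSIX declarations if the build has not already done so. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <malloc.h>
#endif

/* Hypothetical names, for illustration only. */
static void *aligned_alloc_sketch(size_t alignment, size_t len)
{
    /* Both back ends expect a power-of-two alignment. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _MSC_VER
    return _aligned_malloc(len, alignment);  /* size first, then alignment */
#else
    void *ptr = NULL;
    /* posix_memalign() also wants a multiple of sizeof (void *). */
    if (posix_memalign(&ptr, alignment, len) != 0) {
        return NULL;
    }
    return ptr;
#endif
}

static void aligned_free_sketch(void *ptr)
{
#ifdef _MSC_VER
    _aligned_free(ptr);  /* plain free() is not valid for _aligned_malloc() memory */
#else
    free(ptr);           /* posix_memalign() memory is released with free() */
#endif
}
```

Callers have to pair the two functions, which is the same constraint a separate SDL_free_aligned() would impose.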

If we can make the internal implementation of SDL_malloc() return a pointer that can be used with SDL_free(), then we can wrap posix_memalign() the same way as we wrap malloc. Otherwise, we'll probably just want something that unconditionally sits on top of SDL_malloc, like this:

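void *SDL_malloc_aligned(const size_t len, const size_t alignment)
{
    Uint8 *retval = NULL;
    /* Over-allocate: payload, worst-case padding, and one stashed pointer. */
    Uint8 *ptr = (Uint8 *) SDL_malloc(len + alignment + sizeof (void *));
    if (ptr) {
        void **storeptr;
        retval = ptr + sizeof (void *);  /* leave room for the stashed pointer */
        /* Advance to the next multiple of alignment (a full step if already aligned). */
        retval += alignment - (((size_t) retval) % alignment);
        storeptr = (void **) retval;
        storeptr--;
        *storeptr = ptr;  /* remember the real allocation just below the aligned block */
    }
    return retval;
}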

void SDL_free_aligned(void *ptr)
{
    if (ptr) {
        /* The real allocation's address was stashed just below the aligned block. */
        void **realptr = (void **) ptr;
        realptr--;
        SDL_free(*realptr);
    }
}

...note that this works, but could be wasting huge amounts of memory if you are trying to align to a memory page (over 4 kilobytes if the malloc call gave you a page-aligned pointer in the first place)...but if we limit alignment to what a system needs for SIMD data (16 bytes), then this isn't a big deal.
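
To put numbers on that waste, here is a small sketch of the arithmetic. The helper names are hypothetical, and the address is passed as a plain integer so nothing is actually allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Extra bytes requested from the underlying allocator, beyond len. */
static size_t wrapper_request_overhead(size_t alignment)
{
    return alignment + sizeof (void *);
}

/* Distance from the start of the real allocation to the pointer handed
   back to the caller, for a real allocation that begins at address base. */
static size_t wrapper_result_offset(size_t base, size_t alignment)
{
    size_t addr;
    assert(alignment != 0);
    addr = base + sizeof (void *);           /* skip the stashed pointer */
    addr += alignment - (addr % alignment);  /* same rounding as the wrapper above */
    return addr - base;
}
```

With a page-aligned base and a 4096-byte alignment, the returned pointer lands a full 4096 bytes into the allocation and the request grows by 4096 + sizeof (void *) bytes. With a 16-byte alignment the request grows by only 16 + sizeof (void *) bytes.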

Something to consider.

--ryan.
Comment 1 Ryan C. Gordon 2018-05-13 05:45:43 UTC
Created attachment 3246
patch that adds SIMD allocator.

I decided to narrow down the focus to just what SIMD instruction sets need instead of a general aligned allocator.

This will give you an allocated pointer that's aligned for whatever SIMD instructions your system supports (so if you have SSE, it'll align to 16 bytes, but if you have AVX2, it'll align to 32, etc). It also makes sure the buffer is padded out so you can safely overflow it for a final SIMD instruction: say, if you're processing floating point audio samples, and you have 2 samples at the end but your SSE code wants to read a vector of 4, you can do so without segfaulting, even though the extra 2 elements are garbage. Otherwise, scalar code has to be written for the last two elements to prevent overflow.
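
The padding half of that idea can be sketched as rounding the requested length up to a whole number of vectors. The function name below is hypothetical and this is not necessarily how the attached patch computes it.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of alignment, which must be a power of two. */
static size_t simd_padded_len(size_t len, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (len + alignment - 1) & ~(alignment - 1);
}
```

For example, 24 bytes of sample data (six 4-byte samples) with 16-byte vectors pads out to 32 bytes, so a final 4-wide load of the last two samples stays inside the allocation.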

The specific need for this is various code in SDL, like surface blitting and audio mixers, that can use SIMD but can't rely on SDL_malloc() giving it an aligned block. As this can be useful for outside code, I've made it a public API in this patch.

--ryan.
Comment 2 Ryan C. Gordon 2018-05-13 05:47:02 UTC
Posted a patch with the rough idea (but not necessarily the right API name or header, etc) for Sam to consider. Let me know if this is worth getting into revision control in some form!

--ryan.
Comment 3 Sam Lantinga 2018-05-21 02:48:03 UTC
I'd actually prefer the more general SDL_malloc_aligned() and SDL_free_aligned(), so you can allocate with the alignment you need for the instructions you're using.

If you want to add a SDL_GetSIMDAlignment(), I wouldn't be opposed, but I don't think it's necessary. If you're working with SSE code you'll align for that, AVX, you'll align for that, etc. If you don't know the alignment you need for your instruction set... you're in trouble! :)
Comment 4 Ryan C. Gordon 2018-05-21 04:13:04 UTC
> If you want to add a SDL_GetSIMDAlignment(), I wouldn't be opposed, but I
> don't think it's necessary. If you're working with SSE code you'll align for
> that, AVX, you'll align for that, etc. If you don't know the alignment you
> need for your instruction set... you're in trouble! :)

True, but my use case is more like: SDL's audio subsystem wants to hand a buffer to the app's audio callback, and doesn't know what it wants to do with the memory, other than it being a likely candidate for SIMD processing.

I think I'll rework this to be an internal API for now, for my limited uses inside the library.

--ryan.
Comment 5 Ryan C. Gordon 2018-05-21 16:02:36 UTC
I went with this for now, which can be used for allocating audio buffers, surfaces, etc.

https://hg.libsdl.org/SDL/rev/987c5dc71309

We can worry about a public API later. Or not.  :)

--ryan.
Comment 6 Sam Lantinga 2018-05-21 20:52:07 UTC
Can we add a public API for aligned malloc and just call that from the internal SIMD functions?
Comment 7 Ryan C. Gordon 2018-05-22 14:53:18 UTC
(In reply to Sam Lantinga from comment #6)
> Can we add a public API for aligned malloc and just call that from the
> internal SIMD functions?

We can (and I'll reopen the bug and leave it assigned to me), but my implementation is partially cheating here, because who cares if we waste a few bytes in a SDL_malloc() call to pad to a 16 byte alignment?

This method is less desirable if we want to align to a memory page, since you could be wasting ~4 kilobytes of memory each time. To make that efficient, we'd probably need something fancier, or use posix_memalign() or whatnot, which wouldn't use the app's implementation from SDL_SetMemoryFunctions.

--ryan.
Comment 8 Ethan Lee 2019-06-09 15:47:55 UTC
Is this resolved by https://hg.libsdl.org/SDL/rev/a2dc7ba484fd ?