We are currently migrating Bugzilla to GitHub issues.
Any changes made to the bug tracker now will be lost, so please do not post new bugs or make changes to them.
When we're done, all bug URLs will redirect to their equivalent location on the new bug tracker.

Bug 3876

Summary: Resampling of certain sounds adds heavy distortion
Product: SDL Reporter: Manuel Alfayate Corchete <redwindwanderer>
Component: audioAssignee: Ryan C. Gordon <icculus>
Status: RESOLVED FIXED QA Contact: Sam Lantinga <slouken>
Severity: major    
Priority: P2 CC: chli.hug, sezeroz
Version: 2.0.6   
Hardware: ARM   
OS: Linux   
Attachments: Small program that showcases the heavy sound distortion that happens on ARM with certain samples.
Patch that adds [-1, 1] clamping to the scalar audio type conversions.
Alternative (and sligthly faster) patch that adds [-1, 1] clamping to the scalar audio type conversions.

Description Manuel Alfayate Corchete 2017-10-12 16:06:58 UTC
Created attachment 2980 [details]
Small program that showcases the heavy sound distortion that happens on ARM with certain samples.

Hi,

With the attached test, which is a modified version of the test attached on https://bugzilla.libsdl.org/show_bug.cgi?id=3846 , strong distortion can be hear on ARM only. Use build.sh on the Raspberry Pi 3 for rebuilding easily.

This first started happening on SDL 2.0.4, and I first noticed it with RAWGL (https://github.com/cyxx/rawgl).
I thought it was something specific with that game, but it's not.

My experiments so far confirm that the problem is on SDL_RWFromConstMem() or Mix_LoadWAV_RW(). This is why:
Let's say I have a sample as a uint_8* array. It's the exact same sample on X86_64 and ARM, and it has the exact same values byte per byte.
If I copy that sample to an SDL_RWops struct with SDL_RWFromConstMem() and then build an Mix_Chunk from it using Mix_LoadWAV_RW(), while the original sample was the same on both architectures, the resulting chunk->abuf is different.
You can easily see this on GDB with "x/<chunk_len>xb chunk->abuf".

This bug only appears on ARM (tested on Raspberry Pi 3 with default gcc on Raspbian and no special compilation flags), it does *not* appear on X86_64.

This bug is not a duplicate of
https://bugzilla.libsdl.org/show_bug.cgi?id=3846
or
https://bugzilla.libsdl.org/show_bug.cgi?id=3851

This bug causes heavy distortion, not popping, and has to be tested on ARM, since it WON'T appear on X86_64.

Thanks
Comment 1 Ryan C. Gordon 2017-10-12 16:20:54 UTC
Looking into this.
Comment 2 Ryan C. Gordon 2017-10-13 16:42:05 UTC
(In reply to Ryan C. Gordon from comment #1)
> Looking into this.

The bug definitely reproduces on my RPi3 with your test program, but to my surprise, it wasn't SDL's new resampler at fault. This script resamples that wave file to various frequencies and then plays it back, and they all seem to be okay.

for freq in 8000 11025 22050 44100 48000 96000 192000 ; do 
    echo $freq ...
    ./testresample test.wav $freq.wav $freq 1
    ./loopwave $freq.wav
done

It might be SDL_mixser or SDL's streaming code. Still digging in.

--ryan.
Comment 3 Manuel Alfayate Corchete 2017-10-14 14:00:06 UTC
@r(In reply to Ryan C. Gordon from comment #2)
> (In reply to Ryan C. Gordon from comment #1)
> > Looking into this.
> 
> The bug definitely reproduces on my RPi3 with your test program, but to my
> surprise, it wasn't SDL's new resampler at fault. This script resamples that
> wave file to various frequencies and then plays it back, and they all seem
> to be okay.
> 
> for freq in 8000 11025 22050 44100 48000 96000 192000 ; do 
>     echo $freq ...
>     ./testresample test.wav $freq.wav $freq 1
>     ./loopwave $freq.wav
> done
> 
> It might be SDL_mixser or SDL's streaming code. Still digging in.
> 
> --ryan.

If you look at my report, I point Mix_LoadWAV_RW() from SDL_Mixer and SDL_RWFromConstMem() from SDL itself as the possible culprits, because that's where the differenciation of the data bewteen X86_64 and ARM happens...
Comment 4 Ryan C. Gordon 2017-10-14 15:32:41 UTC
(In reply to Manuel from comment #3)
> If you look at my report, I point Mix_LoadWAV_RW() from SDL_Mixer and
> SDL_RWFromConstMem() from SDL itself as the possible culprits, because
> that's where the differenciation of the data bewteen X86_64 and ARM
> happens...

Oh, whoops, you did say that, sorry for not reading more carefully!

--ryan.
Comment 5 Simon Hug 2017-10-18 05:30:58 UTC
Created attachment 3003 [details]
Patch that adds [-1, 1] clamping to the scalar audio type conversions.

This may come from the SDL_Convert_F32_to_X_Scalar functions. They don't clamp the float value to [-1, 1] and when they cast it to the target integer it may be too large or too small for the type and get truncated, causing horrible noise.

The attached patch throws clamping in, but I don't know if that's the preferred way to fix this. For x86 (without SSE) the compiler (I tested MSVC) seems to throw a horrible amount of x87 code in it. It's a bit better with SSE, but probably still quite the performance hit. And SSE2 uses a branchless approach with maxss and minss.
Comment 6 Manuel Alfayate Corchete 2017-10-18 09:53:11 UTC
@Simon: it works good here, with this patch applied no more horrendous distortion can be heard on the test or games (RAWL, SDLPop).
I hope it's included soon, to be ready for 2.0.7. Last two minor versions (2.0.5 and 2.0.6) had sound broken on ARM :(
Comment 7 Manuel Alfayate Corchete 2017-10-18 10:08:29 UTC
(In reply to Manuel from comment #6)
> @Simon: it works good here, with this patch applied no more horrendous
> distortion can be heard on the test or games (RAWL, SDLPop).
> I hope it's included soon, to be ready for 2.0.7. Last two minor versions
> (2.0.5 and 2.0.6) had sound broken on ARM :(

Also, with regard to possible NEON optimization of the clamping, there are saturating opcodes on NEON, yes. I posted them on the forum but I leave them here for reference too:

VQABS                   Absolute value, saturate                        V{Q}ABS and V{Q}NEG
VQADD                   Add, saturate                                   V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW
VQDMLAL, VQDMLSL        Saturating Doubling Multiply Accumulate, and Multiply Subtract  VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar) 
VQDMUL          Saturating Doubling Multiply                            VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar)
VQDMULH         Saturating Doubling Multiply returning High half        VQ{R}DMULH (by vector or by scalar)
VQMOV{U}N       Saturating Move (register)                              VMOVL, V{Q}MOVN, VQMOVUN
VQNEG           Negate, saturate                                        V{Q}ABS and V{Q}NEG
VQRDMULH        Saturating Doubling Multiply returning High half        VQ{R}DMULH (by vector or by scalar)
VQRSHL          Shift Left, Round, saturate (by signed variable)        V{Q}{R}SHL (by signed variable)
VQRSHR{U}N      Shift Right, Round, saturate (by immediate)             VQ{R}SHR{U}N (by immediate)
VQSHL           Shift Left, saturate (by immediate)                     VSHL, VQSHL, VQSHLU, and VSHLL (by immediate)
VQSHL           Shift Left, saturate (by signed variable)               V{Q}{R}SHL (by signed variable)
VQSHR{U}N       Shift Right, saturate (by immediate)                    VQ{R}SHR{U}N (by immediate)
VQSUB           Subtract, saturate                                      V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL, and VSUBW
Comment 8 Ozkan Sezer 2017-10-18 11:12:59 UTC
How about doing the float or double multiplication first as it has
been, running lrint or lrintf on the result, and then clamping the
integral value instead?  (SDL_Convert_F32_to_S32_Scalar() might be
a problem though..)
Comment 9 Simon Hug 2017-10-18 16:27:35 UTC
I'm always surprised how much stuff they packed into C99. To my knowledge, SDL still targets C90 (because of MSVC 2010 maybe?). A few of the dependency headers are already C99 so perhaps that's something to consider for SDL 2.1.

I agree that clamping the integer may be better, but these functions don't seem to handle overflows the way we'd like them to here. Dealing with overflows is so expensive... how annoying.


Well, if someone is interested in getting into writing NEON code, here's the perfect little project. Simple algorithm, short function. :)
Comment 10 Manuel Alfayate Corchete 2017-10-18 17:06:10 UTC
@Simon: Can your patch go into mainline as it's for now? I am not getting time to study NEON or assembly in general till next year, but I would like to have SDL2 sounding right on ARM: 2.0.6 is broken without this fix.
Comment 11 Simon Hug 2017-10-18 17:26:24 UTC
It's for Ryan to decide if this overhead is acceptable or if he wants to clamp in the resampler. Or maybe he has an even better idea!
Comment 12 Simon Hug 2017-10-18 20:44:47 UTC
Created attachment 3012 [details]
Alternative (and sligthly faster) patch that adds [-1, 1] clamping to the scalar audio type conversions.

Attaching alternative patch that performs slightly better despite the branching. 

The C99 function signbit could make it faster on ARM. Varies heavily depending on compiler flags.
Comment 13 Manuel Alfayate Corchete 2017-10-19 00:57:32 UTC
@Simon: This new version of the patch works perfectly well, too. No distorted audio on tests or games on ARM.
Comment 14 Sam Lantinga 2017-10-19 02:32:25 UTC
Simon, your patch is in, thanks!
https://hg.libsdl.org/SDL/rev/37aca00967db

This is a good fix for now, and it turns out that recent versions of gcc and clang can optimize this to SSE. I haven't checked to make sure these particular functions get optimized that way, but it's good to know when we check the performance.

Cheers!