I'm deep-diving into samplebrain, a program commissioned by Aphex Twin! https://gitlab.com/then-try-this/samplebrain/-/tree/production?ref_type=heads

See my posts below to follow my progress. #audio #dsp


In short, the "brain" is built by chopping audio into "blocks" and extracting an FFT and MFCC from each. Blocks that land close together under a blended FFT/MFCC comparison form a "synapse", essentially a memoization for quicker look-ups.

When a new audio file is added to the "brain", the flow is roughly:
1. Input audio is flattened from stereo to mono
2. Chopped into overlapping blocks
3. An FFT and MFCC are extracted
4. The new "block" is compared with other blocks existing in the "brain"
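Here's that flow as a rough Python sketch. Everything in it (the function names, the naive DFT, the plain Euclidean distance) is my own simplification for illustration, not samplebrain's actual C++ code:

```python
import cmath

def to_mono(left, right):
    """Step 1: flatten stereo to mono by averaging the channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def chop(samples, block_size, overlap):
    """Step 2: chop into overlapping blocks."""
    step = block_size - overlap
    return [samples[i:i + block_size]
            for i in range(0, len(samples) - block_size + 1, step)]

def dft_magnitudes(block):
    """Step 3 (simplified): naive DFT magnitude spectrum.
    samplebrain uses a real FFT, and an MFCC stage would follow."""
    n = len(block)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(block))) / n
            for k in range(n // 2)]

def distance(a, b):
    """Step 4 (simplified): Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# tiny demo with a fake 16-sample stereo "file"
left = [0.0, 1.0] * 8
right = [0.0, 1.0] * 8
mono = to_mono(left, right)
blocks = chop(mono, block_size=8, overlap=4)
spectra = [dft_magnitudes(b) for b in blocks]
print(len(blocks), distance(spectra[0], spectra[1]))  # -> 3 0.0
```

The two identical blocks unsurprisingly sit at distance zero; in the real brain, the nearest blocks are what get wired together.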

I was surprised to learn that only an FFT and MFCC were extracted to create such a complex system.

I was expecting more complicated feature extraction.

An FFT converts a window of audio samples into the frequencies present during that slice of time.

An MFCC maps an FFT's output onto the mel scale, which approximates the perceptual range of human hearing, and summarizes it as a small set of coefficients.
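A toy illustration of both ideas, using a pure-Python DFT and the standard mel-scale formula (nothing here is taken from samplebrain itself):

```python
import cmath, math

def dft_magnitudes(block):
    """Naive DFT magnitude spectrum of one window (illustration only;
    a real implementation would use an FFT library)."""
    n = len(block)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(block))) * 2 / n
            for k in range(n // 2 + 1)]

def hz_to_mel(f):
    """Standard mel-scale mapping used when building MFCC filter banks."""
    return 2595 * math.log10(1 + f / 700)

# a 64-sample window containing a sine that completes 8 cycles
n = 64
block = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
mags = dft_magnitudes(block)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin)        # -> 8: the FFT localizes the tone in bin 8
print(hz_to_mel(1000)) # ~1000: the mel scale is near-linear below 1 kHz
print(hz_to_mel(8000)) # ~2840: higher frequencies get squeezed together
```

The compression at the top end is the "perceptual" part: equal steps in mel roughly match equal steps in perceived pitch.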

Here are some links for understanding FFTs and MFCCs:
- https://www.reddit.com/r/DSP/comments/43o2bz/mfcc_vs_fft/
- https://vtiya.medium.com/input-and-output-of-mfcc-and-fft-d51739d21439

Here's some more info on points 2 and 4.

2. Chopped into overlapping blocks
- Blocks are 5,000 samples long by default (~113 ms at 44.1 kHz)
- Windows are `block_length / 2`, so 2,500 samples by default
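Those defaults, sketched out. The helper is mine, and I'm assuming each block starts one window (half a block) after the previous one:

```python
def block_starts(num_samples, block_size=5000, window=2500):
    """Start offsets of each block; consecutive blocks overlap by half
    (assumption: the window size is also the hop between blocks)."""
    return list(range(0, num_samples - block_size + 1, window))

starts = block_starts(44100)       # one second of 44.1 kHz audio
print(len(starts))                 # -> 16 half-overlapping blocks
print(starts[:3])                  # -> [0, 2500, 5000]
print(round(5000 / 44100 * 1000))  # -> 113: ms per block at 44.1 kHz
```

Note that 5,000 samples at 44.1 kHz works out to about 113 ms, a little over a tenth of a second per block.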

4. The new "block" is compared with other blocks existing in the "brain"
- Comparison is made by taking a `blend()` of the FFT and MFCC of block `a` and measuring that against block `b`
- `blend()` is defined in `brain/src/block.cpp`
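My reading of that comparison, as a hedged Python sketch. The `ratio` knob and all names here are mine, not samplebrain's; I'm assuming the blend is a weighted mix of an FFT distance and an MFCC distance:

```python
def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def blended_distance(a_fft, a_mfcc, b_fft, b_mfcc, ratio=0.5):
    """Assumed blend: ratio=0 compares pure spectra, ratio=1 pure MFCCs."""
    return (1 - ratio) * euclidean(a_fft, b_fft) + ratio * euclidean(a_mfcc, b_mfcc)

# two blocks with different spectra but identical MFCCs:
d = blended_distance([1.0, 0.0], [0.2], [0.0, 1.0], [0.2], ratio=0.5)
print(d)  # only the FFT half contributes here
```

The appeal of a tunable blend is that you can steer matching toward raw spectral similarity or toward perceptual similarity without changing the features themselves.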

More info about the FFT and MFCC used in samplebrain:

- The number of FFT bins is set to the "block" size by default, with an upper bound of 100 bins.

- The number of MFCC filters is 12.

#fft #mfcc