The #VectorAutoRegression stuff I've been doing can be summarized as
$$ y_t = \sum_{i=1}^p A_i y_{t - i} $$
where each $y_t$ is a $D$-vector and each $A_i$ is a $D \times D$ matrix.
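A minimal numpy sketch of simulating such a process (dimensions, coefficients, and the small noise drive are made up for illustration):

```python
import numpy as np

# Sketch: simulate a small VAR(p) process y_t = sum_i A_i y_{t-i},
# with a tiny noise drive so the series isn't identically zero.
rng = np.random.default_rng(0)
D, p, T = 3, 2, 200

# Random coefficient matrices, scaled down to keep the recursion tame.
A = [0.3 * rng.standard_normal((D, D)) for _ in range(p)]

y = np.zeros((T, D))
y[:p] = rng.standard_normal((p, D))          # arbitrary initial conditions
for t in range(p, T):
    y[t] = sum(A[i] @ y[t - 1 - i] for i in range(p))
    y[t] += 0.01 * rng.standard_normal(D)    # small innovation term

print(y.shape)  # (200, 3)
```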

Given an input series of $y$ values, the $A_i$ can be calculated by #LeastSquares minimization:

$$
Y = [ y_p ; y_{p+1}; \ldots ; y_{T - 1} ] \\
X = [ y_{p-1}, y_{p-2}, \ldots, y_0 ; y_p, y_{p-1}, \ldots, y_1 ; \ldots ; y_{T-2}, y_{T-3}, \ldots, y_{T-1 - p} ] \\
A = \left(X^T X\right)^{-1} X^T Y
$$
where the matrices have these initial dimensions:
Y : (T - p) × D
X : (T - p) × (p × D)
A : (p × D) × D
then reshape $A$ to $p × (D × D)$
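A sketch of this stacking and solve in numpy (function name and row-vector convention are mine; `lstsq` is used instead of forming $(X^T X)^{-1}$ explicitly, which is better conditioned but solves the same problem):

```python
import numpy as np

def fit_var(y, p):
    """Least-squares fit of VAR(p) coefficients from a (T, D) series.

    Returns A with shape (p, D, D), where (row-vector convention)
    y_t ~= sum_i y[t - 1 - i] @ A[i]. Works for real or complex input;
    lstsq handles the conjugate transpose implicitly for complex data.
    """
    T, D = y.shape
    # Targets: y_p .. y_{T-1}, shape (T - p, D)
    Y = y[p:]
    # Regressors: row for time t is [y_{t-1}, y_{t-2}, ..., y_{t-p}],
    # shape (T - p, p * D)
    X = np.hstack([y[p - 1 - i : T - 1 - i] for i in range(p)])
    # Solve min ||X A - Y||, then split into p blocks of (D, D)
    A_flat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_flat.reshape(p, D, D)
```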

In my case all values are complex (and ^T is conjugate transpose) because each $y$ is an FFT of a block of input data - I'm using FFT size $256$ (making $D = 129 = 256/2+1$ unique bins for real input) overlapped 4x.
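For reference, a sketch of how such a $y_t$ series could be produced from audio (hop 64 = 256/4 gives the 4x overlap; the Hann window and function name are my choices, not necessarily what the original code uses):

```python
import numpy as np

def stft_frames(audio, n_fft=256, hop=64):
    """Slice a mono signal into 4x-overlapped windowed FFT frames.

    Returns a complex array of shape (num_frames, n_fft // 2 + 1):
    one 129-bin half-spectrum per frame (rfft of real input).
    """
    window = np.hanning(n_fft)
    num_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([
        audio[t * hop : t * hop + n_fft] * window
        for t in range(num_frames)
    ])
    return np.fft.rfft(frames, axis=1)
```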

#VectorAutoRegression is a pure feedback model, similar to poles in the pole-zero representation of z-transform filters. To add zeros to the model is quite simple:

$$ y_t = \sum_{i=1}^p A_i y_{t - i} + \sum_{i=0}^q B_i x_{t-i}$$

and then add the $x$ vectors and $B$ matrices to the least squares stuff. I think the jargon for this is #VARMA, for Vector Auto-Regressive Moving Average (though strictly, since $x$ here is an observed exogenous input rather than past noise, this setup is usually called VARX).
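Extending the least-squares fit to include the input terms is just more columns in the regressor matrix; a sketch under the same row-vector convention (names mine):

```python
import numpy as np

def fit_varx(y, x, p, q):
    """Jointly fit A_1..A_p and B_0..B_q by least squares.

    y: (T, D) outputs, x: (T, Dx) inputs, with (row-vector convention)
    y_t ~= sum_{i=1}^p y[t-i] @ A[i-1] + sum_{j=0}^q x[t-j] @ B[j].
    Returns (A, B) with shapes (p, D, D) and (q + 1, Dx, D).
    """
    T, D = y.shape
    Dx = x.shape[1]
    m = max(p, q)                              # first predictable index
    cols = [y[m - i : T - i] for i in range(1, p + 1)]
    cols += [x[m - j : T - j] for j in range(0, q + 1)]
    X = np.hstack(cols)
    Y = y[m:]
    C, *_ = np.linalg.lstsq(X, Y, rcond=None)
    A = C[: p * D].reshape(p, D, D)
    B = C[p * D :].reshape(q + 1, Dx, D)
    return A, B
```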

I worked it through and coded up most of it before realizing a #FatalFlaw : I need the $x$ input data corresponding to the $y$ output data to do the regression to estimate the $A$ and $B$ parameter matrices, while so far I've just been using WAV files as $y$ series, which means I don't have any $x$.

Now thinking of using #Timidity software MIDI synthesizer to generate $(x,y)$ pairs: X could be a series of notes played with a piano and Y the same played with a guitar, hard-panned left and right so they are synchronized.

The #GeneralMIDI instrument list is at
https://www.midi.org/specifications-old/item/gm-level-1-sound-set


First successful #VARMA test with small P, Q. (Follows about 25 unsuccessful tests.)

Audio attached has 3 parts, the first is the target sound (Y), the second is the input sound (X), the third is the VARMA model applied to X (normalized in volume) which should hopefully sound like Y.

P = 1 (one matrix for auto-regressive feedback)
Q = 0 (one matrix for input mapping, no real moving average here...)
D = 129 (number of dimensions for FFT vectors)
T = 18103 (number of input vectors)
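Applying the fitted model to a fresh input series is just running the recursion forward; a sketch for the one-feedback-matrix, one-input-matrix case above (row-vector convention, names mine):

```python
import numpy as np

def run_varx_1_0(A1, B0, x):
    """Run y_t = y_{t-1} @ A1 + x_t @ B0 over an input series x of shape (T, Dx)."""
    T = x.shape[0]
    D = A1.shape[0]
    y = np.zeros((T, D), dtype=np.result_type(A1, B0, x))
    y[0] = x[0] @ B0                      # no previous output at t = 0
    for t in range(1, T):
        y[t] = y[t - 1] @ A1 + x[t] @ B0
    return y
```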

Computed #SpectralRadius of the feedback matrix was 0.9144670749524257, less than 1 so it's stable (good).
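For a single feedback matrix this is just the largest eigenvalue magnitude; for general $p$ the check is done on the companion matrix of the stacked first-order form. A sketch (function name mine):

```python
import numpy as np

def spectral_radius(A):
    """Largest |eigenvalue| of the VAR companion matrix.

    A: (p, D, D) feedback matrices; the VAR(p) is stable iff this is < 1.
    """
    p, D, _ = A.shape
    # Companion matrix of the stacked first-order form, shape (p*D, p*D)
    C = np.zeros((p * D, p * D), dtype=A.dtype)
    C[:D, :] = np.hstack(list(A))         # top block row: A_1 ... A_p
    C[D:, :-D] = np.eye((p - 1) * D)      # shift identities below
    return np.abs(np.linalg.eigvals(C)).max()
```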

The peak level of the output audio was 0.148676, significantly quieter than I expected (most of my previous tests using Python statsmodels VAR had gains around 1.0e+7 or so...).

Just realized the input X and output Y need not have the same dimension. So maybe I could make something that's more like a synthesizer than an audio filter...

But I think this whole system I've been working on today is #LinearTimeInvariant, so, not too much scope for interesting behaviour... e.g. with two control inputs, the output is simply a mix of two impulse responses, which isn't so amazing.

I think the power of other algorithms like neural networks comes mainly from the non-linearities.

Maybe it's not time-invariant after all: the system is still linear in the signal, but block-wise FFT processing with matrices that mix across bins is (periodically) time-varying, and while strictly LTI systems can't add frequencies that are not already there, linear time-varying ones certainly can.