Hello Rasheeda, and Others interested in the algorithm used for EEG source modeling,

Thanks to questions from Rasheeda and, in timely parallel, my own modeling efforts, we tracked down a workflow problem in Brainstorm that was sometimes plaguing our new ability to source model EEG data while also allowing users to make their own custom EEG montages.

**For those for whom this post is TL;DR, the IMPORTANT NOTE is: all EEG source modelers should update to the latest Brainstorm to correct a problem with the noise covariance matrix when some channels were marked "bad."**

For those remaining, we wanted to take this opportunity to describe why average referencing always works in source modeling, and why we therefore ignore your specific montages.

**Average Referencing**

Unlike MEG, in EEG every "referential" channel is acquired with respect to some reference point, call it "R". (Note: Brainstorm at this time does not support source modeling of unique "differential" channels, where each channel may have its own reference. We assume the common practice that scalp (or invasive) data are collected "referentially," i.e. all channels share a common reference point.) The problem is that the reference may be something we cannot model, such as ear clips or a mastoid. In other cases, the reference is a scalp point that was not recorded or reported. Finally, in still other cases (as is the practice at my site), the reference is one of the standard 10-20 channels (Pz), so that channel is recorded as a flat "zero" channel (since it is referenced to itself). We thus have a bookkeeping problem of knowing which reference a user employed and whether they included that channel in the imported data matrix "D". Fortunately, average referencing is an exact and convenient method that accounts for all of these reference approaches.

Consider three channels numbered 1, 2, 3, plus a fourth channel R, acquired as (1-R), (2-R), (3-R), and (R-R). In Case 1, the user gives us just the first three channels, i.e. we don't know or record R. We form the average A = ((1-R) + (2-R) + (3-R))/3 = (1 + 2 + 3)/3 - R. We then subtract A from each channel the user gave us to form the "average referenced" montage (1 - (1 + 2 + 3)/3), (2 - (1 + 2 + 3)/3), (3 - (1 + 2 + 3)/3). In other words, R is gone from consideration.

In Case 2, the user gives us the explicit channel R as well (it's just all zeros). We still follow the same black-box approach, but now for four channels. The average reference is A = (1 + 2 + 3 + R)/4 - R, and the "average referenced" montage is (1 - (1 + 2 + 3 + R)/4), (2 - (1 + 2 + 3 + R)/4), (3 - (1 + 2 + 3 + R)/4), and (R - (1 + 2 + 3 + R)/4). Note that even the zero reference channel R now has an apparent signal in the average reference montage, which is to be expected; we still have only a rank 3 data matrix for the three channels of data provided, just now spread across four non-zero channels. The point is that the average reference approach is identical in either case: average all the channels the user provided and subtract that average out.
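Both cases can be checked numerically. Below is a small NumPy sketch (entirely synthetic signals, not Brainstorm code) showing that average referencing removes the unknown reference R in Case 1, and that in Case 2 the flat reference channel acquires an apparent signal while the data rank stays at 3:

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unobservable) potentials at three scalp sites plus the reference R,
# all measured against a hypothetical "infinity" reference.
T = 1000
true = rng.standard_normal((3, T))   # channels 1, 2, 3
ref = rng.standard_normal(T)         # reference channel R

# Case 1: user supplies only (1-R), (2-R), (3-R).
D1 = true - ref                      # broadcasting subtracts R from each row
A1 = D1.mean(axis=0)                 # A = (1+2+3)/3 - R
D1_avg = D1 - A1                     # average-referenced montage

# R has cancelled: the result equals true - mean(true), with no trace of ref.
assert np.allclose(D1_avg, true - true.mean(axis=0))

# Case 2: user also supplies the flat channel (R-R) = 0.
D2 = np.vstack([true - ref, np.zeros(T)])
D2_avg = D2 - D2.mean(axis=0)
# The formerly flat reference row now carries apparent signal...
assert not np.allclose(D2_avg[3], 0)
# ...but the data matrix is still only rank 3.
assert np.linalg.matrix_rank(D2_avg) == 3
```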

**Infinite Reference Model**

Allow me to use my own notation and consider the general data equation

D = G Q + N

where D is the data matrix, say 23 channels of 10-20 EEG data for T time samples, G is the source model (gain matrix) for a dipole, Q is the dipole time series (3 x T), and N is the 23 x T matrix of unknown additive noise. G (23 x 3) is computed as the potential at the primary electrode position, for a dipole somewhere in the brain (e.g. on the cortex). What may seem strange to some users is that source modeling algorithms compute this potential with respect to a reference at "infinity," a physical impossibility but mathematically quite tractable. To compute the true potential **difference** recorded in a channel and use it in a source model, we compute two such "infinity" electrode potentials and subtract them, as in (1-R) above, which neatly removes the impossible "infinity" reference from the model. But, as we noted above, we may not always know where the reference was, or we may be unable to compute its infinite reference model (we have no good source model for ear clips, for instance).

The average reference approach neatly solves this problem. Extending the toy case examples above, we can create a matrix W that applies the averaging to both the physical data matrix D and the infinite reference model G. For the three channel case, we see that W is

```
W = [ 2/3  -1/3  -1/3
     -1/3   2/3  -1/3
     -1/3  -1/3   2/3 ]
```
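For an arbitrary channel count n, this W is simply the identity minus a matrix of 1/n entries. A quick sketch (the helper name is ours, not a Brainstorm function):

```python
import numpy as np

def average_reference_operator(n):
    """Average-reference montage matrix: subtracts the mean of all n channels."""
    return np.eye(n) - np.ones((n, n)) / n

W = average_reference_operator(3)
# Matches the 3-channel matrix above: 2/3 on the diagonal, -1/3 elsewhere.
assert np.allclose(np.diag(W), 2/3)

# W is singular (rank n-1) and idempotent: applying it twice changes nothing.
assert np.linalg.matrix_rank(W) == 2
assert np.allclose(W @ W, W)
```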

If we apply this W to both D and to G, then we neatly remove the physically unspecified reference channel from D and simultaneously the physically impossible infinite reference from G, so that the equation is properly

WD = WG Q + WN

and we note that the weight matrix must also be applied to the noise matrix N. Thus we are able to use data with an arbitrary reference and a source model with an impossible infinite reference in the same modeling equation.

**Inverse Solution**

For simplicity in notation, consider now the "perfect" modeling equation

D = G Q + N

where D has a known reference that has been exactly characterized in G, i.e. we're not concerned about unknown references, and G has been calculated with respect to the true reference, not infinity. The GLS (generalized least-squares) solution to this model is

Qhat = (G' C^{-1} G)^{-1} G' C^{-1} D

where C^{-1} is the inverse of the noise covariance matrix and G' is the transpose of the gain matrix. We call this estimate the generalized least-squares solution, but if N is known to be Gaussian, then it is also the maximum likelihood (ML) solution.
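As a sanity check on the formula, here is a minimal NumPy sketch (synthetic gain, sources, and noise of our own invention) showing that the GLS estimate recovers a known Q when the noise is small:

```python
import numpy as np

rng = np.random.default_rng(5)
n_chan, n_src, T = 23, 3, 500

G = rng.standard_normal((n_chan, n_src))            # gain matrix
Q = rng.standard_normal((n_src, T))                 # true dipole time series
C = np.diag(rng.uniform(0.5, 2.0, n_chan))          # known noise covariance
noise = np.sqrt(np.diag(C))[:, None] * rng.standard_normal((n_chan, T))
D = G @ Q + 0.01 * noise                            # low-noise data

# Qhat = (G' C^{-1} G)^{-1} G' C^{-1} D
Ci = np.linalg.inv(C)
Q_hat = np.linalg.solve(G.T @ Ci @ G, G.T @ Ci @ D)

max_err = np.max(np.abs(Q_hat - Q))                 # small residual error
```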

We build the noise covariance matrix from other samples of signal-free data, either a "quiet" period of the recording or the pre-stimulus period; right-click on the data structure and select "Noise covariance" -> "Compute from recordings", for example. For a noise matrix N of T_{n} time samples, where T_{n} is not necessarily the same length as the data matrix but should usually span many tens of seconds, we can estimate C as

Chat = (N N')/(T_{n}-1),

but we will drop the "hat" for notational simplicity.
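To see why many samples are needed, a small simulation (synthetic correlated noise; the channel count and sample length are arbitrary choices of ours) compares Chat against the covariance the noise was actually drawn from:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "noise-only" recording: 23 channels, T_n samples of correlated noise.
n_chan, T_n = 23, 5000
mixing = rng.standard_normal((n_chan, n_chan))
N = mixing @ rng.standard_normal((n_chan, T_n))

# Sample covariance estimate, as in the text.
C_hat = (N @ N.T) / (T_n - 1)

# With enough samples this approaches the true covariance mixing @ mixing.T.
C_true = mixing @ mixing.T
rel_err = np.linalg.norm(C_hat - C_true) / np.linalg.norm(C_true)
```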

Now if we apply an *invertible* W to this equation (note technicality below), it does not change the GLS solution, if we follow the same estimation procedure!

(WD) = (WG) Q + (WN)

yields

Qhat = (G'W' (W C W')^{-1} WG)^{-1} G'W' (W C W') ^{-1} (W D)

which simplifies to the same GLS solution as

Qhat = (G' C^{-1} G)^{-1} G' C^{-1} D.

Thus it does not matter which invertible W is applied; the GLS solution remains the same. Simply put, the montage is unimportant in source modeling!
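This invariance is easy to verify numerically. The sketch below (all quantities synthetic) applies a random invertible W to the data, gain, and covariance, and confirms the GLS estimate is unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
n_chan, n_src, T = 23, 3, 200

G = rng.standard_normal((n_chan, n_src))            # gain matrix
Q = rng.standard_normal((n_src, T))                 # dipole time series
L = rng.standard_normal((n_chan, n_chan))
C = L @ L.T + np.eye(n_chan)                        # well-conditioned covariance
D = G @ Q + L @ rng.standard_normal((n_chan, T))    # noisy data

def gls(G, C, D):
    """Qhat = (G' C^{-1} G)^{-1} G' C^{-1} D."""
    Ci = np.linalg.inv(C)
    return np.linalg.solve(G.T @ Ci @ G, G.T @ Ci @ D)

Q1 = gls(G, C, D)

# An arbitrary (generically invertible) montage W, applied to everything:
W = rng.standard_normal((n_chan, n_chan))
Q2 = gls(W @ G, W @ C @ W.T, W @ D)

assert np.allclose(Q1, Q2, rtol=1e-5, atol=1e-8)    # same GLS solution
```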

**Reduced Rank**

But there's a technical catch in the above simplification that we must now address: loss of rank. As the astute user will note, the average reference operator W is singular, losing a degree of freedom in the imported data. The inverses have to be replaced with pseudo-inverses, and since W^{-1} does not exist, the product (W W^{+}) is no longer the identity but rather an idempotent projection operator onto the reduced-rank space.

The concept, however, remains the same. Rather than burden this discussion with lots of pseudo-inverse symbols, allow me to simplify the discussion as follows. Assume again we have the perfect data equation,

D = G Q + N

where D explicitly includes the reference channel of all zeros, and G is appropriately calculated to include the explicit true reference, and N also includes this reference channel. Then the model is already rank-deficient, since one of the rows of this equation is already all zeros. Thus the GLS is properly

Qhat = (G' C^{+} G)^{-1} G' C^{+} D.

We simply state that when the average reference W, whose inverse is now properly the pseudo-inverse W^{+}, is applied as

(WD) = (WG) Q + (WN)

then Qhat remains the same.

In Case 1, where the user gave no explicit reference channel, i.e. there is no row of zeros in the data or noise matrices, then there is technically a loss of rank in the data matrix when we apply W; however, without explicit knowledge of the reference sensor, we have no ability to correctly calculate the source model G, so this loss of rank is necessary.
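The rank-deficient case can also be checked numerically. The sketch below (synthetic quantities, with an explicit rcond cutoff of our choosing for the pseudo-inverses) builds D, G, and N with an explicit zero reference row, and confirms that the pseudo-inverse GLS estimate is unchanged when the singular average reference W is applied to everything:

```python
import numpy as np

rng = np.random.default_rng(3)
n_chan, n_src, T = 4, 3, 200

# Gain, noise, and data all carry an explicit zero reference row (channel 4).
G = rng.standard_normal((n_chan, n_src))
G[-1] = 0
noise = rng.standard_normal((n_chan, T))
noise[-1] = 0
D = G @ rng.standard_normal((n_src, T)) + noise
C = (noise @ noise.T) / (T - 1)                     # rank-deficient covariance

def gls_pinv(G, C, D):
    """Qhat = (G' C^{+} G)^{-1} G' C^{+} D, with C^{+} the pseudo-inverse."""
    Cp = np.linalg.pinv(C, rcond=1e-10)
    return np.linalg.solve(G.T @ Cp @ G, G.T @ Cp @ D)

Q1 = gls_pinv(G, C, D)

# Singular average-reference operator, applied to everything.
W = np.eye(n_chan) - np.ones((n_chan, n_chan)) / n_chan
Q2 = gls_pinv(W @ G, W @ C @ W.T, W @ D)

assert np.allclose(Q1, Q2)                          # Qhat remains the same
```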

**Practical Implications**

The Brainstorm user imports data matrix D, which we leave undisturbed in its "raw" state, i.e. we apply no montages. The user also calculates a noise covariance matrix C from this raw data, which again we store with NO montages applied.

For each designated EEG, SEEG, or ECoG channel, an "infinite reference" forward model is built that considers only the single location of the primary electrode.

When source modeling is initiated, all that matters to Brainstorm is the designation of "good channels." The average reference montage matrix W is built for the good channels only and applied to the current noise covariance matrix (which the user is free to recompute), to create a source "kernel"

K = (G'W' (W C W')^{+} WG)^{-1} G'W' (W C W') ^{+} W

which can then be later applied to the stored raw data D (note the W at the end). Hence many different kernels K can be built and stored for different combinations of good channels or noise covariances, yet all are applied to the same raw data D.
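Putting the pieces together, here is a sketch of how such a kernel might be formed and applied (synthetic quantities and our own variable names, not Brainstorm internals). Because of the trailing W inside K, adding any common offset to the raw channels, i.e. changing the reference, leaves the source estimate untouched:

```python
import numpy as np

rng = np.random.default_rng(4)
n_chan, n_src, T = 23, 3, 500

G = rng.standard_normal((n_chan, n_src))            # "infinite reference" gain
L = rng.standard_normal((n_chan, n_chan))
C = L @ L.T + np.eye(n_chan)                        # noise covariance estimate
D = G @ rng.standard_normal((n_src, T)) + L @ rng.standard_normal((n_chan, T))

good = np.arange(n_chan) != 5                       # pretend channel 5 is "bad"
n_good = int(good.sum())

# Average-reference operator over the good channels only.
W = np.eye(n_good) - np.ones((n_good, n_good)) / n_good

Gg, Cg, Dg = G[good], C[np.ix_(good, good)], D[good]
Cw = np.linalg.pinv(W @ Cg @ W.T, rcond=1e-10)      # (W C W')^{+}
A = Gg.T @ W.T @ Cw
K = np.linalg.solve(A @ W @ Gg, A @ W)              # kernel; note trailing W

Q_hat = K @ Dg                                      # applied later to raw data

# A common offset (reference shift) on the raw channels changes nothing.
assert np.allclose(K @ (Dg + 7.0), Q_hat)
```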