# Basic Understanding of Noise Covariance Matrix, LORETA and Overlapping Spheres

Hello BST Team,
I am a PhD student who started my research in neuroscience two months ago. I started to learn about M/EEG source localisation and BST has indeed helped me in understanding the forward and inverse modelling concepts with the practice datasets. I have gone through your tutorials until Source Estimation. I have few very basic question, hope you guys don't mind me asking these basic details.

1. I have gone through the sLORETA paper (Pascual-Marqui et al.) and understood how the standardization happens in sLORETA (it computes the covariance matrix of the sources and divides/standardizes each voxel by the corresponding diagonal element of that covariance matrix). But I am unable to conceptualize the role of the noise covariance matrix, because the noise covariance matrix is numCh x numCh. How can you divide/standardize each voxel when #voxels > #diagonal elements of the noise covariance matrix?
2. The noise covariance matrix I intend to use is "No noise modelling (identity matrix)". The tutorial says that using this noise model at the sensor level assumes the noise is homoskedastic. How do you standardize using this noise model in dSPM or in sLORETA?
3. I assume that, when empty-room MEG recordings are given, they will be source-localised, the source time series acquired, the source covariance matrix computed, and this used to standardise the current density map, just as explained in (1). Am I correct? If not, can you explain how these empty-room noise recordings are used to standardize the current density map?
4. I read the Sensor-Weighted Overlapping Spheres model paper that you cite in your tutorials and understood how each sphere is fitted (the algorithm explains this clearly). I also learnt that the dipoles are located inside the sphere. My question is: as I am using 15,000 dipoles and, say, a 275-channel MEG system, I will have 275 overlapping spheres, so how are these 15,000 dipoles spread across the 275 spheres? And how are they projected back onto the MRI of the subject?
5. Although I understood sLORETA, I was a bit confused by LORETA. MNE is incapable of recovering deep sources, and LORETA overcomes this disadvantage. It is said that LORETA uses a Laplacian to overcome the disadvantage of MNE, but I did not understand how using a Laplacian achieves this. Can someone explain this to me? (I tried looking at the mathematical equations, but they were too overwhelming.)

Once again, apologies for these basic questions; I just want to understand what I am doing at every step of source localisation.
Regards,

Questions 1, 2, 4, 5: @Sylvain, @John_Mosher?

> 3. I assume that, when empty-room MEG recordings are given, they will be source-localised, the source time series acquired, the source covariance matrix computed, and this used to standardise the current density map, just as explained in (1). Am I correct? If not, can you explain how these empty-room noise recordings are used to standardize the current density map?

The noise covariance computed from the MEG empty-room recordings is used to whiten the recordings before source localisation. See lines 232-454 and 701 in bst_inverse_linear_2018.m.
https://github.com/brainstorm-tools/brainstorm3/blob/master/toolbox/inverse/bst_inverse_linear_2018.m#L232

You can see the result of this whitening by right-clicking on the source results > Model evaluation > Save whitened recordings.
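To make the effect of this whitening concrete, here is a small numerical sketch (illustrative Python/numpy code, not the Brainstorm implementation): a channel covariance C is estimated from simulated "empty-room" noise, and multiplying the recordings by W = C^(-1/2) turns the channel covariance into the identity, i.e. unit-variance, uncorrelated noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chan, n_time = 5, 2000

# Simulated "empty-room" noise with correlated channels (hypothetical data).
mix = rng.standard_normal((n_chan, n_chan))
noise = mix @ rng.standard_normal((n_chan, n_time))

# Noise covariance estimated from the empty-room recording (nChan x nChan).
C = np.cov(noise)

# Whitening operator W = C^(-1/2), via the eigendecomposition of C.
evals, evecs = np.linalg.eigh(C)
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

# Whitened recordings: their covariance becomes the identity matrix.
noise_w = W @ noise
C_w = np.cov(noise_w)
print(np.round(C_w, 6))  # prints the 5x5 identity matrix (up to rounding)
```

Right-clicking on the source results > Model evaluation > Save whitened recordings lets you inspect the Brainstorm equivalent of `noise_w` for your own data.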

Hello,

Regarding the spheres model: for each sensor, only one sphere is used to calculate the forward solution. So for all the dipoles, you are always using the same sphere for a given sensor. You can therefore think of all dipoles as actually being inside all the spheres (the sphere radius does not enter the calculation, so you can conceptually make it as big as necessary to include all your sources). Now, I'm not sure what you mean by "projected back onto the MRI".

For sLORETA, look for example at http://arxiv.org/pdf/0710.3341. The forward solution (lead field K) has size nChan x nSources. The main equation is: j = [(K'CK)^(-1/2) K'C] Phi, where C = (KK' + aH)^+ contains, among other things, the sensor noise covariance H and has size nChan x nChan. In the main equation, C is always multiplied by K; that is how you go from channel space to source space. Hopefully that helps you answer your questions 1-3. For your question 2, the identity matrix would replace the sensor noise covariance H.
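A toy numerical sketch of that equation (illustrative Python/numpy, with made-up sizes and random data) shows how the nChan x nChan matrix C, once sandwiched between K' and K, yields one diagonal element per source, which resolves the dimension mismatch in question 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n_chan, n_src = 10, 50        # toy sizes; a real MEG setup might be 275 x 15000

K = rng.standard_normal((n_chan, n_src))  # lead field (nChan x nSources)
H = np.eye(n_chan)                        # sensor noise: "no noise modelling"
a = 0.1                                   # regularization parameter

C = np.linalg.pinv(K @ K.T + a * H)       # nChan x nChan
phi = rng.standard_normal(n_chan)         # one time sample of the data

j_mne = K.T @ C @ phi                     # minimum-norm current density (nSrc,)
S = K.T @ C @ K                           # source-space matrix (nSrc x nSrc)
j_sloreta = j_mne / np.sqrt(np.diag(S))   # one standardization factor per source
print(j_sloreta.shape)                    # prints: (50,)
```

Note that `diag(S)` has one entry per source, even though C itself is only nChan x nChan.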

Cheers,
Marc


Hello Francois,
Many thanks for the reply. Whitening is nothing but decorrelating the channels, right? But why do we have to do it before source localisation? I am unable to understand this. Can you please elaborate?
Many thanks,

Dear Marc,
Many thanks for the detailed explanation on sLORETA and how noise cov matrix is transformed from sensor space to source space.

"for each sensor only one sphere is used for calculating the forward solution". This makes more sense now. I was confused for using 275 spheres at once for building the forward model and was unable t grasp how 15,000 would be spread across these 275 spheres.
Once again, thank you for the clarification.
Regards,

Dear Francois and Marc.Lalancette,
I have a question on the usage of beamformers. I might be working on interictal data during my PhD. I learnt that beamformers assume that sources are "uncorrelated". In this case, can I use interictal data? Although interictal data is not as correlated as ictal data, I assume it is more correlated than data from a control (healthy individual). So, for interictal data, can I use beamformers for source localisation?
Regards,

Indeed, multiple strongly correlated sources with good SNR may produce strange beamformer results. For example, bilateral sources can be localized as a single midline source, or can "disappear" altogether. But this is difficult to predict and must be verified on a case-by-case basis. There have been approaches suggested to deal with this, but perhaps a simpler solution is to compare the beamformer results with those of another inverse solution that does not suffer from correlation issues, e.g. sLORETA. If the two give notably different results, then further care should be taken in the interpretation.

> Whitening is nothing but decorrelating the channels, right? But why do we have to do it before source localisation? I am unable to understand this. Can you please elaborate?

This is a question for @Sylvain and @John_Mosher...

That's a good question, and I'm interested to hear what John and Sylvain might say. In the meantime, if you look at the code comments where François linked, it seems one of the reasons is to bring data from different modalities (with different units) to the same range of values. This probably helps the numerical stability of the solution. But that alone would not require whitening; simply dividing each channel by its variance would be sufficient.

Looking more closely at the code, I think whitening is used as a step of the inverse solution, and not "before" it. What I mean is that, for example in sLORETA, using the same notation as before, if you whiten the data and lead fields with sqrt(C), then the formula simplifies to: j = (K'K)^(-1/2) K' Phi. Similarly, in the (unit-norm) LCMV beamformer, if you whiten by sqrt(data_cov^-1), then: j = (K'K)^(-1) K' Phi. I'm not completely sure that's what we do, however, as the code is not very easy to follow.

Otherwise, in theory if you use the same whitening matrix applied to the data, the forward model and the covariance, it completely cancels out in some inverse solution equations (including the two above).

I thought of writing two sentences to clarify this, but the short story ended up longer :). I hope there are not too many typos in important places... please ask / comment if the equations seem strange!

A linear measurement model is:
m = Ls + n,
where m is the measured data, L the lead field, s the source distribution, and n the noise.

The estimated source distribution is of the form:
s_est = Gm,
where
G=RL'(LRL'+C)^(-1)
is a general linear inverse operator (Bayesian MAP with parametric priors or a weighted MN estimator, whatever you like to call it...); R is a (prior) source covariance or weighting matrix, and C is a noise covariance or sensor-level weighting matrix.

Whitening means multiplying the model and the data with W = C^(-1/2).
Now write L_w = WL and m_w = Wm. Then we have
s_est = G_w m_w,
where
G_w = R L_w' (L_w R L_w' + I)^(-1).

Thus, in principle, it does not matter whether you use pre-whitened data with G_w or non-whitened data with G.
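This equivalence is easy to verify numerically. The sketch below (illustrative Python/numpy with random matrices, not Brainstorm code) builds G from the unwhitened quantities and G_w from the whitened ones, and checks that both give the same source estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
n_chan, n_src = 8, 20

L = rng.standard_normal((n_chan, n_src))  # lead field
R = np.eye(n_src)                         # prior source covariance / weighting
A = rng.standard_normal((n_chan, n_chan))
C = A @ A.T + n_chan * np.eye(n_chan)     # symmetric positive-definite noise cov
m = rng.standard_normal(n_chan)           # measurement vector

# Un-whitened inverse operator: G = R L' (L R L' + C)^(-1)
G = R @ L.T @ np.linalg.inv(L @ R @ L.T + C)

# Whitener W = C^(-1/2), via the eigendecomposition of C.
evals, evecs = np.linalg.eigh(C)
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
L_w, m_w = W @ L, W @ m

# Whitened inverse operator: G_w = R L_w' (L_w R L_w' + I)^(-1)
G_w = R @ L_w.T @ np.linalg.inv(L_w @ R @ L_w.T + np.eye(n_chan))

print(np.allclose(G @ m, G_w @ m_w))  # prints: True
```

The two estimates coincide because W and its inverse cancel inside the matrix inverse.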

Typically we do not know R very well --- we can have a prior guess about the shape of R, but at least its amplitude typically remains quite unknown. In classical MNE, we set R = lambda^(-2) I, and with this choice the estimators take the form

G=L'(LL'+lambda^2 C)^(-1)
G_w=L_w'(L_wL_w'+lambda^2 I)^(-1),

where we call lambda^2 the regularization parameter.

Choosing lambda^2 is, how should I say it, a form of art. A typical choice is to use an SNR-like formulation. It is based on thinking of LRL' + C as the measurement covariance matrix, which contains the data of interest LRL' and the noise C --- we can define the SNR as a ratio of these two matrices. But, with C having a lot of off-diagonal content and different sensors having different amplitude ranges, how do we compute a ratio of LRL' and C in a principled way? If we have a whitener, the matrices to compare are L_w R L_w' and I. Now, with the noise C converted to the identity matrix and the rows of L_w scaled to similar ranges, we can, for example, define
SNR^2 = trace(L_w R L_w') / trace(I).

With traditional MNE, this takes the form
SNR^2 = trace(L_w L_w') / trace(lambda^2 I).
Thus, if we have an estimate of SNR^2, we get the regularization into the right ballpark by setting
lambda^2 = trace(L_w L_w') / (SNR^2 N_sens),
where N_sens is the number of sensors. For more text about this, please see the classic MNE manual written by the other Matti!
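In code, the recipe above is a one-liner. This sketch (illustrative Python/numpy, with a random whitened lead field and an assumed amplitude SNR of 3) computes lambda^2 and builds the corresponding whitened MNE operator:

```python
import numpy as np

rng = np.random.default_rng(3)
n_chan, n_src = 8, 20

L_w = rng.standard_normal((n_chan, n_src))  # pre-whitened lead field
snr = 3.0                                   # assumed amplitude SNR

# lambda^2 = trace(L_w L_w') / (SNR^2 * N_sens)
lam2 = np.trace(L_w @ L_w.T) / (snr**2 * n_chan)

# Classical whitened MNE operator with this regularization:
# G_w = L_w' (L_w L_w' + lambda^2 I)^(-1)
G_w = L_w.T @ np.linalg.inv(L_w @ L_w.T + lam2 * np.eye(n_chan))
print(lam2)
```

Brainstorm's default SNR of 3 in the source-estimation options corresponds to exactly this kind of assumption.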

So, one benefit of pre-whitening is that setting the regularization parameter becomes more intuitive and easier. Then, after estimating the source distribution, one often studies the reconstruction error (goodness of fit). The error should be of the order of the noise, and such an analysis is easier / more intuitive if the noise has been made white.

Cheers,
Matti