# Data covariance matrix

Hello,

In the data covariance tutorial, the data covariance matrix is computed in preparation for an LCMV source reconstruction. It is suggested to use post-stimulus data from all conditions to compute the data covariance. Following this recommendation, one would include all trials from conditions with uneven numbers of trials (~200 trials in the "standard" condition and ~40 trials in the "deviant" condition).

I wonder whether the resulting data covariance would be biased towards the standard condition. Is this a relevant point? If so, what would be the effect in terms of signal strength in the two conditions? Would you expect anything as simple as a dampening of the signal in one condition relative to the other? Or would the effects be less straightforward to predict?

I suppose one way to circumvent this bias would be to select an equal number of trials, but this would be at the cost of a lower precision. What would you suggest?

Many thanks!

This is a question for @John_Mosher and @pantazis.

You want to use all the data available to you to derive the best possible estimates of empirical statistics, such as the covariance here. In the condition with fewer trials, your estimate will be less accurate and noisier, but that's the best available to you.

@John_Mosher and @pantazis, please chime in.

I do not have much experience using beamformers in my research, but here is my take. First, I believe it is important to estimate a single aggregate data covariance matrix for all conditions, and thus a single LCMV beamformer inverse matrix, rather than one data covariance matrix / LCMV matrix per condition (which would introduce problems and confounds).
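To make the confound concrete, here is the usual unit-gain form of the LCMV spatial filter. The notation is my own assumption for illustration and is not taken from the tutorial: $L$ is the lead field of a source location, $C$ the data covariance, and $W$ the resulting filter.

$$
W = \left(L^{\top} C^{-1} L\right)^{-1} L^{\top} C^{-1}
$$

Because $W$ depends on $C$, a separate covariance per condition would give a different filter per condition, and a difference between source estimates could then reflect the filters rather than the data. A single aggregate $C$ applies the same $W$ to both conditions.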

Second, to deal with imbalanced study designs (e.g., 200 trials from the standard condition and 40 from the deviant condition), I suggest you search the existing literature for 'LCMV beamformer' together with 'mismatch negativity' or 'oddball' and see what other studies have done. I had a brief look and it was not especially helpful, but a more thorough search may be a good idea.

I share your fears that with imbalanced designs the data covariance will be heavily biased towards one condition. I suspect this could, in principle, increase both bias and variance in the less represented condition. A possible approach to avoid this is to compute two data covariance matrices separately (one for standard, one for deviant) and then average them with equal weights, that is, Caggregate = (C1 + C2) / 2. But this is not supported in Brainstorm, and there may be issues in dealing with degrees of freedom. For simplicity, I believe 200 vs 40 is not such a big imbalance and I would go with Sylvain's suggestion: simply use all trials to estimate a single data covariance matrix despite the slight imbalance. I think this is what other investigators do (but I encourage you to do a literature search, as I described above).
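To illustrate the difference between pooling all trials and averaging with equal weights, here is a minimal NumPy sketch on simulated data. It is not Brainstorm code. The helper `trial_second_moment`, the array shapes, and the simulated amplitudes are all assumptions made up for the example, and the data are treated as already baseline-corrected, so no mean is removed. Under those assumptions, pooling all trials amounts to weighting each condition's covariance by its trial count (200/240 and 40/240 here), whereas the equal-weight average uses 1/2 each.

```python
import numpy as np


def trial_second_moment(trials):
    """Average of X @ X.T / n_times over trials.

    trials: array of shape (n_trials, n_channels, n_times), assumed to be
    baseline-corrected already (hypothetical helper, no mean is removed).
    """
    n_trials, _, n_times = trials.shape
    return np.einsum("kct,kdt->cd", trials, trials) / (n_trials * n_times)


rng = np.random.default_rng(0)
n_channels, n_times = 4, 100

# Simulated data: the deviant condition has a larger amplitude (assumption).
standard = rng.standard_normal((200, n_channels, n_times))
deviant = 2.0 * rng.standard_normal((40, n_channels, n_times))

c_std = trial_second_moment(standard)
c_dev = trial_second_moment(deviant)

# Covariance from all trials pooled together.
pooled = trial_second_moment(np.concatenate([standard, deviant], axis=0))

# The same matrix written as a trial-count-weighted average.
weighted = (200 * c_std + 40 * c_dev) / 240

# Equal-weight average, Caggregate = (C1 + C2) / 2.
balanced = (c_std + c_dev) / 2

print("pooled equals trial-weighted average:", np.allclose(pooled, weighted))
print("trace pooled:  ", np.trace(pooled))
print("trace balanced:", np.trace(balanced))
```

In this toy setup the pooled matrix sits closer to the standard-condition covariance, while the equal-weight average gives the noisier 40-trial estimate as much influence as the 200-trial one. That is the precision trade-off mentioned above.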

Apologies for the late response, you caught me on proposal writing...

Best,
Dimitrios