Tutorial 26: Statistics

[TUTORIAL UNDER DEVELOPMENT: NOT READY FOR PUBLIC USE]

Authors: Francois Tadel, Elizabeth Bock, Dimitrios Pantazis, Richard Leahy, Sylvain Baillet

In this auditory oddball experiment, we would like to test for significant differences between the brain responses to the deviant beeps and the standard beeps, time sample by time sample. Until now we have been computing measures of brain activity in the time or time-frequency domain. We were able to see clear effects or slight tendencies, but these observations always depended on an arbitrary amplitude threshold and on the configuration of the colormap. With appropriate statistical tests, we can go beyond these empirical observations and assess the significant effects in a more formal way.

Random variables

In most cases we are interested in comparing the brain signals recorded for two populations or two experimental conditions, A and B.

A and B are two random variables for which we have a limited number of repeated measures: multiple trials in the case of a single-subject study, or multiple subject averages in the case of a group analysis. To start with, we will consider each time sample and each signal (source or sensor) as independent: a random variable represents the possible measures for one sensor/source at one specific time point.

A random variable can be described by its probability distribution: a function that indicates the chances of obtaining one specific measure if we run the experiment. By repeating the same experiment many times, we can approximate this function with a discrete histogram of observed measures.

[Figures: stat_distribution.gif and stat_histogram.gif – a probability distribution and the corresponding histogram of observed measures]
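For intuition, a histogram of repeated measures can be compared with the theoretical density in a few lines of Matlab (a minimal sketch with simulated data, not Brainstorm code):

nSamples = 1000;                          % number of repeated measures
x = randn(nSamples, 1);                   % simulated observations (mu=0, sigma=1)
[counts, centers] = hist(x, 30);          % discrete histogram, 30 bins
binWidth = centers(2) - centers(1);
bar(centers, counts / (nSamples * binWidth));       % rescale counts to a density
hold on;
xGrid = linspace(-4, 4, 200);
plot(xGrid, exp(-xGrid.^2 / 2) / sqrt(2*pi), 'r');  % theoretical normal pdf
hold off;

With more samples, the histogram gets closer to the theoretical curve; with very few samples (e.g. 4), it bears almost no resemblance to it.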

Histograms

You can plot histograms like this one in Brainstorm; it may help you understand what to expect from the statistics functions described in the rest of this tutorial. For instance, seeing a histogram computed with only 4 values would discourage you forever from running any group analysis with 4 subjects...

Recordings

Sources (relative)

Sources (absolute)

Sources (norm)

Time-frequency

Statistical inference

Hypothesis testing

To show that there is a difference between A and B, we can use a statistical hypothesis test. We start by assuming that the two sets are identical (the null hypothesis), then try to reject this hypothesis. For all the tests we will use here, the logic is similar:

Evaluation of a test

The quality of a test can be evaluated based on two criteria:

Different categories of tests

Two families of tests can be helpful in our case: parametric and non-parametric tests.

Parametric Student's t-test

Assumptions

The Student's t-test is a widely used parametric test to evaluate the difference between the means of two random variables (two-sample test), or between the mean of one variable and a known value (one-sample test). If the assumptions are correct, the t-statistic follows a Student's t-distribution.

The main assumption for using a t-test is that the random variables involved follow a normal distribution (mean: μ, standard deviation: σ). The figure below shows a few examples of normal distributions.
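These densities can be plotted in a few lines of Matlab, without the Statistics toolbox (a minimal sketch with arbitrary example parameters):

x = linspace(-6, 6, 400);
params = [0 1; 0 2; 2 0.5];               % example [mu sigma] pairs (arbitrary)
hold on;
for i = 1:size(params, 1)
    mu = params(i,1);  sd = params(i,2);
    plot(x, exp(-(x - mu).^2 / (2*sd^2)) / (sd * sqrt(2*pi)));  % normal pdf
end
hold off;
legend('\mu=0, \sigma=1', '\mu=0, \sigma=2', '\mu=2, \sigma=0.5');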

t-statistic

Depending on the type of data we are testing, we can have different variants for this test:

p-value

Once the t-value is computed (tobs in the previous section), we can convert it to a p-value based on the known distribution of the t-statistic. This conversion depends on two factors: the number of degrees of freedom and the number of tails of the distribution we want to consider. For a two-tailed t-test, the two following Matlab commands are equivalent and convert t-values into p-values.
To obtain the p-values for a one-tailed t-test, simply divide these p-values by 2.

p = betainc(df./(df + t.^2), df/2, 0.5);    % Without the Statistics toolbox
p = 2*(1-tcdf(abs(t),df));                  % With the Statistics toolbox
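Putting the pieces together, an equal-variance two-sample t-test can be computed end to end (a minimal sketch with hypothetical vectors A and B holding one value per trial for each condition; not the Brainstorm implementation):

nA = numel(A);  nB = numel(B);
df = nA + nB - 2;                                   % degrees of freedom
pooledVar = ((nA-1)*var(A) + (nB-1)*var(B)) / df;   % pooled variance estimate
t = (mean(A) - mean(B)) / sqrt(pooledVar * (1/nA + 1/nB));
p = betainc(df ./ (df + t.^2), df/2, 0.5);          % two-tailed p-value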

The Student's t-distribution for different numbers of degrees of freedom:

Example 1: t-test on recordings

Parametric t-tests require the values tested to follow a normal distribution. The recordings evaluated in the histogram sections above (MLP57/160ms) show distributions that match this assumption relatively well: the shape of the histogram follows the trace of the corresponding normal function, with very similar variances. It looks reasonable to use a parametric t-test in this case.

Correction for multiple comparisons

Multiple comparison problem

The approach described in this first example performs many tests simultaneously. We test each MEG sensor and each time sample independently across the trials, so we run a total of 274*361 = 98914 t-tests.

If we select a critical value of 0.05 (p<0.05), it means that we want to see what is significantly different between the conditions while accepting the risk of observing a false positive in 5% of cases. If we run the test around 100,000 times, we can expect to observe around 5,000 false positives. We need better control over false positives (type I errors) when dealing with multiple tests.

Bonferroni correction

The probability of observing at least one false positive, or familywise error rate (FWER), is almost 1:
FWER = 1 - prob(no significant results) = 1 - (1 - 0.05)^100000 ~ 1

A classical way to control the familywise error rate is to replace the p-value threshold with a corrected value that enforces the expected FWER. The Bonferroni correction sets the significance cut-off at $$\alpha$$/Ntest. If we set (p ≤ $$\alpha$$/Ntest), then we have (FWER ≤ $$\alpha$$). Following the previous example:
FWER = 1 - prob(no significant results) = 1 - (1 - 0.05/100000)^100000 ~ 0.0488 < 0.05
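These two values can be checked numerically in Matlab:

Ntest = 100000;  alpha = 0.05;
fwerUncorrected = 1 - (1 - alpha)^Ntest           % ~1
fwerBonferroni  = 1 - (1 - alpha/Ntest)^Ntest     % ~0.0488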

This works well in a context where all the tests are strictly independent. However, in the case of MEG/EEG recordings, the tests have an important level of dependence: two adjacent sensors or time samples often have similar values. In the case of highly correlated tests, the Bonferroni correction tends to be too conservative, leading to a high rate of false negatives.

FDR correction

The false discovery rate (FDR) is another way of representing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. It is designed to control the expected proportion of false positives, while the Bonferroni correction controls the probability of having at least one false positive. FDR-controlling procedures have greater power, at the cost of an increased rate of type I errors (Wikipedia).

In Brainstorm, we implement the Benjamini–Hochberg step-up procedure for FDR correction:
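A minimal standalone sketch of this procedure (hypothetical function name bh_fdr, to be saved as bh_fdr.m; not the Brainstorm implementation):

function h = bh_fdr(p, alpha)
    % Benjamini-Hochberg step-up: h(i) is true where H0 is rejected
    [pSorted, iSorted] = sort(p(:));
    N = numel(pSorted);
    % Largest k such that pSorted(k) <= k*alpha/N
    k = find(pSorted <= (1:N)' * alpha / N, 1, 'last');
    h = false(size(p));
    if ~isempty(k)
        h(iSorted(1:k)) = true;
    end
end

For example, h = bh_fdr(pValues, 0.05) returns a logical mask of the tests considered significant at an FDR of 5%.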

In the interface

[Figure: stat_mcp_gui.gif – multiple comparisons correction options]

You can interactively select the type of correction to apply for multiple comparisons while reviewing your statistical results. The "Control over dims" checkboxes let you select which dimensions are considered as multiple comparisons (1=sensor, 2=time, 3=not applicable here). If you select all the dimensions, all the values available in the file are considered as repetitions of the same test, and only one corrected p-threshold is computed for all the time samples and all the sensors.

If you select only the first dimension, only the values recorded at the same time sample are considered as repeated tests: the different time points are corrected independently, with a different corrected p-threshold for each time point.

When changing these options, a message is displayed in the Matlab command window, showing the number of repeated tests considered and the corrected p-value threshold (or the average when there are multiple corrected p-thresholds, i.e. when not all the dimensions are selected):

BST> Average corrected p-threshold: 5.0549e-07  (Bonferroni, Ntests=98914)
BST> Average corrected p-threshold: 0.00440939  (FDR, Ntests=98914)

STOP READING HERE, EVERYTHING BELOW IS WRONG

Example 2: t-test on sources

Parametric methods require the values tested to follow a Gaussian distribution. Some sets of data can be considered Gaussian (example: the histogram of the values recorded by one sensor at one time point), some cannot (example: the histogram of the absolute values at one point).

WARNING: This parametric t-test is valid only for constrained sources. In the case of unconstrained sources, the norm of the three orientations is used, which breaks the hypothesis that the tested values follow a Gaussian distribution. This test is illustrated on unconstrained sources for now, but this will change when we figure out the correct parametric test to apply to unconstrained sources.

Permutation tests

Permutation test: https://en.wikipedia.org/wiki/Resampling_%28statistics%29
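The principle can be illustrated with a minimal Matlab sketch (hypothetical vectors A and B holding one value per trial for each condition; not Brainstorm code): the condition labels are shuffled many times, and the observed statistic is compared with the distribution of the shuffled statistics.

nPerm = 1000;                                % number of random permutations
nA = numel(A);
allData = [A(:); B(:)];
statObs = mean(A) - mean(B);                 % observed statistic
statPerm = zeros(nPerm, 1);
for iPerm = 1:nPerm
    iRand = randperm(numel(allData));        % shuffle the condition labels
    statPerm(iPerm) = mean(allData(iRand(1:nA))) - mean(allData(iRand(nA+1:end)));
end
p = mean(abs(statPerm) >= abs(statObs));     % two-tailed permutation p-value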

FieldTrip: Non-parametric cluster-based statistic [TODO]

We have the possibility to call some of the FieldTrip functions from the Brainstorm environment. For this, you first need to install the FieldTrip toolbox on your computer and add it to your Matlab path.

For a complete description of non-parametric cluster-based statistics in FieldTrip, read the following article: Maris & Oostenveld (2007). Additional information can be found on the FieldTrip website:

Permutation-based non-parametric statistics are more flexible and do not require any assumption about the distribution of the data, but on the other hand they are much more expensive to compute. Calling FieldTrip's function ft_sourcestatistics requires a lot more memory, because all the data has to be loaded at once, and a lot more computation time, because the same test is repeated many times.

Running this function in the same way as the parametric t-test above (full cortex, all the trials and all the time points) would require 45000*461*361*8/1024^3 ≈ 56 GB of memory just to load the data. This is impossible on most computers, so we have to give up at least one dimension and run the test only for one time sample or one region of interest.
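The estimate can be reproduced in one line (sources x trials x time samples x 8 bytes per double):

memGB = 45000 * 461 * 361 * 8 / 1024^3       % ~55.8 GB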

FieldTrip: Process options [TODO]

Screen captures for the two processes:

Description of the process options:

The options available here match the options passed to the function ft_sourcestatistics.
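As an illustration only, here is a rough sketch of what a direct FieldTrip call could look like for comparing two sets of trials with a cluster-corrected permutation test (variable names srcA/srcB are hypothetical; Brainstorm builds a similar configuration internally, with options that may differ):

% srcA/srcB: cell arrays of FieldTrip source structures, one per trial
cfg = [];
cfg.method           = 'montecarlo';       % permutation statistics
cfg.statistic        = 'indepsamplesT';    % two-sample t-statistic
cfg.correctm         = 'cluster';          % cluster-based correction
cfg.clusteralpha     = 0.05;               % threshold to enter a cluster
cfg.numrandomization = 1000;               % number of permutations
cfg.tail             = 0;                  % two-tailed
cfg.alpha            = 0.05;
nA = numel(srcA);  nB = numel(srcB);
cfg.design = [ones(1,nA), 2*ones(1,nB)];   % condition labels
cfg.ivar   = 1;                            % row of the design = condition
stat = ft_sourcestatistics(cfg, srcA{:}, srcB{:});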

Cluster correction: Define what a cluster is for the different data types (recordings, surface source maps, volume source models, scouts)

Default options:

FieldTrip: Example 1 [TODO]

We will run this FieldTrip function first on the scouts time series and then on a short time window.

FieldTrip: Example 2 [TODO]

short time window

Time-frequency files

Both ft_sourcestatistics and ft_freqstatistics are available.

Use ft_sourcestatistics if the input files contain full source maps, and ft_freqstatistics for all the other cases (scouts or sensors).

One-sided or two-sided

Two-tailed test on either (A-B) or (|A|-|B|).

The statistic, after thresholding, should be displayed as a bipolar signal (red: positive, blue: negative), since the sign affects the interpretation.

Constrained (A-B), null hypothesis H0: A=B. We compute a z-score through normalization using a pre-stim period, then apply FDR-controlled thresholding to identify significant differences with a two-tailed test.

Constrained (|A|-|B|), null hypothesis H0: |A|=|B|. We compute a z-score through normalization using a pre-stim period, then apply FDR-controlled thresholding to identify significant differences with a two-tailed test.

Unconstrained RMS(X-Y), null hypothesis H0: RMS(X-Y)=0. Since the statistic is non-negative, a one-tailed test is used.

Unconstrained (RMS(X)-RMS(Y)), null hypothesis H0: RMS(X)-RMS(Y)=0. We use a two-tailed test and a bipolar display to differentiate increased from decreased amplitude.

Workflow: Single subject

You want to test, for one single subject, what is significantly different between two experimental conditions. You are going to compare directly the single trials corresponding to each condition.
This is the case for this tutorial dataset: we want to know what is different between the brain responses to the deviant beep and the standard beep.

Sensor recordings: Not advised for MEG with multiple runs, OK for EEG.

Constrained source maps (one value per vertex):

Unconstrained source maps (three values per vertex):

Time-frequency maps:

Workflow: Group analysis

You have the same two conditions recorded for multiple subjects.

Sensor recordings: Strongly discouraged for MEG, OK for EEG.

Source maps (template anatomy):

Source maps (individual anatomy):

Time-frequency maps:

Convert statistic results to regular files

Export to SPM

An alternative to running the statistical tests in Brainstorm is to export all the data and compute the tests with an external program (R, Matlab, SPM, etc.). Multiple menus exist to export files to external file formats (right-click on a file > File > Export to file).

Two tutorials explain how to export data specifically to SPM:

On the hard drive [TODO]

Right-click on one of the stat files computed in this tutorial > File > View file contents.

Useful functions

References

Additional documentation

Delete all your experiments

Before moving to the next tutorial, delete all the statistical results you computed in this tutorial. It will make the database structure less confusing for the following tutorials.







