= Granger causality =
''Authors: Hossein Shahabi and Raymundo Cassani''

This tutorial extends the information provided in the [[Tutorials/Connectivity|connectivity tutorial]] regarding the formulation of (temporal and spectral) Granger causality. Moreover, a numeric example based on simulated signals is provided to verify the results obtained with GC in the time and frequency domains.

<<TableOfContents>>

== Mathematical background ==
Granger causality (GC) is a method of [[Tutorials/Connectivity|functional connectivity]], introduced by Clive Granger in the 1960s ([[https://doi.org/10.2307/1912791|Granger, 1969]]), but later refined by John Geweke in the form that is used today ([[https://doi.org/10.2307/2288723|Geweke, 1984]]). Granger causality was originally formulated in economics, but has caught the attention of the neuroscience community in recent years. Before this, neuroscience traditionally relied on lesions and on applying stimuli to one part of the nervous system to study its effect on another part. Granger causality made it possible to estimate the statistical influence without requiring direct intervention ([[https://doi.org/10.1016/j.neuroimage.2010.02.059|Bressler and Seth, 2011]]).

By definition, Granger causality is a measure of linear dependence, which tests whether the variance of error for a linear [[https://en.wikipedia.org/wiki/Autoregressive_model|autoregressive model]] estimation (AR model) of a signal <<latex($$x$$)>> can be reduced when adding a linear model estimation of a second signal <<latex($$y$$)>>. If this is true, signal <<latex($$y$$)>> has a '''Granger causal effect''' on the first signal <<latex($$x$$)>>, i.e., '''independent information''' of the past of <<latex($$y$$)>> improves the prediction of <<latex($$x$$)>> above and beyond the information contained in the past of <<latex($$x$$)>> alone.
The term ''independent'' is emphasized because it creates some interesting properties for GC, such as that it is invariant under rescaling of the signals, as well as under the addition of a multiple of <<latex($$x$$)>> to <<latex($$y$$)>>. The measure of Granger causality is non-negative, and zero when there is no Granger causality ([[https://doi.org/10.2307/2288723|Geweke, 1984]]).

The main advantage of Granger causality is that it is a directed measure, in that it can dissociate between <<latex($$F_{y \to x}$$)>> and <<latex($$F_{x \to y}$$)>>. It is important to note, however, that although the directionality of Granger causality is a step closer towards measuring effective connectivity compared to non-directed measures, it should still not be confused with "true causality". Effective connectivity estimates the effective mechanism generating the observed data (model-based approach), whereas GC is a measure of causal effect based on prediction, i.e., how well the model is improved when taking into account variables that are interacting (data-driven approach) ([[https://doi.org/10.3389/fninf.2013.00006|Barrett and Barnett, 2013]]). The difference with causality is best illustrated when there are more variables interacting in a system than those taken into account in the model. For example, if a variable <<latex($$z$$)>> is causing both <<latex($$x$$)>> and <<latex($$y$$)>>, but with a smaller delay for <<latex($$x$$)>> than for <<latex($$y$$)>>, then a GC measure between <<latex($$x$$)>> and <<latex($$y$$)>> would show a non-zero <<latex($$F_{x \to y}$$)>>, even though <<latex($$x$$)>> is not truly causing <<latex($$y$$)>> ([[https://doi.org/10.1016/j.neuroimage.2010.02.059|Bressler and Seth, 2011]]).

== GC in time domain ==
Even though GC has been extended to nonlinear, multivariate and time-varying conditions, in this tutorial we will stick to the basic case, which is a linear, bivariate and stationary model, defined in both the time and spectral domains. In the time domain this can be represented in the following way.
If <<latex($$x(t)$$)>> represents a signal that can be modelled using a linear AR model in the following two ways:

{{{#!latex
\begin{eqnarray*}
x(t) & = & \sum_{k=1}^{p} [A_{k}x(t-k)] +e_1 \\
x(t) & = & \sum_{k=1}^{p} [A_{k}x(t-k)+B_{k}y(t-k)] +e_2
\end{eqnarray*}
}}}

where <<latex($$p$$)>> is the '''model order''', and represents the amount of past information that will be included in the prediction of the future sample. In these two equations, the first model for <<latex($$x(t)$$)>> uses the past (and present) of only itself, whereas the second also includes the past (and present) of a second signal <<latex($$y(t)$$)>>. Note that when '''only past measures''' of the signals are taken into account (<<latex($$k \geq 1$$)>>), the model ignores simultaneous connectivity, which makes it less susceptible to volume conduction ([[https://mitpress.mit.edu/books/analyzing-neural-time-series-data|Cohen, 2014]]).

Then, according to the original formulation of GC, the measure of GC from <<latex($$y$$)>> to <<latex($$x$$)>> is defined as:

{{{#!latex
\begin{eqnarray*}
F_{y \to x}=\ln \left( \frac{\textrm{Var}(e_1)}{\textrm{Var}(e_2)} \right)
\end{eqnarray*}
}}}

which is 0 for <<latex($$\textrm{Var}(e_1) = \textrm{Var}(e_2)$$)>> and a positive value for <<latex($$\textrm{Var}(e_1) > \textrm{Var}(e_2)$$)>>. Note that <<latex($$\textrm{Var}(e_1) \geq \textrm{Var}(e_2)$$)>> always holds, as the model can only improve when adding new information.

== GC in frequency domain ==
An important improvement by Geweke was to show that under fairly general conditions, <<latex($$F_{y \to x}$$)>> can be decomposed by frequency. If the two AR models in the time domain are specified as:

{{{#!latex
\begin{eqnarray*}
x(t) &=\sum_{k=1}^{p} [A_{k_{xx}}x(t-k) + A_{k_{xy}}y(t-k)] +\sigma_{xy} \\
y(t) &=\sum_{k=1}^{p} [A_{k_{yy}}y(t-k) + A_{k_{yx}}x(t-k)] +\sigma_{yx}
\end{eqnarray*}
}}}

then for each equation the reduced model can be defined when each signal is an AR model of only its own past, with error terms <<latex($$\sigma_{x}$$)>> and <<latex($$\sigma_{y}$$)>>. We can then define the variance-covariance matrix of the whole system as:

{{{#!latex
\begin{eqnarray*}
\Sigma = \left[\begin{array}{cc} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{array}\right] \\
\end{eqnarray*}
}}}

where <<latex($$\Sigma_{xx} = \textrm{Var}(\sigma_{xy})$$)>>, etc.
Applying the Fourier transform to these equations, they can be expressed as:

{{{#!latex
\begin{eqnarray*}
\left(\begin{array}{cc} A_{xx}(w) & A_{xy}(w) \\ A_{yx}(w) & A_{yy}(w) \end{array}\right) \left(\begin{array}{c} x(w)\\ y(w) \end{array}\right) = \left(\begin{array}{c} \varepsilon_{1}(w) \\ \varepsilon_{2}(w) \end{array}\right)
\end{eqnarray*}
}}}

Rewriting this as:

{{{#!latex
\begin{eqnarray*}
\left(\begin{array}{c} x(w)\\ y(w) \end{array}\right) = \left(\begin{array}{cc} H_{xx}(w) & H_{xy}(w) \\ H_{yx}(w) & H_{yy}(w) \end{array}\right) \left(\begin{array}{c} \varepsilon_{1}(w) \\ \varepsilon_{2}(w) \end{array}\right)
\end{eqnarray*}
}}}

where <<latex($$H(w)$$)>> is the transfer matrix. The spectral matrix is then defined as:

{{{#!latex
\begin{eqnarray*}
S(w)=H(w)\Sigma H^{\star}(w)
\end{eqnarray*}
}}}

Finally, assuming independence of the noise terms <<latex($$\varepsilon_{1}$$)>> and <<latex($$\varepsilon_{2}$$)>>, i.e., <<latex($$\Sigma_{xy}=\Sigma_{yx}=0$$)>>, we can define the spectral GC as:

{{{#!latex
\begin{eqnarray*}
F_{y \to x}(w)= \ln \left( \frac{S_{xx}(w)}{H_{xx}(w)\Sigma_{xx}H_{xx}^{\star}(w) } \right)
\end{eqnarray*}
}}}

Note that the ability to reformulate the equations in the spectral domain also makes it easy to generalize GC to the multivariate case; this is why multivariate measures of GC are defined in the spectral domain. Alternative measures derived from these multivariate autoregressive estimates are the [[https://en.wikipedia.org/wiki/Brain_connectivity_estimators#Partial_Directed_Coherence|Partial Directed Coherence (PDC)]] and the [[https://en.wikipedia.org/wiki/Brain_connectivity_estimators#Directed_Transfer_Function|Directed Transfer Function (DTF)]].

== A few noteworthy practical issues about GC ==
 1. '''Temporal resolution''': the high time resolution offered by MEG, EEG, and intracranial EEG allows for a very powerful application of GC, and also offers the important advantage of spectral analysis.
 1. '''Stationarity''': the GC methods described so far are all based on AR models, and therefore assume [[https://en.wikipedia.org/wiki/Stationary_process|stationarity]] of the signal (constant auto-correlation over time). However, neuroscience data, especially task-based data such as event-related potentials (ERPs), are mostly non-stationary. There are two possible approaches to solve this problem. The first is to apply methods such as differencing, filtering, and smoothing to make the data stationary (see the recommendations for time-domain GC), although dynamical changes in the connectivity profile cannot be detected with this approach. The second approach is to turn to versions of GC that have been adapted for non-stationary data, either by using a non-parametric estimation of GC, or through measures of time-varying GC, which estimate dynamic parameters with adaptive or short-time window methods ([[https://doi.org/10.1016/j.neuroimage.2010.02.059|Bressler and Seth, 2011]]).
 1. '''Number of variables''': Granger causality is very time-consuming in the multivariate case for many variables (large <<latex($$N$$)>>, where <<latex($$N$$)>> represents the number of variables). Since each connection pair results in two values, there will also be a large number of statistical comparisons that need to be controlled for. When performing GC in the spectral domain, this number increases even more, as statistical tests have to be performed per frequency. Therefore it is usually recommended to select a limited number of ROIs or electrodes, based on some hypothesis found in previous literature, or on some initial processing with a simpler and less computationally heavy measure of connectivity.
 1. '''Pre-processing''': the influence of pre-processing steps such as filtering and smoothing on GC estimates is a crucial issue. Studies have generally suggested limiting filtering to artifact removal or to improving the stationarity of the data, but cautioned against band-pass filtering to isolate causal influence within a specific frequency band ([[https://doi.org/10.1016/j.neuroimage.2010.02.059|Bressler and Seth, 2011]]).
 1. '''Volume Conduction''': Granger causality can be performed either in the sensor domain or in the source domain. Though spectral-domain GC generally does not incorporate present values of the signals in the model, it is still not immune to spurious connectivity measures due to volume conduction (for a discussion, see [[https://doi.org/10.1007/s10548-016-0538-7|Steen et al., 2016]]). Therefore it is recommended to reduce the problem of signal mixing with additional processing steps, such as source localization, and to perform the connectivity analysis in the source domain.
 1. '''Data length''': because of the number of parameters that need to be estimated, the number of data points should be sufficient for a good fit of the model. This is especially true for windowing approaches, where the data is cut into smaller epochs. A rule of thumb is that the number of estimated parameters should be several times (~10) smaller than the number of data points.

In the next sections we present a hands-on example with simulated signals where Granger causality (in the time and frequency domains) can be used. Since the GC method is based on AR modelling of signals, the simulated signals are generated with an AR model, thus providing a fair ground truth to validate the GC results.

== Simulate signals ==
In this example, we generate a dataset for a 3-variate AR model (example 2 in the study of [[https://doi.org/10.1007/PL00007990|Baccala and Sameshima, 2001]]).
These observed time series are assumed to be generated by the equations:

{{{#!latex
\begin{eqnarray*}
x_1(n) &=& \hphantom{-}0.5x_1(n-1)+0.3x_2(n-1)+0.4x_3(n-1)+w_1(n) \\
x_2(n) &=& -0.5x_1(n-1)+0.3x_2(n-1)+x_3(n-1)+w_2(n) \\
x_3(n) &=& -0.3x_2(n-1)-0.2x_3(n-1)+w_3(n)
\end{eqnarray*}
}}}

where <<latex($$w_1(n), w_2(n), w_3(n)$$)>> are zero-mean uncorrelated white noise processes with identical variances. Since the model order in this example is one, we can rewrite the equations in matrix form as such:

{{{#!latex
\begin{eqnarray*}
X(n) = \left(\begin{array}{ccc} \hphantom{-}0.5 & \hphantom{-}0.3 & \hphantom{-}0.4 \\ -0.5 & \hphantom{-}0.3 & \hphantom{-}1.0 \\ \hphantom{-}0.0 & -0.3 & -0.2\\ \end{array}\right) X(n-1) + \left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array}\right) W(n)
\end{eqnarray*}
}}}

To simulate this in Brainstorm, we use the following steps:
 1. Select the menu File > Create new protocol > "TutorialGC" and select the options:
  1. "Yes, use protocol's default anatomy"
  1. "No, use one channel file per acquisition run (MEG/EEG)".
 1. In the '''''Process1''''' tab, leave the file list empty and click on the button '''''[Run]''''' ( {{https://neuroimage.usc.edu/moin_static198/brainstorm1/img/iconRun.gif}} ) to open the [[Tutorials/PipelineEditor#Selecting_processes|Pipeline editor]].
 1. Select the process '''''Simulate » Simulate AR signals (ARfit)''''', and set the parameters as follows:

{{{
% Coefficient matrix A:
A1 = [ 0.5  0.3  0.4;
      -0.5  0.3  1.0;
       0.0 -0.3 -0.2];
A = [A1];
% Intercept b:
b = [0 0 0];
% Noise covariance matrix C:
C = eye(3,3);
}}}

{{attachment:arfit_gui.png||width="400"}}

This is a 10-second simulation with a 100 Hz sampling rate. The process will give you a data file presented in Brainstorm as:

{{attachment:sim_in_tree2.png||width="250"}}

When displaying the data as time series:

{{attachment:sim_plot2.png}}

=== Process options ===
 * '''Subject name''': Specify in which subject the simulated file is saved.
 * '''Condition name''': Specify the folder in which the file is saved (default: "Simulation").
 * '''Number of time samples''': Number of data samples per simulation.
 * '''Signal sampling frequency''': Sampling frequency of the simulated signal in Hz.
 * '''Coefficient matrix A''': Coefficient matrix as defined in the equations above. There are `p` (model order) `A` matrices, each of size `[Nsignals_to, Nsignals_from]`. For orders `p>1`, the coefficient matrices are concatenated as: `A = [A1, A2, ..., Ap];`.
 * '''Intercept b''': Constant offset for each variable, as defined by a matrix with size `[1, Nsignals]`.
 * '''Noise covariance matrix C''': Covariance matrix for the error (noise) term. Since we are modelling an uncorrelated white process with identical variances, this is equivalent to an identity matrix with size `[Nsignals, Nsignals]`.

== Bivariate (temporal) Granger Causality NxN ==
 1. To compute Granger causality, place the simulated data in the '''''Process1''''' tab. <<BR>> {{attachment:sim_process1.png||width="500"}}
 1. Click on '''''[Run]''''' to open the [[Tutorials/PipelineEditor#Selecting_processes|Pipeline editor]], and select '''''Connectivity » Bivariate Granger causality NxN'''''. <<BR>> {{attachment:gc_time.png||width="400"}}
 1. The calculated connectivity profile is saved in the file '''''Granger: full'''''. Right-click on it, and select '''Display as image [NxN]'''. <<BR>> {{attachment:gc_time_matrix.png||width="500"}} <<BR>> In this image, an asymmetrical matrix of GC estimates is shown, with non-negative values. Note that this map is not a multivariate analysis of our multivariate system, but instead a 3-by-3 bivariate analysis. Estimating the correct pattern of propagation in a multivariate system with bivariate connectivity measures can be problematic; nevertheless, the results still show a fairly good estimation of the connectivity, as can be seen in the strong propagation of activity from signal 3 to signal 2, as in the originally described model. The next strongest connections are from signal 1 to signal 2, and from signal 3 to signal 1. Note that the connections from signal 2 to signal 1 and from signal 2 to signal 3 show similar values, as they have the same coefficient magnitude in the simulation model, though with opposite signs. Also note that while the relative strengths of connectivity are preserved in the results, the value itself is a ratio of the residual errors of the full and reduced models as described in the [[#Mathematical_background|introduction]], and should not be confused with the actual coefficient values.
 1. You can also show the results as a graph: right-click on the Granger file, then select '''Display as graph [NxN]'''. See the [[Tutorials/ConnectivityGraph|connectivity graph tutorial]] for a detailed explanation of the options of this visualization. (figure 7)

=== Process options ===
 * '''Time window:''' Segment of the signal used for the connectivity analysis. Check '''''All file'''''.
 * '''Sensor types or names:''' Leave it empty.
 * '''Include bad channels:''' Check it.
 * '''Removing evoked response''': Check this box to remove the averaged evoked response from each trial. This is recommended by some authors, as it helps meet the zero-mean stationarity requirement (it improves the stationarity of the system). However, the problem with this approach is that it does not account for trial-to-trial variability. For a discussion, see [[https://doi.org/10.1016/j.neuroimage.2008.03.025|Wang et al., 2008]].
 * '''Model order:''' Order of the model; see the recommendations in the next section. While our simulated signals were created with a model order of 1, here we use a model order of 3 to obtain a decent connectivity result.
 * '''Save individual results (one file per input file):''' Option to save the GC estimates of several input files separately.
 * '''Concatenate input files before processing (one file):''' Option to concatenate several input files and save their GC estimates as one matrix.

=== On the hard drive ===
The output of the '''Bivariate Granger causality NxN''' process is an NxN matrix saved as a time-frequency (TF) file, and it represents the coupling strength between variables. You can see its contents with a right-click on the Granger file, then '''''File » View file contents'''''.

{{attachment:gc_time_file.png||width="600"}}
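As a cross-check outside Brainstorm, both halves of this example — simulating the 3-variate AR model and computing the time-domain GC ratio from the residual variances of the reduced and full AR fits — can be sketched in a few lines of Python. This is a hypothetical NumPy snippet for illustration only, not the MATLAB code that Brainstorm actually runs:

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) coefficients of the simulated 3-variate model (Baccala and Sameshima, 2001)
A1 = np.array([[ 0.5,  0.3,  0.4],
               [-0.5,  0.3,  1.0],
               [ 0.0, -0.3, -0.2]])

# Simulate the model with unit-variance white noise, then drop the initial transient
N = 5000
X = np.zeros((N, 3))
for n in range(1, N):
    X[n] = A1 @ X[n - 1] + rng.standard_normal(3)
X = X[500:]

def lagged(sig, p):
    """Columns sig(n-1) ... sig(n-p), aligned with sig[p:]."""
    return np.column_stack([sig[p - k:len(sig) - k] for k in range(1, p + 1)])

def resid_var(target, regressors, p):
    """Residual variance of an OLS AR fit of `target` on the past of `regressors`."""
    Z = np.hstack([lagged(r, p) for r in regressors])
    y = target[p:]
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.mean((y - Z @ beta) ** 2)

# Bivariate time-domain GC: F[i, j] = ln(Var(e1) / Var(e2)) for the pair j -> i
p = 1
F = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            reduced = resid_var(X[:, i], [X[:, i]], p)            # own past only
            full    = resid_var(X[:, i], [X[:, i], X[:, j]], p)   # own past + past of j
            F[i, j] = np.log(reduced / full)

print(np.round(F, 3))  # F[1, 2] (signal 3 -> signal 2) should be among the largest entries
```

Because the reduced model's regressors are a subset of the full model's, ordinary least squares guarantees the estimated ratio is at least 1, so all entries are non-negative up to numerical precision, matching the matrix displayed above.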
=== Recommendations ===
The '''model order used for GC is crucial''': too low an order may lack the necessary details, while too high an order tends to create spurious values of connectivity. The most common criteria used to define the order of the model are the [[https://en.wikipedia.org/wiki/Akaike_information_criterion|Akaike information criterion]] (AIC), the [[https://en.wikipedia.org/wiki/Bayesian_information_criterion|Bayesian-Schwarz criterion]] (BIC), and the [[https://en.wikipedia.org/wiki/Hannan–Quinn_information_criterion|Hannan-Quinn criterion]].

As mentioned before, a model order should be selected with care. In electrophysiological data, a model order should also make some '''biological sense'''. For example, in EEG data with a 2 kHz sampling rate, an order of 5 would correspond to 2.5 ms of past information incorporated to train your model, which is probably too little past information to make a decent prediction. Note that taking an arbitrarily high model order is not recommended either, as it will increase the type I error. We will refrain from recommending a rule-of-thumb model order, and instead advise using information criteria such as AIC or BIC to compare model fits:

{{{#!latex
\begin{eqnarray*}
AIC(p) & = & \ln(\det(\Sigma)) + \frac{2pk^2}{N} \\
BIC(p) & = & \ln(\det(\Sigma)) + \frac{\ln(N)pk^2}{N}
\end{eqnarray*}
}}}

where <<latex($$\Sigma$$)>> is the noise covariance matrix, <<latex($$p$$)>> the model order, <<latex($$k$$)>> the number of variables, and <<latex($$N$$)>> the number of data samples.

On a similar note, the '''sampling rate''' of the data is not an easy choice to make either. A sampling rate that is too low will eliminate possible dynamics of the data, while a sampling rate that is too high makes the model computationally very expensive, as a very high model order will be required. For most MEG / EEG data, a sampling rate between 200 and 1000 Hz will probably suffice ([[https://mitpress.mit.edu/books/analyzing-neural-time-series-data|Cohen, 2014]]).
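To make the order-selection criteria concrete, here is a small Python sketch (hypothetical, using plain NumPy least squares rather than Brainstorm's ARfit routine) that fits VAR models of increasing order to data simulated from a VAR(1) process and evaluates AIC and BIC as defined above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a stable 3-variate VAR(1) process (true model order is 1)
A1 = np.array([[ 0.5,  0.3,  0.4],
               [-0.5,  0.3,  1.0],
               [ 0.0, -0.3, -0.2]])
N = 4000
X = np.zeros((N, 3))
for n in range(1, N):
    X[n] = A1 @ X[n - 1] + rng.standard_normal(3)

def noise_cov(X, p):
    """Residual covariance matrix Sigma of an OLS VAR(p) fit."""
    Z = np.hstack([X[p - k:len(X) - k] for k in range(1, p + 1)])  # lagged predictors
    Y = X[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    E = Y - Z @ B
    return E.T @ E / len(E)

k = X.shape[1]
aic, bic = [], []
for p in range(1, 6):
    Sigma = noise_cov(X, p)
    n_eff = len(X) - p
    aic.append(np.log(np.linalg.det(Sigma)) + 2 * p * k**2 / n_eff)
    bic.append(np.log(np.linalg.det(Sigma)) + np.log(n_eff) * p * k**2 / n_eff)

best_p = int(np.argmin(bic)) + 1   # order with the lowest BIC
```

On data like this, BIC tends to recover the true order of 1, while AIC may prefer a slightly higher order, consistent with its weaker penalty term.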
As mentioned in the introduction, the current model is based on the assumption of stationarity. Therefore it is important to first check whether the time series we are using are stationary, with statistical tests such as the [[https://en.wikipedia.org/wiki/KPSS_test|KPSS test]] or the [[https://en.wikipedia.org/wiki/Dickey–Fuller_test|Dickey-Fuller (unit root) test]]. If the data is shown to be non-stationary, certain techniques can still render it stationary (depending on the type of non-stationarity). A common approach is to difference the data to eliminate any trends in the time series (i.e., <<latex($$x_{t}^{'}=x_{t}-x_{t-1}$$)>>). This can be done several times if necessary. A problem with differencing is that it can change the interpretation of any resulting GC, specifically in the frequency domain, where differencing acts as a high-pass filter. Other commonly used methods are smoothing or filtering the data, but if there are causal relations that vary over time, these techniques will make it impossible to find them. An alternative approach is to divide the data into time series that are (sufficiently) locally stationary, and apply GC per time window. This method makes it possible to study changes in connectivity over time, but it is not without its own problems, the most important of which is the choice of the '''time window''': if the time window is too short, there will not be enough data points to make a decent estimation.

== Bivariate spectral Granger causality NxN ==
 1. To estimate GC in the frequency domain, drag the simulated signals to the '''''Process1''''' tab. <<BR>> {{attachment:sim_process1.png||width="500"}}
 1. Click on '''''[Run]''''' to open the [[Tutorials/PipelineEditor#Selecting_processes|Pipeline editor]], and select '''''Connectivity » Bivariate Granger causality (spectral) NxN'''''. <<BR>> {{attachment:gc_freq.png||width="400"}}
 1. As with the temporal GC, right-click on the SpGranger file, and select '''Display as image [NxN]'''. Also, to view the spectral profile for each bivariate connection pair, right-click on the SpGranger file, then select '''Power Spectrum'''. <<BR>> {{attachment:gc_freq_res.png}}

In the legend, the first variable refers to the sender and the second to the receiver of information. As we can see from the figure, the large coupling strength from signal 2 to signal 3 is highest in the higher frequency range. You can select your desired connectivity pair, and then press Enter to view it in a separate window.

=== Process options ===
With respect to the (temporal) GC, the spectral GC presents two extra parameters:
 * '''Maximum frequency resolution''': Width of the frequency bins in the PSD estimation. A large number of bins (high resolution) will increase the number of statistical comparisons. Note that the frequency resolution has to be less than half the highest frequency of interest.
 * '''Highest frequency of interest''': Highest frequency for the analysis. Based on the Nyquist theorem, it should be <= Fs/2.

=== On the hard drive ===
The output of the '''Bivariate Granger causality (spectral) NxN''' process is an NxN matrix saved as a time-frequency (TF) file, and it represents the coupling strength between variables. In contrast to the temporal GC, this time the NxN matrix is frequency-resolved. You can see its contents with a right-click on the SpGranger file, then '''''File » View file contents'''''.

{{attachment:gc_freq_file.png||width="600"}}
== Additional documentation ==
==== Articles ====
 * Granger CW. <<BR>> [[https://doi.org/10.2307/1912791|Investigating causal relations by econometric models and cross-spectral methods.]] <<BR>> Econometrica: Journal of the Econometric Society. 1969;424–38.
 * Geweke JF. <<BR>> [[https://doi.org/10.2307/2288723|Measures of conditional linear dependence and feedback between time series.]] <<BR>> Journal of the American Statistical Association. 1984;79(388):907–15.
 * Bressler SL, Seth AK. <<BR>> [[https://doi.org/10.1016/j.neuroimage.2010.02.059|Wiener–Granger causality: a well established methodology.]] <<BR>> Neuroimage. 2011 Sep 15;58(2):323–9.
 * Baccalá LA, Sameshima K. <<BR>> [[https://doi.org/10.1007/PL00007990|Partial directed coherence: a new concept in neural structure determination.]] <<BR>> Biol Cybern. 2001 May 11;84(6):463–74.

==== Forum discussions ====
 * Connectivity matrix storage: [[http://neuroimage.usc.edu/forums/showthread.php?1796-How-the-Corr-matix-is-saved|http://neuroimage.usc.edu/forums/showthread.php?1796]]