GLM Level 2+#
!!! note
Quotes on this page are from Poldrack, Mumford and Nichols (2011), abbreviated as PMN below, unless otherwise noted.
Group-level analyses can be level 2 or level 3 (i.e. the second or the third analysis step). If your design includes multiple runs per subject, you should combine the run-level estimates within each subject before running group-level analyses. This is typically done with a fixed-effects model, which combines each subject's level 1 (run-level) estimates by weighting them by the inverse of the within-run standard deviations (PMN, p. 104). In this case, level 2 refers to the subject-level analysis that follows the run-level (level 1) analyses, and the group-level analysis becomes level 3. The group-level analysis then uses a mixed-effects model to combine the subject-level estimates.
Fixed effect modeling across runs#
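The fixed-effects combination of runs can be sketched for a single subject and voxel as a precision-weighted average. This is a minimal illustration, not the implementation of any particular package: the function name and the example numbers are hypothetical, and the weights are written as inverse variances, which is what a weighted least squares fit yields when each run's row is scaled by the inverse of its standard deviation.

```python
import numpy as np

def fixed_effects_combine(run_estimates, run_variances):
    """Precision-weighted average of run-level estimates (one subject, one voxel)."""
    est = np.asarray(run_estimates, dtype=float)
    var = np.asarray(run_variances, dtype=float)
    weights = 1.0 / var                      # noisier runs get smaller weights
    combined = np.sum(weights * est) / np.sum(weights)
    combined_var = 1.0 / np.sum(weights)     # variance of the combined estimate
    return combined, combined_var

# Hypothetical contrast estimates and within-run variances from three runs
estimate, variance = fixed_effects_combine([2.0, 4.0, 3.0], [1.0, 1.0, 4.0])
print(estimate, variance)
```

Only within-run variance enters the weights here, which is why this step is a fixed-effects model: it says nothing about how much subjects differ from each other.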
Mixed effects modeling across subjects#
See section 6.1.1. of PMN for why we need mixed effects modeling.
To estimate a mixed-effects model, one can take the individual within-subject variance estimates from the level 1 analyses and use a weighted linear regression to down-weight subjects with high within-subject variability. Alternatively, one can make the simplifying assumption that the within-subject variance is the same for all subjects. This implies the same mixed-effects variance (the sum of the within- and between-subject variances) for every subject, so ordinary least squares can be used to estimate that summed mixed-effects variance.
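The two options above can be sketched for a single voxel as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the weighted version takes the between-subject variance as a known input, whereas in practice it has to be estimated from the data (usually iteratively).

```python
import numpy as np

def ols_group_t(subject_estimates):
    """One-sample t-test assuming the same mixed-effects variance for all subjects."""
    y = np.asarray(subject_estimates, dtype=float)
    mean = y.mean()
    # The sample variance across subjects estimates within + between variance together
    se = y.std(ddof=1) / np.sqrt(y.size)
    return mean, mean / se

def weighted_group_mean(subject_estimates, within_variances, between_variance):
    """Weighted group mean that down-weights subjects with high within-subject variance.

    between_variance is assumed known here (hypothetical simplification).
    """
    y = np.asarray(subject_estimates, dtype=float)
    total_var = np.asarray(within_variances, dtype=float) + between_variance
    weights = 1.0 / total_var
    mean = np.sum(weights * y) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return mean, se

# Hypothetical data: the third subject has a very noisy level 1 estimate
y = [1.0, 1.0, 5.0]
print(ols_group_t(y))
print(weighted_group_mean(y, within_variances=[1.0, 1.0, 100.0], between_variance=1.0))
```

With equal within-subject variances the weighted mean reduces to the plain average, which is the case where ordinary least squares is sufficient.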
(Group-level) Inference#
How to define what is significant#
Once you have computed an image of test statistics for the group, you need to determine which of these statistics indicate meaningful activations/differences.
Voxel-level definitions#
Since each step of the analysis has been done at the voxel level until this point, a natural way to decide which test statistics are “significant” is to look at each voxel in isolation. If the statistic passes a threshold (one that accounts for multiple comparisons, as described below), you can infer that there is a meaningful signal in that voxel. This approach is very intuitive, but it does not take into account the spatial nature of the data or preprocessing steps such as smoothing, which might reduce the peak activation in any given voxel.
Cluster-level definitions#
Another approach, which accounts for the spatial nature of the data and is less susceptible to peak activations (or the lack thereof) in individual voxels, is cluster-level inference. In this approach one identifies not individual voxels but connected clusters of voxels whose activity changes systematically above some threshold, which is typically lower than the one applied in voxel-level inference. Often many individual voxels in a “significant” cluster do not exceed the threshold used for voxel-level inference, but their connected, systematic change implies a meaningful difference for the contrast of interest.
Cluster-level inference involves three choices. First, one determines the “cluster-forming threshold”, a threshold applied to the test statistic in each voxel. Then, one defines what constitutes a “connection” between voxels in a cluster, i.e. whether neighboring voxels must share a face, an edge, or only a corner. Finally, a “cluster-size threshold” (accounting for multiple comparisons) is determined to select clusters that are large enough (contain a sufficient number of voxels) to be considered significant. This traditional approach ignores the individual test statistic values within a cluster as long as they pass the cluster-forming threshold. “Cluster mass” based approaches address this by including that information in their cluster definitions as well.
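The three choices described above map onto a few lines of code. The sketch below uses `scipy.ndimage.label` on a toy statistic map; the function name and the threshold values are hypothetical, and in a real analysis the cluster-size threshold would come from a procedure that corrects for multiple comparisons rather than being picked by hand.

```python
import numpy as np
from scipy import ndimage

def find_clusters(stat_map, forming_threshold, min_cluster_size, connectivity=1):
    """Return a mask of voxels in surviving clusters and the sizes of all clusters."""
    # In 3D, connectivity=1 joins voxels sharing a face, 2 adds edges, 3 adds corners
    structure = ndimage.generate_binary_structure(stat_map.ndim, connectivity)
    labels, n_clusters = ndimage.label(stat_map > forming_threshold, structure=structure)
    sizes = np.bincount(labels.ravel(), minlength=n_clusters + 1)
    sizes[0] = 0                      # label 0 is the background, not a cluster
    keep = sizes >= min_cluster_size
    keep[0] = False
    return keep[labels], sizes[1:]

# Toy 3D map with one 4-voxel cluster and one isolated voxel
stat_map = np.zeros((5, 5, 5))
stat_map[1:3, 1:3, 1] = 3.0
stat_map[4, 4, 4] = 3.0
mask, sizes = find_clusters(stat_map, forming_threshold=2.3, min_cluster_size=3)
print(mask.sum(), sizes)
```

Note that every voxel above the cluster-forming threshold counts equally toward the cluster size, regardless of how far above the threshold it is.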
Threshold-free cluster enhancement#
The major drawback of cluster-based inference is that the choice of the initial cluster-forming threshold is relatively arbitrary but very consequential for the resulting clusters. Threshold-free cluster enhancement (TFCE) was developed to address this. It integrates over possible values of the cluster-forming threshold and produces a “voxel-level map that indicates cluster-level significance” (PMN, p. 115).
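The integration over cluster-forming thresholds can be approximated by stepping through threshold heights and accumulating, for each voxel, the extent of the cluster it belongs to at that height. The sketch below is a rough illustration only: the function name and step size are hypothetical, the exponents `E=0.5` and `H=2.0` are the commonly used defaults from the original TFCE paper (Smith and Nichols, 2009) rather than anything stated in PMN, and only positive statistics are enhanced.

```python
import numpy as np
from scipy import ndimage

def tfce(stat_map, dh=0.1, E=0.5, H=2.0, connectivity=1):
    """Approximate TFCE scores by summing extent**E * height**H over thresholds."""
    structure = ndimage.generate_binary_structure(stat_map.ndim, connectivity)
    scores = np.zeros(stat_map.shape, dtype=float)
    for h in np.arange(dh, stat_map.max() + dh, dh):
        labels, n_clusters = ndimage.label(stat_map >= h, structure=structure)
        if n_clusters == 0:
            break                     # no voxel reaches this height
        sizes = np.bincount(labels.ravel(), minlength=n_clusters + 1)
        sizes[0] = 0                  # background contributes nothing
        extent = sizes[labels]        # cluster size for every voxel at height h
        scores += (extent ** E) * (h ** H) * dh
    return scores

# Toy 2D map: a broad plateau and an isolated voxel of the same height
stat_map = np.zeros((10, 10))
stat_map[1:5, 1:5] = 3.0
stat_map[8, 8] = 3.0
scores = tfce(stat_map)
print(scores[2, 2], scores[8, 8])
```

The voxel inside the plateau receives a higher score than the isolated voxel even though their raw statistics are identical, which is the sense in which the output is a voxel-level map carrying cluster-level information. The scores themselves are not p-values and still need a null distribution to be thresholded.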
Correction for multiple comparisons#
Two terms you’ll read about in this context are family-wise error rate (FWE) and false discovery rate (FDR). Both try to control Type I errors (false positives) across multiple tests. For a single test this is done by setting the alpha level, but when many tests are involved (e.g. one per voxel in mass univariate analyses) the number of tests needs to be accounted for as well.
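A quick calculation shows why the number of tests matters. The sketch below assumes independent tests, which is a simplification for illustration (neighboring voxels are correlated), and the function names are hypothetical. It computes the chance of at least one false positive when every test uses the same uncorrected alpha, alongside the per-test alpha that a Bonferroni correction would use instead.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

for n_tests in (1, 10, 100, 100_000):
    print(n_tests, familywise_error_rate(0.05, n_tests), bonferroni_alpha(0.05, n_tests))
```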
FWE#
There are multiple methods for controlling FWE, including Bonferroni correction, Random Field Theory (RFT) and parametric simulations, but these are either too conservative or too computationally intensive. An alternative is to use nonparametric simulations. Though these too can be computationally intensive, nonparametric approaches are slightly more intuitive, do not make parametric assumptions about the distribution of the test statistic, and allow you to create empirical null distributions (which can be done for any kind of test statistic, not just t-values). Empirical null distributions are created by permutation tests, in which the labels of the data are repeatedly shuffled and the test statistic is recomputed with the new labels. One can then compare the observed test statistic to the distribution of shuffled test statistics to determine whether it is too extreme to have been observed under the null distribution. Note, however, that the key assumption here is that observations are independent of each other. This means that a permutation test can be applied to test statistics across subjects (since subjects are independent of each other) but not within a subject (at least not easily), due to temporal autocorrelation in the data. Additionally, voxel-level tests, especially with small samples, rarely survive FWE correction. Therefore, nonparametric options controlling for FWE are typically restricted to analyses across subjects, i.e. group-level analyses.
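A permutation test across subjects can be sketched as follows. This example makes several assumptions that are not from the text above: it compares two groups of subjects with a t-statistic, and it controls FWE by recording the maximum statistic across voxels for each shuffle (the max-statistic approach), which is one common way to turn a permutation null distribution into FWE-corrected p-values. The function name and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def max_stat_permutation_test(data, labels, n_permutations=500, seed=0):
    """FWE-corrected p-values per voxel from shuffling group labels across subjects.

    data: array of shape (n_subjects, n_voxels); labels: boolean group membership.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def abs_t(lab):
        t, _ = stats.ttest_ind(data[lab], data[~lab], axis=0)
        return np.abs(t)

    observed = abs_t(labels)
    null_max = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle which subject belongs to which group, keep the largest statistic
        null_max[i] = abs_t(rng.permutation(labels)).max()
    # Count the observed labeling as one permutation so p is never exactly zero
    exceed = (null_max[:, None] >= observed[None, :]).sum(axis=0)
    p_fwe = (1 + exceed) / (n_permutations + 1)
    return observed, p_fwe

# Simulated data: 24 subjects, 50 voxels, a group difference only in voxel 0
rng = np.random.default_rng(1)
labels = np.arange(24) < 12
data = rng.normal(size=(24, 50))
data[labels, 0] += 3.0
observed, p_fwe = max_stat_permutation_test(data, labels)
print(p_fwe[0], p_fwe[1:].min())
```

Because whole subjects are shuffled, the spatial correlation between voxels is preserved in every permutation, and nothing is assumed about the shape of the null distribution.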
FDR#
Still, voxel-level tests need to account for multiple comparisons as well. FDR correction procedures are a less conservative way of doing this. These procedures (not detailed here) aim to control the expected false discovery proportion (FDP), which is the fraction of false positives among the detected voxels. This makes FDR more sensitive than FWE for detecting signal, but also less specific. The FDR is only guaranteed on average across many experiments in the long run, so the FDP can be higher or lower in any given experiment. FDR also has no spatial specificity: the significance of every voxel is treated in the same way.
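As one concrete example of an FDR procedure, the sketch below implements the Benjamini-Hochberg step-up rule, which is a widely used choice but is an assumption here, since the text does not name a specific procedure. It is valid for independent or positively dependent tests. The function name and the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests declared significant at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared against (i / m) * q
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()    # largest rank that passes its threshold
        reject[order[:k + 1]] = True      # reject it and every smaller p-value
    return reject

# Hypothetical p-values from ten voxels
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, q=0.05))
```

The rule depends only on the sorted list of p-values, not on where the voxels are located, which is the lack of spatial specificity mentioned above.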
Masks, interactions, conjunctions, ROI analyses#
ROI analysis, a.k.a. small volume correction,