Behavioral scientists have used N-of-1 trials for more than a century [1]. Guyatt et al. (1986) brought these trials, with randomized crossover designs, to the attention of mainstream medical research in the 1980s (see also Gabler et al., 2011) [2, 3]. In this era of personalized medicine, N-of-1 trials are appearing in medical research with increasing frequency [4]. Consequently, guidelines for these studies were added to the Consolidated Standards of Reporting Trials (CONSORT) in 2015; see the CONSORT Extension for N-of-1 Trials (CENT) [5]. The CENT guidelines call for statistical methods that account for within-subject correlation. This call echoes what reviews of N-of-1 studies often note: N-of-1 data exhibit serial correlation (e.g., first-order autocorrelation), and most studies fail to account for it [1, 3, 6, 7].
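To make the notion of first-order autocorrelation concrete, the following minimal sketch computes the usual (uncorrected) lag-1 sample autocorrelation of a single series. The function name `lag1_autocorrelation` is hypothetical, and this is the textbook estimator, not necessarily the estimator used by any of the methods cited here.

```python
def lag1_autocorrelation(x):
    """Textbook lag-1 sample autocorrelation of one series.

    Assumption: observations are equally spaced and none are missing.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]  # deviations from the series mean
    # Sum of products of neighbouring deviations ...
    numerator = sum(d[t] * d[t + 1] for t in range(n - 1))
    # ... scaled by the total sum of squared deviations.
    denominator = sum(v * v for v in d)
    return numerator / denominator
```

For the steadily increasing series 1, 2, 3, 4, 5, the deviations are -2, -1, 0, 1, 2, so the estimate is 4/10 = 0.4: neighbouring observations tend to fall on the same side of the mean.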
This work develops a formula-based statistical method for N-of-1 studies that accounts for serial correlation while using only the data from a single individual to draw inferences. Most existing methods emerged with increases in computing power. These methods typically provide inference on two types of differences between two treatments: level-change and rate-change. A level-change is a difference in means that does not depend on the time series of the treatments, whereas a rate-change is a difference in means that does. Rochon (1990) describes a large-sample, maximum likelihood method that evaluates both level- and rate-change, but no closed-form estimator exists [8]; hence, the estimates are produced by an iterative procedure. McKnight et al. (2000) developed a double-bootstrap method for making inference on level- and rate-change [9]. Their first bootstrap estimates serial correlation; the second uses the estimated correlation to compare two treatments. They provide statistical properties for their method, and they focus on trials having as few as 20 or 30 observations. Borckardt and colleagues describe statistical properties of Simulation Modelling Analysis for N-of-1 trials, and consider trials having between 16 and 28 observations from an individual [10, 11]. Simulation Modelling Analysis is similar to a parametric bootstrap method in which replicates are generated under the null hypothesis; it yields empirical p-values for level- and rate-change. Lin et al. (2016) propose semiparametric and parametric bootstrap methods (only one bootstrap needed) for evaluating level- and rate-change [12]. They explore the statistical properties of their method for trials having 28 observations. Other N-of-1 methods exist, but the methods described here are the only ones we could find that use only the observations from a single individual and account for serial correlation.
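The general idea of generating replicates under the null hypothesis can be sketched as follows. This is an illustration only, not the procedure of Borckardt and colleagues or of Lin et al.: it assumes a first-order autoregressive (AR(1)) error model, tests level-change only, and all function names are hypothetical.

```python
import math
import random


def lag1_autocorrelation(x):
    """Textbook lag-1 sample autocorrelation."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    return sum(d[t] * d[t + 1] for t in range(len(d) - 1)) / sum(v * v for v in d)


def ar1_series(n, rho, sd, rng):
    """Stationary AR(1) noise with mean 0 and marginal standard deviation sd."""
    y = [rng.gauss(0.0, sd)]
    innovation_sd = sd * math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, innovation_sd))
    return y


def null_bootstrap_pvalue(a, b, n_rep=2000, seed=0):
    """Empirical p-value for a level-change between phase A and phase B.

    Assumption: phase A is observed first, then phase B, equally spaced.
    """
    rng = random.Random(seed)
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    observed = abs(mean_b - mean_a)
    # Estimate the noise model from residuals around each phase mean.
    resid = [v - mean_a for v in a] + [v - mean_b for v in b]
    rho = max(-0.95, min(0.95, lag1_autocorrelation(resid)))
    sd = math.sqrt(sum(r * r for r in resid) / (n_a + n_b - 2))
    extreme = 0
    for _ in range(n_rep):
        # Replicate generated with no true difference between phases.
        y = ar1_series(n_a + n_b, rho, sd, rng)
        diff = sum(y[n_a:]) / n_b - sum(y[:n_a]) / n_a
        extreme += abs(diff) >= observed
    # Add-one correction keeps the empirical p-value strictly positive.
    return (extreme + 1) / (n_rep + 1)
```

Even this stripped-down version needs a simulation loop and a fitted noise model, which illustrates why such methods are considered computationally intensive compared with a formula-based test.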
All of the methods above are computationally intensive and require either special software or substantial statistical expertise. However, researchers conducting N-of-1 trials seem to prefer simpler analysis methods. Gabler et al. (2011) reviewed analyses conducted in 108 N-of-1 trials and found 52% used visual analysis, 44% used t-tests, and 24% used nonparametric methods (some studies used more than one analysis method) [3]. Punja et al. (2016) reviewed 100 reports of conducted (60%) and planned (40%) N-of-1 trials [13]. Seventy-five of these performed or planned statistical analyses: 53% of these 75 used paired t-tests and 32% used a nonparametric method. Though several of these simple analysis methods use only the observations from one individual, they fail to account for serial correlation. A substantial proportion of researchers using N-of-1 trials sacrifice their need for appropriate analyses to their desire for simplicity. Our goal in this work is to tend to their analytical needs and desires by developing a simple method that uses only the data from a single individual.
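A small simulation can illustrate what is at stake when a usual t-test ignores serial correlation. The sketch below is not taken from this work; it assumes AR(1) noise, two consecutive phases of 14 observations each with no true difference, and a hard-coded two-sided 5% critical value of 2.056 for 26 degrees of freedom. All names are hypothetical.

```python
import math
import random


def ar1_series(n, rho, rng):
    """Stationary AR(1) noise with mean 0 and unit marginal variance."""
    y = [rng.gauss(0.0, 1.0)]
    innovation_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, innovation_sd))
    return y


def pooled_t(a, b):
    """Usual 2-sample t-statistic with pooled variance (ignores correlation)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled_var = ss / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))


def type1_rate(rho, n_per_phase=14, n_sim=4000, crit=2.056, seed=0):
    """Share of null simulations in which the usual t-test rejects at 5%.

    crit is the two-sided 5% critical value for 26 df (14 + 14 - 2);
    change it if n_per_phase changes.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        y = ar1_series(2 * n_per_phase, rho, rng)
        t = pooled_t(y[:n_per_phase], y[n_per_phase:])
        rejections += abs(t) > crit
    return rejections / n_sim
```

Comparing `type1_rate(0.0)` with `type1_rate(0.5)` shows how the rejection rate under the null changes once positive serial correlation is present.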
These simulated data represent N-of-1 trials with m crossovers of treatments A and B, randomized within block (left panels). The differences between A and B within a block (right panels) may be suitably analyzed with paired serial t-tests for level-change (top panels) and for rate-change (bottom panels). The true means are represented with lines, and serially correlated observations with points.
These simulated data represent N-of-1 trials where a series of observations from treatment A are observed first, followed by a series of observations from treatment B. These data may be suitably analyzed with 2-sample serial t-tests for level-change (a) and for rate-change (b). The true means are represented with lines, and serially correlated observations with points.
Limitations. While the serial t-tests demonstrate better Type I error rates, power, and confidence interval width estimation than the usual t-tests often used in N-of-1 trials, Type I error is still substantial, power optimistic, and interval widths biased for small m. This is mainly due to the inaccuracy that remains in estimating ρ. Although r is bias-corrected, bias still exists; the bias is towards 0 for level-change tests, and is negative for rate-change tests. Type I errors are affected by the biased r through the standard errors and degrees of freedom, both functions of ρ. The effect of this bias on Type I errors comes more through the estimated degrees of freedom than through the standard errors. However, increasing m improves inflated Type I error, optimistic power, and biased margins of error for serial t-tests, particularly for the level-change tests; these 3 properties do not improve in the usual t-tests.
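The difficulty of estimating ρ from short series can be seen even with the textbook estimator. The sketch below averages the uncorrected lag-1 sample autocorrelation over simulated AR(1) series; it does not implement the bias-corrected r discussed above, and the function names are hypothetical.

```python
import math
import random


def lag1_autocorrelation(x):
    """Textbook (uncorrected) lag-1 sample autocorrelation."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    return sum(d[t] * d[t + 1] for t in range(len(d) - 1)) / sum(v * v for v in d)


def ar1_series(n, rho, rng):
    """Stationary AR(1) noise with mean 0 and unit marginal variance."""
    y = [rng.gauss(0.0, 1.0)]
    innovation_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, innovation_sd))
    return y


def mean_lag1_estimate(rho, n, n_sim=2000, seed=0):
    """Average uncorrected estimate of rho over n_sim series of length n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        total += lag1_autocorrelation(ar1_series(n, rho, rng))
    return total / n_sim
```

Calling `mean_lag1_estimate(0.5, 10)` and `mean_lag1_estimate(0.5, 200)` contrasts a short series with a long one; the average estimate moves toward the true value of 0.5 as the series lengthens.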
Although the serial t-tests do not account for carryover effects, the absence of carryover effects is often assumed in applications of N-of-1 trials [5]; nevertheless, users need to carefully design experiments to limit any carryover effect that may arise when comparing treatments. These serial t-statistics assume that observations are equally spaced in time, with no missing observations. For unequally spaced observations, the estimator of ρ will be biased toward 0. The variance is also assumed to be homogeneous, which is common for most t-test applications. For N-of-1 trials, this assumption is likely not unrealistic; see Table 1 of Rochon (1990) for an example [8].
The 2-sample serial t-tests assume independence between conditions A and B; however, since the data come from the same person, this assumption is likely violated, as illustrated when pairing observations from separate discounting tasks by delay (Section 4.2). When possible, for N-of-1 trials with treatments occurring one after the other (as in bi-phasic, pre-post, and ABAB designs), planning the same series for both treatments will allow the use of paired serial t-tests, which can have greater power when data between the two treatments are truly pair-wise correlated. The delay discounting example illustrated this characteristic.
These serial t-tests are an improvement over often-employed usual t-tests and other methods that fail to account for serial correlation. Additionally, the serial t-tests are easy for researchers to implement. Further work is still needed to adjust for N-of-1 trials with few observations, particularly for rate-change tests. Nevertheless, we believe these serial t-tests will make appropriate analyses for N-of-1 trials more accessible to researchers, and allow them to make better decisions for individuals undergoing N-of-1 trials.
Note that our analyses focused on the two extreme cases of repetition trials, with one versus eight repetitions of the first or second item, while the experiment also included repetition trials with intermediate levels of repetition (see SI). Specifically, other repetition trials included cases in which the second item began to appear at each possible position from 2 to 9. The other repetition trials could therefore include, for instance, three repetitions of the first and six repetitions of the second item, or four repetitions of the first and five repetitions of the second item, etc. The results reported in the SI indicate that effects in these trials show a smooth transition between the extremes shown in the main manuscript.
In order to analyze the neural activation patterns following the presentation of sequential visual stimuli for evidence of sequentiality, we first determined the true serial position of each decoded event for each trial. Specifically, applying the trained classifiers to each volume of the sequence trials yielded a series of predicted event labels and corresponding classification probabilities that were assigned their sequential position within the true sequence that was shown to participants on the corresponding trial.
We hypothesized that sequential order information of fast neural events will translate into order structure in the fMRI signal and, in turn, in successively decoded events. Therefore, we analyzed the fMRI data from sequence trials for evidence of sequentiality across consecutive measurements. The analyses were restricted to the expected forward and backward periods, which were adjusted depending on the sequence speed. For each TR, we obtained the image with the most likely fMRI signal pattern based on the classification probabilities. First, we asked whether we are more likely to decode earlier serial events earlier and later serial events later in the decoding time window of 13 TRs. To this end, we averaged the serial position of the most likely event at every TR, separately for each trial and participant, resulting in a time course of average serial event position across the decoding time window (Fig. 3d). We then compared the average serial event position against the mean serial position (position 3) as a baseline across participants at every time point in the forward and backward period using a series of two-sided one-sample t-tests, adjusted for 38 multiple comparisons (across all five speed conditions and TRs in the forward and backward period) by controlling the FDR [133]. These results are reported in the SI. Next, in order to assess whether the average serial position differed between the forward and backward period for the five different speed conditions, we fit a linear mixed effects (LME) model and entered the speed condition (with five levels) and trial period (forward versus backward) as fixed effects, including by-participant random intercepts and slopes. Finally, we conducted a series of two-sided one-sample t-tests to assess whether the mean serial position in the forward and backward periods differed from the expected mean serial position (baseline of 3) for every speed condition (all p values adjusted for 10 comparisons using FDR correction [133]).
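Adjusting a family of p values to control the FDR can be sketched as follows. The text does not state which FDR procedure was used; this sketch assumes the Benjamini-Hochberg step-up procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant when its adjusted p value falls below the chosen FDR level, for example 0.05.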