Conducting Repeated-Measures t-test in R and Python
Repeated Measures t-test Tutorial: Analyzing Data using R and Python
Repeated Measures t-test
The repeated measures t-test (also known as the paired-samples t-test) is used to determine whether there are significant differences between the means of two related groups in a within-subjects design.
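To make the definition concrete, here is a minimal Python sketch of what the test computes: each pair of scores is reduced to a difference, and the mean difference is divided by its standard error. The six scores are made up for illustration and the helper name paired_t is hypothetical; SciPy's ttest_rel() is used only as a cross-check.

```python
import numpy as np
from scipy import stats

def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test computed from difference scores."""
    diffs = np.asarray(before, dtype=float) - np.asarray(after, dtype=float)
    n = diffs.size
    se = diffs.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference
    return diffs.mean() / se, n - 1

# Made-up scores for six hypothetical participants
before = [4.0, 5.5, 3.0, 6.0, 5.0, 4.5]
after = [5.0, 6.0, 4.5, 6.5, 5.5, 6.0]

t_manual, df = paired_t(before, after)
t_scipy = stats.ttest_rel(before, after).statistic

print("t (by hand):", t_manual)
print("t (SciPy):  ", t_scipy)
print("df:", df)
```

The two t values should agree up to floating-point error, which shows that the paired test is a one-sample test on the difference scores.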
Example Scenario
Let’s consider an example from marketing. Suppose a company wants to assess how effective its new ad campaign is at changing customers’ interest in a new product. It collects data from 250 participants, who report their attitude towards the product before viewing the ad campaign (i.e., prior) and after viewing the ads (i.e., post). By conducting a repeated measures t-test, the company can determine whether attitudes differ significantly between the two time points.
Conducting Repeated Measures t-test in R
Step 1: Simulating Data
First, we simulate the data for our example scenario. We create a data frame called data containing columns for participant ID, pre-campaign attitudes (i.e., prior), and post-campaign attitudes (i.e., post).
set.seed(97) # For reproducibility
# Simulating data for the example scenario
data <- data.frame(
participant_id = 1:250,
prior = rnorm(250, mean = 5, sd = 1),
post = rnorm(250, mean = 6, sd = 1)
)
Step 2: Conducting Repeated Measures t-test
Next, we can perform the repeated measures t-test using the t.test() function in R. We pass the columns that contain the prior and post attitude data and request a paired-samples t-test by setting paired = TRUE.
# Performing repeated measures t-test
t_test_result <- t.test(data$prior, data$post, paired = TRUE)
# Calculate effect size (requires the effsize package to be installed)
effsize::cohen.d(data$prior, data$post, paired = TRUE)
# Print the t-test result
print(t_test_result)
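The output provides the t-statistic (t = -9.68), degrees of freedom (df = 249), p-value (p < .001), and 95% confidence interval for the mean difference [-1.07, -0.70]. Because the difference is computed as prior minus post, the negative values mean that attitudes were higher after the campaign than before it. The results indicate that the campaign had a statistically significant and large effect (d = 0.89 in magnitude) on users’ attitudes.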
Conducting Repeated Measures t-test in Python
Step 1: Simulating Data
Similarly, we need to simulate data for our example scenario using Python. We can use the NumPy and pandas libraries to create a data frame.
import numpy as np
import pandas as pd
np.random.seed(97) # For reproducibility
# Simulating data for the example scenario
data = pd.DataFrame({
'participant_id': np.arange(1, 251),
'prior': np.random.normal(loc = 5, scale = 1, size = 250),
'post': np.random.normal(loc = 6, scale = 1, size = 250)
})
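The simulation above draws prior and post independently, so the two columns are uncorrelated. In real repeated-measures data, the two scores from the same participant are usually correlated. The following is a minimal sketch of one way to simulate such data with a bivariate normal distribution. The correlation of 0.5 and the name paired_data are assumptions made for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(97)  # seed chosen to mirror the tutorial

n = 250
rho = 0.5  # assumed prior/post correlation (illustrative only)
cov = [[1.0, rho], [rho, 1.0]]  # unit variances, so the covariance equals rho

# Each row is one participant: column 0 = prior, column 1 = post
scores = rng.multivariate_normal(mean=[5.0, 6.0], cov=cov, size=n)

paired_data = pd.DataFrame({
    "participant_id": np.arange(1, n + 1),
    "prior": scores[:, 0],
    "post": scores[:, 1],
})

# The sample correlation should land near the assumed value of rho
print(paired_data[["prior", "post"]].corr())
```

With positively correlated scores, the difference scores vary less than they do for independent draws, so the same mean difference typically produces a larger t-statistic in absolute value.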
Step 2: Conducting Repeated Measures t-test
To perform the repeated measures t-test in Python, we can use the ttest_rel() function from the scipy.stats module. Note that Cohen’s d for the repeated measures t-test is calculated here as the mean difference divided by the standard deviation of the difference scores. The df attribute of the result is only available in recent SciPy releases.
from scipy import stats
# Performing repeated measures t-test
results = stats.ttest_rel(data['prior'], data['post'])
# Calculating 95% confidence interval
conf_interval = stats.t.interval(0.95, results.df, loc=np.mean(data['prior'] - data['post']), scale=stats.sem(data['prior'] - data['post']))
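# Calculate Cohen's d from the difference scores (post - prior, so a positive d means an increase)
diff = data['post'] - data['prior']
mean_diff = np.mean(diff)
sd_diff = np.std(diff, ddof=1)  # sample SD of the differences, not a pooled SD
cohens_d = mean_diff / sd_diff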
# Print the t-test result
print("T-statistic:", results.statistic)
print("df:", results.df)
print("P-value:", results.pvalue)
print("Confidence Interval:", conf_interval)
print("Cohen's d:", cohens_d)
The output displays the t-statistic (t = -11.54), the p-value (p < .001), and the 95% confidence interval [-1.16, -0.82], which indicate that the campaign had a statistically significant, medium-to-large effect (d = 0.73) on users’ attitudes toward the new product. The numbers differ from the R results because R and NumPy generate different random draws even when given the same seed.
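Effect sizes for paired data can be standardized in more than one way, which is one reason values from different tools may not match. The sketch below compares two common variants: the mean difference divided by the standard deviation of the differences (often labelled d_z), and the mean difference divided by the average of the two standard deviations (often labelled d_av). The function name paired_effect_sizes and the simulated inputs are hypothetical, and the sketch makes no claim about which variant any particular R package reports.

```python
import numpy as np
from scipy import stats

def paired_effect_sizes(before, after):
    """Return (d_z, d_av) for paired scores, using after - before as the difference."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diffs = after - before
    # d_z: standardize by the sample SD of the difference scores
    d_z = diffs.mean() / diffs.std(ddof=1)
    # d_av: standardize by the average of the two sample SDs
    avg_sd = (before.std(ddof=1) + after.std(ddof=1)) / 2
    d_av = diffs.mean() / avg_sd
    return d_z, d_av

# Illustrative data drawn the same way as in the tutorial (independent normals)
rng = np.random.default_rng(0)
before = rng.normal(5, 1, 250)
after = rng.normal(6, 1, 250)

d_z, d_av = paired_effect_sizes(before, after)
t_stat = stats.ttest_rel(after, before).statistic

print("d_z:", d_z)
print("d_av:", d_av)
print("t / sqrt(n):", t_stat / np.sqrt(before.size))  # equals d_z
```

The last line illustrates that d_z is the t-statistic divided by the square root of the sample size, so it can be recovered directly from the test output.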
Congratulations! You have successfully conducted a repeated measures t-test using both R and Python.