Analyzing the Stroop Effect

Perform the analysis in the space below. Remember to follow the instructions and review the project rubric before submitting. Once you've completed the analysis and write-up, download this file as a PDF or HTML file, upload that PDF/HTML into the workspace here (click on the orange Jupyter icon in the upper left then Upload), then use the Submit Project button at the bottom of this page. This will create a zip file containing both this .ipynb doc and the PDF/HTML doc that will be submitted for your project.

(1) What is the independent variable? What is the dependent variable?

The independent variable is the test condition (congruent or incongruent). The dependent variable is the time, in seconds, that each participant took to complete the task.

(2) What is an appropriate set of hypotheses for this task? Specify your null and alternative hypotheses, and clearly define any notation used. Justify your choices.

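The null hypothesis is that, in the population, incongruent tests take the same mean amount of time as congruent tests, while the alternative hypothesis is that incongruent tests take a different mean amount of time. The null reflects the default assumption of no effect: both tests were taken by the same group of people, so if the condition made no difference we would expect their times to match. The alternative is two-sided because I am not assuming in advance which condition is slower.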

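Or, put mathematically, where $\mu_i$ and $\mu_c$ are the population mean times for the incongruent and congruent conditions: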

$H_0: \mu_i - \mu_c = 0$

$H_1: \mu_i - \mu_c \neq 0$

Because the samples are paired (each participant was measured under both conditions), a paired t-test is the natural way to analyze them. A paired t-test assumes that the per-participant differences are approximately normally distributed in the population. In this case, our sample size is small and we do not have more information about the full population who took the test, so we will have to be careful about drawing conclusions from the data. I will address this further below, along with the question of normality.

(3) Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability. The name of the data file is 'stroopdata.csv'.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
%matplotlib inline

df = pd.read_csv('stroopdata.csv')

df.describe()

This dataset contains a sample of 24 participants, with each participant's congruent and incongruent task times. The congruent condition took an average (mean) of 14.05 seconds per participant, while the incongruent condition took an average of 22.02 seconds. The incongruent times also vary more, with a standard deviation of 4.80 seconds vs. 3.56 seconds for the congruent times.

As mentioned, due to the small sample size, conclusions should be taken with a grain of salt.
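Because each participant contributes one time per condition, the per-participant difference can be summarized in the same way. The following is a minimal sketch, assuming the column names `Congruent` and `Incongruent` from the data file; `summarize_differences` is a hypothetical helper, and the four rows are made-up illustration values, not the Stroop data.

```python
import pandas as pd


def summarize_differences(df):
    """Summarize per-participant differences (Incongruent minus Congruent)."""
    diff = df['Incongruent'] - df['Congruent']
    return pd.Series({
        'mean': diff.mean(),
        'median': diff.median(),
        'std': diff.std(),  # sample standard deviation (ddof=1)
        'iqr': diff.quantile(0.75) - diff.quantile(0.25),
    })


# Made-up values for illustration only (NOT the Stroop data).
example = pd.DataFrame({
    'Congruent': [12.0, 15.0, 10.0, 14.0],
    'Incongruent': [20.0, 21.0, 19.0, 24.0],
})

summary = summarize_differences(example)
print(summary)
```

Applied to the real data, the mean of the differences necessarily equals the difference between the two condition means, roughly 7.96 seconds (22.016 minus 14.051).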

(4) Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

plt.hist(df['Congruent'], bins='auto', color='blue', alpha=0.5, label='Congruent');
plt.hist(df['Incongruent'], bins='auto', color='red', alpha=0.5, label='Incongruent');
plt.legend();

Both the congruent and incongruent data appear somewhat positively skewed, with a longer tail towards the right. The incongruent data also shows outliers towards the right, i.e., longer times. Just to double-check, I also drew a box plot and computed the sample skewness of each condition:

fig, ax = plt.subplots()
ax.boxplot([df['Congruent'], df['Incongruent']], sym='+', labels=['Congruent', 'Incongruent']);
print(stats.skew(df['Congruent']), stats.skew(df['Incongruent']))

0.3903776149050634 1.4491357281474857

The box plot and skewness values confirm that both samples are positively skewed, the incongruent sample (skewness of about 1.45) noticeably more so than the congruent sample (about 0.39). However, I suspect that this skew may be due to the small sample size, especially because the mean and median for the congruent group are nearly identical (and the median is very slightly greater than the mean, which is not typical for a positively skewed sample).
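The normality assumption of a paired t-test concerns the per-participant differences rather than each condition on its own, so the same check can be run on the differences. The sketch below is one possible way to do that, using `scipy.stats.skew` and the Shapiro-Wilk test (`scipy.stats.shapiro`), which is a technique I am adding here and not part of the analysis above. `check_difference_normality` is a hypothetical helper, and the data are randomly generated stand-ins, not the Stroop data.

```python
import numpy as np
from scipy import stats


def check_difference_normality(congruent, incongruent):
    """Return skewness and Shapiro-Wilk results for the paired differences."""
    diff = np.asarray(incongruent, dtype=float) - np.asarray(congruent, dtype=float)
    skewness = stats.skew(diff)
    w_stat, p_value = stats.shapiro(diff)  # H0: differences are normally distributed
    return skewness, w_stat, p_value


# Synthetic stand-in data (NOT the Stroop data): 24 paired observations.
rng = np.random.default_rng(0)
congruent = rng.normal(14.0, 3.5, size=24)
incongruent = congruent + rng.normal(8.0, 4.0, size=24)

skewness, w_stat, p_value = check_difference_normality(congruent, incongruent)
print(f"skewness={skewness:.3f}, W={w_stat:.3f}, p={p_value:.3f}")
```

A small Shapiro-Wilk p-value would suggest the differences depart from normality; with only 24 pairs the test has limited power, so a large p-value is weak reassurance at best.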

(5) Now, perform the statistical test and report your results. What is your confidence level or Type I error associated with your test? What is your conclusion regarding the hypotheses you set up? Did the results match up with your expectations? Hint: Think about what is being measured on each individual, and what statistic best captures how an individual reacts in each environment.

stats.ttest_rel(df['Incongruent'], df['Congruent'])

Ttest_relResult(statistic=8.020706944109957, pvalue=4.1030005857111781e-08)

stats.wilcoxon(df['Incongruent'], df['Congruent'], zero_method='pratt')

WilcoxonResult(statistic=0.0, pvalue=1.821529714896801e-05)

First, because my sample did not look normally distributed (probably due to the small sample size), I did some research. Per MeasuringU, the t-test is still robust enough to be accurate with small samples. Because the two samples are related, I ran a paired t-test. I also ran the corresponding non-parametric test, the Wilcoxon signed-rank test, which does not require normality, just to be certain of my results.

In both cases, I set my Type I error threshold at $\alpha = 0.05$ (a 95% confidence level). The paired t-test returned a statistic of 8.02 (the mean difference divided by its standard error) and a p-value of about $4.10 \times 10^{-8}$. The Wilcoxon test returned a statistic of 0, meaning no participant was faster in the incongruent condition, and a p-value of about $1.82 \times 10^{-5}$. Both p-values are far below $\alpha$. Therefore, despite the small sample size, we have evidence to reject the null hypothesis $H_0: \mu_i - \mu_c = 0$. Further study with a larger sample, however, would enable us to shore up or challenge these results and to estimate the population parameter more accurately.
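To make "mean difference over standard error" concrete, the paired t statistic can be computed by hand and compared with `stats.ttest_rel`. This is a minimal sketch under stated assumptions: `paired_t_summary` is a hypothetical helper, the confidence interval is an addition that the analysis above does not report, and the data are randomly generated stand-ins, not the Stroop data.

```python
import numpy as np
from scipy import stats


def paired_t_summary(congruent, incongruent, confidence=0.95):
    """Paired t statistic, two-sided p-value, and CI for the mean difference."""
    diff = np.asarray(incongruent, dtype=float) - np.asarray(congruent, dtype=float)
    n = diff.size
    mean_diff = diff.mean()
    std_err = diff.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference
    t_stat = mean_diff / std_err
    dof = n - 1
    p_value = 2 * stats.t.sf(abs(t_stat), dof)  # two-sided p-value
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, dof)
    ci = (mean_diff - t_crit * std_err, mean_diff + t_crit * std_err)
    return t_stat, p_value, ci


# Synthetic stand-in data (NOT the Stroop data): 24 paired observations.
rng = np.random.default_rng(1)
congruent = rng.normal(14.0, 3.5, size=24)
incongruent = congruent + rng.normal(8.0, 4.0, size=24)

t_stat, p_value, ci = paired_t_summary(congruent, incongruent)
reference = stats.ttest_rel(incongruent, congruent)

print(f"manual: t={t_stat:.4f}, p={p_value:.3g}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
print(f"scipy:  t={reference.statistic:.4f}, p={reference.pvalue:.3g}")
```

The manual statistic and p-value should agree with `stats.ttest_rel` up to floating-point error. With 24 participants, the test has 23 degrees of freedom.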

(6) Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

I believe the difference is due to the greater amount of cognitive processing required by the second task. When shown a color word ("yellow") printed in a different ink color (blue) and asked to say the ink color, a participant has two competing responses and typically has to pause to choose the right one. In the congruent task, the word and color are the same, so there is only one response, and the task becomes much easier to do quickly. That said, it may be that the experiment plays out this way because adults are trained to attend more to reading than to naming colors; other research shows that with training, the Stroop effect can be manipulated in various ways, indicating that the effect probably has as much to do with practice and attentional selection as with processing speed.

I think that a similar effect might be found by showing participants groups of repeated digits and asking them to say how many digits there are (for example, to see "111" and say "three"). While people are trained to count items quickly, I think that most literate adults are trained to read numbers first, and might encounter the same mental pause when asked to count written digits.

Resources:

1) MacLeod, Colin. The Stroop Effect. Accessed at http://imbs.uci.edu/~kjameson/ECST/MacLeod_TheStroopEffect.pdf.

2) MeasuringU. Best Practices For Using Statistics on Small Sample Sizes. Accessed at https://measuringu.com/small-n/.

Output of df.describe() (times in seconds):

	Congruent	Incongruent
count	24.000000	24.000000
mean	14.051125	22.015917
std	3.559358	4.797057
min	8.630000	15.687000
25%	11.895250	18.716750
50%	14.356500	21.017500
75%	16.200750	24.051500
max	22.328000	35.255000