Running an Analysis of Variance (ANOVA) test

Ganiyah Issa-Onilu
Published in Analytics Vidhya
2 min read, Oct 25, 2020

You may remember the blog series I shared recently on my solutions to the weekly assignments in a Data Management and Visualization course offered through Coursera. In that series, I looked for an association between two variables in the Gapminder dataset in order to test my hypothesis and answer the research question.

Data Analysis Tools — Analysis of Variance test

Now I have decided to take the research further by performing a statistical test: the Analysis of Variance (ANOVA) test. ANOVA assesses whether the means of two or more groups are statistically different from each other. It is appropriate whenever you want to compare the mean of a quantitative variable across the groups of a categorical variable. The null hypothesis is that the mean of the quantitative variable does not differ across the groups, while the alternative is that at least one group mean differs.
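To make the comparison of group means concrete, here is a minimal sketch of how the one-way ANOVA F-statistic is computed. It is not the program used in this post; the function name `one_way_anova_f` and the toy numbers are made up for illustration and are not Gapminder values.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (hypothetical, for illustration only)
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```

A large F means the group means are spread out relative to the variation inside each group. Turning F into a p-value needs the F distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.f_oneway`) reports both values.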

I had to refine my research question because I wanted to compare the mean of the quantitative response variable (co2emissions) across four groups created from the explanatory variable (relectricperperson). The refined research question is:

Does the level of electricity consumed per person have an association with the carbon dioxide emitted in the environment?

Hypotheses for statistical test

  1. The null hypothesis is that there is no difference in the mean of the co2emissions variable across the groups of the relectricperperson variable.
  2. The alternative hypothesis is that the mean of the co2emissions variable differs for at least one group of the relectricperperson variable.
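Written in symbols, the two hypotheses can be sketched as follows. The notation is my own shorthand, not taken from the course: each \mu stands for the mean of co2emissions within one of the four electricity consumption groups.

```latex
H_0:\ \mu_{\text{low}} = \mu_{\text{lower-medium}} = \mu_{\text{upper-medium}} = \mu_{\text{high}}
\qquad
H_a:\ \mu_i \neq \mu_j \ \text{for at least one pair of groups } i \neq j
```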

The Analysis of Variance (ANOVA) test

To perform the ANOVA test on the two variables selected for the research, I created four groups from the explanatory variable (relectricperperson). These groups are categories of electricity consumption: low, lower-medium, upper-medium and high. The program and a summary of the ANOVA test results are below:

Figure: Means and standard deviations of the four electricity levels
Figure: Summary of ANOVA test result
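For readers who want to try the grouping step themselves, here is a minimal sketch of one way to split a quantitative variable into four levels. It assumes the cut points are quartiles, which may not match the cut points used for the results above. The helper `electricity_level` and the sample values are hypothetical, not Gapminder data.

```python
from statistics import quantiles

LABELS = ["low", "lower-medium", "upper-medium", "high"]


def electricity_level(value, cuts):
    """Map a value to one of four labels using three ascending cut points."""
    for label, cut in zip(LABELS, cuts):
        if value <= cut:
            return label
    return LABELS[-1]  # above the last cut point


# Hypothetical consumption values, for illustration only
consumption = [100, 250, 400, 800, 1200, 2000, 3500, 5000]

# Assumption: quartiles as cut points (three values splitting the data in four)
cuts = quantiles(consumption, n=4)
levels = [electricity_level(v, cuts) for v in consumption]

for value, level in zip(consumption, levels):
    print(value, level)
```

Each observation now carries a categorical label, so the response variable can be grouped by level and the group means compared.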

Interpretation of Result

The ANOVA test report shows an F-statistic of 2.208 and a p-value of 0.0903. Because the p-value is greater than the 0.05 threshold, the null hypothesis cannot be rejected: the test found no statistically significant difference in mean co2emissions across the four electricity levels, and therefore no significant association between the level of electricity consumed and the carbon dioxide emitted.

A post hoc test is not necessary because the ANOVA result is not significant.

This is the end of my solution and I hope you found my article interesting. Please leave a note if you have any feedback or suggestions. Thank you.

Ganiyah Issa-Onilu
Analytics Vidhya

A Data Scientist and Visual Storyteller with a strong interest in Data Analytics and Business Intelligence.