Daddy Awesome's Writings

All the things I am writing good or bad


Project maintained by daddyawesome. Hosted on GitHub Pages.

ANOVA Test on Female Employment Rate by Continent

Data Analysis Tools - Week 1


Data

Data for this study comes from the Gapminder World Dataset collected by the Gapminder Foundation. The Gapminder World Dataset contains data collected from more than 200 countries/areas for more than 500 variables.

Description of Variables

The variables used in this analysis are described below.

  1. Continent - The continent each country belongs to. Continent is the categorical explanatory variable.

  2. Female Employment Rate (variable code: femaleemployrate, Unit: Percentage) - Employed females (age > 15) as a percentage of the total female population. Female Employment Rate is the response variable.

Start with the imports.

Load the Gapminder dataset:

data = pd.read_csv('gapminder.csv',low_memory=False)
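The loading step can be sketched as below. This is a minimal illustration, not the exact code from the original notebook: the helper name `load_gapminder` and the three-row inline sample are made up so the snippet runs without the real `gapminder.csv`. The numeric coercion is an assumption about how blank cells are handled.

```python
import io

import pandas as pd


def load_gapminder(source):
    """Read a Gapminder-style CSV and make the response variable numeric.

    `source` can be a file path such as 'gapminder.csv' or any file-like object.
    """
    data = pd.read_csv(source, low_memory=False)
    # Blank or non-numeric cells become NaN instead of raising an error.
    data["femaleemployrate"] = pd.to_numeric(
        data["femaleemployrate"], errors="coerce"
    )
    return data


# Made-up sample standing in for the real file (assumption, not real data).
sample = io.StringIO(
    "country,femaleemployrate\n"
    "Alpha,45.2\n"
    "Beta, \n"
    "Gamma,60.1\n"
)
data = load_gapminder(sample)
print(data.dtypes)
print(data)
```

With the real file, the call would simply be `load_gapminder('gapminder.csv')`.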

I will use a URL to fetch the data online.


Join the two dataframes
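A join of this kind might look like the sketch below. Only the name `df_outer` comes from this post. The `continents` table, its column names, the join key `country`, and all values are hypothetical stand-ins, since the original second dataframe is not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables (made-up values).
data = pd.DataFrame(
    {
        "country": ["Alpha", "Beta", "Gamma"],
        "femaleemployrate": [45.2, 51.0, 60.1],
    }
)
continents = pd.DataFrame(
    {
        "country": ["Alpha", "Beta", "Delta"],
        "continent": ["Africa", "Asia", "Europe"],
    }
)

# An outer join keeps every country from both tables; values missing on
# either side are filled with NaN.
df_outer = pd.merge(data, continents, on="country", how="outer")
print(df_outer)
```

An outer join never silently drops a country, which makes mismatched country names easy to spot as rows with missing values.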


New DataFrame for Analysis

We create a dataframe, sub, from the merged dataframe df_outer.
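One way to build `sub` is sketched below, assuming the merged table has `continent` and `femaleemployrate` columns. The `continent` column name and the sample rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical merged table; only the names df_outer and sub come from the post.
df_outer = pd.DataFrame(
    {
        "country": ["Alpha", "Beta", "Gamma", "Delta"],
        "continent": ["Africa", "Asia", None, "Europe"],
        "femaleemployrate": [45.2, 51.0, 60.1, np.nan],
    }
)

# Keep only the two variables the ANOVA needs and drop incomplete rows,
# because the model cannot use a row that lacks either value.
sub = df_outer[["continent", "femaleemployrate"]].dropna().copy()
print(sub)
```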

We use the ordinary least squares (OLS) function to calculate the F-statistic and its associated p-value.
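A minimal sketch of this step with the statsmodels formula API follows. The twelve data points are made up purely so the snippet runs on its own, so the numbers it prints are not the results of this study. The column name `continent` is an assumption.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up sample data (not Gapminder values).
sub = pd.DataFrame(
    {
        "continent": ["Africa"] * 4 + ["Asia"] * 4 + ["Europe"] * 4,
        "femaleemployrate": [
            62.0, 58.5, 66.1, 60.3,
            41.2, 47.8, 39.5, 44.0,
            50.1, 48.7, 52.3, 49.9,
        ],
    }
)

# C(...) marks continent as a categorical explanatory variable.
model = smf.ols("femaleemployrate ~ C(continent)", data=sub).fit()

# The overall F-test of the fitted model is the one-way ANOVA test.
print("F-statistic:", model.fvalue)
print("p-value:", model.f_pvalue)
```

`model.summary()` prints the same F-statistic and p-value along with the per-group coefficients.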




Since our p-value is 0.0455, which is smaller than 0.05, the data provides significant evidence against the null hypothesis that all continents have the same mean female employment rate. However, the ANOVA alone does not tell us which continents differ from each other. Comparing every pair of continents with separate tests would inflate the chance of a Type I error, so we need to perform a post hoc test.
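The reason pairwise comparisons inflate the Type I error rate can be shown with a little arithmetic. The sketch below assumes the pairwise tests are independent, which is only an approximation, and uses six groups as an illustrative number rather than the exact number of continents in the data.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return the number of pairwise comparisons and the chance of at least
    one false positive across them, assuming independent tests."""
    n_pairs = comb(n_groups, 2)
    return n_pairs, 1 - (1 - alpha) ** n_pairs


# Six groups is an assumption chosen for illustration.
n_pairs, fwer = familywise_error_rate(6)
print(n_pairs, round(fwer, 3))
```

Under these assumptions, 15 comparisons at the 0.05 level give roughly a 54% chance of at least one false positive, which is why a post hoc procedure that controls the family-wise error rate is used instead.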

Tukey HSD test
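A Tukey HSD comparison can be run with statsmodels roughly as follows. The data is again made up so the snippet is self-contained, and the column name `continent` is an assumption. The output is not the result reported in this post.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up sample data (not Gapminder values).
sub = pd.DataFrame(
    {
        "continent": ["Africa"] * 4 + ["Asia"] * 4 + ["Europe"] * 4,
        "femaleemployrate": [
            62.0, 58.5, 66.1, 60.3,
            41.2, 47.8, 39.5, 44.0,
            50.1, 48.7, 52.3, 49.9,
        ],
    }
)

# Compare every pair of continents while holding the family-wise
# error rate at alpha.
result = pairwise_tukeyhsd(
    endog=sub["femaleemployrate"],
    groups=sub["continent"],
    alpha=0.05,
)
print(result.summary())
```

In the summary table, each row is one pair of groups, and the `reject` column shows whether the null hypothesis of equal means is rejected for that pair.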


From the result above, the only significant difference is between the female employment rates of Africa and Asia.

For every other pair of continents, we fail to reject the null hypothesis.