All the things I am writing good or bad
To answer these questions, we are going to use the Gapminder data.
Start with the import and load the data:
import pandas as pd
data = pd.read_csv('gapminder.csv', low_memory=False)
I will be using a URL to get the data online (the data is stored on GitHub).
Converting selected columns to numeric dtypes:
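A minimal sketch of this conversion, assuming the columns of interest are named `suicideper100th`, `employrate`, `incomeperperson`, and `breastcancerper100th`. Those names and the small sample frame are assumptions made for illustration. `pd.to_numeric` with `errors="coerce"` turns blank or non-numeric entries into `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-in for the loaded data; column names and values are assumed.
data = pd.DataFrame({
    "country": ["A", "B", "C"],
    "suicideper100th": ["12.5", " ", "4.1"],
    "employrate": ["55.2", "61.0", " "],
    "incomeperperson": ["1500.3", " ", "27000.9"],
    "breastcancerper100th": ["20.1", "35.6", " "],
})

numeric_cols = [
    "suicideper100th",
    "employrate",
    "incomeperperson",
    "breastcancerper100th",
]

# Convert each selected column; unparseable entries become NaN.
for col in numeric_cols:
    data[col] = pd.to_numeric(data[col], errors="coerce")

print(data.dtypes)
```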
Our new dataframe, sub_copy, contains all rows with a high suicide rate, which we are going to compare with employment rate, income per person, and breast cancer rate.
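One way such a subset could be built is sketched below. The cutoff for "high" is not stated here, so this example assumes it means above the mean suicide rate. The threshold, the column names, and the sample values are all illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumed.
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "suicideper100th": [2.0, 15.0, 8.0, 19.0],
    "employrate": [48.0, 55.0, 62.0, 58.0],
    "incomeperperson": [900.0, 4200.0, 15000.0, 2300.0],
    "breastcancerper100th": [18.0, 22.0, 75.0, 30.0],
})

# Assumed definition of "high": above the mean suicide rate.
threshold = data["suicideper100th"].mean()

# Keep only the rows above the threshold; .copy() avoids
# SettingWithCopyWarning when new columns are added later.
sub_copy = data[data["suicideper100th"] > threshold].copy()

print(sub_copy)
```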
From the frequency distribution above, we can see that group 2, which covers employment rates from 51 to 59, contains the largest number of countries with a high suicide rate.
From the frequency table above, we found that there is no substantial difference between the quartiles when it comes to high suicide rate.
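A quartile frequency table of this kind can be sketched with `pandas.qcut`. This example assumes the quartiles are taken over income per person in a column named `incomeperperson`. The variable, the column name, and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical high-suicide-rate subset; values are made up for illustration.
sub_copy = pd.DataFrame({
    "incomeperperson": [300.0, 800.0, 1500.0, 2600.0,
                        5200.0, 9000.0, 18000.0, 32000.0],
})

# Split the column into four equal-sized groups (quartiles).
sub_copy["income_quartile"] = pd.qcut(
    sub_copy["incomeperperson"], 4, labels=["Q1", "Q2", "Q3", "Q4"]
)

# Counts and proportions per quartile, kept in quartile order.
counts = sub_copy["income_quartile"].value_counts(sort=False)
percents = sub_copy["income_quartile"].value_counts(sort=False, normalize=True)

print(counts)
print(percents)
```

Because `qcut` splits on quantiles, the groups hold roughly equal numbers of rows by construction, so a comparison between quartiles is more informative when the quartiles are computed on the full data and then counted within the high suicide rate subset.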
For the breast cancer rate, I grouped the data into 4 groups by number of breast cancer cases (1-23, 24-46, 47-69, 70-92) using the pandas.cut function.
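The grouping described above could look roughly like the sketch below. The column name `breastcancerper100th`, the new column name, and the sample values are assumptions. The bin edges follow the four ranges given in the text.

```python
import pandas as pd

# Hypothetical subset; values are made up for illustration.
sub_copy = pd.DataFrame({
    "breastcancerper100th": [10.5, 19.8, 23.0, 30.2, 50.1, 88.7],
})

# Bin edges matching the four groups; intervals are right-inclusive,
# i.e. (0, 23], (23, 46], (46, 69], (69, 92].
bins = [0, 23, 46, 69, 92]
labels = ["1-23", "24-46", "47-69", "70-92"]

sub_copy["breast_cancer_group"] = pd.cut(
    sub_copy["breastcancerper100th"], bins=bins, labels=labels
)

# Frequency of rows in each group, kept in group order.
group_counts = sub_copy["breast_cancer_group"].value_counts(sort=False)
print(group_counts)
```

Since the rates are continuous, a value such as 23.5 falls into the "24-46" group under these edges.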
Countries with a lower breast cancer rate tend to have a high suicide rate.