Happiness score correlation

Question

Suppose you are given this dataset, which provides happiness scores across countries along with the factors that contribute to them. The columns include:

  • Year: The year the country was assessed.
  • Overall rank: Rank of the country's (or region's) happiness score for the year.
  • Country or region: The country or region being measured.
  • Score: The happiness score.
  • GDP per capita: The extent to which GDP contributes to the calculation of the happiness score.
  • Social support: The extent to which social support (family) contributes to the calculation of the happiness score.
  • Life expectancy: The extent to which life expectancy contributes to the calculation of the happiness score.
  • Freedom: The extent to which freedom contributes to the calculation of the happiness score.
  • Generosity: The extent to which generosity contributes to the calculation of the happiness score.
  • Perceptions of corruption: The extent to which perception of corruption contributes to the calculation of the happiness score.

Given this, can you identify the factors that have the highest correlation with the happiness score?

To help get you started, below is code to load the dataset into a Pandas dataframe. You can also make a copy of this Google Colab notebook.

# Import packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
# Notebook magic: render plots inline (works in Jupyter/Colab only)
%matplotlib inline

# Load Pearson's r
from scipy.stats import pearsonr

# Read in the data
data = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/world_happiness_2015_2019.csv', parse_dates=True)
data.head()
   Year  Overall rank  Country or region  Score  GDP per capita  Social support  Healthy life expectancy  ...
0  2019             1            Finland  7.769           1.340           1.587                    0.986  ...
1  2019             2            Denmark  7.600           1.383           1.573                    0.996  ...
2  2019             3             Norway  7.554           1.488           1.582                    1.028  ...
3  2019             4            Iceland  7.494           1.380           1.624                    1.026  ...
4  2019             5        Netherlands  7.488           1.396           1.522                    0.999  ...

Solution
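
One possible approach, offered as a sketch rather than the official answer: compute Pearson's r between each numeric factor column and Score, then sort by absolute value so that strong negative relationships are not overlooked. The helper name rank_factor_correlations and the small toy frame are hypothetical and only illustrate the call; the sketch assumes that Year and Overall rank should be left out, since Year is not a factor and Overall rank is derived from Score itself.

```python
import pandas as pd


def rank_factor_correlations(df, target="Score", exclude=("Year", "Overall rank")):
    """Return Pearson's r between each numeric factor and `target`, strongest first."""
    # Keep numeric columns only; text columns such as the country name drop out.
    numeric = df.select_dtypes(include="number")
    # Remove the target and the assumed non-factor columns if they are present.
    factors = numeric.drop(columns=[target, *exclude], errors="ignore")
    # corrwith uses Pearson's r by default.
    correlations = factors.corrwith(numeric[target])
    # Order by strength of the relationship, ignoring its sign.
    order = correlations.abs().sort_values(ascending=False).index
    return correlations.loc[order]


# Made-up values for illustration only; these are not rows from the dataset.
toy = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019, 2019],
    "Overall rank": [1, 2, 3, 4, 5],
    "Country or region": ["A", "B", "C", "D", "E"],
    "Score": [7.0, 6.0, 5.0, 4.0, 3.0],
    "GDP per capita": [1.4, 1.2, 1.0, 0.8, 0.6],
    "Generosity": [0.3, 0.1, 0.3, 0.1, 0.2],
})
ranked = rank_factor_correlations(toy)
print(ranked)
```

With the real data, calling rank_factor_correlations(data) on the dataframe loaded above should give the same kind of ranking. If a p-value is also wanted for a single pair of columns, scipy.stats.pearsonr(x, y) returns both the coefficient and the p-value, but it expects inputs without missing values.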
