Introduction

As part of the final project for Univ.ai's A1C1 course, our team (Vishnu, Sakthisree, Niegil and Rishabh) set out to use Machine Learning to explore an age-old question: what exactly makes us happy?

The World Happiness Report was recommended as a good starting point for gauging worldwide happiness. Its data served us well throughout our analysis, although by the end we came to understand that the variables in this report alone are probably not sufficient to measure a country's happiness accurately, since "happiness" is highly relative in nature.

Six factors are measured per country to explain the World Happiness Index, along with two Dystopia-related values. They consist of:

  1. GDP per Capita - Gross Domestic Product per capita for the countries

  2. Family - Satisfaction Rank of Family

  3. Life Expectancy - Average expected years of life

  4. Freedom - Perception of freedom quantified

  5. Generosity - Numerical value estimated based on the perception of Generosity experienced by poll takers in their country.

  6. Trust/Government Corruption - A quantification of the people's perceived trust in their governments.

  7. Dystopia Score - Score of Dystopia, a hypothetical saddest country in the world, used as a baseline for comparison.

  8. Dystopia Residual - The part of a country's Happiness Score that the six factors above do not explain, measured relative to Dystopia.

The Happiness Score calculated in the report is actually an average of the responses to the main life evaluation question asked in the Gallup World Poll (GWP), which uses the Cantril Ladder.

The Cantril Ladder asks respondents to imagine a ladder with steps numbered 0 to 10, where the top step represents the best possible life they can think of, and, with that as the benchmark, to score their current life.

Credits:

  1. Univ.ai Professor Pavlos Protopapas
  2. Kaggle Datasets
  3. Aashita Kesarwani - https://www.kaggle.com/aashita/guide-to-animated-bubble-charts-using-plotly - for demonstrating beautiful ways to plot bubble charts
  4. Jesper Sören Dramsch - https://www.kaggle.com/jesperdramsch/the-reason-we-re-happy - for demonstrating wonderful means of doing data analysis
  5. Jamaç Eren Ay - https://www.kaggle.com/yamaerenay/world-happiness-report-preprocessed - for preparing preprocessed datasets and making them freely available to all

Problem Statement

Given the data available per country to gauge the Happiness Index, our aim is to:

  1. Part A - Analyze and understand which factors affect the Happiness Index Score of countries
  2. Part B - Analyze and understand the relationship between Terror Attacks and Happiness Index
  3. Part C - Create a Model to predict the Happiness Index of a Country
  4. Part D - Assess how much Health contributes to the Happiness Index and, given the current pandemic, predict COVID-19 cases per country for the coming days
  5. Part E - Create a Dashboard for viewing COVID-19 Predictions

Part A

Analyze and understand which factors affect the Happiness Index Score of countries

Exploratory Data Analysis

Our objective here is to look through the datasets and perform some basic analysis to gain insights.

A look into Correlation

Spearman's Rank Correlation Coefficient measures the strength and direction of the monotonic relationship between two variables.

  • The Spearman rank correlation coefficient, ρ, considers the ranks of the values of the two variables. ρ is always a value between -1 and 1.

  • The further away ρ is from zero, the stronger the relationship between the two variables. The sign of ρ corresponds to the direction of the relationship. If it is positive, then as one variable increases, the other tends to increase. If it is negative, then as one variable increases, the other tends to decrease.

  • You use Spearman’s correlation if your data have a non-linear relationship (like an exponential relationship) or you have one or more outliers. However, Spearman’s correlation is only appropriate if the relationship between your variables is monotonic.
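The points above can be illustrated with a minimal sketch that computes ρ by ranking both variables and taking the Pearson correlation of the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the data are hypothetical and are not taken from the project code; in practice a library routine such as pandas `DataFrame.corr(method="spearman")` would normally be used instead.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows exponentially with x (monotonic but non-linear).
x = [1, 2, 3, 4, 5, 6]
y = [2 ** v for v in x]

rho = spearman(x, y)  # ranks agree perfectly
r = pearson(x, y)     # the linear fit is imperfect

print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because the relationship is monotonic, the ranks of `x` and `y` coincide and ρ reaches its maximum, while the Pearson coefficient on the raw values stays below 1 since the relationship is not linear.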

                   happiness_score  gdp_per_capita  family  health  freedom  generosity  government_trust  dystopia_residual   year  social_support
happiness_score               1.00            0.80    0.14    0.77     0.54        0.13              0.32               0.23   0.03            0.24
gdp_per_capita                0.80            1.00    0.21    0.78     0.36       -0.01              0.26               0.06  -0.04            0.14
family                        0.14            0.21    1.00   -0.07     0.01        0.23              0.10               0.56  -0.59           -0.86
health                        0.77            0.78   -0.07    1.00     0.40       -0.02              0.18              -0.05   0.07            0.38
freedom                       0.54            0.36    0.01    0.40     1.00        0.33              0.43              -0.00   0.06            0.23
generosity                    0.13           -0.01    0.23   -0.02     0.33        1.00              0.24               0.16  -0.10           -0.18
government_trust              0.32            0.26    0.10    0.18     0.43        0.24              1.00               0.13   0.02           -0.02
dystopia_residual             0.23            0.06    0.56   -0.05    -0.00        0.16              0.13               1.00   0.09           -0.59
year                          0.03           -0.04   -0.59    0.07     0.06       -0.10              0.02               0.09   1.00            0.43
social_support                0.24            0.14   -0.86    0.38     0.23       -0.18             -0.02              -0.59   0.43            1.00

Inference: From the correlation matrix above, GDP per Capita (0.80), Health (0.77) and Freedom (0.54) are the three factors most strongly correlated with the Happiness Index.
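As a quick check on this inference, the sketch below ranks the factors by the absolute value of their correlation with happiness_score. The numbers are copied from the happiness_score row of the matrix above; the function name `top_factors` is hypothetical and is not part of the project code.

```python
# Spearman correlations with happiness_score, copied from the matrix above.
corr_with_happiness = {
    "gdp_per_capita": 0.80,
    "family": 0.14,
    "health": 0.77,
    "freedom": 0.54,
    "generosity": 0.13,
    "government_trust": 0.32,
    "dystopia_residual": 0.23,
    "year": 0.03,
    "social_support": 0.24,
}


def top_factors(correlations, n=3):
    """Return the n (name, rho) pairs with the largest absolute correlation."""
    ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


top3 = top_factors(corr_with_happiness)
for name, rho in top3:
    print(f"{name}: {rho:.2f}")
```

Sorting on the absolute value means a strongly negative correlation would also rank highly, which matters when looking for factors that move with happiness in either direction.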

Univariate Analysis

This type of analysis involves a single variable. Univariate analysis does not deal with causes or relationships; its main purpose is to describe the data and find patterns that exist within it.
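A minimal sketch of a univariate summary, using only the Python standard library. The scores are hypothetical values chosen for illustration, not figures from the dataset, and `describe` is a hypothetical helper name.

```python
import statistics


def describe(values):
    """Summarize a single variable with basic descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical happiness scores, for illustration only.
scores = [7.5, 6.9, 5.8, 5.1, 4.4, 3.6]
summary = describe(scores)

for key, value in summary.items():
    print(f"{key}: {value}")
```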

Bivariate Analysis

This type of analysis involves two different variables. It deals with causes and relationships, and is done to find out how the two variables relate to each other.
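One simple way to describe how two variables relate is to fit a straight line through their paired values. The sketch below does this with ordinary least squares; the data pairs are hypothetical, `fit_line` is a hypothetical helper name, and a fitted slope on its own describes an association rather than establishing a cause.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (gdp_per_capita, happiness_score) pairs, for illustration only.
gdp = [0.4, 0.8, 1.0, 1.3, 1.5]
happiness = [4.0, 5.1, 5.6, 6.6, 7.0]

slope, intercept = fit_line(gdp, happiness)
print(f"happiness ~ {slope:.2f} * gdp + {intercept:.2f}")
```

A positive slope indicates that, in this sample, higher values of the first variable go together with higher values of the second.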