Wadekar
Sarah W. Gibbons Middle School
INTRODUCTION AND BACKGROUND
I like to read about politics, the news, and U.S. history. Recently, in my Social Studies class we
finished analyzing the Constitution [1] and Declaration of Independence [2], and I came across
the famous phrase about life and happiness, namely, "Life, Liberty and the pursuit of Happiness."
It got me thinking that although the Declaration of Independence protects our freedom to pursue
happiness as we please, people do not universally agree on what makes them happy. As a result,
happiness is totally subjective and elusive but very much sought after. In fact, for most people,
finding happiness is their life's main goal. I see people all around me engaging in a wide array
of activities that I presume make them happy. Some people pursue challenges that promote their
personal growth, for example, running a marathon, or publishing a book or an article. Some folks
find happiness in climbing the professional ladder such as seeking a promotion at their work
place. Many others invest in family and relationships, while others choose to find joy and meaning
in service. The pursuit of happiness thus takes varied forms and involves many activities that
lead to significant milestones or life events along each individual's chosen journey. Upon
reaching these milestones along each path, however, we may not feel as happy as we thought we
might. Many times, people are asked to rate the happiness levels they experienced in the range of
1-10. Therefore, the central theme of my project is to understand and compare how different life
RESEARCH OBJECTIVES
My main goal is to find out how different life events influenced the happiness levels felt
by people. I formulated two research objectives to explore this goal. The first one is:
R1: On average, do different types of life events bring us similar levels of
happiness?
Average happiness levels, however, may not provide the complete picture. This is
because even when life events X and Y have close average happiness levels, life event X's
distribution across levels 1 through 10 may be very skewed, while the distribution of happiness
levels for life event Y may be very homogeneous. A homogeneous distribution means that the
percentage of people is spread out evenly across all the happiness levels, whereas in a skewed
R2: Is there a consensus about the levels of happiness that people draw from life
Ultimately, I hope that the understanding gathered by exploring the two objectives will
identify opportunities to invest in social policies and programs that will help people in their
To explore these research objectives, I had to look for a data set that would allow
me to compare numerically the different levels of happiness felt by people for various life events.
I came across the life events survey data [3] collected by the myPersonality project [4]. The life
events survey was administered by the myPersonality project via Facebook between August
2010 and May 2012. Respondents were asked to rate 25 significant life events, indicating how
happy each life event made them on a scale of 1 through 10. These life events are listed in Table
1. Consent was obtained by the myPersonality app before respondents completed the survey.
There were a total of 10,053 responses, and if a respondent did not rate a life event, it was
indicated by -1. All the respondents were from the United Kingdom. Respondents were given the
In this part of the questionnaire we would like to know about some of the significant
events that have happened in your life and how they might have affected your happiness. Tick the
box to tell us how long ago the event happened to you and tell us how happy that event made you
on a scale of 1 to 10, where 1 is very unhappy and 10 is very happy. If the event did not happen
In order to conduct my research, I classified the life events into four groups. These four
groups were inspired by a paper by Clark and Oswald that estimates the compensating amounts
or economic impact of different life events on human well-being [5]. In my case, the four groups
reflect the different avenues or dimensions along which people may choose to pursue happiness,
and the milestones reached along the way. I note that this classification of events is not unique,
and any event may fit into a different or more than one category. For example, competing or
serving your country could fall into the Service category, but I chose to include it in the Personal
Growth category. Similarly, retiring could fall into either Personal Growth, or Professional
Microsoft Word and Google Docs for writing the lab report.
Microsoft PowerPoint and Google Sheets for making the slides for the trifold.
Code Generator for Word, to comment and format the code to include it in the report.
ANALYSIS PROCEDURE
To conduct my analysis, I explored several general-purpose programming languages such as Java
[6] and C [7]. I also explored specialized data processing and analysis tools such as R [8] and
Matlab [9]. I found that data analysis tools such as R and Matlab offer many convenient and
efficient data processing and data handling capabilities. The two most important aspects from my
point of view were the capability to read and store csv data files and to perform calculations on all
the elements stored in one row or one column of a table at once, with a single command.
Therefore, I narrowed the choice down to R and Matlab. In the end, I chose R because it is
open source and free to use. I also noticed that the Stack Overflow user community [10] for R
is very active, and its members helped me a lot when I ran into problems. The R code for the analysis of
My first research objective was to explore whether people draw similar average
happiness levels from life events belonging to different groups. To explore this objective, for
each life event, I computed the average happiness levels. Although the total number of respondents
is 10,053, not all respondents rated all life events. Whenever a life event had not
occurred to a respondent, the response for that event was recorded as -1. So, each life event had some missing
responses indicated by a -1. To compute the average happiness level for each life event, I
eliminated the missing responses, added up all the remaining happiness levels, and divided the total
by the number of remaining responses.
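The averaging step described above can be sketched as follows. This is only an illustration in Python, not the R code used for the report; the function name `average_happiness` and the toy ratings are assumptions made up for the example and are not values from the survey.

```python
# Illustrative sketch only: toy ratings, not values from the survey data.
MISSING = -1  # the data set marks "event not rated" with -1


def average_happiness(ratings):
    """Mean of the ratings after dropping the -1 placeholders."""
    valid = [r for r in ratings if r != MISSING]
    if not valid:
        return None  # no respondent rated this event
    return sum(valid) / len(valid)


toy_ratings = [8, -1, 10, 6, -1, 8]  # hypothetical responses for one life event
print(average_happiness(toy_ratings))  # 8.0
```

Dropping the -1 entries before averaging matters: if they were left in, every missing response would pull the average down as if it were a very unhappy rating.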
In the second research objective, I explored whether there is a consensus on the levels of
happiness that people draw from life events in different groups. For example, I wanted to know if
the percentage of respondents that rated life event X with a level of 2 is similar to the percentage
of respondents that rated life event Y with a score of 2. I wanted to compare these distributions
for all the levels 1-10. First, I visually explored and compared the distributions for some pairs of
life events. These four pairs of comparisons are shown below. From plot I in the figure, I could
tell that the distributions of becoming your own boss and buying your dream car are very similar
to each other. But the distributions of learning to ride a bicycle and children leaving home in plot
II are very different. The distributions of event pairs first child vs. getting married and moving
house vs. learning to drive in plots III and IV are not completely similar but not completely
dissimilar either.
[Figure: Distributions of happiness scores for four pairs of life events. In each panel, the horizontal axis is the Happiness Score (1 through 10) and the vertical axis is the Probability. Panel I: Own Boss vs. Dream Car. Panel II: Learning to Bike vs. Children Leaving Home. Panel III: First Child vs. Getting Married. Panel IV: Moving House vs. Learning to Drive.]
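The distributions compared in these plots are the fractions of respondents at each happiness level. A minimal sketch of how such a distribution could be computed is shown here in Python (the report's own analysis was done in R). The function name `happiness_distribution` and the toy ratings are assumptions for illustration, not survey data.

```python
from collections import Counter

LEVELS = range(1, 11)  # happiness levels 1 through 10


def happiness_distribution(ratings):
    """Fraction of valid responses at each level 1..10 (the -1 entries are skipped)."""
    valid = [r for r in ratings if r in LEVELS]
    if not valid:
        raise ValueError("no valid ratings for this life event")
    counts = Counter(valid)
    return [counts[level] / len(valid) for level in LEVELS]


toy_ratings = [10, 8, 8, -1, 5, 10, 10, -1, 7, 8]  # hypothetical responses
dist = happiness_distribution(toy_ratings)
for level, prob in zip(LEVELS, dist):
    print(level, prob)
```

The ten fractions always add up to 1, which is what allows two life events with different numbers of responses to be compared on the same scale.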
Comparing the distributions using histograms gives some idea of how similar the
distributions are. However, this visual approach is not precise. Moreover, with this
approach I cannot quantify how the similarity of one pair of life events compares with that of another pair. Finally, there are 25 life events, and
hence $\binom{25}{2} = 300$ pairs of life events. Comparing these 300 pairs
visually and manually would be tedious. So, I looked for a way to formalize the notion of similarity
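between the distributions. I found the Hellinger distance [11] to measure this similarity. For two
discrete distributions P and Q over the happiness levels 1 through 10, it is computed as

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{10} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$$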
In a discrete distribution, there are a limited number of values and each value is
associated with a probability. In our case, the values are the happiness levels from 1 through 10,
and the probabilities are the fractions of respondents at each level. In the above equation, the $p_i$
are the probabilities of happiness levels 1 through 10 for life event X, and the $q_i$ are the
probabilities of happiness levels 1 through 10 for life event Y. In the extreme case, say when for
life event X distribution P assigns the probability of 1 to level 10 and 0 to all others, and for life
event Y distribution Q assigns the probability 1 to level 1 and 0 to all others, the sum under the
square root takes its maximum value of 2. Therefore, the denominator $\sqrt{2}$ normalizes the Hellinger distance
over the range 0 to 1. I came across Hellinger distance while doing some search on comparing
the distributions presented as histograms. I found that it is used to compare two documents
according to the frequencies of different words that occur in them [13]. Table 3 shows an
example of distributions of happiness levels for two life events, namely, first child and children
leaving home, and also the calculation of Hellinger distance for these two life events.
$$\sum_{i=1}^{10} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2 = 0.1922$$

$$H(P, Q) = 0.3106$$
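The Hellinger distance calculation can be sketched in a few lines of code. The version here is in Python rather than the R used for the report, and the function name `hellinger` is an assumption for illustration. It uses the standard normalized form, in which the square root of the summed squared differences of the square roots is divided by the square root of 2. The two distributions are the extreme case described in the text, not survey data.

```python
import math


def hellinger(p, q):
    """Normalized Hellinger distance between two discrete distributions.

    p and q are lists of probabilities over the same levels.
    The result lies between 0 (identical) and 1 (no overlap).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of levels")
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


# Extreme case from the text: everyone at level 10 vs. everyone at level 1.
p_extreme = [0.0] * 9 + [1.0]
q_extreme = [1.0] + [0.0] * 9
print(hellinger(p_extreme, q_extreme))  # 1.0
print(hellinger(p_extreme, p_extreme))  # 0.0
```

The extreme case gives the largest possible distance of 1, and comparing a distribution with itself gives 0, which matches the 0 to 1 range described above.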
I wrote an R program to compute the Hellinger distance for all 300 pairs of life events. The lowest
distance was 0.037 and the highest was 0.418. Based on Hellinger distance, I explored pairs that are
very similar and very dissimilar.
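Looping over every pair of life events can be sketched as follows, in Python rather than the R used for the report. The names `pairwise_distances`, `toy`, and the three-level toy distributions are assumptions made up for the example; the real analysis used 25 events with ten levels each.

```python
import itertools
import math


def hellinger(p, q):
    """Normalized Hellinger distance between two discrete distributions."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def pairwise_distances(distributions):
    """Map each unordered pair of event names to its Hellinger distance."""
    names = sorted(distributions)
    return {
        (a, b): hellinger(distributions[a], distributions[b])
        for a, b in itertools.combinations(names, 2)
    }


# 25 events give 25 * 24 / 2 pairs.
print(math.comb(25, 2))  # 300

# Hypothetical distributions over three levels, for illustration only.
toy = {
    "event A": [0.2, 0.3, 0.5],
    "event B": [0.3, 0.3, 0.4],
    "event C": [1.0, 0.0, 0.0],
}
distances = pairwise_distances(toy)
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
print("most similar pair:", closest)
print("most dissimilar pair:", farthest)
```

Using `itertools.combinations` counts each pair once, so X vs. Y and Y vs. X are not both computed.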
I found that the average happiness of all the 25 life events varies over a narrow range
from 5.5 to 8.1. This narrow variation suggests that people may pursue very different activities
but draw fairly similar levels of average happiness from these activities. Moreover, with the
lowest score of 5.5, there is no life event in this data set that makes people completely unhappy.
I divided the life events into four groups according to the happiness levels for further
analysis. In the first group, I included events with an average happiness level of greater than 7.5,
and named this category as Mostly Joyous. Most respondents thought that the events in this
category made them very happy, and these events received only very few unhappy ratings. In the
second group, I included events with average happiness levels between 7.00 and 7.5, and
designated this as the Joy with Reservations group. Many respondents felt that the events in
this category made them happy, but the number and values of happiness levels were lower
compared to the topmost group. Also, the events in this group received a higher number of
unhappy ratings compared to those in the first group. I formed the third group from events with
average happiness levels between 6.5 and 7.0, and designated this group as the Mixed
Feelings group. Events in this category received a moderate number of happy ratings, but lower
happiness levels. They also received a sizeable number of unhappy ratings. Finally, in the fourth
group, I included events with average happiness levels less than 6.5, and designated this as the
Bittersweet group. Events in this group received far fewer happy ratings, happiness levels were
lower, and were nearly balanced by unhappy ratings as well. These four groups, their
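The four-way grouping by average happiness can be written as a small rule. The sketch here is in Python, not the R used for the report, and the function name `happiness_group` is an assumption. The text does not say which group an average of exactly 7.5, 7.0, or 6.5 falls into, so the boundary handling in the code is also an assumption. The scores 8.1 and 5.5 are the ends of the range reported above; 7.2 and 6.8 are made-up values.

```python
def happiness_group(mean_score):
    """Assign a life event to one of the four groups by its average happiness.

    Boundary handling (exactly 7.5, 7.0, or 6.5) is an assumption;
    the report does not specify it.
    """
    if mean_score > 7.5:
        return "Mostly Joyous"
    if mean_score >= 7.0:
        return "Joy with Reservations"
    if mean_score >= 6.5:
        return "Mixed Feelings"
    return "Bittersweet"


for score in (8.1, 7.2, 6.8, 5.5):
    print(score, happiness_group(score))
```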
The group of events that belonged to the Mostly Joyous category included those where
people experienced a new thrill, or a new sense of freedom, or total fun and indulgence. For
example, the birth of a first child or engagement is included in this group. At these occasions,
people embark on a new journey, but have little to no anticipation for what lies ahead. Similarly,
learning to drive and learning to ride a bike bring a sense of freedom to young adults and
children respectively. Finally, on a dream holiday people usually indulge themselves into exotic
joyous group. About the events in the Joy with Reservations group, people mostly felt joy but
with some hesitation. Here the happiness may be tempered by a sense of responsibility, effort, or
sacrifice. For example, a promotion or a new job may be accompanied by a hesitation about
higher expectations. Or there may be some joy in a sporting win or in losing weight but people
may think of the effort that they may have to put in. Similarly, people may feel conflicted about
cosmetic surgery after they have undergone the procedure. For events in the Mixed Feelings
group, happiness mostly came from a sense of achievement but people were perhaps immensely
aware of the commitment necessary to achieve a goal. These included first marathon,
representing or serving country, or buying a dream car. For example, there may be a sense of
pride in becoming your own boss but there may also be apprehension because now they have no
one to rely on but themselves. Finally, events in the Bittersweet group included moving
houses, retiring, and children leaving home. In these events, unhappiness may be caused by the
end of a productive phase, but the fact that they have reached that far may be uplifting. Table 5
I split life event pairs based on Hellinger distance. If the distance was less than 0.08, I
considered them to be very similar. In this very similar category, most pairs consist of
individual (personal and professional) events. The only pair of family events that appeared in
this category is getting married and getting engaged, and it is easy to see how these two events
that are related to each other cause people to agree on how they feel. Finally, retiring and
children leaving home are also similar because both mark the end of an important productive
chapter, and the start of a less hectic life. If the Hellinger distance between pairs of life events
was greater than 0.35, I labeled the pairs as very dissimilar. It is very telling that in all the pairs
of dissimilar life events, one event is either retiring or children leaving home. Thus, these two
events generate very different happiness levels compared to other professional and personal
accomplishments. Table 5 shows the grouping of very similar and very dissimilar pairs of life
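The two cutoffs used to label pairs can be sketched as a simple rule, shown here in Python rather than the R used for the report. The function name `label_pair` and the label "in between" for pairs that meet neither cutoff are assumptions for illustration. The three distances are the ones quoted in this report: the lowest (0.037), the highest (0.418), and the first child vs. children leaving home example (0.3106).

```python
VERY_SIMILAR_BELOW = 0.08     # cutoff stated in the text
VERY_DISSIMILAR_ABOVE = 0.35  # cutoff stated in the text


def label_pair(distance):
    """Label a pair of life events by the Hellinger distance between them."""
    if distance < VERY_SIMILAR_BELOW:
        return "very similar"
    if distance > VERY_DISSIMILAR_ABOVE:
        return "very dissimilar"
    return "in between"  # hypothetical label; the report does not name this case


for d in (0.037, 0.3106, 0.418):
    print(d, label_pair(d))
```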
I summarize my key observations. On average, people draw similar levels of happiness from
the widely different activities that they undertake in their pursuit of happiness. I found that there
was a higher consensus about the different levels of happiness from individual pursuits compared
to family-oriented activities. Finally, retiring and children leaving home cause feelings that are
From the above observations, it seems like people draw happiness from individual goals
and accomplishments. Therefore, institutions and organizations may wish to support individuals
in these pursuits, such as running a marathon or publishing a book, even though it may be
outside the institution's norms. Second, although there are many support groups available to ease
the grieving process after retirement and after children leave home, local communities could
actively seek to connect retirees to opportunities such as volunteering and new business ventures
LESSONS LEARNED
I did not collect my own data (primary data), but instead used a secondary data set, which
was collected by someone else. The use of secondary data was inexpensive, convenient, and also
it provided me access to a larger amount of data. However, the data set did not have information
about the age, gender, geography and other attributes of the respondents. As a result, I could not
consider the impact of these characteristics on the happiness levels felt by people.
FUTURE WORK
overrepresentation of one type of people (gender, age, geography). Therefore, in the future, I
would like to understand whether these observations hold across gender, age, and geography. I
would like to study how the perception of happiness for these life events changes with elapsed
time.
Table 6: Pairs of Life Events with Similar and Dissimilar Happiness Distributions
ACKNOWLEDGMENTS
I would like to thank Prof. Michal Kosinski of Stanford Graduate School of Business for
giving me access to the data from the myPersonality project. I would also like to thank Mrs. Lisa
Greenwald from Gibbons Middle School for all her help and support during this project. I
acknowledge the Stack Overflow user community, which helped with many R problems.
REFERENCES