Abstract: Modeling and predicting human behaviors, such as the activity level and intensity, is key to preventing the cascades of obesity and to helping spread wellness and healthy behavior in a social network. The user diversity, dynamic behaviors, and hidden social influences make the problem more challenging. In this work, we propose a deep learning model named Social Restricted Boltzmann Machine (SRBM) for human behavior modeling and prediction in health social networks. In the proposed SRBM model, we naturally incorporate self-motivation, implicit and explicit social influences, and environmental events together into three layers, which are the historical, visible, and hidden layers. The interactions among these behavior determinants are naturally simulated through parameters connecting these layers together. The contrastive divergence and back-propagation algorithms are employed for training the model. A comprehensive experiment on real and synthetic data has shown the great effectiveness of our deep learning model compared with conventional methods.

I. INTRODUCTION

Overweight and obesity are major risk factors for a number of chronic diseases, including diabetes, cardiovascular diseases, and cancers. Once considered a problem only in high-income countries, overweight and obesity are now dramatically on the rise in low- and middle-income countries. Recent studies have shown obesity can spread over the social network [3], bearing similarity to the diffusion of innovation [4] and word-of-mouth effects in marketing [6]. To reduce the risk of obesity-related diseases, regular exercise is strongly recommended [14]. However, there have been few scientific and quantitative studies to elucidate how social relationships and personal factors may contribute to macro-level human behaviors.

The Internet is identified as an important source of health information and may thus be an appropriate delivery channel for health behavior interventions [13]. In addition, mobile devices can track and record the walking/jogging/running distance and intensity of an individual. Utilizing these technologies, our recent study, named YesiWell, was conducted in 2010-2011 as a collaboration between PeaceHealth Laboratories, SK Telecom Americas, and the University of Oregon to record daily physical activities, social activities (e.g., text messages, social games, events, and competitions), biomarkers, and biometric measures (e.g., cholesterol and BMI) for a group of 254 individuals. Physical activities were reported via a mobile device carried by each user. All users enrolled in an online social network allowing them to make friends and communicate with each other. Users' biomarkers and biometric measures were recorded via monthly medical tests performed at our laboratories. The fundamental problems this study seeks to answer, which are also key to understanding the determinants of human behaviors, are as follows:

• How could social communities affect individual behaviors?
• How could we leverage individual features and social communities to help predict the individual behavior?

It is nontrivial to determine how much impact social influences could have on someone's behavior. Our starting observation is that human behavior is the outcome of interacting determinants such as self-motivation, social influences, and environmental events. This observation is rooted in sociology and psychology, characterized as human agency in social cognitive theory [1]. An individual's level of self-motivation can be captured by learning the correlations between his or her historical and current characteristics. Modeling social influences in health social networks is challenging since they are categorized into implicit and explicit social influences [2]. In reality, all the users in our program have participated in many other social activities such as offline events, physical activity competitions, and social games. Thus, they have unreported relationships which can cause implicit social influences. The social context is also a potential cause of implicit social influences, since human behaviors usually respond to changes in the social communities [1]. In addition, a user can be influenced by unacquainted users, since they participate in the same program. The hidden social influences are composed of unreported social relationships, unacquainted users, and the changing social context. This problem is known as the hidden social influence of health social networks [2].

Explicit social influences demand a novel function to better capture the influences on individuals from their friends.
Unlike commonly developed online social networks (e.g., Facebook), our health social network is developed from scratch. Users do not have many connections initially. As time goes by, they will have more connections to other users. Thus, conventional social influence functions (e.g., information propagation cascade models [8], [17]) may not fit our context.

Motivated by the above challenging issues, we present a novel Social Restricted Boltzmann Machine (SRBM) for human behavior prediction, based on the well-known deep learning model, the Restricted Boltzmann Machine (RBM) [19]. In the SRBM model, the human behavior determinants, which are self-motivation, explicit and implicit social influences, and environmental events, are integrated into three layers, which are the historical, visible, and hidden layers. Self-motivation can be captured by learning correlations between an individual's historical and current features. The effect of the implicit social influences on an individual is estimated by an aggregation function of the past of the social network. We define a new temporal smoothing and statistical function to capture explicit social influences on individuals from their neighboring users. By combining implicit and explicit social influences into a linear adaptive bias, we are able to model the social influences. The environmental events, such as competitions, are integrated into the model as observed variables which directly affect the users' behaviors. The effectiveness of our model is verified by experiments on both real-world and synthetic health social networks. Our main contributions are as follows:

• We introduce a new method for human behavior prediction based on the well-known deep learning model, the Restricted Boltzmann Machine (RBM). The proposed SRBM model incorporates self-motivation, explicit and implicit social influences, and environmental events together.
• Our experimental assessment on both real and synthetic data confirms that our model is very accurate in human behavior prediction.

In Sec. 2, we introduce the RBM and related works. We present our SRBM model in Sec. 3. The experimental evaluation is in Sec. 4 and we conclude the paper in Sec. 5.

II. THE RBMS AND RELATED WORKS

The Restricted Boltzmann Machine (RBM) [19] is a deep learning structure that has a layer of visible units fully connected to a layer of hidden units but no connections within a layer (Figure 1). Typically, RBMs use stochastic binary units for both visible and hidden variables. The stochastic binary units of RBMs can be generalized to any distribution that falls in the exponential family [22]. To model real-valued data, a modified RBM with binary logistic hidden units and real-valued Gaussian visible units can be used. In Figure 1, $v_i$ and $h_j$ denote the states of visible unit $i$ and hidden unit $j$, respectively; $a_i$ and $b_j$ denote the biases on the visible units and hidden units. The RBM assigns a probability to any joint setting of the visible units $v$ and hidden units $h$:

$$p(v, h) = \frac{\exp(-E(v, h))}{Z} \tag{1}$$

where $E(v, h)$ is an energy function,

$$E(v, h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j W_{ij} \tag{2}$$

where $\sigma_i$ is the standard deviation of the Gaussian noise for visible unit $i$. In practice, fixing $\sigma_i$ at 1 makes the learning work well. $Z$ is a partition function which is intractable as it involves a sum over the (exponential) number of possible joint configurations: $Z = \sum_{v', h'} \exp(-E(v', h'))$. $W$ is a weight matrix which bipartitely connects the hidden and visible layers together, e.g., $W_{ij}$ links $v_i$ and $h_j$. The conditional distributions (assuming $\sigma_i = 1$) are:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big) \tag{3}$$

$$p(v_i \mid h) = \mathcal{N}\Big(a_i + \sum_j h_j W_{ij},\ 1\Big) \tag{4}$$

where $\sigma(\cdot)$ is a logistic function and $\mathcal{N}(\mu, V)$ is a Gaussian.

Figure 1. The RBM.

Given a training set of state vectors, the weights and biases in an RBM can be learned following the gradient of contrastive divergence [5]. The learning rules are:

$$\Delta W_{ij} = \langle v_i h_j \rangle_d - \langle v_i h_j \rangle_r; \quad \Delta b_j = \langle h_j \rangle_d - \langle h_j \rangle_r \tag{5}$$

where the first expectation $\langle \cdot \rangle_d$ is based on the data distribution and the second expectation $\langle \cdot \rangle_r$ is based on the distribution of reconstructed data. The reconstructions are generated by starting a Markov chain at the data distribution. The hidden units are updated by sampling from Eq. 3, and the visible units are then updated by sampling from Eq. 4.

To incorporate temporal dependencies into the RBM, the CRBM [20] adds autoregressive connections from the visible and hidden variables of an individual to his/her historical variables. The CRBM simulates human motion well in the single-agent scenario. However, it cannot capture the social influences on individual behaviors in the multiple-agent scenario. Li et al. [12] proposed a ctRBM model for link prediction in dynamic networks. The ctRBM simulates the social influences by adding the prediction expectations of local neighbors on an individual into a dynamic bias. However, the ctRBM cannot directly incorporate personal features with social influences to predict human behaviors. This is because the visible layer in the ctRBM does not take individual features as an input.

Regarding human behavior prediction, social behavior has been studied recently such as analysis of user interactions
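The conditionals in Eqs. 3 and 4 and the contrastive-divergence rule in Eq. 5 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it fixes the visible standard deviation at 1, runs a single Gibbs step (CD-1) on one training vector, uses hidden probabilities rather than sampled states in the statistics (a common practical choice), and adds a visible-bias update by analogy with the rule for the hidden bias. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, b):
    """Eq. 3: p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_means(h, W, a):
    """Eq. 4 with unit variance: mean of v_i given h is a_i + sum_j h_j * W[i][j]."""
    return [a[i] + sum(h[j] * W[i][j] for j in range(len(h)))
            for i in range(len(a))]


def cd1_update(v0, W, a, b, rng):
    """One CD-1 step for a single training vector; returns (dW, da, db).

    The visible-bias update da is an assumption added by analogy with db.
    """
    ph0 = hidden_probs(v0, W, b)                              # data-driven statistics
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]      # sample binary hidden states
    v1 = [rng.gauss(m, 1.0) for m in visible_means(h0, W, a)] # Gaussian reconstruction
    ph1 = hidden_probs(v1, W, b)                              # reconstruction statistics
    dW = [[v0[i] * ph0[j] - v1[i] * ph1[j] for j in range(len(b))]
          for i in range(len(a))]
    da = [v0[i] - v1[i] for i in range(len(a))]
    db = [ph0[j] - ph1[j] for j in range(len(b))]
    return dW, da, db


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.0, 0.0] for _ in range(3)]   # 3 visible units, 2 hidden units
    dW, da, db = cd1_update([0.5, -1.0, 2.0], W, [0.0] * 3, [0.0] * 2, rng)
    print(len(dW), len(dW[0]), len(da), len(db))
```

In practice the returned increments would be scaled by a learning rate and averaged over a mini-batch before being added to the parameters; both choices are left out here to keep the sketch close to Eq. 5.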
2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
where $\hat{a}_{i,t}$ is also a dynamic bias:

$$\hat{a}_{i,t} = a_i + \sum_{k \in \{1,\dots,N\}} \sum_{f \in F} A_{if}^{u,t-k} H_f^{u,t-k} \tag{9}$$
average of the cumulative distribution function (CDF) of the instant similarity over the user similarity spectrum, i.e.,

$$\eta_t^u = \alpha\, \eta_{t-\tau}^u + (1 - \alpha)\, \frac{1}{|Z_t^u|} \sum_{m=1}^{|U|} l_t(u, m)\, p\big(s_t \le s_t(u, m)\big)$$

where $Z_t^u = \sum_{m=1}^{|U|} l_t(u, m)$, and the indicator function $l_t$ is 1 if user $u$ is connected to user $m$ until time $t$ (i.e., $e_{u,m} \in E_t$), and 0 otherwise. $s_t$ is the similarity between two arbitrary neighboring users in the social network at time $t$. $p(s_t \le s_t(u, m))$ represents the probability that the similarity is less than or equal to the instant similarity $s_t(u, m)$. $\alpha$ and $\tau$ are two parameters that capture the dynamics of $\eta$.

The effect of explicit social influences is integrated into the dynamic biases of the hidden and visible variables. The dynamic biases $\hat{b}_{j,t}$ and $\hat{a}_{i,t}$ in Eq. 10 and Eq. 11 now become:

$$\hat{b}_{j,t} = b_j + \sum_{k \in \{1,\dots,N\}} \sum_{f \in F} \sum_{u \in U} B_{jf}^{u,t-k} H_f^{u,t-k} + \beta_j \eta_t^u$$
$$\hat{a}_{i,t} = a_i + \sum_{k \in \{1,\dots,N\}} \sum_{f \in F} \sum_{u \in U} A_{if}^{u,t-k} H_f^{u,t-k} + \beta_i \eta_t^u \tag{15}$$

where the parameters $\beta_i$ and $\beta_j$ represent the ability to observe the explicit social influences $\eta_t^u$ of user $u$ given $v_i$ and $h_j$.

Inference and Learning. Inference in the SRBM model is no more difficult than in the RBM. The states of the hidden variables are determined by both the input they receive from the visible variables and the input they receive from the historical variables. The conditional probability of the hidden and visible variables at time interval $t$ can be computed as in Eq. 7, Eq. 8, and Eq. 15. The energy function becomes:

$$E(v_t, h_t \mid H_{t<}, \theta) = \sum_{i \in v} \frac{(v_{i,t} - \hat{a}_{i,t})^2}{2\sigma_i^2} - \sum_{j \in h} \hat{b}_{j,t} h_{j,t} - \sum_{i \in v,\, j \in h} \frac{v_{i,t}}{\sigma_i} h_{j,t} W_{ij}$$

Contrastive divergence [5] is used to train the SRBM. The updates for the symmetric weights, $W$, as well as the static biases, $a$ and $b$, have the same form as Eq. 5. However, they have a different effect because the states of the hidden and visible variables are now influenced by the implicit and explicit social influences. The updates for the directed weights, $A$ and $B$, are also based on simple pairwise products. The gradients are summed over all the training time intervals $t \in T_{train} = T_{data} \setminus \{t - M + 1, \dots, t - M + N\}$:

$$\Delta W_{ij} = \sum_t \big(\langle v_{i,t} h_{j,t} \rangle_d - \langle v_{i,t} h_{j,t} \rangle_r\big)$$
$$\Delta a_i = \sum_t \big(\langle v_{i,t} \rangle_d - \langle v_{i,t} \rangle_r\big); \quad \Delta b_j = \sum_t \big(\langle h_{j,t} \rangle_d - \langle h_{j,t} \rangle_r\big)$$
$$\Delta A_{if}^{u,t-k} = \sum_t \big(\langle v_{i,t} H_f^{u,t-k} \rangle_d - \langle v_{i,t} H_f^{u,t-k} \rangle_r\big)$$
$$\Delta B_{jf}^{u,t-k} = \sum_t \big(\langle h_{j,t} H_f^{u,t-k} \rangle_d - \langle h_{j,t} H_f^{u,t-k} \rangle_r\big)$$
$$\Delta \beta_i = \sum_t \big(\langle v_{i,t} \rangle_d - \langle v_{i,t} \rangle_r\big)\, \eta_t^u; \quad \Delta \beta_j = \sum_t \big(\langle h_{j,t} \rangle_d - \langle h_{j,t} \rangle_r\big)\, \eta_t^u$$

We train the SRBM for each user independently. Whenever we update the parameters, we also update the explicit social influences for all the users.

Human Behavior Prediction. On top of the SRBM model, we put an output layer, commonly called a softmax layer, for the user behavior prediction task. Our goal is to predict whether a user increases or decreases physical exercise. Thus the softmax layer contains a single output variable $\hat{y}$ and binary target values: 1 for increases, and 0 for decreases. The output variable $\hat{y}$ is fully linked to the hidden variables by weighted connections $S$, which include $|h|$ parameters $s_j$. We use the logistic function as an activation function to saturate the two target values, i.e.,

$$\hat{y} = \sigma\Big(c + \sum_{j \in h} h_j s_j\Big) \tag{16}$$

where $c$ is a static bias. Given a user $u \in U$, a set of training vectors $X = \{F_t^u, E_t \mid t \in T_{train}\}$ and an output vector $Y = \{y_t \mid t \in T_{train}\}$, the probability of a binary output $y_t \in \{0, 1\}$ given the input $x_t$ is as follows:

$$P(Y \mid X, \theta) = \prod_{t \in T_{train}} \hat{y}_t^{\,y_t} (1 - \hat{y}_t)^{1 - y_t} \tag{17}$$

where $\hat{y}_t = P(y_t = 1 \mid x_t, \theta)$.

A loss function that appropriately deals with the binomial problem is the cross-entropy error. It is given by

$$C(\theta) = -\sum_{t \in T_{train}} \big[\, y_t \log \hat{y}_t + (1 - y_t) \log(1 - \hat{y}_t) \,\big] \tag{18}$$

In this last training stage, back-propagation is used to fine-tune all the parameters together. The derivatives of the objective $C(\theta)$ with respect to all the parameters over all the training cases $t \in T_{train}$ can be computed as:

$$\frac{\partial C(\theta)}{\partial s_j} = -\sum_t (y_t - \hat{y}_t)\, h_j; \quad \frac{\partial C(\theta)}{\partial c} = -\sum_t (y_t - \hat{y}_t)$$
$$\frac{\partial C(\theta)}{\partial W_{ij}} = -\sum_t (y_t - \hat{y}_t)\, s_j h_j (1 - h_j)\, v_i$$
$$\frac{\partial C(\theta)}{\partial a_i} = -\sum_t (y_t - \hat{y}_t)\, s_j h_j (1 - h_j)\, W_{ij}$$
$$\frac{\partial C(\theta)}{\partial b_j} = -\sum_t (y_t - \hat{y}_t)\, s_j h_j (1 - h_j)$$
$$\frac{\partial C(\theta)}{\partial A_{if}^{u,t-k}} = -\sum_t (y_t - \hat{y}_t)\, s_j h_j (1 - h_j)\, W_{ij} H_f^{u,t-k}$$
$$\frac{\partial C(\theta)}{\partial B_{jf}^{u,t-k}} = -\sum_t (y_t - \hat{y}_t)\, s_j h_j (1 - h_j)\, H_f^{u,t-k}$$
$$\frac{\partial C(\theta)}{\partial \beta_i} = -\sum_t (y_t - \hat{y}_t)\, s_j h_j (1 - h_j)\, W_{ij}\, \eta_t^u$$

Causal Generation. In the prediction task, we need to predict $y_{t+1}^u$ without observing $F_{t+1}^u$. In other words, the visible and hidden variables are not observed at the future time point $t + 1$. Thus we need a causal generation step to initialize these variables. Causal generation from a learned SRBM can be done just like the learning procedure. We always keep the historical variables fixed and perform alternating Gibbs sampling to obtain a joint sample of the visible and hidden variables from the SRBM. To start alternating Gibbs sampling, a good choice is to set $v_t = v_{t-1}$, since $v_{t-1}$ can be considered a strong prior for $v_t$. This picks new hidden and visible variables that are compatible with each other and with the recent historical variables. Afterward, we aggregate the hidden variables to evaluate the output $\hat{y}$.

IV. EXPERIMENTS

We have carried out a series of experiments using both real-world and synthetic health social networks to validate our proposed SRBM model. We first elaborate the
Table I. PERSONAL CHARACTERISTICS.
Figure 6. Validation of the SRBM model on the YesiWell health social network: (a) prediction accuracy with different $\alpha$ and $\tau$; (b) prediction accuracies with different explicit social influences; (c) the SRBM vs. competitive models in terms of prediction accuracy.
Figure 7. The distributions of users' activities: (a) #increase exercise users; (b) #times joining competitions.
Figure 8. Accuracies on the synthetic data.
two parameters $\alpha$ and $\tau$ on our health social network. We observed that smaller values of $\tau$ tend to give higher prediction accuracies. This is quite reasonable, since more recent behaviors have stronger influences. The temporal smoothing parameter $\tau$ has similar effects to a time decay function [25]. Meanwhile, the middle-range values of $\alpha$ offer better prediction accuracies. The optimal values of $\alpha$ and $\tau$ are 0.5 and 1, respectively.

To test the correctness, we compare our optimal setting (i.e., $\tau = 1$, $\alpha = 0.5$) of the explicit social influence function with different forms of it: 1) without the temporal smoothing component, i.e., $\eta_t^u = \frac{1}{|F_t^u|} \sum_{m \in F_t^u} p\big(s_t \le s_t(u, m)\big)$, and 2) replacing our function with the social influence function in the ctRBM [12], which becomes the aforementioned SctRBM. Figure 6b shows that both the SRBM without temporal smoothing and the SctRBM have significantly lower prediction accuracies than the SRBM with the optimal setting. So, our function is effective, and the optimal setting improves the prediction accuracy by 12% in our health social network. It also offers better performance for the SRBM compared with other social influence functions. This is because our health social network is developed from scratch. Users do not have many connections initially. They will have more connections over time (Figure 4). In other words, our function produces a better-fit social influence distribution in our health social network. From now on, we use the optimal setting of the SRBM in the other experiments.

To examine the prediction accuracy, we compare the proposed SRBM model with the competitive models in terms of human behavior prediction. Figure 6c shows the accuracy comparison over 37 weeks in our health social network. It is clear that the SRBM outperforms the other models. The accuracies of the competitive models tend to drop in the middle period of the study. In essence, all the behavior determinants and their interactions potentially become stronger in the middle weeks, since all the users increase their activities, such as walking and running and participating in more competitions (Figure 7). Modeling one of the determinants or their interactions insufficiently, or not at all, results in low and unstable prediction performance. This is the case for the competitive models: they do not capture the social influences and environmental events well.

Meanwhile, the SRBM comprehensively models all the determinants. The correlation between the personal characteristics and the implicit social influences can be detected by the hidden variables. Thus, much information has been leveraged to predict individual behaviors. In addition, our prediction accuracy stably increases over time. That means our model captures the growth of our health social network well (Figure 4). Overall, our model achieves higher prediction accuracy and more stable performance. The SRBM achieves the best average accuracy, 0.887.

Synthetic Health Social Network. To illustrate that our model can be generally applied to different datasets, we perform further experiments on a synthetic health social network. To generate the synthetic data, we use the software Pajek [footnote 1] to generate graphs under the Scale-Free/Power Law Model [footnote 2]. However, the vertices in the resulting synthetic graph do not have individual features similar to the real-world data.

Footnote 1: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Footnote 2: The Scale-Free/Power Law Model (SF) is a network model whose node degrees follow the power law distribution, at least asymptotically.
An appropriate solution to this problem is to apply a graph matching algorithm to map pairwise vertices between the synthetic and real social networks. In order to do so, we first generate a graph with 254 nodes and an average node degree of 5.4 (i.e., similar to the real YesiWell data). Then, we apply PATH [24], a well-known and efficient graph matching algorithm, to find a correspondence between vertices of the synthetic network and vertices of the YesiWell network. The source code of the PATH algorithm is available in the graph matching package GraphM (http://cbio.ensmp.fr/graphm/). Then, we can assign all the individual features and behaviors of any real user to the corresponding vertices in the synthetic network. Consequently, we have a synthetic health social network which imitates our real-world dataset. Figure 8 shows the accuracies of the conventional models and the SRBM model on the synthetic data. We can see that our model still outperforms the conventional models in terms of prediction accuracy.

V. CONCLUSIONS AND FUTURE WORKS

This paper introduces the SRBM, a novel deep learning model for human behavior prediction in health social networks. By incorporating all the human behavior determinants, which are self-motivation, implicit and explicit social influences, and environmental events, our SRBM model predicts the future activity levels of users more accurately and more stably than conventional methods.

Our work can be extended in several directions. First, we can leverage domain knowledge to generate explanations for predicted behaviors. Second, although the approach explored in this paper is rooted in the RBM [19], other alternatives are possible, which can be based on CNNs [10] or Sum-Product Networks [15]. We plan to explore and compare these different strategies in future work.

ACKNOWLEDGMENT

This work is supported by the NIH grant R01GM103309 to the SMASH project. We thank Xiao Xiao, Rebeca Sacks, and Ellen Klowden for their contributions.

REFERENCES

[1] A. Bandura. Human agency in social cognitive theory. The American Psychologist, 1989.
[2] N.A. Christakis. The hidden influence of social networks. In TED2010.
[3] N.A. Christakis and J.H. Fowler. The spread of obesity in a large social network over 32 years. The New England Journal of Medicine, 357:370-9, 2007.
[4] R.G. Fichman and C.F. Kemerer. The illusory diffusion of innovation: An examination of assimilation gaps. Information Systems Research, 10(3):255-275, 1999.
[5] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800, 2002.
[6] J. Huang, X.-Q. Cheng, H.-W. Shen, T. Zhou, and X. Jin. Exploring social influence via posterior effect of word-of-mouth recommendations. In WSDM'12, pages 573-582.
[7] J. Kawale, A. Pal, and J. Srivastava. Churn prediction in MMORPGs: A social influence based approach. In CSE'09, pages 423-428.
[8] D. Kempe, J.M. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD'03, pages 137-146.
[9] D. Kil, F. Shin, B. Piniewski, J. Hahn, and K. Chan. Impacts of social health data on predicting weight loss and engagement. In O'Reilly StrataRx Conference, San Francisco, CA, October 2012.
[10] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[11] K. Lerman, S. Intagorn, J.-K. Kang, and R. Ghosh. Using proximity to predict activity in social networks. In WWW Companion'12, pages 555-556.
[12] X. Li, N. Du, H. Li, K. Li, J. Gao, and A. Zhang. A deep learning approach to link prediction in dynamic networks. In SIAM'14, pages 289-297.
[13] A. Marshall, E.G. Eakin, E.R. Leslie, and N. Owen. Exploring the feasibility and acceptability of using internet technology to promote physical activity within a defined community. Health Promotion Journal of Australia, 2005(16):82-4, 2005.
[14] R.R. Pate, M. Pratt, S.N. Blair, et al. Physical activity and public health: A recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. Journal of the American Medical Association, 273(5):402-7, 1995.
[15] H. Poon and P. Domingos. Sum-product networks: A new deep architecture. In IEEE ICCV Workshops, pages 689-690, 2011.
[16] R.M. Ryan and E.L. Deci. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology, 25(1):54-67, 2000.
[17] K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusion probabilities for independent cascade model. In KES 2008, pages 67-75.
[18] Y. Shen, R. Jin, D. Dou, N. Chowdhury, J. Sun, B. Piniewski, and D. Kil. Socialized Gaussian process model for human behavior prediction in a health social network. In ICDM'12, pages 1110-1115.
[19] P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, pages 194-281. 1986.
[20] G.W. Taylor, G.E. Hinton, and S.T. Roweis. Modeling human motion using binary latent variables. In NIPS'06, pages 1345-1352.
[21] B. Viswanath, A. Mislove, M. Cha, and K.P. Gummadi. On the evolution of user interaction in Facebook. In WOSN'09, pages 37-42.
[22] M. Welling, M. Rosen-Zvi, and G.E. Hinton. Exponential family harmoniums with an application to information retrieval. In NIPS'04, pages 1481-1488.
[23] J. Yang, X. Wei, M.S. Ackerman, and L.A. Adamic. Activity lifespan: An analysis of user survival patterns in online knowledge sharing communities. In ICWSM'10, pages 186-193.
[24] M. Zaslavskiy, F. Bach, and J.-P. Vert. A path following algorithm for graph matching. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 5099, pages 329-337, 2008.
[25] Y. Zhu, E. Zhong, S.J. Pan, X. Wang, M. Zhou, and Q. Yang. Predicting user activity level in social networks. In CIKM'13, pages 159-168.