Representation of Risks in Online Communities - Bassem Nasser

Report D1.1, V 2.0 Dissemination Level: PU

Copyright University of Southampton, IT Innovation Centre and other
members of the EC FP7 ROBUST project consortium (grant agreement
257859), 2011 1/145

EC Project 257859

Risk and Opportunity management of
huge-scale BUSiness communiTy cooperation

D1.1: Representation of Risks in Online Communities

31 Oct 2011
Version: 2.0

Bassem Nasser
bmn@it-innovation.soton.ac.uk
IT Innovation Centre, University of Southampton

List of further authors and their affiliation
Vegard Engen, Mariusz Jacyno, Simon Crowle, Bharat Sahani
(IT Innovation)
Joerg Fliege, Thanos Avramidis, Edwin Tye, Philippa Hiscock
(CORMSIS)
Matthew Rowe (Open University)

Dissemination Level:
PU Public

Executive Summary
Online communities generate major economic value and form pivotal parts of
corporate expertise management, marketing, product support, CRM, product
innovation and advertising. Whilst there is a clear gain provided by such
infrastructures, the management and preservation of their efficient operation is
not trivial. Communities can exceed millions of users and infrastructures must
support hundreds of millions of discussion threads that link together billions of
posts. Current solutions fail to meet today's challenges of scale and growth,
let alone support the understanding and management of the business, social and
economic objectives of the users, providers and hosts involved.

Participant behaviours and roles in online communities are highly diverse and
evolve over time, e.g. browsers, lurkers, leaders, trolls and experts, to name a
few. Their behaviour can accordingly be positive, benign or deliberately
malignant, e.g. those who provide expert advice, those who try to outwit
community policies to gain unfounded expert status, or those who actively
seek to damage and disrupt. Online communities have varied and sometimes
fragile or unexpected lifecycles. A healthy community can see periods of
intense and explosive growth, which is itself a problem to manage, but equally
things may go wrong in an online community very quickly: collapse,
stagnation or a shift to competitors are real risks. Online communities rarely
operate in isolation: they can fracture and merge, share participants and
cross-reference each other, and activity can spread in a virus-like manner
between them within online ecosystems. Moreover, the business, social and
economic objectives of the actors and stakeholders in online community value
chains (users, providers, hosts etc.) are not simple to align and require a
balancing act to manage the tensions and conflicts as well as opportunities for
collaboration and mutual benefit.

The management of this complex information ecosystem requires a robust
risk and opportunity management framework. Many of the risks or
opportunities that exist in operating these business online communities need
very rapid analysis and response times, e.g. detecting and reacting to
anomalous behaviour such as automated spam attacks or trolls, or to a rapid
change in the behaviour of a user. Doing this at the scale of existing social
networks outstrips the capability of conventional manual processes,
e.g. the use of moderators to detect and remove inappropriate posts or contain
user misbehaviour. Likewise, the tidal wave of traffic in business online
communities that can arise from a new product launch, training event or
exposition in the 'real world' needs to be properly supported in the 'virtual
world' to maximise this business opportunity. Risks and opportunities can span
individual sites and can also be faced by third-party operators, e.g. real-time
and targeted advertising based on user behaviours across multiple
forums or sites.

This report addresses risk representation, identification, assessment and
forecasting in online communities. The goal is to lay the cornerstone for a
framework that enables proactive management of risks and opportunities. The
research results shown in this document cover risk evaluation techniques as
well as risk management principles. The findings include:

1- A risk management methodology is proposed, based on a review of
existing standards, for analysing the problem, implementing a solution
and monitoring its effectiveness.
2- Risk and opportunity identification and assessment require
understanding the community objectives and resolving possible
conflicts between the stakeholders (i.e. users, host and owner).
3- Risks and opportunities are essentially classifications of events according
to their impact on community objectives, whether positive or negative. Risk
and opportunity templates are specified in this report in order to
capture the link between events and objectives.
4- Reacting to or pre-empting risks can involve different strategies, each with
several steps. The right combination of automation (e.g. workflow
techniques) and human understanding and intervention (e.g. using a
dashboard) is needed to ensure a rapid and well-managed response.
Agent-based simulation of the online community and of administrative
actions, proposed in this document, allows cost/risk/benefit analysis
to be performed.
5- Graph-based techniques (e.g. compartment models, Bayesian
networks) as well as others (e.g. correlation and regression analysis)
are potential candidates for risk assessment.

The risks and opportunities considered in this report are based on a detailed
exploration of three real-world case studies from IBM, SAP and Polecat. A
survey was distributed to collect information about communities' objectives,
risks and opportunities, as well as current treatment practices.
Risk and opportunity models will be based on the extensive analysis and
models being developed in the context of other ROBUST work packages
(WP3, WP4, WP5). The integrated system will allow advanced and proactive
management of the risks and opportunities in online communities.

Table of Contents
Executive Summary ......................................................................................... 2
Table of Contents ............................................................................................. 4
1. Introduction ............................................................................................... 8
2. Risks and opportunities in social networks ............................................. 10
3. Risks and opportunities identified in ROBUST ........................................ 13
3.1. Questionnaire findings ...................................................................... 13
3.1.1. Overview of responses ............................................................... 13
3.1.2. Community insights .................................................................... 14
3.1.3. Business impact ......................................................................... 15
3.1.4. Community health ...................................................................... 17
3.2. Risks and opportunities..................................................................... 18
3.2.1. Community/user activity ............................................................. 19
3.2.2. Community evolution .................................................................. 19
3.2.3. Community usage ...................................................................... 20
3.2.4. User experience/behaviour ........................................................ 20
3.2.5. Community/user role dynamics .................................................. 21
3.2.6. Community content .................................................................... 21
3.2.7. Community structure .................................................................. 21
3.2.8. Community operation ................................................................. 21
3.2.9. Security ...................................................................................... 22
3.2.10. Other ....................................................................................... 22
3.3. Summary .......................................................................................... 22
4. Risk management standards .................................................................. 23
4.1. ISO 31000 ......................................................................................... 23
4.2. BS 31100:2011 ................................................................................. 24
4.3. FERMA: A Risk Management Standard ............................................ 25
4.4. Enterprise Risk Management COSO ................................................ 25
4.5. M_o_R .............................................................................................. 26
4.6. ROBUST online community risk management .................................. 26
5. Techniques for risk management ............................................................ 27
5.1. Risk identification .............................................................................. 27
5.2. Modelling and evaluation .................................................................. 28
5.2.1. Tree-based analysis ................................................................... 28
5.2.2. Goal-risk modelling ..................................................................... 30
5.2.3. Markov chains ............................................................................ 32
5.2.4. Bayesian networks ..................................................................... 33
5.2.5. Regression models ..................................................................... 34
5.2.6. Compartment models ................................................................. 35
5.2.7. Agent Based Simulation ............................................................. 36
5.2.8. Summary .................................................................................... 38
6. Risk and opportunity treatment ............................................................... 38
6.1. Risk and opportunity treatment strategies......................................... 38
6.2. Workflow notations ........................................................................... 40
6.2.1. UML: Activity diagrams ............................................................... 40
6.2.2. Event driven process chains (EPC) ............................................ 40
6.2.3. Business Process Modelling Notation (BPMN) .......................... 41
6.2.4. Yet Another Workflow Language (YAWL) notation .................... 42
6.2.5. Other workflow notations: scientific and visual programming languages (VPLs) ........ 43
6.2.6. Summary .................................................................................... 44
6.3. Workflow specifications ..................................................................... 44
6.3.1. OMG's UML/D ........................................................................... 45
6.3.2. EPML 1.2 ................................................................................... 46
6.3.3. WS-BPEL 2.0 ............................................................................. 47
6.3.4. WS-HumanTask 1.1 ................................................................... 47
6.3.5. WS-BPEL4People 1.1 ................................................................ 48
6.3.6. ebXML Business Process Specification 2.0.4 ............................ 49
6.3.7. YAWL 2.1 ................................................................................... 50
6.3.8. XPDL 2.1 .................................................................................... 50
6.3.9. BPMN 2.0 ................................................................................... 51
6.3.10. T2FLOW ................................................................................. 51
6.3.11. Summary of workflow specification languages ........................ 52
7. Risk and opportunity management framework ........................................ 53
7.1. Events in online community .............................................................. 54
7.2. ROBUST Risk and Opportunities modelling ..................................... 56
7.3. Risk and opportunities dependencies ............................................... 58
7.4. Risk and opportunity templates ......................................................... 59
7.5. Risk and opportunity states ............................................................... 61
7.6. Treatment plans ................................................................................ 62
7.6.1. Treatment workflows .................................................................. 62
7.6.2. Anticipated ROBUST treatment processes ................................ 64
7.7. Risk and opportunity editor ............................................................... 67
8. Forecasting of risks and opportunities in ROBUST ................................. 68
8.1. Risks and opportunities for use cases .............................................. 68
8.1.1. IBM ............................................................................................. 68
8.1.2. SAP ............................................................................................ 69
8.1.3. Polecat ....................................................................................... 72
8.1.4. Addressing the risks and opportunities ....................................... 73
8.2. Agent based simulation..................................................................... 74
8.2.1. Simulation goals ......................................................................... 74
8.2.2. Model architecture ...................................................................... 76
8.2.3. Simulation process ..................................................................... 77
8.3. Compartment model and Gibbs sampler .......................................... 78
8.3.1. Estimating compartment sizes ................................................... 79
8.3.2. Estimating migration rates without Gibbs Sampler ..................... 80
8.3.3. Computing forecasts of compartment sizes ............................... 82
8.3.4. Gibbs sampler ............................................................................ 83
8.4. Other forecasting tools in the ROBUST project ................................ 83
8.4.1. Churn analysis............................................................................ 83
8.4.2. Activity prediction ....................................................................... 84
9. Conclusions and future work ................................................................... 84
A. List of Figures ......................................................................................... 87
B. List of Tables .......................................................................................... 89
C. List of Abbreviations ............................................................................... 90
D. References ............................................................................................. 92
E. Use-case summary model for WP4 policy descriptions .......................... 98
F. Summary of potential treatment responses based on WP3 survey data 99
G. Workflow software frameworks ............................................................. 101
H. Workflow technical specification feature review .................................... 103
I. Example treatment workflow using BPMN ............................................ 107
J. Risk and opportunity editor ................................................................... 108
K. Simplatform ........................................................................................... 110
L. Agent model design example ................................................................ 117
M. Gibbs sampler example ..................................................................... 121
N. Risk and opportunity questionnaire ....................................................... 132
O. Risks and opportunities from questionnaire responses ........................ 139


1. Introduction
Online communities generate major economic value and form pivotal parts of
corporate expertise management, marketing, product support, CRM, product
innovation and advertising. Communities can exceed millions of users and
infrastructures must support hundreds of millions of discussion threads that
can link together an even greater number of posts.
ROBUST addresses large-scale data analysis and management as well as
understanding and managing complex user behaviours and ecosystems in
online business communities. This requires sensing the state of the community
and steering it (acting and responding) towards its objectives.
In the lifetime of a community, risks or opportunities may arise at any
point in time. A risk is characterised by an event that has a negative
impact on the community objectives. An opportunity, on the other hand, is
characterised by an event that has a positive impact on the community
objectives. Management of risks and opportunities in online communities goes
beyond traditional, simple website statistics. The management process is
a set of coordinated activities to direct and control an organisation with regard
to risk, including risk identification, assessment, treatment and monitoring.
Different treatment strategies (e.g. mitigate, enhance, avoid, accept and
ignore) are possible according to the likelihood and impact of the risk or
opportunity. The treatments fall into two categories: proactive (addressing
risks and opportunities before they occur) and reactive (taking measures after
they occur). Proactive risk and opportunity management requires
understanding and forecasting the interplay between users, the community
and the network structure.
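The distinction between risks and opportunities, and the dependence of the treatment strategy on likelihood and impact, can be made concrete with a minimal sketch. It assumes that each event carries an estimated likelihood in [0, 1] and a signed impact on one community objective (negative for a risk, positive for an opportunity). The names `CommunityEvent` and `choose_strategy`, the 0.5 cut-off and the particular mapping to strategies are hypothetical illustrations, not rules defined by ROBUST.

```python
from dataclasses import dataclass


@dataclass
class CommunityEvent:
    """Hypothetical event record: impact < 0 harms an objective (risk),
    impact >= 0 supports it (opportunity)."""
    name: str
    objective: str
    likelihood: float  # estimated probability in [0, 1]
    impact: float      # signed, normalised impact in [-1, 1]

    @property
    def kind(self) -> str:
        return "risk" if self.impact < 0 else "opportunity"


def choose_strategy(event: CommunityEvent, cutoff: float = 0.5) -> str:
    """Pick a treatment strategy from likelihood and impact magnitude.

    The cut-off and the mapping below are illustrative assumptions only.
    """
    likely = event.likelihood >= cutoff
    severe = abs(event.impact) >= cutoff
    if event.kind == "risk":
        if likely and severe:
            return "avoid"
        if likely or severe:
            return "mitigate"
        return "accept"
    # Opportunities: enhance the significant ones, otherwise ignore them.
    return "enhance" if (likely or severe) else "ignore"


churn = CommunityEvent("expert churn", "knowledge sharing", likelihood=0.7, impact=-0.8)
print(churn.kind, choose_strategy(churn))  # risk avoid
```

In practice the mapping would be derived from the community objectives and the owner's risk appetite rather than fixed cut-offs.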
Some of these aspects can be illustrated by the social dynamics theories
presented in this report, such as social capital theory, critical mass theory,
social exchange and dependency theory, and many more. Though some of these
theories may have been conceived before the Internet era, online
communities can still benefit from such generic insights. These theories shed
light on users' motivation for joining communities and explain user
behaviour and knowledge transfer in the context of the community and its
network structure. Community owners can then appreciate the importance of
the social aspects of their networks, detect relevant risks and exploit
opportunities at an early stage.
Although there is work on different aspects of online communities (sentiment
analysis, churn analysis, role dynamics), these aspects have not been
considered from an integrated risk and opportunity management perspective.
This gap is being addressed within ROBUST, mainly via three use case scenarios:
- Employee use case (IBM) [80]
- Business partners use case (SAP) [62]
- Public domain (Polecat) [90]

In order to better understand the value of hosting an online community and
how community owners quantify and assess the health of their communities,
we produced a questionnaire which was distributed to community hosts and
owners (and moderators) within the IBM and SAP communities. The
questions covered community objectives, costs, benefits, size of the
respective communities, technologies used, health metrics as well as risks
and opportunities.
One of the main observations from the questionnaire is that communities are
driven by objectives, which in this case are usually set by the owners. Another
insight we gained from the questionnaire is that, although some communities may
seem to have similar objectives, the ranking of these objectives in terms of
importance and priority may vary. These objectives usually shape the
community policies, platform, user behaviour and interactions.
The risks and opportunities collected in the questionnaire could be grouped
into different categories (e.g., user/community activity, community evolution,
community/user role dynamics, community content, etc.). There are natural
dependencies between these categories, i.e., we observe that a risk may
depend on another risk, or a risk may actually expose an opportunity. Which
categories and dependencies of risks are relevant to a given community
depends on the owner and the community objectives.
Risk management standards have been reviewed in order to understand the
taxonomy and semantics of risk management. Multiple standards are
presented, and they show a significant consensus on the overall management
process. The choice of a risk standard is not the goal of the ROBUST project.
However, the framework process, definitions and terminology used in
ROBUST have been chosen to align with the standards described in this report.
The ROBUST project aims to provide a tool to assist decision
makers and support the online community risk management process. For this
reason, techniques for risk identification, modelling and evaluation have also
been reviewed. The techniques range from qualitative to quantitative
approaches with varying levels of complexity. According to the evaluation of the
reviewed techniques, Bayesian networks, compartment models and agent-based
simulation stand out as feasible approaches for risk and opportunity
specification, assessment and forecasting in ROBUST.
Bayesian networks show promise in modelling and assessing risks that may
have dependencies. In a sophisticated framework, we strive to model a
system in which we can predict the impact of taking certain actions on a risk
or opportunity, which may give rise to other risks or opportunities. This is a
non-trivial problem, which requires substantial research.
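To illustrate how a Bayesian network captures dependent risks and the effect of an action, the sketch below uses a hypothetical three-node chain: a moderator intervention (an action) influences the risk of low activity, which in turn influences the risk of churn. All probabilities are invented for illustration, and inference is done by exact enumeration, which only scales to very small networks; a real model would need a learned structure and learned parameters.

```python
# Hypothetical chain: Intervention -> LowActivity -> Churn.
# All probabilities below are illustrative assumptions, not ROBUST data.

# P(LowActivity = True | Intervention)
P_LOW_ACTIVITY = {True: 0.2, False: 0.5}
# P(Churn = True | LowActivity)
P_CHURN = {True: 0.6, False: 0.1}


def p_churn(intervention: bool) -> float:
    """Marginal P(Churn = True | Intervention), summing out LowActivity."""
    p_low = P_LOW_ACTIVITY[intervention]
    return p_low * P_CHURN[True] + (1 - p_low) * P_CHURN[False]


def p_low_activity_given_churn(intervention: bool) -> float:
    """Diagnostic query via Bayes' rule: P(LowActivity = True | Churn = True)."""
    p_low = P_LOW_ACTIVITY[intervention]
    return p_low * P_CHURN[True] / p_churn(intervention)


print(round(p_churn(False), 2), round(p_churn(True), 2))  # 0.35 0.2
print(round(p_low_activity_given_churn(False), 2))        # 0.86
```

Because the action node has no parents in this toy network, conditioning on it coincides with intervening on it; with confounders, estimating the effect of an action would require more care.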
Agent-based simulation and compartment models are considered for
forecasting whether risks or opportunities are likely to occur in the future.
This is key to achieving the proactive community management we strive to
support in ROBUST. If a forecasting tool can give the probability of a
risk occurring, mitigating actions can be taken by the community owner before
the risk has actually occurred.

Agent-based simulation serves multiple purposes, especially when other
techniques cannot be applied (e.g. when the Markov property does not hold or
the Bayesian network structure cannot be learned). At the risk identification
level, simulation helps to identify and understand the origin of emergent
behaviours before they actually have a chance to endanger the real system
operation. On the assessment front, the simulation will be able to predict the
future state of the system and thus to investigate whether a certain risk is
likely to happen. On the treatment side, the simulation will allow the
effectiveness of admin actions to be evaluated before they are applied to the
real system.
To enable and automate proactive community management, we propose a
framework in which users can specify treatment plans and connect these to
the assessment of the risks or opportunities. That is, a user can define a
treatment plan (e.g. mitigate, exploit, ignore) whose actions should be
executed when the probability of a certain risk exceeds a given threshold.
Moreover, multiple actions may be required in a treatment plan in order to
manage a risk or opportunity. Therefore, we propose the use of a workflow
specification language to detail the actions that should be taken, and an
engine that can automatically execute the plan. A review and choice of
workflow technology is also presented in this report.
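The threshold-triggered treatment plan described above can be sketched as follows. This is a minimal illustration assuming a forecast probability is already available; `TreatmentPlan`, `on_forecast` and the two example actions are hypothetical names. The plan runs its actions strictly in sequence, whereas the workflow languages reviewed in Section 6 also support branching, parallelism and human tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Action = Callable[[], str]  # an action returns a short log message


@dataclass
class TreatmentPlan:
    """Hypothetical plan: run `actions` in order when a forecast exceeds `threshold`."""
    risk: str
    threshold: float
    strategy: str  # e.g. "mitigate", "exploit", "ignore"
    actions: List[Action] = field(default_factory=list)

    def on_forecast(self, probability: float) -> List[str]:
        """Execute the actions if the forecast probability exceeds the threshold."""
        if probability <= self.threshold:
            return []
        return [action() for action in self.actions]


def notify_moderator() -> str:  # hypothetical action
    return "moderator notified"


def post_welcome_thread() -> str:  # hypothetical action
    return "welcome thread posted"


plan = TreatmentPlan("newcomer churn", threshold=0.3, strategy="mitigate",
                     actions=[notify_moderator, post_welcome_thread])
print(plan.on_forecast(0.45))  # ['moderator notified', 'welcome thread posted']
print(plan.on_forecast(0.10))  # []
```

Keeping the plan as data (threshold plus an ordered list of actions) is what makes it natural to hand over to a workflow engine later.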
In Section 2, we address the state of the art of risks and opportunities in social
networks, and then we present the questionnaire results, including the risk
and opportunity categories, in Section 3. Risk management standards
are reviewed in Section 4 to set the basis of our risk management approach.
The techniques for risk identification, modelling and evaluation are discussed
in Section 5. Treatment, an essential phase of risk management, is
discussed in Section 6. Section 7 presents the ROBUST approach for
specifying risks and opportunities, including treatment plan workflows. In
Section 8 we show the risks and opportunities that are to be addressed in the
context of the ROBUST scenarios as well as the work done in investigating
the validity of the approaches selected in this report. Conclusions and further
work are discussed in Section 9.
2. Risks and opportunities in social networks
Social dynamics theories shed light on the interplay between users, the
community and the network structure. Though these theories do not address
online communities in particular, online community managers can still benefit
from such generic insights. Understanding the "how" and "why" of community
dynamics is crucial to identify risks and opportunities as well as how to
mitigate or exploit them.
One of the interesting theories that highlight the motivation behind joining a
community is the social capital theory [66]. Social capital refers to the
resources embedded within networks of human relationships. Social capital
resides in the relations among the nodes and 'just as physical and human
capital facilitate productive activity, social capital does as well' [47, 66]. These
networks of relations include proximate as well as virtual communities [78].
Such a view of social capital rests on the premise that it is all about
establishing relationships purposefully and employing them to generate
intangible and tangible benefits in the short or long term. The benefits could be
social, psychological, emotional and economic. The "strength of weak ties"
theory [27] analyses macro phenomena like information diffusion and social
mobility in relation to the strength of interpersonal ties. The strength of a tie is
a combination of the amount of time, emotional intensity, intimacy and the
reciprocal services that characterise the tie. Though these characteristics
come from a sociological background, they can still be mapped (to an extent)
to observed features in online communities. The theory argues that weak
ties can act as a "bridge" that spans parts of a social network to connect
otherwise disconnected social groups. It also argues that "no strong tie is
a bridge", because people linked by strong ties tend to share the same
contacts, so little new information flows through such ties. Consequently,
weak ties are the most valuable for maximising the efficiency of knowledge
diffusion across a network [27]. Conversely, the fact that members linked by
strong ties have similar knowledge and expertise is an opportunity that can
be exploited in ROBUST, for instance, in order to identify experts and/or
assign moderators.
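The weak-ties argument can be checked on a small graph. The sketch below, under stated assumptions, scores each tie as an equal-weight average of the four characteristics named above (time, emotional intensity, intimacy, reciprocity; the equal weighting is our assumption, the theory prescribes no formula) and finds bridges, i.e. edges whose removal disconnects the graph, using Tarjan's low-link depth-first search. The network and scores are hypothetical.

```python
from collections import defaultdict


def tie_strength(scores):
    """Equal-weight average of (time, intensity, intimacy, reciprocity) in [0, 1]."""
    return sum(scores) / len(scores)


def find_bridges(edges):
    """Return edges whose removal disconnects the graph (Tarjan's low-link DFS).

    Assumes a simple undirected graph (no parallel edges).
    """
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    disc, low, bridges = {}, {}, []
    timer = 0

    def dfs(node, parent):
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:  # back edge: a cycle reaches an earlier node
                low[node] = min(low[node], disc[nxt])
            else:            # tree edge: recurse, then propagate the low-link
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > disc[node]:
                    bridges.append(frozenset((node, nxt)))

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return bridges


# Hypothetical tie scores: (time, emotional intensity, intimacy, reciprocity).
ties = {
    ("alice", "bob"): (1.0, 0.75, 1.0, 0.75),
    ("bob", "carol"): (0.75, 1.0, 0.75, 1.0),
    ("alice", "carol"): (1.0, 0.75, 0.75, 1.0),
    ("carol", "dave"): (0.25, 0.0, 0.25, 0.5),  # weak tie between two groups
    ("dave", "erin"): (0.75, 0.75, 1.0, 0.75),
    ("erin", "frank"): (1.0, 1.0, 0.75, 0.75),
    ("dave", "frank"): (0.75, 0.75, 0.75, 1.0),
}
strength = {frozenset(edge): tie_strength(s) for edge, s in ties.items()}

for bridge in find_bridges(ties):
    print(sorted(bridge), strength[bridge])  # ['carol', 'dave'] 0.25
```

In this toy network the only bridge is also the weakest tie, while every strong tie sits inside a closed triangle, which is the pattern the theory predicts.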
Critical mass theory [57] addresses the socio-dynamics within a
community. It holds that a critical mass is necessary in order to provide
sufficient momentum for change. The momentum then becomes self-sustaining
and creates further growth. The consequence is that the whole community will
benefit from the efforts of a sub-group, but also that contributors may not get a
sufficient return on their own actions. These consequences can be positive or
negative for the community as well as for individuals, depending on the
community objectives and interaction norms. In ROBUST we will address
risks and opportunities related to the composition of the critical mass
according to certain features (e.g. roles, expertise).
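Critical mass dynamics are often illustrated with a threshold model of collective behaviour in the style of Granovetter. This is a related formalisation swapped in here for illustration, not necessarily the model of [57]. Each member joins once the active share of the community reaches their personal threshold. The hypothetical thresholds below show how removing a single early adopter can stop an otherwise self-sustaining cascade.

```python
def final_participation(thresholds):
    """Run a threshold model of collective behaviour to its fixed point.

    thresholds[i] is the share of the community that must already be active
    before member i joins in; members with threshold 0 start the process.
    """
    n = len(thresholds)
    active = sum(1 for t in thresholds if t <= 0.0)
    while True:
        share = active / n
        new_active = sum(1 for t in thresholds if t <= share)
        if new_active == active:  # no one else is persuaded: stable state
            return share
        active = new_active


uniform = [i / 10 for i in range(10)]                # 0.0, 0.1, ..., 0.9
stalled = [0.0, 0.2] + [i / 10 for i in range(2, 10)]  # the 0.1 member now needs 0.2

print(final_participation(uniform))  # 1.0  (self-sustaining cascade)
print(final_participation(stalled))  # 0.1  (critical mass never reached)
```

The two populations differ in a single member, yet one reaches full participation and the other stalls, which is why the composition of the early mass matters as much as its size.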
The social exchange and dependency theory [33] addresses social behaviour
as an exchange of goods. The outcome or gain is the difference between the
benefits (e.g. material or financial gains, social status) and the costs (e.g.
sacrifices of time, money). The notion of satisfaction is formalized as the
difference between the outcome and a certain comparison level. In this light,
incentive mechanisms [21] such as personal reputation, social altruism and
tangible rewards [36] are effective ways to motivate and encourage community
members to contribute, as they raise the benefits side of the exchange. These
aspects are already considered, for instance, in the SAP communities, where a
points-based reward system is in place to increase the benefits for users and
boost knowledge sharing.
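Using symbols introduced here for illustration (they are not taken from [33]), the outcome O, benefits B, costs C, comparison level CL and satisfaction S described above relate as follows, with a small worked example using invented values:

```latex
\begin{aligned}
O &= B - C \\
S &= O - CL
\end{aligned}
\qquad
\text{e.g. } B = 10,\; C = 4,\; CL = 5 \;\Rightarrow\; O = 6,\; S = 1 > 0
```

A positive S indicates a satisfied member; incentives such as reputation points act by increasing B, while usability hurdles act by increasing C.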
Moreover, the community owner has an incentive to make sure
that alternative communities are less profitable for users. This is where the
quality and uniqueness of the community's services are challenged. A lot of
work has been carried out on churn analysis in telecom networks [64], banking
[97] and service providers [34]. These works have inspired the application of
churn analysis techniques to online social networks [48, 50]. Churn is one of
the important risks identified in ROBUST. Multiple churn types were identified:
global, role and forum churn. The analysis tried to find correlations between
user/community features and the churn probability of the user. Various
techniques are being applied to the online communities in the ROBUST
scenarios.
Contagion theory addresses the effect of the crowd on the individual [9].
The theory states that the crowd makes individuals behave in certain ways.
In the context of social networks, contagion theories consider these networks
as channels for infectious attitudes and behaviour [9, 15]. Due to their
exposure to others' beliefs and attitudes, members of the same social network
are likely to develop similar assumptions and beliefs.
Contagion may occur through direct communication among members or even
through structural equivalence (i.e. members with similar communication
patterns may develop similar behaviours and attitudes). Structural analysis as
well as topic evolution and propagation are among the topics investigated in
ROBUST WP3 and WP5.
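Structural equivalence can be approximated by comparing whom users communicate with, even if they never talk to each other directly. The sketch below uses the Jaccard overlap of each user's set of reply partners, a common and simple approximation that we assume here; the reply data and the 0.8 similarity cut-off are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two users' interaction partners (1.0 = identical patterns)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical reply partners per user.
contacts = {
    "ana": {"kim", "lee", "max"},
    "ben": {"kim", "lee", "max"},
    "cai": {"zoe"},
}


def structurally_equivalent_pairs(contacts, min_similarity=0.8):
    """List user pairs whose interaction patterns overlap at least min_similarity."""
    names = sorted(contacts)
    pairs = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            # Exclude the pair themselves so a direct tie does not distort the score.
            score = jaccard(contacts[u] - {v}, contacts[v] - {u})
            if score >= min_similarity:
                pairs.append((u, v, score))
    return pairs


print(structurally_equivalent_pairs(contacts))  # [('ana', 'ben', 1.0)]
```

Here ana and ben never interact, yet they are structurally equivalent, so contagion theory would predict that they drift towards similar attitudes.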
In an interesting approach, Lithium Technologies analysed hundreds of
communities and produced a numerical index to measure online community
health [56]. The CHI (Community Health Index) is based on an analysis of
community size growth, content utility, traffic from page views,
responsiveness (reply speed), interactivity (in terms of thread depth and
unique contributors) and liveliness (number of posts per day in each community
segment).
Lithium divides these health factors into two sets, named diagnostic and
predictive:
- Diagnostic indicators, including size, content and traffic, which reflect the
state of the community.
- Predictive indicators, including responsiveness, interactivity and
liveliness, which reflect the behavioural patterns within the community.
Though Lithium provides mathematical models to calculate the indicators,
it does not go further and quantify the predictive power of these indicators
for future community health.
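The predictive indicators listed above could be approximated from raw post data roughly as follows. This is a minimal sketch: the definitions (mean hours to first reply, mean unique contributors per thread, posts per day) are our own illustrative assumptions and not Lithium's published CHI formulas, and the diagnostic indicators (size, content utility, traffic) are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Post:
    thread: str
    author: str
    created: datetime


def predictive_indicators(posts):
    """Approximate responsiveness, interactivity and liveliness from raw posts.

    These definitions are illustrative assumptions, not Lithium's CHI formulas.
    """
    threads = {}
    for post in sorted(posts, key=lambda p: p.created):
        threads.setdefault(post.thread, []).append(post)

    # Responsiveness: hours between a thread's opening post and its first reply.
    delays = [
        (ps[1].created - ps[0].created).total_seconds() / 3600
        for ps in threads.values()
        if len(ps) > 1
    ]
    # Interactivity: number of unique contributors per thread.
    contributors = [len({p.author for p in ps}) for ps in threads.values()]
    # Liveliness: posts per day over the observed period (at least one day).
    times = [p.created for p in posts]
    days = max((max(times) - min(times)).total_seconds() / 86400, 1.0)

    return {
        "responsiveness_hours": mean(delays) if delays else None,
        "interactivity": mean(contributors),
        "liveliness": len(posts) / days,
    }


posts = [
    Post("t1", "ann", datetime(2011, 10, 1, 9)),
    Post("t1", "bob", datetime(2011, 10, 1, 11)),
    Post("t1", "cy", datetime(2011, 10, 1, 12)),
    Post("t2", "dee", datetime(2011, 10, 2, 9)),
    Post("t2", "ann", datetime(2011, 10, 2, 13)),
    Post("t3", "bob", datetime(2011, 10, 3, 9)),
]
print(predictive_indicators(posts))
# {'responsiveness_hours': 3.0, 'interactivity': 2, 'liveliness': 3.0}
```

Tracking such indicators over time, and testing whether they anticipate later changes in size or traffic, is one way to quantify the predictive power that the index itself leaves open.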
From another perspective, online communities exhibit issues related to user
behaviour and technology usage. According to Cross et al. [18], employees in
an organisation rely more on other people than on databases and
computer systems when seeking answers to unstructured questions. As a
result, organisations need more employees to contribute to social networks in
order to improve knowledge sharing. Bughin [14] also emphasises that the ease
of use of online community features is important, as any hurdles will
discourage users from collaborating actively. Creating multiple communities
within a company, depending on the department and areas of interest of
various individuals [53], is another way to exploit the benefits of the
technology. For example, a sub-community on the latest hardware updates for
employees of the IT infrastructure department could boost their knowledge
and productivity.
Privacy and security are important concerns that arise when
using social networks. A number of different organisations and projects have
addressed these issues [24, 94]. Companies usually have a security policy on
the use of online communities and social networks, ranging from total
rejection to careful acceptance. For this reason, ROBUST will focus on other
categories of risks and opportunities, including behavioural, sentiment-related
and structural ones.
The research work discussed above shows the multi-disciplinary scope of
online communities. Community owners should appreciate the importance of
the social aspects of their networks to detect relevant risks and exploit
opportunities at an early stage. The above works also present some guidelines
on how a community can be set up and managed successfully. Note that
SN theories should not be the only source of risks and opportunities. Risks
and opportunities depend on the objectives of the community (see section 4).
A user's own objectives may differ from those of the community owner; in
WP1 we focus on the community owner's objectives, as in the SAP
and IBM scenarios. The following section further discusses specific objectives
a community may have and the risks and opportunities that can arise.
3. Risks and opportunities identified in ROBUST
To expand on the literature on risk and opportunity management in online
communities, as discussed above, we designed a questionnaire targeting
community hosts and owners/moderators, aiming to gain further insight
into three main topics of interest:
1. The value of online communities in terms of Return On Investment
(ROI),
2. Risks and opportunities envisaged in the online communities, and
3. The observed user behaviour in the respective online communities.
The latter is not discussed here, as it is the focus of WP3; it is
discussed in D3.1 [81] instead. D3.1 also contains more details about how the
questionnaire was designed. The full set of questions, which was distributed to
community hosts and owners within SAP and IBM, is included in Appendix N.
Due to confidentiality, this section draws only upon those main findings from
the questionnaire that can be reported without exposing sensitive information.
3.1. Questionnaire findings
3.1.1. Overview of responses
The ROBUST use case partners, IBM (employee community) and SAP
(business community), distributed the questionnaire within their organisations,
providing us with a total of 48 responses (34 from IBM and 14 from SAP). Among
the IBM responders, the most common role was community host (12), whilst
among the SAP responders it was moderator (7).
The third use case partner representing the public domain, Polecat, could not
distribute the survey to community hosts and owners, but has provided
information regarding risks and opportunities, which has been incorporated in
Section 3.2.
3.1.2. Community insights
The notion of a community is somewhat different in the case of IBM and SAP.
In the IBM case, there is a community platform called IBM Connections, which
has a user base and many private and public communities. Any user can
belong to any number of communities, each of which has its own set of tools,
such as forums, wikis and blogs. The IBM communities vary in size and purpose,
as detailed further in D7.2 [79]. In the SAP case, there are only a few
communities, and the main focus is on customer support in different areas
(related to SAP's products and services). Therefore, the SAP responders gave
very similar responses, whilst the IBM responses vary considerably more.
We asked the community hosts and owners/moderators to rank a list of 14
possible benefits of operating/hosting their respective online communities,
as summarised in Table 1. They could give each option a rank of
1-5, 5 being the highest. Some responders did not rate some options, which
could imply that the option was not applicable, or not a benefit at all, to the
community. However, it could also mean that the responder could not comment
on that particular option; the question gave no specific instruction about this.
Therefore, to avoid ambiguity in the interpretation of the results, the table
reports the number of people who ranked a benefit (count) and two
summary statistics computed over the ranked responses only (median and mode).
It should be noted that the people responding on behalf of different
communities, particularly in IBM, rank the benefits very differently. For example,
for some communities, public relations is important (ranked 5), whilst it is not
in others (ranked 1). This is also clear from the count of IBM
responses, which shows that some options were considered a benefit (to
some degree) by only about half of the responders (e.g., reputation management,
advertising and marketing, and public relations). As discussed above, the SAP
communities have similar objectives, which is why their answers are more
uniform (high counts, with similar medians and modes).


Table 1: Benefits in operating/hosting the online community. Ranked 1-5, 5 being the
highest.
                           IBM                      SAP
Benefit                    count   median   mode    count   median   mode
Customer support           23      3        4       14      5        5
Developer support          24      3        3       13      3        5
Ideas generation           33      4        4       14      3        3
Opinion research           22      3        3       13      3        3
Spread of word of mouth    24      4        4       14      3.5      3
Market research            19      2        1       13      3        2
Advertising and marketing  18      2.5      1       13      3        2
Reputation management      17      3        1       13      4        3
Employee communication     32      4        5       13      3        4
Finding experts            26      4        4       14      4        4
Fostering collaboration    30      5        5       14      4        5
Public relations           18      3        1       13      4        4
New product development    21      3        4       13      3        3
Connecting people          32      5        5       14      4.5      5

Given the purposes/aims of the communities in IBM and SAP, the highest
ranked benefits are not surprising. For example, for IBM, these include
employee communication, fostering collaboration, ideas generation and
connecting people. The latter is also a highly ranked benefit in SAP. Customer
support is also, understandably, a highly ranked benefit in SAP.
Additional benefits mentioned by IBM responders:
- Transparency, open teaming and visibility of what goes on in the
community
- Sharing information and best practices
- Communicating information to a large audience
These additional benefits correspond well with the aims of IBM Connections,
as discussed in D7.1 [80] and D7.2 [79].
3.1.3. Business impact
The responders were also asked to rank the parameters they thought useful for
measuring the business impact of the online community. Table 2 gives a
summary of the responses. Note that the responders ranked these
parameters according to their perception of what would be useful to them
if such metrics were available. It is also worth
emphasising that some responders commented that they did not know,
but would find access to such metrics very useful in
assessing their respective communities.
Table 2: Parameters perceived useful for quantification of business impact. Ranked 1-5, 5 being the highest.
                                   IBM                      SAP
Parameter                          count   median   mode    count   median   mode
Number of community members        28      3        5       14      4.5      5
Number of platform visits per day  26      3        4       13      4        5
Time users spent online            19      3        1       12      4        5
Customer support load              16      1        1       13      4        3
Sales figures                      15      1        1       13      3        1
Productivity                       20      3        1       13      3        4
Work process outcomes              21      4        5       12      3        3

As with the responses about the benefits of
operating/hosting the communities, discussed above, we observe considerable
variation in which parameters the different communities consider useful for
quantifying business impact. We also asked about additional parameters that may be
used, or would be of interest. The following parameters were suggested by
SAP responders (business community):
- The number of new forum threads
- Connectedness of members
- Feedback survey
The following parameters were suggested by IBM responders:
- Level of activity in discussion forums
- The volume of content generated by users
- Reduction in email frequency
- Number of content downloads
- Ratings of content
- Response times to members requesting help
- Votes on ideas
- Whether a community is self-sustainable
- Testimonials

If business objectives can be measured using metrics extracted from
community data logs, they can be managed and monitored in a
risk management system. Furthermore, the framework discussed in Section 7
links risks and opportunities to community objectives, so defining the
objectives is the first step. Therefore, when reporting the
foreseen impact of certain risks and opportunities, the impact should be
connected directly to the objectives of the community.
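As a minimal sketch of linking a community objective to a log-derived metric, the Python below records a target value for each objective and reports the relative deviation of the observed value from that target, in the spirit of treating an effect as a deviation from the expected. The `Objective` class, the metric names, targets and observed values are all hypothetical and are not part of the framework described in Section 7.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    """A community objective tied to a metric extracted from data logs.

    Field names and targets are illustrative assumptions only.
    """
    name: str
    metric: str
    target: float
    higher_is_better: bool = True

    def deviation(self, observed: float) -> float:
        """Relative deviation from target; negative means the objective is at risk."""
        diff = (observed - self.target) / self.target
        return diff if self.higher_is_better else -diff


objectives = [
    Objective("Fast support", "median_response_hours", 4.0, higher_is_better=False),
    Objective("Active community", "weekly_active_users", 500.0),
]
# Hypothetical metric values extracted from community logs.
observed = {"median_response_hours": 6.0, "weekly_active_users": 550.0}

for obj in objectives:
    d = obj.deviation(observed[obj.metric])
    status = "at risk" if d < 0 else "on track"
    print(f"{obj.name}: {d:+.0%} ({status})")
# Fast support: -50% (at risk)
# Active community: +10% (on track)
```

A positive deviation could equally be read as a possible opportunity, which keeps both directions of effect in one representation.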
3.1.4. Community health
Related to the above question, the community hosts and owners/moderators
were also asked what metrics they considered important for measuring the
health of their respective communities.
This was an open question, to which SAP responders suggested:
- The number of users or unique visitors
- The number of active users
- The number of forum contributions
- Hits per page
- The number of answered questions
- The number of answered questions vs the number of unanswered
questions
- Response times
- Contribution points
- Zero downtime (of services)
- Ratings
- Visiting trends
Suggestions from IBM responders (employee community):
- Activity (not specified, but assumed to be generation of content in one
form or another)
- Active participation in forums
- Quality of interactions (not volume of users)
Most of the suggestions above can be measured directly from data (if
provided). However, regarding the last suggestion, quality of interactions, the
responder emphasised that the number of users in a community would not, in
itself, be an interesting metric for community health. Quality is an example of a
metric that is not trivial to quantify, and bespoke tools may need to be
developed to measure it for a specific community.
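Several of the metrics suggested above, such as the number of answered versus unanswered questions and response times, can be computed directly from a question log. The sketch below assumes a hypothetical log format of (question id, time asked, time of first answer or None); the format and the sample data are illustrative only.

```python
from datetime import datetime
from statistics import median

# Hypothetical question log: (question_id, asked_at, first_answer_at or None).
questions = [
    ("q1", datetime(2011, 10, 3, 9, 0), datetime(2011, 10, 3, 10, 30)),
    ("q2", datetime(2011, 10, 3, 11, 0), None),
    ("q3", datetime(2011, 10, 4, 8, 0), datetime(2011, 10, 4, 14, 0)),
    ("q4", datetime(2011, 10, 5, 16, 0), datetime(2011, 10, 5, 16, 45)),
]


def health_metrics(log):
    """Answered/unanswered counts, their ratio and median response time in hours."""
    answered = [(asked, ans) for _, asked, ans in log if ans is not None]
    unanswered = len(log) - len(answered)
    hours = [(ans - asked).total_seconds() / 3600 for asked, ans in answered]
    ratio = len(answered) / unanswered if unanswered else float("inf")
    return {
        "answered": len(answered),
        "unanswered": unanswered,
        "answered_to_unanswered": ratio,
        "median_response_hours": median(hours) if hours else None,
    }


print(health_metrics(questions))
# {'answered': 3, 'unanswered': 1, 'answered_to_unanswered': 3.0, 'median_response_hours': 1.5}
```

Tracking these values per week or month would turn them into the trends that a risk management system could monitor.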
An interesting observation is that most of the metrics are
quantifications of different types of community activity, which corresponds to the
risk mentioned most often in the questionnaire: that the community becomes
inactive.
3.2. Risks and opportunities
A wide range of risks and opportunities has been identified in the
questionnaire, as well as via discussions with project partners and in the
literature. The risks and opportunities discussed in this section serve as
examples of what a community may experience, and provide invaluable
input to the requirements for the risk management framework discussed in
Section 7. This is to ensure that the framework is generic, flexible and
extensible enough to cover the different types of risks and opportunities we can
envisage, as well as new risks and opportunities any respective community
may have. Identification and analysis of specific risks and opportunities, as
well as developing tools for detection/forecasting, are key research activities
spanning several work packages in ROBUST. Section 8 discusses specific
risks and opportunities currently considered in the ROBUST use cases, and
tools under development in the project to address them.
Due to confidentiality, we cannot disclose certain details of the risks and
opportunities suggested in the questionnaire, and those discussed here should
not be interpreted as actual risks in the respective communities. The full details of the
questionnaire responses are therefore only available in a confidential
appendix (Appendix O). It should be noted that many of the risks and
opportunities discussed below are hypothetical, particularly those stemming
from the questionnaire, in which very little detail was provided. Therefore, the
examples below are discussed at a high level.
For the sake of presentation, we grouped most of the risks and opportunities
in this report into 9 categories, which are discussed in the respective
sections below:
- Community/user activity
- Community evolution
- Community usage
- User experience/behaviour
- Community/user role dynamics
- Community content
- Community structure
- Community operation
- Security
This is not intended to be a formal taxonomy, and there is clearly some
overlap between the categories, largely due to the interrelated nature
of the risks and opportunities. Also note that any given community may only
experience a few of the risks and opportunities discussed in this section, which
ultimately depends on the objectives of the respective community (as
discussed above in Section 3.1.3).
3.2.1. Community/user activity
This is one of the main risks identified from the questionnaire and by the use case
partners, and it can be defined in a variety of ways. For example, a
community may be at risk of becoming inactive, as indicated by an increasing
proportion of inactive users, less content being generated, or key users leaving
(which can have an avalanche effect on other users leaving). Such a risk could also
cover users who stop consuming information from the community,
indicating that they no longer find it interesting. In this regard, sentiment
analysis could be a useful tool in detecting such a risk. Churn refers to users
becoming inactive, a risk that has already been
addressed in the literature [50]. This is discussed further in Section 8.4.1.
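A very simple recency rule can flag users as inactive and track the proportion of inactive users over time. The sketch below is an assumption for illustration, not the churn model referred to in [50] or Section 8.4.1; the 30-day window and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def inactive_users(last_activity, today, window_days=30):
    """Users whose last recorded activity is older than window_days.

    A crude recency rule standing in for a churn model; the default
    window is an arbitrary assumption.
    """
    cutoff = today - timedelta(days=window_days)
    return sorted(u for u, last in last_activity.items() if last < cutoff)


def inactive_proportion(last_activity, today, window_days=30):
    """Share of users currently flagged as inactive."""
    if not last_activity:
        return 0.0
    return len(inactive_users(last_activity, today, window_days)) / len(last_activity)


# Hypothetical last-activity dates per user.
last_seen = {
    "alice": date(2011, 10, 28),
    "bob": date(2011, 8, 15),
    "carol": date(2011, 9, 20),
    "dave": date(2011, 10, 1),
}
today = date(2011, 10, 31)
print(inactive_users(last_seen, today))       # ['bob', 'carol']
print(inactive_proportion(last_seen, today))  # 0.5
```

A rising proportion across successive evaluation dates would correspond to the "increasing proportion of inactive users" form of this risk.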
On the flip side, there could be an opportunity for users to become more active
if a particular user becomes inactive or even leaves. For example, if this were
a user who had a negative impact on the community, the other users would
experience a more positive environment. Even if this is not the case, the event
of one user leaving could give rise to somebody else emerging into a more
active role within the community. There could also be an opportunity to
increase community activity by connecting certain individuals, or introducing
content to users according to their interests.
By analysing the interactions and activities of users in a social network, it is
possible to analyse the strength of the ties between them. The strength of ties
between people plays a vital role in social networks, but the theories
surrounding this topic are debated. For example, McPherson et al.
[58] argue that the strength of ties affects users' membership in a community,
correlating weak ties with a higher risk of users leaving (churning). Strong ties,
on the other hand, can be stabilising in nature ("breeding local cohesion" [28]),
which can result in people remaining attached to their social networks. Consider, for
example, an individual who is interested in blogging and forums for technical
discussion, but most of whose friends (strong ties) exchange and share
knowledge via Wiki pages. It is then likely that s/he will lose interest
in blogs and forums and contribute more often to the Wiki.
However, Granovetter [28] states that weak ties increase the efficiency of
information diffusion by minimising redundancy. Weak ties are therefore more
effective in exposing users to new information [58], which
can be desirable for a community in which information transfer/sharing is
essential.
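One simple proxy for tie strength, assumed here for illustration, is the number of interactions (e.g. replies) between two users; Granovetter's notion of tie strength is richer than interaction frequency alone. The sketch counts interactions per unordered pair of users and labels ties at or above a hypothetical threshold as strong, so that, for instance, the share of weak ties in a community could be tracked.

```python
from collections import Counter


def tie_strengths(interactions):
    """Count interactions per unordered pair of users, ignoring self-replies."""
    return Counter(frozenset((a, b)) for a, b in interactions if a != b)


def classify_ties(strengths, strong_threshold=3):
    """Label each tie 'strong' or 'weak' by interaction count (threshold is hypothetical)."""
    return {tuple(sorted(pair)): ("strong" if n >= strong_threshold else "weak")
            for pair, n in strengths.items()}


def weak_tie_share(labels):
    """Proportion of ties labelled weak."""
    return sum(v == "weak" for v in labels.values()) / len(labels) if labels else 0.0


# Hypothetical reply log: (author, user replied to).
replies = [("ann", "ben"), ("ben", "ann"), ("ann", "ben"),
           ("ann", "cat"), ("cat", "dan")]
ties = classify_ties(tie_strengths(replies))
print(ties)                            # ann-ben strong; ann-cat and cat-dan weak
print(round(weak_tie_share(ties), 2))  # 0.67
```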
3.2.2. Community evolution
In the early days of a community, growth may be a sensitive parameter,
dependent on the number of moderators available to manage it. There could
be a risk that the community grows too quickly. However, there could also be
a risk that a community grows too slowly to "catch on" if it attempts to
establish itself in a competitive area.

Depending on the objectives of a community, increasing diversity within a
community could be a risk or an opportunity. Diversity could be measured
in terms of topics discussed, types of users, etc.
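Shannon entropy is one common way to quantify such diversity; it is our illustrative choice here, not a measure prescribed by the sources above. The sketch computes the entropy of the topic labels of posts, assuming each post has already been assigned a topic; the labels and data are hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon entropy (in bits) of the distribution of categorical labels.

    0 means every item has the same label; log2(k) is the maximum for k labels.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical topic labels of posts in two months.
september = ["install", "install", "install", "billing"]
october = ["install", "billing", "api", "ui"]
print(round(shannon_diversity(september), 3))  # 0.811
print(shannon_diversity(october))              # 2.0
```

The same function applies to user types instead of topics; whether a rise is a risk or an opportunity still depends on the community's objectives.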
3.2.3. Community usage
This category includes risks such as users not finding the right
information/content or experts to resolve their problems (answer their
questions). Related to this, there is a risk of useful content being deleted. If the
community serves as a support network where users answer other users'
questions, there is a risk of a high proportion of incorrect answers.
If users are not happy with something in the community, a risk could develop,
indicated by a high level of negative sentiment from users. This could be about
a user, a community feature, a product or a topic of conversation.
If the community has a high growth rate, as discussed above, there is a
related risk of experts being overloaded. The community could also
experience a high level of trivial/unwanted content being generated. This
could in turn lead to users ignoring messages sent to the community because
they start to perceive the information as spam.
3.2.4. User experience/behaviour
Following on from the last risk mentioned above, in the context of user
experience, there is a particular risk in online communities that users are
afflicted by spam. Similarly, there is a risk that a user is or becomes a
spammer.
Features may be added to a community to improve certain operations, but this
could add to the risk of users misusing platform capabilities. For example, in
the SAP SCN [62], there is a point system in place to encourage users to
contribute to the community. However, this exposes the community to the risk
of users becoming point hunters instead of providing quality answers to other users.
Related to community/user inactivity, there is a risk of users becoming
dissatisfied. This could be an early sign of users becoming inactive and
leaving the community if they remain dissatisfied for some time.
Dissatisfaction could be induced by several things, such as inappropriate
language in the community or negative sentiments about a topic, which could
be treated as separate, but related, risks.
Long response times and long delays in providing answers to users are
examples of risks affecting the users' experience. Note also that response
times were suggested in the questionnaire as a metric for measuring the business
impact and health of online communities, as discussed above.
Based on positive and negative statements of the users in an online
community, it is possible to predict user behaviour using semantic
analysis [20]. For example, the use of inappropriate or abusive language
intended to hurt other users' feelings can have a negative effect on the
community. Nasukawa and Yi [67] propose a prototype system that uses
Natural Language Processing (NLP) to find sentiments by associating the
sentiments expressed with the subject being discussed.
Sentiment analysis is also being researched in ROBUST, as
discussed in D5.1 [49].
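The following is a deliberately crude, lexicon-based stand-in for subject-level sentiment analysis; it is neither the Nasukawa and Yi method nor the ROBUST approach described in D5.1. It scores only sentences that mention a given subject, counting positive minus negative words from a tiny hypothetical lexicon, which loosely mimics the idea of tying sentiment to what is being discussed.

```python
import re

# Tiny hypothetical lexicon; real systems use far larger, curated resources.
POSITIVE = {"great", "helpful", "fast", "thanks", "works"}
NEGATIVE = {"slow", "broken", "useless", "crash", "annoying"}


def subject_sentiment(text, subject):
    """Net sentiment (positive minus negative words) of sentences mentioning subject."""
    score = 0
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        if subject.lower() in words:
            score += sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score


post = "The new search is slow and annoying. Thanks for the helpful docs!"
print(subject_sentiment(post, "search"))  # -2
print(subject_sentiment(post, "docs"))    # 2
```

Aggregating such scores per topic or per user over time is one way a sustained level of negative sentiment, and hence possible dissatisfaction, might be surfaced.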
3.2.5. Community/user role dynamics
Recent research has investigated the role dynamics in different online
communities [4], which can be framed as risks or opportunities. For
example, there could be a disproportionate composition of roles, such as
experts and newbies. There could be a bipolar division of users between
consumers and contributors, which could be a risk in a community that aims to
maximise the interactions between users and engage everybody to contribute.
Related to this, there could be a specific risk of users not contributing across
forums, forming small cliques. User behaviour and role dynamics are currently
being researched by ROBUST partners, and further discussion is available in
D3.1 [81] and D5.1 [49]. This line of research is interlinked with the objectives of
different types of communities, which are discussed in D7.2 [79].
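The consumer/contributor division mentioned above can be quantified, under simple assumptions, by splitting active users according to whether they post. The sketch below is illustrative only: the definitions of "contributor" and "consumer", the `min_posts` parameter and the sample counts are hypothetical, and the role analyses in D3.1 and D5.1 are considerably richer.

```python
def role_composition(post_counts, view_counts, min_posts=1):
    """Split active users into contributors and consumers.

    Contributors have at least min_posts posts; consumers are the other
    active users (they viewed content but posted less than that). Users with
    no recorded activity are ignored. Returns (contributors, consumers, share).
    """
    active = {u for u in set(post_counts) | set(view_counts)
              if post_counts.get(u, 0) > 0 or view_counts.get(u, 0) > 0}
    contributors = {u for u in active if post_counts.get(u, 0) >= min_posts}
    consumers = active - contributors
    share = len(contributors) / len(active) if active else 0.0
    return contributors, consumers, share


# Hypothetical activity counts per user.
posts = {"u1": 12, "u2": 1}
views = {"u1": 40, "u2": 5, "u3": 30, "u4": 8, "u5": 0}
c, k, s = role_composition(posts, views)
print(sorted(c), sorted(k), s)  # ['u1', 'u2'] ['u3', 'u4'] 0.5
```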
3.2.6. Community content
As indicated previously, a risk can be defined in many ways, and, thus, also
classified differently. In Section 3.2.3, above, a risk related to users
experiencing a high proportion of trivial/unwanted content being generated
was discussed. Similarly, a risk could be defined from a content perspective,
as a high proportion of low quality content being created. A high level of
duplicate content or topics can also be a risk.
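A simple way to flag duplicate content, assumed here for illustration, is the Jaccard similarity between the word sets of two posts. The sketch compares all pairs, which is quadratic in the number of posts; at community scale, a technique such as MinHash would typically be used instead. The threshold and sample posts are hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lower-cased set of alphanumeric words in a post."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of intersection over size of union of two sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def near_duplicates(posts, threshold=0.6):
    """Pairs of post ids whose word-set Jaccard similarity meets the threshold."""
    toks = {pid: tokens(text) for pid, text in posts.items()}
    return [(p, q) for p, q in combinations(sorted(toks), 2)
            if jaccard(toks[p], toks[q]) >= threshold]


posts = {
    "t1": "How do I reset my password?",
    "t2": "how do i reset my password",
    "t3": "Server crashes after update",
}
print(near_duplicates(posts))  # [('t1', 't2')]
```

The proportion of posts involved in such pairs could then serve as a measure of the duplicate-content risk.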
Depending on the community objectives, an increasing diversity of topics may
be a risk if it deviates from the community mission. However, this could give
rise to an opportunity to split a community into two communities that can
each continue to flourish.
3.2.7. Community structure
Based on community network graph metrics, several risks and opportunities
can be envisaged. For example, depending on the objectives of a community,
certain community topologies could be risky, such as one in which there are
many sub-communities with very few links between them. This could relate to
a possible risk of poor collaboration. However, as indicated above, there is
also an opportunity in splitting/merging communities based on topological
distributions.
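Given a partition of users into sub-communities (obtained, for example, from a community-detection algorithm, which is assumed rather than shown here), one simple indicator of the topology risk described above is the fraction of interaction edges that cross sub-community boundaries. The function name, graph and partition below are hypothetical.

```python
def cross_group_fraction(edges, group_of):
    """Fraction of interaction edges linking users in different sub-communities.

    Values near 0 mean the sub-communities are largely isolated from each
    other, which may indicate poor collaboration.
    """
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if group_of[u] != group_of[v])
    return crossing / len(edges)


# Hypothetical partition into two sub-communities and their interactions.
group_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("c", "d")]
print(round(cross_group_fraction(edges, group_of), 3))  # 0.167
```

Here only the single edge between "c" and "d" connects the two groups, so the two sub-communities depend on one pair of users for any exchange.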
By analysing the interactions between users in the community, there are
opportunities to identify and exploit strong or weak ties, as discussed previously
in Section 2, and to identify common interests between users in order to
recommend users/products.
3.2.8. Community operation
A community is typically managed by moderators to ensure that users have a
good quality of experience. If the workload of moderators becomes too great,
there is a risk that they cannot perform their duties. There is also a risk to the
quality of service that users experience if the community services become
unavailable or the platform response times become too long. However, there
is also an opportunity to increase the number of moderators, possibly by
identifying people in the community who would be able to step up to
such a role. This is considered and discussed in [85].
3.2.9. Security
A community may have security-related risks, such as users leaking
confidential information or moderator accounts being hacked. The impact of
such risks can be very large, but detecting or predicting them is not trivial.
3.2.10. Other
Other interesting opportunities that do not fit into the above categories include
advertising other company products that users may be interested in, and
identifying new software features from challenging or unresolved queries
from users.
3.3. Summary
It is clear from the findings discussed above that a community may face a
wide range of risks and opportunities. However, the exact risks and
opportunities ultimately depend on the objectives of the respective
community. The findings from the questionnaire demonstrate that different
communities may have very different objectives and measure the health of
their communities differently. Research discussed in D7.2 [79] delves deeper
into this for the communities in IBM Connections.
It is clear that risks and opportunities must be linked to community objectives,
and a generic, flexible and extensible framework is required to be able to
define and detect/forecast the respective risks and opportunities. One
particular challenge that needs to be addressed is the interrelated nature of
risks and opportunities, in which there can be non-linear dependencies which
are not straightforward to model or predict. For example, the risk of a user
becoming inactive may impact on the risk of that user leaving the community,
which may impact on other users leaving the community, which ultimately may
impact on the risk of the community becoming inactive.
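To illustrate why such dependencies can be non-linear, the sketch below uses a simple threshold model of users leaving; this is a modelling assumption for illustration, not part of the ROBUST framework. Each user leaves once the fraction of their neighbours who have left reaches a personal threshold. In the hypothetical example, raising two users' thresholds from 0.5 to 0.6 changes the outcome of a single departure from the whole group leaving to only one further user leaving.

```python
def cascade(neighbours, thresholds, initially_left):
    """Simple threshold cascade of users leaving a community.

    A user leaves once the fraction of their neighbours who have already
    left reaches their personal threshold; repeat until nobody else leaves.
    """
    left = set(initially_left)
    changed = True
    while changed:
        changed = False
        for user, nbrs in neighbours.items():
            if user in left or not nbrs:
                continue
            if sum(n in left for n in nbrs) / len(nbrs) >= thresholds[user]:
                left.add(user)
                changed = True
    return left


# Hypothetical interaction graph: a hub "h" and four other users.
neighbours = {"h": ["a", "b", "c"], "a": ["h", "d"], "b": ["h"],
              "c": ["h", "d"], "d": ["a", "c"]}
low = {"h": 0.9, "a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
high = {**low, "a": 0.6, "c": 0.6}

print(sorted(cascade(neighbours, low, {"h"})))   # ['a', 'b', 'c', 'd', 'h']
print(sorted(cascade(neighbours, high, {"h"})))  # ['b', 'h']
```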
Moreover, it is clear that the same event in a community can constitute either a
risk or an opportunity. Consider, for example, the risk of an active contributor leaving the
community. This can actually give rise to other members in the community
becoming more active and influential, which may have a positive effect on the
community's objectives. It is natural to assume that users will, at some point,
become inactive and may leave the community. However, consider the same
risk in a community offering customer support, such as the SAP Community
Network (SCN) [62]. If the user leaves, there is a significant risk that nobody will
fill the resulting gap in knowledge or topic expertise. The
impact on the community objectives may be great, since the quality of service
that the users experience will deteriorate.
Depending on the objectives of a community, something that is a risk in one
community may actually be an opportunity in another. For example, Section
3.2.1 briefly discussed the debatable issue of strong and weak ties in a
community. According to theory, if a community aims to maximise
information sharing, then a large proportion of weak ties may
be desirable. However, there is also a risk that users are then more likely to
leave the community. Such trade-offs are not trivial to define or manage from
a user's point of view, nor from the point of view of developing a risk
management framework that makes this process easier for the user.
The following sections discuss existing risk management standards and
techniques for risk management, leading on to the proposed risk and
opportunity management framework discussed in Section 7. Specific risks and
opportunities considered in the ROBUST use case partners' communities, and
tools developed to address them, are discussed in Section 8.
4. Risk management standards
Risk management standards provide a reference for risk and opportunity
management. The choice of a risk standard is out of scope of the ROBUST
project. However, the ROBUST taxonomy and management process should
be in line with the consensus among the standards. The ROBUST project aims to provide
a tool to assist the decision maker and support the risk management
process; it is therefore important that the ROBUST risk management tool is
viewed as part of the overall risk management process in the organisation.
We present here some of the most widely recognised risk management standards,
focusing on the risk definition and the management process.
4.1. ISO 31000
ISO 31000 [42] is an international standard based on AS/NZS 4360, entitled
"Risk management - Principles and guidelines". ISO/IEC 31010 [43], Risk
Assessment Techniques, is a complementary standard providing a range of
tools for different types of risk assessment.
ISO 31000 defines risk as the "effect of uncertainty on objectives", where an
effect is "a deviation from the expected, positive and/or negative". Risk management
is defined as "coordinated activities to direct and control an organization with
regard to risk".
ISO 31000 indicates that "Risk is often characterized by reference to potential
events and consequences, or a combination of these" and that "Risk is often
expressed in terms of a combination of the consequences of an event
(including changes in circumstances) and the associated likelihood of
occurrence".
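ISO 31000 leaves open how consequences and likelihood are combined. A common convention, assumed in the sketch below, is to multiply a probability by a signed consequence score, with negative values for threats and positive values for opportunities, so that both downside and upside effects fit one representation. The class, the consequence scale and the example events are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskItem:
    """An uncertain event and its effect on a community objective (hypothetical fields)."""
    event: str
    likelihood: float   # probability of occurrence, in [0, 1]
    consequence: float  # signed impact: < 0 threat, > 0 opportunity (assumed -5..+5 scale)

    def level(self) -> float:
        """Combine likelihood and consequence as a product (one common convention)."""
        if not 0.0 <= self.likelihood <= 1.0:
            raise ValueError("likelihood must be a probability")
        return self.likelihood * self.consequence


items = [
    RiskItem("Key expert leaves", likelihood=0.2, consequence=-4),
    RiskItem("Member steps up as moderator", likelihood=0.5, consequence=2),
]
for item in items:
    print(f"{item.event}: {item.level():+.1f}")
# Key expert leaves: -0.8
# Member steps up as moderator: +1.0
```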


Figure 1: ISO 31000 management process

The risk management process (Figure 1) consists of the following phases:
communication and consultation, establishing the context, risk assessment
(identification, analysis and evaluation), risk treatment, and monitoring and
review.
Risk identification deals with identifying sources of risk (inside or outside the
organisation), impact areas and events, as well as their causes and
consequences. Risk analysis involves developing an understanding of the risk
and determining its attributes, mainly consequences and likelihood. Risk
evaluation then classifies the risks according to criteria that determine
which ones require treatment. The treatment phase includes choosing
strategies, e.g. mitigate, avoid, exploit or accept.
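The evaluation and treatment steps can be sketched as a mapping from a signed risk level (likelihood multiplied by a consequence that is negative for threats and positive for opportunities) to one of the strategies listed above. The thresholds and the rule for choosing between avoid and mitigate are illustrative assumptions; ISO 31000 does not prescribe them.

```python
def choose_treatment(level: float, threshold: float = 0.5) -> str:
    """Map a signed risk level to a treatment strategy.

    |level| below threshold -> accept; positive -> exploit (opportunity);
    negative -> mitigate, or avoid at or beyond 3x the threshold.
    All cut-offs are illustrative assumptions.
    """
    if abs(level) < threshold:
        return "accept"
    if level > 0:
        return "exploit"
    return "avoid" if level <= -3 * threshold else "mitigate"


# Hypothetical evaluated risk levels, printed most negative first.
levels = {"Spam surge": -2.0, "Expert overload": -0.9,
          "Minor formatting issues": -0.1, "Cross-team collaboration": 1.2}
for name, lvl in sorted(levels.items(), key=lambda kv: kv[1]):
    print(f"{name}: {choose_treatment(lvl)}")
# Spam surge: avoid
# Expert overload: mitigate
# Minor formatting issues: accept
# Cross-team collaboration: exploit
```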
4.2. BS 31100:2011
This is a standard produced by the British Standards Institution (BSI) [13]. It is
a general risk management framework for understanding, developing,
implementing and maintaining risk management throughout an organisation. It
follows the risk terminology set out in ISO/IEC Guide 73.
BS 31100:2011 provides advice and guidance on developing, implementing
and maintaining risk management that is fully aligned with ISO 31000. As in
ISO 31000, risk is defined as the "effect of uncertainty on objectives", where
an effect is a "deviation from the expected", whether positive or negative.
BS 31100:2011 is articulated around the following phases of a risk
management framework:
- Mandate and commitment
- Framework design for managing risk
- Implementing risk management
- Monitoring, review and improvement of the framework
The risk management process to be implemented in ROBUST is part of the
third phase, "implementing risk management", which includes the following
activities: identify, assess, respond, report and review.

4.3. FERMA: A Risk Management Standard
The Federation of European Risk Management Associations (FERMA) adopted
the Risk Management Standard published in the UK in 2002 [39]. This standard
is the product of multiple risk management organisations in the UK, including
the Institute of Risk Management (IRM), the Association of Insurance and
Risk Managers (AIRMIC) and the National Forum for Risk Management in
the Public Sector (ALARM).
The standard broadens the scope of risk management beyond corporations and
organisations to include any activity, whether short or long term. Some of the
terminology used by ISO/IEC Guide 73 [41] is re-used by this standard.
The standard defines risk as "the combination of the probability of an event
and its consequences". It also notes the potential for events and
consequences that constitute opportunities for benefit (upside) or threats to
success (downside). In FERMA, therefore, risk management is concerned with
both positive and negative aspects of risk. The actual risk management
process is very similar to that of the ISO 31000 standard. It starts from the
organisation's strategic objectives and proceeds through risk assessment (analysis and
evaluation), risk reporting, decision, treatment, residual risk reporting and
finally monitoring. The standard suggests classifying business activities (e.g.
strategic, operational, financial, knowledge management and compliance) as
part of a methodical approach to ensure that all activities and risks are
covered.
4.4. Enterprise Risk Management COSO
The Committee of Sponsoring Organisations (COSO) is a voluntary private-sector
organisation comprising multiple American associations and
organisations.
In 2004, COSO issued its Enterprise Risk Management - Integrated
Framework [16]. COSO defines ERM as:
"Enterprise risk management is a process effected by the entity's board of
directors, management and other personnel, applied in strategy setting and
across the enterprise designed to identify potential events that may affect the
entity and manage risk to be within the risk appetite to provide reasonable
assurance regarding the achievement of objectives."
COSO considers events that have a negative impact as well as those with a
positive one. Events with negative impact represent risks that prevent value
creation or erode existing value; however, events with positive impact may offset
negative impacts or represent opportunities. "Opportunities are the possibility that an
event will occur and positively affect the achievement of objectives, supporting
value creation or preservation."
COSO considers that there is "a direct relationship between objectives (e.g.
strategic, operations), which are what an entity strives to achieve, and
enterprise risk management components (e.g. objective setting, event
identification, risk assessment ...), which represent what is needed to achieve
them". These relations can be seen in the context of the organisation or of any
entity within it (e.g. division, department).
4.5. M_o_R
M_o_R is an OGC (Office of Government Commerce) [69] publication
intended to help organisations put in place effective frameworks for taking
informed decisions about risk.
M_o_R defines risk management as "the systematic application of policies,
procedures, methods and practices to the tasks of identifying and assessing
risks, and then planning and implementing risk responses".
M_o_R covers a wide range of topics, including business continuity
management, security, programme/project risk management and operational
service management. It is suitable for application at the organisation level as
well as at the activity level.
M_o_R defines risk as "an uncertain event or set of events that, should it occur,
will have an effect on the achievement of objectives". A risk is measured by a
combination of the probability of a perceived threat or opportunity occurring
and the magnitude of its impact on objectives.
M_o_R considers threats to be events with a negative impact and
opportunities to be those with a positive impact.
4.6. ROBUST online community risk management
This section has reviewed the widely used international risk management
standards. These standards reflect the need for risk management and efforts
that have evolved from different domains (e.g. finance, commerce, project
management) and countries (e.g. UK, USA, Australia and New Zealand). It is
the organisation that decides which standard to abide by and put into effect in
its daily business operations. The decision depends on multiple parameters,
e.g. applicable laws, regulatory standards and the business domain.
The choice of a risk standard is out of scope of the ROBUST project.
However, the framework process, definitions and terminology used in
ROBUST have been chosen to align with the standards described here. The
ROBUST project aims to provide a tool to assist the decision maker and
support the risk management process; it is therefore important that the
ROBUST risk management tool is viewed as part of the overall risk
management process in the organisation.
The standards stress the organisational context as a starting point to derive
the scope of the risk management process (e.g. organisation unit, division,
project, or programme). In this respect, the online communities analysed in
ROBUST can be seen as an organisational unit or even as a project, and fit
within the overall risk management vision of any organisation.
A basic difference between the standards above lies in their definition of
risk. Two main ideas can be distinguished here:

1- Risk is an event or set of events affecting the objectives
2- Risk is the effect of uncertainty on objectives
Although ISO currently leads this latter philosophy, the shift has not affected
the way risk is specified: the standard notes that "it is often characterized by
reference to potential events and consequences, or a combination of these."
ROBUST will specify risk in terms of event likelihood (uncertainty) and
consequences. In ROBUST we assume that the risk impact will definitely occur
once the event occurs; the risk likelihood is therefore in effect the event
likelihood.
From the terminology point of view, the standards view risk as covering both
negative and positive impacts on objectives; in particular, threats are events
with a negative impact, whereas opportunities are events with a positive impact.
ROBUST, in contrast, uses "risk" for event(s) with a negative impact and
"opportunity" for event(s) with a positive impact. This minor difference should
not affect the integration of ROBUST into an organisation's risk management
process. We will make sure that the implementation of the ROBUST components
(risk editor and dashboard) allows such a terminology change if required.
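A minimal sketch of this convention, assuming a numeric impact scale on which negative values harm an objective and positive values support it: each event carries a single likelihood (which, under the assumption above, is also the risk likelihood) and is labelled a risk or an opportunity from the sign of its impact. The class `CommunityEvent`, the impact scale and the use of likelihood times impact as a summary value are illustrative assumptions, not part of any standard or of the ROBUST tools.

```python
from dataclasses import dataclass


@dataclass
class CommunityEvent:
    """Hypothetical event record: one likelihood plus a signed impact on an objective."""
    name: str
    likelihood: float  # event likelihood; by the ROBUST assumption, also the risk likelihood
    impact: float      # assumed scale: < 0 harms the objective, > 0 supports it

    def __post_init__(self):
        if not 0.0 <= self.likelihood <= 1.0:
            raise ValueError("likelihood must be a probability in [0, 1]")

    @property
    def kind(self) -> str:
        # ROBUST terminology: negative impact -> risk, positive impact -> opportunity
        if self.impact < 0:
            return "risk"
        if self.impact > 0:
            return "opportunity"
        return "neutral"

    @property
    def expected_impact(self) -> float:
        # One common way of combining likelihood and impact (an assumption here)
        return self.likelihood * self.impact


events = [
    CommunityEvent("expert contributors leave", likelihood=0.3, impact=-8.0),
    CommunityEvent("new product launch attracts users", likelihood=0.6, impact=5.0),
]
for e in events:
    print(e.name, e.kind, round(e.expected_impact, 2))
```

Switching to the standards' terminology would only require changing the labels returned by `kind`.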
The risk management process is similar in most of the standards, except for
minor differences in granularity or semantics; for instance, FERMA includes
risk description and risk estimation as part of the risk assessment phase.
Little attention is paid to opportunity management and to how its treatment
differs from risk treatment; M_o_R does include a list of different strategies
according to the type of event (threat or opportunity).
ROBUST will provide tools to assist in risk description, analysis,
evaluation and decision making in the treatment process. The following
sections discuss state-of-the-art techniques for addressing these phases
of the process.
5. Techniques for risk management
This section presents techniques that can be used in the risk management
phases, including risk identification, description, analysis and treatment. The
list is not exhaustive; however, these techniques help to understand the
specifics of risks and opportunities in the ROBUST context. They are used in
different domains and are considered mature good practices by the
international standards [39, 43]. For each technique, we also discuss its
applicability in ROBUST.
5.1. Risk identification
Risk identification is a task to be executed at all levels of the organisation,
involving people with appropriate knowledge (e.g. managers, business
analysts, operations experts). The aim is to identify any risks that may affect
the organisation's objectives.

Many techniques for risk identification are described in the literature as well
as in the standards [77]. These techniques include:
- Systematic team approaches (e.g. brainstorming, questionnaires,
  scenario analysis, risk assessment workshops)
- Evidence-based methods (e.g. checklists, review of historical data)
- Reasoning techniques (e.g. HAZOP, Hazard and Operability Studies)
No single technique addresses the identification of all risks and/or
opportunities within an organisation. The selection of one technique, or a
combination of them, depends on the organisation and its capabilities (e.g.
availability of experts, availability of checklists for its business domain).
In ROBUST we focused on systematic team approaches to identify risks and
opportunities. Questionnaires were formulated for community owners to
identify which risks and opportunities affect online communities. The ROBUST
risk and opportunity framework will facilitate the review of historical data via
visualisation of community interactions and relations. ROBUST can also
support the "Structured What-if Technique" (SWIFT) via the simulation
approach (see section 8.2). Note that a community owner using the ROBUST
platform can still use any technique for risk identification.
5.2. Modelling and evaluation
These techniques are used to represent and analyse risks and opportunities.
Representation in this context mainly serves the analysis task; the
representation of risks and opportunities in the dashboard is addressed by
another task (WP1-T1.5).
5.2.1. Tree-based analysis
Tree-based techniques allow the modelling of hierarchies of events, causes
and consequences. As examples, we present Fault Tree Analysis, Event Tree
Analysis and cause-consequence analysis.
An event tree [44] is a graphical representation of an initiating event and the
sequences of its possible outcomes. Event Tree Analysis provides an inductive
approach to reliability assessment, as event trees are constructed using
forward logic. The technique evolved in engineering to model how the effects
of events propagate across multiple components. A probability of occurrence
is assigned to each branch of the tree, so the probability of each outcome path
can be calculated. Figure 2 shows an example Event Tree for an online
community, giving the outcomes of the event "spam detector failure". The
disadvantage of this kind of modelling is that initiating events are treated
individually, so multiple trees are required for multiple events; simultaneous
events are not modelled either.

Initiating event: Spam detector fails
Influential user dissatisfied | User leaves | Result
Yes (0.8) | Yes (0.75) | Very Bad
Yes (0.8) | No (0.25) | OK
No (0.2) | Yes (0.25) | Bad
No (0.2) | No (0.75) | OK

Figure 2: Event Tree Analysis.
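Each outcome probability in an event tree is the product of the branch probabilities along its path. The sketch below applies this to the values in Figure 2, reading the tree (our interpretation of the figure) as: given that the spam detector fails, an influential user is dissatisfied with probability 0.8, and then leaves with probability 0.75 if dissatisfied or 0.25 otherwise. The function name is hypothetical.

```python
from collections import defaultdict

# Figure 2 paths, conditional on "spam detector fails":
# (P(dissatisfied branch), P(leaves branch | dissatisfied branch), outcome)
PATHS = [
    (0.8, 0.75, "Very Bad"),  # dissatisfied, leaves
    (0.8, 0.25, "OK"),        # dissatisfied, stays
    (0.2, 0.25, "Bad"),       # not dissatisfied, leaves
    (0.2, 0.75, "OK"),        # not dissatisfied, stays
]


def outcome_probabilities(paths, p_initiating=1.0):
    """Multiply the branch probabilities along each path, then sum per outcome."""
    totals = defaultdict(float)
    for p_first, p_second, outcome in paths:
        totals[outcome] += p_initiating * p_first * p_second
    return dict(totals)


probs = outcome_probabilities(PATHS)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.2f}")
```

Passing the probability of the initiating event itself as `p_initiating` turns these conditional values into unconditional ones.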

Fault Trees [23] graphically represent the interaction of failures and other
events within a system. Basic events at the bottom of the fault tree are linked
via logic symbols (known as gates) to one or more top events. These top
events represent identified hazards or system failure modes for which
predicted reliability or availability data is required.
Basic events at the bottom of the fault tree generally represent component
and human faults for which statistical failure and repair data are available. If
the probabilities of these faults are known, the probability of the top events
can be calculated. Fault Tree Analysis (FTA) thus allows a simple risk
assessment, as depicted below in Figure 3, by calculating the probability of an
undesired event (top event) from the probabilities of other, independent events
(A, B, C).
[Diagram: a TOP event connected through AND and OR gates to an intermediate event and to basic events A, B and C]

Figure 3: Fault Tree Analysis.
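Assuming independent basic events, as above, an AND gate multiplies the probabilities of its inputs and an OR gate gives one minus the probability that none of its inputs occurs. The exact gate layout of Figure 3 is not recoverable from this text, so the sketch assumes one plausible arrangement, TOP = AND(OR(A, B), C), with hypothetical probabilities for A, B and C.

```python
from math import prod


def and_gate(*probs):
    """All independent inputs must occur."""
    return prod(probs)


def or_gate(*probs):
    """At least one independent input occurs: 1 - P(no input occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities
p_a, p_b, p_c = 0.1, 0.2, 0.5

p_intermediate = or_gate(p_a, p_b)      # 1 - 0.9 * 0.8 = 0.28
p_top = and_gate(p_intermediate, p_c)   # 0.28 * 0.5 = 0.14
print(f"P(intermediate) = {p_intermediate:.2f}, P(TOP) = {p_top:.2f}")
```

If basic events share a common cause, the independence assumption fails and these gate formulas underestimate or overestimate the top-event probability.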
Cause-consequence analysis [3] combines cause analysis (described by fault
trees) and consequence analysis (described by event trees) (Figure 4). The
cause-consequence diagram starts with an undesired event and develops
backwards to identify its causes (represented by a fault tree) and forwards to
identify its consequences (represented by an event tree).

[Diagram: an initiating event linked backwards to fault trees (basic events A and B joined by an OR gate) and forwards through Yes/No condition branches to three consequences]

Figure 4: Cause-consequence analysis
With the probabilities of occurrence attached to all the associated events in
the diagram, the probabilities of the different consequences of the undesired
event can then be calculated.
The tree-based techniques presented above fit well with systems whose
components and interactions are well defined. Such decomposition is much
more difficult in online communities, where the causal structure of events is
hard to discover and to maintain over time. Moreover, these techniques do not
link risks to the objectives of the system. The following sections present
techniques that address this issue.
5.2.2. Goal-risk modelling
The Tropos goal model [63, 91] proposes a formal framework for requirements
analysis that refines stakeholders' goals until the requirements are elicited.
The framework results in a number of goal models represented as graphs
<G, R>, where G is a set of goals and R a set of relations (decomposition or
contribution relations).

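[Diagram: goals G1 to G5 linked by AND/OR decompositions and +/- contribution relations, with FS and PS labels]

Figure 5: Goal model showing contribution relations (+, -) between goals and
satisfaction attributes (Fully Satisfied, FS; Partially Satisfied, PS)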

In Tropos, a goal is defined as a strategic interest of a stakeholder that is
intended to be achieved. Tropos distinguishes between goals and soft-goals:
a soft-goal has no clear a priori criteria for satisfaction, but is judged by
actors as being sufficiently met. Each goal G has two attributes, Sat(G) and
Den(G), which quantify the evidence for the goal being satisfied and denied,
respectively.
Tropos goal analysis (Figure 5) allows the analyst to model the influence of
the satisfaction (denial) of a goal on the satisfaction (denial) of other goals.
This influence can be positive or negative and is graphically indicated by "+"
and "-" contribution relations. Relationships between goals propagate
satisfaction and denial values, and conflicts are possible; Tropos provides
algorithms to compute these propagations.
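The sketch below illustrates forward propagation of qualitative satisfaction (Sat) and denial (Den) labels in the spirit of Tropos goal analysis. It assumes a simplified rule set: evidence takes the ordered values None, Partial and Full; an AND-decomposed goal takes the minimum Sat and maximum Den of its sub-goals (OR does the reverse); a "+" contribution passes at most partial Sat and Den on to its target, while a "-" contribution turns Sat of the source into partial Den of the target and vice versa. Goal names and evidence are hypothetical, and the published Tropos algorithms [63, 91] are richer than this.

```python
# Qualitative evidence levels, ordered: None < Partial < Full
N, P, F = 0, 1, 2


def propagate(initial, decompositions, contributions, max_iter=100):
    """Simplified forward propagation of (Sat, Den) labels up to a fixpoint.

    initial:        {goal: (sat, den)} evidence supplied by the analyst
    decompositions: {goal: ("AND" | "OR", [sub-goals])}
    contributions:  [(source, target, "+" | "-")]
    Labels only ever increase, so the iteration terminates.
    """
    goals = set(initial) | set(decompositions)
    for _, subs in decompositions.values():
        goals.update(subs)
    for source, target, _ in contributions:
        goals.update((source, target))
    labels = {g: initial.get(g, (N, N)) for g in goals}

    for _ in range(max_iter):
        new = dict(labels)
        for goal, (kind, subs) in decompositions.items():
            sats = [labels[s][0] for s in subs]
            dens = [labels[s][1] for s in subs]
            # AND: only as satisfied as its weakest sub-goal; OR: the reverse
            sat, den = (min(sats), max(dens)) if kind == "AND" else (max(sats), min(dens))
            new[goal] = (max(new[goal][0], sat), max(new[goal][1], den))
        for source, target, sign in contributions:
            src_sat, src_den = labels[source]
            tgt_sat, tgt_den = new[target]
            if sign == "+":  # evidence is passed on, capped at Partial
                new[target] = (max(tgt_sat, min(src_sat, P)), max(tgt_den, min(src_den, P)))
            else:            # "-": satisfying the source is evidence against the target
                new[target] = (max(tgt_sat, min(src_den, P)), max(tgt_den, min(src_sat, P)))
        if new == labels:
            break
        labels = new
    return labels


# Hypothetical model: G1 is AND-decomposed into G4 and G5; G1 helps G2, G3 hurts G2
result = propagate(
    initial={"G4": (F, N), "G5": (P, N), "G3": (F, N)},
    decompositions={"G1": ("AND", ["G4", "G5"])},
    contributions=[("G1", "G2", "+"), ("G3", "G2", "-")],
)
for goal in sorted(result):
    print(goal, "Sat =", result[goal][0], "Den =", result[goal][1])
```

In this example G2 ends up with partial evidence for both satisfaction and denial, which is the kind of conflict mentioned above.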
In [6], the authors introduced two new entities: events (e.g. risks,
opportunities) and treatments (e.g. tasks, countermeasures and mitigations).
This allows for modelling uncertain events, mainly risks, that can influence the
fulfilment of one or more goals, and the treatments needed to manage the
effect of risks. Each entity has a separate layer of analysis: Goal layer, Event
layer and Treatment layer.
An event can influence more than one goal, and the same event can be a
risk for certain goals and, at the same time, an opportunity for other goals.
Once the events have been analysed, the analyst identifies and analyses the
countermeasures to be adopted in order to mitigate the risks.
In [54], the authors investigate risk dependencies (as opposed to the event
dependencies of tree-based analysis) and suggest a risk management
methodology based on dependency assessment (Figure 6).
If the occurrence of risk R1 has an effect on risk R2, then R2 is a dependent
risk, or direct successor, of R1, denoted R1->R2. The dependency relation can
be bidirectional, i.e. both R1->R2 and R2->R1 may hold.
[Diagram: dependency graph between risks R1, R2, R3, R4 and R5]

Figure 6: R&O dependencies
A risk is defined mathematically as R_x = f(P_x, I_x), where P_x is the
probability and I_x is the impact. Risk dependency mainly concerns
probabilities rather than impacts, and the authors therefore work with the prior
and posterior probability of a risk given its predecessor. The posterior
probability may be lower or higher than the prior, depending on the nature of
the predecessor's effect. This work considers the dependency values and
structure to be fixed over time, and does not go further to calculate the
dependencies via mathematical models such as those described in the
following sections.
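A minimal sketch of a single dependency R1->R2, assuming binary risks, hypothetical probabilities and the common choice f(P, I) = P × I (the text above leaves f unspecified). The prior of R2 is obtained by averaging over whether R1 occurs (law of total probability); once R1 is observed, the posterior P(R2 | R1) replaces it, which in this example raises the risk value of R2.

```python
def risk_value(probability, impact):
    """Assumed form of R = f(P, I): probability times impact."""
    return probability * impact


def total_probability(p_pred, p_given_pred, p_given_not_pred):
    """P(R2) = P(R2 | R1) P(R1) + P(R2 | not R1) (1 - P(R1))."""
    return p_given_pred * p_pred + p_given_not_pred * (1.0 - p_pred)


# Hypothetical dependency R1 -> R2 (R1: spam surge, R2: expert users leave)
p_r1 = 0.3
p_r2_given_r1 = 0.6       # posterior: R1 is known to have occurred
p_r2_given_not_r1 = 0.1
impact_r2 = 10.0

prior_r2 = total_probability(p_r1, p_r2_given_r1, p_r2_given_not_r1)
print(f"prior P(R2) = {prior_r2:.2f}, posterior P(R2 | R1) = {p_r2_given_r1:.2f}")
print(f"R2 before R1 is observed: {risk_value(prior_r2, impact_r2):.1f}")
print(f"R2 after R1 occurs:       {risk_value(p_r2_given_r1, impact_r2):.1f}")
```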
5.2.3. Markov chains
Markov chains [29] provide a mathematical method to analyse systems. A
Markov chain is a discrete-time random process with the Markov property¹.
The Markov property states that the probability of the next state, given the
current state, depends only on the current state and on none of the previous
states.
The states are usually represented as nodes in a directed graph, where edges
represent possible transitions between states (Figure 7).
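[Diagram: three states 1, 2 and 3 connected by directed edges labelled with the transition probabilities 0.2, 0.7, 0.1, 0.5, 0.5 and 1]

Figure 7: Markov chain example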

The edges are labelled with transition probabilities representing the
probability of the system moving from one state to another. In the ROBUST
context, these states can be mapped to risks or opportunities.
Mathematically, a Markov chain is a sequence of random variables (a stochastic
process) X_1, X_2, X_3, ..., X_i with the Markov property. Each variable
represents the state of the system at discrete time t = i. Formally:

$$\Pr(X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \Pr(X_{n+1} = x \mid X_n = x_n)$$
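The transition probabilities of the system can be represented as a transition
matrix P giving the probability distribution of X_n given any possible value of
X_{n-1}:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1k} \\ p_{21} & p_{22} & \cdots & p_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{kk} \end{pmatrix}, \qquad p_{ij} = \Pr(X_n = j \mid X_{n-1} = i)$$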
¹ Continuous-time Markov chains have also been addressed in the literature.

Each p_ij represents the probability of the system moving from state i to state
j. Each row of P is therefore a probability distribution summing to one.
Time-homogeneous Markov chains (or stationary Markov chains) are those in
which the transition probability is independent of the time step n:

$$\Pr(X_{n+1} = j \mid X_n = i) = \Pr(X_n = j \mid X_{n-1} = i) \quad \text{for all } n$$
On the other hand, time-inhomogeneous Markov chains do not assume a
homogeneous behaviour of the stochastic process: the transition probability
matrix P is not constant but a function of time.
Markov chains can be used to represent risks and opportunities by mapping
each risk/opportunity event to a transition in the state model; the transition
probability is then the event probability. Depending on the risk or opportunity
in question, the next system state may depend on more than the current state,
which makes Markov chain analysis inapplicable in certain cases.
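As an illustration, the sketch below assumes three hypothetical states for a community member (active, lurking, left), made-up monthly transition probabilities and a time-homogeneous P in which "left" is absorbing. It checks that each row of P sums to one and propagates the state distribution one step at a time (x_n = x_{n-1} P, with x a row vector); the probability mass in "left" after n steps is then the probability that the "member leaves" event has occurred within n months.

```python
STATES = ["active", "lurking", "left"]

# Hypothetical time-homogeneous transition matrix: P[i][j] = Pr(X_n = j | X_{n-1} = i)
P = [
    [0.7, 0.2, 0.1],  # from active
    [0.3, 0.5, 0.2],  # from lurking
    [0.0, 0.0, 1.0],  # from left (absorbing)
]


def check_stochastic(matrix, tol=1e-9):
    """Each row of a transition matrix must be a probability distribution."""
    for row in matrix:
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError(f"invalid row: {row}")


def step(dist, matrix):
    """One transition: new_j = sum_i dist_i * P[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, matrix, n_steps):
    check_stochastic(matrix)
    for _ in range(n_steps):
        dist = step(dist, matrix)
    return dist


start = [1.0, 0.0, 0.0]  # the member is currently active
for n in (1, 3, 12):
    d = distribution_after(start, P, n)
    print(n, {s: round(p, 3) for s, p in zip(STATES, d)})
```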
5.2.4. Bayesian networks
The core of the Bayesian Network (BN) [7] representation is a directed acyclic
graph (DAG) (Figure 8). The nodes of the graph represent random variables
in the domain, while the edges correspond to the direct influence of one node
on another. In particular, an edge from node X_i to node X_j represents a
statistical dependence between the corresponding variables. Thus, the value
taken by variable X_j (child node) depends on the value taken by variable X_i
(parent node), i.e. variable X_i "influences" X_j.
The absence of an edge between two nodes encodes a conditional
independence between the corresponding variables. The graph thus provides
a data structure for representing the joint distribution in a factorised way: the
joint distribution is the product of each node's conditional distribution given
its parents.
A BN reflects a simple conditional independence statement, namely that each
variable is independent of its non-descendants in the graph given the state of
its parents. The parameters are described in a manner consistent with a
Markovian property, where the conditional probability distribution (CPD) at
each node depends only on its parents.


Structure: A -> B, A -> C, B -> D, C -> D

P(A=F) | P(A=T)
0.5 | 0.5

A | P(B=F) | P(B=T)
F | 0.5 | 0.5
T | 0.9 | 0.1

A | P(C=F) | P(C=T)
F | 0.8 | 0.2
T | 0.2 | 0.8

B | C | P(D=F) | P(D=T)
F | F | 1.0 | 0.0
T | F | 0.1 | 0.9
F | T | 0.1 | 0.9
T | T | 0.01 | 0.99

Figure 8: Bayesian Network example
For discrete random variables, this conditional probability is often represented
by a table, listing the local probability that a child node takes on each of the
feasible values for each combination of values of its parents. The joint
distribution of a collection of variables can be determined uniquely by these
local conditional probability tables (CPTs).
Two types of inference support are often considered: predictive support for
node X_i, based on evidence nodes connected to X_i through its parent nodes
(also called top-down reasoning), and diagnostic support for node X_i, based
on evidence nodes connected to X_i through its child nodes (also called
bottom-up reasoning).
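For a network as small as Figure 8, both kinds of support can be computed exactly by enumerating the factorised joint distribution P(A) P(B|A) P(C|A) P(D|B,C). The sketch below assumes the structure implied by the tables (A is the parent of B and C, which are both parents of D) and answers a predictive query, P(D=T), and a diagnostic one, P(A=T | D=T). Enumeration is exponential in the number of variables, so it only suits very small networks.

```python
from itertools import product

# Figure 8 CPTs, stored as probabilities of the value True
P_A = 0.5                                   # P(A=T)
P_B_GIVEN_A = {False: 0.5, True: 0.1}       # P(B=T | A)
P_C_GIVEN_A = {False: 0.2, True: 0.8}       # P(C=T | A)
P_D_GIVEN_BC = {(False, False): 0.0, (True, False): 0.9,
                (False, True): 0.9, (True, True): 0.99}  # P(D=T | B, C)


def bern(p_true, value):
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d):
    """Factorised joint distribution: P(A) P(B|A) P(C|A) P(D|B,C)."""
    return (bern(P_A, a) * bern(P_B_GIVEN_A[a], b)
            * bern(P_C_GIVEN_A[a], c) * bern(P_D_GIVEN_BC[(b, c)], d))


def query(target, evidence):
    """Exact inference by enumeration: P(target=True | evidence)."""
    names = ("A", "B", "C", "D")
    num = den = 0.0
    for values in product((False, True), repeat=4):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(*values)
        den += p
        if assignment[target]:
            num += p
    return num / den


print(round(query("D", {}), 4))           # predictive support: P(D=T) -> 0.6471
print(round(query("A", {"D": True}), 4))  # diagnostic support: P(A=T | D=T) -> 0.5758
```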
Approximate inference methods have also been proposed in the literature, such
as Monte Carlo sampling [82], where estimates improve as sampling proceeds.
A variety of standard Markov chain Monte Carlo (MCMC) methods, including
Gibbs sampling [82] and the Metropolis-Hastings algorithm, can also be
used for approximate inference.
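Approximate inference can be sketched with the simplest Monte Carlo scheme, rejection sampling: draw complete samples from the network in topological order, discard those that contradict the evidence, and use the frequency among the remaining samples. This is a stand-in for the more efficient methods cited above (it is neither Gibbs sampling nor Metropolis-Hastings); the CPTs are those of Figure 8, and the seed and sample sizes are arbitrary.

```python
import random

# Figure 8 CPTs, stored as probabilities of the value True
P_A = 0.5
P_B_GIVEN_A = {False: 0.5, True: 0.1}
P_C_GIVEN_A = {False: 0.2, True: 0.8}
P_D_GIVEN_BC = {(False, False): 0.0, (True, False): 0.9,
                (False, True): 0.9, (True, True): 0.99}


def sample(rng):
    """Ancestral sampling: draw each parent before its children."""
    a = rng.random() < P_A
    b = rng.random() < P_B_GIVEN_A[a]
    c = rng.random() < P_C_GIVEN_A[a]
    d = rng.random() < P_D_GIVEN_BC[(b, c)]
    return a, b, c, d


def estimate_a_given_d(n_samples, seed=0):
    """Rejection-sampling estimate of P(A=T | D=T): keep only samples with D=T."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        a, _, _, d = sample(rng)
        if d:
            kept += 1
            hits += a
    return hits / kept


for n in (100, 10_000, 200_000):
    print(n, round(estimate_a_given_d(n), 4))  # tends towards the exact value as n grows
```

Rejection sampling wastes every sample that contradicts the evidence, which becomes costly when the evidence is unlikely; this is one motivation for MCMC methods.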

5.2.5. Regression models
Regression analysis is a technique to analyse and model several variables,
where the focus is on the relationship between a dependent variable and one
or more independent variables. Regression analysis thus helps one understand
how the typical value of the dependent variable changes when any one of the
independent variables is varied while the others are held fixed. It estimates
the conditional expectation of the dependent variable given the independent
variables; the resulting function of the independent variables is called the
regression function. Regression analysis is widely used for
prediction and forecasting, to understand which of the independent variables
are related to the dependent variable, and to explore the forms of these
relationships. In linear regression, the dependency between variables is
expressed by coefficients, and the regression function is linear in these
coefficients (Figure 9).

Figure 9: Illustration of linear regression on a data set
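For a single independent variable, the least-squares coefficients of y = b0 + b1·x have a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. The sketch below applies it to made-up community data (posts per week against answered questions); the fitted slope measures an association, not a causal effect.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed-form solution)."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


# Hypothetical data: weekly posts (x) against answered questions (y)
posts = [10, 20, 30, 40, 50]
answered = [12, 19, 33, 41, 48]
b0, b1 = fit_line(posts, answered)
print(f"answered ~ {b0:.2f} + {b1:.3f} * posts")
```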

Note that a causal relationship cannot be inferred from an association, even
if the association is statistically significant [51]. A causal relationship can be
introduced through expertise acquired through experience, or even through
intuition. However, a causal statement such as "x causes y" introduces the
notion of time, as x has to precede y.
In nonlinear regression, the regression function is a nonlinear combination of
the model parameters. Typical examples include exponential, logarithmic,
trigonometric and power functions, Gaussians, Lorentz curves, etc. In contrast
to linear regression, there is in general no closed-form solution for the
coefficients. Therefore, numerical optimisation algorithms need to be applied
to determine the best-fitting parameters. Note also that the function to be
optimised may have several local minima.
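A minimal sketch of fitting a nonlinear model, assuming an exponential y = a·exp(b·x) and noise-free made-up data. For a fixed rate b, the best a is a linear least-squares problem, so only b needs a numerical search; here a golden-section search on an assumed bracket [-2, 2] stands in for the general-purpose optimisers normally used. Golden-section search only finds a local minimum within its bracket, which mirrors the local-minima caveat above.

```python
import math


def sse_for_rate(b, xs, ys):
    """For y = a * exp(b * x), the best a for a fixed b is a linear least-squares fit.
    Returns (sum of squared errors, a)."""
    e = [math.exp(b * x) for x in xs]
    a = sum(y * ei for y, ei in zip(ys, e)) / sum(ei * ei for ei in e)
    return sum((y - a * ei) ** 2 for y, ei in zip(ys, e)), a


def fit_exponential(xs, ys, lo=-2.0, hi=2.0, tol=1e-8):
    """Golden-section search over the rate b on [lo, hi] (finds a local minimum only)."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if sse_for_rate(c, xs, ys)[0] < sse_for_rate(d, xs, ys)[0]:
            hi, d = d, c          # minimum lies in [lo, d]
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d          # minimum lies in [c, hi]
            d = lo + inv_phi * (hi - lo)
    b = (lo + hi) / 2
    return sse_for_rate(b, xs, ys)[1], b


# Hypothetical noise-free data generated from y = 3 * exp(-0.4 * x)
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [3 * math.exp(-0.4 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```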
5.2.6. Compartment models
Compartment Models, also called Multi-Compartment Models or continuously
stirred tank reactor (CSTR) models, have been widely used in diverse fields
such as pharmacokinetics, epidemiology, biomedicine, ecology, systems
theory, complexity theory, engineering, physics, information science, and
social science [26]. They represent a graph-based approach to modelling, in
which a system under consideration is broken down into various sub-systems
denoted as compartments. Compartments are represented as nodes of a
graph, while the interactions between such sub-systems are represented as
directed edges. These interactions usually represent migrations or
movements of individuals, material, energy, etc., from one compartment to
another. Compartment models can be seen as quantitative counterparts of
causal loop diagrams, a closely related concept from System Dynamics [87].

Similar to Markov Chains, compartment models are usually visualized by
directed graphs, where the nodes represent the compartments, while edges
represent migrations between different compartments. Migration rates can be
represented by weights on edges (Figure 10).

Figure 10: Graph representation of a compartment model.

For the purposes of the ROBUST project, a compartment will represent a
particular subset of the members of a given online community. Risks and
opportunities that can be modelled with compartment models include, for
example:

- Falling numbers of expert contributors within a community (a risk) or,
  conversely, rising numbers of expert contributors (an opportunity). The
  corresponding model would have a compartment whose size represents the
  number of expert contributors.
- In a community with a reward point system, a rising number of point
  scammers (a risk) or, conversely, a falling number of point scammers (an
  opportunity). Again, the corresponding model would have a compartment
  whose size represents the number of point scammers.

Compartment modelling benefits considerably from the analysis work being done
in WP3 and WP5 to identify and classify users into such compartments (e.g.
role identification).
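A minimal sketch, assuming three hypothetical compartments (newcomers, experts, departed members), made-up weekly migration rates on the edges and a constant inflow of new members. Each week a fraction of every compartment moves along each outgoing edge, so the size of the "expert" compartment over time shows whether the "falling number of expert contributors" risk is materialising. A real model would estimate these rates from the role analysis mentioned above rather than assume them.

```python
# Hypothetical weekly migration rates: fraction of the source compartment moving per week
RATES = {
    ("newcomer", "expert"): 0.05,
    ("newcomer", "departed"): 0.10,
    ("expert", "departed"): 0.02,
}
ARRIVALS_PER_WEEK = 20.0  # hypothetical inflow into the "newcomer" compartment


def simulate(sizes, weeks):
    """Discrete-time compartment model: each week, move rate * size along every edge."""
    history = [dict(sizes)]
    for _ in range(weeks):
        # Flows are computed from the sizes at the start of the week
        flows = {edge: rate * sizes[edge[0]] for edge, rate in RATES.items()}
        new = dict(sizes)
        for (src, dst), amount in flows.items():
            new[src] -= amount
            new[dst] += amount
        new["newcomer"] += ARRIVALS_PER_WEEK
        sizes = new
        history.append(dict(sizes))
    return history


history = simulate({"newcomer": 100.0, "expert": 30.0, "departed": 0.0}, weeks=52)
for week in (0, 13, 26, 52):
    print(week, {name: round(size, 1) for name, size in history[week].items()})
```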
5.2.7. Agent Based Simulation
Agent-based modelling (often referred to as multi-agent systems) [46] is a
class of computational models that can be applied to simulating or controlling
complex systems consisting of multiple interacting components. Whilst the
main concept of multi-agent system simulation was laid out in the late
1940s, its application to the simulation of real-world processes became
widespread from 1990, when the computational power offered by computers
was no longer an inhibitor [99]. Since then, multi-agent systems have
become renowned as a powerful simulation modelling technique [72] that offers
the following advantages over existing analytical tools [10]:
1. Capture emergent phenomena
2. Provide a natural description of the system
3. Offer flexibility as it is an extensible approach
Complex non-linear systems are prone to producing emergent phenomena [74].
For example, cars produce traffic jams, distributed systems generate thrashing
behaviour [31] and human crowds herd [5]. Whilst these behaviours arise
from the interactions of individual system parts, it is often very difficult, if not
impossible, to accurately predict their emergence by looking at the
architecture of individual elements at design time [30, 96]. Agent-based
simulations make it possible to identify and understand the origin of these
behaviours before they have a chance to endanger the operation of the real
system.
By applying the agent-based modelling metaphor, the real system can be
represented as a set of interacting components referred to as agents [100].
Such agents represent individual parts of the simulated system, for example
people within a company organisation or production machines in a
manufacturing system [73]. Like the real-system elements they model, the
agents have their own thread of control and are able to make their own
decisions based on the perceived state of the system [46, 71]. Such a natural
mapping of the modelled system components onto agents not only simplifies
the design of the system but also makes it possible to realise the full potential
of the data a company may have about, for example, its customers that it then
wishes to model. In the context of SAP internet communities, knowing about
the behaviour of a community user makes it possible to simulate him/her as an
agent with a set of behavioural rules imitating the actions of a real person. In
this way, the dynamics of the whole community can be reproduced (and
validated) with the help of existing historical community data.
Flexibility and extensibility can be identified on many levels [72, 73, 100]. For
example, it is possible to increase the scale of the system simply by
introducing more agents without any additional model refinements. Likewise,
the complexity of the model can be modulated by altering the behavioural
rules of individual agents or by enabling learning and adaptation. Here,
small changes in the behaviour of individual system elements may result in a
significant change in the global system output.
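To make the metaphor concrete, the sketch below models hypothetical community members as agents with two simple rules: post a question with some probability at each step, and leave after a number of consecutive unanswered posts. Experts are reduced to a fixed answering capacity per step. The rules, parameters and class names are illustrative assumptions; the point is that scaling up the number of agents, without changing any rule, can produce a community-level outcome, such as a wave of departures once demand exceeds answering capacity, that is not written into any individual rule.

```python
import random


class Member:
    """Hypothetical agent: posts questions and leaves after repeated unanswered posts."""

    def __init__(self, rng, post_prob, patience):
        self.rng = rng
        self.post_prob = post_prob
        self.patience = patience  # consecutive unanswered posts tolerated before leaving
        self.unanswered = 0
        self.active = True

    def maybe_post(self):
        return self.active and self.rng.random() < self.post_prob

    def receive(self, answered):
        if answered:
            self.unanswered = 0
        else:
            self.unanswered += 1
            if self.unanswered >= self.patience:
                self.active = False


def run(n_members, expert_capacity, steps=50, seed=1):
    """Each step, active members may post; experts answer at most
    `expert_capacity` questions, chosen at random. Returns the number still active."""
    rng = random.Random(seed)
    members = [Member(rng, post_prob=0.3, patience=3) for _ in range(n_members)]
    for _ in range(steps):
        askers = [m for m in members if m.maybe_post()]
        rng.shuffle(askers)
        for i, m in enumerate(askers):
            m.receive(answered=i < expert_capacity)
    return sum(m.active for m in members)


# Same individual rules, different scale: only the number of agents changes
for n in (50, 100, 200):
    print(n, "members ->", run(n, expert_capacity=20), "still active")
```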
The main application areas of agent-based simulation models are systems
that exhibit a high degree of complexity, scale and dynamism and that are
prone to producing emergent behaviours. These include flow management
(evacuation, traffic management) [17, 52], distributed systems control
(dynamic resource allocation) [12, 55], supply chain management [45],
organisation management (detecting operational risks and identifying optimal
organisational design) [76] and economics (predicting stock market behaviour,
designing shopbots and software agents) [95]. In these systems agents are
used to reproduce the behaviour of individual system elements (cars in traffic,
stock traders, manufacturing machines, company employees) and to
understand how alterations of individual behaviours (or external system
perturbations) affect the global system response.
As with any other modelling technique, agent-based models need to be
developed at the right level of description in order to serve their purpose
[100]. Too little detail about the simulated phenomenon will render the
simulation output incorrect, whereas too much detail will produce an outcome
that is difficult to interpret [10]. In addition, simulating a large number of
autonomously interacting elements may be computationally expensive and
thus require more computational power than other modelling techniques.
Consequently, whilst agent-based models offer a powerful framework for
complex systems simulation, their application requires careful consideration
and a sufficient understanding of agent-based simulation principles.
5.2.8. Summary
This section has surveyed important techniques for risk modelling and
assessment. In many cases, the choice of technique depends on the risk or
opportunity being considered. Correlation analysis is an essential first step in
analysing features and their relationship with a given risk/opportunity event.
ROBUST WP5 and WP3 are carrying out the analysis of these features, which
will feed into techniques such as Bayesian networks, regression and
compartment models in order to assess the risk/opportunity likelihood.
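As a minimal illustration of this first step, the Pearson coefficient below measures the linear association between a candidate feature and a risk event encoded as 0/1 (in which case it equals the point-biserial correlation). The feature and data are hypothetical, and, as noted in Section 5.2.5, an association alone does not establish causality.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical per-user feature against whether the user later left (1) or stayed (0)
unanswered_posts = [0, 1, 4, 2, 6, 0, 5, 3]
left_community = [0, 0, 1, 0, 1, 0, 1, 1]
print(round(pearson(unanswered_posts, left_community), 3))
```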
6. Risk and opportunity treatment
In the previous section, techniques for identifying and modelling risks and
opportunities were examined. In managing known risks or opportunities, it is
typical for the risk manager to select or develop a treatment plan that
addresses the challenge in front of her. The ROBUST environment should
support the specification and enactment of such treatment plans. Here, we
consider typical risk and opportunity treatment strategies and the state of the
art of workflow specifications and notations that allow treatment workflows to
be expressed and executed.
6.1. Risk and opportunity treatment strategies
As part of the process of managing an emergent risk or opportunity, a
decision must be made as to which type of plan is most suitable to address
the situation at hand. Whilst the detail of each particular plan will be specific
to the context of the problem, a general categorisation of the typical
approaches has been established and is widely accepted [40, 88], though
terminology may vary. In brief, these strategies are:

- Mitigate/Reduce²: behaviours that seek to reduce the impact of the
  event (denoted as a threat) when it occurs. These behaviours are
  typically control-based actions that include early detection methods and
  support-based mechanisms for affected entities.
- Recovery/Fallback: in light of an event that cannot be avoided, it may
  be possible to apply additional actions 'downstream' of its immediate
  impact such that 'knock-on' effects are themselves reduced or removed
  entirely.
- Avoid: where possible, the cause of the event defined by the risk (or
  the consequence of its occurrence) could be removed. The effect of
  this is to reduce the probability and impact of the risk to zero (but it
  may in turn require further contingency plans to compensate).
- Exploit/Enhance: if there is the ability to respond adequately to the
  identified opportunity, it may be possible to enact a plan that turns its
  consequences into the best possible positive outcome. For example, in
  a large community where a single, highly valued leader leaves, the
  remaining members could be re-grouped into a more dynamic
  structure.
- Transfer: an indirect form of risk reduction or avoidance that draws on
  external resources or third parties (such as insurers) to accept
  responsibility for the impact of the event when it occurs.
- Share: an extension of risk reduction or opportunity enhancement in
  which multiple partners agree to share the 'pain' or 'gain' from the risk
  or opportunity respectively. This strategy is typically put in place where
  there are multiple stakeholders in a community that agree to co-operate
  and can all benefit from pain reduction and opportunity exploitation.
- Accept: an acknowledgement of the risk, with an operational pattern
  'built in' to address it. For example, this could mean limiting the
  functions of a system under the risk condition to cope with the
  situation.
- Reject/Ignore: this position takes the view that the risk or opportunity
  simply cannot be addressed, or that its impact is sufficiently low not to
  require any action.
The risk manager may select one or more of these strategies by balancing the
availability of control measures against the cost and overall impact of each
strategy. Decision support for treatment strategy selection within the ROBUST
system will be primarily based on the community simulation facility, which
allows the risk manager to evaluate some of the actions specified in a
treatment plan. For example, one action that could be simulated is the
exclusion of a flamer from a particular forum.
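One simple way to frame this balance, assuming each candidate plan can be given a cost and an estimate of the residual likelihood and impact it leaves behind, is to rank options by cost plus residual expected loss. All figures below are hypothetical and the ranking rule is an assumption, not a ROBUST feature; in practice the residual values could be estimated by simulating the actions, for example the exclusion of a flamer.

```python
from dataclasses import dataclass


@dataclass
class Option:
    strategy: str                # one of the strategy types listed above
    cost: float                  # hypothetical cost of enacting the plan
    residual_probability: float  # likelihood that the event still occurs
    residual_impact: float       # loss if the event still occurs

    def total(self) -> float:
        # Treatment cost plus expected remaining loss (assumed decision rule)
        return self.cost + self.residual_probability * self.residual_impact


# Hypothetical options for "a flamer disrupts a forum" (untreated: P = 0.4, impact = 50)
options = [
    Option("Accept", cost=0.0, residual_probability=0.4, residual_impact=50.0),
    Option("Mitigate/Reduce (moderate posts)", cost=8.0,
           residual_probability=0.4, residual_impact=20.0),
    Option("Avoid (exclude the flamer)", cost=12.0,
           residual_probability=0.0, residual_impact=50.0),
]
best = min(options, key=Option.total)
for o in options:
    print(f"{o.strategy:35s} {o.total():6.1f}")
print("lowest cost + residual risk:", best.strategy)
```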

² Different terms are used by different standards.

Examples of other treatment responses (based on WP3 survey data) and their
mappings to these strategy types are given in Appendix F.
6.2. Workflow notations
This section reviews graphical notations that have been considered as
candidates for presenting risk treatment workflows. Graphical notations are
considered separately from their machine-processable specification
counterparts, since there is often no one-to-one mapping between them. Each
graphical notation is accompanied by a brief contextual description and a
summary of the notation's contemporary usage. A summary and evaluation
conclude this section.
6.2.1. UML: Activity diagrams
The Unified Modelling Language (UML) standard, proposed and maintained
by the Object Management Group (OMG), is a graphical notation set that
supports the specification of structural and dynamic aspects of a wide range
of domains (but is most commonly used in system software modelling).

Figure 11: UML Activity diagram example (OMG 2011)
Activity diagrams (Figure 11) are intended to capture single or parallel control
flows that describe activities comprising actions, objects, signals and events
(including interrupts, exception handling and basic temporal event labelling).
Actions can be parameterised, decomposed into further sub-activity diagrams,
and have objects associated with their local flow. Decision, fork, join and
merge nodes can be used to qualify serial and parallel control flow.
Activity diagrams provide an expressive platform that can be used in multiple
domains; this comes at the cost of a relatively abstract graphical
representation that often requires annotation to include domain specific
information. UML activity diagrams are supported by a wide variety of
software tools e.g. Borland's Together [11, 66], the Eclipse plugin framework
[22], Visual Paradigm's UML tooling [93].
6.2.2. Event driven process chains (EPC)
The event-driven process chain (EPC) notation was originally developed by SAP
as part of its R/3 software. Since then, EPC has been extended to include
additional connectors (see yEPC [59]) and risk treatment templates using
c-EPC (see [83]).

Figure 12: EPC example [60]

The EPC model (Figure 12) is centred on 'active' functions that describe state
transformations; each transformation has a preceding input event and a
subsequent output event (both deemed 'passive'). The resulting state of a
completed function can be determined by logical connectors that direct the
control flow. Parallel flows can be expressed by branching and merging
across logical connectors.
The relatively small number of graphical components found in EPC means
that the notation can be learned quickly and processes depicted simply. By
the same token, however, it could be argued that EPC lacks the fidelity of
expression that other notations enjoy, making it less capable of expressing
domain-specific concepts directly. EPC development is supported by tools
such as the ARIS Platform suite [86], ADONIS [8], EPC Tools [19] and Visio [61].
6.2.3. Business Process Modelling Notation (BPMN)
Originally conceived by the Business Process Management Initiative (BPMI),
the business process modelling notation standard is now maintained by the
OMG and is currently in revision 2.0. As its name suggests, the BPMN is
primarily aimed at business process managers seeking to model and improve
their organisation's activities.


Figure 13: Business Process Modelling Notation example [70]

The BPMN is a comparatively rich set of graphical structures, nodes,
connectors and modifying icons (Figure 13). Activities are defined in terms of
tasks that sit within a process or that themselves contain sub-processes;
these can also be grouped into transactions. Tasks are further qualified by
actor type, enactment type and ordering; tasks are connected by flow arcs
that can be conditionally qualified. Data objects are connected to flows acting
as either inputs or outputs or can be used to represent message data.
Gateways provide junctions at which flow can either be directed based on a
specific incoming task completion or be split into parallel task flows (to be
merged later). One or more flows can be organised into 'swim-lanes', each of
which defines the scope of that behaviour as belonging to the role of a
particular entity within the business organisation. Swim-lanes are stacked on
top of one another to form a pool; messages can be passed between lanes or
indeed entire pools. Additional information on the complete semantics of the
BPMN can be found in [70].
Arguably, BPMN is significantly more complex than the other notations
considered here. As indicated by its growing tool support, the notation is
gaining widespread acceptance in industry. Editors include Software Architect
for WebSphere [37], Business Process Visual Architect [92], ActiveVOS
Enterprise [1] and Intalio|BPMS [38], among others.
6.2.4. Yet Another Workflow Language (YAWL) notation
The YAWL foundation claims the highest level of support for workflow
abstractions of any of the notations described here [98]. The YAWL editor
provides direct support for control flow, data flow, resource (a work enactment
entity) flow, exception handling and temporal event management.

Figure 14: YAWL example [89]
YAWL's graphical notation (Figure 14) is task-centric: task nodes can be
defined as atomic or composite, and represented either singly or as multiple
instances. Each task node can be labelled, display custom icons, and be
marked with additional parameters including time, automation and
cancellation behaviours. Unlike other workflow notations, data objects are not
represented graphically within the workflow diagram. Condition nodes
represent states of the workflow network; arcs connect states to tasks as well
as tasks to other tasks.
In contrast to the other notations reviewed here, YAWL depends strongly on
its underlying workflow engine to support workflow specifications. This
dependency gives the user access to a powerful, service-based enactment
model that features XML data definition, XML document processing,
parameterized service execution and dialogue generation. Whilst this extended
functionality exceeds that of some other workflow formalisms, it is important
to note that it is a) not part of the notation proper, b) highly technical in
nature, and c) tightly bound to the YAWL application framework.
6.2.5. Other workflow notations: scientific and visual programming languages (VPLs)
Visual programming languages have existed for almost as long as traditional
programming syntax and typically express many of the core concepts employed
by the workflow notations discussed above. There are many VPLs available (a
review of them is beyond the scope of this work), and a significant number of
them have been designed primarily for use by software programmers.
Two examples that illustrate graphical workflow specification in the scientific
domain are briefly outlined below:
Labview
National Instruments' Labview [68] is a scientific/engineering software
platform that integrates data sources and processing from connected
hardware (or virtual software) devices into a programmable application
model.
The Taverna workbench
Scientific workflows can be graphically specified using the Taverna workbench
[65], which is a GUI front-end to the Taverna workflow management system.
Using Taverna, scientists can carry out complex, computationally based
experiments by orchestrating local or remote services as specified in a
workflow.
6.2.6. Summary
A high-level review of the most widely used graphical workflow notations has
been provided in section 6.2. The precision and expressiveness of each
reflects the problem domain it was designed to support. For example, UML
provides a high level of abstraction for a wide variety of domains but requires
additional annotation to clarify meaning. On the other hand, VPLs such as
Labview offer a much more concrete, programming-oriented description of
data flow with highly specific graphical annotations.
The descriptions of treatment actions set against risks and opportunities
provided by the ROBUST use-case partners (see appendix O) suggest the
need for a notation that can express a rich range of human actions situated
within an IT context. Both EPC and YAWL offer a basic, high-level flow
description in this regard, but neither (from a notational point of view) carries
the same descriptive power as BPMN. For this reason, BPMN has been chosen
to graphically specify treatment plans in ROBUST.
6.3. Workflow specifications
In the previous section, workflow notations were discussed without reference
to their technical specifications. This section reviews the underpinning
machine-processable formalisms that support the human activity of specifying
workflows graphically. Readers will not be surprised to find considerable
overlap between the concepts signified graphically in the previous section and
those defined formally in the technical languages discussed here.
However, the mapping between the 'model' and the 'view' in these cases is not
always simple, complete or intended. For example, some specification
languages have been designed primarily for inter-model 'transport', whilst
others are tightly bound to a corresponding software application.
A brief description of each formalism is given below, together with an
indicative assessment of its relative level of support for common workflow
concepts. This score is based on the review of features found in Appendix H.
Each assessment falls into one of the five categories presented in Table 3;
the nine assessment criteria are described in Table 4.


Table 3: Workflow specification coverage rating
Relative feature coverage Rating
< 20% Poor
20-40% Below average
40-60% Average
60-80% Good
> 80% Excellent
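The banding in Table 3 can be read as a simple function of the share of reviewed features that a language supports. The sketch below is illustrative only: it assumes that each band includes its lower bound (the table leaves boundary values such as 40% ambiguous), and the "7 of 12 features" example is hypothetical rather than taken from Appendix H.

```python
def coverage_rating(coverage: float) -> str:
    """Map a relative feature coverage (0-100 %) to a Table 3 rating.

    Assumption: each band includes its lower bound, so exactly 20 % is
    'Below average' and exactly 80 % is 'Excellent'.
    """
    if not 0.0 <= coverage <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    bands = [(80.0, "Excellent"), (60.0, "Good"), (40.0, "Average"), (20.0, "Below average")]
    for lower_bound, rating in bands:
        if coverage >= lower_bound:
            return rating
    return "Poor"


# Hypothetical example: a language covering 7 of 12 reviewed features (about 58 %).
print(coverage_rating(100 * 7 / 12))  # Average
```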

Table 4: Specification criteria description
Criteria Description
Activities The range of activity types available for use within the language, including aggregated, iterative and event-based specialisations.
Activity qualification The level of parameterization and additional logic that can be applied to a particular activity.
Control The ability to specify the conditions under which a particular transition from one node to another in the workflow takes place.
Data The representation of data objects or variables.
Message qualification Additional semantics attached to message passing (such as format presentation and instance correlation).
Events The extent to which external state changes and signalling are handled.
Machine services An indication of the ability to refer to, and invoke, machine services in the process of the workflow.
Participants, agents & roles The degree to which roles within a process can be described (including specific reference to human agents).
Meta-data Additional information carried in the formalism, including graphical layout and documentation authorship.
6.3.1. OMG's UML/DI
The OMG has produced two standards that allow UML models to be serialized.
For the purpose of exchanging model abstractions (without any presentation
data, such as object size and position), the XML Metadata Interchange (XMI) is
a mapping that transforms Meta Object Facility (MOF) based specifications
into XML that can be machine validated. Graphical meta-data is carried using
the Diagram Interchange (DI) meta-model, which allows spatial information to
be attached to accompanying model data. Tools that use a MOF-based
specification of the UML can use these two formalisms to serialize UML-based
workflows (activity models) in this decoupled fashion.

Table 5: OMG XMI encapsulation of the UML activity model
OMG UML/DI
Activities Average
Activity qualification Below average
Control Below average
Data Average
Message qualification Poor
Events Excellent
Machine services Below average
Participants/Agents Below average
Meta-data Average

Given the MOF-based description of the UML activity model and the supporting
DI formalism, the overall expressiveness of the specification is in line with
the OMG specification of a UML activity diagram. In this case, the
specification maps uniformly to the graphical notation, meaning that all
notational concepts can be captured technically. However, it is notable that
the notation as a whole offers fewer 'native' activity-based concepts than
others, particularly for activity qualification, control and message passing.
6.3.2. EPML 1.2
The EPC Markup Language (EPML) has been developed to support EPC
modelling by providing an XML-based encoding for model serialization [83];
it is based on XML Schema 1.0.

Table 6: EPML evaluation summary
EPML 1.2
Activities Poor
Activity qualification Poor
Control Excellent
Data Excellent
Events Poor
Machine services Below average
Participants/Agents Average
Meta-data Average


The high-level perspective of the EPC model is reflected in the EPML; its
relatively poor score is perhaps indicative of the age of the language itself
which, in its base form, has developed little over the years.
6.3.3. WS-BPEL 2.0
WS-BPEL was developed as a loosely coupled, extensible language for
business process specification and interoperation with web services. This
review considers the executable form of the BPEL specification. The
specification is dependent upon WSDL 1.1, XML Schema 1.0, XPath 1.0 and
XSLT 1.0.
Table 7: WS-BPEL 2.0 evaluation summary
WS-BPEL 2.0
Activities Excellent
Control Excellent
Data Excellent
Message qualification Below average
Events Average
Machine services Excellent
Meta-data Poor

WS-BPEL is strongly oriented towards web service description and enactment,
whilst leaving some of the human aspects (such as graphical notation and
human activities) to other specifications such as BPMN and WS-BPEL4People.
6.3.4. WS-HumanTask 1.1
The WS-HumanTask specification introduces human oriented structures and
behaviours that are typical in a business environment. There is extended
support here for role definitions, groups and task allocation. WS-HumanTask
depends on the following specifications: WSDL 1.1, XML Schema 1.0, XPath
1.0, WS-Addressing 1.0, WS-Coordination 1.1 and WS-Policy 1.5. See Table
8 for an evaluation summary.


Table 8: WS-HumanTask 1.1 evaluation summary
WS-HumanTask 1.1
Activities Below average
Activity qualification Excellent
Control Average
Data Excellent
Message qualification Good
Events Average
Participants/Agents Excellent
Meta-data Poor

This specification focuses primarily on organising human structures and on
activity qualification, in particular role allocation and task assignment.
Perhaps unexpectedly for a language that is aligned toward human tasks,
there is also good support for machine service definition. As such,
WS-HumanTask 1.1 is a potentially good candidate for describing the 'who'
and 'what' components of a treatment plan.
6.3.5. WS-BPEL4People 1.1
An extension of WS-BPEL, WS-BPEL4People builds upon a strong web
services foundation and also integrates with WS-HumanTask. In this way,
human tasks (also referred to as services) can be integrated with web service
execution. WS-BPEL4People depends upon WS-BPEL 2.0, WS-HumanTask
1.1, WSDL 1.1, XML Schema 1.0 and XPath 1.0. See Table 9 for an
evaluation summary.

Table 9: WS-BPEL4People 1.1 evaluation summary
WS-BPEL4People 1.1
Control Excellent
Data Excellent
Message qualification Excellent
Events Average
Participants/Agents Excellent
Meta-data Poor

Unlike most of the other technical languages described here, WS-BPEL4People
is a synthesis of other specifications and, as a result, benefits from their
strengths. However, the specification still lacks a notation of its own: to be
visualised, data from a WS-BPEL4People model has to be mapped to a notational
specification such as BPMN. This is an important factor when choosing a
language that will also need to map to a visual representation of a
treatment plan. For the ROBUST risk manager, an easily accessible visual
representation will be needed both to create and to enact plans.
6.3.6. ebXML Business Process Specification 2.0.4
For business analysts with a particular interest in well-defined business
protocols and transaction patterns, the ebXML BPS provides a solution. The
specification provides a bridge between an agreed business collaboration
framework and the execution of software services. Other ebXML definitions
used by the ebXML BPS include Technical Architecture Specification 1.04,
Core Components Technical Specification 2.01, Collaboration-Protocol Profile
and Agreement Specification 2.1, Business Process and Business Information
Analysis Overview 1.0, E-Commerce Patterns 1.0 and others. See Table 10
for an evaluation summary.

Table 10: ebXML Business Process Specification 2.0.4 summary
ebXML BPS 2.0.4
Activities Good
Control Average
Data Excellent
Events Excellent
Machine services Good
Participants/Agents Good
Meta-data Poor

This language's emphasis on protocols and transactions lends itself well to
service specification (using ebXML formalisms). However, this emphasis can
also reduce the language's power to provide more flexible descriptions of
activities and messages, since it is oriented towards a service-based,
contractual description of activities and events. It is unlikely that all the
activities and plans that the ROBUST risk manager might describe (at a high
level) could be expressed in verifiable form in ebXML.

6.3.7. YAWL 2.1
Pattern-oriented and tightly bound to the YAWL workflow application [98], the
YAWL specification is supported by the graphical editor and the dialogue
generation provided by that software. See Table 11 for an evaluation
summary.
Table 11: YAWL 2.1 specification summary
YAWL 2.1
Activities Poor
Control Excellent
Data Average
Events Poor
Participants/Agents Poor
Meta-data Excellent

YAWL's main strength lies in its ability to specify and execute machine
services with a high degree of technical precision. Conversely, the relatively
small number of constructs within the language means that there is a lack of
'first class' components that can be used to directly specify a more flexible
range of activity concepts, messages and events.
6.3.8. XPDL 2.1
In essence a bridging and serialization specification, the XML Process
Definition Language (XPDL) was designed to provide a file format for BPMN 1.1
and to serve as a platform for capturing business processes for use by
applications using other formalisms. See Table 12 for an evaluation summary.
Table 12: XPDL 2.1 specification summary
XPDL 2.1
Control Excellent
Data Excellent
Events Excellent
Meta-data Average

Unsurprisingly, the general coverage provided by XPDL is good, since it was
designed to support the BPMN notation and also includes additional components.
6.3.9. BPMN 2.0
As already argued, the BPMN has the richest notational set for business
process modellers; the specification language provides graphical meta-data
for diagram layout. The specification is reliant on XML Schema 1.0 and XPath
1.0. See Table 13 for an evaluation summary.

Table 13: BPMN 2.0 specification summary
BPMN 2.0
Activity qualification Average
Control Excellent
Data Excellent
Events Excellent
Meta-data Average

Despite having a comparatively low level of message qualification, BPMN
does have a rich set of message event types (including interrupting,
exception throwing and catching). In addition to the language's ability to
address many high-level concepts that can map to multiple business
domains, there is good mapping to machine services.
6.3.10. T2FLOW
The T2FLOW specification language encapsulates processing model
descriptions for data-intensive applications (many of them scientific in nature)
supported by the Taverna workflow language and software system.
T2FLOW's format conforms to the XML Schema 1.0 standard. See Table 14,
below, for an evaluation summary.


Table 14: T2FLOW specification summary
T2FLOW
Activities Below average
Control Average
Data Average
Events Poor
Participants/Agents Poor
Meta-data Average

Due to its data processing remit, the specification excels in linking machine
services with data flows. Unsurprisingly, the specification lacks some of the
refinements in activity specification and message passing that other
languages express.
6.3.11. Summary of workflow specification languages
Section 6.3 is limited to reviewing the major contemporary specifications of
human-based, activity-based workflows. The evaluation, based on each
language's relative scoring (see appendix H), is also set at a relatively high
level of granularity, intended to match the general descriptions of
risk/opportunity responses provided by the ROBUST use-case partners (see
appendix O). The strongest candidates within this group of languages are
those that offer a high level of 'native' description of activities, events and
message passing. These candidates include WS-BPEL4People 1.1, WS-BPEL 2.0,
XPDL 2.1 and BPMN 2.0, each of which could potentially be used to model
treatment workflows for ROBUST.
In selecting an appropriate technical workflow specification, a pragmatic
requirement further informs the choice: the mapping between the
specification and the graphical notation. Only some of the technical
specifications have a well-defined mapping to a graphical notation (UML/DI,
EPML 1.2, XPDL 2.1 (to BPMN 1.1) and BPMN 2.0). This does not mean that
other such mappings are impossible (indeed, some tools offer them; see
section 7.6.2), but such an approach adds an extra layer of complexity to the
treatment specification model. Additionally, the specification language
should be extensible, to allow the specification of actions that could be
carried across to the community simulation components being developed in
WP4. From this standpoint, BPMN 2.0 is the preferred XML-based language for
treatment descriptions (see section 7.6.2 for an overview of the software
library).
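To make the choice of BPMN 2.0 concrete, the sketch below serialises a simple treatment plan (modelled on the exclusion of a flamer from a forum, mentioned earlier as a treatment example) as a BPMN 2.0 process using Python's standard ElementTree. It is a minimal illustration under stated assumptions: the plan is purely sequential, the ids, names and `targetNamespace` URL are hypothetical, and no BPMNDI diagram-layout information is produced, so a graphical editor would need to lay the diagram out itself. Only standard BPMN 2.0 model elements are used (`definitions`, `process`, `startEvent`, `userTask`, `serviceTask`, `endEvent`, `sequenceFlow`).

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 semantic model (process elements, no diagram layout).
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)


def bpmn_tag(name: str) -> str:
    """Qualify an element name with the BPMN 2.0 model namespace."""
    return f"{{{BPMN_NS}}}{name}"


def treatment_plan_xml(plan_id: str, steps: list[tuple[str, str, str]]) -> str:
    """Serialise a purely sequential treatment plan as a BPMN 2.0 process.

    Each step is (element id, BPMN task type, name), e.g. 'userTask' for a
    manual action or 'serviceTask' for an automated one.
    """
    definitions = ET.Element(
        bpmn_tag("definitions"),
        {"id": f"{plan_id}_definitions", "targetNamespace": "http://example.org/robust/treatments"},
    )
    process = ET.SubElement(definitions, bpmn_tag("process"), {"id": plan_id, "isExecutable": "false"})
    nodes = [("start", "startEvent", "Risk detected"), *steps, ("end", "endEvent", "Treatment complete")]
    for node_id, node_type, name in nodes:
        ET.SubElement(process, bpmn_tag(node_type), {"id": node_id, "name": name})
    # Link each node to the next one with a sequence flow.
    for i, (source, target) in enumerate(zip(nodes, nodes[1:]), start=1):
        ET.SubElement(
            process,
            bpmn_tag("sequenceFlow"),
            {"id": f"flow{i}", "sourceRef": source[0], "targetRef": target[0]},
        )
    return ET.tostring(definitions, encoding="unicode")


plan = treatment_plan_xml(
    "exclude_flamer",
    [
        ("review", "userTask", "Review the flamer's recent posts"),
        ("exclude", "serviceTask", "Exclude user from forum"),
    ],
)
print(plan)
```

Distinguishing `userTask` from `serviceTask` mirrors the point made later about the workflow enactor: only the service tasks are candidates for automatic enforcement through plugins.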


7. Risk and opportunity management framework
The above sections reviewed the different techniques and technologies for
modelling risks and opportunities as well as treatment workflows. The
following sections present the ROBUST management framework. Section 8
presents the ongoing work aimed at validating these choices.

[Figure: the Risk Manager specifies risks/opportunities and treatment plans and visualises their status through the WP1 components (Risk and Opportunity Editor, Risk Evaluation Engine, Workflow Enactor, Risks and Opportunities Dashboard), which connect over an ESB to WP3/WP5 analysis services (micro/macro feature extraction, sentiment analysis, micro/macro role analysis, topic extraction), WP4 simulation, the R&O Repository (R&O templates, R&O and actions history) and the Community Data Repository.]

Figure 15: WP1 risk and opportunity management framework
Figure 15 shows the ROBUST risk and opportunity management framework
that will be produced by WP1. This framework includes the components
that will assist the risk manager in the identification, assessment,
forecasting and monitoring of risks and opportunities within the community.
The detailed architecture and interactions are out of scope of this
deliverable. The components are:
1- Risk and opportunities dashboard: this will include the following main
modules:
a. Risk/Opportunity editor: this enables the risk manager to edit
risks and opportunities in the system.
b. Risk/Opportunity dashboard: this integrates multiple views of the
risks and opportunities, including the risk matrix and risk states (see
section 7.5 below). It also includes a network graph visualisation
component, which serves as a tool for visual risk/opportunity
communication and allows the risk manager to investigate the health
of the community and identify risks and opportunities.
2- Evaluation engine: this uses the models being developed in ROBUST to
assess the risks and opportunities. This includes interactions with other
WPs' tools and components via an Enterprise Service Bus (ESB).
3- Repository: the risk and opportunity instances and relevant
information will be stored in this repository. This includes the history of
treatments, which may inform responses to future situations.
4- Workflow enactor: this is a framework to which plugins can be added for
automatic enforcement of treatment workflows. Note that not all
treatment activities can be automated, for several reasons:
a. Some activities require human action (e.g. visualising the network
graph).
b. ROBUST may not have access rights to the real community platform
to enforce actions; however, the system will always allow plugins to be
added whenever required.
In ROBUST we aim to implement a proactive management framework, in which
the risk event likelihood is calculated so that treatment strategies (e.g.
mitigation) can be enacted before the event occurs. Once the risk occurs, it
is added to the repository history information and recovery options are then
executed. In contrast, the benefit from an opportunity is not necessarily
automatic: if no action is taken to exploit the opportunity once the event
occurs, then it is ignored. The system will assess the likelihood of the event
happening at the present time as well as in the future. Multiple techniques
can be used to address the risks and opportunities analysed in WP1, WP3 and
WP5. The following section discusses the events and the types of risk and
opportunity that are the focus of the ROBUST project.
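The proactive cycle described above (mitigate when the forecast likelihood is high, record and recover once the risk has occurred) can be sketched as a single evaluation step. This is a hypothetical illustration, not the WP1 evaluation engine: the `ManagedRisk` record, the fixed likelihood `threshold` and the action strings are assumptions introduced here; in ROBUST the likelihood would come from the models discussed in the following sections.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedRisk:
    """Hypothetical runtime record for one risk in a proactive management loop."""
    name: str
    threshold: float  # assumed likelihood at or above which mitigation is enacted
    history: list = field(default_factory=list)  # occurrences kept in the repository


def evaluate(risk: ManagedRisk, likelihood: float, occurred: bool) -> list[str]:
    """Return the actions triggered by one evaluation cycle."""
    if occurred:
        # The risk has materialised: record it in the history, then recover.
        risk.history.append(("occurred", likelihood))
        return [f"recover:{risk.name}"]
    if likelihood >= risk.threshold:
        # Proactive step: enact the mitigation strategy before the event happens.
        return [f"mitigate:{risk.name}"]
    return []  # likelihood acceptable: keep monitoring


churn = ManagedRisk("expert_churn", threshold=0.6)
print(evaluate(churn, likelihood=0.45, occurred=False))  # []
print(evaluate(churn, likelihood=0.72, occurred=False))  # ['mitigate:expert_churn']
print(evaluate(churn, likelihood=0.72, occurred=True))   # ['recover:expert_churn']
```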
7.1. Events in online communities
The identification of risks and opportunities is an open-ended task,
especially since an event is defined as any "uncertainty that may affect the
objectives positively or negatively".
However, since ROBUST relies on the community activity log as its main
source of information, the scope of the events that can be specified in
ROBUST should be limited to those available in the community log (directly
or indirectly). Statistically sufficient data is needed in order to produce
realistic mathematical models.
Considering the above issues, we define the scope of ROBUST events to be
the features that are available in the community log. A risk or opportunity
can then be represented as a state model in which a state represents a risk
or opportunity. The state can refer to a community state, user behaviour, role
(e.g. lurker, expert), sentiment (e.g. positive or negative sentiment) or any
other feature (e.g. activity drop>=25%).
Risks based on complex events (composed of more than one event) are also
covered. For instance, the state can refer to "the user leaving the community
while the user's role is expert".
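Since events are restricted to features available in the community log, a state can be expressed as a predicate over a snapshot of those features, and a complex event as a conjunction of simpler conditions. The sketch below assumes hypothetical feature names (`activity_drop` as a fraction, `has_left`, `role`); the actual features would come from the WP3 ontology and the WP5 feature extraction.

```python
from typing import Callable, Dict, List

Features = Dict[str, object]  # hypothetical snapshot of log-derived features


def activity_drop(f: Features) -> bool:
    """Simple feature-based state: activity has dropped by at least 25%."""
    return f["activity_drop"] >= 0.25


def expert_leaving(f: Features) -> bool:
    """Complex state: the user leaves the community AND the user's role is expert."""
    return bool(f["has_left"]) and f["role"] == "expert"


STATES: Dict[str, Callable[[Features], bool]] = {
    "activity drop >= 25%": activity_drop,
    "expert leaving": expert_leaving,
}


def active_states(f: Features) -> List[str]:
    """Names of all states whose predicate holds for this feature snapshot."""
    return [name for name, holds in STATES.items() if holds(f)]


snapshot = {"activity_drop": 0.30, "has_left": True, "role": "expert"}
print(active_states(snapshot))  # ['activity drop >= 25%', 'expert leaving']
```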
The framework describing how features relate to the community and its
users can be found in the WP3 ontology (footnote 3), while feature analysis
and selection are addressed in D5.1 [49].
The on-going work on the ontology addresses capturing community users'
behaviour and standing within a community over time as they interact with
other users and develop an identity. This ontology will be the reference for
understanding the user and community feature-based events indicated above.
For instance, the ontology (Figure 16) provides a UserImpact class that
contains the features that a user exhibits at a given point in time with a given
community (e.g. their social network properties and the level to which they
participate and interact).

Figure 16: A partial view of the WP3 ontology relations
The ontology also includes an abstract class Role that defines the role that a
user has within an online community. The Role class has several subclasses,
each of which defines a unique role within the community, for example
popular participant, elitist or ignored. A user is assigned a role at a given
point in time based on his/her behaviour, allowing changes in the user's
role to be monitored as his/her behaviour changes within the community. All
of these user and community features feed into the risk and opportunity
events. Computing certain features (e.g. roles) may require deep analysis of
the community and user activities, as shown in the ROBUST WP5 deliverable
D5.1.
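Because roles are assigned per point in time, monitoring role change reduces to scanning a user's time-stamped role assignments for transitions. The sketch below is a hypothetical illustration: integer time steps stand in for the ontology's temporal information, and the role names are taken from the Role subclasses mentioned above.

```python
def role_changes(assignments):
    """Given (time, role) assignments for one user, return (time, old_role, new_role) changes."""
    ordered = sorted(assignments)  # chronological order (times assumed unique)
    changes = []
    for (_, previous_role), (time, role) in zip(ordered, ordered[1:]):
        if role != previous_role:
            changes.append((time, previous_role, role))
    return changes


history = [(1, "popular participant"), (2, "popular participant"), (3, "elitist"), (4, "ignored")]
print(role_changes(history))
# [(3, 'popular participant', 'elitist'), (4, 'elitist', 'ignored')]
```

A change such as "elitist" to "ignored" is exactly the kind of feature-based event that could then be fed into a risk definition.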

3 http://purl.org/net/oubo/0.3

[Figure: Event, Likelihood, Impact, Objective and Impact Area, linked by the relations "has", "classified under", "affects" and "derived from".]
Figure 17: Relation between event and objectives

The events that characterise risks and opportunities affect objectives
positively or negatively (Figure 17). An event may impact more than one
objective at the same time, and the level of impact may differ between
objectives. The relations between events and objectives may form a cyclic
graph and may give rise to conflicts that need to be resolved (see Section
5.2.2). Generic tools and algorithms can be used for this conflict reasoning
[91]. The risk manager (or even a higher board across the organisation) has
to take actions, which may include reconciling or decomposing incompatible
objectives.
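A first, very simple form of conflict detection is to flag events whose impacts point in opposite directions for different objectives. The sketch below assumes a hypothetical signed impact level per (event, objective) pair, where a positive value means the event acts as an opportunity for that objective and a negative value means it acts as a risk; it does not implement the generic conflict-reasoning tools cited above.

```python
from collections import defaultdict

# Hypothetical (event, objective, signed impact level) triples.
IMPACTS = [
    ("expert promoted to moderator", "answer quality", +0.6),
    ("expert promoted to moderator", "expert posting volume", -0.3),
    ("activity drop >= 25%", "community growth", -0.8),
    ("activity drop >= 25%", "user engagement", -0.5),
]


def conflicting_events(impacts):
    """Events that help at least one objective while harming another."""
    directions = defaultdict(set)
    for event, _objective, level in impacts:
        if level != 0:
            directions[event].add(level > 0)
    return sorted(e for e, d in directions.items() if d == {True, False})


print(conflicting_events(IMPACTS))  # ['expert promoted to moderator']
```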
7.2. ROBUST Risk and Opportunities modelling
Risks and opportunities are modelled as transition events within a state
model. Consider Figure 18, where state 1 (in red) is the risky state (e.g.
role 'point hunter', activity drop>=25%) and state 2 is the acceptable state:
the risk event is the transition that leads from state 2 to state 1, and the
risk likelihood is the probability of this transition. If the risk event occurs,
the system will be in state 1; the recovery plan, when executed, should return
the system to the acceptable state 2. In ROBUST, we will have multiple state
models representing the community system from different perspectives,
which may incorporate multiple risks and opportunities.
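If, as an assumption not stated in the text, the state model is treated as a first-order Markov chain observed at regular intervals, the unconditional likelihood of the risk event (the 2 to 1 transition) can be estimated from the log as the fraction of departures from state 2 that lead to state 1. The state sequences below are hypothetical; the conditional refinement based on explanatory features is discussed next.

```python
def transition_probability(sequences, src, dst):
    """Maximum-likelihood estimate of P(next state = dst | current state = src)."""
    departures = 0  # observed steps that start in src
    hits = 0        # of those, steps that end in dst
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            if current == src:
                departures += 1
                hits += nxt == dst
    return hits / departures if departures else 0.0


# Hypothetical weekly state sequences (2 = acceptable, 1 = risky) for three users.
sequences = [[2, 2, 1, 1, 2], [2, 2, 2, 2], [2, 1, 2, 2]]
print(round(transition_probability(sequences, 2, 1), 3))  # risk event: 0.286
print(round(transition_probability(sequences, 1, 2), 3))  # recovery:   0.667
```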

[Figure: states 1 and 2, linked by a "Risk event" transition and a "Recovery plan" transition.]

Figure 18: State model with risky state


The risk event (transition) likelihood is calculated from the community log. In
order to calculate the probability of the transition, we link it to other
features or events in the system, so the likelihood is in fact the conditional
probability of the transition given the values of these explanatory features
(or predictors); see Figure 19.
Take, for instance, the simplistic hypothesis that the likelihood of a user
leaving the community can be computed from the number of colleagues who
have left the community. If we have historical data about these two variables
(whether the user is in or out of the community, and how many of the user's
colleagues have left), we can calculate the likelihood of the user leaving as the
community evolves. Whenever one of the colleagues leaves the community, the
likelihood is re-calculated; an increase in likelihood may then need to be
addressed.
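The simplistic hypothesis above can be turned into an empirical conditional probability table: group historical observations by the number of colleagues who had already left, and compute the fraction of users who then left. The observations below are hypothetical, and a real model would need enough data per group (and probably smoothing) to be reliable.

```python
from collections import defaultdict


def leave_likelihood_table(history):
    """Empirical P(user leaves | number of colleagues who already left).

    history: (colleagues_left, user_left) observations from the community log.
    """
    counts = defaultdict(lambda: [0, 0])  # colleagues_left -> [leavers, total]
    for colleagues_left, user_left in history:
        counts[colleagues_left][0] += int(user_left)
        counts[colleagues_left][1] += 1
    return {k: leavers / total for k, (leavers, total) in sorted(counts.items())}


history = [(0, False), (0, False), (0, False), (0, True),
           (1, False), (1, True),
           (2, True), (2, True), (2, False)]
table = leave_likelihood_table(history)
print(table)  # {0: 0.25, 1: 0.5, 2: 0.6666666666666666}
```

When one more colleague leaves, the re-calculated likelihood is simply the table entry for the new count (here rising from 0.25 to 0.5 after the first departure).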
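[Figure: states 1 and 2, linked by a "Risk event" transition and a "Recovery plan" transition; the risk event is connected to explanatory parameters P1 and P2.]

Figure 19: Risk event related to explanatory parameters p1 and p2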

Finding good predictors is key to accurate predictions, and correlation
analysis is a starting point for finding them. A similar process applies to
opportunity events. Although an action is needed to exploit an opportunity,
this does not necessarily affect the state of the system from this perspective.
For instance, consider the opportunity event in which a user develops
expertise in a certain topic, and the exploitation action is "upgrade the user to
moderator". If the states represent the user's expertise (e.g. state 1: user is an
expert in topic 1; state 2: user is a beginner in topic 1), the current state
(state 1) will not change once the opportunity is exploited, i.e. once the user
is appointed as moderator.
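Correlation analysis as a first screening step can be sketched by computing the Pearson correlation between each candidate feature and the outcome (here, whether a user left) and ranking the candidates by absolute correlation. The feature names and values are hypothetical, and correlation only screens for linear association; it does not establish that a feature is a good predictor.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_predictors(features, outcome):
    """Rank candidate features by the absolute value of their correlation."""
    scores = {name: pearson(values, outcome) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


user_left = [0, 0, 1, 1, 0, 1]  # hypothetical outcome per user
candidates = {
    "posts_per_week": [5, 5, 3, 4, 4, 3],
    "colleagues_left": [0, 0, 2, 2, 0, 2],
    "account_age_years": [3, 1, 3, 1, 2, 2],
}
ranked = rank_predictors(candidates, user_left)
print([(name, round(r, 2)) for name, r in ranked])
# [('colleagues_left', 1.0), ('posts_per_week', -0.82), ('account_age_years', 0.0)]
```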


[Figure: states 1, 2 and 3, with a "Risk event" transition, a "Recovery plan" transition and an "Opportunity event" transition.]

Figure 20: State model with risk state and opportunity state
Note that multiple states within the same diagram may refer to risks and
opportunities (Figure 20). It is also possible for the same event to be
qualified as an opportunity with respect to one objective but as a risk with
respect to another. Depending on how the objectives are ordered at the time
of the event, the risk manager will have the option to mitigate the risk or to let
it happen in order to exploit it as an opportunity.
For community decomposition-related risks or opportunities, compartment
models and other techniques developed in WP3/5 will be used. These models
are suitable for analysing the growth or shrinking of compartments based on
migration rates (see section 5.2.6).
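To illustrate how migration rates drive the growth or shrinking of compartments, the sketch below applies one discrete time step of a simple compartment model. It is not the WP3/WP5 model: the compartments, the constant per-step migration rates and the starting populations are all hypothetical, and the rates out of any compartment are assumed to sum to at most 1.

```python
def step(populations, rates):
    """One discrete-time step of a compartment model.

    populations: compartment -> number of users at the start of the step
    rates: (source, target) -> fraction of the source's users migrating per step
    """
    net_flow = {c: 0.0 for c in populations}
    for (source, target), rate in rates.items():
        moved = rate * populations[source]  # based on start-of-step populations
        net_flow[source] -= moved
        net_flow[target] += moved
    return {c: populations[c] + net_flow[c] for c in populations}


populations = {"lurker": 1000, "contributor": 200, "expert": 50}
rates = {
    ("lurker", "contributor"): 0.10,
    ("contributor", "expert"): 0.05,
    ("expert", "lurker"): 0.20,
}
nxt = step(populations, rates)
print({c: round(v, 1) for c, v in nxt.items()})
# {'lurker': 910.0, 'contributor': 290.0, 'expert': 50.0}
```

Iterating `step` shows whether a compartment (for example, the experts) is trending towards growth or decline under the assumed rates; the total population is conserved because migration only moves users between compartments.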
7.3. Risk and opportunities dependencies
In addition to their impact on objectives, risk and opportunity events can
be interdependent and may affect each other. It is important to understand
these relations in order to:
1- Increase the accuracy in risk assessment based on other risks and
opportunities states.
2- Have an integral view of the risks and opportunities in the system.
3- Optimise the performance of the assessment process across a set of
risks and opportunities in the system.

Consider the Bayesian network in Figure 21. The independent variables are
at the top of the graph (I1..In), whereas the dependent variables are named
D1..Dn.

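[Figure: independent variables I1, I2, ..., In at the top feed dependent variables D1, D2, ..., Dn, which in turn relate to the risk and opportunity states S1, S2 and S3.]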
Figure 21: Bayesian Network example of modelling dependencies between variables
and risks and opportunities.
Both the dependent and independent variables can be community or user
features. Risk and opportunity events can then be related to a specific value
of one of these random variables (a state, or a discrete or continuous numeric
value). The likelihood of a risk or opportunity is inferred from the Bayesian
network by computing the corresponding conditional probability.
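The inference step can be sketched with exact enumeration over a toy network in the spirit of Figure 21: two independent variables I1 and I2, one dependent variable D1 and a binary risk state S. All probability tables are hypothetical, and enumeration is only practical for small networks; larger ones would need a dedicated Bayesian network library or approximate inference.

```python
from itertools import product
from typing import Dict

# Hypothetical conditional probability tables (all variables are binary).
P_I1 = 0.3                      # P(I1 = True)
P_I2 = 0.2                      # P(I2 = True)
P_D1 = {(True, True): 0.9, (True, False): 0.6,    # P(D1 = True | I1, I2)
        (False, True): 0.4, (False, False): 0.1}
P_S = {True: 0.7, False: 0.05}  # P(S = True | D1): S is the risk state

VARS = ("I1", "I2", "D1", "S")


def bern(p_true: float, value: bool) -> float:
    return p_true if value else 1.0 - p_true


def joint(a: Dict[str, bool]) -> float:
    """Joint probability factorised along the edges I1, I2 -> D1 -> S."""
    return (bern(P_I1, a["I1"]) * bern(P_I2, a["I2"])
            * bern(P_D1[(a["I1"], a["I2"])], a["D1"])
            * bern(P_S[a["D1"]], a["S"]))


def query(target: str, evidence: Dict[str, bool]) -> float:
    """P(target = True | evidence), summing the joint over unobserved variables."""
    num = den = 0.0
    for values in product((True, False), repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a)
        den += p
        if a[target]:
            num += p
    return num / den


print(query("S", {}))              # prior likelihood of the risk state
print(query("S", {"I1": True}))    # likelihood after observing I1
print(query("I1", {"S": True}))    # diagnostic direction: risk observed
```

The same function answers both predictive queries (from observed features to the risk state) and diagnostic ones (from an observed risk back to its likely causes).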
There are multiple challenges to be addressed here. The first is obtaining
statistically significant data for the variable values. Although the Bayesian
approach is based on learning and computing posterior probabilities, it is
still important that enough data is available within a reasonable time in
order to obtain an accurate assessment. The other issue is learning the
network structure, which needs more investigation in the context of specific
risks. Depending on the complexity of the structure, decomposition into
multiple networks may be considered.
7.4. Risk and opportunity templates
In order to integrate the concepts explained above into the WP1 risk
management system, the following class diagram and XML schema are
provided. Figure 22 shows the class diagram depicting the main components
and relations.

Figure 22: Risk class diagram depicting main components and relations (Risk/Opportunity, Event, Likelihood, Impact, Objective, TreatmentPlan)


The class diagram shows the relations between the risk/opportunity and the
other entities. A risk/opportunity has an event, a likelihood (corresponding to
the event likelihood), multiple impacts on objectives, and a treatment plan
(with an optional recovery plan) that may be shared with other risks and
opportunities. A representation of the XML template for risks and
opportunities is shown in Figure 23.
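As a sketch of how the class diagram might map to code, the following Python dataclasses mirror its entities. The field names and types are assumptions (for instance, likelihood is taken to be a probability and impact a number), not the ROBUST data model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Objective:
    name: str


@dataclass
class Event:
    description: str


@dataclass
class Impact:
    objective: Objective
    value: float  # assumption: numeric; qualitative levels could be mapped to numbers


@dataclass
class TreatmentPlan:
    name: str
    actions: List[str] = field(default_factory=list)


@dataclass
class RiskOpportunity:
    title: str
    event: Event
    likelihood: float                                     # P(event) in the time frame
    impacts: List[Impact] = field(default_factory=list)   # one per affected objective
    treatment: Optional[TreatmentPlan] = None              # may be shared by several risks
    recovery: Optional[TreatmentPlan] = None               # optional recovery plan

    def __post_init__(self) -> None:
        if not 0.0 <= self.likelihood <= 1.0:
            raise ValueError("likelihood must lie in [0, 1]")


activity = Objective("community activity")
plan = TreatmentPlan("re-engage contributors", ["send personalised invitation"])
r1 = RiskOpportunity("Community becoming inactive", Event("activity below threshold"),
                     0.3, [Impact(activity, -0.8)], treatment=plan)
r2 = RiskOpportunity("Key contributors leaving", Event("key contributor activity drop"),
                     0.2, [Impact(activity, -0.5)], treatment=plan)
print(r1.treatment is r2.treatment)  # True: the plan object is shared
```

Holding the plan by reference rather than by copy is what lets one treatment plan serve several risks.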

Figure 23: Representation of the XML schema for risk and opportunity

Risk/opportunity attributes include:
- Title: a title that gives a clear indication of the risk or opportunity
- Description: a description of the risk/opportunity
- Owner: this is the person or entity with accountability and authority to
address the risk. Depending on the organisational allocation policy, the
owner may be a department head or someone as low in the hierarchy as
possible, closer to the appropriate controls.
- Event: an event which, if it occurs, will have a positive or negative impact
on the organisation's objectives. The event may be complex, i.e. composed of
more than one event.
- Likelihood: This is the likelihood of the event occurring within a time
frame in the future.
- Impact: This is the impact on objectives. An event may have an impact on
more than one objective. The impact may be qualitative or quantitative.
In a conservative mode, the impact on the objectives should be assessed
for the worst-case scenario. Both the impact and the objectives may change
over time.
- Treatment plan: this includes a set of actions in order to treat the
risk/opportunity as specified by the risk manager.
- Recovery/Contingency plan: this includes a set of actions to be
executed when the risk/opportunity occurs. This is also specified by the
risk manager.
- LastDateReviewed: according to the risk management process, risks
should be reviewed in order to remove irrelevant risks and
opportunities or add new ones. This entry contains the date of the last
review of this specific risk/opportunity.
- ExpiryDate: this is the date after which the risk is no longer considered
(e.g. when the impact of the risk is estimated to become null in 2
months).
- CurrentControls: this contains information about the current controls
that are in place for treating the risk/opportunity.
- Model: this refers to the module used to assess the risk/opportunity
likelihood. These modules are provided by the different ROBUST WPs.
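The template can be serialised with Python's standard xml.etree.ElementTree module. Because the schema in Figure 23 is not reproduced in this text, the element names, their order and the representation of impacts as Impact elements carrying an objective attribute are assumptions based on the attribute list above; the example values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Assumed element order, following the attribute list above.
FIELDS = ("Title", "Description", "Owner", "Event", "Likelihood", "Impacts",
          "TreatmentPlan", "RecoveryPlan", "LastDateReviewed", "ExpiryDate",
          "CurrentControls", "Model")


def risk_to_xml(risk: Dict[str, Any]) -> str:
    """Serialise a risk/opportunity record; absent attributes are omitted."""
    root = ET.Element("RiskOpportunity")
    for tag in FIELDS:
        if tag not in risk:
            continue
        if tag == "Impacts":
            impacts = ET.SubElement(root, "Impacts")
            for objective, value in risk["Impacts"].items():
                # one Impact element per objective, since an event can affect several
                ET.SubElement(impacts, "Impact", objective=objective).text = str(value)
        else:
            ET.SubElement(root, tag).text = str(risk[tag])
    return ET.tostring(root, encoding="unicode")


example = {
    "Title": "Experts leaving",
    "Owner": "SCN community manager",
    "Likelihood": 0.4,
    "Impacts": {"response_time": "high", "content_quality": "medium"},
    "ExpiryDate": "2011-12-31",
    "Model": "churn prediction",
}
print(risk_to_xml(example))
```

Qualitative impacts ("high", "medium") and quantitative ones are both stored as text here, matching the note that impact may be either.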
7.5. Risk and opportunity states
In the ROBUST WP1 risk framework, we distinguish between different states of
risks and opportunities. At any given time, a risk/opportunity is in one of
the following states (Figure 24):
Inactive: this is the state of a risk or opportunity that is not put into
enforcement in the system. This can happen for several reasons, e.g. the risk
is no longer applicable because the objectives have changed, or the data or
model needed to assess the risk is not available.
Active: this is the state of a risk that is put into enforcement in the system.
The system will then evaluate the risk periodically and flag if the probability of
occurrence goes beyond a certain level.
Flagged: this is the state of a risk that is judged unacceptable because it
has exceeded a critical threshold (e.g. the high likelihood, high impact zone)
set by the risk manager, and thus requires treatment.
Treatment: in this phase the system is aware that the risk or opportunity is
being treated. Many treatment strategies are possible, as indicated in Section
6. The treatment actions should be recorded for future investigation.



Figure 24: Risk states (Inactive, Active, Flagged, Treatment).

The risk moves from the inactive to the active state when the risk manager
activates the risk and adds it to the enforcement list. The ROBUST system then
assesses the risk/opportunity and evaluates it against the criteria (likelihood
and impact thresholds) provided by the risk manager. If the assessment shows
that the likelihood and impact are higher than the thresholds, the
risk/opportunity is flagged, indicating that it requires treatment by the risk
manager. The risk manager can then verify the assessment and trigger the
treatment phase. After the treatment phase, the risk may be reactivated or
rendered inactive.
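The lifecycle in Figure 24 can be sketched as a small state machine. The transition table encodes only the moves described above, and the rule that both likelihood and impact must exceed their thresholds follows the text; the class and method names are hypothetical.

```python
from enum import Enum
from typing import List


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"
    FLAGGED = "flagged"
    TREATMENT = "treatment"


# Moves described in the text; anything else is rejected.
ALLOWED = {
    State.INACTIVE: {State.ACTIVE},                   # risk manager activates the risk
    State.ACTIVE: {State.FLAGGED},                    # assessment exceeds thresholds
    State.FLAGGED: {State.TREATMENT},                 # manager verifies and treats
    State.TREATMENT: {State.ACTIVE, State.INACTIVE},  # reactivated or retired
}


class RiskLifecycle:
    def __init__(self, likelihood_threshold: float, impact_threshold: float) -> None:
        self.state = State.INACTIVE
        self.likelihood_threshold = likelihood_threshold
        self.impact_threshold = impact_threshold
        self.treatment_log: List[str] = []  # kept for future investigation

    def _move(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {target.value}")
        self.state = target

    def activate(self) -> None:
        self._move(State.ACTIVE)

    def assess(self, likelihood: float, impact: float) -> State:
        """Periodic assessment: flag only when both values exceed their thresholds."""
        if (self.state is State.ACTIVE
                and likelihood > self.likelihood_threshold
                and impact > self.impact_threshold):
            self._move(State.FLAGGED)
        return self.state

    def start_treatment(self, action: str) -> None:
        self._move(State.TREATMENT)
        self.treatment_log.append(action)

    def end_treatment(self, reactivate: bool = True) -> None:
        self._move(State.ACTIVE if reactivate else State.INACTIVE)


risk = RiskLifecycle(likelihood_threshold=0.5, impact_threshold=0.7)
risk.activate()
print(risk.assess(0.6, 0.5))   # State.ACTIVE: impact below threshold
print(risk.assess(0.8, 0.9))   # State.FLAGGED
risk.start_treatment("contact key contributor")
risk.end_treatment(reactivate=True)
print(risk.state)              # State.ACTIVE
```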
7.6. Treatment plans
7.6.1. Treatment workflows
In acting to control risk factors, or indeed in seeking to exploit a new
opportunity, a risk manager is better prepared when he or she has a plan to
hand. The capturing of risk treatment plans ahead of their use is therefore an
important part of any risk management system. For the ROBUST user, this
means having the ability to create a suite of plans that specify enactable
processes that will have concrete effects in the target community.
Consequently, the ROBUST risk manager should be able to call upon
and tailor a treatment plan to meet his or her control objectives at the point of
intervention.
Strategy selection
It is assumed that the risk manager already has access to a risk/opportunity
definition set and the facilities to evaluate either current risks or (simulated)
future risks, as discussed in Section 8.2. Given this data and these facilities,
the risk manager will then seek to define a plan to change the environment [40,
88]. The first step is to choose a general strategy (discussed in more detail in
Section 6.1):
- Mitigate/Reduce (reduce the probability of the risk event occurring)
- Recovery/Fallback (reduce the expected consequences of the risk
event)
- Avoid (avoid the risk event altogether)
- Transfer/Share (share/disperse risk responsibility between entities)
- Accept (factor in risk as a part of the overall environment)
- Exploit/Enhance (increase the probability of an opportunity
occurring)
- Ignore (take no action)
Plan parameterisation
It is expected that in ROBUST, any given treatment plan will have mappings
to one or more risks/opportunities. Depending on the strategy selection and
the actions specified in the specific treatment plan, it should be possible for
the risk manager to further refine the treatment by defining parameters that
can be modified at the point of treatment invocation. For example, it could be
possible to refer to a specific resource within a community when trying to
mitigate the potentially negative actions of a frustrated user.
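A minimal sketch of strategy selection and plan parameterisation, assuming a plan stores design-time default parameters that the risk manager may override at invocation. The Strategy values mirror the list above; TreatmentPlan, invoke, the target identifier and the parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, FrozenSet


class Strategy(Enum):
    MITIGATE = "mitigate/reduce"
    RECOVERY = "recovery/fallback"
    AVOID = "avoid"
    TRANSFER = "transfer/share"
    ACCEPT = "accept"
    EXPLOIT = "exploit/enhance"
    IGNORE = "ignore"


@dataclass
class TreatmentPlan:
    name: str
    strategy: Strategy
    targets: FrozenSet[str]                                 # risks/opportunities it maps to
    defaults: Dict[str, Any] = field(default_factory=dict)  # design-time parameters

    def invoke(self, **overrides: Any) -> Dict[str, Any]:
        """Concrete parameters for one invocation; only declared parameters may change."""
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise KeyError(f"undeclared parameters: {sorted(unknown)}")
        return {**self.defaults, **overrides}


calm_user = TreatmentPlan(
    name="calm frustrated user",
    strategy=Strategy.MITIGATE,
    targets=frozenset({"frustrated-user-risk"}),
    defaults={"target_user": None, "forum": None, "message": "moderator_outreach"},
)
print(calm_user.invoke(target_user="user42", forum="ABAP Development"))
```

Rejecting undeclared parameters is a design choice: a mistyped name at the point of treatment fails loudly instead of being silently ignored.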
Ownership and scheduling
In some risk treatment development practices, further management-related
concerns are also addressed; these include the assignment of action
ownership and scheduling. For this revision of the ROBUST treatment
workflow, action ownership will be restricted to a single user type: the risk
manager. Whilst in other domains temporal rules may play an important or
even critical role in an action treatment plan, in ROBUST treatment action
execution will be considered more simply. In this case, a treatment action may
be predicated on the success or failure of the execution of a previous action.
Execution success or failure must be evaluated by the risk manager and
communicated to the ROBUST system via the user interface.
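Predicating each action on the outcome of the previous one amounts to a simple chain of actions with success and failure branches. In this sketch the report callback stands in for the risk manager's verdict entered through the user interface; the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    name: str
    on_success: Optional[str] = None  # next action if reported successful (None = stop)
    on_failure: Optional[str] = None  # next action if reported failed (None = stop)


def run_plan(actions: Dict[str, Action], start: str,
             report: Callable[[str], bool], max_steps: int = 100) -> List[str]:
    """Run actions in order, choosing each successor from the reported outcome."""
    executed: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(executed) >= max_steps:
            raise RuntimeError("treatment plan did not terminate; check for cycles")
        action = actions[current]
        executed.append(action.name)
        succeeded = report(action.name)  # the risk manager's evaluation
        current = action.on_success if succeeded else action.on_failure
    return executed


plan = {
    "send_reminder": Action("send_reminder", on_failure="offer_incentive"),
    "offer_incentive": Action("offer_incentive", on_failure="contact_moderator"),
    "contact_moderator": Action("contact_moderator"),
}
outcomes = {"send_reminder": False, "offer_incentive": True}
print(run_plan(plan, "send_reminder", lambda name: outcomes.get(name, False)))
# ['send_reminder', 'offer_incentive']
```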
Plan evaluation
Similar to a closed-loop control system, a treatment process will include
measurement and evaluation components within its overall process. The
involvement of the treatment workflow within a risk management system can
be both formative and summative. The former case occurs during treatment
plan specification. Here the risk manager may wish to experimentally evaluate
the effect of the whole or part of a treatment plan by assessing the
subsequent risks/opportunities that are brought about by the treatment
action(s). This is sometimes referred to as a 'what-if' evaluation; in the
ROBUST context, such a question could be answered by setting up the risk
parameters of the anticipated scenario, specifying the treatment, and then
running a simulation to estimate the result. In light of this information, the risk
manager may then wish to refine the plan. In the summative evaluation case
(i.e. during the enactment of a treatment plan on 'live' data), the risk manager
may need to evaluate the outcome of specific actions and guide the control of
flow within the treatment plan through interactions with the enactment
process.
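To make the formative 'what-if' loop concrete, the sketch below compares the likelihood of an inactivity risk with and without a treatment. It deliberately replaces the community simulation with a toy model that can be solved exactly: each of n users independently stays active with probability p_stay per step and never returns, so the number still active after h steps is binomial with success probability p_stay**h. The user counts, probabilities and threshold are hypothetical.

```python
from math import comb
from typing import Tuple


def p_below(n_users: int, p_stay: float, threshold: int, horizon: int) -> float:
    """P(fewer than `threshold` users still active after `horizon` steps).

    Toy assumption: users act independently, stay active with probability p_stay
    per step and never return, so 'within the horizon' equals 'at the horizon'.
    """
    q = p_stay ** horizon  # probability that one user survives every step
    return sum(comb(n_users, k) * q ** k * (1 - q) ** (n_users - k)
               for k in range(min(threshold, n_users + 1)))


def what_if(n_users: int, baseline_p: float, treated_p: float,
            threshold: int, horizon: int) -> Tuple[float, float, float]:
    """Risk likelihood without and with the treatment, and the reduction achieved."""
    before = p_below(n_users, baseline_p, threshold, horizon)
    after = p_below(n_users, treated_p, threshold, horizon)
    return before, after, before - after


# Hypothetical scenario: a treatment raises weekly retention from 0.90 to 0.97.
before, after, reduction = what_if(50, 0.90, 0.97, threshold=30, horizon=5)
print(f"before={before:.3f} after={after:.3f} reduction={reduction:.3f}")
```

In ROBUST itself, the closed form would be replaced by repeated runs of the community simulation; the comparison of a baseline against a treated scenario stays the same.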
Resourcing and authorization
Finally, when treatment plans are first generated, another stratum of
organisation management will be considered in some treatment workflows:
resource allocation and plan authorization. Some organisations may require
plans to be reviewed and approved with consideration of the resource
allocation required to enact them before they can be deployed. Whilst this
analysis is an operational necessity for most organizations, this aspect of
treatment workflow specification is considered beyond the scope of the
current ROBUST project.
7.6.2. Anticipated ROBUST treatment processes
An overview of typical risk treatment development and practice has been
given above. As already indicated, not all of the components
identified (such as resourcing/authorization) are expected to be used within
the ROBUST project, since they are typically bespoke, organisation-level
concerns. An activity model for risk treatment plan development and
evaluation in ROBUST is presented in Figure 25.


Figure 25: Risk treatment plan creation and evaluation
The plan development and evaluation process outlined here supports an
iterative 'what-if' methodology in which proposed treatments can be
progressively tested and refined (using community simulation components
from WP4 to explore potential treatment actions). It is expected that the
ROBUST system will provide some template treatment plans based on known
strategies for online community risk management. In Figure 26 an activity
model is presented for the application of treatments during the live monitoring
of an online community.


Figure 26: Risk treatment plan execution
During the live monitoring of risks, the risk manager may be required to attend
to multiple risk cases that require treatment. As risks or opportunities arise, it
is assumed that the ROBUST risk manager will be able to temporarily
suspend the currently enacted treatment plan to evaluate further risks and
commence further plans.
Workflow notation and specification selection
Two complementary domain sources have been used to inform the selection of
ROBUST's treatment design. The first is a domain analysis based on the policy
documentation described in WP4 (see Appendix E). The second is the data from
the risks and opportunities survey carried out in WP3, which has been used as a
basis for identifying the potential components needed to describe action
responses to perceived risks (see Appendix F). In addition, a number of SDKs
were reviewed in order to ensure that the development of the treatment
workflow component of ROBUST remains tractable (see Appendix G).
Based on these analyses, it was decided to adopt the BPMN workflow
specification, since it offers the richest notation and is likely to be the
most adaptable for representing treatment plans. A more detailed examination
of the leading BPMN SDK candidates shows that Activiti and jBPM share a
common development ancestry; it is therefore not surprising that both SDKs
offer similar functionality and power. Activiti was ultimately chosen as the
development basis for the treatment workflow due to its ability to be deployed
in a wide range of Java environments and its more accessible UI rendering
functionality.
Simple ROBUST treatment workflow example
To illustrate the application of BPMN to the representation and specification of
a treatment workflow, the following simple scenario (generated by the
ROBUST group) is presented (see Appendix I for the BPMN graphical model).
In this scenario, the risk manager is responding to increasing signs of
community inactivity. The plan outlined in Appendix I shows three main
activity-based stages to the plan: an evaluation of the current community;
possible modification of community content; and actions intended to increase
community motivation. Activities that precede a gate (such as 'select
evaluation') will have associated variables. Here it is expected that the
ROBUST system will present the user with a UI form in which she should
assign values to one or more of the variables. Once the form has been
dismissed, the treatment workflow engine will select a gated transition based
on rules that evaluate the values entered by the user.
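The gated transition can be sketched as an exclusive choice over rules evaluated against the submitted form values. This is plain Python, not the Activiti API; in an Activiti deployment such conditions would instead be attached as expressions to the gateway's outgoing sequence flows. The variable names and target activities below are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

FormValues = Dict[str, Any]
Rule = Tuple[Callable[[FormValues], bool], str]  # (condition, target activity)


def select_transition(values: FormValues, rules: List[Rule],
                      default: Optional[str] = None) -> str:
    """Exclusive choice: the first rule whose condition holds wins."""
    for condition, target in rules:
        if condition(values):
            return target
    if default is None:
        raise LookupError("no outgoing transition matches the form values")
    return default


# Hypothetical rules for the 'select evaluation' gate of the inactivity scenario.
rules: List[Rule] = [
    (lambda v: v["evaluation"] == "content", "modify_content"),
    (lambda v: v["evaluation"] == "motivation" and v["inactive_share"] > 0.5,
     "motivation_campaign"),
]
print(select_transition({"evaluation": "content", "inactive_share": 0.2}, rules))
print(select_transition({"evaluation": "motivation", "inactive_share": 0.3}, rules,
                        default="end_plan"))
```

Evaluating rules in order and taking the first match mirrors exclusive-gateway semantics; a default transition avoids a dead end when no rule applies.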
7.7. Risk and opportunity editor
To support the management of risks and opportunities within the community,
a risk editor is currently under development. The editor will provide the facility
to define a risk or opportunity starting from templates as specified in the
sections above.
The editor will include the configuration of event predictor functions such as
churn and user role prediction. Additionally, it will allow the user to specify a
risk's impact on community objectives and link the risk to any available
treatments. It is expected that each risk will be evaluated periodically by the
system and feedback provided to the user (for example, via the risk
dashboard).
From a visualisation point of view, the assessment information can be viewed in
multiple ways. In a conservative mode, we assume that the risk manager
would want to see the critical risks or opportunities. In this case, the impact
plotted in the risk and opportunity matrix should be the highest impact across
the set of objectives. A view of risks per objective will also be provided.
Another way to view the risks and opportunities is by temporal proximity,
showing first those that are expected to occur soonest. Screenshots of the
prototype are included in Appendix J.
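The three views can be sketched as follows, assuming each assessed item carries a likelihood, an impact magnitude per objective (higher is more severe) and a forecast time to the event. The field names and values are hypothetical; the conservative matrix position uses the maximum impact across objectives, as described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Assessed:
    title: str
    likelihood: float
    impacts: Dict[str, float]  # objective -> impact magnitude
    days_to_event: float       # forecast temporal proximity


def matrix_position(item: Assessed) -> Tuple[float, float]:
    """Conservative mode: plot the worst impact across all objectives."""
    return item.likelihood, max(item.impacts.values())


def by_objective(items: List[Assessed], objective: str) -> List[Assessed]:
    """Items affecting one objective, most severe impact first."""
    hits = [i for i in items if objective in i.impacts]
    return sorted(hits, key=lambda i: i.impacts[objective], reverse=True)


def by_proximity(items: List[Assessed]) -> List[Assessed]:
    """Items expected to occur soonest come first."""
    return sorted(items, key=lambda i: i.days_to_event)


items = [
    Assessed("Experts leaving", 0.3, {"response_time": 0.9, "content_quality": 0.4}, 30),
    Assessed("Low quality content", 0.6, {"content_quality": 0.7}, 7),
]
print([matrix_position(i) for i in items])
print([i.title for i in by_proximity(items)])
print([i.title for i in by_objective(items, "content_quality")])
```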


8. Forecasting of risks and opportunities in ROBUST
For each of the use case partners, IBM, SAP and Polecat, a number of risks
and opportunities have been discussed and specified in more detail. These
are initial examples, which help drive the current research and development
work in ROBUST. Further risks and opportunities, relationships between
features that can be used to detect them, events and tools for detection and
forecasting are all part of continued research and development work in
ROBUST.
The risks and opportunities currently considered are discussed in Section 8.1.
To be able to accurately forecast the occurrence of risks or opportunities,
tools need to be developed that consume features extracted from community
data, which is covered in detail in D5.1 [49]. Several techniques for modelling
and detecting/forecasting risks and opportunities have been reviewed in
Section 2. Three of these are considered in WP1, namely agent-based
modelling, compartment models and the Gibbs sampler, which are discussed in
Sections 8.2 and 8.3. The application of forecasting tools currently developed
by other ROBUST partners is discussed in Section 8.4.
8.1. Risks and opportunities for use cases
The three use case partners, IBM, SAP and Polecat, provide different
community data and requirements, which are discussed in [62, 80, 90].
Therefore, details are not discussed further here. However, since Polecat
provides different datasets from the public domain, note that the risks and
opportunities considered here are for the TiddlyWiki data (as described in
[90]).
Due to confidentiality, certain data are not available in the IBM case: there is
limited temporal information and there is no content. This puts certain
constraints on the risks and opportunities that can be addressed. In the
SAP and Polecat cases, however, content is available. Therefore, more risks
and opportunities, based on analysis of content, are currently defined for
these cases.
The risks and opportunities currently considered in each of the use cases are
discussed below in respective sections, followed by an overview of which
partners in the ROBUST project are conducting research to address the
respective risks in Section 8.1.4.
8.1.1. IBM
R1- Risk of the community becoming inactive
The risk of a community becoming inactive was suggested by several
respondents to the risk and opportunity questionnaire (discussed in Section 3).
This risk compromises objectives of fostering collaboration and employees
getting value from participation. In general terms, the event of the risk
occurring is that the quantification of activity drops below a certain threshold.


IBM is interested in considering any activity within a community, which
includes forum posting, wiki pages, blog entries, commenting, bookmarking,
file sharing, downloading, sharing, recommendations, etc. Moreover, for a
particular type of community, the quantification could differ depending
on the respective objectives. Therefore, we consider, for example, a model in
which the person defining the risk can specify a weighting among different
technologies.
This is a risk that could be detected in many ways, e.g., according to content
metrics alone or based on the proportion of users becoming inactive. The
exact quantification and a suitable threshold need to be decided based on an
investigation of data and initial empirical results.
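A sketch of the weighted quantification, assuming activity is counted per technology for each period and that the person defining the risk supplies the weights and the threshold. The technology names, weights and counts are hypothetical, since the actual values are still to be determined from the data.

```python
from typing import Dict, List

Counts = Dict[str, int]  # technology -> number of actions in one period


def activity_score(counts: Counts, weights: Dict[str, float]) -> float:
    """Weighted activity; technologies without a weight contribute nothing."""
    return sum(weights.get(tech, 0.0) * n for tech, n in counts.items())


def inactive_periods(history: List[Counts], weights: Dict[str, float],
                     threshold: float) -> List[int]:
    """Indices of the periods in which weighted activity fell below the threshold."""
    return [i for i, counts in enumerate(history)
            if activity_score(counts, weights) < threshold]


weights = {"forum": 1.0, "wiki": 2.0, "blog": 1.5, "bookmark": 0.2}
history = [
    {"forum": 10, "wiki": 3, "bookmark": 20},  # score 20
    {"forum": 4, "blog": 2, "bookmark": 10},   # score 9
    {"forum": 2, "wiki": 1, "bookmark": 5},    # score 5
]
print(inactive_periods(history, weights, threshold=10.0))  # [1, 2]
```

Changing the weights changes which kind of activity counts, so the same data can serve communities with different objectives.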
R2- Risk of key contributors leaving
This risk relates to the above risk. It can be considered a more specific case,
focusing on certain users who are considered to be key contributors in the
community. If such an individual leaves, it may have a significant impact on
the community activity as it may have a knock-on effect on other users,
leading to them leaving too.
Key contributors can be mapped to top contributors or to a subset of the
contributors with certain attributes (e.g. bridges, authoritative users, etc.).
The first step in detecting this risk is being able to identify an individual as
a key contributor, which could come from a role analysis tool, as discussed in
the risk below. The risky event can then be defined as a key contributor's
activity dropping below a certain threshold. As above, the quantification of
activity needs to be defined, as well as a suitable threshold. Churn analysis
conducted by partners in ROBUST addresses this, which is discussed further
in Section 8.4.1.
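One way to sketch this risky event, assuming key contributors are taken to be the k most active users of the previous period (a role analysis tool could supply a different set through key_users) and that a drop is measured relative to each user's own previous activity. The ratio and user names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def top_contributors(activity: Dict[str, float], k: int) -> List[str]:
    """The k most active users; ties are broken by user name for determinism."""
    return sorted(activity, key=lambda u: (-activity[u], u))[:k]


def key_contributor_alerts(previous: Dict[str, float], current: Dict[str, float],
                           k: int, min_ratio: float,
                           key_users: Optional[Iterable[str]] = None) -> List[str]:
    """Key contributors whose current activity fell below min_ratio times their
    previous activity; users absent from `current` count as zero activity."""
    keys = list(key_users) if key_users is not None else top_contributors(previous, k)
    return [u for u in keys if current.get(u, 0.0) < min_ratio * previous.get(u, 0.0)]


previous = {"alice": 50, "bob": 40, "carol": 5, "dave": 1}
current = {"alice": 45, "bob": 10}
print(key_contributor_alerts(previous, current, k=2, min_ratio=0.5))  # ['bob']
```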
R3- Risk of undesirable user role compositions
This is an example of a risk that lends itself to a stateful definition of a risky
event, as well as one based on thresholds. As an example of the former, a
risky event in this case would be a user making a transition from being an
active contributor to becoming inactive or just a consumer. In the latter case, a
risk could be defined based on proportions of users belonging to a set of
roles. The risky event would be an excessive proportion of users in a
particular role. More details of the research on role analysis are given in
D5.1 [49].
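Both forms of the event can be sketched from two snapshots of user roles. The role labels, the choice of 'contributor' as the watched source role and the share limits are hypothetical; in practice the roles would come from the role analysis described in D5.1.

```python
from collections import Counter
from typing import Dict, Iterable, List

Roles = Dict[str, str]  # user -> role label in one time window


def role_shares(roles: Roles) -> Dict[str, float]:
    counts = Counter(roles.values())
    total = sum(counts.values())
    return {role: n / total for role, n in counts.items()} if total else {}


def threshold_events(roles: Roles, max_share: Dict[str, float]) -> List[str]:
    """Threshold-based event: roles whose share exceeds the configured maximum."""
    shares = role_shares(roles)
    return sorted(r for r, limit in max_share.items() if shares.get(r, 0.0) > limit)


def transition_events(before: Roles, after: Roles, source: str = "contributor",
                      targets: Iterable[str] = ("consumer", "inactive")) -> List[str]:
    """Stateful event: users who moved from `source` to one of `targets`."""
    target_set = set(targets)
    return sorted(u for u, role in before.items()
                  if role == source and after.get(u) in target_set)


before = {"a": "contributor", "b": "contributor", "c": "consumer", "d": "inactive"}
after = {"a": "inactive", "b": "contributor", "c": "consumer", "d": "inactive"}
print(transition_events(before, after))                             # ['a']
print(threshold_events(after, {"inactive": 0.4, "consumer": 0.5}))  # ['inactive']
```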
8.1.2. SAP
R1- Risk of community becoming inactive
Similarly to the IBM case, SAP is also interested in managing the risk of their
community becoming inactive. The activity in this case can be defined
specifically as content contributions on the support forums, which is
straightforward to quantify. In particular, it is of interest to SAP to be able to
specify and monitor content that leads to a forum thread being answered
(solved).


A suitable activity threshold for defining such a risky event would need further
investigation to be determined. An initial threshold value of 20% drop in
activity of top contributors is considered for the work on the compartment
model and Gibbs Sampler discussed further in Sections 8.3 and 8.3.4.
However, this is only one approach to forecasting this risk, which is in the very
early stages of investigation at the time of writing.
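The initial 20% threshold can be expressed as a relative drop in the combined activity of the top contributors between two periods. This sketch assumes that the top-n set is fixed from the earlier period, so that newcomers rising into the top ranks do not mask a decline among established contributors; n and the activity counts are hypothetical.

```python
from typing import Dict


def top_contributor_drop(previous: Dict[str, float], current: Dict[str, float],
                         n: int) -> float:
    """Relative drop in the combined activity of the previous period's top-n users."""
    top = sorted(previous, key=lambda u: (-previous[u], u))[:n]
    before = sum(previous[u] for u in top)
    after = sum(current.get(u, 0.0) for u in top)
    return (before - after) / before if before else 0.0


def inactivity_risk(previous: Dict[str, float], current: Dict[str, float],
                    n: int = 10, drop_threshold: float = 0.20) -> bool:
    """Risky event: the drop reaches the threshold (20% as the initial value)."""
    return top_contributor_drop(previous, current, n) >= drop_threshold


previous = {"a": 40, "b": 30, "c": 20, "d": 10}
print(inactivity_risk(previous, {"a": 30, "b": 25, "c": 20, "d": 50}, n=3))  # False
print(inactivity_risk(previous, {"a": 20, "b": 25, "c": 20}, n=3))           # True
```

In the first case user d's surge does not offset the decline, because d was not among the top three in the earlier period.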
R2- Risk of experts leaving
Similar to the IBM risk of key contributors leaving, those key contributors can
be directly mapped onto users who are defined as being experts in certain
topics (by forum) according to the point system in the SAP community.
In the SAP Community Network (SCN), however, we cannot talk about experts
leaving the community per se. That is, it is nearly impossible to distinguish
between users who are only consuming content and those who have potentially
abandoned the SCN completely, since users do not need to log in to consume
content. Nevertheless, better understanding the changes in the rate of
contributions of top-level contributors is highly relevant.
It is desirable to do this analysis on a per-forum basis, and the number of
users to monitor would depend on the size of the respective forum, for
example, the top 10 or 50 users.
O1- Opportunity of gaining experts
The flip-side of the risk above is the opportunity to gain experts. In the SAP
communities, this could be done by, for example, identifying users who are
starting to contribute to the community, who could: 1) fill a gap in expertise; 2)
build a greater coverage of expertise, even overlapping, as this could reduce
the impact of an expert leaving; 3) build stronger links between sub-
communities.
R4- Risk of low quality content
The content that is created by users of the SAP Community Network is an
important aspect of the user experience and therefore crucial for a high
community value. With the ability to analyse content (forum posts), this risk
is defined, in general terms, as the proportion of low quality content reaching
a certain level. This entails classifying each content item by a quality metric.
The risky event is that the proportion of content items labelled as poor
exceeds a certain threshold.
The identification of low quality content is not trivial and requires research.
Currently, TEMIS in WP5 provides several text-based metrics, including
informativeness. In addition, low quality has a specific meaning in the SAP
support forums, which could involve, for example, identifying whether the
questions asked are poorly researched (short, asking for points, etc.), whether
there are duplicate responses in the same thread, or whether no points are
awarded in the thread. However, the latter cannot be used as an indicator on
its own, since a thread in which no points are awarded may still contain an
interesting discussion. Therefore, it is clear that there are many levels to
the analysis required to detect such a risk.
Similarly to the above risks, it is desirable to monitor this risk on a per forum
basis. However, it is also desirable to monitor this risk across the entire
community.
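Assuming each post already carries a quality score from a classifier (for example, a text metric such as informativeness), the risky event reduces to a share exceeding a threshold, checked both per forum and across the community. The forum names, scores, cutoff and limit are hypothetical; defining the quality score itself is the open research question noted above.

```python
from typing import Dict, List, Tuple


def low_quality_share(scores: List[float], cutoff: float) -> float:
    """Share of items whose quality score is below `cutoff`."""
    return sum(s < cutoff for s in scores) / len(scores) if scores else 0.0


def low_quality_risk(scores_by_forum: Dict[str, List[float]], cutoff: float,
                     max_share: float) -> Tuple[List[str], bool]:
    """Return the forums over the limit and whether the whole community is over it."""
    flagged = sorted(f for f, s in scores_by_forum.items()
                     if low_quality_share(s, cutoff) > max_share)
    everything = [x for s in scores_by_forum.values() for x in s]
    return flagged, low_quality_share(everything, cutoff) > max_share


scores = {"ABAP": [0.9, 0.2, 0.1, 0.8], "BW": [0.9, 0.7, 0.6]}
print(low_quality_risk(scores, cutoff=0.5, max_share=0.3))  # (['ABAP'], False)
```

The two levels can disagree: one forum may breach the limit while the community as a whole does not.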
R5- Risk of duplicate topics
This risk relates to the risk discussed above, but refers specifically to the risk
of a high level of duplicate topics according to a syntactic comparison.
Minimising this risk will also reduce the risk of poor quality content. In
turn, this may increase members' satisfaction, as they are not exposed to
redundant and poor quality content.
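As a sketch of one possible syntactic comparison, the example below measures word overlap (Jaccard similarity) between thread titles and reports the share of threads that have at least one near-duplicate. The tokenisation, the 0.6 similarity cutoff and the titles are assumptions, and the pairwise comparison is quadratic, so a large forum would need an indexing technique such as MinHash.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def duplicate_pairs(titles: Dict[str, str],
                    min_similarity: float = 0.6) -> List[Tuple[str, str]]:
    toks = {tid: tokens(text) for tid, text in titles.items()}
    return [(x, y) for x, y in combinations(sorted(toks), 2)
            if jaccard(toks[x], toks[y]) >= min_similarity]


def duplicate_share(titles: Dict[str, str], min_similarity: float = 0.6) -> float:
    """Share of threads that have at least one near-duplicate title."""
    dup_ids = {tid for pair in duplicate_pairs(titles, min_similarity) for tid in pair}
    return len(dup_ids) / len(titles) if titles else 0.0


titles = {"t1": "BAPI error in sales order",
          "t2": "Sales order BAPI error",
          "t3": "How to configure workflow?"}
print(duplicate_pairs(titles))            # [('t1', 't2')]
print(round(duplicate_share(titles), 3))  # 0.667
```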
O2- Opportunity of steering the community to specific topics
One of the objectives for SAP is to provide a platform that allows the
exchange of knowledge on SAP products. SAP, as an owner of this business
community, aims to cover specific topics in the community. These topics
change over time as the portfolio of promoted and supported products changes.
Since the user activity does not necessarily reflect these changes in time, the
value of the community can be increased by steering the activity of all users
and, therefore, the generation of content to topics that are desired by the
community owner at any given point in time. The community owner can, for
example, promote discussions about new products or emphasise the creativity
of community members for developing new trends. This can be achieved by
tweaking the community policy for suppressing creation of undesired content
or by increasing the motivation for creation of desired content.
R6- Risk of point scammers
The point system in the SAP communities encourages users to contribute and
offer support to other users who use the forums to ask questions. However,
there is a risk that users attempt to exploit this system in a bid to achieve
expert recognition. The risky event in this case would be a user changing
state to 'point scammer'.
One example of a point scammer is a user who trawls the forums for very
simple questions and often just picks up a lot of 2-point awards (see
footnote 4) instead of fully answering people's questions. A more elaborate
approach to detecting point scammers is to (1) determine users who have more
than a certain number of forum posts (the threshold needs to be investigated)
but many more posts than points (these can be referred to as scam enablers).
Such users can be assumed to be users posting un-researched questions, as
mentioned in the risk above. (2) Identify which users predominantly respond
to questions asked by the scam enablers.

4 When a user asks a question by starting a new thread, they can give 2 points to an unlimited
number of users who provide somewhat useful answers. 6 points can be awarded to up to two
users who provide very useful answers. 10 points can be awarded to only one user who
solves the problem.


This is an important risk to consider to maintain the value of the point system
and to avoid poor quality content.
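The two-step heuristic described above can be sketched as follows. The thresholds (minimum post count, posts-to-points ratio and share of replies) are hypothetical placeholders for the values that still need investigation, and the input formats and user names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def scam_enablers(posts: Dict[str, int], points: Dict[str, int],
                  min_posts: int, ratio: float) -> Set[str]:
    """Step 1: users with more than `min_posts` posts and far more posts than points."""
    return {u for u, n in posts.items()
            if n > min_posts and n > ratio * points.get(u, 0)}


def suspected_scammers(replies: Iterable[Tuple[str, str]], enablers: Set[str],
                       min_share: float) -> List[str]:
    """Step 2: responders whose replies go predominantly to enablers' questions.

    replies: (responder, question_author) pairs.
    """
    pairs = list(replies)
    total = Counter(r for r, _ in pairs)
    to_enablers = Counter(r for r, author in pairs if author in enablers)
    return sorted(r for r in total if to_enablers[r] / total[r] > min_share)


posts = {"q1": 120, "q2": 80, "expert": 300}
points = {"q1": 4, "q2": 60, "expert": 900}
enablers = scam_enablers(posts, points, min_posts=50, ratio=5)
replies = [("s", "q1"), ("s", "q1"), ("s", "q2"),
           ("h", "q2"), ("h", "q1"), ("h", "x"), ("h", "y")]
print(enablers)                                              # {'q1'}
print(suspected_scammers(replies, enablers, min_share=0.6))  # ['s']
```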
R7- Risk of poor response times
Poor response times were suggested in the questionnaire both as a risk and
as a means of measuring the health of the community, which is important to
consider in the SAP support forums. This risk can be addressed both at the
individual level and community level. The risky event for the former would be
that the time it takes a user to get replies to their questions exceeds a certain
threshold. At a community level, the average response times for users or
threads could be considered.
As with the previous risks, thresholds for the risky events would need to be
investigated. Moreover, this risk could be linked to the risk of users leaving,
and such correlations would be interesting to investigate.
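Both levels of the risky event can be sketched from thread timestamps, assuming times in hours and a list of reply times per thread. Unanswered threads are counted with the time elapsed so far, which is a lower bound on their eventual response time; the thread data and the 24-hour threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Threads = Dict[str, Tuple[float, List[float]]]  # thread -> (asked_at, reply times), hours


def response_times(threads: Threads, now: float) -> Dict[str, float]:
    """Time to first reply; unanswered threads use the time elapsed until `now`."""
    return {t: (min(replies) if replies else now) - asked
            for t, (asked, replies) in threads.items()}


def slow_threads(threads: Threads, now: float, max_hours: float) -> List[str]:
    """Individual level: questions whose first reply took longer than max_hours."""
    return sorted(t for t, dt in response_times(threads, now).items() if dt > max_hours)


def community_mean_response(threads: Threads, now: float) -> float:
    """Community level: mean time to first reply across threads."""
    return mean(list(response_times(threads, now).values()))


threads = {"t1": (0.0, [2.0, 5.0]), "t2": (1.0, [30.0]), "t3": (10.0, [])}
print(slow_threads(threads, now=40.0, max_hours=24.0))  # ['t2', 't3']
print(round(community_mean_response(threads, now=40.0), 2))
```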
8.1.3. Polecat
R1- Risk of community becoming inactive
Unlike the IBM and SAP scenarios, TiddlyWiki is a single community. The risk,
therefore, is that the entire community becomes inactive. There are a number
of external factors that may cause this, but these are not easily measured
(e.g. the development of superior technology that causes the community to no
longer pursue developing TiddlyWiki software). Nonetheless, a number of
metrics derived from internal sources may prove to be indicators of risk and
predict its onset for the entire community, in a similar fashion to the IBM
and SAP versions of this risk.
R2- Risk of users leaving the community
Users leaving the community can represent a risk for the TiddlyWiki project,
but this depends on the particular individual. Because the community exists to
create output, the risk of users leaving is closely tied to their input. Therefore,
it is important to measure the quality of the content they generate and their
behaviour in the community. For example, this could be measured partly by
users' structural and behavioural indicators. On the flip-side, if users who slow
or impede the community output leave the community, this could be
considered an opportunity since it has a positive impact.
R4- Risk of wrong or poor quality content
TiddlyWiki is at risk from poor quality content, but, as in the case of users
leaving the community, this is coupled to the impact on output (both in terms
of software, and in terms of aiding this output).
R3- Risk of undesirable user role compositions
Similar to the IBM case, this is an example of a risk that lends itself to a
stateful definition of a risky event, as well as one based on thresholds. In the
case of thresholds, this relies upon identifying which combination of
user types constitutes a healthy community in the context of TiddlyWiki, and
measuring the deviation from this for any given timeframe.
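Measuring the deviation from a healthy composition requires a distance between role distributions. A minimal sketch, assuming the target composition is given as role shares and using total variation distance (one choice among several, such as KL divergence); the roles, shares and tolerance are hypothetical, since what constitutes a healthy TiddlyWiki composition is still to be identified.

```python
from typing import Dict


def composition_deviation(observed: Dict[str, float], target: Dict[str, float]) -> float:
    """Total variation distance between two role distributions (0 = identical, 1 = disjoint)."""
    roles = set(observed) | set(target)
    return 0.5 * sum(abs(observed.get(r, 0.0) - target.get(r, 0.0)) for r in roles)


def unhealthy(observed: Dict[str, float], target: Dict[str, float],
              tolerance: float) -> bool:
    """Threshold-based event: the composition has drifted too far from the target."""
    return composition_deviation(observed, target) > tolerance


healthy = {"core_developer": 0.1, "contributor": 0.3, "lurker": 0.6}
this_month = {"core_developer": 0.05, "contributor": 0.15, "lurker": 0.8}
print(round(composition_deviation(this_month, healthy), 3))  # 0.2
print(unhealthy(this_month, healthy, tolerance=0.15))        # True
```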
O2- Opportunity of gaining contributors
Gaining contributors is an opportunity, as in the case of SAP and IBM,
providing that these contributors have a positive impact on the community
output. High numbers of contributors, in themselves, are no indication of
success.
O3- Opportunity of increasing the number of moderators
As discussed in Section 3.2.8, increasing the number of moderators in a
community can be an opportunity as they will be able to manage the
community better. This will increase the chances of community members
being satisfied as they may experience a good quality of user generated
content, and therefore improve their experience.
This opportunity is naturally connected to a risk of moderators being
overloaded or under-performing. If such a risk is prominent, the opportunity to
identify and recruit new moderators becomes greater. However, there is a
trade-off to consider in this case, because too much moderation can stifle
creativity and reduce variety in the TiddlyWiki community considered in this
use case. This is an open source community that facilitates self-moderation in
its policy.
R8- Risk that a user has negative sentiment about a topic
Depending on the respective community, the risk of users having a negative
sentiment about a topic can have different implications. In some cases, it can
even be an opportunity. For example, having some proportion of users with a
negative sentiment about a topic can spark interesting discussions, leading to
more activity in the community. However, it can also be a risk that is
connected to users becoming inactive or leaving the community. Such a
dependency requires further research to confirm and model.
In the case of TiddlyWiki, an example might be the sentiment around new
features suggested or implemented in the software. Typically, in a software
community, this has the risk of causing people to lose heart, and
consequently, productivity. It may also mean that other developers are
developing software in which they no longer believe.
8.1.4. Addressing the risks and opportunities
Several partners in ROBUST, across the different work packages, are
conducting research to address the risks and opportunities identified above for
the three use case partners. Table 15 gives a summary of the different risks
and opportunities, indicating the partners that are addressing them via
components for analysis, clustering, assessment and forecasting.



Table 15: Mapping between the risks and opportunities and the partners providing
components for assessment and forecasting
Risk/Opportunity ID    Components provided by partners
R1                     NUIG, OU, IT Inn, CORMSIS, UKoB
R2                     NUIG, OU, IT Inn
R3                     NUIG, OU
R4                     TEMIS, OU
R5                     TEMIS
R6                     IT Inn, UKoB
R7                     IT Inn
R8                     IT Inn, TEMIS
O1                     IT Inn
O2                     UKoB
O3                     IT Inn, TEMIS
O4                     UKoB
8.2. Agent based simulation
To address the risks and opportunities discussed in the context of each of the
three use cases in ROBUST, this section discusses one of the techniques
being considered in WP1: agent-based simulation.
The goals of using agent-based simulations in a risk management system are
discussed below in Section 8.2.1. The simulation process and model
architecture are discussed in Sections 8.2.2 and 8.2.3. A design example is
provided in Appendix L.
8.2.1. Simulation goals
The main goals for the simulation model are as follows:
1. Generate risk/opportunity forecasts.
2. Identify the impact a risk or opportunity can have on the objectives of a
community.
3. Validate the potential risk and opportunity drivers that may be derived
initially from statistical analysis of the community history data.
4. Explore the state space of possible risk and opportunity drivers.
Since a risk is defined as an event that occurs either when a metric reaches
a certain threshold or when a user or community changes state, it can be
monitored and forecast within an agent model. By running a simulation
multiple times, the probability of a risk/opportunity event can be estimated
as the fraction of runs in which the event occurs.
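As a minimal sketch of the repeated-run idea, assuming a hypothetical toy model in which a user's weekly activity is scaled by a random factor each week (this is not the ROBUST agent model), the probability of a risk event "activity falls below a threshold within the horizon" can be estimated as the share of runs in which it happens. The function names and parameters below are illustrative only.

```python
import random


def simulate_activity(initial, weeks, rng):
    """Hypothetical toy model: weekly activity is scaled by a random factor in [0.8, 1.15]."""
    trajectory = [initial]
    for _ in range(weeks):
        trajectory.append(trajectory[-1] * rng.uniform(0.8, 1.15))
    return trajectory


def risk_probability(initial, weeks, threshold, runs=10_000, seed=42):
    """Estimate P(activity falls below `threshold` within `weeks`) as the share of runs hitting it."""
    rng = random.Random(seed)  # fixed seed so repeated calls are reproducible
    hits = 0
    for _ in range(runs):
        # The risk event: the metric crosses the threshold at any point in the horizon.
        if min(simulate_activity(initial, weeks, rng)) < threshold:
            hits += 1
    return hits / runs


p = risk_probability(initial=100.0, weeks=4, threshold=70.0)
print(f"Estimated probability of the risk event: {p:.3f}")
```

The estimate's precision grows with the number of runs, so in practice the run count would be chosen to trade simulation cost against the confidence required by the community administrator.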
Information obtained from a forecast (goal 1) would point the community
administrator to a certain risk or opportunity as a possible event that will take
place in the near future. However, this information is insufficient to identify and
quantify the consequences of the detected risk (or opportunity). For this, a
quantitative estimate of the consequences of the detected event is needed
(goal 2). For example, the model could detect that a particular community user
would drop their activity by 90% within a month, which appears to be a
significant risk. However, the impact estimation generated by the model may
suggest that this perturbation will have no effect on the global community
performance, measured, for example, by the mean response time or the
average number of answered threads. On the other hand, another community
user could be forecast to have a 10% drop in activity, which may seem like a
small risk. However, this user might be a key contributor in the community, and
impact analysis may reveal that the mean thread response time would increase
by 20% and the average number of answered threads would drop by 10%. This
foreseen impact may therefore require intervention by the community
administrator before it emerges.
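The impact analysis described above amounts to comparing community-level metrics between a baseline run and a perturbed run. A minimal sketch, assuming each scenario has been simulated several times and that the metric names and numbers below are hypothetical values chosen to mirror the key-contributor example:

```python
from statistics import mean


def relative_impact(baseline_runs, scenario_runs):
    """Percentage change in the mean of each metric, perturbed scenario vs. baseline.

    Both arguments map a metric name to the values it took over repeated simulation runs.
    """
    impact = {}
    for metric, values in baseline_runs.items():
        base = mean(values)
        impact[metric] = 100.0 * (mean(scenario_runs[metric]) - base) / base
    return impact


# Illustrative numbers only: mean thread response time (hours) and answered threads per week.
baseline = {"mean_response_time_h": [9.5, 10.5, 10.0], "answered_threads": [195, 205, 200]}
key_contributor_drop = {"mean_response_time_h": [12.0, 11.5, 12.5], "answered_threads": [178, 182, 180]}

impact = relative_impact(baseline, key_contributor_drop)
for metric, change in impact.items():
    print(f"{metric}: {change:+.1f}%")
```

Averaging over repeated runs, rather than comparing single runs, keeps the comparison from being dominated by the stochastic variation of an individual simulation.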
A typical approach to designing an agent model is to first perform a statistical
analysis of community data to identify features that may correlate with what
the model should simulate. This could be, for example, determining
correlations between users' activity and their collaboration statistics or the
arrival of new threads. The aim is to identify 'drivers' that will shape the
specification of the model; these can be likened to hypotheses that are then
validated on historical data. However, since the hypotheses are derived from
historical data, they may hold only to a limited extent. In this case, validating
the detected drivers with the agent-based simulation provides a
complementary technique that supports the understanding of the community
dynamics in general and of the main drivers of risks and opportunities within it.
Rather than only validating the drivers detected during statistical analysis of
historical community data, the model can be applied in a more exploratory
way, where new hypotheses about potential drivers are formulated and their
role in influencing risks and opportunities is evaluated through simulation. For
example, it may be suggested that the number of reward points obtained acts
as a main driver of contributor agent activity. Such a hypothesis can be
encapsulated within the simulation model as one of the behavioural rules that
govern the agents modelling individual contributor users. The model can then
be initialised from historical data and its output compared with the actual
historical outcome to test the validity of the potential new risk/opportunity
driver.
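A minimal sketch of this validation step, assuming hypothetical weekly data for one contributor and two candidate rules: replies driven by last week's reward points, and a null rule of constant activity. The rule whose simulated output tracks the history more closely (lower root-mean-square error) is the better-supported driver. The proportionality constant `k` is fixed here purely for illustration.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between a simulated and an observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))


# Hypothetical weekly history for one contributor.
points = [10, 14, 9, 20, 25, 18, 30]   # reward points earned per week
replies = [6, 5, 7, 5, 10, 12, 9]      # replies posted per week
observed = replies[1:]                  # both rules predict weeks 2..7


def reward_driven(points, replies, k=0.5):
    # Candidate driver: replies in week t proportional to points earned in week t-1.
    # k is fixed for illustration; in practice it would be fitted on a training window.
    return [k * p for p in points[:-1]]


def constant_activity(points, replies):
    # Null hypothesis: activity stays at the historical mean regardless of rewards.
    level = sum(replies) / len(replies)
    return [level] * (len(points) - 1)


scores = {rule.__name__: rmse(rule(points, replies), observed)
          for rule in (reward_driven, constant_activity)}
print(scores)  # the rule with the lower error is better supported by this history
```

A full validation would run the agent model with each rule and compare community-level outputs, but the comparison logic is the same.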
Achieving these goals within the scope of internet communities is not trivial, as
the functioning of such systems more closely resembles the operation of
complex ecosystems than the deterministic behaviour of machines [32].
Here, individual community users, analogously to species in an ecosystem, are
driven by their own goals and agendas (often competing or collaborating with
each other) [35], whereas their collective behaviour in the form of a
community can be thought of as a complex and dynamically changing network
of relations and dependencies among its members.
As in real ecosystems, it is this network of dynamic relations and diversity of
individuals that makes it difficult to understand and manage the whole system
so that an efficient and healthy state is maintained over its lifetime. This is
largely because the above-mentioned community properties break the
assumption that small, local changes to the system (for example, adjusting
the behaviour of a single user) have proportionally small effects at the global
system level. On the contrary, it has been observed that within such systems
small fluctuations may have unexpected and large consequences for the
stability of the whole system [25].
This non-linear cause-and-effect relationship makes it difficult, if not
impossible, to understand the mechanisms responsible for efficient
community operation through standard analysis/modelling approaches, as
they often neglect, or are not expressive enough to capture, this important
property of complex systems [10]. To address this issue, agent-based
modelling has been proposed as a simulation technique capable of
representing individual user diversity as well as non-linear causality.
Consequently, both the important differences between individual community
members and their collective impact on the behaviour of the whole
community are explicitly represented within the model.
8.2.2. Model architecture
Agent-based simulation is performed in both WP1 and WP4 [84], and a
common architecture has been developed. An outcome of this is a software
package that provides generic classes needed to create specific agent-based
simulation models.
In addition, a Simplatform library was developed by IT Innovation (as a
result of internal collaboration between the ROBUST and PrestoPRIME [75]
projects), which is freely available to ROBUST partners. This component
provides simulation management functionality that can be integrated with
event-driven simulation models in order to speed up and ease the model
development and management process. More details are given in Appendix
K.
Figure 27 below depicts the components that form and interact with the
simulation environment. The simulation environment consists of the simulation
model integrated with the simulation platform library. A Community Metrics
Engine module is used by the model during its initialisation and provides the
information necessary to bootstrap agents according to the historical
community state; this is described further in the following section. User
interaction with the simulation model is provided through a visual user
interface supplied by a Visualisation Engine module. This allows users to
experiment with model configurations and explore the outcome of changing
configurable parameters, for example, changing the activity levels of certain
users or increasing the amount of data generated. The visualisation is also an
important means of feedback that helps the user understand the impact of
certain events, complementing quantitative metrics.

Figure 27 : Simulation model architecture.

8.2.3. Simulation process
When an agent model is executed, it simulates the activity of real community
users, who are modelled as software agents. Each agent models the behaviour
of a specific community user and is capable of executing certain actions.
Consider, for example, the following actions identified from the SAP SCN
community:
1. Create questions in the form of forum threads. This models the activity
of community users that create threads where each thread is an
instance of a problem (question) to be solved (answered).
2. Create replies by providing answers to threads. This models the
behaviour of community users who help resolve questions and gain
points in the process (referred to as community contributors). Here,
each agent exhibits a different level of expertise (quality of reply)
based on the community user it models.
3. Mark answers by assigning points to the replies produced by
community users. This models the behaviour of thread creators who,
once replies are produced for threads they own, judge the quality of the
answers by assigning points to the reply owners and decide whether the
thread is answered.
It is assumed that during the simulation each agent is triggered to perform
each of the above actions with a frequency analogous to the behaviour of the
real community user it models. Consequently, the diversity between
community users is maintained and represented within the model at the level
of individual agents. To achieve this, the model requires historical community
data to bootstrap the initial state of the community. This allows us to recreate
the dynamics of the real community from a selected point in its history and to
simulate its further evolution in time.
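A minimal sketch of how per-user action frequencies could be bootstrapped from history, assuming hypothetical 90-day activity counts and a discrete daily time step in which each action occurs with a probability equal to the user's historical daily rate. The class and field names are illustrative, and capping each action at once per day is a simplification.

```python
import random
from dataclasses import dataclass


@dataclass
class CommunityAgent:
    """Hypothetical agent whose daily action probabilities come from a user's history."""
    user_id: str
    p_thread: float   # probability of creating a thread on a given day
    p_reply: float    # probability of replying to a thread on a given day
    p_mark: float     # probability of marking an answer on a given day

    @classmethod
    def from_history(cls, user_id, threads, replies, marks, days):
        # Simplification: at most one action of each type per day, so rates are capped at 1.
        def rate(count):
            return min(count / days, 1.0)
        return cls(user_id, rate(threads), rate(replies), rate(marks))

    def step(self, rng):
        """Return the actions this agent performs during one simulated day."""
        actions = []
        if rng.random() < self.p_thread:
            actions.append("create_thread")
        if rng.random() < self.p_reply:
            actions.append("reply")
        if rng.random() < self.p_mark:
            actions.append("mark_answer")
        return actions


# Hypothetical 90-day histories for two users with very different profiles.
agents = [
    CommunityAgent.from_history("asker", threads=45, replies=3, marks=30, days=90),
    CommunityAgent.from_history("expert", threads=2, replies=81, marks=1, days=90),
]
rng = random.Random(7)
counts = {a.user_id: {"create_thread": 0, "reply": 0, "mark_answer": 0} for a in agents}
for _day in range(900):
    for agent in agents:
        for action in agent.step(rng):
            counts[agent.user_id][action] += 1
print(counts)
```

Because each agent keeps its own rates, the diversity between users in the historical data carries over directly into the simulated community.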
During the simulation, the dynamics of the community emerge from
interactions among agents that, for example, create threads, respond to them
and assign points to the answers provided. Since one of the main goals of the
model is to predict the change in the activity of community users who respond
to threads (henceforth called contributor agents), particular attention is paid
to modelling their behaviour realistically. In this context, it is assumed that the
activity level of each such agent varies according to the behavioural rules the
agent is provided with. A behavioural rule represents a reactive response of
the agent (for example, an internal adjustment of the frequency at which it
produces replies) that is derived from correlation analysis of historical
community data.
An example of a behavioural rule that a contributor agent could rely on is: if
the number of newly created threads increases, then the activity level should
increase proportionally. Whilst the identification of behavioural rules is
performed outside the scope of the simulation model (during correlation
analysis of historical data), the model is then used to evaluate the accuracy of
the statistical analysis (and correlation hypotheses) as well as to produce
forecasts of contributor agent activity change and its impact on community
health.
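The example rule above can be sketched as a reply-rate update. The helper name and the numbers are hypothetical, and a real rule would come from the correlation analysis rather than being assumed:

```python
def adjusted_reply_rate(base_rate, baseline_new_threads, current_new_threads):
    """Proportional behavioural rule (hypothetical helper): when the number of newly
    created threads changes by a factor f relative to the baseline, the contributor's
    reply rate changes by the same factor f."""
    if baseline_new_threads <= 0:
        return base_rate  # no reference level to compare against: keep the rate unchanged
    return base_rate * current_new_threads / baseline_new_threads


# A contributor who replied 4 times per day while 50 new threads per day were created.
print(adjusted_reply_rate(4.0, 50, 75))  # 50% more threads -> 6.0
print(adjusted_reply_rate(4.0, 50, 25))  # half as many threads -> 2.0
```

In a simulation, this update would be applied at each time step to every contributor agent, using the number of threads created by the other agents in the previous step.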
8.3. Compartment model and Gibbs sampler
In order to quantify risks and opportunities at the user level (and later on at
the community level), we need to estimate the probabilities that certain events
occur. Each such event means that one or more users move from one
compartment to another. For a given risk, denote by p(i) the probability of
user i being at risk. A typical example might be
A consecutive drop in the activity of user i by 20% over the next two months.
or, e.g.,
User i completely stops his/her activity within the next 6 months.
Accordingly, we would consider, in the first case, a compartment of users who
experience a corresponding drop in activity, and in the second case a
compartment of all users that do not show any activity at all. These
probabilities are computed in a learning phase, in which user-level data on
past actions are processed before the probabilities are used to provide
forecasts of future behaviour at the community level.
Suppose we are given a set of past data, i.e. some knowledge of past
events (risks) occurring. By recording the occurrence of a risk in the past as 1
and its non-occurrence as 0 for all users, a set of observed binary responses, y,
is created; following this, a suitable associated set of explanatory covariates,
x, must be chosen. Assuming that these binary responses follow a Bernoulli
distribution, the probability of the risk occurring for user i, p(i), is then the
probability of 'success' of the corresponding observed binary response.
Therefore, a process is needed that is capable of handling and estimating
binary responses, which is what led to the choice of the Gibbs Sampler (GS),
following a study of the work of Albert and Chib [2]. The GS is discussed
further in Section 8.3.4.
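A minimal sketch of the labelling step, assuming hypothetical per-user activity counts for two consecutive periods and the risk definition used for Figure 32 (a drop of 20% or more in user activity from one time period to the next). The choice of covariates is purely illustrative.

```python
def label_risk(previous, current, drop=0.20):
    """Binary response y: 1 if activity fell by at least `drop` (a fraction), else 0."""
    if previous == 0:
        return 0  # assumption: a user with no prior activity cannot experience a drop
    return 1 if (previous - current) / previous >= drop else 0


# Hypothetical posts per user in two consecutive periods: (previous, current).
history = {"u1": (50, 38), "u2": (20, 19), "u3": (10, 0), "u4": (0, 5)}
y = {user: label_risk(prev, cur) for user, (prev, cur) in history.items()}
# Illustrative covariates x_i: an intercept term and the previous-period activity.
x = {user: [1.0, float(prev)] for user, (prev, _) in history.items()}
print(y)  # {'u1': 1, 'u2': 0, 'u3': 1, 'u4': 0}
```

How users with no prior activity are treated is a modelling decision. Here they are labelled 0, but they could instead be excluded from the learning set.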
Figure 28, below, illustrates how the compartment model and GS can be used
to detect risks or opportunities, specifically depicting the information
consumed by the compartment model that stems from the GS: the GS
computes probabilities of events occurring at the user level, i.e. probabilities
that a user moves from one compartment to another one. From this, migration
rates between compartments at the community level are computed, which are
then fed into the compartment model.

Figure 28: Using Data flow diagram of the compartment model and Gibbs sampler.

Estimation of compartment sizes is discussed below in Section 8.3.1, followed
by an example of how migration rates between compartments can be
calculated without the GS in Section 8.3.2. Forecasting is discussed with an
example in Section 8.3.3.

8.3.1. Estimating compartment sizes
Applying the compartment model in ROBUST, the following general
assumptions are made:

• Members within a compartment are indistinguishable from each other.
(Otherwise the given model is lacking in realism, in the sense that such
a compartment should be replaced by several others.)
• The exchange or migration rate between different compartments
depends linearly on the number of individuals in these compartments. (This
assumption can easily be dropped, resulting in a nonlinear system of
ordinary differential equations; see below. In what follows, we
will simplify the exposition and the analysis by keeping the assumption
as it is.)
• The number of members of each compartment is known at a certain
time t0 to a reasonable accuracy. (This is not a strong assumption, as
long as what constitutes a compartment is easily quantifiable.)

Suppose we are given n ≥ 1 compartments of the system to be simulated. At
time t ≥ t0, the total number of members (the mass) of compartment k is
denoted by m_k(t), i.e. we consider n functions m_k : [t0, +∞) → ℝ
(k = 1, ..., n), which we have to compute or approximate. (Note that,
according to the assumptions, we know m_k(t0) (k = 1, ..., n).) At any given
time t > t0, the change of mass in compartment k, dm_k(t)/dt, is the
difference between the inflow from and the outflow to other compartments.
The inflow from compartment no. j to compartment no. k is proportional to the
mass inside compartment j, while the outflow out of compartment no. k to
compartment no. i is proportional to the mass in compartment no. k. Denote by
λ_{j,k}(t) the time-dependent proportionality constant for the flow from
compartment no. j to compartment no. k, and by λ_{k,i}(t) the time-dependent
proportionality constant for the flow from compartment no. k to compartment
no. i. These parameters are dimensionless, and they will later on be estimated
by the Gibbs sampler. If λ_{j,k}(t) = 0 for all t, then there is no edge between
the two nodes j and k of the graph describing the overall model. As a
consequence of the discussion above, the following ordinary differential
equation holds for the m_k:

$$\dot{m}_k(t) = \sum_{j \neq k} \lambda_{j,k}(t)\, m_j(t) - \sum_{i \neq k} \lambda_{k,i}(t)\, m_k(t), \qquad k = 1, \dots, n.$$

The dot denotes differentiation with respect to t. The resulting
mathematical model is a system of ordinary differential equations, readily
solvable by a variety of numerical methods as soon as estimates for the
starting conditions m_k(t0) (k = 1, ..., n) and the flows λ_{j,k}(t) are
available.
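As a minimal sketch of one such numerical method, the system can be integrated with the explicit (forward) Euler scheme. The `rates(t)` interface returning the matrix of λ_{j,k}(t) is a hypothetical choice made for this illustration; a production implementation would more likely use an adaptive solver such as SciPy's `solve_ivp`.

```python
import math


def euler_compartments(m0, rates, t0, t_end, h):
    """Forward-Euler integration of
        dm_k/dt = sum_{j != k} lam[j][k](t) * m_j - sum_{i != k} lam[k][i](t) * m_k.
    `rates(t)` returns an n x n nested list with lam[j][k] the flow constant from
    compartment j to compartment k; diagonal entries are ignored."""
    n = len(m0)
    m = list(m0)
    t = t0
    for _ in range(round((t_end - t0) / h)):
        lam = rates(t)
        dm = [0.0] * n
        for j in range(n):
            for k in range(n):
                if j != k:
                    flow = lam[j][k] * m[j]  # mass moving from j to k per unit time
                    dm[k] += flow            # inflow term for compartment k
                    dm[j] -= flow            # matching outflow term for compartment j
        m = [m[k] + h * dm[k] for k in range(n)]
        t += h
    return m


def constant_rates(t):
    # Two compartments; constant flow constant 0.2 from compartment 1 to compartment 2.
    return [[0.0, 0.2], [0.0, 0.0]]


m = euler_compartments([100.0, 0.0], constant_rates, t0=0.0, t_end=5.0, h=0.001)
exact = 100.0 * math.exp(-0.2 * 5.0)  # analytic solution m_1(t) = m_1(0) exp(-0.2 t)
print(m, exact)
```

Because every outflow from one compartment is booked as an inflow to another, the total mass is conserved at each step, which is a useful sanity check on estimated rates.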
8.3.2. Estimating migration rates without Gibbs Sampler
The Gibbs Sampler relies on the availability of potentially large amounts of
past data. Also, sampling involves a learning phase that can be
computationally expensive. In this subsection, we present an alternative
method for estimating migration rates with linear runtime behaviour. The
drawback of this method is that it does not produce confidence intervals for
the migration rates computed.
Assume we are considering a community with 10 users, distributed between
three compartments, e.g. 'Active', 'Inactive', and 'Newbie'. Suppose we do
not use the Gibbs sampler, but instead compute estimates for migration rates
between different compartments directly from past allocations of users to
compartments. For this, consider, e.g., the two most recent times T-1 and T.
For these, the allocation of users to compartments was as depicted in Table 16
below:
Table 16: Example of users allocated to compartments at two time periods, T and T-1.
User   Compartment at T-1   Compartment at T
1      1                    2
2      1                    2
3      2                    2
4      2                    3
5      1                    1
6      2                    3
7      2                    2
8      3                    2
9      1                    2
10     2                    2

Accordingly, the number of users in compartment 1 at time T-1 is mass(T-1,1)
= 4, while mass(T,1)=1; the number of users in compartment 3 at time T is
mass(T, 3) = 2, etc.
From the table, we can likewise compute migration rates lambda(T, c, k)
between compartment no. c and compartment no. k. For instance,
lambda(T, 1, 2) = 0.75, because 75% of all users in compartment 1 at time T-1
move to compartment no. 2 in the time period between T-1 and T. Likewise,
lambda(T, 1, 3) = 0, etc. The table below gives all migration rates, where the
diagonal entries lambda(T, c, c) are the negative of the total emigration rate
out of compartment c, so that each row sums to zero.

Table 17: Example migration rates, corresponding to data in Table 16 (rows: compartment c; columns: compartment k).
lambda(T, c, k)   k = 1   k = 2   k = 3
c = 1             -0.75   0.75    0
c = 2             0       -0.4    0.4
c = 3             0       1       -1
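The counting procedure above can be sketched in a few lines. This is a minimal illustration, assuming allocations are given as one compartment number per user for each of the two time points; it reproduces Table 17 from the data in Table 16.

```python
from collections import Counter


def migration_rates(before, after, n_compartments):
    """lambda(T, c, k): share of users in compartment c at T-1 who are in compartment k at T.
    The diagonal holds minus the share that left c, so every row sums to zero."""
    origin = Counter(before)              # mass(T-1, c)
    moves = Counter(zip(before, after))   # number of users moving c -> k
    lam = [[0.0] * n_compartments for _ in range(n_compartments)]
    for c in range(1, n_compartments + 1):
        if origin[c] == 0:
            continue  # empty compartment at T-1: no information, leave the row at zero
        for k in range(1, n_compartments + 1):
            if k != c:
                lam[c - 1][k - 1] = moves[(c, k)] / origin[c]
        lam[c - 1][c - 1] = -sum(lam[c - 1][k] for k in range(n_compartments) if k != c - 1)
    return lam


# Compartment allocations of users 1..10 taken from Table 16.
at_t_minus_1 = [1, 1, 2, 2, 1, 2, 2, 3, 1, 2]
at_t = [2, 2, 2, 3, 1, 3, 2, 2, 2, 2]
lam = migration_rates(at_t_minus_1, at_t, 3)
for row in lam:
    print(row)
# [-0.75, 0.75, 0.0]
# [0.0, -0.4, 0.4]
# [0.0, 1.0, -1.0]
```

A single pass over the users suffices, which is where the linear runtime mentioned above comes from.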

Note that, while it is possible to estimate migration rates in this rather simple
way, it is difficult to ascertain the quality of these estimates: no confidence
intervals are produced, nor are any other measures of robustness at hand.
Also, the migration rates estimated above are only valid as a forecast for the
time period from time T to time T+1, while the actual compartment model will
need further forecasts of migration rates for times T+2, T+3, etc. For these
reasons, we expect migration rates estimated with the Gibbs sampler to
provide higher-quality forecasts.
8.3.3. Computing forecasts of compartment sizes
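As before, consider a community with three compartments, 'Active', 'Inactive'
and 'Newbie', with a total of 10 Newbies, 3 Active and 3 Inactive users at time T.
Migration rate forecasts for the future are specified as in Figure 29, below. In
this example, in each time period 30% of Newbies become Active and 10%
become Inactive, 20% of Active users become Inactive, and 10% of Inactive
users become Active.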

Figure 29: Migration rate forecasts.
The compartment model now forecasts sizes of compartments at time T+1 as
follows:
Newbie: mass(T+1, Newbie) = mass(T, Newbie) - 0.3*mass(T, Newbie) - 0.1*mass(T, Newbie) = 10 - 0.1*10 - 0.3*10 = 6.
Active: mass(T+1, Active) = mass(T, Active) + 0.3*mass(T, Newbie) + 0.1*mass(T, Inactive) - 0.2*mass(T, Active) = 3 + 0.3*10 + 0.1*3 - 0.2*3 = 5.7.
Etc.

Accordingly, the output of the compartment model is a forecast of
compartment sizes (i.e. the number of users per compartment) for a certain
number of future times. The table below lists the forecasts for times T+1 to
T+10:
Table 18: Compartment model forecasts for times T+1 to T+10.
Time   Newbie     Active     Inactive
T      10         3          3
T+1    6          5.7        4.3
T+2    3.6        6.79       5.61
T+3    2.16       7.073      6.767
T+4    1.296      6.9831     7.7209
T+5    0.7776     6.74737    8.47503
T+6    0.46656    6.478679   9.054761
T+7    0.279936   6.228387   9.491677
T+8    0.167962   6.015858   9.81618
T+9    0.100777   5.844693   10.05453
T+10   0.060466   5.711441   10.22809
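The worked example can be reproduced by repeatedly applying the same update. This is a minimal sketch, assuming the per-period migration rates read off the T+1 calculations above stay constant over time. Under that assumption it reproduces Table 18.

```python
def forecast(mass, rates, steps):
    """Discrete-time compartment update, as in the worked example above:
    mass(T+1, k) = mass(T, k) + sum_j rate(j -> k) * mass(T, j) - sum_i rate(k -> i) * mass(T, k)."""
    history = [dict(mass)]
    for _ in range(steps):
        current = history[-1]
        nxt = dict(current)
        for (src, dst), rate in rates.items():
            flow = rate * current[src]  # users migrating from src to dst in one period
            nxt[src] -= flow
            nxt[dst] += flow
        history.append(nxt)
    return history


# Per-period migration rates read off the T+1 calculations (assumed constant over time).
rates = {
    ("Newbie", "Active"): 0.3,
    ("Newbie", "Inactive"): 0.1,
    ("Active", "Inactive"): 0.2,
    ("Inactive", "Active"): 0.1,
}
table = forecast({"Newbie": 10.0, "Active": 3.0, "Inactive": 3.0}, rates, steps=10)
for t, row in enumerate(table):
    label = "T" if t == 0 else f"T+{t}"
    print(label, {name: round(size, 6) for name, size in row.items()})
```

All flows are computed from the sizes at the start of each period and applied together, so the order in which the rates are listed does not affect the result, and the total of 16 users is preserved at every step.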

8.3.4. Gibbs sampler
It is planned to use the Gibbs Sampler (GS) on the community level to
determine how many users of a particular type change their behaviour in a
pre-specified future time period.
The GS introduces the concept of 'latent' data, say Z, which are essentially
the continuous interpretation of y. For y(i) = 1, Z(i) is normally distributed with
truncation from the left at 0, and for y(i) = 0, Z(i) is normally distributed with
truncation from the right at 0 (where the mean of these normal distributions
depends on an elasticity variable vector β and the set of explanatory
covariates X, and the variance is 1). In this problem, y and X are known
whilst p, Z and β are unknown; however, p can easily be calculated once β
is known. It is therefore the calculation of Z and β that is difficult, and this is
what the GS does by iteratively simulating estimates of each. The result of
this iterative simulation is a distribution of β values, of which the mean can be
taken. This mean is then used along with the explanatory covariates X to
produce the probability estimates p, as p depends on β and X. Once these
probabilities p have been estimated, it is a small step to predict the binary
responses y.
A numeric example of the use of the GS is provided in Appendix M.
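The two alternating steps can be sketched as follows. This is a minimal illustration of the Albert and Chib probit sampler under the assumption of a flat prior on β, and it is not the implementation used in ROBUST. The function names are hypothetical, and the synthetic data stand in for real user covariates. Latent values are drawn from truncated normals by the inverse-CDF method, and the probabilities are p_i = Φ(x_i'β̄), where β̄ is the posterior mean.

```python
import math
import random
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def phi(x):
    """Standard normal CDF via erfc, which stays accurate far into the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _clip(u):
    # Numerical safeguard: NormalDist.inv_cdf requires 0 < u < 1.
    return min(max(u, 1e-300), 1.0 - 1e-12)


def sample_latent(mu, y, rng):
    """Draw Z ~ N(mu, 1) truncated to Z > 0 if y == 1, or to Z <= 0 if y == 0 (inverse CDF)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    if y == 1:
        # -(Z - mu) is N(0, 1) restricted to (-inf, mu); sample it, then flip the sign.
        return mu - _STD_NORMAL.inv_cdf(_clip(u * phi(mu)))
    # Z - mu is N(0, 1) restricted to (-inf, -mu].
    return mu + _STD_NORMAL.inv_cdf(_clip(u * phi(-mu)))


def gibbs_probit(X, y, iterations=600, burn_in=100, seed=0):
    """Albert-Chib Gibbs sampler for probit regression with a flat prior on beta.

    Returns the posterior mean of beta and the fitted probabilities p_i = Phi(x_i' beta).
    """
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    V = np.linalg.inv(X.T @ X)  # covariance of beta given Z under a flat prior
    V = (V + V.T) / 2.0         # remove round-off asymmetry before sampling
    beta = np.zeros(X.shape[1])
    draws = []
    for it in range(iterations):
        # Step 1: latent data Z given beta (truncated normals).
        mu = X @ beta
        Z = np.array([sample_latent(m, yi, rng) for m, yi in zip(mu, y)])
        # Step 2: beta given Z ~ N((X'X)^-1 X'Z, (X'X)^-1).
        beta = np_rng.multivariate_normal(V @ X.T @ Z, V)
        if it >= burn_in:
            draws.append(beta)
    beta_hat = np.mean(draws, axis=0)
    p = np.array([phi(v) for v in X @ beta_hat])
    return beta_hat, p


# Usage on synthetic data with a known coefficient vector (illustration only).
gen = np.random.default_rng(1)
x = gen.normal(size=400)
X = np.column_stack([np.ones_like(x), x])  # intercept + one covariate
true_beta = np.array([-0.5, 1.0])
y = (X @ true_beta + gen.normal(size=400) > 0).astype(int)  # probit data-generating process
beta_hat, p = gibbs_probit(X, y)
print("posterior mean of beta:", beta_hat)
```

The retained draws also give a posterior spread for β, which is the source of the confidence information that the simple counting estimator of Section 8.3.2 lacks.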
8.4. Other forecasting tools in the ROBUST project
As stated previously, several work packages in ROBUST are working on risks
and opportunities by, e.g., examining relationships between features that can
be used to detect them, identifying events and developing tools for detection
and forecasting. The methodologies described above, in Sections 8.2 and 8.3,
have been specifically developed to be as flexible as possible and to address
as many different risks and opportunities as possible (especially with respect
to forecasting). Below are examples of two approaches that focus on specific
risks, which are currently being researched by WP3 and WP5.
8.4.1. Churn analysis
Communities depend on maintaining a vibrant mix of users, thereby
demonstrating to outsiders and non-users the utility of entering the community
and the level of activity and interaction that takes place. For instance, in the
context of the SAP Community Network, should many users leave the
community, an outside user who wishes to have his/her question answered
regarding a specific technical problem might seek another community with
more users, thereby increasing the likelihood that he/she will get a response.
This notion of a user leaving a community is defined as 'churn', a key risk for
community operators. Collaborative work between WP3 and WP5 sought to
identify the behavioural traits that are associated with churners and
non-churners, thereby empowering community hosts with the means to detect
churn early and act accordingly to retain their members.
The details of this approach are described in D5.1 [49]. The approach is
currently under research, but the toolset that will subsequently be
implemented will function in two steps: first, building features to describe the
behaviour of community users, i.e. their participation, interaction, network
position, etc.; second, learning models that describe churning behaviour in
different communities, thereby allowing future predictions to be made as to
whether a user will churn or not based on their exhibited behaviour.
8.4.2. Activity prediction
One risk faced by community operators is a reduction in community activity.
Should activity diminish, the utility of the community to an external
audience is reduced and the likelihood of attracting new users also lessens.
Gaining an understanding of what drives activity in communities is therefore
important: it would provide community operators and hosts with the key
factors and traits to watch out for as early warning signs of reduced activity.
Conversely, understanding what factors drive heightened activity gives
community hosts the means to inject content that will raise participation
and discussion.
As part of WP3, an approach has been devised that will form the toolset for
predicting activity in communities, i.e. the number of comments/replies that
content will garner. The approach functions in two stages: first, identifying
seed posts, i.e. posts that yield a reply; and second, predicting the level of
activity that seed posts will generate. Various features describing the user,
the content being shared and the focus of the user across topics are included
to identify which perform best in each of the two stages. A complete
description of these features, together with the prediction experiments,
results and findings, is included in D5.1 [49]. The toolset containing this
approach is currently being implemented and, once complete, will provide
community hosts with the means to assess their community for the factors
that drive activity and to identify reductions in those factors that could harm
community activity and participation.
9. Conclusions and future work
This report has covered a large body of literature in the areas of risk
management in social networks, discussing theories from the sociology
domain and related work on detecting risks and opportunities, risk
management frameworks, and techniques for specifying and detecting risks
and opportunities. We have extended the body of knowledge of possible risks
and opportunities an online community may face, and discussed several
concrete examples in the context of each of the use case partners in ROBUST.
Based on the findings from the literature and requirements from the use case
partners, this report has proposed a framework for proactive management of
risks and opportunities in online communities.
We emphasise the aim of proactive community management, which
is achieved by: (1) assessing and forecasting whether risks or opportunities
are likely to occur in the future, which empowers community managers to take
actions to mitigate a risk or seize an opportunity; (2) simulating what-if
scenarios, or estimating the impact of a risk or opportunity on the community;
(3) automatically executing a treatment plan based on risk or opportunity
triggers; (4) estimating the dependencies that risks and opportunities may
have on each other and on actions taken by a community manager to address
a risk; (5) using visualisation techniques to assist human operators in
interpreting and managing risks and opportunities; and (6) adopting a
service-based architecture, in which services can be dynamically included in
the framework to manage new risks and opportunities or to process and
visualise data. The work in progress in collaboration with other ROBUST WPs
will validate the approaches and the choice of techniques.
Several models and techniques for detecting and forecasting risks and
opportunities are being developed in different work packages in ROBUST.
Graph-based and mathematical modelling techniques are considered in
ROBUST; in WP1, these are the Gibbs sampler, agent-based modelling and
compartment models. Each of these modelling techniques allows us to
forecast the probability of an event occurring in the future: the Gibbs sampler
at the user (micro) level and the compartment model at the community (macro)
level, with agent-based modelling providing a bridge that allows the effect of
micro-level interactions on macro-level phenomena to be simulated. Whilst the
mathematical approaches lend themselves better to real-time processing, the
agent-based simulation lends itself better to user interaction, helping users to
understand the impact and causality of events. However, the work conducted
on detection and forecasting of risks and opportunities during the course of
the ROBUST project is not limited to these techniques.
When a risk or opportunity has a sufficiently high probability of occurring,
according to thresholds defined by a risk manager, a treatment workflow may
be executed. We propose a workflow specification based on BPMN, with an
engine (Activiti) that can automatically execute actions defined in the plan. We
assume that the risk manager will define the plan, but we will, during the course
of the project, attempt to identify and make available common actions that can
be composed into a specific plan.
It is clear that there may be complex dependencies between risks and
opportunities, and that actions taken by a community manager to address a
risk may impact on other risks and opportunities (positive or negative). Being
able to understand and model such dependencies can increase the accuracy
of the risk assessment and optimise the performance of the assessment
process. This is an interesting challenge that we aim to address using a
Bayesian network approach during the course of this project.
Implementation of the risk and opportunity editor and visualisation
components is in progress. The ideas described in this document will then be
validated in the context of the ROBUST platform, making use of the large-scale
capabilities built into the platform by WP6 and WP2.

A. List of Figures
Figure 1: ISO 31000 management process ................................................... 24
Figure 2: Event Tree Analysis. ....................................................................... 29
Figure 3: Fault Tree Analysis. ........................................................................ 29
Figure 4: Cause-consequence analysis ......................................................... 30
Figure 5 Goal model showing relations (+,-) between goals and attributes
(Fully Satisfied-FS, Partially Satisfied-PS) ..................................................... 30
Figure 6: R&O dependencies ......................................................................... 31
Figure 7: Markov chain example .................................................................... 32
Figure 8: Bayesian Network example ............................................................ 34
Figure 9: Illustration of linear regression on a data set .................................. 35
Figure 10: Graph representation of a compartment model. ........................... 36
Figure 11: UML Activity diagram example (OMG 2011) ................................. 40
Figure 12: EPC example [60] ......................................................................... 41
Figure 13: Business Process Modelling Notation example [70] ..................... 42
Figure 14: YAWL example [89] ...................................................................... 43
Figure 15 WP1- Risk and opportunity management framework ..................... 53
Figure 16 A partial view of the WP3 ontology relations .................................. 55
Figure 17: Relation between event and objectives......................................... 56
Figure 18 State model with risky state ........................................................... 56
Figure 19 Risk event related to explanatory parameters p1 and p2 ............... 57
Figure 20 state model with risk state and opportunity state ........................... 58
Figure 21: Bayesian Network example of modelling dependencies between
variables and risks and opportunities. ............................................................ 59
Figure 22: Risk class diagram depicting main components and relations ...... 59
Figure 23: Representation of the XML schema for risk and opportunity ........ 60
Figure 24: Risk states. ................................................................................... 62
Figure 25: Risk treatment plan creation and evaluation ................................. 65
Figure 26: Risk treatment plan execution ....................................................... 66
Figure 27 : Simulation model architecture. ..................................................... 77
Figure 28: Using Data flow diagram of the compartment model and Gibbs
sampler. ......................................................................................................... 79
Figure 29: Migration rate forecasts. ............................................................... 82
Figure 30: Simplatform architecture
Figure 31: Simplatform and simulation model integration process outline
Figure 32: Histogram of the Gibbs Sampler estimated probabilities of 'success' for the given sample population of users, where 'success' is defined as the occurrence of risk (risk being a 20% or greater drop in user activity from one time period to the next)
Figure 33: Scatter plot comparing the goodness of fit of the baseline and Gibbs Sampler probit model with regard to the previously given and data matrices


B. List of Tables
Table 1: Benefits in operating/hosting the online community. Ranked 1-5, 5 being the highest.
Table 2: Parameters perceived useful for quantification of business impact. Ranked 1-5, 5 being the highest.
Table 3: Workflow specification coverage rating
Table 4: Specification criteria description
Table 5: OMG XMI encapsulation of the UML activity model
Table 6: EPML evaluation summary
Table 7: WS-BPEL 2.0 evaluation summary
Table 8: WS-HumanTask 1.1 evaluation summary
Table 9: WS-BPEL4People 1.1 evaluation summary
Table 10: ebXML Business Process Specification 2.0.4 summary
Table 11: YAWL 2.1 specification summary
Table 12: XPDL 2.1 specification summary
Table 13: BPMN 2.0 specification summary
Table 14: T2FLOW specification summary
Table 15: Mapping between the risks and opportunities and the partners providing components for assessment and forecasting
Table 16: Example of users allocated to compartment at two time periods, T and T-1
Table 17: Example migration rates, corresponding to data in Table 16
Table 18: Compartment model forecasts for times T+1 to T+10
Table 19: Community communication
Table 20: Community content modification
Table 21: Community meta-data modification
Table 22: Workflow SDK overview
Table 23: Risks identified by community hosts and owners
Table 24: Opportunities identified by community hosts and owners


C. List of Abbreviations
Abbreviation Explanation
AIRMIC Association of Insurance and Risk Managers in Industry and Commerce
ALARM The National Forum for Risk Management in the Public Sector
BN Bayesian Network
BPEL Business Process Execution Language
BPMI Business Process Management Initiative
BPMN Business Process Model and Notation
BSI British Standards Institution
CHI Community Health Index
CORMSIS Centre of Operational Research, Management Sciences and Information Systems
COSO Committee of Sponsoring Organizations of the Treadway Commission
CPT Conditional Probability Table
CRM Customer Relationship Management
CSTR Continuously Stirred Tank Reactor
DAG Directed Acyclic Graph
DI Diagram Interchange
EPC Event-driven Process Chain
EPML EPC Markup Language
ERM Enterprise Risk Management
ESB Enterprise Service Bus
ETA Event Tree Analysis
FERMA Federation of European Risk Management Associations
FTA Fault Tree Analysis
GS Gibbs Sampler
HAZOP Hazard and Operability Study
IEC International Electrotechnical Commission
IRM Institute of Risk Management
ISO International Organization for Standardization
MCMC Markov Chain Monte Carlo
MOF Meta Object Facility
MSE Mean Squared Error
NLP Natural Language Processing
OGC Office of Government Commerce
OMG Object Management Group
ROI Return On Investment
SCN SAP Community Network
SDK Software Development Kit
SWIFT Structured What-If Technique
UI User Interface
UKoB University of Koblenz
UML Unified Modelling Language
VPL Visual Programming Language
WP Work Package
WS Web Service
XMI XML Metadata Interchange
XML Extensible Markup Language
XPDL XML Process Definition Language
XSLT Extensible Stylesheet Language Transformations
YAWL Yet Another Workflow Language


D. References
1. ActiveVOS. ActiveVOS Enterprise. 2011; Available from:
http://www.activevos.com/.
2. Albert, J.H. and Chib, S., Bayesian Analysis of Binary and
Polychotomous Response Data. Journal of the American Statistical
Association, 1993. 88(422): p. 669-679.
3. Andrews, J.D. and Ridley, L.M., Application of the cause-consequence
diagram method to static systems. Reliability Engineering and System
Safety, 2002. 75(1): p. 47-58.
4. Angeletou, S., Rowe, M., and Alani, H. Modelling and Analysis of User
Behaviour in Online Communities. in International Semantic Web
Conference. 2011. Bonn, Germany.
5. Arthur, W.B., Inductive reasoning and bounded rationality. American
Economic Review, 1994. 84: p. 406-411.
6. Asnar, Y., Giorgini, P., and Mylopoulos, J. Risk modelling and
reasoning in goal models. Report number: 2006
7. Ben-Gal, I., Bayesian Networks, in Encyclopedia of Statistics in Quality
and Reliability. 2008, John Wiley & Sons, Ltd.
8. BOCGroup. ADONIS. 2011; Available from: http://www.boc-
group.com/products/adonis/.
9. Le Bon, G., Psychology of Crowds. 1895: Sparkling Books.
10. Bonabeau, E. Agent-Based Modelling: Methods and Techniques for
Simulating Human Systems. in Proceedings of the National Academy
of Sciences of the United States of America. 2002.
11. Borland. Together - Visual Modeling for Software Architecture Design.
2011; Available from:
http://www.borland.com/us/products/together/.
12. Brueckner, S. and Parunak, H.V.D., Self-organizing MANET
management. Engineering Self-Organising Systems, 2004. 2977/2004:
p. 20-35.
13. BSI, BS 31100:2011, Risk management. Code of practice and
guidance for the implementation of BS ISO 31000, 2011.
14. Bughin, J., The rise of enterprise 2.0. Journal of Direct, Data and Digital
Marketing Practice, 2008. 9: p. 251-259.
15. Carley, K.M. and Kaufer, D.S., Semantic connectivity: An approach for
analyzing symbols in semantic networks. Communication Theory,
1993. 3(3): p. 183-213.
16. Committee of Sponsoring Organizations of the Treadway Commission
(COSO), Enterprise Risk Management Integrated Framework, 2004.
17. Cools, S., Gershenson, C., and D'Hooghe, B., Self-organizing traffic
lights: A realistic simulation, in Self-Organization: Applied Multi-Agent
Systems, Prokopenko, M., Editor 2007, Springer: London. p. 41-49.
18. Cross, R., Parker, A., and Prusak, L. Knowing What We Know:
Supporting Knowledge Creation and Sharing in Social Networks.
Report number: IBM Institute for Knowledge Management. 2000

19. Cuntz, N. and Kindler, E. EPC Tools. 2005; Available from:
http://www2.cs.uni-paderborn.de/cs/kindler/research/EPCTools/.
20. Danescu-Niculescu-Mizil, C., et al. How opinions are received by online
communities: A case study on Amazon.com helpfulness votes. in
Proceedings of the 18th International Conference on World Wide Web.
2009.
21. Davenport, T. and Prusak, L., Working Knowledge: How Organizations
Manage What They Know. 1998: Harvard Business School Press.
22. EclipseFoundation. Eclipse Plugin Framework. 2011; Available from:
http://www.eclipse.org/.
23. Ericson, C. Fault Tree Analysis - A History. in The 17th International
System Safety Conference. 1999.
24. European Network and Information Security Agency (ENISA) Security
Issues and Recommendations for Online Social Networks. 2007.
25. Gershenson, C. and Heylighen, F., How can we think complex?, in
Managing Organizational Complexity: Philosophy, Theory and
Application, Richardson, K., Editor 2005, Information Age Publishing.
26. Godfrey, K., Compartmental Models and Their Application. 1983:
Academic Press.
27. Granovetter, M.S., The Strength of Weak Ties. American Journal of
Sociology, 1973. 78(6): p. 1360-1380.
28. Granovetter, M.S., The Strength of Weak Ties. American Journal of
Sociology, 1973. 78(6): p. 1360-1380.
29. Grinstead, C.M. and Snell, J.L., Introduction to Probability.
30. Heylighen, F., Modelling emergence. World Futures: the Journal of
General Evolution, special issue on creative evolution, 1991: p. 1-10.
31. Hogg, T. and Huberman, B.A., Controlling chaos in distributed systems.
IEEE Transactions on Systems, Man and Cybernetics, 1991. 21: p.
1325-1332.
32. Hogg, T. and Huberman, B.A. Dynamics of large computational
ecosystems. Report number: HPL-200-77. Information Dynamics
Laboratory, Hewlett-Packard. 2002
33. Homans, G.C., Social Behavior as Exchange. 1961, New York: Harcourt,
Brace and World.
34. Huang, B.Q., Kechadi, M.-T., and Buckley, B. Customer Churn
Prediction for Broadband Internet Services. in Proceedings of the 11th
International Conference on Data Warehousing and Knowledge
Discovery. 2009. Linz, Austria: Springer-Verlag.
35. Huberman, B.A. and Hogg, T., The Emergence of Computational
Ecologies, in Lectures in Complex Systems, Nadel, L. and Stein, D.,
Editors. 1993. p. 185-205.
36. Hummel, H., et al., Encouraging contributions in learning networks
using incentive mechanisms. Journal of Computer Assisted Learning,
2005. 21: p. 355-365.
37. IBM. Rational Software Architect for WebSphere. 2011; Available from:
http://www-01.ibm.com/software/awdtools/swarchitect/websphere/.
38. Intalio. Intalio|BPMS. 2011; Available from:
http://www.intalio.com/bpms.

39. IRM, A Risk Management Standard, 2002.
40. ISO, ISO 31000:2009 Risk management - Principles and guidelines,
2009, ISO.
41. ISO, ISO Guide 73:2009, Risk management -- Vocabulary, 2009.
42. ISO, ISO/IEC 31000:2009 Risk management -- Principles and
guidelines, 2009.
43. ISO, ISO/IEC 31010:2009, Risk management -- Risk assessment
techniques, 2009.
44. Isograph. Event Tree Analysis. 2011; Available from:
http://www.eventtreeanalysis.com/.
45. Jakiela, J., Litwin, P., and Olech, M. MAS approach to business models
simulations: supply chain management case study. in Proceedings of
the 4th KES international conference on Agent and multi-agent
systems: technologies and applications, Part II (KES-AMSTA'10).
2010. Springer-Verlag.
46. Jennings, N.R., An agent-based approach for building complex
software systems. Communications of the ACM, 2001. 44(4): p. 35-41.
47. Kankanhalli, A., Tan, B.C.Y., and Wei, K.-k., Contributing Knowledge to
Electronic Knowledge Repositories: An Empirical Investigation. MIS
Quarterly, 2005. 29(1): p. 113-143.
48. Karnstedt, M., et al. Churn in Social Networks: A Discussion Boards
Case Study. in IEEE International Conference on Social Computing
(SocialCom2010). 2010.
49. Karnstedt, M., et al. Report on Feature Selection and Merging. Report
number: D5.1. EC FP7-ICT ROBUST project. 2011
50. Karnstedt, M., et al. The Effect of User Features on Churn in Social
Networks. in ACM Web Science Conference 2011 (WebSci2011).
2011. Koblenz, Germany.
51. Kleinbaum, D.G., et al., Applied Regression Analysis and Multivariable
Methods. 2007: Duxbury.
52. Klugl, F. and Rindsfuser, G., Large-Scale Agent-Based Pedestrian
Simulation. Multiagent System Technologies, 2007. 4687: p. 145-156.
53. Khler, W. Unstoppable: The Rise of Enterprise 2.0. 2011 [cited 28/09/2011];
Available from: http://www.bayforce.com/2011/04/22/unstoppable-the-
rise-of-enterprise-2-0/.
54. Kwan, T.-w., A risk management methodology with risk dependencies,
in Dept. of Computing. 2010, The Hong Kong Polytechnic University.
55. Lefurgy, C., et al. Autonomic multi-agent management of power and
performance in data centers. in The Seventh International Conference
of Autonomic Agents and Multiagent Systems. 2008.
56. Lithium Technologies, Measuring Community Health for Online
Communities, White paper.
57. Macy, M.W., Learning Theory and the Logic of Critical Mass. American
Sociological Review, 1990. 55(6): p. 809-826.
58. McPherson, M.J., Popielarz, A.P., and Drobnic, S., Social Networks
and Organizational Dynamics. American Sociological Review, 1992.
57(2): p. 153-170.

59. Mendling, J., Neumann, G., and Nüttgens, M. Yet Another Event-Driven Process Chain (Extended Version). Report number: 2005
60. Mendling, J. and Nüttgens, M. EPC Markup Language (EPML) - An
XML-Based Interchange Format for Event-Driven Process Chains
(EPC). Report number: Vienna University of Economics and Business
Administration. 2005
61. Microsoft. Visio 2010. 2010; Available from:
http://office.microsoft.com/en-gb/visio/.
62. Mocan, A., Brauer, F., and Barczynski, W. Provisioning and preparation
of the SAP Community Network Data. Report number: D8.1. EC FP7-
ICT ROBUST project. 2011
63. Mouratidis, H., Giorgini, P., and Manson, G., An Ontology for Modelling
Security: The Tropos Approach, in Knowledge-Based Intelligent
Information and Engineering Systems, Palade, V., Howlett, R., and
Jain, L., Editors. 2003, Springer Berlin / Heidelberg. p. 1387-1394.
64. Mozer, M., et al., Churn Reduction in the Wireless Industry, in Advances in Neural Information Processing Systems (NIPS). 1999. p. 935-941.
65. myGrid. Taverna. 2011; Available from: http://www.taverna.org.uk/.
66. Nahapiet, J. and Ghoshal, S., Social Capital, Intellectual Capital, and
Organizational Advantage. Academy of Management Review, 1998.
23(2): p. 242-266.
67. Nasukawa, T. and Yi, J. Sentiment analysis: Capturing favorability
using natural language processing. in Proceedings of the 2nd
international conference on Knowledge capture. 2003.
68. NationalInstruments. LabVIEW. 2011; Available from:
http://sine.ni.com/np/app/flex/p/ap/global/lang/en/pg/1/docid/nav-77/.
69. Office of Government Commerce (OGC), Management of Risk:
Guidance for Practitioners, 2010.
70. OMG, Business Process Model and Notation (BPMN) Version 2.0,
2011. p. 508.
71. Omicini, A. SODA: societies and infrastructures in the analysis and
design of agent-based systems. in First international workshop, AOSE
2000 on Agent-oriented software engineering. 2000. Springer-Verlag
New York.
72. Parunak, H.V.D., Go to the ant: Engineering principles from natural
multiagent systems. Annals of Operations Research, 1997. 75(0): p.
69-101.
73. Parunak, H.V.D. and Brueckner, S.A. Engineering swarming systems.
in Methodologies and Software Engineering for Agent Systems. 2004.
Kluwer.
74. Polack, F. and Stepney, S. Emergent properties do not refine. in
REFINE workshop, Electronic notes in Theoretical Computer Science.
2005. Elsevier.
75. PrestoPRIME. EC FP7-ICT PrestoPRIME Project. 2011; Available
from: http://www.prestoprime.org/.
76. Prietula, M., Gasser, L., and Carley, K., Simulating Organizations:
Computational Models of Institutions and Groups. MIT Press, 1998.

77. Raz, T. and Hillson, D., A Comparative Review of Risk Management
Standards. Risk Management: An International Journal, 2005. 7(4): p.
53-66.
78. Rheingold, H., The virtual community: homesteading on the electronic
frontier. 2000: Cambridge, MA, MIT Press.
79. Ronen, I., Rowe, M., and Schwagereit, F. Metrics and Requirements
Update For Employee Use Case. Report number: D7.2. EC FP7-ICT
ROBUST project. 2011
80. Ronen, I., Ur, S., and Guy, I. IBM Employee Network Data and
Requirements. Report number: D7.1. EC FP7-ICT ROBUST project.
2011
81. Rowe, M., et al. Report on Social, Technical and Corporate Needs in
Online Communities. Report number: D3.1. EC FP7-ICT ROBUST
project. 2011
82. Rubinstein, R.Y. and Kroese, D.P., Simulation and the Monte Carlo
method. 2008: John Wiley & Sons.
83. Scherer, R.J. and Sharmak, W. Generic Process Template
Description for the Effect of Risks on Project Schedule. in CIB 2008 -
International Conference on Information Technology in Construction.
2008. Santiago, Chile.
84. Schwagereit, F., et al. Agent-based Community Simulation Framework.
Report number: D4.1. EC FP7-ICT ROBUST project. 2011
85. Schwagereit, F., Sizov, S., and Staab, S. Finding Optimal Policies for
Online Communities with CoSiMo. in Proceedings of the WebSci10:
Extending the Frontiers of Society On-Line. 2010.
86. SoftwareAG. ARIS Platform. 2011; Available from:
http://www.softwareag.com/corporate/products/aris_platform/default.as
p.
87. Sterman, J., Business Dynamics: Systems thinking and modeling for a
complex world. 2000: McGraw Hill.
88. Stoneburner, G., Goguen, A., and Feringa, A. Risk Management Guide
for Information Technology Systems. Report number: National Institute
of Standards and Technology. 2002
89. ter Hofstede, A. and Adams, M. YAWL - User Manual. Report number:
The YAWL foundation. 2010
90. Towey, I. Polecat use case data and requirements. Report number:
D9.1. EC FP7-ICT ROBUST project. 2011
91. TROPOS Project. Available from: http://www.troposproject.org/.
92. VisualParadigm. Business Process Visual Architect. 2011;
Available from: http://www.visual-paradigm.com/product/bpva/.
93. VisualParadigm. Visual Paradigm for UML. 2011; Available from:
http://www.visual-paradigm.com/.
94. WeGov project, D5.1, Scenario definition, advisory board and
legal/ethical review, 2010.
95. Weidlich, A. and Veit, D., A critical survey of agent-based wholesale
electricity market models. Energy Economics, 2008. 30(4): p. 1728-1759.

96. Wolf, T.D. and Holvoet, T. Emergence and self-organisation: a
statement of similarities and differences. in Proceedings of the
International Workshop on Engineering Self-Organising Applications.
2004.
97. Xie, Y., et al., Customer churn prediction using improved balanced
random forests. Expert Syst. Appl., 2009. 36(3): p. 5445-5449.
98. YAWLFoundation. YAWL Editor. 2011; Available from:
http://www.yawlfoundation.org/.
99. Zambonelli, F. and Parunak, H.V.D. Signs of a revolution in computer
science and software engineering. in 2nd Italian Workshop on Objects
and Agents. 2001.
100. Zambonelli, F. and Parunak, H.V.D., Towards a paradigm change in
computer science and software engineering: A synthesis. Knowledge
Engineering Review, 2003. 18(4): p. 329-342.


E. Use-case summary model for WP4 policy descriptions


F. Summary of potential treatment responses based on WP3 survey data
Table 19: Community communication.
Plan | Description | Strategy type(s)
Send group message | Send an e-mail to a defined group of members. Example: notify users of a community event. | Mitigate; fallback; avoid; exploit
Send single user message | Send an e-mail to a single member. Example: warn a user of perceived spamming activity. | Mitigate; fallback; avoid; exploit
Conduct direct conversation with member | Arrange a direct conversation with the member via another communication modality. Example: conduct affective, personal communication to appease a valued but frustrated member. | Mitigate; fallback; avoid; exploit
Send reminders | Single or periodic automated message sent to a member. Example: remind members to vote in a mentor election. | Mitigate; exploit
Send content link | Community content reference sent directly to a member or added to forum content. Example: point to answers in another forum. | Mitigate; exploit
Assign a task to a member | Define a task and agree ownership with a community member. Example: encourage more input from other members. | Transfer
Request question follow-up from member | Refer to a question and request a response from a member. Example: ask a member of a related community to help contribute. | Mitigate; transfer
Request content review from member | Refer to community content and request a meta-data response from a member. Example: unrated but potentially valuable content needs to be evaluated. | Mitigate; exploit
Request survey completion from member(s) | Generate and issue a survey with a request for completion. Example: qualitative feedback on a failing community may lead to a further treatment plan. | Mitigate; exploit
Request vote from members | Generate and issue a vote process for community members. Example: evaluate the potential response to a new feature in the community environment. | Exploit

Table 20: Community content modification.
Table 21: Community meta-data modification.
Modify member tagging | Select member and modify status. [...] a community scope to enhance expertise accessibility. | Exploit
Modify content tagging | Add/remove content search tags to improve accessibility. Example: content is not easily found using commonly used search terms. | Mitigate; avoid
Issue rewards | Change the user status of a member. Example: add points to encourage further involvement by the member in their community. | Exploit
Add new feature to community environment | Request community development functionality from the technical team. Example: adding pop-up hints to UI components. | Exploit
Remove feature from community environment | Request permanent/temporary removal of functionality from the technical team. Example: remove e-mail communication to an overloaded moderator. | Mitigate; avoid
Move content within community | Define a new home for selected community content. Example: a selection of related threads in a large forum is moved to its own forum space; members are notified. | Exploit
Remove content from community | Select and remove content from the community. Example: an inappropriate contribution has been flagged as abusive; the content is removed and the user warned. | Mitigate; avoid
Undo previous content change | Restore content previously removed by a moderator. Example: the content is requested by other members; the removing moderator is informed of the revoked action. | Mitigate; avoid
Transform content to shared asset | Copy and place a resource in a highly visible location. Example: a resource posted by a member is rated highly, so it is tagged as a shared asset. May require the member's permission first. | Exploit

G. Workflow software frameworks
Earlier sections of this document dealt with a number of graphical notations that depict workflow models, and with various machine-processable formalisms designed to specify one or more of them. This last section considers a selection of software development kits (SDKs) that support application developers in manipulating workflow models and, in some cases, in generating interactive UIs for their graphical notations.
Because a large number of SDKs in this area are published on the internet, the following criteria were applied to create a short-list of potential candidates:
- Support for the notations/specifications described in Section 6.3
- Non-commercial license
- Java-based development platform
- Proven (stable) version available
Given these high-level technical requirements, the following software libraries
have been considered:
Table 22: Workflow SDK overview
SDK | Notation support | Model support | License
Activiti (www.activiti.org) | BPMN 2.0 | BPMN 2.0 | Apache 2.0
jBPM (http://www.jboss.org/jbpm) | BPMN 2.0 | BPMN 2.0, HumanTask | Apache 2.0, EPL 1.0, MIT, LGPL 2.1
Eclipse BPMN Modeler (http://www.eclipse.org/bpmn/) | BPMN 2.0 | | EPL 1.0
Apache ODE (http://ode.apache.org/) | | BPEL 2.0 | Apache 2.0
Enhydra (http://www.together.at/prod/workflow) | BPMN ?.? | XPDL 2.1 | GPL 3.0
Orchestra (http://orchestra.ow2.org/xwiki/bin/view/Main/) | 3rd party | BPEL 2.0 | LGPL 3.0
Eclipse BPEL designer (http://www.eclipse.org/bpel/) | Bespoke | BPEL 2.0 | EPL 1.0
Open Business Engine (http://obe.sourceforge.net/) | | XPDL ?.? | Unknown
WfMOpen (http://wfmopen.sourceforge.net/) | | XPDL 1.0 | GPL ?.?
YAWL (http://sourceforge.net/projects/yawl/) | YAWL editor | YAWL | LGPL 3.0
EPC Tools (http://www2.cs.uni-paderborn.de/cs/kindler/Forschung/EPCTools/) | EPC | EPML | GPL 2.0
UML2 Tools (http://www.eclipse.org/modeling/mdt/?project=uml2tools) | UML subset | XMI | EPL 1.0
ArgoUML (http://argouml.tigris.org) | UML subset | XMI | EPL 1.0
Taverna workbench (https://launchpad.net/taverna/t2/2.3.0) | Bespoke | T2 | LGPL 2.1

Of the SDKs presented in Table 22, the strongest BPMN candidates are Activiti and jBPM; both claim model and view support for their workflow models. Eclipse's BPEL designer is perhaps the next best holistic solution, offering support for an alternative workflow formalism. The remaining SDKs outlined here (YAWL, EPC Tools, UML2 Tools, ArgoUML and Taverna) could also be considered, since each provides a model and a view, but their notations are arguably less expressive than BPMN or BPEL.

H. Workflow technical specification feature review
The data presented below represent an indicative comparison of ten widely recognised workflow languages that have been technically encoded for machine processing using XML schemas. The following limitations apply to this feature enumeration:
- In many cases, each feature can only be considered 'in the large', as direct equivalences between languages/specifications are not possible.
- The features described here are not a definitive set for any particular language or for workflow languages in general.
- A language may have more or less expressive power for a given feature; this is not captured here.
Readers interested in a much more detailed evaluation of some of the languages/specifications covered here should visit www.workflowpatterns.com.
There is significant interoperability between some of these specifications (in particular WS-BPEL and WS-BPEL4People). For this reason, two symbols are used in the table: one indicates direct symbolic support in the specification, while the other signifies that the specification imports the feature from another XML schema.

Feature / Specification | OMG UML 2.0/DI 1.0 | EPML 1.2 | WS-BPEL 2.0 | WS-BPEL4People 1.1 | WS-HumanTask 1.1 | ebXML BPS 2.0.4 | YAWL 2.1 | XPDL 2.1 | BPMN 2.0 | T2FLOW

Activities
Labelled activity
Composite (representing a sub-process)
Nested (privately scoped activities)
Compensation
Conditional iteration
Conditional iteration of activity sub-set
Machine service
Temporal wait
Synchronous signal/message sending/receiving
Asynchronous signal/message passing/receiving
Message/event listening
Error/exception generation
Transaction
Relative coverage (%) | 54 | 15 | 92 | 92 | 38 | 69 | 15 | 85 | 92 | 31

Activity qualification
Human agent instance assignment
Human agent role assignment
Human agent activity ownership selection
Presentation format
Delegation
Priority
Estimation data
Constrained life-time
Explicit valid pre-conditions
Explicit valid post-conditions
Deadline specification
Data assignment
Evaluation (if..else..)
Machine service variable mapping

Control
Conditional transitions
Sequential transitions
Parallel transitions
Empty gateway
AND gateway
OR gateway
XOR gateway

Data
Data objects
Data fields/variables

Message qualification
Priority
Presentation format
Instance correlation

Events
Signals
Labelled event/state
Temporal
Escalation
Error handler

Machine services
Service definition
Invoke service
Service communication channel/port

Participants, agents and roles
Participant definitions
Role definition
Labelled human agent
Human agent grouping

Meta-data
Editor spatial layout info
Document creation info
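The report does not state how the relative coverage percentages for the Activities group are derived. One reading that reproduces every value in that row, offered here as an assumption rather than the authors' stated method, is the share of the 13 Activities features that a specification supports, rounded to the nearest whole percent, where n_s is the number of supported features for specification s:

```latex
\mathrm{coverage}_s = \operatorname{round}\!\left( 100 \cdot \frac{n_s}{13} \right),
\qquad n_{\mathrm{UML}} = 7 \;\Rightarrow\; 100 \cdot \tfrac{7}{13} \approx 53.8 \;\to\; 54
```

Under this reading, the T2FLOW value of 31 corresponds to 4 of the 13 features (100 · 4/13 ≈ 30.8).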



I. Example treatment workflow using BPMN

[BPMN diagram: example treatment workflow comprising the sub-processes Evaluate community, Modify community content and Motivate community, with tasks including Select evaluation, Interview members, Review content, Select response, Choose modification, Manage unpopular member, Remove unwanted content, Modify community topic, Add new community topic, Close community, Select top contributors, Identify new incentives and Select incentive.]

J. Risk and opportunity editor
The editor is part of the Community Analysis Tool (CAT), which is being developed in WP1. Starting from community selection, the risk manager can view and edit the risks and opportunities in the system. A detailed description will be included in future deliverables.


K. Simplatform
What is Simplatform?
Simplatform (SPM) is a Java library that provides additional functionality to event-based simulation models. Any new model that conforms to the interfaces exposed by SPM can be integrated with the platform and benefit from the following functionality:
1. Configuration Management. SPM uses a JSON library to produce human-readable configuration templates for the model, which can easily be edited in any text editor. The model developer specifies the configuration using Java classes, and the platform is responsible for transforming it into a text-based format. When the simulation is run, SPM consumes the text-based configuration and transforms it back into an object that can be used within the simulation environment.
2. Batch Management. SPM can run automated batch simulations. When the platform is run in batch mode, it automatically manages the simulation of multiple models based on the provided configuration files.
3. Event Management. SPM manages the ticking of the simulation clock, which determines simulation time. The simulation model can register for time events (for example, events triggered at one-hour intervals of simulation time) and is notified when they occur. This allows the model to perceive the flow of simulation time and trigger relevant simulation events accordingly. SPM manages the clock and the event triggering, whereas the simulation model is only required to register for the selected events.
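The configuration round trip described in item 1 can be illustrated with a minimal, dependency-free sketch. It assumes, as in the ModelConfiguration example given later in this appendix, that configuration parameters are public fields of a Java class. SPM itself uses a JSON library; to keep the sketch runnable with the JDK alone, it swaps in java.util.Properties as the human-readable text format. The names ConfigRoundTrip, DemoConfiguration, toText and fromText are hypothetical and are not part of the SPM API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

/**
 * Hypothetical sketch of SPM-style configuration management: public fields of a
 * Java configuration class are written out as editable text and read back.
 * java.util.Properties stands in for the JSON library that SPM actually uses.
 */
public class ConfigRoundTrip {

    /** Hypothetical configuration class with the same fields as the ModelConfiguration example. */
    public static class DemoConfiguration {
        public String templateName = "Default Model Configuration Template";
        public int numberOfAgents = 200;
        public int numberOfForums = 300;
    }

    /** Serialises every public, non-static field as a "name=value" line. */
    public static String toText(Object config) {
        Properties props = new Properties();
        try {
            for (Field field : config.getClass().getFields()) {
                if (!Modifier.isStatic(field.getModifiers())) {
                    props.setProperty(field.getName(), String.valueOf(field.get(config)));
                }
            }
            StringWriter out = new StringWriter();
            props.store(out, "model configuration template");
            return out.toString();
        } catch (IOException | IllegalAccessException e) {
            throw new IllegalStateException("cannot serialise configuration", e);
        }
    }

    /** Builds a new configuration object and overwrites the fields found in the text. */
    public static <T> T fromText(String text, Class<T> type) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
            T config = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getFields()) {
                String value = props.getProperty(field.getName());
                if (value == null || Modifier.isStatic(field.getModifiers())) {
                    continue; // missing entries keep their default values
                }
                Class<?> fieldType = field.getType();
                if (fieldType == int.class) {
                    field.setInt(config, Integer.parseInt(value.trim()));
                } else if (fieldType == double.class) {
                    field.setDouble(config, Double.parseDouble(value.trim()));
                } else if (fieldType == boolean.class) {
                    field.setBoolean(config, Boolean.parseBoolean(value.trim()));
                } else if (fieldType == String.class) {
                    field.set(config, value);
                }
                // other field types are ignored in this sketch
            }
            return config;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read configuration", e);
        }
    }

    public static void main(String[] args) {
        String text = toText(new DemoConfiguration());
        System.out.print(text);
        // Simulate a user editing the template in a text editor.
        String edited = text.replace("numberOfAgents=200", "numberOfAgents=500");
        DemoConfiguration loaded = fromText(edited, DemoConfiguration.class);
        System.out.println(loaded.numberOfAgents + " agents, " + loaded.numberOfForums + " forums");
    }
}
```

Running main prints the generated template and then "500 agents, 300 forums". Using reflection keeps the model developer's work down to declaring fields, which matches the division of labour described in item 1; a real JSON library would additionally handle nested objects and lists, which this flat format cannot express.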
Simplatform architecture
The architecture of the Simplatform library is shown in Figure 30.

Figure 30: Simplatform architecture. (Diagram labels: Simplatform, comprising the Batch Manager, Configuration Manager and Event Manager; Model; Configuration File; Batch of configuration files.)


The three main components of the library are:
1. Batch Manager, responsible for batch functionality. Its main functionality is located in the ModelRunner class, which reads a batch of configuration files and runs the model simulations serially.
2. Configuration Manager, which controls the process of creating, reading and transforming the simulation model configuration from Java classes to a human-readable representation and vice versa. Two main classes provide the configuration management:
a. the ToolConfigurationManager class, responsible for managing the Simplatform configuration;
b. the ModelConfigurationManager class, responsible for managing the simulation model configuration.
3. Event Manager, which manages the flow of simulation time and notifies the model about time events as they occur. This functionality is provided by the SimulationClock class, which, once the simulation starts, performs the clock ticking that emulates the flow of simulation time. When a given point in simulation time is reached (e.g. a new hour, a new month, simulation start or simulation end), SimulationClock notifies every model registered as a listener for that event.
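The interplay between the Event Manager and the Batch Manager can be sketched as follows. This is a minimal illustration under stated assumptions, not the Simplatform implementation: the names ClockSketch, SketchClock, ClockListener, CountingModel and runBatch are hypothetical, the listener method borrows the name triggerEvent from the SimulationClockEvent interface described below but its signature is assumed, and each batch "configuration" is reduced to a simulation length in hours. The clock advances one simulated hour per tick and notifies each listener whose registered interval divides the current hour.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the Event Manager and Batch Manager roles; none of
 * these names belong to the Simplatform API.
 */
public class ClockSketch {

    /** Callback a model registers for a recurring time event (signature assumed). */
    public interface ClockListener {
        void triggerEvent(long simulationHour);
    }

    /** Emulates simulation time in whole hours and notifies registered listeners. */
    public static class SketchClock {
        private final List<Long> intervals = new ArrayList<>();
        private final List<ClockListener> listeners = new ArrayList<>();

        /** Registers a listener to be notified every {@code everyHours} simulated hours. */
        public void register(long everyHours, ClockListener listener) {
            if (everyHours <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            intervals.add(everyHours);
            listeners.add(listener);
        }

        /** Ticks from hour 1 to {@code endHour} inclusive, firing due events at each tick. */
        public void run(long endHour) {
            for (long hour = 1; hour <= endHour; hour++) {
                for (int i = 0; i < listeners.size(); i++) {
                    if (hour % intervals.get(i) == 0) {
                        listeners.get(i).triggerEvent(hour);
                    }
                }
            }
        }
    }

    /** Toy model that only counts the hourly and daily events it receives. */
    public static class CountingModel {
        public int hourly;
        public int daily;
    }

    /** Runs one fresh model per configuration, one after another (serially). */
    public static List<CountingModel> runBatch(long[] durationsInHours) {
        List<CountingModel> results = new ArrayList<>();
        for (long duration : durationsInHours) {
            CountingModel model = new CountingModel();
            SketchClock clock = new SketchClock();
            clock.register(1, hour -> model.hourly++);  // every simulated hour
            clock.register(24, hour -> model.daily++);  // every simulated day
            clock.run(duration);
            results.add(model);
        }
        return results;
    }

    public static void main(String[] args) {
        for (CountingModel m : runBatch(new long[] {24, 72})) {
            System.out.println(m.hourly + " hourly events, " + m.daily + " daily events");
        }
    }
}
```

For the two runs in main, the first model receives 24 hourly events and 1 daily event, and the second receives 72 hourly and 3 daily events. Keeping the clock in the platform and the reaction in the listener mirrors the split described above: the model only registers for events, while the platform owns the passage of time.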

Integrating simulation model with Simplatform
Figure 31 shows a simulation model integrated with SPM. Three layers of code are presented. At the top, the simulation code layer represents the specific simulation model to be integrated with SPM. Below it is the layer of code that needs to be created during the integration process. Finally, the SPM library resides at the bottom of the illustration.
The blue arrows at the top, between the simulation model and integration layers, denote functionality that SPM provides to the model. The arrows between the integration code and SPM code layers show the main classes and interfaces that the integration code must use in order to integrate with SPM. In what follows we detail the development of these classes and explain their role in the integration process.
The integration process involves creating at least three classes, examples of which are shown in the integration code layer in Figure 31:
1. ModelConfiguration class. This class must extend the ModelConfigurationTemplate class provided by SPM. In this class the developer is required to specify all configuration parameters used by the simulation model. Example code for this class is shown below.

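public class ModelConfiguration extends ModelConfigurationTemplate
{
    // configuration parameters of the simulation model and their default values
    public String templateName = "Default Model Configuration Template";
    public int numberOfAgents = 200;
    public int numberOfForums = 300;
}

Figure 31: Simplatform and simulation model integration process outline. (Diagram layers, from top to bottom: simulation model code; integration code containing the ModelConfiguration, ModelInterface and HourlyEventWatcher classes; Simplatform, providing the ModelConfigurationTemplate and SimulationModelSchema classes and the SimulationClockEvent and IPlatformSetup interfaces, which the integration classes extend or implement. Arrow labels: use configuration parameters to set up the model; trigger simulation events based on the simulation time; initialise model.)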

2. HourlyEventWatcher class. This class must implement the SimulationClockEvent interface, which exposes the triggerEvent method. Once the simulation is running, SPM invokes this method at hourly intervals and thus notifies the simulation model about the flow of simulation time. An example body of the method, which activates the model every simulation hour, is shown below.
@Override
public void triggerEvent() {
    try {
        // let every user act on the content items (cm)
        for (User u : modelInterface.getManagerUsers().getUsers()) {
            u.act(modelInterface.getManagerContent());
        }
    } catch (Exception e) {
        // report the failure instead of propagating it to the clock
        e.printStackTrace();
    }
}

The above example shows a single hourly event watcher class, but further classes can be defined to listen for the following events: start simulation, end simulation, hourly, daily, monthly and yearly events. The process of registering these events is explained below.
3. ModelInterface class. This is the main interface class, which must extend SimulationModelSchema and implement the IPlatformSetup interface. Three methods relevant to the integration process are exposed through this interface:
a. setModelConfiguration: passes a reference to the object that holds the current model configuration.
b. setSimulationClock: passes a reference to the SimulationClock object that controls the ticking of simulation time. It can also be used within the simulation model code to obtain more detailed information about the current simulation time.
c. registerEventWatchers: should contain the code that registers the classes the simulation model will use to receive event notifications from SPM.
Example code for these methods is presented below.
@Override
public void setSimulationClock(SimulationClock clock) {
    // keep the clock: registerEventWatchers() and the model code use it
    simulationClock = clock;
}

@Override
public void setModelConfiguration(Object configuration) {
    // SPM passes the configuration as an Object; cast it to the model's own class
    modelConfiguration = (ModelConfiguration) configuration;
}

@Override
public void registerEventWatchers() {
    // register one watcher per event type; each watcher gets a reference to this model
    simulationClock.registerStartSimulationClockEvent(new EventWatcherStartSimulation(this, 0));
    simulationClock.registerEndSimulationClockEvent(new EventWatcherEndSimulation(this, 1));
    simulationClock.registerHourlyClockEvent(new EventWatcherHourly(this, 2));
    simulationClock.registerDailyClockEvent(new EventWatcherDaily(this, 3));
    simulationClock.registerMonthlyClockEvent(new EventWatcherMonthly(this, 4));
}

Adding a simulation model to Simplatform


Assuming that the simulation model has been created in a manner that conforms to the interfaces exposed by Simplatform (this process was detailed in the previous section), adding the model to SPM involves two steps: placing the model library in the folder from which SPM loads it, and configuring SPM to use the newly added library. Both steps are detailed below.
Integration step 1
Adding the simulation model library to SPM. Simplatform is provided as a package containing the following folders:
- lib: holds all the application libraries (including SPM and the simulation model).
- bin: holds the batch and configuration files. Initially the folder contains the configuration.txt file, which holds the SPM configuration, and the runModel.bat batch file, which is used to initialise Simplatform.
The newly created simulation library (jar file) should be placed in the lib folder, and the reference to the location of this file should be updated in runModel.bat. This file, located in the bin folder, is used to run SPM; in it, the value of the libPath parameter should be updated to include the path to the simulation model library.
Integration step 2
Configuring SPM to use the newly added simulation model library. This step involves adjusting the Simplatform configuration file (configuration.txt) located in the bin folder. Two configuration options in this file may need updating:
- modelClassToInstantiate: this property should point to the main simulation model class that interfaces with the simulation platform (how this class is created was explained in the previous section of the document).
Make sure that the fully qualified name, including the package, is provided. For example, the following entry points to the ModelInterface class in the uk.ac.soton.itinnovation.robust.catsim package:
"modelClassToInstantiate": "uk.ac.soton.itinnovation.robust.catsim.ModelInterface"
- modelConfigurationClassToInstantiate: this property should point to the class used to hold the simulation model configuration. As with the option above, the value should include the package details and the name of the class.
Once the two configuration steps above are completed, the simulation can be initialised by running bin/runModel.bat. If SPM is unable to initialise the model, the likely cause can be identified by inspecting the simulation log written to bin/outputModelLogs.log.
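The two class-name properties are presumably resolved by Java reflection. The sketch below shows the standard Java mechanism for this, Class.forName followed by the no-argument constructor; it is an assumption about how SPM loads the classes, not its actual code. JDK classes stand in for ModelInterface and ModelConfiguration so that the sketch runs without the model jar, and the property values and the wrong-package entry are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReflectiveLoader {
    // Instantiates a class from its fully qualified name through its
    // no-argument constructor; the class must be on the classpath.
    static Object instantiate(String fullyQualifiedName) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(fullyQualifiedName);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        // Hypothetical property values; JDK classes stand in for the model classes.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("modelClassToInstantiate", "java.util.ArrayList");
        settings.put("modelConfigurationClassToInstantiate", "java.lang.StringBuilder");
        settings.put("entryWithWrongPackage", "uk.example.NoSuchModel");

        for (Map.Entry<String, String> e : settings.entrySet()) {
            try {
                Object instance = instantiate(e.getValue());
                System.out.println(e.getKey() + " -> " + instance.getClass().getName());
            } catch (ClassNotFoundException ex) {
                // typical causes: jar missing from lib or libPath, or wrong package name
                System.out.println(e.getKey() + " -> class not found: " + ex.getMessage());
            } catch (ReflectiveOperationException ex) {
                // e.g. the class has no accessible no-argument constructor
                System.out.println(e.getKey() + " -> cannot instantiate: " + ex);
            }
        }
    }
}
```

This is also why the fully qualified name matters: Class.forName cannot locate a class from its simple name alone.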


Note that, by default, SPM looks for simulation configuration files in the bin/batch folder during its initialisation. This location can be changed by modifying the batchFolderLocation parameter in the SPM configuration (bin/configuration.txt). If the batch folder contains more than one configuration file, SPM processes these configurations in batch mode.
When the model is initialised for the first time, the bin/batch folder is empty, as it does not yet contain any model configuration files. Rather than requiring the user to create the initial model configuration by hand, SPM can generate the necessary configuration file automatically; this generation is performed each time SPM is run. The newly created configuration file is written to bin/defaultConfigs, and the user needs to move it manually to the bin/batch folder. This manual relocation is required only during the first initialisation of SPM with a newly integrated model, or when the simulation model class that holds the configuration details is modified (for example, when new configuration parameters are added).
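How SPM derives the default configuration file from the model's configuration class is not detailed here. One plausible mechanism, sketched below under that assumption, is to read the public fields of a freshly constructed configuration object by reflection and write them out as key/value pairs in the JSON-like style of the configuration.txt entry shown earlier. ExampleModelConfiguration mirrors the ModelConfiguration example (without the SPM superclass), and DefaultConfigWriter is hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

// Mirrors the ModelConfiguration example above, without the SPM superclass.
class ExampleModelConfiguration {
    public String templateName = "Default Model Configuration Template";
    public int numberOfAgents = 200;
    public int numberOfForums = 300;
}

final class DefaultConfigWriter {
    // Writes every public instance field of the object as a "key": value entry,
    // so that the defaults declared in the class become an editable text file.
    // Class.getFields() does not guarantee any particular field order, and
    // string values are not escaped in this sketch.
    static String toDefaultConfig(Object configuration) throws IllegalAccessException {
        StringJoiner out = new StringJoiner(",\n", "{\n", "\n}");
        for (Field field : configuration.getClass().getFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;                              // skip constants and other static fields
            }
            Object value = field.get(configuration);
            String rendered = (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
            out.add("  \"" + field.getName() + "\": " + rendered);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IllegalAccessException {
        System.out.println(toDefaultConfig(new ExampleModelConfiguration()));
    }
}
```

Because a file generated this way is derived from the class, adding a new parameter field changes its content, which is consistent with the need to regenerate the default configuration and move it to bin/batch again in that case.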

What is happening when Simplatform is run?
The following actions are performed once the runModel.bat file is run. It is assumed that SPM is already integrated with a simulation model and that a simulation model configuration file exists in the batch folder:
1. Once initialised, SPM reads the content of the bin/batch folder to identify how many configuration files the platform needs to run. In what follows we describe the actions that SPM performs for a single configuration file, and thus a single simulation run (for more configuration files these actions are repeated in a loop).
2. SPM uses its configuration (configuration.txt) to determine which class holds the simulation model configuration (this is specified as the value of the modelConfigurationClassToInstantiate property). Based on this information, it reads the simulation model configuration text file located in bin/batch and instantiates it as the user-defined class.
3. SPM initialises the SimulationClock class, which acts as the simulation clock, and provides it with the simulation model configuration, in which the start and end dates of the simulation are given.
4. Using its configuration, SPM instantiates the class that interfaces between the simulation model and the platform (the name of this class is defined as the value of the modelClassToInstantiate property in the SPM configuration file).
5. Once the simulation interface class is instantiated, SPM invokes the three methods (registerEventWatchers, setModelConfiguration and setSimulationClock) that this class implements through the IPlatformSetup interface. At this point, all event watchers provided by the integration code are registered with SPM and the simulation code is provided with a reference to the simulation model configuration object.
6. The start simulation event is triggered (if the simulation model registered for this event). This event is fired before the simulation clock starts to tick and allows any final tasks to be performed before the simulation starts.
7. The clock starts to tick and the simulation is notified about the occurrence of the events it registered for (for example, hourly or daily notifications).
8. Once the simulation time runs to its end, the end simulation event is triggered, which allows the simulation code to clean up (for example, to export the simulation output collected during the simulation).
9. Simplatform stops the simulation and releases any resources associated with it.
10. If another configuration file is located in the bin/batch folder, the process starts again from step 2.
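The serial batch loop above can be summarised in a small sketch. TinyClock, PlatformSetupSketch, BatchRunnerSketch and CountingModel are hypothetical simplifications of SimulationClock, IPlatformSetup and SPM's runner; here a "configuration" is just a run length in hours and step 2 (reading the file) is omitted. One ordering detail is worth noting: because registerEventWatchers in the ModelInterface example uses the simulationClock field, the clock has to be supplied before the watchers are registered, and the sketch calls the methods in that order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, much simplified stand-in for SimulationClock.
final class TinyClock {
    final int hours;                                    // simulation length in hourly ticks
    final List<Runnable> onStart = new ArrayList<>();
    final List<Runnable> onHour = new ArrayList<>();
    final List<Runnable> onEnd = new ArrayList<>();

    TinyClock(int hours) { this.hours = hours; }

    void run() {
        onStart.forEach(Runnable::run);                 // step 6: start simulation event
        for (int h = 0; h < hours; h++) {
            onHour.forEach(Runnable::run);              // step 7: clock ticks
        }
        onEnd.forEach(Runnable::run);                   // step 8: end simulation event
    }
}

// Hypothetical mirror of the three IPlatformSetup methods.
interface PlatformSetupSketch {
    void setModelConfiguration(Object configuration);
    void setSimulationClock(TinyClock clock);
    void registerEventWatchers();
}

final class BatchRunnerSketch {
    // Runs one simulation per configuration, one after another.
    static void runBatch(List<Integer> configurations, Supplier<PlatformSetupSketch> modelFactory) {
        for (Integer hours : configurations) {             // step 1: one pass per configuration
            TinyClock clock = new TinyClock(hours);         // step 3: clock for this run
            PlatformSetupSketch model = modelFactory.get(); // step 4: interface class
            model.setModelConfiguration(hours);             // step 5: configuration and clock
            model.setSimulationClock(clock);                //         are supplied first,
            model.registerEventWatchers();                  //         then watchers registered
            clock.run();                                    // steps 6-8
        }                                                   // step 9: objects of this run are dropped
    }
}

// Example model that counts the hourly events it receives.
final class CountingModel implements PlatformSetupSketch {
    static final List<String> log = new ArrayList<>();
    private TinyClock clock;
    private int ticks;

    public void setModelConfiguration(Object configuration) { log.add("config " + configuration); }
    public void setSimulationClock(TinyClock clock) { this.clock = clock; }
    public void registerEventWatchers() {
        clock.onHour.add(() -> ticks++);
        clock.onEnd.add(() -> log.add("end after " + ticks + " hours"));
    }

    public static void main(String[] args) {
        BatchRunnerSketch.runBatch(List.of(24, 48), CountingModel::new);
        log.forEach(System.out::println);
    }
}
```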


L. Agent model design example
Agents, which are based on real community users, are the main driving force of the simulation. The agents need to be described in terms of attributes (parameters) that affect the way they behave, while their behaviour itself is governed by behavioural rules. At each 'tick' of a simulation (which could be an hour, day, week, etc.) each agent can perform some actions according to the rules and its own parameter values. A control loop manages the actions of each agent in each tick.
Using a concrete example of an agent model that captures user activity in the SAP SCN community, the sections below discuss possible agent parameters, behavioural rules, the control loop and the simulation outputs.
Contributor agent parameters
The following parameters control the behaviour of an individual agent. Their values would be determined from historical community data, in which the behaviour of each community user is analysed independently (using the Community Metrics Engine):
1. userId: the user ID, which maps onto the real user ID of an individual.
2. avgThreadCreationRate: the average thread creation rate, which defines the rate at which the agent produces new threads.
3. avgReplyRate: the average reply rate, which defines the rate at which the agent produces replies to threads.
4. avgReplyQuality: the average reply quality, which defines the quality of a response produced by the agent. This value is used to determine the number of reward points assigned to the generated reply.
5. newThreadAttraction: the new thread attraction, which defines the bias of a contributor agent towards the most recent threads, i.e. how strongly it responds to them before it focuses on older threads. Using this parameter it is possible to model the following behaviours:
a. Recently created threads have a higher probability of being selected for reply than older threads. This imitates point-hunting behaviour, where contributors go after new threads because it may be easier to produce replies to them and gain points.
b. Threads that get older and remain unanswered have a higher probability of being selected.
c. All threads have a uniform probability of being selected.
6. maxWorkload: the maximal workload, which defines the maximum number of thread replies the agent is able to produce in a given time period. This parameter limits the contributor's output in situations where the demand imposed by new threads outstrips the human capability to honour all requests.
7. currentWorkload: the current workload, which defines how heavily the agent is utilised with respect to its maximal workload. If currentWorkload reaches 100%, the agent is unable to produce further replies in the given simulation time interval (e.g., week, month).
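The three selection behaviours listed for newThreadAttraction can all be produced by a single weighting scheme. The sketch below assumes, purely for illustration, a weight of exp(-a * age) per thread, where a plays the role of newThreadAttraction: a > 0 favours recent threads (behaviour a), a < 0 favours older, still unanswered threads (behaviour b) and a = 0 gives uniform selection (behaviour c). The functional form and the class name are hypothetical, not taken from the model.

```java
import java.util.Random;

final class ThreadSelectionSketch {
    // Selection weight of a thread of the given age (e.g. hours since creation)
    // under attraction a: a > 0 favours recent threads, a < 0 favours older
    // ones, and a == 0 gives every thread the same weight.
    static double weight(double age, double attraction) {
        return Math.exp(-attraction * age);
    }

    // Roulette-wheel selection: returns the index of the chosen thread.
    static int select(double[] ages, double attraction, Random rng) {
        double total = 0.0;
        for (double age : ages) {
            total += weight(age, attraction);
        }
        double r = rng.nextDouble() * total;
        for (int i = 0; i < ages.length; i++) {
            r -= weight(ages[i], attraction);
            if (r < 0) {
                return i;
            }
        }
        return ages.length - 1;                // guards against floating-point round-off
    }

    public static void main(String[] args) {
        double[] ages = {1, 10, 100};          // hypothetical thread ages in hours
        Random rng = new Random(42);
        for (double a : new double[] {0.1, 0.0, -0.05}) {
            int[] picks = new int[ages.length];
            for (int n = 0; n < 10_000; n++) {
                picks[select(ages, a, rng)]++;
            }
            System.out.printf("attraction %5.2f -> picks %d %d %d%n", a, picks[0], picks[1], picks[2]);
        }
    }
}
```

A single signed parameter is convenient here because all three behaviours come from one continuous knob that can be fitted to historical data.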
Agent behavioural rules
Some agent parameters can stay fixed during the course of a simulation, whilst the values of others must change dynamically for the simulation to reflect the real community realistically. For example, the value of the avgReplyRate parameter, which reflects the activity level of an agent producing thread replies, is not fixed but varies according to the behavioural rule the agent is configured to use. The role of such a rule is to alter the agent's activity level during the simulation in the same way that a real community user alters theirs based on the perceived state of the community.
Identifying the correct behavioural rules is the most critical part of the simulation and is done in two complementary ways:
1. Correlation analysis. With the help of historical community data and the Community Metrics Engine, past community behaviour is analysed in order to identify community features that correlate with changes in contributor agent activity.
2. Validation through simulation. Once candidate behavioural rules have been selected based on the correlation analysis, their influence on the community dynamics is evaluated using agent-based simulation. During this process, agents are initialised from historical community data and provided with the behavioural rules under evaluation. The output of the simulation, including changes in agent activity levels, is then compared with the historical community output. This procedure is repeated for different historical time periods, and if a close match is found between the real and simulated community responses, the rule is considered valid.
Note that the validity of a given behavioural rule in a given time period does not guarantee that it will always be valid. For example, the scale and goals of the community may change over time, requiring community users to change their behaviour. Consequently, the rules that capture their activities would need to be adjusted.
When selecting which community features may correlate with agent activity, and thus be encapsulated as behavioural rules, we focus primarily on the simplest, reactive rules and, only if necessary, will build more complex representations of user behaviour on top of them. The following candidate rules will undergo the two-step validation process described above:
1. Momentum-based rule. Agent activity changes according to the momentum identified in the historical data. For example, if according to the historical data the agent's activity is continually dropping (or increasing), this trend is continued within the simulation.
2. Efficiency-based rule. The agent's activity increases in proportion to the number of points it gained.
3. Agent productivity-based rule. The agent's activity increases in proportion to the number of replies it produced.
4. Collaborator/Competitor connectivity-based rule. The agent's activity increases in proportion to the number of other contributors the agent collaborated with (i.e. who provided responses to the same thread).
5. Demand-based rule. The agent's activity increases in proportion to the demand generated by newly created threads.
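Because each candidate rule only maps what an agent observed to a new avgReplyRate, the rules can be expressed as interchangeable strategies, which makes it easy to swap them during the validation-through-simulation step. The sketch below assumes a simple multiplicative form, rate × (1 + K × driver), for the proportional rules and rate × (1 + trend) for the momentum rule; the constant K, the field names and all class names are hypothetical and would have to be fitted or replaced during the correlation analysis.

```java
// Hypothetical snapshot of what an agent observed during the previous tick.
final class AgentObservation {
    final double avgReplyRate;     // current activity level
    final double historicTrend;    // fractional change per tick seen in history data
    final double pointsGained;
    final double repliesProduced;
    final int collaborators;       // contributors who replied to the same threads
    final int newThreads;          // demand: threads created in the last tick

    AgentObservation(double avgReplyRate, double historicTrend, double pointsGained,
                     double repliesProduced, int collaborators, int newThreads) {
        this.avgReplyRate = avgReplyRate;
        this.historicTrend = historicTrend;
        this.pointsGained = pointsGained;
        this.repliesProduced = repliesProduced;
        this.collaborators = collaborators;
        this.newThreads = newThreads;
    }
}

// A behavioural rule returns the agent's updated avgReplyRate.
interface BehaviouralRule {
    double update(AgentObservation o);
}

final class CandidateRules {
    // Sensitivity of the rate to each driver: an assumed value that would be
    // estimated during correlation analysis and checked by simulation.
    static final double K = 0.01;

    // Increase proportional to the driver; the rate never becomes negative.
    static double grow(double rate, double driver) {
        return Math.max(0.0, rate * (1.0 + K * driver));
    }

    static final BehaviouralRule MOMENTUM =
            o -> Math.max(0.0, o.avgReplyRate * (1.0 + o.historicTrend));
    static final BehaviouralRule EFFICIENCY   = o -> grow(o.avgReplyRate, o.pointsGained);
    static final BehaviouralRule PRODUCTIVITY = o -> grow(o.avgReplyRate, o.repliesProduced);
    static final BehaviouralRule CONNECTIVITY = o -> grow(o.avgReplyRate, o.collaborators);
    static final BehaviouralRule DEMAND       = o -> grow(o.avgReplyRate, o.newThreads);

    public static void main(String[] args) {
        AgentObservation o = new AgentObservation(2.0, -0.1, 10, 5, 3, 20);
        System.out.println("momentum:     " + MOMENTUM.update(o));
        System.out.println("efficiency:   " + EFFICIENCY.update(o));
        System.out.println("productivity: " + PRODUCTIVITY.update(o));
        System.out.println("connectivity: " + CONNECTIVITY.update(o));
        System.out.println("demand:       " + DEMAND.update(o));
    }
}
```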
Contributor agent control loop
The following actions define the internal agent control loop, which is triggered at constant time intervals during the simulation:
1. Utilisation check. If the agent is not over-utilised (current workload level < 100%), it proceeds to the next step. Otherwise it becomes inactive until its current workload level is reset.
2. Activity level update. The value of the avgReplyRate parameter is adjusted according to the behavioural rule the agent is using.
3. Activation. The agent is triggered to execute three actions: creation of new threads, creation of replies and assignment of points to replies. The probability of executing each of these actions in the given simulation time interval is determined by the avgThreadCreationRate and avgReplyRate parameter values.
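The three steps of the control loop map directly onto a per-tick method. The sketch below is illustrative only: it assumes that the two rates are per-tick probabilities, that currentWorkload is measured as replies produced in the current workload period relative to maxWorkload, and it omits point assignment. The class name and the rule passed in main are hypothetical.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

final class ContributorAgentSketch {
    final double avgThreadCreationRate; // per-tick probability of creating a thread (assumption)
    double avgReplyRate;                // per-tick probability of replying (assumption)
    final int maxWorkload;              // replies the agent can produce per workload period
    int repliesThisPeriod;              // basis for currentWorkload
    int threadsCreated;
    int repliesCreated;

    ContributorAgentSketch(double threadRate, double replyRate, int maxWorkload) {
        this.avgThreadCreationRate = threadRate;
        this.avgReplyRate = replyRate;
        this.maxWorkload = maxWorkload;
    }

    // currentWorkload as a percentage of maxWorkload.
    double currentWorkload() {
        return 100.0 * repliesThisPeriod / maxWorkload;
    }

    // One pass of the control loop for a single simulation tick.
    void tick(DoubleUnaryOperator rule, Random rng) {
        if (currentWorkload() >= 100.0) {
            return;                                        // 1. over-utilised: stay inactive
        }
        avgReplyRate = rule.applyAsDouble(avgReplyRate);   // 2. activity level update
        if (rng.nextDouble() < avgThreadCreationRate) {    // 3. activation
            threadsCreated++;
        }
        if (rng.nextDouble() < avgReplyRate) {
            repliesCreated++;
            repliesThisPeriod++;
        }
        // point assignment to replies is omitted from this sketch
    }

    // Called at the end of each workload period (e.g. week or month).
    void resetWorkload() {
        repliesThisPeriod = 0;
    }

    public static void main(String[] args) {
        ContributorAgentSketch agent = new ContributorAgentSketch(0.05, 0.5, 3);
        Random rng = new Random(42);
        DoubleUnaryOperator slowDecline = rate -> rate * 0.99;   // hypothetical rule
        for (int period = 0; period < 4; period++) {
            for (int hour = 0; hour < 24; hour++) {
                agent.tick(slowDecline, rng);
            }
            agent.resetWorkload();
        }
        System.out.printf("threads %d, replies %d, final avgReplyRate %.3f%n",
                agent.threadsCreated, agent.repliesCreated, agent.avgReplyRate);
    }
}
```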
Simulation output
During the simulation, data on the behaviour and performance of agents are collected and stored for post-simulation analysis. The collected data contain information such as:
1. Activity change.
2. Number of created threads.
3. Number of resolved threads.
4. Points assigned to agents based on their replies.
These values are recorded for each agent at every simulation time-step, which allows in-depth analysis of each agent's behaviour over the whole simulation.
Based on this information, the model can be applied to:
1. Detect activity drops of contributor agents. To do so, the final activity value of each agent (its avgReplyRate parameter value) is compared with the value set at the start of the simulation (obtained from the historical community data). This comparison, conducted for each agent, gives the activity change (for example, a 20% decrease in activity relative to the initial activity level).
The credibility and accuracy of the results can be verified and improved by repeating the simulation multiple times. This allows error bounds on the predicted activity change to be determined, for example a 20% activity drop with a ±5% error bound.
2. Quantify the performance of the community. The following performance metrics are provided:
a. Average thread resolution time. This is the average time it takes to resolve (answer) a newly created thread, measured over a selected simulation interval.
b. Average points gained. This is the average number of points collected by agents that responded to threads, measured over a selected simulation interval.
c. Community throughput. This is the mean throughput of the community, calculated by dividing the number of answered threads by the number of created threads during the specified simulation time interval.
3. Estimate the impact of a drop in contributor agent activity. This is achieved in two complementary ways:
a. The simulation is run in 'dynamic activity' mode, where the agent activity level varies over simulation time according to the agent's behavioural rules and, as a result, the simulation produces a forecast of agent activity change. The output of the simulation is then analysed and the community performance identified. This 'forecasted' performance can be compared with the historical performance of the community over a selected period of time.
b. The simulation is run in 'static activity' mode, where the agent activity level is set by the administrator and held constant over the simulation time. For example, the community administrator may select one or more community users and specify that during the simulation their activity drops by X% relative to their latest historical activity value. In this case, the model does not produce any activity change forecasts but relies on the fixed configuration of user activities. This makes it possible to determine, for example, that decreasing a specific contributor's activity by 50% reduces the number of resolved threads by 10% and extends thread resolution time by 10 days. The accuracy of the prediction and the error bounds can be calculated in the same way as for the activity level prediction.
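The output measures above reduce to a few simple calculations, sketched below for illustration. The method names are hypothetical, and the error bound is computed here as an approximate 95% interval (mean ± 1.96 standard errors across replicated runs); this is only one reasonable choice, since the text does not specify how the bounds are derived.

```java
final class SimulationOutputSketch {
    // Percentage change of avgReplyRate between the start and end of a run.
    static double activityChangePercent(double initialRate, double finalRate) {
        return 100.0 * (finalRate - initialRate) / initialRate;
    }

    // Mean over replicated runs and the half-width of an approximate 95%
    // interval (normal approximation); needs at least two replications.
    static double[] meanAndErrorBound(double[] replications) {
        int n = replications.length;
        double mean = 0.0;
        for (double v : replications) mean += v;
        mean /= n;
        double sumSq = 0.0;
        for (double v : replications) sumSq += (v - mean) * (v - mean);
        double sd = Math.sqrt(sumSq / (n - 1));            // sample standard deviation
        return new double[] {mean, 1.96 * sd / Math.sqrt(n)};
    }

    // Average time from thread creation to resolution.
    static double averageResolutionTime(double[] resolutionTimes) {
        double sum = 0.0;
        for (double t : resolutionTimes) sum += t;
        return sum / resolutionTimes.length;
    }

    // Community throughput: answered threads divided by created threads.
    static double throughput(int answeredThreads, int createdThreads) {
        return (double) answeredThreads / createdThreads;
    }

    public static void main(String[] args) {
        // Hypothetical final avgReplyRate values of one agent in four replicated runs.
        double initial = 10.0;
        double[] finals = {8.2, 7.8, 8.0, 8.0};
        double[] changes = new double[finals.length];
        for (int i = 0; i < finals.length; i++) {
            changes[i] = activityChangePercent(initial, finals[i]);
        }
        double[] mb = meanAndErrorBound(changes);
        System.out.printf("activity change %.1f%% +/- %.1f%%%n", mb[0], mb[1]);
        System.out.printf("throughput %.2f%n", throughput(45, 50));
    }
}
```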



M. Gibbs sampler example
To illustrate numerically the current application of the Gibbs Sampler, it is assumed that there are ten users in the sample community and that the aim is to estimate the probability of risk occurrence for each user of the sample community, where the risk is defined as a decrease in user activity of 20% or more from time period (w-1) to time period w. Under the additional assumption that all users are independent, the Gibbs Sampler is well suited to estimating these probabilities because the risk event is binary in nature: either the user's activity decreases by 20% or more from time period (w-1) to time period w (the risk occurs and is flagged as 1), or it does not (the risk does not occur and the binary event is recorded as 0); see the table below.
User activity for each time period (tp.), and binary response y for tp. (w-1) to w:

User (w-10)  (w-9)  (w-8)  (w-7)  (w-6)  (w-5)  (w-4)  (w-3)  (w-2)  (w-1)      w      y
   1      2      2      6     14     11     19     20      3      5      6      5      0
   2     18     14      4     27     19     13     28      9     20     13     16      0
   3     52     56     42     60     53     55     52     48     67     55     57      0
   4     10      7      8      4     14      0      0      8     10      7      8      0
   5      1      0      3      2      1      5      8      8     13     17     23      0
   6    203    189    193    182    205    209    262    300    239      0    198      0
   7     11      9      8      4      2      4      7     10      5      4      0      1
   8     28     18     10      3     11      8      7      6      3      5      1      1
   9    120     82     94     78     70     63     74     34     23     15     11      1
  10      0      1      2      0      0      1     11      0      0      1      0      1

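From the above table, the required input is drawn in the form of a set of observed binary responses and, separately, a set of observations of the chosen covariates. That is, the observed binary responses given in the final column of the table are stored in the vector $\mathbf{y}$, and the columns corresponding to time periods (w-10) through (w-1) are taken as the observations of the covariates and stored in the matrix $\mathbf{X} = [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{10}]$, where $\mathbf{x}_0 = (1, 1, \ldots, 1)^{T}$ (i.e. an additional column of 1's is added to represent the 'intercept' of the model). The model studied is then

$$\mathbf{p} = \Phi(\mathbf{X}\boldsymbol{\beta}),$$

where $\Phi$ is the normal cumulative distribution function, $\mathbf{p}$ is the vector of probabilities of risk occurrence for each user, and $\beta_j$ is the unknown elasticity coefficient (which requires estimation) of $\mathbf{x}_j$ (for $j = 0, 1, \ldots, 10$), where $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_{10})^{T}$. Thus the above gives

$$\mathbf{y} = (0, 0, 0, 0, 0, 0, 1, 1, 1, 1)^{T},$$

$$\mathbf{X} = \left(\begin{array}{rrrrrrrrrrr}
1 & 2 & 2 & 6 & 14 & 11 & 19 & 20 & 3 & 5 & 6 \\
1 & 18 & 14 & 4 & 27 & 19 & 13 & 28 & 9 & 20 & 13 \\
1 & 52 & 56 & 42 & 60 & 53 & 55 & 52 & 48 & 67 & 55 \\
1 & 10 & 7 & 8 & 4 & 14 & 0 & 0 & 8 & 10 & 7 \\
1 & 1 & 0 & 3 & 2 & 1 & 5 & 8 & 8 & 13 & 17 \\
1 & 203 & 189 & 193 & 182 & 205 & 209 & 262 & 300 & 239 & 0 \\
1 & 11 & 9 & 8 & 4 & 2 & 4 & 7 & 10 & 5 & 4 \\
1 & 28 & 18 & 10 & 3 & 11 & 8 & 7 & 6 & 3 & 5 \\
1 & 120 & 82 & 94 & 78 & 70 & 63 & 74 & 34 & 23 & 15 \\
1 & 0 & 1 & 2 & 0 & 0 & 1 & 11 & 0 & 0 & 1
\end{array}\right).$$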

Observe that k, the number of covariates (including the intercept column), has value 11 and N, the number of users in the sample community, has value 10. The values t* and m are set to 10,000 and 110,000 respectively. Thus, following the previously given pseudocode, two matrices are created to store the successive estimates produced by the sampler.
The first calculation is to make an initial estimate $\boldsymbol{\beta}^{(0)}$ of the vector of elasticity coefficients $\boldsymbol{\beta}$, which is taken to be the least squares estimate:

$$\boldsymbol{\beta}^{(0)} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y},$$

where $(\cdot)^{-1}$ denotes the inverse of the enclosed matrix and $\mathbf{X}^{T}$ is the transpose of $\mathbf{X}$. The result of this matrix multiplication is

$$\boldsymbol{\beta}^{(0)} = (0.134558,\ -0.072365,\ 0.144311,\ -0.043065,\ -0.027624,\ 0.029829,\ -0.068586,\ 0.072421,\ 0.112941,\ -0.184172,\ 0.079217)^{T}$$
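(to 6 decimal places (d.p.)), which is the base for the subsequent iterative estimates. With this estimate of $\boldsymbol{\beta}$, the next vector to calculate is the first estimate of the latent data, $\mathbf{z}^{(1)}$, which is calculated as

$$z_i^{(1)} = \begin{cases}
\mu_i + \Phi^{-1}\big(\Phi(-\mu_i) + u_i\,[1 - \Phi(-\mu_i)]\big), & \text{if } y_i = 1,\\
\mu_i + \Phi^{-1}\big(u_i\,\Phi(-\mu_i)\big), & \text{if } y_i = 0,
\end{cases} \qquad i = 1, \ldots, N,$$

where $\Phi^{-1}$ is the inverse normal cumulative distribution function, $\Phi$ is the normal cumulative distribution function, $u_i$ is a uniform variate on the unit interval and $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}^{(0)}$. Thus, for $\mathbf{X}$ and $\boldsymbol{\beta}^{(0)}$ as previously stated,

$$\boldsymbol{\mu} = (4.148903\times10^{-13},\ 7.327472\times10^{-15},\ -2.167155\times10^{-13},\ 3.026468\times10^{-13},\ -8.992806\times10^{-14},\ -3.765876\times10^{-13},\ 1.000000,\ 1.000000,\ 1.000000,\ 1.000000)^{T}$$

(given in scientific form to 6 d.p.), which, when substituted into the above equation for $\mathbf{z}^{(1)}$, leads to the following initial estimate of the latent data (this estimate incorporates sampling from the uniform distribution):

$$\mathbf{z}^{(1)} = (-1.075124,\ -0.746947,\ -0.188274,\ -0.612111,\ -1.221588,\ -2.559512,\ 1.451082,\ 0.640371,\ 2.436450,\ 1.837995)^{T}$$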
(given to 6 d.p.). Subsequently, with this estimate of the latent data $\mathbf{z}^{(1)}$, the next successive estimate of $\boldsymbol{\beta}$, denoted $\boldsymbol{\beta}^{(1)}$, is calculated by drawing a random sample of size one from the multivariate normal distribution

$$\boldsymbol{\beta}^{(1)} \sim N_k\big(\hat{\boldsymbol{\beta}}, \mathbf{V}\big),$$

where

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{z}^{(1)}, \qquad \mathbf{V} = (\mathbf{X}^{T}\mathbf{X})^{-1}.$$

Therefore, following the calculation of the previous two formulas,

$$\hat{\boldsymbol{\beta}} = (0.135068,\ -0.245645,\ 0.433919,\ 0.026144,\ -0.025896,\ -0.018790,\ -0.161148,\ 0.120588,\ 0.067494,\ -0.207040,\ 0.051397)^{T}$$

(to 6 d.p.) and

$$\mathbf{V} = \left(\begin{array}{rrrrrrrrrrr}
0.037308 & -0.012128 & 0.008765 & -0.014277 & 0.019834 & 0.011626 & -0.010740 & -0.003028 & 0.043862 & -0.052692 & 0.019309 \\
-0.012128 & 0.018586 & -0.021673 & -0.000688 & -0.000910 & -0.007980 & 0.008795 & -0.004576 & -0.007984 & 0.016851 & -0.004620 \\
0.008765 & -0.021673 & 0.032868 & -0.000522 & -0.003486 & 0.005894 & -0.008701 & 0.007212 & 0.002039 & -0.012469 & 0.001523 \\
-0.014277 & -0.000688 & -0.000522 & 0.011854 & -0.009649 & -0.002986 & -0.000229 & 0.003679 & -0.018669 & 0.020997 & -0.006671 \\
0.019834 & -0.000910 & -0.003486 & -0.009649 & 0.021551 & -0.000187 & -0.005004 & -0.007165 & 0.028949 & -0.029131 & 0.010004 \\
0.011626 & -0.007980 & 0.005894 & -0.002986 & -0.000187 & 0.013126 & -0.004281 & 0.000896 & 0.008926 & -0.015083 & 0.005139 \\
-0.010740 & 0.008795 & -0.008701 & -0.000229 & -0.005004 & -0.004281 & 0.011815 & -0.003324 & -0.010108 & 0.013129 & -0.005226 \\
-0.003028 & -0.004576 & 0.007212 & 0.003679 & -0.007165 & 0.000896 & -0.003324 & 0.006532 & -0.007766 & 0.005409 & -0.001576 \\
0.043862 & -0.007984 & 0.002039 & -0.018669 & 0.028949 & 0.008926 & -0.010108 & -0.007766 & 0.056159 & -0.062784 & 0.023388 \\
-0.052692 & 0.016851 & -0.012469 & 0.020997 & -0.029131 & -0.015083 & 0.013129 & 0.005409 & -0.062784 & 0.075353 & -0.027077 \\
0.019309 & -0.004620 & 0.001523 & -0.006671 & 0.010004 & 0.005139 & -0.005226 & -0.001576 & 0.023388 & -0.027077 & 0.011037
\end{array}\right)$$

(given to 6 d.p.).
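Consequently, the first successive estimate $\boldsymbol{\beta}^{(1)}$ of $\boldsymbol{\beta}$ is

$$\boldsymbol{\beta}^{(1)} = (0.138628,\ -0.09552,\ 0.275661,\ -0.04935,\ 0.13545,\ -0.01449,\ -0.27305,\ 0.01733,\ 0.140134,\ -0.16117,\ 0.044279)^{T}$$

(to 6 d.p.). This process of successive estimation is continued until m (= 110,000) estimates of $\boldsymbol{\beta}$ have been produced, which results in the matrix of stored estimates of $\boldsymbol{\beta}$ (one column per iteration; intermediate iterations are not shown):

$$\begin{array}{c|rcrcr}
 & t = 0 & \cdots & t = t^{*}\ (=10{,}000) & \cdots & t = m\ (=110{,}000) \\ \hline
\beta_0 & 0.134558 & \cdots & 5.164498 & \cdots & 36.286931 \\
\beta_1 & -0.072365 & \cdots & -33.210516 & \cdots & -114.632290 \\
\beta_2 & 0.144311 & \cdots & 52.347974 & \cdots & 213.341042 \\
\beta_3 & -0.043065 & \cdots & 13.236736 & \cdots & -8.506678 \\
\beta_4 & -0.027624 & \cdots & -19.586116 & \cdots & 21.214640 \\
\beta_5 & 0.029829 & \cdots & 0.072164 & \cdots & -4.293888 \\
\beta_6 & -0.068586 & \cdots & -11.253028 & \cdots & -57.887863 \\
\beta_7 & 0.072421 & \cdots & 16.174887 & \cdots & 17.022278 \\
\beta_8 & 0.112941 & \cdots & -6.685394 & \cdots & 16.461390 \\
\beta_9 & -0.184172 & \cdots & -9.018311 & \cdots & -67.811789 \\
\beta_{10} & 0.079217 & \cdots & 1.955769 & \cdots & -24.109160
\end{array}$$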

(to 6 d.p.). From this matrix a subset is taken such that all estimates prior to the burn-in period are disregarded, which results in the matrix of estimates

$$\begin{array}{c|rcr}
 & t = t^{*}\ (=10{,}000) & \cdots & t = m\ (=110{,}000) \\ \hline
\beta_0 & 5.164498 & \cdots & 36.286931 \\
\beta_1 & -33.210516 & \cdots & -114.632290 \\
\beta_2 & 52.347974 & \cdots & 213.341042 \\
\beta_3 & 13.236736 & \cdots & -8.506678 \\
\beta_4 & -19.586116 & \cdots & 21.214640 \\
\beta_5 & 0.072164 & \cdots & -4.293888 \\
\beta_6 & -11.253028 & \cdots & -57.887863 \\
\beta_7 & 16.174887 & \cdots & 17.022278 \\
\beta_8 & -6.685394 & \cdots & 16.461390 \\
\beta_9 & -9.018311 & \cdots & -67.811789 \\
\beta_{10} & 1.955769 & \cdots & -24.109160
\end{array}$$

(given to 6 d.p.). The mean of each row of this matrix is taken, and the resulting vector $\bar{\boldsymbol{\beta}}$, where here

$$\bar{\boldsymbol{\beta}} = (57.857737,\ -66.256950,\ 121.871104,\ -28.864078,\ 13.831854,\ 10.495228,\ -22.267094,\ 8.448821,\ 52.704587,\ -93.511271,\ 9.484661)^{T}$$

(to 6 d.p.), is taken to be the single estimate of the vector of elasticity coefficients $\boldsymbol{\beta}$. With this vector and the matrix $\mathbf{X}$ it is then possible to estimate the probability that each individual user will decrease in activity by 20% or more from time period (w-1) to time period w via

$$\hat{\mathbf{p}} = \Phi(\mathbf{X}\bar{\boldsymbol{\beta}}),$$

where $\Phi$ is the normal cumulative distribution function. Hence the application of the Gibbs Sampler estimates the probability of risk occurrence to be

$$\hat{\mathbf{p}} = (0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 1,\ 1,\ 1,\ 1)^{T},$$

where the first row of $\hat{\mathbf{p}}$ gives the probability of the defined risk occurring for the first observed user, and so on. By taking the estimated probability of the risk event as the estimate of the binary response $\mathbf{y}$, the Gibbs Sampler model residuals, $\mathbf{e}_{G}$, are found to be

$$\mathbf{e}_{G} = \hat{\mathbf{p}} - \mathbf{y} = (0, 0, 0, 0, 0, 0, 1, 1, 1, 1)^{T} - (0, 0, 0, 0, 0, 0, 1, 1, 1, 1)^{T} = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^{T}.$$

However, for the baseline model, in which the single estimate of $\boldsymbol{\beta}$ is instead taken to be the least squares estimate (i.e. the initial estimate $\boldsymbol{\beta}^{(0)}$), the probabilities of risk occurrence for each of the ten users were estimated to be

$$\hat{\mathbf{p}}_{B} = (4.148903\times10^{-13},\ 7.327472\times10^{-15},\ -2.167155\times10^{-13},\ 3.026468\times10^{-13},\ -8.992806\times10^{-14},\ -3.765876\times10^{-13},\ 1.000000,\ 1.000000,\ 1.000000,\ 1.000000)^{T}$$

(to 6 d.p.). Consequently, the residuals with respect to the baseline model, $\mathbf{e}_{B}$, are similarly found to be

$$\mathbf{e}_{B} = \hat{\mathbf{p}}_{B} - \mathbf{y} = (4.148903\times10^{-13},\ 7.327472\times10^{-15},\ -2.167155\times10^{-13},\ 3.026468\times10^{-13},\ -8.992806\times10^{-14},\ -3.765876\times10^{-13},\ 8.721912\times10^{-13},\ -3.097522\times10^{-14},\ -1.676437\times10^{-13},\ -2.680078\times10^{-13})^{T}$$

(to 6 d.p.), where, as stated before, each row gives the information regarding an individual user's behaviour; e.g. there is approximately no error in the baseline model's estimate of the first observed user's behaviour.
Finally, from the vectors $\mathbf{e}_{G}$ and $\mathbf{e}_{B}$ (i.e. the residuals of the Gibbs Sampler and baseline models respectively), the mean squared error, $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} e_i^{2}$, is calculated. The purpose of this calculation is to judge and compare the level of error in both models via a single pair of numbers. For this example the following values were observed (given to 6 d.p.):

Model            MSE value
Gibbs Sampler    0
Baseline (LS)    1.322263E-25
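The MSE values in the table can be checked directly from the residual vectors. The sketch below computes MSE = (1/N) Σ e_i² for the baseline residuals exactly as printed above and for the all-zero Gibbs Sampler residuals; the class name is arbitrary.

```java
final class ResidualMse {
    // Mean squared error of a residual vector: (1/N) * sum of squared residuals.
    static double mse(double[] residuals) {
        double sum = 0.0;
        for (double e : residuals) {
            sum += e * e;
        }
        return sum / residuals.length;
    }

    public static void main(String[] args) {
        // Baseline (least squares) residuals e_B as printed in the example.
        double[] baseline = {
            4.148903e-13, 7.327472e-15, -2.167155e-13, 3.026468e-13, -8.992806e-14,
            -3.765876e-13, 8.721912e-13, -3.097522e-14, -1.676437e-13, -2.680078e-13
        };
        double[] gibbs = new double[10];   // all Gibbs Sampler residuals are zero
        System.out.printf("Gibbs Sampler MSE: %.6e%n", mse(gibbs));
        System.out.printf("Baseline (LS) MSE: %.6e%n", mse(baseline));
    }
}
```

The baseline figure is dominated by the seventh user's residual (about 8.7E-13), whose square alone contributes more than half of the sum.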

Therefore, this example illustrates the importance of the quality of the successive estimates of the vector $\boldsymbol{\beta}$, as is evident from comparing the above MSE values: the lower the value, the lower the level of error in the model. Not all graphs produced are included in this document; those included below are the most interesting from a business perspective. The first is a graphical representation of the estimated probability of risk occurrence for the sample population of users (given per user in the vector $\hat{\mathbf{p}}$ above; see Figure 32). It shows that within the sample population a user is more likely not to decrease in activity. However, there appears to be a 40% chance of a user within the sample population decreasing in activity by 20% or more from time period (w-1) to time period w, which indicates that the risk of a 20% decrease in activity is of concern within this sample population. Were this sample population of users not simulated but taken from the population of users of the SAP, IBM or Polecat online communities, the histogram would imply that the defined risk could be of concern across the entire population of users.

Figure 32: Histogram of the Gibbs Sampler estimated probabilities of 'success' for the
given sample population of users, where 'success' is defined as the occurrence of risk
(risk being a 20% or greater drop in user activity from one time period to the next).
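The binary outcome modelled here (a drop of 20% or more in a user's activity from one time period to the next) has to be derived from per-period activity data before fitting. The sketch below uses hypothetical helper names and assumes activity is held as a list of non-negative counts per user per period; it is not taken from the report's implementation. The rate it returns is a raw observed frequency, not the Gibbs Sampler's model-based probability summarised in Figure 32.

```python
from typing import List

DROP_THRESHOLD = 0.20  # risk: activity falls by 20% or more between consecutive periods


def risk_indicators(activity: List[float], threshold: float = DROP_THRESHOLD) -> List[int]:
    """Return one 0/1 flag per consecutive pair of periods (t, t+1).

    The flag is 1 when activity fell by at least `threshold` as a fraction of the
    period-t level. Assumption: a period with zero activity cannot show a relative
    drop, so the following transition is coded 0.
    """
    flags = []
    for prev, curr in zip(activity, activity[1:]):
        dropped = prev > 0 and (prev - curr) / prev >= threshold
        flags.append(1 if dropped else 0)
    return flags


def observed_risk_rate(activity: List[float], threshold: float = DROP_THRESHOLD) -> float:
    """Fraction of a user's transitions flagged as risk events (0.0 if fewer than two periods)."""
    flags = risk_indicators(activity, threshold)
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical activity counts (posts per period) for one user.
user_activity = [10, 8, 8, 5, 6]
print(risk_indicators(user_activity))     # [1, 0, 1, 0]
print(observed_risk_rate(user_activity))  # 0.5
```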
The only other graphic presented for this example illustrates the goodness of
fit of the models, i.e. the error within both the Gibbs Sampler and baseline
models, shown in Figure 33. This plot displays the previously calculated
residual vectors of the Gibbs Sampler and baseline models, together with the
corresponding MSE values. The illustration shows that all Gibbs Sampler
residuals are 0 and the resulting MSE is exactly 0; hence there is no residual
error within the estimates of the probability of risk occurrence produced by
the Gibbs Sampler. The same cannot be said of the baseline model's residuals:
its MSE was seen above to be only approximately (not exactly) zero, indicating
the presence of error within the estimates produced by this less sophisticated
model.

Figure 33: Scatter plot comparing the goodness of fit of the baseline and Gibbs
Sampler probit models with regard to the previously given data matrices.
So far in this example, the estimates of the probability of risk occurrence have
been found and presented (for the Gibbs Sampler), and the reliability of these
estimates has been illustrated. Finally, with respect to this example, it is
concluded that the estimates of the probability of risk occurrence for each user
of the sample population produced by the Gibbs Sampler contain no residual error,
and hence these estimates can be relied upon.
Note that in the current implementation of the Gibbs Sampler, it is these two
graphs, rather than the numerous vectors above, that are output to a PDF file
for interpreting the results.


N. Risk and opportunity questionnaire
1: Your details (optional)
This section aims at collecting information about the person filling in this
questionnaire:
1- Name:
2- Email:
3- Company:
4- Position:
5- Relation to the community host:
2: Online community description
This section aims at collecting information about the online community in
question:
1- Name of online community:
2- Description of online community (2 lines):
3- Link/URL of the online community:
4- Type (please tick one of the following boxes):
Internal
Internal + Partners
Public
Other, more details:
5- Approximate number of members:
6- What technology is used by this community? Please rank these
technologies on a scale of 1-5 according to what is mainly used, 5 being the most used.

1 2 3 4 5 No answer

Forums

Blogs

Social networks (e.g. Facebook)

Wiki

Document/file sharing

Bookmarks

Rating of user generated content


If other technologies are used in the community, please specify:

3: Cost/benefit (ROI)
This section aims at identifying the value of the online community in question
to the business/company.

1- Where do you see the main benefit in operating/hosting the online
community? Please rank these objectives on a scale of 1-5, 5 being the
most important.

1 2 3 4 5 No answer

Customer support

Developers support

Ideas generation

Opinion research

Spread of word of mouth

Market research

Advertising and marketing

Reputation management

Employee communication

Finding experts

Fostering collaboration

Public relations

New product development

Connecting people

Other, please specify:



2- What parameters do you use to measure/quantify the business impact
of the online community? Please rank these on a scale of 1-5, 5 being
the most important.

1 2 3 4 5 No answer

Number of community members

Number of platform visits per day

Sum of user time spent on platform

Customer support load

Sales figures

Productivity

Work processes outcomes

Other, please specify:

3- What metrics do you consider important for determining the health
of the online community? (e.g. the number of users, response time, etc.)

4- What community analysis software do you use (if any), and what is it
used for?

5- How many moderators/administrators are allocated for
operating/managing the community?

6- How many posts/requests do the moderator(s) treat/solve on average
per day?

7- How much time do moderators spend on duty on average per day?



4: Online community: observable user behaviour
This section aims at identifying the categories/types of users/subscribers of the
online community in question. Below are definitions for the terminology used
in this section:
• Active user: somebody who logs in and uses the community regularly.
• Inactive user: somebody who does not log in regularly to use the community.
• Moderator/admin: somebody who can edit/delete content generated by other
users, and who may be allowed to take actions such as inviting/adding users to a
private community or disabling/banning users. This could apply to sub-communities.
• Banned/disabled user: somebody who is not allowed to use the online community,
indicating that the user has not behaved in accordance with the rules of the
community.
• Contributor: somebody who answers questions and posts content to the online
community.
• Consumer: somebody who may post questions and search for information.

1- What categories of users exist within this online community? (tick the
box(es) that apply and indicate the percentage if possible)
active users percentage of the community:
inactive users percentage of the community:
moderators/admins percentage of the community:
banned/disabled percentage of the community:

2- Are there existing tools or functionality that make it possible to identify
the following categories of users within this online community? (tick the
box(es) that apply)
active contributors percentage of the community:
top contributors percentage of the community:
active consumers percentage of the community:
top consumers percentage of the community:
experts in particular areas percentage of the community:
spammers percentage of the community:

Others:



3- What do you think motivates the users to participate actively in this
community? Please indicate the strength of each motivational aspect
with a value from 1 to 5, 5 being the strongest.

1 2 3 4 5 No answer

Getting rewards and appreciation
by other community members

Gaining knowledge and information

Sharing knowledge and information

Seeking opportunities to interact
with people or to participate in
social activities

Self-expression

Obligation / participation as part of
internal workflows

4- What other incentives do you think would motivate users to participate
actively in this community?

5- What do you think could discourage users from participating actively in
this community?

6: Risks (reduce value)
Please fill in the following table with examples of risks you consider likely within
the online community in question. Below are some definitions and examples
to help you complete this task.
• Risk = the possibility of suffering harm or loss. Risk refers to a
situation where a person could do something undesirable, or a natural
occurrence could cause an undesirable outcome, resulting in a
negative impact or consequence.
• Likelihood = how likely these risks are to arise: Low (L), Medium (M), High
(H)
• Detection method = how these can be detected or assessed
• Counter-measures = what actions would be taken on the community to
prevent this from happening. More than one action may be planned.
• Response delay = how fast action needs to be taken (seconds,
hours, days, etc.)
• Effectiveness = does it completely solve the problem?

Risk description | Likelihood | Detection method | Counter-measures | Response delay | Effectiveness
Experts leave/are lost because they get overloaded with questions from other users | M | Inactivity over a period of time; account closure. | Encourage more experts and sharing of expertise through a points system | Months. It requires setting up a whole reward system and the technology for that. | This addresses the problem but creates other problems, like experts answering their own questions to gain points.
Community is inactive, has many members who do not contribute | M | No activity | Try to post more information to get people involved | Months | Need to think carefully about how to do this.

7: Opportunities (increase value)
Please fill in the following table with examples of opportunities you think
should be exploited within the online community in question. Below are some
definitions and examples to help you complete this task.
• Opportunity: possible improvements to the business objectives, thus
increasing the community's value
• Likelihood: how likely or how often these opportunities arise
• Detection method: how these can be detected or assessed
• Actions: actions to be taken on the community to exploit/improve the
likelihood of this opportunity.
• Action timing: how fast action needs to be taken (seconds, hours,
days, etc.)
• Effectiveness: how difficult it currently is to take these actions


Opportunity description | Likelihood | Detection method | Actions | Action timing | Effectiveness
Avoid duplication of work | H | Manually, community members. | (1) Automate detection of questions and prompt the user with those that seem similar (and their respective answers). (2) Allow users to subscribe to receive news about certain topics/discussions. | (1) Seconds. (2) Days. | Users should be sent a meaningful, clear summary of the topics/discussions.
Advertise other company products | H | Manually, company experts | (1) Extend experts' knowledge to other company products. (2) Automatic keyword detection that allows showing related info. | (1) Days. (2) Seconds. |


O. Risks and opportunities from questionnaire responses
Table 23: Risks identified by community hosts and owners.
Risk description | Likelihood | Detection method | Counter-measures | Response delay | Effectiveness
Use the Send Message button to Community | H | Feedback of the users | Make a pop-up to explain the use of this button, or put the button in a place of less emphasis | Days | Maybe
Low participation | M | No activity | Nudging personal network | Hours | Good
Lack of participation | M | Overall low activity in community | Keep educating the community, and raise awareness of its existence | Months | Will help spark further interactions when they see others participating
Users do not consult the site before logging calls | H | Amount of support calls received | Directing users there first/notifying managers | Days | Manager buy-in is probably not there
Inappropriate deletion of wiki content | M | Wiki modification alerts | Revert content | Days | OK
Experts leave due to question overload | M | No activity | Emails and encouragement | Weeks | Moderate
IBM Confidential Leak | L | Active Monitoring | DLP on the end point | Days | High
Community is inactive; has many members who do not contribute. | H | No activity | Try to get others besides me to post | Months | If others do not share information, the community becomes a one-way street (it is now)
Community is inactive and members do not contribute content | M | Few new posts and little new content. Many members not actively contributing. | Owners set a good example of usage; team encourages community usage | |
Social Fatigue | H | People feeling they belong to far too many communities | Fine tune the communities you belong to, and reduce the noise | Months | Have a manageable bunch of communities you would want to contribute to
Community is inactive; has many members who do not contribute. | M | Lack of activity | Increased use of the tool the platform support could lead to increased forum usage | Months | Likely.
Community is inactive | H | No activity | Emails and calls | Weeks | Moderate
Customer Confidential Leak | L | Active Monitoring | DLP | Days | Med
Community members will ignore | H | No feedback or responses | Continue to post information, to not rely on email or alternative methods; get people to rely on community | Months | Can't force the horse to drink the water.
Growing too large | M | As more folks join the community they may feel it's just too large to benefit from it | Grow slowly, but steadily | Months | Will probably diverge into the creation of sub-communities or affinity groups
Incorrect answers provided to users' questions | M | Content moderation | Further moderate the threads in question | Days | Reasonable. Can cause confusion among members.
Not finding experts | M | Expressing public concerns they can no longer find their experts | Using social tagging to resurface them | Months | People will be capable of performing searches on key tags / words
Not finding right content | H | No longer being capable of finding the right content in a timely manner | Use social tools to help manage the flows better and therefore the knowledge they produce | Months | They will eventually find content more easily thanks to metadata like tags
Load carried by a very small few. | H | Participation metrics | People assigned tasks | Months | It's an ongoing problem.
Mismatch of Expectations | L | Participation not in line with community intent | Coaching | Months | No
Bipolar: members who contribute and members who take | H | Points discrepancy | Strict monitoring | Minutes | Medium
Community only loosely if at all connected to other external SAP channels (sap.com) | H | Low no. of incoming links | Closer cooperation between SCN and sap.com teams | | Highly effective

Table 24: Opportunities identified by community hosts and owners.
Opportunity description | Likelihood | Detection method | Actions | Action timing | Effectiveness
Make better use of worldwide team working across multiple time zones. | H | Manual - watching who is participating in discussions. | Reminders to folks. Sending links to folks to discussions in which they may have interest. | Seconds | Nothing automated, so people have to remember to constantly keep this in mind. Leaders need to lead by example.
Faster response time for webapp | | | | |
Follow up on ideas / ask questions | H | Manually, browsing discussions | Ask follow-up questions | Hours | High
Learn from one another | H | Increase in the participation from community members | Nurture the community and the healthy connections amongst members | Ongoing | Healthier, more mature, self-sustainable communities
Great education tool | H | Volume of support calls | Encourage and reward use of and contribution to site | Days |
Pairing members with similar goals | H | Manually connect members | None | Days | Medium
Force multiplier of knowledge | H | Activity | Seeing usage | Days | Good
Understand better use | M | Manually | Additional Education | 1 | Community members find value
Produce desirable solutions to common problems | H | Monitor adoption in products | Get developers and project managers to post their success stories; use alternative communications methods to spread the word | Long time |
Learn something new | H | Bookmarks | Tagged bookmarks indicate people find information useful | Days |
Reduction of email | H | Manual analysis of how many file attachments are being received via email versus filed in the community tool. | Reminders to folks. | Seconds | Nothing automated, so people have to remember to constantly keep this in mind. Leaders need to lead by example.
Support one another | H | Increase in the participation from community members | Nurture the community and the healthy connections amongst members | | Mature, self-sustainable communities
Sharing of assets created by members | H | Manually | Automate this process, but it's more likely to be done outside of the community | Minutes | Medium
Product feature vetting forum | H | Activity | Seeing usage | Days | Good
Increased participation in project discussions and decisions. | H | Manual checking on who is participating in discussions and on whether discussions are happening in the community or via old-fashioned email. | Reminders to folks. | Seconds | Nothing automated, so people have to remember to constantly keep this in mind. Leaders need to lead by example.
Find experts quicker | H | Increase in the participation from community members | Nurture the community and the healthy connections amongst members | | Mature, self-sustainable communities
Increasing members' role in updating wiki | H | Automated (wiki update alerts) | Encourage members to do so when responding to forum questions | Days | Medium
Better history and archiving of project decision making and documentation | H | Manual monitoring of activity within the community. | Reminder to folks | Seconds | Nothing automated, so people have to remember to constantly keep this in mind. Leaders need to lead by example.
Find information faster | H | Increase in the participation from community members | Nurture the community and the healthy connections amongst members | | Mature, self-sustainable communities
Eliminate Q&A in email, and move it to the forums | H | Moderators receiving email outside of the community | Try to push questions into the forum | Days | High
Share their knowledge and experiences | H | Increase in the participation from community members | Nurture the community and the healthy connections amongst members | | Mature, self-sustainable communities
Build healthier communities | H | Increase in the participation from community members | Nurture the community and the healthy connections amongst members | | Mature, self-sustainable communities
Tags | H | Manually | Allow users to tag content for easier search later | Seconds | Very high
Heuristic search | H | Automatically | Detect other similar posts | Seconds | Medium


Version history

Version | Date | Author | Comments
0.1 | 26/06/2011 | Bassem Nasser | Table of Contents and outline
0.2 | 15/09/2011 | Vegard Engen, Bassem Nasser, Simon Crowle | Initial contribution.
0.2.1 | 21/09/2011 | Vegard Engen, Bassem Nasser, Simon Crowle | Content update. Incorporated updates from CORMSIS.
0.3 | 27/09/2011 | Vegard Engen, Mariusz Jacyno | Content update
0.4 | 28/09/2011 | Vegard Engen | Incorporated contributions from OU.
0.5 | 05/10/2011 | Vegard Engen | Content update
0.9 | 15/10/2011 | Bassem Nasser | Ready for review
1.0 | 27/10/2011 | Vegard Engen, Bassem Nasser, Simon Crowle, Mariusz Jacyno | Review comments addressed.
1.1 | 30/10/2011 | Vegard Engen, Bassem Nasser | Review comments addressed.
2.0 | 31/10/2011 | Vegard Engen, Bassem Nasser | Final version

Acknowledgement
We would like to thank the following people for their contributions to the risk and
opportunity questionnaire: Adrian Mocan (SAP), Falk Brauer (SAP), Inbal Ronen (IBM),
Felix Schwagereit (UKOBLENZ), Matthew Rowe (OU), Toby Mostyn (Polecat) and
Harith Alani (OU).
The research leading to these results has received funding from the European
Community's Seventh Framework Programme (FP7/2007-2013) under grant
agreement no. 257859, ROBUST.
