
CHI 2011 Session: Performing Arts
May 7-12, 2011, Vancouver, BC, Canada

Love, Hate, Arousal and Engagement:
Exploring Audience Responses to Performing Arts
Celine Latulipe
HCILab, University of North Carolina at Charlotte
Charlotte, NC, USA
clatulip@uncc.edu

Erin A. Carroll
HCILab, University of North Carolina at Charlotte
Charlotte, NC, USA
e.carroll@uncc.edu

Danielle Lottridge
Department of Communication, Stanford University
Stanford, CA, USA
lottridg@stanford.edu


ABSTRACT

Understanding audience responses to art and performance is a challenge. New sensors are promising for the measurement of implicit and explicit audience engagement. However, the meaning of biometric data, and its relationship to engagement, is unclear. We conceptually explore the audience engagement domain to uncover opportunities and challenges in the assessment and interpretation of audience engagement data. We developed a display that linked performance videos with audience biometric data and presented it to 7 performing arts experts, to explore the measurement, interpretation and application of biometric data. The experts were intrigued by the response data and reflective in interpreting it. We deepened our inquiry with an empirical study in which 49 participants watched a video of a dance performance. We related temporal galvanic skin response (GSR) data to two self-report scales, which provided insights on interpreting this measure. Our findings, which include strong correlations, support the interpretation of GSR as a valid representation of audience engagement.


Author Keywords

audience engagement, arousal, galvanic skin response


ACM Classification Keywords

H.5.2 Information Interfaces and Presentation: User Interfaces (User-centered Design); J.5 Computer Applications: Arts and Humanities (Performing Arts)

General Terms

Human Factors
INTRODUCTION

In her 1991 memoir [12], choreographer Martha Graham stated, "Every dance is a kind of fever chart, a graph of the heart." Those words evoke the intricacies of artistic emotional journeys; they are also prescient, as technology now enables us to generate graphs that provide insights into the experiences of dancers and audience members. The initial question that arises is how choreographers and theater directors in the performing arts would use quantitative audience engagement information if it were available to them. This is a complex question because there are many facets of engagement, many possible ways to measure engagement, and many possible ways to interpret and use that information once collected. The deeper cultural and moral questions are: should we collect quantitative audience engagement data? What would it mean to performing arts practitioners if they could see exactly how their audiences responded?

An important and related question is: what does a biometric audience response actually mean? It is unclear that the biometric response of an audience member collected throughout a performance actually tells us anything useful; the readings may be completely unaffected by the performance being viewed. If biometric readings really do indicate a reaction to the performance, we need to understand how those readings can be interpreted. If we can infer a reaction (its strength and/or valence) from a biometric reading, we will have a potentially powerful tool for performing arts experts.

We are motivated to explore audience feedback so that we can answer some of these questions. One way to approach the audience engagement domain is through the top-down creation, deployment, and evaluation of various combinations of software and sensors. Yet, uncertainty surrounding the appropriateness and usefulness of various sensors makes it very likely that significant time and resources would be spent building and evaluating sensors ill-suited for the task of understanding and learning from audience engagement.

Our work is instead based on a bottom-up approach. We initially collected galvanic skin response (GSR) data from five subjects watching performance videos. We designed and implemented an application to display the audience biometric data in such a way that changes in the data are linked to the second-by-second flow of the performance. Using these recordings, we examined how seven performing arts experts interacted with audience engagement recordings of their own work or the work of their colleagues. Based on these investigations, and the interest shown in biometric audience readings by our performing arts experts, we conducted a large empirical study on the meaning of audience biometric response as measured through skin conductance.




Our results show that skin conductance is a reliable indicator of emotional response to a performance, which is what our performing arts experts are most interested in.

The first part of this paper is a conceptual exploration of audience engagement: what does it mean? Should technology mediate audience engagement, and if so, how should that work? We draw on literature from the performing arts, humanities, market research, psychology, psychophysiology, HCI and design. In the second part of this paper, we describe our exploratory study and the insights we gained from having performing arts experts work with sample biometric audience engagement data. In the third part of this paper, we detail an empirical study that explores the meaning of skin conductance data collected during a performance. Finally, we discuss how these different theoretical, exploratory and empirical results combine to form a clear picture of the possibilities in using biometric audience response.


AUDIENCE ENGAGEMENT

Radbourne et al. have considered measures of audience experience in performing arts and also emphasize the importance of engagement [27]. As they point out, there are many
possible ways to consider audience engagement in the performing arts. Audience engagement can act as an evaluation tool to help us understand how a performance is received. We can investigate various ways to increase audience engagement by involving the audience in the production. While there is a range of audience engagement possibilities, in this work we focus on measuring audience engagement as a process tool for performing arts experts.


Temporal Art

Before continuing, it is worth noting the scope of our explorations. There are many temporal arts that could be investigated, including music, cinema, theater, dance, comedy,
television, and others that are less neatly categorized. Our
current work focuses primarily on dance, and secondarily on
theater, and so in this discussion they will be the domains
we refer to. We believe our ongoing explorations of audience engagement with dance and theater may apply to other
live performances such as music, comedy and even cinema.


Television is an area where audience engagement (through various types of ratings tools) has been well-studied. In market research, continuous concurrent self-report measures have been used to indicate agreement or disagreement with a message [13] or like/dislike of a radio program [25]. We draw from market research; however, there are limitations to the applicability of that work. Television is different because of the need to prevent channel switching to maintain advertising revenues. People watch television casually in their homes, and can turn it off or switch stations the moment they lose interest. This context is dramatically different from going out and paying to see a performance in a social setting.


History of Audience Interaction

The limited amount of explicit audience participation in dance has taken place within the modern dance movement. In the Judson Church Movement in the 1960s, modern dancers questioned the need for formal dance technique, and thus audience involvement in a hippie sort of "happening" often evolved [1]. In other types of audience engagement, choreographers have occasionally allowed audience members to yell prompting words, either to impact the dancers' actions or as live vocal accompaniment. In theater, there is a strong history of audience interaction. One particular example is the idea of the "spect-actor" rehearsing the revolution, where participants are invited to act out a variety of different endings [5]. Schechner describes environmental theater, in which the audience are active members of the setting [30].

There is a small amount of work in which the audience's engagement, measured through biometrics, is used as real-time input to the performance. One example is the cheering meter that measures applause during improvisational rap competitions [3]. Maynes-Aminzade et al. investigated techniques for interactively engaging a cinema audience before a show [24]. Similarly, audience preference is measured during competitive reality TV phone-in votes.

Defining Engagement

How one defines engagement will impact the choice of technology used to measure it. Intuitively, we think of engagement as related to attention and interest, and while engagement is often associated with positive valence, it is clearly possible for people to be attentive and interested with negative valence (think of disgust, terror or anger). In studying engagement we are venturing into the domain of affect, and so we can look to the literature in affective computing [26, 31]. However, much of this work is aimed at presenting or generating affect, and it does not apply to measuring the engagement of audiences.

We can also look to the neurophysiology literature to better understand affect and the distinctions between valence and arousal. William James began the early accounts of how bodily reactions create emotional experience [14]. Today, neurophysiologists agree that emotions have a neurophysiological basis, precede conscious explanation and can exist outside of consciousness [19]. Zajonc describes how affective reactions are difficult to verbalize [35]. Lang describes emotions as dispositions, or states of readiness, that function to prepare and facilitate interaction with the environment [18]. When participants rated pictures, Lang found that pleasantness ratings, heart rate, and facial muscles tended to load onto one factor (valence), while interest ratings and skin conductance tended to load onto another (arousal).

Emotions are often denoted by categorical labels, and they can also be conceptualized two-dimensionally, where different (x, y) coordinates correspond to intensities along the axes of valence (positive-negative) and arousal (sleepy-activated) [18, 29]. Theater directors and choreographers may not have in mind a specific emotional goal for their audience, but it seems unlikely that a director or choreographer would want the audience to feel bored, droopy or sleepy during a performance. Arousal is clearly important in measuring engagement. It may be interesting to measure valence during a performance, but we need to be careful about confounding two factors: how the performance makes a person feel versus how much they like the performance. Given this confound, measuring valence is considerably more complicated than measuring arousal.

Given the previous work in affect, we can state that emotional engagement is a complex phenomenon that involves both valence and arousal. For the purposes of dance and theater performance, we focus on arousal, which we consider to be the most relevant component of engagement.





Current Measurement Methods

Measuring Engagement Explicitly

Audience members already give explicit feedback to performers, and performance artists talk about how they feed off the energy of their audiences. However, the bandwidth of this communication is limited and noisy. The most common measure of audience engagement and appreciation, final applause, is temporally offset from the performance itself, which prevents that information from feeding back into the performance. More explicit methods for measuring audience engagement include post-performance surveys, focus groups and audience interviews. A complicating factor in post-performance questionnaires is the peak-end effect [15], which shows that a measure of emotional experience taken immediately after an experience is strongly influenced by the peak emotion and by the emotion experienced at the tail end of the experience.


To explicitly measure engagement, audience members could use response devices similar to those that have been employed in market research, including the warmth monitor [33] and the on-screen cursor [4]. Mauss et al. used a slider along with physiology measures in a video-watching experiment [23]. More recently, Ranjan et al. used a slider to find differences between video conference experiences [28]. Lottridge and Chignell showed that the interface for measuring engagement matters [20]. In a study of explicit audience reporting on dance, Stevens et al. used a system that allowed specification of a one-dimensional parameter on a handheld PDA using a stylus [34]. Their work showed that arousal is correlated with surface features of a dance more than valence is.

Measuring Engagement Implicitly

There are a variety of common physiological measures that can be used to measure implicit engagement biometrically, such as GSR, heart rate, blood pressure and respiration. Combining data from a variety of measures would lead to more reliable results. All of these biometrics have some level of invasiveness: GSR requires sensors worn around the fingers, heart rate requires some type of monitor [8], and so on. However, we can foresee that in the future these devices will become smaller, more sensitive and less cumbersome. Outfitting an entire audience with heart rate monitors or galvanic skin response sensors is expensive, but feasible. Given the hassle and invasiveness of such an undertaking, we need good reason to believe that the data obtained would be very useful. It is important to note that we consider all biometric measures to be implicit measures, although other researchers have used the term "explicit responses" in cases where they asked participants to try to explicitly change their biometrics [17]. It is not our intent, nor would it make sense in this context, to ask users to consciously affect their biometric responses; thus we use the term "explicit responses" to refer to conscious self-reporting.

GSR is a common biometric that measures the conductivity from sweat on the skin, which is secreted in response to autonomic nervous system arousal [7]. Lang [18] observed a linear correlation between GSR and arousal. GSR has historically been used in lie detection, and more recently for measuring experience during video game play [21]. There are also possibilities in measuring neural signals (using fMRI or fNIRS) during a performance to measure engagement, but current brain imaging technologies have serious limitations for such an undertaking. EMG measures facial muscle activation and is thus more clearly linked to affect, but it has a poor form factor for use in a theater setting.

Finally, indirect but implicit measures of engagement are also possible. These include the use of high-resolution computer vision or thermal imaging techniques to detect facial expressions [16]. Similarly, a non-automated approach to facial action coding is possible [11]. However, the facial expressions shown during a performance may not be particularly illuminating, and massively parallel facial expression recognition of an audience is not currently feasible. An alternative approach may be an indirect, implicit measure of engagement through the use of seat-embedded posture detectors such as the Body Posture Measurement System (BPMS) [10]. For example, a built-in posture sensor could be coupled with an iPhone slider, or a GSR finger wrap could be paired with an engagement knob. There are a variety of combinations that could work. The benefit of these combinations is that the implicit, biometric measures could provide data even if the user forgets to explicitly indicate their engagement, or is bored and just fiddles with the engagement knob. In these cases, we would expect the biometric data to help cancel out the explicit noise. Thus, the combination of such measures seems likely to lead to more reliable overall measures of engagement.

Insights and Research Questions

Assuming we have mechanisms to collect implicit or explicit audience engagement data, questions arise around parsing and making sense of the information. There are issues such as sensor noise, environmental effects, and the choice of signal processing algorithm. Setting all those issues aside and assuming that we can filter out noise, we end up with a temporal tracking of some measure that likely relates to engagement. The temporal resolution may vary, but most methods should give at least a second-by-second accounting of engagement. One of our research questions is how performing arts experts will interpret this data.

One approach to temporal engagement data would be to treat it simply as continuous stimulus-response data. The normative assumption behind this approach is that each second, audience members are responding to the thing they just saw happen in the previous second.
While a stimulus-response approach seems likely to be suitable for television [32], it is problematic for artistic performances. We equate it to the "cable news effect," where, in order to prevent people from switching channels, viewers are constantly bombarded with moving text, flashy videos and screaming commentators. We are not the first to consider this usage of biometric affect data: the idea of neuro-targeting performances for optimum engagement was proposed by Calvo-Merino et al. in a study where videos of dancers were shown to viewers inside an fMRI machine [9]. Their study used a stimulus-response approach, in which they asked the viewers in the machine to rate each individual movement. As expected, participants were most engaged by the largest dance movements. But to jump to the conclusion that an optimal dance contains nothing but leaps is clearly wrong. Live performances typically have a narrative structure, with a setup, a build, a climax and a denouement, or some variation on that structure. The build-up of emotion and the ups and downs that help give performances structure are clearly important, and the stimulus-response interpretation ignores this.
An alternative approach would involve semantically chunking the temporal engagement data into segments meaningful
to the choreographer or theater director. The experts could
be asked to define segments a priori and then be presented
with pre-segmented engagement data after a performance.
Alternatively, they could interactively define the segments as
they investigate the data. It is possible to consider measuring both explicit (self-report) engagement levels and implicit
(biometric) engagement levels to allow experts to obtain a
more complete picture of audience engagement. In a study
of users watching Internet videos, Bardzell et al. found it
useful to triangulate affect data from various sources [2].
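As a concrete illustration of such chunking, the sketch below averages a temporal engagement series within expert-defined segments. This is a minimal sketch under our own assumptions (a uniformly sampled series and segment boundaries given in seconds); the names and sampling rate are illustrative and are not taken from any system described here.

    import numpy as np

    SAMPLE_RATE_HZ = 2  # assumed sampling rate: one sample every 500 ms

    def segment_means(engagement, boundaries_s):
        """Average an engagement series within expert-defined segments.

        engagement   : 1-D array of uniformly sampled engagement values
        boundaries_s : segment boundaries in seconds, e.g. [0, 45, 120, 300]
        returns      : list of (start_s, end_s, mean_engagement) per segment
        """
        means = []
        for start_s, end_s in zip(boundaries_s[:-1], boundaries_s[1:]):
            lo, hi = int(start_s * SAMPLE_RATE_HZ), int(end_s * SAMPLE_RATE_HZ)
            means.append((start_s, end_s, float(engagement[lo:hi].mean())))
        return means

    # Example: aggregate arousal over three segments of a 300-second piece
    signal = np.random.rand(300 * SAMPLE_RATE_HZ)
    for start, end, m in segment_means(signal, [0, 45, 120, 300]):
        print(f"{start:3d}s-{end:3d}s  mean engagement {m:.2f}")

Whether segments are defined a priori or interactively, the aggregation itself reduces to this kind of windowed averaging.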


EXPLORATORY STUDY

Many of the technologies that could be used to measure audience engagement data are either expensive, complicated or invasive. Before anyone goes to the expense and effort of instrumenting an audience, it seems reasonable to explore how performing arts experts would construe audience engagement data. To this end, we developed an exploratory study using a small amount of audience engagement data and two different performance videos. The study is not meant to perfectly replicate live performance measurement or usage, but rather to present reasonable sample data to these experts and to gauge their reactions to it. The study used individual arousal data collected through GSR sensors.

Methodology and Materials

Our expert user study involved showing 3 choreographers and 4 theater directors biometric response data to a dance and a theater piece, to understand how they would use information about audience engagement. Most sessions lasted approximately an hour, and the expert participants were compensated with gift cards.

The participants used a custom web-based video player that added an extra layer of information: audience engagement as measured through GSR. In other words, while the participant watched the performance, they were able to view engagement levels that corresponded to every second of the performance. Engagement levels were displayed below the video in a line graph (see Figure 1). The experts were able to interact with the data by clicking on the line graph or by sliding the seeker bar. The video player also featured a built-in ambient display in the form of a video border that changed color saturation in response to the current aggregate arousal level.

[Figure 1. Screen shot during session with C2. The lines topped with blue circles are segmenting lines C2 added to define semantic chunks. Between these lines, the pink fill indicates the aggregate response over that segment. C2 preferred to see individualized response lines rather than a single response line.]

The performance that each expert watched corresponded to their respective field. Choreographers watched a dance piece entitled "Voices," while theater directors watched a portion of "The Crucible." The dance was choreographed by participant C1 and the play was directed by participant T4, so we had two participants with direct responsibility for the performances included in our study. The GSR data used in this study was real data collected from four members of our research lab watching videos of the dance and five members watching the play. To take into account people having different GSR baselines, we displayed the z-scores of the GSR data. Our performing arts experts were able to look at either the aggregate data or the individual engagement lines.
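For illustration, per-participant z-scoring of this kind can be done as below. This is a minimal sketch assuming each participant's GSR trace is a flat numeric array; the function and variable names are ours, not from the study software.

    import numpy as np

    def zscore(gsr_trace):
        """Standardize one participant's GSR trace so that traces with
        different baselines and ranges can be shown on a common scale."""
        trace = np.asarray(gsr_trace, dtype=float)
        return (trace - trace.mean()) / trace.std()

    # Example: two participants with very different baselines become comparable
    participant_a = np.array([4.1, 4.2, 4.6, 4.4])    # illustrative readings
    participant_b = np.array([11.0, 11.1, 11.9, 11.3])
    print(zscore(participant_a))
    print(zscore(participant_b))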


Procedure

Participants were introduced to the video player, and the facilitator explained skin conductance using common language. The explanation was necessary because we were asking our experts to speculate on a type of data whose content



was relevant, but whose form was relatively foreign to them. The participants were then invited to explore the interface and interact with the data. The session was semi-structured; the facilitator asked pre-determined questions and followed up on topics raised by the participant.

The participants were asked to think aloud as they used the video player. The sessions were recorded, and we took notes that were used in a discussion at the end of each session. Interview transcripts were initially reviewed individually by the researchers to extract themes. Next, we looked holistically at the data to identify patterns and relationships.


Results and Discussion

The exploratory study was not about the particular interface we designed; it was meant to explore how experts react to audience engagement data. However, we note that the experts were shown how to use the interface and seemed to have no issues using it. The participants understood that the data had been collected from people watching videos of the dance and the play, rather than live performances, and almost all of them thought the arousal levels would have been higher if the GSR had been collected during live performances.


None of the participants seemed to notice or care about the ambient arousal level reflected in the saturation level of the video frame border. The participants were explicitly shown how to create semantic chunks along the timeline to see the aggregate arousal level for a section of a performance, and two participants used this feature. C2 was enthusiastic about the ability to get average arousal for specific segments of the dance (her chunking is shown in Figure 1), stating:


I think it's really cool, this chunking system. I like the fact that people can do this totally differently. You can get as specific or as broad as you want...
The experts investigated linkages between the data and different aspects or sections of the performances. We observed that directors interpreted high and low arousal levels and suggested various explanations. At other times, the experts made inferences about what might be causing different levels of engagement. They all noted that many factors could be influencing the arousal levels, including the music or what people happened to be looking at, and so they were implicitly veering away from a strict stimulus-response interpretation of the data. C2 raised this attention issue, implying that she felt a stimulus-response interpretation wouldn't suffice:


I was just watching this dancer, and I looked over at her crawling back through the thing, and I really like that. And I imagine that I might have felt aroused for a second, but for instance, I know that started a lot earlier... but that's just when I happened to notice it? So how do you think that factors into it?


C1 struggled to interpret the meaning of lower aggregate arousal during a segment of her dance where she expected higher arousal levels, and suddenly realized there might be a link to the number of dancers on stage:

The only other thing I can think of... hmmm... we go from a duet and then suddenly you've pulled everything down and it's just one figure. Does that change how people are responding? Is there less to respond to? [...] I wonder... does the number of dancers on the stage have anything to do with these kind of readings?

The data doesn't necessarily provide clear direction to performing arts experts in all cases. T4 clearly understood the dangers of a stimulus-response approach to using the data and was adamantly against using it to make second-by-second adjustments to increase arousal level, stating:

I could literally make a play where people are talking in whispers and screaming every other sentence - technically, vocally manipulate them... to keep the response constantly on an up level.

These comments demonstrate that experts evaluate a full range of possible interpretations for audience response data.

Arousal versus Valence

A theme that strongly emerged during the think-aloud sessions related to the issue of arousal versus valence. The data that our experts were shown was arousal data, but they were asked if they would also be interested in valence data. Our participants were either not interested in valence at all, because they considered it purely subjective, or they were only interested in getting valence if they could also get causal explanations. C2 explained:

When you talk to people, they're only able to say what they like. It's very concentrated to what they're cognizant of... [...] I feel like there's a little less value in, "I like this, but I didn't like this." What I'm trying to say is, I like that you have an organic response that's subconscious, totally subconscious. I like the fact that there are these unaccountable variables that could be altering this.

All of our experts felt that valence is limited because reactions to dance and theater are subjective. T3 commented:

...so I'm saying that as designers and directors are trying to explore different things, it [audience engagement data] becomes very useful. It doesn't become useful for anyone in terms of "Did they enjoy my work?" because that's so subjective and limiting.

When asked specifically if he would be interested in seeing like/dislike data, T3 replied:

I mean, I think, yeah. I think all of that stuff is good. [...] But, we want to know why. [...] So yeah, like and dislike are good but only if we know why.

In interaction design, a typical goal is pleasurable experience. We can see that this stands in contrast with an artistic setting, where positive and negative emotions are equally valued; what matters most is audience engagement. This can be a subtle aim: worthwhile experiences can be negative.


Application to Practice

All of our participants commented on how they might be able to use this type of data in their work. T2 commented:


If this was my play I was directing right now, I would be... this section right here, is crazy. I would definitely look at that and try to figure out. What is the gender of this? Male and then we've got male... okay. And these are all females... So I would want to know what can I do to that scene to make it more interesting to males?
C3 mentioned that he would like to use our system to collect
data on dance pieces that he is planning to re-stage. He also
noted that he had choreographed a dance to the same piece
of music as the piece shown in our study, and would like
to be able to compare the audience response data from the
two different dances. T4 had a related idea and talked about
testing the same play with different sized audiences to see
how audience size affected group dynamics and arousal.


Finally, four of our experts commented on the value of gathering audience engagement data for educational and training purposes. T2 was interested in using the data to train actors:

But with him [referring to a point of high audience arousal while an actor is on-stage], I could be like, "That is your moment - now what are you doing there?" Like, really working on that specific moment where the audience was a little more interested in him...


Aggregation versus Demographics

The individual and aggregate lines supported different kinds of inquiry into the audience data. As T1 notes:

So there is an advantage to the aggregate because it helps show a more direct correspondence between what's going on and what everyone together is experiencing. But there's also an advantage to the individual data because then you can see different responses.

C3 was intrigued by the individual data and surprised by how much difference there was between the individual responses. The data presented showed age and gender, and the gender information was examined by a number of experts, particularly T2, who examined and referred to the gender consistently. T4 was also glad to have the individual lines, as he was apprehensive about catering to the average:


I have a fear of this in some respects because to me... it's like eugenics of biology. The idea that... that you can tell... that's why I'm so glad about the individual response data. People are so diverse that as a director, as an artist... you can't deal with them as a group.

It is worth noting that it was easy to provide access to the individual lines in this exploratory study because of the small n. However, visualizing lines from an audience of 300 would be significantly more difficult. An aggregate measure, though, is more statistically valid than an individual sample, as it will effectively remove much of the noise from the signals.

EMPIRICAL AUDIENCE RESPONSE STUDY

Based on the interest in audience response data shown by our performing arts experts, we wanted to take the next step in this research. The follow-up question, given their interest, is: what does audience response data mean? To determine the feasibility and benefits of collecting this data, it is important to have a deep understanding of what the data means in a performance context. Specifically, we wanted to investigate the relationship between GSR and self-report measures of audience engagement, to understand whether the arousal measured by GSR can be accurately interpreted within the boundaries of the concept of audience engagement.

Methodology and Materials

To study the meaning of biometric audience response, we recruited 49 participants (18 male, 31 female, all students) to watch a video of an 11-minute dance performance. Ten participants were from the fine arts; the rest were from other varied disciplines. Since watching a video of a dance performance is different from the experience of attending a live dance concert, we made environmental decisions to increase the participants' sense of immersion: we projected the dance onto a 60-inch projector screen, had the participants wear headsets to listen to the soundtrack, and showed the video in a dimly lit, temperature-controlled room.

Each participant wore Thought Technology GSR finger wraps on two fingers of their non-dominant hand, leaving their dominant hand available to rate their engagement with the performance using a physical slider. The biggest methodological challenge was determining the best way to have participants report their explicit, conscious responses to the dance performance. We wanted to give users a simple physical slider that could be used mostly eyes-free. We experimented with a number of different scales and labels to try to capture the concept of engagement with words that participants would easily relate to. During pilot studies, we found that simply labeling the slider with "No Engagement" and "High Engagement" was confusing. Participants could not detach valence from the word, and tended to only rate themselves as being engaged when they liked what they saw. Others just didn't really seem to know what we meant by engagement, and still others didn't seem to know how engaged they were. Given this, we had to investigate alternative vocabularies to help users report their engagement levels.

We finally settled on the two most promising label sets. Half of our participants were presented with a slider that had the label "Hate it!" at the bottom, the label "Love it!" at the top, and a notch in the middle to help users feel when the slider was in a neutral position (the LH scale). The other half of our participants were given a slider that had the label "No Emotional Reaction" at the bottom and the label "Strong Emotional Reaction" at the top (the ER scale). We anticipated that the LH scale would be more intuitive to users because it is easier to express valence than emotional reaction level. Therefore, we gave more specific instructions for the ER scale, explaining that this scale encompasses many different emotions and that strong emotional reactions could be positive or negative.



The experiment ran as a between-subjects study, with participants randomly assigned to one of the two groups. Each participant was run individually, and thus we did not try to capture any of the social effects that would occur in a true performance setting. We instructed the participants that while watching the dance video, they should use their dominant hand to rate the performance using the slider, since their non-dominant hand was attached to the GSR sensor. Prior to the start of the dance performance, we captured three minutes of GSR baseline while the participants sat alone in a dimly lit room. After the baseline period, our software launched the dance performance. Participants were told:

Please change the location of the slider as often as necessary to reflect changes in your feelings toward the video. You should move the slider as often as your feelings change: this could be every few seconds, or if your feelings remain constant, there may be periods where the slider stays at the same location, to reflect that.




After watching the video, participants were shown a graph of their self-report ratings while the video played again. In this post-hoc interview, we asked them to explain their ratings, particularly the peaks and valleys in their data.
Data Collection and Normalization

GSR samples and slider ratings were taken every 500 ms, and our study concluded with 49 participants' files, each consisting of 11 minutes' worth of GSR data and slider ratings. After collecting the data, we applied a three-second moving average filter to both the GSR data and the self-report slider data. We smoothed the GSR data using this filter because GSR responses lag the stimulus by 1-5 seconds [6]. We also smoothed the self-report responses to account for different slider reaction times between participants.
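At a 500 ms sampling interval, a three-second moving average corresponds to a six-sample window. The sketch below shows one plausible implementation; the paper does not specify edge handling, so the "valid" convolution mode here is our assumption.

    import numpy as np

    SAMPLES_PER_SECOND = 2            # one reading every 500 ms
    WINDOW = 3 * SAMPLES_PER_SECOND   # three-second window = 6 samples

    def moving_average(signal, window=WINDOW):
        """Smooth a signal with a simple boxcar moving average filter.
        'valid' mode drops edges where the window does not fully overlap."""
        kernel = np.ones(window) / window
        return np.convolve(np.asarray(signal, dtype=float), kernel, mode="valid")

    # Example: smooth an 11-minute trace sampled at 2 Hz
    raw = np.random.rand(11 * 60 * SAMPLES_PER_SECOND)
    smoothed = moving_average(raw)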

Hypotheses

Engagement is not particularly well-defined, and as mentioned previously, determining the right questions to ask our participants in order to elicit their engagement level was difficult. We chose the emotional reaction labels because the literature suggests that the arousal level measured by GSR is most related to the intensity of emotional reactions. Thus, we anticipated a positive correlation between GSR readings and the ER slider values:

H1: r(GSR, ER) > 0.1

However, our intuition led us to believe that participants would have a somewhat difficult time with the ER scale. People often have low awareness of their emotional reactions; society teaches us to keep our emotions under control in order to maintain decorum. We thought that if we asked people to react to the performance using an LH scale, they would be much more comfortable with this. People seem to easily know whether or not they like something, even if they can't explain why. Because the LH scale has stronger reactions at both the bottom and the top of the scale, we did not anticipate a strong correlation with GSR readings. We expected the GSR readings to fluctuate less than the LH responses:

H2: r(GSR, LH) < 0.1

Our performing arts experts told us they were not interested in valence, but we thought that the LH scale might be a good surrogate for emotional reaction if we took the absolute value of the reaction, effectively removing the valence information. Thus, we also planned to run analysis on the absolute value of the LH slider readings, which would give us larger positive numbers for stronger reactions, regardless of the valence of those reactions. We anticipated a positive correlation between GSR readings and the absolute value of the LH slider values. We also anticipated that there would be a correlation between the ER values and the absolute value of the LH values, despite these readings coming from two separate participant groups:

H3: r(GSR, |LH|) > 0.1

H4: r(ER, |LH|) > 0.1

Results

The LH slider generated readings on a scale from -50 to 50 (giving a neutral response a value of 0), while the ER scale ranged from 0 to 100. The GSR data and the LH self-report data were transformed to a scale of 0 to 100 so that correlations could be computed using the same scale. The GSR mapping transformed each participant's readings by mapping that participant's minimum-to-maximum range of GSR readings onto 0-100, following a process similar to Mandryk et al. [22]. This was necessary to compare the GSR results across participants (since different participants have different GSR baselines), and it was also necessary in order to perform statistical analysis between GSR readings and self-report values.

In our analysis, we computed the correlation between the average self-report measure of engagement and the average GSR that was collected for that measure, thus enabling comparisons within the two groups and across them. We computed correlations of the averaged responses in order to eliminate noise that is inherent in GSR sensor data. There were no significant differences in the responses between fine arts students and students from other disciplines.
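Under our reading of this analysis (average the traces across participants at each time step, then correlate the two averaged series over time), the computation could look like the sketch below; the data shapes and names are assumptions for illustration.

    import numpy as np
    from scipy import stats

    def averaged_correlation(gsr_traces, report_traces):
        """Correlate the participant-averaged GSR series with the
        participant-averaged self-report series over time.

        Both arguments: arrays of shape (participants, samples), already
        smoothed and rescaled to the common 0-100 scale."""
        mean_gsr = gsr_traces.mean(axis=0)
        mean_report = report_traces.mean(axis=0)
        return stats.pearsonr(mean_gsr, mean_report)  # (r, p-value)

    # Example with synthetic data: 24 participants, 11 minutes at 2 Hz
    rng = np.random.default_rng(0)
    gsr = rng.random((24, 11 * 60 * 2))
    reports = rng.random((24, 11 * 60 * 2))
    r, p = averaged_correlation(gsr, reports)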
We found a strong, significant correlation between the ER scale and participants' GSR (r = 0.316, p < 0.001), allowing us to accept H1. This is a result that we expected, because a strong emotional reaction should be synonymous with high arousal. On the other hand, the relationship between the LH scale and participants' GSR was weakly negative (r = -0.142, p < 0.001). This allows us to accept H2.


We are also able to accept H3, in which we predicted that the absolute value of the LH scale would be strongly correlated with GSR (r = 0.432, p < 0.001). In fact, this correlation was stronger than the correlation between the ER scale and participants' GSR. We were very pleased with this result because it suggests that this one self-report scale, LH, can give us two dimensions of information: valence and arousal.



In H4, we anticipated that the absolute value of the LH scale would be similar to the responses from the ER scale, but the data shows a weak correlation (r = 0.150, p < 0.001). This weakness may be due to real differences in what participants were reporting, but may also be related to the fact that the responses come from two different groups of people.


Finally, we were interested in knowing whether there would be significant correlations between the GSR readings of the two different groups. In other words, did the different self-report scales given to the two groups affect their autonomic responses? We didn't anticipate a big difference, but we did think that the stress of reporting the level of emotional reaction might cause that group's overall GSR readings to be higher. The correlation between the GSR readings from the two groups was positive and moderate in strength (r = 0.256, p < 0.001). This suggests that the biometric responses are in fact not strongly impacted by the different instructions given to the two groups. There was also no statistically significant difference between the mean GSR for the ER group (M=51.35, SD=7.19) and the LH group (M=48.11, SD=4.24), though the mean is slightly higher for the ER group. Thus, there is no statistical evidence to support our intuition that the ER slider was more difficult to use.


Qualitative Results

In addition to correlating explicit self-report ratings to autonomic biometric readings, we had a third probe into the participants' response to the performance: the post-hoc interviews in which we showed participants their self-report data while they watched the video again and asked them to explain their ratings. Themes emerged that elucidate the aspects of the performance that generated conscious responses.
Love/Hate Group

Participants who used the LH scale were able to list very specific aspects of the dance performance that they liked or did not like, based on the dance movements and the sound score. Almost all participants agreed that they liked the faster, bigger movements, as opposed to the slower movements: "These movements were just not very exciting. It was kind of slow." In addition to tempo, they also commented on specific movements: "I liked the diamond shape of the legs. The legs in the air... they were moving, kind of like swimming. I liked that." "Right here, I think it's pretty cool what they're doing..." While there was a lot of agreement between participants on these movements, there was a certain dancer in the piece whose movements were very different from the other dancers'. Some people loved her movements, while others really disliked them. For example, "I liked watching her since she was doing something completely different..." versus "I didn't like that person crawling on the ground..."

People also commented on how they liked when the music became more uplifting or happy, which coincided with the more "dancy" movements that people also described liking. We also had one person who indicated that he liked a certain part of the dance because, he said, "The music reminded me of some songs that I know." In this particular dance, there was a segment of the score that consisted of a voice speaking, rather than music. Few participants liked this: "I didn't really care for the guy talking. The voice started out, and I couldn't understand it... It seemed to be just random things that I couldn't understand."

Emotional Reaction Group

People in the Emotional Reaction group did not give as many specific details about the dance performance. In general, people rated higher reactions to areas of the dance where there was more movement: "This is more what I think of as dancing; it has a beat, it's synchronous. I was more impressed when they were up and doing stuff." Just like the LH group, people also responded to the dancer whose movements were different from the other dancers'. Many people found her movements confusing and indicated a low reaction, while others responded to her with a high reaction. One person explained: "I get really happy when she's in the front dancing... and then she goes away, and I feel sad."

Participants indicated a low reaction when they were confused: "I think the reason I went down was because of the talking. I was confused why there was talking during the performance... I couldn't understand where it came into the performance." A few participants seemed to have a difficult time using the emotional reaction scale. For example, one person said, "I think that I stayed at the half way point because I didn't know what to feel." In the Emotional Reaction group, we also had one person move their slider only twice (this did not happen in the LH group): "I didn't experience any emotions... I wasn't really bored but at the same time, I wasn't really interested." Comprehension is not well captured by the emotional reaction slider: participants in the LH group could lower their love-hate rating if something confused them, but the mapping to the ER scale is less clear.

DISCUSSION

We opened this paper with two broad questions. First, how do performing arts experts respond to temporal biometric readings from their audience members, and what would they do with that data? Second, how should biometric readings taken during a performance be interpreted? Is it reasonable to interpret them as a measure of audience engagement?

Our experts were interested in and intrigued by the data in our exploratory study. The experts did not make rash judgments, nor did they plan changes to achieve constant high arousal. Instead, they tried to understand and address unexpected or long periods of low audience arousal. They appeared to acknowledge the limitations of applying a stimulus-response interpretation to the data. T4 was specifically concerned about this:

What worries me about the data is, I can see how producers, the people who have the money, will eradicate certain artists based upon this instantaneous feedback.


The experts were reflective in their interpretations and understood that the data was a noisy signal and that different factors could be affecting each individual's readings.


The positive response from performing arts experts and their willingness to interpret the data led us to our empirical study. We felt that before deploying a system with live audiences, we should understand the relation between galvanic skin response and audience responses to a performance. Our empirical study provides support for treating GSR as a valid measure of audience engagement: it is not simply random or unrelated to the performance. GSR readings are not only related to the performance as interpreted by arts experts; they are also strongly related to multiple conceptions of self-reported engagement by audience members. We tested two promising vocabularies to help users self-report their engagement level. The significant correlations in the GSR readings between the LH and ER groups, and for the slider readings between the two groups (after the absolute value transformation of the LH readings), show that these two different framings reveal the same primitive reaction of engagement. However, the LH slider (with the absolute value taken of its ratings) gets at engagement more easily and with higher validity, given its stronger correlation with GSR.


The fact that the absolute value of participants' LH slider ratings correlated so strongly with their GSR readings is important for three reasons. First, if other researchers wish to have participants self-report on engagement, this scale was easier for participants to use and, once transformed, is a valid measure of engagement. Second, if there was interest in building audience response devices to gather both implicit biometric responses and explicit self-report responses, the LH scale would be better to use because participants have an easier time using it. Third, despite our earlier finding that performing arts experts are not interested in valence, C2 confided:


I think having access to a graph of valence, it would be like a guilty pleasure for me to look at. [...] It would be like, do people like me? Do people like my work? But I would know, as an artist, that's not interesting...


This suggests that despite our experts' stated lack of interest in valence, there may be curiosity. Since the LH slider allows us to collect valence and a representation of engagement, it gives us more information with seemingly less cognitive effort expended by users. A system that allowed collection of both autonomic and explicit self-report data could combine the two signals to cancel noise.
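The paper proposes no specific fusion algorithm, but one simple way such a combination could look is sketched below: z-score each signal so neither dominates, then take a weighted average so that noise appearing in only one signal is damped. Everything here is an illustrative assumption, not a described implementation.

    import numpy as np

    def combined_engagement(gsr, slider, gsr_weight=0.5):
        """Fuse an implicit (GSR) and an explicit (slider) engagement series.

        Both series are z-scored first so that their different units and
        ranges do not matter; the weighted average then damps noise that
        appears in only one of the two signals."""
        def z(x):
            x = np.asarray(x, dtype=float)
            return (x - x.mean()) / x.std()
        return gsr_weight * z(gsr) + (1.0 - gsr_weight) * z(slider)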

There is a difficult issue to address in this area of research: art is slow, and GSR (like other biometric measurements) is fast. It is very important to emphasize this in any work that looks to quantify and analyze responses to artwork or performance. In our exploratory study we encouraged experts to segment the data into chunks, which helped to steer them away from a second-by-second analysis of the data. In our second study, we opted to examine overall correlations across the entire time series to understand aggregate responses, and we did not look at second-by-second correlations related to specific events. We feel our nuanced approach is appropriate, but we admit that our work cannot capture reflective responses to art that happen after the fact.

CONCLUSIONS AND FUTURE WORK

We have presented a set of theoretical, exploratory and empirical results on the collection and use of temporal biometric audience response data, in an effort to further understand audience engagement in the performing arts. We conclude that performing arts experts are interested in the autonomic reactions of their audience members collected throughout performances, and that they are carefully reflective in interpreting such data. We showed a strong correlation between participants' explicit ratings of their level of emotional reaction and their autonomic GSR responses. We also showed a strong correlation between the absolute value of love-hate ratings of a performance and participants' GSR responses. Our results support the validation of temporal GSR data as a reflection of audience engagement.

We plan to study audience biometric responses during live performances to examine patterns in live performance settings. Finally, our work represents a fundamental step in understanding biometrics as related to audience engagement. Our future work will continue to build on that understanding through several different facets to create a richer picture of audience experience and engagement.

ACKNOWLEDGEMENTS

This work was funded by an NSF CreativeIT grant (#IIS-0855882). We thank our study participants and members of the HCILab for their feedback. We would also like to acknowledge DREU students Charlotte Smail and Millicent Walsh for their assistance during the summer of 2010.

REFERENCES

1. S. Banes. Democracy's body: Judson Dance Theatre and its legacy. Performing Arts Journal, 5(2):98-107, 1981.

2. S. Bardzell, J. Bardzell, and T. Pace. Understanding affective interaction: Emotion, engagement, and internet videos. In Proc. of the 2009 IEEE International Conference on Affective Computing and Intelligent Interaction, 2009.

3. L. Barkhuus and T. Jørgensen. Engaging the crowd: studies of audience-performer interaction. In CHI '08 Extended Abstracts on Human Factors in Computing Systems, pages 2925-2930, New York, NY, USA, 2008. ACM.

4. H. Baumgartner, M. Sujan, and D. Padgett. Patterns of affective reactions to advertisements: The integration of moment-to-moment responses into overall judgments. Journal of Marketing Research, 34(2):219-232, 1997.

5. A. Boal. The Theatre of the Oppressed. Routledge Press, 1982.

6. W. Boucsein. Electrodermal Activity. Springer, 1992.

7. W. Boucsein. Electrodermal Activity. Plenum Press, New York, NY, USA, 1992.

8. J. F. Brosschot and J. F. Thayer. Heart rate response is longer after negative emotions than after positive emotions. International Journal of Psychophysiology, 50(3):181-187, 2003.

9. B. Calvo-Merino, C. Jola, D. Glaser, and P. Haggard. Towards a sensorimotor aesthetics of performing art. Consciousness and Cognition, 17(3):911-922, 2008.

10. S. D'Mello, R. W. Picard, and A. Graesser. Toward an affect-sensitive AutoTutor. IEEE Intelligent Systems, 22:53-61, 2007.

11. G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski. Classifying facial actions. IEEE Trans. Pattern Anal. Mach. Intell., 21(10):974-989, 1999.

12. M. Graham. Blood Memory. Doubleday, 1991.

13. C. Hovland, L. Lumsdaine, and F. Sheffield. Experiments on Mass Communication. Princeton University Press, Princeton, USA, 1949.

14. W. James. The Principles of Psychology (2 vols.). Henry Holt (Reprinted Bristol: Thoemmes Press, 1999), New York, NY, USA, 1890.

15. D. Kahneman. Well-Being: The Foundations of Hedonic Psychology, pages 3-25. Russell Sage Foundation, New York, NY, USA, 1999.

16. M. M. Khan, M. Ingleby, and R. D. Ward. Automated facial expression classification and affect interpretation using infrared measurement of facial skin temperature variations. ACM Trans. Auton. Adapt. Syst., 1(1):91-113, 2006.

17. K. Kuikkaniemi, T. Laitinen, M. Turpeinen, T. Saari, I. Kosunen, and N. Ravaja. The influence of implicit and explicit biofeedback in first-person shooter games. In CHI '10: Proceedings of the 28th International Conference on Human Factors in Computing Systems, pages 859-868, New York, NY, USA, 2010. ACM.

18. P. Lang. The emotion probe: Studies of motivation and attention. American Psychologist, 50(5):372-385, 1995.

19. J. LeDoux. The Emotional Brain: The Mysterious Underpinnings of Emotional Life. Simon & Schuster, New York, NY, USA, 1996.

20. D. Lottridge and M. Chignell. Emotrace: Tracing emotions through human-system interaction. In HFES '09: Proceedings of the 53rd Annual Meeting of the Human Factors and Ergonomics Society, pages 1541-1545. HFES, 2009.

21. R. Mandryk and M. Atkins. A fuzzy physiological approach for continuously modeling emotion during interaction with play environments. International Journal of Human-Computer Studies, 65(4):329-347, 2007.

22. R. L. Mandryk, M. S. Atkins, and K. M. Inkpen. A continuous and objective evaluation of emotional experience with interactive play environments. In CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1027-1036, New York, NY, USA, 2006. ACM.

23. I. Mauss, R. Levenson, L. McCarter, F. Wilhelm, and J. Gross. The tie that binds? Coherence among emotional experience, behavior, and autonomic physiology. Emotion, 5:175-190, 2005.

24. D. Maynes-Aminzade, R. Pausch, and S. Seitz. Techniques for interactive audience participation. In ICMI '02: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, page 15, Washington, DC, USA, 2002. IEEE Computer Society.

25. J. N. Peterman. The program analyzer: a new technique in studying liked and disliked items in radio programs. Journal of Applied Psychology, 23:725-741, 1940.

26. R. W. Picard. Affective Computing. MIT Press, Cambridge, MA, USA, 1997.

27. J. Radbourne, K. Johanson, H. Glow, and T. White. Audience experience: measuring quality in the performing arts. International Journal of Arts Management, 11(3):16-29, 2009.

28. A. Ranjan, J. Birnholtz, and R. Balakrishnan. Improving meeting capture by applying television production principles with audio and motion detection. In CHI '08: Proceedings of the 26th International Conference on Human Factors in Computing Systems, pages 227-236, New York, NY, USA, 2008. ACM.

29. J. Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39:1161-1178, 1980.

30. R. Schechner. Environmental Theatre. Applause Books, 2000.

31. N. S. Shami, J. T. Hancock, C. Peter, M. Muller, and R. Mandryk. Measuring affect in HCI: going beyond the individual. In CHI '08 Extended Abstracts on Human Factors in Computing Systems, pages 3901-3904, New York, NY, USA, 2008. ACM.

32. S. Shapiro, D. J. MacInnis, and C. W. Park. Understanding program-induced mood effects: Decoupling arousal from valence. Journal of Advertising, 31(4):15-26, 2002.

33. D. Stayman and D. Aaker. Continuous measurement of self-report of emotional response. Journal of Psychology and Marketing, 10(3):199-214, 1993.

34. C. J. Stevens, E. Schubert, R. H. Morris, M. Frear, J. Chen, S. Healey, C. Schoknecht, and S. Hansen. Cognition and the temporal arts: Investigating audience response to dance using PDAs that record continuous data during live performance. International Journal of Human-Computer Studies, 67(9):800-813, 2009.

35. R. B. Zajonc. Feeling and thinking: Preferences need no inferences. American Psychologist, 35:151-175, 1980.
