Communications of ACM 2018 November

COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 11/2018 VOL. 61 NO. 11
Special Section on China Region
A Look at the Design of Lua
AI, Explain Yourself
Software Challenges for the Changing Storage Landscape
Association for Computing Machinery
COMMUNICATIONS OF THE ACM

Departments

Cerf’s Up
The Upper Layers of the Internet
By Vinton G. Cerf

Vardi’s Insights
Self-Reference and Section 230
By Moshe Y. Vardi

BLOG@CACM
The Gap in CS, Mulling Irrational Exuberance

Calendar

Careers

Last Byte

Future Tense
Between the Abbey and the Edge of Time
A photo marks my place, then and now.
By Brian Clegg

News

AI, Explain Yourself
It is increasingly important to understand how artificial intelligence comes to a decision.
By Don Monroe

A New Movement in Seismology
Unused telecom fiber might be used to detect earthquakes, uncover other secrets in the soil.
By Neil Savage

Weighing the Impact of GDPR
The EU data regulation will affect computer, Internet, and technology usage within and outside the EU; how it will play out remains to be seen.
By Samuel Greengard

Viewpoints

Legally Speaking
The EU’s Controversial Digital Single Market Directive
Should copyright enforcement have precedence over the interests of users in information privacy and fundamental freedoms?
By Pamela Samuelson

Inside Risks
The Big Picture
A systems-oriented view of trustworthiness.
By Steven M. Bellovin and Peter G. Neumann

Education
How Machine Learning Impacts the Undergraduate Computing Curriculum
The growing importance of machine learning creates challenging questions for computing education.
By R. Benjamin Shapiro, Rebecca Fiebrink, and Peter Norvig

Viewpoint
Using Any Surface to Realize a New Paradigm for Wireless Communications
Programmable wireless environments use unique customizable software processes rather than traditional rigid channel models.
By C. Liaskos, A. Tsioliaridou, A. Pitsillides, S. Ioannidis, and I. Akyildiz

Viewpoint
Crude and Rude?
Old ways in the new oil business.
By Janne Lahtiranta and Sami Hyrynsalmi

Special Section: China Region

This issue presents the first in a series of regional special sections that spotlight computing innovation from around the world. Technology advances from the China Region are explored in a collection of articles that focus on some of the big trends and hot topics shaping its computing landscape.
Watch the co-organizers discuss this section in the exclusive Communications video. https://cacm.acm.org/videos/china-region

Practice

Corp to Cloud: Google’s Virtual Desktops
How Google moved its virtual desktops to the cloud.
By Matt Fata, Philippe-Joseph Arida, Patrick Hahn, and Betsy Beyer

Research for Practice: Knowledge Base Construction in the Machine-Learning Era
Three critical design points: Joint learning, weak supervision, and new representations.
By Alex Ratner and Chris Ré

Tracking and Controlling Microservice Dependencies
Dependency management is a crucial part of system and software design.
By Silvia Esparrachiari Ghirotti, Tanya Reilly, and Ashleigh Rentz

Articles’ development led by queue.acm.org

Contributed Articles

Skill Discovery in Virtual Assistants
Skill recommendations must be provided when users need them most, without being obtrusive or distracting.
By Ryen W. White

A Look at the Design of Lua
Simplicity, small size, portability, and embeddability set Lua apart from other scripting languages.
By Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes
Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/a-look-at-the-design-of-lua

Modern Debugging: The Art of Finding a Needle in a Haystack
Systematic use of proven debugging approaches and tools lets programmers address even apparently intractable bugs.
By Diomidis Spinellis

Review Articles

Software Challenges for the Changing Storage Landscape
Conventional storage software stacks are unable to meet the needs of high-performance Storage-Class Memory technology. It is time to rethink 50-year-old architectures.
By Daniel Waddington and Jim Harris

Research Highlights

Technical Perspective
Backdoor Engineering
By Markus G. Kuhn

Where Did I Leave My Keys? Lessons from the Juniper Dual EC Incident
By Stephen Checkoway, Jacob Maskiewicz, Christina Garman, Joshua Fried, Shaanan Cohney, Matthew Green, Nadia Heninger, Ralf-Philipp Weinmann, Eric Rescorla, and Hovav Shacham

Technical Perspective
Making Sleep Tracking More User Friendly
By Tanzeem Choudhury

LIBS: A Bioelectrical Sensing System from Human Ears for Staging Whole-Night Sleep Study
By Anh Nguyen, Raghda Alqurashi, Zohreh Raghebi, Farnoush Banaei-Kashani, Ann C. Halbower, and Tam Vu

About the Cover:
A spherical mosaic of some of the images, icons, and technologies depicted in this issue’s special section on the China Region. Cover illustration by Spooky Pooka at Debut Art.

IMAGES IN COVER COLLAGE: Naomi Wu photo courtesy of Naomi Wu/Wikimedia CC-BY-SA-4.0. AliExpress photo by Piotr Swat/Shutterstock.com; PUBG player photo by Hafiez Razali/Shutterstock.com; subway photo by Chutharat Kamkhuntee/Shutterstock.com; Xiaomi store photo by THINK A/Shutterstock.com; robot waiters photo by Inspired By Maps/Shutterstock.com; smart-bike photo by tomocz/Shutterstock.com; Lei Jun photo by zhangjin_net/Shutterstock.com; Alibaba app photo by Jonathan Weiss/Shutterstock.com. Additional stock images from Shutterstock.com.

Association for Computing Machinery
Advancing Computing as a Science & Profession

COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO: Vicki L. Hanson
Deputy Executive Director and COO: Patricia Ryan
Director, Office of Information Systems: Wayne Graves
Director, Office of Financial Services: Darren Ramdin
Director, Office of SIG Services: Donna Cappo
Director, Office of Publications: Scott E. Delman

ACM COUNCIL
President: Cherri M. Pancake
Vice-President: Elizabeth Churchill
Secretary/Treasurer: Yannis Ioannidis
Past President: Alexander L. Wolf
Chair, SGB Board: Jeff Jortner
Co-Chairs, Publications Board: Jack Davidson and Joseph Konstan
Members-at-Large: Gabriele Anderst-Kotis; Susan Dumais; Renée McCauley; Claudia Bauzer Mederios; Elizabeth D. Mynatt; Pamela Samuelson; Theo Schlossnagle; Eugene H. Spafford
SGB Council Representatives: Sarita Adve; Jeanna Neefe Matthews

BOARD CHAIRS
Education Board: Mehran Sahami and Jane Chu Prey
Practitioners Board: Terry Coatta and Stephen Ibaraki

REGIONAL COUNCIL CHAIRS
ACM Europe Council: Chris Hankin
ACM India Council: Abhiram Ranade
ACM China Council: Wenguang Chen

PUBLICATIONS BOARD
Co-Chairs: Jack Davidson; Joseph Konstan
Board Members: Phoebe Ayers; Edward A. Fox; Chris Hankin; Xiang-Yang Li; Sue Moon; Michael L. Nelson; Sharon Oviatt; Eugene H. Spafford; Stephen N. Spencer; Divesh Srivastava; Robert Walker; Julie R. Williamson

ACM U.S. Public Policy Office
Adam Eisgrau, Director of Global Policy and Public Affairs
1701 Pennsylvania Ave NW, Suite 300, Washington, DC 20006 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Jake Baskin, Executive Director

STAFF
Director of Publications: Scott E. Delman, cacm-publisher@cacm.acm.org
Executive Editor: Diane Crawford
Managing Editor: Thomas E. Lambert
Senior Editor: Andrew Rosenbloom
Senior Editor/News: Lawrence M. Fisher
Web Editor: David Roman
Editorial Assistant: Jade Morris
Art Director: Andrij Borys
Associate Art Director: Margaret Gray
Assistant Art Director: Mia Angelica Balaquiot
Production Manager: Bernadette Shade
Intellectual Property Rights Coordinator: Barbara Ryan
Advertising Sales Account Manager: Ilia Rodriguez

Columnists: David Anderson; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission: permissions@hq.acm.org
Calendar items: calendar@cacm.acm.org
Change of address: acmhelp@acm.org
Letters to the Editor: letters@cacm.acm.org

WEBSITE
http://cacm.acm.org

WEB BOARD
Chair: James Landay
Board Members: Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

AUTHOR GUIDELINES
http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686
F (212) 869-0481
Advertising Sales Account Manager: Ilia Rodriguez, ilia.rodriguez@hq.acm.org
Media Kit: acmmediasales@acm.org

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
Editor-in-Chief: Andrew A. Chien, eic@cacm.acm.org
Deputy to the Editor-in-Chief: Lihan Chen, cacm.deputy.to.eic@gmail.com
Senior Editor: Moshe Y. Vardi

NEWS
Co-Chairs: William Pulleyblank and Marc Snir
Board Members: Monica Divitini; Mei Kobayashi; Michael Mitzenmacher; Rajeev Rastogi; François Sillion

VIEWPOINTS
Co-Chairs: Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom
Board Members: Stefan Bechtold; Michael L. Best; Judith Bishop; Andrew W. Cross; Mark Guzdial; Haym B. Hirsch; Richard Ladner; Carl Landwehr; Beng Chin Ooi; Francesca Rossi; Loren Terveen; Marshall Van Alstyne; Jeannette Wing; Susan J. Winter

PRACTICE
Co-Chairs: Stephen Bourne and Theo Schlossnagle
Board Members: Eric Allman; Samy Bahra; Peter Bailis; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Jessie Frazelle; Benjamin Fried; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker

CONTRIBUTED ARTICLES
Co-Chairs: James Larus and Gail Murphy
Board Members: William Aiello; Robert Austin; Kim Bruce; Alan Bundy; Peter Buneman; Jeff Chase; Carl Gutwin; Yannis Ioannidis; Gal A. Kaminka; Ashish Kapoor; Kristin Lauter; Igor Markov; Bernhard Nebel; Lionel M. Ni; Adrian Perrig; Marie-Christine Rousset; Krishan Sabnani; m.c. schraefel; Ron Shamir; Alex Smola; Josep Torrellas; Sebastian Uchitel; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs: Azer Bestavros and Shriram Krishnamurthi
Board Members: Martin Abadi; Amr El Abbadi; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; David Brooks; Stuart K. Card; Jon Crowcroft; Alexei Efros; Bryan Ford; Alon Halevy; Gernot Heiser; Takeo Igarashi; Sven Koenig; Greg Morrisett; Tim Roughgarden; Guy Steele, Jr.; Robert Williamson; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller

SPECIAL SECTIONS
Co-Chair: Sriram Rajamani
Board Members: Tao Xie; Kenjiro Taura; David Padua

ACM Copyright Notice
Copyright © 2018 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@hq.acm.org or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to Communications of the ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA

Printed in the USA.

cerf’s up
DOI:10.1145/3281164 Vinton G. Cerf
The Upper Layers of the Internet

The Internet is a layered infrastructure and much has been made of the utility of that layering. IP layer packets are unaware of their underlying transport mechanisms.

The Internet Protocol (IP) layer does not know or care what it carries in its payloads except that they are made up of binary bits. The protocols in the layer above IP, such as the Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or Real-time Transport Protocol (RTP), are equally unaware of the meaning of the bits they carry, although they treat these bits differently: TCP keeps packets in order and repairs loss by retransmission. It also filters out duplicates and controls the flow to deal with congestion. UDP transports packets quickly but without concern for ordering or recovery from loss. RTP uses time stamping and payload typing to cope with jitter and with correct interpretation of the format of the carried bits. Payloads can be marked as audio encoded (for example, G.711, G.723) or video encoded (for example, H.261, MPEG-4), although the meaning of these encoded payloads is opaque to RTP.

Electronic mail is carried in the Internet by the Simple Mail Transfer Protocol (SMTP). SMTP knows about the format of email messages but not about the meaning of the content. The World Wide Web uses the Hypertext Transfer Protocol (HTTP) over TCP to carry Web page content encoded in HyperText Markup Language (HTML). Huge numbers of applications sit on top of HTTP providing useful information or functionality in the form of smartphone or other computing platform applications.

Above these application layers, however, the meaning of the content becomes important. The content of email, Web pages, video and audio streams is interpreted and used by humans. The transporting protocols and the applications that render content for human consumption are unaware of this meaning. One can imagine, however, a virtual layer above the application layer that deals with content. We might think of this as a political layer, not in the partisan sense, but in the sense that the content is viewed through a policy lens applied by some agreed-upon authority. The operators of application platforms may choose to enforce terms and conditions of use (for example, content inciting violence is not allowed, no one below a certain age is permitted to use the application or consume the content) or may be required by legal means to enforce such restrictions.

Policy is a means of expressing behavioral norms that may also be enforced through law. Policy is a way to provide positive incentive for desired behavior or negative incentive for undesired behavior. In the Internet space, so-called platform providers may self-regulate or may be regulated by external government agencies with the means to compel desired behaviors. The Internet’s applications and the providers who support them form a kind of smart medium in which content is important. In some regimes, these platform providers are treated as common-carrier-like entities that are not responsible for content while those producing the content are held to enforceable norms. In other regimes, the smart media providers are required to become policers of content at risk of legal sanctions. Incentives for content providers may even be structured to lead to self-censorship to avoid penalties.

The choices made by governments as to the rules and means by which content is controlled have a profound effect on the nature of the society thus produced. Among the rights expressed in the Universal Declaration of Human Rights is freedom of speech. That freedom may well be abridged through incentives for platform providers either to restrict that freedom or to reveal the identity of the speaker, who then faces the consequences of prohibited speech. In the context of the World Wide Web, in some countries, search engine providers have been required to remove from their indices references to content deemed unacceptable by government authorities. On the other hand, there are endless examples of malware and misinformation that can be seen as harmful speech that threatens the safety and security of citizens and should be filtered and removed.

The permissive openness of the lower layers of the Internet is now plainly challenged by the policy choices made at the virtual political layer. Where lines are drawn in permission space will profoundly affect the utility and ubiquity of the Internet and the society in which it is embedded. The tussle in cyberspace, so aptly named by David D. Clark et al.,[a] is far from over.

[a] https://goo.gl/rLYqU8

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by author.
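
As a rough illustration of the layering described in this column, the following Python sketch shows a TCP-like receiver that restores order and filters duplicates while treating every payload as opaque bytes. The function name deliver_in_order and the per-segment sequence numbers are assumptions made for illustration; real TCP numbers individual bytes and also handles acknowledgment, retransmission, and congestion control, none of which is modeled here.

```python
from typing import Dict, Iterable, Iterator, Tuple


def deliver_in_order(
    segments: Iterable[Tuple[int, bytes]], first_seq: int = 0
) -> Iterator[bytes]:
    """Yield payloads in sequence order, silently dropping duplicates.

    The payload is never inspected: like the transport layer in the
    column, this code only looks at the sequence number.
    """
    expected = first_seq            # next sequence number to hand upward
    pending: Dict[int, bytes] = {}  # segments that arrived early
    for seq, payload in segments:
        if seq < expected or seq in pending:
            continue                # duplicate of something already seen
        pending[seq] = payload
        # Release every segment that is now contiguous with what was delivered.
        while expected in pending:
            yield pending.pop(expected)
            expected += 1


# Segments arrive out of order, and segment 1 arrives twice.
arrivals = [(1, b"lo, "), (0, b"hel"), (1, b"lo, "), (3, b"d"), (2, b"worl")]
message = b"".join(deliver_in_order(arrivals))
print(message)  # → b'hello, world'
```

A UDP-like receiver would simply pass each payload upward as it arrives, with no buffering and no duplicate check, which is the trade-off between speed and ordering that the column describes.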

ACM Chuck Thacker Breakthrough in Computing Award
The “ACM Breakthrough Award”
Nominations Solicited
Nominations are invited for the inaugural 2018

ACM Charles P. “Chuck” Thacker Breakthrough in
Computing Award (the “ACM Breakthrough Award”).
ACM Turing Laureate Charles P. (Chuck) Thacker (1943–2017)
received the 2009 ACM A.M. Turing Award for “the pioneering
design and realization of the first modern personal computer—
the Alto at Xerox PARC—and seminal inventions and
contributions to local area networks (including the Ethernet),
multiprocessor workstations, snooping cache coherence
protocols, and tablet personal computers.”
The award was established in recognition of Thacker’s pioneering contributions in computing.

These contributions are considered by the community to have propelled the world in the
early 1970s from a visionary idea to the reality of modern personal computing, providing
people with an early glimpse of how computing would deeply influence us all. The award also
celebrates Thacker’s long-term inspirational mentorship of generations of computer scientists.
The Breakthrough Award will recognize individuals with the same out-of-the-box thinking
and “can-do” approach to solving the unsolved that Thacker exhibited. The recipient should
be someone who has made a surprising or disruptive leapfrog in computing ideas or
technologies that provides a new capability or understanding that influences the course of
computing technologies in a deep and significant manner through its numerous downstream
influences and outcomes. Due to the breakthrough nature of the award it is expected that it
will be presented biennially and will not be presented if there is no candidate who meets the
criteria in a particular year.
The award is accompanied by a prize of $100,000 and would be presented at the annual ACM
Awards Banquet. The award recipient would be expected to give the ACM Breakthrough Lecture
at a major ACM conference of his or her choice during the year following the announcement.
The travel expenses of the recipient, and a companion, to attend the Lecture are supported by
the award. Financial support of the Thacker Award is provided by Microsoft.
Nomination information and the online submission form are available on:
https://awards.acm.org/thacker/nominations
The deadline for nominations/endorsements is:

January 15, 2019, End of Day, AoE, UTC-12 hours.
For additional information on ACM’s award program please visit:

www.acm.org/awards/
vardi’s insights
DOI:10.1145/3279813 Moshe Y. Vardi
Self-Reference and Section 230
One of the most amazing features of human languages is their capacity for self-reference. The consequences of this feature were explored by Eubulides, a 4th-century BCE Greek philosopher, who formulated the Liar’s Paradox, “What I am saying now is a lie.” Is this a lie or not? For over 2,000 years, the Liar’s Paradox was a philosophical oddity. In 1902, in a letter to the mathematician Friedrich Frege, the philosopher Bertrand Russell reformulated the Liar’s Paradox as a paradox in set theory, arguing that “the collection of all sets that do not include themselves as members” both cannot be a set and cannot fail to be a set. By identifying a contradiction in set theory, Russell launched a “foundational crisis” in mathematics.

The foundational crisis proved to be enormously fertile. It inspired David Hilbert to launch in the early 1920s what has become known as “Hilbert’s Program,” the goal of which was to demonstrate that mathematics was consistent (free of paradoxes), complete (can answer all mathematical questions), and decidable (amenable to computation). Within 15 years, Hilbert’s Program was demolished first by Kurt Gödel, who proved that arithmetic is incomplete and cannot prove its own consistency, and then by Church and Turing, who showed that First-Order Logic is undecidable. The crucial technique used by Gödel, Church, and Turing is that of diagonalization, which is a technical term for self-reference. Turing invented Turing Machines as a model for computability in order to prove his undecidability result, so theoretical computer science rose out of the ashes of Hilbert’s Program. Diagonalization went on to play a key role in computational complexity theory, where it was used to prove separation results, for example, that there are problems that can be solved in exponential time, but not in polynomial time.

Yet mathematicians, in general, took the demise of Hilbert’s Program in stride. Mathematics is incomplete, undecidable, and cannot prove its own consistency; so what? Mathematics has just gone on. Perhaps computer scientists should develop a similar nonchalant attitude about negative results. The undecidability of program termination means there is no algorithm that can correctly decide termination of all programs. So what? As I argued in “Solving the Unsolvable,”[a] program termination may be practically decidable, even though it is theoretically undecidable. Just as mathematicians gave up on the quest to find a proof system that can prove all true mathematical statements, we may need to give up on the quest for algorithms that solve all problem instances. In other words, the quest for universality is self-destructive.

Self-reference was taken in a different direction by the philosopher Karl Popper, who formulated the paradox of freedom in 1945: “The so-called paradox of freedom is the argument that freedom in the sense of absence of any constraining control must lead to very great restraint, since it makes the bully free to enslave the meek.” Closely related is the paradox of tolerance: “Unlimited tolerance must lead to the disappearance of tolerance.” Popper’s conclusion was that we must give up on the universality of freedom and tolerance, as complete freedom and tolerance are self-destructive. Even a free society must have some limits on freedom, and a tolerant society must be intolerant of intolerance.

These philosophical musings from more than 70 years ago seem these days to be quite prescient and relevant. Section 230 of the Communications Decency Act of 1996 is a fundamental item of Internet legislation in the U.S. Section 230 provides immunity from liability for providers and users of an “interactive computer service” who publish information provided by third-party users, asserting: “No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.” Section 230 enables Internet companies such as Google and Facebook to be considered as platforms rather than as publishers, free from liability for the content they publish. The explosive growth of social-media platforms would not have been possible without Section 230.

Yet this explosive growth has led to the proliferation of “bad” speech on social-media platforms, which has become politically untenable. All social-media platforms are now actively fighting “fake news,” that is, false news stories typically spread with the intent to influence political views. Recently, social-media platforms have banned the conspiracy theorist Alex Jones for violating their “abusive behavior” policy. In spite of Section 230, social-media platforms seem to be accepting responsibility for the content they publish. In other words, they are starting to behave with some restraint, like publishers, rather than platforms. Popper would be pleased with this development!

Follow me on Facebook, Google+, and Twitter.

[a] https://bit.ly/2qzssiv

Moshe Y. Vardi (vardi@cs.rice.edu) is the Karen Ostrum George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University, Houston, TX, USA. He is the former Editor-in-Chief of Communications.

Copyright held by author.
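
The diagonalization technique mentioned in this column can be sketched in a few lines of Python. The finite list of predicates below is a hypothetical stand-in for an enumeration of all functions of some kind, and the names diagonalize and listed are illustrative only. The sketch shows the core move: build a new function that disagrees with the n-th listed function on input n, so it cannot appear anywhere in the list.

```python
from typing import Callable, List

Predicate = Callable[[int], bool]


def diagonalize(listed: List[Predicate]) -> Predicate:
    """Return a predicate that differs from listed[n] on input n."""
    def g(n: int) -> bool:
        # Ask the n-th function about its own index, then say the opposite.
        return not listed[n](n)
    return g


# A small, made-up "enumeration" of predicates on natural numbers.
listed: List[Predicate] = [
    lambda n: True,
    lambda n: n % 2 == 0,
    lambda n: n > 5,
]

g = diagonalize(listed)

# g disagrees with every listed predicate at that predicate's own index.
differs = [g(i) != f(i) for i, f in enumerate(listed)]
print(differs)  # → [True, True, True]
```

Turing's argument applies the same self-referential step to an enumeration of all programs together with a supposed termination decider; the contradiction that results is what shows no such decider can exist. The code above illustrates only the diagonal construction, not that full proof.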

The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
DOI:10.1145/3276740 http://cacm.acm.org/blogs/blog-cacm
The Gap in CS, Mulling portant way, while fundamentally im-

proving education and professional
Irrational Exuberance
relationships in CS.
A nonprofit professional encyclope-
dia would be self-supporting through
appropriate professionally relevant
Carl Hewitt suggests computer science needs advertising carefully curated for high
a reference resource, while Vijay Kumar decries standards using existing advertising
intellectual dishonesty in technology forecasting. programs.
Vijay Kumar
Carl Hewitt disability, and national origin inte- Irrational Exuberance
Computer Science grating content suitable for all from and the ‘FATE’
Encyclopedia preschool to advanced researchers. of Technology
Can Fill a Gap The encyclopedia could support in- https://cacm.acm.
https://cacm.acm. teractive articles with videos, anima- org/blogs/blog-
org/blogs/blog- tions, and dynamic narrations. Within cacm/230472-irrational-exuberance-
cacm/230860-computer-science- a decade, interactive content could be and-the-fate-of-technology/fulltext
encyclopedia-can-fill-a-gap/fulltext a requirement for most articles. Over August 20, 2018
September 5, 2018 time, the encyclopedia should be or- I am sure many of us remember the
There is an important gap in comput- ganized using ontological services Netscape IPO in 1995 and the fivefold
er science (CS) education and profes- supporting programmatic interfaces growth in share value in four months.
sional collaboration that can be filled for a knowledge graph. Expectations for technology and its im-
by a nonprofit online reputable, The encyclopedia should become a pact were in the stratosphere. The Feder-
referenceable encyclopedia support- standard reference, a trustworthy pro- al Reserve Board’s then-chairman, Alan
ed by appropriate professionally rel- fessionally accountable educational re- Greenspan, gave a speech at the Ameri-
evant advertising. The encyclopedia source for all. Currently, there is no on- can Enterprise Institute questioning “ir-
should be managed by a prestigious line encyclopedia that can serve as the rational exuberance” in the market and
editorial board that appoints a hier- source of valid scientific references. in technology.1 I believe today we are see-
archy of editors to moderate articles. Our profession has the credibility ing similar exuberance with technology.
The editorial board can guarantee and resources to create an encyclope- Are revolutionary technologies for
editorial independence from adver- dia to serve as the professional stan- cancer screening—that rely on a finger-
tisers analogous to current profes- dard. Serving as a member of its edito- prick drawing one-thousandth the nor-
sional practices for journals and con- rial board could be a prestigious office mal amount of blood—really feasible?
ferences. Anyone would be allowed for senior professionals to provide ex- Theranos had everyone believe such a
to register to submit suggestions and perience and judgment. Professional revolutionary advancement was pos-
drafts to the editors. Access to ar- reputations could be enhanced by sible2 not because of new techniques in
ticles would be free and available to contributing to the encyclopedia be- analytical chemistry, but because it had
all. The encyclopedia must establish cause contributions would be publicly developed novel software and new auto-
procedures to be fair and inclusive on announced. The encyclopedia could mation technologies! Can we really hope
the basis of race, sex, religion, age, knit together our profession in an im- to replace eight million cars in Los An-
8 COMMUNICATIONS OF THE ACM | NOVEMBER 2018 | VOL. 61 | NO. 11

blog@cacm
geles by boring tunnels3 for high-speed pods that will travel at 150 MPH at $1 per ride? This is what Boring Company is selling the City of Los Angeles. Do recent advances in data science and machine learning really mean artificial general intelligence is around the corner? This is the pitch of so many startups today.

There have been advances in statistical machine learning, which have had remarkable impact in fields like computer vision and speech recognition when the underlying neural networks are trained by large-enough representative datasets. What “large enough” means, we don’t yet know. Neither do we know when we have a representative dataset. Yet there are many interesting cases where deep learning “works.” These success stories are oversold. In my own field, robotics, autonomy is a challenging problem, especially in tasks of manipulation and perception-action loops. Yet despite the claims being made, our best robots lack the dexterity of a three-year-old child.

Nowhere is irrational exuberance more evident than in self-driving cars. Not many people know the first demonstrations of an autonomous car were in the late 1980s at the Bundeswehr University Munich and at Carnegie Mellon University. Autonomous vehicles can have a tremendous social, economic, and environmental impact. This fact, and the technical challenges in realizing a bold vision, have attracted some of the top talent in science and engineering over the last 30 years. However, many of us don’t remember history, and many choose to ignore it since problems known to have not been solved for three decades are unlikely to attract investment.

According to recent predictions,4 fully autonomous cars will be available soon. Several years ago, fully autonomous Audis and Teslas were promised by 2018. Uber even promised us flying cars powered by clean energy by 2023, even though the basic physics and chemistry underlying battery technology tell us otherwise.5

It is worrisome when engineers make these claims, and even more so when entrepreneurs use such claims to raise funding. However, the biggest concern should be about embedding software for autonomy in safety-critical systems. There is a difference between running tests and logging data, and verification of software guaranteed not to have unwanted, unsafe behaviors. Can we claim vehicles are safe because the underlying software has been tested with over a billion miles of data? U.S. National Safety Council statistics suggest a billion miles of human driving, on average, results in 12.5 fatalities,6 and a billion-mile dataset cannot possibly be viewed as large enough or representative enough to train software to prevent fatalities.

The Uber-Waymo trial led to the release of documents truly shocking in this regard. They reveal a culture7 that appears to prioritize releasing the latest software over testing and verification, and one that encourages shortcuts. This may be acceptable for a buggy operating system for a phone that can be patched later, but should be unacceptable for software that drives a car.

This irrational exuberance may have its roots in the exponential growth in computing and storage technologies predicted by Gordon Moore five decades ago. The fact that just over a decade ago smartphones, cloud computing, and ride-sharing seemed like science fiction, and technologies like 3D printing and DNA sequencing were prohibitively expensive, has led to a culture of extrapolation fueled by exponential growth.

Advances in creating programs that can play board games like chess and recent results with AlphaGo and AlphaZero have been mind-boggling. Unfortunately, from this comes the extrapolation that it is only a question of time before we conquer general intelligence.

There is at least one argument that we are not making significant progress in understanding intelligence if we take into account the exponential growth in computing due to Moore’s Law. While computers have achieved superhuman performance in chess, the Elo rating of chess programs has merely increased linearly over the last three decades.8 If we were able to exploit the benefits of Moore’s Law, our chess-playing programs should be a billion times better than the programs from 30 years ago, instead of merely 30 times better. This suggests the exponential growth of technology may not even apply to algorithmic advances in artificial intelligence,9 let alone to advances in energy storage, biotechnology, automation, and manufacturing.

Irrational exuberance in technology has led to an even bigger problem: intellectual dishonesty, which every engineer and computer scientist must guard against. As professionals, it is our responsibility to call out intellectual dishonesty.

Questions of verification, safety, and trust must be central when we embody intelligence in physical systems. Questions of fairness, accountability, transparency, and ethics (FATE) should be addressed for data and information in society. It is great to see such efforts taking shape in industry10 and academia.11

As teachers, we have an even bigger responsibility, as technology is no longer taught to just computer scientists or engineers; it is now a new liberal art. It is critical to address the true limitations of what technology can really bring about in the imminent future and the real dangers of extrapolation. Every university student who designs or creates anything must be sensitized to fundamental concerns of accountability and transparency and ethical responsibilities. We must address the FATE of technology, across all activities of design, synthesis and reduction of technologies to practice.

References
1. https://www.federalreserve.gov/boarddocs/speeches/1996/19961205.htm
2. https://www.newyorker.com/magazine/2014/12/15/blood-simpler
3. https://www.cnbc.com/2018/05/18/elon-musk-promises-1-rides-for-boring-companys-la-tunnels.html
4. http://www.driverless-future.com/?page_id=384
5. https://qz.com/1243334/the-magical-battery-uber-needs-for-its-flying-cars/
6. https://www.forbes.com/sites/jimgorzelany/2017/02/16/death-race-2017-where-to-find-the-most-dangerous-roads-in-america/#68a5354a1324
7. https://www.theverge.com/2018/3/20/17144090/uber-car-accident-arizona-safety-anthony-levandowski-waymo
8. https://www.eff.org/ai/metrics
9. https://freedom-to-tinker.com/2018/01/04/singularity-skepticism-2-why-self-improvement-isnt-enough/
10. https://www.microsoft.com/en-us/research/group/fate/
11. http://warrencenter.upenn.edu/

Carl Hewitt is an emeritus professor of the Massachusetts Institute of Technology, USA. He is board chair of iRobust, a scientific society for the promotion of the field of Inconsistency Robustness, and board chair of Standard IoT, an international standards organization for the Internet of Things. Vijay Kumar is Nemirovsky Family Dean of Penn Engineering with appointments in the departments of Mechanical Engineering and Applied Mechanics, Computer and Information Science, and Electrical and Systems Engineering at the University of Pennsylvania, USA.

© 2018 ACM 0001-0782/18/11 $15.00
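The two numerical claims in the post above (a "billion times" improvement over 30 years, and 12.5 fatalities per billion miles) can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the doubling periods are assumptions, since the post does not say which period it has in mind.

```python
"""Back-of-the-envelope checks for two figures quoted in the post above."""


def compound_growth(years: float, doubling_period_years: float) -> float:
    """Growth factor after `years` if capability doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def miles_per_fatality(fatalities_per_billion_miles: float) -> float:
    """Average miles driven per fatality, given a rate per billion miles."""
    return 1e9 / fatalities_per_billion_miles


if __name__ == "__main__":
    # Assumption: one doubling per year, which is what "a billion times" over
    # 30 years implies (2**30 is about 1.07e9). The post states no period.
    print(f"30 years, doubling yearly:        {compound_growth(30, 1.0):.3g}x")
    # Assumption: a slower two-year doubling period, shown only for comparison.
    print(f"30 years, doubling every 2 years: {compound_growth(30, 2.0):.3g}x")
    # 12.5 fatalities per billion miles is the figure quoted in the post.
    print(f"Miles per fatality: {miles_per_fatality(12.5):,.0f}")
```

Under the yearly-doubling assumption the factor is 2^30, roughly 1.07 billion, which matches the post's "billion times" figure. The quoted fatality rate works out to one fatality per 80 million miles of human driving, so a billion-mile dataset would contain only about a dozen such events on average.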

Inviting Young
Scientists
Meet Great Minds in Computer
Science and Mathematics
As one of the founding organizations of the Heidelberg Laureate Forum
http://www.heidelberg-laureate-forum.org/, ACM invites young computer
science and mathematics researchers to meet some of the preeminent
scientists in their field. These may be the very pioneering researchers who
sparked your passion for research in computer science and/or mathematics.
These laureates include recipients of the ACM A.M. Turing Award, the
Abel Prize, the Fields Medal, and the Nevanlinna Prize.
The 7th Heidelberg Laureate Forum will take place September 22–27, 2019 in
Heidelberg, Germany.
This week-long event features presentations, workshops, panel discussions,
and social events focusing on scientific inspiration and exchange among
laureates and young scientists.
Who can participate?

New and recent Ph.D.s, doctoral candidates, other graduate students
pursuing research, and undergraduate students with solid research
experience and a commitment to computing research
How to apply:
Online: https://application.heidelberg-laureate-forum.org/
Materials to complete applications are listed on the site.
What is the schedule?
The application deadline is February 15, 2019.
We reserve the right to close the application website
early depending on the volume
Successful applicants will be notified by mid April 2019.
More information available on Heidelberg social media
news
Science | DOI:10.1145/3276742 Don Monroe
AI, Explain Yourself

It is increasingly important to understand
how artificial intelligence comes to a decision.
Artificial intelligence (AI) systems are taking over a vast array of tasks that previously depended on human expertise and judgment. Often, however, the “reasoning” behind their actions is unclear, and can produce surprising errors or reinforce biased processes. One way to address this issue is to make AI “explainable” to humans, for example, to designers who can improve it, or to let users better know when to trust it. Although the best styles of explanation for different purposes are still being studied, they will profoundly shape how future AI is used.

Some explainable AI, or XAI, has long been familiar, as part of online recommender systems: book purchasers or movie viewers see suggestions for additional selections described as having certain similar attributes, or being chosen by similar users. The stakes are low, however, and occasional misfires are easily ignored, with or without these explanations.

Nonetheless, the choices made by these and other AI systems sometimes defy common sense, showing our faith in them is often an unjustified projection of our own thinking. “The implicit notion that AI somehow is another form of consciousness is very disturbing to me,” said Ben Shneiderman, a Distinguished University Professor in the department of computer science and founding director of the Human-Computer Interaction Laboratory at the University of Maryland.

As AI is applied more broadly, it will be critical to understand how it reaches its conclusions, sometimes for specific cases and sometimes as a general principle. At the individual level, designers have both ethical and legal responsibilities to provide such justification for decisions that could result in death, financial loss, or denial of parole. A reform of European Union data protection rules that took effect in May highlighted these responsibilities, although they refer only indirectly to a “right to explanation.” Still, any required explanations will not help much if they resemble the unread fine print of software end-user agreements. “It must be explainable to people,” Shneiderman said, including people who are not expert in AI.
For designers, providing explanations of surprising decisions need not be just an extra headache, but it “is going to be a very virtuous thing for AI,” Shneiderman stressed. “If you have an explainable algorithm, you’re more likely to have an effective one,” he asserted.

Explainable methods have not always performed better, though. For example, early AI comprised large sets of rules motivated by human decision criteria, and was therefore easy to understand within a restricted domain, but its capability was often disappointing. In contrast, recent dramatic performance improvements in AI are based on deep learning using huge neural networks with many hidden features that are “programmed” by exposure to huge numbers of examples. These systems apply vast computer power to these annotated training datasets to discern patterns that are often beyond what humans can recognize.

What Is an Explanation?
Considering this internal complexity of modern AI, it may seem unreasonable to hope for a human-scale explanation. For a deep learning system trained on thousands of pictures of cats and not-cats, “Maybe the best analogy is that it develops a gut instinct for what is a cat and what isn’t,” said Ernest Davis, a professor of computer science at New York University. People devise post-hoc rationalizations for their own decisions, citing features such as pointy ears, a tail, and so forth, but “that doesn’t actually explain why you recognized it as a cat,” he said. “Generating that kind of account is a different task.”

An important challenge is that such independently generated explanations could also be chosen for their intuitive plausibility, rather than their accuracy. Presenting favorable stories will be particularly tempting when legal liability is at stake, for example, when a self-driving car kills a pedestrian, or an AI system participates in a medical mistake.

Liability assessment requires a detailed audit trail, Shneiderman said, analogous to the flight-data recorders that allow the U.S. National Transportation Safety Board to retroactively study airplane crashes. This kind of “explanation” allows a regulatory oversight agency to analyze a failure, assign penalties, and require modifications to prevent a recurring failure. “My legal friends tell me that the law is perfectly fine,” Shneiderman said. “We don’t need new laws to deal with AI.”

Explaining individual incidents is hard enough, but in other cases problems may only emerge in a system’s aggregate performance. For example, AI programs used to assess borrowers’ creditworthiness or criminals’ recidivism based on socioeconomic attributes may end up discriminating against individuals whose racial cohort tends to have unfavorable characteristics. Similarly, systems analyzing medical records “might pick up something that looks like race as an important indicator for some outcome,” when actually patients of different races just end up in hospitals that use different procedures, said Finale Doshi-Velez, an assistant professor of computer science in the John A. Paulson School of Engineering and Applied Sciences at Harvard University.

Many end users, however, seek less legalistic explanations that may not be provably connected to the underlying program. Like AI, “People are incredibly complicated in terms of how we think and make decisions,” said Doshi-Velez, but “we are able to explain things to each other.”

In medical use, for example, it can be enough to have an explanation that clarifies the diagnostic or therapeutic decision for a subset of patients with similar conditions, Doshi-Velez said. Such a “local” explanation need not address all the complexities and outliers covered by the full-blown deep learning system.

“Depending on your application, you might think of different formats of explanation,” agreed Regina Barzilay, Delta Electronics Professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology. At one level, for example, the system can explain by “identifying excerpts from the input which drove the decision,” as her group is doing for molecular modeling. Another technique is “to find which instances in the training set are the closest” to the target.

Appropriate Trust
In view of AI’s growing military importance, the U.S. Defense Advanced Research Projects Agency (DARPA) in 2017 rolled out an ambitious program to explore XAI from many different perspectives and compare them. “One of the main goals or benefits of the explanation would be appropriate trust,” stressed David Gunning, the program’s manager. “What you really need is for people to have a more fine-tuned model of what the system is doing so they know the cases when they can trust it and when they shouldn’t trust it.”

Most of the dozen projects aim to incorporate explanation-friendly features into deep learning systems; for example, preprogramming the internal network structure to favor familiar concepts.

A critical issue is whether explainable features degrade the performance of the AI. “I think there is inherent trade-off,” Gunning said, although he notes that some participants disagree. Barzilay, in contrast, says that experiments so far indicate any performance hit from making an AI explainable is “really, really minimal.”

As an alternative to modifying deep learning, one of the DARPA projects replaces it with an approach inherently easier to interpret. The challenge in that case is to make its performance more competitive, Gunning said.

A third strategy is to use a separate system to describe the learning system, which is treated as a black box, essentially using one learning system to analyze another. For this scheme, one question is whether the explanation accurately describes the original system.

As the results come in, beginning in fall 2018, “the program should produce a portfolio of techniques,” Gunning said. An important feature of the program is a formal evaluation, in which the usefulness to human users of results with or without explanation will be compared. Some of this assessment will be based on subjective impression, but users will also try to predict, for example, whether the system will correctly execute a new task.

The goal, Gunning said, is to determine whether “the explanation gives them a better idea of the system’s strengths and weaknesses.”

Ensuring Human Control
Ultimately, explanations must be understandable by humans who are not AI experts. The challenges of doing this and measuring the results are familiar to educators worldwide, and successful approaches must include not only computer science, but also psychology.

“This human-computer interaction is becoming more and more important,” for example for medical AI systems, said Andreas Holzinger, lead of the Holzinger Group at the Institute for Medical Informatics/Statistics of the Medical University of Graz, Austria, as well as an associate professor of applied computer science at the Graz University of Technology. “The most pressing question is what is interesting and what is relevant” to make the explanation useful in diagnosis and treatment. “We want to augment human intelligence,” Holzinger said. “Let the human do what the human can do well, and so for the computer.”

For scientific systems, users “are thinking about mechanistic explanations,” Barzilay said. “The potential is to have a symbiosis between machines and humans. If these patterns are provided to humans, can they really do better science?” she said. “I think this will be the next frontier.”

Instead of teamwork between AI and humans, however, Shneiderman regards the more appropriate goal as leveraging human decision making, rather than outsourcing it. “The key word is responsibility,” he said. “When we’re doing medical, or legal, or parole, or loans, or hiring, or firing, or so on, these are consequential.

“It’s time for AI to move out of its adolescent, game-playing phase and take seriously the notions of quality and reliability,” says Shneiderman.

Further Reading

Statement on Algorithmic Transparency and Accountability, Association for Computing Machinery US Public Policy Council, Jan. 12, 2017
https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf

European Commission
2018 reform of EU data protection rules
https://ec.europa.eu/commission/priorities/justice-and-fundamental-rights/data-protection/2018-reform-eu-data-protection-rules_en

Doshi-Velez, F., Kortz, M., Budish, R., Bavitz, C., Gershman, S., O’Brien, D., Schieber, S., Waldo, J., Weinberg, D., and Wood, A.
Accountability of AI Under the Law: The Role of Explanation
https://arxiv.org/abs/1711.01134

Fairness, Accountability, and Transparency in Machine Learning Group
https://www.fatml.org/

Explainable Artificial Intelligence (XAI Project)
U.S. Defense Advanced Research Projects Agency
https://www.darpa.mil/program/explainable-artificial-intelligence

DARPA Perspective on AI
U.S. Defense Advanced Research Projects Agency
https://www.darpa.mil/about-us/darpa-perspective-on-ai

Shneiderman, B.
Algorithmic Accountability: Designing for Safety, Radcliffe Institute for Advanced Study, Harvard University
https://www.radcliffe.harvard.edu/video/algorithmic-accountability-designing-safety-ben-shneiderman

Don Monroe is a science and technology writer based in Boston, MA, USA.

© 2018 ACM 0001-0782/18/11 $15.00

ACM Member News
MAKING ROBOTS SMARTER THROUGH IMPROVED MOTION ALGORITHMS

Tomás Lozano-Pérez, a professor of Computer Science and Engineering at the Massachusetts Institute of Technology (MIT), had no idea what a computer was when he was growing up. During his freshman year at MIT, he took a class in programming; Lozano-Pérez says he “loved it, and never looked back.”

Born in Guantánamo, Cuba, in 1952, Lozano-Pérez left for Miami at the age of 10, then attended high school in Puerto Rico. He went on to earn his B.S., M.S., and Ph.D. degrees in Electrical Engineering and Computer Science, all from MIT, in 1973, 1976, and 1980, respectively. He joined the faculty of MIT in 1981; today, he is the university’s School of Engineering Professor in Teaching Excellence, as well as a member of the university’s Computer Science and Artificial Intelligence Laboratory.

Lozano-Pérez says his main research interest has always been robotics, particularly with respect to algorithms for planning motions that enable higher-level functions in robots, to make them smarter.

In the early 1990s, feeling like the field of robotics was stalled and going nowhere, Lozano-Pérez took leave for several years to work at start-up Arris Pharmaceuticals, where he was immersed in computational biology and machine learning. He considers this experience invaluable, as he was able to apply what he learned about machine learning at Arris to robotics when he returned to the MIT faculty.

Today, Lozano-Pérez feels it is much more realistic to talk about intelligent robots. He is now more optimistic about the possibilities of integrating artificial intelligence and robotics, as he intended 30 or 40 years ago when he entered the field.

By John Delaney
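One of the explanation formats Barzilay describes in the article, finding "which instances in the training set are the closest" to the target, can be illustrated with a small nearest-neighbor lookup. This is a minimal sketch under stated assumptions, not her group's method: the feature vectors, labels, Euclidean distance metric, and the function name `nearest_training_examples` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Each training example is (feature_vector, label). The toy data below is invented.
Example = Tuple[Sequence[float], str]


def nearest_training_examples(
    query: Sequence[float], training_set: List[Example], k: int = 3
) -> List[Tuple[float, str, Sequence[float]]]:
    """Return the k training examples closest to `query`, nearest first.

    Assumption: plain Euclidean distance in feature space. A real system would
    more likely compare learned internal representations.
    """
    scored = [
        (math.dist(query, features), label, features)
        for features, label in training_set
    ]
    scored.sort(key=lambda item: item[0])  # sort by distance only
    return scored[:k]


# Hypothetical two-feature training set for a cat / not-cat classifier.
TRAINING_SET: List[Example] = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.9), "cat"),
    ((5.0, 5.0), "not-cat"),
    ((4.8, 5.2), "not-cat"),
]

if __name__ == "__main__":
    for distance, label, features in nearest_training_examples(
        (1.1, 1.0), TRAINING_SET, k=2
    ):
        print(f"{label:8s} {tuple(features)} at distance {distance:.3f}")
```

Showing a user the retrieved neighbors ("this input resembles these labeled examples") is a "local" explanation in the sense used in the article: it says nothing about how the full model behaves elsewhere.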
news
Technology | DOI:10.1145/3276746 Neil Savage
A New Movement
in Seismology
Unused telecom fiber might be used to detect earthquakes,
uncover other secrets in the soil.
Whenever an earthquake strikes, news reports quickly fill in certain details, such as how strong the quake was and where it was centered. That information comes from a network of seismometers scattered across the planet. Seismometers, though, can be expensive to install and maintain over long periods, and researchers cannot place them everywhere they might like, such as in the densely built and expensive streets of an earthquake-prone city like San Francisco.

Some scientists, however, are exploring a different approach, using a sensor that is already widely deployed beneath the streets of towns and cities around the world. That sensor is the common fiber-optic cable, used to carry telephone and Internet traffic.

A loop of optical fibers beneath Stanford University monitors ground movement.

The technique is called distributed acoustic sensing, and the oil and gas industry has used it for several years to monitor the ground around wells it drills. A laser beam traveling through an optical fiber will sometimes strike an impurity in the glass fiber, and part of the beam will be reflected. When an acoustic wave traveling through the Earth strikes a fiber, it stretches or compresses that fiber by a tiny amount. Using an interferometer, researchers can detect changes in the backscattered light and use that to measure the strain on the fiber, which in turn provides information about the sound wave that struck the fiber.

Oil companies, of course, install such fiber specifically for sensing. There are, however, many thousands of kilometers of fiber lying around unused. When telecom companies install fiber in the ground, they put in far more than they actually need. Fiber is relatively cheap, but digging up and repaving city streets is expensive, so telecoms add so-called dark fiber as a hedge against future needs.

That provides researchers like Biondo Biondi, a professor of geophysics at Stanford University, with an opportunity. “You can basically convert a piece of fiber into a virtual sensor,” he says. Since September 2016, Biondi has been using a 4.8-km-long loop of fiber installed under the Stanford campus to see what sort of signals he could pick up and interpret. The signals, he says, are very noisy.

Fibers used in the oil industry are cemented alongside the drill hole, and are therefore very well coupled to the ground; when the earth moves, they move in concert with it. Dark telecom fiber, though, generally lies untethered in a plastic conduit, rubbing and bumping against the conduit’s wall or other fibers. That makes it a lot less sensitive than the high-quality seismometers deployed by the U.S. Geological Survey (USGS) for earthquake monitoring.

Researchers have not made exact comparisons between the quality of the signal that fiber sensors pick up and that of the broadband seismometers USGS uses, says Jonathan Ajo-Franklin, a staff scientist in the Energy Geoscience Division at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory in Berkeley, CA. He has found that the signal-to-noise ratio in the fiber sensors is about 10 to 15 decibels worse than that in geophones, another type of sensor that is itself less sensitive than a seismometer.

However, what the dark fibers lack in sensitivity, they make up for in volume. “Even in the Bay Area (San Francisco) and in the Los Angeles area, which are probably the most instrumented areas in the country, you have a seismometer every 10 or 20 kilometers,” Biondi says. “Here, you can have a sensor every five meters.”

One interrogator box the size of a small rack-mounted server can measure reflections from hundreds or thousands of spots along a fiber, and determine their location based on the round-trip time of the laser light. The lasers are improving in quality and power, and in the not-too-distant future, Biondi says, he can imagine using perhaps 1,000 interrogator boxes to create millions of virtual sensors. “I jokingly say before I retire, I’d like to have an array under the Bay Area that has a billion sensors.”
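The article notes that an interrogator locates each reflection from the round-trip time of the laser light. A minimal sketch of that time-of-flight relation follows. The group index of 1.468 is an assumed, typical value for silica telecom fiber and does not come from the article; the function names are hypothetical, and real interrogators involve far more signal processing than this.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum

# Assumption: a typical group index for silica fiber; not stated in the article.
ASSUMED_GROUP_INDEX = 1.468


def reflection_distance_m(
    round_trip_s: float, group_index: float = ASSUMED_GROUP_INDEX
) -> float:
    """Distance along the fiber to a reflection point.

    Light travels out and back, so the one-way distance is half the
    round-trip path: d = c * t / (2 * n).
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / (2.0 * group_index)


def round_trip_time_s(
    distance_m: float, group_index: float = ASSUMED_GROUP_INDEX
) -> float:
    """Inverse relation: round-trip time for a reflection at `distance_m`."""
    return 2.0 * group_index * distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # 4.8 km is the length of the Stanford loop mentioned in the article.
    print(f"Round trip over 4.8 km: {round_trip_time_s(4800.0) * 1e6:.1f} microseconds")
    # 5 m is the sensor spacing Biondi quotes.
    print(f"Round trip over 5 m:    {round_trip_time_s(5.0) * 1e9:.1f} nanoseconds")
```

With the assumed index, light needs roughly 47 microseconds to travel to the far end of a 4.8-km fiber and back, and resolving points five meters apart means distinguishing arrival times that differ by about 49 nanoseconds.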
For now, Biondi’s test network con- way to find and eliminate them. Stan- While more seismologists are be-
tains 600 virtual sensors about 5 to 7 dard audio compression techniques coming interested in taking advantage
meters from each other, taking mea- are designed to deal with millions of of dark fiber for sensing, Ajo-Franklin
surements at a rate of 50 times per sec- channels that have some coherence says, they are still figuring out just what
ond. He has used it to create a database across them, which is not what this quality of data they can get. What makes
of 3,000 small, local seismic events, and data looks like. He is exploring signal- the work worthwhile is that fiber is pret-
to identify other sources of noise picked processing algorithms, but has not yet ty much everywhere, even under oceans.
up by the fiber, such as traffic, construc- hit on a solution. “The oceans are a place where there are
tion equipment, and water pumps. His Some sounds from sources other almost no seismometers,” he says. “Ex-
group is working on machine learning than earthquakes can be useful in their cept where you have islands, it’s a big
algorithms to automatically identify own right. The way seismic waves prop- gap in our coverage of looking at the way
earthquakes, using labeled data from agate is affected by factors such as how seismic waves travel through the earth.”
what they have already collected as the densely packed soil is and how much He has spoken with a South American
training set. The hope is to be able to moisture it contains. Measuring the telecom company that has some un-
identify even smaller earthquakes, and movement of surface waves from traffic dersea cables that were severed, so they
to use such information to develop bet- allows scientists to map the properties cannot be used for communications,
ter maps of fault areas that can be fed of the soil, so they can tell how likely but could still serve as sensors.
into earthquake simulations to project it is to give way during an earthquake. If sensing with dark fiber works the
the risks to buildings and people in the Ajo-Franklin is working on predicting way these researchers hope, it will give
earthquake zone. “Really, to have an estimate of a hazard, you need to propagate these seismic waves in the computer using the right properties,” he says.
Ajo-Franklin, a sometime collaborator with Biondi, has used the same approach on the Department of Energy’s Dark Fiber Testbed to detect and identify traffic noise. He wants to find ways to distinguish the seismic waves produced by a passing semi-trailer truck from those coming from a tectonic shift, particularly since fiber is often installed along roads and railroad beds. The waves from an earthquake have frequency and amplitude characteristics distinct from those of traffic, and they originate from one area, whereas the source of waves from a vehicle moves down the road along with the car. If a computer can automatically identify those differences, it can subtract them out from the waves it is trying to identify.
Ajo-Franklin is using versions of what he calls “classical seismology algorithms for ambient noise imaging scaled up to much larger computers.” He has collected about eight months of measurements from a 30km stretch of fiber north of Sacramento, generating about 300TB of data. “That’s a lot of signal processing, and it’s a lot of data manipulation, which is tough when you have these volumes,” he says.
One challenge is finding the right compression algorithm to shrink the data to a more manageable size. Ajo-Franklin is sure there are plenty of bits that do not convey any useful information, but he is not certain of the best

soil erosion on the sides of roads in Alaska caused by melting permafrost using distributed acoustic sensing (although in that case, the researchers installed the fiber themselves).
Some seismologists want to use sound to make even deeper maps of the subsurface, to depths of 100 meters or more, which could be useful in predicting how a building will react to an earthquake.
Ocean waves produce a constant low rumble that moves around the planet. Those sound waves change as they move through the different components of the Earth’s top layer, reflected, sped up, or slowed down as they encounter materials such as hard granite or soft water-saturated sediment. Scientists can measure those changes to create a picture of the subsurface, much the way an ultrasound machine can image a fetus. “The physics behind it are very similar,” says Andreas Fichtner, a professor of seismology and wave physics at the Swiss Federal Institute of Technology in Zurich, Switzerland, who recently attended a workshop on medical imaging to pick up ideas on how to handle the data.
While others are studying traffic noise, Fichtner is trying to avoid any human-produced sound entirely, to gain a better understanding of what natural sounds dark fiber sensors can detect. He is setting up an experiment 500 meters deep in an old tunnel in the Alps, built to store nuclear waste but now empty. “Before we start using the cables in a quantitative way, we have to understand what we measure,” he says.

geologists the opportunity to study not only earthquakes, but things like distribution of ground water in almost any place they care to look. “That’s the neat thing about it,” Ajo-Franklin says, “because suddenly you have this sensor which is across big, big sections of important basins in the world, and you can measure all of these different attributes that were previously inaccessible.”

Further Reading
Dou, S., Lindsey, N., Wagner, A.M., Daley, T.M., Freifeld, B., Robertson, M., Peterson, J., Ulrich, C., Martin, E.R., and Ajo-Franklin, J.B.
Distributed Acoustic Sensing for Seismic Monitoring of The Near Surface: A Traffic-Noise Interferometry Case Study, Nature Scientific Reports, September 2017
Martin, E.R., Huot, F., Ma, Y., Cieplicki, R., Cole, S., Karrenbach, M., and Biondi, B.L.
A Seismic Shift in Scalable Acquisition Demands New Processing, IEEE Signal Processing, March 2018.
Kruger, L.
Dark Fiber is Lighting Up, Communications of the ACM, October 2013
Martin, E.R., Castillo, C.M., Cole, S., Sawasdee, P.S., Yuan, S., Clapp, R., Karrenbach, M., and Biondi, B.L.
Seismic monitoring leveraging existing telecom infrastructure at the SDASA: Active, passive, and ambient-noise analysis, The Leading Edge, December 2017
Ajo-Franklin, J.
Fiber-optic distributed sensing
https://www.youtube.com/watch?v=-IdEZJzjkC4

Neil Savage is a science and technology writer based in Lowell, MA, USA.

© 2018 ACM 0001-0782/18/11 $15.00
news
Society | DOI:10.1145/3276744 Samuel Greengard
Weighing the Impact of GDPR
The EU data regulation will affect computer, Internet,
and technology usage within and outside the EU;
how it will play out remains to be seen.
When the European Union (EU) General Data Protection Regulation (GDPR) went into effect on May 25, 2018, it represented the most sweeping effort yet to oversee the way businesses collect and manage consumer data. The law, established to create consistent data standards and protect EU citizens from potential privacy abuses, sent ripples, if not tidal waves, across the world.
GDPR gives European citizens greater control of their data while establishing strong penalties for businesses that do not comply. What is more, any data that involves EU citizens or touches EU companies is covered by GDPR. The initiative replaces an older data privacy initiative called the Data Protection Directive 95/46/EC, which was introduced in 1995.
The implications and ramifications are enormous, and the initiative’s reach is global. GDPR will change everything from the way data collection takes place to the way corporate databases are designed and used. It also will potentially change the way research and development takes place, and will impact cybersecurity practices, as well as introducing a practical array of challenges revolving around sites and repositories where groups share comments, information, and other data.
“It’s a groundbreaking initiative,” says Brett M. Frischmann, Charles Widger Endowed University Professor in Law, Business, and Economics at Villanova University, and Affiliate Scholar of the Center for Internet and Society at Stanford Law School. “Europe has flipped a switch and prompted reconsideration of how data can be collected, managed, and used.” The EU takes the position that a person owns his or her data, and their privacy is a fundamental right that is “basic to the integrity of a human being,” Frischmann adds.

Data Wars
Digital technology has inexorably changed the face of privacy. Today, there is a perception (and plenty of evidence to support it) that personally identifiable information (PII) is under assault as never before. A Pew Research Center survey found that in the U.S., 93% of adults say being in control of who can get information about them is important; 90% say controlling what information is collected is important. The figures in Europe and other parts of the world are the same.
In a 2016 interview in Recode, Europe’s Competition Commissioner Margrethe Vestager said, “There is no such thing as a free lunch. You pay with one currency or another: either cents, or you pay with your data, or you pay with the advertisements that you accept. And I think people are becoming more and more aware of the fact that their personal data do have a value.”
Says Alison Cool, assistant professor of anthropology at the University of Colorado, Boulder, “There are a lot of questions and ambiguities that must be addressed, but it’s clear that GDPR will significantly change the data landscape.”
While the U.S. and a number of other countries have adopted an opt-out approach to data collection (essentially, a consumer must instruct a company if he or she doesn’t want his or her data used or shared in certain ways), Europe has implemented a more restrictive opt-in approach. However, GDPR takes this concept to a new and previously untested level. Besides gaining near-total control of their data, consumers can have

their data removed from a database or online source at any time and, for those who believe they have been wronged, seek an investigation and join a class-action lawsuit.
Strict rules about how organizations collect, manage, and process data anywhere in the world are only the starting point for GDPR. It allows consumers to file complaints with each nation’s national data protection authority, which will investigate the claim. A company that violates GDPR could face a fine of up to 4% of its worldwide annual revenue from the previous fiscal year. The regulation also mandates consumers can remove themselves from a database at any time and take their data elsewhere: to a new bank, a new mobile provider, or a new content service.
Not surprisingly, data scientists, legal experts, and others have radically different perspectives of GDPR. Says Daniel Le Métayer, Senior Research Scientist at Inria (the French Institute for Research in Computer Science and Automation) and a leading authority on data protection and privacy, “GDPR could be a great achievement if properly implemented. It could establish a more concrete framework for data use and protection and help reduce the misuse of personal information.”
Adds Cool: “It potentially changes the balance of power. GDPR takes aim at the widely used model of forced consent, which is built on the idea that in exchange for various services, there is an implicit agreement to give up your personal data.”
However, there are also plenty of potential pitfalls likely to result from GDPR. Le Métayer says the complexity of GDPR, and the way regulators and courts interpret some of the intentionally vague wording, could create such rigid restrictions that the initiative becomes ineffective over time.
There is also strong opposition in the corporate arena, where the focus is on profiting from data rather than stemming the wave of abuses and breaches. Attorneys such as Tanya Forsheit, partner and chair of the Privacy & Data Practice Group at New York City-based law firm Frankfurt Kurnit Klein & Selz, demonstrate the level of frustration about changes as a result of GDPR. Forsheit describes many GDPR provisions as onerous, and suggests they could be more effectively addressed through self-regulation. “It is simply not possible to be 100% compliant. GDPR forces organizations to devote significant time and expense to comply with standards that are not consistent with the way business is done online,” she argues.

Data Gets Personal
To be sure, the practical challenges of complying with GDPR are significant, especially as digital technology and artificial intelligence (AI) advance.
Personal assistants such as Siri, Alexa, and Cortana add layers of complexity to the issue of PII. Robo-advisors, chatbots, recommendation services, and other automated systems introduce additional compliance challenges. All these systems collect and store data about individuals. In the past, there was no need to determine where a person lived; under GDPR, that could amount to crucial information that would need to be added to each individual data point related to an individual. Even human resources systems, payroll systems, and similar repositories of personal data could be significantly impacted by the regulation; all may require algorithmic auditing processes that revolve around “data protection by design.”
Companies already are voicing concerns that GDPR could inhibit innovation by limiting how data is handled in apps, databases, and online services, and how data is used for advertising and other purposes. The issue could impact autonomous vehicles, robotics, and a variety of systems that rely on AI. Organizations may ultimately need to keep two separate databases (one for the EU and one for elsewhere) or find ways to differentiate records in databases.
In addition, GDPR might add a layer of complexity atop an already complex European privacy framework. For example, more than 2.4 million individuals have already submitted “right to be forgotten” requests so they can be expunged from Google searches. Cool says some people believe the law will “hinder innovation by making organizations more risk averse.”
Depending on who opts in, who opts out, and what data appears or disappears from a database or other source, the situation could become even more problematic. As Frischmann puts it, “What happens when one person at a group meeting or part of a community invokes a privacy clause but it affects everyone?”
The greatest challenge may be ensuring companies in the EU and beyond adhere to the spirit of GDPR. Many companies lack expertise in how they will need to implement and manage data under GDPR; they also do not know the levels of expertise or staffing required to conduct crucial data protection impact assessments.
“If businesses view GDPR as a checklist activity rather than an issue that requires ethical reflection, and if they look to exploit loopholes and skirt the intent of the law, the long-term outcome could be negative,” Cool says. “When you look at groups like bioethicists and physicians, the starting point for discussion is how to do the right thing for society; it’s not about avoiding getting sued or how to sidestep legal and ethical provisions.”

Cracking the Code on Privacy
How GDPR will play out is anyone’s guess. The initiative could revolutionize the data landscape, or it may fizzle into a footnote in digital history. It could also change the way the Internet works and how data and information flow across sites, clouds, and more.
One wild card is how consumers react to GDPR. If large numbers of people revoke access to PII or challenge the way companies use their data, businesses may reach an inflection point where they will have to rethink the fundamental way they approach and navigate data management, or reevaluate the fundamental value of data and how it is monetized. GDPR also might mandate new tracking and data management tools, such as blockchain.
Le Métayer argues that businesses need to address complex issues such as conducting data protection impact assessments and implementing data portability, which requires agreeing on standard data formats. Other sources of uncertainty include the compatibility of GDPR with big data, and the rules concerning automated decision-making. Article 22 of GDPR states individuals have the right to “not be subject to a decision based solely on automated processing, including profiling.” GDPR also allows consumers to contest a decision, but it is not clear what type of explanations should be provided to make this right effective. “The issue is also technical, since providing useful explanations about certain types of algorithms is a challenge in itself,” he says.
GDPR could also prompt companies to directly pay for PII data, Frischmann says. “If the power balance shifts and consumers gain leverage over their personal data, companies may look to provide incentives, discounts, and direct compensation for the use of data. It could flip the current model and even lead to entirely different ways to approach data,” he explains.
In fact, a recent study conducted by digital marketing agency Syzygy in Germany, which polled 1,000 respondents each from the U.S., U.K., and Germany, found citizens in all three countries would sell their data for between €130 (about US$150) and €140 (US$165) per month.
One thing is certain: amid a litany of security breaches and breakdowns, from Equifax to Cambridge Analytica, there is a growing focus on data privacy. What is more, other government entities are exploring ways to control how data is collected, managed, and used. In the U.S., the State of Vermont enacted a law in May 2018 that established standards for data. California is now eyeing an initiative, the California Consumer Privacy Act, that could extend many of the same GDPR protections to the state. Other countries, from Australia to Japan, have also revised data standards and privacy controls in recent years.
Frischmann says GDPR, above all else, represents the ongoing battle between unfettered capitalism and human dignity. The whole point of it is that it is not designed to be an efficient regulation for businesses. “To some extent, it’s about a person’s ability to exercise their own free will about their life.”
Cool says that, in the end, it is vital to strike a balance between privacy and laws. “We need more research that looks carefully at how personal data is collected and by whom, and how those people make decisions about data protection. Policymakers should use such studies as a basis for developing empirically grounded, practical rules.”

Further Reading
Wachter, S., Mittelstadt, B.D., and Russell, C.
Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR, Harvard Journal of Law & Technology, 31 (2), 2018. November 2017. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3063289
Casey, B., Farhangi, A., and Vogl, R.
Rethinking Explainable Machines: The GDPR’s ‘Right to Explanation’ Debate and the Rise of Algorithmic Audits in Enterprise, Berkeley Technology Law Journal. February 19, 2018. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3143325
Kaltheuner, F. and Bietti, E.
Data is power: Towards additional guidance on profiling and automated decision-making in the GDPR. Information Rights, Policy & Practice Journal, Vol. 2, No. 2. 2017. https://journals.winchesteruniversitypress.org/index.php/jirpp/article/view/45

Samuel Greengard is an author and journalist based in West Linn, OR, USA.

Milestones
Håstad Receives Knuth Prize
The 2018 Donald E. Knuth Prize has been awarded to Johan Torkel Håstad of Sweden’s KTH Royal Institute of Technology for his sustained record of milestone breakthroughs at the foundations of computer science, with major impact on areas including optimization, cryptography, parallel computing, and complexity theory.
The Knuth Prize is jointly bestowed by the ACM Special Interest Group on Algorithms and Computation Theory (SIGACT) and the IEEE Computer Society Technical Committee on the Mathematical Foundations of Computing (TCMF). The Prize is named for Donald Knuth of Stanford University, the “father of the analysis of algorithms,” and is bestowed in recognition of outstanding contributions to the foundations of computer science by individuals for their overall impact in the field over an extended period.
A professor of computer science at the KTH Royal Institute of Technology in Stockholm, Håstad received his bachelor’s degree in mathematics from Stockholm University, his master’s degree in mathematics from Sweden’s Uppsala University, and his doctorate in mathematics from the Massachusetts Institute of Technology.
Håstad’s works resolved long-standing problems central to circuit lower bounds, pseudorandom generation, and approximability. He also introduced transformative techniques that have fundamentally influenced much of the subsequent work in these areas.
Previous honors bestowed on Håstad include the ACM Doctoral Dissertation Award (1986), the Gödel Prize for outstanding papers on theoretical computer science (1994 and 2011), and the Göran Gustafsson Prize for outstanding achievement in mathematics.

AWARD
AWARD NOMINATIONS SOLICITED
AWARD NOMINATIONS
NOMINATIONS SOLICITED
SOLICITED
As
As part
part of
of its
its mission,
mission, ACM
ACM brings
brings broad
broad recognition
recognition to
to outstanding
outstanding technical
technical
Asand
part of its mission,
professional ACM brings
achievements broad
in recognition
computing and to outstanding
information technical
technology.
and professional achievements in computing and information technology.
and professional achievements in computing and information technology.
ACM welcomes nominations for those who deserve recognition for their accomplishments. Please refer to the ACM Awards
ACM welcomes nominations for those who deserve recognition for their accomplishments. Please refer to the ACM Awards
website at https://awards.acm.org
ACM welcomes for who
nominations for those guidelines
deserveonrecognition
how to nominate,
for theirlists of the members Please
accomplishments. of the 2018 Award
refer to Committees,
the ACM Awards
website at https://awards.acm.org for guidelines on how to nominate, lists of the members of the 2018 Award Committees,
and listings
website of past award recipientsfor
at https://awards.acm.org and their citations.
guidelines on how to nominate, lists of the members of the 2018 Award Committees,
and listings of past award recipients and their citations.
and listings of past award recipients and their citations.
Nominations are due January 15, 2019 with the exceptions of the Doctoral Dissertation Award (due October 31, 2018)
Nominations are due January 15, 2019 with the exceptions of the Doctoral Dissertation Award (due October 31, 2018)
and the ACM are
Nominations – IEEE
dueCSJanuary
George15,
Michael
2019 Memorial HPC Fellowship
with the exceptions of the(due May 1,
Doctoral 2019).
Dissertation Award (due October 31, 2018)
and the ACM – IEEE CS George Michael Memorial HPC Fellowship (due May 1, 2019).
and the ACM – IEEE CS George Michael Memorial HPC Fellowship (due May 1, 2019).
A.M. Turing Award: ACM’s most prestigious award recognizes contributions of a technical nature which are of lasting and major technical
A.M. Turing Award: ACM’s most prestigious award recognizes contributions of a technical nature which are of lasting and major technical
importance
A.M. Turing to the computing
Award: ACM’s most community.
prestigiousThe award
award is accompanied
recognizes by a prize
contributions of a of $1,000,000
technical with
nature financial
which are ofsupport
lastingprovided
and majorbytechnical
Google.
importance to the computing community. The award is accompanied by a prize of $1,000,000 with financial support provided by Google.
importance to the computing community. The award is accompanied by a prize of $1,000,000 with financial support provided by Google.
ACM Prize in Computing (previously known as the ACM-Infosys Foundation Award in the Computing Sciences): recognizes an early-
ACM Prize in Computing (previously known as the ACM-Infosys Foundation Award in the Computing Sciences): recognizes an early-
to
ACMmid-career fundamental,
Prize in Computing innovativeknown
(previously contribution
as the in computingFoundation
ACM-Infosys that, through its depth,
Award in theimpact and broad
Computing implications,
Sciences): exemplifies
recognizes the
an early-
to mid-career fundamental, innovative contribution in computing that, through its depth, impact and broad implications, exemplifies the
greatest achievements
to mid-career fundamental,in theinnovative
discipline.contribution
The award carries a prize of
in computing $250,000.
that, through Financial support
its depth, impactisandprovided
broadby Infosys Ltd.exemplifies the
implications,
greatest achievements in the discipline. The award carries a prize of $250,000. Financial support is provided by Infosys Ltd.
greatest achievements in the discipline. The award carries a prize of $250,000. Financial support is provided by Infosys Ltd.
Distinguished Service Award: recognizes outstanding service contributions to the computing community as a whole.
Doctoral Dissertation Award: presented annually to the author(s) of the best doctoral dissertation(s) in computer science and
Doctoral Dissertation Award: presented annually to the author(s) of the best doctoral dissertation(s) in computer science and
engineering, and is accompanied
Doctoral Dissertation by a prize
Award: presented of $20,000.
annually to theThe Honorable
author(s) Mention
of the Award isdissertation(s)
best doctoral accompaniedinby a prize totaling
computer science$10,000.
and
engineering, and is accompanied by a prize of $20,000. The Honorable Mention Award is accompanied by a prize totaling $10,000.
Winning dissertations
engineering, are published
and is accompanied by in the ACM
a prize Digital Library
of $20,000. and the Mention
The Honorable ACM Books Series.
Award is accompanied by a prize totaling $10,000.
Winning dissertations are published in the ACM Digital Library and the ACM Books Series.
Winning dissertations are published in the ACM Digital Library and the ACM Books Series.
ACM – IEEE CS George Michael Memorial HPC Fellowships: honors exceptional PhD students throughout the world whose research
ACM – IEEE CS George Michael Memorial HPC Fellowships: honors exceptional PhD students throughout the world whose research
focus
ACM –isIEEE
on high-performance
CS George Michael computing
Memorialapplications,
HPC Fellowships:networking,
honorsstorage, or large-scale
exceptional PhD studentsdata throughout
analysis using thethe mostwhose
world powerful
research
focus is on high-performance computing applications, networking, storage, or large-scale data analysis using the most powerful
computers that are currently available.
focus is on high-performance computing The Fellowshipsnetworking,
applications, includes a $5,000
storage,honorarium.
or large-scale data analysis using the most powerful
computers that are currently available. The Fellowships includes a $5,000 honorarium.
computers that are currently available. The Fellowships includes a $5,000 honorarium.
Grace Murray Hopper Award: presented to the outstanding young computer professional of the year, selected on the basis of a
Grace Murray Hopper Award: presented to the outstanding young computer professional of the year, selected on the basis of a
single recent major
Grace Murray Hopper technical
Award:orpresented
service contribution. The candidate
to the outstanding young must have professional
computer been 35 years ofof
theage or less
year, at theon
selected time
thethe
basisqualifying
of a
single recent major technical or service contribution. The candidate must have been 35 years of age or less at the time the qualifying
contribution
single recent wasmajor made. A prize
technical of $35,000
or service accompanies
contribution. The the award. must
candidate Financial
havesupport
been 35isyears
provided
of agebyor Microsoft.
less at the time the qualifying
contribution was made. A prize of $35,000 accompanies the award. Financial support is provided by Microsoft.
contribution was made. A prize of $35,000 accompanies the award. Financial support is provided by Microsoft.
Paris Kanellakis Theory and Practice Award: honors specific theoretical accomplishments that have had a significant and demonstrable
Paris Kanellakis Theory and Practice Award: honors specific theoretical accomplishments that have had a significant and demonstrable
effect on the practice
Paris Kanellakis Theory of computing.
and PracticeThis award
Award: is accompanied
honors by a prizeaccomplishments
specific theoretical of $10,000 and isthat endowed
have hadby contributions
a significant and from the Kanellakis
demonstrable
effect on the practice of computing. This award is accompanied by a prize of $10,000 and is endowed by contributions from the Kanellakis
family, and
effect on thefinancial
practicesupport by ACM’sThis
of computing. SIGACT,
awardSIGDA, SIGMOD, SIGPLAN,
is accompanied by a prizeand the ACMand
of $10,000 SIGisProject
endowed Fund,byand individual from
contributions contributions.
the Kanellakis
family, and financial support by ACM’s SIGACT, SIGDA, SIGMOD, SIGPLAN, and the ACM SIG Project Fund, and individual contributions.
family, and financial support by ACM’s SIGACT, SIGDA, SIGMOD, SIGPLAN, and the ACM SIG Project Fund, and individual contributions.
Karl V. Karlstrom Outstanding Educator Award: presented to an outstanding educator who is appointed to a recognized educational
Karl V. Karlstrom Outstanding Educator Award: presented to an outstanding educator who is appointed to a recognized educational
baccalaureate
Karl V. Karlstrom institution,
Outstandingrecognized
Educatorfor advancing new teaching
Award: presented methodologies,
to an outstanding effecting
educator whonew curriculum
is appointed todevelopment or expansion
a recognized educational
baccalaureate institution, recognized for advancing new teaching methodologies, effecting new curriculum development or expansion
in computer science
baccalaureate and engineering,
institution, recognized for or advancing
making a significant
new teachingcontribution to ACM’seffecting
methodologies, educational
new mission.
curriculum Thedevelopment
Karlstrom Award is
or expansion
in computer science and engineering, or making a significant contribution to ACM’s educational mission. The Karlstrom Award is
accompanied by a prize
in computer science andof $10,000. Financial
engineering, or makingsupport is provided
a significant by Pearson
contribution toEducation.
ACM’s educational mission. The Karlstrom Award is
accompanied by a prize of $10,000. Financial support is provided by Pearson Education.
accompanied by a prize of $10,000. Financial support is provided by Pearson Education.
Eugene L. Lawler Award for Humanitarian Contributions within Computer Science and Informatics: recognizes an individual or a group
Eugene L. Lawler Award for Humanitarian Contributions within Computer Science and Informatics: recognizes an individual or a group
who
EugenehaveL. made
Lawlera Award
significant contribution through
for Humanitarian the use within
Contributions of computing
Computer technology;
Science andthe award is intentionally
Informatics: recognizes defined broadly.or
an individual This
a group
who have made a significant contribution through the use of computing technology; the award is intentionally defined broadly. This
biennial,
who haveendowed award is accompanied
made a significant contribution by a prizethe
through of $5,000, and alternates
use of computing with thethe
technology; ACM Policy
award is Award.
intentionally defined broadly. This
biennial, endowed award is accompanied by a prize of $5,000, and alternates with the ACM Policy Award.
biennial, endowed award is accompanied by a prize of $5,000, and alternates with the ACM Policy Award.
ACM – AAAI Allen Newell Award: presented to individuals selected for career contributions that have breadth within computer science,
ACM – AAAI Allen Newell Award: presented to individuals selected for career contributions that have breadth within computer science,
or
ACMthat bridgeAllen
– AAAI computer
Newellscience
Award:and other disciplines.
presented The $10,000
to individuals selectedprize is provided
for career by ACMthat
contributions and have
AAAI,breadth
and by individual contributions.
within computer science,
or that bridge computer science and other disciplines. The $10,000 prize is provided by ACM and AAAI, and by individual contributions.
or that bridge computer science and other disciplines. The $10,000 prize is provided by ACM and AAAI, and by individual contributions.
Outstanding Contribution to ACM Award: recognizes outstanding service contributions to the Association. Candidates are selected
Outstanding Contribution to ACM Award: recognizes outstanding service contributions to the Association. Candidates are selected
based on the value
Outstanding and degree
Contribution of service
to ACM Award: overall.
recognizes outstanding service contributions to the Association. Candidates are selected
based on the value and degree of service overall.
based on the value and degree of service overall.
ACM Policy Award: recognizes an individual or small group that had a significant positive impact on the formation or execution of public
ACM Policy Award: recognizes an individual or small group that had a significant positive impact on the formation or execution of public
policy affecting
ACM Policy Award:computing or the
recognizes ancomputing
individual or community.
small groupThe biennial
that award is accompanied
had a significant positive impactby aon$10,000 prize. The
the formation next awardofwill
or execution be
public
policy affecting computing or the computing community. The biennial award is accompanied by a $10,000 prize. The next award will be
the 2019
policy award.computing or the computing community. The biennial award is accompanied by a $10,000 prize. The next award will be
affecting
the 2019 award.
the 2019 award.
Software System Award: presented to an institution or individuals recognized for developing a software system that has had a lasting
Software System Award: presented to an institution or individuals recognized for developing a software system that has had a lasting
influence, reflected
Software System in contributions
Award: presentedto toconcepts, in commercial
an institution acceptance,
or individuals recognized or both. A prize ofa$35,000
for developing software accompanies
system that the has award with
had a lasting
influence, reflected in contributions to concepts, in commercial acceptance, or both. A prize of $35,000 accompanies the award with
financial
influence,support
reflected provided by IBM. to concepts, in commercial acceptance, or both. A prize of $35,000 accompanies the award with
in contributions
financial support provided by IBM.
financial support provided by IBM.
ACM Athena Lecturer Award: celebrates women researchers who have made fundamental contributions to computer science. The award includes a $25,000 honorarium.
For SIG-specific Awards, please visit https://awards.acm.org/sig-awards.
Vinton G. Cerf, ACM Awards Committee Co-Chair
John R. White, ACM Awards Committee Co-Chair
Insup Lee, SIG Governing Board Awards Committee Liaison
Rosemary McGuinness, ACM Awards Committee Liaison
viewpoints
DOI:10.1145/3277562 Pamela Samuelson
Legally Speaking
The EU’s Controversial Digital
Single Market Directive
Should copyright enforcement have precedence over the interests
of users in information privacy and fundamental freedoms?
The stated goals of the EU’s proposed Digital Single Market (DSM) Directive are laudable: Who could object to modernizing the EU’s digital copyright rules, facilitating cross-border uses of in-copyright materials, promoting growth of the internal market of the EU, and clarifying and harmonizing copyright rules for digital networked environments?

The devil, as always, is in the details. The most controversial DSM proposal is its Article 13, which would require online content-sharing services to use “effective and proportionate” measures to ensure user uploads to their sites are non-infringing. Their failure to achieve this objective would result in their being directly liable for any infringements. This seemingly requires those services to employ monitoring and filtering technologies, which would fundamentally transform the rules of the road under which these firms have long operated.

A more positive part of the DSM Directive is its Article 3. It would require EU member states to adopt a copyright exception to enable research and cultural heritage institutions to engage in text- and data-mining (TDM) for scientific research purposes. This is good so far as it goes, but critics argue that for-profit firms and independent researchers should enjoy similar TDM privileges, and scientific research should not be the only legitimate purpose for TDM.

This column explains the rationales for these new measures, specific terms of concern, and why critics have argued for changes to make the rules more balanced. (Column space limitations preclude attention to other controversial provisions, such as the new press publishers’ rights to control online services’ displays of press contents.)

Article 13’s Changes to Online Service Liability Rules
For approximately the past two decades, the European Union’s E-Commerce Directive, like the U.S. Digital Millennium Copyright Act, has provided Internet service providers (ISPs) with “safe harbors” from copyright liability for infringing uses of their services about which the ISPs had neither knowledge nor control.

Under these rules, ISPs must take down infringing materials after copyright owners notify them of the existence and location of those materials. But they do not have to monitor for infringements or use filtering technologies to prevent infringing materials from being uploaded and stored on their sites.

Because online infringements have greatly proliferated, copyright industry groups have strongly urged policymakers in the EU (as well as the U.S.) to impose stronger obligations on ISPs to thwart infringements. Their goal has been the adoption of legal rules requiring ISPs to use monitoring technologies to detect in-copyright materials and filtering technologies to block infringing uploads.

In proposing the DSM Directive, the European Commission has responded to these calls by proposing that certain ISPs should take on greater responsibilities to help prevent infringements. Article 13 is aimed at those ISPs that enable online content sharing (think YouTube).

While not directly requiring the use of monitoring or filtering technologies, Article 13 can reasonably be interpreted as intending to achieve this result.

Which Online Services Are Affected?
The DSM Directive states Article 13 is intended to target only those online content-sharing services that play an “important role” in the online content market by competing with other services, such as online audio or video streaming services, for the same customers.

If the “main purpose” (or “one of the main purposes”) of the service is to provide access to “large amounts” of copyrighted content uploaded by users and it organizes and promotes those uploads for profit-making purposes, that service will no longer be protected by the E-Commerce safe harbor. It will instead be subjected to the new liability rules.

Concerns about the overbreadth of Article 13 led the Commission to narrow the definition of the online content-sharing services affected by the rules. It now specifically excludes online encyclopedias (think Wikipedia), repositories of scientific or educational materials uploaded by their authors, open source software repositories, cloud services, cyberlockers, and marketplaces engaged in online retail sales of digital copies.

Article 13’s New Liability Rules
The most significant regulation in Article 13 is its subsection (4):

Member States shall provide that an online content sharing service provider shall not be liable for acts of communication to the public or making available to the public within the meaning of this Article when:

(a) it demonstrates that it has made best efforts to prevent the availability of specific works or other subject matter by implementing effective and proportionate measures … to prevent the availability on its services of the specific works or other subject matter identified by rightholders and for which the rights holders have provided the service with relevant and necessary information for the application of these measures; and

(b) upon notification by rights holders of works or other subject matter, it has acted expeditiously to remove or disable access to these works or other subject matter and it demonstrates that it has made its best efforts to prevent their future availability through the measures referred to in point (a).

The italicized language signals terminology that is vague and open to varying interpretations, but anticipates the use of technologies to show those “best efforts.”
Copyright industry groups can be expected to assert that it is necessary to use monitoring and filtering technologies to satisfy the requirements of Article 13(4). They will also point to an alternative way that online services can avoid liability: by licensing uploaded copyrighted content from their respective rights holders.

Affected online services will have an uphill battle to fend off the efforts to interpret the ambiguous terms as imposing monitoring and filtering obligations. It is, of course, impossible to license contents for every copyrighted work that their users might upload to their site. But the big media firms can use this new rule to extract more compensation from platforms.

Concerns About Article 13’s Liability Rules
Critics have raised two major concerns about this proposal. First, it will likely further entrench the market power of the leading platforms that can afford to develop filtering technologies such as YouTube’s ContentID, and deter new entry into the online content sharing market. Second, it will undermine user privacy and free speech interests, leading to blockages of many parodies, remixes, fan fiction, and other creative reuses of copyrighted works that would, if examined by a neutral observer, be deemed non-infringing.

When the proposal was pending before the European Council in late May, several members, including representatives from Finland, Germany, and the Netherlands, opposed it and offered some compromise language, so it does not have consensus support. Since then, opponents have mounted a public relations campaign to urge EU residents to contact their Parliamentary representatives telling them to vote no in order to “save the Internet.”

Among the many critics of Article 13 is David Kaye, the United Nations’ Special Rapporteur for Freedom of Expression. He wrote a nine-page letter explaining why Article 13 is inconsistent with the EU’s commitments under international human rights instruments.

In addition, Tim Berners-Lee, Vint Cerf, and 89 other Internet pioneers (plus me) signed an open letter urging the EU Parliament to drop Article 13: “By requiring Internet platforms to perform automatic filtering on all of the content that their users upload, Article 13 takes an unprecedented step toward the transformation of the Internet from an open platform for sharing and innovation, into a tool for the automated surveillance and control of its users.”

More than 145 civil society organizations also came out against it. These protests were successful enough to induce a majority of the European Parliament to vote for giving further consideration to the DSM directive. Several stages remain in the EU’s elongated process before this directive is finalized, either in its current or some revised form.

Mandatory Text- and Data-Mining Exception
Much better news is the proposed new copyright exception to enable nonprofit research and cultural heritage institutions to engage in text- and data-mining (TDM). The European Commission and the Council recognize that digital technologies have opened up significant opportunities for using TDM techniques to make new discoveries by computational analysis of large datasets. These discoveries can advance not only natural but also human sciences in ways that will benefit the information society.

Article 3 would require EU member states to allow research and cultural heritage institutions to reproduce copyrighted works and extract information using TDM technologies, as long as the researchers had lawful access to the contents being mined. These researchers must, however, store such copies in a secure environment and retain the copies no longer than is necessary to achieve their scientific research objectives.

Importantly, rights holders cannot override the TDM exception through contract restrictions. (They can, however, use technology to ensure security and integrity of their networks and databases, which opens the possibility of technology overrides.) Article 3 also calls for rights holders, research organizations, and cultural heritage institutions to agree upon best practices for conducting TDM research.

No TDM Privilege for Profit-Making and Unaffiliated Researchers
The DSM Directive assumes that profit-making firms can and should get a
license to engage in TDM research from the owners of the affected IP rights. Although the DSM contemplates the possibility of public-private partnerships, it forbids those in which private entities have control over TDM-related collaborative projects. Unaffiliated researchers (say, independent data scientists or think-tank personnel) cannot rely on the DSM’s TDM exception.

Article 3 may put the EU at a disadvantage in AI research because some countries have already adopted less restrictive TDM exceptions. Japan, for instance, allows text- and data-mining without regard to the status of the miner, and does not confine the scope of the exception to nonprofit “scientific research.” In the U.S., for-profit firms have been able to rely on fair use to make copies of in-copyright materials for TDM purposes, as in the Authors Guild v. Google case. This ruling did not limit TDM purposes to scientific research.

Commentators on the DSM Directive have expressed several concerns about the restrictions on its TDM exception. For one thing, TDM licenses may not be available on reasonable terms for startups and small businesses in the EU. Second, some EU firms may ship their TDM research offshore to take advantage of less-restrictive TDM rules elsewhere. Third, some non-EU firms may decide not to invest in TDM-related research in the EU because of these restrictions. Moreover, in the highly competitive global market for world-class AI and data science researchers, the EU may suffer from “brain drain” if its most talented researchers take job opportunities in jurisdictions where TDM is broadly legal.

Conclusion
The EU’s proposed DSM Directive is highly controversial, especially the new obligations it would impose on online content-sharing services to thwart infringing uploads. In early July, the EU Parliament voted against giving approval to the May version of the DSM proposal; it voted in September to approve some amendments to the DSM Directive, which did not significantly change the Article 13 mandate. It will, however, be many months before the final text of the directive is voted on.

Whether Article 13, if adopted as is, will “kill” the Internet as we know it, as some critics have charged, remains to be seen. Yet the prospect of bearing direct liability for the infringing activities of users will likely cause many sharing services to be overly cautious about what their users can upload and new entry will be chilled. In its current form, Article 13 gives copyright enforcement priority over the interests of users in information privacy and fundamental freedoms.

The DSM Directive’s proposed exception for TDM research is a welcome development for those who work at research and cultural heritage institutions. However, the unfortunate withholding of the exception from for-profit firms and independent researchers may undermine prospects for the EU’s achieving its aspiration to promote innovations in AI and data science industries. It will be difficult for EU-based entities to compete with American and Japanese firms whose laws provide them with much greater freedom to engage in TDM analyses.

Pamela Samuelson (pam@law.berkeley.edu) is the Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley, and a member of the ACM Council.

Copyright held by author.

Calendar of Events

November 3–7
CSCW ’18: Computer Supported Cooperative Work and Social Computing
Jersey City, NJ,
Co-Sponsored: ACM/SIG,
Contact: Mor Naaman,
Email: mornaaman@yahoo.com

November 4–7
SenSys ’18: The 16th ACM Conference on Embedded Network Sensor Systems
Shenzhen, China,
Co-Sponsored: ACM/SIG,
Contact: Pei Zhang,
Email: peizhang@cmu.edu

November 4–9
ESEC/FSE ’18: The 26th ACM SIGSOFT International Symposium on the Foundations of Software Engineering
Lake Buena Vista, FL,
Contact: Gary T. Leavens,
Email: leavens@eecs.ucf.edu

November 4–9
Conference on Systems, Programming, Languages, and Applications: Software for Humanity
Boston, MA,
Sponsored: ACM/SIG,
Contact: Jan Vitek,
Email: j.vitek@neu.edu

November 5–6
11th ACM SIGPLAN International Conference on Software Language Engineering
Boston, MA,
Sponsored: ACM/SIG,
Contact: David James Pearce,
Email: david.pearce@ecs.vuw.ac.nz

November 5–6
17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences
Boston, MA,
Sponsored: ACM/SIG,
Contact: Eric Van Wyk,
Email: evw@cs.umn.edu
viewpoints
DOI:10.1145/3277564 Steven M. Bellovin and Peter G. Neumann
Inside Risks
The Big Picture
A systems-oriented view of trustworthiness.
Previous Communications Inside Risks columns have discussed specific types of risks (to safety, security, reliability, and so on), and specific application areas (for example, critical national infrastructures, election systems, autonomous systems, the Internet of Things, artificial intelligence, machine learning, cybercurrencies and blockchains, all of which are riddled with security problems). We have also considered risks of deleterious misuses of social media, malware, malicious drones, risks to privacy, fake news, and the meaning of “truth.” All of these and many more issues must be considered proactively as part of the development and operation of systems with requirements for trustworthiness.

We consider here certain overarching and underlying concepts that must be better understood and more systematically confronted, sooner rather than later. Some are more or less self-evident, some may be debatable, and others may be highly controversial.

• A preponderance of flawed hardware-software systems, which limits the development of trustworthy applications, and which also impedes accountability and forensics-worthy rapid identification of culprits and causes of failures.
• Lack of understanding of the properties of composed systems. Components that seem secure locally, when combined, may yield insecure systems.
• A lack of discipline and constructive uses of computer science, physical science, technology, and engineering, which hinders progress in trustworthiness, although new applications, widgets, and snake-oil-like hype continue apace without much concern for sound usability.
• A lack of appreciation for the wisdom that can be gained from science, engineering, and scientific methods, which impedes progress, especially where that wisdom is clearly relevant.
• A lack of understanding of the short-term and long-term risks by leaders in governments and business, which is becoming critical, as is their willingness to believe that today’s sloppy systems are good enough for critical uses.
• A widespread failure to understand these risks is ominous, as history suggests they will pervasively continue to recur in the future.
• A general lack of awareness and education relating to all of these issues, requiring considerable rethinking of these issues.

Background
Progress toward trustworthy systems for critical security uses has been very spotty. For example, several National Academies of Science Computer Science and Technology Board studies have examined issues relating to computer and network security [4, 6, 11] and cryptography [5], with extensive conclusions and recommendations that seem to have been widely ignored, or not farsighted
enough, or possibly both. Other studies have examined some of the implications of using cryptography [1, 2, 7], where again related problems keep arising. Cryptography is an enormously useful concept for achieving trustworthy systems and networks; unfortunately, its effectiveness can be severely limited if it is not implemented in systems with sufficient trustworthiness. Thus, it is a trustworthiness enhancer, but cannot be relied on by itself to enable trustworthy systems and networks.

Total-System Trustworthiness
Trustworthiness is a total-system problem. That is, trustworthiness must consider not just attributes of individual elements, but also how they compose and interact. It is not uncommon for systems to fail even when every individual component is correct and seems locally secure. For example, the composition problem may be as simple as having different notions of the behavior of a particular interface (where each component might assume the other does input validation) or as complex as subtle, time- and input-dependent misbehavior under unusual circumstances. Dependencies on flawed hardware must also be considered, such as the recent speculative-execution and out-of-order execution attacks (for example, Spectre/Meltdown [14] and Foreshadow/Foreshadow-NG vulnerabilities [15]).

The so-called “Martin Luther King Day meltdown” of the AT&T long-distance network in 1990 is a classic example of what can go wrong. There was a flaw in the recovery code when a phone switch rebooted and resumed normal operation. If a neighboring switch received two incoming calls within 1/100 of a second thereafter, it would crash. This, of course, triggered the same failures in its neighbors, iteratively throughout half a day [8].

With so many known vulnerabilities, and new ones continually being discovered, it is obvious that defenses are often overwhelmed. For example, the Common Vulnerabilities and Exposures list (cve.mitre.org) is approaching 110,000 vulnerabilities, approximately 16,000 of them since the beginning of 2018.

More recently, consider the Foreshadow/L1 Terminal Fault attacks on SGX discussed at USENIX Security 2018, and subsequently discovered Foreshadow-NG vulnerabilities [13, 15], which broadly affect VMs, VMMs, operating systems, and SMM memory. The NG (next-generation) paper has attacks that “completely bypass the virtual memory abstraction by directly exposing cached physical memory contents to unprivileged applications and guest virtual machines.” These attacks appear to be very serious.

Overall, there are no simple solutions. Precision in interface definition is one obvious approach, although obscure cases are difficult to specify (for example, call-arrival rate at a critical time).

Trustworthiness Also Must Respect Human Behavior
Achieving trustworthiness in complex systems also depends critically on the people involved throughout system development and use. Many systems have poorly defined functional and behavioral requirements, if any. System architectures seldom reflect critical requirements, and implementations seldom adhere to those requirements or design specifications. Formal methods have significant opportunities to improve trustworthiness, but are challenging to use coherently. In operation, user wisdom and sensible behavior are often assumed (instead of building people-tolerant systems), and the creativity and power of malicious misuse and malware are inadequately considered. Thus, trustworthiness must anticipate all sorts of human behavior, as well as environmental disruptions. In essence, achieving trustworthiness is very complex, and attempts to simplify it are generally fraught with vulnerabilities.

Future Directions for Systems Research and Development
A research program in systems poses many challenges. The most difficult is one of definition: What is systems research? What constitutes real innovation? Merely having multiple components is necessary, but not sufficient. Rather, what is needed is a demonstration that new techniques either contribute to the security of the full system or let us better evaluate security. Indeed, some early projects might simply be intended to
better define the problem and lay out a suitable research agenda.

One vital approach would be a unified theory of predictable subsystem composition that can be used to develop hardware-software systems for a wide range of applications out of demonstrably trustworthy components. Formal methods could be useful selectively. What is essential, though, is that the properties being composed are actually useful in real-world systems.

However, systems design is not a formal discipline today. Therefore, carefully documented open success stories that illustrate the power of an approach are also acceptable, especially if they enable constructive opportunities for the future.

On a smaller scale, developing mechanisms and tools that advance the goal of secure systems would also be useful. Thus, a scheme that provides strong protection for cryptographic keys while still leaving them useful for authorized uses is valuable [3]. This may be facilitated by specialized hardware, if that hardware is trustworthy (including being available as needed). Thus, a variety of clean-slate hardware architecture specifications that can be implemented by multiple organizations and that can facilitate total systems that are much more trustworthy would also be useful. Again, formal methods could be useful selectively to prove critical properties of some of the specifications.

Conclusion
Research and its funding have often failed us. There is too much focus on narrow problems (point solutions to point problems) and too little effort devoted to systems aspects of solutions that include considerations of human behavior. Furthermore, many problems discussed long ago [8, 9] still have not been adequately addressed today. In addition, underlying principles for trustworthy systems have been posited since the 1960s and recently revisited, but widely ignored in practice [10]. A recent book also has more relevant suggestions for the future [12].

It is time to get serious about the dearth of trustworthy systems and the lack of deeper understanding of the risks that result from continuing on a business-as-usual course.

References
1. Abelson, H. et al. The risks of key recovery, key escrow, and trusted third-party encryption. World-Wide Web Journal 2, 3 (Summer 1997), 241–257.
2. Abelson, H. et al. Keys under doormats: Mandating insecurity by requiring government access to all data and communications. Journal of Cybersecurity 1, 1 (Nov. 2015), Oxford University Press; http://www.cybersecurity.oxfordjournals.org/content/1/1/69
3. Bellovin, S.M. The key to the key. IEEE Security and Privacy 13, 6 (Nov.–Dec. 2015), 96–96.
4. Clark, D.D. et al. Computers at Risk: Safe Computing in the Information Age. National Research Council, National Academies Press, Washington, D.C., 1990.
5. Dam, K.W. and Lin, H.S., Eds. Cryptography’s role in securing the information society. National Research Council, National Academies Press, Washington, D.C., 1996.
6. Goodman, S.E. and Lin, H.S., Eds. Toward a safer and more secure cyberspace. National Research Council, National Academies Press, Washington, D.C., 2007.
7. Landau, S. et al. Codes, Keys, and Conflicts: Issues in U.S. Crypto Policy. (ACM-sponsored study), 1994.
8. Neumann, P.G. Computer-Related Risks. Addison-Wesley and ACM Press, 1995.
9. Neumann, P.G. Principled assuredly trustworthy composable architectures, final report. SRI International, 2004; http://www.csl.sri.com/neumann/chats4.pdf
10. Neumann, P.G. Fundamental trustworthiness principles in CHERI. In New Solutions for Cybersecurity, MIT Press, Cambridge, MA, 2018.
11. Schneider, F.B. and Blumenthal, M., Eds. Trust in Cyberspace. National Research Council, National Academies Press, 2101 Constitution Ave., Washington, D.C., 1998.
12. Shrobe, H. et al., Eds. Solutions for Cybersecurity. MIT Press, 2018.
13. Van Bulck et al. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. USENIX Security (Aug. 14–17, 2018); http://foreshadowattack.eu/
14. Watson, R.N.M. et al. Capability hardware enhanced RISC instructions (CHERI): Notes on the Meltdown and Spectre attacks. University of Cambridge Technical Report 916, 2017; http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-916.pdf
15. Weisse, O. et al. Foreshadow-NG: Breaking the virtual memory abstraction with transient out-of-order execution (Aug. 14, 2018); http://foreshadowattack.eu/.

Steven M. Bellovin (smb@cs.columbia.edu) is a professor of Computer Science at Columbia University, and affiliate faculty at its law school.

Peter G. Neumann (neumann@csl.sri.com) is Chief Scientist of the SRI International Computer Science Lab, and moderator of the ACM Risks Forum. Both Peter and Steven have been co-authors of several of the cited NRC study reports, and co-authors of Keys Under Doormats.

Copyright held by authors.
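Illustrative sketch for the Inside Risks column above. The column notes that a composition problem can be as simple as two components each assuming the other performs input validation. The Python below is a minimal, hypothetical illustration of that one point and is not taken from the column: the class names (`Store`, `FrontEnd`, `CheckedStore`) and the key rule in `is_valid_key` are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Back-end component (hypothetical): assumes callers pass validated keys."""
    rows: dict = field(default_factory=dict)

    def put(self, key: str, value: str) -> None:
        # No check here: the author assumed the front end validates.
        self.rows[key] = value


@dataclass
class FrontEnd:
    """Front-end component (hypothetical): assumes the store rejects bad keys."""
    store: Store

    def handle(self, key: str, value: str) -> None:
        # No check here either: the author assumed the store validates.
        self.store.put(key, value)


def is_valid_key(key: str) -> bool:
    """Assumed rule for this sketch only: short, alphanumeric keys."""
    return key.isalnum() and len(key) <= 16


@dataclass
class CheckedStore(Store):
    """Same store, with the interface contract enforced at the boundary."""

    def put(self, key: str, value: str) -> None:
        if not is_valid_key(key):
            raise ValueError(f"invalid key: {key!r}")
        super().put(key, value)


if __name__ == "__main__":
    bad_key = "../etc/passwd"

    naive = FrontEnd(Store())
    naive.handle(bad_key, "x")  # accepted: neither component checked
    print("naive system stored bad key:", bad_key in naive.store.rows)

    safe = FrontEnd(CheckedStore())
    try:
        safe.handle(bad_key, "x")
    except ValueError as err:
        print("checked system rejected it:", err)
```

Each class looks reasonable when read alone; the gap appears only in the composed system, which is why making the validation responsibility explicit at the interface is one plausible mitigation.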

viewpoints
DOI:10.1145/3277567 R. Benjamin Shapiro, Rebecca Fiebrink, and Peter Norvig

• Mark Guzdial, Column Editor
Education
How Machine Learning
Impacts the Undergraduate
Computing Curriculum
The growing importance of machine learning creates
challenging questions for computing education.
Machine learning now powers a huge range of applications, from speech recognition systems to search engines, self-driving cars, and prison-sentencing systems. Many applications that were once designed and programmed by humans now combine human-written components with behaviors learned from data. This shift presents new challenges to computer science (CS) practitioners and educators. In this column, we consider how machine learning might change what we consider to be core CS knowledge and skills, and how this should impact the design of both machine learning courses and the broader CS university curriculum.

Thinking Like a Scientist, Not a Mathematician
Computing educators [1, 6] have historically considered the core of CS to be a collection of human-comprehensible abstractions in the form of data structures and algorithms. Deterministic and logically verifiable algorithms have been central to the epistemology and practices of computer science.

With machine learning (ML) this changes: First, the typical model is likely to be an opaque composite of millions of parameters, not a human-readable algorithm. Second, the verification process is not a logical proof of correctness, but rather a statistical demonstration of effectiveness. As Langley [5] observed, ML is an empirical science that shares epistemological approaches with fields such as physics and chemistry.

While traditional software is built by human programmers who describe the steps needed to accomplish a goal (how to do it), a typical ML system is built by describing the objective that the system is trying to maximize (what to achieve). The learning procedure then uses a dataset of examples to determine the model that achieves this maximization. The trained model takes on the role of both data structure
and algorithm. The role that each pa- high-level APIs does not require deep
rameter plays is not clear to a human, understanding of computer hardware
and these computational solutions ML has historically or operating systems. Yet these activi-
no longer reflect humans’ conceptual been a niche area ties can introduce new CS students to
descriptions of problem domains, epistemological practices core to ML,
but instead function as summaries of of CS, but now it is laying the foundation for encountering
the data that are understandable only increasingly relevant ML again in other contexts (whether an
in terms of their empirically measur- elective in ML theory, advanced elec-
able performance. to core CS disciplines. tives in computer vision or architec-
To succeed with ML, many students ture, or in professional software devel-
will not concentrate on algorithm de- opment). Such activities additionally
velopment, but rather on data collec- enable the creation of new and engag-
tion, data cleaning, model choice, and ing types of software (for example, sys-
statistical testing. tems that are driven by real-time sen-
sors or social-media data) that are very
ML Education within CS Education These same two aims can also de- difficult for novice programmers (and
ML has historically been a niche area scribe introductory courses for an even experts) to create without ML.
of CS, but now it is increasingly rele- ML-as-core world. We do not envision Changes to the Advanced Core. In
vant to core CS disciplines, from com- that ML methods would replace sym- most CS degree programs, the intro-
puter architecture to operating sys- bolic programming in such courses, ductory sequence is followed by a set
tems.3 It may even be fair to say that but they would provide alternative of more advanced courses. How should
ML is now a core area of CS, provid- means for defining and debugging that more advanced core change in
ing a parallel theoretical basis to the the behaviors of functions within stu- light of ML?
lambda calculus for defining and rea- dents’ programs. Students will learn Current courses in software verifica-
soning about computational systems. early on about two kinds of notional tion and validation stress two points:
The growing importance of ML thus machine—that of the classical logi- proof of correctness and tests that veri-
raises challenging questions for CS cal computer and that of the statisti- fy Boolean properties of programs. But
education: How should practical and cal model. They will learn methods with ML applications, the emphasis is
theoretical ML topics now be integrat- for authoring, testing, and debugging on experiment design and on statisti-
ed into undergraduate curricula? And programs for each kind of notional cal inference about the results of ex-
how can we make room for expanded machine, and learn to combine both periments. Future coursework should
ML content in a way that augments— models within software systems. include data-driven software testing
rather than displaces—classical CS We imagine that future introduc- methodologies, such as the develop-
skills, within undergraduate degree tory courses will include ML through ment of test suites that evaluate wheth-
programs whose duration must re- the use of beginner-friendly program er software tools perform acceptably
main fairly static? editors, libraries, and assignments when trained using specific datasets,
Changes to the Introductory Se- that encourage students to define and that can monitor measurable re-
quence. Most CS undergraduate pro- some functions using ML, and then to gressions over time.
grams begin with introductory courses integrate those functions within pro- Human-computer interaction (HCI)
that emphasize the development of grams that are authored using more courses may be expanded to reflect how
programming skills, covering topics traditional methods. For instance, ML changes both the nature of human-
like control structures, the definition students might take a game they cre- facing technologies that can be created
and use of functions, basic data types, ated in a prior assignment using clas- and the processes by which they are cre-
and the design and implementation of sical programming, and then use ML ated and evaluated. For instance, ML
simple algorithms.4 techniques to create a gestural inter- enables the creation of applications
In many cases, assignments in these face (for example, using accelerom- that dynamically adapt in response to
courses make use of existing library eters from a smartphone, pose infor- data about their use. HCI education
functions, for instance to read and write mation from a webcam, or audio from currently emphasizes the use of em-
data to the filesystem. Students are not a microphone) for moving the player’s pirical methods from psychology and
expected to fully understand how these character up, down, left, and right anthropology to understand users’
libraries and the underlying hardware within that game. Such assignments needs and evaluate new technolo-
infrastructure work, so much as to use would engage students in creating or gies; now, the ability to apply ML
the interfaces that these libraries pres- curating training examples, measur- to log data capturing users’ interac-
ent. The aims of introductory courses ing how well their trained models per- tions with a product can drive new
are students’ development of notional form, and debugging models by ad- ways of understanding users’ experi-
machines2 for reasoning about how a justing training data or choices about ences and translating these into de-
computer executes a program, and the learning algorithms and features. sign recommendations. Future HCI
development of the pragmatic skills Such activities do not require deep coursework will need to include these
for writing and debugging programs understanding of ML algorithms, just ML-based systems design and evalua-
that computers can execute. as reading from a filesystem using tion methodologies.
28 COMMUNICATIO NS O F TH E ACM | NOV EM BER 201 8 | VO L . 61 | N O. 1 1
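The kind of introductory assignment and data-driven test described on this page can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the column: the four gesture labels, the synthetic two-axis "accelerometer" readings, the nearest-centroid learner, and the `floor`, `tolerance`, and `BASELINE` values are all hypothetical choices made for the example.

```python
import random
from statistics import mean

# Hypothetical idealized readings (x, y) for each gesture label.
CENTERS = {"up": (0.0, 1.0), "down": (0.0, -1.0),
           "left": (-1.0, 0.0), "right": (1.0, 0.0)}

def make_dataset(n_per_label, noise, seed):
    """Generate synthetic labelled readings; stands in for student-collected data."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in CENTERS.items():
        for _ in range(n_per_label):
            data.append(((rng.gauss(cx, noise), rng.gauss(cy, noise)), label))
    return data

def train(examples):
    """Nearest-centroid learning: average the readings seen for each label."""
    by_label = {}
    for (x, y), label in examples:
        by_label.setdefault(label, []).append((x, y))
    return {label: (mean(p[0] for p in pts), mean(p[1] for p in pts))
            for label, pts in by_label.items()}

def predict(model, reading):
    """Return the label whose centroid is closest to the reading."""
    x, y = reading
    return min(model,
               key=lambda lab: (model[lab][0] - x) ** 2 + (model[lab][1] - y) ** 2)

def accuracy(model, examples):
    """Fraction of held-out examples that the model labels correctly."""
    hits = sum(predict(model, reading) == label for reading, label in examples)
    return hits / len(examples)

def check_model(acc, baseline, floor=0.90, tolerance=0.02):
    """Statistical acceptance test: an absolute floor, plus no measurable
    regression against an accuracy recorded from an earlier run."""
    return acc >= floor and acc >= baseline - tolerance

train_set = make_dataset(n_per_label=30, noise=0.15, seed=0)
test_set = make_dataset(n_per_label=20, noise=0.15, seed=1)  # held-out data
model = train(train_set)
acc = accuracy(model, test_set)
BASELINE = 0.95  # hypothetical accuracy stored from a previously accepted model
print(f"held-out accuracy: {acc:.2f}, acceptable: {check_model(acc, BASELINE)}")
```

The test here is a threshold on measured performance over held-out data rather than a Boolean property of the code, which is the shift in verification practice the column describes. A student would debug a failing check by changing the training data or the learner, not by stepping through `predict`.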

Operating systems courses describe best practices for tasks such as allocating memory and scheduling processes. Typically, the values of key parameters for those tasks are chosen through experience. But with ML the parameter values, and sometimes the whole approach, can be allowed to vary depending on the tasks that are actually running, enabling systems that are more efficient and more adaptable to changing work loads, even ones not foreseen by their designer. Future OS coursework may need to include the study of ML techniques for dynamically optimizing system performance.3

Changes to Prerequisite and Concurrent Expectations. It is typical for CS curricula to require coursework outside of CS departments, such as courses in mathematics and physics. In many cases, and especially when CS programs are housed within schools of engineering, these requirements emphasize calculus coursework. Many programs include coursework in probability and statistics, though notably the authors of ACM and IEEE’s joint Computing Curricula 2013 “believe it is not necessary for all CS programs to require a full course in probability theory for all majors.”4

Are these recommendations still appropriate? Many programs require coursework in probability and statistics, which we enthusiastically encourage, as they are crucial for engaging with the theory behind ML algorithm design and analysis, and for working effectively with certain powerful types of ML approaches. Linear algebra is essential for both ML practitioners and researchers, as is knowledge about optimization. The set of foundational knowledge for ML is thus both broad and distinct from that conventionally required to obtain a CS degree. What, therefore, should be considered essential to the training of tomorrow’s computer scientists?

Conclusion

The ACM-IEEE Computer Science Curricula 20134 identifies 18 different Knowledge Areas (KAs), including Algorithms and Complexity, Architecture and Organization, Discrete Structures, and Intelligent Systems. The definitions and recommended durations of attention to the KAs reflect a classic view of CS; ML is referred to exclusively within a few suggested elective offerings. We believe the rapid rise in the use of ML within CS in just the past few years indicates the need to rethink guiding documents like this, along with commensurate changes in the educational offerings of computing departments.

In addition, research on how people learn ML is desperately needed. Nearly the entirety of the published computing education literature pertains to classical approaches to computing. As we have mentioned earlier in this column, ML systems are fundamentally different than traditional data structures and algorithms, and must, therefore, be reasoned about and learned differently. Many insights from mathematics and statistics education research are likely to be relevant to machine learning education research, but researchers in these fields only rarely intersect with computing education researchers. Therefore, we call upon funding agencies and professional societies such as ACM to use their convening power to bring together computing education researchers and math education researchers in support of developing a rich knowledge base about the teaching and learning of machine learning.

References
1. Aho, A.V. Computation and computational thinking. The Computer Journal 55, 7 (July 2012), 832–835.
2. Boulay, B.D., O’Shea, T., and Monk, J. The black box inside the glass box: Presenting computing concepts to novices. International Journal of Man-Machine Studies 14, 3 (Apr. 1981), 237–249; https://doi.org/10.1016/S0020-7373(81)80056-9.
3. Dean, J., Patterson, D., and Young, C. A new golden age in computer architecture: Empowering the machine-learning revolution. IEEE Micro 38, 2 (Mar./Apr. 2018), 21–29.
4. Joint Task Force on Computing Curricula, Association for Computing Machinery, IEEE Computer Society (2013). Computer science curricula 2013; https://bit.ly/2E6dDGR
5. Langley, P. Machine learning as an experimental science. Machine Learning 3, 1 (Jan. 1998), 5–8.
6. Wing, J.M. Computational thinking. Commun. ACM 49, 3 (Mar. 2006), 33–35.

R. Benjamin Shapiro (ben.shapiro@colorado.edu) is an assistant professor in the ATLAS Institute, the Department of Computer Science, and (by courtesy) the School of Education and the Department of Information Science at the University of Colorado, Boulder, USA.

Rebecca Fiebrink (r.fiebrink@gold.ac.uk) is a senior lecturer in the Department of Computing at Goldsmiths, University of London.

Peter Norvig (pnorvig@google.com) is Director of Research at Google, Inc.

Copyright held by authors.
viewpoints
DOI:10.1145/3192336 C. Liaskos, A. Tsioliaridou, A. Pitsillides, S. Ioannidis, and I. Akyildiz
Viewpoint
Using Any Surface to Realize
a New Paradigm for
Wireless Communications
Programmable wireless environments use unique customizable
software processes rather than traditional rigid channel models.
Wireless communications have undeniably shaped our everyday lives. We expect ubiquitous connectivity to the Internet, with increasing demands for higher data rates and low lag everywhere: at work, at home, on the road, even with massive crowds of Internet users around us. Despite impressive breakthroughs in almost every part of our wireless devices—from antennas and hardware to operating software—this demand is getting increasingly challenging to address. The large scale of research efforts and investment in the fifth generation (5G) of wireless communications reflects the enormity of the challenge.1 A valuable and seemingly unnoticed resource could be exploited to meet this goal.

A common denominator in related research efforts is that the wireless environment—the set of physical objects that stand between two wireless communicating devices—remains a passive spectator in the data-exchange process. The ensuing effects on the data communication quality are generally degenerative: First, a transmitting device emits electromagnetic energy—carrying encoded information—which dissipates astoundingly fast within the environment. This path-loss phenomenon can be envisioned as distributing the same power over an ever-growing sphere. The power of the intended signal quickly diminishes, making its reception progressively more difficult. Second, as this ever-growing sphere reaches objects, such as walls, doors, desks, and humans, it scatters uncontrollably in multiple directions. This creates the multipath phenomenon where many, unsynchronized echoes of the original signal reach the receiver at the same time, making it difficult to discern the original. Third, the scattered signals naturally reach unintended recipients, increasing their noise levels (and allowing for eavesdropping). Finally, mobile wireless devices acquire a false perception of the frequency of electromagnetic waves, a phenomenon known as the Doppler effect. Notice that the hunt for higher data rates in 5G pushes for very high communication frequencies, at 60GHz for example, where the described effects become extremely acute.1

This Viewpoint introduces an approach that could tame and control these effects, producing a wireless environment with software-defined electromagnetic behavior. We introduce the novel idea of HyperSurfaces, which are software-controlled metamaterials embedded in any surface in the environment.6 In simpler terms, HyperSurfaces are materials that interact with electromagnetic waves in a fully software-defined fashion, even unnaturally.2,7 Coating walls, doors, furniture, and other objects with HyperSurfaces constitutes the overall behavior of an indoor or outdoor wireless programmable environment. Thus, the electromagnetic behavior of the environment as a whole can become deterministic, controlled, and tailored to the needs of mobile devices within it. The same principle is also applicable to outdoor settings, for example by coating poles or building façades.

The concept of programmable wireless environments can impact wireless communications immensely, by mitigating—and even negating—path loss, multipath, and interference effects. This can translate to substantial gains in communication quality, communication distance, and battery savings

of mobile devices, and even in security and privacy. Furthermore, due to the underlying physics, HyperSurfaces have no restriction in terms of operating communication frequency, which can extend up to the terahertz (THz) band.10 This attractive trait makes HyperSurfaces potentially applicable to a variety of cutting-edge applications, such as 4G and 5G, Internet of Things (IoT), and device-to-device systems.

The Programmable Wireless Environment Concept

Consider an everyday communication scenario with multiple users within a physical environment, as shown in Figure 1. The common, non-programmable environment is oblivious to the user presence and their communication needs. The electromagnetic energy simply dissipates throughout the space uncontrollably, attenuating very quickly, causing interference among devices and allowing for eavesdropping.

In the unique programmable environments, HyperSurface-coated walls and objects become connected to the Internet of Things. As such, they can receive software commands and change their interaction with electromagnetic waves, serving the user needs in unprecedented ways. In the example shown in Figure 1, user A expresses a need for security against eavesdropping. The programmable environment, in collaboration with the user devices, sets an improbable “air-path” that avoids all other users, hindering eavesdropping. Users B, C, and D express no requirement, and are automatically treated by a global environment policy instead, which dictates the optimization of their data transfer rates. This can be attained by negating cross-interference and a minute crafting of the received power delay profile (PDP)—that is, ensuring all received wave echoes get constructively superposed at the devices. User F is observed to be inactive and—according to his preferences—has his device remotely charged by receiving a very focused energy beam. Finally, user E fails to pass the network’s access policies (for example, unauthorized physical device address), and is blocked by the environment. This can be accomplished by absorbing his emissions, potentially using the harvested energy for constructive use.

Figure 1. Illustration of the programmable wireless environment concept. The electromagnetic behavior of walls is programmatically changed to maximize data rates (green use-cases), wireless power transfer (orange use-case), negate eavesdropping (purple use-case), and provide electromagnetic shielding (red use-case).

From another innovative aspect, the programmable environment concept abstracts the underlying physics of wireless propagation, exposing a software programming interface to control it instead. Thus, the physics behind wireless propagation are brought into the realm of software developers, treating the electromagnetic behavior of objects with simple commands, as shown in Figure 2.

Figure 2. Software commands are combined and applied locally on walls to achieve the objectives of Figure 1.

Essentially, the HyperSurface-coated objects are treated as “routers,” which can forward or block electromagnetic waves in a manner very similar to the concept of routers and firewalls in wired networks. Connecting devices becomes a problem of finding a route connecting HyperSurfaces, subject to performance requirements and user-access policies.
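The routing view taken in this section, where connecting devices becomes a matter of finding a route across HyperSurfaces subject to performance requirements and access policies, can be illustrated with an ordinary shortest-path search. The sketch below is an assumption-laden toy, not the authors' optimization service: the tile graph, the link costs, the `AUTHORIZED` set, and the function names `find_air_path` and `serve` are all hypothetical.

```python
import heapq

def find_air_path(links, source, target, max_cost=float("inf"), avoid=frozenset()):
    """Dijkstra search for the cheapest route from source to target.

    links maps each node to {neighbor: hop_cost}. Nodes in `avoid` are never
    entered (for example, tiles near another user when eavesdropping must be
    hindered). Returns (path, cost), or None if no route meets max_cost.
    """
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return (path, cost) if cost <= max_cost else None
        if node in settled:
            continue
        settled.add(node)
        for nxt, hop_cost in links.get(node, {}).items():
            if nxt not in settled and nxt not in avoid:
                heapq.heappush(queue, (cost + hop_cost, nxt, path + [nxt]))
    return None

# Hypothetical environment: two alternative chains of tiles to an access point.
LINKS = {
    "user_A": {"tile_1": 1.0, "tile_2": 1.0},
    "user_E": {"tile_1": 1.0},
    "tile_1": {"tile_3": 1.0},
    "tile_2": {"tile_4": 2.0},
    "tile_3": {"access_point": 1.0},
    "tile_4": {"access_point": 1.0},
}
AUTHORIZED = {"user_A"}  # stand-in for the network's access policy

def serve(device, **requirements):
    """Apply the access policy first, then look for a compliant route."""
    if device not in AUTHORIZED:
        return None  # the environment blocks this device
    return find_air_path(LINKS, device, "access_point", **requirements)

print(serve("user_A"))                     # unconstrained route
print(serve("user_A", avoid={"tile_3"}))   # detour around a sensitive tile
print(serve("user_E"))                     # fails the access policy
```

Performance requirements map onto the cost bound, privacy needs onto the avoided tiles, and access control onto a check made before any route is computed. A real deployment would use physically meaningful link costs; the unit costs here are placeholders.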
Figure 3. The proposed workflow involving HyperSurface-tile-coated environmental objects. The wireless propagation is tailored to the needs of the communication link under optimization. Unnatural propagation, such as lens-like focus and negative reflection angles, can be employed to mitigate path loss and multipath phenomena, especially in challenging non-line-of-sight cases. The main tile components are shown to the right in the figure.

Figure 4. Integration schematic of programmable wireless environments in the SDN paradigm. [Diagram layers, top to bottom: existing SDN applications alongside a wireless environment optimization (EM wave routing) application, which takes user objectives and global policies as input and runs a control loop with device position discovery and device access control; Northbound API; SDN controller; Southbound API; IoT communication protocols (WiFi, Thread, Z-Wave, LoRaWAN, Sigfox, etc.) and standard network equipment (switches, routers, access points, etc.); HyperSurfaces (HSF) as EM wave routers.]

Structuring Programmable Wireless Environments

The main components that comprise a single HyperSurface tile and imbue it with control over wave propagation are shown in the right inset of Figure 3. Dynamic metasurfaces are the core technology for introducing programmable wireless environments. They constitute the outcome of a research direction in physics interested in creating materials with engineered electromagnetic properties. Most commonly, they comprise a metallic pattern, called meta-atom, periodically repeated over a dielectric substrate, and connected via switching elements.5 The macroscopic electromagnetic interaction of a metasurface is fully defined by the form of the meta-atoms and the state of the switches. A certain state of switches may correspond to full absorption of all impinging waves from a given direction of arrival, while another may fully reflect them at an unnatural, completely custom angle.2

The translation of switch states to interaction types is performed by a novel software class: the electromagnetic compiler. The compiler is implemented by HyperSurface manufacturers and is transparently used by developers. In its simplest form, the compiler can be seen as a lookup table that keeps the best switch configurations corresponding to a set of electromagnetic interactions of interest. This table is populated by manufacturers, using well-known heuristic and analytical techniques in physics.2,3
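The lookup-table view of the electromagnetic compiler can be made concrete with a small Python sketch. Everything specific in it is an assumption made for illustration: the interaction names, the angles, the eight-switch tile, the bit patterns, and the nearest-angle fallback are hypothetical and do not come from any manufacturer's table.

```python
from typing import Dict, Tuple

Interaction = Tuple[str, int]    # (interaction kind, angle in degrees)
SwitchStates = Tuple[int, ...]   # one bit per two-state switch on the tile

# Hypothetical table, as a manufacturer might populate it offline.
# The bit patterns are placeholders, not physically derived configurations.
COMPILER_TABLE: Dict[Interaction, SwitchStates] = {
    ("absorb", 0):  (1, 1, 1, 1, 1, 1, 1, 1),
    ("absorb", 30): (1, 1, 0, 0, 1, 1, 0, 0),
    ("steer", 45):  (1, 0, 1, 0, 1, 0, 1, 0),
    ("steer", 60):  (1, 0, 0, 1, 0, 0, 1, 0),
}

def compile_interaction(kind: str, angle_deg: int) -> SwitchStates:
    """Return the stored switch configuration for the requested interaction.

    If the exact angle is not tabulated, fall back to the closest tabulated
    angle of the same kind (an assumption of this sketch).
    """
    candidates = [(abs(angle - angle_deg), angle)
                  for (k, angle) in COMPILER_TABLE if k == kind]
    if not candidates:
        raise ValueError(f"no tabulated configuration for interaction {kind!r}")
    _, best_angle = min(candidates)
    return COMPILER_TABLE[(kind, best_angle)]

# A developer asks for an effect; the table supplies the switch states.
print(compile_interaction("absorb", 30))
print(compile_interaction("steer", 50))  # served by the 45-degree entry
```

The point of the design is the separation of concerns: the physics is captured once, offline, in the table, and the developer only names the desired interaction.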
Upon each tile there exists an IoT device that acts as its gateway. It exerts control over the HyperSurface switches, and allows for data exchange using common communication protocols (see Figure 4). Using these protocols, the tiles—and, thus, the coated objects—become connected to common networking equipment. Gateways of tiles upon continuous objects, such as walls, form a wired network to facilitate power supply and the dissemination of software commands. A selected

tile acts as the object’s “representative,” connecting to the external world.

Figure 4 illustrates the integration of the programmable wireless environment to common network infrastructure using the software-defined networking (SDN) paradigm.1 SDN has gained significant momentum due to the clear separation it enforces between the network control logic and the underlying hardware. An SDN controller abstracts the hardware specifics (“southbound” direction) and presents a uniform programming interface (“northbound”) that allows the modeling of network functions as applications. In this paradigm, HyperSurface tiles are treated as wave “routers,” while the commands to serve a set of users, for example, as in Figure 2, are produced by a wireless environment control application. The application takes as input the user requirements and the global policies and calculates the fitting air paths. A control loop is established with existing device position discovery and access control applications, constantly adapting to environmental changes.

The scalability of the novel programmable wireless environments is a priority, both in software and hardware. In terms of software, the additional overhead comes from the optimization service, as shown in Figure 4. As described, however, the optimization pertains to finding objective-compliant paths within the graph of tiles, which is a well-studied and tractable problem in classic networking. In terms of hardware, the IoT gateway approach promotes miniaturization, low manufacturing cost, and minimal energy consumption of electronics, favoring massive tile deployments to cover an environment. Moreover, the choice of metasurfaces as the means for exerting electromagnetic control has distinct scalability and functionality benefits over alternatives. Metasurfaces comprise thin metallic elements and simple two-state switches, facilitating their manufacturing using large-area electronics methods (LAE) for ultra-low production cost.9 LAE can be manufactured using conductive ink-based printing methods on flexible and transparent polymer films, incorporating simple digital switches such as polymer diodes.9 On the other hand, alternatives such as antenna arrays8 require transceivers with accurate state control and real-time signal-processing capabilities, posing scalability considerations in terms of size, power, and manufacturing cost.

Despite their simpler design, metasurfaces constitute the state of the art in range of wave interaction types, and with unique granularity. Advanced frequency filtering, polarization control, and arbitrary radiation-pattern-shaping functions can be potentially used for remodulating or “repairing” waves in the course of their propagation. Even in simple wave routing and absorbing functions, metasurfaces provide a degree of direction control so granular that it has been used for the formation of holograms.4 A high degree of control granularity is required for 5G ultra-high frequency communications, as discussed earlier in this Viewpoint. Moreover, novel dynamic metasurface designs employ graphene, offering operation at the range of terahertz.10

Conclusion

The design and implementation of HyperSurfaces is a highly interdisciplinary task involving physics, material sciences, electrical engineering, and informatics. The combined expertise of all these disciplines results in significant value: programmable wireless environments can be enabled for the first time, allowing for programmatic customization of the laws of electromagnetic propagation, to the benefit of wireless devices. Programmable environments provide a novel perspective for wireless communications, where the usual rigid channel models are replaced by a customizable software process. Apart from unprecedented capabilities in wireless systems, this new perspective can pave the way for a completely new class of software applications, with rich interactions with existing security, device position discovery, and user mobility prediction mechanisms in the SDN world.

Recently, a related project—VISORSURFa—was funded under the prestigious Future Emerging Technologies call of the European Union Horizon 2020 framework. VISORSURF underwent a highly selective review phase, with a 3% acceptance rate, and attracted a total budget of 5.7 million euros. The multidisciplinary team of researchers is developing the hardware and software for the HyperSurfaces, expects to have the first prototype within a year, and begin mass production soon afterward.

a A hypervisor for metasurface functions; http://visorsurf.edu

References
1. Akyildiz, I.F., Nie, S., Lin, S.-C., and Chandrasekaran, M. 5G roadmap: 10 key enabling technologies. Computer Networks 106 (2016), 17–48.
2. Chen, H.-T., Taylor, A.J., and Yu, N. A review of metasurfaces: Physics and applications. Reports on Progress in Physics 79, 7 (2016), Physical Society, Great Britain.
3. Haupt, R.L. and Werner, D.H. Genetic Algorithms in Electro-Magnetics. Wiley, NY, 2007.
4. Li, L. et al. Electromagnetic reprogrammable coding-metasurface holograms. Nature Communications 8, 1 (2017), 197.
5. Li, Y. et al. Transmission-type 2-bit programmable metasurface for single-sensor and single-frequency microwave imaging. Scientific Reports 6 (2016).
6. Liaskos, C. et al. Design and development of software-defined metamaterials for nanonetworks. IEEE Circuits and Systems Magazine 15, 4 (2015), 12–25.
7. Lim, D., Lee, D., and Lim, S. Angle- and polarization-insensitive metamaterial absorber using via array. Scientific Reports 6 (2016).
8. Moghaddam, S.S. and Moghaddam, M.S. A comprehensive survey on antenna array signal processing. Trends in Applied Sciences Research 6, 6 (June 2011), 507–536.
9. Parashkov, R. et al. Large area electronics using printing methods. In Proceedings of the IEEE 93, 7 (2005), 1321–1329.
10. Tassin, P., Koschny, T., and Soukoulis, C.M. Graphene for terahertz applications. Science 341, 6146 (2013), 620–621.

Christos Liaskos (cliaskos@ics.forth.gr) is a researcher at the Foundation of Research and Technology (Hellas), Greece.

Ageliki Tsioliaridou (atsiolia@ics.forth.gr) is a researcher at the Foundation of Research and Technology (Hellas), Greece.

Andreas Pitsillides (Andreas.Pitsillides@ucy.ac.cy) is a professor in the Department of Computer Science and the head of the Networks Research Laboratory at the University of Cyprus.

Sotiris Ioannidis (sotiris@ics.forth.gr) is a member of the staff at the Foundation of Research and Technology (Hellas), Greece.

Ian Akyildiz (ian@ece.gatech.edu) is the Ken Byers Distinguished Chair Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology in Atlanta, GA, USA, and member of the staff at the University of Cyprus.

This work was funded by the EU H2020 FETOPEN-RIA project VISORSURF: A Hardware Platform for Software-driven Functional Metasurfaces (GA 736876).

Copyright held by authors.
viewpoints
DOI:10.1145/3195179 Janne Lahtiranta and Sami Hyrynsalmi
Viewpoint
Crude and Rude?
Old ways in the new oil business.
Ever since the statement “Data is the new oil,” widely credited to British mathematician Clive Humby,a was made in 2006, the world has experienced profound changes in the ways consumer-related digital information is used and accessed. Accordingly, it can be argued that current-generation smart devices and electronic services (including Internet service providers) are merely pieces of a pipeline in this new ‘oil’ business with a sole purpose of providing ‘crude’ for service providers to be processed and refined. In this sense, the metaphor coined by Humby (and others) is appropriate.

a This statement, and variations thereof, has also been credited to various authors, including Meglena Kuneva and Richard Titus; see https://bit.ly/2Mp7k9r

Taking the metaphor one step further, it must be acknowledged that the tycoons of this new ‘oil’ business are actually pretty smart when compared to Haroldson Hunt or Clint Murchison, who were among the most notable businessmen during the Texas oil boom that took place in the U.S. in the early 20th century. Unlike in Texas, the ‘land owners’ (that is, consumers) of today pay for the gushers (smart devices), pipeline (Internet connection), and they even do the hard labor (use the services and applications).

Especially in Texas, handshake deals were a common form of an agreement—and legally binding in the eyes of the law. In today’s oil business the handshake deals and notarized agreements are substituted by End User License Agreements (EULAs) and other forms of contract between the licensor and the purchaser. The problem with these agreements is that while they are in most cases legitimate and enforceable, they are ubiquitous and non-negotiated—millions of people bind themselves to the agreements every day with a simple click indicating “I agree.”

The gravity of the current situation becomes even more evident if we investigate two particular technology trends: aggregation and personal health. Technology has a tendency of becoming smaller, smarter, faster, and cheaper, and a consequence of this is that our mobile phones have become aggregation points for different services, sensors, and data. In many cases, our phones are the only ‘documentation’ we carry with us when we go about our daily affairs in a bank or a grocery store.

Similarly, our phones with their embedded sensors, and connected equipment (such as activity trackers), act as a singular entry point to data depicting our physical activities throughout the day (exercise, heart rate, sleeping …). Increasingly, our mobile phones are also integrated to our homes and personal transportation that are becoming smarter,b or at least more connected, every day. In other words, everything we regard as smart or connected in our everyday lives is becoming intertwined with our mobile phones.

b Mikko Hyppönen from F-Secure prefers the term “vulnerable,” which is also a pretty accurate definition.

In personal health, we are seeing how different mobile applications have become numerous. For example, if one has diabetes, there is an app for recording insulin levels. If one has a child with autism spectrum disorder, there is an app that can help in communication. In addition to apps, the health service providers are setting up virtual clinics, patient portals, and all kinds of “online health bazaars” that extend the reach of health services from hospitals to homes. In this, the mobile-first approach has become the prevalent one and our mobile phones act as primary access points for the services.

So when a consumer, unaware or sometimes even unconcerned, installs an app that simulates the function of a zipper (one can zip and unzip it, and that’s all), or that of a stapler (one can tap a virtual stapler onscreen) and the application happens to have a malicious payload, it does not only compromise the security of the device—it may compromise the security of one’s health affairs, housing, banking, business, and social life. In other words, it may compromise one’s life.

Fortunately, this eventuality is a theoretical one as marketplaces are quick to react to malware, and services typi-

viewpoints
cally have implemented extra layers of the position of power, dictating the (and the named third parties). Natu-
security (a side-channel for authenti- content of the agreements. rally, should the premises change, the
cation, for example) in their function. In the current end-user agreements, service provider’s rights should be in-
However, one’s digital life is vulnerable terms of service, and permissions the validated by default, and new permis-
and currently employed measures do fundamental problem is that the con- sions (informed consent) requested
not live up to the task. Fair Informa- sent is not informed. Consumers are from the consumer.
tion Practice Principles, and similar expected to hand over the data they Another step in the right direction
guidelines, such as those by the Markle generate on the basis of obscure agree- would be to use colloquial language
Foundation, have had marginal impact ments and permissions that are de- and terms familiar to the user. Instead
on the situation, as has the current fined on the basis of properties of an of bulletproof legalese, the end-user
legislation. The impact of the General artifact—the high-tech smartphone agreement and terms of service should
Data Protection Regulation (GDPR, EU of today that is millions of times more be stated in a way that reading them
2016/679) that was put into effect in the powerful than all of the NASA comput- does not require advanced degrees in
EU in May 2018 remains to be seen. ers involved in the Apollo 11 mission to both law and computer science. This
Directives, regulations, and legisla- the moon in 1969. perspective on understandability also
tion are part of the solution. Another What we need is a change of per- applies to the way application-level
part comes from the application perspective and a different attitude. Simi- permissions are requested. Instead of
missions of the operating system. larly to the healthcare sector where requesting permission to “make/re-
In contrast to service-side solutions, the patient comes first, the consumer ceive SIP calls” the consumer should
these kinds of source-side solutions should come first in terms of user be informed on what the SIP calls actu-
are more technological by nature, agreements, application-level permis- ally are, and where they are used.
and linked to the function of the mo- sions, and data use in general. In this, This kind of consumer-centric ap-
bile phone. The source-side solutions the healthcare principle of informed proach to end-user agreement, terms
are typically implemented as permis- consent is of the essence. In health- of service, and application permissions
sions: permission to access camera, or care the principle defines that the pa- would not only serve the purposes of
permission to use contacts informa- tient has bodily integration: autono- informed consent—which is a goal in
tion. Commonly, these permissions my and self-determination over one’s itself—it could also have a more pro-
are requested from the user without physical body. In electronic services found impact on the use of technology.
pertinent information, and when a the principle should apply in such a First, in terms of Internet and technol-
permission is granted, it is in effect in- fashion that the consumer has virtual ogy literacy, use of colloquial language
definitely (or in some cases, until the integration: autonomy and self-de- could make the technology more vis-
next major update). termination over the ‘digital self’ (in- ible and tangible as technological
Looking at this kind of setting from cluding data originating from one’s concepts would have a real name and
the ethics side, one question emerges: activities and information stored in meaning. Secondly, the users could be-
Is this really an agreement? On one different devices and services). come more privacy aware, as the data
hand, the user has been given the terms Another central aspect in the in- they generate and its use by the service
of service, which can be 20,000-word formed consent of the health domain providers, is portrayed in full.
document written in bulletproof legal- is that the patient must be sufficiently In late-19th-century American busi-
ese (as in the case of iTunes), and the informed prior to making health- nessmen who used shameless and
user has clicked “I Agree.” On the other related decisions, such as undergo- even ruthless methods to get rich were
hand, the user clicked “Allow” when ing a certain surgery. This principle often labeled as “robber barons.” One
the “beer drinking simulator” applica- is analogous to the virtual world; the of the most famous businessmen la-
tion requested permanent access to the consumer should be sufficiently in- beled as such was John D. Rockefeller,
camera, microphone, contacts, calen- formed prior to making decisions on the founder of Standard Oil. Only time
dar, storage, location, and body sen- the use of use of data that is part of will tell if any of the major players in
sors. This should be enough, right? the digital self. In this, the application this new ‘oil’ business will receive a
The land-lease agreements of the permissions are part of the whole, as similar notorious title. Fortunately,
Oil Rush in general were notoriously they are often the source of the data, there are already players in the field
tricky, promoting the authority of the or one way of communicating the who understand the consumer comes
more-aware landed. It was not rare data with the service provider(s). first and regard privacy and trust as a
that the lease itself, and the related Put in a more straightforward man- competitive advantage instead of a cost
royalty agreements, led to overdrill- ner, this essentially means enforcing or a nuisance.
ing in order to recover the invest- five things highlighted in different
ments of the oilmen. In these cases, recommendations and in the GDPR: Janne Lahtiranta (janne.lahtiranta@turkubusinessregion.
com) is a senior advisor at Turku Science Park Ltd in
the bias of the agreements was heav- the service provider should make a Turku, Finland.
ily one-sided, as is in the case of clear case on what data is used; why Sami Hyrynsalmi (sami.hyrynsalmi@tut.fi) is an assistant
mobile applications (and electronic it is used; how the data is collected; professor in the Pervasive Computing department at
Tampere University of Technology in Pori, Finland.
services) of today. However, the bias who has the access and what is the ex-
has tilted from more-aware ‘land- tent of confidentiality; and how long
owners’ to ‘oilmen’ who are now in the data is accessible by the provider Copyright held by authors.
china region
Intro | DOI:10.1145/3239530 Andrew A. Chien
Introducing Communications’
Regional Special Sections
IN MY MARCH 2018 editorial “Here Comes Everybody … to Communications,” I announced an initiative to expand the Communications of the ACM community globally. I am pleased to introduce the first regional special section, which we hope will become a feature in Communications that you anticipate and enjoy, and of course value for the insights and perspectives it presents!

The theme for the regional special sections is, “Here Comes Everybody to Communications.”ᵃ Why bring everybody to the magazine? Because the flagship publication of the world’s leading computing professional society must include important voices and a variety of perspectives regarding the present and future of computing, regardless of where they may be found. Over the years, computing has grown into virtually every industry and every product, impacting every aspect of society and the economy of every nation; at the same time, the computing profession also has expanded into communities around the globe.

Computing’s invention and continuing innovation is a global enterprise. While the technical foundations of computing may be universal, the design of many systems’ most important aspects (how they relate to society, government, the structure of commerce, and individual enlightenment and perspective, as well as fundamental choices about security, privacy, free speech, and control) increasingly reflect distinctive regional, national, and community cultures.

Communications should be an inclusive forum that spans the global community, with active participation from everyone, everywhere. The goal of Communications’ global initiative is to bring untapped insights and focused coverage to our readers by providing highlights of computing from regions around the world. We will add a special section to a few issues of Communications each year, highlighting the leadership, unique characteristics, and distinctive development of computing in the region. The Communications global initiative will visit regions around the world in turn, shifting its spotlight to match the pace and impact of interesting developments in computing. We hope to revisit each global region about every two years.

Each of the Special Sections is led by a regional team that nominates, selects, and drives authorship of the section’s content. We began building our team of industry and academic leaders for the China Region nearly one year ago. The team gathered at the University of Chicago Center in Beijing in March 2018 to brainstorm topics and form article writing teams.

To drive this and future regional coverage, we also have added new members to the Communications editorial board specifically focused on Special Sections. Led by Sriram Rajamani, they were actively involved in vetting and improving the articles in this inaugural China Region special section.

Thanks to all of the authors; to the section leaders, Wenguang Chen (Tsinghua University) and Xiang-Yang Li (University of Science and Technology of China); and a special thanks to Lihan Chen, who created and drove the process that made the China Region special section a reality.

I hope you enjoy it!

Andrew A. Chien, EDITOR-IN-CHIEF

a This title borrows from Clay Shirky’s 2008 book Here Comes Everybody: The Power of Organizing Without Organizations, which described the growing power of groups of individuals to organize large-scale activities without relying on traditional corporate organizations.

Andrew A. Chien is the William Eckhardt Distinguished Service Professor in the Department of Computer Science at the University of Chicago, Director of the CERES Center for Unstoppable Computing, and a Senior Scientist at Argonne National Laboratory.

P.S. The next regional special section is already under way, focused on Europe. Look for it in early 2019!

Copyright held by owner/author.

DOI: 10.1145/3239532
Welcome to the China Region Special Section
CHINA’S UNIQUE LANGUAGE, culture, governance practices, and research
funding systems have had great impact on its Internet industry and
technology development. For example, people in China seem less sen-
sitive about privacy, which may be an important factor in the fast ac-
ceptance of mobile payment systems; combining that with the huge
population of China could motivate many exciting technology innovations.
Some have speculated the Chinese government’s strict supervision of Internet
content and exclusion of some multinational competitors are important factors
in the development of China’s Internet industry, but it is difficult to assess the va-
lidity of such conjecture. In some non-content-based areas, such as e-commerce,
the sharing economy, and Internet travel agencies, Chinese Internet companies
have matched, and even surpassed, their international competitors.
For this special section, we invited contributors from a wide range of academic
and industry communities spanning the Chinese mainland, Macau, and Hong
Kong. We brainstormed article topics in a workshop in Beijing in March 2018.
The response was terrific, and the resulting collection of articles, while far from
comprehensive, offers an excellent snapshot of the most exciting computing
trends and activities in the China region.
We are pleased to present the China Region special section, which includes:
• A series of short articles (“Hot Topics”) that provide context and flavor of the region’s distinctive growth, ranging from tech idols and computing culture to gaming, and
• Longer articles that document some of the “Big Trends” shaping the computing landscape of the China region, ranging from financial technology and last-mile autonomous delivery to SuperAI and cloud bursting.

Wenguang Chen and Xiang-Yang Li
China Regional Special Section Co-Organizers

Wenguang Chen is a professor in the Department of Computer Science and Technology at Tsinghua University in Beijing, China, and co-chair of ACM China Council.
Xiang-Yang Li is a professor and Executive Dean of the School of Computer Science and Technology at the University of Science and Technology of China in Hefei, Anhui, China, and co-chair of ACM China Council.

Copyright held by owners/authors.

EDITORIAL BOARD
EDITOR-IN-CHIEF: Andrew A. Chien (eic@cacm.acm.org)
DEPUTY TO THE EDITOR-IN-CHIEF: Lihan Chen (cacm.deputy.to.eic@gmail.com)
CHAIR, REGIONAL SPECIAL SECTIONS: Sriram Rajamani
CHINA REGION SPECIAL SECTION CO-ORGANIZERS: Wenguang Chen (Tsinghua University); Xiang-Yang Li (University of Science and Technology of China)

Members of the China Region Workshop. Top left: Tong Zhang, Haibo Chen, Xiaoyang Wang, Hai Jin, Yuan Qi, Xiang-Yang Li, Xundong He, Wenguang Chen, Jing Xiao, Huaxia Xia, Liang Yu, Chaoyang Lu. Seated from left: Lihan Chen, Hong Gao, Andrew A. Chien, Yue Zhuge, and Yutong Lu.

Watch the co-organizers discuss this section in the exclusive Communications video. https://cacm.acm.org/videos/china-region

CHINA REGION SPECIAL SUPPLEMENT

Hot Topics
40 China’s Computing Ambitions
   By Elliott Zaagman, Technode
42 Quantum Communication at 7,600km and Beyond
   By Chao-Yang Lu, Cheng-Zhi Peng, Jian-Wei Pan, University of Science and Technology of China
44 The Future of Artificial Intelligence in China
   By Jun Zhu, Tsinghua University; Tiejun Huang, Peking University; Wenguang Chen, Tsinghua University; Wen Gao, Peking University
46 Consumers, Corporations, and Government: Computing in China
   By Peter Guy, South China Morning Post
48 Regional Culture and Personalities
50 Can China Lead the Development of Data Trading and Sharing Markets?
   By Xiang-Yang Li, University of Science and Technology of China; Jianwei Qian, Illinois Institute of Technology; Xiaoyang Wang, Fudan University
52 Exploiting Psychology and Social Behavior for Game Stickiness
   By Luyi Xi, NetEase Fuxi Lab

Big Trends
54 People Logistics in Smart Cities
   By Wanli Min, Liang Yu, Alibaba Cloud Computing; Lei Yu, Shubo He, CTrip
60 Cloud Bursting for the World’s Largest Consumer Market
   By Hai Jin, Huazhong University of Science and Technology; Haibo Chen, Shanghai Jiao Tong University; Hong Gao, Harbin Institute of Technology; Xiang-Yang Li, University of Science and Technology of China; Song Wu, Huazhong University of Science and Technology
65 Fintech: AI Powers Financial Services to Improve People’s Lives
   By Yuan Qi, Ant Financial; Jing Xiao, Ping An Technologies
70 Is Last-Mile Delivery a ‘Killer App’ for Self-Driving Vehicles?
   By Huaxia Xia, Meituan; Haiming Yang, JD CTO Group
76 Video Consumption, Social Networking, and Influence
   By Yue Zhuge, Hulu
82 Will Supercomputers Be Super-Data and Super-AI Machines?
   By Yutong Lu, Sun Yat-Sen University; Deipei Qian, Beihang University; Haohuan Fu, Wenguang Chen, Tsinghua University

Association for Computing Machinery
Advancing Computing as a Science & Profession
hot topics
China’s Ambitions | DOI:10.1145/3239534
China’s Computing Ambitions
BY ELLIOTT ZAAGMAN/Technode

CHINA PLANS TO become the world’s high-tech leader, and quickly. In 2015, the Chinese government’s State Council approved “Made in China 2025,” an initiative designed to position China as a world leader in fields such as robotics, aviation, advanced information technology, and new-energy vehicles in less than a decade. In support of this governmental initiative, China’s Ministry of Industry and Information Technology (MIIT) released a three-year action planᵃ to drive growth in areas including smart drones, facial recognition, AI-supported medical diagnosis, speech recognition, and language translation. If successful, the initiative would grow China’s AI industryᵇ to a size of $150 billion by 2020, approximately 100 times its size in 2016. As China pushes AI forward, here are a few names, trends, and technologies to watch.

[Image caption] Facial recognition technology used in Shenzhen, China, identifies jaywalkers and automatically issues fines by text.

Facial Recognition and Surveillance
Earlier this year, an Alibaba-led funding of $600M made SenseTime the world’s most valuable startupᶜ at a valuation of $4.5B. SenseTime specializes in facial recognition technology with applications including payment verificationᵈ and automated checkout.ᵉ

Demand from China’s public security agencies drives growth. “SenseTime ... can grow so fast compared to elsewhere in the world because video surveillance is a big deal in China ... there is a huge budget for it so they can manage society,” explained Justin Niu of IDG, an early SenseTime investor.

China’s facial recognition firms are finding demand outside China as well. Shortly after news of SenseTime’s massive cash influx, Cloudwalk, a company based in South China’s Guangdong province, signed a cooperation agreement with Zimbabwe’s government for a mass facial recognition project, the first foray of a Chinese AI firm into Africa, a growing focus for China’s diplomatic, military, and financial resources.

Facial recognitionᶠ is just one dimension. This past March, the Guizhou provincial government, Tsinghua University, and Beijing-based d-Ear Technologies announced a pilot project intended to create a national database of “voiceprints” and link them to national ID information.

Pursuit of “Self-Reliance” in Core Technologies
In early 2018, the U.S. Department of Commerce placed a seven-year ban on ZTE for violating Iran sanctions. Although the Trump administration and ZTE have since reached a deal to lift the ban,ʰ the ban was a “wake-up call” for China’s computing industry. With an estimated 25%–30% of ZTE’s components sourced from U.S. firms, ZTE’s case highlighted the dependence of Chinese tech firms on others. Subsequently, Xi Jinping and other Chinese leaders have reiterated calls to achieve self-relianceⁱ in “core technologies.”

For example, heavy investment in the next generation of computing technologies may exceed that of the West (see the article by Y. Lu et al. on p. 82 of this section). First and foremost is quantum computing: it has been suggested the first general-purpose Chinese quantum computer could have a million times the computing power of all other computers currently on earth.

China aspires to global leadership. The $10-billion National Laboratory for Quantum Information Sciencesʲ in Hefei, Anhui province, is expected to open in 2020. The laboratory, nearly four-million square feet, has two major research goals: quantum metrology and building a quantum computer. The laboratory also includes quantum communication and supports Chinese military efforts and commercial development (see article by C. Lu et al. on p. 42 for more information).

“Saudi Arabia of Data”
While China may lag in some core technologies, it has an unquestionable advantage in data (see the article by X. Li et al. on p. 50 of this section). With 772 million Internet usersᵏ and growing fast, China has approximately twice as many users as the U.S.,ˡ with data protections less stringent than those of the European Union. As Europe’s GDPR regulations came into effect this past spring, so did China’s Personal Information Security Specification (known as “the Standard”), called by Chinese observers a “business-friendly GDPR.” Key differences include, first, intent, as Chinese regulators want to support development of AI that relies on access to massive datasets; and second, how user consent is defined, with exemptions that allow for data processing outside consent. Deeper analysis of “the Standard” can be found in a report from earlier this year.ᵐ

What Makes China Perfect for AI?
Around the world, concerns over job loss and privacy fears threaten to impede AI’s progress and use. However, China may be the ideal environment to overcome these challenges. The first reason is demographic need. China’s demography dictates a shortage of qualified people to support its aging population. Perhaps AI can fill those gaps. The second is that China’s top-down political system can enable rapid, widespread application of innovative technologies, overriding commercial or popular opposition. Consider dockless bike-sharing,ⁿ for example, which grew from a novelty concept to a signature aspect of Chinese urban life in less than one year. In decentralized western systems, dockless bike-sharing typically requires independent approval from various government agencies; support from the Chinese Communist Party allowed China’s dockless players to grow quickly. With driverless vehicles, there may well be a similar dynamic (see the article by H. Xia et al. on p. 70 of this section).

The fact that China does not offer its citizens the same legal protections or provide government or corporate transparency as Western democracies is a real concern. However, technology is something all societies must inevitably deal with, and how each manages it will reflect those societies’ strengths and weaknesses. While China’s top-down control risks government overreach, more open digital systems risk instability and have other weaknesses, exposed by recent “fake news” and conspiratorial election interference examples.

As China becomes a proving ground for AI technology, the world is paying attention. It may choose not to apply it in the same ways, but the world should certainly learn from it.

a https://bit.ly/2CFrMtZ
b https://bit.ly/2mrvxN4
c https://bit.ly/2GS1OJI
d https://tcrn.ch/2gDsG3X
e https://bit.ly/2mtlZ4b
f https://bit.ly/2A08jay
g https://bit.ly/2uO3ATs
h https://reut.rs/2uxEXeu
i https://bit.ly/2uMdMM1
j https://bit.ly/2gtaV4s
k https://bit.ly/2DV0wrQ
l https://bit.ly/2qP6dFw
m https://bit.ly/2DV0wrQ

Elliott Zaagman is a writer, speaker, and communications consultant focusing on technology, culture, and society in China. His work can be read on Tech in Asia, Technode, and in Chinese at Huxiu.com.

© 2018 ACM 0001-0782/18/11 $15.00
Quantum | DOI:10.1145/3239536
Quantum Communication
at 7,600km and Beyond
CHAO-YANG LU, CHENG-ZHI PENG, AND JIAN-WEI PAN
University of Science and Technology of China
THE EXPONENTIAL GROWTH of the Internet and e-commerce shows the importance of establishing a secure network with global protection of data. Cryptography, the use of codes and ciphers to protect secrets, began thousands of years ago. While conventional cryptography methods predominantly rely upon mathematical complexity, their encryption can usually be defeated by advanced hacking.

The idea of quantum cryptography, proposed by Bennett and Brassard in 1984¹ and by Ekert in 1991,² offered a radical, secure solution to the key exchange problem based on information theory, ensured by the laws of quantum physics. Quantum key distribution (QKD) allows two distant parties to produce a random string of secret bits, called a secret key.

Since the first table-top QKD experiment in 1989, a strong research effort has been devoted to achieving secure quantum cryptography over long distances, aiming at a global scale for practical use. To this end, there are two major challenges. First, quantum cryptography is ideally secure only when perfect single-photon sources and detectors are employed. Unfortunately, ideal devices never exist in practice and device imperfections have become the targets of various attacks. In 2007, Pan’s group at the University of Science and Technology of China demonstrated the decoy-state QKD protocol to close the loophole due to imperfect single-photon sources.⁵ In 2013, the same group demonstrated the first measurement-device-independent protocol that made QKD immune to all hacking strategies on detection.⁴ This work established secure QKD as a viable technology under realistic conditions.

Since 2007, many intra-city and inter-city quantum communication networks have been built aiming for real-world applications. For example, Pan’s team constructed metropolitan quantum communication networks in Beijing, Jinan, Hefei, and Shanghai in China, and connected these into the longest backbone line to date with a fiber distance exceeding 2,000km, on which real-world applications from banks, government, securities, and insurance industries are now on trial.

The second major challenge is long distance. For example, at 1,000km with a perfect GHz-rate single-photon source, ideal photon detectors, and telecommunications optical fibers (with a loss of 0.2dB/km), one would detect only 0.3 photons per century! One solution is quantum repeater protocols that divide the whole transmission line into N smaller segments, and combine the functionalities of entanglement swapping, entanglement purification, and quantum storage. In spite of remarkable progress in demonstrations of the three building blocks and even prototype quantum repeater nodes, these laboratory technologies are still far from being practically applicable in realistic long-distance quantum communications.

Satellite-based free-space quantum communication offers a unique and more efficient approach for global quantum networks. The key advantage of this approach is that the photon loss and turbulence predominantly occur in the lower ~10km of the atmosphere, and most of the photons’ transmission path is virtually a vacuum with almost zero absorption. A cross-disciplinary multi-institutional team of scientists led by Pan spent more than 10 years developing a sophisticated satellite dedicated to quantum science experiments.

Nicknamed Micius, the satellite was launched on Aug. 16, 2016. Five ground stations in China connect with the satellite. Within a year of its launch, three key milestones were achieved: satellite-to-ground decoy-state quantum key distribution with ~kHz final key rate over a distance of 1,200km;³ satellite-based entanglement distribution to two locations on the Earth separated by 1,200km with a two-photon count rate of 1Hz, and test of quantum nonlocality;⁷ and ground-to-satellite quantum teleportation over 1,400km.⁶ The effective link efficiencies achieved in the satellite-based channel were ~20 orders of magnitude larger than direct transmission through optical fibers at the same length of 1,200km.

The world’s first quantum science satellite has now been combined with metropolitan quantum networks to form a space-ground integrated quantum network, and has been further exploited as a trusted relay to conveniently connect any two points on Earth for high-security key exchange. On Sept. 29, 2017, intercontinental quantum communication between Beijing and Vienna at a distance of 7,600km was demonstrated, where secret keys based on the principle of quantum mechanics were used for the transmission of images and a videoconference.

China will build new lines further to the South (Shanghai-to-Shenzhen), to the West (Beijing-Wuhan-Guangzhou), and to the North (Harbin-Changchun-Shenyang-Beijing). There is also a plan to launch, with both public and private funding, more low-Earth-orbit satellites in the near future to form a satellite cluster. In addition, a higher-orbit satellite is to be developed that aims to significantly increase QKD time, area coverage, and bandwidth. Encouraged by the success of the quantum science satellite and the Beijing-to-Shanghai backbone, similar quantum cryptography projects are being planned both in Europe and the U.S. The former recently kick-started a €1 billion Quantum Flagship project, and the latter committed $1.3 billion to a National Quantum Initiative in June 2018.

References
1. Bennett, C. and Brassard, G. Quantum cryptography: Public key distribution and coin tossing. In Proceedings of IEEE International Conference on Computers, Systems and Signal Processing, 175 (1984).
2. Ekert, A.K. Quantum cryptography based on Bell’s theorem. Physical Review Letters 67, 661 (1991).
3. Liao, S.-K. et al. Satellite-to-ground quantum key distribution. Nature 549, 43 (2017).
4. Liu, Y. et al. Experimental measurement-device-independent quantum key distribution. Physical Review Letters 111, 130502 (2013).
5. Peng, C.-Z. et al. Experimental long-distance decoy-state quantum key distribution based on polarization encoding. Physical Review Letters 98, 010505 (2007).
6. Ren, J.-G. et al. Ground-to-satellite quantum teleportation. Nature 549, 70 (2017).
7. Yin, J. et al. Satellite-based entanglement distribution over 1200 kilometers. Science 356, 1140 (2017).
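The fiber-loss estimate in the article (1,000km of fiber at 0.2dB/km yielding a fraction of a photon per century) can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope calculation, not the authors' method: the function names are illustrative, and the source rate is an assumption, since the article says only "GHz-rate". Under these assumptions a 10 GHz source gives roughly the quoted 0.3 photons per century, while a 1 GHz source gives about a tenth of that.

```python
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600


def fiber_transmittance(length_km, loss_db_per_km=0.2):
    """Fraction of photons surviving a fiber of the given length."""
    total_loss_db = length_km * loss_db_per_km      # 1,000km -> 200 dB
    return 10 ** (-total_loss_db / 10)              # 200 dB  -> 1e-20


def photons_per_century(source_rate_hz, length_km, loss_db_per_km=0.2):
    """Expected detections per century with an ideal source and ideal detectors."""
    per_second = source_rate_hz * fiber_transmittance(length_km, loss_db_per_km)
    return per_second * SECONDS_PER_CENTURY


# Assumed source rates: the article does not give an exact figure.
for rate_hz in (1e9, 1e10):
    detected = photons_per_century(rate_hz, 1000)
    print(f"{rate_hz:.0e} Hz source over 1,000km: {detected:.2f} photons per century")
```

Because loss in decibels grows linearly with distance, transmittance falls exponentially, which is why simply using a faster source cannot close the gap and why repeaters or satellite links are pursued instead.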
Chao-Yang Lu is a professor of physics at

the University of Science and Technology
of China, Hefei.
Cheng-Zhi Peng is a professor of
physics at the University of Science and
Technology of China, Hefei.
Jian-Wei Pan is a professor of physics at
the University of Science and Technology
of China, Hefei.
A space-ground quantum network formed by China’s quantum science satellite and metropolitan quantum networks.
© 2018 ACM 0001-0782/18/11 $15.00
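To make concrete what it means for QKD to let two parties "produce a random string of secret bits," here is a toy, purely classical simulation of the basis-sifting step of the Bennett-Brassard 1984 protocol cited in the article above. It is a sketch under strong simplifying assumptions: an ideal noiseless channel, no eavesdropper, and no error correction or privacy amplification. The function names are hypothetical, and nothing here models the decoy-state or measurement-device-independent protocols the article describes.

```python
import secrets


def random_bits(n):
    """Return n uniformly random bits."""
    return [secrets.randbits(1) for _ in range(n)]


def bb84_sift(n):
    """Simulate BB84 sifting for n transmitted photons on an ideal channel."""
    alice_bits = random_bits(n)     # the raw bits Alice encodes
    alice_bases = random_bits(n)    # 0 = rectilinear basis, 1 = diagonal basis
    bob_bases = random_bits(n)      # Bob picks his measurement bases independently

    # Measuring in the same basis reproduces Alice's bit; measuring in the
    # other basis gives a uniformly random outcome.
    bob_results = [
        bit if a_basis == b_basis else secrets.randbits(1)
        for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases)
    ]

    # The bases (not the bits) are compared publicly; only positions where
    # they match are kept.
    kept = [i for i in range(n) if alice_bases[i] == bob_bases[i]]
    alice_key = [alice_bits[i] for i in kept]
    bob_key = [bob_results[i] for i in kept]
    return alice_key, bob_key


alice_key, bob_key = bb84_sift(256)
print(f"sifted key length: {len(alice_key)} of 256; keys match: {alice_key == bob_key}")
```

On average about half of the positions survive sifting. In a real system, the two parties would also compare a sample of the sifted bits to estimate the error rate, since disturbance introduced by an eavesdropper shows up as errors.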
The Future of AI | DOI:10.1145/3239540
The Future of Artificial Intelligence in China
JUN ZHU/Tsinghua University, TIEJUN HUANG/Peking University,
WENGUANG CHEN/Tsinghua University, WEN GAO/Peking University
CHINA’S RESEARCH third of both submitted and
EFFORTS in artifi- accepted papers came from
cial intelligence China.b
(AI) began later Important technical con-
than the U.S. and tributions from the region
Europe. Early contribu- have been made in machine
tions in the 1970s included learning, computer vision,
automated theorem proving, natural language process-
logic reasoning, search, and ing, robotics, and more. For
knowledge engineering. For example, in machine learn-
example, Wen-tsün Wu is a ing, extensive work has been
pioneer in automated theo- done on ensemble learning,8
rem proving. He received the transfer learning,4 artificial
State Preeminent Science neural networks, evolution-
and Technology Award in ary computing,6 and proba-
2000, an honor bestowed bilistic machine learning.9
The first Smart China Expo was held last August in Chongqing and
on only 25 Chinese scien- featured cutting-edge technologies from the fields of AI, robotics, In computer vision, much
tists across all fields to date. 5G, and more. progress has been made on
Bo Zhang and Ruqian Lu Markov’s random field mod-
received the Life Achieve- In the late 1980s and and Hangzhou, increased eling for image analysis,3
ment Award from the China 1990s, an emphasis on funding tremendously to handwritten character rec-
Computer Federation (CCF) research in Chinese natural facilitate the new AI boom. ognition,1 facial recognition,
for their fundamental contri- language processing took The financial boost enabled and so on.2 Finally, progress
butions respectively on prob- hold. Xuan Wang, a pioneer Chinese researchers to in AI hardware has enhanced
lem solving and knowledge in applying AI to Chinese attend international confer- accelerators for deep neural
engineering. character printing and layout ences and become deeply networks.7
With the establishment processing, became another involved and integrated into With this increasing im-
of basic research funding recipient of the State Preemi- international research com- pact and recognition, more
to include AI research and nent Science and Technology munities. It is now common researchers from China have
development (R&D) in 1986, Award in 2001. He created to see China’s researchers been invited to serve the
two agencies—the National the Founder Group, one of attending top conferences, community, in roles such as
Natural Science Founda- the largest computer compa- and their success includes program chairs or area chairs
tion of China (NSFC), which nies on Mainland China in having their research pub- for leading conferences, and
supports basic research, and the late 1990s. Other AI-relat- lished extensively in leading as associate editors for top
the 863 Program (State High- ed companies started during AI conferences and journals, journals. For example, Qiang
Tech Development Plan) for that period include iFlyTek such as AAAI, IJCAI, ICML, Yang from the Hong Kong
applied research—began (Chinese voice synthesis NIPS, CVPR, ACL, PAMI, University of Science and
funding diverse AI-related and recognition), Hanvon Artificial Intelligence, and Technology (HKUST) is the
research topics, such as (handwriting recognition), more. For example, 23% of president of the IJCAI Board
hardware and software for in- and TRS (Chinese full-text the accepted papers for the of Trustees (2017–2019), and
telligence, human-computer retrieval system). AAAI 2017 conference were Zhi-Hua Zhou from Nanjing
interaction (HCI), intelligent After 2000, China’s Min- from China, rising from only University is serving as a
application systems, neural istry of Science and Technol- 10% in 2012.a For the IJCAI program co-chair for AAAI
networks, genetic algorithms, ogy (MOST), NSFC, other 2017 conference, nearly one- 2019. As the local commu-
machine learning, natural central government agen- nity is growing fast, many top
language processing, com- cies, and local governments a http://www.nber.org/papers/
puter vision, and robotics. including Beijing, Shenzhen, w24254.pdf b https://bit.ly/2uOphTV

conferences such as IJCAI, and image recognition into

ICML, and ICCV are now devices. It also released open
held (or will be) in China. source platforms, such as
Apollo for autonomous driv-
Technical Giants ing, and PaddlePaddle for
Commit to AI Research deep learning.
In addition to government- International companies
supported academic such as Microsoft, IBM, and
research, industry has also Intel, also have built research
been very active in AI explo- labs in China with active AI influential researchers, It is expected to be approved
ration. research. They not only have we are optimistic China’s this year and should run for
China’s technical giants, very high-quality and impact- fast-growing economy and 15 years.
such as Baidu, Alibaba, ful research, such as the Dual the aging population will Last but not least,
Tencent, and Huawei, are Learning theory proposed by drive strong demand for a positive feedback loop
actively investing in AI re- Tie-Yan Liu et al. at Microsoft novel AI techniques, and between academia and
search and related develop- Research Asia, but also feed ensure the successful future industry has been estab-
ment. These corporations China’s AI industry with of China’s AI research. lished, which we believe will
have established their own many high-quality research- Recognizing the strong trigger more fundamental
worldwide AI labs, typically ers and technical managers. demand for AI, the Chinese breakthroughs in AI in the
directed by world-renowned government is planning future.
AI scientists such as Andrew Leading Startups support for AI education,
Ng, who led Baidu Lab from Founded by Professors research, and applications. References
1. Ding, X. and Wang, Y. Character
2014–2017. Moreover, these The AI boom has given rise to In 2017, NSFC’s Information Recognition: Principles, Methods and
companies have branches Practice. Tsinghua University Press,
smaller AI-focused compa- Science Department reor- 2017.
throughout China, the U.S., nies, including Cambricon ganized its five information 2. Gao, W. and Chen, X. Computer
Vision—Algorithms and System.
and Europe. (AI chips), iFlytek (voice), science areas (electronic en- Tsinghua University Press (in Chinese),
AI research in industry SenseTime and Megvii gineering, computer science, 1999.
3. Li, S.Z. Markov Random Field
labs is generally more busi- (computer vision), and automation, semiconduc- Modeling in Image Analysis. Springer
ness oriented. They focus on UBTECH (robotics). Re- tors, and optoelectronics) Science & Business Media, 2001.
4. Pan, S.J. and Yang, Q. A survey on
inventing and developing searchers from academic to incorporate the new sixth transfer learning. IEEE Trans. on
AI algorithms and systems institutions and universities area, artificial intelligence. Knowledge and Data Engineering 20,
10 (2009).
to optimize not only their founded many of these firms. The AI 2.0 proposal from 5. Pan, Y. Heading toward artificial
current businesses, such Cambricon was founded the China Academy of Engi- intelligence 2.0. Engineering,
2016, 409–413.
as online advertisement, by Tianshi Chen and Yunji neering5 triggered the launch 6. Yan, P. and Zhang, C. Artificial Neural
payments, social network- Chen, both researchers at of a 15-year New Genera- Network and Evolutionary Computing.
Tsinghua University Press, 2005.
ing, and gaming, but also the Institute of Comput- tion Artificial Intelligence 7. Zhang, S. et al. Cambricon-X: An
new businesses such as ing Technology, Chinese Development Plan in July accelerator for sparse neural networks.
In Proceedings of the 49th
smart city, healthcare, and Academy of Sciences. They 2017. The plan is focused on Annual IEEE/ACM International
auto-drive technologies. For a forward-looking blueprint Symposium on Microarchitecture,
are pioneers in AI processor 2016.
example, Alibaba’s ET Brain architecture and won the for basic theories and com- 8. Zhou, Z.-H. Ensemble Methods:
Foundations and Algorithms. Chapman
project uses AI to reduce best paper awards at ACM’s mon key technologies, in- & Hall/CRC, Boca Raton, FL, 2012.
traffic jams. It has been premier computer architec- cluding big data intelligence, 9. Zhu, J., Chen, J., Hu, W. and Zhang, B.
Big learning with Bayesian methods.
reported that traffic delays ture conferences, ASPLOS swarm intelligence, cross- National Science Review 4, 4 (2017),
have been reduced by 15.3% and MICRO. Xiaoou Tang, media intelligence, hybrid 627–651.
by controlling 128 traffic a professor at the Chinese enhanced intelligence, and
signals in a select area of University of Hong Kong who autonomous systems, and Jun Zhu is a professor of computer
science at Tsinghua University and director
Hangzhou, where Alibaba’s has won best paper awards at their applications in manu- of TSAIL, Beijing.
corporate headquarters is top computer vision confer- facturing, urbanization, Tiejun Huang is a professor in the School
based. Moreover, ambulance healthcare, and agriculture, of EECS and chair of the Computer
ences like CVPR and ICCV, Science Department at Peking University,
response times in the same founded SenseTime—now as well as AI hardware and Beijing.
area were cut in half. the most valuable AI startup software platforms, policies Wenguang Chen is a professor of
computer science and technology at
In addition to specific in the world, with a valuation and regulations, and ethi- Tsinghua University, Beijing.
applications, corporate gi- of over $4.5 billion. cal concerns. Another R&D Wen Gao is a professor of computer
ants are also trying to build project related to AI is the science at Peking University, Beijing,
and a Fellow of the Chinese Academy of
their own ecosystems. For The Future of so-called “Brain Science and Engineering.
example, Baidu launched AI Research in China Brain-Inspired Research,”
DuerOS, a system that allows AI research is young, but comparable to Europe’s
users to embed many AI growing up fast in China. Human Brain Project, the
functionalities, such as voice, While still short on ground- BRAIN Initiative in the U.S.,
natural language processing, breaking works and highly and other state-level projects.
Consumers | DOI:10.1145/3239538
Consumers, Corporations,
and Government:
Computing in China
PETER GUY/South China Morning Post
UNIQUE HISTORICAL, Internet, and instant com- infrastructure (see the in a multiyear competition.
SOCIOECONOMIC, munication platforms like article by Y. Qi and J. Xiao Beijing-based Didi’s
and political social media, which have on p. 65 of this section), rapid innovation of new
conditions have spawned both large-scale and its large population. services tied to local
created a distinc- innovations and challeng- In 2017, while China patterns and societal
tive path for China’s rapid es unique to China. ranked 25th in the world in structure, as well as its
integration of computing In the context of tech- terms of smartphone pen- hometown advantage,
and technology into its nology innovation and etration of its population ultimately led to its August
economy and society. disintermediation, the (51.7%), due to its enor- 2016 acquisition of Uber’s
Over the last 20 years, number of Internet users mous population (roughly China operation, a deal
major Chinese technology in China grew 12% from 1.38 billion in 2017, ac- that propelled the global
companies such as Baidu, 2015 levels to 717 million cording to the CIA World expansion and influence
Alibaba, Tencent, and in 2016, while average Factbook), China had more of the resulting $35-billion
Didi Chuxing have gained hourly Internet usage grew smartphone users than any Didi.
prominence through their 30% over the same period, other country in the world Online innovation has
ability to shape and deliver providing both foundation in 2017 (717.31 million, propelled a cycle of growth
democratized computing and momentum to drive followed by India with in online users, and robust
power and services into acceptance of a variety of 300.1 million, and the U.S. online usage is creating
the daily lives of China’s online services, ranging with 226.2 million). monetization for growth,
consumers. from e-commerce to ride- Today, payment apps which has encouraged
The rise of China’s hailing apps. such as AliPay and WeChat substantial and growing
computing industry has One development provide cashless payment investment at all stages
transformed the way its unique to China is the services covering the full of technological develop-
citizens live and consume. country’s unprecedented spectrum of daily life, in- ment. In 2017, China ex-
Computing is a broad term rapid conversion to a cluding buying goods from perienced record venture
that today encompasses a nearly cashless consumer street markets. As a result, capital deal activity, with
wide variety of industries economy. While the even food trucks and street half of the top 10 largest
and companies that utilize technologies underlying vendors often do not ac- deals globally involving
computing power through this have been available cept cash, and beggars are Chinese telecoms and In-
the Internet and personal for nearly a decade, the equipped to accept digital ternet companies accord-
computing devices. The factors driving Chinese donations. ing to Preqin, a source
critical mass of distrib- consumers to largely stop Another area distinctive of data and intelligence
uted computing power using cash include the to China is ride-hailing, a for the alternative assets
has reached new levels country’s relatively high sector in which Uber, Didi industry. Didi Chuxing’s
through smartphones, smartphone usage, its Chuxing, and several other $5.5-billion financing in
distributed computing, the weak traditional financial companies have engaged April 2017 emerged as the

biggest venture capital- nology company and search

backed deal of the past 10 engine provider Baidu was
years. investigated after a student
The Chinese are often researching a medical
portrayed as naïve, and condition was directed to
even gullible, for allow- ineffectual, experimental
ing their government and medical treatment ads,
technology businesses to which caused him to miss
collect so much data on seeing genuine medical
them without their consent solutions.
or knowledge. This has Online entertainment in
become an accepted obser- China is becoming the key
vation of the unquestioned driver of mobile time spent
status quo in China. online, while e-commerce
However, since it was and games remain the best
learned that Cambridge models for monetizing
Analytica had mined the time spent online, accord-
personal data of 87 million ing to data from venture control funding transfers. to default this year.
U.S. Facebook users (up capital firm Kleiner Perkins The central bank can di- Do the Chinese under-
from the initial estimate of that showed the Internet rectly influence consumer stand the freedoms they are
50 million) and used that accounted for 55% of total behavior such as spending giving up in exchange for
data to affect the results of Chinese media usage in and saving, and the govern- convenience and enter-
the 2016 U.S. presidential 2016 (up from 50% in 2015). ment can directly control tainment? Government
election, it appears private Desktop Internet usage the infrastructure required power increasingly extends
companies exploiting per- made up 37%, and mobile for basic actions (food, beyond privacy to direct
sonal data for advertising Internet usage 28%, of aver- transport, work) essential tracking and control of the
and more proved to be less age daily media consump- for daily life. actions of daily life. This
than ideal for protecting tion in China in 2016. The President Xi’s anti- has not been fully debated
individual privacy. growing amount of com- corruption campaign has in the Chinese media,
puting power in the hands been made more efficient which tends to celebrate
Companies in China of consumers supported through the increasingly the dominance of home-
Whether authoritarian the growth of online enter- digitized economy and grown tech firms; nor has
state control or improved tainment end-user revenue financial system, but it has it been widely discussed in
government regulation in from $8 billion in 2011 to also driven the shadow Western media.
the West will prove more $31 billion in 2016. banking economy even Indeed, the development
effective in combating The cashless digital deeper underground. of computing in China is
data-mining malpractice economy arrived in China According to Bloomberg, distinctive in many ways,
and protecting privacy empowered by government China’s $15-trillion shadow and these differences pres-
remains to be seen. It is regulations and cheap banking industry (non- ent an opportunity to learn
especially important as mobile devices, but citizens bank financial intermedi- about the benefits and
China’s advanced comput- may not be aware of the aries that provide services shortcomings of contrast-
ing ambitions continue to profound implications similar to those of commer- ing approaches.
incorporate tight govern- on their daily lives and cial banks, but outside nor-
ment control of content economic freedom. The mal banking regulations) Peter Guy is a business columnist for the
South China Morning Post and co-founder
and a closed Internet dwindling importance of threatens the stability of and editor of Regulation Asia. He has an
firewall, a distinctly differ- cash implies the wholesale China’s financial system, international background in venture capital
and investment banking in Hong Kong and
ent model from that of the elimination of bank runs. as $3.8 trillion of interest- Guangzhou.
West’s open Internet. Banks can arbitrarily im- bearing trust products,
In 2016, Chinese tech- pose and charge fees and many sold online, threaten
Regional Culture | DOI:10.1145/3277554
Regional Computing Culture

and Personalities
DESPITE THE GREAT by the members of Human
Firewall, China’s Rights Watch for advertis-
772 million ing male-only jobs and
Internet users are for portraying women
adept at using as “goddesses” to entice
smartphones for social me- young male programmers
dia, live streaming, order- to apply. The companies
ing home delivery, booking apologized and promised
taxis, and sharing bicycles. remedial action, but the
Despite this massive market flawed practice reflects a
and bountiful opportuni- serious challenge in China:
ties for computing careers, The world’s most popu-
Chinese corporations face lous country does not have
great obstacles in attracting enough technologists to
high-tech talent. drive its lofty tech ambi-
In April 2018, Chinese tions. Lou Tiancheng has been called
tech giants Baidu, Alibaba, The average computer
and Tencent were called science doctoral graduateb “one of the world’s best hackers.”
out for sexism in their in China earns 121,000 yuan
recruitment campaigns.a ($19,000) a year, and those
These corporate comput- with AI skills can command
ing leaders were criticized 300,000 to 500,000 yuan.
“There continues to be a massive need for talent in the recruiting act, with
China,” said Jerry Yang,c Shenzhen spending 500
a https://www.hrw.org/report/2018/04/23/only-men-need-apply/gender-discrimination-job-advertisements-china
b http://www.chinadaily.com.cn/a/201806/02/WS5b11977ea31001b82571dc75.html
Yahoo co-founder. While million yuan last year under
plenty of jobs await com- its so-called Peacock Plan
puting graduates in China, to attract overseas high-
some see their first or tech industry experts and
second employer as simply academics.
a steppingstone to starting
their own company. Rock Stars
To address the technolo- In China, tech geeks are
gist shortage, China aggres- rock stars, occupying the
sively recruits talent from same limelight as teen
abroad through government idols. The late Stephen
programs like “Thousand Hawking has a huge fol-
Talents,” which through lowing in China, and Elon
2017 had attracted more Musk is widely admired for
than 7,000 top-level over- his visionary work in elec-
seas Chinese scientists and tric cars and space travel.
engineers home with the While some Chinese lead-
promise of a 2 million yuan ing lights have achieved
Lei Jun, CEO and co-founder ($317,150) research grant, international fame, here we
a 500,000 yuan “personal highlight four homegrown
of Xiaomi, is a role model for reward,” as well as medical geek stars:
young technologists. and housing benefits. Even Lei Jun, CEO and co-
major cities have entered founder of the smartphone
and Internet services
company Xiaomi, is a role
model for young Chinese
c https://996.ggvc.com/category/podcast/page/2/

technologists hoping to start-up Pony.ai, where he

strike it rich using their serves as chief technology
tech skills. Once referred to officer. He is a role model
as the Steve Jobs of China, for young Chinese techies
and derided for producing interested in turning their
Apple copycat products, Lei coding skills into fame and
has forged his own identity. fortune. While at Baidu, Lou
Xiaomi expects to raise $6 advisede graduates not to
billion in its initial public think of programming as a
offering (IPO). To believe shortcut to get into a better
Lei, money is not the moti- college or find a better job.
vation to build Xiaomi—he “Learn to code for the fun of
has pledged to hold profit it,” he said. “In the process,
margins on hardware down you will improve your rea-
to 5%, and return the soning and problem solving
surplus to users. Speaking abilities.” Naomi Wu is on a mission
for Xiaomi’s founders, Lei China has 164 unicorns,
has declaredd “The spirit of which are private startups to inspire women to work in
engineering runs through valued at more than $1B. the computing field.
our veins,” and “All of us The largest, with a valuation
are hardcore fans of tech- of $56B, is ride-hailing giant
nology.” Didi Chuxing. Its president,
Lou Tiancheng, 32, has Jean Liu, is a role model for
been called “one of the Chinese women in tech. a career at Goldman Sachs. ware electronics ecosystem,
world’s best hackers.” He Born in 1978, the daughter She was inspired to study making it an ideal base
emerged as runner-up at of Liu Chuanzhi, the founder computer science at Peking for makers: DIY techies
Topcoder Open in 2010 of China’s computing giant University after reading Bill who dabble in electronics,
and racked up back-to-back Lenovo, she started the Didi Gates’ The Road Ahead. robotics, and 3D printing,
wins at the Google Code Women’s Network to help In 2017 Liu was named and more.
Jam in 2008 and 2009. Lou break what she calls the one of Time magazine’s One extraordinary
left Baidu in December “mid-career bottleneck.” “100 Most Influential maker in Shenzhen is
2016 to co-found China’s Liu joined Didi in 2014 after People,” where Apple Naomi Wu,f an accom-
autonomous vehicle CEO Tim Cook called her plished 20-something geek
a “disruptor.” (Indeed, a known as SexyCyborg to
d http://blog.mi.com/en/2018/05/03/open-letter-from-our-chairman/
e http://global.baidu.com/news-single/qa-with-top-coder-tiancheng-lou-baidu-autonomous-driving-team/
year earlier, she convinced her YouTube and Twitter
Apple to invest $1 billion in fans, most of whom are
Didi). After acquiring Uber’s outside of China since both
China business, she was platforms are banned in
confronted with a new crisis China. Wu, who learned
recently when a female computer coding after
passenger was raped and finishing high school, is on
murdered by a Didi driver, a mission to inspire women
prompting the company to work in the computing
to tighten up security and field. “I’m not a rock star
privacy measures to protect coder or anything,” she told
its female riders. Newsweek in an interview in
While Didi, Baidu and November 2017. “But I am
Xiaomi are all based in good at cleaning up after
Beijing, China’s Silicon rock star coders.” Wu’s DIY
Valley is in Shenzhen, a projects are mainly focused
metropolis of 12 million on wearable tech items for
people just north of Hong women.
Didi president Jean Liu was Kong. Shenzhen is home With role models like
to some of China’s biggest Lei, Lou, Liu, and Wu,
named one of Time magazine’s tech companies, including China’s computing culture
“100 Most Influential People.” Huawei Technologies, ZTE, is vibrant and distinctive.
drone maker DJI, and Inter-
net giant Tencent. Shen-
zhen is a talent magnet,
and hosts an entire hard-
f http://www.atimes.com/article/meet-chinas-sexycyborg-goddess-geeks/
Data Trading | DOI:10.1145/3239542
Can China Lead the

Development of Data Trading
and Sharing Markets?
XIANG-YANG LI/University of Science and Technology of China,
JIANWEI QIAN/Illinois Institute of Technology, XIAOYANG WANG/Fudan University
At the same time the European Union is implementing new strict data protection regulations, China’s data trading and sharing markets are booming. Here, we survey the status of these developing markets driven by growing demand from artificial intelligence (AI)-related industries, covering government encouragement as well as critical concerns and research opportunities including privacy and security.

China, with the world’s largest e-commerce and mobile payment markets,a has an estimated big-data market of $70B circa 2015, which has been projected to grow to $155B by 2020.2 As in much of the world, over 80% of data in China is privately held by the governments and private companies, restricting its exploitation for productivity and profit. The President of China, Xi Jinping, particularly emphasized the importance of data’s open sharing and fusion as part of the national strategy for big data on December 8, 2017, encouraging data sharing across government sections and local governments, and data sharing/trading between governments and private companies.

The founder of Fa Yuan Di Ltd. said the market size of data trading in China was approximately $3.2 billion in 2016. The market size is estimated to grow to $8.7B by 2020.1 Examples of data trading and sharing include:
• Didi Chuxing became the winner in the fiercely competitive ridesharing industry with the help of big data from WeChat, the dominant app for messaging, social media, and mobile payment;
• Vanke Real Estate utilizes big data provided by China Mobile to find the best locations for investment; and,
• Ping An Insurance takes people’s online behavior data from Baidu, Tencent, and Alibaba to accurately pinpoint potential customers and create new insurance products.

Twenty data markets have been established by various local government authorities and private enterprises in China (see the table below), trading whole datasets, Web crawlers, APIs, and analytical results. They can be traded either off-the-rack or via customization, and come from many directions including banking, energy, health care, transportation, industry, agriculture, tourism, education, telecommunication, and much more. One of the largest is the Global Big Data Exchange in Guiyang with over 2,000 corporate members and more than 150PB of reported stored data circa March 2018.b

Data exchange markets in China.
Data Exchange Platform                   Date Est.    URL
Global Big Data Exchange (Guiyang)       Dec 2014     http://www.gbdex.com
East Lake Trading Center for Big Data    Jul 2015     http://www.chinadatatrading.com
Jiangsu Big Data Exchange                Nov 2015     http://www.bigdatahd.com
Chongqing Big Data Trading Market        Oct 2015     http://www.crazyapi.org
Shanghai Data Exchange Corp.             Apr 2015     https://www.chinadep.com
Qingdao Big Data Exchange                2017         http://www.qddata.com.cn
Zhejiang Big Data Exchange               Mar 2016     http://www.zjdex.com
Harbin Data Exchange                     June 2016    http://www.hrbdataex.com
Central China Data Exchange              July 2016    http://www.ccbde.cn
Qiantang Data Exchange                   Dec 2015     http://www.qtjiaoyi.com
Data Tang                                June 2011    http://www.datatang.com
You-e Data                               Dec 2015     http://www.youedata.cn
Lei Ju                                   May 2011     http://www.leiju.cc
Fa Yuan Di                               Sept 2015    http://www.finndy.com
JD Wan Xiang                             Unknown      https://wx.jcloud.com
BDG Store                                Nov 2016     http://www.bdgstore.cn
Ali Data Market                          Sept 2009    https://market.aliyun.com/chn/data
Baidu Data                               Unknown      http://apistore.baidu.com
Big Ocean                                Unknown      http://www.dahaiyang.com
Markway Mall                             2001         http://www.markwaymall.com

a https://mck.co/2GXOoqS
b https://bit.ly/2LqeREB

Policies and Environment
Following the U.S. and other countries, the Chinese policy encourages governments to share data to enhance transparency and efficiency. The State Council Guidelines for Promoting Big Data Development, released in 2015, proposed the goal of establishing a united platform for open government data by the end of 2018. The top priority is to share data from several important realms, including credit, transportation, and health care. The Chinese government also encourages private data trading to expand the digital ecosystem. There are no regulations specialized for data sharing and trading yet, except those aiming at protecting national security, trade secrets, and copyrights, and banning the

propagation of illegal content (such as terrorism, fake news).2 Lack of regulation allows experimentation, but some data owners hesitate to share their data due to potential legal or business consequences.

[Figure: A general workflow of data exchange among a seller, a broker, a buyer, and the cloud. Numbered steps: 1. Data collection & processing; 2. Proof of data possession; 3. Data exhibition; 4. Requirements specification; 5. Auction & contract design; 6. Access control; 7. Multilateral trade; 8. Verifiable computation; 9. Free trial and return; 10. Data tracking.]

There have been public discussions about sharing and trading of personal data, which is openly traded. While the EU’s General Data Protection Regulation (GDPR) governs
companies collecting, pro- scale, poor in quality, and low auditing can ensure data was correctly computed over
cessing, and selling their con- in value. Critical concerns security and alleviate sellers’ this exact dataset—extremely
sumers’ data, China has no inhibit data exchange, as concerns. Data transactions difficult if buyers also want
national regulation on data depicted in the figure here. should also be tracked to query privacy.
protection; only fragmented Preprocessing. To in- achieve accountability.3 Many
regulations exist, like the crease usability, sellers must platforms, such as Global Big Conclusion
Cyber Security Law and the preprocess data: cleaning, Data Exchange (Guiyang) and While these challenges are
Personal Data Infringement labeling, reconciliation, JD Wan Xiang, have adopted daunting, the strong govern-
Interpretation that came into fusion, and desensitization, blockchain to strengthen ment encouragement and
effect in June 2017.2 which require automation for trading security because of its rich data collection enable
A GfK surveyc indicated big data. Manual approaches favorable decentralized and rapid growth of large-scale
38% of people in China are are still common. Behind tamper-resistant qualities. data exchanges in China.
very willing to share per- China’s booming AI industry Privacy. Before being With technology advances
sonal data for better service, are almost one million data traded, data must be desen- (blockchain, AI, big data ana-
whereas the ratio is 25% in labelers: mostly rural, part- sitized to protect personal lytics, cloud computing) and
the U.S. and lower than 20% time workers.e information and privacy. maturing policies (privacy,
in most European coun- Pricing. Data products Sellers should also take into digital ethics), we are optimis-
tries. However, awareness have unlimited supply be- account potential privacy tic that better data sharing
and concerns are growing cause of the little marginal leaks from temporal, spatial, and trading ecosystems will
in China. In March 2018, a cost, causing the Arrow- and different owners’ data help China’s economy transi-
survey conducted jointly by Debreu equilibrium price to linkage. Although China tion to a new level of global
China Central Television and be close to zero.5 While early lacks strict regulations like competitiveness.
Tencent Research indicated attempts have explored data the GDPR, most trading
76.3% of 8,000 Chinese peo- pricing,4,6 it is still an open markets in China claim to References
1. Global Big Data Exchange (Guiyang).
ple interviewed were worried problem. Today, standard desensitize personal data. White Paper on China’s Big Data
about the threat AI posed to whole datasets are sold at For instance, Shanghai Data Exchange. 2016.
2. Institute of Information and
their privacy. A few days later, fixed rates, and customized Exchange Corp. protects per- Communication of China. Big Data
White Paper. 2018.
Baidu’s CEO said Chinese data is priced by negotiation. sonally identifiable informa- 3. Jung, T. et al. Accounttrade:
people are willing to sacri- The pricing strategies for tion through encryption and Accountable protocols for big data
trading against dishonest consumers.
fice privacy in exchange for APIs include pay-as-you-go encoding when they perform In INFOCOM (2017), 1–9.
convenience, triggering an and wholesale. data linkage. 4. Li, C., Li, D. Y., Miklau, G., and Suciu,
D. A theory of pricing private data.
enormous public backlash.d Security. Data is vital to Verifiability. There are Commun. ACM 60, 12 (Dec. 2017),
Compared to the EU and the information asymmetry cases where the traded data 79–86.
5. Quah, D. Digital Goods and the New
the U.S., however, China’s between different companies was forged, producing dis- Economy. 2003.
market is a lenient, more and government sections. trust. When sellers list their 6. Zheng, Z. et al. An online pricing
mechanism for mobile crowdsensing
permissive environment for Inappropriately sharing may datasets on the marketplace, data markets. In MobiHoc (2017), 26.
data trading and sharing. reduce ability to compete, they must prove to the trading
expose wrongdoings, or broker their ownership of the Xiang-Yang Li is a professor and Executive
Dean of the School of Computer Science and
Concerns harm their public images. data, the data’s authenticity Technology of China, Hefei.
and Opportunities Most data is sold through and accessibility, and that the Jianwei Qian is a Ph.D. candidate of
Data markets are growing in API, but many sellers are data content and quality are computer science at the Illinois Institute
of Technology, Chicago.
China, but are still immature worried the buyers can as claimed. In addition, the
Xiaoyang Wang is a professor and dean
with most datasets small in infer data content. Utilizing proof should not disclose the of Computer Science at Fudan University,
techniques such as query data content. When buyers Shanghai.
c https://bit.ly/2K2E0zt purchase the API of a dataset,

d https://ab.co/2uUL4JJ e https://bit.ly/2NRtxZV sellers must prove the API © 2018 ACM 0001-0782/18/11 $15.00
Psychology | DOI:10.1145/3239544
Exploiting Psychology
and Social Behavior
for Game Stickiness
LUYI XU/NetEase Fuxi Lab
CHINA’S GAME keting of video games
MARKET is the increasingly relies on
largest worldwide, socialization. Tencent,
recording a conglomerate with
$32.5 billion in greater revenues in 2017
sales in 2017. However, the ($21.9 billion) than the
lack of copyright protection combined revenues of
(common in the European Activision Blizzard ($7.07
Union and U.S.) has forced billion), Electronic Arts
China’s game developers ($4.845 billion), Ubisoft
in the offline PC game ($1.704 billion), and Take
industry to pursue Web Two Interactive ($1.779
and mobile-based game billion), touches tens of
development. Today, millions of users via its
few developers in China Web portal QQ and mobile
focus on developing PC chat service WeChat, and
or console games. attracts them to its online
Of the 507 million gam- games. For example,
ers in China as of June players are drawn to Ten-
2017 (an estimate by the cent’s multiplayer online
China Audio-Video and battle game Arena of Valor
Digital Publishing Asso- through the company’s
ciation and CNG Games players what they don’t new PlayerUnknown’s Bat- operation of the League of
Research Center in the have in real life—the hap- tlegrounds (PUBG), which Legends multiplayer online
China Gaming Industry piness of making friends, support various forms of battle arena game on the
Report January–June a feeling of superiority in collaboration and confron- PC platform.
2017), the number of their skills, the feeling they tation among gamers, cre- Socialization’s promi-
console game owners (an are “part of something.” As ate an obsessive stickiness nent role in Chinese video
estimated 9.09 million, ac- a result, interactive video for Chinese users. games has attracted spe-
cording to global gaming games like the relatively Promotion and mar- cial attention from com-
research firm Newzoo in puter scientists. Several
2016) and offline PC gam- news publications labeled
ers (18.5 million, based on Games give players what they 2017 the “Year of Artificial
data provided by Steam in Intelligence,” marking a
2017) together account for don’t have in real life—the transformation in human-
less than 10%. happiness of making friends, computer interaction. A
The Chinese video game good example of this can
a feeling of superiority in their
IMAGE BY HAFIEZ RAZALI/SHUTTERSTOCK.COM
market boom has been be seen in NetEase.

driven by escalating qual- skills, and the feeling they are NetEase, Inc., founded
ity, attracting gamers with in 1997, is a leading In-
amazing in-game graph- “part of something.” ternet company providing
ics, music, stories, level online services centered
design, and gameplay, but on content, community,
increasingly players are communications, and
attracted by the desire for e-commerce. It entered
socialization. Games give online gaming in 2001,

and develops and operates (MMORPGs), social inter- Figure 1. The NPC interactive interface on Nishuihan Online (still in
large online games that action has formed an in- development from NetEase).
accounted for 60% of its trinsic part of online game
2017 revenue of $5.3 bil- play. MMORPGs have been
lion, making NetEase the designed as virtual worlds
sixth-largest game com- that are socially realistic,
pany in the world. achieving immersion
NetEase architects based on diversity of in-
in-game social systems teraction. When World of
as a core competitive Warcraft hit the market in
advantage. The company 2004, that game’s design-
cultivates player loyalty ers at Blizzard Entertain-
by constructing in-game ment had already started
social networking systems to leverage “reputation”
in which players collabo- to enrich interactions
rate and compete just like between players and NPCs
they do in the real world, (Figure 1).
Figure 2. The personality traits of a player in The Legend of Chu
through their avatars. New game releases by
Liuxiang, recently released by NetEase.
Yet gamer presence NetEase add more granu-
and immersion in vir- larity to NPCs’ interactions
tual worlds is crucially with players. In popular
tied to the intelligence multiplayer online role-
of in-game non-player playing games, players
characters (NPCs). In interact with several
conventional games, most characters as groups, and
NPCs stick around certain reputation determines
areas and mechanically how their interactions
repeat conversations and with each faction play out.
behaviors when prompt- In Justice Online (Ni Shui
ed, providing limited Han) and The Legend of Chu
interaction. If NPCs can Liuxiang (Figure 2), among
have sophisticated logic, other online RPG titles,
responding spontaneously players’ avatars interact
to player input, a game’s with NPCs as individuals. In future games, language of NPCs using
explorative and immersive Each character behaves NetEase plans to incorpo- facial expression transfer,
experience could reach a within a framework of rate personified chatbots motion transfer, and emo-
whole new level. dimensions including behind NPCs and speech tion recognition from text
With artificial intel- personalities, emotions, synthesis, enabling for 3D facial expression
ligence advancing rap- and health. The states diverse and even spoken rendering. As a result,
idly, NetEase is investing of players’ avatars and interaction. Another NPCs may become suf-
aggressively to improve NPCs are influenced by conceptual project at the ficiently human-like for
in-game social interac- their history and actions, company is the creation gamers to have sentimen-
tion between human and which affect how they of voice-based games in tal interactions with them,
machine. NPC-based interact with each other. which the player’s com- and even relationships.
artificial intelligence (AI) For example, an NPC with mands and directions NetEase is a trailblazer
was first incorporated a growling stomach might are spoken, and several of in-game human-ma-
in offline role-playing decline a martial arts intelligent characters can chine social interaction.
games (RPGs), such as contest with other players; behave in complicated Although human-machine
the Nemesis System that those with evil morals may ways and respond to situa- social interaction is new
enabled the game Middle find it difficult to improve tions independently. to the market, we believe
Earth: Shadow of Mordor righteous NPCs’ approval To accomplish those artificial intelligence will
to stand out, as it allowed ratings; and some of the goals, the company is grow into a major driver
enemies in the game to ac- NPCs will give gifts to and exploring the use of of games that meet play-
tually evolve by slaying the team up with players they reinforcement learning ers’ social-psychological
player’s character, result- favor. Such interactions are to train AI in ways includ- needs.
ing in their growing into not new to users in offline ing behaviors associated
dangerous adversaries. RPGs, but are still innova- with any kind of language. Luyi Xu is a senior systems specialist at
NetEase Fuxi Lab, Hangzhou.
From the earliest tive to developers of online NetEase developers also
massively multiplayer games, and have proven are working to improve the Copyright held by author.
online role-playing games popular with players. facial expression and body Publications rights licensed to ACM.
big trends
DOI:10.1145/3239546
People Logistics in Smart Cities
BY WANLI MIN/ALIBABA CLOUD COMPUTING, LIANG YU/ALIBABA CLOUD COMPUTING, LEI YU/CTRIP, SHUBO HE/CTRIP
IMAGE BY TOSTPHOTO

CITIES IN CHINA are growing rapidly in terms of both size and complexity. Governments have been searching for new technologies to make cities more efficient, and smart mobility has been the top priority in all solutions.
The past few years have seen a paradigm shift for smart mobility in China, that is, data-centric companies, mostly Internet companies, are taking a leading role in such initiatives instead of governments and academic researchers. For example, Alibaba, Baidu, Tencent, Ctrip, and Didi, among others, are spearheading the smart mobility initiatives. The driving force is twofold: these companies have accumulated a huge volume of data and invested a great deal of resources in the AI arena. Their main focus involves AI systems able to predict city traffic conditions with full spatiotemporal coverage and optimize transportation systems accordingly.
The blossoming of smart mobility initiatives is due to the fact that the potential of big city traffic data has not been fully mined. Researchers are still pursuing a better way to break the data silos while also preserving privacy. Fortunately, we have seen significant efforts from both industries and governments in China to promote data sharing for innovations.
Here, we will focus on the smart mobility scenarios that are representative of cities in China. We will elaborate on two aspects: in-city and intercity transport. For the in-city scenario, we take Alibaba’s City Brain program as an example to introduce how the big city data can be used to optimize the traffic signal scheduling and accelerate access for life-saving emergency vehicles (EV). For the intercity scenario, we focus on the weekend/holiday crowdedness unique to China, and introduce how big tourism data can be leveraged to divert tourists to more suitable attractions to avoid traffic congestion.

Real-Time and Holistic Situational Awareness on City Traffic
High-quality, real-time, and holistic traffic condition sensing is the prerequisite of further transport optimization. To overcome the limited spatiotemporal coverage of traditional sensor data, the data from active navigation apps, including private cars and taxis, are employed to rebuild the trajectories using map-matching techniques, which can cover almost the entire road network of a city. Trajectory data can be used for various purposes, such as traffic parameter extraction (speed/volume) or origin-destination (OD) analysis. Various traffic indices can be generated based on these parameters, such as a delay index, which measures the travel time delay compared with the non-congested situation, and an imbalance index, which measures the speed difference between upstream and downstream road links. By monitoring the indices,

abnormal traffic conditions can be detected and alerted for attention in real time.
Navigation data is in the hands of the leading map/transport service providers in China, for example, AutoNavi (Alibaba), Baidu, Tencent, and Didi. All of these companies have participated in smart mobility initiatives. For example, in the City Brain project, AutoNavia provides near real-time traffic data services (traffic parameters are updated at two-minute intervals) for traffic sensing and optimization purposes.
Another important type of data is CCTV. Though not quite new as a data source, it is currently undergoing an AI transformation. There are tens of thousands of CCTVs deployed in each big city in China for traffic surveillance. Traditionally, they only perform snapshot capturing rather than analytical jobs. It still requires human intervention to review and interpret current traffic conditions. It is a very tedious and error-prone job and becomes even more challenging as the number of CCTVs continues to increase.
Thanks to advancements in computer vision technology and elastic cloud computing, it is possible to empower computers to do the more analytical work usually handled by humans.2 Alibaba’s City Brain project adopted a cloud-based architecture to stream the large volume of video data into the cloud, where it is processed in parallel to generate structured results such as speed/volume, queue length, and incidents. The results can be used in three ways: data fusion and cross-validation with other data sources, for example, speed/volume; input for other applications, for example, queue length can be used by traffic signal optimization; and, incident detection/tracking that greatly improves the efficiency of traffic surveillance and emancipates human labor (see Figure 1).
Figure 1. Incident detection and tracking in Hangzhou city brain. The system is able to recognize more than 10 types of incidents and track the related vehicles crossing multiple cameras.
[Figure 1 panels: Raw Video Data, Online Analysis, Object Tracking; labels include Feature Tracking, Spherical Camera, Traffic Camera, Traffic Incident, Traffic Optimization, and match scores of 95%, 68%, 53%, and 41%.]
AI-based CCTVs can also be used in many other scenarios such as security and policing. For example, if a car runs from the accident scene, the cameras can collaboratively track it in real time; face recognition technology can be used for CCTV data: even though a human face is typically a very small and vague camera image difficult for humans to recognize, the computers can still achieve a stronger view. Many unicorn start-up companies have emerged in this area, such as SenseTime,b Hikvision,c Dahua.d
About 80% of serious traffic congestion is caused by accidents. Quick response to accidents and anomalies is a very effective way to improve traffic conditions. Moreover, once multiple data sources are integrated to generate a holistic view of the city traffic, it not only improves the performance of the traffic management, but also enables further applications that can systematically use this data to optimize the city transport.
a www.amap.com. AutoNavi is one of the largest web mapping, navigation and location based services providers, founded in 2001 and acquired by Alibaba Group in 2014.
b https://www.sensetime.com/
c http://www.hikvision.com/cn/
d https://www.dahuasecurity.com/

Self-Adaptive Traffic Signal Optimization
The traffic signal is one of the most important means of directing city traffic. The dominant traffic signal systems run at a cycle/split-based scheme, that is, each split corresponds to a phase within which only certain directions are allowed. Then the problem is how much (green) time to allocate to each phase. Existing signal systems either run at fixed timing schedules, or rely on the fixed loop detectors to do self-adaptive scheduling. However, loop detectors or speed cameras are fixed at certain locations and the data is considered nearsighted, while the root cause of a traffic jam might originate from a long distance. As holistic traffic sensing is now possible by fusing multiple data sources, there are a few new ideas for optimizing traffic signals.
Figure 2. Intersection with four entrances.
[Figure 2 labels: North, South, East, West; Normal, Congestion; upstream link of traffic flow i; downstream link of traffic flow i.]
The first idea is to balance the traffic condition of upstream and downstream road. As depicted in Figure 2, for each driving direction, for example, east to north, there is an upstream link and a downstream link. The imbalance value of each driving direction is defined as the difference of the normalized speed (actual speed over free speed) from upstream and downstream road links.
$$d_i^t = \frac{v_{u_i}^t}{\hat{s}_{u_i}} - \frac{v_{d_i}^t}{\hat{s}_{d_i}}, \qquad \Delta_i^t = \beta d_i^t$$
$$\arg\min_{\delta} \sum_{i=1}^{n} \Big( w_i^t \Big( \Delta_i^t - \sum_{j \in s_i} \delta_j^t \Big) \Big)^2, \qquad A \le \sum_{j=1}^{m} \delta_j^t \le B \tag{1}$$
Given there are m signal phases, n driving directions, and the many-to-many relations between them; Equation 1 specifies the objective function for the optimization. The goal is to find the best incremental green time allocation for the m signal phases $\delta_j$, $j \in \{1,2,\ldots,m\}$. $v_{u_i}^t$

and $v_{d_i}^t$ are respectively the actual speed simple idea is to rank the adjacent traffic calls.f In Singapore, in 87.1% of cases,
of the upstream and downstream roads signal pairs by their connectivities and an EV arrives within 11 minutes.g As the
of driving direction i at time t, $\hat{s}_{u_i}$ and $\hat{s}_{d_i}$ set appropriate phase differences one by last few years have seen rapid urbaniza-
are the free speeds; $d_i^t$ is the imbalance one; for the intergroup coordination, we tion in China, the demand for faster EV
value for driving direction i at time t, and focus on the traffic signals on the bound- response times continues to rise.
$\Delta_i^t$ represents the expected incremental ary, and take into consideration the over- Functionally, there are two basic ap-
green time for direction i where $\beta$ is a all traffic conditions in each group. proaches to reducing response time:
hyperparameter(s). The objective is to More and more cities in China have optimize the route for EVs to avoid traf-
minimize the total (volume-weighted benefitted from such efforts including fic, obstacles, and any other risks; and,
sum) difference of the real allocation Hangzhou, Suzhou, Guangzhou, Shang- preempt traffic signal systems to allow
and the expected allocation of all driv- hai, and Wuhan. Take Hangzhou as an EVs to pass swiftly through intersec-
ing directions, where $w_i^t$ is the volume example: its City Brain system is process- tions. Both approaches still remain chal-
of direction i at time t, and $s_i$ is the set ing a large volume of data, including one lenging: the estimated time of arrival
of all related phases of direction i. Final- million+ trajectories, 2,000+ cameras’ (ETA) used by routing algorithms can be
ly, A and B are used to ensure the total video streams and many other tradition- delayed by ever-changing traffic condi-
cycle time of the new schedule is within al sensor data. It reports around 2,500 tions, and the signal preemption must
an acceptable range, which are also events daily with 95% accuracy. The av- be dynamic and precise according to the
hyperparameter(s) in signal systems. erage travel time of all trips in the city is traffic conditions to avoid negative im-
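The objective of Equation 1 can be made concrete with a small sketch. It assumes a projected-gradient solver, which is one of many ways to solve this constrained least-squares problem and is not necessarily what City Brain uses. The function names, the sample speeds, and the value and sign of β are illustrative assumptions.

```python
from typing import List, Sequence, Set


def imbalance(v_up: float, s_up: float, v_down: float, s_down: float) -> float:
    """Imbalance value d = v_up / s_up - v_down / s_down (normalized speeds)."""
    return v_up / s_up - v_down / s_down


def project_to_slab(delta: Sequence[float], lo: float, hi: float) -> List[float]:
    """Euclidean projection onto {x : lo <= sum(x) <= hi}."""
    total = sum(delta)
    if total < lo:
        shift = (lo - total) / len(delta)
    elif total > hi:
        shift = (hi - total) / len(delta)
    else:
        return list(delta)
    return [x + shift for x in delta]


def allocate_green(expected: Sequence[float], weights: Sequence[float],
                   phases_of: Sequence[Set[int]], m: int,
                   lo: float, hi: float, iters: int = 500) -> List[float]:
    """Minimize sum_i (w_i * (expected_i - sum_{j in s_i} delta_j))^2
    subject to lo <= sum_j delta_j <= hi, by projected gradient descent."""
    delta = project_to_slab([0.0] * m, lo, hi)
    # Upper bound on the largest Hessian eigenvalue, used as a safe step size.
    lipschitz = 2.0 * sum(w * w * len(s) for w, s in zip(weights, phases_of))
    if lipschitz == 0.0:
        return delta
    step = 1.0 / lipschitz
    for _ in range(iters):
        grad = [0.0] * m
        for exp_i, w, s in zip(expected, weights, phases_of):
            residual = exp_i - sum(delta[j] for j in s)
            for j in s:
                grad[j] -= 2.0 * w * w * residual
        moved = [d - step * g for d, g in zip(delta, grad)]
        delta = project_to_slab(moved, lo, hi)
    return delta


# Hypothetical intersection: two directions, each served by its own phase.
beta = 10.0  # assumed hyperparameter value
d = [imbalance(30, 60, 60, 60), imbalance(45, 50, 30, 50)]
expected = [beta * x for x in d]
print(allocate_green(expected, [1.0, 1.0], [{0}, {1}], m=2, lo=0.0, hi=0.0))
```

Setting lo and hi to the same value keeps the total cycle time fixed, so green time is only moved between phases rather than added.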
Another idea is based on the parti- reduced by 15.3%. pact on the overall traffic flow.
tion-and-conquer paradigm that applies The time-dependent vehicle-routing
to large-scale optimization, for example, On-Demand Greenwave problem (TDVRP) has long been re-
a city or district. One of its applications for Emergency Vehicles searched.4,6 Traditional time-varying path
is the so-called greenwave, which means The response time of emergency vehi- searching algorithms are too optimistic:
multiple traffic signals are coordinated cles (EV) is critical to saving lives. Gov- vehicles are expected to drive exactly at
to reduce the number of stops. The co- ernments across the globe set ambitious the predicted speed. In reality, the ac-
ordination is achieved by setting ap- response time targets. The National tual travel time at each individual road
propriate phase difference for two traf- Health Service (NHS) of the U.K. set a link can slightly vary from expected
fic signals with the same cycle time, target of eight minutes for most serious
so that a vehicle traveling at normal medical calls.e New York City mandates a
10-minute response time on emergency f http://www.nytimes.com/1990/03/25/nyre-
speed can drive through the next traf-
gion/new-ems-response-time.html
fic signal without stop. Greenwave g https://www.scdf.gov.sg/sites/
is normally applied to arterial roads e https://www.nao.org.uk/wp-content/up- www.scdf.gov.sg/files/EMS%20Stats%202016.
where there is a large volume of traffic loads/2017/01/NHS-Ambulance-Services.pdf pdf
crossing consecutive traffic signals.
The key of conducting greenwave is to Figure 3. Enhancing traffic signals.
identify the route that can maximize the In Huangpu district of Shanghai city, 188 traffic signals are partitioned to
performance gain (normally the route 15 groups based on their connectivity—the traffic volume that connects one
with the maximal volume), which can be traffic signal with another. Each traffic signal junction is sized by its traffic
volume and rendered by its group color. This process helps traffic engineers
identified from trajectory data. to optimize traffic signals more efficiently.
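The phase difference behind a greenwave can be illustrated with a short calculation. This is a minimal sketch assuming a single constant cruising speed along the corridor and a shared cycle time; the function name and the sample distances are hypothetical.

```python
from typing import List, Sequence


def greenwave_offsets(distances_m: Sequence[float], speed_mps: float,
                      cycle_s: float) -> List[float]:
    """Green-start offset of each signal relative to the first signal.

    distances_m[k] is the distance from signal k to signal k + 1. A vehicle
    leaving the first signal at the start of green reaches each downstream
    signal after the cumulative travel time, so that signal's green should
    start at that time modulo the common cycle.
    """
    offsets = [0.0]
    travel_s = 0.0
    for distance in distances_m:
        travel_s += distance / speed_mps
        offsets.append(travel_s % cycle_s)
    return offsets


# Three signals spaced 500 m and 600 m apart, 10 m/s cruising speed, 90 s cycle.
print(greenwave_offsets([500, 600], speed_mps=10.0, cycle_s=90.0))
```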
The underlying philosophy of the ar-
terial road greenwave is its portability
to any randomly shaped area. Figure 3
shows the result of the traffic signal par-
tition in Huangpu district of Shanghai
city. Navigation trajectories can be used
to derive the volume data between each
pair of traffic signals, which are used as
the input of the network partition algo-
rithms.1,5 The result is a good suggestion
for further optimizations. For example,
for the in-group coordination, a very
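To illustrate how pairwise volumes feed a network partition, the sketch below scores a candidate grouping with weighted modularity, the quantity that community-detection methods such as the one in reference 1 try to maximize. It only evaluates a given partition and does not search for one. The toy graph and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def modularity(volume: Dict[Tuple[Hashable, Hashable], float],
               group_of: Dict[Hashable, Hashable]) -> float:
    """Weighted modularity of a partition of an undirected graph.

    volume maps each signal pair (a, b) to the traffic volume between them;
    group_of maps each signal to its group label.
    """
    total = sum(volume.values())
    internal = defaultdict(float)  # volume on edges inside each group
    degree = defaultdict(float)    # sum of weighted degrees in each group
    for (a, b), w in volume.items():
        degree[group_of[a]] += w
        degree[group_of[b]] += w
        if group_of[a] == group_of[b]:
            internal[group_of[a]] += w
    return sum(internal[g] / total - (degree[g] / (2.0 * total)) ** 2
               for g in degree)


# Two tightly connected triangles of signals joined by one link.
volume = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
          (3, 4): 1, (4, 5): 1, (3, 5): 1,
          (2, 3): 1}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
print(modularity(volume, two_groups), modularity(volume, one_group))
```

A higher score for the two-group split indicates that most volume stays inside each group, which is the property that makes in-group coordination effective.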
Reduction of response time from the field test in Hangzhou City’s Xiaoshan District.

Time          Normal (s)   Optimized (s)   Gain (%)
9:00–10:00    150          101             32.67
10:00–16:30   150          96              36.33
16:30–18:00   2017         154             25.85
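The Gain column appears to be the relative reduction in response time, (Normal − Optimized) / Normal. The sketch below checks that reading against the table; both helper names are hypothetical. The first row reproduces exactly. The second gives 36.00% from the rounded values shown, against a printed 36.33%. For the third row, a 25.85% gain with 154 s optimized would correspond to a normal time of roughly 207.7 s, so the printed value of 2017 may be a typesetting slip; this is an inference from the arithmetic, not a confirmed correction.

```python
def gain_percent(normal_s: float, optimized_s: float) -> float:
    """Relative reduction of response time, in percent."""
    return (normal_s - optimized_s) / normal_s * 100.0


def implied_normal(optimized_s: float, gain_pct: float) -> float:
    """Normal response time implied by an optimized time and a stated gain."""
    return optimized_s / (1.0 - gain_pct / 100.0)


for normal, optimized in [(150, 101), (150, 96)]:
    print(normal, optimized, round(gain_percent(normal, optimized), 2))
print(round(implied_normal(154, 25.85), 1))
```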
Figure 4. Exemplar speed observation generated by AutoNavi. The three curves represent respectively the mean speed (Red), mean speed +3 standard deviation (SD) (Blue), mean speed -3 SD (Green).
[Figure 4 plot: Travel Speed (km/h), 0 to 100, against Time of Day, 16:00 to 17:00; series: Average Speed, Upper Speed, Lower Speed.]
The residual queue length is defined as the length of the queue of vehicles that fail to pass the junction in one cycle. Video analytics is one way to detect the queue length, and trajectory data can also be used to estimate the queue length where cameras are missing. Once the queue length is determined, the control system can gradually allocate extra green time to clear the queue before the EV arrives.
values. As illustrated in Figure 4, the speed prediction has a significant variance, which indicates a variable speed for actual driving. Cumulatively, this can lead to a large difference between the ETA and the actual arrival time. The higher the variance between arrival and ETA, the higher the risk for real people in critical conditions. Therefore, the question is: How best to plan a route that is fast and robust on ETA?
An improved route-searching algorithm can answer that question. Instead of trying to minimize merely the overall travel time, the variance of ETA is also taken into consideration.
(2)
Equation 2 is the revised objective function for selecting a path j from the totally N candidates to minimize the weighted sum of mean (α) and standard deviation (σ) of travel time. A path $p_j$ is represented by a sequence of nodes $\{v_1^j, v_2^j, \ldots, v_{n_j}^j\}$ where $n_j$ is the number of nodes for path $p_j$, and α is the weight, which is a hyperparameter. $t_i^j$ is the arriving time at the ith node of path j, and thus $t_{n_j}^j$ is the ending time of path $p_j$.
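A form consistent with this description of Equation 2 is given below. It is an assumed reconstruction, not necessarily the authors' exact expression; in particular, placing the weight α on the standard-deviation term is an assumption.

```latex
j^{*} = \arg\min_{j \in \{1,\ldots,N\}}
        \Big( \mathbb{E}\big[t_{n_j}^{j}\big]
        + \alpha \, \sigma\big(t_{n_j}^{j}\big) \Big)
```

With α = 0 this reduces to the usual fastest-expected-route criterion; larger α trades expected travel time for a more predictable arrival.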
The key to solving the equation is To minimize the negative impact in-
to calculate the distribution of ending troduced by the signal preemption, the
time. Let us assume a simple case: Trav- algorithm dynamically searches for an op-
eling from node A to B: given the arriv- timized solution that balances the over-
ing time distribution at A, a time-varying all green time allocated to each phase,
speed function on edge AB, and a ran- rather than simply dwelling on the tar-
dom perturbation imposed on the speed get phase and causing problems to other
(a speed offset follows a normal distri- directions. This problem is modeled as
bution with mean 0), to compute the ar- a mixed integer programming problem,
riving time distribution at node B. Once which aims at smoothing the change of
this is solved, the whole searching algo- signal scheduling by starting the pre-
rithm can use it in an iterative way, that emption as early as the ETA’s variance is
is, from $t_i^j$ to $t_{n_j}^j$ where $t_i^j$ is a fixed value. limited to a certain range.
This problem can be modeled as a con- Our test in Xiaoshan District of
tinuous Markov process. Hangzhou city has shown a significant
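The edge-by-edge propagation of the arrival-time distribution can be approximated by sampling. The sketch below swaps in plain Monte Carlo simulation for the continuous Markov model named in the text, and scores each candidate route by mean plus α times standard deviation of arrival time. The route data, the 1 m/s speed floor, and all function names are assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Edge = Tuple[float, Callable[[float], float]]  # (length in m, mean speed at time t)


def simulate_arrivals(edges: Sequence[Edge], start_s: float, sigma_mps: float,
                      samples: int, rng: random.Random) -> List[float]:
    """Sample arrival times at the end of a path, one edge after another."""
    arrivals = []
    for _ in range(samples):
        t = start_s
        for length_m, speed_fn in edges:
            # Zero-mean normal perturbation on the time-varying speed;
            # the floor keeps the sampled speed positive.
            speed = max(speed_fn(t) + rng.gauss(0.0, sigma_mps), 1.0)
            t += length_m / speed
        arrivals.append(t)
    return arrivals


def pick_route(routes: Dict[str, Tuple[Sequence[Edge], float]], alpha: float,
               start_s: float = 0.0, samples: int = 2000, seed: int = 0) -> str:
    """Return the route name minimizing mean + alpha * std of arrival time."""
    rng = random.Random(seed)
    best_name, best_score = "", float("inf")
    for name, (edges, sigma) in routes.items():
        arrivals = simulate_arrivals(edges, start_s, sigma, samples, rng)
        score = statistics.mean(arrivals) + alpha * statistics.pstdev(arrivals)
        if score < best_score:
            best_name, best_score = name, score
    return best_name


def steady_speed(t: float) -> float:
    """Hypothetical speed profile: 10 m/s in the first hour, 8 m/s afterward."""
    return 10.0 if t < 3600 else 8.0


routes = {
    "steady": ([(500, steady_speed), (500, steady_speed)], 0.0),
    "short_but_noisy": ([(475, steady_speed), (475, steady_speed)], 1.0),
}
print(pick_route(routes, alpha=0.0), pick_route(routes, alpha=1.0))
```

In this toy setup the shorter route wins when only the mean matters, while weighting the spread favors the longer but predictable route.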
As the EV travels along the planned improvement in EV travel times, as il-
route, it constantly communicates with lustrated in the accompanying table.
the control center and shares its location This test is conducted on a route from
and speed (by GPS devices). The control (30.138384 120.280503)(lat/lon) to
center fuses the real-time feedbacks with (30.186592 120.266079) where there are
the historic data to predict the ETA at the 19 traffic signals.
next traffic signal junctions, and inform
the signal control system to prioritize the Tourism Recommendation to
EV’s driving direction. The key challenge Solve the Holiday Crowdedness
to this task is twofold: How to determine During public holidays in China, popu-
the most appropriate timing to start the lar tourist cities are flooded with large
green signal to clear the residual vehicle numbers of visitors that can swell to
queue before the EV arrives; and, how to several times the number of residents.
minimize the impact on opposite driv- Increased needs for accommodation,
ing directions. food, and entertainment exert extreme

pressure on the local environment and However, online tourism products users and products, and combine it with
public services, especially for transpor- are very different from regular commod- the classic matrix factorization. The la-
tation. Take the China National Day (an ities due to several factors, including: tent variables learned from the two mod-
annual weeklong holiday beginning Oc- holiday travel is a low-frequency event, els are used to fit the product-scoring ta-
tober 1) for example: In 2015, visitors to most people travel only 1–2 times per ble that is initialized by users’ feedbacks.
Huangshan mountain spent nine hours year; and, numerous travel packages gen- Moreover, the overall loss function is a
on average waiting in line. In 2016, more erate different combinations of trans- linear combination of two models’ loss
than 25 million tourists visited Chongq- port means, restaurants, and hotels. functions. Lastly, a text-generation AI
ing city—a western metropolitan city Thus, most travel products have very few component will creatively generate po-
whose residential population is 30 mil- or even zero customers, and it is very dif- etry to characterize the recommended
lion. In 2017, the traffic congestion on ficult to simply apply traditional recom- attraction and push to users. The test
the Hukun expressway stretched to a mendation algorithms to this scenario. has shown the algorithm performs bet-
maximum of 49.73 km. Such over- Ctrip’s solution for recommenda- ter than traditional ones for the sparse
crowded populations, as we know, lead tion is twofold: user-profiling based on data and cold start scenarios.
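The classic matrix factorization half of the hybrid model can be sketched in a few lines. This covers only the factorization, trained with stochastic gradient descent on a tiny hypothetical score table; the aSDAE side of reference 3, and the linear combination of the two loss functions, are not reproduced here. All names and hyperparameter values are assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user, product) -> score


def train_mf(ratings: Ratings, n_users: int, n_items: int, k: int = 2,
             lr: float = 0.03, reg: float = 0.01, epochs: int = 2000,
             seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Learn user factors P and product factors Q so that P[u] . Q[i] ~ score."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), score in ratings.items():
            err = score - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings: Ratings, P: List[List[float]], Q: List[List[float]]) -> float:
    """Root-mean-square error over the observed scores."""
    sq = [(s - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
          for (u, i), s in ratings.items()]
    return math.sqrt(sum(sq) / len(sq))


# Sparse hypothetical feedback: 3 users, 3 travel products, 6 observed scores.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}
P, Q = train_mf(ratings, n_users=3, n_items=3)
print(round(rmse(ratings, P, Q), 3))
```

In a hybrid setup, latent vectors learned from user and product attributes would supplement P and Q, which is what helps when most products have few or no scores.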
to problems like pollution, congestion, its big tourism data accumulated over The system has been deployed to
and loss of open spaces, and causes in- the last 18 years, and developing a hy- governments such as Henan province,
convenience and negative experiences brid collaborative filtering model that Guiyang City, and many others. In the
for both tourists and local residents. specifically targets the sparse data and Henan province, for example, the rec-
As the largest online travel agency in cold-start problem. ommendation system was deployed last
China, Ctrip discovered an insight from Figure 5 is the user preference tree March. According to Ctrip’s online trav-
its big data—that is, there is an imbal- built from historical travel data. The el booking data, during the Labor Day
anced situation between the distribution short-term profile has the same struc- holiday (a period of three days starting
of tourists and the collective capacity of ture of the long-term one, but is limited around May 1), the total number of tour-
attractions. The agency envisioned that to the latest 30 days’ data. The system ists in the Henan province reached 2.04
a good recommendation system could can quickly iterate the tree and generate million, which is a 41.5% increase from
help divert tourists to less-crowded at- a preference vector for a user, as the in- Labor Day in 2017. To evaluate the effect
tractions to resolve the problem. The put for the recommendation system. of its recommendation system, the tour-
basic idea is to build a tourist prediction The key to the enhanced recommen- ist distribution over 18 areas throughout
component, and once an attraction is dation algorithm is the so-called Addi- the province is calculated. The stan-
predicted as overcrowded, a recommen- tional Stacked Denoising Autoencoder dard deviation (SD) of the distribution
dation component will be triggered to try (aSDAE),3 which employs the deep learn- is 234,355 in 2017 and 202,208 in 2018.
to divert tourists to other places. ing model to learn the latent variables of The SD decrease suggests a more bal-
anced experience visiting the province’s
Figure 5. User preference tree built from Ctrip’s big tourism data.
many attractions, which benefits both tourists and local residents.
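The balance measure used in this evaluation can be sketched as follows. The per-area visitor counts are hypothetical, since the article reports only the two SD values, and it is an assumption that a population (rather than sample) standard deviation was used.

```python
import statistics
from typing import Sequence


def spread(visitors_per_area: Sequence[float]) -> float:
    """Population standard deviation of visitor counts across areas."""
    return statistics.pstdev(visitors_per_area)


def relative_drop(before: float, after: float) -> float:
    """Percentage decrease from before to after."""
    return (before - after) / before * 100.0


# Hypothetical counts for four areas with the same total in both cases.
concentrated = [900_000, 600_000, 300_000, 240_000]
diverted = [700_000, 600_000, 420_000, 320_000]
print(spread(concentrated), spread(diverted))

# Applied to the SD values reported for the 18 areas in Henan province.
print(round(relative_drop(234_355, 202_208), 1))
```

The reported figures correspond to a drop of about 13.7% in the spread of tourists across areas.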
[Figure 5 labels: Userid; short-term preferences (based on user’s recent 30 days data); long-term profile (based on user’s total data); Country A, City B, City C, POI; tourism theme, hotel star preference, transportation preference.]

References
1. Blondel, V.D., Guillaume, J-L., Lambiotte, R., and Lefebvre, E. Fast unfolding of communities in large networks. J. Statistical Mechanics: Theory and Experiment 10 (2008), P10008.
2. Chu, W., Liu, Y., Shen, C., Cai, D., and Hua, X-S. Multi-task vehicle detection with region-of-interest voting. IEEE Trans. Image Processing 27, 1 (2018), 432–441.
3. Dong, X., Yu, L., Wu, Z., Sun, Y., Yuan, L., and Zhang, F. A hybrid collaborative filtering model with deep structure for recommender systems. In Proceedings of AAAI (2017), 1309–1315.
4. Gao, S. and Chabini, I. Optimal routing policy problems in stochastic time-dependent networks. Transportation Research Part B: Methodological 40, 2 (2006), 93–122.
5. Lambiotte, R., Delvenne, J-C., and Barahona, M. Laplacian dynamics and multiscale modular structure in networks. arXiv preprint arXiv:0812.1770 (2008).
6. Malandraki, C. and Daskin, M.S. Time dependent vehicle routing problems: Formulations, properties and heuristic algorithms. Transportation Science 26, 3 (1992), 185–200.

Wanli Min is Chief Data Scientist and Senior Director at Alibaba Cloud Computing in Hangzhou.
Liang Yu is Senior Data Scientist at Alibaba Cloud Computing in Hangzhou.
Lei Yu is head of the AI Department at Ctrip in Shanghai.
Shubo He is manager of the AI Department at Ctrip in Shanghai.
© Copyright 2018 ACM 0001-0782/18/11 $15.00.
china region
DOI:10.1145/3239548
BY HAI JIN/HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY, HAIBO CHEN/SHANGHAI JIAO TONG UNIVERSITY, HONG GAO/HARBIN INSTITUTE OF TECHNOLOGY, XIANG-YANG LI/UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA, SONG WU/HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY

Cloud Bursting for the World’s Largest Consumer Market

IMAGE BY VALERY BROZHINSKY

CLOUD INFRASTRUCTURE IS information technology consisting of various hardware resources and software technologies. It enables ubiquitous access to shared pools of configurable system resources and higher-level services that can be delivered with minimal management effort, often through the Internet. Cloud infrastructure today is a critical platform for many applications, providing basic support for the development of emerging areas, including big data, the Internet of Things (IoT), and artificial intelligence (AI). In 2016, International Data Corporation’s Cloud Computing Survey reported that cloud technology is becoming a staple of organization infrastructure, as 70% of organizations have at least one application in the cloud, and 56% of organizations are still identifying IT operations as candidates for cloud hosting.1 In 2017, IDC predicted that by 2021, spending on cloud infrastructure and cloud-supported hardware, software, and services would double to more than $530 billion.2

For China, the overall market for cloud computing in 2016 was 51.49 billion RMB (China’s currency), with an overall annual growth rate of 35.9% in 2016, which was significantly greater than the global average. It is expected that the China cloud computing market will continue to grow significantly over the next two years, reaching 136.6 billion RMB by 2020.3

The development and popularization of cloud computing, especially in emerging domains, brings great convenience. It also poses new challenges involving design and construction of modern cloud infrastructure. Cloud computing in China is also quite different from that in other countries, as it includes special requirements for the related infrastructure.

Here, we first explore the background of cloud infrastructure in China, then turn to its characteristics, and conclude with an outlook on development.

Background
Greatest number of netizens and IT employees. As of December 2017, the number of netizens in China was 772 million, or 55.8% of the total Chinese population; the number of online shoppers was 533 million, with annual growth of 14.3% over 2016; and the number of online payers was 531 million, including 527 million who pay through their smartphones.4 As of January 2018, the total number of employees in computer, communications, and other electronic-equipment-manufacturing industries in China was 8.264 million.5 Development of the cloud infrastructure in China faces both challenges and opportunities in meeting the needs of such a large number of users and developers.

Best cellular infrastructure. The cellular infrastructure in China facilitates development of cloud infrastructure and services. As of September 2017, the total number of base stations in China was 6.04 million, including 4.47 million 3G and 4G base stations.24 The base stations covered over 95% of the country, including even small villages with only dozens of residents. Moreover, as of December 2017, the number of smartphone netizens in China was 753 million and rising, with an annual growth rate of 8.2% over 2016, reflecting deep penetration of the mobile Internet.25 As of November 2017, there were 3.91 million active smartphone apps available to Chinese consumers, along with more than 2.24 million local third-party apps, surpassing the number of apps (1.78 million) provided by Apple’s app store in China.4 Some unicorn apps, including the Chinese taxi-hailing service Didi Taxi, create demand for unprecedented computing power.

Greatest demand for applications. With regard to construction of the cloud infrastructure, the most significant difference between China and other countries is that China has the largest and fastest-growing application demand due to having the greatest number of users. For example, the number of WeChat “HongBao”a sending and receiving activities on Chinese New Year’s Eve in 2014 was 16 million, followed by a dramatic increase to 14.2 billion in 2017.6 There were 46 billion “HongBao” activities over six days during the 2017 Spring Festival, a 43.3% increase over 2016.7 Another example of dramatic growth is the official website of the China Railway Corporation, “12306,” which averaged 55.67 billion page views per day, rising to 81.34 billion during the peak period; more than 10.2 million train tickets were sold through the site on January 11, 2018.8 Moreover, on the 2017 “11.11”b sales day, peak database throughput for Alibaba was 472 million per second and peak transaction traffic was 256,000 per second, a 2.1× increase over 2016.9

a HongBao is a red envelope with money inside traditionally given by older people to young people as a gift during important festivals; it went digital via WeChat and Alipay.
b 11.11 is an online shopping carnival (such as Taobao and TMall) run annually by Alibaba on November 11.

Innovation in the mobile Internet industry. China has seen rapid innovation in information industries and services in recent years. In terms of smart transportation, Didi Taxi, for example, had 450 million users and 20 billion daily planning requests and processed 4,500TB of data per day in 2017. By accessing information from traffic cameras on smart traffic lights and applying cloud and big data analysis, Didi Taxi was able to improve its activity by 10%–20% in 2017.10 Meanwhile, the number of active users of shared-bike applications was 221 million, or 28.6% of all Chinese netizens, as of February 2017.4

In digital gaming, as of December 2017, market penetration of mobile games in China was 76.1%, with each game user installing 3.35 mobile game apps on average.11 In May 2017, more than 200 million people were playing “King Glory,” a popular mobile game in China, with 54.1 million playing it daily.12 The Chinese Electronic Commerce Research Center reported that during the “11.11” Shopping Festival of 2017, the total online trading volume in China was 253.97 billion RMB, a 45% increase over the same period in 2016. Moreover, major e-commerce enterprises reported 850 million express logistics orders on November 11, 2017, with courier volume during this “11.11” Shopping Festival of more than 1.5 billion, both historic peak levels.13 The data shows the cloud infrastructure in China is able to serve billions of users concurrently.

Characteristics
Cloud infrastructures worldwide share certain characteristics, including resource-on-demand, elasticity, and geo-distribution. Due to the deep penetration of the mobile Internet and proliferation of mobile apps in China, China’s cloud infrastructure

largely centers on the app ecosystem, as characterized in the following ways:

(Super) app-oriented infrastructure. Consider the two largest cloud service providers in China: Alibaba Cloud and Tencent Cloud. Alibaba Cloud originally sought to address the need for massive computing resources for the “11.11” Shopping Festival, with 90% of sales from mobile apps like Taobao and TMall. Tencent Cloud sought to address the computing infrastructure for Tencent’s flagship mobile apps: QQ and WeChat. Its cloud infrastructure thus centered on such super apps, serving hundreds of millions of people daily. The cloud infrastructure cannot be developed simply by reusing open source cloud stacks like OpenStack due to the need for unparalleled concurrency from such apps. To this end, major cloud service providers in China tend to build their own infrastructures, with heavy optimizations tailored for their super apps.

WeChat’s infrastructure evolution. Here, we consider the evolution of WeChat as an example of how Tencent builds and customizes its cloud infrastructure.18 WeChat’s initial goal was to develop a message-oriented chat app, with messages synchronized between a sender and its receivers. China is thus the key market worldwide. To satisfy this extremely large-scale requirement, Tencent customized its own Remote Procedure Call library (called Svrkit), an infrastructure pillar for connected distributed planetary services for WeChat. A key design decision in such synchronization was how to propagate messages. Many such designs have adopted the “pull” mode, whereby receivers pull messages from a sender. In contrast, WeChat adopted a “push” propagation mode, because there was a strict upper limit on chat group size, originally 20, later increased to 100 and today 500. The cost for push propagation is therefore bounded, and the receiver, or each WeChat app, can deliver much lower latency. Due to having to serve a rapidly growing user population, WeChat initially adopted a micro-serviced architecture, including aggressive division of the functionalities of the business representation layer into multiple logic servers (logicSrv); even the same functionality with different priority is divided. For example, message-delivery functionality is divided into three service modules (message synchronization, voice and text message sending, and figure and video sending), allowing scale-out as the number of users increases.

With its customized infrastructure, WeChat grew from a chat app to a massive digital payment system (“HongBao” can be viewed as a special kind of payment), an enterprise business platform, and a development-and-delivery platform (WeChat Mini apps). With its developing ecosystem, the WeChat platform could not only become a cloud platform itself but also attract a large number of third-party apps to access its services on Tencent Cloud, providing seamless integration and winning strong user loyalty.

Scalable and hybrid infrastructure for bursty loads. Many super apps exhibit strong bursty loads, especially under the extra demand on special days or during certain seasons. The cloud infrastructure needs to not only be able to scale up easily but also scale in afterward. Unlike Amazon and Google, which mainly deploy services in their own large-scale datacenters, major cloud service providers in China tend to rent existing datacenters to scale out their services and cloud-based systems, in addition to their own datacenters. These providers rent datacenters because a large number of small- to medium-size datacenters were built during the IT revolution of the 2000s but had relatively low utilization. For example, as of 2017, there were more than one million datacenters in China, but most were small, each occupying less than 500 square meters.21 Cloud service providers usually rent datacenters to quickly scale up their services under bursty loads, then scale in to avoid wasting infrastructure cost. To allow quick deployment of services and increase resource utilization, they usually build their own customized infrastructure.

Technologies behind Alibaba’s “11.11” Shopping Festival. The unprecedented peak transactions per second require technologies beyond the off-the-shelf open source stack. Alibaba, for example, uses its own complete customized software stack, from infrastructure software to cloud software.19 Its current world-record TPC-C result is approximately 500,000 transactions per second,22 while it handles up to 42 million operations/second in its database involving 325,000 and 256,000 NewOrder and Payment Transactions per second.c To this end, Alibaba has created its own open-source database (called OceanBase) and distributed file system (called TFS). In order to meet the resource requirements of peak traffic, Ali Cloud is able to expand capacity by 100,000 servers in one hour. To quickly deploy such services, it created Pouch, a customized container framework, and aggressively deployed and scheduled online services with offline services through its Sigma scale-out scheduler. While hybrid cloud has been advocated for years, Alibaba has pioneered seamless integration of its public cloud with the datacenters of its partners to deliver one-stop handling of individual transactions among multiple service providers.

c Note real-world transactions are much more complex than TPC-C, as each user-facing transaction generates a large number of transactions.

Deeply integrated/optimized infrastructure. As with other technology giants such as Amazon, Microsoft, and Google, the growth of cloud scale makes efficiency a top optimization target, as even a single-digit increase in utilization or performance density could save tens of millions of U.S. dollars. This motivates cloud service providers in China to provide deeply integrated and optimized infrastructure.

China’s cloud infrastructure is moving toward hardware/software co-design to improve efficiency and flexibility. Huawei Cloud, another very large cloud service provider in China, publicly released its Service Driven Infrastructure plan in 2014, including software-defined storage and software-defined networking.20 This allowed offloading key processing functionalities to hardware while retaining software flexibility. Ali Cloud recently released its X-Dragon Cloud Server to aggressively offload VM management services, as well as
customized data-processing services, to a tightly coupled physical installation. It also recently announced it would produce its own neural processing units for AI-related tasks. And UCloud, a top-five cloud service provider, announced its release of near-data-processing infrastructure for big-data applications, yielding improved efficiency.

Outlook
China’s cloud infrastructure has made great strides, supporting large-scale applications and millions of users. The rapid development of cloud infrastructure has been promoted both through national research projects and through the corporations involved. The Chinese central and local governments now plan to push development of cloud computing while mainstream enterprises pursue a new round of cloud computing designs.

The government’s plan for developing cloud computing. The Chinese central government is emphasizing development of cloud computing and its underlying infrastructure. For example, the 13th Five-Year Plan identified cloud computing as an important emerging national strategic industry.14 And the Ministry of Industry and Information Technology of China adopted a Three-Year Development Plan for cloud computing, 2017 to 2019, aiming to increase the cloud computing industry in China to 430 billion RMB by 2019.15 The Chinese central government is also funding a series of projects for cloud computing. In 2017, the “Cloud Computing and Big Data” Special Program of the National Key Research and Development Plan launched 15 projects with total funding of 409 million RMB.16 In 2018, it plans to start 20 projects with a total budget up to 625 million RMB.17

Enterprises’ plan for developing cloud computing. Chinese enterprises are developing an increasingly powerful cloud infrastructure to provide competitive cloud computing products and services. For example, Inspur and Sugon launched a series of scientific projects to research key technologies in cloud datacenters and servers. And Alibaba expects to use its cloud unit to carry it through the next decade. According to a Gartner research report in September 2017, Ali Cloud has surpassed Google in IaaS Public Cloud Service and is today the third largest cloud provider in the world.23 In 2015, Tencent adopted its “Cloud Plus” plan to develop Tencent Cloud, which will invest 10 billion RMB to build a cloud platform and ecosystem over the next five years. Meanwhile, Huawei has established a new business group dedicated to developing Huawei Cloud.

Emerging computing paradigms and cloud computing. Information technology is evolving quickly. Emerging computing paradigms like AI, the IoT, and Cloud-Edge computing have begun to influence the cloud infrastructure and offer opportunities for addressing cloud-related challenges. Machine- and deep-learning algorithms and models for AI are relevant for cloud computing researchers and practitioners. On the one hand, the cloud can benefit from machine and deep learning to support smarter resource management. On the other, machine and deep learning require large-scale computing power, and the cloud is an essential platform for hosting AI services due to its potential for high scalability and ready access to computing resources.

With the rapid development of the mobile Internet and IoT applications in China, the existing centralized cloud computing architecture faces significant challenges. Edge computing is being investigated as a way to better exploit capabilities at the edge of the network to support the IoT. In edge computing, the massive amount of data generated by different kinds of IoT devices can be processed at the network edge instead of first being transmitted to the centralized cloud infrastructure, which addresses bandwidth- and energy-consumption concerns. Edge computing can thus provide services with quicker response and greater quality compared to traditional cloud infrastructure and is more suitable for being integrated with IoT to provide more efficient and secure services for a vast number of end users.

Further Reading
1. 2016 IDG Cloud Computing Survey; https://www.idg.com/tools-for-marketers/2016-idg-enterprise-cloud-computing-survey/
2. IDC FutureScape: Worldwide IT Industry 2018 Predictions; https://www.idc.com/getdoc.jsp?containerId=US43171317
3. White Paper on Cloud Computing Development in China (2017); http://www.fx361.com/page/2018/0112/2686558.shtml
4. The 41st China Statistic Report on Internet Development; http://www.cnnic.net.cn/hlwfzyj/hlwxzbg/hlwtjbg/201803/P020180305409870339136.pdf
5. China Entrepreneur Investment Club; https://www.ceicdata.com/zh-hans/china/no-of-employee-by-industry-monthly/no-of-employee-computer-communication--other-electronic-equipment
6. Tencent Tech: HongBao War; http://new.qq.com/omn/20180215/20180215C0EIMO.html
7. 2017 WeChat Spring Festival Data Report; http://tech.qq.com/a/20170203/010341.htm
8. China Railway Site Sees 5.93 Billion Clicks Per Hour as Busiest Travel Season Starts; https://technode.com/2018/01/16/chunyun-data/
9. Alibaba Tech: Fight Peak Data Traffic on 11.11: The Secrets of Alibaba Stream Computing; https://medium.com/@alitech_2017/how-to-cope-with-peak-data-traffic-on-11-11-the-secrets-of-alibaba-stream-computing-17d5e807980c
10. 2017 Annual Urban Transportation Report. DidiChuxing; http://index.caixin.com/upload/didi2017.pdf
11. JiGuang. 2017 Mobile Gaming Market Research Report; https://community.jiguang.cn/t/topic/24810
12. JiGuang. 2017 King Glory Research Report; https://www.jiguang.cn/reports/72
13. 2017 ‘11.11’ E-Commerce Platform Shopping Festival Evaluation Report; http://www.100ec.cn/zt/upload_data/17sh11bg.pdf
14. The development plan of the 13th Five-Year Plan national strategic emerging industry; http://www.gov.cn/zhengce/content/2016-12/19/content_5150090.htm
15. The Three-year Development Plan of Cloud Computing (2017–2019); http://www.miit.gov.cn/n1146290/n4388791/c5570594/content.html
16. 2017 Project List of ‘Cloud Computing and Big Data’ Special Projects in The National Key Research and Development Plan; http://app.myzaker.com/news/article.php?pk=59a4e2d41bc8e03727000029
17. 2018 guide for projects of ‘Cloud Computing and Big Data’ Special Projects in The National Key Research and Development Plan; http://www.stdaily.com/kjzc/top/2017-10/10/content_582554.shtml
18. The evolution of WeChat Infrastructure; http://www.infoq.com/cn/articles/the-road-of-the-growth-weixin-background
19. Techniques behind TMall’s 11.11 Shopping Festival; https://jaq.alibaba.com/community/art/show?articleid=1201
20. Huawei SDI Innovation Architecture; http://www.cnetnews.com.cn/2014/0918/3034037.shtml
21. Analysis of 2017 China Datacenter Sector Development and Evolution; http://www.chyxx.com/industry/201709/564441.html
22. http://www.tpc.org/tpcc/results/tpcc_results.asp
23. IaaS Public Cloud Service Market Share; https://www.channele2e.com/news/gartner-public-cloud-iaas-market-share-amazon-aws-microsoft-azure-google-growth/
24. Number of base stations in China; http://tech.sina.com.cn/roll/2017-10-22/doc-ifymzksi0587142.shtml
25. Data analysis on the number of smartphone users nationwide; http://www.chinabgao.com/k/zhinenshouji/28395.html

Hai Jin is the Cheung Kung Scholar Chair Professor at Huazhong University of Science and Technology, Wuhan.
Haibo Chen is a professor and Director of the Institute of Parallel and Distributed Systems at Shanghai Jiao Tong University, Shanghai.
Hong Gao is a professor at Harbin Institute of Technology, Harbin.
Xiang-Yang Li is a professor and Executive Dean of the School of Computer Science and Technology at the University of Science and Technology of China, Hefei.
Song Wu is a professor and Director of the Institute of Parallel and Distributed Computing at Huazhong University of Science and Technology, Wuhan.

© 2018 ACM 0001-0782/18/11 $15.00.

DOI:10.1145/3239550
BY YUAN QI/ANT FINANCIAL, JING XIAO/PING AN TECHNOLOGY (SHENZHEN) CO., LTD.

Fintech: AI Powers Financial Services to Improve People’s Lives

FINANCIAL TECHNOLOGY, ALSO known as fintech, is a fast-evolving field that has reshaped the financial industry. Ant Financial has redefined digital financial services, specifically mobile payment and microloan services, and Ping An Technology has developed innovative fintech to reshape the insurance, investment, and banking businesses.

Computing technologies play an important role in the transformation of modern financial services. Ant Financial, an affiliate of Alibaba and a leading Chinese fintech company, has participated in this transformation by using technology to bring financial services to hundreds of millions of individuals and small businesses in China and throughout the world. Two examples demonstrate its impact.

Ci Ren Ge Dan (Figure 1) runs a tent store at the foot of Mount Everest, 5,200 meters above sea level. He used to take half a day to go to the nearest bank. Carrying and keeping a lot of cash was inconvenient. Now he is one of many small merchants
served by Ant Financial. Last year, Ci Ren Ge Dan added QR codes to items in his store, which allows tourists to pay him using their cellphones. He can also use his phone to pay electricity bills, deposit money, and acquire funds without leaving his store.

Zhang Yousheng (Figure 2) is a herdsman who has raised cattle for decades. In the past, he worried about having the funds he needed to buy calves and fodder, and about selling his cattle. After Ant Financial partnered with a cattle industry company to provide low-cost microloans, Zhang said he no longer worries about funds and sales. His life as a herdsman is easier. Ant Financial’s services have helped tens of millions of small and micro merchants in China, from prosperous cities to remote rural areas. The technologies behind these stories are a series of innovations that make financial services more accessible and affordable to everyone.

Figure 1. Ci Ren Ge Dan uses Ant Financial’s services to receive and make payments for the tent store he operates at the foot of Mount Everest, 5,200 meters above sea level.

Figure 2. Zhang Yousheng, a herdsman, uses Ant Financial microloans to purchase calves and fodder.

A Level of Trust
The innovations behind Ci Ren Ge Dan’s story are payment technologies. Ant Financial started an escrow payment service 14 years ago, which held shoppers’ payments until merchants delivered purchased items, providing a needed level of trust to e-commerce users. A second innovation came in 2010 when Ant Financial designed an express payment system that gave both users and banks a trusted payment platform. Although the initial technological challenge was expected to be connecting all banks, it turned out that the real challenge was controlling risk given a rapid increase in transaction volume. As a result, Ant Financial developed real-time risk management technologies that used rules and algorithms to analyze hundreds of thousands of transactions per second, improving transaction security dramatically. The express payment method has become a standard for Web and mobile payment.

In 2013, Ant Financial created a new service called Yu’e Bao. The service pays interest on users’ current account balances including funds “left over” from digital transactions; it also serves as a payment platform. Yu’e Bao was quickly adopted by hundreds of millions of users, and is now the largest money market fund in the world. To handle such large traffic volume, Ant Financial redesigned its payment systems using distributed architecture and cloud computing, and migrated its partners’ fund management systems to the architecture.

Furthermore, the rise of the mobile Internet led users to explore different options (including NFC, Bluetooth, sound wave, barcode, and QR code) for so-called offline mobile payment. None of these options was perfect in terms of user experience, cellphone support coverage, and cost. Balancing these factors, Ant Financial chose a QR code- and barcode-based offline mobile payment method: a merchant can scan a consumer’s QR code or barcode to record a transaction, or the consumer can scan the merchant’s QR code. The innovation of QR payment builds a point-of-sale transaction upon a cheap sticker that is affordable to even the smallest merchant and serves as the foundation of mobile payments.

In 2017, Ant Financial launched the Smile-to-Pay service based on computer vision technology. Instead of using a cellphone, a user smiles at a vending machine to complete a payment. As the first commercial facial recognition payment system, Smile-to-Pay took security and the user experience to a new level. The AI-driven product is based on imaging and vision analysis technology developed internally by Ant Financial.

Extending Microloans
Microloans are another area of Ant Financial innovation. When the service was launched in 2010, the first loan was for only 1,300 RMB (~$180 USD). Ant Financial built a credit model based on data of merchants’ previous sales and transactions. The combination of computing power and past behavior extended microloan services to more merchants. However, the operational costs were quite high, the user experience needed improvement, and it took three days for a merchant to get a loan.

A leap forward occurred when the system adopted advanced machine learning methods (including boosting, deep learning, and graph-based machine learning) for accurate credit modeling. The new system is characterized by three digits: 3, 1, and 0. A merchant takes less than 3 minutes to complete a loan application and obtains the decision in 1 second, with zero human intervention. The integration of systems engineering and algorithmic advances into the microloan operation makes the cost of each loan less than 2 RMB ($0.31 USD). Combining a convenient user experience with low operational costs, Ant Financial now serves tens of millions of merchants in China with accessible and affordable loans.

Financial service providers face three challenges when digitizing service for the future economy: connection (how to link users, merchants, and service partners in a low-cost, fast, and intelligent way); risk (how to control for aspects of financial risk); and trust (how to grant equal opportunity for all to be trusted, and trustworthy, in the digital space).

To address these challenges, Ant Financial focuses on five technologies: blockchain, AI, security, IoT, and computing (BASIC). Blockchain helps to build a trusted global interconnected system capable of storing, exchanging, and processing values; AI enables companies to build intelligent systems that better serve customers and business partners, and drive new product design; security is a pillar that makes digital systems safe and stable; IoT (Internet of Things) is a bridge that connects the physical and digital realms to transformative effect; and computing engines provide the digital space with computational power. The following paragraphs give more thoughts from Ant Financial on two of them: blockchain and AI.

Blockchain provides a new trust mechanism for transactions. Over the past two years, Ant Financial has used it to improve the transparency of charities, strengthen the trust of insurance contracts, ensure the authenticity of house rental contracts, and improve the traceability of e-commerce supply chains. Ant Financial’s applications are based on a consortium blockchain. However,
current blockchain technologies face several key challenges in large-scale financial applications. Take a global e-commerce supply chain as an example. To support a global supply chain, blockchain nodes should be deployed in different continents, which affects the fairness of the consensus algorithm used by the blockchain system. If all supply, distribution, and sales records are stored in the same chain, the chain must be able to support hundreds of thousands of transactions per second. Not all records should be transparent to all participants, so a comprehensive mechanism is needed to protect the privacy and ownership of the data on the chain. All these are serious hurdles to a blockchain system.

Thus Ant Financial has developed an industrial-grade blockchain system to address these challenges. The company plans to share the system’s value and open its blockchain technologies to the public in 2018.

Robots Service Ratings
Ant Financial uses AI to create a financial brain for the digital world. Recent years have witnessed the huge success of machine learning and deep learning in machine perception areas such as speech recognition and image analysis, but financial services need more, including prediction and decision-making. These capabilities, combined with a comprehensive financial knowledge graph, are the foundation of the financial brain at the core of Ant Financial’s risk, credit, and customer service engines. The brain enabled Ant Financial to reduce its payment loss rate to less than one in a million, automatically answer millions of customer inquiries a day, automatically assess car damages based on computer vision and a vehicle knowledge base, and improve other services. In particular, deep learning and natural language processing (NLP) technologies helped intelligent customer service robots achieve higher customer satisfaction rates than live service staff.

Fintech Revolution
The surge of fintech in the past few decades has revolutionized the way financial industry personnel work, think, and live. Ping An has developed numerous technologies to advance the industry. Its areas of fintech concentration can be summarized as ABCDS: artificial intelligence, blockchain, cloud, big data, and security. AI is the core engine that drives industry automation and intelligence. Blockchain provides a revolutionary trust mechanism. Cloud computing lays the foundation for processing massive amounts of online transactions. Big data aids knowledge mining and decision making. Security is the essential element for safe and stable systems.

The core of Ping An’s AI platform is the Ping An Brain engine. Covering a broad range of data analytics and AI techniques such as biometrics, NLP, image recognition, and more, Ping An Brain can provide full-stack AI solutions to enhance financial services scenarios such as marketing, customer service, and decision support. It has been successfully deployed across Ping An’s insurance, investment, and banking businesses, greatly improving their effectiveness and efficiency and reducing their costs.

For blockchain, Ping An was an early adopter, and has run a blockchain-based production system since 2016. By the end of 2017, it had over 12 blockchain-based platforms, covering fixed income trading, asset-backed securities, post-trade reconciliation, and other transactions. By March 2018, its blockchain network had over 20,000 nodes across China and handled transactions valued at over one trillion RMB, including over 90% of those for Ping An OneConnect, the Ping An Group’s fintech subsidiary.

A series of applications showcases its use of AI, big data, and blockchain. As the capacity and scope of the insurance industry expands, the number of claims increases and leads to issues such as processing latency,
During the popular Singles’ Day 2016 high risk, potential misjudgment,
shopping occasion, 97% of customer and possibly fraud. To resolve such
service inquiries on Ant Financial’s issues, Ping An has leveraged AI tech-
Alipay service were handled by the in- niques across all insurance industry
telligent customer service robots. scenarios, including fraud detection,

customer acquisition, and claims processing.

AI Assessment of Claims and Risks
Take auto insurance as an example. When an accident happens, it often takes a long time to process a claim. Customers wait onsite for investigators to arrive and assess the damage, they wait as the claim is filed and processed, and they wait for a final decision. It is inconvenient for customers and costly to insurance companies. The process is also vulnerable to fraudulent claims. To address such problems, Ping An developed a system where the customer only needs to take a few pictures of the damaged car to file a claim from the accident site. The claim is processed within seconds and the customer is given a precise payment calculation. The system involves a series of key modules: picture quality assessment, verification of insurance, car segmentation, identification of damage and related parts, payment calculation, and fraud detection. A number of AI techniques, such as image processing, image segmentation, and object recognition, were developed to support these functions. The system has been running in production at Ping An for over a year, successfully processing over 30,000 claims each day. It not only improves claim processing efficiency and thus customer experience, but also stops potential fraud on the order of multiple billions of RMB. This system is now available to the insurance industry through the Ping An OneConnect platform.

Investment banks often need to assess the potential value and risk of targeted customers: individuals for retail banking or enterprises for corporate banking. In today’s big data era, information comes from a broad range of sources with complex relationships between them. To make a precise assessment, it is crucial to organize and analyze such complex information in an efficient and effective manner. This is exactly what a knowledge graph is designed for. Ping An has developed various knowledge graph techniques for retail and corporate businesses.

Take corporate risk assessment, for example. There are over 70 million registered enterprises, including households, in China. Their information comes from three major sources: commercial registration and daily operation; public news announcements and social posts; and business relationships including the supply chain, investments, and legal actions. To organize and analyze such rich and dynamic data, Ping An developed Euler Graph, an enterprise knowledge graph. The graph covers nearly all of China’s 70 million enterprises, using data from all three sources. Millions of legal proceedings are automatically interpreted and over 40 million lawsuit relationships have been extracted and incorporated into the graph. Signals on enterprises are collected from over 300 news and social sites, totaling hundreds of thousands of articles daily, and updated every 10 minutes. Information from these and other sources grows quickly. Deep graph analysis algorithms support business decisions on risk assessment and other matters. One advantage of Euler Graph is that business logic is directly integrated. For example, risks are assigned using different business logic for investments, bonds, or loans, and signals are extracted from an analysis of social and news data. Upstream and downstream relationships may also be encoded as risk indicators. When a risk event occurs upstream, the incident passes through the graph network and may influence an assessment. Through effective analysis by Euler Graph, risks such as defaults were successfully detected three to nine months ahead of occurrence. Euler Graph is also used for other applications, such as precision marketing and exploring investment opportunities.

Large-Scale Blockchain Architecture
Ping An OneConnect has identified various shortcomings impeding the wide-scale adoption of blockchain. Performance and scalability bottlenecks have hindered its potential in building high-volume financial transaction systems, and issues of data privacy and confidentiality have limited its usage in public service areas where few entities are willing to share data.

Ping An’s blockchain research and cryptography team responded with the FiMAX platform. The architecture is designed to address all key problems hindering large-scale blockchain adoption, with performance matching traditional database systems and privacy protection enabled by advanced cryptography, including various zero-knowledge proof algorithms designed by Ping An.

FiMAX has not only earned praise from Ping An’s business partners, it has also gained recognition with its selection for some of the largest international blockchain networks being built for banks and regulators. For example, one cross-border blockchain network to be launched later this year will comprise over 10 international banks and over 100 nodes.

Conclusion
Ant Financial made a series of innovations that led to key technologies behind mobile payment and micro-loan services in China. Ping An used innovative techniques to improve financial services for the insurance, investment, and banking industries. Much progress has been made, but every problem solved opens the door to further questions and considerations. How should we model the transaction systems in a large-scale dynamic network, and implement intelligent inference and reasoning for better financial services? How can data be utilized while user privacy is protected, and how can this be done better than with current methods such as differential privacy? How can causal inference be applied in a complex system when only observational data are available? Answering these questions will lead to tomorrow’s breakthroughs.

Yuan (Alan) Qi is Vice President and Chief Data Scientist at Ant Financial, Zhejiang.
Jing Xiao is Chief Scientist and Executive Member at Ping An Insurance (Group) Company of China LTD, Shenzhen.

© 2018 ACM 0001-0782/18/11 $15.00
china region
DOI:10.1145/3239552
BY HUAXIA XIA/MEITUAN, HAIMING YANG/JD CTO GROUP

Is Last-Mile Delivery a ‘Killer App’ for Self-Driving Vehicles?

China’s e-commerce boom has generated a huge logistics demand, both in terms of express package delivery and on-demand food delivery [a]. Chinese express firms delivered an estimated 40 billion parcels in 2017, up 28% from the previous year [2]. Indeed, China’s on-demand delivery market exceeded 30 million food orders daily by year-end 2017.

This growth in delivery services has created a healthy job market, with the number of employees in this sector up 130% from 2014 to 2017, according to the China Federation of Logistics and Purchasing. Last-mile delivery, the movement of goods from a transportation hub to the final destination, has witnessed huge growth. Today there are an estimated three million couriers working in last-mile express delivery in China, and another one million couriers working for on-demand delivery services.

The country expects the express delivery industry to grow another 60% by 2020 [3], requiring an even greater delivery force. However, the working population in China is aging; its numbers have been in steady decline since peaking in 2011 (a situation mainly caused by China’s 40-year-old “one-child policy,” which ended in 2016). This decline is expected to continue for at least another decade.

These two conditions have motivated the industry to seek a more efficient delivery solution: the autonomous delivery vehicle. But there are two dimensions of challenge: a profitable business model and, of course, technology.

The Business Model of Autonomous Driving in Last-Mile Delivery
We have seen two major upgrades to the last-mile delivery business model: the warehouse is getting closer to the end users, and deliveries are merged into “multi-deliveries.”

In traditional logistics, the warehouse is a motionless unit used only for storage. Goods are transported from many other locations and then distributed to the multiple end users. The new express logistics, however, redefines the warehouse concept to be not only storage but also a mobile facility that relocates to serve many more users. Last-mile delivery is one scenario to which this “motion warehouse” concept applies, and the autonomous driving vehicle is one of its core technologies. The concept of the motion warehouse is a revolution in logistics, which combines the three major elements of retail business (people, goods, and warehouses) into one new concept. The warehouse is aware of the needs of people, and provides the selection of goods through AI-based big data analysis. In the new “motion warehouse” model, consumers can get the goods they want more accurately and quickly.

a. http://news.iresearch.cn/content/2017/11/271315.shtml/

This approach extends the supply chain concept from the warehouse all the way to the end users. In a traditional logistics system, the connection is from the preposition warehouse directly to the end users. (A preposition warehouse is one in close proximity to the consumer.) It could be an office building, or a small warehouse set up to serve a community, that enables delivery to users in 1–2 hours. However, with increasing population and demand, the “campus” model is emerging as an efficient way to provide an even better user experience. Campuses’ aggregate delivery demands are always large, so pre-allocation to customers and traditional (human) last-mile delivery are no longer the most efficient. By transporting a large number of packages together before they are allocated to customers, and delivering both passively (users pick up packages at a designated location) and proactively (packages are carried directly to the end users), both user experience and efficiency are increased. Autonomous vehicles are key to achieving these benefits.

The Technology of Autonomous Driving in Last-Mile Delivery
Autonomous driving technology is not ready to replace human drivers in passenger vehicles. Many luxury vehicles may be equipped with advanced driver-assistance systems (ADAS), including emergency braking, backup cameras, adaptive cruise control, and self-parking systems, but they are not fully autonomous. According to SAE International’s levels of vehicle autonomy as depicted in Figure 1, current ADAS functions mostly hover around levels 1 or 2. Technology is still far from realizing level 4, a truly driverless car.

Two key challenges for fully autonomous passenger vehicles are trustworthiness and price. A typical autonomous passenger vehicle (as shown in Figure 2) is equipped with an array of state-of-the-art sensors and other technologies that can cost hundreds of thousands of dollars [1]. But even with such expensive equipment, autonomous vehicles are still far from reliable in terms of passenger safety. For example, last March a Tesla driver was killed when his car, set in “autopilot mode,” collided with a median barrier on the highway, causing the vehicle to catch fire.

IMAGE BY CHESKY

Figure 1. SAE automation levels.
Figure 2. Sensors on a typical autonomous passenger vehicle.

China just issued its first license for self-driving tests last March, and has not yet published any test result data, so we refer to autonomous vehicle disengagement reports issued by California’s Department of Motor Vehicles in 2017 (and summarized in the accompanying table). Waymo’s autonomous car required human assistance every 5,596 miles on average, which is the best among all tested vehicles. By comparison, a human driver on average has one accident approximately every 165,000 miles. Regulators will require autonomous cars to prove much safer than current human drivers before drivers are no longer needed behind the wheel. Much more research and engineering effort, possibly over a decade’s worth, is required to improve autonomous driving technology, including higher-resolution sensors, better algorithms, and faster computing chips.
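The gap described above can be made concrete with the figures from the accompanying table. The following is a minimal sketch, not part of the original reports: it recomputes miles per disengagement from the reported mileage and disengagement counts, and compares the result with the article's figure of roughly one human-driver accident per 165,000 miles. The function and variable names are illustrative assumptions, and a disengagement is not the same event as an accident, so the ratio is only a rough indicator of the distance still to cover.

```python
# Figures copied from the summary table of 2017 California disengagement reports:
# company -> (autonomous miles, number of disengagements).
REPORTS = {
    "Waymo": (352545, 63),
    "GM Cruise": (131676, 105),
    "Zoox": (2255, 14),
    "Baidu": (1979, 43),
    "Bosch": (1454, 598),
    "Mercedes Benz": (1087, 652),
}

# The article's figure for human drivers: about one accident per 165,000 miles.
HUMAN_MILES_PER_ACCIDENT = 165_000


def miles_per_disengagement(miles, disengagements):
    """Average distance driven between two human interventions."""
    return miles / disengagements


def gap_to_human(miles, disengagements):
    """How many times farther a human drives between accidents than the
    vehicle drives between disengagements (a rough indicator only)."""
    return HUMAN_MILES_PER_ACCIDENT / miles_per_disengagement(miles, disengagements)


if __name__ == "__main__":
    for company, (miles, count) in REPORTS.items():
        rate = miles_per_disengagement(miles, count)
        gap = gap_to_human(miles, count)
        print(f"{company:14s} {rate:10.1f} miles/disengagement, gap x{gap:,.0f}")
```

Even for Waymo, the best performer in the table, the human figure is between 29 and 30 times larger than the miles driven per disengagement.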
Summary of California autonomous vehicle disengagement reports from 2017.

Company         Autonomous miles   Number of disengagements   Miles per disengagement
Waymo           352545             63                         5596
GM Cruise       131676             105                        1254
Zoox            2255               14                         161
Baidu           1979               43                         45
Bosch           1454               598                        2.4
Mercedes Benz   1087               652                        1.7

Last-mile delivery, however, is a solid scenario illustrating how autonomous-driving technology has been successfully and safely employed. There are a few key differences between a delivery vehicle and a passenger vehicle. First, the last-mile delivery vehicle usually runs slowly, typically at 20 mph. A slow vehicle requires a shorter perception distance, a shorter braking distance, and a lower computing frame rate. Secondly, the last-mile delivery vehicle is typically smaller and lighter than passenger vehicles; this further

decreases the risk of possible damage or harm when an accident happens. Finally, a delivery vehicle is free of passengers; therefore, it has fewer requirements for safety, planning, and control algorithms.

There are two major challenges in last-mile delivery with which autonomous driving can help: the distribution location of the package, and the delivery path. From past experience, when a delivery person arrived on location, the majority of time was spent waiting, especially when the planned delivery consisted of a large number of small packages destined for office buildings, campuses, and apartments. The time spent waiting for customers, and the handling time to deliver to customers, killed any efficiency of last-mile delivery. Moreover, the delivery person is paid by the number of packages delivered, meaning the company often pays a great deal of money for a trip to a single location, which kills any cost efficiency of last-mile delivery.

To be effective, last-mile delivery must determine the best route to dispense the most parcels. In a city, the best route is often not the shortest route, and road conditions constantly change over time. An autonomous driving cart is similar to a larger “self-closing cabinet,” which can save both the average waiting time and the distribution time. On the other hand, equipment costs increase with the autonomous approach. Ideally, there would be a cost of only ¥1.5 per autonomous delivery compared with the current ¥7–10 per delivery. Achieving this requires reducing the cost of trial carts from ¥600,000 to ¥50,000, which assumes the autonomous delivery vehicles are in mass production.

Autonomous driving vehicles have major technology challenges, too. One obstacle is the behavior of the motion detection system when out in the real world. The algorithm may work perfectly in lab test conditions, but may not perform well when it is on the open road. Another difficulty with these vehicles is the range of vision when driving in dark or shadowed areas. Like the human eye, the range of vision may vary in different levels of brightness; even with infrared light detection, the vehicle may not “see” in foggy and dusty environments.

How JD Uses the Vehicles in Delivery Scenarios
JD.com is the largest e-commerce platform by revenue, and offers a world-class set of online retail services to its legion of users, who now number close to 300 million in total. As a technology-driven company, JD.com has focused considerable effort on developing a robust and scalable retail platform that not only supports the company’s rapid growth but also allows it to provide cutting-edge technology and services to its partners and customers.

JD selected last-mile delivery practices as the first line of defense in its campaign to upgrade its logistics infrastructure. JD Logistics’ autonomous delivery vehicles will primarily be used for last-mile delivery in urban areas, carrying packages from dedicated stations to office buildings, pick-up stations, residential area convenience stores, and other locations. They were first used to support JD.com’s renowned two-hour express delivery service and will be rolled out across JD’s delivery network to be used for a wider range of applications. Autonomous driving vehicles will be loaded at delivery stations and will travel to pick-up points designated in advance by consumers. Recipients can collect the products they ordered simply by pressing a button on JD’s mobile app. JD’s delivery vehicle can also recognize the customer using face identification and deliver the product accordingly.

JD conducted its first trial of autonomous driving vehicles for last-mile delivery on June 18, 2017 at Renmin University, Beijing. The vehicle delivered about 10 packages in approximately six hours. JD subsequently deployed approximately 60 autonomous driving vehicles for last-mile delivery in Beijing, Xian, and Hangzhou for pilot AI-based package delivery. The city of Xian has been selected as the headquarters for JD’s fleet of vehicles. In December 2017, JD Group CEO Qiangdong Liu announced last-mile parcel delivery plans for 100 universities.

Figure 3. The deployment progress of JD’s autonomous driving vehicle.

Figure 4. The use cases of the JD autonomous driving vehicle.
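The earlier observation that the best route in a city is often not the shortest one can be illustrated with a small sketch. It runs Dijkstra's algorithm twice over the same road graph, once minimizing distance and once minimizing travel time. The node names, distances, and speeds are hypothetical values invented for illustration; they do not describe JD's or any other company's routing system.

```python
import heapq

# Hypothetical road graph: edge -> (length in km, current speed in km/h).
ROADS = {
    ("station", "main_st"): (1.0, 5.0),    # short but congested
    ("main_st", "campus"): (1.0, 5.0),
    ("station", "ring_rd"): (2.0, 20.0),   # longer but free-flowing
    ("ring_rd", "campus"): (2.0, 20.0),
}


def build_graph(roads, cost):
    """Build an undirected adjacency list, weighting each edge with cost(km, kmh)."""
    graph = {}
    for (a, b), (km, kmh) in roads.items():
        weight = cost(km, kmh)
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))
    return graph


def dijkstra(graph, start, goal):
    """Return (total cost, path) of the cheapest path from start to goal."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (total + weight, neighbor, path + [neighbor]))
    raise ValueError(f"no route from {start} to {goal}")


def shortest_route(roads, start, goal):
    """Minimize distance in km."""
    return dijkstra(build_graph(roads, lambda km, kmh: km), start, goal)


def fastest_route(roads, start, goal):
    """Minimize travel time in minutes at the current speeds."""
    return dijkstra(build_graph(roads, lambda km, kmh: km * 60.0 / kmh), start, goal)


if __name__ == "__main__":
    print(shortest_route(ROADS, "station", "campus"))
    print(fastest_route(ROADS, "station", "campus"))
```

With these made-up numbers the shortest route goes through the congested street while the fastest route takes the longer ring road. Because road conditions change over time, the speeds would need to be refreshed and the fastest route recomputed periodically.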
As Figure 3 illustrates, the autonomous driving vehicles have been deployed and well tested in Beijing, Xian, and Hangzhou. Cities recently added include Tianjin, Guangzhou, Shanghai, Shenzhen, Changsha, Chengdu, Wuhan, and Suqian. User scenarios have also expanded from university campuses to village areas, municipal areas, and industrial and institutional parks. Due to this wider reach, operations centers will be set up in Beijing to serve the north, and in Changsha to serve the south.

However, there are still many challenges for future fleet expansion; for example, deployment is currently not available in truly open, more rural environments, and human interaction is still required to periodically adjust the autonomous driving path. In addition, the effort to build a fully autonomous driving vehicle network depends heavily on how the technology evolves: the move from cloud computing to edge computing, the capability of sensor networks, and the level of intelligence in artificial intelligence. Moreover, laws and regulations covering autonomous vehicles on the open road can vary dramatically.

There are many pilot deployments extending the last-mile delivery programs, and as a result many customized last-mile delivery vehicles have joined the general-purpose fleet, such as a smart shopping cart to enhance the supermarket shopping experience, a model used for data-center inspections, moving demo center vehicles used at conferences, and so on. As shown in Figure 4, JD does not intend to solve all the technology, business, and regulation limitations of these vehicles; instead, the focus is on carefully designing user scenarios that best fit the strengths of the current technology.

Last June, JD’s autonomous driving vehicles hit the open road in Beijing’s Haidian District (Figure 5). These courier robotic vehicles can perform 360-degree environmental monitoring, automatically avoid roadblocks and pedestrians, and react to traffic lights. A vehicle can independently stop at a distribution point and send delivery information to the user, and the user can pick up the packages through face recognition, by entering a code, or by clicking on the JD app link from their mobile phone. The vehicle can store up to 30 containers, and can travel at 15 km/h. As a result, last-mile deliveries can increase from 100–200 orders a day to 1,000 orders per day.

Figure 5. A JD courier robot on a road in Haidian District, Beijing.

Figure 6. Diagram of an on-demand delivery for Meituan’s autonomous vehicle.

Production difficulties still persist: the fault tolerance of the object recognition technology, the business processing for package returns from the end user to the warehouse, and the reliability of the vehicle (especially rounding corners) are three major challenges that must be resolved. In the early production period, human interaction may still be necessary. However, we

believe that when more road test data is accumulated and analyzed, the accuracy and efficiency of these vehicles will be fully achieved.

How Meituan Uses the Vehicles in Delivery Scenarios
Meituan is the world’s largest e-commerce platform for local services. Meituan’s service covers over 200 categories, including catering, on-demand delivery, car sharing, bicycle sharing, hotel and travel, movies, entertainment, and lifestyle, and spreads over 2,800 counties, districts, and cities in China. In 2017, Meituan served 310 million active consumers and 4.37 million active merchants on the platform.

One of Meituan’s services is its on-demand food delivery known as Meituan Waimai. By May 2018, Meituan Waimai was delivering 21 million orders per day and had hired 600,000 food-delivery riders. Deliveries are usually within three kilometers, with tight time limits of 30 minutes. The fulfillment process includes three phases:
1. The courier goes to the restaurant to pick up the food, usually by walking through a shopping mall to get to the restaurant;
2. The courier transports the food to the consumer’s building;
3. The courier walks or takes an elevator to the consumer (see Figure 6).
In practice, each phase takes approximately one-third of the total delivery time.

Different phases need different types of vehicles. In phase 1 and phase 3, we use an indoor robot, as shown in Figure 7. This robot is 0.5m by 0.8m; its small size allows for easy entrance to shopping malls and office buildings. It does localization based on WiFi fingerprinting and vision SLAM. It can also communicate through Zigbee with an elevator control module, and can thus travel up and down buildings. The robot receives order information from the cloud scheduling system, travels to the merchant following the scheduled route, and opens its top cover automatically so the merchant can put the food inside. When approaching its destination, the robot sends a text message to the user’s mobile app, and then the user can pick up the food using the password code included in the text message.

Figure 7. Meituan’s indoor delivery robot.

In phase 2, a larger and faster autonomous delivery vehicle is used for street transportation, as shown in Figure 8. The vehicle measures one meter wide and two meters long, with a maximum speed of 40 km/h and a maximum load of 10 orders. It uses the same technology as an autonomous passenger vehicle, including lidar, camera, and GNSS receiver. It can detect pedestrians, bicycles, automobiles, and other obstacles, and can also react to traffic lights.

Figure 8. Meituan’s outdoor delivery vehicle.

Challenges and Opportunities
There are many challenges for the large-scale deployment of autonomous delivery vehicles: the technology is not mature yet, the entire ecosystem must be further developed to make it more reliable, and the costs must shrink.

Moreover, our living infrastructure is not yet ready for autonomous driving. Many communities have locked or gated entrances, which require manual operation using a key or access card. Many buildings have revolving or swing doors, which are easy for humans to use but very difficult for robots. Elevators are rarely robot-ready. In fact, we must talk to building owners to get a permit to install a communication module in every elevator. These frustrations must be handled before we can fully enjoy autonomous vehicle deployment.

Government regulation of autonomous vehicles is a critical concern. While these delivery vehicles run at a fairly slow speed, most regulators consider “slow” a grey area, fitting between high-speed passenger vehicles and bicycles. Government regulations are not ready to handle pure level 4 (driver-free) vehicles like autonomous delivery vehicles. What happens if the autonomous vehicle is involved in an accident or a traffic violation? Who is responsible? Closed environments, such as those common in last-mile delivery, can be used as pilot scenarios for learning that enables more complex, open-road scenarios. This suggests two development methodologies for autonomous driving vehicles: find a killer-level use case to drive the business model; or find the best technology and build the ecosystem.

Fortunately, this somewhat immature technology is acceptable for slow-speed delivery vehicles. The lack of infrastructure support may prevent us from mass deployment in some situations, but there are still many suitable scenarios for first-stage deployment. As for government regulation, the Chinese government is among the most supportive of high-tech innovations like autonomous vehicles.

The Future of Autonomous Delivery Vehicles
China’s e-commerce boom brings a huge volume of logistics demand, both for express package delivery and for on-demand food delivery. Last-mile delivery is a perfect use case for autonomous driving technology.

Large-scale deployment of autonomous vehicles still depends on technology maturity and governing regulations. Nevertheless, these issues are not showstoppers. There are many pilot scenarios with government support, helping the industry step into the water of autonomous driving in order to accumulate the data and real-world experience needed to improve its technology.

References
1. Levine, S. What it really costs to turn a car into a self-driving vehicle; https://qz.com/924212, 2017.
2. Xinhua News. Chinese express firms deliver over 40 bln parcels in 2017, 2018; https://bit.ly/2QnioCi
3. Xinhua News. China’s express delivery sector prepares for post-holiday bonanza, 2018; http://www.xinhuanet.com/english/2018-02/24/c_136996545.htm.

Huaxia Xia is Scientist and General Manager of the Autonomous Delivery Department at Meituan, Beijing.
Haiming Yang is Chief Architect at JD CTO Group, Beijing.

Copyright held by owners/authors. Publication rights licensed to ACM. $15.00.
china region
DOI:10.1145/3239554
BY YUE ZHUGE/HULU BEIJING

Video Consumption, Social Networking, and Influence

Revenue from China’s online entertainment market reached approximately $200 billion this year [a]. It is not surprising that China’s video market is comparable to the U.S. [b,c]; in fact the number of online video users in China is 2.5 times more than that of the U.S. (that is, 212 million U.S.-based users [d] compared to 579 million users in China) [e]. Due to advancements in broadband and mobile technology, online video is the fastest growing area for China’s Internet, with a growth of around 50% over the past five years [f].

a. Statista; https://www.statista.com/statistics/237772/value-of-the-chinese-entertainment-and-media-market/
b. Statista; https://www.statista.com/statistics/278574/revenue-of-chinese-online-video-industry/
c. Statista; https://www.statista.com/statistics/459396/digital-video-revenue-digital-market-outlook-usa/
d. Statista; https://www.statista.com/topics/1137/online-video/
e. Statista; https://www.statista.com/statistics/279537/number-of-online-video-users-in-china/
f. Revenue Growth of China’s Online Video Industry; http://www.iresearchchina.com/content/details7_44334.html

The landscape of China’s online video industry has as many similarities as differences with the U.S., presenting extremely interesting observations and insights. This article provides an overview of the market, dominant players, and business models, and presents intriguing product nuances and technical advances in this area.

As in the U.S., there are two major categories of services for online video: the head and the tail. The heads are the premium players that stream

copyrighted shows and movies. They are the Netflixes and Hulus of China. The tails present professionally generated content (PGC) and user-generated content (UGC) for different market segments. They are the YouTubes and Snapchats of China.

PHOTO BY CHUTHARAT KAMKHUNTEE/SHUTTERSTOCK.COM

The Head: Premium Video Platforms
The top three players in premium online video are iQiyi, Tencent Video, and Youku. These companies are affiliated with Baidu, Tencent, and Alibaba, respectively. Statistics from QuestMobile, China’s big data services provider, show that both Tencent Video and iQiyi recorded around 500 million monthly active users by the end of 2017, and Youku around 300 million. The iQiyi video platform, which went public on NASDAQ last March, also leads in total watch hours [g].

g. Prospectus iQIYI Inc; https://bit.ly/2ouHhzO

Unlike the premium video services in the U.S., the major players in China all started with free services supported by advertising. Total ad revenue for online video was approximately $10 billion, catching up to ad revenue generated by commercial television. However, we have seen a huge takeoff in subscriptions over the past two years, when users started gravitating toward (and paying for) platforms that were ad-free and offered additional features such as access to higher-quality video and member-only original content. iQiyi counted 60 million subscribers as of Feb.
2018, and Tencent Video has over 40 nal content exclusive on their own
million. By comparison, Netflix had platform.l The lead player for original
about 55 million U.S. subscribers and shows is iQiyi, although all three had
63 million international subscribers different hits. Unlike U.S. platforms
as of January 2018.h where TV dramas tend to reign su-
Payment practices have been preme, variety shows garner a greater
forming rapidly among the middle- audience in China. According to a re-
class and young Internet users in Chi- cent Wall Street Journal article,m “The
na over the past few years. The major Rap of China,” a 12-episode hip-hop
driving forces behind this movement rap competition reality series created
include a concerted crackdown on pi- and shown by iQiyi targeting younger
rated content, affordable prices, and,
most importantly, the ease of online
audience, has become “China’s most
popular entertainment program in
payment.i The monthly subscription 2017.” The show attracted 2.7 billion
price is between 20 RMB ($3.16) and

40 RMB ($6.32) for each of the three
views during its run from late June to
early September. Short video clips
services. The total market size of In-
ternet video subscription services has
gleaned from the show were watched
eight billion times on the social me-
increased dramatically, from about dia platform Weibo.
$63 million in 2012, to $2.1 billion in
2016, and an estimated $11.5 billion
Another fact about China’s premi-
um online video services that differs
in 2022. from their U.S. counterparts is that they
Unlike premium content distribu-
tions in the U.S., many of the TV dra-
all participate in PGC and UGC short
video markets. But, as we will discuss,
mas and movies are non-exclusive, other emerging players are increasing-
available on all three major services ly dominating these segments.
and elsewhere once they aired on TV
or in movie theaters. Exclusive con- The Tail: UGC and
tent is usually far more costly. All PGC Video Platforms
services invest heavily in copyrighted With more than 100 players and 400
movies and TV shows, resulting in million users in 2017, the short vid-
very high production prices for this eo landscape in China is hugely dy-
content. Platforms often must make namic, and far from settling down.
a calculated bet on what shows will The user base is huge, fast growing,
prove popular, and make an offer be- and extremely active. There are sev-
fore production. According to its pro- eral major players, and most of them
spectus, iQiyi’s annual content cost came into prominence over the last
is about $1.9 billion, while the other year or two.
two platforms spent nearly double Unlike the U.S. market, the initial
that. These purchases are far beyond dominant short video platforms were
subscription fees and advertising in- the premium players, like iQiyi and
come. According to its public filing, Youku. For example, YouKu claims to
iQiyi lost $169 million in Q1 2018,j and have invested approximately $1.6 bil-
the other two services also lost simi- lion in user-generated content since
lar sums.k This situation will contin- 2015.n They are modeled after You-
ue for the next few years. Tube and had a large number of view-
To reduce costs, and to stand out ers watching a mixture of premium
among their peers, all three premium and UGC content. Their short-form
services have started to make origi- videos include movie clips and music
videos as well as free-form user-cre-
ated content, and they provide chan-
h Recode: Netflix now has nearly 118 million nels created for professional content
streaming subscribers globally; https://www.
recode.net/2018/1/22/16920150/netflix-q4-
producers.
2017-earnings-subscribers
i CNN: China’s big streaming shift: Pay- l iResearch: 2017 Report on Original Video Pro-
ing instead of pirating; http://money.cnn. ductions in China; http://report.iresearch.cn/
com/2018/01/24/technology/china-streaming- report_pdf.aspx?id=3088
music-video/index.html m Wall Street Journal; https://on.wsj.com/2hGe7Nk
j iQiyi First Quarter Financial Results; https:// n Youku making $1.6 billion investment in
bit.ly/2LfXGRK UGC; http://www.chinadaily.com.cn/busi-
k Forbes; https://bit.ly/2KOhQ9v ness/2015-08/07/content_21525850.htm

However, in the past two years, Douyin, as well as competing apps lion in 2017, and is expected to ex-
we have witnessed the phenomenal like Weishi, allows users to create ceed $4.5 billion by 2020.r
growth of several mobile short video and share short music videos using We are just beginning to see the
apps not associated with the premium provided templates. great potential of China’s short video
players. Thanks to their ease of use, Another large set of online video market; it will evolve in the coming
these apps became super popular and services focuses on end user live- years. There is a trend of going verti-
prevalent, penetrating a huge number streaming: they provide live-broad- cal, with different players specialized
of users in massive areas of China (see casting capabilities for consumers, in a domain or a demographic. With
Figure 1). We saw a 311% increase in popular pop idols, and the general fierce competition within China,
short video traffic in Q3 2017 com- public. According to Pandaily, eight many of these short video apps made
pared to one year earlier.o Analysts la- live-streaming platforms in China their ways overseas, in particular, to
beled the China Internet era of 2017 raised approximately $11.6 billion other Asian countries. For example,
as “the year of short videos.” in the first half of 2018. The leading Kuaishou was the Number One video
New popular short video services platforms—Huya Inc. and Douyu app in the Korean app store in No-
include Kuaishou, Huoshan, Xigua, TV—account for nearly 70% of the to- vember 2017. According to an April
Douyin, Miaopai, Meipai, Weishi, tal.q 2018 report from 36kr, almost half of
and many more. The most popular Just like all premium players pro- the popular short video apps in Asia
of them have monthly active users in vide UGC, all short-video players are also were “made in China.”s
the 100 million–200 million range. working on live streaming. The situa-
The app experience is mostly a flow tion is highly dynamic; the landscape Mobile and Social via Video
of mobile video feeds, with videos was quite different several months Large screens, including TV and oth-
running seconds to minutes long. ago, and it is likely to be very differ- er OTT television devices, are heavily
The app provides good tools for users ent a few months from now. regulated in China. Due to the high
to shoot, edit, beautify, and add spe- Still experimenting with business penetration of mobile devices, the
cial effects to the videos. These video models, the new mobile video apps usage of online video tilts strongly
apps can be divided into two catego- make money through advertising, toward mobile. Even for the premium
ries: those more like the “dubsmash” affiliate marketing to e-commerce, users, the dominant media prefer-
mobile app, where users can record gifting, and other creative methods. ence is mobile and personal com-
their own video dubbing over music, The top “VIPs” on these platforms puters. According to iQiyi, it has on
and those more like Snapchat with may have millions of fans, and they average 421 million monthly active
free-form videos. All video apps pro- profit by advertising or selling goods. users (MAU) on mobile and 424 mil-
vide strong discovery and follow func- Brands also start to create channels lion MAU on PCs in Q4 2017, while
tionalities, encouraging interaction on these platforms. According to the TV and OTT devices are negligible.
and social connections among users. China’s Short Video Industry Report Many short video applications are
More specifically, videos in Xigua from iResearch, the revenue from the
are mainly PGC content and video short video arena reached $860 mil-
r iResearch: 2017 China’s Short Video Industry
clips running 1–5 minutes long. Con- Report; https://bit.ly/2zA2Mac
tent in both Huoshan and Kuaishou q Pandaily: 2018 Report on Live-streaming; s 36kr: Made in China; http://36kr.
are short UGC videos of about 15 https://bit.ly/2mbdhHt com/p/5130958.html
seconds. Many everyday users, es-
pecially those in the rural area and Figure 1. A sample of logos for popular short video apps in China.
small cities, record short videos and
share them via these apps.p The apps
provide a method for users to present
themselves, compete to gain fans and
eyeballs, and eventually profit from
the viewership. Kuaishou, with daily
active users of close to 100 million,
is the current leader in this catego-
ry. Using a slightly different format,
Douyin has recently received consid-
erable attention and gained a signifi-
cant user base, claiming more than
150 million monthly active users.
o Short Video Report from iFeng; http://tech.if-

eng.com/a/20180104/44831545_0.shtml
p QuestMobile: China Lower Tier City Post-90s’
Mobile Life; https://www.questmobile.com.
cn/blog/en/blog_138.html
designed for and used exclusively on ests, follows, claps, love, comments,
mobile devices. and gifting. The app platforms com-
Most people using online video— pete by paying large sums of money
premium or UGC—are quite young. to the popular “VIPs,” encouraging
According to iResearch, about 83% of them to set up channels on their plat-
iQiyi’s mobile users in August 2017 form.
were younger than 35. The demo- For a short video app user, the
graphic is similar for short videos, number of fans defines success, and
with more than 80% of users younger the top “VIPs” have more than 30 mil-
than 35. lion fans. To become a “VIP,” users
A major difference between the compete to create attractive, frequent
premium video products in China
and in the U.S. is the emphasis on so-
content, and stimulate excitement
with their fans. A slogan popular
cial features. If you look at the video among the top users is “300 clips a
discovery or display pages carefully,

you will find sharing buttons on every
day!”
page, linking to every possible social

network. Excerpts from the longer
Technical Opportunities
and Challenges
shows, like songs or jokes, are often With the prevalence and popularity
very popular clips for users to share. of video streaming, both infrastruc-
These services also provide interac- ture and application companies in
tion such as screen bullets, com- China have invested heavily in video
ments and love buttons, to improve technology. Generally, a video startup
the social experience while watching time of less than two seconds is con-
(Figure 2). sidered “good;” of course, one needs
Another type of social interac- to consider the video quality, device
tion for the premium video service type, network situation, and other
is with media stars. iQiyi PaoPao is a factors to make a specific judgment.
place for celebrities to interact with According to a 2017 report from
their fans. Hundreds of movie and China Broadband Development Alli-
TV stars have their home pages set ance, the average video startup time
there, with 20 million fans follow- (VST on broadband) was between 0.6
ing the top stars. iQiyi claims 600 to 1 second, which is better than the
million active users in PaoPao, who world standard.
are proven to be more sticky, watch- What is unique about China’s In-
ing 20% more videos on average than ternet market that fosters technical
other users. advancements and innovation? We
For short video apps, a signifi- can call out a few examples: mobile
cant emphasis is on social functions. dominance, the huge number of users
These short video apps provide many and available user data, the massive
ways for interaction, including dis- scale of user-generated content, regu-
covering people with similar inter- latory requirements, ever-changing
Figure 2. A typical page from the iQiyi website with bullet screen.

user interests, and extremely fierce For short forms, users upload puter vision. Instead of inventing
competition. videos to platforms every day, which new algorithms, the practical solu-
Features more suitable for mobile may contain inappropriate content tion was to apply the PC applicable
viewers, like usage scenarios that such as pornography, content that algorithm to mobile. For example, to
support watching movies while com- infringes on copyrights, and duplica- improve speed, or to use local data,
muting on public transit, add to the tions. Although many of the media a ML model inference for object rec-
success of these platforms. For exam- companies have thousands of hu- ognition may need to be done on a
ple, all premium video platforms pro- man editors, it is difficult to manu- mobile phone. In such circumstanc-
vide offline viewing as a default fea- ally examine all contents in real time. es, the size of the ML model must be
ture, while short video apps provide To react quickly to the market, video small enough to fit, and even better,
offline information such as news for companies developed adaptive ma- optimized to the phone hardware.
people to view while not connected. chine models that work together with China’s streaming video compa-
All video providers rely on person- human editors to prescreen and filter nies continue to explore many other
alized recommendation; it is espe- out potentially problematic videos. ways to innovate with technology. For
cially important for short videos as Necessity also prompted the rise of example, Tencent tried to use robots
the main method to discover videos high-tech providers such as Sense- to write articles for live news, and
of interest. A commercial recom- Time that specialize in image and Youku tested auto caption transla-
mendation system employs large- video reviewing technology.v tion. Technology is also used to pre-
scale machine learning on real-time Another important application of dict user reactions and suggest con-
streaming data and tries to optimize AI is to add special effects to user- tent investment in media. iQiyi, for
metrics such as the click-through generated videos. For example, Chi- example, claimed to have cast ac-
rate, time spent, and user reten- nese users often want features such as tors in their original shows based on
tion. With the availability of large beautifying faces, adding special cos- AI predictions.
amounts of data from both first- and tumes, and changing backgrounds.
third-party vendors, one thing that To do so, one needs to detect facial Summary
distinguishes recommendation sys- key points and perform highly accu- Online video, or online media in a
tems on China’s Internet from the rate face and hair segmentation. The larger setting, is one of the best plac-
rest of the world is its sheer scale: related technologies, such as style es where technology creatively meets
Alibaba claims its machine-learning transfer and object segmentation, user experience. Through online me-
platform—eXtreme Parameter Server are active areas of research in com- dia, there are infinite ways to connect
(XPS)—processes 10B samples and puter vision. Significant progress has hundreds of millions of users to bil-
100B features daily,t while the Tout- been made in recent years using deep lions of pieces of information. This
iao platform claims tens of billions learning. In addition, augmented re- is an exciting time for China’s online
of features and billions of vectors.u ality (AR) also has a lot of interesting video market, as it has just realized
By the same token, strong recall tech- usage on video, like blending fake ob- mass-market adoption. We look for-
nology is developed to select the top jects with video backgrounds. These ward to the new technological inno-
few thousand results from millions technologies help users create more vations to come.
of potentially low quality or redun- interesting videos that are better fit
dant user-generated content. for sharing. w Buying while watching videos; https://www.
In general, machine learning is To monetize the contents, all huxiu.com/article/161897.html
used to annotate, classify, and ana- video platforms use computational
lyze video content, and to build user advertising to display ads targeting Yue Zhuge is Vice President of Research and
Development and General Manager at Hulu, Beijing.
profiles based on the user’s geographi- users based on their personalized
cal location and browsing history. It interests. In addition, many innova-
then uses such information to match tive video technologies are used in
a user to videos that reflect their in- video ads, for example, ads overlaid
terests. Instead of a ‘pull’ or ‘sub- on the videos, ads inserted in live
scribe’ model, many of the Chinese broadcasts, ads on bullet screens,
short video apps ‘push’ the rele- 360 ads integration with direct sell,
vant content to viewers’ home pages. and more. For example, one video ad
Since most Chinese viewers are quite service provider—Video++—claims
receptive to pushed information, rec- to have more than 9,000 clients using
ommendation technology is very ef- their technology to integrate direct
fective, and users can indulge in con- sell ads into streaming video.w
tent they like for hours at a time. Mobile video apps have also pre-
sented interesting technical chal-
lenges to machine learning and com-
t Alibaba’s eXtreme Parameter Server; http://m.
sohu.com/a/210104407_473283
u CSDN: Recommendation System in Toutiao; v Sensetime: Video Content Review; https://
https://bit.ly/2NJVS5n www.sensetime.com/core#2 © ACM 0001-0782/18/11 $15.00.
china region
DOI:10.1145/3239556
BY YUTONG LU/SUN YAT-SEN UNIVERSITY, NSCC-GZ,

DEPEI QIAN/BEIHANG UNIVERSITY, SYSU,
HAOHUAN FU/TSINGHUA UNIVERSITY, NSCC-WX,
WENGUANG CHEN/TSINGHUA UNIVERSITY
Will
Supercomputers
Be Super-Data
and Super-AI
Machines?
technologies must be addressed to help
develop next-generation exascale super-
computing systems.
Supercomputer
Development in China
HIGH-PERFORMANCE COMPUTING (HPC) plays an
The first supercomputer developed
important role in promoting scientific discovery, in China was Yinhe-I in 1983, with
addressing grand-challenge problems, and promoting 1MFlops peak performance, by the Na-
tional University of Defense Technology
social and economic development. Over the past (NUDT). China has since continued its
several decades, China has put significant effort supercomputer development.
Three major teams in China—Tian-
into improving its own HPC through a series of key he, Sunway, and Sugon, like IBM, Cray,
projects under its national research and development and Intel in the U.S.—have developed
program. Development of supercomputing systems a series of domestic supercomputing
systems, including Dawning 4000A
has advanced parallel applications in various fields (2005, 11.2TFlops); Tianhe-1A (2011,
in China, along with related software and hardware 4.7PFlops, number one in the TOP500);
Sunway BlueLight (2011, 1PFlops);
technology, and helped advance China’s technological Tianhe-2 (2013, number one in the
innovation and social development. TOP500 six times); and Sunway Taihu-
To meet the requirements of multidisciplinary Light (2016, number one in the TOP500
four times). Chinese supercomputers
and multidomain applications, new challenges in have adopted multiple architectures,
architecture, system software, and application including vector, SMP, ccNUMA, MPP,

cluster, heterogeneous-accelerated, country’s 12th Five Year Plan. The first adopted by the TaihuLight system in-
and many-core. Their developers have stage of Tianhe-2 was completed in ear- clude highly scalable heterogeneous
thus acquired rich knowledge of super- ly 2013, delivering peak performance of architecture, high-density system inte-
computing hardware and software and 55 petaflops and Linpack performance gration, a high-bandwidth multi-level
trained a large number of engineers of 33.9 petaflops. It was a hybrid system network, highly efficient DC power
along the way. consisting of Intel Xeon processors and supply, and customized water cooling.
In the years since the Yinhe-I system Xeon Phi accelerators and claimed the It also represents a new milestone in
in 1983, China has achieved the leading top position in the TOP500 six consecu- China’s HPC history for being a 100PF
position in supercomputer develop- tive times, from June 2013 to November system implemented completely with
ment worldwide. For example, Tianhe-2 2015. In 2017, NUDT deployed the new homegrown processors.
and Sunway TaihuLight held the top po- 128-core Matrix 2000 processor and Two 100PF systems have been in-
sition in the TOP500 from 2013 to 2017. applied it to upgrading Tianhe-2. The stalled at two national supercomputing
At the same time, the number of HPC upgraded system, called Tianhe-2A, centers, with Tianhe-2A at the National
systems in China increased dramati- delivered 100.68 petaflops peak perfor- Supercomputing Center in Guangzhou
cally, exceeding the number of HPC mance and approximately 61 petaflops and Sunway TaihuLight at the National
systems in the U.S., as of the June 2018 Linpack performance. Supercomputing Center in Wuxi, re-
TOP500 ranking. And Chinese HPC The second 100 petaflops system is spectively. Moreover, extra effort has
manufacturers Lenovo, Sugon, Inspur, the many-core-based Sunway Taihu- gone toward increasing the user popu-
and others have claimed significant Light that delivers 125 petaflops peak lation and developing applications.
shares of the market for HPC systems performance and 93 petaflops Linpack Efficient HPC software stack. With
and high-end servers. performance. Sunway TaihuLight was development of domestic supercom-
China’s leading-class supercom- implemented with the homegrown puting systems, China has established a
puter systems. Two 100PF computers— many-core processor SW26010—a self-controllable system software stack
Tianhe-2 and Sunway TaihuLight— 3Tflops chip with 260 cores—ranking covering basic drivers, operating sys-
were developed with support from the in first place four times, from 2016 to tem, compilers, communication soft-
National High-Tech R&D Program in the 2017, in the TOP500. Key technologies ware, basic library, parallel program-
ming environment, parallel file system, tax targeting the CPE cluster; the other
resource management, and scheduling is a high-performance yet lightweight
system, thus providing a comprehen- thread library called Athread that ex-
sive capability for large-scale system ploits fine-grain parallelism. With a
construction and performance tuning. byte-to-flop ratio five to 10 times less
The Tianhe-2 software stack consists than other top-five systems, the system
of four components: a system environ- needs extraordinary memory-related in-
ment, an application-development novation to deal with the memory wall
environment, a runtime environment, to scale its simulation capability with
and a management environment. The 125 Pflops computing performance. It
China must rely on system environment consists of the also needs software migration to such
self-controllable 64-bit Kylin OS and H2FS parallel file
system and a resource-management
an architecture, with radical changes in
both compute and memory hierarchy.
technologies, system. Various job-scheduling policies For each CPE, instead of hardware L1
especially for and resource-allocation strategies have

been implemented so system through-
cache, the system includes user-con-
trolled 64-KB local data memory (LDM)
basic hardware put and resource utilization can be en- that completely changes the memory
perspective for programmers.
components
hanced. The application-development
environment supports multiple pro- Effect on HPC industry. The rapid
like processors, gramming languages, including C, C++,
Fortran 77/90/95, a heterogeneous pro-
development of China’s supercomput-
ing systems has benefited from the
memory, and gramming model called OpenMC, and continuous support of several national
interconnect the traditional OpenMP and MPI pro-
gramming models. The THMPI is an up-
five-year plans, as well as the country’s
economic development and national
networks, dated version based on MPICH over the strength. The systems support scientific
to build an Tianhe Net communication protocols,
able to deliver 12GB/s P2P bandwidth
research, technological breakthroughs,
and an industrial revolution while pro-
exascale system. at the user level. The runtime environ- moting development and expansion
ment consists of the parallel numeri- of the IT server sector. Vendors Inspur,
cal toolkit for scientific applications, Lenovo, Huawei, and others have taken
a scientific data-visualization system, advantage of research and development
and an HPC application service and of domestic HPC kernel technologies,
cloud-computing platform. It provides including systems integration, storage
runtime support to multiple fields, in- architecture, interconnection technol-
cluding scientific and engineering com- ogy, optimization techniques, system
puting, big data processing, and high- testing and benchmarks, and applica-
throughput information service. tion technologies. At the same time,
To support both HPC and big data a large number of HPC hardware and
applications, the Sunway TaihuLight software engineers have been trained
also includes highly efficient schedul- for IT companies in China, including
ing and management tools and a rich Alibaba, Baidu, and Tencent. The Chi-
set of parallel programming languages nese IT industry has also benefited from
and development environments for ap- the technological innovation resulting
plication research and development. from supercomputer development,
A two-level “MPI+X” approach helps including high-performance clusters,
devise the right parallelization scheme HPC-enabled cloud computing, distrib-
for mapping the target application onto uted computing, and application opti-
the processes and threads that utilize mization.
more than 10 million of the system’s
cores. The 260-core SW26010 processor Key Applications and Beyond
consists of four core groups (CGs), with Along with the rapid development
each CG including one management of hardware systems, major break-
processing element (MPE) and one throughs have been made in applica-
computing processing element (CPE) tion development based on the new
cluster with eight-by-eight CPEs. Each supercomputers, covering both tra-
CG usually corresponds to one MPI pro- ditional HPC domains like climate,
cess. Within each CG, the system has seismology, computational fluid
two options: one is Sunway OpenACC, dynamics, and fusion and relatively
a customized parallel compilation new applications like big data and
tool that supports OpenACC 2.0 syn- artificial intelligence (AI).
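
The core counts quoted for the SW26010 and the two-level "MPI+X" mapping can be checked with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the chip count of 40,960 is an assumption taken from publicly reported TaihuLight configurations, not from this article, which says only that the system has more than 10 million cores.

```python
# Figures from the text: four core groups (CGs) per chip, each with one
# management processing element (MPE) and an 8x8 cluster of computing
# processing elements (CPEs).
CORE_GROUPS = 4
MPES_PER_GROUP = 1
CPE_ROWS, CPE_COLS = 8, 8


def cores_per_chip() -> int:
    """Cores on one SW26010: 4 * (1 MPE + 64 CPEs)."""
    cpes_per_group = CPE_ROWS * CPE_COLS
    return CORE_GROUPS * (MPES_PER_GROUP + cpes_per_group)


def mpi_plus_x_layout(chips: int) -> dict:
    """Hypothetical helper: one MPI process per CG, one thread per CPE."""
    return {
        "mpi_processes": chips * CORE_GROUPS,
        "threads_per_process": CPE_ROWS * CPE_COLS,
        "total_cores": chips * cores_per_chip(),
    }


# ASSUMPTION: 40,960 chips (one per node); this number is not in the article.
ASSUMED_CHIPS = 40_960

layout = mpi_plus_x_layout(ASSUMED_CHIPS)
print(cores_per_chip())  # 260, matching the "260-core" description
print(layout)
```

Under that assumed chip count the total comes to roughly 10.6 million cores, which is consistent with the "more than 10 million" figure in the text.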

Atmospheric modeling. Large-scale

simulation of the global atmosphere is
one of the most computationally chal-
lenging problems in scientific comput-
ing. The Tianhe-1A hybrid CPU-GPU
system launched a continuous devel-
opment effort toward highly scalable
atmospheric dynamic solvers on het-
erogeneous supercomputers, achiev-
ing sustained double-precision per-
formance of 581 Tflops on Tianhe-1 by
efficiently using the CPU and GPU re-
sources on 3,750 nodes. The work was
later extended to the Tianhe-2 system,
scaling to the 8,644 nodes of Tianhe-2,
achieving 3.74 Pflops performance with
CPUs and MICs.
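
One way to put the two results above on a common footing is to divide sustained performance by node count. The short sketch below does only that, using the figures quoted in this paragraph; the function name is hypothetical, and the comparison is rough, since the two machines use different accelerators and the runs are not otherwise normalized.

```python
# Sustained double-precision figures quoted in the paragraph above.
RUNS = {
    "Tianhe-1A": {"sustained_tflops": 581.0, "nodes": 3750},
    "Tianhe-2": {"sustained_tflops": 3740.0, "nodes": 8644},  # 3.74 Pflops
}


def per_node_gflops(sustained_tflops: float, nodes: int) -> float:
    """Average sustained Gflops per node (1 Tflops = 1,000 Gflops)."""
    return sustained_tflops * 1000.0 / nodes


for name, run in RUNS.items():
    print(f"{name}: {per_node_gflops(**run):.1f} Gflops per node")
```

On these numbers the Tianhe-2 run sustains a little under three times the per-node throughput of the Tianhe-1A run.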
In 2016, the solver effort migrated to
the Sunway TaihuLight supercomput- The Tianhe-2 supercomputer is installed at the National Supercomputer Center in Guangzhou.
er, with a highly scalable fully implicit
solver for cloud-resolving atmospheric applications. Especially for earthquake space coverage and local-search optimi-
simulations. The solver supports fully simulation, which requires both a large zation. A typical data scale of 40 million
implicit simulations with large time amount of memory and high memory molecules requires more than 800TB.
steps at extreme-scale resolutions and bandwidth, breaking the memory wall With the need to handle approximately
encapsulates novel domain decompo- becomes the top challenge. To resolve 40 million small files, the optimized de-
sition, multigrid, and ILU factoriza- this bandwidth constraint, Chinese sign on Tianhe-2 takes advantage of the
tion algorithms for massively parallel researchers have performed three no- high throughput of the H2FS file system,
computing. With both algorithmic table optimizations: a customized par- I/O-congestion control, multi-stage task
and optimization innovations, the allelization scheme that employs the scheduling, task-pool management,
solver scales to 10.5-million heteroge- 10-million cores efficiently at both the asynchronous I/O, and communication
neous cores on Sunway TaihuLight at process level and the thread level (to to improve application scalability. The
an unprecedented 488-m resolution with address the scale challenge); an elabo- design is able to screen 35 million can-
770-billion unknowns, sustaining 7.95 rate memory scheme that integrates didate drug molecules against the Ebola
PFLOPS performance in double-preci- on-chip halo exchange through regis- virus in 20 hours. The parallel efficiency
sion with 0.07 simulated-years-per- ter communication, optimized block- from 500 to 8,000 nodes (1.6 million
day. Considered a major breakthrough, ing configuration guided by an analytic hybrid cores) is over 84%. Such compu-
it won the ACM Gordon Bell Prize in model, and coalesced DMA access with tational capability demonstrates how
2016, the first time in 29 years Chinese array fusion (to alleviate the memory Tianhe-2 is able to screen all known 40
researchers were so recognized. constraint); and on-the-fly compression million drug molecules against an un-
Earthquake simulation. Earthquake that doubles the maximum problem known virus in a single day.
simulation is another traditional ma- size and further improves performance Large-scale graph computing. With
jor challenge for supercomputers. by 24% (to further address the memory increasing demand for graph process-
Starting with AWP-ODC and CG-FDM wall). The extreme cases demonstrate ing, both Sunway TaihuLight and Tian-
codes, Chinese researchers have de- sustained performance greater than he-2 have earned Graph500 breadth-
veloped nonlinear earthquake simula- 18.9 Pflops, enabling simulation of the first-search (BFS) scores. Sunway
tion software on Sunway TaihuLight, Tangshan earthquake through an 18Hz TaihuLight ranks second at 23,755.7
winning the ACM Gordon Bell Prize in scenario with eight-meter resolution. giga-traversed edges per second
2017. While TaihuLight delivers an un- Drug design. Virtual high-through- (GTEPS), and Tianhe-2 ranks tenth.
precedented level of computing power put screening is an established com- In addition, the graph-processing
(three times that of Tianhe-2 and five putational method for identifying drug framework ShenTu was developed
times that of Titan), its memory sys- candidates from a large collection of on the Sunway TaihuLight, allowing
tem is relatively modest. Total memory compound libraries, accelerating the users to write vertex-centric graph-
size is similar to other systems (such drug-discovery process. When diseases processing programs and scale out
as Piz Daint and Titan, two GPU-based and unknown viruses appear, it is es- the computing to the whole Sunway
systems), with a significantly lower pecially useful for screening as many TaihuLight machine. The framework
byte-to-flop ratio, as compared to 1/5 molecules as possible to help identify can support such graph algorithms as
in other heterogeneous systems and an effective treatment. The kernel al- PageRank, Shortest Path, BFS, and K-
1/10 in the K Computer. Such a system gorithm is the Lamarckian Genetic Al- Core with just 20 lines of code. It can
represents both high potential and no- gorithm, a combined local search and process graphs with 10 trillion edges in
table challenges for scaling scientific genetic algorithm for efficient global- tens of seconds. For example, ShenTu
can complete one round of page rank- by the Tianhe team, Sunway team, and
ing in 21 seconds on a 12-trillion-edge Sugon team, respectively, pursuing
real-world Web graph, an order-of- novel architectures, kernel technology
magnitude performance improvement breakthroughs, and possible technical
on graphs that are one order of magni- approaches for implementing future
tude larger than prior work. exascale systems. Carried out from 2016
Deep-learning applications. In ad- to 2018, the projects were completed by
dition to traditional applications, ef- the end of June 2018. The second step is
forts are under way to explore the po- to select two of the three to develop ex-
tential of training complex deep neural ascale systems by the end of 2020.
The Chinese networks (DNNs) on these heteroge- The project aims to develop self-
government is neous supercomputers. For example,
there is a highly efficient library on
dependent and controllable kernel
technologies for exascale computing
encouraging swDNN on Sunway TaihuLight for ac- and maintain China’s leading posi-
development celerating deep-learning applications.

By identifying the most suitable ap-
tion in global HPC; develop a number
of critical HPC applications and build
of the kernel proach for mapping the convolutional a national software center, establish-
ing an HPC-application ecosystem; and
technologies,
neural networks (CNNs) onto the 260
cores within the chip, swDNN achieves build a national HPC environment with
including high- double-precision performance greater
than 1.6Tflops for the convolution
world-leading resources and services.
The Chinese exascale system will aim
performance kernel, which is over 54% of the theo- to achieve the following specification:
processor/ retical peak of the SW26010 processor.
Parallel training is supported through
peak performance of 1EFlops, node per-
formance greater than 10TFlops, mem-
accelerator, novel swCaffe, a redesigned version of Caffe, ory capacity greater than 10PB, storage
memory devices, for large-scale training on up to 1,000
Sunway nodes.
capacity of 1EB, interconnection net-
work bandwidth greater than 500Gbps,
and interconnect Some deep-learning applications Linpack efficiency over 60%, and energy
networks. run on Tianhe-2, including for tumor

diagnosis, video analysis, and intelli-
efficiency greater than 30GFlops/W.
Moreover, the system should include
gent transportation. One application an easy-to-use parallel programming
called “trade business of Guanghzou” environment, monitoring and fault-tol-
supports 900 million deals annually. erance management, and support for
large-scale applications.
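The exascale targets quoted above imply a few derived figures that are easy to check by hand. The Python sketch below is illustrative only. The constants are the lower bounds stated in the specification; the function name and the decision to apply the energy-efficiency target to both peak and Linpack-sustained performance are assumptions made for the example, since the text does not say which rate the 30GFlops/W figure refers to.

```python
# Lower bounds taken from the stated specification.
PEAK_FLOPS = 1e18          # 1 EFlops peak performance
NODE_FLOPS = 10e12         # more than 10 TFlops per node
MEMORY_BYTES = 10e15       # more than 10 PB of memory
FLOPS_PER_WATT = 30e9      # more than 30 GFlops/W
LINPACK_EFFICIENCY = 0.60  # more than 60% of peak


def derived_targets():
    """Return back-of-envelope figures implied by the specification."""
    max_nodes = PEAK_FLOPS / NODE_FLOPS
    sustained_flops = PEAK_FLOPS * LINPACK_EFFICIENCY
    return {
        # Faster nodes would mean fewer nodes, so this is an upper bound.
        "max_nodes": max_nodes,
        # Memory per node if exactly max_nodes nodes share 10 PB.
        "memory_per_node_gb": MEMORY_BYTES / max_nodes / 1e9,
        # Byte-to-flop ratio at peak (a lower bound, since memory > 10 PB).
        "bytes_per_flop": MEMORY_BYTES / PEAK_FLOPS,
        # Assumption: efficiency target applied to peak performance.
        "power_mw_at_peak": PEAK_FLOPS / FLOPS_PER_WATT / 1e6,
        # Assumption: efficiency target applied to Linpack-sustained rate.
        "power_mw_at_linpack": sustained_flops / FLOPS_PER_WATT / 1e6,
    }


if __name__ == "__main__":
    for name, value in derived_targets().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the specification corresponds to at most 100,000 nodes, about 100GB of memory per node, a byte-to-flop ratio of roughly 0.01, and a power budget between about 20MW and 33MW depending on which performance figure the efficiency target is measured against.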
Toward Next-Generation Systems
As of July 2018, the Summit supercomputer (powered by IBM POWER9 and Nvidia V100 processors) was ranked number one in the TOP500, achieving 122PF LINPACK performance and 3.3Exaops for data processing and AI applications at half precision. However, a number of planned systems will soon surpass it. In the past few years, several countries have targeted exascale computing, including ECP in the U.S., Post K in Japan, and EuroHPC in the E.U., aiming for breakthroughs in key technologies, including novel architecture, high energy efficiency, system software, and exascale applications. These efforts lead the way toward next-generation supercomputing systems.
China’s exascale project. The key HPC project in China’s 13th five-year research and development program was launched two years ago to pursue a two-step strategy for developing exaflops supercomputing. The first step aims to deploy three prototype exascale computers

Our approach. Exascale computing must address unprecedented technical challenges worldwide, including the memory wall, communication wall, reliability wall, energy-consumption wall, and programming wall. A strategy of hardware and software co-design will thus be required. For example, new algorithms will be proposed and implemented with the target hardware features in mind. Resilience will be addressed through fault-tolerant hardware design and fast failure detection and recovery enabled by software.
China must also rely on self-controllable technologies, especially for basic hardware components like processors, memory, and interconnect networks, to build an exascale system. The Chinese microelectronics and IC industry is still relatively weak, thus calling for more basic research and technology development. Also, China must satisfy various complex application needs and deal with a huge and highly diverse market, thus calling for multiple design and

development approaches. The current

key HPC project relies on architectural
innovation, technology breakthroughs,
and hardware and software coordina-
tion to address these challenges. Novel
architectures will be explored to ad-
dress the requirement of the various
applications. Engineering trade-offs
will be necessary to balance metrics
in power consumption, performance,
programmability, and resilience. Tech-
nology breakthroughs will be pursued
through comprehensive research ef-
forts. Special attention will target appli-
cation software.
The Chinese government is encour-
aging development of the kernel tech-
nologies, including high-performance
processor/accelerator, novel memory The TaihuLight supercomputer is installed at the National Supercomputer Center in Wuxi.
devices, and interconnect networks.
The effort toward self-controllable pro- launched an initiative in big-data sci- being implemented in the U.S., Japan,
cessor technologies include Sunway’s ence to research computational models, and Europe, aiming to deliver exa-
SW many-core processor, NUDT’s FT se- algorithms, and platforms for data ana- flops computers in three to five years.
ries CPU and Matrix series accelerator, lytics and processing. Related projects Their effort is like mountain climbing.
and Sugon’s X86 AMD-licensed proces- focus on such big-data-related fields as Climbers can enjoy the magnificent
sor. NUDT has developed its proprietary video processing, health and medicine, scenery only when they get to the top
interconnect network TH-Net with high intelligent transport, finance, govern- following their arduous journey. The
bandwidth and low latency, making the ment administration, and intelligent Chinese HPC community is willing to
TH-2 system efficient and scalable. The education. And an upcoming national work with the international HPC com-
Sunway system also includes its own research initiative on AI will call for munity to pursue the goal of exascale
self-designed large-scale network, en- HPC support for AI applications. The computing, sharing the experience
abling the TaihuLight system to run ef- scope of HPC applications will definite- and jointly attacking the grand chal-
ficiently on 10 million cores. More new ly broaden in the future. lenges. HPC should not be a new kind
technology breakthroughs are still of arms race but technology that ben-
needed to support successful develop- Conclusion efits all people.
ment of exascale systems. Parallel computers and parallel applica- Chinese researchers also need to be
The key HPC project also targets ap- tions have cross-pollinated each other aware of new technologies and applica-
plications focusing on climate change, in China for the past 15 years. The avail- tions. The emergence of big data and
ocean simulation, combustion, elec- ability of leading-class supercomputers AI brings new challenges and opportu-
tromagnetic-environment simulation, has stimulated the growth of parallel nities to HPC. Supporting big data and
oil exploration, material science, astro- applications in a number of fields, an AI with HPC while being rewarded by
physics, and life science. A new compu- application- and technology-driven- big-data- and AI-enabled technologies
tational model and algorithm will be growth trend that will continue into the for HPC should drive coordinated and
designed, and the efficiency, scalabil- future. converged development of all three.
ity, reliability of the applications will be How to maintain sustainable devel- All should take this opportunity to em-
evaluated for future exascale systems. opment toward the next generation of brace this new exciting era of supercom-
The pervasive use of HPC has pro- supercomputing in China is an open puting.
moted development of large-scale par- question. Though significant progress
allel software. Chinese researchers are has been made in recent years, China is Yutong Lu is a professor of data and computer science
at Sun Yat-Sen University and Director of the National
strengthening development of system still behind Western countries in HPC Computing Center in Guangzhou.
software and application software for in many respects. A long-term national Depei Qian is a professor and Dean of Data and Computer
domestic hardware systems, aiming to Science at Sun Yat-Sen University, Guangzhou and a
plan on HPC is needed that would allow professor of computer science and engineering at Beihang
establish the country’s own HPC eco- more systematic deployment of HPC re- University, Beijing.
system. search. A mechanism that would ensure Haohuan Fu is professor in the Department of Earth
System Science at Tsinghua University, Beijing, and
Emerging big data and AI applica- sustainable development of the nation- Deputy Director of the National Supercomputing Center
tions have also gained the attention al HPC infrastructure must be estab- in Wuxi.
of Chinese HPC research programs. lished so the supercomputing centers Wenguang Chen is a professor in the Department of
Computer Science and Technology at Tsinghua University,
The National Natural Science Founda- do not have to struggle to find the mon- Beijing.
tion of China, the counterpart of the ey needed to run the supercomputers.
U.S. National Science Foundation, has Exascale computing projects are © 2018 ACM 0001-0782/18/11 $15.00
practice
DOI:10.1145/3233233
Article development led by queue.acm.org

Corp to Cloud: Google’s Virtual Desktops
How Google moved its virtual desktops to the cloud.
BY MATT FATA, PHILIPPE-JOSEPH ARIDA, PATRICK HAHN, AND BETSY BEYER
IMAGE BY EVERYTHING POSSIBLE

OVER ONE-FOURTH OF Googlers use internal, datacenter-hosted virtual desktops. This on-premises offering sits in the corporate network and allows users to develop code, access internal resources, and use GUI tools remotely from anywhere in the world. Among its most notable features, a virtual desktop instance can be sized according to the task at hand, has persistent user storage, and can be moved between corporate datacenters to follow traveling Googlers.
Until recently, our virtual desktops were hosted on commercially available hardware on Google’s corporate network using a homegrown open source virtual cluster-management system called Ganeti (http://www.ganeti.org/). Today, this substantial and Google-critical workload runs on Google Cloud Platform (GCP). This article discusses the reasons for the move to GCP, and how the migration was accomplished.
While Ganeti is inexpensive to run, scalable, and easy to integrate with Google’s internal systems, running a do-it-yourself full-stack virtual fleet had some notable drawbacks. Because running virtual desktops on Ganeti entailed managing components from the hardware up to the VM manager, the service was characterized by:
˲˲ Long lead times to expand fleet capacity;
˲˲ Substantial and ongoing maintenance overhead;
˲˲ Difficulty in staffing the team, given the required breadth and depth of technologies involved;
˲˲ Resource waste of underlying hardware, given working hours for a typical Googler are 8–10 hours a day; and
˲˲ Duplication of effort in the virtualization space at Google across multiple divisions.
Taken together, these issues reduced the time and resources available to the team tasked with improving the offering. To tackle these problems, the team began migrating Google’s top corporate workload to GCP in early 2016.

Planning
Planning for the migration consisted of several discrete stages. In aggregate, the planning phases described here took approximately three to four months and involved the participation of approximately 15 subject-matter-expert groups.
Vision. This phase articulated the core business and engineering case for replatforming our virtual desktops to GCP. The top reasons for pursuing virtual desktops on Cloud included:
˲˲ Large reduction in engineering toil;
˲˲ Improved user experience;
˲˲ Reduced total cost of platform ownership; and
˲˲ Desire to improve GCP.
Customer user journeys. The next step was to study the users and their needs. The primary users—the virtual desktop owners, fleet Site Reliability

Engineers (SREs), and the fleet security managers—were identified and surveyed to determine their typical workflows. Using this information, the migration team wrote implementation-agnostic user journeys. To perform effective gap analysis, and to reduce bias during the design phase, the team made a conscious effort to describe user journeys in a purely functional fashion.
Production milestone definition. Based on the survey responses and usage patterns collected, the team grouped customer user journeys (by both technology area and user type) and prioritized them into bands of features labeled alpha, beta, and general availability.
Workstream definition. In parallel to milestone definition, the team grouped requirements into seven streams of related work such as networking and provisioning. Each workstream was assigned a technical lead, a project lead, and a skeleton staff.

could not deliver a feature in time for a release milestone, they implemented bridging solutions that favored solutions the public could use (for example, Forseti Security) above Google-only workarounds.
Workstream work breakdown and staffing. With design proposals in place and implementation directions decided, the team created detailed work plans for each workstream in a central project management tool. The work was organized by customer user journey, and tasks were broken down by objective, key results, and quarter. Drilling down to this level of detail provided enough information to estimate the staffing required for each workstream, to understand interdependencies between streams, and to fine-tune the organization as needed.

Technical Implementation Details
Once planning was complete, the team was ready to begin implementing the technical details of the migration.

receives a physical machine (or even a virtual machine on privileged corporate networks), the user can initially log in and request a certificate because the corporate network retains the level of privilege needed to sync login policies. Extending this level of network privilege to public cloud IP ranges is undesirable for security reasons.
To bridge this gap, Google developed a (now-public) API to verify the identity of instances (https://cloud.google.com/compute/docs/instances/verifying-instance-identity). The API uses JWTs (JSON Web tokens; JSON is the JavaScript Object Notation data-interchange format) to prove that an instance belongs to a preauthorized Google Cloud project. When an instance first boots, one of the JWTs provided by that API can be exchanged for a client certificate used to prove device identity, unblocking nearly all of the normal communication paths (including syncing login policies for user authorization).
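The instance-identity check described above can be pictured as a claim check on the decoded JWT. The Python sketch below is a minimal illustration, not Google's implementation. It assumes the identity token carries the project under a nested `google.compute_engine.project_id` claim, and the function names, audience URL, and project names are hypothetical. It deliberately omits signature verification, which a real verifier must perform against the issuer's published public keys before trusting any claim.

```python
import base64
import json
import time


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_claims(token: str) -> dict:
    # Returns the UNVERIFIED payload. Signature checking is omitted in this
    # sketch and is required in any real deployment.
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def is_preauthorized(claims, allowed_projects, expected_audience, now=None):
    # Accept only unexpired tokens, minted for this verifier, whose instance
    # belongs to a preauthorized project (claim layout is an assumption).
    now = time.time() if now is None else now
    if claims.get("aud") != expected_audience:
        return False
    if claims.get("exp", 0) <= now:
        return False
    project = claims.get("google", {}).get("compute_engine", {}).get("project_id")
    return project in allowed_projects


def make_unsigned_demo_token(claims: dict) -> str:
    # Demo helper only: builds a token with an empty signature segment.
    def enc(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}."


if __name__ == "__main__":
    demo = make_unsigned_demo_token({
        "aud": "https://certs.example.test",
        "exp": time.time() + 300,
        "google": {"compute_engine": {"project_id": "corp-desktops"}},
    })
    ok = is_preauthorized(decode_claims(demo), {"corp-desktops"},
                          "https://certs.example.test")
    print("issue client certificate" if ok else "reject")
```

Only after a check of this kind succeeds would a certificate service hand back the client certificate that unblocks the remaining communication paths.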
Each team was virtual, recruited from This section describes the three main Once the client certificate is in
across reporting lines as needed to buckets of work. place, Google applications can be ac-
address the work domain. The flex- Background: Networking and Be- cessed via the BeyondCorp/identity-
ibility provided by this form of organi- yondCorp. Many of the networking aware proxy as if they were any other
zation and associated matrix manage- challenges of running a desktop service Internet-facing service. In order for
ment turned out to be essential as the on Google Compute Engine (GCE) were cloud desktops to reach the proxies
project evolved. at least partially solved by the Beyond- (and other Internet endpoints), the
Engineering prototyping gap analy- Corp program (https://cloud.google. team set up network address transla-
sis, and design proposals. Once com/beyondcorp/). In a BeyondCorp tion (NAT) gateways (https://cloud.
formed, each workstream examined model, access controls are based on google.com/compute/docs/vpc/special-
the critical user journeys in their do- known information about a given de- configurations) to forward traffic
mains and researched the feasibility vice and user rather than on the loca- from the instances to targets outside
of implementing these stories on GCP. tion in a privileged network. When of the cloud project. In combination,
To do so, the team performed a gap network trust no longer factors into these approaches allow users to access
analysis for each user journey by read- access-control decisions, many ser- internal resources and the public Inter-
ing product GCP documentation and vices become readily accessible from net seamlessly, without requiring each
running “fail-fast” prototyping sprints. outside of the corporate network— instance to have a publicly routed
Throughout this process, possible usually via a corporate laptop, but now IP address.
implementations were collected and also from appropriately managed and Provisioning. The first step in de-
rated according to complexity, feasibil- inventoried hosts on GCE. signing a provisioning scheme was to
ity, and (most importantly) how easily Enterprises that leverage traditional map out everything necessary to deliver
a customer external to Google could virtual private networks (VPNs) for re- an end product that met users’ needs.
implement this solution. mote access to applications will have a Compute Engine instances needed
Whenever the migration team ar- different networking experience when levels of trust, security, manageability,
rived at a “Google-only” solution, it moving desktops or other services. A and performance for users to perform
filed a feature request to the GCP team typical strategy is to set up a Cloud VPN their jobs—developing, testing, build-
requesting a solution that would work (https://cloud.google.com/compute/docs/ ing, and releasing code—as normal.
for customers outside of Google as vpn/overview) in a cloud project and Working from these requirements, the
well, especially if another enterprise peer with on-premises equipment to team used the following specific prin-
customer would be interested in such bridge the networks together. ciples to guide the rest of the design.
functionality. In this way, the team Host authentication and authori- Users should interact with a cloud
sought to “act like a customer” in an zation. Device authentication is usu- desktop similarly to how they interact
effort to make the platform enterprise- ally performed using client certificates with hosts on the corporate network.
ready. Where the GCP product teams deployed on the host. When a user Users should be able to use their

normal authentication mechanisms. ˲˲ Allowing users to interface natively

They should also be able to use the with cloud to create and manipulate
same tools to check machine statistics their instances. In this scenario, it
or report issues that they would use for would be necessary to observe these
physical desktops or Google’s legacy
virtual desktop platform. Enterprises that changes and make corresponding up-
dates in the corporate tools.
Instances must be securely invento-
ried. As a first step in the provision-
leverage traditional The team decided on the first ap-
proach and integrated existing virtual-
ing process, host inventory is boot- virtual private machine management tools with GCE.
strapped. As the hosts are used, further
inventory data is collected, both for
networks for As a result, they could enforce more
complex business logic in line with the
reliability monitoring and to inform remote access to user’s request.
access-control decisions.
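Creating instances on behalf of the requester can be pictured as a translation step: the user's request is combined with what the corporate systems already know about the user, resource caps are applied, and the result is tagged so that inventory systems can recognize a privileged virtual desktop creation request. The Python sketch below is a hedged illustration of that idea. Every name in it (`DesktopRequest`, `ROLE_CAPS`, `build_instance_request`, the label keys) and every cap value is hypothetical, and the returned dictionary is not the Compute Engine API request format.

```python
from dataclasses import dataclass

# Hypothetical per-role caps as (max vCPUs, max RAM in GB).
ROLE_CAPS = {
    "engineer": (16, 64),
    "default": (4, 16),
}


@dataclass(frozen=True)
class DesktopRequest:
    user: str
    vcpus: int
    ram_gb: int
    region: str


def build_instance_request(req: DesktopRequest, job_role: str, groups: list) -> dict:
    """Translate a user request plus known user data into a capped request."""
    max_vcpus, max_ram_gb = ROLE_CAPS.get(job_role, ROLE_CAPS["default"])
    return {
        "name": f"{req.user}-desktop",
        # Caps keep users from creating gratuitously large instances.
        "vcpus": min(req.vcpus, max_vcpus),
        "ram_gb": min(req.ram_gb, max_ram_gb),
        # Users choose the region, typically one close to them.
        "region": req.region,
        # Metadata that inventory systems can cross-reference later.
        "labels": {
            "owner": req.user,
            "role": job_role,
            "groups": ",".join(sorted(groups)),
            "privileged_desktop": "true",
        },
    }


if __name__ == "__main__":
    request = DesktopRequest(user="alice", vcpus=64, ram_gb=32, region="europe-west1")
    print(build_instance_request(request, "engineer", ["eng", "sre"]))
```

Keeping this translation in the corporate tooling, rather than letting users call the cloud API directly, is what allows the caps and labels to be enforced in one place.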
Google’s corporate network uses
applications will The workflows built around the
GCE provisioning process focus mostly
multiple inventory systems, cross- have a different on translating data provided by the re-
referenced to validate such access re-
quests (for more context on Beyond- networking questing user, plus data known about
the user (group membership, job role),
Corp’s inventory process, see https:// experience when into a request that can be passed to
research.google.com/pubs/pub43231.
html and https://research.google.com/ moving desktops GCE. The instance can then be tracked
through the creation process, as well as
pubs/pub44860.html). Therefore, cor-
porate systems need some metadata
or other services. during any first-boot operating system
updates and configuration, to make
during the provisioning process to in- sure a usable machine is delivered to
dicate a privileged virtual desktop cre- the user.
ation request. This metadata is then Operating system. Google uses an
cross-referenced with inventory data internally managed Linux distribu-
pulled from Google Cloud APIs in or- tion based on Debian, called gLinux,
der to evaluate compliance with secu- for corporate hosts. On its corporate
rity policy and assert trust. network, gLinux is installed by load-
Instances must be securely man- ing a bootstrap environment on first
aged. A host must be able to securely boot. Large parts of the root file sys-
download the information needed to tem are unpacked from a tarball, and
construct its authorization policies the Debian installer then performs the
such that only permitted users have actual installation.
access. Hosts must also be able to On Cloud, this process starts with an
update packages and configurations image created by gLinux release tool-
driven by the operating system instal- ing, which is uploaded to Cloud stor-
lation and must be able to send logs age and imported into GCE (https://
to a central location so irregular be- cloud.google.com/compute/docs/images/
havior or installation-related issues import-existing-image). Creating a
can be detected. disk from this image results in a fully
Instances must be created to user runnable and bootable file system. You
specifications, with constraints. Instanc- can boot directly from the imaged disk
es should be provisioned with enough and only need to perform some small
resources for users to do their work modifications on first boot before it’s
effectively but should also have caps fully usable: the file system grows to fill
to prevent users from gratuitously cre- the full span of the disk, the hostname
ating high-specification/high-cost de- updates, and a few other GCE-specific
vices. Users can choose where to build modifications are performed.
their instances (typically, in regions To avoid the burden of maintain-
close to their physical location; disas- ing and testing separate behavior on
ter-recovery reasons might dictate a different platforms, the team needed
different location). to minimize specific customizations
Because of the need to manage and to gLinux on Compute Engine. For-
inventory the devices in corporate sys- tunately, this effort required very few
tems, there were two options for creat- modifications, most of which focused
ing new instances once desktop was on DNS (Domain Name System)
migrated to cloud: name resolution. For example, corpo-
˲˲ Creating cloud desktop instances rate DNS zones, which many applica-
on behalf of the requester; or tions running on Google’s corporate
network require, are not available with the goal of improving market-
off-network. To address this need, the ing. For example, subjective feedback
team introduced a DNS resolver that on the relative performance of virtual
runs inside each instance to proxy re- desktops on Cloud versus the corpo-
quests for internal DNS zones back
to the corporate servers over HTTP A host must be rate network was split (when, in fact,
performance on Cloud was superior).
through a BeyondCorp proxy. able to securely In response, the team emphasized and
more heavily promoted benchmarking
Alpha and Beta Rollouts download the results to customers, with the hope of
Before beginning the migration to GCP
in earnest, the team conducted alpha
information needed prompting more objective valuations.
Surveys also helped quantify fears
and beta rollouts as initial sets of fea- to construct its about the transition, most notably
tures were ready. The alpha release tar-
geted roughly 100 users, while the beta
authorization around performance, user-data mi-
gration, and the ability to roll back to
release targeted roughly 1,000 users. policies such that the corporate network-hosted offer-
To evaluate the success of both
releases, the following metrics were only permitted ing when a given user workflow was
not supported.
tracked and compared with Google’s users have access. Based on survey feedback, the team
existing corporate fleet statistics: emphasized the following aspects to
˲˲ Pager load. There was a 95% drop make the migration to Google Cloud
in pager load once we migrated virtual attractive to users:
desktops to Google Cloud, in large part ˲˲ Improved VM specs. Users were of-
because of the platform abstraction fered large increases in standard CPU,
provided by GCP. While the team main- RAM, and disk specs.
tained a large fleet of physical servers, ˲˲ One-click personal data migration.
storage units, network equipment, and The migration process was easy to be-
support software to run virtual desk- gin with, and it automated the most
tops on the corporate network, GCE time-consuming part of users’ work-
removed all these concerns. flow: copying over personal data.
˲˲ Interrupts load. Initially, the mi- ˲˲ Easy rollback. The migration pro-
gration to cloud led to an increase in cess allowed users to roll back to their
interrupts as a result of the novelty Corp-hosted instances, which were
of the system (which also resulted in simply shut down before the migration
a corresponding increase in product to Cloud. Unused Corp-hosted instanc-
bug reports). After this initial surge, es were deleted after a grace period of
interrupts load dropped to just 20% of 90 days.
the volume experienced when virtual ˲˲ Impact on the company. Clearly ar-
desktops were hosted on the corpo- ticulating the cost reduction to Google
rate network. helped reassure users they were doing
˲˲ Login rates. Seven- and 30-day login the right thing for the company.
rates before and after the migration
were comparable. Migrating Users: Technical
To inform the roadmap for future and Process Details
milestones, the team also collected After collecting data and feedback from
data during alpha and beta releases via alpha and beta rollouts, the team was
two main avenues: ready to proceed with the general migra-
˲˲ User-reported feedback. User feed- tion to Cloud. This section details the
back in the form of tickets, bug reports, main features of the migration process.
emails, and word of mouth provided a Trade-in. Once the provisioning sys-
list of items to fix. The team filed bugs tem was ready for new users, approxi-
to track each list item, prioritized by se- mately 20,000 virtual desktop users
verity, number of users impacted, and had to be moved to the new product.
aggregated customer preference. The The team briefly considered a naive
last metric was made measurable by of- strategy of simply moving each user’s
fering power users 100 “feature points” disks into a new Cloud instance, but
to apportion across features as they experience in managing the existing
wished. These metrics could then be platform pointed in a slightly differ-
used to inform development priorities. ent direction.
˲˲ Surveys. Surveys measured subjec- Occasionally, the old platform expe-
tive impressions from the user base rienced a fault that required significant

work to return a user’s instance to Moving bits. The actual uploading the instance (post-creation), the user
full functionality. This fault became proceeded in a straightforward man- still needed to wait several hours be-
more common as incremental, auto- ner. For each user request, a job execu- fore the instance became fully trusted
matic modifications to an instance tion system crafted a signed Google by all systems. Provisioning and test-
accrued. To spare unnecessary toil, Cloud Storage URL entitling the bearer ing an instance therefore took three
the team’s default first response in to perform an HTTP PUT request to a user interactions over a period of three
these scenarios was to ask users if bucket for a period of time. To ensure to five hours. Once the team became
they needed the data on the disk in that long-running upload jobs were aware of this issue, they made changes
question. Much of the time, the an- not interrupted as the workflow system to the trust pipeline and folded the en-
swer was no, as users didn’t have any deployed, an Upstart script on the old rollment step into the initial user re-
important data on their disks. Fol- virtualization platform processed the quest, thereby eliminating the hours
lowing this strong (but anecdotal) upload. Upon upload completion, a of waiting. If the team had considered
signal from corporate network-host- cloud worker instance fired up to cre- the timing of provisioning and other
ed virtual desktops, the team based ate a new disk image by merging the operations as building requirements,
the move to Cloud around the con- user’s data onto a copy of the golden they could have made these improve-
cept of user-involved trade-ins, as master image. ments earlier.
opposed to a traditional behind-the- Push vs. pull. The beta phase re- Broken use cases ended up flushing
scenes migration. vealed a high demand for virtual out many bad assumptions that various
To carry out such a trade-in pro- desktops hosted on Cloud, so the applications and groups of users made
gram, the team crafted two workflow team wanted to make sure the gen- about network access. For example,
pipelines: one to handle cases where eral launch adequately anticipated instead of using a library to check if a
users explicitly indicated that they demand. To avoid overwhelming the user is on the corporate network or the
did not need their data moved (or network links on the old virtualization production network, some suboptimal
could move it themselves); and one platform and the team’s capacity for implementations instead depended
for outliers who needed all of their toil in handling a surge of requests, on hostname or IP space. These issues
data moved. The former (and by far, users were not allowed to request were typically addressed by updating
most common) case required simply trade-ins themselves. Instead, trade- the code of individual applications to
performing two straightforward tasks: ins were first offered to a population remove the bad assumptions.
powering off the original instance and of users who were most likely to need There were also issues with some
creating a new instance with a similar them: those with low disk usage who workflows that technically violated se-
name. This approach required mini- likely wouldn’t need to move their en- curity policy but had been granted ex-
mal engineering effort; used minimal tire disk, and users whose instances ceptions. Cloud desktop enforces these
compute, bandwidth, and storage re- were hosted on old hardware. In this policies by default; it encourages users
sources; and provided a strong signal way, the team could both balance ca- to fix their workflows rather than carry
for how much benefit users gathered pacity limits and precisely target user forward bad practices. For example, on
from traditional stateful disks. populations whose machine locations Google’s corporate network, users can
Exceptional cases. Google employs would buy the most reduction in toil. get an exception to connect directly to
a great many engineers; if you give an Once users actually started using the some application databases. This is
engineer a Linux machine, there’s a platform, high demand for cloud desk- practical because the database server
good chance that the machine will be tops meant that users wanted to opt in and the user are on the same network.
customized within an inch of its life. earlier. Attempts to create new cloud These sorts of use cases should be
For these cases, the team wanted to desktop instances without an explicit steered toward BeyondCorp gateways.
provide a path to use the new platform invitation to do so were a signal to send
that did not force all users to migrate those users a trade-in invitation. Technical and Non-Technical
their own data. However, the team was Lessons Learned
wary of setting a “magical migration” Early Known Issues and Limitations As with any complicated launch, the
expectation for edge cases that could While initial reception of the new ser- development team learned a number
not possibly be fulfilled. vice was positive, a few barriers caused of lessons along the way, both techni-
For users who requested data mi- mild inconvenience to users and some cal and nontechnical.
gration, the best option was to move completely broken use cases. Push, don’t pull. Being able to control
their home directories in-place to the Most of these pain points affected the flow of traffic to your system makes
new cloud instances and provide an the provisioning process and were operating it infinitely simpler. Find a
on-disk backup of their entire operat- largely caused by misunderstanding
ing system. While this strategy dupli- the various SLOs (service-level objec- humming along? Turn up the volume.
cated a significant amount of operat- tives) and delays within Google’s in- Even if integrating this functionality is
ing system data already present on ventory and trust pipeline. The team difficult, it’s worth implementing from
the gLinux system, it meant the team required an independent signal of user square one if possible.
could proceed without worrying that intent in order to grant trusted access Be explicit about trade-offs and costs
important files were not transferred, to a new desktop instance, and once a with your users. If you offer two op-
only to be noticed months later. user provided this signal by “enrolling” tions, and one seems like less work,
practice
everyone will choose that easier option. To offset this impulse, if one option is much more costly, expose that cost at the user decision point. For example, moving disks into the cloud is convenient for users but much more time-consuming (and costly) than the alternative. The team exposed the cost of moving disks as a 24-hour duration, which was much less convenient than the one-hour duration for a simple exchange of a corporate network-hosted instance for a Cloud-hosted instance. Simply exposing this information when users had to choose between the two options saved an estimated 1.8 petabytes of data moves.
Never waste an opportunity to gather data. Before the migration, the team didn’t know what proportion of users depended heavily on the contents of their local disks. It turns out that only about 50% of users cared enough about preserving their disks to wait 24 hours for the move to complete. That’s a valuable data point for future service expansions or migrations.
Don’t be tempted to make a special case out of a “one-time” migration. Your future self will be thankful if you take the opportunity to homogenize when making lasting changes. Previous generations of the corporate network-hosted virtual desktop system had a slightly different on-disk layout than the current models used for testing. Not only was this an unpleasant surprise in production, but it was also almost impossible to test since no existing tools would create the old disk type. Fortunately, during the design phase the team had resisted the urge to “simplify” the data-copying phase by putting user data on a second GCE disk; doing so would have made these instances special snowflakes for the lifetime of the Cloud-hosted platform.
Keep the organization flexible. Organizing the team into virtual workstreams has multiple benefits. This strategy allowed the team to quickly gather expertise across reporting chains, expand and contract teams throughout the project, reduce communication overhead between teams, assign singular deliverable objectives to work groups, and reduce territoriality across teams.
This is an opportunity to “get it right.” The migration to Google Cloud allowed the team to reconsider certain implementations that had ossified over time within the team and organization.

Applying This Experience Elsewhere
Since a cloud desktop is composed of a GCE instance running a custom image (production of which is fairly cheap and well documented; https://cloud.google.com/compute/docs/images#custom_images), the infrastructure scales extraordinarily well. Very little changed when piloting with a dozen instances versus running with thousands, and what Google has implemented here should be directly applicable to other, smaller companies without requiring much specialization to the plan detailed in this article.

Future Plans
While the migration of virtual desktops to Cloud wasn’t painless, it has been a solid success and a foundation for further work. Looking to the future, the Google Corporate Cloud Migrations team is engaged in two primary streams of work: improving the virtual desktop experience and enabling Google corporate server workloads to run on Cloud.
In the desktop space, the team plans to improve the service management experience by developing various tools that supplement the Google Cloud platform to help manage the fleet of cloud desktops. These add-ons include a disk-inspection tool and a fleet-management command-line tool that integrates and orchestrates actions between Cloud and other corporate systems.
There are several possibilities for improving fleet cost effectiveness. On the simple end of the spectrum, cloud desktop could automatically request that owners of idle machines delete instances they don’t actually need.
Finally, the end-user experience could be improved by implementing a self-serve VM cold migration between datacenters, allowing traveling users to relocate their instances to a nearby datacenter to reduce latency to their VM. Note that these plans are scoped to cloud desktop as part of the customer/application-specific logic, as opposed to features Google as a company is planning for Compute Engine in general.
As for server workloads, the team is building on lessons learned from cloud desktop to provide a migration path. The main technical challenges in this space include:
• Cataloging and characterizing the corporate fleet;
• Creating scalable and auditable service and VM lifecycle management frameworks;
• Maintaining multiple flavors of managed operating systems;
• Extending BeyondCorp semantics to protocols that are hard to proxy;
• Tackling a new set of security and compliance requirements;
• Creating performant shared storage solutions for services requiring databases;
• Creating migration tools to automate toilsome operations; and
• Implementing a number of service-specific requirements.
Migrating server workloads also has the added organizational complexity of a heterogeneous group of service owners, each with varying priorities and requirements from the departments and business functions they support.

Related articles on queue.acm.org
Titus: Introducing Containers to the Netflix Cloud
Andrew Leung, Andrew Spyker, and Tim Bozarth
https://queue.acm.org/detail.cfm?id=3158370
Reliable Cron across the Planet
Štepán Davidovič, Kavita Guliani
https://queue.acm.org/detail.cfm?id=2745840
Virtualization: Blessing or Curse?
Evangelos Kotsovinos
https://queue.acm.org/detail.cfm?id=1889916

Matt Fata is a Site Reliability Manager at Google, where he works on corporate virtualization solutions. He has previously worked as a network engineer and as an IT support desk manager.
Philippe-Joseph Arida is a Technical Program Manager at Google, where he works on making GCP the best platform for enterprise workloads. He previously worked as a PM at Microsoft on desktop, server, and search products.
Patrick Hahn is a Site Reliability Engineer at Google and the Technical Lead of the cloud desktop project. He has previously worked as a sysadmin in the Web development, managed IT, and quantitative finance industries.
Betsy Beyer is a technical writer for Google Site Reliability Engineering in NYC, and the editor of Site Reliability Engineering: How Google Runs Production Systems and the Site Reliability Workbook.

Copyright held by owners/authors. Publication rights licensed to ACM. $15.00.

DOI:10.1145/3233243

queue.acm.org
Research for Practice: Knowledge Base Construction in the Machine-Learning Era
Three critical design points: Joint learning, weak supervision, and new representations.
BY ALEX RATNER AND CHRIS RÉ
THIS INSTALLMENT OF Research for Practice features a curated selection from Alex Ratner and Chris Ré, who provide an overview of recent developments in Knowledge Base Construction (KBC). While knowledge bases have a long history dating to the expert systems of the 1970s, recent advances in machine learning have led to a knowledge base renaissance, with knowledge bases now powering major product functionality including Google Assistant, Amazon Alexa, Apple Siri, and Wolfram Alpha. Ratner and Ré’s selections highlight key considerations in the modern KBC process, from interfaces that extract knowledge from domain experts to algorithms and representations that transfer knowledge across tasks. Please enjoy!
– Peter Bailis

Peter Bailis is an assistant professor of computer science at Stanford University. His research in the Future Data Systems group (futuredata.stanford.edu) focuses on the design and implementation of next-generation data-intensive systems.

More information is accessible today than at any other time in human history. From a software perspective, however, the vast majority of this data is unusable, as it is locked away in unstructured formats such as text, PDFs, Web pages, images, and other hard-to-parse formats. The goal of KBC (knowledge base construction) is to extract structured information automatically from this “dark data,” so that it can be used in downstream applications for search, question-answering, link prediction, visualization, modeling and much more. Today, knowledge
bases (KBs) are the central components of systems that help fight human trafficking,19 accelerate biomedical discovery,9 and, increasingly, power web-search and question-answering technologies.4
KBC is extremely challenging, however, as it involves dealing with highly complex input data and multiple connected subtasks such as parsing, extracting, cleaning, linking, and integration. Traditionally, even with machine learning, each of these subtasks would require arduous feature engineering (that is, manually crafting attributes of the input data to feed into the system). For this reason, KBC has traditionally been a months- or years-long process that was approached only by academic groups (for example, YAGO,8 DBpedia,7 KnowItNow,2 DeepDive,18 among others) or large, well-funded teams in industry and government (for example, Google’s Knowledge Vault, IBM Watson, and Amazon’s Product Graphs).
Today, however, there is a renewed sense of democratized progress in the area of KBC, thanks to powerful but easy-to-use deep-learning models that largely obviate the burdensome task of feature engineering. Instead, modern deep-learning models operate directly over raw input data such as text or images and get state-of-the-art performance on KBC subtasks such as parsing, tagging, classifying, and linking. Moreover, standard commodity architectures are often suitable for a wide range of domains and tasks such as the “hegemony”11 of the bi-LSTM (bidirectional long short-term memory) for text, or the CNN (convolutional neural network) for images. Open source implementations can often be downloaded and run in several lines of code.
For these emerging deep-learning-based approaches to make KBC faster and easier, though, certain critical design decisions need to be addressed, such as how to piece them together, how to collect training data for them efficiently, and how to represent their input and output data. This article highlights three papers that focus on these critical design points: joint-learning approaches for pooling information and coordinating among subcomponents; more efficient methods of weakly supervising the machine-learning components of the system; and new ways of representing both inputs and outputs of the KB.

Joint Learning: Sharing Information and Avoiding Cascaded Errors
T.M. Mitchell et al.
Never-ending learning. In Proceedings of the Conference on Artificial Intelligence, 2015, 2302–2310.

KBC is particularly challenging because of the large number of related subtasks involved, each of which may use one or more ML (machine-learning) models. Performing these tasks in disconnected pipelines is suboptimal in at least two ways: it can lead to cascading errors (for example, an initial parsing error may throw off a downstream tagging or linking task); and it misses the opportunity to pool information and training signals among related tasks (for example, subcomponents that extract similar types of relations can probably use similar representations of the input data). The high-level idea of what are often termed joint inference and multitask learning (which we collectively refer to as joint learning) is to learn multiple related models jointly, connecting them by logical relations of their output values and/or shared representations of their input values.
Never-Ending Language Learner (NELL) is a classic example of the impact of joint learning on KBC at an impressive scale. NELL is a system that has been extracting various facts about the world (for example, ServedWith(Tea, Biscuits)) from the Internet since 2010, amounting to a KB containing (in 2015) more than 80 million entries. The problem setting approached by NELL consists of more than 2,500 distinct learning tasks, including categorizing noun phrases into specific categories, linking similar entities, and extracting relations between entities. Rather than learning all these tasks separately, NELL’s formulation includes known (or learned) coupling constraints between the different tasks, which Mitchell et al. cite as critical to training NELL. These include logical relations such as subset/superset (for example, IsSandwich(Hamburger) ⇒ IsFood(Hamburger)) and mutual-exclusion constraints, which connect the many disparate tasks during inference and learning.
In other systems, the importance of connecting or coupling multiple tasks is echoed in slightly different contexts or formulations: for example, as a way to avoid cascading errors between different pipeline steps such as extraction and integration (for example, DeepDive18), or implemented by sharing weights or learned representations of the input data between tasks as in multitask learning.3,17 Either way, the decision of how to couple different subtasks is a critical one in any KBC system design.

Weak Supervision: Programming ML with Training Data
A.J. Ratner, S.H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré
Snorkel: Rapid training data creation with weak supervision. In Proceedings of the Very Large Database (VLDB) Endowment 11, 3 (2017), 269–282.

In almost all KBC systems today, many or all of the critical tasks are performed by increasingly complex machine-learning models, such as deep-learning ones. While these models indeed obviate much of the feature-engineering burden that was a traditional bottleneck in the KBC development process, they also require large volumes of labeled training data from which to learn. Having humans label this training data by hand is an expensive task that can take months or years, and the resulting labeled data set is frustratingly static: if the schema of a KB changes, as it frequently does in real production settings, the training set must be thrown out and relabeled. For these reasons, many KBC systems today use some form of weak supervision:15 noisier, higher-level supervision provided more efficiently by a domain expert.6,10 For example, a popular heuristic technique is distant supervision, where the entries of an existing knowledge base are heuristically aligned with new input data to label it as training data.1,13,16
Snorkel provides an end-to-end framework for weakly supervising machine-learning models by having domain experts write LFs (labeling functions), which are simply black-box functions that programmatically label training data, rather than labeling any training data by hand. These LFs subsume a wide range of weak supervision techniques and effectively give non-machine-learning experts a simple way to “program” ML models. Moreover, Snorkel automatically learns the accuracies of the LFs and reweights their outputs using statistical modeling techniques, effectively denoising the training data, which can then be used to supervise the KBC system. In this paper, the authors demonstrate that Snorkel improves over prior weak supervision approaches by enabling the easy use of many noisy sources, and comes within several percentage points of performance using massive hand-labeled training sets, showing the efficacy of weak supervision for making high-performance KBC systems faster and easier to develop.

Embeddings: Representation and Incorporation of Distributed Knowledge
S. Riedel, L. Yao, A. McCallum, and B.M. Marlin
Relation extraction with matrix factorization and universal schemas. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics–Human Language Technologies, 2013, 74–84.

Finally, a critical decision in KBC is how to represent data: both the input unstructured data and the resulting output constituting the knowledge base. In both KBC and more general ML settings, the use of dense vector embeddings to represent input data, especially text, has become an omnipresent tool.12 For example, word embeddings, learned by applying PCA (principal component analysis) or some approximate variant to large unlabeled corpora, can inherently represent meaningful semantics of text data, such as synonymy, and serve as a powerful but simple way to incorporate statistical knowledge from large corpora. Increasingly sophisticated types of embeddings, such as hyperbolic,14 multimodal, and graph5 embeddings, can provide powerful boosts to end-system performance in an expanded range of settings.
In their paper, Riedel et al. provide an interesting perspective by showing how embeddings can also be used to represent the knowledge base itself. In traditional KBC, an output schema (that is, which types of relations are to be extracted) is selected first and fixed, which is necessarily a manual process. Instead, Riedel et al. propose using dense embeddings to represent the KB itself and learning these from the union of all available or potential target schemas.
Moreover, they argue that such an approach unifies the traditionally separate tasks of extraction and integration. Generally, extraction is the process of going from input data to an entry in the KB (for example, mapping a text string X likes Y to a KB relation Likes(X,Y)), while integration is the task of merging or linking related entities and relations. In their approach, however, both input text and KB entries are represented in the same vector space, so these operations become essentially equivalent. These embeddings can then be learned jointly and queried for a variety of prediction tasks.

KBC Becoming More Accessible
This article has reviewed approaches to three critical design points of building a modern KBC system and how they have the potential to accelerate the KBC process: coupling multiple component models to learn them jointly; using weak supervision to supervise these models more efficiently and flexibly; and choosing a dense vector representation for the data. While ML-based KBC systems are still large and complex, one practical benefit of today’s interest and investment in ML is the plethora of state-of-the-art models for various KBC subtasks available in the open source, and well-engineered frameworks such as PyTorch and TensorFlow with which to run them. Together with techniques and systems for putting the pieces all together like those reviewed, high-performance KBC is becoming more accessible than ever.

References
1. Bunescu, R.C., Mooney, R.J. Learning to extract relations from the Web using minimal supervision. In Proceedings of the 45th Annual Meeting Assoc. Computational Linguistics, 2007, 576–583.
2. Cafarella, M.J., Downey, D., Soderland, S., Etzioni, O. KnowItNow: Fast, scalable information extraction from the Web. In Proceedings of Conf. on Human Language Tech. Empirical Methods in Natural Language Processing, 2005, 563–570.
3. Caruana, R. Multitask learning: A knowledge-based source of inductive bias. In Proceedings of the 10th Intern. Conf. Machine Learning, 1993, 41–48.
4. Dong, X. et al. Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD Intern. Conf. Knowledge Discovery and Data Mining, 2014, 601–610.
5. Grover, A. and Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD Intern. Conf. Knowledge Discovery and Data Mining, 2016, 855–864.
6. Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Assoc. Computational Linguistics–Human Language Technologies, 1, 2011, 541–550.
7. Lehmann, J. et al. DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6, 2 (2014), 167–195.
8. Mahdisoltani, F., Biega, J. and Suchanek, F.M. YAGO3: A knowledge base from multilingual wikipedias. In Proceedings of the 7th Biennial Conf. Innovative Data Systems Research, 2013.
9. Mallory, E.K., Zhang, C., Ré, C. and Altman, R.B. Large-scale extraction of gene interactions from full-text literature using DeepDive. Bioinformatics 32, 1 (2015), 106–113.
10. Mann, G.S. and McCallum, A. Generalized expectation criteria for semi-supervised learning with weakly labeled data. J. Machine Learning Research 11 (Feb 2010), 955–984.
11. Manning, C. Representations for language: From word embeddings to sentence meanings. Presented at Simons Institute for the Theory of Computing, UC Berkeley; https://nlp.stanford.edu/manning/talks/Simons-Institute-Manning-2017.pdf.
12. Mikolov, T., Chen, K., Corrado, G. and Dean, J. Efficient estimation of word representations in vector space, 2013; arXiv preprint arXiv:1301.3781.
13. Mintz, M., Bills, S., Snow, R. and Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conf. 47th Annual Meeting of the Assoc. Computational Linguistics and the 4th Conf. Asian Federation of Natural Language Processing, 2009, 1003–1011.
14. Nickel, M. and Kiela, D. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems 30 (2017), 6341–6350.
15. Ratner, A., Bach, S., Varma, P. and Ré, C. Weak supervision: the new programming paradigm for machine learning. Hazy Research; https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html.
16. Ren, X., He, W., Qu, M., Voss, C.R., Ji, H., Han, J. Label noise reduction in entity typing by heterogeneous partial-label embedding. In Proceedings of the 22nd ACM SIGKDD Intern. Conf. Knowledge Discovery and Data Mining, 2016, 1825–1834.
17. Ruder, S. An overview of multi-task learning in deep neural networks, 2017; arXiv preprint arXiv:1706.05098.
18. Zhang, C., Ré, C., Cafarella, M., De Sa, C., Ratner, A., Shin, J., Wang, F., Wu, S. DeepDive: Declarative knowledge base construction. Commun. ACM 60, 5 (May 2017), 93–102.
19. Zhang, C., Shin, J., Ré, C., Cafarella, M. and Niu, F. Extracting databases from dark data with DeepDive. In Proceedings of the Intern. Conf. Management of Data, 2016, 847–859.

Alex Ratner is a Ph.D. candidate in computer science at Stanford University, advised by Chris Ré, where his research focuses on weak supervision: using higher-level, noisier input from domain experts to train complex state-of-the-art models where limited hand-labeled training data is available. He leads the development of the Snorkel framework for weakly supervised ML, which has been applied to KBC problems in domains such as genomics, clinical diagnostics, and political science. He is supported by a Stanford Bio-X SIGF fellowship.
Christopher Ré is an associate professor of computer science at Stanford University. His work focuses on enabling users and developers to build applications that more deeply understand and exploit data. Work from his group has been incorporated into major scientific and humanitarian efforts, including the IceCube neutrino detector, PaleoDeepDive, and MEMEX in the fight against human trafficking, and into commercial products from major Web and enterprise companies.

Copyright held by owners/authors. Publication rights licensed to ACM. $15.00.
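
As an illustration of the labeling-function idea from the weak supervision discussion in this article, here is a minimal sketch in plain Python. It is not Snorkel's API: the function names, the label constants, the toy EXISTING_KB, and the candidate dictionaries are all hypothetical. It also swaps in a simple majority vote where Snorkel, per the article, learns the accuracies of the LFs and reweights their outputs statistically.

```python
from collections import Counter

# Label convention for this sketch only (an assumption, not Snorkel's API).
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

# Toy stand-in for an existing knowledge base of Likes(X, Y) entries,
# used for distant supervision. The entries are made up.
EXISTING_KB = {("Alice", "Bob")}


def lf_distant_supervision(candidate):
    """Vote positive if the (x, y) pair already appears in the existing KB."""
    pair = (candidate["x"], candidate["y"])
    return POSITIVE if pair in EXISTING_KB else ABSTAIN


def lf_keyword_likes(candidate):
    """Vote positive if the sentence contains the word 'likes'."""
    return POSITIVE if " likes " in candidate["sentence"] else ABSTAIN


def lf_keyword_dislikes(candidate):
    """Vote negative if the sentence contains 'dislikes' or 'hates'."""
    text = candidate["sentence"]
    return NEGATIVE if " dislikes " in text or " hates " in text else ABSTAIN


LABELING_FUNCTIONS = [lf_distant_supervision, lf_keyword_likes, lf_keyword_dislikes]


def majority_vote(candidate, lfs=LABELING_FUNCTIONS):
    """Combine noisy LF votes; abstain when no LF fires or the vote is tied."""
    votes = [lf(candidate) for lf in lfs]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence of equal weight
    return ranked[0][0]


candidates = [
    {"x": "Alice", "y": "Bob", "sentence": "Alice likes Bob."},
    {"x": "Carol", "y": "Dan", "sentence": "Carol hates Dan."},
    {"x": "Erin", "y": "Frank", "sentence": "Erin met Frank."},
    {"x": "Alice", "y": "Bob", "sentence": "Alice dislikes Bob."},
]
training_labels = [majority_vote(c) for c in candidates]
```

The last candidate shows why denoising matters: the distant-supervision LF and the keyword LF disagree, and an unweighted vote can only abstain, whereas a model of LF accuracies could decide which source to trust.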
practice
DOI:10.1145/3267118
queue.acm.org
Tracking and Controlling Microservice Dependencies
Dependency management is a crucial part of system and software design.
BY SILVIA ESPARRACHIARI GHIROTTI, TANYA REILLY, AND ASHLEIGH RENTZ

IN SEARCH OF a cappuccino, cheese bread, and a place to check her email, Silvia walked into a coffee shop. When she connected to the Wi-Fi hotspot, a captive portal prompted her to log in and offered a few third-party authentication options. When she clicked on one of the access token providers, her browser showed a “No Internet Connection” error. Since she didn’t have access to the network, she could not get an OAuth token, and she couldn’t access the network without one.
This anecdote illustrates a critical detail of system design that can easily go unnoticed until an outage takes place: cyclic dependencies.
Dependency cycles are familiar to you if you have ever locked your keys inside your house or car. You cannot open the lock without the key, but you can’t get the key without opening the lock. Some cycles are obvious, but more complex dependency cycles can be challenging to find before they lead to outages. Strategies for tracking and controlling dependencies are necessary for maintaining reliable systems.

Reasons to Manage Dependencies
A lockout, as in the story of the cyclic coffee shop, is just one way that dependency management has critical implications for reliability. You cannot reason about the behavior of any system, or guarantee its performance characteristics, without knowing what other systems it depends on. Without knowing how services are interlinked, you cannot understand the effects of extra latency in one part of the system, or how outages will propagate. How else does dependency management affect reliability?
SLO. No service can be more reliable than its critical dependencies.8 If dependencies are not managed, a service with a strict service-level objective (SLO)1 might depend on a back end that is considered best-effort. This might go unnoticed if the back end has coincidentally high availability or low latency. When that back end starts performing exactly to its SLO, however, it will degrade the availability of services that rely on it.
High-fidelity testing. Distributed systems should be tested in environments that replicate the production environment as closely as possible.7 If noncritical dependencies are omitted in the test environment, the tests cannot identify problems that arise from their interaction with the system. This can cause regressions when the code runs in production.
Data integrity. Poorly configured production servers may accidentally depend on their development or quality assurance (QA) environments. The reverse may also be true: A poorly configured QA server may accidentally leak fake data into the production environment. Experiments might inadvertently send requests to production servers
and degrade production data. Dependency management can expose these problems before they become outages.
IMAGE BY VERSHININ89
Disaster recovery/isolated bootstrap. After a disaster, it may be necessary to start up a company’s entire infrastructure without having anything already running. Cyclic dependencies can make this impossible: a front-end service may depend on a back end, but the back-end service could have been modified over time to depend on the front end. As systems grow more complex over time, the risk of this happening increases. Isolated bootstrap environments can also provide a robust QA environment.
Security. In networks with a perimeter-security model, access to one system may imply unfettered access to others.9 If an attacker compromises one system, the other systems that depend on it may also be at risk. Understanding how systems are interconnected is crucial for detecting and limiting the scope of damage. You may also think about dependencies when deploying denial of service (DoS) protection: One system that is resilient to extra load may send requests downstream to others that are less prepared.

Dependency Cycles
Dependency cycles are most dangerous when they involve the mechanisms used to access and modify a service. The operator knows what steps to take to repair the broken service, but it is impossible to take those steps without the service. These control cycles commonly arise in accessing remote systems. An error that disables sshd or networking on a remote server may prevent connecting to it and repairing it. This can be seen on a wider scale when the broken device is responsible for routing packets: The whole network might be offline as a result of the error, but the network outage makes it impossible to connect to the device and repair it. The network device depends on the very network it provides.
Dependency cycles can also disrupt recovery from two simultaneous outages. As in the isolated bootstrap scenario, two systems that have evolved to depend upon each other cannot be restarted while neither is available. A job-scheduling system may depend on writing to a data-storage system, but that data-storage system may depend on the job-scheduling system to assign resources to it.
Cycles may even affect human processes, such as oncall and debugging. In one example, a source-control system outage left both the source-code repository and documentation server unavailable. The only way to get to the documentation or source code of the source-control system was
to recover the same system. Without ward the microservices model makes ure to track external dependencies may
this key information about the sys- dependency management much more also introduce bootstrapping risks. As
tem’s internals, the oncall engineer’s difficult. As Leslie Lamport said in SaaS becomes more popular and as
response was significantly obstructed. 1987, “A distributed system is one in more companies outsource infrastruc-
which the failure of a computer you ture and functionality, cyclic depen-
Microservices and didn’t even know existed can render dencies may start to cross companies.
External Services your own computer unusable.”5 Large For example, if two storage companies
In the era of monolithic software binaries are now frequently broken were to use each other’s systems to
development, dependency manage- into many smaller services, each one store boot images, a disaster that af-
ment was relatively clear-cut. While a serving a single purpose and capable fected both companies would make re-
monolithic binary may perform many of failing independently. A retail ap- covery difficult or impossible.
functions, it generally provides a single plication might have one service for
failure domain containing all of the rendering the storefront, another for Directed Acyclic Graphs
binary’s functionality. Keeping track thumbnails, and more for currency At its essence, a service dependency is
of a small number of large binaries conversion, checkout, address normal- the need for a piece of data that is re-
and storage systems is not difficult, so ization, and surveys. The dependencies mote to the service. It could be a con-
an owner of a monolithic architecture between them cross failure domains. figuration file stored in a file system,
can easily draw a dependency diagram, In her 2017 Velocity NY Conference or a row for user data in a database, or
perhaps like that in Figure 1. talk, Sarah Wells of the Financial Times a computation performed by the back
The software industry’s move to- explained how her development teams end. The way this remote data is ac-
manage more than 150 microser- cessed by the service may vary. For the
Figure 1. Sample dependency diagram. vices—and that is for just one part of sake of simplicity, let’s assume all re-
the Financial Times’s technical estate. mote data or computation is provided
Squarespace is in the process of break- by a serving back end via remote proce-
ing down its monolith4 and already has dure calls (RPCs).
more than 30 microservices. Larger As just described, dependency
companies such as Google, Netflix, cycles among systems can make it vir-
and Twitter often have thousands of tually impossible to recover after an
microservices, pushing the problem of outage. The outage of a critical depen-
dependency management beyond hu- dency propagates to its dependents,
man capabilities. so the natural place to begin restoring
Microservices offer many advan- the flow of data is the top of the depen-
LB
tages. They allow independent compo- dency chain. With a dependency cycle,
nent releases, smoother rollbacks, and however, there is no clear place to be-
polyglot development, as well as allow gin recovery efforts since every system
teams to specialize in one area of the is dependent on another in the chain.
Web server codebase. However, they are not easy to One way to identify cycles is to build
keep track of. In a company with more a dependency graph representing all
than 100 microservices, it is unlikely services in the system and all RPCs
that employees could draw a diagram exchanged among them. Begin build-
and get it right, or guarantee they are ing the graph by putting each service
app server
making dependency decisions that will on a node of the graph and drawing
not result in a cycle. directed edges to represent the outgo-
Both monolithic services and ing RPCs. Once all services are placed
microservices can experience boot- in the graph, the existing dependency
strapping issues caused by hidden cycles can be identified using common
dependencies. They rely on access to algorithms such as finding a topological
database
decryption keys, network, and power. sorting via a depth-first search. If no cy-
They may also depend on external cles are found, that means the services’
systems such as DNS (Domain Name dependencies can be represented by a
System). If individual endpoints of a directed acyclic graph (DAG).
Figure 2. Cycle removal. monolith are reached via DNS, the pro- What happens when a cycle is
cess of keeping those DNS records up found? Sometimes, it’s possible to re-
Dependency Cycle Cycle Removed to date may create a cycle. move a cycle by inverting the depen-
Controller Controller
The adoption of SaaS (software as dency, as shown in Figure 2. One exam-
give have a service) creates new dependencies ple is a notification system where the
new data is
available
me some whose implementation details are hid- senders notify the controllers about
data! data!
Sender Sender
den. These dependencies are subject new data, and the controller then pulls
to the latency, SLO, testing, and securi- data from the senders. The cycle here
ty concerns mentioned previously. Fail- can be easily removed by allowing the
100 CO MM UNICATIO NS O F T H E AC M | NOV EM BER 201 8 | VO L . 61 | N O. 1 1

senders only to push data into the controller. Cycle removal could also be accomplished by splitting the functionality across two nodes, for example by moving the new-data notification to a third system.

Some dependencies are intrinsically cyclic and cannot be removed. Replicated services may periodically query their replicas in order to reinforce data synchronization and integrity.3 Since all replicas represent a single service, this would be represented as a self-dependency cycle in the graph. It’s usually okay to allow self-dependencies as long as they do not prevent the isolated bootstrapping of the system and the system can properly recover from a global outage.

Another intrinsically cyclic dependency occurs in data-processing pipelines implemented as a workers-controller system.2 Workers keep the controller informed about their status, and the controller assigns tasks to the workers when they become idle. This cyclic dependency between workers and controllers cannot be removed without completely changing the processing model. What can be done in this case is to group workers and controllers into a supernode representing a single service. By repeating this edge contraction for all strongly connected components of the graph, taking into account their purpose and practical viability, you may achieve a DAG representation of the original graph.

Tracking vs. Controlling
In some environments, you can derive great benefit from just understanding the existing dependency graph. In others, determining the existing state is not sufficient; mechanisms are needed for preventing new undesirable dependencies. The two approaches examined here, dependency tracking and dependency control, have different characteristics:
• Tracking dependencies is a passive approach. You use logging and monitoring to record which services contact each other, then review that data later. You can understand the dependencies by creating data structures that can be queried efficiently or by representing the relationships visually.
• Controlling dependencies is an active approach. There are several points during design and implementation where you can identify and avoid an undesirable dependency. Additionally, you can prevent connections from being made while the code is running in production. If you wait until the dependency has already been used and monitored, it will be too late to prevent the issues it may cause.

These approaches overlap (for example, data collected during dependency control can certainly be used for tracking), but let’s look at them separately.

Dependency tracking. Initially, dependency tracking often takes the form of information stored in engineers’ heads and visualized in whiteboard drawings. This is sufficient for smaller environments, but as the system becomes more complex, the map of services becomes too complicated for any one person to memorize. Engineers may be surprised by an outage caused by an unexpected dependency, or they may not be able to reason about how to move a service and its associated back ends from one data center to another. At this stage, organizations begin to consider programmatically generated views of the system.

Different environments may use different ways of collecting information about how services are interconnected. In some, a firewall or network device might record logs of which services are contacting each other, and these logs can be mined for dependency data. Alternatively, a set of services built on a common framework might export standard monitoring metrics about every connection; or distributed tracing might be used to expose the paths a request takes through the system, highlighting the connections.

You can aggregate whatever sources of information are available to you and create a dependency graph, processing the data into a common structure and optimizing it for running queries over it. From there, you can use algorithms on the graph to check whether it is a DAG, visualize it using software such as Graphviz and Vizceral, or expose information for each service, perhaps using a standard dashboard with a page for each service.

Continually monitoring traffic between systems and immediately integrating it into the graph may reveal new dependencies shortly after they reach
production. Even so, the information is available only after the new dependency has been created and is already in use. This is sufficient for dependency tracking, where you want to describe the interconnections of an existing system and become aware of new ones. Preventing the dependency, however, requires dependency control.

Dependency control. Just like dependency tracking, dependency control typically starts as a manual process using information stored in engineers’ heads. Developers might include a list of proposed back ends in all design documentation and depend on their colleagues’ knowledge of the existing systems to flag dangers. Again, this may be enough for a smaller environment. As services are born, grow, change, and are deprecated, the data can quickly become stale or unwieldy. Dependency control is most effective if enforced programmatically, and there are several points to consider in adding it.

When working on dependency management at Google, we found it best to think about controlling dependencies from the client side of a client-server connection (that is, the service that is about to depend on another service). By owning the code that initiates the connections, the owner of the client has the most control and visibility over which dependencies exist and can therefore detect potential problems earlier. The client is also most affected by ill-considered dependencies.

Although the owner of a server may want to control who its clients are for reasons such as capacity planning or security, bad dependencies are much more likely to affect the client’s SLO. Because the client requires some functionality or data from the server for its own functionality or performance, it needs to be prepared for server-side outages. The server, on the other hand, is unlikely to notice an outage of one of its clients.

One approach to dependency control is to analyze the client’s code and restrict dependencies at build time. The behavior of the binary, however, will be influenced by the configuration and environment it receives. Identical binaries might have very different dependencies in different situations, and the existence of code inside a binary is not a reasonable predictor for whether that binary has a dependency on another service. If a standard mechanism is used for specifying back ends or connection types (for example, if all back ends are provided in configuration and not in code), this might be an area worth exploring.

Restrictions are most effective if applied at runtime. By intercepting and potentially blocking connections as they are being made, you can be certain that you are inspecting the actual behavior of the running system, rather than speculating based on code or configuration. To avoid wasted engineering effort, restrictions on the back ends that services may contact should be implemented as early in the development life cycle as possible. Changing a system’s architecture after it is already live and in use is far more expensive.6 By applying the same set of restrictions at all stages of software development (during development, testing, canarying, and running live), any unwelcome dependency can be identified early.

There are several options for runtime enforcement. Just as with dependency tracking, existing infrastructure could be repurposed for dependency control. If all interservice connections pass through a firewall, network device, load balancer, or service mesh, those infrastructure services could be instrumented to maintain a list of acceptable dependencies and drop or deny any requests that do not match the list. Silently dropping requests at a point between the client and server may complicate debugging, though. A request that is dropped for being an unapproved dependency may be indistinguishable from a failure of the server or the intermediate device: the connections may seem to just disappear.

Another option is to use a dedicated external dependency-control service that the client can query before allowing each new back-end connection. This kind of external system has the disadvantage of adding latency since it requires extra requests to allow or deny each back end. And, of course, the dependency-control service itself becomes a dependency of the service.

At Google, we had the most success adding restrictions into the code at the point where the connection is made.
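One way to picture a restriction applied at the point where the connection is made is a thin wrapper around whatever function opens back-end connections. The following is a minimal Python sketch under stated assumptions, not a description of Google's RPC code: `GuardedDialer`, `DependencyNotAllowed`, `fake_connect`, and the back-end names are all hypothetical. It raises an explicit error on the client side so that a blocked dependency is not mistaken for a server or network failure.

```python
from typing import Callable, Iterable


class DependencyNotAllowed(Exception):
    """Raised when a client tries to reach a back end that is not on its list."""


class GuardedDialer:
    """Wraps a connect function and checks each back end before dialing.

    `connect` stands in for whatever call really opens a connection;
    it is a placeholder for this sketch.
    """

    def __init__(self, allowed_backends: Iterable[str],
                 connect: Callable[[str], object]) -> None:
        self._allowed = frozenset(allowed_backends)
        self._connect = connect

    def dial(self, backend: str) -> object:
        # Fail loudly on the client side instead of silently dropping the request.
        if backend not in self._allowed:
            raise DependencyNotAllowed(
                f"back end {backend!r} is not an approved dependency")
        return self._connect(backend)


def fake_connect(backend: str) -> str:
    # Placeholder transport: returns a label instead of opening a socket.
    return f"connected:{backend}"


dialer = GuardedDialer({"storage", "auth"}, fake_connect)
print(dialer.dial("storage"))
try:
    dialer.dial("billing")
except DependencyNotAllowed as err:
    print("blocked:", err)
```

Because the check runs in the client process, it adds no extra network round trip, which is the trade-off the text contrasts with an external dependency-control service.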

Since Google had a homogeneous environment with a standard RPC mechanism used for all connections, we were able to modify the RPC code to match each new back end against a “dependency control policy,” an extra configuration option provided to every binary.

Authorizing RPCs
The dependency-control policy consists of an access control list (ACL) of the RPC names a service is expected to initiate. For performance reasons, the policy is serialized and loaded by the service during startup. If the policy is invalid (because of syntax errors or data corruption), it’s not used and the dependency control is not activated. If the policy is correct, it becomes active, and all outgoing RPCs are matched against it. If an RPC is fired but is not present in the policy, it will be flagged as rejected. Rejected RPCs are reported via monitoring so that service owners can audit them and decide on the correct course of action: remove the RPC from the binary if it’s not a desired dependency or add it to the ACL if it’s indeed a necessary new dependency.

The pseudocode depicted in Figure 3 shows how the authorization of RPCs could be implemented.

Figure 3. Pseudocode for policy authorization.

    func isAllowedByPolicy(rpc, acl):
        foreach expectedRPC in acl:
            if rpc == expectedRPC:
                # If the RPC is listed in the ACL,
                # it should be allowed.
                return true

        # If the RPC didn’t match any item on the ACL,
        # it should be rejected.
        return false

To prevent production outages, service owners are allowed to choose whether to enforce the policy and drop rejected RPCs or to soft-apply the policy and allow rejected RPCs to go through. Most service owners choose to enforce the policy in their test and QA environments so they catch new dependencies before they reach production, then soft-apply the policy in production. Even when soft-applying the policies, monitoring and alerting are still available for RPCs that would be rejected.

As mentioned earlier, the ACL can be based on historical information about RPCs fired by the binary, but that implies allowing the binary to run and serve production data for some time without dependency controls. Also, depending on the variability and diversity of the outgoing traffic, some RPCs might be rare or fired only under special circumstances, such as turn-down or crash. In this case, they might not show up in the historical data and would have to be manually added to the ACL. Because of this, service owners would be well advised to run in soft-apply mode for a long time before enforcing dependency controls in production.

Isolating Groups of Servers
Authorizing RPCs by name is a good way to control RPCs that are uniquely served by a single back end, but this will not cover all of the dependency issues highlighted earlier. One example is a data-integrity case where both production and test servers employ the same RPC. Unless the policy can distinguish between the serving back ends, you cannot block production RPCs from reaching test instances. Additionally, ACLs can offer a tight lock around dependencies, but they are not sufficient on their own to prevent dependency cycles.

To prevent these other dependency issues, RPCs can be isolated within a group of servers. The first decision to be made is choosing a DAG model that will dictate the communication between the sets of servers. One simple model that prevents cycles is a graph that represents the expected turn-up sequence of the servers. This is a stacked or layered model of the system, shown in Figure 4. The layered model ensures that servers never have dependencies on layers higher than the ones they live on, enabling the sequential bootstrap of the servers from bottom to top. Servers can depend on servers only at the same layer or below.

Figure 4. Stacked model. [Layers, in the order shown: product, privacy, storage, security, network. Axis labels: dependencies, turnup.]

Services at the bottom layer can rely only on local static data in order to bootstrap, never on data served by another service. At Google, the full set of static data that is necessary to bootstrap a system (for example, compiled binaries, configuration files, system documentation, and root keys) is called the backpack.

A layer can have sublayers in order to prevent dependency cycles between servers in the same layer. Sublayering is a smaller problem and can often be handled by a single team without coordination with other teams.

Figure 5. Isolated model. [Three separate products: Jupiter, Earth, Mars.]

The layered model works well for enabling the bootstrap of a single system, but it still doesn’t solve the problem of isolating production and test environments, or restricting communications among different geographic regions. To tackle this kind of problem, you must be able to group servers into disconnected sets; this is called the isolated model. Unlike the layered model, where dependencies are allowed in the downward direction, the isolated model disallows dependencies among different components. In the example illustrated in Figure 5, the products Jupiter, Earth, and Mars are not allowed
to exchange RPCs with each other. Thus, they are not allowed to depend on each other.

One way to generalize dependency authorization in a DAG model is to let directed edges represent can-send-to relations. Each node on the graph has a self-referencing edge (that is, it can send RPCs to itself). Also, the can-send-to relation is transitive: if A can send RPCs to B, and B can send RPCs to C, then A can send RPCs to C. Note that if B can send RPCs to A, and B can send RPCs to C, that does not imply that A can send RPCs to C or vice versa. Can-send-to is a directed relation. If there were a can-send-to relation in both directions (from A to B and from B to A), this would constitute a cycle and the model wouldn’t be a DAG.

Figure 6 shows how the pseudocode for authorizing RPCs in a DAG model could be written.

Figure 6. Pseudocode for model authorization.

    func isAllowedByModel(rpc, model):
        clientNode = model.resolveNode(rpc.sender)
        serviceNode = model.resolveNode(rpc.receiver)
        return model.hasTransitiveConnection(clientNode, serviceNode)

The isolated model can be combined with the layered model, allowing the isolated bootstrap of each region to be reinforced, as illustrated in Figure 7.

Figure 7. Combined model. [Three isolated stacks, Jupiter, Earth, and Mars, each with the layers product, privacy, storage, security, network.]

Figure 8 shows the pseudocode for combining different models.

Figure 8. Pseudocode for multiple model authorization.

    func isAllowedByAllModels(rpc, modelCollection):
        foreach model in modelCollection:
            # Checks if the RPC is allowed by the model.
            if !isAllowedByModel(rpc, model):
                return false

        # If no models reject the RPC, then it should be allowed.
        return true

Be careful when combining models that you do not isolate critical components by combining mutually exclusive models. Usually, simple models are easier to understand and to predict the results of combining, like the layered and isolated models described here. It can be challenging to predict the combined logic for two or more complex models. For example, suppose there are two models based on the geographical locality of machines. It’s straightforward to see that assigning locality “Tokyo” from one model and locality “London” from the other model will result in an empty set, since no machine can be physically located in London and Tokyo at the same time. Meanwhile, if there are two tree models based on locality (such as one for city, time zone, and country, and another for metro, voting zone, and country), it might be difficult to verify which combinations of values will return non-empty sets.

Conclusion
With the growth of massive interdependent software systems, dependency management is a crucial part of system and software design. Most organizations will benefit from tracking existing dependencies to help model their latency, SLOs, and security threats. Many will also find it useful to limit the growth of new dependencies for data integrity and to reduce the risk of outages. Modeling infrastructure as a DAG will make it easier to be certain there are no dependencies that will prevent isolated bootstrapping of a system.

Dependencies can be tracked by observing the behavior of a system, but preventing dependency problems before they reach production requires a more active strategy. Implementing dependency control ensures each new dependency can be added to a DAG before it enters use. This gives system designers the freedom to add new dependencies where they are valuable, while eliminating much of the risk that comes from the uncontrolled growth of dependencies.

Related articles on queue.acm.org

The Hidden Dividends of Microservices
Tom Killalea
https://queue.acm.org/detail.cfm?id=2956643

A Conversation with Werner Vogels
https://queue.acm.org/detail.cfm?id=1142065

Fail at Scale
Ben Maurer
https://queue.acm.org/detail.cfm?id=2839461

References
1. Beyer, B., Jones, C., Petoff, J., Murphy, N.R. (Eds.). Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media, 2016, 37–40.
2. Beyer, B., Jones, C., Petoff, J., Murphy, N.R. (Eds.). Site Reliability Engineering: How Google Runs Production Systems. Chapter 25: Data processing pipelines. O’Reilly Media, 2016.
3. Chang, F. et al. Bigtable: A distributed storage system for structured data, 2006; https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf.
4. Kachouh, R. The pillars of Squarespace services. Squarespace Engineering; https://engineering.squarespace.com/blog/2017/the-pillars-of-squarespace-services.
5. Lamport, L. Email message sent to a DEC SRC bulletin board, 1987; https://www.microsoft.com/en-us/research/publication/distribution/.
6. Saini, A. How much do bugs cost to fix during each phase of the SDLC? Synopsys, 2017; https://www.synopsys.com/blogs/software-security/cost-to-fix-bugs-during-each-sdlc-phase/.
7. Seaton, N. Why fidelity of environments throughout your testing process is important. Electric Cloud; http://electric-cloud.com/blog/2015/09/why-fidelity-of-environments-throughout-your-testing-process-is-important/.
8. Treynor, B., Dahlin, M., Rau, V. and Beyer, B. The calculus of service availability. acmqueue 15, 2 (2017); https://queue.acm.org/detail.cfm?id=3096459.
9. Ward, R. and Beyer, B. BeyondCorp: A new approach to enterprise security. ;login: 39, 6 (2014), 6–11; https://ai.google/research/pubs/pub43231.

Silvia Esparrachiari Ghirotti has been at Google for eight years, working in the areas of social products, user data privacy, and fighting abuse. She currently leads the team developing tools for dependency control.

Tanya Reilly is the principal engineer for infrastructure at Squarespace. She previously spent 12 years improving the resilience of low-level services at Google, including introducing a layered model for dependency control.

Ashleigh Rentz is a technical writer whose interests include blameless post mortems and wearable technology. She spent 14 years at Google, most recently producing internal documentation for SRE and Google Cloud Platform.

Copyright held by owners/authors. Publication rights licensed to ACM. $15.00.
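The pseudocode in Figures 3, 6, and 8 can be sketched in runnable form. The Python below is an illustration under stated assumptions rather than the authors' implementation: the `Rpc` fields, the `Model` class, the `authorize` helper with its `enforce` flag (standing in for the enforce versus soft-apply choice), and the server names are all hypothetical. The layered model is encoded with can-send-to edges pointing downward, and the isolated model with no edges between regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rpc:
    name: str      # RPC name, as matched against the ACL (Figure 3)
    sender: str    # server that initiates the call
    receiver: str  # server that receives the call


def is_allowed_by_policy(rpc: Rpc, acl: frozenset) -> bool:
    # Figure 3: an RPC is allowed only if its name is listed in the ACL.
    return rpc.name in acl


def authorize(rpc: Rpc, acl: frozenset, enforce: bool, rejected: list) -> bool:
    """Return True if the RPC may proceed; record rejections either way."""
    if is_allowed_by_policy(rpc, acl):
        return True
    rejected.append(rpc.name)  # stands in for the monitoring report
    return not enforce         # soft-apply lets the rejected RPC through


class Model:
    """A DAG model: servers map to nodes, edges mean can-send-to."""

    def __init__(self, node_of: dict, edges: dict) -> None:
        self._node_of = node_of  # server -> node
        self._edges = edges      # node -> set of nodes it can send to

    def resolve_node(self, server: str) -> str:
        return self._node_of[server]

    def has_transitive_connection(self, src: str, dst: str) -> bool:
        # Every node can send to itself; otherwise search reachable nodes.
        if src == dst:
            return True
        seen, stack = {src}, [src]
        while stack:
            for nxt in self._edges.get(stack.pop(), ()):
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False


def is_allowed_by_model(rpc: Rpc, model: Model) -> bool:
    # Figure 6: resolve both ends, then check the can-send-to relation.
    client = model.resolve_node(rpc.sender)
    service = model.resolve_node(rpc.receiver)
    return model.has_transitive_connection(client, service)


def is_allowed_by_all_models(rpc: Rpc, models: list) -> bool:
    # Figure 8: the RPC must be accepted by every model.
    return all(is_allowed_by_model(rpc, m) for m in models)


# Hypothetical servers: a product and a storage server in each of two regions.
servers = ["earth-product", "earth-storage", "mars-product", "mars-storage"]
layered = Model({s: s.split("-")[1] for s in servers},   # node = layer
                {"product": {"storage"}})                # higher may call lower
isolated = Model({s: s.split("-")[0] for s in servers},  # node = region
                 {})                                     # no cross-region edges
models = [layered, isolated]

down = Rpc("Read", "earth-product", "earth-storage")
up = Rpc("Notify", "earth-storage", "earth-product")
cross = Rpc("Read", "earth-product", "mars-storage")
for rpc in (down, up, cross):
    print(rpc.sender, "->", rpc.receiver, is_allowed_by_all_models(rpc, models))

rejected = []
print(authorize(up, frozenset({"Read"}), enforce=False, rejected=rejected), rejected)
```

In this toy setup only the downward, same-region call passes both models; the upward call fails the layered model and the cross-region call fails the isolated model. With `enforce=False` the unlisted RPC still goes through but is recorded, which mirrors the soft-apply mode described in the text.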
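The cycle check described under Directed Acyclic Graphs (build a graph of services and outgoing RPCs, then attempt a topological sort with a depth-first search) can be sketched as follows. This is a minimal Python illustration; the function name `topological_order`, the graph encoding, and the service names are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # service -> services it sends RPCs to


def topological_order(graph: Graph) -> Optional[List[str]]:
    """Return services with dependencies first, or None if there is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(graph) | {d for deps in graph.values() for d in deps}
    color = {n: WHITE for n in nodes}
    order: List[str] = []

    def visit(node: str) -> bool:
        color[node] = GRAY
        for dep in sorted(graph.get(node, ())):
            if color[dep] == GRAY:
                return False      # back edge: dep is on the current path
            if color[dep] == WHITE and not visit(dep):
                return False
        color[node] = BLACK
        order.append(node)        # appended only after all its dependencies
        return True

    for node in sorted(nodes):
        if color[node] == WHITE and not visit(node):
            return None
    return order


# The chain from Figure 1: each service calls the next one down.
chain = {"LB": {"web server"}, "web server": {"app server"},
         "app server": {"database"}}
print(topological_order(chain))

# The notification example from Figure 2, before the cycle is removed.
cycle = {"controller": {"sender"}, "sender": {"controller"}}
print(topological_order(cycle))
```

For an acyclic graph the returned list is one possible bootstrap order, with each service listed after the services it depends on. Note that this sketch treats a self-dependency as a cycle, so self-edges that are considered acceptable would need to be filtered out first, and the recursive search may need to be made iterative for very large graphs.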


contributed articles
DOI:10.1145/3185336

Skill Discovery in Virtual Assistants

Skill recommendations must be provided when users need them most, without being obtrusive or distracting.

BY RYEN W. WHITE

key insights
• Virtual assistants in headless devices (such as smart speakers) do not fully convey their rapidly expanding capabilities, sometimes causing users to struggle to understand when and how to best utilize assistant skills.
• This article evaluates methods for recommending relevant virtual-assistant skills using the current task context, identifying complementary contributions from personal and contextual features.
• It also provides design recommendations for how to offer contextual skill recommendations in a timely and unobtrusive manner.

Virtual assistants like Amazon Alexa, Microsoft Cortana, Google Assistant, and Apple Siri employ conversational experiences and language-understanding technologies to help users accomplish a range of tasks, from reminder creation to home automation. Voice is the primary means of engagement, and voice-activated assistants are growing in popularity; estimates as of June 2017 put the number of monthly active users of voice-based assistant devices in the U.S. at 36 million.a Many are “headless” devices that lack displays. Smart speakers (such as Amazon Echo and Google Home) are among the most popular devices in this category. Speakers are tethered to one location, but there are other settings where voice-activated assistants can be helpful, including automobiles (such as for suggesting convenient locations to help with pending tasks5) and personal audio (such as for providing private notifications and suggestions18).

a https://www.emarketer.com/Article/Smart-Home-Speakers-Possible-New-Competitor/1015961

Virtual assistant capabilities are commonly called “skills.” Skill functionality ranges from basic (such as timers, jokes, and reminders) to more advanced (such as music playback, calendar management, and home automation). Assistant skillsets include both first-party skills and third-party skills. First-party skills comprise the aforementioned basic skill functionality found in many assistants, as well as skills that leverage assistant providers’ strengths in such areas as electronic commerce (Amazon Alexa), productivity (Microsoft Cortana), and search (Google Assistant). All major assistants also provide development kits that empower third-party developers to create their own skills for inclusion. Skills can be invoked independently, linked together within a single voice command to invoke a preprogrammed routine, or invoked in a sequence of related skills arranged as required for complex task completion. Despite the significant value virtual assistants can offer, discovery of their capabilities remains a challenge.

Skill Discovery
Skill discovery is a challenge for two primary reasons: the “affordances,” or capabilities, of virtual assistants are often unclear and the number of
skills available in virtual assistants is increasing rapidly. It is not easy to communicate all that these assistants can do. Users may develop an expectation from prior assistant use or device marketing that assistants will perform certain tasks well. Discovering new skills, especially those that could help with the task at hand, is considerably more difficult.b The number and variety of skills available in virtual assistants is also accelerating rapidly, especially with the advent of third-party skill creation through tools (such as the Alexa Skills Kitc and the Cortana Skills Kitd). Amazon Alexa, the most established skills platform, had more than 26,000 skills available as of December 2017. Figure 1 reports the dramatic increase in the Alexa skillset over time. The pace of growth is such that users often struggle to keep track as new skills are released.

Despite the increase in the number of Alexa skills, it is not clear that the ones being added are actively being utilized. To help determine if they are, I ran an offline experiment. Although usage logs from Alexa were unavailable for this study, it was possible to examine Alexa skill popularity on the http://www.bing.com/ Web search engine. Bing search logs show that over the 18 months between July 2016 and December 2017 inclusive (the maximum time horizon of the logs), the 100 most popular Alexa skills (0.4%) comprised two-thirds of the skill-related search clicks. It is unlikely the 99.6% of skills clicked one-third of the time have little or no utility, addressing only highly specific needs; moreover, there was no correlation between the explicit skill rating on the Alexa skill store and skill use (Pearson r = –0.05). A more likely explanation is that users need help to fully understand the capabilities of their virtual assistants and technology to support salient skill discovery is necessary to help them make the best use of these assistants.

b Although the focus is on skill discovery, it is also worth acknowledging there are other factors that can affect the use of virtual assistants, including reliance on far-field speech recognition in smart speakers that is typically less accurate than its near-field counterpart,25 as well as lack of privacy with the broadcast audio in these devices and concern about the social acceptability of using voice-activated assistants in general.3
c https://developer.amazon.com/alexa-skills-kit
d https://developer.microsoft.com/en-us/Cortana
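The two quantities used in the popularity analysis above, the share of clicks captured by the most popular skills and the Pearson correlation between store rating and use, can be illustrated with a minimal sketch. The click counts and ratings below are invented for illustration and are not the Bing log data; the function names are hypothetical.

```python
import math

def top_k_share(click_counts, k):
    """Return the fraction of all clicks that went to the k most-clicked skills."""
    total = sum(click_counts)
    if total == 0:
        return 0.0
    return sum(sorted(click_counts, reverse=True)[:k]) / total

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-skill click counts and store ratings (illustrative only).
clicks = [500, 300, 100, 40, 30, 20, 10]
ratings = [3.9, 4.6, 3.2, 4.8, 4.1, 3.5, 4.4]

head_share = top_k_share(clicks, k=2)       # share of clicks held by the top 2 skills
rating_vs_use = pearson_r(ratings, clicks)  # near zero would mean rating does not track use

# 100 skills out of roughly 26,000 is about 0.4% of the catalog.
top_100_fraction = 100 / 26_000
```

A heavily skewed `head_share` combined with a near-zero correlation is the pattern the article describes: use concentrates on a few skills, and ratings do not explain which ones.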

Skill Search
While assistants may support skill searching to locate skills of interest (such as through a dedicated skill like “SkillFinder” on Alexa), these searches may be frustrating and fruitless since smart speakers do not present sufficient clues about their capabilities; many people simply do not know what they can even ask. Although a virtual assistant can help order food from a local restaurant or reserve transportation, unless users are aware of such options, they are unlikely to invoke the skills except by accident or through trial and error. Unclear affordances have long been highlighted by the design community as a reason for inaccurate mental models and the sparse or incorrect use of technology.14 To address this limitation, assistants support voice prompts (such as “things to try” or “skill of the day”) or answer questions (such as “What are your new skills?”). Both are inefficient ways to access new capabilities that require user input and present results through an audio list that is difficult for a typical user to peruse. Moreover, skills discovered through such mechanisms are less likely to relate to users’ current tasks since the active context is ignored.

When users do search for specific skills, they encounter a different experience from what they may be familiar with through Web search engines; search engines are designed to handle general-purpose queries and provide rich visual feedback (a list of search results) that can help people understand what worked well in their query and help them refine their search as needed. In contrast, virtual assistants have a fixed set of capabilities, and smart speakers provide users limited information about what worked well in their search. Failure messages (such as “Sorry, I cannot do this for you right now” or “I am sorry, I do not understand the question”) are common but uninformative. If they cannot handle the request and have a display (if invoked through, say, a mobile device), some assistants may resort to presenting Web search results, creating a disjointed experience. Users may be affected by “functional fixedness,”2 a cognitive bias whereby expectations about what the assistant can do limit the breadth of users’ requests. Also, complex answers or multiple search results are difficult to convey through audio output alone. Virtual assistants may elect to delegate results presentation to a companion device (such as a smartphone) when they cannot present the results via audio, but such devices are not always available.

Figure 1. Growth in the number of available skills for Amazon Alexa from November 2015 to December 2017 inclusive; https://www.alexaskillstore.com/. Alexa is a good subject for this analysis given the maturity of Amazon’s third-party skills platform.
[Chart: “Cumulative Number of Amazon Alexa Skills Available Over Time”; x-axis: Month (November 2015 to December 2017); y-axis: Number of Skills (0 to 30000). Data labels in ascending order: 48, 111, 166, 252, 406, 667, 958, 1331, 1784, 2450, 3016, 3864, 5412, 6709, 8211, 9231, 10188, 11058, 11927, 14852, 17501, 19856, 21594, 23276, 24499, 26038.]

Figure 2. Plots of the time of day at which sleep sounds and productivity skills are invoked. The red dashed line denotes temporal distribution across all skills. Also shown are percentages of all invocations for each skill at two user-defined locations: home and work.
[Panels: “Temporal distribution of sleep sounds skill use” and “Temporal usage distribution of productivity skill use” (x-axis: Hour of day, 0 to 23; y-axis: % Invocations, 0% to 20%); “Location distribution of skill use” (y-axis: Fraction of invocations, 0% to 100%; bars for Sleep sounds, with data labels 94.10% and 5.90%, and Productivity, with data labels 82.90% and 17.10%; legend: Home, Work).]
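Distributions like those plotted in Figure 2 could be derived from invocation logs roughly as follows. This is a minimal sketch under stated assumptions: the record layout `(skill, hour, location)`, the skill names, and the sample records are all hypothetical, and nothing here reflects how Cortana actually stores or processes its logs.

```python
from collections import Counter

def hourly_distribution(records, skill):
    """Fraction of a skill's invocations falling in each hour of the day (0-23)."""
    hours = Counter(hour for name, hour, _ in records if name == skill)
    total = sum(hours.values())
    return [hours[h] / total if total else 0.0 for h in range(24)]

def location_share(records, skill):
    """Fraction of a skill's invocations at each user-specified location."""
    counts = Counter(loc for name, _, loc in records if name == skill)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}

# Hypothetical invocation records: (skill name, local hour, user-specified location).
records = [
    ("sleep_sounds", 22, "home"),
    ("sleep_sounds", 23, "home"),
    ("sleep_sounds", 22, "home"),
    ("sleep_sounds", 14, "work"),
    ("productivity", 9, "work"),
    ("productivity", 10, "work"),
    ("productivity", 20, "home"),
]

sleep_by_hour = hourly_distribution(records, "sleep_sounds")
sleep_by_location = location_share(records, "sleep_sounds")
```

Per-skill distributions of this kind are one simple way to express a time-and-location prior: a skill whose invocations cluster late in the evening at home is a weaker candidate for a mid-morning suggestion at work.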

Design Challenges
Developers of virtual assistants face a two-fold challenge: how to set user expectations and educate users about what their assistants can do; and how to help these users while they are learning or even afterward as the assistant adds new skills. Existing onboarding methods provide written instructions in the retail packaging with examples of the types of requests assistants can support, along with periodic email messages that highlight new capabilities over time. These methods mostly promote only first-party skills, yet the power of virtual assistants (and much of the challenge with skill discovery) resides in the silent emergence of tens of thousands (and ultimately many more) third-party skills. Although users can develop effective mental models of products through prolonged use,20 ever-expanding assistant capabilities make it difficult.

From a consumer-learning perspective, knowledge of virtual-assistant capabilities falls into the domain of “declarative knowledge.”22 Instruction manuals packaged with these products may outline sample functionality, but following the instructions too closely can hinder exploration of product capabilities.8 Periodic (weekly) email messages may help reveal new skills and reinforce existing skills, but they are shown on a different device from the one(s) used for skill invocation and at a different time from when they are needed. The in-situ recommendation of skills (based on the current task) could be an effective way to help ensure users of virtual assistants discover available skills and receive help when it is most needed and welcome.

Role of Context
Context is an important determinant of skill utility. The capabilities people want to employ differ based on contextual factors (such as time and location). For example, consider two skills, one offering ambient relaxing sounds to help promote sleep and one focusing on work productivity. Analyzing usage logs of these two skills from their internal deployment in Cortana with Microsoft employee volunteers reveals notable differences by time of day and location—home and work—both user-specified.

Figure 2 reports that, as expected, the sleep-sounds skill is more likely to be used in the evening and at night than during the day, and much more likely to be used at home than at work. Use of the productivity skill exhibits a wholly different time-and-location profile. Though just one example, it shows that even for an immobile device like a smart speaker, there are still important contextual factors that should influence skill-recommendation priors. The use of context is even more pertinent in mobile scenarios where the context is more dynamic and user tasks are more context-dependent.

Just-in-time information access has been studied extensively.16 To recommend the right skills at the “right time,” or when they are most useful for the current task, it is understood that developers of skills software need rich models of user context. Fortunately, virtual assistants already employ myriad sensors to collect and model context, including physical location, calendar, interests, preferences, search and browse activity, and application activity, all gathered with explicit user consent.

Skill Recommendation
Models for contextual skill recommendation can leverage a range of signals available to the virtual assistant to recommend relevant skills based on the current context. Recommender systems have been studied extensively,1 and developers of virtual assistants can draw on lessons from that community to assist in the recommendation of skills to users. Salient skills could be suggested in response to an explicit request for assistance from users (or a “cry for help” in more time-critical scenarios,12 where the need for assistance is more urgent) or be based on external events. For example, a virtual assistant running on a smart speaker deployed in a meeting room can use the commencement of the meeting as a trigger to suggest ways to help make the meeting more productive (such as by taking notes or identifying action items).

Given a set of rich contextual signals, I have been exploring the use of machine-learned skill-recommendation algorithms (in this case, multiple additive regression trees4) to recommend skills that are useful in the current context. The models are trained

using historic skill usage data. This experimental setup resembles click prediction in search and advertising17 but reflects several differences, including prediction target (skill used vs. search result or advertisement clicked), setting (open-ended assistant engagement vs. search-engine result page examination), and context (richer and more varied contextual signals available for skill usage prediction).

This research uses records with five months of skill invocations from an internal deployment of a smart speaker powered by Cortana with Microsoft employee volunteers (the same dataset as in Figure 2) and data on the context in which those skills were used as collected by Cortana. The data is split temporally and the first 16 weeks are used for training and the last two weeks for testing. Training and test data is stratified by user. The core principle in this specific instantiation of contextual skill recommendation is that if this or a similar context is observed again, then the skills used previously in that context—by the current user, one or more cohorts, or the population of users—are more likely to be relevant and hence used again. Since the use of historical data puts the focus largely on predicting already-used skills, the study also investigated prediction performance for the subset of test cases where Cortana first observed users trying a skill.

The study further examined the effect of three classes of features used in the learned-skill-recommendation model: popularity, or general popularity of a skill across all users (using only historic usage data from before the skill was invoked); context, or rich contextual signals describing when the skill is used; and personalization, or features corresponding to the user who invoked the skill (such as the popularity of the specific skill for that specific user). These features resemble some that are commonly used in search and recommendation,1,12 although there are differences (such as lack of an explicit query and desire to focus on suggestion utility rather than relevance or “interestingness” as the primary measure of model effectiveness). The table here provides more detail on the feature classes.

Figure 3 reports the receiver operating characteristic curves and precision-recall curves for the skill-usage prediction task across all skill instances in the test data. The feature contribution analysis starts with popularity features (area under the receiver operating characteristic, or ROC, curve or AUROC is 0.651), adds context features (AUROC increases to 0.786), and then adds personal features (AUROC increases further to 0.918), as in Figure 3a. All three models outperform a baseline of always predicting skill utilization, which reflects 19% precision at 100% recall, as traced by the dotted line in Figure 3b. The model that uses only historic skill-usage frequency performs worst. The results also show that algorithm performance improves considerably, given contextual features (yielding gains in precision) and personal features (yielding gains in recall), as in Figure 3b.e

Inspecting the feature weights in the model containing all features (full model) reveals the features with the greatest discriminatory values are those associated with skill popularity (for both the current user and globally), calendar, and short-term interests, in this case, recent Web search queries. While the performance of the recommendation model is promising, reliance on historic data and the importance of such data in the model means there could be limitations on when it can be applied; for example, it could perform worse for new skills for which virtual-assistant developers have little data. To better understand the role of usage data, I reran the experiment, limiting test-data skill invocations to cases where the invoked skill was used by the user for the first time. The results resemble those reported earlier (AUROC = 0.894 for the full model), suggesting the approach may well generalize to unseen

Classes of features used in the skill-recommendation task. Text features with * are first represented in a continuous semantic space (300-dimension concept vector),6 and the cosine similarities with both skill name and skill description are then computed. Each cosine measure (such as cosine similarity between the vectors for recent queries and for skill name) becomes a feature in the contextual skill-recommendation model.
  Feature class    Example features
  Popularity       Historic skill usage count, historic skill user count
  Context          Calendar (meeting duration, subject*), visited venues (type, duration, name*), local time, to-do-list items*, search queries (in past 30 mins*, in past 30 days*), applications used*
  Personal         Historic skill usage by current user within the past 30 mins, within the past 30 days, and all time

Figure 3. Performance curves for contextual skill recommendation. Results are shown for all skill instances in the test data for several feature classes: popularity, popularity plus context, and popularity plus context and personal (full model).
[Two panels, both axes 0.00 to 1.00: (a) ROC curves, True Positive Rate vs. False Positive Rate; (b) precision-recall curves, Precision vs. Recall. Legend: Popularity, plus Context, plus Personal.]

e Similar trends were observed when the order was reversed, to first add personal features and then to add contextual features.
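Three quantities in this section lend themselves to a short worked sketch: the cosine-similarity text features described in the feature table, the AUROC used to compare models, and the precision of the always-predict-use baseline. The code below is an illustration on toy values only; it does not train the regression-tree model, the vectors stand in for the 300-dimension concept vectors, and every name and number is hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def auroc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen used skill
    is scored above a randomly chosen unused one, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def always_positive_precision(labels):
    """Precision when every candidate is predicted as used (recall is then 100%)."""
    return sum(labels) / len(labels)

# Toy stand-ins for semantic vectors of a recent query and a skill name.
recent_query_vec = [0.2, 0.1, 0.7]
skill_name_vec = [0.1, 0.0, 0.9]
query_name_feature = cosine_similarity(recent_query_vec, skill_name_vec)

# Toy labels (1 = skill was used) and model scores for four candidates.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
model_auroc = auroc(labels, scores)
baseline_precision = always_positive_precision(labels)
```

Because predicting use for every candidate retrieves all positives, its precision equals the fraction of positives in the test data, which is how a 19% precision at 100% recall baseline arises.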

skills. Regardless, this usage-based method is meant only as an illustrative example, and many extensions are possible. Complementary methods from recommender-systems research specifically tailored for cold-start scenarios (such as by Schein et al.19) may be helpful in tandem with the usage-based approach.

Although the focus here is on contextual-skill recommendation, the results show that personal features contribute significantly to the quality of the recommendations generated. Personalization differs from contextualization because it is unique to the user, whereas contextualization could apply to all users in the same context, perhaps in the same meeting. More studies are needed on the use of personal signals, as well as how best to apply them to devices (such as smart speakers) that may be used in social settings (such as a meeting room or a family residence) where there could be simultaneous users of the virtual assistant, some known to the assistant and some unknown. More broadly, use of virtual assistants in social situations raises corporate product-development policy questions around whose virtual assistant should be employed at any given time in such multi-user settings. Speaker-identification technology15 can help distinguish speakers in these settings to help decide what user profile or even what virtual assistant to apply. A centralized group assistant tied to collective activities (such as meetings21) can also help serve as a broker to coordinate tasks between individuals and their virtual assistants, even across assistant brands.

Using Recommendations
Despite the plentiful opportunities around developing more accurate contextual skill-recommendation algorithms, generation of skill recommendations is not the only challenge developers of virtual assistants face when working in this area. They need to also consider how to present the recommendations to users at the right time and in a manner that is not too obtrusive or distracting. The detection of trigger events and selection of the appropriate notification strategy are both particularly important.

As noted, the trigger can be user-initiated and intrinsic, as when a user says, “Hey Cortana, help me,” or event-driven and extrinsic, as when a weather report warns of an impending severe weather event. Proactive scenarios on headless devices can cause frustration and distraction if a device reaches out with an audio message at an inopportune moment. While such intrusion could annoy users, it also has privacy implications tied to sharing potentially sensitive data with a wide audience, as when, say, accidentally notifying all meeting attendees about an upcoming private appointment. Methods have been proposed to better understand the situation and choose a suitable notification strategy.7 The need for intelligent notifications is not lost on designers of smart speakers; for example, Google Home and Amazon Echo both support subtler approaches for notifying users (such as illuminating an indicator light on the device as an alert regarding a pending notification). It is only when users notice the notification and engage with the device that the notification is provided. However, this delay reduces notification utility considerably.

Context and User Consent
The performance of the contextual-skill-recommendation algorithms is strongly dependent on the signals that are accessible and the degree of consent users are willing to provide to get access to them. Contextual-skill recommendation focuses on suggesting skills to help people when they are in a context where those skills could be useful. Communicating clearly to users the connection between the provision of consent for data access and the provision of useful recommendations is likely to increase the chances users would be willing to grant data access for skill-recommendation purposes.10

As virtual assistants begin to manifest in other applications and devices, the range of contextual signals available to them will expand; for example, Facebook’s artificial intelligence assistant, M, indeed chimes in during instant-messaging conversations to suggest relevant content and capabilities.f Virtual assistants can leverage signals about the conversation (such as topics discussed and people involved) for contextual modeling. Running skill recommendation within conversations highlights an interesting social dimension to the task of skill recommendations, where the skills suggested could vary based on who is spoken to and that person’s relationship with the speaker, in addition to the topic of the conversation.

f https://www.theverge.com/2017/4/6/15200836/facebook-messenger-m-suggestions-ai-assisant

Skills are also not used in isolation; many scenarios involve skills that are interconnected within a task, as in the restaurant-plus-transportation scenario mentioned earlier. Skill-invocation logs can be mined by virtual assistants for evidence of the co-utilization of these skills within a single task (similar to how guided tours and trails can be mined from historical user-activity data23) to help generate relevant skill recommendations. Such recommendations can then be presented proactively, immediately following the use of a related skill.

New Horizons
Virtual assistants have traditionally served users independently. In the past few years, assistant providers have started to partner to leverage their complementary strengths (such as the collaboration announced in August 2017 between Amazon and Microsoft on their Alexa and Cortana personal assistants). Although the focus of this article is generally on assistants recommending their own capabilities, opportunities are emerging to recommend skills among multiple assistants in ways where users could have several assistants, each helping them with one or more aspects of their lives. For example, Cortana might aim to capitalize on Microsoft’s many productivity assets to excel in the personal-productivity domain. In the partnership between Microsoft and Amazon, Alexa could recommend Cortana for productivity-related tasks, and Cortana could recommend Alexa for more consumer-related scenarios, especially in e-commerce. Such partnerships allow developers of virtual assistants to focus more of their resources on strengthening their differentiating capabilities and less on keeping pace with competitors in other areas. Beyond strategic partnerships between well-known major corporate assistant providers, the virtual-assistant-using public is also

likely to see increased collaborations among multiple skill developers to create compelling new skills and skill combinations. Such partnerships can capitalize on complementary technologies, shared domain knowledge, and other assets (such as data and human capital) that can unlock significant skill differentiation and utility for users. Services that offer skill federation across multiple assistants—much like search engine recommenders (such as Switcheroo,24 which directs searchers to the optimal search engine for their current query)—will also emerge for virtual assistants, guiding users to the assistant best able to handle the current task or their tasks in general. Interoperability among multiple assistants might also yield considerable user benefit; for example, assistants could share contextual signals to offer skill recommendations and other services of greater utility than any individual virtual assistant alone.

Helping people understand how their assistants can help them is an important step in driving their uptake at scale. This is especially important in smart speakers and similar devices, where capabilities are not immediately obvious given limited display capacity. Looking ahead, I offer the following eight recommendations for virtual assistant developers:

Be proactive. The effectiveness of search (reactive) experiences for the skill-discovery task is influenced by users’ expectations regarding affordances in virtual assistants. Proactive skill recommendation methods that understand the current context are a necessary complement to user-initiated skill discovery. Proactive methods may eventually supersede reactive methods as the primary means of engaging with assistant skills, contingent on the emergence of intelligent notification strategies. Virtual assistants could offer proactive support when certain criteria are met, including the availability of rich contextual signals, high confidence scores from recommendation algorithms, and low cost of interruption (such as when the user is assumed by the assistant to not be engaged in another task on the speaker or companion device, as in the recommendation on leveraging companion devices).

Timing is everything. Surfacing salient skills means users can more fully leverage the range of support virtual assistants can provide. Presenting users with skill suggestions at the right moment (when they need them) means assistant capabilities are more likely to be remembered in the future.9 Assistant providers could start by offering support for easily detectable events (such as the start of a scheduled meeting, receipt of a severe-weather alert, or following use of a related skill) and broaden trigger-event coverage thereafter based on task models built from contextual signals, users’ contact preferences, and implicit and explicit feedback data.

Use contextual and personal signals. Skills are relevant in one or more contexts. The results of the study showed both contextual and personal signals are important in skill recommendation. A combination of the current context and long-term user activities and interests should be used for this task if that data is available. In addition, skill-development kits should allow developers to specify for each skill during skill creation the context(s) in which the skill should be recommended.

Examine additional signals. There is a range of contextual and personal signals virtual assistants do not have access to today (such as conversations in the room where a smart speaker is located, food being consumed, and television shows being watched) that could correlate with the invocation of skills and enable more targeted recommendations. Virtual-assistant developers should investigate what subset of these contexts is most likely to yield the best improvements in the accuracy of skill recommendations and explore the feasibility of collecting these signals at a large scale. They also need to engage with users to understand what new signals they are comfortable sharing with their assistant.

Consider privacy and utility. User privacy is paramount. If developers and their employers expect users to provide access to the contextual and personal signals required by skill-recommendation algorithms, they must clearly show signal value. Offering the right help at the right moment and attributing it to the permissioned data access via recommendation explanations could serve

to demonstrate the utility that can be occasionally suggest new skills based Proceedings of the 22nd ACM International Conference
on Information and Knowledge Management (San
derived from data sharing. Virtual assis- on their users’ past skill usage. An ap- Francisco, CA, Oct. 27–Nov. 1). ACM Press, New York,
tants could offer explanations for each propriate format and time for such 2013, 2333–2338.
7. Horvitz, E. and Apacible, J. Learning and reasoning
skill recommendation to help users un- suggestions could be through an in- about interruption. In Proceedings of the Fifth
derstand how and why it was generated. structive tip at what may represent a International Conference on Multimodal Interfaces
(Vancouver, BC, Canada, Nov. 5–7). ACM Press, New
Permit multiple recommendations. teachable moment immediately fol- York, 2003, 20–27.
The focus in this article is the task of lowing the use of a related skill. As 8. Lakshmanan, A. and Krishnan, H.S. The Aha!
experience: Insight and discontinuous learning in
predicting the single skill that users mentioned, developing a notification product usage. Journal of Marketing 75, 6 (Nov. 2011),
105–123.
would be most likely to use in a given strategy needs careful attention, given 9. Kester, L., Kirschner, P.A., van Merriënboer, J.J., and
context. Regardless of the richness the need to balance the intrusiveness Baumer, A. Just-in-time information presentation and
the acquisition of complex cognitive skills. Computers
of any contextual model, the model of alerting (especially audio alerting) in Human Behavior 17, 4 (July 2001), 373–391.
is often incomplete and lacking in vs. guiding users toward skills when 10. Krause, A. and Horvitz, E. A utility-theoretic approach
to privacy in online services. Journal of Artificial
some information about the current they need them most. Intelligence Research 39 (Nov. 2010), 633–662.
task. Having only limited informa- 11. Marchionini, G. and Shneiderman, B. Finding facts vs.
browsing knowledge in hypertext systems. Computer
tion could thus affect recommenda- Conclusion 21, 1 (Jan. 1988), 70–80.
tion quality. When confidence in the Learning all that virtual assistants 12. Mishra, N., White, R.W., Ieong, S., and Horvitz, E.
Time-critical search. In Proceedings of the 37th
recommendation model is below a can do or relying on periodic skill- International ACM SIGIR Conference on Research
threshold at which a definitive skill update email messages from their and Development in Information Retrieval (Gold
Coast, QLD, Australia, July 6–11). ACM Press, New
would typically be suggested, the as- developers is insufficient for a user York, 2014, 747–756.
sistant should recommend multiple to make the most of such skills. Un- 13. Miyake, A. and Shah, P., Eds. Models of Working
Memory: Mechanisms of Active Maintenance and
(most-relevant) skills. This process like apps, which are popular on Executive Control. Cambridge University Press, New
accommodates less-relevant recom- smartphones and tablets, assistant York, 1999.
mendations and meets other requirements of the recommendation task (such as showing the breadth of relevant skills available and supporting serendipitous skill discovery).

Leverage companion devices. Devices without displays may still have access to many screens through WiFi or Bluetooth connectivity, whether on a smartphone, tablet, or desktop PC. Signals from such devices that may not be available on smart speakers (such as recent smartphone apps used) would help enrich the context and assist in providing more-relevant skill suggestions. Given limitations in users’ working memory,13 evaluating result lists is considerably easier if a device has a display. If not, only the top few options can reasonably be vocalized by the assistant for consideration by users. Virtual assistants running on headless devices could use proximal devices with screens to better understand user tasks and display additional content to augment voice-only interaction.

Support continuous learning. Recommendations are needed when users are new to their virtual assistant. However, since skill volume grows silently and quickly over time (see Figure 1 for an example of such growth on Alexa), I foresee there will always be a requirement for assistants to offer suggestions to their users on how they can best help them with their current task. To help improve user understanding, virtual assistants could

skills are most likely to be invoked on headless devices that lack displays, increasing dependence on skill finding and limiting skill discovery. The limitations of browsing to discover new knowledge are well understood.11 Even devices with screens, including Amazon’s Echo Show, are limited in the number of recommendations they can present to users and would benefit from algorithms that leverage contextual and personal cues for skill recommendation. Looking ahead, the user-perceived utility of virtual assistants, especially as they manifest in smart speakers and other headless devices (such as personal audio), will depend largely on their ability to proactively identify and share skills that help their users at the moment they need that help the most.

References
1. Adomavicius, G. and Tuzhilin, A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17, 6 (June 2005), 734–749.
2. Duncker, K. and Lees, L.S. On problem solving. Psychological Monographs 58, 5 (1945), i.
3. Easwara Moorthy, A. and Vu, K.P.L. Privacy concerns for use of voice-activated personal assistant in the public space. International Journal of Human-Computer Interaction 31, 4 (2015), 307–335.
4. Friedman, J., Hastie, T., and Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics 28, 2 (2000), 337–407.
5. Horvitz, E. and Krumm, J. Some help on the way: Opportunistic routing under uncertainty. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (Pittsburgh, PA, Sept. 5–8). ACM Press, New York, 2012, 371–380.
6. Huang, P.S., He, X., Gao, J., Deng, L., Acero, A., and Heck, L. Learning deep-structured semantic models for Web search using clickthrough data. In
14. Norman, D.A. Affordance, conventions, and design. Interactions 6, 3 (May 1999), 38–43.
15. Reynolds, D.A., Quatieri, T.F., and Dunn, R.B. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing 10, 1-3 (Jan. 2000), 19–41.
16. Rhodes, B.J. and Maes, P. Just-in-time information retrieval agents. IBM Systems Journal 39, 3.4 (2000), 685–704.
17. Richardson, M., Dominowska, E., and Ragno, R. Predicting clicks: Estimating the click-through rate for new ads. In Proceedings of the 16th International Conference on World Wide Web (Banff, AB, Canada, May 8–12). ACM Press, New York, 2007, 521–530.
18. Sawhney, N. and Schmandt, C. Nomadic radio: Speech and audio interaction for contextual messaging in nomadic environments. ACM Transactions on Computer-Human interaction 7, 3 (Sept. 2000), 353–383.
19. Schein, A.I., Popescul, A., Ungar, L.H., and Pennock, D.M. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Tampere, Finland, Aug. 11–15). ACM Press, New York, 2002, 253–260.
20. Schilling, M.A. A ‘small-world’ network model of cognitive insight. Creativity Research Journal 17, 2-3 (2005), 131–154.
21. Tur, G., Stolcke, A., Voss, L., Peters, S., Hakkani-Tur, D., Dowding, J., Favre, B., Fernández, R., Frampton, M., Frandsen, M., and Frederickson, C. The CALO meeting assistant system. IEEE Transactions on Audio, Speech, and Language Processing 18, 6 (Aug. 2010), 1601–1611.
22. Van Osselaer, Stijn M.J., and Janiszewski, C. Two ways of learning brand associations. Journal of Consumer Research 28, 2 (Sept. 2001), 202–223.
23. Wexelblat, A. and Maes, P. Footprints: History-rich tools for information foraging. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada, Apr. 13–18). ACM Press, 1999, 270–277.
24. White, R.W., Richardson, M., Bilenko, M., and Heath, A.P. Enhancing Web search by promoting multiple search engine use. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Singapore, July 20–24). ACM Press, New York, 2008, 43–50.
25. Wölfel, M. and McDonough, J. Distant Speech Recognition. John Wiley & Sons, Inc., New York, 2009.

Ryen W. White (ryenw@microsoft.com) is a Partner Researcher and Research Manager at Microsoft Research AI, Redmond, WA, USA.

© 2018 ACM 0001-0782/18/11 $15.00

DOI:10.1145/3186277
114 COMMUNICATIONS OF THE ACM | NOVEMBER 2018 | VOL. 61 | NO. 11

A Look at the Design of Lua

Simplicity, small size, portability, and embeddability set Lua apart from other scripting languages.

BY ROBERTO IERUSALIMSCHY, LUIZ HENRIQUE DE FIGUEIREDO, AND WALDEMAR CELES

IMAGE BY BUG FISH

key insights
• What sets Lua apart from other scripting languages is its particular set of goals: simplicity, small size, portability, and embeddability.
• The entire implementation of Lua has 25,000 lines of C code; the binary for 64-bit Linux has 200k bytes.
• Since its inception, Lua was designed to interoperate with other languages.

Lua is a scripting language developed at the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) that has come to be the leading scripting language for video games worldwide.3,7 It is also used extensively in embedded devices like set-top boxes and TVs and in other applications like Adobe Photoshop Lightroom and Wikipedia.14 Its first version was released in 1993. The current version, Lua 5.3, was released in 2015.

Though mainly a procedural language, Lua lends itself to several other paradigms, including object-oriented programming, functional programming, and data-driven programming.5 It also offers good support for data description, in the style of JavaScript and JSON. Data description was indeed one of our main motivations for creating Lua, some years before the appearance of XML and JavaScript.

Our motto in the design of Lua has always been “mechanisms instead of policies.” By policy, we mean a methodical way of using existing mechanisms to build a new abstraction. Encapsulation in the C language provides a good example of a policy. The ISO C specification offers no mechanism for modules or interfaces.9 Nevertheless, C programmers leverage existing mechanisms (such as file inclusion and external declarations) to achieve those abstractions. On top of such basic mechanisms provided by the C language, policy adds several rules (such as “all global functions should have a prototype in a header file” and “header files should not define objects, only declare them”). Many programmers do not realize that these rules (and the policy as a whole) are not part of the C language.

Accordingly, in the design of Lua, we have replaced the addition of many different features with the creation of only a few mechanisms that allow programmers to implement such features themselves.6 The motto leads to a design that is economical in concepts. Lua offers exactly one general mechanism for each major aspect of programming: tables for data; functions for abstraction; and coroutines for control. On top of these building blocks, programmers implement several other features, including modules, objects, and environments, with the aid of minimal additions (such as syntactic sugar) to the language. Here, we look at how this motto has worked out in the design of Lua.

Design Goals

Like other scripting languages, Lua has dynamic types, dynamic data structures, garbage collection, and an eval-like functionality. Consider Lua’s particular set of goals:

Simplicity. Lua aims to offer only a few powerful mechanisms that can address several different needs, instead of myriad specific language constructs, each tailored for a specific need. The Lua reference manual is small, with approximately 100 pages covering the language, its standard libraries, and the API with C;

Small size. The entire implementation of Lua consists of 25,000 lines of C code; the binary for 64-bit Linux has 200k bytes. Being small is important for both portability, as Lua must fit into a system before running there, and embedding, as it should not bloat the host application that embeds it;

Portability. Lua is implemented in ISO C and runs in virtually any system with as little as 300k bytes of memory. Lua runs in all mainstream systems and also on mainframes, inside OS kernels (such as the NetBSD kernel), and on “bare metal” (such as NodeMCU running on the ESP8266 microcontroller); and

Embeddability. Lua was designed since its inception to interoperate with other languages, both by extending (allowing Lua code to call functions written in a foreign language) and by embedding (allowing foreign code to call functions written in Lua).8 Lua is thus implemented not as a standalone program but as a library with a C API. This library exports functions that create a new Lua state, load code into a state, call functions loaded into a state, access global variables in a state, and perform other basic tasks. The standalone Lua interpreter is a tiny application written on top of the library.

These goals have had a deep impact on our design of Lua. Portability restricts what the standard libraries can offer to what is available in ISO C, including date and time, file and string manipulation, and basic mathematical functions. Everything else must be provided by external libraries. Simplicity and small size restrict the language as a whole. These are the goals behind the economy of concepts for the language. Embeddability has a subtler influence. To improve embeddability, Lua favors mechanisms that can be represented naturally in the Lua–C API. For instance, Lua tries to avoid or reduce the use of special syntax for a new mechanism, as syntax is not accessible through an API. On the other hand, mechanisms exposed as functions are naturally mapped to the API.

Following the motto “mechanisms instead of policies” has a clear impact on simplicity and small size. It also affects embeddability by breaking complex concepts into simpler ones that are easier to represent in the API.

Lua supports eight data types: nil, boolean, number, string, userdata, table, function, and thread, which represents coroutines. The first five are no surprise. The last three give Lua its flavor and are the ones we discuss here. However, given the importance of embeddability in the design of Lua, we first briefly introduce the interface between Lua and its host language.

The Lua–C API

To illustrate the concept of embedding in Lua, consider a simple example of a C program using the Lua library. Take this tiny Lua script, stored in a file

    pi = 4 * math.atan(1)

Figure 1 shows a C program that runs the script and prints the value of pi. The first task is to create a new state and populate it with the functions from the standard libraries (such as math.atan). The program then calls luaL_loadfile to load (precompile) the given source file into this state. In the absence of errors, this call produces a Lua function that is then executed by lua_pcall. If either loadfile or pcall raises an error, it produces an error message that is printed to the terminal. Otherwise, the program gets the value of the global variable pi and prints its value.

Figure 1. A C program using the Lua library.

    #include <stdio.h>
    #include "lauxlib.h"
    #include "lualib.h"

    int main (int argc, char **argv) {
      // create a new state
      lua_State *L = luaL_newstate();
      // load the standard libraries
      luaL_openlibs(L);
      // try to load the given file and then
      // call the resulting function
      if (luaL_loadfile(L, argv[1]) != LUA_OK ||
          lua_pcall(L, 0, 0, 0) != LUA_OK) {
        // some error occurred; print the error message
        fprintf(stderr, "lua: %s\n", lua_tostring(L, -1));
      }
      else { // code ran successfully
        lua_getglobal(L, "pi");
        printf("pi: %f\n", lua_tonumber(L, -1));
      }
      lua_close(L); // close the state
      return 0;
    }

The data exchange among these API calls is done through an implicit stack in the Lua state. The call to luaL_loadfile pushes on the stack either a function or an error message. The call to lua_pcall pops the function from the stack and calls it. The call to lua_getglobal pushes the value of the global variable. The call to lua_tonumber projects the Lua value on top of the stack to a double. The stack ensures these values remain visible to Lua while being manipulated by the C code so they cannot be collected by Lua’s garbage collector.

Besides the functions used in this simple example, the Lua–C API (or “C API” for short) offers functions for all kinds of manipulation of Lua values, including pushing C values (such as numbers and strings) onto the stack, calling functions defined by the script, and setting variables in the state.

Tables

“Table” is the Lua term for associative arrays, or “maps.” A table is just a collection of entries, which are pairs 〈key, value〉.

Tables are the sole data-structuring mechanism in Lua. Nowadays, maps are available in most scripting

languages, as well as in several non-scripting ones, but in Lua maps are ubiquitous. Indeed, Lua programmers use tables not only for all kinds of data structures (such as records, arrays, lists, sets, and sparse matrices) but also for higher-level constructs (such as modules, objects, and environments).

Programmers implement records using tables whose indices are strings representing field names. Lua supports records with syntactic sugar, translating a field reference like t.x to a table-indexing operation t["x"].

Lua offers constructors, expressions that create and initialize tables. The constructor {} creates an empty table. The constructor {x=10,y=20} creates a table with two entries, one mapping the string "x" to the integer 10, the other mapping "y" to 20. Programmers see this table as a record with fields "x" and "y".

Programmers implement arrays with tables whose indices are positive integers. Constructors also support this usage. For example, the expression {10,20,30} creates a table with three entries, mapping 1 to 10, 2 to 20, and 3 to 30. Programmers see the table as an array with three elements.

Arrays have no special status in the semantics of Lua; they are just ordinary tables. However, arrays pervade programming. Therefore, implementation of tables in Lua gives special attention to their use as arrays. The internal representation of a table in Lua has two parts: an array and a hash.7 If the array part has size N, all entries with integer keys between 1 and N are stored in the array part; all other entries are stored in the hash part. The keys in the array part are implicit and do not need to be stored. The size N of the array part is computed dynamically, every time the table has to rehash, as the largest power of two such that at least half the elements in the array part will be filled. A generic access (such as t[i]) first checks whether i is an integer in the range [1, N]; this is the most common case and the one programmers expect to be fast. If so, the operation gets the value in the array; otherwise, it accesses the hash. When accessing record fields (such as t.x) the Lua core knows the key is a string and so skips the array test, going directly to the hash.

An interesting property of this implementation is that it gives sparse arrays for free. For instance, when a programmer creates a table with three entries at indices 5, 100, and 3421, Lua automatically stores them in the hash part, instead of creating a large array with thousands of empty slots.

Lua also uses tables to implement weak references. In languages with garbage collection, a weak reference is a reference to an object that does not prevent its collection as garbage.10 In Lua, weak references are implemented in weak tables. A weak table is thus a table that does not prevent its contents from being collected. If a key or a value in an entry is collected, that entry is simply removed from the table; we discuss later how to signal that a table is weak. Weak tables in Lua also subsume ephemerons.4

Weak tables seem to contradict the motto “mechanisms instead of policies” because weak reference is a more basic concept than weak table. Weak tables would then be a policy, a particular way of using weak references. However, given the role of tables in Lua, it is natural to use them to support weak references without introducing yet another concept.

Functions

Lua supports first-class anonymous functions with lexical scoping, informally known as closures.13 Several non-functional languages nowadays (such as Go, Swift, Python, and JavaScript) offer first-class functions. However, to our knowledge, none uses this mechanism as pervasively as Lua.

All functions in Lua are anonymous. This is not immediately clear in the standard syntax for defining a function

    function add (x, y)
      return x + y
    end

Nevertheless, this syntax is just syntactic sugar for an assignment of an anonymous function to a variable

    add = function (x, y)
      return x + y
    end

Most dynamic languages offer some kind of eval function that evaluates a

piece of code produced at runtime. Instead of eval, Lua offers a load function that, given a piece of source code, returns a function equivalent to that code. We saw a variant of load in the C API in the form of luaL_loadfile. Consider the following piece of code

    local id = 0
    function genid ()
      id = id + 1
      return id
    end

When one loads it, the function load returns an anonymous function equivalent to the following code

    function ()
      local id = 0
      function genid ()
        id = id + 1
        return id
      end
    end

So, if a programmer loads Lua code stored in a string and then calls the resulting function, the programmer gets the equivalent of eval.

We use the term “chunk” to denote a piece of code fed to load (such as a source file). Chunks are the compilation units of Lua. When a programmer uses Lua in interactive mode, the Read-Eval-Print Loop (REPL) handles each input line as a separate chunk.

The function load simplifies the semantics of Lua in two ways: First, unlike eval, load is pure and total; it has no side effects and it always returns a value, either a function or an error message; second, it eliminates the distinction between “global” code and “function” code, as in the previous chunk of code. The variable id, which in the original code appears outside any function, is seen by Lua as a local variable in the enclosing anonymous function representing the script. Through lexical scoping, id is visible to the function genid and preserves its value between successive calls to that function. Thus, id works like a static variable in C or a class variable in Java.

Exploring Tables and Functions

Despite their apparent simplicity, or because of it, tables and functions form a basis for several other mechanisms in Lua, including modules, object-oriented programming, and exception handling. We now discuss some of them, emphasizing how they contribute to Lua’s design goals.

Modules. The construction of modules in Lua is a nice example of the use of first-class functions and tables as a basis for other mechanisms. At runtime, a module in Lua is a regular table populated with functions, as well as possibly other values (such as constants). Consider this Lua fragment

    print(math.sin(math.pi/6))   --> 0.5

Abstractly, programmers read this code as calling the sin function from the standard math module, using the constant pi from that same module. Concretely, the language sees math as a variable (created when Lua loaded its standard libraries) containing a reference to a table. That table has an entry with the key "sin" containing the sine function and an entry "pi" with the value of π.

Statically, a module is simply the chunk that creates its corresponding table. Figure 2 shows a standard idiom for defining a simple module in Lua. The code creates a table in the local variable M, populates the table with some functions, and returns that table. Recall that Lua loads any chunk as the body of an enclosing anonymous function; this is how one should read that code. The variable M is local to that enclosing function and the final statement returns from that function.

Figure 2. A simple module in Lua.

    local M = {}

    function M.new (x, y)
      return {x = x, y = y}
    end

    function M.add (u, v)
      return M.new(u.x+v.x, u.y+v.y)
    end

    function M.norm (v)
      return math.sqrt(v.x^2 + v.y^2)
    end

    return M

Figure 3. A module in Lua using environments.

    local sqrt = math.sqrt
    local _ENV = {}

    function new (x, y)
      return {x = x, y = y}
    end

    function add (u, v)
      return new(u.x+v.x, u.y+v.y)
    end

    function norm (v)
      return sqrt(v.x^2 + v.y^2)
    end

    return _ENV

Once defined in a file mymodule.lua, a programmer can use that module with code like this (see footnote a)

    local vec = require "mymodule"
    print(vec.norm(vec.new(10, 10)))   --> 14.142135623731

Footnote a. To test these pieces of code interactively, remove the local from the variable initializations. In interactive mode, Lua loads each line as an independent chunk. A local variable is thus visible only in the line where it was defined.

In it, require is a regular function from the standard library; when the single argument to a function is a literal string, the code can omit the parentheses in the call. If the module is not already loaded, require searches for an appropriate source for the given name (such as by looking for files in a list of paths), then loads and runs that code, and finally returns what the code returns. In this example, require returns the table M created by the chunk.

Lua leverages tables, first-class functions, and load to support modules. The only addition to the language is the function require. This economy is particularly relevant for an embedded language like Lua. Because require is a regular function, it cannot create local variables in the caller’s scope. Thus, in the example using "mymodule", the programmer had to define explicitly the local variable vec. Yet this limitation gives programmers the ability to give a local name to the module.

On the one hand, the construction of modules in Lua is not as elegant as a dedicated language mechanism could be, with explicit import and export lists and other refinements, as in the “import machinery” in Python.12 On the other hand, this construction has a clear semantics that requires no

further explanation. It also has an inexpensive implementation. Finally, and also quite important, it has an easy integration with the C API: One can easily create modules in C; create mixed modules with some functions defined in Lua and others in C; and have C code call functions inside modules. The API needs no additional mechanisms to do these tasks; all it needs is the existing Lua mechanisms to manipulate tables and functions.

Environments. Local variables in Lua follow a strict lexical scoping discipline. A local variable can be accessed only by code that is lexically written inside its scope. Lexical scoping implies that local variables are one of the few constructions that do not cross the C API, as C code cannot be lexically inside Lua code.

A program in Lua can be composed of multiple chunks (such as multiple modules) loaded independently. Lexical scoping implies that a module cannot create local variables for other chunks. Variables like math and require, created by the standard libraries, should thus be created as global variables. However, using global variables in a large program can easily lead to overly complex code, entangling apparently unrelated parts of a program. To circumvent this conflict, Lua does not have global variables built into the language. Instead, it offers a mechanism of environments that, by default, gives the equivalent of global variables. Nevertheless, as we show later in this article, environments allow other possibilities.

Recall that any chunk of code in Lua is compiled as if inside an anonymous function. Environments add two simple rules to this translation: First, the enclosing anonymous function is compiled as if in the scope of a local variable named _ENV; and second, any free variable id in the chunk is translated to _ENV.id. For example, Lua loads the chunk print(v) as if it was written like this

    local _ENV = <<some given value>>
    return function ()
      _ENV.print(_ENV.v)
    end

By default, load initializes _ENV with a fixed table, called the global environment. All chunks thus share this same environment by default, giving the illusion of global variables; in the chunk just mentioned, both v and print refer to fields in that table and thus behave as global variables. However, both load and the code being loaded can modify _ENV to any other value. The _ENV mechanism allows different scripts to have different environments, functions to be called with different environments, and other variations.

The translation of free variables needs semantic information to determine whether a variable is free. Nevertheless, the translation itself is purely syntactical. In particular, _ENV is a regular variable, needing no special treatment by the compiler. The programmer can assign new values to _ENV or declare other variables with that name. As an example, consider this fragment

    do
      local _ENV = {}
      ...
    end

Inside the do block, all free variables refer to fields in the new table _ENV. Outside the block, all free variables refer to the default environment.

A more typical use of _ENV is for writing modules. Figure 3 shows how to rewrite the simple module of Figure 2 using environments. In the first line, where the code “imports” a function from the math module, the environment is still the default one. In the second line, the code sets the environment to a new table that will represent the module. The code then defines the module components directly as free variables; instead of M.norm, it uses only norm, which Lua translates to _ENV.norm. The code ends the module with return _ENV.

This method for writing modules has two benefits: First, all external functions and modules must be explicitly imported right at the start; and second, a module cannot pollute the global space by mistake.

Object-oriented programming. Support for object-oriented programming in Lua follows the pattern we have been seeing in this article: It tries to build upon tables and functions, adding only the minimum necessary to the language.

Lua uses a two-tier approach to object-oriented programming. The first is implemented by Lua and the second by programmers on top of the first one. The first tier is class-based. Both objects and classes are tables, and the relation “instance of” is dynamic. Userdata, which represents C values in Lua, can also play the role of objects. Classes are called metatables. In this first tier, a class can define only methods for the standard operators (such as addition, subtraction, and concatenation). These methods are called metamethods.

Figure 4. An example of metatables.

    local mt = {}

    function newVector (x, y)
      local p = {x = x, y = y}
      setmetatable(p, mt)
      return p
    end

    function mt.__add (p1, p2)
      return newVector(p1.x + p2.x, p1.y + p2.y)
    end

    -- example of use
    A = newVector(10, 20)
    B = newVector(20, -40)
    C = A + B
    print(C.x, C.y)   --> 30 -20

Figure 4 illustrates how a programmer would use this basic mechanism to perform arithmetic on 2D vectors. The code starts with a table mt that would be the metatable for the vectors. The code then defines a function newVector to create 2D vectors. Vectors are tables with two fields, x and y. The standard function setmetatable establishes the “instance of” relation

between a new vector and mt. Next, the code defines the metamethod mt.__add to implement the addition operator for vectors. The code then creates two vectors, A and B, and adds them to create a new vector C. When Lua tries to evaluate A+B, it does not know how to add tables and so checks for an __add entry in A’s metatable. Given that it finds that entry, Lua calls the function stored there (the metamethod), passing the original operands A and B as arguments.

The metamethod for the indexing operator [] offers a form of delegation in Lua. Lua calls this metamethod, named __index, whenever it tries to retrieve the value of an absent key from a table. (For userdata, Lua calls that metamethod for all keys.) For the indexing operation, Lua allows the metamethod to be a function or a table. When __index is a table, Lua delegates to that table all access for an index that is absent in the original table, as illustrated by this code fragment

    Proto = {x = 0, y = 0}
    obj = {x = 10}
    mt = {__index = Proto}
    setmetatable(obj, mt)
    print(obj.x)   --> 10
    print(obj.y)   --> 0

In the second call to print, Lua cannot find the key "y" in obj and so delegates the access to Proto. In the first print, as obj has a field "x", the access is not delegated.

With tables, functions, and delegation, we have almost all we need for the second tier, which is based on prototypes. In it, programmers represent objects also by tables or userdata. Each object can have a prototype, from which it inherits methods and fields. The prototype of an object obj is the object stored in the __index field of the metatable of obj. One can then write obj.foo(x), and Lua will retrieve the method foo from the object’s prototype, through delegation.

However, if we stopped here, there would be a flaw in the support for object-oriented programming in Lua. After finding and calling the method in the object’s prototype, there would be no way for the method to access the original object, which is the intended receiver. Lua solves this problem through syntactic sugar. Lua translates a “method” definition like

    function Proto:foo (x)
      ...
    end

to a function definition:

    function Proto.foo (self, x)
      ...
    end

Likewise, Lua translates a “method” call obj:foo(x) to obj.foo(obj,x). When the programmer defines a “method” (a function using the colon syntax), Lua adds a hidden parameter self. When the programmer calls a “method” using the colon syntax, Lua provides the receiver as the argument to the self parameter. There is no need to add classes, objects, or methods to the language, merely syntactic sugar.

Figure 5 illustrates these concepts. First the code creates a prototype, the table Account. The code then creates a table mt to be used as the metatable for instances of Account. It then adds three methods to the prototype: one for creating instances, one for making deposits, and one for retrieving the account’s balance. Finally, it returns the prototype as the result of this module.

Figure 5. A simple prototype-based design in Lua.

    local Account = {bal = 0}
    local mt = {__index = Account}

    function Account:new ()
      local obj = {}
      setmetatable(obj, mt)
      return obj
    end

    function Account:deposit (amount)
      self.bal = self.bal + amount
    end

    function Account:balance ()
      return self.bal
    end

    return Account

Assuming the module is in the file Account.lua, the following lines exercise the code

    Account = require "Account"
    acc = Account:new()
    acc:deposit(1000)
    print(acc:balance())   --> 1000

First, the code requires the module, then it creates an account; acc will be an empty table with mt as its metatable. De-sugared, the next line reads as acc.deposit(acc,1000). The table acc does not have a deposit field, so Lua delegates that access to the table in the metatable’s __index field. The result of the access is the function Account.deposit. Lua then calls that function, passing acc as the first argument (self) and 1000 as the second argument (amount). Inside the function, Lua will again delegate the access self.bal to the prototype because acc does not yet have a field bal. In subsequent calls to balance, Lua will find a field bal in the table acc and use that value. Distinct accounts thus have separate balances but share all methods.

The access to a prototype in the metatable’s __index is a regular access, meaning prototypes can be chained. As an example, suppose the programmer adds the following lines to the previous example

    Object = {name = "no name"}
    setmetatable(Account, {__index = Object})

When Lua evaluates acc.name, the table acc does not have a name key, so Lua tries the access in its prototype, Account. That table also does not have that key, so Lua goes to Account’s prototype, the table Object, where it finally finds a name field.
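
The delegation chain just described can be checked with a short, self-contained script. This is only a sketch under stated assumptions: it folds the Account prototype of Figure 5 and the two chaining lines into a single chunk (so no require call is needed), and the rawget calls at the end are an addition for illustration, not part of the original example.

```lua
-- Sketch: Account and Object mirror the article's example, but live in
-- one chunk instead of a separate Account.lua module (an assumption).
local Object = {name = "no name"}

local Account = {bal = 0}
setmetatable(Account, {__index = Object})  -- Account delegates to Object

local mt = {__index = Account}             -- instances delegate to Account

function Account:new ()
  return setmetatable({}, mt)              -- setmetatable returns its first argument
end

function Account:deposit (amount)
  -- the read of self.bal may be delegated; the write always lands in self
  self.bal = self.bal + amount
end

function Account:balance ()
  return self.bal
end

local acc = Account:new()
acc:deposit(1000)

print(acc:balance())        -- the balance is now a field of acc itself
print(acc.name)             -- resolved two steps up the chain, in Object
-- rawget bypasses metatables, showing which table really holds a key
print(rawget(acc, "bal"))   -- present: created by the deposit
print(rawget(acc, "name"))  -- absent: acc has no field "name" of its own
```

The raw accesses make the division of labor visible: per-instance state is stored in the instance after the first write, while methods and defaults stay in the prototypes and are reached only through __index.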

Figure 6. Accounts with private fields.

    local bal = {}
    setmetatable(bal, {__mode = "k"})

    local Account = {}
    local mt = {__index = Account}

    function Account:new ()
      local obj = {}
      setmetatable(obj, mt)
      bal[obj] = 0
      return obj
    end

    function Account:deposit (amount)
      bal[self] = bal[self] + amount
    end

    function Account:balance ()
      return bal[self]
    end

    return Account

The programmer can keep the balances private by storing them outside the object table, as shown in Figure 6. The key difference between this version and the one in Figure 5 is the use of bal[self] instead of self.bal to denote the balance of an account. The table bal is what we call a dual table. The call to setmetatable in the second line causes this table to have weak keys, thus allowing an account to be collected when there are no other references to it in the program. The fact that bal is local to the module ensures no code outside that module can see or tamper with an account’s balance, a technique that is handy whenever one needs a private field in a structure.

An evaluation of Lua’s support for object-oriented programming is not very different from the evaluation of the other mechanisms we have discussed so far. On the one hand, object-oriented features in Lua are not as easy to use as in other languages that offer specific constructs for the task. In particular, the colon syntax can be somewhat confusing, mainly for programmers who are new to Lua but have some experience with another object-oriented language. Lua needs that syntax because of its economy of concepts that avoids introducing the concept of method when the existing concept of function will suffice.

On the other hand, the semantics of objects in Lua is simple and clear. Also, the implementation of objects in Lua is flexible. Because method selection and the variable self are independent, Lua does not need additional mechanisms to call methods from other classes (such as “super”). Finally, this design is friendly to the C API. All it needs is basic manipulation of tables and functions, plus the standard function setmetatable. Lua programmers can implement prototypes in Lua and create userdata instances in C, create prototypes in C and instances in Lua, and define prototypes with some methods implemented in Lua and others in C. All these pieces work together seamlessly.

Exception handling. Exception handling in Lua is another mechanism that relies on the flexibility of functions. Several languages offer a try–catch construction for exception handling; any exception in the code inside a try clause jumps to a corresponding catch clause. Lua does not offer such a construction, mainly because of the C API.

More often than not, exceptions in a script are handled by the host application. A syntactic construction like try–catch is not easily mapped into an API with a foreign language. Instead, the C API packs exception-handling functionality into the higher-order function lua_pcall (“protected call”) we discussed when we visited the C API earlier in this article. The function pcall receives a function as an argument and calls that function. If the provided function terminates without errors, pcall returns true; otherwise, pcall catches the error and returns false plus an error object, which is any value given when the error was raised. Regardless of how pcall is implemented, it is exposed in the C API as a conventional function. The C API also offers a function to raise errors, called lua_error, whose only argument is the error object. The function error also appears in the C API as a regular function despite the fact that it never returns.

Both lua_pcall and lua_error are reflected into Lua via the standard library. In languages that support try–catch, typical exception-handling code looks like this

    try {
      <<protected code>>
    }
    catch (errobj) {
      <<exception handling>>
    }

The equivalent code in Lua is like this

    local ok, errobj = pcall(function ()
      <<protected code>>
    end)

    if not ok then
      <<exception handling>>
    end

In this translation, anonymous functions with proper lexical scoping play a central role. Except for statements that invoke escape continuations (such as break and return), everything else can be written inside the protected code as if written in the regular code.

The use of pcall for exception handling has pros and cons similar to those for modules. On the one hand, the code may not look as elegant as in other languages that support the traditional try. On the other hand, it has a clear semantics. In particular, questions like “What happens with exceptions inside the catch clause?” have an obvious answer. Moreover, it has a clear and easy integration with the C API; it is exposed through conventional

Figure 7. A simple example of a coroutine in Lua.

    co = coroutine.create(function (x)
      print(x)                  --> 10
      x = coroutine.yield(20)
      print(x)                  --> 30
      return 40
    end)

    print(coroutine.resume(co, 10))  --> 20
    print(coroutine.resume(co, 30))  --> 40

functions; and Lua programs can raise errors in Lua and catch them in C and raise errors in C and catch them in Lua.

Coroutines

Like associative arrays and first-class functions, coroutines are a well-established concept in programming. However, unlike tables and first-class functions, there are significant variations in how different communities implement coroutines.2 Several of these variations are not equivalent, in the sense that a programmer cannot implement one on top of the other.

Coroutines in Lua are like cooperative multithreading and have the following distinguishing properties:

First-class values. Lua programmers can create coroutines anywhere, store them in variables, pass them as parameters, and return them as results. More important, they can resume coroutines anywhere;

Suspend execution. They can suspend their execution from within nested functions. Each coroutine has its own call stack, with a semantics similar to collaborative multithreading. The entire stack is preserved when the coroutine yields;

Asymmetric. Symmetric coroutines offer a single control-transfer operation that transfers control from the running coroutine to another given coroutine. Asymmetric coroutines, on the other hand, offer two control-transfer operations, resume and yield, that work like a call–return pair; and

Equivalent to one-shot continuations.2 Despite this equivalence, coroutines offer one-shot continuations in a format that is more natural for a procedural language due to its similarity to multithreading.

Figure 7 illustrates the life cycle of a coroutine in Lua. The program prints 10, 20, 30, and 40, in that order. It starts by creating a coroutine co, giving an anonymous function as its body. That operation returns only a handle to the new coroutine, without running it. The program then resumes the coroutine for the first time, starting the execution of its body. The parameter x receives the argument given to resume, and the program prints 10. The coroutine then yields, causing the call to resume to return the value 20, the argument given to yield. The program then resumes the coroutine again, making yield return 30, the value given to resume. The coroutine then prints 30 and finishes, causing the corresponding call to resume to return 40, the value returned by the coroutine.

Coroutines are not as widely used in Lua as tables and functions. Nevertheless, when required, coroutines play a pivotal role, due to their capacity for turning the control flow of a program inside out.

An important use of coroutines in Lua is for implementing cooperative multithreading. Games typically exploit this feature, because they need to be in control to remain responsive at interactive rates. Each character or object in a game has its own script running in a separate coroutine. Each script is typically a loop that, at each iteration, updates the character’s state and then yields. A simple scheduler resumes all live coroutines at each game update.

Another use of coroutines is in tackling the “who-is-the-boss” problem. A typical issue with scripting languages is the decision whether to embed or to extend. When programmers embed a scripting language, the host is the boss, that is, the host program, written in the foreign language, has the main loop of the program and calls functions written in the scripting language for particular tasks. When programmers extend a scripting language, the script is the boss; programmers then write libraries for it in the foreign language, and the main loop of the program is in the script.

Embedding and extending both have advantages and disadvantages, and the Lua–C API supports them equally. However, external code can be less forgiving. Suppose a large, monolithic application contains some useful functionality for a particular script. The programmer wants to write the script as the boss, calling functions from that external application. However, the application itself assumes it is the boss. Moreover, it may be difficult to break the application into individual functions and offer them as a coherent library to the script.

Coroutines offer a simpler design. The programmer modifies the application to create a coroutine with the script when it starts; every time

the application needs an input, it resumes that coroutine. That is the only change the programmer needs to make in the application. The script, for its part, also looks like a regular program, except it yields when it needs to send a command to the application. The control flow of the resulting program progresses as follows: The application starts, creates the coroutine, does its own initialization, and then waits for input by resuming the coroutine. The coroutine then starts running, does its own initialization, and performs its duties until it needs some service from the application. At this point, the script yields with a request, the call to resume made by the application returns, and the application services the given request. The application then waits for the next request by resuming the script again.

Presentation of coroutines in the C API is clearly more challenging than presentation of functions and tables. C code can create and resume coroutines without restrictions. In particular, resuming works like a regular function call: It (re)activates the given coroutine when called and returns when the coroutine yields or ends. However, yielding also poses a problem. Once a C function yields, there is no way to later return the control to that point in the function. The API offers two ways to circumvent this restriction: The first is to yield in a tail position: When the coroutine resumes, it goes straight to the calling Lua function. The second is to provide a continuation function when yielding. In this way, when the coroutine resumes, the control goes to the continuation function, which can finish the task of the original function.

We can see again in the API the advantages of asymmetric coroutines for a language like Lua. With symmetric coroutines, all transfers would have the problems that asymmetric coroutines have only when yielding. In our experience, resumes from C are much more common than yields.

Conclusion

Every design involves balancing conflicting goals. To address the conflicts, designers need to prioritize their goals. This is clearly true of the design of any programming language.

Lua has a unique set of design goals that prioritize simplicity, portability, and embedding. The Lua core is based on three well-known, proven concepts (associative arrays, first-class functions, and coroutines), all implemented with no artificial restrictions. On top of these components, Lua follows the motto “mechanisms instead of policies,” meaning Lua’s design aims to offer basic mechanisms to allow programmers to implement more complex features. For instance, in the case of modules, tables provide name spaces, lexical scoping provides encapsulation, and first-class functions allow exportation of functions. On top of that, Lua adds only the function require to search for and load modules.

Modularity in language design is nothing new.11 For instance, it can be used to clarify the construction of a large application.1 However, Lua uses modularity to keep its size small, breaking down complex constructions into existing mechanisms.

The motto “mechanisms instead of policies” also makes for a flexible language, sometimes too flexible. For instance, the do-it-yourself approach to classes and objects leads to proliferation of different, often incompatible, systems, but is handy when a programmer needs to adapt Lua to the class model of the host program.

Tables, functions, and coroutines as used in Lua have shown great flexibility over the years. Despite the language’s continuing evolution, there has been little demand from programmers to change the basic mechanisms.

The lack of built-in complex constructions and minimalist standard libraries (for portability and small size) make Lua a language that is not as good as other scripting languages for writing “quick-and-dirty” programs. Many programs in Lua need an initial phase for programmers to set up the language, as a minimal infrastructure for object-oriented programming. More often than not, Lua is embedded in a host application. Embedding demands planning and the set-up of the language is typically integrated with its embedding. Lua’s economy of concepts demands from programmers a deeper understanding of what they are doing, as most constructions are explicit in the code. This explicitness also allows such deeper understanding. We trust this is a blessing, not a curse.

References

1. Cazzola, W. and Olivares, D.M. Gradually learning programming supported by a growable programming language. IEEE Transactions on Emerging Topics in Computing 4, 3 (July 2016), 404–415.
2. de Moura, A.L. and Ierusalimschy, R. Revisiting coroutines. ACM Transactions on Programming Languages and Systems 31, 2 (Feb. 2009), 6.1–6.31.
3. Gamasutra. Game Developer magazine’s 2011 Front Line Award, Jan. 13, 2012; https://www.gamasutra.com/view/news/129084/
4. Hayes, B. Ephemerons: A new finalization mechanism. In Proceedings of the 12th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Atlanta, GA, Oct. 5–9). ACM, New York, 1997, 176–183.
5. Ierusalimschy, R. Programming with multiple paradigms in Lua. In Proceedings of the 18th International Workshop on Functional and (Constraint) Logic Programming, LNCS, Volume 5979. S. Escobar, Ed. (Brasilia, Brazil, June 28). Springer, Heidelberg, Germany, 2009, 5–13.
6. Ierusalimschy, R., de Figueiredo, L.H., and Celes, W. Lua—An extensible extension language. Software: Practice and Experience 26, 6 (June 1996), 635–652.
7. Ierusalimschy, R., de Figueiredo, L.H., and Celes, W. The evolution of Lua. In Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages (San Diego, CA, June 9–10). ACM Press, New York, 2007, 2.1–2.26.
8. Ierusalimschy, R., de Figueiredo, L.H., and Celes, W. Passing a language through the eye of a needle. Commun. ACM 54, 7 (July 2011), 38–43.
9. International Organization for Standardization. ISO 2000. International Standard: Programming Languages, C. ISO/IEC 9899:1999(E).
10. Jones, R., Hosking, A., and Moss, E. The Garbage Collection Handbook. CRC Press, Boca Raton, FL, 2011.
11. Kats, L. and Visser, E. The Spoofax Language Workbench: Rules for declarative specification of languages and IDEs. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (Reno/Tahoe, NV, Oct. 17–21). ACM Press, New York, 2010, 444–463.
12. The Python Software Foundation. The Python Language Reference, 3.5 Edition. The Python Software Foundation, 2015.
13. Sestoft, P. Programming Language Concepts, Second Edition. Springer, Cham, Switzerland, 2017.
14. Wikipedia. List of applications using Lua; https://en.wikipedia.org/w/index.php?title=List_of_applications_using_Lua&oldid=795421653

Roberto Ierusalimschy (roberto@inf.puc-rio.br) is an associate professor of computer science at PUC-Rio, the Pontifical Catholic University of Rio de Janeiro, Brazil.

Luiz Henrique de Figueiredo (lhf@impa.br) is a researcher at IMPA, the Institute for Pure and Applied Mathematics in Rio de Janeiro, Brazil.

Waldemar Celes (celes@inf.puc-rio.br) is an associate professor of computer science at PUC-Rio, the Pontifical Catholic University of Rio de Janeiro, Brazil.

Copyright held by the authors. Publication rights licensed to ACM. $15.00

Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/a-look-at-the-design-of-lua

DOI:10.1145/3186278

Modern Debugging: The Art of Finding a Needle in a Haystack

Systematic use of proven debugging approaches and tools lets programmers address even apparently intractable bugs.

BY DIOMIDIS SPINELLIS

key insights
- Targeted software-development process improvements can aid debugging even in cases where their wholesale adoption is impractical.
- Debugging benefits from the widespread availability of code, data, and Q&A forums, and programmers can fix many tricky bugs through the generation and analysis of rich datasets.
- Modern debugging tools offer powerful and specialized facilities that can save hours of tedious unproductive debugging work.

The computing pioneer Maurice Wilkes famously described his 1949 encounter with debugging like this: “As soon as we started programming, [...] we found to our surprise that it wasn’t as easy to get programs right as we had thought it would be. [...] Debugging had to be discovered. I can remember the exact instant [...] when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.”37

Seven decades later, modern computers are approximately one million times faster and also have one million times more memory than Wilkes’s Electronic Delay Storage Automatic Calculator, or EDSAC, an early stored-program computer using mercury delay lines. However, in terms of bugs and debugging not much has changed. As developers, we still regularly make mistakes in our programs and spend a large part of our development effort trying to fix them.

Moreover, nowadays, failures can occur nondeterministically in nanosecond time spans within computer systems consisting of thousands of processors spanning the entire planet running software code where size is measured in millions of lines. Failures can also be frighteningly expensive, costing human lives, bringing down entire industries, and destroying valuable property.22 Thankfully, debugging technology has advanced over the years, allowing software developers to pinpoint and fix faults in ever more complex systems.

One may reasonably wonder how debugging is actually performed in practice. Three recent publications have shed light on a picture full of contrasts. A common theme is that the practice and problems of debugging have not markedly changed over the past 20 years. Michael Perscheid and colleagues at the SAP Innovation Center and the Hasso Plattner Institute in Potsdam, Germany, examined the debugging practices of professional software developers and complemented the results with an online study.27 They found developers are not trained in debugging, spend 20% to 40% of their work time in it, structure their debugging process following a simplified scientific method (see Figure 1), are proficient in using symbolic debuggers, regularly debug by adding print statements, are unfamiliar with back-in-time debuggers and automatic fault localization, and consider design, concurrency, and memory faults as the most difficult to debug. The low level of knowledge and use associated with many advanced debugging techniques was also revealed in a mixed-methods study conducted by Moritz Beller and colleagues at the Delft University of Technology, the Netherlands, and myself.3 In addition, a team led by Marcel Böhme of Monash University, Clayton, VIC, Australia, performed a controlled study by having software professionals fix faults in a carefully constructed benchmark suite of software faults,6 finding that professionals typically agree on fault locations they identified using trace-based and interactive debugging. However, the study’s subjects then went on to implement incorrect fixes, suggesting opportunities for automated regression testing.

Beginners sometimes view debugging as an opaque process of randomly trying things until locating a fault, a method closer to alchemy than to science. Yet debugging can be systematized into the process illustrated by the Unified Modeling Language activity diagram in Figure 1. The first step involves reliably reproducing the failure. It is up to the programmer to produce meaningful results when running experiments to find the failure’s cause. Then comes the task of simplifying the failure’s configuration into the smallest test case that would still cause the failure to occur.40 The small test case simplifies and speeds up the programmer’s subsequent fault-discovery work. The corresponding steps are outlined

in the top part of Figure 1. The next steps, termed the “scientific method of debugging,”40 are outlined in the bottom part of the figure. In them, the programmer develops a theory about a fault being witnessed, forms a hypothesis regarding the theory’s effects, and gathers and tests data against the hypothesis.24 The programmer repeatedly refines and tests the theory until the cause of the failure is found.

The programmer may sometimes short-circuit this process by guessing directly a minimal test case or the failure’s cause. This is fine, especially if the programmer’s intuition as an expert provides correct guidance to the cause. However, when the going gets tough, the programmer should humbly fall back on the systematic process instead of randomly poking the software trying to pinpoint the fault through sheer luck.

The goal of this article is to arm software developers with both knowledge-gathering and theory-testing methods, practices, tools, and techniques that give them a fighting chance when struggling to find the fault that caused a failure. Some techniques (such as examining a memory image, still often termed a magnetic memory “core dump”) have been with programmers since the dawn of computing. Others (such as reverse debugging) are only now becoming routinely available. And yet others (such as automatic fault localization based on slicing or statistical analysis) do not seem to have caught on.27 I hope that summarizing here the ones I find through my experience as most effective can improve any programmer’s debugging performance.

On the Shoulders of Colleagues

The productivity boost I get as a developer by using the Web is such that I now rarely write code when I lack Internet access. In debugging, the most useful sources of help are Web search, specialized Q&A sites, and source-code repositories. Keep in mind that the terms of a programmer’s work contract might prohibit some of these help options.

Web search. Looking for answers on the Web might sound like cheating. But when debugging, the programmer’s goal is to solve a problem, not demonstrate academic knowledge and problem-solving skills. If the fastest way to pinpoint and fix a problem is to copy-and-paste an error message in a Web search engine and select the most promising answer, that is what the programmer should do. One can often obtain better results by polishing the query, removing context-dependent data (such as variable or file names) and enclosing the error message in quotes to search for the exact phrase, rather than just the words in it.

Web search typically works when a programmer encounters problems with widely used third-party software. Two possible reasons can yield an unproductive search: First, the programmer may be the first person ever to encounter the problem. This is unlikely with popular software but can happen when working with a cutting-edge release or with a niche or legacy product. There is always an unlucky soul who is first to post about a failure. Second, the error message the programmer is looking for may be a red herring, as with, say, a standard innocuous warning rather than the actual cause of the failure. One must judge search results accordingly.

Q&A sites. The Web can also help a programmer’s debugging through Q&A forums (such as a specific product’s issue tracker, a company’s internal equivalent, or the various https://stackexchange.com/ sites). If the problem is general enough, it is quite likely an expert volunteer will quickly answer the question. Such forums should be used with courtesy and consideration: One should avoid asking an already-answered question, post to the correct forum, employ appropriate tags, ask using a working minimal example, identify a correct answer, and give back to the community by contributing answers to other questions. Writing a good question post sometimes requires significant research.20 Then again, I often find this process leads me to solve the problem on my own.

Source-code availability. When the fault occurs within open source software, a programmer can use the Web to find, download, and inspect the corresponding code.31 One should not be intimidated by the code’s size or one’s personal unfamiliarity. Chances are, the programmer will be looking at only a tiny part of the code around an error message or the location of a crash. The programmer can find the error message by searching through the code

for the corresponding string. Crashes typically offer a stack trace that tells the programmer exactly the associated file and line number. The programmer can thus isolate the suspect code and look for clues that will help isolate the flaw. Is some part of the software misconfigured? Are wrong parameters being passed through an API? Is an object in an incorrect state for the method being called? Or is there perhaps an actual fault in the third-party software?

If the bug fix involves modifying open source software, the programmer should consider contributing the fix back to its developers. Apart from being a good citizen, sharing it will prevent the problem from resurfacing when the software is inevitably upgraded to a newer release.

Tuning the Software-Development Process

Some elements of a team’s software-development process can be instrumental in preventing and pinpointing bugs. Those I find particularly effective include implementing unit tests, adopting static and dynamic analysis, and setting up continuous integration to tie all these aspects of software development together. Strictly speaking, these techniques aim at detecting bugs or preventing them before they occur, rather than at locating a failure’s root cause. However, in many difficult cases (such as nondeterministic failures and memory corruption), a programmer can apply them as an aid for locating a specific bug. Even if an organization’s software development process does not follow these guidelines, they can be adopted progressively as the programmer hunts bugs.

Unit tests. It is impossible to build a bug-free system using faulty software components and devilishly difficult to isolate a problem in a huge lump of code. Unit tests, which verify the functionality of (typically small) code elements in isolation, help in both directions,28 increasing the reliability of routines (functions or methods) they test by guarding their correctness. In addition, when a problem does occur, the programmer can often try to guess what parts may be responsible for it and add unit tests that are likely to uncover it. This way of working gives the programmer a systematic approach for clearing suspect code until hitting the faulty one. The new unit tests the programmer adds also result in a better-tested system, making refactorings and other changes less risky.

When writing unit tests the programmer is forced to write code that is easy to test, modular, and relatively free of side effects. This can further simplify debugging, allowing the programmer to inspect through the debugger how each small unit behaves at runtime, either by adding suitable breakpoints or by directly invoking the code through the debugger’s read-eval-print loop, or REPL, facility.

Debugging libraries and settings. Third-party libraries and systems can also aid a fault-finding mission through the debugging facilities they provide. Some runtime libraries and compilers (such as those of C and C++) provide settings that guard against pointer errors, memory buffer overflows, or memory leaks at the expense of lower runtime performance. Compilers typically offer options to build code for debugging by disabling optimizations (aggressive optimizations can confuse programmers when trying to follow the flow of control and data) and by including more information regarding the source code associated with the compiled code. By enabling these settings the programmer is better able to catch many errors.

Static analysis. One can catch some errors before the program begins to execute by reasoning about the program code through what is termed in the software engineering literature “static program analysis.” For example, if a method can return a null value and this value is subsequently dereferenced without an appropriate check, a static-analysis tool can determine the program could crash due to a null pointer dereference. Tools (such as FindBugs1 and Coverity Scan5) perform this feat through multiple approaches (such as heuristics, dataflow or constraint analysis, abstract interpretation, symbolic execution,8 and type and effect systems).23 The end result is a list of messages indicating the location of probable faults in a particular program. Depending on the tool, the approach being used, and the program’s language, the list may be incomplete (false negative results) or include entries that do not signify errors (false positives). Nevertheless, finding and fixing such errors often prevents serious faults and can sometimes allow the programmer to find a failure’s cause.

Dynamic analysis. An alternative approach for analyzing a program’s dynamic behavior is to run it under a specialized tool. This is particularly useful when locating a fault involves sophisticated analysis of large and complex data structures that cannot be easily processed with general-purpose command line tools or a small script. Here are some examples of tools a programmer may find useful. In languages compiled with the LLVM Clang front-end the programmer can use AddressSanitizer,30 while a program runs, to detect many memory-handling errors: out-of-bounds access, use after free, use after scope exit, and double or invalid frees. Another related tool is Valgrind21 through which one can find potentially unsafe uses of uninitialized values and memory leaks. In addition, Valgrind’s Helgrind and data race detector (DRD) tools can help find race conditions and lock order violations in code that uses the POSIX threads API. If the code is using a different thread API, the programmer should consider applying Intel Inspector technology,29 which also supports Threading Building Blocks, OpenMP, and Windows threads.

Continuous integration. Running static program-analysis tools on code to pinpoint a fault can be like trying to turn the Titanic around after hitting an iceberg. At that point, catastrophic damage has already been done, and it is too late to change the course of events. In the case of a large software codebase, trying to evaluate and fix the scores of error messages spewed by an initial run of a static-analysis tool can be a thorny problem. The developers who wrote the code may be unavailable to judge the validity of the errors, and attempting to fix them might reduce the code’s maintainability and introduce even more serious faults. Also, the noise of existing errors hides new ones appearing in fresh code, thus contributing to a software-quality death spiral. To avoid such a problem, the best approach is to integrate execution of static analysis into the software’s continuous-integration process,10 which entails regularly merging (typically several times a day)

all developer work into a shared reference version. Running static analysis when each change is made can keep the software codebase squeaky clean from day one.

Making the Software Easier to Debug

Some simple software design and programming practices can make software easier to debug, by providing or configuring debugging functionality, logging and receiving debug data, and using high(er)-level languages. Again, a programmer can selectively adopt these practices during challenging bug-hunting expeditions.

Software’s debugging facilities. A helpful way to isolate failures is to build and use debugging facilities within the software. The aim here is to make the software’s operation more predictable and transparent. For example, some programs that execute in the background (such as Unix daemons or Windows services) offer a debugging option that causes them to operate synchronously as a typical command-line program, outputting debugging messages on their standard output channel. This is the approach I always use to debug secure shell connection problems. Other debugging settings may make the software’s operation more deterministic, which is always helpful when trying to isolate a fault through repeated executions. Such changes may include elimination of multiple threads, use of a fixed seed in random-number generators, and restriction of buffers and caches to a small size in order to increase the likelihood of triggering overflows and cache misses. Software developers should consider adding such facilities to the code they debug. They should, however, keep in mind that some debugging facilities can lead to security vulnerabilities. To avoid this risk, the programmer must ensure the facilities are automatically removed, disabled, or made obnoxiously conspicuous in production builds.

Figure 1. A process for systematic debugging. (Flowchart; steps: Reproduce failure; Find new configuration subset that still yields the failure [found / not found]; Increase granularity of configuration subsets [possible / not possible]; Failure configuration has been minimized; Form new hypothesis regarding failure’s cause; Make prediction based on hypothesis; Test prediction through experiments [predictions satisfied / predictions not satisfied]; Refine hypothesis [refinement possible / no refinement possible]; Hypothesis is failure’s diagnosis.)

Logging. Programmers are often able to pinpoint faults by adding or using software-logging or -tracing statements.[33] In their simplest form they are plain print commands outputting details regarding the location and state of a program’s execution. Unlike watchpoints added in a debugging session, the statements are maintained with the program and are easily tailored to display complex data structures in a readable format.

Modern software tracing is typically performed through a logging framework (such as Apache log4j and Apache log4net) that provides a unified API for capturing, formatting, and handling a program’s logging output. Such a framework also allows programmers to tailor at runtime the program’s output verbosity and corresponding performance and storage cost. Programmers typically minimize logging to that required for operational purposes when a program executes in a production environment but increase it to include detailed software tracing when they want to debug a failure.

Telemetry. An obvious extension of logging facilities is telemetry, or the

ability to obtain debugging data from remote program executions (such as those by the program’s end users). Ideally, a programmer would want to be able to obtain the following types of data: First, data associated with the execution context (such as the version of the program, helper code, and operating system); values of environment variables; and contents of configuration files. Then comes data about the program’s operational status (such as commands executed, settings, and data files). And finally, in cases of program crashes a programmer also needs details regarding the location of the program crash (such as method name, line number, and program counter), runtime context (such as call stack, values of parameters, local variables, and registers), and the reason behind the crash (such as uncaught exception or illegal memory access).

Setting up a telemetry facility before the software is distributed can be a lifesaver when a nasty bug surfaces. On some platforms, a third-party library can be used to collect the required data, outsource its collection, and access the results through a Web dashboard. Keep in mind that telemetry records often contain personal data. When raw memory is recorded, confidential data the reporting code did not gather explicitly (such as passwords and keys) can end up in the telemetry database. Software development teams should consider (carefully) what data to collect, how long to store it, and how they will protect it, and disclose these details to the user.

High-level languages. Another last-resort approach a programmer may have to turn to when debugging complex algorithms, data structures, or protocols may be the implementation of the code in a higher-level formalism. This approach is useful when the programming language developers are working with blurs their focus on the problem’s essence. Verbose type declarations, framework boilerplate, unsafe pointers, obtuse data types, or spartan libraries may prevent one from expressing and fixing the parts that matter, burying the programmer instead in a tar-pit of tangential goo. Lifting the code’s level of abstraction may simplify finding whether a knotty failure stems from errors in the logic or in the implementation. Suitable technologies for this approach may be scripting languages (such as Python and R), domain-specific languages,[19] or model-driven software development.[35] Adopting high-level formalisms that allow for symbolic reasoning lets the programmer kill two birds with one stone: fix the bug the programmer is after and provide (qualified) guarantees that no other bugs exist in the part of the code the programmer has analytically reasoned about regarding its correctness. Once the alternative implementation is working, the programmer can decide whether architectural, operational, and performance considerations should allow keeping the code in its new formalism, whether to rewrite it (carefully), or automatically transform it to its original programming language.

Insights from Data Analytics

Data is the lifeblood of debugging. The more data that is associated with a failure, the easier it is to find the corresponding fault. Fortunately, nowadays, practically limitless secondary storage, ample main memory, fast processors, and broadband end-to-end network connections make it easy to collect and process large volumes of debugging data. The data can come from the development process (such as from revision-control systems and integrated development environments, or IDEs), as well as from program profiling. The data can be analyzed with specialized tools, an editor, command-line tools, or small scripts.

Some of the processes described here have been systematized and automated by Andreas Zeller of Saarland University, Germany, under the term “delta debugging”[38] and used to locate cause-effect chains in program states[39] and simplify failure-inducing inputs.[41] Although the corresponding tools were mostly research prototypes, the same ideas can be applied on an ad hoc basis to improve the effectiveness of debugging tasks. Consider these representative examples:

Revision-control data. Bugs often occur as the software evolves. By keeping software under version control, using configuration management tools (such as Git, Mercurial, and Subversion), the programmer can dig into a project’s history to aid debugging work. Here are

some examples: If a program crashes or misbehaves at a particular program line, the programmer can analyze the source code to see the last change associated with that line (for example, with the git blame command). A review of the change can then reveal that, say, one’s colleague who implemented it forgot to handle a specific case. Alternatively, by reading the version-control log of software changes, the programmer can find a recent change that may be related to the failure being witnessed and examine it in detail.

Another neat use of the version-control system for debugging is to automatically find the change that introduced a fault. Under Git the programmer constructs a test case that causes the fault and then specifies it to the git bisect command together with a window of software versions where the fault probably appeared. The command will then run a binary search among all the versions within the window in order to determine the exact change that triggered the failure.

Differential debugging. Differences between datasets can also reveal a fault when the programmer can lay hands on a working system and a failing one.[34] The goal is to find where and why the operation of the two systems diverges. The data that can be used for this purpose can come from their generated log files, their execution environment, or traces of their operation. In all cases the programmer must ensure the two system configurations are as similar as possible, apart from exhibiting the failure.

Examining log files for differences can be easily performed by configuring the most detailed logging possible, collecting the logs, removing nonessential differences (or keeping only the pertinent records), and comparing them.

Looking for differences in the environment in which the two systems operate involves examining a program’s user input, command-line arguments, environment variables, accessed files (including configuration, executables, and libraries), and associated services.

Investigating differences in the operation of the two systems is more difficult, but thankfully many tools can help. My favorite source of intelligence regarding a system’s operation is the calls it makes to the operating system. They (and their results) often determine to a very large extent a program’s behavior. Consequently, any divergence between operating system calls is a valuable hint regarding the fault the programmer is trying to locate. For example, the programmer may see that one program tries to open a file that does not exist, times out on a network connection, or runs out of memory. To trace system calls, the programmer can use the strace, ktrace, or truss tools under Unix and similar systems and Procmon[18] under Windows.

The interactions the programmer wants to investigate may also occur at other levels of a system’s stack. The programmer can trace calls to dynamically linked libraries with ltrace[7] (Unix) and Procmon (Windows). The programmer can also untangle interactions with services residing on other hosts by examining network packets through Wireshark’s[25] nifty GUI. Keep in mind that most relational database systems provide a way to keep and examine a log of executed SQL statements. The programmer can sometimes better understand a program’s behavior by obtaining a snapshot of its open files and network connections. Tools that provide this information include lsof (Unix), netstat (Unix and Windows), and tcpview (Windows). Finally, two tools, DTrace[14] and SystemTap,[11] allow the programmer to trace a system’s operation across the entire software stack. They should be used if available.

A programmer can also investigate a failing program’s trace log without using a working program’s log as a reference. However, such an investigation typically requires a deeper understanding of the program’s operation, the ability to pinpoint the pertinent log parts, and access to the source code in order to decipher the trace being read. Things to look for in such cases are failing system calls, library calls that return with an error, network timeouts, and empty query result sets.

The log files of the failing system and the working system often differ in subtle ways that hinder their automatic comparison; for example, they may include different timestamps, process identifiers, or host names. The solution is thus to remove the unessential differing fields. The programmer can do that with the editor, Unix filter tools, or a small script, and then apply one of several file-differencing tools to find where the two log files diverge.

Editor tricks. A powerful text editor or IDE can be a great aid when analyzing log data. Syntax coloring can help the programmer identify the relevant parts. With rectangular selections and regular expressions one can eliminate boilerplate or nonessential columns to focus on the essential elements or run a file-difference program on them. The programmer can also identify patterns associated with a bug using search expressions and matched-text highlighting. Finally, by displaying multiple buffers or windows the programmer can visually inspect the details of different runs.

Command-line tools. The programmer can also perform and, more important, automate many of these tasks and much more with Unix-derived command-line filter tools[15,32] available natively or as add-ons on most platforms (such as GNU/Linux, Windows using Cygwin, and macOS). The programmer can easily combine them to perform any imaginable debugging analysis task. This is important, as effective debugging often requires developing and running ad hoc processing tasks.

Here are several examples of how typical Unix command-line tools can be used in a debugging session: The programmer fetches data from the file system using the find command, from webpages and services using curl, and from compiled files (depending on the platform) by running nm, javap, or dumpbin. The programmer can then select lines that match a pattern with grep, extract fields with cut, massage the content of lines with sed, and perform sophisticated selections and summarize with awk. With normalized datasets at hand, the programmer can then employ sort and uniq to create ordered sets and count occurrences, comm and join to find set differences and join sets together, and diff to look at differences. Lastly, the number of results can be summarized with wc, the first or last records can be obtained with head or tail, and a hash for further processing derived through md5sum. For tasks that are performed often, it is a good practice to package the invocation of the corresponding commands into a Unix shell script and distribute the script as part of the

code’s software-developer tools.

As a concrete case, consider the task of locating a resource leak in the code. A simple heuristic could involve looking for mismatches between the number of calls to obtainResource and calls to releaseResource; see Figure 2 for a small Bash script that performs this task. The script uses the grep and sort commands to create two ordered sets: one with the number of calls to obtainResource in each file and another with the corresponding number of calls to releaseResource. It then provides the two sets to the comm command that will display the cases where records in the two sets do not match.

Figure 2. Ad hoc location of a probable resource leak.

    # List non-common lines between the two sets
    comm -3 <(
      # Counts per file of obtainResource
      grep -rc obtainResource . | sort) <(
      # Counts per file of releaseResource
      grep -rc releaseResource . | sort)

Scripting languages. If the editor cannot handle the required debugging analysis and the programmer is daunted by the Unix command-line interface, one can also analyze data with a scripting language (such as Python, Ruby, or Perl). Typical tasks that need to be mastered in order to analyze debugging data include sequential reading of text records from a file, splitting text lines on some delimiter, extracting data through regular expression matching, storing data in associative arrays, and iterating through arrays to summarize the results. An advantage of such scripts over other data-analysis approaches is that the programmer can more easily integrate them within composite project workflows that may involve sending email messages to developers or updating databases and dashboards.

Profiling. When debugging performance issues, the programmer can use various profilers to debug individual processes. The simplest work through sampling, interrupting the program’s behavior periodically and giving a rough indication regarding the routines where the program spends most of its time. The programmer thus identifies the routines on which to concentrate optimization efforts. One notch more advanced are graph-based profilers[13] that intercept each routine’s entry and exit in order to provide precise details not only of a routine’s contribution to the software’s overall CPU use but also how the cost is distributed among the routine’s callers. An added complication of performance debugging in modern systems is that machine instructions can vary their execution time by at least an order of magnitude based on context. To get to the bottom of such problems the programmer needs to obtain details of low-level hardware interactions (such as cache misses and incorrect jump predictions) through tools that use the CPU’s performance counters. These counters tally CPU events associated with performance and expose them to third-party tools (such as the Concurrency Visualizer extension for Visual Studio, Intel’s VTune Performance Analyzer, and the Linux perf command). For example, performance counters can allow the programmer to detect performance issues associated with false sharing among threads.

Figure 3. Example of a reverse-debugging session.

    Breakpoint 1, main () at hairy_code.c:1219
    1219    read_data();
    (gdb) record
    (gdb) next
    1220    analyze_data();
    (gdb) next
    hairy_code: Panic!
    1221    display_results();
    (gdb) reverse-next
    1220    analyze_data();
    (gdb) step
    analyze_data () at hairy_code.c:1209
    1209    if (n == 0)
    (gdb) step
    1210    warnx("Panic!");

Getting More from a Debugger

Given the propensity of software to attract and generate bugs, it is hardly surprising that the capabilities of debuggers are constantly evolving:

Data breakpoints. One impressive facility in many modern debuggers is the ability to break a program’s operation when a given value changes, the so-called “data breakpoints.” Unlike code or control breakpoints that are easily implemented by patching the code location where a breakpoint is inserted, data breakpoints are difficult to implement efficiently because the corresponding value needs to be checked after each CPU instruction. A debugger could check the value in software by single-stepping through the program’s instructions but would reduce its execution speed to intolerable levels.

Many current CPUs instead offer the ability to perform this check through their hardware using so-called write monitors. All the debugger has to do is to set special processor registers with the memory address and the length of the memory area the programmer wants to monitor. The processor will then signal the debugger every time the contents of these locations change. Based on this facility, a debugger can also implement a conditional data breakpoint that interrupts the program’s execution when a value satisfies a given condition. The computational overhead in this case is a bit greater because the debugger

has to check the condition on every change but still orders of magnitude lower than the alternative of checking after every instruction.

Data breakpoints are especially useful when the programmer is unfamiliar with the program’s operation and wants to pinpoint what statements change a particular value. They also come in handy in languages that lack memory bounds checking (such as C and C++) in order to identify the cause of memory corruption. All a programmer has to do is set a data breakpoint associated with the corrupted element and wait for the data breakpoint to trigger.

Reverse debugging. Another cool feature available through recent advances in hardware capabilities is “reverse debugging,”[12] or the ability to run code in reverse, in effect traveling back in time. When forward-stepping through the code starting from a statement A, a programmer finds statements and variables that can be influenced by A, called by researchers a “forward slice.”[40] When stepping through code in reverse from A, a programmer finds statements and values that could have influenced A; this backward slice can help the programmer understand how the program ended up in a specific state.

Reverse debugging is implemented through brute-force computation by having the debugger log the effect of each instruction and thereby obtain the data required to undo it. It has become feasible with fast CPUs and abundant main memory. When debugging a single application, not all actions can be undone; once an operating system call has been performed on a program, effects that cross the debugger’s event horizon are there to stay. Nevertheless, the capability can be beneficial when debugging algorithmic code. It is most useful in cases where, while searching for the cause of a failure, a programmer might inadvertently step or glance over the culprit statements. At this point, the programmer can rewind the execution to the point before the culprit statements and move forward again more cautiously.

As an example, consider debugging a problem associated with the display of the cryptic message “Panic!,” which appears in hundreds of places within the code. At some point the programmer may be going over code like this:

    read_data();
    analyze_data();
    display_results();

Stepping into each routine to see if it prints the message may take ages. Figure 3 shows the log of a gdb debugger session, demonstrating how a programmer can find the location through reverse debugging; the source code line listed before each gdb prompt is the one to be executed next. Graphical interfaces to similar functionality are also available through commercial offerings (such as Microsoft’s IntelliTrace for the .NET platform and the Chronon Time Travelling Debugger for the Java ecosystem).

The programmer first sets up gdb for reverse debugging by issuing the record command, then runs each function but steps over its innards with the next command. Once the programmer sees the “Panic!” message, which is emitted by the analyze_data routine, the programmer issues the reverse-next command to undo the previous next and move the execution context again just before the call to the analyze_data routine. This time the programmer issues the step command to step into the routine and find why the message appeared.

Capture and replicate. With multi-core processors found in even low-end smartphones today, multithreaded code and the bugs associated with it are a (frustrating) fact of life. Debugging such failures can be difficult because the operation of multithreaded programs is typically nondeterministic; each run of a program executes the threads in a slightly different order that may or may not trigger the bug. I recall the agony of debugging multithreaded rendering code that would occasionally miscalculate just a couple of pixels in a four-million-pixel image. To pinpoint a race condition that exhibits itself in a few nanoseconds within a multi-hour program run, programmers need all the help they can get from powerful software.

Tools helpful in such cases are often those able to capture and replicate in full detail a program’s memory-access operations[16] (such as the PinPlay/DrDebug Program Record/Replay Toolkit,[26] which can be used with Eclipse or gdb, and the Chronon recorder and debugger, which works with Java applications). The way programmers work with these tools is to run the application under their control until the failure emerges. This run will generate a recording of the session that can then be replayed under a specialized debugger to locate the statement that causes the failure. With the statement in hand the programmer can look at the program state to see what caused the particular statement to execute or why its execution was not prevented through a suitable lock. These tools thus transform a fleeting nondeterministic failure into a stable one that can be targeted and debugged with ease.

Running and dead processes. Two time-honored but still very useful things programmers can do with a debugger are to debug processes that are already running and processes that have crashed. Debugging a running process is the way to go if it is misbehaving with a failure that is difficult to reproduce. In this case, a programmer would use the operating system’s process-display command (such as ps under Unix and Task Manager under Windows) to find the numerical identifier of the offending process. The programmer can then fire up the debugger, instructing it to debug the process’s executable file but also to attach itself to the running process specified by its identifier. From this point onward, programmers can use the debugger as they normally would:

Interrupt stuck program. A programmer can interrupt a stuck program to see at what point the program entered into an endless loop or issued a non-returning system call;

Add new breakpoints. A programmer can add new breakpoints to see when and how a program reaches a particular code position; and

Examine values. A programmer can examine the values of variables and display the call stack.

Debugging a crashed process allows programmers to perform a post-mortem examination of the facts related to its demise. Some systems allow programmers to launch a debugger at the moment a process crashes. A more flexible alternative involves obtaining an image of the memory associated with the process,

the so-called “core dump” (Unix) or Minidump (Windows). This allows the programmer to obtain the dump from a production environment or a customer site and then dissect it on the development environment. There are various methods for obtaining a process’s memory dump. On Unix systems, a programmer typically will configure the operating system core file size limit through the system’s shell and then wait for the process to crash or send it a SIGQUIT signal. On Windows systems a programmer can use the Procdump[18] program to achieve the same results. In both cases, obtaining a memory dump from a still-running but hung process allows the programmer to debug infinite loops and concurrency deadlocks.[9]

Although a memory dump will not allow a programmer to resurrect and step through the execution of the corresponding process, it is still useful, because the programmer can examine the sequence of calls that were in effect at the point of the crash, the local variables of each routine in that sequence, and the values of global and heap-allocated objects.

Debugging Distributed Systems

Modern computing rarely involves an isolated process running on a system that matches a programmer’s particular development environment. In many cases, the programmer is dealing with tens to thousands of processes, often distributed around the world and with diverse hardware ranging from resource-constrained Internet of Things (IoT) devices, to smartphones, to experimental and specialized platforms. While these systems are the fuel powering the modern economy, they also present programmers with special challenges. According to the insightful analysis by Ivan Beschastnikh and colleagues at the University of British Columbia, these are heterogeneity, concurrency, distributing state, and partial failures.[4] Moreover, following my own experience, add the likely occurrence of events that would be very rare on an isolated machine, the difficulty of correlating logs across several hosts,[2] and replicating failures in the programmer’s development environment.

Remote debugging. The emergence of cloud computing and the IoT has brought with it the necessity of being able to debug systems remotely. A debugger with a graphical interface is not ideal in such situations because it might not be sufficiently responsive when debugging a cloud application across the planet or when a particular IoT platform may lack the power to run it. Consequently, it may make sense for programmers to acquaint themselves with a debugger’s command-line interface, as well as the shell commands required to debug more complex systems. An alternative that may sometimes work is a GUI debugger’s ability to communicate with a small remote debugger-monitor program the programmer installs and runs at the remote end.

Monitoring. When debugging distributed systems, monitoring and logging are the name of the game. Monitoring will flash a red light when something goes wrong, giving the team an opportunity to examine and understand why the system is misbehaving and thus help pinpoint the underlying cause. In such cases a programmer is often not debugging the code of individual processes but the architecture, configuration, and deployment of systems that may span an entire datacenter or the entire planet. A team can monitor individual failures and performance trends with systems like Nagios, NetData, Ganglia, and Cacti. An interesting approach for generating and thus being able to debug rare failures in complex distributed systems is to cause controlled component failures through specialized software, an approach pioneered by Netflix through its ChaosMonkey.[36]

Event logging. Given that it is not yet possible for a programmer to single-step concurrently through the multitude of processes that might comprise a modern system, when debugging such failures a programmer must rely on event logging, which involves processes logging operational events that target system administrators and reliability engineers. Unlike the software-tracing statements a programmer may use to pinpoint a failure in an individual process, event logging is always enabled in a production environment. By providing “observability,” logging can help operations personnel ascertain an application’s health status, view its

interactions with other processes, and determine changes in a system’s configuration. By listing metrics and error messages, logs can reveal a sickly application (such as one with unusually high latency or memory use) or expose one that fails due to insufficient privileges. Such things can help programmers pinpoint a specific application as a contributing factor in a more complex failure.

Virtualization and system simulators. One family of technologies that can help debug software running on hardware that does not match a given development environment includes virtual machines, emulators, and system simulators. With virtual machines and operating system virtualization systems (such as Docker), software development teams can create a single environment that can be used for development, debugging, and production deployment. Such containers are also useful when a programmer wants to find and eliminate configuration-related errors. Moreover, development environments for some commonly used embedded platforms (such as smartphones) come with an emulator, allowing programmers to experience the capabilities of the target hardware from the comfort of a desktop. Finally, when a team is developing software and hardware together, a full system simulator (such as Simics17) will provide a high-fidelity view of the complete platform stack.

Conclusion
The number of possible faults in a software system can easily challenge the limits of human ingenuity. Debugging the corresponding failures thus requires an arsenal of tools, techniques, methods, and strategies. Here I have outlined some I find particularly effective, but there are many others I consider useful, as well as many specialized ones that may work wonders in a particular environment.

Each debugging session represents a new venture into the unknown. Programmers should work systematically, starting with an approach that matches the failure’s characteristics, but adapt it quickly as they uncover more things about the failure’s probable cause. Programmers should not hesitate to switch from Web searching, to logging, to single-stepping, to constructing a unit test, or a specialized tool. No bug can elude a programmer who perseveres. And keep in mind that the joy of fixing a fault is proportional to the work the programmer puts into debugging the failure.

Acknowledgments
My thanks to Moritz Beller, Alexander Lattas, Dimitris Mitropoulos, Tushar Sharma, and the anonymous reviewers for insightful comments on earlier versions of this article.

References
1. Ayewah, N., Hovemeyer, D., Morgenthaler, J.D., Penix, J., and Pugh, W. Using static analysis to find bugs. IEEE Software 25, 5 (Sept. 2008), 22–29.
2. Bailis, P., Alvaro, P., and Gulwani, S. Research for practice: Tracing and debugging distributed systems; programming by examples. Commun. ACM 60, 7 (July 2017), 46–49.
3. Beller, M., Spruit, N., Spinellis, D., and Zaidman, A. On the dichotomy of debugging behavior among programmers. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden, May 27–June 3). ACM Press, New York, 2018, 572–583.
4. Beschastnikh, I., Wang, P., Brun, Y., and Ernst, M.D. Debugging distributed systems. Commun. ACM 59, 8 (Aug. 2016), 32–37.
5. Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Henri-Gros, C., Kamsky, A., McPeak, S., and Engler, D. A few billion lines of code later: Using static analysis to find bugs in the real world. Commun. ACM 53, 2 (Feb. 2010), 66–75.
6. Böhme, M., Soremekun, E.O., Chattopadhyay, S., Ugherughe, E., and Zeller, A. Where is the bug and how is it fixed? An experiment with practitioners. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering (Paderborn, Germany, Sept. 4–8). ACM Press, New York, 2017, 117–128.
7. Branco, R.R. Ltrace internals. In Proceedings of the Linux Symposium, A.J. Hutton and C.C. Ross, Eds. (Ottawa, ON, Canada, June 27–30, 2007), 41–52; https://www.kernel.org/doc/ols/2007/ols2007v1-pages-41-52.pdf
8. Cadar, C. and Sen, K. Symbolic execution for software testing: Three decades later. Commun. ACM 56, 2 (Feb. 2013), 82–90.
9. Cantrill, B. and Bonwick, J. Real-world concurrency. Commun. ACM 51, 11 (Nov. 2008), 34–39.
10. Duvall, P.M., Matyas, S., and Glover, A. Continuous Integration: Improving Software Quality and Reducing Risk. Pearson Education, Boston, MA, 2007.
11. Eigler, F.C. Problem solving with Systemtap. In Proceedings of the Linux Symposium, A.J. Hutton and C.C. Ross, Eds. (Ottawa, ON, Canada, July 19–22, 2006), 261–268; https://www.kernel.org/doc/ols/2006/ols2006v1-pages-261-268.pdf
12. Engblom, J. A review of reverse debugging. In Proceedings of the 2012 System, Software, SoC and Silicon Debug Conference (Vienna, Austria, Sept. 19–20). Electronic Chips & Systems Design Initiative, Gières, France, 2012, 28–33.
13. Graham, S.L., Kessler, P.B., and McKusick, M.K. An execution profiler for modular programs. Software: Practice & Experience 13, 8 (Aug. 1983), 671–685.
14. Gregg, B. and Mauro, J. DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD. Prentice Hall Professional, Upper Saddle River, NJ, 2011.
15. Kernighan, B.W. Sometimes the old ways are best. IEEE Software 25, 6 (Nov. 2008), 18–19.
16. LeBlanc, T.J. and Mellor-Crummey, J.M. Debugging parallel programs with Instant Replay. IEEE Transactions on Computers C-36, 4 (Apr. 1987), 471–482.
17. Magnusson, P.S., Christensson, M., Eskilson, J., Forsgren, D., Hallberg, G., Hogberg, J., Larsson, F., Moestedt, A., and Werner, B. Simics: A full system simulation platform. Computer 35, 2 (Feb. 2002), 50–58.
18. Margosis, A. and Russinovich, M.E. Windows Sysinternals Administrator’s Reference. Microsoft Press, Redmond, WA, 2011.
19. Mernik, M., Heering, J., and Sloane, A.M. When and how to develop domain-specific languages. ACM Computing Surveys 37, 4 (Dec. 2005), 316–344.
20. Nasehi, S.M., Sillito, J., Maurer, F., and Burns, C. What makes a good code example?: A study of programming Q&A in StackOverflow. In Proceedings of the 28th IEEE International Conference on Software Maintenance (Riva del Garda, Trento, Italy, Sept. 23–30). IEEE Press, 2012, 25–34.
21. Nethercote, N. and Seward, J. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (San Diego, CA, June 10–13). ACM Press, New York, 2007, 89–100.
22. Neumann, P.G. Computer Related Risks. Addison-Wesley, Reading, MA, 1995.
23. Nielson, F., Nielson, H.R., and Hankin, C. Principles of Program Analysis. Springer, Berlin, Germany, 2015.
24. O’Dell, D.H. The debugging mind-set. Commun. ACM 60, 6 (June 2017), 40–45.
25. Orebaugh, A., Ramirez, G., and Beale, J. Wireshark & Ethereal Network Protocol Analyzer Toolkit. Syngress, Cambridge, MA, 2006.
26. Patil, H., Pereira, C., Stallcup, M., Lueck, G., and Cownie, J. Pinplay: A framework for deterministic replay and reproducible analysis of parallel programs. In Proceedings of the Eighth Annual IEEE/ACM International Symposium on Code Generation and Optimization (Toronto, ON, Canada, Apr. 24–28). ACM Press, New York, 2010, 2–11.
27. Perscheid, M., Siegmund, B., Taeumel, M., and Hirschfeld, R. Studying the advancement in debugging practice of professional software developers. Software Quality Journal 25, 1 (Mar. 2017), 83–110.
28. Runeson, P. A survey of unit-testing practices. IEEE Software 23, 4 (July 2006), 22–29.
29. Sack, P., Bliss, B.E., Ma, Z., Petersen, P., and Torrellas, J. Accurate and efficient filtering for the Intel Thread Checker race detector. In Proceedings of the First Workshop on Architectural and System Support for Improving Software Dependability (San Jose, CA, Oct. 21–25). ACM Press, New York, 2006, 34–41.
30. Serebryany, K., Bruening, D., Potapenko, A., and Vyukov, D. Address-Sanitizer: A fast address sanity checker. In Proceedings of the 2012 USENIX Annual Technical Conference (Boston, MA, June 13–15). USENIX Association, Berkeley, CA, 2012, 309–318.
31. Spinellis, D. Code Reading: The Open Source Perspective. Addison-Wesley, Boston, MA, 2003.
32. Spinellis, D. Working with Unix tools. IEEE Software 22, 6 (Nov./Dec. 2005), 9–11.
33. Spinellis, D. Debuggers and logging frameworks. IEEE Software 23, 3 (May/June 2006), 98–99.
34. Spinellis, D. Differential debugging. IEEE Software 30, 5 (Sept./Oct. 2013), 19–21.
35. Stahl, T. and Volter, M. Model-Driven Software Development: Technology, Engineering, Management. John Wiley & Sons, Inc., New York, 2006.
36. Tseitlin, A. The anti-fragile organization. Commun. ACM 56, 8 (Aug. 2013), 40–44.
37. Wilkes, M. The Birth and Growth of the Digital Computer. Lecture delivered at the Digital Computer Museum, available through the Computer History Museum, Catalog Number 102695269, Sept. 1979; https://youtu.be/MZGZfsr1KfY
38. Zeller, A. Automated debugging: Are we close? Computer 34, 1 (Nov. 2001), 26–31.
39. Zeller, A. Isolating cause-effect chains from computer programs. In Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering (Charleston, SC, Nov. 18–22). ACM Press, New York, 2002, 1–10.
40. Zeller, A. Why Programs Fail: A Guide to Systematic Debugging, Second Edition. Morgan Kaufmann, Burlington, MA, 2009.
41. Zeller, A. and Hildebrandt, R. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering 28, 2 (Feb. 2002), 183–200.

Diomidis Spinellis (dds@aueb.gr) is a professor in and head of the Department of Management Science and Technology in the Athens University of Economics and Business, Athens, Greece, and author of Effective Debugging: 66 Specific Ways to Debug Software and Systems, Addison-Wesley, 2016.

© 2018 ACM 0001-0782/18/11 $15.00

review articles
DOI:10.1145/3186331

Software Challenges for the Changing Storage Landscape

Conventional storage software stacks are unable to meet the needs of high-performance Storage-Class Memory technology. It is time to rethink 50-year-old architectures.

BY DANIEL WADDINGTON AND JIM HARRIS

key insights
• NVMe and memory-based storage technologies are experiencing an exponential growth in performance with aggressive parallelism and fast new media. Traditional IO software architectures are unable to sustain these new levels of performance.
• IOMMU hardware is a key enabler for realizing safe and maximal performing user space device drivers and storage IO stacks.
• Kernel-bypass strategies rely on “asynchronous polling” whereby threads actively check device completion queues. Naive designs can lead to excessive busy-waiting and inefficient CPU utilization.

AS WE EMBARK on a new era of storage performance, the limitations of monolithic OS designs are beginning to show. New memory technologies (for example, 3D XPoint™ technology) are driving multi-GB/s throughput and access latencies at sub-microsecond scales. As the performance of these devices approaches the realms of DRAM, the overhead incurred by legacy IO stacks increasingly dominates.

To address this concern, momentum is gathering around new ecosystems that enable effective construction of tailored and domain-specific IO architectures. These ecosystems rely on bringing both device control and data planes into user space, so that they can be readily modified and intensely optimized without jeopardizing system stability.

This article begins by giving a quantitative exploration of the need to shift away from kernel-centric generalized storage IO architectures. We then discuss the fundamentals of user space (kernel-bypass) operation and the potential gains that result. Following this, we outline key considerations necessary for their adoption. Finally, we briefly discuss software support for NVDIMM-based hardware and how this is positioned to integrate with a user space philosophy.

Evolution of Storage IO
Since the release of the Intel 8237 in the IBM PC platform (circa 1972), network and storage device IO has centered around the use of Direct Memory Access (DMA). This enables the system to transfer data to and from a device to main memory with no involvement of the CPU. Because DMA transfers can be initiated to any part of main memory, coupled with the need to execute privileged machine instructions (for example, masking interrupts), device drivers of this era were well suited to the kernel. While executing device drivers in user space was in theory possible, it was unsafe because any misbehaving driver could easily jeopardize the integrity of the whole system.

As virtualization technologies evolved, the consequence of broad
access to memory for device drivers would become a prominent issue for system stability and protection between hosted virtual machines. One approach to this problem is the use of device emulation. However, emulation incurs a significant performance penalty because each access to the device requires a transition to the Virtual Machine Monitor (VMM) and back.

[Image by CycloneProject]

An alternative approach is to use para-virtualization that, by modifying the host OS device drivers so they interact directly with the hypervisor without entering the VMM, improves performance over prior emulation techniques. The downside is that the host OS code needs to be modified (including interrupt handling) and additional latency results from more IO layering.

Virtualization and direct device assignment. To minimize the impact of virtualization and indirection, certain use cases aim at providing a virtualized guest with direct ownership and access to specific hardware devices in the system. Such a scheme improves IO performance at the expense of the ability to transparently share devices across multiple guests using the hypervisor. The key enabling hardware technology for device virtualization is the IO Memory Management Unit (IOMMU). This provides the ability to remap device (DMA) addresses to physical addresses in the system, in much the same way that the MMU performs a translation from virtual to physical addresses (see Figure 1). Around 2006, IOMMU capabilities were made available in the Intel platform through its Virtualization Technology for Directed IO (VT-d)1 followed by AMD-Vi on the AMD x86 platform.

Although IOMMU technology was driven primarily by the need to provide virtual machines with direct device access, its availability has, more recently, become a pivotal enabler for rethinking device driver and IO architectures in non-virtualized environments.

Another significant advance in device virtualization was made by the PCI Express SR-IOV (Single Root IO Virtualization) extension.11 SR-IOV enables multiple system images (SI) or virtual machines (VMs) in a virtualized environment to share PCI hardware resources. It reduces VMM overhead by giving VM guests direct access to the device.

New frontiers of IO performance. Over the last three decades, compute, network, and storage performance have grown exponentially according to Moore’s Law (see Figure 2). However, over the last decade, growth of compute performance has slowed compared

to the growth of storage performance mainly due to the CPU frequency ceiling causing a shift in microprocessor growth strategy. We expect this trend to continue as new persistent memory technologies create an aggressive upswing in storage performance.

Figure 1. MMU and IOMMU duality. [Diagram: the IOMMU maps device addresses, and the MMU maps CPU virtual addresses, onto physical addresses in main memory.]

Figure 2. Relative IO performance growth. (Data collected by IBM Research, 2017) [Plot: performance relative to 1990, on a log scale from 10^0 to 10^5, against year from 1990 to 2020, for compute, network, and storage.]

The accompanying table shows some performance characteristics of select state-of-the-art IO devices. Lower latency, increased throughput and density, and improved predictability continue to be key differentiators in the networking and storage markets. As latency and throughput improve, the CPU cycles available to service IO operations are reduced. Thus, it is evident that the latency overhead imposed by traditional kernel-based IO paths has begun to exceed the latency introduced by the hardware itself.

State-of-the-art IO device performance reference points.
Class             Technology                      Performance
Compute           Server CPUs                     24+ cores per die, 10nm, 4GHz, 100M transistors per mm2
Memory            DDR4-3200 DRAM                  3200MT/s or 25.6GB/s (288-pin DIMM)
Backplane         PCI Express 4.0                 16GT/s or 31GB/s for x16 channels
Networking        Mellanox ConnectX-6             200Gbps/200M PPS
SSD Storage       Intel Optane P4800X NVMe        2.3GB/s random read/write, < 10 usec latency
SSD Storage       Samsung PM1725a NVMe            6.4GB/s sequential read, 1.08MIOPS, 95 usec latency
Memory (future)   3D XPoint NV-DIMM Technology    < 1 usec latency (expected)

Reconsidering the IO Stack
The prominent operating systems of today, such as Microsoft Windows and Linux, were developed in the early 1990s with design roots stemming from two decades earlier. Their architecture is monolithic. This means that core OS functionality executes in kernel space and cannot be readily modified or adapted. Threads within the kernel alone are given access to privileged processor instructions (for example, x86 Ring-0). Kernel functionality includes interrupt handling, file systems, scheduling, memory management, security, IPC, and device drivers. This separation of user and kernel space came about as a solution to guard against “untrusted” applications accessing resources that could interfere with other applications or with OS functionality directly. For example, disallowing applications from terminating other applications or writing to memory outside of their protected memory space is fundamental to system stability.

In the early stages of modern OS development, hardware parallelism was limited. It was not until more than a decade later (2006) that the multicore microprocessor would appear. Following the footsteps of multicore, network devices would also begin to support multiple hardware queues so that parallel cores could be used to service high-performance networking traffic. This multi-queue trend would also appear in storage devices, particularly with the advent of NVMe SSD (Solid State Drive). Today, hardware-level parallelism, in both CPU and IO, is prominent. Intel’s latest Xeon Platinum processors provide up to 28 cores and hyper-threading on each. AMD’s latest Naples server processor, based on Zen, provides 32 cores (64 threads) in a single socket. Many state-of-the-art NVMe drives and network interface cards support 64 or more hardware queues. The trend toward parallel hardware is clear and not expected to diminish anytime soon. The consequence of this shift from single-core and single-queue designs to multicore and multi-queue designs is that the IO subsystem has needed to evolve to support concurrency.

Strained legacy stacks. In terms of IO request rates, storage devices are an order of magnitude slower than network devices. For example, the fastest SSD devices operate at around 1M IOPS (IO operations per second) per device, whereas a state-of-the-art NIC device is capable of handling more than 70M packets per second. This slower rate

means that legacy OS improvement efforts in the storage space are still considered worthwhile.

With the advent of multicore, enhancing concurrency is a clear approach to improving performance. Many legacy OS storage subsystems realize concurrency and asynchrony through kernel-based queues serviced by worker threads. These are typically allocated for each processor core. Software queues can be used to manage the mapping between application threads running on specific cores, and the underlying hardware queues available on the IO device. This flexibility was introduced into the Linux 3.13 kernel in 2012,5 providing greatly improved IO scaling for multicore and multi-queue systems. The Linux kernel block IO architecture is aimed at providing good performance in the “general” case. As new IO devices (both network and storage) reach the realms of tens of millions of IOPS, the generalized architecture and layering of the software stack begin to strain. Even state-of-the-art work on improving kernel IO performance is limited in success.15 Furthermore, even though the block IO layer may scale well, layering of protocol stacks and file systems typically increases serialization and locking, and thus impacts performance.

To help understand the relationship between storage IO throughput and CPU demand, Figure 3 shows IOPS scaling for the Linux Ext4 file system. This data is captured with the fio micro-benchmarking tool configured to perform random-writes of 4K blocks (random-read performance is similar). No filesharing is performed (the workload is independent). The experimental system is an Intel E5-2699 v4 two-socket server platform with 512GB main memory DRAM. Each processor has 22 cores (44 hardware threads) and the system contains x24 NVMe Samsung 172Xa SSD 1.5 TB PCIe devices. Total IO throughput capacity is ∼6.5M IOPS (25GB/s). Each device is PCI Gen 3 x8 (7.8GB/s) onto the PCI bus and a single QPI (memory bus) link is ∼19.2GB/s. Each processor has x40 PCI Gen 3.0 lanes (39.5GB/s).

Figure 3. Ext4 file system scaling on software RAID0. [Plot: KIOPS (0 to 4000) and total CPU load (0 to 100) against number of threads (0 to 90).]

The maximum throughput achieved is 3.2M IOPS (12.21GB/s). This is realized at a load of ∼26 threads (one per device) and 30% total CPU capacity. Adding threads from 17 to 26 gives negligible scaling. Beyond 26 worker threads, performance begins to degrade and become unpredictable although CPU utilization remains linear for some time.

File systems and kernel IO processing also add latency. Figure 4 shows latency data for direct device access (using Micron’s kernel-bypass UNVMe framework) and the stock Ext4 file system. This data is from a single Intel Optane P4800X SSD. The filesystem and kernel latency (mean 13.92μsec) is approximately double that of the raw latency of the device (mean 6.25μsec). For applications where synchronous performance is paramount and latency is difficult to hide through pipelining, this performance gap can be significant.

Figure 4. Ext4 vs. raw latency comparison. [Plot: completion latency (usec, 0 to 100) against percentile (0 to 100) for rand-read (ext4), rand-read (uNVME), rand-write (ext4), and rand-write (uNVME).]

Application-specific IO subsystems. An emerging paradigm is to enable customization and tailoring of the IO stack by “lifting” IO functions into user space. This approach improves system stability where custom IO processing is being introduced (that is, custom stacks can crash without jeopardizing system stability) and allows developers

to protect intellectual property where open source kernel licenses imply source release.

Although not originally designed for this purpose, a key enabler for user-level IO is the IOMMU. Specifically, the IOMMU provides the same capabilities to user-kernel processes as it does to guest-host virtualized OSes (see Figure 5). This effectively means that user-space device drivers (unprivileged processes) can be compartmentalized so that memory regions valid for device DMA operations can be limited by the IOMMU, and therefore device drivers can be prevented from accessing arbitrary memory regions (via a device’s DMA engine).

Figure 5. MMU and IOMMU duality. [Diagram: a user-level device driver reaches process memory through the MMU, while the device reaches physical memory through the IOMMU under a restricted memory mapping; device registers are accessed from the CPU core, and interrupt handling (MSI) passes through the kernel.]

Configuration of the IOMMU remains restricted to kernel functions operating at a higher privilege level (that is, ring 0). For example, in Linux, the Virtual Function IO (VFIO) kernel module can be used to configure the registered memory with the IOMMU and ensure memory is “pinned.”

New architectures also allow interrupt handling to be localized to a subset of processor resources (that is, mapping MSI to specific local APICs) that are associated with a specific device driver execution. Coupled with device interrupt coalescing and atomic masking, this means that user-level interrupt handling is also viable. However, the interrupt vector must still reside in the kernel and be executed at a privileged level, at least for Intel and IBM Power architectures.

Figure 6. DPDK architecture. [Diagram: network functions (cloud, enterprise, comms) built on DPDK user space libraries (core, classification, extensions, QoS, packet framework), native and virtual poll-mode drivers, and accelerators, layered above the kernel modules KNI, IGB_UIO, VFIO, and UIO_PCI_GENERIC.]

Foundational Kernel-bypass Ecosystems
In this section, we introduce the basic enablers for kernel-bypass in the Linux operating system. This is followed by a discussion of the Data Plane Development Kit (DPDK) and Storage Performance Development Kit (SPDK), two foundational open source projects started by Intel Corporation. DPDK has been widely adopted for building kernel-bypass applications, with over 30 companies and almost 400 individ-

review articles
uals contributing patches to the open place devices into IOMMU groups. NVMe SSDs. While SPDK is primarily
source DPDK projects as of release User space processes can open these driven by Intel, there are an increas-
17.05. While DPDK is network-centric, IOMMU groups and register memory ing number of companies using and
it provides the basis for the SPDK stor- with the IOMMU for DMA access us- contributing to the effort. The proj-
age-centric ecosystem. Other projects, ing VFIO ioctls. VFIO also provides the ect desires broader collaboration that
such as FD.IO (http://fd.io) and Seastar ability to allocate and manage mes- may require adoption of a governance
(http://seastar-project.org) also use sage-signaled interrupt vectors. structure similar to DPDK. SPDK shows
DPDK. These domain specifics are not Data plane development kit. DPDK good promise for filling the same role
discussed in this article. (http://dpdk.org) was originally aimed for storage and storage networking as
Linux user space device enablers. at accelerating network packet pro- DPDK has for packet processing.
Linux kernel version 2.6 introduced cessing applications. The project was SPDK’s NVMe polled-mode drivers
the User Space IO (UIO)a loadable mod- initiated by Intel Corporation, but is provides an API to kernel-bypass appli-
ule. UIO is the older of the two kernel- now under the purview of the open cations for both direct-attached NVMe
bypass mechanisms in Linux (VFIO be- source Linux Foundation. At the core storage as well as remote storage us-
ing the other). It provides an API that of DPDK is a set of polled-mode Eth- ing the NVMe over Fabrics protocol.
enables user space handling of legacy ernet drivers (PMDs). These PMDs by- Figure 7 shows the SPDK framework’s
INTx interrupts, but not message-signaled interrupts (MSI or MSI-X). UIO also does not support DMA isolation through IOMMU isolation. Even with these limitations, UIO is well suited for use in virtual machines, where direct IOMMU access is not available. In these situations, a guest VM user space process is not isolated from other processes in the same guest VM, but the hypervisor itself can isolate the guest VM from other VMs or host processes using the IOMMU.

For bare-metal environments, VFIO[b] is the preferred framework for Linux kernel-bypass. It operates with the Linux kernel's IOMMU subsystem to

[a] https://lwn.net/Articles/232575/
[b] https://www.kernel.org/doc/Documentation/vfio.txt

pass the kernel, and by doing so, can process hundreds of millions of network packets per second on standard server hardware.

DPDK also provides libraries to aid kernel-bypass application development. These libraries enable probing for PCI devices (attached via UIO or VFIO), allocation of huge-page memory, and data structures geared toward polled-mode message-passing applications such as lockless rings and memory buffer pools with per-core caches. Figure 6 shows key components of the DPDK framework.

Storage performance development kit. SPDK is based on the foundations of DPDK. It was introduced by Intel Corporation in 2015 with a focus on enabling kernel-bypass storage and storage-networking applications using

core elements as of press time. Using SPDK, Walker [22] shows reduction in IO submission/completion overhead by a factor of 10 as measured with the SPDK software overhead measurement tool.

To provide the reader with a better understanding of the impact of legacy IO we present data from the 'fio' benchmarking tool (https://github.com/axboe/fio). Figure 8 shows performance data, for kernel-based IO (with Ext4 and raw block access) and SPDK. The data compares throughput with the number of client threads. Configuration is queue depth of 32, and IO size of 4KiB. Sequential read, sequential write, random read, random write, and 50:50 read-write workloads are examined.

The key takeaway is that SPDK requires only one thread to get over 90% of the device's maximum performance.
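The lockless rings mentioned above as a building block for polled-mode message passing can be illustrated with a minimal single-producer/single-consumer ring. This is only a sketch under stated assumptions: it uses C11 atomics, the names `spsc_ring`, `spsc_enqueue`, `spsc_dequeue`, and the capacity `RING_SLOTS` are hypothetical, and it is not DPDK's own ring implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8u /* hypothetical capacity; must be a power of two */
_Static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0, "capacity must be a power of two");

/* Single-producer/single-consumer ring. The head index is written only by
 * the producer and the tail index only by the consumer, so no lock is needed. */
typedef struct {
    atomic_uint head;            /* count of messages ever enqueued */
    atomic_uint tail;            /* count of messages ever dequeued */
    void *slots[RING_SLOTS];
} spsc_ring;

static void spsc_init(spsc_ring *r) {
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side: returns false (instead of blocking) when the ring is full. */
static bool spsc_enqueue(spsc_ring *r, void *msg) {
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1u)] = msg;
    /* Release: the slot write becomes visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false (instead of blocking) when the ring is empty. */
static bool spsc_dequeue(spsc_ring *r, void **msg) {
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *msg = r->slots[tail & (RING_SLOTS - 1u)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

A polling thread would call `spsc_dequeue` in its loop and move on to other queues when it returns false, rather than waiting on the ring.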
Figure 7. SPDK architecture. [Layered diagram. Storage Protocols: iSCSI target, vhost-scsi target, NVMe-oF* target, and vhost-blk target, over SCSI and NVMe. Storage Services: Block Device Abstraction (BDEV), logical volumes, Blobstore, BlobFS, and 3rd-party modules, with NVMe, Linux Async IO, and Ceph RBD block devices. Drivers: NVMe-oF* initiator, NVMe* PCIe driver, and Intel QuickData Technology driver, over NVMe devices. Integration: RocksDB and Ceph. Core: application framework.]

review articles
Figure 8. Comparison of fio performance for Linux kernel vs. SPDK. [Three panels, each plotting throughput (MB/s, 0 to 3,000) against number of threads (0 to 35) for sequential read, sequential write, random read, random write, and 50:50 read-write workloads: fio+ext2+kernel, fio+raw+kernel, and SPDK. Data from 'fio' v.2.16.52 on an Intel Optane P4800X (3DXP) SSD with an Intel E5-2650 v4 @ 2.2GHz; QD=32, IO size=4K.]

Note also that the SPDK data represents a 1:1 mapping of threads to hardware queues and therefore the number of threads is limited to the number of queues available (limited to 16 queues in this case). The kernel-based data represents the number of user threads multiplexed (via two layers of software queues) to the underlying device queues [5].

From the data, we can see that generality and the associated functionality impact performance. Reducing software overhead by tailoring and optimizing the stack (according to specific application requirements) improves storage applications in two ways. First, with fewer CPU cycles spent on processing IO, more CPU cycles are available for storage services such as compression, encryption, or storage networking. Second, with the advent of ultra-low latency media, such as Intel Optane, higher performance can be achieved for low queue depth workloads since the software overhead is much smaller compared to the media latency.

Klimovic et al. [14] have applied DPDK and SPDK in the context of distributed SSD access. Their results show performance improvements for the FlashX graph-processing framework of up to 40% versus iSCSI. They also make a comparison with RocksDB and show a delta of ∼28% between iSCSI and their solution. This work is based on the IX Dataplane Operating system [3], which is fundamentally based on kernel-bypass approaches.

Kernel-Bypass Design Considerations

Here, we present some design aspects and insights that adopters of kernel-bypass technology, such as DPDK and SPDK, should consider.

Cost of context switching. Raising IO operations into user space requires careful consideration of software architecture. Traditional OS designs rely on interrupts and context switching to multiplex access to the CPU. In a default Linux configuration, for example, the NVMe device driver will use a per-core submission queue, serviced by the same core, and therefore context switching cannot be avoided.

Context switches are costly (more so than system calls) and should be avoided at high IO rates. They result in cache pollution that arises from both eviction of cache by the task contexts and the subsequent impact of working set memory of the newly scheduled task. The typical cost of a context switch is in the order of 2,000–5,000 clock cycles. Figure 9 presents data from lmbench (http://www.bitmover.com/lmbench/) running on a dual-socket Intel E5-2650 v4 @ 2.2GHz, 32K L1, 256K L2, and 30MB L3 caches.

Figure 9. Context switch latencies on Intel E5-2650 based server. [Plot of latency (microseconds, log scale from 1 to 10,000) against working set size (KB, 0 to 32768) for thread counts of 2, 4, 8, 16, 32, 64, and 96.]

Polling-based designs minimize IO latency by eliminating the need to execute interrupt handlers for inbound IO, and removing system calls/context switches for outbound IO. However, polling threads must be kept busy performing useful work as opposed to spending time polling empty or full queues (busy-work).

Asynchronous polling. A key design pattern that can be used to improve the utility of polling threads is asynchronous polling. Here, polling threads can round-robin (or use some other scheduling policy) across multiple asynchronous tasks. For example, a single thread might service both hardware and software queues at the same time (see Figure 10). Hardware queues reside in memory on the device and are controlled by the device itself, while software queues reside in main memory and are controlled by the CPU. IO requests typically flow through both. Polling is asynchronous in that the thread does not synchronously wait for completions of a specific request, but retrieves the completion at a later point in time.

Figure 10. Asynchronous polling pattern. [Diagram: application threads, software IO queues (submission and completion), an asynchronous polling thread, and hardware IO queues (submission and completion).]

Asynchronous polling can be coupled with lightweight thread scheduling (co-routines) found, for example, in Intel Cilk [16]. Such technologies allow program-level logical concurrency to be applied without the cost of context switching. Each kernel thread services a task queue by applying stack swapping to redirect execution. Lightweight scheduling schemes typically execute tasks to completion, that is, they are non-preemptive. This is well suited to asynchronous IO tasks.

Lock-free inter-thread communications. Because polling threads cannot perform extensive work without risking device queue overflow (just as conventional interrupt service routines must be tightly bound and therefore typically defer work) they must off-load work to, or receive work from, other application threads.

This requires that threads coordinate execution. A practical design pattern for this is message passing across lock-free FIFO queues. Different lock-free queue implementations can be used for different ratios of producer and consumer (for example, single-producer and single-consumer, single-producer and multi-consumer). Lock-free queues are well suited to high-performance user-level IO since they do not require kernel-level locking, but rely on machine-level atomic instructions. This means that exchanges can be performed without forcing a context switch (although one may undoubtedly occur if scheduled).

A basic implementation of lock-free queues will perform busy-waiting when the queue is empty or full. This means that the thread is continuously reading memory state and thus consuming 100% of the CPU it is running on. This results in high energy utilization. Alternatively, it is possible to implement lock-free queues that support thread sleeping in empty or full conditions. This avoids busy-waiting by allowing the OS to schedule other threads in its place. Sleeping can be supported on either or both sides of the queue. To avoid race conditions, implementations typically use an additional "waker" thread. This pattern is well known in the field of user-level IPC (Inter-Process Communication) [20].

Lock-free queues are established in shared memory and can be used for both inter-process and inter-thread message exchange. To optimize inter-thread message passing performance, processor cores (on which the threads execute) and memory should belong to the same NUMA zone. Accessing memory across remote NUMA zones incurs approximately twice the access latency.

Combining polling and interrupt modes. Another strategy to avoid busy-waiting on queues is to combine polling and interrupt modes. To support this, VFIO provides a capability to attach a signal (based on a file descriptor) to an interrupt so that a blocking user-level thread can be alerted when an interrupt event occurs. The following excerpt illustrates connecting an MSI interrupt to a file handle using the POSIX API:

    efd = eventfd(0, 0);
    ioctl(vfio_fd, VFIO_EVENT_FDMSI, &efd);

In this case, the IO threads wait on file descriptor events through read or poll system calls. Of course, this mechanism is costly in terms of performance since waking up and signaling a user-level thread from the kernel is expensive. However, because the interrupt is masked when generated, the user thread controls unmasking and can thus arbitrarily decide when to revert to interrupt mode (for example, when an extended period of "quiet" time has passed).

Memory paging and swapping. An important role of the kernel is to handle page-faults and "swapping" memory to backing store when insufficient physical memory is available. Most operating systems use a lazy mapping strategy (demand-paged) so that virtual pages will not be mapped to physical pages until they are touched. Swapping provides an extended memory model; that is, the system presents to the application the appearance of more memory than is in the system. The mechanisms behind swapping are also used for mapped files, where a file copy shadows a region of memory. Traditionally, swapping is not heavily used because the cost of transferring pages to storage devices that are considerably slower than memory is significant.

For monolithic OS designs, page swapping is implemented in the kernel. When there is no physical page mapping for a virtual address (that is, there is no page table entry), the CPU generates a page-fault. In the Intel x86 architecture, this is realized as a machine exception. The exception handler is run at a high privilege level (CPL 0) and thus remains in the kernel. When a page-fault occurs, the kernel allocates a page of memory from a pool (typically known as the page cache) and maps the page to the virtual address by updating the page table. When the physical memory pool is exhausted, the kernel must evict an existing page by writing out the content to backing store and invalidating the page table entry (effectively un-mapping the page). In Linux, the eviction policy is based on a variation of the Least Recently Used (LRU) scheme [10]. This is a generalized policy aimed at working well for most workloads.

Because page-fault handling and page swapping rely on the use of privileged instructions and exception handling, implementing them in user space alone is inherently difficult. One approach is to use the POSIX mprotect and mmap/munmap system calls to explicitly control the page mapping process. In this case, page protection PROT_NONE can be used to force the kernel to raise a signal on the user-level process when the unmapped page is accessed. In our own work, we have been able to realize a paging overhead of around 20 µs per 4K page (with SPDK-based IO), which is

comparable to that of the kernel (tested against memory mapped files).

Memory flushing. To optimize write-through to storage, it is also necessary to track dirty pages, so that only those that have been modified are flushed out to storage. If a page has only been read during its active mapping, there is no need to write it back out to storage. From the kernel's perspective, this function can be easily achieved by checking the page's dirty bit in its corresponding page table entry. However, as noted earlier, accessing the page table from user space is problematic. In our own work, we have used two different approaches to address this problem.

The first is to use a CRC checksum over the memory to identify dirty pages. Both Intel x86 and IBM Power architectures have CRC32 accelerator instructions that can compute a 4K checksum in less than ∼1,000 cycles. Note that optimizations such as performing the CRC32 on 1,024-byte blocks and performing a "short circuit" of the dirty page identification can further reduce the cost of CRC in this context.

An alternative approach is to use a kernel module to collect dirty page information on request from an application. This, of course, incurs an additional system call and page table walk. Consequently, this approach performs well with small page tables, but is less performant than CRC when traversal across many page table entries is needed.

Legacy integration. Designing around a kernel-bypass architecture is a significant paradigm shift for application development. Consequently, there are some practical limitations to its adoption in legacy systems. These include:

• Integration with existing applications based on a blocking threading model requires either considerable rewriting to adhere to an asynchronous/polling model, or shims to bridge the two together. The latter reduces the potential performance benefits.

• Sharing storage devices between multiple processes. Network devices handle this well via SR-IOV, but NVMe SR-IOV has only recently been added to the NVMe specification. Hence, sharing NVMe devices across multiple processes must be done through software.

• Integration with the existing file system structures is difficult. While conceptually Filesystem in User Space (FUSE) technology could be used to integrate into the kernel-based file system hierarchy, the advantages of performance would be lost because of the need to still pass control into the kernel. Evolution of the POSIX API is needed to support hybrid kernel and user IO. "Pure" user-space file systems are still not broadly available.

• Legacy file systems and protocol stacks incorporate complex software that has taken years of development and debugging. In some cases, this software can be integrated through "wrappers." However, in general this is challenging and redeveloping the software from the ground up is more economical.

Integration of NVDIMMs

Non-Volatile Dual Inline Memory Modules (NVDIMMs) attach non-volatile memory directly to the memory bus, opening the possibility of application programs accessing persistent storage via load/store instructions. This requires additional libraries and/or programming language extensions [5, 9] to support the coexistence of both volatile and non-volatile memory. The fundamental building blocks needed are persistent memory management (for example, pool and heap allocators), cache management, transactions, garbage collection, and data structures that can operate with persistent memory (for example, support recovery and reinstantiation).

Today, two prominent open source projects are pushing forward the development of software support for persistent memory. These are pmem.io (http://pmem.io/), driven primarily by Intel Corporation in conjunction with SNIA, and The Machine project (https://www.labs.hpe.com/the-machine) from HP Labs. These projects are working to build tools and libraries that support access and management of NVDIMM. Key challenges that are being explored by these projects and others [3, 8, 17, 20] include:

• Cross-heap pollution: Pointers to volatile data structures should not "leak" into the non-volatile heap. New programming language semantics are needed to explicitly avoid programming errors that lead to dangling and invalid references.

• Transactions: Support for ACID (atomicity, consistency, isolation, durability) transactions offering well-defined guarantees about modifications to data structures that reside in persistent memory and are accessible by multiple threads.

• Memory leaks and permanent corruption: Persistence makes memory leaks and errors that are normally recoverable through program restart or reset more pernicious. Strong safety guarantees are needed to avoid permanent corruption.

• Performance: Providing tailored capabilities and leveraging the advantages of low latency and high throughput enabled by NVDIMM technology.

• Scalability: Scaling data structures to multi-terabytes also requires scaling of metadata and region management structures.

• Pointer swizzling: Modifying embedded (virtual address) pointer references for object/data structure relocation [21].

The real impact of NVDIMMs remains to be seen. However, work by Coburn et al. [6] on NV-Heaps has shown that for certain applications the move from a transactional database to persistent memory can bring significant performance gains.

NVDIMM-based persistent memory lends itself to integration with user space approaches because it inherently provides access directly to the user space application (although mapping and allocation may remain under the kernel's control). This enables efficient, zero-copy DMA-centric movement of data through the memory hierarchy and into the storage device. A longer-term vision is for a converged memory-storage paradigm whereby traditional storage services (for example, durability, encryption) can be layered into the memory paradigm. However, to date, this topic remains largely unaddressed by the community.

Outlook

Mainstream operating systems are based on IO architectures with a 50-year heritage. New devices now challenging these traditional designs bring unprecedented levels of concurrency and performance. The result is that we are entering an era of CPU-IO performance inversion, where CPU resources are becoming the bottleneck. Careful consideration of execution paths is now paramount to effective system design.

User space, kernel-bypass strategies provide a vehicle to explore and quickly develop new IO stacks. These can be used to exploit alignment of requirements and function, becoming readily tailored and optimized to meet the specific needs of an application. Flexibility of user space software implementation (as opposed to kernel space) enables easier development and debugging, and enables the leverage of existing application libraries (for example, machine learning).

For the next decade, microprocessor design trends are expected to continue to increase on-die transistor count. As instruction-level parallelism and clock frequency increases have reached a plateau, increased core count and on-chip accelerators are the most likely differentiators for future processor generations. There is also the possibility of "big" and "little" cores whereby heterogeneous cores, with different capabilities (for example, pipelining, floating point units, and clock frequency), exist on the same processor package. This is already evident in ARM-based mobile processors. Such an approach could help drive a shift away from interrupt-based IO, toward polling IO whereby "special" cores are dedicated to IO processing (possibly at a lower clock frequency). This would both eliminate context switches and cache pollution, and would also enable improved energy management and determinism in the system.

Large capacity, NVDIMM-based persistent memory is on the horizon. The availability of potentially up to terabytes of persistent memory, with sub-microsecond access latencies and cache-line addressability, will accelerate the need to make changes in the IO software architecture. User space IO strategies are well positioned to meet the demands of high-performance storage devices and to provide an ecosystem that can effectively adopt load/store addressable persistence.

References
1. Abramson, D. et al. Intel virtualization technology for directed IO. Intel Technology J. 10, 3 (2006), 179–192.
2. Atkinson, M. and Morrison, R. Orthogonally Persistent Object Systems. The VLDB J. 4, 3 (July 1995), 319–402.
3. Belay, A., Prekas, G., Klimovic, A., Grossman, S., Kozyrakis, C. and Bugnion, E. IX: A protected dataplane operating system for high throughput and low latency. In Proceedings of USENIX Operating Systems Design and Implementation, Oct. 2014, 49–65.
4. Bhattacharya, S.P. A Measurement Study of the Linux TCP/IP Stack Performance and Scalability on SMP systems, Communication System Software and Middleware, 2006.
5. Bjørling, M., Axboe, J., Nellans, D. and Bonnet, P. Linux block IO: Introducing multi-queue SSD access on multi-core systems. In Proceedings of the 6th International Systems and Storage Conf., 2013, 22:1–22:10. ACM, New York, NY, USA.
6. Coburn, J. et al. NV-Heaps: Making persistent objects fast and safe with next-generation, non-volatile memories. SIGPLAN Notices 46, 3 (Mar. 2011), 105–118.
7. Dearle, A., Kirby, G.N.C. and Morrison, R. Orthogonal persistence revisited. In Proceedings of the 2nd International Conference on Object Databases, 2010, Springer Berlin, Heidelberg.
8. Gorman, M. Understanding the Linux Virtual Memory Manager. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004.
9. Grundler, G. Porting drivers to HP ZX1. Ottawa Linux Symposium, 2002.
10. Intel Corporation. Intel 64 and IA-32 Architectures Optimization Reference Manual. No. 248966-033, June 2016.
11. Intel Corporation. PCI-SIG Single Root IO Virtualization Support in Intel® Virtualization Technology for Connectivity; https://www.intel.com/content/dam/doc/white-paper/pci-sig-single-root-io-virtualization-support-in-virtualization-technology-for-connectivity-paper.pdf
12. Kannan, S., Gavrilovska, A. and Schwan, K. PVM: Persistent virtual memory for efficient capacity scaling and object storage. In Proceedings of the 11th European Conference on Computer Systems, 2016, 13:1–13:16. ACM, New York, NY, USA.
13. Kemper, A. and Kossmann, D. Adaptable pointer swizzling strategies in object bases: Design, realization, and quantitative analysis. International J. Very Large Data Bases 4, 3 (July 1995), 519–567.
14. Klimovic, A., Litz, H. and Kozyrakis, C. ReFlex: Remote Flash ≈ Local Flash. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, 2017, 345–359. ACM, New York, NY.
15. Kumar, P. and Huang, H. Falcon: Scaling IO performance in multi-SSD volumes. In Proceedings of USENIX Annual Technical Conference (Santa Clara, CA, July 2017).
16. Lewin-Berlin, S. Exploiting multicore systems with Cilk. In Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, 2010, 18–19. ACM, New York, NY, USA.
17. Lin, F.X. and Liu, X. Memif: Towards programming heterogeneous memory asynchronously. SIGARCH Computing Architecture News 44, 2 (Mar. 2016), 369–383.
18. Siemon, D. Queueing in the Linux network stack. Linux J. 231 (July 2013).
19. Tuning throughput performance for Intel Ethernet adapters (2017); http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005811.html
20. Unrau, R. and Krieger, O. Efficient sleep/wake-up protocols for user-level IPC. In Proceedings of the 1998 International Conference on Parallel Processing.
21. Volos, H., Tack, A.J. and Swift, M.M. Mnemosyne: Lightweight persistent memory. SIGPLAN Notices 47, 4 (Mar. 2011), 91–104.
22. Walker, B. SPDK: Building blocks for scalable high-performance storage applications. SNIA Storage Developer Conference, 2016, Santa Clara, CA, USA; https://www.snia.org/sites/default/files/SDC/2016/presentations/performance/BenjaminWalker_SPDK_Building_Blocks_SDC_2016.pdf

Daniel Waddington (daniel.waddington@ibm.com) is a research staff member at IBM Almaden Research Center in San Jose, CA, USA.

Jim Harris (james.r.harris@intel.com) is a principal engineer in the Network Platforms Group at Intel Corporation, Chandler, AZ, USA.

© 2018 ACM 0001-0782/18/11 $15.00
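The checksum-based approach to dirty-page identification described in the article's discussion of memory flushing can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: a portable bitwise software CRC-32 stands in for the hardware accelerator instructions, and the names `crc32_sw`, `snapshot_pages`, `find_dirty_pages`, and `TOY_PAGE_SIZE` are hypothetical. Because any checksum can collide, a modified page could in rare cases go undetected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the sketch */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but portable. */
static uint32_t crc32_sw(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Record one checksum per page, e.g. when the region is mapped or flushed. */
static void snapshot_pages(const uint8_t *region, size_t npages, uint32_t *sums) {
    for (size_t i = 0; i < npages; i++)
        sums[i] = crc32_sw(region + i * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
}

/* Compare each page against its recorded checksum. Pages whose checksum
 * changed are flagged in dirty[] and their checksum is refreshed, as if
 * they had just been flushed. Returns the number of dirty pages found. */
static size_t find_dirty_pages(const uint8_t *region, size_t npages,
                               uint32_t *sums, uint8_t *dirty) {
    size_t count = 0;
    for (size_t i = 0; i < npages; i++) {
        uint32_t now = crc32_sw(region + i * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
        dirty[i] = (uint8_t)(now != sums[i]);
        if (dirty[i]) {
            sums[i] = now;
            count++;
        }
    }
    return count;
}
```

A flush routine would write out only the pages flagged in `dirty[]`; pages that were only read keep their checksum and are skipped.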

research highlights
Technical Perspective: Backdoor Engineering
By Markus G. Kuhn

Where Did I Leave My Keys?
By Stephen Checkoway, Jacob Maskiewicz, Christina Garman, Joshua Fried, Shaanan Cohney, Matthew Green, Nadia Heninger, Ralf-Philipp Weinmann, Eric Rescorla, and Hovav Shacham

Technical Perspective: Making Sleep Tracking More User Friendly
By Tanzeem Choudhury

LIBS: A Bioelectrical Sensing System from Human Ears for Staging Whole-Night Sleep Study
By Anh Nguyen, Raghda Alqurashi, Zohreh Raghebi, Farnoush Banaei-Kashani, Ann C. Halbower, and Tam Vu

DOI:10.1145/3266289
Technical Perspective
To view the accompanying paper, visit doi.acm.org/10.1145/3266291
Backdoor Engineering
By Markus G. Kuhn

Imagine you are a cyber spy. Your day job is to tap cryptographically protected communications systems. But how? Straightforward cryptanalysis has long since become impractical: the task of breaking modern algorithms, if implemented correctly, far exceeds all computational power available to humanity. That leaves sabotage.

You can target many Achilles heels of a crypto system: random-bit generators, side channels, binary builds, certification authorities, and weak default configurations. You infiltrate the teams that design, implement, and standardize commercial security systems and plant there hidden weaknesses, known as backdoors, that later allow you to bypass the cryptography.

Take random-bit generation. Security protocols distinguish intended peers from intruders only through their knowledge of secret bit sequences. Servers have to choose many key values at random to protect each communication session, and an adversary who can successfully guess these can impersonate legitimate users.

One trick to backdoor random bits can be understood with basic high-school algebra. A deterministic random-bit generator (DRBG) is initialized (seeded) with a start state $s_0$, and then iterated with some generator function: $s_{i+1} := G(s_i)$

$$s_0 \xrightarrow{G(s_0)} s_1 \xrightarrow{G(s_1)} s_2 \cdots$$

In simple DRBGs (say, for simulations), the $s_i$ may serve as both the state of the generator and its output. So anyone who saw an output $s_i$ and knows $G$ can easily predict all future outputs.

Crypto-grade DRBGs make four improvements: (a) hardware noise sources (slow) seed $s_0$, (b) the state $s_i$ has hundreds or thousands of bits, (c) a second function $H$ derives output values $r_i := H(s_i)$ from the internal state

$$\begin{array}{cccccc}
s_0 & \xrightarrow{G(s_0)} & s_1 & \xrightarrow{G(s_1)} & s_2 & \cdots \\
\big\downarrow {\scriptstyle H(s_0)} & & \big\downarrow {\scriptstyle H(s_1)} & & \big\downarrow {\scriptstyle H(s_2)} & \\
r_0 & & r_1 & & r_2 &
\end{array}$$

and (d) both $G$ and $H$ are one-way functions. These can be computed efficiently, but their inverses cannot. After $H$, an adversary who can see some of the outputs $r_i$ cannot infer anything about the internal states $s_i$ or other outputs $r_j$. We know many excellent choices for $G$ and $H$: one-way functions or permutations carefully engineered to be fast and to have no other known exploitable properties. Most are constructed from secure hash functions or block ciphers.

As a saboteur, you do not want these used. Instead, you lure your victims toward a far more dangerous option: the class of algebraic one-way functions that enabled public-key crypto. These are orders of magnitude slower and require much bigger values for equal security. Modular exponentiation is a simple example. If you follow a few rules for choosing a big integer $g$ and a big prime number $p$, then $G(x) := g^x \bmod p$ is such a one-way function. While $g^x$ alone is monotonic, and thus easy to invert, the mod $p$ operation (take the remainder after division by $p$) ensures the result remains uniformly spread over a fixed interval and appears to behave highly randomly. The inverse, the discrete logarithm problem of calculating $x$ when given $(g^x \bmod p,\ p,\ g)$, becomes computationally infeasible, and we have a one-way function. (In the following, we drop mention of the mod $p$ operation, and just apply it automatically after each arithmetic operation.) The exponentiation operator $g^x$ has an important additional property, not affected by the mod operation: $(g^x)^y = (g^y)^x$. While this commutativity is useless to honest designers of DRBGs, it can be invaluable to saboteurs.

Convince your victims that $G(s_i) := g^{s_i}$ and $H(s_i) := h^{s_i}$ are excellent choices for generating random numbers of the highest security:

$$\begin{array}{cccccc}
s_0 & \xrightarrow{g^{s_0}} & s_1 & \xrightarrow{g^{s_1}} & s_2 & \cdots \\
\big\downarrow {\scriptstyle h^{s_0}} & & \big\downarrow {\scriptstyle h^{s_1}} & & \big\downarrow {\scriptstyle h^{s_2}} & \\
r_0 & & r_1 & & r_2 &
\end{array}$$

You can claim "provable security based on number-theoretical assumptions," but this is, of course, just a smoke screen. The sole advantage of this construction is that it allows a backdoor. If you can choose $g$ as $g := h^e$, then knowing your secret integer $e$ immediately allows you to convert any output value $r_i$ into the next internal state of the DRBG as $(r_i)^e = (h^{s_i})^e = (h^e)^{s_i} = g^{s_i} = s_{i+1}$:

$$\begin{array}{cccccc}
s_0 & \xrightarrow{g^{s_0}} & s_1 & \xrightarrow{g^{s_1}} & s_2 & \cdots \\
\big\downarrow {\scriptstyle h^{s_0}} & & \big\downarrow {\scriptstyle h^{s_1}} & & \big\downarrow {\scriptstyle h^{s_2}} & \\
r_0 & \nearrow {\scriptstyle r_0^{\,e}} & r_1 & \nearrow {\scriptstyle r_1^{\,e}} & r_2 &
\end{array}$$

So, if you contact a server and receive one $r_i$, you can now immediately predict all future $r_j$ used to protect the communication with others, and decrypt or impersonate their messages. Job done. And nobody else can do this, because finding $e$ from $h$ and $g$ is computationally infeasible (the aforementioned discrete logarithm problem). Unless, of course, they steal your backdoor by generating their own $e'$ and replacing your $g$ with their $g' := h^{e'}$.

The following article by Checkoway et al. reports on the amazing independent reconstruction of exactly such a backdoor, discovered in the firmware of a VPN router commonly used to secure access to corporate intranets. In 2004, the NSA planted the above DRBG in NIST standard SP 800-90, including a $g$ and $h$ of their choice. The details differ only slightly (elliptic curve operations rather than modular exponentiation, which uses slightly different notation; the top 16 bits of $r_i$ are discarded, but can be guessed via trial and error). The basic idea is identical.

But planting a backdoor in a standard is not enough. You now also have to ensure industry implements it correctly, such that an $r_i$ reaches you intact. And that nobody else replaces your $g$. And that is where this story begins.

Markus G. Kuhn (mgk25@cam.ac.uk) is a Senior Lecturer teaching computer security and cryptography at the University of Cambridge, England.

Copyright held by author.
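The backdoor construction in the Technical Perspective above can be made concrete with a toy example. The sketch below assumes deliberately tiny parameters that offer no security at all (the prime modulus 2^31 - 1 and base h = 7), and the names `modpow`, `toy_drbg`, `toy_drbg_next`, and `recover_next_state` are hypothetical. It shows only the algebra: an observer who knows the secret exponent e behind g = h^e can turn one output into the generator's next internal state.

```c
#include <assert.h>
#include <stdint.h>

/* Toy parameters, assumed for illustration only; far too small to be secure. */
static const uint64_t TOY_P = 2147483647ULL; /* prime modulus 2^31 - 1 */
static const uint64_t TOY_H = 7ULL;          /* public base h */

/* Square-and-multiply modular exponentiation for moduli below 2^32,
 * so that every intermediate product fits in 64 bits. */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

typedef struct {
    uint64_t g;     /* public base g; the saboteur picks g = h^e */
    uint64_t state; /* secret internal state s_i */
} toy_drbg;

/* One step: emit r_i = h^{s_i}, then advance to s_{i+1} = g^{s_i}. */
static uint64_t toy_drbg_next(toy_drbg *d) {
    uint64_t r = modpow(TOY_H, d->state, TOY_P);
    d->state = modpow(d->g, d->state, TOY_P);
    return r;
}

/* The saboteur's shortcut: (r_i)^e = (h^{s_i})^e = (h^e)^{s_i} = s_{i+1}. */
static uint64_t recover_next_state(uint64_t r, uint64_t e) {
    return modpow(r, e, TOY_P);
}
```

Once the next state is recovered from a single observed output, every later output follows by running the public step function forward, which is the prediction capability the text describes.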

DOI:10.1145/3266291
Where Did I Leave My Keys?

Lessons from the Juniper Dual EC Incident
By Stephen Checkoway, Jacob Maskiewicz, Christina Garman, Joshua Fried, Shaanan Cohney,
Matthew Green, Nadia Heninger, Ralf-Philipp Weinmann, Eric Rescorla, and Hovav Shacham
Abstract

In December 2015, Juniper Networks announced multiple security vulnerabilities stemming from unauthorized code in ScreenOS, the operating system for their NetScreen Virtual Private Network (VPN) routers. The more sophisticated of these vulnerabilities was a passive VPN decryption capability, enabled by a change to one of the parameters used by the Dual Elliptic Curve (EC) pseudorandom number generator.

In this paper, we describe the results of a full independent analysis of the ScreenOS randomness and VPN key establishment protocol subsystems, which we carried out in response to this incident. While Dual EC is known to be insecure against an attacker who can choose the elliptic curve parameters, Juniper had claimed in 2013 that ScreenOS included countermeasures against this type of attack. We find that, contrary to Juniper’s public statements, the ScreenOS VPN implementation has been vulnerable to passive exploitation by an attacker who selects the Dual EC curve point since 2008. This vulnerability arises due to flaws in Juniper’s countermeasures as well as a cluster of changes that were all introduced concurrently with the inclusion of Dual EC in a single 2008 release. We demonstrate the vulnerability on a real NetScreen device by modifying the firmware to install our own parameters, and we show that it is possible to passively decrypt an individual VPN session in isolation without observing any other network traffic. This incident is an important example of how guidelines for random number generation, engineering, and validation can fail in practice. Additionally, it casts further doubt on the practicality of designing a safe “exceptional access” or “key escrow” scheme of the type contemplated by law enforcement agencies in the United States and elsewhere.

The original version of this paper is entitled “A Systematic Analysis of the Juniper Dual EC Incident” and was published in Proceedings of the 23rd ACM Conference on Computer and Communications Security (Vienna, 2016), 468–479.

1. INTRODUCTION
In December 2015, Juniper announced that an “internal code review” revealed the presence of “unauthorized code in ScreenOS that could allow a knowledgeable attacker […] to decrypt VPN connections.” In response to this, Juniper released patched versions of ScreenOS, the operating system powering the affected NetScreen devices, but has declined to disclose any further information about the intrusion and vulnerability.

Immediately following Juniper’s advisory, security researchers around the world, including our team, began examining the ScreenOS firmware to find the vulnerabilities Juniper had patched. They found that the change that rendered ScreenOS encryption breakable did nothing but replace a few embedded constants in Juniper’s pseudorandom number generator. The reason why this results in an attacker being able to decrypt connections is Juniper’s design decision to use the NSA-designed Dual EC Pseudorandom Number Generator (PRNG) [4, 12]. Dual EC has the problematic property that an attacker who knows the discrete logarithm of one of the input parameters (Q) with respect to a generator point, and is able to observe a small number of consecutive bytes from the PRNG, can then compute the internal state of the generator and thus predict all future output. Thus, it is critical that the discrete logarithm of Q remain unknown. The changes to the ScreenOS code replaced Juniper’s chosen Q with one selected by the attacker.

From one perspective, the Juniper incident is just a particularly intricate software vulnerability, which is interesting on its own terms. More importantly, however, it sheds light on the contentious topic of “exceptional access” technologies which would allow law enforcement officials to gain access to the plaintext for encrypted data. A key component of any exceptional access system is restricting access to authorized personnel, with the most commonly proposed approach being encrypting the target keying material under a key (or keys) known to law enforcement which are then kept under tight control. The use of Dual EC in ScreenOS creates what is in effect an exceptional access system with Q as the public key and the discrete log of Q as the private decryption key. Historically, analysis of exceptional access systems has focused on the difficulty of controlling the decryption keys. In the specific case of ScreenOS, we do not know whether anyone had access to the corresponding key, but the Juniper incident starkly illustrates another risk: that of an attacker modifying a system’s exceptional access capability in order to replace the authorized public key with one under her control, thus turning an exceptional access system designed for use by law enforcement into one which works for the attacker.

In this paper, we attempt to tell the story of that incident, pieced together by forensic reverse engineering of dozens of ScreenOS firmware revisions stretching back nearly a decade, as well as experimental validation on NetScreen hardware. We first provide background on Dual EC itself, then examine the way that it is used in ScreenOS and why this leads to such a severe vulnerability, then

move to examine the history of the incident itself, and finally consider what lessons we can draw from this story.

2. DUAL EC IN SCREENOS
Cryptographic systems typically include deterministic PRNGs that expand a small amount of secret internal state into a stream of values which are intended to be indistinguishable from true randomness. An attacker able to predict the output of a PRNG will often be able to break any protocol implementation dependent on it, for instance by being able to predict cryptographic keys (which should remain secret) or nonces (which should often remain unpredictable).

Dual EC is a cryptographic PRNG standardized by the National Institute of Standards and Technology (NIST) which is based on operations on an elliptic curve. Dual EC has three public parameters: the elliptic curve and two points on the curve called P and Q. ScreenOS uses the elliptic curve P-256 and sets P to be P-256’s standard generator as specified in NIST Special Publication 800-90 [4]. That standard also specifies the Q to use, but ScreenOS uses Juniper’s own elliptic curve point Q instead. The finite field over which P-256 is defined has roughly 2^256 elements. Points on P-256 consist of pairs of 256-bit numbers (x, y) that satisfy the elliptic curve equation. The internal state of Dual EC is a single 256-bit number s.

Let x(·) be the function that returns the x-coordinate of an elliptic curve point; || be concatenation; lsb_n(·) be the function that returns the least-significant n bytes of its input in big-endian order; and msb_n(·) be the function that returns the most-significant n bytes. Starting with an initial state s_0, one invocation of the Dual EC implementation generates a 32-byte pseudorandom output and a new state s_2 as

    s_1 = x(s_0 P),    r_1 = x(s_1 Q),
    s_2 = x(s_1 P),    r_2 = x(s_2 Q),
    output = lsb_30(r_1) || msb_2(lsb_30(r_2)),

where sP and sQ denote scalar multiplication on P-256.

In 2007, Shumow and Ferguson showed [16] that Dual EC was subject to a state reconstruction attack by an adversary who knows the value d such that P = dQ and who can observe a single output value. The key insight is that multiplying the point s_1 Q by d yields the internal state: x(d · s_1 Q) = x(s_1 P) = s_2. Although s_1 Q is itself not known, 30 of the 32B of its x-coordinate (namely r_1) constitute the first 30B of output, and the attacker can guess the remaining bytes; the x-coordinate of an elliptic curve point determines its y-coordinate up to sign.

Assuming that the attacker knows the discrete log of Q, the major difficulty is recovering a complete output value; an attacker who only knows part of the value must exhaustively search the rest. The number of candidates grows exponentially as fewer bytes of r_1 are revealed, and recovery is intractable with fewer than about 26B. In ScreenOS, Dual EC is always used to generate 32B of output at a time, and therefore the attack is straightforward. When 30B of r_1 are available, as in Juniper’s implementation, the attacker must consider 2^16 candidate points. From the attacker’s perspective, this is the optimal situation.

Importantly, as far as is publicly known, Dual EC is secure against an attacker who knows P and Q but does not know d, as recovering d would require the ability to compute discrete logarithms, which would break elliptic curve cryptography in general.

3. THE SCREENOS PRNG SUBSYSTEM
Listing 1 shows the decompiled source code for the functions implementing the PRNG in ScreenOS version 6.2.0r1; the same function is present in other releases in the 6.2 and 6.3 series. It consists of two PRNGs, Dual EC and ANS X9.31 (Appendix A.2.4; Ref. 2).

Note that identifiers such as function and variable names are not present in the binary; we assigned these names based on our analysis of the apparent function of each symbol. Similarly, specific control flow constructs are not preserved by the compilation/decompilation process. For instance, the for loop on line 21 may in fact be a while loop or some other construct in Juniper’s source code. Decompilation does, however, preserve the functionality of the original code. For clarity, we have omitted Federal Information Processing Standards (FIPS) checks that ensure that the X9.31 generator has not generated duplicate output.

A superficial reading of the prng_generate() function suggests that Dual EC is used only to generate keys for the X9.31 PRNG, and that it is the output of X9.31 that is returned to callers (in the output global buffer). The Dual EC vulnerability described in Section 2 requires raw Dual EC output, so it cannot be applied. Indeed, a 2013 knowledge base article by Juniper [8] claims exactly this. (We discuss this knowledge base article further in Section 6.)

Listing 1: The core ScreenOS 6.2 PRNG subroutines.

     1  char block[8], seed[8], key[24];   // X9.31 vars
     2  char output[32];                   // prng_generate output
     3  unsigned int index, calls_since_reseed;
     4
     5  void prng_reseed(void) {
     6    calls_since_reseed = 0;
     7    if (dualec_generate(output, 32) != 32)
     8      error("[...] unable to reseed\n", 11);
     9    memcpy(seed, output, 8);
    10    index = 8;
    11    memcpy(key, &output[index], 24);
    12    index = 32;
    13  }
    14
    15  void prng_generate(void) {
    16    int time[2] = { 0, get_cycles() };
    17    index = 0;
    18    ++calls_since_reseed;
    19    if (!one_stage_rng())
    20      prng_reseed();
    21    for (; index <= 31; index += 8) {
    22      // FIPS checks removed for clarity
    23      x9_31_generate_block(time, seed, key, block);
    24      // FIPS checks removed for clarity
    25      memcpy(&output[index], block, 8);
    26    }
    27  }
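One way to test any reading of Listing 1 is to run its control flow with stand-in generators. The following is a minimal sketch under stated assumptions, not Juniper’s code: the bodies of dualec_generate(), x9_31_generate_block(), and one_stage_rng() are stubs we invented that fill buffers with marker bytes (0xDE for Dual EC, 0x93 for X9.31); get_cycles() is replaced by a constant; the globals index and time are renamed idx and timebuf to avoid clashing with C library names; and one_stage_enabled, x931_calls, and bytes_with_marker() are hypothetical helpers for inspecting the result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Globals mirror Listing 1 (index renamed to idx). */
static unsigned char block[8], seed[8], key[24];   /* X9.31 vars */
static unsigned char output[32];                   /* prng_generate output */
static unsigned int idx, calls_since_reseed;

static int x931_calls;          /* hypothetical counter: calls to the X9.31 stub */
static int one_stage_enabled;   /* hypothetical toggle; 0 models the default */

/* Stub: stands in for the undocumented one-stage-rng setting. */
static int one_stage_rng(void) { return one_stage_enabled; }

/* Stub: "Dual EC" output is just the marker byte 0xDE. */
static int dualec_generate(unsigned char *out, int len) {
    memset(out, 0xDE, (size_t)len);
    return len;
}

/* Stub: "X9.31" output is just the marker byte 0x93. */
static void x9_31_generate_block(const int *t, unsigned char *s,
                                 const unsigned char *k, unsigned char *out) {
    (void)t; (void)s; (void)k;
    ++x931_calls;
    memset(out, 0x93, 8);
}

/* Same statements, in the same order, as lines 5-13 of Listing 1. */
static void prng_reseed(void) {
    calls_since_reseed = 0;
    if (dualec_generate(output, 32) != 32)
        fprintf(stderr, "unable to reseed\n");
    memcpy(seed, output, 8);
    idx = 8;
    memcpy(key, &output[idx], 24);
    idx = 32;
}

/* Same statements, in the same order, as lines 15-27 of Listing 1. */
static void prng_generate(void) {
    int timebuf[2] = { 0, 0 };   /* get_cycles() replaced by a constant */
    idx = 0;
    ++calls_since_reseed;
    if (!one_stage_rng())
        prng_reseed();
    for (; idx <= 31; idx += 8) {
        x9_31_generate_block(timebuf, seed, key, block);
        memcpy(&output[idx], block, 8);
    }
}

/* Runs one prng_generate() call and counts output bytes equal to marker. */
static int bytes_with_marker(int one_stage, unsigned char marker) {
    int n = 0;
    assert(one_stage == 0 || one_stage == 1);
    one_stage_enabled = one_stage;
    x931_calls = 0;
    memset(output, 0, sizeof output);
    prng_generate();
    for (size_t i = 0; i < sizeof output; ++i)
        n += (output[i] == marker);
    return n;
}
```

With the default setting, bytes_with_marker(0, 0xDE) reports that all 32 bytes of output carry the Dual EC marker and x931_calls stays at zero; with the toggle set, bytes_with_marker(1, 0x93) reports 32 X9.31 bytes produced by four stub calls. The stubs say nothing about the real generators; they only make the control flow of Listing 1 observable.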

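What raw Dual EC output is worth to someone who knows d can be seen on a toy curve. The sketch below replays the Section 2 state reconstruction with parameters that are entirely our own assumptions: the textbook curve y^2 = x^3 + 2x + 2 over F_17 (group order 19) instead of P-256, Q = (5, 1), d = 7, and untruncated outputs, so the attacker has no missing bytes to guess. All function names are hypothetical.

```c
#include <assert.h>

/* Toy curve y^2 = x^3 + 2x + 2 over F_17 (group order 19). NOT P-256. */
enum { PRIME = 17, COEFF_A = 2, COEFF_B = 2, SECRET_D = 7 };

typedef struct { int x, y, inf; } Point;   /* inf != 0 marks the point at infinity */

static const Point Q = { 5, 1, 0 };        /* public point Q (assumed toy value) */

static int mod(int v) { return ((v % PRIME) + PRIME) % PRIME; }

/* Modular inverse via Fermat's little theorem: v^(PRIME-2) mod PRIME. */
static int inv(int v) {
    int r = 1;
    v = mod(v);
    for (int i = 0; i < PRIME - 2; ++i) r = mod(r * v);
    return r;
}

/* Affine point addition, covering doubling and the point at infinity. */
static Point add(Point a, Point b) {
    Point none = { 0, 0, 1 };
    if (a.inf) return b;
    if (b.inf) return a;
    if (a.x == b.x && mod(a.y + b.y) == 0) return none;     /* a == -b */
    int lam = (a.x == b.x)
        ? mod((3 * a.x * a.x + COEFF_A) * inv(2 * a.y))      /* tangent slope */
        : mod((b.y - a.y) * inv(b.x - a.x));                  /* chord slope */
    Point r = { mod(lam * lam - a.x - b.x), 0, 0 };
    r.y = mod(lam * (a.x - r.x) - a.y);
    return r;
}

/* Double-and-add scalar multiplication k*p for k >= 0. */
static Point mul(int k, Point p) {
    Point r = { 0, 0, 1 };
    for (; k > 0; k >>= 1) {
        if (k & 1) r = add(r, p);
        p = add(p, p);
    }
    return r;
}

/* Returns a curve point with the given x-coordinate, or infinity if none. */
static Point lift_x(int x) {
    Point none = { 0, 0, 1 };
    int rhs = mod(x * x * x + COEFF_A * x + COEFF_B);
    for (int y = 0; y < PRIME; ++y)
        if (mod(y * y) == rhs) { Point p = { x, y, 0 }; return p; }
    return none;
}

/* One toy Dual EC step: s' = x(s*P), output r = x(s'*Q), no truncation. */
static int dualec_step(int *state, Point P) {
    *state = mul(*state, P).x;
    return mul(*state, Q).x;
}

/* From one observed output r1 and the secret d, compute the next state. */
static int recover_next_state(int r1) {
    Point R = lift_x(r1);          /* R = +/- s1*Q; the sign does not affect x */
    assert(!R.inf);
    return mul(SECRET_D, R).x;     /* x(d * s1 * Q) = x(s1 * P) = s2 */
}

/* Returns 1 if the attacker's state and predicted output match the victim's. */
static int attack_succeeds(int s0) {
    Point P = mul(SECRET_D, Q);                    /* P = d*Q */
    int victim_state = s0;
    int r1 = dualec_step(&victim_state, P);        /* the attacker observes r1 */
    int attacker_state = recover_next_state(r1);
    int r2 = dualec_step(&victim_state, P);        /* victim's next output */
    return victim_state == attacker_state && r2 == mul(attacker_state, Q).x;
}
```

attack_succeeds(3) runs a victim seeded with s_0 = 3 next to an attacker who sees only the first output; it returns 1 when the attacker’s recovered state and predicted second output match the victim’s. On P-256 the same algebra applies, but the attacker must additionally search the bytes that truncation removes.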
In this superficial reading, the prng_reseed() function is occasionally invoked to reseed the X9.31 PRNG state. This function invokes the Dual EC generator, directing its output to the 32B buffer output. From this buffer, it extracts a seed and cipher key for the X9.31 generator. With X9.31 seeded, the prng_generate() function generates 8B of X9.31 output at a time (line 23) into output, looping until it has generated 32B of output (lines 21–26). Each invocation of x9_31_generate_block updates the X9.31 seed state in the seed buffer.

The straightforward reading given above is wrong. First, and most importantly, index, the control variable for the loop that invokes the X9.31 PRNG in prng_generate() at line 21, is a global variable. The prng_reseed() function, if called, sets it to 32, with the consequence that, whenever the PRNG is reseeded, index is already greater than 31 at the start of the loop and therefore no calls to the X9.31 PRNG are executed. [a]

[a] The global variable reuse was first publicly noted by Willem Pinckaers on Twitter. Online: https://twitter.com/_dvorak_/status/679109591708205056, retrieved February 18, 2016.

Second, in the default configuration, one_stage_rng() always returns false, so prng_reseed() is always called. In the default configuration, then, the X9.31 loop is never invoked. (There is an undocumented ScreenOS command, set key one-stage-rng, that makes one_stage_rng() always return true; running this command induces a different PRNG vulnerability, discussed in the full version of this paper [5].)

Third, prng_reseed() happens to use the output global buffer as a staging area for Dual EC output before it copies parts of that output to the other global buffers that hold the X9.31 seed and key. This is the same global buffer that the prng_generate() function was supposed to fill with X9.31 output, but fails to. When callers look for PRNG output in output, what they find is 32B of raw Dual EC output.

For comparison, Listing 2 shows the decompiled source code for the PRNG function in ScreenOS 6.1, before Juniper’s revamp. In ScreenOS 6.1, the loop counter, index, is a local variable rather than a global; the X9.31 PRNG is reseeded from system entropy every 10,000 calls, instead of every call and from Dual EC; and PRNG output is placed in a caller-supplied buffer instead of a global variable.

In addition, the ScreenOS 6.1 PRNG subsystem produces 20B at a time, not 32B as in ScreenOS 6.2 and 6.3. We discuss the significance of this difference in the next section.

Listing 2: The core ScreenOS 6.1 PRNG subroutine.

    char block[8], seed[8], key[24];   // X9.31 vars
    unsigned int calls_since_reseed;

    void prng_generate(char *output) {
      unsigned int index = 0;
      // FIPS checks removed for clarity
      if (calls_since_reseed++ > 9999)
        prng_reseed();
      // FIPS checks removed for clarity
      int time[2] = { 0, get_cycles() };
      do {
        // FIPS checks removed for clarity
        x9_31_generate_block(time, seed, key, block);
        // FIPS checks removed for clarity
        memcpy(&output[index], block, min(20-index, 8));
        index += min(20-index, 8);
      } while (index <= 19);
    }

4. INTERACTION WITH IKE
ScreenOS implements the Internet Protocol Security (IPsec) VPN protocol. To choose the keys that protect a VPN session, the client and the ScreenOS device perform an Internet Key Exchange (IKE) [7, 11] handshake.

In the same version 6.2 release of ScreenOS that added Dual EC (Section 2) and modified the PRNG subsystem to expose raw Dual EC output (Section 3), Juniper made a cluster of IKE implementation changes that make it possible for an attacker who knows the Dual EC secret d to decrypt VPN connections. In the remainder of this section, we provide a brief description of the relevant features of IKE and then explain the impact of these changes.

4.1. Overview of IKE
IKE and its successor IKEv2 are traditional Diffie–Hellman-based handshake protocols in which two endpoints (dubbed the initiator and the responder) establish a Security Association (SA) consisting of parameters and a set of keys used for encrypting traffic. Somewhat unusually, IKE consists of two phases: Phase 1 establishes an “IKE SA” that is tied to the endpoints but not to any particular class of non-IKE network traffic. In this phase, the two sides exchange Diffie–Hellman (DH) shares and nonces, which are combined to form the derived keys. The endpoints may be authenticated in a variety of ways including a signing key and a statically configured shared secret.

Phase 2 establishes SAs that protect non-IKE traffic (typically IPsec). The IKE messages for this phase are protected with keys established in the first phase. This phase may involve a DH exchange but may also just consist of an exchange of nonces, in which case the child SA keys are derived from the shared secret established in the first phase. IKEv2 refers to these phases as “Initial Exchange” and “CREATE_CHILD_SA,” respectively; for simplicity we will use the IKEv1 Phase 1/Phase 2 terminology in the rest of this article.

An attack on IKE where ScreenOS is the responder would proceed as follows: (1) using the responder nonce in the first phase, compute the Dual EC state; (2) predict the responder’s DH private key and use that to compute the DH shared secret for the IKE SA, which is used to generate the first set of keys; (3) using these traffic keys decrypt the second phase traffic to recover both initiator and responder nonces and public keys; (4) recover the responder’s private key, either by running Dual EC forward (the best case scenario) or by repeating the Dual EC attack using the new responder nonce; (5) use the responder’s private key and the initiator’s public key to compute the shared secret for the second phase SA and

thereby the traffic keys; and (6) use the traffic keys to decrypt the VPN traffic.

However, while this is straightforward in principle, there are a number of practical complexities and potential implementation decisions which could make this attack easier or more difficult (or even impractical) as described below.

4.2. Nonce size
For Dual EC state reconstruction to be possible, the attacker needs more than just to see raw Dual EC output. She needs at least 26B of the x-coordinate of a single elliptic-curve point to recover the Dual EC state; fewer bytes would be insufficient (Section 2).

Luckily for the attacker, the first 30B of the 32B returned by ScreenOS’s Dual EC implementation belong to the x-coordinate of a single point, as we saw in Section 2. Luckily again for the attacker, ScreenOS’s PRNG subsystem also returns 32B when called, and these are the 32B returned by a Dual EC invocation, as we saw in Section 3. Finally, IKE nonces emitted by ScreenOS are 32B long and produced from a single PRNG invocation. To summarize: In ScreenOS 6.2 and 6.3, IKE nonces always consist of 30B of one point’s x-coordinate and 2B of the next point’s x-coordinate, the best-case scenario for Shumow–Ferguson reconstruction.

It is worth expanding on this point. The IKE standards allow any nonce length between 8 and 256B (Section 5; Ref. 7). An Internet-wide scan of IKE responders by Adrian et al. [3] found that a majority use 20B nonces. We are not aware of any cryptographic advantage to nonces longer than 20B. ScreenOS 6.1 sent 20B nonces and, as we noted in Section 3, its PRNG subsystem generated 20B per invocation. In ScreenOS 6.2, Juniper introduced Dual EC, rewrote the PRNG subsystem to produce 32B at a time, and modified the IKE subsystem to send 32B nonces.

4.3. Nonces and DH keys
An attacker who knows the d corresponding to Juniper’s point Q and observes an IKE nonce generated by a ScreenOS device can recompute the device’s Dual EC state at nonce generation time. She can roll that state forward to predict subsequent PRNG outputs, though not back to recover earlier outputs. ScreenOS uses its PRNG to generate IKE Diffie–Hellman shares, so the attacker will be able to predict DH private keys generated after the nonce she saw and compute the session keys for the VPN connections established using those IKE handshakes.

This scenario is clearly applicable when the attacker has a network tap close to the ScreenOS device, and can observe many IKE handshakes. But what if the attacker’s network tap is close to the VPN client instead? She might observe only a single VPN connection. If the nonce for a connection is generated after the DH share, the attacker will not be able to recover that session’s keys.

A superficial reading of the ScreenOS IKE code seems to rule out single-connection attacks: The KE payload containing the DH share is indeed encoded before the Nr payload containing the nonce.

Conveniently for the attacker, however, ScreenOS also contains a pre-generation feature that maintains a pool of nonces and DH keys that can be used in new IKE connections, reducing handshake latency. The pooling mechanism is quite intricate and appears to be designed to ensure that enough keys are always available while avoiding consuming too much run time on the device.

Independent First In, First Out (FIFO) queues are maintained for nonces, for each supported finite field DH group (MODP 768, MODP 1024, MODP 1536, and MODP 2048), and (in version 6.3) for each supported elliptic curve group (ECP 256 and ECP 384). The sizes of these queues depend on the number of VPN configurations that have been enabled for any given group. For instance, if a single configuration is enabled for a group then that group will have a queue size of 2. The size of the nonce queue is set to be twice the aggregate size of all of the DH queues. At startup, the system fills all queues to capacity. A background task that runs once per second adds one entry to a queue that is not full. If a nonce or a DH share is ever needed when the corresponding queue is empty, a fresh value is generated on the fly.

The queues are filled in priority order. Crucially, the nonce queue is assigned the highest priority; it is followed by the groups in descending order of cryptographic strength (ECP 384 down to MODP 768). This means that in many (but not all) cases, the nonce for an IKE handshake will have been drawn from the Dual EC output stream earlier than the DH share for that handshake, making single-connection attacks feasible.

Figure 1 shows a (somewhat idealized) sequence of generated values, with the numbers denoting the order in which queue entries were generated, before and after an IKE Phase 1 exchange. Figure 1a shows the situation after startup: The first four values are used to fill the nonce queue and the next two values are used to generate the DH shares. Thus, when the exchange happens, it uses value 1 for the nonce and value 5 for the key, allowing the attacker to derive the Dual EC state from value 1 and then compute forward to find the DH share. After the Phase 1 exchange, which consumes a DH share and a nonce, and after execution of the periodic, queue-refill task, the state is as shown in Figure 1b, with the new values shaded.

Figure 1. Nonce queue behavior during an IKE handshake. Numbers denote generation order, and values generated after the handshake are shaded. During a DH exchange, outputs 1 and 5 are used as the nonce and key, advancing the queue, and new outputs are generated to fill the end of the queue.

    (a) At system startup.        (b) After a DH exchange.
    Nonces:     1 2 3 4           Nonces:     2 3 4 7
    MODP 1024:  5 6               MODP 1024:  6 8

Depending on configuration, the IKE Phase 2 exchange would consume either a nonce and a DH share or just a nonce. If the exchange uses both a nonce and a DH share, the dequeued nonce will again have been generated before

the dequeued DH share. That property will continue to hold for subsequent IKE handshakes, provided that handshakes do not entirely exhaust the queues. Had the refill task not prioritized refilling the nonce queue before any DH group queue, single-connection attacks would not have been possible. Had the nonce queue been the same length as a DH share queue, single-connection attacks would not have been possible in configurations where IKE Phase 2 consumed a nonce but not a DH share.

ScreenOS 6.1 pregenerates DH shares but not nonces; the nonce queues we have described were added in ScreenOS 6.2, along with Dual EC. Had nonce queues not been added, no handshakes would have been vulnerable to single-connection decryption attacks.

In the presence of multiple nonce-only Phase-2 exchanges within a single Phase-1 exchange, multiple DH groups actively used in connections, queue exhaustion, or certain race conditions, the situation is more complicated, and it is possible for an IKE handshake phase to have its DH share generated before its nonce. Single-connection decryption attacks would fail for those handshakes. Refer to the full version of this paper for details [5].

4.4. Recovering traffic keys
If the attacker can predict the Diffie–Hellman private key corresponding to the ScreenOS device’s DH share for an IKE exchange, she can compute the DH shared secret for that exchange. With knowledge of the DH shared secret, computing the session keys used to encrypt and authenticate the VPN session being set up is straightforward, though the details depend on the IKE protocol version and the way in which the endpoints authenticate each other; for details, see the full version of this paper [5].

For IKEv1 connections authenticated with digital signatures, the attacker knows everything she needs to compute the session keys. For IKEv1 connections authenticated with public key encryption, each peer’s nonce is encrypted under the other’s Rivest–Shamir–Adleman (RSA) public key, stopping the attack. IKEv1 connections authenticated with preshared keys fall somewhere in the middle: The attacker will need to know the preshared key in addition to the DH shared secret to compute the session keys. If the preshared key is strong, then the connection will still be secure. Fortunately for the attacker, many real-world VPN configurations use weak preshared keys (really passwords); in such cases, having recorded an IKE handshake and recovered the DH shared secret, the attacker will be able to mount an offline dictionary attack on the preshared key. By contrast, the attacker will be able to compute session keys for IKEv2 connections in the same way, regardless of how they are authenticated.

Having computed the session keys, the attacker can decrypt and read the VPN traffic and, if she wishes, can tamper with it.

5. EXPERIMENTAL VALIDATION
To validate the attacks we describe above, we purchased a Juniper Secure Services Gateway 550M VPN device. We generated our own point Q and corresponding Dual EC secret d. We modified firmware version 6.3.0r12 to put in place our point Q, matching Dual EC Known Answer Test (KAT) values, and the (non-cryptographic) firmware checksum, and we installed the modified firmware on our device. (Our device did not have a code-signing certificate installed, so we did not need to create a valid cryptographic signature for our modified firmware.)

Using the new firmware, we configured the device with three separate VPN gateways, configured for IKEv1 with a preshared key, IKEv1 with a 1024-bit RSA signing certificate, and IKEv2 with a preshared key, respectively. We made connections to each gateway using the strongSwan VPN software as our initiator and recorded the traffic to our device. We successfully decrypted each connection by recovering the Dual EC state and traffic keys using just that connection’s captured packets.

6. HISTORY OF THE JUNIPER INCIDENT
The history of the Juniper incident begins nearly a decade ago. [b] In October 2008, Juniper released ScreenOS 6.2. As described in detail above, this release (1) replaced an entropy-gathering procedure for (re)seeding the ANS X9.31 PRNG with Dual EC using a custom Q point; (2) modified the X9.31 reseed logic to reseed on every call rather than every ten thousand calls; (3) changed the loop counter in the prng_generate procedure as well as the procedure’s output to be global variables, shared with the reseed procedure, thus ensuring that pseudorandom values are generated by Dual EC, and not X9.31; (4) changed the IKE nonce length from 20B to 32B; and (5) added a nonce pre-generation queue.

[b] The dates in this section come from file dates, ScreenOS release notes, and Juniper’s website, none of which agree precisely on any dates.

The result of the first four changes is that whoever knew the integer d corresponding to Juniper’s Q could passively decrypt (some) VPN traffic. Each of the first four changes is critical to the attack described in this article. The fifth change enables single-connection attacks in many cases, but is not necessary for multi-connection attacks.

This state of affairs continued for four years. At some point prior to the release of ScreenOS 6.2.0r15 (September 2012) and ScreenOS 6.3.0r12 (August 2012), someone modified Juniper’s source code. Based on the patched firmware revisions Juniper would later release, the modifications were quite small: The x-coordinate of Juniper’s Dual EC Q was changed, as was the expected response to Dual EC’s Known Answer Test. As a result, the set of people who could passively decrypt ScreenOS’s VPN traffic changed from those who know Juniper’s d (if any) to those who know the new d corresponding to the changed Q (presumably the attacker who made the change).

Apparently unrelated to the 2012 changes, a second source code modification was made. A hard-coded SSH and Telnet password was inserted into Juniper’s code at some point before the release of ScreenOS 6.3.0r17 (April 2013). Logging in with this password yields administrator access.

In early September 2013, the New York Times published an article based on documents from Snowden strongly implying that the National Security Agency (NSA) had engineered Dual EC to be susceptible to attack [15]. The article does not name Dual EC; it instead refers to a 2006 NIST standard with a “fatal weakness, discovered by two Microsoft cryptographers in 2007,” presumably referring to Dan Shumow and Niels Ferguson’s presentation at CRYPTO 2007 [16]. This reporting led NIST to withdraw its recommendation for Dual EC [14].

After NIST withdrew its recommendation, Juniper subsequently published a knowledge base article explaining their use of Dual EC in ScreenOS.

    ScreenOS does make use of the Dual_EC_DRBG standard, but is designed to not use Dual_EC_DRBG as its primary random number generator. ScreenOS uses it in a way that should not be vulnerable to the possible issue that has been brought to light. Instead of using the NIST recommended curve points it uses self-generated basis points and then takes the output as an input to FIPS/ANSI X.9.31 [sic] PRNG, which is the random number generator used in ScreenOS cryptographic operations. [8]

The first mitigation (using self-generated basis points) only defends against the attacks described in this paper if Q is generated so that nobody knows d; Juniper has provided no evidence that this is the case. As we describe in Section 3, Juniper’s claim that the output of Dual EC is only used as an input to X9.31 is incorrect.

This was the situation on December 17, 2015 when Juniper issued an out-of-cycle security bulletin [9] for two security issues in ScreenOS: CVE-2015-7755 [c] (“Administrative Access”) and CVE-2015-7756 [d] (“VPN Decryption”).

[c] https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2015-7755
[d] https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2015-7756

This announcement was particularly interesting because it was not the usual report of developer error, but rather of malicious code which had been inserted into ScreenOS by an unknown attacker:

    During a recent internal code review, Juniper discovered unauthorized code in ScreenOS that could allow a knowledgeable attacker to gain administrative access to NetScreen® devices and to decrypt VPN connections. Once we identified these vulnerabilities, we launched an investigation into the matter, and worked to develop and issue patched releases for the latest versions of ScreenOS. [10]

The bulletin prompted a flurry of reverse-engineering activity around the world, including by our team. The “Administrative Access” issue was quickly identified as the 2013 source code modification. This issue has been extensively discussed by Moore [13]. Our analysis of the “VPN Decryption” issue, described in this article, shows that the 2012 code modification is responsible.

Our analysis implies several items of note. First, the 2012 code modification indicates that Juniper’s 2013 knowledge base article [8] is incorrect when it states that ScreenOS uses Juniper’s own Q point since, at that time, ScreenOS was shipping with the attacker’s Q. Second, by the end of 2015, Juniper knew that Dual EC could be exploited in ScreenOS. Despite this, Juniper’s initial fix was to revert the Q point to their initial value in each affected ScreenOS revision. Eventually, after press coverage of our results, Juniper committed to removing Dual EC from their PRNG subsystem.

7. EXCEPTIONAL ACCESS AND NOBUS
Law enforcement officials have been warning since 2014 that they are “going dark”: that ubiquitous end-to-end encryption threatens investigations by rendering intercepted communications unreadable. They have called on technology companies to rearchitect their products so intercepted communications could be decrypted given a court order. Computer scientists have resisted such “exceptional access” mandates, arguing that whatever mechanism implements it would constitute a vulnerability that might be exploited by third parties [1].

Attempts to design exceptional access mechanisms which do not introduce vulnerabilities go back at least as far as 1993, when the NSA introduced “Clipper,” an encryption algorithm embedded in a hardware platform with a built-in “key escrow” capability, in which cryptographic keys were separately encrypted under a key known to the US government. Such a mechanism would be “NOBUS,” in the jargon of the NSA, for “nobody but us” (p. 281; Ref. 6): data would be cryptographically secure against anyone who did not have the keys but transparent to those who did.

While the key escrow mechanism designed for Clipper involved encrypting the traffic keys under the escrow key, it is also possible to build an exceptional access mechanism around a system like Dual EC, with the escrow key being the discrete log of Q. The common thread here is that the key is intended to be known only to authorized personnel.

Whatever the intent of Juniper’s selection of Dual EC, its use created what was in effect an exceptional access system: one where the key was the d value corresponding to Juniper’s choice of Q. We have no way of knowing whether anyone knew that d value or not, and Juniper has not described how they generated Q. However, around 2012, some organization gained the ability to make changes to Juniper’s source code repository. They used that access to change the Dual EC point Q to one of their choosing, in essence swapping out the escrow key. Between September 2012 and December 2015, official releases of ScreenOS distributed by Juniper included the intruders’ point Q instead of Juniper’s. VPN connections to NetScreen devices running affected releases were subject to decryption by the intruders, assuming they know the d corresponding to their point Q.

8. LESSONS
The ScreenOS vulnerabilities we have studied provide important broader lessons for the design of cryptographic systems, which we summarize here.

[e] Of course, reducing nonce size cannot prevent all data exfiltration strategies. However, it may increase the difficulty of hiding the necessary code, and the complexity of executing an attack.

research highlights
8.1. For protocol designers
Allowing nonces to vary in length, and in particular to be larger than necessary for uniquely identifying sessions, may be a bad idea. The authors are unaware of any cryptographic rationale for 256B nonces, as permitted by IPsec; it is simply an invitation for implementations to disclose sensitive state, intentionally or not.e
Adding even low-entropy shared secrets as key derivation inputs helps protect against entropy failures. We observe a difference in exploitability of the ScreenOS bugs between IKEv1 and IKEv2 that is entirely due to the different use of the preshared key between the two protocols. It is unfortunate that IKEv2 is easier to exploit.

8.2. For implementers and code reviewers
Cryptographic code must be locally auditable: It must be written in such a way that examining a function or a module in isolation allows the reader to understand its behavior.
ScreenOS’s implementation failed to live up to this guideline. A loop counter in the core prng_generate routine was defined as a global variable and changed in a subroutine. This is a surprising-enough pattern that several experienced researchers who knew that the routine likely had a bug failed to spot it before Willem Pinckaers’ contribution. The prng_generate and prng_reseed routines reuse the same 32B buffer, output, for two entirely different purposes: Dual EC output with which to seed X9.31, and output from the PRNG subsystem.
ScreenOS’s use of pregeneration queues makes it difficult to determine whether nonces or Diffie–Hellman shares are generated first. Someone reading the code for the top-level functions implementing IKE in isolation will conclude that Diffie–Hellman shares are generated first, whereas in practice the opposite is usually the case.
The state recovery attacks suffered by Juniper suggest that implementations may wish to avoid revealing the raw output of a random number generator entirely, perhaps by hashing any PRNG output before using it as a nonce. One could also design implementations so that separate PRNGs are used for different protocol components, to separate nonce security from key security.
Several of the above mistakes represent poor software engineering practices. Cryptographic code reviews, whether internal or external (e.g., for FIPS validation), should take code quality into account.

8.3. For NIST
Juniper followed then-current best practices in designing and verifying their random number generators. They used a NIST-certified algorithm, followed the FIPS-recommended procedure to verify the output using test vectors, and followed a commonly recommended engineering guideline to use a PRNG as a whitener for a potentially insecure random number generator, removing (at least in theory) the structured output that makes Dual EC vulnerable.
In this case, all three approaches failed. In particular, a crippling defect in the whitening countermeasure managed to go undetected in FIPS certification. This suggests potential future work for research in the verification of cryptographic systems. One step would be to track the origin and use of any buffers, especially shared buffers, and enforce a rule that all random number generator output can be traced back to an appropriate cryptographic function, such as a block cipher or hash. Some form of coverage analysis might also have revealed that the whitening is never performed.
To the extent that FIPS guidelines mandate the use of global state, they run counter to our suggestion, above, that cryptographic code be locally auditable.
Products are evaluated against FIPS standards by accredited laboratories. ScreenOS was FIPS certified with the X9.31 PRNG, yet the lab evaluating ScreenOS failed to spot that X9.31 was never invoked, as well as failing to detect the defect in the Dual EC implementation described in Section 3. NIST should revisit its laboratory accreditation program to ensure more thorough audits, especially of randomness subsystem code.

8.4. For attackers
The choice by the attacker to target the random number generation subsystem is instructive. Random number generators have long been discussed in theory as a target for kleptographic substitution attacks,18 but this incident tells us that the threat is more real than has been known in the academic literature.
From the perspective of an attacker, by far the most attractive feature of the ScreenOS PRNG attack is the ability to significantly undermine the security of ScreenOS without producing any externally detectable indication that would mark the ScreenOS devices as vulnerable. This is in contrast to previous well-known PRNG failures, which were externally observable, and, in the case of the Debian PRNG flaw,17 actually detected through observational testing. Indeed, the versions of ScreenOS containing an attacker-supplied parameter appear to have produced output that was cryptographically indistinguishable from the output of previous versions, thus preventing any testing or measurement from discovering the issue.

8.5. For journalists
Much of the coverage of the Juniper disclosure has focused on the unauthorized changes made in 2012 to the randomness subsystem and in 2013 to the login code. By contrast, our forensic investigation of ScreenOS releases highlights the changes made in the 6.2 series, in 2008, as the most consequential.
These changes, which introduced Dual EC and changed other subsystems in such a way that an attacker who knew the discrete log of Q could exploit it, were, as far as we know, added by Juniper engineers, not by attackers. This raises a number of questions:
How was the new randomness subsystem for the ScreenOS 6.2 series developed? What requirements did it fulfill? How did Juniper settle on Dual EC? What organizations did it consult? How was Juniper’s point Q generated?

f Online: https://oversight.house.gov/hearing/federal-cybersecurity-detection-response-and-mitigation/.
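Section 8.2 suggests never exposing raw generator output, for example by hashing it before it is used as a nonce, and keeping nonce generation separate from key generation. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not ScreenOS code and not a vetted construction: the function names and purpose labels are hypothetical, os.urandom stands in for whatever internal generator a device uses, and HMAC-SHA-256 keyed by a purpose label is only one possible way to hash and domain-separate the output.

```python
import hashlib
import hmac
import os


def raw_prng_bytes(n: int) -> bytes:
    """Hypothetical stand-in for a device's internal generator (here: os.urandom)."""
    return os.urandom(n)


def derive_output(raw: bytes, label: bytes, length: int = 32) -> bytes:
    """Hash raw generator output under a purpose label.

    The caller only ever sees the HMAC-SHA-256 digest, never `raw` itself,
    and different labels give unrelated-looking values for the same input.
    """
    if length > hashlib.sha256().digest_size:
        raise ValueError("length exceeds SHA-256 digest size")
    return hmac.new(label, raw, hashlib.sha256).digest()[:length]


def fresh_nonce() -> bytes:
    # Value that will be sent in the clear: hashed, labeled for nonces.
    return derive_output(raw_prng_bytes(32), b"ike-nonce")


def fresh_dh_secret() -> bytes:
    # Value that stays private: separate draw, separate label.
    return derive_output(raw_prng_bytes(32), b"ike-dh-secret")
```

In this sketch the value observed on the wire is a one-way function of the generator output, which is intended to deny an observer the raw bytes that a state recovery attack of the kind described in this article would start from. It does not repair a compromised generator.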

We are not able to answer these questions with access to firmware alone. Juniper’s source code version-control system, their bug-tracking system, their internal e-mail archives, and the recollections of Juniper engineers may help answer them.
Despite numerous opportunities, including public questions put to their Chief Security Officer and a congressional hearing on this incident,f Juniper has either failed or explicitly refused to provide any further details.

8.6. For policymakers
Much of the debate about exceptional access has focused on whether it is possible to construct secure exceptional access mechanisms, where “secure” is defined as only allowing authorized access, presumably by law enforcement. It is readily apparent that one of the major difficulties in building such a system is the risk of compromise of whatever keying material is needed to decrypt the targeted data.
The unauthorized change to ScreenOS’s Dual EC constants made in 2012 illustrates a new threat: the ability for another party to modify the target software to subvert an exceptional access mechanism for its own purposes, with only minimally detectable changes. Importantly, because the output of the PRNG appears random to any entity that does not know the discrete log of Q, such a change is invisible both to users and to any testing which the vendor might do. By contrast, an attacker who wants to introduce an exceptional access mechanism into a program which does not already have one must generally make a series of extremely invasive changes, thus increasing the risk of detection.
In the case of ScreenOS, an attacker was able to subvert a major product (one which is used by the federal government) and remain undiscovered for years. This represents a serious challenge to the proposition that it is possible to build an exceptional access system that is available only to the proper authorities; any new proposal for such a system should bear the burden of proof of showing that it cannot be subverted in the way that ScreenOS was.

Acknowledgments
This material is based in part upon work supported by the U.S. National Science Foundation under awards EFMA-1441209, CNS-1505799, CNS-1010928, CNS-1408734, and CNS-1410031; The Mozilla Foundation; a gift from Cisco; and the Office of Naval Research under contract N00014-14-1-0333.

References
1. Abelson, H., Anderson, R., Bellovin, S.M., Benaloh, J., Blaze, M., Diffie, W., Gilmore, J., Green, M., Landau, S., Neumann, P.G., Rivest, R.L., Schiller, J.I., Schneier, B., Specter, M., Weitzner, D.J. Keys under doormats: Mandating insecurity by requiring government access to all data and communications. Commun. ACM 58, 10 (Oct. 2015), 24–26.
2. Accredited Standards Committee (ASC) X9, Financial Services. ANS X9.31-1998: Digital signatures using reversible algorithms for the financial services industry (rDSA), 1998. Withdrawn.
3. Adrian, D., Bhargavan, K., Durumeric, Z., Gaudry, P., Green, M., Halderman, J.A., Heninger, N., Springall, D., Thomé, E., Valenta, L., VanderSloot, B., Wustrow, E., Zanella-Béguelin, S., Zimmermann, P. Imperfect forward secrecy: How Diffie-Hellman fails in practice. In Proceedings of CCS 2015. C. Kruegel and N. Li, eds. ACM Press, New York, NY, Oct. 2015, 5–17.
4. Barker, E., Kelsey, J. NIST Special Publication 800-90: Recommendation for Random Number Generation Using Deterministic Random Bit Generators. Technical report, National Institute of Standards and Technology, June 2006.
5. Checkoway, S., Maskiewicz, J., Garman, C., Fried, J., Cohney, S., Green, M., Heninger, N., Weinmann, R.-P., Rescorla, E., Shacham, H. A systematic analysis of the Juniper Dual EC incident. In Proceedings of CCS 2016. S. Halevi, C. Kruegel, and A. Myers, eds. ACM Press, New York, NY, Oct. 2016, 468–479.
6. Granick, J.S. American Spies: Modern Surveillance, Why You Should Care, and What To Do About It. Cambridge University Press, Cambridge, 2017.
7. Harkins, D., Carrel, D. The Internet Key Exchange (IKE). RFC 2409 (Proposed Standard), Nov. 1998. Obsoleted by RFC 4306, updated by RFC 4109. Online: https://tools.ietf.org/html/rfc2409.
8. Juniper Networks. Juniper Networks product information about Dual_EC_DRBG. Knowledge Base Article KB28205, Oct. 2013. Online: https://web.archive.org/web/20151219210530/https://kb.juniper.net/InfoCenter/index?page=content&id=KB28205&pmv=print&actp=LIST.
9. Juniper Networks. 2015-12 Out of Cycle Security Bulletin: ScreenOS: Multiple Security issues with ScreenOS (CVE-2015-7755, CVE-2015-7756), Dec. 2015.
10. Juniper Networks. Important announcement about ScreenOS®. Online: https://forums.juniper.net/t5/Security-Incident-Response/Important-Announcement-about-ScreenOS/ba-p/285554, Dec. 2015.
11. Kaufman, C. Internet Key Exchange (IKEv2) Protocol. RFC 4306 (Proposed Standard), Dec. 2005. Obsoleted by RFC 5996, updated by RFC 5282. Online: https://tools.ietf.org/html/rfc4306.
12. Kelsey, J. Dual EC in X9.82 and SP 800-90A. Presentation to NIST VCAT committee, May 2014. Slides online: http://csrc.nist.gov/groups/ST/crypto-review/documents/dualec_in_X982_and_sp800-90.pdf.
13. Moore, H.D. CVE-2015-7755: Juniper ScreenOS Authentication Backdoor. https://community.rapid7.com/community/infosec/blog/2015/12/20/cve-2015-7755-juniper-screenos-authentication-backdoor, Dec. 2015.
14. National Institute of Standards and Technology. NIST opens draft Special Publication 800-90A, recommendation for random number generation using deterministic random bit generators for review and comment. http://csrc.nist.gov/publications/nistbul/itlbul2013_09_supplemental.pdf, Sept. 2013.
15. Perlroth, N., Larson, J., Shane, S. N.S.A. able to foil basic safeguards of privacy on Web. The New York Times, Sep. 5 2013. Online: http://www.nytimes.com/2013/09/06/us/nsa-foils-much-internet-encryption.html.
16. Shumow, D., Ferguson, N. On the possibility of a back door in the NIST SP800-90 Dual Ec Prng. Presented at the Crypto 2007 rump session, Aug. 2007. Slides online: http://rump2007.cr.yp.to/15-shumow.pdf.
17. Yilek, S., Rescorla, E., Shacham, H., Enright, B., Savage, S. When private keys are public: Results from the 2008 Debian OpenSSL vulnerability. In Proceedings of IMC 2009. A. Feldmann and L. Mathy, eds. ACM Press, New York, NY, Nov. 2009, 15–27.
18. Young, A., Yung, M. Kleptography: Using cryptography against cryptography. In Proceedings of Eurocrypt 1997. W. Fumy, ed. volume 1233 of LNCS, Springer-Verlag, May 1997, 62–74.

Stephen Checkoway, University of Illinois at Chicago, IL, USA.
Jacob Maskiewicz, Eric Rescorla, Hovav Shacham, University of California, San Diego, CA, USA.
Christina Garman, Matthew Green, Johns Hopkins University, Baltimore, MD, USA.
Joshua Fried, Shaanan Cohney, Nadia Heninger, University of Pennsylvania, Philadelphia, PA, USA.
Ralf-Philipp Weinmann, Comsecuris, Duisberg, Germany.

Copyright held by owners/authors. Publication rights licensed to ACM, $15.00.

DOI:10.1145/3266285

Technical Perspective
Making Sleep Tracking More User Friendly
By Tanzeem Choudhury

To view the accompanying paper, visit doi.acm.org/10.1145/3266287

An exciting area of research in mobile and ubiquitous computing is the recent development of novel sensing systems capable of continuously tracking behavioral and physiological signals from individuals in their natural environment. Often referred to as digital biomarkers, these signals capture people’s everyday routines, actions, and physiological changes that can explain outcomes related to health, cognitive abilities, and more.
A key behavioral biomarker is sleep, which is essential for human health, learning, cognitive abilities, and brain development. About one-third of adults suffer from some form of sleep disorder. However, current sleep-tracking options are mostly restricted to special sleep clinics or hospitals, where individuals are removed from their natural sleep environment and undergo polysomnography (PSG) that monitors brain activity via electroencephalography (EEG), eye movement via electrooculography (EOG), and muscle activity using electromyography (EMG).
These bio-signals are combined to infer sleep duration, sleep quality, and stages of sleep (light, deep, and REM sleep). Beyond sleep, the ability to track these signals can lead to new types of brain-computer interfaces and the detection of alertness and interests, eating moments, autism onset, and more.
In recent years, researchers in ubiquitous and mobile computing have pushed the boundaries of sensing digital biomarkers, and have created novel systems that can track important markers in the real world. Crucially, these systems are unobtrusive, meaning they do not require removing individuals from their natural environment or burdening them with a bulky sensing setup in order to get reliable measurements. One of the main challenges in this domain is to balance the fidelity and accuracy of signals in the presence of natural usage variations with the burden that is placed on the users.
Existing sleep and bio-signal sensing solutions that measure EEG, EOG, or EMG include wearable headbands, eyemasks, smart shirts, and wristbands that are not as cumbersome as PSG and are much less expensive, but still are not comfortable enough for individuals to use continuously over extended periods of time.
There has also been much work related to developing sleep trackers that leverage sensor signals and usage patterns from smartphones so that they require minimal effort from users. Such systems have been used to estimate sleep duration, interruptions, and even chronotype. More recently, contactless sensing approaches that use WiFi or Doppler Radar measures have been used to detect specific sleeping problems, such as sleep apnea or more general sleep quality, and sleep stages based on changes in breathing rate, heart rate, and movement. These contactless and smartphone-based methods place very low or no burden on the user. Moreover, they are generally reliable for coarser features of sleep such as duration and interruptions. However, they are less reliable for getting finer-grained information about sleep stages.
The Lightweight In-ear BioSensing (LIBS) system work detailed in the following paper provides a nice balance in terms of minimizing the burden on users and the granularity at which we can automatically track various measures of sleep. The custom flexible electrodes wrapped around off-the-shelf foam earplugs are able to pick up multiple signals of interest (EEG, EOG, EMG) from the ear canal, and are designed for user comfort, allowing individuals to sleep more naturally. More importantly, they can be used at home in a user’s normal bedroom setting.
Using signal separation and classification algorithms, the LIBS system can pull out the different bio-signals from the single mixed in-ear channel, derive relevant spectral and temporal features from these signals, and classify stages of sleep. The LIBS approach is a great example of how to trade off signal quality and user burden. Systems that focus on designing low-burden sensing and that push the boundary in terms of sensing granularity and resolution can significantly increase the adoption of mobile and ubiquitous solutions in the real world, especially in the realm of healthcare where fidelity and accuracy of the measures are important. This is even more important if diagnosis and treatment decisions are to be made based on the measurements.
LIBS is an elegant engineering solution to overcome real-world usability barriers; it is accurate, and it provides a reliable alternative to highly intrusive PSG-based measures. Of course, assessing how broadly applicable the system is will require more longitudinal testing across variable sleep environments and across people with different sleeping habits and in diverse age groups. Nonetheless, LIBS takes a significant step in the right direction, and is sure to inspire more solutions that can effectively balance accuracy and usability.

Tanzeem Choudhury (tanzeem.choudhury@cornell.edu) is an associate professor of information science and director of the People-Aware Computing group at Cornell University, Ithaca, NY, USA.

Copyright held by author.

DOI:10.1145/3266287

LIBS: A Bioelectrical Sensing System from Human Ears for Staging Whole-Night Sleep Study
By Anh Nguyen, Raghda Alqurashi, Zohreh Raghebi, Farnoush Banaei-Kashani, Ann C. Halbower, and Tam Vu

The original version of this paper is entitled “A Lightweight And Inexpensive In-ear Sensing System For Automatic Whole-night Sleep Stage Monitoring” and was published in Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems (SenSys), 2016, ACM, New York, NY, USA.

Abstract
Sensing physiological signals from the human head has long been used for medical diagnosis, human-computer interaction, and meditation quality monitoring, among others. However, existing sensing techniques are cumbersome, undesirable for long-term studies, and impractical for daily use. Due to these limitations, we explore a new form of wearable system, called LIBS, that can continuously record biosignals such as brain waves, eye movements, and facial muscle contractions, with high sensitivity and reliability. Specifically, instead of placing numerous electrodes around the head, LIBS uses a minimal number of custom-built electrodes to record the biosignals from human ear canals. This recording is a combination of three signals of interest and unwanted noise. Therefore, we design an algorithm using a supervised Nonnegative Matrix Factorization (NMF) model to split the single-channel mixed signal into three individual signals representing electrical brain activities (EEG), eye movements (EOG), and muscle contractions (EMG). Through prototyping and implementation over a 30 day sleep experiment conducted on eight participants, our results prove the feasibility of concurrently extracting separated brain, eye, and muscle signals for fine-grained sleep staging with more than 95% accuracy. With this ability to separate the three biosignals without loss of their physiological information, LIBS has the potential to become a fundamental in-ear biosensing technology solving problems ranging from self-caring health to non-health and enabling a new form of human communication interfaces.

1. INTRODUCTION
Physiological signals generated from human brain, eye, and facial muscle activities can reveal enormous insight into an individual’s mental state and bodily functions. For example, acquiring these biosignals is critical to diagnose sleep quality for clinical reasons, among other auxiliary signals. Even though it provides highly reliable brain (electroencephalography, EEG), eye (electrooculography, EOG), and muscle (electromyography, EMG) signals, the gold-standard methodology, referred to as polysomnography (PSG),9 has many limitations. Specifically, PSG attaches a large number of wired electrodes around the human head, requires an expert sensor hookup at a laboratory, and carries a risk of losing sensor contact caused by body movements during sleep. Consequently, this gold-standard approach is uncomfortable, cumbersome to use, and expensive and time-consuming to set up.
As an effort to overcome the inherent limitations of PSG, there exist various wearable solutions developed to acquire the biosignals with high resolution and easy self-applicability. They involve electrode caps, commercial head-worn devices (e.g., EMOTIV, NeuroSky MindWave, MUSE, Kokoon, Neuroon Open, Aware, Naptime, Sleep Shepherd, etc.), and hearing aid-like research devices.6, 10 However, these solutions are stiff, unstable, and only suitable for either short-term applications or in-hospital use. In other words, they are still inconvenient and less socially acceptable for outdoor, long-term, and daily activities.

Figure 1. Conceptual illustration of LIBS and its relative position to the sources of EEG, EOG, and EMG signals. (a) Conceptual LIBS. (b) LIBS towards signal sources.

To fill in this gap, we propose a Light-weight In-ear BioSensing (LIBS) system that can continuously record the electrical activities of human brain, eyes, and muscles concurrently using a minimum number of passive electrodes placed invisibly in the ear canals. In this work, particularly, the idea of sensing inside human ears has been motivated by the fact that the ear canals are reasonably close to all sources of the three biosignals of interest (i.e., EEG, EOG, and EMG signals) as shown in Figure 1. Furthermore, physical features of the ear canal allow a tight and fixed sensor placement, which is desirable for electrode stability and long-term wearability. Hence, we carefully develop LIBS using very flexible, conductive electrodes to maximize the quality of its contact area with the skin in the wearer’s ear canals for good signal acquisition while maintaining a high level of comfort.

However, because we minimize the number of electrodes, we can acquire only a single-channel signal, which is a mixture of EEG, EOG, EMG signals, and unwanted noise. We then develop a signal separation model for LIBS to extract the three signals of interest from the in-ear mixed signal. To validate that essential physiological information is not lost in the separated signals acquired by LIBS, we finally develop a sleep stage classification algorithm to score every 30-second epoch of the separated signals into an appropriate stage using a set of discriminative features obtained from them. Through the hardware prototype and a one-month long user study, we demonstrated that the proposed LIBS was comparable to the existing dedicated sleep assessment system (i.e., PSG) in terms of accuracy.
Due to the structural variation across ear canals and the overlapping characteristics of the EEG, EOG, and EMG signals, building LIBS is difficult for three key reasons. (1) The brain signal is quite small, on the order of microvolts (µV). Additionally, the human head anatomy shown in Figure 1(b) indicates that the signal sources are not particularly close to the location of LIBS in the ear canals, especially in the case of the weak brain source. (2) The characteristics of the three biosignals overlap in both the time and frequency domains. Moreover, their activation is random and possibly simultaneous during the monitoring period. (3) The signal quality varies easily with the displacement of electrodes across device hookups and the variation of physiological body conditions across people. Consequently, our first challenge is to build sensors capable of providing a high level of sensitivity while recording the biosignals from afar and comfort while wearing the device. Our second challenge is then to provide a robust separation mechanism in the presence of multiple variances, which becomes a significant hurdle.
While addressing the above challenges to realizing LIBS, we make the following contributions through this work:

1. Developing a light-weight and low-cost earplug-like sensor with highly sensitive and soft electrodes, the whole of which is comfortably and safely placed inside human ears to continuously measure the voltage potential of the biosignals in the long term with high fidelity.
2. Deriving and implementing a single-channel signal separation model, which integrates a process of learning source-specific prior knowledge for adapting the extraction of EEG, EOG, and EMG from the mixed in-ear signal to suit the variability of the signals across people and recordings.
3. Developing an end-to-end sleep staging system, which takes the input of three separated biosignals and automatically determines the appropriate sleep stages, as a proof-of-concept of LIBS’s potential in reality.
4. Conducting a user study of over 30 days with eight subjects to confirm the feasibility and learn the usability of LIBS.

2. LIBS’S SYSTEM OVERVIEW
In this section, we present an overall design of LIBS in order to achieve the EEG, EOG, and EMG signals individually from the in-ear mixed biosignal. Additionally, we provide a module that automatically determines appropriate sleep stages from LIBS’s outputs acquired in sleep studies as its application. Generally, the whole-night sleep staging system, as illustrated in Figure 2, consists of the following three primary modules.

Figure 2. LIBS architecture and its sleep staging application. [Panels: Signal acquisition (electrode construction from pure silver leaf, conductive adhesive gel, and conductive cloth; circuit; data preprocessing with 60Hz notch filtering and bandpass filtering); In-ear mixed signal separation (training with the gold-standard device, template matrix generation, non-negative matrix factorization model; outputs EEG, EOG, and EMG signals); Sleep staging (feature extraction, feature selection, trained sleep stage classifier; outputs Stage W, N1, N2, N3, and REM).]

2.1. Signal acquisition
Overall, this module focuses on tackling our first challenge, which requires (1) an ability to adapt to the small uneven area inside the human ear and its easy deformability under jaw movements (e.g., teeth grinding, chewing, and speaking), (2) a potential to acquire the naturally weak biosignals, which have microvolt amplitude, and (3) comfortable and harmless wearing for the users. We address these requirements by first custom-making deformable earplug-like sensors from a viscoelastic material, topped with sensitive electrodes made of several layers of thin, soft, and highly conductive materials. To capture the weak biosignals from inside human ears, we then increase the distance between the main electrodes and the reference point to further enhance signal fidelity. Finally, we preprocess the collected signal to eliminate signal interference (e.g., body movement artifact and electrical noise).

2.2. In-ear mixed signal separation
In this module, we form a supervised algorithm to overcome our second challenge for signal separation. This challenge, in detail, is related to (1) overlapping characteristics of the three signals in both time and frequency domains, (2) a random activation of the sources generating them, and (3) their variation from person to person and in different recordings. We solve these problems by developing a supervised Nonnegative Matrix Factorization (NMF)-based model that can separate the preprocessed in-ear mixed signal into EEG, EOG, and EMG with high similarity to the ground truth given by the gold-standard device. Specifically, our separation algorithm initially learns prior knowledge of the biosignals of interest through their individual spectral templates. It then adapts the templates to the variation between people through a deformation step. Hence, the model we build can alter itself slightly to return the best fit between the expected biosignals and the given templates.

2.3. Automatic sleep staging
This last module provides a set of machine learning algorithms to continuously score sleep into appropriate sleep stages using EEG, EOG, and EMG separated from the in-ear mixed signal. Because those signals can have similar

characteristics shared in some of the stages, this module is challenged to (1) find the most informative and discriminative features describing all three biosignals when they are used together and then (2) construct an efficient classifier to perform sleep staging. We introduce a classification model that can automatically score the sleep after being well trained. Firstly, we deploy an off-line training stage composed of three steps: feature extraction, feature selection, and model training. Specifically, a set of possible features corresponding to each of the three separate signals is extracted. Next, a selection process is applied to choose the more discriminative features. Using the set of dominant features selected, the sleep stage classifier is trained with a measurement of similarity. Finally, the trained model is used in its second stage for on-line sleep stage classification.

3. IN-EAR MIXED SIGNAL ACQUISITION
In this section, we discuss the anatomical structure of human ears that leads to the custom design of the LIBS sensor as well as its actual prototype using off-the-shelf electrical components.

3.1. Sensor materials
Extensive anatomical study of human ears shows that the form of the ear canal is easily affected when the jaw moves.15 More remarkably, a person can have asymmetry between his left and right ears.14 Beyond those special characteristics, to capture good signals, it is important to eliminate any gap between the electrodes and human skin due to the nature of the ion current generated by the biosignals. Hence, the LIBS sensor needs to flexibly reshape itself, contact the skin well, fit different ear structures and types of muscle contractions well, and be worn comfortably in the long term. One possible approach is to personalize a mold. However, this approach entails high cost and time consumption. Therefore, a commercial noise-cancelling earplug with flexible wires is used to form the sensor prototype. Specifically, we have augmented an over-the-counter sound block foam earplug for its base. The soft elastic material (or memory foam) of the earplug enables the sensor to reshape to its original form shortly after being squeezed or twisted under the strain to insert into the

Figure 3. Prototypes with different conductive materials. (a) Silver coated fabric electrode. (b) Fabric electrode. (c) Copper electrode.

resulted that copper is a hard material to be inserted into and placed inside the ear without harm. Conversely, conductive fabric is a good choice that neither harms the in-ear skin nor breaks while being squeezed. However, because of the weave pattern of its fibers, which causes a non-identical resistivity (19Ω/sq) on the surface, we further coat its surface with many layers of thin pure silver leaves, which gives low and consistent surface resistance for providing reliable signals. Also, a very small amount of health-grade conductive gel is added. In Figure 2, the construction module shows the comprehensive structure of LIBS electrodes. Ultimately, we place the active and reference electrodes in two separate ear canals, hence intensifying the potential of the signals through the increased distance. Finally, the recorded signal is transferred to an amplifier through shielded wires to prevent any external noise.

3.3. Microcontroller
In this prototype, we use a general brain-computer interface board manufactured by the OpenBCI16 group to sample and digitize the signal. The board is supplied by a battery source of 6V for safety and configured at a 2kHz sampling rate and a 24dB gain. The signal is stored on an on-board mini-SD card while recording and then processed offline on a PC.

4. NMF-BASED SIGNAL SEPARATION
Due to the limited cavity of the ear canal, the biosignal recorded by LIBS is inherently a single-channel mixture of at least four components including EEG, EOG, EMG signals, and unwanted noise. We assume that the mixed signal is a linear combination of the aforementioned signals generated from a number of individual sources in the spectral domain,7 which we mathematically express in Equation (1).
ear. This fundamental property of the foam earplug provides
a comfortable and good fit as it allows the sensor to follow (1)
the shape of the inner surface in the ear canal. In addition,
it not only supplies a stable contact between the electrodes where si is the power spectrum of the three biosignals with
and the in-ear skin but reduces the motion artifact caused by their corresponding weight wi and ε represents noises.
jaw motion as well. Moreover, using the earplug completely Generally, the problem of separating original signals
eliminate the personalization of the base regarding the canal from their combinations generated by concurrent multi-
size. As an additional bonus, the soft surface and the light- source activation has long been addressed for different
weight property of the earplug make itself more convenient systems. The classical example of this problem is the audi-
to be worn without much interference and to block out noise tory source separation problem, also called a cocktail party
during sleep for our case study. problem, where various algorithms have been developed
to extract individual voices of a number of people talk-
3.2. Electrode construction and placement ing simultaneously in a room. Additionally, the problem
On the other hand, LIBS needs to possibly measure low- of decoding a set of received signals to retrieve the orginal
amplitude biosignals from a distance with high fidelity. Our signals transmitted by multiple antennas via Multi-Input
method integrates several solutions into the hardware design to and Multi-Output (MIMO)22 in wireless communication can
address this demand. We firstly tried different conductive also be another example. Although there exist mainstream
materials as shown in Figure 3. However, our experiment techniques25 such as Principal Component Analysis (PCA),

Independent Component Analysis (ICA), Empirical Mode Decomposition (EMD), Maximum Likelihood Estimation (MLE), and Nonnegative Matrix Factorization (NMF) built to solve the blind source separation problem, most of them require that (1) the number of collected channels is equal to or larger than the number of source signals (except NMF) and (2) the factorized components describing the source signal are known or selected manually. As a result, it is impossible to apply them directly in our work, since their first constraint conflicts with the fact that LIBS has only one channel, which is fewer than the number of signals of interest (three signals). To address this challenge, we propose a novel source separation technique that takes advantage of NMF. However, there are two potential issues with an NMF-based model that might degrade the quality of the decomposed signals: (1) the inherent non-unique estimation of the original source signals (an ill-posed problem) caused by the non-convex solution space of NMF and (2) the variance of the biosignals across different recordings. To solve them, our proposed NMF-based model is combined with source-specific prior knowledge learnt in advance for each user through a training process. Figure 4 gives a high-level overview of this process, which leverages two different NMF techniques to learn source-specific information and to separate the mixed in-ear signal based on prior training.
Particularly, when a new user starts using LIBS, his ground-truth EEG, EOG, and EMG are briefly acquired using the gold-standard device (i.e., PSG) and fed into a single-class Support Vector Machine (SVM)-based NMF technique (SVM-NMF) [3] to build a personal spectral template matrix, called W, representing their basis patterns. Then, for any in-ear signal recorded by LIBS, our trained model approximately decomposes its power spectrum X into two lower rank nonnegative matrices

X \approx WH \qquad (2)

in which X ∈ R^{m×n} comprises m frequency bins and n temporal frames; W is calculated in advance and given; and H is the activation matrix expressing the time points (positions) when the signal patterns in W are activated. Finding the best representation of both W and H is equivalent to minimizing a cost function defined by the distance between X and WH in Equation (3).

\min_{W \ge 0,\; H \ge 0} d(X \mid WH) \qquad (3)

We solve Equation (3) using multiplicative update rules to achieve a good compromise between speed and ease of implementation. While solving this equation, the template matrix taken from the learning process is used to initialize W. Hence, W is deformed to fit the in-ear signal acquired from that user on different nights.

Algorithm 1 Signal Separation Algorithm
1: Input:
2:   IS - In-ear Signal
3:   W_ini - Spectral Template Matrix
4:   ST - Segment Time
5: Output:
6:   - Separated Signals
7:
8: ← PreprocessSignal(IS);
9: X ← ComputePowerSpectrum( );
10: Seg ← SegmentSignal(IS, ST);
11: for i = 1 → sizeof(Seg) do
12: H_ini ← InitializeMatrixRandomly();
13: ← IS_NMF(Seg_i);
14: V_EEG(Seg_i) ← ;
15: V_EOG(Seg_i) ← ;
16: V_EMG(Seg_i) ← ;
17: ← reconstructSignal(X, V_EEG);
18: ← reconstructSignal(X, V_EOG);
19: ← reconstructSignal(X, V_EMG);

In this work, adapting the technique from Damon et al. [2], we specifically select the Itakura-Saito (IS) divergence d_IS as the measure to minimize the error between the power spectrum of the original signal and its reconstruction from W and H. The IS divergence is a limit case of the β-divergence introduced by Févotte and Idier [4], which is defined here

(4)

The reason is that a noteworthy property of the β-divergence (in which the IS divergence corresponds to the case β = 0) is its behavior with respect to scale. In particular, the IS divergence is scale-invariant, d_IS(λx | λy) = d_IS(x | y), which helps minimize the variation of the signals acquired from one person in different recordings. The IS divergence is given by

d_{IS}(x \mid y) = \frac{x}{y} - \log\frac{x}{y} - 1 \qquad (5)

Hence, Algorithm 1 provides the whole process of separating EEG, EOG, and EMG signals from the single-channel in-ear mixture using a per-user trained template matrix.

Figure 4. Overview of signal separation process in LIBS. [Blocks shown: learning process (reference EEG, EOG, and EMG; short-time Fourier transform; SVM-NMF; W = [W_EEG W_EOG W_EMG]) and separation process (mixed in-ear signal; IS-NMF; Wiener filter; separated spectra).]
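
To make the separation step more concrete, the following is a minimal sketch of template-initialized NMF under the IS divergence, not the authors' implementation. It assumes the square-root (majorization-minimization) form of the multiplicative updates for β = 0, uses synthetic positive data in place of a real power spectrum, and splits the result with a Wiener-like mask. All function names (`is_nmf`, `split_sources`, and the helpers) and the toy dimensions are hypothetical, and only the Python standard library is used so that the block stays self-contained.

```python
import math
import random


def matmul(A, B):
    """Matrix product of two lists of rows (plain Python, no dependencies)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_divergence(X, V):
    """Itakura-Saito divergence summed over entries: x/v - log(x/v) - 1."""
    return sum(x / v - math.log(x / v) - 1.0
               for xr, vr in zip(X, V) for x, v in zip(xr, vr))


def _ratios(X, V):
    """Return the entrywise matrices X / V^2 and 1 / V used by both updates."""
    A = [[x / v ** 2 for x, v in zip(xr, vr)] for xr, vr in zip(X, V)]
    B = [[1.0 / v for v in vr] for vr in V]
    return A, B


def _update(M, num, den):
    """Multiply M entrywise by sqrt(num / den)."""
    return [[m * math.sqrt(a / b) for m, a, b in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def is_nmf(X, W_init, n_iter=50, seed=0):
    """Fit X ~= W H under the IS divergence; W starts from a template, H is random."""
    rng = random.Random(seed)
    W = [row[:] for row in W_init]
    H = [[rng.random() + 0.1 for _ in X[0]] for _ in W[0]]
    costs = []
    for _ in range(n_iter):
        A, B = _ratios(X, matmul(W, H))
        Wt = transpose(W)
        H = _update(H, matmul(Wt, A), matmul(Wt, B))   # activation update
        A, B = _ratios(X, matmul(W, H))
        Ht = transpose(H)
        W = _update(W, matmul(A, Ht), matmul(B, Ht))   # template is deformed
        costs.append(is_divergence(X, matmul(W, H)))
    return W, H, costs


def split_sources(X, W, H, groups):
    """Wiener-like mask: share each entry of X by each source's part of W H."""
    V = matmul(W, H)
    parts = {}
    for name, idx in groups.items():
        Vs = matmul([[row[j] for j in idx] for row in W], [H[j] for j in idx])
        parts[name] = [[x * s / v for x, s, v in zip(xr, sr, vr)]
                       for xr, sr, vr in zip(X, Vs, V)]
    return parts


if __name__ == "__main__":
    rng = random.Random(1)
    groups = {"EEG": [0], "EOG": [1], "EMG": [2]}      # one toy pattern per source
    W_true = [[rng.random() + 0.1 for _ in range(3)] for _ in range(6)]
    H_true = [[rng.random() + 0.1 for _ in range(8)] for _ in range(3)]
    X = matmul(W_true, H_true)                         # synthetic power spectrum
    W0 = [[w * (1 + 0.3 * rng.random()) for w in row] for row in W_true]
    W, H, costs = is_nmf(X, W0)
    print("IS cost after first and last iteration:", costs[0], costs[-1])
    print("separated sources:", sorted(split_sources(X, W, H, groups)))
```

In this sketch the masks sum to one, so the separated spectra add back up to X. For real recordings a numerical library would be the natural choice; scikit-learn's `NMF`, for example, accepts `beta_loss="itakura-saito"` together with `solver="mu"`.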
5. SLEEP STAGES CLASSIFICATION
Human sleep naturally proceeds in a repeated cycle of four distinct sleep stages: N1, N2, N3, and REM sleep. To study the sleep quantity and quality, the sleep stages are mainly identified by simultaneously evaluating three fundamental

measurement modalities including brain activities, eye movements, and muscle contractions. In a hospital, an expert can visually inspect EEG, EOG, and EMG signals collected from subjects during sleep and label each segment (i.e., a 30sec period) with the corresponding sleep stage based on known visual cues associated with each stage. Below we elaborate on each of the aforementioned steps of our data analysis pipeline.

5.1. Feature extraction
The features selected for extraction are from a variety of categories as follows:
Temporal features. This category includes typical features used in the literature such as mean, variance, median, skewness, kurtosis, and 75th percentile, which can be derived from the time series. In sleep stage classification, both EOG and EMG signals are often analyzed in the time domain due to their large variation in amplitude and a lack of distinctive frequency patterns. Accordingly, based on our observations about these signals, we include more features that can distinguish N1 from REM, which are often misclassified. In particular, we consider the average amplitude, which is significantly low for EMG while relatively higher for EOG during the REM stage. Also, to capture the variation in EOG during different sleep stages, we consider the variance and entropy of EOG in order to magnify distinctions between the Wakefulness, REM, and N1 stages.
Spectral features. These features are often extracted to analyze the characteristics of the EEG signal because brain waves normally occupy discrete frequency ranges in different stages. By transforming the time series signal into the frequency domain in different frequency bands and computing its power spectrum density, various spectral features can be studied. Here, based on our domain knowledge about the EEG patterns in each sleep stage, we identify and leverage spectral edge frequencies to distinguish those stages.
Non-linear features. Bioelectrical signals show various complex behaviors with nonlinear properties. In detail, since the chaotic parameters of EEG depend on the sleep stages [11], they can be used for sleep stage classification. The discriminant ability of such features is demonstrated through measures of complexity such as correlation dimension, Lyapunov exponent, entropy, fractal dimension, etc. [23]
For this study, relying on the literature of feature-based EOG, EMG, and EEG classification [11], we consider the features listed in Table 1 from each of the aforementioned categories.

Table 1. List of features extracted from the biosignals.
Category               Features
Temporal features      average amplitude, variance, 75th percentile, skewness, kurtosis
Spectral features      absolute spectral powers, relative spectral powers, relative spectral ratio, spectral edge frequency
Non-linear features    fractal dimension, entropy

5.2. Feature selection
Although each extracted feature has the ability to partially classify biosignals, the performance of a classification algorithm can degrade when all extracted features are used to determine the sleep stages. Therefore, in order to select a set of relevant features among the extracted ones, we compute the discriminating power of each of them [19] when they are used in combination. However, it is computationally impractical to test all of the possible feature combinations. Therefore, we adopt a procedure called Sequential Forward Selection (SFS) [26] to identify the most effective combination of features extracted from our in-ear signal. With SFS, features are selected sequentially until the addition of a new feature results in no performance improvement in prediction. To further improve the efficiency of our selection method, we have considered additional criteria for selecting features. In particular, we assigned a weight to each feature based on its classification capability and relevance to other features. Subsequently, these weight factors are adjusted based on the classification error. Furthermore, a feature is added to the set of selected features only if it both reduces the misclassification error and is less redundant given the features already selected. With this approach, we can efficiently rank discriminant features based on the intrinsic behavior of the EEG, EMG, and EOG signals.

5.3. Sleep stage classification
Various classification methods are proposed in the literature for similar applications, and each has advantages and disadvantages. Some scholars [11] have chosen the Artificial Neural Network (ANN) classification approach for sleep scoring. Despite the ANN's ability to classify untrained patterns, it suffers from long training time and complexity in selecting parameters such as network topology. Moreover, since the decision tree is easier to implement and interpret than other algorithms, it is widely used for sleep stage classification. Another classification method used for sleep stage identification is SVM. SVM is a machine learning method based on statistical learning theory. Since SVM can be used for large data sets with high accuracy rates, it has also been widely used by various studies [18] to classify sleep stages. However, this approach suffers from long training time and a learned function that is difficult to understand. Based on the existing comparative studies [19], the decision tree (and more generally random forest) classification methods have achieved the highest performance, since the tree structure can separate the sleep stages with large variation. As an example, decision tree classifiers are flexible and work well with categorical data. However, overfitting and high dimensionality are the main challenges in decision trees. Therefore, we use an ensemble learning method for classification of the in-ear signal. Particularly, we deploy a random forest with twenty-five decision trees as a suitable classifier for our system. This classifier is able to efficiently handle high dimensional attributes, and it also reduces computational cost on large training data sets. The set of features selected through SFS is used to construct a multitude of decision trees at the training stage to identify the corresponding sleep stage for every 30sec segment of the biosignals in the classification stage.
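
The Sequential Forward Selection procedure of Section 5.2 can be sketched as a greedy loop that keeps adding the single feature that most improves a score and stops when no candidate helps. The sketch below is illustrative only: it omits the feature weighting and redundancy criteria described above, scores subsets with a leave-one-out nearest-centroid classifier chosen purely for brevity (not the random forest used in the system), and runs on made-up toy values. The names `sequential_forward_selection` and `loo_centroid_accuracy` are hypothetical.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int, score: Callable[[Sequence[int]], float]
) -> List[int]:
    """Greedily grow a feature subset until no single addition raises the score."""
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        top_score, top_f = max(
            ((score(selected + [f]), f) for f in candidates),
            key=lambda pair: pair[0],
        )
        if top_score <= best:
            break  # adding any remaining feature gives no improvement
        selected.append(top_f)
        best = top_score
    return selected


def loo_centroid_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen columns."""
    correct = 0
    for i, (probe_row, truth) in enumerate(zip(X, y)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out the probe sample
            acc = sums.setdefault(label, [0.0] * len(subset))
            for k, f in enumerate(subset):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        probe = [probe_row[f] for f in subset]
        predicted = min(
            sums,
            key=lambda c: sum((p - s / counts[c]) ** 2 for p, s in zip(probe, sums[c])),
        )
        if predicted == truth:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Toy segments: column 0 separates the stages, column 1 is noise, column 2 is constant.
    X = [[1.0, 5.0, 0.0], [1.2, 1.0, 0.0], [0.9, 3.0, 0.0],
         [3.0, 4.0, 0.0], [3.1, 2.0, 0.0], [2.9, 6.0, 0.0]]
    y = ["N1", "N1", "N1", "REM", "REM", "REM"]
    chosen = sequential_forward_selection(3, lambda s: loo_centroid_accuracy(X, y, s))
    print("selected feature indices:", chosen)
```

In practice the selected columns would then be passed to the ensemble classifier; with scikit-learn, for instance, a forest of twenty-five trees corresponds to `RandomForestClassifier(n_estimators=25)`.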

6. EVALUATION
In this section, we first present the key results demonstrating the feasibility of LIBS to capture usable and reliable biosignals in which EEG, EOG, and EMG are all present. Building on this proof of concept, we then show the performance of our proposed separation algorithm for splitting those three signals without loss of information. Finally, we evaluate the usability of LIBS’s outputs through the performance of the automatic sleep stage classification.

6.1. Experiment methodology
Beyond our LIBS prototype shown in Figure 5, we used a portable PSG device named Trackit Mark III, supported by LifeLines Neurodiagnostic Systems Inc. [21], with 14 EEG electrodes placed at channels Fp1, Fp2, C3, C4, O1, and O2 (in accordance with the International 10–20 system) on the scalp, in proximity to the right and left outer canthus, and over the chin, all referenced to the two mastoids, to collect the ground truth. This device individually acquires EEG, EOG, and EMG signals at a 256Hz sampling rate and pre-filters them in the range of 0.1–70Hz.

Figure 5. Demonstration of a sleep study and the first prototype of LIBS. [Labeled parts: gold-standard PSG, LIBS, OpenBCI board, LIBS sensor, main electrodes, bias and reference electrodes, power supply, shielded wires.]

6.2. Validation of signal presence
In this evaluation, we assess the presence of the signals of interest in the in-ear mixed signal measured by LIBS by comparing the recording with the ground-truth signals acquired from the gold-standard PSG channels. While the user wears both devices at the same time, we illustrate the feasibility of LIBS to produce usable and reliable signals through different experiments.
We first examined whether LIBS can capture the EMG signal by asking a subject to perform two different activities that contract his facial muscles. Specifically, the subject kept his teeth still, then ground them for 5sec and chewed for 20sec continuously. This combination was repeated four times. From Figure 6a, we noticed that our LIBS device could clearly capture those events reflecting the occurrence of the EMG signal.
Similarly, we asked the subject to look forward for 20sec and then move his eyes to pre-specified points in four directions (i.e., left, right, up, and down) for 5sec. As a result, although the amplitude of the in-ear mixed signal is smaller than the gold-standard one, it still clearly exhibits the left and right movements of the eyes, similar to the EOG signal channeled in the gold-standard device. As shown in Figure 6b, LIBS also has the ability to capture the horizontal and vertical eye movements as the reflection of EOG occurrence.

Figure 6. The detection of (a) muscle activities and (b) eye movements from LIBS (top) and the gold standard EMG and EOG channels (bottom), respectively. [Axes: amplitude (μV) versus time (sec). Panel (a) marks stillness, grinding, and chewing intervals; panel (b) marks forward, left, right, up, and down gaze intervals.]

On the other hand, we conducted the following standard Brain-Computer Interaction (BCI) experiments to verify the occurrence of the EEG signal in LIBS’s recordings:
Auditory Steady-State Response (ASSR). This EEG paradigm measures the response of the human brain to auditory stimuli modulated at specific frequencies [24]. In this experiment, we applied auditory stimuli at 40Hz, in which each stimulus lasted for 30sec and was repeated three times with 20sec of rest between them. In Figure 7, it is easy to recognize a sharp and dominant peak at 40Hz produced during the 40Hz ASSR experiment. This result demonstrates the ability of LIBS to capture this specific frequency in the in-ear mixed signal, although the peak extracted from the gold standard electrodes was larger than that of the LIBS electrode.

Figure 7. The ASSR for 40Hz recorded from (a) LIBS and (b) the gold standard device at Channel C3 on scalp. [Axes: PSD (V²/Hz) versus frequency (Hz), 30–50Hz; the 40Hz ASSR peak is marked in both panels.]

Steady-State Visually Evoked Potential (SSVEP). Similar to ASSR, SSVEP measures the brain wave responding to a visual

stimuli at specific frequencies [12]. Particularly, we created a blinking stimulus at 10Hz and played it for 20sec, repeated three times. Accordingly, the brain response in this SSVEP experiment clearly appears as a dominant peak for both LIBS and the gold standard on-scalp electrodes in Figure 8.

Figure 8. The SSVEP responses recorded from (a) LIBS and (b) the gold standard device at Channel O1 on scalp. [Axes: PSD (V²/Hz) versus frequency (Hz); the 10Hz SSVEP peak is marked in both panels.]

Alpha Attenuation Response (AAR). The alpha wave is a type of brain wave in the range of 8–13Hz. This brain wave is a sign of relaxation and peacefulness [1]. In this experiment, we asked the subject to completely relax his body while closing his eyes for 20sec and then opening them for 10sec, five consecutive times. Analyzing the recorded in-ear mixed signal, Figure 9 shows that LIBS is able to capture the alpha rhythm from inside the ear. However, the detection of the alpha rhythm in the case of LIBS was not very clear. This may be because the alpha waves were produced in the frontal lobe, which is at a distance from the ear location.

Figure 9. The detection of alpha rhythms from (a) LIBS and (b) the gold standard device at Channel C4 on scalp. [Axes: frequency (Hz) versus time (sec); eyes-closed intervals are marked.]

6.3. Signal separation validation
From the previous experiments, we showed that EEG, EOG, and EMG signals all appeared in the recordings of LIBS and were mixed in the original in-ear signal. We now show the result of our proposed NMF-based separation algorithm, which learns the underlying characteristics of gold standard EEG, EOG, and EMG signals individually and adapts its learned knowledge to provide the best decomposition from the mixed signal. In this evaluation, because the gold standard device (e.g., PSG device) cannot be hooked up in the ear canal to capture the same signal as our in-ear device does, similarity measures such as mutual information, cross-correlation, etc. cannot be used to provide a numeric comparison between the separated and gold standard signals. We therefore demonstrate the performance of our proposed model by analyzing the occurrence of special frequencies (i.e., the delta brain wave) in the separated EEG biosignal during the sleep study.
Specifically, Figure 10a provides the spectrogram of a 30sec original in-ear mixed signal captured by LIBS during a sleep study and labeled as stage Slow-Wave Sleep (SWS) by the gold-standard device. In Figure 10b, the spectrogram of the corresponding 30sec ground-truth EEG signal is presented. In this second spectrogram, a delta brain wave in a frequency range lower than 4Hz is correctly found. However, the spectrogram in Figure 10a cannot show this brain wave clearly, because the original signal contains not only the delta brain wave but also other biosignals added to it. Finally, Figure 10c exhibits the spectrogram of the EEG signal separated from the original mixed signal by applying our proposed signal separation algorithm. This figure shows that the separation model we propose is capable of not only splitting the signals from the mixed one but also keeping only the specific characteristics of the separated signal. Moreover, the short appearance of the delta brain wave in the decomposed signal can be explained by the fact that the location where LIBS is placed is far from the source of the signal, so the amplitude of the signal is highly reduced.

Figure 10. Signal separation performance obtained by LIBS through a 30sec mixed in-ear signal (a) and compared with the ground-truth EEG signal (b), and its corresponding separated EEG signal (c). [Axes: frequency (Hz) versus time (sec); the delta brain wave is marked in each panel.]

6.4. Sleep stage classification evaluation
To evaluate our proposed sleep staging method, we conducted 38hrs of sleep experiments with eight graduate students (three females and five males) with an average age of 25, using the biosignals returned by LIBS as input to the proposed sleep stage classification system. A full-board Institutional Review Board (IRB) review was conducted and approval was granted for this study. The participants were asked to sleep in a sleep lab while wearing LIBS in their ear canals and a conventional PSG hook-up around their head simultaneously. After that, the Polysmith program [17] was run to score the ground-truth signals into different sleep stages at

every 30sec segment. For all studies, the sleeping environment was set up to be quiet, dark, and cool.
Statistically, we extracted the features from 4313 30sec segments using the original mixed signal as well as the three separated signals. Training and test data sets are randomly selected from the same subject pool. Figure 11 displays the results of the sleep stage classification in comparison to the hypnogram of the test data scored from the gold standard PSG. From this, we observe that the dynamics of the hypnogram are almost completely maintained in the predicted scores. Moreover, our results show that the end-to-end sleep staging system can achieve 95% accuracy on average.
We refer the readers to Nguyen et al. [13] for more detailed validations of signal acquisition and separation, their comparison with the signals recorded by the gold-standard device, and our user study.

Figure 11. A hypnogram of 30min of data produced by our classification algorithm. [Axes: sleep stage (Wake, REM, N1, N2, SWS) versus segment sequence number (0–60); curves: proposed algorithm and ground truth.]

7. POTENTIALS OF LIBS
We envision LIBS to be an enabling platform for not only healthcare applications but also those from other domains. Figure 12 illustrates eight potential applications: in-home sleep monitoring, autism onset detection, meditation training, eating habit monitoring, autonomous audio steering, distraction and drowsiness detection, child’s interest assessment, and human-computer interaction. We discuss these exemplary applications below.

7.1. Healthcare applications
We propose three applications that LIBS can be extended to serve in healthcare: autism act-out onset detection, meditation coaching, and eating habit monitoring.
Autism onset detection. Thanks to its ability to capture muscle tension, eye movements, and brain activities, LIBS has the potential to be an autism onset detection and prediction wearable. Particularly, people with autism can have very sensitive sensory (e.g., visual, auditory, and tactile) functions. When any of their sensory functions leads to an overload, their brain signal, facial muscle, and eye movement are expected to change significantly [5]. We hope to explore this phenomenon to detect the relationship between these three signals and the onset event, from which a prediction model can be developed.
Meditation training. Meditation has the potential to improve physical and mental well-being when it is done in the right way. Hence, it is necessary to understand people’s mindfulness level during meditation to be able to provide more efficient instructions. Existing commercial off-the-shelf devices (e.g., MUSE) only capture the brain signal to tell how well users are meditating. Unlike them, LIBS further looks at the eye and muscle signals to analyze the users’ level of relaxation more accurately. As a result, LIBS promises to help improve the users’ meditation performance.
Eating habit monitoring. Eating habits can provide critical evidence for various diseases [8]. As LIBS can capture the muscle signal very clearly, such information can be useful to infer how often the users chew, how fast they chew, how much they chew, and what the intensity of their chewing is. From all of that, LIBS can then predict what foods they are eating as well as how much they are eating. As a result, LIBS can provide users guidance to avoid their bad habits by themselves or to visit a doctor if necessary.

7.2. Non-health applications
LIBS can benefit applications and systems in other domains, such as improving hearing aid devices, improving driver safety, and helping parents orient their child early on.
Autonomous audio steering. This application helps solve a classical problem in hearing aids, the cocktail party problem. State-of-the-art hearing aid devices try to amplify the sounds coming from the area that has a large amplitude, which is assumed to be a human voice, at a party. Consequently, the hearing aid will fail to support the wearers if a group of people behind them, who are not the people they want to talk to, is talking very loudly. Using the eye signal that LIBS can capture, our technology can possibly detect the area that the users are paying attention to. Furthermore, combined with their brain signal, LIBS can further predict how pleased the wearers are with the output sound that their hearing aid is producing. With that in mind, LIBS can steer the hearing aid and improve its quality of amplification so that the hearing aid can provide high-quality sounds coming from the right source to the users.
Distraction and drowsiness detection. Distraction and drowsiness are very serious factors in driving. Specifically, if people feel drowsy, their brain signal will be in the alpha state, their eyes will be closed, and their chin muscle tone will relax [20]. Also, it is easy to detect whether people are distracted based on the localization of eye positions when we analyze the changes of the eye signal. Hence, LIBS with three separated brain, eye, and muscle signals should be able to determine the driver’s drowsiness level or distraction and send an alert to help avoid road accidents.

Figure 12. Potential applications of LIBS.
accuracy and usability. Further than an in-ear bio-sensing
wearable, we view LIBS as a key enabling technology for con-
LIBS
LIBS
cealed head-worn devices for healthcare and communication
LIBS
applications, especially for personalized health monitoring, digital assistance, and the introduction of socially-aware human-computer interfaces.

[Figure: potential applications of LIBS. Panel labels: in-home sleep monitoring, human-computer interface, autism onset prediction, child’s interest assessment, meditation training, eating habit monitoring, distraction and drowsiness detection, autonomous audio steering.]

Child’s interest assessment. With LIBS, a child’s interest can be assessed less obtrusively and with more accurate outcomes. Parents can then guide their children toward what they like most. Clinically, children aged 0–2 years cannot express their interest; more precisely, crying is their only way to do so. As a result, the conventional gold-standard device (i.e., PSG) is usually used to read their biosignals, which reflect, to some degree, their interest in what they are allowed to do. However, the device is not comfortable for them to wear while doing activities during the assessment. Hence, by using LIBS to read the signals from their ears while letting them play different sports or learn different subjects, it should be possible to infer their level of interest with high comfort.

Human-computer interaction. In a broader context, LIBS can be used as a form of human-computer interaction (HCI), which can especially benefit users with disabilities. Instead of using only the brain signal, as many HCI and brain-to-computer systems do today, LIBS can combine the information extracted from the three separated signals to enrich the commands the user can build to interact with the computer more reliably. This gives users more choices for integration with computing systems in a potentially more precise and convenient manner.

8. CONCLUSION
In this paper, we presented LIBS, a sensing system worn inside human ear canals that can unobtrusively, comfortably, and continuously monitor the electrical activities of the human brain, eyes, and facial muscles. Unlike existing high-tech systems that measure only one specific type of signal, LIBS deploys an NMF-based signal separation algorithm to feasibly and reliably obtain the three individual signals of interest. Through a one-month user study that collected in-ear signals during sleep and scored them into the appropriate sleep stages using a prototype, LIBS demonstrated a promising comparison to the existing dedicated sleep assessment systems in terms of

Acknowledgments
We thank LifeLines Neurodiagnostic Systems Inc. for their support in providing the gold-standard PSG device, and Yiming Deng and Titsa Papantoni for their valuable feedback at the early stages of this work. This material is based in part upon work supported by the National Science Foundation under Grant SCH-1602428.

Anh Nguyen and Tam Vu, University of Colorado, Boulder, CO, USA.
Raghda Alqurashi, Zohreh Raghebi, and Farnoush Banaei-Kashani, University of Colorado, Denver, CO, USA.
Ann C. Halbower, University of Colorado, School of Medicine, Aurora, CO, USA.

© 2018 ACM 0001-0782/18/11 $15.00

CAREERS

Augusta University
Tenure Track and Tenured Positions at the Assistant, Associate, and Full Professor Levels

The School of Computer and Cyber Sciences at Augusta University was founded in 2017 with the mission to provide high-engagement, state-of-the-art education, and research across its Computer Science, Information Technology, and Cybersecurity disciplines, and with the vision of becoming a national leader in Cybersecurity. The School is embarking on a path of unprecedented growth to become a comprehensive research and education college, with substantial increases in faculty, and graduate and undergraduate enrollment.
Augusta, Georgia, is becoming a primary hub for cybersecurity in the United States, and the area is poised for explosive development. It is located at the center of a number of academic, governmental and corporate partnerships critical to the nation’s cyber security, including the U.S. Army Cyber Center of Excellence, the National Security Agency Georgia, the future home of the United States Army Cyber Command, and the nearby Savannah River National Laboratory in South Carolina. The State of Georgia invested $100M in Georgia Cyber Center at Augusta University, a 167,000-square-foot research and education facility which opened on July 10, 2018 and is home to the School of Computer and Cyber Sciences. The second, 165,000-square-foot building of the Center is under construction to be completed in December of 2018.
Augusta University has embarked on an ambitious, multi-year effort to significantly expand its computing, cybersecurity, and data science activities. Applications are being invited for 12 tenure-track and tenured positions at the Assistant, Associate, and Full Professor levels, with responsibilities to advance education and research in all mainstream areas of computer science and possibly drawing from closely related or emerging fields.
Information about the school and a description of open positions are available on the school website at http://www.augusta.edu/ccs.

Requirements
Applicants must hold a PhD in Computer Science or a related discipline at the time of appointment, have demonstrated excellence in research, and a strong commitment to teaching. Outstanding candidates in all areas of computer science will be considered with a target appointment date of Fall 2019. Review of applications and candidate interviews will begin December 1 and continue until the positions are filled.

To be considered as an applicant, the following materials are required:
• Cover letter
• Curriculum vitae including a list of publications
• Statement describing research accomplishments and future research plans
• Description of teaching philosophy and experience
• Names of at least three references

The above items should be either emailed to ccs@augusta.edu or mailed to Chair Search Committee, School of Computer and Cyber Sciences, Augusta University, 1120 15th Street, UH-127, Augusta, GA 30912.

Boston College
Assistant Professor of the Practice or Lecturer in Computer Science

The Computer Science Department of Boston College seeks to fill one or more non-tenure-track teaching positions, as well as shorter-term visiting teaching positions. All applicants should be committed to excellence in undergraduate education, and be able to teach a broad variety of undergraduate computer science courses. Faculty in longer-term positions will participate in the development of new courses that reflect the evolving landscape of the discipline.
Minimum requirements for the title of Assistant Professor of the Practice, and for the title of Visiting Assistant Professor, include a Ph.D. in Computer Science or closely related discipline. Candidates who have only attained a Master’s degree would be eligible for the title of Lecturer, or Visiting Lecturer. See https://www.bc.edu/bc-web/schools/mcas/departments/computer-science.html for more information.
To apply go to http://apply.interfolio.com/54268.
Application process begins October 1, 2018.

Boston College is a Jesuit, Catholic university that strives to integrate research excellence with a foundational commitment to formative liberal arts education. We encourage applications from candidates who are committed to fostering a diverse and inclusive academic community. Boston College is an Affirmative Action/Equal Opportunity Employer and does not discriminate on the basis of any legally protected category including disability and protected veteran status. To learn more about how BC supports diversity and inclusion throughout the university, please visit the Office for Institutional Diversity at http://www.bc.edu/offices/diversity.

Boston College
Associate or Full Professor of Computer Science

Description:
The Computer Science Department of Boston College is poised for significant growth over the next several years and seeks to fill faculty positions at all levels beginning in the 2019-2020 academic year. Outstanding candidates in all areas will be considered, with a preference for those who demonstrate a potential to contribute to cross-disciplinary teaching and research in conjunction with the planned Schiller Institute for Integrated Science and Society at Boston College. See https://www.bc.edu/bc-web/schools/mcas/departments/computer-science.html and https://www.bc.edu/bc-web/schools/mcas/sites/schiller-institute.html for more information.

Qualifications:
A Ph.D. in Computer Science or a closely related discipline is required, together with a distinguished track record of research and external funding, and evidence of the potential to play a leading role in the future direction of the department, both in the recruitment of faculty and the development of new academic programs.
To apply go to http://apply.interfolio.com/54226.
Application process begins October 1, 2018.

Boston College is a Jesuit, Catholic university that strives to integrate research excellence with a foundational commitment to formative liberal arts education. We encourage applications from candidates who are committed to fostering a diverse and inclusive academic community. Boston College is an Affirmative Action/Equal Opportunity Employer and does not discriminate on the basis of any legally protected category including disability and protected veteran status. To learn more about how BC supports diversity and inclusion throughout the university, please visit the Office for Institutional Diversity at http://www.bc.edu/offices/diversity.

Boston College
Tenure Track, Assistant Professor of Computer Science

The Computer Science Department of Boston College is poised for significant growth over the next several years and seeks to fill faculty positions at all levels beginning in the 2019-2020 academic year. Outstanding candidates in all areas will be considered, with a preference for those who demonstrate a potential to contribute to cross-disciplinary teaching and research in conjunction with the planned Schiller Institute for Integrated Science and Society at Boston College. A Ph.D. in Computer Science or a closely related discipline is required for all positions. See https://www.bc.edu/bc-web/schools/mcas/departments/computer-science.html and https://www.bc.edu/bc-web/schools/mcas/sites/schiller-institute.html for more information.
Successful candidates for the position of Assistant Professor will be expected to develop strong research programs that can attract external research funding in an environment that also values high-quality undergraduate teaching.
Minimum requirements for all positions include a Ph.D. in Computer Science or closely related discipline, an energetic research program that promises to attract external funding, and a commitment to quality in undergraduate and graduate education.
To apply go to https://apply.interfolio.com/54208.
Application review begins October 1, 2018.

Boston College is a Jesuit, Catholic university that strives to integrate research excellence with a foundational commitment to formative liberal arts education. We encourage applications from candidates who are committed to fostering a diverse and inclusive academic community. Boston College is an Affirmative Action/Equal Opportunity Employer and does not discriminate on the basis of any legally protected category including disability and protected veteran status. To learn more about how BC supports diversity and inclusion throughout the university, please visit the Office for Institutional Diversity at http://www.bc.edu/offices/diversity.

California Institute of Technology
Tenure-Track Faculty Position

The Computing and Mathematical Sciences (CMS) Department at the California Institute of Technology (Caltech) invites applications for a tenure-track faculty position in the fundamental mathematics and theory that underpins application domains within the CMS Department, within the Engineering and Applied Sciences (EAS) Division, or within the Institute as a whole. Areas of interest include (but are not limited to) algorithms, data assimilation and inverse problems, dynamical systems and control, geometry, machine learning, mathematics of data science, networks and graphs, numerical linear algebra, optimization, partial differential equations, probability, scientific computing, statistics, stochastic modeling, and uncertainty quantification.
CMS is a unique environment where research in applied and computational mathematics, computer science, and control and dynamical systems is conducted in a collegial atmosphere; application foci include distributed systems, economics, graphics, neuroscience, quantum computing, and robotics and autonomous systems. The CMS Department is part of the broader EAS Division comprising researchers working in, and at intersections between, the fields of aerospace, civil, electrical, mechanical, and medical engineering, as well as in environmental science and engineering, and in materials science and applied physics. The Institute as a whole represents the full range of research in biology, chemistry, engineering, physics, and the social sciences.
A commitment to world-class research, as well as high-quality teaching and mentoring, is expected. The initial appointment at the assistant professor level is for four years, and is contingent upon the completion of a Ph.D. degree in applied mathematics, computer science, statistics or in a related field in engineering or the sciences.
Applications will be reviewed beginning November 7, 2018, and applicants are encouraged to have all their application materials, including letters of recommendation, on file by this date. For a list of documents required, and full instructions on how to apply online, please visit https://applications.caltech.edu/jobs/cms.
Questions about the application process may be directed to search@cms.caltech.edu.
We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, or any other characteristic protected by law.

California State University, Sacramento
Tenure-Track Assistant Professor

California State University, Sacramento, Department of Computer Science. One tenure-track assistant professor position to begin with the Fall 2019 semester. Applicants specializing in any area of computer science will be considered. Those with expertise in areas related to software engineering, computer architecture, artificial intelligence, or deep learning are especially encouraged to apply. Ph.D. in Computer Science, Computer Engineering, or closely related field required by the time of the appointment. For detailed position information, including application procedure, please see http://www.csus.edu/about/employment/. Screening will begin November 19, 2018, and remain open until filled. AA/EEO employer. Clery Act statistics available. Mandated reporter requirements. Criminal background check will be required.

Faculty Positions in Computer and Communication Science
at the Ecole polytechnique fédérale de Lausanne (EPFL)

The School of Computer and Communication Sciences (IC) at EPFL invites applications for faculty positions in computer and communication sciences. We are seeking candidates for tenure-track assistant professor as well as for senior positions.
Successful candidates will develop an independent and creative research program, participate in both undergraduate and graduate teaching, and supervise PhD students.
The school is seeking candidates in the fields of: 1) Machine Learning and Data Science – including applications in bioinformatics, natural language processing, and speech recognition; 2) Verification and Formal Methods; 3) Systems; 4) CS Education and Learning Analytics. Candidates in other areas are also encouraged to apply and will be considered.
EPFL offers internationally competitive salaries, generous research support, significant start-up resources, and outstanding research infrastructure.
Academics in Switzerland enjoy many research funding opportunities, as well as an exceptionally high living standard.
To apply, please follow the application procedure at https://facultyrecruiting.epfl.ch/position/10977288
The following documents are requested in PDF format: cover letter, curriculum vitae including publication list, brief statements of research and teaching interests, and contact information (name, postal address, and email) of 3 references for junior positions or 5 for senior positions.
Screening will start on December 1, 2018. Further questions can be addressed to:
Profs. Rüdiger Urbanke and George Candea
Co-Chairs of the Recruiting Committee
CH-1015 Lausanne
recruiting.ic@epfl.ch
For additional information on EPFL and IC: http://www.epfl.ch, http://ic.epfl.ch
EPFL is an equal opportunity employer and family friendly university. It is committed to increasing the diversity of its faculty. It strongly encourages women to apply.

California State University, San Bernardino
Assistant Professor (Tenure-Track)

The School of Computer Science and Engineering at California State University, San Bernardino invites applications for a tenure-track position at the Assistant Professor level, beginning September 2019. All areas of Computer Science will be considered.
The School of CSE offers the programs of B.S. in Computer Science (ABET accredited), B.S. in Computer Engineering (ABET accredited), B.S. in Bioinformatics, B.A. in Computer Systems, and M.S. in Computer Science.
Candidates must have a Ph.D. in Computer Science or a closely related field by the time of appointment. The position is primarily to support the B.S. in Computer Science (ABET accredited), B.A. in Computer Systems and M.S. in Computer Science programs. The candidate must display potential for excellence in teaching and scholarly work. The candidate is expected to supervise student research at both the undergraduate and graduate levels, and to actively participate in other types of academic student advising. The candidate will actively contribute to the School’s curriculum development. The candidate will serve the School, College and University, as well as the community and the profession.
The College of Natural Sciences at CSUSB is committed to creating a welcoming and inclusive climate for people from diverse backgrounds and is committed to enhancing diversity. CNS also strives for excellence and fosters harmony.
Women and underrepresented minorities are strongly encouraged to apply. For more information about the School of Computer Science and Engineering, please visit http://cns.csusb.edu/cse.
DEADLINE AND APPLICATION PROCESS:
Please submit 1) curriculum vitae; 2) statement of teaching philosophy; 3) description of research interests; 4) letters from 3 individuals qualified to comment (have letters of recommendation sent via email to facultyrecruitment@csusb.edu); 5) copies of transcripts of all post-secondary degrees (official transcripts will be required prior to appointment).
Also include a Diversity Statement detailing how your teaching and/or service and/or scholarship would support the success of students from racial, ethnic, and gender backgrounds that are underrepresented in your academic field. (Maximum 250 words).
Submit application at https://www.governmentjobs.com/careers/csusb/jobs/2204738.
Formal review of applications will begin Nov. 15, 2018 and continue until the position is filled.
Questions about this position can be directed to Dr. Haiyan Qiao, Director of School of Computer Science and Engineering, at hqiao@csusb.edu.

Calvin College
Tenure-Track Faculty Position

The Department of Computer Science at Calvin College invites applications for a tenure-track faculty position to begin August 2019, pending administrative approval, and strongly encourages applications from women and other underrepresented groups. Our department features supportive colleagues, excellent facilities, hard-working students, a dynamic colloquium series, and strong undergraduate programs in computer science, data science, digital communication, and information systems, including a BCS program accredited by ABET (abet.org).
Applicants should have a PhD (or be near completion) in computer science or a related area, or have a master’s degree and 5 years of related experience. We are especially interested in expanding our expertise in the areas of data science & visualization, computer security, or 3D modeling & animation, but individuals from all computing-related areas are encouraged to apply.
Calvin is a Christian comprehensive liberal arts college located in Grand Rapids, Michigan; it is one of the largest Christian colleges in North America, and was named the #1 regional college in the Midwest for 2017 by U.S. News & World Report. Its faculty members are committed to establishing relationships and positive communication across multiple dimensions of diversity, including but not limited to ethnicity, gender, physical limitations, class, or religious perspectives.
For more information and application instructions, see: https://cs.calvin.edu/documents/Tenure_Track_Faculty_Position.

Columbia University
Open Rank Faculty Position in the Department of Electrical Engineering

Columbia Engineering is pleased to invite applications for a faculty position in the Department of Electrical Engineering at Columbia University in the City of New York. Applications at all ranks will be considered.
The Electrical Engineering department welcomes applications in all areas of electrical engineering, and especially encourages candidates with an interest in the school-wide initiatives that relate to engineering and medicine, autonomous systems, quantum computing and technology, and sustainability. Areas of emphasis for Electrical Engineering include (i) signals, information and data, and (ii) energy, including power systems, renewable energy and the optimization and control of the electrical grid. Candidates must have a Ph.D. or its professional equivalent by the starting date of the appointment. Applicants for this position must demonstrate the potential to do pioneering research and to teach effectively. The Department is especially interested in qualified candidates who can contribute, through their research, teaching, and/or service, to the diversity and excellence of the academic community.
The successful candidate is expected to contribute to the advancement of their field and the department by developing an original and leading externally funded research program, and to contribute to the undergraduate and graduate educational mission of the Department. Columbia fosters multidisciplinary research and encourages collaborations with academic departments and units across Columbia University.
For additional information and to apply, please see: http://engineering.columbia.edu/faculty-job-opportunities. Applications should be submitted electronically and include the following: curriculum vitae including a publication list, a description of research accomplishments, a statement of research and teaching interests and plans, contact information for three experts who can provide letters of recommendation, and up to three pre/reprints of scholarly work. All applications received by December 1st, 2018 will receive full consideration.
Applicants can consult http://www.ee.columbia.edu for more information about the department and http://academicjobs.columbia.edu/applicants/Central?quickFind=67066 for more details on the position and application.
Columbia University is an Equal Opportunity Employer / Disability / Veteran.

Georgia Institute of Technology, School of Computational Science and Engineering
Tenure-Track Faculty

The School of Computational Science and Engineering (CSE) of the College of Computing at the Georgia Institute of Technology seeks tenure-track faculty, at all levels, who may specialize in high-performance computing (HPC), data analytics, machine learning (ML), and modeling and simulation, to solve real-world problems in science, engineering, health, and social domains. Our school seeks candidates who may specialize in a broad range of application areas including biomedical and health informatics; urban systems and smart cities; social good and sustainable development; materials and manufacturing; and national security. Applicants must have an outstanding record of research, a sincere commitment to teaching, and interest in engaging in substantive interdisciplinary research with collaborators in other disciplines.
Georgia Tech is located in the heart of metro Atlanta, a home to more than 5.5 million people and nearly 150,000 businesses, a world-class airport, lush parks and green spaces, competitive schools and numerous amenities for entertainment, sports and restaurants that all offer a top-tier quality of life. From its diverse economy, global access, abundant talent and low costs of business and lifestyle, metro Atlanta is a great place to call “home.” Residents have easy access to arts, culture, sports and nightlife, and can experience all four seasons – with mild winters that rarely require a snow shovel.
In mid-2019, CSE will move to its new home, the CODA Building. CSE will be the core academic unit in the building, co-located with institutes and centers focused on data engineering and science, ML, health informatics, cybersecurity, and HPC. The 750,000-square-foot mixed-use development represents a $375 million investment into the innovation district of Tech Square in Atlanta and will include an 80,000-square-foot HPC center alongside 620,000 square feet of office space. CSE is unique in that it will be the only school in its entirety to move all staff, operations, research, faculty, and students to this location. This unique placement positions CSE to become a direct partner with the greater CODA community.
Applications should be submitted online through https://academicjobsonline.org/ajo/jobs/11600. For best consideration, applications are due by December 15, 2018. The application material should include a full academic CV, a personal narrative on teaching and research, a list of at least three references and up to three sample publications. Georgia Tech is an Affirmative Action/Equal Opportunity Employer. Applications from women and under-represented minorities are strongly encouraged.
For more information about Georgia Tech’s School of Computational Science and Engineering please visit: http://www.cse.gatech.edu/

Georgia Institute of Technology, School of Computer Science
Tenured/Tenure-Track Faculty Position

The School of Computer Science at the Georgia Institute of Technology is recruiting multiple tenure-track faculty. Our preference is for junior-level candidates at the Assistant Professor level, but exceptional candidates at all levels will be considered. We seek candidates who complement and enhance our research strengths in any area, and are especially interested in candidates whose research focuses on theory of computing, data science or security.
The School of Computer Science, one of three schools in the College of Computing, focuses on research that makes computing and communication smart, fast, reliable, and secure, with research groups in computer architecture, databases, machine learning, networking, programming languages, security, software engineering, systems, and theory. Faculty in the school are leaders in a variety of Georgia Tech initiatives, including: Institute for Data Engineering and Science (IDEaS), Institute for Information and Security (IISP), Center for Research into Novel Computing Hierarchies (CRNCH), and Algorithms and Randomness Center (ARC). The school is in a period of rapid growth with five tenure-track assistant professors hired last year.
Georgia Tech is adjacent to the Midtown district of Atlanta. Midtown is a walking, in-town neighborhood, burgeoning with many new cafes and restaurants, home to tech companies, and within walking distance of outdoor activities, including the Beltline, Piedmont Park, Botanical Gardens, and High Museum of Art. Georgia Tech’s new CODA building will be located in Midtown and the School of Computer Science will have a strong presence in the new building. The greater Atlanta area is very cosmopolitan, with a variety of international communities and outdoor pursuits (beaches, mountains, etc.) within driving distance.
Applications will be considered until open positions are filled. However the review of applications will begin on December 1, 2018. Applicants are encouraged to clearly identify in their cover letter the area(s) that best describe their research interests. All applications must be submitted online at: https://academicjobsonline.org/ajo/jobs/11942.
Georgia Tech is an equal education/employment opportunity institution dedicated to building a diverse community. We strongly encourage applications from women, underrepresented minorities, individuals with disabilities, and veterans. Georgia Tech has policies to promote a healthy work-life balance and is aware that attracting faculty may require meeting the needs of two careers. More information about the School of Computer Science is available at: http://scs.gatech.edu/.

Purdue University
Head of the Department of Computer Science

The College of Science at Purdue University invites nominations and applications for the position of Head of the Department of Computer Science. The department seeks a dynamic research leader and innovative educator with creative vision and an outstanding record of achievement.
The department’s teaching and research activities cover a broad range of topics including bioinformatics and computational biology, computational science and engineering, databases and data mining, distributed systems, graphics and visualization, information security and assurance, machine learning and information retrieval, networking and operating systems, programming languages and compilers, software engineering, and theory of computing and algorithms. For more information and the online version of the ad, see http://www.cs.purdue.edu/.
The successful candidate will have an exemplary record of scholarly achievement along with outstanding leadership potential. It is expected that candidates for this position will have an earned doctorate in computer science or a related field and a level of stature in the field sufficient at a minimum to merit appointment with tenure at the rank of Professor. The Head will work with faculty colleagues to build and achieve a compelling vision for the future of the Department, and to continue to accelerate the advancement of its nationally-ranked program through a commitment to excel in all aspects of the Department’s mission. Highly desirable qualities include an understand-
Assistant Professor of Data Science

The Data Science program invites applications for a tenure track position with a research focus in Data Science to begin in Fall 2019 to strengthen this
strategic interdisciplinary area. The new faculty will join the strong team of existing data science faculty working on interdisciplinary research related
to big data on real-world grand challenge problems with societal impact.
The highly interdisciplinary Data Science program at WPI, a collaboration between Computer Science, Mathematical Sciences and the Robert A. Foisie
School of Business, has undergone major growth since its inception in 2014 supported by a cluster hire of seven full-time faculty in Data Science and
closely related disciplines. The signature Data Science program offers on-campus degree programs at all levels, including an undergraduate minor in
Data Science, an MS degree in Data Science, and the first interdisciplinary PhD degree in Data Science in the nation.
WPI is interested in applicants with research and teaching expertise in all areas of Data Science, but in particular in applicants with a strong background
complementary to the existing expertise.
Areas of strength in Data Science at WPI include statistical learning, machine learning, deep learning, large-scale data management, compressed
sensing, big data analytics, signal processing, visualization, artificial intelligence, and Data Science applications from digital health, cyber security,
social media, material sciences, neuro-sciences, connected communities, urban computing, to bioinformatics. Outstanding candidates in any area
of Data Science will receive full consideration.
Founded in 1865, WPI is one of the nation’s first technological Universities known for its innovative project-based education. WPI’s reputation as an
innovative university rests on the shoulders of its faculty. A highly selective, private technological university and one of the nation’s first, WPI believes
that when great minds work together, great advances follow. At WPI the boundaries to multidisciplinary collaboration are low: faculty members,
students, and other partners work together on real-world projects and purposeful research. We are most proud of a recent No. 1 ranking for “faculty
who best combine research and teaching.” (Wall Street Journal/Times Higher Ed, 2016).
The successful candidate will hold an academic appointment in either the Mathematical Sciences or the Computer Science Department.
Candidates should have a Ph.D. in Computer Science, Mathematical Sciences, Statistics, Electrical Engineering or a closely related field, and the
potential for excellence in teaching and research.
Candidates should include detailed research and teaching statements, vitae, and contact information for at least three references.
To apply, visit http://apptrkr.com/1286303. The deadline for applications is December 10, 2018. Applications will be considered after that date until
the position is filled.
WPI is an Equal Opportunity Employer. All qualified candidates will receive consideration for employment without regard to race, color, age, religion,
sex, sexual orientation, gender identity, national origin, veteran status, or disability. We are seeking individuals with diverse backgrounds and
experiences who will contribute to a culture of creativity and collaboration, inclusion, problem solving and change making.

CAREERS
ing of the current needs and future direction of computer science as an academic discipline, a commitment to diversity and collaboration, and skills in academic leadership, student relations, mentoring, and alumni relations development.
Confidential nominations and inquiries can be sent to head-search@cs.purdue.edu. Candidates should submit a letter of application articulating a vision for the future of academic computer science, a statement of research and teaching, and a complete curriculum vitae with names and email addresses of at least five references. Applications and nominations will be held in strict confidence; review of the same will begin immediately and continue until the position is filled. Application materials can be uploaded at: https://hiring.science.purdue.edu.
A background check will be required for employment in this position. Purdue University’s Department of Computer Science is committed to advancing diversity in all areas of faculty effort, including scholarship, instruction, and engagement. Candidates should address at least one of these areas in their cover letter, indicating their past experiences, current interests or activities, and/or future goals to promote a climate that values diversity and inclusion.
Purdue University is an EOE/AA employer. All individuals, including minorities, women, individuals with disabilities, and veterans are encouraged to apply.

San Diego State University
Department of Computer Science
Two Tenure-Track Assistant Professor Positions

The Department of Computer Science at SDSU seeks to hire two tenure-track Assistant Professors starting Fall 2019. The candidates should have PhD degrees in Computer Science or closely related fields. One position is in Cybersecurity (see https://apply.interfolio.com/53552); the other position is in Algorithms & Computation (see https://apply.interfolio.com/53547). Questions about the position may be directed to COS-CS-Search@sdsu.edu. Top candidates in other areas will also be considered. SDSU is an equal opportunity/Title IX employer.

Texas State University
Department of Computer Science

The Department of Computer Science invites applications for three faculty positions:
1. One tenure-track Assistant Professor to start on September 1, 2019. Review date is February 4, 2019.
2. Two non-tenure track Senior Lecturers to start on September 1, 2019. Review date is March 4, 2019.
Please consult the department’s webpage at www.cs.txstate.edu/employment/faculty/ for job duties, required and preferred qualifications, application procedures, and information about the university and the department.
Texas State University is committed to an inclusive education and work environment that provides equal opportunity and access to all qualified persons. Texas State, to the extent not in conflict with federal or state law, prohibits discrimination or harassment on the basis of race, color, national origin, age, sex, religion, disability, veterans’ status, sexual orientation, gender identity or expression. Texas State University is a member of The Texas State University System. Texas State University is an EOE.

Trinity College, Hartford, Connecticut
Assistant Professor of Computer Science

Applications are invited for a tenure-track position in computer science at the rank of Assistant Professor to start in the fall of 2019.
Candidates must hold a Ph.D. in computer science at the time of appointment. We are seeking candidates with teaching and research interests in applied areas associated with data analytics, such as database and information systems, data mining and knowledge discovery, machine learning, and artificial intelligence, but other related areas will also be seriously considered.
Trinity College is a coeducational, independent, nonsectarian liberal arts college located in, and deeply engaged with, Connecticut’s capital city of Hartford. Our approximately 2,200 students come from all socioeconomic, racial, religious, and ethnic backgrounds across the United States, and seventeen percent are international. We emphasize excellence in both teaching and research, and our intimate campus provides an ideal setting for interdisciplinary collaboration. Teaching load is four courses per year for the first two years and five courses per year thereafter, with a one-semester leave every four years. We offer a competitive salary and benefits package, plus a start-up expense fund. For information about the Computer Science Department, visit: http://www.cs.trincoll.edu/.
Applicants should submit a curriculum vitae and teaching and research statements and arrange for three letters of reference to be sent to: https://trincoll.peopleadmin.com/.
Consideration of applications will begin on December 15, 2018, and continue until the position is filled.
Trinity College is an Equal-Opportunity/Affirmative-Action employer.
Women and members of minority groups are encouraged to apply.

The University of Alabama in Huntsville
Assistant Professor

The Department of Computer Science at The University of Alabama in Huntsville (UAH) invites applicants for a tenure-track faculty position at the Assistant Professor level beginning August 2019. All applicants with a background in traditional areas of computer science will be considered; however, special emphasis will be given to applicants with expertise in cybersecurity, software engineering, cloud computing, and systems related areas.
A Ph.D. in computer science or a closely related area is required. The successful candidate will have a strong academic background and be able to secure and perform funded research in areas typical for publication in well-regarded academic conference and journal venues. In addition, the candidate should embrace the opportunity to provide undergraduate education.
The department has a strong commitment to excellence in teaching, research, and service; the candidate should have good communication skills, strong teaching potential, and research accomplishments.
UAH is located in an expanding, high technology area, in close proximity to Cummings Research Park, the second largest research park in the nation and the fourth largest in the world. Nearby are the NASA Marshall Space Flight Center, the Army’s Redstone Arsenal, numerous Fortune 500 and high tech companies. UAH also has an array of research centers, including information technology and cybersecurity. In short, collaborative research opportunities are abundant, and many well-educated and highly technically skilled people are in the area. There is also access to excellent public schools and inexpensive housing.
UAH has an enrollment of approximately 9,500 students. The Computer Science department offers BS, MS, and PhD degrees in Computer Science and contributes to interdisciplinary degrees. Faculty research interests are varied and include cybersecurity, mobile computing, data science, software engineering, visualization, graphics and game computing, multimedia, AI, image processing, pattern recognition, and distributed systems. Recent NSF figures indicate the university ranks 30th in the nation in overall federal research funding in computer science.
Interested parties must submit a detailed resume with references to info@cs.uah.edu or Chair, Search Committee, Dept. of Computer Science, The University of Alabama in Huntsville, Huntsville, AL 35899. Qualified female and minority candidates are encouraged to apply. Initial review of applicants will begin as they are received and continue until a suitable candidate is found.
The University of Alabama in Huntsville is an affirmative action / equal opportunity employer / minorities / females / veterans / disabled.
Please refer to log number: 19/20 - 538

University of Illinois at Urbana-Champaign
Positions in Computing

The Department of Electrical and Computer Engineering (ECE ILLINOIS) at the University of Illinois at Urbana-Champaign invites applications for faculty positions at all areas and levels in computing, broadly defined, with particular emphasis on Embedded Computing Systems and the Internet of Things; Data-Centric Computing Systems and Storage; Networked and Distributed Computing Systems; AI/Autonomous Systems; Robotics; Machine Vision; Quantum Computing. Applications are encouraged from candidates whose research programs specialize in core as well as interdisciplinary areas of electrical and computer engineering. Ideal candidates include those who demonstrate evidence of a commitment to diversity, equity, and inclusion through research, teaching, and/or service endeavors.
From the transistor and the first computer implementation based on von Neumann’s architecture to the Blue Waters petascale computer (the fastest computer on any university campus), ECE ILLINOIS has always been at the forefront of computing research and innovation. ECE ILLINOIS is in a period of intense demand and growth, serving over 3000 students and averaging 7 new tenure-track faculty hires per year in recent years. It is housed in its new 235,000 sq. ft. net-zero energy design building, which is a major campus addition with maximum space and minimal carbon footprint.
Qualified senior candidates may also be considered for tenured full Professor positions as part of the Grainger Engineering Breakthroughs Initiative (graingerinitiative.engineering.illinois.edu), which is backed by a $100-million gift from the Grainger Foundation.
Please visit http://jobs.illinois.edu to view the complete position announcement and application instructions. Full consideration will be given to applications received by December 1, 2018, but applications will continue to be accepted until all positions are filled.
Illinois is an EEO Employer/Vet/Disabled www.inclusiveillinois.illinois.edu.
The University of Illinois conducts criminal background checks on all job candidates upon acceptance of a contingent offer.

University of Memphis
Department of Computer Science
Assistant Professors

The Department of Computer Science at the University of Memphis is seeking candidates for multiple Assistant Professor positions beginning Fall 2019. Exceptionally qualified candidates in all areas of computer science are invited while candidates with core expertise in cyber-human systems (including computer vision, speech recognition, computer graphics, and human computer interaction (HCI)) and CS education, are particularly encouraged to apply. Candidates from minority and underrepresented groups are highly encouraged to apply. Successful candidates are expected to develop externally sponsored research programs, teach both undergraduate and graduate courses and provide academic advising to students at all levels.
Applicants should hold a PhD in Computer Science, or related discipline, and be committed to excellence in both research and teaching. Salary is highly competitive and dependent upon qualifications.
The Department of Computer Science (http://www.memphis.edu/cs/) offers B.S., M.S., and Ph.D. programs as well as graduate certificates in Data Science and Information Assurance, and participates in an M.S. program in Bioinformatics (through the College of Arts and Sciences). The Department has been ranked 55th among CS departments with federally funded research. The Department regularly engages in large-scale multi-university collaborations across the nation. For example, CS faculty lead the NIH-funded Big Data “Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K)” and the “Center for Information Assurance (CfIA)”. In addition, CS faculty work closely with multidisciplinary centers at the university such as the “Institute for Intelligent Systems (IIS)”.
Known as America’s distribution hub, Memphis ranked as America’s 6th best city for jobs by Glassdoor in 2017. Memphis metropolitan area has a population of 1.3 million. It boasts a vibrant culture and has a pleasant climate with an average temperature of 63 degrees.
Screening of applications begins immediately. For full consideration, application materials should be received by November 25, 2018. However, applications will be accepted until the search is completed.
To apply, please visit https://workforum.memphis.edu/. Include a cover letter, curriculum vitae, statement of teaching philosophy, research statement, and three letters of recommendation. Direct all inquiries to Corinne O’Connor (cconnor2@memphis.edu).
A background check will be required for employment. The University of Memphis is an Equal Opportunity/Equal Access/Affirmative Action employer committed to achieving a diverse workforce.

University of Michigan
Multiple Tenure-Track and Teaching Faculty Positions

Computer Science and Engineering (CSE) at the University of Michigan invites applications for multiple tenure-track and teaching faculty (lecturer) positions. We seek exceptional candidates at all levels in all areas across computer science and computer engineering. We also have a targeted search for an endowed professorship in theoretical computer science (the Fischer Chair). Qualifications include an outstanding academic record, a doctorate or equivalent in computer science or computer engineering, and a strong commitment to teaching and research. Candidates are expected, through their research, teaching, and/or service, to contribute to the diversity and excellence of the academic community.
The University of Michigan is one of the
TENURE-TRACK AND TENURED POSITIONS

ShanghaiTech University invites highly qualified candidates to fill multiple tenure-track/tenured faculty positions as its core founding team in the School of Information Science and Technology (SIST). We seek candidates with exceptional academic records or demonstrated strong potentials in all cutting-edge research areas of information science and technology. They must be fluent in English. English-based overseas academic training or background is highly desired.
ShanghaiTech is founded as a world-class research university for training future generations of scientists, entrepreneurs, and technical leaders. Boasting a new modern campus in Zhangjiang Hightech Park of cosmopolitan Shanghai, ShanghaiTech shall trail-blaze a new education system in China. Besides establishing and maintaining a world-class research profile, faculty candidates are also expected to contribute substantially to both graduate and undergraduate educations.
Academic Disciplines: Candidates in all areas of information science and technology shall be considered. Our recruitment focus includes, but is not limited to: computer architecture, software engineering, database, computer security, VLSI, solid state and nano electronics, RF electronics, information and signal processing, networking, security, computational foundations, big data analytics, data mining, visualization, computer vision, bio-inspired computing systems, power electronics, power systems, machine and motor drive, power management IC as well as inter-disciplinary areas involving information science and technology.
Compensation and Benefits: Salary and startup funds are highly competitive, commensurate with experience and academic accomplishment. We also offer a comprehensive benefit package to employees and eligible dependents, including on-campus housing. All regular ShanghaiTech faculty members will join its new tenure-track system in accordance with international practice for progress evaluation and promotion.
Qualifications:
• Strong research productivity and demonstrated potentials;
• Ph.D. (Electrical Engineering, Computer Engineering, Computer Science, Statistics, Applied Math, Artificial Intelligence, Statistics or related field);
• A minimum relevant (including PhD) research experience of 4 years.
Applications: Submit (in English, PDF version) a cover letter, a 2-page research plan, a CV plus copies of 3 most significant publications, and names of three referees to: sist@shanghaitech.edu.cn. For more information, visit http://sist.shanghaitech.edu.cn/NewsDetail.asp?id=373
Deadline: The positions will be open until they are filled by appropriate candidates.

MULTIPLE FACULTY POSITIONS
Department of Electrical and Systems Engineering

The School of Engineering and Applied Science at the University of Pennsylvania is growing its faculty by 33% over the next five years. As part of this initiative, the Department of Electrical and Systems Engineering is engaged in an aggressive, multi-year hiring effort for multiple tenure-track positions at all levels. Candidates must hold a Ph.D. in Electrical Engineering, Computer Engineering, Systems Engineering, or related area. The department seeks individuals with exceptional promise for, or proven record of, research achievement, who will take a position of international leadership in defining their field of study, and excel in undergraduate and graduate education. Leadership in cross-disciplinary and multi-disciplinary collaborations is of particular interest. We are interested in candidates in all areas that enhance our research strengths in
1. Nanodevices and nanosystems (nanoelectronics, MEMS/NEMS, power electronics, nanophotonics, nanomagnetics, quantum devices, integrated devices and systems at nanoscale),
2. Circuits and computer engineering (analog, RF, mm-wave, digital circuits, emerging circuit design, computer engineering, IoT, embedded and cyber-physical systems), and
3. Information and decision systems (control, optimization, robotics, data science, network science, communications, information theory, signal processing).
Prospective candidates in all areas are strongly encouraged to address large-scale societal problems in energy, transportation, health, food and water, economic and financial networks, social networks, critical infrastructure, and national security. We are especially interested in candidates whose interests are aligned with the school’s strategic plan, https://www.seas.upenn.edu/about/penn-engineering-2020/
Diversity candidates are strongly encouraged to apply. Interested persons should submit an online application at https://www.ese.upenn.edu/faculty-staff/ and include curriculum vitae, statement of research and teaching interests, and at least three references. Review of applications will begin on December 1, 2018.
The University of Pennsylvania is an Equal Opportunity Employer. Minorities/Women/Individuals with Disabilities/Veterans are encouraged to apply.

world’s leading research universities, consisting of highly ranked departments and colleges across engineering, sciences, medicine, law, business, and the arts. CSE is a vibrant and innovative community, with over 70 world-class faculty members, over 300 graduate students, and a large and illustrious network of alumni. Ann Arbor is known as one of the best small cities in the country, offering cosmopolitan living without the hassle. The University of Michigan has a strong dual-career assistance program.
We encourage candidates to apply as soon as possible. For best consideration for Fall 2019, please apply by December 1, 2018. Positions remain open until filled and applications can be submitted throughout the year.
For more details on these positions and to apply, please visit http://cse.umich.edu/jobs.
Michigan Engineering’s vision is to be the world’s preeminent college of engineering serving the common good. This global outlook, leadership focus, and service commitment permeate our culture. Our vision is supported by a mission and values that, together, provide the framework for all that we do. Information about our vision, mission and values can be found at: http://strategicvision.engin.umich.edu/.
The University of Michigan has a storied legacy of commitment to Diversity, Equity and Inclusion (DEI). The Michigan Engineering component of the University’s comprehensive, five-year, DEI strategic plan, with updates on our programs and resources dedicated to ensuring a welcoming, fair, and inclusive environment, can be found at: http://www.engin.umich.edu/college/about/diversity.
The University of Michigan is a Non-Discriminatory/Affirmative Action Employer.

The University of Texas at San Antonio (UTSA)
Department Chair

The Department of Computer Science at the University of Texas at San Antonio (UTSA) is seeking a dynamic Department Chair who can lead a department of preeminence in an extraordinarily diverse University that is focused on a significant expansion of its research mission.
The Department seeks exceptional candidates with (1) a record of high quality scholarship and competitive research with federal, state, and industry funding, (2) experience and leadership in institutions of higher education, industry, or professional organizations, (3) an understanding of pedagogies that will lead to student success and excellence in undergraduate and graduate teaching, (4) experience leading interdisciplinary teams, and (5) mentorship experience and a commitment to inclusion and diversity.
The University of Texas at San Antonio is designated a National Center of Academic Excellence in Cyber Operations and has just been approved for $70 million in funding to construct two new facilities: a National Security Collaboration Center and a proposed School of Data Science. The Computer Science Department has 23 full-time faculty, 8 full-time lecturers, 1,300 undergraduate students, 70 M.S., and 60 Ph.D. students.
The successful candidate must have a doctorate in computer science or closely related field, with outstanding research and teaching records that warrant an appointment at the rank of full professor with tenure. Tenure is contingent upon Board of Regents approval.
See http://apptrkr.com/1295200 for information on the Department and application instructions.
Screening of applications will begin on November 15, 2018. The search will continue until the position is filled or the search is closed.
The University of Texas at San Antonio is an Affirmative Action/Equal Opportunity Employer. Women, minorities, veterans, and individuals with disabilities are encouraged to apply.

The University of Texas at San Antonio (UTSA)
Faculty Position in Computer Science

The Department of Computer Science at The University of Texas at San Antonio (UTSA) invites applications for one tenure-track or tenured open rank (Assistant, Associate or Full Professor) position, starting in Fall 2019. This position is targeted towards faculty with expertise and interest in artificial intelligence (AI). Outstanding candidates from all areas of AI will be considered, and preference will be given to applicants with expertise in cyber adversarial learning, AI for resource-constrained systems (such as IoTs and embedded systems), or AI (such as natural language processing, computer vision and deep learning) as it relates to health-related applications. This position is part of the university-wide cluster hiring in Artificial Intelligence.
See http://www.cs.utsa.edu/fsearch for information on the Department and application instructions. Screening of applications will begin immediately.
Applications received by January 2, 2019 will be given full consideration. The search will continue until the positions are filled or the search is closed. The University of Texas at San Antonio is an Affirmative Action/Equal Opportunity Employer. Women, minorities, veterans, and individuals with disabilities are encouraged to apply.
Department of Computer Science
RE: Faculty Search
The University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249-0667
Phone: 210-458-4436

University of Toronto
Assistant Professor, Tenure Stream

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering (ECE) at the University of Toronto invites applications for up to four full-time tenure-stream faculty appointments at the rank of Assistant Professor. The appointments will commence on July 1, 2019.
Within the general field of electrical and computer engineering, we seek applications from candidates with expertise in one or more of the following strategic research areas: 1. Computer Systems and Software; 2. Electrical Power Systems; 3. Systems Control, including but not limited to autonomous and robotic systems.
Applicants are expected to have a Ph.D. in Electrical and Computer Engineering, or a related field, at the time of appointment or soon after.
Successful candidates will be expected to initiate and lead an outstanding, innovative, independent, competitive, and externally funded research program of international calibre, and to teach at both the undergraduate and graduate levels. Candidates should have demonstrated excellence in research and teaching. Excellence in research is evidenced primarily by publications or forthcoming publications in leading journals or conferences in the field, presentations at significant conferences, awards and accolades, and strong endorsements by referees of high international standing. Evidence of excellence in teaching will be demonstrated by strong communication skills; a compelling statement of teaching submitted as part of the application highlighting areas of interest, awards and accomplishments, and teaching philosophy; sample course syllabi and materials; and teaching evaluations, as well as strong letters of recommendation.
Eligibility and willingness to register as a Professional Engineer in Ontario is highly desirable.
Salary will be commensurate with qualifications and experience.
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto ranks among the best in North America. It attracts outstanding students, has excellent facilities, and is ideally located in the middle of a vibrant, artistic, diverse and cosmopolitan city.
Additional information may be found at http://www.ece.utoronto.ca.
Review of applications will begin after September 1, 2018; however, the position will remain open until November 29, 2018.
As part of your online application, please include a cover letter, a curriculum vitae, a summary of your previous research and future research plans, as well as a teaching dossier including a statement of teaching experience and interests, your teaching philosophy and accomplishments, and teaching evaluations. Applicants must arrange for three letters of reference to be sent directly by the referees (on letterhead, signed and scanned), by email to the ECE department at search2018@ece.utoronto.ca. Applications without any reference letters will not be considered; it is your responsibility to make sure your referees send us the letters while the position remains open.
You must submit your application online while the position is open, by following the submission guidelines given at http://uoft.me/how-to-apply. Applications submitted in any other way will not be considered. We recommend combining attached documents into one or two files in PDF/MS Word format. If you have any questions about this position, please contact the ECE department at search2018@ece.utoronto.ca.
The University of Toronto is strongly committed to diversity within its community and especially welcomes applications from racialized persons / persons of colour, women, Indigenous / Aboriginal People of North America, persons with disabilities, LGBTQ persons, and others who may contribute to the further diversification of ideas.
As part of your application, you will be asked to complete a brief Diversity Survey. This survey is voluntary. Any information directly related to you is confidential and cannot be accessed by search committees or human resources staff. Results will be aggregated for institutional planning purposes. For more information, please see http://uoft.me/UP.

All qualified candidates are encouraged to apply; however, Canadians and permanent residents will be given priority.

University of Toronto
Assistant Professor, Teaching Stream

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering (ECE) at the University of Toronto invites applications for a full-time teaching-stream faculty appointment at the rank of Assistant Professor, Teaching Stream, in the general area of Computer Systems and Software. The appointment will commence on July 1, 2019.
Applicants are expected to have a Ph.D. in Electrical and Computer Engineering, or a related field, at the time of appointment or soon after.
Successful candidates will have demonstrated excellence in teaching and pedagogical inquiry, including in the development and delivery of undergraduate courses and laboratories and supervision of undergraduate design projects. This will be demonstrated by strong communication skills, a compelling statement of teaching submitted as part of the application highlighting areas of interest, awards and accomplishments and teaching philosophy; sample course syllabi and materials; and teaching evaluations, as well as strong letters of reference from referees of high standing endorsing excellent teaching and commitment to excellent pedagogical practices and teaching innovation.
Eligibility and willingness to register as a Professional Engineer in Ontario is highly desirable.
Salary will be commensurate with qualifications and experience.
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto ranks among the best in North America. It attracts outstanding students, has excellent facilities, and is ideally located in the middle of a vibrant, artistic, diverse and cosmopolitan city. Additional information may be found at http://www.ece.utoronto.ca.
Review of applications will begin after September 1, 2018; however, the position will remain open until November 29, 2018.
As part of your online application, please include a cover letter, a curriculum vitae, and a teaching dossier including a summary of your previous teaching experience, your teaching philosophy and accomplishments, your future teaching plans and interests, sample course syllabi and materials, and teaching evaluations. Applicants must arrange for three letters of reference to be sent directly by the referees (on letterhead, signed and scanned), by email to the ECE department at search2018@ece.utoronto.ca. Applications without any reference letters will not be considered; it is your responsibility to make sure your referees send us the letters while the position remains open.
You must submit your application online while the position is open, by following the submission guidelines given at http://uoft.me/how-to-apply. Applications submitted in any other way will not be considered. We recommend combining attached documents into one or two files in PDF/MS Word format. If you have any questions about this position, please contact the ECE department at search2018@ece.utoronto.ca.
The University of Toronto is strongly committed to diversity within its community and especially welcomes applications from racialized persons / persons of colour, women, Indigenous / Aboriginal People of North America, persons with disabilities, LGBTQ persons, and others who may contribute to the further diversification of ideas.
As part of your application, you will be asked to complete a brief Diversity Survey. This survey is voluntary. Any information directly related to you is confidential and cannot be accessed by search committees or human resources staff. Results will be aggregated for institutional planning purposes. For more information, please see http://uoft.me/UP.
All qualified candidates are encouraged to apply; however, Canadians and permanent residents will be given priority.

ADVERTISING IN CAREER OPPORTUNITIES
How to Submit a Classified Line Ad: Send an e-mail to acmmediasales@acm.org. Please include text, and indicate the issue or issues where the ad will appear, and a contact name and number.
Estimates: An insertion order will then be e-mailed back to you. The ad will be typeset according to CACM guidelines. NO PROOFS can be sent. Classified line ads are NOT commissionable.
Deadlines: 20th of the month/2 months prior to issue date. For latest deadline info, please contact: acmmediasales@acm.org
Career Opportunities Online: Classified and recruitment display ads receive a free duplicate listing on our website at: http://jobs.acm.org
Ads are listed for a period of 30 days.
For More Information Contact: ACM Media Sales at 212-626-0686 or acmmediasales@acm.org

Exploring new frontiers in computer science.
mitpress.mit.edu/ACM

University of Toronto
Associate Professor, Tenure Stream

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering (ECE) at the University of Toronto invites applications for up to four full-time tenure-stream faculty appointments at the rank of Associate Professor. The appointments will commence on July 1, 2019.
Within the general field of electrical and computer engineering, we seek applications from candidates with expertise in one or more of the following strategic research areas: 1. Computer Systems and Software; 2. Electrical Power Systems; 3. Systems Control, including but not limited to autonomous and robotic systems.
Applicants are expected to have a Ph.D. in Electrical and Computer Engineering, or a related field, and have at least five years of academic or relevant industrial experience.
Successful candidates will be expected to maintain and lead an outstanding, independent, competitive, innovative, and externally funded research program of international calibre, and to teach at both the undergraduate and graduate levels. Candidates should have demonstrated excellence in research and teaching. Excellence in research is evidenced primarily by sustained and impactful publications in leading journals or conferences in the field, awards and accolades, presentations at significant conferences and a high profile in the field with strong endorsements by referees of high international standing. Evidence of excellence in teaching will be demonstrated by strong communication skills, a compelling statement of teaching submitted as part of the application highlighting areas of interest, awards and accomplishments, and teaching philosophy; sample course syllabi and materials; and teaching evaluations, as well as strong letters of recommendation.
Eligibility and willingness to register as a Professional Engineer in Ontario is highly desirable.
Salary will be commensurate with qualifications and experience.
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto ranks among the best in North America. It attracts outstanding students, has excellent facilities, and is ideally located in the middle of a vibrant, artistic, diverse and cosmopolitan city. Additional information may be found at http://www.ece.utoronto.ca.
Review of applications will begin after September 1, 2018; however, the position will remain open until November 29, 2018.
As part of your online application, please include a cover letter, a curriculum vitae, a summary of your previous research and future research plans, as well as a teaching dossier including a statement of teaching experience and interests, your teaching philosophy and accomplishments, and teaching evaluations. Applicants must arrange for three letters of reference to be sent directly by the referees (on letterhead, signed and scanned), by email to the ECE department at search2018@ece.utoronto.ca. Applications without any reference letters will not be considered; it is your responsibility to make sure your referees send us the letters while the position remains open.
You must submit your application online while the position is open, by following the submission guidelines given at http://uoft.me/how-to-apply. Applications submitted in any other way will not be considered. We recommend combining attached documents into one or two files in PDF/MS Word format. If you have any questions about this position, please contact the ECE department at search2018@ece.utoronto.ca.
The University of Toronto is strongly committed to diversity within its community and especially welcomes applications from racialized persons / persons of colour, women, Indigenous / Aboriginal People of North America, persons with disabilities, LGBTQ persons, and others who may contribute to the further diversification of ideas.
As part of your application, you will be asked to complete a brief Diversity Survey. This survey is voluntary. Any information directly related to you is confidential and cannot be accessed by search committees or human resources staff. Results will be aggregated for institutional planning purposes. For more information, please see http://uoft.me/UP.
All qualified candidates are encouraged to apply; however, Canadians and permanent residents will be given priority.

University of Zurich
Professorship in Human and Crowd Computing (Assistant/Associate/Full)

The Faculty of Business, Economics and Informatics of the University of Zurich invites applications for a Professorship in Human and Crowd Computing (Assistant/Associate/Full) starting in fall 2019.
Candidates should hold a PhD in Computer Science, Informatics or a related discipline, and have an excellent research record in the area of “Human and Crowd Computing,” ideally with a focus in one or multiple of the following subareas: Social Computing, Human Computation, Collective Intelligence, Human-Agent Interaction, Computer-Supported Cooperative Work, Incentive Design/Mechanism Design, Information Elicitation, and the Design/Analysis of Social Platforms. We expect the candidate to be committed to excellence in teaching both at the undergraduate and the graduate levels. German language skills are not required.
The successful candidate is expected to conduct high-quality research within the area of Human and Crowd Computing and establish her or his research group within the Department of Informatics and internationally. The successful candidate is also expected to actively interface with the other groups at the department and the faculty, and seek collaborations with researchers across faculties within the Digital Society Initiative of the University of Zurich as well as internationally.
The Faculty of Business, Economics and Informatics offers a stimulating research environment and rich opportunities for collaboration. The Human and Crowd Computing professorship is another step towards establishing the strengths of the Department of Informatics within its three focus areas of people-oriented computing, computing and economics, and big data analytics.
The University of Zurich is one of the leading research universities in Europe and offers the widest range of study courses in Switzerland to over 26,000 students. Through its educational and research objectives, the University of Zurich aims at attracting leading international researchers who are willing to contribute to its development and to strengthening its reputation. The University of Zurich is an equal opportunity employer and strongly encourages applications from female candidates.
Please submit your application, including a CV, contact information for at least three references, three papers (published or unpublished), and a record of teaching effectiveness (taught courses and evaluations) via https://www.facultyhiring.oec.uzh.ch/position/10092545 before 1st November, 2018.
Documents should be addressed to Prof. Dr. Harald Gall; Dean of the Faculty of Business, Economics and Informatics; University of Zurich; Switzerland.
For questions regarding the open position please contact Prof. Dr. Thomas Fritz (fritz@ifi.uzh.ch) or Prof. Dr. Abraham Bernstein (bernstein@ifi.uzh.ch) or Prof. Dr. Sven Seuken (seuken@ifi.uzh.ch).

US Air Force Academy
Assistant Professor of Computer Science

The Department of Computer Science at the US Air Force Academy seeks to fill up to two faculty positions at the Assistant Professor level. The department is particularly interested in candidates with backgrounds in artificial intelligence, computer and network security, operations research, or unmanned aerial systems, but all candidates with a passion for undergraduate computer science teaching are encouraged to apply.
The Academy is a national service institution, charged with producing lieutenants for the US Air Force. Faculty members are expected to exemplify the highest ideals of professionalism and character. USAFA is located in Colorado Springs, an area known for its exceptional natural beauty and quality of life. The United States Air Force Academy values the benefits of diversity among the faculty to include a variety of educational backgrounds and professional and life experiences.
For information on how to apply, go to usajobs.gov and search with the keyword 509715400.

last byte
[CONTINUED FROM P. 176] the camera.
And next to her ... I squinted to see better ... hand on her waist, was me.
I gasped. Was this proof that in the future I would build a machine to travel back in time? How else could I appear in an image produced in the 19th century?
Moments later I was backpedaling. Plenty of Englishmen had beards at the time. It could be a passing resemblance. But genuine or not, the photograph inspired me to recall the field equations of the general theory. Going forward in time is easy; we do it every moment of every day. Special relativity tells us all we need to do is move … and the faster we go, the quicker we get to the future. But traveling back is far more complicated—even theoretically. Ideas for backward time travel usually involve an effect called “frame dragging,” whereby rotating masses like black holes twist the fabric of space-time as if they were spoons in honey until time loops back on itself.
I realized that to make backward time travel practical I might try another oddity of relativity, that energy, like mass, produces gravity. If I could pick up on the quantum fluctuations of energy in empty space I could set up a feedback loop whereby the energy produced gravity, feeding back to produce yet more energy ... warping space-time sufficiently to make the journey without a black hole. But there was a problem.
As the great physicist Richard Feynman pointed out in 1982, you can only fully simulate quantum systems with a quantum computer. Without quantum computing, the process would be impossible to control—and at the time of my first visit, quantum computing was little more than a theoretical concept.
Still, that old photograph gave me hope that in the future I might yet use such a computer to travel into the past. To test my hypothesis, I needed a prediction I could check. I’d explored the history and science of Fox Talbot’s earliest negative many times. It featured no human figures, but I knew exactly where and when it was taken. So, if I could build a time machine, I’d make sure I was outside the window at precisely the moment Fox Talbot exposed his chemically sensitive paper. If I now looked at the negative and I was in it, holding some clearly identifiable object, say, star-shaped—the first shape that occurred to me—I would know the mechanism worked.
I had to see the image again, recalling it was in Fox Talbot’s house, by the window where it had been taken. As I hurried from the museum onto the curving driveway, I caught a glimpse of that jewel of a building—Lacock Abbey—glowing in the sunlight. I went round the corner, momentarily losing it in a clump of trees, passing a small, surprisingly lifelike statue of an Egyptian sphinx. Round another corner and up a shallow flight of steps leading into a vaulted hall. I couldn’t allow myself to pause to enjoy a notice saying NO PHOTOGRAPHY, here of all places, continuing through the dining room into the abbey’s South Gallery.
It was the smallest center oriel window that provided Fox Talbot’s subject. When I arrived, a lively discussion was under way between the room’s guide and a lone visitor, with the guide pointing out that the gallery was narrow, making it difficult for Fox Talbot to project an image. But the visitor paid little attention, peering instead through the window, taking in the same view as in the photograph, now nearly 200 years on.
“Were those trees there when Fox Talbot took the picture?” he asked the guide. The windowpane had the faded translucency of age, making it difficult to see out. “You see,” he said, “it’s this blob.” We stepped up to the enlarged version of the Fox Talbot negative framed on the gallery wall. Sure enough, I could make out a shape on its right-hand side. My heart leaped.
“I’ve always wondered what the blob on the right is,” said the visitor.
I knew there had never been a blob in the photograph. Until now …
“Someone could have chopped down a tree,” said the guide helpfully.
I approached the negative, restraining myself from touching it and leaving a mark. “Is it a person?” I said. “Could the blob be a person?” As I looked more closely, I could make out some detail. It was surely a bearded man. And in his right hand he was holding (my stomach clenched) a star.
It was going to happen. Here was the evidence I could build my machine to harness the feedback loop and travel back. Desperate to channel my thoughts, I hurried back to the hire car where I’d left my notebook. The exit route took me through a dark part of the old abbey, a dismal, dusty place with empty stone coffins lined up on the floor as if waiting for their future occupants. It was here I began to ask myself whether quantum computers would ever be capable of such sophistication in my lifetime. Had I imagined everything? Instead of heading for the car, I hurried back to the museum … to see again the first photograph in which I’d appeared.
There they stood, as they always had, the group in front of the house. I recognized the woman’s impish smile. But the man next to her, hand on her waist, was nothing like me, but shorter, rounder, clean-shaven.
So each year I return, hoping the latest developments in quantum computing are about to make my kind of time travel possible. I run my simulations and, prompted by my app, time myself and count my steps, walk different routes, pause different lengths of time at each exhibit, yet fail to appear in the images.
But now I’ve found records of an ancestor of mine who lived near Lacock in Fox Talbot’s time, sharing my name and birthday. If I’m in that photograph again, with my hand on the woman’s waist—I’ll have travelled back to become my own great, great, great grandfather. If I fail, I’ll simply cease to exist.
I’ll keep uncertainty from my mind and fix the reality of the future and the past, like Fox Talbot developing a photographic print. I’m sure I can do it. It’s only a matter of time.

Brian Clegg (www.brianclegg.net) is a science writer based in the U.K. His most recent books are Are Numbers Real?, an exploration of the relationship between math and reality, and The Reality Frame, an exploration of relativity and frames of reference.

© 2018 ACM 0001-0782/18/11 $15.00

last byte
From the intersection of computational science and technological speculation, with boundaries limited only by our ability to imagine what could be.
DOI:10.1145/3280370 Brian Clegg
Future Tense
Between the Abbey and the Edge of Time
A photo marks my place, then and now.

It’s my 25th year visiting Lacock Abbey in Wiltshire, England, arriving early as usual. As I sit in the hire car, I replay in my mind the day I realized how I might manipulate time. Ever since my doctorate in astrophysics, I’d been researching the equations of general relativity, focusing on the crossover between relativity and the quantum world. But my true interest was “closed timelike loops”—what some might call a time machine.
It was here, 25 years prior, I had worked out how a quantum computer could use a loophole in Einstein’s gravitational field equations to make time travel possible. I was in England for a conference on quantum gravity and took the opportunity to visit the home of a personal hero, 19th-century photography pioneer William Henry Fox Talbot. Every moment remains etched in my memory and I have tried to repeat them each year since.
The first anniversary I brought a sheaf of printer output from Monte Carlo simulations run on a Thinking Machines CM-5 in Cambridge, permutating the variables I could alter. A step here, a pause there, attempting to recreate what happened that day. Each year since I have tried as many walkthrough variants as I could. And each year I have failed. I need to use the rest of my time keeping my career alive, so I limit myself to the anniversary. Now I’ve written a smartphone app to guide me, but the approach is the same, with each detail recreated as best I can. But this year has to be different.
I began that first visit at the Fox Talbot museum near the abbey. The exhibits led up to the moment on Lake Como in 1834 when Fox Talbot had his inspiration. He enjoyed clever toys, trendy optical devices like the camera lucida for projecting an image onto sketching paper to guide an artist. I’d like to say, cue the light bulb over Fox Talbot’s head, except electric light hadn’t been invented yet. Fox Talbot was studiously tracing an image. Yet he knew that compounds of silver darkened when exposed to light. So why not soak the paper in silver salts and let the drawing produce itself?
In August 1835, Fox Talbot set up a camera obscura, projecting an image of a window in the South Gallery of Lacock Abbey onto treated paper. The result (or at least a copy) is on the wall beside the window where the picture was taken. It’s tiny—only 1.2 inches by one inch—the world’s first photographic negative. Other early processes produced one-off images, but Fox Talbot’s negatives provided unlimited prints. Forget the idea that the Victorian information age started with Charles Babbage’s mechanical computers. What Fox Talbot invented was visual information processing, but his bits were silver crystals, in a mechanism that became the mainstay of photography for over a century and a half.
A display of photographs from Fox Talbot’s time stood near the exit. That was where my time journey began. One showed a group outside a country cottage. Most were stiffly Victorian, but one, a young woman, smiled engagingly at [CONTINUED ON P. 175]

Oriel window in Lacock Abbey photographed by William Fox Talbot in 1835.
Image by William Fox Talbot, from Wikipedia.org CC-PD-Mark

ETRA
ACM SYMPOSIUM ON EYE TRACKING
RESEARCH & APPLICATIONS
June 25-28, 2019

Crowne Plaza
1450 Glenarm Place
Denver, Colorado
The ETRA conference series focuses on eye movement research & applications across a wide range of disciplines including computer science, human-computer interaction, visualization, biomedical research, virtual reality & psychology.
Join us in Denver to celebrate another year of eye tracking research!

Important dates

Papers & Notes
  Dec 14, 2018  Paper abstracts due
  Dec 19, 2018  Long & short papers due
  Jan 23, 2019  Reviews due to authors
  Jan 28, 2019  Rebuttals + revised papers due
  Feb 18, 2019  Final notifications to authors
  Mar 01, 2019  Camera ready papers due

Demo/Video & Doctoral Symposium
  Mar 08, 2019  Extended abstracts due
  Mar 15, 2019  Notifications due to authors
  Mar 22, 2019  Camera ready papers due

Challenge Track
  Mar 15, 2019  Challenge report due
  Mar 29, 2019  Notifications due to authors
  Apr 05, 2019  Camera ready papers due

General Chairs: Bonita Sharif (University of Nebraska, Lincoln) & Krzysztof Krejtz (SWPS University of Social Sciences and Humanities, Poland)
@ETRAConference WEBSITE: http://etra.acm.org

CONFERENCE 4 – 7 December 2018
EXHIBITION 5 – 7 December 2018
Tokyo International Forum, Japan
The 11th ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia
Platinum Sponsor
Gold Sponsors
Bronze Sponsors
Sponsored by Organized by
REGISTER ONLINE TODAY!

sa2018.siggraph.org/registration
