
American Journal of Infection Control 40 (2012) 211-7


journal homepage: www.ajicjournal.org

Major article

Assessment of H1N1 questions and answers posted on the Web


Sujin Kim PhD a,b,c,*, Thomas Pinkerton BA b, Nithya Ganesh MS b

a Division of Biomedical Informatics, College of Public Health, University of Kentucky, Lexington, KY
b School of Library and Information Science, College of Communication and Information Studies, University of Kentucky, Lexington, KY
c Department of Pathology and Laboratory Medicine, School of Medicine, University of Kentucky, Lexington, KY

Key Words:
Influenza A FAQ
H1N1 surveillance
Text mining
PubMed
Influenza pandemic
Medical Internet research
Consumer health information

Background: A novel strain of human influenza A (H1N1) posed a serious pandemic threat worldwide during 2009. The public's fear of pandemic flu often raises awareness and discussion of such events.
Objectives: The goal of this study was to characterize the major topics of H1N1 questions and answers raised by the online question and answer community Yahoo! Answers during the H1N1 outbreak.
Methods: The study used Text Mining for SPSS Clementine (v.12; SPSS Inc., Chicago, IL) to extract the major concepts of the collected Yahoo! questions and answers. The original collections were retrieved using "H1N1" as a search keyword and then filtered for only resolved questions in the health category submitted within the past 2 years.
Results: The most frequently formed categories were as follows: general health (health, disease, medicine, investigation, evidence, problem), flu-specific terms (H1N1, swine, shot, fever, cold, infective, throat), and nonmedical issues (feel, North American, people, child, nations, government, states, help, doubt, emotion). The study found that URL data are fairly predictable: those providing answers are divided between those dedicated to giving trustworthy information (from news organizations and the government, for instance) and those looking to espouse a more biased point of view.
Conclusion: Critical evaluation of online sources should be taught so that people can select quality information and improve their health literacy. The challenges of pandemic prevention and control, therefore, demand both e-surveillance and better-informed Netizens.
Copyright 2012 by the Association for Professionals in Infection Control and Epidemiology, Inc.
Published by Elsevier Inc. All rights reserved.

A novel strain of human influenza A (H1N1) posed a serious pandemic threat worldwide during 2009. The public's fear of pandemic flu often creates voluminous online or offline discussions about the disease, covering laboratory confirmation status, age, relative severity, exposure history, onset of symptoms, and contact history.1 The pandemic response to the novel flu has highlighted the importance of online information dissemination in support of disease control and surveillance. However, the information given by most public institutions is packaged as unidirectional announcements, in the hope of reaching the majority of the public. Online forums, on the other hand, are regarded as highly interactive communication and are populated by people who can both post and answer questions. These communities have formed to share information and to fill knowledge gaps in health matters. Considering the large number of people who use Web resources to seek health information, such an application could be an important vehicle for disseminating information and for interacting with the goal of serving health-related information questions, in the case of this study, about H1N1.2,3 In this context, the primary goal of this study was to characterize the major topics of H1N1 questions and answers raised by the online question and answer community Yahoo! Answers during the 2009 H1N1 outbreak. The following section discusses 2 research streams: influenza information sources and services, and online knowledge sharing through online questions and answers.

* Address correspondence to Sujin Kim, PhD, Division of Biomedical Informatics, College of Public Health, University of Kentucky, 339 Lucille Little Fine Art Library, Lexington, KY 40506-0224.
E-mail address: sujinkim@uky.edu (S. Kim).
Supported in part by a grant (RE-04-08-0069-08) from the Institute of Museum and Library Service (IMLS) and by grant number P20RR-16481 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).
Conflicts of interest: None to report.

0196-6553/$36.00 - Copyright 2012 by the Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
doi:10.1016/j.ajic.2011.03.028

INFLUENZA INFORMATION, SOURCES, AND SERVICES

Frequently, available information on influenza covers conventional topics of infectious disease, with information sources and services running the following gamut: general overviews of the influenza virus, vaccination, assessment, laboratory testing, treatment, infection prevention and control measures, pandemic influenza planning and preparation, human resource issues, work absences, and travel.4 Depending on the organization, one can find information tailored for the general public as well as for health care workers, caregivers, and policy makers, targeted at specific flu-related events and activities such as planning for businesses, the community, and school events, as well as domestic and international travel. The information contained in the popular influenza sources on the Web is prepared to respond to frequently asked questions (FAQs) about vaccines and vaccine development, what to do if one gets flu-like symptoms, how to care for a sick person, what pregnant women should know, community strategy for pandemic influenza mitigation, and national strategy for pandemic influenza.4-6 The FAQs' goal, of course, is to fulfill users' information needs, as well as to provide an outlet for the public to become knowledgeable, at least at a general level, on the subject of pandemic influenza. However, these popular flu Web sites provide no mechanism for searching the FAQ questions, which means users must go through the hyperlinked question list linearly. Although the FAQs, as localized compendiums of influenza resources and services, are intended to help the public as well as health care workers, the original information sourced from the major Web sites is difficult to navigate (to find answers to specific questions) because of its size and lack of search mechanisms.
Social media technologies have been widely studied in several academic disciplines as a new method of disease surveillance, by identifying online communities for targeted pandemic communications. Corley et al, in 2010, identified flu trends posted online and correlated them with the Centers for Disease Control and Prevention (CDC) Influenza-like Illness Surveillance Program (ILINet) data by using text and structural data mining techniques.7 User search keywords sent to the popular search engines Google and Yahoo! were studied to track influenza-like illness (ILI).8,9 Other influenza-related information services have also been investigated by analyzing Web access logs and telephone triage service data to detect ILI symptoms.10,11 All of these studies show evidence of social media technologies being used in health communication. These media have become critical information dissemination vehicles among people seeking influenza information during pandemic outbreaks.
Sharing knowledge online has become a major topic of research in computer science and in other information-intensive domains such as the health sciences and library and information science. Web applications such as Yahoo! Answers are built on the assumption that everyone knows something and that, for the most part, people who know something are willing to share their knowledge with those who seek information. However, the authority behind these answers is a primary concern of trained health information specialists because few respondents are experts on these topics. Some applications address this by providing a function that shows the best answers, as rated by fellow users, to help people identify reliable, high-quality answers. However, this mechanism alone cannot filter the most trustworthy information for online questioners. This is particularly important for serious matters such as health issues because bad information can have serious consequences. Knowledge-sharing applications should implement methods for highlighting the best information in a more objective manner because their biggest challenge is to understand exactly who posts questions and who answers them. Current research is also examining what questions are raised, how these questions are answered, and by whom. For instance, Adamic et al, in 2008, reported that Yahoo! Answers users often ask diverse types of questions and that these types can be predicted by categorizing the questions posted by a questioner.12 Nevertheless, there has so far been a lack of research investigating the users of knowledge-sharing systems and the information they post.
In the field of library and information science, question asking has been studied as a way of understanding library users' information needs to facilitate reference services, and it has also been employed in designing information retrieval systems.13 Library and information science researchers have heavily studied the characteristics of reference questions, which in turn facilitate the interaction between human searchers and bibliographic retrieval systems. Within the context of information retrieval studies, questions are expressed as a few search keywords, a manifestation of the queries searchers submit to retrieve a set of relevant bibliographic records (eg, articles, books, and others). Professional searchers are trained to analyze the questions posed by end-users to construct efficient search strategies, by first identifying major topics and then determining a set of subheadings. Among the reasons people pose questions online are to obtain timely information, to write and read compiled or multiple messages in one place, to communicate with others who are seeking the same information, to expand their knowledge, to validate information received from other sources, and to prepare background information before visiting a health care provider. Apart from medical knowledge, some people simply seek emotional support and encouragement to cope with the disease.14

METHODS
Research questions (RQ)
This study investigates the following 3 research questions to understand the questions and answers posted by the public in search of H1N1-related information through the online question/answer discussion service Yahoo! Answers.
• RQ1: What are the topical characteristics of questions posted by questioners on Yahoo! Answers?
• RQ2: What are the major resources referenced in the answers gathered from respondents on Yahoo! Answers?
• RQ3: What are the results of mapping the extracted concepts and descriptors into the vocabularies of the Unified Medical Language System (UMLS) Metathesaurus?

Data collection
The original collection of questions and answers was retrieved by a keyword search on "H1N1" and then filtered for only resolved questions in the health category submitted within the past 2 years. Data collection was completed on April 30, 2010. The answer set included only the best answer selected by the questioner. The original collection from Yahoo! Answers comprised 6,578 entries; however, it contained many items that were either outside the scope of this study or duplicates. The entries were proofread to make the text mining process more efficient: excess punctuation was removed, and major headings such as H1N1 were normalized (outliers such as "HINI" were edited). Next, the list was narrowed further by eliminating duplicate questions, which reduced the question and answer collection to be examined from 6,578 to 5,400 entries.
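The cleaning steps described above (filtering, normalizing variant spellings such as "HINI", removing excess punctuation, and dropping duplicate questions) can be sketched in Python as follows. This is a minimal illustration under assumed conditions, not the procedure the study ran: the record fields (`question`, `answer`, `resolved`, `category`, `date`), the category label "Health", and the 730-day approximation of "the past 2 years" are all hypothetical.

```python
import re
from datetime import date, timedelta

# Hypothetical record fields and category label; the study's export format is not described.
CUTOFF = date(2010, 4, 30)      # date data collection was completed
WINDOW = timedelta(days=730)    # "within the past 2 years", approximated as 730 days

# Matches H1N1 and misspellings such as "HINI", "h1n1", or "H1 N1".
H1N1_VARIANTS = re.compile(r"\bh[1i]\s*n[1i]\b", re.IGNORECASE)
# Runs of repeated punctuation, e.g. "???" or "!!!".
REPEATED_PUNCT = re.compile(r"([!?.,])\1+")


def normalize(text: str) -> str:
    """Normalize H1N1 variants, then collapse repeated punctuation and whitespace."""
    text = H1N1_VARIANTS.sub("H1N1", text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    return " ".join(text.split())


def preprocess(records: list[dict]) -> list[dict]:
    """Keep resolved Health questions inside the time window, normalized and deduplicated."""
    kept, seen = [], set()
    for rec in records:
        if not rec["resolved"] or rec["category"] != "Health":
            continue
        if not CUTOFF - WINDOW <= rec["date"] <= CUTOFF:
            continue
        question = normalize(rec["question"])
        key = question.lower()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        kept.append({**rec, "question": question, "answer": normalize(rec["answer"])})
    return kept
```

Deduplicating on the normalized, lower-cased question text is a deliberately simple choice; near-duplicates that differ in wording would need a fuzzier comparison.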


Data analysis
The study used Text Mining for SPSS Clementine (v.12; SPSS Inc., Chicago, IL) to extract the major topical categories of the collected Yahoo! questions.15 The essential topics of each question posting were identified to explore the applicability of PubMed subject searching to H1N1 online postings, as per research question 3. Using Clementine, the subject terms were chosen from each of 2 English-language Clementine resource templates: the basic resource template and the Medical Subject Headings (MeSH) resource template, which draws on the vocabulary used for indexing the MEDLINE literature. SPSS Clementine resource templates are sets of specialized libraries made up of dictionaries used to define and manage types, terms, synonyms, and exclude lists. This program was chosen to classify the collected question and answer posts by automatically generated thesauri because it offers the MeSH resource template. Whereas the MeSH resource template is specialized for the medical field, the basic resource template was used to extract concepts from a general-domain textual corpus. The study reviewed similarities and differences between the generated concept terms, providing useful information that may be considered when developing a suggested terminology for PubMed searching. In addition, a set of core concepts and descriptors processed through Clementine was constructed from both the question and answer collections and automatically mapped into the UMLS Metathesaurus through the MetaMap Transfer (MMTx) engine developed by the US National Library of Medicine (NLM, Bethesda, MD). To characterize the contextual mapping between words in the Yahoo! question and answer collection and the UMLS vocabularies, only perfect matches by semantic type were included in the data analyses. As shown by previous studies on the effectiveness of mapping, the MMTx engine can also enhance searching.16,17
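The resource templates are described above as dictionaries that define types, terms, synonyms, and exclude lists. The toy class below sketches how such a dictionary-driven extractor might behave and how concept frequencies could then be counted across postings. It is not Clementine's API or algorithm; the class name, example terms, and type labels are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ResourceTemplate:
    """Hypothetical stand-in for a resource template: typed terms, synonyms, exclude list."""
    types: dict[str, str]                                   # canonical term -> type
    synonyms: dict[str, str] = field(default_factory=dict)  # variant -> canonical term
    exclude: set[str] = field(default_factory=set)          # terms never reported

    def extract(self, text: str) -> set[tuple[str, str]]:
        """Return the (concept, type) pairs found in text."""
        text = text.lower()
        # Rewrite synonyms to their canonical terms before matching.
        for variant, canonical in self.synonyms.items():
            text = re.sub(rf"\b{re.escape(variant)}\b", lambda _m, c=canonical: c, text)
        return {
            (term, kind)
            for term, kind in self.types.items()
            if term not in self.exclude and re.search(rf"\b{re.escape(term)}\b", text)
        }


def document_frequency(template: ResourceTemplate, documents: list[str]) -> Counter:
    """Count how many documents mention each concept at least once."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(concept for concept, _ in template.extract(doc))
    return counts
```

For example, with "temperature" declared a synonym of "fever" and "help" on the exclude list, the posting "High temperature and cough, help!" yields only the two symptom concepts.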
RESULTS
Topical characteristics of the collected questions: RQ1
The first research question sought to describe the topical characteristics of the collected questions, to give an overview of what people post about H1N1 online. The study extracted major concepts to form the major topical contexts of the posted questions. The collected questions (n = 5,400) were processed using SPSS Clementine (v.12) and formed 50 major categories based on 5,000 extracted concepts. The descriptors that define the formed categories are combinations of the concepts and the vocabularies defined in the resources. The result is shown in Table 1.
The most frequently formed categories among the 4 analyses are key terms about general health (health, disease, medicine, investigation, evidence, problem), flu-specific issues (H1N1, swine, shot, fever, cold, infective, throat), and nonmedical issues (feel, North American, people, child, nations, government, states, help, doubt, emotion). Interestingly, general health categories such as disease and health are characterized by descriptors that lay out overall health matters: severe health conditions, health officials, health insurance plan, health complications, health care worker, health care system, global pandemic health crisis, contagious diseases, community health, chronic illnesses, alternative health, and others. The flu-specific issues are formed in categories such as H1N1, swine, swine flu, pandemic, immunization, throat, and sore, which reflect concerns about the disease, symptoms, body parts, virus, and treatment of flu-specific matters. These findings indicate that people are concerned about overall issues of general health care for preventing H1N1 as well as seeking help on caring for the sick. The majority of the topics formed in the categories correspond to conventional clinical question types, including disease, therapy, symptoms, prognosis, prevention and control, and etiology, which can be refined into H1N1 queries in the medical literature database PubMed. Compared with the categories formed by the MeSH resource, the basic resource generates interesting categories such as feel, doubt, emotion, help, learning, and reply, which display the real frustration of people who seek out flu information. This finding implies that people who post questions about H1N1 are seeking not only medical information but also emotional support to cope with the disease.
Nonconventional questions, ones that are difficult to answer using conventional bibliographic sources, were detected in this analysis. These include concepts such as units of measure, date, time, currency, and percentage; these numeric, data-driven concepts, which could also include mortality rate, drug dosage, and prevalence of the disease, indicate that online questioners want instant, timely answers to factual questions. For example, the sample question "Can't figure out how much Thimerosal is contained in a child's dosage of flu vaccine from the CDC website?" requests a numerical answer, which is not directly answerable through a conventional literature database. These concepts were extracted based on the default types in Clementine's resource dictionary (type refers to a semantic type, a category into which Clementine divides the category results). The detailed result in table format can be requested from the corresponding author of this article. The MeSH resource exclusively identified gene and term types, which represent medical concept types in the question and answer sets, but the results were not comprehensive. Only 15.30% (n = 768) of the total concepts (n = 5,000) generated were mapped to MeSH terms, which calls for further MeSH mapping to investigate how lay terminologies can be mapped into professional medical terminologies and thereby facilitate PubMed searches for further information on H1N1. The findings of the MetaMap Transfer engine are addressed under research question 3 in the following section. The types, such as URL, organization, person, and location, show the suggested or referenced sources for H1N1 issues. The list of referenced sources can be used to detect where people go for further information on H1N1 or where they have been prior to asking a question. In this analysis, it was found that people suggested credible organizations such as the World Health Organization, the CDC, and Flu.gov. This finding also implies that health information agencies or libraries should detect the sources where people access noncredible H1N1 information. In addition, the motives for accessing this information are important so that medical information brokers can provide counterarguments to the untrustworthy information. Finally of note, the product type identified H1N1-related drugs and vaccines about which people are concerned with regard to safety, efficacy, interactions, toxicity, and cost or benefit.
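In the study, Clementine's built-in types flagged these numeric concepts. As a rough, hypothetical approximation of the idea, simple regular-expression cues can flag questions that ask for a number rather than a literature reference. The cue names and patterns below are illustrative assumptions, not the software's type definitions.

```python
import re

# Hypothetical cue patterns approximating "numeric, data-driven" concepts.
NUMERIC_CUES = {
    "quantity_question": re.compile(r"\bhow (?:much|many)\b", re.IGNORECASE),
    "rate_or_percentage": re.compile(r"\b(?:rate|percent(?:age)?)\b|\d+(?:\.\d+)?\s?%", re.IGNORECASE),
    "dosage_or_unit": re.compile(r"\b(?:dos(?:e|age)|\d+(?:\.\d+)?\s?(?:mg|mcg|ml))\b", re.IGNORECASE),
    "currency": re.compile(r"\$\s?\d|\b(?:cost|price)s?\b", re.IGNORECASE),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def numeric_cues(question: str) -> list[str]:
    """Return the names of the cue patterns a question matches, in definition order."""
    return [name for name, pattern in NUMERIC_CUES.items() if pattern.search(question)]
```

Applied to the Thimerosal question quoted above, the sketch reports a quantity cue ("how much") and a dosage cue; a question such as "Can I keep my pets if I am sick?" matches nothing.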
Interestingly, the concepts extracted from the questions and answers represent major symptoms of H1N1 and preventive actions that may be taken by users. The majority of symptoms identified in the analysis align with what health agencies tell the public and health care workers to use in making a diagnosis. These include ache, cough, pain, fever, chill, runny nose, diarrhea, soreness, congestion, nausea, stuffiness, fatigue, sweating, shortness of breath, sneezing, drowsiness, puking, swelling, migraine, hot flashes, dizziness, tiredness, stiffness, dehydration, and thirstiness. This is important not only in confirming identified symptoms of H1N1 but also in detecting abnormal symptoms that may correlate with novel or mutated viral infections. The suggested actions to prevent the spread of H1N1 are distributed by major health care agencies; this analysis indicates that people are well aware of the primary prevention tips: cover (your mouth while coughing), wash (your hands), (use) sanitizers or masks, stay (home, if you're sick), drink water, and others. Although it is outside the scope of the study, other concepts found in this study could be correlated with other variables (eg, symptoms by organ by geographic location) to detect abnormal cases that are not discovered through conventional surveillance reporting mechanisms.

Table 1
Top 25 major categories identified in the question collection using basic resources and MeSH resources

| Rank | Basic resource category | n | % | MeSH category | n | % |
|---|---|---|---|---|---|---|
| 1 | Health | 3,011 | 55.76 | Disease | 2,966 | 54.93 |
| 2 | H1N1 | 1,491 | 27.61 | H1N1 | 1,456 | 26.96 |
| 3 | Protection | 1,222 | 22.63 | Immunization | 983 | 18.20 |
| 4 | People | 860 | 15.93 | People | 853 | 15.80 |
| 5 | Shot | 783 | 14.50 | Shot | 740 | 13.70 |
| 6 | Evidence | 771 | 14.28 | Symptoms | 702 | 13.00 |
| 7 | Learning | 735 | 13.61 | Feel | 678 | 12.56 |
| 8 | Feel | 672 | 12.44 | Fever | 639 | 11.83 |
| 9 | Kids | 635 | 11.76 | Pharynx | 569 | 10.54 |
| 10 | Doctor | 634 | 11.74 | Pain | 540 | 10.00 |
| 11 | Fever | 625 | 11.57 | Cough | 492 | 9.11 |
| 12 | Virus | 611 | 11.31 | Virus | 460 | 8.52 |
| 13 | Infective | 563 | 10.43 | Nose | 453 | 8.39 |
| 14 | Help | 556 | 10.30 | Child | 370 | 6.85 |
| 15 | Pain | 550 | 10.19 | Swine | 339 | 6.28 |
| 16 | Breath | 521 | 9.65 | Medicine | 331 | 6.13 |
| 17 | Details | 519 | 9.61 | Health | 295 | 5.46 |
| 18 | Time | 459 | 8.50 | Investigation | 267 | 4.94 |
| 19 | Cold | 410 | 7.59 | Pharmaceutical preparation | 216 | 4.00 |
| 20 | Doubt | 397 | 7.35 | States | 186 | 3.44 |
| 21 | Child | 363 | 6.72 | Sore | 145 | 2.69 |
| 22 | Parents | 361 | 6.69 | Government | 129 | 2.39 |
| 23 | Organism | 345 | 6.39 | Problem | 127 | 2.35 |
| 24 | Medicine | 338 | 6.26 | Control | 110 | 2.04 |
| 25 | North America | 338 | 6.26 | World | 100 | 1.85 |

MeSH, Medical Subject Headings; n, number of questions in the category; %, percentage of the 5,400 collected questions.
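Each percentage in Table 1 is the category count divided by the 5,400 collected questions; for Health, for instance, 3,011 / 5,400 × 100 ≈ 55.76. The short check below reproduces a few rows. Because a single question can contribute concepts to several categories, each column sums to well over 100%.

```python
TOTAL_QUESTIONS = 5_400  # questions remaining after duplicate removal


def share(n: int, total: int = TOTAL_QUESTIONS) -> float:
    """Percentage of all collected questions, rounded to 2 decimals as in Table 1."""
    return round(100 * n / total, 2)


# A few rows transcribed from Table 1: category -> (n, reported %).
BASIC = {"Health": (3011, 55.76), "H1N1": (1491, 27.61), "Parents": (361, 6.69)}
MESH = {"Disease": (2966, 54.93), "Pharynx": (569, 10.54), "World": (100, 1.85)}

for table in (BASIC, MESH):
    for category, (n, reported) in table.items():
        assert share(n) == reported, f"{category}: {share(n)} != {reported}"
```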
Flu-specific topics identified in the question set
The study categorized the identified questions into flu-specific categories, including disease/symptoms (n = 2,014, 75.69%), special focus group (n = 930, 34.95%), influenza vaccines (n = 837, 31.45%), influenza viruses (n = 388, 14.58%), international issues (n = 352, 13.23%), planning/response (n = 315, 11.84%), animals/birds/pets (n = 191, 7.18%), workplace issues (n = 140, 5.26%), air/food/water (n = 118, 4.43%), infection (n = 105, 3.95%), travel (n = 53, 1.99%), disease outbreak (n = 53, 1.99%), Medicare/Medicaid (n = 38, 1.43%), and supply/distribution (n = 6, 0.23%). These flu-topic categories are modified from the question categories used on the Flu.gov FAQ site (Flu.gov, 2010). Table 2 lists the number of questions per category with a representative sample question. Questions can be assigned to more than 1 category to represent complex queries. The finding implies that categorizing the questions into representative groups can facilitate advanced searching techniques using MeSH subheadings, special topic queries, or tags. This leads to the discussion of research question 3, where the study transfers the lay terminologies into MeSH vocabularies for PubMed searching.
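Because a question may belong to more than 1 category, the topic percentages add up to more than 100%. The sketch below illustrates such multi-label assignment with simple keyword rules. The keyword lists are hypothetical placeholders (the text does not state how questions were assigned), and the denominator is assumed to be the number of questions that received at least one topic.

```python
from collections import Counter

# Hypothetical keyword rules for a few Table 2 topics; placeholders, not the study's rules.
TOPIC_KEYWORDS = {
    "Diseases/Symptoms": ("symptom", "fever", "cough", "sore throat"),
    "Special Focus Group": ("pregnant", "baby", "babies", "child", "nurse"),
    "Influenza Vaccines": ("vaccine", "vaccinat"),
    "Travel": ("travel",),
}


def assign_topics(question: str) -> set[str]:
    """Return every topic whose keywords occur in the question (multi-label)."""
    text = question.lower()
    return {topic for topic, words in TOPIC_KEYWORDS.items() if any(w in text for w in words)}


def topic_shares(questions: list[str]) -> tuple[Counter, dict[str, float]]:
    """Count questions per topic and express each count as % of categorized questions."""
    counts: Counter = Counter()
    categorized = 0
    for question in questions:
        topics = assign_topics(question)
        if topics:
            categorized += 1
            counts.update(topics)
    shares = {t: round(100 * n / categorized, 2) for t, n in counts.items()} if categorized else {}
    return counts, shares
```

With the sample question "Should pregnant woman take the H1N1 vaccine or not?", both the special focus group and the vaccine topic fire, which is exactly the overlap that pushes the column total past 100%.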
Major resources referenced in the answer collection: RQ2
The second research question sought to describe the major sources referenced in the collected answers, to give an overview of which sources people refer to when informing peer questioners about H1N1. Type refers to a semantic type, a category into which Clementine divides the category results. A basic breakdown of which types were returned, and of how many potential H1N1 sources each type produced, was identified in both the question and answer sets. First, the study used the identified list of URLs to discuss potential H1N1 resources that people refer to for further information. In the URL analysis, the URLs used in the Yahoo! Answers questions break down into .com (unique = 15, total = 41), .org (unique = 6, total = 8), and .gov (unique = 2, total = 4). The study assumes that if someone includes a URL in their question, they have already visited it and either used it as a background source for their query or are asking others to validate the information for them. A majority of the listed URLs are from the .com domain, denoting commercial sites. This does not immediately discredit them because 3 of those sites are news agencies. However, it does illustrate that people look primarily to commercial sources for their information on H1N1. Similarly, of the 6 .org sites, 5 are under the main domain en.wikipedia.org, a general reference site. Not only was Wikipedia the most common .org site, but YouTube, a popular social media video-sharing site, was the most common .com. There were 22 instances of Youtube.com as the domain, with the second most common domain having only 4 instances. All of this suggests that Yahoo! Answers users likely turn to social media for much of their information.
In the answer collection, the URL types break down as follows: .com (unique = 89, total = 422), .org (unique = 15, total = 73), and .gov (unique = 39, total = 365). This is a much more heartening set of results than that of the question set. References to .gov sites (of which the CDC and the US Food and Drug Administration are most represented) are almost as common as references to .com sites. Within the .com extension, there are now 28 unique URLs for news sites. YouTube is cited 39 times, which is a large number but not in proportion to the total number of URLs (which increased 10-fold within .com alone). Yahoo! Answers itself is referenced 51 times; many of these refer back to other answers given. This is common behavior in social media: referring to preferred sources that have already been given within the system. Eight of the unique entries are under en.wikipedia.org, a trend consistent with the question URLs. As in the question set, social media is prevalent. However, the most commonly cited unique URL is at cdc.gov, which is a good sign: it shows that the most trustworthy sites are visited frequently. One final finding of note is that there were no .edu Web sites in either set, even though much research on the H1N1 epidemic has been conducted at universities.
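The unique versus total counts per domain type reported above can be reproduced mechanically once URLs are pulled out of the posts. The sketch below is a hypothetical illustration, not the study's tooling: it only recognizes links that start with http:// or https:// (bare mentions such as "cdc.gov" would be missed), and it treats two URLs as the same only when their text is identical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Rough URL pattern; real posts would need more careful tokenization.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
TLDS = (".com", ".org", ".gov", ".edu")


def url_breakdown(posts: list[str]) -> dict[str, tuple[int, int]]:
    """Map each top-level domain to (unique URLs, total URL occurrences)."""
    unique: dict[str, set[str]] = defaultdict(set)
    total: dict[str, int] = defaultdict(int)
    for post in posts:
        for url in URL_RE.findall(post):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            host = (urlsplit(url).hostname or "").lower()
            tld = next((t for t in TLDS if host.endswith(t)), "other")
            unique[tld].add(url)
            total[tld] += 1
    return {tld: (len(unique[tld]), total[tld]) for tld in total}
```

A domain type that never occurs, such as .edu in this study's data, simply does not appear among the returned keys.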
For the organization type, the concepts were divided into different categories to illustrate what types of sources they might be: corporation, government, hospital, news, pharmaceutical, and university. Of note was the lack of entries classified as hospital concepts in MeSH. The MeSH dictionary simply did not consider hospital to be a valid semantic type, although it might be an important source of information, whereas the basic resources (a nonmedical template) did. In addition, the sharp increase in pharmaceuticals between the question and answer sets is noteworthy. This is most likely due to responses to queries about how to treat H1N1 flu, or to information about the companies that produced medicines or vaccines, for the curious or cautious.
The concepts categorized in the location type were divided on a simple geographic basis, ie, city, state, province, region, or country, and there was a very even distribution of occurrences among them. There is slightly higher interest in countries and cities in the question set and slightly higher interest in states and countries in the answer set, but nothing is especially noteworthy; questioners are simply more likely to ask about their specific area, and answerers are likely to give answers based on the information available, which is generally no more specific than the state level. This does, however, display an opportunity for information-giving entities to target more specific concerns (if they are able) because that is the most desired information. The complete list of the URL analysis can be found at the supplementary study link.

Table 2
Flu-specific topics identified in the question set

| Flu topics | No. of questions | % | Sample questions |
|---|---|---|---|
| Diseases/Symptoms | 2,014 | 75.69 | I have flu-like symptoms, such as a sore throat and a runny nose, and wonder whether the symptoms are related to a cold or flu or something else more severe? Also, what could you tell me about flu-like symptoms in babies? |
| Special Focus Group | 930 | 34.95 | Should pregnant woman take the H1N1 vaccine or not? I am a pediatric nurse and wonder if I am forced to get vaccinated or not to keep my job. The reason I am asking because I am hesitant to get vaccinated due to the newness of the H1N1 vaccine. |
| Influenza Vaccines | 837 | 31.45 | Is it okay to receive vaccination even though I am allergic to eggs? |
| Influenza Viruses | 388 | 14.58 | What are the general characteristics of the H1N1 virus compared to the previously known influenza virus? Can we protect newborn babies from the influenza virus by breastfeeding? |
| International Issues | 352 | 13.23 | If my country borders those countries with an outbreak, how quickly will we receive the outbreak status? What are the roles of WHO or individual governments who are responsible for preventing the influenza pandemic? |
| Planning/Response | 315 | 11.84 | Why is there such a late response in delivering the flu vaccine to our local community? Are vaccines for Novel H1N1 available through my local university? |
| Animals/Birds/Pets | 191 | 7.18 | Can I keep my pets if I am sick? If I am sick with flu-like symptoms, how can I keep my pets from getting sick too? |
| Work Place Issues | 140 | 5.26 | If I miss work due to H1N1, how can I make up my missing work? Who has flexible leave policies or alternate work schedules during a flu outbreak and why? |
| Air/Food/Water | 118 | 4.43 | Could a sick cook transmit the flu virus if he is not wearing a surgical mask? |
| Infection | 105 | 3.95 | Is Swine flu infection correlated with H1N1 or not? |
| Travel | 53 | 1.99 | Is there any travel restrictions to Korea or Japan? How should I prepare when I travel outside the US? |
| Disease Outbreak | 53 | 1.99 | What was the mortality rate from the flu in 1975? How many H1N1 virus outbreaks occurred in 2010 in Kentucky? |
| Medicare/Medicaid | 38 | 1.43 | Is it true that Medicare pays for seniors to get the seasonal flu vaccine? |
| Supply/Distribution | 6 | 0.23 | What is the government's role as vaccine price-setter and production manager? |
The study findings indicate a trend toward social media for information, as well as a desire for focused, local information. Information seekers come to the site with a general idea of what they want and, on the whole, are given more focused data. Because many of the answers point toward official resources, it may be necessary to improve the penetration of official resources on the Internet so that fewer people turn first to unofficial, unreliable information.
Mapping results of the extracted concepts and descriptors: RQ3
The third research question sought to map the discovered H1N1 questions to an official medical literature source, PubMed. First, the study investigated how many of the concepts and descriptors extracted from laypersons' questions are transferred into official UMLS vocabularies, broken down by questions and answers and by the 2 resources processed. Table 3 shows the total number of mapping results by UMLS semantic type among 6 data sets (eg, questions vs answers; basic resources vs MeSH resources; descriptors vs concepts). The findings show that terminologies in online H1N1 questions can be transferred to NLM's official vocabularies to discover further evidence on H1N1 in the scientific literature. Corresponding to the major categories generated by Clementine, the mapping results show that disease or syndrome, pharmacologic substance, sign or symptom, therapeutic or preventive procedure, immunologic factor, body substance, body part, organ or organ component, virus, and others are frequently mapped semantic types in the UMLS vocabularies. These results indicate that MMTx can be used to identify appropriate medical terminologies, extracted from online question and answer discussions, for further medical literature retrieval.
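Tallying mapping results by semantic type, while keeping only perfect matches as the Methods describe, amounts to a filtered count. The sketch below assumes a simplified, hypothetical record for each mapping, with a match score on a 0 to 1000 scale where 1000 is treated as a perfect match; it does not reproduce MMTx's actual output format, and the example records are invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical, simplified record of one term-to-UMLS mapping."""
    term: str           # concept or descriptor extracted by text mining
    umls_name: str      # preferred name of the matched UMLS concept
    semantic_type: str  # UMLS semantic type of that concept
    score: int          # match score; 1000 is assumed to denote a perfect match


def semantic_type_tally(mappings: list[Mapping], perfect: int = 1000) -> Counter:
    """Count semantic types over perfect matches only."""
    return Counter(m.semantic_type for m in mappings if m.score >= perfect)
```

Running the tally separately on each data set (questions vs answers, basic vs MeSH resources, descriptors vs concepts) would give one column per data set in a summary like Table 3.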
Second, a PubMed search was performed to survey the number of available publications on the major H1N1 topics. (The PubMed search was performed on October 14, 2010. The number of PubMed articles for each search topic is reported in 3 separate columns: all, review, and free-text articles. The results of the PubMed search on H1N1 using MeSH subheadings, tags, limits, and filters can be found at: http://www.uky.edu/wskim3/h1n1study.html.) The search topics were chosen from the major topic types, including disease/symptoms, vaccines, virus, disease outbreak, planning and response, and others. Search options built into MeSH/PubMed, such as subheadings, tags, and topic-specific queries, were used to refine the search statements on the H1N1-specific topics.
The PubMed results in this analysis support the findings on the major H1N1 topics identified in research question 1. The disease-specific PubMed search is consistent with the interests of Yahoo! questioners, whose major topics were H1N1 symptoms and therapy. Influenza-specific topics such as outbreak, infection, vaccines, viruses, and prevention and control were also found to be a major focus of H1N1 studies published in PubMed. Questions about special groups, such as children, pregnant women, and elderly individuals, were frequent in the question set, and searches on specific age and gender groups likewise returned a heavy volume of publications compared with other demographic groups. Using MeSH subheadings, the study was able to discover publications on workforce issues, organization and administration, statistics and numerical data, supply and distribution, trends, utilization, and legislation and jurisprudence. These publications might not have been discoverable had the subheadings not been applied in addition to the major topics. Thus, the online questioners were interested not only in medical issues but also in general H1N1 issues, such as local policies, legal issues, trends, economic impact, and vaccine supply and distribution, all of which can be explored further in PubMed.

Table 3
Mapping results using NLM's MetaMap transfer engine: MMTx

Semantic type | Questions: Basic | Questions: MeSH | Answers: Basic | Answers: MeSH | Total: Questions | Total: Answers | Total: Basic | Total: MeSH | Total: All
Disease or syndrome | 106 | 119 | 108 | 95 | 225 | 203 | 214 | 214 | 428
Pharmacologic substance | 70 | 76 | 103 | 102 | 146 | 205 | 173 | 178 | 351
Intellectual product | 143 | 160 | 119 | 132 | 303 | 251 | 262 | 292 | 554
Functional concept | 54 | 62 | 80 | 74 | 116 | 154 | 134 | 136 | 270
Sign or symptom | 78 | 86 | 58 | 64 | 164 | 122 | 136 | 150 | 286
Manufactured object | 64 | 79 | 65 | 65 | 143 | 130 | 129 | 144 | 273
Quantitative concept | 76 | 77 | 72 | 71 | 153 | 143 | 148 | 148 | 296
Qualitative concept | 76 | 87 | 59 | 59 | 163 | 118 | 135 | 146 | 281
Geographic area | 96 | 64 | 105 | 56 | 160 | 161 | 201 | 120 | 321
Therapeutic or preventive procedure | 25 | 35 | 49 | 54 | 60 | 103 | 74 | 89 | 163
Mammal | 42 | 51 | 27 | 27 | 93 | 54 | 69 | 78 | 147
Finding | 72 | 82 | 50 | 46 | 154 | 96 | 122 | 128 | 250
Organic chemical | 46 | 55 | 55 | 52 | 101 | 107 | 101 | 107 | 208
Professional or occupational group | 40 | 47 | 41 | 45 | 87 | 86 | 81 | 92 | 173
Immunologic factor | 34 | 36 | 42 | 43 | 70 | 85 | 76 | 79 | 155
Body substance | 23 | 24 | 21 | 22 | 47 | 43 | 44 | 46 | 90
Body part, organ, or organ component | 47 | 51 | 42 | 39 | 98 | 81 | 89 | 90 | 179
Food | 20 | 22 | 22 | 21 | 42 | 43 | 42 | 43 | 85
Virus | 7 | 8 | 18 | 18 | 15 | 36 | 25 | 26 | 51
Population group | 46 | 53 | 38 | 34 | 99 | 72 | 84 | 87 | 171
Amino acid, peptide, or protein | 16 | 22 | 24 | 31 | 38 | 55 | 40 | 53 | 93
Idea or concept | 110 | 126 | 92 | 90 | 236 | 182 | 202 | 216 | 418
Biologically active substance | 6 | 12 | 14 | 17 | 18 | 31 | 20 | 29 | 49
Health care activity | 18 | 18 | 27 | 30 | 36 | 57 | 45 | 48 | 93
NOTE. Some cells were omitted to fit the page limit. The complete listing is available at: http://www.uky.edu/wskim3/h1n1study.html.
NLM, US National Library of Medicine; MeSH, Medical Subject Headings; MMTx, MetaMap Transfer.
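The Total columns of Table 3 are sums of the four base cells. For questions, Basic + MeSH gives the Questions total. For the basic resource, questions + answers gives the Basic total, and the same pattern holds for the other totals. The short Python check below recomputes the totals for three rows copied from the table. It is an illustrative consistency check, not part of the original analysis.

```python
# (Questions Basic, Questions MeSH, Answers Basic, Answers MeSH) paired with the
# reported totals (Questions, Answers, Basic, MeSH, All), copied from Table 3.
ROWS = {
    "Disease or syndrome": ((106, 119, 108, 95), (225, 203, 214, 214, 428)),
    "Geographic area": ((96, 64, 105, 56), (160, 161, 201, 120, 321)),
    "Virus": ((7, 8, 18, 18), (15, 36, 25, 26, 51)),
}


def derive_totals(qb, qm, ab, am):
    """Recompute the five Total columns from the four base cells of one row."""
    return (qb + qm, ab + am, qb + ab, qm + am, qb + qm + ab + am)


for name, (base, reported) in ROWS.items():
    derived = derive_totals(*base)
    status = "ok" if derived == reported else f"mismatch {derived}"
    print(f"{name}: {status}")
```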

DISCUSSION
Epidemics create public fear worldwide, and the H1N1 influenza outbreak was no different. People expressed their concerns online by writing about H1N1. Analyzing what people post and answer about H1N1 can characterize the public's information needs, and checking the sources they cite can serve as a novel surveillance mechanism for pandemic diseases. This section discusses the major findings of the study in relation to the major public health challenges of the H1N1 influenza pandemic.
First, the study found that the major topical categories of the H1N1 questions posted on Yahoo! Answers included general disease information, symptoms, therapy, influenza viruses and vaccines, infection and disease outbreak, planning and response, prevention and control, and workforce and travel issues. Given the general typology of clinical question types, this result is not surprising. It suggests that any official site responsible for posting H1N1-specific information should consider organizing its FAQs into clusters that follow the identified question categories, so that the public can use them efficiently. The study provides no definitive data on whether H1N1 information should be presented separately for special audiences such as patients, caregivers, pregnant women, mothers, health care workers, policy makers, or the general public. Even so, it is advisable to categorize H1N1 information for particular user groups based on their roles at work or at home.
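One simple way an official site might begin sorting incoming questions into the FAQ clusters listed above is keyword matching. The Python sketch below is a hypothetical illustration. The category names follow the topics identified in the study, but the keyword sets are assumptions that would need to be derived from real data. The study itself used Clementine's concept extraction, not this rule set.

```python
import re

# Category names follow the study's topic categories; the keyword sets are
# hypothetical placeholders that would need to be derived from real questions.
FAQ_CATEGORIES = {
    "Symptoms": {"fever", "cough", "throat", "symptom", "symptoms", "ache"},
    "Therapy": {"tamiflu", "treatment", "medicine", "antiviral"},
    "Vaccines": {"vaccine", "vaccines", "shot", "vaccination"},
    "Prevention and control": {"mask", "masks", "wash", "sanitizer", "prevent"},
    "Travel and workforce": {"travel", "flight", "trip", "work", "job"},
}
DEFAULT_CATEGORY = "General disease information"

WORD = re.compile(r"[a-z]+")


def categorize(question):
    """Return matching FAQ categories, highest keyword overlap first."""
    words = set(WORD.findall(question.lower()))
    scores = {cat: len(words & keywords) for cat, keywords in FAQ_CATEGORIES.items()}
    matched = sorted((cat for cat, s in scores.items() if s > 0), key=lambda c: -scores[c])
    return matched or [DEFAULT_CATEGORY]
```

A question can fall into several clusters at once. For example, a question about travelling with a fever matches both the symptom and the travel categories, which suggests that cross-links between FAQ clusters would be useful.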
Second, the study found some interesting concepts in quantitative or numeric form (eg, unit measures, percentages, currency, dates, times), which are normally ignored in automatic text processing because they lack semantic value. In this data set, however, these numeric values turned out to be drug dosages, physical descriptions of people with ILI, vaccine costs, and disease prevalence. Such quantitative data, relevant to factual questions, are not served efficiently by the conventional medical literature. PubMed offers subheadings, denoted here by their abbreviations in parentheses, that can limit search results to statistics and numerical data (SN) or supply and distribution (SD). However, the published articles are too comprehensive or too difficult for laypersons to follow. Task forces use systematically collected surveillance data to quantify severity accurately, to determine vaccination status, to quantify the effectiveness of interventions, and to capture the full impact of pandemic mortality.1 These data are also in heavy demand by both policy makers and the general public for better decision making and information seeking. The study recommends more rigorous investigation of quantitative data posted online for use in Web surveillance of H1N1.
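Numeric expressions of the kind described above (dosages, temperatures, percentages, costs, dates) can be pulled out of posts with pattern matching before concept extraction discards them. The following is a minimal Python sketch using assumed, deliberately simple patterns. The category names and regular expressions are hypothetical and would miss many real-world formats.

```python
import re

# Hypothetical, deliberately simple patterns for a few numeric forms found in posts.
PATTERNS = {
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml)\b", re.IGNORECASE),
    "temperature": re.compile(r"\b\d{2,3}(?:\.\d)?\s?(?:F|degrees)\b"),
    "percent": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "currency": re.compile(r"\$\d+(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def extract_numeric(text):
    """Return {form: [matched strings]} for each pattern that matches text."""
    found = {}
    for form, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            found[form] = hits
    return found


if __name__ == "__main__":
    post = ("My son had a 102.5 F fever; Tamiflu 75 mg twice a day. "
            "The shot cost $25 and 10% of his class is out since 10/26/2009.")
    print(extract_numeric(post))
```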
Third, the study found that online postings on H1N1 can be linked to scientific findings published in PubMed. For effective and efficient PubMed searching, the study suggests that pre-engineered search techniques and search-narrowing options are useful for removing irrelevant results.18-22 Limits to age- or gender-specific groups, publication date, or publication type are easy to apply in PubMed but are not widely used by searchers. MeSH subheadings may be too complicated for end users to employ in their PubMed searches unless they are trained. In addition to terminology transfer between the general public and professionals, future studies of the consumer health language used by the general public should also consider the growing vocabulary of online citizens (so-called Netizens). This vocabulary includes special characters, abbreviations, and emoticons, which have been neglected by search mechanisms in health databases. For instance, an emoticon representing fear can convey the public's frustration with a particular H1N1 issue.
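To make the point about Netizen vocabulary concrete, the Python sketch below maps a few emoticons and chat abbreviations to emotion labels, so that they are not silently dropped before a search or mining step. The lexicon is a small hypothetical example, not a validated consumer health vocabulary.

```python
# Hypothetical lexicon of emoticons and abbreviations (keys stored in lowercase).
NETIZEN_LEXICON = {
    ":(": "sadness",
    ":-(": "sadness",
    ":o": "fear",
    ":-o": "fear",
    "d:": "fear",
    ":)": "relief",
    ":-)": "relief",
    "omg": "alarm",
    "idk": "uncertainty",
}


def tag_emotions(text):
    """Map whitespace-separated emoticons/abbreviations in text to emotion labels."""
    labels = []
    for token in text.split():
        key = token.lower().strip(".,!?")
        if key in NETIZEN_LEXICON:
            labels.append(NETIZEN_LEXICON[key])
    return labels
```

Splitting on whitespace keeps the sketch simple. As a result, it misses emoticons glued to words (for example, "scared:("), which a production tokenizer would need to handle.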
Fourth, the study used a novel way of detecting the H1N1 topics that the general public expresses online. The text mining technique used in this study is especially beneficial for detecting the geographic locations that people are concerned about. When geographic location is combined with specific H1N1 issues (eg, NYC and mandatory vaccination for health care workers; H1N1 mortality in Mexico), this technique becomes a critical surveillance tool for detecting influenza epidemics. In this role, it is comparable to the analysis of search query data from Google or Yahoo! to detect influenza epidemics.7-9 The study predicts that more scientific investigations of pandemic surveillance will be performed as people present their health conditions and issues online through social media. Using massive, unstructured data to detect pandemic patterns is not conventional practice in epidemiology. However, a novel technique such as text mining, related to clinical data mining, can expand the scope of epidemic surveillance in a timely manner. This technique is also useful for detecting anomalous data, which often take a long time to be reported in scientific papers. New antigenic variants, or severe cases improved by novel therapy, may be detected through online postings from widely distributed people who may not be reached by conventional surveillance networks, allowing earlier identification and response.
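Pairing geographic terms with H1N1 issues, as in the NYC and Mexico examples above, amounts to counting co-occurrences within posts. The Python sketch below does this with small hypothetical term lists. A real system would rely on extracted concepts, such as Clementine's geographic-area type, rather than hard-coded strings.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical surface forms mapped to normalized labels, for illustration only.
LOCATIONS = {"nyc": "New York City", "new york": "New York City", "mexico": "Mexico"}
ISSUES = {
    "mandatory": "mandatory vaccination",
    "vaccine": "vaccination",
    "died": "mortality",
    "deaths": "mortality",
}


def find_terms(text, lexicon):
    """Return the normalized labels whose surface forms occur as whole words in text."""
    lowered = text.lower()
    return {
        label
        for form, label in lexicon.items()
        if re.search(r"\b" + re.escape(form) + r"\b", lowered)
    }


def cooccurrences(posts):
    """Count (location, issue) pairs mentioned together in the same post."""
    pairs = Counter()
    for post in posts:
        locations = sorted(find_terms(post, LOCATIONS))
        issues = sorted(find_terms(post, ISSUES))
        pairs.update(product(locations, issues))
    return pairs
```

Because matching is by whole words only, "New Mexico" would still be counted as Mexico. Any real use would need proper geographic disambiguation.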
Finally, the study characterized the sources referenced in the answer set by analyzing the URLs and organizations discovered among the extracted concepts. This finding is neither conclusive nor comprehensive enough to generalize. Still, it does show that people who seek out H1N1 information online frequently refer to easily accessible online resources. More importantly, citing credible online sources becomes an essential skill for people who answer H1N1 questions online. Critical evaluation of online sources should be taught so that people can select the highest-quality information and thereby improve their health literacy. An instructional service role for librarians, acting as honest brokers of consumer health information, will help create better-informed health information consumers.

CONCLUSION
The need for e-surveillance of pandemic diseases like H1N1 stands out because population-level infection prevention and control data have previously been collected retrospectively. This leads us to explore nontraditional data sets that policy makers can use as supplementary data, alongside officially collected and scientifically proven evidence. More importantly, people who are sufficiently informed about health should be able to make optimal health decisions. This can only be achieved through information brokers who provide appropriate information when it is needed and in an accessible format. Technology has produced many health information vehicles, yet people must use them properly to support their health care decisions. The challenges of pandemic prevention and control, therefore, demand e-surveillance and better-informed Netizens.
References
1. WHO. The WHO informal network for mathematical modelling for pandemic influenza H1N1 2009 (working group on data needs). Studies needed to address public health challenges of the 2009 H1N1 influenza pandemic: insights from modeling. Version 2. PLoS Curr December 17, 2009. Available from: http://knol.google.com/k/maria-van-kerkhove/studies-needed-to-address-public-health/agr0htar1u6r/18#. Accessed September 10, 2010.
2. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res 2009;11:e11. Available from: http://www.jmir.org/2009/1/e11. Accessed September 10, 2010.
3. Zimmerman RK, Wolfe RM, Fox DE, Fox JR, Nowalk MP, Troy JA, et al. Vaccine criticism on the World Wide Web. J Med Internet Res 2005;7:e17. Available from: http://www.jmir.org/2005/2/e17/. Accessed August 10, 2010.
4. US Department of Health and Human Services (HHS). Know what to do about the flu. Available from: http://flu.gov/. Accessed June 1, 2010.
5. Centers for Disease Control and Prevention. Flu references and resources. Available from: http://www.cdc.gov/flu/references.htm. Accessed October 10, 2010.
6. Giles CL. University of Kentucky emergency management. Novel H1N1 FAQs. Available from: http://www.uky.edu/EM/flu-faqs.html. Accessed June 10, 2010.
7. Corley CD, Cook DJ, Mikler AR, Singh KP. Using Web and social media for influenza surveillance. Adv Exp Med Biol 2010;680:559-64.
8. Ginsberg J, Mohebbi MH, Patel RS, Brammer LB, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature 2009;457:1012-4.
9. Polgreen PM, Chen Y, Pennock DM, Nelson FD. Using Internet searches for influenza surveillance. Clin Infect Dis 2008;47:1443-8.
10. Johnson HA, Wagner MM, Hogan WR, Chapman W, Olszewski RT, Dowling J, et al. Analysis of Web access logs for surveillance of influenza. Stud Health Technol Inform 2004;107:1202-6.
11. Yih WK, Abrams A, Kleinman K, Kulldorff M, Pinner R, Harmon R, et al. Telephone triage data for detection of influenza-like illness. Adv Dis Surveill 2007;2:4. Available from: http://isdsjournal.org//articles/920.pdf. Accessed October 1, 2010.
12. Adamic LA, Zhang J, Bakshy E, Ackerman MS. Knowledge sharing and Yahoo Answers: everyone knows something. In: Proceedings of the 17th International Conference on World Wide Web (Social networks: analysis of social networks); 2008; Beijing, China. Available from: http://portal.acm.org/citation.cfm?id1367587&collGUIDE&dlGUIDE&CFID105950951&CFTOKEN65439506&retn1#Fulltext. Accessed October 14, 2010.
13. White MD. Questions in reference interviews. J Documentation 1998;54:443-65.
14. Kim S, Chung DS. Characteristics of cancer blog users. J Med Libr Assoc 2007;95:445-50.
15. SPSS Inc. Introduction to text mining for Clementine. Chicago (IL): SPSS Inc; 2007.
16. Gay CW, Kayaalp M, Aronson AR. Semi-automatic indexing of full text biomedical articles. AMIA Annu Symp Proc 2005. Available from: http://ii.nlm.nih.gov/resources/amia05.fulltext.w.footer.pdf. Accessed June 1, 2010.
17. Kahn CE. Effective metadata discovery for dynamic filtering of queries to a radiology image search engine. J Digit Imaging 2008;21:269-73.
18. Wilczynski NL, McKibbon KA, Haynes RB. Enhancing retrieval of best evidence for health care from bibliographic databases: calibration of the hand search of the literature. Stud Health Technol Inform 2001;84:390-3.
19. Haynes RB, McKibbon KA, Wilczynski NL, Walter SD, Werre SR. Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey. BMJ 2005;330:1179.
20. Haynes RB, Wilczynski NL; Hedges Team. Optimal search strategies for retrieving scientifically strong studies of diagnosis from MEDLINE: analytical survey. BMJ 2004;328:1040.
21. Montori VM, Wilczynski NL, Morgan D, Haynes RB; Hedges Team. Optimal search strategies for retrieving systematic reviews from Medline: analytical survey. BMJ 2005;330:68.
22. Boynton J, Glanville J, McDaid D, Lefebvre C. Identifying systematic reviews in MEDLINE: developing an objective approach to search strategy design. J Inf Sci 1998;24:137-57.
