American Journal of Infection Control
Major article
Key Words:
Influenza A FAQ, H1N1 surveillance
Text mining
PubMed
Influenza pandemic
Medical Internet research
Consumer health information
Background: A novel strain of human influenza A (H1N1) posed a serious pandemic threat worldwide
during 2009. The public's fear of pandemic flu often raises awareness and discussion of such events.
Objectives: The goal of this study was to characterize the major topics of H1N1 questions and
answers posted to the online question and answer community Yahoo! Answers during the H1N1 outbreak.
Methods: The study used Text Mining for SPSS Clementine (v.12; SPSS Inc., Chicago, IL) to extract the
major concepts of the collected Yahoo! questions and answers. The original collections were retrieved
by keyword search for "H1N1" and then filtered to include only resolved questions in the health category
submitted within the past 2 years.
Results: The most frequently formed categories were as follows: general health (health, disease, medicine, investigation, evidence, problem), flu-specific terms (H1N1, swine, shot, fever, cold, infective,
throat), and nonmedical issues (feel, North American, people, child, nations, government, states, help,
doubt, emotion). The study found that URL data are fairly predictable: those providing answers are
divided between those dedicated to giving trustworthy information (from news organizations and the
government, for instance) and those looking to espouse a more biased point of view.
Conclusion: Critical evaluation of online sources should be taught so that people can select quality information and
improve their health literacy. The challenges of pandemic prevention and control, therefore, demand both e-surveillance and better-informed netizens.
Copyright 2012 by the Association for Professionals in Infection Control and Epidemiology, Inc.
Published by Elsevier Inc. All rights reserved.
0196-6553/$36.00
doi:10.1016/j.ajic.2011.03.028
METHODS
Research questions (RQ)
This study investigates the following 3 research questions to
understand questions and answers posted by the public in search of
H1N1-related information through the online question/answer
discussion service Yahoo! Answers.
RQ1: What are the topical characteristics of questions posted
by questioners on Yahoo! Answers?
RQ2: What are the major resources referenced in the answers
gathered from respondents on Yahoo! Answers?
RQ3: How do the extracted concepts and descriptors map onto
the vocabularies in the Unified Medical Language System (UMLS)
Metathesaurus?
Data collection
The original collection of questions and answers was retrieved by
keyword search for "H1N1" and then filtered to include only
resolved questions in the health category submitted within the
past 2 years. The data collection was completed on April 30, 2010.
The answer set included only the best answer selected by each
questioner. The original collection of entries from Yahoo! Answers
numbered 6,578; however, it contained many items that were either
outside the scope of this study or duplicates. The entries were
proofread to make the text mining process more efficient: excess
punctuation was removed, and major headings such as H1N1 were
normalized (variant spellings such as HINI were corrected). Finally,
duplicate questions were eliminated, narrowing the question and
answer collection to be examined from 6,578 to 5,400 entries.
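The cleaning steps described above (punctuation cleanup, normalizing variants such as HINI to H1N1, and dropping duplicate questions) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `normalize` and `deduplicate`, the variant table, the punctuation rule, and the rule that two questions are duplicates when their normalized, case-folded text matches are all hypothetical.

```python
import re
import unicodedata

# Hypothetical variant table: the study only reports correcting "HINI" to
# "H1N1"; any other entries added here would be your own assumptions.
VARIANTS = {r"\bHINI\b": "H1N1"}


def normalize(text: str) -> str:
    """Clean one question or answer before text mining (illustrative sketch)."""
    # NFKC folds compatibility characters, e.g. the "fi"/"fl" ligatures, into plain letters.
    text = unicodedata.normalize("NFKC", text)
    for pattern, canonical in VARIANTS.items():
        text = re.sub(pattern, canonical, text, flags=re.IGNORECASE)
    # Collapse runs of repeated punctuation ("???", "!!!") to a single mark.
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    # Collapse internal whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(entries):
    """Keep the first (question, answer) pair for each distinct normalized question."""
    seen = set()
    kept = []
    for question, answer in entries:
        q = normalize(question)
        key = q.casefold()  # case-insensitive duplicate check
        if key not in seen:
            seen.add(key)
            kept.append((q, normalize(answer)))
    return kept


entries = [
    ("Is HINI dangerous???", "Yes!!"),
    ("Is H1N1  dangerous?", "Maybe."),
    ("Flu shot?", "Ask a doctor."),
]
print(deduplicate(entries))
# [('Is H1N1 dangerous?', 'Yes!'), ('Flu shot?', 'Ask a doctor.')]
```

Keeping the first occurrence is an arbitrary choice here; a real pipeline might instead keep the entry with the most complete best answer.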
Data analysis
The study used Text Mining for SPSS Clementine (v.12; SPSS Inc.,
Chicago, IL) to extract the major topical categories of the collected
Yahoo! questions.15 The essential topics of each question posting
were identified to explore the applicability of PubMed subject
searching to H1N1 online postings, as per research question 3.
Using Clementine, the subject terms were chosen from each of
2 English-language Clementine resource templates: the basic
resource template and the Medical Subject Headings (MeSH)
resource template, which is based on the vocabulary used to index
the MEDLINE literature. SPSS Clementine resource templates are
sets of specialized libraries made up of dictionaries used to define
and manage types, terms, synonyms, and exclude lists. This program
was chosen to classify the collected question and answer posts by
automatically generated thesauri because it uses the MeSH resource
template. Whereas the MeSH resource template is specialized for the
medical field, the basic resource template was used to extract
concepts in the general domain of textual corpus analysis. The study
reviewed similarities and differences between the concept terms
generated by the 2 templates, which provide useful information for
developing a suggested terminology for PubMed searching. In
addition, a set of core concepts and descriptors processed through
Clementine was constructed from both the question and answer
collections and automatically mapped into the UMLS Metathesaurus
through the MetaMap Transfer (MMTx) engine developed by the US
National Library of Medicine (NLM, Bethesda, MD). To characterize
contextual mapping between words in the Yahoo! question and
answer collection and the UMLS vocabularies, only perfect matches
by semantic type were included in the data analyses. Previous
studies on the effectiveness of mapping have shown that the MMTx
engine can also enhance searching.16,17
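Clementine's resource templates are proprietary, but the idea of a template built from types, terms, synonyms, and exclude lists can be illustrated with a toy dictionary matcher. The `ResourceTemplate` class, its fields, and the sample entries below are hypothetical and do not reproduce Clementine's format or its linguistic extraction; the sketch only shows how synonyms collapse onto a canonical concept, how an exclude list suppresses a term, and how concepts roll up into types.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ResourceTemplate:
    """Toy stand-in for a text-mining resource template (hypothetical format).

    types:    canonical term -> type label (all terms lower-case)
    synonyms: variant term -> canonical term
    exclude:  canonical terms that are never reported as concepts
    """
    types: dict
    synonyms: dict = field(default_factory=dict)
    exclude: set = field(default_factory=set)

    def extract(self, text: str) -> Counter:
        """Count canonical concepts in text, matching longer phrases first."""
        found = Counter()
        remaining = text.lower()
        # Longest entries first, so "swine flu" is matched before any shorter term.
        vocabulary = sorted(set(self.types) | set(self.synonyms),
                            key=lambda term: (-len(term), term))
        for term in vocabulary:
            pattern = r"\b" + re.escape(term) + r"\b"
            hits = len(re.findall(pattern, remaining))
            if not hits:
                continue
            canonical = self.synonyms.get(term, term)
            if canonical not in self.exclude:
                found[canonical] += hits
            # Blank out the matched spans so shorter terms cannot re-match them.
            remaining = re.sub(pattern, " ", remaining)
        return found

    def type_counts(self, text: str) -> Counter:
        """Aggregate extracted concepts by their type label."""
        counts = Counter()
        for concept, n in self.extract(text).items():
            counts[self.types.get(concept, "Unknown")] += n
        return counts


template = ResourceTemplate(
    types={"swine flu": "Disease", "fever": "Symptom",
           "vaccine": "Immunization", "doctor": "Person"},
    synonyms={"h1n1": "swine flu", "flu shot": "vaccine"},
    exclude={"doctor"},
)
text = "My doctor says the flu shot helps with H1N1 and swine flu. Fever again."
print(template.extract(text))
print(template.type_counts(text))
```

Running the same posts through a second toy template (for example, one filled with MeSH-style headings) and comparing the keys of the two counters would mirror, in a very simplified way, the study's comparison of basic and MeSH concepts.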
RESULTS
Topical characteristics of the collected questions: RQ1
The first research question sought to describe the topical characteristics of the collected questions to give an overview of what people post
about H1N1 online. The study extracted major concepts to form the
major topical contexts of the posted questions. The collected
questions (n = 5,400) were processed using SPSS Clementine (v.12),
which formed 50 major categories from 5,000 extracted concepts.
The descriptors that define the formed categories are combinations
of the concepts and the vocabularies defined in the resources. The
result is shown in Table 1.
Top 25 major categories identified in the question collection using basic
resources and MeSH resources
The most frequently formed categories across the 4 analyses are
key terms about general health (health, disease, medicine, investigation, evidence, problem), flu-specific terms (H1N1, swine, shot, fever,
cold, infective, throat), and nonmedical issues (feel, North American, people, child, nations, government, states, help, doubt,
emotion). Interestingly, general health categories such as
disease and health are characterized by descriptors that lay out
overall health matters: severe health conditions, health officials,
health insurance plan, health complications, health care worker,
health care system, global pandemic health crisis, contagious
diseases, community health, chronic illnesses, alternative health,
and others. The flu-specific issues form categories such
as H1N1, swine, swine flu, pandemic, immunization, throat, and
sore, which imply concerns about disease, symptoms, body parts,
Table 1
Top 25 major categories identified in the question collection using basic resources and MeSH resources
Rank | Basic resource category | Total (n) | % | MeSH category | Total (n) | %
1 | Health | 3,011 | 55.76 | Disease | 2,966 | 54.93
2 | H1N1 | 1,491 | 27.61 | H1N1 | 1,456 | 26.96
3 | Protection | 1,222 | 22.63 | Immunization | 983 | 18.20
4 | People | 860 | 15.93 | People | 853 | 15.80
5 | Shot | 783 | 14.50 | Shot | 740 | 13.70
6 | Evidence | 771 | 14.28 | Symptoms | 702 | 13.00
7 | Learning | 735 | 13.61 | Feel | 678 | 12.56
8 | Feel | 672 | 12.44 | Fever | 639 | 11.83
9 | Kids | 635 | 11.76 | Pharynx | 569 | 10.54
10 | Doctor | 634 | 11.74 | Pain | 540 | 10.00
11 | Fever | 625 | 11.57 | Cough | 492 | 9.11
12 | Virus | 611 | 11.31 | Virus | 460 | 8.52
13 | Infective | 563 | 10.43 | Nose | 453 | 8.39
14 | Help | 556 | 10.30 | Child | 370 | 6.85
15 | Pain | 550 | 10.19 | Swine | 339 | 6.28
16 | Breath | 521 | 9.65 | Medicine | 331 | 6.13
17 | Details | 519 | 9.61 | Health | 295 | 5.46
18 | Time | 459 | 8.50 | Investigation | 267 | 4.94
19 | Cold | 410 | 7.59 | Pharmaceutical preparation | 216 | 4.00
20 | Doubt | 397 | 7.35 | States | 186 | 3.44
21 | Child | 363 | 6.72 | Sore | 145 | 2.69
22 | Parents | 361 | 6.69 | Government | 129 | 2.39
23 | Organism | 345 | 6.39 | Problem | 127 | 2.35
24 | Medicine | 338 | 6.26 | Control | 110 | 2.04
25 | North America | 338 | 6.26 | World | 100 | 1.85
NOTE. Percentages are of the 5,400 collected questions; a question can fall into more than one category.
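The percentage columns in Table 1 are each category's count divided by the 5,400 collected questions (for example, 3,011 / 5,400 = 55.76%). Because a single question can fall into several categories, the percentages are not shares of a whole and sum to more than 100. A minimal Python sketch of the ranking and percentage step follows; the function name `category_shares` and the alphabetical tie-break are assumptions, not details reported by the study.

```python
def category_shares(counts, n_questions=5400, top=25):
    """Rank categories by size and express each as a percent of all questions.

    counts: mapping of category name -> number of questions assigned to it.
    Ties are broken alphabetically (an assumption; the table does not say).
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]
    return [(rank, name, n, round(100 * n / n_questions, 2))
            for rank, (name, n) in enumerate(ranked, start=1)]


basic = {"Health": 3011, "H1N1": 1491, "Protection": 1222}
for row in category_shares(basic):
    print(row)
# (1, 'Health', 3011, 55.76)
# (2, 'H1N1', 1491, 27.61)
# (3, 'Protection', 1222, 22.63)
```

The alphabetical tie-break happens to reproduce the order of Medicine and North America (both 338) in the basic resource column.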
used it either as a background source for their query or are asking for
others to validate the information for them. A majority of the listed
URLs are from the .com domain, denoting commercial sites. This does
not immediately discredit them, because 3 of those sites are news
agencies. However, it does illustrate that people look primarily to
commercial sources for their information on H1N1. Similarly, of the 6
.org sites, 5 are under the main domain en.wikipedia.org, a general
reference site. Not only was Wikipedia the most common .org site,
but YouTube, a popular social media video-sharing site, was the most
common .com site. There were 22 instances of YouTube.com as the
domain, whereas the second most common domain had only 4 instances.
Taken together, these results suggest that Yahoo! Answers users likely
turn to social media for much of their information.
In the Answers collection, the breakdown of URL types is as follows:
.com (89 unique, 422 total), .org (15 unique, 73 total), and .gov
(39 unique, 365 total). This is a much more heartening set of results
than those in the question set. References to .gov sites (of which the
CDC and the US Food and Drug Administration are most represented)
are almost as common as references to .com sites. Within the .com
extension, there are now 28 unique URLs for news sites. YouTube is
cited 39 times, which is a great amount, but not in proportion to the
total number of URLs (which increased 10-fold within .com alone).
Yahoo! Answers itself is referenced 51 times; many of these refer back
to other answers given. This is common behavior in social media:
users refer to preferred sources that have already been given within
the system. Eight of the unique entries are under en.wikipedia.org,
a trend consistent with the question URLs. As in the question set,
social media is prevalent. However, the most commonly cited unique
URL is at cdc.gov, a good sign: it suggests that the most trustworthy
sites are visited frequently. One final finding of note is that there
were no .edu Web sites in either case, even though much of the
research on the H1N1 epidemic has been conducted at universities.
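The unique and total URL counts reported for each domain type can be tallied with Python's standard `urllib.parse` module. The sketch below is illustrative only: the function names are hypothetical, it assumes absolute URLs with a scheme, and it treats "unique" as distinct URL strings after dropping a trailing slash, since the study does not state its exact rule. The sample URLs are made up for demonstration.

```python
from collections import Counter
from urllib.parse import urlparse


def tld_breakdown(urls, tlds=(".com", ".org", ".gov", ".edu")):
    """Return {tld: (unique URLs, total references)} for the listed domain types.

    Assumes absolute URLs (http://...). "Unique" means distinct URL strings
    after dropping a trailing slash, a simplifying assumption.
    """
    totals = Counter()
    unique = {tld: set() for tld in tlds}
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        for tld in tlds:
            if host.endswith(tld):
                totals[tld] += 1
                unique[tld].add(url.rstrip("/"))
                break
    return {tld: (len(unique[tld]), totals[tld]) for tld in tlds}


def top_hosts(urls, k=3):
    """Most frequently cited host names, ignoring a leading "www." (Python 3.9+)."""
    hosts = Counter((urlparse(u).hostname or "").lower().removeprefix("www.")
                    for u in urls)
    return hosts.most_common(k)


answers = [
    "http://www.cdc.gov/h1n1flu/",
    "http://www.cdc.gov/h1n1flu",
    "http://www.youtube.com/watch?v=abc",
    "http://www.youtube.com/watch?v=xyz",
    "http://en.wikipedia.org/wiki/Pandemic_H1N1/09_virus",
]
print(tld_breakdown(answers))
# {'.com': (2, 2), '.org': (1, 1), '.gov': (1, 2), '.edu': (0, 0)}
print(top_hosts(answers, k=2))
# [('cdc.gov', 2), ('youtube.com', 2)]
```

Counting hosts rather than full URLs is what surfaces patterns such as the dominance of YouTube among .com references or en.wikipedia.org among .org references.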
For the organization type, the concepts were divided into
different categories to illustrate what types of sources they might
be: corporation, government, hospital, news, pharmaceutical, and
university. Of note was the lack of entries classified as hospital
concepts in MeSH: the MeSH dictionary simply did not treat
hospital as a valid semantic type, although hospitals might be an
important source of information, whereas the basic resources
(a nonmedical template) did. In addition, the sharp increase in
pharmaceutical concepts between the question and answer sets is
noteworthy. This is most likely due to responses to queries about
how to treat H1N1 flu, or to information about the companies that
produced medicines or vaccines, for the curious or cautious.
The concepts categorized in the location type were divided on
a simple geographic basis, ie, city, state, province, region, or
Table 2
Flu-specific topics identified in the question set
Flu topic | No. of questions | % | Sample questions
Diseases/Symptoms | 2,014 | 75.69 | I have flu-like symptoms, such as a sore throat and a runny nose, and wonder whether the symptoms are related to a cold or flu or something else more severe. Also, what could you tell me about flu-like symptoms in babies?
[label missing] | 930 | 34.95 | Should pregnant women take the H1N1 vaccine or not? I am a pediatric nurse and wonder whether I am forced to get vaccinated to keep my job. I am asking because I am hesitant to get vaccinated due to the newness of the H1N1 vaccine.
Influenza Vaccines | 837 | 31.45 | Is it okay to receive vaccination even though I am allergic to eggs?
Influenza viruses | 388 | 14.58 | What are the general characteristics of the H1N1 virus compared to the previously known influenza virus? Can we protect newborn babies from the influenza virus by breastfeeding?
International Issues | 352 | 13.23 | If my country borders those countries with an outbreak, how quickly will we receive the outbreak status? What are the roles of WHO or individual governments who are responsible for preventing the influenza pandemic?
Planning/Response | 315 | 11.84 | Why is there such a late response in delivering the flu vaccine to our local community? Are vaccines for novel H1N1 available through my local university?
Animals/Birds/Pets | 191 | 7.18 | Can I keep my pets if I am sick? If I am sick with flu-like symptoms, how can I keep my pets from getting sick too?
Work Place Issues | 140 | 5.26 | If I miss work due to H1N1, how can I make up my missed work? Who has flexible leave policies or alternate work schedules during a flu outbreak and why?
Air/Food/Water | 118 | 4.43 | Could a sick cook transmit the flu virus if he is not wearing a surgical mask?
Infection | 105 | 3.95 | Is swine flu infection correlated with H1N1 or not?
Travel | 53 | 1.99 | Are there any travel restrictions to Korea or Japan? How should I prepare when I travel outside the US?
Disease Outbreak | 53 | 1.99 | What was the mortality rate from the flu in 1975? How many H1N1 virus outbreaks occurred in 2010 in Kentucky?
Medicare/Medicaid | 38 | 1.43 | Is it true that Medicare pays for seniors to get the seasonal flu vaccine?
Supply/Distribution | 6 | 0.23 | What is the government's role as vaccine price-setter and production manager?
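The topic counts in Table 2 behave like multi-label tags: a question about getting a flu shot before a trip could count toward both a vaccine topic and Travel, so the percentage column sums to more than 100. The sketch below illustrates that bookkeeping with simple keyword rules. The keyword lists, the function names, and the choice of the tagged questions as the percentage base are all hypothetical; the table does not state its base, and the study derived its topics with text mining rather than hand-written rules.

```python
import re
from collections import Counter

# Hypothetical keyword rules for a few of the topics in Table 2. The study's
# categories came from its text-mining tool, not from lists like these.
TOPIC_KEYWORDS = {
    "Diseases/Symptoms": ["symptom", "symptoms", "sore throat", "runny nose", "fever"],
    "Influenza Vaccines": ["vaccine", "vaccines", "vaccination", "flu shot"],
    "Travel": ["travel", "trip", "flight"],
    "Animals/Birds/Pets": ["pet", "pets", "dog", "cat", "bird"],
}


def tag_topics(question):
    """Return every topic whose keywords appear in the question (multi-label)."""
    text = question.lower()
    return {topic for topic, words in TOPIC_KEYWORDS.items()
            if any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words)}


def topic_table(questions):
    """Count questions per topic; percentages use the tagged questions as the base."""
    counts = Counter()
    tagged = 0
    for question in questions:
        topics = tag_topics(question)
        if topics:
            tagged += 1
            counts.update(topics)  # one increment per topic, per question
    table = {topic: (n, round(100 * n / tagged, 2)) for topic, n in counts.most_common()}
    return table, tagged


questions = [
    "I have flu-like symptoms such as a sore throat.",
    "Should I get the flu shot before I travel?",
    "Can my pets catch it?",
    "What is the weather like?",
]
table, tagged = topic_table(questions)
print(tagged, table)
```

Here 3 questions receive 4 topic tags, so the four percentages add up to roughly 133, the same effect seen in Table 2.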
Table 3
Mapping results using NLM's MetaMap Transfer engine (MMTx)
Semantic type | Questions: Basic | Questions: MeSH | Answers: Basic | Answers: MeSH | Total: Questions | Total: Answers | Total: Basic | Total: MeSH | Total: All
Disease or syndrome | 106 | 119 | 108 | 95 | 225 | 203 | 214 | 214 | 428
Pharmacologic substance | 70 | 76 | 103 | 102 | 146 | 205 | 173 | 178 | 351
Intellectual product | 143 | 160 | 119 | 132 | 303 | 251 | 262 | 292 | 554
Functional concept | 54 | 62 | 80 | 74 | 116 | 154 | 134 | 136 | 270
Sign or symptom | 78 | 86 | 58 | 64 | 164 | 122 | 136 | 150 | 286
Manufactured object | 64 | 79 | 65 | 65 | 143 | 130 | 129 | 144 | 273
Quantitative concept | 76 | 77 | 72 | 71 | 153 | 143 | 148 | 148 | 296
Qualitative concept | 76 | 87 | 59 | 59 | 163 | 118 | 135 | 146 | 281
Geographic area | 96 | 64 | 105 | 56 | 160 | 161 | 201 | 120 | 321
Therapeutic or preventive procedure | 25 | 35 | 49 | 54 | 60 | 103 | 74 | 89 | 163
Mammal | 42 | 51 | 27 | 27 | 93 | 54 | 69 | 78 | 147
Finding | 72 | 82 | 50 | 46 | 154 | 96 | 122 | 128 | 250
Organic chemical | 46 | 55 | 55 | 52 | 101 | 107 | 101 | 107 | 208
Professional or occupational group | 40 | 47 | 41 | 45 | 87 | 86 | 81 | 92 | 173
Immunologic factor | 34 | 36 | 42 | 43 | 70 | 85 | 76 | 79 | 155
Body substance | 23 | 24 | 21 | 22 | 47 | 43 | 44 | 46 | 90
Body part, organ, or organ component | 47 | 51 | 42 | 39 | 98 | 81 | 89 | 90 | 179
Food | 20 | 22 | 22 | 21 | 42 | 43 | 42 | 43 | 85
Virus | 7 | 8 | 18 | 18 | 15 | 36 | 25 | 26 | 51
Population group | 46 | 53 | 38 | 34 | 99 | 72 | 84 | 87 | 171
Amino acid, peptide, or protein | 16 | 22 | 24 | 31 | 38 | 55 | 40 | 53 | 93
Idea or concept | 110 | 126 | 92 | 90 | 236 | 182 | 202 | 216 | 418
Biologically active substance | 6 | 12 | 14 | 17 | 18 | 31 | 20 | 29 | 49
Health care activity | 18 | 18 | 27 | 30 | 36 | 57 | 45 | 48 | 93
NOTE. Some cells were omitted to fit the page limit. The complete listing is available at: http://www.uky.edu/wskim3/h1n1study.html.
NLM, US National Library of Medicine; MeSH, Medical Subject Headings; MMTx, MetaMap Transfer.
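Table 3's Total columns are plain sums of the four Questions/Answers by Basic/MeSH cells (for Disease or syndrome, 106 + 119 = 225 for questions, 108 + 95 = 203 for answers, and 428 overall). The sketch below shows one way such a table could be tallied after keeping only perfect matches. The `Mapping` record and the `PERFECT` threshold are hypothetical: the sketch assumes MetaMap-style scores on a 0 to 1000 scale with 1000 marking a perfect match, and it does not parse real MMTx output.

```python
from collections import Counter, namedtuple

# Hypothetical record for one concept-to-UMLS mapping; real MMTx output
# formats differ, so a parser would have to fill these fields.
Mapping = namedtuple("Mapping", "collection template semantic_type score")

# Assumption: MetaMap-style candidate scores on a 0-1000 scale, 1000 = perfect match.
PERFECT = 1000


def semantic_type_table(mappings):
    """Tally perfect matches per semantic type in the column layout of Table 3."""
    cells = Counter(
        (m.semantic_type, m.collection, m.template)
        for m in mappings if m.score == PERFECT
    )
    rows = {}
    for stype in sorted({key[0] for key in cells}):
        # Counter returns 0 for missing cells without inserting them.
        qb, qm = cells[(stype, "Questions", "Basic")], cells[(stype, "Questions", "MeSH")]
        ab, am = cells[(stype, "Answers", "Basic")], cells[(stype, "Answers", "MeSH")]
        rows[stype] = {
            "Questions": (qb, qm),
            "Answers": (ab, am),
            # Total columns: Questions, Answers, Basic, MeSH, All
            "Total": (qb + qm, ab + am, qb + ab, qm + am, qb + qm + ab + am),
        }
    return rows


sample = [
    Mapping("Questions", "Basic", "Virus", 1000),
    Mapping("Questions", "MeSH", "Virus", 1000),
    Mapping("Answers", "MeSH", "Virus", 1000),
    Mapping("Answers", "Basic", "Virus", 861),  # partial match: excluded
]
print(semantic_type_table(sample)["Virus"]["Total"])  # (2, 1, 1, 2, 3)
```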
DISCUSSION
Epidemics create public fear worldwide, and the H1N1 influenza
outbreak was no different. People express their concerns online by
writing about H1N1. Analyzing what people post and answer about
H1N1 can characterize the public's information needs, and checking
the sources they refer to could serve as a novel surveillance
mechanism for pandemic diseases. This section discusses the major
findings of the study as they relate to the major public health
challenges of the H1N1 influenza pandemic.
First, the study found that the major topical categories of the
H1N1 questions posted on Yahoo! Answers included general
disease information, symptoms, therapy, influenza viruses and
vaccines, infection and disease outbreak, planning and response,
prevention and control, and workforce and travel issues. Considering the general typology of clinical question types, this result is
not surprising. It suggests that any official site responsible for posting H1N1-specific information should consider organizing its FAQ into clusters by the identified question categories for
efficient use by the public. Although the study provides no
definitive data on whether H1N1 information should
be presented separately for special audiences such as patients, caregivers,
pregnant women, mothers, health care workers, policy makers, or
the general public, it is advisable to categorize H1N1 information for certain user groups based on their roles at work or at home.
Second, the study found some interesting concepts in quantitative or numeric form (eg, unit measures, percent, currency, date,
time), which are normally ignored in automatic text processing because they lack semantic value. However, these numeric
values were found to be unit measures of drug dosage, physical
descriptions of people with influenza-like illness (ILI), cost of vaccines, and prevalence of
disease. These forms of quantitative data, relevant to factual
questions, are not served efficiently by the conventional medical
literature. PubMed features subheadings, denoted in parentheses, that can further limit search results to statistics and numerical
data (SN) or supply and distribution (SD). However, the published
CONCLUSION
modeling. Version 2. PLoS Curr December 17, 2009. Available from: http://knol.google.com/k/maria-van-kerkhove/studies-needed-to-address-public-health/agr0htar1u6r/18#. Accessed September 10, 2010.
2. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res 2009;11:e11. Available from: http://www.jmir.org/2009/1/e11. Accessed September 10, 2010.
3. Zimmerman RK, Wolfe RM, Fox DE, Fox JR, Nowalk MP, Troy JA, et al. Vaccine criticism on the World Wide Web. J Med Internet Res 2005;7:e17. Available from: http://www.jmir.org/2005/2/e17/. Accessed August 10, 2010.
4. US Department of Health and Human Services (HHS). Know what to do about the flu. Available from: http://flu.gov/. Accessed June 1, 2010.
5. Centers for Disease Control and Prevention. Flu references and resources. Available from: http://www.cdc.gov/flu/references.htm. Accessed October 10, 2010.
6. Giles CL. University of Kentucky emergency management. Novel H1N1 FAQs. Available from: http://www.uky.edu/EM/flu-faqs.html. Accessed June 10, 2010.
7. Corley CD, Cook DJ, Mikler AR, Singh KP. Using Web and social media for influenza surveillance. Adv Exp Med Biol 2010;680:559-64.
8. Ginsberg J, Mohebbi MH, Patel RS, Brammer LB, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature 2009;457:1012-4.
9. Polgreen PM, Chen Y, Pennock DM, Nelson FD. Using Internet searches for influenza surveillance. Clin Infect Dis 2008;47:1443-8.
10. Johnson HA, Wagner MM, Hogan WR, Chapman W, Olszewski RT, Dowling J, et al. Analysis of Web access logs for surveillance of influenza. Stud Health Technol Inform 2004;107:1202-6.
11. Yih WK, Abrams A, Kleinman K, Kulldorff M, Pinner R, Harmon R, et al. Telephone triage data for detection of influenza-like illness. Adv Dis Surveill 2007;2:4. Available from: http://isdsjournal.org//articles/920.pdf. Accessed October 1, 2010.
12. Adamic LA, Zhang J, Bakshy E, Ackerman MS. Knowledge sharing and Yahoo Answers: everyone knows something. In: Proceedings of the 17th International Conference on World Wide Web; 2008; Beijing, China. Available from: http://portal.acm.org/citation.cfm?id=1367587. Accessed October 14, 2010.
13. White MD. Questions in reference interviews. J Documentation 1998;54:443-65.
14. Kim S, Chung DS. Characteristics of cancer blog users. J Med Libr Assoc 2007;95:445-50.
15. SPSS Inc. Introduction to text mining for Clementine. Chicago [IL]: SPSS, Inc; 2007.
16. Gay CW, Kayaalp M, Aronson AR. Semi-automatic indexing of full text biomedical articles. AMIA Annual Symposium Proceedings 2005. Available from: http://ii.nlm.nih.gov/resources/amia05.fulltext.w.footer.pdf. Accessed June 1, 2010.
17. Kahn CE. Effective metadata discovery for dynamic filtering of queries to a radiology image search engine. J Digit Imaging 2008;21:269-73.
18. Wilczynski NL, McKibbon KA, Haynes RB. Enhancing retrieval of best evidence for health care from bibliographic databases: calibration of the hand search of the literature. Stud Health Technol Inform 2001;84:390-3.
19. Haynes RB, McKibbon KA, Wilczynski NL, Walter SD, Werre SR. Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey. BMJ 2005;330:1179.
20. Haynes RB, Wilczynski NC for the Hedges Team. Optimal search strategies for retrieving scientifically strong studies of diagnosis from MEDLINE: analytical survey. BMJ 2004;328:1040.
21. Montori VM, Wilczynski NL, Morgan D, Haynes RB, Hedges Team. Optimal search strategies for retrieving systematic reviews from Medline: analytical survey. BMJ 2005;330:68.
22. Boynton J, Glanville J, McDaid D, Lefebvre C. Identifying systematic reviews in MEDLINE: developing an objective approach to search strategy design. J Inf Sci 1998;24:137-57.