
Available online at www.sciencedirect.


Nurs Outlook xxx (2016) 1-13


Big data science: A literature review of nursing research exemplars

Bonnie L. Westra, PhD, RN, FAAN, FACMI (a,*), Martha Sylvia, PhD, MBA, RN (b),
Elizabeth F. Weinfurter, MLIS (c), Lisiane Pruinelli, PhD, RN (a), Jung In Park, PhD, RN (a),
Dianna Dodd, MSN, RN, RN-BC, CCM (d), Gail M. Keenan, PhD, RN, FAAN (e),
Patricia Senk, PhD, RN (f), Rachel L. Richesson, PhD, MS, MPH, FACMI (g),
Vicki Baukner, MS, RN (h), Christopher Cruz, BS, RN (i), Grace Gao, DNP, RN (a),
Luann Whittenburg, PhD, RN BC, FHIMSS, FAAN (j),
Connie W. Delaney, PhD, RN, FAAN, FACMI (a)

(a) School of Nursing, University of Minnesota, Minneapolis, MN
(b) College of Nursing, Medical University of South Carolina, Charleston, SC
(c) Health Sciences Libraries, University of Minnesota, Minneapolis, MN
(d) Cerner Corporation, Kansas City, MO
(e) University of Florida College of Nursing, Gainesville, FL
(f) The College of St. Scholastica, Duluth, MN
(g) Duke University School of Nursing, Durham, NC
(h) Ridgeview Medical Center, Waconia, MN
(i) Kaiser Permanente, Oakland, CA
(j) MediComp Systems, Alexandria, VA

Article info

Article history:
Received 4 June 2016
Revised 3 November 2016
Accepted 21 November 2016

Keywords:
Big data
Data science
Nursing informatics
Nursing research
Nurse scientist

Abstract

Background: Big data and cutting-edge analytic methods in nursing research challenge nurse scientists to extend the data sources and analytic methods used for discovering and translating knowledge.
Purpose: The purpose of this study was to identify, analyze, and synthesize exemplars of big data nursing research applied to practice and disseminated in key nursing informatics, general biomedical informatics, and nursing research journals.
Methods: A literature review of studies published between 2009 and 2015. There were 650 journal articles identified in 17 key nursing informatics, general biomedical informatics, and nursing research journals in the Web of Science database. After screening for inclusion and exclusion criteria, 17 studies published in 18 articles were identified as big data nursing research applied to practice.
Discussion: Nurses clearly are beginning to conduct big data research applied to practice. These studies represent multiple data sources and settings. Although numerous analytic methods were used, the fundamental issue remains to define the types of analyses consistent with big data analytic methods.
Conclusion: There are needs to increase the visibility of big data and data science research conducted by nurse scientists, further examine the use of state of the science in data analytics, and continue to expand the availability and use of a variety of scientific, governmental, and industry data resources. A major implication of this literature review is whether nursing faculty and preparation of future scientists (PhD programs) are prepared for big data and data science.

* Corresponding author: Bonnie L. Westra, School of Nursing, University of Minnesota, 308 Harvard St. SE, WDH 5-140, Minneapolis, MN
E-mail address: westr006@umn.edu (B.L. Westra).
0029-6554/$ - see front matter © 2016 Elsevier Inc. All rights reserved.

Cite this article: Westra, B. L., Sylvia, M., Weinfurter, E. F., Pruinelli, L., Park, J. I., Dodd, D., Keenan, G. M.,
Senk, P., Richesson, R. L., Baukner, V., Cruz, C., Gao, G., Whittenburg, L., & Delaney, C. W. (2016, -). Big
data science: A literature review of nursing research exemplars. Nursing Outlook, -(-), 1-13. http://

Background

The era of big data and cutting-edge analytic methods in research and clinical scholarship challenges nurse scientists to extend the data sources and analytic methods used for discovering and translating knowledge to improve care quality and safety, lower costs, and address provider satisfaction. Big data are described most often with the five Vs: volume, velocity, variety, veracity, and value. The first three are the most common characteristics of big data (Brennan & Bakken, 2015; Gandomi & Haider, 2015; Kitchin, 2014; Marr, 2015). Volume refers to the magnitude of the data, whereas velocity addresses the speed at which data are generated, translated into research or practice, and analyzed. Variety includes the availability and use of multiple data sources and the structural heterogeneity of the data types. One example of heterogeneity is the integration of electronic health record (EHR) data combined with death index data, genomics or imaging data, social media, and personal app data. Veracity addresses the characteristics of quality in that data are often noisy, messy, and contain errors. Although issues of data uncertainty result, data quality methods, tools, and analytic methods can be used to reduce uncertainty. Finally, value is defined as the ability to obtain insights from the data and repurpose data for multiple uses. The “big data” term represents large, complex data sets that require new ways of thinking, novel analytics, new team science members, and redesigning the core constructs of conducting science as depicted in Figure 1.

Figure 1 - Big data and data science nursing research model.

Digital data are ubiquitous, making it possible to amass big data to discover new knowledge and achieve the quadruple aim of better health, improved patient experiences, reduced costs, and improved satisfaction of providers (Bodenheimer & Sinsky, 2014). Boyd and Crawford note that “Big Data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality ... Big Data stakes out new terrains of objects, methods of knowing, and definitions of social life” (p. 665). Kitchin (2014) further asserts that a new epistemological approach is needed to make sense in the world in which new data analytics can lead to new insights “born from the data” rather than testing theory.

We have moved into the fourth Paradigm of Research (Hey, Tansley, & Tolle, 2009), in which patterns in data lead to hypotheses for knowledge discovery, going beyond hypothesis driving data collection. Discovering patterns requires machine learning techniques that can handle large complex data sets to create predictive models for outcomes (Obermeyer & Emanuel, 2016). Examples of big data methods or new data analytics include both multivariate statistics such as logistic regression and contemporary methods such as social network analysis, natural language processing, data mining, and machine learning techniques.

The National Institute of Nursing Research (NINR), in its 2016 strategic plan, emphasizes the importance of technology and its influence on nursing science (NINR, 2016). Specific examples are mentioned in the plan such as studying sensor technology to support people safely remaining in their homes, electronic health record data to identify high-risk patients for hospital readmissions, and computational science for tracking patients across the health continuum through linked EHR data. Digital data are increasing in importance for science, and the methods to analyze such data require new data analytics.

Insights and expertise from an interdisciplinary team are needed to discover meaningful knowledge from today's large, complex data sets (big data). Data analysts with a strong knowledge of databases and data dictionaries are needed to query the data and determine data elements available and the best one(s) to use. Data engineers often need to build a database platform that can store multiple forms (variety) of data that may include structured data, text, audio, and video. Often the data need to be processed in real time as they stream from various physiologic sensors or from social media sites. Data scientists collaborate with data engineers to merge large data files and preprocess the data to harmonize it and eliminate redundancies,

manage data gaps, and reduce errors. Like biostatistics, computer science contributes a deep understanding of mathematics and probabilistic and computational power needed to manage and analyze large amounts of data. Domain experts (nurse scientists) work closely with data scientists and analysts to identify key factors for analysis, reduce dimensionality, model the data, and interpret the results.

“Biomedical informatics is the interdisciplinary, scientific field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific inquiry, problem solving, and decision making, motivated by efforts to improve human health” (American Medical Informatics Association, 2016). Nursing informatics, specifically, is essential to team science serving as the translator between domain knowledge and computer science with the multifaceted connections of health conditions and their related factors. Nurse scientists lead in important roles in the interdisciplinary teams by providing clinical knowledge and detecting meaningful signals from research to advance health science.

Although working with large data sets and interprofessional teams is not new for nurse scientists, it is unknown if these nurse scientists are using advanced and newer analytics to discover new knowledge, contribute to the emerging precision medicine initiative, and achieve the quadruple aim. Precision medicine focuses on targeting treatment to individuals based on genomics, individual patient characteristics, environment, and lifestyle variables (National Institutes of Health, 2015). Nursing science can make use of big data to enhance nursing care that is tailored to patients’ individual needs. The NINR supports nurse scientists integrating big data into nursing research to develop personalized care for preventing and managing illness. It is imperative that nurse scientists engage in big data and interdisciplinary research teams throughout the process of identifying and using methods of discovery to dissemination of findings. Visibility in nursing research and interdisciplinary informatics journals positions nurse scientists to be readily identifiable as innovators in academic nursing, selected as reviewers on national big data and precision medicine research peer-review panels, and recognized as decision makers in health policy groups. The purpose of this literature review was to identify, analyze, and synthesize exemplars of big data nursing research applied to practice and disseminated in key nursing informatics, general biomedical informatics, and nursing research journals.

Methods

Article Retrieval

A literature review was conducted to identify exemplars of big data nursing research applied to practice and disseminated in key informatics and nursing research journals and general biomedical informatics journals. An exemplar indicates that the type of research is one that advances the science by focusing on more complex or cutting-edge methods of analyzing big data. The following inclusion and exclusion criteria were used in the literature search and subsequent analysis of the studies retrieved. Inclusion criteria included the following:

1. One of the authors must be a nurse to be considered nursing research (Kim, Ohno-Machado, Oh, & Jiang, 2014).
2. Research is published in one of the peer-reviewed journals listed in the search strategy (Kim et al., 2014).
3. The focus is on nursing practice or systems that affect nursing. Nursing practice is defined as nursing care, patient conditions important to nurses, and/or systems in which nurses work.
4. Data resources must be digital. Data science addresses the use of digital data according to the National Consortium on Data Science (Ahalt et al., 2013).
5. Although inferential analysis can be used for big data research (Wang & Krishnan, 2014), this literature search focuses on multivariate analysis or contemporary methods of analyzing big data that move the science forward.
6. Evidence of at least one big data characteristic (volume in sample size or data points collected, velocity of data, or variety in data format) is present.
7. Studies are published January 2009 through December 2015.

Exclusion criteria were the following:

1. Designs that are qualitative, case studies, quality improvement, surveys, or usability studies.
2. Analysis that is descriptive, univariate, or bivariate only.
3. Research that focuses on teaching, learning, or informatics competencies.

Search strategies were developed to identify articles that met the criteria; however, a manual review (described later) was conducted to further determine if the studies met inclusion and exclusion criteria.

The literature search was designed and conducted by a master’s prepared librarian with more than 15 years of experience working with students, faculty, and staff affiliated with the university’s Academic Health Center. The search methodology was similar to that reported by Kim et al. (2014) for review of nursing informatics research. Kim et al. (2014) selected 13 key nursing informatics, general biomedical informatics, and nursing research journals and identified relevant studies published from 2009 through 2013 in these journals using the Web of Science database. For the current review, the same journals selected by Kim et al. (2014) were used,

and four additional journals were included where nursing big data articles were likely to be published. These 17 journals were searched using the Web of Science database on February 23, 2016. Results were limited to publication dates from 2009 through 2015. This date range defines the beginning of a new era for digital data available in electronic health records through the time frame when the literature search was initiated.

Three unique search approaches, each optimized for the characteristics of the discipline, were employed, one for each category of journals: nursing informatics, general biomedical informatics, and nursing research. For the nursing informatics journal category, one publication was included (Computers Informatics Nursing). All articles from this journal were retrieved. For the general biomedical informatics journals, eight publications were included (Journal of Medical Internet Research, Journal of the American Medical Informatics Association, Medical Decision Making, International Journal of Medical Informatics, Journal of Biomedical Informatics, BMC Medical Informatics and Decision Making, Methods of Information in Medicine, and Applied Clinical Informatics). Because these journals are interdisciplinary, all journals in the general biomedical informatics group were searched to retrieve articles listing a nursing affiliation (school, department, college, center, etc.) in the author affiliation field. For the nursing research journals, eight publications were included: International Journal of Nursing Studies, Journal of Advanced Nursing, Nursing Research, Journal of Nursing Scholarship, Nursing Outlook, Journal of Nursing Administration, Western Journal of Nursing Research, and Research in Nursing and Health.

Journals in the nursing research group were searched to retrieve broad big data research concepts using the terms (“machine learning” OR “health information technolog*” OR “data mining” OR “clinical information” OR “health information exchange*” OR “data warehous*” OR informatics) in the title, abstract, or keywords. These terms were chosen by the team after an initial broad general review of informatics research articles to determine the most relevant keywords used by any discipline conducting big data/data science research. The combined results from all journal groups were then searched to exclude articles with the words (usability OR education OR student* OR learn* OR competenc*) in the title. We eliminated usability as we did not find these types of big data studies that apply to nursing practice or systems that affect nursing. The final combined results were limited to “article” publication type only, which excluded reviews, editorial material, proceedings papers, corrections, and letters. A total of 650 citations were retrieved for further screening (see Figure 2).

Figure 2 - Literature search results.
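
The keyword rule described for the nursing research journal group, followed by the title-based exclusion, can be illustrated with a minimal Python sketch. This is not the Web of Science query syntax or the authors' actual tooling: the record field names (`title`, `abstract`, `keywords`), the helper names, and the regular-expression handling of the `*` wildcard are assumptions made for illustration only. The sketch also leaves out the other two retrieval approaches (all articles from the nursing informatics journal, and the nursing-affiliation filter for the general biomedical informatics journals).

```python
import re

# Inclusion terms searched in title, abstract, or keywords (from the text above).
INCLUDE_TERMS = [
    "machine learning",
    "health information technolog*",
    "data mining",
    "clinical information",
    "health information exchange*",
    "data warehous*",
    "informatics",
]

# Exclusion terms searched in the title only (from the text above).
EXCLUDE_TITLE_TERMS = ["usability", "education", "student*", "learn*", "competenc*"]


def term_to_regex(term):
    """Compile a search term into a case-insensitive, word-bounded regex.

    A trailing * is treated as "any word ending" (an assumed reading of the wildcard).
    """
    stem = term.rstrip("*")
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + re.escape(stem) + suffix + r"\b", re.IGNORECASE)


INCLUDE_PATTERNS = [term_to_regex(t) for t in INCLUDE_TERMS]
EXCLUDE_PATTERNS = [term_to_regex(t) for t in EXCLUDE_TITLE_TERMS]


def matches_any(patterns, text):
    """Return True if at least one pattern occurs in the text."""
    return any(p.search(text) for p in patterns)


def keep_record(record):
    """Apply the inclusion terms to title/abstract/keywords, then the title exclusion.

    `record` is a dict with hypothetical keys: 'title', 'abstract', 'keywords'.
    """
    searchable = " ".join(
        [record["title"], record["abstract"], " ".join(record["keywords"])]
    )
    if not matches_any(INCLUDE_PATTERNS, searchable):
        return False
    return not matches_any(EXCLUDE_PATTERNS, record["title"])


# Hypothetical records, invented for illustration only.
EXAMPLES = [
    {"title": "Data mining of pressure ulcer risk factors",
     "abstract": "Incident reports were analyzed.", "keywords": ["nursing"]},
    {"title": "Usability of a nursing informatics dashboard",
     "abstract": "Nurses rated the dashboard.", "keywords": []},
    {"title": "Nurse staffing and patient outcomes",
     "abstract": "A mailed survey of hospitals.", "keywords": ["staffing"]},
    {"title": "Predicting readmission in home care",
     "abstract": "We queried a data warehouse of EHR records.", "keywords": ["home care"]},
]

kept_titles = [r["title"] for r in EXAMPLES if keep_record(r)]
```

In this sketch the second record is dropped by the title exclusion even though it matches an inclusion term, and the third is dropped because no inclusion term matches anywhere in the record.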


Analysis Method of Articles

The purpose of this literature review was to identify, analyze, and synthesize exemplars of big data nursing research applied to practice and disseminated in key nursing informatics, general biomedical informatics, and nursing research journals. Whereas the selection of studies was based on the method of Kim et al. (2014), the process of analyzing studies was based on one of the methods noted in the typology of reviews by Grant and Booth (2009). Fourteen types of reviews or methods of analyzing the literature were described by Grant and Booth (2009) as follows: critical review, literature review, mapping review/systematic map, meta-analysis, mixed studies review/mixed methods review, overview, qualitative systematic review/qualitative evidence synthesis, rapid review, scoping review, state-of-the-art review, systematic review, systematic search and review, systematized review, and umbrella review. The “literature review” method is “an examination of recent or current literature [that] can cover a wide range of subjects at various levels of completeness and comprehensiveness, and may include research findings. A literature review may or may not include comprehensive searching, and may or may not include quality assessment. [The synthesis is] typically narrative [and the analysis] may be chronological, conceptual, thematic, etc.” (Grant & Booth, 2009). The current work fits the literature review typology of Grant and Booth (2009). A systematic literature search was conducted with selected journals. Inclusion and exclusion criteria were applied. A quality assessment of studies was not conducted. A narrative format was used for synthesizing results.

The articles were reviewed using the following process. The search strategy filtered out many articles for consistency with the inclusion and exclusion criteria for this literature review; however, further screening of the abstract (or article when necessary) was manually conducted by two nurses for each study. An evaluation form was created in an Excel spreadsheet to screen each article for consistency with the following inclusion criteria: one of the authors must be a nurse, the focus is on practice, study meets exclusion criteria for study designs, data source is digital, the analysis is multivariate or contemporary methods, and the data are consistent with big data characteristics of large volume, variety, or velocity. A percent agreement was determined between the two reviewers whether a study met the criteria for nursing big data research as defined in this literature review.

Results

There were 650 articles identified from the literature search that were further screened for final consideration as exemplars of nursing big data research consistent with the inclusion and exclusion criteria. Each article was reviewed by two of the authors, and the percent agreement ranged from 82.1% to 100% for each group. Reviewers were then asked to resolve disagreements where agreement was less than 95%; those with agreement greater than 95% were reviewed by the primary author. This resulted in a percent agreement ranging from 92% to 100%; studies with disagreement were retained for the next level of evaluation. A screening of the resulting list of 28 studies was conducted by five of the authors using the same instructions as in the first review; the final identification of 18 articles representing 17 studies is noted in Table 1.

Purposes

The 18 articles retained represented 17 studies: 2 articles represent 2 parts of the same study (Buis et al., 2013a; Buis et al., 2013b). Examination of studies revealed three overarching purposes for carrying out each study: knowledge discovery, prediction, and evaluation. Six studies were considered to be knowledge discovery because they sought to understand the capability of big data to discover new meaning, five studies examined predictive factors of process and outcome measures, and six studies evaluated the impact of technical or nursing interventions on patient outcomes.

The knowledge discovery studies included identifying factors, associations, or patterns related to patient outcomes using methods such as data mining and natural language processing. The purposes of the studies about prediction included development and improvement of an algorithm or a tool to predict risk factors or patterns for patient outcomes. Evaluation studies included development, assessment, and evaluation of a new framework or tools for patient outcomes such as decision support system, care coordination, or internet portal using large data sets and big data analysis method.

Settings

Settings for 10 of the 17 studies were acute care or inpatient, 2 were ambulatory care based, 2 were community based (following individuals across settings of care), 2 were in home health care, and 1 was in a public health department.

Data Sources

The primary source of data was the EHR (n = 14). In some instances, the data were directly from the EHR and in other cases, the clinical data were integrated into a database or data warehouse. Two studies used a variety of data sources integrated with EHRs. For instance, one study integrated registration, scheduling, and billing data with EHR data (Merrill, Sheehan, Carley, & Stetson, 2015), whereas another study integrated data from the hospital credentialing system, state provider registration system, and the EHR (Cho
Table 1. e Results of Nursing Big Data Science Exemplar Studies

Stated Purpose Setting Source of Data Design/Analytic Sample Big Data Nurse as
Collection Techniques Category Author
Knowledge discovery: to identify critical Acute care: Taiwanese Database, Web-based Retrospective cohort 3,324 pressure ulcer Volume First
factors related to patient falls through regional 1,000-bed incident reporting design/data mining incidents, 725 patient
the application of data mining to teaching hospital system artificial neural fall incidents
available data through a Web-based network analysis and
hospital reporting system (Lee et al., multivariate stepwise
2011) logistic regression
Knowledge discovery: to present Acute care: three labor Enterprise data Retrospective cohort 686.402 L&D Volume Second
methods for identifying and analyzing and delivery units warehouse containing design/logistic documentation events variety
associations among nursing care affiliated with one specialty EHR for labor regression (linear) and associated with the set
processes, patient attributes, and health system and delivery survival analysis of 1,093 patients
patient outcomes using unit-level and
patient-level representations of care

Nurs Outlook xxx (2016) 1e13

about labor and delivery derived from
computerized nursing documentation
(Hall et al., 2009)
Knowledge discovery: to explore the Acute care: oncology EHR nursing notes Retrospective cohort 553 oncology nursing Volume First
ability of NLP for capturing nursing unit in an academic design/natural notes for 22 oncology variety
concepts to describe most frequent institution in New language processing patients
signs and symptoms and York City
interventions related to
chemotherapy side-effects and pain
management; to determine feasibility
to extract patient safety and outcomes
only available in nursing notes and
structure the data for analysis (Hyun
et al., 2009)
Knowledge discovery: to determine the Acute care: large health Clinical and Retrospective cohort 4,308 patients with CHF Volume First
feasibility of using network analysis to system that includes administrative data study/network variety
explore patterns of service delivery for inpatient and sets from the EHR, analysis using
patients with congestive heart failure ambulatory care registration, CoUsage and
who transit between services and scheduling, and Transition Networks
settings (Merrill et al., 2015) administrative billing
Knowledge discovery: to investigate Public health: a public Database, research Retrospective cohort 726 high-risk mothers Volume First
problem stabilization, a metric for health department study/KaplaneMeier who received 50,360
problem improvement during home curves interventions
visiting services for high-risk mothers
(Monsen et al., 2011)
(continued on next page)
Table 1. e (Continued )
Stated Purpose Setting Source of Data Design/Analytic Sample Big Data Nurse as
Collection Techniques Category Author
Knowledge discovery: to utilize data- Acute care: 1,000-bed Database, Web-based Data mining exploration 3,324 cases of pressure Volume First
mining techniques as a means of Taiwanese regional incident reporting using neural ulcers
identifying risk factors related to teaching hospital system networks, support
different stages of pressure ulcers to vector machines,
demonstrate how this means of classification and
analysis might be used as a vehicle to regression trees,
guide improved care quality (Lee et al., random forests, and
2012) boosted trees,
followed by
multivariate stepwise
logistic regression
Prediction: To develop a measure to Home health: patient Integration of multiple Retrospective cohort 1,643 home care patient Volume First
predict risk of hospitalization among episodes from 14 data types from study/generalized episodes variety

Nurs Outlook xxx (2016) 1e13

home care patients, the small- to mid-sized multiple home health linear regression and
hospitalization risk score, and Medicare-certified EHRs ROCs
compare it with an existing severity of home care agencies
illness measure, the Charlson index of located in the mid-
comorbidity (Monsen et al., 2012) west and one on the
east coast
Prediction: to assess the effect of Ambulatory: primary Prescription logs from Cross-sectional 478 providers Volume First
provider-level characteristics on care practices an EHR observational study/ (physicians and nurse variety
variation in medication prescribing affiliated with two Hospital credentialing univariate analysis practitioners), 20 or velocity
patterns and the relationship of these teaching hospitals center, State provider and generalized linear more alerts per
patterns relative to the rates of alerts registration system regression model with provider
and overriding alerts (Cho et al., 2015) maximum likelihood
Prediction: to determine the degree to Acute care: university Nursing notes from the Retrospective cohort 23,528 electronic patient Volume First
which the clinical information in the hospital in Finland EHR and heart patient study/linguistic records, 132,053 data variety
electronic patient record can be used acuity from an acuity preprocessing, vector- points
to predict their Oulu patient system space text modeling,
classification acuity scores the next and regularized least-
day (Kontio et al., 2014) squares regression
Prediction: to compare data mining Acute care: military Military Nursing Retrospective 1,653 patients medical- Volume Last
models to identify clinically relevant hospitals in the United Outcomes Database observational study/ surgical, critical care,
factors, beyond the Braden Scale, States data mining, and step-down units
associated with pressure ulcers and to regression, decision
accurately predict pressure ulcer trees, random forests,
prevalence (Raju et al., 2015) multiple adaptive
regression splines
(continued on next page)

Table 1. e (Continued )
Stated Purpose Setting Source of Data Design/Analytic Sample Big Data Nurse as
Collection Techniques Category Author
Prediction: to improve the algorithm for Home health: 15 Integration of multiple Retrospective 911 home health Volume Last
predicting elderly patients’ risks for Medicare-certified data types from observational study/ patients variety
readmission by optimizing the home health care multiple home health ROC analysis and
underlying criteria within the agencies EHRs logistic regression
algorithm and determining the
optimal cut points for the high-risk
medication regimen (Olson, Dierich,
Adam, & Westra, 2014a)
Evaluation: to examine the influence of Acute care: nine Electronic Comparative study/data 840 pressure ulcer Volume First
nurse continuity on the prevention of medicalesurgical and documentation tool mining cluster episodes of care variety
hospital acquired pressure ulcers critical care units in HANDS analysis, logistic
(Stifter et al., 2015) two large regression
communities, 1

Nurs Outlook xxx (2016) 1e13

university, and small
community hospital
Evaluation: to develop and assess the Acute care: large, urban, EHR integrated into a Before and after 1,214 surgical ICU Volume First
impact of a decision support not-for-profit, clinical data comparative study/ patients variety
intervention to predict hospital university-affiliated repository logistic regression and velocity
acquired pressure ulcers on the teaching hospital in Poisson regression
prevalence of ulcers and length of stay Seoul model
in an ICU and on the user adoption
rate and attitudes (Cho, Park, Kim,
Lee, & Bates, 2013)
Evaluation: to use the RE-AIM Community: three Multiple data systems Prospective evaluation/ 1,838/5,570 participants Volume Last
framework to document txt4health Beacon communities: multiple regression enrolled in text variety
efficacy, reach, and adoption. A text Southeast Michigan, message program
message program was used to raise Greater Cincinnati,
Type 2 diabetes risk awareness and New Orleans, LA
promote behavior change (Buis et al.,
2013a; Buis et al., 2013b)
Evaluation: to describe the types of Ambulatory care: Data warehouse that Retrospective 5,963 patient portal Volume First
diabetes patients who utilize the academic health includes data from observational study, users variety
patient-provider Internet portal and system EHR and patient portal linear regression
explore any preliminary differences in
patient outcomes (Shaw & Ferranti,
Evaluation: to examine the effect of the Acute care: three Discharge Decision Quasi-experimental, 3,005 assessments for Volume First
Discharge Decision Support System hospitals in a large Support System (D2S2) two-phase study newly admitted variety
on 30- and 60-day readmissions urban, academic software integrated (control and hospital patients
(Bowles et al., 2015) health center into an EHR experimental phases)/
regression, Cox
proportional hazards,
KaplaneMeier product
limit curve
(continued on next page)
Nurs Outlook xxx (2016) 1e13 9

Table 1 (Continued)

Evaluation: to create a domain-specific ontology of care coordination using natural language processing with concepts extracted from notes and weighted for dose of activity. To compare specific care coordination activities used by AIP nurse care coordinators and HHC nurses when coordinating care for older community-dwelling adults and suggest a method to quantify care coordination (Popejoy et al., 2015) | Ambulatory, HHC | EHR case management | Retrospective cohort/natural language processing to extract concepts, create ontology with weighting for dose, then t-tests to contrast | 12,235 case management notes for 271 patients with AIP care compared with 692 receiving HHC

Note. AIP, aging in place; EHR, electronic health record; HANDS, Hands on Automated Nursing Data System; HHC, home health care; ICU, intensive care unit; NLP, natural language processing; ROC, receiver-operating curve.

et al., 2015). Two studies obtained data from a Web-based reporting system. In another study, a nursing outcome database was used.

The majority of studies (n = 10) used some type of regression analysis either alone or in combination with other methods, for example, survival analysis, Cox proportional hazard models, data mining, and machine learning. Other analytic techniques included clustering, decision trees, Kaplan-Meier curves, natural language processing (including linguistic preprocessing), network analysis, neural network analysis, random forests, receiver-operating curves, support vector machines, survival analysis, and vector-spaced text modeling.

Meeting Big Data Requirements

Each of the studies met at least one of the big data characteristics. Volume was defined by the sample size that was the unit of analysis or the number of data points collected on the sample. Examples of large sample sizes include 4,308 patients with CHF, 5,963 portal users, or 23,528 electronic patient records. Examples of large numbers of data points include 12,235 nursing notes, 50,360 interventions, or 20 or more alerts for 478 providers. Variety indicated either two or more data sources or types of data, that is, administrative and clinical or structured and unstructured data. For example, administrative claims data were combined with clinical data to evaluate patterns of care across settings for CHF patients, and EHR in combination with patient portal data were used to explore differences in diabetic patient outcomes for portal users.

Studies were identified with the big data characteristic of velocity if data points were accumulating in very small time frames so that a minute or hour of time produces an enormous amount of data points, for example, alarms or alerts. For instance, one study measured physician override rates for specific types of alerts (Cho et al., 2015). All 17 studies used high volume data; 2 were considered high velocity and 12 used a variety of data sources or types of data. Veracity and value characteristics were not represented in the selected studies.
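The operational definitions of volume, velocity, and variety given above can be made concrete with a small sketch. The Python example below is illustrative only. The review reports no numeric cutoffs, so the thresholds `VOLUME_MIN` and `VELOCITY_MIN_PER_HOUR`, the `Study` record, and its field names are hypothetical assumptions introduced here; only the variety rule (two or more data sources or types of data) follows the text directly. The two example records reuse figures quoted in the text purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical cutoffs: the review does not state numeric thresholds.
VOLUME_MIN = 1_000             # assumed minimum sample size or data points
VELOCITY_MIN_PER_HOUR = 100    # assumed minimum data points per hour


@dataclass
class Study:
    """Hypothetical record describing one reviewed study."""
    label: str
    sample_size: int = 0
    data_points: int = 0
    points_per_hour: float = 0.0
    data_sources: set = field(default_factory=set)
    data_types: set = field(default_factory=set)


def characterize(study: Study) -> set:
    """Return the big data characteristics a study meets under the assumed rules."""
    traits = set()
    # Volume: sample size (unit of analysis) or number of data points collected.
    if max(study.sample_size, study.data_points) >= VOLUME_MIN:
        traits.add("volume")
    # Velocity: data points accumulating within very small time frames.
    if study.points_per_hour >= VELOCITY_MIN_PER_HOUR:
        traits.add("velocity")
    # Variety: two or more data sources or two or more types of data.
    if len(study.data_sources) >= 2 or len(study.data_types) >= 2:
        traits.add("variety")
    return traits


chf = Study(
    label="CHF care patterns",
    sample_size=4308,
    data_sources={"administrative claims", "clinical"},
)
notes = Study(label="nursing notes", data_points=12235, data_types={"unstructured"})

print(sorted(characterize(chf)))    # ['variety', 'volume']
print(sorted(characterize(notes)))  # ['volume']
```

Because the cutoffs are assumptions, changing them changes which studies qualify; a real classification would need thresholds agreed on by the reviewers.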

Nurse Authorship

To be included in the study, one of the authors had to be a nurse. All articles included at least one nurse as an author. For 13 of the articles, nurses were first authors. A nurse was last author in four publications and second author in one publication.

Data Management Issues

Data management issues can be categorized as data selection, preparation, and transformation analysis. Selection begins with the data source as noted in Table 1 and also includes selecting variables and patients to address research questions. Rather than using all available data, inclusion and exclusion criteria are used to select subsets of observations. Data sources noted in Table 1 required users to identify the variables of interest from a large pool of data. In some cases, investigators selected variables based on review of the literature; however, some variables were eliminated due to missingness, duplication of data, or events that were too small for analysis (Cho et al., 2015). Other investigators who included a large number of variables then used feature selection methods to find variables significantly related to the outcome before proceeding with further analysis (Lee, Lin, Mills, & Kuo, 2012).

Data preparation for big data research has similarities to traditional research, such as detection and management of missingness and outliers. However, the question of whether missing data occur at random or if there is a pattern of missing data affects whether the data can be imputed or if variables or patients are eliminated from the analysis (Hall et al., 2009; Lee et al., 2011; Monsen, Farri, McNaughton, & Savik, 2011; Raju, Su, Patrician, Loan, & McCarthy, 2015). In one study, computation and imputation techniques, such as random forest, were used to treat missingness when data were not removed (Raju et al., 2015). To address outliers, one method was the use of standard scores, such as Z value or retention of data within three standard deviations of the mean, to transform and standardize data (Lee et al., 2011, 2012); others excluded patients who were outliers (Hall et al., 2009).

Transformations of the original data created new variables in some studies. Algorithms were used to combine medical diagnoses into the Charlson index of comorbidity, transform Omaha system problems and ratings into a hospital risk score to predict hospitalization, or determine patterns of interventions to represent problem stabilization (Monsen et al., 2011, 2012). Textual data were transformed using natural language processing (NLP) into a numerical format for analysis (Hyun, Johnson, & Bakken, 2009; Kontio et al., 2014; Popejoy et al., 2015); however, NLP requires a set of standard terms such as those from evidence-based guidelines or terminologies to support the transformations (Stifter et al., 2015). Olson, Dierich, and Westra (2014b) automated mapping of medication data to data standards and mapping of data to instruments such as the Beers criteria and Potentially Inappropriate Medications for predicting rehospitalization.

Transformation is also used to summarize large numbers of events or convert continuous or ordinal data into categories. An example is the number of patients per day per provider or episodes of care per patient; an average is one method to summarize large numbers of events (Cho et al., 2015; Monsen et al., 2011). Another example is transforming continuous data into classes or categories such as age, vital signs, fall risk scores, or rank of a medical school (Cho et al., 2015; Lee et al., 2011). Domain expertise and review of the literature are essential to determine how to best transform these variables for meaningful results.

Discussion

The era of big data and cutting-edge analytic methods in nursing research and clinical scholarship challenges nurse scientists to extend the data sources and analytic methods used for discovering and translating knowledge to improve care quality and safety, lower costs, and address provider satisfaction. Are nursing scientists and is nursing science addressing big data and data science opportunities to advance nursing knowledge? The purpose of this review was to identify, analyze, and synthesize exemplars of big data nursing research applied to practice and disseminated in key informatics and nursing research journals between 2009 and 2015. Seventeen studies were identified. This literature review confirmed that nursing scientists have engaged in big data and data science.

Although this finding is promising, multiple challenges are clear. The majority of studies focused on clinical practice in a general sense; greater specificity in describing the practice environment and more studies specifically addressing nursing clinical practice are needed. There is an urgent need and opportunity to expand big data science in health promotion, chronic care, symptom management, and systems. How can nursing advance the big data science opportunities across the care continuum, from home health, to intensive care, to acute and chronic care for children, adults, families, and communities? How can big data science be extended to include health trajectories? How can big data science be expanded to include systems data, for example, the Nursing Management Minimum Data Set, to understand the contextual impact on care management, outcomes, costs, and patient satisfaction?

Although multiple data sources were used in the 17 studies, nursing science is clearly not accessing the plethora of data resources available and applicable. Of particular note was that only one study was found related to public health (Monsen et al., 2011). Because public health has long been noted for amassing large data sets, this raises questions as well as opportunities. Although numerous data sources were used in the studies cited in this literature search, there was no

evidence of data from sources such as clinic, urgent care and school settings, administrative claims data from a health plan/insurer perspective, or laboratory or imaging data. Social media data are available and used in other health-related studies; how does nursing expand to include these data sources? Data originating from wearable technologies are an additional area for further big data research. Further, consideration of the social determinants of health and molecular biology combined with EHR data would empower nursing science using big data to address the complexities of health and nursing. Multiple networks supporting access to data from a multitude of settings, sites, and collaborating partners are available, for example, Patient-Centered Outcomes Research Institute (PCORI) and Clinical and Translational Science Awards (CTSAs). Although PCORI and CTSAs report engagement of nursing, no evidence of the use of these consortia and data resources that are available within and across consortia was present.

Although numerous analytic methods were employed in the 17 studies, several issues emerged. Most fundamental is the current definition of big data analytic methods. This literature review excluded descriptive and inferential statistics and included NLP, social network analysis, regression, and data mining. Often, medical ontologies are used with NLP to map notes for further analyses of structured data. More clarification and precise definition of data science analytic methods are needed. Consideration of analysis of unstructured data, analysis of multimodal data, and network science approaches are areas for further research. Power analyses and unconventional data designs, including optimization of relational data and partially observed data, need to be explored.

More fundamental to this work is the question: what are big data and data science and implications for discovery? Is it just new methods? Is it just having access to more data? NIH notes that “big data is more than just very large data or a large number of data sources” (National Institutes of Health, 2015). Data science is an interdisciplinary field that focuses on creating models that capture the underlying patterns of complex systems and codifying models into working applications (Dhar, 2013; Leek, 2013). In contrast but complementary, big data focus on collection and management of large amounts of varied data. This field of study extends beyond a new method; it is the emergence of the fourth paradigm of science: eScience (Hey et al., 2009). The emergence of this field provides scholars with an opportunity for in-depth discernment of Kurzweil’s (2006) concept of singularity, a point at which the combination of human and machine intelligence transcends the limits of a human mind. Chatterjee further asserts that “branches of science that were earlier thought to be separate entities and self-contained in their own domains are merging incessantly ... The human brain is a complex, adaptive and a self-organizing system. Intelligence is an emergent property of this complex networked structure ... the human brain is a dissipative system, an entropy producing machine” (Chatterjee, 2012, p. 578). The key challenge is determining how nursing scientists partner with the data science field to transcend human intellectual limitations.

Finally, another major implication of this literature review is to ask whether nursing faculty and our preparation of future scientists (PhD programs) are prepared for big data and data science. What are the implications for PhD curricula? What strategies can be employed to prepare faculty and support PhD curricular redesign? In addition to socialization of teams, this field of inquiry requires a team composition that includes, for example, domain experts, nurse scientists, data scientists, predictive modeling (epidemiology) experts, data engineers, and data dictionary analysts, among others. Do our current resources in academic environments support large-scale computing?

Limitations

There are a number of limitations to this literature review. First, additional nursing big data studies may have been missed by limiting the journals and specific key words searched for this review and the timeframe selected. The difficulty in determining whether a scientist/author is a nurse could have resulted in missing additional nursing studies. Although it is recognized that emerging new science is often first reported in conference proceedings, this source of dissemination was not part of the literature search. The literature review also limited the types of analytics to focus on stronger associations or causal analysis using traditional statistical methods as well as cutting-edge contemporary analyses. Inclusion of a broader array of analytic methods would demonstrate the increasing emphasis on use of big data for nursing.

Future Directions

The purpose of the literature review was to identify, analyze, and synthesize exemplars of big data nursing research applied to practice and disseminated in key informatics and nursing research journals. Future studies are needed to search additional journals and to include additional types of studies, such as surveys, and other analytic methods. Grant and Booth (2009) identified 14 types of reviews, some of which evaluate the quality of the studies; this work remains to be done. As an increasing number of nurse scientists foray into big data and use of data science, there is a need to continuously make this work visible in journals such as those included in the current literature review. It is imperative that nurse scientists and nursing research in big data science clearly explicate the meaning and importance of the studies to nursing practice. Finally, big data and data science invite our science to reinvent itself within this emergent paradigm shift.
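As an illustration of the data preparation and transformation steps described under Data Management Issues, the following Python sketch shows standard scores (z values) with retention of data within three standard deviations of the mean, followed by conversion of a continuous variable into categories. It is a minimal sketch under stated assumptions: the function names, the sample values, and the age cut points are hypothetical and are not taken from the reviewed studies.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to z scores using the sample mean and standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def retain_within(values, max_abs_z=3.0):
    """Keep only observations whose z score lies within +/- max_abs_z."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= max_abs_z]


def to_category(value, cut_points, labels):
    """Return the label of the first bin whose upper bound exceeds value, else the last label."""
    for upper, label in zip(cut_points, labels):
        if value < upper:
            return label
    return labels[-1]


# Hypothetical length-of-stay values (days) with one extreme observation.
stays = [3, 4, 5, 4, 3, 5, 4, 3, 4, 5, 4, 3, 5, 4, 60]
kept = retain_within(stays)
print(len(stays), len(kept))  # 15 14

# Hypothetical age cut points; real studies would derive them from
# domain expertise and review of the literature.
AGE_CUTS = [18, 65]
AGE_LABELS = ["child", "adult", "older adult"]
print([to_category(a, AGE_CUTS, AGE_LABELS) for a in (7, 40, 82)])
# ['child', 'adult', 'older adult']
```

Filtering on z scores is only one of the approaches named in the text; other studies excluded outlying patients outright, and the choice of cut points for categories depends on the clinical question.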
References

Ahalt, S. C., Bizon, C., Evans, J., Erlich, Y., Ginsburg, G. S., Krishnamurthy, A., ... Wilhelmsen, K. (2013). Data to Discovery: Genomes to Health. A White Paper from the National Consortium for Data Science. Retrieved from http://data2discovery.org/dev/wp-content/uploads/2014/02/NCDS-Summit-2013.pdf
American Medical Informatics Association. (2016). Definition of Biomedical Informatics. Retrieved from https://www.amia.org/biomedical-informatics-core-competencies
Bodenheimer, T., & Sinsky, C. (2014). From triple to quadruple aim: Care of the patient requires care of the provider. Annals of Family Medicine, 12(6), 573-576. http://dx.doi.org/10.1370/afm.1713.
Bowles, K. H., Chittams, J., Heil, E., Topaz, M., Rickard, K., Bhasker, M., ... Hanlon, A. L. (2015). Successful electronic implementation of discharge referral decision support has a positive impact on 30- and 60-day readmissions. Research in Nursing & Health, 38(2), 102-114. http://dx.doi.org/10.1002/nur.21643.
Brennan, P. F., & Bakken, S. (2015). Nursing needs big data and big data needs nursing. Journal of Nursing Scholarship, 47(5), 477-484. http://dx.doi.org/10.1111/jnu.12159.
Buis, L. R., Hirzel, L., Turske, S. A., Jardins, T. R. D., Yarandi, H., & Bondurant, P. (2013a). Use of a text message program to raise type 2 diabetes risk awareness and promote health behavior change (part I): Assessment of participant reach and adoption. Journal of Medical Internet Research, 15(12), e281. http://dx.doi.org/10.2196/jmir.2928.
Buis, L. R., Hirzel, L., Turske, S. A., Jardins, T. R. D., Yarandi, H., & Bondurant, P. (2013b). Use of a text message program to raise type 2 diabetes risk awareness and promote health behavior change (part II): Assessment of participants’ perceptions on efficacy. Journal of Medical Internet Research, 15(12), e282. http://dx.doi.org/10.2196/jmir.2929.
Chatterjee, A. B. (2012). Intrinsic limitations of the human mind. International Journal of Basic and Applied Sciences, 1(4), 578-583. http://dx.doi.org/10.14419/ijbas.v1i4.418.
Cho, I., Park, I., Kim, E., Lee, E., & Bates, D. W. (2013). Using EHR data to predict hospital-acquired pressure ulcers: A prospective study of a Bayesian network model. International Journal of Medical Informatics, 82(11), 1059-1067. http://dx.doi.org/10.1016/j.ijmedinf.2013.06.012.
Cho, I., Slight, S. P., Nanji, K. C., Seger, D. L., Maniam, N., Fiskio, J. M., ... Bates, D. W. (2015). The effect of provider characteristics on the responses to medication-related decision support alerts. International Journal of Medical Informatics, 84(9), 630-639. http://dx.doi.org/10.1016/j.ijmedinf.2015.04.006.
Dhar, V. (2013). Data science and prediction. Communications of the ACM, 56(12), 64-73. http://dx.doi.org/10.1145/2500499.
Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.
Grant, M. J., & Booth, A. (2009). A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information and Libraries Journal, 26(2), 91-108. http://dx.doi.org/10.1111/j.1471-1842.2009.00848.x.
Hall, E. S., Poynton, M. R., Narus, S. P., Jones, S. S., Evans, R. S., Varner, M. W., & Thornton, S. N. (2009). Patient-level analysis of outcomes using structured labor and delivery data. Journal of Biomedical Informatics, 42(4), 702-709. http://dx.doi.org/10.1016/j.jbi.2009.01.008.
Hey, T., Tansley, S., & Tolle, K. (Eds.). (2009). The fourth paradigm: Data-intensive scientific discovery. Seattle, WA: Microsoft Research.
Hyun, S., Johnson, S. B., & Bakken, S. (2009). Exploring the ability of natural language processing to extract data from nursing narratives. CIN: Computers Informatics Nursing, 27(4), 215-223.
Kim, H., Ohno-Machado, L., Oh, J., & Jiang, X. (2014). Trends in publication of nursing informatics research. AMIA Annual Symposium Proceedings/AMIA Symposium, 2014, 805-814.
Kitchin, R. (2014). Big data, new epistemologies and paradigm shifts. Big Data and Society, 1(1), 1-12.
Kontio, E., Airola, A., Pahikkala, T., Lundgren-Laine, H., Junttila, K., Korvenranta, H., ... Salantera, S. (2014). Predicting patient acuity from electronic patient records. Journal of Biomedical Informatics, 51, 35-40. http://dx.doi.org/10.1016/j.jbi.2014.04.001.
Kurzweil, R. (2006). The Singularity Is Near: When Humans Transcend Biology. Westminster, London: Penguin Books.
Lee, T., Lin, K., Mills, M. E., & Kuo, Y. (2012). Factors related to the prevention and management of pressure ulcers. CIN: Computers Informatics Nursing, 30(9), 489-495. http://dx.doi.org/10.1097/NXN.0b013e3182573aec.
Lee, T., Liu, C., Kuo, Y., Mills, M. E., Fong, J., & Hung, C. (2011). Application of data mining to the identification of critical factors in patient falls using a web-based reporting system. International Journal of Medical Informatics, 80(2), 141-150. http://dx.doi.org/10.1016/j.ijmedinf.2010.10.009.
Leek, J. (2013). The Key Word in “Data Science” Is Not Data, it Is Science. Simply Statistics. Retrieved from http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/
Marr, B. (2015). Big Data: Using SMART Big Data, Analytics and Metrics to Make Better Decisions and Improve Performance. West Sussex, UK: Wiley & Sons.
Merrill, J. A., Sheehan, B. M., Carley, K. M., & Stetson, P. D. (2015). Transition networks in a cohort of patients with congestive heart failure: A novel application of informatics methods to inform care coordination. Applied Clinical Informatics, 6(3), 548-564. http://dx.doi.org/10.4338/ACI-2015-02-RA-0021.
Monsen, K. A., Farri, O., McNaughton, D. B., & Savik, K. (2011). Problem stabilization: A metric for problem improvement in home visiting clients. Applied Clinical Informatics, 2(4), 437-446. http://dx.doi.org/10.4338/ACI-2011-06-RA-0038.
Monsen, K. A., Swanberg, H. L., Oancea, S. C., & Westra, B. L. (2012). Exploring the value of clinical data standards to predict hospitalization of home care patients. Applied Clinical Informatics, 3(4), 419-436. http://dx.doi.org/10.4338/ACI-2012-05-RA-0016.
National Institute of Nursing Research. (2016). The NINR Strategic Plan: Advancing Science, Improving Lives. Retrieved from https://www.ninr.nih.gov/sites/www.ninr.nih.gov/files/NINR_StratPlan2016_reduced.pdf
National Institutes of Health. (2015). NIH precision medicine initiative. NIH Medline Plus, 10(3), 19-21.
Obermeyer, Z., & Emanuel, E. J. (2016). Predicting the future - Big data, machine learning, and clinical medicine. New England Journal of Medicine, 375(13), 1216-1219. http://dx.doi.org/10.1056/NEJMp1606181.
Olson, C. H., Dierich, M., Adam, T., & Westra, B. L. (2014a). Optimization of decision support tool using medication regimens to assess rehospitalization risks. Applied Clinical Informatics, 5(3), 773-788. http://dx.doi.org/10.4338/ACI-2014-04-RA-0040.
Olson, C. H., Dierich, M., & Westra, B. L. (2014b). Automation of a high risk medication regime algorithm in a home health care population. Journal of Biomedical Informatics, 51, 60-71. http://dx.doi.org/10.1016/j.jbi.2014.04.004.
Popejoy, L. L., Khalilia, M. A., Popescu, M., Galambos, C., Lyons, V., Rantz, M., ... Stetzer, F. (2015). Quantifying care coordination using natural language processing and domain-specific ontology. Journal of the American Medical Informatics Association, 22(E1), E93-E103. http://dx.doi.org/10.1136/amiajnl-2014-002702.
Raju, D., Su, X., Patrician, P. A., Loan, L. A., & McCarthy, M. S. (2015). Exploring factors associated with pressure ulcers: A data mining approach. International Journal of Nursing Studies, 52(1), 102-111. http://dx.doi.org/10.1016/j.ijnurstu.2014.08.002.
Shaw, R. J., & Ferranti, J. (2011). Patient-provider internet portals - patient outcomes and use. CIN: Computers Informatics Nursing, 29(12), 714-718. http://dx.doi.org/10.1097/NCN.0b013e318224b597.
Stifter, J., Yao, Y., Lodhi, M. K., Lopez, K. D., Khokhar, A., Wilkie, D. J., & Keenan, G. M. (2015). Nurse continuity and hospital-acquired pressure ulcers: A comparative analysis using an electronic health record “big data” set. Nursing Research, 64(5), 361-371. http://dx.doi.org/10.1097/NNR.0000000000000112.
Wang, W., & Krishnan, E. (2014). Big data and clinicians: A review on the state of the science. JMIR Medical Informatics, 2(1), e1. http://dx.doi.org/10.2196/medinform.2913.