Hal Hagood
u02a1
Analytics Internship: Data Mining Individual 2
Data Import
For this particular data set, three variables were chosen: Age, Sex, and Race. Measures of central tendency (the mean, the median, and the mode) and measures of dispersion (also called variability, scatter, or spread) are given in the tables above and below. These are often called descriptive statistics.
The mean value for Age was 8.32, with a mode of 12.00 and a standard deviation of 4.03.
The mean value for Sex was 1.64, with a mode of 2.0 and a standard deviation of 0.477.
The mean value for Race was 1.30, with a mode of 1.0 and a standard deviation of 1.30.
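The Sex statistics can be cross-checked if Sex is coded 1 and 2. That coding is an assumption: the scheme is not stated here, but a mean of 1.64 and a mode of 2.0 are consistent with it. For a variable with only two codes one unit apart, the mean minus the lower code gives the share of records in the second category. The population standard deviation is then the square root of p(1 − p). The following is a minimal Python sketch of that check; `binary_summary` is a hypothetical helper written for illustration.

```python
import math


def binary_summary(mean_coded, low_code=1.0, high_code=2.0):
    """Hypothetical helper: assumes every value is either low_code or high_code.

    Returns (share in the high category, implied population standard deviation).
    """
    width = high_code - low_code
    # Share of records coded high_code, recovered from the mean
    p_high = (mean_coded - low_code) / width
    # SD of a two-valued variable: width * sqrt(p * (1 - p))
    sd = width * math.sqrt(p_high * (1 - p_high))
    return p_high, sd


p_second, sd_sex = binary_summary(1.64)
print(f"Share in second category: {p_second:.2f}")
print(f"Implied standard deviation: {sd_sex:.3f}")
```

Under the 1/2 coding assumption, a mean of 1.64 puts about 64% of records in the second category. It also implies a standard deviation of about 0.48, which is close to the reported 0.477.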
The mean, median, and mode are measures of central tendency: each summarizes a set of scores with a single number. Suppose you want to describe the data you collected for a particular variable, such as the heights of students in your class, to a friend. One way would be to read every recorded height aloud. Your friend would listen to all of them and then reach a conclusion about how tall students in your class generally are, but this would take far too long, especially in a class of 200 or 300 students. Another way is to use measures of central tendency such as the mean, median, and mode. These summarize many numbers with one or a few numbers and make it easy to tell people about your data (Statistics, 2017).
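The height example can be sketched with Python's standard `statistics` module. The heights below are hypothetical values invented for illustration, not data from this assignment.

```python
import statistics

# Hypothetical class heights in inches (illustrative only, not from this data set)
heights = [64, 66, 66, 68, 69, 70, 70, 70, 72, 75]

mean_height = statistics.mean(heights)      # arithmetic average
median_height = statistics.median(heights)  # middle of the sorted list (average of the two middle values here)
mode_height = statistics.mode(heights)      # most frequent value

print(f"Mean: {mean_height}, median: {median_height}, mode: {mode_height}")
```

The ten heights sum to 690, so the mean is 69. The two middle values, 69 and 70, give a median of 69.5. The mode is 70, the only height that appears three times.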
The standard deviation, along with the range and the variance, is a measure of dispersion. Measures of dispersion describe the spread of scores within a set of scores: are the scores close together or far apart? For example, if you were describing the heights of students in your class to a friend, they might want to know how much the heights vary. Are all the men within a few inches of 5 feet 11 inches? Or is there a lot of variation, with some men at 5 feet and others at 6 feet 5 inches? Measures of dispersion such as the range, variance, and standard deviation describe the spread of scores in a data set. Like measures of central tendency, they summarize many numbers with one or a few numbers (Statistics, 2017).
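The range, variance, and standard deviation can be computed with the `statistics` module, using the same hypothetical heights as an illustration (not data from this assignment). The module provides both sample versions (`variance`, `stdev`, dividing by n − 1) and population versions (`pvariance`, `pstdev`, dividing by n). Which version a particular statistics package reports is worth checking in its documentation.

```python
import statistics

# Hypothetical heights in inches (illustrative only)
heights = [64, 66, 66, 68, 69, 70, 70, 70, 72, 75]

data_range = max(heights) - min(heights)        # largest minus smallest value
sample_variance = statistics.variance(heights)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(heights)           # square root of the sample variance
population_sd = statistics.pstdev(heights)      # divides by n instead of n - 1

print(f"Range: {data_range}")
print(f"Sample variance: {sample_variance:.2f}")
print(f"Sample SD: {sample_sd:.2f}")
print(f"Population SD: {population_sd:.2f}")
```

The squared deviations from the mean of 69 sum to 92. The sample variance is therefore 92 / 9 ≈ 10.22, and the sample standard deviation is about 3.20. The population standard deviation is √9.2 ≈ 3.03.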
The graph for Age spikes at around 2 years of age, then drops at around 5 years of age. The highest spike occurs at around 12 years of age, after which the graph drops to its lowest point at around 15 years of age.
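The age graph is a frequency plot. Each bar counts how many records share a given age, so the tallest bar marks the mode (12 years, matching the mode reported for Age). The sketch below builds a text version of such a plot with `collections.Counter`. The ages are hypothetical values chosen only to loosely mimic the described shape; they are not the actual data.

```python
from collections import Counter

# Hypothetical ages (illustrative only; not the values in this data set)
ages = [1, 2, 2, 2, 3, 4, 6, 7, 8, 9, 10, 10, 11, 11, 12, 12, 12, 12, 13, 14, 15]

freq = Counter(ages)  # maps each age to the number of records with that age

# One text bar per age, in age order, so peaks and dips are easy to see
for age in range(min(ages), max(ages) + 1):
    print(f"{age:>3} | {'#' * freq[age]}")

# The most common age is the tallest bar, i.e. the mode
modal_age, modal_count = freq.most_common(1)[0]
print(f"Highest spike: age {modal_age} ({modal_count} records)")
```

Ages with no records, such as 5 in this invented list, print an empty bar. Indexing a `Counter` with a missing key returns 0 rather than raising an error.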
The graph for Sex (male or female) shows a higher frequency in the second category, which we assume to be female. The recorded ethnicity categories are White, Black, Asian, Pacific Islander, and Indian. We assume that the most frequent racial category is White and the least frequent is Asian or Pacific Islander.
Population density is included as a general control variable for predicting response rates. In terms of socioeconomics and demographics, population density can affect anything from resource availability to commuting time. For this relationship, it is therefore treated as a general proxy for the hustle and bustle of modern life. Population density is also included as a control when examining relationships between other variables and hospital-level Value-Based Purchasing (VBP) scores. In this context, it is treated as a loose proxy for patient volume. This rests on the assumption that hospitals primarily receive patients from the surrounding population, so denser populations lead to a higher influx of patients. Population density is not a perfect proxy for patient volume, as discussed further in the limitations section below.
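Population density is population divided by land area. The sketch below uses a hypothetical helper, `population_density`, and invented figures for two hypothetical hospital service areas to show how the measure ranks areas. Under the paper's assumption, the denser area would be expected to send more patients to its hospital, although density remains only a loose proxy for patient volume.

```python
def population_density(population, land_area_sq_mi):
    """Hypothetical helper: people per square mile of land area."""
    if land_area_sq_mi <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_sq_mi


# Invented service areas: (population, land area in square miles)
service_areas = {
    "urban_area": (850_000, 120.0),
    "rural_area": (40_000, 900.0),
}

densities = {
    name: population_density(pop, area)
    for name, (pop, area) in service_areas.items()
}

# Rank areas from densest to least dense
for name, density in sorted(densities.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {density:,.1f} people per square mile")
```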
Measures of % White and % Hispanic are included to test for possible effects of population demographics on hospital performance and Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) response rates. Again, this operates on the principle that hospitals likely draw patient populations that reflect their surrounding environment. Furthermore, previous research has indicated that differences and similarities in race and ethnicity affect patient-physician relationships, patient communication, care delivery, and perceptions of bias. Potential effects of race and ethnicity should therefore be included in the analytical model (Cdn2, 2017).
Please see the appendix for a further description of the HCAHPS data. The HCAHPS survey contains 21 patient perspectives on care and patient rating items that encompass nine key topics.
References
Cdn2. (2017). HCAHPS Surveys Response Rates, Demographics, and Performance. Retrieved from pdf/HCAHPS_Surveys_-_Response_Rates_Demographics
Hcahpsonline.org. (2017). CAHPS Hospital Survey. Retrieved January 20, 2017, from http://www.hcahpsonline.org/home.aspx
Statistics. (2017). What are measures of central tendency and dispersion? Retrieved January 18, 2017, from What_are_measures_of_central_tendency_and_dispersion.htm#.WH-zURsrJQJ
Appendix
The intent of the HCAHPS initiative is to provide a standardized survey instrument and data collection methodology for measuring patients' perspectives on hospital care. While many hospitals have collected information on patient satisfaction, prior to HCAHPS there was no national standard for collecting or publicly reporting patients' perspectives of care information that would enable valid comparisons to be made across all hospitals. In order to make "apples to apples" comparisons to support consumer choice, it was necessary to introduce a standard measurement approach: the HCAHPS survey, which is also known as the CAHPS Hospital Survey, or Hospital CAHPS. HCAHPS is a core set of questions that can be combined with a broader, customized set of hospital-specific items. HCAHPS survey items complement the data hospitals currently collect to support improvements in internal customer services and quality-related activities.
Three broad goals have shaped the HCAHPS survey. First, the survey is designed to produce comparable data on the patient's perspective on care that allows objective and meaningful comparisons between hospitals on domains that are important to consumers. Second, public reporting of the survey results is designed to create incentives for hospitals to improve their quality of care. Third, public reporting will serve to enhance public accountability in health care by increasing the transparency of the quality of hospital care provided in return for the public investment. With these goals in mind, the HCAHPS project has taken substantial steps to assure that the survey is credible, useful, and practical. This methodology and the information it generates are made available to the public.
In May 2005, the National Quality Forum (NQF), an organization established to standardize
health care quality measurement and reporting, formally endorsed the CAHPS Hospital Survey. The
NQF endorsement represents the consensus of many health care providers, consumer groups,
professional associations, purchasers, federal agencies, and research and quality organizations.
The HCAHPS survey contains 21 patient perspectives on care and patient rating items that encompass nine key topics: communication with doctors, communication with nurses, responsiveness of hospital staff, pain management, communication about medicines, discharge information, cleanliness of the hospital environment, quietness of the hospital environment, and transition of care. The survey also includes four screener questions and seven demographic items, which are used for adjusting the mix of patients across hospitals and for analytical purposes. The survey is 32 questions in length (Hcahpsonline.org, 2017).