
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 113 (2017) 545–550
www.elsevier.com/locate/procedia

The 4th International Symposium on Emerging Information, Communication and Networks (EICN 2017)

Patients’ written reviews as a resource for public healthcare management in England

Radoslaw Kowalski*

University College London, Gower Street, WC1E 6BT London, United Kingdom
Abstract

Measurement of satisfaction from public health services is an important problem. Unhappy patients may lose trust in public health services and avoid using them even when they do need help. New big data analytics tools could be used to help understand their preferences better and increase satisfaction from public health services. Unsupervised extraction of key themes from written feedback with an LDA topic model can help with better understanding of the preferences of patients and their carers. The additional insight may help improve the speed of organisational learning in public healthcare organisations and open up new avenues of research.
© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the Conference Program Chairs.
Keywords: public healthcare; decision support systems; patient feedback; machine learning; public management

1. Introduction
Vast amounts of patient-generated reviews of GP practices collected by the National Health Service (NHS) in England can be used in more ways than is the current practice. They contain information on patient preferences. Machine learning algorithms such as topic modelling1 can help make this insight accessible. Online reviews are a resource already used to boost companies’ profits2. Commercial uses of customer review data, however, are likely different from how public organisations such as the NHS would like to make use of their patients’ reviews.

* Radoslaw Kowalski. Tel.: +447818696551.
E-mail address: uceskow@ucl.ac.uk
1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the Conference Program Chairs.
10.1016/j.procs.2017.08.275

This study explores the usefulness of patient reviews processed with topic modelling, a machine learning
algorithm, for public organisation management. It is argued that anonymous online reviews could be used as a
resource for boosting organisational learning in the public sector. The study includes suggestions for how to use the
data in management of GP practices and how to overcome opinion biases inherent in anonymous reviews.

2. Literature review

Unfortunately, present NHS standards for handling customer feedback appear relatively low3. Online feedback on NHS GP services in England informs only individuals directly involved with the commented-on services4 and is not used to infer patterns on a national scale3. Text reviews can also reveal differences between health service providers that score very similarly on conventional measures of performance5,6. Furthermore, patients also have an interest in making sure the whole NHS works effectively7. To the extent that they understand healthcare, they use reviews to commend high-quality services and to report any problems8. It appears that the interests of the public expressed through written feedback are highly relevant to achieving a successful public health service.

In the absence of established practices for customer review analysis in public organisations2, a study involving customer feedback about NHS services can take inspiration from the private sector2. At the same time, public organisations can have “forced customers” as opposed to clients who have some choice9, and their objectives may be unrelated to service demand or profitability10. For example, a manager in a private GP surgery can reasonably assume that simply making patients happy stands for a high-quality service5. In the case of public healthcare, questions may be asked about whether the services which made the customer happy were all really necessary, and whether the treatment method ensured the most cost-effective care available equally to all. Hence, the demand for insight among public organisations may differ from the private sector. The question of how to analyse customer feedback for public organisations, including in the case of public healthcare, constitutes a gap in the literature that needs to be addressed.

The choice of the best technique to extract information from written customer feedback depends to a large degree on how many reviews there are2. The smallest review datasets can be read manually in a systematic manner8. If the review numbers are greater and new reviews require continuous analysis, information extraction tends to be carried out automatically according to a manually encoded set of rules2,11. Those automation methods can produce highly interpretable, concise summaries and offer easily understandable methodologies2. On the other hand, they require significant customising and maintenance effort for each model, especially when model biases in very large datasets are hard to identify2,12. Therefore, a viable alternative for the largest datasets is to use machine learning models such as topic modelling. Topic models are able to extract key features from text documents without explicit, manually set rules for information extraction13. They can also adapt to changes in how customers write their feedback14,15 and can use whole datasets to train the model for feature extraction. On the other hand, machine learning model outcomes may not be easily interpretable2, and may not always be effective at extracting the desired information from customer feedback16. Nonetheless, a topic model such as a Latent Dirichlet Allocation (LDA) model can be highly useful for extracting key features of public services identified by customers13,17.

3. Methods

Customer feedback processed with machine learning models has the potential to support decision-making in the NHS and generate more value for patients. Therefore, this study investigates the use of an LDA topic model to analyse a large body of reviews of NHS-funded GP services in England. The data consist of over 145 000 fully filled out reviews of GP practices posted from July 2013 to January 2017 about almost 7700 GP practices (89% of all reviews). Anonymous reviewers can post a written comment and answer six 5-point Likert-scale statements on their service experience in NHS-funded GP practices. The statements reviewers respond to are: 1) “Are you able to get through to the surgery by telephone?”, 2) “Are you able to get an appointment when you want one?”, 3) “Do the

staff treat you with dignity and respect?”, 4) “Does the surgery involve you in decisions about your care and
treatment?”, 5) “How likely are you to recommend this GP surgery to friends and family if they needed similar care
or treatment?”, and 6) “This GP practice provides accurate and up to date information on services and opening
hours”. Unfortunately, the review data from NHS Choices are a biased sample of opinion. Older individuals and those who do not use the internet are likely under-represented in the dataset. Moreover, anyone can comment on the website and intentionally distort how potential patients evaluate GP practices. Fortunately, however, NHS Choices administrators manually remove malicious messages from the server, and NHS Choices staff ensure that unfavourable but legitimate reviews remain in the dataset consistently across England.

The data processing steps and the LDA topic model were implemented with the ‘stm’ library for the R programming language. First, each review was tokenized to break it down into a list of tokens. For example, the sentence “The doctors were very considerate.” was transformed into “The”, “doctors”, “were”, “very”, “considerate”, “.”, and words were reduced to their stems, so that, for example, “doctors” becomes “doctor”. Then, all capital letters were turned into lower case, and non-informative terms such as “very” or “the”, numbers, html links and punctuation were removed. The data pre-processing also removed all tokens which were 1 or 2 characters in length, as well as tokens which occurred fewer than 10 times or more than 100 000 times in the patient reviews. The least and most frequent tokens were removed to reduce the computational power required to carry out LDA topic modelling; such terms are not helpful for identifying key topics in the data. The data cleaning procedure removed 37708 distinct tokens which occurred 77976 times in GP reviews. The final corpus contained 7660 terms which occurred over 6 million times across the dataset.
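As a rough illustration, this pre-processing can be sketched with the ‘stm’ package; the data frame and column names below (reviews, review_text) are hypothetical, and prepDocuments filters terms by the number of documents they appear in, so its threshold only approximates the occurrence-based cut-offs described in the text.

library(stm)

# 'reviews' is a hypothetical data frame with one row per GP review:
# a free-text column 'review_text' plus the six Likert-scale ratings.
processed <- textProcessor(
  documents = reviews$review_text,
  metadata = reviews,
  lowercase = TRUE,          # fold capital letters to lower case
  removestopwords = TRUE,    # drop non-informative terms such as "very", "the"
  removenumbers = TRUE,
  removepunctuation = TRUE,
  striphtml = TRUE,          # drop html links
  stem = TRUE,               # "doctors" -> "doctor"
  wordLengths = c(3, Inf)    # drop tokens of 1 or 2 characters
)

# Filter out the rarest terms. Note that lower.thresh counts the number of
# documents a term appears in, not raw occurrences, so this only
# approximates the cut-offs quoted above.
out <- prepDocuments(processed$documents, processed$vocab, processed$meta,
                     lower.thresh = 10)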

The pre-processed corpus containing the lists of tokens from each GP review was used to compute four LDA topic models, with 40, 50, 60 and 70 key themes produced from the GP reviews corpus. Each topic generated with an LDA topic model is a distribution of words which tend to occur together across reviews13. The choice of the number of topics for the LDA model affects the quality of the output13. If topics are too few, their content gives insight into only very general patterns in the text which are not very useful. Too many topics, on the other hand, lead to a large proportion of topics without discernible meaning that can be used. An LDA topic model with an optimal number of topics ought to reveal insightful patterns in the data without generating many non-meaningful topics. Moreover, models may differ according to their semantic coherence (the rate at which a topic’s most common words tend to occur together in the same reviews) and exclusivity (the rate at which most common terms are exclusive to individual topics)18. Both metrics are useful guidance for choosing which model to use18.
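A sketch of how the four candidate models might be fitted and compared with ‘stm’ follows; the seed value is an illustrative assumption, and searchK reports semantic coherence and exclusivity for each candidate number of topics.

# Compare candidate models at K = 40, 50, 60 and 70 topics.
kresult <- searchK(out$documents, out$vocab, K = c(40, 50, 60, 70),
                   seed = 2017)
plot(kresult)   # semantic coherence and exclusivity across candidate K

# Fit the model with the chosen number of topics.
fit60 <- stm(out$documents, out$vocab, K = 60,
             init.type = "Spectral", seed = 2017)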

4. Results

The LDA model with 60 topics was chosen to obtain results after considering semantic coherence, exclusivity scores and the proportion of topics with a discernible meaning. The LDA model with 70 topics had many topics without a discernible meaning. A topic was deemed meaningful if the 7 most common and distinctive words from that topic were related to an aspect of the GP service experience. The LDA model with 60 topics offers more detailed insight into the GP service experience than the 40- and 50-topic LDA models, while avoiding the generation of many meaningless topics. Another advantage of the 60-topic LDA model is that it had the highest exclusivity score. The main weakness of the model with 60 topics is that it has the lowest semantic coherence score compared to the alternatives.
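The word lists used for this meaningfulness check, along with the two diagnostics, can be inspected directly from a fitted ‘stm’ model; a minimal sketch:

# Seven most probable and most distinctive (FREX) words per topic:
# the word lists used above to judge whether a topic is meaningful.
labelTopics(fit60, n = 7)

# Diagnostics considered when choosing between the candidate models.
semanticCoherence(fit60, out$documents)
exclusivity(fit60)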

The 60 topics generated with the chosen LDA topic model have been labelled according to the most prominent words in each topic. The features extracted from the text reviews with the LDA topic model relate to a range of patient experiences (see Table 1 below for details). The topics also had a varying prevalence across the GP reviews dataset, from about 5% of tokens in the dataset to under 1%. Topic 7 “friendly doctors” was the most prevalent of all, followed by topic 54 “Unhappy”. Topics about the difficulty of scheduling an appointment (4, 17, 30 and 51) also featured frequently in reviews, cumulatively constituting about 8% of all words in reviews.
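These prevalence figures can be recovered from the fitted model’s document-topic proportions; a minimal sketch:

# Average proportion of each topic across all reviews (topic prevalence).
prevalence <- colMeans(fit60$theta)   # theta: reviews x topics matrix
head(sort(prevalence, decreasing = TRUE), 5)   # most prevalent topics first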

Fig. 1: Two-dimensional map of 60 topics generated with the LDA model

Table 1: Labels for topic numbers

1. Child treated 2. Helpful practice 3. Not worth the tax 4. Appointment impossible 5. [meaning not certain]
6. Respectful 7. Friendly doctors 8. [citing GP staff] 9. Parking problem 10. Bad receptionists
11. Difficult access 12. Treated kids there 13. Star nurse service 14. Arrogant 15. Can’t phone in
16. Comforting 17. Long wait 18. Suffering 19. Some good some bad 20. Right heard diagnosis
21. Difficult registration 22. Go extra mile 23. Difficult referrals 24. Information missing 25. Excellent care quality
26. Poor chronic treatment 27. Arranged care at home 28. Prescription not done 29. Prompt treatment 30. No booking in advance
31. Hard to book on phone 32. Big changes in GP 33. Distressing 34. Situation with reception 35. Unhelpful
36. Bad facilities 37. [meaning not certain] 38. Hard to phone in 39. Poor manners 40. Patient engagement
41. Advising others 42. Competent 43. Surprising service 44. Nice and clean 45. Tough appointments
46. Saying thanks 47. Impressive practice 48. Bad experience 49. Ineffective booking 50. Sharing feelings
51. Fast emergency access 52. Bad H-care system 53. Pleasant experience 54. Unhappy 55. Visible changes
56. The worst ever 57. Annoying 58. One doctor special 59. Comparing GPs 60. Maybe misdiagnosis

Topics can also be compared with regard to their similarity to one another. It is assumed that two topics are similar if the vocabulary they represent is similar, and very different if there are few common words present in both of them. Figure 1 represents relative similarities between topics portrayed on a two-dimensional plane. Node sizes indicate topic prevalences and node colours indicate topic clusters. The distances between topics have been computed with cosine similarity scores calculated for each pair of topics. Relatively stronger pairwise similarities are portrayed with relatively smaller distances and thicker edges connecting nodes. The result is a slightly elongated mapping of topics which broadly cluster into two groups, one on the left and one on the right hand side of the graph. A closer inspection of Figure 1 reveals that the greatest distance occurs between, on the one hand, topics with positive evaluations of GP services, such as “Helpful practice”, “Saying thanks”, “Friendly doctors”, “Excellent care quality” and “Comforting” (left) and, on the other hand, topics with reviewers finding it hard to use GP services, such as “Hard to phone in”, “Hard to book on phone”, “Long wait” and “Can’t phone in” (right). Patients unable to reach their GP service were least likely to express positive feelings. A comparison of the top and bottom sides of the graph, in turn, tends to indicate differences in writing style. For example, topics “difficult referrals” (23) and “arranged care at home” (27) at the top tended to have been written in a very factual language, while topics at the bottom such as “bad receptionists” (10), “surprising service” (43) and “tough appointments” (45) tended to have been written in a highly emotive language.
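A sketch of how such a map can be built from the fitted model’s topic-word distributions, using cosine similarity and a force-directed layout (the edge threshold and node-size scaling are illustrative assumptions, not values from the paper):

library(igraph)

# Topic-word probability matrix (topics x vocabulary).
beta <- exp(fit60$beta$logbeta[[1]])

# Pairwise cosine similarity between the topics' word distributions.
norms <- sqrt(rowSums(beta^2))
cosine <- (beta %*% t(beta)) / (norms %o% norms)

# Keep only the stronger similarities so the map stays legible;
# the 0.2 cut-off is an illustrative assumption.
adj <- cosine
adj[adj < 0.2] <- 0
diag(adj) <- 0

g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)
plot(g, layout = layout_with_fr(g),              # force-directed 2-D layout
     vertex.size = colMeans(fit60$theta) * 200)  # node size tracks prevalence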

Fig. 2. Coefficients and confidence intervals predicting topic proportions with star ratings provided by reviewers to the question “Are you able to get through to the surgery by telephone?”. On the left – topic 36, “bad facilities”. On the right – topic 45, “tough appointments”.

Topic prevalence in GP reviews can also be related to how reviewers rate their GP service experience in the Likert-scale numeric responses. To evaluate the relationships, the proportional presence of each topic in each review was used as the dependent variable, and the star ratings accompanying each of the 6 survey statements in each review were used as independent variables. Figure 2 shows an example where the Likert-scale star responses to the statement “Are you able to get through to the surgery by telephone?” were used to predict the prevalences of topics 36. “bad facilities” (left) and 45. “tough appointments” (right) in reviews. Topic “bad facilities” is unrelated to the star ratings given by reviewers with regard to ease of phone access: the coefficient predicting the presence of the topic is not statistically different between reviews with 1 out of 5 stars and reviews with 5 out of 5 stars. In contrast, the proportional presence of topic “tough appointments” in reviews can be predicted with the star ratings given to the question about the ease of phone accessibility of a GP practice. The higher the star rating, the lower the coefficient predicting the presence of the topic “tough appointments”. Higher star ratings given for telephone accessibility indicate a lower prevalence of reviewer complaints about the difficulty of scheduling GP appointments. Similarly intuitive relationships between topics’ meanings and star ratings have been found across the dataset.
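In ‘stm’, regressions of topic proportions on review metadata of this kind can be sketched with estimateEffect; the column name phone_rating is a hypothetical stand-in for the telephone-access star rating.

# Regress proportions of topics 36 and 45 on the telephone-access rating.
effects <- estimateEffect(c(36, 45) ~ phone_rating, fit60,
                          metadata = out$meta)

# Coefficients with confidence intervals, in the spirit of Fig. 2.
plot(effects, covariate = "phone_rating", topics = c(36, 45),
     method = "continuous",
     xlab = "Telephone access rating (stars)")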

5. Discussion and conclusion

LDA topic model outcomes can offer a broad range of insights into the experiences patients have with GP services. The results constitute evidence that topic models are useful for summarising large numbers of written reviews. The outcomes are comparable in complexity to conclusions from qualitative studies of similar datasets8. Moreover, LDA extracts themes from the text reviews as they occur in the data, without any prior assumption about what patients care about. Topic models constructed from online reviews could be helpful in guiding change in the NHS at the national and regional level. For example, the NHS could use topic modelling to identify successful GP practices by filtering the data to look at the most impressive GP practices. Other uses of topic models can include analyses of key challenges facing the NHS which could be overcome more effectively nationally than locally. For example, this study suggests many patients are confused and frustrated by the difficulty of making GP appointments. Perhaps a nation-wide online booking system which GP practices and patients can use to transparently manage GP appointments could help. In addition to benefitting NHS decision-makers, topic models can help inform research into public preferences with regard to NHS services and can help inform the public about the current NHS challenges in terms of patient satisfaction. The use of anonymous data helps to bypass the privacy issues usually associated with individual preference data, which can stifle the exchange of information.

Unfortunately, the validity and reliability of topic model outcomes are limited by the fact that most patients do not post reviews online. On average, GP practices received fewer than 20 reviews over a period of three and a half
years. GP practice-level comparisons based on the topic content of reviews are not feasible given the limited size of the dataset, but comparisons between larger NHS administrative areas such as Clinical Commissioning Groups could help document the impact of mid-level NHS administration on GP performance. Another problem is that the biases in the sample of patient experiences analysed with the LDA topic model are unknown. For instance, bias in online reviews can depend heavily on how feedback is collected19,20. Therefore, it is advisable to compare the LDA topic model results obtained from anonymous GP reviews with a representative and systematic survey of patients’ opinions about their GP service experience. The comparison could help establish how representative the outcomes of topic modelling are. In the context of the NHS, the GP Patient Survey is at present the most systematic and regularly collected opinion survey about GP services in England21 and could be used for making such a comparison.

In summary, public management of NHS-funded GP services can benefit from the introduction of more machine learning algorithms to support organisational learning at the national and regional level. Topic model machine learning algorithms can be used to process very large numbers of patient reviews into insights which are relatively complex but at the same time easy to understand and actionable. The opportunity to use machine learning to process patients’ online reviews is especially attractive because the data are already available and can offer a near real-time, low-cost substitute for patient surveys.

Acknowledgements

The author would like to thank his academic supervisors, Prof. Slava Mikhaylov and Dr. Marc Esteve, for their support throughout the process of researching the subject covered in this workshop paper.

References

1. Chaney, A. J. B. & Blei, D. M. Visualizing Topic Models. (2012).
2. Hogenboom, F., Frasincar, F., Kaymak, U., de Jong, F. & Caron, E. A Survey of event extraction methods from text for decision support
systems. Decis. Support Syst. 85, 12–22 (2016).
3. Tingle, J. NHS hospital complaints system review. Br. J. Nurs. 23, 60–61 (2014).
4. Trigg, L. Patients’ opinions of health care providers for supporting choice and quality improvement. 16, 102–107 (2016).
5. James, T. L., Calderon, E. D. V. & Cook, D. F. Exploring patient perceptions of healthcare service quality through analysis of unstructured
feedback. Expert Syst. Appl. 71, 479–492 (2017).
6. Alemi, F., Torii, M., Clementz, L. & Aron, D. C. Feasibility of real-time satisfaction surveys through automated analysis of patients’
unstructured comments and sentiments. Qual. Manag. Health Care 21, 9–19 (2012).
7. Mason, H., Baker, R. & Donaldson, C. Understanding public preferences for prioritizing health care interventions in England: does the type of health gain matter? J. Health Serv. Res. Policy 16, 81–89 (2011).
8. Lopez, A., Detz, A., Ratanawongsa, N. & Sarkar, U. What patients say about their doctors online: A qualitative content analysis. J. Gen.
Intern. Med. 27, 685–692 (2012).
9. Di Pietro, L., Guglielmetti Mugion, R. & Renzi, M. F. An integrated approach between Lean and customer feedback tools: An empirical study
in the public sector. Total Qual. Manag. Bus. Excell. 24, 899–917 (2013).
10. Brownson, R. C., Allen, P., Duggan, K., Stamatakis, K. A. & Erwin, P. C. Fostering more-effective public health by identifying
administrative evidence-based practices: A review of the literature. Am. J. Prev. Med. 43, 309–319 (2012).
11. Abrahams, A. S., Jiao, J., Wang, G. A. & Fan, W. Vehicle defect discovery from social media. Decis. Support Syst. 54, 87–97 (2012).
12. Yan, Z., Xing, M., Zhang, D. & Ma, B. EXPRS: An extended pagerank method for product feature extraction from online consumer reviews. Inf. Manag. 52, 850–858 (2015).
13. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
14. Blei, D. M. & Lafferty, J. D. Dynamic Topic Models. in Proceedings of the 23rd International Conference on Machine Learning 113–120
(2006).
15. Dai, A. M. & Storkey, A. J. The supervised hierarchical Dirichlet process. IEEE Trans. Pattern Anal. Mach. Intell. 37, (2015).
16. Winkler, M., Abrahams, A. S., Gruss, R. & Ehsani, J. P. Toy safety surveillance from online reviews. Decis. Support Syst. 90, 23–32 (2016).
17. Griffiths, T. L. & Steyvers, M. Finding scientific topics. PNAS 101, 5228–5235 (2004).
18. Roberts, M. E., Stewart, B. M. & Tingley, D. in Computational Social Science: Discovery and Prediction (ed. Alvarez, M. R.) (2015). at
<http://scholar.harvard.edu/files/dtingley/files/multimod.pdf>
19. Xiang, Z., Du, Q., Ma, Y. & Fan, W. A comparative analysis of major online review platforms: Implications for social media analytics in hospitality and tourism. Tour. Manag. 58, 51–65 (2017).
20. Gao, G. (Gordon), Greenwood, B. N., Agarwal, R. & McCullough, J. S. Vocal Minority and Silent Majority: How Do Online Ratings Reflect
Population Perceptions of Quality. MIS Q. 39, 565–590 (2015).
21. Cowling, T. E., Harris, M. J. & Majeed, A. Evidence and rhetoric about access to UK primary care. BMJ 350, h1513 (2015).
