www.2kmm.eu
www.r-clinical-research.com
22nd Jun 2018
r.clin.res@gmail.com Poland • Sosnowiec http://www.iscb.pl
DISCLAIMER
If you believe your rights are violated, please email me: r.clin.res@gmail.com
Agenda
► Quick introduction to R
o Description
o Who uses R?
► R in Evidence-Based Medicine
o Capabilities
► R in Clinical Research
► Validation
o Numerical validation
o Methods
o Reference data
► Conclusions
► Does it work?
► Q&A
Quick introduction to R ► Description
Statistical computing, data manipulation, data presentation and other general programming tasks
https://www.r-project.org
► Operating systems: cross-platform (Windows, Unix, Linux, OS X); mobile (Android, Maemo, Raspbian)
► Source of libraries: mirrored repository (CRAN), users' sites, third-party repositories (GitHub, R-Forge)
Quick introduction to R ► History
S and S-PLUS:
1988: Statistical Sciences, Inc. (R. Douglas Martin, University of Washington); S-PLUS was born
1993: Exclusive license from Bell Labs (AT&T) to develop and sell the S language
2004: Insightful Corporation bought the S code from Lucent for $2 mln
2008: TIBCO acquired Insightful Corporation (TIBCO Spotfire)
R:
2000: R v 1.0.0
2007: Revolution
2015: Revolution acquired by Microsoft; rxODE, gfd, ThreeArmedTrials, randomizeR; FDA: “Statistical Software Clarifying Statement”; The R Consortium
2016: R Tools for Visual Studio; rankFD; The R Epid. Cons.
2017: dfpk (Bayesian Dose-Finding Designs); officer
2018: Mediana (general framework for CT simulations)
Quick introduction to R ► Who uses R?
That is to say, a company's logo is included in the list only if there is clear evidence that the
company uses or supports (or used or supported) R, based on information shared on the Internet
and thus available to everyone.
Please note that I do not know whether all listed companies are still using any version of R at the time the
presentation is being viewed. If you want me to remove your logo, please send an email to
r.clin.res@gmail.com
Quick introduction to R ► Who uses R?
“We use R for adaptive designs frequently because it’s the fastest tool to explore designs that interest
us. Off-the-shelf software, gives you off-the-shelf options. Those are a good first order approximation,
but if you really want to nail down a design, R is going to be the fastest way to do that.”
Keaven Anderson
Executive Director, Late Stage Biostatistics
Merck
Publicly available sources:
https://pharma-life-sciences.cioreview.com/news/gsdesign-explorer-to-optimize-merck-s-clinical-trial-process-nid-1305-cid-36.html
Google Books: Big Data for Big Pharma: An Accelerator for The Research and Development Engine?
“De facto, R is already a significant component of Pfizer core technology. Access to a supported
version of R will allow us to keep pace with the growing use of R in the organization, and provides a
path forward to use of R in regulated applications.”
“We use R for all of our analysis,” says Elashoff. “I think it’s fair to say that R really is the
foundation of a lot of the work that we do.” To speed up the process without sacrificing
accuracy, the team also uses Revolution R analytic products. “We use R seven or eight
hours per day, so any improvement in speed is helpful, particularly when you’re looking at a
million biomarkers and wondering if you’ll need to re-run a million analyses.”
Michael Elashoff
The company’s director of biostatistics
CardioDX
Publicly available sources:
https://www.featuredcustomers.com/media/CustomerCaseStudy.document/revolution-analytics-1_cardiodx_8284.pdf
https://www.businesswire.com/news/home/20110118006656/en/CardioDX-Revolution-Analytics-Develop-Non-Intrusive-Test-Predicting
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
• errors-in-variables modeling
• planned & post-factum analysis
• comparison of methods: Deming, Passing-Bablok, Bland-Altman
• robust methods: regularized, M-estimators
• resampling: bootstrap, permutation, exact
• non-inferiority, superiority, (bio)equivalence
• PK, PD, Dose-Response
• meta-analysis
• ROC analysis
• producing documents
• reproducible research
• logging processes
• interactive presentations
Output formats: doc(x), ppt(x), pdf, rtf, odf, ps; pure ASCII, html, pdf, doc
• Descriptive stats
• Data review
• Linear regression
• ANOVA
• post-hoc
• GLM modelling
• NLM modelling
R in Evidence-Based Medicine ► Capabilities ► Cooperation & compliance with SAS
Diagram labels: R; SAS IML or a different method of communication; bi-directional communication; missing or expensive functionality; SAS module #1; SAS module #2.
Differences in:
• origin of dates
• default contrasts
• used sum of squares
• calculation of quantiles
• generation of random numbers
• implementation of advanced models
• representation of floating point numbers
SAS and R Team in Clinical Research (Adrian Olszewski)
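Two of the differences listed above can be made concrete with a short sketch. It is written in Python purely as a neutral illustration, since the slides contain no code. It assumes the widely documented conventions that SAS counts dates in days from 1 January 1960 while R's Date class counts from 1 January 1970. The two quantile methods of the Python standard library are used only as stand-ins for two packages with different default quantile definitions. The helper names `sas_days_to_date` and `sas_days_to_r_days` are hypothetical.

```python
from datetime import date, timedelta
from statistics import quantiles

# Assumed conventions: SAS day 0 is 1960-01-01, R's Date day 0 is 1970-01-01.
SAS_EPOCH = date(1960, 1, 1)
R_EPOCH = date(1970, 1, 1)
EPOCH_SHIFT_DAYS = (R_EPOCH - SAS_EPOCH).days  # 3653 days


def sas_days_to_date(days: int) -> date:
    """Convert a SAS numeric date (days since 1960-01-01) to a calendar date."""
    return SAS_EPOCH + timedelta(days=days)


def sas_days_to_r_days(days: int) -> int:
    """Re-express a SAS numeric date as days since 1970-01-01."""
    return days - EPOCH_SHIFT_DAYS


# The same data give different quartiles under different quantile definitions.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q_exclusive = quantiles(data, n=4, method="exclusive")
q_inclusive = quantiles(data, n=4, method="inclusive")

print(sas_days_to_date(0), sas_days_to_r_days(EPOCH_SHIFT_DAYS))
print(q_exclusive, q_inclusive)
```

Neither set of quartiles is wrong; they follow different definitions. This is why a cross-package comparison has to fix the definition first (or accept a documented difference) before judging results.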
R in Clinical Research ► Status of R on the Clinical Research market
We can only speculate on why R users are so often told the mantra:
Too many myths have accumulated, but we cannot ignore the facts.
R in Clinical Research ► Myths and Facts
First, let us briefly address all points in the “table of shame”. Facts first.
Fact: R doesn’t facilitate the creation of CDISC datasets.
True. There are no easy GUI tools to map fields between CDASH and SDTM, nor easy-to-use ways to generate define.xml.
Well, that is true. But hiding issues doesn’t make them less dangerous. How often are you informed about errors in your favorite software with full details and the source code?
Fact: Creators of R packages don’t have to provide (good) unit tests. It’s a matter of good will.
Yes. Even if they were forced to write tests, nobody can guarantee that the tests are defined properly and bring any advantage.
R in Clinical Research ► Myths and Facts
Now myths.
Myth: FDA demands SAS for both the analysis and producing datasets. No other software is allowed.
No. FDA has never claimed that. This myth is repeated so often that FDA issued an official “Statistical Software Clarifying Statement”.
Myth: R cannot cooperate with SAS, including reading and writing SAS binary files.
False. R can be combined with SAS in many ways. Check this out:
https://www.quora.com/How-can-I-integrate-SAS-with-R
SAS enabled direct communication between R and SAS in the IML module in 2009.
R can read SAS7BDAT binary data files and both read and write XPT files.
Myth: R cannot be validated as well as commercial software.
False. R can be validated no worse. In fact, there is at least one company offering a validated version of R: Mango.
Myth: Commercial software doesn’t have errors.
The facts evidently deny this claim.
Myth: R is limited in terms of implemented statistical methods.
We have just seen how rich the R statistical library is. This is the most complete library after SAS (plus a few routines more).
Myth: R doesn’t meet 21 CFR Part 11, which is a must.
Let me quote this: “Whoever told you that is not well-informed. CFR Part 11 has to do
with critical software that runs medical devices and about certain primary data
management software. It does not apply to statistical analysis software. We use R all
the time in industry-sponsored and NIH sponsored clinical trials. You do not need to
seek FDA's approval. FDA accepts all comers and does not dictate software policy for
analysis. They even accept Excel and Minitab for NDAs. There are many messages
related to this in the r-help archive; please look at them.”
Frank E Harrell Jr
Professor and Chair, School of Medicine, Department of Biostatistics
Vanderbilt University
And this: “Records submitted to FDA, under predicate rules in electronic format [are Part
11 records]. However, a record that is not itself submitted, but is used in generating a
submission, is not a part 11 record unless it is otherwise required to be maintained under
a predicate rule and it is maintained in electronic format.”
R in Clinical Research ► What does FDA say?
https://www.fda.gov/downloads/forindustry/datastandards/studydatastandards/ucm587506.pdf
https://www.fda.gov/downloads/medicaldevices/.../ucm085371.pdf
[…]
design input requirements must be documented, and that specified requirements
must be verified
[…]
Success in accurately and completely documenting software requirements is a crucial
factor in successful validation of the resulting software.
Software validation is a part of the design validation for a finished device, but is not
separately defined in the Quality System regulation. For purposes of this guidance,
FDA considers software validation to be “confirmation by examination and
provision of objective evidence that software specifications conform to user
needs and intended uses, and that the particular requirements implemented
through software can be consistently fulfilled.”
SOFTWARE VERIFICATION ≠ SOFTWARE VALIDATION
requirements, documentation, (verification) + validation
Because of its complexity, the development process for software should be even
more tightly controlled than for hardware, in order to prevent problems that cannot
be easily detected later in the development process.
[…]
Seemingly insignificant changes in software code can create unexpected and
very significant problems elsewhere in the software program. The software
development process should be sufficiently well planned, controlled, and documented
to detect and correct unexpected results from software changes.
The software requirements specification document should contain a written definition of the software functions.
It is not possible to validate software without predetermined and documented software requirements.
The vendor’s life cycle documentation, such as testing protocols and results, source code, design
specification, and requirements specification, can be useful in establishing that the software has
been validated. However, such documentation is frequently not available from commercial
equipment vendors, or the vendor may refuse to share their proprietary information.
Now let’s stop for a while and quickly summarize what we have learned so far.
Commercial software | R
Assurance “we did our best” | Assurance “we did our best”
No guarantee | No guarantee
Full trust: it’s paid = validated well | Low trust: free things are poorly made
R in Clinical Research ► What does FDA say?
https://www.fda.gov/downloads/MedicalDevices/.../ucm073779.pdf
S.O.P. • Dependability • System Documentation • System Controls
https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070266.pdf
R in Clinical Research ► What do FDA-related people say?
http://user2007.org/program/presentations/soukup.pdf
Another argument for validating R
R in Clinical Research ► What does it mean “to validate”? Why do we want this?
Finally, we got to this place. Let us now try to answer this question in layman's terms:
“To validate” means to ensure that R does all the calculations properly.
But to confirm this, we need to check dozens of components, packages and functions.
Remember:
FDA doesn’t tell you what exactly should be validated (which functions). You decide.
The analysis of risk and validation coverage is entirely up to you.
It is our responsibility to do it WELL.
Why? Validation is also there to protect you and let you sleep well.
Try to think of it this way: once done properly, it gives you a reliable, powerful tool.
R in Clinical Research ► Preparing R to enter the industry
We know what FDA wants from us and have some advice on how to do it
We have the source code provided for both R Core and every package
Reference data for testing are available on the Internet or can be obtained
There are tools allowing the system maintainer to protect (“to freeze”) the newly
And a bonus
Only used functions have to be tested. Unused code means non-existent code.
The R-FDA.PDF document is a giant milestone. It makes a perfect starting point in the
process of establishing one's own controlled R-based environment.
For obvious reasons it is limited to a small subset of packages, labelled “Base” and
“Recommended”.
These packages don’t cover the complete set of statistical routines used in clinical
research, but will definitely allow one to start with advanced analysis employing:
• linear mixed models (with given covariance structure), generalized additive models,
• survival analysis,
• accessing data generated by external statistical packages,
• resampling (bootstrap),
• tons of statistical tests,
• plotting (low-level and quite advanced via the “lattice” package) and much more.
https://www.r-project.org/doc/R-FDA.pdf
Validation ► Validation of installation vs. numerical validation
Thought #1: an incorrectly installed R or package will not work properly, or will not even
launch. It is useless.
Validation ► Numerical validation ► Methods
By inspecting the code and comparing the implemented formula with the
reference in the corresponding textbook (so-so, but it allows one to find issues)
The comparison has to be done with some tolerance, as it is likely that two statistical
packages will differ slightly in their results, due to numerous issues, like:
The obtained collection:
• statistical method name
• values of relevant parameters
• input data set provided to the reference software
• an outcome returned by the reference software
…can then be enclosed in so-called “unit tests” and stored in a
repository. A unit-testing engine queries the repository, fetches the test
definitions and passes them to the appropriate functions for testing in a fully automated
manner. The tested function returns a result, which is compared to the
reference. At the end, the engine generates a validation report.
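The workflow described above can be sketched as follows. This is a minimal illustration in Python, not the author's actual engine. The repository is a plain in-memory list, and the names `ReferenceCase`, `run_validation` and `sample_sd` are hypothetical. The reference values stand in for outcomes returned by reference software; here they are simply the textbook standard deviations of the input, with one of them rounded to 7 decimals to mimic a printed result. In R, the testthat package's `expect_equal()` with a tolerance plays a comparable role.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferenceCase:
    """One stored test definition: method, parameters, input data, reference outcome."""
    method: str
    params: dict
    data: list
    expected: float
    rel_tol: float = 1e-6  # assumed tolerance; choose it per method in practice


def sample_sd(data, ddof=1):
    """Function under test: standard deviation with ddof degrees of freedom removed."""
    mean = sum(data) / len(data)
    ss = sum((x - mean) ** 2 for x in data)
    return math.sqrt(ss / (len(data) - ddof))


# Maps stored method names to the functions being validated.
FUNCTIONS = {"sample_sd": sample_sd}

# Hypothetical repository content; the expected values are illustrative references.
REPOSITORY = [
    ReferenceCase("sample_sd", {"ddof": 1}, [2, 4, 4, 4, 5, 5, 7, 9], expected=2.1380899),
    ReferenceCase("sample_sd", {"ddof": 0}, [2, 4, 4, 4, 5, 5, 7, 9], expected=2.0),
]


def run_validation(repository, functions):
    """Run every stored case and compare the result to the reference within tolerance."""
    report = []
    for case in repository:
        result = functions[case.method](case.data, **case.params)
        passed = math.isclose(result, case.expected, rel_tol=case.rel_tol)
        report.append((case.method, case.params, result, case.expected, passed))
    return report


report = run_validation(REPOSITORY, FUNCTIONS)
for method, params, result, expected, passed in report:
    status = "PASS" if passed else "FAIL"
    print(f"{method} {params}: got {result:.7f}, reference {expected:.7f} -> {status}")
```

The tolerance is part of each test definition because a rounded reference value cannot be matched exactly. With a much stricter tolerance the first case would fail even though the function is correct.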
Validation ► Validation of installation
https://www.londonr.org/wp-content/uploads/sites/2/presentations/LondonR_-_Challenges_Of_Validating_R_-_Chris_Campbell_-_20140617.pdf
Fixing the environment and controlling for changes
BUT!
part II - soon!