40min

The use of the R statistical package in a controlled infrastructure
The case of the Clinical Research industry, PART I

Adrian Olszewski
Senior Biostatistician at 2KMM
www.2kmm.eu
www.r-clinical-research.com
r.clin.res@gmail.com

Polish National Group of the International Society for Clinical Biostatistics
http://www.iscb.pl
22nd Jun 2018, Poland • Sosnowiec
DISCLAIMER

All trademarks, logos of companies and names of products

used in this document

are the sole property of their respective owners

and are included here for informational, illustrative purposes only,

which falls within the nominative fair use.

This presentation is based exclusively on information

publicly available on the Internet under provided hyperlinks.

If you believe your rights are violated, please email me: r.clin.res@gmail.com
Agenda

► Quick introduction to R

o Description

o History. Events important for the use of R in EBM*

o Who uses R?

* Evidence-Based Medicine

► R in Evidence-Based Medicine

o Capabilities

o A brief overview of common tasks

o Cooperation and compliance with SAS

o www.r-clinical-research.com or CRAN Task Views


► R in Clinical Research

o Status of R on the Clinical Research market

o Myths and Facts

o What does FDA say?

o What does it mean “to validate”? Why do we want this?

o Preparing R to enter the industry


► Validation

o Validation of installation vs. numerical validation

o Numerical validation

o Methods

o Reference data

► Fixing the environment and controlling for changes

► How does R support the creation of a controlled environment?


► Conclusions

► Does it work?

► Q&A

Quick introduction to R ► Description

R is an open-source software environment, widely used in the scientific world for:

► statistical computing
► data manipulation
► data presentation
► other general programming tasks

https://www.r-project.org

It is also the name of the high-level, Turing-complete, interpreted, multi-paradigm programming language used within the environment.
Short characteristics (example: model <- lm(y ~ x1 * x2)):

► Description: computational environment + programming language
► Developer: R Development Core Team
► Operating systems: cross-platform: Windows, Unix, Linux, OS X; mobile: Android, Maemo, Raspbian
► Form: command line + third-party IDEs and editors
► Infrastructure: R core library + shell + libraries (base and third-party)
► Model of work: 1) standalone application, 2) standalone server, 3) server process
► Programming language: Turing-complete, domain-specific, interpreted, high-level with dynamic typing
► Paradigm: 1) array, 2) object-oriented (S3, S4, R5, R6 models), 3) imperative, 4) functional, 5) procedural, 6) reflective
► Source of libraries: mirrored repository (CRAN), users' sites, third-party repositories (GitHub, R-Forge)
► License of the core: GNU General Public License ver. 2
► License of libraries: 99.9% open-source; 0.1% is licensed (free for non-commercial use)
[Screenshot: the basic GUI on Windows]
[Screenshot: advanced IDE, RStudio]
[Screenshot: advanced IDE, Microsoft Visual Studio]
Quick introduction to R ► History

S:
• 1976: S was born at Bell Laboratories (Rick Becker, Allan Wilks, John Chambers)
• 1980: first release, via AT&T
• 1988: New S Language
• 1998: the last version; first statistical system to receive the Software System Award, the top commercial software award from the Association for Computing Machinery

S-PLUS:
• 1988: S-PLUS was born at Statistical Sciences, Inc. (R. Douglas Martin, University of Washington)
• 1993: exclusive license from Bell Labs to develop and sell the S language
• 2004: Insightful Corporation bought the S code from AT&T / Lucent for $2 mln
• 2008: TIBCO acquired Insightful Corporation (TIBCO Spotfire)
• 2013: TERR - TIBCO Enterprise Runtime for R

R:
• 1993: R was born at the Univ. of Auckland (Ross Ihaka, Robert Gentleman)
• 1997: first release; CRAN was formed
• 2000: v 1.0.0
• 2003: R Core Team was formed
• 2007: R Foundation was formed
• 2015: R Consortium was founded

Commercial R products:
• 2007: Revolution Analytics was born
• 2008: Oracle R Enterprise
• 2015: Revolution acquired by Microsoft; Microsoft R Open was born
Quick introduction to R ► History ► (few) Events important for the use of R in EBM
1997 The first release of R FDA 21 CFR Part 11 CRAN
1998 nlme
1999 FDA „Off-The-Shelf Software Use in Medical Device”
2000 xtable
2001 DBI  survival
2002 multcomp FDA „General Principles of Software Validation – Final” Bioconductor
2003 lme4  nlmeODE The R Core Team
2004
2005 drc (Dose-Response)  PKfit  PK  ggplot2  ROCR
2006 gsDesign  meta  mice  tdm  ivivc  blockrand  pwr
"Using R: Perspectives of a FDA Statistical RevieweR"
"R - Regulatory Compliance and Validation Issues"
2007 SASxport  Rtools "Use of R in C.T. & Industry-Sponsored Medical Res. "
The R Foundation
"Op. Sour. Stat. Soft. in Pharma Developm.: A case study with R"
2008 MCPMod  bear  rjags  epiR  plyr  DanteR
2009 SAS IML studio supports R  SAS7bdat  metafor  gamm4
2010 PKGraph  pROC  oro.nifti  oro.dicom  PowerTOST
2011 RStudio  Detools  ggbio  RISmed rplos
2012 Shiny  knitr  Pmetrics  TrialSize  stargazer  OpenCPU FDA: „Sponsors may use R in their submissions”
2013 cpk The SAS® versus R Debate in Industry and Academia
Tidyverse  ValidR  Checkpoint  Packrat  Rmarkdown 
2014 rclinicaltrials  pubmed.miner  ReporteRs  greport  dplyr
MRAN

2015 rxODE  gfd  ThreeArmedTrials  randomizeR FDA: „Statistical Software Clarifying Statement” The R Consortium
2016 R Tools for Visual Studio  rankFD The R Epid. Cons.
2017 dfpk - Bayesian Dose-Finding Designs  officer
2018 Mediana - general framework for CT simulations
Quick introduction to R ► Who uses R?

Medicine and Pharmacy:
► 2KMM ► Amgen ► Astra Zeneca ► Bayer ► CardioDX ► Dr. Reddy’s Laboratories ► FDA
► GCE ► KCR (2014-2017) ► Medtronic ► Merck ► Novartis ► Pfizer ► Roche

Other Business & Science Tycoons:
► American Express ► Bank of America ► BBC ► Capgemini ► Deloitte ► Ebay ► Facebook
► Fermi National Accelerator Laboratory ► Ford ► Goldman Sachs ► Google ► HP ► IBM
► J.P. Morgan ► Kickstarter ► Microsoft ► Monsanto ► Mozilla ► New York Times
► NIST - National Institute of Standards & Technology ► NOAA ► Oracle ► Twitter ► Uber
► UK Government ► Wells Fargo

The list is built based exclusively on publicly available information:

 lists of users provided by Revolution, RStudio and others


 articles (example, example) and interviews (example)
 published documents in which a name of a company is visible (example)
 job advertisements (LinkedIn, Google, PharmiWeb, etc.)
 names of companies supporting / organizing events (conferences, courses, etc)
 other sources (example)

That is to say, a logo of a company is included in the list only if there is clear evidence that the
company uses or supports (or used or supported) R, based on information shared on the Internet
and thus available to everyone.

Please note that I am not aware whether all listed companies are still using any version of R at the time the
presentation is being viewed. If you want me to remove your logo, please send me an email at
r.clin.res@gmail.com

“We use R for adaptive designs frequently because it’s the fastest tool to explore designs that interest
us. Off-the-shelf software, gives you off-the-shelf options. Those are a good first order approximation,
but if you really want to nail down a design, R is going to be the fastest way to do that.”

Keaven Anderson
Executive Director, Late Stage Biostatistics
Merck
Publicly available sources:
https://pharma-life-sciences.cioreview.com/news/gsdesign-explorer-to-optimize-merck-s-clinical-trial-process-nid-1305-cid-36.html
Google Books: Big Data for Big Pharma: An Accelerator for The Research and Development Engine?

“De facto, R is already a significant component of Pfizer core technology. Access to a supported
version of R will allow us to keep pace with the growing use of R in the organization, and provides a
path forward to use of R in regulated applications.”

James A. Rogers Ph.D.
Associate Director, Nonclinical Statistics Group
Pfizer
Publicly available sources:
https://www.featuredcustomers.com/vendor/revolution-analytics-1/customers/pfizer

“We use R for all of our analysis,” says Elashoff. “I think it’s fair to say that R really is the
foundation of a lot of the work that we do.” To speed up the process without sacrificing
accuracy, the team also uses Revolution R analytic products. “We use R seven or eight
hours per day, so any improvement in speed is helpful, particularly when you’re looking at a
million biomarkers and wondering if you’ll need to re-run a million analyses.”

Open-source R packages enable the biostatisticians at CardioDX to run a broad range of
analyses, accurately and effectively, on a routine basis. Adding Revolution R products to the
mix improves processing speeds and makes it easier to crunch large data sets. Accelerating
the analytic process reduces overall project time, increasing the team’s efficiency. “Revolution
R is faster than regular R,” says Elashoff. “The faster we can analyze data, the less time it
takes us to build our diagnostic algorithms.”

Michael Elashoff
The company’s director of biostatistics
CardioDX
Publicly available sources:
https://www.featuredcustomers.com/media/CustomerCaseStudy.document/revolution-analytics-1_cardiodx_8284.pdf
https://www.businesswire.com/news/home/20110118006656/en/CardioDX-Revolution-Analytics-Develop-Non-Intrusive-Test-Predicting
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks

• descriptive analysis & summarizing
• advanced plotting
• detection of outliers: univariate / multivariate
• missing data imputation: *OCF, kNN, LI, MI, censored (KM)
• modeling: (non) parametric, (non) linear models with mixed effects
• time-to-event / survival: Kaplan-Meier, Nelson-Aalen, Cox regression, Weibull
• repeated measures & longitudinal trials: parametric / non-parametric
• factorial design analysis: parametric / non-parametric
• errors-in-variables modeling / comparison of methods: Deming, Passing-Bablok, Bland-Altman
• robust methods: regularized, M-estimators
• resampling: bootstrap, permutation, exact
• planned & post-factum analysis
• categorical data analysis
• design of experiments: parallel, cross-over, adaptive, group-sequential, multi-arm
• sample size & power
• randomization
• meta-analysis
• ROC analysis
• non-inferiority, superiority, (bio)equivalence
• PK, PD, Dose-Response
• accessing databases: ODBC, JDBC, Oracle, MS SQL, MySQL, dBase, PostgreSQL, SQLite, DB/2, Informix, Firebird, H2, MongoDB, more…
• exchanging data: Excel, OO Calc, GNumeric, SPSS, Weka, Systat, Stata, EpiInfo, SAS, SAS XPT, Minitab, Octave, Matlab, DBF, CSV, XML, HTML, JSON, DICOM, NIFTI
• accessing registers: clinicaltrials.gov, PubMed, PLOS
• advanced data querying & transforming
• GUI desktop & server applications
• cooperation with SAS
• interoperability: .NET, Java, Scala, Python, C++, Fortran, PHP, Perl, DDE, COM, TCP, WebServices
• production tools and unit testing
• producing documents: doc(x), ppt(x), pdf, rtf, odf, ps
• reproducible research: pure ascii, html, pdf, doc
• interactive presentations
• logging processes
[Screenshots of example analyses, images not preserved. Recoverable captions: descriptive stats; data review; linear regression; ANOVA; post-hoc; GLM modelling; NLM modelling]
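The tasks captioned on these slides can be sketched in a few lines of base R. The example below is only an illustration and does not reproduce the analyses shown in the original screenshots: the datasets (`mtcars`, `ToothGrowth`, `Puromycin`) ship with R and were chosen here as assumptions, and "NLM modelling" is read as non-linear regression.

```r
# Descriptive statistics on a built-in dataset
desc <- summary(mtcars$mpg)

# Linear regression: fuel consumption explained by car weight
fit_lm <- lm(mpg ~ wt, data = mtcars)

# One-way ANOVA followed by a Tukey HSD post-hoc comparison
tg <- transform(ToothGrowth, dose = factor(dose))
fit_aov <- aov(len ~ dose, data = tg)
posthoc <- TukeyHSD(fit_aov)

# GLM: logistic regression of transmission type on weight
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Non-linear model: Michaelis-Menten kinetics fitted with nls()
fit_nls <- nls(rate ~ Vm * conc / (K + conc),
               data = subset(Puromycin, state == "treated"),
               start = list(Vm = 200, K = 0.05))

print(desc)
print(coef(fit_lm))
print(posthoc)
print(coef(fit_glm))
print(coef(fit_nls))
```

All functions used here come from the `stats` package distributed with R, so no third-party packages are needed for this level of analysis.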
R in Evidence-Based Medicine ► Capabilities ► Cooperation & compliance with SAS

[Diagram: SAS and R Team in Clinical Research (Adrian Olszewski). R supplies a required algorithm or functionality (illustrated on the slide by a kernel density estimator, $\hat{f}(x) = \frac{1}{nh^{d}} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$) that is missing or expensive in SAS (SAS base, SAS module #1, SAS module #2). Bi-directional communication goes through SAS IML or a different method of communication.]

Differences in:
• origin of dates
• default contrasts
• used sum of squares
• calculation of quantiles
• generation of random numbers
• implementation of advanced models
• representation of floating point numbers
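Several of the differences listed above can be made visible from the R side alone. The sketch below is a hedged illustration, not a validated mapping: the SAS date origin of 1 January 1960, the raw value `21357`, and the statement that quantile `type = 2` corresponds to the SAS default percentile definition are assumptions to be confirmed against the SAS documentation.

```r
# 1) Origin of dates: R counts days from 1970-01-01; SAS is assumed here
#    to count from 1960-01-01, so raw SAS day numbers need an explicit origin.
sas_zero_in_r <- as.numeric(as.Date("1960-01-01"))  # SAS day 0 on R's scale
sas_day <- 21357                                    # hypothetical raw SAS date
as_r_date <- as.Date(sas_day, origin = "1960-01-01")

# 2) Calculation of quantiles: R defaults to type 7; type 2 is commonly
#    described as the counterpart of the SAS default (assumption).
x <- 1:10
q_r_default <- quantile(x, 0.25, names = FALSE)            # type 7
q_type2     <- quantile(x, 0.25, type = 2, names = FALSE)  # averaging at discontinuities

# 3) Default contrasts: contr.treatment() uses the first level as the
#    reference, contr.SAS() uses the last level.
c_r   <- contr.treatment(3)
c_sas <- contr.SAS(3)

# 4) Floating point representation: compare with a tolerance, not with ==
exact_equal  <- (0.1 + 0.2) == 0.3
approx_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))

print(list(sas_zero_in_r = sas_zero_in_r, as_r_date = as_r_date,
           q_r_default = q_r_default, q_type2 = q_type2,
           exact_equal = exact_equal, approx_equal = approx_equal))
```

When results from both systems have to be compared, setting such options explicitly (date origin, quantile type, contrasts) and documenting them is usually easier than explaining a discrepancy afterwards.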
R in Clinical Research ► Status of R on the Clinical Research market

• In general bioscience and academia, S and then R have built up over the years a position as one of the industry standards
• In clinical research, however, SAS reigns par excellence
• Pharmaceutical companies, CROs and even FDA do use R “internally”. But they resist (or hesitate) to use it in submissions (to FDA).
• Clinical Programmer or Biostatistician ≝ SAS Programmer. Period.

OK, but how did it come to this?
We can only speculate on why R users are so often told the mantra:

[image not preserved]

Too many myths have accumulated, but we cannot ignore the facts.
R in Clinical Research ► Myths and Facts

Facts:
• FDA requires software to be validated
• R is not validated out-of-the-box
• R doesn’t facilitate the creation of CDISC datasets
• R doesn’t have a metadata layer
• R doesn’t have paid hot-line
• Nobody takes the responsibility if something goes wrong
• Packages change over time. What works today may not work tomorrow. Packages happen to be removed
• Validation of software is a challenging and time-consuming process, so not everyone can afford it.

Myths / objections:
• FDA demands SAS for both the analysis and producing datasets. No other software is allowed.
• R cannot generate datasets in SAS Transport format
• R cannot cooperate with SAS, including reading and writing SAS binary files
• R cannot be validated as well as commercial software
• Commercial software doesn’t have errors
• R is full of bugs (errors) as nobody controls it
• R poorly supports the generation of TFLs
Facts (continued):
• Errors happen often in non-commercial software
• Announcing errors publicly doesn’t make people calm
• FDA: “Results should be reproducible and independent of the software used to derive them”
• Creators of R packages don’t have to provide (good) unit tests. It is a matter of good will.

Myths / objections (continued):
• R is limited in terms of implemented statistical methods
• Nobody uses R (or Open Source in general) in pharma industry (or in “serious business”). Maybe in academia, which is not a kind of a serious business.
• R doesn’t meet 21 CFR Part 11, which is a must
• R has no SAS-like “LOG”, which records everything
• There is no commercial support (product and/or validation)
• The entire software (all packages and functions) must be validated
• Commercial software releases the end-user from any responsibility regarding the validation.
Facts vs. Myths

Who is right and… is it possible to use R in a controlled environment?
First, let us briefly address all points in the “table of shame”. Facts first.

► FDA requires software to be validated
  o Yes. This is a mandatory process. And that’s good! It protects not only the sponsor from serious trouble but also the patients!

► R is not validated out-of-the-box
  o Yes, the official release is not guaranteed to be error-free. A disclaimer note confirming that is displayed every time R is launched. But validation is fully possible.

► R doesn’t facilitate the creation of CDISC datasets
  o True. There are no easy GUI tools to map fields between CDASH and SDTM, or easy-to-use ways to generate define.xml.

► R doesn’t have a metadata layer
  o Partially true. R supports attributes on every level of a data structure. With a little effort it can be implemented effectively. I plan to release a package allowing datasets to be annotated and printed in line with the assigned formats.

► R doesn’t have paid hot-line
  o True. This is not a commercial project. But the R community is vibrant and provides a giant amount of knowledge (Stack, GitHub).
► Nobody takes the responsibility if something goes wrong
  o True. This is not a commercial project. By the way, to what extent exactly do commercial companies take the responsibility? Do you have the conditions (and $$$) written down on paper and signed?

► Packages change over time. What works today may not work tomorrow. Packages happen to be removed
  o Very true. This can be effectively managed in many ways. Addressed later in this presentation.

► Validation of software is a challenging and time-consuming process, so not everyone can afford it.
  o Very true. Time is money. One has to analyze the profitability and then make a decision.
► Errors happen often in non-commercial software
  o That is true. There are no “paid testers”, only volunteers. It does not mean at all they perform any worse, but also does not make any guarantee they perform well.
  o Errors happen in every software, including commercial. Even in the top-quality medical devices (FDA recalls that in their guidelines), nuclear devices (Therac-25 medical accelerator case), power plants, space rockets, and even in Martian Rover or Mariner I space probe.
  o There is no error-free software. There is only software not tested well enough.

► Announcing errors publicly doesn’t make people calm
  o Well, that is true. But hiding issues doesn’t make them less dangerous.
  o “Transparency” is the most reliable way of cooperating with software users. Programmers or end-users publicly announce errors so the whole community can learn about that and react quickly. Nothing is hidden, all the more so as this is Open Source.
  o How often are you getting informed about errors in your favorite software with full details and the source code?
► FDA: “Results should be reproducible and independent of the software used to derive them”
  o That is true. Results may differ between statistical packages. Only slightly, but still.
  o If FDA uses SAS for checking, we may get into trouble in case of resampling methods even with the same seed set.

► Creators of R packages don’t have to provide (good) unit tests. It is a matter of good will.
  o Yes. Even if forced to write tests, nobody can guarantee the tests are defined properly and bring any advantage.
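The remark about resampling and seeds can be illustrated from the R side. The sketch below is a minimal example under stated assumptions (the data vector, the helper `boot_mean_ci()` and the seed value are hypothetical): within one R installation a fixed seed reproduces a bootstrap interval exactly, but the random stream depends on the generator in use, which is why another package started with the same seed is not expected to return the same resamples.

```r
# Percentile bootstrap interval for the mean, with a fixed seed
boot_mean_ci <- function(x, n_boot = 1000, seed = 2018) {
  set.seed(seed)  # fixes the random stream for this call
  means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  quantile(means, c(0.025, 0.975), names = FALSE)
}

x <- c(5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 4.7, 5.3)  # hypothetical measurements

ci_first  <- boot_mean_ci(x)
ci_second <- boot_mean_ci(x)

# Same seed, same generator: identical results inside R
print(identical(ci_first, ci_second))

# The generator settings belong in the analysis record, because a
# different generator yields a different stream for the same seed
rng_settings <- RNGkind()
print(rng_settings)
```

Recording `RNGkind()` together with the seed and the R version is a cheap way to make a resampling result re-creatable later in the same environment.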
Now myths.

► FDA demands SAS for both the analysis and producing datasets. No other software is allowed.
  o No. FDA has never claimed that. This myth is repeated so often that FDA issued an official “Statistical Software Clarifying Statement”.

► R cannot generate datasets in SAS Transport format
  o False. R can generate XPT using the SASxport package.
  o The SAS Transport Format is an open format, published by SAS Institute a long time ago:
    1. https://www.loc.gov/preservation/digital/formats/....
    2. http://documentation.sas.com....

► R cannot cooperate with SAS, including reading and writing SAS binary files
  o False. R can be combined with SAS in many ways. Check this out: https://www.quora.com/How-can-I-integrate-SAS-with-R
  o SAS enabled direct communication between R and SAS in the IML module in 2009.
  o R can read SAS7 binary data files and both read/write XPT files.

► R cannot be validated as well as commercial software
  o False. R can be validated no worse. In fact there is at least one company offering a validated version of R: Mango.

► Commercial software doesn’t have errors
  o The facts clearly contradict this claim.

► R is full of bugs (errors) as nobody controls it
  o Errors happen in third-party packages. No sign of increased bug reporting has been observed.

► R poorly supports the generation of TFLs
  o False. There are packages for creation of advanced graphs (ggplot2), Word documents, OpenDocument files, RTF and PDF. All tasks can be automated since R is a programming language.
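The XPT claim above can be tried with a short round trip. This is only a sketch under assumptions: it relies on the `write.xport()` and `read.xport()` functions of the SASxport package named on the slide, the data frame `dm` is invented for illustration, and the code skips the export when the package is not installed.

```r
# Hypothetical demographics data with SAS-friendly (short, upper-case) names
dm <- data.frame(
  USUBJID = c("S-001", "S-002", "S-003"),
  AGE     = c(34, 51, 47),
  stringsAsFactors = FALSE
)

xpt_path   <- file.path(tempdir(), "dm.xpt")
round_trip <- NULL

if (requireNamespace("SASxport", quietly = TRUE)) {
  # Write the data frame as a SAS Transport (XPT) file and read it back
  SASxport::write.xport(dm, file = xpt_path)
  round_trip <- SASxport::read.xport(xpt_path)
  print(round_trip)
} else {
  message("SASxport is not installed; skipping the XPT round trip.")
}
```

In a regulated setting the file written this way would still have to be checked independently, for example by opening it in SAS or in another XPT reader, before it is relied upon.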
► R is limited in terms of implemented statistical methods
  o We have just seen how rich the R statistical library is. This is the most complete library after SAS (plus a few more routines).

► Nobody uses R (or Open Source in general) in pharma industry (or in “serious business”). Maybe in academia, which is not a kind of a serious business.
  o False. We saw a few slides ago that pharmaceutical companies do use R.
  o Not to mention the non-clinical representatives of “serious business”.
► R doesn’t meet 21 CFR Part 11, which is a must
  o Let me quote this: “Whoever told you that is not well-informed. CFR Part 11 has to do with critical software that runs medical devices and about certain primary data management software. It does not apply to statistical analysis software. We use R all the time in industry-sponsored and NIH sponsored clinical trials. You do not need to seek FDA's approval. FDA accepts all comers and does not dictate software policy for analysis. They even accept Excel and Minitab for NDAs. There are many messages related to this in the r-help archive; please look at them.”
    Frank E Harrell Jr, Professor and Chair, School of Medicine, Department of Biostatistics, Vanderbilt University (Source)
  o And this: “Records submitted to FDA, under predicate rules in electronic format [are Part 11 records]. However, a record that is not itself submitted, but is used in generating a submission, is not a part 11 record unless it is otherwise required to be maintained under a predicate rule and it is maintained in electronic format.”
    Therefore, it is not mandated that 21 CFR Part 11 is appropriate to data analysis software systems that are not primarily intended for storage and transmission of electronic medical records. It remains the responsibility of an individual organization however to define the applicability of Part 11 and validation to their systems.
    R: Regulatory Compliance and Validation, 11 March 25, 2018 (Source)
  o Formal confirmation: Statistical Software Clarifying Statement by FDA (Source)

► R has no SAS-like “LOG”, which records everything
  o Yes, R doesn’t have a “LOG”, but with RMarkdown (or knitr, Sweave, odfWeave) and following the Reproducible Research paradigm, the “LOG” can be reproduced easily.
  o The generated HTML (or PDF, DOCX) document contains both the code and corresponding results combined.
  o In addition, employing a versioning system (SVN, Git) to store the “LOG” in a repository allows the analyst to version it and track changes. This gives a high level of confidence.
  o A less sophisticated, yet fully valid method can be implemented with the “sink()” function.

► There is no commercial support (product and/or validation)
  o False. Mango ValidR product is a good example. Revolution also offered paid support. This refers only to certain packages (mostly from the “base” set).

► The entire software (all packages and functions) must be validated
  o No. It has to be done properly and sufficiently.
  o Practice shows that only the used part of R code must be validated. If a package contains 1000 functions, while only two of them are used, only the two functions have to be validated. If a validated function X calls an unvalidated function Y, the result subjected to validation is still returned by the function X under given parameters and conditions.
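The `sink()` approach mentioned above can be sketched as follows. This is a minimal illustration, not the author's procedure: the log location (a temporary file) and the logged content are assumptions. Note that `sink()` captures printed output only, not the code itself, which is why the RMarkdown route gives a richer record.

```r
# Redirect printed output to a log file for the duration of an analysis
log_path <- tempfile(fileext = ".log")
con <- file(log_path, open = "wt")
sink(con)

tryCatch({
  cat("Run started:", format(Sys.time()), "\n")
  print(R.version.string)            # record the R version used
  fit <- lm(mpg ~ wt, data = mtcars) # example analysis step
  print(summary(fit))
}, finally = {
  sink()      # always restore console output, even after an error
  close(con)
})

# The log is a plain text file that can be archived or put under Git/SVN
log_lines <- readLines(log_path)
cat(head(log_lines, 5), sep = "\n")
```

Wrapping the analysis in `tryCatch(..., finally = ...)` is a design choice made here so that the console is restored even when a step fails; a forgotten open sink is a common source of confusion.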
► Commercial software releases the end-user from any responsibility regarding the validation.
  o No. Let’s quote FDA: “All production and/or quality system software, even if purchased off-the-shelf, should have documented requirements that fully define its intended use, and information against which testing results and other evidence can be compared, to show that the software is validated for its intended use.”
    Source: General Principles of Software Validation - Final Guidance for Industry and FDA Staff
R in Clinical Research ► What does FDA say?

Now, let us see what FDA has said about:

• The use of any software in clinical research. This is the KEY.
• The process of validation of the software

Then let us look at what some FDA-related people say about R.
The use of any software in clinical research (+ 21 CFR Part 11 status)

https://www.fda.gov/downloads/forindustry/datastandards/studydatastandards/ucm587506.pdf
The process of validation of the software

General Principles of Software Validation; Final Guidance for Industry and FDA Staff

[…] FDA considers software validation to be: “confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.”

https://www.fda.gov/downloads/medicaldevices/.../ucm085371.pdf
This document […] can be applied to any software.
[…]
This document does not specifically identify which software is or is not regulated
[…]
The management and control of the software validation process should not be confused with any other validation requirements, such as process validation for an automated manufacturing process (so the regular validation of clinical programs doesn’t count)
[…]
design input requirements must be documented, and that specified requirements must be verified
[…]
Success in accurately and completely documenting software requirements is a crucial factor in successful validation of the resulting software.
A specification is defined as “a document that states requirements.”
[…]
There are many different kinds of written specifications, e.g., system requirements specification, software requirements specification, software design specification, software test specification, software integration specification, etc.
[…]
Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase. Software verification looks for consistency, completeness, and correctness of the software and its supporting documentation, as it is being developed, and provides support for a subsequent conclusion that software is validated.
Software validation is a part of the design validation for a finished device, but is not separately defined in the Quality System regulation. For purposes of this guidance, FDA considers software validation to be “confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.”

SOFTWARE VERIFICATION ≠ SOFTWARE VALIDATION

• Production: are R and all packages done well?
• Installation and work: mean( 1:3 ) == 2 ?
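The `mean( 1:3 ) == 2` check on this slide generalizes to a small table of functions compared against reference values. The sketch below is an assumption-laden illustration: the helper `check_value()`, the data vector and the tolerance are hypothetical, and the expected values were computed by hand (mean 5, variance 32/7, median 4.5) rather than taken from a certified reference dataset.

```r
# Compare a computed value with a reference value within a tolerance
check_value <- function(label, actual, expected, tolerance = 1e-12) {
  ok <- isTRUE(all.equal(actual, expected, tolerance = tolerance))
  data.frame(test = label, actual = actual, expected = expected, passed = ok)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # small dataset with hand-computed results

results <- rbind(
  check_value("mean(1:3)", mean(1:3), 2),
  check_value("mean(x)",   mean(x),   5),
  check_value("var(x)",    var(x),    32 / 7),
  check_value("median(x)", median(x), 4.5)
)

print(results)

# Stop loudly if any function disagrees with its reference value
stopifnot(all(results$passed))
```

The printed table doubles as objective evidence: it can be saved with the date, R version and package versions as part of the validation record.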
Software validation includes confirmation of conformance to all software specifications and confirmation that all software requirements are traceable to the system specifications.

[Diagram: requirements, specification and documentation of the system; ( verification ) + validation = confirmation of the system]
Because of its complexity, the development process for software should be even more tightly controlled than for hardware, in order to prevent problems that cannot be easily detected later in the development process.
[…]
Seemingly insignificant changes in software code can create unexpected and very significant problems elsewhere in the software program. The software development process should be sufficiently well planned, controlled, and documented to detect and correct unexpected results from software changes.
SECTION 4. PRINCIPLES OF SOFTWARE VALIDATION

4.9. INDEPENDENCE OF REVIEW

Validation activities should be conducted using the basic quality assurance precept of “independence of review.” Self-validation is extremely difficult. When possible, an independent evaluation is always better, especially for higher risk applications.

[Diagram labels: Validator, Builder]
The software requirements specification document should contain a written definition of the software functions.
It is not possible to validate software without predetermined and documented software requirements.

Typical software requirements specify the following:

• All software system inputs
• All software system outputs
• All functions that the software system will perform
• All performance requirements that the software will meet (e.g., data throughput, reliability, and timing)
• The definition of all external and user interfaces, as well as any internal software-to-system interfaces
• How users will interact with the system
• What constitutes an error and how errors should be handled
• Required response times
• The intended operating environment for the software, if this is a design constraint (e.g. hardware platform, operating system)
• All ranges, limits, defaults, and specific values that the software will accept
• All safety related requirements, specifications, features, or functions that will be implemented in software
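To show how such requirements can be tied to tests, here is a minimal sketch. Everything in it is hypothetical: the function `bmi()`, the requirement identifiers (REQ-01 to REQ-04) and the accepted ranges were invented for illustration and do not come from the guidance.

```r
# REQ-01 (inputs):  weight in kg and height in cm, numeric, equal length
# REQ-02 (output):  body mass index in kg/m^2, rounded to 1 decimal place
# REQ-03 (ranges):  weight 1-500 kg, height 30-280 cm
# REQ-04 (errors):  non-numeric or out-of-range input raises an error
bmi <- function(weight_kg, height_cm) {
  stopifnot(is.numeric(weight_kg), is.numeric(height_cm),
            length(weight_kg) == length(height_cm))
  if (any(weight_kg < 1 | weight_kg > 500, na.rm = TRUE) ||
      any(height_cm < 30 | height_cm > 280, na.rm = TRUE)) {
    stop("input outside the specified range")
  }
  round(weight_kg / (height_cm / 100)^2, 1)
}

# Each test is named after the requirement it provides evidence for
tests <- list(
  "REQ-02 value" = isTRUE(all.equal(bmi(70, 175), 22.9)),
  "REQ-04 range" = inherits(try(bmi(70, 1.75), silent = TRUE), "try-error"),
  "REQ-04 type"  = inherits(try(bmi("70", 175), silent = TRUE), "try-error")
)

print(unlist(tests))
```

Naming tests after requirement identifiers keeps the link between a documented requirement and its evidence explicit, which is the traceability the guidance asks for.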
R in Clinical Research ► What does FDA say?

The vendor’s life cycle documentation, such as testing protocols and results, source code, design
specification, and requirements specification, can be useful in establishing that the software has
been validated. However, such documentation is frequently not available from commercial
equipment vendors, or the vendor may refuse to share their proprietary information.

Now let’s stop for a while and quickly summarize what we have learned so far

Commercial software                      | Open-source software
-----------------------------------------+------------------------------------------
No source code                           | Source code provided
No proprietary technical information     | Full documentation provided (if available)
Assurance “we did our best”              | Assurance “we did our best”
No guarantee                             | No guarantee
Support                                  | No hot-line, but a very active community
…“millions of people use that”           | …“millions of people use that”
Full trust: it’s paid = validated well   | Low trust: free things are poorly made
R in Clinical Research ► What does FDA say?

The process of validation of the software

Guidance for Industry, FDA Reviewers and Compliance on
Off-The-Shelf Software Use in Medical Devices

This is another essential document. A must-read.


We are not going to analyze it thoroughly, but you are strongly
encouraged to familiarize yourself with it.

https://www.fda.gov/downloads/MedicalDevices/.../ucm073779.pdf
R in Clinical Research ► What does FDA say?

Introduction to the controlled environment

Guidance for Industry: Computerized Systems Used in Clinical Investigations

S.O.P
Dependability System Documentation System Controls

Change Control Documentation Training of Personnel

https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070266.pdf
R in Clinical Research ► What do FDA-related people say?

http://user2007.org/program/presentations/soukup.pdf
R in Clinical Research ► What do FDA-related people say?

Hey! Do read it again!

(see the next page)

R in Clinical Research ► What do FDA-related people say?

1. Use of R functions without proper validation is at the organization’s risk.

It seems that nobody forces us to validate R. Is it up to us? Then we had
better do it (see the next page).

2. Results should be reproducible and independent of the software used to derive them.

This is impossible “by definition”: SAS, R, Stata and SPSS may return
different results even for quantiles, or because of floating-point number representation!
The results should be as close to each other as possible, but what about resampling
methods (SAS and R give different random numbers for the same seed)?
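
The quantile issue can be seen without leaving R, because `quantile()` offers several definitions selected by its `type` argument. Below is a minimal sketch; the data vector and the seed are made up for illustration, and nothing here claims which definition any other package uses by default.

```r
# Illustrative data (hypothetical, chosen so that the definitions disagree).
x <- 1:10

# The same first quartile under three of the definitions listed in ?quantile.
q_type7 <- unname(quantile(x, probs = 0.25, type = 7))  # R's default
q_type2 <- unname(quantile(x, probs = 0.25, type = 2))
q_type6 <- unname(quantile(x, probs = 0.25, type = 6))
print(c(type7 = q_type7, type2 = q_type2, type6 = q_type6))

# Within one R installation, a fixed seed reproduces a resampling run exactly.
set.seed(2018)
boot1 <- sample(x, replace = TRUE)
set.seed(2018)
boot2 <- sample(x, replace = TRUE)
print(identical(boot1, boot2))
```

The three definitions give 3.25, 3 and 2.75 for the same data, so any cross-package comparison has to agree on the quantile definition first. The seed guarantees reproducibility only within one implementation of the random number generator; recording the output of `RNGkind()` next to the seed is a sensible habit.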

R in Clinical Research ► What do FDA-related people say?

Another argument
for validating R

R in Clinical Research ► What does it mean “to validate”? Why do we want this?

Finally, we have arrived at the key question. Let us try to answer it in layman’s terms:

“To validate” means to ensure that R does all the calculations properly.
But to confirm this, we need to check dozens of components, packages and functions.

Remember:

• FDA doesn’t tell you what exactly should be validated (which functions). You decide.
The analysis of risk and validation coverage is entirely up to you.
It is our responsibility to do it WELL.

• Why? Validation is also there to protect you and let you sleep well.
Try to think of it this way: once done properly, it gives you a reliable, powerful tool.
R in Clinical Research ► Preparing R to enter the industry

What tools do we have and what is to be done?

• We know FDA allows us to use R in submissions

• We know what FDA wants from us and have some advice on how to do it

• We have the source code for both R Core and every package

• Most packages refer to textbooks and point to specific formulas

• The R Core Team prepared a very important document on this topic

• R has tools for unit testing

• Reference data for testing are available on the Internet or can be obtained

• There are tools allowing the system maintainer to protect (“to freeze”) the newly
validated environment against changes.


R in Clinical Research ► Preparing R to enter the industry

And a bonus

• Validation is incremental. Once validated, a function doesn’t have to be
re-validated until it is updated. Of course we can validate it many times (which I
recommend), and automated tools make this easy.

• Only the functions actually used have to be tested. Unused code means non-existent code.

• Accumulation of test cases over time significantly improves the process of
validation. Every new trial is a source of new, real data, perfect for testing.
R in Clinical Research ► Preparing R to enter the industry

The R-FDA.PDF document is a giant milestone. It makes a perfect starting point in the
process of establishing one’s own controlled R-based environment.

For obvious reasons it is limited to a small subset of packages, labelled “Base” and
“Recommended”.

These packages don’t cover the complete set of statistical routines used in clinical
research, but will definitely allow one to start with advanced analysis employing:
• linear mixed models (with given covariance structure), generalized additive models,
• survival analysis,
• accessing data generated by external statistical packages,
• resampling (bootstrap),
• tons of statistical tests,
• plotting (low-level and quite advanced via the “lattice” package) and much more.
R in Clinical Research ► Preparing R to enter the industry

https://www.r-project.org/doc/R-FDA.pdf
Validation ► Validation of installation vs. numerical validation

What aspects of an R-based computing environment can be validated?

• The process of installation of the core R
• The process of installation of required packages (in specific versions)
• The quality of code in installed packages (code metrics)
• Coverage by unit tests defined in installed packages
• The outcome of these unit tests

Thought #1: an incorrectly installed R or package will not work properly, or will not
even launch. It is useless.
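
One part of installation validation, checking that the required packages are present in the expected versions, can be sketched with base R alone. The helper `check_versions()` and the manifest are hypothetical names introduced here for illustration; only `packageVersion()` is a real function (from the `utils` package).

```r
# Compare installed package versions against a manifest of validated versions.
# `manifest` is a named character vector: names are packages, values versions.
check_versions <- function(manifest) {
  installed <- vapply(names(manifest), function(pkg) {
    # packageVersion() raises an error for a missing package; map that to NA.
    tryCatch(as.character(packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  data.frame(
    package   = names(manifest),
    expected  = unname(manifest),
    installed = unname(installed),
    ok        = !is.na(installed) & unname(installed) == unname(manifest),
    stringsAsFactors = FALSE
  )
}

# Hypothetical manifest: "stats" is pinned to whatever is installed right now,
# "utils" to a deliberately wrong version in order to show a failure.
manifest <- c(stats = as.character(packageVersion("stats")), utils = "0.0.1")
report <- check_versions(manifest)
print(report)
```

In a real setting the manifest would come from the validation documentation rather than from the running session, and the report would be archived together with the other installation records.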

Validation ► Validation of installation vs. numerical validation

What aspects of an R-based computing environment can be validated?

• The correctness of calculations performed by selected functions in selected
packages.

Thought #2: a correctly installed R or package that returns wrong results
is worse than useless: it is extremely dangerous!

Well-done Validation = Validation of installation + Numerical validation

Validation ► Numerical validation ► Methods

How to validate a module numerically?

• By comparing results with reference data obtained from a trusted source (good!):
   • trial versions (if the license permits) of other statistical packages
   • asking someone who holds a valid licence to run a certain analysis on given data
   • publicly available documentation with examples

• By comparing results with calculations done by hand, step by step
(makes sense only for simple methods)

• By inspecting the code and comparing the implemented formula with the
reference in the corresponding textbook (so-so, but it does allow issues to be found)
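
The “by hand, step by step” route could look like this for a one-sample t-test. This is only a sketch: the measurements and the hypothesized mean are made up, and the p-value still relies on R’s own `pt()`, so strictly speaking only the test statistic is checked independently.

```r
# Hypothetical measurements, made up for illustration.
x   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
mu0 <- 5

# Step-by-step calculation of the one-sample t statistic.
n      <- length(x)
m      <- sum(x) / n                        # sample mean
s      <- sqrt(sum((x - m)^2) / (n - 1))    # sample standard deviation
t_hand <- (m - mu0) / (s / sqrt(n))         # t statistic
p_hand <- 2 * pt(-abs(t_hand), df = n - 1)  # two-sided p-value

# The function under test.
res <- t.test(x, mu = mu0)

# Compare with a tolerance rather than with ==.
stopifnot(
  isTRUE(all.equal(unname(res$statistic), t_hand)),
  isTRUE(all.equal(res$p.value, p_hand))
)
```

If either comparison fails, `stopifnot()` raises an error, which makes the script usable as an automated check.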
Validation ► Numerical validation ► Methods

How to validate a module numerically?

The comparison has to allow some tolerance, because two statistical packages
are likely to differ slightly in their results, for numerous reasons such as:

• Different ways of storing floating-point numbers

• Different approaches to calculating quantiles

• Different algorithms for rounding numbers

• Different default contrasts

• Different types of sums of squares

• Different random number generators (for the same seed)

• Different corrections applied to a method (different rules of choice)
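
A few of these effects are easy to reproduce in base R. The sketch below is illustrative only: `reference` is a hypothetical stand-in for a value that another package printed to four decimal places, and the absolute tolerance of 5e-5 is an assumption matching that printed precision.

```r
# Floating point: exact equality fails on an "obviously" equal value.
exact_equal <- (0.1 + 0.2) == 0.3

# all.equal() compares with a tolerance (about 1.5e-8 by default).
near_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Comparison against a rounded reference value with an explicit tolerance.
reference    <- 2.1381                       # hypothetical printed value
actual       <- sd(c(2, 4, 4, 4, 5, 5, 7, 9))
close_enough <- abs(actual - reference) < 5e-5

# Rounding: R rounds halves to the even digit, which other tools may not do.
rounded <- round(c(0.5, 1.5, 2.5))

print(c(exact_equal = exact_equal, near_equal = near_equal,
        close_enough = close_enough))
print(rounded)
```

Here `exact_equal` is `FALSE` while both tolerant comparisons are `TRUE`, and `rounded` is 0, 2, 2. The tolerance chosen for each test is part of the validation plan and should be documented with it.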


Validation ► Numerical validation ► Methods

How to validate a module numerically?

• The collected items:
   • Statistical method name
   • Values of relevant parameters
   • Input data set provided to the reference software
   • The outcome returned by the reference software

…can then be wrapped in so-called “unit test” code and stored in a
repository. A unit-testing engine queries the repository, fetches the test
definitions and passes them to the appropriate functions in a fully automated
manner. The result returned by each tested function is compared with the
reference. At the end, the engine generates a validation report.
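
The workflow described above can be sketched in a few lines of base R. Everything named here (`test_repository`, `run_tests()`, the test ids) is hypothetical, the “repository” is just an in-memory list, and the reference values are hand-computed for illustration rather than taken from any reference software.

```r
# A tiny in-memory "repository" of test definitions. Each entry holds the
# method name, its parameters with input data, and the reference outcome.
test_repository <- list(
  list(id = "mean-01", fun = "mean",
       args = list(x = c(1, 2, 3, 4)), reference = 2.5),
  list(id = "var-01", fun = "var",
       args = list(x = c(1, 2, 3, 4, 5)), reference = 2.5),
  list(id = "median-01", fun = "median",
       args = list(x = c(1, 3, 5, 7)), reference = 4),
  list(id = "quantile-01", fun = "quantile",
       args = list(x = 1:10, probs = 0.25, type = 7, names = FALSE),
       reference = 3.25)
)

# The "engine": run every definition and compare with the reference.
run_tests <- function(repository, tolerance = 1e-8) {
  rows <- lapply(repository, function(tc) {
    actual <- do.call(tc$fun, tc$args)
    data.frame(
      id = tc$id, fun = tc$fun, reference = tc$reference, actual = actual,
      passed = isTRUE(all.equal(actual, tc$reference, tolerance = tolerance)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# The validation report: one row per test case.
report <- run_tests(test_repository)
print(report)
```

A production setup would more likely rely on an established unit-testing package (for example `testthat`, with `test_that()` and `expect_equal()`) and keep the definitions under version control, but the flow of data is the same.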
Validation ► Validation of installation

https://www.londonr.org/wp-content/uploads/sites/2/presentations/LondonR_-_Challenges_Of_Validating_R_-_Chris_Campbell_-_20140617.pdf
Fixing the environment and controlling for changes

How to prevent the environment from being “invalidated”?

• Prevent users from updating the R core
• Prevent users from installing “illegal” (not validated) packages:
   • “foreign” packages (not in local use)
   • packages in a different version

BUT!

• Each project may require a different set of packages in different versions
• Certain projects may require installation of new (not yet validated) packages
• New packages are created within the company

Fixing the environment and controlling for changes

How to prevent the environment from being “invalidated”?

• Docker containers (Rocker)

• Read-only environment on a CD or DVD (slow!)

• Portable version of R with “broken” .libPaths

• Isolation of the workstation from the Internet (so cruel!)

• Local repository of packages (in different versions): miniCRAN

• The checkpoint solution, based on MRAN

• The packrat solution, combined with miniCRAN

• Employing a version control system, like SVN or Git
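
As a small illustration of the `.libPaths` idea, the sketch below points the session at a dedicated library folder and records what the session runs on. The folder name `validated-lib` is hypothetical, and a temporary directory stands in for what would really be a read-only, centrally managed location.

```r
# Hypothetical location of the validated library (a temp directory here).
validated_lib <- file.path(tempdir(), "validated-lib")
dir.create(validated_lib, showWarnings = FALSE)

# Put the validated library first on the library search path of this session.
.libPaths(validated_lib)
print(.libPaths())

# Record what the session actually runs on, e.g. for a change-control log.
snapshot <- list(
  r_version = R.version.string,
  lib_paths = .libPaths(),
  rng_kind  = RNGkind()
)
str(snapshot)
```

Note that `.libPaths()` keeps R’s own system library on the path, so this alone does not isolate the environment; it is one building block to combine with the other measures listed above.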


Thank you

part II - soon!
