
NESUG 2010 Pharmaceutical Applications

Integrated Summary of Safety and Efficacy Programming for Studies Using Electronic Data Capture
Changhong Shi, Merck & Co., Inc., Rahway, NJ
Qing Xue, Merck & Co., Inc., Rahway, NJ

ABSTRACT
The Integrated Summary of Safety (ISS) and Integrated Summary of Efficacy (ISE) are essential components of a successful
submission. In legacy studies, where different types of data are frequently collected through diverse systems by various
vendors, programming ISS and ISE analyses can be a daunting job, because all study data must be converted and
harmonized to the same format before programming and analysis work can begin. Studies that use an Electronic Data
Capture (EDC) system have similarly structured views, which can greatly ease the harmonization process. However, even
though less harmonization is needed, programmers still face many unique challenges in multi-study data integration for
ISS and ISE. This paper discusses specific tips and techniques for efficiently programming integrated analyses, focusing
on the following areas: (1) data source checking, (2) the "spread and convene" programming approach, and (3) consistent
data and folder structure.

Keywords: ISS, Integrated Summary of Safety, Integrated Summary of Efficacy

INTRODUCTION
The Integrated Summary of Safety (ISS) and Integrated Summary of Efficacy (ISE) are essential components of a successful
submission. They differ from a single-study analysis in that: (a) the amount of data is larger; (b) each component study has
usually been locked, with its files frozen, before the ISS and ISE work begins; and (c) the component studies may use
different folder structures, because some may have been locked long ago and therefore followed different standards. This
paper details techniques for efficiently handling these challenges, including:

(1) data source checking
(2) "spread and convene" programming approach
(3) consistent data and folder structure

1. DATA SOURCE CHECKING


ISS and ISE typically contain more than one study and a large amount of data. To achieve accurate integrated analyses,
the data must be scrutinized to catch important scenarios that need special attention. Because of the number of patients
and the amount of data involved, it is impossible to "eyeball" everything in an ISS or ISE, as is sometimes done for a
single small study. The following techniques, although simple, have proven efficient for checking the data source before
programming:

A). Frequency procedure


Checking the values of variables with a frequency procedure (PROC FREQ) shows whether special attention is required
and provides a basis for proposing data-handling suggestions to the statistician. The following example checks the values
of "Action Taken with Study Treatment" (the AEACN variable in the SDTM AE domain, SDTM 3.1.1 IG) across the pooled ISS
studies:

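proc freq data=iss.ae;
   /* One-way frequency of AEACN across the pooled ISS studies.    */
   /* Without the MISSING option, blank AEACN values are reported  */
   /* as "Frequency Missing" below the table.                      */
   tables aeacn / list;
run;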

Result obtained:

                                         Cumulative  Cumulative
AEACN                Frequency  Percent   Frequency     Percent
DOSE INCREASED               2     0.01           2        0.01
DOSE NOT CHANGED         32999    95.44       33001       95.44
DOSE REDUCED               136     0.39       33137       95.84
DRUG INTERRUPTED           650     1.88       33787       97.72
DRUG WITHDRAWN             556     1.61       34343       99.33
NOT APPLICABLE             230     0.67       34573       99.99
UNKNOWN                      3     0.01       34576      100.00

Frequency Missing = 7


In the frequency distribution above, seven AE records have a blank AEACN and three have the value 'UNKNOWN'. Rather
than proceeding directly to table production, we recommend investigating these records further and reporting them to the
statisticians and the database team for a final decision.
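A minimal follow-up sketch, assuming the pooled dataset ISS.AE also carries the standard SDTM AE variables STUDYID, USUBJID, AESEQ and AETERM: a per-study frequency with the MISSING option shows which component studies contribute the blank or UNKNOWN values, and a short listing gives the statisticians and database team the individual records to review.

```sas
/* Assumes ISS.AE is the pooled SDTM AE dataset used above, with the */
/* standard SDTM variables STUDYID, USUBJID, AESEQ and AETERM.       */

/* Which component studies contribute the questionable values?      */
/* MISSING shows blank AEACN as its own row instead of a footnote.  */
proc freq data=iss.ae;
   tables studyid*aeacn / list missing;
run;

/* Record-level listing to send to the statisticians and database team */
proc print data=iss.ae noobs;
   where aeacn is missing or upcase(aeacn) = 'UNKNOWN';
   var studyid usubjid aeseq aeterm aeacn;
run;
```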

B). Missing values and blank values are our "friends" in ISS or ISE
Statistical programmers frequently deal with missing or blank values, and handling them is especially important in an
ISS or ISE that involves dictionary leveling.

ISS data are commonly leveled so that all studies use the same dictionary version. Leveling may produce missing data
when terms have expired. For example, consider the following hierarchy in the drug dictionary:

CMDECOD (Standardized Medication Name)
    then
CMCLAS (Medication Class)

If a CMDECOD term has expired and cannot be leveled to the dictionary version used by the ISS or ISE, no corresponding
CMCLAS can be assigned. Such leveling issues in the resulting ISS or ISE can be identified by running a simple frequency
procedure on the leveled variables (CMDECOD and CMCLAS in the above example) and confirming that no blanks occur.
Because of the large amount of data in an ISS or ISE, this programmatic check is important: such issues are easily
overlooked by manual review. Missing or blank values are therefore our "friends" in an ISS or ISE, because they help
identify harmonization issues for integrated analyses.
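The check described above can be sketched as follows, assuming a pooled, dictionary-leveled dataset named ISS.CM (a hypothetical name) that keeps the verbatim term CMTRT next to CMDECOD and CMCLAS. The MISSING option is what makes blank values visible as rows; the PROC SQL step then summarizes the blanks by component study.

```sas
/* Assumes ISS.CM (hypothetical name) is the pooled CM data after */
/* dictionary leveling, with STUDYID, CMTRT, CMDECOD and CMCLAS.  */

/* Verbatim terms whose leveled CMDECOD or CMCLAS is blank */
proc freq data=iss.cm;
   where cmdecod is missing or cmclas is missing;
   tables cmtrt*cmdecod*cmclas / list missing;
run;

/* Number of blank leveled values in each component study */
proc sql;
   select studyid,
          sum(missing(cmdecod)) as n_blank_cmdecod,
          sum(missing(cmclas))  as n_blank_cmclas
      from iss.cm
      group by studyid;
quit;
```

A nonzero count points to a component study whose terms did not level cleanly to the dictionary version used by the ISS.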

2. PROGRAMMING APPROACH
It is possible to put the raw data for all studies together and write one set of programs for the ISS or ISE, but this
approach makes it difficult to debug and to determine the source of problems, especially when there are many component
studies. To save debugging and validation time, the approach we adopted for our 19 ISS studies was to program each study
individually first, reusing the code from the clinical study report (CSR) or other existing programs. The results are then
compared with the existing CSR or other published results.

Once the programs for each component study of an ISS or ISE have been developed and validated, only simple stacking
programs are needed to combine the analysis datasets. We wrote a small set of stacking programs to stack the study-level
analysis datasets, which were in ADaM format with the same data structure, and completed further work on the stacked
analysis data. We call this approach "spread and convene".
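A minimal sketch of the "convene" step, assuming the folder layout described in Section 3 with a hypothetical root path /ISS2009 and two study folders, p123 and p456. The macro STACK is a hypothetical name: it assigns one read-only libref per study and stacks the same-named analysis dataset from each.

```sas
/* Hypothetical root path and study list; adjust to the actual area */
%let root    = /ISS2009;
%let studies = p123 p456;

%macro stack(ds=, out=);
   %local i st n;
   %let n = %sysfunc(countw(&studies));

   /* One read-only libref per component study */
   %do i = 1 %to &n;
      %let st = %scan(&studies, &i);
      libname &st "&root/&st/dataanalysis" access=readonly;
   %end;

   /* "Convene": stack the validated study-level datasets */
   data &out;
      set
      %do i = 1 %to &n;
         %let st = %scan(&studies, &i);
         &st..&ds
      %end;
      ;
   run;
%mend stack;

libname overall "&root/OverallISS/dataanalysis";
%stack(ds=adlab, out=overall.adlab)
```

Because the SET statement takes each variable's attributes from the first dataset that contains it, the stacked result is only as clean as the study-level structures; this is one reason to keep the data structure consistent across studies.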

The advantage of this approach can be seen in the laboratory safety (LAB) and predefined limit of change (PDLC) analyses.
For example, we produced the PDLC listing shown below for an ISS consisting of 19 studies. The listing has the following
11 columns:

lab test name
treatment group name
protocol number
patient allocation number
lab test code (CDISC code)
analysis time point (week)
lab measurement day relative to the reference start date (here, the trial start date, i.e. the date of the first non-zero dose of study medication)
baseline value (for simplicity in this example, the baseline value is defined as the last measurement with a measurement day relative to the reference start date <= 1)
test value at the analysis time point
lower limit of normal range (LLN), upper limit of normal range (ULN)
hit (indicates whether the record meets the PDLC criterion given in the criterion row)

In this example, we show three patients: two from Prot123, with allocation numbers (AN) 10001 and 10030, and one from
Prot456, with AN 1000. Because Prot456 was a Phase IIb study and Prot123 was a Phase III study, the study designs were
somewhat different, and so were the baseline definitions. Therefore, to obtain the table below, where the baseline value
is in one column, the most efficient approach was to "spread" first, i.e. set up the lab data for Prot456 and Prot123
separately and compare the results against the original CSR or other exploratory outputs, and then "convene", i.e. stack
the analysis datasets so that the baseline value is in one column. The same approach also suits the analysis time point
column, because weeks were defined differently in each study. Note that the table should contain only those patients who
received at least one dose of study medication. With the "spread and convene" approach, rather than trying to integrate
the data from all 19 studies at once, the program took considerably less time to run.
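The "spread" half can be sketched as a small macro that runs once per component study, so that each study keeps its own baseline rule. This is a sketch under stated assumptions: the macro name DERIVE_BASE, the librefs and paths, and the use of the SDTM LB variables LBDY (study day) and LBSTRESN (numeric result) are hypothetical; only the simplified baseline rule for Prot123 (last measurement with relative day <= 1) comes from the example above.

```sas
/* Hypothetical librefs for one component study (paths are assumed) */
libname sdtm123 "/ISS2009/p123/sdtmplus" access=readonly;
libname ana123  "/ISS2009/p123/dataanalysis";

/* Hypothetical macro: derive BASE for one study.                   */
/*   INLIB   = libref holding the study's SDTM LB dataset            */
/*   OUTLIB  = libref for the study-level analysis dataset           */
/*   BASEWHR = the study's own baseline rule, as a WHERE condition   */
%macro derive_base(inlib=, outlib=, basewhr=);
   proc sort data=&inlib..lb out=_lab;
      by usubjid lbtestcd lbdy;
   run;

   /* Baseline = last non-missing result that satisfies the rule.   */
   /* WHERE is applied before BY processing, so LAST. refers to the */
   /* last qualifying record of each subject and test.              */
   data _base(keep=usubjid lbtestcd base);
      set _lab;
      by usubjid lbtestcd;
      where (&basewhr) and lbstresn is not missing;
      if last.lbtestcd;
      base = lbstresn;
   run;

   /* Attach BASE to every record, so baseline becomes one column */
   data &outlib..adlab;
      merge _lab(in=inlab) _base;
      by usubjid lbtestcd;
      if inlab;
   run;
%mend derive_base;

/* One call per study; Prot456 would get its own call and rule */
%derive_base(inlib=sdtm123, outlib=ana123, basewhr=%str(lbdy <= 1))
```

Each study-level dataset can then be compared with the CSR before it is stacked.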


Listing of Patients With Two or More Consecutive Serum Creatinine Measurements
with an Increase from Baseline of 0.3 mg/dL or of 50%
Pooled Studies

                                                             Endpoint(s)
                                                 Allocation  Assessed           Relative  Baseline  Test
Lab Test                  Treatment    Protocol  Number      for Test     Week  Day       Value     Value  LLN, ULN  Hit
Criterion: Two or more consecutive measurements with an increase from baseline of >=0.3 mg/dL or of >=50%
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        -10   -69       1.1       1.2    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        0     1         1.1       1.1    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        3     22        1.1       1.4    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        6     36        1.1       1.4    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        6     43        1.1       1.2    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        12    91        1.1       1.3    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        18    127       1.1       1.2    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10001       CREAT        18    141       1.1       1.2    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        -9    -63       0.9       1      0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        0     1         0.9       0.9    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        3     13        0.9       1      0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        3     22        0.9       1      0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        6     43        0.9       1.2    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        12    85        0.9       1.2    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        18    114       0.9       1.2    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        18    125       0.9       1.2    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  123       10030       CREAT        24    167       0.9       1.2    0.7, 1.4  Yes
Serum Creatinine (mg/dL)  Non-exposed  456       1000        CREAT        -7    -49       0.7       0.8    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  456       1000        CREAT        -2    -14       0.7       0.8    0.7, 1.4
Serum Creatinine (mg/dL)  Non-exposed  456       1000        CREAT        0     1         0.7       0.7    0.7, 1.4
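The Hit column can be derived once the data have been convened. A minimal sketch, assuming the stacked dataset OVERALL.ADLAB carries STUDYID, USUBJID, LBTESTCD, LBDY, LBSTRESN and BASE (hypothetical names, as in the sketches above) and treating records with LBDY > 1 as post-baseline to match the simplified baseline rule. Each record is first flagged against the criterion, consecutive flagged records are grouped into runs, and records in runs of two or more get Hit = 'Yes'. The ROUND calls matter: in binary floating point, 1.4 - 1.1 is slightly less than 0.3, so an unrounded comparison would miss the hits for patient 10001.

```sas
/* Assumes libref OVERALL is assigned and OVERALL.ADLAB is the stacked */
/* data with STUDYID, USUBJID, LBTESTCD, LBDY, LBSTRESN and BASE.      */
proc sort data=overall.adlab(where=(lbtestcd = 'CREAT')) out=creat;
   by studyid usubjid lbtestcd lbdy;
run;

data creat_run;
   set creat;
   by studyid usubjid lbtestcd;

   /* CRIT = 1: post-baseline value meeting the PDLC criterion.         */
   /* LBDY > 1 as "post-baseline" mirrors the simplified baseline rule. */
   /* ROUND avoids floating-point misses (1.4 - 1.1 is just under 0.3). */
   crit = 0;
   if lbdy > 1 and n(lbstresn, base) = 2 then do;
      chg = round(lbstresn - base, 0.0001);
      if chg >= 0.3 or (base > 0 and round(chg / base, 0.0001) >= 0.5)
         then crit = 1;
   end;

   /* Start a new run at each subject/test or whenever CRIT changes */
   prevcrit = lag(crit);
   if first.lbtestcd or crit ne prevcrit then runid + 1;
run;

/* HIT = 'Yes' for records in a run of two or more qualifying values */
proc sql;
   create table pdlc as
      select *,
             case when crit = 1 and count(*) >= 2 then 'Yes'
                  else ' ' end as hit length=3
         from creat_run
         group by runid
         order by studyid, usubjid, lbtestcd, lbdy;
quit;
```

Applied to the rows in the listing above, this logic flags the same records as Yes. PROC SQL remerges the run counts back onto each record and notes this in the log.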


3. CONSISTENT DATA AND FOLDER STRUCTURE


For an ISS or ISE that contains only studies whose data were collected using EDC, the data structure may already be
consistent at database lock. However, if standards have changed, or if the ISS or ISE includes a non-EDC study, the folder
and data structures may differ from study to study. To take full advantage of a common structure in an ISS or ISE, the data
and folder structures must be consistent and follow exactly the same naming convention. Virtually the same startup code
can then define the input and output directory paths for every study. Below is a folder structure we found helpful:

ISS Directory Structure

(indentation shows nesting; names with a file extension are files)

ISS2009
|-- OverallISS
|   |-- dataanalysis
|   |   |-- adlab.sas7bdat
|   |   |-- adpdlc.sas7bdat
|   |-- pgmsetup
|   |-- pgmanalysis
|   |-- utility
|       |-- startup.sas
|-- p456
|   |-- sdtmplus
|   |   |-- dm.sas7bdat
|   |   |-- lb.sas7bdat
|   |-- dataanalysis
|   |   |-- adlab.sas7bdat
|   |   |-- adpdlc.sas7bdat
|   |-- pgmsetup
|   |-- pgmanalysis
|   |-- utility
|       |-- startup.sas
|-- p123
|   |-- sdtmplus
|   |   |-- dm.sas7bdat
|   |   |-- lb.sas7bdat
|   |-- dataanalysis
|   |   |-- adlab.sas7bdat
|   |   |-- adpdlc.sas7bdat
|   |-- pgmsetup
|   |-- pgmanalysis
|   |-- utility
|       |-- startup.sas
|-- p789
|-- p012
|-- ...
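A minimal sketch of such a startup program, assuming the hypothetical root path /ISS2009 and the folder names shown above. Because every study folder has the same layout, only the STUDY macro variable changes from one copy to the next; the librefs SDTM and ANALYSIS are illustrative names.

```sas
/* Hypothetical startup.sas: identical in every study folder except */
/* for the value of STUDY. The root path is an assumption.          */
%let root  = /ISS2009;
%let study = p123;             /* e.g. p456, p789, or OverallISS */

%let studydir = &root/&study;

/* Input: study-level SDTM-plus data (absent under OverallISS) */
libname sdtm "&studydir/sdtmplus" access=readonly;

/* Output: analysis datasets such as adlab and adpdlc */
libname analysis "&studydir/dataanalysis";

%put NOTE: startup complete for &study (&studydir);
```

For the OverallISS folder, which has no sdtmplus subfolder, the SDTM libname statement would simply be omitted.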


The following example shows the consistent data structure we used for the ADSL dataset in each component study:

VARIABLE  LABEL                                  TYPE/LENGTH  DECODE/DERIVATION/COMMENTS
STUDYID   Study Identifier                       C/200
USUBJID   Unique Subject Identifier              C/200
SUBJID    Subject Identifier for the Study       C/200        Also known as Randomized Patient Identifier.
SITEID    Study Site Identifier                  C/200
ETHNIL    Ethnicity                              C/200
ETHNIN    Ethnicity, Num                         N/8          1: Hispanic or Latino
                                                              2: Not Hispanic or Latino
AGE       Age                                    N/8          Age in Years
SEX       Sex                                    C/2
RACE      Race                                   C/200        AMERICAN INDIAN OR ALASKA NATIVE: American Indian or Alaska Native
                                                              ASIAN: Asian
                                                              BLACK OR AFRICAN AMERICAN: Black or African American
                                                              MULTI-RACIAL: Multi-Racial
                                                              NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER: Native Hawaiian Or Other Pacific Islander
                                                              WHITE: White
FASFL     Full Analysis Set Pop Flag             C/1          Flag to identify FAS population for the primary efficacy end point.
FASFN     Full Analysis Set Pop Flag, Num        N/8          1: Included in FAS population
                                                              0: Excluded from FAS population
ARM       Description of Planned Arm             C/200
TRT1P     Planned Treatment for Period 1         C/200
TRT1PN    Planned Treatment Number for Period 1  N/8          1: Placebo
                                                              2: Study drug
RANDDT    Date of Randomization                  N/8
TRTSTDT   Date of First Exposure to Treatment    N/8
TRTENDT   Date of Last Exposure to Treatment     N/8
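One way to confirm that the component-study ADSL datasets really share this structure is to compare their attributes through the DICTIONARY.COLUMNS table in PROC SQL. A sketch, assuming hypothetical librefs P123 and P456 already point to each study's dataanalysis folder and that each folder holds an ADSL dataset:

```sas
/* Assumes librefs P123 and P456 (hypothetical) point to each study's */
/* dataanalysis folder and that both folders contain an ADSL dataset. */
%let nstudies = 2;

proc sql;
   /* DICTIONARY values for LIBNAME and MEMNAME are stored in uppercase */
   create table adsl_attr as
      select libname, upcase(name) as varname, type, length
         from dictionary.columns
         where libname in ('P123', 'P456') and memname = 'ADSL';

   /* Variables missing from a study or differing in type or length */
   title 'ADSL variables with inconsistent structure across studies';
   select varname,
          count(distinct libname) as n_studies,
          count(distinct type)    as n_types,
          count(distinct length)  as n_lengths
      from adsl_attr
      group by varname
      having calculated n_studies < &nstudies
          or calculated n_types > 1
          or calculated n_lengths > 1;
quit;
title;
```

An empty result means every ADSL variable exists in each study with the same type and length, so the stacked dataset inherits no truncation or type conflicts.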


CONCLUSION
This paper provides some basic techniques and tips for ISS and ISE programming. These steps help enable efficient and
accurate programming of ISS and ISE analyses that span multiple studies. If the data for all component studies are
available in SDTM format, the programs for each component study analysis can be standardized further, where applicable,
to improve efficiency.

REFERENCES
CDISC Study Data Tabulation Model Implementation Guide: Human Clinical Trials, Version 3.1.1 (SDTM 3.1.1 IG).
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in
the USA and other countries. ® indicates USA registration.

ACKNOWLEDGEMENTS
The authors would like to thank the management team for their review of this paper.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Changhong Shi
Merck & Co., Inc.
RY34-A320
P.O. Box 2000
Rahway, NJ 07065
Changhong_shi@merck.com

Qing Xue
Merck & Co., Inc.
RY34-A320
P.O. Box 2000
Rahway, NJ 07065
qing_xue@merck.com
