1. Table of Contents

This project implements the survival analysis and descriptive modelling steps for a two-group right-censored data with time-independent variables using the Cox Proportional Hazards Model with various helpful packages in R. The Kaplan-Meier Survival Curves and Log-Rank Test were applied during the differential analysis of the survival data between groups. All predictors’ prognostic significance were individually and simultaneously evaluated using Univariate and Multivariate Cox Proportional Hazards Models, respectively. The discrimination power of the resulting models were assessed using the Harrel’s Concordance Index. The final prognostic model was internally validated using Bootstrap Validation with Optimism Estimation and evaluated for compliance on all required model assumptions using the appropriate diagnostics. All results were consolidated in a Summary presented at the end of the document.

Survival analysis corresponds to a set of statistical approaches for time-to-event data, where the outcome variable of interest is the time until an event occurs. Descriptive modelling involves the formulation of statistical models to identify patterns and relationships that can be used for statistical inferences given the predictors and response variables. The methods applied in this study (mostly contained in the survival package) attempt to explore the relationship between factors and both the survival time and event outcomes, while evaluating their effects through statistical hypotheses testing.

1.1 Sample Data

The Leukemia dataset from the book Survival Analysis : A Self-Learning Text was used for this illustrated example.

Preliminary dataset assessment:

[A] 42 Rows (observations composed of leukemia patients enrolled in the study)

[B] 4 Columns (variables)
     [B.1] 1/4 Factor variable = Group (factor)
          [B.1.1] Placebo = placebo drug treatment
          [B.1.2] Treatment = new drug treatment
     [B.2] 1/4 Censoring variable = Status (factor)
          [B.2.1] 1 = event, patient went out of remission before the end of the study
          [B.2.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study
     [B.3] 1/4 Response variables = Week (numeric)
          [B.3.1] Week = time in weeks a patient is in remission up to the point of the patient’s either going out of remission or being censored
     [B.4] 1/4 Covariate variable = WBC (numeric)
          [B.4.1] WBC = logarithm-transformed white blood cell measurement data gathered before the study which is also considered an important survival predictor in leukemia patients

Code Chunk | Output

##################################
# Loading R libraries
##################################
library(moments)
library(car)
library(multcomp)
library(effects)
library(psych)
library(ggplot2)
library(dplyr)
library(ggpubr)
library(rstatix)
library(ggfortify)
library(trend)
library(survival)
library(rms)
library(survminer)
library(Hmisc)
library(finalfit)
library(knitr)
library(gtsummary)

##################################
# Defining file paths
##################################
DATASETS_PATH <- file.path("datasets")

##################################
# Reading and creating the complete dataset
##################################
DBP.Complete <- read.csv(file.path("..", DATASETS_PATH, "Leukemia.csv"),
                na.strings=c("NA","NaN"," ",""),
                stringsAsFactors = T)


##################################
# Performing a general exploration of the dataset
##################################
dim(DBP.Complete)

## [1] 42  5

str(DBP.Complete)

## 'data.frame':    42 obs. of  5 variables:
##  $ ID    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Group : Factor w/ 2 levels "Placebo","Treatment": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Week  : int  6 6 6 7 10 13 16 22 23 6 ...
##  $ Status: int  1 1 1 1 1 1 1 1 1 0 ...
##  $ WBC   : num  2.31 4.06 3.28 4.43 2.96 2.88 3.6 2.32 2.57 3.2 ...

summary(DBP.Complete)

##        ID              Group         Week           Status      
##  Min.   : 1.00   Placebo  :21   Min.   : 1.00   Min.   :0.0000  
##  1st Qu.:11.25   Treatment:21   1st Qu.: 6.00   1st Qu.:0.0000  
##  Median :21.50                  Median :10.50   Median :1.0000  
##  Mean   :21.50                  Mean   :12.88   Mean   :0.7143  
##  3rd Qu.:31.75                  3rd Qu.:18.50   3rd Qu.:1.0000  
##  Max.   :42.00                  Max.   :35.00   Max.   :1.0000  
##       WBC       
##  Min.   :1.450  
##  1st Qu.:2.303  
##  Median :2.800  
##  Mean   :2.930  
##  3rd Qu.:3.490  
##  Max.   :5.000

##################################
# Setting the levels for the treatment categories
##################################
DBP.Complete$Group <- factor(DBP.Complete$Group,
                           levels = c("Placebo","Treatment"))

DBP.Complete$StatusLevel <- factor(DBP.Complete$Status,
                            levels = c(0,1))

##################################
# Creating an analysis dataset without the subject information column
##################################
DBP.Analysis <- DBP.Complete[,-1]

##################################
# Creating a response-covariate dataset without the treatment column
##################################
DBP.Response.Covariate <- DBP.Analysis[,-1]

##################################
# Formulating a data type assessment summary
##################################
PDA <- DBP.Response.Covariate
(PDA.Summary <- data.frame(
  Column.Index=c(1:length(names(PDA))),
  Column.Name= names(PDA), 
  Column.Type=sapply(PDA, function(x) class(x)), 
  row.names=NULL)
)

##   Column.Index Column.Name Column.Type
## 1            1        Week     integer
## 2            2      Status     integer
## 3            3         WBC     numeric
## 4            4 StatusLevel      factor

1.2 Data Quality Assessment

[A] No missing observations noted for any response and covariate variable.

[B] No low variance (Unique.Count.Ratio<0.01 or First.Second.Mode.Ratio>5) noted for any response variable.

[C] No high skewness (Skewness>3 or Skewness<(-3)) noted for any response variable.

Code Chunk | Output

##################################
# Reusing the response-covariate dataset
##################################
DQA <- DBP.Response.Covariate

##################################
# Formulating a data subset of response variables (numeric)
##################################
DQA.Variables.Response <- as.data.frame(DQA$Week)

##################################
# Formulating a data subset of covariate variables (numeric)
##################################
DQA.Variables.Covariate.Numeric <- as.data.frame(DQA$WBC)

##################################
# Formulating an overall data quality assessment summary
##################################
(DQA.Summary <- data.frame(
  Column.Index=c(1:length(names(DQA))),
  Column.Name= names(DQA), 
  Column.Type=sapply(DQA, function(x) class(x)), 
  Row.Count=sapply(DQA, function(x) nrow(DQA)),
  NA.Count=sapply(DQA,function(x)sum(is.na(x))),
  Fill.Rate=sapply(DQA,function(x)format(round((sum(!is.na(x))/nrow(DQA)),3),nsmall=3)),
  row.names=NULL)
)

##   Column.Index Column.Name Column.Type Row.Count NA.Count Fill.Rate
## 1            1        Week     integer        42        0     1.000
## 2            2      Status     integer        42        0     1.000
## 3            3         WBC     numeric        42        0     1.000
## 4            4 StatusLevel      factor        42        0     1.000

##################################
# Checking for missing observations
##################################
if ((nrow(DQA.Summary[DQA.Summary$NA.Count>0,]))>0){
  print(paste0("Missing observations noted for ",
               (nrow(DQA.Summary[DQA.Summary$NA.Count>0,])),
               " variable(s) with NA.Count>0 and Fill.Rate<1.0."))
  DQA.Summary[DQA.Summary$NA.Count>0,]
} else {
  print("No missing observations noted.")
}

## [1] "No missing observations noted."

##################################
# Formulating a data quality assessment summary for response variables (numeric)
##################################

if (length(names(DQA.Variables.Response))>0) {
  
  ##################################
  # Formulating a function to determine the first mode
  ##################################
  FirstModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    ux[tab == max(tab)]
  }

  ##################################
  # Formulating a function to determine the second mode
  ##################################
  SecondModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    fm = ux[tab == max(tab)]
    sm = na.omit(x)[!(na.omit(x) %in% fm)]
    usm <- unique(sm)
    tabsm <- tabulate(match(sm, usm))
    usm[tabsm == max(tabsm)]
  }
  
  (DQA.Variables.Response.Summary <- data.frame(
  Column.Name= names(DQA.Variables.Response), 
  Column.Type=sapply(DQA.Variables.Response, function(x) class(x)), 
  Unique.Count=sapply(DQA.Variables.Response, function(x) length(unique(x))),
  Unique.Count.Ratio=sapply(DQA.Variables.Response, function(x) format(round((length(unique(x))/nrow(DQA.Variables.Response)),3), nsmall=3)),
  First.Mode.Value=sapply(DQA.Variables.Response, function(x) format(round((FirstModes(x)[1]),3),nsmall=3)),
  Second.Mode.Value=sapply(DQA.Variables.Response, function(x) format(round((SecondModes(x)[1]),3),nsmall=3)),
  First.Mode.Count=sapply(DQA.Variables.Response, function(x) sum(na.omit(x) == FirstModes(x)[1])),
  Second.Mode.Count=sapply(DQA.Variables.Response, function(x) sum(na.omit(x) == SecondModes(x)[1])),
  First.Second.Mode.Ratio=sapply(DQA.Variables.Response, function(x) format(round((sum(na.omit(x) == FirstModes(x)[1])/sum(na.omit(x) == SecondModes(x)[1])),3), nsmall=3)),
  Minimum=sapply(DQA.Variables.Response, function(x) format(round(min(x,na.rm = TRUE),3), nsmall=3)),
  Mean=sapply(DQA.Variables.Response, function(x) format(round(mean(x,na.rm = TRUE),3), nsmall=3)),
  Median=sapply(DQA.Variables.Response, function(x) format(round(median(x,na.rm = TRUE),3), nsmall=3)),
  Maximum=sapply(DQA.Variables.Response, function(x) format(round(max(x,na.rm = TRUE),3), nsmall=3)),
  Skewness=sapply(DQA.Variables.Response, function(x) format(round(skewness(x,na.rm = TRUE),3), nsmall=3)),
  Kurtosis=sapply(DQA.Variables.Response, function(x) format(round(kurtosis(x,na.rm = TRUE),3), nsmall=3)),
  Percentile25th=sapply(DQA.Variables.Response, function(x) format(round(quantile(x,probs=0.25,na.rm = TRUE),3), nsmall=3)),
  Percentile75th=sapply(DQA.Variables.Response, function(x) format(round(quantile(x,probs=0.75,na.rm = TRUE),3), nsmall=3)),
  row.names=NULL)
  )  
  
}

##   Column.Name Column.Type Unique.Count Unique.Count.Ratio First.Mode.Value
## 1    DQA$Week     integer           24              0.571            6.000
##   Second.Mode.Value First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio
## 1            11.000                4                 3                   1.333
##   Minimum   Mean Median Maximum Skewness Kurtosis Percentile25th Percentile75th
## 1   1.000 12.881 10.500  35.000    0.876    2.870          6.000         18.500

##################################
# Identifying potential data quality issues for response variables (numeric) 
##################################

##################################
# Checking for zero or near-zero variance predictors
##################################
if (length(names(DQA.Variables.Response))==0) {
  print("No numeric predictors noted.")
} else if (nrow(DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$First.Second.Mode.Ratio))>5,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$First.Second.Mode.Ratio))>5,])),
               " numeric variable(s) with First.Second.Mode.Ratio>5."))
  DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$First.Second.Mode.Ratio))>5,]
} else {
  print("No low variance numeric predictors due to high first-second mode ratio noted.")
}

## [1] "No low variance numeric predictors due to high first-second mode ratio noted."

if (length(names(DQA.Variables.Response))==0) {
  print("No numeric predictors noted.")
} else if (nrow(DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$Unique.Count.Ratio))<0.01,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$Unique.Count.Ratio))<0.01,])),
               " numeric variable(s) with Unique.Count.Ratio<0.01."))
  DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$Unique.Count.Ratio))<0.01,]
} else {
  print("No low variance numeric predictors due to low unique count ratio noted.")
}

## [1] "No low variance numeric predictors due to low unique count ratio noted."

##################################
# Checking for skewed response variables
##################################
if (length(names(DQA.Variables.Response))==0) {
  print("No response variables noted.")
} else if (nrow(DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$Skewness))>3 |
                                               as.numeric(as.character(DQA.Variables.Response.Summary$Skewness))<(-3),])>0){
  print(paste0("High skewness observed for ",
  (nrow(DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$Skewness))>3 |
                                               as.numeric(as.character(DQA.Variables.Response.Summary$Skewness))<(-3),])),
  " response variable(s) with Skewness>3 or Skewness<(-3)."))
  DQA.Variables.Response.Summary[as.numeric(as.character(DQA.Variables.Response.Summary$Skewness))>3 |
                                 as.numeric(as.character(DQA.Variables.Response.Summary$Skewness))<(-3),]
} else {
  print("No skewed response variables noted.")
}

## [1] "No skewed response variables noted."

1.3 Research Question

Using the Leukemia dataset, survival analysis and modelling will be conducted to investigate the following :

[A] Research Question : What is the probability that a leukemia patient under the Treatment and Placebo groups will survive (remain in remission) after a specified number of weeks?
     [A.1] Factor variable = Group
          [A.1.1] Placebo = placebo drug treatment
          [A.1.2] New = new drug treatment
     [A.2] Response variable = Week (numeric)
     [A.3] Censoring variable = Status (factor)
          [A.3.1] 1 = event, patient went out of remission before the end of the study
          [A.3.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study

[B] Research Question : Are there differences in survival between leukemia patients under the Treatment and Placebo groups?
     [B.1] Factor variable = Group
          [B.1.1] Placebo = placebo drug treatment
          [B.1.2] Treatment = new drug treatment
     [B.2] Response variable = Week (numeric)
     [B.3] Censoring variable = Status (factor)
          [B.3.1] 1 = event, patient went out of remission before the end of the study
          [B.3.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study

[C] Research Question : What is the impact of the clinical characteristic WBC on the survival of leukemia patients under the Treatment and Placebo groups?
     [C.1] Factor variable = Group
          [C.1.1] Placebo = placebo drug treatment
          [C.1.2] Treatment = new drug treatment
     [C.2] Response variable = Week (numeric)
     [C.3] Censoring variable = Status (factor)
          [C.3.1] 1 = event, patient went out of remission before the end of the study
          [C.3.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study
     [C.4] Covariate variable = WBC (numeric) = logarithm-transformed white blood cell measurement data gathered before the study which is also considered an important survival predictor in leukemia patients

1.4 Data Exploration

Exploratory and comparative analyses of the response variable Week by factor variable Group showed that :

[A] Differences in the proportion of censored and uncensored data (p<0.001, Pearson’s Chi-squared Test) observed between Group=Treatment (censored=57%, uncensored=43%) and Group=Placebo (censored=0%, uncensored=100%).

[B] With both censored and uncensored data considered, higher median survival time Week (p=0.004, Wilcoxon Rank Sum Test) was observed for Group=Treatment (16) as compared to Group=Placebo (8). Lower median white blood cell measurement WBC (p=0.047, Wilcoxon Rank Sum Test) was also noted for Group=Treatment (2.57) as compared to Group=Placebo (3.06).

[C] With only the uncensored considered, higher mean survival time Week observed for Group=Treatment (12.11) as compared to Group=Placebo (8.67). Lower mean hazard rate was also noted for Group=Treatment (2.51) as compared to Group=Placebo (11.53).

Kaplan-Meier survival curve analysis of the response variable Week by factor variable Group (with censoring variable Status considered) showed that :

[A] Differences in the Kaplan-Meier survival curves (p<0.0001, Log Rank Test) were noted between Group=Treatment (median survival time=23) and Group=Placebo (median survival time=8).

[B] Higher survival time was consistently observed for leukemia patients under Group=Treatment as compared to those under Group=Placebo at various sampled durations including Week=5 (Treatment=100%, Placebo=57%), Week=10 (Treatment=75%, Placebo=38%), Week=15 (Treatment=69%, Placebo=14%), Week=20 (Treatment=63%, Placebo=10%) and Week=23 (Treatment=45%, Placebo=0%).

1.4.1 Descriptive Statistics

Code Chunk | Output

##################################
# Reusing the dataset
##################################
head(DBP.Analysis)

##       Group Week Status  WBC StatusLevel
## 1 Treatment    6      1 2.31           1
## 2 Treatment    6      1 4.06           1
## 3 Treatment    6      1 3.28           1
## 4 Treatment    7      1 4.43           1
## 5 Treatment   10      1 2.96           1
## 6 Treatment   13      1 2.88           1

##################################
# Formulating a preliminary summary
# of descriptive statistics 
##################################

DBP.Analysis %>%
  select(Week,StatusLevel,WBC,Group) %>%
  tbl_summary(
    by = Group,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{N_nonmiss}",
                                     "{mean} ({sd})", 
                                     "{median} ({p25}, {p75})", 
                                     "{min}, {max}"),
    missing = "no") %>% 
  add_overall() %>% 
  add_p()

Characteristic	Overall N = 42¹	Placebo N = 21¹	Treatment N = 21¹	p-value²
Week				0.004
N Non-missing	42	21	21
Mean (SD)	13 (9)	9 (6)	17 (10)
Median (Q1, Q3)	11 (6, 19)	8 (4, 12)	16 (9, 23)
Min, Max	1, 35	1, 23	6, 35
StatusLevel				<0.001
0	12 (29%)	0 (0%)	12 (57%)
1	30 (71%)	21 (100%)	9 (43%)
WBC				0.047
N Non-missing	42	21	21
Mean (SD)	2.93 (0.92)	3.22 (0.97)	2.64 (0.77)
Median (Q1, Q3)	2.80 (2.30, 3.49)	3.06 (2.42, 3.97)	2.57 (2.16, 2.96)
Min, Max	1.45, 5.00	1.50, 5.00	1.45, 4.43
¹ n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

##################################
# Computing for the mean uncensored survival time
# and mean hazard rate between treatment and placebo groups
##################################
DBP.Analysis.DescriptiveStatistics <- DBP.Analysis %>%
  group_by(Group) %>%
  summarise(
    Mean.Uncensored.Survival.Time = mean(Week[Status==1]),
    Total.Uncensored.Survival.Time = sum(Status==1),
    Total.Survival.Time = sum(Week),
    Mean.Hazard.Rate = (Total.Uncensored.Survival.Time/Total.Survival.Time)*100)

kable(DBP.Analysis.DescriptiveStatistics)

Group	Mean.Uncensored.Survival.Time	Total.Uncensored.Survival.Time	Total.Survival.Time	Mean.Hazard.Rate
Placebo	8.666667	21	182	11.538462
Treatment	12.111111	9	359	2.506964

1.4.2 Kaplan-Meier Survival Curve

Kaplan-Meier Survival Curve estimates a survival function which makes no assumptions about how the probability that an observation develops the event changes over time. It is a step function plot obtained from the values of a follow-up table which summarizes the experiences of subjects over a pre-defined follow-up period in a study until the time of the event of interest or the end of the study, whichever comes first. The method involves generating a sequential list of time values corresponding to an event or censoring; determining the counts of event-free observations, event and censored cases for each time value; and computing the survival probabilities based on these counts.

Code Chunk | Output

##################################
# Reusing the dataset
##################################
head(DBP.Analysis)

##       Group Week Status  WBC StatusLevel
## 1 Treatment    6      1 2.31           1
## 2 Treatment    6      1 4.06           1
## 3 Treatment    6      1 3.28           1
## 4 Treatment    7      1 4.43           1
## 5 Treatment   10      1 2.96           1
## 6 Treatment   13      1 2.88           1

##################################
# Computing the Kaplan-Meier survival estimates
# using the Baseline data
##################################
DBP.Analysis.Baseline.KaplanMeierEstimates <- survfit(Surv(Week, Status) ~ 1, data = DBP.Analysis)
print(DBP.Analysis.Baseline.KaplanMeierEstimates)

## Call: survfit(formula = Surv(Week, Status) ~ 1, data = DBP.Analysis)
## 
##       n events median 0.95LCL 0.95UCL
## [1,] 42     30     12       8      22

tbl_survfit(DBP.Analysis.Baseline.KaplanMeierEstimates,
            times = c(5,10,15,20,23),
            label_header = "**Week {time}**"
)

Characteristic	Week 5	Week 10	Week 15	Week 20	Week 23
Overall	79% (67%, 92%)	57% (43%, 74%)	40% (27%, 59%)	34% (22%, 53%)	19% (9.1%, 39%)

##################################
# Formulating a summary of the Kaplan-Meier survival estimates
# using the Baseline data
##################################
summary(DBP.Analysis.Baseline.KaplanMeierEstimates)

## Call: survfit(formula = Surv(Week, Status) ~ 1, data = DBP.Analysis)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1     42       2    0.952  0.0329       0.8901        1.000
##     2     40       2    0.905  0.0453       0.8202        0.998
##     3     38       1    0.881  0.0500       0.7883        0.985
##     4     37       2    0.833  0.0575       0.7279        0.954
##     5     35       2    0.786  0.0633       0.6709        0.920
##     6     33       3    0.714  0.0697       0.5899        0.865
##     7     29       1    0.690  0.0715       0.5628        0.845
##     8     28       4    0.591  0.0764       0.4588        0.762
##    10     23       1    0.565  0.0773       0.4325        0.739
##    11     21       2    0.512  0.0788       0.3783        0.692
##    12     18       2    0.455  0.0796       0.3227        0.641
##    13     16       1    0.426  0.0795       0.2958        0.615
##    15     15       1    0.398  0.0791       0.2694        0.588
##    16     14       1    0.369  0.0784       0.2437        0.560
##    17     13       1    0.341  0.0774       0.2186        0.532
##    22      9       2    0.265  0.0765       0.1507        0.467
##    23      7       2    0.189  0.0710       0.0909        0.395

(DBP.Analysis.Baseline.KaplanMeierEstimates.Summary <- summary(DBP.Analysis.Baseline.KaplanMeierEstimates)$table)

##   records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
## 42.000000 42.000000 42.000000 30.000000 15.339334  1.860218 12.000000  8.000000 
##   0.95UCL 
## 22.000000

##################################
# Listing the details of the Kaplan-Meier survival estimates
# using the Baseline data
##################################
(DBP.Analysis.Baseline.KaplanMeierEstimates.Details <- surv_summary(DBP.Analysis.Baseline.KaplanMeierEstimates))

##    time n.risk n.event n.censor      surv    std.err     upper      lower
## 1     1     42       2        0 0.9523810 0.03450328 1.0000000 0.89010544
## 2     2     40       2        0 0.9047619 0.05006262 0.9980394 0.82020220
## 3     3     38       1        0 0.8809524 0.05672304 0.9845441 0.78826037
## 4     4     37       2        0 0.8333333 0.06900656 0.9540195 0.72791433
## 5     5     35       2        0 0.7857143 0.08058230 0.9201453 0.67092330
## 6     6     33       3        1 0.7142857 0.09759001 0.8648499 0.58993369
## 7     7     29       1        0 0.6896552 0.10370794 0.8451005 0.56280201
## 8     8     28       4        0 0.5911330 0.12925834 0.7615705 0.45883899
## 9     9     24       0        1 0.5911330 0.12925834 0.7615705 0.45883899
## 10   10     23       1        1 0.5654316 0.13668944 0.7391461 0.43254350
## 11   11     21       2        1 0.5115809 0.15393678 0.6917443 0.37834076
## 12   12     18       2        0 0.4547386 0.17504565 0.6408567 0.32267306
## 13   13     16       1        0 0.4263175 0.18656807 0.6145258 0.29575091
## 14   15     15       1        0 0.3978963 0.19892096 0.5876134 0.26943131
## 15   16     14       1        0 0.3694751 0.21228296 0.5601196 0.24371913
## 16   17     13       1        1 0.3410540 0.22687951 0.5320388 0.21862656
## 17   19     11       0        1 0.3410540 0.22687951 0.5320388 0.21862656
## 18   20     10       0        1 0.3410540 0.22687951 0.5320388 0.21862656
## 19   22      9       2        0 0.2652642 0.28847936 0.4669095 0.15070392
## 20   23      7       2        0 0.1894744 0.37465077 0.3948698 0.09091746
## 21   25      5       0        1 0.1894744 0.37465077 0.3948698 0.09091746
## 22   32      4       0        2 0.1894744 0.37465077 0.3948698 0.09091746
## 23   34      2       0        1 0.1894744 0.37465077 0.3948698 0.09091746
## 24   35      1       0        1 0.1894744 0.37465077 0.3948698 0.09091746

##################################
# Plotting the Kaplan-Meier survival curves
# using the Baseline data
##################################
SurvivalTimeInWeek.KaplanMeier.Baseline <- ggsurvplot(DBP.Analysis.Baseline.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curve (Baseline)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     palette = c("#000000"))

ggpar(SurvivalTimeInWeek.KaplanMeier.Baseline,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Computing the Kaplan-Meier survival estimates
# using the Factor Variable
##################################
DBP.Analysis.KaplanMeierEstimates <- survfit(Surv(Week, Status) ~ Group, data = DBP.Analysis)
print(DBP.Analysis.KaplanMeierEstimates)

## Call: survfit(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                  n events median 0.95LCL 0.95UCL
## Group=Placebo   21     21      8       4      12
## Group=Treatment 21      9     23      16      NA

tbl_survfit(DBP.Analysis.KaplanMeierEstimates,
            times = c(5,10,15,20,23),
            label_header = "**Week {time}**"
)

Characteristic	Week 5	Week 10	Week 15	Week 20	Week 23
Group
Placebo	57% (39%, 83%)	38% (22%, 66%)	14% (5.0%, 41%)	9.5% (2.5%, 36%)	0% (—, —)
Treatment	100% (100%, 100%)	75% (59%, 97%)	69% (51%, 93%)	63% (44%, 90%)	45% (25%, 81%)

##################################
# Formulating a summary of the Kaplan-Meier survival estimates
# using the Factor Variable
##################################
summary(DBP.Analysis.KaplanMeierEstimates)

## Call: survfit(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                 Group=Placebo 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1     21       2   0.9048  0.0641      0.78754        1.000
##     2     19       2   0.8095  0.0857      0.65785        0.996
##     3     17       1   0.7619  0.0929      0.59988        0.968
##     4     16       2   0.6667  0.1029      0.49268        0.902
##     5     14       2   0.5714  0.1080      0.39455        0.828
##     8     12       4   0.3810  0.1060      0.22085        0.657
##    11      8       2   0.2857  0.0986      0.14529        0.562
##    12      6       2   0.1905  0.0857      0.07887        0.460
##    15      4       1   0.1429  0.0764      0.05011        0.407
##    17      3       1   0.0952  0.0641      0.02549        0.356
##    22      2       1   0.0476  0.0465      0.00703        0.322
##    23      1       1   0.0000     NaN           NA           NA
## 
##                 Group=Treatment 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     6     21       3    0.857  0.0764        0.720        1.000
##     7     17       1    0.807  0.0869        0.653        0.996
##    10     15       1    0.753  0.0963        0.586        0.968
##    13     12       1    0.690  0.1068        0.510        0.935
##    16     11       1    0.627  0.1141        0.439        0.896
##    22      7       1    0.538  0.1282        0.337        0.858
##    23      6       1    0.448  0.1346        0.249        0.807

(DBP.Analysis.KaplanMeierEstimates.Summary <- summary(DBP.Analysis.KaplanMeierEstimates)$table)

##                 records n.max n.start events     rmean se(rmean) median 0.95LCL
## Group=Placebo        21    21      21     21  8.666667  1.377390      8       4
## Group=Treatment      21    21      21      9 23.287395  2.827468     23      16
##                 0.95UCL
## Group=Placebo        12
## Group=Treatment      NA

##################################
# Listing the details of the Kaplan-Meier survival estimates
# using the Factor Variable
##################################
(DBP.Analysis.KaplanMeierEstimates.Details <- surv_summary(DBP.Analysis.KaplanMeierEstimates))

##    time n.risk n.event n.censor       surv    std.err     upper      lower
## 1     1     21       2        0 0.90476190 0.07079923 1.0000000 0.78753505
## 2     2     19       2        0 0.80952381 0.10585122 0.9961629 0.65785306
## 3     3     17       1        0 0.76190476 0.12198751 0.9676909 0.59988048
## 4     4     16       2        0 0.66666667 0.15430335 0.9020944 0.49268063
## 5     5     14       2        0 0.57142857 0.18898224 0.8276066 0.39454812
## 6     8     12       4        0 0.38095238 0.27817432 0.6571327 0.22084536
## 7    11      8       2        0 0.28571429 0.34503278 0.5618552 0.14529127
## 8    12      6       2        0 0.19047619 0.44986771 0.4600116 0.07887014
## 9    15      4       1        0 0.14285714 0.53452248 0.4072755 0.05010898
## 10   17      3       1        0 0.09523810 0.67259271 0.3558956 0.02548583
## 11   22      2       1        0 0.04761905 0.97590007 0.3224544 0.00703223
## 12   23      1       1        0 0.00000000        Inf        NA         NA
## 13    6     21       3        1 0.85714286 0.08908708 1.0000000 0.71981708
## 14    7     17       1        0 0.80672269 0.10776353 0.9964437 0.65312422
## 15    9     16       0        1 0.80672269 0.10776353 0.9964437 0.65312422
## 16   10     15       1        1 0.75294118 0.12796438 0.9675748 0.58591898
## 17   11     13       0        1 0.75294118 0.12796438 0.9675748 0.58591898
## 18   13     12       1        0 0.69019608 0.15475995 0.9347692 0.50961310
## 19   16     11       1        0 0.62745098 0.18177335 0.8959949 0.43939392
## 20   17     10       0        1 0.62745098 0.18177335 0.8959949 0.43939392
## 21   19      9       0        1 0.62745098 0.18177335 0.8959949 0.43939392
## 22   20      8       0        1 0.62745098 0.18177335 0.8959949 0.43939392
## 23   22      7       1        0 0.53781513 0.23843463 0.8582008 0.33703662
## 24   23      6       1        0 0.44817927 0.30030719 0.8073720 0.24878823
## 25   25      5       0        1 0.44817927 0.30030719 0.8073720 0.24878823
## 26   32      4       0        2 0.44817927 0.30030719 0.8073720 0.24878823
## 27   34      2       0        1 0.44817927 0.30030719 0.8073720 0.24878823
## 28   35      1       0        1 0.44817927 0.30030719 0.8073720 0.24878823
##             strata     Group
## 1    Group=Placebo   Placebo
## 2    Group=Placebo   Placebo
## 3    Group=Placebo   Placebo
## 4    Group=Placebo   Placebo
## 5    Group=Placebo   Placebo
## 6    Group=Placebo   Placebo
## 7    Group=Placebo   Placebo
## 8    Group=Placebo   Placebo
## 9    Group=Placebo   Placebo
## 10   Group=Placebo   Placebo
## 11   Group=Placebo   Placebo
## 12   Group=Placebo   Placebo
## 13 Group=Treatment Treatment
## 14 Group=Treatment Treatment
## 15 Group=Treatment Treatment
## 16 Group=Treatment Treatment
## 17 Group=Treatment Treatment
## 18 Group=Treatment Treatment
## 19 Group=Treatment Treatment
## 20 Group=Treatment Treatment
## 21 Group=Treatment Treatment
## 22 Group=Treatment Treatment
## 23 Group=Treatment Treatment
## 24 Group=Treatment Treatment
## 25 Group=Treatment Treatment
## 26 Group=Treatment Treatment
## 27 Group=Treatment Treatment
## 28 Group=Treatment Treatment

##################################
# Plotting the Kaplan-Meier survival curves
# using the Factor Variable
##################################
SurvivalTimeInWeek.KaplanMeier.ByGroup <- ggsurvplot(DBP.Analysis.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curve (Group)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     legend.labs = c("Placebo", "Treatment"),
                                                     palette = c("#3259A0","#FF5050"))

ggpar(SurvivalTimeInWeek.KaplanMeier.ByGroup,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

1.4.3 Log Rank Test

Log Rank Test evaluates the null hypothesis of no difference in survival between two or more independent groups by comparing the entire survival experience. The method involves generating a sequential list of time values corresponding to an event or censoring from two or more independent groups; determining the counts of event-free observations, event and expected events for each time value; calculating the Chi-Square test statistic based on these counts; and evaluating the determined test statistic against the rejection region.

Code Chunk | Output

##################################
# Conducting the Log Rank Test
# between the Kaplan-Meier survival estimates
# using the Factor Variable
##################################
(DBP.Analysis.LogRankTest <- survdiff(Surv(Week, Status) ~ Group, data = DBP.Analysis))

## Call:
## survdiff(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=Placebo   21       21     10.7      9.77      16.8
## Group=Treatment 21        9     19.3      5.46      16.8
## 
##  Chisq= 16.8  on 1 degrees of freedom, p= 4e-05

##################################
# Plotting the Kaplan-Meier survival curves
# using the Factor Variable
##################################
SurvivalTimeInWeek.KaplanMeier.ByGroup <- ggsurvplot(DBP.Analysis.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curve (Group)",
                                                     pval = TRUE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     legend.labs = c("Placebo", "Treatment"),
                                                     palette = c("#3259A0","#FF5050"))

ggpar(SurvivalTimeInWeek.KaplanMeier.ByGroup,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

1.5 Model Development Summary

Cox Proportional Hazards Regression examines the relationship between a hazard function and a set of covariates by assuming that predictors act multiplicatively on the hazard function without any explicit assumption of its distributional form. The model is semi-parametric because while the baseline hazard can take any form, the covariates enter the model linearly.

Concordance Index is a rank correlation measure between a predictor and a possibly censored outcome with an event or censoring indicator. In survival analysis, a pair of patients is called concordant if the risk of the event predicted by a model is lower for the patient who experiences the event at a later time point. The concordance probability (C-index) is the frequency of concordant pairs among all pairs of subjects which can be used to measure and compare the discrimination power of risk prediction models.

Residual Analysis for Model Assumption Diagnostics assesses whether a fitted Cox Proportional Hazards Regression model adequately describes the data using various residuals to test the proportional hazards assumption, examine influential observations (or outliers) and detect nonlinearity in the relationship between the log hazard and the covariates. Schoenfeld residuals are calculated by regressing the scaled and weighted Martingale residuals on the model covariates. These help evaluate the proportional hazards assumption by examining the relationship between the residuals and time for each covariate. A significant correlation between Schoenfeld residuals and time suggests a violation of the model assumptions. Deviance residuals are derived from the log partial likelihood estimation during regression. These measure the difference between the observed event status and the predicted event probability. Deviance residuals are more sensitive to the shape of the hazard function and can help identify departures from the model assumptions.

Cox proportional hazards regression analysis involving the response variable Week, factor variable Group and covariate variable WBC showed that :

[A] A cox proportional hazards regression model applied on the response variable Week and factor variable Group which includes the covariate variable WBC was most appropriate as the final survival model since its addition significantly improved model fit (p<0.001, Likelihood Ratio Test). However, the further inclusion of an interaction term Group * WBC into the model did not show any significant improvement to model fit (p=0.549, Likelihood Ratio Test).

[B] Using the final survival model controlled for WBC, association between Group and survival time Week was statistically significant with an estimated hazard ratio of 0.25 (95% CI 0.11, 0.57; p=0.001, Cox Proportional Hazards Model Wald Test) for Group=Treatment (lower hazard and higher survival rate for leukemia patients using the new treatment) relative to Group=Placebo (higher hazard and lower survival rate for leukemia patients using the placebo treatment).

[C] Using the final survival model controlled for Group, association between WBC and survival time Week was statistically significant with an estimated hazard ratio of 5.42 (95% CI 2.81, 10.50; p<0.001, Cox Proportional Hazards Model Wald Test) indicating higher hazard and lower survival rate for leukemia patients with higher white blood count measurement.

[D] The estimated Harrel’s concordance index (proportion of leukemia patients that the model can correctly order in terms of survival times when adjusted for censoring) for the final survival model was determined to be relatively high at 0.85 (optimistic) and 0.84 (optimism-corrected after a 500-cycle internal bootstrap validation).

[E] The scaled Schoenfield residuals showed random patterns when plotted against time indicating no violation of the proportional hazards assumption. The proportional hazards assumption was further supported by the non-significant relationship between the scaled Schoenfield residuals and time based from the global (p=0.9801) and individual (p=0.9927 for Group, 0.8414 for WBC ) statistical test results.

[F] The deviance residuals did not show any severely influential observations when plotted against the observations, the dfbeta residuals were reasonably symmetrical when plotted against the observations, and the fitted line with lowess function of the martingale residuals obtained for the null cox model was reasonably linear for the numeric covariate WBC - all indicating a fairly good model fit for the final survival model.

1.5.1 Univariate Cox Proportional Hazards Model

Code Chunk | Output

##################################
# Formulating a Univariate Cox Proportional Hazards Model
# independently for the Factor Variable
##################################
(DBP.Analysis.UnivariateCoxPH.Group <- coxph(Surv(Week, Status) ~ Group, data = DBP.Analysis))

## Call:
## coxph(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                   coef exp(coef) se(coef)      z        p
## GroupTreatment -1.5721    0.2076   0.4124 -3.812 0.000138
## 
## Likelihood ratio test=16.35  on 1 df, p=5.261e-05
## n= 42, number of events= 30

summary(DBP.Analysis.UnivariateCoxPH.Group)

## Call:
## coxph(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.041 )
## Likelihood ratio test= 16.35  on 1 df,   p=5e-05
## Wald test            = 14.53  on 1 df,   p=1e-04
## Score (logrank) test = 17.25  on 1 df,   p=3e-05

tbl_regression(DBP.Analysis.UnivariateCoxPH.Group, 
               exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
Group
Placebo	—	—
Treatment	0.21	0.09, 0.47	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

(ggforest(DBP.Analysis.UnivariateCoxPH.Group, 
          data=DBP.Analysis,
          main = "",
          fontsize = 0.9))

##################################
# Plotting the Kaplan-Meier survival curves
# for the formulated Univariate Cox Proportional Hazards Model
# involving the Factor Variable
##################################
DBP.Analysis.UnivariateCoxPH.Group.KaplanMeierEstimates <- survfit(DBP.Analysis.UnivariateCoxPH.Group, data = DBP.Analysis)

SurvivalTimeInWeek.KaplanMeier.UnivariateCoxPH.Group <- ggsurvplot(DBP.Analysis.UnivariateCoxPH.Group.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curves (CoxPH Model : Survival ~ Group)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     palette = c("#000000"))

ggpar(SurvivalTimeInWeek.KaplanMeier.UnivariateCoxPH.Group,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Formulating a Univariate Cox Proportional Hazards Model
# independently for the Covariate Variable
##################################
(DBP.Analysis.UnivariateCoxPH.WBC <- coxph(Surv(Week, Status) ~ WBC, data = DBP.Analysis))

## Call:
## coxph(formula = Surv(Week, Status) ~ WBC, data = DBP.Analysis)
## 
##      coef exp(coef) se(coef)     z        p
## WBC 1.646     5.189    0.298 5.525 3.29e-08
## 
## Likelihood ratio test=34.84  on 1 df, p=3.577e-09
## n= 42, number of events= 30

summary(DBP.Analysis.UnivariateCoxPH.WBC)

## Call:
## coxph(formula = Surv(Week, Status) ~ WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##      coef exp(coef) se(coef)     z Pr(>|z|)    
## WBC 1.646     5.189    0.298 5.525 3.29e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## WBC     5.188     0.1927     2.893     9.304
## 
## Concordance= 0.809  (se = 0.045 )
## Likelihood ratio test= 34.84  on 1 df,   p=4e-09
## Wald test            = 30.53  on 1 df,   p=3e-08
## Score (logrank) test = 36.62  on 1 df,   p=1e-09

tbl_regression(DBP.Analysis.UnivariateCoxPH.WBC, 
               exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
WBC	5.19	2.89, 9.30	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

(ggforest(DBP.Analysis.UnivariateCoxPH.WBC, 
          data=DBP.Analysis,
          main = "",
          fontsize = 0.9))

##################################
# Plotting the Kaplan-Meier survival curves
# for the formulated Univariate Cox Proportional Hazards Model
# involving the Covariate Variable
##################################
DBP.Analysis.UnivariateCoxPH.WBC.KaplanMeierEstimates <- survfit(DBP.Analysis.UnivariateCoxPH.WBC, data = DBP.Analysis)

SurvivalTimeInWeek.KaplanMeier.UnivariateCoxPH.WBC <- ggsurvplot(DBP.Analysis.UnivariateCoxPH.WBC.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curves (CoxPH Model : Survival ~ WBC)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     palette = c("#000000"))

ggpar(SurvivalTimeInWeek.KaplanMeier.UnivariateCoxPH.WBC,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Formulating a Univariate Cox Proportional Hazards Model
# individually for the Factor and Covariate Variables
##################################
tbl_survfit(
  list(survfit(Surv(Week, Status) ~ 1, data = DBP.Analysis),
       survfit(Surv(Week, Status) ~ Group, data = DBP.Analysis),
       survfit(Surv(Week, Status) ~ WBC, data = DBP.Analysis)),
  probs = 0.5,
  label_header = "**Median Survival**") %>%
  add_n() %>%
  add_nevent()

Characteristic	N	Event N	Median Survival
Overall	42	30	12 (8.0, 22)
Group	42	30
Placebo			8.0 (4.0, 12)
Treatment			23 (16, —)
WBC	42	30
1.45			— (—, —)
1.47			— (—, —)
1.5			12 (—, —)
1.78			— (—, —)
1.97			23 (—, —)
2.01			— (—, —)
2.05			— (—, —)
2.12			11 (—, —)
2.16			— (—, —)
2.2			— (—, —)
2.3			15 (—, —)
2.31			6.0 (—, —)
2.32			15 (8.0, —)
2.42			4.0 (—, —)
2.53			— (—, —)
2.57			23 (—, —)
2.6			— (—, —)
2.7			— (—, —)
2.73			22 (—, —)
2.8			5.0 (1.0, —)
2.88			13 (—, —)
2.95			17 (—, —)
2.96			10 (—, —)
3.05			8.0 (—, —)
3.06			12 (—, —)
3.2			— (—, —)
3.26			8.0 (—, —)
3.28			6.0 (—, —)
3.49			8.0 (5.0, —)
3.52			8.0 (—, —)
3.6			16 (—, —)
3.97			5.0 (—, —)
4.01			3.0 (—, —)
4.06			6.0 (—, —)
4.36			4.0 (—, —)
4.43			7.0 (—, —)
4.48			2.0 (—, —)
4.91			2.0 (—, —)
5			1.0 (—, —)

1.5.2 Multivariate Cox Proportional Hazards Model

Code Chunk | Output

##################################
# Formulating a Multivariate Cox Proportional Hazards Model
# with both the Factor and Covariate Variables
##################################
(DBP.Analysis.MultivariateCoxPH.GroupWBC <- coxph(Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis))

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##                   coef exp(coef) se(coef)      z       p
## GroupTreatment -1.3861    0.2501   0.4248 -3.263  0.0011
## WBC             1.6909    5.4243   0.3359  5.034 4.8e-07
## 
## Likelihood ratio test=46.71  on 2 df, p=7.187e-11
## n= 42, number of events= 30

summary(DBP.Analysis.MultivariateCoxPH.GroupWBC )

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.3861    0.2501   0.4248 -3.263   0.0011 ** 
## WBC             1.6909    5.4243   0.3359  5.034  4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2501     3.9991    0.1088    0.5749
## WBC               5.4243     0.1844    2.8082   10.4776
## 
## Concordance= 0.852  (se = 0.04 )
## Likelihood ratio test= 46.71  on 2 df,   p=7e-11
## Wald test            = 33.6  on 2 df,   p=5e-08
## Score (logrank) test = 46.07  on 2 df,   p=1e-10

tbl_regression(DBP.Analysis.MultivariateCoxPH.GroupWBC, 
               exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
Group
Placebo	—	—
Treatment	0.25	0.11, 0.57	0.001
WBC	5.42	2.81, 10.5	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

(DBP.Analysis.MultivariateCoxPH.GroupWBC.ForestPlot <- ggforest(DBP.Analysis.MultivariateCoxPH.GroupWBC , 
          data=DBP.Analysis,
          main = "",
          fontsize = 0.9))

##################################
# Plotting the Kaplan-Meier survival curves
# for the formulated Multivariate Cox Proportional Hazards Model
# involving the Factor and Covariate variables
##################################
DBP.Analysis.MultivariateCoxPH.GroupWBC.KaplanMeierEstimates <- survfit(DBP.Analysis.MultivariateCoxPH.GroupWBC , data = DBP.Analysis)

SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC <- ggsurvplot(DBP.Analysis.MultivariateCoxPH.GroupWBC.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curves (CoxPH Model : Survival ~ Group + WBC)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     palette = c("#000000"))

ggpar(SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Formulating a Multivariate Cox Proportional Hazards Model
# with both the Factor and Covariate Variables
# including their interaction
##################################
(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction <- coxph(Surv(Week, Status) ~ Group + WBC + Group*WBC, data = DBP.Analysis))

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC + Group * WBC, 
##     data = DBP.Analysis)
## 
##                        coef exp(coef) se(coef)      z        p
## GroupTreatment     -2.37491   0.09302  1.70547 -1.393    0.164
## WBC                 1.55489   4.73459  0.39866  3.900 9.61e-05
## GroupTreatment:WBC  0.31752   1.37372  0.52579  0.604    0.546
## 
## Likelihood ratio test=47.07  on 3 df, p=3.356e-10
## n= 42, number of events= 30

summary(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction)

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC + Group * WBC, 
##     data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment     -2.37491   0.09302  1.70547 -1.393    0.164    
## WBC                 1.55489   4.73459  0.39866  3.900 9.61e-05 ***
## GroupTreatment:WBC  0.31752   1.37372  0.52579  0.604    0.546    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                    exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment       0.09302    10.7500  0.003288     2.632
## WBC                  4.73459     0.2112  2.167413    10.342
## GroupTreatment:WBC   1.37372     0.7280  0.490169     3.850
## 
## Concordance= 0.851  (se = 0.041 )
## Likelihood ratio test= 47.07  on 3 df,   p=3e-10
## Wald test            = 32.39  on 3 df,   p=4e-07
## Score (logrank) test = 49.86  on 3 df,   p=9e-11

tbl_regression(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction, 
               exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
Group
Placebo	—	—
Treatment	0.09	0.00, 2.63	0.2
WBC	4.73	2.17, 10.3	<0.001
Group * WBC
Treatment * WBC	1.37	0.49, 3.85	0.5
¹ HR = Hazard Ratio, CI = Confidence Interval

(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction.ForestPlot <- ggforest(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction, 
          data=DBP.Analysis,
          main = "",
          fontsize = 0.9))

##################################
# Plotting the Kaplan-Meier survival curves
# for the formulated Multivariate Cox Proportional Hazards Model
# involving the Factor and Covariate variables
# including their interaction
##################################
DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction.KaplanMeierEstimates <- survfit(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction, data = DBP.Analysis)

SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC.WithInteraction <- ggsurvplot(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curves (CoxPH Model : Survival ~ Group * WBC)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     palette = c("#000000"))

ggpar(SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC.WithInteraction,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Formulating complete and reduced
# Multivariate Cox Proportional Hazards Models
# involving the Factor and Covariate Variables
##################################
DBP.Analysis.SurvivalResponse <- "Surv(Week, Status)"
DBP.Analysis.SurvivalPredictorComplete <- c("Group", "WBC")
DBP.Analysis.SurvivalPredictorReduced <- c("Group", "WBC")

DBP.Analysis.CoxPHModelSummary <- DBP.Analysis %>%
  finalfit(DBP.Analysis.SurvivalResponse, 
           DBP.Analysis.SurvivalPredictorComplete, 
           DBP.Analysis.SurvivalPredictorReduced, 
           keep_models = TRUE,
           add_dependent_label = FALSE) %>%
  rename("Overall Survival" = label) %>%
  rename(" " = levels) %>%
  rename("  " = all)

kable(DBP.Analysis.CoxPHModelSummary)

Overall Survival			HR (univariable)	HR (multivariable full)	HR (multivariable)
Group	Placebo	21 (50.0)	-	-	-
	Treatment	21 (50.0)	0.21 (0.09-0.47, p<0.001)	0.25 (0.11-0.57, p=0.001)	0.25 (0.11-0.57, p=0.001)
WBC	Mean (SD)	2.9 (0.9)	5.19 (2.89-9.30, p<0.001)	5.42 (2.81-10.48, p<0.001)	5.42 (2.81-10.48, p<0.001)

1.5.3 Model Selection

Code Chunk | Output

##################################
# Reusing formulated models
##################################
summary(DBP.Analysis.UnivariateCoxPH.Group)

## Call:
## coxph(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.041 )
## Likelihood ratio test= 16.35  on 1 df,   p=5e-05
## Wald test            = 14.53  on 1 df,   p=1e-04
## Score (logrank) test = 17.25  on 1 df,   p=3e-05

summary(DBP.Analysis.MultivariateCoxPH.GroupWBC)

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.3861    0.2501   0.4248 -3.263   0.0011 ** 
## WBC             1.6909    5.4243   0.3359  5.034  4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2501     3.9991    0.1088    0.5749
## WBC               5.4243     0.1844    2.8082   10.4776
## 
## Concordance= 0.852  (se = 0.04 )
## Likelihood ratio test= 46.71  on 2 df,   p=7e-11
## Wald test            = 33.6  on 2 df,   p=5e-08
## Score (logrank) test = 46.07  on 2 df,   p=1e-10

summary(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction)

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC + Group * WBC, 
##     data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment     -2.37491   0.09302  1.70547 -1.393    0.164    
## WBC                 1.55489   4.73459  0.39866  3.900 9.61e-05 ***
## GroupTreatment:WBC  0.31752   1.37372  0.52579  0.604    0.546    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                    exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment       0.09302    10.7500  0.003288     2.632
## WBC                  4.73459     0.2112  2.167413    10.342
## GroupTreatment:WBC   1.37372     0.7280  0.490169     3.850
## 
## Concordance= 0.851  (se = 0.041 )
## Likelihood ratio test= 47.07  on 3 df,   p=3e-10
## Wald test            = 32.39  on 3 df,   p=4e-07
## Score (logrank) test = 49.86  on 3 df,   p=9e-11

##################################
# Formulating the linear predictors
# for the formulated Univariate Cox Proportional Hazards Models
##################################
DBP.Analysis$LP.Univariate.Group <- predict(DBP.Analysis.UnivariateCoxPH.Group, type = "lp")

##################################
# Formulating the linear predictors
# for the formulated Multivariate Cox Proportional Hazards Models
##################################
DBP.Analysis$LP.Multivariate.GroupWBC <- predict(DBP.Analysis.MultivariateCoxPH.GroupWBC, type = "lp")
DBP.Analysis$LP.Multivariate.GroupWBC.WithInteraction <- predict(DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction, type = "lp")

##################################
# Conducting a likelihood ratio test between nested models
# including the formulated Multivariate Cox Proportional Hazards Models
# involving the Factor variable only
# and both Factor and Covariate variables
##################################
(LikelihoodRatioTest.CoxPHGroup.CoxPHGroupWBC <- anova(DBP.Analysis.UnivariateCoxPH.Group,
      DBP.Analysis.MultivariateCoxPH.GroupWBC,
      test="LRT"))

## Analysis of Deviance Table
##  Cox model: response is  Surv(Week, Status)
##  Model 1: ~ Group
##  Model 2: ~ Group + WBC
##    loglik  Chisq Df Pr(>|Chi|)    
## 1 -85.008                         
## 2 -69.828 30.361  1  3.587e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##################################
# Conducting a likelihood ratio test between nested models
# including the formulated Multivariate Cox Proportional Hazards Models
# involving both Factor and Covariate variables only
# and both Factor and Covariate variables including their interaction
##################################
(LikelihoodRatioTest.CoxPHGroupWBC.CoxPHGroupWBCWithInteraction <- anova(DBP.Analysis.MultivariateCoxPH.GroupWBC,
      DBP.Analysis.MultivariateCoxPH.GroupWBC.WithInteraction,
      test="LRT"))

## Analysis of Deviance Table
##  Cox model: response is  Surv(Week, Status)
##  Model 1: ~ Group + WBC
##  Model 2: ~ Group + WBC + Group * WBC
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -69.828                     
## 2 -69.648 0.3594  1     0.5488

1.5.4 Model Performance Validation

Code Chunk | Output

##################################
# Measuring the Harrel's Concordance Index
# for the formulated Univariate Cox Proportional Hazards Model
# involving the Factor variable
##################################
rcorrcens(formula = Surv(Week, Status) ~ I(-1 * LP.Univariate.Group), data = DBP.Analysis)

## 
## Somers' Rank Correlation for Censored Data    Response variable:Surv(Week, Status)
## 
##                                C  Dxy aDxy    SD    Z P  n
## I(-1 * LP.Univariate.Group) 0.69 0.38 0.38 0.082 4.62 0 42

LP.Univariate.Group.Concordance <- survConcordance(Surv(Week, Status) ~ LP.Univariate.Group,
                                               data = DBP.Analysis)

(LP.Univariate.Group.Concordance$concordance)

## concordant 
##  0.6900421

DBP.Analysis.UnivariateCoxPH.Group.Validation <- cph(Surv(Week, Status) ~ Group, 
                                                     x=TRUE, 
                                                     y=TRUE, 
                                                     data = DBP.Analysis)

set.seed (88888888)
DBP.Analysis.UnivariateCoxPH.Group.Validation.Summary <- validate(DBP.Analysis.UnivariateCoxPH.Group.Validation, B =500)

(DBP.Analysis.UnivariateCoxPH.Group.ConcordanceIndex.Optimistic <- (DBP.Analysis.UnivariateCoxPH.Group.Validation.Summary[1,1]/2)+0.50)

## [1] 0.6900421

(DBP.Analysis.UnivariateCoxPH.Group.ConcordanceIndex.OptimismCorrected <- (DBP.Analysis.UnivariateCoxPH.Group.Validation.Summary[1,5]/2)+0.50)

## [1] 0.6894831

##################################
# Measuring the Harrel's Concordance Index
# for the formulated Multivariate Cox Proportional Hazards Model
# involving the Factor and Covariate variables
##################################
rcorrcens(formula = Surv(Week, Status) ~ I(-1 * LP.Multivariate.GroupWBC), data = DBP.Analysis)

## 
## Somers' Rank Correlation for Censored Data    Response variable:Surv(Week, Status)
## 
##                                      C   Dxy  aDxy    SD    Z P  n
## I(-1 * LP.Multivariate.GroupWBC) 0.852 0.704 0.704 0.081 8.73 0 42

LP.Multivariate.GroupWBC.Concordance <- survConcordance(Surv(Week, Status) ~ LP.Multivariate.GroupWBC, 
                                                        data = DBP.Analysis)

(LP.Multivariate.GroupWBC.Concordance$concordance)

## concordant 
##  0.8520337

DBP.Analysis.Multivariate.GroupWBC.Validation <- cph(Surv(Week, Status) ~ Group + WBC, 
                                                     x=TRUE, 
                                                     y=TRUE, 
                                                     data = DBP.Analysis)

set.seed (88888888)
DBP.Analysis.Multivariate.GroupWBC.Validation.Summary <- validate(DBP.Analysis.Multivariate.GroupWBC.Validation, B =500)

(DBP.Analysis.Multivariate.GroupWBC.ConcordanceIndex.Optimistic <- (DBP.Analysis.Multivariate.GroupWBC.Validation.Summary[1,1]/2)+0.50)

## [1] 0.8520337

(DBP.Analysis.Multivariate.GroupWBC.ConcordanceIndex.OptimismCorrected <- (DBP.Analysis.Multivariate.GroupWBC.Validation.Summary[1,5]/2)+0.50)

## [1] 0.8448274

##################################
# Measuring the Harrel's Concordance Index
# for the formulated Multivariate Cox Proportional Hazards Model
# involving the Factor and Covariate variables
# including their interaction
##################################
rcorrcens(formula = Surv(Week, Status) ~ I(-1 * LP.Multivariate.GroupWBC.WithInteraction), data = DBP.Analysis)

## 
## Somers' Rank Correlation for Censored Data    Response variable:Surv(Week, Status)
## 
##                                                      C   Dxy  aDxy    SD    Z P
## I(-1 * LP.Multivariate.GroupWBC.WithInteraction) 0.851 0.701 0.701 0.082 8.57 0
##                                                   n
## I(-1 * LP.Multivariate.GroupWBC.WithInteraction) 42

LP.Multivariate.GroupWBC.WithInteraction.Concordance <- survConcordance(Surv(Week, Status) ~ LP.Multivariate.GroupWBC.WithInteraction, data = DBP.Analysis)

(LP.Multivariate.GroupWBC.WithInteraction.Concordance$concordance)

## concordant 
##  0.8506311

DBP.Analysis.Multivariate.GroupWBC.WithInteraction.Validation <- cph(Surv(Week, Status) ~ Group + WBC + Group*WBC, 
                                                     x=TRUE, 
                                                     y=TRUE, 
                                                     data = DBP.Analysis)

set.seed (88888888)
DBP.Analysis.Multivariate.GroupWBC.WithInteraction.Validation.Summary <- validate(DBP.Analysis.Multivariate.GroupWBC.WithInteraction.Validation, B =500)

(DBP.Analysis.Multivariate.GroupWBC.WithInteraction.ConcordanceIndex.Optimistic <- (DBP.Analysis.Multivariate.GroupWBC.WithInteraction.Validation.Summary[1,1]/2)+0.50)

## [1] 0.8506311

(DBP.Analysis.Multivariate.GroupWBC.WithInteraction.ConcordanceIndex.OptimismCorrected <- (DBP.Analysis.Multivariate.GroupWBC.WithInteraction.Validation.Summary[1,5]/2)+0.50)

## [1] 0.8406148

1.5.5 Model Diagnostics

Code Chunk | Output

##################################
# Reusing selected model
##################################
summary(DBP.Analysis.MultivariateCoxPH.GroupWBC)

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.3861    0.2501   0.4248 -3.263   0.0011 ** 
## WBC             1.6909    5.4243   0.3359  5.034  4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2501     3.9991    0.1088    0.5749
## WBC               5.4243     0.1844    2.8082   10.4776
## 
## Concordance= 0.852  (se = 0.04 )
## Likelihood ratio test= 46.71  on 2 df,   p=7e-11
## Wald test            = 33.6  on 2 df,   p=5e-08
## Score (logrank) test = 46.07  on 2 df,   p=1e-10

tbl_regression(DBP.Analysis.MultivariateCoxPH.GroupWBC, 
               exponentiate = TRUE)

Characteristic	HR¹	95% CI¹	p-value
Group
Placebo	—	—
Treatment	0.25	0.11, 0.57	0.001
WBC	5.42	2.81, 10.5	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

(DBP.Analysis.MultivariateCoxPH.GroupWBC.ForestPlot <- ggforest(DBP.Analysis.MultivariateCoxPH.GroupWBC , 
          data=DBP.Analysis,
          main = "",
          fontsize = 0.90))

##################################
# Plotting the Kaplan-Meier survival curves
# for the formulated Multivariate Cox Proportional Hazards Model
# involving the Factor and Covariate variables
##################################
DBP.Analysis.MultivariateCoxPH.GroupWBC.KaplanMeierEstimates <- survfit(DBP.Analysis.MultivariateCoxPH.GroupWBC , data = DBP.Analysis)

SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC <- ggsurvplot(DBP.Analysis.MultivariateCoxPH.GroupWBC.KaplanMeierEstimates,
                                                     title = "Kaplan-Meier Survival Curves (CoxPH Model : Survival ~ Group + WBC)",
                                                     pval = FALSE,
                                                     conf.int = TRUE,
                                                     conf.int.style = "ribbon",
                                                     xlab = "Time (Weeks)",
                                                     ylab="Estimated Survival Probability",
                                                     break.time.by = 5,
                                                     ggtheme = theme_bw(),
                                                     risk.table = "abs_pct",
                                                     risk.table.title="Number at Risk (Survival Probability)",
                                                     risk.table.y.text.col = FALSE,
                                                     risk.table.y.text = TRUE,
                                                     fontsize = 3,
                                                     ncensor.plot = FALSE,
                                                     surv.median.line = "hv",
                                                     palette = c("#000000"))

ggpar(SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Plotting the Kaplan-Meier survival curves
# of the Factor variable adjusted by the Covariate variable 
# using the formulated Multivariate Cox Proportional Hazards Model
##################################

(DBP.GroupAdjustedByWBC <- with(DBP.Analysis,
                                 data.frame(Group = c("Placebo", "Treatment"),
                                            WBC = rep(mean(WBC, na.rm = TRUE), 2))))

##       Group      WBC
## 1   Placebo 2.930238
## 2 Treatment 2.930238

DBP.Analysis.MultivariateCoxPH.GroupWBC.GroupAdjustedByWBC.KaplanMeierEstimates <- survfit(DBP.Analysis.MultivariateCoxPH.GroupWBC, 
                                                                                             newdata = DBP.GroupAdjustedByWBC, 
                                                                                             data=DBP.Analysis)


SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC.GroupAdjustedByWBC <- ggsurvplot(DBP.Analysis.MultivariateCoxPH.GroupWBC.GroupAdjustedByWBC.KaplanMeierEstimates,
                                                                         title = "Kaplan-Meier Survival Curves (CoxPH Model : Survival ~ Group + WBC)",
                                                                         pval = TRUE,
                                                                         conf.int = TRUE,
                                                                         conf.int.style = "ribbon",
                                                                         xlab = "Time (Weeks)",
                                                                         ylab="Estimated Survival Probability",
                                                                         break.time.by = 5,
                                                                         ggtheme = theme_bw(),
                                                                         risk.table = "abs_pct",
                                                                         risk.table.title="Number at Risk (Survival Probability)",
                                                                         risk.table.y.text.col = FALSE,
                                                                         risk.table.y.text = TRUE,
                                                                         fontsize = 3,
                                                                         ncensor.plot = FALSE,
                                                                         surv.median.line = "hv",
                                                                         legend.labs = c("Placebo", "Treatment"),
                                                                         palette = c("#3259A0","#FF5050"))

ggpar(SurvivalTimeInWeek.KaplanMeier.MultivariateCoxPH.GroupWBC.GroupAdjustedByWBC,
      font.title = c(14,"bold"),
      font.x = c(12,"bold"),
      font.y = c(12,"bold"),
      font.legend=c(12),
      font.xtickslab=c(9,"black"),
      font.ytickslab=c(9,"black"))

##################################
# Testing the proportional hazards assumption
# based on the scaled Schoenfield residuals
##################################
(DBP.Analysis.MultivariateCoxPH.GroupWBC.PHAssumptionCheck <- cox.zph(DBP.Analysis.MultivariateCoxPH.GroupWBC))

##           chisq df    p
## Group  8.27e-05  1 0.99
## WBC    4.00e-02  1 0.84
## GLOBAL 4.02e-02  2 0.98

##################################
# Formulating the graphical verification
# of the proportional hazards assumption test results
# using the scaled Schoenfeld residuals against time
##################################
(DBP.Analysis.MultivariateCoxPH.GroupWBC.PHAssumptionPlot <- ggcoxzph(DBP.Analysis.MultivariateCoxPH.GroupWBC.PHAssumptionCheck,
         point.col = "red",
         point.size = 2,
         point.shape = 19,
         point.alpha = 0.50,
  ggtheme = theme_survminer(),
  font.main = c(12,"bold"),
  font.x = c(12,"bold"),
  font.y = c(12,"bold"),
  font.xtickslab=c(9,"black"),
  font.ytickslab=c(9,"black")))

##################################
# Formulating the graphical verification
# to evaluate potential influential observations
# using the deviance residuals against observations
##################################
ggcoxdiagnostics(DBP.Analysis.MultivariateCoxPH.GroupWBC, 
 type = "deviance", 
 ox.scale = "observation.id",
 point.col = "red",
 point.size = 2,
 point.shape = 19,
 point.alpha = 0.50,
 ggtheme = theme_survminer(),
 font.main = c(12,"bold"),
 font.x = c(12,"bold"),
 font.y = c(12,"bold"),
 font.xtickslab=c(9,"black"),
 font.ytickslab=c(9,"black"),
 main = "Deviance Residuals by Observation")

##################################
# Formulating the graphical verification
# to evaluate symmetry
# using the dfbeta residuals against observations
##################################
ggcoxdiagnostics(DBP.Analysis.MultivariateCoxPH.GroupWBC, 
 type = "dfbeta", 
 ox.scale = "observation.id",
 point.col = "red",
 point.size = 2,
 point.shape = 19,
 point.alpha = 0.50,
 ggtheme = theme_survminer(),
 font.main = c(12,"bold"),
 font.x = c(12,"bold"),
 font.y = c(12,"bold"),
 font.xtickslab=c(9,"black"),
 font.ytickslab=c(9,"black"),
 main = "Dfbeta Residuals by Observation")

##################################
# Formulating the graphical verification
# to evaluate linearity of the numeric covariate
# using the martingale residuals of the null cox proportional hazards model
# and the individual values of the numeric covariate 
##################################
ggcoxfunctional(Surv(Week, Status) ~ WBC, 
 data = DBP.Analysis,
 point.col = "red",
 point.size = 2,
 point.shape = 19,
 point.alpha = 0.50,
 ggtheme = theme_survminer(),
 font.main = c(12,"bold"),
 font.x = c(12,"bold"),
 font.y = c(12,"bold"),
 font.xtickslab=c(9,"black"),
 font.ytickslab=c(9,"black"),
 main = "Martingale Residuals by WBC")

2. Summary

3. References

[Book] Survival Analysis : A Self-Learning Text by David Kleinbaum and Mitchel Klein
[Book] Applied Survival Analysis Using R by Dirk Moore
[Book] R for Health Data Science by Ewen Harrison and Riinu Pius
[Book] Supervised Machine Learning by Michael Foley
[Book] Data Science for Biological, Medical and Health Research by Thomas Love
[R Package] moments by Lukasz Komsta and Frederick Novomestky
[R Package] car by John Fox and Sanford Weisberg
[R Package] multcomp by Torsten Hothorn, Frank Bretz and Peter Westfall
[R Package] effects by John Fox and Sanford Weisberg
[R Package] psych by William Revelle
[R Package] ggplot2 by Hadley Wickham
[R Package] dplyr by Hadley Wickham
[R Package] ggpubr by Alboukadel Kassambara
[R Package] rstatix by Alboukadel Kassambara
[R Package] ggfortify by Yuan Tang
[R Package] trend by Thorsten Pohlert
[R Package] survival by Terry Therneau
[R Package] rms by Frank Harrell
[R Package] survminer by Alboukadel Kassambara
[R Package] Hmisc by Frank Harrel
[R Package] finalfit by Ewen Harrison
[R Package] knitr by Yihui Xie
[R Package] gtsummary by Daniel Sjoberg
[Article] Survival Analysis by Alboukadel Kassambara
[Article] Survival Analysis in R by Emily Zabor
[Article] Survival Analysis with R by Joseph Rickert
[Article] Assessment of Discrimination in Survival Analysis by R-Studio Team
[Article] Cox Model Assumptions by Alboukadel Kassambara
[Publication] Regression Models and Life-Tables by David Cox (Royal Statistical Society)
[Publication] Evaluating the Yield of Medical Tests by Frank Harrell, Robert Califf, David Pryor, Kerry Lee and Robert Rosati (Journal of the American Medical Association)
[Publication] Survival Analysis Part I: Basic Concepts and First Analyses by Taane Clark (British Journal of Cancer)
[Publication] Survival Analysis Part II: Multivariate Data Analysis – An Introduction to Concepts and Methods by Mike Bradburn (British Journal of Cancer)
[Publication] Survival Analysis Part III: Multivariate Data Analysis – Choosing a Model and Assessing its Adequacy and Fit by Mike Bradburn (British Journal of Cancer)
[Publication] Survival Analysis Part IV: Further Concepts and Methods in Survival Analysis by Taane Clark (British Journal of Cancer)
[Tutorial] Survival Analysis / Time-To-Event Analysis in R by Heidi Seibold (Datacamp)

Supervised Learning : Survival Analysis and Descriptive Modelling for a Two-Group Right-Censored Data with Time-Independent Variables Using Cox Proportional Hazards Model

John Pauline Pineda

May 5, 2022

1. Table of Contents

1.1 Sample Data

1.2 Data Quality Assessment

1.3 Research Question

1.4 Data Exploration

1.4.1 Descriptive Statistics

1.4.2 Kaplan-Meier Survival Curve

1.4.3 Log Rank Test

1.5 Model Development Summary

1.5.1 Univariate Cox Proportional Hazards Model

1.5.2 Multivariate Cox Proportional Hazards Model

1.5.3 Model Selection

1.5.4 Model Performance Validation

1.5.5 Model Diagnostics

2. Summary

3. References