1. Table of Contents

This project explores different robust alternatives for handling quasi-complete separation during logistic regression modelling using various helpful packages in R. Methods applied in the analysis to evaluate a quasi-complete condition when a covariate almost perfectly predicts the outcome included the Firth’s Bias-Reduced Logistic Regression, Firth’s Logistic Regression With Added Covariate, Firth’s Logistic Regression With Intercept Correction, Bayesian Generalized Linear Model With Cauchy Priors and Ridge-Penalized Logistic Regression algorithms. The resulting predictions derived from the candidate models were evaluated in terms of the stability of their coefficient estimates and standard errors, including the validity of their logistic profiles and the distribution of their predicted points, which were all compared to that of the baseline model without any form of quasi-complete separation treatment. All results were consolidated in a Summary presented at the end of the document.

Quasi-complete or complete separation is a monotone likelihood phenomenon observed in the fitting process of a logistic regression model when the likelihood converges while at least one parameter estimate diverges to positive or negative infinity. The algorithms applied in this study (mostly contained in the logistf and arm packages) attempt to reduce the bias of the maximum likelihood estimates through penalization, producing finite parameter estimates in the process.

1.1 Sample Data

The sex2 dataset from the logistf package was used for this illustrated example. One of the original categorical predictors was transformed to better simulate a quasi-complete condition when a covariate almost perfectly predicts the outcome.

Preliminary dataset assessment:

[A] 239 rows (observations)
     [A.1] Train Set = 239 observations

[B] 7 columns (variables)
     [B.1] 1/7 response = UTI variable (factor)
            [B.1.1] Levels = UTI=No < UTI=Yes
     [B.2] 6/7 predictors = All remaining variables (6/6 factor)

Code Chunk | Output

##################################
# Loading R libraries
##################################
library(AppliedPredictiveModeling)
library(caret)
library(rpart)
library(lattice)
library(dplyr)
library(tidyr)
library(moments)
library(skimr)
library(RANN)
library(pls)
library(corrplot)
library(tidyverse)
library(lares)
library(DMwR2)
library(gridExtra)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
library(stats)
library(nnet)
library(elasticnet)
library(earth)
library(party)
library(kernlab)
library(randomForest)
library(Cubist)
library(pROC)
library(mda)
library(klaR)
library(pamr)
library(OptimalCutpoints)
library(broom)
library(PRROC)
library(ggpubr)
library(Hmisc)
library(logistf)
library(arm)
library(glmnet)

##################################
# Loading source and
# formulating the train set
##################################
data(sex2)
sex2$dia <- sex2$case
sex2$dia[1] <- 0
UTI_Train <- as.data.frame(sex2)

##################################
# Applying verbose descriptions
# for the response and predictor variables
##################################
colnames(UTI_Train) <- c("UTI",
                         "Age_24",
                         "Contraceptive",
                         "Condom",
                         "LubricatedCondom",
                         "Spermicide",
                         "Diaphragm")

UTI <- UTI_Train

UTI_Train <- lapply(UTI_Train, function(x) ifelse(x==1,"Yes","No"))
UTI_Train <- lapply(UTI_Train, function(x) as.factor(as.character(x)))
UTI_Train <- as.data.frame(UTI_Train)

##################################
# Performing a general exploration of the train set
##################################
dim(UTI_Train)

## [1] 239   7

str(UTI_Train)

## 'data.frame':    239 obs. of  7 variables:
##  $ UTI             : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Age_24          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contraceptive   : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Condom          : Factor w/ 2 levels "No","Yes": 1 2 2 2 2 2 2 2 2 2 ...
##  $ LubricatedCondom: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Spermicide      : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 1 1 ...
##  $ Diaphragm       : Factor w/ 2 levels "No","Yes": 1 2 2 2 2 2 2 2 2 2 ...

summary(UTI_Train)

##   UTI      Age_24    Contraceptive Condom    LubricatedCondom Spermicide
##  No :109   No :201   No : 83       No :112   No :159          No :180   
##  Yes:130   Yes: 38   Yes:156       Yes:127   Yes: 80          Yes: 59   
##  Diaphragm
##  No :110  
##  Yes:129

describe(UTI_Train)

## UTI_Train 
## 
##  7  Variables      239  Observations
## --------------------------------------------------------------------------------
## UTI 
##        n  missing distinct 
##      239        0        2 
##                       
## Value         No   Yes
## Frequency    109   130
## Proportion 0.456 0.544
## --------------------------------------------------------------------------------
## Age_24 
##        n  missing distinct 
##      239        0        2 
##                       
## Value         No   Yes
## Frequency    201    38
## Proportion 0.841 0.159
## --------------------------------------------------------------------------------
## Contraceptive 
##        n  missing distinct 
##      239        0        2 
##                       
## Value         No   Yes
## Frequency     83   156
## Proportion 0.347 0.653
## --------------------------------------------------------------------------------
## Condom 
##        n  missing distinct 
##      239        0        2 
##                       
## Value         No   Yes
## Frequency    112   127
## Proportion 0.469 0.531
## --------------------------------------------------------------------------------
## LubricatedCondom 
##        n  missing distinct 
##      239        0        2 
##                       
## Value         No   Yes
## Frequency    159    80
## Proportion 0.665 0.335
## --------------------------------------------------------------------------------
## Spermicide 
##        n  missing distinct 
##      239        0        2 
##                       
## Value         No   Yes
## Frequency    180    59
## Proportion 0.753 0.247
## --------------------------------------------------------------------------------
## Diaphragm 
##        n  missing distinct 
##      239        0        2 
##                     
## Value        No  Yes
## Frequency   110  129
## Proportion 0.46 0.54
## --------------------------------------------------------------------------------

##################################
# Formulating a data type assessment summary
##################################
PDA <- UTI_Train
(PDA.Summary <- data.frame(
  Column.Index=c(1:length(names(PDA))),
  Column.Name= names(PDA),
  Column.Type=sapply(PDA, function(x) class(x)),
  row.names=NULL)
)

##   Column.Index      Column.Name Column.Type
## 1            1              UTI      factor
## 2            2           Age_24      factor
## 3            3    Contraceptive      factor
## 4            4           Condom      factor
## 5            5 LubricatedCondom      factor
## 6            6       Spermicide      factor
## 7            7        Diaphragm      factor

1.2 Data Quality Assessment

[A] No missing observations noted for any variable.

[B] Low variance observed for 1 variable with First.Second.Mode.Ratio>5.
[B.1] Age_24 variable (factor)

[C] Due to the absence of numeric predictors, no low variance with Unique.Count.Ratio<0.01 and high skewness with Skewness>3 or Skewness<(-3) observed for any variable.

Code Chunk | Output

##################################
# Loading dataset
##################################
DQA <- UTI_Train

##################################
# Formulating an overall data quality assessment summary
##################################
(DQA.Summary <- data.frame(
  Column.Index=c(1:length(names(DQA))),
  Column.Name= names(DQA),
  Column.Type=sapply(DQA, function(x) class(x)),
  Row.Count=sapply(DQA, function(x) nrow(DQA)),
  NA.Count=sapply(DQA,function(x)sum(is.na(x))),
  Fill.Rate=sapply(DQA,function(x)format(round((sum(!is.na(x))/nrow(DQA)),3),nsmall=3)),
  row.names=NULL)
)

##   Column.Index      Column.Name Column.Type Row.Count NA.Count Fill.Rate
## 1            1              UTI      factor       239        0     1.000
## 2            2           Age_24      factor       239        0     1.000
## 3            3    Contraceptive      factor       239        0     1.000
## 4            4           Condom      factor       239        0     1.000
## 5            5 LubricatedCondom      factor       239        0     1.000
## 6            6       Spermicide      factor       239        0     1.000
## 7            7        Diaphragm      factor       239        0     1.000

##################################
# Listing all predictors
##################################
DQA.Predictors <- DQA[,!names(DQA) %in% c("UTI")]

##################################
# Listing all numeric predictors
##################################
DQA.Predictors.Numeric <- DQA.Predictors[,sapply(DQA.Predictors, is.numeric)]

if (length(names(DQA.Predictors.Numeric))>0) {
    print(paste0("There are ",
               (length(names(DQA.Predictors.Numeric))),
               " numeric predictor variable(s)."))
} else {
  print("There are no numeric predictor variables.")
}

## [1] "There are no numeric predictor variables."

##################################
# Listing all factor predictors
##################################
DQA.Predictors.Factor <- DQA.Predictors[,sapply(DQA.Predictors, is.factor)]

if (length(names(DQA.Predictors.Factor))>0) {
    print(paste0("There are ",
               (length(names(DQA.Predictors.Factor))),
               " factor predictor variable(s)."))
} else {
  print("There are no factor predictor variables.")
}

## [1] "There are 6 factor predictor variable(s)."

##################################
# Formulating a data quality assessment summary for factor predictors
##################################
if (length(names(DQA.Predictors.Factor))>0) {

  ##################################
  # Formulating a function to determine the first mode
  ##################################
  FirstModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    ux[tab == max(tab)]
  }

  ##################################
  # Formulating a function to determine the second mode
  ##################################
  SecondModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    fm = ux[tab == max(tab)]
    sm = x[!(x %in% fm)]
    usm <- unique(sm)
    tabsm <- tabulate(match(sm, usm))
    ifelse(is.na(usm[tabsm == max(tabsm)])==TRUE,
           return("x"),
           return(usm[tabsm == max(tabsm)]))
  }

  (DQA.Predictors.Factor.Summary <- data.frame(
  Column.Name= names(DQA.Predictors.Factor),
  Column.Type=sapply(DQA.Predictors.Factor, function(x) class(x)),
  Unique.Count=sapply(DQA.Predictors.Factor, function(x) length(unique(x))),
  First.Mode.Value=sapply(DQA.Predictors.Factor, function(x) as.character(FirstModes(x)[1])),
  Second.Mode.Value=sapply(DQA.Predictors.Factor, function(x) as.character(SecondModes(x)[1])),
  First.Mode.Count=sapply(DQA.Predictors.Factor, function(x) sum(na.omit(x) == FirstModes(x)[1])),
  Second.Mode.Count=sapply(DQA.Predictors.Factor, function(x) sum(na.omit(x) == SecondModes(x)[1])),
  Unique.Count.Ratio=sapply(DQA.Predictors.Factor, function(x) format(round((length(unique(x))/nrow(DQA.Predictors.Factor)),3), nsmall=3)),
  First.Second.Mode.Ratio=sapply(DQA.Predictors.Factor, function(x) format(round((sum(na.omit(x) == FirstModes(x)[1])/sum(na.omit(x) == SecondModes(x)[1])),3), nsmall=3)),
  row.names=NULL)
  )

}

##        Column.Name Column.Type Unique.Count First.Mode.Value Second.Mode.Value
## 1           Age_24      factor            2               No               Yes
## 2    Contraceptive      factor            2              Yes                No
## 3           Condom      factor            2              Yes                No
## 4 LubricatedCondom      factor            2               No               Yes
## 5       Spermicide      factor            2               No               Yes
## 6        Diaphragm      factor            2              Yes                No
##   First.Mode.Count Second.Mode.Count Unique.Count.Ratio First.Second.Mode.Ratio
## 1              201                38              0.008                   5.289
## 2              156                83              0.008                   1.880
## 3              127               112              0.008                   1.134
## 4              159                80              0.008                   1.988
## 5              180                59              0.008                   3.051
## 6              129               110              0.008                   1.173

##################################
# Formulating a data quality assessment summary for numeric predictors
##################################
if (length(names(DQA.Predictors.Numeric))>0) {

  ##################################
  # Formulating a function to determine the first mode
  ##################################
  FirstModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    ux[tab == max(tab)]
  }

  ##################################
  # Formulating a function to determine the second mode
  ##################################
  SecondModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    fm = ux[tab == max(tab)]
    sm = na.omit(x)[!(na.omit(x) %in% fm)]
    usm <- unique(sm)
    tabsm <- tabulate(match(sm, usm))
    ifelse(is.na(usm[tabsm == max(tabsm)])==TRUE,
           return(0.00001),
           return(usm[tabsm == max(tabsm)]))
  }

  (DQA.Predictors.Numeric.Summary <- data.frame(
  Column.Name= names(DQA.Predictors.Numeric),
  Column.Type=sapply(DQA.Predictors.Numeric, function(x) class(x)),
  Unique.Count=sapply(DQA.Predictors.Numeric, function(x) length(unique(x))),
  Unique.Count.Ratio=sapply(DQA.Predictors.Numeric, function(x) format(round((length(unique(x))/nrow(DQA.Predictors.Numeric)),3), nsmall=3)),
  First.Mode.Value=sapply(DQA.Predictors.Numeric, function(x) format(round((FirstModes(x)[1]),3),nsmall=3)),
  Second.Mode.Value=sapply(DQA.Predictors.Numeric, function(x) format(round((SecondModes(x)[1]),3),nsmall=3)),
  First.Mode.Count=sapply(DQA.Predictors.Numeric, function(x) sum(na.omit(x) == FirstModes(x)[1])),
  Second.Mode.Count=sapply(DQA.Predictors.Numeric, function(x) sum(na.omit(x) == SecondModes(x)[1])),
  First.Second.Mode.Ratio=sapply(DQA.Predictors.Numeric, function(x) format(round((sum(na.omit(x) == FirstModes(x)[1])/sum(na.omit(x) == SecondModes(x)[1])),3), nsmall=3)),
  Minimum=sapply(DQA.Predictors.Numeric, function(x) format(round(min(x,na.rm = TRUE),3), nsmall=3)),
  Mean=sapply(DQA.Predictors.Numeric, function(x) format(round(mean(x,na.rm = TRUE),3), nsmall=3)),
  Median=sapply(DQA.Predictors.Numeric, function(x) format(round(median(x,na.rm = TRUE),3), nsmall=3)),
  Maximum=sapply(DQA.Predictors.Numeric, function(x) format(round(max(x,na.rm = TRUE),3), nsmall=3)),
  Skewness=sapply(DQA.Predictors.Numeric, function(x) format(round(skewness(x,na.rm = TRUE),3), nsmall=3)),
  Kurtosis=sapply(DQA.Predictors.Numeric, function(x) format(round(kurtosis(x,na.rm = TRUE),3), nsmall=3)),
  Percentile25th=sapply(DQA.Predictors.Numeric, function(x) format(round(quantile(x,probs=0.25,na.rm = TRUE),3), nsmall=3)),
  Percentile75th=sapply(DQA.Predictors.Numeric, function(x) format(round(quantile(x,probs=0.75,na.rm = TRUE),3), nsmall=3)),
  row.names=NULL)
  )

}

##################################
# Identifying potential data quality issues
##################################

##################################
# Checking for missing observations
##################################
if ((nrow(DQA.Summary[DQA.Summary$NA.Count>0,]))>0){
  print(paste0("Missing observations noted for ",
               (nrow(DQA.Summary[DQA.Summary$NA.Count>0,])),
               " variable(s) with NA.Count>0 and Fill.Rate<1.0."))
  DQA.Summary[DQA.Summary$NA.Count>0,]
} else {
  print("No missing observations noted.")
}

## [1] "No missing observations noted."

##################################
# Checking for zero or near-zero variance predictors
##################################
if (length(names(DQA.Predictors.Factor))==0) {
  print("No factor predictors noted.")
} else if (nrow(DQA.Predictors.Factor.Summary[as.numeric(as.character(DQA.Predictors.Factor.Summary$First.Second.Mode.Ratio))>5,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Predictors.Factor.Summary[as.numeric(as.character(DQA.Predictors.Factor.Summary$First.Second.Mode.Ratio))>5,])),
               " factor variable(s) with First.Second.Mode.Ratio>5."))
  DQA.Predictors.Factor.Summary[as.numeric(as.character(DQA.Predictors.Factor.Summary$First.Second.Mode.Ratio))>5,]
} else {
  print("No low variance factor predictors due to high first-second mode ratio noted.")
}

## [1] "Low variance observed for 1 factor variable(s) with First.Second.Mode.Ratio>5."

##   Column.Name Column.Type Unique.Count First.Mode.Value Second.Mode.Value
## 1      Age_24      factor            2               No               Yes
##   First.Mode.Count Second.Mode.Count Unique.Count.Ratio First.Second.Mode.Ratio
## 1              201                38              0.008                   5.289

if (length(names(DQA.Predictors.Numeric))==0) {
  print("No numeric predictors noted.")
} else if (nrow(DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$First.Second.Mode.Ratio))>5,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$First.Second.Mode.Ratio))>5,])),
               " numeric variable(s) with First.Second.Mode.Ratio>5."))
  DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$First.Second.Mode.Ratio))>5,]
} else {
  print("No low variance numeric predictors due to high first-second mode ratio noted.")
}

## [1] "No numeric predictors noted."

if (length(names(DQA.Predictors.Numeric))==0) {
  print("No numeric predictors noted.")
} else if (nrow(DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$Unique.Count.Ratio))<0.01,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$Unique.Count.Ratio))<0.01,])),
               " numeric variable(s) with Unique.Count.Ratio<0.01."))
  DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$Unique.Count.Ratio))<0.01,]
} else {
  print("No low variance numeric predictors due to low unique count ratio noted.")
}

## [1] "No numeric predictors noted."

##################################
# Checking for skewed predictors
##################################
if (length(names(DQA.Predictors.Numeric))==0) {
  print("No numeric predictors noted.")
} else if (nrow(DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$Skewness))>3 |
                                               as.numeric(as.character(DQA.Predictors.Numeric.Summary$Skewness))<(-3),])>0){
  print(paste0("High skewness observed for ",
  (nrow(DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$Skewness))>3 |
                                               as.numeric(as.character(DQA.Predictors.Numeric.Summary$Skewness))<(-3),])),
  " numeric variable(s) with Skewness>3 or Skewness<(-3)."))
  DQA.Predictors.Numeric.Summary[as.numeric(as.character(DQA.Predictors.Numeric.Summary$Skewness))>3 |
                                 as.numeric(as.character(DQA.Predictors.Numeric.Summary$Skewness))<(-3),]
} else {
  print("No skewed numeric predictors noted.")
}

## [1] "No numeric predictors noted."

1.3 Data Preprocessing

1.3.1 Pre-Processed Dataset

[A] 239 rows (observations)
     [A.1] Train Set = 239 observations

[B] 7 columns (variables)
     [B.1] 1/7 response = UTI variable (factor)
            [B.1.1] Levels = UTI=No < UTI=Yes
     [B.2] 6/7 predictors = All remaining variables (6/6 factor)

[C] Pre-processing actions applied:
     [C.1] Due to the objective of simulating a quasi-complete separation, the predictor observed with near-zero variance was not removed in the analysis

Code Chunk | Output

##################################
# Creating the pre-modelling
# train set
##################################
PMA_PreModelling_Train <- UTI_Train

##################################
# Gathering descriptive statistics
##################################
(PMA_PreModelling_Train_Skimmed <- skim(PMA_PreModelling_Train))

Data summary
Name	PMA_PreModelling_Train
Number of rows	239
Number of columns	7
_______________________
Column type frequency:
factor	7
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
UTI	1	FALSE	2	Yes: 130, No: 109
Age_24	1	FALSE	2	No: 201, Yes: 38
Contraceptive	1	FALSE	2	Yes: 156, No: 83
Condom	1	FALSE	2	Yes: 127, No: 112
LubricatedCondom	1	FALSE	2	No: 159, Yes: 80
Spermicide	1	FALSE	2	No: 180, Yes: 59
Diaphragm	1	FALSE	2	Yes: 129, No: 110

###################################
# Verifying the data dimensions
# for the train set
###################################
dim(PMA_PreModelling_Train)

## [1] 239   7

1.4 Data Exploration

[A] Several factor variables demonstrated differential relationships with the UTI response variable:
     [A.1] Diaphragm variable (factor)
     [A.2] Age_24 variable (factor)
     [A.3] Spermicide variable (factor)
     [A.4] LubricatedCondom variable (factor)

Code Chunk | Output

##################################
# Restructuring the dataset for
# for barchart analysis
##################################
EDA.Bar.Source <- PMA_PreModelling_Train

##################################
# Creating a function to formulate
# the proportions table
##################################
EDA.PropTable.Function <- function(FactorVar) {
  EDA.Bar.Source.FactorVar <- EDA.Bar.Source[,c("UTI",
                                                FactorVar)]
  EDA.Bar.Source.FactorVar.Prop <- as.data.frame(prop.table(table(EDA.Bar.Source.FactorVar), 2))
  names(EDA.Bar.Source.FactorVar.Prop)[2] <- "UTI"
  EDA.Bar.Source.FactorVar.Prop$Variable <- rep(FactorVar,nrow(EDA.Bar.Source.FactorVar.Prop))

  return(EDA.Bar.Source.FactorVar.Prop)

}

EDA.Bar.Source.FactorVar.Prop <- rbind(EDA.PropTable.Function("Age_24"),
                                       EDA.PropTable.Function("Contraceptive"),
                                       EDA.PropTable.Function("Condom"),
                                       EDA.PropTable.Function("LubricatedCondom"),
                                       EDA.PropTable.Function("Spermicide"),
                                       EDA.PropTable.Function("Diaphragm"))

(EDA.Barchart.FactorVar <- barchart(EDA.Bar.Source.FactorVar.Prop[,3] ~
                                      EDA.Bar.Source.FactorVar.Prop[,2] | EDA.Bar.Source.FactorVar.Prop[,4],
                                      data=EDA.Bar.Source.FactorVar.Prop,
                                      groups = EDA.Bar.Source.FactorVar.Prop[,1],
                                      stack=TRUE,
                                      ylab = "Proportion",
                                      xlab = "UTI",
                                      auto.key = list(adj=1, space="top", columns=2),
                                      layout=(c(3,2))))

1.5 Predictive Model Index Development and Probability Curve Estimation

1.5.1 Logistic Regression (LR)

Logistic Regression models the relationship between the probability of an event (among two outcome levels) by having the log-odds of the event be a linear combination of a set of predictors weighted by their respective parameter estimates. The parameters are estimated via maximum likelihood estimation by testing different values through multiple iterations to optimize for the best fit of log odds. All of these iterations produce the log likelihood function, and logistic regression seeks to maximize this function to find the best parameter estimates. Given the optimal parameters, the conditional probabilities for each observation can be calculated, logged, and summed together to yield a predicted probability.

[A] The logistic regression model from the stats package was implemented. The UTI response was regressed against the Age_24, Contraceptive, Condom, LubricatedCondom, Spermicide and Diaphragm predictors.

[B] The model does not contain any hyperparameter.

[C] The model was not able to sufficiently compensate for the quasi-complete condition simulated for the Diaphragm predictor, resulting in the hyper-inflation of its estimated coefficients and standard errors.
     [C.1] Intercept coefficient = -2.868
     [C.2] Age_24 coefficient = -1.052
     [C.3] Contraceptive coefficient = -21.885
     [C.4] Condom coefficient = -22.120
     [C.5] LubricatedCondom coefficient = -1.687
     [C.6] Spermicide coefficient = +2.868
     [C.7] Diaphragm coefficient = +71.283

[D] The logistic curve formulated by plotting the predicted probabilities against the classification index using the logit values showed a non-logistic profile with predicted points clustered in groups.

[E] The performance of the model is summarized as follows:
     [E.1] Final model configuration is fixed due to the absence of a hyperparameter.
     [E.2] Apparent ROC Curve AUC = 0.99996
     [E.3] Estimated probabilities were practically dichotomized into a limited set of values.

Code Chunk | Output

##################################
# Creating a local object
# for the train set
##################################
PMA_PreModelling_Train_LR <- UTI

##################################
# Formulating the structure of the
# Logistic Regression model
##################################
LR_Model <- glm(UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + Spermicide + Diaphragm,
                data = PMA_PreModelling_Train_LR,
                family = binomial)

##################################
# Consolidating the model results
##################################
summary(LR_Model)

## 
## Call:
## glm(formula = UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + 
##     Spermicide + Diaphragm, family = binomial, data = PMA_PreModelling_Train_LR)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)         -2.868  32506.120   0.000    1.000
## Age_24              -1.052  30909.202   0.000    1.000
## Contraceptive      -21.885  28411.225  -0.001    0.999
## Condom             -22.120  26985.686  -0.001    0.999
## LubricatedCondom    -1.687  25786.443   0.000    1.000
## Spermicide           2.868  32506.120   0.000    1.000
## Diaphragm           71.283  30516.786   0.002    0.998
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 329.4768  on 238  degrees of freedom
## Residual deviance:   2.7726  on 232  degrees of freedom
## AIC: 16.773
## 
## Number of Fisher Scoring iterations: 24

LR_Model_Coef <- (as.data.frame(LR_Model$coefficients))
LR_Model_Coef$Coef <- rownames(LR_Model_Coef)
LR_Model_Coef$Model <- rep("LR",nrow(LR_Model_Coef))
colnames(LR_Model_Coef) <- c("Estimates","Coefficients","Model")
print(LR_Model_Coef, rownames=FALSE)

##                   Estimates     Coefficients Model
## (Intercept)       -2.868285      (Intercept)    LR
## Age_24            -1.051834           Age_24    LR
## Contraceptive    -21.884704    Contraceptive    LR
## Condom           -22.119783           Condom    LR
## LubricatedCondom  -1.686565 LubricatedCondom    LR
## Spermicide         2.868285       Spermicide    LR
## Diaphragm         71.283177        Diaphragm    LR

##################################
# Computing the model predictions
##################################
(LR_Model_Probabilities <- predict(LR_Model, 
                              type = c("response")))

##            1            2            3            4            5            6 
## 5.000000e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##            7            8            9           10           11           12 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           13           14           15           16           17           18 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           19           20           21           22           23           24 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           25           26           27           28           29           30 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           31           32           33           34           35           36 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           37           38           39           40           41           42 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           43           44           45           46           47           48 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           49           50           51           52           53           54 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           55           56           57           58           59           60 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           61           62           63           64           65           66 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           67           68           69           70           71           72 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           73           74           75           76           77           78 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           79           80           81           82           83           84 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           85           86           87           88           89           90 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           91           92           93           94           95           96 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##           97           98           99          100          101          102 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##          103          104          105          106          107          108 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##          109          110          111          112          113          114 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##          115          116          117          118          119          120 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##          121          122          123          124          125          126 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
##          127          128          129          130          131          132 
## 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 5.000000e-01 1.405464e-11 
##          133          134          135          136          137          138 
## 1.405464e-11 2.474572e-10 2.602280e-12 2.602280e-12 2.602280e-12 2.602280e-12 
##          139          140          141          142          143          144 
## 2.602280e-12 2.602280e-12 2.602280e-12 2.602280e-12 4.581779e-11 4.581779e-11 
##          145          146          147          148          149          150 
## 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 
##          151          152          153          154          155          156 
## 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 
##          157          158          159          160          161          162 
## 4.581779e-11 4.581779e-11 4.581779e-11 4.581779e-11 1.777926e-11 1.777926e-11 
##          163          164          165          166          167          168 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 
##          169          170          171          172          173          174 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 
##          175          176          177          178          179          180 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 
##          181          182          183          184          185          186 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 
##          187          188          189          190          191          192 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 
##          193          194          195          196          197          198 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 
##          199          200          201          202          203          204 
## 1.777926e-11 1.777926e-11 1.777926e-11 1.777926e-11 3.130357e-10 2.220446e-16 
##          205          206          207          208          209          210 
## 2.220446e-16 2.220446e-16 2.220446e-16 2.220446e-16 2.220446e-16 2.220446e-16 
##          211          212          213          214          215          216 
## 2.220446e-16 2.220446e-16 2.220446e-16 2.220446e-16 2.220446e-16 9.089676e-13 
##          217          218          219          220          221          222 
## 9.089676e-13 1.600400e-11 1.600400e-11 1.600400e-11 1.600400e-11 6.210236e-12 
##          223          224          225          226          227          228 
## 6.210236e-12 6.210236e-12 6.210236e-12 6.210236e-12 6.210236e-12 6.210236e-12 
##          229          230          231          232          233          234 
## 6.210236e-12 6.210236e-12 6.210236e-12 6.210236e-12 6.210236e-12 6.210236e-12 
##          235          236          237          238          239 
## 6.210236e-12 1.093423e-10 2.220446e-16 2.220446e-16 2.220446e-16

##################################
# Creating a classification index
# based from the model predictions
##################################
(LR_Model_Indices <- predict(LR_Model, 
                           type = c("link")))

##             1             2             3             4             5 
## -4.085621e-14  4.629511e+01  4.629511e+01  4.629511e+01  4.629511e+01 
##             6             7             8             9            10 
##  4.629511e+01  4.629511e+01  4.629511e+01  4.629511e+01  4.629511e+01 
##            11            12            13            14            15 
##  4.629511e+01  4.629511e+01  4.629511e+01  4.629511e+01  4.629511e+01 
##            16            17            18            19            20 
##  4.916339e+01  4.916339e+01  4.916339e+01  4.460854e+01  4.460854e+01 
##            21            22            23            24            25 
##  4.460854e+01  4.460854e+01  4.460854e+01  4.460854e+01  4.460854e+01 
##            26            27            28            29            30 
##  4.460854e+01  4.460854e+01  4.460854e+01  4.747683e+01  4.747683e+01 
##            31            32            33            34            35 
##  4.747683e+01  4.747683e+01  4.747683e+01  4.747683e+01  4.747683e+01 
##            36            37            38            39            40 
##  4.747683e+01  4.747683e+01  4.747683e+01  4.747683e+01  4.747683e+01 
##            41            42            43            44            45 
##  4.747683e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            46            47            48            49            50 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            51            52            53            54            55 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            56            57            58            59            60 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            61            62            63            64            65 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            66            67            68            69            70 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            71            72            73            74            75 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            76            77            78            79            80 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            81            82            83            84            85 
##  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01  4.653019e+01 
##            86            87            88            89            90 
##  4.653019e+01  4.653019e+01  4.653019e+01  2.441040e+01  2.441040e+01 
##            91            92            93            94            95 
##  2.441040e+01  2.441040e+01  2.441040e+01  2.441040e+01  2.441040e+01 
##            96            97            98            99           100 
##  2.441040e+01  2.441040e+01  2.441040e+01  2.441040e+01  2.441040e+01 
##           101           102           103           104           105 
##  2.441040e+01  2.441040e+01  2.441040e+01  2.441040e+01  2.727869e+01 
##           106           107           108           109           110 
##  2.727869e+01  2.272384e+01  2.272384e+01  2.272384e+01  2.272384e+01 
##           111           112           113           114           115 
##  2.272384e+01  2.272384e+01  2.272384e+01  2.559212e+01  2.559212e+01 
##           116           117           118           119           120 
##  2.559212e+01  4.524327e+01  4.524327e+01  4.811156e+01  4.811156e+01 
##           121           122           123           124           125 
##  4.355671e+01  4.642499e+01  4.547835e+01  4.547835e+01  4.547835e+01 
##           126           127           128           129           130 
##  4.547835e+01  4.547835e+01  2.335857e+01  2.335857e+01  2.335857e+01 
##           131           132           133           134           135 
## -4.085621e-14 -2.498807e+01 -2.498807e+01 -2.211978e+01 -2.667463e+01 
##           136           137           138           139           140 
## -2.667463e+01 -2.667463e+01 -2.667463e+01 -2.667463e+01 -2.667463e+01 
##           141           142           143           144           145 
## -2.667463e+01 -2.667463e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 
##           146           147           148           149           150 
## -2.380635e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 
##           151           152           153           154           155 
## -2.380635e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 
##           156           157           158           159           160 
## -2.380635e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 -2.380635e+01 
##           161           162           163           164           165 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           166           167           168           169           170 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           171           172           173           174           175 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           176           177           178           179           180 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           181           182           183           184           185 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           186           187           188           189           190 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           191           192           193           194           195 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           196           197           198           199           200 
## -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 -2.475299e+01 
##           201           202           203           204           205 
## -2.475299e+01 -2.475299e+01 -2.188470e+01 -4.687277e+01 -4.855934e+01 
##           206           207           208           209           210 
## -4.855934e+01 -4.855934e+01 -4.855934e+01 -4.855934e+01 -4.569105e+01 
##           211           212           213           214           215 
## -4.569105e+01 -4.569105e+01 -4.569105e+01 -4.569105e+01 -4.569105e+01 
##           216           217           218           219           220 
## -2.772647e+01 -2.772647e+01 -2.485818e+01 -2.485818e+01 -2.485818e+01 
##           221           222           223           224           225 
## -2.485818e+01 -2.580482e+01 -2.580482e+01 -2.580482e+01 -2.580482e+01 
##           226           227           228           229           230 
## -2.580482e+01 -2.580482e+01 -2.580482e+01 -2.580482e+01 -2.580482e+01 
##           231           232           233           234           235 
## -2.580482e+01 -2.580482e+01 -2.580482e+01 -2.580482e+01 -2.580482e+01 
##           236           237           238           239 
## -2.293654e+01 -4.792461e+01 -4.674289e+01 -4.674289e+01

max(LR_Model_Indices)

## [1] 49.16339

min(LR_Model_Indices)

## [1] -48.55934

##################################
# Consolidating the model probabilities
# and classification index
# based from the model predictions
##################################
LR_Model_Predictions <- as.data.frame(PMA_PreModelling_Train_LR)
LR_Model_Predictions$LR_Prob <- LR_Model_Probabilities
LR_Model_Predictions$LR_LP <- LR_Model_Indices
LR_Model_Predictions$UTI <- as.factor(LR_Model_Predictions$UTI)
LR_Model_Predictions$Label <- rep("LR",nrow(LR_Model_Predictions))

##################################
# Formulating the probability curve
# using the consolidated model predictions
##################################
LR_Model_Predictions %>%
  ggplot(aes(x = LR_LP ,
             y = LR_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  ggtitle("Estimated UTI Probabilities Based on Classification Index : Logistic Regression") +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="top")

##################################
# Formulating the corresponding
# receiver operating characteristic (ROC) curve
# using the model predictions
##################################
LR_Prob_Low  <- LR_Model_Predictions[LR_Model_Predictions$UTI==0,
                                    c("LR_Prob")]
LR_Prob_High <- LR_Model_Predictions[LR_Model_Predictions$UTI==1,
                                    c("LR_Prob")]
LR_Model_Prob_ROC <- roc.curve(scores.class1 = LR_Prob_Low,
                               scores.class0 = LR_Prob_High,
                                curve = TRUE)

plot(LR_Model_Prob_ROC,
     xlab="1-Specificity", 
     ylab="Sensitivity",
     main="ROC Curve of the UTI Probabilities : Logistic Regression", 
     color=TRUE, 
     lwd=8,
     legend=3)

1.5.2 Firth’s Bias-Reduced Logistic Regression (FBRLR)

Firth’s Bias-Reduced Logistic Regression fits a logistic regression model using Firth’s bias reduction method by penalizing the log-likelihood by the Jeffreys prior.

[A] The Firth’s bias-reduced logistic regression model from the logistf package was implemented. The UTI response was regressed against the Age_24, Contraceptive, Condom, LubricatedCondom, Spermicide and Diaphragm predictors.

[B] The model contains 1 hyperparameter:
     [B.1] maxit = maximum number of iterations for the penalized-likelihood logistic regression held constant at a value of 1000

[C] The model was able to sufficiently compensate for the quasi-complete condition simulated for the Diaphragm predictor, resulting in reasonably valid estimated coefficients and standard errors.
     [C.1] Intercept coefficient = -3.345
     [C.2] Age_24 coefficient = -1.539
     [C.3] Contraceptive coefficient = -1.014
     [C.4] Condom coefficient = -2.048
     [C.5] LubricatedCondom coefficient = -1.214
     [C.6] Spermicide coefficient = +3.019
     [C.7] Diaphragm coefficient = +10.409

[D] The logistic curve formulated by plotting the predicted probabilities against the classification index using the logit values showed a valid logistic profile with reasonably distributed predicted points.

[E] The performance of the model is summarized as follows:
     [E.1] Final model configuration involves maxit=1000.
     [E.2] Apparent ROC Curve AUC = 0.99996
     [E.3] Estimated probabilities were reasonably distributed across a range of values.

Code Chunk | Output

##################################
# Creating a local object
# for the train set
##################################
PMA_PreModelling_Train_FBRLR <- UTI

##################################
# Formulating the structure of the 
# Firth's Bias-Reduced Logistic Regression model
##################################
FBRLR_Model <- logistf(UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + Spermicide + Diaphragm,
                       data = PMA_PreModelling_Train_FBRLR,
                       control=logistf.control(maxit = 10000))

##################################
# Consolidating the model results
##################################
summary(FBRLR_Model)

## logistf(formula = UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + 
##     Spermicide + Diaphragm, data = PMA_PreModelling_Train_FBRLR, 
##     control = logistf.control(maxit = 10000))
## 
## Model fitted by Penalized ML
## Coefficients:
##                       coef se(coef) lower 0.95 upper 0.95     Chisq         p
## (Intercept)      -3.345993 1.773653  -8.977236   5.673493 0.0000000 1.0000000
## Age_24           -1.539877 1.501847 -10.674608   6.075439 0.0322714 0.8574333
## Contraceptive    -1.014937 1.295321 -10.851515   2.273439 0.3216179 0.5706369
## Condom           -2.048751 1.745776  -8.554568   2.953225 0.5499878 0.4583227
## LubricatedCondom -1.214703 1.504503  -9.510967   4.335792 0.2212636 0.6380788
## Spermicide        3.019690 1.735450  -5.362560   8.573758 0.0000000 1.0000000
## Diaphragm        10.409803 1.879626   7.071215  17.006177       Inf 0.0000000
##                  method
## (Intercept)           2
## Age_24                2
## Contraceptive         2
## Condom                2
## LubricatedCondom      2
## Spermicide            2
## Diaphragm             2
## 
## Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
## 
## Likelihood ratio test=300.1278 on 6 df, p=0, n=239
## Wald test = 45.31742 on 6 df, p = 4.04738e-08

FBRLR_Model_Coef <- (as.data.frame(FBRLR_Model$coefficients))
FBRLR_Model_Coef$Coef <- rownames(FBRLR_Model_Coef)
FBRLR_Model_Coef$Model <- rep("FBRLR",nrow(FBRLR_Model_Coef))
colnames(FBRLR_Model_Coef) <- c("Estimates","Coefficients","Model")
print(FBRLR_Model_Coef, rownames=FALSE)

##                  Estimates     Coefficients Model
## (Intercept)      -3.345993      (Intercept) FBRLR
## Age_24           -1.539877           Age_24 FBRLR
## Contraceptive    -1.014937    Contraceptive FBRLR
## Condom           -2.048751           Condom FBRLR
## LubricatedCondom -1.214703 LubricatedCondom FBRLR
## Spermicide        3.019690       Spermicide FBRLR
## Diaphragm        10.409803        Diaphragm FBRLR

##################################
# Computing the model predictions
##################################
(FBRLR_Model_Probabilities <- predict(FBRLR_Model, 
                              type = c("response")))

##   [1] 0.4191404846 0.9934065218 0.9934065218 0.9934065218 0.9934065218
##   [6] 0.9934065218 0.9934065218 0.9934065218 0.9934065218 0.9934065218
##  [11] 0.9934065218 0.9934065218 0.9934065218 0.9934065218 0.9934065218
##  [16] 0.9996760990 0.9996760990 0.9996760990 0.9781263548 0.9781263548
##  [21] 0.9781263548 0.9781263548 0.9781263548 0.9781263548 0.9781263548
##  [26] 0.9781263548 0.9781263548 0.9781263548 0.9989095199 0.9989095199
##  [31] 0.9989095199 0.9989095199 0.9989095199 0.9989095199 0.9989095199
##  [36] 0.9989095199 0.9989095199 0.9989095199 0.9989095199 0.9989095199
##  [41] 0.9989095199 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [46] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [51] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [56] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [61] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [66] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [71] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [76] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [81] 0.9976450392 0.9976450392 0.9976450392 0.9976450392 0.9976450392
##  [86] 0.9976450392 0.9976450392 0.9976450392 0.9820159482 0.9820159482
##  [91] 0.9820159482 0.9820159482 0.9820159482 0.9820159482 0.9820159482
##  [96] 0.9820159482 0.9820159482 0.9820159482 0.9820159482 0.9820159482
## [101] 0.9820159482 0.9820159482 0.9820159482 0.9820159482 0.9991068050
## [106] 0.9991068050 0.9418828221 0.9418828221 0.9418828221 0.9418828221
## [111] 0.9418828221 0.9418828221 0.9418828221 0.9969969205 0.9969969205
## [116] 0.9969969205 0.9699733189 0.9699733189 0.9984911116 0.9984911116
## [121] 0.9055506591 0.9949342246 0.9891103889 0.9891103889 0.9891103889
## [126] 0.9891103889 0.9891103889 0.9213074590 0.9213074590 0.9213074590
## [131] 0.4191404846 0.0045198613 0.0045198613 0.0850948364 0.0013457643
## [136] 0.0013457643 0.0013457643 0.0013457643 0.0013457643 0.0013457643
## [141] 0.0013457643 0.0013457643 0.0268634795 0.0268634795 0.0268634795
## [146] 0.0268634795 0.0268634795 0.0268634795 0.0268634795 0.0268634795
## [151] 0.0268634795 0.0268634795 0.0268634795 0.0268634795 0.0268634795
## [156] 0.0268634795 0.0268634795 0.0268634795 0.0268634795 0.0268634795
## [161] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [166] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [171] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [176] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [181] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [186] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [191] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [196] 0.0126055879 0.0126055879 0.0126055879 0.0126055879 0.0126055879
## [201] 0.0126055879 0.0126055879 0.2073062892 0.0016428466 0.0004881580
## [206] 0.0004881580 0.0004881580 0.0004881580 0.0004881580 0.0099056649
## [211] 0.0099056649 0.0099056649 0.0099056649 0.0099056649 0.0099056649
## [216] 0.0002888474 0.0002888474 0.0058839046 0.0058839046 0.0058839046
## [221] 0.0058839046 0.0027297654 0.0027297654 0.0027297654 0.0027297654
## [226] 0.0027297654 0.0027297654 0.0027297654 0.0027297654 0.0027297654
## [231] 0.0027297654 0.0027297654 0.0027297654 0.0027297654 0.0027297654
## [236] 0.0530949852 0.0003526939 0.0021405061 0.0021405061

##################################
# Creating a classification index
# based from the model predictions
##################################
(FBRLR_Model_Indices <- predict(FBRLR_Model, 
                           type = c("link")))

##   [1] -0.3263028  5.0150590  5.0150590  5.0150590  5.0150590  5.0150590
##   [7]  5.0150590  5.0150590  5.0150590  5.0150590  5.0150590  5.0150590
##  [13]  5.0150590  5.0150590  5.0150590  8.0347488  8.0347488  8.0347488
##  [19]  3.8003564  3.8003564  3.8003564  3.8003564  3.8003564  3.8003564
##  [25]  3.8003564  3.8003564  3.8003564  3.8003564  6.8200462  6.8200462
##  [31]  6.8200462  6.8200462  6.8200462  6.8200462  6.8200462  6.8200462
##  [37]  6.8200462  6.8200462  6.8200462  6.8200462  6.8200462  6.0488735
##  [43]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [49]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [55]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [61]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [67]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [73]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [79]  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735  6.0488735
##  [85]  6.0488735  6.0488735  6.0488735  6.0488735  4.0001222  4.0001222
##  [91]  4.0001222  4.0001222  4.0001222  4.0001222  4.0001222  4.0001222
##  [97]  4.0001222  4.0001222  4.0001222  4.0001222  4.0001222  4.0001222
## [103]  4.0001222  4.0001222  7.0198120  7.0198120  2.7854196  2.7854196
## [109]  2.7854196  2.7854196  2.7854196  2.7854196  2.7854196  5.8051094
## [115]  5.8051094  5.8051094  3.4751822  3.4751822  6.4948720  6.4948720
## [121]  2.2604796  5.2801694  4.5089967  4.5089967  4.5089967  4.5089967
## [127]  4.5089967  2.4602454  2.4602454  2.4602454 -0.3263028 -5.3947439
## [133] -5.3947439 -2.3750541 -6.6094465 -6.6094465 -6.6094465 -6.6094465
## [139] -6.6094465 -6.6094465 -6.6094465 -6.6094465 -3.5897567 -3.5897567
## [145] -3.5897567 -3.5897567 -3.5897567 -3.5897567 -3.5897567 -3.5897567
## [151] -3.5897567 -3.5897567 -3.5897567 -3.5897567 -3.5897567 -3.5897567
## [157] -3.5897567 -3.5897567 -3.5897567 -3.5897567 -4.3609294 -4.3609294
## [163] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294
## [169] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294
## [175] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294
## [181] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294
## [187] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294
## [193] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294 -4.3609294
## [199] -4.3609294 -4.3609294 -4.3609294 -4.3609294 -1.3412395 -6.4096806
## [205] -7.6243832 -7.6243832 -7.6243832 -7.6243832 -7.6243832 -4.6046934
## [211] -4.6046934 -4.6046934 -4.6046934 -4.6046934 -4.6046934 -8.1493232
## [217] -8.1493232 -5.1296334 -5.1296334 -5.1296334 -5.1296334 -5.9008061
## [223] -5.9008061 -5.9008061 -5.9008061 -5.9008061 -5.9008061 -5.9008061
## [229] -5.9008061 -5.9008061 -5.9008061 -5.9008061 -5.9008061 -5.9008061
## [235] -5.9008061 -2.8811163 -7.9495574 -6.1445702 -6.1445702

max(FBRLR_Model_Indices)

## [1] 8.034749

min(FBRLR_Model_Indices)

## [1] -8.149323

##################################
# Consolidating the model probabilities
# and classification index
# based from the model predictions
##################################
FBRLR_Model_Predictions <- as.data.frame(PMA_PreModelling_Train_FBRLR)
FBRLR_Model_Predictions$FBRLR_Prob <- FBRLR_Model_Probabilities
FBRLR_Model_Predictions$FBRLR_LP <- FBRLR_Model_Indices
FBRLR_Model_Predictions$UTI <- as.factor(FBRLR_Model_Predictions$UTI)
FBRLR_Model_Predictions$Label <- rep("FBRLR",nrow(FBRLR_Model_Predictions))

##################################
# Formulating the probability curve
# using the consolidated model predictions
##################################
FBRLR_Model_Predictions %>%
  ggplot(aes(x = FBRLR_LP ,
             y = FBRLR_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-10,10), breaks=seq(-10,10,by=1)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  ggtitle("Estimated UTI Probabilities Based on Classification Index : Firth's Bias-Reduced Logistic Regression") +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="top")

##################################
# Formulating the corresponding
# receiver operating characteristic (ROC) curve
# using the model predictions
##################################
FBRLR_Prob_Low  <- FBRLR_Model_Predictions[FBRLR_Model_Predictions$UTI==0,
                                    c("FBRLR_Prob")]
FBRLR_Prob_High <- FBRLR_Model_Predictions[FBRLR_Model_Predictions$UTI==1,
                                    c("FBRLR_Prob")]
FBRLR_Model_Prob_ROC <- roc.curve(scores.class1 = FBRLR_Prob_Low,
                               scores.class0 = FBRLR_Prob_High,
                                curve = TRUE)

plot(FBRLR_Model_Prob_ROC,
     xlab="1-Specificity", 
     ylab="Sensitivity",
     main="ROC Curve of the UTI Probabilities : Firth's Bias-Reduced Logistic Regression", 
     color=TRUE, 
     lwd=8,
     legend=3)

1.5.3 Firth’s Logistic Regression With Added Covariate (FLAC)

Firth’s Logistic Regression With Added Covariate implements Firth’s bias-reduced penalized-likelihood logistic regression with the addition of covariates. The modified score equations to estimate the coefficients for Firth’s logistic regression can be interpreted as score equations for maximum likelihood estimates for an augmented data set. This dataset is created by complementing each original observation with two weighted pseudo-observations with unchanged covariate values. The model attempts to discriminate between the original and pseudo-observations in the alternative formulation of Firth’s estimation as an iterative data augmentation procedure.

[A] The Firth’s logistic regression with added covariate model from the logistf package was implemented. The UTI response was regressed against the Age_24, Contraceptive, Condom, LubricatedCondom, Spermicide and Diaphragm predictors.

[B] The model contains 1 hyperparameter:
     [B.1] maxit = maximum number of iterations for the penalized-likelihood logistic regression held constant at a value of 1000

[C] The model was able to sufficiently compensate for the quasi-complete condition simulated for the Diaphragm predictor, resulting in reasonably valid estimated coefficients and standard errors.
     [C.1] Intercept coefficient = -3.302
     [C.2] Age_24 coefficient = -1.487
     [C.3] Contraceptive coefficient = -1.042
     [C.4] Condom coefficient = -2.115
     [C.5] LubricatedCondom coefficient = -1.231
     [C.6] Spermicide coefficient = +3.115
     [C.7] Diaphragm coefficient = +10.507

[D] The logistic curve formulated by plotting the predicted probabilities against the classification index using the logit values showed a valid logistic profile with reasonably distributed predicted points.

[E] The performance of the model is summarized as follows:
     [E.1] Final model configuration involves maxit=1000.
     [E.2] Apparent ROC Curve AUC = 0.99996
     [E.3] Estimated probabilities were reasonably distributed across a range of values.

Code Chunk | Output

##################################
# Creating a local object
# for the train set
##################################
PMA_PreModelling_Train_FLAC <- UTI

##################################
# Formulating the structure of the
# Firth's Logistic Regression With Added Covariate model
##################################
FLAC_Model <- flac(UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + Spermicide + Diaphragm,
                       data = PMA_PreModelling_Train_FLAC,
                       control=logistf.control(maxit = 10000))

##################################
# Consolidating the model results
##################################
summary(FLAC_Model)

## Firth's logistic regression with added covariate
## 
## flac.default(formula = UTI ~ Age_24 + Contraceptive + Condom + 
##     LubricatedCondom + Spermicide + Diaphragm, data = PMA_PreModelling_Train_FLAC, 
##     control = logistf.control(maxit = 10000))
## 
## Model fitted by Standard ML
## Coefficients:
##                       coef se(coef) lower 0.95  upper 0.95     Chisq          p
## (Intercept)      -3.302290 1.782586  -7.760055  0.09953485 3.6234507 0.05697059
## Age_24           -1.487618 1.497439  -4.646277  1.36417805 1.0134856 0.31406924
## Contraceptive    -1.042335 1.302286  -4.111464  1.31085064 0.7064407 0.40062796
## Condom           -2.115792 1.778198  -6.237843  1.15049726 1.5751351 0.20946284
## LubricatedCondom -1.231671 1.508294  -4.369073  2.00803070 0.6514910 0.41958010
## Spermicide        3.115554 1.757872  -0.114788  7.55438640 3.5591064 0.05921961
## Diaphragm        10.507036 1.920601   7.523219 15.41416178       Inf 0.00000000
##                  method
## (Intercept)           2
## Age_24                2
## Contraceptive         2
## Condom                2
## LubricatedCondom      2
## Spermicide            2
## Diaphragm             2
## 
## Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
## 
## Likelihood ratio test=305.0325 on 6 df, p=0, n=239
## Wald test = 44.74216 on 6 df, p = 5.266057e-08

FLAC_Model_Coef <- (as.data.frame(FLAC_Model$coefficients))
FLAC_Model_Coef$Coef <- rownames(FLAC_Model_Coef)
FLAC_Model_Coef$Model <- rep("FLAC",nrow(FLAC_Model_Coef))
colnames(FLAC_Model_Coef) <- c("Estimates","Coefficients","Model")
print(FLAC_Model_Coef, rownames=FALSE)

##                  Estimates     Coefficients Model
## (Intercept)      -3.302290      (Intercept)  FLAC
## Age_24           -1.487618           Age_24  FLAC
## Contraceptive    -1.042335    Contraceptive  FLAC
## Condom           -2.115792           Condom  FLAC
## LubricatedCondom -1.231671 LubricatedCondom  FLAC
## Spermicide        3.115554       Spermicide  FLAC
## Diaphragm        10.507036        Diaphragm  FLAC

##################################
# Computing the model predictions
##################################
(FLAC_Model_Probabilities <- predict(FLAC_Model, 
                              type = c("response")))

##   [1] 0.4534510426 0.9938733034 0.9938733034 0.9938733034 0.9938733034
##   [6] 0.9938733034 0.9938733034 0.9938733034 0.9938733034 0.9938733034
##  [11] 0.9938733034 0.9938733034 0.9938733034 0.9938733034 0.9938733034
##  [16] 0.9997266564 0.9997266564 0.9997266564 0.9793117349 0.9793117349
##  [21] 0.9793117349 0.9793117349 0.9793117349 0.9793117349 0.9793117349
##  [26] 0.9793117349 0.9793117349 0.9793117349 0.9990638862 0.9990638862
##  [31] 0.9990638862 0.9990638862 0.9990638862 0.9990638862 0.9990638862
##  [36] 0.9990638862 0.9990638862 0.9990638862 0.9990638862 0.9990638862
##  [41] 0.9990638862 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [46] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [51] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [56] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [61] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [66] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [71] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [76] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [81] 0.9978972638 0.9978972638 0.9978972638 0.9978972638 0.9978972638
##  [86] 0.9978972638 0.9978972638 0.9978972638 0.9828189634 0.9828189634
##  [91] 0.9828189634 0.9828189634 0.9828189634 0.9828189634 0.9828189634
##  [96] 0.9828189634 0.9828189634 0.9828189634 0.9828189634 0.9828189634
## [101] 0.9828189634 0.9828189634 0.9828189634 0.9828189634 0.9992252323
## [106] 0.9992252323 0.9434782637 0.9434782637 0.9434782637 0.9434782637
## [111] 0.9434782637 0.9434782637 0.9434782637 0.9973498929 0.9973498929
## [116] 0.9973498929 0.9734375625 0.9734375625 0.9987911656 0.9987911656
## [121] 0.9144846628 0.9958695031 0.9907587369 0.9907587369 0.9907587369
## [126] 0.9907587369 0.9907587369 0.9281758381 0.9281758381 0.9281758381
## [131] 0.4534510426 0.0044160565 0.0044160565 0.0909137438 0.0012926686
## [136] 0.0012926686 0.0012926686 0.0012926686 0.0012926686 0.0012926686
## [141] 0.0012926686 0.0012926686 0.0283546611 0.0283546611 0.0283546611
## [146] 0.0283546611 0.0283546611 0.0283546611 0.0283546611 0.0283546611
## [151] 0.0283546611 0.0283546611 0.0283546611 0.0283546611 0.0283546611
## [156] 0.0283546611 0.0283546611 0.0283546611 0.0283546611 0.0283546611
## [161] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [166] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [171] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [176] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [181] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [186] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [191] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [196] 0.0128101384 0.0128101384 0.0128101384 0.0128101384 0.0128101384
## [201] 0.0128101384 0.0128101384 0.2263439047 0.0015616995 0.0004562158
## [206] 0.0004562158 0.0004562158 0.0004562158 0.0004562158 0.0101856780
## [211] 0.0101856780 0.0101856780 0.0101856780 0.0101856780 0.0101856780
## [216] 0.0002923193 0.0002923193 0.0065493547 0.0065493547 0.0065493547
## [221] 0.0065493547 0.0029229231 0.0029229231 0.0029229231 0.0029229231
## [226] 0.0029229231 0.0029229231 0.0029229231 0.0029229231 0.0029229231
## [231] 0.0029229231 0.0029229231 0.0029229231 0.0029229231 0.0029229231
## [236] 0.0619956518 0.0003532306 0.0023193340 0.0023193340

##################################
# Creating a classification index
# based from the model predictions
##################################
(FLAC_Model_Indices <- predict(FLAC_Model, 
                           type = c("link")))

##   [1] -0.1867366  5.0889540  5.0889540  5.0889540  5.0889540  5.0889540
##   [7]  5.0889540  5.0889540  5.0889540  5.0889540  5.0889540  5.0889540
##  [13]  5.0889540  5.0889540  5.0889540  8.2045076  8.2045076  8.2045076
##  [19]  3.8572834  3.8572834  3.8572834  3.8572834  3.8572834  3.8572834
##  [25]  3.8572834  3.8572834  3.8572834  3.8572834  6.9728369  6.9728369
##  [31]  6.9728369  6.9728369  6.9728369  6.9728369  6.9728369  6.9728369
##  [37]  6.9728369  6.9728369  6.9728369  6.9728369  6.9728369  6.1624109
##  [43]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [49]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [55]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [61]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [67]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [73]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [79]  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109  6.1624109
##  [85]  6.1624109  6.1624109  6.1624109  6.1624109  4.0466187  4.0466187
##  [91]  4.0466187  4.0466187  4.0466187  4.0466187  4.0466187  4.0466187
##  [97]  4.0466187  4.0466187  4.0466187  4.0466187  4.0466187  4.0466187
## [103]  4.0466187  4.0466187  7.1621722  7.1621722  2.8149480  2.8149480
## [109]  2.8149480  2.8149480  2.8149480  2.8149480  2.8149480  5.9305016
## [115]  5.9305016  5.9305016  3.6013356  3.6013356  6.7168891  6.7168891
## [121]  2.3696650  5.4852185  4.6747925  4.6747925  4.6747925  4.6747925
## [127]  4.6747925  2.5590003  2.5590003  2.5590003 -0.1867366 -5.4180823
## [133] -5.4180823 -2.3025288 -6.6497530 -6.6497530 -6.6497530 -6.6497530
## [139] -6.6497530 -6.6497530 -6.6497530 -6.6497530 -3.5341994 -3.5341994
## [145] -3.5341994 -3.5341994 -3.5341994 -3.5341994 -3.5341994 -3.5341994
## [151] -3.5341994 -3.5341994 -3.5341994 -3.5341994 -3.5341994 -3.5341994
## [157] -3.5341994 -3.5341994 -3.5341994 -3.5341994 -4.3446255 -4.3446255
## [163] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255
## [169] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255
## [175] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255
## [181] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255
## [187] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255
## [193] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255 -4.3446255
## [199] -4.3446255 -4.3446255 -4.3446255 -4.3446255 -1.2290719 -6.4604177
## [205] -7.6920883 -7.6920883 -7.6920883 -7.6920883 -7.6920883 -4.5765348
## [211] -4.5765348 -4.5765348 -4.5765348 -4.5765348 -4.5765348 -8.1373714
## [217] -8.1373714 -5.0218179 -5.0218179 -5.0218179 -5.0218179 -5.8322439
## [223] -5.8322439 -5.8322439 -5.8322439 -5.8322439 -5.8322439 -5.8322439
## [229] -5.8322439 -5.8322439 -5.8322439 -5.8322439 -5.8322439 -5.8322439
## [235] -5.8322439 -2.7166903 -7.9480361 -6.0641532 -6.0641532

max(FLAC_Model_Indices)

## [1] 8.204508

min(FLAC_Model_Indices)

## [1] -8.137371

##################################
# Consolidating the model probabilities
# and classification index
# based from the model predictions
##################################
FLAC_Model_Predictions <- as.data.frame(PMA_PreModelling_Train_FLAC)
FLAC_Model_Predictions$FLAC_Prob <- FLAC_Model_Probabilities
FLAC_Model_Predictions$FLAC_LP <- FLAC_Model_Indices
FLAC_Model_Predictions$UTI <- as.factor(FLAC_Model_Predictions$UTI)
FLAC_Model_Predictions$Label <- rep("FLAC",nrow(FLAC_Model_Predictions))

##################################
# Formulating the probability curve
# using the consolidated model predictions
##################################
FLAC_Model_Predictions %>%
  ggplot(aes(x = FLAC_LP ,
             y = FLAC_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-10,10), breaks=seq(-10,10,by=1)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  ggtitle("Estimated UTI Probabilities Based on Classification Index : Firth's Logistic Regression With Added Covariate") +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="top")

##################################
# Formulating the corresponding
# receiver operating characteristic (ROC) curve
# using the model predictions
##################################
FLAC_Prob_Low  <- FLAC_Model_Predictions[FLAC_Model_Predictions$UTI==0,
                                    c("FLAC_Prob")]
FLAC_Prob_High <- FLAC_Model_Predictions[FLAC_Model_Predictions$UTI==1,
                                    c("FLAC_Prob")]
FLAC_Model_Prob_ROC <- roc.curve(scores.class1 = FLAC_Prob_Low,
                               scores.class0 = FLAC_Prob_High,
                                curve = TRUE)

plot(FLAC_Model_Prob_ROC,
     xlab="1-Specificity", 
     ylab="Sensitivity",
     main="ROC Curve of the UTI Probabilities : Firth's Logistic Regression With Added Covariate", 
     color=TRUE, 
     lwd=8,
     legend=3)

1.5.4 Firth’s Logistic Regression With Intercept Correction (FLIC)

Firth’s Logistic Regression With Intercept Correction implements Firth’s bias-reduced penalized-likelihood logistic regression by applying corrections on the intercept. The average predicted probabilities in Firth’s logistic regression is generally not equal to the observed proportion of events by being pushed towards one-half compared with maximum likelihood estimation. FLIC first applies Firth’s logistic regression and then corrects the intercept such that the predicted probabilities become unbiased while keeping all other coefficients constant.

[A] The Firth’s logistic regression with intercept correction model from the logistf package was implemented. The UTI response was regressed against the Age_24, Contraceptive, Condom, LubricatedCondom, Spermicide and Diaphragm predictors.

[B] The model contains 1 hyperparameter:
     [B.1] maxit = maximum number of iterations for the penalized-likelihood logistic regression held constant at a value of 1000

[C] The model was able to sufficiently compensate for the quasi-complete condition simulated for the Diaphragm predictor, resulting in reasonably valid estimated coefficients and standard errors.
     [C.1] Intercept coefficient = -3.274
     [C.2] Age_24 coefficient = -1.539
     [C.3] Contraceptive coefficient = -1.014
     [C.4] Condom coefficient = -2.048
     [C.5] LubricatedCondom coefficient = -1.214
     [C.6] Spermicide coefficient = +3.019
     [C.7] Diaphragm coefficient = +10.409

[D] The logistic curve formulated by plotting the predicted probabilities against the classification index using the logit values showed a valid logistic profile with reasonably distributed predicted points.

[E] The performance of the model is summarized as follows:
     [E.1] Final model configuration involves maxit=1000.
     [E.2] Apparent ROC Curve AUC = 0.99996
     [E.3] Estimated probabilities were reasonably distributed across a range of values.

Code Chunk | Output

##################################
# Creating a local object
# for the train set
##################################
PMA_PreModelling_Train_FLIC <- UTI

##################################
# Formulating the structure of the
# Firth's Logistic Regression With Intercept Correction model
##################################
FLIC_Model <- flic(UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + Spermicide + Diaphragm,
                       data = PMA_PreModelling_Train_FLIC,
                       control=logistf.control(maxit = 10000))

##################################
# Consolidating the model results
##################################
summary(FLIC_Model)

## Firth's logistic regression with intercept correction
## 
## flic.default(formula = UTI ~ Age_24 + Contraceptive + Condom + 
##     LubricatedCondom + Spermicide + Diaphragm, data = PMA_PreModelling_Train_FLIC, 
##     control = logistf.control(maxit = 10000))
## 
## Model fitted by Penalized ML
## Coefficients:
##                       coef  se(coef) lower 0.95 upper 0.95      Chisq
## (Intercept)      -3.274102 0.6801791  -6.859687  0.3114838 37.0547096
## Age_24           -1.539877 0.2894472 -10.674608  6.0754390  0.0322714
## Contraceptive    -1.014937 0.2432554 -10.851515  2.2734388  0.3216179
## Condom           -2.048751 0.2402000  -8.554568  2.9532245  0.5499878
## LubricatedCondom -1.214703 0.2497985  -9.510967  4.3357916  0.2212636
## Spermicide        3.019690 0.2665983  -5.362560  8.5737577  0.0000000
## Diaphragm        10.409803 0.6658350   7.071215 17.0061765        Inf
##                             p method
## (Intercept)      1.148608e-09      1
## Age_24           8.574333e-01      2
## Contraceptive    5.706369e-01      2
## Condom           4.583227e-01      2
## LubricatedCondom 6.380788e-01      2
## Spermicide       1.000000e+00      2
## Diaphragm        0.000000e+00      2
## 
## Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
## 
## Likelihood ratio test=300.1144 on 6 df, p=0, n=239
## Wald test = 3885.076 on 6 df, p = 0

FLIC_Model_Coef <- (as.data.frame(FLIC_Model$coefficients))
FLIC_Model_Coef$Coef <- rownames(FLIC_Model_Coef)
FLIC_Model_Coef$Model <- rep("FLIC",nrow(FLIC_Model_Coef))
colnames(FLIC_Model_Coef) <- c("Estimates","Coefficients","Model")
print(FLIC_Model_Coef, rownames=FALSE)

##                  Estimates     Coefficients Model
## (Intercept)      -3.274102      (Intercept)  FLIC
## Age_24           -1.539877           Age_24  FLIC
## Contraceptive    -1.014937    Contraceptive  FLIC
## Condom           -2.048751           Condom  FLIC
## LubricatedCondom -1.214703 LubricatedCondom  FLIC
## Spermicide        3.019690       Spermicide  FLIC
## Diaphragm        10.409803        Diaphragm  FLIC

##################################
# Computing the model predictions
##################################
(FLIC_Model_Probabilities <- predict(FLIC_Model, 
                              type = c("response")))

##   [1] 0.4367378818 0.9938610876 0.9938610876 0.9938610876 0.9938610876
##   [6] 0.9938610876 0.9938610876 0.9938610876 0.9938610876 0.9938610876
##  [11] 0.9938610876 0.9938610876 0.9938610876 0.9938610876 0.9938610876
##  [16] 0.9996985605 0.9996985605 0.9996985605 0.9796127423 0.9796127423
##  [21] 0.9796127423 0.9796127423 0.9796127423 0.9796127423 0.9796127423
##  [26] 0.9796127423 0.9796127423 0.9796127423 0.9989850871 0.9989850871
##  [31] 0.9989850871 0.9989850871 0.9989850871 0.9989850871 0.9989850871
##  [36] 0.9989850871 0.9989850871 0.9989850871 0.9989850871 0.9989850871
##  [41] 0.9989850871 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [46] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [51] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [56] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [61] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [66] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [71] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [76] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [81] 0.9978080391 0.9978080391 0.9978080391 0.9978080391 0.9978080391
##  [86] 0.9978080391 0.9978080391 0.9978080391 0.9832425530 0.9832425530
##  [91] 0.9832425530 0.9832425530 0.9832425530 0.9832425530 0.9832425530
##  [96] 0.9832425530 0.9832425530 0.9832425530 0.9832425530 0.9832425530
## [101] 0.9832425530 0.9832425530 0.9832425530 0.9832425530 0.9991687122
## [106] 0.9991687122 0.9456953430 0.9456953430 0.9456953430 0.9456953430
## [111] 0.9456953430 0.9456953430 0.9456953430 0.9972046545 0.9972046545
## [116] 0.9972046545 0.9719978717 0.9719978717 0.9985956325 0.9985956325
## [121] 0.9115227016 0.9952839679 0.9898581137 0.9898581137 0.9898581137
## [126] 0.9898581137 0.9898581137 0.9263642127 0.9263642127 0.9263642127
## [131] 0.4367378818 0.0048551274 0.0048551274 0.0908613275 0.0014459300
## [136] 0.0014459300 0.0014459300 0.0014459300 0.0014459300 0.0014459300
## [141] 0.0014459300 0.0014459300 0.0288081480 0.0288081480 0.0288081480
## [146] 0.0288081480 0.0288081480 0.0288081480 0.0288081480 0.0288081480
## [151] 0.0288081480 0.0288081480 0.0288081480 0.0288081480 0.0288081480
## [156] 0.0288081480 0.0288081480 0.0288081480 0.0288081480 0.0288081480
## [161] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [166] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [171] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [176] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [181] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [186] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [191] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [196] 0.0135324694 0.0135324694 0.0135324694 0.0135324694 0.0135324694
## [201] 0.0135324694 0.0135324694 0.2193687712 0.0017650851 0.0005245252
## [206] 0.0005245252 0.0005245252 0.0005245252 0.0005245252 0.0106361610
## [211] 0.0106361610 0.0106361610 0.0106361610 0.0106361610 0.0106361610
## [216] 0.0003103708 0.0003103708 0.0063197079 0.0063197079 0.0063197079
## [221] 0.0063197079 0.0029326402 0.0029326402 0.0029326402 0.0029326402
## [226] 0.0029326402 0.0029326402 0.0029326402 0.0029326402 0.0029326402
## [231] 0.0029326402 0.0029326402 0.0029326402 0.0029326402 0.0029326402
## [236] 0.0568276830 0.0003789730 0.0022996884 0.0022996884

##################################
# Creating a classification index
# based from the model predictions
##################################
(FLIC_Model_Indices <- predict(FLIC_Model, 
                           type = c("link")))

##          1          2          3          4          5          6          7 
## -0.2544119  5.0869499  5.0869499  5.0869499  5.0869499  5.0869499  5.0869499 
##          8          9         10         11         12         13         14 
##  5.0869499  5.0869499  5.0869499  5.0869499  5.0869499  5.0869499  5.0869499 
##         15         16         17         18         19         20         21 
##  5.0869499  8.1066397  8.1066397  8.1066397  3.8722473  3.8722473  3.8722473 
##         22         23         24         25         26         27         28 
##  3.8722473  3.8722473  3.8722473  3.8722473  3.8722473  3.8722473  3.8722473 
##         29         30         31         32         33         34         35 
##  6.8919371  6.8919371  6.8919371  6.8919371  6.8919371  6.8919371  6.8919371 
##         36         37         38         39         40         41         42 
##  6.8919371  6.8919371  6.8919371  6.8919371  6.8919371  6.8919371  6.1207644 
##         43         44         45         46         47         48         49 
##  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644 
##         50         51         52         53         54         55         56 
##  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644 
##         57         58         59         60         61         62         63 
##  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644 
##         64         65         66         67         68         69         70 
##  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644 
##         71         72         73         74         75         76         77 
##  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644 
##         78         79         80         81         82         83         84 
##  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644  6.1207644 
##         85         86         87         88         89         90         91 
##  6.1207644  6.1207644  6.1207644  6.1207644  4.0720131  4.0720131  4.0720131 
##         92         93         94         95         96         97         98 
##  4.0720131  4.0720131  4.0720131  4.0720131  4.0720131  4.0720131  4.0720131 
##         99        100        101        102        103        104        105 
##  4.0720131  4.0720131  4.0720131  4.0720131  4.0720131  4.0720131  7.0917029 
##        106        107        108        109        110        111        112 
##  7.0917029  2.8573105  2.8573105  2.8573105  2.8573105  2.8573105  2.8573105 
##        113        114        115        116        117        118        119 
##  2.8573105  5.8770003  5.8770003  5.8770003  3.5470731  3.5470731  6.5667629 
##        120        121        122        123        124        125        126 
##  6.5667629  2.3323705  5.3520603  4.5808876  4.5808876  4.5808876  4.5808876 
##        127        128        129        130        131        132        133 
##  4.5808876  2.5321363  2.5321363  2.5321363 -0.2544119 -5.3228530 -5.3228530 
##        134        135        136        137        138        139        140 
## -2.3031632 -6.5375556 -6.5375556 -6.5375556 -6.5375556 -6.5375556 -6.5375556 
##        141        142        143        144        145        146        147 
## -6.5375556 -6.5375556 -3.5178658 -3.5178658 -3.5178658 -3.5178658 -3.5178658 
##        148        149        150        151        152        153        154 
## -3.5178658 -3.5178658 -3.5178658 -3.5178658 -3.5178658 -3.5178658 -3.5178658 
##        155        156        157        158        159        160        161 
## -3.5178658 -3.5178658 -3.5178658 -3.5178658 -3.5178658 -3.5178658 -4.2890385 
##        162        163        164        165        166        167        168 
## -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 
##        169        170        171        172        173        174        175 
## -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 
##        176        177        178        179        180        181        182 
## -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 
##        183        184        185        186        187        188        189 
## -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 
##        190        191        192        193        194        195        196 
## -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 
##        197        198        199        200        201        202        203 
## -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -4.2890385 -1.2693487 
##        204        205        206        207        208        209        210 
## -6.3377898 -7.5524924 -7.5524924 -7.5524924 -7.5524924 -7.5524924 -4.5328025 
##        211        212        213        214        215        216        217 
## -4.5328025 -4.5328025 -4.5328025 -4.5328025 -4.5328025 -8.0774323 -8.0774323 
##        218        219        220        221        222        223        224 
## -5.0577425 -5.0577425 -5.0577425 -5.0577425 -5.8289152 -5.8289152 -5.8289152 
##        225        226        227        228        229        230        231 
## -5.8289152 -5.8289152 -5.8289152 -5.8289152 -5.8289152 -5.8289152 -5.8289152 
##        232        233        234        235        236        237        238 
## -5.8289152 -5.8289152 -5.8289152 -5.8289152 -2.8092254 -7.8776665 -6.0726793 
##        239 
## -6.0726793

max(FLIC_Model_Indices)

## [1] 8.10664

min(FLIC_Model_Indices)

## [1] -8.077432

##################################
# Consolidating the model probabilities
# and classification index
# based from the model predictions
##################################
FLIC_Model_Predictions <- as.data.frame(PMA_PreModelling_Train_FLIC)
FLIC_Model_Predictions$FLIC_Prob <- FLIC_Model_Probabilities
FLIC_Model_Predictions$FLIC_LP <- FLIC_Model_Indices
FLIC_Model_Predictions$UTI <- as.factor(FLIC_Model_Predictions$UTI)
FLIC_Model_Predictions$Label <- rep("FLIC",nrow(FLIC_Model_Predictions))

##################################
# Formulating the probability curve
# using the consolidated model predictions
##################################
FLIC_Model_Predictions %>%
  ggplot(aes(x = FLIC_LP ,
             y = FLIC_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-10,10), breaks=seq(-10,10,by=1)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  ggtitle("Estimated UTI Probabilities Based on Classification Index : Firth's Logistic Regression With Intercept Correction") +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="top")

##################################
# Formulating the corresponding
# receiver operating characteristic (ROC) curve
# using the model predictions
##################################
FLIC_Prob_Low  <- FLIC_Model_Predictions[FLIC_Model_Predictions$UTI==0,
                                    c("FLIC_Prob")]
FLIC_Prob_High <- FLIC_Model_Predictions[FLIC_Model_Predictions$UTI==1,
                                    c("FLIC_Prob")]
FLIC_Model_Prob_ROC <- roc.curve(scores.class1 = FLIC_Prob_Low,
                               scores.class0 = FLIC_Prob_High,
                                curve = TRUE)

plot(FLIC_Model_Prob_ROC,
     xlab="1-Specificity", 
     ylab="Sensitivity",
     main="ROC Curve of the UTI Probabilities : Firth's Logistic Regression With Intercept Correction", 
     color=TRUE, 
     lwd=8,
     legend=3)

1.5.5 Bayesian Generalized Linear Model With Cauchy Priors (BGLM_CP)

Bayesian Generalized Linear Model With Cauchy Priors implements a Bayesian function for generalized linear modelling with independent Cauchy prior distributions inflated by a factor of 2.5 for each coefficient. An approximate expectation-maximization algorithm is applied to update the coefficients at each step using an augmented regression to represent the prior information.

[A] The Bayesian generalized linear model with Cauchy priors model from the arm package was implemented. The UTI response was regressed against the Age_24, Contraceptive, Condom, LubricatedCondom, Spermicide and Diaphragm predictors.

[B] The model does not contain any hyperparameter.

[C] The model was able to sufficiently compensate for the quasi-complete condition simulated for the Diaphragm predictor, resulting in reasonably valid estimated coefficients and standard errors.
     [C.1] Intercept coefficient = -3.051
     [C.2] Age_24 coefficient = -0.489
     [C.3] Contraceptive coefficient = -1.935
     [C.4] Condom coefficient = -1.510
     [C.5] LubricatedCondom coefficient = -1.306
     [C.6] Spermicide coefficient = +1.540
     [C.7] Diaphragm coefficient = +12.604

[D] The logistic curve formulated by plotting the predicted probabilities against the classification index using the logit values showed a valid logistic profile with reasonably distributed predicted points.

[E] The performance of the model is summarized as follows:
     [E.1] Final model configuration is fixed due to the absence of a hyperparameter.
     [E.2] Apparent ROC Curve AUC = 0.99996
     [E.3] Estimated probabilities were reasonably distributed across a range of values.

Code Chunk | Output

##################################
# Creating a local object
# for the train set
##################################
PMA_PreModelling_Train_BGLM_CP <- UTI

##################################
# Formulating the structure of the
# Bayesian Generalized Linear Model With Cauchy Priors model
##################################
BGLM_CP_Model <- bayesglm(UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + Spermicide + Diaphragm,
                       data = PMA_PreModelling_Train_BGLM_CP,
                       family=binomial(link="logit"))

##################################
# Consolidating the model results
##################################
summary(BGLM_CP_Model)

## 
## Call:
## bayesglm(formula = UTI ~ Age_24 + Contraceptive + Condom + LubricatedCondom + 
##     Spermicide + Diaphragm, family = binomial(link = "logit"), 
##     data = PMA_PreModelling_Train_BGLM_CP)
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.051      1.924  -1.586    0.113    
## Age_24             -0.489      1.753  -0.279    0.780    
## Contraceptive      -1.935      1.797  -1.077    0.281    
## Condom             -1.510      1.766  -0.855    0.392    
## LubricatedCondom   -1.306      1.743  -0.750    0.453    
## Spermicide          1.540      1.730   0.891    0.373    
## Diaphragm          12.604      2.860   4.407 1.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 329.4768  on 238  degrees of freedom
## Residual deviance:   5.6929  on 232  degrees of freedom
## AIC: 19.693
## 
## Number of Fisher Scoring iterations: 28

BGLM_CP_Model_Coef <- (as.data.frame(BGLM_CP_Model$coefficients))
BGLM_CP_Model_Coef$Coef <- rownames(BGLM_CP_Model_Coef)
BGLM_CP_Model_Coef$Model <- rep("BGLM_CP",nrow(BGLM_CP_Model_Coef))
colnames(BGLM_CP_Model_Coef) <- c("Estimates","Coefficients","Model")
print(BGLM_CP_Model_Coef, rownames=FALSE)

##                   Estimates     Coefficients   Model
## (Intercept)      -3.0513140      (Intercept) BGLM_CP
## Age_24           -0.4890448           Age_24 BGLM_CP
## Contraceptive    -1.9351325    Contraceptive BGLM_CP
## Condom           -1.5103889           Condom BGLM_CP
## LubricatedCondom -1.3064324 LubricatedCondom BGLM_CP
## Spermicide        1.5402574       Spermicide BGLM_CP
## Diaphragm        12.6043657        Diaphragm BGLM_CP

##################################
# Computing the model predictions
##################################
(BGLM_CP_Model_Probabilities <- predict(BGLM_CP_Model, 
                              type = c("response")))

##            1            2            3            4            5            6 
## 0.1807822594 0.9996786514 0.9996786514 0.9996786514 0.9996786514 0.9996786514 
##            7            8            9           10           11           12 
## 0.9996786514 0.9996786514 0.9996786514 0.9996786514 0.9996786514 0.9996786514 
##           13           14           15           16           17           18 
## 0.9996786514 0.9996786514 0.9996786514 0.9999311093 0.9999311093 0.9999311093 
##           19           20           21           22           23           24 
## 0.9988142938 0.9988142938 0.9988142938 0.9988142938 0.9988142938 0.9988142938 
##           25           26           27           28           29           30 
## 0.9988142938 0.9988142938 0.9988142938 0.9988142938 0.9997456355 0.9997456355 
##           31           32           33           34           35           36 
## 0.9997456355 0.9997456355 0.9997456355 0.9997456355 0.9997456355 0.9997456355 
##           37           38           39           40           41           42 
## 0.9997456355 0.9997456355 0.9997456355 0.9997456355 0.9997456355 0.9995086779 
##           43           44           45           46           47           48 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           49           50           51           52           53           54 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           55           56           57           58           59           60 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           61           62           63           64           65           66 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           67           68           69           70           71           72 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           73           74           75           76           77           78 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           79           80           81           82           83           84 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9995086779 
##           85           86           87           88           89           90 
## 0.9995086779 0.9995086779 0.9995086779 0.9995086779 0.9977789026 0.9977789026 
##           91           92           93           94           95           96 
## 0.9977789026 0.9977789026 0.9977789026 0.9977789026 0.9977789026 0.9977789026 
##           97           98           99          100          101          102 
## 0.9977789026 0.9977789026 0.9977789026 0.9977789026 0.9977789026 0.9977789026 
##          103          104          105          106          107          108 
## 0.9977789026 0.9977789026 0.9995231291 0.9995231291 0.9918463122 0.9918463122 
##          109          110          111          112          113          114 
## 0.9918463122 0.9918463122 0.9918463122 0.9918463122 0.9918463122 0.9982411861 
##          115          116          117          118          119          120 
## 0.9982411861 0.9982411861 0.9994760645 0.9994760645 0.9998876608 0.9998876608 
##          121          122          123          124          125          126 
## 0.9980678455 0.9995852596 0.9991990209 0.9991990209 0.9991990209 0.9991990209 
##          127          128          129          130          131          132 
## 0.9991990209 0.9963829954 0.9963829954 0.9963829954 0.1807822594 0.0103363031 
##          133          134          135          136          137          138 
## 0.0103363031 0.0464663861 0.0028201661 0.0028201661 0.0028201661 0.0028201661 
##          139          140          141          142          143          144 
## 0.0028201661 0.0028201661 0.0028201661 0.0028201661 0.0130236654 0.0130236654 
##          145          146          147          148          149          150 
## 0.0130236654 0.0130236654 0.0130236654 0.0130236654 0.0130236654 0.0130236654 
##          151          152          153          154          155          156 
## 0.0130236654 0.0130236654 0.0130236654 0.0130236654 0.0130236654 0.0130236654 
##          157          158          159          160          161          162 
## 0.0130236654 0.0130236654 0.0130236654 0.0130236654 0.0067835607 0.0067835607 
##          163          164          165          166          167          168 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 
##          169          170          171          172          173          174 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 
##          175          176          177          178          179          180 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 
##          181          182          183          184          185          186 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 
##          187          188          189          190          191          192 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 
##          193          194          195          196          197          198 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0067835607 
##          199          200          201          202          203          204 
## 0.0067835607 0.0067835607 0.0067835607 0.0067835607 0.0308827134 0.0015059333 
##          205          206          207          208          209          210 
## 0.0004082315 0.0004082315 0.0004082315 0.0004082315 0.0004082315 0.0019018768 
##          211          212          213          214          215          216 
## 0.0019018768 0.0019018768 0.0019018768 0.0019018768 0.0019018768 0.0017312477 
##          217          218          219          220          221          222 
## 0.0017312477 0.0080266998 0.0080266998 0.0080266998 0.0080266998 0.0041707030 
##          223          224          225          226          227          228 
## 0.0041707030 0.0041707030 0.0041707030 0.0041707030 0.0041707030 0.0041707030 
##          229          230          231          232          233          234 
## 0.0041707030 0.0041707030 0.0041707030 0.0041707030 0.0041707030 0.0041707030 
##          235          236          237          238          239 
## 0.0041707030 0.0191665924 0.0009239944 0.0011671120 0.0011671120

##################################
# Creating a classification index
# based from the model predictions
##################################
(BGLM_CP_Model_Indices <- predict(BGLM_CP_Model, 
                           type = c("link")))

##         1         2         3         4         5         6         7         8 
## -1.511057  8.042663  8.042663  8.042663  8.042663  8.042663  8.042663  8.042663 
##         9        10        11        12        13        14        15        16 
##  8.042663  8.042663  8.042663  8.042663  8.042663  8.042663  8.042663  9.582920 
##        17        18        19        20        21        22        23        24 
##  9.582920  9.582920  6.736230  6.736230  6.736230  6.736230  6.736230  6.736230 
##        25        26        27        28        29        30        31        32 
##  6.736230  6.736230  6.736230  6.736230  8.276488  8.276488  8.276488  8.276488 
##        33        34        35        36        37        38        39        40 
##  8.276488  8.276488  8.276488  8.276488  8.276488  8.276488  8.276488  8.276488 
##        41        42        43        44        45        46        47        48 
##  8.276488  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919 
##        49        50        51        52        53        54        55        56 
##  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919 
##        57        58        59        60        61        62        63        64 
##  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919 
##        65        66        67        68        69        70        71        72 
##  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919 
##        73        74        75        76        77        78        79        80 
##  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919 
##        81        82        83        84        85        86        87        88 
##  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919  7.617919 
##        89        90        91        92        93        94        95        96 
##  6.107530  6.107530  6.107530  6.107530  6.107530  6.107530  6.107530  6.107530 
##        97        98        99       100       101       102       103       104 
##  6.107530  6.107530  6.107530  6.107530  6.107530  6.107530  6.107530  6.107530 
##       105       106       107       108       109       110       111       112 
##  7.647788  7.647788  4.801098  4.801098  4.801098  4.801098  4.801098  4.801098 
##       113       114       115       116       117       118       119       120 
##  4.801098  6.341355  6.341355  6.341355  7.553618  7.553618  9.093875  9.093875 
##       121       122       123       124       125       126       127       128 
##  6.247186  7.787443  7.128874  7.128874  7.128874  7.128874  7.128874  5.618486 
##       129       130       131       132       133       134       135       136 
##  5.618486  5.618486 -1.511057 -4.561703 -4.561703 -3.021446 -5.868135 -5.868135 
##       137       138       139       140       141       142       143       144 
## -5.868135 -5.868135 -5.868135 -5.868135 -5.868135 -5.868135 -4.327878 -4.327878 
##       145       146       147       148       149       150       151       152 
## -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 
##       153       154       155       156       157       158       159       160 
## -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 -4.327878 
##       161       162       163       164       165       166       167       168 
## -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 
##       169       170       171       172       173       174       175       176 
## -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 
##       177       178       179       180       181       182       183       184 
## -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 
##       185       186       187       188       189       190       191       192 
## -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 
##       193       194       195       196       197       198       199       200 
## -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 -4.986446 
##       201       202       203       204       205       206       207       208 
## -4.986446 -4.986446 -3.446189 -6.496835 -7.803268 -7.803268 -7.803268 -7.803268 
##       209       210       211       212       213       214       215       216 
## -7.803268 -6.263010 -6.263010 -6.263010 -6.263010 -6.263010 -6.263010 -6.357180 
##       217       218       219       220       221       222       223       224 
## -6.357180 -4.816923 -4.816923 -4.816923 -4.816923 -5.475491 -5.475491 -5.475491 
##       225       226       227       228       229       230       231       232 
## -5.475491 -5.475491 -5.475491 -5.475491 -5.475491 -5.475491 -5.475491 -5.475491 
##       233       234       235       236       237       238       239 
## -5.475491 -5.475491 -5.475491 -3.935234 -6.985880 -6.752055 -6.752055

max(BGLM_CP_Model_Indices)

## [1] 9.58292

min(BGLM_CP_Model_Indices)

## [1] -7.803268

##################################
# Consolidating the model probabilities
# and classification index
# based from the model predictions
##################################
BGLM_CP_Model_Predictions <- as.data.frame(PMA_PreModelling_Train_BGLM_CP)
BGLM_CP_Model_Predictions$BGLM_CP_Prob <- BGLM_CP_Model_Probabilities
BGLM_CP_Model_Predictions$BGLM_CP_LP <- BGLM_CP_Model_Indices
BGLM_CP_Model_Predictions$UTI <- as.factor(BGLM_CP_Model_Predictions$UTI)
BGLM_CP_Model_Predictions$Label <- rep("BGLM_CP",nrow(BGLM_CP_Model_Predictions))

##################################
# Formulating the probability curve
# using the consolidated model predictions
##################################
BGLM_CP_Model_Predictions %>%
  ggplot(aes(x = BGLM_CP_LP ,
             y = BGLM_CP_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-10,10), breaks=seq(-10,10,by=1)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  ggtitle("Estimated UTI Probabilities Based on Classification Index : Bayesian Generalized Linear Model With Cauchy Priors") +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="top")

##################################
# Formulating the corresponding
# receiver operating characteristic (ROC) curve
# using the model predictions
##################################
BGLM_CP_Prob_Low  <- BGLM_CP_Model_Predictions[BGLM_CP_Model_Predictions$UTI==0,
                                    c("BGLM_CP_Prob")]
BGLM_CP_Prob_High <- BGLM_CP_Model_Predictions[BGLM_CP_Model_Predictions$UTI==1,
                                    c("BGLM_CP_Prob")]
BGLM_CP_Model_Prob_ROC <- roc.curve(scores.class1 = BGLM_CP_Prob_Low,
                               scores.class0 = BGLM_CP_Prob_High,
                                curve = TRUE)

plot(BGLM_CP_Model_Prob_ROC,
     xlab="1-Specificity", 
     ylab="Sensitivity",
     main="ROC Curve of the UTI Probabilities : Bayesian Generalized Linear Model With Cauchy Priors", 
     color=TRUE, 
     lwd=8,
     legend=3)

1.5.6 Penalized Logistic Regression - Ridge (PLR_R)

Ridge Penalized Logistic Regression implements a shrinkage estimation method to deal with infinite estimates by pulling, on average, the original estimates towards zero. The algorithm penalizes the log-likelihood by subtracting a multiple of the sum of the squared coefficients while excluding the intercept from the sum, which is equivalent to penalization using normal prior densities.

[A] The penalized logistic regression - ridge model from the glmnet package was implemented. The UTI response was regressed against the Age_24, Contraceptive, Condom, LubricatedCondom, Spermicide and Diaphragm predictors.

[B] The model contains 1 hyperparameter:
     [B.1] lambda = shrinkage parameter made to vary across a range of values equal to 0.05 to 490

[C] The model was able to sufficiently compensate for the quasi-complete condition simulated for the Diaphragm predictor, resulting in reasonably valid estimated coefficients and standard errors.
     [C.1] Intercept coefficient = -1.719
     [C.2] Age_24 coefficient = -0.363
     [C.3] Contraceptive coefficient = -0.185
     [C.4] Condom coefficient = +0.324
     [C.5] LubricatedCondom coefficient = -0.472
     [C.6] Spermicide coefficient = -0.149
     [C.7] Diaphragm coefficient = +4.129

[D] The logistic curve formulated by plotting the predicted probabilities against the classification index using the logit values showed a a slightly kinked logistic profile with reasonably distributed predicted points.

[E] The performance of the model is summarized as follows:
     [E.1] Final model configuration involves lambda=0.05.
     [E.2] Apparent ROC Curve AUC = 0.99912
     [E.3] Estimated probabilities were reasonably distributed across a range of values.

Code Chunk | Output

##################################
# Creating a local object
# for the train set
##################################
PMA_PreModelling_Train_PLR_R <- UTI
PLR_R_Response   <- PMA_PreModelling_Train_PLR_R[,c("UTI")]
PLR_R_Predictors <- model.matrix(UTI~.,
                                  PMA_PreModelling_Train_PLR_R)[,2:ncol(PMA_PreModelling_Train_PLR_R)]

##################################
# Conducting hyperparameter tuning
# of the lambda parameter
# using cross-validation
##################################
(PLR_R_LambdaCV <- cv.glmnet(PLR_R_Predictors,
                            PLR_R_Response,
                            family = "binomial",
                            alpha = 0))

## 
## Call:  cv.glmnet(x = PLR_R_Predictors, y = PLR_R_Response, family = "binomial",      alpha = 0) 
## 
## Measure: Binomial Deviance 
## 
##      Lambda Index Measure      SE Nonzero
## min 0.04939   100  0.2442 0.01824       6
## 1se 0.05420    99  0.2580 0.01770       6

PLR_R_LambdaCV$lambda

##   [1] 493.88644807 450.01093723 410.03320585 373.60698594 340.41677102
##   [6] 310.17508331 282.61998378 257.51280336 234.63607564 213.79165337
##  [11] 194.79899212 177.49358655 161.72554551 147.35829378 134.26738910
##  [16] 122.33944431 111.47114526 101.56835594  92.54530313  84.32383347
##  [21]  76.83273651  70.00712796  63.78788766  58.12114752  52.95782497
##  [26]  48.25319775  43.96651666  40.06065249  36.50177454  33.25905749
##  [31]  30.30441449  27.61225383  25.15925730  22.92417822  20.88765741
##  [36]  19.03205550  17.34130014  15.80074683  14.39705203  13.11805761
##  [41]  11.95268553  10.89084189   9.92332951   9.04176827   8.23852249
##  [46]   7.50663486   6.83976611   6.23214014   5.67849399   5.17403223
##  [51]   4.71438547   4.29557246   3.91396564   3.56625972   3.24944303
##  [56]   2.96077147   2.69774469   2.45808449   2.23971504   2.04074493
##  [61]   1.85945077   1.69426229   1.54374870   1.40660632   1.28164729
##  [66]   1.16778928   1.06404610   0.96951917   0.88338976   0.80491185
##  [71]   0.73340570   0.66825196   0.60888630   0.55479452   0.50550810
##  [76]   0.46060015   0.41968171   0.38239834   0.34842713   0.31747382
##  [81]   0.28927032   0.26357234   0.24015730   0.21882239   0.19938281
##  [86]   0.18167019   0.16553111   0.15082579   0.13742684   0.12521822
##  [91]   0.11409418   0.10395836   0.09472299   0.08630806   0.07864069
##  [96]   0.07165447   0.06528888   0.05948880   0.05420398   0.04938864

plot(PLR_R_LambdaCV)

(PLR_R_MinLambda <- PLR_R_LambdaCV$lambda.min)

## [1] 0.04938864

##################################
# Formulating the structure of the
# Penalized Logistic Regression - Ridge model
##################################
PLR_R_Model <- glmnet(PLR_R_Predictors,
                      PLR_R_Response,
                      alpha = 0,
                      family = "binomial",
                      lambda = PLR_R_MinLambda)

##################################
# Consolidating the model results
##################################
summary(PLR_R_Model)

##            Length Class     Mode     
## a0         1      -none-    numeric  
## beta       6      dgCMatrix S4       
## df         1      -none-    numeric  
## dim        2      -none-    numeric  
## lambda     1      -none-    numeric  
## dev.ratio  1      -none-    numeric  
## nulldev    1      -none-    numeric  
## npasses    1      -none-    numeric  
## jerr       1      -none-    numeric  
## offset     1      -none-    logical  
## classnames 2      -none-    character
## call       6      -none-    call     
## nobs       1      -none-    numeric

coef(PLR_R_Model)

## 7 x 1 sparse Matrix of class "dgCMatrix"
##                          s0
## (Intercept)      -1.7196340
## Age_24           -0.3633108
## Contraceptive    -0.1858195
## Condom            0.3240962
## LubricatedCondom -0.4720128
## Spermicide       -0.1491506
## Diaphragm         4.1293922

PLR_R_Model_Coef <- (as.data.frame(coef(PLR_R_Model)[1:7]))
PLR_R_Model_Coef$Coef <- rownames(coef(PLR_R_Model))
PLR_R_Model_Coef$Model <- rep("PLR_R",nrow(PLR_R_Model_Coef))
colnames(PLR_R_Model_Coef) <- c("Estimates","Coefficients","Model")
print(PLR_R_Model_Coef, rownames=FALSE)

##    Estimates     Coefficients Model
## 1 -1.7196340      (Intercept) PLR_R
## 2 -0.3633108           Age_24 PLR_R
## 3 -0.1858195    Contraceptive PLR_R
## 4  0.3240962           Condom PLR_R
## 5 -0.4720128 LubricatedCondom PLR_R
## 6 -0.1491506       Spermicide PLR_R
## 7  4.1293922        Diaphragm PLR_R

##################################
# Computing the model predictions
##################################
(PLR_R_Model_Probabilities <- PLR_R_Model %>% 
   predict(PLR_R_Predictors, type = c("response")) %>% 
   as.vector())

##   [1] 0.13368241 0.93899500 0.93899500 0.93899500 0.93899500 0.93899500
##   [7] 0.93899500 0.93899500 0.93899500 0.93899500 0.93899500 0.93899500
##  [13] 0.93899500 0.93899500 0.93899500 0.92987063 0.92987063 0.92987063
##  [19] 0.90566709 0.90566709 0.90566709 0.90566709 0.90566709 0.90566709
##  [25] 0.90566709 0.90566709 0.90566709 0.90566709 0.89213057 0.89213057
##  [31] 0.89213057 0.89213057 0.89213057 0.89213057 0.89213057 0.89213057
##  [37] 0.89213057 0.89213057 0.89213057 0.89213057 0.89213057 0.90237871
##  [43] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [49] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [55] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [61] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [67] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [73] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [79] 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871 0.90237871
##  [85] 0.90237871 0.90237871 0.90237871 0.90237871 0.92744138 0.92744138
##  [91] 0.92744138 0.92744138 0.92744138 0.92744138 0.92744138 0.92744138
##  [97] 0.92744138 0.92744138 0.92744138 0.92744138 0.92744138 0.92744138
## [103] 0.92744138 0.92744138 0.91674218 0.91674218 0.88855071 0.88855071
## [109] 0.88855071 0.88855071 0.88855071 0.88855071 0.88855071 0.87290273
## [115] 0.87290273 0.87290273 0.91455335 0.91455335 0.90215422 0.90215422
## [121] 0.86972515 0.85187460 0.86537012 0.86537012 0.86537012 0.86537012
## [127] 0.86537012 0.89886931 0.89886931 0.89886931 0.13368241 0.19852515
## [133] 0.19852515 0.17585474 0.13382539 0.13382539 0.13382539 0.13382539
## [139] 0.13382539 0.13382539 0.13382539 0.13382539 0.11746052 0.11746052
## [145] 0.11746052 0.11746052 0.11746052 0.11746052 0.11746052 0.11746052
## [151] 0.11746052 0.11746052 0.11746052 0.11746052 0.11746052 0.11746052
## [157] 0.11746052 0.11746052 0.11746052 0.11746052 0.12949249 0.12949249
## [163] 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249
## [169] 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249
## [175] 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249
## [181] 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249
## [187] 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249
## [193] 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249 0.12949249
## [199] 0.12949249 0.12949249 0.12949249 0.12949249 0.11358798 0.17060333
## [205] 0.11371229 0.11371229 0.11371229 0.11371229 0.11371229 0.09952435
## [211] 0.09952435 0.09952435 0.09952435 0.09952435 0.09952435 0.09701315
## [217] 0.09701315 0.08470963 0.08470963 0.08470963 0.08470963 0.09374314
## [223] 0.09374314 0.09374314 0.09374314 0.09374314 0.09374314 0.09374314
## [229] 0.09374314 0.09374314 0.09374314 0.09374314 0.09374314 0.09374314
## [235] 0.09374314 0.08181675 0.12513591 0.07137008 0.07137008

##################################
# Creating a classification index
# based from the model predictions
##################################
(PLR_R_Model_Indices <- PLR_R_Model %>% 
   predict(PLR_R_Predictors, type = c("link")) %>% 
   as.vector())

##   [1] -1.868785  2.733854  2.733854  2.733854  2.733854  2.733854  2.733854
##   [8]  2.733854  2.733854  2.733854  2.733854  2.733854  2.733854  2.733854
##  [15]  2.733854  2.584704  2.584704  2.584704  2.261842  2.261842  2.261842
##  [22]  2.261842  2.261842  2.261842  2.261842  2.261842  2.261842  2.261842
##  [29]  2.112691  2.112691  2.112691  2.112691  2.112691  2.112691  2.112691
##  [36]  2.112691  2.112691  2.112691  2.112691  2.112691  2.112691  2.223939
##  [43]  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939
##  [50]  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939
##  [57]  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939
##  [64]  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939
##  [71]  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939
##  [78]  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939  2.223939
##  [85]  2.223939  2.223939  2.223939  2.223939  2.548035  2.548035  2.548035
##  [92]  2.548035  2.548035  2.548035  2.548035  2.548035  2.548035  2.548035
##  [99]  2.548035  2.548035  2.548035  2.548035  2.548035  2.548035  2.398884
## [106]  2.398884  2.076022  2.076022  2.076022  2.076022  2.076022  2.076022
## [113]  2.076022  1.926871  1.926871  1.926871  2.370544  2.370544  2.221393
## [120]  2.221393  1.898531  1.749380  1.860628  1.860628  1.860628  1.860628
## [127]  1.860628  2.184724  2.184724  2.184724 -1.868785 -1.395538 -1.395538
## [134] -1.544688 -1.867551 -1.867551 -1.867551 -1.867551 -1.867551 -1.867551
## [141] -1.867551 -1.867551 -2.016701 -2.016701 -2.016701 -2.016701 -2.016701
## [148] -2.016701 -2.016701 -2.016701 -2.016701 -2.016701 -2.016701 -2.016701
## [155] -2.016701 -2.016701 -2.016701 -2.016701 -2.016701 -2.016701 -1.905454
## [162] -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454
## [169] -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454
## [176] -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454
## [183] -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454
## [190] -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454
## [197] -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -1.905454 -2.054604
## [204] -1.581357 -2.053370 -2.053370 -2.053370 -2.053370 -2.053370 -2.202521
## [211] -2.202521 -2.202521 -2.202521 -2.202521 -2.202521 -2.230861 -2.230861
## [218] -2.380012 -2.380012 -2.380012 -2.380012 -2.268764 -2.268764 -2.268764
## [225] -2.268764 -2.268764 -2.268764 -2.268764 -2.268764 -2.268764 -2.268764
## [232] -2.268764 -2.268764 -2.268764 -2.268764 -2.417915 -1.944668 -2.565832
## [239] -2.565832

max(PLR_R_Model_Indices)

## [1] 2.733854

min(PLR_R_Model_Indices)

## [1] -2.565832

##################################
# Consolidating the model probabilities
# and classification index
# based from the model predictions
##################################
PLR_R_Model_Predictions <- as.data.frame(PMA_PreModelling_Train_PLR_R)
PLR_R_Model_Predictions$PLR_R_Prob <- PLR_R_Model_Probabilities
PLR_R_Model_Predictions$PLR_R_LP <- PLR_R_Model_Indices
PLR_R_Model_Predictions$UTI <- as.factor(PLR_R_Model_Predictions$UTI)
PLR_R_Model_Predictions$Label <- rep("PLR_R",nrow(PLR_R_Model_Predictions))

##################################
# Formulating the probability curve
# using the consolidated model predictions
##################################
PLR_R_Model_Predictions %>%
  ggplot(aes(x = PLR_R_LP ,
             y = PLR_R_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-3,3), breaks=seq(-3,3,by=1)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  ggtitle("Estimated UTI Probabilities Based on Classification Index : Penalized Logistic Regression - Ridge") +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="top")

##################################
# Formulating the corresponding
# receiver operating characteristic (ROC) curve
# using the model predictions
##################################
PLR_R_Prob_Low  <- PLR_R_Model_Predictions[PLR_R_Model_Predictions$UTI==0,
                                    c("PLR_R_Prob")]
PLR_R_Prob_High <- PLR_R_Model_Predictions[PLR_R_Model_Predictions$UTI==1,
                                    c("PLR_R_Prob")]
PLR_R_Model_Prob_ROC <- roc.curve(scores.class1 = PLR_R_Prob_Low,
                               scores.class0 = PLR_R_Prob_High,
                                curve = TRUE)

plot(PLR_R_Model_Prob_ROC,
     xlab="1-Specificity", 
     ylab="Sensitivity",
     main="ROC Curve of the UTI Probabilities : Penalized Logistic Regression - Ridge", 
     color=TRUE, 
     lwd=8,
     legend=3)

1.6 Consolidated Findings

[A] The robust logistic regression models which were able to sufficiently compensate for the quasi-complete condition through stable coefficient estimates and standard errors, including valid logistic profiles with reasonably distributed predicted points are the following :
     [A.1] FBRLR : Firth’s Bias-Reduced Logistic Regression (logistf package)
     [A.2] FLAC : Firth’s Logistic Regression With Added Covariate (logistf package)
     [A.3] FLIC : Firth’s Logistic Regression With Intercept Correction (logistf package)
     [A.4] BGLM_CP : Bayesian Generalized Linear Model With Cauchy Priors (arm package)

Code Chunk | Output

##################################
# Replotting the logistic curves
##################################
LR_LogisticCurvePlot <- LR_Model_Predictions %>%
  ggplot(aes(x = LR_LP ,
             y = LR_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  facet_grid(. ~ Label) +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="none")

FBRLR_LogisticCurvePlot <- FBRLR_Model_Predictions %>%
  ggplot(aes(x = FBRLR_LP ,
             y = FBRLR_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  facet_grid(. ~ Label) +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="none")

FLAC_LogisticCurvePlot <- FLAC_Model_Predictions %>%
  ggplot(aes(x = FLAC_LP ,
             y = FLAC_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  facet_grid(. ~ Label) +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="none")

FLIC_LogisticCurvePlot <- FLIC_Model_Predictions %>%
  ggplot(aes(x = FLIC_LP ,
             y = FLIC_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  facet_grid(. ~ Label) +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="none")

BGLM_CP_LogisticCurvePlot <- BGLM_CP_Model_Predictions %>%
  ggplot(aes(x = BGLM_CP_LP ,
             y = BGLM_CP_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  facet_grid(. ~ Label) +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="none")

PLR_R_LogisticCurvePlot <- PLR_R_Model_Predictions %>%
  ggplot(aes(x = PLR_R_LP ,
             y = PLR_R_Prob,
             color = UTI)) +
  scale_colour_manual(values=c("#1846BA55","#B8000055")) +
  geom_point(size=5) +
  geom_line(color="black") +
  xlab("UTI Classification Index (Logit Values)") +
  ylab("Estimated UTI Probability") +
  labs(color = "UTI") +
  scale_x_continuous( limits=c(-50,50), breaks=seq(-50,50,by=5)) +
  scale_y_continuous( limits=c(0,1), breaks=seq(0,1,by=0.1),labels = scales::percent) +
  facet_grid(. ~ Label) +
  theme_bw() +
  theme(plot.title = element_text(color="black", size=14, face="bold", hjust=0.50),
        axis.title.x = element_text(color="black", size=12, face="bold"),
        axis.title.y = element_text(color="black", size=12, face="bold"),
        legend.position="none")

RLR_LogisticCurvePlot <- ggarrange(LR_LogisticCurvePlot,
                                   FBRLR_LogisticCurvePlot,
                                   FLAC_LogisticCurvePlot,
                                   FLIC_LogisticCurvePlot,
                                   BGLM_CP_LogisticCurvePlot,
                                   PLR_R_LogisticCurvePlot,
                                   ncol=2, nrow=3)

annotate_figure(RLR_LogisticCurvePlot,
                top = text_grob("Estimated UTI Probabilities Based on Classification Index", 
                                color = "black", 
                                face = "bold", 
                                size = 14))

##################################
# Formulating the logistic regression model 
# coefficient comparison plot
##################################
RLRCoefficient_Summary <- rbind(LR_Model_Coef,
                                FBRLR_Model_Coef,
                                FLAC_Model_Coef,
                                FLIC_Model_Coef,
                                BGLM_CP_Model_Coef,
                                PLR_R_Model_Coef)

RLRCoefficient_Summary <- as.data.frame(RLRCoefficient_Summary)

RLRCoefficient_Summary$Estimates <- as.numeric(as.character(RLRCoefficient_Summary$Estimates))
RLRCoefficient_Summary$Coefficients <- factor(RLRCoefficient_Summary$Coefficients,
                                        levels = c("Diaphragm",
                                                   "Spermicide",
                                                   "LubricatedCondom",
                                                   "Condom",
                                                   "Contraceptive",
                                                   "Age_24",
                                                   "(Intercept)"))
RLRCoefficient_Summary$Model <- factor(RLRCoefficient_Summary$Model,
                                        levels = c("PLR_R",
                                                   "BGLM_CP",
                                                   "FLIC",
                                                   "FLAC",
                                                   "FBRLR",
                                                   "LR"))

print(RLRCoefficient_Summary, row.names=FALSE)

##    Estimates     Coefficients   Model
##   -2.8682847      (Intercept)      LR
##   -1.0518337           Age_24      LR
##  -21.8847038    Contraceptive      LR
##  -22.1197835           Condom      LR
##   -1.6865651 LubricatedCondom      LR
##    2.8682847       Spermicide      LR
##   71.2831767        Diaphragm      LR
##   -3.3459926      (Intercept)   FBRLR
##   -1.5398768           Age_24   FBRLR
##   -1.0149368    Contraceptive   FBRLR
##   -2.0487513           Condom   FBRLR
##   -1.2147026 LubricatedCondom   FBRLR
##    3.0196898       Spermicide   FBRLR
##   10.4098028        Diaphragm   FBRLR
##   -3.3022901      (Intercept)    FLAC
##   -1.4876184           Age_24    FLAC
##   -1.0423353    Contraceptive    FLAC
##   -2.1157922           Condom    FLAC
##   -1.2316706 LubricatedCondom    FLAC
##    3.1155536       Spermicide    FLAC
##   10.5070364        Diaphragm    FLAC
##   -3.2741017      (Intercept)    FLIC
##   -1.5398768           Age_24    FLIC
##   -1.0149368    Contraceptive    FLIC
##   -2.0487513           Condom    FLIC
##   -1.2147026 LubricatedCondom    FLIC
##    3.0196898       Spermicide    FLIC
##   10.4098028        Diaphragm    FLIC
##   -3.0513140      (Intercept) BGLM_CP
##   -0.4890448           Age_24 BGLM_CP
##   -1.9351325    Contraceptive BGLM_CP
##   -1.5103889           Condom BGLM_CP
##   -1.3064324 LubricatedCondom BGLM_CP
##    1.5402574       Spermicide BGLM_CP
##   12.6043657        Diaphragm BGLM_CP
##   -1.7196340      (Intercept)   PLR_R
##   -0.3633108           Age_24   PLR_R
##   -0.1858195    Contraceptive   PLR_R
##    0.3240962           Condom   PLR_R
##   -0.4720128 LubricatedCondom   PLR_R
##   -0.1491506       Spermicide   PLR_R
##    4.1293922        Diaphragm   PLR_R

(ROCCurveAUC_Plot <- dotplot(Model ~ Estimates | Coefficients,
                           data = RLRCoefficient_Summary,
                           main = "Model Coefficient Estimation Comparison",
                           ylab = "Model",
                           xlab = "Coefficient Estimates",
                           auto.key = list(adj=1),
                           type=c("p", "h"),       
                           origin = 0,
                           alpha = 0.45,
                           pch = 16,
                           cex = 2,
                           layout = c(3,3)))

2. Summary

3. References

[Book] Applied Predictive Modeling by Max Kuhn and Kjell Johnson
[Book] An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani
[Book] Multivariate Data Visualization with R by Deepayan Sarkar
[Book] Machine Learning by Samuel Jackson
[Book] Data Modeling Methods by Jacob Larget
[Book] Introduction to R and Statistics by University of Western Australia
[Book] Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson
[Book] Introduction to Regression Methods for Public Health Using R by Ramzi Nahhas
[Book] Introduction to Research Methods by Eric van Holm
[R Package] AppliedPredictiveModeling by Max Kuhn
[R Package] caret by Max Kuhn
[R Package] rpart by Terry Therneau and Beth Atkinson
[R Package] lattice by Deepayan Sarkar
[R Package] dplyr by Hadley Wickham
[R Package] moments by Lukasz Komsta and Frederick
[R Package] skimr by Elin Waring
[R Package] RANN by Sunil Arya, David Mount, Samuel Kemp and Gregory Jefferis
[R Package] corrplot by Taiyun Wei
[R Package] tidyverse by Hadley Wickham
[R Package] lares by Bernardo Lares
[R Package] DMwR2 by Luis Torgo
[R Package] gridExtra by Baptiste Auguie and Anton Antonov
[R Package] rattle by Graham Williams
[R Package] rpart.plot by Stephen Milborrow
[R Package] RColorBrewer by Erich Neuwirth
[R Package] stats by R Core Team
[R Package] pls by Kristian Hovde Liland
[R Package] nnet by Brian Ripley
[R Package] elasticnet by Hui Zou
[R Package] earth by Stephen Milborrow
[R Package] party by Torsten Hothorn
[R Package] kernlab by Alexandros Karatzoglou
[R Package] randomForest by Andy Liaw
[R Package] pROC by Xavier Robin
[R Package] mda by Trevor Hastie
[R Package] klaR by Christian Roever, Nils Raabe, Karsten Luebke, Uwe Ligges, Gero Szepannek, Marc Zentgraf and David Meyer
[R Package] pamr by Trevor Hastie, Rob Tibshirani, Balasubramanian Narasimhan and Gil Chu
[R Package] OptimalCutpoints by Monica Lopez-Raton and Maria Xose Rodriguez-Alvarez
[R Package] broom by Simon Couch
[R Package] PRROC by Jan Grau and Jens Keilwagen
[R Package] ggpubr by Alboukadel Kassambara
[R Package] Hmisc by Frank Harrell
[R Package] logistf by Georg Heinze
[R Package] arm by Yu-Sung Su
[R Package] glmnet by Trevor Hastie
[Article] The caret Package by Max Kuhn
[Article] A Short Introduction to the caret Package by Max Kuhn
[Article] Caret Package – A Practical Guide to Machine Learning in R by Selva Prabhakaran
[Article] Tuning Machine Learning Models Using the Caret R Package by Jason Brownlee
[Article] Lattice Graphs by Alboukadel Kassambara

[Article] What is Complete or Quasi-Complete Separation in Logistic/Probit Regression and How Do We Deal with Them by UCLA Advanced Research Computing Group
[Article] What Is Complete Separation in Binary Logistic Regression? by Eric Heckman
[Article] Separation and Convergence Issues in Logistic Regression by Cornell Statistical Consulting Unit
[Article] Firth Logistic Regression by Ken Kleinman
[Article] Penalized Logistic Regression Essentials in R: Ridge, Lasso and Elastic Net by Alboukadel Kassambara
[Article] Correcting the Quasi-complete Separation Issue in Logistic Regression Models by Xinghe Lu
[Article] An Introduction to glmnet by Trevor Hastie, Junyang Qian and Kenneth Tay
[Article] Choosing Hyperparameters in Penalized Regression by Florian Prive
[Publication] The Origins of Logistic Regression by JS Cramer (Econometrics eJournal)
[Publication] Separation in Logistic Regression: Causes, Consequences, and Control by Mohammad Ali Mansournia, Angelika Geroldinger, Sander Greenland and Georg Heinze (American Journal of Epidemiology)
[Publication] Dealing with Separation in Logistic Regression Models by Carlisle Rainey (Political Analysis)
[Publication] Bias Reduction of Maximum Likelihood Estimates by David Firth (Biometrika)
[Publication] A Solution to the Problem of Separation in Logistic Regression by Georg Heinze and Michael Schemper (Statistics in Medicine)
[Publication] A Comparative Investigation of Methods for Logistic Regression with Separated or Nearly Separated Data by Georg Heinze (Statistics in Medicine)
[Publication] Confidence Intervals After Multiple Imputation: Combining Profile Likelihood Information from Logistic Regressions by Georg Heinze, Meinhard Ploner and Jan Beyea (Statistics in Medicine)
[Publication] Firth’s Logistic Regression with Rare Events: Accurate Effect Estimates and Predictions? by Rainer Puhr, Georg Heinze, Mariana Nold, Lara Lusa and Angelika Geroldinger (Statistics in Medicine)
[Publication] A Method for Computing Profile-Likelihood Based Confidence Intervals by DJ Venzon and SH Moolgavkar (Applied Statistics)
[Publication] A Weakly Informative Default Prior Distribution For Logistic And Other Regression Models by Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau and Yu-Sung Su (The Annals of Applied Statistics)
[Publication] Ridge Regression: Biased Estimation for Non-Orthogonal Problems by Art Hoerl and Robert Kennard (Technometrics)
[Course] Applied Data Mining and Statistical Learning by Penn State Eberly College of Science

Supervised Learning : Exploring Robust Logistic Regression Models for Handling Quasi-Complete Separation

John Pauline Pineda

February 19, 2023

1. Table of Contents

1.1 Sample Data

1.2 Data Quality Assessment

1.3 Data Preprocessing

1.3.1 Pre-Processed Dataset

1.4 Data Exploration

1.5 Predictive Model Index Development and Probability Curve Estimation

1.5.1 Logistic Regression (LR)

1.5.2 Firth’s Bias-Reduced Logistic Regression (FBRLR)

1.5.3 Firth’s Logistic Regression With Added Covariate (FLAC)

1.5.4 Firth’s Logistic Regression With Intercept Correction (FLIC)

1.5.5 Bayesian Generalized Linear Model With Cauchy Priors (BGLM_CP)

1.5.6 Penalized Logistic Regression - Ridge (PLR_R)

1.6 Consolidated Findings

2. Summary

3. References