1. Table of Contents

1.1 Introduction

Chronic disease indicators (CDI) are a set of surveillance indicators developed by consensus among the Center for Disease Controls and Prevention (CDC), Council of State and Territorial Epidemiologists (CSTE), and National Association of Chronic Disease Directors (NACDD). CDI enables public health professionals and policymakers to retrieve uniformly defined state-level data for chronic diseases and risk factors that have a substantial impact on public health. These indicators are essential for surveillance, prioritization, and evaluation of public health interventions.

Using an open dataset from Data.World as primarily sourced from CDC, this study hypothesized that latent patterns are present among sufficiently correlated indicators encompassing multiple chronic disease topic areas between the different US states. A number of factor analysis models was formulated to explore and verify the relationship between these factors.

Subsequent analysis and modelling steps involving data understanding, data preparation, data exploration, model development, model validation and model presentation were individually detailed below, with all the results consolidated in a Summary provided at the end of the document.

1.1.1 Study Objectives

The main objective of the study is to explore and validate potential underlying factor structures and relationships among a set of observed US chronic disease indicators by identifying latent factors that explain the observed correlations and reducing the complexity of the data by grouping related indicators together under these discovered factors.

Specific objectives are given as follows:

[A] Obtain an optimal subset of observations and descriptors by conducting data quality assessment and applying preprocessing operations most suitable for the downstream analysis

[B] Develop exploratory factor analysis models using various extraction methods to estimate and identify potential underlying structures from observed descriptors and different rotation approaches in simplifying the derived factor structures to achieve a more interpretable pattern of factor loadings

[C] Select the final exploratory factor analysis model among candidates based on robust performance estimates

[D] Validate the relationship obtained from the final exploratory analysis model between the observed descriptors and their underlying latent constructs using confirmatory factor analysis

1.1.2 Outcome

There is no explicit endpoint for the study given the unsupervised learning nature of the analysis.

1.1.3 Descriptors

Detailed information for each individual descriptor used in the study are provided as follows:

[A] LDMORT (numeric): Liver Disease Mortality; Chronic liver disease mortality.

[B] ARTINC (numeric): Arthritis Incidence; Arthritis among adults aged >= 18 years.

[C] ASMORT (numeric): Asthma Mortality; Asthma mortality rate.

[D] PSMUSE (numeric): Papanicolaou Smear Use; Papanicolaou smear use among adult women aged 21-65 years.

[E] RDMORT (numeric): Renal Disease Mortality; Mortality with end-stage renal disease.

[F] PDMORT (numeric): Pulmonary Disease Mortality; Mortality with chronic obstructive pulmonary disease as underlying cause among adults aged >=45 years.

[G] CDMORT (numeric): Cardiovascular Disease Mortality; Mortality from total cardiovascular disease.

[H] HDMORT (numeric): Heart Disease Mortality; Mortality from diseases of the heart.

[I] STMORT (numeric): Stroke Mortality; Mortality from stroke.

[J] DBMORT (numeric): Diabetes Mortality; Mortality due to diabetes reported as any listed cause of death.

[K] INFVAC (numeric): Influenza Vaccination; Influenza vaccination among noninstitutionalized adults aged >= 18 years.

[L] MEUNDA (numeric): Mentally Unhealthy Days; Recent mentally unhealthy days among adults aged >= 18 years.

[M] OVWINC (numeric): Overweight Incidence; Overweight or obesity among adults aged >= 18 years.

[N] HLTINS (numeric): Health Insurance; Current lack of health insurance among adults aged 18-64 years.

[O] SMPREV (numeric): Smoking Prevalence; Current smoking among adults aged >= 18 years.

1.2 Methodology

1.2.1 Data Assessment

Preliminary data used in the study was evaluated and prepared for analysis and modelling using the following methods:

Kaiser-Meyer-Olkin Factor Adequacy evaluates whether the observed variables are suitable for exploratory factor analysis based on their common variance and the potential for extracting meaningful factors. The criterion is computed by examining the ratio of the sum of squared correlations between variables to the sum of squared partial correlations. A KMO value closer to 1 suggests that the variables have high shared variance and are suitable to proceed with the analysis.

Bartlett’s Test of Sphericity evaluates whether the correlations between variables in a data set are significant enough to support the assumption that underlying factors exist and can be extracted. The test calculates a Chi-Square statistic based on the differences between the observed correlation matrix and an identity matrix. The larger the Chi-Square value, there is more evidence against the null hypothesis stating that the correlation matrix is an identity matrix which indicates that the variables are uncorrelated and do not have any underlying structure.

Determinant Computation reflects the extent of multicollinearity among the variables in a correlation matrix. An extremely small determinant value indicates that the variables are highly correlated and nearly linearly dependent which can lead to unstable results and difficulty in interpreting their individual contributions.

Covariance Validity evaluates whether the ratio of associated variables in the data set are sufficient enough to support the assumption that correlations exist. The criterion is computed by determining the proportion of correlation coefficients with values of at least 30% between all pairs of variables in the data set. A value closer to 1 suggests an adequate percentage of pairwise-correlated variables.

1.2.2 Model Formulation

Exploratory Factor Analysis explores the underlying structure of a set of observed variables without specifying a pre-defined model. It allows for the identification of factors based on the patterns of correlations among observed variables. The method helps determine the number of factors that best explain the observed variance in the data, with the results used to decide how many factors to retain based on statistical criteria and contextual meaning. The analysis reveals the interrelationships among factors but does not test a specific model of factor structure. Factor loadings are derived - indicating how strongly each observed variable is associated with each factor. These loadings are interpreted to understand the structure of the data.

Confirmatory Factor Analysis tests a pre-specified model of the relationships between observed variables and latent factors. It aims to confirm or reject the fit of a hypothesized factor structure based on theoretical considerations or previous research. The method requires the specification of a pre-determined factor structure, including the number of factors, the relationships between factors, and the loadings of observed variables on those factors. The focus of the analysis is on evaluating how well the chosen model fits the observed data.

Principal Axes Factor Extraction identifies the underlying constructs that explain the observed correlations among variables by capturing both common variance (shared among variables) and unique variance (specific to each variable). This process potentially results to factors with lower communalities (explained variance) but with more direct interpretability. The algorithm performs eigenvalue decomposition on the correlation matrix. The eigenvalues represent the amount of variance explained by each eigenvector. Given a defined number of factors, loadings are calculated for each observed variable on each extracted factor. Factor loadings indicate the strength and direction of the relationship between variables and factors. Factors are interpreted based on the loading patterns. Variables with high loadings on a factor are strongly associated with the factor.

Maximum Likelihood Factor Extraction aims to estimate the factor loadings in a way that maximizes the likelihood of observing the given data, assuming a specific factor model. Given the correlation matrix, the algorithm formulates a likelihood function that represents the probability of observing the given data under an assumed factor model representing the relationships between the latent factors and observed variables. The likelihood function quantifies how well the model explains the observed data. Optimization techniques are applied to determine the factor loadings that maximize the likelihood function. The process involves iteratively adjusting the factor loadings to improve the fit between the model and the data. Factor loadings indicate the strength and direction of the relationship between variables and factors. Factors are interpreted based on the loading patterns. Variables with high loadings on a factor are strongly associated with the factor.

Varimax Rotation is an orthogonal rotation method which forces the rotated factors to be uncorrelated with each other, leading to simpler and more easily interpretable factor solutions. The algorithm aims to maximize the variance of the squared loadings within each factor which helps identify variables that are strongly associated with a single factor. The results are straightforward to interpret and can be particularly useful when the factors are expected to be independent.

Promax Rotation is an oblique rotation method which allows for more flexibility by accommodating the possibility of correlated factors. The algorithm aims to simplify the factor structure by both maximizing the variance of the squared loadings within each factor and allowing for correlated factors. It uses a more complex mathematical approach to find the optimal rotation that accounts for both variance and correlation. The results provide a more accurate representation of the underlying relationships when the factors are expected to be correlated.

1.2.3 Model Performance Evaluation

Tucker Lewis Index (TLI) is an incremental fit metric which assesses the goodness of fit of a proposed confirmatory factor analysis model to the observed data by comparing the fit of the specified model with the fit of a baseline model. The formula involves computing the ratio of the improvement in fit from the proposed model to the fit of the null model using non-normed measurements, while taking into account model complexity. Values may fall below zero or be above one with higher measurements indicating a better model fit.

Comparative Fit Index (CFI) is an incremental fit metric which which assesses the goodness of fit of a proposed confirmatory factor analysis model to the observed data by comparing the fit of the specified model with the fit of a baseline model. The formula involves computing the ratio of the improvement in fit from the proposed model to the fit of the null model using normed measurements, with relativere insensitivity to model complexity. The metric values range between 0 and 1 with higher measurements indicating a better model fit.

Standardized Root Mean Square of the Residual (SRMR) provides a measure of the average standardized residual, representing the average discrepancy between the observed and predicted covariance matrices. The metric is calculated by taking the square root of the average of the squared differences between the observed and predicted covariance matrices, standardized by the observed variable variances. Values range from 0 to 1, with lower values indicating better model fit. The SRMR is particularly sensitive to misspecifications in the model’s covariances, making it a useful index for assessing the fit of the covariance structure.

Root Mean Square Error of Approximation (RMSEA) provides a measure of the discrepancy between the observed and predicted covariances, adjusted for the complexity of the model by incorporating both fit and parsimony. Metric values range from 0 to positive infinity with lower values indicating better model fit. The RMSEA is considered a measure of “fit per degree of freedom,” providing a balance between model fit and model complexity. It focuses on the error of approximation, quantifying how well the model approximates the observed covariance structure, and penalizes complex models that do not significantly improve fit over simpler models.

Bayesian Information Criterion (BIC) is a statistical criterion used for model selection by balancing the goodness of fit of a model with its complexity, penalizing models with a larger number of parameters. In exploratory factor analysis where models with vary in terms of the numbers of factors or latent variables, BIC helps in comparing these models and selecting the one that achieves a good fit while avoiding overfitting. The metric is calculated using a combination of the maximized log-likelihood of the model, number of estimated parameters in the model, and the sample size. It incorporates a penalty term for model complexity, favoring models that achieve a good fit while using a parsimonious number of parameters.

1.2.4 Model Presentation

Dandelion Plots provide a simultaneous visualization of both factor variances and loadings in an exploratory factor analysis. Each central line represents a different factor and is connected to a star graph. These star graphs visualize the factor loadings for the corresponding factor with negative and positive loadings indicated by different colors. Explained variance of each factor can be observed by the size of each star graph or by the angle between the current and the consecutive central line. A comparison of the derived communalities and uniquenesses is also provided together with the cumulative explanation ratios of factors.

Path Diagrams provide a visual representation of the hypothesized relationships among latent (unobserved) factors and their observed indicators in a confirmatory factor analysis by illustrating the structure of the model and indicating how the latent factors influence the observed variables. The key components of a path diagram include unobserved constructs that are postulated to underlie the observed variability in a set of measured variables represented by circles or ovals, observed indicators assumed to be influenced by the underlying latent factors represented by rectangles or squares, arrows connecting latent factors to observed indicators representing factor loadings suggesting that the latent factor causes variation in the observed variable, numerical values assigned to the arrows indicating the strength and direction of the relationship between the latent factor and the observed variable, and double-headed arrows connecting two latent factors representing the covariance between those factors.

1.3 Results

1.3.1 Data Preparation

[A] The initial tabular dataset was comprised of 50 observations and 16 variables (including 1 metadata and 30 descriptors).
     [A.1] 50 rows (observations)
     [A.2] 16 columns (variables)
            [A.2.1] 1/16 instance labels = STATE (character)
            [A.2.2] 15/16 predictors = 15/15 numeric
                     [A.2.2.1] LDMORT (numeric)
                     [A.2.2.2] ARTINC (numeric)
                     [A.2.2.3] ASMORT (numeric)
                     [A.2.2.4] PSMUSE (numeric)
                     [A.2.2.5] RDMORT (numeric)
                     [A.2.2.6] PDMORT (numeric)
                     [A.2.2.7] CDMORT (numeric)
                     [A.2.2.8] HDMORT (numeric)
                     [A.2.2.9] STMORT (numeric)
                     [A.2.2.10] DBMORT (numeric)
                     [A.2.2.11] INFVAC (numeric)
                     [A.2.2.12] MEUNDA (numeric)
                     [A.2.2.13] OVWINC (numeric)
                     [A.2.2.14] HLTINS (numeric)
                     [A.2.2.15] SMPREV (numeric)

Code Chunk | Output

##################################
# Loading R libraries
##################################
library(AppliedPredictiveModeling)
library(performance)
library(parameters)
library(HH)
library(tidyr)
library(caret)
library(psych)
library(lattice)
library(dplyr)
library(moments)
library(skimr)
library(RANN)
library(pls)
library(corrplot)
library(lares)
library(DMwR2)
library(gridExtra)
library(rattle)
library(RColorBrewer)
library(stats)
library(factoextra)
library(FactoMineR)
library(gplots)
library(qgraph)
library(ggplot2)
library(GGally)
library(psych)
library(nFactors)
library(MBESS)
library(mice)
library(DandEFA)
library(EFAtools)
library(tidyverse)
library(lavaan)
library(semPlot)
library(semoutput)
library(sjPlot)
library(GPArotation)
library(semTools)

##################################
# Defining file paths
##################################
DATASETS_ORIGINAL_PATH <- file.path("datasets","original")

##################################
# Loading source and
# formulating the analysis set
##################################
CDI <- read.csv(file.path("..", DATASETS_ORIGINAL_PATH, "ChronicDiseaseIndicators.csv"),
                na.strings=c("NA","NaN"," ",""),
                stringsAsFactors = FALSE)
CDI <- as.data.frame(CDI)

##################################
# Performing a general exploration of the data set
##################################
dim(CDI)

## [1] 50 16

str(CDI)

## 'data.frame':    50 obs. of  16 variables:
##  $ STATE : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ LDMORT: num  11.8 10.7 14.3 10.4 12 12.7 8.5 9.3 11.8 8.6 ...
##  $ ARTINC: num  31.7 22.5 23.2 28.1 19.9 22.1 21.4 23.8 23.1 24.6 ...
##  $ ASMORT: num  10.7 NA 11.7 11 10 7.8 8.7 NA 8.6 9.9 ...
##  $ PSMUSE: num  83 78.5 79.8 77.8 83.7 84.8 87.6 86.8 79.9 84.5 ...
##  $ RDMORT: num  60.1 49.2 37.3 65.8 51 44.2 47.7 58.7 39.1 55.1 ...
##  $ PDMORT: num  148.3 105.7 116.5 163 87.8 ...
##  $ CDMORT: num  292 193 183 279 194 ...
##  $ HDMORT: num  224 147 136 218 142 ...
##  $ STMORT: num  48.3 32.3 28.3 45.4 33.9 33.4 26.3 38.8 33 42.6 ...
##  $ DBMORT: num  59.5 59.3 55.1 71.6 69.9 52 43.8 57.2 49.3 55.2 ...
##  $ INFVAC: num  37.7 35 32.6 39.6 35.7 44.2 41.3 39.4 29 34.1 ...
##  $ MEUNDA: num  4.7 3 3.7 4.4 3.6 3.2 3.7 3.6 3.8 4 ...
##  $ OVWINC: num  66.9 65.2 64 70.7 59.8 57.2 59.6 67.3 61.5 65.7 ...
##  $ HLTINS: num  17.9 17.1 18.1 20.3 17.8 15.5 11 11.2 23.1 25.2 ...
##  $ SMPREV: num  21.7 19.5 16.9 25.4 12.8 15.8 16.1 20.3 18.6 17.4 ...

summary(CDI)

##     STATE               LDMORT          ARTINC          ASMORT     
##  Length:50          Min.   : 6.80   Min.   :19.00   Min.   : 7.50  
##  Class :character   1st Qu.: 8.70   1st Qu.:23.12   1st Qu.: 9.45  
##  Mode  :character   Median :10.20   Median :24.15   Median :11.00  
##                     Mean   :10.54   Mean   :24.78   Mean   :11.32  
##                     3rd Qu.:11.95   3rd Qu.:25.98   3rd Qu.:13.30  
##                     Max.   :22.50   Max.   :35.90   Max.   :16.70  
##                                                     NA's   :11     
##      PSMUSE          RDMORT          PDMORT          CDMORT     
##  Min.   :76.20   Min.   :37.30   Min.   : 43.9   Min.   :164.7  
##  1st Qu.:80.53   1st Qu.:49.30   1st Qu.:102.3   1st Qu.:195.2  
##  Median :82.70   Median :54.90   Median :119.8   Median :208.8  
##  Mean   :82.59   Mean   :55.54   Mean   :120.0   Mean   :220.0  
##  3rd Qu.:85.33   3rd Qu.:59.98   3rd Qu.:139.6   3rd Qu.:242.1  
##  Max.   :87.90   Max.   :78.50   Max.   :177.7   Max.   :300.5  
##                                                                 
##      HDMORT          STMORT          DBMORT           INFVAC     
##  Min.   :116.5   Min.   :25.60   Min.   : 42.30   Min.   :29.00  
##  1st Qu.:147.8   1st Qu.:33.50   1st Qu.: 57.05   1st Qu.:35.88  
##  Median :158.1   Median :36.75   Median : 64.85   Median :39.20  
##  Mean   :167.0   Mean   :36.86   Mean   : 67.37   Mean   :39.01  
##  3rd Qu.:182.3   3rd Qu.:41.45   3rd Qu.: 75.55   3rd Qu.:42.02  
##  Max.   :229.9   Max.   :48.80   Max.   :117.20   Max.   :49.00  
##                                                                  
##      MEUNDA          OVWINC          HLTINS          SMPREV     
##  Min.   :2.800   Min.   :57.20   Min.   : 5.40   Min.   : 9.50  
##  1st Qu.:3.325   1st Qu.:62.60   1st Qu.:12.55   1st Qu.:16.62  
##  Median :3.700   Median :65.10   Median :15.70   Median :18.90  
##  Mean   :3.686   Mean   :64.54   Mean   :15.84   Mean   :19.02  
##  3rd Qu.:4.000   3rd Qu.:66.75   3rd Qu.:18.32   3rd Qu.:21.25  
##  Max.   :4.900   Max.   :70.80   Max.   :29.10   Max.   :28.20  
##

##################################
# Formulating a data type assessment summary
##################################
PDA <- CDI
(PDA.Summary <- data.frame(
  Column.Index=c(1:length(names(PDA))),
  Column.Name= names(PDA), 
  Column.Type=sapply(PDA, function(x) class(x)), 
  row.names=NULL)
)

##    Column.Index Column.Name Column.Type
## 1             1       STATE   character
## 2             2      LDMORT     numeric
## 3             3      ARTINC     numeric
## 4             4      ASMORT     numeric
## 5             5      PSMUSE     numeric
## 6             6      RDMORT     numeric
## 7             7      PDMORT     numeric
## 8             8      CDMORT     numeric
## 9             9      HDMORT     numeric
## 10           10      STMORT     numeric
## 11           11      DBMORT     numeric
## 12           12      INFVAC     numeric
## 13           13      MEUNDA     numeric
## 14           14      OVWINC     numeric
## 15           15      HLTINS     numeric
## 16           16      SMPREV     numeric

1.3.2 Data Quality Assessment

[A] Missing data noted for 1 variable with Null.Count>0 and Fill.Rate<1.0.
[A.1] ASMORT: Null.Count=11, Fill.Rate=0.78
[B] No low variance observed for any variable with First.Second.Mode.Ratio>5.
[C] No low variance observed for any variable with Unique.Count.Ratio<0.01.
[D] No high skewness observed for any variable with Skewness>3 or Skewness<(-3).

Code Chunk | Output

##################################
# Loading dataset
##################################
DQA <- CDI

##################################
# Formulating an overall data quality assessment summary
##################################
(DQA.Summary <- data.frame(
  Column.Name= names(DQA),
  Column.Type=sapply(DQA, function(x) class(x)),
  Row.Count=sapply(DQA, function(x) nrow(DQA)),
  NA.Count=sapply(DQA,function(x)sum(is.na(x))),
  Fill.Rate=sapply(DQA,function(x)format(round((sum(!is.na(x))/nrow(DQA)),3),nsmall=3)),
  row.names=NULL)
)

##    Column.Name Column.Type Row.Count NA.Count Fill.Rate
## 1        STATE   character        50        0     1.000
## 2       LDMORT     numeric        50        0     1.000
## 3       ARTINC     numeric        50        0     1.000
## 4       ASMORT     numeric        50       11     0.780
## 5       PSMUSE     numeric        50        0     1.000
## 6       RDMORT     numeric        50        0     1.000
## 7       PDMORT     numeric        50        0     1.000
## 8       CDMORT     numeric        50        0     1.000
## 9       HDMORT     numeric        50        0     1.000
## 10      STMORT     numeric        50        0     1.000
## 11      DBMORT     numeric        50        0     1.000
## 12      INFVAC     numeric        50        0     1.000
## 13      MEUNDA     numeric        50        0     1.000
## 14      OVWINC     numeric        50        0     1.000
## 15      HLTINS     numeric        50        0     1.000
## 16      SMPREV     numeric        50        0     1.000

##################################
# Listing all descriptors
##################################
DQA.Descriptors <- DQA[,colnames(DQA)!="State"]

##################################
# Listing all numeric Descriptors
##################################
DQA.Descriptors.Numeric <- DQA.Descriptors[,sapply(DQA.Descriptors, is.numeric)]

if (length(names(DQA.Descriptors.Numeric))>0) {
    print(paste0("There are ",
               (length(names(DQA.Descriptors.Numeric))),
               " numeric descriptor variable(s)."))
} else {
  print("There are no numeric descriptor variables.")
}

## [1] "There are 15 numeric descriptor variable(s)."

##################################
# Listing all factor Descriptors
##################################
DQA.Descriptors.Factor <- DQA.Descriptors[,sapply(DQA.Descriptors, is.factor)]

if (length(names(DQA.Descriptors.Factor))>0) {
    print(paste0("There are ",
               (length(names(DQA.Descriptors.Factor))),
               " factor descriptor variable(s)."))
} else {
  print("There are no factor descriptor variables.")
}

## [1] "There are no factor descriptor variables."

##################################
# Formulating a data quality assessment summary for factor Descriptors
##################################
if (length(names(DQA.Descriptors.Factor))>0) {

  ##################################
  # Formulating a function to determine the first mode
  ##################################
  FirstModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    ux[tab == max(tab)]
  }

  ##################################
  # Formulating a function to determine the second mode
  ##################################
  SecondModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    fm = ux[tab == max(tab)]
    sm = x[!(x %in% fm)]
    usm <- unique(sm)
    tabsm <- tabulate(match(sm, usm))
    ifelse(is.na(usm[tabsm == max(tabsm)])==TRUE,
           return("x"),
           return(usm[tabsm == max(tabsm)]))
  }

  (DQA.Descriptors.Factor.Summary <- data.frame(
  Column.Name= names(DQA.Descriptors.Factor),
  Column.Type=sapply(DQA.Descriptors.Factor, function(x) class(x)),
  Unique.Count=sapply(DQA.Descriptors.Factor, function(x) length(unique(x))),
  First.Mode.Value=sapply(DQA.Descriptors.Factor, function(x) as.character(FirstModes(x)[1])),
  Second.Mode.Value=sapply(DQA.Descriptors.Factor, function(x) as.character(SecondModes(x)[1])),
  First.Mode.Count=sapply(DQA.Descriptors.Factor, function(x) sum(na.omit(x) == FirstModes(x)[1])),
  Second.Mode.Count=sapply(DQA.Descriptors.Factor, function(x) sum(na.omit(x) == SecondModes(x)[1])),
  Unique.Count.Ratio=sapply(DQA.Descriptors.Factor, function(x) format(round((length(unique(x))/nrow(DQA.Descriptors.Factor)),3), nsmall=3)),
  First.Second.Mode.Ratio=sapply(DQA.Descriptors.Factor, function(x) format(round((sum(na.omit(x) == FirstModes(x)[1])/sum(na.omit(x) == SecondModes(x)[1])),3), nsmall=3)),
  row.names=NULL)
  )

}

##################################
# Formulating a data quality assessment summary for numeric Descriptors
##################################
if (length(names(DQA.Descriptors.Numeric))>0) {

  ##################################
  # Formulating a function to determine the first mode
  ##################################
  FirstModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    ux[tab == max(tab)]
  }

  ##################################
  # Formulating a function to determine the second mode
  ##################################
  SecondModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    fm = ux[tab == max(tab)]
    sm = na.omit(x)[!(na.omit(x) %in% fm)]
    usm <- unique(sm)
    tabsm <- tabulate(match(sm, usm))
    ifelse(is.na(usm[tabsm == max(tabsm)])==TRUE,
           return(0.00001),
           return(usm[tabsm == max(tabsm)]))
  }

  (DQA.Descriptors.Numeric.Summary <- data.frame(
  Column.Name= names(DQA.Descriptors.Numeric),
  Column.Type=sapply(DQA.Descriptors.Numeric, function(x) class(x)),
  Unique.Count=sapply(DQA.Descriptors.Numeric, function(x) length(unique(x))),
  Unique.Count.Ratio=sapply(DQA.Descriptors.Numeric, function(x) format(round((length(unique(x))/nrow(DQA.Descriptors.Numeric)),3), nsmall=3)),
  First.Mode.Value=sapply(DQA.Descriptors.Numeric, function(x) format(round((FirstModes(x)[1]),3),nsmall=3)),
  Second.Mode.Value=sapply(DQA.Descriptors.Numeric, function(x) format(round((SecondModes(x)[1]),3),nsmall=3)),
  First.Mode.Count=sapply(DQA.Descriptors.Numeric, function(x) sum(na.omit(x) == FirstModes(x)[1])),
  Second.Mode.Count=sapply(DQA.Descriptors.Numeric, function(x) sum(na.omit(x) == SecondModes(x)[1])),
  First.Second.Mode.Ratio=sapply(DQA.Descriptors.Numeric, function(x) format(round((sum(na.omit(x) == FirstModes(x)[1])/sum(na.omit(x) == SecondModes(x)[1])),3), nsmall=3)),
  Minimum=sapply(DQA.Descriptors.Numeric, function(x) format(round(min(x,na.rm = TRUE),3), nsmall=3)),
  Mean=sapply(DQA.Descriptors.Numeric, function(x) format(round(mean(x,na.rm = TRUE),3), nsmall=3)),
  Median=sapply(DQA.Descriptors.Numeric, function(x) format(round(median(x,na.rm = TRUE),3), nsmall=3)),
  Maximum=sapply(DQA.Descriptors.Numeric, function(x) format(round(max(x,na.rm = TRUE),3), nsmall=3)),
  Skewness=sapply(DQA.Descriptors.Numeric, function(x) format(round(skewness(x,na.rm = TRUE),3), nsmall=3)),
  Kurtosis=sapply(DQA.Descriptors.Numeric, function(x) format(round(moments::kurtosis(x,na.rm = TRUE),3), nsmall=3)),
  Percentile25th=sapply(DQA.Descriptors.Numeric, function(x) format(round(quantile(x,probs=0.25,na.rm = TRUE),3), nsmall=3)),
  Percentile75th=sapply(DQA.Descriptors.Numeric, function(x) format(round(quantile(x,probs=0.75,na.rm = TRUE),3), nsmall=3)),
  row.names=NULL)
  )

}

##    Column.Name Column.Type Unique.Count Unique.Count.Ratio First.Mode.Value
## 1       LDMORT     numeric           37              0.740           11.800
## 2       ARTINC     numeric           40              0.800           23.900
## 3       ASMORT     numeric           33              0.660           10.000
## 4       PSMUSE     numeric           42              0.840           83.000
## 5       RDMORT     numeric           48              0.960           54.400
## 6       PDMORT     numeric           49              0.980          108.100
## 7       CDMORT     numeric           49              0.980          242.100
## 8       HDMORT     numeric           49              0.980          147.900
## 9       STMORT     numeric           44              0.880           38.800
## 10      DBMORT     numeric           46              0.920           69.900
## 11      INFVAC     numeric           45              0.900           37.700
## 12      MEUNDA     numeric           19              0.380            3.700
## 13      OVWINC     numeric           39              0.780           66.900
## 14      HLTINS     numeric           47              0.940           18.000
## 15      SMPREV     numeric           43              0.860           21.700
##    Second.Mode.Value First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio
## 1             14.300                3                 2                   1.500
## 2             22.500                3                 2                   1.500
## 3             10.700                2                 1                   2.000
## 4             78.500                2                 1                   2.000
## 5             60.100                2                 1                   2.000
## 6            148.300                2                 1                   2.000
## 7            292.100                2                 1                   2.000
## 8            224.000                2                 1                   2.000
## 9             48.300                2                 1                   2.000
## 10            59.500                2                 1                   2.000
## 11            44.200                3                 2                   1.500
## 12             3.600                8                 5                   1.600
## 13            64.000                2                 1                   2.000
## 14            17.900                2                 1                   2.000
## 15            25.400                2                 1                   2.000
##    Minimum    Mean  Median Maximum Skewness Kurtosis Percentile25th
## 1    6.800  10.540  10.200  22.500    1.690    7.736          8.700
## 2   19.000  24.784  24.150  35.900    0.992    4.597         23.125
## 3    7.500  11.323  11.000  16.700    0.195    2.112          9.450
## 4   76.200  82.594  82.700  87.900   -0.196    2.035         80.525
## 5   37.300  55.544  54.900  78.500    0.400    3.148         49.300
## 6   43.900 119.990 119.850 177.700   -0.059    3.045        102.325
## 7  164.700 219.974 208.750 300.500    0.752    2.704        195.150
## 8  116.500 166.954 158.050 229.900    0.677    2.653        147.825
## 9   25.600  36.858  36.750  48.800    0.085    2.430         33.500
## 10  42.300  67.372  64.850 117.200    0.903    4.207         57.050
## 11  29.000  39.010  39.200  49.000   -0.060    2.833         35.875
## 12   2.800   3.686   3.700   4.900    0.209    2.442          3.325
## 13  57.200  64.540  65.100  70.800   -0.260    2.664         62.600
## 14   5.400  15.844  15.700  29.100    0.363    3.068         12.550
## 15   9.500  19.020  18.900  28.200    0.194    3.201         16.625
##    Percentile75th
## 1          11.950
## 2          25.975
## 3          13.300
## 4          85.325
## 5          59.975
## 6         139.600
## 7         242.100
## 8         182.300
## 9          41.450
## 10         75.550
## 11         42.025
## 12          4.000
## 13         66.750
## 14         18.325
## 15         21.250

##################################
# Identifying potential data quality issues
##################################

##################################
# Checking for missing observations
##################################
if ((nrow(DQA.Summary[DQA.Summary$NA.Count>0,]))>0){
  print(paste0("Missing observations noted for ",
               (nrow(DQA.Summary[DQA.Summary$NA.Count>0,])),
               " variable(s) with NA.Count>0 and Fill.Rate<1.0."))
  DQA.Summary[DQA.Summary$NA.Count>0,]
} else {
  print("No missing observations noted.")
}

## [1] "Missing observations noted for 1 variable(s) with NA.Count>0 and Fill.Rate<1.0."

##   Column.Name Column.Type Row.Count NA.Count Fill.Rate
## 4      ASMORT     numeric        50       11     0.780

##################################
# Checking for zero or near-zero variance Descriptors
##################################
if (length(names(DQA.Descriptors.Factor))==0) {
  print("No factor descriptors noted.")
} else if (nrow(DQA.Descriptors.Factor.Summary[as.numeric(as.character(DQA.Descriptors.Factor.Summary$First.Second.Mode.Ratio))>5,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Descriptors.Factor.Summary[as.numeric(as.character(DQA.Descriptors.Factor.Summary$First.Second.Mode.Ratio))>5,])),
               " factor variable(s) with First.Second.Mode.Ratio>5."))
  DQA.Descriptors.Factor.Summary[as.numeric(as.character(DQA.Descriptors.Factor.Summary$First.Second.Mode.Ratio))>5,]
} else {
  print("No low variance factor descriptors due to high first-second mode ratio noted.")
}

## [1] "No factor descriptors noted."

if (length(names(DQA.Descriptors.Numeric))==0) {
  print("No numeric descriptors noted.")
} else if (nrow(DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$First.Second.Mode.Ratio))>5,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$First.Second.Mode.Ratio))>5,])),
               " numeric variable(s) with First.Second.Mode.Ratio>5."))
  DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$First.Second.Mode.Ratio))>5,]
} else {
  print("No low variance numeric descriptors due to high first-second mode ratio noted.")
}

## [1] "No low variance numeric descriptors due to high first-second mode ratio noted."

if (length(names(DQA.Descriptors.Numeric))==0) {
  print("No numeric descriptors noted.")
} else if (nrow(DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Unique.Count.Ratio))<0.01,])>0){
  print(paste0("Low variance observed for ",
               (nrow(DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Unique.Count.Ratio))<0.01,])),
               " numeric variable(s) with Unique.Count.Ratio<0.01."))
  DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Unique.Count.Ratio))<0.01,]
} else {
  print("No low variance numeric descriptors due to low unique count ratio noted.")
}

## [1] "No low variance numeric descriptors due to low unique count ratio noted."

##################################
# Checking for skewed Descriptors
##################################
if (length(names(DQA.Descriptors.Numeric))==0) {
  print("No numeric descriptors noted.")
} else if (nrow(DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Skewness))>3 |
                                               as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Skewness))<(-3),])>0){
  print(paste0("High skewness observed for ",
  (nrow(DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Skewness))>3 |
                                               as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Skewness))<(-3),])),
  " numeric variable(s) with Skewness>3 or Skewness<(-3)."))
  DQA.Descriptors.Numeric.Summary[as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Skewness))>3 |
                                 as.numeric(as.character(DQA.Descriptors.Numeric.Summary$Skewness))<(-3),]
} else {
  print("No skewed numeric descriptors noted.")
}

## [1] "No skewed numeric descriptors noted."

1.3.3 Data Preprocessing

1.3.3.1 Missing Data Imputation

[A] Missing data identified for 1 out of the 30 descriptors were replaced using multivariate imputation by chained equations with predictive mean matching specified as the imputation method.
[A.1] ASMORT: Null.Count=0, Fill.Rate=1.00

Code Chunk | Output

##################################
# Visualizing the missing data patterns
# prior to imputation
##################################
summary(DQA.Descriptors.Numeric)

##      LDMORT          ARTINC          ASMORT          PSMUSE     
##  Min.   : 6.80   Min.   :19.00   Min.   : 7.50   Min.   :76.20  
##  1st Qu.: 8.70   1st Qu.:23.12   1st Qu.: 9.45   1st Qu.:80.53  
##  Median :10.20   Median :24.15   Median :11.00   Median :82.70  
##  Mean   :10.54   Mean   :24.78   Mean   :11.32   Mean   :82.59  
##  3rd Qu.:11.95   3rd Qu.:25.98   3rd Qu.:13.30   3rd Qu.:85.33  
##  Max.   :22.50   Max.   :35.90   Max.   :16.70   Max.   :87.90  
##                                  NA's   :11                     
##      RDMORT          PDMORT          CDMORT          HDMORT     
##  Min.   :37.30   Min.   : 43.9   Min.   :164.7   Min.   :116.5  
##  1st Qu.:49.30   1st Qu.:102.3   1st Qu.:195.2   1st Qu.:147.8  
##  Median :54.90   Median :119.8   Median :208.8   Median :158.1  
##  Mean   :55.54   Mean   :120.0   Mean   :220.0   Mean   :167.0  
##  3rd Qu.:59.98   3rd Qu.:139.6   3rd Qu.:242.1   3rd Qu.:182.3  
##  Max.   :78.50   Max.   :177.7   Max.   :300.5   Max.   :229.9  
##                                                                 
##      STMORT          DBMORT           INFVAC          MEUNDA     
##  Min.   :25.60   Min.   : 42.30   Min.   :29.00   Min.   :2.800  
##  1st Qu.:33.50   1st Qu.: 57.05   1st Qu.:35.88   1st Qu.:3.325  
##  Median :36.75   Median : 64.85   Median :39.20   Median :3.700  
##  Mean   :36.86   Mean   : 67.37   Mean   :39.01   Mean   :3.686  
##  3rd Qu.:41.45   3rd Qu.: 75.55   3rd Qu.:42.02   3rd Qu.:4.000  
##  Max.   :48.80   Max.   :117.20   Max.   :49.00   Max.   :4.900  
##                                                                  
##      OVWINC          HLTINS          SMPREV     
##  Min.   :57.20   Min.   : 5.40   Min.   : 9.50  
##  1st Qu.:62.60   1st Qu.:12.55   1st Qu.:16.62  
##  Median :65.10   Median :15.70   Median :18.90  
##  Mean   :64.54   Mean   :15.84   Mean   :19.02  
##  3rd Qu.:66.75   3rd Qu.:18.32   3rd Qu.:21.25  
##  Max.   :70.80   Max.   :29.10   Max.   :28.20  
##

visdat::vis_miss(DQA.Descriptors.Numeric, sort_miss = FALSE)

##################################
# Conducting missing data imputation
# using Multivariate Imputation by Chained Equations
##################################
MICE_Model <- mice(DQA.Descriptors.Numeric, method='pmm', seed = 123)

## 
##  iter imp variable
##   1   1  ASMORT
##   1   2  ASMORT
##   1   3  ASMORT
##   1   4  ASMORT
##   1   5  ASMORT
##   2   1  ASMORT
##   2   2  ASMORT
##   2   3  ASMORT
##   2   4  ASMORT
##   2   5  ASMORT
##   3   1  ASMORT
##   3   2  ASMORT
##   3   3  ASMORT
##   3   4  ASMORT
##   3   5  ASMORT
##   4   1  ASMORT
##   4   2  ASMORT
##   4   3  ASMORT
##   4   4  ASMORT
##   4   5  ASMORT
##   5   1  ASMORT
##   5   2  ASMORT
##   5   3  ASMORT
##   5   4  ASMORT
##   5   5  ASMORT

DQA.Descriptors.Numeric <- complete(MICE_Model)

##################################
# Visualizing the missing data patterns
# after imputation
##################################
visdat::vis_miss(DQA.Descriptors.Numeric, sort_miss = FALSE)

1.3.3.2 Outlier Detection

[A] Minimal outliers noted for 7 out of the 15 descriptors. Descriptor values were visualized through a boxplot including observations classified as suspected outliers using the IQR criterion. The IQR criterion means that all observations above the (75th percentile + 1.5 x IQR) or below the (25th percentile - 1.5 x IQR) are suspected outliers, where IQR is the difference between the third quartile (75th percentile) and first quartile (25th percentile).
     [A.1] LDMORT = 1
     [A.2] ARTINC = 4
     [A.3] RDMORT = 2
     [A.5] PDMORT = 1
     [A.7] DBMORT = 1
     [A.9] HLTINS = 1
     [A.10] SMPREV = 1

Code Chunk | Output

##################################
# Loading dataset
##################################
DPA <- DQA.Descriptors.Numeric

##################################
# Gathering descriptive statistics
##################################
(DPA_Skimmed <- skim(DPA))

Data summary
Name	DPA
Number of rows	50
Number of columns	15
_______________________
Column type frequency:
numeric	15
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
LDMORT	1	10.54	2.82	6.8	8.70	10.20	11.95	22.5	▇▇▂▁▁
ARTINC	1	24.78	3.29	19.0	23.13	24.15	25.98	35.9	▃▇▃▁▁
ASMORT	1	11.05	2.48	7.5	8.70	10.90	12.90	16.7	▇▆▆▅▁
PSMUSE	1	82.59	3.09	76.2	80.53	82.70	85.33	87.9	▃▃▇▆▅
RDMORT	1	55.54	9.27	37.3	49.30	54.90	59.98	78.5	▂▇▇▃▁
PDMORT	1	119.99	27.81	43.9	102.33	119.85	139.60	177.7	▁▃▇▇▂
CDMORT	1	219.97	33.08	164.7	195.15	208.75	242.10	300.5	▃▇▅▂▂
HDMORT	1	166.95	27.69	116.5	147.83	158.05	182.30	229.9	▃▇▅▃▂
STMORT	1	36.86	5.77	25.6	33.50	36.75	41.45	48.8	▃▇▇▅▃
DBMORT	1	67.37	14.99	42.3	57.05	64.85	75.55	117.2	▆▇▅▂▁
INFVAC	1	39.01	3.94	29.0	35.88	39.20	42.03	49.0	▂▆▇▇▁
MEUNDA	1	3.69	0.53	2.8	3.32	3.70	4.00	4.9	▆▆▇▅▂
OVWINC	1	64.54	3.20	57.2	62.60	65.10	66.75	70.8	▃▃▇▇▃
HLTINS	1	15.84	4.73	5.4	12.55	15.70	18.32	29.1	▂▇▇▃▁
SMPREV	1	19.02	3.68	9.5	16.62	18.90	21.25	28.2	▁▆▇▅▂

##################################
# Outlier Detection
##################################

##################################
# Listing all Descriptors
##################################
DPA.Descriptors <- DPA

##################################
# Listing all numeric Descriptors
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors[,sapply(DPA.Descriptors, is.numeric)]

##################################
# Identifying outliers for the numeric Descriptors
##################################
OutlierCountList <- c()

for (i in 1:ncol(DPA.Descriptors.Numeric)) {
  Outliers <- boxplot.stats(DPA.Descriptors.Numeric[,i])$out
  OutlierCount <- length(Outliers)
  OutlierCountList <- append(OutlierCountList,OutlierCount)
  OutlierIndices <- which(DPA.Descriptors.Numeric[,i] %in% c(Outliers))
  print(
  ggplot(DPA.Descriptors.Numeric, aes(x=DPA.Descriptors.Numeric[,i])) +
  geom_boxplot() +
  theme_bw() +
  theme(axis.text.y=element_blank(), 
        axis.ticks.y=element_blank()) +
  xlab(names(DPA.Descriptors.Numeric)[i]) +
  labs(title=names(DPA.Descriptors.Numeric)[i],
       subtitle=paste0(OutlierCount, " Outlier(s) Detected")))
}

1.3.3.3 Zero and Near-Zero Variance

[A] No low variance observed for any descriptor in the dataset.

Code Chunk | Output

##################################
# Zero and Near-Zero Variance
##################################

##################################
# Identifying columns with low variance
###################################
DPA_LowVariance <- nearZeroVar(DPA,
                               freqCut = 80/20,
                               uniqueCut = 10,
                               saveMetrics= TRUE)
(DPA_LowVariance[DPA_LowVariance$nzv,])

## [1] freqRatio     percentUnique zeroVar       nzv          
## <0 rows> (or 0-length row.names)

if ((nrow(DPA_LowVariance[DPA_LowVariance$nzv,]))==0){
  
  print("No low variance descriptors noted.")
  
} else {

  print(paste0("Low variance observed for ",
               (nrow(DPA_LowVariance[DPA_LowVariance$nzv,])),
               " numeric variable(s) with First.Second.Mode.Ratio>4 and Unique.Count.Ratio<0.10."))
  
  DPA_LowVarianceForRemoval <- (nrow(DPA_LowVariance[DPA_LowVariance$nzv,]))
  
  print(paste0("Low variance can be resolved by removing ",
               (nrow(DPA_LowVariance[DPA_LowVariance$nzv,])),
               " numeric variable(s)."))
  
  for (j in 1:DPA_LowVarianceForRemoval) {
  DPA_LowVarianceRemovedVariable <- rownames(DPA_LowVariance[DPA_LowVariance$nzv,])[j]
  print(paste0("Variable ",
               j,
               " for removal: ",
               DPA_LowVarianceRemovedVariable))
  }
  
  DPA %>%
  skim() %>%
  dplyr::filter(skim_variable %in% rownames(DPA_LowVariance[DPA_LowVariance$nzv,]))

}

## [1] "No low variance descriptors noted."

1.3.3.4 Collinearity

[A] High multicollinearity with Pearson correlation coefficients >90% was noted in a pair of descriptors in the dataset.
[A.1] HDMORT and CDMORT = 0.99
[A.2] CDMORT was removed based on its higher correlation with other descriptors.

Code Chunk | Output

##################################
# Measuring pairwise correlation between descriptors
##################################
(DPA_Correlation <- cor(DPA.Descriptors.Numeric,
                        method = "pearson",
                        use="pairwise.complete.obs"))

##             LDMORT       ARTINC       ASMORT       PSMUSE      RDMORT
## LDMORT  1.00000000  0.173093335 -0.232025288 -0.345100550  0.02923731
## ARTINC  0.17309333  1.000000000 -0.042746459  0.024546154  0.60290597
## ASMORT -0.23202529 -0.042746459  1.000000000 -0.087053707  0.02466696
## PSMUSE -0.34510055  0.024546154 -0.087053707  1.000000000 -0.11152216
## RDMORT  0.02923731  0.602905967  0.024666956 -0.111522163  1.00000000
## PDMORT  0.45267604  0.662922162 -0.210484947 -0.286784690  0.50529506
## CDMORT  0.07404928  0.667228833  0.023618891 -0.214153256  0.58858032
## HDMORT  0.04934776  0.636548347 -0.001775156 -0.206862030  0.52444923
## STMORT  0.17126776  0.582853439  0.066770232 -0.219881708  0.70019055
## DBMORT  0.17609150  0.481600436  0.191885492 -0.124643846  0.61452185
## INFVAC -0.10505851  0.103088137  0.100706735  0.399251809  0.24772090
## MEUNDA  0.22133327  0.803258627 -0.017665903  0.009246146  0.41867094
## OVWINC  0.16735471  0.498314042  0.149667042 -0.217877006  0.63784116
## HLTINS  0.37829219  0.002972526 -0.108605067 -0.488950184  0.12993910
## SMPREV  0.23811659  0.823678167 -0.099197602 -0.055348594  0.59858835
##             PDMORT      CDMORT       HDMORT      STMORT      DBMORT      INFVAC
## LDMORT  0.45267604  0.07404928  0.049347761  0.17126776  0.17609150 -0.10505851
## ARTINC  0.66292216  0.66722883  0.636548347  0.58285344  0.48160044  0.10308814
## ASMORT -0.21048495  0.02361889 -0.001775156  0.06677023  0.19188549  0.10070674
## PSMUSE -0.28678469 -0.21415326 -0.206862030 -0.21988171 -0.12464385  0.39925181
## RDMORT  0.50529506  0.58858032  0.524449231  0.70019055  0.61452185  0.24772090
## PDMORT  1.00000000  0.61855185  0.581638625  0.56453130  0.51296109 -0.05135669
## CDMORT  0.61855185  1.00000000  0.986330154  0.79569366  0.46503091 -0.09852403
## HDMORT  0.58163863  0.98633015  1.000000000  0.69775225  0.41225978 -0.11351086
## STMORT  0.56453130  0.79569366  0.697752254  1.00000000  0.50413319  0.02353408
## DBMORT  0.51296109  0.46503091  0.412259779  0.50413319  1.00000000  0.24952376
## INFVAC -0.05135669 -0.09852403 -0.113510863  0.02353408  0.24952376  1.00000000
## MEUNDA  0.48590617  0.64341929  0.640170352  0.49206776  0.35231401 -0.17366823
## OVWINC  0.60733512  0.67870794  0.621196381  0.68784352  0.49793894 -0.06691508
## HLTINS  0.36864746  0.41605763  0.378207992  0.45854866  0.01034802 -0.51094161
## SMPREV  0.73851432  0.66017408  0.630688330  0.59808334  0.49613487  0.06481756
##              MEUNDA      OVWINC       HLTINS      SMPREV
## LDMORT  0.221333267  0.16735471  0.378292185  0.23811659
## ARTINC  0.803258627  0.49831404  0.002972526  0.82367817
## ASMORT -0.017665903  0.14966704 -0.108605067 -0.09919760
## PSMUSE  0.009246146 -0.21787701 -0.488950184 -0.05534859
## RDMORT  0.418670943  0.63784116  0.129939097  0.59858835
## PDMORT  0.485906174  0.60733512  0.368647458  0.73851432
## CDMORT  0.643419286  0.67870794  0.416057632  0.66017408
## HDMORT  0.640170352  0.62119638  0.378207992  0.63068833
## STMORT  0.492067763  0.68784352  0.458548660  0.59808334
## DBMORT  0.352314013  0.49793894  0.010348016  0.49613487
## INFVAC -0.173668230 -0.06691508 -0.510941607  0.06481756
## MEUNDA  1.000000000  0.27967009  0.152816174  0.60964853
## OVWINC  0.279670088  1.00000000  0.378877840  0.67261164
## HLTINS  0.152816174  0.37887784  1.000000000  0.13686509
## SMPREV  0.609648533  0.67261164  0.136865092  1.00000000

##################################
# Testing pairwise correlation between descriptors
##################################
DPA_CorrelationTest <- cor.mtest(DPA.Descriptors.Numeric,
                       method = "pearson",
                       conf.level = 0.95)

##################################
# Visualizing pairwise correlation between descriptors
##################################
corrplot(cor(DPA.Descriptors.Numeric,
             method = "pearson",
             use="pairwise.complete.obs"),
             method = "circle",
             type = "upper",
             order = "original",
             tl.col = "black",
             tl.cex = 0.75,
             tl.srt = 90,
             sig.level = 0.05,
             p.mat = DPA_CorrelationTest$p,
             insig = "blank")

corrplot(cor(DPA.Descriptors.Numeric,
             method = "pearson",
             use="pairwise.complete.obs"),
             method = "number",
             type = "upper",
             order = "original",
             tl.col = "black",
             tl.cex = 0.75,
             tl.srt = 90,
             sig.level = 0.05,
             p.mat = DPA_CorrelationTest$p,
             insig = "blank")

##################################
# Identifying the highly correlated variables
##################################
(DPA_HighlyCorrelatedCount <- sum(abs(DPA_Correlation[upper.tri(DPA_Correlation)])>0.90))

## [1] 1

if (DPA_HighlyCorrelatedCount > 0) {
  DPA_HighlyCorrelated <- findCorrelation(DPA_Correlation, cutoff = 0.90)
  
  (DPA_HighlyCorrelatedForRemoval <- length(DPA_HighlyCorrelated))
  
  print(paste0("High correlation can be resolved by removing ",
               (DPA_HighlyCorrelatedForRemoval),
               " numeric variable(s)."))
  
  for (j in 1:DPA_HighlyCorrelatedForRemoval) {
    DPA_HighlyCorrelatedRemovedVariable <- colnames(DPA.Descriptors.Numeric)[DPA_HighlyCorrelated[j]]
    print(paste0("Variable ",
                 j,
                 " for removal: ",
                 DPA_HighlyCorrelatedRemovedVariable))
  }
  
}

## [1] "High correlation can be resolved by removing 1 numeric variable(s)."
## [1] "Variable 1 for removal: CDMORT"

if (DPA_HighlyCorrelatedCount == 0) {
  print("No highly correlated predictors noted.")
} else {
  print(paste0("High correlation observed for ",
               (DPA_HighlyCorrelatedCount),
               " pairs of numeric variable(s) with Correlation.Coefficient>0.90."))
  
  (DPA_HighlyCorrelatedPairs <- corr_cross(DPA.Descriptors.Numeric,
  max_pvalue = 0.05, 
  top = DPA_HighlyCorrelatedCount,
  rm.na = TRUE,
  grid = FALSE
))
  
}

## [1] "High correlation observed for 1 pairs of numeric variable(s) with Correlation.Coefficient>0.90."

##################################
# Removing individual descriptors
# among the highly correlated pair
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors.Numeric[,!colnames(DPA.Descriptors.Numeric) %in% c("CDMORT")]

1.3.3.5 Linear Dependencies

[A] No linear dependencies observed for any subset of decriptors in the dataset.

Code Chunk | Output

##################################
# Linear Dependencies
##################################

##################################
# Finding linear dependencies
##################################
DPA_LinearlyDependent <- findLinearCombos(DPA.Descriptors.Numeric)

##################################
# Identifying the linearly dependent variables
##################################
DPA_LinearlyDependent <- findLinearCombos(DPA.Descriptors.Numeric)

(DPA_LinearlyDependentCount <- length(DPA_LinearlyDependent$linearCombos))

## [1] 0

if (DPA_LinearlyDependentCount == 0) {
  print("No linearly dependent predictors noted.")
} else {
  print(paste0("Linear dependency observed for ",
               (DPA_LinearlyDependentCount),
               " subset(s) of numeric variable(s)."))
  
  for (i in 1:DPA_LinearlyDependentCount) {
    DPA_LinearlyDependentSubset <- colnames(DPA.Descriptors.Numeric)[DPA_LinearlyDependent$linearCombos[[i]]]
    print(paste0("Linear dependent variable(s) for subset ",
                 i,
                 " include: ",
                 DPA_LinearlyDependentSubset))
  }
  
}

## [1] "No linearly dependent predictors noted."

##################################
# Identifying the linearly dependent variables for removal
##################################

if (DPA_LinearlyDependentCount > 0) {
  DPA_LinearlyDependent <- findLinearCombos(DPA.Descriptors.Numeric)
  
  DPA_LinearlyDependentForRemoval <- length(DPA_LinearlyDependent$remove)
  
  print(paste0("Linear dependency can be resolved by removing ",
               (DPA_LinearlyDependentForRemoval),
               " numeric variable(s)."))
  
  for (j in 1:DPA_LinearlyDependentForRemoval) {
  DPA_LinearlyDependentRemovedVariable <- colnames(DPA.Descriptors.Numeric)[DPA_LinearlyDependent$remove[j]]
  print(paste0("Variable ",
               j,
               " for removal: ",
               DPA_LinearlyDependentRemovedVariable))
  }

}

1.3.3.6 Distributional Shape

[A] No shape transformation was necessary as the distributional skewness observed among individual descriptors was normal (all values within +3 and -3) with minimal outliers.

Code Chunk | Output

##################################
# Distributional Shape
##################################

##################################
# Formulating the histogram
# for the numeric descriptors
##################################
for (i in 1:ncol(DPA.Descriptors.Numeric)) {
  Median <- format(round(median(DPA.Descriptors.Numeric[,i],na.rm = TRUE),2), nsmall=2)
  Mean <- format(round(mean(DPA.Descriptors.Numeric[,i],na.rm = TRUE),2), nsmall=2)
  Skewness <- format(round(skewness(DPA.Descriptors.Numeric[,i],na.rm = TRUE),2), nsmall=2)
  print(
  ggplot(DPA.Descriptors.Numeric, aes(x=DPA.Descriptors.Numeric[,i])) +
  geom_histogram(binwidth=1,color="black", fill="white") +
  geom_vline(aes(xintercept=mean(DPA.Descriptors.Numeric[,i])),
            color="blue", size=1) +
    geom_vline(aes(xintercept=median(DPA.Descriptors.Numeric[,i])),
            color="red", size=1) +
  theme_bw() +
  ylab("Count") +
  xlab(names(DPA.Descriptors.Numeric)[i]) +
  labs(title=names(DPA.Descriptors.Numeric)[i],
       subtitle=paste0("Median = ", Median,
                       ", Mean = ", Mean,
                       ", Skewness = ", Skewness)))
}

1.3.3.7 Data Standardization

[A] Z-score normalization was applied to generate a transformed data with a distributional mean of zero and a standard deviation equal to one - allowing a consistent scale among the descriptors.

Code Chunk | Output

##################################
# Data Normalization
##################################
DPA.Descriptors.Numeric.CenteredScaled <- preProcess(DPA.Descriptors.Numeric, method = c("center","scale"))
DPA.Descriptors.Numeric <- predict(DPA.Descriptors.Numeric.CenteredScaled, DPA.Descriptors.Numeric)

1.3.4 Data Pre-Assessment

1.3.4.1 Correlation Matrix Assessment - Kaiser-Meyer-Olkin Factor Adequacy

[A] 5 descriptors with low KMO values were excluded from the dataset to improve the strength and patterns of relationships between variables and ensure sufficient common variance among variables for extracting underlying factors.
     [A.1] ASMORT = 0.15
     [A.2] INFVAC = 0.35
     [A.3] HLTINS = 0.56
     [A.4] LDMORT = 0.47
     [A.5] PSMUSE = 0.59

[B] The KMO measure of sampling adequacy was acceptable with a computed value of 0.87 for the complete model, indicating the suitability of the data for exploratory factor analysis. The estimated proportion of variance among all the observed variables was sufficiently adequate.

[C] The sampling adequacy for each variable in the model was acceptable with KMO values ranging from 0.78 to 0.94. Results indicated that each variable can be sufficiently predicted by other variables.
     [C.1] PDMORT = 0.94
     [C.2] DBMORT = 0.92
     [C.3] STMORT = 0.90
     [C.4] HDMORT = 0.89
     [C.5] RDMORT = 0.88
     [C.6] SMPREV = 0.88
     [C.7] OVWINC = 0.85
     [C.8] ARTINC = 0.83
     [C.9] MEUNDA = 0.78

Code Chunk | Output

##################################
# Calculating the Kaiser-Meyer-Olkin Factor Adequacy
##################################
(DPA_KMOFactorAdequacy <- KMO(DPA.Descriptors.Numeric))

## 
## ── Kaiser-Meyer-Olkin criterion (KMO) ──────────────────────────────────────────
## 
## ! The overall KMO value for your data is mediocre.
##   These data are probably suitable for factor analysis.
## 
##   Overall: 0.679
## 
##   For each variable:
## LDMORT ARTINC ASMORT PSMUSE RDMORT PDMORT HDMORT STMORT DBMORT INFVAC MEUNDA 
##  0.359  0.800  0.152  0.391  0.716  0.873  0.721  0.891  0.849  0.284  0.547 
## OVWINC HLTINS SMPREV 
##  0.645  0.670  0.900

##################################
# Removing individual descriptors
# with low KMO values
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors.Numeric[,!colnames(DPA.Descriptors.Numeric) %in% c("ASMORT")]

##################################
# Calculating the Kaiser-Meyer-Olkin Factor Adequacy
##################################
(DPA_KMOFactorAdequacy <- KMO(DPA.Descriptors.Numeric))

## 
## ── Kaiser-Meyer-Olkin criterion (KMO) ──────────────────────────────────────────
## 
## ✔ The overall KMO value for your data is middling.
##   These data are probably suitable for factor analysis.
## 
##   Overall: 0.752
## 
##   For each variable:
## LDMORT ARTINC PSMUSE RDMORT PDMORT HDMORT STMORT DBMORT INFVAC MEUNDA OVWINC 
##  0.434  0.790  0.546  0.832  0.855  0.785  0.882  0.851  0.350  0.603  0.763 
## HLTINS SMPREV 
##  0.655  0.906

##################################
# Removing individual descriptors
# with low KMO values
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors.Numeric[,!colnames(DPA.Descriptors.Numeric) %in% c("INFVAC")]

##################################
# Calculating the Kaiser-Meyer-Olkin Factor Adequacy
##################################
(DPA_KMOFactorAdequacy <- KMO(DPA.Descriptors.Numeric))

## 
## ── Kaiser-Meyer-Olkin criterion (KMO) ──────────────────────────────────────────
## 
## ✔ The overall KMO value for your data is meritorious.
##   These data are probably suitable for factor analysis.
## 
##   Overall: 0.81
## 
##   For each variable:
## LDMORT ARTINC PSMUSE RDMORT PDMORT HDMORT STMORT DBMORT MEUNDA OVWINC HLTINS 
##  0.587  0.780  0.738  0.880  0.851  0.841  0.875  0.840  0.703  0.854  0.562 
## SMPREV 
##  0.900

##################################
# Removing individual descriptors
# with low KMO values
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors.Numeric[,!colnames(DPA.Descriptors.Numeric) %in% c("HLTINS")]

##################################
# Calculating the Kaiser-Meyer-Olkin Factor Adequacy
##################################
(DPA_KMOFactorAdequacy <- KMO(DPA.Descriptors.Numeric))

## 
## ── Kaiser-Meyer-Olkin criterion (KMO) ──────────────────────────────────────────
## 
## ✔ The overall KMO value for your data is meritorious.
##   These data are probably suitable for factor analysis.
## 
##   Overall: 0.832
## 
##   For each variable:
## LDMORT ARTINC PSMUSE RDMORT PDMORT HDMORT STMORT DBMORT MEUNDA OVWINC SMPREV 
##  0.469  0.822  0.645  0.865  0.878  0.805  0.902  0.927  0.734  0.857  0.885

##################################
# Removing individual descriptors
# with low KMO values
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors.Numeric[,!colnames(DPA.Descriptors.Numeric) %in% c("LDMORT")]

##################################
# Calculating the Kaiser-Meyer-Olkin Factor Adequacy
##################################
(DPA_KMOFactorAdequacy <- KMO(DPA.Descriptors.Numeric))

## 
## ── Kaiser-Meyer-Olkin criterion (KMO) ──────────────────────────────────────────
## 
## ✔ The overall KMO value for your data is meritorious.
##   These data are probably suitable for factor analysis.
## 
##   Overall: 0.866
## 
##   For each variable:
## ARTINC PSMUSE RDMORT PDMORT HDMORT STMORT DBMORT MEUNDA OVWINC SMPREV 
##  0.828  0.589  0.880  0.901  0.892  0.908  0.921  0.785  0.857  0.878

##################################
# Removing individual descriptors
# with low KMO values
##################################
DPA.Descriptors.Numeric <- DPA.Descriptors.Numeric[,!colnames(DPA.Descriptors.Numeric) %in% c("PSMUSE")]

##################################
# Calculating the Kaiser-Meyer-Olkin Factor Adequacy
##################################
(DPA_KMOFactorAdequacy <- KMO(DPA.Descriptors.Numeric))

## 
## ── Kaiser-Meyer-Olkin criterion (KMO) ──────────────────────────────────────────
## 
## ✔ The overall KMO value for your data is meritorious.
##   These data are probably suitable for factor analysis.
## 
##   Overall: 0.871
## 
##   For each variable:
## ARTINC RDMORT PDMORT HDMORT STMORT DBMORT MEUNDA OVWINC SMPREV 
##  0.828  0.879  0.935  0.888  0.904  0.919  0.782  0.853  0.876

1.3.4.2 Correlation Matrix Assessment - Bartlett’s Test of Sphericity

[A] The computed p-value from the Bartlett’s Test of Sphericity was statistically significant (<0.00001), rejecting the null hypothesis that the correlation matrix is an identity matrix (ones on the diagonal and zeros on the off-diagonal). Results indicated that there is enough evidence to support the existence of underlying factors, suggesting that the correlation matrix is appropriate for exploratory factor analysis.

Code Chunk | Output

##################################
# Calculating the Bartlett's Test of Sphericity
##################################
(DPA_BartlettTest <- cortest.bartlett(DPA.Descriptors.Numeric,
                                      n=nrow(DPA.Descriptors.Numeric)))

## $chisq
## [1] 321.9186
## 
## $p.value
## [1] 1.28086e-47
## 
## $df
## [1] 36

1.3.4.3 Correlation Matrix Assessment - Covariance Validity

[A] Covariance among descriptors in the correlation matrix was sufficient to justify the conduct of an exploratory factor analysis. 97% (35/36) of the pairwise associations using the Pearson correlation coefficient were above 30%.

Code Chunk | Output

##################################
# Measuring pairwise correlation between descriptors
##################################
(DPA_Correlation <- cor(DPA.Descriptors.Numeric,
                        method = "pearson",
                        use="pairwise.complete.obs"))

##           ARTINC    RDMORT    PDMORT    HDMORT    STMORT    DBMORT    MEUNDA
## ARTINC 1.0000000 0.6029060 0.6629222 0.6365483 0.5828534 0.4816004 0.8032586
## RDMORT 0.6029060 1.0000000 0.5052951 0.5244492 0.7001906 0.6145218 0.4186709
## PDMORT 0.6629222 0.5052951 1.0000000 0.5816386 0.5645313 0.5129611 0.4859062
## HDMORT 0.6365483 0.5244492 0.5816386 1.0000000 0.6977523 0.4122598 0.6401704
## STMORT 0.5828534 0.7001906 0.5645313 0.6977523 1.0000000 0.5041332 0.4920678
## DBMORT 0.4816004 0.6145218 0.5129611 0.4122598 0.5041332 1.0000000 0.3523140
## MEUNDA 0.8032586 0.4186709 0.4859062 0.6401704 0.4920678 0.3523140 1.0000000
## OVWINC 0.4983140 0.6378412 0.6073351 0.6211964 0.6878435 0.4979389 0.2796701
## SMPREV 0.8236782 0.5985883 0.7385143 0.6306883 0.5980833 0.4961349 0.6096485
##           OVWINC    SMPREV
## ARTINC 0.4983140 0.8236782
## RDMORT 0.6378412 0.5985883
## PDMORT 0.6073351 0.7385143
## HDMORT 0.6211964 0.6306883
## STMORT 0.6878435 0.5980833
## DBMORT 0.4979389 0.4961349
## MEUNDA 0.2796701 0.6096485
## OVWINC 1.0000000 0.6726116
## SMPREV 0.6726116 1.0000000

##################################
# Identifying the minimally correlated variables
##################################
(DPA_MinimallyCorrelatedCount <- sum(abs(DPA_Correlation[upper.tri(DPA_Correlation)])>0.30))

## [1] 35

(DPA_AllPairs <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(DPA_MinimallyCorrelatedCountPercentage <- DPA_MinimallyCorrelatedCount/DPA_AllPairs)

## [1] 0.9722222

qgraph(cor(DPA.Descriptors.Numeric),
       cut=0.30,
       details=FALSE,
       posCol="#2F75B5",
       negCol="#FF5050",
       labels=names(DPA.Descriptors.Numeric))

1.3.4.4 Correlation Matrix Assessment - Determinant Computation

[A] The determinant of the correlation matrix was computed as 0.00080 which was greater than 0.00001 indicating the absence of the likelihood for a multicollinearity problem. The results allowed for matrix operations to produce stable results during exploratory factor analysis.

Code Chunk | Output

##################################
# Computing the determinant of the correlation matrix
##################################
(DPA_CorrelationMatrixDeterminant <- det(cor(DPA.Descriptors.Numeric)))

## [1] 0.0008028455

1.3.4.5 Pre-Processed Dataset

[A] The pre-processed tabular dataset was comprised of 50 observations and 9 descriptor variables.
     [A.1] 50 rows (observations)
     [A.2] 9 columns (descriptors)
            [A.2.1] PDMORT
            [A.2.2] DBMORT
            [A.2.3] STMORT
            [A.2.4] HDMORT
            [A.2.5] RDMORT
            [A.2.6] SMPREV
            [A.2.7] OVWINC
            [A.2.8] ARTINC
            [A.2.9] MEUNDA

Code Chunk | Output

##################################
# Gathering descriptive statistics
##################################
(DPA_Skimmed <- skim(DPA.Descriptors.Numeric))

Data summary
Name	DPA.Descriptors.Numeric
Number of rows	50
Number of columns	9
_______________________
Column type frequency:
numeric	9
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	sd	p0	p25	p50	p75	p100	hist
ARTINC	1	1	-1.76	-0.50	-0.19	0.36	3.38	▃▇▃▁▁
RDMORT	1	1	-1.97	-0.67	-0.07	0.48	2.48	▂▇▇▃▁
PDMORT	1	1	-2.74	-0.64	-0.01	0.71	2.08	▁▃▇▇▂
HDMORT	1	1	-1.82	-0.69	-0.32	0.55	2.27	▃▇▅▃▂
STMORT	1	1	-1.95	-0.58	-0.02	0.80	2.07	▃▇▇▅▃
DBMORT	1	1	-1.67	-0.69	-0.17	0.55	3.32	▆▇▅▂▁
MEUNDA	1	1	-1.66	-0.68	0.03	0.59	2.28	▆▆▇▅▂
OVWINC	1	1	-2.29	-0.61	0.18	0.69	1.96	▃▃▇▇▃
SMPREV	1	1	-2.59	-0.65	-0.03	0.61	2.50	▁▆▇▅▂

1.3.5 Data Exploration

1.3.5.1 Correlation Matrix

[A] All pairwise correlations among descriptors were positive with values ranging from 0.28 to 0.82 and were determined as statistically significant.

Code Chunk | Output

##################################
# Formulating the histogram
# for the numeric descriptors
##################################
for (i in 1:ncol(DPA.Descriptors.Numeric)) {
  Median <- format(round(median(DPA.Descriptors.Numeric[,i],na.rm = TRUE),2), nsmall=2)
  Mean <- format(round(mean(DPA.Descriptors.Numeric[,i],na.rm = TRUE),2), nsmall=2)
  Skewness <- format(round(skewness(DPA.Descriptors.Numeric[,i],na.rm = TRUE),2), nsmall=2)
  print(
  ggplot(DPA.Descriptors.Numeric, aes(x=DPA.Descriptors.Numeric[,i])) +
  geom_histogram(binwidth=0.25, color="black", fill="white", breaks = seq(-4, 4, by = 1)) +
    scale_x_continuous(breaks = seq(-4, 4, 1)) +
  geom_vline(aes(xintercept=mean(DPA.Descriptors.Numeric[,i])),
            color="blue", size=1) +
    geom_vline(aes(xintercept=median(DPA.Descriptors.Numeric[,i])),
            color="red", size=1) +
  theme_bw() +
  ylab("Count") +
  xlab(names(DPA.Descriptors.Numeric)[i]) +
  labs(title=names(DPA.Descriptors.Numeric)[i],
       subtitle=paste0("Median = ", Median,
                       ", Mean = ", Mean,
                       ", Skewness = ", Skewness)))
}

##################################
# Plotting the scatterplot matrix
# between descriptors
##################################
DPA_ScatterplotMatrix <- ggpairs(DPA.Descriptors.Numeric, 
                                 upper=list(continuous="points"), 
                                 lower=list(continuous="smooth_loess"), 
                                 diag=list(continuous="barDiag"))

##################################
# Measuring pairwise correlation 
# between the updated descriptors
##################################
(DPA_Correlation <- cor(DPA.Descriptors.Numeric,
                        method = "pearson",
                        use="pairwise.complete.obs"))

##           ARTINC    RDMORT    PDMORT    HDMORT    STMORT    DBMORT    MEUNDA
## ARTINC 1.0000000 0.6029060 0.6629222 0.6365483 0.5828534 0.4816004 0.8032586
## RDMORT 0.6029060 1.0000000 0.5052951 0.5244492 0.7001906 0.6145218 0.4186709
## PDMORT 0.6629222 0.5052951 1.0000000 0.5816386 0.5645313 0.5129611 0.4859062
## HDMORT 0.6365483 0.5244492 0.5816386 1.0000000 0.6977523 0.4122598 0.6401704
## STMORT 0.5828534 0.7001906 0.5645313 0.6977523 1.0000000 0.5041332 0.4920678
## DBMORT 0.4816004 0.6145218 0.5129611 0.4122598 0.5041332 1.0000000 0.3523140
## MEUNDA 0.8032586 0.4186709 0.4859062 0.6401704 0.4920678 0.3523140 1.0000000
## OVWINC 0.4983140 0.6378412 0.6073351 0.6211964 0.6878435 0.4979389 0.2796701
## SMPREV 0.8236782 0.5985883 0.7385143 0.6306883 0.5980833 0.4961349 0.6096485
##           OVWINC    SMPREV
## ARTINC 0.4983140 0.8236782
## RDMORT 0.6378412 0.5985883
## PDMORT 0.6073351 0.7385143
## HDMORT 0.6211964 0.6306883
## STMORT 0.6878435 0.5980833
## DBMORT 0.4979389 0.4961349
## MEUNDA 0.2796701 0.6096485
## OVWINC 1.0000000 0.6726116
## SMPREV 0.6726116 1.0000000

##################################
# Setting the correlation breaks
##################################
DPA_Breaks = seq(0.25, 1.00, 0.01)

##################################
# Setting the correlation binnings
##################################
DPA_CorrelationBins <- .bincode(DPA_Correlation, 
                                DPA_Breaks, 
                                include.lowest = T)

##################################
# Transforming the correlation values
# into color ranges
##################################
DPA_ScatterplotMatrixColorRange <- matrix(factor(DPA_CorrelationBins, 
                                                 levels = 1:length(DPA_Breaks), 
                                                 labels = rev(colorpanel(length(DPA_Breaks),"#DA2A2A","#F2B2B2","#FBE5E5"))),
                                          DPA_ScatterplotMatrix$nrow)

##################################
# Updating the scatterplot matrix
# through color-coding depending
# on the correlation values
##################################
for(i in 1:DPA_ScatterplotMatrix$nrow) {
  for(j in 1:DPA_ScatterplotMatrix$ncol){
    DPA_ScatterplotMatrix[i,j] <- DPA_ScatterplotMatrix[i,j] + 
      theme(panel.background= element_rect(fill=DPA_ScatterplotMatrixColorRange[i,j]))
    
  if(i == j){
    DPA_ScatterplotMatrix[i,j] <- DPA_ScatterplotMatrix[i,j] + 
      theme(panel.background= element_rect(fill="#EBECF0"))
    }

}}

##################################
# Plotting the scatterplot matrix
# color-coded by correlation values
##################################
DPA_ScatterplotMatrix

##################################
# Testing pairwise correlation between descriptors
##################################
DPA_CorrelationTest <- cor.mtest(DPA.Descriptors.Numeric,
                       method = "pearson",
                       conf.level = 0.95)

##################################
# Plotting the pairwise association
# between descriptors
##################################
corrplot(cor(DPA.Descriptors.Numeric,
             method = "pearson",
             use="pairwise.complete.obs"),
             method = "number",
             type = "upper",
             order = "original",
             tl.col = "black",
             tl.cex = 0.75,
             tl.srt = 90,
             sig.level = 0.05,
             p.mat = DPA_CorrelationTest$p,
             insig = "blank")

1.3.6 Model Development

1.3.6.1 Principal Axes Factor Extraction and Varimax Rotation

[A] Appplying Principal Axes factor extraction and Varimax rotation, an evaluation was conducted using a set of empirical guidelines to determine the optimal number of factors to be retained for exploratory factor analysis. It was determined that:
     [A.1] The choice of 3 factors was supported by the Bentler, CNG, t and p methods for determining the optimal number of factors.
     [A.2] The choice of 4 factors was supported by the Beta and Bartlett methods for determining the optimal number of factors.
     [A.3] 3 or 4 factors would be sufficiently plausible for an optimal balance between comprehensiveness and parsimony.
     [A.4] To ensure that both under-extraction and over-extraction are assessed, models with 3 and 4 and factors were sequentially evaluated for their interpretability and theoretical meaningfulness.

[B] Results for the exploratory factor analysis using a 3-Factor Structure were as follows:
     [B.1] Standardized Root Mean Square of the Residual = 0.03
     [B.2] Tucker-Lewis Fit Index = 0.98
     [B.3] Bayesian Information Criterion = -33.48
     [B.4] High Residual Rate = 0.11
     [B.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [B.5.1] STMORT: Loading = 0.82, Communality = 0.80
            [B.5.2] OVWINC: Loading = 0.73, Communality = 0.72
            [B.5.3] RDMORT: Loading = 0.70, Communality = 0.63
            [B.5.4] HDMORT: Loading = 0.57, Communality = 0.62
            [B.5.5] DBMORT: Loading = 0.52, Communality = 0.41
            [B.5.6] Cronbach’s Alpha = 0.88
     [B.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [B.6.1] MEUNDA: Loading = 0.99, Communality = 1.06
            [B.6.2] ARTINC: Loading = 0.62, Communality = 0.84
            [B.6.3] Cronbach’s Alpha = 0.89
     [B.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [B.7.1] SMPREV: Loading = 0.77, Communality = 0.90
            [B.7.2] PDMORT: Loading = 0.60, Communality = 0.62
            [B.7.3] Cronbach’s Alpha = 0.85

[C] Results for the exploratory factor analysis using a 4-Factor Structure were as follows:
     [C.1] Standardized Root Mean Square of the Residual = 0.01
     [C.2] Tucker-Lewis Fit Index = 1.08
     [C.3] Bayesian Information Criterion = -21.18
     [C.4] High Residual Rate = 0.05
     [C.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [C.5.1] MEUNDA: Loading = 0.90, Communality = 0.90
            [C.5.2] ARTINC: Loading = 0.70, Communality = 0.89
            [C.5.3] Cronbach’s Alpha = 0.89
     [C.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [C.6.1] RDMORT: Loading = 0.89, Communality = 0.95
            [C.6.2] DBMORT: Loading = 0.50, Communality = 0.44
            [C.6.3] Standardized Cronbach’s Alpha = 0.76
     [C.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [C.7.1] SMPREV: Loading = 0.73, Communality = 0.87
            [C.7.2] PDMORT: Loading = 0.61, Communality = 0.65
            [C.7.3] Cronbach’s Alpha = 0.85
     [C.8] Factor 4 was a latent factor with higher loading towards the following descriptors:
            [C.8.1] HDMORT: Loading = 0.68, Communality = 0.79
            [C.8.2] STMORT: Loading = 0.60, Communality = 0.73
            [C.8.3] OVWINC: Loading = 0.59, Communality = 0.79
            [C.8.4] Cronbach’s Alpha = 0.86

Code Chunk | Output

##################################
# Implementing various procedures for determining
# factor retention based on
# the maximum consensus between methods
##################################
(FA_PA_V_MethodAgreementProcedure <- parameters::n_factors(DPA.Descriptors.Numeric,
                                                           algorithm = "pa",
                                                           rotation = "varimax"))

## # Method Agreement Procedure:
## 
## The choice of 1 dimensions is supported by 6 (42.86%) methods out of 14 (Optimal coordinates, Acceleration factor, Parallel analysis, Kaiser criterion, Scree (SE), Scree (R2)).

as.data.frame(FA_PA_V_MethodAgreementProcedure)

##    n_Factors              Method              Family
## 1          1 Optimal coordinates               Scree
## 2          1 Acceleration factor               Scree
## 3          1   Parallel analysis               Scree
## 4          1    Kaiser criterion               Scree
## 5          1          Scree (SE)            Scree_SE
## 6          1          Scree (R2)            Scree_SE
## 7          3             Bentler             Bentler
## 8          3                 CNG                 CNG
## 9          3                   t Multiple_regression
## 10         3                   p Multiple_regression
## 11         4            Bartlett             Barlett
## 12         4                beta Multiple_regression
## 13         5            Anderson             Barlett
## 14         5              Lawley             Barlett

##################################
# Conducting exploratory factor analysis
# using Principal Axes extraction
# and Varimax rotation
# with 3 factors
##################################
(FA_PA_V_3F <- fa(DPA.Descriptors.Numeric,
              nfactors = 3,
              fm="pa",
              rotate = "varimax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  pa
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 3, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "varimax", residuals = TRUE, SMC = TRUE, fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         PA1  PA2  PA3   h2    u2 com
## ARTINC 0.35 0.62 0.58 0.84  0.16 2.6
## RDMORT 0.70 0.22 0.31 0.63  0.37 1.6
## PDMORT 0.45 0.27 0.60 0.63  0.37 2.3
## HDMORT 0.57 0.46 0.30 0.62  0.38 2.5
## STMORT 0.82 0.31 0.18 0.80  0.20 1.4
## DBMORT 0.52 0.17 0.33 0.41  0.59 1.9
## MEUNDA 0.19 0.99 0.23 1.06 -0.06 1.2
## OVWINC 0.73 0.05 0.44 0.72  0.28 1.6
## SMPREV 0.41 0.36 0.77 0.90  0.10 2.0
## 
##                        PA1  PA2  PA3
## SS loadings           2.81 1.94 1.86
## Proportion Var        0.31 0.22 0.21
## Cumulative Var        0.31 0.53 0.73
## Proportion Explained  0.43 0.29 0.28
## Cumulative Proportion 0.43 0.72 1.00
## 
## Mean item complexity =  1.9
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 12  and the objective function was  0.31 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  50 with the empirical chi square  3.79  with prob <  0.99 
## The total n.obs was  50  with Likelihood Chi Square =  13.46  with prob <  0.34 
## 
## Tucker Lewis Index of factoring reliability =  0.984
## RMSEA index =  0.045  and the 90 % confidence intervals are  0 0.158
## BIC =  -33.48
## Fit based upon off diagonal values = 1

(FA_PA_V_3F_Summary <- FA_PA_V_3F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (varimax-rotation)
## 
## Variable |  PA1 |  PA2 |  PA3 | Complexity | Uniqueness
## -------------------------------------------------------
## STMORT   | 0.82 |      |      |       1.39 |       0.20
## OVWINC   | 0.73 |      |      |       1.65 |       0.28
## RDMORT   | 0.70 |      |      |       1.60 |       0.37
## HDMORT   | 0.57 |      |      |       2.47 |       0.38
## DBMORT   | 0.52 |      |      |       1.93 |       0.59
## MEUNDA   |      | 0.99 |      |       1.19 |      -0.06
## ARTINC   |      | 0.62 |      |       2.58 |       0.16
## SMPREV   |      |      | 0.77 |       1.99 |       0.10
## PDMORT   |      |      | 0.60 |       2.30 |       0.37
## 
## The 3 latent factors (varimax rotation) accounted for 73.47% of the total variance of the original data (PA1 = 31.23%, PA2 = 21.53%, PA3 = 20.71%).

summary(FA_PA_V_3F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   PA1 |   PA2 |   PA3
## -------------------------------------------------------
## Eigenvalues                     | 5.397 | 0.854 | 0.361
## Variance Explained              | 0.312 | 0.215 | 0.207
## Variance Explained (Cumulative) | 0.312 | 0.528 | 0.735
## Variance Explained (Proportion) | 0.425 | 0.293 | 0.282

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_PA_V_3F_Residual <- residuals(FA_PA_V_3F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.05    NA                                          
## PDMORT  0.00 -0.05    NA                                    
## HDMORT -0.02 -0.07  0.02    NA                              
## STMORT  0.00  0.00  0.00  0.03    NA                        
## DBMORT  0.01  0.11  0.04 -0.06 -0.04    NA                  
## MEUNDA  0.00  0.00  0.00  0.01 -0.01  0.01    NA            
## OVWINC -0.04 -0.02  0.01  0.06  0.00 -0.03  0.00    NA      
## SMPREV  0.01 -0.01  0.00  0.00  0.01 -0.03  0.00  0.02    NA

##################################
# Obtaining Fit Indices
##################################
(FA_PA_V_3F_RMS <- FA_PA_V_3F$rms)

## [1] 0.03244678

(FA_PA_V_3F_TLI <- FA_PA_V_3F$TLI)

## [1] 0.9838317

(FA_PA_V_3F_BIC <- FA_PA_V_3F$BIC)

## [1] -33.48016

(FA_PA_V_3F_MaxResidual   <- max(abs(FA_PA_V_3F_Residual),na.rm=TRUE))

## [1] 0.1120448

(FA_PA_V_3F_HighResidual  <- sum(FA_PA_V_3F_Residual>abs(0.05),na.rm=TRUE))

## [1] 4

(FA_PA_V_3F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_PA_V_3F_HighResidualRate <- FA_PA_V_3F_HighResidual/FA_PA_V_3F_TotalResidual)

## [1] 0.1111111

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_PA_V_3F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Principal Axes Factor Extraction + Varimax Rotation : 3 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","RDMORT","HDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "RDMORT", 
##     "HDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.88      0.88    0.87      0.59 7.2 0.028 -2.6e-16 0.82     0.62
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.88  0.92
## Duhachek  0.82  0.88  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## STMORT      0.83      0.83    0.80      0.55 4.9    0.039 0.0079  0.57
## OVWINC      0.84      0.84    0.83      0.58 5.4    0.036 0.0133  0.57
## RDMORT      0.84      0.84    0.81      0.57 5.3    0.037 0.0134  0.56
## HDMORT      0.86      0.86    0.83      0.61 6.2    0.032 0.0077  0.63
## DBMORT      0.88      0.88    0.85      0.64 7.3    0.028 0.0046  0.66
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.88  0.88  0.85   0.79  4.6e-16  1
## OVWINC 50  0.84  0.84  0.79   0.74 -1.8e-15  1
## RDMORT 50  0.85  0.85  0.81   0.75  3.3e-16  1
## HDMORT 50  0.79  0.79  0.73   0.67 -2.9e-16  1
## DBMORT 50  0.74  0.74  0.64   0.59 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_PA_V_3F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "prax",
                                  nfac = 3,
                                  rotation = "varimax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_PA_V_3F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

##################################
# Conducting exploratory factor analysis
# using Principal Axes extraction
# and Varimax rotation
# with 4 factors
##################################
(FA_PA_V_4F <- fa(DPA.Descriptors.Numeric,
              nfactors = 4,
              fm="pa",
              rotate = "varimax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  pa
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 4, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "varimax", residuals = TRUE, SMC = TRUE, fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         PA2  PA3  PA1  PA4   h2    u2 com
## ARTINC 0.70 0.33 0.52 0.16 0.89 0.107 2.4
## RDMORT 0.22 0.89 0.22 0.26 0.95 0.054 1.4
## PDMORT 0.30 0.27 0.61 0.32 0.65 0.354 2.5
## HDMORT 0.46 0.21 0.26 0.68 0.79 0.214 2.4
## STMORT 0.27 0.49 0.25 0.60 0.73 0.267 2.8
## DBMORT 0.18 0.50 0.33 0.21 0.44 0.561 2.4
## MEUNDA 0.90 0.15 0.18 0.21 0.90 0.098 1.3
## OVWINC 0.00 0.42 0.52 0.59 0.79 0.215 2.8
## SMPREV 0.42 0.30 0.73 0.27 0.87 0.128 2.3
## 
##                        PA2  PA3  PA1  PA4
## SS loadings           1.93 1.80 1.76 1.51
## Proportion Var        0.21 0.20 0.20 0.17
## Cumulative Var        0.21 0.41 0.61 0.78
## Proportion Explained  0.28 0.26 0.25 0.22
## Cumulative Proportion 0.28 0.53 0.78 1.00
## 
## Mean item complexity =  2.3
## Test of the hypothesis that 4 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 6  and the objective function was  0.05 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  50 with the empirical chi square  0.5  with prob <  1 
## The total n.obs was  50  with Likelihood Chi Square =  2.29  with prob <  0.89 
## 
## Tucker Lewis Index of factoring reliability =  1.083
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.085
## BIC =  -21.18
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA2  PA3  PA1  PA4
## Correlation of (regression) scores with factors   0.94 0.95 0.88 0.85
## Multiple R square of scores with factors          0.89 0.90 0.78 0.72
## Minimum correlation of possible factor scores     0.78 0.81 0.56 0.45

(FA_PA_V_4F_Summary <- FA_PA_V_4F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (varimax-rotation)
## 
## Variable |  PA2 |  PA3 |  PA1 |  PA4 | Complexity | Uniqueness
## --------------------------------------------------------------
## MEUNDA   | 0.90 |      |      |      |       1.26 |       0.10
## ARTINC   | 0.70 |      |      |      |       2.44 |       0.11
## RDMORT   |      | 0.89 |      |      |       1.43 |       0.05
## DBMORT   |      | 0.50 |      |      |       2.44 |       0.56
## SMPREV   |      |      | 0.73 |      |       2.32 |       0.13
## PDMORT   |      |      | 0.61 |      |       2.50 |       0.35
## HDMORT   |      |      |      | 0.68 |       2.36 |       0.21
## STMORT   |      |      |      | 0.60 |       2.78 |       0.27
## OVWINC   |      |      |      | 0.59 |       2.81 |       0.21
## 
## The 4 latent factors (varimax rotation) accounted for 77.81% of the total variance of the original data (PA2 = 21.45%, PA3 = 19.97%, PA1 = 19.58%, PA4 = 16.81%).

summary(FA_PA_V_4F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   PA2 |   PA3 |   PA1 |   PA4
## ---------------------------------------------------------------
## Eigenvalues                     | 5.441 | 0.825 | 0.391 | 0.347
## Variance Explained              | 0.215 | 0.200 | 0.196 | 0.168
## Variance Explained (Cumulative) | 0.215 | 0.414 | 0.610 | 0.778
## Variance Explained (Proportion) | 0.276 | 0.257 | 0.252 | 0.216

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_PA_V_4F_Residual <- residuals(FA_PA_V_4F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.00    NA                                          
## PDMORT -0.01 -0.02    NA                                    
## HDMORT  0.00  0.00  0.01    NA                              
## STMORT  0.00  0.00  0.00  0.00    NA                        
## DBMORT -0.01  0.00  0.05 -0.01  0.00    NA                  
## MEUNDA  0.00  0.00  0.00  0.00  0.00  0.01    NA            
## OVWINC  0.00  0.00 -0.01  0.00  0.00 -0.01  0.00    NA      
## SMPREV  0.01  0.01  0.00  0.00 -0.01 -0.03  0.00  0.01    NA

##################################
# Obtaining Fit Indices
##################################
(FA_PA_V_4F_RMS <- FA_PA_V_4F$rms)

## [1] 0.01182674

(FA_PA_V_4F_TLI <- FA_PA_V_4F$TLI)

## [1] 1.083371

(FA_PA_V_4F_BIC <- FA_PA_V_4F$BIC)

## [1] -21.18092

(FA_PA_V_4F_MaxResidual   <- max(abs(FA_PA_V_4F_Residual),na.rm=TRUE))

## [1] 0.05035344

(FA_PA_V_4F_HighResidual  <- sum(FA_PA_V_4F_Residual>abs(0.05),na.rm=TRUE))

## [1] 2

(FA_PA_V_4F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_PA_V_4F_HighResidualRate <- FA_PA_V_4F_HighResidual/FA_PA_V_4F_TotalResidual)

## [1] 0.05555556

##################################
# Graphing the factor loading matrices
##################################
fa.diagram(FA_PA_V_4F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Principal Axes Factor Extraction + Varimax Rotation : 4 Factors",
           cex=0.75)

##################################
# Computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# Computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("RDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("RDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean  sd median_r
##       0.76      0.76    0.61      0.61 3.2 0.068 1.5e-16 0.9     0.61
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.58  0.76  0.86
## Duhachek  0.63  0.76  0.89
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## RDMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## DBMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## RDMORT 50   0.9   0.9   0.7   0.61  3.3e-16  1
## DBMORT 50   0.9   0.9   0.7   0.61 -1.6e-17  1

##################################
# Computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# Computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","HDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "HDMORT")], 
##     check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.86      0.86     0.8      0.67 6.1 0.035 -5.4e-16 0.88     0.69
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.77  0.86  0.91
## Duhachek  0.79  0.86  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## STMORT      0.77      0.77    0.62      0.62 3.3    0.066    NA  0.62
## OVWINC      0.82      0.82    0.70      0.70 4.6    0.050    NA  0.70
## HDMORT      0.82      0.82    0.69      0.69 4.4    0.052    NA  0.69
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.90  0.90  0.83   0.77  4.6e-16  1
## OVWINC 50  0.87  0.87  0.77   0.71 -1.8e-15  1
## HDMORT 50  0.88  0.88  0.78   0.72 -2.9e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_PA_V_4F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "prax",
                                  nfac = 4,
                                  rotation = "varimax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_PA_V_4F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

1.3.6.2 Principal Axes Factor Extraction and Promax Rotation

[A] Appplying Principal Axes factor extraction and Promax rotation, an evaluation was conducted using a set of empirical guidelines to determine the optimal number of factors to be retained for exploratory factor analysis. It was determined that:
     [A.1] The choice of 3 factors was supported by the Bentler, CNG, t and p methods for determining the optimal number of factors.
     [A.2] The choice of 4 factors was supported by the Beta and Bartlett methods for determining the optimal number of factors.
     [A.3] 3 or 4 factors would be sufficiently plausible for an optimal balance between comprehensiveness and parsimony.
     [A.4] To ensure that both under-extraction and over-extraction are assessed, models with 3 and 4 and factors were sequentially evaluated for their interpretability and theoretical meaningfulness.

[B] Results for the exploratory factor analysis using a 3-Factor Structure were as follows:
     [B.1] Standardized Root Mean Square of the Residual = 0.03
     [B.2] Tucker-Lewis Fit Index = 0.98
     [B.3] Bayesian Information Criterion = -33.48
     [B.4] High Residual Rate = 0.11
     [B.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [B.5.1] STMORT: Loading = 0.98, Communality = 0.80
            [B.5.2] OVWINC: Loading = 0.73, Communality = 0.72
            [B.5.3] RDMORT: Loading = 0.74, Communality = 0.63
            [B.5.4] HDMORT: Loading = 0.52, Communality = 0.62
            [B.5.5] DBMORT: Loading = 0.48, Communality = 0.41
            [B.5.6] Cronbach’s Alpha = 0.88
     [B.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [B.6.1] SMPREV: Loading = 0.91, Communality = 0.90
            [B.6.2] PDMORT: Loading = 0.63, Communality = 0.63
            [B.6.3] ARTINC: Loading = 0.57, Communality = 0.84
            [B.6.4] Cronbach’s Alpha = 0.89
     [B.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [B.7.1] MEUNDA: Loading = 1.05, Communality = 1.06
            [B.7.2] Cronbach’s Alpha = 0.85

[C] Results for the exploratory factor analysis using a 4-Factor Structure were as follows:
     [C.1] Standardized Root Mean Square of the Residual = 0.01
     [C.2] Tucker-Lewis Fit Index = 1.08
     [C.3] Bayesian Information Criterion = -21.18
     [C.4] High Residual Rate = 0.05
     [C.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [C.5.1] MEUNDA: Loading = 0.91, Communality = 0.90
            [C.5.2] ARTINC: Loading = 0.56, Communality = 0.89
            [C.5.3] Cronbach’s Alpha = 0.89
     [C.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [C.6.1] RDMORT: Loading = 1.04, Communality = 0.95
            [C.6.2] DBMORT: Loading = 0.47, Communality = 0.44
            [C.6.3] Cronbach’s Alpha = 0.76
     [C.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [C.7.1] SMPREV: Loading = 0.87, Communality = 0.87
            [C.7.2] PDMORT: Loading = 0.71, Communality = 0.65
            [C.7.3] Cronbach’s Alpha = 0.85
     [C.8] Factor 4 was a latent factor with higher loading towards the following descriptors:
            [C.8.1] HDMORT: Loading = 0.78, Communality = 0.79
            [C.8.2] STMORT: Loading = 0.60, Communality = 0.73
            [C.8.3] OVWINC: Loading = 0.53, Communality = 0.79
            [C.8.4] Cronbach’s Alpha = 0.86

Code Chunk | Output

##################################
# Implementing various procedures for determining
# factor retention based on
# the maximum consensus between methods
##################################
(FA_PA_P_MethodAgreementProcedure <- parameters::n_factors(DPA.Descriptors.Numeric,
                                                           algorithm = "pa",
                                                           rotation = "promax"))

## # Method Agreement Procedure:
## 
## The choice of 1 dimensions is supported by 6 (42.86%) methods out of 14 (Optimal coordinates, Acceleration factor, Parallel analysis, Kaiser criterion, Scree (SE), Scree (R2)).

as.data.frame(FA_PA_P_MethodAgreementProcedure)

##    n_Factors              Method              Family
## 1          1 Optimal coordinates               Scree
## 2          1 Acceleration factor               Scree
## 3          1   Parallel analysis               Scree
## 4          1    Kaiser criterion               Scree
## 5          1          Scree (SE)            Scree_SE
## 6          1          Scree (R2)            Scree_SE
## 7          3             Bentler             Bentler
## 8          3                 CNG                 CNG
## 9          3                   t Multiple_regression
## 10         3                   p Multiple_regression
## 11         4            Bartlett             Barlett
## 12         4                beta Multiple_regression
## 13         5            Anderson             Barlett
## 14         5              Lawley             Barlett

##################################
# Conducting exploratory factor analysis
# using Principal Axes extraction
# and Promax rotation
# with 3 factors
##################################
(FA_PA_P_3F <- fa(DPA.Descriptors.Numeric,
              nfactors = 3,
              fm="pa",
              rotate = "promax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  pa
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 3, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "promax", residuals = TRUE, SMC = TRUE, fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          PA1   PA3   PA2   h2    u2 com
## ARTINC  0.00  0.57  0.45 0.84  0.16 1.9
## RDMORT  0.74  0.07  0.00 0.63  0.37 1.0
## PDMORT  0.19  0.63  0.01 0.63  0.37 1.2
## HDMORT  0.52  0.06  0.32 0.62  0.38 1.7
## STMORT  0.98 -0.21  0.13 0.80  0.20 1.1
## DBMORT  0.48  0.21 -0.03 0.41  0.59 1.4
## MEUNDA -0.06  0.01  1.05 1.06 -0.06 1.0
## OVWINC  0.73  0.30 -0.26 0.72  0.28 1.6
## SMPREV  0.00  0.91  0.06 0.90  0.10 1.0
## 
##                        PA1  PA3  PA2
## SS loadings           2.83 2.11 1.67
## Proportion Var        0.31 0.23 0.19
## Cumulative Var        0.31 0.55 0.73
## Proportion Explained  0.43 0.32 0.25
## Cumulative Proportion 0.43 0.75 1.00
## 
##  With factor correlations of 
##      PA1  PA3  PA2
## PA1 1.00 0.76 0.53
## PA3 0.76 1.00 0.60
## PA2 0.53 0.60 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 12  and the objective function was  0.31 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  50 with the empirical chi square  3.79  with prob <  0.99 
## The total n.obs was  50  with Likelihood Chi Square =  13.46  with prob <  0.34 
## 
## Tucker Lewis Index of factoring reliability =  0.984
## RMSEA index =  0.045  and the 90 % confidence intervals are  0 0.158
## BIC =  -33.48
## Fit based upon off diagonal values = 1

(FA_PA_P_3F_Summary <- FA_PA_P_3F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (promax-rotation)
## 
## Variable |  PA1 |  PA3 |  PA2 | Complexity | Uniqueness
## -------------------------------------------------------
## STMORT   | 0.98 |      |      |       1.13 |       0.20
## RDMORT   | 0.74 |      |      |       1.02 |       0.37
## OVWINC   | 0.73 |      |      |       1.59 |       0.28
## HDMORT   | 0.52 |      |      |       1.68 |       0.38
## DBMORT   | 0.48 |      |      |       1.36 |       0.59
## SMPREV   |      | 0.91 |      |       1.01 |       0.10
## PDMORT   |      | 0.63 |      |       1.18 |       0.37
## ARTINC   |      | 0.57 |      |       1.91 |       0.16
## MEUNDA   |      |      | 1.05 |       1.01 |      -0.06
## 
## The 3 latent factors (promax rotation) accounted for 73.47% of the total variance of the original data (PA1 = 31.43%, PA3 = 23.45%, PA2 = 18.59%).

summary(FA_PA_P_3F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   PA1 |   PA3 |   PA2
## -------------------------------------------------------
## Eigenvalues                     | 5.397 | 0.854 | 0.361
## Variance Explained              | 0.314 | 0.235 | 0.186
## Variance Explained (Cumulative) | 0.314 | 0.549 | 0.735
## Variance Explained (Proportion) | 0.428 | 0.319 | 0.253

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_PA_P_3F_Residual <- residuals(FA_PA_P_3F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.05    NA                                          
## PDMORT  0.00 -0.05    NA                                    
## HDMORT -0.02 -0.07  0.02    NA                              
## STMORT  0.00  0.00  0.00  0.03    NA                        
## DBMORT  0.01  0.11  0.04 -0.06 -0.04    NA                  
## MEUNDA  0.00  0.00  0.00  0.01 -0.01  0.01    NA            
## OVWINC -0.04 -0.02  0.01  0.06  0.00 -0.03  0.00    NA      
## SMPREV  0.01 -0.01  0.00  0.00  0.01 -0.03  0.00  0.02    NA

##################################
# Obtaining Fit Indices
##################################
(FA_PA_P_3F_RMS <- FA_PA_P_3F$rms)

## [1] 0.03244678

(FA_PA_P_3F_TLI <- FA_PA_P_3F$TLI)

## [1] 0.9838317

(FA_PA_P_3F_BIC <- FA_PA_P_3F$BIC)

## [1] -33.48016

(FA_PA_P_3F_MaxResidual   <- max(abs(FA_PA_P_3F_Residual),na.rm=TRUE))

## [1] 0.1120448

(FA_PA_P_3F_HighResidual  <- sum(FA_PA_P_3F_Residual>abs(0.05),na.rm=TRUE))

## [1] 4

(FA_PA_P_3F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_PA_P_3F_HighResidualRate <- FA_PA_P_3F_HighResidual/FA_PA_P_3F_TotalResidual)

## [1] 0.1111111

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_PA_P_3F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Principal Axes Factor Extraction + Promax Rotation : 3 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","RDMORT","HDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "RDMORT", 
##     "HDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.88      0.88    0.87      0.59 7.2 0.028 -2.6e-16 0.82     0.62
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.88  0.92
## Duhachek  0.82  0.88  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## STMORT      0.83      0.83    0.80      0.55 4.9    0.039 0.0079  0.57
## OVWINC      0.84      0.84    0.83      0.58 5.4    0.036 0.0133  0.57
## RDMORT      0.84      0.84    0.81      0.57 5.3    0.037 0.0134  0.56
## HDMORT      0.86      0.86    0.83      0.61 6.2    0.032 0.0077  0.63
## DBMORT      0.88      0.88    0.85      0.64 7.3    0.028 0.0046  0.66
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.88  0.88  0.85   0.79  4.6e-16  1
## OVWINC 50  0.84  0.84  0.79   0.74 -1.8e-15  1
## RDMORT 50  0.85  0.85  0.81   0.75  3.3e-16  1
## HDMORT 50  0.79  0.79  0.73   0.67 -2.9e-16  1
## DBMORT 50  0.74  0.74  0.64   0.59 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_PA_P_3F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "prax",
                                  nfac = 3,
                                  rotation = "promax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_PA_P_3F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

##################################
# Conducting exploratory factor analysis
# using Principal Axes extraction
# and Promax rotation
# with 4 factors
##################################
(FA_PA_P_4F <- fa(DPA.Descriptors.Numeric,
              nfactors = 4,
              fm="pa",
              rotate = "promax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  pa
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 4, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "promax", residuals = TRUE, SMC = TRUE, fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          PA1   PA4   PA2   PA3   h2    u2 com
## ARTINC  0.50 -0.11  0.56  0.13 0.89 0.107 2.2
## RDMORT -0.13  0.01  0.04  1.04 0.95 0.054 1.0
## PDMORT  0.71  0.13  0.05 -0.04 0.65 0.354 1.1
## HDMORT  0.02  0.78  0.31 -0.12 0.79 0.214 1.4
## STMORT -0.05  0.60  0.08  0.32 0.73 0.267 1.6
## DBMORT  0.22  0.02  0.01  0.47 0.44 0.561 1.4
## MEUNDA -0.03  0.13  0.91  0.00 0.90 0.098 1.0
## OVWINC  0.47  0.53 -0.31  0.12 0.79 0.215 2.7
## SMPREV  0.87 -0.01  0.16 -0.04 0.87 0.128 1.1
## 
##                        PA1  PA4  PA2  PA3
## SS loadings           2.14 1.65 1.62 1.59
## Proportion Var        0.24 0.18 0.18 0.18
## Cumulative Var        0.24 0.42 0.60 0.78
## Proportion Explained  0.31 0.24 0.23 0.23
## Cumulative Proportion 0.31 0.54 0.77 1.00
## 
##  With factor correlations of 
##      PA1  PA4  PA2  PA3
## PA1 1.00 0.70 0.55 0.72
## PA4 0.70 1.00 0.44 0.69
## PA2 0.55 0.44 1.00 0.41
## PA3 0.72 0.69 0.41 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 4 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 6  and the objective function was  0.05 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  50 with the empirical chi square  0.5  with prob <  1 
## The total n.obs was  50  with Likelihood Chi Square =  2.29  with prob <  0.89 
## 
## Tucker Lewis Index of factoring reliability =  1.083
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.085
## BIC =  -21.18
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA4  PA2  PA3
## Correlation of (regression) scores with factors   0.96 0.94 0.96 0.98
## Multiple R square of scores with factors          0.92 0.87 0.92 0.96
## Minimum correlation of possible factor scores     0.85 0.75 0.85 0.91

(FA_PA_P_4F_Summary <- FA_PA_P_4F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (promax-rotation)
## 
## Variable |  PA1 |  PA4 |  PA2 |  PA3 | Complexity | Uniqueness
## --------------------------------------------------------------
## SMPREV   | 0.87 |      |      |      |       1.07 |       0.13
## PDMORT   | 0.71 |      |      |      |       1.08 |       0.35
## HDMORT   |      | 0.78 |      |      |       1.37 |       0.21
## STMORT   |      | 0.60 |      |      |       1.59 |       0.27
## OVWINC   |      | 0.53 |      |      |       2.73 |       0.21
## MEUNDA   |      |      | 0.91 |      |       1.04 |       0.10
## ARTINC   |      |      | 0.56 |      |       2.18 |       0.11
## RDMORT   |      |      |      | 1.04 |       1.03 |       0.05
## DBMORT   |      |      |      | 0.47 |       1.42 |       0.56
## 
## The 4 latent factors (promax rotation) accounted for 77.81% of the total variance of the original data (PA1 = 23.77%, PA4 = 18.37%, PA2 = 17.98%, PA3 = 17.70%).

summary(FA_PA_P_4F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   PA1 |   PA4 |   PA2 |   PA3
## ---------------------------------------------------------------
## Eigenvalues                     | 5.441 | 0.825 | 0.391 | 0.347
## Variance Explained              | 0.238 | 0.184 | 0.180 | 0.177
## Variance Explained (Cumulative) | 0.238 | 0.421 | 0.601 | 0.778
## Variance Explained (Proportion) | 0.305 | 0.236 | 0.231 | 0.227

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_PA_P_4F_Residual <- residuals(FA_PA_P_4F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.00    NA                                          
## PDMORT -0.01 -0.02    NA                                    
## HDMORT  0.00  0.00  0.01    NA                              
## STMORT  0.00  0.00  0.00  0.00    NA                        
## DBMORT -0.01  0.00  0.05 -0.01  0.00    NA                  
## MEUNDA  0.00  0.00  0.00  0.00  0.00  0.01    NA            
## OVWINC  0.00  0.00 -0.01  0.00  0.00 -0.01  0.00    NA      
## SMPREV  0.01  0.01  0.00  0.00 -0.01 -0.03  0.00  0.01    NA

##################################
# Obtaining Fit Indices
##################################
(FA_PA_P_4F_RMS <- FA_PA_P_4F$rms)

## [1] 0.01182674

(FA_PA_P_4F_TLI <- FA_PA_P_4F$TLI)

## [1] 1.083371

(FA_PA_P_4F_BIC <- FA_PA_P_4F$BIC)

## [1] -21.18092

(FA_PA_P_4F_MaxResidual   <- max(abs(FA_PA_P_4F_Residual),na.rm=TRUE))

## [1] 0.05035344

(FA_PA_P_4F_HighResidual  <- sum(FA_PA_P_4F_Residual>abs(0.05),na.rm=TRUE))

## [1] 2

(FA_PA_P_4F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_PA_P_4F_HighResidualRate <- FA_PA_P_4F_HighResidual/FA_PA_P_4F_TotalResidual)

## [1] 0.05555556

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_PA_P_4F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Principal Axes Factor Extraction + Promax Rotation : 4 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("RDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("RDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean  sd median_r
##       0.76      0.76    0.61      0.61 3.2 0.068 1.5e-16 0.9     0.61
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.58  0.76  0.86
## Duhachek  0.63  0.76  0.89
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## RDMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## DBMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## RDMORT 50   0.9   0.9   0.7   0.61  3.3e-16  1
## DBMORT 50   0.9   0.9   0.7   0.61 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","HDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "HDMORT")], 
##     check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.86      0.86     0.8      0.67 6.1 0.035 -5.4e-16 0.88     0.69
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.77  0.86  0.91
## Duhachek  0.79  0.86  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## STMORT      0.77      0.77    0.62      0.62 3.3    0.066    NA  0.62
## OVWINC      0.82      0.82    0.70      0.70 4.6    0.050    NA  0.70
## HDMORT      0.82      0.82    0.69      0.69 4.4    0.052    NA  0.69
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.90  0.90  0.83   0.77  4.6e-16  1
## OVWINC 50  0.87  0.87  0.77   0.71 -1.8e-15  1
## HDMORT 50  0.88  0.88  0.78   0.72 -2.9e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_PA_P_4F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "prax",
                                  nfac = 4,
                                  rotation = "promax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_PA_P_4F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

1.3.6.3 Maximum Likelihood Factor Extraction and Varimax Rotation

[A] Appplying Maximum Likelihood factor extraction and Varimax rotation, an evaluation was conducted using a set of empirical guidelines to determine the optimal number of factors to be retained for exploratory factor analysis. It was determined that:
     [A.1] The choice of 3 factors was supported by the Bentler, CNG, t and p methods for determining the optimal number of factors.
     [A.2] The choice of 4 factors was supported by the Beta, Bartlett and Adjusted BIC methods for determining the optimal number of factors.
     [A.3] 3 or 4 factors would be sufficiently plausible for an optimal balance between comprehensiveness and parsimony.
     [A.4] To ensure that both under-extraction and over-extraction are assessed, models with 3 and 4 and factors were sequentially evaluated for their interpretability and theoretical meaningfulness.

[B] Results for the exploratory factor analysis using a 3-Factor Structure were as follows:
     [B.1] Standardized Root Mean Square of the Residual = 0.03
     [B.2] Tucker-Lewis Fit Index = 0.99
     [B.3] Bayesian Information Criterion = -34.37
     [B.4] High Residual Rate = 0.13
     [B.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [B.5.1] STMORT: Loading = 0.81, Communality = 0.79
            [B.5.2] OVWINC: Loading = 0.74, Communality = 0.75
            [B.5.3] RDMORT: Loading = 0.68, Communality = 0.61
            [B.5.4] HDMORT: Loading = 0.60, Communality = 0.67
            [B.5.5] DBMORT: Loading = 0.51, Communality = 0.38
            [B.5.6] Cronbach’s Alpha = 0.88
     [B.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [B.6.1] MEUNDA: Loading = 0.95, Communality = 1.00
            [B.6.2] ARTINC: Loading = 0.64, Communality = 0.84
            [B.6.3] Cronbach’s Alpha = 0.89
     [B.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [B.7.1] SMPREV: Loading = 0.79, Communality = 0.93
            [B.7.2] PDMORT: Loading = 0.56, Communality = 0.61
            [B.7.3] Cronbach’s Alpha = 0.85

[C] Results for the exploratory factor analysis using a 4-Factor Structure were as follows:
     [C.1] Standardized Root Mean Square of the Residual = 0.01
     [C.2] Tucker-Lewis Fit Index = 1.09
     [C.3] Bayesian Information Criterion = -21.64
     [C.4] High Residual Rate = 0.06
     [C.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [C.5.1] MEUNDA: Loading = 0.90, Communality = 0.90
            [C.5.2] ARTINC: Loading = 0.69, Communality = 0.89
            [C.5.3] Cronbach’s Alpha = 0.89
     [C.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [C.6.1] RDMORT: Loading = 0.90, Communality = 0.96
            [C.6.2] DBMORT: Loading = 0.51, Communality = 0.43
            [C.6.3] Cronbach’s Alpha = 0.76
     [C.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [C.7.1] SMPREV: Loading = 0.75, Communality = 0.90
            [C.7.2] PDMORT: Loading = 0.60, Communality = 0.63
            [C.7.3] Cronbach’s Alpha = 0.85
     [C.8] Factor 4 was a latent factor with higher loading towards the following descriptors:
            [C.8.1] HDMORT: Loading = 0.66, Communality = 0.77
            [C.8.2] STMORT: Loading = 0.60, Communality = 0.73
            [C.8.3] OVWINC: Loading = 0.60, Communality = 0.80
            [C.8.4] Cronbach’s Alpha = 0.86

Code Chunk | Output

##################################
# Implementing various procedures for determining
# factor retention based on
# the maximum consensus between methods
##################################
(FA_ML_V_MethodAgreementProcedure <- parameters::n_factors(DPA.Descriptors.Numeric,
                                                           algorithm = "mle",
                                                           rotation = "varimax"))

## # Method Agreement Procedure:
## 
## The choice of 1 dimensions is supported by 8 (42.11%) methods out of 19 (Optimal coordinates, Acceleration factor, Parallel analysis, Kaiser criterion, Scree (SE), Scree (R2), VSS complexity 1, Velicer's MAP).

as.data.frame(FA_ML_V_MethodAgreementProcedure)

##    n_Factors              Method              Family
## 1          1 Optimal coordinates               Scree
## 2          1 Acceleration factor               Scree
## 3          1   Parallel analysis               Scree
## 4          1    Kaiser criterion               Scree
## 5          1          Scree (SE)            Scree_SE
## 6          1          Scree (R2)            Scree_SE
## 7          1    VSS complexity 1                 VSS
## 8          1       Velicer's MAP        Velicers_MAP
## 9          2    VSS complexity 2                 VSS
## 10         2                 BIC                 BIC
## 11         3             Bentler             Bentler
## 12         3                 CNG                 CNG
## 13         3                   t Multiple_regression
## 14         3                   p Multiple_regression
## 15         4            Bartlett             Barlett
## 16         4                beta Multiple_regression
## 17         4      BIC (adjusted)                 BIC
## 18         5            Anderson             Barlett
## 19         5              Lawley             Barlett

##################################
# Conducting exploratory factor analysis
# using Maximum Likelihood extraction
# and Varimax rotation
# with 3 factors
##################################
(FA_ML_V_3F <- fa(DPA.Descriptors.Numeric,
              nfactors = 3,
              fm="ml",
              rotate = "varimax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  ml
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 3, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "varimax", residuals = TRUE, SMC = TRUE, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         ML3  ML1  ML2   h2    u2 com
## ARTINC 0.34 0.64 0.56 0.84 0.155 2.5
## RDMORT 0.68 0.23 0.30 0.61 0.389 1.6
## PDMORT 0.46 0.28 0.56 0.61 0.388 2.4
## HDMORT 0.60 0.49 0.25 0.67 0.335 2.3
## STMORT 0.81 0.31 0.19 0.79 0.210 1.4
## DBMORT 0.51 0.20 0.29 0.38 0.617 1.9
## MEUNDA 0.20 0.95 0.23 1.00 0.005 1.2
## OVWINC 0.74 0.04 0.44 0.75 0.255 1.6
## SMPREV 0.41 0.37 0.79 0.93 0.069 2.0
## 
##                        ML3  ML1  ML2
## SS loadings           2.85 1.95 1.77
## Proportion Var        0.32 0.22 0.20
## Cumulative Var        0.32 0.53 0.73
## Proportion Explained  0.43 0.30 0.27
## Cumulative Proportion 0.43 0.73 1.00
## 
## Mean item complexity =  1.9
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 12  and the objective function was  0.29 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  50 with the empirical chi square  4.2  with prob <  0.98 
## The total n.obs was  50  with Likelihood Chi Square =  12.57  with prob <  0.4 
## 
## Tucker Lewis Index of factoring reliability =  0.994
## RMSEA index =  0.023  and the 90 % confidence intervals are  0 0.151
## BIC =  -34.38
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML3  ML1  ML2
## Correlation of (regression) scores with factors   0.91 0.99 0.92
## Multiple R square of scores with factors          0.83 0.99 0.84
## Minimum correlation of possible factor scores     0.66 0.97 0.68

(FA_ML_V_3F_Summary <- FA_ML_V_3F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (varimax-rotation)
## 
## Variable |  ML3 |  ML1 |  ML2 | Complexity | Uniqueness
## -------------------------------------------------------
## STMORT   | 0.81 |      |      |       1.39 |       0.21
## OVWINC   | 0.74 |      |      |       1.62 |       0.25
## RDMORT   | 0.68 |      |      |       1.63 |       0.39
## HDMORT   | 0.60 |      |      |       2.29 |       0.33
## DBMORT   | 0.51 |      |      |       1.90 |       0.62
## MEUNDA   |      | 0.95 |      |       1.20 |   5.00e-03
## ARTINC   |      | 0.64 |      |       2.53 |       0.16
## SMPREV   |      |      | 0.79 |       1.98 |       0.07
## PDMORT   |      |      | 0.56 |       2.44 |       0.39
## 
## The 3 latent factors (varimax rotation) accounted for 73.08% of the total variance of the original data (ML3 = 31.68%, ML1 = 21.70%, ML2 = 19.70%).

summary(FA_ML_V_3F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   ML3 |   ML1 |   ML2
## -------------------------------------------------------
## Eigenvalues                     | 5.396 | 0.822 | 0.364
## Variance Explained              | 0.317 | 0.217 | 0.197
## Variance Explained (Cumulative) | 0.317 | 0.534 | 0.731
## Variance Explained (Proportion) | 0.434 | 0.297 | 0.270

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_ML_V_3F_Residual <- residuals(FA_ML_V_3F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.05    NA                                          
## PDMORT  0.01 -0.05    NA                                    
## HDMORT -0.02 -0.08  0.02    NA                              
## STMORT  0.00  0.02  0.00  0.01    NA                        
## DBMORT  0.02  0.13  0.06 -0.06 -0.03    NA                  
## MEUNDA  0.00  0.00  0.00  0.00  0.00  0.00    NA            
## OVWINC -0.03 -0.01  0.01  0.04 -0.01 -0.02  0.00    NA      
## SMPREV  0.00 -0.01  0.00  0.00  0.00 -0.01  0.00  0.01    NA

##################################
# Obtaining Fit Indices
##################################
(FA_ML_V_3F_RMS <- FA_ML_V_3F$rms)

## [1] 0.03416164

(FA_ML_V_3F_TLI <- FA_ML_V_3F$TLI)

## [1] 0.9937238

(FA_ML_V_3F_BIC <- FA_ML_V_3F$BIC)

## [1] -34.37594

(FA_ML_V_3F_MaxResidual   <- max(abs(FA_ML_V_3F_Residual),na.rm=TRUE))

## [1] 0.132959

(FA_ML_V_3F_HighResidual  <- sum(FA_ML_V_3F_Residual>abs(0.05),na.rm=TRUE))

## [1] 6

(FA_ML_V_3F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_ML_V_3F_HighResidualRate <- FA_ML_V_3F_HighResidual/FA_ML_V_3F_TotalResidual)

## [1] 0.1666667

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_ML_V_3F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Maximum Likelihood Factor Extraction + Varimax Rotation : 3 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","RDMORT","HDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "RDMORT", 
##     "HDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.88      0.88    0.87      0.59 7.2 0.028 -2.6e-16 0.82     0.62
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.88  0.92
## Duhachek  0.82  0.88  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## STMORT      0.83      0.83    0.80      0.55 4.9    0.039 0.0079  0.57
## OVWINC      0.84      0.84    0.83      0.58 5.4    0.036 0.0133  0.57
## RDMORT      0.84      0.84    0.81      0.57 5.3    0.037 0.0134  0.56
## HDMORT      0.86      0.86    0.83      0.61 6.2    0.032 0.0077  0.63
## DBMORT      0.88      0.88    0.85      0.64 7.3    0.028 0.0046  0.66
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.88  0.88  0.85   0.79  4.6e-16  1
## OVWINC 50  0.84  0.84  0.79   0.74 -1.8e-15  1
## RDMORT 50  0.85  0.85  0.81   0.75  3.3e-16  1
## HDMORT 50  0.79  0.79  0.73   0.67 -2.9e-16  1
## DBMORT 50  0.74  0.74  0.64   0.59 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_ML_V_3F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "mle",
                                  nfac = 3,
                                  rotation = "varimax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_ML_V_3F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

##################################
# Conducting exploratory factor analysis
# using Maximum Likelihood extraction
# and Varimax rotation
# with 4 factors
##################################
(FA_ML_V_4F <- fa(DPA.Descriptors.Numeric,
              nfactors = 4,
              fm="ml",
              rotate = "varimax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  ml
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 4, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "varimax", residuals = TRUE, SMC = TRUE, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          ML2  ML1  ML3  ML4   h2    u2 com
## ARTINC  0.69 0.34 0.52 0.15 0.89 0.106 2.5
## RDMORT  0.21 0.90 0.22 0.25 0.96 0.040 1.4
## PDMORT  0.30 0.25 0.60 0.34 0.63 0.370 2.6
## HDMORT  0.47 0.23 0.26 0.66 0.77 0.226 2.4
## STMORT  0.28 0.49 0.24 0.60 0.73 0.267 2.8
## DBMORT  0.19 0.51 0.29 0.23 0.43 0.574 2.4
## MEUNDA  0.90 0.16 0.18 0.21 0.90 0.096 1.3
## OVWINC -0.01 0.42 0.52 0.60 0.80 0.201 2.8
## SMPREV  0.41 0.31 0.75 0.26 0.90 0.105 2.2
## 
##                        ML2  ML1  ML3  ML4
## SS loadings           1.92 1.82 1.75 1.52
## Proportion Var        0.21 0.20 0.19 0.17
## Cumulative Var        0.21 0.42 0.61 0.78
## Proportion Explained  0.27 0.26 0.25 0.22
## Cumulative Proportion 0.27 0.53 0.78 1.00
## 
## Mean item complexity =  2.3
## Test of the hypothesis that 4 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 6  and the objective function was  0.04 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  50 with the empirical chi square  0.69  with prob <  0.99 
## The total n.obs was  50  with Likelihood Chi Square =  1.83  with prob <  0.93 
## 
## Tucker Lewis Index of factoring reliability =  1.094
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.049
## BIC =  -21.64
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3  ML4
## Correlation of (regression) scores with factors   0.95 0.96 0.90 0.85
## Multiple R square of scores with factors          0.89 0.92 0.80 0.73
## Minimum correlation of possible factor scores     0.79 0.84 0.61 0.46

(FA_ML_V_4F_Summary <- FA_ML_V_4F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (varimax-rotation)
## 
## Variable |  ML2 |  ML1 |  ML3 |  ML4 | Complexity | Uniqueness
## --------------------------------------------------------------
## MEUNDA   | 0.90 |      |      |      |       1.26 |       0.10
## ARTINC   | 0.69 |      |      |      |       2.49 |       0.11
## RDMORT   |      | 0.90 |      |      |       1.41 |       0.04
## DBMORT   |      | 0.51 |      |      |       2.38 |       0.57
## SMPREV   |      |      | 0.75 |      |       2.24 |       0.10
## PDMORT   |      |      | 0.60 |      |       2.56 |       0.37
## HDMORT   |      |      |      | 0.66 |       2.44 |       0.23
## OVWINC   |      |      |      | 0.60 |       2.77 |       0.20
## STMORT   |      |      |      | 0.60 |       2.76 |       0.27
## 
## The 4 latent factors (varimax rotation) accounted for 77.96% of the total variance of the original data (ML2 = 21.30%, ML1 = 20.28%, ML3 = 19.47%, ML4 = 16.91%).

summary(FA_ML_V_4F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   ML2 |   ML1 |   ML3 |   ML4
## ---------------------------------------------------------------
## Eigenvalues                     | 5.443 | 0.831 | 0.401 | 0.345
## Variance Explained              | 0.213 | 0.203 | 0.195 | 0.169
## Variance Explained (Cumulative) | 0.213 | 0.416 | 0.610 | 0.780
## Variance Explained (Proportion) | 0.273 | 0.260 | 0.250 | 0.217

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_ML_V_4F_Residual <- residuals(FA_ML_V_4F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.00    NA                                          
## PDMORT  0.00  0.00    NA                                    
## HDMORT  0.00  0.00  0.00    NA                              
## STMORT  0.01  0.00  0.01  0.00    NA                        
## DBMORT  0.00  0.00  0.08 -0.02  0.00    NA                  
## MEUNDA  0.00  0.00  0.00  0.00  0.00  0.01    NA            
## OVWINC  0.00  0.00 -0.01  0.00  0.00  0.00  0.00    NA      
## SMPREV  0.00  0.00  0.00  0.00 -0.01 -0.01  0.00  0.00    NA

##################################
# Obtaining Fit Indices
##################################
(FA_ML_V_4F_RMS <- FA_ML_V_4F$rms)

## [1] 0.01385925

(FA_ML_V_4F_TLI <- FA_ML_V_4F$TLI)

## [1] 1.093667

(FA_ML_V_4F_BIC <- FA_ML_V_4F$BIC)

## [1] -21.63894

(FA_ML_V_4F_MaxResidual   <- max(abs(FA_ML_V_4F_Residual),na.rm=TRUE))

## [1] 0.07801115

(FA_ML_V_4F_HighResidual  <- sum(FA_ML_V_4F_Residual>abs(0.05),na.rm=TRUE))

## [1] 2

(FA_ML_V_4F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_ML_V_4F_HighResidualRate <- FA_ML_V_4F_HighResidual/FA_ML_V_4F_TotalResidual)

## [1] 0.05555556

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_ML_V_4F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Maximum Likelihood Factor Extraction + Varimax Rotation : 4 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("RDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("RDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean  sd median_r
##       0.76      0.76    0.61      0.61 3.2 0.068 1.5e-16 0.9     0.61
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.58  0.76  0.86
## Duhachek  0.63  0.76  0.89
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## RDMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## DBMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## RDMORT 50   0.9   0.9   0.7   0.61  3.3e-16  1
## DBMORT 50   0.9   0.9   0.7   0.61 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","HDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "HDMORT")], 
##     check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.86      0.86     0.8      0.67 6.1 0.035 -5.4e-16 0.88     0.69
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.77  0.86  0.91
## Duhachek  0.79  0.86  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## STMORT      0.77      0.77    0.62      0.62 3.3    0.066    NA  0.62
## OVWINC      0.82      0.82    0.70      0.70 4.6    0.050    NA  0.70
## HDMORT      0.82      0.82    0.69      0.69 4.4    0.052    NA  0.69
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.90  0.90  0.83   0.77  4.6e-16  1
## OVWINC 50  0.87  0.87  0.77   0.71 -1.8e-15  1
## HDMORT 50  0.88  0.88  0.78   0.72 -2.9e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_ML_V_4F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "mle",
                                  nfac = 4,
                                  rotation = "varimax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_ML_V_4F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

1.3.6.4 Maximum Likelihood Factor Extraction and Promax Rotation

[A] Appplying Maximum Likelihood factor extraction and Promax rotation, an evaluation was conducted using a set of empirical guidelines to determine the optimal number of factors to be retained for exploratory factor analysis. It was determined that:
     [A.1] The choice of 3 factors was supported by the Bentler, CNG, t and p methods for determining the optimal number of factors.
     [A.2] The choice of 4 factors was supported by the Beta, Bartlett and Adjusted BIC methods for determining the optimal number of factors.
     [A.3] 3 or 4 factors would be sufficiently plausible for an optimal balance between comprehensiveness and parsimony.
     [A.4] To ensure that both under-extraction and over-extraction are assessed, models with 3 and 4 and factors were sequentially evaluated for their interpretability and theoretical meaningfulness.

[B] Results for the exploratory factor analysis using a 3-Factor Structure were as follows:
     [B.1] Standardized Root Mean Square of the Residual = 0.03
     [B.2] Tucker-Lewis Fit Index = 0.99
     [B.3] Bayesian Information Criterion = -34.38
     [B.4] High Residual Rate = 0.17
     [B.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [B.5.1] STMORT: Loading = 0.96, Communality = 0.79
            [B.5.2] OVWINC: Loading = 0.76, Communality = 0.75
            [B.5.3] RDMORT: Loading = 0.72, Communality = 0.61
            [B.5.4] HDMORT: Loading = 0.58, Communality = 0.67
            [B.5.5] DBMORT: Loading = 0.50, Communality = 0.38
            [B.5.6] Cronbach’s Alpha = 0.88
     [B.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [B.6.1] SMPREV: Loading = 0.91, Communality = 0.93
            [B.6.2] PDMORT: Loading = 0.56, Communality = 0.61
            [B.6.3] ARTINC: Loading = 0.54, Communality = 0.84
            [B.6.4] Cronbach’s Alpha = 0.89
     [B.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [B.7.1] MEUNDA: Loading = 1.02, Communality = 1.00
            [B.7.2] Cronbach’s Alpha = 0.85

[C] Results for the exploratory factor analysis using a 4-Factor Structure were as follows:
     [C.1] Standardized Root Mean Square of the Residual = 0.01
     [C.2] Tucker-Lewis Fit Index = 1.09
     [C.3] Bayesian Information Criterion = -21.63
     [C.4] High Residual Rate = 0.06
     [C.5] Factor 1 was a latent factor with higher loading towards the following descriptors:
            [C.5.1] MEUNDA: Loading = 0.91, Communality = 0.90
            [C.5.2] ARTINC: Loading = 0.55, Communality = 0.89
            [C.5.3] Cronbach’s Alpha = 0.89
     [C.6] Factor 2 was a latent factor with higher loading towards the following descriptors:
            [C.6.1] RDMORT: Loading = 1.05, Communality = 0.96
            [C.6.2] DBMORT: Loading = 0.48, Communality = 0.43
            [C.6.3] Cronbach’s Alpha = 0.76
     [C.7] Factor 3 was a latent factor with higher loading towards the following descriptors:
            [C.7.1] SMPREV: Loading = 0.89, Communality = 0.90
            [C.7.2] PDMORT: Loading = 0.67, Communality = 0.63
            [C.7.3] Cronbach’s Alpha = 0.85
     [C.8] Factor 4 was a latent factor with higher loading towards the following descriptors:
            [C.8.1] HDMORT: Loading = 0.76, Communality = 0.77
            [C.8.2] STMORT: Loading = 0.61, Communality = 0.73
            [C.8.3] OVWINC: Loading = 0.56, Communality = 0.79
            [C.8.4] Cronbach’s Alpha = 0.86

Code Chunk | Output

##################################
# Implementing various procedures for determining
# factor retention based on
# the maximum consensus between methods
##################################
(FA_ML_P_MethodAgreementProcedure <- parameters::n_factors(DPA.Descriptors.Numeric,
                                                           algorithm = "mle",
                                                           rotation = "promax"))

## # Method Agreement Procedure:
## 
## The choice of 1 dimensions is supported by 8 (42.11%) methods out of 19 (Optimal coordinates, Acceleration factor, Parallel analysis, Kaiser criterion, Scree (SE), Scree (R2), VSS complexity 1, Velicer's MAP).

as.data.frame(FA_ML_P_MethodAgreementProcedure)

##    n_Factors              Method              Family
## 1          1 Optimal coordinates               Scree
## 2          1 Acceleration factor               Scree
## 3          1   Parallel analysis               Scree
## 4          1    Kaiser criterion               Scree
## 5          1          Scree (SE)            Scree_SE
## 6          1          Scree (R2)            Scree_SE
## 7          1    VSS complexity 1                 VSS
## 8          1       Velicer's MAP        Velicers_MAP
## 9          2    VSS complexity 2                 VSS
## 10         2                 BIC                 BIC
## 11         3             Bentler             Bentler
## 12         3                 CNG                 CNG
## 13         3                   t Multiple_regression
## 14         3                   p Multiple_regression
## 15         4            Bartlett             Barlett
## 16         4                beta Multiple_regression
## 17         4      BIC (adjusted)                 BIC
## 18         5            Anderson             Barlett
## 19         5              Lawley             Barlett

##################################
# Conducting exploratory factor analysis
# using Maximum Likelihood extraction
# and Promax rotation
# with 3 factors
##################################
(FA_ML_P_3F <- fa(DPA.Descriptors.Numeric,
              nfactors = 3,
              fm="ml",
              rotate = "promax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  ml
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 3, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "promax", residuals = TRUE, SMC = TRUE, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          ML3   ML2   ML1   h2    u2 com
## ARTINC -0.01  0.54  0.50 0.84 0.155 2.0
## RDMORT  0.72  0.08  0.00 0.61 0.389 1.0
## PDMORT  0.24  0.56  0.03 0.61 0.388 1.4
## HDMORT  0.58 -0.03  0.36 0.67 0.335 1.7
## STMORT  0.96 -0.18  0.11 0.79 0.210 1.1
## DBMORT  0.50  0.14  0.01 0.38 0.617 1.2
## MEUNDA -0.07  0.02  1.02 1.00 0.005 1.0
## OVWINC  0.76  0.30 -0.29 0.75 0.255 1.6
## SMPREV  0.01  0.91  0.07 0.93 0.069 1.0
## 
##                        ML3  ML2  ML1
## SS loadings           2.91 1.97 1.70
## Proportion Var        0.32 0.22 0.19
## Cumulative Var        0.32 0.54 0.73
## Proportion Explained  0.44 0.30 0.26
## Cumulative Proportion 0.44 0.74 1.00
## 
##  With factor correlations of 
##      ML3  ML2  ML1
## ML3 1.00 0.75 0.56
## ML2 0.75 1.00 0.60
## ML1 0.56 0.60 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 12  and the objective function was  0.29 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  50 with the empirical chi square  4.2  with prob <  0.98 
## The total n.obs was  50  with Likelihood Chi Square =  12.57  with prob <  0.4 
## 
## Tucker Lewis Index of factoring reliability =  0.994
## RMSEA index =  0.023  and the 90 % confidence intervals are  0 0.151
## BIC =  -34.38
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML3  ML2  ML1
## Correlation of (regression) scores with factors   0.96 0.97 1.00
## Multiple R square of scores with factors          0.92 0.94 0.99
## Minimum correlation of possible factor scores     0.83 0.89 0.99

(FA_ML_P_3F_Summary <- FA_ML_P_3F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (promax-rotation)
## 
## Variable |  ML3 |  ML2 |  ML1 | Complexity | Uniqueness
## -------------------------------------------------------
## STMORT   | 0.96 |      |      |       1.10 |       0.21
## OVWINC   | 0.76 |      |      |       1.60 |       0.25
## RDMORT   | 0.72 |      |      |       1.02 |       0.39
## HDMORT   | 0.58 |      |      |       1.66 |       0.33
## DBMORT   | 0.50 |      |      |       1.16 |       0.62
## SMPREV   |      | 0.91 |      |       1.01 |       0.07
## PDMORT   |      | 0.56 |      |       1.36 |       0.39
## ARTINC   |      | 0.54 |      |       1.99 |       0.16
## MEUNDA   |      |      | 1.02 |       1.01 |   5.00e-03
## 
## The 3 latent factors (promax rotation) accounted for 73.08% of the total variance of the original data (ML3 = 32.37%, ML2 = 21.87%, ML1 = 18.85%).

summary(FA_ML_P_3F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   ML3 |   ML2 |   ML1
## -------------------------------------------------------
## Eigenvalues                     | 5.396 | 0.822 | 0.364
## Variance Explained              | 0.324 | 0.219 | 0.188
## Variance Explained (Cumulative) | 0.324 | 0.542 | 0.731
## Variance Explained (Proportion) | 0.443 | 0.299 | 0.258

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_ML_P_3F_Residual <- residuals(FA_ML_P_3F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.05    NA                                          
## PDMORT  0.01 -0.05    NA                                    
## HDMORT -0.02 -0.08  0.02    NA                              
## STMORT  0.00  0.02  0.00  0.01    NA                        
## DBMORT  0.02  0.13  0.06 -0.06 -0.03    NA                  
## MEUNDA  0.00  0.00  0.00  0.00  0.00  0.00    NA            
## OVWINC -0.03 -0.01  0.01  0.04 -0.01 -0.02  0.00    NA      
## SMPREV  0.00 -0.01  0.00  0.00  0.00 -0.01  0.00  0.01    NA

##################################
# Obtaining Fit Indices
##################################
(FA_ML_P_3F_RMS <- FA_ML_P_3F$rms)

## [1] 0.03416164

(FA_ML_P_3F_TLI <- FA_ML_P_3F$TLI)

## [1] 0.9937238

(FA_ML_P_3F_BIC <- FA_ML_P_3F$BIC)

## [1] -34.37594

(FA_ML_P_3F_MaxResidual   <- max(abs(FA_ML_P_3F_Residual),na.rm=TRUE))

## [1] 0.132959

(FA_ML_P_3F_HighResidual  <- sum(FA_ML_P_3F_Residual>abs(0.05),na.rm=TRUE))

## [1] 6

(FA_ML_P_3F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_ML_P_3F_HighResidualRate <- FA_ML_P_3F_HighResidual/FA_ML_P_3F_TotalResidual)

## [1] 0.1666667

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_ML_P_3F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Maximum Likelihood Factor Extraction + Promax Rotation : 3 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","RDMORT","HDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "RDMORT", 
##     "HDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.88      0.88    0.87      0.59 7.2 0.028 -2.6e-16 0.82     0.62
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.88  0.92
## Duhachek  0.82  0.88  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## STMORT      0.83      0.83    0.80      0.55 4.9    0.039 0.0079  0.57
## OVWINC      0.84      0.84    0.83      0.58 5.4    0.036 0.0133  0.57
## RDMORT      0.84      0.84    0.81      0.57 5.3    0.037 0.0134  0.56
## HDMORT      0.86      0.86    0.83      0.61 6.2    0.032 0.0077  0.63
## DBMORT      0.88      0.88    0.85      0.64 7.3    0.028 0.0046  0.66
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.88  0.88  0.85   0.79  4.6e-16  1
## OVWINC 50  0.84  0.84  0.79   0.74 -1.8e-15  1
## RDMORT 50  0.85  0.85  0.81   0.75  3.3e-16  1
## HDMORT 50  0.79  0.79  0.73   0.67 -2.9e-16  1
## DBMORT 50  0.74  0.74  0.64   0.59 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_ML_P_3F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "mle",
                                  nfac = 3,
                                  rotation = "promax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_ML_P_3F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

##################################
# Conducting exploratory factor analysis
# using Maximum Likelihood extraction
# and Promax rotation
# with 4 factors
##################################
(FA_ML_P_4F <- fa(DPA.Descriptors.Numeric,
              nfactors = 4,
              fm="ml",
              rotate = "promax",
              residuals=TRUE,
              SMC=TRUE,
              n.obs=nrow(DPA.Descriptors.Numeric)))

## Factor Analysis using method =  ml
## Call: fa(r = DPA.Descriptors.Numeric, nfactors = 4, n.obs = nrow(DPA.Descriptors.Numeric), 
##     rotate = "promax", residuals = TRUE, SMC = TRUE, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          ML3   ML4   ML2   ML1   h2    u2 com
## ARTINC  0.50 -0.13  0.55  0.14 0.89 0.106 2.2
## RDMORT -0.11  0.00  0.02  1.05 0.96 0.040 1.0
## PDMORT  0.67  0.17  0.06 -0.06 0.63 0.370 1.2
## HDMORT  0.01  0.76  0.32 -0.11 0.77 0.226 1.4
## STMORT -0.06  0.61  0.09  0.31 0.73 0.267 1.5
## DBMORT  0.15  0.06  0.02  0.48 0.43 0.574 1.2
## MEUNDA -0.03  0.12  0.91 -0.01 0.90 0.096 1.0
## OVWINC  0.46  0.56 -0.32  0.11 0.80 0.201 2.7
## SMPREV  0.89 -0.03  0.14 -0.01 0.90 0.105 1.1
## 
##                        ML3  ML4  ML2  ML1
## SS loadings           2.08 1.70 1.62 1.62
## Proportion Var        0.23 0.19 0.18 0.18
## Cumulative Var        0.23 0.42 0.60 0.78
## Proportion Explained  0.30 0.24 0.23 0.23
## Cumulative Proportion 0.30 0.54 0.77 1.00
## 
##  With factor correlations of 
##      ML3  ML4  ML2  ML1
## ML3 1.00 0.71 0.56 0.71
## ML4 0.71 1.00 0.44 0.69
## ML2 0.56 0.44 1.00 0.43
## ML1 0.71 0.69 0.43 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 4 factors are sufficient.
## 
## df null model =  36  with the objective function =  7.13 with Chi Square =  321.92
## df of  the model are 6  and the objective function was  0.04 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  50 with the empirical chi square  0.69  with prob <  0.99 
## The total n.obs was  50  with Likelihood Chi Square =  1.83  with prob <  0.93 
## 
## Tucker Lewis Index of factoring reliability =  1.094
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.049
## BIC =  -21.64
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML3  ML4  ML2  ML1
## Correlation of (regression) scores with factors   0.97 0.94 0.96 0.98
## Multiple R square of scores with factors          0.93 0.88 0.93 0.97
## Minimum correlation of possible factor scores     0.86 0.76 0.86 0.93

(FA_ML_P_4F_Summary <- FA_ML_P_4F %>%
  model_parameters(sort = TRUE, threshold = "max"))

## # Rotated loadings from Factor Analysis (promax-rotation)
## 
## Variable |  ML3 |  ML4 |  ML2 |  ML1 | Complexity | Uniqueness
## --------------------------------------------------------------
## SMPREV   | 0.89 |      |      |      |       1.05 |       0.10
## PDMORT   | 0.67 |      |      |      |       1.17 |       0.37
## HDMORT   |      | 0.76 |      |      |       1.40 |       0.23
## STMORT   |      | 0.61 |      |      |       1.55 |       0.27
## OVWINC   |      | 0.56 |      |      |       2.68 |       0.20
## MEUNDA   |      |      | 0.91 |      |       1.04 |       0.10
## ARTINC   |      |      | 0.55 |      |       2.24 |       0.11
## RDMORT   |      |      |      | 1.05 |       1.02 |       0.04
## DBMORT   |      |      |      | 0.48 |       1.23 |       0.57
## 
## The 4 latent factors (promax rotation) accounted for 77.96% of the total variance of the original data (ML3 = 23.13%, ML4 = 18.86%, ML2 = 18.00%, ML1 = 17.97%).

summary(FA_ML_P_4F_Summary)

## # (Explained) Variance of Components
## 
## Parameter                       |   ML3 |   ML4 |   ML2 |   ML1
## ---------------------------------------------------------------
## Eigenvalues                     | 5.443 | 0.831 | 0.401 | 0.345
## Variance Explained              | 0.231 | 0.189 | 0.180 | 0.180
## Variance Explained (Cumulative) | 0.231 | 0.420 | 0.600 | 0.780
## Variance Explained (Proportion) | 0.297 | 0.242 | 0.231 | 0.231

##################################
# Extracting the residuals
# from the Exploratory Factor Analysis
##################################
(FA_ML_P_4F_Residual <- residuals(FA_ML_P_4F,
                              diag=FALSE,
                              na.rm=TRUE))

##        ARTIN RDMOR PDMOR HDMOR STMOR DBMOR MEUND OVWIN SMPRE
## ARTINC    NA                                                
## RDMORT  0.00    NA                                          
## PDMORT  0.00  0.00    NA                                    
## HDMORT  0.00  0.00  0.00    NA                              
## STMORT  0.01  0.00  0.01  0.00    NA                        
## DBMORT  0.00  0.00  0.08 -0.02  0.00    NA                  
## MEUNDA  0.00  0.00  0.00  0.00  0.00  0.01    NA            
## OVWINC  0.00  0.00 -0.01  0.00  0.00  0.00  0.00    NA      
## SMPREV  0.00  0.00  0.00  0.00 -0.01 -0.01  0.00  0.00    NA

##################################
# Obtaining Fit Indices
##################################
(FA_ML_P_4F_RMS <- FA_ML_P_4F$rms)

## [1] 0.01385925

(FA_ML_P_4F_TLI <- FA_ML_P_4F$TLI)

## [1] 1.093667

(FA_ML_P_4F_BIC <- FA_ML_P_4F$BIC)

## [1] -21.63894

(FA_ML_P_4F_MaxResidual   <- max(abs(FA_ML_P_4F_Residual),na.rm=TRUE))

## [1] 0.07801115

(FA_ML_P_4F_HighResidual  <- sum(FA_ML_P_4F_Residual>abs(0.05),na.rm=TRUE))

## [1] 2

(FA_ML_P_4F_TotalResidual <- length(DPA.Descriptors.Numeric)*(length(DPA.Descriptors.Numeric)-1)/2)

## [1] 36

(FA_ML_P_4F_HighResidualRate <- FA_ML_P_4F_HighResidual/FA_ML_P_4F_TotalResidual)

## [1] 0.05555556

##################################
# Graph the factor loading matrices
##################################
fa.diagram(FA_ML_P_4F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Maximum Likelihood Factor Extraction + Promax Rotation : 4 Factors",
           cex=0.75)

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("MEUNDA","ARTINC")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("MEUNDA", "ARTINC")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.89      0.89     0.8       0.8 8.2 0.031 2.3e-16 0.95      0.8
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.81  0.89  0.94
## Duhachek  0.83  0.89  0.95
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## MEUNDA       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## ARTINC       0.8       0.8    0.65       0.8 4.1       NA     0   0.8
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## MEUNDA 50  0.95  0.95  0.85    0.8 1.5e-16  1
## ARTINC 50  0.95  0.95  0.85    0.8 3.0e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("RDMORT","DBMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("RDMORT", "DBMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean  sd median_r
##       0.76      0.76    0.61      0.61 3.2 0.068 1.5e-16 0.9     0.61
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.58  0.76  0.86
## Duhachek  0.63  0.76  0.89
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## RDMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## DBMORT      0.61      0.61    0.38      0.61 1.6       NA     0  0.61
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## RDMORT 50   0.9   0.9   0.7   0.61  3.3e-16  1
## DBMORT 50   0.9   0.9   0.7   0.61 -1.6e-17  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("SMPREV","PDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("SMPREV", "PDMORT")], check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase    mean   sd median_r
##       0.85      0.85    0.74      0.74 5.6 0.043 1.5e-16 0.93     0.74
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.73  0.85  0.91
## Duhachek  0.77  0.85  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## SMPREV      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## PDMORT      0.74      0.74    0.55      0.74 2.8       NA     0  0.74
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop    mean sd
## SMPREV 50  0.93  0.93   0.8   0.74 1.2e-16  1
## PDMORT 50  0.93  0.93   0.8   0.74 1.6e-16  1

##################################
# computing the internal consistency
# measure of reliability using the
# Cronbach's alpha coefficient
# for each factor
##################################
alpha(DPA.Descriptors.Numeric[,c("STMORT","OVWINC","HDMORT")],
      check.keys=FALSE)

## 
## Reliability analysis   
## Call: alpha(x = DPA.Descriptors.Numeric[, c("STMORT", "OVWINC", "HDMORT")], 
##     check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase     mean   sd median_r
##       0.86      0.86     0.8      0.67 6.1 0.035 -5.4e-16 0.88     0.69
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.77  0.86  0.91
## Duhachek  0.79  0.86  0.93
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## STMORT      0.77      0.77    0.62      0.62 3.3    0.066    NA  0.62
## OVWINC      0.82      0.82    0.70      0.70 4.6    0.050    NA  0.70
## HDMORT      0.82      0.82    0.69      0.69 4.4    0.052    NA  0.69
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop     mean sd
## STMORT 50  0.90  0.90  0.83   0.77  4.6e-16  1
## OVWINC 50  0.87  0.87  0.77   0.71 -1.8e-15  1
## HDMORT 50  0.88  0.88  0.78   0.72 -2.9e-16  1

##################################
# Formulating the dandelion plot to
# visualize both factor variances and loadings
# from the factor loading matrices
##################################
FA_ML_P_4F_FactorLoading <- factload(DPA.Descriptors.Numeric,
                                  cormeth = "pearson",
                                  method = "mle",
                                  nfac = 4,
                                  rotation = "promax")

DandelionPlotPalette <- rev(rainbow(100, start = 0, end = 0.2))

dandelion(FA_ML_P_4F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

1.3.6.5 Model Assessment

[A] Among candidates, optimal results were obtained for the exploratory factor analysis models applying Maximum Likelihood Extraction with both 3-Factor Structure and 4-Factor Structure by demonstrating excellent model fit metrics. In addition, the latent variables obtained were contextually meaningful as individual factors in the analysis.
     [A.1] Lowest Standardized Root Mean Square of the Residual
     [A.2] Highest Tucker-Lewis Fit Index
     [A.3] Lowest Bayesian Information Criterion
     [A.4] Lowest High Residual Rate

[B] Among the applied Varimax and Promax rotation methods, Promax rotation was selected with the assumption of correlation among the extracted formulated loadings.

[C] The selected 4-Factor Model from the exploratory factor analysis hypothesized the relationship between 4 latent factors and observed chronic disease indicator variables as follows:
     [C.1] Factor 1 was a latent factor describing Unhealthy Lifestyle with higher loadings toward the following descriptors:
            [C.1.1] HDMORT: Heart Disease Mortality
            [C.1.2] STMORT: Stroke Mortality
            [C.1.3] OVWINC: Overweight Incidence
     [C.2] Factor 2 was a latent factor describing Physical Inactivity with higher loadings toward the following descriptors:
            [C.2.1] MEUNDA: Mentally Unhealthy Days
            [C.2.2] ARTINC: Arthritis Incidence
     [C.3] Factor 3 was a latent factor describing Poor Nutrition with higher loadings toward the following descriptors:
            [C.3.1] RDMORT: Renal Disease Mortality
            [C.3.2] DBMORT: Diabetes Mortality
     [C.4] Factor 4 was a latent factor describing Tobacco Use with higher loadings toward the following descriptors:
            [C.4.1] SMPREV: Smoking Prevalence
            [C.4.2] PDMORT: Pulmonary Disease Mortality

[D] The selected 3-Factor Model from the exploratory factor analysis hypothesized the relationship between 3 latent factors and observed chronic disease indicator variables as follows:
     [D.1] Factor 1 was a latent factor describing Unhealthy Lifestyle with higher loadings toward the following descriptors:
            [D.1.1] HDMORT: Heart Disease Mortality
            [D.1.2] STMORT: Stroke Mortality
            [D.1.3] OVWINC: Overweight Incidence
            [D.1.4] RDMORT: Renal Disease Mortality
            [D.1.5] DBMORT: Diabetes Mortality
     [D.2] Factor 2 was a latent factor describing Tobacco Use with higher loadings toward the following descriptors:
            [D.2.1] SMPREV: Smoking Prevalence
            [D.2.2] PDMORT: Pulmonary Disease Mortality
            [D.2.3] ARTINC: Arthritis Incidence
     [D.3] Factor 3 was a latent factor describing Poor Mental Health with higher loading towards the following descriptors:
            [D.3.1] MEUNDA: Mentally Unhealthy Days

Code Chunk | Output

##################################
# Consolidating the fit indices
# from the exploratory factor analysis
##################################
FA_RMSSummary <- c(FA_PA_V_3F_RMS,
                   FA_PA_V_4F_RMS,
                   FA_PA_P_3F_RMS,
                   FA_PA_P_4F_RMS,
                   FA_ML_V_3F_RMS,
                   FA_ML_V_4F_RMS,
                   FA_ML_P_3F_RMS,
                   FA_ML_P_4F_RMS)

FA_TLISummary <- c(FA_PA_V_3F_TLI,
                   FA_PA_V_4F_TLI,
                   FA_PA_P_3F_TLI,
                   FA_PA_P_4F_TLI,
                   FA_ML_V_3F_TLI,
                   FA_ML_V_4F_TLI,
                   FA_ML_P_3F_TLI,
                   FA_ML_P_4F_TLI)

FA_BICSummary <- c(FA_PA_V_3F_BIC,
                   FA_PA_V_4F_BIC,
                   FA_PA_P_3F_BIC,
                   FA_PA_P_4F_BIC,
                   FA_ML_V_3F_BIC,
                   FA_ML_V_4F_BIC,
                   FA_ML_P_3F_BIC,
                   FA_ML_P_4F_BIC)

FA_HighResidualRateSummary <- c(FA_PA_V_3F_HighResidualRate,
                                FA_PA_V_4F_HighResidualRate,
                                FA_PA_P_3F_HighResidualRate,
                                FA_PA_P_4F_HighResidualRate,
                                FA_ML_V_3F_HighResidualRate,
                                FA_ML_V_4F_HighResidualRate,
                                FA_ML_P_3F_HighResidualRate,
                                FA_ML_P_4F_HighResidualRate)

FA_AlgorithmSummary <- c("3-Factor Principal Axes Extraction + Varimax Rotation",
                         "4-Factor Principal Axes Extraction + Varimax Rotation",
                         "3-Factor Principal Axes Extraction + Promax Rotation",
                         "4-Factor Principal Axes Extraction + Promax Rotation",
                         "3-Factor Maximum Likelihood Extraction + Varimax Rotation",
                         "4-Factor Maximum Likelihood Extraction + Varimax Rotation",
                         "3-Factor Maximum Likelihood Extraction + Promax Rotation",
                         "4-Factor Maximum Likelihood Extraction + Promax Rotation")

FA_Summary <- cbind(FA_RMSSummary,
                    FA_TLISummary,
                    FA_BICSummary,
                    FA_HighResidualRateSummary,
                    FA_AlgorithmSummary)

FA_Summary <- as.data.frame(FA_Summary)
names(FA_Summary) <- c("RMS",
                       "TLI",
                       "BIC",
                       "HighResidualRate",
                       "Algorithm")

FA_Summary$RMS <- as.numeric(as.character(FA_Summary$RMS))
FA_Summary$TLI <- as.numeric(as.character(FA_Summary$TLI))
FA_Summary$BIC <- as.numeric(as.character(FA_Summary$BIC))
FA_Summary$HighResidualRate <- as.numeric(as.character(FA_Summary$HighResidualRate))

FA_Summary$Algorithm <- factor(FA_Summary$Algorithm ,
                                        levels = c("3-Factor Principal Axes Extraction + Varimax Rotation",
                                                   "4-Factor Principal Axes Extraction + Varimax Rotation",
                                                   "3-Factor Principal Axes Extraction + Promax Rotation",
                                                   "4-Factor Principal Axes Extraction + Promax Rotation",
                                                   "3-Factor Maximum Likelihood Extraction + Varimax Rotation",
                                                   "4-Factor Maximum Likelihood Extraction + Varimax Rotation",
                                                   "3-Factor Maximum Likelihood Extraction + Promax Rotation",
                                                   "4-Factor Maximum Likelihood Extraction + Promax Rotation"))

print(FA_Summary, row.names=FALSE)

##         RMS       TLI       BIC HighResidualRate
##  0.03244678 0.9838317 -33.48016       0.11111111
##  0.01182674 1.0833708 -21.18092       0.05555556
##  0.03244678 0.9838317 -33.48016       0.11111111
##  0.01182674 1.0833708 -21.18092       0.05555556
##  0.03416164 0.9937238 -34.37594       0.16666667
##  0.01385925 1.0936669 -21.63894       0.05555556
##  0.03416164 0.9937238 -34.37594       0.16666667
##  0.01385925 1.0936669 -21.63894       0.05555556
##                                                  Algorithm
##      3-Factor Principal Axes Extraction + Varimax Rotation
##      4-Factor Principal Axes Extraction + Varimax Rotation
##       3-Factor Principal Axes Extraction + Promax Rotation
##       4-Factor Principal Axes Extraction + Promax Rotation
##  3-Factor Maximum Likelihood Extraction + Varimax Rotation
##  4-Factor Maximum Likelihood Extraction + Varimax Rotation
##   3-Factor Maximum Likelihood Extraction + Promax Rotation
##   4-Factor Maximum Likelihood Extraction + Promax Rotation

##################################
# Consolidating all calculated values
# for the Standardized Root Mean Square of the Residual
##################################
(RMS_Plot <- dotplot(Algorithm ~ RMS,
                     data = FA_Summary,
                     main = "Algorithm Comparison : Standardized Root Mean Square of the Residual",
                     ylab = "Algorithm",
                     xlab = "RMSR",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

##################################
# Consolidating all calculated values
# for the Tucker-Lewis Fit Index
##################################
(TLI_Plot <- dotplot(Algorithm ~ TLI,
                     data = FA_Summary,
                     main = "Algorithm Comparison : Tucker-Lewis Fit Index",
                     ylab = "Algorithm",
                     xlab = "TLI",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

##################################
# Consolidating all calculated values
# for the Bayesian Information Criterion
##################################
(BIC_Plot <- dotplot(Algorithm ~ BIC,
                     data = FA_Summary,
                     main = "Algorithm Comparison : Bayesian Information Criterion",
                     ylab = "Algorithm",
                     xlab = "BIC",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

##################################
# Consolidating all calculated values
# for the High Residual Rate
##################################
(HighResidualRate_Plot <- dotplot(Algorithm ~ HighResidualRate,
                     data = FA_Summary,
                     main = "Algorithm Comparison : High Residual Rate ",
                     ylab = "Algorithm",
                     xlab = "High Residual Rate",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

##################################
# Plotting the Factor Loading Diagram
# for the optimal EFA models
##################################
fa.diagram(FA_ML_V_4F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Maximum Likelihood Factor Extraction + Varimax Rotation : 4 Factors",
           cex=0.75)

fa.diagram(FA_ML_V_3F,
           sort=TRUE,
           cut=0,
           digits=3,
           main="Maximum Likelihood Factor Extraction + Varimax Rotation : 3 Factors",
           cex=0.75)

##################################
# Plotting the Dandelion Plot
# for the optimal EFA models
##################################
dandelion(FA_ML_V_4F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

dandelion(FA_ML_V_3F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

1.3.7 Model Performance Validation

1.3.7.1 4-Factor Model Confirmatory Factor Analysis

[A] Confirmatory factor analysis was conducted to evaluate the derived 4-Factor Model from the exploratory factor analysis that hypothesized the relationship between 4 latent factors and observed chronic disease indicator variables as follows:
     [A.1] Factor 1 UNHLIF was a latent factor describing Unhealthy Lifestyle with higher loadings toward the following descriptors:
            [A.1.1] HDMORT: Heart Disease Mortality
            [A.1.2] STMORT: Stroke Mortality
            [A.1.3] OVWINC: Overweight Incidence
     [A.2] Factor 2 PINACT was a latent factor describing Physical Inactivity with higher loadings toward the following descriptors:
            [A.2.1] MEUNDA: Mentally Unhealthy Days
            [A.2.2] ARTINC: Arthritis Incidence
     [A.3] Factor 3 PNUTRI was a latent factor describing Poor Nutrition with higher loadings toward the following descriptors:
            [A.3.1] RDMORT: Renal Disease Mortality
            [A.3.2] DBMORT: Diabetes Mortality
     [A.4] Factor 4 TOBUSE was a latent factor describing Tobacco Use with higher loadings toward the following descriptors:
            [A.4.1] SMPREV: Smoking Prevalence
            [A.4.2] PDMORT: Pulmonary Disease Mortality

[B] The 4-Factor Model demonstrated sufficiently good fit based on the model performance metrics as follows:
     [B.1] Comparative Fit Index = 0.96 (Ideal Criterion: >0.90)
     [B.2] Tucker Lewis Index = 0.94 (Ideal Criterion: >0.90)
     [B.3] Standardized Root Mean Square of the Residual = 0.05 (Ideal Criterion: <0.08)
     [B.4] Root Mean Square Error of Approximation = 0.10 (Ideal Criterion: <0.08)

[C] All chronic disease indicator loadings for each latent factor were high (>0.65) and all statistically significant.

[D] The 4-Factor Model demonstrated high total reliability at 0.92 and individual latent factor reliability as follows.
     [D.1] Physical Inactivity Cronbach’s Alpha = 0.86
     [D.2] Poor Nutrition Cronbach’s Alpha = 0.76
     [D.3] Tobacco Use = 0.85
     [D.4] Unhealthy Lifestyle Cronbach’s Alpha = 0.89

Code Chunk | Output

##################################
# Specifying the model structure
# for confirmatory factor analysis
##################################
CFA_4F <- 'UNHLIF =~ HDMORT + STMORT + OVWINC
           TOBUSE =~ SMPREV + PDMORT
           PINACT =~ MEUNDA + ARTINC
           PNUTRI =~ RDMORT + DBMORT'

##################################
# Fitting the model
# for confirmatory factor analysis
##################################
CFA_4F_MODEL <- cfa(CFA_4F, data = DPA.Descriptors.Numeric, mimic =c("MPlus"), std.lv = TRUE)

CFA_4F_MODEL_CHANGE <- modindices(CFA_4F_MODEL, sort = TRUE) 
head(CFA_4F_MODEL_CHANGE, 15)

##       lhs op    rhs    mi    epc sepc.lv sepc.all sepc.nox
## 73 HDMORT ~~ MEUNDA 7.270  0.150   0.150    0.375    0.375
## 86 OVWINC ~~ MEUNDA 4.286 -0.106  -0.106   -0.301   -0.301
## 78 STMORT ~~ SMPREV 3.332 -0.084  -0.084   -0.458   -0.458
## 84 OVWINC ~~ SMPREV 2.860  0.079   0.079    0.409    0.409
## 75 HDMORT ~~ RDMORT 2.501 -0.103  -0.103   -0.372   -0.372
## 63 PNUTRI =~ STMORT 2.316  0.449   0.449    0.454    0.454
## 55 PINACT =~ HDMORT 2.286  0.188   0.188    0.189    0.189
## 57 PINACT =~ OVWINC 2.088 -0.173  -0.173   -0.174   -0.174
## 98 PDMORT ~~ DBMORT 1.946  0.097   0.097    0.222    0.222
## 82 STMORT ~~ RDMORT 1.936  0.086   0.086    0.375    0.375
## 48 TOBUSE =~ HDMORT 1.923  0.328   0.328    0.331    0.331
## 69 HDMORT ~~ STMORT 1.857  0.096   0.096    0.292    0.292
## 49 TOBUSE =~ STMORT 1.828 -0.316  -0.316   -0.319   -0.319
## 62 PNUTRI =~ HDMORT 1.513 -0.366  -0.366   -0.370   -0.370
## 58 PINACT =~ SMPREV 1.145  0.291   0.291    0.294    0.294

##################################
# Generating a model performance summary
# for confirmatory factor analysis
##################################
summary(CFA_4F_MODEL, fit.measures=TRUE)

## lavaan 0.6-19 ended normally after 30 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        33
## 
##   Number of observations                            50
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                32.593
##   Degrees of freedom                                21
##   P-value (Chi-square)                           0.051
## 
## Model Test Baseline Model:
## 
##   Test statistic                               356.367
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.964
##   Tucker-Lewis Index (TLI)                       0.938
##                                                       
##   Robust Comparative Fit Index (CFI)             0.964
##   Robust Tucker-Lewis Index (TLI)                0.938
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -472.090
##   Loglikelihood unrestricted model (H1)       -455.793
##                                                       
##   Akaike (AIC)                                1010.179
##   Bayesian (BIC)                              1073.276
##   Sample-size adjusted Bayesian (SABIC)        969.694
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.105
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.172
##   P-value H_0: RMSEA <= 0.050                    0.114
##   P-value H_0: RMSEA >= 0.080                    0.743
##                                                       
##   Robust RMSEA                                   0.105
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.172
##   P-value H_0: Robust RMSEA <= 0.050             0.114
##   P-value H_0: Robust RMSEA >= 0.080             0.743
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.053
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   UNHLIF =~                                           
##     HDMORT            0.764    0.123    6.220    0.000
##     STMORT            0.840    0.118    7.141    0.000
##     OVWINC            0.823    0.118    6.948    0.000
##   TOBUSE =~                                           
##     SMPREV            0.926    0.109    8.454    0.000
##     PDMORT            0.782    0.119    6.568    0.000
##   PINACT =~                                           
##     MEUNDA            0.757    0.121    6.242    0.000
##     ARTINC            1.039    0.102   10.182    0.000
##   PNUTRI =~                                           
##     RDMORT            0.887    0.124    7.159    0.000
##     DBMORT            0.679    0.131    5.186    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   UNHLIF ~~                                           
##     TOBUSE            0.836    0.072   11.672    0.000
##     PINACT            0.646    0.095    6.803    0.000
##     PNUTRI            0.857    0.079   10.912    0.000
##   TOBUSE ~~                                           
##     PINACT            0.830    0.064   12.948    0.000
##     PNUTRI            0.731    0.102    7.140    0.000
##   PINACT ~~                                           
##     PNUTRI            0.649    0.099    6.591    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HDMORT           -0.000    0.140   -0.000    1.000
##    .STMORT            0.000    0.140    0.000    1.000
##    .OVWINC           -0.000    0.140   -0.000    1.000
##    .SMPREV            0.000    0.140    0.000    1.000
##    .PDMORT            0.000    0.140    0.000    1.000
##    .MEUNDA            0.000    0.140    0.000    1.000
##    .ARTINC            0.000    0.140    0.000    1.000
##    .RDMORT            0.000    0.140    0.000    1.000
##    .DBMORT           -0.000    0.140   -0.000    1.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HDMORT            0.396    0.097    4.081    0.000
##    .STMORT            0.275    0.081    3.376    0.001
##    .OVWINC            0.303    0.083    3.637    0.000
##    .SMPREV            0.123    0.062    1.975    0.048
##    .PDMORT            0.369    0.084    4.373    0.000
##    .MEUNDA            0.406    0.093    4.387    0.000
##    .ARTINC           -0.100    0.086   -1.164    0.244
##    .RDMORT            0.193    0.114    1.692    0.091
##    .DBMORT            0.519    0.121    4.282    0.000
##     UNHLIF            1.000                           
##     TOBUSE            1.000                           
##     PINACT            1.000                           
##     PNUTRI            1.000

##################################
# Testing the null hypothesis
# that the matrix reproduced by the data
# and the specified model are statistically the same
# as the input or analysis matrix
# (Ideal Criterion: P-Value=>0.05)
##################################
fitMeasures(CFA_4F_MODEL, c("chisq", "df", "pvalue"))

##  chisq     df pvalue 
## 32.593 21.000  0.051

##################################
# Measuring the discrepancy between
# the model-based and observed correlation matrices
# (Ideal Criterion: RMSEA<0.08)
##################################
fitMeasures(CFA_4F_MODEL, c("rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "rmsea.pvalue"))

##          rmsea rmsea.ci.lower rmsea.ci.upper   rmsea.pvalue 
##          0.105          0.000          0.172          0.114

(CFA_4F_MODEL_RMSEA <- fitMeasures(CFA_4F_MODEL, c("rmsea")))

## rmsea 
## 0.105

##################################
# Assessing model adequacy by comparing
# the hypothesized model 
# to a baseline model
# (Ideal Criterion: CFI>0.90)
##################################
(CFA_4F_MODEL_CFI <- fitMeasures(CFA_4F_MODEL, c("cfi")))

##   cfi 
## 0.964

##################################
# Assessing model adequacy by evaluating
# the improvement in fit of the hypothesized model
# relative to a baseline model
# (Ideal Criterion: TLI>0.90)
##################################
(CFA_4F_MODEL_TLI <- fitMeasures(CFA_4F_MODEL, c("tli")))

##   tli 
## 0.938

##################################
# Measuring the actual differences (discrepancies)
# between the model-based correlations and the actual correlations
# (Ideal Criterion: SRMR<0.08)
##################################
(CFA_4F_MODEL_SRMR <- fitMeasures(CFA_4F_MODEL, c("srmr")))

##  srmr 
## 0.053

##################################
# Evaluating correlation residuals 
# using the pairwise coefficients 
# to provide details about potential model misfit
# (Ideal Criterion: Residuals<0.10)
##################################
residuals(CFA_4F_MODEL)$cov

##        HDMORT STMORT OVWINC SMPREV PDMORT MEUNDA ARTINC RDMORT DBMORT
## HDMORT  0.000                                                        
## STMORT  0.042  0.000                                                 
## OVWINC -0.020 -0.017  0.000                                          
## SMPREV  0.026 -0.064  0.022  0.000                                   
## PDMORT  0.070  0.004  0.057  0.000  0.000                            
## MEUNDA  0.254  0.072 -0.128  0.016 -0.015  0.000                     
## ARTINC  0.111  0.008 -0.064  0.009 -0.025  0.000  0.000              
## RDMORT -0.068  0.047 -0.001 -0.014 -0.012 -0.026 -0.008  0.000       
## DBMORT -0.041  0.005  0.009  0.027  0.115  0.011  0.014  0.000  0.000

##################################
# Evaluating loading magnitude estimates 
# and fit statistics
# (Ideal Criterion: Loading Estimates>0.50)
# (Ideal Criterion: P-Value<0.05)
##################################
parameterEstimates(CFA_4F_MODEL) %>% 
  filter(op == "=~")

##      lhs op    rhs   est    se      z pvalue ci.lower ci.upper
## 1 UNHLIF =~ HDMORT 0.764 0.123  6.220      0    0.524    1.005
## 2 UNHLIF =~ STMORT 0.840 0.118  7.141      0    0.609    1.070
## 3 UNHLIF =~ OVWINC 0.823 0.118  6.948      0    0.591    1.055
## 4 TOBUSE =~ SMPREV 0.926 0.109  8.454      0    0.711    1.140
## 5 TOBUSE =~ PDMORT 0.782 0.119  6.568      0    0.549    1.015
## 6 PINACT =~ MEUNDA 0.757 0.121  6.242      0    0.520    0.995
## 7 PINACT =~ ARTINC 1.039 0.102 10.182      0    0.839    1.239
## 8 PNUTRI =~ RDMORT 0.887 0.124  7.159      0    0.644    1.130
## 9 PNUTRI =~ DBMORT 0.679 0.131  5.186      0    0.422    0.935

##################################
# Evaluating the measurement reliability
# of the hypothesized model latent factors
# (Ideal Criterion: Cronbach's Alpha>0.70)
##################################
semTools::compRelSEM(CFA_4F_MODEL, tau.eq=T, obs.var=T, return.total=T)

## UNHLIF TOBUSE PINACT PNUTRI  total 
##  0.858  0.850  0.891  0.761  0.925

##################################
# Formulating a path diagram to visualize
# the relationships and paths between latent factors
# and observed chronic disease indicators
# from the hypothesized model
##################################
CFA_4F_MODEL_FACTORS <- CFA_4F_MODEL@pta$vnames$lv[[1]]
size <- .65
semPlot::semPaths(CFA_4F_MODEL, latents = CFA_4F_MODEL_FACTORS, whatLabels = "std", layout = "tree2", 
         rotation = 4, style = "lisrel", optimizeLatRes = TRUE, 
         structural = FALSE, layoutSplit = FALSE,
         intercepts = FALSE, residuals = FALSE, 
         curve = 1, curvature = 3, nCharNodes = 8, 
         sizeLat = 11 * size, sizeMan = 11 * size, sizeMan2 = 4 * size, 
         edge.label.cex = 1.2 * size, 
         edge.color = "#000000", edge.label.position = .40)

1.3.7.2 3-Factor Model Confirmatory Factor Analysis

[A] Confirmatory factor analysis was conducted to evaluate the derived 3-Factor Model from the exploratory factor analysis that hypothesized the relationship between 3 latent factors and observed chronic disease indicator variables as follows:
     [A.1] Factor 1 UNHLIF was a latent factor describing Unhealthy Lifestyle with higher loadings toward the following descriptors:
            [A.1.1] HDMORT: Heart Disease Mortality
            [A.1.2] STMORT: Stroke Mortality
            [A.1.3] OVWINC: Overweight Incidence
            [A.1.4] RDMORT: Renal Disease Mortality
            [A.1.5] DBMORT: Diabetes Mortality
     [A.2] Factor 2 TOBUSE was a latent factor describing Tobacco Use with higher loadings toward the following descriptors:
            [A.2.1] SMPREV: Smoking Prevalence
            [A.2.2] PDMORT: Pulmonary Disease Mortality
            [A.2.3] ARTINC: Arthritis Incidence
     [A.3] Factor 3 POMENT was a latent factor describing Poor Mental Health with higher loading towards the following descriptors:
            [A.3.1] MEUNDA: Mentally Unhealthy Days

[B] The 3-Factor Model demonstrated sufficiently good fit based on the model performance metrics as follows:
     [B.1] Comparative Fit Index = 0.91 (Ideal Criterion: >0.90)
     [B.2] Tucker Lewis Index = 0.86 (Ideal Criterion: >0.90)
     [B.3] Standardized Root Mean Square of the Residual = 0.06 (Ideal Criterion: <0.08)
     [B.4] Root Mean Square Error of Approximation = 0.16 (Ideal Criterion: <0.08)

[C] All chronic disease indicator loadings for each latent factor were high (>0.60) and all statistically significant.

[D] The 3-Factor Model demonstrated high total reliability at 0.92 and individual latent factor reliability as follows.
     [D.1] Unhealthy Lifestyle Cronbach’s Alpha = 0.88
     [D.2] Tobacco Use = 0.89
     [D.3] Poor Mental Health Cronbach’s Alpha = Not Applicable

Code Chunk | Output

##################################
# Specifying the model structure
# for confirmatory factor analysis
##################################
CFA_3F <- 'UNHLIF =~ HDMORT + STMORT + OVWINC + RDMORT + DBMORT
           TOBUSE =~ SMPREV + PDMORT + ARTINC
           POMENT =~ MEUNDA'

##################################
# Fitting the model
# for confirmatory factor analysis
##################################
CFA_3F_MODEL <- cfa(CFA_3F, data = DPA.Descriptors.Numeric, mimic =c("MPlus"), std.lv = TRUE)

##################################
# Generating a model performance summary
# for confirmatory factor analysis
##################################
summary(CFA_3F_MODEL, fit.measures = TRUE)

## lavaan 0.6-19 ended normally after 28 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        29
## 
##   Number of observations                            50
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                55.303
##   Degrees of freedom                                25
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               356.367
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.905
##   Tucker-Lewis Index (TLI)                       0.864
##                                                       
##   Robust Comparative Fit Index (CFI)             0.905
##   Robust Tucker-Lewis Index (TLI)                0.864
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -483.445
##   Loglikelihood unrestricted model (H1)       -455.793
##                                                       
##   Akaike (AIC)                                1024.889
##   Bayesian (BIC)                              1080.338
##   Sample-size adjusted Bayesian (SABIC)        989.312
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.156
##   90 Percent confidence interval - lower         0.100
##   90 Percent confidence interval - upper         0.211
##   P-value H_0: RMSEA <= 0.050                    0.003
##   P-value H_0: RMSEA >= 0.080                    0.984
##                                                       
##   Robust RMSEA                                   0.156
##   90 Percent confidence interval - lower         0.100
##   90 Percent confidence interval - upper         0.211
##   P-value H_0: Robust RMSEA <= 0.050             0.003
##   P-value H_0: Robust RMSEA >= 0.080             0.984
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.063
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   UNHLIF =~                                           
##     HDMORT            0.757    0.123    6.153    0.000
##     STMORT            0.844    0.116    7.270    0.000
##     OVWINC            0.792    0.120    6.597    0.000
##     RDMORT            0.794    0.120    6.618    0.000
##     DBMORT            0.631    0.131    4.830    0.000
##   TOBUSE =~                                           
##     SMPREV            0.876    0.112    7.789    0.000
##     PDMORT            0.741    0.123    6.002    0.000
##     ARTINC            0.925    0.108    8.533    0.000
##   POMENT =~                                           
##     MEUNDA            0.990    0.099   10.000    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   UNHLIF ~~                                           
##     TOBUSE            0.817    0.068   12.046    0.000
##     POMENT            0.558    0.108    5.183    0.000
##   TOBUSE ~~                                           
##     POMENT            0.785    0.065   11.994    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HDMORT           -0.000    0.140   -0.000    1.000
##    .STMORT            0.000    0.140    0.000    1.000
##    .OVWINC           -0.000    0.140   -0.000    1.000
##    .RDMORT            0.000    0.140    0.000    1.000
##    .DBMORT           -0.000    0.140   -0.000    1.000
##    .SMPREV            0.000    0.140    0.000    1.000
##    .PDMORT            0.000    0.140    0.000    1.000
##    .ARTINC            0.000    0.140    0.000    1.000
##    .MEUNDA            0.000    0.140    0.000    1.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HDMORT            0.406    0.098    4.157    0.000
##    .STMORT            0.268    0.075    3.557    0.000
##    .OVWINC            0.352    0.088    4.002    0.000
##    .RDMORT            0.349    0.088    3.992    0.000
##    .DBMORT            0.582    0.126    4.618    0.000
##    .SMPREV            0.213    0.063    3.368    0.001
##    .PDMORT            0.431    0.100    4.327    0.000
##    .ARTINC            0.125    0.055    2.289    0.022
##    .MEUNDA            0.000                           
##     UNHLIF            1.000                           
##     TOBUSE            1.000                           
##     POMENT            1.000

##################################
# Testing the null hypothesis
# that the matrix reproduced by the data
# and the specified model are statistically the same
# as the input or analysis matrix
# (Ideal Criterion: P-Value=>0.05)
##################################
fitMeasures(CFA_3F_MODEL, c("chisq", "df", "pvalue"))

##  chisq     df pvalue 
## 55.303 25.000  0.000

##################################
# Measuring the discrepancy between
# the model-based and observed correlation matrices
# (Ideal Criterion: RMSEA<0.08)
##################################
fitMeasures(CFA_3F_MODEL, c("rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "rmsea.pvalue"))

##          rmsea rmsea.ci.lower rmsea.ci.upper   rmsea.pvalue 
##          0.156          0.100          0.211          0.003

(CFA_3F_MODEL_RMSEA <- fitMeasures(CFA_3F_MODEL, c("rmsea")))

## rmsea 
## 0.156

##################################
# Assessing model adequacy by comparing
# the hypothesized model 
# to a baseline model
# (Ideal Criterion: CFI>0.90)
##################################
(CFA_3F_MODEL_CFI <- fitMeasures(CFA_3F_MODEL, c("cfi")))

##   cfi 
## 0.905

##################################
# Assessing model adequacy by evaluating
# the improvement in fit of the hypothesized model
# relative to a baseline model
# (Ideal Criterion: TLI>0.90)
##################################
(CFA_3F_MODEL_TLI <- fitMeasures(CFA_3F_MODEL, c("tli")))

##   tli 
## 0.864

##################################
# Measuring the actual differences (discrepancies)
# between the model-based correlations and the actual correlations
# (Ideal Criterion: SRMR<0.08)
##################################
(CFA_3F_MODEL_SRMR <- fitMeasures(CFA_3F_MODEL, c("srmr")))

##  srmr 
## 0.063

##################################
# Evaluating correlation residuals 
# using the pairwise coefficients 
# to provide details about potential model misfit
# (Ideal Criterion: Residuals<0.10)
##################################
residuals(CFA_3F_MODEL)$cov

##        HDMORT STMORT OVWINC RDMORT DBMORT SMPREV PDMORT ARTINC MEUNDA
## HDMORT  0.000                                                        
## STMORT  0.045  0.000                                                 
## OVWINC  0.009  0.006  0.000                                          
## RDMORT -0.087  0.016 -0.004  0.000                                   
## DBMORT -0.074 -0.039 -0.012  0.101  0.000                            
## SMPREV  0.076 -0.018  0.092  0.018  0.035  0.000                     
## PDMORT  0.111  0.042  0.115  0.014  0.121  0.075  0.000              
## ARTINC  0.052 -0.066 -0.110 -0.009 -0.005 -0.003 -0.035  0.000       
## MEUNDA  0.209  0.016 -0.164 -0.029 -0.004 -0.083 -0.099  0.069  0.000

##################################
# Evaluating loading magnitude estimates 
# and fit statistics
# (Ideal Criterion: Loading Estimates>0.50)
# (Ideal Criterion: P-Value<0.05)
##################################
parameterEstimates(CFA_3F_MODEL) %>% 
  filter(op == "=~")

##      lhs op    rhs   est    se      z pvalue ci.lower ci.upper
## 1 UNHLIF =~ HDMORT 0.757 0.123  6.153      0    0.516    0.999
## 2 UNHLIF =~ STMORT 0.844 0.116  7.270      0    0.616    1.071
## 3 UNHLIF =~ OVWINC 0.792 0.120  6.597      0    0.557    1.028
## 4 UNHLIF =~ RDMORT 0.794 0.120  6.618      0    0.559    1.029
## 5 UNHLIF =~ DBMORT 0.631 0.131  4.830      0    0.375    0.887
## 6 TOBUSE =~ SMPREV 0.876 0.112  7.789      0    0.655    1.096
## 7 TOBUSE =~ PDMORT 0.741 0.123  6.002      0    0.499    0.983
## 8 TOBUSE =~ ARTINC 0.925 0.108  8.533      0    0.712    1.137
## 9 POMENT =~ MEUNDA 0.990 0.099 10.000      0    0.796    1.184

##################################
# Evaluating the measurement reliability
# of the hypothesized model latent factors
# (Ideal Criterion: Cronbach's Alpha>0.70)
##################################
semTools::compRelSEM(CFA_3F_MODEL, tau.eq=T, obs.var=T, return.total=T)

## UNHLIF TOBUSE  total 
##  0.878  0.896  0.922

##################################
# Formulating a path diagram to visualize
# the relationships and paths between latent factors
# and observed chronic disease indicators
# from the hypothesized model
##################################
CFA_3F_MODEL_FACTORS <- CFA_3F_MODEL@pta$vnames$lv[[1]]
size <- .65
semPlot::semPaths(CFA_3F_MODEL, latents = CFA_3F_MODEL_FACTORS, whatLabels = "std", layout = "tree2", 
         rotation = 4, style = "lisrel", optimizeLatRes = TRUE, 
         structural = FALSE, layoutSplit = FALSE,
         intercepts = FALSE, residuals = FALSE, 
         curve = 1, curvature = 3, nCharNodes = 8, 
         sizeLat = 11 * size, sizeMan = 11 * size, sizeMan2 = 4 * size, 
         edge.label.cex = 1.2 * size, 
         edge.color = "#000000", edge.label.position = .40)

1.3.8 Model Selection

[A] Among candidates, optimal results were obtained for the confirmatory factor analysis model with a 4-Factor Structure by demonstrating excellent model fit metrics. In addition, the latent variables obtained were contextually meaningful as individual factors in the analysis.
     [A.1] Highest Comparative Fit Index
     [A.2] Highest Tucker-Lewis Fit Index
     [A.3] Lowest Standardized Root Mean Square of the Residual
     [A.4] Lowest Root Mean Square Error of Approximation

[B] A 4-Factor Model was evaluated as sufficiently plausible to represent the relationship between 4 latent factors and observed chronic disease indicator variables as follows:
     [B.1] An underlying construct corresponding to Unhealthy Lifestyle influencing:
            [B.1.1] HDMORT: Heart Disease Mortality
            [B.1.2] STMORT: Stroke Mortality
            [B.1.3] OVWINC: Overweight Incidence
     [B.2] An underlying construct corresponding to Physical Inactivity influencing:
            [B.2.1] MEUNDA: Mentally Unhealthy Days
            [B.2.2] ARTINC: Arthritis Incidence
     [B.3] An underlying construct corresponding to Poor Nutrition influencing:
            [B.3.1] RDMORT: Renal Disease Mortality
            [B.3.2] DBMORT: Diabetes Mortality
     [B.4] An underlying construct corresponding to Tobacco Use influencing:
            [B.4.1] SMPREV: Smoking Prevalence
            [B.4.2] PDMORT: Pulmonary Disease Mortality

Code Chunk | Output

##################################
# Consolidating the fit indices
# from the confirmatory factor analysis
##################################
CFA_RMSEASummary <- c(CFA_4F_MODEL_RMSEA,
                      CFA_3F_MODEL_RMSEA)

CFA_CFISummary <- c(CFA_4F_MODEL_CFI,
                    CFA_3F_MODEL_CFI)

CFA_TLISummary <- c(CFA_4F_MODEL_TLI,
                    CFA_3F_MODEL_TLI)

CFA_SRMRSummary <- c(CFA_4F_MODEL_SRMR,
                     CFA_3F_MODEL_SRMR)

CFA_AlgorithmSummary <- c("4-Factor Confirmatory Factor Analysis",
                          "3-Factor Confirmatory Factor Analysis")

CFA_Summary <- cbind(CFA_RMSEASummary,
                     CFA_CFISummary,
                     CFA_TLISummary,
                     CFA_SRMRSummary,
                     CFA_AlgorithmSummary)

CFA_Summary <- as.data.frame(CFA_Summary)
names(CFA_Summary) <- c("RMSEA",
                       "CFI",
                       "TLI",
                       "SRMR",
                       "Algorithm")

CFA_Summary$RMSEA <- as.numeric(as.character(CFA_Summary$RMSEA))
CFA_Summary$CFI   <- as.numeric(as.character(CFA_Summary$CFI))
CFA_Summary$TLI   <- as.numeric(as.character(CFA_Summary$TLI))
CFA_Summary$SRMR  <- as.numeric(as.character(CFA_Summary$SRMR))

CFA_Summary$Algorithm <- factor(CFA_Summary$Algorithm,
                                levels = c("4-Factor Confirmatory Factor Analysis",
                                           "3-Factor Confirmatory Factor Analysis"))

print(CFA_Summary, row.names=FALSE)

##      RMSEA       CFI       TLI       SRMR                             Algorithm
##  0.1050763 0.9638131 0.9379654 0.05258369 4-Factor Confirmatory Factor Analysis
##  0.1557009 0.9054103 0.8637909 0.06286673 3-Factor Confirmatory Factor Analysis

##################################
# Consolidating all calculated values
# for the Root Mean Square Error of Approximation
##################################
(RMSEA_Plot <- dotplot(Algorithm ~ RMSEA,
                       data = CFA_Summary,
                       main = "Algorithm Comparison : Root Mean Square Error of Approximation",
                       ylab = "Algorithm",
                       xlab = "RMSEA",
                       auto.key = list(adj = 1),
                       type=c("p", "h"),  
                       origin = 0,
                       alpha = 0.45,
                       pch = 16,
                       cex = 2))

##################################
# Consolidating all calculated values
# for the Tucker-Lewis Fit Index
##################################
(CFI_Plot <- dotplot(Algorithm ~ CFI,
                     data = CFA_Summary,
                     main = "Algorithm Comparison : Comparative Fit Index",
                     ylab = "Algorithm",
                     xlab = "CFI",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

##################################
# Consolidating all calculated values
# for the Tucker-Lewis Fit Index
##################################
(TLI_Plot <- dotplot(Algorithm ~ TLI,
                     data = CFA_Summary,
                     main = "Algorithm Comparison : Tucker-Lewis Fit Index",
                     ylab = "Algorithm",
                     xlab = "TLI",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

##################################
# Consolidating all calculated values
# for the Standardized Root Mean Square of the Residual
##################################
(SRMR_Plot <- dotplot(Algorithm ~ SRMR,
                     data = CFA_Summary,
                     main = "Algorithm Comparison : Standardized Root Mean Square of the Residual",
                     ylab = "Algorithm",
                     xlab = "SRMR",
                     auto.key = list(adj = 1),
                     type=c("p", "h"),  
                     origin = 0,
                     alpha = 0.45,
                     pch = 16,
                     cex = 2))

1.3.9 Model Presentation

1.3.9.1 Exploratory Factor Analysis Dandelion Plot

[A] The formulated dandelion plot for the final 4-Factor Model from the exploratory factor analysis showed:
     [A.1] Comparable explained variance between the 4 latent factors as presented through the sizes of the star graphs.
     [A.2] Unique list of chronic disease indicators with the most dominant loadings identified in the star graphs representing each latent factor.
     [A.3] Chronic disease indicators with the most and least communality contributions.

Code Chunk | Output

##################################
# Plotting the Dandelion Plot
# for the optimal EFA model
##################################
dandelion(FA_ML_V_4F_FactorLoading,
          bound=0,
          mcex=c(1,1.2),
          palet=DandelionPlotPalette)

1.3.9.2 Confirmatory Factor Analysis Path Diagram

[A] The formulated path diagram for the final 4-Factor Model from the confirmatory factor analysis showed:
     [A.1] Magnitude and direction of relationships between the underlying constructs and the chronic disease indicators.
     [A.2] Estimated variance for each chronic disease indicator.
     [A.3] Estimated pairwise covariance between the underlying constructs.

Code Chunk | Output

##################################
# Plotting the Path Diagram
# for the optimal CFA model
##################################
CFA_4F_MODEL_FACTORS <- CFA_4F_MODEL@pta$vnames$lv[[1]]
size <- .65
semPlot::semPaths(CFA_4F_MODEL, latents = CFA_4F_MODEL_FACTORS, whatLabels = "std", layout = "tree2", 
         rotation = 4, style = "lisrel", optimizeLatRes = TRUE, 
         structural = FALSE, layoutSplit = FALSE,
         intercepts = FALSE, residuals = FALSE, 
         curve = 1, curvature = 3, nCharNodes = 8, 
         sizeLat = 11 * size, sizeMan = 11 * size, sizeMan2 = 4 * size, 
         edge.label.cex = 1.2 * size, 
         edge.color = "#000000", edge.label.position = .40)

2. Summary

A hypothesized 4-factor model structure extracted using maximum likelihood with applied latent factor correlations provided a set of robust and reliable estimates of the underlying constructs that influenced and manifested the observed chronic disease indicators, described as follows:

[1] Unhealthy Lifestyle
        [1.1] Higher Heart Disease Mortality
        [1.2] Higher Stroke Mortality
        [1.3] Higher Overweight Incidence

[2] Tobacco Use
        [2.1] Higher Smoking Prevalence
        [2.2] Higher COPD Mortality

[3] Poor Nutrition
        [3.1] Higher Renal Disease Mortality
        [3.2] Higher Diabetes Mortality

[4] Physical Inactivity
        [4.1] Higher Recent Mentally Unhealthy Days
        [4.2] Higher Arthritis Incidence

Overall, the formulated model from the factor analyses effectively estimated contextually meaningful latent constructs based on patterns of observed chronic disease indicator correlations. Addressing issues related to unhealthy lifestyle, tobacco use, poor nutrition, and physical inactivity requires a multifaceted approach involving individuals, communities, healthcare systems, and policymakers. Comprehensive health education programs can be implemented to raise awareness about the risks of unhealthy behaviors, including smoking, poor nutrition, and physical inactivity. Tobacco control initiatives can be strengthened including public awareness campaigns, tobacco taxation, smoke-free policies, and smoking cessation programs. Policies can be advocated that create a supportive environment for community programs promoting nutrition interventions, physical wellness and healthcare access. Addressing these issues requires a collaborative effort across various sectors, and sustainable change often involves a combination of individual behavior change, community engagement, and policy implementation. Public health initiatives and interventions should consider the social determinants of health and work towards creating environments that support and promote healthy lifestyles.

[A] From an initial dataset comprised of 50 observations and 15 descriptors, an optimal subset of 50 observations and 9 descriptors representing chronic disease indicators were determined after conducting data quality evaluation and correlation adequacy assessment, excluding cases or variables noted with irregularities and applying preprocessing operations most suitable for the downstream analysis.

[B] Multiple exploratory factor analysis model structures with various parameter combinations were formulated considering different Factor Extraction and Factor Rotation methods. The best models were determined using performance metrics Bayesian Information Criterion (BIC) and High Residual Rate. All candidate models were additionally compared based on the contexts presented by the derived latent factors including their loadings and correlations.

[C] The 3-Factor and 4-Factor model structures from the exploratory factor analysis which implemented Maximum Likelihood Factor Extraction and Promax Rotation demonstrated the best fit and improved interpretation of their latent factors. The selected 3-Factor Model hypothesized the relationship between 3 latent factors and observed chronic disease indicators including Unhealthy Lifestyle with higher loadings toward increased Heart Disease Mortality, Stroke Mortality, Overweight Incidence, Renal Disease Mortality and Diabetes Mortality; Poor Mental Health with higher loading towards increased Mentally Unhealthy Days; and Tobacco Use with higher loadings toward increased Smoking Prevalence and Pulmonary Disease Mortality. The selected 4-Factor Model hypothesized the relationship between 4 latent factors and observed chronic disease indicators including Unhealthy Lifestyle with higher loadings toward increased Heart Disease Mortality, Stroke Mortality and Overweight Incidence; Physical Inactivity with higher loadings toward increased Mentally Unhealthy Days and Arthritis Incidence; Poor Nutrition with higher loadings toward increased Renal Disease Mortality and Diabetes Mortality; and Tobacco Use with higher loadings toward increased Smoking Prevalence and Pulmonary Disease Mortality.

[D] Confirmatory factor analysis was conducted to evaluate the hypothesized 3-Factor and 4-Factor model structures using performance metrics Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), Tucker Lewis Index (TLI), and Standardized Root Mean Square Error of the Residual (SRMR). The hypothesized 4-Factor Model model provided a better fit to the observed chronic disease indicators. Factor loadings and latent factor correlations also aligned with theoretical expectations. The final model was defined by the respective factors and indicators as follows: Unhealthy Lifestyle with higher loadings toward increased Heart Disease Mortality, Stroke Mortality and Overweight Incidence; Physical Inactivity with higher loadings toward increased Mentally Unhealthy Days and Arthritis Incidence; Poor Nutrition with higher loadings toward increased Renal Disease Mortality and Diabetes Mortality; and Tobacco Use with higher loadings toward increased Smoking Prevalence and Pulmonary Disease Mortality.

[E] Post-hoc exploration of the model results involved Dandelion Plots which simultaneously visualized both factor variances and loadings from the exploratory factor analysis; and Path Diagrams which illustrated the structure of the model from confirmatory factor analysis by indicating how the latent factors influence the observed variables. These results helped provide insights on the importance, contribution and effect of the various underlying constructs to the observed chronic disease indicators.

The current results have limitations which can be further addressed by extending the study to include the following actions:

[1] Performing sensitivity analysis for EFA by altering the analysis parameters to an extended list of extraction and rotation methods to ensure the stability of the results

[2] Including an assessment of model parsimony for CFA to aim for a balance between model complexity and fit

[3] Additionally assessing convergent and discriminant validity of CFA by examining factor loadings, average variance extracted (AVE), and correlations between factors

[4] Conducting sensitivity analyses for CFA by testing alternative models or including additional paths to assess potential misspecification of the measurement model

[5] Considering modifying the CFA model based on theoretical rationale, modification indices, or substantive considerations to improve model fit and theoretical coherence

3. References

[Book] Multivariate Data Analysis by Barry Babin, Joseph Hair, Rolph Anderson and William Black
[Book] Using Multivariate Analysis by Barbara Tabachnick and Linda Fidell
[Book] A Step-by-Step Guide to Exploratory Factor Analysis with R and RStudio by Marley Watkins
[Book] Just Enough R by Ben Whalley
[Book] Multiple Factor Analysis by Example Using R by Jerome Pages
[Book] Nonlinear Principal Component Analysis and Its Applications by Yuichi Mori, Masahiro Kuroda and Naomichi Makino
[Book] Applied Predictive Modeling by Max Kuhn and Kjell Johnson
[Book] An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani
[Book] Multivariate Data Visualization with R by Deepayan Sarkar
[Book] Machine Learning by Samuel Jackson
[Book] Data Modeling Methods by Jacob Larget
[Book] Introduction to R and Statistics by University of Western Australia
[Book] Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson
[Book] Introduction to Research Methods by Eric van Holm
[Book] Using R for Social Work Research by Jerry Bean
[Book] Introduction to R by Andrew Ellis and Boris Mayer
[Book] Handbook of Structural Equation Modeling by Rick Hoyle
[Book] Applied Multivariate Statistics for the Social Sciences by James Stevens
[Book] Exploratory and Confirmatory Factor Analysis: Understanding Concepts and Applications by Bruce Thompson
[Book] A Beginner’s Guide to Structural Equation Modeling by Tiffany Whittaker and Randall Schumacker
[Book] Principles and Practice of Structural Equation Modeling by Rex Kline
[Book] Structural Equation Modeling with EQS and EQS/WINDOWS by Barbara Byrne
[Book] Innovative Nursing Care by KlavdijaČuček Trifkovič, Mateja Lorber, Nataša Mlinar Reljić and Gregor Štiglic
[R Package] AppliedPredictiveModeling by Max Kuhn
[R Package] caret by Max Kuhn
[R Package] rpart by Terry Therneau and Beth Atkinson
[R Package] lattice by Deepayan Sarkar
[R Package] dplyr by Hadley Wickham
[R Package] tidyr by Hadley Wickham
[R Package] moments by Lukasz Komsta and Frederick
[R Package] skimr by Elin Waring
[R Package] RANN by Sunil Arya, David Mount, Samuel Kemp and Gregory Jefferis
[R Package] corrplot by Taiyun Wei
[R Package] tidyverse by Hadley Wickham
[R Package] lares by Bernardo Lares
[R Package] DMwR by Luis Torgo
[R Package] gridExtra by Baptiste Auguie and Anton Antonov
[R Package] rattle by Graham Williams
[R Package] RColorBrewer by Erich Neuwirth
[R Package] stats by R Core Team
[R Package] factoextra by Alboukadel Kassambara and Fabian Mundt
[R Package] FactoMineR by Francois Husson, Julie Josse, Sebastien Le and Jeremy Mazet
[R Package] gplots by Tal Galili
[R Package] qgraph by Sacha Epskamp
[R Package] ggplot2 by Hadley Wickham, Winston Chang, Lionel Henry and Thomas Lin Pedersen
[R Package] GGally by xx
[R Package] psych by William Revelle
[R Package] nFactors by Gilles Raiche and David Magis
[R Package] MBESS by Ken Kelley
[R Package] DandEFA by Artur Manukyan, Ahmet Sedef, Erhan Cene and Ibrahim Demir
[R Package] EFAtools by Markus Steiner and Silvia Grieder
[R Package] tidyverse by hadley Wickham
[R Package] lavaan by Yves Rosseel
[R Package] semPlot by Sacha Epskamp
[R Package] semoutput by Jason Tsukahara
[R Package] sjPlot by Daniel Lüdecke
[R Package] GPArotation by Coen Bernaards
[R Package] semTools by Terrence Jorgensen
[Article] 6 Dimensionality Reduction Techniques in R (with Examples) by CMDLineTips Team
[Article] 6 Dimensionality Reduction Algorithms With Python by Jason Brownlee
[Article] Introduction to Dimensionality Reduction for Machine Learning by Jason Brownlee
[Article] Introduction to Dimensionality Reduction by Geeks For Geeks
[Article] Factor Analysis with the psych package by Michael Clark
[Article] Factor Analysis in R with Psych Package: Measuring Consumer Involvement by Peter Prevos
[Article] Factor Analysis in R by Jinjian Mu
[Article] How To: Use the psych package for Factor Analysis and Data Reduction by William Revelle
[Article] A Practical Introduction to Factor Analysis: Exploratory Factor Analysis by UCLA Advanced Research Computing Team
[Article] Examining the Big 5 Personality Dataset with Factor Analysis by Tarid Wongvorachan
[Article] Principal Component Analysis versus Exploratory Factor Analysis by Diana Suhr
[Article] Exploratory Factor Analysis by Columbia University Irving Medical Center
[Article] Factor Analysis Example by Charles Zaiontz
[Article] Factor Analysis Guide with an Example by Jim Frost
[Article] What Is Factor Analysis and How Does It Simplify Research Findings? by Qualtrics Team
[Article] How Can I Perform A Factor Analysis With Categorical (Or Categorical And Continuous) Variables? by UCLA Advanced Research Computing Team
[Article] Factor Analysis on Ordinal Data Example in R (psych, homals) by Jiayu Wu
[Article] Factor Analysis by HandWiki Team
[Article] On Likert Scales In R by Jake Chanenson
[Article] Exploratory Factor Analysis by Mariana Silva
[Article] Confirmatory Factor Analysis (CFA) in R With LAVAAN by UCLA Advanced Research Computing Team
[Article] Confirmatory Factor Analysis (CFA) Example in R by Ben Kite
[Article] Confirmatory Factor Analysis with R by Bruce Dudek
[Article] A CFA Example by Lavaan.Org
[RPubs Project] EFA and CFA Intro by Dan Isbell
[RPubs Project] Factor Analysis: CFA by Silvia Blanco
[RPubs Project] Exploratory and Confirmatory Factor Analysis in R by Ergul Demir
[RPubs Project] Confirmatory Factor Analysis - CFA by Mikhilesh Dehane
[RPubs Project] Exploratory and Confirmatory Factor Analysis on SF-36 Scales by Jason Beckstead
[RPubs Project] Using R for Social Work Research: Exploratory Factor Analysis by Jerry Bean
[RPubs Project] Using R for Social Work Research: Confirmatory Factor Analysis by Jerry Bean
[Publication] General Intelligence Objectively Determined and Measured by Charles Spearman (The American Journal of Psychology)
[Publication] The Effect of Standardization on a Chi-Square Approximation in Factor Analysis by Maurice Bartlett (Psychometrika)
[Publication] A Second Generation Little Jiffy by Henry Kaiser (Psychometrika)
[Publication] Tests of Significance in Factor Analysis by Maurice Bartlett (British Journal of Statistical Psychology)
[Publication] Test of Linear Trend in Eigenvalues of a Covariance Matrix with Application to Data Analysis by Peter Bentler and KeHai Yuan (British Journal of Mathematical and Statistical Psychology)
[Publication] The Scree Test For The Number Of Factors by Raymond Cattell (Multivariate Behavioral Research)
[Publication] Using Fit Statistic Differences to Determine the Optimal Number of Factors to Retain in an Exploratory Factor Analysis by William Finch (Educational and Psychological Measurement)
[Publication] An Objective Counterpart to the Visual Scree Test for Factor Analysis: The Standard Error Scree by Keith Zoski and Stephen Jurs (Educational and Psychological Measurement)
[Publication] The Performance of Regression-Based Variations of the Visual Scree for Determining the Number of Common Factors by Fadia Nasser, Jeri Benson and Joseph Wisenbaker (Educational and Psychological Measurement)
[Publication] Investigating the Performance of Exploratory Graph Analysis and Traditional Techniques to Identify the Number of Latent Factors: A Simulation and Tutorial by Hudson Golino, Dingjing Shi, Alexander Christensen, Luis Garrido, Maria Nieto, Ritu Sadana, Jotheeswaran Thiyagarajan, Agustin Martinez-Molina (Psychological Methods)
[Publication] Exploratory Graph Analysis: A New Approach for Estimating the Number of Dimensions in Psychological Research by Hudson Galino and Sacha Epskamp (Plos One)
[Publication] Very Simple Structure: An Alternative Procedure For Estimating The Optimal Number Of Interpretable Factors by William Revelle and Thomas Rocklin (Multivariate Behavioral Research )
[Publication] Determining the Number of Components from the Matrix of Partial Correlations by Wayne Velicer (Psychometrika)
[Publication] Dandelion Plot: A Method for the Visualization of R-mode Exploratory Factor Analyses by Artur Manukyan, Erhan Cene, Ahmet Sedef and Ibrahim Demir (Computational Statistics)
[Publication] Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives by Litze Hu and Peter Bentler (Structural Equation Modeling: A Multidisciplinary Journal)
[Publication] Effects of Sample Size, Estimation Methods, and Model Specification on Structural Equation Modeling Fit Indexes by Xitao Fan, Bruce Thompson and Lin Wang (Structural Equation Modeling: A Multidisciplinary Journal)
[Course] Applied Data Mining and Statistical Learning by Penn State Eberly College of Science

Unsupervised Learning : Uncovering Underlying Constructs of Chronic Disease Indicators Across US States Using Exploratory and Confirmatory Factor Analyses

John Pauline Pineda

November 5, 2023

1. Table of Contents

1.1 Introduction

1.1.1 Study Objectives

1.1.2 Outcome

1.1.3 Descriptors

1.2 Methodology

1.2.1 Data Assessment

1.2.2 Model Formulation

1.2.3 Model Performance Evaluation

1.2.4 Model Presentation

1.3 Results

1.3.1 Data Preparation

1.3.2 Data Quality Assessment

1.3.3 Data Preprocessing

1.3.3.1 Missing Data Imputation

1.3.3.2 Outlier Detection

1.3.3.3 Zero and Near-Zero Variance

1.3.3.4 Collinearity

1.3.3.5 Linear Dependencies

1.3.3.6 Distributional Shape

1.3.3.7 Data Standardization

1.3.4 Data Pre-Assessment

1.3.4.1 Correlation Matrix Assessment - Kaiser-Meyer-Olkin Factor Adequacy

1.3.4.2 Correlation Matrix Assessment - Bartlett’s Test of Sphericity

1.3.4.3 Correlation Matrix Assessment - Covariance Validity

1.3.4.4 Correlation Matrix Assessment - Determinant Computation

1.3.4.5 Pre-Processed Dataset

1.3.5 Data Exploration

1.3.5.1 Correlation Matrix

1.3.6 Model Development

1.3.6.1 Principal Axes Factor Extraction and Varimax Rotation

1.3.6.2 Principal Axes Factor Extraction and Promax Rotation

1.3.6.3 Maximum Likelihood Factor Extraction and Varimax Rotation

1.3.6.4 Maximum Likelihood Factor Extraction and Promax Rotation

1.3.6.5 Model Assessment

1.3.7 Model Performance Validation

1.3.7.1 4-Factor Model Confirmatory Factor Analysis

1.3.7.2 3-Factor Model Confirmatory Factor Analysis

1.3.8 Model Selection

1.3.9 Model Presentation

1.3.9.1 Exploratory Factor Analysis Dandelion Plot

1.3.9.2 Confirmatory Factor Analysis Path Diagram

2. Summary

3. References