Supervised Learning: Exploring Activation Functions and Backpropagation Gradient Updates for Neural Network Classification ¶


John Pauline Pineda

March 12, 2024


  • 1. Table of Contents
    • 1.1 Data Background
    • 1.2 Data Description
    • 1.3 Data Quality Assessment
    • 1.4 Data Preprocessing
      • 1.4.1 Data Cleaning
      • 1.4.2 Missing Data Imputation
      • 1.4.3 Outlier Treatment
      • 1.4.4 Collinearity
      • 1.4.5 Shape Transformation
      • 1.4.6 Centering and Scaling
      • 1.4.7 Data Encoding
      • 1.4.8 Preprocessed Data Description
    • 1.5 Data Exploration
      • 1.5.1 Exploratory Data Analysis
      • 1.5.2 Hypothesis Testing
    • 1.6 Neural Network Classification Gradient and Weight Updates
      • 1.6.1 Premodelling Data Description
      • 1.6.2 Sigmoid Activation Function
      • 1.6.3 Rectified Linear Unit Activation Function
      • 1.6.4 Leaky Rectified Linear Unit Activation Function
      • 1.6.5 Exponential Linear Unit Activation Function
      • 1.6.6 Scaled Exponential Linear Unit Activation Function
      • 1.6.7 Randomized Leaky Rectified Linear Unit Activation Function
    • 1.7 Consolidated Findings
  • 2. Summary
  • 3. References

1. Table of Contents ¶

This project manually implements the Sigmoid, Rectified Linear Unit, Leaky Rectified Linear Unit, Exponential Linear Unit, Scaled Exponential Linear Unit and Randomized Leaky Rectified Linear Unit activation functions using various helpful packages in Python with fixed values applied for the learning rate and iteration count parameters to optimally update the gradients and weights of an artificial neural network classification model. The gradient, weight, cost function and classification accuracy optimization profiles of the different activation settings were compared. All results were consolidated in a Summary presented at the end of the document.

Artificial Neural Network, in the context of categorical response prediction, consists of interconnected nodes called neurons organized in layers. The model architecture involves an input layer which receives the input data, with each neuron representing a feature or attribute of the data; hidden layers which perform computations on the input data through weighted connections between neurons and apply activation functions to produce outputs; and the output layer which produces one output per class, each representing the probability of the input belonging to that class, based on the computations performed in the hidden layers. Neurons in adjacent layers are connected by weighted connections, with each weight determining the strength of influence one neuron has on another. These weights are adjusted during the training process to enable the network to learn from the input data and make accurate predictions. Activation functions introduce non-linearities into the network, allowing it to learn complex relationships between inputs and outputs. The training process involves presenting input data along with the corresponding target outputs to the network and adjusting the weights to minimize the difference between the predicted outputs and the actual targets, which is typically performed through optimization algorithms such as gradient descent combined with backpropagation. The training process iteratively updates the weights until the model's predictions closely match the target outputs.
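
To make the layer-by-layer computation concrete, the illustrative sketch below (not part of the project code; the layer sizes and weight values are arbitrary assumptions) propagates a small batch of inputs through one hidden layer with a ReLU activation and a softmax output layer to obtain class probabilities.

import numpy as np

# Illustrative forward pass: 4 input features -> 5 hidden neurons -> 2 output classes (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                              # 3 observations with 4 features each

W1, b1 = rng.normal(size=(4, 5)) * 0.1, np.zeros(5)      # input-to-hidden weights and biases
W2, b2 = rng.normal(size=(5, 2)) * 0.1, np.zeros(2)      # hidden-to-output weights and biases

hidden = np.maximum(0.0, X @ W1 + b1)                    # weighted sum followed by a ReLU activation
logits = hidden @ W2 + b2                                # weighted sum at the output layer
probabilities = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax per observation
print(probabilities)                                     # one probability per class, rows sum to 1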

Backpropagation and Weight Update, in the context of an artificial neural network, involve the process of iteratively adjusting the weights of the connections between neurons in the network to minimize the difference between the predicted and the actual target responses. Input data is fed into the neural network, and it propagates through the network layer by layer, starting from the input layer, through hidden layers, and ending at the output layer. At each neuron, the weighted sum of inputs is calculated, followed by the application of an activation function to produce the neuron's output. Once the forward pass is complete, the network's output is compared to the actual target output. The difference between the predicted output and the actual output is quantified using a loss function, which measures the discrepancy between the predicted and actual values. Common loss functions for classification tasks include cross-entropy loss. During the backward pass, the error is propagated backward through the network to compute the gradients of the loss function with respect to each weight in the network. This is achieved using the chain rule of calculus, which allows the error to be decomposed and distributed backward through the network. The gradients quantify how much a change in each weight would affect the overall error of the network. Once the gradients are computed, the weights are updated in the opposite direction of the gradient to minimize the error. This update is typically performed using an optimization algorithm such as gradient descent, which adjusts the weights in proportion to their gradients and a learning rate hyperparameter. The learning rate determines the size of the step taken in the direction opposite to the gradient. These steps are repeated for multiple iterations (epochs) over the training data. As the training progresses, the weights are adjusted iteratively to minimize the error, leading to a neural network model that accurately classifies input data.
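
As a minimal sketch of a single backpropagation and weight update step (assuming one sigmoid output unit with binary cross-entropy loss and an arbitrary learning rate, which may differ from the fixed settings adopted later in this project):

import numpy as np

# One manual gradient descent step for a single sigmoid unit (illustrative sizes and synthetic data)
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))                              # 8 observations with 3 features each
y = rng.integers(0, 2, size=(8, 1)).astype(float)        # binary target responses

W, b = np.zeros((3, 1)), 0.0                             # initial weights and bias
learning_rate = 0.1                                      # assumed step size

# Forward pass: weighted sum followed by the sigmoid activation
y_hat = 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Loss function: binary cross-entropy quantifies the discrepancy between prediction and target
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Backward pass: the chain rule for the sigmoid/cross-entropy pair gives the gradients directly
grad_W = X.T @ (y_hat - y) / len(X)
grad_b = float(np.mean(y_hat - y))

# Weight update: step opposite to the gradient, scaled by the learning rate
W -= learning_rate * grad_W
b -= learning_rate * grad_b
print(loss, W.ravel(), b)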

Activation Functions play a crucial role in neural networks by introducing non-linearity into the network, enabling the model to learn complex patterns and relationships within the data. In the context of a neural network classification model, activation functions are applied to the output of each neuron in the hidden layers to introduce non-linear mappings between the input and output, allowing the network to approximate complex functions and make non-linear decisions. Activation functions are significant during model development by introducing non-linearity (without activation functions, the neural network would simply be a series of linear transformations, no matter how many layers it has; activation functions introduce non-linearities to the model, enabling it to learn and represent complex patterns and relationships in the data); propagating gradients backward (activation functions support the backpropagation algorithm by providing gradients that indicate the direction and magnitude of adjustments to the weights during training; these gradients are necessary for optimizing the network's parameters through techniques like gradient descent); and normalizing outputs (activation functions also help in normalizing the output of each neuron, ensuring that it falls within a specific range; this normalization prevents the activation values from becoming too large or too small, which can lead to numerical instability or saturation of gradients during training). The choice of activation function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.
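
For reference, a minimal sketch of the six activation functions compared in this project is shown below; the negative-slope, alpha, scale, and sampling-range constants are illustrative assumptions and may differ from the fixed values used in the later sections.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))                      # squashes inputs into the (0, 1) range

def relu(z):
    return np.maximum(0.0, z)                            # passes positive inputs, zeroes out negatives

def leaky_relu(z, slope=0.01):                           # assumed negative slope
    return np.where(z > 0, z, slope * z)

def elu(z, alpha=1.0):                                   # assumed alpha
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

def selu(z, alpha=1.6732632423543772, scale=1.0507009873554805):   # commonly cited SELU constants
    return scale * np.where(z > 0, z, alpha * (np.exp(z) - 1))

def randomized_leaky_relu(z, lower=0.125, upper=1/3, rng=np.random.default_rng(0)):   # assumed sampling range
    return np.where(z > 0, z, rng.uniform(lower, upper) * z)       # negative slope sampled randomly during training

z = np.linspace(-3.0, 3.0, 7)
for fn in (sigmoid, relu, leaky_relu, elu, selu, randomized_leaky_relu):
    print(fn.__name__, fn(z))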

1.1. Data Background ¶

Datasets used for the analysis were separately gathered and consolidated from various sources including:

  1. Cancer Rates from World Population Review
  2. Social Protection and Labor Indicator from World Bank
  3. Education Indicator from World Bank
  4. Economy and Growth Indicator from World Bank
  5. Environment Indicator from World Bank
  6. Climate Change Indicator from World Bank
  7. Agricultural and Rural Development Indicator from World Bank
  8. Social Development Indicator from World Bank
  9. Health Indicator from World Bank
  10. Science and Technology Indicator from World Bank
  11. Urban Development Indicator from World Bank
  12. Human Development Indices from Human Development Reports
  13. Environmental Performance Indices from Yale Center for Environmental Law and Policy

This study hypothesized that various global development indicators and indices influence cancer rates across countries.

The target variable for the study is:

  • CANRAT - Dichotomized category based on age-standardized cancer rates, per 100K population (2022)

The predictor variables for the study are:

  • GDPPER - GDP per person employed, current US Dollars (2020)
  • URBPOP - Urban population, % of total population (2020)
  • PATRES - Patent applications by residents, total count (2020)
  • RNDGDP - Research and development expenditure, % of GDP (2020)
  • POPGRO - Population growth, annual % (2020)
  • LIFEXP - Life expectancy at birth, total in years (2020)
  • TUBINC - Incidence of tuberculosis, per 100K population (2020)
  • DTHCMD - Cause of death by communicable diseases and maternal, prenatal and nutrition conditions, % of total (2019)
  • AGRLND - Agricultural land, % of land area (2020)
  • GHGEMI - Total greenhouse gas emissions, kt of CO2 equivalent (2020)
  • RELOUT - Renewable electricity output, % of total electricity output (2015)
  • METEMI - Methane emissions, kt of CO2 equivalent (2020)
  • FORARE - Forest area, % of land area (2020)
  • CO2EMI - CO2 emissions, metric tons per capita (2020)
  • PM2EXP - PM2.5 air pollution, population exposed to levels exceeding WHO guideline value, % of total (2017)
  • POPDEN - Population density, people per sq. km of land area (2020)
  • GDPCAP - GDP per capita, current US Dollars (2020)
  • ENRTER - Tertiary school enrollment, % gross (2020)
  • HDICAT - Human development index, ordered category (2020)
  • EPISCO - Environmental performance index, score (2022)

1.2. Data Description ¶

  1. The dataset comprises:
    • 177 rows (observations)
    • 22 columns (variables)
      • 1/22 metadata (object)
        • COUNTRY
      • 1/22 target (categorical)
        • CANRAT
      • 19/22 predictor (numeric)
        • GDPPER
        • URBPOP
        • PATRES
        • RNDGDP
        • POPGRO
        • LIFEXP
        • TUBINC
        • DTHCMD
        • AGRLND
        • GHGEMI
        • RELOUT
        • METEMI
        • FORARE
        • CO2EMI
        • PM2EXP
        • POPDEN
        • GDPCAP
        • ENRTER
        • EPISCO
      • 1/22 predictor (categorical)
        • HDICAT
In [1]:
##################################
# Loading Python Libraries
##################################
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import itertools
import os
%matplotlib inline

from operator import add,mul,truediv
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PowerTransformer
from sklearn.preprocessing import StandardScaler
from scipy import stats
In [2]:
##################################
# Defining file paths
##################################
DATASETS_ORIGINAL_PATH = r"datasets\original"
In [3]:
##################################
# Loading the dataset
# from the DATASETS_ORIGINAL_PATH
##################################
cancer_rate = pd.read_csv(os.path.join("..", DATASETS_ORIGINAL_PATH, "CategoricalCancerRates.csv"))
In [4]:
##################################
# Performing a general exploration of the dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate.shape)
Dataset Dimensions: 
(177, 22)
In [5]:
##################################
# Listing the column names and data types
##################################
print('Column Names and Data Types:')
display(cancer_rate.dtypes)
Column Names and Data Types:
COUNTRY     object
CANRAT      object
GDPPER     float64
URBPOP     float64
PATRES     float64
RNDGDP     float64
POPGRO     float64
LIFEXP     float64
TUBINC     float64
DTHCMD     float64
AGRLND     float64
GHGEMI     float64
RELOUT     float64
METEMI     float64
FORARE     float64
CO2EMI     float64
PM2EXP     float64
POPDEN     float64
ENRTER     float64
GDPCAP     float64
HDICAT      object
EPISCO     float64
dtype: object
In [6]:
##################################
# Taking a snapshot of the dataset
##################################
cancer_rate.head()
Out[6]:
COUNTRY CANRAT GDPPER URBPOP PATRES RNDGDP POPGRO LIFEXP TUBINC DTHCMD ... RELOUT METEMI FORARE CO2EMI PM2EXP POPDEN ENRTER GDPCAP HDICAT EPISCO
0 Australia High 98380.63601 86.241 2368.0 NaN 1.235701 83.200000 7.2 4.941054 ... 13.637841 131484.763200 17.421315 14.772658 24.893584 3.335312 110.139221 51722.06900 VH 60.1
1 New Zealand High 77541.76438 86.699 348.0 NaN 2.204789 82.256098 7.2 4.354730 ... 80.081439 32241.937000 37.570126 6.160799 NaN 19.331586 75.734833 41760.59478 VH 56.7
2 Ireland High 198405.87500 63.653 75.0 1.23244 1.029111 82.556098 5.3 5.684596 ... 27.965408 15252.824630 11.351720 6.768228 0.274092 72.367281 74.680313 85420.19086 VH 57.4
3 United States High 130941.63690 82.664 269586.0 3.42287 0.964348 76.980488 2.3 5.302060 ... 13.228593 748241.402900 33.866926 13.032828 3.343170 36.240985 87.567657 63528.63430 VH 51.1
4 Denmark High 113300.60110 88.116 1261.0 2.96873 0.291641 81.602439 4.1 6.826140 ... 65.505925 7778.773921 15.711000 4.691237 56.914456 145.785100 82.664330 60915.42440 VH 77.9

5 rows × 22 columns

In [7]:
##################################
# Setting the levels of the categorical variables
##################################
cancer_rate['CANRAT'] = cancer_rate['CANRAT'].astype('category')
cancer_rate['CANRAT'] = cancer_rate['CANRAT'].cat.set_categories(['Low', 'High'], ordered=True)
cancer_rate['HDICAT'] = cancer_rate['HDICAT'].astype('category')
cancer_rate['HDICAT'] = cancer_rate['HDICAT'].cat.set_categories(['L', 'M', 'H', 'VH'], ordered=True)
In [8]:
##################################
# Performing a general exploration of the numeric variables
##################################
print('Numeric Variable Summary:')
display(cancer_rate.describe(include='number').transpose())
Numeric Variable Summary:
count mean std min 25% 50% 75% max
GDPPER 165.0 45284.424283 3.941794e+04 1718.804896 13545.254510 34024.900890 66778.416050 2.346469e+05
URBPOP 174.0 59.788121 2.280640e+01 13.345000 42.432750 61.701500 79.186500 1.000000e+02
PATRES 108.0 20607.388889 1.340683e+05 1.000000 35.250000 244.500000 1297.750000 1.344817e+06
RNDGDP 74.0 1.197474 1.189956e+00 0.039770 0.256372 0.873660 1.608842 5.354510e+00
POPGRO 174.0 1.127028 1.197718e+00 -2.079337 0.236900 1.179959 2.031154 3.727101e+00
LIFEXP 174.0 71.746113 7.606209e+00 52.777000 65.907500 72.464610 77.523500 8.456000e+01
TUBINC 174.0 105.005862 1.367229e+02 0.770000 12.000000 44.500000 147.750000 5.920000e+02
DTHCMD 170.0 21.260521 1.927333e+01 1.283611 6.078009 12.456279 36.980457 6.520789e+01
AGRLND 174.0 38.793456 2.171551e+01 0.512821 20.130276 40.386649 54.013754 8.084112e+01
GHGEMI 170.0 259582.709895 1.118550e+06 179.725150 12527.487367 41009.275980 116482.578575 1.294287e+07
RELOUT 153.0 39.760036 3.191492e+01 0.000296 10.582691 32.381668 63.011450 1.000000e+02
METEMI 170.0 47876.133575 1.346611e+05 11.596147 3662.884908 11118.976025 32368.909040 1.186285e+06
FORARE 173.0 32.218177 2.312001e+01 0.008078 11.604388 31.509048 49.071780 9.741212e+01
CO2EMI 170.0 3.751097 4.606479e+00 0.032585 0.631924 2.298368 4.823496 3.172684e+01
PM2EXP 167.0 91.940595 2.206003e+01 0.274092 99.627134 100.000000 100.000000 1.000000e+02
POPDEN 174.0 200.886765 6.453834e+02 2.115134 27.454539 77.983133 153.993650 7.918951e+03
ENRTER 116.0 49.994997 2.970619e+01 2.432581 22.107195 53.392460 71.057467 1.433107e+02
GDPCAP 170.0 13992.095610 1.957954e+04 216.827417 1870.503029 5348.192875 17421.116227 1.173705e+05
EPISCO 165.0 42.946667 1.249086e+01 18.900000 33.000000 40.900000 50.500000 7.790000e+01
In [9]:
##################################
# Performing a general exploration of the object variable
##################################
print('Object Variable Summary:')
display(cancer_rate.describe(include='object').transpose())
Object Variable Summary:
count unique top freq
COUNTRY 177 177 Australia 1
In [10]:
##################################
# Performing a general exploration of the categorical variables
##################################
print('Categorical Variable Summary:')
display(cancer_rate.describe(include='category').transpose())
Categorical Variable Summary:
count unique top freq
CANRAT 177 2 Low 132
HDICAT 167 4 VH 59

1.3. Data Quality Assessment ¶

Data quality findings based on assessment are as follows:

  1. No duplicated rows observed.
  2. Missing data noted for 20 variables with Null.Count>0 and Fill.Rate<1.0.
    • RNDGDP: Null.Count = 103, Fill.Rate = 0.418
    • PATRES: Null.Count = 69, Fill.Rate = 0.610
    • ENRTER: Null.Count = 61, Fill.Rate = 0.655
    • RELOUT: Null.Count = 24, Fill.Rate = 0.864
    • GDPPER: Null.Count = 12, Fill.Rate = 0.932
    • EPISCO: Null.Count = 12, Fill.Rate = 0.932
    • HDICAT: Null.Count = 10, Fill.Rate = 0.943
    • PM2EXP: Null.Count = 10, Fill.Rate = 0.943
    • DTHCMD: Null.Count = 7, Fill.Rate = 0.960
    • METEMI: Null.Count = 7, Fill.Rate = 0.960
    • CO2EMI: Null.Count = 7, Fill.Rate = 0.960
    • GDPCAP: Null.Count = 7, Fill.Rate = 0.960
    • GHGEMI: Null.Count = 7, Fill.Rate = 0.960
    • FORARE: Null.Count = 4, Fill.Rate = 0.977
    • TUBINC: Null.Count = 3, Fill.Rate = 0.983
    • AGRLND: Null.Count = 3, Fill.Rate = 0.983
    • POPGRO: Null.Count = 3, Fill.Rate = 0.983
    • POPDEN: Null.Count = 3, Fill.Rate = 0.983
    • URBPOP: Null.Count = 3, Fill.Rate = 0.983
    • LIFEXP: Null.Count = 3, Fill.Rate = 0.983
  3. 120 observations noted with at least 1 missing value. Of these, 14 observations reported a high Missing.Rate>0.2.
    • COUNTRY=Guadeloupe: Missing.Rate= 0.909
    • COUNTRY=Martinique: Missing.Rate= 0.909
    • COUNTRY=French Guiana: Missing.Rate= 0.909
    • COUNTRY=New Caledonia: Missing.Rate= 0.500
    • COUNTRY=French Polynesia: Missing.Rate= 0.500
    • COUNTRY=Guam: Missing.Rate= 0.500
    • COUNTRY=Puerto Rico: Missing.Rate= 0.409
    • COUNTRY=North Korea: Missing.Rate= 0.273
    • COUNTRY=Somalia: Missing.Rate= 0.273
    • COUNTRY=South Sudan: Missing.Rate= 0.273
    • COUNTRY=Venezuela: Missing.Rate= 0.227
    • COUNTRY=Libya: Missing.Rate= 0.227
    • COUNTRY=Eritrea: Missing.Rate= 0.227
    • COUNTRY=Yemen: Missing.Rate= 0.227
  4. Low variance observed for 1 variable with First.Second.Mode.Ratio>5.
    • PM2EXP: First.Second.Mode.Ratio = 53.000
  5. No low variance observed for any variable with Unique.Count.Ratio>10.
  6. High skewness observed for 5 variables with Skewness>3 or Skewness<(-3).
    • POPDEN: Skewness = +10.267
    • GHGEMI: Skewness = +9.496
    • PATRES: Skewness = +9.284
    • METEMI: Skewness = +5.801
    • PM2EXP: Skewness = -3.141
In [11]:
##################################
# Counting the number of duplicated rows
##################################
cancer_rate.duplicated().sum()
Out[11]:
np.int64(0)
In [12]:
##################################
# Gathering the data types for each column
##################################
data_type_list = list(cancer_rate.dtypes)
In [13]:
##################################
# Gathering the variable names for each column
##################################
variable_name_list = list(cancer_rate.columns)
In [14]:
##################################
# Gathering the number of observations for each column
##################################
row_count_list = list([len(cancer_rate)] * len(cancer_rate.columns))
In [15]:
##################################
# Gathering the number of missing data for each column
##################################
null_count_list = list(cancer_rate.isna().sum(axis=0))
In [16]:
##################################
# Gathering the number of non-missing data for each column
##################################
non_null_count_list = list(cancer_rate.count())
In [17]:
##################################
# Gathering the missing data percentage for each column
##################################
fill_rate_list = map(truediv, non_null_count_list, row_count_list)
In [18]:
##################################
# Formulating the summary
# for all columns
##################################
all_column_quality_summary = pd.DataFrame(zip(variable_name_list,
                                              data_type_list,
                                              row_count_list,
                                              non_null_count_list,
                                              null_count_list,
                                              fill_rate_list), 
                                        columns=['Column.Name',
                                                 'Column.Type',
                                                 'Row.Count',
                                                 'Non.Null.Count',
                                                 'Null.Count',                                                 
                                                 'Fill.Rate'])
display(all_column_quality_summary)
Column.Name Column.Type Row.Count Non.Null.Count Null.Count Fill.Rate
0 COUNTRY object 177 177 0 1.000000
1 CANRAT category 177 177 0 1.000000
2 GDPPER float64 177 165 12 0.932203
3 URBPOP float64 177 174 3 0.983051
4 PATRES float64 177 108 69 0.610169
5 RNDGDP float64 177 74 103 0.418079
6 POPGRO float64 177 174 3 0.983051
7 LIFEXP float64 177 174 3 0.983051
8 TUBINC float64 177 174 3 0.983051
9 DTHCMD float64 177 170 7 0.960452
10 AGRLND float64 177 174 3 0.983051
11 GHGEMI float64 177 170 7 0.960452
12 RELOUT float64 177 153 24 0.864407
13 METEMI float64 177 170 7 0.960452
14 FORARE float64 177 173 4 0.977401
15 CO2EMI float64 177 170 7 0.960452
16 PM2EXP float64 177 167 10 0.943503
17 POPDEN float64 177 174 3 0.983051
18 ENRTER float64 177 116 61 0.655367
19 GDPCAP float64 177 170 7 0.960452
20 HDICAT category 177 167 10 0.943503
21 EPISCO float64 177 165 12 0.932203
In [19]:
##################################
# Counting the number of columns
# with Fill.Rate < 1.00
##################################
len(all_column_quality_summary[(all_column_quality_summary['Fill.Rate']<1)])
Out[19]:
20
In [20]:
##################################
# Identifying the columns
# with Fill.Rate < 1.00
##################################
display(all_column_quality_summary[(all_column_quality_summary['Fill.Rate']<1)].sort_values(by=['Fill.Rate'], ascending=True))
Column.Name Column.Type Row.Count Non.Null.Count Null.Count Fill.Rate
5 RNDGDP float64 177 74 103 0.418079
4 PATRES float64 177 108 69 0.610169
18 ENRTER float64 177 116 61 0.655367
12 RELOUT float64 177 153 24 0.864407
21 EPISCO float64 177 165 12 0.932203
2 GDPPER float64 177 165 12 0.932203
16 PM2EXP float64 177 167 10 0.943503
20 HDICAT category 177 167 10 0.943503
15 CO2EMI float64 177 170 7 0.960452
13 METEMI float64 177 170 7 0.960452
11 GHGEMI float64 177 170 7 0.960452
9 DTHCMD float64 177 170 7 0.960452
19 GDPCAP float64 177 170 7 0.960452
14 FORARE float64 177 173 4 0.977401
6 POPGRO float64 177 174 3 0.983051
3 URBPOP float64 177 174 3 0.983051
17 POPDEN float64 177 174 3 0.983051
10 AGRLND float64 177 174 3 0.983051
7 LIFEXP float64 177 174 3 0.983051
8 TUBINC float64 177 174 3 0.983051
In [21]:
##################################
# Identifying the rows
# with Fill.Rate < 0.90
##################################
column_low_fill_rate = all_column_quality_summary[(all_column_quality_summary['Fill.Rate']<0.90)]
In [22]:
##################################
# Gathering the metadata labels for each observation
##################################
row_metadata_list = cancer_rate["COUNTRY"].values.tolist()
In [23]:
##################################
# Gathering the number of columns for each observation
##################################
column_count_list = list([len(cancer_rate.columns)] * len(cancer_rate))
In [24]:
##################################
# Gathering the number of missing data for each row
##################################
null_row_list = list(cancer_rate.isna().sum(axis=1))
In [25]:
##################################
# Gathering the missing data percentage for each column
##################################
missing_rate_list = map(truediv, null_row_list, column_count_list)
In [26]:
##################################
# Identifying the rows
# with missing data
##################################
all_row_quality_summary = pd.DataFrame(zip(row_metadata_list,
                                           column_count_list,
                                           null_row_list,
                                           missing_rate_list), 
                                        columns=['Row.Name',
                                                 'Column.Count',
                                                 'Null.Count',                                                 
                                                 'Missing.Rate'])
display(all_row_quality_summary)
Row.Name Column.Count Null.Count Missing.Rate
0 Australia 22 1 0.045455
1 New Zealand 22 2 0.090909
2 Ireland 22 0 0.000000
3 United States 22 0 0.000000
4 Denmark 22 0 0.000000
... ... ... ... ...
172 Congo Republic 22 3 0.136364
173 Bhutan 22 2 0.090909
174 Nepal 22 2 0.090909
175 Gambia 22 4 0.181818
176 Niger 22 2 0.090909

177 rows × 4 columns

In [27]:
##################################
# Counting the number of rows
# with Missing.Rate > 0.00
##################################
len(all_row_quality_summary[(all_row_quality_summary['Missing.Rate']>0.00)])
Out[27]:
120
In [28]:
##################################
# Counting the number of rows
# with Missing.Rate > 0.20
##################################
len(all_row_quality_summary[(all_row_quality_summary['Missing.Rate']>0.20)])
Out[28]:
14
In [29]:
##################################
# Identifying the rows
# with Missing.Rate > 0.20
##################################
row_high_missing_rate = all_row_quality_summary[(all_row_quality_summary['Missing.Rate']>0.20)]
In [30]:
##################################
# Identifying the rows
# with Missing.Rate > 0.20
##################################
display(all_row_quality_summary[(all_row_quality_summary['Missing.Rate']>0.20)].sort_values(by=['Missing.Rate'], ascending=False))
Row.Name Column.Count Null.Count Missing.Rate
35 Guadeloupe 22 20 0.909091
39 Martinique 22 20 0.909091
56 French Guiana 22 20 0.909091
13 New Caledonia 22 11 0.500000
44 French Polynesia 22 11 0.500000
75 Guam 22 11 0.500000
53 Puerto Rico 22 9 0.409091
85 North Korea 22 6 0.272727
168 South Sudan 22 6 0.272727
132 Somalia 22 6 0.272727
117 Libya 22 5 0.227273
73 Venezuela 22 5 0.227273
161 Eritrea 22 5 0.227273
164 Yemen 22 5 0.227273
In [31]:
##################################
# Formulating the dataset
# with numeric columns only
##################################
cancer_rate_numeric = cancer_rate.select_dtypes(include='number')
In [32]:
##################################
# Gathering the variable names for each numeric column
##################################
numeric_variable_name_list = cancer_rate_numeric.columns
In [33]:
##################################
# Gathering the minimum value for each numeric column
##################################
numeric_minimum_list = cancer_rate_numeric.min()
In [34]:
##################################
# Gathering the mean value for each numeric column
##################################
numeric_mean_list = cancer_rate_numeric.mean()
In [35]:
##################################
# Gathering the median value for each numeric column
##################################
numeric_median_list = cancer_rate_numeric.median()
In [36]:
##################################
# Gathering the maximum value for each numeric column
##################################
numeric_maximum_list = cancer_rate_numeric.max()
In [37]:
##################################
# Gathering the first mode values for each numeric column
##################################
numeric_first_mode_list = [cancer_rate[x].value_counts(dropna=True).index.tolist()[0] for x in cancer_rate_numeric]
In [38]:
##################################
# Gathering the second mode values for each numeric column
##################################
numeric_second_mode_list = [cancer_rate[x].value_counts(dropna=True).index.tolist()[1] for x in cancer_rate_numeric]
In [39]:
##################################
# Gathering the count of first mode values for each numeric column
##################################
numeric_first_mode_count_list = [cancer_rate_numeric[x].isin([cancer_rate[x].value_counts(dropna=True).index.tolist()[0]]).sum() for x in cancer_rate_numeric]
In [40]:
##################################
# Gathering the count of second mode values for each numeric column
##################################
numeric_second_mode_count_list = [cancer_rate_numeric[x].isin([cancer_rate[x].value_counts(dropna=True).index.tolist()[1]]).sum() for x in cancer_rate_numeric]
In [41]:
##################################
# Gathering the first mode to second mode ratio for each numeric column
##################################
numeric_first_second_mode_ratio_list = map(truediv, numeric_first_mode_count_list, numeric_second_mode_count_list)
In [42]:
##################################
# Gathering the count of unique values for each numeric column
##################################
numeric_unique_count_list = cancer_rate_numeric.nunique(dropna=True)
In [43]:
##################################
# Gathering the number of observations for each numeric column
##################################
numeric_row_count_list = list([len(cancer_rate_numeric)] * len(cancer_rate_numeric.columns))
In [44]:
##################################
# Gathering the unique to count ratio for each numeric column
##################################
numeric_unique_count_ratio_list = map(truediv, numeric_unique_count_list, numeric_row_count_list)
In [45]:
##################################
# Gathering the skewness value for each numeric column
##################################
numeric_skewness_list = cancer_rate_numeric.skew()
In [46]:
##################################
# Gathering the kurtosis value for each numeric column
##################################
numeric_kurtosis_list = cancer_rate_numeric.kurtosis()
In [47]:
numeric_column_quality_summary = pd.DataFrame(zip(numeric_variable_name_list,
                                                numeric_minimum_list,
                                                numeric_mean_list,
                                                numeric_median_list,
                                                numeric_maximum_list,
                                                numeric_first_mode_list,
                                                numeric_second_mode_list,
                                                numeric_first_mode_count_list,
                                                numeric_second_mode_count_list,
                                                numeric_first_second_mode_ratio_list,
                                                numeric_unique_count_list,
                                                numeric_row_count_list,
                                                numeric_unique_count_ratio_list,
                                                numeric_skewness_list,
                                                numeric_kurtosis_list), 
                                        columns=['Numeric.Column.Name',
                                                 'Minimum',
                                                 'Mean',
                                                 'Median',
                                                 'Maximum',
                                                 'First.Mode',
                                                 'Second.Mode',
                                                 'First.Mode.Count',
                                                 'Second.Mode.Count',
                                                 'First.Second.Mode.Ratio',
                                                 'Unique.Count',
                                                 'Row.Count',
                                                 'Unique.Count.Ratio',
                                                 'Skewness',
                                                 'Kurtosis'])
display(numeric_column_quality_summary)
Numeric.Column.Name Minimum Mean Median Maximum First.Mode Second.Mode First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio Unique.Count Row.Count Unique.Count.Ratio Skewness Kurtosis
0 GDPPER 1718.804896 45284.424283 34024.900890 2.346469e+05 98380.636010 77541.764380 1 1 1.000000 165 177 0.932203 1.517574 3.471992
1 URBPOP 13.345000 59.788121 61.701500 1.000000e+02 100.000000 86.699000 2 1 2.000000 173 177 0.977401 -0.210702 -0.962847
2 PATRES 1.000000 20607.388889 244.500000 1.344817e+06 6.000000 2.000000 4 3 1.333333 97 177 0.548023 9.284436 91.187178
3 RNDGDP 0.039770 1.197474 0.873660 5.354510e+00 1.232440 3.422870 1 1 1.000000 74 177 0.418079 1.396742 1.695957
4 POPGRO -2.079337 1.127028 1.179959 3.727101e+00 1.235701 2.204789 1 1 1.000000 174 177 0.983051 -0.195161 -0.423580
5 LIFEXP 52.777000 71.746113 72.464610 8.456000e+01 83.200000 82.256098 1 1 1.000000 174 177 0.983051 -0.357965 -0.649601
6 TUBINC 0.770000 105.005862 44.500000 5.920000e+02 12.000000 4.100000 4 3 1.333333 131 177 0.740113 1.746333 2.429368
7 DTHCMD 1.283611 21.260521 12.456279 6.520789e+01 4.941054 4.354730 1 1 1.000000 170 177 0.960452 0.900509 -0.691541
8 AGRLND 0.512821 38.793456 40.386649 8.084112e+01 46.252480 38.562911 1 1 1.000000 174 177 0.983051 0.074000 -0.926249
9 GHGEMI 179.725150 259582.709895 41009.275980 1.294287e+07 571903.119900 80158.025830 1 1 1.000000 170 177 0.960452 9.496120 101.637308
10 RELOUT 0.000296 39.760036 32.381668 1.000000e+02 100.000000 80.081439 3 1 3.000000 151 177 0.853107 0.501088 -0.981774
11 METEMI 11.596147 47876.133575 11118.976025 1.186285e+06 131484.763200 32241.937000 1 1 1.000000 170 177 0.960452 5.801014 38.661386
12 FORARE 0.008078 32.218177 31.509048 9.741212e+01 17.421315 37.570126 1 1 1.000000 173 177 0.977401 0.519277 -0.322589
13 CO2EMI 0.032585 3.751097 2.298368 3.172684e+01 14.772658 6.160799 1 1 1.000000 170 177 0.960452 2.721552 10.311574
14 PM2EXP 0.274092 91.940595 100.000000 1.000000e+02 100.000000 100.000000 106 2 53.000000 61 177 0.344633 -3.141557 9.032386
15 POPDEN 2.115134 200.886765 77.983133 7.918951e+03 3.335312 19.331586 1 1 1.000000 174 177 0.983051 10.267750 119.995256
16 ENRTER 2.432581 49.994997 53.392460 1.433107e+02 110.139221 75.734833 1 1 1.000000 116 177 0.655367 0.275863 -0.392895
17 GDPCAP 216.827417 13992.095610 5348.192875 1.173705e+05 51722.069000 41760.594780 1 1 1.000000 170 177 0.960452 2.258568 5.938690
18 EPISCO 18.900000 42.946667 40.900000 7.790000e+01 29.600000 43.600000 3 3 1.000000 137 177 0.774011 0.641799 0.035208
In [48]:
##################################
# Counting the number of numeric columns
# with First.Second.Mode.Ratio > 5.00
##################################
len(numeric_column_quality_summary[(numeric_column_quality_summary['First.Second.Mode.Ratio']>5)])
Out[48]:
1
In [49]:
##################################
# Identifying the numeric columns
# with First.Second.Mode.Ratio > 5.00
##################################
display(numeric_column_quality_summary[(numeric_column_quality_summary['First.Second.Mode.Ratio']>5)].sort_values(by=['First.Second.Mode.Ratio'], ascending=False))
Numeric.Column.Name Minimum Mean Median Maximum First.Mode Second.Mode First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio Unique.Count Row.Count Unique.Count.Ratio Skewness Kurtosis
14 PM2EXP 0.274092 91.940595 100.0 100.0 100.0 100.0 106 2 53.0 61 177 0.344633 -3.141557 9.032386
In [50]:
##################################
# Counting the number of numeric columns
# with Unique.Count.Ratio > 10.00
##################################
len(numeric_column_quality_summary[(numeric_column_quality_summary['Unique.Count.Ratio']>10)])
Out[50]:
0
In [51]:
##################################
# Counting the number of numeric columns
# with Skewness > 3.00 or Skewness < -3.00
##################################
len(numeric_column_quality_summary[(numeric_column_quality_summary['Skewness']>3) | (numeric_column_quality_summary['Skewness']<(-3))])
Out[51]:
5
In [52]:
##################################
# Identifying the numeric columns
# with Skewness > 3.00 or Skewness < -3.00
##################################
display(numeric_column_quality_summary[(numeric_column_quality_summary['Skewness']>3) | (numeric_column_quality_summary['Skewness']<(-3))].sort_values(by=['Skewness'], ascending=False))
Numeric.Column.Name Minimum Mean Median Maximum First.Mode Second.Mode First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio Unique.Count Row.Count Unique.Count.Ratio Skewness Kurtosis
15 POPDEN 2.115134 200.886765 77.983133 7.918951e+03 3.335312 19.331586 1 1 1.000000 174 177 0.983051 10.267750 119.995256
9 GHGEMI 179.725150 259582.709895 41009.275980 1.294287e+07 571903.119900 80158.025830 1 1 1.000000 170 177 0.960452 9.496120 101.637308
2 PATRES 1.000000 20607.388889 244.500000 1.344817e+06 6.000000 2.000000 4 3 1.333333 97 177 0.548023 9.284436 91.187178
11 METEMI 11.596147 47876.133575 11118.976025 1.186285e+06 131484.763200 32241.937000 1 1 1.000000 170 177 0.960452 5.801014 38.661386
14 PM2EXP 0.274092 91.940595 100.000000 1.000000e+02 100.000000 100.000000 106 2 53.000000 61 177 0.344633 -3.141557 9.032386
In [53]:
##################################
# Formulating the dataset
# with object column only
##################################
cancer_rate_object = cancer_rate.select_dtypes(include='object')
In [54]:
##################################
# Gathering the variable names for the object column
##################################
object_variable_name_list = cancer_rate_object.columns
In [55]:
##################################
# Gathering the first mode values for the object column
##################################
object_first_mode_list = [cancer_rate[x].value_counts().index.tolist()[0] for x in cancer_rate_object]
In [56]:
##################################
# Gathering the second mode values for each object column
##################################
object_second_mode_list = [cancer_rate[x].value_counts().index.tolist()[1] for x in cancer_rate_object]
In [57]:
##################################
# Gathering the count of first mode values for each object column
##################################
object_first_mode_count_list = [cancer_rate_object[x].isin([cancer_rate[x].value_counts(dropna=True).index.tolist()[0]]).sum() for x in cancer_rate_object]
In [58]:
##################################
# Gathering the count of second mode values for each object column
##################################
object_second_mode_count_list = [cancer_rate_object[x].isin([cancer_rate[x].value_counts(dropna=True).index.tolist()[1]]).sum() for x in cancer_rate_object]
In [59]:
##################################
# Gathering the first mode to second mode ratio for each object column
##################################
object_first_second_mode_ratio_list = map(truediv, object_first_mode_count_list, object_second_mode_count_list)
In [60]:
##################################
# Gathering the count of unique values for each object column
##################################
object_unique_count_list = cancer_rate_object.nunique(dropna=True)
In [61]:
##################################
# Gathering the number of observations for each object column
##################################
object_row_count_list = list([len(cancer_rate_object)] * len(cancer_rate_object.columns))
In [62]:
##################################
# Gathering the unique to count ratio for each object column
##################################
object_unique_count_ratio_list = map(truediv, object_unique_count_list, object_row_count_list)
In [63]:
object_column_quality_summary = pd.DataFrame(zip(object_variable_name_list,
                                                 object_first_mode_list,
                                                 object_second_mode_list,
                                                 object_first_mode_count_list,
                                                 object_second_mode_count_list,
                                                 object_first_second_mode_ratio_list,
                                                 object_unique_count_list,
                                                 object_row_count_list,
                                                 object_unique_count_ratio_list), 
                                        columns=['Object.Column.Name',
                                                 'First.Mode',
                                                 'Second.Mode',
                                                 'First.Mode.Count',
                                                 'Second.Mode.Count',
                                                 'First.Second.Mode.Ratio',
                                                 'Unique.Count',
                                                 'Row.Count',
                                                 'Unique.Count.Ratio'])
display(object_column_quality_summary)
Object.Column.Name First.Mode Second.Mode First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio Unique.Count Row.Count Unique.Count.Ratio
0 COUNTRY Australia New Zealand 1 1 1.0 177 177 1.0
In [64]:
##################################
# Counting the number of object columns
# with First.Second.Mode.Ratio > 5.00
##################################
len(object_column_quality_summary[(object_column_quality_summary['First.Second.Mode.Ratio']>5)])
Out[64]:
0
In [65]:
##################################
# Counting the number of object columns
# with Unique.Count.Ratio > 10.00
##################################
len(object_column_quality_summary[(object_column_quality_summary['Unique.Count.Ratio']>10)])
Out[65]:
0
In [66]:
##################################
# Formulating the dataset
# with categorical columns only
##################################
cancer_rate_categorical = cancer_rate.select_dtypes(include='category')
In [67]:
##################################
# Gathering the variable names for the categorical column
##################################
categorical_variable_name_list = cancer_rate_categorical.columns
In [68]:
##################################
# Gathering the first mode values for each categorical column
##################################
categorical_first_mode_list = [cancer_rate[x].value_counts().index.tolist()[0] for x in cancer_rate_categorical]
In [69]:
##################################
# Gathering the second mode values for each categorical column
##################################
categorical_second_mode_list = [cancer_rate[x].value_counts().index.tolist()[1] for x in cancer_rate_categorical]
In [70]:
##################################
# Gathering the count of first mode values for each categorical column
##################################
categorical_first_mode_count_list = [cancer_rate_categorical[x].isin([cancer_rate[x].value_counts(dropna=True).index.tolist()[0]]).sum() for x in cancer_rate_categorical]
In [71]:
##################################
# Gathering the count of second mode values for each categorical column
##################################
categorical_second_mode_count_list = [cancer_rate_categorical[x].isin([cancer_rate[x].value_counts(dropna=True).index.tolist()[1]]).sum() for x in cancer_rate_categorical]
In [72]:
##################################
# Gathering the first mode to second mode ratio for each categorical column
##################################
categorical_first_second_mode_ratio_list = map(truediv, categorical_first_mode_count_list, categorical_second_mode_count_list)
In [73]:
##################################
# Gathering the count of unique values for each categorical column
##################################
categorical_unique_count_list = cancer_rate_categorical.nunique(dropna=True)
In [74]:
##################################
# Gathering the number of observations for each categorical column
##################################
categorical_row_count_list = list([len(cancer_rate_categorical)] * len(cancer_rate_categorical.columns))
In [75]:
##################################
# Gathering the unique to count ratio for each categorical column
##################################
categorical_unique_count_ratio_list = map(truediv, categorical_unique_count_list, categorical_row_count_list)
In [76]:
categorical_column_quality_summary = pd.DataFrame(zip(categorical_variable_name_list,
                                                    categorical_first_mode_list,
                                                    categorical_second_mode_list,
                                                    categorical_first_mode_count_list,
                                                    categorical_second_mode_count_list,
                                                    categorical_first_second_mode_ratio_list,
                                                    categorical_unique_count_list,
                                                    categorical_row_count_list,
                                                    categorical_unique_count_ratio_list), 
                                        columns=['Categorical.Column.Name',
                                                 'First.Mode',
                                                 'Second.Mode',
                                                 'First.Mode.Count',
                                                 'Second.Mode.Count',
                                                 'First.Second.Mode.Ratio',
                                                 'Unique.Count',
                                                 'Row.Count',
                                                 'Unique.Count.Ratio'])
display(categorical_column_quality_summary)
Categorical.Column.Name First.Mode Second.Mode First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio Unique.Count Row.Count Unique.Count.Ratio
0 CANRAT Low High 132 45 2.933333 2 177 0.011299
1 HDICAT VH H 59 39 1.512821 4 177 0.022599
In [77]:
##################################
# Counting the number of categorical columns
# with First.Second.Mode.Ratio > 5.00
##################################
len(categorical_column_quality_summary[(categorical_column_quality_summary['First.Second.Mode.Ratio']>5)])
Out[77]:
0
In [78]:
##################################
# Counting the number of categorical columns
# with Unique.Count.Ratio > 10.00
##################################
len(categorical_column_quality_summary[(categorical_column_quality_summary['Unique.Count.Ratio']>10)])
Out[78]:
0

1.4. Data Preprocessing ¶

1.4.1 Data Cleaning ¶

  1. Subsets of rows and columns with high rates of missing data were removed from the dataset:
    • 4 variables with Fill.Rate<0.9 were excluded from subsequent analysis.
      • RNDGDP: Null.Count = 103, Fill.Rate = 0.418
      • PATRES: Null.Count = 69, Fill.Rate = 0.610
      • ENRTER: Null.Count = 61, Fill.Rate = 0.655
      • RELOUT: Null.Count = 24, Fill.Rate = 0.864
    • 14 rows with Missing.Rate>0.2 were excluded from subsequent analysis.
      • COUNTRY=Guadeloupe: Missing.Rate= 0.909
      • COUNTRY=Martinique: Missing.Rate= 0.909
      • COUNTRY=French Guiana: Missing.Rate= 0.909
      • COUNTRY=New Caledonia: Missing.Rate= 0.500
      • COUNTRY=French Polynesia: Missing.Rate= 0.500
      • COUNTRY=Guam: Missing.Rate= 0.500
      • COUNTRY=Puerto Rico: Missing.Rate= 0.409
      • COUNTRY=North Korea: Missing.Rate= 0.273
      • COUNTRY=Somalia: Missing.Rate= 0.273
      • COUNTRY=South Sudan: Missing.Rate= 0.273
      • COUNTRY=Venezuela: Missing.Rate= 0.227
      • COUNTRY=Libya: Missing.Rate= 0.227
      • COUNTRY=Eritrea: Missing.Rate= 0.227
      • COUNTRY=Yemen: Missing.Rate= 0.227
  2. No variables were removed due to zero or near-zero variance.
  3. The cleaned dataset comprises:
    • 163 rows (observations)
    • 18 columns (variables)
      • 1/18 metadata (object)
        • COUNTRY
      • 1/18 target (categorical)
        • CANRAT
      • 15/18 predictor (numeric)
        • GDPPER
        • URBPOP
        • POPGRO
        • LIFEXP
        • TUBINC
        • DTHCMD
        • AGRLND
        • GHGEMI
        • METEMI
        • FORARE
        • CO2EMI
        • PM2EXP
        • POPDEN
        • GDPCAP
        • EPISCO
      • 1/18 predictor (categorical)
        • HDICAT
In [79]:
##################################
# Performing a general exploration of the original dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate.shape)
Dataset Dimensions: 
(177, 22)
In [80]:
##################################
# Filtering out the rows with
# with Missing.Rate > 0.20
##################################
cancer_rate_filtered_row = cancer_rate.drop(cancer_rate[cancer_rate.COUNTRY.isin(row_high_missing_rate['Row.Name'].values.tolist())].index)
In [81]:
##################################
# Performing a general exploration of the filtered dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate_filtered_row.shape)
Dataset Dimensions: 
(163, 22)
In [82]:
##################################
# Filtering out the columns with
# with Fill.Rate < 0.90
##################################
cancer_rate_filtered_row_column = cancer_rate_filtered_row.drop(column_low_fill_rate['Column.Name'].values.tolist(), axis=1)
In [83]:
##################################
# Formulating a new dataset object
# for the cleaned data
##################################
cancer_rate_cleaned = cancer_rate_filtered_row_column
In [84]:
##################################
# Performing a general exploration of the filtered dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate_cleaned.shape)
Dataset Dimensions: 
(163, 18)

1.4.2 Missing Data Imputation ¶

Iterative Imputer is based on the Multivariate Imputation by Chained Equations (MICE) algorithm - an imputation method based on fully conditional specification, where each incomplete variable is imputed by a separate model. As a sequential regression imputation technique, the algorithm imputes an incomplete column (target column) by generating plausible synthetic values given the other columns in the data. Each incomplete column must act as a target column and has its own specific set of predictors. For predictors that are incomplete themselves, the most recently generated imputations are used to complete the predictors prior to imputation of the target column.
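
As a rough conceptual sketch of the chained-equations idea (not the exact scikit-learn internals; the function name and sweep count are illustrative assumptions), each incomplete column is regressed on the remaining columns and re-imputed over several round-robin sweeps:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def mice_sketch(df, sweeps=5):
    # Illustrative chained-equation imputation with one linear regression model per incomplete column
    data = df.copy()
    missing_mask = data.isna()
    data = data.fillna(data.mean())                          # simple initial fill to start the chain
    incomplete_columns = data.columns[missing_mask.any()]
    for _ in range(sweeps):                                  # round-robin sweeps over the incomplete columns
        for column in incomplete_columns:
            observed = ~missing_mask[column]
            predictors = data.columns != column
            model = LinearRegression().fit(data.loc[observed, predictors], data.loc[observed, column])
            data.loc[missing_mask[column], column] = model.predict(data.loc[missing_mask[column], predictors])
    return data

Apart from convergence checks and imputation ordering, this mirrors what the IterativeImputer with a LinearRegression estimator does in the cells that follow.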

Linear Regression explores the linear relationship between a scalar response and one or more covariates by having the conditional mean of the dependent variable be an affine function of the independent variables. The relationship is modeled through a disturbance term, which represents an unobserved random variable that adds noise. The model is typically fitted to the data using the least squares method, which estimates the coefficients by minimizing the sum of squared residuals. The linear equation assigns one scale factor, represented by a coefficient, to each covariate, plus an additional coefficient called the intercept or bias coefficient, which gives the line an extra degree of freedom allowing it to move up and down on a two-dimensional plot.
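
As a minimal sketch of the least squares estimation described above (synthetic data with assumed coefficients, for illustration only):

import numpy as np

# Illustrative ordinary least squares fit on synthetic data: y = 3.0 + 1.5*x1 - 2.0*x2 + noise
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=50)

X_design = np.column_stack([np.ones(len(X)), X])              # column of ones carries the intercept (bias) coefficient
coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # minimizes the sum of squared residuals
print(coefficients)                                           # approximately [3.0, 1.5, -2.0]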

  1. Missing data for numeric variables were imputed using the iterative imputer algorithm with a linear regression estimator.
    • GDPPER: Null.Count = 1
    • FORARE: Null.Count = 1
    • PM2EXP: Null.Count = 5
  2. Missing data for categorical variables were imputed using the most frequent value.
    • HDICAT: Null.Count = 1
In [85]:
##################################
# Formulating the summary
# for all cleaned columns
##################################
cleaned_column_quality_summary = pd.DataFrame(zip(list(cancer_rate_cleaned.columns),
                                                  list(cancer_rate_cleaned.dtypes),
                                                  list([len(cancer_rate_cleaned)] * len(cancer_rate_cleaned.columns)),
                                                  list(cancer_rate_cleaned.count()),
                                                  list(cancer_rate_cleaned.isna().sum(axis=0))), 
                                        columns=['Column.Name',
                                                 'Column.Type',
                                                 'Row.Count',
                                                 'Non.Null.Count',
                                                 'Null.Count'])
display(cleaned_column_quality_summary)
Column.Name Column.Type Row.Count Non.Null.Count Null.Count
0 COUNTRY object 163 163 0
1 CANRAT category 163 163 0
2 GDPPER float64 163 162 1
3 URBPOP float64 163 163 0
4 POPGRO float64 163 163 0
5 LIFEXP float64 163 163 0
6 TUBINC float64 163 163 0
7 DTHCMD float64 163 163 0
8 AGRLND float64 163 163 0
9 GHGEMI float64 163 163 0
10 METEMI float64 163 163 0
11 FORARE float64 163 162 1
12 CO2EMI float64 163 163 0
13 PM2EXP float64 163 158 5
14 POPDEN float64 163 163 0
15 GDPCAP float64 163 163 0
16 HDICAT category 163 162 1
17 EPISCO float64 163 163 0
In [86]:
##################################
# Formulating the cleaned dataset
# with categorical columns only
##################################
cancer_rate_cleaned_categorical = cancer_rate_cleaned.select_dtypes(include='object')
In [87]:
##################################
# Formulating the cleaned dataset
# with numeric columns only
##################################
cancer_rate_cleaned_numeric = cancer_rate_cleaned.select_dtypes(include='number')
In [88]:
##################################
# Taking a snapshot of the cleaned dataset
##################################
cancer_rate_cleaned_numeric.head()
Out[88]:
GDPPER URBPOP POPGRO LIFEXP TUBINC DTHCMD AGRLND GHGEMI METEMI FORARE CO2EMI PM2EXP POPDEN GDPCAP EPISCO
0 98380.63601 86.241 1.235701 83.200000 7.2 4.941054 46.252480 5.719031e+05 131484.763200 17.421315 14.772658 24.893584 3.335312 51722.06900 60.1
1 77541.76438 86.699 2.204789 82.256098 7.2 4.354730 38.562911 8.015803e+04 32241.937000 37.570126 6.160799 NaN 19.331586 41760.59478 56.7
2 198405.87500 63.653 1.029111 82.556098 5.3 5.684596 65.495718 5.949773e+04 15252.824630 11.351720 6.768228 0.274092 72.367281 85420.19086 57.4
3 130941.63690 82.664 0.964348 76.980488 2.3 5.302060 44.363367 5.505181e+06 748241.402900 33.866926 13.032828 3.343170 36.240985 63528.63430 51.1
4 113300.60110 88.116 0.291641 81.602439 4.1 6.826140 65.499675 4.113555e+04 7778.773921 15.711000 4.691237 56.914456 145.785100 60915.42440 77.9
In [89]:
##################################
# Defining the estimator to be used
# at each step of the round-robin imputation
##################################
lr = LinearRegression()
In [90]:
##################################
# Defining the parameter of the
# iterative imputer which will estimate 
# the columns with missing values
# as a function of the other columns
# in a round-robin fashion
##################################
iterative_imputer = IterativeImputer(
    estimator = lr,
    max_iter = 10,
    tol = 1e-10,
    imputation_order = 'ascending',
    random_state=88888888
)
In [91]:
##################################
# Implementing the iterative imputer 
##################################
cancer_rate_imputed_numeric_array = iterative_imputer.fit_transform(cancer_rate_cleaned_numeric)
In [92]:
##################################
# Transforming the imputed data
# from an array to a dataframe
##################################
cancer_rate_imputed_numeric = pd.DataFrame(cancer_rate_imputed_numeric_array, 
                                           columns = cancer_rate_cleaned_numeric.columns)
In [93]:
##################################
# Taking a snapshot of the imputed dataset
##################################
cancer_rate_imputed_numeric.head()
Out[93]:
GDPPER URBPOP POPGRO LIFEXP TUBINC DTHCMD AGRLND GHGEMI METEMI FORARE CO2EMI PM2EXP POPDEN GDPCAP EPISCO
0 98380.63601 86.241 1.235701 83.200000 7.2 4.941054 46.252480 5.719031e+05 131484.763200 17.421315 14.772658 24.893584 3.335312 51722.06900 60.1
1 77541.76438 86.699 2.204789 82.256098 7.2 4.354730 38.562911 8.015803e+04 32241.937000 37.570126 6.160799 65.867296 19.331586 41760.59478 56.7
2 198405.87500 63.653 1.029111 82.556098 5.3 5.684596 65.495718 5.949773e+04 15252.824630 11.351720 6.768228 0.274092 72.367281 85420.19086 57.4
3 130941.63690 82.664 0.964348 76.980488 2.3 5.302060 44.363367 5.505181e+06 748241.402900 33.866926 13.032828 3.343170 36.240985 63528.63430 51.1
4 113300.60110 88.116 0.291641 81.602439 4.1 6.826140 65.499675 4.113555e+04 7778.773921 15.711000 4.691237 56.914456 145.785100 60915.42440 77.9
In [94]:
##################################
# Formulating the cleaned dataset
# with categorical columns only
##################################
cancer_rate_cleaned_categorical = cancer_rate_cleaned.select_dtypes(include='category')
In [95]:
##################################
# Imputing the missing data
# for categorical columns with
# the most frequent category
##################################
cancer_rate_cleaned_categorical['HDICAT'] = cancer_rate_cleaned_categorical['HDICAT'].fillna(cancer_rate_cleaned_categorical['HDICAT'].mode()[0])
cancer_rate_imputed_categorical = cancer_rate_cleaned_categorical.reset_index(drop=True)
In [96]:
##################################
# Formulating the imputed dataset
##################################
cancer_rate_imputed = pd.concat([cancer_rate_imputed_numeric,cancer_rate_imputed_categorical], axis=1, join='inner')  
In [97]:
##################################
# Gathering the data types for each column
##################################
data_type_list = list(cancer_rate_imputed.dtypes)
In [98]:
##################################
# Gathering the variable names for each column
##################################
variable_name_list = list(cancer_rate_imputed.columns)
In [99]:
##################################
# Gathering the number of observations for each column
##################################
row_count_list = list([len(cancer_rate_imputed)] * len(cancer_rate_imputed.columns))
In [100]:
##################################
# Gathering the number of missing data for each column
##################################
null_count_list = list(cancer_rate_imputed.isna().sum(axis=0))
In [101]:
##################################
# Gathering the number of non-missing data for each column
##################################
non_null_count_list = list(cancer_rate_imputed.count())
In [102]:
##################################
# Gathering the fill rate (non-missing data proportion) for each column
##################################
fill_rate_list = map(truediv, non_null_count_list, row_count_list)
In [103]:
##################################
# Formulating the summary
# for all imputed columns
##################################
imputed_column_quality_summary = pd.DataFrame(zip(variable_name_list,
                                                  data_type_list,
                                                  row_count_list,
                                                  non_null_count_list,
                                                  null_count_list,
                                                  fill_rate_list), 
                                        columns=['Column.Name',
                                                 'Column.Type',
                                                 'Row.Count',
                                                 'Non.Null.Count',
                                                 'Null.Count',                                                 
                                                 'Fill.Rate'])
display(imputed_column_quality_summary)
Column.Name Column.Type Row.Count Non.Null.Count Null.Count Fill.Rate
0 GDPPER float64 163 163 0 1.0
1 URBPOP float64 163 163 0 1.0
2 POPGRO float64 163 163 0 1.0
3 LIFEXP float64 163 163 0 1.0
4 TUBINC float64 163 163 0 1.0
5 DTHCMD float64 163 163 0 1.0
6 AGRLND float64 163 163 0 1.0
7 GHGEMI float64 163 163 0 1.0
8 METEMI float64 163 163 0 1.0
9 FORARE float64 163 163 0 1.0
10 CO2EMI float64 163 163 0 1.0
11 PM2EXP float64 163 163 0 1.0
12 POPDEN float64 163 163 0 1.0
13 GDPCAP float64 163 163 0 1.0
14 EPISCO float64 163 163 0 1.0
15 CANRAT category 163 163 0 1.0
16 HDICAT category 163 163 0 1.0

1.4.3 Outlier Treatment ¶

  1. High number of outliers observed for 5 numeric variables with Outlier.Ratio>0.10 and marginal to high Skewness.
    • PM2EXP: Outlier.Count = 37, Outlier.Ratio = 0.226, Skewness=-3.088
    • GHGEMI: Outlier.Count = 27, Outlier.Ratio = 0.165, Skewness=+9.299
    • GDPCAP: Outlier.Count = 22, Outlier.Ratio = 0.134, Skewness=+2.311
    • POPDEN: Outlier.Count = 20, Outlier.Ratio = 0.122, Skewness=+9.972
    • METEMI: Outlier.Count = 20, Outlier.Ratio = 0.122, Skewness=+5.688
  2. Minimal number of outliers observed for 5 numeric variables with Outlier.Ratio<0.10 and comparatively lower Skewness.
    • TUBINC: Outlier.Count = 12, Outlier.Ratio = 0.073, Skewness=+1.747
    • CO2EMI: Outlier.Count = 11, Outlier.Ratio = 0.067, Skewness=+2.693
    • GDPPER: Outlier.Count = 3, Outlier.Ratio = 0.018, Skewness=+1.554
    • EPISCO: Outlier.Count = 3, Outlier.Ratio = 0.018, Skewness=+0.635
    • CANRAT: Outlier.Count = 2, Outlier.Ratio = 0.012, Skewness=+0.910
In [104]:
##################################
# Formulating the imputed dataset
# with numeric columns only
##################################
cancer_rate_imputed_numeric = cancer_rate_imputed.select_dtypes(include='number')
In [105]:
##################################
# Gathering the variable names for each numeric column
##################################
numeric_variable_name_list = list(cancer_rate_imputed_numeric.columns)
In [106]:
##################################
# Gathering the skewness value for each numeric column
##################################
numeric_skewness_list = cancer_rate_imputed_numeric.skew()
In [107]:
##################################
# Computing the interquartile range
# for all columns
##################################
cancer_rate_imputed_numeric_q1 = cancer_rate_imputed_numeric.quantile(0.25)
cancer_rate_imputed_numeric_q3 = cancer_rate_imputed_numeric.quantile(0.75)
cancer_rate_imputed_numeric_iqr = cancer_rate_imputed_numeric_q3 - cancer_rate_imputed_numeric_q1
In [108]:
##################################
# Gathering the outlier count for each numeric column
# based on the interquartile range criterion
##################################
numeric_outlier_count_list = ((cancer_rate_imputed_numeric < (cancer_rate_imputed_numeric_q1 - 1.5 * cancer_rate_imputed_numeric_iqr)) | (cancer_rate_imputed_numeric > (cancer_rate_imputed_numeric_q3 + 1.5 * cancer_rate_imputed_numeric_iqr))).sum()
In [109]:
##################################
# Gathering the number of observations for each column
##################################
numeric_row_count_list = list([len(cancer_rate_imputed_numeric)] * len(cancer_rate_imputed_numeric.columns))
In [110]:
##################################
# Gathering the outlier ratio for each numeric column
##################################
numeric_outlier_ratio_list = map(truediv, numeric_outlier_count_list, numeric_row_count_list)
In [111]:
##################################
# Formulating the outlier summary
# for all numeric columns
##################################
numeric_column_outlier_summary = pd.DataFrame(zip(numeric_variable_name_list,
                                                  numeric_skewness_list,
                                                  numeric_outlier_count_list,
                                                  numeric_row_count_list,
                                                  numeric_outlier_ratio_list), 
                                        columns=['Numeric.Column.Name',
                                                 'Skewness',
                                                 'Outlier.Count',
                                                 'Row.Count',
                                                 'Outlier.Ratio'])
display(numeric_column_outlier_summary)
Numeric.Column.Name Skewness Outlier.Count Row.Count Outlier.Ratio
0 GDPPER 1.554457 3 163 0.018405
1 URBPOP -0.212327 0 163 0.000000
2 POPGRO -0.181666 0 163 0.000000
3 LIFEXP -0.329704 0 163 0.000000
4 TUBINC 1.747962 12 163 0.073620
5 DTHCMD 0.930709 0 163 0.000000
6 AGRLND 0.035315 0 163 0.000000
7 GHGEMI 9.299960 27 163 0.165644
8 METEMI 5.688689 20 163 0.122699
9 FORARE 0.563015 0 163 0.000000
10 CO2EMI 2.693585 11 163 0.067485
11 PM2EXP -3.088403 37 163 0.226994
12 POPDEN 9.972806 20 163 0.122699
13 GDPCAP 2.311079 22 163 0.134969
14 EPISCO 0.635994 3 163 0.018405
In [112]:
##################################
# Formulating the individual boxplots
# for all numeric columns
##################################
for column in cancer_rate_imputed_numeric:
        plt.figure(figsize=(17,1))
        sns.boxplot(data=cancer_rate_imputed_numeric, x=column)
[Figure output: individual boxplots for each of the 15 imputed numeric columns]

1.4.4 Collinearity ¶

Pearson’s Correlation Coefficient is a parametric measure of the linear correlation between a pair of features, calculated as the ratio between their covariance and the product of their standard deviations. High absolute correlation values indicate a strong univariate association between pairs of numeric predictors, or between a numeric predictor and a numeric response.
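To make the computation concrete, the following is a minimal sketch (using hypothetical values, not the actual dataset) showing that the coefficient reported by scipy.stats.pearsonr, which is used in the cells below, can be reproduced directly from its definition:

import numpy as np
from scipy import stats

##################################
# Hypothetical paired measurements
# for two numeric predictors
##################################
x = np.array([1.2, 2.4, 3.1, 4.8, 5.0, 6.3])
y = np.array([0.9, 2.1, 2.8, 5.2, 4.9, 6.1])

##################################
# Pearson's r from its definition:
# covariance divided by the product
# of the standard deviations
##################################
r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

##################################
# Same coefficient (plus p-value)
# as computed by scipy.stats.pearsonr
##################################
r_scipy, p_value = stats.pearsonr(x, y)
print(f'Manual r: {r_manual:.6f}, scipy r: {r_scipy:.6f}, p-value: {p_value:.6f}')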

  1. The majority of the numeric variable pairs reported moderate to high correlations which were statistically significant.
  2. Among pairwise combinations of numeric variables, high Pearson.Correlation.Coefficient values were noted for:
    • GDPPER and GDPCAP: Pearson.Correlation.Coefficient = +0.921
    • GHGEMI and METEMI: Pearson.Correlation.Coefficient = +0.905
  3. Among the highly correlated pairs, the variable with the lower correlation against the target variable was removed from each pair.
    • GDPPER: Pearson.Correlation.Coefficient = +0.690
    • METEMI: Pearson.Correlation.Coefficient = +0.062
  4. The cleaned dataset is comprised of:
    • 163 rows (observations)
    • 16 columns (variables)
      • 1/16 metadata (object)
        • COUNTRY
      • 1/16 target (categorical)
        • CANRAT
      • 13/16 predictor (numeric)
        • URBPOP
        • POPGRO
        • LIFEXP
        • TUBINC
        • DTHCMD
        • AGRLND
        • GHGEMI
        • FORARE
        • CO2EMI
        • PM2EXP
        • POPDEN
        • GDPCAP
        • EPISCO
      • 1/16 predictor (categorical)
        • HDICAT
In [113]:
##################################
# Formulating a function 
# to plot the correlation matrix
# for all pairwise combinations
# of numeric columns
##################################
def plot_correlation_matrix(corr, mask=None):
    f, ax = plt.subplots(figsize=(11, 9))
    sns.heatmap(corr, 
                ax=ax,
                mask=mask,
                annot=True, 
                vmin=-1, 
                vmax=1, 
                center=0,
                cmap='coolwarm', 
                linewidths=1, 
                linecolor='gray', 
                cbar_kws={'orientation': 'horizontal'})  
In [114]:
##################################
# Computing the correlation coefficients
# and correlation p-values
# among pairs of numeric columns
##################################
cancer_rate_imputed_numeric_correlation_pairs = {}
cancer_rate_imputed_numeric_columns = cancer_rate_imputed_numeric.columns.tolist()
for numeric_column_a, numeric_column_b in itertools.combinations(cancer_rate_imputed_numeric_columns, 2):
    cancer_rate_imputed_numeric_correlation_pairs[numeric_column_a + '_' + numeric_column_b] = stats.pearsonr(
        cancer_rate_imputed_numeric.loc[:, numeric_column_a], 
        cancer_rate_imputed_numeric.loc[:, numeric_column_b])
In [115]:
##################################
# Formulating the pairwise correlation summary
# for all numeric columns
##################################
cancer_rate_imputed_numeric_summary = pd.DataFrame.from_dict(cancer_rate_imputed_numeric_correlation_pairs, orient='index')
cancer_rate_imputed_numeric_summary.columns = ['Pearson.Correlation.Coefficient', 'Correlation.PValue']
display(cancer_rate_imputed_numeric_summary.sort_values(by=['Pearson.Correlation.Coefficient'], ascending=False).head(20))
Pearson.Correlation.Coefficient Correlation.PValue
GDPPER_GDPCAP 0.921010 8.158179e-68
GHGEMI_METEMI 0.905121 1.087643e-61
POPGRO_DTHCMD 0.759470 7.124695e-32
GDPPER_LIFEXP 0.755787 2.055178e-31
GDPCAP_EPISCO 0.696707 5.312642e-25
LIFEXP_GDPCAP 0.683834 8.321371e-24
GDPPER_EPISCO 0.680812 1.555304e-23
GDPPER_URBPOP 0.666394 2.781623e-22
GDPPER_CO2EMI 0.654958 2.450029e-21
TUBINC_DTHCMD 0.643615 1.936081e-20
URBPOP_LIFEXP 0.623997 5.669778e-19
LIFEXP_EPISCO 0.620271 1.048393e-18
URBPOP_GDPCAP 0.559181 8.624533e-15
CO2EMI_GDPCAP 0.550221 2.782997e-14
URBPOP_CO2EMI 0.550046 2.846393e-14
LIFEXP_CO2EMI 0.531305 2.951829e-13
URBPOP_EPISCO 0.510131 3.507463e-12
POPGRO_TUBINC 0.442339 3.384403e-09
DTHCMD_PM2EXP 0.283199 2.491837e-04
CO2EMI_EPISCO 0.282734 2.553620e-04
In [116]:
##################################
# Plotting the correlation matrix
# for all pairwise combinations
# of numeric columns
##################################
cancer_rate_imputed_numeric_correlation = cancer_rate_imputed_numeric.corr()
mask = np.triu(cancer_rate_imputed_numeric_correlation)
plot_correlation_matrix(cancer_rate_imputed_numeric_correlation,mask)
plt.show()
[Figure output: correlation matrix heatmap for all pairwise combinations of numeric columns]
In [117]:
##################################
# Formulating a function 
# to compute the correlation p-values
# for all pairwise combinations
# of numeric columns
##################################
def correlation_significance(df=None):
    p_matrix = np.zeros(shape=(df.shape[1],df.shape[1]))
    for col in df.columns:
        for col2 in df.drop(col,axis=1).columns:
            _ , p = stats.pearsonr(df[col],df[col2])
            p_matrix[df.columns.to_list().index(col),df.columns.to_list().index(col2)] = p
    return p_matrix
In [118]:
##################################
# Plotting the correlation matrix
# for all pairwise combinations
# of numeric columns
# with significant p-values only
##################################
cancer_rate_imputed_numeric_correlation_p_values = correlation_significance(cancer_rate_imputed_numeric)                     
mask = np.invert(np.tril(cancer_rate_imputed_numeric_correlation_p_values<0.05)) 
plot_correlation_matrix(cancer_rate_imputed_numeric_correlation,mask)  
[Figure output: correlation matrix heatmap masked to pairs with p-values below 0.05]
In [119]:
##################################
# Filtering out one among the 
# highly correlated variable pairs with
# lesser Pearson.Correlation.Coefficient
# when compared to the target variable
##################################
cancer_rate_imputed_numeric.drop(['GDPPER','METEMI'], inplace=True, axis=1)
In [120]:
##################################
# Performing a general exploration of the filtered dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate_imputed_numeric.shape)
Dataset Dimensions: 
(163, 13)

1.4.5 Shape Transformation ¶

Yeo-Johnson Transformation applies a family of power transformations that can be used without restrictions on the sign of the values, extending many of the good properties of the Box-Cox power family. Similar to the Box-Cox transformation, the method estimates the optimal value of lambda, but it is able to transform both positive and negative values by inflating low-variance data and deflating high-variance data to create a more uniform data set. While there are no restrictions in terms of the applicable values, the interpretability of the transformed values is diminished compared to the other methods.
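As a minimal sketch of the piecewise definition (with an illustrative, not estimated, lambda value), the transformation can be written out directly; in the cells below the optimal lambda is instead estimated by PowerTransformer via maximum likelihood:

import numpy as np
from scipy import stats

##################################
# Piecewise Yeo-Johnson transformation
# for a fixed lambda value
##################################
def yeo_johnson(x, lmbda):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if lmbda != 0:
        out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    if lmbda != 2:
        out[~pos] = -(((-x[~pos] + 1) ** (2 - lmbda) - 1) / (2 - lmbda))
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

##################################
# Illustrative lambda of 0.5;
# scipy.stats.yeojohnson applies the
# same transformation for comparison
##################################
values = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(yeo_johnson(values, lmbda=0.5))
print(stats.yeojohnson(values, lmbda=0.5))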

  1. A Yeo-Johnson transformation was applied to all numeric variables to improve distributional shape.
  2. Most variables achieved symmetrical distributions with minimal outliers after transformation.
  3. One variable which remained skewed even after applying shape transformation was removed.
    • PM2EXP
  4. The transformed dataset is comprised of:
    • 163 rows (observations)
    • 15 columns (variables)
      • 1/15 metadata (object)
        • COUNTRY
      • 1/15 target (categorical)
        • CANRAT
      • 12/15 predictor (numeric)
        • URBPOP
        • POPGRO
        • LIFEXP
        • TUBINC
        • DTHCMD
        • AGRLND
        • GHGEMI
        • FORARE
        • CO2EMI
        • POPDEN
        • GDPCAP
        • EPISCO
      • 1/15 predictor (categorical)
        • HDICAT
In [121]:
##################################
# Conducting a Yeo-Johnson Transformation
# to address the distributional
# shape of the variables
##################################
yeo_johnson_transformer = PowerTransformer(method='yeo-johnson',
                                          standardize=False)
cancer_rate_imputed_numeric_array = yeo_johnson_transformer.fit_transform(cancer_rate_imputed_numeric)
In [122]:
##################################
# Formulating a new dataset object
# for the transformed data
##################################
cancer_rate_transformed_numeric = pd.DataFrame(cancer_rate_imputed_numeric_array,
                                               columns=cancer_rate_imputed_numeric.columns)
In [123]:
##################################
# Formulating the individual boxplots
# for all transformed numeric columns
##################################
for column in cancer_rate_transformed_numeric:
        plt.figure(figsize=(17,1))
        sns.boxplot(data=cancer_rate_transformed_numeric, x=column)
[Figure output: individual boxplots for each of the 13 transformed numeric columns]
In [124]:
##################################
# Filtering out the column
# which remained skewed even
# after applying shape transformation
##################################
cancer_rate_transformed_numeric.drop(['PM2EXP'], inplace=True, axis=1)
In [125]:
##################################
# Performing a general exploration of the filtered dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate_transformed_numeric.shape)
Dataset Dimensions: 
(163, 12)

1.4.6 Centering and Scaling ¶

  1. All numeric variables were transformed using the standardization method to achieve a comparable scale between values.
  2. The scaled dataset is comprised of:
    • 163 rows (observations)
    • 15 columns (variables)
      • 1/15 metadata (object)
        • COUNTRY
      • 1/15 target (categorical)
        • CANRAT
      • 12/15 predictor (numeric)
        • URBPOP
        • POPGRO
        • LIFEXP
        • TUBINC
        • DTHCMD
        • AGRLND
        • GHGEMI
        • FORARE
        • CO2EMI
        • POPDEN
        • GDPCAP
        • EPISCO
      • 1/15 predictor (categorical)
        • HDICAT
In [126]:
##################################
# Conducting standardization
# to transform the values of the 
# variables into comparable scale
##################################
standardization_scaler = StandardScaler()
cancer_rate_transformed_numeric_array = standardization_scaler.fit_transform(cancer_rate_transformed_numeric)
In [127]:
##################################
# Formulating a new dataset object
# for the scaled data
##################################
cancer_rate_scaled_numeric = pd.DataFrame(cancer_rate_transformed_numeric_array,
                                          columns=cancer_rate_transformed_numeric.columns)
In [128]:
##################################
# Formulating the individual boxplots
# for all transformed numeric columns
##################################
for column in cancer_rate_scaled_numeric:
        plt.figure(figsize=(17,1))
        sns.boxplot(data=cancer_rate_scaled_numeric, x=column)
[Figure output: individual boxplots for each of the 12 scaled numeric columns]

1.4.7 Data Encoding ¶

  1. One-hot encoding was applied to the HDICAT variable, resulting in 4 additional columns in the dataset:
    • HDICAT_L
    • HDICAT_M
    • HDICAT_H
    • HDICAT_VH
In [129]:
##################################
# Formulating the categorical column
# for encoding transformation
##################################
cancer_rate_categorical_encoded = pd.DataFrame(cancer_rate_cleaned_categorical.loc[:, 'HDICAT'].to_list(),
                                               columns=['HDICAT'])
In [130]:
##################################
# Applying a one-hot encoding transformation
# for the categorical column
##################################
cancer_rate_categorical_encoded = pd.get_dummies(cancer_rate_categorical_encoded, columns=['HDICAT'])

1.4.8 Preprocessed Data Description ¶

  1. The preprocessed dataset is comprised of:
    • 163 rows (observations)
    • 18 columns (variables)
      • 1/18 metadata (object)
        • COUNTRY
      • 1/18 target (categorical)
        • CANRAT
      • 12/18 predictor (numeric)
        • URBPOP
        • POPGRO
        • LIFEXP
        • TUBINC
        • DTHCMD
        • AGRLND
        • GHGEMI
        • FORARE
        • CO2EMI
        • POPDEN
        • GDPCAP
        • EPISCO
      • 4/18 predictor (categorical)
        • HDICAT_L
        • HDICAT_M
        • HDICAT_H
        • HDICAT_VH
In [131]:
##################################
# Consolidating both numeric columns
# and encoded categorical columns
##################################
cancer_rate_preprocessed = pd.concat([cancer_rate_scaled_numeric,cancer_rate_categorical_encoded], axis=1, join='inner')  
In [132]:
##################################
# Performing a general exploration of the consolidated dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate_preprocessed.shape)
Dataset Dimensions: 
(163, 16)

1.5. Data Exploration ¶

1.5.1 Exploratory Data Analysis ¶

  1. Bivariate analysis identified individual predictors with generally discernible associations with the target variable based on visual inspection.
  2. Higher values or higher proportions for the following predictors are associated with the CANRAT HIGH category:
    • URBPOP
    • LIFEXP
    • CO2EMI
    • GDPCAP
    • EPISCO
    • HDICAT_VH=1
  3. Lower values or smaller proportions for the following predictors are associated with the CANRAT HIGH category:
    • POPGRO
    • TUBINC
    • DTHCMD
    • HDICAT_L=0
    • HDICAT_M=0
    • HDICAT_H=0
  4. Values for the following predictors are not associated with the CANRAT HIGH or LOW categories:
    • AGRLND
    • GHGEMI
    • FORARE
    • POPDEN
In [133]:
##################################
# Segregating the target
# and predictor variable lists
##################################
cancer_rate_preprocessed_target = cancer_rate_filtered_row['CANRAT'].to_frame()
cancer_rate_preprocessed_target.reset_index(inplace=True, drop=True)
cancer_rate_preprocessed_categorical = cancer_rate_preprocessed[cancer_rate_categorical_encoded.columns]
cancer_rate_preprocessed_categorical_combined = cancer_rate_preprocessed_categorical.join(cancer_rate_preprocessed_target)
cancer_rate_preprocessed = cancer_rate_preprocessed.drop(cancer_rate_categorical_encoded.columns, axis=1) 
cancer_rate_preprocessed_predictors = cancer_rate_preprocessed.columns
cancer_rate_preprocessed_combined = cancer_rate_preprocessed.join(cancer_rate_preprocessed_target)
In [134]:
##################################
# Segregating the target
# and predictor variable names
##################################
y_variable = 'CANRAT'
x_variables = cancer_rate_preprocessed_predictors
In [135]:
##################################
# Defining the number of 
# rows and columns for the subplots
##################################
num_rows = 6
num_cols = 2
In [136]:
##################################
# Formulating the subplot structure
##################################
fig, axes = plt.subplots(num_rows, num_cols, figsize=(15, 30))

##################################
# Flattening the multi-row and
# multi-column axes
##################################
axes = axes.ravel()

##################################
# Formulating the individual boxplots
# for all scaled numeric columns
##################################
for i, x_variable in enumerate(x_variables):
    ax = axes[i]
    ax.boxplot([group[x_variable] for name, group in cancer_rate_preprocessed_combined.groupby(y_variable, observed=True)])
    ax.set_title(f'{y_variable} Versus {x_variable}')
    ax.set_xlabel(y_variable)
    ax.set_ylabel(x_variable)
    ax.set_xticks(range(1, len(cancer_rate_preprocessed_combined[y_variable].unique()) + 1), ['Low', 'High'])

##################################
# Adjusting the subplot layout
##################################
plt.tight_layout()

##################################
# Presenting the subplots
##################################
plt.show()
[Figure output: boxplots of each scaled numeric predictor grouped by CANRAT class]
In [137]:
##################################
# Segregating the target
# and predictor variable names
##################################
y_variables = cancer_rate_preprocessed_categorical.columns
x_variable = 'CANRAT'

##################################
# Defining the number of 
# rows and columns for the subplots
##################################
num_rows = 2
num_cols = 2

##################################
# Formulating the subplot structure
##################################
fig, axes = plt.subplots(num_rows, num_cols, figsize=(15, 10))

##################################
# Flattening the multi-row and
# multi-column axes
##################################
axes = axes.ravel()

##################################
# Formulating the individual stacked column plots
# for all categorical columns
##################################
for i, y_variable in enumerate(y_variables):
    ax = axes[i]
    category_counts = cancer_rate_preprocessed_categorical_combined.groupby([x_variable, y_variable], observed=True).size().unstack(fill_value=0)
    category_proportions = category_counts.div(category_counts.sum(axis=1), axis=0)
    category_proportions.plot(kind='bar', stacked=True, ax=ax)
    ax.set_title(f'{x_variable} Versus {y_variable}')
    ax.set_xlabel(x_variable)
    ax.set_ylabel('Proportions')

##################################
# Adjusting the subplot layout
##################################
plt.tight_layout()

##################################
# Presenting the subplots
##################################
plt.show()
[Figure output: stacked column plots of categorical predictor proportions by CANRAT class]

1.5.2 Hypothesis Testing ¶

  1. The relationship between the numeric predictors and the CANRAT target variable was statistically evaluated using the following hypotheses:
    • Null: Difference in the means between groups LOW and HIGH is equal to zero
    • Alternative: Difference in the means between groups LOW and HIGH is not equal to zero
  2. There is sufficient evidence to conclude that a statistically significant difference exists between the means of the numeric measurements obtained from the LOW and HIGH groups of the CANRAT target variable for 9 of the 12 numeric predictors, given their high t-test statistic values with reported low p-values less than the significance level of 0.05 (a manual sketch of the pooled two-sample t-test computation is shown after this list).
    • GDPCAP: T.Test.Statistic=-11.937, T.Test.PValue=0.000
    • EPISCO: T.Test.Statistic=-11.789, T.Test.PValue=0.000
    • LIFEXP: T.Test.Statistic=-10.979, T.Test.PValue=0.000
    • TUBINC: T.Test.Statistic=+9.609, T.Test.PValue=0.000
    • DTHCMD: T.Test.Statistic=+8.376, T.Test.PValue=0.000
    • CO2EMI: T.Test.Statistic=-7.031, T.Test.PValue=0.000
    • URBPOP: T.Test.Statistic=-6.541, T.Test.PValue=0.000
    • POPGRO: T.Test.Statistic=+4.905, T.Test.PValue=0.000
    • GHGEMI: T.Test.Statistic=-2.243, T.Test.PValue=0.026
  3. The relationship between the categorical predictors and the CANRAT target variable was statistically evaluated using the following hypotheses:
    • Null: The categorical predictor is independent of the categorical target variable
    • Alternative: The categorical predictor is dependent on the categorical target variable
  4. There is sufficient evidence to conclude that a statistically significant relationship exists between the categories of the categorical predictors and the LOW and HIGH groups of the CANRAT target variable for all 4 categorical predictors, given their high chi-square statistic values with reported low p-values less than the significance level of 0.05.
    • HDICAT_VH: ChiSquare.Test.Statistic=76.764, ChiSquare.Test.PValue=0.000
    • HDICAT_M: ChiSquare.Test.Statistic=13.860, ChiSquare.Test.PValue=0.000
    • HDICAT_L: ChiSquare.Test.Statistic=10.286, ChiSquare.Test.PValue=0.001
    • HDICAT_H: ChiSquare.Test.Statistic=9.081, ChiSquare.Test.PValue=0.003
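The sketch below (using hypothetical group values, not the actual LOW and HIGH measurements) reproduces the pooled-variance two-sample t-statistic that stats.ttest_ind computes with equal_var=True in the cells that follow:

import numpy as np
from scipy import stats

##################################
# Hypothetical scaled measurements
# for the LOW and HIGH target groups
##################################
group_low = np.array([-0.8, -0.3, -1.1, -0.6, -0.2, -0.9])
group_high = np.array([0.7, 1.2, 0.4, 0.9, 1.5])
n1, n2 = len(group_low), len(group_high)

##################################
# Pooled-variance t-statistic and
# two-sided p-value computed manually
##################################
pooled_var = ((n1 - 1) * group_low.var(ddof=1) + (n2 - 1) * group_high.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (group_low.mean() - group_high.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(np.abs(t_manual), df=n1 + n2 - 2)

##################################
# Same quantities from stats.ttest_ind
##################################
result = stats.ttest_ind(group_low, group_high, equal_var=True)
print(f'Manual: t={t_manual:.4f}, p={p_manual:.6f}')
print(f'Scipy : t={result.statistic:.4f}, p={result.pvalue:.6f}')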
In [138]:
##################################
# Computing the t-test 
# statistic and p-values
# between the target variable
# and numeric predictor columns
##################################
cancer_rate_preprocessed_numeric_ttest_target = {}
cancer_rate_preprocessed_numeric = cancer_rate_preprocessed_combined
cancer_rate_preprocessed_numeric_columns = cancer_rate_preprocessed_predictors
for numeric_column in cancer_rate_preprocessed_numeric_columns:
    group_0 = cancer_rate_preprocessed_numeric[cancer_rate_preprocessed_numeric.loc[:,'CANRAT']=='Low']
    group_1 = cancer_rate_preprocessed_numeric[cancer_rate_preprocessed_numeric.loc[:,'CANRAT']=='High']
    cancer_rate_preprocessed_numeric_ttest_target['CANRAT_' + numeric_column] = stats.ttest_ind(
        group_0[numeric_column], 
        group_1[numeric_column], 
        equal_var=True)
In [139]:
##################################
# Formulating the pairwise ttest summary
# between the target variable
# and numeric predictor columns
##################################
cancer_rate_preprocessed_numeric_summary = pd.DataFrame.from_dict(cancer_rate_preprocessed_numeric_ttest_target, orient='index')
cancer_rate_preprocessed_numeric_summary.columns = ['T.Test.Statistic', 'T.Test.PValue']
display(cancer_rate_preprocessed_numeric_summary.sort_values(by=['T.Test.PValue'], ascending=True).head(12))
T.Test.Statistic T.Test.PValue
CANRAT_GDPCAP -11.936988 6.247937e-24
CANRAT_EPISCO -11.788870 1.605980e-23
CANRAT_LIFEXP -10.979098 2.754214e-21
CANRAT_TUBINC 9.608760 1.463678e-17
CANRAT_DTHCMD 8.375558 2.552108e-14
CANRAT_CO2EMI -7.030702 5.537463e-11
CANRAT_URBPOP -6.541001 7.734940e-10
CANRAT_POPGRO 4.904817 2.269446e-06
CANRAT_GHGEMI -2.243089 2.625563e-02
CANRAT_FORARE -1.174143 2.420717e-01
CANRAT_POPDEN -0.495221 6.211191e-01
CANRAT_AGRLND -0.047628 9.620720e-01
In [140]:
##################################
# Computing the chisquare
# statistic and p-values
# between the target variable
# and categorical predictor columns
##################################
cancer_rate_preprocessed_categorical_chisquare_target = {}
cancer_rate_preprocessed_categorical = cancer_rate_preprocessed_categorical_combined
cancer_rate_preprocessed_categorical_columns = ['HDICAT_L','HDICAT_M','HDICAT_H','HDICAT_VH']
for categorical_column in cancer_rate_preprocessed_categorical_columns:
    contingency_table = pd.crosstab(cancer_rate_preprocessed_categorical[categorical_column], 
                                    cancer_rate_preprocessed_categorical['CANRAT'])
    cancer_rate_preprocessed_categorical_chisquare_target['CANRAT_' + categorical_column] = stats.chi2_contingency(
        contingency_table)[0:2]
In [141]:
##################################
# Formulating the pairwise chisquare summary
# between the target variable
# and categorical predictor columns
##################################
cancer_rate_preprocessed_categorical_summary = pd.DataFrame.from_dict(cancer_rate_preprocessed_categorical_chisquare_target, orient='index')
cancer_rate_preprocessed_categorical_summary.columns = ['ChiSquare.Test.Statistic', 'ChiSquare.Test.PValue']
display(cancer_rate_preprocessed_categorical_summary.sort_values(by=['ChiSquare.Test.PValue'], ascending=True).head(4))
ChiSquare.Test.Statistic ChiSquare.Test.PValue
CANRAT_HDICAT_VH 76.764134 1.926446e-18
CANRAT_HDICAT_M 13.860367 1.969074e-04
CANRAT_HDICAT_L 10.285575 1.340742e-03
CANRAT_HDICAT_H 9.080788 2.583087e-03

1.6. Neural Network Classification Gradient and Weight Updates ¶

1.6.1 Premodelling Data Description ¶

  1. Among the predictor variables determined to have a statistically significant difference between the means of the numeric measurements obtained from the LOW and HIGH groups of the CANRAT target variable, only the 2 with the highest absolute t-test statistic values and reported low p-values less than the significance level of 0.05 were retained.
    • GDPCAP: T.Test.Statistic=-11.937, T.Test.PValue=0.000
    • EPISCO: T.Test.Statistic=-11.789, T.Test.PValue=0.000
In [142]:
##################################
# Filtering certain numeric columns
# and encoded categorical columns
# after hypothesis testing
##################################
cancer_rate_premodelling = cancer_rate_preprocessed_combined.drop(['URBPOP', 'POPGRO', 'LIFEXP', 'TUBINC', 'DTHCMD', 'AGRLND', 'GHGEMI','FORARE', 'CO2EMI', 'POPDEN'], axis=1)
cancer_rate_premodelling.columns
Out[142]:
Index(['GDPCAP', 'EPISCO', 'CANRAT'], dtype='object')
In [143]:
##################################
# Performing a general exploration of the filtered dataset
##################################
print('Dataset Dimensions: ')
display(cancer_rate_premodelling.shape)
Dataset Dimensions: 
(163, 3)
In [144]:
##################################
# Listing the column names and data types
##################################
print('Column Names and Data Types:')
display(cancer_rate_premodelling.dtypes)
Column Names and Data Types:
GDPCAP     float64
EPISCO     float64
CANRAT    category
dtype: object
In [145]:
##################################
# Taking a snapshot of the dataset
##################################
cancer_rate_premodelling.head()
Out[145]:
GDPCAP EPISCO CANRAT
0 1.549766 1.306738 High
1 1.407752 1.102912 High
2 1.879374 1.145832 High
3 1.685426 0.739753 High
4 1.657777 2.218327 High
In [146]:
##################################
# Converting the dataframe to
# a numpy array
##################################
cancer_rate_premodelling_matrix = cancer_rate_premodelling.to_numpy()
In [147]:
##################################
# Formulating the scatterplot
# of the selected numeric predictors
# by categorical response classes
##################################
fig, ax = plt.subplots(figsize=(7, 7))
ax.plot(cancer_rate_premodelling_matrix[cancer_rate_premodelling_matrix[:,2]=='High', 0],
        cancer_rate_premodelling_matrix[cancer_rate_premodelling_matrix[:,2]=='High', 1], 
        'o', 
        label='High', 
        color='darkslateblue')
ax.plot(cancer_rate_premodelling_matrix[cancer_rate_premodelling_matrix[:,2]=='Low', 0],
        cancer_rate_premodelling_matrix[cancer_rate_premodelling_matrix[:,2]=='Low', 1], 
        'x', 
        label='Low', 
        color='chocolate')
ax.axes.set_ylabel('EPISCO')
ax.axes.set_xlabel('GDPCAP')
ax.set_xlim(-3,3)
ax.set_ylim(-3,3)
ax.set(title='CANRAT Class Distribution')
ax.legend(loc='upper left',title='CANRAT');
[Figure output: scatterplot of GDPCAP versus EPISCO by CANRAT class]
In [148]:
##################################
# Preparing the data and
# and converting to a suitable format
# as a neural network model input
##################################
matrix_x_values = cancer_rate_premodelling.iloc[:,0:2].to_numpy()
y_values = np.where(cancer_rate_premodelling['CANRAT'] == 'High', 1, 0)

1.6.2 Sigmoid Activation Function ¶

Backpropagation and Weight Update, in the context of an artificial neural network, involve the process of iteratively adjusting the weights of the connections between neurons in the network to minimize the difference between the predicted and the actual target responses. Input data is fed into the neural network, and it propagates through the network layer by layer, starting from the input layer, through hidden layers, and ending at the output layer. At each neuron, the weighted sum of inputs is calculated, followed by the application of an activation function to produce the neuron's output. Once the forward pass is complete, the network's output is compared to the actual target output. The difference between the predicted output and the actual output is quantified using a loss function, which measures the discrepancy between the predicted and actual values. Common loss functions for classification tasks include cross-entropy loss. During the backward pass, the error is propagated backward through the network to compute the gradients of the loss function with respect to each weight in the network. This is achieved using the chain rule of calculus, which allows the error to be decomposed and distributed backward through the network. The gradients quantify how much a change in each weight would affect the overall error of the network. Once the gradients are computed, the weights are updated in the opposite direction of the gradient to minimize the error. This update is typically performed using an optimization algorithm such as gradient descent, which adjusts the weights in proportion to their gradients and a learning rate hyperparameter. The learning rate determines the size of the step taken in the direction opposite to the gradient. These steps are repeated for multiple iterations (epochs) over the training data. As the training progresses, the weights are adjusted iteratively to minimize the error, leading to a neural network model that accurately classifies input data.
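As a minimal illustration of the iterative update rule described above (separate from the full network implementation below), the sketch applies the same step, new weight equals old weight minus the learning rate times the loss gradient, to a single hypothetical weight with a simple quadratic loss:

##################################
# Gradient descent on a one-parameter
# quadratic loss L(w) = (w - 2)^2,
# repeating w <- w - lr * dL/dw
##################################
learning_rate = 0.01   # same fixed learning rate used in the runs below
w = 0.0                # hypothetical initial weight
for epoch in range(1000):
    gradient = 2 * (w - 2)          # dL/dw for the quadratic loss
    w -= learning_rate * gradient   # step opposite to the gradient
print(round(w, 4))                  # converges toward the minimizer w = 2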

Activation Functions play a crucial role in neural networks by introducing non-linearity into the network, enabling the model to learn complex patterns and relationships within the data. In the context of a neural network classification model, activation functions are applied to the output of each neuron in the hidden layers to introduce non-linear mappings between the input and output, allowing the network to approximate complex functions and make non-linear decisions. Activation functions are significant during model development by introducing non-linearity (without activation functions, the neural network would simply be a series of linear transformations, no matter how many layers it has. Activation functions introduce non-linearities to the model, enabling it to learn and represent complex patterns and relationships in the data); propagating back gradients (activation functions help in the backpropagation algorithm by providing gradients that indicate the direction and magnitude of adjustments to the weights during training. These gradients are necessary for optimizing the network's parameters through techniques like gradient descent); and normalizing outputs (activation functions also help in normalizing the output of each neuron, ensuring that it falls within a specific range. This normalization prevents the activation values from becoming too large or too small, which can lead to numerical instability or saturation of gradients during training). The choice of activation function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.

Sigmoid Activation Function transforms a variable by determining the quotient between one and the sum of one and Euler's number raised to the negative of the given value of the variable. The resulting output resembles a smooth S-shaped curve which ranges from zero to one. Since it squashes the input values between zero and one, it is useful for binary classification. The curve profile is not zero-centered, which can slow convergence during backpropagation, and it is prone to saturation, causing gradients to vanish when the input is too large or too small.
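The short sketch below (separate from the model class defined later) evaluates the sigmoid and its derivative at a few sample inputs to illustrate the squashing and saturation behavior described above; note that the class below applies the equivalent derivative form a * (1 - a) to the already-activated outputs:

import numpy as np

##################################
# Sigmoid and its derivative
# expressed in terms of the raw input
##################################
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

sample_inputs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print('outputs  :', np.round(sigmoid(sample_inputs), 5))
print('gradients:', np.round(sigmoid_derivative(sample_inputs), 5))
# Near-zero gradients at the extreme inputs illustrate saturation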

  1. A neural network with the following structure was formulated:
    • Hidden Layers = 3
    • Number of Nodes per Hidden Layer = 4
  2. The backpropagation and gradient descent algorithms were implemented with parameter settings described as follows:
    • Learning Rate = 0.01
    • Epochs = 1000
    • Activation Function = Sigmoid Activation Function
  3. The final loss estimate of 0.27805 at the 1000th epoch was not optimally low compared to those obtained using the other parameter settings.
  4. Applying the backpropagation and gradient descent algorithms with a Sigmoid activation function, the neural network model performance is estimated as follows:
    • Accuracy = 74.84662
  5. The estimated classification accuracy using the backpropagation and gradient descent algorithms with a Sigmoid activation function was not optimal compared to those obtained using the other parameter settings, while demonstrating a consistently smooth profile during epoch training.
In [149]:
##################################
# Creating a class object
# for the neural network algorithm
# using a sigmoid activation function
##################################
class NeuralNetwork_Sigmoid:
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size1)
        self.bias1 = np.zeros((1, hidden_size1))
        
        self.weights2 = np.random.randn(hidden_size1, hidden_size2)
        self.bias2 = np.zeros((1, hidden_size2))
        
        self.weights3 = np.random.randn(hidden_size2, hidden_size3)
        self.bias3 = np.zeros((1, hidden_size3))
        
        self.weights4 = np.random.randn(hidden_size3, output_size)
        self.bias4 = np.zeros((1, output_size))
        
        self.gradients = {'dw1': [], 'db1': [], 'dw2': [], 'db2': [], 'dw3': [], 'db3': [], 'dw4': [], 'db4': []}
        self.losses = []
        self.accuracies = []
        
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        return x * (1 - x)
    
    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exps / np.sum(exps, axis=1, keepdims=True)
    
    def forward(self, x):
        self.hidden1 = self.sigmoid(np.dot(x, self.weights1) + self.bias1)
        self.hidden2 = self.sigmoid(np.dot(self.hidden1, self.weights2) + self.bias2)
        self.hidden3 = self.sigmoid(np.dot(self.hidden2, self.weights3) + self.bias3)
        self.output = self.softmax(np.dot(self.hidden3, self.weights4) + self.bias4)
        
        return self.output
    
    def backward(self, x, y, lr):
        m = x.shape[0]
        
        # Computing the output layer gradients
        output_error = self.output - y
        output_delta = output_error / m
        
        # Computing the hidden layer 3 gradients
        hidden3_error = np.dot(output_delta, self.weights4.T)
        hidden3_delta = hidden3_error * self.sigmoid_derivative(self.hidden3)
        
        # Computing the hidden layer 2 gradients
        hidden2_error = np.dot(hidden3_delta, self.weights3.T)
        hidden2_delta = hidden2_error * self.sigmoid_derivative(self.hidden2)
        
        # Computing the hidden layer 1 gradients
        hidden1_error = np.dot(hidden2_delta, self.weights2.T)
        hidden1_delta = hidden1_error * self.sigmoid_derivative(self.hidden1)
        
        # Updating weights and biases based on computed gradients
        self.weights4 -= lr * np.dot(self.hidden3.T, output_delta)
        self.bias4 -= lr * np.sum(output_delta, axis=0, keepdims=True)
        
        self.weights3 -= lr * np.dot(self.hidden2.T, hidden3_delta)
        self.bias3 -= lr * np.sum(hidden3_delta, axis=0, keepdims=True)
        
        self.weights2 -= lr * np.dot(self.hidden1.T, hidden2_delta)
        self.bias2 -= lr * np.sum(hidden2_delta, axis=0, keepdims=True)
        
        self.weights1 -= lr * np.dot(x.T, hidden1_delta)
        self.bias1 -= lr * np.sum(hidden1_delta, axis=0, keepdims=True)
        
        # Storing the mean absolute layer deltas and bias magnitudes for monitoring the gradient profiles
        self.gradients['dw1'].append(np.mean(np.abs(hidden1_delta)))
        self.gradients['db1'].append(np.mean(np.abs(self.bias1)))
        self.gradients['dw2'].append(np.mean(np.abs(hidden2_delta)))
        self.gradients['db2'].append(np.mean(np.abs(self.bias2)))
        self.gradients['dw3'].append(np.mean(np.abs(hidden3_delta)))
        self.gradients['db3'].append(np.mean(np.abs(self.bias3)))
        self.gradients['dw4'].append(np.mean(np.abs(output_delta)))
        self.gradients['db4'].append(np.mean(np.abs(self.bias4)))
                       
    def train(self, x, y, epochs, lr):
        for i in range(epochs):
            output = self.forward(x)
            self.backward(x, y, lr)
            loss = -np.mean(y * np.log(output))
            self.losses.append(loss)
            accuracy = self.accuracy(x, np.argmax(y, axis=1))
            self.accuracies.append(accuracy)
            if i % 100 == 0:
                print(f'Epoch {i}: Loss {loss}, Accuracy {accuracy}')
                
    def predict(self, x):
        return np.argmax(self.forward(x), axis=1)
    
    def accuracy(self, x, y):
        pred = self.predict(x)
        return np.mean(pred == y)
In [150]:
##################################
# Preparing the training data
##################################
X = matrix_x_values
y = y_values

##################################
# Performing a one-hot encoding
# of the target response labels
##################################
num_classes = 2
y_one_hot = np.eye(num_classes)[y]
In [151]:
##################################
# Defining the neural network components
##################################
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
hidden_size3 = 4
output_size = 2
In [152]:
##################################
# Initializing a neural network model object
# with sigmoid activation function
##################################
np.random.seed(88888)
nn_sigmoid = NeuralNetwork_Sigmoid(input_size, hidden_size1, hidden_size2, hidden_size3, output_size)
In [153]:
##################################
# Training a neural network model
# with sigmoid activation function
##################################
nn_sigmoid.train(X, y_one_hot, epochs=1001, lr=0.01)
Epoch 0: Loss 0.316009360251368, Accuracy 0.7484662576687117
Epoch 100: Loss 0.29042169855505795, Accuracy 0.7484662576687117
Epoch 200: Loss 0.28610782259171613, Accuracy 0.7484662576687117
Epoch 300: Loss 0.2846113282316412, Accuracy 0.7484662576687117
Epoch 400: Loss 0.2835652925120681, Accuracy 0.7484662576687117
Epoch 500: Loss 0.2826104420839403, Accuracy 0.7484662576687117
Epoch 600: Loss 0.28168417134187307, Accuracy 0.7484662576687117
Epoch 700: Loss 0.28077152406724765, Accuracy 0.7484662576687117
Epoch 800: Loss 0.27986564301503736, Accuracy 0.7484662576687117
Epoch 900: Loss 0.2789612062764938, Accuracy 0.7484662576687117
Epoch 1000: Loss 0.27805328141665736, Accuracy 0.7484662576687117
In [154]:
##################################
# Plotting the computed gradients
# of the neural network model
# with sigmoid activation function
##################################
plt.figure(figsize=(10, 6))
for key, value in nn_sigmoid.gradients.items():
    plt.plot(value, label=key)
plt.title('Sigmoid Activation: Gradients by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Gradients')
plt.ylim(-0.05, 0.50)
plt.xlim(-50,1000)
plt.legend(loc="upper left")
plt.show()
[Figure output: Sigmoid Activation - Gradients by Iteration]
In [155]:
##################################
# Plotting the computed cost
# of the neural network model
# with sigmoid activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_sigmoid.losses)
plt.title('Sigmoid Activation: Cost Function by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.ylim(0.05, 0.70)
plt.xlim(-50,1000)
plt.show()
[Figure output: Sigmoid Activation - Cost Function by Iteration]
In [156]:
##################################
# Plotting the computed accuracy
# of the neural network model
# with sigmoid activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_sigmoid.accuracies)
plt.title('Sigmoid Activation: Classification Accuracy by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.ylim(0.00, 1.00)
plt.xlim(-50,1000)
plt.show()
[Figure output: Sigmoid Activation - Classification Accuracy by Iteration]
In [157]:
##################################
# Gathering the final values for 
# accuracy and loss error
##################################
sigmoid_metrics = pd.DataFrame(["ACCURACY","LOSS"])
sigmoid_values = pd.DataFrame([nn_sigmoid.accuracies[-1],nn_sigmoid.losses[-1]])
sigmoid_method = pd.DataFrame(["Sigmoid"]*2)
sigmoid_summary = pd.concat([sigmoid_metrics, 
                             sigmoid_values,
                             sigmoid_method], axis=1)
sigmoid_summary.columns = ['Metric', 'Value', 'Method']
sigmoid_summary.reset_index(inplace=True, drop=True)
display(sigmoid_summary)
Metric Value Method
0 ACCURACY 0.748466 Sigmoid
1 LOSS 0.278053 Sigmoid

1.6.3 Rectified Linear Unit Activation Function ¶


Rectified Linear Unit Activation Function transforms a variable by determining the maximum between zero and the given value of the variable. The resulting output, which ranges from zero to positive infinity, is a piecewise linear function that returns zero for negative inputs and the input value itself for positive inputs. It is a simple and computationally efficient method which mitigates the vanishing gradient problem and accelerates convergence by avoiding saturation for positive inputs. However, it can suffer from the dying RELU problem where neurons become permanently inactive (outputting zero) for all inputs during training if large gradient updates consistently push their pre-activations into the negative region. Its outputs are also not zero-centered, and because the gradient is exactly zero for negative inputs, inactive neurons receive no further weight updates. This activation function is widely used in the hidden layers of deep neural networks.
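The short sketch below (separate from the model class defined later) evaluates ReLU and its subgradient at a few sample inputs; the zero gradient for negative inputs is the mechanism behind the dying-ReLU behavior noted above:

import numpy as np

##################################
# ReLU and its subgradient
##################################
def relu(x):
    return np.maximum(0, x)

def relu_gradient(x):
    return (x > 0).astype(float)

sample_inputs = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print('outputs  :', relu(sample_inputs))
print('gradients:', relu_gradient(sample_inputs))
# Units whose pre-activations stay negative receive zero gradient and stop updating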

  1. A neural network with the following structure was formulated:
    • Hidden Layers = 3
    • Number of Nodes per Hidden Layer = 4
  2. The backpropagation and gradient descent algorithms were implemented with parameter settings described as follows:
    • Learning Rate = 0.01
    • Epochs = 1000
    • Activation Function = Rectified Linear Unit Activation Function (RELU)
  3. The final loss estimate of 0.09623 at the 1000th epoch was not optimally low compared to those obtained using the other parameter settings.
  4. Applying the backpropagation and gradient descent algorithms with a RELU activation function, the neural network model performance is estimated as follows:
    • Accuracy = 93.25153
  5. The estimated classification accuracy using the backpropagation and gradient descent algorithms with a RELU activation function was optimal compared to those obtained using the other parameter settings, while demonstrating a consistently smooth profile during epoch training.
In [158]:
##################################
# Creating a class object
# for the neural network algorithm
# using a RELU activation function
##################################
class NeuralNetwork_RELU:
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size1)
        self.bias1 = np.zeros((1, hidden_size1))
        
        self.weights2 = np.random.randn(hidden_size1, hidden_size2)
        self.bias2 = np.zeros((1, hidden_size2))
        
        self.weights3 = np.random.randn(hidden_size2, hidden_size3)
        self.bias3 = np.zeros((1, hidden_size3))
        
        self.weights4 = np.random.randn(hidden_size3, output_size)
        self.bias4 = np.zeros((1, output_size))
        
        self.gradients = {'dw1': [], 'db1': [], 'dw2': [], 'db2': [], 'dw3': [], 'db3': [], 'dw4': [], 'db4': []}
        self.losses = []
        self.accuracies = []
        
    def relu(self, x):
        return np.maximum(0, x)
    
    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exps / np.sum(exps, axis=1, keepdims=True)
    
    def forward(self, x):
        self.hidden1 = self.relu(np.dot(x, self.weights1) + self.bias1)
        self.hidden2 = self.relu(np.dot(self.hidden1, self.weights2) + self.bias2)
        self.hidden3 = self.relu(np.dot(self.hidden2, self.weights3) + self.bias3)
        self.output = self.softmax(np.dot(self.hidden3, self.weights4) + self.bias4)
        
        return self.output
    
    def backward(self, x, y, lr):
        m = x.shape[0]
        
        # Computing the output layer gradients
        output_error = self.output - y
        output_delta = output_error / m
        
        # Computing the hidden layer 3 gradients
        hidden3_error = np.dot(output_delta, self.weights4.T)
        hidden3_delta = hidden3_error * (self.hidden3 > 0)
        
        # Computing the hidden layer 2 gradients
        hidden2_error = np.dot(hidden3_delta, self.weights3.T)
        hidden2_delta = hidden2_error * (self.hidden2 > 0)
        
        # Computing the hidden layer 1 gradients
        hidden1_error = np.dot(hidden2_delta, self.weights2.T)
        hidden1_delta = hidden1_error * (self.hidden1 > 0)
        
        # Updating weights and biases based on computed gradients
        self.weights4 -= lr * np.dot(self.hidden3.T, output_delta)
        self.bias4 -= lr * np.sum(output_delta, axis=0, keepdims=True)
        
        self.weights3 -= lr * np.dot(self.hidden2.T, hidden3_delta)
        self.bias3 -= lr * np.sum(hidden3_delta, axis=0, keepdims=True)
        
        self.weights2 -= lr * np.dot(self.hidden1.T, hidden2_delta)
        self.bias2 -= lr * np.sum(hidden2_delta, axis=0, keepdims=True)
        
        self.weights1 -= lr * np.dot(x.T, hidden1_delta)
        self.bias1 -= lr * np.sum(hidden1_delta, axis=0, keepdims=True)
        
        # Storing computed gradients
        self.gradients['dw1'].append(np.mean(np.abs(hidden1_delta)))
        self.gradients['db1'].append(np.mean(np.abs(self.bias1)))
        self.gradients['dw2'].append(np.mean(np.abs(hidden2_delta)))
        self.gradients['db2'].append(np.mean(np.abs(self.bias2)))
        self.gradients['dw3'].append(np.mean(np.abs(hidden3_delta)))
        self.gradients['db3'].append(np.mean(np.abs(self.bias3)))
        self.gradients['dw4'].append(np.mean(np.abs(output_delta)))
        self.gradients['db4'].append(np.mean(np.abs(self.bias4)))
        
    def train(self, x, y, epochs, lr):
        for i in range(epochs):
            output = self.forward(x)
            self.backward(x, y, lr)
            loss = -np.mean(y * np.log(output))
            self.losses.append(loss)
            accuracy = self.accuracy(x, np.argmax(y, axis=1))
            self.accuracies.append(accuracy)
            if i % 100 == 0:
                print(f'Epoch {i}: Loss {loss}, Accuracy {accuracy}')
                
    def predict(self, x):
        return np.argmax(self.forward(x), axis=1)
    
    def accuracy(self, x, y):
        pred = self.predict(x)
        return np.mean(pred == y)
In [159]:
##################################
# Preparing the training data
##################################
X = matrix_x_values
y = y_values

##################################
# Performing a one-hot encoding
# of the target response labels
##################################
num_classes = 2
y_one_hot = np.eye(num_classes)[y]
In [160]:
##################################
# Defining the neural network components
##################################
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
hidden_size3 = 4
output_size = 2
In [161]:
##################################
# Initializing a neural network model object
# with RELU activation function
##################################
np.random.seed(88888)
nn_relu = NeuralNetwork_RELU(input_size, hidden_size1, hidden_size2, hidden_size3, output_size)
In [162]:
##################################
# Training a neural network model
# with RELU activation function
##################################
nn_relu.train(X, y_one_hot, epochs=1001, lr=0.01)
Epoch 0: Loss 0.3700211712894534, Accuracy 0.2822085889570552
Epoch 100: Loss 0.1794249587592613, Accuracy 0.901840490797546
Epoch 200: Loss 0.13224275039852623, Accuracy 0.9079754601226994
Epoch 300: Loss 0.1167678880588832, Accuracy 0.9141104294478528
Epoch 400: Loss 0.10890482140743643, Accuracy 0.9202453987730062
Epoch 500: Loss 0.10455617528678188, Accuracy 0.9202453987730062
Epoch 600: Loss 0.10189142339749091, Accuracy 0.9202453987730062
Epoch 700: Loss 0.09987241742776284, Accuracy 0.9263803680981595
Epoch 800: Loss 0.09805768178547344, Accuracy 0.9325153374233128
Epoch 900: Loss 0.096972829866501, Accuracy 0.9325153374233128
Epoch 1000: Loss 0.09623684890656739, Accuracy 0.9325153374233128
In [163]:
##################################
# Plotting the computed gradients
# of the neural network model
# with RELU activation function
##################################
plt.figure(figsize=(10, 6))
for key, value in nn_relu.gradients.items():
    plt.plot(value, label=key)
plt.title('RELU Activation: Gradients by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Gradients')
plt.ylim(-0.05, 0.50)
plt.xlim(-50,1000)
plt.legend(loc="upper left")
plt.show()
In [164]:
##################################
# Plotting the computed cost
# of the neural network model
# with RELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_relu.losses)
plt.title('RELU Activation: Cost Function by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.ylim(0.05, 0.70)
plt.xlim(-50,1000)
plt.show()
In [165]:
##################################
# Plotting the computed accuracy
# of the neural network model
# with RELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_relu.accuracies)
plt.title('RELU Activation: Classification by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.ylim(0.00, 1.00)
plt.xlim(-50,1000)
plt.show()
In [166]:
##################################
# Gathering the final values for 
# accuracy and loss error
# using a RELU activation function
##################################
relu_metrics = pd.DataFrame(["ACCURACY","LOSS"])
relu_values = pd.DataFrame([nn_relu.accuracies[-1],nn_relu.losses[-1]])
relu_method = pd.DataFrame(["RELU"]*2)
relu_summary = pd.concat([relu_metrics, 
                          relu_values,
                          relu_method], axis=1)
relu_summary.columns = ['Metric', 'Value', 'Method']
relu_summary.reset_index(inplace=True, drop=True)
display(relu_summary)
Metric Value Method
0 ACCURACY 0.932515 RELU
1 LOSS 0.096237 RELU

1.6.4 Leaky Rectified Linear Unit Activation Function ¶

Backpropagation and Weight Update, in the context of an artificial neural network, involve the process of iteratively adjusting the weights of the connections between neurons in the network to minimize the difference between the predicted and the actual target responses. Input data is fed into the neural network, and it propagates through the network layer by layer, starting from the input layer, through hidden layers, and ending at the output layer. At each neuron, the weighted sum of inputs is calculated, followed by the application of an activation function to produce the neuron's output. Once the forward pass is complete, the network's output is compared to the actual target output. The difference between the predicted output and the actual output is quantified using a loss function, which measures the discrepancy between the predicted and actual values. Common loss functions for classification tasks include cross-entropy loss. During the backward pass, the error is propagated backward through the network to compute the gradients of the loss function with respect to each weight in the network. This is achieved using the chain rule of calculus, which allows the error to be decomposed and distributed backward through the network. The gradients quantify how much a change in each weight would affect the overall error of the network. Once the gradients are computed, the weights are updated in the opposite direction of the gradient to minimize the error. This update is typically performed using an optimization algorithm such as gradient descent, which adjusts the weights in proportion to their gradients and a learning rate hyperparameter. The learning rate determines the size of the step taken in the direction opposite to the gradient. These steps are repeated for multiple iterations (epochs) over the training data. As the training progresses, the weights are adjusted iteratively to minimize the error, leading to a neural network model that accurately classifies input data.

Activation Functions play a crucial role in neural networks by introducing non-linearity into the network, enabling the model to learn complex patterns and relationships within the data. In the context of a neural network classification model, activation functions are applied to the output of each neuron in the hidden layers to introduce non-linear mappings between the input and output, allowing the network to approximate complex functions and make non-linear decisions. Activation functions are significant during model development by introducing non-linearity (without activation functions, the neural network would simply be a series of linear transformations, no matter how many layers it has. Activation functions introduce non-linearities to the model, enabling it to learn and represent complex patterns and relationships in the data); propagating back gradients (activation functions help in the backpropagation algorithm by providing gradients that indicate the direction and magnitude of adjustments to the weights during training. These gradients are necessary for optimizing the network's parameters through techniques like gradient descent); and normalizing outputs (activation functions also help in normalizing the output of each neuron, ensuring that it falls within a specific range. This normalization prevents the activation values from becoming too large or too small, which can lead to numerical instability or saturation of gradients during training). The choice of activation function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.

Leaky Rectified Linear Unit Activation Function transforms a variable by using the given value of the variable when it is greater than zero, but multiplying this value by a specified alpha otherwise. The resulting output, which ranges from negative infinity to positive infinity, is a piecewise linear function that returns the input value for positive inputs but allows a small, non-zero output when the input is negative, preventing the dying RELU problem. It introduces a small slope for negative inputs, keeping the gradient flowing during backpropagation. This activation function provides a good alternative to RELU, especially in networks where dying RELU is a concern.
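For reference, a minimal standalone sketch of the Leaky RELU transform and its exact derivative is shown here, reusing the numpy import from the earlier cells; the function names are illustrative and are not part of the notebook's classes. Note that the negative-region slope used for the derivative should match the alpha applied in the forward pass.

def leaky_relu(x, alpha=0.01):
    # Returns x for positive inputs and alpha * x for negative inputs
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    # 1 for positive pre-activations, alpha otherwise, so the gradient never fully vanishes
    return np.where(x > 0, 1.0, alpha)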

  1. A neural network with the following structure was formulated:
    • Hidden Layer = 3
    • Number of Nodes per Hidden Layer = 4
  2. The backpropagation and gradient descent algorithms were implemented with parameter settings described as follows:
    • Learning Rate = 0.01
    • Epochs = 1000
    • Activation Function = Leaky Rectified Linear Unit Activation Function (Leaky RELU)
  3. The final loss estimate of 0.09377 at the 1000th epoch was not optimally low compared to those obtained using the other parameter settings.
  4. Applying the backpropagation and gradient descent algorithms with a Leaky RELU activation function, the neural network model performance is estimated as follows:
    • Accuracy = 92.63803%
  5. The estimated classification accuracy using the backpropagation and gradient descent algorithms with a Leaky RELU activation function was optimal compared to those obtained using the other parameter settings, also demonstrating a consistently smooth profile during epoch training.
In [167]:
##################################
# Creating a class object
# for the neural network algorithm
# using a Leaky RELU activation function
##################################
class NeuralNetwork_LeakyRELU:
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size1)
        self.bias1 = np.zeros((1, hidden_size1))
        
        self.weights2 = np.random.randn(hidden_size1, hidden_size2)
        self.bias2 = np.zeros((1, hidden_size2))
        
        self.weights3 = np.random.randn(hidden_size2, hidden_size3)
        self.bias3 = np.zeros((1, hidden_size3))
        
        self.weights4 = np.random.randn(hidden_size3, output_size)
        self.bias4 = np.zeros((1, output_size))
        
        self.gradients = {'dw1': [], 'db1': [], 'dw2': [], 'db2': [], 'dw3': [], 'db3': [], 'dw4': [], 'db4': []}
        self.losses = []
        self.accuracies = []
        
    def leaky_relu(self, x, alpha=0.10):
        return np.where(x > 0, x, alpha * x)
    
    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exps / np.sum(exps, axis=1, keepdims=True)
    
    def forward(self, x):
        self.hidden1 = self.leaky_relu(np.dot(x, self.weights1) + self.bias1)
        self.hidden2 = self.leaky_relu(np.dot(self.hidden1, self.weights2) + self.bias2)
        self.hidden3 = self.leaky_relu(np.dot(self.hidden2, self.weights3) + self.bias3)
        self.output = self.softmax(np.dot(self.hidden3, self.weights4) + self.bias4)
        
        return self.output
    
    def backward(self, x, y, lr):
        m = x.shape[0]
        
        # Computing the output layer gradients
        output_error = self.output - y
        output_delta = output_error / m
        
        # Computing the hidden layer 3 gradients
        hidden3_error = np.dot(output_delta, self.weights4.T)
        hidden3_delta = hidden3_error * (self.hidden3 > 0) + 0.01 * hidden3_error * (self.hidden3 <= 0)
        
        # Computing the hidden layer 2 gradients
        hidden2_error = np.dot(hidden3_delta, self.weights3.T)
        hidden2_delta = hidden2_error * (self.hidden2 > 0) + 0.01 * hidden2_error * (self.hidden2 <= 0)
        
        # Computing the hidden layer 1 gradients
        hidden1_error = np.dot(hidden2_delta, self.weights2.T)
        hidden1_delta = hidden1_error * (self.hidden1 > 0) + 0.01 * hidden1_error * (self.hidden1 <= 0)
        
        # Updating weights and biases based on computed gradients
        self.weights4 -= lr * np.dot(self.hidden3.T, output_delta)
        self.bias4 -= lr * np.sum(output_delta, axis=0, keepdims=True)
        
        self.weights3 -= lr * np.dot(self.hidden2.T, hidden3_delta)
        self.bias3 -= lr * np.sum(hidden3_delta, axis=0, keepdims=True)
        
        self.weights2 -= lr * np.dot(self.hidden1.T, hidden2_delta)
        self.bias2 -= lr * np.sum(hidden2_delta, axis=0, keepdims=True)
        
        self.weights1 -= lr * np.dot(x.T, hidden1_delta)
        self.bias1 -= lr * np.sum(hidden1_delta, axis=0, keepdims=True)
        
        # Storing updated gradients
        self.gradients['dw1'].append(np.mean(np.abs(hidden1_delta)))
        self.gradients['db1'].append(np.mean(np.abs(self.bias1)))
        self.gradients['dw2'].append(np.mean(np.abs(hidden2_delta)))
        self.gradients['db2'].append(np.mean(np.abs(self.bias2)))
        self.gradients['dw3'].append(np.mean(np.abs(hidden3_delta)))
        self.gradients['db3'].append(np.mean(np.abs(self.bias3)))
        self.gradients['dw4'].append(np.mean(np.abs(output_delta)))
        self.gradients['db4'].append(np.mean(np.abs(self.bias4)))
        
    def train(self, x, y, epochs, lr):
        for i in range(epochs):
            output = self.forward(x)
            self.backward(x, y, lr)
            loss = -np.mean(y * np.log(output))
            self.losses.append(loss)
            accuracy = self.accuracy(x, np.argmax(y, axis=1))
            self.accuracies.append(accuracy)
            if i % 100 == 0:
                print(f'Epoch {i}: Loss {loss}, Accuracy {accuracy}')
                
    def predict(self, x):
        return np.argmax(self.forward(x), axis=1)
    
    def accuracy(self, x, y):
        pred = self.predict(x)
        return np.mean(pred == y)
    
In [168]:
##################################
# Preparing the training data
##################################
X = matrix_x_values
y = y_values

##################################
# Performing a one-hot encoding
# of the target response labels
##################################
num_classes = 2
y_one_hot = np.eye(num_classes)[y]
In [169]:
##################################
# Defining the neural network components
##################################
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
hidden_size3 = 4
output_size = 2
In [170]:
##################################
# Initializing a neural network model object
# with Leaky RELU activation function
##################################
np.random.seed(88888)
nn_leakyrelu = NeuralNetwork_LeakyRELU(input_size, hidden_size1, hidden_size2, hidden_size3, output_size)
In [171]:
##################################
# Training a neural network model
# with Leaky RELU activation function
##################################
nn_leakyrelu.train(X, y_one_hot, epochs=1001, lr=0.01)
Epoch 0: Loss 0.5521727210322692, Accuracy 0.3803680981595092
Epoch 100: Loss 0.1584376716587371, Accuracy 0.9079754601226994
Epoch 200: Loss 0.12183916598907361, Accuracy 0.901840490797546
Epoch 300: Loss 0.11001101254184688, Accuracy 0.9263803680981595
Epoch 400: Loss 0.10402068587112523, Accuracy 0.9263803680981595
Epoch 500: Loss 0.10064359626697053, Accuracy 0.9263803680981595
Epoch 600: Loss 0.09839877372876189, Accuracy 0.9325153374233128
Epoch 700: Loss 0.09675845212053208, Accuracy 0.9325153374233128
Epoch 800: Loss 0.09550172181604426, Accuracy 0.9263803680981595
Epoch 900: Loss 0.09454243365295582, Accuracy 0.9263803680981595
Epoch 1000: Loss 0.093777159984923, Accuracy 0.9263803680981595
In [172]:
##################################
# Plotting the computed gradients
# of the neural network model
# with Leaky RELU activation function
##################################
plt.figure(figsize=(10, 6))
for key, value in nn_leakyrelu.gradients.items():
    plt.plot(value, label=key)
plt.title('Leaky RELU Activation: Gradients by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Gradients')
plt.ylim(-0.05, 0.50)
plt.xlim(-50,1000)
plt.legend(loc="upper left")
plt.show()
In [173]:
##################################
# Plotting the computed cost
# of the neural network model
# with Leaky RELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_leakyrelu.losses)
plt.title('Leaky RELU Activation: Cost Function by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.ylim(0.05, 0.70)
plt.xlim(-50,1000)
plt.show()
In [174]:
##################################
# Plotting the computed accuracy
# of the neural network model
# with Leaky RELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_leakyrelu.accuracies)
plt.title('Leaky RELU Activation: Classification by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.ylim(0.00, 1.00)
plt.xlim(-50,1000)
plt.show()
In [175]:
##################################
# Gathering the final values for 
# accuracy and loss error
# using a Leaky RELU activation function
##################################
leakyrelu_metrics = pd.DataFrame(["ACCURACY","LOSS"])
leakyrelu_values = pd.DataFrame([nn_leakyrelu.accuracies[-1],nn_leakyrelu.losses[-1]])
leakyrelu_method = pd.DataFrame(["Leaky_RELU"]*2)
leakyrelu_summary = pd.concat([leakyrelu_metrics,
                               leakyrelu_values,
                               leakyrelu_method], axis=1)
leakyrelu_summary.columns = ['Metric', 'Value', 'Method']
leakyrelu_summary.reset_index(inplace=True, drop=True)
display(leakyrelu_summary)
Metric Value Method
0 ACCURACY 0.926380 Leaky_RELU
1 LOSS 0.093777 Leaky_RELU

1.6.5 Exponential Linear Unit Activation Function ¶

Backpropagation and Weight Update, in the context of an artificial neural network, involve the process of iteratively adjusting the weights of the connections between neurons in the network to minimize the difference between the predicted and the actual target responses. Input data is fed into the neural network, and it propagates through the network layer by layer, starting from the input layer, through hidden layers, and ending at the output layer. At each neuron, the weighted sum of inputs is calculated, followed by the application of an activation function to produce the neuron's output. Once the forward pass is complete, the network's output is compared to the actual target output. The difference between the predicted output and the actual output is quantified using a loss function, which measures the discrepancy between the predicted and actual values. Common loss functions for classification tasks include cross-entropy loss. During the backward pass, the error is propagated backward through the network to compute the gradients of the loss function with respect to each weight in the network. This is achieved using the chain rule of calculus, which allows the error to be decomposed and distributed backward through the network. The gradients quantify how much a change in each weight would affect the overall error of the network. Once the gradients are computed, the weights are updated in the opposite direction of the gradient to minimize the error. This update is typically performed using an optimization algorithm such as gradient descent, which adjusts the weights in proportion to their gradients and a learning rate hyperparameter. The learning rate determines the size of the step taken in the direction opposite to the gradient. These steps are repeated for multiple iterations (epochs) over the training data. As the training progresses, the weights are adjusted iteratively to minimize the error, leading to a neural network model that accurately classifies input data.

Activation Functions play a crucial role in neural networks by introducing non-linearity into the network, enabling the model to learn complex patterns and relationships within the data. In the context of a neural network classification model, activation functions are applied to the output of each neuron in the hidden layers to introduce non-linear mappings between the input and output, allowing the network to approximate complex functions and make non-linear decisions. Activation functions are significant during model development by introducing non-linearity (without activation functions, the neural network would simply be a series of linear transformations, no matter how many layers it has. Activation functions introduce non-linearities to the model, enabling it to learn and represent complex patterns and relationships in the data); propagating back gradients (activation functions help in the backpropagation algorithm by providing gradients that indicate the direction and magnitude of adjustments to the weights during training. These gradients are necessary for optimizing the network's parameters through techniques like gradient descent); and normalizing outputs (activation functions also help in normalizing the output of each neuron, ensuring that it falls within a specific range. This normalization prevents the activation values from becoming too large or too small, which can lead to numerical instability or saturation of gradients during training). The choice of activation function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.

Exponential Linear Unit Activation Function transforms a variable by returning the given value when it is greater than zero, but otherwise applying the exponential function to the value, subtracting one, and multiplying the result by a specified alpha. The resulting output, which ranges from negative alpha to positive infinity, is similar to a Leaky RELU profile but uses an exponential curve for negative inputs, allowing negative inputs to produce smooth negative outputs that saturate toward negative alpha. Its profile is smooth and differentiable, which helps speed up convergence and can lead to faster learning. This activation function can be a good choice when smoother and faster convergence are desired.
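For reference, a minimal standalone sketch of the ELU transform and its exact derivative is shown here, reusing the numpy import from the earlier cells; the function names are illustrative and are not part of the notebook's classes. For negative inputs the derivative alpha * exp(x) can equivalently be written as elu(x) + alpha.

def elu(x, alpha=1.0):
    # Returns x for positive inputs and alpha * (exp(x) - 1) for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def elu_derivative(x, alpha=1.0):
    # 1 for positive pre-activations, alpha * exp(x) otherwise
    return np.where(x > 0, 1.0, alpha * np.exp(x))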

  1. A neural network with the following structure was formulated:
    • Hidden Layer = 3
    • Number of Nodes per Hidden Layer = 4
  2. The backpropagation and gradient descent algorithms were implemented with parameter settings described as follows:
    • Learning Rate = 0.01
    • Epochs = 1000
    • Activation Function = Exponential Linear Unit (ELU)
  3. The final loss estimate of 0.09443 at the 1000th epoch was not optimally low compared to those obtained using the other parameter settings.
  4. Applying the backpropagation and gradient descent algorithms with an ELU activation function, the neural network model performance is estimated as follows:
    • Accuracy = 92.63803%
  5. The estimated classification accuracy using the backpropagation and gradient descent algorithms with an ELU activation function was optimal compared to those obtained using the other parameter settings, also demonstrating a consistently smooth profile during epoch training.
In [176]:
##################################
# Creating a class object
# for the neural network algorithm
# using an ELU activation function
##################################
class NeuralNetwork_ELU:
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size1)
        self.bias1 = np.zeros((1, hidden_size1))
        
        self.weights2 = np.random.randn(hidden_size1, hidden_size2)
        self.bias2 = np.zeros((1, hidden_size2))
        
        self.weights3 = np.random.randn(hidden_size2, hidden_size3)
        self.bias3 = np.zeros((1, hidden_size3))
        
        self.weights4 = np.random.randn(hidden_size3, output_size)
        self.bias4 = np.zeros((1, output_size))
        
        self.gradients = {'dw1': [], 'db1': [], 'dw2': [], 'db2': [], 'dw3': [], 'db3': [], 'dw4': [], 'db4': []}
        self.losses = []
        self.accuracies = []
        
    def elu(self, x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1))
    
    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exps / np.sum(exps, axis=1, keepdims=True)
    
    def forward(self, x):
        self.hidden1 = self.elu(np.dot(x, self.weights1) + self.bias1)
        self.hidden2 = self.elu(np.dot(self.hidden1, self.weights2) + self.bias2)
        self.hidden3 = self.elu(np.dot(self.hidden2, self.weights3) + self.bias3)
        self.output = self.softmax(np.dot(self.hidden3, self.weights4) + self.bias4)
        
        return self.output
    
    def backward(self, x, y, lr):
        m = x.shape[0]
        
        # Computing output layer gradients
        output_error = self.output - y
        output_delta = output_error / m
        
        # Computing hidden layer 3 gradients
        hidden3_error = np.dot(output_delta, self.weights4.T)
        hidden3_delta = hidden3_error * (self.hidden3 > 0) + (np.exp(self.hidden3) - 1) * (self.hidden3 <= 0)
        
        # Computing hidden layer 2 gradients
        hidden2_error = np.dot(hidden3_delta, self.weights3.T)
        hidden2_delta = hidden2_error * (self.hidden2 > 0) + (np.exp(self.hidden2) - 1) * (self.hidden2 <= 0)
        
        # Computing hidden layer 1 gradients
        hidden1_error = np.dot(hidden2_delta, self.weights2.T)
        hidden1_delta = hidden1_error * (self.hidden1 > 0) + (np.exp(self.hidden1) - 1) * (self.hidden1 <= 0)
        
        # Updating weights and biases based on computed gradients
        self.weights4 -= lr * np.dot(self.hidden3.T, output_delta)
        self.bias4 -= lr * np.sum(output_delta, axis=0, keepdims=True)
        
        self.weights3 -= lr * np.dot(self.hidden2.T, hidden3_delta)
        self.bias3 -= lr * np.sum(hidden3_delta, axis=0, keepdims=True)
        
        self.weights2 -= lr * np.dot(self.hidden1.T, hidden2_delta)
        self.bias2 -= lr * np.sum(hidden2_delta, axis=0, keepdims=True)
        
        self.weights1 -= lr * np.dot(x.T, hidden1_delta)
        self.bias1 -= lr * np.sum(hidden1_delta, axis=0, keepdims=True)
        
        # Storing updated gradients
        self.gradients['dw1'].append(np.mean(np.abs(hidden1_delta)))
        self.gradients['db1'].append(np.mean(np.abs(self.bias1)))
        self.gradients['dw2'].append(np.mean(np.abs(hidden2_delta)))
        self.gradients['db2'].append(np.mean(np.abs(self.bias2)))
        self.gradients['dw3'].append(np.mean(np.abs(hidden3_delta)))
        self.gradients['db3'].append(np.mean(np.abs(self.bias3)))
        self.gradients['dw4'].append(np.mean(np.abs(output_delta)))
        self.gradients['db4'].append(np.mean(np.abs(self.bias4)))
        
    def train(self, x, y, epochs, lr):
        for i in range(epochs):
            output = self.forward(x)
            self.backward(x, y, lr)
            loss = -np.mean(y * np.log(output))
            self.losses.append(loss)
            accuracy = self.accuracy(x, np.argmax(y, axis=1))
            self.accuracies.append(accuracy)
            if i % 100 == 0:
                print(f'Epoch {i}: Loss {loss}, Accuracy {accuracy}')
                
    def predict(self, x):
        return np.argmax(self.forward(x), axis=1)
    
    def accuracy(self, x, y):
        pred = self.predict(x)
        return np.mean(pred == y)
    
In [177]:
##################################
# Preparing the training data
##################################
X = matrix_x_values
y = y_values

##################################
# Performing a one-hot encoding
# of the target response labels
##################################
num_classes = 2
y_one_hot = np.eye(num_classes)[y]
In [178]:
##################################
# Defining the neural network components
##################################
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
hidden_size3 = 4
output_size = 2
In [179]:
##################################
# Initializing a neural network model object
# with ELU activation function
##################################
np.random.seed(88888)
nn_elu = NeuralNetwork_ELU(input_size, hidden_size1, hidden_size2, hidden_size3, output_size)
In [180]:
##################################
# Training a neural network model
# with ELU activation function
##################################
nn_elu.train(X, y_one_hot, epochs=1001, lr=0.01)
Epoch 0: Loss 0.6239532743548983, Accuracy 0.6748466257668712
Epoch 100: Loss 0.18559850810160425, Accuracy 0.7852760736196319
Epoch 200: Loss 0.1559148417573589, Accuracy 0.8711656441717791
Epoch 300: Loss 0.1318774124814263, Accuracy 0.901840490797546
Epoch 400: Loss 0.11492254996431317, Accuracy 0.9141104294478528
Epoch 500: Loss 0.10509129045149036, Accuracy 0.9263803680981595
Epoch 600: Loss 0.10021355309897699, Accuracy 0.9263803680981595
Epoch 700: Loss 0.09771434144678737, Accuracy 0.9263803680981595
Epoch 800: Loss 0.09611887321358292, Accuracy 0.9263803680981595
Epoch 900: Loss 0.09509628626392996, Accuracy 0.9263803680981595
Epoch 1000: Loss 0.09443192186232484, Accuracy 0.9263803680981595
In [181]:
##################################
# Plotting the computed gradients
# of the neural network model
# with ELU activation function
##################################
plt.figure(figsize=(10, 6))
for key, value in nn_elu.gradients.items():
    plt.plot(value, label=key)
plt.title('ELU Activation: Gradients by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Gradients')
plt.ylim(-0.05, 0.50)
plt.xlim(-50,1000)
plt.legend(loc="upper left")
plt.show()
In [182]:
##################################
# Plotting the computed cost
# of the neural network model
# with ELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_elu.losses)
plt.title('ELU Activation: Cost Function by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.ylim(0.05, 0.70)
plt.xlim(-50,1000)
plt.show()
In [183]:
##################################
# Plotting the computed accuracy
# of the neural network model
# with ELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_elu.accuracies)
plt.title('ELU Activation: Classification by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.ylim(0.00, 1.00)
plt.xlim(-50,1000)
plt.show()
In [184]:
##################################
# Gathering the final values for 
# accuracy and loss error
# using a ELU activation function
##################################
elu_metrics = pd.DataFrame(["ACCURACY","LOSS"])
elu_values = pd.DataFrame([nn_elu.accuracies[-1],nn_elu.losses[-1]])
elu_method = pd.DataFrame(["ELU"]*2)
elu_summary = pd.concat([elu_metrics,
                         elu_values,
                         elu_method], axis=1)
elu_summary.columns = ['Metric', 'Value', 'Method']
elu_summary.reset_index(inplace=True, drop=True)
display(elu_summary)
Metric Value Method
0 ACCURACY 0.926380 ELU
1 LOSS 0.094432 ELU

1.6.6 Scaled Exponential Linear Unit Activation Function ¶

Backpropagation and Weight Update, in the context of an artificial neural network, involve the process of iteratively adjusting the weights of the connections between neurons in the network to minimize the difference between the predicted and the actual target responses. Input data is fed into the neural network, and it propagates through the network layer by layer, starting from the input layer, through hidden layers, and ending at the output layer. At each neuron, the weighted sum of inputs is calculated, followed by the application of an activation function to produce the neuron's output. Once the forward pass is complete, the network's output is compared to the actual target output. The difference between the predicted output and the actual output is quantified using a loss function, which measures the discrepancy between the predicted and actual values. Common loss functions for classification tasks include cross-entropy loss. During the backward pass, the error is propagated backward through the network to compute the gradients of the loss function with respect to each weight in the network. This is achieved using the chain rule of calculus, which allows the error to be decomposed and distributed backward through the network. The gradients quantify how much a change in each weight would affect the overall error of the network. Once the gradients are computed, the weights are updated in the opposite direction of the gradient to minimize the error. This update is typically performed using an optimization algorithm such as gradient descent, which adjusts the weights in proportion to their gradients and a learning rate hyperparameter. The learning rate determines the size of the step taken in the direction opposite to the gradient. These steps are repeated for multiple iterations (epochs) over the training data. As the training progresses, the weights are adjusted iteratively to minimize the error, leading to a neural network model that accurately classifies input data.

Activation Functions play a crucial role in neural networks by introducing non-linearity into the network, enabling the model to learn complex patterns and relationships within the data. In the context of a neural network classification model, activation functions are applied to the output of each neuron in the hidden layers to introduce non-linear mappings between the input and output, allowing the network to approximate complex functions and make non-linear decisions. Activation functions are significant during model development by introducing non-linearity (without activation functions, the neural network would simply be a series of linear transformations, no matter how many layers it has. Activation functions introduce non-linearities to the model, enabling it to learn and represent complex patterns and relationships in the data); propagating back gradients (activation functions help in the backpropagation algorithm by providing gradients that indicate the direction and magnitude of adjustments to the weights during training. These gradients are necessary for optimizing the network's parameters through techniques like gradient descent); and normalizing outputs (activation functions also help in normalizing the output of each neuron, ensuring that it falls within a specific range. This normalization prevents the activation values from becoming too large or too small, which can lead to numerical instability or saturation of gradients during training). The choice of activation function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.

Scaled Exponential Linear Unit Activation Function transforms a variable by returning the given value when it is greater than zero, but otherwise applying the exponential function to the value, subtracting one, and multiplying the result by a specified alpha - with either branch then multiplied by a separately defined scaling factor. The resulting output, which ranges from the negative of the scaled alpha to positive infinity, is similar to an ELU profile but was designed to maintain a stable mean and variance of activations across layers, promoting self-normalization. This method provides a solution to the vanishing and exploding gradient problems in deep networks. This activation function is typically used in deep neural networks where normalizing the activations is crucial for stability.
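For reference, a minimal standalone sketch of the SELU transform and its exact derivative is shown here, using the standard self-normalizing constants and reusing the numpy import from the earlier cells; the function names are illustrative and are not part of the notebook's classes.

def selu(x, alpha=1.6732632423543772848170429916717, scale=1.0507009873554804934193349852946):
    # Returns scale * x for positive inputs and scale * alpha * (exp(x) - 1) otherwise
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))

def selu_derivative(x, alpha=1.6732632423543772848170429916717, scale=1.0507009873554804934193349852946):
    # scale for positive pre-activations, scale * alpha * exp(x) otherwise
    return scale * np.where(x > 0, 1.0, alpha * np.exp(x))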

  1. A neural network with the following structure was formulated:
    • Hidden Layer = 3
    • Number of Nodes per Hidden Layer = 4
  2. The backpropagation and gradient descent algorithms were implemented with parameter settings described as follows:
    • Learning Rate = 0.01
    • Epochs = 1000
    • Activation Function = Scaled Exponential Linear Unit (SELU)
  3. The final loss estimate of 0.09674 at the 1000th epoch was not optimally low compared to those obtained using the other parameter settings.
  4. Applying the backpropagation and gradient descent algorithms with a SELU activation function, the neural network model performance is estimated as follows:
    • Accuracy = 90.18404%
  5. The estimated classification accuracy using the backpropagation and gradient descent algorithms with a SELU activation function was optimal compared to those obtained using the other parameter settings, also demonstrating a consistently smooth profile during epoch training.
In [185]:
##################################
# Creating a class object
# for the neural network algorithm
# using a SELU activation function
##################################
class NeuralNetwork_SELU:
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size1)
        self.bias1 = np.zeros((1, hidden_size1))
        
        self.weights2 = np.random.randn(hidden_size1, hidden_size2)
        self.bias2 = np.zeros((1, hidden_size2))
        
        self.weights3 = np.random.randn(hidden_size2, hidden_size3)
        self.bias3 = np.zeros((1, hidden_size3))
        
        self.weights4 = np.random.randn(hidden_size3, output_size)
        self.bias4 = np.zeros((1, output_size))
        
        self.gradients = {'dw1': [], 'db1': [], 'dw2': [], 'db2': [], 'dw3': [], 'db3': [], 'dw4': [], 'db4': []}
        self.losses = []
        self.accuracies = []
        
    def selu(self, x):
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))
    
    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exps / np.sum(exps, axis=1, keepdims=True)
    
    def forward(self, x):
        self.hidden1 = self.selu(np.dot(x, self.weights1) + self.bias1)
        self.hidden2 = self.selu(np.dot(self.hidden1, self.weights2) + self.bias2)
        self.hidden3 = self.selu(np.dot(self.hidden2, self.weights3) + self.bias3)
        self.output = self.softmax(np.dot(self.hidden3, self.weights4) + self.bias4)
        
        return self.output
    
    def backward(self, x, y, lr):
        m = x.shape[0]
        
        # Computing output layer gradients
        output_error = self.output - y
        output_delta = output_error / m
        
        # Computing hidden layer 3 gradients
        hidden3_error = np.dot(output_delta, self.weights4.T)
        hidden3_delta = hidden3_error * (self.hidden3 > 0)
        
        # Computing hidden layer 2 gradients
        hidden2_error = np.dot(hidden3_delta, self.weights3.T)
        hidden2_delta = hidden2_error * (self.hidden2 > 0)
        
        # Computing hidden layer 1 gradients
        hidden1_error = np.dot(hidden2_delta, self.weights2.T)
        hidden1_delta = hidden1_error * (self.hidden1 > 0)
        
        # Updating weights and biases based on computed gradients
        self.weights4 -= lr * np.dot(self.hidden3.T, output_delta)
        self.bias4 -= lr * np.sum(output_delta, axis=0, keepdims=True)
        
        self.weights3 -= lr * np.dot(self.hidden2.T, hidden3_delta)
        self.bias3 -= lr * np.sum(hidden3_delta, axis=0, keepdims=True)
        
        self.weights2 -= lr * np.dot(self.hidden1.T, hidden2_delta)
        self.bias2 -= lr * np.sum(hidden2_delta, axis=0, keepdims=True)
        
        self.weights1 -= lr * np.dot(x.T, hidden1_delta)
        self.bias1 -= lr * np.sum(hidden1_delta, axis=0, keepdims=True)
        
        # Storing computed gradients
        self.gradients['dw1'].append(np.mean(np.abs(hidden1_delta)))
        self.gradients['db1'].append(np.mean(np.abs(self.bias1)))
        self.gradients['dw2'].append(np.mean(np.abs(hidden2_delta)))
        self.gradients['db2'].append(np.mean(np.abs(self.bias2)))
        self.gradients['dw3'].append(np.mean(np.abs(hidden3_delta)))
        self.gradients['db3'].append(np.mean(np.abs(self.bias3)))
        self.gradients['dw4'].append(np.mean(np.abs(output_delta)))
        self.gradients['db4'].append(np.mean(np.abs(self.bias4)))
        
    def train(self, x, y, epochs, lr):
        for i in range(epochs):
            output = self.forward(x)
            self.backward(x, y, lr)
            loss = -np.mean(y * np.log(output))
            self.losses.append(loss)
            accuracy = self.accuracy(x, np.argmax(y, axis=1))
            self.accuracies.append(accuracy)
            if i % 100 == 0:
                print(f'Epoch {i}: Loss {loss}, Accuracy {accuracy}')
                
    def predict(self, x):
        return np.argmax(self.forward(x), axis=1)
    
    def accuracy(self, x, y):
        pred = self.predict(x)
        return np.mean(pred == y)
In [186]:
##################################
# Preparing the training data
##################################
X = matrix_x_values
y = y_values

##################################
# Performing a one-hot encoding
# of the target response labels
##################################
num_classes = 2
y_one_hot = np.eye(num_classes)[y]
In [187]:
##################################
# Defining the neural network components
##################################
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
hidden_size3 = 4
output_size = 2
In [188]:
##################################
# Initializing a neural network model object
# with SELU activation function
##################################
np.random.seed(88888)
nn_selu = NeuralNetwork_SELU(input_size, hidden_size1, hidden_size2, hidden_size3, output_size)
In [189]:
##################################
# Training a neural network model
# with SELU activation function
##################################
nn_selu.train(X, y_one_hot, epochs=1001, lr=0.01)
Epoch 0: Loss 1.0981323160350691, Accuracy 0.3006134969325153
Epoch 100: Loss 0.1944859567995403, Accuracy 0.8343558282208589
Epoch 200: Loss 0.15078198879619675, Accuracy 0.8773006134969326
Epoch 300: Loss 0.13087303505062364, Accuracy 0.8773006134969326
Epoch 400: Loss 0.11912606258714759, Accuracy 0.8773006134969326
Epoch 500: Loss 0.111495190606054, Accuracy 0.8895705521472392
Epoch 600: Loss 0.10641661771976144, Accuracy 0.901840490797546
Epoch 700: Loss 0.1028003852939949, Accuracy 0.901840490797546
Epoch 800: Loss 0.10017443966015926, Accuracy 0.901840490797546
Epoch 900: Loss 0.09822970379617377, Accuracy 0.8957055214723927
Epoch 1000: Loss 0.09674886443517193, Accuracy 0.901840490797546
In [190]:
##################################
# Plotting the computed gradients
# of the neural network model
# with SELU activation function
##################################
plt.figure(figsize=(10, 6))
for key, value in nn_selu.gradients.items():
    plt.plot(value, label=key)
plt.title('SELU Activation: Gradients by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Gradients')
plt.ylim(-0.05, 0.50)
plt.xlim(-50,1000)
plt.legend(loc="upper left")
plt.show()
In [191]:
##################################
# Plotting the computed cost
# of the neural network model
# with SELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_selu.losses)
plt.title('SELU Activation: Cost Function by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.ylim(0.05, 0.70)
plt.xlim(-50,1000)
plt.show()
In [192]:
##################################
# Plotting the computed accuracy
# of the neural network model
# with SELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_selu.accuracies)
plt.title('SELU Activation: Classification by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.ylim(0.00, 1.00)
plt.xlim(-50,1000)
plt.show()
In [193]:
##################################
# Gathering the final values for 
# accuracy and loss error
# using a SELU activation function
##################################
selu_metrics = pd.DataFrame(["ACCURACY","LOSS"])
selu_values = pd.DataFrame([nn_selu.accuracies[-1],nn_selu.losses[-1]])
selu_method = pd.DataFrame(["SELU"]*2)
selu_summary = pd.concat([selu_metrics,
                          selu_values,
                          selu_method], axis=1)
selu_summary.columns = ['Metric', 'Value', 'Method']
selu_summary.reset_index(inplace=True, drop=True)
display(selu_summary)
Metric Value Method
0 ACCURACY 0.901840 SELU
1 LOSS 0.096749 SELU

1.6.7 Randomized Leaky Rectified Linear Unit Activation Function ¶

Backpropagation and Weight Update, in the context of an artificial neural network, involve the process of iteratively adjusting the weights of the connections between neurons in the network to minimize the difference between the predicted and the actual target responses. Input data is fed into the neural network, and it propagates through the network layer by layer, starting from the input layer, through hidden layers, and ending at the output layer. At each neuron, the weighted sum of inputs is calculated, followed by the application of an activation function to produce the neuron's output. Once the forward pass is complete, the network's output is compared to the actual target output. The difference between the predicted output and the actual output is quantified using a loss function, which measures the discrepancy between the predicted and actual values. Common loss functions for classification tasks include cross-entropy loss. During the backward pass, the error is propagated backward through the network to compute the gradients of the loss function with respect to each weight in the network. This is achieved using the chain rule of calculus, which allows the error to be decomposed and distributed backward through the network. The gradients quantify how much a change in each weight would affect the overall error of the network. Once the gradients are computed, the weights are updated in the opposite direction of the gradient to minimize the error. This update is typically performed using an optimization algorithm such as gradient descent, which adjusts the weights in proportion to their gradients and a learning rate hyperparameter. The learning rate determines the size of the step taken in the direction opposite to the gradient. These steps are repeated for multiple iterations (epochs) over the training data. As the training progresses, the weights are adjusted iteratively to minimize the error, leading to a neural network model that accurately classifies input data.

Activation Functions play a crucial role in neural networks by introducing non-linearity into the network, enabling the model to learn complex patterns and relationships within the data. In the context of a neural network classification model, activation functions are applied to the output of each neuron in the hidden layers to introduce non-linear mappings between the input and output, allowing the network to approximate complex functions and make non-linear decisions. Activation functions are significant during model development by introducing non-linearity (without activation functions, the neural network would simply be a series of linear transformations, no matter how many layers it has. Activation functions introduce non-linearities to the model, enabling it to learn and represent complex patterns and relationships in the data); propagating back gradients (activation functions help in the backpropagation algorithm by providing gradients that indicate the direction and magnitude of adjustments to the weights during training. These gradients are necessary for optimizing the network's parameters through techniques like gradient descent); and normalizing outputs (activation functions also help in normalizing the output of each neuron, ensuring that it falls within a specific range. This normalization prevents the activation values from becoming too large or too small, which can lead to numerical instability or saturation of gradients during training). The choice of activation function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.

Randomized Leaky Rectified Linear Unit Activation Function transforms a variable by returning the given value when it is greater than zero, but otherwise multiplying it by a small slope drawn at random from a uniform distribution. The resulting output, which ranges from negative infinity to positive infinity, is similar to a Leaky RELU profile, but the negative-region slope is randomly sampled during training (and typically fixed to its expected value at inference) rather than held constant. This process introduces randomness into the activations, acting as a form of regularization and reducing overfitting. It provides a trade-off between the benefits of RELU and Leaky RELU along with the regularization effect of randomness. This activation function is typically useful when dealing with overfitting or when training large neural networks where regularization is necessary.
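For reference, a minimal standalone sketch of the RReLU transform and its derivative is shown here, reusing the numpy import from the earlier cells; the function names and the sampling range are illustrative and are not part of the notebook's classes. Because the negative-region slope is sampled at random during training, the sampled alpha has to be reused in the backward pass for the gradient to be exact, while a fixed slope (for example, the midpoint of the sampling range) is typically used at inference.

def rrelu(x, lower=0.01, upper=0.10, training=True):
    # Samples a random negative-region slope during training;
    # uses the midpoint of the range as a fixed slope at inference
    alpha = np.random.uniform(lower, upper) if training else (lower + upper) / 2
    return np.where(x > 0, x, alpha * x), alpha

def rrelu_derivative(x, alpha):
    # 1 for positive pre-activations, the sampled alpha otherwise
    return np.where(x > 0, 1.0, alpha)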

  1. A neural network with the following structure was formulated:
    • Hidden Layer = 3
    • Number of Nodes per Hidden Layer = 4
  2. The backpropagation and gradient descent algorithms were implemented with parameter settings described as follows:
    • Learning Rate = 0.01
    • Epochs = 1000
    • Activation Function = Randomized Leaky Rectified Linear Unit (RReLU)
  3. The final loss estimate of 0.09765 at the 1000th epoch was not optimally low compared to those obtained using the other parameter settings.
  4. Applying the backpropagation and gradient descent algorithms with an RReLU activation function, the neural network model performance is estimated as follows:
    • Accuracy = 92.63803%
  5. The estimated classification accuracy using the backpropagation and gradient descent algorithms with an RReLU activation function was optimal compared to those obtained using the other parameter settings, also demonstrating a consistently smooth profile during epoch training.
In [194]:
##################################
# Creating a class object
# for the neural network algorithm
# using a RRELU activation function
##################################
class NeuralNetwork_RRELU:
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, output_size):
        self.weights1 = np.random.randn(input_size, hidden_size1)
        self.bias1 = np.zeros((1, hidden_size1))
        
        self.weights2 = np.random.randn(hidden_size1, hidden_size2)
        self.bias2 = np.zeros((1, hidden_size2))
        
        self.weights3 = np.random.randn(hidden_size2, hidden_size3)
        self.bias3 = np.zeros((1, hidden_size3))
        
        self.weights4 = np.random.randn(hidden_size3, output_size)
        self.bias4 = np.zeros((1, output_size))
        
        self.gradients = {'dw1': [], 'db1': [], 'dw2': [], 'db2': [], 'dw3': [], 'db3': [], 'dw4': [], 'db4': []}
        self.losses = []
        self.accuracies = []
        
    def randomized_leaky_relu(self, x):
        alpha = np.random.uniform(0.01, 0.1)  # Randomly generate alpha in the range [0.01, 0.1]
        return np.where(x > 0, x, alpha * x)
    
    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exps / np.sum(exps, axis=1, keepdims=True)
    
    def forward(self, x):
        self.hidden1 = self.randomized_leaky_relu(np.dot(x, self.weights1) + self.bias1)
        self.hidden2 = self.randomized_leaky_relu(np.dot(self.hidden1, self.weights2) + self.bias2)
        self.hidden3 = self.randomized_leaky_relu(np.dot(self.hidden2, self.weights3) + self.bias3)
        self.output = self.softmax(np.dot(self.hidden3, self.weights4) + self.bias4)
        
        return self.output
    
    def backward(self, x, y, lr):
        m = x.shape[0]
        
        # Computing output layer gradients
        output_error = self.output - y
        output_delta = output_error / m
        
        # Computing hidden layer 3 gradients
        hidden3_error = np.dot(output_delta, self.weights4.T)
        hidden3_delta = hidden3_error * (self.hidden3 > 0) + 0.01 * hidden3_error * (self.hidden3 <= 0)
        
        # Computing hidden layer 2 gradients
        hidden2_error = np.dot(hidden3_delta, self.weights3.T)
        hidden2_delta = hidden2_error * (self.hidden2 > 0) + 0.01 * hidden2_error * (self.hidden2 <= 0)
        
        # Computing hidden layer 1 gradients
        hidden1_error = np.dot(hidden2_delta, self.weights2.T)
        hidden1_delta = hidden1_error * (self.hidden1 > 0) + 0.01 * hidden1_error * (self.hidden1 <= 0)
        
        # Updating weights and biases based on computed gradients
        self.weights4 -= lr * np.dot(self.hidden3.T, output_delta)
        self.bias4 -= lr * np.sum(output_delta, axis=0, keepdims=True)
        
        self.weights3 -= lr * np.dot(self.hidden2.T, hidden3_delta)
        self.bias3 -= lr * np.sum(hidden3_delta, axis=0, keepdims=True)
        
        self.weights2 -= lr * np.dot(self.hidden1.T, hidden2_delta)
        self.bias2 -= lr * np.sum(hidden2_delta, axis=0, keepdims=True)
        
        self.weights1 -= lr * np.dot(x.T, hidden1_delta)
        self.bias1 -= lr * np.sum(hidden1_delta, axis=0, keepdims=True)
        
        # Storing gradient-related diagnostics
        # ('dw*' hold mean absolute layer deltas and 'db*' hold mean absolute bias
        # values, used as proxies for the gradient magnitudes in the plots below)
        self.gradients['dw1'].append(np.mean(np.abs(hidden1_delta)))
        self.gradients['db1'].append(np.mean(np.abs(self.bias1)))
        self.gradients['dw2'].append(np.mean(np.abs(hidden2_delta)))
        self.gradients['db2'].append(np.mean(np.abs(self.bias2)))
        self.gradients['dw3'].append(np.mean(np.abs(hidden3_delta)))
        self.gradients['db3'].append(np.mean(np.abs(self.bias3)))
        self.gradients['dw4'].append(np.mean(np.abs(output_delta)))
        self.gradients['db4'].append(np.mean(np.abs(self.bias4)))
        
    def train(self, x, y, epochs, lr):
        for i in range(epochs):
            output = self.forward(x)
            self.backward(x, y, lr)
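            # Categorical cross-entropy, averaged over all samples and classes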
            loss = -np.mean(y * np.log(output))
            self.losses.append(loss)
            accuracy = self.accuracy(x, np.argmax(y, axis=1))
            self.accuracies.append(accuracy)
            if i % 100 == 0:
                print(f'Epoch {i}: Loss {loss}, Accuracy {accuracy}')
                
    def predict(self, x):
        return np.argmax(self.forward(x), axis=1)
    
    def accuracy(self, x, y):
        pred = self.predict(x)
        return np.mean(pred == y)
    
In [195]:
##################################
# Preparing the training data
##################################
X = matrix_x_values
y = y_values

##################################
# Performing a one-hot encoding
# of the target response labels
##################################
num_classes = 2
y_one_hot = np.eye(num_classes)[y]
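# (e.g., y = [0, 1, 0] becomes [[1., 0.], [0., 1.], [1., 0.]])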
In [196]:
##################################
# Defining the neural network components
##################################
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
hidden_size3 = 4
output_size = 2
In [197]:
##################################
# Initializing a neural network model object
# with RRELU activation function
##################################
np.random.seed(88888)
nn_rrelu = NeuralNetwork_RRELU(input_size, hidden_size1, hidden_size2, hidden_size3, output_size)
In [198]:
##################################
# Training a neural network model
# with RRELU activation function
##################################
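# (1001 iterations are requested so that the metrics at the 1000th epoch are logged)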
nn_rrelu.train(X, y_one_hot, epochs=1001, lr=0.01)
Epoch 0: Loss 0.42536277001068584, Accuracy 0.3496932515337423
Epoch 100: Loss 0.17225380216691843, Accuracy 0.901840490797546
Epoch 200: Loss 0.13053451403709007, Accuracy 0.9079754601226994
Epoch 300: Loss 0.11370320839132757, Accuracy 0.8957055214723927
Epoch 400: Loss 0.10982347564770703, Accuracy 0.901840490797546
Epoch 500: Loss 0.10483990809689303, Accuracy 0.9263803680981595
Epoch 600: Loss 0.1015575263421128, Accuracy 0.9325153374233128
Epoch 700: Loss 0.09951830751495384, Accuracy 0.9202453987730062
Epoch 800: Loss 0.09799491669757471, Accuracy 0.9325153374233128
Epoch 900: Loss 0.09839336828285061, Accuracy 0.9263803680981595
Epoch 1000: Loss 0.09765980921693111, Accuracy 0.9263803680981595
In [199]:
##################################
# Plotting the computed gradients
# of the neural network model
# with RRELU activation function
##################################
plt.figure(figsize=(10, 6))
for key, value in nn_rrelu.gradients.items():
    plt.plot(value, label=key)
plt.title('RRELU Activation: Gradients by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Gradients')
plt.ylim(-0.05, 0.50)
plt.xlim(-50,1000)
plt.legend(loc="upper left")
plt.show()
[Figure: RRELU Activation: Gradients by Iteration]
In [200]:
##################################
# Plotting the computed cost
# of the neural network model
# with RRELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_rrelu.losses)
plt.title('RRELU Activation: Cost Function by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.ylim(0.05, 0.70)
plt.xlim(-50,1000)
plt.show()
[Figure: RRELU Activation: Cost Function by Iteration]
In [201]:
##################################
# Plotting the computed accuracy
# of the neural network model
# with RRELU activation function
##################################
plt.figure(figsize=(10, 6))
plt.plot(nn_rrelu.accuracies)
plt.title('RRELU Activation: Classification Accuracy by Iteration')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.ylim(0.00, 1.00)
plt.xlim(-50,1000)
plt.show()
[Figure: RRELU Activation: Classification Accuracy by Iteration]
In [202]:
##################################
# Gathering the final values for 
# accuracy and loss error
# using a RRELU activation function
##################################
rrelu_metrics = pd.DataFrame(["ACCURACY","LOSS"])
rrelu_values = pd.DataFrame([nn_rrelu.accuracies[-1],nn_rrelu.losses[-1]])
rrelu_method = pd.DataFrame(["RRELU"]*2)
rrelu_summary = pd.concat([rrelu_metrics,
                           rrelu_values,
                           rrelu_method], axis=1)
rrelu_summary.columns = ['Metric', 'Value', 'Method']
rrelu_summary.reset_index(inplace=True, drop=True)
display(rrelu_summary)
Metric Value Method
0 ACCURACY 0.92638 RRELU
1 LOSS 0.09766 RRELU

1.7. Consolidated Findings ¶

  1. This activation function showed the presence of the vanishing gradients problem, resulting in a relatively higher loss and lower classification accuracy:
    • Sigmoid = Sigmoid Activation Function
  2. These activation functions demonstrated the absence of the vanishing gradients problem, resulting in a sufficiently low loss and higher classification accuracy:
    • RELU = Rectified Linear Unit Activation Function
    • Leaky_RELU = Leaky Rectified Linear Unit Activation Function
    • ELU = Exponential Linear Unit Activation Function
    • SELU = Scaled Exponential Linear Unit Activation Function
    • RRELU = Randomized Leaky Rectified Linear Unit Activation Function
  3. The choice of Activation Function can significantly impact the performance and training dynamics of a neural network classification model, making it an important consideration during model development. Different activation functions have different properties, as illustrated in the short sketch below, and selecting the appropriate one depends on factors such as the nature of the problem, the characteristics of the data, and the desired behavior of the network.
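To make this comparison concrete, the illustrative sketch below (not part of the modelling code above) evaluates the six activation functions on the same sample inputs so their behavior for negative values and around zero can be inspected side by side; the ELU alpha of 1.0, the commonly cited SELU constants (scale of about 1.0507 and alpha of about 1.6733), and the RReLU slope range of 0.01 to 0.1 are assumptions made for illustration only:

##################################
# Illustrative side-by-side evaluation
# of the six activation functions
# (not part of the model code)
##################################
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

sigmoid    = 1.0 / (1.0 + np.exp(-x))
relu       = np.maximum(0.0, x)
leaky_relu = np.where(x > 0, x, 0.01 * x)
elu        = np.where(x > 0, x, 1.0 * (np.exp(x) - 1.0))
selu       = 1.0507 * np.where(x > 0, x, 1.6733 * (np.exp(x) - 1.0))  # commonly cited constants
rrelu      = np.where(x > 0, x, np.random.uniform(0.01, 0.1) * x)     # random negative-region slope

for name, out in [('Sigmoid', sigmoid), ('RELU', relu), ('Leaky_RELU', leaky_relu),
                  ('ELU', elu), ('SELU', selu), ('RRELU', rrelu)]:
    print(f'{name:<10}: {np.round(out, 3)}')

In such a side-by-side evaluation, Sigmoid saturates toward zero for strongly negative inputs (which is where vanishing gradients arise), while the RELU-family functions pass through either the raw value or a small scaled value, helping preserve gradient flow.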
In [203]:
##################################
# Consolidating all the
# model performance metrics
##################################
model_performance_comparison = pd.concat([sigmoid_summary, 
                                          relu_summary,
                                          leakyrelu_summary, 
                                          elu_summary,
                                          selu_summary, 
                                          rrelu_summary], 
                                         ignore_index=True)
print('Neural Network Model Comparison: ')
display(model_performance_comparison)
Neural Network Model Comparison: 
Metric Value Method
0 ACCURACY 0.748466 Sigmoid
1 LOSS 0.278053 Sigmoid
2 ACCURACY 0.932515 RELU
3 LOSS 0.096237 RELU
4 ACCURACY 0.926380 Leaky_RELU
5 LOSS 0.093777 Leaky_RELU
6 ACCURACY 0.926380 ELU
7 LOSS 0.094432 ELU
8 ACCURACY 0.901840 SELU
9 LOSS 0.096749 SELU
10 ACCURACY 0.926380 RRELU
11 LOSS 0.097660 RRELU
In [204]:
##################################
# Consolidating the values for the
# accuracy metrics
# for all models
##################################
model_performance_comparison_accuracy = model_performance_comparison[model_performance_comparison['Metric']=='ACCURACY'].copy()
model_performance_comparison_accuracy.reset_index(inplace=True, drop=True)
model_performance_comparison_accuracy
Out[204]:
Metric Value Method
0 ACCURACY 0.748466 Sigmoid
1 ACCURACY 0.932515 RELU
2 ACCURACY 0.926380 Leaky_RELU
3 ACCURACY 0.926380 ELU
4 ACCURACY 0.901840 SELU
5 ACCURACY 0.926380 RRELU
In [205]:
##################################
# Plotting the values for the
# accuracy metrics
# for all models
##################################
fig, ax = plt.subplots(figsize=(7, 7))
accuracy_hbar = ax.barh(model_performance_comparison_accuracy['Method'], model_performance_comparison_accuracy['Value'])
ax.set_xlabel("Accuracy")
ax.set_ylabel("Neural Network Classification Models")
ax.bar_label(accuracy_hbar, fmt='%.5f', padding=-50, color='white', fontweight='bold')
ax.set_xlim(0,1)
plt.show()
[Figure: Accuracy by Neural Network Classification Model]
In [206]:
##################################
# Consolidating the values for the
# logarithmic loss error metrics
# for all models
##################################
model_performance_comparison_loss = model_performance_comparison[model_performance_comparison['Metric']=='LOSS'].copy()
model_performance_comparison_loss.reset_index(inplace=True, drop=True)
model_performance_comparison_loss
Out[206]:
Metric Value Method
0 LOSS 0.278053 Sigmoid
1 LOSS 0.096237 RELU
2 LOSS 0.093777 Leaky_RELU
3 LOSS 0.094432 ELU
4 LOSS 0.096749 SELU
5 LOSS 0.097660 RRELU
In [207]:
##################################
# Plotting the values for the
# loss error
# for all models
##################################
fig, ax = plt.subplots(figsize=(7, 7))
loss_hbar = ax.barh(model_performance_comparison_loss['Method'], model_performance_comparison_loss['Value'])
ax.set_xlabel("Loss Error")
ax.set_ylabel("Neural Network Classification Models")
ax.bar_label(loss_hbar, fmt='%.5f', padding=-50, color='white', fontweight='bold')
ax.set_xlim(0,0.40)
plt.show()
[Figure: Loss Error by Neural Network Classification Model]

2. Summary ¶

[Figure: Project48_Summary.png]

3. References ¶

  • [Book] Deep Learning: A Visual Approach by Andrew Glassner
  • [Book] Deep Learning with Python by François Chollet
  • [Book] The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman
  • [Book] Data Preparation for Machine Learning: Data Cleaning, Feature Selection, and Data Transforms in Python by Jason Brownlee
  • [Book] Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson
  • [Book] Feature Engineering for Machine Learning by Alice Zheng and Amanda Casari
  • [Book] Applied Predictive Modeling by Max Kuhn and Kjell Johnson
  • [Book] Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten, Eibe Frank, Mark Hall and Christopher Pal
  • [Book] Data Cleaning by Ihab Ilyas and Xu Chu
  • [Book] Data Wrangling with Python by Jacqueline Kazil and Katharine Jarmul
  • [Book] Regression Modeling Strategies by Frank Harrell
  • [Python Library API] NumPy by NumPy Team
  • [Python Library API] pandas by Pandas Team
  • [Python Library API] seaborn by Seaborn Team
  • [Python Library API] matplotlib.pyplot by MatPlotLib Team
  • [Python Library API] itertools by Python Team
  • [Python Library API] operator by Python Team
  • [Python Library API] sklearn.experimental by Scikit-Learn Team
  • [Python Library API] sklearn.impute by Scikit-Learn Team
  • [Python Library API] sklearn.linear_model by Scikit-Learn Team
  • [Python Library API] sklearn.preprocessing by Scikit-Learn Team
  • [Python Library API] scipy by SciPy Team
  • [Article] Exploratory Data Analysis in Python — A Step-by-Step Process by Andrea D'Agostino (Towards Data Science)
  • [Article] Exploratory Data Analysis with Python by Douglas Rocha (Medium)
  • [Article] 4 Ways to Automate Exploratory Data Analysis (EDA) in Python by Abdishakur Hassan (BuiltIn)
  • [Article] 10 Things To Do When Conducting Your Exploratory Data Analysis (EDA) by Alifia Harmadi (Medium)
  • [Article] How to Handle Missing Data with Python by Jason Brownlee (Machine Learning Mastery)
  • [Article] Statistical Imputation for Missing Values in Machine Learning by Jason Brownlee (Machine Learning Mastery)
  • [Article] Imputing Missing Data with Simple and Advanced Techniques by Idil Ismiguzel (Towards Data Science)
  • [Article] Missing Data Imputation Approaches | How to handle missing values in Python by Selva Prabhakaran (Machine Learning +)
  • [Article] Master The Skills Of Missing Data Imputation Techniques In Python(2022) And Be Successful by Mrinal Walia (Analytics Vidhya)
  • [Article] How to Preprocess Data in Python by Afroz Chakure (BuiltIn)
  • [Article] Easy Guide To Data Preprocessing In Python by Ahmad Anis (KDNuggets)
  • [Article] Data Preprocessing in Python by Tarun Gupta (Towards Data Science)
  • [Article] Data Preprocessing using Python by Suneet Jain (Medium)
  • [Article] Data Preprocessing in Python by Abonia Sojasingarayar (Medium)
  • [Article] Data Preprocessing in Python by Afroz Chakure (Medium)
  • [Article] Detecting and Treating Outliers | Treating the Odd One Out! by Harika Bonthu (Analytics Vidhya)
  • [Article] Outlier Treatment with Python by Sangita Yemulwar (Analytics Vidhya)
  • [Article] A Guide to Outlier Detection in Python by Sadrach Pierre (BuiltIn)
  • [Article] How To Find Outliers in Data Using Python (and How To Handle Them) by Eric Kleppen (Career Foundry)
  • [Article] Statistics in Python — Collinearity and Multicollinearity by Wei-Meng Lee (Towards Data Science)
  • [Article] Understanding Multicollinearity and How to Detect it in Python by Terence Shin (Towards Data Science)
  • [Article] A Python Library to Remove Collinearity by Gianluca Malato (Your Data Teacher)
  • [Article] 8 Best Data Transformation in Pandas by Tirendaz AI (Medium)
  • [Article] Data Transformation Techniques with Python: Elevate Your Data Game! by Siddharth Verma (Medium)
  • [Article] Data Scaling with Python by Benjamin Obi Tayo (KDNuggets)
  • [Article] How to Use StandardScaler and MinMaxScaler Transforms in Python by Jason Brownlee (Machine Learning Mastery)
  • [Article] Feature Engineering: Scaling, Normalization, and Standardization by Aniruddha Bhandari (Analytics Vidhya)
  • [Article] How to Normalize Data Using scikit-learn in Python by Jayant Verma (Digital Ocean)
  • [Article] What are Categorical Data Encoding Methods | Binary Encoding by Shipra Saxena (Analytics Vidhya)
  • [Article] Guide to Encoding Categorical Values in Python by Chris Moffitt (Practical Business Python)
  • [Article] Categorical Data Encoding Techniques in Python: A Complete Guide by Soumen Atta (Medium)
  • [Article] Categorical Feature Encoding Techniques by Tara Boyle (Medium)
  • [Article] Ordinal and One-Hot Encodings for Categorical Data by Jason Brownlee (Machine Learning Mastery)
  • [Article] Hypothesis Testing with Python: Step by Step Hands-On Tutorial with Practical Examples by Ece Işık Polat (Towards Data Science)
  • [Article] 17 Statistical Hypothesis Tests in Python (Cheat Sheet) by Jason Brownlee (Machine Learning Mastery)
  • [Article] A Step-by-Step Guide to Hypothesis Testing in Python using Scipy by Gabriel Rennó (Medium)
  • [Article] How Does Backpropagation in a Neural Network Work? by Anas Al-Masri (Builtin)
  • [Article] A Step by Step Backpropagation Example by Matt Mazur (MattMazur.Com)
  • [Article] Understanding Backpropagation by Brent Scarff (Towards Data Science)
  • [Article] Understanding Backpropagation Algorithm by Simeon Kostadinov (Towards Data Science)
  • [Article] A Comprehensive Guide to the Backpropagation Algorithm in Neural Networks by Ahmed Gad (Neptune.AI)
  • [Article] Backpropagation by John McGonagle, George Shaikouski and Christopher Williams (Brilliant)
  • [Article] Backpropagation in Neural Networks by Inna Logunova (Serokell.IO)
  • [Article] Backpropagation Concept Explained in 5 Levels of Difficulty by Devashish Sood (Medium)
  • [Article] BackProp Explainer by Donny Bertucci (GitHub)
  • [Article] Backpropagation Algorithm in Neural Network and Machine Learning by Intellipaat Team
  • [Article] Understanding Backpropagation in Neural Networks by Tech-AI-Math Team
  • [Article] Backpropagation Neural Network using Python by Avinash Navlani (Machine Learning Geek)
  • [Article] Back Propagation in Neural Network: Machine Learning Algorithm by Daniel Johnson (Guru99)
  • [Article] What is Backpropagation? by Thomas Wood (DeepAI.Org)
  • [Article] Activation Functions in Neural Networks [12 Types & Use Cases] by Pragati Baheti (V7.Com)
  • [Article] Activation Functions in Neural Networks by Sagar Sharma (Towards Data Science)
  • [Article] Comparison of Sigmoid, Tanh and ReLU Activation Functions by Sandeep Kumar (AItude.Com)
  • [Article] How to Choose an Activation Function for Deep Learning by Jason Brownlee (Machine Learning Mastery)
  • [Article] Choosing the Right Activation Function in Deep Learning: A Practical Overview and Comparison by Okan Yenigün (Medium)
  • [Article] Activation Functions in Neural Networks by Geeks For Geeks Team
  • [Article] A Practical Comparison of Activation Functions by Danny Denenberg (Medium)
  • [Article] Activation Functions in Neural Networks: With 15 examples by Nikolaj Buhl (Encord.Com)
  • [Article] Activation functions used in Neural Networks - Which is Better? by Anish Singh Walia (Medium)
  • [Article] 6 Types of Activation Function in Neural Networks You Need to Know by Kechit Goyal (UpGrad.Com)
  • [Article] Activation Functions in Neural Networks by SuperAnnotate Team
  • [Article] Compare Activation Layers by MathWorks Team
  • [Article] Activation Functions In Neural Networks by Kurtis Pykes (Comet.Com)
  • [Article] ReLU vs. Sigmoid Function in Deep Neural Networks by Ayush Thakur (Wandb.AI)
  • [Article] Using Activation Functions in Neural Networks by Jason Brownlee (Machine Learning Mastery)
  • [Article] Activation Function: Top 9 Most Popular Explained & When To Use Them by Neri Van Otten (SpotIntelligence.Com)
  • [Article] 5 Deep Learning and Neural Network Activation Functions to Know by Artem Oppermann (BuiltIn.Com)
  • [Article] Activation Functions in Deep Learning: Sigmoid, tanh, ReLU by Artem Oppermann
  • [Article] 7 Types of Activation Functions in Neural Network by Dinesh Kumawat (AnalyticsSteps.Com)
  • [Article] What is an Activation Function? A Complete Guide by Petru Potrimba (RoboFlow.Com)
  • [Publication] Data Quality for Machine Learning Tasks by Nitin Gupta, Shashank Mujumdar, Hima Patel, Satoshi Masuda, Naveen Panwar, Sambaran Bandyopadhyay, Sameep Mehta, Shanmukha Guttula, Shazia Afzal, Ruhi Sharma Mittal and Vitobha Munigala (KDD ’21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining)
  • [Publication] Overview and Importance of Data Quality for Machine Learning Tasks by Abhinav Jain, Hima Patel, Lokesh Nagalapatti, Nitin Gupta, Sameep Mehta, Shanmukha Guttula, Shashank Mujumdar, Shazia Afzal, Ruhi Sharma Mittal and Vitobha Munigala (KDD ’20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining)
  • [Publication] Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification by Stef van Buuren (Statistical Methods in Medical Research)
  • [Publication] Mathematical Contributions to the Theory of Evolution: Regression, Heredity and Panmixia by Karl Pearson (Royal Society)
  • [Publication] A New Family of Power Transformations to Improve Normality or Symmetry by In-Kwon Yeo and Richard Johnson (Biometrika)
  • [Course] IBM Data Analyst Professional Certificate by IBM Team (Coursera)
  • [Course] IBM Data Science Professional Certificate by IBM Team (Coursera)
  • [Course] IBM Machine Learning Professional Certificate by IBM Team (Coursera)
  • [Course] Machine Learning Specialization Certificate by DeepLearning.AI Team (Coursera)

In [208]:
from IPython.display import display, HTML
display(HTML("<style>.rendered_html { font-size: 15px; font-family: 'Trebuchet MS'; }</style>"))