Bank Churn Prediction

Problem Statement¶

Context¶

Businesses like banks which provide service have to worry about problem of 'Customer Churn' i.e. customers leaving and joining another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard. Management can concentrate efforts on improvement of service, keeping in mind these priorities.

Objective¶

You as a Data scientist with the bank need to build a neural network based classifier that can determine whether a customer will leave the bank or not in the next 6 months.

Data Dictionary¶

  • CustomerId: Unique ID which is assigned to each customer

  • Surname: Last name of the customer

  • CreditScore: It defines the credit history of the customer.

  • Geography: A customer’s location

  • Gender: It defines the Gender of the customer

  • Age: Age of the customer

  • Tenure: Number of years for which the customer has been with the bank

  • NumOfProducts: refers to the number of products that a customer has purchased through the bank.

  • Balance: Account balance

  • HasCrCard: It is a categorical variable which decides whether the customer has credit card or not.

  • EstimatedSalary: Estimated salary

  • isActiveMember: Is is a categorical variable which decides whether the customer is active member of the bank or not ( Active member in the sense, using bank products regularly, making transactions etc )

  • Exited : whether or not the customer left the bank within six month. It can take two values

** 0=No ( Customer did not leave the bank ) ** 1=Yes ( Customer left the bank )

Importing necessary libraries¶

In [11]:
#importing tensorflow
import tensorflow as tf
print(tf.__version__)
WARNING:tensorflow:From C:\Users\bruce\AppData\Roaming\Python\Python311\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

2.15.0
In [12]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from sklearn import preprocessing
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, classification_report
from sklearn.preprocessing import LabelEncoder, OneHotEncoder,StandardScaler
import matplotlib.pyplot as plt
from tensorflow.keras import optimizers
from sklearn.decomposition import PCA
import seaborn as sns
import keras
import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout
import time

# import SMOTE for sampling

# Oversample with SMOTE and random undersample for imbalanced dataset
from collections import Counter
from imblearn.over_sampling import SMOTE 
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

Loading the dataset¶

In [14]:
#Defining the path of the dataset
dataset_file = 'Churn.csv'
#reading dataset
data = pd.read_csv(dataset_file)

Data Overview¶

In [16]:
data.head()
Out[16]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
In [17]:
data.shape
Out[17]:
(10000, 14)

Observations

There are 10,000 rows with 14 columns.

Check for missing values¶

In [20]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
In [21]:
# checking shape of the data
print("There are", data.shape[0], 'rows and', data.shape[1], "columns.")
There are 10000 rows and 14 columns.

Observation

  • This shows that there are 10000 instances and 14 attributes including RowNumber and Exited attribute.

As you can see there are no null values in any of the colum

  • There are three columns that are object data types: Surname, Geography, and Gendern
In [23]:
data.isnull().sum()
Out[23]:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

Observation

There are no missing values

Check for duplicate values¶

In [26]:
data.duplicated().sum()
Out[26]:
0

Observation

There are no duplicate values, which you would expect with a column named RowNumber.

In [28]:
df_no_row_number = data.drop('RowNumber', axis=1)
df_no_row_number.duplicated().sum()                             
Out[28]:
0

Observation

There are no duplicate values without a column named RowNumber.

View object data type¶

In [31]:
obj_columns = data.select_dtypes(include='object').columns

# Print the count of unique categorical levels in each column
for column in obj_columns:
    print(data[column].value_counts())
    print("-" * 50)
Smith       32
Scott       29
Martin      29
Walker      28
Brown       26
            ..
Izmailov     1
Bold         1
Bonham       1
Poninski     1
Burbidge     1
Name: Surname, Length: 2932, dtype: int64
--------------------------------------------------
France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64
--------------------------------------------------
Male      5457
Female    4543
Name: Gender, dtype: int64
--------------------------------------------------
In [32]:
# Print the percentage of unique categorical levels in each column
for column in obj_columns:
    print(data[column].value_counts(normalize=True))
    print("-" * 50)
Smith       0.0032
Scott       0.0029
Martin      0.0029
Walker      0.0028
Brown       0.0026
             ...  
Izmailov    0.0001
Bold        0.0001
Bonham      0.0001
Poninski    0.0001
Burbidge    0.0001
Name: Surname, Length: 2932, dtype: float64
--------------------------------------------------
France     0.5014
Germany    0.2509
Spain      0.2477
Name: Geography, dtype: float64
--------------------------------------------------
Male      0.5457
Female    0.4543
Name: Gender, dtype: float64
--------------------------------------------------

Observations

  • Half of the data shows rows from France, with Germany and Spain splitting the other half
  • There are a few more males than females

Exploratory Data Analysis¶

In [35]:
pd.options.display.float_format = "{:,.2f}".format
df_no_row_number.describe().T
Out[35]:
count mean std min 25% 50% 75% max
CustomerId 10,000.00 15,690,940.57 71,936.19 15,565,701.00 15,628,528.25 15,690,738.00 15,753,233.75 15,815,690.00
CreditScore 10,000.00 650.53 96.65 350.00 584.00 652.00 718.00 850.00
Age 10,000.00 38.92 10.49 18.00 32.00 37.00 44.00 92.00
Tenure 10,000.00 5.01 2.89 0.00 3.00 5.00 7.00 10.00
Balance 10,000.00 76,485.89 62,397.41 0.00 0.00 97,198.54 127,644.24 250,898.09
NumOfProducts 10,000.00 1.53 0.58 1.00 1.00 1.00 2.00 4.00
HasCrCard 10,000.00 0.71 0.46 0.00 0.00 1.00 1.00 1.00
IsActiveMember 10,000.00 0.52 0.50 0.00 0.00 1.00 1.00 1.00
EstimatedSalary 10,000.00 100,090.24 57,510.49 11.58 51,002.11 100,193.91 149,388.25 199,992.48
Exited 10,000.00 0.20 0.40 0.00 0.00 0.00 0.00 1.00

Observation

  • There are binary columns: Exited, HasCrCard, and IsActiveMember although both are represented as int64 in the data
  • CreditScore, Age, Tenure, ExpectedSalary seem to be normal distribution (not skewed)
  • Balance
In [37]:
# skewness along the index axis 
df_no_row_number.skew(axis = 0, skipna = True, numeric_only=True) 
Out[37]:
CustomerId         0.00
CreditScore       -0.07
Age                1.01
Tenure             0.01
Balance           -0.14
NumOfProducts      0.75
HasCrCard         -0.90
IsActiveMember    -0.06
EstimatedSalary    0.00
Exited             1.47
dtype: float64

The following functions need to be defined to carry out the Exploratory Data Analysis.¶

In [39]:
# function to plot a boxplot and a histogram along the same scale.


def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to the show density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter", 
        hue=feature
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [40]:
# function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
        legend=False,
        hue=feature
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [41]:
# function to plot stacked bar chart

def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
In [42]:
### Function to plot distributions

def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        hue=target,
        legend=False
    )

    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        hue=target,
        legend=False
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow",
        hue=target,
        legend=False              
    )

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
        hue=target,
        legend=False
    )

    plt.tight_layout()
    plt.show()
In [ ]:
 

Univariate Analysis¶

Credit Score¶

In [45]:
histogram_boxplot(data, 'CreditScore')
No description has been provided for this image

Geography¶

In [47]:
labeled_barplot(data=data,feature='Geography',perc=True)
No description has been provided for this image

Age¶

In [49]:
histogram_boxplot(data, 'Age')
No description has been provided for this image

Observation

Age is slightly skewed right. Should investigate how Age is correlated to other features.

Tenure¶

In [52]:
histogram_boxplot(data, 'Tenure', bins=11)
No description has been provided for this image

Observation

Tenure has a normal distribution. Not skewed left or right.

Balance¶

In [55]:
histogram_boxplot(data, 'Balance')
No description has been provided for this image

Observation

Balance overall is a normal distribution with the exception of a large number with a zero balance. This makes the feature left skewed.

Number of products¶

In [58]:
histogram_boxplot(data, 'NumOfProducts', bins=4)
No description has been provided for this image

Observation

Balance is right skewed.

Has credit card¶

In [61]:
labeled_barplot(data=data,feature='HasCrCard', perc=True)
No description has been provided for this image

Observation

70% has a credit card.

Is active member¶

In [64]:
labeled_barplot(data=data,feature='IsActiveMember',perc=True)
No description has been provided for this image

Observation

Active membership is just about evenly split.

Gender¶

In [67]:
labeled_barplot(data=data,feature='Gender',perc=True)
No description has been provided for this image

Observation

There are more males with accounts by about a 10% margin.

Surname¶

In [70]:
print(data['Surname'].nunique())
2932

Observation

There are 2932 distinct surnames.

In [72]:
filtered_df = data.groupby('Surname').filter(lambda x: len(x) > 10)
labeled_barplot(data=filtered_df,feature='Surname')
No description has been provided for this image

Estimated Salary¶

In [74]:
histogram_boxplot(data, 'EstimatedSalary')
No description has been provided for this image

Observation

Salary is normalized without skewedness.

Exited¶

In [77]:
labeled_barplot(data=data,feature='Exited',perc=True)
No description has been provided for this image

Observation

About 80% of the target (customers who have exited) have accounts. About 20% have exited.

Bivariate Analysis¶

In [80]:
plt.figure(figsize=(10,5))
numeric_data = df_no_row_number.select_dtypes(include='number')
sns.heatmap(numeric_data.corr(),annot=True,cmap='Spectral',vmin=-1,vmax=1)
plt.show()
No description has been provided for this image

Observations

Generally there is no strong correlations in either direction between the features.

Credit Score vs Exited¶

In [83]:
distribution_plot_wrt_target(numeric_data, 'CreditScore', 'Exited')
No description has been provided for this image
In [84]:
sns.scatterplot(data=data, x='CreditScore', y='Balance', hue='Exited');
No description has been provided for this image

Observation

There doesn't seem to be a grouping between CreditScore and Balance wrt Exited

Age vs Exited¶

In [87]:
sns.stripplot(data=data, x='Tenure', y='Age', hue='Exited');
No description has been provided for this image

Geography vs Exited¶

In [89]:
stacked_barplot(data, 'Geography', 'Exited')
Exited        0     1    All
Geography                   
All        7963  2037  10000
Germany    1695   814   2509
France     4204   810   5014
Spain      2064   413   2477
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Credit Score vs Age¶

In [91]:
sns.scatterplot(data=data, x='CreditScore', y='Age', hue='Exited');
sns.lmplot(data=data, x='CreditScore', y='Age', hue='Exited',ci=False);
No description has been provided for this image
No description has been provided for this image

Observation

Generally older customers have exited regardless of credit score.

Age vs Exited¶

In [94]:
distribution_plot_wrt_target(numeric_data, 'Age', 'Exited')
No description has been provided for this image

Observation

This pair (age and exited) were the highest correlation. The chart shows a higher mean age for those exiting and higher IRQ. The data holds the same with outliers and without outliers.

Is Active Member vs Exited¶

In [97]:
stacked_barplot(data, 'IsActiveMember', 'Exited')
Exited             0     1    All
IsActiveMember                   
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Has Credit Card vs Exited¶

In [99]:
stacked_barplot(data, 'HasCrCard', 'Exited')
Exited        0     1    All
HasCrCard                   
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

Although many more have a credit card, it does not seem to be correlated to whether the customer has exited.

Estimated Salary vs Exited¶

In [102]:
distribution_plot_wrt_target(numeric_data, 'EstimatedSalary', 'Exited')
No description has been provided for this image

Balance vs Exited¶

In [104]:
distribution_plot_wrt_target(numeric_data, 'Balance', 'Exited')
No description has been provided for this image

Observation

Those leaving the bank had an average higher balance.

Gender vs Exited¶

In [107]:
stacked_barplot(data, 'Gender', 'Exited')
Exited     0     1    All
Gender                   
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image

Observation

Even though there are more males, females are more likely to leave the bank than males.

In [109]:
sns.stripplot(data=data, x='HasCrCard', y='Balance', hue='Exited');
No description has been provided for this image

Data Preprocessing¶

In [111]:
### Drop the column as they will not add value to the modeling
## Also drop the target
X = data.drop(['RowNumber', 'CustomerId', 'Surname', 'Exited'], axis=1)
Y = data['Exited']

Column binning¶

For this run, we will not "bin" any of the features. Although in future runs it might be helpful to bin:

  • Salary
  • Balance This might be valuable because so many have zero balance

Dummy Variable Creation¶

In [115]:
object_columns = X.select_dtypes(include='object').columns
print("object columns:", object_columns)
object columns: Index(['Geography', 'Gender'], dtype='object')

Convert the objects to data

In [117]:
# Encoding the categorical variables using one-hot encoding
X = pd.get_dummies(
    X,
    columns=object_columns,
    drop_first=True,
)

Train-validation-test Split¶

In [119]:
# Splitting the dataset into the Training and Test set.
X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size = 0.2, random_state = 42,stratify = Y)
In [120]:
# Splitting the Train dataset into the Training and Validation set.
X_train, X_valid, y_train, y_valid = train_test_split(X_train,y_train, test_size = 0.2, random_state = 42,stratify = y_train)
In [121]:
# Printing the shapes.
print(X_train.shape,y_train.shape)
print(X_valid.shape,y_valid.shape)
print(X_test.shape,y_test.shape)
(6400, 11) (6400,)
(1600, 11) (1600,)
(2000, 11) (2000,)

Data Normalization¶

We will have the tensorflow function split the validation data sets

In [124]:
num_columns = X.select_dtypes(include='number').columns
print(num_columns)
Index(['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Geography_Germany',
       'Geography_Spain', 'Gender_Male'],
      dtype='object')
In [125]:
#Standardizing the numerical variables to zero mean and unit variance.
autosacler = StandardScaler()
X = autosacler.fit_transform(X)
In [126]:
X
Out[126]:
array([[-0.32622142,  0.29351742, -1.04175968, ..., -0.57873591,
        -0.57380915, -1.09598752],
       [-0.44003595,  0.19816383, -1.38753759, ..., -0.57873591,
         1.74273971, -1.09598752],
       [-1.53679418,  0.29351742,  1.03290776, ..., -0.57873591,
        -0.57380915, -1.09598752],
       ...,
       [ 0.60498839, -0.27860412,  0.68712986, ..., -0.57873591,
        -0.57380915, -1.09598752],
       [ 1.25683526,  0.29351742, -0.69598177, ...,  1.72790383,
        -0.57380915,  0.91241915],
       [ 1.46377078, -1.04143285, -0.35020386, ..., -0.57873591,
        -0.57380915, -1.09598752]])

Utility functions¶

In [128]:
def plot(history, name):
    """
    Function to plot loss/accuracy

    history: an object which stores the metrics and losses.
    name: can be one of Loss or Accuracy
    """
    fig, ax = plt.subplots() #Creating a subplot with figure and axes.
    plt.plot(history.history[name]) #Plotting the train accuracy or train loss
    plt.plot(history.history['val_'+name]) #Plotting the validation accuracy or validation loss

    plt.title('Model ' + name.capitalize()) #Defining the title of the plot.
    plt.ylabel(name.capitalize()) #Capitalizing the first letter.
    plt.xlabel('Epoch') #Defining the label for the x-axis.
    fig.legend(['Train', 'Validation'], loc="outside right upper") #Defining the legend, loc controls the position of the legend.
In [129]:
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred = model.predict(predictors) > threshold
    # pred_temp = model.predict(predictors) > threshold
    # # rounding off the above values to get classes
    # pred = np.round(pred_temp)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred, average='weighted')  # to compute Recall
    precision = precision_score(target, pred, average='weighted')  # to compute Precision
    f1 = f1_score(target, pred, average='weighted')  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1 Score": f1,},
        index=[0],
    )

    return df_perf
In [130]:
def make_confusion_matrix(cf,
                          group_names=None,
                          categories='auto',
                          count=True,
                          percent=True,
                          cbar=True,
                          xyticks=True,
                          xyplotlabels=True,
                          sum_stats=True,
                          figsize=None,
                          cmap='Blues',
                          title=None):
    '''
    This function will make a pretty plot of an sklearn Confusion Matrix cm using a Seaborn heatmap visualization.
    Arguments
    '''

    # CODE TO GENERATE TEXT INSIDE EACH SQUARE
    blanks = ['' for i in range(cf.size)]

    if group_names and len(group_names)==cf.size:
        group_labels = ["{}\n".format(value) for value in group_names]
    else:
        group_labels = blanks

    if count:
        group_counts = ["{0:0.0f}\n".format(value) for value in cf.flatten()]
    else:
        group_counts = blanks

    if percent:
        group_percentages = ["{0:.2%}".format(value) for value in cf.flatten()/np.sum(cf)]
    else:
        group_percentages = blanks

    box_labels = [f"{v1}{v2}{v3}".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]
    box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])


    # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS
    if sum_stats:
        #Accuracy is sum of diagonal divided by total observations
        accuracy  = np.trace(cf) / float(np.sum(cf))

        #if it is a binary confusion matrix, show some more stats
        if len(cf)==2:
            #Metrics for Binary Confusion Matrices
            precision = cf[1,1] / sum(cf[:,1])
            recall    = cf[1,1] / sum(cf[1,:])
            f1_score  = 2*precision*recall / (precision + recall)
            stats_text = "\n\nAccuracy={:0.3f}\nPrecision={:0.3f}\nRecall={:0.3f}\nF1 Score={:0.3f}".format(
                accuracy,precision,recall,f1_score)
        else:
            stats_text = "\n\nAccuracy={:0.3f}".format(accuracy)
    else:
        stats_text = ""


    # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS
    if figsize==None:
        #Get default figure size if not set
        figsize = plt.rcParams.get('figure.figsize')

    if xyticks==False:
        #Do not show categories if xyticks is False
        categories=False


    # MAKE THE HEATMAP VISUALIZATION
    plt.figure(figsize=figsize)
    sns.heatmap(cf,annot=box_labels,fmt="",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)

    if xyplotlabels:
        plt.ylabel('True label')
        plt.xlabel('Predicted label' + stats_text)
    else:
        plt.xlabel(stats_text)
    
    if title:
        plt.title(title)

Random Forest¶

Before we dive into the neural network model building, let's see how Random Forest behaves -- as a baseline

In [133]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100)
# Pandas Series.ravel() function returns the flattened underlying data as an ndarray.
random_forest.fit(X_train,y_train.values.ravel())    # np.ravel() Return a contiguous flattened array
Out[133]:
RandomForestClassifier()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
RandomForestClassifier()
In [134]:
random_forest_prediction = random_forest.predict(X_test)
random_forest.score(X_test,y_test)
Out[134]:
0.8645
In [135]:
labels = ['True Negative','False Positive','False Negative','True Positive']
categories=['Not exited', 'Exited']
make_confusion_matrix(confusion_matrix(y_test, random_forest_prediction), 
                      group_names=labels,
                      categories=categories, 
                      cmap='Blues')
No description has been provided for this image

Neural Network with SGD Optimizer without class weight¶

Let's use the same batch size and the same number of epochs for this run

In [138]:
# defining the batch size and # epochs upfront as we'll be using the same values for all models
epochs = 25
batch_size = 64

Let's set a baseline for our model, and then improve it step by step.

In [140]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
WARNING:tensorflow:From C:\Users\bruce\AppData\Roaming\Python\Python311\site-packages\keras\src\backend.py:277: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

In [141]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))  # binary classification exiting or not
In [142]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [143]:
optimizer = tf.keras.optimizers.SGD()    # defining SGD as the optimizer to be used
model.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['accuracy'])
In [144]:
start = time.time()
# this includes the class_weight parameter
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs)
end=time.time()
Epoch 1/25
WARNING:tensorflow:From C:\Users\bruce\AppData\Roaming\Python\Python311\site-packages\keras\src\utils\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. Please use tf.compat.v1.ragged.RaggedTensorValue instead.

WARNING:tensorflow:From C:\Users\bruce\AppData\Roaming\Python\Python311\site-packages\keras\src\engine\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.

100/100 [==============================] - 1s 5ms/step - loss: 15917120.0000 - accuracy: 0.7828 - val_loss: 0.6250 - val_accuracy: 0.7962
Epoch 2/25
100/100 [==============================] - 0s 1ms/step - loss: 0.6022 - accuracy: 0.7962 - val_loss: 0.5823 - val_accuracy: 0.7962
Epoch 3/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5682 - accuracy: 0.7962 - val_loss: 0.5558 - val_accuracy: 0.7962
Epoch 4/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5469 - accuracy: 0.7962 - val_loss: 0.5389 - val_accuracy: 0.7962
Epoch 5/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5332 - accuracy: 0.7962 - val_loss: 0.5280 - val_accuracy: 0.7962
Epoch 6/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5243 - accuracy: 0.7962 - val_loss: 0.5209 - val_accuracy: 0.7962
Epoch 7/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5184 - accuracy: 0.7962 - val_loss: 0.5161 - val_accuracy: 0.7962
Epoch 8/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5144 - accuracy: 0.7962 - val_loss: 0.5129 - val_accuracy: 0.7962
Epoch 9/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5117 - accuracy: 0.7962 - val_loss: 0.5107 - val_accuracy: 0.7962
Epoch 10/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5099 - accuracy: 0.7962 - val_loss: 0.5091 - val_accuracy: 0.7962
Epoch 11/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5086 - accuracy: 0.7962 - val_loss: 0.5081 - val_accuracy: 0.7962
Epoch 12/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5077 - accuracy: 0.7962 - val_loss: 0.5073 - val_accuracy: 0.7962
Epoch 13/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5071 - accuracy: 0.7962 - val_loss: 0.5068 - val_accuracy: 0.7962
Epoch 14/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5066 - accuracy: 0.7962 - val_loss: 0.5065 - val_accuracy: 0.7962
Epoch 15/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5063 - accuracy: 0.7962 - val_loss: 0.5062 - val_accuracy: 0.7962
Epoch 16/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5061 - accuracy: 0.7962 - val_loss: 0.5060 - val_accuracy: 0.7962
Epoch 17/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5060 - accuracy: 0.7962 - val_loss: 0.5059 - val_accuracy: 0.7962
Epoch 18/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5058 - accuracy: 0.7962 - val_loss: 0.5058 - val_accuracy: 0.7962
Epoch 19/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5058 - accuracy: 0.7962 - val_loss: 0.5057 - val_accuracy: 0.7962
Epoch 20/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5057 - accuracy: 0.7962 - val_loss: 0.5057 - val_accuracy: 0.7962
Epoch 21/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5057 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 22/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 23/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 24/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 25/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
In [145]:
print("Time taken in seconds ",end-start)
Time taken in seconds  3.936260461807251
In [146]:
plot(history,'loss')
No description has been provided for this image
In [147]:
model_0_train_perf = model_performance_classification(model, X_train, y_train)
model_0_train_perf
200/200 [==============================] - 0s 670us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[147]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71
In [148]:
model_0_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_0_valid_perf
50/50 [==============================] - 0s 687us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[148]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71

The warning may be indicating something isn't working as expected from my data

In [150]:
from collections import Counter
print(Counter(y_train))
print(Counter(y_valid))
print(Counter(y_test))
Counter({0: 5096, 1: 1304})
Counter({0: 1274, 1: 326})
Counter({0: 1593, 1: 407})

Model Building¶

Model Evaluation Criterion¶

A model can make wrong predictions in the following ways:¶

  • Predicting a customer will exit when they are loyal.
  • Predicting customer will stay when they are leaving.

Which case is more important?¶

Both cases are actually important for the purposes of this case study. For this particular business case, we want to minimize the number of false positives and the number of false negatives.

By predicting a customer to exit when they are staying, it means banking insitutution resources are not being used effectively. And predicting a customer will stay when they are leaving is equally resources being used ineffectively.

How to reduce this loss i.e need to reduce False Negatives as well as False Positives?¶

Since both errors are important for us to minimize, the company would want the F1 Score evaluation metric to be maximized

Hence, the focus should be on increasing the F1 score rather than focusing on just one metric, such as Recall or Precision

As we have are dealing with an imbalance in class distribution, use class weights to allow the model to give proportionally more importance to the minority class.

In [157]:
'''
# Calculate class weights for imbalanced dataset
cw = (y_train.shape[0]) / np.bincount(y_train)

# Create a dictionary mapping class indices to their respective class weights
cw_dict = {}
for i in range(cw.shape[0]):
    cw_dict[i] = cw[i]

cw_dict
'''
from sklearn.utils import class_weight
class_weight_array = class_weight.compute_class_weight(class_weight=None,
                                               classes=np.unique(y_train),
                                               y=y_train)
# convert the array into a dictionary for keras to use
cw_dict = {index: value for index, value in enumerate(class_weight_array)}

Neural Network with SGD Optimizer with class weight¶

To establish a baseline, let's start with a neural network consisting of:

  • two hidden layers with 14 and 7 neurons respectively
  • activation function of ReLU.- SGD as the optimizer
  • class weight as described in the dictionary in the previous section
In [160]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [161]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))  # binary classification exiting or not
In [162]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [163]:
optimizer = tf.keras.optimizers.SGD()    # defining SGD as the optimizer to be used
model.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['accuracy'])

Include the class_weight parameter in the model

In [165]:
start = time.time()
# this includes the class_weight parameter
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
100/100 [==============================] - 1s 2ms/step - loss: 1985.8293 - accuracy: 0.7925 - val_loss: 0.6241 - val_accuracy: 0.7962
Epoch 2/25
100/100 [==============================] - 0s 1ms/step - loss: 0.6015 - accuracy: 0.7962 - val_loss: 0.5818 - val_accuracy: 0.7962
Epoch 3/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5678 - accuracy: 0.7962 - val_loss: 0.5555 - val_accuracy: 0.7962
Epoch 4/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5466 - accuracy: 0.7962 - val_loss: 0.5387 - val_accuracy: 0.7962
Epoch 5/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5330 - accuracy: 0.7962 - val_loss: 0.5279 - val_accuracy: 0.7962
Epoch 6/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5241 - accuracy: 0.7962 - val_loss: 0.5208 - val_accuracy: 0.7962
Epoch 7/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5183 - accuracy: 0.7962 - val_loss: 0.5160 - val_accuracy: 0.7962
Epoch 8/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5143 - accuracy: 0.7962 - val_loss: 0.5128 - val_accuracy: 0.7962
Epoch 9/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5117 - accuracy: 0.7962 - val_loss: 0.5106 - val_accuracy: 0.7962
Epoch 10/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5098 - accuracy: 0.7962 - val_loss: 0.5091 - val_accuracy: 0.7962
Epoch 11/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5086 - accuracy: 0.7962 - val_loss: 0.5081 - val_accuracy: 0.7962
Epoch 12/25
100/100 [==============================] - 0s 2ms/step - loss: 0.5077 - accuracy: 0.7962 - val_loss: 0.5073 - val_accuracy: 0.7962
Epoch 13/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5071 - accuracy: 0.7962 - val_loss: 0.5068 - val_accuracy: 0.7962
Epoch 14/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5066 - accuracy: 0.7962 - val_loss: 0.5065 - val_accuracy: 0.7962
Epoch 15/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5063 - accuracy: 0.7962 - val_loss: 0.5062 - val_accuracy: 0.7962
Epoch 16/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5061 - accuracy: 0.7962 - val_loss: 0.5060 - val_accuracy: 0.7962
Epoch 17/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5060 - accuracy: 0.7962 - val_loss: 0.5059 - val_accuracy: 0.7962
Epoch 18/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5059 - accuracy: 0.7962 - val_loss: 0.5058 - val_accuracy: 0.7962
Epoch 19/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5058 - accuracy: 0.7962 - val_loss: 0.5057 - val_accuracy: 0.7962
Epoch 20/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5057 - accuracy: 0.7962 - val_loss: 0.5057 - val_accuracy: 0.7962
Epoch 21/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5057 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 22/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 23/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 24/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 25/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5056 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
In [166]:
print("Time taken in seconds ",end-start)
Time taken in seconds  3.598320722579956
In [167]:
plot(history,'loss')
No description has been provided for this image
In [168]:
model_1_train_perf = model_performance_classification(model, X_train, y_train)
model_1_train_perf
200/200 [==============================] - 0s 667us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[168]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71
In [169]:
model_1_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_1_valid_perf
50/50 [==============================] - 0s 820us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[169]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71

Observation

The model using class_weight starts at a very high value and then fails to learn.

Model Performance Improvement¶

Neural Network with Adam Optimizer¶

Let's change the optimizer to Adam. This will introduce momentum as well as an adaptive learning rate

In [174]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [175]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))  # binary classification: exiting or not
In [176]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [177]:
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
model.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['accuracy'])
In [178]:
start = time.time()
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
100/100 [==============================] - 1s 2ms/step - loss: 26.2987 - accuracy: 0.7573 - val_loss: 0.8260 - val_accuracy: 0.7894
Epoch 2/25
100/100 [==============================] - 0s 1ms/step - loss: 0.8251 - accuracy: 0.7870 - val_loss: 0.7847 - val_accuracy: 0.7794
Epoch 3/25
100/100 [==============================] - 0s 1ms/step - loss: 0.7782 - accuracy: 0.7861 - val_loss: 0.7292 - val_accuracy: 0.7906
Epoch 4/25
100/100 [==============================] - 0s 1ms/step - loss: 0.7258 - accuracy: 0.7884 - val_loss: 0.6795 - val_accuracy: 0.7906
Epoch 5/25
100/100 [==============================] - 0s 1ms/step - loss: 0.6744 - accuracy: 0.7895 - val_loss: 0.6345 - val_accuracy: 0.7919
Epoch 6/25
100/100 [==============================] - 0s 1ms/step - loss: 0.6292 - accuracy: 0.7925 - val_loss: 0.6012 - val_accuracy: 0.7925
Epoch 7/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5966 - accuracy: 0.7917 - val_loss: 0.5841 - val_accuracy: 0.7844
Epoch 8/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5636 - accuracy: 0.7931 - val_loss: 0.5563 - val_accuracy: 0.7937
Epoch 9/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5488 - accuracy: 0.7947 - val_loss: 0.5451 - val_accuracy: 0.7962
Epoch 10/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5421 - accuracy: 0.7950 - val_loss: 0.5372 - val_accuracy: 0.7962
Epoch 11/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5336 - accuracy: 0.7959 - val_loss: 0.5313 - val_accuracy: 0.7962
Epoch 12/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5278 - accuracy: 0.7966 - val_loss: 0.5261 - val_accuracy: 0.7962
Epoch 13/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5236 - accuracy: 0.7962 - val_loss: 0.5222 - val_accuracy: 0.7962
Epoch 14/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5198 - accuracy: 0.7959 - val_loss: 0.5190 - val_accuracy: 0.7962
Epoch 15/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5166 - accuracy: 0.7962 - val_loss: 0.5162 - val_accuracy: 0.7962
Epoch 16/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5141 - accuracy: 0.7962 - val_loss: 0.5141 - val_accuracy: 0.7962
Epoch 17/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5122 - accuracy: 0.7962 - val_loss: 0.5127 - val_accuracy: 0.7956
Epoch 18/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5107 - accuracy: 0.7962 - val_loss: 0.5114 - val_accuracy: 0.7962
Epoch 19/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5094 - accuracy: 0.7961 - val_loss: 0.5096 - val_accuracy: 0.7962
Epoch 20/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5084 - accuracy: 0.7962 - val_loss: 0.5090 - val_accuracy: 0.7962
Epoch 21/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5076 - accuracy: 0.7962 - val_loss: 0.5084 - val_accuracy: 0.7962
Epoch 22/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5070 - accuracy: 0.7962 - val_loss: 0.5083 - val_accuracy: 0.7956
Epoch 23/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5067 - accuracy: 0.7962 - val_loss: 0.5071 - val_accuracy: 0.7962
Epoch 24/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5062 - accuracy: 0.7962 - val_loss: 0.5068 - val_accuracy: 0.7962
Epoch 25/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5060 - accuracy: 0.7962 - val_loss: 0.5066 - val_accuracy: 0.7962
In [179]:
print("Time taken in seconds ",end-start)
Time taken in seconds  3.829555034637451
In [180]:
plot(history,'loss')
No description has been provided for this image
In [181]:
model_2_train_perf = model_performance_classification(model, X_train, y_train)
model_2_train_perf
200/200 [==============================] - 0s 665us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[181]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71
In [182]:
model_2_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_2_valid_perf
50/50 [==============================] - 0s 656us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[182]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71

Neural Network with Adam Optimizer and Dropout¶

In [184]:
X_train.shape
Out[184]:
(6400, 11)

Model-3

  • We will use a simple NN made of 5 fully-connected layers with ReLu activation. The NN takes a vector of length 11 as input. This represents the information related to each transactions, ie each line with 11 columns from the dataset. For each transaction, the final layer will output a probability distribution (sigmoid activation function) and classify either as not exiting (0) or exiting (1).
  • Two dropout steps are included to prevent overfitting.

Dropout

Dropout is a regularization technique for neural network models proposed by Srivastava, et al. in their 2014 paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped-out” randomly.

In [186]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [187]:
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dropout(0.5))
model.add(Dense(7,activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1,activation="sigmoid"))
In [188]:
# Create optimizer with default learning rate
# Compile the model
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
model.compile(optimizer=optimizer,loss='binary_crossentropy',metrics=['accuracy'])
In [189]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dropout (Dropout)           (None, 14)                0         
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dropout_1 (Dropout)         (None, 7)                 0         
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [190]:
start = time.time()
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
100/100 [==============================] - 1s 3ms/step - loss: 15722.9365 - accuracy: 0.4459 - val_loss: 3570.3752 - val_accuracy: 0.3019
Epoch 2/25
100/100 [==============================] - 0s 1ms/step - loss: 6158.7163 - accuracy: 0.5323 - val_loss: 1367.8448 - val_accuracy: 0.3094
Epoch 3/25
100/100 [==============================] - 0s 1ms/step - loss: 3067.6174 - accuracy: 0.5462 - val_loss: 482.8052 - val_accuracy: 0.3719
Epoch 4/25
100/100 [==============================] - 0s 1ms/step - loss: 1524.9984 - accuracy: 0.5945 - val_loss: 66.0271 - val_accuracy: 0.4569
Epoch 5/25
100/100 [==============================] - 0s 1ms/step - loss: 872.7789 - accuracy: 0.6827 - val_loss: 0.5723 - val_accuracy: 0.7962
Epoch 6/25
100/100 [==============================] - 0s 1ms/step - loss: 527.4734 - accuracy: 0.6978 - val_loss: 0.5366 - val_accuracy: 0.7962
Epoch 7/25
100/100 [==============================] - 0s 1ms/step - loss: 332.3477 - accuracy: 0.7073 - val_loss: 0.5300 - val_accuracy: 0.7962
Epoch 8/25
100/100 [==============================] - 0s 1ms/step - loss: 156.6253 - accuracy: 0.7391 - val_loss: 0.5937 - val_accuracy: 0.7956
Epoch 9/25
100/100 [==============================] - 0s 1ms/step - loss: 94.9946 - accuracy: 0.7603 - val_loss: 0.5209 - val_accuracy: 0.7962
Epoch 10/25
100/100 [==============================] - 0s 1ms/step - loss: 64.5001 - accuracy: 0.7722 - val_loss: 0.5453 - val_accuracy: 0.7962
Epoch 11/25
100/100 [==============================] - 0s 1ms/step - loss: 39.9264 - accuracy: 0.7709 - val_loss: 0.5265 - val_accuracy: 0.7962
Epoch 12/25
100/100 [==============================] - 0s 1ms/step - loss: 24.1317 - accuracy: 0.7788 - val_loss: 0.5404 - val_accuracy: 0.7962
Epoch 13/25
100/100 [==============================] - 0s 1ms/step - loss: 21.4563 - accuracy: 0.7847 - val_loss: 0.5320 - val_accuracy: 0.7962
Epoch 14/25
100/100 [==============================] - 0s 1ms/step - loss: 13.0958 - accuracy: 0.7875 - val_loss: 0.5405 - val_accuracy: 0.7962
Epoch 15/25
100/100 [==============================] - 0s 1ms/step - loss: 14.7540 - accuracy: 0.7802 - val_loss: 0.5050 - val_accuracy: 0.7962
Epoch 16/25
100/100 [==============================] - 0s 1ms/step - loss: 6.1719 - accuracy: 0.7852 - val_loss: 0.5116 - val_accuracy: 0.7962
Epoch 17/25
100/100 [==============================] - 0s 1ms/step - loss: 9.7568 - accuracy: 0.7788 - val_loss: 0.5095 - val_accuracy: 0.7962
Epoch 18/25
100/100 [==============================] - 0s 1ms/step - loss: 8.3190 - accuracy: 0.7822 - val_loss: 0.5226 - val_accuracy: 0.7962
Epoch 19/25
100/100 [==============================] - 0s 1ms/step - loss: 7.7841 - accuracy: 0.7770 - val_loss: 0.5129 - val_accuracy: 0.7962
Epoch 20/25
100/100 [==============================] - 0s 1ms/step - loss: 5.5043 - accuracy: 0.7788 - val_loss: 0.5027 - val_accuracy: 0.7962
Epoch 21/25
100/100 [==============================] - 0s 1ms/step - loss: 5.9339 - accuracy: 0.7823 - val_loss: 0.5007 - val_accuracy: 0.7962
Epoch 22/25
100/100 [==============================] - 0s 1ms/step - loss: 2.9321 - accuracy: 0.7819 - val_loss: 0.5010 - val_accuracy: 0.7962
Epoch 23/25
100/100 [==============================] - 0s 1ms/step - loss: 4.2278 - accuracy: 0.7825 - val_loss: 0.5033 - val_accuracy: 0.7962
Epoch 24/25
100/100 [==============================] - 0s 1ms/step - loss: 3.7639 - accuracy: 0.7822 - val_loss: 0.5077 - val_accuracy: 0.7962
Epoch 25/25
100/100 [==============================] - 0s 1ms/step - loss: 3.8067 - accuracy: 0.7830 - val_loss: 0.5018 - val_accuracy: 0.7962
In [191]:
print("Time taken in seconds ",end-start)
Time taken in seconds  3.9581198692321777
In [192]:
plot(history,'loss')
No description has been provided for this image
In [193]:
model_3_train_perf = model_performance_classification(model, X_train, y_train)
model_3_train_perf
200/200 [==============================] - 0s 697us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[193]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71
In [194]:
model_3_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_3_valid_perf
50/50 [==============================] - 0s 951us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[194]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71

Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer¶

In [196]:
X_train.shape
Out[196]:
(6400, 11)
In [197]:
y_train.shape
Out[197]:
(6400,)

Synthetic Minority Oversampling Technique to solve the idea that with imbalanced classification is that there may be too few examples of the minority class for a model to effectively learn the decision boundary.

Run without SMOTE Kernal initializer with batch normalization¶

In [200]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [201]:
'''#initialize the model
model = Sequential()
# This adds the input layer (by specifying input dimension) AND the first hidden layer (units)
model.add(Dense(units=24, input_dim = X_train.shape[1],activation='relu'))   # input of 29 columns as shown above
# hidden layer
model.add(Dense(units=24,activation='relu'))
model.add(Dense(24,activation='relu'))
model.add(Dense(24,activation='relu'))
# Adding the output layer
# Notice that we do not need to specify input dim. 
# we have an output of 1 node, which is the the desired dimensions of our output (fraud or not)
# We use the sigmoid because we want probability outcomes
model.add(Dense(1,activation='sigmoid'))                        # binary classification fraudulent or not
'''
from tensorflow.keras.layers import BatchNormalization
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(BatchNormalization())
model.add(Dense(7,activation="relu"))
model.add(BatchNormalization())
model.add(Dense(1,activation="sigmoid"))

Use the SGD optimizer

In [203]:
# Create optimizer with default learning rate
# Compile the model
optimizer = tf.keras.optimizers.Adam()    # defining SGD as the optimizer to be used
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
In [204]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 batch_normalization (Batch  (None, 14)                56        
 Normalization)                                                  
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 batch_normalization_1 (Bat  (None, 7)                 28        
 chNormalization)                                                
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 365 (1.43 KB)
Trainable params: 323 (1.26 KB)
Non-trainable params: 42 (168.00 Byte)
_________________________________________________________________
In [205]:
start = time.time()
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
100/100 [==============================] - 2s 3ms/step - loss: 0.6951 - accuracy: 0.5692 - val_loss: 0.6614 - val_accuracy: 0.7075
Epoch 2/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5909 - accuracy: 0.7717 - val_loss: 0.6013 - val_accuracy: 0.7506
Epoch 3/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5436 - accuracy: 0.7956 - val_loss: 0.5332 - val_accuracy: 0.7962
Epoch 4/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5163 - accuracy: 0.7962 - val_loss: 0.5182 - val_accuracy: 0.7962
Epoch 5/25
100/100 [==============================] - 0s 1ms/step - loss: 0.5032 - accuracy: 0.7962 - val_loss: 0.5129 - val_accuracy: 0.7962
Epoch 6/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4979 - accuracy: 0.7961 - val_loss: 0.5064 - val_accuracy: 0.7962
Epoch 7/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4974 - accuracy: 0.7964 - val_loss: 0.5073 - val_accuracy: 0.7962
Epoch 8/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4973 - accuracy: 0.7962 - val_loss: 0.5047 - val_accuracy: 0.7962
Epoch 9/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4960 - accuracy: 0.7962 - val_loss: 0.5063 - val_accuracy: 0.7962
Epoch 10/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4960 - accuracy: 0.7962 - val_loss: 0.5055 - val_accuracy: 0.7962
Epoch 11/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4953 - accuracy: 0.7961 - val_loss: 0.5069 - val_accuracy: 0.7962
Epoch 12/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4970 - accuracy: 0.7962 - val_loss: 0.5048 - val_accuracy: 0.7962
Epoch 13/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4953 - accuracy: 0.7962 - val_loss: 0.5048 - val_accuracy: 0.7962
Epoch 14/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4961 - accuracy: 0.7962 - val_loss: 0.5040 - val_accuracy: 0.7962
Epoch 15/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4959 - accuracy: 0.7962 - val_loss: 0.5043 - val_accuracy: 0.7962
Epoch 16/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4960 - accuracy: 0.7962 - val_loss: 0.5049 - val_accuracy: 0.7962
Epoch 17/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4962 - accuracy: 0.7962 - val_loss: 0.5049 - val_accuracy: 0.7962
Epoch 18/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4953 - accuracy: 0.7962 - val_loss: 0.5051 - val_accuracy: 0.7962
Epoch 19/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4954 - accuracy: 0.7964 - val_loss: 0.5048 - val_accuracy: 0.7962
Epoch 20/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4955 - accuracy: 0.7961 - val_loss: 0.5041 - val_accuracy: 0.7962
Epoch 21/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4954 - accuracy: 0.7962 - val_loss: 0.5048 - val_accuracy: 0.7962
Epoch 22/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4960 - accuracy: 0.7964 - val_loss: 0.5052 - val_accuracy: 0.7962
Epoch 23/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4958 - accuracy: 0.7962 - val_loss: 0.5045 - val_accuracy: 0.7962
Epoch 24/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4960 - accuracy: 0.7962 - val_loss: 0.5056 - val_accuracy: 0.7962
Epoch 25/25
100/100 [==============================] - 0s 1ms/step - loss: 0.4953 - accuracy: 0.7962 - val_loss: 0.5050 - val_accuracy: 0.7962
In [206]:
print("Time taken in seconds ",end-start)
Time taken in seconds  5.147068977355957
In [207]:
plot(history,'loss')
No description has been provided for this image
In [208]:
model_4_train_perf = model_performance_classification(model, X_train, y_train)
model_4_train_perf
200/200 [==============================] - 0s 706us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[208]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71
In [209]:
model_4_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_4_valid_perf
50/50 [==============================] - 0s 741us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[209]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71

Run with SMOTE¶

In [211]:
# check version number
import imblearn
print(imblearn.__version__)
0.11.0

Use SMOTE

The challenge of working with imbalanced datasets is that most machine learning techniques will ignore, and in turn have poor performance on, the minority class, although typically it is performance on the minority class that is most important.

One approach to addressing imbalanced datasets is to oversample the minority class. The simplest approach involves duplicating examples in the minority class, although these examples don’t add any new information to the model. Instead, new examples can be synthesized from the existing examples. This is a type of data augmentation for the minority class and is referred to as th_e Synthetic Minority Oversampling Techniq_ue, o_r SMO_TE for short.

In [213]:
X_train.shape
Out[213]:
(6400, 11)
In [214]:
# summarize class distribution
counter = Counter(y_train)
print(counter)
y_train.shape
Counter({0: 5096, 1: 1304})
Out[214]:
(6400,)

The original paper on SMOTE suggested combining SMOTE with random undersampling of the majority class.

The imbalanced-learn library supports random undersampling via the RandomUnderSampler clas2,000).).

In [216]:
# define pipeline
over = SMOTE(random_state=42)
under = RandomUnderSampler(random_state=42)
steps = [('o', over), ('u', under)]
pipeline = Pipeline(steps=steps)

# transform the dataset
X_train_res, y_train_res = pipeline.fit_resample(X_train, y_train)
In [217]:
# summarize the new class distribution
counter = Counter(y_train_res)
print(counter)
y_train_res.shape
Counter({0: 5096, 1: 5096})
Out[217]:
(10192,)

Apply balanced data using SMOTE as the training data and execute the same model. Use the same validation data

In [219]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [220]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))
In [221]:
# Create optimizer with default learning rate
# Compile the model
optimizer = tf.keras.optimizers.SGD()    # defining SGD as the optimizer to be used
model.compile(optimizer=optimizer,loss='binary_crossentropy',metrics=['accuracy'])
In [222]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [223]:
start = time.time()
history = model.fit(X_train_res, y_train_res, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
160/160 [==============================] - 1s 2ms/step - loss: 1496480150293177171968.0000 - accuracy: 0.4959 - val_loss: 0.6929 - val_accuracy: 0.7962
Epoch 2/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4973 - val_loss: 0.6924 - val_accuracy: 0.7962
Epoch 3/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4984 - val_loss: 0.6927 - val_accuracy: 0.7962
Epoch 4/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4912 - val_loss: 0.6928 - val_accuracy: 0.7962
Epoch 5/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4939 - val_loss: 0.6927 - val_accuracy: 0.7962
Epoch 6/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4937 - val_loss: 0.6926 - val_accuracy: 0.7962
Epoch 7/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4933 - val_loss: 0.6921 - val_accuracy: 0.7962
Epoch 8/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4959 - val_loss: 0.6928 - val_accuracy: 0.7962
Epoch 9/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4925 - val_loss: 0.6930 - val_accuracy: 0.7962
Epoch 10/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4967 - val_loss: 0.6929 - val_accuracy: 0.7962
Epoch 11/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4914 - val_loss: 0.6931 - val_accuracy: 0.7962
Epoch 12/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4953 - val_loss: 0.6929 - val_accuracy: 0.7962
Epoch 13/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4965 - val_loss: 0.6925 - val_accuracy: 0.7962
Epoch 14/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4947 - val_loss: 0.6926 - val_accuracy: 0.7962
Epoch 15/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4920 - val_loss: 0.6933 - val_accuracy: 0.2037
Epoch 16/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4955 - val_loss: 0.6932 - val_accuracy: 0.2037
Epoch 17/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4973 - val_loss: 0.6925 - val_accuracy: 0.7962
Epoch 18/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4955 - val_loss: 0.6920 - val_accuracy: 0.7962
Epoch 19/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4965 - val_loss: 0.6919 - val_accuracy: 0.7962
Epoch 20/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4927 - val_loss: 0.6921 - val_accuracy: 0.7962
Epoch 21/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4978 - val_loss: 0.6928 - val_accuracy: 0.7962
Epoch 22/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4894 - val_loss: 0.6926 - val_accuracy: 0.7962
Epoch 23/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4973 - val_loss: 0.6931 - val_accuracy: 0.7962
Epoch 24/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4904 - val_loss: 0.6932 - val_accuracy: 0.2037
Epoch 25/25
160/160 [==============================] - 0s 1ms/step - loss: 0.6932 - accuracy: 0.4865 - val_loss: 0.6934 - val_accuracy: 0.2037
In [224]:
print("Time taken in seconds ",end-start)
Time taken in seconds  4.89260458946228
In [225]:
plot(history,'loss')
No description has been provided for this image
In [226]:
model_5_train_perf = model_performance_classification(model, X_train_res, y_train_res)
model_5_train_perf
319/319 [==============================] - 0s 652us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[226]:
Accuracy Recall Precision F1 Score
0 0.50 0.50 0.25 0.33
In [227]:
model_5_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_5_valid_perf
50/50 [==============================] - 0s 698us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[227]:
Accuracy Recall Precision F1 Score
0 0.20 0.20 0.04 0.07

Observation

This model fails to learn.

Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer¶

In [230]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [231]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))
In [232]:
# Create optimizer with default learning rate
# Compile the model
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
In [233]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Use the SMOTE generated data

In [235]:
start = time.time()
history = model.fit(X_train_res, y_train_res, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
160/160 [==============================] - 1s 2ms/step - loss: 2070.7112 - accuracy: 0.5360 - val_loss: 160.4182 - val_accuracy: 0.6162
Epoch 2/25
160/160 [==============================] - 0s 1ms/step - loss: 129.1697 - accuracy: 0.5230 - val_loss: 97.9669 - val_accuracy: 0.6531
Epoch 3/25
160/160 [==============================] - 0s 1ms/step - loss: 65.6014 - accuracy: 0.5182 - val_loss: 32.3367 - val_accuracy: 0.5181
Epoch 4/25
160/160 [==============================] - 0s 1ms/step - loss: 37.4658 - accuracy: 0.5124 - val_loss: 33.8103 - val_accuracy: 0.6150
Epoch 5/25
160/160 [==============================] - 0s 1ms/step - loss: 28.7437 - accuracy: 0.5167 - val_loss: 25.9918 - val_accuracy: 0.7781
Epoch 6/25
160/160 [==============================] - 0s 1ms/step - loss: 21.5707 - accuracy: 0.5229 - val_loss: 14.6681 - val_accuracy: 0.5481
Epoch 7/25
160/160 [==============================] - 0s 1ms/step - loss: 23.9261 - accuracy: 0.5233 - val_loss: 10.2083 - val_accuracy: 0.6906
Epoch 8/25
160/160 [==============================] - 0s 1ms/step - loss: 34.2104 - accuracy: 0.5272 - val_loss: 38.7474 - val_accuracy: 0.2450
Epoch 9/25
160/160 [==============================] - 0s 1ms/step - loss: 25.7206 - accuracy: 0.5099 - val_loss: 16.1738 - val_accuracy: 0.6513
Epoch 10/25
160/160 [==============================] - 0s 1ms/step - loss: 19.5789 - accuracy: 0.5268 - val_loss: 18.9150 - val_accuracy: 0.5231
Epoch 11/25
160/160 [==============================] - 0s 1ms/step - loss: 21.0254 - accuracy: 0.5313 - val_loss: 41.1450 - val_accuracy: 0.3319
Epoch 12/25
160/160 [==============================] - 0s 1ms/step - loss: 23.2318 - accuracy: 0.5221 - val_loss: 21.5615 - val_accuracy: 0.3856
Epoch 13/25
160/160 [==============================] - 0s 1ms/step - loss: 19.1610 - accuracy: 0.5315 - val_loss: 16.4162 - val_accuracy: 0.4344
Epoch 14/25
160/160 [==============================] - 0s 1ms/step - loss: 26.6901 - accuracy: 0.5277 - val_loss: 14.2036 - val_accuracy: 0.6913
Epoch 15/25
160/160 [==============================] - 0s 1ms/step - loss: 24.2173 - accuracy: 0.5259 - val_loss: 20.7533 - val_accuracy: 0.6769
Epoch 16/25
160/160 [==============================] - 0s 1ms/step - loss: 22.1955 - accuracy: 0.5284 - val_loss: 22.8077 - val_accuracy: 0.5825
Epoch 17/25
160/160 [==============================] - 0s 1ms/step - loss: 19.7749 - accuracy: 0.5266 - val_loss: 25.6457 - val_accuracy: 0.5188
Epoch 18/25
160/160 [==============================] - 0s 1ms/step - loss: 21.7492 - accuracy: 0.5268 - val_loss: 10.2733 - val_accuracy: 0.7638
Epoch 19/25
160/160 [==============================] - 0s 1ms/step - loss: 22.8700 - accuracy: 0.5357 - val_loss: 28.6685 - val_accuracy: 0.4006
Epoch 20/25
160/160 [==============================] - 0s 1ms/step - loss: 24.7165 - accuracy: 0.5314 - val_loss: 5.4990 - val_accuracy: 0.6112
Epoch 21/25
160/160 [==============================] - 0s 1ms/step - loss: 19.4898 - accuracy: 0.5380 - val_loss: 21.3184 - val_accuracy: 0.2569
Epoch 22/25
160/160 [==============================] - 0s 1ms/step - loss: 18.5398 - accuracy: 0.5362 - val_loss: 16.3858 - val_accuracy: 0.7056
Epoch 23/25
160/160 [==============================] - 0s 1ms/step - loss: 22.2999 - accuracy: 0.5367 - val_loss: 7.8195 - val_accuracy: 0.5550
Epoch 24/25
160/160 [==============================] - 0s 1ms/step - loss: 12.5105 - accuracy: 0.5502 - val_loss: 27.5460 - val_accuracy: 0.5806
Epoch 25/25
160/160 [==============================] - 0s 2ms/step - loss: 18.1980 - accuracy: 0.5468 - val_loss: 26.8802 - val_accuracy: 0.5138
In [236]:
print("Time taken in seconds ",end-start)
Time taken in seconds  5.451287746429443
In [237]:
plot(history,'loss')
No description has been provided for this image
In [238]:
model_6_train_perf = model_performance_classification(model, X_train_res, y_train_res)
model_6_train_perf
319/319 [==============================] - 0s 670us/step
Out[238]:
Accuracy Recall Precision F1 Score
0 0.58 0.58 0.59 0.58
In [239]:
model_6_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_6_valid_perf
50/50 [==============================] - 0s 703us/step
Out[239]:
Accuracy Recall Precision F1 Score
0 0.51 0.51 0.73 0.56

Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout¶

In [241]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

Include dropout to the model

In [243]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dropout(0.5))
model.add(Dense(7,activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1,activation="sigmoid"))

Use Adam Optimizer

In [245]:
# Create optimizer with default learning rate
# Compile the model
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

Use SMOTE data

In [247]:
start = time.time()
history = model.fit(X_train_res, y_train_res, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
160/160 [==============================] - 1s 2ms/step - loss: 5737.2070 - accuracy: 0.5182 - val_loss: 9.5873 - val_accuracy: 0.7381
Epoch 2/25
160/160 [==============================] - 0s 1ms/step - loss: 1096.4552 - accuracy: 0.5134 - val_loss: 1.4944 - val_accuracy: 0.4725
Epoch 3/25
160/160 [==============================] - 0s 1ms/step - loss: 409.4463 - accuracy: 0.4982 - val_loss: 0.8929 - val_accuracy: 0.7962
Epoch 4/25
160/160 [==============================] - 0s 1ms/step - loss: 206.0640 - accuracy: 0.5093 - val_loss: 1.0632 - val_accuracy: 0.7962
Epoch 5/25
160/160 [==============================] - 0s 1ms/step - loss: 116.9651 - accuracy: 0.5033 - val_loss: 1.9489 - val_accuracy: 0.4725
Epoch 6/25
160/160 [==============================] - 0s 1ms/step - loss: 74.9371 - accuracy: 0.5047 - val_loss: 0.6003 - val_accuracy: 0.7962
Epoch 7/25
160/160 [==============================] - 0s 1ms/step - loss: 60.3058 - accuracy: 0.5126 - val_loss: 0.9451 - val_accuracy: 0.4725
Epoch 8/25
160/160 [==============================] - 0s 1ms/step - loss: 51.5475 - accuracy: 0.5008 - val_loss: 0.9024 - val_accuracy: 0.4725
Epoch 9/25
160/160 [==============================] - 0s 1ms/step - loss: 40.9379 - accuracy: 0.4974 - val_loss: 2.1763 - val_accuracy: 0.4725
Epoch 10/25
160/160 [==============================] - 0s 1ms/step - loss: 28.0309 - accuracy: 0.5030 - val_loss: 0.6803 - val_accuracy: 0.7962
Epoch 11/25
160/160 [==============================] - 0s 1ms/step - loss: 18.8631 - accuracy: 0.5005 - val_loss: 0.9078 - val_accuracy: 0.7962
Epoch 12/25
160/160 [==============================] - 0s 1ms/step - loss: 16.0324 - accuracy: 0.5079 - val_loss: 0.8146 - val_accuracy: 0.4725
Epoch 13/25
160/160 [==============================] - 0s 1ms/step - loss: 13.8177 - accuracy: 0.5107 - val_loss: 0.8078 - val_accuracy: 0.4725
Epoch 14/25
160/160 [==============================] - 0s 1ms/step - loss: 7.5131 - accuracy: 0.5058 - val_loss: 1.6459 - val_accuracy: 0.4725
Epoch 15/25
160/160 [==============================] - 0s 1ms/step - loss: 6.7458 - accuracy: 0.5041 - val_loss: 1.3728 - val_accuracy: 0.4725
Epoch 16/25
160/160 [==============================] - 0s 1ms/step - loss: 5.3867 - accuracy: 0.5090 - val_loss: 0.7230 - val_accuracy: 0.4787
Epoch 17/25
160/160 [==============================] - 0s 1ms/step - loss: 6.1176 - accuracy: 0.5023 - val_loss: 0.6234 - val_accuracy: 0.7962
Epoch 18/25
160/160 [==============================] - 0s 1ms/step - loss: 2.9383 - accuracy: 0.5117 - val_loss: 0.6369 - val_accuracy: 0.7962
Epoch 19/25
160/160 [==============================] - 0s 1ms/step - loss: 4.1585 - accuracy: 0.5052 - val_loss: 0.6959 - val_accuracy: 0.5894
Epoch 20/25
160/160 [==============================] - 0s 1ms/step - loss: 2.3146 - accuracy: 0.5091 - val_loss: 0.6842 - val_accuracy: 0.6519
Epoch 21/25
160/160 [==============================] - 0s 1ms/step - loss: 2.2331 - accuracy: 0.4996 - val_loss: 0.8741 - val_accuracy: 0.5725
Epoch 22/25
160/160 [==============================] - 0s 1ms/step - loss: 3.1681 - accuracy: 0.5082 - val_loss: 0.6725 - val_accuracy: 0.7962
Epoch 23/25
160/160 [==============================] - 0s 1ms/step - loss: 1.9785 - accuracy: 0.5120 - val_loss: 0.6885 - val_accuracy: 0.6313
Epoch 24/25
160/160 [==============================] - 0s 1ms/step - loss: 1.1643 - accuracy: 0.5049 - val_loss: 0.6661 - val_accuracy: 0.7962
Epoch 25/25
160/160 [==============================] - 0s 1ms/step - loss: 1.6094 - accuracy: 0.5079 - val_loss: 0.6490 - val_accuracy: 0.7962
In [248]:
print("Time taken in seconds ",end-start)
Time taken in seconds  5.657803297042847
In [249]:
plot(history,'loss')
No description has been provided for this image
In [250]:
model_7_train_perf = model_performance_classification(model, X_train_res, y_train_res)
model_7_train_perf
319/319 [==============================] - 0s 675us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[250]:
Accuracy Recall Precision F1 Score
0 0.50 0.50 0.25 0.33
In [251]:
model_7_valid_perf = model_performance_classification(model, X_valid, y_valid)
model_7_valid_perf
50/50 [==============================] - 0s 563us/step
C:\Users\bruce\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1344: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Out[251]:
Accuracy Recall Precision F1 Score
0 0.80 0.80 0.63 0.71

Model Performance Comparison and Final Model Selection¶

Model comparision¶

In [254]:
# training performance comparison

models_train_comp_df = pd.concat(
    [
        model_0_train_perf.T,
        model_1_train_perf.T,
        model_2_train_perf.T,
        model_3_train_perf.T,
        model_4_train_perf.T,
        model_5_train_perf.T,
        model_6_train_perf.T,
        model_7_train_perf.T
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Neural Network with SGD Optimizer without class weight",
    "Neural Network with SGD Optimizer with class weight",
    "Neural Network with Adam Optimizer",
    "Neural Network with Adam Optimizer and Dropout",
    "Neural Network with Balanced Data (by applying SMOTE) with Batch Normalization",
    "Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer",
    "Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer",
    "Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout"
]
In [255]:
#Validation performance comparison

models_valid_comp_df = pd.concat(
    [
        model_0_valid_perf.T,
        model_1_valid_perf.T,
        model_2_valid_perf.T,
        model_3_valid_perf.T,
        model_4_valid_perf.T,
        model_5_valid_perf.T,
        model_6_valid_perf.T,
        model_7_valid_perf.T,
    ],
    axis=1,
)
models_valid_comp_df.columns = [
    "Neural Network with SGD Optimizer without class weight",
    "Neural Network with SGD Optimizer with class weight",
    "Neural Network with Adam Optimizer",
    "Neural Network with Adam Optimizer and Dropout",
    "Neural Network with Balanced Data (by applying SMOTE) with Batch Normalization",
    "Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer",
    "Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer",
    "Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout"
]
In [256]:
models_train_comp_df
Out[256]:
Neural Network with SGD Optimizer without class weight Neural Network with SGD Optimizer with class weight Neural Network with Adam Optimizer Neural Network with Adam Optimizer and Dropout Neural Network with Balanced Data (by applying SMOTE) with Batch Normalization Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout
Accuracy 0.80 0.80 0.80 0.80 0.80 0.50 0.58 0.50
Recall 0.80 0.80 0.80 0.80 0.80 0.50 0.58 0.50
Precision 0.63 0.63 0.63 0.63 0.63 0.25 0.59 0.25
F1 Score 0.71 0.71 0.71 0.71 0.71 0.33 0.58 0.33
In [257]:
models_valid_comp_df
Out[257]:
Neural Network with SGD Optimizer without class weight Neural Network with SGD Optimizer with class weight Neural Network with Adam Optimizer Neural Network with Adam Optimizer and Dropout Neural Network with Balanced Data (by applying SMOTE) with Batch Normalization Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout
Accuracy 0.80 0.80 0.80 0.80 0.80 0.20 0.51 0.80
Recall 0.80 0.80 0.80 0.80 0.80 0.20 0.51 0.80
Precision 0.63 0.63 0.63 0.63 0.63 0.04 0.73 0.63
F1 Score 0.71 0.71 0.71 0.71 0.71 0.07 0.56 0.71
In [258]:
models_train_comp_df.loc["F1 Score"] - models_valid_comp_df.loc["F1 Score"]
Out[258]:
Neural Network with SGD Optimizer without class weight                                0.00
Neural Network with SGD Optimizer with class weight                                   0.00
Neural Network with Adam Optimizer                                                    0.00
Neural Network with Adam Optimizer and Dropout                                        0.00
Neural Network with Balanced Data (by applying SMOTE) with Batch Normalization        0.00
Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer               0.26
Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer              0.02
Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout   -0.37
Name: F1 Score, dtype: float64

Final model selection¶

Several models tied with Neural Network with Adam Optimizer and Dropout had slightly better accuracy than Neural Network with Adam Optimizer. Although Neural Network with SGD Optimizer with class weight was a simpler model. These two models were a quite a bit better with F1 score than our Random Forest model that served as a baseline.

Neural Network with Adam Optimizer and Dropout seemed to learn the result over several epochs.

Each of the models tested well and did not overfit.

Balancing the data with SMOTE was expected to provide a better fit, did not seem to live up to its promise in all cases. A more sophisticated neural net may have gotten better results. Batch Normalization did not improve the results either.

In [261]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
In [262]:
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dropout(0.5))
model.add(Dense(7,activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1,activation="sigmoid"))
In [263]:
# Create optimizer with default learning rate
# Compile the model
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
model.compile(optimizer=optimizer,loss='binary_crossentropy',metrics=['accuracy'])
In [264]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dropout (Dropout)           (None, 14)                0         
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dropout_1 (Dropout)         (None, 7)                 0         
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [265]:
start = time.time()
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs, class_weight=cw_dict)
end=time.time()
Epoch 1/25
100/100 [==============================] - 1s 5ms/step - loss: 15709.1074 - accuracy: 0.5131 - val_loss: 2989.4175 - val_accuracy: 0.7962
Epoch 2/25
100/100 [==============================] - 0s 1ms/step - loss: 5375.0508 - accuracy: 0.5788 - val_loss: 1243.5782 - val_accuracy: 0.7962
Epoch 3/25
100/100 [==============================] - 0s 1ms/step - loss: 2576.6531 - accuracy: 0.6580 - val_loss: 296.6782 - val_accuracy: 0.7962
Epoch 4/25
100/100 [==============================] - 0s 1ms/step - loss: 1200.6157 - accuracy: 0.6812 - val_loss: 135.3955 - val_accuracy: 0.7962
Epoch 5/25
100/100 [==============================] - 0s 1ms/step - loss: 706.2908 - accuracy: 0.7000 - val_loss: 57.9750 - val_accuracy: 0.7962
Epoch 6/25
100/100 [==============================] - 0s 1ms/step - loss: 382.9190 - accuracy: 0.7255 - val_loss: 17.1516 - val_accuracy: 0.7962
Epoch 7/25
100/100 [==============================] - 0s 1ms/step - loss: 261.1572 - accuracy: 0.7344 - val_loss: 0.5685 - val_accuracy: 0.7962
Epoch 8/25
100/100 [==============================] - 0s 1ms/step - loss: 165.5640 - accuracy: 0.7478 - val_loss: 0.5664 - val_accuracy: 0.7956
Epoch 9/25
100/100 [==============================] - 0s 1ms/step - loss: 118.9462 - accuracy: 0.7569 - val_loss: 0.5504 - val_accuracy: 0.7962
Epoch 10/25
100/100 [==============================] - 0s 1ms/step - loss: 107.8973 - accuracy: 0.7600 - val_loss: 0.5425 - val_accuracy: 0.7962
Epoch 11/25
100/100 [==============================] - 0s 1ms/step - loss: 76.8876 - accuracy: 0.7678 - val_loss: 0.5358 - val_accuracy: 0.7962
Epoch 12/25
100/100 [==============================] - 0s 1ms/step - loss: 59.1148 - accuracy: 0.7622 - val_loss: 0.5295 - val_accuracy: 0.7962
Epoch 13/25
100/100 [==============================] - 0s 1ms/step - loss: 63.2185 - accuracy: 0.7725 - val_loss: 0.5246 - val_accuracy: 0.7962
Epoch 14/25
100/100 [==============================] - 0s 1ms/step - loss: 42.8717 - accuracy: 0.7705 - val_loss: 0.5209 - val_accuracy: 0.7962
Epoch 15/25
100/100 [==============================] - 0s 1ms/step - loss: 32.6726 - accuracy: 0.7672 - val_loss: 0.5172 - val_accuracy: 0.7962
Epoch 16/25
100/100 [==============================] - 0s 1ms/step - loss: 29.6535 - accuracy: 0.7766 - val_loss: 0.5147 - val_accuracy: 0.7962
Epoch 17/25
100/100 [==============================] - 0s 1ms/step - loss: 16.6469 - accuracy: 0.7800 - val_loss: 0.5127 - val_accuracy: 0.7962
Epoch 18/25
100/100 [==============================] - 0s 1ms/step - loss: 15.9111 - accuracy: 0.7769 - val_loss: 0.5111 - val_accuracy: 0.7962
Epoch 19/25
100/100 [==============================] - 0s 1ms/step - loss: 14.7296 - accuracy: 0.7816 - val_loss: 0.5098 - val_accuracy: 0.7962
Epoch 20/25
100/100 [==============================] - 0s 1ms/step - loss: 11.6211 - accuracy: 0.7797 - val_loss: 0.5088 - val_accuracy: 0.7962
Epoch 21/25
100/100 [==============================] - 0s 1ms/step - loss: 9.0398 - accuracy: 0.7808 - val_loss: 0.5080 - val_accuracy: 0.7962
Epoch 22/25
100/100 [==============================] - 0s 1ms/step - loss: 7.3933 - accuracy: 0.7817 - val_loss: 0.5074 - val_accuracy: 0.7962
Epoch 23/25
100/100 [==============================] - 0s 1ms/step - loss: 6.8600 - accuracy: 0.7817 - val_loss: 0.5069 - val_accuracy: 0.7962
Epoch 24/25
100/100 [==============================] - 0s 1ms/step - loss: 3.3733 - accuracy: 0.7816 - val_loss: 0.5066 - val_accuracy: 0.7962
Epoch 25/25
100/100 [==============================] - 0s 1ms/step - loss: 1.8299 - accuracy: 0.7834 - val_loss: 0.5063 - val_accuracy: 0.7962
In [266]:
print("Time taken in seconds ",end-start)
Time taken in seconds  4.231794834136963
In [267]:
plot(history,'loss')
No description has been provided for this image
In [268]:
y_train_pred = model.predict(X_train)
y_valid_pred = model.predict(X_valid)
y_test_pred = model.predict(X_test)
200/200 [==============================] - 0s 647us/step
50/50 [==============================] - 0s 636us/step
63/63 [==============================] - 0s 647us/step
In [269]:
import warnings
warnings.filterwarnings('ignore')
In [270]:
print("Classification Report - Train data",end="\n\n")
cr = classification_report(y_train,y_train_pred>0.5)
print(cr)
Classification Report - Train data

              precision    recall  f1-score   support

           0       0.80      1.00      0.89      5096
           1       0.00      0.00      0.00      1304

    accuracy                           0.80      6400
   macro avg       0.40      0.50      0.44      6400
weighted avg       0.63      0.80      0.71      6400

In [271]:
print("Classification Report - Validation data",end="\n\n")
cr = classification_report(y_valid,y_valid_pred>0.5)
print(cr)
Classification Report - Validation data

              precision    recall  f1-score   support

           0       0.80      1.00      0.89      1274
           1       0.00      0.00      0.00       326

    accuracy                           0.80      1600
   macro avg       0.40      0.50      0.44      1600
weighted avg       0.63      0.80      0.71      1600

In [272]:
print("Classification Report - Test data",end="\n\n")
cr = classification_report(y_test,y_test_pred>0.5)
print(cr)
Classification Report - Test data

              precision    recall  f1-score   support

           0       0.80      1.00      0.89      1593
           1       0.00      0.00      0.00       407

    accuracy                           0.80      2000
   macro avg       0.40      0.50      0.44      2000
weighted avg       0.63      0.80      0.71      2000

Observation

  • The weighted F1 score on the test data is ~0.71 with an accuracy of about 80%/

  • An F1 score of ~0.71 indicates a good balance between precision and recall, suggesting moderate performance in accurately classifying instances with minimal false positives and false negatives.

  • Model can be further tuned to deal with minority class.

Actionable Insights and Business Recommendations¶

Observations

The financial institution can deploy the final model from this exercise to determine whether a customer will leave the bank or not in the next 6 months.

The company should prioritize gender diversity initiatives

Although women have a smaller representation, they represent a high number of exixits.

Bank activity shows a slight correlation to exiting, meaning those who are have activity with their accounts are less likely

Additional research

Aging clients may represent an opportunity for reducing exits, although additional research may be required. Age may related to deaths, severe illness, and a consolidation of accounts with a single financial institution. Additional research may indicate there is an opportunity to grow this business, depending on the underlying reason for the exits.to exit.

Additional study using more complex neural nets could lead to better results, such as the following, may show better results.tions.

In [276]:
'''
#initialize the model
model = Sequential()
#This adds the input layer (by specifying input dimension) AND the first hidden layer (units)
model.add(Dense(units=24, input_dim = X_train.shape[1],activation='relu'))   # input of 29 columns as shown above
#hidden layer
model.add(Dense(units=24,activation='relu'))
model.add(Dense(24,activation='relu'))
model.add(Dense(24,activation='relu'))
#Adding the output layer
#Notice that we do not need to specify input dim. 
#we have an output of 1 node, which is the the desired dimensions of our output (fraud or not)
#We use the sigmoid because we want probability outcomes
model.add(Dense(1,activation='sigmoid'))                        # binary classification fraudulent or not
'''
print("Power Ahead")
Power Ahead

Power Ahead ___

Bruce D. Kyle, Oct 14, 2024